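LM: Language Model
MFCC: Mel-Frequency Cepstral Coefficients, cepstral features computed on the Mel frequency scale
PLP: Perceptual Linear Prediction features
fBank: filter bank features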
CMVN: Cepstral Mean and Variance Normalization
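As a rough illustration of what CMVN does, the sketch below normalizes every feature dimension of one utterance to zero mean and unit variance. The function name `apply_cmvn`, the variance floor, and the sample values are assumptions made for this example; it is not Kaldi's implementation.

```python
import math

def apply_cmvn(frames, var_floor=1e-10):
    """Normalize each feature dimension to zero mean and unit variance.

    frames: list of equal-length lists of floats (one list per frame).
    var_floor is an assumed safeguard against division by zero.
    """
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    variances = [
        sum((f[d] - means[d]) ** 2 for f in frames) / num_frames
        for d in range(dim)
    ]
    # Floor the variance so constant dimensions do not divide by zero.
    stds = [math.sqrt(max(v, var_floor)) for v in variances]
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames]

# Two made-up frames with two dimensions each.
feats = [[1.0, 10.0], [3.0, 30.0]]
normalized = apply_cmvn(feats)
```

After normalization both dimensions have the same scale, even though the raw values differ by a factor of ten.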
Mono: Monophone, monophone model training
Triphone: triphone model training; typically tri1: deltas; tri2: delta + delta-delta; tri3a: LDA+MLLT
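The "delta" and "delta-delta" features mentioned above are time derivatives of the static features. A minimal sketch using the common regression formula follows; the function name `compute_deltas`, the window size of 2, and the edge handling (repeating the first and last frame) are assumptions of this example and may differ from what a given toolkit does at utterance boundaries.

```python
def compute_deltas(frames, window=2):
    """Regression-based delta features.

    delta[t] = sum_n n * (x[t+n] - x[t-n]) / (2 * sum_n n^2), n = 1..window.
    Frames beyond the edges are replaced by the first/last frame
    (an assumption of this sketch).
    """
    num_frames = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(num_frames):
        row = []
        for d in range(dim):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, num_frames - 1)][d]
                behind = frames[max(t - n, 0)][d]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas

# Made-up one-dimensional static features that rise linearly.
static = [[0.0], [1.0], [2.0], [3.0], [4.0]]
delta = compute_deltas(static)        # first-order derivative
delta_delta = compute_deltas(delta)   # second-order derivative
# Stack static + delta + delta-delta into one feature vector per frame.
stacked = [s + d + dd for s, d, dd in zip(static, delta, delta_delta)]
```

Each stacked frame is three times the size of the static frame, which is why delta features triple the feature dimension.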
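GMM: Gaussian Mixture Model
HMM: Hidden Markov Model
sGMM: Subspace Gaussian Mixture Model (subspace GMM), which can effectively reduce the number of GMM parameters
GMM-HMM: MFCC + Mono + Triphone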
MLLT: Maximum Likelihood Linear Transform, used in the training stage
CMLLR/fMLLR: Constrained Maximum Likelihood Linear Regression / feature-space Maximum Likelihood Linear Regression; improves robustness to speaker characteristics; used in the alignment stage
SAT: Speaker Adaptive Training
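VTLN: Vocal Tract Length Normalisation. Mainly used in speech recognition to remove the difference in vocal tract length between male and female speakers. HTK includes source code for it, and it is described in the HTK Book. It works by modifying the center frequencies of the Mel filters.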
LDA: Linear Discriminant Analysis
PLDA: Probabilistic Linear Discriminant Analysis
MMI/BMMI: Maximum Mutual Information / Boosted MMI (minimizes the sentence error rate?), steps/train_mmi.sh
LF-MMI: Lattice-Free Maximum Mutual Information
MPE: Minimum Phone Error, minimizes the error rate measured at various granularities, steps/train_mpe.sh
sMBR: state-level Minimum Bayes Risk, minimizes the state error rate
lattice: word lattice; used by lmrescore
EM: Expectation Maximization
LMWT: language model weight
acwt: acoustic weight (acoustic scale), the weight of the acoustic model
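To make the interplay of LMWT and acwt concrete, here is a small sketch that ranks hypotheses by a weighted sum of log scores. It assumes the acoustic scale is the inverse of the language model weight, which is how typical Kaldi scoring scripts apply LMWT; the function name `best_hypothesis` and all scores are invented for illustration.

```python
def best_hypothesis(hyps, lmwt):
    """Pick the hypothesis with the highest combined log score.

    hyps: list of (text, acoustic_loglike, lm_loglike) tuples (made-up data).
    lmwt: language model weight; the acoustic scale is taken as 1 / lmwt
          (an assumption of this sketch).
    """
    acwt = 1.0 / lmwt
    return max(hyps, key=lambda h: acwt * h[1] + h[2])[0]

hyps = [
    ("recognize speech", -1200.0, -20.0),
    ("wreck a nice beach", -1150.0, -35.0),
]
low = best_hypothesis(hyps, lmwt=1)    # acoustic score dominates
high = best_hypothesis(hyps, lmwt=15)  # language model matters more
```

With a small LMWT the acoustically better hypothesis wins; with a larger LMWT the hypothesis preferred by the language model wins, which is why scoring scripts sweep over a range of LMWT values.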
Below are some terms and abbreviations encountered while reading Kaldi scripts
hires: hi-res, high resolution; used to describe MFCC features
scp: script file; each line is a pair of [utterance id] and [wav file or zipped wav file]
ark: archive file, of the form: token1 [something] token2 [something] token3 [something] ...
dur: duration; for example, the utt2dur file lists pairs of [utterance id] and [duration]
feats: features; for example, feats.scp lists pairs of [utterance id] and [MFCC feature ark file]
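Files such as wav.scp, utt2dur and feats.scp share the same line layout: a key followed by a value. A minimal parsing sketch follows; the function name `read_table` and the sample paths and durations are hypothetical, and the second wav entry assumes that a zipped file is read through a pipe command.

```python
def read_table(lines):
    """Parse '<utterance-id> <value>' lines into a dict.

    The value is everything after the first run of whitespace, so it may
    itself contain spaces (for example a command that unzips a file).
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, value = line.split(None, 1)
        table[key] = value
    return table

# Hypothetical wav.scp content: a plain file and a gzipped file read via a pipe.
wav_scp = read_table([
    "utt1 /data/utt1.wav",
    "utt2 gunzip -c /data/utt2.wav.gz |",
])

# Hypothetical utt2dur content: durations in seconds.
utt2dur = {u: float(d) for u, d in read_table(["utt1 2.5", "utt2 4.0"]).items()}
total_seconds = sum(utt2dur.values())
```

Splitting only on the first whitespace keeps multi-word values intact, which matters when the value is a command rather than a path.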
phones: phonemes, like phones.txt
int and txt: file extensions; a txt file holds symbols such as #1, #2, #3, while the int file holds the corresponding integers, for example disambig.int and disambig.txt
disambig: short for disambiguation; disambiguation symbols are used for minimization and determinization of the FST
lat: lattice, e.g. lat.1.gz
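The relationship between a txt file and its int counterpart can be sketched as a lookup in a symbol table. The table layout below (one "symbol integer" pair per line, as in phones.txt) and every symbol and id in it are invented for this example.

```python
def read_symbol_table(lines):
    """Read 'symbol integer-id' lines into a dict."""
    sym2int = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            sym2int[parts[0]] = int(parts[1])
    return sym2int

# Made-up symbol table in the style of phones.txt.
phones_txt = ["<eps> 0", "a 1", "b 2", "#0 3", "#1 4"]
sym2int = read_symbol_table(phones_txt)

# A txt file lists symbols; the matching int file lists their integer ids.
disambig_txt = ["#0", "#1"]
disambig_int = [sym2int[s] for s in disambig_txt]
```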
CTM: stands for time-marked conversation file and contains a time-aligned phoneme transcription of the utterances. Its format is:
utt_id channel_num start_time phone_dur phone_id
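A line in the five-column format above can be parsed as follows. This is only a sketch: the names `CtmEntry` and `parse_ctm_line` and the sample line are made up, and the phone id is kept as a string because it may be either a symbol or an integer.

```python
from collections import namedtuple

CtmEntry = namedtuple("CtmEntry", "utt_id channel start duration phone_id")

def parse_ctm_line(line):
    """Split one CTM line into its five whitespace-separated fields."""
    utt_id, channel, start, duration, phone_id = line.split()
    return CtmEntry(utt_id, int(channel), float(start), float(duration), phone_id)

# Made-up example line: the phone starts at 0.25 s and lasts 0.10 s.
entry = parse_ctm_line("utt1 1 0.25 0.10 42")
end_time = entry.start + entry.duration
```

Because the file stores a start time and a duration, the end time of each phone has to be computed by adding the two.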
egs: Examples
rm: Resource Management
wsj: Wall Street Journal
s5: Script version 5
exp: Experiments
acc: Accumulate
accs: Accumulated statistics (stats)
ali: Alignment
mdl: Model
occs: Occurrence counts/occupancy
am: Acoustic model
csl: colon-separated list files
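A colon-separated list is simply a sequence of integer ids joined by ":". A minimal sketch for reading and writing such a list follows; the function names `parse_csl` and `format_csl` and the sample ids are assumptions for illustration.

```python
def parse_csl(text):
    """Turn a colon-separated list such as '1:2:3' into a list of ints."""
    return [int(tok) for tok in text.strip().split(":") if tok]

def format_csl(ids):
    """Join integer ids back into a colon-separated string."""
    return ":".join(str(i) for i in ids)

silence_ids = parse_csl("1:2:3:4:5")   # made-up ids
round_trip = format_csl(silence_ids)
```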