Overview of the common example recipes under Kaldi's egs directory

aidatatang_200zh/s5

Datatang 200 h open-source Mandarin Chinese data, speech recognition: LM+MFCC+Mono+Triphone(tri1: deltas; tri2: delta+delta-delta; tri3a: lda+mllt)+fMLLR+SAT+TDNN
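The delta and delta-delta steps append first- and second-order time derivatives to the MFCCs. Below is a minimal NumPy sketch of regression-based deltas. It assumes a window of 2 frames (Kaldi's add-deltas default, as far as I know) and replicates edge frames. The function names are illustrative. Kaldi computes second-order deltas with a single convolved kernel, so values at the utterance edges may differ slightly from this HTK-style "delta of the delta".

```python
import numpy as np


def compute_deltas(feats: np.ndarray, window: int = 2) -> np.ndarray:
    """Regression deltas over +/- `window` frames; feats has shape (frames, dim)."""
    num_frames = feats.shape[0]
    denom = 2 * sum(n * n for n in range(1, window + 1))
    # Replicate the first/last frame so every frame has a full context window.
    padded = np.pad(feats, ((window, window), (0, 0)), mode="edge")
    out = np.zeros(feats.shape, dtype=float)
    for n in range(1, window + 1):
        ahead = padded[window + n : window + n + num_frames]   # frame t+n
        behind = padded[window - n : window - n + num_frames]  # frame t-n
        out += n * (ahead - behind)
    return out / denom


def add_deltas(feats: np.ndarray, window: int = 2) -> np.ndarray:
    """Append first- and second-order deltas, e.g. 13-dim MFCC -> 39 dims."""
    first = compute_deltas(feats, window)
    second = compute_deltas(first, window)  # delta of the delta (HTK-style)
    return np.hstack([feats, first, second])
```

On a linearly increasing feature, the interior first-order deltas are exactly 1 and the interior second-order deltas are 0. This makes a quick sanity check.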

aishell/v1

OpenSLR 33 data, speaker recognition: MFCC+UBM+PLDA

aishell/s5

OpenSLR 33 data, speech recognition: LM+MFCC+Mono+Triphone+fMLLR+SAT+TDNN

aishell2/s5

AISHELL-2, speech recognition: LM+GMM-HMM(MFCC+Mono+Triphone)+TDNN

ami/s5/run_ihm.sh

(corpus omitted), speech recognition, IHM (individual headset microphone): LM+MFCC+Mono+Triphone+tri4a(LDA+MLLT+SAT)+DNN+TDNN

ami/s5/run_mdm.sh

(corpus omitted), speech recognition, MDM (multiple distant microphones): LM+MFCC+Mono+Triphone+SAT+MMI+DNN(dnn+lda+mllt)+TDNN

ami/s5/run_sdm.sh

(corpus omitted), speech recognition, SDM (single distant microphone): LM+MFCC+Mono+Triphone+SAT+MMI+DNN(dnn+lda+mllt)+TDNN

ami/s5b

(corpus omitted), speech recognition: LM+MFCC+tri1(deltas)+tri2(lda+mllt)+tri3(lda+mllt+sat)+tdnn

an4/s5

AN4, speech recognition: LM+MFCC+tri1(deltas)+tri2(lda+mllt)+tri3(lda+mllt+sat)
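The lda+mllt stages (tri2/tri3 in ami/s5b and an4) first splice each frame with its neighbours and then project the result with LDA. MLLT then adds a further rotation, which is not shown here. As far as I recall, Kaldi's train_lda_mllt.sh defaults to ±3 frames of context and a 40-dimensional output. The NumPy sketch below covers splicing and LDA estimation only. It assumes the class labels come from a forced alignment (here just an integer array). The regularization constant and the function names are illustrative.

```python
import numpy as np


def splice_frames(feats: np.ndarray, left: int = 3, right: int = 3) -> np.ndarray:
    """Concatenate each frame with its left/right neighbours (edges replicated)."""
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((left, right), (0, 0)), mode="edge")
    # Offset i in the padded array corresponds to frame t - left + i.
    return np.hstack([padded[i : i + num_frames] for i in range(left + right + 1)])


def estimate_lda(x: np.ndarray, labels: np.ndarray, out_dim: int, reg: float = 1e-6) -> np.ndarray:
    """Return an (out_dim, dim) LDA projection maximizing between/within-class scatter."""
    dim = x.shape[1]
    global_mean = x.mean(axis=0)
    within = np.zeros((dim, dim))
    between = np.zeros((dim, dim))
    for c in np.unique(labels):
        xc = x[labels == c]
        class_mean = xc.mean(axis=0)
        centered = xc - class_mean
        within += centered.T @ centered
        diff = (class_mean - global_mean)[:, None]
        between += len(xc) * (diff @ diff.T)
    within += reg * np.eye(dim)  # keep the within-class scatter invertible
    # Whiten with the Cholesky factor of `within`, then solve a symmetric eigenproblem.
    chol_inv = np.linalg.inv(np.linalg.cholesky(within))
    eigvals, eigvecs = np.linalg.eigh(chol_inv @ between @ chol_inv.T)
    top = np.argsort(eigvals)[::-1][:out_dim]
    # Map the whitened eigenvectors back to the original feature space.
    return (chol_inv.T @ eigvecs[:, top]).T
```

Applied to 13-dim MFCCs, splicing with ±3 frames gives 91-dim vectors. `estimate_lda(spliced, alignment_labels, 40)` would then reduce them to 40 dims. Here `alignment_labels` is a hypothetical per-frame class array.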

apiai_decode/s5

16 kHz data; decoding only, no model training, details omitted

aspire/s5

corpora3/LDC/LDC2005T19, corpora3/LDC/LDC2004S13, corpora3/LDC/LDC2005S13, speech recognition: LM+MFCC+CMVN+Mono+Triphone+fMLLR+SAT+build_silprob.sh+TDNN+TDNN_LSTM
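CMVN appears in most of the pipelines listed here. Kaldi accumulates the statistics with compute-cmvn-stats, per speaker when it is given spk2utt. It then applies them with apply-cmvn, which as far as I know normalizes only the mean by default (--norm-vars=false). The sketch below mimics per-speaker CMVN in NumPy. The dictionaries stand in for Kaldi's feats.scp and utt2spk and are assumptions for illustration.

```python
from typing import Dict

import numpy as np


def apply_speaker_cmvn(feats_by_utt: Dict[str, np.ndarray],
                       utt2spk: Dict[str, str],
                       norm_vars: bool = False) -> Dict[str, np.ndarray]:
    """Normalize each utterance with the mean (and optionally variance) of its speaker."""
    # Pass 1: accumulate frame count, sum and sum of squares per speaker.
    stats = {}
    for utt, feats in feats_by_utt.items():
        count, total, total_sq = stats.get(utt2spk[utt], (0, 0.0, 0.0))
        stats[utt2spk[utt]] = (count + len(feats),
                               total + feats.sum(axis=0),
                               total_sq + (feats ** 2).sum(axis=0))
    # Pass 2: subtract the speaker mean, optionally divide by the speaker std.
    normalized = {}
    for utt, feats in feats_by_utt.items():
        count, total, total_sq = stats[utt2spk[utt]]
        mean = total / count
        out = feats - mean
        if norm_vars:
            var = total_sq / count - mean ** 2
            out = out / np.sqrt(np.maximum(var, 1e-10))
        normalized[utt] = out
    return normalized
```

Pooling statistics per speaker rather than per utterance gives more stable estimates for short utterances. This is one reason recipes usually compute CMVN per speaker.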

aurora4/s5

corpora5/LDC/LDC93S6B, corpora5/AURORA, speech recognition: MFCC+tri1(deltas)+tri2(deltas)+tri2b(lda_mllt)+tri3b(lda+mllt+sat)+TDNN

babel/s5

There are many run scripts, so only the distinctive steps are listed: plp+pitch+feats+(ffv)+mono+tri1+tri2+tri3(deltas)+tri4(lda_mllt)+sat+SGMM(fmllr+ubm+sgmm)+MMI

bentham/v1/run_end2end.sh

corpora5/handwriting_ocr/hwr1/ICDAR-HTR-Competition-2015, image recognition (OCR), end-to-end recognition: features+cmvn+lm+e2e_cnn

bn_music_speech/v1

corpora5/LDC/LDC97S44, corpora/LDC/LDC97T22, music/speech discrimination: MFCC+UBM+vad_GMM

callhome_diarization/v1

SWBD, speaker diarization of home telephone calls: MFCC+VAD+UBM+PLDA+Cluster
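The Cluster step groups per-segment embeddings by speaker. Kaldi's diarization recipes score segment pairs with PLDA and then run agglomerative clustering until a stopping threshold is reached. To stay short, the sketch below swaps PLDA for plain cosine similarity with average linkage. It therefore illustrates the clustering loop, not Kaldi's scoring. The threshold is an assumption to be tuned on dev data.

```python
import numpy as np


def cosine_ahc(embeddings: np.ndarray, threshold: float) -> np.ndarray:
    """Average-linkage agglomerative clustering on cosine similarity.

    Merges the most similar pair of clusters until the best similarity
    drops below `threshold`; returns one integer label per segment.
    """
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = x @ x.T
    clusters = [[i] for i in range(len(x))]
    while len(clusters) > 1:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = sim[np.ix_(clusters[a], clusters[b])].mean()  # average linkage
                if score > best:
                    best, pair = score, (a, b)
        if best < threshold:
            break  # no pair is similar enough: stop merging
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = np.empty(len(x), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels
```

If the number of speakers is known, an alternative is to stop at that many clusters instead of using a threshold.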

callhome_diarization/v2

SWBD, speaker diarization of home telephone calls: xvector+vad+data augmentation+mfcc+plda+cluster+diag(ubm)+VB
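The data-augmentation step in the x-vector recipes mixes noise, music, babble and reverberation into the training audio. As far as I recall, Kaldi does this with the MUSAN corpus and simulated room impulse responses via steps/data/augment_data_dir.py and steps/data/reverberate_data_dir.py. The NumPy sketch below shows only the core of additive augmentation: scaling a noise signal so that it sits at a chosen SNR. The function name and signature are illustrative.

```python
import numpy as np


def add_noise_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float,
                     rng: np.random.Generator) -> np.ndarray:
    """Mix `noise` into `speech` so the speech-to-noise power ratio is `snr_db` dB."""
    speech = speech.astype(float)
    noise = noise.astype(float)
    # Loop the noise if it is shorter than the speech, then cut a random segment.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    start = int(rng.integers(0, len(noise) - len(speech) + 1))
    noise = noise[start : start + len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0.0:
        return speech.copy()
    # Choose the gain g so that 10*log10(speech_power / (g^2 * noise_power)) == snr_db.
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise
```

Drawing `snr_db` at random from a small set of values per utterance produces a spread of noise conditions. Recipes typically do this, with different SNR ranges for noise, music and babble.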

callhome_egyptian/s5

(corpus omitted), speech recognition: mfcc+cmvn+mono+Triphone+sat+fmllr+tdnn

casia_hwdb/v1

corpora5/handwriting_ocr/CASIA_HWDB/Offline, end-to-end handwriting recognition, details omitted

chime1-6

Details omitted; speech recognition

cifar/v1

CIFAR, image recognition, details omitted

cmu_cslu_kids/s5

(corpus omitted), speech recognition: LM+MFCC+CMVN+Mono+Triphone+MMI+Boosting+MPE+SAT+VTLN+tdnnf
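MMI and Boosting (boosted MMI) are discriminative training criteria. For one utterance, the bMMI objective is log of [p(O|W_ref)^κ P(W_ref)] divided by the sum over hypotheses W of [p(O|W)^κ P(W) exp(-b·A(W, W_ref))]. Here κ is the acoustic scale, b the boosting factor and A the accuracy of W against the reference. Setting b = 0 gives plain MMI. Kaldi evaluates this over numerator and denominator lattices. The sketch below simplifies the denominator to an N-best list, an assumption made only to keep the example short.

```python
import math
from typing import List, Tuple

# (acoustic log-likelihood, LM log-probability, accuracy against the reference)
Hyp = Tuple[float, float, float]


def log_sum_exp(values: List[float]) -> float:
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def bmmi_objective(ref: Hyp, hyps: List[Hyp], acoustic_scale: float = 0.1,
                   boost: float = 0.0) -> float:
    """(Boosted) MMI objective for one utterance over an N-best denominator.

    boost=0 gives plain MMI. With boost>0 each hypothesis in the denominator is
    weighted by exp(-boost * accuracy), so hypotheses with more errors weigh more.
    """
    numerator = acoustic_scale * ref[0] + ref[1]
    denominator = log_sum_exp(
        [acoustic_scale * ac + lm - boost * acc for ac, lm, acc in hyps])
    return numerator - denominator
```

When the reference is also in the denominator list, the plain MMI value is at most 0. Training raises it by making the reference dominate the competing hypotheses.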

cnceleb/v1

CN-Celeb dataset, speaker recognition: MFCC+UBM+PLDA

commonvoice/s5

corpus v1, speech recognition: LM+MFCC+Mono+Triphone+fmllr+tdnn

csj/s5

Japanese corpus, speech recognition: LM+MFCC+CMVN+GMM-HMM+fmllr+(sgmm, tdnn, dnn, rnnlm, etc.)

dihard_2018/v1

(corpus omitted), speaker recognition: MFCC+UBM+PLDA+Cluster

dihard_2018/v2

(corpus omitted), speaker recognition: MFCC+data augmentation+cmvn+xvector+plda+cluster

fame

Frisian corpus; speech recognition in s5, speaker recognition in v1 and v2. s5: mfcc+cmvn+mono+triphone+sgmm+dnn+dnn_fbank; v1: the standard pipeline, details omitted; v2: adds ubm+dnn

farsdat/s5

Persian (Farsi) corpus, speech recognition: MFCC+CMVN+Mono+tri1(deltas+delta-deltas)+tri2(LDA+MLLT)+tri3(LDA+MLLT+SAT)+SGMM+MMI+SGMM2

fisher_callhome_spanish/s5

Spanish corpus, speech recognition: MFCC+CMVN+Mono+deltas+deltas+lda_mllt+fmllr+sgmm+mmi+tdnn_1g

fisher_english/s5

Fisher English corpus, speech recognition: MFCC+CMVN+deltas+deltas+lda_mllt+fmllr+sat

fisher_swbd/s5

SWBD corpus, speech recognition: lm+mfcc+cmvn+mono+delta+delta+delta+lda_mllt+fmllr+sat+lmrescore

formosa/s5

Taiwanese speech, speech recognition: lm+mfcc+pitch+cmvn+mono+delta+delta+lda_mllt+fmllr+sat+tdnn

gale_arabic

Arabic corpus, speech recognition. s5: lm+mfcc+cmvn+mono+delta+delta+lda_mllt+sat+fmllr+mmi+sgmm+dnn; s5b: lm+mfcc+cmvn+mono+delta+lda_mllt+sat+fmllr+tdnn; s5c: lm+mfcc+mono+delta+lda_mllt+sat+fmllr+tdnn; s5d: lm+mfcc+cmvn+mono+delta+lda_mllt+sat+fmllr+tdnn+tdnn_lstm

gale_mandarin/s5

Mandarin Chinese corpus, speech recognition: lm+mfcc+cmvn+mono+delta+lda_mllt+MMI+MPE+sat+fmllr+UBM+sgmm

gop/s5

Pronunciation scoring with GOP (Goodness of Pronunciation), details omitted

gp

Three languages, 15-20 h per language, multilingual speech recognition, details omitted

heroico/s5

Spanish, speech recognition: lm+mfcc+cmvn+mono+delta+lda_mllt+sat+fmllr+tdnn

hi_mia/v1

OpenSLR, wake-word detection, details omitted

hkust/s5

HKUST Mandarin telephone speech, speech recognition: lm+mfcc+cmvn+mono+delta+delta+lda_mllt+fmllr+sat+nnet2_ms+tdnn+tdnn

hub4_english/s5

English Broadcast News (HUB4) corpus, speech recognition: lm+mfcc+cmvn+mono+delta+lda_mllt+sat+fmllr

hub4_spanish/s5

Spanish, speech recognition: lm+mfcc+cmvn+mono+delta+delta+delta+lda_mllt+sat+fmllr

iam

Handwriting data, image recognition, details omitted

iban

Iban (a language spoken in Malaysia), speech recognition: lm+mfcc+cmvn+mono+delta+lmrescore+delta+lmrescore+lda_mllt+lmrescore+sat+fmllr+ubm+sgmm+lmrescore (distinctive in that every decoding pass is followed by lmrescore)
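The lmrescore steps decode with a small LM first and then replace its scores with those of a larger model. Kaldi does this on lattices, for example with steps/lmrescore_const_arpa.sh for a large n-gram. The sketch below works on an N-best list instead, and `big_lm_logprob` is a hypothetical stand-in for the larger model. With `weight=1.0` the first-pass LM score is replaced entirely. Values between 0 and 1 log-linearly interpolate the two, a simplification of how RNNLM rescoring is usually combined with the n-gram.

```python
from typing import Callable, List, Tuple

# (word sequence, acoustic log-likelihood, first-pass LM log-probability)
Hyp = Tuple[List[str], float, float]


def rescore_nbest(nbest: List[Hyp],
                  big_lm_logprob: Callable[[List[str]], float],
                  weight: float = 0.5,
                  acoustic_scale: float = 0.1) -> List[List[str]]:
    """Re-rank hypotheses after swapping in (or interpolating with) a bigger LM."""
    rescored = []
    for words, acoustic, small_lm in nbest:
        lm_score = (1.0 - weight) * small_lm + weight * big_lm_logprob(words)
        rescored.append((acoustic_scale * acoustic + lm_score, words))
    rescored.sort(key=lambda item: item[0], reverse=True)  # best (highest) score first
    return [words for _, words in rescored]
```

Only the hypotheses that survived the first pass are rescored. This is why the two-pass approach saves time compared with decoding directly with the large LM.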

ifnenit

Handwriting data, image recognition, details omitted

librispeech/s5

English, speech recognition: lm+mfcc+cmvn+mono+deltas+lmrescore+lda_mllt+lmrescore+sat+fmllr+tdnn (fairly complete apart from the lack of data augmentation)

lre/v1

(corpus omitted), language identification: mfcc+vad+ubm+vtln+ivector

lre07/v1

(corpus omitted), language identification. v1: vtln+mfcc+ubm+ivector; v2: vtln+mfcc+ubm+ivector_dnn+dnn

madcat_ar, madcat_zh

Handwriting data, image text recognition, details omitted

malach/s5

MALACH data, speech recognition: mfcc+cmvn+lda_mllt+sat+fmllr+tdnn

mandarin_bn_bc/s5

LDC, speech recognition: lm+mfcc+pitch+cmvn+mono+delta+lda_mllt+sat+fmllr+tdnn+tdnn_lstm

material/s5

Swahili, speech recognition: lm+mfcc+cmvn+mono+delta+lda_mllt+sat+fmllr+LM modification

mgb2_arabic/s5

MGB-2 corpus, speech recognition: lm+mfcc+cmvn+mono+delta+delta+lda_mllt+sat+fmllr+dnn

mgb5/s5

MGB-5 corpus, speech recognition: lm+mfcc+cmvn+mono+delta+delta+lda_mllt+sat+fmllr+sgmm+tdnn

mini_librispeech/s5

OpenSLR 31, speech recognition: lm+mfcc+cmvn+mono+delta+lda_mllt+sat+fmllr+lmrescore+tdnn

mobvoi/v1

Data provided by Mobvoi, speech recognition: data augmentation+mfcc+cmvn+tdnn

mobvoihotwords/v1

(corpus omitted), speech recognition: data augmentation+mfcc+cmvn+fmllr+tdnn

multi_cn/s5

Mandarin Chinese (OpenSLR), speech recognition: lm+mfcc+pitch+cmvn+mono+delta+delta+lda_mllt+sat+fmllr+cnn_tdnn

multi_en/s5

English, speech recognition: lm+mfcc+cmvn+mono+delta+delta+delta+lda_mllt+fmllr+sat

ptb/s5

Penn Treebank corpus, language modeling, details omitted

reverb/s5

(corpus omitted), reverberant speech recognition: mfcc+cmvn+mono+delta+lda_mllt+sat+fmllr+tdnn

rimes/v1

French handwriting, image text recognition, details omitted

rm/s5

Speech recognition (the example Dan uses in his slides to walk through the ASR pipeline): mfcc+plp+cmvn+mono+delta+lda_mllt+denlats+mmi+mpe+sat+fmllr+ubm+mmi_fmmi+sgmm2+tdnn+tdnn_online_cmn

sitw

SITW data, speaker recognition in real-world conditions:
v1: mfcc+vad+ubm+ivector+data augmentation+lda+plda
v2: mfcc+vad+data augmentation+xvector+lda+plda

snips/v1

Wake word, speech recognition: mfcc+cmvn+data augmentation+mfcc+cmvn+mono+fmllr+tdnn

spanish_dimex100/s5

Mexican Spanish, speech recognition: mfcc+cmvn+mono+delta+lda_mllt+denlats+mm

sprakbanken/s5

Danish, speech recognition: mfcc+cmvn+irstlm+mono+delta+delta+lda_mllt+sat+fmllr+tdnn_lstm

sprakbanken_swe/s5

Swedish, speech recognition: mfcc+cmvn+irstlm+mono+delta+delta+lda_mllt+sat+fmllr+local/sprak_run_nnet_cpu.sh

sre08/v1

LDC2011S05, speaker recognition: mfcc+vad+ubm+ivector+lda+plda

sre10

NIST SRE 2010, speaker recognition. v1: mfcc+vad+ubm+ivector+plda; v2: mfcc+vad+ubm+ivector_dnn+plda

sre16

NIST SRE 2016 enroll, speaker recognition. v1: mfcc+vad+ubm+ivector+data augmentation+mfcc+ivector+plda; v2: mfcc+vad+data augmentation+mfcc+cmvn+xvector+plda

svhn/v1

Street View House Numbers, image recognition, details omitted

swahili/s5

Swahili speech corpus, speech recognition: mfcc+cmvn+mono+delta+lda_mllt+sat+fmllr+denlats+mmi+ubm+mmi_fmmi+ubm+sgmm+denlats_sgmm+mmi_sgmm

swbd

Switchboard corpus, Fisher corpus, speech recognition. s5: mfcc+cmvn+mono+delta+delta+lda_mllt+fmllr+sgmm+sat+fmllr+denlats+mmi+ubm+mmi_fmmi; s5b: mfcc+cmvn+mono+delta+delta+lda_mllt+fmllr+sat+fmllr+denlats+mmi+ubm+mmi_fmmi; s5c: mfcc+cmvn+mono+delta+delta+lda_mllt+fmllr+lmrescore+mmi+ubm+mmi_fmmi+lmrescore

tedlium

(corpus omitted), speech recognition. s5: mfcc+cmvn+mono+delta+lda_mllt+sat+fmllr+denlats+mmi+dnn; s5_r2: mfcc+cmvn+mono+delta+lmscore+lda_mllt+sat+fmllr+tdnn; s5_r2_wsj: mfcc+cmvn+mono+delta+lda_mllt+sat+fmllr; s5_r3: mfcc+cmvn+mono+delta+lda_mllt+sat+fmllr+tdnn

thchs30/s5

Mandarin Chinese, speech recognition: mfcc+cmvn+lm+mono+delta+lda_mllt+sat+fmllr+quick+dnn

tidigits/s5

LDC93S10, English digit recognition: mfcc+cmvn+mono+delta

timit/s5

LDC93S1, speech recognition: mfcc+cmvn+mono+delta+lda_mllt+sat+fmllr+ubm+sgmm+mmi_sgmm+dnn

tunisian_msa/s5

Tunisian corpus, speech recognition: mfcc+cmvn+mono+lda_mllt+sat+fmllr+tdnn

uw3/v1

(corpus omitted), image recognition, details omitted

voxceleb

VoxCeleb1 and VoxCeleb2 corpora, speaker recognition. v1: mfcc+vad+ubm+ivector+lda+plda; v2: mfcc+vad+data augmentation+cmvn+xvector+lda+plda

voxforge/s5

Free speech corpus available from VoxForge, speech recognition: mfcc+cmvn+mono+delta+delta+lda_mllt+denlats+mmi+mpe+sat+fmllr+ubm+mmi_fmmi+sgmm

vystadial_cz

Czech, speech recognition. s5: mfcc+cmvn+mono+delta+delta+lda_mllt+denlats+mmi; s5b: mfcc+cmvn+mono+delta+lda_mllt+sat+fmllr+tdnn

vystadial_en/s5

English, speech recognition: mfcc+cmvn+mono+delta+delta+lda_mllt+denlats+mmi+mpe

wsj/s5

Wall Street Journal data, speech recognition: mfcc+cmvn+mono+delta+lmrescore+lda_mllt+lmrescore+sat+fmllr+tdnn

yesno/s5

yesno data, speech recognition: mfcc+cmvn+mono

yomdle_fa, yomdle_korean, yomdle_russian, yomdle_tamil, yomdle_zh

OCR data, image recognition, details omitted

zeroth_korean/s5

Korean, speech recognition: mfcc+cmvn+mono+delta+lmrescore+lda_mllt+sat+fmllr+rebuildlm+lmrescore+fmllr+sat+tdnn

Miscellaneous

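Data augmentation: additive noise, added music, reverberation, speed perturbation, SpecAugment
Feature extraction: MFCC, pitch, CMVN, fbank, UBM
ASR training: mono+triphone+tdnn. The triphone stage varies (deltas, LDA, MLLT, fMLLR, SGMM, etc.), and the TDNN is often replaced by other network architectures.
Training criteria: CE, MMI/bMMI, MPE, sMBR
LM: decode first with a smaller LM, then rescore with an RNNLM (mainly to save time). Decoding directly with the full LM also works, but is slower.
ASR: the data is usually split into train (training set), dev (development set) and test (test set), and hyperparameters are tuned on the dev results. The training set is also split into several subsets, so that more data and more model parameters are added as training progresses.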
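Training on progressively larger subsets can be sketched as follows. Many Kaldi recipes build such subsets with utils/subset_data_dir.sh, for example taking the shortest utterances with --shortest for the monophone stage. This Python sketch only mimics that idea. The function name is hypothetical, and the subset sizes in the test are arbitrary.

```python
import random
from typing import Dict, List


def training_subsets(utt2dur: Dict[str, float], sizes: List[int],
                     seed: int = 0) -> List[List[str]]:
    """Return utterance lists of increasing size, ending with the full training set.

    sizes[0] utterances are the shortest ones (for the monophone stage);
    each later size draws a random subset; the final entry is all data.
    """
    all_utts = sorted(utt2dur)  # fixed order so the random draw is reproducible
    rng = random.Random(seed)
    # Shortest utterances first: they are the easiest to align from a flat start.
    subsets = [sorted(all_utts, key=lambda u: (utt2dur[u], u))[: sizes[0]]]
    for size in sizes[1:]:
        subsets.append(sorted(rng.sample(all_utts, min(size, len(all_utts)))))
    subsets.append(all_utts)
    return subsets
```

Each stage is typically initialized from the alignments of the previous one. The model can therefore grow (more states and Gaussians, then a neural network) as the amount of data increases.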
Speaker recognition

If there is no segments file, run VAD first to remove the silent parts
Feature extraction: i-vector, x-vector
Training: UBM, LDA/PLDA, clustering
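For the VAD step, Kaldi's speaker recipes use compute-vad, an energy-based detector driven by the log-energy (C0) of the MFCCs. The sketch below follows that idea as I understand it. A frame counts as voiced if a large enough share of the frames in a small window around it exceed a threshold, and that threshold is offset from the utterance's mean log-energy. The default values mirror what I recall from the SRE recipes' conf/vad.conf; treat them as assumptions.

```python
import numpy as np


def energy_vad(log_energy: np.ndarray, energy_threshold: float = 5.5,
               mean_scale: float = 0.5, context: int = 2,
               proportion: float = 0.12) -> np.ndarray:
    """Return a boolean voiced/unvoiced decision per frame."""
    log_energy = np.asarray(log_energy, dtype=float)
    num_frames = len(log_energy)
    # Threshold adapts to the utterance: fixed offset plus a fraction of the mean.
    threshold = energy_threshold + mean_scale * log_energy.mean()
    above = log_energy > threshold
    voiced = np.zeros(num_frames, dtype=bool)
    for t in range(num_frames):
        lo, hi = max(0, t - context), min(num_frames, t + context + 1)
        window = above[lo:hi]
        # Voiced if enough frames in the window are above the threshold.
        voiced[t] = window.sum() >= proportion * len(window)
    return voiced
```

The resulting mask is used to drop silent frames before i-vector or x-vector extraction, e.g. `feats[energy_vad(feats[:, 0])]` when column 0 holds the log-energy.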
