The topo structure in Kaldi
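In Kaldi, the HMM topology of each recipe is described by a topo file. In the code it is represented by the HmmTopology class, and updates to the topology's parameters show up in the TransitionModel class.
Below is the topo file from Kaldi's yesno recipe. Its location is: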
egs/yesno/s5/data/lang/topo
This file is generated by utils/gen_topo.pl, which is called from utils/prepare_lang.sh, which in turn is called from run.sh. run.sh is the script we use to train the model.
<Topology>
<TopologyEntry>
<ForPhones>
2 3
</ForPhones>
<State> 0 <PdfClass> 0 <Transition> 0 0.75 <Transition> 1 0.25 </State>
<State> 1 <PdfClass> 1 <Transition> 1 0.75 <Transition> 2 0.25 </State>
<State> 2 <PdfClass> 2 <Transition> 2 0.75 <Transition> 3 0.25 </State>
<State> 3 </State>
</TopologyEntry>
<TopologyEntry>
<ForPhones>
1
</ForPhones>
<State> 0 <PdfClass> 0 <Transition> 0 0.25 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 </State>
<State> 1 <PdfClass> 1 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
<State> 2 <PdfClass> 2 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
<State> 3 <PdfClass> 3 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
<State> 4 <PdfClass> 4 <Transition> 4 0.75 <Transition> 5 0.25 </State>
<State> 5 </State>
</TopologyEntry>
</Topology>
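To make the file format concrete: each <State> line gives a state index, an optional <PdfClass>, and a list of <Transition> pairs (destination state, probability). The following is a minimal sketch of reading one such line and summing its outgoing probabilities. It is not Kaldi's own parser (Kaldi reads the file through HmmTopology::Read); the names ParsedState, ParseStateLine and TransitionProbSum are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper type (not part of Kaldi): one parsed <State> line.
struct ParsedState {
  int index = -1;      // number after <State>
  int pdf_class = -1;  // number after <PdfClass>; stays -1 if the tag is absent
  std::vector<std::pair<int, double>> transitions;  // (destination, probability)
};

// Parses a line such as
//   <State> 0 <PdfClass> 0 <Transition> 0 0.75 <Transition> 1 0.25 </State>
// Returns false if the line does not start with <State> or is malformed.
bool ParseStateLine(const std::string &line, ParsedState *out) {
  std::istringstream is(line);
  std::string token;
  ParsedState state;
  if (!(is >> token) || token != "<State>") return false;
  if (!(is >> state.index)) return false;
  while (is >> token) {
    if (token == "</State>") {
      *out = state;
      return true;
    }
    if (token == "<PdfClass>") {
      if (!(is >> state.pdf_class)) return false;
    } else if (token == "<Transition>") {
      int dest;
      double prob;
      if (!(is >> dest >> prob)) return false;
      state.transitions.emplace_back(dest, prob);
    } else {
      return false;  // unknown token
    }
  }
  return false;  // closing </State> was missing
}

// Sum of the outgoing transition probabilities of one state.
double TransitionProbSum(const ParsedState &state) {
  double sum = 0.0;
  for (const auto &t : state.transitions) sum += t.second;
  return sum;
}
```

Applied to the yesno file, every emitting state's probabilities add up to 1 (0.75 + 0.25 for phones 2 and 3, four times 0.25 for most states of phone 1), while the last state of each entry has no <PdfClass> and no transitions.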
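The probabilities 0.25 and 0.75 in this topo file are initial values assigned by utils/gen_topo.pl. Because they come from the same script, the topo files of other recipes carry the same initial probabilities. Here is another example: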
<Topology>
<TopologyEntry>
<ForPhones>

</ForPhones>
<State> 0 <PdfClass> 0 <Transition> 0 0.75 <Transition> 1 0.25 </State>
<State> 1 <PdfClass> 1 <Transition> 1 0.75 <Transition> 2 0.25 </State>
<State> 2 <PdfClass> 2 <Transition> 2 0.75 <Transition> 3 0.25 </State>
<State> 3 </State>
</TopologyEntry>
<TopologyEntry>
<ForPhones>
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
</ForPhones>
<State> 0 <PdfClass> 0 <Transition> 0 0.25 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 </State>
<State> 1 <PdfClass> 1 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
<State> 2 <PdfClass> 2 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
<State> 3 <PdfClass> 3 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
<State> 4 <PdfClass> 4 <Transition> 4 0.75 <Transition> 5 0.25 </State>
<State> 5 </State>
</TopologyEntry>
</Topology>
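In this second topo, the 5-state entry covers more than the silence phones. It lists 20 phones in total, which also include laughter, noise, and oov. You can confirm this by looking at data/lang/phones.txt: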
<eps> 0
sil 1
sil_B 2
sil_E 3
sil_I 4
sil_S 5
laughter 6
laughter_B 7
laughter_E 8
laughter_I 9
laughter_S 10
noise 11
noise_B 12
noise_E 13
noise_I 14
noise_S 15
oov 16
oov_B 17
oov_E 18
oov_I 19
oov_S 20
Now let's look at HmmTopology:
class HmmTopology {
public:
/// A structure defined inside HmmTopology to represent a HMM state.
struct HmmState {
int32 forward_pdf_class;
int32 self_loop_pdf_class;
std::vector<std::pair<int32, BaseFloat> > transitions;
HmmState(): forward_pdf_class(-1), self_loop_pdf_class(-1) { }
HmmState(int32 i): forward_pdf_class(i), self_loop_pdf_class(i) { }
};
/// TopologyEntry is a typedef that represents the topology of
/// a single (prototype) state.
typedef std::vector<HmmState> TopologyEntry;
void Read(std::istream &is, bool binary);
void Write(std::ostream &os, bool binary) const;
/// Returns the topology entry (i.e. vector of HmmState) for this phone;
/// will throw exception if phone not covered by the topology.
const TopologyEntry &TopologyForPhone(int32 phone) const{
return entries_[phone2idx_[phone]];
}
HmmTopology() {}
private:
std::vector<int32> phones_; // list of all phones we have topology for. Sorted, uniq. no epsilon (zero) phone.
std::vector<int32> phone2idx_; // map from phones to indexes into the entries vector (or -1 for not present).
std::vector<TopologyEntry> entries_;
};
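phones_ stores the phones, and phone2idx_ is an index table through which a phone's topology can be found; reading the TopologyForPhone function above makes its role clear. entries_ stores the actual topologies. As in the yesno example, phones 2 and 3 have exactly the same topology, so entries_ only needs to hold one copy of it, shared by all the phones listed in the same <ForPhones> block.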
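The sharing described here can be sketched with a much smaller structure. This is a simplified stand-in, not Kaldi's HmmTopology: TinyTopology, AddEntry, EntryIndexForPhone and MakeYesnoTopology are hypothetical names, and each entry is reduced to its number of states to keep the example short. The listing of TopologyForPhone above indexes phone2idx_ directly even though its comment promises an exception, so this sketch makes the range check explicit.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Simplified stand-in for HmmTopology's lookup tables; NOT Kaldi's class.
struct TinyTopology {
  std::vector<int> phone2idx;  // phone id -> index into entries, -1 if absent
  std::vector<int> entries;    // one element per <TopologyEntry> (state count)

  // Registers one <TopologyEntry> shared by every phone in for_phones.
  void AddEntry(const std::vector<int> &for_phones, int num_states) {
    int idx = static_cast<int>(entries.size());
    entries.push_back(num_states);
    for (int phone : for_phones) {
      if (phone <= 0) throw std::invalid_argument("phone ids must be positive");
      if (phone >= static_cast<int>(phone2idx.size()))
        phone2idx.resize(phone + 1, -1);
      phone2idx[phone] = idx;
    }
  }

  // Bounds-checked analogue of TopologyForPhone(): returns the entry index.
  int EntryIndexForPhone(int phone) const {
    if (phone < 0 || phone >= static_cast<int>(phone2idx.size()) ||
        phone2idx[phone] == -1)
      throw std::out_of_range("phone not covered by the topology");
    return phone2idx[phone];
  }
};

// Builds the layout of the yesno topo file: phones 2 and 3 share a 4-state
// entry (states 0..3); phone 1 has its own 6-state entry (states 0..5).
TinyTopology MakeYesnoTopology() {
  TinyTopology topo;
  topo.AddEntry({2, 3}, 4);
  topo.AddEntry({1}, 6);
  return topo;
}
```

With this layout phone2idx holds -1, 1, 0, 0 for phone ids 0 to 3: phones 2 and 3 resolve to the same entry, and asking for phone 0 (epsilon) throws because it has no topology.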
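The contents of the topo file serve as the starting point of the initial model. As an example, take the first model produced by monophone training in yesno, 0.mdl.
We use Kaldi's gmm-copy tool to convert this binary file into a text file: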
$> gmm-copy --binary=false exp/mono0a/0.mdl 0.mdl.txt
LOG (gmm-copy[5.5.896~1-8a59]:main():gmm-copy.cc:75) Written model to 0.mdl.txt
Below is the beginning of the resulting 0.mdl.txt file:
<TransitionModel>
<Topology>
<TopologyEntry>
<ForPhones>
2 3
</ForPhones>
<State> 0 <PdfClass> 0 <Transition> 0 0.75 <Transition> 1 0.25 </State>
<State> 1 <PdfClass> 1 <Transition> 1 0.75 <Transition> 2 0.25 </State>
<State> 2 <PdfClass> 2 <Transition> 2 0.75 <Transition> 3 0.25 </State>
<State> 3 </State>
</TopologyEntry>
<TopologyEntry>
<ForPhones>
1
</ForPhones>
<State> 0 <PdfClass> 0 <Transition> 0 0.25 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 </State>
<State> 1 <PdfClass> 1 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
<State> 2 <PdfClass> 2 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
<State> 3 <PdfClass> 3 <Transition> 1 0.25 <Transition> 2 0.25 <Transition> 3 0.25 <Transition> 4 0.25 </State>
<State> 4 <PdfClass> 4 <Transition> 4 0.75 <Transition> 5 0.25 </State>
<State> 5 </State>
</TopologyEntry>
</Topology>
<Triples> 11
...
...
...
As you can see, it is exactly the content of the topo file shown earlier.