Cross-word Acoustic Model (AM) Training

Source: Internet. Editor: 程序博客网. Time: 2024/05/20 07:33

General Framework for Acoustic Modeling

Building an ASR system incrementally:

Context-independent ➔ Context-dependent modeling
Mono-phone ➔ Tri-phone HMM
Single Gaussian per state ➔ Gaussian mixture (multiple components) per state
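
The step from mono-phone to tri-phone units can be illustrated with a small sketch. It assumes HTK-style l-c+r label naming and a hypothetical helper named to_triphones; neither appears in the original notes, and silence or short-pause handling is ignored. With cross_word=True the left and right contexts extend across word boundaries, which is what separates cross-word triphones from word-internal ones.

```python
from typing import List


def to_triphones(words: List[List[str]], cross_word: bool = True) -> List[str]:
    """Expand per-word monophone lists into l-c+r style triphone labels.

    Hypothetical helper for illustration only (not part of HTK).
    cross_word=True  : contexts span word boundaries.
    cross_word=False : contexts stop at word boundaries (word-internal).
    """
    if cross_word:
        # Treat the whole utterance as one continuous phone string.
        units = [[p for w in words for p in w]]
    else:
        # Expand each word on its own.
        units = words

    labels: List[str] = []
    for phones in units:
        for i, p in enumerate(phones):
            left = phones[i - 1] + "-" if i > 0 else ""
            right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
            labels.append(left + p + right)
    return labels


# "sit down" with one pronunciation per word (toy phone set).
utt = [["s", "ih", "t"], ["d", "aw", "n"]]
print(to_triphones(utt, cross_word=True))
# ['s+ih', 's-ih+t', 'ih-t+d', 't-d+aw', 'd-aw+n', 'aw-n']
print(to_triphones(utt, cross_word=False))
# ['s+ih', 's-ih+t', 'ih-t', 'd+aw', 'd-aw+n', 'aw-n']
```

The only difference between the two outputs is at the word boundary: ih-t+d and t-d+aw exist only in the cross-word expansion.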

Context-independent Modeling

Flowchart for Cross-word Modeling:

Forced Alignment:

Input:
Word-level transcription
Lexicon/dictionary
Multiple pronunciations per word
Example: "Z" (z eh d vs. z iy)
HMMs

Output:
Phoneme-level transcription of the actual pronunciation, with time boundaries
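
The input/output relationship above can be sketched with a toy aligner. This is a simplified illustration, not HVite's implementation: it assumes one state per phone, no transition probabilities, and made-up per-frame log-likelihoods. The names align and pick_pronunciation are hypothetical. Each pronunciation variant is Viterbi-aligned to the frames, and the variant with the highest score is kept together with its frame boundaries.

```python
from typing import Dict, List, Tuple

NEG_INF = float("-inf")
Segment = Tuple[str, int, int]  # (phone, start_frame, end_frame_exclusive)


def align(phones: List[str],
          frame_scores: List[Dict[str, float]]) -> Tuple[float, List[Segment]]:
    """Viterbi-align a phone sequence to frames; each phone covers >= 1 frame."""
    T, N = len(frame_scores), len(phones)
    if N == 0 or T < N:
        return NEG_INF, []
    # best[t][j]: best log score of any path that puts frame t in phone j.
    best = [[NEG_INF] * N for _ in range(T)]
    # stay[t][j]: True if frame t-1 was also in phone j on the best path.
    stay = [[True] * N for _ in range(T)]
    best[0][0] = frame_scores[0].get(phones[0], NEG_INF)
    for t in range(1, T):
        for j in range(N):
            s_stay = best[t - 1][j]
            s_move = best[t - 1][j - 1] if j > 0 else NEG_INF
            prev = max(s_stay, s_move)
            if prev == NEG_INF:
                continue
            best[t][j] = prev + frame_scores[t].get(phones[j], NEG_INF)
            stay[t][j] = s_stay >= s_move
    if best[T - 1][N - 1] == NEG_INF:
        return NEG_INF, []
    # Backtrace from the last frame of the last phone to recover boundaries.
    segs: List[Segment] = []
    j, end = N - 1, T
    for t in range(T - 1, 0, -1):
        if not stay[t][j]:
            segs.append((phones[j], t, end))
            end = t
            j -= 1
    segs.append((phones[j], 0, end))
    segs.reverse()
    return best[T - 1][N - 1], segs


def pick_pronunciation(variants: List[List[str]],
                       frame_scores: List[Dict[str, float]]) -> Tuple[float, List[Segment]]:
    """Return (score, segments) of the best-matching pronunciation variant."""
    return max((align(v, frame_scores) for v in variants), key=lambda r: r[0])


# Toy log-likelihoods for 4 frames of the letter "Z" (invented numbers).
frames = [
    {"z": -1.0, "eh": -5.0, "d": -5.0, "iy": -4.0},
    {"z": -1.5, "eh": -5.0, "d": -5.0, "iy": -3.0},
    {"z": -5.0, "eh": -4.0, "d": -5.0, "iy": -1.0},
    {"z": -5.0, "eh": -5.0, "d": -3.0, "iy": -1.0},
]
lexicon_z = [["z", "eh", "d"], ["z", "iy"]]
print(pick_pronunciation(lexicon_z, frames))
# (-4.5, [('z', 0, 2), ('iy', 2, 4)])
```

With these toy scores the variant "z iy" wins over "z eh d", and the returned segments play the role of the phoneme-level transcription with time boundaries.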

Purpose: to deal with the issue of imprecise transcription

Initially, the HMMs are trained on the basis of one fixed pronunciation per word.

To determine the actual pronunciations in the utterances used to train the HMM system, HVite is used in forced alignment mode to select the best-matching pronunciation for each word.
The new phone-level transcriptions can then be used to retrain the HMMs.
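
As a rough sketch of how such a realignment pass might be launched, the helper below only assembles an HVite command line; it does not run it. The option set is modeled on the forced-alignment step of the HTK Book tutorial, minus the tutorial's output-format option (-o SWT), so the defaults apply. All file names (hmm7, words.mlf, aligned.mlf, train.scp, dict, monophones1) are placeholder assumptions, and hvite_align_cmd is a hypothetical name. Check the options against the HVite reference for your HTK version before use.

```python
import shlex
from typing import List


def hvite_align_cmd(hmm_dir: str, words_mlf: str, out_mlf: str, scp: str,
                    dictionary: str, phone_list: str,
                    config: str = "config") -> List[str]:
    """Build (but do not run) an HVite forced-alignment command line.

    Hypothetical helper; all paths are placeholders chosen for illustration.
    """
    return [
        "HVite",
        "-a",                        # alignment mode: align to given word transcriptions
        "-m",                        # keep model (phone) level output
        "-b", "silence",             # utterance boundary word (must exist in the dictionary)
        "-C", config,                # configuration file
        "-H", f"{hmm_dir}/macros",   # current HMM macros
        "-H", f"{hmm_dir}/hmmdefs",  # current HMM definitions
        "-I", words_mlf,             # input: word-level master label file
        "-i", out_mlf,               # output: aligned master label file
        "-l", "*",                   # path pattern for labels in the output MLF
        "-y", "lab",                 # output label file extension
        "-t", "250.0",               # pruning threshold (value from the tutorial)
        "-S", scp,                   # script file listing the training utterances
        dictionary,                  # lexicon with multiple pronunciations
        phone_list,                  # list of HMM names
    ]


cmd = hvite_align_cmd("hmm7", "words.mlf", "aligned.mlf",
                      "train.scp", "dict", "monophones1")
print(shlex.join(cmd))
```

The resulting aligned.mlf would then serve as the phone-level transcription for the next re-estimation pass.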

Transcription snippets: