Crossword Acoustic Model (AM) Training

Source: Internet  Editor: 程序博客网  Time: 2024/05/20 07:33

General Framework for Acoustic Modeling


Building an ASR system incrementally:

Context-independent ➔ Context-dependent modeling
Mono-phone ➔ Tri-phone HMM
Single Gaussian per state ➔ Multi-component Gaussian mixture per state
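
To make the mono-phone to tri-phone step concrete, here is a minimal sketch that expands a phone sequence into context-dependent labels. It assumes the HTK-style `l-p+r` naming for triphones; the function name `to_triphones` and the example words are hypothetical. With `cross_word=True` the context spans word boundaries, which is the distinction the term "crossword" refers to.

```python
def to_triphones(word_phones, cross_word=True):
    """Expand per-word phone lists into l-p+r style triphone labels.

    word_phones: list of phone lists, one list per word.
    cross_word:  if True, left/right context crosses word boundaries;
                 if False, context stops at each word boundary.
    """
    if cross_word:
        # Treat the whole utterance as a single context span.
        spans = [[p for word in word_phones for p in word]]
    else:
        # Each word is its own context span (word-internal triphones).
        spans = word_phones

    labels = []
    for span in spans:
        for i, phone in enumerate(span):
            left = span[i - 1] + "-" if i > 0 else ""
            right = "+" + span[i + 1] if i < len(span) - 1 else ""
            labels.append(left + phone + right)
    return labels


# Illustrative two-word utterance (phones are made up for the example).
utterance = [["b", "ih", "t"], ["z", "iy"]]
cross = to_triphones(utterance, cross_word=True)
internal = to_triphones(utterance, cross_word=False)
print(cross)
print(internal)
```

In the cross-word case the last phone of the first word and the first phone of the second word see each other as context; in the word-internal case they do not. Real systems usually treat silence models specially, which this sketch leaves out.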

Context-independent Modeling



Flowchart for Crossword Modeling:


Forced Alignment:



Input:
Word-level transcription
Lexicon/dictionary
Multiple pronunciations, e.g. Z (z eh d vs. z iy)
HMMs
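
The inputs listed above define the set of phone sequences that forced alignment may choose from. As a rough sketch, assuming the lexicon is held as a plain Python dictionary (the `LEXICON` contents other than the Z entry, and the function name `pronunciation_variants`, are hypothetical), the candidates for one word-level transcription can be enumerated like this:

```python
from itertools import product

# Toy lexicon: each word maps to one or more pronunciations.
# The Z entry follows the example in the text; THE is illustrative only.
LEXICON = {
    "THE": [["dh", "ah"], ["dh", "iy"]],
    "Z": [["z", "eh", "d"], ["z", "iy"]],
}


def pronunciation_variants(words, lexicon):
    """Return every phone sequence the lexicon allows for a word sequence."""
    options = [lexicon[w] for w in words]
    variants = []
    for combo in product(*options):
        # Concatenate the chosen pronunciation of each word.
        variants.append([phone for pron in combo for phone in pron])
    return variants


variants = pronunciation_variants(["THE", "Z"], LEXICON)
for v in variants:
    print(" ".join(v))
```

A real decoder does not enumerate the variants explicitly; it searches a network that encodes them. The enumeration only illustrates the size of the choice.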

Output:
Phoneme-level transcription of the actual pronunciation, with time boundaries
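
As an illustration of what such an output looks like in practice, the sketch below parses time-aligned label lines. It assumes the HTK label layout of `start end label` with times in units of 100 ns; the sample lines and the function name `parse_label_lines` are made up for the example and are not taken from the original alignment.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # HTK label times are in 100 ns units


def parse_label_lines(lines):
    """Parse 'start end label' lines into (label, start_s, end_s) tuples."""
    segments = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip headers, separators, or blank lines
        start, end, label = int(parts[0]), int(parts[1]), parts[2]
        segments.append(
            (label, start / HTK_UNITS_PER_SECOND, end / HTK_UNITS_PER_SECOND)
        )
    return segments


# Hypothetical aligned output for the word Z pronounced "z iy".
sample = [
    "0 2500000 sil",
    "2500000 3300000 z",
    "3300000 4600000 iy",
]
segments = parse_label_lines(sample)
for label, start_s, end_s in segments:
    print(f"{label:>4} {start_s:.2f}s - {end_s:.2f}s")
```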


Forced alignment is used to deal with the issue of imprecise transcriptions:

Initially, the HMMs are trained on the basis of one fixed pronunciation per word.

To determine the pronunciations actually used in the training utterances,
HVite is run in forced alignment mode to select the best-matching pronunciation of each word.
The new phone-level transcriptions can then be used to retrain the HMMs.
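
The selection step can be pictured with a toy Viterbi alignment. This is only a sketch under strong simplifying assumptions, not how HVite is implemented: each phone is a single state with no transition probabilities, and the per-frame log-likelihoods in `frame_scores` are invented numbers. The names `align` and `pick_pronunciation` are hypothetical.

```python
NEG_INF = float("-inf")


def align(phones, frame_scores):
    """Best monotonic alignment of a phone sequence to the frames.

    frame_scores: one dict per frame, mapping phone -> log-likelihood.
    Returns (score, [(phone, start_frame, end_frame_exclusive), ...]).
    """
    T, N = len(frame_scores), len(phones)
    best = [[NEG_INF] * N for _ in range(T)]
    advanced = [[False] * N for _ in range(T)]  # True if phone j starts at t
    best[0][0] = frame_scores[0][phones[0]]
    for t in range(1, T):
        for j in range(N):
            stay = best[t - 1][j]
            move = best[t - 1][j - 1] if j > 0 else NEG_INF
            prev = stay
            if move > stay:
                prev, advanced[t][j] = move, True
            if prev > NEG_INF:
                best[t][j] = prev + frame_scores[t][phones[j]]
    if best[T - 1][N - 1] == NEG_INF:
        return NEG_INF, []  # more phones than frames
    # Trace back from the last frame to recover the phone boundaries.
    segments, j, end = [], N - 1, T
    for t in range(T - 1, 0, -1):
        if advanced[t][j]:
            segments.append((phones[j], t, end))
            end, j = t, j - 1
    segments.append((phones[j], 0, end))
    segments.reverse()
    return best[T - 1][N - 1], segments


def pick_pronunciation(variants, frame_scores):
    """Choose the variant whose best alignment has the highest score."""
    scored = [(align(v, frame_scores), v) for v in variants]
    (score, segments), phones = max(scored, key=lambda item: item[0][0])
    return phones, segments, score


# Invented log-likelihoods for five frames of an utterance of "Z".
frame_scores = [
    {"z": -1, "eh": -5, "d": -5, "iy": -4},
    {"z": -1, "eh": -4, "d": -5, "iy": -3},
    {"z": -4, "eh": -3, "d": -5, "iy": -1},
    {"z": -5, "eh": -4, "d": -4, "iy": -1},
    {"z": -5, "eh": -5, "d": -3, "iy": -1},
]
variants = [["z", "eh", "d"], ["z", "iy"]]
phones, segments, score = pick_pronunciation(variants, frame_scores)
print(phones, segments, score)
```

With these made-up scores the "z iy" variant wins, and the returned segments give its frame boundaries. Transcriptions rebuilt from the winning variants are what the retraining step would consume.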


Transcription snippets:


Workflow of Crossword Acoustic Modeling in Autotrain

Input of Crossword Training:



Stage 1 - Generate phone-based transcriptions:



Stage 2 - Generate monophone HMMs:


Stage 3 - Generate triphone HMMs and transcriptions:




Stage 4 - Build fully-trained triphone HMMs:


Stage 5 - TrainingPriors:



Stage 6 - Gender-specific HMMs:



Output of Crossword Training






