Building a Large-Vocabulary Continuous Speech Recognition System with HTK (Part 5)
Source: Internet. Editor: 程序博客网. Date: 2024/04/30 01:11
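Gaussian mixture models + language model
I had a lot going on today, so I only spent a little time reading the HTKBook sections on Gaussian mixture models and data-driven clustering, and then decoded with HVite. Decoding takes a while: I went out for a meal and played a few games of pool, and it had just finished when I got back.
1. The initial proto HMM: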
~o <VecSize> 39 <MFCC_0_D_A>
~h "proto1"
<BeginHMM>
<VecSize> 39 <MFCC_0_D_A>
<NumStates> 5
<State> 2 <NumMixes> 5
<Mixture> 1 0.2
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<Mixture> 2 0.2
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<Mixture> 3 0.2
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<Mixture> 4 0.2
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<Mixture> 5 0.2
<Mean> 39
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 39
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
......
<TransP> 5
0.0 1.0 0.0 0.0 0.0
0.0 0.6 0.4 0.0 0.0
0.0 0.0 0.6 0.4 0.0
0.0 0.0 0.0 0.7 0.3
0.0 0.0 0.0 0.0 0.0
<EndHMM>
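Writing a prototype like this by hand is tedious and error-prone, so it may help to generate it. The following is a minimal Python sketch, not part of the original workflow: the function name make_proto is hypothetical, it assumes the elided states follow the same layout as state 2, and it uses a generic 0.6/0.4 left-to-right transition pattern for every emitting state (the listing above uses 0.7/0.3 for state 4).

```python
def make_proto(name="proto1", vec_size=39, kind="MFCC_0_D_A",
               num_states=5, num_mixes=5):
    """Build HTK prototype text with equal-weight, zero-mean, unit-variance mixtures."""
    weight = 1.0 / num_mixes  # equal weights; 0.2 for 5 mixtures
    zeros = " ".join(["0.0"] * vec_size)
    ones = " ".join(["1.0"] * vec_size)

    lines = [
        f"~o <VecSize> {vec_size} <{kind}>",
        f'~h "{name}"',
        "<BeginHMM>",
        f"<VecSize> {vec_size} <{kind}>",
        f"<NumStates> {num_states}",
    ]

    # Emitting states are 2 .. num_states-1; the first and last are non-emitting.
    # Assumption: every emitting state uses the same mixture layout.
    for state in range(2, num_states):
        lines.append(f"<State> {state} <NumMixes> {num_mixes}")
        for mix in range(1, num_mixes + 1):
            lines.append(f"<Mixture> {mix} {weight:g}")
            lines.append(f"<Mean> {vec_size}")
            lines.append(zeros)
            lines.append(f"<Variance> {vec_size}")
            lines.append(ones)

    # Assumed left-to-right topology: self-loop 0.6, forward 0.4.
    lines.append(f"<TransP> {num_states}")
    for row in range(num_states):
        probs = [0.0] * num_states
        if row == 0:
            probs[1] = 1.0            # entry state always moves to state 2
        elif row < num_states - 1:
            probs[row] = 0.6
            probs[row + 1] = 0.4
        # The last row stays all zeros (exit state).
        lines.append(" ".join(f"{p:.1f}" for p in probs))

    lines.append("<EndHMM>")
    return "\n".join(lines) + "\n"
```

Calling print(make_proto()) writes the prototype to standard output, which can then be redirected to a file. For mixture counts that do not divide 1.0 evenly, the printed weights are rounded and may need adjusting by hand so that they sum to 1.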
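2. Tying the triphones
Because the models are now Gaussian mixtures, decision-tree clustering can no longer be used at this point; data-driven clustering should be used instead. The command used to generate tree.hed is:
perl scripts/mkclscript.prl TC 100.0 lists/monophones1 > tree.hed
See HTKBook pp. 170-171 for details.
In the generated tree.hed,
add at the top: RO 100 "stats"
add at the end: CO lists/tiedlist
The file then looks like this: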
RO 100 "stats"
TC 100.0 "ST_ax_2_" {("ax","*-ax+*","ax+*","*-ax").state[2]}
TC 100.0 "ST_sp_2_" {("sp","*-sp+*","sp+*","*-sp").state[2]}
TC 100.0 "ST_b_2_" {("b","*-b+*","b+*","*-b").state[2]}
TC 100.0 "ST_r_2_" {("r","*-r+*","r+*","*-r").state[2]}
…………………
TC 100.0 "ST_em_4_" {("em","*-em+*","em+*","*-em").state[4]}
TC 100.0 "ST_zh_4_" {("zh","*-zh+*","zh+*","*-zh").state[4]}
TC 100.0 "ST_sil_4_" {("sil","*-sil+*","sil+*","*-sil").state[4]}
CO lists/tiedlist
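To make the structure of this file explicit, here is a small Python sketch that produces the same kind of script, including the RO and CO lines, from a list of monophones. It is only an illustration of the format shown above, not a replacement for mkclscript.prl; the function name make_tc_script and its parameters are hypothetical.

```python
def make_tc_script(phones, threshold=100.0, states=(2, 3, 4),
                   stats_file="stats", tied_list="lists/tiedlist"):
    """Return an HHEd edit script that clusters each state of each phone with TC."""
    lines = [f'RO 100 "{stats_file}"']  # load the occupation statistics first
    for state in states:
        for phone in phones:
            # Monophone plus every triphone/biphone context sharing the centre phone.
            items = f'("{phone}","*-{phone}+*","{phone}+*","*-{phone}")'
            lines.append(
                f'TC {threshold} "ST_{phone}_{state}_" {{{items}.state[{state}]}}'
            )
    lines.append(f"CO {tied_list}")     # compact the model set and write the tied list
    return "\n".join(lines) + "\n"
```

For example, make_tc_script(["ax", "sp"]) yields one TC line per phone for each of states 2, 3 and 4, framed by the RO and CO commands.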
Then run:
HHEd -H hmms/hmm12/macros -H hmms/hmm12/hmmdefs -M hmms/hmm13 tree.hed lists/triphones1 > log
After that, two rounds of re-estimation are enough.
3. Decoding and evaluation
Add the following entries at the top of dict6:
!!UNK [] sil
!NULL [] sil
</s> [] sil
<s> [] sil
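If the dictionary is regenerated often, this step can be scripted. The sketch below is one possible way to do it in Python; the helper name prepend_entries is hypothetical, and it only adds entries whose head word is not already in the dictionary text.

```python
EXTRA_ENTRIES = [
    "!!UNK [] sil",
    "!NULL [] sil",
    "</s> [] sil",
    "<s> [] sil",
]


def prepend_entries(dict_text, entries=EXTRA_ENTRIES):
    """Return dict_text with the given entries placed at the top, skipping duplicates."""
    body = dict_text.splitlines()
    # The head word is the first whitespace-separated field of each entry.
    existing = {line.split()[0] for line in body if line.strip()}
    new = [entry for entry in entries if entry.split()[0] not in existing]
    return "\n".join(new + body) + "\n"
```

A typical use would be to read dict/dict6 into a string, pass it through prepend_entries, and write the result back; running it a second time leaves the file unchanged.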
Decode again, this time with the language model:
HVite -T 1 -H hmms1/hmm15/macros -H hmms1/hmm15/hmmdefs -s 10.0 -S test/test.scp -i results/recout_GMM_lm.mlf -w dict/bigram.net -C config/config2 -t 250.0 -n 4 20 -q Atal -z lat dict/dict6 lists1/tiedlist
With 5 Gaussian mixture components per state, decoding takes longer than with single-Gaussian models: roughly 2 hours.
Then evaluate with HResults:
HResults -I test/testwords1.mlf lists1/tiedlist results/recout_GMM_lm.mlf
The results are as follows: [HResults output not preserved]
The sentence-level recognition rate has improved noticeably.
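For reference when reading HResults output: the word-level figures are derived from the number of reference labels N, deletions D, substitutions S and insertions I, with percent correct computed as (N - D - S) / N and accuracy as (N - D - S - I) / N. The Python sketch below just restates that arithmetic; the function name word_scores is hypothetical, and the numbers in the usage note are made up for illustration, not the results of this experiment.

```python
def word_scores(n, d, s, i):
    """Return (%Corr, Acc) from label, deletion, substitution and insertion counts."""
    hits = n - d - s                 # H: correctly recognised labels
    corr = 100.0 * hits / n          # %Corr ignores insertions
    acc = 100.0 * (hits - i) / n     # Acc also penalises insertions
    return corr, acc
```

With the illustrative counts N=100, D=5, S=10, I=3, word_scores(100, 5, 10, 3) gives 85.0 percent correct and 82.0 accuracy. Accuracy is never higher than percent correct, and it can go negative when there are many insertions.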
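4. Summary: increasing the number of Gaussian mixture components improves recognition accuracy, at the cost of longer decoding time. Combined with a trained language model, this gives fairly good results. The next step will focus on language model training.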