Kaldi triphone training explained
Source: Internet · Editor: 程序博客网 · Time: 2024/06/05 20:18
Triphone
Train context-dependent triphone models, taking the monophone models as input.
# triphone
steps/train_deltas.sh --boost-silence 1.25 --cmd "$train_cmd" 2000 10000 data/mfcc/train data/lang exp/mono_ali exp/tri1 || exit 1;
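To make "context-dependent triphone" concrete: each phone is modeled together with its left and right neighbors. Below is a minimal Python sketch (illustrative only, not Kaldi code; the `sil` boundary phone is an assumption) that expands a monophone sequence into triphone units:

```python
# Illustration of triphone context expansion (not Kaldi's implementation).
# Kaldi's default context is width 3 with central position 1, i.e. one
# phone of left context and one of right context around each center phone.
def to_triphones(phones, boundary="sil"):
    padded = [boundary] + list(phones) + [boundary]
    return [(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]

print(to_triphones(["k", "ae", "t"]))
# [('sil', 'k', 'ae'), ('k', 'ae', 't'), ('ae', 't', 'sil')]
```

Because the number of possible triphones is huge, Kaldi ties their states with a phonetic decision tree; the `2000` and `10000` arguments above are the target number of tree leaves and total Gaussians.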
The relevant configuration in train_deltas.sh is as follows:
stage=-4 # This allows restarting after partway, when something went wrong.
config=
cmd=run.pl
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
realign_iters="10 20 30";
num_iters=35    # Number of iterations of training
max_iter_inc=25 # Last iter to increase #Gauss on.
beam=10
careful=false
retry_beam=40
boost_silence=1.0 # Factor by which to boost silence likelihoods in alignment
power=0.25 # Exponent for number of gaussians according to occurrence counts
cluster_thresh=-1 # for build-tree control final bottom-up clustering of leaves
norm_vars=false # deprecated.  Prefer --cmvn-opts "--norm-vars=true"
                # use the option --cmvn-opts "--norm-means=false"
cmvn_opts=
delta_opts=
context_opts=   # use "--context-width=5 --central-position=2" for quinphone
Its usage is:
Usage: steps/train_deltas.sh <num-leaves> <tot-gauss> <data-dir> <lang-dir> <alignment-dir> <exp-dir>
e.g.: steps/train_deltas.sh 2000 10000 data/train_si84_half data/lang exp/mono_ali exp/tri1
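The "deltas" in train_deltas.sh are the first and second time derivatives appended to the MFCCs (what Kaldi's add-deltas binary computes). A minimal NumPy sketch of the standard regression formula, assuming the default window of 2 (illustrative, not Kaldi's code):

```python
import numpy as np

# Sketch of delta feature computation over a (frames x dims) matrix,
# using the standard regression formula with window 2. Edge frames are
# handled by clamping indices, similar in spirit to edge replication.
def add_deltas(feats, window=2):
    T, _ = feats.shape
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    delta = np.zeros_like(feats)
    for n in range(1, window + 1):
        fwd = feats[np.minimum(np.arange(T) + n, T - 1)]
        bwd = feats[np.maximum(np.arange(T) - n, 0)]
        delta += n * (fwd - bwd) / denom
    return delta

feats = np.random.randn(100, 13)      # 100 frames of 13-dim MFCCs
d1 = add_deltas(feats)                # deltas
d2 = add_deltas(d1)                   # delta-deltas (delta of deltas)
full = np.hstack([feats, d1, d2])     # 13 + 13 + 13 = 39-dim features
print(full.shape)                     # (100, 39)
```

This is why the GMMs in this stage see 39-dimensional features rather than the raw 13-dimensional MFCCs.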
LDA_MLLT
Transform the features with LDA and MLLT, and train triphone models on top of the LDA+MLLT features.
LDA+MLLT refers to the way we transform the features after computing the MFCCs: we splice across several frames, reduce the dimension (to 40 by default) using Linear Discriminant Analysis (LDA), and then, over multiple iterations, estimate a diagonalizing transform known as MLLT (also called STC, semi-tied covariance).
For details, see http://kaldi-asr.org/doc/transform.html
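The splicing step referred to above (matching the `--left-context=3 --right-context=3` option used below) can be sketched in a few lines of NumPy. This only shows the splicing; the subsequent LDA projection to 40 dimensions is estimated by Kaldi from class-labeled statistics and is not reproduced here:

```python
import numpy as np

# Sketch of frame splicing before LDA (illustrative, not Kaldi's
# splice-feats): each frame is concatenated with its 3 left and 3 right
# neighbors, clamping at the utterance boundaries.
def splice(feats, left=3, right=3):
    T = feats.shape[0]
    idx = np.arange(T)
    cols = [feats[np.clip(idx + off, 0, T - 1)]
            for off in range(-left, right + 1)]
    return np.hstack(cols)

mfcc = np.random.randn(50, 13)
spliced = splice(mfcc)     # 7 frames x 13 dims = 91 dims per frame
print(spliced.shape)       # (50, 91)
```

LDA then maps these 91-dimensional spliced vectors down to 40 dimensions, and MLLT further rotates that space so diagonal-covariance Gaussians fit better.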
# triphone_ali
steps/align_si.sh --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/tri1 exp/tri1_ali || exit 1;
lda_mllt
steps/train_lda_mllt.sh --cmd "$train_cmd" --splice-opts "--left-context=3 --right-context=3" 2500 15000 data/mfcc/train data/lang exp/tri1_ali exp/tri2b || exit 1;
test tri2b model
local/thchs-30_decode.sh --nj $n "steps/decode.sh" exp/tri2b data/mfcc &

train_lda_mllt.sh
The relevant configuration is as follows:
cmd=run.pl
config=
stage=-5
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
realign_iters="10 20 30";
mllt_iters="2 4 6 12";
num_iters=35    # Number of iterations of training
max_iter_inc=25 # Last iter to increase #Gauss on.
dim=40
beam=10
retry_beam=40
careful=false
boost_silence=1.0 # Factor by which to boost silence likelihoods in alignment
power=0.25 # Exponent for number of gaussians according to occurrence counts
randprune=4.0 # This is approximately the ratio by which we will speed up the
              # LDA and MLLT calculations via randomized pruning.
splice_opts=
cluster_thresh=-1 # for build-tree control final bottom-up clustering of leaves
norm_vars=false # deprecated.  Prefer --cmvn-opts "--norm-vars=false"
cmvn_opts=
context_opts=   # use "--context-width=5 --central-position=2" for quinphone.
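The `power` option above (0.25) controls how the total Gaussian budget is divided among tree leaves: each state's target is proportional to its occupancy raised to that exponent, so a small exponent flattens the allocation relative to raw counts. A rough Python sketch of that rule, under the assumption that it approximates what Kaldi's gmm-mixup does (Kaldi's exact rounding logic differs):

```python
# Sketch of allocating a total Gaussian budget over tree leaves in
# proportion to occupancy^power. With power=0.25, a leaf with 1000x the
# data of another gets only ~5.6x the Gaussians, not 1000x.
def gauss_targets(occupancies, tot_gauss, power=0.25):
    weights = [occ ** power for occ in occupancies]
    total = sum(weights)
    return [max(1, round(tot_gauss * w / total)) for w in weights]

occs = [10000.0, 1000.0, 100.0, 10.0]   # per-leaf occupation counts
print(gauss_targets(occs, 100))
```

This keeps rarely-seen leaves from being starved of Gaussians while still giving frequent leaves richer models.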
SAT
Speaker adapted training using feature-space maximum likelihood linear regression (fMLLR).
This does Speaker Adapted Training (SAT), i.e. train on fMLLR-adapted features. It can be done on top of either LDA+MLLT, or delta and delta-delta features. If there are no transforms supplied in the alignment directory, it will estimate transforms itself before building the tree (and in any case, it estimates transforms a number of times during training).
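An fMLLR transform is a per-speaker affine map of the features, x' = A x + b, usually stored as a single matrix W = [A b] applied to the extended vector [x; 1]. Estimating W by maximizing likelihood under the model is what train_sat.sh does; the sketch below (illustrative only, with an identity A for the demo) just shows how such a transform is applied:

```python
import numpy as np

# Sketch of applying a per-speaker fMLLR transform W = [A b] to features.
# Here A is the identity and b is random, purely for demonstration; in
# Kaldi, W is estimated from the speaker's data and the current model.
dim = 40
rng = np.random.default_rng(0)
W = np.hstack([np.eye(dim), rng.normal(size=(dim, 1))])  # (40, 41)
feats = rng.normal(size=(200, dim))                      # one speaker's frames
ext = np.hstack([feats, np.ones((200, 1))])              # append 1 per frame
adapted = ext @ W.T                                      # x' = A x + b
print(adapted.shape)                                     # (200, 40)
```

Because the adaptation lives entirely in feature space, the same GMM-HMM models are used for every speaker; only the features are moved.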
lda_mllt_ali
steps/align_si.sh --nj
sat
steps/train_sat.sh --cmd "$train_cmd" 2500 15000 data/mfcc/train data/lang exp/tri2b_ali exp/tri3b || exit 1;
test tri3b model
local/thchs-30_decode.sh --nj $n "steps/decode_fmllr.sh" exp/tri3b data/mfcc &

train_sat.sh
The detailed configuration is as follows:
stage=-5
exit_stage=-100 # you can use this to require it to exit at the
                # beginning of a specific stage.  Not all values are
                # supported.
fmllr_update_type=full
cmd=run.pl
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
beam=10
retry_beam=40
careful=false
boost_silence=1.0 # Factor by which to boost silence likelihoods in alignment
context_opts=   # e.g. set this to "--context-width 5 --central-position 2" for quinphone.
realign_iters="10 20 30";
fmllr_iters="2 4 6 12";
silence_weight=0.0 # Weight on silence in fMLLR estimation.
num_iters=35    # Number of iterations of training
max_iter_inc=25 # Last iter to increase #Gauss on.
power=0.2 # Exponent for number of gaussians according to occurrence counts
cluster_thresh=-1 # for build-tree control final bottom-up clustering of leaves
phone_map=
train_tree=true
tree_stats_opts=
cluster_phones_opts=
compile_questions_opts=
decode_fmllr.sh: decoding with the speaker-adapted model.
Decoding script that does fMLLR. This can be on top of delta+delta-delta, or LDA+MLLT features.
steps/train_quick.sh --cmd "$train_cmd" 4200 40000 data/mfcc/train data/lang exp/tri3b_ali exp/tri4b || exit 1;
test tri4b model
local/thchs-30_decode.sh --nj $n "steps/decode_fmllr.sh" exp/tri4b data/mfcc &

train_quick.sh
Configuration:
Begin configuration section.
cmd=run.pl
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
realign_iters="10 15"; # Only realign twice.
num_iters=20    # Number of iterations of training
maxiterinc=15   # Last iter to increase #Gauss on.
batch_size=750  # batch size to use while compiling graphs... memory/speed tradeoff.
beam=10         # alignment beam.
retry_beam=40
stage=-5
cluster_thresh=-1 # for build-tree control final bottom-up clustering of leaves
End configuration section.