Kaldi triphone training in detail

Source: Internet · Editor: 程序博客网 · Time: 2024/06/05 20:18

Triphone
Train a context-dependent triphone model, taking the monophone model as input.

#triphone
steps/train_deltas.sh --boost-silence 1.25 --cmd "$train_cmd" 2000 10000 data/mfcc/train data/lang exp/mono_ali exp/tri1 || exit 1;

The relevant configuration in train_deltas.sh is as follows:

stage=-4 #  This allows restarting partway through, when something went wrong.
config=
cmd=run.pl
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
realign_iters="10 20 30";
num_iters=35    # Number of iterations of training
max_iter_inc=25 # Last iter to increase #Gauss on.
beam=10
careful=false
retry_beam=40
boost_silence=1.0 # Factor by which to boost silence likelihoods in alignment
power=0.25 # Exponent for number of gaussians according to occurrence counts
cluster_thresh=-1  # for build-tree; controls final bottom-up clustering of leaves
norm_vars=false # deprecated.  Prefer --cmvn-opts "--norm-vars=true"
                # use the option --cmvn-opts "--norm-means=false"
cmvn_opts=
delta_opts=
context_opts=   # use "--context-width=5 --central-position=2" for quinphone
Usage:
steps/train_deltas.sh <num-leaves> <tot-gauss> <data-dir> <lang-dir> <alignment-dir> <exp-dir>
e.g.: steps/train_deltas.sh 2000 10000 data/train_si84_half data/lang exp/mono_ali exp/tri1
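The two positional numbers (2000 leaves, 10000 Gaussians in the call above) drive a simple growth schedule: the total Gaussian count ramps up in equal increments until iteration max_iter_inc. The following plain-shell sketch mirrors that arithmetic with the defaults from the config above (an illustration of the schedule, not the actual script):

```shell
# Sketch of the Gaussian-growth schedule used during training.
numleaves=2000    # <num-leaves> from the command line
totgauss=10000    # <tot-gauss> target
max_iter_inc=25   # last iteration on which #Gauss is increased
num_iters=35

numgauss=$numleaves
incgauss=$(( (totgauss - numgauss) / max_iter_inc ))  # 320 Gaussians added per iteration

x=1
while [ $x -le $num_iters ]; do
  if [ $x -le $max_iter_inc ]; then
    numgauss=$(( numgauss + incgauss ))
  fi
  x=$(( x + 1 ))
done
echo $numgauss   # reaches the 10000 target at iteration 25, then stays flat
```

Raising power (default 0.25) skews how those Gaussians are distributed across states according to occupancy counts; the total still follows this ramp.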

LDA_MLLT
Transform the features with LDA and MLLT, and train a triphone model on the LDA+MLLT features.
LDA+MLLT refers to the way we transform the features after computing the MFCCs: we splice across several frames, reduce the dimension (to 40 by default) using Linear Discriminant Analysis (LDA), and then later estimate, over multiple iterations, a diagonalizing transform known as MLLT (also called global STC, semi-tied covariance).
详情可参考 http://kaldi-asr.org/doc/transform.html
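To make the splicing arithmetic concrete (a sketch assuming the usual 13-dimensional MFCCs; your feature dimension may differ): with --left-context=3 --right-context=3 the spliced vector covers 7 frames, and LDA then projects it down to dim=40.

```shell
# Feature dimensions through the splice + LDA front end (illustrative arithmetic).
mfcc_dim=13        # raw MFCC dimension (typical default; an assumption here)
left_context=3     # from --splice-opts in the train_lda_mllt.sh call below
right_context=3
lda_dim=40         # "dim" in train_lda_mllt.sh

frames=$(( left_context + 1 + right_context ))   # 7 frames spliced together
spliced_dim=$(( mfcc_dim * frames ))             # 91-dimensional input to LDA
echo "$spliced_dim -> $lda_dim"
```

MLLT is then estimated in the 40-dimensional projected space, so it is a 40x40 transform composed with the LDA matrix.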

#triphone_ali

steps/align_si.sh --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/tri1 exp/tri1_ali || exit 1;

lda_mllt

steps/train_lda_mllt.sh --cmd "$train_cmd" --splice-opts "--left-context=3 --right-context=3" 2500 15000 data/mfcc/train data/lang exp/tri1_ali exp/tri2b || exit 1;

test tri2b model

local/thchs-30_decode.sh --nj $n "steps/decode.sh" exp/tri2b data/mfcc &

train_lda_mllt.sh

The relevant configuration is as follows:

cmd=run.pl
config=
stage=-5
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
realign_iters="10 20 30";
mllt_iters="2 4 6 12";
num_iters=35    # Number of iterations of training
max_iter_inc=25  # Last iter to increase #Gauss on.
dim=40
beam=10
retry_beam=40
careful=false
boost_silence=1.0 # Factor by which to boost silence likelihoods in alignment
power=0.25 # Exponent for number of gaussians according to occurrence counts
randprune=4.0 # This is approximately the ratio by which we will speed up the
              # LDA and MLLT calculations via randomized pruning.
splice_opts=
cluster_thresh=-1  # for build-tree; controls final bottom-up clustering of leaves
norm_vars=false # deprecated.  Prefer --cmvn-opts "--norm-vars=false"
cmvn_opts=
context_opts=   # use "--context-width=5 --central-position=2" for quinphone.

Sat
Perform speaker adaptive training using feature-space maximum likelihood linear regression (fMLLR).
This does Speaker Adapted Training (SAT), i.e. train on fMLLR-adapted features. It can be done on top of either LDA+MLLT, or delta and delta-delta features. If there are no transforms supplied in the alignment directory, it will estimate transforms itself before building the tree (and in any case, it estimates transforms a number of times during training).
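Concretely, the per-speaker fMLLR transform is an affine map x' = A*x + b applied to every feature vector of that speaker, with A and b estimated by maximum likelihood. A toy 2-D illustration with a made-up transform (in a real system this is a 40x41 matrix applied by Kaldi's transform-feats; the numbers here are purely hypothetical):

```shell
# Apply a hypothetical per-speaker affine transform x' = A*x + b to one
# 2-D feature vector, using awk for the arithmetic.
# A = [[0.9, 0.1], [0.0, 1.1]], b = [0.5, -0.2]  (made-up values)
out=$(echo "1.0 2.0" | awk '{
  x1 = 0.9*$1 + 0.1*$2 + 0.5;
  x2 = 0.0*$1 + 1.1*$2 - 0.2;
  printf "%.2f %.2f\n", x1, x2;
}')
echo "$out"   # -> 1.60 2.00
```

During SAT, the Gaussian means are trained in this adapted feature space, which is why decoding a SAT model also requires estimating a transform per test speaker (see decode_fmllr.sh below).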

lda_mllt_ali

steps/align_si.sh --nj $n --cmd "$train_cmd" --use-graphs true data/mfcc/train data/lang exp/tri2b exp/tri2b_ali || exit 1;

sat

steps/train_sat.sh --cmd "$train_cmd" 2500 15000 data/mfcc/train data/lang exp/tri2b_ali exp/tri3b || exit 1;

test tri3b model

local/thchs-30_decode.sh --nj $n "steps/decode_fmllr.sh" exp/tri3b data/mfcc &

train_sat.sh
The specific configuration is as follows:

stage=-5
exit_stage=-100 # you can use this to require it to exit at the
                # beginning of a specific stage.  Not all values are
                # supported.
fmllr_update_type=full
cmd=run.pl
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
beam=10
retry_beam=40
careful=false
boost_silence=1.0 # Factor by which to boost silence likelihoods in alignment
context_opts=  # e.g. set this to "--context-width 5 --central-position 2" for quinphone.
realign_iters="10 20 30";
fmllr_iters="2 4 6 12";
silence_weight=0.0 # Weight on silence in fMLLR estimation.
num_iters=35   # Number of iterations of training
max_iter_inc=25 # Last iter to increase #Gauss on.
power=0.2 # Exponent for number of gaussians according to occurrence counts
cluster_thresh=-1  # for build-tree; controls final bottom-up clustering of leaves
phone_map=
train_tree=true
tree_stats_opts=
cluster_phones_opts=
compile_questions_opts=

decode_fmllr.sh: decodes with the speaker-adapted model.
Decoding script that does fMLLR. This can be on top of delta+delta-delta, or LDA+MLLT features.
train_quick

steps/train_quick.sh --cmd "$train_cmd" 4200 40000 data/mfcc/train data/lang exp/tri3b_ali exp/tri4b || exit 1;

test tri4b model

local/thchs-30_decode.sh --nj $n "steps/decode_fmllr.sh" exp/tri4b data/mfcc &

train_quick.sh
Configuration:

Begin configuration section.

cmd=run.pl
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
realign_iters="10 15"; # Only realign twice.
num_iters=20    # Number of iterations of training
maxiterinc=15 # Last iter to increase #Gauss on.
batch_size=750 # batch size to use while compiling graphs... memory/speed tradeoff.
beam=10 # alignment beam.
retry_beam=40
stage=-5
cluster_thresh=-1  # for build-tree; controls final bottom-up clustering of leaves

End configuration section.
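Part of what makes train_quick.sh "quick" is visible right in these defaults: with realign_iters="10 15" and num_iters=20 it realigns only twice over 20 iterations, versus three realignment passes over 35 iterations in train_deltas.sh and train_sat.sh above. A trivial sketch counting those passes:

```shell
# Count how many training iterations trigger a realignment pass,
# using the train_quick.sh defaults shown above.
realign_iters="10 15"
num_iters=20

realigns=0
x=1
while [ $x -le $num_iters ]; do
  for r in $realign_iters; do
    [ "$x" -eq "$r" ] && realigns=$(( realigns + 1 ))
  done
  x=$(( x + 1 ))
done
echo $realigns   # 2 realignment passes in total
```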
