Kaldi Chinese speech recognition with thchs30: what the model-training code does and how to read its configuration parameters
Source: Internet. Editor: 程序博客网. Date: 2024/06/05 19:11
Monophone
Training the monophone model
# Flat start and monophone training, with delta-delta features.
# This script applies cepstral mean normalization (per speaker).
#monophone: train the monophone model
steps/train_mono.sh --boost-silence 1.25 --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/mono || exit 1;
#test monophone model
local/thchs-30_decode.sh --mono true --nj $n "steps/decode.sh" exp/mono data/mfcc &
Usage of train_mono.sh:
echo "Usage: steps/train_mono.sh [options] <data-dir> <lang-dir> <exp-dir>"
echo " e.g.: steps/train_mono.sh data/train.1k data/lang exp/mono"
echo "main options (for others, see top of script file)"
The parameter settings are listed below. The script trains a basic monophone HMM for 40 iterations and realigns the training data at the iterations listed in realign_iters.
# Begin configuration section.
nj=4
cmd=run.pl
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
num_iters=40 # Number of iterations of training
max_iter_inc=30 # Last iter to increase #Gauss on.
totgauss=1000 # Target #Gaussians.
careful=false
boost_silence=1.0 # Factor by which to boost silence likelihoods in alignment
realign_iters="1 2 3 4 5 6 7 8 9 10 12 14 16 18 20 23 26 29 32 35 38";
config= # name of config file.
stage=-4
power=0.25 # exponent to determine number of gaussians from occurrence counts
norm_vars=false # deprecated, prefer --cmvn-opts "--norm-vars=false"
cmvn_opts= # can be used to add extra options to cmvn.
# End configuration section.
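To make the realignment schedule concrete, the sketch below walks through the iteration numbers and reports which ones would trigger a realignment pass. It is an illustration only: the helper name is_realign_iter is hypothetical, and the loop bounds (iterations 1 to num_iters - 1) are an assumption about how the script counts, so check train_mono.sh itself for the exact behaviour.

```shell
# Values copied from the configuration section above.
num_iters=40
realign_iters="1 2 3 4 5 6 7 8 9 10 12 14 16 18 20 23 26 29 32 35 38"

# Hypothetical helper: succeeds if iteration $1 appears as a whole word
# in the schedule (so 1 does not match 10 or 12).
is_realign_iter() {
  echo "$realign_iters" | grep -qw "$1"
}

realign_count=0
x=1
while [ "$x" -lt "$num_iters" ]; do   # assumed loop bounds: 1 .. num_iters-1
  if is_realign_iter "$x"; then
    echo "iter $x: realign, then re-estimate"
    realign_count=$((realign_count + 1))
  else
    echo "iter $x: re-estimate with the existing alignments"
  fi
  x=$((x + 1))
done
echo "realignment passes: $realign_count"
```

The schedule is dense at the start, when the flat-start alignments are still poor, and thins out as the model stabilises.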
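thchs-30_decode.sh tests the monophone model. It first calls mkgraph.sh to build the complete recognition network and write it out as a finite-state transducer, then calls decode.sh, which takes the decoding graph (which incorporates the language model) and the test data as input and computes the WER.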
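As a reminder of what the WER figure means, the sketch below computes it from error counts using the standard definition: substitutions plus deletions plus insertions, divided by the number of reference words. The function name wer_percent and the counts are hypothetical, and this is not the scoring code that decode.sh runs.

```shell
# wer_percent SUBS DELS INS REF_WORDS
# Prints 100 * (S + D + I) / N with two decimals.
wer_percent() {
  awk -v s="$1" -v d="$2" -v i="$3" -v n="$4" \
    'BEGIN { printf "%.2f\n", 100 * (s + d + i) / n }'
}

# Hypothetical counts: 12 substitutions, 3 deletions, 5 insertions,
# measured against 200 reference words.
wer_percent 12 3 5 200
```

Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.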
#decode word
utils/mkgraph.sh $opt data/graph/lang $srcdir $srcdir/graph_word || exit 1;
$decoder --cmd "$decode_cmd" --nj $nj $srcdir/graph_word $datadir/test $srcdir/decode_test_word || exit 1
#decode phone
utils/mkgraph.sh $opt data/graph_phone/lang $srcdir $srcdir/graph_phone || exit 1;
$decoder --cmd "$decode_cmd" --nj $nj $srcdir/graph_phone $datadir/test_phone $srcdir/decode_test_phone || exit 1
align_si.sh aligns the given data with the given model. It is normally run before training a new model: the previous model is the input, and the alignments are written to <align-dir>.
#monophone_ali
steps/align_si.sh --boost-silence 1.25 --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/mono exp/mono_ali || exit 1;
# Computes training alignments using a model with delta or
# LDA+MLLT features.
# If you supply the "--use-graphs true" option, it will use the training
# graphs from the source directory (where the model is). In this
# case the number of jobs must match with the source directory.
echo "usage: steps/align_si.sh <data-dir> <lang-dir> <src-dir> <align-dir>"
echo "e.g.: steps/align_si.sh data/train data/lang exp/tri1 exp/tri1_ali"
echo "main options (for others, see top of script file)"
echo " --config <config-file> # config containing options"
echo " --nj <nj> # number of parallel jobs"
echo " --use-graphs true # use graphs in src-dir"
echo " --cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs."
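The note that the number of jobs must match when --use-graphs true is given can be pictured as a simple check. The sketch below assumes that the source directory records its job count in a file named num_jobs, and the function name check_num_jobs is hypothetical. It runs against a temporary directory, not a real experiment directory.

```shell
# Hypothetical stand-in for a model directory such as exp/tri2b.
srcdir=$(mktemp -d)
trap 'rm -rf "$srcdir"' EXIT
echo 4 > "$srcdir/num_jobs"   # assumption: training stored its job count here

# Succeeds only if the requested job count equals the recorded one.
check_num_jobs() {
  requested=$1
  recorded=$(cat "$srcdir/num_jobs")
  if [ "$requested" != "$recorded" ]; then
    echo "mismatch in num-jobs: requested $requested, source dir has $recorded" >&2
    return 1
  fi
}

if check_num_jobs 4; then echo "nj=4: graphs can be reused"; fi
if ! check_num_jobs 8; then echo "nj=8: use --nj 4 or drop --use-graphs"; fi
```

The constraint exists because the training graphs are split into one archive per job, so a different job count would pair utterances with the wrong archive.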
Triphone
Train a context-dependent triphone model, using the alignments from the monophone model (exp/mono_ali) as input.
#triphone
steps/train_deltas.sh --boost-silence 1.25 --cmd "$train_cmd" 2000 10000 data/mfcc/train data/lang exp/mono_ali exp/tri1 || exit 1;
#test tri1 model
local/thchs-30_decode.sh --nj $n "steps/decode.sh" exp/tri1 data/mfcc &
The relevant configuration in train_deltas.sh is as follows:
# Begin configuration.
stage=-4 # This allows restarting after partway, when something went wrong.
config=
cmd=run.pl
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
realign_iters="10 20 30";
num_iters=35 # Number of iterations of training
max_iter_inc=25 # Last iter to increase #Gauss on.
beam=10
careful=false
retry_beam=40
boost_silence=1.0 # Factor by which to boost silence likelihoods in alignment
power=0.25 # Exponent for number of gaussians according to occurrence counts
cluster_thresh=-1 # for build-tree control final bottom-up clustering of leaves
norm_vars=false # deprecated. Prefer --cmvn-opts "--norm-vars=true"
# use the option --cmvn-opts "--norm-means=false"
cmvn_opts=
delta_opts=
context_opts= # use "--context-width=5 --central-position=2" for quinphone
# End configuration.
echo "Usage: steps/train_deltas.sh <num-leaves> <tot-gauss> <data-dir> <lang-dir> <alignment-dir> <exp-dir>"
echo "e.g.: steps/train_deltas.sh 2000 10000 data/train_si84_half data/lang exp/mono_ali exp/tri1"
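The two positional numbers in the tri1 command (2000 leaves, 10000 Gaussians) interact with num_iters and max_iter_inc. The sketch below shows one plausible reading of that schedule: start with one Gaussian per leaf and add a fixed increment on each iteration up to max_iter_inc. The starting point and loop bounds are assumptions; consult train_deltas.sh for the exact logic.

```shell
# Values from the tri1 command and the configuration section above.
numleaves=2000
totgauss=10000
num_iters=35
max_iter_inc=25

numgauss=$numleaves   # assumed starting point: one Gaussian per leaf
incgauss=$(( (totgauss - numgauss) / max_iter_inc ))
echo "increment per iteration: $incgauss"

x=1
while [ "$x" -lt "$num_iters" ]; do   # assumed loop bounds: 1 .. num_iters-1
  if [ "$x" -le "$max_iter_inc" ]; then
    numgauss=$((numgauss + incgauss))
  fi
  x=$((x + 1))
done
echo "Gaussians after training: $numgauss"
```

Under these assumptions the target is reached at iteration 25, and the remaining iterations only re-estimate parameters at a fixed model size.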
LDA_MLLT
Transform the features with LDA and MLLT, and train a triphone model on the transformed features.
LDA+MLLT refers to the way we transform the features after computing the MFCCs: we splice across several frames, reduce the dimension (to 40 by default) using Linear Discriminant Analysis (LDA), and then later estimate, over multiple iterations, a diagonalizing transform known as MLLT or STC.
For details, see http://kaldi-asr.org/doc/transform.html
#triphone_ali
steps/align_si.sh --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/tri1 exp/tri1_ali || exit 1;
#lda_mllt
steps/train_lda_mllt.sh --cmd "$train_cmd" --splice-opts "--left-context=3 --right-context=3" 2500 15000 data/mfcc/train data/lang exp/tri1_ali exp/tri2b || exit 1;
#test tri2b model
local/thchs-30_decode.sh --nj $n "steps/decode.sh" exp/tri2b data/mfcc &
The relevant configuration in train_lda_mllt.sh is as follows:
# Begin configuration.
cmd=run.pl
config=
stage=-5
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
realign_iters="10 20 30";
mllt_iters="2 4 6 12";
num_iters=35 # Number of iterations of training
max_iter_inc=25 # Last iter to increase #Gauss on.
dim=40
beam=10
retry_beam=40
careful=false
boost_silence=1.0 # Factor by which to boost silence likelihoods in alignment
power=0.25 # Exponent for number of gaussians according to occurrence counts
randprune=4.0 # This is approximately the ratio by which we will speed up the
# LDA and MLLT calculations via randomized pruning.
splice_opts=
cluster_thresh=-1 # for build-tree control final bottom-up clustering of leaves
norm_vars=false # deprecated. Prefer --cmvn-opts "--norm-vars=false"
cmvn_opts=
context_opts= # use "--context-width=5 --central-position=2" for quinphone.
# End configuration.
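The effect of the splice options on feature dimension can be worked out by hand. The sketch below does the arithmetic for the context of 3 frames on each side used in the tri2b command and the dim=40 setting. The per-frame dimension of 13 is an assumption (a common MFCC setup), since the article does not state it.

```shell
feat_dim=13       # assumption: 13 MFCCs per frame (not stated in the article)
left_context=3    # from --splice-opts in the tri2b command
right_context=3
lda_dim=40        # dim=40 in the configuration above

frames=$((left_context + 1 + right_context))
spliced_dim=$((frames * feat_dim))
echo "spliced: $frames frames x $feat_dim = $spliced_dim dims; LDA keeps $lda_dim"
```

So under this assumption LDA projects a 91-dimensional spliced vector down to 40 dimensions before MLLT is estimated.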
SAT
Speaker adaptive training using feature-space maximum likelihood linear regression (fMLLR).
This does Speaker Adapted Training (SAT), i.e. train on fMLLR-adapted features. It can be done on top of either LDA+MLLT, or delta and delta-delta features. If there are no transforms supplied in the alignment directory, it will estimate transforms itself before building the tree (and in any case, it estimates transforms a number of times during training).
#lda_mllt_ali
steps/align_si.sh --nj $n --cmd "$train_cmd" --use-graphs true data/mfcc/train data/lang exp/tri2b exp/tri2b_ali || exit 1;
#sat
steps/train_sat.sh --cmd "$train_cmd" 2500 15000 data/mfcc/train data/lang exp/tri2b_ali exp/tri3b || exit 1;
#test tri3b model
local/thchs-30_decode.sh --nj $n "steps/decode_fmllr.sh" exp/tri3b data/mfcc &
The configuration of train_sat.sh is as follows:
# Begin configuration section.
stage=-5
exit_stage=-100 # you can use this to require it to exit at the
# beginning of a specific stage. Not all values are
# supported.
fmllr_update_type=full
cmd=run.pl
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
beam=10
retry_beam=40
careful=false
boost_silence=1.0 # Factor by which to boost silence likelihoods in alignment
context_opts= # e.g. set this to "--context-width 5 --central-position 2" for quinphone.
realign_iters="10 20 30";
fmllr_iters="2 4 6 12";
silence_weight=0.0 # Weight on silence in fMLLR estimation.
num_iters=35 # Number of iterations of training
max_iter_inc=25 # Last iter to increase #Gauss on.
power=0.2 # Exponent for number of gaussians according to occurrence counts
cluster_thresh=-1 # for build-tree control final bottom-up clustering of leaves
phone_map=
train_tree=true
tree_stats_opts=
cluster_phones_opts=
compile_questions_opts=
# End configuration section.
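For reference, the fMLLR transform estimated for each speaker is an affine map applied to every feature vector. The notation below is the usual textbook form and is not taken from the article: x_t is a feature vector, s indexes the speaker, and W is the transform that gets estimated.

```latex
\hat{x}_t = A^{(s)} x_t + b^{(s)}
          = W^{(s)} \begin{bmatrix} x_t \\ 1 \end{bmatrix},
\qquad
W^{(s)} = \begin{bmatrix} A^{(s)} & b^{(s)} \end{bmatrix}
```

With fmllr_update_type=full, all entries of A and b are estimated. The iterations listed in fmllr_iters are the points in training where these transforms are re-estimated.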
decode_fmllr.sh: decodes with a speaker-adapted model.
Decoding script that does fMLLR. This can be on top of delta+delta-delta, or LDA+MLLT features.
# There are 3 models involved potentially in this script,
# and for a standard, speaker-independent system they will all be the same.
# The "alignment model" is for the 1st-pass decoding and to get the
# Gaussian-level alignments for the "adaptation model" the first time we
# do fMLLR. The "adaptation model" is used to estimate fMLLR transforms
# and to generate state-level lattices. The lattices are then rescored
# with the "final model".
#
# The following table explains where we get these 3 models from.
# Note: $srcdir is one level up from the decoding directory.
#
# Model               Default source:
#
# "alignment model"   $srcdir/final.alimdl   --alignment-model <model>
#                     (or $srcdir/final.mdl if alimdl absent)
# "adaptation model"  $srcdir/final.mdl      --adapt-model <model>
# "final model"       $srcdir/final.mdl      --final-model <model>
Quick
Train a model on top of existing features (no feature-space learning of any kind is done). This script initializes the model (i.e., the GMMs) from the previous system's model. That is: for each state in the current model (after tree building), it chooses the closest state in the old model, judging the similarities based on overlap of counts in the tree stats.
#sat_ali
steps/align_fmllr.sh --nj $n --cmd "$train_cmd" data/mfcc/train data/lang exp/tri3b exp/tri3b_ali || exit 1;
#quick
steps/train_quick.sh --cmd "$train_cmd" 4200 40000 data/mfcc/train data/lang exp/tri3b_ali exp/tri4b || exit 1;
#test tri4b model
local/thchs-30_decode.sh --nj $n "steps/decode_fmllr.sh" exp/tri4b data/mfcc &
Configuration of train_quick.sh:
# Begin configuration..
cmd=run.pl
scale_opts="--transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1"
realign_iters="10 15"; # Only realign twice.
num_iters=20 # Number of iterations of training
maxiterinc=15 # Last iter to increase #Gauss on.
batch_size=750 # batch size to use while compiling graphs... memory/speed tradeoff.
beam=10 # alignment beam.
retry_beam=40
stage=-5
cluster_thresh=-1 # for build-tree control final bottom-up clustering of leaves
# End configuration section.
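One way to compare the four triphone stages is the average number of Gaussians per tree leaf implied by the two positional arguments of each training command. The sketch below only does that division; the function name avg_gauss is hypothetical. The real allocation is not uniform, since the scripts distribute Gaussians according to occupancy counts (the power option).

```shell
# avg_gauss LEAVES TOTGAUSS: prints the average number of Gaussians per leaf.
avg_gauss() {
  awk -v l="$1" -v g="$2" 'BEGIN { printf "%.1f\n", g / l }'
}

# Leaf and Gaussian targets copied from the commands in this article.
echo "tri1  (train_deltas):   $(avg_gauss 2000 10000)"
echo "tri2b (train_lda_mllt): $(avg_gauss 2500 15000)"
echo "tri3b (train_sat):      $(avg_gauss 2500 15000)"
echo "tri4b (train_quick):    $(avg_gauss 4200 40000)"
```

The quick stage both grows the tree and raises the Gaussian budget per leaf, which is why it can afford only 20 iterations and two realignment passes.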