PLDA training and score computation in Kaldi
I've had a lingering question: how is PLDA trained? How does PLDA compute a score, and how is that score used to decide the speaker? And what is the relation between EER and accuracy?
Time is tight and the thesis isn't written yet, so I need to calm down and work through this properly to clear my head.
After i-vector extraction we have an i-vector for every utterance; we then compute the mean over the SRE set:
ivector-mean scp:exp/ivectors_sre/ivector.scp exp/ivectors_sre/mean.vec
Let's look at what happens inside ivector-mean.cc:
# Since we only pass two arguments, only the two-argument branch is shown here:
# "If 2 arguments are given, computes the mean of all input files and writes out the mean vector."
if (po.NumArgs() == 2) {
  // Compute the mean of the input vectors and write it out.
  std::string ivector_rspecifier = po.GetArg(1),
      mean_wxfilename = po.GetArg(2);
  int32 num_done = 0;
  SequentialBaseFloatVectorReader ivector_reader(ivector_rspecifier);
  Vector<double> sum;
  for (; !ivector_reader.Done(); ivector_reader.Next()) {
    if (sum.Dim() == 0) sum.Resize(ivector_reader.Value().Dim());
    sum.AddVec(1.0, ivector_reader.Value());
    num_done++;
  }
  if (num_done == 0) {
    KALDI_ERR << "No iVectors read";
  } else {
    sum.Scale(1.0 / num_done);
    WriteKaldiObject(sum, mean_wxfilename, binary_write);
    return 0;
  }
}
The code above sums all the i-vectors in the SRE set while counting the total number of utterances, then divides to get the mean vector of the SRE set. My guess is that this is the mean μ in the PLDA model η = μ + Φβ + ε.
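What ivector-mean does in the two-argument case can be sketched in a few lines of NumPy. The dictionary here is a made-up stand-in for the utterance-to-i-vector table read from ivector.scp:

```python
import numpy as np

# Hypothetical stand-in for reading ivector.scp: utterance id -> i-vector.
ivectors = {
    "utt1": np.array([0.5, 1.0, -0.5]),
    "utt2": np.array([1.5, 0.0, 0.5]),
}

def compute_mean(ivector_table):
    """Sum all i-vectors and divide by the utterance count, as ivector-mean does."""
    vecs = list(ivector_table.values())
    if not vecs:
        raise ValueError("No iVectors read")
    return np.sum(vecs, axis=0) / len(vecs)

mean_vec = compute_mean(ivectors)
print(mean_vec)  # [1.  0.5 0. ]
```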
Next is the plda_scoring.sh script: it takes 8 arguments and finally produces the score file plda_scores.
local/plda_scoring.sh $tandem_feats_dir/sre $tandem_feats_dir/train $tandem_feats_dir/test \
  exp/ivectors_sre exp/ivectors_train exp/ivectors_test $trials exp/scores_gmm_512_ind_pooled
vim local/plda_scoring.sh. I've stripped the irrelevant code here; go read the file itself if you want the full version.
# See the invocation above for the 8 arguments:
plda_data_dir=$1
enroll_data_dir=$2
test_data_dir=$3
plda_ivec_dir=$4
enroll_ivec_dir=$5
test_ivec_dir=$6
trials=$7
scores_dir=$8

# Train a PLDA model from the i-vector features. The PLDA model is also trained
# on the SRE set, so all the arguments passed here are from sre.
ivector-compute-plda ark:$plda_data_dir/spk2utt \
  "ark:ivector-normalize-length scp:${plda_ivec_dir}/ivector.scp ark:- |" \
  $plda_ivec_dir/plda 2>$plda_ivec_dir/log/plda.log

mkdir -p $scores_dir
ivector-plda-scoring --num-utts=ark:${enroll_ivec_dir}/num_utts.ark \
  "ivector-copy-plda --smoothing=0.0 ${plda_ivec_dir}/plda - |" \
  "ark:ivector-subtract-global-mean ${plda_ivec_dir}/mean.vec \
    scp:${enroll_ivec_dir}/spk_ivector.scp ark:- |" \
  "ark:ivector-subtract-global-mean ${plda_ivec_dir}/mean.vec \
    scp:${test_ivec_dir}/ivector.scp ark:- |" \
  "cat '$trials' | awk '{print \$1, \$2}' |" $scores_dir/plda_scores
First, the source for training PLDA: ivector-compute-plda.cc. Only the key code is shown, otherwise it gets messy.
int main(int argc, char *argv[]) {
  try {
    const char *usage =
        "Computes a Plda object (for Probabilistic Linear Discriminant Analysis)\n"
        "from a set of iVectors. Uses speaker information from a spk2utt file\n"
        "to compute within and between class variances.\n";

    ParseOptions po(usage);
    bool binary = true;
    PldaEstimationConfig plda_config;
    plda_config.Register(&po);
    po.Register("binary", &binary, "Write output in binary mode");
    po.Read(argc, argv);

    # Three arguments are needed: sre's spk2utt, sre's ivector.scp, and the output plda model.
    std::string spk2utt_rspecifier = po.GetArg(1),
        ivector_rspecifier = po.GetArg(2),
        plda_wxfilename = po.GetArg(3);

    int64 num_spk_done = 0, num_spk_err = 0,
        num_utt_done = 0, num_utt_err = 0;

    SequentialTokenVectorReader spk2utt_reader(spk2utt_rspecifier);
    RandomAccessBaseFloatVectorReader ivector_reader(ivector_rspecifier);
    PldaStats plda_stats;

    for (; !spk2utt_reader.Done(); spk2utt_reader.Next()) {
      std::string spk = spk2utt_reader.Key();
      const std::vector<std::string> &uttlist = spk2utt_reader.Value();  # all of this spk's utts
      std::vector<Vector<BaseFloat> > ivectors;  # note the type: holds all the i-vectors
      ivectors.reserve(uttlist.size());
      # process each utterance
      for (size_t i = 0; i < uttlist.size(); i++) {
        std::string utt = uttlist[i];
        ivectors.resize(ivectors.size() + 1);
        ivectors.back() = ivector_reader.Value(utt);
        num_utt_done++;
      }
      # one i-vector per row, forming a matrix
      Matrix<double> ivector_mat(ivectors.size(), ivectors[0].Dim());
      for (size_t i = 0; i < ivectors.size(); i++) {
        ivector_mat.Row(i).CopyFromVec(ivectors[i]);
      }
      double weight = 1.0;
      plda_stats.AddSamples(weight, ivector_mat);  # one plda_stats per speaker; see plda.cc
      num_spk_done++;
    }
    # sort all the plda_stats
    # The PLDA implementation follows "Probabilistic Linear Discriminant Analysis"
    # by Sergey Ioffe, ECCV 2006.
    plda_stats.Sort();
    PldaEstimator plda_estimator(plda_stats);
    Plda plda;
    plda_estimator.Estimate(plda_config, &plda);
    # Trains for the number of iterations in the config, finally giving a diagonal
    # between-class covariance matrix. Is the 600-dim just the diagonal? See plda.log.
    WriteKaldiObject(plda, plda_wxfilename, binary);
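Conceptually, what PldaStats accumulates per speaker feeds a within-class / between-class covariance decomposition. This is not Kaldi's actual EM estimation in plda.cc, just a NumPy illustration of those two scatter quantities on made-up data:

```python
import numpy as np

# Toy data: speaker -> matrix of per-utterance i-vectors (one row each), made up for illustration.
spk_ivectors = {
    "spk_a": np.array([[1.0, 0.0], [3.0, 0.0]]),
    "spk_b": np.array([[-1.0, 2.0], [-3.0, 2.0]]),
}

def scatter_stats(spk_ivectors):
    """Within-class and between-class scatter (normalized by utterance count)."""
    all_rows = np.vstack(list(spk_ivectors.values()))
    global_mean = all_rows.mean(axis=0)
    dim = all_rows.shape[1]
    within = np.zeros((dim, dim))
    between = np.zeros((dim, dim))
    n_utts = 0
    for mat in spk_ivectors.values():
        spk_mean = mat.mean(axis=0)
        centered = mat - spk_mean
        within += centered.T @ centered          # scatter around the speaker mean
        d = (spk_mean - global_mean)[:, None]
        between += mat.shape[0] * (d @ d.T)      # speaker mean around the global mean
        n_utts += mat.shape[0]
    return within / n_utts, between / n_utts

W, B = scatter_stats(spk_ivectors)
```

In the actual Kaldi estimator these statistics drive iterative updates that end with a basis in which the within-class variance is the identity and the between-class variance is diagonal (the psi_ vector seen later in scoring).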
This finally produces a plda model. How do we inspect it?
~/kaldi/src/ivectorbin/ivector-copy-plda --binary=false plda - > plda.txt
plda.txt holds 604 vectors of 600 dimensions each; there are 48 speakers and 1304 utterances in total. I don't know yet where 604 comes from; I'll fill that in after reading the paper.
Then comes score computation:
ivector-plda-scoring --num-utts=ark:${enroll_ivec_dir}/num_utts.ark \
  "ivector-copy-plda --smoothing=0.0 ${plda_ivec_dir}/plda - |" \
  "ark:ivector-subtract-global-mean ${plda_ivec_dir}/mean.vec \
    scp:${enroll_ivec_dir}/spk_ivector.scp ark:- |" \
  "ark:ivector-subtract-global-mean ${plda_ivec_dir}/mean.vec \
    scp:${test_ivec_dir}/ivector.scp ark:- |" \
  "cat '$trials' | awk '{print \$1, \$2}' |" $scores_dir/plda_scores

# --num-utts is the number of utterances per speaker in the training set: <spkid> <number of this spk's utts>
# The second argument yields each enrollment speaker's i-vector with mean.vec subtracted.
# The third argument yields each test utterance's i-vector with mean.vec subtracted; note the difference from the second argument.
# The fourth argument yields the first two columns of the trials file: <spkid> <spkid_uttid>
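The centering done by ivector-subtract-global-mean in the second and third arguments is simple enough to sketch; the data and names here are illustrative, not Kaldi API:

```python
import numpy as np

def subtract_global_mean(ivectors, mean_vec):
    """Center each i-vector by subtracting the global mean (cf. ivector-subtract-global-mean)."""
    return {key: vec - mean_vec for key, vec in ivectors.items()}

centered = subtract_global_mean({"spk1": np.array([2.0, 3.0])},
                                np.array([1.0, 1.0]))
print(centered["spk1"])  # [1. 2.]
```

The only difference between the enrollment and test streams is the keys: enrollment vectors are per speaker (spk_ivector.scp), test vectors are per utterance (ivector.scp).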
vim ivector-plda-scoring.cc
int main(int argc, char *argv[]) {
  using namespace kaldi;
  try {
    const char *usage =
        "Computes log-likelihood ratios for trials using PLDA model\n"
        "Note: the 'trials-file' has lines of the form\n"
        "<key1> <key2>\n"
        "and the output will have the form\n"
        "<key1> <key2> [<dot-product>]\n"
        "For training examples, the input is the iVectors averaged over speakers;\n"
        "a separate archive containing the number of utterances per speaker may be\n"
        "optionally supplied using the --num-utts option; this affects the PLDA\n"
        "scoring (if not supplied, it defaults to 1 per speaker).\n";

    ParseOptions po(usage);
    std::string num_utts_rspecifier;
    PldaConfig plda_config;
    plda_config.Register(&po);
    po.Register("num-utts", &num_utts_rspecifier, "Table to read the number of "
                "utterances per speaker, e.g. ark:num_utts.ark\n");
    po.Read(argc, argv);

    std::string plda_rxfilename = po.GetArg(1),
        train_ivector_rspecifier = po.GetArg(2),
        test_ivector_rspecifier = po.GetArg(3),
        trials_rxfilename = po.GetArg(4),
        scores_wxfilename = po.GetArg(5);

    // diagnostics:
    double tot_test_renorm_scale = 0.0, tot_train_renorm_scale = 0.0;
    int64 num_train_ivectors = 0, num_train_errs = 0, num_test_ivectors = 0;
    int64 num_trials_done = 0, num_trials_err = 0;

    Plda plda;
    ReadKaldiObject(plda_rxfilename, &plda);  # read the plda model file into plda
    int32 dim = plda.Dim();  # is this 604 or 600? mean.Dim() should be 600

    SequentialBaseFloatVectorReader train_ivector_reader(train_ivector_rspecifier);  # (20, 600)
    SequentialBaseFloatVectorReader test_ivector_reader(test_ivector_rspecifier);  # (20, 600)
    RandomAccessInt32Reader num_utts_reader(num_utts_rspecifier);  # dict{spk: num_of_utts, ...}

    typedef unordered_map<string, Vector<BaseFloat>*, StringHasher> HashType;
    // These hashes will contain the iVectors in the PLDA subspace
    // (that makes the within-class variance unit and diagonalizes the
    // between-class covariance).
    HashType train_ivectors, test_ivectors;

    KALDI_LOG << "Reading train iVectors";
    for (; !train_ivector_reader.Done(); train_ivector_reader.Next()) {
      std::string spk = train_ivector_reader.Key();
      if (train_ivectors.count(spk) != 0) {
        KALDI_ERR << "Duplicate training iVector found for speaker " << spk;
      }
      const Vector<BaseFloat> &ivector = train_ivector_reader.Value();
      Vector<BaseFloat> *transformed_ivector = new Vector<BaseFloat>(dim);
      tot_train_renorm_scale += plda.TransformIvector(plda_config, ivector,
                                                      transformed_ivector);
      train_ivectors[spk] = transformed_ivector;  # this speaker's i-vector after one transform
      num_train_ivectors++;
    }
    KALDI_LOG << "Average renormalization scale on training iVectors was "
              << (tot_train_renorm_scale / num_train_ivectors);

    KALDI_LOG << "Reading test iVectors";
    for (; !test_ivector_reader.Done(); test_ivector_reader.Next()) {
      std::string utt = test_ivector_reader.Key();
      if (test_ivectors.count(utt) != 0) {
        KALDI_ERR << "Duplicate test iVector found for utterance " << utt;
      }
      const Vector<BaseFloat> &ivector = test_ivector_reader.Value();
      Vector<BaseFloat> *transformed_ivector = new Vector<BaseFloat>(dim);
      tot_test_renorm_scale += plda.TransformIvector(plda_config, ivector,
                                                     transformed_ivector);
      test_ivectors[utt] = transformed_ivector;  # transformed the same way as on the train side
      num_test_ivectors++;
    }
    KALDI_LOG << "Average renormalization scale on test iVectors was "
              << (tot_test_renorm_scale / num_test_ivectors);

    # now compute scores according to the trials file
    Input ki(trials_rxfilename);
    bool binary = false;
    Output ko(scores_wxfilename, binary);
    double sum = 0.0, sumsq = 0.0;
    std::string line;
    # read one line: <spkid> <uttid>
    while (std::getline(ki.Stream(), line)) {
      std::vector<std::string> fields;
      SplitStringToVector(line, " \t\n\r", true, &fields);
      std::string key1 = fields[0], key2 = fields[1];
      # key1 indexes a train speaker's transformed vector; key2 a transformed test vector
      const Vector<BaseFloat> *train_ivector = train_ivectors[key1],
          *test_ivector = test_ivectors[key2];
      Vector<double> train_ivector_dbl(*train_ivector),
          test_ivector_dbl(*test_ivector);
      int32 num_train_examples;
      if (!num_utts_rspecifier.empty()) {
        // we already checked that it has this key.
        num_train_examples = num_utts_reader.Value(key1);
      } else {
        num_train_examples = 1;
      }
      # 1st argument: key1's mean-centered i-vector; "centered" means the global mean.vec was subtracted
      # 2nd argument: num_train_examples, how many utterances this speaker has in the training set
      # 3rd argument: key2's centered i-vector for this test utterance
      BaseFloat score = plda.LogLikelihoodRatio(train_ivector_dbl,
                                                num_train_examples,
                                                test_ivector_dbl);
      # The key point is the LogLikelihoodRatio function, implemented in plda.cc; source below.
      sum += score;
      sumsq += score * score;
      num_trials_done++;
      ko.Stream() << key1 << ' ' << key2 << ' ' << score << std::endl;
    }  # end of the while loop

    for (HashType::iterator iter = train_ivectors.begin();
         iter != train_ivectors.end(); ++iter)
      delete iter->second;
    for (HashType::iterator iter = test_ivectors.begin();
         iter != test_ivectors.end(); ++iter)
      delete iter->second;

    if (num_trials_done != 0) {
      BaseFloat mean = sum / num_trials_done,
          scatter = sumsq / num_trials_done,
          variance = scatter - mean * mean,
          stddev = sqrt(variance);
      KALDI_LOG << "Mean score was " << mean << ", standard deviation was "
                << stddev;
    }
    KALDI_LOG << "Processed " << num_trials_done << " trials, "
              << num_trials_err << " had errors.";
    return (num_trials_done != 0 ? 0 : 1);
}
Below is the source for computing the log-likelihood ratio:
// There is an extended comment within this file, referencing a paper by Ioffe,
// that may clarify what this function is doing.
double Plda::LogLikelihoodRatio(
    const VectorBase<double> &transformed_train_ivector,
    int32 n,  // number of training utterances.
    const VectorBase<double> &transformed_test_ivector) const {
  int32 dim = Dim();
  double loglike_given_class, loglike_without_class;
  { // work out loglike_given_class.
    // "mean" will be the mean of the distribution if it comes from the
    // training example. The mean is \frac{n \Psi}{n \Psi + I} \bar{u}^g
    // "variance" will be the variance of that distribution, equal to
    // I + \frac{\Psi}{n\Psi + I}.
    Vector<double> mean(dim, kUndefined);
    Vector<double> variance(dim, kUndefined);
    for (int32 i = 0; i < dim; i++) {
      mean(i) = n * psi_(i) / (n * psi_(i) + 1.0)
          * transformed_train_ivector(i);
      variance(i) = 1.0 + psi_(i) / (n * psi_(i) + 1.0);
    }
    double logdet = variance.SumLog();
    Vector<double> sqdiff(transformed_test_ivector);
    sqdiff.AddVec(-1.0, mean);
    sqdiff.ApplyPow(2.0);
    variance.InvertElements();
    loglike_given_class = -0.5 * (logdet + M_LOG_2PI * dim +
                                  VecVec(sqdiff, variance));
  }
  { // work out loglike_without_class. Here the mean is zero and the variance
    // is I + \Psi.
    Vector<double> sqdiff(transformed_test_ivector);  // there is no offset.
    sqdiff.ApplyPow(2.0);
    Vector<double> variance(psi_);
    variance.Add(1.0);  // I + \Psi.
    double logdet = variance.SumLog();
    variance.InvertElements();
    loglike_without_class = -0.5 * (logdet + M_LOG_2PI * dim +
                                    VecVec(sqdiff, variance));
  }
  double loglike_ratio = loglike_given_class - loglike_without_class;
  return loglike_ratio;
}
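The loop is elementwise because in the transformed space the within-class variance is the identity and Ψ (psi_) is diagonal. The same math can be re-implemented in NumPy as a sketch, with toy one-dimensional inputs; the vectors are assumed to already be in the PLDA subspace:

```python
import numpy as np

M_LOG_2PI = np.log(2.0 * np.pi)

def log_likelihood_ratio(transformed_train, n, transformed_test, psi):
    """Diagonal-Psi PLDA log-likelihood ratio, mirroring Plda::LogLikelihoodRatio."""
    dim = len(psi)
    # Same-speaker hypothesis: mean = n*psi/(n*psi + 1) * train, variance = 1 + psi/(n*psi + 1).
    mean = n * psi / (n * psi + 1.0) * transformed_train
    variance = 1.0 + psi / (n * psi + 1.0)
    sqdiff = (transformed_test - mean) ** 2
    loglike_given = -0.5 * (np.sum(np.log(variance)) + M_LOG_2PI * dim
                            + np.sum(sqdiff / variance))
    # Different-speaker hypothesis: mean = 0, variance = I + Psi.
    variance0 = 1.0 + psi
    loglike_without = -0.5 * (np.sum(np.log(variance0)) + M_LOG_2PI * dim
                              + np.sum(transformed_test ** 2 / variance0))
    return loglike_given - loglike_without

# Toy check: identical enroll/test vectors with nonzero psi should score positive.
score = log_likelihood_ratio(np.array([1.0]), 1, np.array([1.0]), np.array([1.0]))
```

Note that if psi were all zeros (no between-speaker variance), both hypotheses collapse to the same unit-variance Gaussian and every score is exactly 0, which matches the intuition that the ratio only carries information through Ψ.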
I haven't fully worked this part out yet; for reference: http://blog.csdn.net/xmu_jupiter/article/details/47281211
OK, I never got to the most important part; I only walked through the source. Understand the overall flow first.