PLDA training and score computation in Kaldi


I keep circling the same questions: how exactly is a PLDA model trained? How does PLDA compute a score, and how is the speaker decision made from that score? And how does EER relate to accuracy?
Time is tight and the thesis still isn't written, but I need to slow down and work through this properly to clear my head.
After i-vector extraction we have an i-vector for each utterance; the first step is computed over the SRE set:

ivector-mean scp:exp/ivectors_sre/ivector.scp exp/ivectors_sre/mean.vec

Let's see what ivector-mean.cc actually does:

// Since we only pass two arguments, only the two-argument branch is shown here.
// From the usage string: "If 2 arguments are given, computes the mean of all
// input files and writes out the mean vector."
if (po.NumArgs() == 2) {
  // Compute the mean of the input vectors and write it out.
  std::string ivector_rspecifier = po.GetArg(1),
      mean_wxfilename = po.GetArg(2);
  int32 num_done = 0;
  SequentialBaseFloatVectorReader ivector_reader(ivector_rspecifier);
  Vector<double> sum;
  for (; !ivector_reader.Done(); ivector_reader.Next()) {
    if (sum.Dim() == 0) sum.Resize(ivector_reader.Value().Dim());
    sum.AddVec(1.0, ivector_reader.Value());
    num_done++;
  }
  if (num_done == 0) {
    KALDI_ERR << "No iVectors read";
  } else {
    sum.Scale(1.0 / num_done);
    WriteKaldiObject(sum, mean_wxfilename, binary_write);
    return 0;
  }
}

The code above adds up all the i-vectors in the SRE set while counting the total number of utterances, then divides to get a mean vector for the SRE set. My guess is that this is the global mean μ in the PLDA model x = μ + z + ε.
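For context, here is a minimal statement of that model in my own notation (a two-covariance PLDA, following the Ioffe paper that the Kaldi source cites further down):

x = \mu + z + \varepsilon, \qquad z \sim \mathcal{N}(0, \Phi_b), \qquad \varepsilon \sim \mathcal{N}(0, \Phi_w)

where z is the speaker (between-class) term and ε the within-class/channel term. Training estimates μ, Φ_b and Φ_w; afterwards Kaldi stores a single transform that whitens Φ_w to I and diagonalizes Φ_b into Ψ, which matches the comment we will meet in ivector-plda-scoring.cc ("makes the within-class variance unit and diagonalizes the between-class covariance").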

Next is the plda_scoring.sh script; it takes 8 arguments and finally produces the score file plda_scores.

local/plda_scoring.sh $tandem_feats_dir/sre $tandem_feats_dir/train $tandem_feats_dir/test \
    exp/ivectors_sre exp/ivectors_train exp/ivectors_test $trials exp/scores_gmm_512_ind_pooled

vim local/plda_scoring.sh. I've stripped the irrelevant code below; open the file itself if you want the full script.

# The 8 arguments, matching the call above:
plda_data_dir=$1
enroll_data_dir=$2
test_data_dir=$3
plda_ivec_dir=$4
enroll_ivec_dir=$5
test_ivec_dir=$6
trials=$7
scores_dir=$8

# Train a PLDA model on the i-vectors. The PLDA model is also trained on the
# SRE set, so the arguments passed here are all SRE paths.
ivector-compute-plda ark:$plda_data_dir/spk2utt \
  "ark:ivector-normalize-length scp:${plda_ivec_dir}/ivector.scp ark:- |" \
  $plda_ivec_dir/plda 2>$plda_ivec_dir/log/plda.log

mkdir -p $scores_dir

ivector-plda-scoring --num-utts=ark:${enroll_ivec_dir}/num_utts.ark \
  "ivector-copy-plda --smoothing=0.0 ${plda_ivec_dir}/plda - |" \
  "ark:ivector-subtract-global-mean ${plda_ivec_dir}/mean.vec \
    scp:${enroll_ivec_dir}/spk_ivector.scp ark:- |" \
  "ark:ivector-subtract-global-mean ${plda_ivec_dir}/mean.vec \
    scp:${test_ivec_dir}/ivector.scp ark:- |" \
  "cat '$trials' | awk '{print \$1, \$2}' |" $scores_dir/plda_scores
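One detail worth flagging: ivector-normalize-length is applied before PLDA training. If I read its source correctly, it rescales each i-vector so that its Euclidean norm equals √d, where d is the i-vector dimension:

\hat{x} = \frac{\sqrt{d}}{\lVert x \rVert}\, x, \qquad d = \dim(x)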

First, the PLDA training source, ivector-compute-plda.cc. Only the key code is shown here, otherwise it gets hard to follow:

int main(int argc, char *argv[]) {
  try {
    const char *usage =
        "Computes a Plda object (for Probabilistic Linear Discriminant Analysis)\n"
        "from a set of iVectors.  Uses speaker information from a spk2utt file\n"
        "to compute within and between class variances.\n";
    ParseOptions po(usage);
    bool binary = true;
    PldaEstimationConfig plda_config;
    plda_config.Register(&po);
    po.Register("binary", &binary, "Write output in binary mode");
    po.Read(argc, argv);
    // Three arguments are needed: the SRE spk2utt, the SRE ivector.scp,
    // and the output PLDA model.
    std::string spk2utt_rspecifier = po.GetArg(1),
        ivector_rspecifier = po.GetArg(2),
        plda_wxfilename = po.GetArg(3);
    int64 num_spk_done = 0, num_spk_err = 0,
        num_utt_done = 0, num_utt_err = 0;
    SequentialTokenVectorReader spk2utt_reader(spk2utt_rspecifier);
    RandomAccessBaseFloatVectorReader ivector_reader(ivector_rspecifier);
    PldaStats plda_stats;
    for (; !spk2utt_reader.Done(); spk2utt_reader.Next()) {
      std::string spk = spk2utt_reader.Key();
      const std::vector<std::string> &uttlist = spk2utt_reader.Value(); // this speaker's utterances
      std::vector<Vector<BaseFloat> > ivectors; // note the type: all of this speaker's i-vectors
      ivectors.reserve(uttlist.size());
      // Process each utterance.
      for (size_t i = 0; i < uttlist.size(); i++) {
        std::string utt = uttlist[i];
        ivectors.resize(ivectors.size() + 1);
        ivectors.back() = ivector_reader.Value(utt);
        num_utt_done++;
      }
      // One i-vector per row, stacked into a matrix.
      Matrix<double> ivector_mat(ivectors.size(), ivectors[0].Dim());
      for (size_t i = 0; i < ivectors.size(); i++) {
        ivector_mat.Row(i).CopyFromVec(ivectors[i]);
      }
      double weight = 1.0;
      plda_stats.AddSamples(weight, ivector_mat); // one stats update per speaker; see plda.cc
      num_spk_done++;
    }
    // Sort all the accumulated stats.
    // The PLDA implementation follows "Probabilistic Linear Discriminant
    // Analysis" by Sergey Ioffe, ECCV 2006.
    plda_stats.Sort();
    PldaEstimator plda_estimator(plda_stats);
    Plda plda;
    // Trains for the number of EM iterations given in the config, ending with a
    // diagonal between-class covariance -- are the 600 dims just the diagonal
    // elements? See plda.log.
    plda_estimator.Estimate(plda_config, &plda);
    WriteKaldiObject(plda, plda_wxfilename, binary);
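To make AddSamples less opaque, here is a minimal sketch in plain C++ (no Kaldi types; the struct and its fields are my own hypothetical names, not Kaldi's actual PldaStats) of the per-speaker statistics it conceptually accumulates: a weighted sum of speaker means (toward the global mean) and the within-class scatter of utterances around their speaker's mean. Kaldi's real class additionally keeps a per-speaker (weight, count, mean) list that the EM estimator uses for the between-class covariance; that part is omitted here.

#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplification of Kaldi's PldaStats; the dense dim x dim
// scatter is kept as nested vectors for clarity, not efficiency.
struct PldaStatsSketch {
  std::size_t dim = 0;
  double num_examples = 0.0;  // total (weighted) utterance count
  double num_classes = 0.0;   // total (weighted) speaker count
  std::vector<double> sum;    // weighted sum of n * speaker_mean
  std::vector<std::vector<double> > offset_scatter;  // within-class scatter

  // ivectors: one entry per utterance of a single speaker.
  void AddSamples(double weight,
                  const std::vector<std::vector<double> > &ivectors) {
    assert(!ivectors.empty());
    const std::size_t n = ivectors.size();
    if (sum.empty()) {
      dim = ivectors[0].size();
      sum.assign(dim, 0.0);
      offset_scatter.assign(dim, std::vector<double>(dim, 0.0));
    }
    // Mean of this speaker's i-vectors.
    std::vector<double> mean(dim, 0.0);
    for (std::size_t k = 0; k < n; ++k)
      for (std::size_t i = 0; i < dim; ++i)
        mean[i] += ivectors[k][i] / n;
    // Within-class scatter: weighted sum over utterances of
    // (x - mean)(x - mean)^T.
    for (std::size_t k = 0; k < n; ++k)
      for (std::size_t i = 0; i < dim; ++i)
        for (std::size_t j = 0; j < dim; ++j)
          offset_scatter[i][j] +=
              weight * (ivectors[k][i] - mean[i]) * (ivectors[k][j] - mean[j]);
    for (std::size_t i = 0; i < dim; ++i)
      sum[i] += weight * n * mean[i];
    num_examples += weight * n;
    num_classes += weight;
  }
};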

This finally produces a PLDA model. How do we inspect it?

~/kaldi/src/ivectorbin/ivector-copy-plda --binary=false plda - > plda.txt

plda.txt comes out as 604 lines of 600-dimensional vectors; the training set has 48 speakers and 1304 utterances. I'm not sure yet where 604 comes from; I'll check the paper later and fill this in. (My guess: the Plda object stores a mean vector, a 600×600 diagonalizing transform, and a psi vector, i.e. 602 rows of numbers, and the remaining lines are presumably the text-format object markers.)
Next comes score computation:

ivector-plda-scoring --num-utts=ark:${enroll_ivec_dir}/num_utts.ark \
  "ivector-copy-plda --smoothing=0.0 ${plda_ivec_dir}/plda - |" \
  "ark:ivector-subtract-global-mean ${plda_ivec_dir}/mean.vec \
    scp:${enroll_ivec_dir}/spk_ivector.scp ark:- |" \
  "ark:ivector-subtract-global-mean ${plda_ivec_dir}/mean.vec \
    scp:${test_ivec_dir}/ivector.scp ark:- |" \
  "cat '$trials' | awk '{print \$1, \$2}' |" $scores_dir/plda_scores

# --num-utts maps each training speaker to an utterance count: <spkid> <number of this spk's utts>
# The 2nd argument produces each enrollment speaker's i-vector with mean.vec subtracted.
# The 3rd argument produces each test utterance's i-vector with mean.vec subtracted
#   (per utterance here, versus per speaker in the 2nd argument).
# The 4th argument produces the first two columns of the trials file: <spkid> <spkid_uttid>
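For concreteness, a trial and its resulting score line would look something like this (the IDs are made up; in the SRE recipes the trials file also carries a trailing target/nontarget label, which the awk above strips off):

input trial:   spk001 spk042_utt003
output score:  spk001 spk042_utt003 -12.3456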

vim ivector-plda-scoring.cc

int main(int argc, char *argv[]) {
  using namespace kaldi;
  try {
    const char *usage =
        "Computes log-likelihood ratios for trials using PLDA model\n"
        "Note: the 'trials-file' has lines of the form\n"
        "<key1> <key2>\n"
        "and the output will have the form\n"
        "<key1> <key2> [<dot-product>]\n"
        "For training examples, the input is the iVectors averaged over speakers;\n"
        "a separate archive containing the number of utterances per speaker may be\n"
        "optionally supplied using the --num-utts option; this affects the PLDA\n"
        "scoring (if not supplied, it defaults to 1 per speaker).\n";
    ParseOptions po(usage);
    std::string num_utts_rspecifier;
    PldaConfig plda_config;
    plda_config.Register(&po);
    po.Register("num-utts", &num_utts_rspecifier, "Table to read the number of "
                "utterances per speaker, e.g. ark:num_utts.ark\n");
    po.Read(argc, argv);
    std::string plda_rxfilename = po.GetArg(1),
        train_ivector_rspecifier = po.GetArg(2),
        test_ivector_rspecifier = po.GetArg(3),
        trials_rxfilename = po.GetArg(4),
        scores_wxfilename = po.GetArg(5);
    // diagnostics:
    double tot_test_renorm_scale = 0.0, tot_train_renorm_scale = 0.0;
    int64 num_train_ivectors = 0, num_train_errs = 0, num_test_ivectors = 0;
    int64 num_trials_done = 0, num_trials_err = 0;
    Plda plda;
    ReadKaldiObject(plda_rxfilename, &plda); // read the PLDA model file into plda
    int32 dim = plda.Dim(); // 604 or 600? the mean's Dim() should be 600
    SequentialBaseFloatVectorReader train_ivector_reader(train_ivector_rspecifier); // e.g. (20, 600)
    SequentialBaseFloatVectorReader test_ivector_reader(test_ivector_rspecifier);   // e.g. (20, 600)
    RandomAccessInt32Reader num_utts_reader(num_utts_rspecifier); // dict{spk: num_of_utts, ...}
    typedef unordered_map<string, Vector<BaseFloat>*, StringHasher> HashType;
    // These hashes will contain the iVectors in the PLDA subspace
    // (that makes the within-class variance unit and diagonalizes the
    // between-class covariance).
    HashType train_ivectors, test_ivectors;
    KALDI_LOG << "Reading train iVectors";
    for (; !train_ivector_reader.Done(); train_ivector_reader.Next()) {
      std::string spk = train_ivector_reader.Key();
      if (train_ivectors.count(spk) != 0) {
        KALDI_ERR << "Duplicate training iVector found for speaker " << spk;
      }
      const Vector<BaseFloat> &ivector = train_ivector_reader.Value();
      Vector<BaseFloat> *transformed_ivector = new Vector<BaseFloat>(dim);
      tot_train_renorm_scale += plda.TransformIvector(plda_config, ivector,
                                                      transformed_ivector);
      train_ivectors[spk] = transformed_ivector; // this speaker's i-vector after the PLDA transform
      num_train_ivectors++;
    }
    KALDI_LOG << "Average renormalization scale on training iVectors was "
              << (tot_train_renorm_scale / num_train_ivectors);
    KALDI_LOG << "Reading test iVectors";
    for (; !test_ivector_reader.Done(); test_ivector_reader.Next()) {
      std::string utt = test_ivector_reader.Key();
      if (test_ivectors.count(utt) != 0) {
        KALDI_ERR << "Duplicate test iVector found for utterance " << utt;
      }
      const Vector<BaseFloat> &ivector = test_ivector_reader.Value();
      Vector<BaseFloat> *transformed_ivector = new Vector<BaseFloat>(dim);
      tot_test_renorm_scale += plda.TransformIvector(plda_config, ivector,
                                                     transformed_ivector);
      test_ivectors[utt] = transformed_ivector; // transformed the same way as the train side
      num_test_ivectors++;
    }
    KALDI_LOG << "Average renormalization scale on test iVectors was "
              << (tot_test_renorm_scale / num_test_ivectors);
    // Now compute a score for each line of the trials file.
    Input ki(trials_rxfilename);
    bool binary = false;
    Output ko(scores_wxfilename, binary);
    double sum = 0.0, sumsq = 0.0;
    std::string line;
    // Read one line at a time: <spkid> <uttid>
    while (std::getline(ki.Stream(), line)) {
      std::vector<std::string> fields;
      SplitStringToVector(line, " \t\n\r", true, &fields);
      std::string key1 = fields[0], key2 = fields[1];
      // key1 looks up the transformed per-speaker vector on the train side;
      // key2 the transformed per-utterance vector on the test side.
      const Vector<BaseFloat> *train_ivector = train_ivectors[key1],
          *test_ivector = test_ivectors[key2];
      Vector<double> train_ivector_dbl(*train_ivector),
          test_ivector_dbl(*test_ivector);
      int32 num_train_examples;
      if (!num_utts_rspecifier.empty()) {
        // we already checked that it has this key.
        num_train_examples = num_utts_reader.Value(key1);
      } else {
        num_train_examples = 1;
      }
      // 1st argument: speaker key1's mean-centered (global mean.vec subtracted),
      //   transformed i-vector;
      // 2nd argument: num_train_examples, how many training utterances key1 has;
      // 3rd argument: test utterance key2's mean-centered, transformed i-vector.
      BaseFloat score = plda.LogLikelihoodRatio(train_ivector_dbl,
                                                num_train_examples,
                                                test_ivector_dbl);
      // The crux is the LogLikelihoodRatio function, implemented in plda.cc;
      // its source is below.
      sum += score;
      sumsq += score * score;
      num_trials_done++;
      ko.Stream() << key1 << ' ' << key2 << ' ' << score << std::endl;
    } // end of the while loop
    for (HashType::iterator iter = train_ivectors.begin();
         iter != train_ivectors.end(); ++iter)
      delete iter->second;
    for (HashType::iterator iter = test_ivectors.begin();
         iter != test_ivectors.end(); ++iter)
      delete iter->second;
    if (num_trials_done != 0) {
      BaseFloat mean = sum / num_trials_done, scatter = sumsq / num_trials_done,
          variance = scatter - mean * mean, stddev = sqrt(variance);
      KALDI_LOG << "Mean score was " << mean << ", standard deviation was "
                << stddev;
    }
    KALDI_LOG << "Processed " << num_trials_done << " trials, " << num_trials_err
              << " had errors.";
    return (num_trials_done != 0 ? 0 : 1);
}
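One step above deserves a note: TransformIvector is not just a matrix multiply. As far as I can tell from plda.cc, it computes y = A(x − μ) with the diagonalizing transform A, then rescales y so that its length is consistent with what the model expects, and returns that scale factor (the "renormalization scale" printed in the logs). With the default config the factor is, roughly,

s = \sqrt{\frac{d}{\sum_i y_i^2 / (\psi_i + 1)}}, \qquad y \leftarrow s\,y,

and with simple length normalization it reduces to s = \sqrt{d} / \lVert y \rVert.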

Below is the source for the log-likelihood-ratio computation:

// There is an extended comment within this file, referencing a paper by Ioffe, that
// may clarify what this function is doing.
double Plda::LogLikelihoodRatio(
    const VectorBase<double> &transformed_train_ivector,
    int32 n, // number of training utterances.
    const VectorBase<double> &transformed_test_ivector) const {
  int32 dim = Dim();
  double loglike_given_class, loglike_without_class;
  { // work out loglike_given_class.
    // "mean" will be the mean of the distribution if it comes from the
    // training example.  The mean is \frac{n \Psi}{n \Psi + I} \bar{u}^g
    // "variance" will be the variance of that distribution, equal to
    // I + \frac{\Psi}{n\Psi + I}.
    Vector<double> mean(dim, kUndefined);
    Vector<double> variance(dim, kUndefined);
    for (int32 i = 0; i < dim; i++) {
      mean(i) = n * psi_(i) / (n * psi_(i) + 1.0) * transformed_train_ivector(i);
      variance(i) = 1.0 + psi_(i) / (n * psi_(i) + 1.0);
    }
    double logdet = variance.SumLog();
    Vector<double> sqdiff(transformed_test_ivector);
    sqdiff.AddVec(-1.0, mean);
    sqdiff.ApplyPow(2.0);
    variance.InvertElements();
    loglike_given_class = -0.5 * (logdet + M_LOG_2PI * dim +
                                  VecVec(sqdiff, variance));
  }
  { // work out loglike_without_class.  Here the mean is zero and the variance
    // is I + \Psi.
    Vector<double> sqdiff(transformed_test_ivector); // there is no offset.
    sqdiff.ApplyPow(2.0);
    Vector<double> variance(psi_);
    variance.Add(1.0); // I + \Psi.
    double logdet = variance.SumLog();
    variance.InvertElements();
    loglike_without_class = -0.5 * (logdet + M_LOG_2PI * dim +
                                    VecVec(sqdiff, variance));
  }
  double loglike_ratio = loglike_given_class - loglike_without_class;
  return loglike_ratio;
}
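Reading the function off directly: in the transformed space the within-class covariance is I and the between-class covariance is Ψ = diag(ψ), so the returned score is the log-likelihood ratio

\text{score} = \log \mathcal{N}\!\left(x_{\text{test}};\; \frac{n\Psi}{n\Psi + I}\,\bar{x}_{\text{train}},\; I + \frac{\Psi}{n\Psi + I}\right) - \log \mathcal{N}\!\left(x_{\text{test}};\; 0,\; I + \Psi\right)

where \bar{x}_{\text{train}} is the transformed enrollment vector and n the number of enrollment utterances. Since all covariances are diagonal, each Gaussian log-density is evaluated coordinate-wise as

\log \mathcal{N}(x; \mu, \Sigma) = -\tfrac{1}{2}\left(\log\det\Sigma + d\log 2\pi + (x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right).

The first term is the likelihood that the test vector shares the enrollment speaker's latent class variable; the second is the likelihood that it comes from an unrelated speaker.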

I haven't fully understood this part yet; for reference, see: http://blog.csdn.net/xmu_jupiter/article/details/47281211

Alright, I haven't actually covered the most important points yet; this was just a pass through the source code to get the overall picture straight.
