EER (Equal Error Rate)
----- Updated 2017.3.06 -----
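Equal Error Rate (EER) is the most commonly used evaluation metric in speaker recognition and speaker verification. It is the operating point at which the false acceptance rate (nontarget_is_target / (nontarget_is_target + nontarget_is_nontarget)) and the false rejection rate (target_is_nontarget / (target_is_target + target_is_nontarget)) are balanced. The threshold at that point can then be used as the fixed threshold in actual use.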
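Remember the trials file? Remember faking a trials file yourself when there was no cvs file? Remember not understanding why 50% or 80% of the trials had to be nontarget? It was all so that the EER could be computed. So when faking a trials file, it is best to keep the distribution even: cover every speaker, and give every speaker a certain number of nontarget trials. You could also score every speaker against all other speakers as nontarget. Whether to take a subset or all of them I am not sure; I will update this after verifying it (remember to verify).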
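A minimal sketch of how such an evenly distributed trials list could be built. It assumes a hypothetical mapping `utt2spk` from utterance ID to speaker ID and a three-column `speaker utterance target|nontarget` layout; the function name `make_trials` and the `nontarget_per_utt` parameter are made up for illustration and are not part of Kaldi. Every utterance gets one target trial plus a fixed number of nontarget trials drawn from the other speakers.

```python
import random


def make_trials(utt2spk, nontarget_per_utt=2, seed=0):
    """Build (speaker, utterance, label) trials covering every utterance.

    utt2spk: dict mapping utterance ID -> true speaker ID (hypothetical input).
    nontarget_per_utt: impostor speakers sampled per utterance. It is capped
    at the number of other speakers, which gives the "all pairs" variant.
    """
    rng = random.Random(seed)  # fixed seed so the list is reproducible
    speakers = sorted(set(utt2spk.values()))
    trials = []
    for utt in sorted(utt2spk):
        true_spk = utt2spk[utt]
        trials.append((true_spk, utt, "target"))
        others = [s for s in speakers if s != true_spk]
        k = min(nontarget_per_utt, len(others))
        for spk in rng.sample(others, k):
            trials.append((spk, utt, "nontarget"))
    return trials


if __name__ == "__main__":
    # Made-up IDs, for illustration only.
    demo = {"utt1": "spkA", "utt2": "spkA", "utt3": "spkB", "utt4": "spkC"}
    for spk, utt, label in make_trials(demo, nontarget_per_utt=1):
        print(spk, utt, label)
```

Setting `nontarget_per_utt` to a large number gives the variant where every speaker is tried against all others; which of the two is better for EER remains the open question above.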
--> First, how the EER is computed:
There are two kinds of error: false reject and false accept. Clearly, the false reject rate and the false accept rate depend on the threshold. When the two rates are equal, the common value is called the equal error rate (EER).
A false reject (fr) is a trial that should have been accepted but was rejected:
FR = target_is_nontarget / (target_is_target + target_is_nontarget)
A false accept (fa) is a trial that should have been rejected but was accepted:
FA = nontarget_is_target / (nontarget_is_target + nontarget_is_nontarget)
When E(fr) = E(fa) = E, E is the EER.
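To make the definition concrete, here is a small sketch that sweeps every observed score as a candidate threshold and keeps the one where the two rates are closest. It uses the usual definitions: the false accept rate is the share of nontarget trials accepted, and the false reject rate is the share of target trials rejected. The function names `error_rates` and `sweep_eer` and the toy scores are made up for illustration; this brute-force sweep is not how Kaldi computes the value.

```python
def error_rates(target_scores, nontarget_scores, threshold):
    """Return (FAR, FRR) when trials scoring >= threshold are accepted."""
    false_accepts = sum(1 for s in nontarget_scores if s >= threshold)
    false_rejects = sum(1 for s in target_scores if s < threshold)
    far = false_accepts / len(nontarget_scores)
    frr = false_rejects / len(target_scores)
    return far, frr


def sweep_eer(target_scores, nontarget_scores):
    """Try each observed score as the threshold; keep the smallest |FAR - FRR|.

    Returns (eer_estimate, threshold), where the estimate is the mean of
    FAR and FRR at the chosen threshold.
    """
    best = None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        far, frr = error_rates(target_scores, nontarget_scores, threshold)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, threshold, far, frr)
    _, threshold, far, frr = best
    return (far + frr) / 2, threshold


if __name__ == "__main__":
    # Toy scores invented for this example, not real PLDA output.
    targets = [-5.0, -8.0, -14.0, -18.0, -60.0]
    nontargets = [-87.0, -74.0, -51.0, -26.0, -10.0]
    eer, threshold = sweep_eer(targets, nontargets)
    print("eer:", eer, "threshold:", threshold)
```

With finitely many trials the two rates rarely meet exactly, which is why the sketch reports the closest point rather than an exact crossing.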
--> Wikipedia, ROC curve: https://zh.wikipedia.org/wiki/ROC曲线
--> Next, a look at the Kaldi source:
eer=$(compute-eer <(python local/prepare_for_eer.py $trials local/scores_gmm_${num_components}_${x}_${y}/plda_scores) 2> /dev/null)
Running the inner script on its own:
python local/prepare_for_eer.py data/test/trials exp/scores_gmm_2048_ind_female/plda_scores
Output:

    -30.99115 target
    -28.06169 target
    -17.78868 target
    -87.6428 nontarget
    -74.32495 nontarget
    -74.18333 nontarget
    -5.662024 target
    -7.832421 target
    -26.46083 target
    -74.93365 nontarget
    -86.17784 nontarget
    -50.90917 nontarget
    -26.51904 target
    -14.09044 target
    ...
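The preparation script itself is not shown in this post. A minimal sketch of what such a join could look like, assuming the trials file has lines of the form `speaker utterance target|nontarget` and the score file has lines of the form `speaker utterance score`. Both layouts and the function name `join_scores` are assumptions made for illustration, not a copy of the actual `local/prepare_for_eer.py`.

```python
def join_scores(trial_lines, score_lines):
    """Pair each score with its trial label, returning 'score label' strings.

    trial_lines: iterable of 'speaker utterance label' (assumed layout).
    score_lines: iterable of 'speaker utterance score' (assumed layout).
    """
    label_of = {}
    for line in trial_lines:
        spk, utt, label = line.split()
        label_of[(spk, utt)] = label
    joined = []
    for line in score_lines:
        spk, utt, score = line.split()
        # The score is kept as text so its formatting is not altered.
        joined.append(f"{score} {label_of[(spk, utt)]}")
    return joined


if __name__ == "__main__":
    # Made-up IDs and scores, for illustration only.
    trials = ["spkA utt1 target", "spkB utt1 nontarget"]
    scores = ["spkA utt1 -5.66", "spkB utt1 -74.3"]
    print("\n".join(join_scores(trials, scores)))
```

The output order follows the score file, so the result has one `score label` line per scored trial.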
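    // ki is the input stream carrying the "score label" lines shown above.
    // Unrelated code has been removed; see the Kaldi source for the full version.
    while (std::getline(ki.Stream(), line)) {
      std::vector<std::string> split_line;
      SplitStringToVector(line, " \t", true, &split_line);
      BaseFloat score;
      // Parse the first field into score. This step was dropped from the
      // excerpt; without it, score would be pushed uninitialized.
      if (!ConvertStringToReal(split_line[0], &score))
        KALDI_ERR << "blablabla";
      if (split_line[1] == "target")
        target_scores.push_back(score);
      else if (split_line[1] == "nontarget")
        nontarget_scores.push_back(score);
      else
        KALDI_ERR << "blablabla";
    }
    // Declare a threshold; the two lists are target_scores and nontarget_scores.
    BaseFloat threshold;
    BaseFloat eer = ComputeEer(&target_scores, &nontarget_scores, &threshold);
    KALDI_LOG << "Equal error rate is " << (100.0 * eer) << "%, at threshold " << threshold;
    std::cout.precision(4);
    std::cout << (100.0 * eer);
    return 0;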
Now for the implementation of ComputeEer(&target_scores, &nontarget_scores, &threshold):
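    BaseFloat ComputeEer(std::vector<BaseFloat> *target_scores,
                         std::vector<BaseFloat> *nontarget_scores,
                         BaseFloat *threshold) {
      // Sort both lists in ascending order (std::sort sorts smallest first).
      std::sort(target_scores->begin(), target_scores->end());
      std::sort(nontarget_scores->begin(), nontarget_scores->end());
      size_t target_position = 0, target_size = target_scores->size();
      for (; target_position + 1 < target_size; target_position++) {
        // nontarget_n is how many nontarget scores would sit above the cutoff
        // if the nontarget error rate equalled target_position / target_size.
        // Example: with nontarget_size = 100 and target_size = 100, nontarget_n
        // runs over 0..98, so nontarget_position runs from 99 down to 1.
        // A signed type is used so that the "< 0" guard below is meaningful.
        ssize_t nontarget_size = nontarget_scores->size(),
                nontarget_n = nontarget_size * target_position * 1.0 / target_size,
                nontarget_position = nontarget_size - 1 - nontarget_n;
        if (nontarget_position < 0)
          nontarget_position = 0;
        // Stop as soon as the nontarget score at nontarget_position is below
        // the target score at target_position.
        if ((*nontarget_scores)[nontarget_position] <
            (*target_scores)[target_position])
          break;
      }
      *threshold = (*target_scores)[target_position];
      BaseFloat eer = target_position * 1.0 / target_size;
      return eer;
    }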
To understand this function, the comment that accompanies it in compute-eer helps:
ComputeEer computes the Equal Error Rate (EER) for the given scores and returns it as a proportion between 0 and 1. If we set the threshold at x, then the target error-rate is the proportion of target_scores below x; and the non-target error-rate is the proportion of non-target scores above x. We seek a threshold x for which these error rates are the same; this error rate is the EER. We compute this by iterating over the positions in target_scores: 0, 1, 2, and so on, and for each position consider whether the cutoff could be here. For each of these positions we compute the corresponding position in nontarget_scores where the cutoff would be if the EER were the same. For instance, if the vectors had the same length, this would be position length() - 1, length() - 2, and so on. As soon as the value at that position in nontarget_scores is less than the value from target_scores, we have our EER.
Here is a simple Python simulation of it on an example:
    # coding: utf-8
    """First produce the file of target and nontarget scores:

    python local/prepare_for_eer.py data/test/trials exp/scores_gmm_2048_ind_female/plda_scores > scores
    """
    target_scores = []
    nontarget_scores = []

    # Read the two score lists; each line is "<score> <target|nontarget>".
    with open('scores') as f:
        for line in f:
            splits = line.split()
            if splits[1] == 'target':
                target_scores.append(float(splits[0]))
            else:
                nontarget_scores.append(float(splits[0]))

    # Sort both lists in ascending order.
    target_scores = sorted(target_scores)
    nontarget_scores = sorted(nontarget_scores)
    print(target_scores)

    target_size = len(target_scores)
    nontarget_size = len(nontarget_scores)
    target_position = 0
    for target_position in range(target_size):
        # Integer division mirrors the truncation to an integer in the C++ code.
        nontarget_n = nontarget_size * target_position // target_size
        nontarget_position = nontarget_size - 1 - nontarget_n
        if nontarget_position < 0:
            nontarget_position = 0
        if nontarget_scores[nontarget_position] < target_scores[target_position]:
            print("nontarget_scores[nontarget_position] is", nontarget_position, nontarget_scores[nontarget_position])
            print("target_scores[target_position] is", target_position, target_scores[target_position])
            break

    threshold = target_scores[target_position]
    print("threshold is --> ", threshold)
    eer = target_position * 1.0 / target_size
    print("eer is --> ", eer)
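Today another thought occurred to me: PLDA is not required for scoring. Whether it is cosine distance or anything else, as long as you have scores you can compute the EER with "compute-eer score-file", where each line of the score file is a score followed by target or nontarget.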
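As an illustration of that point, here is a minimal sketch that scores trials with cosine similarity and emits lines in the same `score label` shape. It assumes each enrolled speaker and each test utterance is already represented by a fixed-length vector; the names `cosine_score`, `score_lines`, `enroll_vecs` and `test_vecs` are hypothetical, and the vectors in the demo are made up.

```python
import math


def cosine_score(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score_lines(trials, enroll_vecs, test_vecs):
    """Turn (speaker, utterance, label) trials into 'score label' lines.

    enroll_vecs: dict speaker ID -> vector (hypothetical input).
    test_vecs: dict utterance ID -> vector (hypothetical input).
    """
    lines = []
    for spk, utt, label in trials:
        score = cosine_score(enroll_vecs[spk], test_vecs[utt])
        lines.append(f"{score:.6f} {label}")
    return lines


if __name__ == "__main__":
    # Made-up two-dimensional vectors, for illustration only.
    enroll = {"spkA": [1.0, 0.0], "spkB": [0.0, 1.0]}
    test = {"utt1": [2.0, 0.5]}
    trials = [("spkA", "utt1", "target"), ("spkB", "utt1", "nontarget")]
    print("\n".join(score_lines(trials, enroll, test)))
```

Higher cosine similarity means a better match, which is the same convention as the PLDA scores above, where target trials score higher than nontarget trials.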
How to compute the threshold, plot the EER curve, and find the balance point is covered in another post:
http://blog.csdn.net/zjm750617105/article/details/60503253