Small Steps Series - ROC and AUC

Source: Internet  Editor: 程序博客网  Time: 2024/05/17 01:07

Outline

  • ROC(Receiver Operating Characteristic)
  • AUC(Area Under the Curve)

ROC(Receiver Operating Characteristic)

Wikipedia introduces ROC as follows:

Quoted from: ROC(Receiver operating characteristic)-Wikipedia

In statistics, a receiver operating characteristic curve, or ROC curve, is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied. The curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The true-positive rate is also known as sensitivity, recall or probability of detection[1] in machine learning. The false-positive rate is also known as the fall-out or probability of false alarm[1] and can be calculated as (1 − specificity). The ROC curve is thus the sensitivity as a function of fall-out. In general, if the probability distributions for both detection and false alarm are known, the ROC curve can be generated by plotting the cumulative distribution function (area under the probability distribution from −∞ to the discrimination threshold) of the detection probability in the y-axis versus the cumulative distribution function of the false-alarm probability in x-axis.
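
The construction described in the quote (plot TPR against FPR as the threshold varies) can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `roc_points` and the toy `labels` and `scores` are assumptions made up for this example, and labels are assumed to be 1 for positive and 0 for negative.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step.

    Assumes labels are 1 (positive) or 0 (negative); names are illustrative only.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")

    # Sort samples by score, highest first; each distinct score is a threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


# Toy data, invented for illustration.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
for fpr, tpr in points:
    print(f"FPR={fpr:.3f}  TPR={tpr:.3f}")
```

The curve always starts at (0, 0) and ends at (1, 1); grouping tied scores avoids drawing a staircase step that no real threshold could produce.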

AUC(Area Under the Curve)

AUC(Area Under the Curve)-Wikipedia
Summary of AUC calculation methods
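
One common way to compute AUC uses its probabilistic reading: AUC equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below follows that definition directly. The function name `auc_pairwise` and the toy data are assumptions for illustration, and the quadratic loop is chosen for clarity, not speed.

```python
def auc_pairwise(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Assumes labels are 1 or 0.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auc_pairwise(labels, scores)
print(f"AUC = {auc:.4f}")
```

For this toy data, 8 of the 9 positive-negative pairs are ordered correctly, so the result is 8/9. Integrating the ROC curve with the trapezoidal rule gives the same value.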

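Advantages of ROC and AUC

So what advantages do ROC and AUC have over other evaluation metrics?

Quoted from: ROC和AUC介绍以及如何计算AUC (An introduction to ROC and AUC and how to compute AUC)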

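With so many evaluation metrics already available, why use ROC and AUC at all? Because the ROC curve has a very useful property: when the distribution of positive and negative samples in the test set changes, the ROC curve can stay the same. Real datasets often show class imbalance, where negative samples far outnumber positive ones (or the reverse), and the ratio of positive to negative samples in the test data may also change over time. The figure below compares the ROC curve with the Precision-Recall curve[5]:
[Figure: ROC curve vs. Precision-Recall curve]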
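
The invariance claim can be checked on a small example. The sketch below is an idealized illustration under stated assumptions: the negative class is made 10 times larger by exact replication, so the score distribution within each class is unchanged. The helper `rates_at_threshold`, the threshold 0.5, and the toy data are all invented for this example. With real data the agreement would only be approximate because of sampling noise.

```python
def rates_at_threshold(labels, scores, threshold):
    """Return (TPR, FPR, precision) when predicting positive for score >= threshold.

    Assumes labels are 1 or 0 and that at least one sample is predicted positive.
    """
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    precision = tp / (tp + fp)
    return tpr, fpr, precision


# Toy data, invented for illustration: 3 positives, 3 negatives.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

# Imbalanced copy: same positives, every negative repeated 10 times.
pos = [(y, s) for y, s in zip(labels, scores) if y == 1]
neg = [(y, s) for y, s in zip(labels, scores) if y == 0]
imbalanced = pos + neg * 10
imb_labels = [y for y, _ in imbalanced]
imb_scores = [s for _, s in imbalanced]

balanced_rates = rates_at_threshold(labels, scores, 0.5)
imbalanced_rates = rates_at_threshold(imb_labels, imb_scores, 0.5)
print("balanced   (TPR, FPR, precision):", balanced_rates)
print("imbalanced (TPR, FPR, precision):", imbalanced_rates)
```

TPR and FPR are each computed within a single class, so they do not move when the class ratio changes, and neither does the ROC point. Precision mixes both classes in its denominator, which is why it drops (here from 3/4 to 3/13) and why the Precision-Recall curve shifts.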

Related reading

  • ROC(Receiver operating characteristic)-Wikipedia
  • ROC和AUC介绍以及如何计算AUC (An introduction to ROC and AUC and how to compute AUC)