Kaggle Evaluation Metrics, Part 2: Error Metrics for Classification Problems
This is essentially required reading; a brief translation is included below. (Kept here for now, to be updated continuously.)
Logarithmic Loss
The negative logarithm of the likelihood function for a Bernoulli random variable.
In plain English, this error metric is used where contestants have to predict whether something is true or false with a probability (likelihood) ranging from definitely true (1) through equally likely (0.5) to definitely false (0).
Taking the log of the error punishes predictions that are both confident and wrong. In the worst possible case, a single prediction that something is definitely true (1) when it is actually false adds an infinite penalty to your error score and makes every other entry pointless. In Kaggle competitions, predictions are bounded away from the extremes by a small value in order to prevent this.
The multi-class log loss is

$$\text{logloss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}\,\log(p_{ij})$$

where N is the number of examples, M is the number of classes, $p_{ij}$ is the predicted probability that example i belongs to class j, and $y_{ij}$ is a binary variable indicating whether class j was correct for example i. In the case where the number of classes is 2 (M = 2), the formula simplifies to:

$$\text{logloss} = -\frac{1}{N}\sum_{i=1}^{N}\left[\,y_i\log(p_i) + (1 - y_i)\log(1 - p_i)\,\right]$$
Python Code
```python
import numpy as np

def logloss(act, pred):
    # Bound predictions away from exactly 0 and 1 so log() stays finite.
    epsilon = 1e-15
    pred = np.clip(pred, epsilon, 1 - epsilon)
    act = np.asarray(act)
    ll = np.sum(act * np.log(pred) + (1 - act) * np.log(1 - pred))
    return -ll / len(act)
```
R Code
```r
MultiLogLoss <- function(act, pred){
  eps <- 1e-15
  pred <- pmin(pmax(pred, eps), 1 - eps)
  sum(act * log(pred) + (1 - act) * log(1 - pred)) * -1 / NROW(act)
}
```
Sample usage Example in Python
```python
pred = [1, 0, 1, 0]
act = [1, 0, 1, 0]
print(logloss(act, pred))
```

With exact 0/1 predictions, an unclipped implementation raises `RuntimeWarning: divide by zero encountered in log` and returns `nan`; the epsilon bound inside `logloss` is what prevents this.
Sample usage Example in R
```r
pred1 <- c(0.8, 0.2)
pred2 <- c(0.6, 0.4)
pred <- rbind(pred1, pred2)
act1 <- c(1, 0)
act2 <- c(1, 0)
act <- rbind(act1, act2)
MultiLogLoss(act, pred)
```
- Mean Consequential Error (MCE)
The mean/average of the "Consequential Error", where all errors are equally bad (1) and the only value that matters is an exact prediction (0).
Matlab code:
MCE= mean(logical(y-y_pred));
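The Matlab one-liner can be mirrored in Python; this is a minimal sketch assuming `y_true` and `y_pred` are sequences of class labels:

```python
import numpy as np

def mce(y_true, y_pred):
    # Every error counts equally as 1, and only an exact match counts as 0,
    # so MCE is simply the fraction of wrong predictions.
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

print(mce([1, 0, 2, 1], [1, 0, 1, 1]))  # 0.25: one of four predictions is wrong
```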
- Mean Average Precision@n (MAP@n)
Introduction
Parameters: n
Suppose there are m missing outbound edges from a user in a social graph, and you can predict up to n other nodes that the user is likely to follow. Then, by adapting the definition of average precision in IR (http://en.wikipedia.org/wiki/Information_retrieval, http://sas.uwaterloo.ca/stats_navigation/techreports/04WorkingPapers/2004-09.pdf), the average precision at n for this user is:
$$ap@n = \sum_{k=1}^{n} \frac{P(k)}{\min(m, n)}$$

where P(k) is the precision at cut-off k in the item list, i.e., the ratio of the number of recommended nodes followed up to position k, over k; P(k) equals 0 when the k-th item is not followed upon recommendation; m is the number of relevant nodes; n is the number of predicted nodes. If the denominator min(m, n) is zero, P(k)/min(m, n) is set to zero.
(1) If the user follows recommended nodes #1 and #3 along with another node that wasn't recommended, then ap@10 = (1/1 + 2/3)/3 ≈ 0.56
(2) If the user follows recommended nodes #1 and #2 along with another node that wasn't recommended, then ap@10 = (1/1 + 2/2)/3 ≈ 0.67
(3) If the user follows recommended nodes #1 and #3 and has no other missing nodes, then ap@10 = (1/1 + 2/3)/2 ≈ 0.83
The mean average precision for N users at position n is the average of the average precision of each user, i.e.,

$$MAP@n = \frac{1}{N}\sum_{i=1}^{N} ap@n_i$$
Note that this means order matters, but only when at least one prediction is incorrect: if all predictions are correct, it doesn't matter in which order they are given.
Thus, if you recommend two nodes A & B in that order and a user follows node A and not node B, your MAP@2 score will be higher (better) than if you recommended B and then A. This makes sense - you want the most relevant results to show up first. Consider the following examples:
(1) The user follows recommended nodes #1 and #2 and has no other missing nodes, then ap@2 = (1/1 + 1/1)/2 = 1.0
(2) The user follows recommended nodes #2 and #1 and has no other missing nodes, then ap@2 = (1/1 + 1/1)/2 = 1.0
(3) The user follows node #1 and it was recommended first along with another node that wasn't recommended, then ap@2 = (1/1 + 0)/2 = 0.5
(4) The user follows node #1 but it was recommended second along with another node that wasn't recommended, then ap@2 = (0 + 1/2)/2 = 0.25
So, it is better to submit more certain recommendations first. AP score reflects this.
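The worked examples above can be reproduced with a short Python sketch; `apk` and `mapk` are hypothetical helper names following the Kaggle-style definition (each recommendation counted once, normalized by min(m, n)):

```python
def apk(actual, predicted, k=10):
    # Average precision at k: sum P(k) over positions where the prediction
    # is relevant (and not a duplicate), divided by min(m, k).
    if not actual:
        return 0.0
    score = 0.0
    hits = 0
    for i, p in enumerate(predicted[:k]):
        if p in actual and p not in predicted[:i]:
            hits += 1
            score += hits / (i + 1)  # precision at this cut-off
    return score / min(len(actual), k)

def mapk(actual_lists, predicted_lists, k=10):
    # Mean of per-user average precision.
    return sum(apk(a, p, k) for a, p in zip(actual_lists, predicted_lists)) / len(actual_lists)

# Example (3) above: the user follows recommended nodes #1 and #3
# and has no other missing nodes.
print(apk(['A', 'C'], ['A', 'B', 'C'], k=10))  # (1/1 + 2/3)/2 ≈ 0.83
```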
Here's an easy intro to MAP: http://fastml.com/what-you-wanted-to-know-about-mean-average-precision/
Here's another intro to MAP from our forums.
Sample Implementations
- C# (Kaggle's production implementation)
- R, with test cases
- Haskell, with test cases
- MATLAB / Octave, with test cases
- Python, with test cases
Contests that used MAP@K
- MAP@500: https://www.kaggle.com/c/msdchallenge/details/Evaluation
- MAP@200: https://www.kaggle.com/c/event-recommendation-engine-challenge
- MAP@12: https://www.kaggle.com/c/outbrain-click-prediction/details/evaluation
- MAP@10: https://www.kaggle.com/c/FacebookRecruiting
- MAP@10: https://www.kaggle.com/c/coupon-purchase-prediction/details/evaluation
- MAP@7: https://www.kaggle.com/c/santander-product-recommendation
- MAP@5: https://www.kaggle.com/c/expedia-hotel-recommendations
- MAP@3: https://www.kddcup2012.org/c/kddcup2012-track1/details/Evaluation
- Multi Class Log Loss
Only an R implementation is available:

R Code
```r
library(dplyr)  # assumed: provides select(); the original snippet did not show its imports

multiloss <- function(predicted, actual){
  # to add: reorder the rows
  predicted_m <- as.matrix(select(predicted, -device_id))
  # bound predictions away from 0 and 1
  predicted_m <- apply(predicted_m, c(1, 2), function(x) max(min(x, 1 - 1e-15), 1e-15))
  actual_m <- as.matrix(select(actual, -device_id))
  score <- -sum(actual_m * log(predicted_m)) / nrow(predicted_m)
  return(score)
}
```
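For comparison, a minimal Python sketch of the same multi-class log loss, assuming integer class labels and a row-per-example probability matrix (the `device_id` bookkeeping from the R snippet is omitted):

```python
import numpy as np

def multi_log_loss(y_true, y_prob, eps=1e-15):
    # y_true: (N,) integer class labels; y_prob: (N, M) predicted probabilities.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    rows = np.arange(len(y_true))
    # Only the probability assigned to the true class contributes.
    return float(-np.mean(np.log(y_prob[rows, y_true])))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
print(multi_log_loss([0, 1], probs))  # -(log 0.7 + log 0.8) / 2
```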
A supplement from the scikit-learn documentation:
Log loss, aka logistic loss or cross-entropy loss.
This is the loss function used in (multinomial) logistic regression and extensions of it such as neural networks, defined as the negative log-likelihood of the true labels given a probabilistic classifier's predictions. The log loss is only defined for two or more labels. For a single sample with true label yt in {0,1} and estimated probability yp that yt = 1, the log loss is
-log P(yt|yp) = -(yt log(yp) + (1 - yt) log(1 - yp))
- Hamming Loss
The Hamming loss measures accuracy in a multi-label classification task, where each example may carry several labels at once. The formula is given by

$$\text{HammingLoss} = \frac{1}{NL}\sum_{i=1}^{N}\sum_{j=1}^{L}\mathbf{1}\left(y_{ij} \neq \hat{y}_{ij}\right)$$

where N is the number of examples, L is the number of labels, $y_{ij}$ and $\hat{y}_{ij}$ are the true and predicted values of label j for example i, and $\mathbf{1}(\cdot)$ is the indicator function.
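A minimal sketch in Python, assuming the labels are given as binary indicator matrices of shape (N, L):

```python
import numpy as np

def hamming_loss(y_true, y_pred):
    # Fraction of label slots that disagree between truth and prediction.
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))

yt = [[1, 0, 1], [0, 1, 0]]
yp = [[1, 1, 1], [0, 0, 0]]
print(hamming_loss(yt, yp))  # 2 mismatched entries out of 6
```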
- Mean Utility
Mean Utility is the weighted sum of true positives, true negatives, false positives, and false negatives. There are four parameters, one weight for each count. The Mean Utility score is given by

$$U = w_{tp}\,TP + w_{tn}\,TN + w_{fp}\,FP + w_{fn}\,FN$$
Kaggle's implementation of Mean Utility is directional, which means higher values are better.
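A hedged Python sketch of Mean Utility for a binary problem; the averaging over the N examples is an assumption suggested by the word "mean", since the source does not show the exact normalization:

```python
import numpy as np

def mean_utility(y_true, y_pred, w_tp, w_tn, w_fp, w_fn):
    # Hypothetical sketch: count the four confusion-matrix cells,
    # weight each count, and average over the number of examples.
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    return float(w_tp * tp + w_tn * tn + w_fp * fp + w_fn * fn) / len(y_true)

# Rewarding correct calls and penalizing mistakes equally:
print(mean_utility([1, 0, 1, 0], [1, 1, 0, 0], w_tp=1, w_tn=1, w_fp=-1, w_fn=-1))  # 0.0
```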
- Matthews Correlation Coefficient
This metric is not defined on the Kaggle wiki.
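Although the Kaggle wiki leaves this metric undefined, the standard formula from the literature is MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); a sketch:

```python
import math

def mcc(tp, tn, fp, fn):
    # Standard Matthews correlation coefficient; by convention the score
    # is 0 when any marginal sum is zero (undefined denominator).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(mcc(tp=6, tn=3, fp=1, fn=2))  # ranges from -1 (total disagreement) to +1 (perfect)
```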