Udacity Machine Learning: Evaluation Metrics

Source: Internet   Editor: 程序博客网   Time: 2024/06/07 22:19

Accuracy, Precision, and Recall

[Figure: example prediction results]

For the predictions shown in the figure above:

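  • Accuracy is (8+9)/20 = 85%: the number of correct predictions divided by the total number of predictions.
  • Precision is 8/(1+8) = 88.89%: of all samples predicted positive, the share that are actually positive.
  • Recall is 8/(2+8) = 80%: of all samples that are actually positive, the share that were predicted positive.
  • Precision and recall are usually stated for the positive class. For multiclass problems, they can be computed for each class in turn.
  • Informally, precision measures how exact the positive predictions are, and recall measures how complete they are.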
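The three definitions above reduce to arithmetic on the four cells of a 2×2 confusion matrix. The figure itself is missing, so this minimal Python 3 sketch reads the counts off the fractions in the list: TP = 8, TN = 9, FP = 1, FN = 2. The function names are illustrative only.

```python
# Counts implied by the fractions above (assumed layout of the lost figure):
# 8 true positives, 9 true negatives, 1 false positive, 2 false negatives.
TP, TN, FP, FN = 8, 9, 1, 2


def accuracy(tp, tn, fp, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Share of positive predictions that are actually positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual positives that were predicted positive."""
    return tp / (tp + fn)


print(f"accuracy  = {accuracy(TP, TN, FP, FN):.2%}")  # 85.00%
print(f"precision = {precision(TP, FP):.2%}")         # 88.89%
print(f"recall    = {recall(TP, FN):.2%}")            # 80.00%
```

Accuracy uses all four cells. Precision and recall ignore the true negatives, which is why they are more informative when the positive class is rare.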

Exercise: Powell Precision and Recall

[Figure: the table used in the Powell exercise]

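Note: This is my guess at what the table means. A different number of photos was collected for each person, so the sum of a row is that person's photo count. A classifier trained separately then predicted who appears in each photo, and the table records those predictions. In other words, it appears to be a multiclass confusion matrix, with true identities as rows and predicted identities as columns. For a given person, the largest count in that person's row marks the "positive" cell, and in this exercise those cells all lie on the diagonal. Precision and recall are therefore computed per person from the diagonal cell:

  • Precision divides the diagonal cell by that person's column sum (all photos predicted as that person).
  • Recall divides it by the row sum (all photos actually of that person).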
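The note above amounts to per-class precision and recall on a multiclass confusion matrix. The original table is lost, so this sketch uses a made-up 3×3 matrix. It assumes the scikit-learn `confusion_matrix` convention: rows are true identities and columns are predicted identities. The helper name is hypothetical.

```python
# Hypothetical 3-person confusion matrix, NOT the table from the exercise.
# Assumed convention: rows = true person, columns = predicted person.
cm = [
    [10, 2, 1],   # true person 0
    [3, 15, 0],   # true person 1
    [1, 1, 12],   # true person 2
]


def per_class_precision_recall(cm):
    """Return (precisions, recalls), one entry per class."""
    n = len(cm)
    precisions, recalls = [], []
    for k in range(n):
        tp = cm[k][k]                                  # diagonal: correct predictions of class k
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum: everything predicted as k
        actual_k = sum(cm[k])                          # row sum: everything that truly is k
        precisions.append(tp / predicted_k)
        recalls.append(tp / actual_k)
    return precisions, recalls


precisions, recalls = per_class_precision_recall(cm)
for k, (p, r) in enumerate(zip(precisions, recalls)):
    print(f"person {k}: precision={p:.3f}, recall={r:.3f}")
```

If the real table turns out to use the opposite orientation (predictions as rows), swap the row and column sums.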


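  • TRUE POSITIVES: the prediction is correct and the predicted class is positive.
  • FALSE POSITIVES: the prediction is wrong and the predicted class is positive.
  • FALSE NEGATIVES: the prediction is wrong and the predicted class is negative.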

$$\text{Precision} = \frac{tp}{tp + fp}$$

$$\text{Recall} = \frac{tp}{tp + fn}$$
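Here is a minimal Python 3 sketch of these definitions and formulas. It uses the predicted (`pre`) and true (`real`) labels from questions 9 to 14 of the mini-project. The helper names `confusion_counts` and `precision_recall` are hypothetical. Returning 0.0 when a denominator is zero is an assumption; scikit-learn's `precision_score` and `recall_score` also report 0.0 by default in that case, with a warning.

```python
# Predicted and true labels from questions 9-14 of the mini-project (1 = positive)
pre = [0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1]
real = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0]


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn) for the given positive label."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1  # predicted negative, actually positive
    return tp, fp, fn


def precision_recall(y_true, y_pred, positive=1):
    """Apply Precision = tp/(tp+fp) and Recall = tp/(tp+fn)."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    # Assumption: report 0.0 instead of dividing by zero
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


print(confusion_counts(real, pre))  # (6, 3, 2)
print(precision_recall(real, pre))
```

The precision and recall printed here match answers 13 and 14 of the mini-project.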

F1 Score

  • F1 = 2 * (precision * recall) / (precision + recall), i.e. the harmonic mean of precision and recall
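The F1 formula can be checked numerically. In this Python 3 sketch (hypothetical helper names), `f1` applies the formula as written. `f1_from_counts` uses the equivalent form 2·TP / (2·TP + FP + FN), which follows from substituting the precision and recall formulas. Both should give 16/19 for the figure example (TP = 8, FP = 1, FN = 2).

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_from_counts(tp, fp, fn):
    """Equivalent form written directly in terms of the counts."""
    return 2 * tp / (2 * tp + fp + fn)


# Figure example: precision 8/9, recall 8/10
print(round(f1(8 / 9, 8 / 10), 4))        # 0.8421
print(round(f1_from_counts(8, 1, 2), 4))  # 0.8421
```

Because it is a harmonic mean, F1 stays low unless both precision and recall are reasonably high.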

Regression Metrics

Mean Absolute Error
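Mean absolute error is the average of the absolute differences between the true values y and the predictions ŷ. Below is a minimal pure-Python sketch with made-up data. The name `mae` is illustrative; `sklearn.metrics.mean_absolute_error` computes the same quantity for a single output.

```python
def mae(y_true, y_pred):
    """Mean of |y - y_hat| over all samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical targets and predictions
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mae(y_true, y_pred))  # 0.5
```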

Mean Squared Error
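Mean squared error averages the squared differences instead. As a result, one large miss costs more than several small ones with the same total absolute error. The sketch below uses hypothetical data: two prediction sets that both have a mean absolute error of 1.0, compared by their MSE. The name `mse` is illustrative; `sklearn.metrics.mean_squared_error` should agree.

```python
def mse(y_true, y_pred):
    """Mean of (y - y_hat)**2 over all samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [0.0, 0.0, 0.0, 0.0]
spread_out = [1.0, 1.0, 1.0, 1.0]    # four errors of 1
one_big_miss = [0.0, 0.0, 0.0, 4.0]  # one error of 4

# Both prediction sets have MAE = 1.0, but squaring penalizes the big miss
print(mse(y_true, spread_out))    # 1.0
print(mse(y_true, one_big_miss))  # 4.0
```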

Regression Score Functions

  • R² score (coefficient of determination)
  • Explained variance score
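The two score functions can be written as R² = 1 − SS_res / SS_tot and explained variance = 1 − Var(y − ŷ) / Var(y). They coincide when the residuals have mean zero. Otherwise explained variance is the higher of the two, because it ignores a constant offset in the predictions. Below is a minimal sketch with made-up data. The function names are illustrative, and it assumes y_true is not constant (otherwise both denominators are zero). scikit-learn's `r2_score` and `explained_variance_score` implement these definitions.

```python
def _mean(xs):
    return sum(xs) / len(xs)


def r2(y_true, y_pred):
    """1 - SS_res / SS_tot (assumes y_true is not constant)."""
    mean_y = _mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def explained_variance(y_true, y_pred):
    """1 - Var(y - y_hat) / Var(y); a constant offset in the residuals is ignored."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_r, mean_y = _mean(residuals), _mean(y_true)
    var_res = _mean([(r - mean_r) ** 2 for r in residuals])
    var_y = _mean([(t - mean_y) ** 2 for t in y_true])
    return 1 - var_res / var_y


# Hypothetical targets and predictions; the residuals have mean -0.25
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(round(r2(y_true, y_pred), 3))                  # 0.949
print(round(explained_variance(y_true, y_pred), 3))  # 0.957
```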

Exercise: Evaluation Metrics Mini-Project

```python
#!/usr/bin/python

"""
    Starter code for the evaluation mini-project.
    Start by copying your trained/tested POI identifier from
    that which you built in the validation mini-project.

    This is the second step toward building your POI identifier!

    Start by loading/formatting the data...
"""

import pickle
import sys
sys.path.append("../tools/")
from feature_format import featureFormat, targetFeatureSplit

# Pickle files must be opened in binary mode ("rb") under Python 3
with open("../final_project/final_project_dataset.pkl", "rb") as f:
    data_dict = pickle.load(f)

### add more features to features_list!
features_list = ["poi", "salary"]

data = featureFormat(data_dict, features_list)
labels, features = targetFeatureSplit(data)


### your code goes here
import numpy as np
from sklearn import tree
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

features_train, features_test, labels_train, labels_test = train_test_split(
    features, labels, test_size=0.3, random_state=42)

clf = tree.DecisionTreeClassifier()
clf.fit(features_train, labels_train)
result = clf.predict(features_test)

# 3 answer: 4 (POIs predicted in the test set)
print(sum(1 for pred in result if pred == 1))

# 4 answer: 29 (people in the test set)
print(len(features_test))

# 5 answer: 0.862068965517 (accuracy if every prediction were 0, i.e. non-POI)
print(accuracy_score(labels_test, np.zeros(len(features_test))))

# 6 answer: 0 true positives, consistent with answers 7 and 8.
# My earlier attempt got 3, most likely because it refit clf on ALL the
# data (test set included); count on `result` from the train-only classifier.
print(sum(1 for pred, true in zip(result, labels_test)
          if pred == 1 and true == 1))

# 7 answer: 0
print(precision_score(labels_test, result))

# 8 answer: 0
print(recall_score(labels_test, result))

# 9-14: hypothetical predictions and true labels given in the quiz
pre = [0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1]
real = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0]
pairs = list(zip(pre, real))

tp = sum(1 for p, r in pairs if p == 1 and r == 1)  # 9 answer: 6 (true positives)
tn = sum(1 for p, r in pairs if p == 0 and r == 0)  # 10 answer: 9 (true negatives)
fp = sum(1 for p, r in pairs if p == 1 and r == 0)  # 11 answer: 3 (false positives)
fn = sum(1 for p, r in pairs if p == 0 and r == 1)  # 12 answer: 2 (false negatives)
print(tp, tn, fp, fn)

# 13, 14 answers: precision = 0.666666666667, recall = 0.75
print(tp / (tp + fp))
print(tp / (tp + fn))

# 15 answer: POI
# 16 answer: precision/recall
# 17 answer: recall/precision
# 18 answer: F1 score/low
# 19 answer: none
```
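Answer 5 shows why accuracy alone is a weak yardstick for the POI identifier. A "classifier" that never flags anyone already scores about 86% accuracy, yet its recall is 0. The sketch below reconstructs that baseline. It assumes the 29-person test set contains 4 POIs, since 0.862068965517 = 25/29. The label order is arbitrary and the lists are illustrative, not the real data.

```python
# Assumed test-set composition implied by answer 5 (25/29 = 0.8620689...):
# 4 POIs (label 1) and 25 non-POIs (label 0).
labels_test = [1] * 4 + [0] * 25
all_zero = [0] * len(labels_test)  # never predicts POI

correct = sum(1 for t, p in zip(labels_test, all_zero) if t == p)
tp = sum(1 for t, p in zip(labels_test, all_zero) if t == 1 and p == 1)
accuracy = correct / len(labels_test)
recall = tp / sum(labels_test)  # 4 actual POIs, none found

print(round(accuracy, 6))  # 0.862069
print(recall)              # 0.0
```

This is why answers 16 to 18 favor precision, recall, and F1 over accuracy for a rare class like POIs.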