Machine Learning in Action: Decision Trees
Source: Internet  Published: apache spark mahout  Editor: 程序博客网  Time: 2024/05/29 17:10
from math import log
import operator
import pickle


def calcShannonEnt(dataSet):
    # Shannon entropy (in bits) of the class labels stored in the last column.
    num = len(dataSet)
    labelCount = {}
    for data in dataSet:
        currentLabel = data[-1]
        labelCount[currentLabel] = labelCount.get(currentLabel, 0) + 1
    shannonEnt = 0.0
    for key in labelCount:
        p = float(labelCount[key]) / num
        shannonEnt -= p * log(p, 2)
    return shannonEnt


def createDataSet():
    # Toy data set: two binary features followed by the class label.
    dataSet = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]
    labels = ['no surfacing', 'flippers']
    return dataSet, labels


def splitDataSet(dataSet, axis, value):
    # Keep the rows whose feature at index axis equals value, with that column removed.
    retDataSet = []
    for featureVec in dataSet:
        if featureVec[axis] == value:
            temp = featureVec[:axis]
            temp.extend(featureVec[axis + 1:])
            retDataSet.append(temp)
    return retDataSet


def chooseBestFeatureToSplit(dataSet):
    # Return the index of the feature with the largest information gain.
    EntD = calcShannonEnt(dataSet)
    feaNo = len(dataSet[0]) - 1
    bestFeature = -1
    bestEntD = -1
    for i in range(feaNo):
        feati = [example[i] for example in dataSet]
        uniqueVals = set(feati)
        subEnt = 0.0
        for value in uniqueVals:
            subDataSet = splitDataSet(dataSet, i, value)
            p = len(subDataSet) / float(len(dataSet))
            subEnt += p * calcShannonEnt(subDataSet)
        newEnt = EntD - subEnt  # information gain of feature i
        if newEnt > bestEntD:
            bestEntD = newEnt
            bestFeature = i
    return bestFeature


def majorityCnt(classList):
    # Most frequent class label; used when no features are left to split on.
    classCount = {}
    for item in classList:
        classCount[item] = classCount.get(item, 0) + 1
    sortedClass = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClass[0][0]


def createTree(dataSet, labels):
    # Build the tree recursively as nested dicts: {featureLabel: {featureValue: subtree or class}}.
    classList = [item[-1] for item in dataSet]
    if len(set(classList)) == 1:
        return classList[0]  # all samples share one class
    if len(dataSet[0]) == 1:
        return majorityCnt(classList)  # only the label column is left
    bestFeature = chooseBestFeatureToSplit(dataSet)
    bestLabel = labels[bestFeature]
    bFeatureItems = [example[bestFeature] for example in dataSet]
    uniqueVals = set(bFeatureItems)
    trees = {bestLabel: {}}
    del labels[bestFeature]  # note: this modifies the caller's label list
    for value in uniqueVals:
        subDataSet = splitDataSet(dataSet, bestFeature, value)
        subLabels = labels[:]
        trees[bestLabel][value] = createTree(subDataSet, subLabels)
    return trees


def classify(inputTree, featLabels, testVec):
    # Walk the tree until a leaf (a non-dict value) is reached.
    firstStr = next(iter(inputTree))
    secTree = inputTree[firstStr]
    try:
        featIndex = featLabels.index(firstStr)
    except ValueError:
        print("List does not contain value")
        return None
    result = None  # stays None if testVec holds a value the tree has no branch for
    for key in secTree:
        if testVec[featIndex] == key:
            if isinstance(secTree[key], dict):
                result = classify(secTree[key], featLabels, testVec)
            else:
                result = secTree[key]
    return result


def storeTree(inputTree, filename):
    # pickle writes bytes, so the file is opened in binary mode.
    with open(filename, 'wb') as fw:
        pickle.dump(inputTree, fw)


def grabTree(filename):
    with open(filename, 'rb') as fr:
        return pickle.load(fr)


dataSet, label = createDataSet()
trees = createTree(dataSet, label)
print(trees)
# createTree removed entries from label, so rebuild the list before classifying.
dataSet, label = createDataSet()
r = classify(trees, label, [0, 1])
print(r)

'''
fr = open('lenses.txt')
lenses = [inst.strip().split('\t') for inst in fr.readlines()]
lenseLabel = ['age', 'prescript', 'astigmatic', 'tearRate']
trees = createTree(lenses, lenseLabel)
lenseLabel = ['age', 'prescript', 'astigmatic', 'tearRate']
result = classify(trees, lenseLabel, ['pre', 'myope', 'no', 'normal'])
print(result)
'''
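The listing above chooses each split by information gain, computed in calcShannonEnt and chooseBestFeatureToSplit. As a rough check of that step, the sketch below recomputes the entropy and the per-feature gains on the same five samples from createDataSet. The names shannon_entropy and information_gain are illustrative helpers, not part of the original code, and the sketch assumes Python 3.

```python
from collections import Counter
from math import log2


def shannon_entropy(rows):
    """Entropy (in bits) of the class label stored in the last column."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(rows, axis):
    """Entropy reduction obtained by splitting rows on the feature at index axis."""
    total = len(rows)
    remainder = 0.0
    for value in {row[axis] for row in rows}:
        subset = [row for row in rows if row[axis] == value]
        remainder += len(subset) / total * shannon_entropy(subset)
    return shannon_entropy(rows) - remainder


# Same five samples as createDataSet() in the listing above.
data = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]

base_entropy = shannon_entropy(data)
gains = [information_gain(data, axis) for axis in range(2)]
best_axis = max(range(2), key=lambda axis: gains[axis])

print(round(base_entropy, 3))        # 0.971
print([round(g, 3) for g in gains])  # [0.42, 0.171]
print(best_axis)                     # 0, i.e. 'no surfacing'
```

Feature 0 ('no surfacing') has the larger gain, so it would be expected at the root of the tree, with 'flippers' deciding the remaining mixed branch.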