Decision Tree Algorithm Explained (ID3)
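ID3 builds the tree greedily: at each node it measures the Shannon entropy of the class labels and splits on the feature with the largest information gain. For reference, these are the standard definitions the code below computes (`calcShannonEnt` and `chooseBestFeatureToSplit` respectively):

```latex
H(D) = -\sum_{k=1}^{K} p_k \log_2 p_k
\qquad
\mathrm{Gain}(D, A) = H(D) - \sum_{v \in \mathrm{Values}(A)} \frac{|D_v|}{|D|}\, H(D_v)
```

Here $p_k$ is the fraction of samples in $D$ with class $k$, and $D_v$ is the subset of $D$ where feature $A$ takes value $v$.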
Source: Internet · Editor: 程序博客网 · Date: 2024/05/17 04:32
```python
from math import log
import operator


def createDataSet():
    # Create the toy data set: two binary features and a class label
    dataSet = [[1, 1, "yes"],
               [1, 1, "yes"],
               [1, 0, "no"],
               [0, 1, "no"],
               [0, 1, "no"]]
    labels = ["no surfacing", "flippers"]
    return dataSet, labels


def calcShannonEnt(dataSet):
    # Compute the Shannon entropy of the class-label distribution
    numEntries = len(dataSet)
    labelCounts = {}
    for featVec in dataSet:
        currentLabel = featVec[-1]
        if currentLabel not in labelCounts:
            labelCounts[currentLabel] = 0
        labelCounts[currentLabel] += 1
    shannonEnt = 0.0
    for key in labelCounts:
        prob = float(labelCounts[key]) / numEntries
        shannonEnt -= prob * log(prob, 2)
    return shannonEnt


def splitDataSet(dataSet, axis, value):
    # Return the samples whose feature `axis` equals `value`,
    # with that feature column removed
    retDataSet = []
    for featVec in dataSet:
        if featVec[axis] == value:
            reducedFeatVec = featVec[:axis]
            reducedFeatVec.extend(featVec[axis + 1:])
            retDataSet.append(reducedFeatVec)
    return retDataSet


def chooseBestFeatureToSplit(dataSet):
    # Pick the feature with the largest information gain
    numFeatures = len(dataSet[0]) - 1
    baseEntropy = calcShannonEnt(dataSet)
    bestInfoGain = 0.0
    bestFeature = -1
    for i in range(numFeatures):
        featList = [example[i] for example in dataSet]
        uniqueVals = set(featList)
        newEntropy = 0.0
        for value in uniqueVals:
            subDataSet = splitDataSet(dataSet, i, value)
            prob = len(subDataSet) / float(len(dataSet))
            newEntropy += prob * calcShannonEnt(subDataSet)
        infoGain = baseEntropy - newEntropy
        if infoGain > bestInfoGain:
            bestInfoGain = infoGain
            bestFeature = i
    return bestFeature


def majorityCnt(classList):
    # Majority vote, used when no features are left to split on
    classCount = {}
    for vote in classList:
        if vote not in classCount:
            classCount[vote] = 0
        classCount[vote] += 1
    sortedClassCount = sorted(classCount.items(),
                              key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]


def createTree(dataSet, labels):
    # Recursively build the ID3 tree as nested dicts
    classList = [example[-1] for example in dataSet]
    if classList.count(classList[0]) == len(classList):
        return classList[0]            # all samples share one class: leaf
    if len(dataSet[0]) == 1:
        return majorityCnt(classList)  # no features left: majority leaf
    bestFeat = chooseBestFeatureToSplit(dataSet)
    bestFeatLabel = labels[bestFeat]
    myTree = {bestFeatLabel: {}}
    del labels[bestFeat]
    featValues = [example[bestFeat] for example in dataSet]
    uniqueVals = set(featValues)
    for value in uniqueVals:
        subLabels = labels[:]  # copy so recursion does not mutate the caller's list
        myTree[bestFeatLabel][value] = createTree(
            splitDataSet(dataSet, bestFeat, value), subLabels)
    return myTree


if __name__ == "__main__":
    myDat, labels = createDataSet()
    # print(calcShannonEnt(myDat))
    # print(splitDataSet(myDat, 0, 1))
    # print(chooseBestFeatureToSplit(myDat))
    myTree = createTree(myDat, labels)
    print(myTree)
```

(The original used Python 2 idioms — `dict.iteritems()` and `print` as a statement — which have been updated to `dict.items()` and `print()` for Python 3.)
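The nested-dict tree returned by `createTree` can also classify new samples. The `classify` helper below is not part of the original post; it is a minimal sketch that walks the dict structure produced above until it reaches a leaf (a class string):

```python
def classify(inputTree, featLabels, testVec):
    # Descend the nested-dict tree until a leaf (class string) is reached
    firstStr = list(inputTree.keys())[0]       # feature name at this node
    secondDict = inputTree[firstStr]           # branches keyed by feature value
    featIndex = featLabels.index(firstStr)     # column of that feature in testVec
    classLabel = None
    for key in secondDict:
        if testVec[featIndex] == key:
            if isinstance(secondDict[key], dict):
                classLabel = classify(secondDict[key], featLabels, testVec)
            else:
                classLabel = secondDict[key]
    return classLabel


# The tree that createTree builds from the toy data set above
tree = {"no surfacing": {0: "no", 1: {"flippers": {0: "no", 1: "yes"}}}}
print(classify(tree, ["no surfacing", "flippers"], [1, 0]))  # -> no
print(classify(tree, ["no surfacing", "flippers"], [1, 1]))  # -> yes
```

Note that `classify` needs the original `labels` list to map feature names back to column indices, so keep a copy before calling `createTree`, which deletes entries from the list it is given.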