Decision Trees

Source: Internet | Editor: 程序博客网 | Time: 2024/06/10 12:12

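A decision tree is a basic method for classification and regression. It consists of nodes and directed edges. Nodes are either internal nodes or leaf nodes: an internal node represents a feature (attribute), and a leaf node represents a class.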
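As a minimal sketch of this structure, assume the tree is stored as nested Python dicts. This representation is chosen here for illustration and is not given in the text above: an internal node is a dict keyed by the feature it tests, each edge is one value of that feature, and a leaf is a plain class label. The hand-written `tree` and the `classify` helper are hypothetical.

```python
# Hypothetical hand-written tree over two binary features.
tree = {
    'no surfacing': {                            # internal node: tests a feature
        0: 'no',                                 # edge for value 0 ends in a leaf
        1: {'flippers': {0: 'no', 1: 'yes'}},    # edge for value 1 leads to another internal node
    }
}

def classify(node, sample):
    """Follow edges from the root until a leaf (any non-dict value) is reached."""
    while isinstance(node, dict):
        feature = next(iter(node))               # the single feature tested at this node
        node = node[feature][sample[feature]]    # follow the edge matching the sample's value
    return node

print(classify(tree, {'no surfacing': 1, 'flippers': 1}))  # yes
print(classify(tree, {'no surfacing': 0, 'flippers': 1}))  # no
```

Classification only ever walks one root-to-leaf path, so its cost is bounded by the depth of the tree rather than by the number of nodes.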

Decision tree learning is, in essence, the induction of a set of classification rules from the training data set.
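One way to see the link between a tree and a rule set: every root-to-leaf path can be read as one if-then rule. The sketch below assumes a tree stored as nested dicts (feature name, then feature value, then subtree or class label). The tree here is written by hand rather than learned, and `extract_rules` is a hypothetical helper name.

```python
def extract_rules(node, conditions=()):
    """Return one (conditions, label) pair for each root-to-leaf path."""
    if not isinstance(node, dict):               # leaf: the path so far is a complete rule
        return [(list(conditions), node)]
    feature = next(iter(node))                   # the feature tested at this internal node
    rules = []
    for value, child in node[feature].items():
        rules.extend(extract_rules(child, conditions + ((feature, value),)))
    return rules

# Hypothetical hand-written tree; a learning algorithm would normally build it.
tree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}

for conds, label in extract_rules(tree):
    test = ' and '.join('%s == %r' % (f, v) for f, v in conds)
    print('if %s then class = %r' % (test, label))
```

For this tree the loop prints three rules, one per leaf.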



Computing the information entropy

    # Compute the information entropy (Shannon entropy) of a data set
    from math import log

    def calcShannonEnt(dataSet):
        numEntries = len(dataSet)
        labelCounts = {}
        # Count how often each class label (the last column) occurs
        for featVec in dataSet:
            currentLabel = featVec[-1]
            if currentLabel not in labelCounts:
                labelCounts[currentLabel] = 0
            labelCounts[currentLabel] += 1
        shannonEnt = 0.0
        # Accumulate -p * log2(p) over all class labels
        for key in labelCounts:
            prob = float(labelCounts[key]) / numEntries
            shannonEnt -= prob * log(prob, 2)
        return shannonEnt

    def createDataSet():
        dataSet = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]
        labels = ['no surfacing', 'flippers']
        return dataSet, labels

    # reload(trees.py)
    myDat, labels = createDataSet()
    # print(myDat)
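In formula form, the quantity that `calcShannonEnt` accumulates is the Shannon entropy of the class labels. The symbols below are notation introduced here, not taken from the text above: $D$ is the data set, $K$ is the number of distinct class labels, and $p_k$ is the fraction of rows in $D$ whose label is the $k$-th class.

$$ H(D) = -\sum_{k=1}^{K} p_k \log_2 p_k $$

Because the logarithm is taken to base 2, the result is measured in bits. It is 0 when all rows share one label and grows as the labels become more evenly mixed.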

Test code

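    # from trees import *
    # import trees
    from importlib import reload  # reload() is not a builtin in Python 3

    import trees
    reload(trees)

    myDat, labels = trees.createDataSet()
    print(myDat)
    print(trees.calcShannonEnt(myDat))
    myDat[0][-1] = 'maybe'  # add a third class label; the entropy should rise
    print(trees.calcShannonEnt(myDat))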
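As a hand check of the two entropy values this test prints, assuming the five-row sample returned by `createDataSet`: the original labels are 2 `yes` and 3 `no`, so

$$ H = -\left(\tfrac{2}{5}\log_2\tfrac{2}{5} + \tfrac{3}{5}\log_2\tfrac{3}{5}\right) \approx 0.971 $$

After the first row is relabelled `maybe`, the counts become 1 `yes`, 1 `maybe` and 3 `no`, so

$$ H = -\left(2\cdot\tfrac{1}{5}\log_2\tfrac{1}{5} + \tfrac{3}{5}\log_2\tfrac{3}{5}\right) \approx 1.371 $$

The second value is larger because the labels are now spread over three classes instead of two.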

