Decision Tree (ID3, C4.5) Implemented in Python
Source: Internet · Editor: 程序博客网 · 2024/06/06 09:51
After reading "Statistical Learning Methods" (统计学习方法) I tried writing a simple decision tree that splits on either information gain (ID3) or information gain ratio (C4.5). Pruning never worked out — my attempt collapsed the tree down to a root plus a single leaf — so for now there are only training and prediction functions, and the tree overfits easily.
The contact-lenses dataset is used; read the records into an np.array and training can start directly.
young myope no reduced nolenses
young myope no normal soft
young myope yes reduced nolenses
young myope yes normal hard
young hyper no reduced nolenses
young hyper no normal soft
young hyper yes reduced nolenses
young hyper yes normal hard
pre myope no reduced nolenses
pre myope no normal soft
pre myope yes reduced nolenses
pre myope yes normal hard
pre hyper no reduced nolenses
pre hyper no normal soft
pre hyper yes reduced nolenses
pre hyper yes normal nolenses
presbyopic myope no reduced nolenses
presbyopic myope no normal nolenses
presbyopic myope yes reduced nolenses
presbyopic myope yes normal hard
presbyopic hyper no reduced nolenses
presbyopic hyper no normal soft
presbyopic hyper yes reduced nolenses
presbyopic hyper yes normal nolenses
First, the helper functions that compute entropy, conditional entropy, and information gain (mutual information).
Entropy
from math import log

def getEnt(x):
    # x is a random variable (list or np.array of discrete values)
    try:
        l = x.tolist()
    except AttributeError:
        l = x
    total = len(l)
    ent = 0
    for i in set(l):
        p = l.count(i) * 1. / total
        ent += -p * log(p)
    return ent
Conditional entropy
def getConEnt(x, y):
    # x is a feature, y the class; conditional entropy of y given x
    try:
        lx = x.tolist()
        ly = y.tolist()
    except AttributeError:
        lx = x
        ly = y
    l = list(zip(lx, ly))  # materialize: zip is a one-shot iterator in Python 3
    total = len(l)
    ent = 0
    for i in set(lx):
        p = lx.count(i) * 1. / total
        ey = [v for k, v in l if k == i]
        ent += p * getEnt(ey)
    return ent
Information gain
def getMutInfo(x, y):
    # x is a feature, y the class; information gain of x with respect to y
    return getEnt(y) - getConEnt(x, y)
Information gain ratio
def getEntGainRatio(x, y):
    # C4.5: information gain normalized by the entropy of the feature itself
    gda = getMutInfo(x, y)
    had = getEnt(x)
    return gda * 1. / had
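As a quick sanity check, the quantities above can be exercised on a toy sample where one feature perfectly separates the classes; minimal re-definitions are included so the snippet runs standalone:

```python
from math import log

def getEnt(l):
    # empirical entropy (natural-log base, matching the functions above)
    total = len(l)
    return -sum((l.count(v) / total) * log(l.count(v) / total) for v in set(l))

def getConEnt(lx, ly):
    # conditional entropy H(y|x): entropy of y within each value-group of x
    total = len(lx)
    ent = 0.0
    for v in set(lx):
        sub = [c for f, c in zip(lx, ly) if f == v]
        ent += (len(sub) / total) * getEnt(sub)
    return ent

x = ['a', 'a', 'b', 'b']          # feature that perfectly separates the classes
y = ['yes', 'yes', 'no', 'no']

gain = getEnt(y) - getConEnt(x, y)  # information gain = H(y) - H(y|x)
ratio = gain / getEnt(x)            # gain ratio (C4.5)
print(round(gain, 4), round(ratio, 4))  # -> 0.6931 1.0
```

Since each value of `x` yields a pure subset, the conditional entropy is 0 and the gain equals H(y) = ln 2 ≈ 0.6931; the gain ratio is exactly 1.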
import numpy as np

def trainDecisionTree(features, classes):
    tree = {}
    # pick the feature with the largest information gain
    # (bug fix: the original reset emax on every loop iteration, so it
    #  effectively picked the last column instead of the best one)
    f_dim = features.shape[1]
    gains = [getMutInfo(features.T[i], classes) for i in range(f_dim)]
    emax = gains[0]
    maxindex = 0
    for n, e in enumerate(gains):
        if e > emax:
            emax = e
            maxindex = n
    tree.setdefault(maxindex, {})
    # count the classes under each value of the chosen feature
    di = {}
    cls_count = {}
    for i in set(classes):
        cls_count.setdefault(i, 0)
    for k, v in zip(features.T[maxindex], classes):
        di.setdefault(k, cls_count.copy())
        di[k][v] += 1
    # for each value: make a leaf if the subset is pure, otherwise recurse
    for i in di.keys():
        flag = 0
        cls = None
        for c in di[i].keys():
            if di[i][c] == sum(di[i].values()):
                flag += 1
                cls = c
        if flag == 1:
            # the subset holds only one class -> leaf node
            tree[maxindex].setdefault(i, cls)
        else:
            # drop the used column and keep splitting the matching rows
            subset = np.delete(features[features.T[maxindex] == i], maxindex, axis=1)
            subcls = classes[features.T[maxindex] == i]
            tree[maxindex].setdefault(i, trainDecisionTree(subset, subcls))
    return tree
The trained decision tree is a nested dict. Numbers such as 3, 2, 1 are column indices: the input dataset carries no feature names, so columns stand in for them.
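For illustration, the shape the trainer produces looks like the following hand-written toy tree (the concrete values here are hypothetical, not the trainer's actual output):

```python
# Format: {column_index: {feature_value: class_label_or_subtree}}
mytree = {
    3: {                        # split first on column 3 (e.g. tear rate)
        'reduced': 'nolenses',  # pure subset -> leaf holding a class label
        'normal': {             # mixed subset -> nested subtree
            # note: subtree column indices refer to the REDUCED array,
            # i.e. after column 3 was deleted by np.delete
            2: {'yes': 'hard', 'no': 'soft'},
        },
    },
}
print(mytree[3]['reduced'])  # -> nolenses
```

Each internal node is a single-key dict (the chosen column), and its values map feature values either to a class string (leaf) or to another such dict.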
Prediction and evaluation
Predicting a single sample
def predictOnce(data, tree):
    for branch in tree.keys():  # each split is stored as a single-key dict
        node = tree[branch]
        # check whether this branch is an internal node or a leaf
        if isinstance(node, dict):  # internal node
            if data[branch] in node.keys():
                child = node[data[branch]]
                if isinstance(child, dict):
                    # subtrees were trained with the used column deleted,
                    # so drop it from the sample too before recursing
                    # (bug fix: the original recursed on the full row,
                    #  which misaligned the column indices)
                    return predictOnce(np.delete(data, branch), child)
                else:
                    return child
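Because every subtree is trained on a subset whose split column was removed with np.delete, the matching column must also be dropped from the sample before recursing. A standalone sketch of that traversal, using a hand-written toy tree in the same nested-dict format:

```python
import numpy as np

def predictOnce(data, tree):
    # tree: {column_index: {feature_value: class_label_or_subtree}}
    for branch, node in tree.items():
        if data[branch] in node:
            child = node[data[branch]]
            if isinstance(child, dict):
                # subtree indices refer to the reduced array,
                # so consume the column before descending
                return predictOnce(np.delete(data, branch), child)
            return child
    return None  # feature value never seen during training

# hand-written toy tree (hypothetical values, same shape as the trainer's output)
tree = {3: {'reduced': 'nolenses',
            'normal': {2: {'yes': 'hard', 'no': 'soft'}}}}
sample = np.array(['young', 'myope', 'yes', 'normal'])
print(predictOnce(sample, tree))  # -> hard
```

The sample's column 3 is 'normal', so the traversal deletes that column and re-reads column 2 ('yes') of the shortened row, landing on the 'hard' leaf.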
Predicting a whole dataset
def predictSet(test_data, mytree):
    result = []
    for i in test_data:
        result.append(predictOnce(i, mytree))
    return np.array(result)
The evaluation function, accuracy:
def validation(test, real):
    if len(test) != len(real):
        print("Length error!")
        return
    good = 0.
    for i in range(len(test)):
        if test[i] == real[i]:
            good += 1
    return good / len(test)
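A minimal standalone sketch of the accuracy check on made-up predictions (raising on mismatched lengths instead of printing, which is easier to test):

```python
import numpy as np

def validation(test, real):
    # fraction of positions where the prediction matches the ground truth
    if len(test) != len(real):
        raise ValueError("Length error!")
    return sum(t == r for t, r in zip(test, real)) / len(test)

pred = np.array(['soft', 'hard', 'nolenses', 'soft'])
true = np.array(['soft', 'hard', 'nolenses', 'hard'])
print(validation(pred, true))  # -> 0.75
```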
# predict
result = predictSet(test_data, mytree)
# evaluate -- measured on the training data itself, hence the overfitting
validation(result, classes)