Implementing a Decision Tree Algorithm in Machine Learning

Source: Internet. Editor: 程序博客网. Time: 2024/06/06 03:00
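# DictVectorizer: converts feature dicts into a numeric 0/1 array
from sklearn.feature_extraction import DictVectorizer
# csv: the raw data lives in a CSV file; this module ships with Python, so there is nothing to install
import csv
# Data preprocessing package and decision tree package
from sklearn import preprocessing
from sklearn import tree

# Read the data from the CSV file
with open(r'D:\eclipse\mars\project\DeepLearningBasicsMachineLearning\Datasets\AllElectronics.csv', 'r', newline='') as allElectronicsData:
    # csv.reader reads the data row by row
    reader = csv.reader(allElectronicsData)
    # next() returns the first row of the CSV file (the header)
    headers = next(reader)
    # Print the header row
    print(headers)

    # Two lists: featureList holds the feature values, labelList holds the class labels
    featureList = []
    labelList = []

    # Iterate over every remaining row of the CSV file
    for row in reader:
        # The last column is the class label
        labelList.append(row[-1])
        # Store the feature values as a dict so that scikit-learn's DictVectorizer
        # can turn the categorical values into 0/1 columns.
        # The first column and the label column are skipped.
        rowDict = {}
        for i in range(1, len(row) - 1):
            rowDict[headers[i]] = row[i]
        featureList.append(rowDict)

print(featureList)

# Instantiate the vectorizer and encode the features
vec = DictVectorizer()
dummyX = vec.fit_transform(featureList).toarray()
featureNames = list(vec.get_feature_names_out())
print("dummyX:" + str(dummyX))
print(featureNames)

# Encode the labels with preprocessing.LabelBinarizer
lb = preprocessing.LabelBinarizer()
dummyY = lb.fit_transform(labelList)
print("dummyY:" + str(dummyY))
print("labelList:" + str(labelList))

# criterion is the measure used to choose the split at each node. 'entropy' selects
# information gain, the measure used by ID3; the default is 'gini' (Gini index).
# scikit-learn builds the tree with an optimized version of CART in both cases.
clf = tree.DecisionTreeClassifier(criterion='entropy')
clf = clf.fit(dummyX, dummyY)
print("clf:" + str(clf))

# Write the tree to a dot file
with open("allElectronicInformationGainOri.dot", 'w') as f:
    tree.export_graphviz(clf, feature_names=featureNames, out_file=f)

# Test code: take the first instance and change 001 -> 100, i.e. age: youth -> middle_aged
oneRowX = dummyX[0, :]
print("oneRowX:" + str(oneRowX))
# Copy the row so that the training data in dummyX is not modified
newRowX = oneRowX.copy()
newRowX[0] = 1
newRowX[2] = 0
print("newRowX:" + str(newRowX))

# Prediction code: predict() expects a 2D array of shape (n_samples, n_features)
predictedY = clf.predict(newRowX.reshape(1, -1))
print("predictedY:" + str(predictedY))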
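
The "001 -> 100" step in the test code only makes sense once you see how the feature dicts become 0/1 columns. Below is a minimal pure-Python sketch of that idea, assuming every feature value is a string. It is a simplified stand-in, not scikit-learn's implementation: `one_hot_encode` is a hypothetical helper, and the three rows are made up for illustration rather than read from AllElectronics.csv. Like DictVectorizer with its default settings, it names each column `feature=value` and sorts the columns by name.

```python
def one_hot_encode(rows):
    """Return (feature_names, matrix) for a list of {feature: category} dicts.

    Hypothetical helper that mimics what DictVectorizer does for
    string-valued features; it is not the library's own code.
    """
    # Every distinct "feature=value" pair becomes one column, sorted by name.
    names = sorted({f"{key}={value}" for row in rows for key, value in row.items()})
    index = {name: col for col, name in enumerate(names)}

    matrix = []
    for row in rows:
        vector = [0] * len(names)
        for key, value in row.items():
            # Set the column that matches this row's category to 1.
            vector[index[f"{key}={value}"]] = 1
        matrix.append(vector)
    return names, matrix


# Made-up rows for illustration only (assumption: not the real dataset).
rows = [
    {"age": "youth", "student": "no"},
    {"age": "middle_aged", "student": "no"},
    {"age": "senior", "student": "yes"},
]

names, matrix = one_hot_encode(rows)
print(names)
for vector in matrix:
    print(vector)
```

Because the columns are sorted by name, the three age columns come out in the order middle_aged, senior, youth. A youth row therefore starts with 0, 0, 1, and setting index 0 to 1 and index 2 to 0 turns it into a middle_aged row, which is exactly what the test code does to `dummyX`.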

5.9. scikit-learn decision tree documentation

URL: scikit-learn.org/stable/modules/tree.html
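
The script above selects its split measure with criterion = 'entropy', and the documentation page describes that measure in its mathematical formulation section. As a rough sketch of what it computes, the snippet below works out the entropy of a label set and the information gain of one candidate split in plain Python. The function names `entropy` and `information_gain` and the label counts are assumptions chosen for illustration; they are not taken from AllElectronics.csv or from scikit-learn's code.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(groups):
    """Entropy of the parent minus the weighted entropy of its branches.

    groups: one non-empty list of labels per branch of a candidate split.
    """
    all_labels = [label for group in groups for label in group]
    total = len(all_labels)
    remainder = sum(len(group) / total * entropy(group) for group in groups)
    return entropy(all_labels) - remainder


# Hypothetical split on a three-valued feature (counts are made up).
groups = [
    ["yes"] * 2 + ["no"] * 3,  # branch 1
    ["yes"] * 4,               # branch 2: pure, so its entropy is 0
    ["yes"] * 3 + ["no"] * 2,  # branch 3
]

parent_entropy = entropy([label for group in groups for label in group])
gain = information_gain(groups)
print(round(parent_entropy, 3), round(gain, 3))
```

With these counts the parent has 9 "yes" and 5 "no" labels, giving an entropy of about 0.940 bits, and the split recovers about 0.247 bits. A tree grown with the entropy criterion prefers, at each node, the split with the largest such gain.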
