1-2 Applying the Decision Tree Algorithm
Source: Internet  Editor: 程序博客网  Date: 2024/06/05 00:08
Dataset
Training set
RID,age,income,student,credit_rating,class_buys_computer
1,youth,high,no,fair,no
2,youth,high,no,excellent,no
3,middle_aged,high,no,fair,yes
4,senior,medium,no,fair,yes
5,senior,low,yes,fair,yes
6,senior,low,yes,excellent,no
7,middle_aged,low,yes,excellent,yes
8,youth,medium,no,fair,no
9,youth,low,yes,fair,yes
10,senior,medium,yes,fair,yes
11,youth,medium,yes,excellent,yes
12,middle_aged,medium,no,excellent,yes
13,middle_aged,high,yes,fair,yes
14,senior,medium,no,excellent,no
Test set
RID,age,income,student,credit_rating,class_buys_computer
1,youth,high,no,fair,no
2,youth,high,no,excellent,no
3,middle_aged,high,no,fair,yes
4,senior,medium,no,fair,yes
5,senior,low,yes,fair,yes
6,senior,low,yes,excellent,no
7,middle_aged,low,yes,excellent,yes
8,youth,medium,no,fair,no
9,youth,low,yes,fair,yes
10,senior,medium,yes,fair,yes
11,youth,medium,yes,excellent,yes
12,middle_aged,medium,no,excellent,yes
13,youth,medium,no,excellent,yes
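The script in the next section builds its tree with criterion='entropy', which scores candidate splits by information gain. As a rough, hand-rolled illustration (not part of the original post, and written for Python 3 with the standard library only), the sketch below computes the information gain of each attribute on the 14 training rows. The names ROWS, FEATURES, entropy and information_gain are made up for this example.

```python
import math
from collections import Counter

# Training rows from the table above: (age, income, student, credit_rating, class)
ROWS = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]
FEATURES = ["age", "income", "student", "credit_rating"]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, col):
    """Class entropy minus the weighted class entropy after splitting on column col."""
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[-1] for r in rows]) - remainder


gains = {name: information_gain(ROWS, i) for i, name in enumerate(FEATURES)}
for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print("%-14s %.3f" % (name, gain))
```

On this data age has the largest gain (about 0.247), followed by student (about 0.152), so a multiway ID3 tree would split on age first. scikit-learn splits on one binary one-hot column at a time, so the tree it produces need not have the same shape.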
Code
# coding=utf-8
# Python 2 script: note reader.next() and the print statement
from sklearn.feature_extraction import DictVectorizer
import csv
import os
from sklearn import preprocessing
from sklearn import tree

# --- Load the data ---
# Read the CSV file row by row with the csv module
allElectronicsData = open(r'./AllElectronics.csv', 'rb')
reader = csv.reader(allElectronicsData)
# The first row holds the field names
headers = reader.next()
print("headers : ", headers)

# --- Preprocessing ---
# sklearn only accepts numeric data, so every categorical value is one-hot encoded.
# Example, using age from the first data row:
#   age:    youth  middle_aged  senior
#   vector:   1         0          0
# List of feature dicts, one per row
featureList = []
# List of class labels, yes/no
labelList = []
for row in reader:
    # The last column of each row is the class label
    labelList.append(row[len(row) - 1])
    # Build one dict per row (JSON-like) that maps each header to its value, e.g. age: youth
    rowDict = {}
    # Start at i = 1 so that RID is not used as a feature
    for i in range(1, len(row) - 1):
        rowDict[headers[i]] = row[i]
    featureList.append(rowDict)
# print labelList
# print featureList

# Vectorize featureList
vec = DictVectorizer()
dummyX = vec.fit_transform(featureList).toarray()
print("dummyX : " + str(dummyX))
print("feature mapping : " + str(vec.get_feature_names()))

# Vectorize labelList with scikit-learn's LabelBinarizer
lb = preprocessing.LabelBinarizer()
dummyY = lb.fit_transform(labelList)
# print("dummyY : " + str(dummyY))

# Build the tree classifier. criterion='entropy' scores splits by information gain,
# as ID3 does; scikit-learn's tree itself is an optimized CART implementation.
clf = tree.DecisionTreeClassifier(criterion='entropy')
clf = clf.fit(dummyX, dummyY)
print("clf: " + str(clf))

# Write the tree to a dot file, then render it to PDF with Graphviz
with open('DTreeData.dot', 'w') as f:
    tree.export_graphviz(clf, feature_names=vec.get_feature_names(), out_file=f)
os.system(r"dot -Tpdf D:\Data\MyCode\codepython\ML_Base_Demo\DecisionTree\DTreeData.dot -o D:\Data\MyCode\codepython\ML_Base_Demo\DecisionTree\DTree.pdf")

# Predict with the fitted tree
# oneRow = dummyX[1,:]
# print ("one row : " + str(oneRow))
#
# newRow = oneRow
# newRow[0] = 1
# newRow[2] = 0
# print("new row x : " + str(newRow))
#
# predictedY = clf.predict(newRow)
#
# print ("predict result : " + str(predictedY))

testSet = open(r'test_set.csv', 'rb')
reader = csv.reader(testSet)
reader.next()
testList = []
for row in reader:
    # Build one dict per row that maps each header to its value, e.g. age: youth
    rowDict = {}
    # Start at i = 1 so that RID is not used as a feature
    for i in range(1, len(row) - 1):
        rowDict[headers[i]] = row[i]
    testList.append(rowDict)
# print testList

# Vectorize testList with the vectorizer fitted on the training data
# (transform, not fit_transform), so the columns line up with dummyX
testX = vec.transform(testList).toarray()
print("testX : " + str(testX))
predictSet = clf.predict(testX)
print predictSet
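To make the one-hot step less of a black box, here is a small standard-library sketch (Python 3) of roughly what DictVectorizer does with string-valued dicts: collect every "feature=value" pair, sort the names, and emit one 0/1 column per name. The helpers fit_vocabulary and one_hot are hypothetical names for this illustration, not scikit-learn functions. It also shows why test rows should be encoded with the vocabulary learned from the training data (with DictVectorizer, by calling transform on the fitted vectorizer): a vocabulary fitted on test data only lines up when the test set happens to contain every value.

```python
def fit_vocabulary(dict_rows):
    """Collect every 'feature=value' pair seen in the rows and sort the names."""
    return sorted({"%s=%s" % (k, v) for row in dict_rows for k, v in row.items()})


def one_hot(row, names):
    """Encode one feature dict as a 0/1 list ordered by the given vocabulary."""
    present = {"%s=%s" % (k, v) for k, v in row.items()}
    return [1.0 if name in present else 0.0 for name in names]


# Five training rows (RID 1, 3, 4, 5, 6) that together cover every attribute value
train_rows = [
    {"age": "youth", "income": "high", "student": "no", "credit_rating": "fair"},
    {"age": "middle_aged", "income": "high", "student": "no", "credit_rating": "fair"},
    {"age": "senior", "income": "medium", "student": "no", "credit_rating": "fair"},
    {"age": "senior", "income": "low", "student": "yes", "credit_rating": "fair"},
    {"age": "senior", "income": "low", "student": "yes", "credit_rating": "excellent"},
]
# Row 13 of the test set
test_row = {"age": "youth", "income": "medium", "student": "no",
            "credit_rating": "excellent"}

NAMES = fit_vocabulary(train_rows)
print(NAMES)
print(one_hot(test_row, NAMES))

# A vocabulary fitted on this single test row has only 4 columns instead of 10,
# so its encoding could not be fed to a tree trained on the 10-column matrix.
print(len(fit_vocabulary([test_row])))
```

The sorted names match the feature mapping printed by the script, and the encoded test row matches the last row of testX in the output.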
Output
('headers : ', ['RID', 'age', 'income', 'student', 'credit_rating', 'class_buys_computer'])
dummyX : [[ 0. 0. 1. 0. 1. 1. 0. 0. 1. 0.]
 [ 0. 0. 1. 1. 0. 1. 0. 0. 1. 0.]
 [ 1. 0. 0. 0. 1. 1. 0. 0. 1. 0.]
 [ 0. 1. 0. 0. 1. 0. 0. 1. 1. 0.]
 [ 0. 1. 0. 0. 1. 0. 1. 0. 0. 1.]
 [ 0. 1. 0. 1. 0. 0. 1. 0. 0. 1.]
 [ 1. 0. 0. 1. 0. 0. 1. 0. 0. 1.]
 [ 0. 0. 1. 0. 1. 0. 0. 1. 1. 0.]
 [ 0. 0. 1. 0. 1. 0. 1. 0. 0. 1.]
 [ 0. 1. 0. 0. 1. 0. 0. 1. 0. 1.]
 [ 0. 0. 1. 1. 0. 0. 0. 1. 0. 1.]
 [ 1. 0. 0. 1. 0. 0. 0. 1. 1. 0.]
 [ 1. 0. 0. 0. 1. 1. 0. 0. 0. 1.]
 [ 0. 1. 0. 1. 0. 0. 0. 1. 1. 0.]]
feature mapping : ['age=middle_aged', 'age=senior', 'age=youth', 'credit_rating=excellent', 'credit_rating=fair', 'income=high', 'income=low', 'income=medium', 'student=no', 'student=yes']
clf: DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=None, max_features=None, max_leaf_nodes=None, min_impurity_split=1e-07, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, presort=False, random_state=None, splitter='best')
Error: Could not open "D:\Data\MyCode\codepython\ML_Base_Demo\DecisionTree\DTree.pdf" for writing : Permission denied
testX : [[ 0. 0. 1. 0. 1. 1. 0. 0. 1. 0.]
 [ 0. 0. 1. 1. 0. 1. 0. 0. 1. 0.]
 [ 1. 0. 0. 0. 1. 1. 0. 0. 1. 0.]
 [ 0. 1. 0. 0. 1. 0. 0. 1. 1. 0.]
 [ 0. 1. 0. 0. 1. 0. 1. 0. 0. 1.]
 [ 0. 1. 0. 1. 0. 0. 1. 0. 0. 1.]
 [ 1. 0. 0. 1. 0. 0. 1. 0. 0. 1.]
 [ 0. 0. 1. 0. 1. 0. 0. 1. 1. 0.]
 [ 0. 0. 1. 0. 1. 0. 1. 0. 0. 1.]
 [ 0. 1. 0. 0. 1. 0. 0. 1. 0. 1.]
 [ 0. 0. 1. 1. 0. 0. 0. 1. 0. 1.]
 [ 1. 0. 0. 1. 0. 0. 0. 1. 1. 0.]
 [ 0. 0. 1. 1. 0. 0. 0. 1. 1. 0.]]
[0 0 1 1 1 0 1 0 1 1 1 1 0]
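The script prints the predictions but never compares them with the class_buys_computer column of the test set. A minimal Python 3 check is sketched below; it assumes LabelBinarizer's usual alphabetical class order, so 0 stands for no and 1 for yes. The variable names are illustrative only.

```python
# Predictions copied from the last line of the output (0 = no, 1 = yes)
predicted = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0]

# class_buys_computer column of the test set, in row order (RID 1 to 13)
actual_labels = ["no", "no", "yes", "yes", "yes", "no", "yes",
                 "no", "yes", "yes", "yes", "yes", "yes"]

# Assumed mapping: LabelBinarizer sorts the classes, giving no -> 0, yes -> 1
CLASS_TO_INT = {"no": 0, "yes": 1}
actual = [CLASS_TO_INT[label] for label in actual_labels]

# RIDs of the rows where the prediction differs from the true label
mismatches = [rid for rid, (p, a) in enumerate(zip(predicted, actual), start=1)
              if p != a]
accuracy = 1 - len(mismatches) / len(actual)

print("misclassified RIDs:", mismatches)
print("accuracy: %.3f" % accuracy)
```

Only row 13 (youth, medium, no, excellent) is misclassified, which gives 12 correct out of 13. Since the first 12 test rows are copies of training rows, row 13 is the only genuinely unseen example, so this figure says little about how well the tree generalizes.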