Decision Trees (Practice)
Source: Internet  Editor: 程序博客网  Time: 2024/05/18 03:32
Decision tree experiment
1. Prepare the data (E:\MachineLearning-data\AllElectronics.csv)
2. Experiment code
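# -*- coding: utf-8 -*-
# Build a decision tree and use it for prediction
import csv

from sklearn import preprocessing
from sklearn import tree
from sklearn.feature_extraction import DictVectorizer

# 1. Read the data. In 'rt' (text) mode Python converts \r\n to \n while reading.
#    The encoding must match the encoding the file was saved with.
allElectronicsData = open(r'E:\MachineLearning-data\AllElectronics.csv', 'rt', encoding="utf-8")
reader = csv.reader(allElectronicsData)
headers = next(reader)
# Print the attribute names
print(headers)

# 2. Store the data
# featuresList: one dict per row holding the values of age, Income, student, credit_rating
# labelList: the class label of each row
featuresList = []
labelList = []
for row in reader:
    labelList.append(row[len(row) - 1])
    rowDict = {}
    for i in range(1, len(row) - 1):
        rowDict[headers[i]] = row[i]
    featuresList.append(rowDict)
allElectronicsData.close()

# 3. Vectorize the data (one 0/1 column per attribute=value pair)
vec = DictVectorizer()
dummyX = vec.fit_transform(featuresList).toarray()
print("dummyX:" + str(dummyX))
# Print the names of the encoded columns
# (get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out() there)
print(vec.get_feature_names())
# Print the class labels of the training set
print("labelList:" + str(labelList))

# 4. Convert the class labels to numbers
lb = preprocessing.LabelBinarizer()
dummyY = lb.fit_transform(labelList)
print("dummyY:" + str(dummyY))

# 5. The data is ready; configure and train the decision tree
clf = tree.DecisionTreeClassifier(criterion='entropy')
clf = clf.fit(dummyX, dummyY)
print("clf:" + str(clf))

# 6. Write the tree to a .dot file
with open(r"E:\MachineLearning-data\AllElectronicInformationGainOri.dot", 'w') as f:
    tree.export_graphviz(clf, feature_names=vec.get_feature_names(), out_file=f)

# 7. Predict for a given sample: take the first row of the data
oneRowX = dummyX[0, :]
print("oneRowX: " + str(oneRowX))
# Change some values; copy first so that dummyX itself is not modified
newRowX = oneRowX.copy()
newRowX[0] = 1
newRowX[2] = 1
print("newRowX: " + str(newRowX))

# 8. Print the prediction. predict() expects a 2-D array with one row per sample;
#    passing a 1-D array is deprecated since scikit-learn 0.17 and raises ValueError from 0.19.
predictedY = clf.predict(newRowX.reshape(1, -1))
print("predictedY: " + str(predictedY))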
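Step 3 of the script turns each row's attribute dictionary into a vector of 0s and 1s. As a rough illustration of that idea, here is a plain-Python sketch (not scikit-learn's implementation; the helper name one_hot_encode and the two sample records are made up for this example) in which every distinct "attribute=value" pair becomes one column:

```python
def one_hot_encode(records):
    """Hypothetical helper: one-hot encode a list of {attribute: value} dicts."""
    # Collect every distinct "attribute=value" pair; sorting makes the
    # column order deterministic.
    names = sorted({f"{key}={value}" for rec in records for key, value in rec.items()})
    index = {name: col for col, name in enumerate(names)}
    rows = []
    for rec in records:
        row = [0.0] * len(names)
        for key, value in rec.items():
            row[index[f"{key}={value}"]] = 1.0
        rows.append(row)
    return names, rows


# Two made-up records, for illustration only
sample_records = [
    {"age": "youth", "student": "no"},
    {"age": "senior", "student": "yes"},
]
names, rows = one_hot_encode(sample_records)
print(names)
print(rows)
```

Sorting the column names also explains why 'Income=high' appears before 'age=...' in the experiment output: uppercase letters sort before lowercase ones.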
3. Experiment results
"D:\Program Files\Python\Anaconda\python.exe" E:/Python/machinelearning/01.py
['\ufeffRID', 'age', 'Income', 'student', 'credit_rating', 'Class_buys_computer']
dummyX:[[ 1. 0. 0. 0. 0. 1. 0. 1. 1. 0.]
 [ 1. 0. 0. 0. 0. 1. 1. 0. 1. 0.]
 [ 1. 0. 0. 1. 0. 0. 0. 1. 1. 0.]
 [ 0. 0. 1. 0. 1. 0. 0. 1. 1. 0.]
 [ 0. 1. 0. 0. 1. 0. 0. 1. 0. 1.]
 [ 0. 1. 0. 0. 1. 0. 1. 0. 0. 1.]
 [ 0. 1. 0. 1. 0. 0. 1. 0. 0. 1.]
 [ 0. 0. 1. 0. 0. 1. 0. 1. 1. 0.]
 [ 0. 1. 0. 0. 0. 1. 0. 1. 0. 1.]
 [ 0. 0. 1. 0. 1. 0. 0. 1. 0. 1.]
 [ 0. 0. 1. 0. 0. 1. 1. 0. 0. 1.]
 [ 0. 0. 1. 1. 0. 0. 1. 0. 1. 0.]
 [ 1. 0. 0. 1. 0. 0. 0. 1. 0. 1.]
 [ 0. 0. 1. 0. 1. 0. 1. 0. 1. 0.]]
['Income=high', 'Income=low', 'Income=medium', 'age=middle_aged', 'age=senior', 'age=youth', 'credit_rating=excellent', 'credit_rating=fair', 'student=no', 'student=yes']
labelList:['no', 'no', 'yes', 'yes', 'yes', 'no', 'yes', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no']
dummyY:[[0]
 [0]
 [1]
 [1]
 [1]
 [0]
 [1]
 [0]
 [1]
 [1]
 [1]
 [1]
 [1]
 [0]]
clf:DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=None,
 max_features=None, max_leaf_nodes=None, min_impurity_split=1e-07,
 min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0,
 presort=False, random_state=None, splitter='best')
oneRowX: [ 1. 0. 0. 0. 0. 1. 0. 1. 1. 0.]
newRowX: [ 1. 0. 1. 0. 0. 1. 0. 1. 1. 0.]
predictedY: [0]
D:\Program Files\Python\Anaconda\lib\site-packages\sklearn\utils\validation.py:395: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
 DeprecationWarning)
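The classifier is created with criterion='entropy', so splits are scored by how much they reduce entropy. The sketch below computes the entropy of the class labels and the information gain of the age attribute by hand. The age values are decoded from the age=... columns of dummyX printed above and the labels are copied from labelList; the function names are made up for this example. scikit-learn grows binary splits on the individual 0/1 columns, so this multiway calculation only illustrates the criterion and is not a replay of what the library does internally.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on values."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Age of each of the 14 rows, decoded from the one-hot columns of dummyX
ages = ["youth", "youth", "middle_aged", "senior", "senior", "senior", "middle_aged",
        "youth", "youth", "senior", "youth", "middle_aged", "middle_aged", "senior"]
# Class labels, copied from labelList
labels = ["no", "no", "yes", "yes", "yes", "no", "yes",
          "no", "yes", "yes", "yes", "yes", "yes", "no"]

print("entropy of labels:", round(entropy(labels), 3))
print("information gain of age:", round(information_gain(ages, labels), 3))
```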
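4. Convert the dot file to PDF (command: dot -Tpdf E:\MachineLearning-data\AllElectronicInformationGainOri.dot -o E:\MachineLearning-data\AllElectronics.pdf; the input must be the .dot file that the script wrote in step 6 of the code)
Graphviz, the software that converts dot to pdf, is described in section 6.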
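The same conversion can also be started from Python. This is a minimal sketch, assuming the Graphviz dot executable is on PATH; the function names and the tree.dot / tree.pdf file names are placeholders chosen for this example.

```python
import subprocess


def build_dot_command(dot_file, pdf_file):
    """Return the argument list for: dot -Tpdf <dot_file> -o <pdf_file>."""
    return ["dot", "-Tpdf", dot_file, "-o", pdf_file]


def convert_dot_to_pdf(dot_file, pdf_file):
    """Run Graphviz; raises CalledProcessError if dot reports a failure."""
    subprocess.run(build_dot_command(dot_file, pdf_file), check=True)


# Placeholder file names; substitute the real paths before converting.
cmd = build_dot_command("tree.dot", "tree.pdf")
print(" ".join(cmd))
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces, such as directories under "Program Files".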
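5. Error summary
1) Error 1
When reading a file, Python reports "UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 205: illegal multibyte sequence". This happens when open() is called without an encoding on a system whose default encoding is gbk, while the file is saved in a different encoding such as UTF-8.
Solution 1: state the file's encoding explicitly.
FILE_OBJECT = open('order.log', 'r', encoding='UTF-8')
Solution 2: open the file in binary mode and decode the bytes yourself.
FILE_OBJECT = open('order.log', 'rb')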
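The error can be reproduced on any system by forcing the wrong codec. The sketch below writes a one-character UTF-8 file to a temporary location (the character and the temporary file are chosen for this example), reads it back as gbk, and then reads it correctly as UTF-8.

```python
import os
import tempfile

# Write a tiny UTF-8 file. U+4E00 encodes to the bytes E4 B8 80, and the
# trailing 0x80 is not a valid byte for Python's gbk codec.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".log", delete=False) as tmp:
    tmp.write("\u4e00")
    path = tmp.name

try:
    with open(path, "r", encoding="gbk") as f:
        f.read()
    decode_failed = False
except UnicodeDecodeError as err:
    decode_failed = True
    print("reading as gbk failed:", err)

# Reading with the encoding the file was written in works.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()
print("reading as utf-8 worked:", ascii(text))

os.remove(path)
```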
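2) Error 2
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
Cause: csv.reader must iterate over text (str) lines, but the file was opened in binary mode (binary mode does not accept an encoding argument, so none is given here):
open(r'E:\MachineLearning-data\AllElectronics.csv', 'rb')
Solution: open the file in text mode.
open(r'E:\MachineLearning-data\AllElectronics.csv', 'rt', encoding="utf-8")
Notes:
rb: opens the file read-only in binary mode; reads return bytes.
rt: opens the file for reading in text mode; Python converts \r\n to \n while reading.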
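The difference between the two modes can be checked without touching the disk. In this sketch an in-memory byte buffer stands in for the file; the two CSV lines are sample data made up for the example.

```python
import csv
import io

# Sample bytes standing in for a CSV file on disk
raw = b"RID,age\r\n1,youth\r\n"

# Binary stream: csv.reader receives bytes and refuses them.
try:
    next(csv.reader(io.BytesIO(raw)))
    binary_error = None
except csv.Error as err:
    binary_error = str(err)
print("binary mode:", binary_error)

# Text stream: decode first; newline="" lets the csv module handle line endings.
text_stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")
rows = list(csv.reader(text_stream))
print("text mode:", rows)
```

The csv module documentation recommends newline="" for files passed to csv.reader, which is why the text stream is created that way here.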
3) Error 3
The encoding of the .csv file must match the encoding used when reading or writing it.
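A related detail is visible in the experiment output: the first header is printed as '\ufeffRID', which means the file starts with a UTF-8 byte order mark that the plain utf-8 codec keeps as a character. The sketch below shows that Python's utf-8-sig codec strips it. The header line is sample data and read_header is a helper name made up for this example.

```python
import csv
import io

# A header line saved as UTF-8 with a leading byte order mark (BOM)
raw = "\ufeffRID,age\n".encode("utf-8")


def read_header(data, encoding):
    """Decode the bytes with the given encoding and return the first CSV row."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return next(csv.reader(stream))


header_utf8 = read_header(raw, "utf-8")      # BOM stays in the first field
header_sig = read_header(raw, "utf-8-sig")   # BOM is removed
print(ascii(header_utf8))
print(ascii(header_sig))
```

In the experiment script the stray BOM is harmless because the RID column is skipped, but it would break a lookup such as row['RID'] when using csv.DictReader.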
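6. Install graphviz
1) Download: the first package is the installer, the second is the portable (no-install) version
2) Install and configure the environment variable
a. Configure the environment variable (add the Graphviz bin directory to the system variable PATH)
b. Check that the installation is correct (for example, running dot -V in a new command prompt should print the Graphviz version)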
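The PATH check in step b can also be done from Python. This is a small sketch using only the standard library; find_graphviz_dot is a name made up for this example.

```python
import shutil


def find_graphviz_dot():
    """Return the full path of the dot executable, or None if it is not on PATH."""
    return shutil.which("dot")


dot_path = find_graphviz_dot()
if dot_path is None:
    print("dot was not found on PATH; check the environment variable")
else:
    print("dot found at", dot_path)
```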