3.2 Applying the Decision Tree Algorithm

Source: Internet  Editor: 程序博客网  Time: 2024/05/29 07:16
  1. Python

  2. Python machine learning library: scikit-learn

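    2.1 Features:
    Simple and efficient tools for data mining and machine learning analysis
    Accessible to all users and highly reusable across different needs
    Built on NumPy, SciPy, and matplotlib
    Open source and commercially usable: BSD license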

    2.2 Problem areas covered:
    Classification, regression, clustering, dimensionality reduction
    Model selection, preprocessing

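  3. Using scikit-learn
    Installing scikit-learn: pip, easy_install, or the Windows installer
    Installing the required packages: NumPy, SciPy, and matplotlib; Anaconda can be used instead (it bundles NumPy, SciPy, and other common scientific computing packages)
    Installation caveats: Python interpreter version (2.7 or 3.4?), 32-bit or 64-bit system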
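The two installation caveats above (interpreter version and 32-bit vs. 64-bit) can be checked before installing anything. The following is a minimal sketch using only the standard library; the helper name `describe_interpreter` is made up for illustration.

```python
import platform
import struct
import sys


def describe_interpreter():
    """Return (major, minor, bits) for the running Python interpreter."""
    # The size of a C pointer tells us whether this is a 32-bit or 64-bit build.
    bits = struct.calcsize("P") * 8
    return sys.version_info.major, sys.version_info.minor, bits


major, minor, bits = describe_interpreter()
print("Python %d.%d, %d-bit build" % (major, minor, bits))
print("Platform:", platform.platform())
```

Note that a 32-bit interpreter can run on a 64-bit operating system, so the bitness of the interpreter itself is what matters when picking a binary package.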

  4. Example:

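from sklearn.feature_extraction import DictVectorizer
import csv
from sklearn import tree, preprocessing

featureList = []
labelList = []

# Read the CSV file; the first row holds the column headers
with open(r'C://AllElectronics.csv', newline='') as allElectronicsData:
    reader = csv.reader(allElectronicsData)
    headers = next(reader)  # Python 3; the Python 2 form was reader.next()
    print(headers)

    for row in reader:
        # The last column is the class label
        labelList.append(row[len(row) - 1])
        # Columns 1 to n-2 are the features (column 0 is skipped)
        rowDict = {}
        for i in range(1, len(row) - 1):
            rowDict[headers[i]] = row[i]
        featureList.append(rowDict)

print(featureList)

# One-hot encode the categorical features
vec = DictVectorizer()
dummyX = vec.fit_transform(featureList).toarray()
# get_feature_names_out() requires scikit-learn >= 1.0;
# older releases used get_feature_names()
featureNames = list(vec.get_feature_names_out())

print("dummyX:" + str(dummyX))
print(featureNames)
print("labelList:" + str(labelList))

# Encode the class labels as 0/1
lb = preprocessing.LabelBinarizer()
dummyY = lb.fit_transform(labelList)
print("dummyY:" + str(dummyY))

# Train a decision tree that splits on information gain (entropy)
clf = tree.DecisionTreeClassifier(criterion="entropy")
clf = clf.fit(dummyX, dummyY)
print("clf:" + str(clf))

# Export the tree in Graphviz .dot format
with open("allElectronicInformationGainOri.dot", 'w') as f:
    tree.export_graphviz(clf, out_file=f, feature_names=featureNames)

oneRowX = dummyX[0, :]
print("oneRowx:" + str(oneRowX))

# Copy the first row and change two encoded values to build a new sample
newRowX = oneRowX.copy()  # copy so that dummyX itself is not modified
newRowX[0] = 1
newRowX[1] = 0
print("newRowX:" + str(newRowX))

# predict() expects a 2-D array with shape (n_samples, n_features)
predictedY = clf.predict(newRowX.reshape(1, -1))
print("predictedY:" + str(predictedY))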
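The example above passes `criterion="entropy"`, which asks the tree to pick splits by information gain. As a rough illustration of what that measure computes, here is a minimal standard-library sketch. It is not scikit-learn's implementation: the helper names `entropy` and `information_gain` and the four-row toy data set are assumptions made up for this example and are not taken from AllElectronics.csv.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, feature):
    """Entropy of the labels minus the weighted entropy after splitting on feature."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        # Group the labels by the value this row has for the chosen feature
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data, invented for illustration only
rows = [
    {"age": "youth", "student": "no"},
    {"age": "youth", "student": "yes"},
    {"age": "senior", "student": "no"},
    {"age": "senior", "student": "yes"},
]
labels = ["no", "yes", "no", "yes"]

print(entropy(labels))                            # 1.0
print(information_gain(rows, labels, "student"))  # 1.0
print(information_gain(rows, labels, "age"))      # 0.0
```

In this toy data `student` separates the two classes perfectly, so it has the highest gain and would be chosen for the first split, while `age` tells us nothing about the label.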