Trying Out Orange for Data Mining: Classification


Environment

--------------------------------------------------------

Kubuntu 12.04/Python 2.7.3/Orange 2.0b  

Preparation

--------------------------------------------------------
#1. Download the Orange source and the NumPy source
#2. Build NumPy
#3. Install the Python development package
sudo apt-get install python-dev
#4. Install the Python networkx package
sudo apt-get install python-networkx
#5. Build Orange
python install.py build

Test

--------------------------------------------------------
>>> import orange  # import the orange module
>>> orange.version
'2.0b (21:58:41, Nov 3 2012)'
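The same check can be scripted for all the packages from the preparation steps at once. The following is only a minimal sketch: the helper name `check_modules` is hypothetical, and the module names `numpy`, `networkx` and `orange` are assumed from the steps above.

```python
import importlib


def check_modules(names):
    """Map each module name to True if it can be imported, else False."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


# Module names assumed from the preparation steps
for name, ok in sorted(check_modules(["numpy", "networkx", "orange"]).items()):
    print("%-10s %s" % (name, "ok" if ok else "missing"))
```

A module reported as missing suggests that the corresponding build or install step did not complete.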

Classification

--------------------------------------------------------
Download a test data set from the UCI Machine Learning Repository, for example Voting.tab.

Naive Bayes classifier

--------------------------------------------------------

import orange
data = orange.ExampleTable("voting")  # loads voting.tab
classifier = orange.BayesLearner(data)  # train Naive Bayes on the data
for i in range(5):
    c = classifier(data[i])
    # Python 2 print statement; print(...) with several arguments would print a tuple
    print "original", data[i].getclass(), "classified as", c
Output
original republican classified as republican
original republican classified as republican
original democrat classified as republican
original democrat classified as democrat
original democrat classified as democrat
As the output shows, Naive Bayes misclassifies the third instance, while the other four are classified correctly.
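To make it clearer what a Naive Bayes classifier does with such data, here is a minimal pure-Python sketch for categorical attributes. It is not Orange's implementation: the function names `train_naive_bayes` and `classify`, the Laplace smoothing, and the toy rows are all assumptions made for illustration, and `orange.BayesLearner` may estimate probabilities differently.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(rows, labels):
    """Count class and per-attribute value frequencies from categorical data."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, class_label)][value] -> count
    value_counts = defaultdict(Counter)
    # distinct values seen for each attribute, used for smoothing
    values_seen = defaultdict(set)
    for row, label in zip(rows, labels):
        for index, value in enumerate(row):
            value_counts[(index, label)][value] += 1
            values_seen[index].add(value)
    return class_counts, value_counts, values_seen


def classify(model, row):
    """Return the class with the highest log-posterior score for one row."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = math.log(float(count) / total)  # log prior
        for index, value in enumerate(row):
            seen = value_counts[(index, label)][value]
            # Laplace (add-one) smoothing avoids log(0) for unseen values
            score += math.log((seen + 1.0) / (count + len(values_seen[index])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy votes ("y"/"n" on three bills), not taken from voting.tab
rows = [("n", "y", "y"), ("n", "y", "n"), ("y", "n", "n"),
        ("y", "n", "y"), ("y", "y", "n")]
labels = ["republican", "republican", "democrat", "democrat", "democrat"]

model = train_naive_bayes(rows, labels)
prediction = classify(model, ("n", "y", "y"))
print("classified as %s" % prediction)
```

The classifier multiplies the class prior by the per-attribute likelihoods (summed here as logarithms) and picks the class with the largest product. Treating the attributes as independent given the class is the "naive" assumption.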


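import orange
data = orange.ExampleTable("voting")
classifier = orange.BayesLearner(data)
print "Possible classes:", data.domain.classVar.values
correct_num = 0  # counter of correctly classified instances
for i in data:
    a = i.getclass()  # true class
    b = classifier(i)  # predicted class
    if a == b:
        correct_num += 1
# classification accuracy (CA) over the whole data set
print "CA:%.4f" % (float(correct_num) / len(data))
Output
Possible classes: <republican,democrat>
CA:0.9034
Over the whole data set, Naive Bayes classifies about 90% of the instances correctly, which is decent but not outstanding. Note that this accuracy is measured on the same data the classifier was trained on.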
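One way to put a CA value in context is to compare it with the accuracy of always predicting the most frequent class. The sketch below shows both computations in plain Python. The function names are hypothetical and the label lists are made-up illustration data, not results from voting.tab.

```python
from collections import Counter


def classification_accuracy(actual, predicted):
    """Fraction of positions where the predicted label equals the actual one."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return float(correct) / len(actual)


def majority_baseline_accuracy(actual):
    """Accuracy obtained by always predicting the most frequent class."""
    most_common_count = Counter(actual).most_common(1)[0][1]
    return float(most_common_count) / len(actual)


# Hypothetical labels for illustration only, not results from voting.tab
actual = ["democrat", "democrat", "republican", "democrat", "republican"]
predicted = ["democrat", "republican", "republican", "democrat", "republican"]

ca = classification_accuracy(actual, predicted)
baseline = majority_baseline_accuracy(actual)
print("CA:%.4f baseline:%.4f" % (ca, baseline))
```

A classifier is only useful to the extent that its CA exceeds this baseline. For a fairer estimate, the CA would also be computed on instances that were held out from training.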

References

--------------------------------------------------------
* Orange reference : http://orange.biolab.si/doc/reference/
* Orange tutorial : http://orange.biolab.si/doc/tutorial/