Mahout Decision Tree Source Code Analysis (3-1): Building a Tree in Practice
The previous post analyzed the main tree-building steps of the Partial Implementation. Below, we use the Mahout source code to build a tree ourselves.
(Note: the MapReduce project must have the jars from http://download.csdn.net/detail/fansy1990/5030740 on its classpath; otherwise the console messages shown below will not appear.)
Create the following class file:
```java
package org.fansy.forest.test;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.List;
import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.classifier.df.Bagging;
import org.apache.mahout.classifier.df.builder.DecisionTreeBuilder;
import org.apache.mahout.classifier.df.builder.TreeBuilder;
import org.apache.mahout.classifier.df.data.Data;
import org.apache.mahout.classifier.df.data.DataConverter;
import org.apache.mahout.classifier.df.data.Dataset;
import org.apache.mahout.classifier.df.data.Instance;
import org.apache.mahout.classifier.df.node.Node;
import org.apache.mahout.common.RandomUtils;

import com.google.common.collect.Lists;

public class TestBuildTree {

    /**
     * Use the Mahout source code to build a decision tree.
     */
    public static void main(String[] args) throws IOException {
        Path dsPath = new Path("/home/fansy/workspace/MahTestDemo/car_small.info");
        String dataPath = "/home/fansy/mahout/data/forest/car_test_small.txt";
        Random rng = RandomUtils.getRandom(555);
        // load the dataset descriptor
        Dataset ds = Dataset.load(new Configuration(), dsPath);
        // create the converter
        DataConverter converter = new DataConverter(ds);
        // load the data
        Data data = loadData(ds, converter, dataPath);
        // create a TreeBuilder and build the tree
        TreeBuilder treeBuilder = new DecisionTreeBuilder();
        Bagging bag = new Bagging(treeBuilder, data);
        Node tree = bag.build(rng);
        System.out.println("the tree is built: " + tree);
    }

    /**
     * Load data from the given path.
     *
     * @param ds        the dataset descriptor
     * @param converter the DataConverter
     * @param dataPath  the data file path
     * @return the loaded Data
     */
    public static Data loadData(Dataset ds, DataConverter converter, String dataPath)
            throws IOException {
        List<Instance> instances = Lists.newArrayList();
        File dataSourcePath = new File(dataPath);
        BufferedReader br = new BufferedReader(new FileReader(dataSourcePath));
        try {
            String line;
            while ((line = br.readLine()) != null) {
                instances.add(converter.convert(line));
            }
        } finally {
            br.close();
        }
        System.out.println("load file to Data done ...");
        return new Data(ds, instances);
    }
}
```

The car_small.info descriptor can be generated as described in http://blog.csdn.net/fansy1990/article/details/8443342; the raw data file is the same car_test_small.txt used in the previous post (http://blog.csdn.net/fansy1990/article/details/8544344).
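Before the `TreeBuilder` is invoked, `Bagging.build` draws a bootstrap sample of the training set with replacement, which is why the "bag data which is changed" list in the output below repeats some instances and omits others. The sampling step can be sketched as follows (a conceptual illustration, not Mahout's actual implementation):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class BaggingSketch {

    // Draw a bootstrap sample of the same size as the input, with replacement.
    public static <T> List<T> bootstrap(List<T> data, Random rng) {
        List<T> sample = new ArrayList<>(data.size());
        for (int i = 0; i < data.size(); i++) {
            // each draw picks a uniformly random instance from the full set
            sample.add(data.get(rng.nextInt(data.size())));
        }
        return sample;
    }

    public static void main(String[] args) {
        List<String> data = List.of("a", "b", "c", "d", "e");
        // the sample has the same size, but usually repeats some elements
        System.out.println(bootstrap(data, new Random(555)));
    }
}
```

Because the sample is drawn with replacement, on average about 63% of the distinct instances appear in the bag; the rest are "out of bag" and can later be used for error estimation.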
Running it directly produces the following output:
```
load file to Data done ...
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/fansy/hadoop-1.0.2/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/fansy/mahout-0.7-pure/examples/target/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/fansy/mahout-0.7-pure/core/target/mahout-core-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
__________________________ bag data which is changed
-----------------------
{1:2.0,2:1.0,3:2.0,4:1.0,5:1.0,6:1.0}{0:1.0,1:2.0,2:3.0,4:1.0}{0:1.0,1:1.0,3:2.0,4:1.0,5:2.0}{0:2.0,2:3.0}{0:1.0,2:3.0,3:1.0,4:2.0,5:1.0,6:1.0}{1:2.0,2:1.0,3:1.0,5:2.0}{0:3.0,1:2.0,2:2.0,3:1.0,4:2.0,5:1.0,6:1.0}{0:1.0,1:1.0,2:2.0,4:1.0}{0:3.0,1:1.0,2:1.0,3:2.0,4:1.0,5:1.0,6:1.0}{0:3.0,1:1.0,2:1.0,3:2.0}{0:2.0,2:3.0,5:2.0}{1:2.0,2:1.0,3:2.0,4:1.0,5:1.0,6:1.0}{1:2.0,2:3.0,4:2.0}{0:1.0,1:3.0,2:1.0,4:2.0,5:1.0}{1:2.0,3:2.0,4:2.0,5:1.0,6:1.0}{0:2.0,1:1.0,4:2.0}{0:1.0,1:1.0,2:2.0,4:1.0}{0:2.0,1:2.0,2:3.0,3:1.0,4:1.0,5:1.0,6:1.0}{0:2.0,2:3.0}{0:1.0,1:1.0,3:2.0,4:1.0,5:2.0}{0:3.0,2:2.0,4:2.0,5:1.0}{0:3.0,1:3.0,3:1.0,5:2.0}{2:3.0,4:2.0,5:1.0}{0:1.0,1:1.0,2:2.0,4:1.0}{1:2.0,3:2.0,4:2.0,5:1.0,6:1.0}{0:1.0,1:2.0,2:3.0,4:1.0}{1:1.0,2:3.0,3:1.0,4:1.0,5:1.0,6:1.0}{1:1.0,2:3.0,4:2.0,5:1.0}{0:2.0,1:1.0,2:1.0,3:1.0,4:2.0,5:2.0}{0:2.0,1:2.0,2:3.0,3:1.0,4:1.0,5:1.0,6:1.0}{0:3.0,2:2.0,4:2.0,5:1.0}{0:1.0,1:2.0,3:2.0,4:2.0}{0:3.0,2:2.0,4:2.0,5:1.0}{0:3.0,1:2.0,2:1.0}{1:2.0,2:1.0,4:1.0,5:1.0}{0:2.0,1:1.0,2:1.0,4:1.0}{0:1.0,1:3.0,2:1.0,3:2.0}{0:1.0,1:2.0,2:2.0,3:2.0}{1:2.0,3:2.0,4:2.0,5:1.0,6:1.0}{0:1.0,1:1.0,2:2.0,3:2.0,5:1.0}{0:2.0,2:1.0,4:2.0,5:1.0}{0:2.0,1:2.0,2:2.0,3:1.0,4:2.0}{1:2.0,2:1.0,4:1.0,5:1.0}{1:2.0,3:2.0,4:2.0,5:1.0,6:1.0}{0:3.0,1:3.0,3:1.0,5:2.0}{0:3.0,2:3.0,5:2.0}{1:1.0,2:3.0,4:2.0,5:1.0}{1:2.0,3:2.0,4:2.0,5:1.0,6:1.0}{0:3.0,1:1.0,2:1.0,3:2.0,4:1.0,5:1.0,6:1.0}{0:1.0,1:1.0,3:2.0,4:1.0,5:2.0}{1:2.0,2:1.0,3:1.0,5:2.0}{0:3.0,1:2.0,2:2.0,3:1.0,4:2.0,5:1.0,6:1.0}{0:1.0,1:2.0,2:3.0,4:1.0}{0:2.0,1:2.0,2:2.0,3:1.0,4:2.0}
-----------------------
goes down*************+time:1359169561154
the igSplit is:null
%%%%%%%%%%% the attributes 2,5,3, the best ig is:0.3791769206396574,attribute:5,split:NaN %%%%%%%%%%%%%
****************** not return but goes down 1359169561157
if(complemented) before................
0.0,2.0,1.0,complemented:true
0.0,2.0,1.0,
if(complemented) after................
subset[0] size:18
subset[1] size:10
subset[2] size:26
******************* cnt:3,---------------->end subsets size
__________________________subsets[0]
-----------------------
{0:1.0,1:2.0,2:3.0,4:1.0}{0:2.0,2:3.0}{0:1.0,1:1.0,2:2.0,4:1.0}{0:3.0,1:1.0,2:1.0,3:2.0}{1:2.0,2:3.0,4:2.0}{0:2.0,1:1.0,4:2.0}{0:1.0,1:1.0,2:2.0,4:1.0}{0:2.0,2:3.0}{0:1.0,1:1.0,2:2.0,4:1.0}{0:1.0,1:2.0,2:3.0,4:1.0}{0:1.0,1:2.0,3:2.0,4:2.0}{0:3.0,1:2.0,2:1.0}{0:2.0,1:1.0,2:1.0,4:1.0}{0:1.0,1:3.0,2:1.0,3:2.0}{0:1.0,1:2.0,2:2.0,3:2.0}{0:2.0,1:2.0,2:2.0,3:1.0,4:2.0}{0:1.0,1:2.0,2:3.0,4:1.0}{0:2.0,1:2.0,2:2.0,3:1.0,4:2.0}
-----------------------
XXXXXXXXXXXXXXXXXXXXXXXXX data.isIdenticalLabel() in DecisionTreeBuilder it should not be here,time is :1359169561167,data.getDataset.getLabel():0.0
__________________________subsets[1]
-----------------------
{0:1.0,1:1.0,3:2.0,4:1.0,5:2.0}{1:2.0,2:1.0,3:1.0,5:2.0}{0:2.0,2:3.0,5:2.0}{0:1.0,1:1.0,3:2.0,4:1.0,5:2.0}{0:3.0,1:3.0,3:1.0,5:2.0}{0:2.0,1:1.0,2:1.0,3:1.0,4:2.0,5:2.0}{0:3.0,1:3.0,3:1.0,5:2.0}{0:3.0,2:3.0,5:2.0}{0:1.0,1:1.0,3:2.0,4:1.0,5:2.0}{1:2.0,2:1.0,3:1.0,5:2.0}
-----------------------
XXXXXXXXXXXXXXXXXXXXXXXXX data.isIdenticalLabel() in DecisionTreeBuilder it should not be here,time is :1359169561168,data.getDataset.getLabel():0.0
__________________________subsets[2]
-----------------------
{1:2.0,2:1.0,3:2.0,4:1.0,5:1.0,6:1.0}{0:1.0,2:3.0,3:1.0,4:2.0,5:1.0,6:1.0}{0:3.0,1:2.0,2:2.0,3:1.0,4:2.0,5:1.0,6:1.0}{0:3.0,1:1.0,2:1.0,3:2.0,4:1.0,5:1.0,6:1.0}{1:2.0,2:1.0,3:2.0,4:1.0,5:1.0,6:1.0}{0:1.0,1:3.0,2:1.0,4:2.0,5:1.0}{1:2.0,3:2.0,4:2.0,5:1.0,6:1.0}{0:2.0,1:2.0,2:3.0,3:1.0,4:1.0,5:1.0,6:1.0}{0:3.0,2:2.0,4:2.0,5:1.0}{2:3.0,4:2.0,5:1.0}{1:2.0,3:2.0,4:2.0,5:1.0,6:1.0}{1:1.0,2:3.0,3:1.0,4:1.0,5:1.0,6:1.0}{1:1.0,2:3.0,4:2.0,5:1.0}{0:2.0,1:2.0,2:3.0,3:1.0,4:1.0,5:1.0,6:1.0}{0:3.0,2:2.0,4:2.0,5:1.0}{0:3.0,2:2.0,4:2.0,5:1.0}{1:2.0,2:1.0,4:1.0,5:1.0}{1:2.0,3:2.0,4:2.0,5:1.0,6:1.0}{0:1.0,1:1.0,2:2.0,3:2.0,5:1.0}{0:2.0,2:1.0,4:2.0,5:1.0}{1:2.0,2:1.0,4:1.0,5:1.0}{1:2.0,3:2.0,4:2.0,5:1.0,6:1.0}{1:1.0,2:3.0,4:2.0,5:1.0}{1:2.0,3:2.0,4:2.0,5:1.0,6:1.0}{0:3.0,1:1.0,2:1.0,3:2.0,4:1.0,5:1.0,6:1.0}{0:3.0,1:2.0,2:2.0,3:1.0,4:2.0,5:1.0,6:1.0}
-----------------------
goes down*************+time:1359169561170
%%%%%%%%%%% the attributes 3,0,2, the best ig is:0.8024757691014436,attribute:3,split:NaN %%%%%%%%%%%%%
****************** not return but goes down 1359169561170
if(complemented) before................
0.0,2.0,1.0,complemented:true
0.0,2.0,1.0,
if(complemented) after................
subset[0] size:10
subset[1] size:10
subset[2] size:6
******************* cnt:3,---------------->end subsets size
__________________________subsets[0]
-----------------------
{0:1.0,1:3.0,2:1.0,4:2.0,5:1.0}{0:3.0,2:2.0,4:2.0,5:1.0}{2:3.0,4:2.0,5:1.0}{1:1.0,2:3.0,4:2.0,5:1.0}{0:3.0,2:2.0,4:2.0,5:1.0}{0:3.0,2:2.0,4:2.0,5:1.0}{1:2.0,2:1.0,4:1.0,5:1.0}{0:2.0,2:1.0,4:2.0,5:1.0}{1:2.0,2:1.0,4:1.0,5:1.0}{1:1.0,2:3.0,4:2.0,5:1.0}
-----------------------
XXXXXXXXXXXXXXXXXXXXXXXXX data.isIdenticalLabel() in DecisionTreeBuilder it should not be here,time is :1359169561172,data.getDataset.getLabel():0.0
__________________________subsets[1]
-----------------------
{1:2.0,2:1.0,3:2.0,4:1.0,5:1.0,6:1.0}{0:3.0,1:1.0,2:1.0,3:2.0,4:1.0,5:1.0,6:1.0}{1:2.0,2:1.0,3:2.0,4:1.0,5:1.0,6:1.0}{1:2.0,3:2.0,4:2.0,5:1.0,6:1.0}{1:2.0,3:2.0,4:2.0,5:1.0,6:1.0}{1:2.0,3:2.0,4:2.0,5:1.0,6:1.0}{0:1.0,1:1.0,2:2.0,3:2.0,5:1.0}{1:2.0,3:2.0,4:2.0,5:1.0,6:1.0}{1:2.0,3:2.0,4:2.0,5:1.0,6:1.0}{0:3.0,1:1.0,2:1.0,3:2.0,4:1.0,5:1.0,6:1.0}
-----------------------
goes down*************+time:1359169561173
%%%%%%%%%%% the attributes 4,0,2, the best ig is:0.4689955935892812,attribute:4,split:NaN %%%%%%%%%%%%%
****************** not return but goes down 1359169561173
if(complemented) before................
0.0,2.0,1.0,complemented:true
0.0,2.0,1.0,
if(complemented) after................
subset[0] size:1
subset[1] size:5
subset[2] size:4
******************* cnt:2,---------------->end subsets size
__________________________subsets[0]
-----------------------
{0:1.0,1:1.0,2:2.0,3:2.0,5:1.0}
-----------------------
isIdentical(data) in DecisionTreeBuilder,time is :1359169561174,data.majorityLabel:0
__________________________subsets[1]
-----------------------
{1:2.0,3:2.0,4:2.0,5:1.0,6:1.0}{1:2.0,3:2.0,4:2.0,5:1.0,6:1.0}{1:2.0,3:2.0,4:2.0,5:1.0,6:1.0}{1:2.0,3:2.0,4:2.0,5:1.0,6:1.0}{1:2.0,3:2.0,4:2.0,5:1.0,6:1.0}
-----------------------
isIdentical(data) in DecisionTreeBuilder,time is :1359169561175,data.majorityLabel:1
__________________________subsets[2]
-----------------------
{1:2.0,2:1.0,3:2.0,4:1.0,5:1.0,6:1.0}{0:3.0,1:1.0,2:1.0,3:2.0,4:1.0,5:1.0,6:1.0}{1:2.0,2:1.0,3:2.0,4:1.0,5:1.0,6:1.0}{0:3.0,1:1.0,2:1.0,3:2.0,4:1.0,5:1.0,6:1.0}
-----------------------
XXXXXXXXXXXXXXXXXXXXXXXXX data.isIdenticalLabel() in DecisionTreeBuilder it should not be here,time is :1359169561175,data.getDataset.getLabel():1.0
__________________________subsets[2]
-----------------------
{0:1.0,2:3.0,3:1.0,4:2.0,5:1.0,6:1.0}{0:3.0,1:2.0,2:2.0,3:1.0,4:2.0,5:1.0,6:1.0}{0:2.0,1:2.0,2:3.0,3:1.0,4:1.0,5:1.0,6:1.0}{1:1.0,2:3.0,3:1.0,4:1.0,5:1.0,6:1.0}{0:2.0,1:2.0,2:3.0,3:1.0,4:1.0,5:1.0,6:1.0}{0:3.0,1:2.0,2:2.0,3:1.0,4:2.0,5:1.0,6:1.0}
-----------------------
XXXXXXXXXXXXXXXXXXXXXXXXX data.isIdenticalLabel() in DecisionTreeBuilder it should not be here,time is :1359169561176,data.getDataset.getLabel():1.0
the tree is built: CATEGORICAL:LEAF:0.0;,LEAF:0.0;,CATEGORICAL:LEAF:0.0;,CATEGORICAL:LEAF:0.0;,LEAF:1.0;,LEAF:1.0;,;,LEAF:1.0;,;,;
```

Compare this output against steps <2.1>–<2.6> of the walkthrough in the previous post to see what each stage is doing.
From the last line of the output, the resulting tree can be drawn as follows:

Converted back to the original attribute values, the tree is:
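The nested `CATEGORICAL:...,LEAF:...` string above describes how the built tree classifies: each categorical node routes an instance to the child matching its value for the split attribute, recursing until a leaf's label is returned. A minimal sketch of that traversal (the class names and the toy tree below are illustrative, not Mahout's `Node`/`CategoricalNode` API):

```java
public class TreeSketch {

    interface Node {
        double classify(double[] instance);
    }

    // A leaf simply returns its label.
    static class Leaf implements Node {
        final double label;
        Leaf(double label) { this.label = label; }
        public double classify(double[] instance) { return label; }
    }

    // A categorical node routes the instance to the child whose value
    // matches the instance's value for the split attribute.
    static class Categorical implements Node {
        final int attr;
        final double[] values;
        final Node[] children;
        Categorical(int attr, double[] values, Node[] children) {
            this.attr = attr;
            this.values = values;
            this.children = children;
        }
        public double classify(double[] instance) {
            for (int i = 0; i < values.length; i++) {
                if (instance[attr] == values[i]) {
                    return children[i].classify(instance);
                }
            }
            return Double.NaN; // value not seen during training
        }
    }

    public static void main(String[] args) {
        // A toy tree in the spirit of the printed one: split on attribute 5,
        // with one branch splitting again on attribute 3.
        Node tree = new Categorical(5, new double[]{0.0, 2.0, 1.0}, new Node[]{
                new Leaf(0.0),
                new Leaf(0.0),
                new Categorical(3, new double[]{0.0, 2.0, 1.0},
                        new Node[]{new Leaf(0.0), new Leaf(1.0), new Leaf(1.0)})
        });
        double[] instance = {0.0, 2.0, 1.0, 2.0, 1.0, 1.0, 1.0};
        System.out.println(tree.classify(instance)); // attr5=1.0 -> attr3=2.0 -> 1.0
    }
}
```

This matches the structure visible in the log: the root splits the bag into three subsets on attribute 5, the pure subsets become leaves, and the mixed subset is split again.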
Share, be happy, grow.
Please credit the source when reposting: http://blog.csdn.net/fansy1990