Using the Random Forest Algorithm


Data format (one comma-separated record per line):

1,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00,0.00,1
2,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00,0.00,1
3,1.51618,13.53,3.55,1.54,72.99,0.39,7.78,0.00,0.00,1
4,1.51766,13.21,3.69,1.29,72.61,0.57,8.22,0.00,0.00,1
5,1.51742,13.27,3.62,1.24,73.08,0.55,8.07,0.00,0.00,1
6,1.51596,12.79,3.61,1.62,72.97,0.64,8.07,0.00,0.26,1
7,1.51743,13.30,3.60,1.14,73.09,0.58,8.17,0.00,0.00,1
8,1.51756,13.15,3.61,1.05,73.24,0.57,8.24,0.00,0.00,1
9,1.51918,14.04,3.58,1.37,72.08,0.56,8.30,0.00,0.00,1
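Before describing the data to Mahout it can help to confirm the shape of each record. The sketch below assumes, as the sample suggests, that every row holds a row ID, nine numeric measurements, and a class label in the last column. The file name glass_sample.csv is hypothetical, and only the first three sample rows are used.

```shell
# Write a few sample rows to a local file (the file name is an assumption).
cat > glass_sample.csv <<'EOF'
1,1.52101,13.64,4.49,1.10,71.78,0.06,8.75,0.00,0.00,1
2,1.51761,13.89,3.60,1.36,72.73,0.48,7.83,0.00,0.00,1
3,1.51618,13.53,3.55,1.54,72.99,0.39,7.78,0.00,0.00,1
EOF

# Distinct field counts across all rows; a single value means the rows are uniform.
FIELDS=$(awk -F',' '{ print NF }' glass_sample.csv | sort -u)
echo "fields per row: $FIELDS"

# Distinct values found in the last column (the assumed class label).
LABELS=$(awk -F',' '{ print $NF }' glass_sample.csv | sort -u)
echo "labels: $LABELS"
```

A single field count across all rows indicates that no record is truncated or fused with its neighbour.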

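1. Generate the descriptor file

Command: hadoop jar mahout-examples-0.9-job.jar org.apache.mahout.classifier.df.tools.Describe

--path(-p)                                 Path to the input data. Required.

--file(-f)                                 Path where the generated descriptor file is written. Required.

--descriptor(-d)                           Description of the input data (the type of each column). Required.

--regression(-r)                           Treat the problem as regression instead of classification; the default is classification. Optional.

--help                                     Print usage information.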
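A possible invocation for the sample data is sketched below. The paths testdata/glass.data and testdata/glass.info are hypothetical. The descriptor string "I 9 N L" assumes Mahout's descriptor notation (I = ignored, N = numerical, L = label) and that the first column is a row ID to skip, followed by nine numeric attributes and the label. The script prints the command and runs it only where hadoop is on the PATH.

```shell
# Hypothetical paths; adjust them to your own HDFS layout.
DATA=testdata/glass.data
INFO=testdata/glass.info

# Assumed column layout: ignore the ID, 9 numerical attributes, then the label.
DESCRIBE_CMD="hadoop jar mahout-examples-0.9-job.jar org.apache.mahout.classifier.df.tools.Describe -p $DATA -f $INFO -d I 9 N L"
echo "$DESCRIBE_CMD"

# Execute only on a machine where Hadoop is installed.
if command -v hadoop >/dev/null 2>&1; then
  $DESCRIBE_CMD
fi
```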

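2. Build the random forest model

Command: hadoop jar mahout-examples-0.9-job.jar org.apache.mahout.classifier.df.mapreduce.BuildForest

--data(-d)                                 Path to the input data. Required.

--dataset(-ds)                             Path to the descriptor file. Required.

--selection(-sl)                           Number of attributes selected at random at each tree node. Defaults to the square root of the number of attributes for classification and to one third of them for regression. Optional.

--no-complete(-nc)                         Do not complete the decision trees. Optional.

--minsplit(-ms)                            Data-size threshold: a node holding fewer records than this is not split. Optional.

--minprop(-mp)                             Proportion threshold: a node whose data proportion (a variance proportion, used for regression) falls below this is not split. Optional.

--seed(-sd)                                Random seed. Optional.

--partial(-p)                              Use the partial-data implementation rather than the in-memory one. Optional.

--nbtrees(-t)                              Number of decision trees to build. Required.

--output(-o)                               Output path of the job, where the forest is stored. Required.

--help                                     Print usage information.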
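As an illustration, the options above might be combined as follows. This is a sketch only: the paths and the output directory glass-forest are hypothetical, and the values 3 (about the square root of nine attributes) and 100 trees are arbitrary choices, not recommendations from the original post. The command is printed first and executed only if hadoop is available.

```shell
# Hypothetical paths; adjust them to your own HDFS layout.
DATA=testdata/glass.data
INFO=testdata/glass.info
MODEL=glass-forest

# -sl 3: attributes tried per node, -p: partial-data implementation, -t 100: number of trees.
BUILD_CMD="hadoop jar mahout-examples-0.9-job.jar org.apache.mahout.classifier.df.mapreduce.BuildForest -d $DATA -ds $INFO -sl 3 -p -t 100 -o $MODEL"
echo "$BUILD_CMD"

# Execute only on a machine where Hadoop is installed.
if command -v hadoop >/dev/null 2>&1; then
  $BUILD_CMD
fi
```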

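3. Test the random forest

Command: hadoop jar mahout-examples-0.9-job.jar org.apache.mahout.classifier.df.mapreduce.TestForest

--input(-i)                                Path to the input (test) data. Required.

--dataset(-ds)                             Path to the descriptor file. Required.

--model(-m)                                Path where the random forest model is stored. Required.

--mapreduce(-mr)                           Classify with MapReduce instead of sequentially. Optional.

--output(-o)                               Output path of the job. Required.

--analyze(-a)                              Print evaluation information. Optional.

--help                                     Print usage information.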
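A test run using these options could look like the sketch below. All paths are hypothetical: testdata/glass.test stands for a held-out file in the same format as the training data, glass-forest for the model directory written by BuildForest, and predictions for the output directory. The command is printed first and executed only if hadoop is available.

```shell
# Hypothetical paths; adjust them to your own HDFS layout.
TEST_DATA=testdata/glass.test
INFO=testdata/glass.info
MODEL=glass-forest
OUT=predictions

# -a prints evaluation information, -mr classifies with MapReduce.
TEST_CMD="hadoop jar mahout-examples-0.9-job.jar org.apache.mahout.classifier.df.mapreduce.TestForest -i $TEST_DATA -ds $INFO -m $MODEL -a -mr -o $OUT"
echo "$TEST_CMD"

# Execute only on a machine where Hadoop is installed.
if command -v hadoop >/dev/null 2>&1; then
  $TEST_CMD
fi
```

Passing the same descriptor file to -ds as was used for training keeps the column interpretation consistent between the two steps.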
