How to Use the Logistic Regression Algorithm

Source: Internet  Editor: 程序博客网  Time: 2024/06/06 07:46
Data format:
"x","y","shape","color","k","k0","xx","xy","yy","a","b","c","bias"
0.923307513352484,0.0135197141207755,21,2,4,8,0.852496764213146,0.0124828536260896,0.000182782669907495,0.923406490600458,0.0778750292332978,0.644866125183976,1
0.711011884035543,0.909141522599384,22,2,3,9,0.505537899239772,0.64641042683833,0.826538308114327,1.15415605849213,0.953966686673604,0.46035073663368,1
0.75118898646906,0.836567111080512,23,2,3,9,0.564284893392414,0.62842000028592,0.699844531341594,1.12433510339845,0.872783737128441,0.419968245447719,1
0.308209649519995,0.418023289414123,24,1,5,1,0.094993188057238,0.128838811521522,0.174743470492603,0.519361780024138,0.808280495564412,0.208575453051705,1
0.849057961953804,0.500220163026825,25,1,5,2,0.720899422757147,0.424715912147755,0.250220211498583,0.985454024425153,0.52249756970547,0.349058031386046,1
0.0738831346388906,0.486534863477573,21,2,6,1,0.00545871758406844,0.0359467208248278,0.236716173379140,0.492112681164801,1.04613986717142,0.42632955896436,1
0.612888508243486,0.0204555552918464,22,2,4,10,0.375632323536926,0.0125369747681119,0.000418429742297785,0.613229772009826,0.387651566219268,0.492652707029903,1

The first line of the data gives the variable name for each value in a row. Values are separated by commas.
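As a quick check of this layout, the file can be read with Python's standard csv module. This is only a sketch: it assumes each record sits on its own line, embeds the header and the first two records from the sample as a string, and uses a hypothetical helper name load_rows.

```python
import csv
import io

# Header plus the first two data rows of the sample, one record per line.
SAMPLE = '''"x","y","shape","color","k","k0","xx","xy","yy","a","b","c","bias"
0.923307513352484,0.0135197141207755,21,2,4,8,0.852496764213146,0.0124828536260896,0.000182782669907495,0.923406490600458,0.0778750292332978,0.644866125183976,1
0.711011884035543,0.909141522599384,22,2,3,9,0.505537899239772,0.64641042683833,0.826538308114327,1.15415605849213,0.953966686673604,0.46035073663368,1
'''

def load_rows(text):
    """Parse CSV text whose first line holds the variable names."""
    reader = csv.DictReader(io.StringIO(text))
    # Every field in this sample is numeric, so convert each value to float.
    return [{name: float(value) for name, value in row.items()} for row in reader]

rows = load_rows(SAMPLE)
print(list(rows[0].keys()))            # variable names taken from the first line
print(rows[0]["x"], rows[0]["color"])  # values looked up by variable name
```

Looking values up by name mirrors how the target and predictor variables are later passed to Mahout by name rather than by column position.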

Building a trained model with Mahout:

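Command: mahout trainlogistic  --input  <input file path>  --output  <output file path>   --target <name of the target variable to classify>   --categories <number of categories of the target variable>  --predictors  <names of the predictor variables, separated by spaces when there are several> --types <list of predictor variable types>   --features <size of the internal feature vector used to build the model>  --passes  <number of passes over the training data during training>  --rate <initial learning rate>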
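To make the template concrete, the sketch below assembles one possible invocation as an argument list and prints it without running Mahout. Everything filled in is an assumption for illustration: the paths data.csv and ./model, the choice of color as the target with x and y as predictors, the type name numeric, and the numbers 20, 100 and 50. The spellings --types and --features are the option names as best recalled from Mahout's command-line help, so verify them against your installation. The helper build_trainlogistic_command is hypothetical.

```python
import shlex

def build_trainlogistic_command(input_path, output_path, target, categories,
                                predictors, types, features, passes, rate):
    """Assemble the argument list for `mahout trainlogistic` without running it."""
    args = ["mahout", "trainlogistic",
            "--input", input_path,
            "--output", output_path,
            "--target", target,
            "--categories", str(categories)]
    # Several predictors (and their types) are passed as space-separated values.
    args += ["--predictors", *predictors]
    args += ["--types", *types]
    args += ["--features", str(features),
             "--passes", str(passes),
             "--rate", str(rate)]
    return args

# All values below are illustrative assumptions, not recommended settings.
cmd = build_trainlogistic_command(
    input_path="data.csv", output_path="./model",
    target="color", categories=2,
    predictors=["x", "y"], types=["numeric"],
    features=20, passes=100, rate=50,
)
command_line = shlex.join(cmd)
print(command_line)
```

Keeping the arguments in a list means the same value can be handed to subprocess.run later without any shell quoting problems.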

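Other options explained:

--quiet                                   produce less status and progress output

--lambda                                  control how strongly the algorithm suppresses variables in the final model

--noBias                                  remove the intercept term from the model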
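To see what the learning rate, the number of passes, the suppression strength and the intercept switch do, here is a minimal sketch of binary logistic regression trained by stochastic gradient descent. It is not Mahout's implementation: it assumes a plain L2 penalty and a constant learning rate, whereas Mahout's own prior and rate schedule may differ. The names train, lam and no_bias and the toy data are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, labels, rate=0.1, passes=100, lam=0.0, no_bias=False):
    """Plain SGD for binary logistic regression with an L2 penalty.

    rate, passes, lam and no_bias mirror --rate, --passes, --lambda and
    --noBias in spirit only.
    """
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(passes):              # each pass revisits all training rows
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
            err = y - p
            # Gradient step; the lam * w term pulls each weight toward zero.
            weights = [w + rate * (err * v - lam * w) for w, v in zip(weights, x)]
            if not no_bias:              # with no_bias the intercept stays at 0
                bias += rate * err
    return weights, bias

# Toy one-variable data: small x belongs to class 0, large x to class 1.
xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [0, 0, 1, 1]

w_plain, b_plain = train(xs, ys, lam=0.0)
w_reg, b_reg = train(xs, ys, lam=0.5)
w_nb, b_nb = train(xs, ys, no_bias=True)
print(w_plain, w_reg, b_nb)
```

On this toy data the penalized run ends with a smaller weight than the unpenalized one, which is the sense in which a larger penalty suppresses variables.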

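Evaluating the model:

mahout   runlogistic --input <input file path>  --model <model file path>  --auc  --confusion

--auc                                       after reading the data, print the model's AUC score on the input data

--confusion                                 print the confusion matrix for a particular threshold

Other options:

--scores                                    print the target variable value and the score for each input sample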
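The two summary outputs can be reproduced by hand from per-sample target values and scores of the kind --scores prints. The sketch below is an illustration, not Mahout's code: the labels and scores are invented, the threshold of 0.5 is an assumption, and the matrix orientation (rows are actual classes, columns are predicted classes) may not match Mahout's printout.

```python
def auc(labels, scores):
    """Share of (positive, negative) pairs where the positive scores higher; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion_matrix(labels, scores, threshold=0.5):
    """Return [[TN, FP], [FN, TP]], predicting class 1 when score >= threshold."""
    matrix = [[0, 0], [0, 0]]
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        matrix[y][predicted] += 1
    return matrix

# Invented target values and scores for four samples.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(auc(labels, scores))               # 0.75
print(confusion_matrix(labels, scores))  # [[2, 0], [1, 1]]
```

AUC does not depend on any threshold, while the confusion matrix changes as the threshold moves, which is why the two are reported separately.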
