Implementing Logistic Regression by Hand (C++): Testing


Continuing from the previous post: the interface is defined and the implementation is finished, so it is time to find some data and test. The whole process was done on Windows with VS2008.

A sample of the data looks like this:

1 0 1 2 5 8
1 1 3 4 7 8
1 0
1 0 1 4 5 8 9
1 0 1 2 4 5 9
0 4 5 7 8
1 0 2
1 2 3 5 6 7 8
1 0 3 4 5 6 7
1 0 1 2 4 5 7 8 9

The data comes from an LR toolkit found on the http://komarix.org/ website; it ships with test data, and I lightly adjusted the format. The current format is: class label, feature index 1, feature index 2, ...

The test code, in the Test function, is as follows:

void LogisticRegression::Test (void)
{
    /*
    TrainSGDOnSampleFile ("..\\Data\\SamplesTrain.txt", 10, 0.01, 100, 0.05);
    SaveLRModelTxt ("Model\\Mod_001_100_005.txt");
    */
    /*
    LoadLRModelTxt ("Model\\Mod_001_100_005.txt");
    PredictOnSampleFile ("..\\Data\\SamplesTest.txt", "Model\\Rslt_001_100_005.txt", "Model\\Log_001_100_005.txt");
    */
    /*
    TrainSGDOnSampleFile ("..\\Data\\SamplesTrain.txt", 10, 0.01, 1, 0.05);
    SaveLRModelTxt ("Model\\Mod_001_1_005.txt");
    */
    LoadLRModelTxt ("Model\\Mod_001_1_005.txt");
    PredictOnSampleFile ("..\\Data\\SamplesTest.txt", "Model\\Rslt_001_1_005.txt", "Model\\Log_001_1_005.txt");
}

The procedure follows "train - test - retrain with new parameters - test again". The Test function is invoked from main:

#include "LogisticRegression.h"
#include <iostream>
using namespace std;

int main (void)
{
    cout << "Hello world for Logistic Regression" << endl;
    LogisticRegression toDo;
    toDo.Test ();
    return 0;
}

The sample set contains 393 samples in total: the first 300 are used for training and the remaining 93 for testing, with 10 features. With only a single training iteration, the learned parameter list is:

10
0.492528
0.166805
0.126543
0.475935
0.137543
0.110732
0.0844769
0.159048
0.118025
0.0599141

(The first line is the number of parameters.) The test precision is 81.72%:

The total number of sample is : 93
The correct prediction number is : 76
Precision : 0.817204

When the maximum number of iterations is increased to 100, the training log is:

Hello world for Logistic Regression
In loop 0: current cost (0.541574) previous cost (1) ratio (0.458426)
In loop 1: current cost (0.433781) previous cost (0.541574) ratio (0.199035)
In loop 2: current cost (0.380875) previous cost (0.433781) ratio (0.121966)
In loop 3: current cost (0.340621) previous cost (0.380875) ratio (0.105687)
In loop 4: current cost (0.308502) previous cost (0.340621) ratio (0.0942953)
In loop 5: current cost (0.282294) previous cost (0.308502) ratio (0.0849548)
In loop 6: current cost (0.260501) previous cost (0.282294) ratio (0.0771983)
In loop 7: current cost (0.242082) previous cost (0.260501) ratio (0.0707073)
In loop 8: current cost (0.226293) previous cost (0.242082) ratio (0.0652207)
In loop 9: current cost (0.212595) previous cost (0.226293) ratio (0.0605327)
In loop 10: current cost (0.200586) previous cost (0.212595) ratio (0.0564851)
In loop 11: current cost (0.189964) previous cost (0.200586) ratio (0.0529563)
In loop 12: current cost (0.180494) previous cost (0.189964) ratio (0.0498525)

Training ran for 13 iterations in total, stopping once the cost decrease fell below 5%. The learned parameters are:

10
2.5668
0.12814
-0.135231
2.55564
-0.0892993
-0.256378
-0.279498
-0.100414
-0.185509
-0.447792

Compared with the parameters above, the relative ordering among the parameters is unchanged, but the gaps between the values have widened. Testing again now gives 100%:

The total number of sample is : 93
The correct prediction number is : 93
Precision : 1

Done.


Please credit the source when reposting: http://blog.csdn.net/xceman1997/article/details/17882981

