Machine Learning Foundations 4.5: Homework 1
Source: Internet | Editor: 程序博客网 | Time: 2024/05/28 20:19
Question 15
For Questions 15-20, you will play with PLA and the pocket algorithm. First, we use an artificial data set to study PLA. The data set is in https://d396qusza40orc.cloudfront.net/ntumlone%2Fhw1%2Fhw1_15_train.dat
Each line of the data set contains one $(\mathbf{x}_n, y_n)$ with $\mathbf{x}_n \in \mathbb{R}^4$. The first 4 numbers of the line contain the components of $\mathbf{x}_n$ in order; the last number is $y_n$.
Please initialize your algorithm with $\mathbf{w} = \mathbf{0}$ and take $\mathrm{sign}(0)$ as $-1$.
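A minimal C++ sketch of these conventions, assuming each data line holds four whitespace-separated features followed by the ±1 label. The input is augmented with a constant bias coordinate x0 = 1 so that the threshold is learned as part of w; `Example`, `parseLine`, `signOf`, and `dot` are illustrative names, not part of the assignment.

```cpp
#include <array>
#include <sstream>
#include <string>

// One example: x = (1, x1, x2, x3, x4) with bias coordinate x0 = 1, label y in {-1, +1}.
struct Example {
    std::array<double, 5> x;
    int y;
};

// Parse one data line "x1 x2 x3 x4 y"; returns false if the line is malformed.
bool parseLine(const std::string& line, Example& ex) {
    std::istringstream in(line);
    ex.x[0] = 1.0;  // constant bias coordinate (assumed augmentation)
    for (int d = 1; d <= 4; ++d)
        if (!(in >> ex.x[d])) return false;
    return static_cast<bool>(in >> ex.y);
}

// sign() with the assignment's convention sign(0) = -1.
int signOf(double v) { return v > 0 ? 1 : -1; }

// Inner product w . x over the 5 augmented coordinates.
double dot(const std::array<double, 5>& w, const std::array<double, 5>& x) {
    double s = 0;
    for (int d = 0; d < 5; ++d) s += w[d] * x[d];
    return s;
}
```

With w initialized to the zero vector, `signOf(dot(w, x))` is −1 for every example, so the first PLA update always happens at the first +1 example in the visiting order.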
Implement a version of PLA by visiting examples in the naive cycle using the order of examples in the data set. Run the algorithm on the data set. What is the number of updates before the algorithm halts?
A ≥ 201 updates
B 51 - 200 updates
C < 10 updates
D 31 - 50 updates
E 11 - 30 updates
Question 16
Implement a version of PLA by visiting examples in fixed, pre-determined random cycles throughout the algorithm. Run the algorithm on the data set. Please repeat your experiment for 2000 times, each with a different random seed. What is the average number of updates before the algorithm halts?
A ≥ 201 updates
B 11 - 30 updates
C 51 - 200 updates
D 31 - 50 updates
E < 10 updates
Question 17
Implement a version of PLA by visiting examples in fixed, pre-determined random cycles throughout the algorithm, while changing the update rule to be $\mathbf{w}_{t+1} \leftarrow \mathbf{w}_t + \eta\, y_{n(t)} \mathbf{x}_{n(t)}$ with $\eta = 0.5$. Note that your PLA in the previous Question corresponds to $\eta = 1$. Please repeat your experiment for 2000 times, each with a different random seed. What is the average number of updates before the algorithm halts?
A 51 - 200 updates
B < 10 updates
C 31 - 50 updates
D 11 - 30 updates
E ≥ 201 updates
Question 18
Next, we play with the pocket algorithm. Modify your PLA in Question 16 to visit examples purely randomly, and then add the ‘pocket’ steps to the algorithm. We will use https://d396qusza40orc.cloudfront.net/ntumlone%2Fhw1%2Fhw1_18_train.dat
as the training data set D, and
https://d396qusza40orc.cloudfront.net/ntumlone%2Fhw1%2Fhw1_18_test.dat
as the test set for ‘verifying’ the $g$ returned by your algorithm (see lecture 4 about verifying). The sets are of the same format as the previous one.
Run the pocket algorithm with a total of 50 updates on D, and verify the performance of $\mathbf{w}_{\text{POCKET}}$ using the test set. Please repeat your experiment for 2000 times, each with a different random seed. What is the average error rate on the test set?
A 0.6 - 0.8
B < 0.2
C 0.4 - 0.6
D ≥ 0.8
E 0.2 - 0.4
Question 19
Modify your algorithm in Question 18 to return $\mathbf{w}_{50}$ (the PLA vector after 50 updates) instead of $\hat{\mathbf{w}}$ (the pocket vector) after 50 updates. Run the modified algorithm on D, and verify the performance using the test set. Please repeat your experiment for 2000 times, each with a different random seed. What is the average error rate on the test set?
A < 0.2
B ≥ 0.8
C 0.4 - 0.6
D 0.6 - 0.8
E 0.2 - 0.4
Question 20
Modify your algorithm in Question 18 to run for 100 updates instead of 50, and verify the performance of $\mathbf{w}_{\text{POCKET}}$ using the test set. Please repeat your experiment for 2000 times, each with a different random seed. What is the average error rate on the test set?
A < 0.2
B 0.2 - 0.4
C 0.6 - 0.8
D ≥ 0.8
E 0.4 - 0.6
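Questions 15, 16, and 18 differ mainly in how the next example is picked. A sketch of the three visiting orders as C++ helpers, assuming a `std::mt19937` engine and one seed per experiment (the seeding scheme is an assumption; the assignment only asks for a different random seed in each run). The function names are illustrative.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Naive cycle: visit examples in file order 0, 1, ..., n-1, then repeat.
std::vector<std::size_t> naiveCycle(std::size_t n) {
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), 0);
    return order;
}

// Fixed, pre-determined random cycle: shuffle once, reuse the same order on every pass.
std::vector<std::size_t> randomCycle(std::size_t n, unsigned seed) {
    std::vector<std::size_t> order = naiveCycle(n);
    std::mt19937 rng(seed);  // one seed per experiment (assumed scheme)
    std::shuffle(order.begin(), order.end(), rng);
    return order;
}

// Purely random visiting: every step draws an index uniformly from [0, n-1].
// Assumes n > 0.
std::size_t randomPick(std::size_t n, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, n - 1);
    return pick(rng);
}
```

PLA with either cycle halts once a full pass produces no mistake; purely random visiting has no passes, so a run is bounded by an update count instead, as in Questions 18 to 20.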
Full code (C++)
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <random>
#include <algorithm>
#define INF -1              // maxTimes == INF: no limit on the number of updates
#define RANDOM 1
#define NUMERICAL_ORDER 0
using namespace std;

mt19937 rng(random_device{}());   // one engine for all random visiting orders

struct vector4{
    double x0, x1, x2, x3, x4;    // x0 is the bias coordinate (always 1 for inputs)
    // constructor
    vector4(double x0 = 1, double x1 = 0, double x2 = 0, double x3 = 0, double x4 = 0)
        : x0(x0), x1(x1), x2(x2), x3(x3), x4(x4) {}
    // overloaded operators
    double operator * (const vector4& p) const;    // inner product
    vector4 operator * (double t) const;           // scalar multiplication
    vector4 operator + (const vector4& p) const;
    friend istream& operator >> (istream& in, vector4& p);
    friend ostream& operator << (ostream& out, const vector4& p);
};
typedef vector<vector4> Vector;

// read the training set and, if given, the test set
void readFile(Vector&, vector<int>&, Vector&, vector<int>&, const string&, const string& = "");
// PLA: returns the number of updates
int PLA(int, const Vector&, const vector<int>&, vector4& W, double ETA, int);
// Pocket: leaves the pocket vector in W, returns the number of updates
int Pocket(int, const Vector&, const vector<int>&, vector4& W, double ETA, int);
// visiting order: 0..size-1, shuffled if op == RANDOM
vector<int> Rand(int, int);
// sign function; sign(0) = -1 as the assignment requires
int sign(double x){ return x > 0 ? 1 : -1; }
// number of misclassified examples
int mistakes(const vector4&, const Vector&, const vector<int>&);

int main() {
    vector4 W(0, 0, 0, 0, 0);
    Vector X, VerificationX;
    vector<int> Y, VerificationY;

    cout << "Data set for Q15:\n";
    readFile(X, Y, VerificationX, VerificationY, "hw1_15_train.dat");

    cout << "1. PLA, naive cycle (Q15):\n"
         << "update times: " << PLA(NUMERICAL_ORDER, X, Y, W, 1.0, INF) << "\n\n";

    // average number of updates over 2000 runs, each with its own random cycle
    auto averageUpdates = [&](double eta) {
        double average = 0;
        for (int i = 0; i < 2000; ++i) {
            W = vector4(0, 0, 0, 0, 0);
            average += PLA(RANDOM, X, Y, W, eta, INF);
        }
        return average / 2000;
    };
    cout << "2. PLA, random cycle, 2000 runs (Q16):\n"
         << "average update times: " << averageUpdates(1.0) << "\n\n";
    cout << "3. PLA, random cycle, eta = 0.5, 2000 runs (Q17):\n"
         << "average update times: " << averageUpdates(0.5) << "\n\n";

    X.clear(); Y.clear(); VerificationX.clear(); VerificationY.clear();
    cout << "Data set for Q18:\n";
    readFile(X, Y, VerificationX, VerificationY, "hw1_18_train.dat", "hw1_18_test.dat");

    // average test error rate over 2000 runs: pocket vector or final PLA vector
    auto averageError = [&](bool usePocket, int updates) {
        double error = 0;
        for (int i = 0; i < 2000; ++i) {
            W = vector4(0, 0, 0, 0, 0);
            if (usePocket) Pocket(RANDOM, X, Y, W, 1.0, updates);
            else PLA(RANDOM, X, Y, W, 1.0, updates);
            error += (double)mistakes(W, VerificationX, VerificationY) / VerificationX.size();
        }
        return error / 2000;
    };
    cout << "4. Pocket, 50 updates, 2000 runs (Q18):\n"
         << "average error rate: " << averageError(true, 50) << "\n\n";
    cout << "5. Pocket, 100 updates, 2000 runs (Q20):\n"
         << "average error rate: " << averageError(true, 100) << "\n\n";
    cout << "6. PLA, 50 updates, 2000 runs (Q19):\n"
         << "average error rate: " << averageError(false, 50) << "\n\n";
    cout << "7. PLA, 100 updates, 2000 runs:\n"
         << "average error rate: " << averageError(false, 100) << "\n\n";
}

void readFile(Vector& X, vector<int>& Y, Vector& VerificationX, vector<int>& VerificationY,
              const string& train, const string& test){   // training set, test set
    vector4 x;
    int y;
    ifstream Data(train);
    while (Data >> x >> y) { X.push_back(x); Y.push_back(y); }
    Data.close();
    if (test != "") {
        Data.open(test);
        while (Data >> x >> y) { VerificationX.push_back(x); VerificationY.push_back(y); }
        Data.close();
    }
}

// vector4
vector4 vector4::operator + (const vector4& p) const {
    return vector4(x0 + p.x0, x1 + p.x1, x2 + p.x2, x3 + p.x3, x4 + p.x4);
}
vector4 vector4::operator * (double t) const {
    return vector4(x0 * t, x1 * t, x2 * t, x3 * t, x4 * t);
}
double vector4::operator * (const vector4& p) const {
    return x0 * p.x0 + x1 * p.x1 + x2 * p.x2 + x3 * p.x3 + x4 * p.x4;
}
istream& operator >> (istream& in, vector4& p){
    p.x0 = 1;                           // bias coordinate
    in >> p.x1 >> p.x2 >> p.x3 >> p.x4;
    return in;
}
ostream& operator << (ostream& out, const vector4& p){
    out << p.x0 << " " << p.x1 << " " << p.x2 << " " << p.x3 << " " << p.x4;
    return out;
}

int PLA(int op, const Vector& X, const vector<int>& Y, vector4& W, double ETA, int maxTimes){
    int n = (int)X.size();
    vector<int> rnd = Rand(op, n);      // one fixed visiting order for the whole run
    int times = 0;
    bool halt = false;
    while (!halt) {
        halt = true;                    // stays true only if a full pass has no mistake
        for (int i = 0; i < n && (maxTimes == INF || times < maxTimes); ++i) {
            int k = rnd[i];
            if (sign(W * X[k]) != Y[k]) {
                halt = false;
                ++times;
                W = W + X[k] * (Y[k] * ETA);   // w <- w + eta * y_n * x_n
            }
        }
    }
    return times;
}

int Pocket(int op, const Vector& X, const vector<int>& Y, vector4& W, double ETA, int maxTimes){
    int n = (int)X.size();
    vector4 best = W;                   // pocket vector, starts at w_0
    int bestMistakes = mistakes(best, X, Y);
    int times = 0;
    bool halt = false;
    while (!halt && times < maxTimes) {
        halt = true;
        vector<int> rnd = Rand(op, n);  // new random order on every pass
        for (int i = 0; i < n && times < maxTimes; ++i) {
            int k = rnd[i];
            if (sign(W * X[k]) != Y[k]) {
                halt = false;
                ++times;
                W = W + X[k] * (Y[k] * ETA);
                int m = mistakes(W, X, Y);  // compare on the training set D, not the test set
                if (m < bestMistakes) { best = W; bestMistakes = m; }
            }
        }
    }
    W = best;
    return times;
}

vector<int> Rand(int op, int size){
    vector<int> rnd(size);
    for (int i = 0; i < size; ++i) rnd[i] = i;
    if (op == RANDOM) shuffle(rnd.begin(), rnd.end(), rng);
    return rnd;
}

int mistakes(const vector4& W, const Vector& X, const vector<int>& Y){
    int count = 0;
    for (size_t i = 0; i < X.size(); ++i)
        if (sign(W * X[i]) != Y[i]) ++count;
    return count;
}
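For the Q15 training set:
- Visiting the examples in order takes 45 updates.
- Visiting them in random order takes about 40 updates on average.
- Changing η does not change the number of updates (contrary to some claims online): with $\mathbf{w}_0 = \mathbf{0}$ and the same visiting order, the weight vector for any $\eta > 0$ is exactly $\eta$ times the one for $\eta = 1$, so every sign, and hence every update, is identical.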
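A minimal sketch that checks this numerically: the same PLA loop is run with η = 1 and η = 0.5 over one fixed visiting order on a made-up, linearly separable toy set in R² (not the homework data). `plaWithEta` is a hypothetical helper.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Vec = std::array<double, 3>;  // (1, x1, x2): bias coordinate plus two toy features

struct Run {
    int updates;
    Vec w;
};

// PLA over a fixed visiting order with learning rate eta, starting from w = 0.
// Halts after a full pass with no mistake, so it assumes linearly separable data.
Run plaWithEta(const std::vector<Vec>& X, const std::vector<int>& Y, double eta) {
    Vec w{0, 0, 0};
    int updates = 0;
    bool mistake = true;
    while (mistake) {
        mistake = false;
        for (std::size_t n = 0; n < X.size(); ++n) {
            double s = w[0] * X[n][0] + w[1] * X[n][1] + w[2] * X[n][2];
            int pred = s > 0 ? 1 : -1;  // sign(0) = -1
            if (pred != Y[n]) {
                for (int d = 0; d < 3; ++d) w[d] += eta * Y[n] * X[n][d];
                ++updates;
                mistake = true;
            }
        }
    }
    return {updates, w};
}
```

The check only covers a shared visiting order; in the 2000-run experiments each run draws its own random cycle, so the Q16 and Q17 averages agree only up to sampling noise.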
For the Q18 data set:
- Pocket, 50 updates, 2000 runs: average test error rate about 0.12
- Pocket, 100 updates, 2000 runs: average test error rate about 0.10
- PLA, 50 updates, 2000 runs: average test error rate about 0.36
- PLA, 100 updates, 2000 runs: average test error rate about 0.33
Note: Pocket still runs PLA; it only keeps the best weight vector seen so far in its "pocket".
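A minimal sketch of the pocket step on top of purely random PLA, assuming the augmented 5-coordinate inputs (x0 = 1), η = 1, and that mistakes are counted on the training set D; the returned vector is then scored separately on the test set. `pocketPLA`, `predict`, and `countMistakes` are illustrative names.

```cpp
#include <array>
#include <cstddef>
#include <random>
#include <vector>

using Vec5 = std::array<double, 5>;  // (1, x1, x2, x3, x4)

// sign(w . x) with sign(0) = -1
int predict(const Vec5& w, const Vec5& x) {
    double s = 0;
    for (int d = 0; d < 5; ++d) s += w[d] * x[d];
    return s > 0 ? 1 : -1;
}

// Number of examples in (X, Y) misclassified by w.
int countMistakes(const Vec5& w, const std::vector<Vec5>& X, const std::vector<int>& Y) {
    int m = 0;
    for (std::size_t n = 0; n < X.size(); ++n)
        if (predict(w, X[n]) != Y[n]) ++m;
    return m;
}

// Make up to maxUpdates PLA updates on randomly picked mistaken examples, keeping
// ("pocketing") the vector with the fewest training mistakes. Stops early if the
// pocket vector already separates D. Assumes X is non-empty.
Vec5 pocketPLA(const std::vector<Vec5>& X, const std::vector<int>& Y, int maxUpdates, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, X.size() - 1);
    Vec5 w{};                        // current PLA vector, starts at 0
    Vec5 best = w;                   // pocket vector
    int bestMistakes = countMistakes(best, X, Y);
    int updates = 0;
    while (updates < maxUpdates && bestMistakes > 0) {
        std::size_t n = pick(rng);   // purely random visiting
        if (predict(w, X[n]) == Y[n]) continue;
        for (int d = 0; d < 5; ++d) w[d] += Y[n] * X[n][d];  // eta = 1
        ++updates;
        int m = countMistakes(w, X, Y);
        if (m < bestMistakes) { best = w; bestMistakes = m; }
    }
    return best;
}
```

Because the pocket vector only changes when the training mistake count strictly drops, its training error never rises during a run; Questions 18 and 20 score this vector on the test set, while Question 19 scores the final `w` instead.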