Hsuan-Tien Lin (林轩田) - Machine Learning Foundations - Homework 3 - Python Source Code
Hello everyone. Below is my reference Python code for Homework 3 of Hsuan-Tien Lin's Machine Learning Foundations course, written by myself. I don't have much engineering experience with Python, so if you find mistakes or better ways to optimize the code, please leave a comment so we can learn from each other. Thanks.
Questions 13-15 focus on linear regression for classification and on feature transforms. All sample points are generated by the target function
f(x1, x2) = sign(x1^2 + x2^2 - 0.6)
13.1: Fit the data with linear regression and no feature transform, i.e., directly with the feature vector (1, x1, x2). Run the experiment 1000 times, plot a histogram of Ein over the 1000 runs, and compute the average Ein.
### For Questions 13-15, generate a training set of N = 1000 points on X = [-1, 1] x [-1, 1] with uniform
### probability of picking each x in X. Generate simulated noise by flipping the sign of the output in a
### random 10% subset of the generated training set.
import random
import numpy as np
import matplotlib.pyplot as plt

### target function f(x1, x2) = sign(x1^2 + x2^2 - 0.6)
def target_function(x1, x2):
    return 1 if (x1 * x1 + x2 * x2 - 0.6) >= 0 else -1

### scatter plot of the two-dimensional features, with an optional linear decision boundary
def plot_dot_picture(features, labels, w=np.zeros((3, 1))):
    x1 = features[:, 1]
    x2 = features[:, 2]
    y = labels[:, 0]
    plot_size = 20
    size = np.ones(len(x1)) * plot_size
    size_x1 = np.ma.masked_where(y < 0, size)
    size_x2 = np.ma.masked_where(y > 0, size)

    ### positive points as red crosses, negative points as blue circles
    plt.scatter(x1, x2, s=size_x1, marker='x', c='r', label='+1')
    plt.scatter(x1, x2, s=size_x2, marker='o', c='b', label='-1')

    ### plot the boundary w0 + w1*x1 + w2*x2 = 0 as the zero contour
    x1_tmp = np.arange(-1, 1, 0.01)
    x2_tmp = np.arange(-1, 1, 0.01)
    x1_tmp, x2_tmp = np.meshgrid(x1_tmp, x2_tmp)
    f = x1_tmp * w[1, 0] + x2_tmp * w[2, 0] + w[0, 0]
    try:
        plt.contour(x1_tmp, x2_tmp, f, 0)
    except ValueError:
        pass
    plt.xlabel('X1')
    plt.ylabel('X2')
    plt.title('Feature scatter plot')
    plt.legend()
    plt.show()

### generate training data with 10% label noise; returns (features, labels) as numpy arrays
def training_data_with_random_error(num=1000):
    features = np.zeros((num, 3))
    labels = np.zeros((num, 1))

    points_x1 = np.array([round(random.uniform(-1, 1), 2) for _ in range(num)])
    points_x2 = np.array([round(random.uniform(-1, 1), 2) for _ in range(num)])

    for i in range(num):
        features[i, 0] = 1
        features[i, 1] = points_x1[i]
        features[i, 2] = points_x2[i]
        labels[i] = target_function(points_x1[i], points_x2[i])
        ### flip the labels of 10% of the points (the points are i.i.d.,
        ### so flipping the first 10% is equivalent to a random 10% subset)
        if i < num * 0.1:
            labels[i] = 1 if labels[i] < 0 else -1

    return features, labels

### 0/1 error rate of sign(w^T x) on the given data
def error_rate(features, labels, w):
    wrong = 0
    for i in range(len(labels)):
        if np.dot(features[i], w) * labels[i, 0] < 0:
            wrong += 1
    return wrong / (len(labels) * 1.0)

(features, labels) = training_data_with_random_error(1000)
plot_dot_picture(features, labels)
Since the features are two-dimensional, they are easy to visualize. The scatter plot shows that the distribution of the points roughly matches the target function.
### 13.1 (*) Carry out Linear Regression without transformation, i.e., with feature vector
### (1, x1, x2),
### to find w_lin, and use w_lin directly for classification. Run the experiment 1000 times and plot
### a histogram of the classification (0/1) in-sample error (Ein). What is the average Ein over 1000
### experiments?

"""
linear regression:
    model     : g(x) = w^T x
    strategy  : squared error
    algorithm : closed form (matrix)
    result    : w = (X^T X)^-1 X^T Y
"""
def linear_regression_closed_form(X, Y):
    return np.linalg.inv(np.dot(X.T, X)).dot(X.T).dot(Y)

w = linear_regression_closed_form(features, labels)

### plot one run (just for visualization)
plot_dot_picture(features, labels, w)

### run 1000 times and plot the histogram of Ein
error_rate_array = []
for i in range(1000):
    (features, labels) = training_data_with_random_error(1000)
    w = linear_regression_closed_form(features, labels)
    error_rate_array.append(error_rate(features, labels, w))

bins = np.arange(0, 1, 0.05)
plt.hist(error_rate_array, bins, rwidth=0.8, histtype='bar')
plt.title("Error rate histogram (without feature transform)")
plt.show()

### average error rate, approximately 0.5
avr_err = sum(error_rate_array) / (len(error_rate_array) * 1.0)
print("13.1--Linear regression for classification without feature transform:Average error--", avr_err)
The figure below shows the line learned in a single run; the fit is clearly very poor.
The histogram below shows Ein over 1000 runs. Ein is around 0.5, which means essentially nothing was learned: the target region is a disk, not a half-plane, so the first-order model we chose cannot fit this dataset.
13.1--Linear regression for classification without feature transform:Average error-- 0.50587
### Now, transform the training data into the following nonlinear feature vector:
### (1, x1, x2, x1*x2, x1^2, x2^2)
### Find the vector w~ that corresponds to the solution of Linear Regression, and take it for classification.

"""
feature transform: phi(x) = z = (1, x1, x2, x1*x2, x1^2, x2^2)
"""
def feature_transform(features):
    new = np.zeros((len(features), 6))
    new[:, 0:3] = features[:, :] * 1
    new[:, 3] = features[:, 1] * features[:, 2]
    new[:, 4] = features[:, 1] * features[:, 1]
    new[:, 5] = features[:, 2] * features[:, 2]
    return new

### same scatter plot as before, but with a quadratic decision boundary
def plot_dot_pictures(features, labels, w=np.zeros((6, 1))):
    x1 = features[:, 1]
    x2 = features[:, 2]
    y = labels[:, 0]
    plot_size = 20
    size = np.ones(len(x1)) * plot_size
    size_x1 = np.ma.masked_where(y < 0, size)
    size_x2 = np.ma.masked_where(y > 0, size)

    ### scatter plot
    plt.scatter(x1, x2, s=size_x1, marker='x', c='r', label='+1')
    plt.scatter(x1, x2, s=size_x2, marker='o', c='b', label='-1')

    ### plot the zero contour of w^T phi(x)
    x1_tmp = np.arange(-1, 1, 0.01)
    x2_tmp = np.arange(-1, 1, 0.01)
    x1_tmp, x2_tmp = np.meshgrid(x1_tmp, x2_tmp)
    f = w[0, 0] + x1_tmp * w[1, 0] + x2_tmp * w[2, 0] + x1_tmp * x2_tmp * w[3, 0] \
        + x1_tmp * x1_tmp * w[4, 0] + x2_tmp * x2_tmp * w[5, 0]
    try:
        plt.contour(x1_tmp, x2_tmp, f, 0)
    except ValueError:
        pass
    plt.xlabel('X1')
    plt.ylabel('X2')
    plt.title('Feature scatter plot')
    plt.legend()
    plt.show()

### plot one run (just for visualization)
(features, labels) = training_data_with_random_error(1000)
new_features = feature_transform(features)
w = linear_regression_closed_form(new_features, labels)
plot_dot_pictures(features, labels, w)

### run 1000 times and plot the histogram of Ein
error_rate_array = []
for i in range(1000):
    (features, labels) = training_data_with_random_error(1000)
    new_features = feature_transform(features)
    w = linear_regression_closed_form(new_features, labels)
    error_rate_array.append(error_rate(new_features, labels, w))

bins = np.arange(0, 1, 0.05)
plt.hist(error_rate_array, bins, rwidth=0.8, histtype='bar')
plt.title("Error rate histogram (with feature transform)")
plt.show()

### average error rate, approximately 0.12
avr_err = sum(error_rate_array) / (len(error_rate_array) * 1.0)
print("13.2--Linear regression for classification with feature transform:Average error--", avr_err)
So for the rest of Question 13 we use a quadratic hypothesis: a feature transform turns the nonlinear problem into a linear one, so that linear regression can still be applied. As the figure shows, the learned boundary fits well, with an error rate around 12% (the dataset itself contains 10% noise).
13.2--Linear regression for classification with feature transform:Average error-- 0.124849
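A quick way to see why the transform works: the target function itself is linear in the transformed space, since f(x1, x2) = sign(x1^2 + x2^2 - 0.6) = sign(w~^T z) with the ideal weights w~ = (-0.6, 0, 0, 0, 1, 1). A small sanity-check sketch of my own (not part of the homework), reusing training_data_with_random_error and feature_transform from above: only the 10% flipped labels should disagree with this ideal classifier.

import numpy as np

### ideal weights in z-space: w~^T z = x1^2 + x2^2 - 0.6
w_ideal = np.array([[-0.6], [0], [0], [0], [1], [1]])

features, labels = training_data_with_random_error(1000)
z = feature_transform(features)
pred = np.where(np.dot(z, w_ideal) >= 0, 1, -1)

### only the flipped labels disagree, so this prints roughly 0.1
print("disagreement with noisy labels:", np.mean(pred != labels))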
So is linear regression always suitable for classification problems? Here is a small experiment. I deliberately picked six sample points, near [1, 1] and near [-1, -1] respectively.
### Is linear regression always good for classification? See the following example.
features = np.array([[1, 1.1, 1.2], [1, 1.2, 1.0], [1, 1.0, 1.0],
                     [1, -1.1, -1.2], [1, -1.2, -1.0], [1, -1.0, -1.0]])
labels = np.array([[1], [1], [1], [-1], [-1], [-1]])
w = linear_regression_closed_form(features, labels)

### plot the result (just for visualization)
plot_dot_picture(features, labels, w)
Linear regression produces the line shown in the figure (this result actually surprised me; I had expected a line close to y = x).
### What happens if we add a new point with large x values?
features = np.array([[1, 100, 100], [1, 1.1, 1.2], [1, 1.2, 1.0], [1, 1.0, 1.0],
                     [1, -1.1, -1.2], [1, -1.2, -1.0], [1, -1.0, -1.0]])
labels = np.array([[1], [1], [1], [1], [-1], [-1], [-1]])
w = linear_regression_closed_form(features, labels)

### plot the result (just for visualization)
print(w)
plot_dot_picture(features, labels, w)
print(np.dot(features, w))
### out of the 7 points, 2 are misclassified!
Now add a sample point at [100, 100], which is a perfectly reasonable point to add. The learned boundary becomes a line close to y = x, yet 2 of the points are now misclassified (there used to be a figure here...). A binary classifier such as the perceptron, or other classification algorithms, would handle this problem without trouble, so linear regression is not always suitable for classification; a sketch follows below.
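To back up that claim, here is a minimal sketch of my own (not part of the homework) running plain PLA on the same seven points, reusing error_rate from above. The data are linearly separable, so the perceptron converges to a boundary that classifies all seven points correctly, outlier included.

import numpy as np

### PLA on the seven points above: cycle through the data and update on any mistake
def pla(features, labels, max_iter=1000):
    w = np.zeros((features.shape[1], 1))
    for _ in range(max_iter):
        mistakes = 0
        for x, y in zip(features, labels[:, 0]):
            if np.sign(np.dot(x, w)[0]) != y:   # sign(0) counts as a mistake here
                w += y * x.reshape(-1, 1)       # PLA update: w <- w + y * x
                mistakes += 1
        if mistakes == 0:                       # separable data, so PLA halts
            break
    return w

w_pla = pla(features, labels)
### every point is now classified correctly, unlike the regression line
print("errors:", error_rate(features, labels, w_pla))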
### 14. (*) Run the experiment 1000 times, and plot a histogram of w~3, the weight associated with
### x1*x2. What is the average w~3?

### run 1000 times and plot the histogram
w3_array = []
for i in range(1000):
    (features, labels) = training_data_with_random_error(1000)
    new_features = feature_transform(features)
    w = linear_regression_closed_form(new_features, labels)
    w3_array.append(w[3, 0])

bins = np.arange(-2, 2, 0.05)
plt.hist(w3_array, bins, rwidth=0.8, histtype='bar')
plt.title("Parameter W3 (with feature transform)")
plt.show()

print("Average of W3 is:", sum(w3_array) / (len(w3_array) * 1.0))
Average of W3 is: 0.00120328875641
### 15. (*) Continue from Question 14, and plot a histogram of the classification Eout instead. You can
### estimate it by generating a new set of 1000 points and adding noise as before. What is the average
### Eout?
error_out = []
for i in range(1000):
    ### train w on a fresh training set, then estimate Eout on a fresh test set
    (features, labels) = training_data_with_random_error(1000)
    new_features = feature_transform(features)
    w = linear_regression_closed_form(new_features, labels)
    (test_features, test_labels) = training_data_with_random_error(1000)
    error_out.append(error_rate(feature_transform(test_features), test_labels, w))

bins = np.arange(0, 1, 0.05)
plt.hist(error_out, bins, rwidth=0.8, histtype='bar')
plt.title("Error out (with feature transform)")
plt.show()

print("Average of Eout is:", sum(error_out) / (len(error_out) * 1.0))
Average of Eout is: 0.133649
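Questions 18-20 move on to logistic regression. For reference, the gradient implemented by the code below is the standard one for logistic regression with labels y in {-1, +1} (the same formula abbreviated in the code comments):

$$\nabla E_{\text{in}}(w) = \frac{1}{N}\sum_{n=1}^{N}\theta\!\left(-y_n w^{T}x_n\right)\left(-y_n x_n\right), \qquad \theta(s)=\frac{e^{s}}{1+e^{s}}$$

Fixed-learning-rate gradient descent then iterates w <- w - eta * grad Ein(w); the stochastic version in Question 20 replaces the average with the single-example term.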
### 18. (*) Implement the fixed learning rate gradient descent algorithm below for logistic regression,
### initialized with 0. Run the algorithm with η = 0.001 and T = 2000 on the following set for training:
### http://www.csie.ntu.edu.tw/~htlin/course/ml15fall/hw3/hw3_train.dat
### and the following set for testing:
### http://www.csie.ntu.edu.tw/~htlin/course/ml15fall/hw3/hw3_test.dat
### What is the weight vector within your g? What is the Eout(g) from your algorithm, evaluated using
### the 0/1 error on the test set?
import numpy as np

### read data from a data file
def data_load(file_path):
    ### open the file and read all lines
    with open(file_path) as f:
        lines = f.readlines()

    ### create the features and labels arrays; each line holds d feature values plus
    ### one label, so with the added bias term x0 = 1 the feature vector also has d + 1 entries
    example_num = len(lines)
    feature_dimension = len(lines[0].strip().split())
    features = np.zeros((example_num, feature_dimension))
    features[:, 0] = 1
    labels = np.zeros((example_num, 1))

    for index, line in enumerate(lines):
        ### items[0:-1] -- features, items[-1] -- label
        items = line.strip().split()
        features[index, 1:] = [float(str_num) for str_num in items[0:-1]]
        labels[index] = float(items[-1])

    return features, labels

### full-batch gradient of the logistic regression error
def gradient_descent(X, Y, w):
    ### -Yn * w^T Xn
    tmp = -Y * (np.dot(X, w))
    ### θ(-Yn w^T Xn) = exp(tmp) / (1 + exp(tmp))
    weight_matrix = np.exp(tmp) / ((1 + np.exp(tmp)) * 1.0)
    gradient = 1 / (len(X) * 1.0) * np.sum(weight_matrix * -Y * X, axis=0).reshape(len(w), 1)
    return gradient

### gradient on a single example, for stochastic gradient descent
def stochastic_gradient_descent(X, Y, w):
    ### -Yn * w^T Xn
    tmp = -Y * (np.dot(X, w))
    ### θ(-Yn w^T Xn) = exp(tmp) / (1 + exp(tmp))
    weight = np.exp(tmp) / ((1 + np.exp(tmp)) * 1.0)
    gradient = weight * -Y * X
    return gradient.reshape(len(gradient), 1)

### logistic regression class (first time using a class, haha...)
class LogisticRegression:
    """Logistic regression trained by fixed-learning-rate (stochastic) gradient descent."""

    def __init__(self):
        pass

    ### fit model: ∂E/∂w = 1/N * Σ θ(-Yn w^T Xn)(-Yn Xn)
    def fit(self, X, Y, Eta=0.001, max_iteration=2000, sgd=False):
        self.__w = np.zeros((len(X[0]), 1))
        if not sgd:
            for i in range(max_iteration):
                self.__w = self.__w - Eta * gradient_descent(X, Y, self.__w)
        else:
            ### pick examples in the cyclic order n = 1, 2, ..., N, 1, 2, ...
            index = 0
            for i in range(max_iteration):
                if index >= len(X):
                    index = 0
                self.__w = self.__w - Eta * stochastic_gradient_descent(X[index], Y[index], self.__w)
                index += 1

    ### predict: sign(w^T x), with ties broken toward +1
    def predict(self, X):
        return np.where(np.dot(X, self.__w) >= 0, 1, -1)

    ### get the weight vector w
    def get_w(self):
        return self.__w

    ### score (0/1 error rate)
    def score(self, X, Y):
        predict_Y = self.predict(X)
        return np.sum(predict_Y != Y) / (len(Y) * 1.0)

### train the model
(X, Y) = data_load("hw3_train.dat")
lr = LogisticRegression()
lr.fit(X, Y, max_iteration=2000)

### get the weight vector
print("weight vector:", lr.get_w())

### get the 0/1 error on the test data
test_X, test_Y = data_load("hw3_test.dat")
print("Eout:", lr.score(test_X, test_Y))
weight vector: [[ 0.01878417]
 [-0.01260595]
 [ 0.04084862]
 [-0.03266317]
 [ 0.01502334]
 [-0.03667437]
 [ 0.01255934]
 [ 0.04815065]
 [-0.02206419]
 [ 0.02479605]
 [ 0.06899284]
 [ 0.0193719 ]
 [-0.01988549]
 [-0.0087049 ]
 [ 0.04605863]
 [ 0.05793382]
 [ 0.061218  ]
 [-0.04720391]
 [ 0.06070375]
 [-0.01610907]
 [-0.03484607]]
Eout: 0.475
### 19. (*) Implement the fixed learning rate gradient descent algorithm below for logistic regression,
### initialized with 0. Run the algorithm with η = 0.01 and T = 2000 on the following set for training:
### http://www.csie.ntu.edu.tw/~htlin/course/ml15fall/hw3/hw3_train.dat
### and the following set for testing:
### http://www.csie.ntu.edu.tw/~htlin/course/ml15fall/hw3/hw3_test.dat
### What is the weight vector within your g? What is the Eout(g) from your algorithm, evaluated using
### the 0/1 error on the test set?

### train the model with the larger learning rate η = 0.01
(X, Y) = data_load("hw3_train.dat")
lr_eta = LogisticRegression()
lr_eta.fit(X, Y, 0.01, 2000)

### get the weight vector
print("weight vector:", lr_eta.get_w())

### get the 0/1 error on the test data
test_X, test_Y = data_load("hw3_test.dat")
print("Eout:", lr_eta.score(test_X, test_Y))
weight vector: [[-0.00385379]
 [-0.18914564]
 [ 0.26625908]
 [-0.35356593]
 [ 0.04088776]
 [-0.3794296 ]
 [ 0.01982783]
 [ 0.33391527]
 [-0.26386754]
 [ 0.13489328]
 [ 0.4914191 ]
 [ 0.08726107]
 [-0.25537728]
 [-0.16291797]
 [ 0.30073678]
 [ 0.40014954]
 [ 0.43218808]
 [-0.46227968]
 [ 0.43230193]
 [-0.20786372]
 [-0.36936337]]
Eout: 0.22
### 20. (*) Implement the fixed learning rate stochastic gradient descent algorithm below for logistic
### regression, initialized with 0. Instead of randomly choosing n in each iteration, simply pick
### the examples in the cyclic order n = 1, 2, ..., N, 1, 2, .... Run the algorithm with η = 0.001 and
### T = 2000 on the following set for training:
### http://www.csie.ntu.edu.tw/~htlin/course/ml15fall/hw3/hw3_train.dat
### and the following set for testing:
### http://www.csie.ntu.edu.tw/~htlin/course/ml15fall/hw3/hw3_test.dat
### What is the weight vector within your g? What is the Eout(g) from your algorithm, evaluated using
### the 0/1 error on the test set?

### train the model with SGD in cyclic order
(X, Y) = data_load("hw3_train.dat")
lr_sgd = LogisticRegression()
lr_sgd.fit(X, Y, sgd=True, max_iteration=2000)

### get the weight vector
print("weight vector:", lr_sgd.get_w())

### get the 0/1 error on the test data
test_X, test_Y = data_load("hw3_test.dat")
print("Eout:", lr_sgd.score(test_X, test_Y))
weight vector: [[ 0.01826899]
 [-0.01308051]
 [ 0.04072894]
 [-0.03295698]
 [ 0.01498363]
 [-0.03691042]
 [ 0.01232819]
 [ 0.04791334]
 [-0.02244958]
 [ 0.02470544]
 [ 0.06878235]
 [ 0.01897378]
 [-0.02032107]
 [-0.00901469]
 [ 0.04589259]
 [ 0.05776824]
 [ 0.06102487]
 [-0.04756147]
 [ 0.06035018]
 [-0.01660574]
 [-0.03509342]]
Eout: 0.473