Introduction to Machine Learning (Part 1)

Source: Internet. Editor: 程序博客网. Time: 2024/06/06 03:58

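A few days ago I took a good machine learning course and got hooked. My fundamentals are weak, though, and even in the opening section on linear regression there were points I could not follow when the teacher covered them, so I need to fill in those gaps myself.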

First, how do we perform linear regression?

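Approach: find the minimum of the cost function -> gradient descent, in two variants: batch gradient descent and stochastic gradient descent.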

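Batch and stochastic gradient descent differ only in how many samples go into each update. Stochastic gradient descent uses just the current sample to adjust theta and reduce the cost function, while batch gradient descent processes all samples before adjusting theta. In both cases the key step in deriving the update formula is the same: working out the update of a single weight for a single sample.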

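To minimize J(θ), we look for the point where its derivative with respect to θ is zero. Gradient descent approaches that point iteratively, moving each θ_j a small step against the derivative:

θ_j := θ_j − α · ∂J(θ)/∂θ_j

where α is the learning rate.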

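In the formula above, j is the index of the coefficient (weight). The value on the right-hand side is assigned to θ_j on the left, which completes one iteration for that weight.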

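For a single sample (x, y), the update of a single weight is:

θ_j := θ_j + α · (y − h_θ(x)) · x_j

where h_θ(x) = θ_0·x_0 + θ_1·x_1 + θ_2·x_2 is the model's prediction for that sample.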

θ_j is the weight of the sample's j-th feature, so apart from the shared prediction error its update involves only the j-th feature x_j.
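The single-sample rule can be derived in a few lines. This is a sketch under the usual assumptions: a linear hypothesis and a squared-error cost with a factor of 1/2, which is the cost the code in this post computes. The symbol h_θ for the prediction is notation introduced here.

```latex
\begin{aligned}
h_\theta(x) &= \sum_{j} \theta_j x_j \\
J(\theta) &= \tfrac{1}{2}\bigl(y - h_\theta(x)\bigr)^2 \\
\frac{\partial J}{\partial \theta_j} &= -\bigl(y - h_\theta(x)\bigr)\, x_j \\
\theta_j &:= \theta_j - \alpha \frac{\partial J}{\partial \theta_j}
          = \theta_j + \alpha \bigl(y - h_\theta(x)\bigr)\, x_j
\end{aligned}
```

For batch gradient descent the cost is summed over all m samples, so the same term is summed over the samples before θ_j is changed.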

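Batch gradient descent (the code here is copied from someone else, with my own comments added; reading code is also a good way to understand the formulas):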

# Training data set
# each element in x represents (x0, x1, x2)
x = [(1, 0., 3), (1, 1., 3), (1, 2., 3), (1, 3., 2), (1, 4., 4)]
# y[i] is the output of y = theta0 * x[0] + theta1 * x[1] + theta2 * x[2]
y = [95.364, 97.217205, 75.195834, 60.105519, 49.342380]
# Five (x, y) pairs in total; each sample x has three features.

epsilon = 0.000001  # stop when the cost changes by less than this
alpha = 0.001       # learning rate: a hand-set parameter that controls the step size
error0 = 0          # cost from the previous iteration
m = len(x)

# init the parameters to zero
theta0 = 0
theta1 = 0
theta2 = 0

while True:
    # Loop over every sample and accumulate the single-sample update terms
    # (see the simplified formula above). theta stays fixed during this pass.
    sum0 = 0
    sum1 = 0
    sum2 = 0
    for i in range(m):
        # begin batch gradient descent
        diff = y[i] - (theta0 + theta1 * x[i][1] + theta2 * x[i][2])
        sum0 = sum0 + alpha * diff * x[i][0]
        sum1 = sum1 + alpha * diff * x[i][1]
        sum2 = sum2 + alpha * diff * x[i][2]
        # end batch gradient descent
    # apply the accumulated update once per pass over the data
    theta0 = theta0 + sum0
    theta1 = theta1 + sum1
    theta2 = theta2 + sum2

    # calculate the cost function over all samples
    error1 = 0
    for lp in range(m):
        error1 += (y[lp] - (theta0 + theta1 * x[lp][1] + theta2 * x[lp][2])) ** 2 / 2

    if abs(error1 - error0) < epsilon:
        break
    else:
        error0 = error1

    print(' theta0 : %f, theta1 : %f, theta2 : %f, error1 : %f' % (theta0, theta1, theta2, error1))

print('Done: theta0 : %f, theta1 : %f, theta2 : %f' % (theta0, theta1, theta2))

Stochastic gradient descent:

# Training data set
# each element in x represents (x0, x1, x2)
x = [(1, 0., 3), (1, 1., 3), (1, 2., 3), (1, 3., 2), (1, 4., 4)]
# y[i] is the output of y = theta0 * x[0] + theta1 * x[1] + theta2 * x[2]
y = [95.364, 97.217205, 75.195834, 60.105519, 49.342380]

epsilon = 0.0001  # stop when the cost changes by less than this
alpha = 0.01      # learning rate
error0 = 0        # cost from the previous pass
m = len(x)

# init the parameters to zero
theta0 = 0
theta1 = 0
theta2 = 0

while True:

    # calculate the parameters: theta is updated right after each sample
    for i in range(m):

        diff = y[i] - (theta0 + theta1 * x[i][1] + theta2 * x[i][2])

        theta0 = theta0 + alpha * diff * x[i][0]
        theta1 = theta1 + alpha * diff * x[i][1]
        theta2 = theta2 + alpha * diff * x[i][2]

    # calculate the cost function over all samples
    error1 = 0
    for lp in range(m):
        error1 += (y[lp] - (theta0 + theta1 * x[lp][1] + theta2 * x[lp][2])) ** 2 / 2

    if abs(error1 - error0) < epsilon:
        break
    else:
        error0 = error1

    print(' theta0 : %f, theta1 : %f, theta2 : %f, error1 : %f' % (theta0, theta1, theta2, error1))

print('Done: theta0 : %f, theta1 : %f, theta2 : %f' % (theta0, theta1, theta2))

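The main difference between the two lies in the update inside the `for i in range(m)` loop. Batch gradient descent evaluates the update term for each of the three weights on every sample, accumulates the terms, and only then assigns the result to theta. Stochastic gradient descent updates all the weights immediately after each single sample.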
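The contrast can be made concrete by putting the two updates side by side as functions. This is a minimal sketch, not the original author's code: the helper names `predict`, `cost`, `batch_step` and `sgd_epoch` are hypothetical, introduced here for illustration, and the samples are visited in their given order with no shuffling.

```python
def predict(theta, x):
    """Linear hypothesis: sum of theta_j * x_j."""
    return sum(t * xj for t, xj in zip(theta, x))


def cost(theta, X, y):
    """Half the sum of squared errors over all samples."""
    return sum((yi - predict(theta, xi)) ** 2 for xi, yi in zip(X, y)) / 2


def batch_step(theta, X, y, alpha):
    """One batch update: theta is held fixed while all samples are summed."""
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        diff = yi - predict(theta, xi)  # error under the OLD theta
        for j, xj in enumerate(xi):
            grad[j] += diff * xj
    return [t + alpha * g for t, g in zip(theta, grad)]


def sgd_epoch(theta, X, y, alpha):
    """One pass of stochastic updates: theta changes after every sample."""
    theta = list(theta)  # copy so the caller's list is not modified
    for xi, yi in zip(X, y):
        diff = yi - predict(theta, xi)  # error under the CURRENT theta
        for j, xj in enumerate(xi):
            theta[j] += alpha * diff * xj
    return theta


# Same training data as in the listings above.
X = [(1, 0., 3), (1, 1., 3), (1, 2., 3), (1, 3., 2), (1, 4., 4)]
y = [95.364, 97.217205, 75.195834, 60.105519, 49.342380]

start = [0.0, 0.0, 0.0]
print('batch :', batch_step(start, X, y, 0.001))
print('sgd   :', sgd_epoch(start, X, y, 0.001))
```

Starting from the same theta and the same learning rate, the two functions return slightly different weights after one pass, because the stochastic version already uses the updated theta when it reaches the second sample.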
