Summary of Machine Learning Algorithms (Part 1)

Source: Internet | Posted by: Changchun CNC programming recruitment | Editor: Program Blog Network (程序博客网) | Time: 2024/06/11 04:55

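Preface: I have been studying machine learning for a while, and now I want to summarize the commonly used algorithms. The focus is on explaining how each one works and implementing its core logic in code.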

1. Linear Regression

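The goal of linear regression is to choose the parameters w (a vector) and b to build the hypothesis y = w^T x + b, which predicts the dependent variable y for a given independent variable x. Below, θ denotes w and b together.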
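Cost function:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

Gradient descent:

$$\theta_j := \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

for j = 0, 1, ..., n, where n is the dimension of the independent variable x. θ has n + 1 components: the n entries of w plus the bias b, which multiplies a constant feature equal to 1.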

Implementation:

```python
#################################################
# logRegression: linear Regression
# Author : jiede1
# Date   : 2017-06-20
# HomePage : http://blog.csdn.net/jiede1
# Email  : 3081881935@qq.com
#################################################
import numpy as np

# Cost function J(w) = 1/(2m) * sum((Xw - y)^2)
def ComputeCost(dataSet, classLabel, w):
    dataSet = np.asarray(dataSet, dtype=float)
    m = dataSet.shape[0]
    classLabel = np.asarray(classLabel, dtype=float).reshape((m, 1))
    dataSet = np.hstack((dataSet, np.ones((m, 1))))  # append a ones column so the last weight acts as b
    residual = dataSet @ w - classLabel
    return np.sum(residual ** 2) / (2 * m)

# Batch gradient descent: w := w - alpha * (1/m) * X^T (Xw - y)
def GradientDescent(dataSet, classLabel, w, alpha, time):
    dataSet = np.asarray(dataSet, dtype=float)
    m = dataSet.shape[0]
    classLabel = np.asarray(classLabel, dtype=float).reshape((m, 1))
    dataSet = np.hstack((dataSet, np.ones((m, 1))))  # bias column
    print(dataSet.shape, classLabel.shape)
    for i in range(time):
        w = w - alpha * (1 / m) * dataSet.T @ (dataSet @ w - classLabel)
    return w

# Load the iris dataset
def read_iris():
    from sklearn.datasets import load_iris
    data_set = load_iris()
    data_x = data_set.data
    label = data_set.target + 1
    # from sklearn import preprocessing
    # preprocessing.scale(data_x, axis=0, with_mean=True, with_std=True, copy=False)
    return data_x, label

dataSet, classLabel = read_iris()
n = dataSet.shape[1]
print(dataSet.shape, classLabel.shape)
# n + 1 weights: one per feature plus the bias
w = GradientDescent(dataSet, classLabel, np.random.random((n + 1, 1)), 0.01, 7000)
print(w)
print(ComputeCost(dataSet, classLabel, w))
```

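The implementation above ignores regularization and feature normalization. The parameters of linear regression can also be obtained in closed form with the normal equation (provided X^T X is invertible):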

$$\theta = (X^T X)^{-1} X^T y$$
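A minimal NumPy sketch of the normal equation, for comparison with gradient descent. Assumptions: the helper name `normal_equation` and the toy data are hypothetical, and a column of ones is appended so the last entry of θ is the bias b. It solves the linear system X^T X θ = X^T y with `np.linalg.solve` instead of forming the inverse explicitly, which is cheaper and numerically safer.

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form linear regression: theta = (X^T X)^{-1} X^T y (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    Xb = np.hstack((X, np.ones((X.shape[0], 1))))  # last column of ones -> last theta entry is b
    # Solve (X^T X) theta = X^T y rather than computing the inverse
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# Toy data generated exactly from y = 2*x1 - 3*x2 + 5
X = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [0.0, 4.0], [5.0, 3.0]])
y = 2 * X[:, 0] - 3 * X[:, 1] + 5
theta = normal_equation(X, y)
print(theta.ravel())  # approximately [ 2. -3.  5.]
```

When X^T X is singular (for example with collinear features), `np.linalg.solve` raises an error; `np.linalg.lstsq` is the usual fallback in that case.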

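2. Logistic Regression
Linear regression is used to predict continuous values, whereas logistic regression is used for classification. The hypothesis is: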

$$h_\theta(x) = g(\theta^T x)$$

where g is the logistic function, a commonly used S-shaped (sigmoid) function:

$$g(z) = \frac{1}{1 + e^{-z}}$$

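Logistic regression therefore works by computing, for a given input and the chosen parameters, the probability that the output variable equals 1, i.e. h_θ(x) = P(y = 1 | x; θ).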
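To make the probability interpretation concrete, here is a small sketch that turns h_θ(x) into a class label by thresholding at 0.5. The function names, the parameter vector and the inputs are hypothetical, and the second column of X is assumed to be a constant 1 for the bias.

```python
import numpy as np

def sigmoid(z):
    # logistic function g(z) = 1 / (1 + e^{-z}); maps any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, theta):
    # h_theta(x) = g(theta^T x), interpreted as P(y = 1 | x; theta)
    return sigmoid(X @ theta)

def predict(X, theta, threshold=0.5):
    # label 1 when the estimated probability reaches the threshold
    return (predict_proba(X, theta) >= threshold).astype(int)

theta = np.array([1.0, -2.0])                         # hypothetical parameters (weight, bias)
X = np.array([[4.0, 1.0], [0.0, 1.0], [2.0, 1.0]])    # last column is the constant bias feature
print(predict_proba(X, theta))  # about 0.881, 0.119, 0.5
print(predict(X, theta))        # [1 0 1]
```

Since g(0) = 0.5, thresholding at 0.5 is the same as checking the sign of θ^T x, so the decision boundary θ^T x = 0 is linear in x.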

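Cost function:
Plugging the sigmoid hypothesis into the squared-error cost of linear regression would make J(θ) non-convex, so gradient descent could get stuck in a local minimum. The cost is therefore rewritten as the cross-entropy (log loss), which is convex:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right]$$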
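A vectorized sketch of this cost and its gradient. Assumptions: the helper names are hypothetical, the first column of X is a constant 1 for the bias, and probabilities are clipped away from 0 and 1 so the logarithms stay finite. A handy sanity check: with θ = 0 every prediction is 0.5, so J(0) = log 2 regardless of the labels.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_cost(X, y, theta, eps=1e-12):
    # J(theta) = -1/m * sum(y*log(h) + (1 - y)*log(1 - h))
    m = X.shape[0]
    h = np.clip(sigmoid(X @ theta), eps, 1 - eps)  # keep log() finite
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m

def cross_entropy_grad(X, y, theta):
    # dJ/dtheta_j = 1/m * sum_i (h(x_i) - y_i) * x_ij
    m = X.shape[0]
    return X.T @ (sigmoid(X @ theta) - y) / m

X = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0], [1.0, -0.3]])  # first column = bias feature
y = np.array([1.0, 0.0, 1.0, 0.0])
theta0 = np.zeros(2)
J0 = cross_entropy_cost(X, y, theta0)
grad0 = cross_entropy_grad(X, y, theta0)
print(J0)     # log(2), about 0.6931
print(grad0)  # [ 0.     -0.5375]
```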

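Gradient descent:

$$\theta_j := \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

This has the same form as the linear regression update; only the hypothesis h_θ differs.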

Implementation:

```python
import numpy as np

# Sigmoid function g(Xw)
def sigmoid(dataSet, w):
    return 1 / (1 + np.exp(-dataSet @ w))

# Gradient descent; the 1/m factor of the gradient is folded into alpha here
def GradientDescent(dataSet, classLabel, w, alpha, time):
    dataSet = np.asarray(dataSet, dtype=float)
    m = dataSet.shape[0]
    classLabel = np.asarray(classLabel, dtype=float).reshape((m, 1))
    dataSet = np.hstack((dataSet, np.ones((m, 1))))  # bias column
    print(dataSet.shape, classLabel.shape)
    for i in range(time):
        w = w - alpha * dataSet.T @ (sigmoid(dataSet, w) - classLabel)
    return w

def read_iris():
    from sklearn.datasets import load_iris
    data_set = load_iris()
    data_x = data_set.data
    label = data_set.target
    # from sklearn import preprocessing
    # preprocessing.scale(data_x, axis=0, with_mean=True, with_std=True, copy=False)
    # keep only classes 0 and 1 to obtain a binary problem
    data_x = data_x[np.nonzero(label != 2)[0]]
    label = label[np.nonzero(label != 2)[0]]
    return data_x, label

dataSet, classLabel = read_iris()
n = dataSet.shape[1]
print(dataSet.shape, classLabel.shape)
# n + 1 weights: one per feature plus the bias
w = GradientDescent(dataSet, classLabel, np.random.random((n + 1, 1)), 0.01, 4000)
print(w)
```

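3. BP (Backpropagation) Neural Network
Neural networks are well suited to classification problems with a very large number of input features. A BP network has at least one input layer, one hidden layer and one output layer.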

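Cost function (for one sample; the code below sums it over all samples):

$$J(\theta) = \frac{1}{2}\left(\text{predict} - y\right)^2$$

The backpropagation algorithm below is derived from this expression.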

Forward propagation:
The computation the network performs in the direction from input to output.

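Assume the input matrix is m × n (m samples, n features) and consider a three-layer network whose input layer has n units, hidden layer p units and output layer t units.
The input-to-hidden weight matrix is then p × (n + 1) and the hidden-to-output weight matrix is t × (p + 1), which is 1 × (p + 1) in the code below because t = 1. Why the extra 1? It accounts for the bias term, whose input is fixed at 1.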

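Vectorized form:

$$\text{input}_{\text{hidden}} = W_{\text{in}\to\text{hid}}\,\tilde{X}^T, \qquad \tilde{X} = [\mathbf{1}, X]$$

$$\text{output}_{\text{hidden}} = \text{sigmoid}\left(\text{input}_{\text{hidden}}\right)$$

$$\text{predict} = \text{sigmoid}\left(W_{\text{hid}\to\text{out}}\begin{bmatrix}\mathbf{1}^T \\ \text{output}_{\text{hidden}}\end{bmatrix}\right)$$

where $\tilde{X}$ is X with a column of ones prepended (the bias input), and a row of ones is likewise prepended to the hidden-layer output before it enters the output layer.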
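The vectorized forward pass can be sketched in NumPy as follows, mainly to make the matrix shapes explicit. Assumptions: the function and variable names are hypothetical, column 0 of each weight matrix holds the bias weights (the same layout as the author's code below), and the random inputs only serve to check shapes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, W2):
    """Forward pass of a 3-layer network (hypothetical helper).

    X  : (m, n) inputs, one sample per row
    W1 : (p, n+1) input->hidden weights, column 0 = bias weights
    W2 : (t, p+1) hidden->output weights, column 0 = bias weights
    """
    m = X.shape[0]
    X_b = np.hstack((np.ones((m, 1)), X))                   # prepend bias column -> (m, n+1)
    hidden_out = sigmoid(W1 @ X_b.T)                        # (p, m)
    hidden_b = np.vstack((np.ones((1, m)), hidden_out))     # prepend bias row -> (p+1, m)
    predict = sigmoid(W2 @ hidden_b)                        # (t, m)
    return hidden_out, predict

rng = np.random.default_rng(0)
m, n, p, t = 6, 4, 5, 1
X = rng.normal(size=(m, n))
W1 = rng.uniform(-1, 1, size=(p, n + 1))
W2 = rng.uniform(-1, 1, size=(t, p + 1))
hidden_out, predict = forward(X, W1, W2)
print(hidden_out.shape, predict.shape)  # (5, 6) (1, 6)
```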

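Backpropagation:
Output-layer error term:

$$\delta_{\text{output}} = \text{predict} \odot (1 - \text{predict}) \odot (y - \text{predict})$$

Hidden-layer error term (yinhan is pinyin for "hidden"; the code calls it delta_yinhan):

$$\delta_{\text{hidden}} = \left(W_{\text{hid}\to\text{out},\ \text{no bias}}^T\,\delta_{\text{output}}\right) \odot \text{output}_{\text{hidden}} \odot \left(1 - \text{output}_{\text{hidden}}\right)$$

where ⊙ denotes element-wise multiplication (np.multiply).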

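Note that the bias unit has no error term, because its value is always 1; that is why its column is dropped from W_hid→out above. The weights attached to the bias are still updated, however.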

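Updating the weights W:

Output layer:

$$W_{\text{hid}\to\text{out}} = W_{\text{hid}\to\text{out}} + \alpha\,\delta_{\text{output}}\,\text{output}_{\text{hidden}}^T$$

Hidden layer:

$$W_{\text{in}\to\text{hid}} = W_{\text{in}\to\text{hid}} + \alpha\,\delta_{\text{hidden}}\,X$$

For the bias weights b:

Output layer:

$$b_{\text{out}} = b_{\text{out}} + \alpha\,\delta_{\text{output}}\,\mathbf{1}$$

Hidden layer:

$$b_{\text{hid}} = b_{\text{hid}} + \alpha\,\delta_{\text{hidden}}\,\mathbf{1}$$

where $\mathbf{1}$ is an m × 1 vector of ones, i.e. the error terms are summed over the m samples.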
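A common way to confirm update rules like these is a numerical gradient check: because the rules add α·δ·(input)^T, the implied gradient of J is −δ·(input)^T, which should match central finite differences of the cost. The sketch below assumes the cost is summed over all samples (matching the batch updates), keeps the bias weights in column 0 of each matrix, and uses hypothetical helper names (`cost`, `backprop_grads`, `numerical_grad`) and random data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, y, W1, W2):
    # J = 1/2 * sum over samples of (predict - y)^2
    m = X.shape[0]
    h = sigmoid(W1 @ np.hstack((np.ones((m, 1)), X)).T)
    predict = sigmoid(W2 @ np.vstack((np.ones((1, m)), h)))
    return 0.5 * np.sum((predict - y) ** 2)

def backprop_grads(X, y, W1, W2):
    m = X.shape[0]
    X_b = np.hstack((np.ones((m, 1)), X))                  # (m, n+1)
    h = sigmoid(W1 @ X_b.T)                                 # (p, m)
    h_b = np.vstack((np.ones((1, m)), h))                   # (p+1, m)
    predict = sigmoid(W2 @ h_b)                             # (t, m)
    delta_out = predict * (1 - predict) * (y - predict)     # output-layer error term
    delta_hid = (W2[:, 1:].T @ delta_out) * h * (1 - h)     # hidden-layer error term, bias column dropped
    # the update rules step along +delta @ inputs^T, i.e. along -dJ/dW
    return -(delta_hid @ X_b), -(delta_out @ h_b.T)         # dJ/dW1, dJ/dW2 (column 0 = bias)

def numerical_grad(f, W, eps=1e-6):
    # central differences; each entry of W is perturbed in place and then restored
    g = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        old = W[idx]
        W[idx] = old + eps
        plus = f()
        W[idx] = old - eps
        minus = f()
        W[idx] = old
        g[idx] = (plus - minus) / (2 * eps)
    return g

rng = np.random.default_rng(1)
m, n, p = 8, 4, 5
X = rng.normal(size=(m, n))
y = rng.integers(0, 2, size=(1, m)).astype(float)
W1 = rng.uniform(-1, 1, size=(p, n + 1))
W2 = rng.uniform(-1, 1, size=(1, p + 1))

dW1, dW2 = backprop_grads(X, y, W1, W2)
num_W1 = numerical_grad(lambda: cost(X, y, W1, W2), W1)
num_W2 = numerical_grad(lambda: cost(X, y, W1, W2), W2)
print(np.max(np.abs(dW1 - num_W1)), np.max(np.abs(dW2 - num_W2)))  # both should be tiny
```

If either difference is large, a shape, sign or bias-handling mistake in the δ formulas is the usual culprit.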

The code is below. I implemented two versions; the simpler one is shown here, and all versions are on my GitHub:

```python
######### the first version #########
# BP neural network: a 3-layer network with 4, 5 and 1 units from input to output
# (layer sizes exclude the bias units; bias weights are stored in column 0 of W1 and W2)
import numpy as np

def sigmoid(X):
    return 1 / (1 + np.exp(-X))

# Forward propagation
def forward(dataSet, classLabels, W1, W2):
    m = dataSet.shape[0]
    X = np.hstack((np.ones((m, 1)), dataSet))                       # add bias column to the data
    input_yin = W1 @ X.T                                             # hidden-layer input, (5, m)
    output_yin = sigmoid(input_yin)                                  # hidden-layer output, (5, m)
    output_yin_pianzhi = np.vstack((np.ones((1, m)), output_yin))    # add bias row to hidden output
    input_out = W2 @ output_yin_pianzhi
    predict = sigmoid(input_out)                                     # network output, (1, m)
    return output_yin, predict  # hidden-layer output and network output

def predict_end(output_out):
    # final predictions: threshold the output at 0.5
    return (output_out[0] >= 0.5).astype(int)

# Backpropagation
def backward(dataSet, classLabels, output_yin, predict, W1, W2):
    delta_output = predict * (1 - predict) * (classLabels - predict)  # output-layer error term, (1, m)
    W2_wupianzhi = W2[:, 1:]  # drop the weights attached to the bias
    delta_yinhan = (W2_wupianzhi.T @ delta_output) * output_yin * (1 - output_yin)  # hidden-layer error term, (5, m)
    return delta_output, delta_yinhan

def updatew(W1, W2, dataSet, delta_output, delta_yinhan, output_yin, alpha):
    ones = np.ones((dataSet.shape[0], 1))
    W2[:, 1:] += alpha * delta_output @ output_yin.T   # output-layer weight update
    W1[:, 1:] += alpha * delta_yinhan @ dataSet        # hidden-layer weight update
    W2[:, :1] += alpha * delta_output @ ones           # output-layer bias update
    W1[:, :1] += alpha * delta_yinhan @ ones           # hidden-layer bias update
    return W2, W1

# Error rate; for simplicity only a scalar output is handled
def error_rate(classLabels, predict):
    return float(np.sum(classLabels != predict)) / len(predict)

# Data for the 4-5-1 network (two-class problem)
def read_iris():
    from sklearn.datasets import load_iris
    data_set = load_iris()
    data_x = data_set.data
    label = data_set.target
    # keep only classes 0 and 1
    data_x = data_x[np.nonzero(label != 2)[0]]
    label = label[np.nonzero(label != 2)[0]]
    arr = np.arange(data_x.shape[0])
    np.random.shuffle(arr)  # shuffle the data
    return data_x[arr], label[arr]

dataSet, classLabels = read_iris()
maxiter = 1000
alpha = 0.001
# random weights in (-1, 1); column 0 of each matrix holds the bias weights
W1 = np.random.random((5, 5)) - np.random.random((5, 5))  # 5 hidden units x (1 bias + 4 inputs)
W2 = np.random.random((1, 6)) - np.random.random((1, 6))  # 1 output x (1 bias + 5 hidden units)
for i in range(maxiter):
    output_yin, predict = forward(dataSet, classLabels, W1, W2)
    delta_output, delta_yinhan = backward(dataSet, classLabels, output_yin, predict, W1, W2)
    W2, W1 = updatew(W1, W2, dataSet, delta_output, delta_yinhan, output_yin, alpha)
predict = predict_end(predict)
print(error_rate(classLabels, predict))
```

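4. Naive Bayes
Naive Bayes classifies with Bayes' theorem, together with the "naive" assumption that the features are conditionally independent given the class, so that p(x | y) = ∏_j p(x_j | y):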

$$p(y \mid x) = \frac{p(x \mid y)\,p(y)}{p(x)}$$

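Laplace smoothing: in the formula above, p(x | y) can be 0 (for example when a feature value never occurs together with class y in the training data), which forces p(y | x) = 0. The fix is to add 1 to the count of every feature value within each class. When the training set is large enough, this has a negligible effect on the estimates, and it removes the awkward zero-probability problem.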
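Since scikit-learn's GaussianNB (used below) models continuous features and does not use this counting scheme, here is a from-scratch sketch of Laplace smoothing for categorical features. Assumptions: features are integer-coded in [0, n_values), all features share the same number of values, and `fit_categorical_nb`, `predict_nb` and the toy data are hypothetical. Log-probabilities are summed instead of multiplying probabilities, and p(x) is dropped because it is the same for every class.

```python
import numpy as np

def fit_categorical_nb(X, y, n_values, alpha=1.0):
    """Estimate p(y) and p(x_j = v | y) with add-alpha (Laplace) smoothing."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    # likelihood[c, j, v] = (count(x_j = v, y = c) + alpha) / (count(y = c) + alpha * n_values)
    likelihood = np.zeros((len(classes), X.shape[1], n_values))
    for ci, c in enumerate(classes):
        Xc = X[y == c]
        for j in range(X.shape[1]):
            counts = np.bincount(Xc[:, j], minlength=n_values)
            likelihood[ci, j] = (counts + alpha) / (len(Xc) + alpha * n_values)
    return classes, priors, likelihood

def predict_nb(x, classes, priors, likelihood):
    # argmax_c  log p(c) + sum_j log p(x_j | c)
    cols = np.arange(len(x))
    log_post = np.log(priors) + np.array(
        [np.sum(np.log(likelihood[ci, cols, x])) for ci in range(len(classes))]
    )
    return classes[np.argmax(log_post)]

X = np.array([[0, 1], [0, 0], [1, 1], [2, 2], [2, 1], [1, 2]])
y = np.array([0, 0, 0, 1, 1, 1])
classes, priors, likelihood = fit_categorical_nb(X, y, n_values=3)
print(likelihood[0, 0])  # class 0, feature 0: counts [2, 1, 0] -> [3/6, 2/6, 1/6]
print(predict_nb(np.array([0, 1]), classes, priors, likelihood))  # 0
```

Without smoothing, p(x_0 = 0 | y = 1) would be exactly 0 here (that value never occurs in class 1); with smoothing every estimated probability stays positive.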

Implementation (using scikit-learn):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])
# GaussianNB models each continuous feature with a per-class normal distribution
clf = GaussianNB().fit(X, Y)
print(clf.predict([[-0.8, -1]]))
```

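5. SVM (Support Vector Machine)
The goal of an SVM is to find a separating hyperplane that maximizes the distance (the margin) between the hyperplane and the support vectors, i.e. the training points closest to it.
Here I don't implement the SVM from scratch; instead I use the SVM implementation already available in sklearn.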

Code:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])
clf = SVC()
clf.fit(X, y)
print(clf.predict([[-0.8, -1]]))
```
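To connect the code with the max-margin idea, the sketch below refits the same four points with a linear kernel and reads back the hyperplane and support vectors through scikit-learn's `coef_`, `intercept_` and `support_vectors_` attributes. The choice `kernel="linear"` with a very large `C` is an assumption made to approximate a hard-margin SVM; the default `SVC()` above uses an RBF kernel. For a separating hyperplane w·x + b = 0, the margin width is 2/||w||.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])

# Linear kernel + very large C approximates the hard-margin SVM
clf = SVC(kernel="linear", C=1e6).fit(X, y)
w = clf.coef_[0]        # normal vector of the separating hyperplane w.x + b = 0
b = clf.intercept_[0]
margin = 2 / np.linalg.norm(w)
print("support vectors:", clf.support_vectors_)
print("w =", w, "b =", b)
print("margin width 2/||w||:", margin)
print(clf.predict([[-0.8, -1]]))
```

Here the closest points of the two classes are (-1, -1) and (1, 1), a distance 2√2 ≈ 2.83 apart, so the fitted hyperplane should be close to x1 + x2 = 0 with a margin width close to 2√2.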
