Deep Learning Study Notes (1): Softmax Regression and a Python Implementation
I was at a loss, not knowing what to do, and could see no hope.
Then I happened to see Andrew Ng's machine learning course on Coursera and his deep learning tutorial on UFLDL, so I calmed down, watched the videos one by one, did the assignments one by one, and wrote the programs one by one. A lot of the math was new to me and I did not know Matlab, so at the beginning progress was as slow as a snail, but after sticking with it for a few months I finally finished. To avoid forgetting, I am recording some of it here. My abilities are limited, my Python is not great, and neither is my English, so if anything is wrong or inappropriate, please do not hesitate to point it out.
I am still not entirely clear about the theory behind softmax, whether it comes from information theory or from probability. For now I will settle for a rough understanding and start using it; the underlying theory can be filled in gradually later.
The basic theory of softmax:
For a given input x and output y, a K-class classifier models the probability of each class as P(y = k | x; θ), that is

    P(y = k | x; θ) = exp(θ(k)·x) / Σ_{j=1..K} exp(θ(j)·x)
The model parameters are θ(1), θ(2), …, θ(K) ∈ R^n; it is convenient to arrange θ as a K×n matrix (where n is the dimensionality of the input x, i.e. the number of features).
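As a quick illustration (not part of the implementation below), here is a minimal numpy sketch of the hypothesis: it computes P(y = k | x; θ) for a batch of samples stored column-wise. The names Theta, X, P and the sizes are purely illustrative; subtracting the column maximum is a common trick for numerical stability.

import numpy as np

K, n, m = 10, 784, 5            # classes, features, samples (illustrative sizes)
Theta = np.random.randn(K, n)   # parameter matrix, one row per class
X = np.random.randn(n, m)       # samples stored column-wise

Z = Theta.dot(X)                       # K by m matrix of scores theta(k).x(i)
Z = Z - Z.max(axis=0)                  # subtract the column max for numerical stability
P = np.exp(Z) / np.exp(Z).sum(axis=0)  # P[k, i] = P(y(i)=k | x(i); Theta)
print(P.shape, P.sum(axis=0))          # each column sums to 1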
The cost function of softmax regression:

    J(θ) = -(1/m) Σ_{i=1..m} Σ_{k=1..K} 1{y(i) = k} · log P(y(i) = k | x(i); θ)
Here 1{y(i) = k} is the indicator function: it is 1 when y(i) equals k and 0 otherwise; in other words, it is 1 when the expression inside the braces is true and 0 otherwise.
The gradient with respect to θ(k):

    ∇θ(k) J(θ) = -(1/m) Σ_{i=1..m} x(i) · (1{y(i) = k} - P(y(i) = k | x(i); θ))
I ran into two problems while implementing this model and was stuck for a while:
1. How to implement the indicator function? My approach: convert y into a vector yv of K elements, with yv[i] = 1 if y = i and all other entries 0. Multiplying this vector element-wise with the probabilities P in the cost function, and subtracting it from P in the gradient formula, implements the indicator function (see the sketch after this list).
2. I was not very fluent with matrix operations, so vectorization took quite a bit of time.
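Here is a minimal, self-contained sketch of that one-hot trick and of the vectorized cost and gradient it enables. The names (y_mat, Theta, etc.) and sizes are purely illustrative and independent of the classes further below.

import numpy as np

K, n, m = 10, 784, 5
Theta = np.random.randn(K, n)
X = np.random.randn(n, m)
y = np.random.randint(0, K, size=m)      # labels in 0..K-1

# one-hot indicator matrix: y_mat[k, i] = 1{y(i) = k}
y_mat = np.zeros((K, m))
y_mat[y, np.arange(m)] = 1

Z = Theta.dot(X)
P = np.exp(Z - Z.max(axis=0))
P /= P.sum(axis=0)                       # P[k, i] = P(y(i)=k | x(i))

cost = -np.sum(y_mat * np.log(P)) / m    # element-wise product applies the indicator
grad = -(y_mat - P).dot(X.T) / m         # K by n gradient, one row per theta(k)
print(cost, grad.shape)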
Shifting the parameters θ of the probability P gives exactly the same probabilities, so the parameters θ are redundant. There are two remedies. The first is to add an L2 (weight decay) penalty term to the cost function and gradient, which introduces one more free parameter, the penalty coefficient. The second is to fix the parameters of one class to zero, which does not affect the final classification result. My implementation uses the second approach.
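A quick numerical check of this redundancy (illustrative names only, not part of the implementation): subtracting the same vector from every θ(k) leaves the softmax probabilities unchanged, so subtracting θ(K) itself makes the last class's parameters zero without changing anything.

import numpy as np

K, n, m = 4, 3, 2
Theta = np.random.randn(K, n)
X = np.random.randn(n, m)

def softmax_probs(Theta, X):
    Z = Theta.dot(X)
    E = np.exp(Z - Z.max(axis=0))
    return E / E.sum(axis=0)

psi = np.random.randn(1, n)
print(np.allclose(softmax_probs(Theta, X), softmax_probs(Theta - psi, X)))  # True

Theta0 = Theta - Theta[-1]   # fix the last class's parameters to zero
print(np.allclose(softmax_probs(Theta, X), softmax_probs(Theta0, X)))       # True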
The gradient-checking method mentioned in the tutorial is very effective: it verifies whether the cost function and gradient are implemented correctly. Once the gradient check passes, you generally get correct results.
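The idea is to compare the analytic gradient against the central-difference approximation (J(θ + ε·e_i) - J(θ - ε·e_i)) / (2ε) on a small batch. Below is a generic sketch for a scalar-valued cost function; the hypothetical helper name numerical_gradient is only illustrative, and the checkGradient function actually used for this model appears in the third piece of code further down.

import numpy as np

def numerical_gradient(costFunc, theta, *args, eps=1e-6):
    # central-difference approximation of the gradient of costFunc at theta;
    # costFunc(theta, *args) is assumed to return a scalar cost
    numgrad = np.zeros_like(theta)
    for i in range(theta.size):
        old = theta[i]
        theta[i] = old - eps
        loss1 = costFunc(theta, *args)
        theta[i] = old + eps
        loss2 = costFunc(theta, *args)
        theta[i] = old
        numgrad[i] = (loss2 - loss1) / (2 * eps)
    return numgrad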
The exercises in the UFLDL tutorial are in Matlab. Since I am not familiar enough with Matlab, I implemented them with Python + numpy + scipy. See the comments in the code for what each part does.
The first piece of code is an abstract supervised-learning model class; it can be used for supervised models such as neural networks.
import numpy as np
from dp.common.optimize import minFuncSGD
import scipy.optimize as spopt


class SupervisedLearningModel(object):
    def flatTheta(self):
        '''
        convert weights and intercept to a 1-dim vector
        '''
        pass

    def rebuildTheta(self, theta):
        '''
        convert 1-dim theta to weights and intercept
        (subclasses overwrite this method)
        Parameters:
            theta - The vector holding the weights and intercept,
                    needed by the scipy.optimize functions
                    size: outputSize*(inputSize+1)
        '''
        pass

    def cost(self, theta, X, y):
        '''
        This method is used by optimize functions such as
        fmin_cg and fmin_l_bfgs_b in scipy.optimize
        Parameters:
            theta - 1-dim vector of weights
            X - samples, numFeatures by numSamples
            y - labels, numSamples elements vector
        return: the model cost
        '''
        pass

    def gradient(self, theta, X, y):
        '''
        This method is used by optimize functions such as
        fmin_cg and fmin_l_bfgs_b in scipy.optimize
        Parameters:
            theta - 1-dim vector of weights
            X - samples, numFeatures by numSamples
            y - labels, numSamples elements vector
        return: the model gradient
        '''
        pass

    def costFunc(self, theta, X, y):
        '''
        This method is used by optimize functions such as
        minFuncSGD in this package
        Parameters:
            theta - 1-dim vector of weights
            X - samples, numFeatures by numSamples
            y - labels, numSamples elements vector
        return: the model cost and gradient
        '''
        pass

    def predict(self, Xtest):
        '''
        predict the test samples
        Parameters:
            Xtest - test samples, numFeatures by numSamples
        return: the prediction result, a vector with numSamples elements
        '''
        pass

    def performance(self, Xtest, ytest):
        '''
        Before calling this method, this model should be trained
        Parameters:
            Xtest - The data to be predicted, numFeatures by numData
            ytest - the labels of Xtest
        return: the accuracy in percent
        '''
        pred = self.predict(Xtest)
        return np.mean(pred == ytest) * 100

    def train(self, X, y):
        '''
        use this method to train the model.
        Parameters:
            X - samples, numFeatures by numSamples
            y - labels, numSamples elements vector
        '''
        theta = self.flatTheta()
        ret = spopt.fmin_l_bfgs_b(self.cost, theta, fprime=self.gradient,
                                  args=(X, y), m=200, disp=1, maxiter=100)
        opttheta = ret[0]
        '''
        opttheta = spopt.fmin_cg(self.cost, theta, fprime=self.gradient,
                                 args=(X, y), full_output=False, disp=True,
                                 maxiter=100)
        '''
        '''
        options = dict()
        options['epochs'] = 10
        options['alpha'] = 2
        options['minibatch'] = 256
        opttheta = minFuncSGD(self.costFunc, theta, X, y, options)
        '''
        self.rebuildTheta(opttheta)
The second piece of code defines a single neural-network layer, NNLayer, which inherits from the SupervisedLearningModel class above. It is used by both softmax regression and multi-layer neural networks.
class NNLayer(SupervisedLearningModel):
    '''
    This class is a single layer of a neural network
    '''
    def __init__(self, inputSize, outputSize, Lambda, actFunc='sigmoid'):
        '''
        Constructor: initialize one layer w.r.t. the parameters
        Parameters:
            inputSize - the number of input elements
            outputSize - the number of output elements
            Lambda - weight decay parameter
            actFunc - the activation function: sigmoid, tanh or rectified linear
        '''
        super().__init__()
        self.inputSize = inputSize
        self.outputSize = outputSize
        self.Lambda = Lambda
        self.actFunc = sigmoid
        self.actFuncGradient = sigmodGradient
        self.input = 0        # input of this layer
        self.activation = 0   # output of this layer
        self.delta = 0        # the error term of this layer
        self.W = 0            # the weights
        self.b = 0            # the intercept (bias)
        if actFunc == 'sigmoid':
            self.actFunc = sigmoid
            self.actFuncGradient = sigmodGradient
        if actFunc == 'tanh':
            self.actFunc = tanh
            self.actFuncGradient = tanhGradient
        if actFunc == 'rectfiedLinear':
            self.actFunc = rectfiedLinear
            self.actFuncGradient = rectfiedLinearGradient
        # initialize weights and intercept (bias);
        # the value of epsilon comes from an empirical formula
        epsilon_init = 2.4495 / np.sqrt(self.inputSize + self.outputSize) * 0.001
        theta = np.random.rand(self.outputSize, self.inputSize + 1) * 2 * epsilon_init - epsilon_init
        self.rebuildTheta(theta)

    def flatTheta(self):
        '''
        convert weights and intercept to a 1-dim vector
        '''
        W = np.hstack((self.W, self.b))
        return W.ravel()

    def rebuildTheta(self, theta):
        '''
        overwrite the method in SupervisedLearningModel:
        convert 1-dim theta to weights and intercept
        Parameters:
            theta - The vector holding the weights and intercept,
                    needed by the scipy.optimize functions
                    size: outputSize*(inputSize+1)
        '''
        W = theta.reshape(self.outputSize, -1)
        self.b = W[:, -1].reshape(self.outputSize, 1)  # bias b is a vector with outputSize elements
        self.W = W[:, :-1]

    def forward(self):
        '''
        compute the activations of this layer;
        self.input holds the examples, inputSize by numSamples
        '''
        Z = np.dot(self.W, self.input) + self.b  # weighted sum Z
        self.activation = self.actFunc(Z)        # activations
        return self.activation

    def backpropagate(self):
        '''
        compute the error term of the previous layer:
            delta(l) = (W(l).T * delta(l+1)) .* f'(z(l))
        Here self.delta is the error term delta(l+1) of this layer
        (outputSize by numSamples) and self.input is the activation of the
        previous layer (inputSize by numSamples).
        f' is rewritten in terms of the activation to avoid a second call
        to the activation function.
        If this layer is the first hidden layer, this method should not be called.
        '''
        return np.dot(self.W.T, self.delta) * self.actFuncGradient(self.input)

    def layerGradient(self):
        '''
        compute the gradients of the weights and intercept:
            grad_W(l) = delta(l+1) * input.T / m
            grad_b(l) = sum(delta(l+1)) / m
        self.input - input of this layer, inputSize by numSamples
        self.delta - the error term, outputSize by numSamples
        '''
        m = self.input.shape[1]
        gw = np.dot(self.delta, self.input.T) / m
        gb = np.sum(self.delta, 1) / m
        # combine the gradients of weights and intercepts into one matrix
        grad = np.hstack((gw, gb.reshape(-1, 1)))
        return grad


def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))


def sigmodGradient(a):
    # a = sigmoid(Z)
    return a * (1 - a)


def tanh(Z):
    e1 = np.exp(Z)
    e2 = np.exp(-Z)
    return (e1 - e2) / (e1 + e2)


def tanhGradient(a):
    return 1 - a ** 2


def rectfiedLinear(Z):
    a = np.zeros(Z.shape) + Z
    a[a < 0] = 0
    return a


def rectfiedLinearGradient(a):
    b = np.zeros(a.shape) + a
    b[b > 0] = 1
    return b
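To clarify the calling convention, here is a small usage sketch (random data, purely illustrative): the layer reads its samples from self.input, which must be set before forward() is called, and layerGradient() expects the error term in self.delta.

import numpy as np

# assuming NNLayer and its activation functions as defined above
layer = NNLayer(inputSize=784, outputSize=25, Lambda=0, actFunc='sigmoid')

X = np.random.rand(784, 32)          # 32 samples stored column-wise
layer.input = X                      # the layer reads its input from self.input
a = layer.forward()                  # activations, 25 by 32

# in a full network the next layer supplies this error term;
# a random one here only exercises the gradient code
layer.delta = np.random.randn(25, 32)
grad = layer.layerGradient()         # 25 by 785 (weight and bias columns combined)
print(a.shape, grad.shape)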
The third piece of code is the softmax regression implementation; it inherits from NNLayer.
import numpy as np
#import scipy.optimize as spopt
from dp.supervised import NNBase
from time import time
#from dp.common.optimize import minFuncSGD


class SoftmaxRegression(NNBase.NNLayer):
    '''
    We assume the weights of the last class to be zeros in this implementation.
    Weight decay is not used here.
    '''
    def __init__(self, numFeatures, numClasses, Lambda=0):
        '''
        Initialization of weights, intercepts and other members
        Parameters:
            numFeatures - the number of features of each sample
            numClasses - the number of classes to be classified
            Lambda - weight decay parameter (unused here)
        '''
        # call the super constructor to initialize the weights and intercepts;
        # we do not need the weights and intercepts of the last class
        super().__init__(numFeatures, numClasses - 1, Lambda, None)
        #self.X = 0
        self.y_mat = 0

    def predict(self, Xtest):
        '''
        Prediction. Before calling this method, this model should be trained.
        Parameters:
            Xtest - The data to be predicted, numFeatures by numData
        '''
        Z = np.dot(self.W, Xtest) + self.b
        # add the scores of the last class, they are all zeros
        lastClass = np.zeros((1, Xtest.shape[1]))
        Z = np.vstack((Z, lastClass))
        # the index of the max value in each column is the prediction
        return np.argmax(Z, 0)

    def forward(self):
        '''
        compute the matrix of the softmax hypothesis;
        this method is called by the cost and gradient methods
        '''
        h = np.dot(self.W, self.input) + self.b
        h = np.exp(h)
        # add the unnormalized probabilities of the last class, they are all ones
        h = np.vstack((h, np.ones((1, self.input.shape[1]))))
        # the normalizer over all classes
        hsum = np.sum(h, axis=0)
        # the probability of each class
        self.activation = h / hsum
        # delta = -(self.y_mat - h)
        self.delta = self.activation - self.y_mat
        self.delta = self.delta[:-1, :]
        return self.activation

    def setTrainingLabels(self, y):
        # convert the vector y to a matrix y_mat:
        # for sample i belonging to the k-th class,
        # y_mat[k, i] = 1 and all other entries of column i are 0
        y = y.astype(np.int64)
        m = y.shape[0]
        yy = np.arange(m)
        self.y_mat = np.zeros((self.outputSize + 1, m))
        self.y_mat[y, yy] = 1

    def softmaxforward(self, theta, X, y):
        self.input = X
        self.setTrainingLabels(y)
        self.rebuildTheta(theta)
        return self.forward()

    def cost(self, theta, X, y):
        '''
        The cost function.
        Parameters:
            theta - The vector holding the weights and intercept,
                    needed by the scipy.optimize functions
                    size: (numClasses - 1)*(numFeatures + 1)
        '''
        h = np.log(self.softmaxforward(theta, X, y))
        # h * self.y_mat applies the indicator function
        cost = -np.sum(h * self.y_mat, axis=(0, 1))
        return cost / X.shape[1]

    def gradient(self, theta, X, y):
        '''
        The gradient function.
        Parameters:
            theta - The vector holding the weights and intercept,
                    needed by the scipy.optimize functions
                    size: (numClasses - 1)*(numFeatures + 1)
        '''
        self.softmaxforward(theta, X, y)
        # get the gradient
        grad = super().layerGradient()
        return grad.ravel()

    def costFunc(self, theta, X, y):
        grad = self.gradient(theta, X, y)
        h = np.log(self.activation)
        cost = -np.sum(h * self.y_mat, axis=(0, 1)) / X.shape[1]
        return cost, grad


def checkGradient(X, y):
    sm = SoftmaxRegression(X.shape[0], 10)
    #W = np.hstack((sm.W, sm.b))
    #sm.setTrainData(X, y)
    theta = sm.flatTheta()
    #grad = sm.gradient(theta, X, y)
    cost, grad = sm.costFunc(theta, X, y)
    numgrad = np.zeros(grad.shape)
    e = 1e-6
    for i in range(np.size(grad)):
        theta[i] = theta[i] - e
        loss1, g1 = sm.costFunc(theta, X, y)
        theta[i] = theta[i] + 2 * e
        loss2, g2 = sm.costFunc(theta, X, y)
        theta[i] = theta[i] - e
        numgrad[i] = (-loss1 + loss2) / (2 * e)
    print(np.sum(np.abs(grad - numgrad)) / np.size(grad))
The tests use the MNIST dataset. The resulting accuracy is about 92.5%.
Test code:
X = np.load('../../common/trainImages.npy') / 255
X = X.T
y = np.load('../../common/trainLabels.npy')
'''
X1 = X[:, :10]
y1 = y[:10]
checkGradient(X1, y1)
'''
Xtest = np.load('../../common/testImages.npy') / 255
Xtest = Xtest.T
ytest = np.load('../../common/testLabels.npy')

sm = SoftmaxRegression(X.shape[0], 10)
t0 = time()
sm.train(X, y)
print('training Time %.5f s' % (time() - t0))
print('test acc :%.3f%%' % (sm.performance(Xtest, ytest)))
References:
Softmax Regression http://ufldl.stanford.edu/tutorial/supervised/SoftmaxRegression/