UFLDL Exercise: Softmax Regression

Source: Internet. Editor: 程序博客网. Date: 2024/05/22 12:35

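This exercise uses softmax regression to build a handwritten-digit recognizer. The main difficulty is writing the cost function and its gradient in vectorized form.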

STEP 2: Implement softmaxCost

function [cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, data, labels)

% numClasses - the number of classes
% inputSize - the size N of the input vector
% lambda - weight decay parameter
% data - the N x M input matrix, where each column data(:, i) corresponds to
%        a single test set
% labels - an M x 1 matrix containing the labels corresponding for the input data
%

% Unroll the parameters from theta
theta = reshape(theta, numClasses, inputSize);

numCases = size(data, 2);

groundTruth = full(sparse(labels, 1:numCases, 1));
cost = 0;

thetagrad = zeros(numClasses, inputSize);

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute the cost and gradient for softmax regression.
%                You need to compute thetagrad and cost.
%                The groundTruth matrix might come in handy.

m = theta * data;                        % class scores, numClasses x numCases
m = bsxfun(@minus, m, max(m, [], 1));    % subtract each column's max to avoid overflow in exp
m = exp(m);
m = bsxfun(@rdivide, m, sum(m, 1));      % normalize columns: m(j, i) = P(y(i) = j | x(i))

cost = -sum(sum(groundTruth .* log(m)))/size(data,2) + lambda/2*sum(sum(theta.^2));
thetagrad = -(groundTruth - m)*data'/size(data,2) + lambda*theta;

% ------------------------------------------------------------------
% Unroll the gradient matrices into a vector for minFunc
grad = [thetagrad(:)];
end

Here groundTruth is a numClasses x numCases matrix. Its rows index the classes and its columns index the examples: if example j belongs to class i, then groundTruth(i, j) = 1 and groundTruth(x, j) = 0 for every x ~= i.
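To make the layout of this one-hot matrix concrete, here is a minimal sketch. The values in `labels` are made up purely for illustration and are not taken from the exercise data.

```matlab
% Hypothetical labels for 4 examples drawn from 3 classes (illustrative only).
labels = [2; 1; 3; 2];
numCases = numel(labels);

% Same construction as in softmaxCost: entry (labels(j), j) is set to 1.
groundTruth = full(sparse(labels, 1:numCases, 1));

% Each column contains exactly one 1, in the row given by that example's label:
%   0 1 0 0
%   1 0 0 1
%   0 0 1 0
disp(groundTruth);
```

Note that this construction infers the number of rows from the largest label present, so it assumes every class appears at least once in `labels`.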

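Then, from the cost function (writing M for numCases, K for numClasses and N for inputSize, to avoid a clash with the code variable m)

$$ J(\theta) = -\frac{1}{M} \sum_{i=1}^{M} \sum_{j=1}^{K} 1\{y^{(i)} = j\} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{K} e^{\theta_l^T x^{(i)}}} + \frac{\lambda}{2} \sum_{j=1}^{K} \sum_{n=1}^{N} \theta_{jn}^2 $$

it is easy to see that

entry (j, i) of groundTruth .* log(m) is equivalent to

$$ 1\{y^{(i)} = j\} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{K} e^{\theta_l^T x^{(i)}}} $$

since after normalization m(j, i) holds the probability that example i belongs to class j. Subtracting each column's maximum before exponentiating only guards against overflow; the factor cancels in the ratio.

Once this is understood, the expression for cost follows directly: sum all entries of groundTruth .* log(m), divide by -M, and add the weight decay term.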
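One way to convince yourself that the vectorized expression matches the double sum in the cost function is to compare it with an explicit loop on a tiny problem. The sketch below does that for the data term only (weight decay omitted); `theta`, `data` and `labels` are hypothetical values chosen for the example.

```matlab
% Tiny made-up problem: 3 classes, 2 input dimensions, 4 examples.
theta  = [0.1 -0.2; 0.0 0.3; -0.1 0.2];   % numClasses x inputSize
data   = [1 0 2 -1; 0.5 1 -1 2];          % inputSize x numCases
labels = [1; 3; 2; 3];
numClasses = size(theta, 1);
numCases = size(data, 2);
groundTruth = full(sparse(labels, 1:numCases, 1));

% Vectorized class probabilities, computed as in softmaxCost.
m = theta * data;
m = bsxfun(@minus, m, max(m, [], 1));
m = exp(m);
m = bsxfun(@rdivide, m, sum(m, 1));
vectorized = -sum(sum(groundTruth .* log(m))) / numCases;

% Explicit double loop over examples i and classes j (no max subtraction).
looped = 0;
for i = 1:numCases
    scores = theta * data(:, i);
    p = exp(scores) / sum(exp(scores));
    for j = 1:numClasses
        looped = looped - (labels(i) == j) * log(p(j));
    end
end
looped = looped / numCases;

fprintf('vectorized = %.6f, looped = %.6f\n', vectorized, looped);
```

The two values should agree to within floating-point rounding, which also shows that subtracting the column maximum does not change the cost.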

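As for thetagrad, the gradient formula (with M = numCases) is

$$ \nabla_{\theta_j} J(\theta) = -\frac{1}{M} \sum_{i=1}^{M} \left[ x^{(i)} \left( 1\{y^{(i)} = j\} - p(y^{(i)} = j \mid x^{(i)}; \theta) \right) \right] + \lambda \theta_j $$

-(groundTruth - m)*data' produces a new matrix of the same size as theta, whose j-th row is the transpose of

$$ -\sum_{i=1}^{M} x^{(i)} \left( 1\{y^{(i)} = j\} - p(y^{(i)} = j \mid x^{(i)}; \theta) \right) $$

This works because groundTruth - m has size numClasses x numCases and data has size inputSize x numCases, so the product (groundTruth - m)*data' accumulates over all examples: entry (j, n) is the dot product of row j of groundTruth - m with row n of data, and that dot product runs over the example index i. Dividing by numCases and adding lambda*theta then gives thetagrad.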
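The summation hidden inside the matrix product can be checked numerically. The following is a small sketch, again on hypothetical `theta`, `data` and `labels`, that builds each row with explicit loops and compares it to the vectorized data term (weight decay omitted).

```matlab
% Tiny made-up problem: 3 classes, 2 input dimensions, 4 examples.
theta  = [0.1 -0.2; 0.0 0.3; -0.1 0.2];   % numClasses x inputSize
data   = [1 0 2 -1; 0.5 1 -1 2];          % inputSize x numCases
labels = [1; 3; 2; 3];
[numClasses, inputSize] = size(theta);
numCases = size(data, 2);
groundTruth = full(sparse(labels, 1:numCases, 1));

% Class probabilities (scores are small here, so no overflow guard is needed).
m = exp(theta * data);
m = bsxfun(@rdivide, m, sum(m, 1));

% Vectorized data term of the gradient.
vectorized = -(groundTruth - m) * data' / numCases;

% Row j built by explicitly summing over the examples i.
looped = zeros(numClasses, inputSize);
for j = 1:numClasses
    for i = 1:numCases
        looped(j, :) = looped(j, :) - (groundTruth(j, i) - m(j, i)) * data(:, i)';
    end
end
looped = looped / numCases;

% Largest absolute difference between the two results.
disp(max(abs(vectorized(:) - looped(:))));
```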

STEP 3: Gradient checking

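Simply run the code provided with the exercise, with DEBUG set to true. My test result was as follows:

[Figure: gradient-check output]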
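For readers who want to see what such a check does, here is a self-contained sketch using central differences. It is not the exercise's own checking code: the cost is re-expressed as an anonymous function `J`, and `theta`, `data`, `labels`, `lambda` and `epsilon` are values assumed for the illustration.

```matlab
% Tiny made-up problem: 3 classes, 2 input dimensions, 4 examples.
theta  = [0.1 -0.2; 0.0 0.3; -0.1 0.2];
data   = [1 0 2 -1; 0.5 1 -1 2];
labels = [1; 3; 2; 3];
lambda = 1e-4;
[numClasses, inputSize] = size(theta);
numCases = size(data, 2);
groundTruth = full(sparse(labels, 1:numCases, 1));

% Softmax probabilities and cost as functions of the parameters.
probs = @(T) bsxfun(@rdivide, exp(T * data), sum(exp(T * data), 1));
J = @(t) -sum(sum(groundTruth .* log(probs(reshape(t, numClasses, inputSize))))) / numCases ...
         + lambda / 2 * sum(t .^ 2);

% Analytic gradient, same formula as thetagrad in softmaxCost.
m = probs(theta);
grad = -(groundTruth - m) * data' / numCases + lambda * theta;
grad = grad(:);

% Numerical gradient by central differences, one parameter at a time.
t = theta(:);
epsilon = 1e-4;
numgrad = zeros(size(t));
for k = 1:numel(t)
    e = zeros(size(t));
    e(k) = epsilon;
    numgrad(k) = (J(t + e) - J(t - e)) / (2 * epsilon);
end

% Relative difference; a very small value indicates the gradient is consistent.
relDiff = norm(numgrad - grad) / norm(numgrad + grad);
disp(relDiff);
```

A relative difference many orders of magnitude below 1 suggests the analytic gradient agrees with the cost; a value near 1 usually points to a sign or indexing error.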

STEP 4 & 5: Learning parameters and testing

softmaxPredict.m

function [pred] = softmaxPredict(softmaxModel, data)

% softmaxModel - model trained using softmaxTrain
% data - the N x M input matrix, where each column data(:, i) corresponds to
%        a single test set
%
% Your code should produce the prediction matrix
% pred, where pred(i) is argmax_c P(y(c) | x(i)).

% Unroll the parameters from theta
theta = softmaxModel.optTheta;  % this provides a numClasses x inputSize matrix
pred = zeros(1, size(data, 2));

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute pred using theta assuming that the labels start
%                from 1.

% exp and normalization are monotonic within a column, so the argmax of the
% raw scores is the same as the argmax of the softmax probabilities.
m = theta * data;
[~, pred] = max(m, [], 1);

% ---------------------------------------------------------------------

end
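The prediction code skips the exponentiation and normalization steps. A short sketch on hypothetical `theta` and `data` shows why this is safe: the arg max over raw scores and over softmax probabilities picks the same class in every column.

```matlab
% Tiny made-up model and inputs: 3 classes, 2 input dimensions, 4 examples.
theta = [0.1 -0.2; 0.0 0.3; -0.1 0.2];
data  = [1 0 2 -1; 0.5 1 -1 2];

scores = theta * data;
probs  = bsxfun(@rdivide, exp(scores), sum(exp(scores), 1));

% Index of the largest entry in each column, once per representation.
[~, predFromScores] = max(scores, [], 1);
[~, predFromProbs]  = max(probs, [], 1);

disp(isequal(predFromScores, predFromProbs));
```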

Next, simply run the training and testing code provided with the exercise. Remember to set DEBUG to false.

The result was as follows:

[Figure: test-set accuracy output]

0 0