Softmax: theory and a Matlab implementation


1. The softmax model

The softmax model assumes that the posterior probability P(y|x) follows a multinomial distribution with y = 1, 2, ..., k, i.e. there are k classes. From the definition of the multinomial distribution with n = 1 (also known as the categorical distribution):
$$p(y;\phi)=\prod_{i=1}^{k}\phi_i^{\,1\{y=i\}},\qquad \phi_i\ge 0,\quad \sum_{i=1}^{k}\phi_i=1,$$

where $1\{\cdot\}$ is the indicator function, so $\phi_i=P(y=i)$.

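To make the indicator notation concrete, here is a minimal Matlab sketch (the variables phi, y, and k are illustrative only, not part of the exercise code further down) that evaluates this pmf:

% Categorical distribution with k = 4 classes
phi = [0.1; 0.2; 0.3; 0.4];      % class probabilities, sum(phi) == 1
k   = numel(phi);
y   = 3;                         % an observed label in {1, ..., k}
p   = prod(phi .^ ((1:k)' == y)) % the product selects phi(3) = 0.3
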
2. Deriving softmax from a generalized linear model


Our goal is, given x, to determine the parameters phi, so we need a model that expresses phi in terms of x. The derivation of this model follows.


We now write the posterior probability in exponential-family form in order to derive the softmax response function:

$$p(y;\phi)=\exp\!\left(\sum_{i=1}^{k-1}1\{y=i\}\,\log\frac{\phi_i}{\phi_k}+\log\phi_k\right),$$

so the natural parameters are $\eta_i=\log(\phi_i/\phi_k)$, with $\eta_k=\log(\phi_k/\phi_k)=0$. Inverting this relation ($\phi_i=\phi_k e^{\eta_i}$ together with $\sum_j\phi_j=1$) and applying the GLM assumption $\eta_i=\theta_i^T x$ yields

$$\phi_i=P(y=i\mid x;\theta)=\frac{e^{\theta_i^T x}}{\sum_{j=1}^{k}e^{\theta_j^T x}},\qquad i=1,\dots,k.$$
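
In Matlab the response function takes only a few lines; the sketch below (illustrative variables only) also subtracts the per-column maximum before exponentiating, the same numerical-stability trick used in softmaxCost.m later in this post:

% theta: k x n parameter matrix, x: n x 1 input (random here for illustration)
theta = randn(4, 5);
x     = randn(5, 1);

eta = theta * x;                 % one natural parameter per class
eta = eta - max(eta);            % softmax is invariant to this shift
phi = exp(eta) / sum(exp(eta))   % phi(i) = P(y = i | x; theta)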

3. Objective function and gradient

Now that we have a model of phi in terms of x, the next step is to estimate the parameter theta, which we do by maximum likelihood. Minimizing the negative log-likelihood over a training set of m examples gives the objective

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{k}1\{y^{(i)}=j\}\,\log\frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^T x^{(i)}}}.$$

Differentiating with respect to $\theta_j$ gives the gradient:

$$\nabla_{\theta_j}J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}x^{(i)}\left(1\{y^{(i)}=j\}-P(y^{(i)}=j\mid x^{(i)};\theta)\right).$$

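It is worth verifying this formula numerically before implementing it. The following self-contained sketch (toy sizes and variable names are illustrative, not part of the exercise files) compares the analytic gradient with a centered finite difference for one entry of theta; theta is kept small, so no max-subtraction is needed here:

% Toy problem: k classes, n features, m examples
k = 3; n = 4; m = 20;
X = randn(n, m);
y = randi(k, 1, m);
G = full(sparse(y, 1:m, 1, k, m));   % G(j,i) = 1{y(i) = j}

soft = @(t) bsxfun(@rdivide, exp(t*X), sum(exp(t*X)));  % class probabilities
J    = @(t) -1/m * sum(sum(G .* log(soft(t))));         % the objective above

theta = 0.01 * randn(k, n);
grad  = -1/m * (G - soft(theta)) * X'; % row j of grad is the gradient w.r.t. theta_j

h = 1e-6;                              % centered difference on entry (2,3)
d = zeros(k, n); d(2,3) = h;
numgrad = (J(theta + d) - J(theta - d)) / (2*h);
disp([grad(2,3) numgrad])              % the two values should agree closely
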
4. Regularization penalty

To make the objective strictly convex, so that it has a unique minimum, we add a weight-decay penalty. The new objective and gradient are

$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{k}1\{y^{(i)}=j\}\,\log\frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k}e^{\theta_l^T x^{(i)}}}+\frac{\lambda}{2}\sum_{i=1}^{k}\sum_{j=1}^{n}\theta_{ij}^2,$$

$$\nabla_{\theta_j}J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}x^{(i)}\left(1\{y^{(i)}=j\}-P(y^{(i)}=j\mid x^{(i)};\theta)\right)+\lambda\,\theta_j.$$

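The corresponding change to the code is a one-line addition to each quantity; a sketch reusing soft, J, G, X, m, and theta from the previous snippet, with an illustrative weight-decay parameter lambda:

lambda  = 1e-4;
Jreg    = @(t) J(t) + lambda/2 * sum(sum(t.^2));           % penalized objective
gradreg = -1/m * (G - soft(theta)) * X' + lambda * theta;  % extra lambda*theta term
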
5. Matlab experiment

The experiment uses the MNIST database; the task is to recognize the 10 handwritten digits. The main script is given first, followed by softmaxCost.m, softmaxTrain.m, and softmaxPredict.m.
%% CS294A/CS294W Softmax Exercise

%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  softmax exercise. You will need to write the softmax cost function
%  in softmaxCost.m and the softmax prediction function in softmaxPredict.m.
%  For this exercise, you will not need to change any code in this file,
%  or any other files other than those mentioned above.
%  (However, you may be required to do so in later exercises)

%%======================================================================
%% STEP 0: Initialise constants and parameters
%
%  Here we define and initialise some constants which allow your code
%  to be used more generally on any arbitrary input.
%  We also initialise some parameters used for tuning the model.

inputSize = 28 * 28; % Size of input vector (MNIST images are 28x28)
numClasses = 10;     % Number of classes (MNIST images fall into 10 classes)

lambda = 1e-4; % Weight decay parameter

%%======================================================================
%% STEP 1: Load data
%
%  In this section, we load the input and output data.
%  For softmax regression on MNIST pixels,
%  the input data is the images, and
%  the output data is the labels.
%

% Change the filenames if you've saved the files under different names
% On some platforms, the files might be saved as
% train-images.idx3-ubyte / train-labels.idx1-ubyte

images = loadMNISTImages('mnist/train-images-idx3-ubyte');
labels = loadMNISTLabels('mnist/train-labels-idx1-ubyte');
labels(labels==0) = 10; % Remap 0 to 10

inputData = images;

% For debugging purposes, you may wish to reduce the size of the input data
% in order to speed up gradient checking.
% Here, we create a synthetic dataset using random data for testing

% DEBUG = true; % Set DEBUG to true when debugging.
% if DEBUG
%     inputSize = 8;
%     inputData = randn(8, 100);
%     labels = randi(10, 100, 1);
% end

% Randomly initialise theta
theta = 0.005 * randn(numClasses * inputSize, 1);

%%======================================================================
%% STEP 2: Implement softmaxCost
%
%  Implement softmaxCost in softmaxCost.m.

[cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, inputData, labels);

%%======================================================================
%% STEP 3: Gradient checking
%
%  As with any learning algorithm, you should always check that your
%  gradients are correct before learning the parameters.
%

% if DEBUG
%     numGrad = computeNumericalGradient( @(x) softmaxCost(x, numClasses, ...
%                                     inputSize, lambda, inputData, labels), theta);
%
%     % Use this to visually compare the gradients side by side
%     disp([numGrad grad]);
%
%     % Compare numerically computed gradients with those computed analytically
%     diff = norm(numGrad-grad)/norm(numGrad+grad);
%     disp(diff);
%     % The difference should be small.
%     % In our implementation, these values are usually less than 1e-7.
%
%     % When your gradients are correct, congratulations!
% end

%%======================================================================
%% STEP 4: Learning parameters
%
%  Once you have verified that your gradients are correct,
%  you can start training your softmax regression code using softmaxTrain
%  (which uses minFunc).

options.maxIter = 100;
softmaxModel = softmaxTrain(inputSize, numClasses, lambda, ...
                            inputData, labels, options);

% Although we only use 100 iterations here to train a classifier for the
% MNIST data set, in practice, training for more iterations is usually
% beneficial.

%%======================================================================
%% STEP 5: Testing
%
%  You should now test your model against the test images.
%  To do this, you will first need to write softmaxPredict
%  (in softmaxPredict.m), which should return predictions
%  given a softmax model and the input data.

images = loadMNISTImages('mnist/t10k-images-idx3-ubyte');
labels = loadMNISTLabels('mnist/t10k-labels-idx1-ubyte');
labels(labels==0) = 10; % Remap 0 to 10

inputData = images;

% You will have to implement softmaxPredict in softmaxPredict.m
[pred] = softmaxPredict(softmaxModel, inputData);

acc = mean(labels(:) == pred(:));
fprintf('Accuracy: %0.3f%%\n', acc * 100);

% Accuracy is the proportion of correctly classified images
% After 100 iterations, the results for our implementation were:
%
% Accuracy: 92.200%
%
% If your values are too low (accuracy less than 0.91), you should check
% your code for errors, and make sure you are training on the
% entire data set of 60000 28x28 training images
% (unless you modified the loading code, this should be the case)


function [cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, data, labels)

% numClasses - the number of classes
% inputSize - the size N of the input vector
% lambda - weight decay parameter
% data - the N x M input matrix, where each column data(:, i) corresponds to
%        a single test set
% labels - an M x 1 matrix containing the labels corresponding for the input data
%

% Unroll the parameters from theta
theta = reshape(theta, numClasses, inputSize);

numCases = size(data, 2);

% groundTruth(c, i) = 1 if labels(i) == c, and 0 otherwise
groundTruth = full(sparse(labels, 1:numCases, 1));
cost = 0;

thetagrad = zeros(numClasses, inputSize);

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute the cost and gradient for softmax regression.
%                You need to compute thetagrad and cost.
%                The groundTruth matrix might come in handy.

[N, M] = size(data);

% Subtract the column-wise maximum before exponentiating; the softmax is
% invariant to this shift and it prevents overflow in exp
eta = bsxfun(@minus, theta*data, max(theta*data, [], 1));
eta = exp(eta);

% pij(c, i) = P(y(i) = c | x(i); theta)
pij = bsxfun(@rdivide, eta, sum(eta));

% Cross-entropy cost plus the weight-decay penalty
cost = -1/M * sum(sum(groundTruth .* log(pij))) + lambda/2 * sum(sum(theta.^2));

% Gradient of the cost; the weight-decay penalty contributes lambda*theta
thetagrad = -1/M * (groundTruth - pij) * data' + lambda * theta;

% ------------------------------------------------------------------
% Unroll the gradient matrices into a vector for minFunc
grad = [thetagrad(:)];
end

function [softmaxModel] = softmaxTrain(inputSize, numClasses, lambda, inputData, labels, options)
%softmaxTrain Train a softmax model with the given parameters on the given
% data. Returns softmaxOptTheta, a vector containing the trained parameters
% for the model.
%
% inputSize: the size of an input vector x^(i)
% numClasses: the number of classes
% lambda: weight decay parameter
% inputData: an N by M matrix containing the input data, such that
%            inputData(:, c) is the cth input
% labels: M by 1 matrix containing the class labels for the
%            corresponding inputs. labels(c) is the class label for
%            the cth input
% options (optional): options
%   options.maxIter: number of iterations to train for

if ~exist('options', 'var')
    options = struct;
end

if ~isfield(options, 'maxIter')
    options.maxIter = 400;
end

% initialize parameters
theta = 0.005 * randn(numClasses * inputSize, 1);

% Use minFunc to minimize the function
addpath minFunc/
options.Method = 'lbfgs'; % Here, we use L-BFGS to optimize our cost
                          % function. Generally, for minFunc to work, you
                          % need a function pointer with two outputs: the
                          % function value and the gradient. In our problem,
                          % softmaxCost.m satisfies this.
minFuncOptions.display = 'on';

[softmaxOptTheta, cost] = minFunc( @(p) softmaxCost(p, ...
                                   numClasses, inputSize, lambda, ...
                                   inputData, labels), ...
                                   theta, options);

% Fold softmaxOptTheta into a nicer format
softmaxModel.optTheta = reshape(softmaxOptTheta, numClasses, inputSize);
softmaxModel.inputSize = inputSize;
softmaxModel.numClasses = numClasses;

end

function [pred] = softmaxPredict(softmaxModel, data)

% softmaxModel - model trained using softmaxTrain
% data - the N x M input matrix, where each column data(:, i) corresponds to
%        a single test set
%
% Your code should produce the prediction matrix
% pred, where pred(i) is argmax_c P(y(c) | x(i)).

% Unroll the parameters from theta
theta = softmaxModel.optTheta;  % this provides a numClasses x inputSize matrix
pred = zeros(1, size(data, 2));

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute pred using theta assuming that the labels start
%                from 1.

% The predicted class is the row with the largest score theta*x; the
% softmax denominator is the same for every class, so it can be skipped
[prob, pred] = max(theta * data);

% ---------------------------------------------------------------------
end


to be continued....

