Deep Learning: UFLDL New Tutorial Study Notes 2: Logistic Regression


1 Logistic Regression Overview

Linear Regression models continuous quantities, while Logistic Regression deals with discrete ones: put simply, it decides whether a training example belongs to class 1 or class 0. This naturally leads to probabilities. We compute the probability that a sample belongs to class 1 and the probability that it belongs to class 0, then classify the sample based on those probabilities; working with probabilities also turns the discrete problem into a continuous one.

Specifically, we will try to learn a function of the form:

$$
\begin{aligned}
P(y=1 \mid x) &= h_\theta(x) = \frac{1}{1+\exp(-\theta^\top x)} \equiv \sigma(\theta^\top x), \\
P(y=0 \mid x) &= 1 - P(y=1 \mid x) = 1 - h_\theta(x).
\end{aligned}
$$

The function $\sigma(z) \equiv \frac{1}{1+\exp(-z)}$ is often called the "sigmoid" or "logistic" function.
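The exercise code later in this post calls a sigmoid helper without defining it (I believe the UFLDL starter code provides one in its common folder). For reference, a minimal elementwise MATLAB sketch would be:

function out = sigmoid(x)
  % Elementwise logistic function: out(i) = 1/(1+exp(-x(i))).
  out = 1 ./ (1 + exp(-x));
end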

We only need to compute the probability that y = 1. The cost function is:

$$
J(\theta) = -\sum_i \left( y^{(i)} \log\big(h_\theta(x^{(i)})\big) + (1-y^{(i)}) \log\big(1-h_\theta(x^{(i)})\big) \right).
$$

Apart from the different equations, the computation is exactly the same as for Linear Regression.
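Concretely, differentiating $J(\theta)$ gives a gradient with the same form as in Linear Regression, just with $h_\theta(x)$ as the prediction:

$$
\frac{\partial J(\theta)}{\partial \theta_j} = \sum_i x_j^{(i)} \left( h_\theta\big(x^{(i)}\big) - y^{(i)} \right),
$$

which is exactly the quantity the double loop in logistic_regression.m below accumulates.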

OK, now let's look at how to do the exercise.


2 exercise1B Solution

This exercise uses the MNIST data to classify handwritten digits as 0 or 1.

Here is my code:
ex1b_regression.m (no changes needed)
addpath ../common
addpath ../common/minFunc_2012/minFunc
addpath ../common/minFunc_2012/minFunc/compiled

% Load the MNIST data for this exercise.
% train.X and test.X will contain the training and testing images.
%   Each matrix has size [n,m] where:
%      m is the number of examples.
%      n is the number of pixels in each image.
% train.y and test.y will contain the corresponding labels (0 or 1).
binary_digits = true;
[train,test] = ex1_load_mnist(binary_digits);

% Add row of 1s to the dataset to act as an intercept term.
train.X = [ones(1,size(train.X,2)); train.X];
test.X = [ones(1,size(test.X,2)); test.X];

% Training set dimensions
m=size(train.X,2);
n=size(train.X,1);

% Train logistic regression classifier using minFunc
options = struct('MaxIter', 100);

% First, we initialize theta to some small random values.
theta = rand(n,1)*0.001;

% Call minFunc with the logistic_regression.m file as the objective function.
%
% TODO:  Implement batch logistic regression in the logistic_regression.m file!
%
%tic;
%theta=minFunc(@logistic_regression, theta, options, train.X, train.y);
%fprintf('Optimization took %f seconds.\n', toc);

% Now, call minFunc again with logistic_regression_vec.m as objective.
%
% TODO:  Implement batch logistic regression in logistic_regression_vec.m using
% MATLAB's vectorization features to speed up your code.  Compare the running
% time for your logistic_regression.m and logistic_regression_vec.m implementations.
%
% Uncomment the lines below to run your vectorized code.
%theta = rand(n,1)*0.001;
tic;
theta=minFunc(@logistic_regression_vec, theta, options, train.X, train.y);
fprintf('Optimization took %f seconds.\n', toc);

% Print out training accuracy.
tic;
accuracy = binary_classifier_accuracy(theta,train.X,train.y);
fprintf('Training accuracy: %2.1f%%\n', 100*accuracy);

% Print out accuracy on the test set.
accuracy = binary_classifier_accuracy(theta,test.X,test.y);
fprintf('Test accuracy: %2.1f%%\n', 100*accuracy);
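Note that the script calls binary_classifier_accuracy, which ships with the starter code and is not reproduced in this post. Judging from how it is used here, a minimal reconstruction (my sketch, not the official file) would be:

function accuracy = binary_classifier_accuracy(theta, X, y)
  % Predict class 1 when P(y=1|x) = sigmoid(theta'*x) >= 0.5,
  % which is the same as testing theta'*x >= 0.
  predictions = sigmoid(theta' * X) >= 0.5;  % 1-by-m logical vector
  accuracy = mean(predictions == y);         % fraction classified correctly
end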

logistic_regression.m
function [f,g] = logistic_regression(theta, X, y)
  %
  % Arguments:
  %   theta - A column vector containing the parameter values to optimize.
  %   X - The examples stored in a matrix.
  %       X(i,j) is the i'th coordinate of the j'th example.
  %   y - The label for each example.  y(j) is the j'th example's label.
  %
  m=size(X,2);
  n=size(X,1);

  % initialize objective value and gradient.
  f = 0;
  g = zeros(size(theta));

  %
  % TODO:  Compute the objective function by looping over the dataset and summing
  %        up the objective values for each example.  Store the result in 'f'.
  %
  % TODO:  Compute the gradient of the objective by looping over the dataset and summing
  %        up the gradients (df/dtheta) for each example. Store the result in 'g'.
  %
%%% YOUR CODE HERE %%%

  % Step 1: Compute the cost function (negative log-likelihood).
  for i = 1:m
    f = f - (y(i)*log(sigmoid(theta' * X(:,i))) + (1-y(i))*log(1 - ...
        sigmoid(theta' * X(:,i))));
  end

  % Step 2: Accumulate the gradient, g(j) = sum_i X(j,i)*(h_theta(x^(i)) - y(i)).
  for j = 1:n
    for i = 1:m
      g(j) = g(j) + X(j,i)*(sigmoid(theta' * X(:,i)) - y(i));
    end
  end
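The driver script also optimizes logistic_regression_vec.m, which the exercise asks you to implement but which is not shown above. Here is a vectorized sketch that computes the same objective and gradient without loops (my own version, assuming X is n-by-m with one example per column and y is a 1-by-m row of 0/1 labels):

function [f,g] = logistic_regression_vec(theta, X, y)
  % Vectorized objective and gradient for binary logistic regression.
  h = sigmoid(theta' * X);                     % 1-by-m predicted P(y=1|x)
  f = -(log(h) * y' + log(1 - h) * (1 - y)');  % negative log-likelihood
  g = X * (h - y)';                            % n-by-1 gradient
end

The vectorized version should run noticeably faster than the double loop, which is what the timing comparison in ex1b_regression.m is meant to show.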


ex1_load_mnist.m (no changes needed)
function [train, test] = ex1_load_mnist(binary_digits)

  % Load the training data
  X=loadMNISTImages('train-images-idx3-ubyte');   % 784x60000: 60000 images of 28x28 pixels
  y=loadMNISTLabels('train-labels-idx1-ubyte')';  % 1x60000

  if (binary_digits)
    % Take only the 0 and 1 digits
    X = [ X(:,y==0), X(:,y==1) ];  % logical indexing with y==0 and y==1 selects those examples directly
    y = [ y(y==0), y(y==1) ];
  end

  % Randomly shuffle the data
  I = randperm(length(y));
  y=y(I); % labels in range 1 to 10
  X=X(:,I);

  % We standardize the data so that each pixel will have roughly zero mean and unit variance.
  s=std(X,[],2);  % per-pixel standard deviation across examples
  m=mean(X,2);
  X=bsxfun(@minus, X, m);
  X=bsxfun(@rdivide, X, s+.1);  % computes (x-m)/(s+0.1); the 0.1 keeps the denominator from being 0

  % Place these in the training set
  train.X = X;
  train.y = y;

  % Load the testing data
  X=loadMNISTImages('t10k-images-idx3-ubyte');
  y=loadMNISTLabels('t10k-labels-idx1-ubyte')';

  if (binary_digits)
    % Take only the 0 and 1 digits
    X = [ X(:,y==0), X(:,y==1) ];
    y = [ y(y==0), y(y==1) ];
  end

  % Randomly shuffle the data
  I = randperm(length(y));
  y=y(I); % labels in range 1 to 10
  X=X(:,I);

  % Standardize using the same mean and scale as the training data.
  X=bsxfun(@minus, X, m);
  X=bsxfun(@rdivide, X, s+.1);

  % Place these in the testing set
  test.X=X;
  test.y=y;

[Note: This is an original article. Please credit the source when reposting: blog.csdn.net/songrotek. Questions and discussion welcome via QQ: 363523441]





