Logistic Regression

Source: Internet · Editor: 程序博客网 · 2024/06/10 11:46

Note: this post draws on code found online; if anything is handled improperly, please point it out!


1. Problem

    In this part of the exercise, you will build a logistic regression model to predict whether a student gets admitted into a university.
Suppose that you are the administrator of a university department and you want to determine each applicant’s chance of admission based on their results on two exams. You have historical data from previous applicants that you can use as a training set for logistic regression. For each training example, you have the applicant’s scores on two exams and the admissions decision.
   Your task is to build a classification model that estimates an applicant's probability of admission based on the scores from those two exams. This outline and the framework code in ex2.m will guide you through the exercise.



2. Implementation
  1. Sigmoid function
Before you start with the actual cost function, recall that the logistic regression hypothesis is defined as:

    h_theta(x) = g(theta' * x)

where the function g is the sigmoid function. The sigmoid function is defined as:

    g(z) = 1 / (1 + e^(-z))

MATLAB code:

    function g = sigmoid(z)
    %SIGMOID Compute sigmoid function
    %   J = SIGMOID(z) computes the sigmoid of z.

    % You need to return the following variables correctly
    g = zeros(size(z));

    % ====================== YOUR CODE HERE ======================
    % Instructions: Compute the sigmoid of each value of z (z can be a matrix,
    %               vector or scalar).

    g = 1 ./ ( 1 + exp(-z) );

    % =============================================================
    end
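As a quick sanity check of the formula (a pure-Python sketch, separate from the exercise's MATLAB code), g(0) = 0.5 and g saturates toward 0 and 1 for large negative and positive inputs:

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)) for a scalar z."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))    # 0.5 exactly
print(sigmoid(10.0))   # very close to 1
print(sigmoid(-10.0))  # very close to 0
```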

2. Cost function and gradient

  Now you will implement the cost function and gradient for logistic regression. Complete the code in costFunction.m to return the cost and gradient.

Recall that the cost function in logistic regression is:

    J(theta) = (1/m) * sum_{i=1..m} [ -y^(i) * log(h_theta(x^(i))) - (1 - y^(i)) * log(1 - h_theta(x^(i))) ]

and the gradient of the cost is a vector of the same length as theta, where the jth element (for j = 0, 1, ..., n) is defined as follows:

    dJ(theta)/d(theta_j) = (1/m) * sum_{i=1..m} ( h_theta(x^(i)) - y^(i) ) * x_j^(i)
Derivation:
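The gradient follows from the identity g'(z) = g(z)(1 - g(z)). Writing h^(i) = h_theta(x^(i)):

```latex
\begin{aligned}
\frac{\partial J(\theta)}{\partial \theta_j}
  &= -\frac{1}{m}\sum_{i=1}^{m}\left[\frac{y^{(i)}}{h^{(i)}}
     - \frac{1-y^{(i)}}{1-h^{(i)}}\right]\frac{\partial h^{(i)}}{\partial \theta_j} \\
  &= -\frac{1}{m}\sum_{i=1}^{m}\left[\frac{y^{(i)}}{h^{(i)}}
     - \frac{1-y^{(i)}}{1-h^{(i)}}\right] h^{(i)}\bigl(1-h^{(i)}\bigr)\,x_j^{(i)} \\
  &= -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\bigl(1-h^{(i)}\bigr)
     - \bigl(1-y^{(i)}\bigr)h^{(i)}\right] x_j^{(i)} \\
  &= \frac{1}{m}\sum_{i=1}^{m}\bigl(h^{(i)} - y^{(i)}\bigr)\,x_j^{(i)}
\end{aligned}
```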

Note that while this gradient looks identical to the linear regression gradient, the formula is actually different because linear and logistic regression have different definitions of hθ(x).



The MATLAB code is as follows:

    function [J, grad] = costFunction(theta, X, y)
    %COSTFUNCTION Compute cost and gradient for logistic regression
    %   J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
    %   parameter for logistic regression and the gradient of the cost
    %   w.r.t. to the parameters.

    % Initialize some useful values
    m = length(y); % number of training examples

    % You need to return the following variables correctly
    J = 0;
    grad = zeros(size(theta));

    % ====================== YOUR CODE HERE ======================
    % Instructions: Compute the cost of a particular choice of theta.
    %               You should set J to the cost.
    %               Compute the partial derivatives and set grad to the partial
    %               derivatives of the cost w.r.t. each parameter in theta
    %
    % Note: grad should have the same dimensions as theta

    J = -1 * sum( y .* log( sigmoid(X*theta) ) + (1 - y) .* log( 1 - sigmoid(X*theta) ) ) / m;

    grad = ( X' * ( sigmoid(X*theta) - y ) ) / m;

    % =============================================================
    end
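The same cost and gradient can be checked in pure Python (the toy data below is made up for illustration). With theta = 0 the hypothesis is 0.5 for every example, so the cost is exactly log 2:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost_function(theta, X, y):
    """Cost J and gradient for logistic regression.

    X: list of rows, each starting with 1.0 for the intercept term.
    y: list of 0/1 labels.  theta: parameter list, same width as a row.
    """
    m = len(y)
    h = [sigmoid(sum(t * x for t, x in zip(theta, row))) for row in X]
    J = -sum(yi * math.log(hi) + (1 - yi) * math.log(1 - hi)
             for yi, hi in zip(y, h)) / m
    grad = [sum((hi - yi) * row[j] for hi, yi, row in zip(h, y, X)) / m
            for j in range(len(theta))]
    return J, grad

# Made-up exam scores; with theta = 0 every h is 0.5, so J = log(2) ~ 0.693
X = [[1.0, 34.6, 78.0], [1.0, 60.2, 86.3], [1.0, 79.0, 75.3]]
y = [0, 1, 1]
J, grad = cost_function([0.0, 0.0, 0.0], X, y)
print(J)
```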


3. Learning parameters using fminunc (MATLAB's built-in function for unconstrained minimization)

The MATLAB code is as follows:

    %% ============= Part 3: Optimizing using fminunc  =============
    %  In this exercise, you will use a built-in function (fminunc) to find the
    %  optimal parameters theta.

    %  Set options for fminunc
    options = optimset('GradObj', 'on', 'MaxIter', 400); % 'GradObj' 'on' tells fminunc that costFunction also returns the gradient

    %  Run fminunc to obtain the optimal theta
    %  This function will return theta and the cost
    [theta, cost] = ...
        fminunc(@(t)(costFunction(t, X, y)), initial_theta, options); % returns the theta minimizing J(theta) and the value of J; see the fminunc examples
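fminunc is specific to MATLAB; as a rough stand-in that shows what the optimizer is doing, here is a plain batch gradient-descent loop in Python on a tiny made-up dataset (fminunc itself uses a more sophisticated quasi-Newton method):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient(theta, X, y):
    m = len(y)
    h = [sigmoid(sum(t * x for t, x in zip(theta, row))) for row in X]
    return [sum((hi - yi) * row[j] for hi, yi, row in zip(h, y, X)) / m
            for j in range(len(theta))]

def fit(X, y, alpha=0.1, iters=5000):
    """Minimize the logistic regression cost by plain gradient descent."""
    theta = [0.0] * len(X[0])
    for _ in range(iters):
        g = gradient(theta, X, y)
        theta = [t - alpha * gi for t, gi in zip(theta, g)]
    return theta

# Made-up 1-D data: label is 1 whenever the feature exceeds 2.5
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]
theta = fit(X, y)
# The learned boundary theta[0] + theta[1]*x = 0 sits near x = 2.5
print(-theta[0] / theta[1])
```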

4. Plotting the decision boundary (plotDecisionBoundary)

This final θ value will then be used to plot the decision boundary on the training data, resulting in a figure similar to Figure 2. We also encourage you to look at the code in plotDecisionBoundary.m to see how to plot such a boundary using the θ values.

The MATLAB code is as follows:
    function plotDecisionBoundary(theta, X, y)
    %PLOTDECISIONBOUNDARY Plots the data points X and y into a new figure with
    %the decision boundary defined by theta
    %   PLOTDECISIONBOUNDARY(theta, X, y) plots the data points with + for the
    %   positive examples and o for the negative examples. X is assumed to be
    %   either
    %   1) Mx3 matrix, where the first column is an all-ones column for the
    %      intercept.
    %   2) MxN, N>3 matrix, where the first column is all-ones

    % Plot Data
    plotData(X(:,2:3), y);
    hold on

    if size(X, 2) <= 3
        % Only need 2 points to define a line, so choose two endpoints
        plot_x = [min(X(:,2))-2,  max(X(:,2))+2];

        % Calculate the decision boundary line
        plot_y = (-1./theta(3)).*(theta(2).*plot_x + theta(1));

        % Plot, and adjust axes for better viewing
        plot(plot_x, plot_y)

        % Legend, specific for the exercise
        legend('Admitted', 'Not admitted', 'Decision Boundary')
        axis([30, 100, 30, 100])
    else
        % Here is the grid range
        u = linspace(-1, 1.5, 50);
        v = linspace(-1, 1.5, 50);

        z = zeros(length(u), length(v));
        % Evaluate z = theta*x over the grid
        for i = 1:length(u)
            for j = 1:length(v)
                z(i,j) = mapFeature(u(i), v(j))*theta;
            end
        end
        z = z'; % important to transpose z before calling contour

        % Plot z = 0
        % Notice you need to specify the range [0, 0]
        contour(u, v, z, [0, 0], 'LineWidth', 2)
    end
    hold off

    end

5. Evaluating logistic regression

  After learning the parameters, you can use the model to predict whether a particular student will be admitted. For a student with an Exam 1 score of 45 and an Exam 2 score of 85, you should expect to see an admission probability of 0.776.
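This can be checked by hand. Assuming the optimizer returns roughly theta = (-25.1613, 0.2062, 0.2015) for this dataset (values taken as an assumption here, not computed in this post), the probability for scores (45, 85) comes out near 0.776:

```python
import math

# Assumed optimum for the exercise's dataset (not computed here)
theta = [-25.1613, 0.2062, 0.2015]
x = [1.0, 45.0, 85.0]  # intercept term, Exam 1 score, Exam 2 score

z = sum(t * xi for t, xi in zip(theta, x))
p = 1.0 / (1.0 + math.exp(-z))
print(round(p, 3))  # about 0.776
```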

   Another way to evaluate the quality of the parameters we have found is to see how well the learned model predicts on our training set. In this part, your task is to complete the code in predict.m. The predict function will produce 1 or 0 predictions given a dataset and a learned parameter vector theta.

    After you have completed the code in predict.m, the ex2.m script will proceed to report the training accuracy of your classifier by computing the percentage of examples it got correct.


The MATLAB code is as follows:

    function p = predict(theta, X)
    %PREDICT Predict whether the label is 0 or 1 using learned logistic
    %regression parameters theta
    %   p = PREDICT(theta, X) computes the predictions for X using a
    %   threshold at 0.5 (i.e., if sigmoid(theta'*x) >= 0.5, predict 1)

    m = size(X, 1); % Number of training examples

    % You need to return the following variables correctly
    p = zeros(m, 1);

    % ====================== YOUR CODE HERE ======================
    % Instructions: Complete the following code to make predictions using
    %               your learned logistic regression parameters.
    %               You should set p to a vector of 0's and 1's
    %

    k = find( sigmoid(X * theta) >= 0.5 );
    p(k) = 1;

    % p(sigmoid(X * theta) >= 0.5) = 1;   % a more compact way

    % =============================================================
    end
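The same 0.5 thresholding in pure Python (the parameters below are made up for illustration; note that sigmoid(z) >= 0.5 exactly when z >= 0):

```python
import math

def predict(theta, X):
    """Threshold the sigmoid at 0.5: predict 1 iff theta . x >= 0."""
    preds = []
    for row in X:
        z = sum(t * x for t, x in zip(theta, row))
        preds.append(1 if 1.0 / (1.0 + math.exp(-z)) >= 0.5 else 0)
    return preds

theta = [-4.0, 1.0]           # made-up parameters
X = [[1.0, 3.0], [1.0, 5.0]]  # intercept column plus one feature
print(predict(theta, X))      # [0, 1]
```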

Results:

Full code for this post: http://download.csdn.net/detail/zhe123zhe123zhe123/9541491
