Exercise 4: Logistic Regression and Newton's Method


Raw page: http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex4/ex4.html



In this exercise, you will use Newton's Method to implement logistic regression on a classification problem.

Data

To begin, download ex4Data.zip and extract the files from the zip file.

For this exercise, suppose that a high school has a dataset representing 40 students who were admitted to college and 40 students who were not admitted. Each training example contains a student's score on two standardized exams and a label of whether the student was admitted.

Your task is to build a binary classification model that estimates college admission chances based on a student's scores on two exams. In your training data,

a. The first column of your x array represents all Test 1 scores, and the second column represents all Test 2 scores.

b. The y vector uses '1' to label a student who was admitted and '0' to label a student who was not admitted.


Plot the data

Load the data for the training examples into your program and add the intercept term into your x matrix.
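A minimal sketch of this step, assuming the zip extracts to files named ex4x.dat and ex4y.dat:

% Load the training data (file names assumed from the zip contents)
x = load('ex4x.dat');    % m x 2 matrix: Test 1 and Test 2 scores
y = load('ex4y.dat');    % m x 1 vector: 1 = admitted, 0 = not admitted

m = length(y);           % number of training examples
x = [ones(m, 1), x];     % prepend a column of ones for the intercept term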

Before beginning Newton's Method, we will first plot the data using different symbols to represent the two classes. In Matlab/Octave, you can separate the positive class and the negative class using the find command:

% find returns the indices of the
% rows meeting the specified condition
pos = find(y == 1); neg = find(y == 0);

% Assume the features are in the 2nd and 3rd
% columns of x
plot(x(pos, 2), x(pos, 3), '+'); hold on
plot(x(neg, 2), x(neg, 3), 'o')
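You may also want to label the axes and add a legend so the two classes are easy to tell apart:

% Optional: label the plot for readability
xlabel('Exam 1 score')
ylabel('Exam 2 score')
legend('Admitted', 'Not admitted')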

Your plot should look like the following:

[Figure: scatter plot of the training data, Exam 1 score vs. Exam 2 score, with '+' marking admitted students and 'o' marking not-admitted students]

Newton's Method

Recall that in logistic regression, the hypothesis function is

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}} = P(y = 1 \mid x;\, \theta)$$

In our example, the hypothesis is interpreted as the probability that a student will be admitted, given the values of the features in x.

Matlab/Octave does not have a library function for the sigmoid, so you will have to define it yourself. The easiest way to do this is through an inline expression:

g = inline('1.0 ./ (1.0 + exp(-z))'); 
% Usage: To find the value of the sigmoid 
% evaluated at 2, call g(2)
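Note that inline is deprecated in recent Matlab releases; an anonymous function behaves the same way and also works in Octave:

g = @(z) 1.0 ./ (1.0 + exp(-z));
% Usage is unchanged: g(2) evaluates the sigmoid at 2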

The cost function is defined as

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log\!\left(h_\theta(x^{(i)})\right) - \left(1 - y^{(i)}\right) \log\!\left(1 - h_\theta(x^{(i)})\right) \right]$$

Our goal is to use Newton's method to minimize this function. Recall that the update rule for Newton's method is

$$\theta^{(t+1)} = \theta^{(t)} - H^{-1} \nabla_\theta J$$

In logistic regression, the gradient and the Hessian are

$$\nabla_\theta J = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$$

$$H = \frac{1}{m} \sum_{i=1}^{m} \left[ h_\theta(x^{(i)}) \left( 1 - h_\theta(x^{(i)}) \right) x^{(i)} \left( x^{(i)} \right)^T \right]$$

Note that the formulas presented above are the vectorized versions. Specifically, this means that $x^{(i)} \in \mathbb{R}^{n+1}$ and $x^{(i)} \left( x^{(i)} \right)^T \in \mathbb{R}^{(n+1) \times (n+1)}$, while $h_\theta(x^{(i)})$ and $y^{(i)}$ are scalars.

Implementation

Now, implement Newton's Method in your program, starting with the initial value of $\theta = \vec{0}$ (the vector of all zeros). To determine how many iterations to use, calculate $J(\theta)$ for each iteration and plot your results as you did in Exercise 2. As mentioned in the lecture videos, Newton's method often converges in 5-15 iterations. If you find yourself using far more iterations, you should check for errors in your implementation.
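One possible vectorized sketch of this loop, assuming x, y, m, and g are defined as above (the cap of 15 iterations is an arbitrary choice based on the 5-15 range just mentioned):

theta = zeros(size(x, 2), 1);    % initialize theta to the zero vector
MAX_ITR = 15;                    % generous cap; convergence is expected sooner
J = zeros(MAX_ITR, 1);

for i = 1:MAX_ITR
    h = g(x * theta);                            % m x 1 vector of hypothesis values
    J(i) = (1/m) * sum(-y .* log(h) - (1 - y) .* log(1 - h));
    grad = (1/m) .* (x' * (h - y));              % gradient of J
    H = (1/m) .* (x' * diag(h .* (1 - h)) * x);  % Hessian of J
    theta = theta - H \ grad;                    % Newton update
end

plot(0:MAX_ITR-1, J, 'o--')      % plot J against iteration number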

After convergence, use your values of theta to find the decision boundary in the classification problem. The decision boundary is defined as the line where

$$P(y = 1 \mid x;\, \theta) = g(\theta^T x) = 0.5,$$

which corresponds to

$$\theta^T x = 0.$$
Plotting the decision boundary is equivalent to plotting the $\theta^T x = 0$ line. When you are finished, your plot should appear like the figure below.

[Figure: the training data scatter plot with the fitted decision boundary drawn as a straight line separating the two classes]
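A sketch of one way to draw this line, assuming the intercept-first column layout of x and theta used above (so theta(1) is the intercept term):

% The boundary theta(1) + theta(2)*x1 + theta(3)*x2 = 0, solved for x2
plot_x = [min(x(:,2)) - 2, max(x(:,2)) + 2];   % two endpoints on the Exam 1 axis
plot_y = (-1 / theta(3)) .* (theta(2) .* plot_x + theta(1));
plot(plot_x, plot_y)
legend('Admitted', 'Not admitted', 'Decision boundary')
hold off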

Questions

Finally, record your answers to these questions.

1. What values of $\theta$ did you get? How many iterations were required for convergence?

2. What is the probability that a student with a score of 20 on Exam 1 and a score of 80 on Exam 2 will not be admitted?

Solutions

After you have completed the exercises above, please refer to the solutions below and check that your implementation and your answers are correct. If your implementation does not produce the same parameters or the same behavior as described below, debug your solution until you replicate the same results as our implementation.

A complete m-file implementation of the solutions can be found here.


Newton's Method

1. Your final values of theta should be approximately

$$\theta = \begin{bmatrix} -16.38 \\ 0.1483 \\ 0.1589 \end{bmatrix}$$

Plot. Your plot of the cost function should look similar to the picture below:

[Figure: $J(\theta)$ plotted against the iteration number, decreasing sharply and leveling off within the first several iterations]

From this plot, you can infer that Newton's Method has converged by around 5 iterations. In fact, by looking at a printout of the values of J, you will see that the change in J between the 4th and 5th iterations is negligible. Recall that in the previous two exercises, gradient descent took hundreds or even thousands of iterations to converge. Newton's Method is much faster in comparison.

2. The probability that a student with a score of 20 on Exam 1 and 80 on Exam 2 will not be admitted to college is 0.668.
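You can check this number directly from your fitted parameters; for example, with the intercept-first theta from the sketch above:

% P(admitted) = h_theta(x) for x = [1; 20; 80]; not admitted is the complement
prob_not_admitted = 1 - g([1, 20, 80] * theta)   % should print roughly 0.668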