Logistic Regression and Newton's Method

Source: Internet. Editor: 程序博客网. Date: 2024/05/01 03:13

close all, clear, clc
x = load('ex4x.dat');
y = load('ex4y.dat');

% find returns the indices of the
% rows meeting the specified condition
pos = find(y == 1);  % indices of the admitted students
neg = find(y == 0);  % indices of the students who were not admitted

% After the intercept column is prepended below, the features
% are in the 2nd and 3rd columns of x
m = length(y);       % number of training examples
x = [ones(m,1), x];

plot(x(pos, 2), x(pos, 3), '+');  % x1 on the horizontal axis, x2 on the vertical axis
hold on
plot(x(neg, 2), x(neg, 3), 'o')
ylabel('Exam 2 score');
xlabel('Exam 1 score');
legend('Admitted', 'Not admitted')

theta0 = 0; theta1 = 0; theta2 = 0;
theta = [theta0; theta1; theta2];
iter = 15;
J = [];
for t = 1:iter
    grad = [0,0,0]';
    H = zeros(3);
    tmp = 0;
    for i = 1:m
        fun = 1/(1+exp(-(theta'*x(i,:)')));           % logistic function
        grad = grad + (1/m)*(fun-y(i,:))*x(i,:)';      % gradient used in the Newton step
        H = H + (1/m)*fun*(1-fun)*x(i,:)'*x(i,:);      % Hessian
        tmp = tmp + (1/m)*(-y(i,:)*log(fun) - (1-y(i,:))*log(1-fun));  % accumulate the cost
    end
    J = [J; tmp];                  % cost at each iteration
    theta = theta - H^(-1)*grad;   % Newton update rule
end

% Plot the decision boundary: theta0*x0 + theta1*x1 + theta2*x2 = 0,
% which is where the hypothesis equals 0.5.
% With x1 on the horizontal axis and x2 on the vertical axis:
% x2 = -(theta0*x0 + theta1*x1)/theta2
hold on
plot_x = [min(x(:,2))-2, max(x(:,2))+2];
plot_y = (-1./theta(3)).*(theta(2).*plot_x + theta(1));
plot(plot_x, plot_y)

% Plot J
hold off
figure
plot(0:iter-1, J, 'o--', 'MarkerFaceColor', 'r', 'MarkerSize', 8)
xlabel('Iteration'); ylabel('J')
J

% Test input x = [1,20,80]: a student who scored 20 on the first exam and 80 on the second
x_test = [1,20,80];
predict = 1./(1+exp(-(x_test*theta)));
if predict > 0.5
    disp('A student with scores [20,80] will be admitted')
else
    disp('A student with scores [20,80] will not be admitted')
end

Results:

Figure 1. Final classification result. Circles mark students who were not admitted; plus signs mark students who were admitted.

Figure 2. Cost function J versus the number of Newton iterations. The cost has converged by the 5th iteration.

Figure 3. The cost function values, and the prediction that a student scoring 20 on the first exam and 80 on the second will not be admitted.

Logistic Regression and Newton's Method

1. Data preparation

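Suppose a dataset contains 40 students who took the college entrance exam and were admitted to a university, and another 40 who took it and were not admitted. Each training example is a pair (x, y), where x = (x1, x2) holds a student's scores on two exams and y is the admission label. Here x1 is the score on the first exam, x2 is the score on the second exam, y = 0 means the student was not admitted, and y = 1 means the student was admitted.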

2. Method

Logistic regression model

Logistic function
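The formula under this heading appears to be missing. Judging from the line fun = 1/(1+exp(-(theta'*x(i,:)'))) in the code above, it is presumably the standard logistic hypothesis. The reconstruction below is a sketch under that assumption; the symbols g and h_theta are notation introduced here, not taken from the original.

```latex
h_\theta(x) = g(\theta^{T} x) = \frac{1}{1 + e^{-\theta^{T} x}},
\qquad x = (1,\, x_1,\, x_2)^{T},
\qquad \theta = (\theta_0,\, \theta_1,\, \theta_2)^{T}
```

Read this way, h_theta(x) is the model's estimate of the probability that y = 1 for a student with scores x1 and x2.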

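In this example, the logistic regression hypothesis can be read as a probability: given the features, how likely the event is. The output is compared with 0.5 to classify a sample. That is, a given input is plugged into the logistic function, whose parameters have to be determined from the available samples and a learning rule (here they are estimated with Newton's method, covered later). If the resulting probability is greater than 0.5, the sample is assigned to one class (1); if it is less than 0.5, it is assigned to the other class (0).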
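The thresholding rule can be sketched in a few lines. The parameter vector below is hypothetical and chosen only for illustration; it is not the value fitted by the script. The variable names z, p, label and same_rule are also assumptions introduced here.

```matlab
% Hypothetical parameters for illustration only; these are NOT the fitted values.
theta = [-16; 0.15; 0.16];
x_test = [1, 20, 80];                 % intercept term, exam 1 score, exam 2 score

z = x_test * theta;                   % linear score theta' * x
p = 1 / (1 + exp(-z));                % logistic function maps z into (0, 1)

label = double(p > 0.5);              % class 1 if p > 0.5, otherwise class 0
same_rule = ((z > 0) == (p > 0.5));   % thresholding p at 0.5 matches thresholding z at 0

fprintf('p = %.4f, predicted class = %d\n', p, label);
```

Because the logistic function equals 0.5 exactly when its argument is 0, comparing p with 0.5 is the same as checking the sign of theta' * x, which is why the decision boundary is a straight line in the (x1, x2) plane.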

Newton's method

Update rule, gradient, and Hessian matrix
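The formulas for this heading appear to be missing. The block below reconstructs them term by term from the quantities the code computes (tmp, grad, H, and the theta update), so it should be read as a sketch of what the code implements rather than as the original figures. The superscript (i) for the i-th training example and (t) for the iteration count are notation assumed here.

```latex
\begin{align*}
J(\theta) &= \frac{1}{m} \sum_{i=1}^{m}
  \left[ -y^{(i)} \log h_\theta\!\left(x^{(i)}\right)
         - \left(1 - y^{(i)}\right) \log\!\left(1 - h_\theta\!\left(x^{(i)}\right)\right) \right] \\
\nabla_\theta J &= \frac{1}{m} \sum_{i=1}^{m}
  \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x^{(i)} \\
H &= \frac{1}{m} \sum_{i=1}^{m}
  h_\theta\!\left(x^{(i)}\right) \left(1 - h_\theta\!\left(x^{(i)}\right)\right)
  x^{(i)} \left(x^{(i)}\right)^{T} \\
\theta^{(t+1)} &= \theta^{(t)} - H^{-1} \nabla_\theta J
\end{align*}
```

With three parameters, the gradient is a 3-by-1 vector and H is a 3-by-3 matrix, which matches grad = [0,0,0]' and H = zeros(3) in the code.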

Improved (vectorized) code:

close all, clear, clc
x = load('ex4x.dat');
y = load('ex4y.dat');

% find returns the indices of the
% rows meeting the specified condition
pos = find(y == 1);  % indices of the admitted students
neg = find(y == 0);  % indices of the students who were not admitted

% After the intercept column is prepended below, the features
% are in the 2nd and 3rd columns of x
m = length(y);       % number of training examples
x = [ones(m,1), x];

plot(x(pos, 2), x(pos, 3), '+');  % admitted; x1 on the horizontal axis, x2 on the vertical axis
hold on
plot(x(neg, 2), x(neg, 3), 'o')
ylabel('Exam 2 score');
xlabel('Exam 1 score');
legend('Admitted', 'Not admitted')

theta0 = 0; theta1 = 0; theta2 = 0;
theta = [theta0; theta1; theta2];
iter = 15;
J = [];
for t = 1:iter
    % Vectorized replacement for the per-example loop of the first version
    h = 1./(1+exp(-(x*theta)));                      % predictions for all examples
    gradient = (1/m).*x'*(h-y);                      % gradient
    hessian = (1/m).*x'*diag(h)*diag(1-h)*x;         % Hessian matrix
    tmp = (1/m)*sum(-y.*log(h) - (1-y).*log(1-h));   % cost function
    J = [J; tmp];
    theta = theta - hessian^(-1)*gradient;           % update rule
end

% Plot the decision boundary
hold on
plot_x = [min(x(:,2))-2, max(x(:,2))+2];
plot_y = (-1./theta(3)).*(theta(2).*plot_x + theta(1));
plot(plot_x, plot_y)

% Plot J
hold off
figure
plot(0:iter-1, J, 'o--', 'MarkerFaceColor', 'r', 'MarkerSize', 8)
xlabel('Iteration'); ylabel('J')
J
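The vectorized loop can be tightened a little further. The sketch below is one possible variant, not the author's code: it runs on a small hypothetical toy dataset (one feature, six points, deliberately not linearly separable) instead of ex4x.dat and ex4y.dat, builds the Hessian from element-wise weights rather than two m-by-m diag matrices, and solves the linear system with the backslash operator instead of forming an explicit inverse. The names X, x_raw, w and n_iter are assumptions introduced here.

```matlab
% Hypothetical toy data (NOT ex4x.dat / ex4y.dat): one feature, overlapping classes.
x_raw = [1; 2; 3; 4; 5; 6];
y     = [0; 0; 1; 0; 1; 1];

m = length(y);
X = [ones(m, 1), x_raw];              % prepend the intercept column
theta = zeros(size(X, 2), 1);         % start from theta = 0
n_iter = 10;
J = zeros(n_iter, 1);

for t = 1:n_iter
    h = 1 ./ (1 + exp(-(X * theta)));                          % predictions, m-by-1
    J(t) = (1/m) * sum(-y .* log(h) - (1 - y) .* log(1 - h));  % cost before the update
    gradient = (1/m) * (X' * (h - y));                         % gradient
    w = h .* (1 - h);                                          % per-example weights h(1-h)
    hessian = (1/m) * (X' * (X .* repmat(w, 1, size(X, 2))));  % same value as X'*diag(w)*X
    theta = theta - hessian \ gradient;                        % solve H*d = g, no explicit inverse
end

disp(theta')
disp(J')
```

At theta = 0 every prediction is 0.5, so the first recorded cost equals log(2) regardless of the data, which gives a quick sanity check on the cost computation.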

0 0