Classification and logistic regression


Logistic regression

1. Problem:

The regression problems discussed above all produce continuous-valued outputs. What if we need to do classification instead, i.e. the output is a discrete value?

2. Solution:

  • Hypothesis: h_θ(x) = g(θᵀx)
    where g(z) = 1/(1 + e^(-z)) is the sigmoid (logistic) function.
    The graph of g(z) is S-shaped: it approaches 0 as z → -∞, approaches 1 as z → +∞, and equals 0.5 at z = 0.
    From this we can see that when h_θ(x) < 0.5 we treat the output as 0, and otherwise as 1, which turns the result into discrete data.

  • Deriving the update rule:

    • Derive it with probability theory: find the distribution the samples follow, then use maximum likelihood to solve for the corresponding θ.
    • Model: P(y = 1 | x; θ) = h_θ(x) and P(y = 0 | x; θ) = 1 - h_θ(x), i.e. P(y | x; θ) = h_θ(x)^y (1 - h_θ(x))^(1-y).
    • Therefore the log-likelihood is
      ℓ(θ) = Σ_i [ y^(i) log h_θ(x^(i)) + (1 - y^(i)) log(1 - h_θ(x^(i))) ],
      and differentiating gives ∂ℓ/∂θ_j = Σ_i (y^(i) - h_θ(x^(i))) x_j^(i).
  • Result: the gradient-ascent update is θ_j := θ_j + α (y^(i) - h_θ(x^(i))) x_j^(i) (a short MATLAB sketch follows this list).

  • Note: the update here is written as the incremental (one example at a time) iteration.
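
To make the update rule above concrete, here is a minimal MATLAB sketch of one incremental (stochastic) pass over the training set; the names x, y, theta and alpha are illustrative assumptions, not taken from the code later in this post.

% Sketch: one incremental (stochastic) gradient-ascent pass over the data.
% x is m-by-(n+1) with a leading column of ones, y is m-by-1 with 0/1 labels,
% theta is (n+1)-by-1 and alpha is the step size.
g = @(z) 1 ./ (1 + exp(-z));                      % sigmoid g(z)
for i = 1:size(x,1)
    h = g(x(i,:) * theta);                        % h_theta(x^(i))
    theta = theta + alpha * (y(i) - h) * x(i,:)'; % theta_j := theta_j + alpha*(y^(i) - h)*x_j^(i)
end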

Newton's method:

1. Problem:

The iteration above converges slowly. When solving by maximum likelihood we can instead use Newton's method, i.e. θ := θ - f(θ)/f'(θ).

2. Solution:

  • Derivation:

    • Newton's method finds a θ such that f(θ) = 0; here that is exactly what we need for ℓ'(θ) = 0.
    • So Newton's method can be rewritten as θ := θ - ℓ'(θ)/ℓ''(θ).
  • Definitions:

    • where ℓ(θ) = Σ_i [ y^(i) log h_θ(x^(i)) + (1 - y^(i)) log(1 - h_θ(x^(i))) ]
    • In the vector case the second derivative becomes the Hessian matrix H of ℓ(θ),
      so H plays the role of ℓ''(θ) and H⁻¹ the role of 1/ℓ''(θ).
    • So: θ := θ - H⁻¹ ∇_θ ℓ(θ) (a sketch follows this list).
  • When to use it:

    • When the number of features is small; otherwise computing H⁻¹ is very expensive.
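
A minimal sketch of the Newton update for the log-likelihood above, assuming x is the m-by-(n+1) design matrix with a bias column and y is a 0/1 vector (the variable names are illustrative, not from the code below):

% Sketch: Newton's method for maximizing l(theta).
g = @(z) 1 ./ (1 + exp(-z));
theta = zeros(size(x,2), 1);
for iter = 1:10                      % Newton's method usually needs only a few iterations
    h    = g(x * theta);             % m-by-1 predictions h_theta(x^(i))
    grad = x' * (y - h);             % gradient of l(theta)
    S    = diag(h .* (1 - h));       % m-by-m diagonal weight matrix
    H    = -x' * S * x;              % Hessian of l(theta)
    theta = theta - H \ grad;        % theta := theta - H^{-1} * grad
end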

Logistic 0/1 classification:

1. Setting the number of iterations yourself

  Write the corresponding loop yourself, choose the number of iterations and the step size alpha, and run incremental gradient descent.
Main functions and their roles:

  • Logistic_Regression: acts as the main script
  • gradientDecent: updates θ by gradient descent
  • computeCost: computes the cost J

Logistic_Regression

%% part0: prepare the data
data = load('ex2data1.txt');
x = data(:,[1,2]);
y = data(:,3);
pos = find(y==1);
neg = find(y==0);
x1 = x(:,1);
x2 = x(:,2);
plot(x(pos,1),x(pos,2),'r*',x(neg,1),x(neg,2),'co');
pause;

%% part1: gradient descent and cost J
[m,n] = size(x);
x = [ones(m,1),x];
theta = zeros(3,1);
J = computeCost(x,y,theta);
theta = gradientDecent(x, y, theta);
% decision boundary: theta(1) + theta(2)*x2 + theta(3)*x3 = 0
X = 25:100;
Y = (-theta(1,1) - theta(2,1)*X)/theta(3,1);
plot(x(pos,2),x(pos,3),'r*',x(neg,2),x(neg,3),'co', X, Y, 'b');
pause;

gradientDecent

function theta = gradientDecent(x, y, theta)
%% update theta by gradient ascent on the log-likelihood,
%% one component at a time, summing over all m training examples
m = size(x,1);
alph = 0.001;
for iter = 1:150000
    for j = 1:3
        dec = 0;
        for i = 1:m
            dec = dec + (y(i) - sigmoid(x(i,:)*theta))*x(i,j);
        end
        theta(j,1) = theta(j,1) + dec*alph/m;
    end
end
end
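
For reference, the same batch update can be written in vectorized form. This is a sketch, not the author's code; it updates all components of theta at once, whereas the loop above updates them one component at a time within each iteration.

function theta = gradientDecentVec(x, y, theta)
% Sketch: vectorized version of the update in gradientDecent above.
m = size(x, 1);
alph = 0.001;
for iter = 1:150000
    h = 1 ./ (1 + exp(-(x * theta)));            % m-by-1 predictions
    theta = theta + (alph / m) * (x' * (y - h)); % update all theta components at once
end
end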

sigmoid

function g = sigmoid(z)
%% SIGMOID Compute the sigmoid function (scalar version)
g = 1/(1+exp(-z));
end

computeCost

function J = computeCost(x, y, theta)
%% compute cost: J
m = size(x,1);
J = 0;
for i = 1:m
   J = J + y(i)*log(sigmoid(x(i,:)*theta)) + (1 - y(i))*log(1 - sigmoid(x(i,:)*theta));
end
J = (-1/m)*J;
end

The results are as follows:

(figure: scatter plot of the initial data points)

2. Using the fminunc function:

  Provide the way to compute the cost J and the gradient for θ, then call fminunc to compute the optimal solution.

Main functions and their roles:

  • Logistics_Regression: acts as the main script
  • computeCost: provides the computation of J and the gradient for θ
  • sigmoid: the sigmoid function

Logistics_Regression

%% part0: prepare the data
data = load('ex2data1.txt');
x = data(:,[1,2]);
y = data(:,3);
pos = find(y==1);
neg = find(y==0);
x1 = x(:,1);
x2 = x(:,2);
plot(x(pos,1),x(pos,2),'r*',x(neg,1),x(neg,2),'co');
pause;

%% part1: cost J and gradient, optimized with fminunc
[m,n] = size(x);
x = [ones(m,1),x];
theta = zeros(3,1);
options = optimset('GradObj', 'on', 'MaxIter', 400);

% Run fminunc to obtain the optimal theta
% This function will return theta and the cost
[theta, cost] = ...
    fminunc(@(t)(computeCost(x,y,t)), theta, options);

% decision boundary: theta(1) + theta(2)*x2 + theta(3)*x3 = 0
X = 25:100;
Y = (-theta(1,1) - theta(2,1)*X)/theta(3,1);
plot(x(pos,2),x(pos,3),'r*',x(neg,2),x(neg,3),'co', X, Y, 'b');
pause;
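
After fminunc returns, a quick sanity check is the training-set accuracy. This snippet is a sketch added here (it is not part of the original script) and reuses the x, y, theta and the sigmoid function from this part:

% Sketch: predict 1 when h_theta(x) >= 0.5, else 0, and compare with y.
p = sigmoid(x * theta) >= 0.5;
fprintf('Training set accuracy: %.2f%%\n', mean(double(p == y)) * 100);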

sigmoid

function g = sigmoid(z)
%% SIGMOID Compute the sigmoid function (vectorized)
g = zeros(size(z));
g = 1.0 ./ (1.0 + exp(-z));
end

computeCost

function [J,grad] = computeCost(x, y, theta)
%% compute cost J and its gradient
m = size(x,1);
grad = zeros(size(theta));
hx = sigmoid(x * theta);
J = (1.0/m) * sum(-y .* log(hx) - (1.0 - y) .* log(1.0 - hx));
grad = (1.0/m) .* x' * (hx - y);
end

Results:

(figures: the data with the fitted decision boundary)

Logistic multi-class classification

1. Setup

  • Hand-made data (each record has the form "x1,x2,label"):
1,5,11,6,11.5,3.5,12.5,3.5,12,6,13,7,14,6,13.5,4.5,12,4,12,5,14,4,15,5,16,4,15,3,14,2,14,3,25,3,25,2,25,1.5,27,1.5,25,2.5,26,2.5,25.5,2.5,25,1,26,2,26,3,25,4,27,5,27,2,28,1,28,3,27,4,37,5,38.5,5.5,39,4,38,5.5,38,4.5,39.5,5.5,38,4.5,38.5,4.5,37,6,36,5,39,5,39,6,38,6,38,7,310,6,310,4,3
  • Scatter plot of the data:

    (figure: scatter plot of the three classes)

2. Algorithm derivation

  • Cost J:
    J(θ) = -(1/m) Σ_i [ y^(i) log h_θ(x^(i)) + (1 - y^(i)) log(1 - h_θ(x^(i))) ]

  • Update of θ:
    ∂J/∂θ_j = (1/m) Σ_i (h_θ(x^(i)) - y^(i)) x_j^(i)

  • Idea of the algorithm (this algorithm is also called one-vs-all):

    (figure: one-vs-all illustration)

    If the samples fall into K classes, we train K sets of θ. We consider each class in turn and treat all the other samples as a single class, which separates that class from the rest. We set y to 1 for the samples of the class under consideration and 0 for the others. Doing this for each class yields the K sets of θ.

3. Code implementation:

This is implemented with the fminunc function.

1. Functions and their roles:

  • Logistic_Regression: acts as the main script
  • one_vs_all: a loop that computes the K sets of θ in turn, calling fminunc with the cost function
  • computeCost: mainly contains the computation of J and the gradient used to update θ

2. Code:

  • Logistic_Regression:
%% part0: prepare the data
data = load('data.txt');
x = data(:,[1,2]);
y = data(:,3);
y1 = find(y==1);
y2 = find(y==2);
y3 = find(y==3);
plot(x(y1,1),x(y1,2),'r*',x(y2,1),x(y2,2),'c+',x(y3,1),x(y3,2),'bo');
pause;

%% part1: one-vs-all training with fminunc
[m,n] = size(x);
x = [ones(m,1),x];
theta = zeros(3,3);

% Run fminunc to obtain the optimal theta
% This function will return theta and the cost
[thetas,cost] = one_vs_all(x,y,theta);

% decision boundary of class k: thetas(1,k) + thetas(2,k)*x1 + thetas(3,k)*x2 = 0
X = 1:10;
Y1 = -(thetas(1,1) + thetas(2,1)*X)/thetas(3,1);
Y2 = -(thetas(1,2) + thetas(2,2)*X)/thetas(3,2);
Y3 = -(thetas(1,3) + thetas(2,3)*X)/thetas(3,3);
plot(x(y1,2),x(y1,3),'r*',x(y2,2),x(y2,3),'c+',x(y3,2),x(y3,3),'bo');
hold on
plot(X,Y1,'r',X,Y2,'g',X,Y3,'c');
  • one_vs_all:
function [theta,cost] = one_vs_all(x, y, theta)
%% train one binary classifier per class and collect the costs
options = optimset('GradObj', 'on', 'MaxIter', 400);
n = size(x,2);
cost = zeros(n,1);
num_labels = 3;
for i = 1:num_labels
    L = logical(y==i);
    [theta(:,i), cost(i,1)] = ...
        fminunc(@(t)(computeCost(x,L,t)), theta(:,i), options);
end
end
  • computeCost:
function [J,grad] = computeCost(x, y, thetas)
%% compute cost J and its gradient for one binary classifier
m = size(x,1);
grad = zeros(size(thetas));
hx = sigmoid(x * thetas);
J = (1.0/m) * sum(-y .* log(hx) - (1 - y) .* log(1 - hx));
grad = (1.0/m) .* x' * (hx - y);
end

3. Results:

  • θ & cost J:

thetas =

    6.3988    5.1407  -24.4266
   -2.0773    0.2173    2.1641
    0.9857   -1.9490    2.2038

>> cost

cost =

    0.1715
    0.2876
    0.1031
  • Plot:
    (figure: the three classes and the three decision boundaries)

  • Note the triangle formed by the three boundary lines: points in that region are not assigned to any class (one way to handle them is sketched below).
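
One way to label every point, including those in the triangle, is to pick the class whose classifier outputs the largest h_θ(x). A minimal sketch, assuming x already contains the bias column and thetas is the 3-by-3 matrix above:

% Sketch: for each example, choose the class with the largest sigmoid output
% among the K one-vs-all classifiers.
h = 1 ./ (1 + exp(-(x * thetas)));   % m-by-K matrix of class scores
[~, pred] = max(h, [], 2);           % pred(i) is the predicted class of example i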

Supplement:

1. Regularized logistic regression

  • Regularized logistic regression is not very different from plain logistic regression: a penalty term on θ is simply added to the computation of J and to the θ update (a sketch follows).

J(θ) = -(1/m) Σ_i [ y^(i) log h_θ(x^(i)) + (1 - y^(i)) log(1 - h_θ(x^(i))) ] + (λ/(2m)) Σ_{j=1}^{n} θ_j²
∂J/∂θ_j = (1/m) Σ_i (h_θ(x^(i)) - y^(i)) x_j^(i) + (λ/m) θ_j    (for j ≥ 1; the intercept is not penalized)
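
A minimal sketch of what the regularized cost and gradient look like, following the usual convention of not penalizing the intercept term; the function name computeCostReg and the parameter lambda are introduced here for illustration only:

function [J, grad] = computeCostReg(x, y, theta, lambda)
% Sketch: regularized logistic-regression cost and gradient.
% Identical to computeCost above except for the penalty terms on theta(2:end).
m  = size(x, 1);
hx = 1 ./ (1 + exp(-(x * theta)));
J  = (1/m) * sum(-y .* log(hx) - (1 - y) .* log(1 - hx)) ...
     + (lambda / (2*m)) * sum(theta(2:end).^2);
grad = (1/m) * (x' * (hx - y));
grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end);
end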

2. one_vs_all:

1. Overview:

  • There is another way to do one_vs_all: treat θ as a single-hidden-layer feed-forward neural network. For example, with K classes the first class can be represented as the K-dimensional vector [1,0,0,0,...], and so on, so that a 1 in the i-th position denotes the i-th class (a sketch of this encoding follows this list). The computation is the same as in the multi-class case above.

  • The feed-forward network model looks like this:
    (figure: feed-forward network model)
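
A minimal sketch of building the one-hot label matrix described above; the names Y and K are introduced here for illustration:

% Sketch: build an m-by-K matrix whose i-th row is the one-hot encoding of y(i),
% i.e. a 1 in column y(i) and 0 elsewhere.
K = 3;                     % number of classes (assumption)
Y = bsxfun(@eq, y, 1:K);   % Y(i,j) = 1 if y(i) == j; works in Octave and older MATLAB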

2. Code:

  • Function overview:

    • oneVsAll: acts as the main function
    • lrCostFunction: computes the cost J and the gradient for the θ update
    • myPredict: computes the training accuracy
  • Data and the trained θ:
    click here to download

  • Training results:

Local minimum found.

Optimization completed because the size of the gradient is less than
the default value of the function tolerance.

<stopping criteria details>

Local minimum found.

Optimization completed because the size of the gradient is less than
the default value of the function tolerance.

<stopping criteria details>

Training Set Accuracy: 100.000000
  • oneVsAll:
function [all_theta,cost] = oneVsAll(X, y, num_labels)
%ONEVSALL trains multiple logistic regression classifiers and returns all
%the classifiers in a matrix all_theta, where the i-th row of all_theta
%corresponds to the classifier for label i
%   [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels
%   logistic regression classifiers and returns each of these classifiers
%   in a matrix all_theta, where the i-th row of all_theta corresponds
%   to the classifier for label i

% Some useful variables
m = size(X, 1);
n = size(X, 2);

% You need to return the following variables correctly
all_theta = zeros(n+1,num_labels);

% Add ones to the X data matrix
X = [ones(m, 1),X];

% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the following code to train num_labels
%               logistic regression classifiers with regularization
%               parameter lambda.
%
% Hint: theta(:) will return a column vector.
%
% Hint: You can use y == c to obtain a vector of 1's and 0's that tell you
%       whether the ground truth is true/false for this class.
%
% Note: For this assignment, we recommend using fmincg to optimize the cost
%       function. It is okay to use a for-loop (for c = 1:num_labels) to
%       loop over the different classes.
%
%       fmincg works similarly to fminunc, but is more efficient when we
%       are dealing with a large number of parameters.
%
% Example Code for fmincg:
%
%     % Set Initial theta
%     initial_theta = zeros(n + 1, 1);
%
%     % Set options for fminunc
%     options = optimset('GradObj', 'on', 'MaxIter', 50);
%
%     % Run fmincg to obtain the optimal theta
%     % This function will return theta and the cost
%     [theta] = ...
%         fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
%                 initial_theta, options);
%

cost = zeros(num_labels,1);
options = optimset('GradObj', 'on', 'MaxIter', 50);
for i = 1:num_labels
    L = logical(y==i);
    [all_theta(:,i),cost(i,1)] = ...
        fminunc (@(t)(lrCostFunction(t, X, L)), all_theta(:,i), options);
end
myPredict(all_theta,X,y);

% =========================================================================
end
  • lrCostFunction:
function [J,grad] = lrCostFunction(thetas,x, y)
%LRCOSTFUNCTION Compute cost and gradient for logistic regression with
%regularization
%   J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% Code used when debugging this function on its own:
%x = [ones(m,1),x];
%theta = zeros(size(x,2),1);
%y = logical(y==1);

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Hint: The computation of the cost function and gradients can be
%       efficiently vectorized. For example, consider the computation
%
%           sigmoid(X * theta)
%
%       Each row of the resulting matrix will contain the value of the
%       prediction for that example. You can make use of this to vectorize
%       the cost function and gradient computations.
%
% Hint: When computing the gradient of the regularized cost function,
%       there're many possible vectorized solutions, but one solution
%       looks like:
%           grad = (unregularized gradient for logistic regression)
%           temp = theta;
%           temp(1) = 0;   % because we don't add anything for j = 0
%           grad = grad + YOUR_CODE_HERE (using the temp variable)
%

grad = zeros(size(thetas));
hx = sigmoid(x * thetas);
J = (1.0/m) * sum(-y .* log(hx) - (1 - y) .* log(1 - hx));
grad = (1.0/m) .* x' * (hx - y);

% =========================================================================
end
  • myPredict:
function p = myPredict(Theta1,X,y)
%PREDICT Predict the label of an input given a trained neural network
%   p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
%   trained weights of a neural network (Theta1, Theta2)

% Useful values
m = size(X, 1);
num_labels = 10;

% You need to return the following variables correctly
p = zeros(size(X, 1), 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned neural network. You should set p to a
%               vector containing labels between 1 to num_labels.
%
% Hint: The max function might come in useful. In particular, the max
%       function can also return the index of the max element, for more
%       information see 'help max'. If your examples are in rows, then, you
%       can use max(A, [], 2) to obtain the max for each row.
%

z_2 = X*Theta1;
a_2 = sigmoid(z_2);
% take the first class whose sigmoid output reaches 0.5
for i = 1:m
    for j = 1:num_labels
        if a_2(i,j) >= 0.5
            p(i,1) = j;
            break;
        end
    end
end
fprintf('\nTraining Set Accuracy: %f\n', mean(double(p == y)) * 100);

% =========================================================================
end

Related topics for this post:

  • From the one_vs_all algorithm in multi-class logistic regression to two-hidden-layer feed-forward networks: BP neural networks
  • From logistic regression to SVM: feature-space mappings
  • A theoretical explanation of logistic regression: the probability-theory view