Classification and logistic regression


Logistic regression

1. Problem:

The regression problems discussed above all produce continuous-valued outputs. What if we need to do classification instead, i.e. the output is a discrete value?

2. Solution:

  • Hypothesis: h_θ(x) = g(θᵀx)
    where g(z) = 1/(1 + e^(-z)) is the sigmoid (logistic) function.
    The graph of g(z) is S-shaped: it approaches 0 as z → -∞, approaches 1 as z → +∞, and equals 0.5 at z = 0.
    From this we can see that when h_θ(x) < 0.5 we treat the output as 0, and otherwise as 1, which turns the result into discrete data.

  • Deriving the update rule:

    • Derive it with probability theory: find the distribution the samples follow, then use maximum likelihood to solve for the corresponding θ.
    • Model: P(y = 1 | x; θ) = h_θ(x) and P(y = 0 | x; θ) = 1 - h_θ(x), i.e. P(y | x; θ) = h_θ(x)^y (1 - h_θ(x))^(1-y).
    • Therefore the log-likelihood is
      ℓ(θ) = Σ_i [ y^(i) log h_θ(x^(i)) + (1 - y^(i)) log(1 - h_θ(x^(i))) ],
      and differentiating gives ∂ℓ/∂θ_j = Σ_i (y^(i) - h_θ(x^(i))) x_j^(i).
  • Result: the gradient-ascent update is θ_j := θ_j + α (y^(i) - h_θ(x^(i))) x_j^(i) (a short MATLAB sketch follows this list).

  • Note: the update here is written as the incremental (one example at a time) iteration.
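
To make the update rule above concrete, here is a minimal MATLAB sketch of one incremental (stochastic) pass over the training set; the names x, y, theta and alpha are illustrative assumptions, not taken from the code later in this post.

% Sketch: one incremental (stochastic) gradient-ascent pass over the data.
% x is m-by-(n+1) with a leading column of ones, y is m-by-1 with 0/1 labels,
% theta is (n+1)-by-1 and alpha is the step size.
g = @(z) 1 ./ (1 + exp(-z));                      % sigmoid g(z)
for i = 1:size(x,1)
    h = g(x(i,:) * theta);                        % h_theta(x^(i))
    theta = theta + alpha * (y(i) - h) * x(i,:)'; % theta_j := theta_j + alpha*(y^(i) - h)*x_j^(i)
end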

Newton's method:

1. Problem:

The iteration above converges slowly. When solving by maximum likelihood we can instead use Newton's method, i.e. θ := θ - f(θ)/f'(θ).

2. Solution:

  • Derivation:

    • Newton's method finds a θ such that f(θ) = 0; here that is exactly what we need for ℓ'(θ) = 0.
    • So Newton's method can be rewritten as θ := θ - ℓ'(θ)/ℓ''(θ).
  • Definitions:

    • where ℓ(θ) = Σ_i [ y^(i) log h_θ(x^(i)) + (1 - y^(i)) log(1 - h_θ(x^(i))) ]
    • In the vector case the second derivative becomes the Hessian matrix H of ℓ(θ),
      so H plays the role of ℓ''(θ) and H⁻¹ the role of 1/ℓ''(θ).
    • So: θ := θ - H⁻¹ ∇_θ ℓ(θ) (a sketch follows this list).
  • When to use it:

    • When the number of features is small; otherwise computing H⁻¹ is very expensive.
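
A minimal sketch of the Newton update for the log-likelihood above, assuming x is the m-by-(n+1) design matrix with a bias column and y is a 0/1 vector (the variable names are illustrative, not from the code below):

% Sketch: Newton's method for maximizing l(theta).
g = @(z) 1 ./ (1 + exp(-z));
theta = zeros(size(x,2), 1);
for iter = 1:10                      % Newton's method usually needs only a few iterations
    h    = g(x * theta);             % m-by-1 predictions h_theta(x^(i))
    grad = x' * (y - h);             % gradient of l(theta)
    S    = diag(h .* (1 - h));       % m-by-m diagonal weight matrix
    H    = -x' * S * x;              % Hessian of l(theta)
    theta = theta - H \ grad;        % theta := theta - H^{-1} * grad
end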

Logistic 0/1 classification:

1. Setting the number of iterations yourself

  Write the corresponding loop yourself, choose the number of iterations and the step size alpha, and run incremental gradient descent.
Main functions and their roles:

  • Logistic_Regression: acts as the main script
  • gradientDecent: updates θ by gradient descent
  • computeCost: computes the cost J

Logistic_Regression

%% part0: prepare the data
data = load('ex2data1.txt');
x = data(:,[1,2]);
y = data(:,3);
pos = find(y==1);
neg = find(y==0);
x1 = x(:,1);
x2 = x(:,2);
plot(x(pos,1),x(pos,2),'r*',x(neg,1),x(neg,2),'co');
pause;

%% part1: gradient descent and cost J
[m,n] = size(x);
x = [ones(m,1),x];
theta = zeros(3,1);
J = computeCost(x,y,theta);
theta = gradientDecent(x, y, theta);
% decision boundary: theta(1) + theta(2)*x2 + theta(3)*x3 = 0
X = 25:100;
Y = (-theta(1,1) - theta(2,1)*X)/theta(3,1);
plot(x(pos,2),x(pos,3),'r*',x(neg,2),x(neg,3),'co', X, Y, 'b');
pause;

gradientDecent

function theta = gradientDecent(x, y, theta)
%% update theta by gradient ascent on the log-likelihood,
%% one component at a time, summing over all m training examples
m = size(x,1);
alph = 0.001;
for iter = 1:150000
    for j = 1:3
        dec = 0;
        for i = 1:m
            dec = dec + (y(i) - sigmoid(x(i,:)*theta))*x(i,j);
        end
        theta(j,1) = theta(j,1) + dec*alph/m;
    end
end
end
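
For reference, the same batch update can be written in vectorized form. This is a sketch, not the author's code; it updates all components of theta at once, whereas the loop above updates them one component at a time within each iteration.

function theta = gradientDecentVec(x, y, theta)
% Sketch: vectorized version of the update in gradientDecent above.
m = size(x, 1);
alph = 0.001;
for iter = 1:150000
    h = 1 ./ (1 + exp(-(x * theta)));            % m-by-1 predictions
    theta = theta + (alph / m) * (x' * (y - h)); % update all theta components at once
end
end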

sigmoid

function g = sigmoid(z)
%% SIGMOID Compute the sigmoid function (scalar version)
g = 1/(1+exp(-z));
end

computeCost

function J = computeCost(x, y, theta)
%% compute cost: J
m = size(x,1);
J = 0;
for i = 1:m
   J = J + y(i)*log(sigmoid(x(i,:)*theta)) + (1 - y(i))*log(1 - sigmoid(x(i,:)*theta));
end
J = (-1/m)*J;
end

The results are as follows:

(figure: scatter plot of the initial data points)

2. Using the fminunc function:

  Provide the way to compute the cost J and the gradient for θ, then call fminunc to compute the optimal solution.

Main functions and their roles:

  • Logistics_Regression: acts as the main script
  • computeCost: provides the computation of J and the gradient for θ
  • sigmoid: the sigmoid function

Logistics_Regression

%% part0: prepare the data
data = load('ex2data1.txt');
x = data(:,[1,2]);
y = data(:,3);
pos = find(y==1);
neg = find(y==0);
x1 = x(:,1);
x2 = x(:,2);
plot(x(pos,1),x(pos,2),'r*',x(neg,1),x(neg,2),'co');
pause;

%% part1: cost J and gradient, optimized with fminunc
[m,n] = size(x);
x = [ones(m,1),x];
theta = zeros(3,1);
options = optimset('GradObj', 'on', 'MaxIter', 400);

% Run fminunc to obtain the optimal theta
% This function will return theta and the cost
[theta, cost] = ...
    fminunc(@(t)(computeCost(x,y,t)), theta, options);

% decision boundary: theta(1) + theta(2)*x2 + theta(3)*x3 = 0
X = 25:100;
Y = (-theta(1,1) - theta(2,1)*X)/theta(3,1);
plot(x(pos,2),x(pos,3),'r*',x(neg,2),x(neg,3),'co', X, Y, 'b');
pause;
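
After fminunc returns, a quick sanity check is the training-set accuracy. This snippet is a sketch added here (it is not part of the original script) and reuses the x, y, theta and the sigmoid function from this part:

% Sketch: predict 1 when h_theta(x) >= 0.5, else 0, and compare with y.
p = sigmoid(x * theta) >= 0.5;
fprintf('Training set accuracy: %.2f%%\n', mean(double(p == y)) * 100);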

sigmoid

function g = sigmoid(z)
%% SIGMOID Compute the sigmoid function (vectorized)
g = zeros(size(z));
g = 1.0 ./ (1.0 + exp(-z));
end

computeCost

function [J,grad] = computeCost(x, y, theta)
%% compute cost J and its gradient
m = size(x,1);
grad = zeros(size(theta));
hx = sigmoid(x * theta);
J = (1.0/m) * sum(-y .* log(hx) - (1.0 - y) .* log(1.0 - hx));
grad = (1.0/m) .* x' * (hx - y);
end

Results:

(figures: the data with the fitted decision boundary)

Logistic multi-class classification

1. Setup

  • Hand-made data (each record has the form "x1,x2,label"):
1,5,11,6,11.5,3.5,12.5,3.5,12,6,13,7,14,6,13.5,4.5,12,4,12,5,14,4,15,5,16,4,15,3,14,2,14,3,25,3,25,2,25,1.5,27,1.5,25,2.5,26,2.5,25.5,2.5,25,1,26,2,26,3,25,4,27,5,27,2,28,1,28,3,27,4,37,5,38.5,5.5,39,4,38,5.5,38,4.5,39.5,5.5,38,4.5,38.5,4.5,37,6,36,5,39,5,39,6,38,6,38,7,310,6,310,4,3
  • Scatter plot of the data:

    (figure: scatter plot of the three classes)

2. Algorithm derivation

  • Cost J:
    J(θ) = -(1/m) Σ_i [ y^(i) log h_θ(x^(i)) + (1 - y^(i)) log(1 - h_θ(x^(i))) ]

  • Update of θ:
    ∂J/∂θ_j = (1/m) Σ_i (h_θ(x^(i)) - y^(i)) x_j^(i)

  • Idea of the algorithm (this algorithm is also called one-vs-all):

    (figure: one-vs-all illustration)

    If the samples fall into K classes, we train K sets of θ. We consider each class in turn and treat all the other samples as a single class, which separates that class from the rest. We set y to 1 for the samples of the class under consideration and 0 for the others. Doing this for each class yields the K sets of θ.

3. Code implementation:

This is implemented with the fminunc function.

1. Functions and their roles:

  • Logistic_Regression: acts as the main script
  • one_vs_all: a loop that computes the K sets of θ in turn, calling fminunc with the cost function
  • computeCost: mainly contains the computation of J and the gradient used to update θ

2. Code:

  • Logistic_Regression:
%% part0: prepare the data
data = load('data.txt');
x = data(:,[1,2]);
y = data(:,3);
y1 = find(y==1);
y2 = find(y==2);
y3 = find(y==3);
plot(x(y1,1),x(y1,2),'r*',x(y2,1),x(y2,2),'c+',x(y3,1),x(y3,2),'bo');
pause;

%% part1: one-vs-all training with fminunc
[m,n] = size(x);
x = [ones(m,1),x];
theta = zeros(3,3);

% Run fminunc to obtain the optimal theta
% This function will return theta and the cost
[thetas,cost] = one_vs_all(x,y,theta);

% decision boundary of class k: thetas(1,k) + thetas(2,k)*x1 + thetas(3,k)*x2 = 0
X = 1:10;
Y1 = -(thetas(1,1) + thetas(2,1)*X)/thetas(3,1);
Y2 = -(thetas(1,2) + thetas(2,2)*X)/thetas(3,2);
Y3 = -(thetas(1,3) + thetas(2,3)*X)/thetas(3,3);
plot(x(y1,2),x(y1,3),'r*',x(y2,2),x(y2,3),'c+',x(y3,2),x(y3,3),'bo');
hold on
plot(X,Y1,'r',X,Y2,'g',X,Y3,'c');
  • one_vs_all:
function [theta,cost] = one_vs_all(x, y, theta)
%% train one binary classifier per class and collect the costs
options = optimset('GradObj', 'on', 'MaxIter', 400);
n = size(x,2);
cost = zeros(n,1);
num_labels = 3;
for i = 1:num_labels
    L = logical(y==i);
    [theta(:,i), cost(i,1)] = ...
        fminunc(@(t)(computeCost(x,L,t)), theta(:,i), options);
end
end
  • computeCost:
function [J,grad] = computeCost(x, y, thetas)
%% compute cost J and its gradient for one binary classifier
m = size(x,1);
grad = zeros(size(thetas));
hx = sigmoid(x * thetas);
J = (1.0/m) * sum(-y .* log(hx) - (1 - y) .* log(1 - hx));
grad = (1.0/m) .* x' * (hx - y);
end

3. Results:

  • θ & cost J:

thetas =

    6.3988    5.1407  -24.4266
   -2.0773    0.2173    2.1641
    0.9857   -1.9490    2.2038

>> cost

cost =

    0.1715
    0.2876
    0.1031
  • Plot:
    (figure: the three classes and the three decision boundaries)

  • Note the triangle formed by the three boundary lines: points in that region are not assigned to any class (one way to handle them is sketched below).
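
One way to label every point, including those in the triangle, is to pick the class whose classifier outputs the largest h_θ(x). A minimal sketch, assuming x already contains the bias column and thetas is the 3-by-3 matrix above:

% Sketch: for each example, choose the class with the largest sigmoid output
% among the K one-vs-all classifiers.
h = 1 ./ (1 + exp(-(x * thetas)));   % m-by-K matrix of class scores
[~, pred] = max(h, [], 2);           % pred(i) is the predicted class of example i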

Supplement:

1. Regularized logistic regression

  • Regularized logistic regression is not very different from plain logistic regression: a penalty term on θ is simply added to the computation of J and to the θ update (a sketch follows).

J(θ) = -(1/m) Σ_i [ y^(i) log h_θ(x^(i)) + (1 - y^(i)) log(1 - h_θ(x^(i))) ] + (λ/(2m)) Σ_{j=1}^{n} θ_j²
∂J/∂θ_j = (1/m) Σ_i (h_θ(x^(i)) - y^(i)) x_j^(i) + (λ/m) θ_j    (for j ≥ 1; the intercept is not penalized)
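
A minimal sketch of what the regularized cost and gradient look like, following the usual convention of not penalizing the intercept term; the function name computeCostReg and the parameter lambda are introduced here for illustration only:

function [J, grad] = computeCostReg(x, y, theta, lambda)
% Sketch: regularized logistic-regression cost and gradient.
% Identical to computeCost above except for the penalty terms on theta(2:end).
m  = size(x, 1);
hx = 1 ./ (1 + exp(-(x * theta)));
J  = (1/m) * sum(-y .* log(hx) - (1 - y) .* log(1 - hx)) ...
     + (lambda / (2*m)) * sum(theta(2:end).^2);
grad = (1/m) * (x' * (hx - y));
grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end);
end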

2. one_vs_all:

1. Overview:

  • There is another way to do one_vs_all: treat θ as a single-hidden-layer feed-forward neural network. For example, with K classes the first class can be represented as the K-dimensional vector [1,0,0,0,...], and so on, so that a 1 in the i-th position denotes the i-th class (a sketch of this encoding follows this list). The computation is the same as in the multi-class case above.

  • The feed-forward network model looks like this:
    (figure: feed-forward network model)
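
A minimal sketch of building the one-hot label matrix described above; the names Y and K are introduced here for illustration:

% Sketch: build an m-by-K matrix whose i-th row is the one-hot encoding of y(i),
% i.e. a 1 in column y(i) and 0 elsewhere.
K = 3;                     % number of classes (assumption)
Y = bsxfun(@eq, y, 1:K);   % Y(i,j) = 1 if y(i) == j; works in Octave and older MATLAB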

2. Code:

  • Function overview:

    • oneVsAll: acts as the main function
    • lrCostFunction: computes the cost J and the gradient for the θ update
    • myPredict: computes the training accuracy
  • Data and the trained θ:
    click here to download

  • Training results:

Local minimum found.

Optimization completed because the size of the gradient is less than
the default value of the function tolerance.

<stopping criteria details>

Local minimum found.

Optimization completed because the size of the gradient is less than
the default value of the function tolerance.

<stopping criteria details>

Training Set Accuracy: 100.000000
  • oneVsAll:
function [all_theta,cost] = oneVsAll(X, y, num_labels)
%ONEVSALL trains multiple logistic regression classifiers and returns all
%the classifiers in a matrix all_theta, where the i-th row of all_theta
%corresponds to the classifier for label i
%   [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels
%   logistic regression classifiers and returns each of these classifiers
%   in a matrix all_theta, where the i-th row of all_theta corresponds
%   to the classifier for label i

% Some useful variables
m = size(X, 1);
n = size(X, 2);

% You need to return the following variables correctly
all_theta = zeros(n+1,num_labels);

% Add ones to the X data matrix
X = [ones(m, 1),X];

% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the following code to train num_labels
%               logistic regression classifiers with regularization
%               parameter lambda.
%
% Hint: theta(:) will return a column vector.
%
% Hint: You can use y == c to obtain a vector of 1's and 0's that tell you
%       whether the ground truth is true/false for this class.
%
% Note: For this assignment, we recommend using fmincg to optimize the cost
%       function. It is okay to use a for-loop (for c = 1:num_labels) to
%       loop over the different classes.
%
%       fmincg works similarly to fminunc, but is more efficient when we
%       are dealing with a large number of parameters.
%
% Example Code for fmincg:
%
%     % Set Initial theta
%     initial_theta = zeros(n + 1, 1);
%
%     % Set options for fminunc
%     options = optimset('GradObj', 'on', 'MaxIter', 50);
%
%     % Run fmincg to obtain the optimal theta
%     % This function will return theta and the cost
%     [theta] = ...
%         fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
%                 initial_theta, options);
%

cost = zeros(num_labels,1);
options = optimset('GradObj', 'on', 'MaxIter', 50);
for i = 1:num_labels
    L = logical(y==i);
    [all_theta(:,i),cost(i,1)] = ...
        fminunc (@(t)(lrCostFunction(t, X, L)), all_theta(:,i), options);
end
myPredict(all_theta,X,y);

% =========================================================================
end
  • lrCostFunction:
function [J,grad] = lrCostFunction(thetas,x, y)
%LRCOSTFUNCTION Compute cost and gradient for logistic regression with
%regularization
%   J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% Code used when debugging this function on its own:
%x = [ones(m,1),x];
%theta = zeros(size(x,2),1);
%y = logical(y==1);

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Hint: The computation of the cost function and gradients can be
%       efficiently vectorized. For example, consider the computation
%
%           sigmoid(X * theta)
%
%       Each row of the resulting matrix will contain the value of the
%       prediction for that example. You can make use of this to vectorize
%       the cost function and gradient computations.
%
% Hint: When computing the gradient of the regularized cost function,
%       there're many possible vectorized solutions, but one solution
%       looks like:
%           grad = (unregularized gradient for logistic regression)
%           temp = theta;
%           temp(1) = 0;   % because we don't add anything for j = 0
%           grad = grad + YOUR_CODE_HERE (using the temp variable)
%

grad = zeros(size(thetas));
hx = sigmoid(x * thetas);
J = (1.0/m) * sum(-y .* log(hx) - (1 - y) .* log(1 - hx));
grad = (1.0/m) .* x' * (hx - y);

% =========================================================================
end
  • myPredict:
function p = myPredict(Theta1,X,y)
%PREDICT Predict the label of an input given a trained neural network
%   p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
%   trained weights of a neural network (Theta1, Theta2)

% Useful values
m = size(X, 1);
num_labels = 10;

% You need to return the following variables correctly
p = zeros(size(X, 1), 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned neural network. You should set p to a
%               vector containing labels between 1 to num_labels.
%
% Hint: The max function might come in useful. In particular, the max
%       function can also return the index of the max element, for more
%       information see 'help max'. If your examples are in rows, then, you
%       can use max(A, [], 2) to obtain the max for each row.
%

z_2 = X*Theta1;
a_2 = sigmoid(z_2);
% take the first class whose sigmoid output reaches 0.5
for i = 1:m
    for j = 1:num_labels
        if a_2(i,j) >= 0.5
            p(i,1) = j;
            break;
        end
    end
end
fprintf('\nTraining Set Accuracy: %f\n', mean(double(p == y)) * 100);

% =========================================================================
end

Related topics for this post:

  • From the one_vs_all algorithm in multi-class logistic regression to two-hidden-layer feed-forward networks: BP neural networks
  • From logistic regression to SVM: feature-space mappings
  • A theoretical explanation of logistic regression: the probability-theory view