Coursera Machine Learning (Andrew Ng) Study Notes (2): Logistic Regression

Source: Internet. Editor: 程序博客网. Time: 2024/05/21 08:47

Classification

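Logistic regression is mainly used to solve classification problems. Why does a model for classification carry the word "regression" in its name? My understanding is that logistic regression started as an attempt to solve classification with the linear regression approach, that is, still fitting a straight line to discrete labels. In the example below, $h_\theta(x) \ge 0.5$ can serve as the criterion for deciding whether an input belongs to the positive class.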

[Figure: a straight line fitted to 0/1 labels, with the prediction threshold at $h_\theta(x) = 0.5$]

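In practice, linear regression runs into trouble here. As in the case pictured below, a single data point that appears as an outlier can shift the fitted line and degrade the predictions. This is the motivation for logistic regression.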

[Figure: the linear fit pulled away by a single outlier]
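The outlier effect can be reproduced with a few lines of code. The sketch below is only an illustration: the data points, the outlier at x = 30, and the helper names `fit_line` and `threshold_at_half` are all made up for this example and do not come from the course.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (closed form, one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def threshold_at_half(a, b):
    """Value of x at which the fitted line crosses 0.5."""
    return (0.5 - a) / b


# Hypothetical 1-D data: small x is the negative class, large x the positive class.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

a_clean, b_clean = fit_line(xs, ys)
threshold_clean = threshold_at_half(a_clean, b_clean)

# Add one far-away positive example (the assumed outlier) and refit.
a_out, b_out = fit_line(xs + [30], ys + [1])
threshold_outlier = threshold_at_half(a_out, b_out)

print("threshold without outlier:", threshold_clean)
print("threshold with outlier:   ", threshold_outlier)
```

On this toy data the threshold sits at x = 4.5 without the outlier and moves past x = 5 with it, so the positive example at x = 5 ends up on the wrong side of the 0.5 cut.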

Hypothesis representation

1. Logistic regression model

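Logistic regression is a model obtained by modifying linear regression. The hypothesis of linear regression is $h_\theta(x) = \theta^T x$.
We want $0 \le h_\theta(x) \le 1$ (unlike linear regression, logistic regression needs the output of $h_\theta$ bounded between 0 and 1; which is the cause and which is the effect?), but the ordinary linear regression hypothesis cannot guarantee this. We therefore set $h_\theta(x) = g(\theta^T x)$, where $g(z) = \frac{1}{1 + e^{-z}}$, so that $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$.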

Here $g(z)$ is called the sigmoid function or logistic function; its graph is an S-shaped curve.

[Figure: the S-shaped curve of the sigmoid function $g(z)$]
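As a small sketch of the hypothesis, the sigmoid can be written directly from its definition. The branch on the sign of `z` is my own addition to avoid overflow in `math.exp` for large negative inputs; the function names `sigmoid` and `hypothesis` are illustrative, not from the course.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z).
    ez = math.exp(z)
    return ez / (1.0 + ez)


def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x) for plain Python lists theta and x."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)


for z in (-5.0, 0.0, 5.0):
    print(z, sigmoid(z))
```

The output always stays within [0, 1], with `sigmoid(0)` equal to 0.5, which is what the hypothesis needs.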

2. Interpretation of hypothesis output

Mathematically, $h_\theta(x)$ is the estimated probability that $y = 1$ on input $x$.
This can be written as

$h_\theta(x) = P(y = 1 \mid x; \theta)$, "probability that $y = 1$, given $x$, parameterized by $\theta$".
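Since $y$ can only be 0 or 1, the probability of the other class follows immediately. A short worked example, with the value 0.7 chosen purely for illustration:

```latex
\begin{align*}
P(y = 0 \mid x; \theta) &= 1 - P(y = 1 \mid x; \theta) = 1 - h_\theta(x) \\
h_\theta(x) = 0.7 \;&\Rightarrow\; P(y = 1 \mid x; \theta) = 0.7, \quad P(y = 0 \mid x; \theta) = 0.3
\end{align*}
```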

Decision Boundary

linear model
[Figure: a linear decision boundary]

non-linear model
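[Figure: a non-linear decision boundary]
Once the optimal parameters are known, the decision boundary follows directly: it is the set of points where $\theta^T x = 0$, that is, where $h_\theta(x) = 0.5$. This resembles reading off a constraint line in linear programming.
Q: On intuition about the structure of the data: for example, being able to look at how the data are distributed and roughly guess which model to fit (linear or non-linear). This kind of intuition is needed.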
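To make the linear and non-linear cases concrete, here is a minimal sketch of prediction from known parameters. The two parameter vectors are assumed values picked for illustration (a line $x_1 + x_2 = 3$ and a circle $x_1^2 + x_2^2 = 1$), and the feature helpers are hypothetical names.

```python
def predict(theta, features):
    """Predict 1 when theta^T x >= 0, which is equivalent to h_theta(x) >= 0.5."""
    z = sum(t * f for t, f in zip(theta, features))
    return 1 if z >= 0 else 0


def linear_features(x1, x2):
    """Features [1, x1, x2] for a straight-line boundary."""
    return [1.0, x1, x2]


def circle_features(x1, x2):
    """Features [1, x1, x2, x1^2, x2^2] for a circular boundary."""
    return [1.0, x1, x2, x1 * x1, x2 * x2]


# Assumed parameters: boundary x1 + x2 = 3.
theta_linear = [-3.0, 1.0, 1.0]
# Assumed parameters: boundary x1^2 + x2^2 = 1.
theta_circle = [-1.0, 0.0, 0.0, 1.0, 1.0]

print(predict(theta_linear, linear_features(2.0, 2.0)))  # above the line
print(predict(theta_linear, linear_features(1.0, 1.0)))  # below the line
print(predict(theta_circle, circle_features(0.0, 0.0)))  # inside the circle
print(predict(theta_circle, circle_features(2.0, 0.0)))  # outside the circle
```

The same `predict` function handles both cases; only the feature mapping changes, which is why the boundary is a property of the hypothesis and its parameters rather than of the training set.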

Cost Function

1. problem

How should we choose the parameters $\theta$ to fit the data? The figure below summarizes the problem.
[Figure: summary of the parameter-fitting problem]

2. cost function

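By analogy with the cost function of linear regression, the cost for logistic regression could be defined as $\mathrm{Cost}(h_\theta(x), y) = \frac{1}{2}(h_\theta(x) - y)^2$. However, because $h_\theta$ is a sigmoid function rather than a linear one, the resulting cost function may not be convex, which makes the optimum hard to find: as in the figure below, gradient descent may get stuck in a local minimum instead of reaching the global one.
[Figure: a non-convex cost curve with several local minima]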
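The loss of convexity can be checked numerically. The sketch below is a simplified illustration, not the course's argument: it assumes a single training example with $x = 1$, $y = 1$ and a scalar parameter, and estimates the curvature of each cost with a finite-difference second derivative. Negative curvature anywhere means the function is not convex.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def squared_cost(theta, x=1.0, y=1.0):
    """Squared-error cost 1/2 * (h - y)^2 for one assumed example."""
    return 0.5 * (sigmoid(theta * x) - y) ** 2


def log_cost(theta, x=1.0, y=1.0):
    """Log cost for the same example, for comparison."""
    h = sigmoid(theta * x)
    return -math.log(h) if y == 1 else -math.log(1.0 - h)


def second_difference(f, t, step=1e-3):
    """Central finite-difference estimate of f''(t)."""
    return (f(t + step) - 2.0 * f(t) + f(t - step)) / step ** 2


grid = [i / 2 for i in range(-12, 13)]  # theta from -6 to 6
squared_curvature = [second_difference(squared_cost, t) for t in grid]
log_curvature = [second_difference(log_cost, t) for t in grid]

print("squared cost: min curvature", min(squared_curvature))
print("log cost:     min curvature", min(log_curvature))
```

On this grid the squared-error curvature changes sign while the log cost curvature stays positive, which matches the reason given for switching cost functions.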

For this reason we choose the log-likelihood loss as the cost function for logistic regression:

if $y = 1$, $\mathrm{Cost}(h_\theta(x), y) = -\log(h_\theta(x))$
if $y = 0$, $\mathrm{Cost}(h_\theta(x), y) = -\log(1 - h_\theta(x))$
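A quick way to build intuition for this piecewise cost is to evaluate it at a few predictions. The sketch below uses a hypothetical helper named `cost` and made-up prediction values.

```python
import math


def cost(h, y):
    """Per-example log cost; h must lie strictly between 0 and 1."""
    if y == 1:
        return -math.log(h)
    return -math.log(1.0 - h)


# Confident and correct: cost close to 0.
print(cost(0.99, 1), cost(0.01, 0))
# Confident and wrong: cost is large and grows without bound as h approaches the wrong extreme.
print(cost(0.01, 1), cost(0.99, 0))
# Undecided: the same moderate cost for either label.
print(cost(0.5, 1), cost(0.5, 0))
```

The cost penalizes confident mistakes far more heavily than the squared error would, since the squared error of a prediction in [0, 1] can never exceed 1/2.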

Simplified cost function and gradient descent

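The two cases above branch on the value of $y$, but they can be combined into a single expression:
$\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$
$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$. This can be derived using maximum likelihood estimation.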
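The maximum likelihood derivation can be sketched in a few lines, assuming the $m$ training examples are independent. Because $y$ is either 0 or 1, both cases of the probability fit into one expression:

```latex
\begin{align*}
P(y \mid x; \theta) &= h_\theta(x)^{y} \, (1 - h_\theta(x))^{1 - y} \\
L(\theta) &= \prod_{i=1}^{m} h_\theta(x^{(i)})^{y^{(i)}} \, (1 - h_\theta(x^{(i)}))^{1 - y^{(i)}} \\
\log L(\theta) &= \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right] \\
J(\theta) &= -\frac{1}{m} \log L(\theta)
\end{align*}
```

Maximizing the likelihood $L(\theta)$ is therefore the same as minimizing $J(\theta)$.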

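Gradient descent is again used to find the optimum. The update is
repeat { $\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$ } (simultaneously update all $\theta_j$)
The factor $\frac{1}{m}$ comes from differentiating $J(\theta)$; it is sometimes absorbed into the learning rate $\alpha$.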
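The update rule can be sketched in plain Python as follows. Everything specific here is an assumption for illustration: the toy data set, the learning rate 0.1, the 2000 iterations, and the function names. The update includes the $\frac{1}{m}$ factor that comes from differentiating $J(\theta)$.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def h(theta, x):
    """Hypothesis h_theta(x) = g(theta^T x)."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def cost_J(theta, X, y):
    """Average log cost J(theta) over the training set."""
    m = len(X)
    total = 0.0
    for xi, yi in zip(X, y):
        hi = h(theta, xi)
        total += yi * math.log(hi) + (1 - yi) * math.log(1.0 - hi)
    return -total / m


def gradient_step(theta, X, y, alpha):
    """One gradient descent step; every theta_j uses the same old theta."""
    m = len(X)
    errors = [h(theta, xi) - yi for xi, yi in zip(X, y)]
    return [
        tj - alpha / m * sum(e * xi[j] for e, xi in zip(errors, X))
        for j, tj in enumerate(theta)
    ]


# Hypothetical 1-D data with x0 = 1 as the intercept feature.
# The labels overlap on purpose so the optimum stays finite.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0], [1.0, 6.0]]
y = [0, 0, 1, 0, 1, 1]

theta = [0.0, 0.0]
history = [cost_J(theta, X, y)]
for _ in range(2000):
    theta = gradient_step(theta, X, y, alpha=0.1)
    history.append(cost_J(theta, X, y))

print("theta:", theta)
print("cost before:", history[0], "after:", history[-1])
```

The update has the same form as the one for linear regression; only the definition of $h_\theta$ differs. Tracking `history` is a simple way to confirm that the cost keeps decreasing for the chosen learning rate.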

When handling multi-class problems, a vectorized implementation can be used.
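The notes do not spell out the multi-class method, so the sketch below assumes one common approach, one-vs-rest: one binary logistic classifier per class, all trained at once by stacking their parameters as columns of a matrix. It relies on NumPy, and the data, learning rate, iteration count, and function names are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_one_vs_rest(X, labels, num_classes, alpha=0.1, iterations=2000):
    """Train one binary logistic classifier per class with vectorized updates.

    X: (m, n) design matrix whose first column is all ones.
    labels: (m,) integer class ids in [0, num_classes).
    Returns Theta with shape (n, num_classes), one column per class.
    """
    m, n = X.shape
    Y = np.eye(num_classes)[labels]            # (m, K) one-hot targets
    Theta = np.zeros((n, num_classes))
    for _ in range(iterations):
        H = sigmoid(X @ Theta)                 # (m, K) predicted probabilities
        Theta -= alpha / m * (X.T @ (H - Y))   # gradient step for all classes at once
    return Theta


def predict(X, Theta):
    """Pick the class whose classifier reports the highest probability."""
    return np.argmax(sigmoid(X @ Theta), axis=1)


# Hypothetical data: three well-separated clusters in 2-D, plus an intercept column.
points = np.array([
    [0.0, 0.0], [0.5, 0.2], [0.2, 0.5],   # class 0
    [4.0, 0.0], [4.2, 0.5], [3.8, 0.3],   # class 1
    [0.0, 4.0], [0.5, 4.2], [0.3, 3.8],   # class 2
])
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
X = np.hstack([np.ones((len(points), 1)), points])

Theta = fit_one_vs_rest(X, labels, num_classes=3)
print(predict(X, Theta))
```

The matrix form `X.T @ (H - Y)` computes the same sums as the per-parameter update rule, just for every parameter and every class in a single expression.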

References
[1] Videos and slides for Lesson 6 (logistic regression) on Coursera
[2] https://en.wikipedia.org/wiki/Sigmoid_function

0 0