Logistic Regression

Source: Internet | Posted by: 饱和攻击知乎 | Editor: 程序博客网 | Time: 2024/05/01 04:37

1. Sigmoid Function

The Sigmoid Function (also called the Logistic Function) is defined as

$g(z) = \frac{1}{1+e^{-z}}$

Its graph is shown below.

As the graph shows, the value of the Sigmoid Function lies in (0, 1): when $z < 0$, $g(z) < 0.5$; when $z > 0$, $g(z) > 0.5$; and when $z = 0$, $g(z) = 0.5$.
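The three cases above can be checked numerically. The following is a minimal sketch in plain Python; the function name `sigmoid` and the two-branch form are choices made here for illustration (the second branch only avoids floating-point overflow for very negative z) and are not part of the original text.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z) so that
    # math.exp never receives a large positive argument and overflows.
    ez = math.exp(z)
    return ez / (1.0 + ez)

for z in (-2.0, 0.0, 2.0):
    # g(z) is below 0.5 for z < 0, exactly 0.5 at z = 0, above 0.5 for z > 0
    print(z, sigmoid(z))
```

Note that in floating point the result can round to exactly 0.0 or 1.0 for very large |z|, even though mathematically g(z) stays strictly inside (0, 1).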

2. Logistic Regression

In linear regression, we assume

$h(\mathbf{x}) = \mathbf{\Theta}^{T}\mathbf{x}$

Here the output of $h(\mathbf{x})$ ranges over $(-\infty,+\infty)$. If we pass it through the Sigmoid Function to squash it into (0, 1), we get Logistic Regression

$h(\mathbf{x}) = g(\mathbf{\Theta }^{T}\mathbf{x}) = \frac{1}{1+e^{-\mathbf{\Theta }^{T}\mathbf{x}}}$

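Suppose we have two classes, labeled 1 and 0. When $h(\mathbf{x}) < 0.5$ we output 0; when $h(\mathbf{x}) \geq 0.5$ we output 1. In this way, Logistic Regression solves a classification problem with exactly two classes.

Note that $h(\mathbf{x}) < 0.5$ corresponds to $\mathbf{\Theta }^{T}\mathbf{x} < 0$, and $h(\mathbf{x}) \geq 0.5$ corresponds to $\mathbf{\Theta }^{T}\mathbf{x} \geq 0$. The set $\mathbf{\Theta }^{T}\mathbf{x} = 0$ is therefore the dividing line between the two classes, called the Decision Boundary.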
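The thresholding rule and the Decision Boundary can be sketched as follows. The parameter values and the helper names `hypothesis` and `predict` are hypothetical, chosen only for illustration; the sketch also assumes the common convention that the first feature is the constant 1, so that `theta[0]` acts as the intercept.

```python
import math

def sigmoid(z):
    # Direct form of g(z); adequate for the moderate values of z used here.
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x):
    """h(x) = g(theta^T x); x[0] is assumed to be the constant feature 1."""
    z = sum(t * v for t, v in zip(theta, x))
    return sigmoid(z)

def predict(theta, x):
    """Return 1 if h(x) >= 0.5 (equivalently theta^T x >= 0), else 0."""
    return 1 if hypothesis(theta, x) >= 0.5 else 0

# Hypothetical parameters: the decision boundary is -3 + x1 + x2 = 0.
theta = [-3.0, 1.0, 1.0]
print(predict(theta, [1.0, 1.0, 1.0]))  # theta^T x = -1, so the output is 0
print(predict(theta, [1.0, 2.0, 2.0]))  # theta^T x = 1, so the output is 1
```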

3. Training the Model

We define

$cost(h_{\mathbf{\Theta }}(\mathbf{x}), y) = \begin{cases} -\log(h_{\mathbf{\Theta }}(\mathbf{x})) & \text{if } y=1\\ -\log(1-h_{\mathbf{\Theta }}(\mathbf{x})) & \text{if } y=0 \end{cases}$

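Its graph is shown below.

As the graph shows, when $h_{\mathbf{\Theta }}(\mathbf{x})$ is close to 0 the cost is small if y=0 and grows without bound if y=1, so a small $h_{\mathbf{\Theta }}(\mathbf{x})$ favors the label y=0. Likewise, when $h_{\mathbf{\Theta }}(\mathbf{x})$ is close to 1 the cost is small only if y=1, so a large $h_{\mathbf{\Theta }}(\mathbf{x})$ favors y=1. Averaging this cost over the m training examples gives the cost function of Logistic Regression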

$J(\mathbf{\Theta }) = \frac{1}{m}\sum_{i=1}^{m}cost(h_{\mathbf{\Theta }}(\mathbf{x}^{(i)}), y^{(i)})$

$=-\frac{1}{m}\sum_{i=1}^{m}[y^{(i)}\log(h_{\mathbf{\Theta }}(\mathbf{x}^{(i)})) + (1-y^{(i)})\log(1-h_{\mathbf{\Theta }}(\mathbf{x}^{(i)}))]$
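As a rough illustration, the cost function can be computed directly from its definition. The tiny dataset and the two parameter vectors below are made up for this sketch, and each row of `X` is assumed to start with the constant feature 1.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum(y*log(h) + (1-y)*log(1-h))."""
    m = len(X)
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, x_i)))
        total += y_i * math.log(h) + (1 - y_i) * math.log(1.0 - h)
    return -total / m

# Hypothetical training set: one real feature plus the constant 1.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]

print(cost([0.0, 0.0], X, y))   # every h is 0.5, so J = log 2
print(cost([-4.5, 2.0], X, y))  # a boundary at x1 = 2.25 gives a lower cost
```

With all parameters at zero every prediction is 0.5, so each example contributes log 2 to the cost; parameters that separate the two classes drive the cost toward 0.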

As in linear regression, we use gradient descent to solve for $\mathbf{\Theta }$:

start with some $\mathbf{\Theta}$;

repeat until convergence {

$\Theta_{j} := \Theta_{j} - \alpha \frac{\partial }{\partial \Theta_{j}}J(\mathbf{\Theta })=\Theta_{j} - \alpha \frac{1}{m} \sum_{i=1}^{m}(h_{\mathbf{\Theta }}(\mathbf{x}^{(i)})-y^{(i)})x_{j}^{(i)}$ (update all $\Theta_{j}$ simultaneously)

}
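The loop above can be sketched in a few lines of Python. This is an illustrative version under stated assumptions, not a tuned implementation: it runs a fixed number of iterations instead of testing for convergence, and the learning rate `alpha`, the iteration count, and the small dataset are hypothetical values picked for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_descent(X, y, alpha=0.1, iterations=5000):
    """Batch gradient descent for Logistic Regression (fixed iteration count)."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n  # start with some theta
    for _ in range(iterations):
        # h(x_i) - y_i for every training example, using the current theta
        errors = [sigmoid(sum(t * v for t, v in zip(theta, x_i))) - y_i
                  for x_i, y_i in zip(X, y)]
        # partial derivative of J with respect to each theta_j
        grad = [sum(e * x_i[j] for e, x_i in zip(errors, X)) / m
                for j in range(n)]
        # simultaneous update: the new theta is built from the old one only
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

# Hypothetical data: x[0] = 1 is the constant feature; the classes overlap.
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0],
     [1.0, 2.5], [1.0, 3.0], [1.0, 3.5], [1.0, 4.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

theta = gradient_descent(X, y)
print(theta)
```

Computing all errors before touching `theta` is what makes the update simultaneous: every component of the gradient is evaluated at the same parameter vector.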

For part of the derivation of $\frac{\partial }{\partial \Theta_{j}}J(\mathbf{\Theta})$, see the handwritten notes in the figure below.
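The main steps of this derivation can also be written out directly. What follows is a sketch from the definitions above, for a single training example $(\mathbf{x}, y)$ and with $h$ as shorthand for $h_{\mathbf{\Theta }}(\mathbf{x})$. It relies on the identity $g'(z) = g(z)(1-g(z))$, which follows from differentiating the definition of $g$.

$\frac{\partial h}{\partial \Theta_{j}} = g'(\mathbf{\Theta }^{T}\mathbf{x})\,x_{j} = h(1-h)\,x_{j}$

$\frac{\partial }{\partial \Theta_{j}}\left[y\log h + (1-y)\log(1-h)\right] = \left(\frac{y}{h} - \frac{1-y}{1-h}\right)h(1-h)\,x_{j} = \left(y(1-h) - (1-y)h\right)x_{j} = (y-h)\,x_{j}$

Averaging over the m training examples and applying the leading minus sign of $J(\mathbf{\Theta })$ gives

$\frac{\partial }{\partial \Theta_{j}}J(\mathbf{\Theta }) = \frac{1}{m}\sum_{i=1}^{m}(h_{\mathbf{\Theta }}(\mathbf{x}^{(i)})-y^{(i)})x_{j}^{(i)}$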

4. Regularization

As with linear regression, Logistic Regression can also overfit, and we again use Regularization to address this. The new cost function is

$J(\mathbf{\Theta }) = \frac{1}{m}\sum_{i=1}^{m}cost(h_{\mathbf{\Theta }}(\mathbf{x}^{(i)}), y^{(i)}) + \lambda \frac{1}{2m}\sum_{j=1}^{n}\Theta_{j}^{2}$
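The regularized cost can be sketched in code as below. The function name `regularized_cost`, the parameter name `lam` (standing for $\lambda$), and the data are hypothetical. Since the penalty sum starts at $j=1$, the sketch leaves `theta[0]` (the intercept, under the assumption that `x[0]` is the constant 1) out of the penalty. Differentiating the penalty term adds $\frac{\lambda}{m}\Theta_{j}$ to the gradient for $j \geq 1$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def regularized_cost(theta, X, y, lam):
    """Average log loss plus (lam / 2m) * sum of theta_j^2 for j >= 1."""
    m = len(X)
    data_term = 0.0
    for x_i, y_i in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, x_i)))
        data_term -= y_i * math.log(h) + (1 - y_i) * math.log(1.0 - h)
    # theta[0] is skipped: the penalty sum runs over j = 1..n only
    penalty = lam / (2.0 * m) * sum(t * t for t in theta[1:])
    return data_term / m + penalty

# Hypothetical training set; x[0] = 1 is the constant feature.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]

plain = regularized_cost([-4.5, 2.0], X, y, lam=0.0)
penalized = regularized_cost([-4.5, 2.0], X, y, lam=4.0)
print(plain, penalized)  # the difference is (4 / 8) * 2.0**2 = 2.0
```

A larger `lam` makes large parameter values more expensive, which pushes the minimizer toward smaller $\Theta_{j}$ and a smoother decision boundary.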

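5. Multiclass Classification

Everything above deals with two classes. How can Logistic Regression be used when there are more than two? Consider a simple example first:

Suppose there are 3 classes C1, C2, C3. We first train a classifier for C1: examples of class C1 are labeled y=1, and examples of classes C2 and C3 are all labeled y=0, which turns the multiclass problem into a two-class problem. This yields a classifier for C1, $h_{\mathbf{\Theta }}^{(1)}(\mathbf{x})$, as shown in the left panel of the figure below.

In the same way, we train one classifier each for C2 and C3, $h_{\mathbf{\Theta }}^{(2)}(\mathbf{x})$ and $h_{\mathbf{\Theta }}^{(3)}(\mathbf{x})$. Then, for any input $\mathbf{{x}'}$, we compute $h_{\mathbf{\Theta }}^{(1)}(\mathbf{{x}'})$, $h_{\mathbf{\Theta }}^{(2)}(\mathbf{{x}'})$, and $h_{\mathbf{\Theta }}^{(3)}(\mathbf{{x}'})$ and pick the largest $h_{\mathbf{\Theta }}^{(i)}(\mathbf{{x}'})$; the corresponding i is the class of ${\mathbf{x}}'$.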

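To summarize, when there are multiple classes (say n of them; here n is the number of classes, not the number of features as in Section 4), classification with Logistic Regression works as follows:

1. For each class i, train a classifier $h_{\mathbf{\Theta }}^{(i)}(\mathbf{{x}})$ that scores how likely it is that y=i (this score is not a probability in the strict sense, since the n scores need not sum to 1).

2. For a new input ${\mathbf{x}}$, compute $h_{\mathbf{\Theta }}^{(1)}(\mathbf{{x}}),h_{\mathbf{\Theta }}^{(2)}(\mathbf{{x}}), ..., h_{\mathbf{\Theta }}^{(n)}(\mathbf{{x}})$ and take $\underset{i}{\arg\max}\left \{ h_{\mathbf{\Theta }}^{(i)}(\mathbf{{x}}) \right \}$; this i is the class of ${\mathbf{x}}$.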
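The two steps above (often called one-vs-rest) can be sketched as follows. Everything specific in this example is an assumption made for illustration: the helper names, the learning rate and iteration count, and the nine hypothetical training points forming three well-separated clusters. Training uses plain batch gradient descent with a fixed number of iterations and no regularization.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def h(theta, x):
    return sigmoid(sum(t * v for t, v in zip(theta, x)))

def train_binary(X, y, alpha=0.1, iterations=3000):
    """Batch gradient descent for a single two-class classifier."""
    m = len(X)
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        errors = [h(theta, x_i) - y_i for x_i, y_i in zip(X, y)]
        # The list comprehension reads only the old theta: simultaneous update.
        theta = [t - alpha * sum(e * x_i[j] for e, x_i in zip(errors, X)) / m
                 for j, t in enumerate(theta)]
    return theta

def train_one_vs_rest(X, labels, classes):
    """Step 1: one classifier per class, relabeling that class as 1, the rest as 0."""
    return {c: train_binary(X, [1 if l == c else 0 for l in labels])
            for c in classes}

def classify(models, x):
    """Step 2: return the class whose classifier gives the largest score."""
    return max(models, key=lambda c: h(models[c], x))

# Hypothetical data: x[0] = 1 is the constant feature, then two real features.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1],   # C1, near the origin
     [1, 4, 0], [1, 5, 1], [1, 4, 1],   # C2, large first feature
     [1, 0, 4], [1, 1, 5], [1, 1, 4]]   # C3, large second feature
labels = ["C1"] * 3 + ["C2"] * 3 + ["C3"] * 3

models = train_one_vs_rest(X, labels, ["C1", "C2", "C3"])
print(classify(models, [1, 0.5, 0.5]))
print(classify(models, [1, 4.5, 0.5]))
print(classify(models, [1, 0.5, 4.5]))
```

`max` with a `key` function returns the class label itself and not the score, which is exactly the arg max in step 2.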

0 0