CS229 Lecture Notes (2): Logistic Regression


Logistic Regression

  • Binary classification problem

  • Why OLS (linear) regression fails on binary classification problems:

    • it is hard to define a sensible classification threshold
    • predictions $h_\theta(x) > 1$ or $h_\theta(x) < 0$ make no sense when $y \in \{0, 1\}$
  • Hypothesis:

    $h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$

    where
    $g(z) = \frac{1}{1 + e^{-z}}$
    is called the logistic function or the sigmoid function.
    A useful property of the sigmoid function:
    $g'(z) = g(z)\,(1 - g(z))$

    In theory, it seems that any smooth, monotonically increasing function whose range is $[0, 1]$ could serve as $g(z)$ in the hypothesis. However, after studying GLMs and generative learning algorithms, we will see the reason for choosing the sigmoid function here. (A quick numerical check of the derivative property follows this list.)
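As a sanity check of the derivative property, here is a minimal NumPy sketch (the helper name `sigmoid` and the test grid are mine, not from the notes) comparing $g(z)(1-g(z))$ against a central finite-difference estimate of $g'(z)$:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

# Compare the analytic identity g'(z) = g(z) * (1 - g(z))
# against a central finite-difference approximation of g'(z).
z = np.linspace(-5.0, 5.0, 11)
analytic = sigmoid(z) * (1.0 - sigmoid(z))
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2.0 * eps)
print(np.max(np.abs(analytic - numeric)))  # tiny, so the identity checks out
```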

Maximum Likelihood Estimation

  • Probabilistic assumption: Bernoulli distribution

    $p(y \mid x; \theta) = (h_\theta(x))^{y}\,(1 - h_\theta(x))^{1 - y}$

  • Likelihood function:

    $L(\theta) = \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta) = \prod_{i=1}^{m} (h_\theta(x^{(i)}))^{y^{(i)}}\,(1 - h_\theta(x^{(i)}))^{1 - y^{(i)}}$

    log likelihood:
    $\ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\big(1 - h_\theta(x^{(i)})\big)$

  • Gradient ascent (since we’re maximizing rather than minimizing a function now):

    $\theta := \theta + \alpha \nabla_\theta \ell(\theta)$

    where, for a single training example $(x, y)$,
    $\frac{\partial}{\partial \theta_j} \ell(\theta) = (y - h_\theta(x))\,x_j$

    For logistic regression we thus obtain an update rule that looks identical to the one for linear regression, except that here $h_\theta(x)$ is a nonlinear function of $\theta^T x$. Is this merely a coincidence, or is there some deeper reason behind it? We will answer this question when we study GLM models. (A runnable sketch of the gradient-ascent update follows.)
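Below is a minimal NumPy sketch of batch gradient ascent on $\ell(\theta)$ using the update rule above; the function name `log_likelihood_ascent` and the toy data are illustrative, not from the notes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood_ascent(X, y, alpha=0.01, n_iters=2000):
    """Batch gradient ascent on l(theta) for logistic regression.

    X: (m, n) design matrix (first column of ones for the intercept);
    y: (m,) array of 0/1 labels.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        h = sigmoid(X @ theta)   # h_theta(x^(i)) for every example
        grad = X.T @ (y - h)     # grad_j = sum_i (y^(i) - h_theta(x^(i))) x_j^(i)
        theta = theta + alpha * grad   # theta := theta + alpha * grad l(theta)
    return theta

# Toy usage on noisy (non-separable) 1-D data.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = (x + rng.normal(size=100) > 0).astype(float)
X = np.column_stack([np.ones(100), x])
print(log_likelihood_ascent(X, y))
```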

Digression: The perceptron learning algorithm

  • Hypothesis:

    $h_\theta(x) = g(\theta^T x)$

    where
    $g(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{if } z < 0 \end{cases}$

    Note that this $g(z)$ is not differentiable at $z = 0$, so it is hard to give the perceptron a probabilistic interpretation and to derive it via maximum likelihood.

  • Perceptron learning algorithm (a toy implementation follows this list):

    $\theta_j := \theta_j + \alpha\,\big(y^{(i)} - h_\theta(x^{(i)})\big)\,x_j^{(i)}$
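Here is a minimal NumPy sketch of this update rule on a linearly separable toy set; the name `perceptron_step` and the data are mine. With 0/1 labels, a correctly classified example leaves $\theta$ unchanged, while a mistake moves $\theta$ by $\pm\alpha x$:

```python
import numpy as np

def perceptron_step(theta, x_i, y_i, alpha=1.0):
    """One perceptron update: theta_j := theta_j + alpha (y - h(x)) x_j,
    with 0/1 labels and h(x) = 1{theta^T x >= 0}, as in the notes."""
    h = 1.0 if theta @ x_i >= 0 else 0.0
    return theta + alpha * (y_i - h) * x_i

# Sweep repeatedly over a linearly separable toy set (bias feature first).
X = np.array([[1.0, 2.0], [1.0, 1.0], [1.0, -1.0], [1.0, -2.0]])
y = np.array([1.0, 1.0, 0.0, 0.0])
theta = np.zeros(2)
for _ in range(10):
    for x_i, y_i in zip(X, y):
        theta = perceptron_step(theta, x_i, y_i)
print(theta)  # a separating theta; updates stop once all points are correct
```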

Newton’s method for maximizing l(θ)

  • Newton’s method: to find a value of $\theta$ such that $f(\theta) = 0$, we perform the following update:

    $\theta := \theta - \frac{f(\theta)}{f'(\theta)}$

  • Using Newton’s method to maximize $\ell(\theta)$ by letting $f(\theta) = \ell'(\theta) = 0$:

    $\theta := \theta - \frac{\ell'(\theta)}{\ell''(\theta)}$

  • Newton-Raphson method (also called Fisher scoring when applied to the logistic regression problem): a vectorized generalization of Newton’s method:

    $\theta := \theta - H^{-1} \nabla_\theta \ell(\theta)$

    where
    $H_{ij} = \frac{\partial^2 \ell(\theta)}{\partial \theta_i \, \partial \theta_j}$
    is called the Hessian matrix.

Although computing (and inverting) the Hessian matrix is relatively expensive, Newton’s method incorporates second-order derivative information and therefore typically converges to the maximum of the log likelihood in far fewer iterations than gradient descent. A minimal sketch is given below.
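The sketch below implements the Newton-Raphson update for logistic regression, assuming the standard closed-form Hessian $H = -X^T \operatorname{diag}\!\big(h_\theta(x^{(i)})(1 - h_\theta(x^{(i)}))\big) X$ (a standard result for logistic regression, not derived in these notes); names and data are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logreg(X, y, n_iters=10):
    """Newton-Raphson for the logistic regression log likelihood.

    Uses grad l = X^T (y - h) and the negative-definite Hessian
    H = -X^T diag(h * (1 - h)) X, then theta := theta - H^{-1} grad.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        h = sigmoid(X @ theta)
        grad = X.T @ (y - h)
        H = -(X * (h * (1 - h))[:, None]).T @ X   # X^T diag(w) X, negated
        theta = theta - np.linalg.solve(H, grad)  # theta := theta - H^{-1} grad
    return theta

# Same toy data as the gradient-ascent sketch; Newton's method
# typically converges in a handful of iterations here.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = (x + rng.normal(size=100) > 0).astype(float)
X = np.column_stack([np.ones(100), x])
print(newton_logreg(X, y))
```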
