CS229 Lecture Notes(3): Generalized Linear Models


The exponential family

  • A class of distributions is in the exponential family if it can be written in the form

    $p(y;\eta)=b(y)\exp\left(\eta^T T(y)-a(\eta)\right)$

    where:

    • η: the natural parameter (also called the canonical parameter)
    • T(y): the sufficient statistic (it is often the case that T(y)=y)
    • a(η): the log partition function (e^{−a(η)} plays the role of a normalization constant)

    Within the exponential family, once the functional forms of T, a, and b are fixed, we obtain a family of distributions parameterized by η.

  • Bernoulli distribution family:

    $p(y;\phi)=\phi^y(1-\phi)^{1-y}=\exp\left(y\log\phi+(1-y)\log(1-\phi)\right)=\exp\left(\left(\log\frac{\phi}{1-\phi}\right)y+\log(1-\phi)\right)$

    thus we have:

    • η = log(ϕ/(1−ϕ))
    • ϕ = 1/(1+e^{−η}) (the sigmoid function!)
    • T(y)=y
    • a(η) = −log(1−ϕ) = log(1+e^{η})
    • b(y)=1

    The Bernoulli distribution is one example of the exponential family. Notably, when we write the Bernoulli distribution in exponential-family form and express the probability ϕ of y=1 in terms of the parameter η, the logistic function arises naturally: ϕ = 1/(1+e^{−η}). We will return to this observation when studying GLMs below.
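To make the rewriting concrete, here is a minimal numerical check (helper names are my own) that the standard Bernoulli pmf and its exponential-family form b(y)·exp(ηT(y)−a(η)) agree:

```python
import math

def bernoulli_pmf(y, phi):
    """Standard Bernoulli pmf: phi^y * (1 - phi)^(1 - y)."""
    return phi ** y * (1 - phi) ** (1 - y)

def bernoulli_exp_family(y, phi):
    """Same pmf in exponential-family form b(y) * exp(eta*T(y) - a(eta))."""
    eta = math.log(phi / (1 - phi))    # natural parameter
    a = math.log(1 + math.exp(eta))    # log partition function: -log(1-phi)
    b = 1.0                            # b(y) = 1
    return b * math.exp(eta * y - a)   # T(y) = y

# The two forms agree for any phi in (0, 1):
for phi in (0.2, 0.5, 0.9):
    for y in (0, 1):
        assert abs(bernoulli_pmf(y, phi) - bernoulli_exp_family(y, phi)) < 1e-12
```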

  • Gaussian distribution family (for simplicity we set σ2=1):

    $p(y;\mu)=\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}(y-\mu)^2\right)=\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}y^2\right)\cdot\exp\left(\mu y-\frac{1}{2}\mu^2\right)$

    thus we have:

    • η=μ
    • T(y)=y
    • a(η) = μ²/2 = η²/2
    • b(y) = (1/√(2π))·exp(−y²/2)

    The Gaussian distribution is another example of the exponential family. For the Gaussian, however, the mean μ (which is also the quantity y we want to predict) is exactly the natural parameter η of the corresponding exponential-family form. We will see below why it is useful to write these distributions as exponential-family distributions parameterized by η.
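The same kind of check works for the Gaussian factorization above (helper names are my own; σ²=1 as in the text):

```python
import math

def gaussian_pdf(y, mu):
    """N(mu, 1) density in its standard form."""
    return (1 / math.sqrt(2 * math.pi)) * math.exp(-0.5 * (y - mu) ** 2)

def gaussian_exp_family(y, mu):
    """Same density as b(y) * exp(eta*y - a(eta)) with eta = mu."""
    eta = mu                                              # natural parameter
    a = eta ** 2 / 2                                      # log partition function
    b = (1 / math.sqrt(2 * math.pi)) * math.exp(-y ** 2 / 2)
    return b * math.exp(eta * y - a)

# Completing the square shows the two forms are identical:
for mu in (-1.0, 0.0, 2.5):
    for y in (-0.3, 0.0, 1.7):
        assert abs(gaussian_pdf(y, mu) - gaussian_exp_family(y, mu)) < 1e-12
```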

Constructing GLMs

  • Motivation: Given the distribution family of response variable (such as Bernoulli distribution or Gaussian distribution), how can we construct a regression/classification hypothesis?

  • Three assumptions for constructing a Generalized Linear Model:

    • p(y|x;θ) ∼ ExponentialFamily(η)
    • h(x) = E[T(y)|x] (in most cases T(y)=y, which gives h(x)=E[y|x])
    • η = θᵀx (a design choice)

    The model h(x) obtained from these three assumptions is called a Generalized Linear Model. As we will see, GLMs constructed this way have many elegant properties that make learning simple and efficient.
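The three assumptions combine mechanically: assumption 3 gives η = θᵀx, and a response function g maps η to the prediction E[T(y)|x]. A minimal sketch (the function names and parameter values are hypothetical):

```python
import math

def glm_hypothesis(theta, x, response_fn):
    """h(x) = g(theta^T x): assumption 3 (eta = theta^T x) composed
    with the response function g(eta) = E[T(y) | x; eta]."""
    eta = sum(t * xj for t, xj in zip(theta, x))  # natural parameter
    return response_fn(eta)

identity = lambda eta: eta                        # Gaussian -> linear regression
sigmoid = lambda eta: 1 / (1 + math.exp(-eta))    # Bernoulli -> logistic regression

theta, x = [0.5, -1.0], [2.0, 1.0]   # illustrative values; eta = 0.0 here
h_linear = glm_hypothesis(theta, x, identity)     # 0.0
h_logistic = glm_hypothesis(theta, x, sigmoid)    # 0.5
```

The derivations below instantiate exactly this pattern, each with a different choice of distribution and hence of g.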

  • Derivation of Ordinary Least Squares (OLS):

    • probabilistic assumption: p(y|x) ∼ N(μ, σ²) ∈ ExponentialFamily(η)
    • canonical response function: g(η) = E[T(y)|x; η] = μ = η
    • hypothesis: h_θ(x) = g(θᵀx) = θᵀx
  • Derivation of Logistic Regression:

    • probabilistic assumption: p(y|x) ∼ Bernoulli(ϕ) ∈ ExponentialFamily(η)
    • canonical response function: g(η) = E[T(y)|x; η] = ϕ = 1/(1+e^{−η})
    • hypothesis: h_θ(x) = g(θᵀx) = 1/(1+e^{−θᵀx})

Both linear regression and logistic regression are special cases of the generalized linear model. This also implies a deep commonality between the two in their learning algorithms.
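That commonality is visible in the stochastic gradient update: with the canonical response function, the rule θ_j := θ_j + α(y − h_θ(x))x_j has the identical form for linear and logistic regression; only the hypothesis h differs. A minimal sketch (function names and values are illustrative):

```python
import math

def sgd_step(theta, x, y, h, alpha=0.1):
    """One stochastic gradient step: theta_j += alpha * (y - h(x)) * x_j.
    The same rule serves both models below."""
    pred = h(theta, x)
    return [t + alpha * (y - pred) * xj for t, xj in zip(theta, x)]

def linear_h(theta, x):        # identity response -> linear regression
    return sum(t * xj for t, xj in zip(theta, x))

def logistic_h(theta, x):      # sigmoid response -> logistic regression
    return 1 / (1 + math.exp(-linear_h(theta, x)))

theta = [0.0, 0.0]
theta_lin = sgd_step(theta, [1.0, 2.0], 1.0, linear_h)    # [0.1, 0.2]
theta_log = sgd_step(theta, [1.0, 2.0], 1.0, logistic_h)  # [0.05, 0.1]
```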

  • Derivation of Softmax Regression:
    • multi-class classification problem
    • probabilistic assumption: p(y|x) ∼ Multinomial(ϕ₁, ..., ϕ_{k−1}) ∈ ExponentialFamily(η) with:
      • T(y) ∈ ℝ^{k−1} and
        $T(y)_i = 1\{y=i\} = \begin{cases}1 & y=i \\ 0 & y\neq i\end{cases}$
      • $a(\eta) = -\log\phi_k = -\log\left(1-\sum_{i=1}^{k-1}\phi_i\right)$
      • b(y) = 1
      • η ∈ ℝ^{k−1} and
        $\eta_i = \log\frac{\phi_i}{\phi_k}$
    • canonical response function:
      $g(\eta)_i = E[T(y)_i\,|\,x;\eta] = \phi_i = \frac{e^{\eta_i}}{1+\sum_{j=1}^{k-1}e^{\eta_j}}$

      which is called the softmax function
    • hypothesis:
      $[h_\theta(x)]_i = g(\eta)_i = \frac{e^{\theta_i^T x}}{1+\sum_{j=1}^{k-1}e^{\theta_j^T x}}$

      which is called softmax regression
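The softmax hypothesis can be evaluated directly from its definition; the k-th class probability is 1 minus the sum of the first k−1, i.e. 1/(1+Σⱼ e^{ηⱼ}). A minimal sketch with hypothetical parameters:

```python
import math

def softmax_hypothesis(thetas, x):
    """[h(x)]_i = exp(theta_i^T x) / (1 + sum_j exp(theta_j^T x)) for
    i = 1..k-1; the k-th class probability is the remaining mass."""
    etas = [sum(t * xj for t, xj in zip(theta_i, x)) for theta_i in thetas]
    denom = 1 + sum(math.exp(e) for e in etas)
    probs = [math.exp(e) / denom for e in etas]
    return probs + [1 / denom]   # append phi_k = 1 - sum of the others

# k = 3 classes, zero parameters for the first k-1 classes (illustrative),
# so all three classes come out equally likely:
probs = softmax_hypothesis([[0.0, 0.0], [0.0, 0.0]], [1.0, 1.0])
assert abs(sum(probs) - 1) < 1e-12   # a valid probability distribution
```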