CS229 Lecture Notes(3): Generalized Linear Models
The exponential family
A class of distributions is in the exponential family if it can be written in the form

$$p(y;\eta) = b(y)\exp\left(\eta^T T(y) - a(\eta)\right)$$

where:
- $\eta$: the natural parameter (also called the canonical parameter)
- $T(y)$: the sufficient statistic (it is often the case that $T(y) = y$)
- $a(\eta)$: the log partition function ($e^{-a(\eta)}$ plays the role of a normalization constant)
Within the exponential family, once the functional forms of $T$, $a$, and $b$ are fixed, we obtain a family of distributions parameterized by $\eta$.

Bernoulli distribution family:

$$p(y;\phi) = \phi^y(1-\phi)^{1-y} = \exp\left(y\log\phi + (1-y)\log(1-\phi)\right) = \exp\left(\left(\log\frac{\phi}{1-\phi}\right)y + \log(1-\phi)\right)$$

thus we have:
- $\eta = \log\frac{\phi}{1-\phi}$, i.e. $\phi = 1/(1+e^{-\eta})$ (the sigmoid function!)
- $T(y) = y$
- $a(\eta) = -\log(1-\phi) = \log(1+e^{\eta})$
- $b(y) = 1$
The Bernoulli distribution is thus a member of the exponential family. Notably, when we rewrite the Bernoulli distribution in exponential-family form and express the probability $\phi$ of $y=1$ in terms of the natural parameter $\eta$, the logistic function $\phi = 1/(1+e^{-\eta})$ arises naturally. We will return to this observation when constructing GLMs below.
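As a quick numerical sanity check (a minimal Python sketch added here for illustration, not part of the original notes), the two ways of writing the Bernoulli pmf agree, and inverting $\eta=\log\frac{\phi}{1-\phi}$ recovers $\phi$ via the sigmoid:

```python
import numpy as np

def bernoulli_pmf(y, phi):
    """Standard Bernoulli pmf: phi^y * (1 - phi)^(1 - y)."""
    return phi**y * (1 - phi)**(1 - y)

def bernoulli_exp_family(y, eta):
    """Same pmf in exponential-family form b(y) * exp(eta * T(y) - a(eta)),
    with T(y) = y, a(eta) = log(1 + e^eta), b(y) = 1."""
    return np.exp(eta * y - np.log1p(np.exp(eta)))

phi = 0.3                               # illustrative value
eta = np.log(phi / (1 - phi))           # natural parameter for this phi
for y in (0, 1):
    print(y, bernoulli_pmf(y, phi), bernoulli_exp_family(y, eta))  # columns agree
print(1 / (1 + np.exp(-eta)))           # sigmoid(eta) recovers phi = 0.3
```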
Gaussian distribution family (for simplicity we set $\sigma^2 = 1$):

$$p(y;\mu) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}(y-\mu)^2\right) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}y^2\right)\cdot\exp\left(\mu y - \frac{1}{2}\mu^2\right)$$

thus we have:
- $\eta = \mu$
- $T(y) = y$
- $a(\eta) = \mu^2/2 = \eta^2/2$
- $b(y) = (1/\sqrt{2\pi})\exp(-y^2/2)$
The Gaussian distribution is also a member of the exponential family. In its case, the mean $\mu$ (which is also the quantity $y$ we want to predict) is exactly the natural parameter $\eta$ of the corresponding exponential-family form. We will see shortly why it is useful to write these distributions as exponential-family distributions parameterized by $\eta$.
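The same kind of numerical check works for the Gaussian factorization above (again a small sketch, assuming $\sigma^2 = 1$):

```python
import numpy as np
from scipy.stats import norm

def gaussian_exp_family(y, eta):
    """N(y; mu = eta, sigma^2 = 1) written as b(y) * exp(eta * T(y) - a(eta)),
    with T(y) = y, a(eta) = eta^2 / 2, b(y) = exp(-y^2/2) / sqrt(2*pi)."""
    b = np.exp(-y**2 / 2) / np.sqrt(2 * np.pi)
    return b * np.exp(eta * y - eta**2 / 2)

mu, y = 1.5, 0.7                         # illustrative values
print(norm.pdf(y, loc=mu, scale=1.0))    # standard Gaussian density
print(gaussian_exp_family(y, eta=mu))    # identical value
```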
Constructing GLMs
Motivation: given the distribution family of the response variable (such as the Bernoulli or Gaussian distribution), how can we construct a regression/classification hypothesis?
Three assumptions for constructing a Generalized Linear Model:
1. $p(y \mid x; \theta) \sim \text{ExponentialFamily}(\eta)$
2. $h(x) = E[T(y) \mid x]$ (in most cases $T(y) = y$, which leads to $h(x) = E[y \mid x]$)
3. $\eta = \theta^T x$ (design choice)
A model $h(x)$ obtained from the three assumptions above is called a Generalized Linear Model. As we will see, GLMs constructed in this way have many elegant properties that make learning simple and efficient.
Derivation of Ordinary Least Squares (OLS):
- probabilistic assumption: $p(y \mid x) \sim \mathcal{N}(\mu, \sigma^2) \sim \text{ExponentialFamily}(\eta)$
- canonical response function: $g(\eta) = E[T(y) \mid x; \eta] = \mu = \eta$
- hypothesis: $h_\theta(x) = g(\theta^T x) = \theta^T x$
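Because the canonical response function for a Gaussian response is the identity, fitting this GLM by maximum likelihood reduces to ordinary least squares. A minimal illustrative sketch using the closed-form normal equations (the synthetic data and names here are assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100)])  # intercept column + one feature
true_theta = np.array([2.0, -3.0])
y = X @ true_theta + rng.normal(size=100)                   # Gaussian noise, sigma = 1

# Maximum likelihood under the Gaussian assumption = least squares:
# theta = (X^T X)^{-1} X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)            # close to [2, -3]
print(X[:3] @ theta)    # h_theta(x) = theta^T x, the GLM hypothesis
```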
Derivation of Logistic Regression:
- probabilistic assumption: $p(y \mid x) \sim \text{Bernoulli}(\phi) \sim \text{ExponentialFamily}(\eta)$
- canonical response function: $g(\eta) = E[T(y) \mid x; \eta] = \phi = \frac{1}{1+e^{-\eta}}$
- hypothesis: $h_\theta(x) = g(\theta^T x) = \frac{1}{1+e^{-\theta^T x}}$
Both linear regression and logistic regression are therefore special cases of the Generalized Linear Model. This also implies that the two share much in common in their learning algorithms, as the sketch below illustrates.
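Concretely, for both models the stochastic gradient update on the log-likelihood takes exactly the same form, $\theta_j := \theta_j + \alpha\,(y^{(i)} - h_\theta(x^{(i)}))\,x_j^{(i)}$; only the hypothesis $h_\theta$ differs. A small sketch of this shared update (the data point and step size are illustrative assumptions):

```python
import numpy as np

def sgd_glm_update(theta, x, y, h, alpha=0.1):
    """One stochastic gradient step: theta := theta + alpha * (y - h(theta, x)) * x.
    The same rule applies to linear regression (h(theta, x) = theta^T x)
    and logistic regression (h(theta, x) = sigmoid(theta^T x))."""
    return theta + alpha * (y - h(theta, x)) * x

linear_h   = lambda theta, x: theta @ x                       # OLS hypothesis
logistic_h = lambda theta, x: 1 / (1 + np.exp(-(theta @ x)))  # logistic hypothesis

theta = np.zeros(2)
x, y = np.array([1.0, 2.0]), 1.0
print(sgd_glm_update(theta, x, y, linear_h))
print(sgd_glm_update(theta, x, y, logistic_h))
```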
Derivation of Softmax Regression:
- multi-class classification problem
- probabilistic assumption: $p(y \mid x) \sim \text{Multinomial}(\phi_1, \dots, \phi_{k-1}) \sim \text{ExponentialFamily}(\eta)$, with:
  - $T(y) \in \mathbb{R}^{k-1}$ and $T(y)_i = 1\{y = i\}$ (equal to 1 if $y = i$, and 0 if $y \neq i$)
  - $a(\eta) = -\log(\phi_k) = -\log\left(1 - \sum_{i=1}^{k-1}\phi_i\right)$
  - $b(y) = 1$
  - $\eta \in \mathbb{R}^{k-1}$ and $\eta_i = \log\frac{\phi_i}{\phi_k}$
- canonical response function:
  $$g(\eta)_i = E[T(y)_i \mid x; \eta] = \phi_i = \frac{e^{\eta_i}}{1 + \sum_{j=1}^{k-1} e^{\eta_j}}$$
  which is called the softmax function
- hypothesis:
  $$[h_\theta(x)]_i = g(\theta^T x)_i = \frac{e^{\theta_i^T x}}{1 + \sum_{j=1}^{k-1} e^{\theta_j^T x}}$$
  which is called softmax regression
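A short sketch of the resulting hypothesis (the parameterization with $k-1$ parameter vectors and an implicit $\theta_k = 0$ follows the convention above; the example parameters and data are made up for illustration):

```python
import numpy as np

def softmax_hypothesis(Theta, x):
    """h_theta(x) for softmax regression.
    Theta has shape (k-1, n); class k is the reference class with theta_k = 0.
    Returns the k class probabilities [phi_1, ..., phi_k]."""
    eta = Theta @ x                          # eta_i = theta_i^T x, i = 1..k-1
    exp_eta = np.exp(eta)
    denom = 1.0 + exp_eta.sum()              # the 1 accounts for e^{theta_k^T x} = e^0
    phis = exp_eta / denom                   # phi_1, ..., phi_{k-1}
    return np.append(phis, 1.0 / denom)      # phi_k = 1 / denom

Theta = np.array([[0.5, -1.0],               # theta_1
                  [1.5,  0.2]])              # theta_2  (so k = 3 classes)
x = np.array([1.0, 2.0])
p = softmax_hypothesis(Theta, x)
print(p, p.sum())                            # class probabilities summing to 1
```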