Iterative Reweighted Least Squares


  • Approximation
    • Weighted least squared error approximation
    • Approximation in other norms
    • IRLS approximation of the $\ell_p$ norm
      • Overdetermined system N > p
        • Algebraic method
      • IRLS for Logit Regression
        • Modeling
  • Iterative Least Squares for logistic regression
    • IRLS

This article is based on this paper.

Approximation

For a linear problem, the model is:

$$Ax = b, \qquad A \in \mathbb{R}^{N \times p},\; x \in \mathbb{R}^{p},\; b \in \mathbb{R}^{N}$$

If $b$ does not lie in the column space of $A$ (the space spanned by the columns of $A$), the system has no exact solution, so an approximate one is used: a solution is sought by minimizing the norm of the error

$$e = Ax - b$$

The most common choice is the $\ell_2$ norm:

$$\|e\|_2^2 = \sum_i e_i^2 = e^T e$$

For $\ell_2$, the solution falls into three cases:

  • If N = p (square and nonsingular), there is an exact solution:
    $$x = A^{-1}b$$
  • If N > p (overdetermined), the normal equations give:
    $$A^TAx = A^Tb \;\Longrightarrow\; x = [A^TA]^{-1}A^Tb$$
    where $A^TA$ is $p \times p$.
  • If N < p (underdetermined), the minimum-norm solution is:
    $$Ax = AA^T(AA^T)^{-1}b \;\Longrightarrow\; x = A^T[AA^T]^{-1}b$$
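As a quick illustration, here is a minimal numpy sketch of the N > p and N < p closed forms (the random test matrices are made up for demonstration, and np.linalg.lstsq is used only as a cross-check):

```python
import numpy as np

rng = np.random.default_rng(0)

# N > p: normal-equation solution x = (A^T A)^{-1} A^T b
A = rng.standard_normal((5, 2))       # N = 5 equations, p = 2 unknowns
b = rng.standard_normal(5)
x_over = np.linalg.solve(A.T @ A, A.T @ b)

# N < p: minimum-norm solution x = A^T (A A^T)^{-1} b
A_u = rng.standard_normal((2, 5))     # N = 2 equations, p = 5 unknowns
b_u = rng.standard_normal(2)
x_under = A_u.T @ np.linalg.solve(A_u @ A_u.T, b_u)

# Both agree with numpy's SVD-based least-squares routine.
assert np.allclose(x_over, np.linalg.lstsq(A, b, rcond=None)[0])
assert np.allclose(x_under, np.linalg.lstsq(A_u, b_u, rcond=None)[0])
```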

Weighted least squared error approximation

$$W = \begin{bmatrix} w_1 & & \\ & \ddots & \\ & & w_N \end{bmatrix}_{N \times N}$$

$W$ is a diagonal matrix of weights; in the overdetermined case it is $N \times N$, since it acts on the error $e \in \mathbb{R}^{N}$.

$$\|We\|_2^2 = e^TW^TWe$$

$$\mathrm{Error} = (WAx - Wb)^T(WAx - Wb) = x^TA^TW^TWAx - 2b^TW^TWAx + b^TW^TWb$$

The Error is a convex function of $x$, so its minimizer has a closed form (for the scalar quadratic $ax^2 + bx + c$, the extremum is at $x = -\frac{b}{2a}$; the matrix case is analogous).
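Concretely, for the overdetermined case, setting the gradient of the Error with respect to $x$ to zero gives the weighted normal equations:

$$\nabla_x\,\mathrm{Error} = 2A^TW^TWAx - 2A^TW^TWb = 0 \;\Longrightarrow\; A^TW^TWAx = A^TW^TWb$$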

  • For N > p (minimizing $\|W(Ax - b)\|_2^2$):
    $$x = [A^TW^TWA]^{-1}A^TW^TWb$$
  • For N < p (here the weights are applied to $x$ itself: minimize $\|Wx\|_2^2$ subject to $Ax = b$, with $W$ a $p \times p$ diagonal matrix):
    $$x = [W^TW]^{-1}A^T\big[A[W^TW]^{-1}A^T\big]^{-1}b$$
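A minimal numpy sketch of both weighted closed forms (the function names are mine; note the two cases weight different objects, as remarked above):

```python
import numpy as np

def weighted_ls_over(A, b, w):
    """N > p: minimize ||W(Ax - b)||_2^2 with W = diag(w), w of length N.
    Closed form: x = (A^T W^T W A)^{-1} A^T W^T W b."""
    W2 = np.diag(w ** 2)                       # W^T W
    return np.linalg.solve(A.T @ W2 @ A, A.T @ W2 @ b)

def weighted_ls_under(A, b, w):
    """N < p: minimize ||Wx||_2^2 subject to Ax = b, W = diag(w), w of length p.
    Closed form: x = (W^T W)^{-1} A^T [A (W^T W)^{-1} A^T]^{-1} b."""
    W2inv = np.diag(1.0 / w ** 2)              # (W^T W)^{-1}
    return W2inv @ A.T @ np.linalg.solve(A @ W2inv @ A.T, b)
```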

Approximation in other norms

The best-known alternatives are $\ell_1$, notable for promoting sparsity, and $\ell_\infty$.

$$\|e\|_p = \Big(\sum_i |e_i|^p\Big)^{1/p}$$

IRLS approximation of the $\ell_p$ norm

IRLS (iterative reweighted least squares) allows an iterative algorithm to be built from the analytical solutions of weighted least squares, with an iterative reweighting that converges to the optimal $\ell_p$ approximation. In other words, IRLS solves the weighted $\ell_p$-norm approximation problem iteratively.

Overdetermined system N > p

Algebraic method

At each iteration, solve the weighted least squares problem with the current weights,

$$x = [A^TW^TWA]^{-1}A^TW^TWb$$

then recompute the error $e = Ax - b$ and update the weights as $w_i = |e_i|^{(p-2)/2}$, so that $\|We\|_2^2 = \sum_i |e_i|^p$.
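A minimal sketch of this iteration in numpy, assuming an $\ell_2$ initial guess; the clamping constant and fixed iteration count are illustrative choices, not from the paper:

```python
import numpy as np

def irls_lp(A, b, p=1.5, iters=50, eps=1e-8):
    """Plain IRLS for the overdetermined l_p problem: min_x ||Ax - b||_p.
    Each pass solves x = (A^T W^T W A)^{-1} A^T W^T W b with
    W = diag(|e_i|^{(p-2)/2}), so that ||We||_2^2 = sum_i |e_i|^p."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]    # start from the l_2 solution
    for _ in range(iters):
        e = A @ x - b
        # Clamp |e_i| away from zero: for p < 2 the exponent is negative.
        w = np.maximum(np.abs(e), eps) ** ((p - 2) / 2)
        WA = w[:, None] * A                     # W A (row-wise scaling)
        x = np.linalg.solve(WA.T @ WA, WA.T @ (w * b))
    return x
```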

IRLS for Logit Regression

For a binary classification problem, we want the conditional probability $\Pr(Y=1|X)$, where $X \in \mathbb{R}^m$ and $Y \in \{0,1\}$. The Generalized Linear Models (GLM) viewpoint is:

  • Our link function is the logit:

    $$g(p) = \mathrm{logit}(p) = \log\frac{p}{1-p}, \qquad p = \Pr(Y=1|X)$$

  • We extend $X$ with basis functions:

    $$\phi(x) = [1, \phi_1(x), \ldots, \phi_p(x)]^T$$

In this way $\phi$ and the logit link tie $p$ to $x$, so the original problem can be attacked with regression. We write:

$$\mathrm{logit}(p) = \log\frac{p}{1-p} = w^T\phi(x) \;\Longrightarrow\; p = \sigma(w^T\phi(x)) = \frac{1}{1+e^{-w^T\phi(x)}}$$

$$\frac{\partial p}{\partial w} = \sigma(1-\sigma)\,\phi(x) = p(1-p)\,\phi(x)$$

Modeling

Given $n$ data points $(x_i, y_i)$, $i = 1,\ldots,n$, with $x_i \in \mathbb{R}^m$ and $y_i \in \{0,1\}$, we can write down the likelihood and the cross-entropy (negative log-likelihood):

$$\mathrm{Likelihood} = \prod_{i=1}^n p_i^{y_i}(1-p_i)^{1-y_i}$$

$$l = -\log(\mathrm{Likelihood}) = -\sum_{i=1}^n \big[y_i\ln(p_i) + (1-y_i)\ln(1-p_i)\big]$$

This is the optimization problem:

$$\arg\min_w \Big\{-\sum_{i=1}^n \big[y_i\ln(p_i) + (1-y_i)\ln(1-p_i)\big]\Big\}$$

Taking the partial derivative of $l$ with respect to $w$:

$$\frac{\partial l}{\partial w} = \sum_{i=1}^n \Big[-y_i\frac{1}{p_i}\,p_i(1-p_i) + (1-y_i)\frac{1}{1-p_i}\,p_i(1-p_i)\Big]\phi(x_i) = \sum_{i=1}^n (p_i - y_i)\,\phi(x_i)$$

$$= \underbrace{\big[\phi(x_1)\;\cdots\;\phi(x_n)\big]}_{(p+1)\times n}\;\underbrace{\begin{bmatrix} p_1 - y_1 \\ \vdots \\ p_n - y_n \end{bmatrix}}_{n\times 1} = \Phi^T(p - y)$$

Hessian matrix:

$$H = \frac{\partial^2 l}{\partial w^2} = \frac{\partial}{\partial w}\Big[\sum_{i=1}^n (p_i - y_i)\,\phi(x_i)\Big]^T = \sum_{i=1}^n p_i(1-p_i)\,\phi(x_i)\,\phi(x_i)^T = \Phi^T\underbrace{\begin{bmatrix} p_1(1-p_1) & & \\ & \ddots & \\ & & p_n(1-p_n) \end{bmatrix}}_{n\times n}\Phi = \Phi^T R \Phi$$

Using the Newton-Raphson iteration:

$$w_{\mathrm{new}} = w - H^{-1}\nabla l(w) = w - (\Phi^TR\Phi)^{-1}\Phi^T(p-y) = (\Phi^TR\Phi)^{-1}\big[\Phi^TR\Phi\,w - \Phi^T(p-y)\big] = (\Phi^TR\Phi)^{-1}\Phi^TRz$$

with the working response

$$z = \Phi w - R^{-1}(p-y)$$

Writing $R = S^TS$ (for example $S = R^{1/2}$),

$$w = \big((S\Phi)^T(S\Phi)\big)^{-1}(S\Phi)^T S z$$

so each Newton step can be seen as a weighted least squares fit.
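A minimal numpy sketch of this Newton/IRLS update (Phi is the $n \times (p+1)$ design matrix with rows $\phi(x_i)^T$; the clamp on the diagonal of R is my addition for numerical safety):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def irls_logistic(Phi, y, iters=20):
    """Newton-Raphson in the weighted-least-squares form derived above:
    w <- (Phi^T R Phi)^{-1} Phi^T R z,  R = diag(p_i (1 - p_i)),
    z = Phi w - R^{-1} (p - y)."""
    w = np.zeros(Phi.shape[1])
    for _ in range(iters):
        p = sigmoid(Phi @ w)
        r = np.clip(p * (1 - p), 1e-10, None)   # diagonal of R, clamped away from 0
        z = Phi @ w - (p - y) / r               # working response z
        RPhi = Phi * r[:, None]                 # R Phi (row-wise scaling)
        w = np.linalg.solve(Phi.T @ RPhi, Phi.T @ (r * z))
    return w
```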

Iterative Least Squares for logistic regression

Recall that the logistic regression model is:

$$\eta(x) = \beta \cdot x, \qquad \eta(x) = \mathrm{logit}(p) = \log\Big(\frac{p}{1-p}\Big)$$

The link function is $g(p) = \log\big(\frac{p}{1-p}\big)$.
Since $y \in \{0,1\}$, $g(y) = \pm\infty$, so we cannot regress $g(y)$ on $x$ directly; to make regression possible, we use a first-order Taylor expansion:

$$g(Y) \approx g(p) + (Y - p)\,g'(p) = Z$$

Logit regression assumes a binomial distribution, so $\mathrm{Var}(Y|X=x) = p(1-p)$. Since $g'(p) = \frac{1}{p(1-p)}$,

$$\mathrm{Var}(Z|X=x) = \big(g'(p)\big)^2\,\mathrm{Var}(Y|X=x) = \Big(\frac{1}{p(1-p)}\Big)^2\,p(1-p) = \big(p(1-p)\big)^{-1}$$

Because the variance of Z changes with X, this is a heteroskedastic regression problem; the appropriate way of dealing with such a problem is weighted least squares, with weights inversely proportional to the variances. This means that, in logistic regression, the weight at x should be proportional to p(1 − p).

IRLS

  1. Get the data $(x_1,y_1),\ldots,(x_n,y_n)$, and make an initial guess for $\beta$.
  2. Until $\beta_0$ and $\beta$ converge:
    a) Calculate $\eta(x_i) = \beta \cdot x_i$ and the corresponding $p(x_i)$
    b) Calculate the transformed responses $z_i = \eta(x_i) + \dfrac{y_i - p(x_i)}{p(x_i)\,(1 - p(x_i))}$
    c) Calculate the weights $w_i = p(x_i)\,(1 - p(x_i))$
    d) Do a weighted linear regression of $z_i$ on $x_i$ with weights $w_i$, and set $\beta$ to the estimated coefficients (see the sketch below)
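A direct transcription of these steps into numpy; the convergence tolerance, the clamp on the weights, and the helper name weighted_linreg are illustrative choices:

```python
import numpy as np

def weighted_linreg(X, z, w):
    """Weighted linear regression of z on X (with intercept), weights w:
    solves (X^T W X) beta = X^T W z."""
    Xd = np.column_stack([np.ones(len(z)), X])   # prepend intercept column
    Xw = Xd * w[:, None]                         # W X (row-wise scaling)
    return np.linalg.solve(Xd.T @ Xw, Xd.T @ (w * z))

def irls(X, y, tol=1e-8, max_iter=100):
    """IRLS for logistic regression, following steps 1-2(a-d) above.
    beta[0] is the intercept beta_0."""
    beta = np.zeros(X.shape[1] + 1)              # step 1: initial guess
    for _ in range(max_iter):                    # step 2
        eta = beta[0] + X @ beta[1:]             # (a) eta(x_i)
        p = 1.0 / (1.0 + np.exp(-eta))           #     corresponding p(x_i)
        w = np.clip(p * (1 - p), 1e-10, None)    # (c) weights, clamped for safety
        z = eta + (y - p) / w                    # (b) transformed responses
        beta_new = weighted_linreg(X, z, w)      # (d) weighted regression of z on x
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

Step (d) solves $(X^TWX)\beta = X^TWz$, which is exactly the $(\Phi^TR\Phi)^{-1}\Phi^TRz$ update derived in the Modeling section.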