Iterative Reweighted Least Squares


  • Approximation
    • Weighted least squared error approximation
    • Approximation in other norms
    • IRLS approximation of the $\ell_p$ norm
      • Overdetermined system N > p
        • Algebraic method
      • IRLS for Logit Regression
        • Modeling
  • Iterative Least Squares for logistic regression
    • IRLS

This article is based on this paper.

Approximation

For a linear problem, the model is:

$$Ax = b, \qquad A \in \mathbb{R}^{N \times p},\; x \in \mathbb{R}^{p},\; b \in \mathbb{R}^{N}$$

If $b$ does not lie in the column space of $A$ (the space spanned by the columns of $A$), the system has no exact solution, so an approximate one is used: a solution is sought by minimizing the norm of the error

$$e = Ax - b$$

The most common choice is the $\ell_2$ norm:

$$\|e\|_2^2 = \sum_i e_i^2 = e^T e$$

For $\ell_2$, the solution falls into three cases:

  • If N = p (square and nonsingular), there is an exact solution:
    $$x = A^{-1}b$$
  • If N > p (overdetermined), the normal equations give:
    $$A^TAx = A^Tb \;\Longrightarrow\; x = [A^TA]^{-1}A^Tb$$
    where $A^TA$ is $p \times p$.
  • If N < p (underdetermined), the minimum-norm solution is:
    $$Ax = AA^T(AA^T)^{-1}b \;\Longrightarrow\; x = A^T[AA^T]^{-1}b$$
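As a quick illustration, here is a minimal numpy sketch of the N > p and N < p closed forms (the random test matrices are made up for demonstration, and np.linalg.lstsq is used only as a cross-check):

```python
import numpy as np

rng = np.random.default_rng(0)

# N > p: normal-equation solution x = (A^T A)^{-1} A^T b
A = rng.standard_normal((5, 2))       # N = 5 equations, p = 2 unknowns
b = rng.standard_normal(5)
x_over = np.linalg.solve(A.T @ A, A.T @ b)

# N < p: minimum-norm solution x = A^T (A A^T)^{-1} b
A_u = rng.standard_normal((2, 5))     # N = 2 equations, p = 5 unknowns
b_u = rng.standard_normal(2)
x_under = A_u.T @ np.linalg.solve(A_u @ A_u.T, b_u)

# Both agree with numpy's SVD-based least-squares routine.
assert np.allclose(x_over, np.linalg.lstsq(A, b, rcond=None)[0])
assert np.allclose(x_under, np.linalg.lstsq(A_u, b_u, rcond=None)[0])
```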

Weighted least squared error approximation

$$W = \begin{bmatrix} w_1 & & \\ & \ddots & \\ & & w_N \end{bmatrix}_{N \times N}$$

$W$ is a diagonal matrix of weights; in the overdetermined case it is $N \times N$, since it acts on the error $e \in \mathbb{R}^{N}$.

$$\|We\|_2^2 = e^TW^TWe$$

$$\mathrm{Error} = (WAx - Wb)^T(WAx - Wb) = x^TA^TW^TWAx - 2b^TW^TWAx + b^TW^TWb$$

The Error is a convex function of $x$, so its minimizer has a closed form (for the scalar quadratic $ax^2 + bx + c$, the extremum is at $x = -\frac{b}{2a}$; the matrix case is analogous).
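Concretely, for the overdetermined case, setting the gradient of the Error with respect to $x$ to zero gives the weighted normal equations:

$$\nabla_x\,\mathrm{Error} = 2A^TW^TWAx - 2A^TW^TWb = 0 \;\Longrightarrow\; A^TW^TWAx = A^TW^TWb$$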

  • For N > p (minimizing $\|W(Ax - b)\|_2^2$):
    $$x = [A^TW^TWA]^{-1}A^TW^TWb$$
  • For N < p (here the weights are applied to $x$ itself: minimize $\|Wx\|_2^2$ subject to $Ax = b$, with $W$ a $p \times p$ diagonal matrix):
    $$x = [W^TW]^{-1}A^T\big[A[W^TW]^{-1}A^T\big]^{-1}b$$
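A minimal numpy sketch of both weighted closed forms (the function names are mine; note the two cases weight different objects, as remarked above):

```python
import numpy as np

def weighted_ls_over(A, b, w):
    """N > p: minimize ||W(Ax - b)||_2^2 with W = diag(w), w of length N.
    Closed form: x = (A^T W^T W A)^{-1} A^T W^T W b."""
    W2 = np.diag(w ** 2)                       # W^T W
    return np.linalg.solve(A.T @ W2 @ A, A.T @ W2 @ b)

def weighted_ls_under(A, b, w):
    """N < p: minimize ||Wx||_2^2 subject to Ax = b, W = diag(w), w of length p.
    Closed form: x = (W^T W)^{-1} A^T [A (W^T W)^{-1} A^T]^{-1} b."""
    W2inv = np.diag(1.0 / w ** 2)              # (W^T W)^{-1}
    return W2inv @ A.T @ np.linalg.solve(A @ W2inv @ A.T, b)
```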

Approximation in other norms

The best-known alternatives are $\ell_1$, notable for promoting sparsity, and $\ell_\infty$.

$$\|e\|_p = \Big(\sum_i |e_i|^p\Big)^{1/p}$$

IRLS approximation of the $\ell_p$ norm

IRLS (iterative reweighted least squares) allows an iterative algorithm to be built from the analytical solutions of weighted least squares, with an iterative reweighting that converges to the optimal $\ell_p$ approximation. In other words, IRLS solves the weighted $\ell_p$-norm approximation problem iteratively.

Overdetermined system N > p

Algebraic method

At each iteration, solve the weighted least squares problem with the current weights,

$$x = [A^TW^TWA]^{-1}A^TW^TWb$$

then recompute the error $e = Ax - b$ and update the weights as $w_i = |e_i|^{(p-2)/2}$, so that $\|We\|_2^2 = \sum_i |e_i|^p$.
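A minimal sketch of this iteration in numpy, assuming an $\ell_2$ initial guess; the clamping constant and fixed iteration count are illustrative choices, not from the paper:

```python
import numpy as np

def irls_lp(A, b, p=1.5, iters=50, eps=1e-8):
    """Plain IRLS for the overdetermined l_p problem: min_x ||Ax - b||_p.
    Each pass solves x = (A^T W^T W A)^{-1} A^T W^T W b with
    W = diag(|e_i|^{(p-2)/2}), so that ||We||_2^2 = sum_i |e_i|^p."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]    # start from the l_2 solution
    for _ in range(iters):
        e = A @ x - b
        # Clamp |e_i| away from zero: for p < 2 the exponent is negative.
        w = np.maximum(np.abs(e), eps) ** ((p - 2) / 2)
        WA = w[:, None] * A                     # W A (row-wise scaling)
        x = np.linalg.solve(WA.T @ WA, WA.T @ (w * b))
    return x
```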

IRLS for Logit Regression

For a binary classification problem, we want the conditional probability $\Pr(Y=1|X)$, where $X \in \mathbb{R}^m$ and $Y \in \{0,1\}$. The Generalized Linear Models (GLM) viewpoint is:

  • Our link function is the logit:

    $$g(p) = \mathrm{logit}(p) = \log\frac{p}{1-p}, \qquad p = \Pr(Y=1|X)$$

  • We extend $X$ with basis functions:

    $$\phi(x) = [1, \phi_1(x), \ldots, \phi_p(x)]^T$$

In this way $\phi$ and the logit link tie $p$ to $x$, so the original problem can be attacked with regression. We write:

$$\mathrm{logit}(p) = \log\frac{p}{1-p} = w^T\phi(x) \;\Longrightarrow\; p = \sigma(w^T\phi(x)) = \frac{1}{1+e^{-w^T\phi(x)}}$$

$$\frac{\partial p}{\partial w} = \sigma(1-\sigma)\,\phi(x) = p(1-p)\,\phi(x)$$

Modeling

Given $n$ data points $(x_i, y_i)$, $i = 1,\ldots,n$, with $x_i \in \mathbb{R}^m$ and $y_i \in \{0,1\}$, we can write down the likelihood and the cross-entropy (negative log-likelihood):

$$\mathrm{Likelihood} = \prod_{i=1}^n p_i^{y_i}(1-p_i)^{1-y_i}$$

$$l = -\log(\mathrm{Likelihood}) = -\sum_{i=1}^n \big[y_i\ln(p_i) + (1-y_i)\ln(1-p_i)\big]$$

This is the optimization problem:

$$\arg\min_w \Big\{-\sum_{i=1}^n \big[y_i\ln(p_i) + (1-y_i)\ln(1-p_i)\big]\Big\}$$

Taking the partial derivative of $l$ with respect to $w$:

$$\frac{\partial l}{\partial w} = \sum_{i=1}^n \Big[-y_i\frac{1}{p_i}\,p_i(1-p_i) + (1-y_i)\frac{1}{1-p_i}\,p_i(1-p_i)\Big]\phi(x_i) = \sum_{i=1}^n (p_i - y_i)\,\phi(x_i)$$

$$= \underbrace{\big[\phi(x_1)\;\cdots\;\phi(x_n)\big]}_{(p+1)\times n}\;\underbrace{\begin{bmatrix} p_1 - y_1 \\ \vdots \\ p_n - y_n \end{bmatrix}}_{n\times 1} = \Phi^T(p - y)$$

Hessian matrix:

$$H = \frac{\partial^2 l}{\partial w^2} = \frac{\partial}{\partial w}\Big[\sum_{i=1}^n (p_i - y_i)\,\phi(x_i)\Big]^T = \sum_{i=1}^n p_i(1-p_i)\,\phi(x_i)\,\phi(x_i)^T = \Phi^T\underbrace{\begin{bmatrix} p_1(1-p_1) & & \\ & \ddots & \\ & & p_n(1-p_n) \end{bmatrix}}_{n\times n}\Phi = \Phi^T R \Phi$$

Using the Newton-Raphson iteration:

$$w_{\mathrm{new}} = w - H^{-1}\nabla l(w) = w - (\Phi^TR\Phi)^{-1}\Phi^T(p-y) = (\Phi^TR\Phi)^{-1}\big[\Phi^TR\Phi\,w - \Phi^T(p-y)\big] = (\Phi^TR\Phi)^{-1}\Phi^TRz$$

with the working response

$$z = \Phi w - R^{-1}(p-y)$$

Writing $R = S^TS$ (for example $S = R^{1/2}$),

$$w = \big((S\Phi)^T(S\Phi)\big)^{-1}(S\Phi)^T S z$$

so each Newton step can be seen as a weighted least squares fit.
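A minimal numpy sketch of this Newton/IRLS update (Phi is the $n \times (p+1)$ design matrix with rows $\phi(x_i)^T$; the clamp on the diagonal of R is my addition for numerical safety):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def irls_logistic(Phi, y, iters=20):
    """Newton-Raphson in the weighted-least-squares form derived above:
    w <- (Phi^T R Phi)^{-1} Phi^T R z,  R = diag(p_i (1 - p_i)),
    z = Phi w - R^{-1} (p - y)."""
    w = np.zeros(Phi.shape[1])
    for _ in range(iters):
        p = sigmoid(Phi @ w)
        r = np.clip(p * (1 - p), 1e-10, None)   # diagonal of R, clamped away from 0
        z = Phi @ w - (p - y) / r               # working response z
        RPhi = Phi * r[:, None]                 # R Phi (row-wise scaling)
        w = np.linalg.solve(Phi.T @ RPhi, Phi.T @ (r * z))
    return w
```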

Iterative Least Squares for logistic regression

Recall that the logistic regression model is:

$$\eta(x) = \beta \cdot x, \qquad \eta(x) = \mathrm{logit}(p) = \log\Big(\frac{p}{1-p}\Big)$$

The link function is $g(p) = \log\big(\frac{p}{1-p}\big)$.
Since $y \in \{0,1\}$, $g(y) = \pm\infty$, so we cannot regress $g(y)$ on $x$ directly; to make regression possible, we use a first-order Taylor expansion:

$$g(Y) \approx g(p) + (Y - p)\,g'(p) = Z$$

Logit regression assumes a binomial distribution, so $\mathrm{Var}(Y|X=x) = p(1-p)$. Since $g'(p) = \frac{1}{p(1-p)}$,

$$\mathrm{Var}(Z|X=x) = \big(g'(p)\big)^2\,\mathrm{Var}(Y|X=x) = \Big(\frac{1}{p(1-p)}\Big)^2\,p(1-p) = \big(p(1-p)\big)^{-1}$$

Because the variance of Z changes with X, this is a heteroskedastic regression problem; the appropriate way of dealing with such a problem is weighted least squares, with weights inversely proportional to the variances. This means that, in logistic regression, the weight at x should be proportional to p(1 − p).

IRLS

  1. Get the data $(x_1,y_1),\ldots,(x_n,y_n)$, and make an initial guess for $\beta$.
  2. Until $\beta_0$ and $\beta$ converge:
    a) Calculate $\eta(x_i) = \beta \cdot x_i$ and the corresponding $p(x_i)$
    b) Calculate the transformed responses $z_i = \eta(x_i) + \dfrac{y_i - p(x_i)}{p(x_i)\,(1 - p(x_i))}$
    c) Calculate the weights $w_i = p(x_i)\,(1 - p(x_i))$
    d) Do a weighted linear regression of $z_i$ on $x_i$ with weights $w_i$, and set $\beta$ to the estimated coefficients (see the sketch below)
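A direct transcription of these steps into numpy; the convergence tolerance, the clamp on the weights, and the helper name weighted_linreg are illustrative choices:

```python
import numpy as np

def weighted_linreg(X, z, w):
    """Weighted linear regression of z on X (with intercept), weights w:
    solves (X^T W X) beta = X^T W z."""
    Xd = np.column_stack([np.ones(len(z)), X])   # prepend intercept column
    Xw = Xd * w[:, None]                         # W X (row-wise scaling)
    return np.linalg.solve(Xd.T @ Xw, Xd.T @ (w * z))

def irls(X, y, tol=1e-8, max_iter=100):
    """IRLS for logistic regression, following steps 1-2(a-d) above.
    beta[0] is the intercept beta_0."""
    beta = np.zeros(X.shape[1] + 1)              # step 1: initial guess
    for _ in range(max_iter):                    # step 2
        eta = beta[0] + X @ beta[1:]             # (a) eta(x_i)
        p = 1.0 / (1.0 + np.exp(-eta))           #     corresponding p(x_i)
        w = np.clip(p * (1 - p), 1e-10, None)    # (c) weights, clamped for safety
        z = eta + (y - p) / w                    # (b) transformed responses
        beta_new = weighted_linreg(X, z, w)      # (d) weighted regression of z on x
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

Step (d) solves $(X^TWX)\beta = X^TWz$, which is exactly the $(\Phi^TR\Phi)^{-1}\Phi^TRz$ update derived in the Modeling section.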