Fundamentals of the Least Squares Learning Algorithm


Least Squares
Least squares regression is a classical machine learning algorithm that minimizes the total squared error between the training targets and the model output, i.e. it minimizes

$$J_{LS}(\theta) = \frac{1}{2}\sum_{i=1}^{n}\bigl(f_\theta(x_i) - y_i\bigr)^2$$

We want the estimate of θ at which JLS is smallest:
$$\hat{\theta}_{LS} = \arg\min_\theta J_{LS}(\theta)$$

Take the linear model as an illustration.
$$f_\theta(x) = \sum_{j=1}^{b} \theta_j \phi_j(x) = \theta^{T}\phi(x)$$

Then
$$J_{LS}(\theta) = \frac{1}{2}\lVert \Phi\theta - y \rVert^2$$

where
$$\Phi = \begin{pmatrix} \phi_1(x_1) & \cdots & \phi_b(x_1) \\ \vdots & \ddots & \vdots \\ \phi_1(x_n) & \cdots & \phi_b(x_n) \end{pmatrix}$$
which is also known as the “design matrix”.
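
For concreteness, here is a minimal NumPy sketch of building such a design matrix; the polynomial basis ϕj(x) = x^(j−1) is my own choice for the example, since the text leaves the basis functions abstract.

```python
import numpy as np

# Design matrix for a polynomial basis phi_j(x) = x^(j-1), j = 1..b
# (the basis is an assumption for the example; any basis functions work).
def design_matrix(x, b):
    """Return the n x b matrix Phi with Phi[i, j] = x_i ** j."""
    return np.vander(x, N=b, increasing=True)

x = np.linspace(-3, 3, 50)
Phi = design_matrix(x, b=5)   # shape (50, 5)
```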
We do not actually need the minimum value of JLS itself; to find the minimizing θ we take the derivative of JLS(θ) and set it to zero, just as in elementary calculus.
$$\nabla_\theta J_{LS} = \left(\frac{\partial J_{LS}}{\partial \theta_1}, \ldots, \frac{\partial J_{LS}}{\partial \theta_b}\right)^{T} = \Phi^{T}\Phi\theta - \Phi^{T}y = 0$$
Then
$$\hat{\theta}_{LS} = (\Phi^{T}\Phi)^{-1}\Phi^{T}y = \Phi^{\dagger}y$$

where $\Phi^{\dagger}$ is the generalized inverse (pseudoinverse) of $\Phi$.
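
Continuing the sketch above, the closed-form solution can be computed with the pseudoinverse (np.linalg.pinv) or, equivalently, np.linalg.lstsq; the synthetic data below is only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)   # noisy targets (synthetic)

Phi = np.vander(x, N=5, increasing=True)            # design matrix from the sketch above
theta_hat = np.linalg.pinv(Phi) @ y                 # theta_hat = Phi^dagger y
# np.linalg.lstsq(Phi, y, rcond=None)[0] gives the same solution.
```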
You can also apply weights to the training set:
$$\min_\theta \frac{1}{2}\sum_{i=1}^{n} w_i \bigl(f_\theta(x_i) - y_i\bigr)^2$$
Then
$$\hat{\theta}_{LS} = (\Phi^{T}W\Phi)^{\dagger}\Phi^{T}Wy$$
where $W = \mathrm{diag}(w_1, \ldots, w_n)$.
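
A minimal sketch of the weighted variant, assuming the weights are supplied as a vector w collected into the diagonal matrix W; the helper name fit_weighted_ls is only illustrative.

```python
import numpy as np

def fit_weighted_ls(Phi, y, w):
    """Weighted least squares: theta_hat = (Phi^T W Phi)^dagger Phi^T W y."""
    W = np.diag(w)                                   # W = diag(w_1, ..., w_n)
    return np.linalg.pinv(Phi.T @ W @ Phi) @ Phi.T @ W @ y
```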

If we take the kernel model as an example, i.e.

$$f_\theta(x) = \sum_{j=1}^{n} \theta_j K(x, x_j)$$
which can also be regarded as a kind of linear model, since it is linear in the parameters θ. For simplicity, we just show the corresponding design matrix
$$K = \begin{pmatrix} K(x_1, x_1) & \cdots & K(x_1, x_n) \\ \vdots & \ddots & \vdots \\ K(x_n, x_1) & \cdots & K(x_n, x_n) \end{pmatrix}$$
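
A short sketch of kernel least squares; the Gaussian kernel K(x, x′) = exp(−(x − x′)² / (2h²)) and the bandwidth h are assumptions made for the example, since the text leaves the kernel unspecified.

```python
import numpy as np

def gaussian_kernel(a, b, h=1.0):
    """K(x, x') = exp(-(x - x')^2 / (2 h^2)); the kernel choice is an assumption."""
    return np.exp(-(a - b) ** 2 / (2.0 * h ** 2))

def fit_kernel_ls(x, y, h=1.0):
    """Build the n x n kernel design matrix K and solve for theta in the LS sense."""
    K = gaussian_kernel(x[:, None], x[None, :], h)   # K[i, j] = K(x_i, x_j)
    return np.linalg.pinv(K) @ y

def predict_kernel_ls(x_new, x_train, theta_hat, h=1.0):
    """f_theta(x) = sum_j theta_j K(x, x_j)."""
    return gaussian_kernel(x_new[:, None], x_train[None, :], h) @ theta_hat
```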

Note that the least squares algorithm has the property of asymptotic unbiasedness: when the noise in y has zero expectation, its effect averages out and the estimate is unbiased,

$$E[\hat{\theta}_{LS}] = \theta.$$
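
One way to see this empirically is a small Monte Carlo check: average the estimates over many datasets generated with zero-mean noise and compare with the true parameter vector. The sketch below assumes a quadratic polynomial basis and Gaussian noise purely for illustration.

```python
import numpy as np

# Monte Carlo sketch: with zero-mean noise, the average of theta_hat over many
# noisy datasets approaches the true parameter vector (illustration only).
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -2.0, 0.5])
x = np.linspace(-3, 3, 100)
Phi = np.vander(x, N=3, increasing=True)             # basis: 1, x, x^2

estimates = []
for _ in range(2000):
    y = Phi @ theta_true + 0.3 * rng.standard_normal(x.size)   # zero-mean noise
    estimates.append(np.linalg.pinv(Phi) @ y)

print(np.mean(estimates, axis=0))   # close to theta_true
```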
