Logistic Regression

  • why is it named logistic regression? (because of the logistic function)
  • what is the model?
  • how do we solve the minimization/maximization problem?

2-class problem

$$P(y=1|x,\theta) = f(x) = \frac{1}{1 + e^{-\theta^T x}}$$
This logistic (sigmoid) function makes the probability lie in (0, 1).
$$P(y=0|x,\theta) = 1 - f(x)$$

// also lies in (0, 1)

In all,

$$P(y|x,\theta) = f(x)^{y}\,\bigl(1 - f(x)\bigr)^{1-y}$$

Assuming we have already learned the optimal $\theta$, classification is carried out by computing $P(y=1|x,\theta)$: if it is $> 0.5$, then the odds $\frac{p}{1-p} > 1$ and we predict class 1; if it is $< 0.5$, then $\frac{p}{1-p} < 1$ and we predict class 0.
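A minimal sketch of this decision rule (the function names and threshold parameter are illustrative, not from the post):

```python
import numpy as np

def sigmoid(z):
    # logistic (sigmoid) function: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, theta, threshold=0.5):
    # P(y=1|x, theta) = sigmoid(theta^T x); predicting class 1 when the
    # probability exceeds 0.5 is the same as the odds p/(1-p) exceeding 1
    p = sigmoid(theta @ x)
    return 1 if p > threshold else 0
```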

The learning objective is to maximize the likelihood of the whole training set:

$$L(\theta) = \prod_{i=1}^{n} P(y_i|x_i,\theta)$$

$$\ell(\theta) = \log L(\theta)$$

Gradient ascent on $\ell(\theta)$ (equivalently, gradient descent on $-\ell(\theta)$) can then be applied to solve the MLE problem.
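Since $f$ is the sigmoid, the gradient has the well-known form $\nabla_\theta \ell(\theta) = \sum_{i=1}^{n} (y_i - f(x_i))\,x_i$, which suggests the following gradient-ascent loop (a minimal sketch; the function name, learning rate, and iteration count are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iters=1000):
    """Maximize l(theta) = sum_i log P(y_i | x_i, theta) by gradient ascent.

    X: (n, d) design matrix; y: (n,) array of 0/1 labels.
    """
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iters):
        p = sigmoid(X @ theta)      # P(y=1 | x_i, theta) for every sample
        grad = X.T @ (y - p)        # gradient of the log-likelihood
        theta += lr * grad / n      # ascend to increase l(theta)
    return theta
```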

Multi-class problem

It is of vital importance to constrain the probabilities of the different classes so that they sum to one:

$$\sum_{i=1}^{m} P(y^{(i)}=1 \mid x, w) = 1$$

Here $m$ is the number of classes and $w \in \mathbb{R}^{d \times m}$, where $d$ is the dimension of the feature vector $x$.

$$P(y^{(i)}=1 \mid x, w) = \frac{\exp(w^{(i)T} x)}{\sum_{j=1}^{m} \exp(w^{(j)T} x)} \quad \text{for } i \in \{1,\dots,m\}$$
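A minimal sketch of this softmax mapping, showing that the constraint holds by construction (the names and the max-shift for numerical stability are my own additions):

```python
import numpy as np

def softmax_probs(x, W):
    """P(y^(i)=1 | x, W) for all m classes.

    x: (d,) feature vector; W: (d, m) weight matrix, one column per class.
    """
    scores = W.T @ x          # w^(i)T x for each class i
    scores -= scores.max()    # shift by the max score for numerical stability
    e = np.exp(scores)
    return e / e.sum()        # normalization makes the m probabilities sum to 1
```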

The cost function is derived in the SMLR section below.

SMLR - Sparse Multinomial Logistic Regression

With $m$ classes in total and a $d$-dimensional input feature vector, the weight vector for one of the classes need not be estimated. Without loss of generality, we thus set $w^{(m)} = 0$, and the only parameters to be learned are the weight vectors $w^{(i)}$ for $i \in \{1,\dots,m-1\}$. In what follows, $w$ denotes the $(d(m-1))$-dimensional vector of parameters to be learned.

For ordinary softmax regression (also known as multinomial logistic regression, MLR), the probability that $x$ belongs to class $i$ is written as:

$$P(y^{(i)}=1 \mid x, w) = \frac{\exp(w^{(i)T} x)}{\sum_{j=1}^{m} \exp(w^{(j)T} x)} \quad \text{for } i \in \{1,\dots,m\}$$

$$\ell(w) = \log \prod_{j=1}^{n} P(y_j \mid x_j, w),$$
where $n$ is the total number of samples, so
$$\ell(w) = \sum_{j=1}^{n} \log P(y_j \mid x_j, w)$$

$$\ell(w) = \sum_{j=1}^{n} \sum_{i=1}^{m} \mathbb{1}\{y_j = i\} \log \frac{\exp(w^{(i)T} x_j)}{\sum_{k=1}^{m} \exp(w^{(k)T} x_j)},$$
where $n$ is the number of samples and $m$ is the number of classes (the normalization index is written as $k$ to avoid clashing with the outer sum over samples $j$). Equivalently, writing $y_j^{(i)}$ for the one-hot indicator that sample $j$ belongs to class $i$,
$$\ell(w) = \sum_{j=1}^{n} \left\{ \sum_{i=1}^{m} y_j^{(i)}\, w^{(i)T} x_j - \log \sum_{i=1}^{m} \exp\bigl(w^{(i)T} x_j\bigr) \right\}$$
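A minimal sketch of evaluating this log-likelihood (the helper name and the max-shift stabilization of the log-sum-exp are assumptions, not from the post):

```python
import numpy as np

def softmax_log_likelihood(X, y, W):
    """l(W) = sum_j log P(y_j | x_j, W) for the softmax model.

    X: (n, d) samples; y: (n,) integer class labels in {0, ..., m-1};
    W: (d, m) weight matrix.
    """
    scores = X @ W                               # (n, m) matrix of w^(i)T x_j
    shift = scores.max(axis=1, keepdims=True)    # stabilize the log-sum-exp
    log_norm = shift.ravel() + np.log(np.exp(scores - shift).sum(axis=1))
    return np.sum(scores[np.arange(len(y)), y] - log_norm)
```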

On top of this, SMLR adds a sparsity constraint to the cost function:

$$\hat{w}_{\text{MAP}} = \arg\max_w \,\{\ell(w) + \log p(w)\}$$

In SMLR, $p(w) \propto \exp(-\lambda \|w\|_1)$ (a Laplacian prior), so the MAP estimate maximizes $\ell(w) - \lambda \|w\|_1$.
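The SMLR paper derives its own fast algorithm for this objective; the following is only a minimal proximal-gradient sketch of the same MAP problem, not the paper's method (all names are illustrative):

```python
import numpy as np

def soft_threshold(W, t):
    # proximal operator of t * ||W||_1: shrinks every weight toward zero and
    # sets small ones exactly to zero -- this is what induces sparsity
    return np.sign(W) * np.maximum(np.abs(W) - t, 0.0)

def smlr_step(X, y, W, lam, lr):
    """One proximal-gradient ascent step on l(W) - lam * ||W||_1.

    X: (n, d) samples; y: (n,) integer labels; W: (d, m) weights.
    """
    scores = X @ W
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    P = np.exp(scores)
    P /= P.sum(axis=1, keepdims=True)               # P(y=i | x_j, W)
    Y = np.eye(W.shape[1])[y]                       # one-hot labels
    grad = X.T @ (Y - P)                            # gradient of l(W)
    return soft_threshold(W + lr * grad, lr * lam)  # ascend, then shrink
```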
