Logistic Regression

  • why is it named logistic regression? (because of the logistic function)
  • what is the model?
  • how do we solve the minimization/maximization problem?

2-class problem

$$P(y=1|x,\theta) = f(x) = \frac{1}{1 + e^{-\theta^T x}}$$
This logistic (sigmoid) function makes the probability lie in (0, 1).
$$P(y=0|x,\theta) = 1 - f(x)$$

// also lies in (0, 1)

In all,

$$P(y|x,\theta) = f(x)^{y}\,\bigl(1 - f(x)\bigr)^{1-y}$$

Assuming we have already learned the optimal $\theta$, classification is carried out by computing $P(y=1|x,\theta)$: if it is $> 0.5$, then the odds $\frac{p}{1-p} > 1$ and we predict class 1; if it is $< 0.5$, then $\frac{p}{1-p} < 1$ and we predict class 0.
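A minimal sketch of this decision rule (the function names and threshold parameter are illustrative, not from the post):

```python
import numpy as np

def sigmoid(z):
    # logistic (sigmoid) function: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, theta, threshold=0.5):
    # P(y=1|x, theta) = sigmoid(theta^T x); predicting class 1 when the
    # probability exceeds 0.5 is the same as the odds p/(1-p) exceeding 1
    p = sigmoid(theta @ x)
    return 1 if p > threshold else 0
```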

The learning objective is to maximize the likelihood of the whole training set:

$$L(\theta) = \prod_{i=1}^{n} P(y_i|x_i,\theta)$$

$$\ell(\theta) = \log L(\theta)$$

Gradient ascent on $\ell(\theta)$ (equivalently, gradient descent on $-\ell(\theta)$) can then be applied to solve the MLE problem.
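Since $f$ is the sigmoid, the gradient has the well-known form $\nabla_\theta \ell(\theta) = \sum_{i=1}^{n} (y_i - f(x_i))\,x_i$, which suggests the following gradient-ascent loop (a minimal sketch; the function name, learning rate, and iteration count are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iters=1000):
    """Maximize l(theta) = sum_i log P(y_i | x_i, theta) by gradient ascent.

    X: (n, d) design matrix; y: (n,) array of 0/1 labels.
    """
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iters):
        p = sigmoid(X @ theta)      # P(y=1 | x_i, theta) for every sample
        grad = X.T @ (y - p)        # gradient of the log-likelihood
        theta += lr * grad / n      # ascend to increase l(theta)
    return theta
```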

Multi-class problem

It is of vital importance to constrain the probabilities of the different classes so that they sum to one:

$$\sum_{i=1}^{m} P(y^{(i)}=1 \mid x, w) = 1$$

Here $m$ is the number of classes and $w \in \mathbb{R}^{d \times m}$, where $d$ is the dimension of the feature vector $x$.

$$P(y^{(i)}=1 \mid x, w) = \frac{\exp(w^{(i)T} x)}{\sum_{j=1}^{m} \exp(w^{(j)T} x)} \quad \text{for } i \in \{1,\dots,m\}$$
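A minimal sketch of this softmax mapping, showing that the constraint holds by construction (the names and the max-shift for numerical stability are my own additions):

```python
import numpy as np

def softmax_probs(x, W):
    """P(y^(i)=1 | x, W) for all m classes.

    x: (d,) feature vector; W: (d, m) weight matrix, one column per class.
    """
    scores = W.T @ x          # w^(i)T x for each class i
    scores -= scores.max()    # shift by the max score for numerical stability
    e = np.exp(scores)
    return e / e.sum()        # normalization makes the m probabilities sum to 1
```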

The cost function is derived in the SMLR section below.

SMLR - Sparse Multinomial Logistic Regression

With $m$ classes in total and a $d$-dimensional input feature vector, the weight vector for one of the classes need not be estimated. Without loss of generality, we thus set $w^{(m)} = 0$, and the only parameters to be learned are the weight vectors $w^{(i)}$ for $i \in \{1,\dots,m-1\}$. In what follows, $w$ denotes the $(d(m-1))$-dimensional vector of parameters to be learned.

For ordinary softmax regression (also known as multinomial logistic regression, MLR), the probability that $x$ belongs to class $i$ is written as:

$$P(y^{(i)}=1 \mid x, w) = \frac{\exp(w^{(i)T} x)}{\sum_{j=1}^{m} \exp(w^{(j)T} x)} \quad \text{for } i \in \{1,\dots,m\}$$

$$\ell(w) = \log \prod_{j=1}^{n} P(y_j \mid x_j, w),$$
where $n$ is the total number of samples, so
$$\ell(w) = \sum_{j=1}^{n} \log P(y_j \mid x_j, w)$$

$$\ell(w) = \sum_{j=1}^{n} \sum_{i=1}^{m} \mathbb{1}\{y_j = i\} \log \frac{\exp(w^{(i)T} x_j)}{\sum_{k=1}^{m} \exp(w^{(k)T} x_j)},$$
where $n$ is the number of samples and $m$ is the number of classes (the normalization index is written as $k$ to avoid clashing with the outer sum over samples $j$). Equivalently, writing $y_j^{(i)}$ for the one-hot indicator that sample $j$ belongs to class $i$,
$$\ell(w) = \sum_{j=1}^{n} \left\{ \sum_{i=1}^{m} y_j^{(i)}\, w^{(i)T} x_j - \log \sum_{i=1}^{m} \exp\bigl(w^{(i)T} x_j\bigr) \right\}$$
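A minimal sketch of evaluating this log-likelihood (the helper name and the max-shift stabilization of the log-sum-exp are assumptions, not from the post):

```python
import numpy as np

def softmax_log_likelihood(X, y, W):
    """l(W) = sum_j log P(y_j | x_j, W) for the softmax model.

    X: (n, d) samples; y: (n,) integer class labels in {0, ..., m-1};
    W: (d, m) weight matrix.
    """
    scores = X @ W                               # (n, m) matrix of w^(i)T x_j
    shift = scores.max(axis=1, keepdims=True)    # stabilize the log-sum-exp
    log_norm = shift.ravel() + np.log(np.exp(scores - shift).sum(axis=1))
    return np.sum(scores[np.arange(len(y)), y] - log_norm)
```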

On top of this, SMLR adds a sparsity constraint to the cost function:

$$\hat{w}_{\text{MAP}} = \arg\max_w \,\{\ell(w) + \log p(w)\}$$

In SMLR, $p(w) \propto \exp(-\lambda \|w\|_1)$ (a Laplacian prior), so the MAP estimate maximizes $\ell(w) - \lambda \|w\|_1$.
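The SMLR paper derives its own fast algorithm for this objective; the following is only a minimal proximal-gradient sketch of the same MAP problem, not the paper's method (all names are illustrative):

```python
import numpy as np

def soft_threshold(W, t):
    # proximal operator of t * ||W||_1: shrinks every weight toward zero and
    # sets small ones exactly to zero -- this is what induces sparsity
    return np.sign(W) * np.maximum(np.abs(W) - t, 0.0)

def smlr_step(X, y, W, lam, lr):
    """One proximal-gradient ascent step on l(W) - lam * ||W||_1.

    X: (n, d) samples; y: (n,) integer labels; W: (d, m) weights.
    """
    scores = X @ W
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    P = np.exp(scores)
    P /= P.sum(axis=1, keepdims=True)               # P(y=i | x_j, W)
    Y = np.eye(W.shape[1])[y]                       # one-hot labels
    grad = X.T @ (Y - P)                            # gradient of l(W)
    return soft_threshold(W + lr * grad, lr * lam)  # ascend, then shrink
```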
