Maximum-Likelihood Expectation-Maximization (ML-EM)


I. Notations

$X = \{x_1, x_2, \ldots, x_N\}$: i.i.d. observed variables

$Z = \{z_1, z_2, \ldots, z_N\}$: latent variables

$\Theta^{(t)}$: the estimate of the parameters at iteration $t$

$\ell(\Theta)$: the marginal log-likelihood $\log p(X \mid \Theta)$

II. Derivations

1. Maximum-Likelihood:

$$\hat{\Theta} = \arg\max_{\Theta} \log p(X \mid \Theta) = \arg\max_{\Theta} \sum_{i=1}^{N} \log p(x_i \mid \Theta) = \arg\max_{\Theta} \sum_{i=1}^{N} \log \sum_{k=1}^{K} P(x_i, z = k \mid \Theta)$$

which is hard to optimize with a gradient method, because the sum over $k$ sits inside the logarithm and the objective does not decompose term by term.

$$\ell(\Theta) = \log p(X \mid \Theta) = \sum_{i=1}^{N} \log \sum_{k=1}^{K} P(x_i, z = k \mid \Theta) = \sum_{i=1}^{N} \log \sum_{k=1}^{K} q(z = k \mid x_i, \Theta) \, \frac{P(x_i, z = k \mid \Theta)}{q(z = k \mid x_i, \Theta)}$$

$$\ge \sum_{i=1}^{N} \sum_{k=1}^{K} q(z = k \mid x_i, \Theta) \log \frac{P(x_i, z = k \mid \Theta)}{q(z = k \mid x_i, \Theta)} \triangleq Q(q, \Theta)$$

where $q(z \mid x, \Theta)$ is an arbitrary density over $Z$, and the inequality is given by Jensen's inequality, i.e., $\mathbb{E}[f(x)] \ge f(\mathbb{E}[x])$ for a convex function and $\mathbb{E}[f(x)] \le f(\mathbb{E}[x])$ for a concave function. Here $f(x) = \log(x)$ is a concave function.
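
A quick numerical sanity check of Jensen's inequality for the concave $\log$, as a minimal Python sketch (the points and weights below are arbitrary illustrative values, not from the derivation):

```python
import numpy as np

# Check E[log x] <= log E[x] for a discrete distribution (Jensen, concave log).
x = np.array([0.5, 1.0, 2.0, 4.0])   # arbitrary positive points
w = np.array([0.1, 0.2, 0.3, 0.4])   # arbitrary probability weights, sum to 1
lhs = np.sum(w * np.log(x))          # E[log x] ~= 0.69
rhs = np.log(np.sum(w * x))          # log E[x] ~= 0.90
assert lhs <= rhs                    # Jensen's inequality holds
```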

2. Expectation-Maximization:

Thus we have a lower bound on the target function $\ell(\Theta)$. Instead of maximizing $\ell(\Theta)$ directly, EM maximizes the lower bound $Q(q, \Theta)$ via coordinate ascent:

$$\text{E-step:}\quad q^{(t+1)} = \arg\max_{q} Q(q, \Theta^{(t)})$$

$$\text{M-step:}\quad \Theta^{(t+1)} = \arg\max_{\Theta} Q(q^{(t+1)}, \Theta)$$
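
As a sketch, this coordinate-ascent loop can be written generically in Python; the function and argument names below (`em`, `e_step`, `m_step`) are this sketch's own conventions, with the model-specific steps passed in as callables:

```python
from typing import Any, Callable

def em(X: Any, theta: Any,
       e_step: Callable[[Any, Any], Any],
       m_step: Callable[[Any, Any], Any],
       n_iters: int = 100) -> Any:
    """Generic EM via coordinate ascent on the lower bound Q(q, theta)."""
    for _ in range(n_iters):
        q = e_step(X, theta)    # E-step: q^(t+1) = argmax_q Q(q, theta^(t))
        theta = m_step(X, q)    # M-step: theta^(t+1) = argmax_theta Q(q^(t+1), theta)
    return theta
```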

E-Step: compute $q^{(t+1)} = \arg\max_{q} Q(q, \Theta^{(t)})$ subject to the constraint $\sum_{k=1}^{K} q(z = k \mid x, \Theta) = 1$ (so that $q$ is a valid density over $Z$). Introducing the Lagrange multiplier $\lambda$, we define

$$G(q) = \lambda\Big(1 - \sum_{k=1}^{K} q(z = k \mid x, \Theta)\Big) + \sum_{k=1}^{K} q(z = k \mid x, \Theta) \log P(x, z = k \mid \Theta) - \sum_{k=1}^{K} q(z = k \mid x, \Theta) \log q(z = k \mid x, \Theta)$$

$$\frac{\partial G(q)}{\partial q(z = k \mid x, \Theta)} = -\lambda + \log P(x, z = k \mid \Theta) - \log q(z = k \mid x, \Theta) - 1 = 0$$

$$q(z = k \mid x, \Theta) \propto P(x, z = k \mid \Theta) \;\Rightarrow\; q(z = k \mid x, \Theta) = \frac{P(x, z = k \mid \Theta)}{\sum_{k'=1}^{K} P(x, z = k' \mid \Theta)} = P(z = k \mid x, \Theta)$$

thus, $q = P(z \mid x, \Theta)$ gives the tightest lower bound on $\ell(\Theta)$.
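
In fact, for this choice of $q$ the bound holds with equality: substituting $q = P(z \mid x, \Theta)$ into $Q$ and using $P(x_i, z = k \mid \Theta) / P(z = k \mid x_i, \Theta) = P(x_i \mid \Theta)$ gives

$$Q\big(P(z \mid x, \Theta), \Theta\big) = \sum_{i=1}^{N} \sum_{k=1}^{K} P(z = k \mid x_i, \Theta) \log P(x_i \mid \Theta) = \sum_{i=1}^{N} \log P(x_i \mid \Theta) = \ell(\Theta)$$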

M-Step: update the parameters $\Theta$ via

$$\Theta^{(t+1)} = \arg\max_{\Theta} \sum_{i=1}^{N} \sum_{k=1}^{K} P(z = k \mid x_i, \Theta^{(t)}) \log \frac{P(x_i, z = k \mid \Theta)}{P(z = k \mid x_i, \Theta^{(t)})}$$

$$= \arg\max_{\Theta} \sum_{i=1}^{N} \sum_{k=1}^{K} P(z = k \mid x_i, \Theta^{(t)}) \log P(x_i, z = k \mid \Theta)$$

where the denominator $P(z = k \mid x_i, \Theta^{(t)})$ is dropped because it does not depend on $\Theta$.

III. Applications to Gaussian Mixture Models (GMMs)

For general mixture models, we have

$$P(x \mid \Theta) = \sum_{k=1}^{K} P(x, z = k \mid \Theta) = \sum_{k=1}^{K} P(z = k \mid \Theta) \, P(x \mid z = k, \Theta)$$

For Gaussian Mixture Models (GMMs), we have
$$P(x \mid \Theta) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k) \quad \text{with} \quad \sum_{k=1}^{K} \pi_k = 1$$
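
Here $\mathcal{N}(x \mid \mu_k, \Sigma_k)$ denotes the multivariate Gaussian density, with $D$ the dimension of $x$:

$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2} \lvert \Sigma \rvert^{1/2}} \exp\Big(-\tfrac{1}{2}(x - \mu)^{\top} \Sigma^{-1} (x - \mu)\Big)$$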

E-Step for GMMs:

Define $q_{i,k} \triangleq P(z = k \mid x_i, \Theta)$, then

$$q_{i,k} = \frac{P(z = k, x_i \mid \Theta)}{\sum_{k'=1}^{K} P(z = k', x_i \mid \Theta)} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{k'=1}^{K} \pi_{k'} \, \mathcal{N}(x_i \mid \mu_{k'}, \Sigma_{k'})}$$
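
A minimal NumPy/SciPy sketch of this E-step (the function name and array layout are this sketch's own assumptions, not from the text):

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, pi, mu, Sigma):
    """Responsibilities q[i, k] = P(z = k | x_i, Theta).

    X: (N, D) data; pi: (K,) mixing weights; mu: (K, D) means;
    Sigma: (K, D, D) covariances.
    """
    N, K = X.shape[0], len(pi)
    q = np.empty((N, K))
    for k in range(K):
        # Joint P(x_i, z = k | Theta) = pi_k * N(x_i | mu_k, Sigma_k)
        q[:, k] = pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
    q /= q.sum(axis=1, keepdims=True)  # normalize over k -> posterior
    return q
```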

M-Step for GMMs:
Update the parameters by maximizing the expected complete-data log-likelihood below:

$$\Theta^{(t+1)} = \arg\max_{\Theta} \sum_{i=1}^{N} \sum_{k=1}^{K} q_{i,k} \log P(x_i, z = k \mid \Theta) = \arg\max_{\Theta} \sum_{i=1}^{N} \sum_{k=1}^{K} q_{i,k} \log\big(\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)\big)$$

subject to $\sum_{k=1}^{K} \pi_k = 1$. Introducing the Lagrange multiplier $\lambda$ into the objective function, we thus have

$$G(\Theta, \lambda) = \lambda\Big(1 - \sum_{k=1}^{K} \pi_k\Big) + \sum_{i=1}^{N} \sum_{k=1}^{K} q_{i,k} \log \pi_k + \sum_{i=1}^{N} \sum_{k=1}^{K} q_{i,k} \log \mathcal{N}(x_i \mid \mu_k, \Sigma_k)$$

$$\frac{\partial G(\Theta, \lambda)}{\partial \pi_k} = 0 \;\text{and}\; \sum_{k=1}^{K} \pi_k = 1 \;\Rightarrow\; \pi_k = \frac{\sum_{i=1}^{N} q_{i,k}}{\sum_{k'=1}^{K} \sum_{i=1}^{N} q_{i,k'}} = \frac{\sum_{i=1}^{N} q_{i,k}}{N}$$

$$\frac{\partial G(\Theta, \lambda)}{\partial \mu_k} = 0 \;\Rightarrow\; \mu_k = \frac{\sum_{i=1}^{N} q_{i,k} \, x_i}{\sum_{i=1}^{N} q_{i,k}}$$

$$\frac{\partial G(\Theta, \lambda)}{\partial \Sigma_k} = 0 \;\Rightarrow\; \Sigma_k = \frac{\sum_{i=1}^{N} q_{i,k} (x_i - \mu_k)(x_i - \mu_k)^{\top}}{\sum_{i=1}^{N} q_{i,k}}$$
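
These three closed-form updates translate directly into code; a minimal sketch matching the `e_step` above (names and array layout are again this sketch's choices):

```python
def m_step(X, q):
    """Closed-form GMM M-step from the updates derived above."""
    N, D = X.shape
    Nk = q.sum(axis=0)                     # effective counts sum_i q_{i,k}, shape (K,)
    pi = Nk / N                            # pi_k = sum_i q_{i,k} / N
    mu = (q.T @ X) / Nk[:, None]           # mu_k = sum_i q_{i,k} x_i / sum_i q_{i,k}
    Sigma = np.empty((len(Nk), D, D))
    for k in range(len(Nk)):
        d = X - mu[k]                      # centered data, shape (N, D)
        Sigma[k] = (q[:, k, None] * d).T @ d / Nk[k]   # weighted covariance
    return pi, mu, Sigma
```

Alternating `q = e_step(X, pi, mu, Sigma)` and `pi, mu, Sigma = m_step(X, q)` for a fixed number of iterations (or until the log-likelihood stops improving) yields the EM fit; in practice the responsibilities are usually computed in log space to avoid numerical underflow.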
