# Maximum likelihood in multivariate Gaussian distribution (1)


The multivariate Gaussian distribution is a well-known probability distribution whose density function is $P(x)=\frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)$. As we all know, given $m$ samples $x^{(1)},x^{(2)},\dots,x^{(m)}$, maximum likelihood estimation determines the values of $\mu$ and $\Sigma$ through the following equations:

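$$
\begin{aligned}
\mu &= \frac{x^{(1)}+x^{(2)}+\cdots+x^{(m)}}{m} \\
\Sigma &= \frac{1}{m}\sum_{i=1}^{m}(x^{(i)}-\mu)(x^{(i)}-\mu)^T
\end{aligned}
$$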
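As a concrete illustration, the two estimates (the sample mean, and the average of the outer products of the centered samples) can be computed in a few lines of NumPy. This is only a minimal sketch: the function name `gaussian_mle`, the random seed, and the "true" parameters used to generate the data are made up for the example.

```python
import numpy as np

def gaussian_mle(X):
    """Return the maximum-likelihood mean and covariance of the rows of X.

    X has shape (m, n): m samples, each an n-dimensional vector.
    """
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    mu = X.sum(axis=0) / m               # (x^(1) + ... + x^(m)) / m
    centered = X - mu                    # row i holds x^(i) - mu
    sigma = centered.T @ centered / m    # (1/m) * sum of outer products
    return mu, sigma

# Illustrative data: 500 draws from a 2-D Gaussian with made-up parameters.
rng = np.random.default_rng(0)
true_mu = np.array([1.0, -2.0])
true_sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
X = rng.multivariate_normal(true_mu, true_sigma, size=500)

mu_hat, sigma_hat = gaussian_mle(X)
print(mu_hat)
print(sigma_hat)
```

Note the division by $m$ rather than $m-1$: the maximum likelihood estimate of $\Sigma$ is the biased one, which corresponds to `np.cov(X, rowvar=False, bias=True)`.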

However, how are these formulas derived?


To understand the points I make in this article, you need a basic grasp of calculus, linear algebra, and statistics. You should be familiar with differentiation, matrices, and maximum likelihood estimation.


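We start by writing out the log-likelihood function of $\mu$ and $\Sigma$. We denote the likelihood function of the $m$ independent samples as $L(\mu,\Sigma)$, and

$$
L(\mu,\Sigma)=\prod_{i=1}^{m}P(x^{(i)})=\prod_{i=1}^{m}\frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x^{(i)}-\mu)^T\Sigma^{-1}(x^{(i)}-\mu)\right)
$$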


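So the log-likelihood function $l(\mu,\Sigma)=\log L(\mu,\Sigma)$ is

$$
l(\mu,\Sigma)=-\frac{mn}{2}\log(2\pi)-\frac{m}{2}\log|\Sigma|-\frac{1}{2}\sum_{i=1}^{m}(x^{(i)}-\mu)^T\Sigma^{-1}(x^{(i)}-\mu)
$$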
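To make the log-likelihood tangible, here is a minimal NumPy sketch that evaluates it as a constant term, a log-determinant term, and a sum of quadratic forms over the samples. The function name `log_likelihood` and the tiny test input are assumptions made for illustration, not part of the derivation.

```python
import numpy as np

def log_likelihood(X, mu, sigma):
    """Gaussian log-likelihood l(mu, Sigma) of the rows of X (shape (m, n))."""
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    diff = X - mu                                # row i holds x^(i) - mu
    # Solve Sigma * z = (x^(i) - mu) instead of forming the inverse explicitly.
    solved = np.linalg.solve(sigma, diff.T).T    # row i holds Sigma^{-1} (x^(i) - mu)
    quad = np.sum(diff * solved)                 # sum of the m quadratic forms
    _, logdet = np.linalg.slogdet(sigma)         # log|Sigma|, computed stably
    return -0.5 * m * n * np.log(2 * np.pi) - 0.5 * m * logdet - 0.5 * quad

# One sample sitting exactly at the mean of a standard 2-D Gaussian:
# the quadratic and log-determinant terms vanish, leaving -log(2*pi).
value = log_likelihood(np.zeros((1, 2)), np.zeros(2), np.eye(2))
print(value)
```

Using `np.linalg.solve` and `np.linalg.slogdet` rather than an explicit inverse and determinant is a numerical convenience only; mathematically the result is the same expression.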


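Only the last term of $l$ depends on $\mu$, so the derivative of $l$ with respect to $\mu$ is given by

$$
\frac{\partial l}{\partial\mu}=-\frac{1}{2}\frac{\partial}{\partial\mu}\sum_{i=1}^{m}(x^{(i)}-\mu)^T\Sigma^{-1}(x^{(i)}-\mu)
$$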


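First, we expand the quadratic form and get

$$
\frac{\partial l}{\partial\mu}=-\frac{1}{2}\frac{\partial}{\partial\mu}\sum_{i=1}^{m}\left(\mu^T\Sigma^{-1}\mu-\mu^T\Sigma^{-1}x^{(i)}-(x^{(i)})^T\Sigma^{-1}\mu+(x^{(i)})^T\Sigma^{-1}x^{(i)}\right)
$$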


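Then, we move the partial derivative inside the summation:

$$
\frac{\partial l}{\partial\mu}=-\frac{1}{2}\sum_{i=1}^{m}\frac{\partial}{\partial\mu}\left(\mu^T\Sigma^{-1}\mu-\mu^T\Sigma^{-1}x^{(i)}-(x^{(i)})^T\Sigma^{-1}\mu+(x^{(i)})^T\Sigma^{-1}x^{(i)}\right)
$$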


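Because the last term $(x^{(i)})^T\Sigma^{-1}x^{(i)}$ does not depend on $\mu$, its derivative is zero and we are left with

$$
\frac{\partial l}{\partial\mu}=-\frac{1}{2}\sum_{i=1}^{m}\frac{\partial}{\partial\mu}\left(\mu^T\Sigma^{-1}\mu-\mu^T\Sigma^{-1}x^{(i)}-(x^{(i)})^T\Sigma^{-1}\mu\right)
$$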


We now analyse each term in the formula. The first point I want to make is that $(x^{(i)})^T\Sigma^{-1}\mu=\mu^T\Sigma^{-1}x^{(i)}$. Why? Let's consider a general form of this identity, which I will prove in the next few paragraphs.

Theorem: Let $a$ and $b$ be $n$-dimensional vectors and let $X$ be an $n$-by-$n$ symmetric matrix. Then the following identity holds:

$$
a^TXb=b^TXa
$$


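Proof: Expand the terms on both sides of this identity, and you will get

$$
a^TXb=\sum_{i=1}^{n}\sum_{j=1}^{n}a_iX_{ij}b_j,\qquad b^TXa=\sum_{i=1}^{n}\sum_{j=1}^{n}b_iX_{ij}a_j=\sum_{i=1}^{n}\sum_{j=1}^{n}a_iX_{ji}b_j
$$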


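As a result, $a^TXb=b^TXa$ holds for all $a$ and $b$ if and only if $X_{ij}=X_{ji}$ for all $i,j\in\{1,2,\dots,n\}$, that is, if and only if $X$ is symmetric.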
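The identity is easy to check numerically. The sketch below uses random vectors and a random symmetric matrix (all values, names, and the seed are arbitrary choices for illustration), and then shows how the two sides drift apart for a matrix that is deliberately not symmetric.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
a = rng.normal(size=n)
b = rng.normal(size=n)

A = rng.normal(size=(n, n))
X_sym = A + A.T                    # symmetric by construction

lhs = a @ X_sym @ b                # a^T X b
rhs = b @ X_sym @ a                # b^T X a
gap_sym = abs(lhs - rhs)           # zero up to floating-point rounding

# A deliberately non-symmetric matrix: a single 1 above the diagonal.
X_asym = np.zeros((n, n))
X_asym[0, 1] = 1.0
# Here a^T X b = a_0 * b_1 while b^T X a = b_0 * a_1, so the gap is
# |a_0 * b_1 - b_0 * a_1|, which is generally non-zero.
gap_asym = abs(a @ X_asym @ b - b @ X_asym @ a)

print(gap_sym, gap_asym)
```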

We can interpret the result in terms of inner products. Recall that the inner product of the vectors $a$ and $b$ is defined as $\langle a,b\rangle=a^Tb$, and it is obvious that $b^Ta=\langle b,a\rangle=\langle a,b\rangle=a^Tb=\sum_{i=1}^{n}a_ib_i$. In fact, we can also define the weighted inner product of $a$ and $b$ as $\langle a,b\rangle_W=\sum_{i=1}^{n}\sum_{j=1}^{n}a_iW_{ij}b_j=a^TWb$, where $W$ is a symmetric matrix called the weight matrix. (Strictly speaking, $W$ must also be positive definite for this to be an inner product, but only symmetry matters here.) The term 'weighted inner product' emphasizes that we "insert" a "weight" $W_{ij}$ into the product of every pair of entries $a_i$ and $b_j$. The inner product of two vectors $a$ and $b$ is commutative, and so is its weighted counterpart for any particular symmetric weight matrix. As a result, the identity in the previous theorem, $a^TXb=b^TXa$, is now almost obvious: it is just the commutative rule.

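We continue to analyse $\frac{\partial l}{\partial\mu}$. Since $\Sigma$ is symmetric, so is $\Sigma^{-1}$, and the theorem lets us merge the two cross terms. Now, we get

$$
\frac{\partial l}{\partial\mu}=-\frac{1}{2}\sum_{i=1}^{m}\frac{\partial}{\partial\mu}\left(\mu^T\Sigma^{-1}\mu-2\mu^T\Sigma^{-1}x^{(i)}\right)
$$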


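Calculating $\frac{\partial l}{\partial\mu}$ directly is not so easy, so we instead calculate $\frac{\partial l}{\partial\mu_j}$, the derivative of $l$ with respect to the $j$-th entry of $\mu$:

$$
\frac{\partial l}{\partial\mu_j}=-\frac{1}{2}\sum_{i=1}^{m}\frac{\partial}{\partial\mu_j}\left(\mu^T\Sigma^{-1}\mu-2\mu^T\Sigma^{-1}x^{(i)}\right)
$$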


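We denote by $e_j$ the column vector with a 1 in its $j$-th entry and 0 in all other entries. It can be shown that $\frac{\partial\mu}{\partial\mu_j}=e_j$, because only the $j$-th entry of $\mu$ depends on $\mu_j$. So, by the product rule, we get

$$
\frac{\partial l}{\partial\mu_j}=-\frac{1}{2}\sum_{i=1}^{m}\left(e_j^T\Sigma^{-1}\mu+\mu^T\Sigma^{-1}e_j-2e_j^T\Sigma^{-1}x^{(i)}\right)=\sum_{i=1}^{m}\left(e_j^T\Sigma^{-1}x^{(i)}-e_j^T\Sigma^{-1}\mu\right)
$$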


Note that we again use the theorem that $a^TXb=b^TXa$ if $X$ is symmetric, this time to write $\mu^T\Sigma^{-1}e_j=e_j^T\Sigma^{-1}\mu$.

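$e_j^T\Sigma^{-1}x^{(i)}=e_j^T(\Sigma^{-1}x^{(i)})$ is just the $j$-th entry of $\Sigma^{-1}x^{(i)}$ (which is simply an $n$-dimensional vector), so we denote it as $(\Sigma^{-1}x^{(i)})_j$ (following the indexing convention of numerical packages such as numpy). As a result, the equation becomes

$$
\frac{\partial l}{\partial\mu_j}=\sum_{i=1}^{m}\left((\Sigma^{-1}x^{(i)})_j-(\Sigma^{-1}\mu)_j\right)=\left(\sum_{i=1}^{m}\left(\Sigma^{-1}x^{(i)}-\Sigma^{-1}\mu\right)\right)_j
$$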


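That is, $\frac{\partial l}{\partial\mu_j}$ is just the $j$-th entry of $\sum_{i=1}^{m}(\Sigma^{-1}x^{(i)}-\Sigma^{-1}\mu)$. As a result, $\frac{\partial l}{\partial\mu}$ is just $\sum_{i=1}^{m}(\Sigma^{-1}x^{(i)}-\Sigma^{-1}\mu)$.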
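A finite-difference check is a quick way to gain confidence in this gradient. The sketch below compares $\sum_{i=1}^{m}(\Sigma^{-1}x^{(i)}-\Sigma^{-1}\mu)$ with central differences taken along each unit vector $e_j$, mirroring the entry-by-entry argument above. The helper names, the random data, and the step size `h` are assumptions chosen for illustration.

```python
import numpy as np

def quad_term(X, mu, sigma_inv):
    """The mu-dependent part of l: -1/2 * sum_i (x^(i)-mu)^T Sigma^{-1} (x^(i)-mu)."""
    diff = X - mu
    return -0.5 * np.einsum("ij,jk,ik->", diff, sigma_inv, diff)

def analytic_grad(X, mu, sigma_inv):
    """sum_i (Sigma^{-1} x^(i) - Sigma^{-1} mu), written as Sigma^{-1} sum_i (x^(i) - mu)."""
    return sigma_inv @ (X - mu).sum(axis=0)

def numeric_grad(X, mu, sigma_inv, h=1e-5):
    """Central differences of quad_term along each unit vector e_j."""
    grad = np.zeros_like(mu)
    for j in range(mu.size):
        e_j = np.zeros_like(mu)
        e_j[j] = 1.0
        grad[j] = (quad_term(X, mu + h * e_j, sigma_inv)
                   - quad_term(X, mu - h * e_j, sigma_inv)) / (2 * h)
    return grad

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
mu = rng.normal(size=3)
A = rng.normal(size=(3, 3))
sigma = A @ A.T + 3.0 * np.eye(3)      # symmetric positive definite by construction
sigma_inv = np.linalg.inv(sigma)

max_gap = np.max(np.abs(analytic_grad(X, mu, sigma_inv)
                        - numeric_grad(X, mu, sigma_inv)))
print(max_gap)   # expected to be tiny; the exact value depends on rounding
```

The terms of $l$ that do not involve $\mu$ are left out of `quad_term` because they do not affect the derivative.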

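Recall that we want to maximize $l$ with respect to $\mu$, so we set $\frac{\partial l}{\partial\mu}$ to 0. Thus, we get

$$
\frac{\partial l}{\partial\mu}=\sum_{i=1}^{m}\left(\Sigma^{-1}x^{(i)}-\Sigma^{-1}\mu\right)=\Sigma^{-1}\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)=0
$$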


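The matrix $\Sigma^{-1}$ is non-singular (its inverse is $\Sigma$), so we can multiply both sides by $\Sigma$ and get

$$
\sum_{i=1}^{m}\left(x^{(i)}-\mu\right)=\sum_{i=1}^{m}x^{(i)}-m\mu=0
$$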


and the expression for $\mu$ is thus just the sample mean:

$$
\mu=\frac{1}{m}\sum_{i=1}^{m}x^{(i)}
$$
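As a final sanity check, the sketch below confirms numerically that the sample mean is a stationary point of the $\mu$-dependent part of $l$ and that moving away from it only lowers the value. The covariance matrix, the data, and the size of the perturbations are arbitrary choices for illustration; the argument holds for any symmetric positive definite $\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3)) + np.array([1.0, 2.0, 3.0])

# Any symmetric positive definite Sigma will do; this one is made up.
sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.2],
                  [0.0, 0.2, 1.5]])
sigma_inv = np.linalg.inv(sigma)

def neg_half_quad(mu):
    """-1/2 * sum_i (x^(i)-mu)^T Sigma^{-1} (x^(i)-mu), the mu-dependent part of l."""
    diff = X - mu
    return -0.5 * np.sum((diff @ sigma_inv) * diff)

mu_hat = X.mean(axis=0)                                  # (1/m) * sum_i x^(i)
grad_at_mu_hat = sigma_inv @ (X - mu_hat).sum(axis=0)    # should be ~0

best = neg_half_quad(mu_hat)
perturbed = [neg_half_quad(mu_hat + 0.1 * rng.normal(size=3)) for _ in range(100)]
print(np.abs(grad_at_mu_hat).max(), max(perturbed) < best)
```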

