[LA] Centering a data set


  • Centering data set
  • Application

1. Centering data set

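If we have a data set $X \in \mathbb{R}^{n \times p}$ (each row is a sample), then the vector of column means of this data set can be expressed as

$$\bar{X} = \frac{1}{n} X^T 1_n$$

where $1_n$ is the all-ones vector of length $n$. So the centered data set is

$$X_c = X - 1_n \bar{X}^T = X - \frac{1}{n} 1_n 1_n^T X = \left(I - \frac{1}{n} 1_n 1_n^T\right) X$$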

The matrix

$$C = \left(I - \frac{1}{n} 1_n 1_n^T\right)$$

is called the centering matrix.
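The definition above can be checked numerically. The following is a minimal NumPy sketch, assuming a small random data set; the helper name `centering_matrix` is hypothetical and introduced only for illustration.

```python
import numpy as np

def centering_matrix(n: int) -> np.ndarray:
    """Return the n x n centering matrix C = I - (1/n) 1_n 1_n^T."""
    return np.eye(n) - np.ones((n, n)) / n

# Assumed toy data: 6 samples (rows), 3 features (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))

C = centering_matrix(X.shape[0])
X_c = C @ X  # left-multiplying by C subtracts each column mean
```

In practice one would usually compute `X - X.mean(axis=0)` directly, since forming the $n \times n$ matrix explicitly costs $O(n^2)$ memory; the matrix form is mainly useful for derivations.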

2. Application

A more involved application of the centering matrix is the following decomposition of the between-group covariance matrix.

Proof of Proposition 1 in
http://blog.csdn.net/comeyan/article/details/50514596

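Proof: First, we express the mean of the full data set in terms of the group means,

$$\bar{\mu} = \sum_{g=1}^{G} \frac{n_g}{N} \bar{\mu}_g = \frac{1}{N} \left(\sqrt{n_1}\bar{\mu}_1, \sqrt{n_2}\bar{\mu}_2, \ldots, \sqrt{n_G}\bar{\mu}_G\right) \left(\sqrt{n_1}, \sqrt{n_2}, \ldots, \sqrt{n_G}\right)^T$$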

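Let $K = (\sqrt{n_1}, \sqrt{n_2}, \ldots, \sqrt{n_G})^T$. Then, for the terms that appear in the formula of the between-group covariance matrix, we have

$$\left(\sqrt{n_1}\bar{\mu}_1 - \sqrt{n_1}\bar{\mu}, \ldots, \sqrt{n_G}\bar{\mu}_G - \sqrt{n_G}\bar{\mu}\right) = \left(\sqrt{n_1}\bar{\mu}_1, \ldots, \sqrt{n_G}\bar{\mu}_G\right) - \bar{\mu}\left(\sqrt{n_1}, \ldots, \sqrt{n_G}\right) = \left(\sqrt{n_1}\bar{\mu}_1, \ldots, \sqrt{n_G}\bar{\mu}_G\right)\left(I - \frac{1}{N} K K^T\right)$$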

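Writing $1_g \in \mathbb{R}^N$ for the indicator vector of group $g$ (entry 1 for samples in group $g$, 0 otherwise), each scaled group mean is

$$\sqrt{n_g}\,\bar{\mu}_g = \sqrt{n_g}\,\frac{1}{n_g} X^T 1_g = \frac{1}{\sqrt{n_g}} X^T 1_g$$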

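$$\begin{aligned}
\hat{\Sigma}_b &= \frac{1}{N} \sum_{g=1}^{G} n_g (\bar{\mu}_g - \bar{\mu})(\bar{\mu}_g - \bar{\mu})^T \\
&= \frac{1}{N} \sum_{g=1}^{G} \sqrt{n_g}(\bar{\mu}_g - \bar{\mu}) \, \sqrt{n_g}(\bar{\mu}_g - \bar{\mu})^T \\
&= \frac{1}{N} \left(\sqrt{n_1}\bar{\mu}_1, \ldots, \sqrt{n_G}\bar{\mu}_G\right)\left(I - \frac{1}{N} K K^T\right)\left(\sqrt{n_1}\bar{\mu}_1, \ldots, \sqrt{n_G}\bar{\mu}_G\right)^T \\
&= \frac{1}{N} X^T \left(\tfrac{1}{\sqrt{n_g}} 1_g\right)_{N \times G} \left(I - \frac{1}{N} K K^T\right) \left(\tfrac{1}{\sqrt{n_g}} 1_g\right)^T_{N \times G} X \\
&= \frac{1}{N} X^T \left(\tfrac{1}{\sqrt{n_g}} 1_g\right)_{N \times G} C \left(\tfrac{1}{\sqrt{n_g}} 1_g\right)^T_{N \times G} X
\end{aligned}$$

Here $C = I - \frac{1}{N} K K^T$, and $\left(\tfrac{1}{\sqrt{n_g}} 1_g\right)_{N \times G}$ denotes the $N \times G$ matrix whose $g$th column is $\tfrac{1}{\sqrt{n_g}} 1_g$. The third equality uses the fact that $I - \frac{1}{N} K K^T$ is symmetric and idempotent, because $K^T K = \sum_g n_g = N$.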
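The matrix form of $\hat{\Sigma}_b$ can be compared against the direct definition. This is a minimal NumPy sketch under assumed toy data (three groups of sizes 4, 7, 5); the function names `between_cov_direct` and `between_cov_matrix_form` are hypothetical.

```python
import numpy as np

def between_cov_direct(X, labels):
    """(1/N) * sum_g n_g (mu_g - mu)(mu_g - mu)^T, computed group by group."""
    N, p = X.shape
    mu = X.mean(axis=0)
    S = np.zeros((p, p))
    for g in np.unique(labels):
        Xg = X[labels == g]
        d = Xg.mean(axis=0) - mu
        S += Xg.shape[0] * np.outer(d, d)
    return S / N

def between_cov_matrix_form(X, labels):
    """(1/N) X^T Z (I - K K^T / N) Z^T X, where column g of Z is 1_g / sqrt(n_g)."""
    N = X.shape[0]
    groups = np.unique(labels)
    n = np.array([(labels == g).sum() for g in groups])
    # Indicator columns 1_g, each scaled by 1 / sqrt(n_g) via broadcasting.
    Z = (labels[:, None] == groups[None, :]).astype(float) / np.sqrt(n)
    K = np.sqrt(n)
    C = np.eye(groups.size) - np.outer(K, K) / N
    return X.T @ Z @ C @ Z.T @ X / N

# Assumed toy data with unbalanced groups.
rng = np.random.default_rng(1)
labels = np.repeat([0, 1, 2], [4, 7, 5])
X = rng.normal(size=(labels.size, 3))

S_direct = between_cov_direct(X, labels)
S_matrix = between_cov_matrix_form(X, labels)
```

Both functions should agree up to floating-point error, which is what the chain of equalities above asserts.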

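We claim that $C = \tilde{H}^T \tilde{H}$, where $\tilde{H} \in \mathbb{R}^{(G-1) \times G}$. That is to say,

$$C = \left(I - \frac{1}{N} K K^T\right) = \tilde{H}^T \tilde{H}$$

so $\left(\tfrac{1}{\sqrt{N}} K, \tilde{H}^T\right)$ is an orthogonal matrix (the factor $\tfrac{1}{\sqrt{N}}$ normalizes $K$, since $K^T K = N$). From the theory of orthogonal contrasts for unbalanced data, the $G-1$ orthogonal contrasts have the following form: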

$$\delta_r = \sqrt{n_{r+1}} \left( \sum_{h=1}^{r} n_h (\bar{\mu}_h - \bar{\mu}_{r+1}) \right), \quad r = 1, 2, \ldots, G-1$$

Denote by $h_r$ the $r$th row of $\tilde{H}$. Then, from the definition of orthogonal contrasts, for some constant $C_r$,

$$X^T \left(\tfrac{1}{\sqrt{n_g}} 1_g\right)_{N \times G} h_r^T = C_r \delta_r$$

which can be rewritten as

$$\sum_{j=1}^{G} h_{rj} \sqrt{n_j}\, \bar{\mu}_j = C_r \sqrt{n_{r+1}} \sum_{j=1}^{r} n_j (\bar{\mu}_j - \bar{\mu}_{r+1})$$

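Then, matching the coefficients of each $\bar{\mu}_j$ on both sides,

$$\begin{cases}
h_{rj} \sqrt{n_j} = C_r \sqrt{n_{r+1}}\, n_j & \text{for } j = 1, 2, \ldots, r \\
h_{r,r+1} \sqrt{n_{r+1}} = -C_r \sqrt{n_{r+1}} \sum_{t=1}^{r} n_t & \\
h_{ri} = 0 & \text{for } i > r+1
\end{cases}$$

which gives

$$\begin{cases}
h_{rj} = C_r \sqrt{n_{r+1}} \sqrt{n_j} & \text{for } j = 1, 2, \ldots, r \\
h_{r,r+1} = -C_r \sum_{t=1}^{r} n_t & \\
h_{ri} = 0 & \text{for } i > r+1
\end{cases}$$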

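To make $\|h_r\|_2 = 1$, set

$$C_r^2 \left( \sum_{i=1}^{r} n_{r+1} n_i + \Big( \sum_{j=1}^{r} n_j \Big)^2 \right) = 1$$

Factoring out $\sum_{i=1}^{r} n_i$ gives

$$C_r = \frac{1}{\sqrt{\sum_{i=1}^{r} n_i \sum_{j=1}^{r+1} n_j}}$$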
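The rows $h_r$ derived above can be assembled and checked numerically. This is a minimal NumPy sketch with assumed group sizes 4, 7, 5; the helper name `contrast_matrix` is hypothetical.

```python
import numpy as np

def contrast_matrix(n):
    """Build the (G-1) x G matrix whose r-th row is h_r; n holds the group sizes."""
    n = np.asarray(n, dtype=float)
    G = n.size
    H = np.zeros((G - 1, G))
    for r in range(1, G):                 # r = 1, ..., G-1, 1-based as in the text
        s_r = n[:r].sum()                 # sum_{t=1}^{r} n_t
        s_r1 = n[:r + 1].sum()            # sum_{t=1}^{r+1} n_t
        c_r = 1.0 / np.sqrt(s_r * s_r1)   # normalizing constant C_r
        # n[r] is n_{r+1} in the text's 1-based indexing.
        H[r - 1, :r] = c_r * np.sqrt(n[r] * n[:r])   # h_{rj}, j = 1..r
        H[r - 1, r] = -c_r * s_r                     # h_{r,r+1}
    return H

# Assumed unbalanced group sizes.
n = np.array([4, 7, 5])
H = contrast_matrix(n)
K = np.sqrt(n)
C = np.eye(n.size) - np.outer(K, K) / n.sum()
```

With these sizes, `H @ H.T` should be the $2 \times 2$ identity, `H @ K` should vanish, and `H.T @ H` should reproduce `C`, which are the three properties the claim relies on.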

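Now

$$X^T \left(\tfrac{1}{\sqrt{n_g}} 1_g\right)_{N \times G} h_r^T = \frac{\sqrt{n_{r+1}}}{\sqrt{\sum_{i=1}^{r} n_i \sum_{j=1}^{r+1} n_j}} \left( \sum_{h=1}^{r} n_h (\bar{\mu}_h - \bar{\mu}_{r+1}) \right)$$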

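So

$$\hat{\Sigma}_b = \frac{1}{N} X^T \left(\tfrac{1}{\sqrt{n_g}} 1_g\right)_{N \times G} C \left(\tfrac{1}{\sqrt{n_g}} 1_g\right)^T_{N \times G} X = \frac{1}{N} X^T \left(\tfrac{1}{\sqrt{n_g}} 1_g\right)_{N \times G} \tilde{H}^T \tilde{H} \left(\tfrac{1}{\sqrt{n_g}} 1_g\right)^T_{N \times G} X = \Delta \Delta^T$$

where $\Delta = \frac{1}{\sqrt{N}} X^T \left(\tfrac{1}{\sqrt{n_g}} 1_g\right)_{N \times G} \tilde{H}^T$, and its $r$th column is

$$\Delta_r = \frac{\sqrt{n_{r+1}}}{\sqrt{N} \sqrt{\sum_{i=1}^{r} n_i \sum_{j=1}^{r+1} n_j}} \left( \sum_{h=1}^{r} n_h (\bar{\mu}_h - \bar{\mu}_{r+1}) \right)$$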
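As a sanity check on the final result, the columns $\Delta_r$ can be computed from the closed form and compared with the direct definition of $\hat{\Sigma}_b$. This is a minimal NumPy sketch under assumed toy data (three groups of sizes 4, 7, 5); all variable names are illustrative.

```python
import numpy as np

# Assumed toy data: unbalanced groups, 3 features.
rng = np.random.default_rng(2)
sizes = np.array([4, 7, 5])
labels = np.repeat(np.arange(sizes.size), sizes)
X = rng.normal(size=(labels.size, 3))
N, G = labels.size, sizes.size

means = np.stack([X[labels == g].mean(axis=0) for g in range(G)])  # G x p group means
mu = X.mean(axis=0)                                                # overall mean

# Direct definition: (1/N) sum_g n_g (mu_g - mu)(mu_g - mu)^T.
diff = means - mu
Sigma_b = (diff.T * sizes) @ diff / N

# Closed-form columns Delta_r for r = 1, ..., G-1 (1-based as in the text).
cols = []
for r in range(1, G):
    s_r, s_r1 = sizes[:r].sum(), sizes[:r + 1].sum()
    # sum_{h=1}^{r} n_h (mu_h - mu_{r+1}); means[r] is mu_{r+1} in 0-based indexing.
    contrast = (sizes[:r, None] * (means[:r] - means[r])).sum(axis=0)
    cols.append(np.sqrt(sizes[r]) * contrast / np.sqrt(N * s_r * s_r1))
Delta = np.column_stack(cols)  # p x (G-1)
```

Here `Delta @ Delta.T` should match `Sigma_b` up to floating-point error, while `Delta` has only $G-1$ columns, which makes the rank bound on $\hat{\Sigma}_b$ explicit.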
