机器学习实战:PCA降维 样本协方差

来源:互联网 发布:软件质量管理活动包括 编辑:程序博客网 时间:2024/06/06 09:22

先用matlab试试样本协方差:

 

>> X=[1,3;2,4;0,6]

X =

     1     3
     2     4
     0     6

注意,行表示样本点,列表示属性列向量。以前使用matlab习惯用列向量表示点。

以下去除平均值、中心化(类似移动坐标系到样本质心

>> removMeanX=X-[mean(X);mean(X);mean(X)]

removMeanX =

         0   -1.3333
    1.0000   -0.3333
   -1.0000    1.6667

 

以下表示中心化前后样本协方差不变,注意样本方差是对各属性列而言。

>> covX=cov(X)

covX =

    1.0000   -1.0000
   -1.0000    2.3333

>> covRemovMeanX=cov(removMeanX)

covRemovMeanX =

    1.0000   -1.0000
   -1.0000    2.3333

 

以下表示中心化后,样本协方差可用各属性列点积得到,默认的cov()得到的是样本无偏协方差

(removMeanX'*removMeanX类似样本惯性矩阵,对角为惯性矩,其他为惯性积。)

>> covRemovMeanX_getByDot=removMeanX'*removMeanX/(3-1)

covRemovMeanX_getByDot =

    1.0000   -1.0000
   -1.0000    2.3333

 

>> [V,D]=eig(covRemovMeanX)

V =

   -0.8817   -0.4719
   -0.4719    0.8817


D =

    0.4648         0
         0    2.8685

>> V*D

ans =

   -0.4098   -1.3535
   -0.2193    2.5291

>> D*V

ans =

   -0.4098   -0.2193
   -1.3535    2.5291

>> covRemovMeanX*V

ans =

   -0.4098   -1.3535
   -0.2193    2.5291

 

(设样本中心化后坐标系为Er,惯性主坐标系为Ep,则V为Er到Ep的过渡矩阵,其列向量(特征向量)为Ep的单位基在Er下的投影坐标

 

# coding=utf-8'''Created on Jun 1, 2011@author: Peter Harrington'''from numpy import *def loadDataSet(fileName, delim='\t'):    fr = open(fileName)    stringArr = [line.strip().split(delim) for line in fr.readlines()]    datArr = [list(map(float,line)) for line in stringArr]    return mat(datArr)def pca(dataMat, topNfeat=9999999):    meanVals = mean(dataMat, axis=0)    meanRemoved = dataMat - meanVals #remove mean    covMat = cov(meanRemoved, rowvar=0)    eigVals,eigVects = linalg.eig(mat(covMat))    eigValInd = argsort(eigVals)            #sort, sort goes smallest to largest    eigValInd = eigValInd[:-(topNfeat+1):-1]  #cut off unwanted dimensions    redEigVects = eigVects[:,eigValInd]       #reorganize eig vects largest to smallest    lowDDataMat = meanRemoved * redEigVects#transform data into new dimensions    reconMat = (lowDDataMat * redEigVects.T) + meanVals    return lowDDataMat, reconMatdataMat=loadDataSet(r'C:\Users\li\Downloads\machinelearninginaction\Ch13\testSet.txt')lowDMat,reconMat=pca(dataMat,1)print(shape(lowDMat))import matplotlib.pyplot as pltfig=plt.figure()ax=fig.add_subplot(111)ax.scatter(dataMat[:,0].flatten().A[0],dataMat[:,1].flatten().A[0],marker='^',s=90)ax.scatter(reconMat[:,0].flatten().A[0],reconMat[:,1].flatten().A[0],marker='o',s=50,c='red')plt.show()


 

0 0
原创粉丝点击