Machine Learning in Action, Chapter 13: PCA

Source: Internet. Editor: 程序博客网. Time: 2024/05/18 05:44

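While working through this chapter, all the earlier programs ran fine, but Section 13.3, which uses PCA to reduce the dimensionality of the semiconductor manufacturing data, failed with this error: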

numpy.linalg.linalg.LinAlgError: Array must not contain infs or NaNs

The message is explicit: the data contains infinities (infs) or NaNs (missing values). So let's find the statement that raised it:

eigVals, eigVects = np.linalg.eig(np.matrix(covMat))

Clearly something is wrong with the data in covMat, so let's print it:

>>> covMat
array([[           nan,            nan,            nan, ...,
                   nan,            nan,            nan],
       [           nan,  6436.49876891,            nan, ...,
                   nan,            nan,            nan],
       [           nan,            nan,            nan, ...,
                   nan,            nan,            nan],
       ...,
       [           nan,            nan,            nan, ...,
                   nan,            nan,            nan],
       [           nan,            nan,            nan, ...,
                   nan,            nan,            nan],
       [           nan,            nan,            nan, ...,
                   nan,            nan,            nan]])
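Eyeballing a truncated printout only shows the corners of the matrix. A small check like the one below counts the bad entries directly. This is a minimal sketch: `nan_report` and `toy_cov` are hypothetical names introduced here for illustration, and the toy matrix merely stands in for the real covMat.

```python
import numpy as np

def nan_report(mat):
    """Return (number of NaN entries, number of inf entries, total entries)."""
    arr = np.asarray(mat, dtype=float)
    return int(np.isnan(arr).sum()), int(np.isinf(arr).sum()), arr.size

# Toy stand-in for covMat (hypothetical values, not the real SECOM covariance).
toy_cov = np.array([[np.nan, np.nan],
                    [np.nan, 6436.49876891]])

print(nan_report(toy_cov))  # (3, 0, 4)
```

Running the same helper on dataMat, meanVals and meanRemoved would show at which step the NaNs first appear.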

covMat contains many nan values. This is how it was computed:

>>> dataMat = pca.replaceNanWithMean()
>>> meanVals = np.mean(dataMat, axis = 0)
>>> meanRemoved = dataMat - meanVals
>>> covMat = np.cov(meanRemoved, rowvar = 0)
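To see why a few NaNs in the input end up all over the covariance matrix, the same three steps can be run on a tiny array. This is only an illustrative sketch with made-up data (the variable names and values are assumptions, not the SECOM data): one NaN in a column poisons that column's mean, then the whole centered column, then every covariance entry that involves that column.

```python
import numpy as np

# Toy data: 3 samples, 2 features, one missing value in the second feature.
data = np.array([[1.0, 2.0],
                 [3.0, np.nan],
                 [5.0, 6.0]])

mean_vals = np.mean(data, axis=0)         # mean of column 1 is nan
mean_removed = data - mean_vals           # all of column 1 becomes nan
cov_mat = np.cov(mean_removed, rowvar=0)  # entries involving column 1 are nan

print(np.isnan(mean_vals))   # [False  True]
print(np.isnan(cov_mat))     # only the [0, 0] entry is finite
```

This matches the printout above, where the only finite entry visible is a diagonal one: a feature with no missing values keeps a finite variance, while its covariances with NaN-carrying features do not.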

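Printing meanRemoved shows nan values. Printing meanVals shows nan values too, and even dataMat itself contains nan. Sigh. So the step in replaceNanWithMean() that is supposed to get rid of the nan values is evidently not working. I didn't feel like digging into it any further, so I took a simpler route: set only the nan entries of covMat to 0. This is a lazy workaround and the results may differ a little from the book's, but since I'm just learning, I don't mind. Here is the change.
Right after covMat is computed, run this statement: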

>>> for i in range(np.shape(covMat)[0]):
        for j in range(np.shape(covMat)[1]):
            if np.isnan(covMat[i][j]):
                covMat[i][j] = 0

Check the result of the change:

>>> covMat
array([[    0.        ,     0.        ,     0.        , ...,
            0.        ,     0.        ,     0.        ],
       [    0.        ,  6436.49876891,     0.        , ...,
            0.        ,     0.        ,     0.        ],
       [    0.        ,     0.        ,     0.        , ...,
            0.        ,     0.        ,     0.        ],
       ...,
       [    0.        ,     0.        ,     0.        , ...,
            0.        ,     0.        ,     0.        ],
       [    0.        ,     0.        ,     0.        , ...,
            0.        ,     0.        ,     0.        ],
       [    0.        ,     0.        ,     0.        , ...,
            0.        ,     0.        ,     0.        ]])
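For comparison, here is a rough sketch of two alternatives, neither of which is the book's code. `zero_fill_nans` does the same thing as the double loop with a boolean mask, and `impute_column_means` is one possible way to do what replaceNanWithMean() is meant to do, using `np.nanmean` to average each column while ignoring NaN. Both function names are hypothetical, and the sketch assumes the data is a plain 2-D float array rather than an np.matrix.

```python
import numpy as np

def zero_fill_nans(mat):
    """Copy of mat with every NaN replaced by 0 (same effect as the loop)."""
    out = np.array(mat, dtype=float)
    out[np.isnan(out)] = 0.0
    return out

def impute_column_means(data):
    """Copy of data with each NaN replaced by the mean of the
    non-NaN values in the same column."""
    arr = np.array(data, dtype=float)        # copy, input stays untouched
    col_means = np.nanmean(arr, axis=0)      # per-column mean, NaN ignored
    rows, cols = np.where(np.isnan(arr))     # positions of the missing values
    arr[rows, cols] = col_means[cols]
    return arr

toy = np.array([[1.0, np.nan],
                [3.0, 4.0],
                [np.nan, 8.0]])
print(impute_column_means(toy))
```

Imputing the data before centering keeps the real covariances, whereas zeroing covMat afterwards discards every covariance that touched a NaN. One caveat: a column that is entirely NaN has no mean to fill in, so `np.nanmean` returns nan for it (with a warning) and that column would still need separate handling.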

Good. Now run the decomposition again:

>>> eigVals, eigVects = np.linalg.eig(np.matrix(covMat))
>>> eigVals
array([  1.06780209e+04,   8.59736396e+03,   6.41273414e+03,
         5.02643597e+03,   3.40488093e+03,   3.50779450e+03,
         2.22725443e+03,   2.75157234e+03,   2.84136889e+02,
         4.31321293e+01,   3.88326123e+01,   3.62389854e+01,
         2.40571225e+01,   1.75260412e+01,   1.33949736e+01,
         1.07994035e+01,   7.22312565e+00,   3.51543592e+00,
         2.09186700e+00,   1.04068981e+00,   9.22874395e-01,
         8.93680570e-01,   5.61249320e-01,   3.11675322e-01,
         7.21988883e-02,   2.18386663e-02,   1.35886283e-02,
         5.46328191e-03,   2.03257991e-03,   1.33860851e-03,
         2.03863244e-04,   1.41361019e-04,   1.15731062e-04,
         9.39311390e-05,   7.57144587e-05,   5.19232822e-05,
         4.07149667e-05,   2.43725392e-05,   2.35559282e-05,
         1.72257179e-05,   7.59457610e-06,   6.94631765e-06,
         1.99023740e-06,   1.39115700e-06,   6.53622178e-07,
         3.13084316e-07,   1.32830309e-09,   9.79155640e-09,
         3.03601711e-08,   1.20546679e-07,   1.08996063e-07,
         2.10808964e-07,   1.89481992e-07,   0.00000000e+00,
         0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
         0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
         0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
         0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
        ....................................................
         0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
         0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
         0.00000000e+00,   0.00000000e+00])

Plotting the result, it still looks pretty good, hahaha (self-hypnosis in progress...)
Figure: the first 20 eigVals values
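The same information as the plot can be read off as numbers: the share of the total variance carried by each of the largest eigenvalues. The sketch below is an illustration under assumptions: `explained_variance` is a hypothetical helper, and the sample values are made up rather than taken from the run above. It sorts first because np.linalg.eig does not return eigenvalues in any guaranteed order, which is visible in the output above where 3.40488093e+03 comes before 3.50779450e+03.

```python
import numpy as np

def explained_variance(eig_vals, top_n=20):
    """Return (ratio, cumulative ratio) for the top_n largest eigenvalues."""
    vals = np.sort(np.asarray(eig_vals).real)[::-1]  # descending order
    ratio = vals[:top_n] / vals.sum()                # share of total variance
    return ratio, np.cumsum(ratio)

# Made-up, deliberately unsorted eigenvalues for illustration.
ratio, cumulative = explained_variance([4.0, 1.0, 3.0, 2.0], top_n=3)
print(ratio)
print(cumulative)
```

With the real eigVals, `explained_variance(eigVals)` would give the values behind the 20 points in the figure, expressed as fractions of the total.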
