Unsupervised Machine Learning: IPython Notebook

Source: Internet | Editor: 程序博客网 | Time: 2024/05/05 12:32

Unsupervised Learning

Dimensionality Reduction & Feature Extraction via PCA (EigenFaces)

%matplotlib inline
import itertools
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import scipy
from sklearn import cluster
from sklearn import datasets
from sklearn import metrics
from sklearn.neighbors import kneighbors_graph
from sklearn.preprocessing import StandardScaler
from sklearn import decomposition  # PCA
import time

Modified Olivetti faces dataset

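# fetch_olivetti_faces is exposed directly on sklearn.datasets;
# the olivetti_faces submodule path is not available in newer releases.
faces = datasets.fetch_olivetti_faces()
print(faces.DESCR)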

downloading Olivetti faces from http://cs.nyu.edu/~roweis/data/olivettifaces.mat to /home/hz/scikit_learn_data
Modified Olivetti faces dataset.

The original database was available from (now defunct)

    http://www.uk.research.att.com/facedatabase.html

The version retrieved here comes in MATLAB format from the personal
web page of Sam Roweis:

    http://www.cs.nyu.edu/~roweis/

There are ten different images of each of 40 distinct subjects. For some
subjects, the images were taken at different times, varying the lighting,
facial expressions (open / closed eyes, smiling / not smiling) and facial
details (glasses / no glasses). All the images were taken against a dark
homogeneous background with the subjects in an upright, frontal position (with
tolerance for some side movement).

The original dataset consisted of 92 x 112, while the Roweis version
consists of 64x64 images.

faces_images = faces['images']
faces_data = faces.data
faces_images.shape

(400, 64, 64)

faces_data.shape

(400, 4096)

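faces_images holds the same data as faces_data in image form: each 4096-value row of faces_data is one flattened 64x64 image, so faces_images is the convenient one to use for plotting.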
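The relationship between the two arrays can be checked with a plain reshape. The snippet below is a minimal sketch that uses a random array as a stand-in for the real faces (so it runs without downloading anything); the names `images`, `data` and `restored` are illustrative only and do not appear in the notebook.

```python
import numpy as np

rng = np.random.RandomState(0)

# Random stand-in for faces_images: 400 images of 64x64 pixels.
images = rng.rand(400, 64, 64)

# Flatten each image into one row, which mirrors the layout of faces_data.
data = images.reshape(len(images), -1)
print(data.shape)

# Reshaping back recovers the image stack exactly.
restored = data.reshape(-1, 64, 64)
print(np.array_equal(restored, images))
```

With the real dataset, the same check would presumably be `np.array_equal(faces_data.reshape(-1, 64, 64), faces_images)`.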

Now plot 64 of the face images.

fig = plt.figure(figsize=(16,16))
for i in range(64):
    plt.subplot(8,8,i+1)
    plt.imshow(faces_images[i], cmap=plt.cm.gray)
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])


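n_eigenfaces = 16
# Note: RandomizedPCA was removed in scikit-learn 0.20. On newer versions use
# decomposition.PCA(n_components=n_eigenfaces, svd_solver='randomized', whiten=True)
pca = decomposition.RandomizedPCA(n_components=n_eigenfaces, whiten=True)
pca.fit(faces_data)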

RandomizedPCA(copy=True, iterated_power=3, n_components=16, random_state=None,
       whiten=True)

pca.components_.shape

(16, 4096)
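Each of the 16 rows of pca.components_ is one eigenface in the original 4096-pixel space, so a face can be compressed to 16 coefficients and then approximately rebuilt from them. The following is a minimal sketch of that round trip. It assumes a scikit-learn version where `PCA(svd_solver='randomized')` stands in for `RandomizedPCA`, and it uses random data in place of faces_data; `X`, `pca_demo`, `codes` and `X_approx` are hypothetical names.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)

# Random stand-in for faces_data: 400 samples, 4096 features each.
X = rng.rand(400, 4096)

# Assumed modern equivalent of RandomizedPCA(n_components=16, whiten=True).
pca_demo = PCA(n_components=16, svd_solver='randomized',
               whiten=True, random_state=0)

# Feature extraction: each sample becomes 16 coefficients.
codes = pca_demo.fit_transform(X)

# Map the 16 coefficients back to pixel space (undoes whitening and centering).
X_approx = pca_demo.inverse_transform(codes)

# Mean squared error of the 16-component approximation.
reconstruction_error = np.mean((X - X_approx) ** 2)
print(codes.shape, X_approx.shape, reconstruction_error)
```

On the real faces, a row of `X_approx` reshaped to 64x64 could be shown with `plt.imshow` next to the original to judge how much detail 16 components retain.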

plt.figure(figsize=(16,16))
plt.suptitle('EigenFaces')
for i in range(pca.components_.shape[0]):
    plt.subplot(4,4,i+1)
    plt.imshow(pca.components_[i].reshape(64,64), cmap=plt.cm.gray)
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])


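with plt.style.context('fivethirtyeight'):
    plt.figure(figsize=(16,12))
    # The plotted quantity is explained_variance_ (absolute variance),
    # not explained_variance_ratio_, so the title says "Variance".
    plt.title('Explained Variance over Component')
    plt.plot(pca.explained_variance_)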


with plt.style.context('fivethirtyeight'):
    plt.figure(figsize=(16,12))
    plt.title('Cumulative Explained Variance over EigenFace')
    plt.plot(pca.explained_variance_ratio_.cumsum())


print('PCA captures {:.2f} percent of the variance in the dataset'.format(pca.explained_variance_ratio_.sum() * 100))

PCA captures 73.09 percent of the variance in the dataset
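The percentage above is the sum of explained_variance_ratio_, which is each component's explained_variance_ divided by the total variance of the data. A minimal sketch of that relationship follows, on small random data and with `PCA(svd_solver='full')` chosen here for determinism; the names `X_demo`, `pca_demo`, `total_variance` and `ratio_by_hand` are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)

# Small random stand-in data set: 100 samples, 50 features.
X_demo = rng.rand(100, 50)

pca_demo = PCA(n_components=5, svd_solver='full').fit(X_demo)

# Total variance = sum of the per-feature sample variances (ddof=1).
total_variance = X_demo.var(axis=0, ddof=1).sum()

# Dividing the absolute variances by the total gives the ratios.
ratio_by_hand = pca_demo.explained_variance_ / total_variance
print(np.allclose(ratio_by_hand, pca_demo.explained_variance_ratio_))

# Share of the variance captured by the kept components, in percent.
print('{:.2f} percent'.format(ratio_by_hand.sum() * 100))
```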

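n_eigenfaces = 121
# Note: RandomizedPCA was removed in scikit-learn 0.20. On newer versions use
# decomposition.PCA(n_components=n_eigenfaces, svd_solver='randomized', whiten=True)
pca = decomposition.RandomizedPCA(n_components=n_eigenfaces, whiten=True)
pca.fit(faces_data)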

RandomizedPCA(copy=True, iterated_power=3, n_components=121,
       random_state=None, whiten=True)

with plt.style.context('fivethirtyeight'):
    plt.figure(figsize=(16,12))
    plt.title('Cumulative Explained Variance over EigenFace')
    plt.plot(pca.explained_variance_ratio_.cumsum())


print('PCA captures {:.2f} percent of the variance in the dataset'.format(pca.explained_variance_ratio_.sum() * 100))

PCA captures 94.84 percent of the variance in the dataset
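Rather than trying 16 and then 121 components by hand, the cumulative curve can be used to look up the smallest number of components that reaches a chosen variance target. The sketch below illustrates the idea on random stand-in data; the 0.95 target and the names `X_demo`, `full_pca`, `cumulative` and `n_needed` are assumptions made for the example, not values from the notebook.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)

# Random stand-in data set: 200 samples, 300 features.
X_demo = rng.rand(200, 300)

# Fit with all available components so the cumulative curve is complete.
full_pca = PCA(svd_solver='full').fit(X_demo)
cumulative = np.cumsum(full_pca.explained_variance_ratio_)

# Smallest number of components whose cumulative ratio reaches the target.
target = 0.95
n_needed = int(np.searchsorted(cumulative, target) + 1)
print(n_needed, cumulative[n_needed - 1])
```

scikit-learn's PCA can also make this selection itself when n_components is given as a fraction between 0 and 1 together with svd_solver='full'.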

plt.figure(figsize=(16, 16))
plt.suptitle('EigenFaces')
for ii in range(pca.components_.shape[0]):
    plt.subplot(11, 11, ii + 1)  # subplot indices start at 1
    plt.imshow(pca.components_[ii].reshape(64, 64), cmap=plt.cm.gray)
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])


0 0