




Solving the "Curse of Dimensionality" problem: if we are able to project our data from a higher-dimensional space to a lower one while keeping most of the relevant information, that would make life a lot easier for our learning methods.


from sklearn.manifold import TSNE  # Nonlinear, probabilistic method
from sklearn.decomposition import PCA  # Unsupervised, linear method
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA  # Supervised, linear method

1.  Principal Component Analysis (PCA) - Unsupervised, linear method



For a brilliant and detailed description of this, check out this stackexchange thread:

PCA and proportion of variance explained, by amoeba. He gives a very good explanation!


Principal Component Analysis in 3 Simple Steps, by Sebastian Raschka
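
As a minimal sketch of what "proportion of variance explained" means in practice, the snippet below projects a dataset onto its first two principal components and reads off how much variance each one keeps. The Iris dataset and the standardization step are assumptions made for illustration; they are not taken from the references above.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: Iris is used only as a small, convenient example dataset
X, y = load_iris(return_X_y=True)

# Standardize so that every feature contributes on the same scale
X_std = StandardScaler().fit_transform(X)

# Keep the two directions of largest variance (the labels y are not used)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)

print(X_pca.shape)                    # 150 samples, 2 components
print(pca.explained_variance_ratio_)  # share of total variance per component
```

The entries of `explained_variance_ratio_` are sorted in decreasing order, so their running sum tells you how many components are needed to keep a given share of the variance.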


2.    Linear Discriminant Analysis (LDA) - Supervised, linear method

Both Linear Discriminant Analysis (LDA) and PCA are linear transformation methods. PCA yields the directions (principal components) that maximize the variance of the data, whereas LDA aims to find the directions that maximize the separation (or discrimination) between different classes, which can be useful in pattern classification problems.
In other words, PCA projects the entire dataset onto a different feature (sub)space, and LDA tries to determine a suitable feature (sub)space in order to distinguish between patterns that belong to different classes.
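
The difference shows up directly in how the two estimators are fitted: PCA only sees the features, while LDA also needs the class labels. The following is a small sketch of that contrast, again assuming the Iris dataset purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# Assumption: Iris (3 classes, 4 features) is used as an example dataset
X, y = load_iris(return_X_y=True)

# PCA is unsupervised: it looks only at the variance of X
X_pca = PCA(n_components=2).fit_transform(X)

# LDA is supervised: it uses y to find class-separating directions.
# With 3 classes it can produce at most 3 - 1 = 2 components.
lda = LDA(n_components=2)
X_lda = lda.fit_transform(X, y)

print(X_pca.shape, X_lda.shape)
```

Both results have the same shape, but the LDA axes are chosen to pull the classes apart, whereas the PCA axes are chosen without any knowledge of the classes.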

3.  t-SNE (t-Distributed Stochastic Neighbour Embedding) - Nonlinear, probabilistic method


t-SNE converts the Euclidean distances between points into conditional probabilities, which serve as a measure of the similarity between one datapoint and another. A Student-t distribution is then used to model the corresponding similarities in the low-dimensional embedding.
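
As a rough illustration of the distance-to-probability conversion, here is a simplified NumPy sketch. It assumes a single fixed bandwidth `sigma` for every point, whereas t-SNE itself tunes a bandwidth per point from the perplexity setting; the function names and toy data are hypothetical and only meant to show the shape of the computation.

```python
import numpy as np

def conditional_probabilities(X, sigma=1.0):
    """Row i holds p(j|i): Gaussian similarity of point i to every other point.

    Simplification: one fixed sigma for all points (an assumption of this sketch).
    """
    # Squared Euclidean distances between all pairs of rows
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    affinities = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(affinities, 0.0)  # a point is not its own neighbour
    return affinities / affinities.sum(axis=1, keepdims=True)

def student_t_similarities(Y):
    """Joint similarities q_ij in the embedding, using a Student-t kernel (1 d.o.f.)."""
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    kernel = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(kernel, 0.0)
    return kernel / kernel.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))  # 5 toy points in 10 dimensions
Y = rng.normal(size=(5, 2))   # a hypothetical 2-D embedding of the same points

P = conditional_probabilities(X)
Q = student_t_similarities(Y)
```

Each row of `P` sums to 1, so it can be read as a probability distribution over the neighbours of one point. The heavy tails of the Student-t kernel in `Q` let moderately distant points sit further apart in the embedding, which helps keep clusters visually separated.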

From the t-SNE scatter plot, the first thing that strikes you is that clusters (and even subclusters) are very well defined and segregated, resulting in Jackson Pollock-like modern art visuals, even more so than with the PCA and LDA methods. t-SNE's ability to provide very good cluster visualizations can be attributed to the topology-preserving property of the algorithm.
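
For reference, a 2-D embedding like the one described can be computed with scikit-learn as sketched below. The dataset, the perplexity value, and the random seed are assumptions chosen for illustration, not the settings behind the plot discussed above.

```python
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

# Assumption: Iris is used as a small example dataset
X, y = load_iris(return_X_y=True)

# perplexity roughly controls how many neighbours each point "pays attention" to;
# random_state fixes the seed so that repeated runs are comparable
tsne = TSNE(n_components=2, perplexity=30.0, random_state=0)
X_tsne = tsne.fit_transform(X)

print(X_tsne.shape)  # one 2-D coordinate per sample
```

Plotting `X_tsne[:, 0]` against `X_tsne[:, 1]`, coloured by `y`, gives the kind of cluster scatter plot described above. Note that, unlike PCA and LDA, scikit-learn's `TSNE` has no separate `transform` method, so new points cannot be projected into an existing embedding.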

