4.2.1. Standardization, or mean removal and variance scaling
Standardization of datasets is a common requirement for many machine learning estimators implemented in scikit-learn: they might behave badly if the individual features do not more or less look like standard normally distributed data, i.e. Gaussian with zero mean and unit variance.
In practice we often ignore the shape of the distribution and just transform the data to center it by removing the mean value of each feature, then scale it by dividing non-constant features by their standard deviation.
For instance, many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the l1 and l2 regularizers of linear models) assume that all features are centered around zero and have variances of the same order. If a feature has a variance that is orders of magnitude larger than others, it might dominate the objective function and make the estimator unable to learn from the other features correctly as expected.
The function scale provides a quick and easy way to perform this operation on a single array-like dataset:
>>> from sklearn import preprocessing
>>> import numpy as np
>>> X = np.array([[ 1., -1.,  2.],
...               [ 2.,  0.,  0.],
...               [ 0.,  1., -1.]])
>>> X_scaled = preprocessing.scale(X)
>>> X_scaled
array([[ 0.  ..., -1.22...,  1.33...],
       [ 1.22...,  0.  ..., -0.26...],
       [-1.22...,  1.22..., -1.06...]])
Scaled data has zero mean and unit variance:
>>> X_scaled.mean(axis=0)
array([ 0., 0., 0.])
>>> X_scaled.std(axis=0)
array([ 1., 1., 1.])
The preprocessing module further provides a utility class StandardScaler that implements the Transformer API to compute the mean and standard deviation on a training set so as to be able to later reapply the same transformation on the testing set. This class is hence suitable for use in the early steps of a sklearn.pipeline.Pipeline:
>>> scaler = preprocessing.StandardScaler().fit(X)
>>> scaler
StandardScaler(copy=True, with_mean=True, with_std=True)
>>> scaler.mean_
array([ 1. ..., 0. ..., 0.33...])
>>> scaler.std_
array([ 0.81..., 0.81..., 1.24...])
>>> scaler.transform(X)
array([[ 0.  ..., -1.22...,  1.33...],
       [ 1.22...,  0.  ..., -0.26...],
       [-1.22...,  1.22..., -1.06...]])
The scaler instance can then be used on new data to transform it the same way it did on the training set:
>>> scaler.transform([[-1., 1., 0.]])
array([[-2.44..., 1.22..., -0.26...]])
It is possible to disable either centering or scaling by passing with_mean=False or with_std=False to the constructor of StandardScaler.
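One common use for with_mean=False is sparse input: centering a sparse matrix would turn its zero entries into nonzeros and destroy the sparsity, while scaling alone leaves them untouched. A short sketch (the toy matrix below is illustrative, not from the original document):

```python
import numpy as np
from scipy import sparse
from sklearn import preprocessing

# A small sparse matrix in CSR format; most entries are zero.
X_sparse = sparse.csr_matrix([[1., 0.,  2.],
                              [0., 0.,  0.],
                              [0., 3., -1.]])

# with_mean=False skips centering, so StandardScaler can operate on
# sparse input: each feature is only divided by its standard deviation,
# and zero entries stay exactly zero.
scaler = preprocessing.StandardScaler(with_mean=False).fit(X_sparse)
X_scaled = scaler.transform(X_sparse)

print(X_scaled.toarray())
```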
4.2.1.1. Scaling features to a range
An alternative standardization is scaling features to lie between a given minimum and maximum value, often between zero and one. This can be achieved using MinMaxScaler.
The motivation for using this scaling includes robustness to very small standard deviations of features and preservation of zero entries in sparse data.
Here is an example to scale a toy data matrix to the [0, 1] range:
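The example itself was lost in this copy of the document; a minimal sketch using MinMaxScaler (whose default feature_range is (0, 1)) on the same toy matrix as above:

```python
import numpy as np
from sklearn import preprocessing

X_train = np.array([[ 1., -1.,  2.],
                    [ 2.,  0.,  0.],
                    [ 0.,  1., -1.]])

# MinMaxScaler maps each feature to [0, 1] by default using
#   X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
min_max_scaler = preprocessing.MinMaxScaler()
X_train_minmax = min_max_scaler.fit_transform(X_train)

print(X_train_minmax)
```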