Python Machine Learning, Day 7


Let's do it!

1.1. Generalized Linear Models 


The following are a set of methods intended for regression in which the target value is expected to be a linear combination of the input variables. In mathematical notation, if \hat{y} is the predicted value:


\hat{y}(w, x) = w_0 + w_1 x_1 + ... + w_p x_p

Across the module, we designate the vector w = (w_1,..., w_p) as coef_ and w_0 as intercept_.


To perform classification with generalized linear models, see Logistic regression.

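As a minimal sketch of that pointer (this snippet and its toy points are my own illustration, not part of the original page), LogisticRegression is fit and queried in the same way as the regressors below:

>>> from sklearn import linear_model
>>> clf = linear_model.LogisticRegression()
>>> clf = clf.fit([[0., 0.], [1., 1.], [2., 2.], [3., 3.]], [0, 0, 1, 1])   # two toy classes
>>> pred = clf.predict([[2.5, 2.5]])   # should assign class 1 to a point near the second group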

 


1.1.1. Ordinary Least Squares


LinearRegression fits a linear model with coefficients w = (w_1, ..., w_p) to minimize the residual sum of squares between the observed responses in the dataset, and the responses predicted by the linear approximation. Mathematically it solves a problem of the form:



\underset{w}{min\,} {|| X w - y||_2}^2

[Figure: sphx_glr_plot_ols_0011.png]

LinearRegression will take in its fit method arrays X, y and will store the coefficients w of the linear model in its coef_ member:



>>> from sklearn import linear_model
>>> reg = linear_model.LinearRegression()
>>> reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
>>> reg.coef_
array([ 0.5,  0.5])

However, coefficient estimates for Ordinary Least Squares rely on the independence of the model terms. When terms are correlated and the columns of the design matrix X have an approximate linear dependence, the design matrix becomes close to singular and as a result, the least-squares estimate becomes highly sensitive to random errors in the observed response, producing a large variance. This situation of multicollinearity can arise, for example, when data are collected without an experimental design.

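A minimal sketch of this instability, using made-up data with two nearly identical columns (the example itself is my own, not taken from the scikit-learn docs):

>>> import numpy as np
>>> from sklearn import linear_model
>>> rng = np.random.RandomState(0)
>>> x = rng.rand(20)
>>> X = np.column_stack([x, x + 1e-6 * rng.rand(20)])   # two almost linearly dependent columns
>>> y = x + 0.01 * rng.randn(20)
>>> reg_a = linear_model.LinearRegression().fit(X, y)
>>> reg_b = linear_model.LinearRegression().fit(X, y + 0.01 * rng.randn(20))   # tiny extra noise
>>> # reg_a.coef_ and reg_b.coef_ can differ by a large amount even though y barely changed,
>>> # while the sum of the two coefficients stays roughly 1 in both fits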


Examples:

  • Linear Regression Example


1.1.1.1. Ordinary Least Squares Complexity



This method computes the least squares solution using a singular value decomposition of X. If X is a matrix of size (n, p) this method has a cost of O(n p^2), assuming that n \geq p.

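To make that concrete, the sketch below (my own comparison, not from the docs) checks that LinearRegression without an intercept matches the SVD-based least-squares solution returned by numpy.linalg.lstsq on a random (n, p) matrix with n >= p:

>>> import numpy as np
>>> from sklearn import linear_model
>>> rng = np.random.RandomState(0)
>>> X = rng.rand(100, 3)                                   # n = 100 samples, p = 3 features
>>> y = X.dot(np.array([1., 2., 3.])) + 0.01 * rng.randn(100)
>>> w_svd = np.linalg.lstsq(X, y, rcond=None)[0]           # SVD-based least-squares solution
>>> reg = linear_model.LinearRegression(fit_intercept=False).fit(X, y)
>>> same = np.allclose(reg.coef_, w_svd)                   # expected to be True: both solve the same O(n p^2) problem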

1.1.2. Ridge Regression



Ridge regression addresses some of the problems of Ordinary Least Squares by imposing a penalty on the size of coefficients. The ridge coefficients minimize a penalized residual sum of squares,



\underset{w}{min\,} {{|| X w - y||_2}^2 + \alpha {||w||_2}^2}

Here, \alpha \geq 0 is a complexity parameter that controls the amount of shrinkage: the larger the value of \alpha, the greater the amount of shrinkage and thus the coefficients become more robust to collinearity.

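A minimal sketch of that shrinkage effect, reusing the toy data from the snippet below (the alpha comparison itself is my own illustration):

>>> import numpy as np
>>> from sklearn import linear_model
>>> X = [[0, 0], [0, 0], [1, 1]]
>>> y = [0, .1, 1]
>>> small_alpha = linear_model.Ridge(alpha=0.5).fit(X, y)
>>> large_alpha = linear_model.Ridge(alpha=100.0).fit(X, y)
>>> shrunk = np.linalg.norm(large_alpha.coef_) < np.linalg.norm(small_alpha.coef_)   # True: larger alpha, smaller coefficients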


[Figure: sphx_glr_plot_ridge_path_0011.png — Ridge coefficients as a function of the regularization]

As with other linear models, Ridge will take in its fit method arrays X, y and will store the coefficients w of the linear model in its coef_ member:



>>> from sklearn import linear_model
>>> reg = linear_model.Ridge(alpha=.5)
>>> reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
Ridge(alpha=0.5, copy_X=True, fit_intercept=True, max_iter=None,
      normalize=False, random_state=None, solver='auto', tol=0.001)
>>> reg.coef_
array([ 0.34545455,  0.34545455])
>>> reg.intercept_
0.13636...

Examples:

  • Plot Ridge coefficients as a function of the regularization 
  • Classification of text documents using sparse features

1.1.2.1. Ridge Complexity


This method has the same order of complexity as Ordinary Least Squares.


1.1.2.2. Setting the regularization parameter: generalized Cross-Validation


RidgeCV implements ridge regression with built-in cross-validation of the alpha parameter. The object works in the same way as GridSearchCV except that it defaults to Generalized Cross-Validation (GCV), an efficient form of leave-one-out cross-validation:



>>> from sklearn import linear_model
>>> reg = linear_model.RidgeCV(alphas=[0.1, 1.0, 10.0])
>>> reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
RidgeCV(alphas=[0.1, 1.0, 10.0], cv=None, fit_intercept=True, scoring=None,
    normalize=False)
>>> reg.alpha_
0.1

References

  • “Notes on Regularized Least Squares”, Rifkin & Lippert (technical report, course slides).


1.1.3. Lasso

The Lasso is a linear model that estimates sparse coefficients. It is useful in some contexts due to its tendency to prefer solutions with fewer parameter values, effectively reducing the number of variables upon which the given solution is dependent. For this reason, the Lasso and its variants are fundamental to the field of compressed sensing. Under certain conditions, it can recover the exact set of non-zero weights (see Compressive sensing: tomography reconstruction with L1 prior (Lasso)).


Mathematically, it consists of a linear model trained with \ell_1 prior as regularizer. The objective function to minimize is:



\underset{w}{min\,} { \frac{1}{2n_{samples}} ||X w - y||_2 ^ 2 + \alpha ||w||_1}

The lasso estimate thus solves the minimization of the least-squares penalty with \alpha ||w||_1 added, where \alpha is a constant and ||w||_1 is the \ell_1-norm of the parameter vector.


The implementation in the class Lasso uses coordinate descent as the algorithm to fit the coefficients. See Least Angle Regression for another implementation:

Lasso类中的实现使用坐标下降算法来拟合系数。查看最小角度回归 用于另一个实现:

>>> from sklearn import linear_model
>>> reg = linear_model.Lasso(alpha=0.1)
>>> reg.fit([[0, 0], [1, 1]], [0, 1])
Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)
>>> reg.predict([[1, 1]])
array([ 0.8])

Also useful for lower-level tasks is the function lasso_path that computes the coefficients along the full path of possible values.

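A small sketch of how lasso_path can be called directly (the toy data are made up; the call returns the alphas, the coefficient path and the dual gaps):

>>> import numpy as np
>>> from sklearn.linear_model import lasso_path
>>> X = np.array([[0., 0.], [1., 1.], [2., 2.], [3., 3.]])
>>> y = np.array([0., 1., 2., 2.5])
>>> alphas, coef_path, _ = lasso_path(X, y, n_alphas=5)
>>> # coef_path has shape (n_features, n_alphas): one column of coefficients per alpha,
>>> # going from the most regularized (sparsest) model to the least regularized one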


Examples:

  • Lasso and Elastic Net for Sparse Signals

  • Compressive sensing: tomography reconstruction with L1 prior (Lasso)

Note

 

Feature selection with Lasso

As the Lasso regression yields sparse models, it can thus be used to perform feature selection, as detailed in L1-based feature selection.

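A rough sketch of that idea using SelectFromModel as the wrapper (the wrapper choice and the toy data are my own assumptions; see the L1-based feature selection section for the documented recipe):

>>> import numpy as np
>>> from sklearn.linear_model import Lasso
>>> from sklearn.feature_selection import SelectFromModel
>>> X = np.array([[0., 0., 1.], [1., 1., 0.], [2., 2., 1.], [3., 3., 0.]])
>>> y = np.array([0., 1., 2., 3.])
>>> selector = SelectFromModel(Lasso(alpha=0.1), threshold=1e-5).fit(X, y)   # treat (near-)zero coefficients as dropped
>>> X_reduced = selector.transform(X)   # keeps only the columns the Lasso actually uses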

Note

 

Randomized sparsity 


For feature selection or sparse recovery, it may be interesting to use Randomized sparse models.


1.1.3.1. Setting regularization parameter


The alpha parameter controls the degree of sparsity of the coefficients estimated.

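A small sketch of that effect (make_regression and the particular alpha values are my own choices for illustration):

>>> import numpy as np
>>> from sklearn.linear_model import Lasso
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=50, n_features=20, n_informative=5, noise=1.0, random_state=0)
>>> strong = Lasso(alpha=5.0).fit(X, y)
>>> weak = Lasso(alpha=0.01, max_iter=10000).fit(X, y)   # a tiny alpha needs more iterations to converge
>>> n_strong = np.sum(strong.coef_ != 0)   # typically close to the 5 informative features
>>> n_weak = np.sum(weak.coef_ != 0)       # typically many more non-zero coefficients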

1.1.3.1.1. Using cross-validation


scikit-learn exposes objects that set the Lasso alpha parameter by cross-validation: LassoCV and LassoLarsCV. LassoLarsCV is based on the Least Angle Regression algorithm explained below.


For high-dimensional datasets with many collinear regressors, LassoCV is most often preferable. However, LassoLarsCV has the advantage of exploring more relevant values of the alpha parameter, and if the number of samples is very small compared to the number of features, it is often faster than LassoCV.

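A minimal sketch of the cross-validated variant (the synthetic dataset and the cv value are my own choices):

>>> from sklearn.linear_model import LassoCV
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=100, n_features=30, n_informative=5, noise=1.0, random_state=0)
>>> reg = LassoCV(cv=5).fit(X, y)
>>> chosen_alpha = reg.alpha_   # the alpha picked by cross-validation; reg.coef_ holds the refit coefficients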

[Figures: lasso_cv_1, lasso_cv_2]

1.1.3.1.2. Information-criteria based model selection


Alternatively, the estimator LassoLarsIC proposes to use the Akaike information criterion (AIC) and the Bayes Information criterion (BIC). It is a computationally cheaper alternative for finding the optimal value of alpha, as the regularization path is computed only once instead of k+1 times when using k-fold cross-validation. However, such criteria need a proper estimation of the degrees of freedom of the solution; they are derived for large samples (asymptotic results) and assume the model is correct, i.e. that the data are actually generated by this model. They also tend to break down when the problem is badly conditioned (more features than samples).

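A minimal sketch of the information-criterion route (the synthetic data are my own; only criterion='aic' / 'bic' comes from the documented API):

>>> from sklearn.linear_model import LassoLarsIC
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=100, n_features=30, n_informative=5, noise=1.0, random_state=0)
>>> model_bic = LassoLarsIC(criterion='bic').fit(X, y)
>>> model_aic = LassoLarsIC(criterion='aic').fit(X, y)
>>> # model_bic.alpha_ and model_aic.alpha_ are the alphas that minimise each criterion;
>>> # the whole regularization path is computed once per fit, not once per CV fold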

[Figure: sphx_glr_plot_lasso_model_selection_0011.png — Lasso model selection: Cross-Validation / AIC / BIC]

Examples:

  • Lasso model selection: Cross-Validation / AIC / BIC