[sklearn Applications 5] Feature Selection (Part 1): SelectFromModel
Source: Internet | Author: 大漠驼铃 | Editor: 程序博客网 | Published: 2024/06/04 22:13
The corresponding section of the official scikit-learn documentation: http://scikit-learn.org/stable/modules/feature_selection.html
sklearn version: 0.18.2
sklearn.feature_selection
The module can be used for feature selection/dimensionality reduction on sample sets, either to improve estimators’ accuracy scores or to boost their performance on very high-dimensional datasets.
In other words, this module performs feature selection / dimensionality reduction, which can both improve estimator accuracy and speed things up on very high-dimensional data.
There are many approaches to feature selection; this post covers the first one: selection via SelectFromModel.
sklearn.feature_selection.SelectFromModel
SelectFromModel is a meta-transformer that can be used along with any estimator that has a coef_ or feature_importances_ attribute after fitting. The features are considered unimportant and removed, if the corresponding coef_ or feature_importances_ values are below the provided threshold parameter. Apart from specifying the threshold numerically, there are built-in heuristics for finding a threshold using a string argument. Available heuristics are “mean”, “median” and float multiples of these like “0.1*mean”.
After fitting, the estimator exposes coef_ or feature_importances_; features whose corresponding values fall below the configured threshold are considered unimportant and are removed. coef_ (coefficients) applies to linear models, while models without coefficients (e.g. tree ensembles) expose feature_importances_ instead.
SelectFromModel covers approaches such as L1-based feature selection and tree-based feature selection.
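The threshold heuristics quoted above ("mean", "median", float multiples like "0.1*mean") can be sketched as follows. This is a minimal illustration using LogisticRegression on the iris dataset; the dataset and hyperparameter choices are mine, not the original author's:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# threshold="mean": keep features whose importance (here, the norm of
# the per-class coefficients) is at least the mean importance
selector = SelectFromModel(LogisticRegression(max_iter=1000), threshold="mean")
X_reduced = selector.fit_transform(X, y)
print(X.shape, "->", X_reduced.shape)

# A string multiple such as "0.5*mean" lowers the bar, so it keeps at
# least as many features as "mean" does
selector_loose = SelectFromModel(LogisticRegression(max_iter=1000),
                                 threshold="0.5*mean")
X_loose = selector_loose.fit_transform(X, y)
print(X.shape, "->", X_loose.shape)
```

Since 0.5*mean is never larger than mean, the second selector retains a superset of the first selector's features.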
L1-based feature selection
Linear models penalized with the L1 norm have sparse solutions: many of their estimated coefficients are zero. When the goal is to reduce the dimensionality of the data to use with another classifier, they can be used along with feature_selection.SelectFromModel to select the non-zero coefficients. In particular, sparse estimators useful for this purpose are the linear_model.Lasso for regression, and linear_model.LogisticRegression and svm.LinearSVC for classification.
For linear models with coefficients, L1 regularization yields a sparse solution (many coefficients are exactly zero), which is cheap to compute and makes such models natural feature selectors. For an explanation of why L1 regularization induces sparsity, see: http://blog.csdn.net/jinping_shi/article/details/52433975
```python
# class sklearn.feature_selection.SelectFromModel(estimator, threshold=None, prefit=False)
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso  # Lasso as an example of an L1-regularized linear model

lasso = Lasso()   # hyperparameters could be set here; defaults are used
lasso.fit(X, y)   # train the model; X and y must not contain missing values
model = SelectFromModel(lasso, prefit=True)  # prefit=True because lasso is already fitted
X_new = model.transform(X)  # drops features whose coefficients are 0; with
                            # get_dummies-encoded data, a feature is only dropped
                            # when all of its dummy columns are zeroed out
```
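The snippet above assumes X and y already exist. Here is a self-contained, runnable version on a synthetic regression problem; make_regression and its parameters are illustrative choices of mine:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# 100 samples, 10 features, only 3 of which are informative
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=0.1, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)  # the L1 penalty drives many coef_ to exactly 0
model = SelectFromModel(lasso, prefit=True)
X_new = model.transform(X)

print("kept features:", model.get_support())  # boolean mask over the 10 columns
print(X.shape, "->", X_new.shape)
```

For an L1-penalized estimator like Lasso, the default threshold is a tiny value (effectively "keep the non-zero coefficients"), so the selected columns are exactly those Lasso did not zero out.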
Tree-based feature selection
Tree-based estimators (see the sklearn.tree module and forest of trees in the sklearn.ensemble module) can be used to compute feature importances, which in turn can be used to discard irrelevant features (when coupled with the sklearn.feature_selection.SelectFromModel meta-transformer)
For models without coefficients (e.g. tree-based models), feature importances are computed instead, and irrelevant features are filtered out based on those importances.
```python
from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import RandomForestRegressor  # again, just as an example

rf = RandomForestRegressor()  # default parameters
rf.fit(X, y)
model = SelectFromModel(rf, prefit=True)  # prefit=True because rf is already fitted
X_new = model.transform(X)
```
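As before, a self-contained sketch may be clearer. This one uses a RandomForestClassifier on the iris dataset (the dataset and n_estimators value are illustrative assumptions), and also prints the importances so you can see what SelectFromModel thresholds on:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_iris(return_X_y=True)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("importances:", rf.feature_importances_)  # non-negative, sums to 1

# With no explicit threshold, the default here is the mean importance,
# so roughly the above-average features are kept
model = SelectFromModel(rf, prefit=True)
X_new = model.transform(X)
print(X.shape, "->", X_new.shape)
```

On iris the petal measurements carry most of the importance, so only a subset of the four columns typically survives.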