[Sklearn Applications 6] Feature Selection (Part 2)

Source: Internet · Editor: 程序博客网 · Date: 2024/05/16 18:32

This topic is covered in the scikit-learn documentation: http://scikit-learn.org/stable/modules/feature_selection.html
sklearn version: 0.18.2

Besides the main approaches covered in the previous part, L1-based feature selection and tree-based feature selection, there are some less commonly used methods, such as the two described on the official site: removing features with low variance and univariate feature selection.

Removing features with low variance

The official documentation's full explanation of this method is shown below.
[Screenshot of the official documentation omitted]

Set a variance threshold and filter out features whose variance falls below it.
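The variance-threshold idea can be sketched with scikit-learn's `VarianceThreshold` transformer. The toy data and the threshold value of 0.2 below are illustrative choices, not from the original post:

```python
from sklearn.feature_selection import VarianceThreshold

# Toy dataset: the first two columns vary little across samples,
# so a variance filter should drop them.
X = [[0, 2, 0],
     [0, 1, 4],
     [1, 1, 1],
     [0, 1, 2]]

# Remove every feature whose variance is below 0.2
# (threshold chosen for illustration)
selector = VarianceThreshold(threshold=0.2)
X_new = selector.fit_transform(X)

print(selector.variances_)  # per-feature variances: [0.1875, 0.1875, 2.1875]
print(X_new.shape)          # only the third feature survives: (4, 1)
```

Note that `VarianceThreshold` only looks at the features `X`, not the target `y`, which is why it is usable for unsupervised preprocessing.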

Univariate feature selection

Univariate feature selection works by selecting the best features based on univariate statistical tests. It can be seen as a preprocessing step to an estimator. Scikit-learn exposes feature selection routines as objects that implement the transform method:

  • SelectKBest removes all but the k highest scoring features # keep the k top-scoring features
  • SelectPercentile removes all but a user-specified highest scoring percentage of features # keep a top-scoring percentage of features

These objects take as input a scoring function that returns univariate scores and p-values.

  • For regression: f_regression, mutual_info_regression
  • For classification: chi2, f_classif, mutual_info_classif
>>> from sklearn.datasets import load_iris
>>> from sklearn.feature_selection import SelectKBest
>>> from sklearn.feature_selection import chi2
>>> iris = load_iris()                         # load the dataset
>>> X, y = iris.data, iris.target
>>> X.shape                                    # the original data has 4 features
(150, 4)
>>> X_new = SelectKBest(chi2, k=2).fit_transform(X, y)  # set the scoring function and the number of features to keep
>>> X_new.shape                                # 2 features remain after selection
(150, 2)