Using Random Forests in scikit-learn
Source: 程序博客网, 2024/04/30 11:14
A random forest is a widely used ensemble classifier that takes decision trees as its base classifiers and combines their predictions by averaging to decide a sample's class. A concrete implementation is available in scikit-learn, Python's machine-learning package. Basic usage:
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=10)
model.fit(train_x, train_y)
Here train_x is the training feature matrix and train_y holds the corresponding sample labels.
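To make the snippet above runnable end to end, the sketch below substitutes a synthetic dataset from make_classification for the article's unspecified train_x/train_y (the dataset sizes and random seeds are illustrative assumptions), and adds the prediction and scoring steps that would normally follow fitting:

```python
# Runnable sketch of the fit/predict cycle, using synthetic data
# in place of the article's train_x/train_y.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
train_x, test_x, train_y, test_y = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(train_x, train_y)

pred = model.predict(test_x)       # predicted class label for each test sample
acc = model.score(test_x, test_y)  # mean accuracy on the held-out set
```

predict returns one label per row of test_x, and score is simply the fraction of those labels that match test_y.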
The full signature of RandomForestClassifier is:
sklearn.ensemble.RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_split=1e-07, bootstrap=True, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, class_weight=None)
The main parameters are:
- n_estimators : the number of trees in the forest; default 10.
- criterion : the impurity criterion used to evaluate node splits, either "gini" (Gini impurity) or "entropy" (information gain); default "gini". This parameter is specific to the decision-tree classifiers.
- max_features : the number of features considered when looking for the best split.
  If int, then consider max_features features at each split.
  If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
  If "auto", then max_features=sqrt(n_features).
  If "sqrt", then max_features=sqrt(n_features) (same as "auto").
  If "log2", then max_features=log2(n_features).
  If None, then max_features=n_features.
- max_depth : the maximum depth of a tree. Default None, in which case nodes are expanded until every leaf is pure or contains fewer than min_samples_split samples.
- min_samples_split : (default=2) the minimum number of samples required to split an internal node. If int, a node is split only when it contains at least min_samples_split samples. If float, min_samples_split is a fraction and the minimum count is ceil(min_samples_split * n_samples).
- min_samples_leaf : the minimum number of samples required at a leaf node.
- min_weight_fraction_leaf : The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
- max_leaf_nodes : the maximum number of leaf nodes per tree; unlimited if None.
- min_impurity_split : float, optional (default=1e-7). Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold; otherwise it is a leaf. (Deprecated in later scikit-learn releases in favor of min_impurity_decrease.)
- bootstrap : boolean, optional (default=True). Whether bootstrap samples are used when building trees.
- oob_score : bool (default=False). Whether to use out-of-bag samples to estimate the generalization accuracy.
- n_jobs : integer, optional (default=1). The number of cores used for parallel computation; -1 means use all available cores.
- random_state : int, RandomState instance or None, optional (default=None). If int, random_state is the seed used by the random number generator; if a RandomState instance, it is used as the generator; if None, the generator is the RandomState instance used by np.random.
- verbose : int, optional (default=0). Controls the verbosity of the tree-building process.
- warm_start : bool, optional (default=False). When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new forest.
- class_weight : dict, list of dicts, "balanced", "balanced_subsample" or None, optional (default=None). Weights associated with classes in the form {class_label: weight}. If not given, all classes are assumed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.
  The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data, as n_samples / (n_classes * np.bincount(y)).
  The "balanced_subsample" mode is the same as "balanced" except that weights are computed based on the bootstrap sample for every tree grown.
  For multi-output, the weights of each column of y will be multiplied.
  Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
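The "balanced" weighting formula quoted above is easy to check by hand. The toy label array below is an assumption for illustration; the arithmetic follows n_samples / (n_classes * np.bincount(y)):

```python
# Illustrating the "balanced" class-weight formula:
# weight_c = n_samples / (n_classes * count_c)
import numpy as np

y = np.array([0, 0, 0, 0, 1, 1])  # imbalanced toy labels: four 0s, two 1s
n_samples = len(y)                 # 6
n_classes = len(np.unique(y))      # 2
weights = n_samples / (n_classes * np.bincount(y))
# class 0: 6 / (2 * 4) = 0.75
# class 1: 6 / (2 * 2) = 1.50
```

The minority class receives the larger weight, which is exactly how the mode compensates for class imbalance.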
Important model attributes include:
- feature_importances_ : array of shape [n_features]. The feature importances; the higher the value, the more important the feature.
- n_features_ : int. The number of features used when fitting the model.
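A short sketch of reading feature_importances_ after fitting; the synthetic dataset and seeds are assumptions, not from the article:

```python
# Reading feature importances from a fitted forest.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=100, n_features=5, random_state=0)
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

importances = model.feature_importances_  # one value per feature
# The importances are normalized, so they sum to 1; the largest
# values mark the features the trees relied on most for splitting.
```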
Reference:http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier