1.10. Decision Trees : sklearn.tree.DecisionTreeClassifier

Source: Internet | Editor: 程序博客网 | Time: 2024/06/05 01:50

apply(X, check_input=True)[source]

Returns the index of the leaf node that each sample ends up in.

Parameters:

X : array_like or sparse matrix, shape = [n_samples, n_features]

The input samples. Internally, they are converted to dtype=np.float32, and a sparse matrix, if provided, is converted to a sparse csr_matrix.

check_input : boolean, (default=True)

Allows bypassing several input checks. Don’t use this parameter unless you know what you are doing.

Returns:

X_leaves : array_like, shape = [n_samples,]

For each datapoint x in X, return the index of the leaf x ends up in. Leaves are numbered within [0, self.tree_.node_count), possibly with gaps in the numbering.
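
A minimal sketch of what apply returns, assuming a toy one-feature dataset made up for illustration (the names X, y, clf and leaf_ids are not from the original). It also hints at why the numbering can have gaps: internal nodes and leaves share one id range.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data, made up for this example: one feature, two classes.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# One leaf index per input sample.
leaf_ids = clf.apply(X)
print(leaf_ids)

# Every index is a node id in [0, node_count). Internal nodes use the
# same numbering, so the ids of the leaves alone need not be contiguous.
print(clf.tree_.node_count)

# A leaf has no children; the tree marks this with -1 in children_left.
print(clf.tree_.children_left[leaf_ids])
```

Samples that land in the same leaf get the same index, so the result can be used to group samples by leaf.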

decision_path(X, check_input=True)[source]

Return the decision path in the tree.

New in version 0.18.

Parameters:

X : array_like or sparse matrix, shape = [n_samples, n_features]

The input samples. Internally, they are converted to dtype=np.float32, and a sparse matrix, if provided, is converted to a sparse csr_matrix.

check_input : boolean, (default=True)

Allows bypassing several input checks. Don’t use this parameter unless you know what you are doing.

Returns:

indicator : sparse csr array, shape = [n_samples, n_nodes]

Return a node indicator matrix in which a nonzero element at (i, j) indicates that sample i passes through node j.

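As the second figure and the figure below show fairly clearly, the second number in each pair of parentheses is a node index; read in sequence, these indices form the decision path.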
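
The same reading of the indicator matrix can be sketched in code. This is an illustration under assumptions: the toy dataset and the variable names (indicator, paths, leaf_ids) are invented here, and the per-row slicing relies on the standard CSR attributes indptr and indices.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data, made up for this example.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Sparse indicator: entry (i, j) is nonzero if sample i passes through node j.
indicator = clf.decision_path(X)
leaf_ids = clf.apply(X)

paths = []
for i in range(X.shape[0]):
    # The column indices of the nonzero entries in row i are the visited node ids.
    start, stop = indicator.indptr[i], indicator.indptr[i + 1]
    node_ids = indicator.indices[start:stop].tolist()
    paths.append(node_ids)
    print(f"sample {i}: nodes {node_ids}, leaf {leaf_ids[i]}")
```

Each path starts at the root (node 0) and includes the leaf that apply reports for the same sample.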

feature_importances_

Return the feature importances.

The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance. (I don’t fully understand this.)

Returns:

feature_importances_ : array, shape = [n_features]
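
One way to read the definition: every split reduces the impurity criterion by some amount, that reduction is credited to the feature used for the split, and the per-feature totals are then normalized to sum to 1. The sketch below is only an illustration on invented data, where the second column is constant and therefore can never be chosen for a split.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Column 0 separates the classes; column 1 is constant and carries no information.
X = np.array([[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])
y = np.array([0, 0, 1, 1])

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# One value per feature; the values are normalized so that they sum to 1.
importances = clf.feature_importances_
print(importances)
print(importances.sum())
```

Here all of the impurity reduction comes from column 0, so it receives the full importance and the constant column receives none.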

fit(X, y, sample_weight=None, check_input=True, X_idx_sorted=None)[source]

Build a decision tree classifier from the training set (X, y).

Parameters:

X : array-like or sparse matrix, shape = [n_samples, n_features]

The training input samples. Internally, they are converted to dtype=np.float32, and a sparse matrix, if provided, is converted to a sparse csc_matrix.

y : array-like, shape = [n_samples] or [n_samples, n_outputs]

The target values (class labels) as integers or strings.

sample_weight : array-like, shape = [n_samples] or None

Sample weights. If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.

check_input : boolean, (default=True)

Allows bypassing several input checks. Don’t use this parameter unless you know what you are doing.

X_idx_sorted : array-like, shape = [n_samples, n_features], optional

The indexes of the sorted training input samples. If many trees are grown on the same dataset, this allows the ordering to be cached between trees. If None, the data will be sorted here. Don’t use this parameter unless you know what you are doing.

Returns:

self : object

Returns self.
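
A short sketch of a fit call, assuming invented toy data and hypothetical sample weights. It uses only X, y and sample_weight, and shows that string labels are accepted and that the fitted estimator itself is returned.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data, made up for this example.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array(["no", "no", "yes", "yes"])   # class labels given as strings
weights = np.array([1.0, 1.0, 2.0, 2.0])   # hypothetical per-sample weights

clf = DecisionTreeClassifier(random_state=0)
returned = clf.fit(X, y, sample_weight=weights)

# fit returns the estimator itself, which allows chaining such as
# DecisionTreeClassifier().fit(X, y).predict(X).
print(returned is clf)

# The distinct labels seen during fit, in sorted order.
print(clf.classes_)
```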

fit_transform(X, y=None, **fit_params)[source]

Fit to data, then transform it.

(I don’t fully understand this one, and it looks like it may soon no longer be usable?)

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns:

X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

get_params(deep=True)[source]

Get the parameters of this estimator (here, the classifier).

Parameters:

deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : mapping of string to any

Parameter names mapped to their values.
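
A minimal sketch of get_params. The chosen values (max_depth=3, random_state=0) and the name rebuilt are arbitrary and only for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(max_depth=3, random_state=0)

# A dict mapping constructor parameter names to their current values.
params = clf.get_params()
print(params["max_depth"], params["criterion"])

# The mapping can be passed back to the constructor to build an
# unfitted estimator with the same settings.
rebuilt = DecisionTreeClassifier(**params)
```

No fitting is needed: get_params only reports the constructor settings, not anything learned from data.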

predict(X, check_input=True)[source]

Predict class or regression value for X.

For a classification model, the predicted class for each sample in X is returned. For a regression model, the predicted value based on X is returned.

Parameters:

X : array-like or sparse matrix of shape = [n_samples, n_features]

The input samples. Internally, they are converted to dtype=np.float32, and a sparse matrix, if provided, is converted to a sparse csr_matrix.

check_input : boolean, (default=True)

Allows bypassing several input checks. Don’t use this parameter unless you know what you are doing.

Returns:

y : array of shape = [n_samples] or [n_samples, n_outputs]

The predicted classes, or the predicted values.
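
A small sketch of predict on unseen points, assuming an invented one-feature dataset in which values below about 1.5 belong to class 0 and values above it to class 1.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy training data, made up for this example.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# New samples must have the same number of features as the training data.
X_new = np.array([[0.5], [2.5]])
pred = clf.predict(X_new)
print(pred)
```

The result holds one predicted class label per row of X_new.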

predict_log_proba(X)[source]

Predict class log-probabilities of the input samples X.

(I don’t fully understand this.)

Parameters:

X : array-like or sparse matrix of shape = [n_samples, n_features]

The input samples. Internally, they are converted to dtype=np.float32, and a sparse matrix, if provided, is converted to a sparse csr_matrix.

Returns:

p : array of shape = [n_samples, n_classes], or a list of n_outputs such arrays if n_outputs > 1

The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
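
The log-probabilities are the natural logarithm of what predict_proba returns. The sketch below checks this on invented data; max_depth=1 is chosen here only so that the queried leaf stays impure and no probability is exactly zero (the log of zero would be -inf).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data, made up for this example.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 1, 1, 1, 0])

# A depth-1 tree cannot separate the classes fully, so one leaf stays mixed.
clf = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)

X_new = np.array([[4.0]])
proba = clf.predict_proba(X_new)
log_proba = clf.predict_log_proba(X_new)

print(proba)
print(log_proba)

# Exponentiating the log-probabilities recovers the probabilities.
print(np.allclose(np.exp(log_proba), proba))
```

Log-probabilities are mainly useful when many probabilities are multiplied together, since summing logs avoids numerical underflow.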

predict_proba(X, check_input=True)[source]

Predict class probabilities of the input samples X.

The predicted class probability is the fraction of samples of the same class in a leaf.

Parameters:

X : array-like or sparse matrix of shape = [n_samples, n_features]

The input samples. Internally, they are converted to dtype=np.float32, and a sparse matrix, if provided, is converted to a sparse csr_matrix.

check_input : boolean, (default=True)

Allows bypassing several input checks. Don’t use this parameter unless you know what you are doing.

Returns:

p : array of shape = [n_samples, n_classes], or a list of n_outputs such arrays if n_outputs > 1

The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
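
To make "the fraction of samples of the same class in a leaf" concrete, here is a sketch on invented data. With max_depth=1 (an arbitrary choice for this illustration) the tree splits once, between x = 1 and x = 2, leaving a pure left leaf and a mixed right leaf.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data, made up for this example.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 1, 1, 1, 0])

clf = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)

# Left leaf holds labels [0, 0]: all class 0.
# Right leaf holds labels [1, 1, 1, 0]: 1/4 class 0, 3/4 class 1.
proba = clf.predict_proba(np.array([[0.0], [4.0]]))
print(clf.classes_)  # column order of proba
print(proba)
```

Each row sums to 1, and column k corresponds to clf.classes_[k].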

score(X, y, sample_weight=None)[source]

Returns the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:

X : array-like, shape = (n_samples, n_features)

Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like, shape = [n_samples], optional

Sample weights.

Returns:

score : float

Mean accuracy of self.predict(X) with respect to y.
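
A sketch showing that score is the fraction of correct predictions. The train and test arrays are invented, and the last test label is deliberately chosen to disagree with the tree.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy training data, made up for this example.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0, 0, 1, 1])
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Hypothetical test set; the third label contradicts what the tree predicts.
X_test = np.array([[0.5], [2.5], [2.6]])
y_test = np.array([0, 1, 0])

acc = clf.score(X_test, y_test)

# The same number computed by hand from predict.
manual = np.mean(clf.predict(X_test) == y_test)
print(acc, manual)
```

Two of the three test samples are classified correctly, so both values come out as 2/3.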

set_params(**params)[source]

Set the parameters of this estimator (here, the classifier).

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:

self : estimator instance
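
A brief sketch of both uses, on a plain estimator and on a nested object. The pipeline and its step names ("scale", "tree") are assumptions chosen for this illustration, not part of the original.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Simple estimator: pass parameter names directly.
clf = DecisionTreeClassifier()
clf.set_params(max_depth=2, criterion="entropy")
print(clf.max_depth, clf.criterion)

# Nested object: address a step's parameter as <component>__<parameter>.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("tree", DecisionTreeClassifier()),
])
returned = pipe.set_params(tree__max_depth=4)
print(pipe.named_steps["tree"].max_depth)
```

Because set_params returns the estimator itself, calls can be chained, and the double-underscore form is the same naming that grid-search parameter grids use.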

0 0