Decision Tree Key Points

Source: Internet. Editor: 程序博客网. Time: 2024/05/19 13:22

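Decision Trees

A decision tree can be read as a set of if-then rules.

Decision tree learning estimates a conditional probability model from the training set.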

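Loss function (regularized maximum likelihood):

$C_\alpha(T) = \sum_{t=1}^{|T|} N_t H_t(T) + \alpha |T|$, with $H_t(T) = -\sum_k \frac{N_{tk}}{N_t} \log \frac{N_{tk}}{N_t}$

where T is the decision tree, |T| is its number of leaf nodes, N_t is the number of samples in leaf t, and N_tk is the number of those samples that belong to class k.

(How is this derived from maximizing the likelihood?)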
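One way to see the link to maximum likelihood, as a sketch: assume each leaf t models the class label with its own multinomial distribution with parameters p_tk (this leaf-wise model and the symbol K for the number of classes are assumptions made here, not taken from the notes). Plugging the maximum likelihood estimates back into the negative log-likelihood gives the entropy term, and the α|T| penalty is then added as a regularizer:

```latex
% Leaf t holds N_t samples, N_{tk} of them in class k; p_{tk} is the class-k probability in leaf t.
\begin{align*}
L(T) &= \prod_{t=1}^{|T|} \prod_{k=1}^{K} p_{tk}^{N_{tk}},
  & \hat{p}_{tk} &= \frac{N_{tk}}{N_t} \quad \text{(maximum likelihood estimate)} \\
-\log L(T) &= -\sum_{t=1}^{|T|} \sum_{k=1}^{K} N_{tk} \log \frac{N_{tk}}{N_t}
  = \sum_{t=1}^{|T|} N_t H_t(T),
  & H_t(T) &= -\sum_{k=1}^{K} \frac{N_{tk}}{N_t} \log \frac{N_{tk}}{N_t} \\
C_\alpha(T) &= -\log L(T) + \alpha |T|
  = \sum_{t=1}^{|T|} N_t H_t(T) + \alpha |T|
\end{align*}
```

So minimizing the loss without the penalty is the same as maximizing the likelihood, and α trades fit against the number of leaves.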

Tree-based methods partition the feature space into a set of rectangles, and then fit a simple model (like a constant) in each one (the model need not be a constant). via ESL2 P305

General steps:

Feature selection

Decision tree generation

Decision tree pruning

Output : Tree

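Feature selection:

Maximize information gain / mutual information: g(Y, X) = H(Y) - H(Y|X)

Meaning of information gain: the degree to which knowing feature X reduces the uncertainty of the class Y.

Information gain ratio

Corrects IG's tendency to favor features with many distinct values.

However, IGR in turn favors attributes with few distinct values.

Gini index:

The Gini index measures impurity: the smaller it is, the purer the data.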
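The three criteria above can be sketched in a few lines of Python. This is a minimal illustration on a made-up four-sample dataset; the function names and the toy data are assumptions introduced here, and the gain ratio divides by the entropy of the feature's own value distribution (the usual "split information").

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Empirical entropy H(Y) in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index 1 - sum_k p_k^2; 0 means a pure node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def conditional_entropy(feature, labels):
    """H(Y|X): entropy of the labels within each feature value, weighted by group size."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def information_gain(feature, labels):
    """g(Y, X) = H(Y) - H(Y|X)."""
    return entropy(labels) - conditional_entropy(feature, labels)

def gain_ratio(feature, labels):
    """Information gain divided by the entropy of the feature's own values."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0

# Hypothetical toy data: `weather` separates the classes perfectly,
# and so does `sample_id`, but only because every value is unique.
labels = ["yes", "yes", "no", "no"]
weather = ["sunny", "sunny", "rainy", "rainy"]
sample_id = [1, 2, 3, 4]

print(information_gain(weather, labels), information_gain(sample_id, labels))
print(gain_ratio(weather, labels), gain_ratio(sample_id, labels))
print(gini(labels))
```

Both features reach the same information gain here, but the gain ratio penalizes the ID-like feature, which is the bias correction described above.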

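Generation

ID3, C4.5, and CART respectively use:

IG_max (ID3: choose the feature with the largest information gain)

IGR_max from IG>IG_ave (C4.5: among features whose information gain is above average, choose the one with the largest gain ratio)

Gini_min (CART: choose the split with the smallest Gini index)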
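The C4.5 rule (shortlist by above-average gain, then pick the best gain ratio) can be sketched as follows. The function name, the input format, and the scores are hypothetical; the sketch also uses `>=` rather than a strict `>` so the shortlist is never empty when all gains are equal, which is an assumption of this example.

```python
def c45_select(candidates):
    """Pick a feature from a dict: name -> (information_gain, gain_ratio)."""
    avg_gain = sum(gain for gain, _ in candidates.values()) / len(candidates)
    # Step 1: keep only features whose information gain is at least average.
    shortlisted = {name: s for name, s in candidates.items() if s[0] >= avg_gain}
    # Step 2: among those, take the one with the largest gain ratio.
    return max(shortlisted, key=lambda name: shortlisted[name][1])

# Made-up scores for illustration only.
scores = {
    "sample_id": (1.0, 0.5),   # highest gain, but many distinct values
    "outlook": (0.9, 0.9),
    "windy": (0.1, 0.8),       # decent ratio, but gain is below average
}

id3_choice = max(scores, key=lambda name: scores[name][0])
c45_choice = c45_select(scores)
print(id3_choice, c45_choice)
```

With these scores, plain information gain picks the ID-like feature, while the two-step rule filters out the low-gain feature first and then prefers the better gain ratio.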

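Pruning

Pre-pruning

Evaluate each node before expanding it; if the split would lower accuracy on held-out data (or not improve it), do not expand the node.

Post-pruning

Minimize the loss function:

starting from the leaf nodes and working recursively bottom-up, collapse a subtree into a leaf whenever doing so does not increase the loss.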
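A minimal sketch of bottom-up post-pruning, assuming the loss is the regularized one, C_α(T) = Σ_t N_t H_t(T) + α|T|, computed from class counts stored at every node. The `Node` class, the helper names, and the toy counts are all hypothetical.

```python
from dataclasses import dataclass, field
from math import log2

@dataclass
class Node:
    counts: dict                                   # class label -> training samples at this node
    children: list = field(default_factory=list)   # empty list means the node is a leaf

def leaf_cost(counts):
    """N_t * H_t for a single leaf, computed from its class counts."""
    n = sum(counts.values())
    return -sum(c * log2(c / n) for c in counts.values() if c > 0)

def leaves(node):
    """All leaf nodes below (or equal to) `node`."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]

def subtree_cost(node, alpha):
    """C_alpha of the subtree rooted at `node`."""
    ls = leaves(node)
    return sum(leaf_cost(leaf.counts) for leaf in ls) + alpha * len(ls)

def prune(node, alpha):
    """Prune children first, then collapse this node if that does not raise the loss."""
    for child in node.children:
        prune(child, alpha)
    if node.children:
        collapsed = leaf_cost(node.counts) + alpha  # loss if this node became one leaf
        if collapsed <= subtree_cost(node, alpha):
            node.children = []
    return node

def make_tree():
    # Toy tree: a root with two leaves.
    return Node({"a": 6, "b": 4}, [Node({"a": 5, "b": 1}), Node({"a": 1, "b": 3})])

kept = prune(make_tree(), alpha=1.0)
collapsed_tree = prune(make_tree(), alpha=3.0)
print(len(leaves(kept)), len(leaves(collapsed_tree)))
```

A small α keeps the split because the entropy reduction outweighs the penalty for the extra leaf; a larger α makes the single leaf cheaper, so the split is removed.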

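Some tricks & issues (ESL2 P310-313)

Handling continuous values (ESL2 P310, Categorical Predictors)

Bi-partition: find the split point that maximizes Gain.

When a feature takes many distinct values, this tends to overfit.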
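The bi-partition search can be sketched as follows, assuming candidate thresholds are the midpoints between adjacent distinct sorted values and that Gain means information gain. The function names and the toy data are made up for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Empirical entropy in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, gain) for the binary split `value <= threshold` with the largest gain."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best_gain, best_t = 0.0, None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best_gain:
            best_gain, best_t = gain, (lo + hi) / 2
    return best_t, best_gain

# Hypothetical toy data: the classes separate cleanly between 2.0 and 3.0.
threshold, gain = best_threshold([3.0, 1.0, 4.0, 2.0], ["yes", "no", "yes", "no"])
print(threshold, gain)
```

The number of candidate thresholds grows with the number of distinct values, which gives the search more chances to fit noise.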

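Handling missing values (Zhou Zhihua, P85)

Missing values when building the tree: computing the feature selection criterion

Taking Gain as an example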
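As a sketch of the usual weighted formulation: the gain is computed on the subset of samples whose value of attribute a is observed and then scaled by the fraction of samples that are not missing. The symbols below (ρ, the tilde quantities, and the sample weights w_x, initially 1) follow a common textbook notation and are assumptions here, since the notes do not spell out the formula.

```latex
% \tilde{D}: samples in D whose value of attribute a is observed; w_x: weight of sample x.
% \tilde{D}_k: samples of \tilde{D} in class k; \tilde{D}^v: samples of \tilde{D} with a = v.
\begin{align*}
\rho &= \frac{\sum_{x \in \tilde{D}} w_x}{\sum_{x \in D} w_x}, \qquad
\tilde{p}_k = \frac{\sum_{x \in \tilde{D}_k} w_x}{\sum_{x \in \tilde{D}} w_x}, \qquad
\tilde{r}_v = \frac{\sum_{x \in \tilde{D}^v} w_x}{\sum_{x \in \tilde{D}} w_x} \\
\operatorname{Ent}(\tilde{D}) &= -\sum_{k} \tilde{p}_k \log_2 \tilde{p}_k \\
\operatorname{Gain}(D, a) &= \rho \times \Bigl( \operatorname{Ent}(\tilde{D})
  - \sum_{v} \tilde{r}_v \operatorname{Ent}(\tilde{D}^v) \Bigr)
\end{align*}
```

With no missing values, ρ = 1 and the expression reduces to the ordinary information gain.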

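Missing values at prediction time

Send the sample down every branch with a weight proportional to the share of samples taking each attribute value, i.e. assign it to the child nodes probabilistically.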

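Multivariate decision trees (Zhou Zhihua, P88) / Linear Combination Splits (ESL)

Improves classification ability at the cost of interpretability.

The Loss Matrix ??

Adaptively choosing α in the loss function

Use weakest link pruning.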

Why binary splits

The problem is that multiway splits fragment the data too quickly, leaving insufficient data at the next level down. Hence we would want to use such splits only when needed. Since multiway splits can be achieved by a series of binary splits, the latter are preferred.

Some drawbacks

Instability (Instability of Trees) and lack of smoothness (Lack of Smoothness)

Adaboosting Tree
