Ensemble summary: Kaggle-Ensemble-Guide

Source: Internet | Published by: mac os x 10.6.8 | Editor: 程序博客网 | Time: 2024/05/01 00:17

Today I came across Kaggle-Ensemble-Guide, which has a detailed introduction as well as code.

Repository: https://github.com/vzhangmeng726/Kaggle-Ensemble-Guide

Usage guide: http://mlwave.com/kaggle-ensembling-guide/

It mainly covers four methods: voting ensembles, averaging, rank averaging, and the geometric mean (geomean).

Voting

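Voting means majority rule. With several weak classifiers, you can show with basic probability that the combined vote is more likely to be correct than any single one.
If the classifiers are too highly correlated, the final result is poor, so pick classifiers that are not strongly correlated with each other.
Weighting: give some votes more weight. For example, with 5 classifiers in total, the strongest one counts for three votes and each of the others counts for one. That way, the only way for the inferior models to overrule the best model (expert) is for them to collectively (and confidently) agree on an alternative.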
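To make the probability argument and the 3-1-1-1-1 weighting concrete, here is a minimal sketch. The function names `majority_accuracy` and `weighted_vote` are my own, and the accuracy calculation assumes the models err independently, which real models rarely do.

```python
from collections import Counter
from math import comb


def majority_accuracy(n_models: int, p: float) -> float:
    """Chance that a strict majority of n independent models is correct,
    assuming each model is right with probability p (binary problem)."""
    need = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(need, n_models + 1)
    )


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight for one sample."""
    tally = Counter()
    for label, weight in zip(predictions, weights):
        tally[label] += weight
    return tally.most_common(1)[0][0]


# Three independent models at 70% accuracy: majority vote is right ~78.4% of the time.
print(round(majority_accuracy(3, 0.7), 3))

# Five models, the strongest (listed first) gets three votes, the rest one each.
weights = [3, 1, 1, 1, 1]
print(weighted_vote(["A", "B", "B", "B", "A"], weights))  # expert plus one ally wins
print(weighted_vote(["A", "B", "B", "B", "B"], weights))  # all four others overrule
```

With these weights the expert only loses when every other model agrees on the same alternative, which matches the quoted sentence.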

Averaging

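Averaging helps with the overfitting problem: the mean of several models' predictions is usually less overfit than any single one.
Rank averaging: convert each model's predictions to ranks first, then average the ranks. This matters because the outputs of different classifiers may not be on the same scale.
For streaming data, where the full set of predictions is not available at once, use the ranks from historical data (historical ranks).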
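A small sketch of the three averaging variants, assuming the predictions are stacked in an array of shape (n_models, n_samples). The helper names are mine, the rank normalisation to [0, 1] is one reasonable choice rather than the guide's exact script, and `rank_average` assumes more than one sample.

```python
import numpy as np
from scipy.stats import rankdata


def mean_average(preds):
    """Plain arithmetic mean across models."""
    return np.mean(preds, axis=0)


def rank_average(preds):
    """Average per-model ranks, then rescale to [0, 1]."""
    ranks = np.array([rankdata(p) for p in preds])  # ties get their average rank
    avg = ranks.mean(axis=0)
    return (avg - 1) / (preds.shape[1] - 1)


def geometric_mean(preds, eps=1e-15):
    """Geometric mean across models; values are clipped so log() is defined."""
    return np.exp(np.mean(np.log(np.clip(preds, eps, None)), axis=0))


# Two models that agree on the ordering but score on very different scales.
preds = np.array([
    [0.1, 0.9, 0.5],
    [0.001, 0.003, 0.002],
])
print(mean_average(preds))
print(rank_average(preds))
print(geometric_mean(preds))
```

In the plain mean the second model barely moves the result because its scores are tiny, while the rank average gives both models an equal say.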

Stacked Generalization & Blending

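Wow, this one looks like the most trouble. The rough idea: use a first layer to produce an intermediate result, then wrap another method around that intermediate result to produce the final result.
Stacking vs. blending: as far as I can tell, the difference is whether you use k-fold. Blending picks one fixed holdout set to test on.
It is clearer when read alongside the code: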
- https://github.com/emanuele/kaggle_pbr/blob/master/blend.py
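- The first layer has 5 different classifiers in total
- The training set is X, its labels are y, and the set to predict is X_Submission
- Split the original X and y with k-fold into a small train part and a small test part (Xtrain, Xtest, Ytrain, Ytest)
- Loop over the 5 classifiers i
  - Loop over the k folds j
    - Take the Xtrain and Ytrain of the current split and train a model with the current classifier
    - Use that model to predict this split's Xtest (once all k folds are done, this gives the complete X-new-train column for i)
    - Run the model on the whole prediction set X_Submission to get blend_test_j
  - Average the blend_test_j predictions above to get classifier i's prediction for X_Submission
After the loop over i finishes, you have two new datasets:
1. X-new-train: the number of rows equals the number of samples in X, the number of columns equals the number of classifiers, and each value is the prediction of classifier i
2. X-new-test: the number of rows equals the number of samples in X_Submission, the number of columns equals the number of classifiers, and each value is the prediction of classifier i
Then use X-new-train as the new training set with y as the labels, and fit an LR on top of it to get the model.
Finally, use X-new-test as the prediction set and predict the final result.
Hmm, I think I still made that sound complicated Orz. Really it is just this: the n classifiers' results on the train set form the new train set, their results on the predict set form the new predict set, and the labels stay the same. That's it.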
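The walkthrough above can be sketched with scikit-learn roughly as follows. This is my own condensed version, not the linked blend.py itself: it assumes a binary problem, uses two classifiers instead of five to stay short, and runs on synthetic data from `make_classification`. The name `stack_features` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold


def stack_features(clfs, X, y, X_submission, n_folds=5, seed=0):
    """Build X_new_train (out-of-fold predictions) and X_new_test
    (fold-averaged predictions on X_submission), one column per classifier."""
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    folds = list(skf.split(X, y))
    X_new_train = np.zeros((X.shape[0], len(clfs)))
    X_new_test = np.zeros((X_submission.shape[0], len(clfs)))
    for i, clf in enumerate(clfs):
        blend_test_j = np.zeros((X_submission.shape[0], n_folds))
        for j, (train_idx, test_idx) in enumerate(folds):
            clf.fit(X[train_idx], y[train_idx])
            # Predictions for rows the model did not see during this fit.
            X_new_train[test_idx, i] = clf.predict_proba(X[test_idx])[:, 1]
            blend_test_j[:, j] = clf.predict_proba(X_submission)[:, 1]
        # Average the per-fold predictions on the submission set.
        X_new_test[:, i] = blend_test_j.mean(axis=1)
    return X_new_train, X_new_test


# Synthetic stand-ins for X, y and X_Submission.
X_all, y_all = make_classification(n_samples=300, n_features=10, random_state=0)
X, y, X_submission = X_all[:200], y_all[:200], X_all[200:]

clfs = [
    RandomForestClassifier(n_estimators=20, random_state=0),
    ExtraTreesClassifier(n_estimators=20, random_state=0),
]
X_new_train, X_new_test = stack_features(clfs, X, y, X_submission)

# Second layer: LR trained on the first layer's predictions, same labels y.
stacker = LogisticRegression().fit(X_new_train, y)
y_submission = stacker.predict_proba(X_new_test)[:, 1]
```

The key point is that every value in X_new_train comes from a model that never saw that row during training, so the second-layer LR is not fitted on leaked predictions.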

I. Creating ensembles from submission files

The simple approach: combine the results that other people have already submitted.

1. Voting ensemble

2. Averaging

3. Rank averaging

II. Stacked Generalization & Blending

The advanced approach: fuse multiple models.

1. Stacked generalization

The basic idea behind stacked generalization is to use a pool of base classifiers and then use another classifier to combine their predictions, with the aim of reducing the generalization error.

The procedure is as follows:

1) Split the training set into two disjoint sets.
2) Train several base learners on the first part.
3) Test the base learners on the second part.
4) Using the predictions from 3) as the inputs, and the correct responses as the outputs, train a higher level learner.

Note that steps 1) to 3) are the same as cross-validation, but instead of using a winner-takes-all approach, we combine the base learners, possibly nonlinearly.
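The procedure above, with a single fixed split instead of k-fold, might look like the sketch below. The choice of base learners (K-NN and a shallow decision tree), the 50/50 split, and the synthetic data are all my own assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Split the training set into two disjoint parts.
X_a, X_b, y_a, y_b = train_test_split(X, y, test_size=0.5, random_state=0)

# Train several base learners on the first part.
base_learners = [
    KNeighborsClassifier(),
    DecisionTreeClassifier(max_depth=3, random_state=0),
]
for clf in base_learners:
    clf.fit(X_a, y_a)

# Test the base learners on the second part; each learner yields one column.
meta_X = np.column_stack([clf.predict_proba(X_b)[:, 1] for clf in base_learners])

# Train the higher level learner on those predictions and the true labels.
meta_learner = LogisticRegression().fit(meta_X, y_b)
print(meta_learner.coef_)  # how much the combiner relies on each base learner
```

Unlike the k-fold variant, the base learners here never see the second half, so less data goes into each first-layer model.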

2. Blending

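In many places the word blending is used interchangeably with stacking. Here it is brought up separately. I did not fully understand it, but the meaning seems to be this: the stacking method above trains on the first part of the data and tests on the second part, much like cross-validation, and so makes full use of the whole dataset for model training. Blending sets aside an extra portion for testing, which avoids having the stacker and the generalizers use the same data.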

3. Stacking with logistic regression

Nothing special here: use LR as the stacker.

4. Stacking with non-linear algorithms

After the linear stackers come the non-linear ones: GBM, KNN, NN, RF and ET (pointers welcome: which classifier is ET?)...

Non-linear algorithms find useful interactions between the original features and the meta-model features.
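One way to let a non-linear stacker see both the original features and the meta-model features is to concatenate them, as in this minimal sketch. It assumes a single base model (logistic regression) and a gradient boosting stacker on synthetic data; both choices are mine and only illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Out-of-fold probabilities from the base model: the meta-model feature.
oof = cross_val_predict(
    LogisticRegression(max_iter=1000), X, y, cv=5, method="predict_proba"
)[:, 1]

# Original features plus the meta feature, side by side.
X_meta = np.column_stack([X, oof])

# A tree-based stacker can split on the base prediction and raw features together.
stacker = GradientBoostingClassifier(n_estimators=50, random_state=0)
stacker.fit(X_meta, y)
```

Swapping `GradientBoostingClassifier` for scikit-learn's `ExtraTreesClassifier` (extremely randomized trees, which is what ET stands for in the guide) gives an ET stacker.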

5. Feature weighted linear stacking

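A method from a Netflix team in 2009, called Feature-Weighted Linear Stacking (FWLS). Ordinary stacking just uses a linear regression to combine the different models through linear weight parameters. Here each weight is itself a linear combination of features, so the whole model expands into a set of feature*model combinations, which increases its expressive power. (The figure below is from the original paper. It is pretty crude, but good enough to follow :D)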
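A minimal sketch of the feature*model idea: if g_i(x) are the model predictions and f_j(x) the meta-features, FWLS fits one linear weight per product f_j(x) * g_i(x). The helper name `fwls_design`, the synthetic data, and the use of plain least squares (no regularisation) are my assumptions.

```python
import numpy as np


def fwls_design(meta_features, model_preds):
    """Return every product f_j(x) * g_i(x) as one column.

    meta_features: shape (n_samples, n_meta), model_preds: shape (n_samples, n_models).
    Column order: for each meta-feature j, all models i.
    """
    n_samples = model_preds.shape[0]
    products = meta_features[:, :, None] * model_preds[:, None, :]
    return products.reshape(n_samples, -1)


rng = np.random.default_rng(0)
g = rng.normal(size=(200, 2))            # predictions of two hypothetical models
f1 = rng.uniform(size=200)               # one hypothetical meta-feature
F = np.column_stack([np.ones(200), f1])  # constant feature keeps plain stacking as a special case

# Synthetic target whose blending weights drift with f1.
y = (0.3 + 0.5 * f1) * g[:, 0] + (0.7 - 0.5 * f1) * g[:, 1]

A = fwls_design(F, g)
v, *_ = np.linalg.lstsq(A, y, rcond=None)
print(v.round(3))  # weights for [1*g1, 1*g2, f1*g1, f1*g2]
```

Because the constant column is included, a model with only that column reduces to ordinary linear stacking, and the extra columns are what let the weights vary with the meta-feature.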

6. Quadratic linear stacking of models

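Similar to the feature*model combinations above. Here, interactions wrap one more layer of model*model combinations around all the generalizer outputs. Taking the figure above as an example, you create new features such as SVD*K-NN or SVD*RBM, much like quadratic terms. If you want, you can build third-order or fourth-order terms too, though whether they help is anyone's guess :D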
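As a sketch of the model*model terms, the snippet below appends every pairwise product of the base predictions as a new column. The column labels SVD, K-NN and RBM follow the example in the text, but the numbers are made up and `add_pairwise_products` is a hypothetical helper.

```python
from itertools import combinations

import numpy as np


def add_pairwise_products(preds):
    """Append the product of every pair of columns to the original columns."""
    pairs = combinations(range(preds.shape[1]), 2)
    products = [preds[:, a] * preds[:, b] for a, b in pairs]
    return np.column_stack([preds] + products)


# Columns: made-up outputs of SVD, K-NN and RBM for two samples.
preds = np.array([
    [0.2, 0.5, 0.9],
    [0.7, 0.1, 0.4],
])
expanded = add_pairwise_products(preds)
# New columns, in order: SVD*K-NN, SVD*RBM, K-NN*RBM.
print(expanded.shape)
```

The expanded matrix can then be fed to a linear stacker in place of the raw predictions.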

7. Stacking classifiers with regressors and vice versa
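Use stacking to tackle regression problems: the base models and the stacker do not have to be the same kind, so classifiers can feed a regressor and the other way around.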

8. Stacking unsupervised learned features
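Incorporate features learned without supervision. There are many ways to do this. The examples given are K-Means and t-SNE: reduce the dimensionality, keep the main features, and add them to the stacking.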
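A minimal sketch of the K-Means variant, assuming the distances to each cluster centre are used as the extra features (one common choice, not necessarily what the guide does). The data is synthetic. I use K-Means rather than t-SNE here because scikit-learn's `TSNE` only offers `fit_transform`, so it cannot embed new rows after fitting.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, n_features=4, centers=3, random_state=0)

# Fit K-Means without using any labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# transform() gives each row's distance to every cluster centre.
cluster_dist = kmeans.transform(X)

# Original features plus the unsupervised features, ready for the stacker.
X_aug = np.hstack([X, cluster_dist])
print(X_aug.shape)
```

For unseen rows, calling `kmeans.transform` on them produces the same kind of features, so train and test sets stay consistent.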

9. Online Stacking

Some of the author's thoughts on online stacking. Not much help for the main Kaggle competition formats.

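There is more that I will not retell here. For example, the author built an automated ensembling machine that, running on its own, reached the top 10% and even 5th place. The guide also mentions someone who blended 1000+ models in a competition and ended up taking first place. In short, ensembling really is an art, and it keeps setting new records.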