Online random forest



1. The concept of online learning

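In online learning the data come in sequence: training samples arrive one at a time, or a few at a time, and the classifier is updated with each new arrival. Online learning is relatively hard, mainly because you cannot know what future data will look like. The standard batch versions of SVM and AdaBoost clearly cannot be used as they are. In recent years some researchers have worked on online learning, and the main methods are still concentrated on online boosting.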

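Traditional SVM and AdaBoost are both batch-mode learning. Batch-mode learning means that all training data are available (in other words, all training data already sit in memory). This approach has two drawbacks:

1) Sometimes the data set is too large to fit in memory, which makes it awkward to process.

2) Because of constraints in the application environment, it is sometimes impossible to obtain all the training data before training.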
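To make the contrast with batch mode concrete, here is a minimal sketch of a classifier that is updated one sample at a time. It uses a plain online perceptron, not SVM, AdaBoost, or a random forest, purely because it is the shortest online learner to write down. The class `OnlinePerceptron` and the generator `sample_stream` are names made up for this illustration.

```python
import random


class OnlinePerceptron:
    """Binary linear classifier updated one sample at a time."""

    def __init__(self, n_features, lr=1.0):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score > 0 else -1

    def update(self, x, y):
        """Consume one labelled sample (y in {-1, +1}); it can be discarded afterwards."""
        if self.predict(x) != y:
            self.w = [wi + self.lr * y * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * y


def sample_stream(n, seed=0):
    """Hypothetical data source: yields one (x, y) pair at a time."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = 1 if x[0] + x[1] > 0 else -1
        yield x, y


model = OnlinePerceptron(n_features=2)
for x, y in sample_stream(2000):
    model.update(x, y)  # no earlier sample is kept in memory
```

The model only ever holds its weights, so memory use does not grow with the number of samples seen, and the whole training set never has to exist at once.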

1. Why RF suits computer vision [1]

a) First, they are very fast in both training and classification.

In other words, RF is fast and efficient, an advantage that other methods struggle to match.

b) Second, they can be easily parallelized, which makes them interesting for multi-core and GPU implementations [2].

c) Additionally, RFs are inherently multi-class, therefore they do not require to build several binary classifiers for a multi-class problem.

RF is multi-class by nature, so a multi-class problem does not require building many binary classifiers.
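One common way to see why no binary classifiers are needed: each leaf can store a histogram over all classes, and the forest averages the per-tree distributions. The sketch below assumes that design. The function `forest_predict` and the leaf histograms are invented for illustration and are not taken from [1].

```python
from collections import Counter


def forest_predict(per_tree_counts, classes):
    """Average the per-tree class distributions; return (label, probabilities)."""
    avg = {c: 0.0 for c in classes}
    for counts in per_tree_counts:
        total = sum(counts.values())  # assumes every leaf has seen at least one sample
        for c in classes:
            avg[c] += counts.get(c, 0) / total / len(per_tree_counts)
    label = max(classes, key=lambda c: avg[c])
    return label, avg


# Hypothetical leaf histograms reached by one test sample in three trees
leaves = [
    Counter({"car": 6, "person": 3, "bike": 1}),
    Counter({"car": 2, "person": 7, "bike": 1}),
    Counter({"car": 5, "person": 4, "bike": 1}),
]
label, probs = forest_predict(leaves, ["car", "person", "bike"])
```

Adding a further class only widens the histograms; the structure of the trees and the voting rule stay the same.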

d) Compared to boosting and other ensemble methods, RFs are also more robust against label noise.

Compared with boosting and other ensemble methods, RF is more robust, generalizes better, and copes well with noise.

2. Online learning & off-line learning

Off-line mode: the entire training data is given in advance and the training and testing phases are separated.

On-line mode: training data are not given in advance but arrive sequentially, for instance in tracking applications where predictions are required on-the-fly.

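Advantages of online learning over off-line learning:

The memory required is lower than in off-line mode, because earlier samples do not have to be loaded into memory again. Large amounts of training data can usually be handled this way, and training is usually fast. If the data are produced on-line, or if the distribution of the data changes over time, off-line methods cannot be used.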

3. On-line Random Decision Trees

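Each node of a decision tree contains a test of the form g(x) > θ. Such a test usually consists of two parts: 1) a randomly generated test function g(x), which usually returns a scalar value, and 2) a threshold θ which, based on the random feature, decides whether a sample is propagated to the left or to the right. In off-line mode, RF randomly selects a set of such tests and then picks the best one according to an evaluation. If the threshold is also chosen at random, the resulting RF is usually called an Extremely Randomized Forest [3].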
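A node test of this kind can be sketched in a few lines, writing the threshold as `theta`. The sketch assumes that g(x) simply reads one randomly chosen feature (random linear combinations of features are another common choice) and that features lie in a known range. The class `RandomTest` and its parameters are hypothetical.

```python
import random


class RandomTest:
    """Node test of the form g(x) > theta with randomly drawn g and theta."""

    def __init__(self, n_features, feature_range, rng):
        # g(x): here simply one randomly chosen feature (an assumption of this sketch)
        self.feature = rng.randrange(n_features)
        lo, hi = feature_range
        # theta: drawn uniformly from the assumed feature range
        self.theta = rng.uniform(lo, hi)

    def g(self, x):
        return x[self.feature]

    def goes_right(self, x):
        """True sends the sample to the right child, False to the left."""
        return self.g(x) > self.theta


rng = random.Random(42)
tests = [RandomTest(n_features=4, feature_range=(0.0, 1.0), rng=rng) for _ in range(5)]
x = [0.2, 0.9, 0.5, 0.7]
routes = [t.goes_right(x) for t in tests]
```

A node would draw several such tests and keep the one that separates the classes best.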

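In on-line mode, an extremely randomized forest is built by generating both the test functions and the thresholds at random. While a random tree is grown, each decision node randomly generates a set of tests and then selects the best one according to a quality measurement. A commonly used quality measurement is entropy. Computing such a quality measurement relies mainly on the density of the samples.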
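The selection step can be illustrated with entropy-based information gain computed from class histograms alone. This is a sketch under the assumption that each candidate test keeps one histogram for the samples sent left and one for those sent right. The candidate names and counts below are made up.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a class histogram given as a list of counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def information_gain(left, right):
    """Entropy reduction obtained by splitting a node into left/right histograms."""
    parent = [l + r for l, r in zip(left, right)]
    n = sum(parent)
    if n == 0:
        return 0.0
    weighted = (sum(left) / n) * entropy(left) + (sum(right) / n) * entropy(right)
    return entropy(parent) - weighted


# Hypothetical per-test statistics: (left histogram, right histogram), two classes
candidates = {
    "test_a": ([10, 0], [0, 10]),  # perfect separation
    "test_b": ([5, 5], [5, 5]),    # no separation
    "test_c": ([8, 2], [2, 8]),    # partial separation
}
gains = {name: information_gain(l, r) for name, (l, r) in candidates.items()}
best = max(gains, key=gains.get)
```

Because only counts are needed, the gain can be recomputed at any time from statistics accumulated on-line, without revisiting earlier samples.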

In fact, this part is nothing more than an ordinary decision tree.

4. On-line Adaptation

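In essence, a feature is chosen at random in a principled way, and the decision tree is then partially modified according to the previously defined threshold test functions. Following this part requires a fairly solid mathematical background.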
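One plausible reading of this step is a leaf that folds each incoming sample into per-test statistics and turns into a decision node once it has seen enough samples and some test yields enough gain. The sketch below follows that reading. The class `OnlineLeaf`, the parameters `min_samples` and `min_gain`, and the toy data are assumptions of this illustration, not details stated in the notes above.

```python
import math
import random


def entropy(counts):
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0) if n else 0.0


class OnlineLeaf:
    """Leaf that accumulates statistics for random tests and reports when to split."""

    def __init__(self, n_features, n_classes, n_tests, rng,
                 min_samples=50, min_gain=0.1):
        self.min_samples = min_samples  # hypothetical: samples required before splitting
        self.min_gain = min_gain        # hypothetical: gain required before splitting
        self.counts = [0] * n_classes
        # each candidate test is (feature index, threshold); features assumed in [0, 1]
        self.tests = [(rng.randrange(n_features), rng.random()) for _ in range(n_tests)]
        # per-test histograms: stats[t][side][class], side 0 = left, 1 = right
        self.stats = [[[0] * n_classes for _ in range(2)] for _ in self.tests]

    def update(self, x, y):
        """Fold one sample into the statistics; x itself is not stored."""
        self.counts[y] += 1
        for (f, theta), stat in zip(self.tests, self.stats):
            stat[int(x[f] > theta)][y] += 1

    def best_split(self):
        """Return (test, gain) once a split is justified, otherwise None."""
        n = sum(self.counts)
        if n < self.min_samples:
            return None
        parent = entropy(self.counts)
        best_gain, best_test = max(
            (parent - sum(sum(side) / n * entropy(side) for side in stat), test)
            for test, stat in zip(self.tests, self.stats)
        )
        return (best_test, best_gain) if best_gain >= self.min_gain else None


rng = random.Random(0)
leaf = OnlineLeaf(n_features=2, n_classes=2, n_tests=20, rng=rng)
split, seen = None, 0
while split is None and seen < 5000:
    x = [rng.random(), rng.random()]
    y = int(x[0] > 0.5)  # hypothetical ground truth: class depends on feature 0
    leaf.update(x, y)
    seen += 1
    split = leaf.best_split()
```

Only the affected leaf changes when a split fires, so the rest of the tree stays as it was; this is what "partially modifying" the tree amounts to in this sketch.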

5. Experiments

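The experiments aim to compare the novel on-line learning algorithm with its off-line counterpart on standard machine learning data sets, and to show that the method is suitable for visual object tracking.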

6. paper outline

1. review on random forest

2. experiments and comparison

3. related work first

Source of the above: http://www.cvchina.info/tag/online-random-forest/

[1] On-line Random Forests. OLCV, 2009.

[2] Implementing Decision Trees and Forests on a GPU. ECCV, 2008.

[3] Extremely Randomized Trees. Machine Learning (journal), 2006.

0 0