Machine Learning Algorithms: AdaBoost with a Python Implementation

Source: Internet. Editor: 程序博客网. Time: 2024/05/16 11:08

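- I. Theoretical Background
  - Algorithm Description
  - Algorithm Steps
  - Training Error Analysis
  - Theoretical Interpretation and Derivation
    - Forward Stagewise Algorithm
    - Steps of the Forward Stagewise Algorithm
    - Forward Stagewise Algorithm and AdaBoost
- II. Python Implementation
  - Code
  - Results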

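Preface: What is learned from books always feels shallow; to truly understand something, you have to practice it yourself.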

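I. Theoretical Background

The theory is easier to follow when read alongside the code in Part II.

1. Algorithm Description

AdaBoost is the representative algorithm of the boosting family. Given a training set, AdaBoost learns a sequence of fairly crude classifiers (weak classifiers). After each weak classifier is learned, it changes the probability distribution (weight distribution) over the training data so that correctly classified samples get smaller weights and misclassified samples get larger weights. Finally, it combines the weak classifiers into a strong classifier.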

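2. Algorithm Steps

Initialize the weight distribution:

$D_1 = (w_{11}, w_{12}, \dots, w_{1N}), \quad w_{1i} = \frac{1}{N}, \quad i = 1, 2, \dots, N$

Train the weak classifiers iteratively. In round $m$ ($m = 1, 2, \dots, M$):
a. Learn from the training set weighted by $D_m$ to obtain the basic classifier
$G_m(x): \mathcal{X} \to \{-1, +1\}$
b. Compute the classification error rate of $G_m(x)$ on the training set
$e_m = \sum_{i=1}^{N} w_{mi} I(y_i \neq G_m(x_i))$
c. Compute the weight of the classifier $G_m(x)$
$\alpha_m = \frac{1}{2} \ln \frac{1 - e_m}{e_m}$
d. Update the weights of the training data in preparation for the next round
$w_{m+1,i} = \frac{w_{mi}}{Z_m} e^{-\alpha_m y_i G_m(x_i)}$
where $Z_m$ is the normalization factor that makes $D_{m+1}$ a probability distribution:
$Z_m = \sum_{i=1}^{N} w_{mi} e^{-\alpha_m y_i G_m(x_i)}$
Once the error rate (here, the error rate of the linear combination of the first $m$ weak classifiers) reaches the threshold, or the number of iterations (the number of weak classifiers) reaches the specified limit, combine all weak classifiers linearly:

$G(x) = \mathrm{sign}(f(x)) = \mathrm{sign}\left(\sum_{m=1}^{M} \alpha_m G_m(x)\right)$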
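Steps b to d of a single round can be sketched as follows. This is a minimal illustration, not the article's implementation: the helper name `adaboost_round` is hypothetical, and the labels and first-round predictions are taken from the toy data set and the log shown in Part II.

```python
import numpy as np


def adaboost_round(weights, y_true, y_pred):
    """Run steps b-d of one AdaBoost round (hypothetical helper).

    Returns (e_m, alpha_m, Z_m, D_{m+1}).
    """
    weights = np.asarray(weights, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Step b: weighted classification error rate e_m.
    error = float(np.sum(weights[y_true != y_pred]))

    # Step c: classifier weight alpha_m; the clip keeps the log finite
    # if the weak classifier happens to be perfect.
    alpha = 0.5 * np.log((1.0 - error) / max(error, 1e-16))

    # Step d: reweight the samples and normalize by Z_m.
    unnormalized = weights * np.exp(-alpha * y_true * y_pred)
    z = float(unnormalized.sum())
    return error, alpha, z, unnormalized / z


# Toy labels and the first weak classifier's predictions (from Part II).
y = [1.0, 1.0, -1.0, -1.0, 1.0]
g1 = [-1.0, 1.0, -1.0, -1.0, 1.0]
d1 = np.full(5, 0.2)  # uniform initial weights, w_1i = 1/N

e1, alpha1, z1, d2 = adaboost_round(d1, y, g1)
print("e_1 =", e1, "alpha_1 =", alpha1, "Z_1 =", z1)
print("D_2 =", d2)
```

Only the first sample is misclassified, so its weight grows while the other four shrink, which is the behaviour described in the algorithm description.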

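3. Training Error Analysis

The training error of the final classifier satisfies:

$\frac{1}{N} \sum_{i=1}^{N} I(G(x_i) \neq y_i) \leq \frac{1}{N} \sum_{i=1}^{N} e^{-y_i f(x_i)} = \prod_{m=1}^{M} Z_m$

This theorem shows that choosing $G_m$ in each round so that $Z_m$ is as small as possible makes the bound on the training error decrease fastest.

The theorem is proved as follows.

When $G(x_i) \neq y_i$, we have $y_i f(x_i) \leq 0$, so

$e^{-y_i f(x_i)} \geq 1 = I(G(x_i) \neq y_i)$

When $G(x_i) = y_i$, we have $y_i f(x_i) > 0$, so

$1 > e^{-y_i f(x_i)} > 0 = I(G(x_i) \neq y_i)$

Hence the inequality on the left of the theorem holds.

The rest of the proof uses the following identity, which is the weight update from the algorithm steps rearranged:

$Z_m w_{m+1,i} = w_{mi} e^{-\alpha_m y_i G_m(x_i)}$

The equality on the right is proved as follows, using $w_{1i} = \frac{1}{N}$:

$\begin{aligned}
\frac{1}{N} \sum_{i=1}^{N} e^{-y_i f(x_i)}
&= \frac{1}{N} \sum_{i=1}^{N} e^{-y_i \sum_{m=1}^{M} \alpha_m G_m(x_i)} \\
&= \sum_{i=1}^{N} w_{1i} \prod_{m=1}^{M} e^{-\alpha_m y_i G_m(x_i)} \\
&= Z_1 \sum_{i=1}^{N} w_{2i} \prod_{m=2}^{M} e^{-\alpha_m y_i G_m(x_i)} \\
&= Z_1 Z_2 \sum_{i=1}^{N} w_{3i} \prod_{m=3}^{M} e^{-\alpha_m y_i G_m(x_i)} \\
&= \dots \\
&= \prod_{m=1}^{M} Z_m
\end{aligned}$

Hence the equality on the right of the theorem holds.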
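The bound can be checked numerically on the toy data. The sketch below is an illustration under stated assumptions: it replays the three weak-classifier predictions printed in the results of Part II rather than training stumps, and the variable names are chosen here for illustration.

```python
import numpy as np

# Toy labels and the three per-round predictions logged in Part II.
y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])
round_predictions = np.array([
    [-1.0, 1.0, -1.0, -1.0, 1.0],
    [1.0, 1.0, -1.0, -1.0, -1.0],
    [1.0, 1.0, 1.0, 1.0, 1.0],
])

weights = np.full(len(y), 1.0 / len(y))  # D_1
f = np.zeros(len(y))                     # running f(x_i) = sum_m alpha_m G_m(x_i)
z_product = 1.0                          # running product of Z_m

for g in round_predictions:
    error = np.sum(weights[g != y])
    alpha = 0.5 * np.log((1.0 - error) / error)
    unnormalized = weights * np.exp(-alpha * y * g)
    z = unnormalized.sum()
    weights = unnormalized / z
    z_product *= z
    f += alpha * g

training_error = np.mean(np.sign(f) != y)
mean_exp_loss = np.mean(np.exp(-y * f))

print("training error     :", training_error)
print("mean exp loss      :", mean_exp_loss)
print("product of the Z_m :", z_product)
```

The last two printed values agree up to floating-point rounding, and both are at least as large as the training error, as the theorem states.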

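4. Theoretical Interpretation and Derivation

Another interpretation views AdaBoost as a two-class classification method in which the model is an additive model, the loss function is the exponential loss, and the learning algorithm is the forward stagewise algorithm.

1. Forward Stagewise Algorithm

Additive model:

$f(x) = \sum_{m=1}^{M} \beta_m b(x; \gamma_m)$

where $b(x; \gamma_m)$ is a basis function, $\gamma_m$ is the parameter of the basis function, and $\beta_m$ is its coefficient.

Loss function: $L(y, f(x))$

Objective:

$\min_{\beta_m, \gamma_m} \sum_{i=1}^{N} L\left(y_i, \sum_{m=1}^{M} \beta_m b(x_i; \gamma_m)\right)$

This is usually a hard optimization problem. The idea of the forward stagewise algorithm is this: because the model is additive, if we can learn just one basis function and its coefficient at each step, going from front to back and gradually approaching the objective above, the optimization becomes much simpler. Concretely, each step only needs to optimize the following loss:

$\min_{\beta, \gamma} \sum_{i=1}^{N} L(y_i, \beta b(x_i; \gamma))$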

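2. Steps of the Forward Stagewise Algorithm

Initialize $f_0(x) = 0$
For $m = 1, 2, \dots, M$:
a. Minimize the loss function
$(\beta_m, \gamma_m) = \arg\min_{\beta, \gamma} \sum_{i=1}^{N} L(y_i, f_{m-1}(x_i) + \beta b(x_i; \gamma))$
to obtain the parameters $\beta_m, \gamma_m$
b. Update
$f_m(x) = f_{m-1}(x) + \beta_m b(x; \gamma_m)$
This yields the additive model
$f(x) = f_M(x) = \sum_{m=1}^{M} \beta_m b(x; \gamma_m)$

In this way, the forward stagewise algorithm reduces the problem of solving for all parameters $\beta_m, \gamma_m$ from $m = 1$ to $M$ simultaneously to a sequence of problems that solve for each $\beta_m, \gamma_m$ in turn.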
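The procedure can be sketched for one concrete choice of loss and basis function. Everything specific below is an assumption made for illustration and does not come from the article: the loss is the squared loss, the basis function is a step function $b(x; \gamma) = +1$ if $x > \gamma$ and $-1$ otherwise, and the data, the candidate thresholds and the function name `forward_stagewise` are invented.

```python
import numpy as np


def forward_stagewise(x, y, thresholds, num_rounds):
    """Greedy forward stagewise fit (illustrative sketch, squared loss).

    Assumed basis function: b(x; gamma) = +1 if x > gamma else -1.
    Returns the list of (beta_m, gamma_m) and the loss after each round.
    """
    f = np.zeros_like(y, dtype=float)  # f_0(x) = 0
    model, losses = [], []
    for _ in range(num_rounds):
        residual = y - f
        best = None
        # Step a: search gamma over the candidates; for squared loss the
        # optimal beta for a fixed gamma has a closed form.
        for gamma in thresholds:
            b = np.where(x > gamma, 1.0, -1.0)
            beta = (residual @ b) / (b @ b)
            loss = np.sum((residual - beta * b) ** 2)
            if best is None or loss < best[0]:
                best = (loss, beta, gamma, b)
        loss, beta, gamma, b = best
        # Step b: f_m = f_{m-1} + beta_m * b(x; gamma_m).
        f = f + beta * b
        model.append((beta, gamma))
        losses.append(loss)
    return model, losses


# Invented 1-D data, for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 0.0, 0.9, 1.1, 2.0, 2.1])
model, losses = forward_stagewise(
    x, y, thresholds=[0.5, 1.5, 2.5, 3.5, 4.5], num_rounds=4
)
for (beta, gamma), loss in zip(model, losses):
    print("beta = %.4f, gamma = %.1f, loss = %.4f" % (beta, gamma, loss))
```

Because $\beta = 0$ is always an admissible choice, the loss cannot increase from one round to the next, which is why solving the stages one at a time still makes steady progress on the joint objective.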

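3. Forward Stagewise Algorithm and AdaBoost

AdaBoost is a special case of the forward stagewise additive algorithm. Here the model is an additive model built from basic classifiers, and the loss function is the exponential loss.

When the basis functions are basic classifiers and their coefficients are the classifier weights, the additive model is equivalent to AdaBoost's final classifier

$f(x) = \sum_{m=1}^{M} \alpha_m G_m(x)$

The loss function is the exponential loss:

$L(y, f(x)) = e^{-y f(x)}$

Suppose that after $m-1$ rounds the forward stagewise algorithm has produced $f_{m-1}(x)$, and that round $m$ produces $\alpha_m$, $G_m(x)$ and $f_m(x)$:

$f_m(x) = f_{m-1}(x) + \alpha_m G_m(x)$

The goal is for the $\alpha_m$ and $G_m(x)$ found by the forward stagewise algorithm to minimize the exponential loss of $f_m(x)$ on the training set $T$, that is,

$(\alpha_m, G_m(x)) = \arg\min_{\alpha, G} \sum_{i=1}^{N} e^{-y_i (f_{m-1}(x_i) + \alpha G(x_i))} = \arg\min_{\alpha, G} \sum_{i=1}^{N} \bar{w}_{mi} e^{-y_i \alpha G(x_i)}$

where $\bar{w}_{mi} = e^{-y_i f_{m-1}(x_i)}$. Because $\bar{w}_{mi}$ depends on neither $\alpha$ nor $G$, it plays no part in the minimization. It does depend on $f_{m-1}(x)$, however, so it changes from round to round.

We now show that the $\alpha_m^*$ and $G_m^*(x)$ that minimize the expression above are exactly the $\alpha_m$ and $G_m(x)$ produced by AdaBoost.

First, find $G_m^*(x)$. For any $\alpha > 0$, the $G(x)$ that minimizes the expression above is given by:

$G_m^*(x) = \arg\min_{G} \sum_{i=1}^{N} \bar{w}_{mi} I(y_i \neq G(x_i))$

This classifier $G_m^*(x)$ is AdaBoost's basic classifier $G_m(x)$, because it is the basic classifier that minimizes the weighted classification error rate on the training data in round $m$.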

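Next, find $\alpha_m^*$. With $G = G_m$ fixed, the objective can be rewritten as

$\begin{aligned}
\sum_{i=1}^{N} \bar{w}_{mi} e^{-y_i \alpha G_m(x_i)}
&= \sum_{y_i = G_m(x_i)} \bar{w}_{mi} e^{-\alpha} + \sum_{y_i \neq G_m(x_i)} \bar{w}_{mi} e^{\alpha} \\
&= (e^{\alpha} - e^{-\alpha}) \sum_{i=1}^{N} \bar{w}_{mi} I(y_i \neq G_m(x_i)) + e^{-\alpha} \sum_{i=1}^{N} \bar{w}_{mi}
\end{aligned}$

Dividing by the positive constant $\sum_{i=1}^{N} \bar{w}_{mi}$, which does not change the minimizer, leaves

$(e^{\alpha} - e^{-\alpha}) e_m + e^{-\alpha}, \quad e_m = \frac{\sum_{i=1}^{N} \bar{w}_{mi} I(y_i \neq G_m(x_i))}{\sum_{i=1}^{N} \bar{w}_{mi}} = \sum_{i=1}^{N} w_{mi} I(y_i \neq G_m(x_i))$

Differentiating with respect to $\alpha$ and setting the derivative to zero gives $(e^{\alpha} + e^{-\alpha}) e_m - e^{-\alpha} = 0$, so

$\alpha_m^* = \frac{1}{2} \ln \frac{1 - e_m}{e_m}$

where $e_m$ is the classification error rate.
This $\alpha_m^*$ is identical to the $\alpha_m$ in step 2 of the AdaBoost algorithm.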
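The closed form for $\alpha_m^*$ can be sanity-checked by brute force. The sketch below is an illustration only: the function names and the grid are chosen here, and the error rates are arbitrary values below 0.5. It minimizes the normalized objective $(e^{\alpha} - e^{-\alpha}) e_m + e^{-\alpha}$ over a fine grid and compares the result with $\frac{1}{2} \ln \frac{1 - e_m}{e_m}$.

```python
import numpy as np


def normalized_exp_loss(alpha, error):
    """(e^alpha - e^-alpha) * e_m + e^-alpha, the objective as a function of alpha."""
    return (np.exp(alpha) - np.exp(-alpha)) * error + np.exp(-alpha)


def closed_form_alpha(error):
    """alpha* = 0.5 * ln((1 - e_m) / e_m), valid for 0 < e_m < 1."""
    return 0.5 * np.log((1.0 - error) / error)


# Grid over alpha with spacing 1e-4.
alphas = np.linspace(-3.0, 3.0, 60001)

for error in (0.125, 0.2, 0.4):
    grid_alpha = alphas[np.argmin(normalized_exp_loss(alphas, error))]
    print("e_m = %.3f  grid minimizer = %.4f  closed form = %.4f"
          % (error, grid_alpha, closed_form_alpha(error)))
```

Substituting $\alpha_m^*$ back into the objective gives the minimum value $2\sqrt{e_m (1 - e_m)}$, which is below 1 whenever $e_m \neq \frac{1}{2}$.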

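Finally, consider the per-round update of the sample weights. From

$f_m(x) = f_{m-1}(x) + \alpha_m G_m(x)$

and $\bar{w}_{mi} = e^{-y_i f_{m-1}(x_i)}$, we obtain

$\bar{w}_{m+1,i} = \bar{w}_{mi} e^{-y_i \alpha_m G_m(x_i)}$

This differs from the sample weight update in step 2 of the AdaBoost algorithm only by the normalization factor, so the two are equivalent.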
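The equivalence can be checked on the toy data: normalizing $\bar{w}_{m+1,i} = e^{-y_i f_m(x_i)}$ should reproduce the weight distributions `D` printed in the results of Part II. This is a sketch under stated assumptions: the predictions and the error rates 0.2 and 0.125 for the first two rounds are read off from that log, and the variable names are chosen here.

```python
import numpy as np

y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])
g1 = np.array([-1.0, 1.0, -1.0, -1.0, 1.0])   # round-1 predictions (from the log)
g2 = np.array([1.0, 1.0, -1.0, -1.0, -1.0])   # round-2 predictions (from the log)

# Classifier weights from the weighted error rates of rounds 1 and 2.
alpha1 = 0.5 * np.log((1.0 - 0.2) / 0.2)
alpha2 = 0.5 * np.log((1.0 - 0.125) / 0.125)

# Additive model after each round.
f1 = alpha1 * g1
f2 = f1 + alpha2 * g2

# Unnormalized weights from the exponential-loss view.
w_bar_2 = np.exp(-y * f1)
w_bar_3 = np.exp(-y * f2)

# The recursion: w_bar_{m+1,i} = w_bar_{m,i} * exp(-y_i * alpha_m * G_m(x_i)).
recursion_holds = np.allclose(w_bar_3, w_bar_2 * np.exp(-y * alpha2 * g2))

# Normalizing gives AdaBoost's weight distributions D_2 and D_3.
d2 = w_bar_2 / w_bar_2.sum()
d3 = w_bar_3 / w_bar_3.sum()

print("recursion holds:", recursion_holds)
print("D_2 =", d2)
print("D_3 =", d3)
```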

II. Python Implementation

1. Code

The weak classifier used here is a single-level decision tree (a decision stump).

# encoding=utf-8
######################################################################
# Copyright: CNIC
# Author: LiuYao
# Date: 2017-9-11
# Description: implements the adaboost algorithm
######################################################################
'''
implements the adaboost
'''

import numpy as np


class AdaBoost:
    '''
    implements the adaboost classifier
    '''

    def __init__(self):
        pass

    def load_simple_data(self):
        '''
        make a simple data set
        '''
        data = np.asmatrix([[1.0, 2.0],
                            [2.0, 1.1],
                            [1.3, 1.0],
                            [1.0, 1.0],
                            [2.0, 1.0]])
        labels = [1.0, 1.0, -1.0, -1.0, 1.0]
        return data, labels

    def classify_results(self, x_train, demension, thresh, op):
        '''
        get the predicted labels from the data, thresh, op and the given dimension
        Args:
            x_train: train data (np.matrix)
            demension: index of the dimension (feature) to split on
            thresh: the splitting value
            op: 'le' predicts -1 where x <= thresh; any other value
                (here 'g') predicts -1 where x > thresh
        '''
        y_predict = np.ones((x_train.shape[0], 1))
        if op == 'le':
            y_predict[x_train[:, demension] <= thresh] = -1.0
        else:
            y_predict[x_train[:, demension] > thresh] = -1.0
        return y_predict

    def get_basic_classifier(self, x_train, y_train, D):
        '''
        generate the best decision stump for the data under the weights D
        Args:
            x_train: train data
            y_train: train label
            D: the weight of the data
        '''
        x_mat = np.asmatrix(x_train)
        y_mat = np.asmatrix(y_train).T
        D_mat = np.asmatrix(D)
        [m, n] = x_mat.shape
        num_steps = 10.0
        min_error = np.inf
        best_basic_classifier = {}
        best_predict = np.asmatrix(np.zeros((m, 1)))
        # traverse all dimensions to find the best dimension
        for demension in range(n):
            step_length = (x_mat[:, demension].max() - x_mat[:, demension].min()) / num_steps
            # traverse all candidate splitting values in this dimension
            for step in range(-1, int(num_steps) + 1):
                # determine which op has the lower weighted error
                for op in ['le', 'g']:
                    thresh = x_mat[:, demension].min() + step * step_length
                    y_predict = self.classify_results(x_mat, demension, thresh, op)
                    # weighted error: sum of the weights of misclassified samples
                    error = np.sum(D_mat[np.asmatrix(y_predict) != y_mat])
                    if error < min_error:
                        min_error = error
                        best_predict = np.asmatrix(y_predict).copy()
                        best_basic_classifier['demension'] = demension
                        best_basic_classifier['thresh'] = thresh
                        best_basic_classifier['op'] = op
        return best_basic_classifier, min_error, best_predict

    def train(self, x_train, y_train, max_itr=50):
        '''
        train function
        '''
        m = len(x_train)
        # uniform initial weights, w_1i = 1/m
        D = [1.0 / m for i in range(m)]
        D = np.asmatrix(D).T
        self.basic_classifier_list = []
        acc_label = np.asmatrix(np.zeros((m, 1)))
        # generate each basic classifier
        for i in range(max_itr):
            # generate basic classifier
            basic_classifier, error, y_predict = self.get_basic_classifier(x_train, y_train, D)
            print('y_predict:', y_predict.T)
            # compute the basic classifier weight (the max avoids division by zero)
            alpha = 0.5 * np.log((1 - error) / max(error, 1e-16))
            # compute the data weight and normalize it
            D = np.multiply(D, np.exp(-1 * alpha * np.multiply(np.asmatrix(y_train).T, np.asmatrix(y_predict))))
            D = D / D.sum()
            print('D:', D.T)
            basic_classifier['alpha'] = alpha
            # store the basic classifier
            self.basic_classifier_list.append(basic_classifier)
            # accumulate the predict results
            acc_label += alpha * y_predict
            print('acc_label', acc_label)
            # compute the total error of all basic classifiers generated until now
            total_error = np.sum(np.sign(acc_label) != np.asmatrix(y_train).T) / float(m)
            print('total_error:', total_error)
            # if the total error reaches zero, then stop
            if total_error == 0.0:
                break
        return self.basic_classifier_list

    def predict(self, x_test):
        '''
        adaboost predict function
        '''
        x_mat = np.asmatrix(x_test)
        m = x_mat.shape[0]
        acc_label = np.asmatrix(np.zeros((m, 1)))
        for i in range(len(self.basic_classifier_list)):
            predict = self.classify_results(x_mat,
                                            self.basic_classifier_list[i]['demension'],
                                            self.basic_classifier_list[i]['thresh'],
                                            self.basic_classifier_list[i]['op'])
            # accumulate the predict results of each basic classifier
            acc_label += self.basic_classifier_list[i]['alpha'] * predict
        print(acc_label)
        return np.sign(acc_label)


def main():
    adaboost = AdaBoost()
    data, labels = adaboost.load_simple_data()
    adaboost.train(data, labels, max_itr=9)
    print(adaboost.predict([[5, 5], [0, 0]]))


if __name__ == '__main__':
    main()

2. Results

The output below is used to verify that the AdaBoost implementation is correct.

y_predict: [[-1.  1. -1. -1.  1.]]
D: [[ 0.5    0.125  0.125  0.125  0.125]]
acc_label [[-0.69314718]
 [ 0.69314718]
 [-0.69314718]
 [-0.69314718]
 [ 0.69314718]]
total_error: 0.2
y_predict: [[ 1.  1. -1. -1. -1.]]
D: [[ 0.28571429  0.07142857  0.07142857  0.07142857  0.5       ]]
acc_label [[ 0.27980789]
 [ 1.66610226]
 [-1.66610226]
 [-1.66610226]
 [-0.27980789]]
total_error: 0.2
y_predict: [[ 1.  1.  1.  1.  1.]]
D: [[ 0.16666667  0.04166667  0.25        0.25        0.29166667]]
acc_label [[ 1.17568763]
 [ 2.56198199]
 [-0.77022252]
 [-0.77022252]
 [ 0.61607184]]
total_error: 0.0
[[ 2.56198199]
 [-2.56198199]]
[[ 1.]
 [-1.]]
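The logged numbers can be cross-checked against the formulas in Part I without rerunning the stump search. The sketch below is only a consistency check under stated assumptions: it takes the three `y_predict` rows from the log as given, recomputes each $\alpha_m$ and the accumulated score, and the variable names are chosen here.

```python
import numpy as np

y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])
# The three y_predict rows copied from the log.
logged_predictions = [
    [-1.0, 1.0, -1.0, -1.0, 1.0],
    [1.0, 1.0, -1.0, -1.0, -1.0],
    [1.0, 1.0, 1.0, 1.0, 1.0],
]

weights = np.full(5, 0.2)   # D_1
acc_label = np.zeros(5)     # accumulated sum of alpha_m * G_m(x_i)
alphas = []

for row in logged_predictions:
    g = np.array(row)
    error = weights[g != y].sum()
    alpha = 0.5 * np.log((1.0 - error) / error)
    weights = weights * np.exp(-alpha * y * g)
    weights = weights / weights.sum()
    acc_label += alpha * g
    alphas.append(alpha)

print("alphas       :", np.round(alphas, 8))
print("acc_label    :", np.round(acc_label, 8))
print("sum of alphas:", round(float(sum(alphas)), 8))
```

The recomputed `acc_label` matches the last `acc_label` in the log, and the sum of the three $\alpha_m$ equals 2.56198199, the magnitude of the scores printed for the two test points. That is consistent with all three stumps voting +1 for `[5, 5]` and -1 for `[0, 0]`.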
