Machine Learning_Algorithms_AdaBoost

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 08:41

Reference:

http://download.csdn.net/detail/blacklaw0/5828223

Chapter 7, Improving classification with the AdaBoost meta-algorithm

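A quick introduction to the example: DATMAT is a matrix of 5 cases × 2 features. Each weak classifier (a decision stump) classifies the 5 cases using just one feature and a threshold. The first step is to find several weak classifiers. The search is brute force: try every feature, a range of thresholds and both inequality directions, and keep the stump with the smallest error (weighted by the current case weights D) as the classification boundary.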
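A compact sketch of that brute-force stump search, assuming plain NumPy arrays instead of `np.matrix`. The names `stump_predict`, `weighted_error` and `best_stump` are hypothetical and only mirror `stump_classify`/`build_stump` in the post. With uniform weights it should pick feature 0, threshold ≈ 1.3, direction `'lt'` and weighted error 0.2, i.e. the first weak classifier in the output at the end of the post.

```python
import numpy as np

# Toy data from the post: 5 cases x 2 features, labels in {+1, -1}
X = np.array([[1.0, 2.1], [2.0, 1.1], [1.3, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])


def stump_predict(X, dim, thresh, ineq):
    """+1 for every case, except -1 on one side of the threshold (same rule as stump_classify)."""
    pred = np.ones(X.shape[0])
    if ineq == "lt":
        pred[X[:, dim] <= thresh] = -1.0
    else:  # "gt"
        pred[X[:, dim] > thresh] = -1.0
    return pred


def weighted_error(X, y, D, dim, thresh, ineq):
    """Sum of the case weights D over the misclassified cases."""
    return float(np.sum(D[stump_predict(X, dim, thresh, ineq) != y]))


def best_stump(X, y, D, steps=10):
    """Try every feature, threshold and direction; return (error, dim, thresh, ineq) of the best."""
    best = None
    for dim in range(X.shape[1]):
        lo, hi = X[:, dim].min(), X[:, dim].max()
        step = (hi - lo) / steps
        for i in range(-1, steps + 1):          # also try one threshold below the minimum
            thresh = lo + i * step
            for ineq in ("lt", "gt"):
                err = weighted_error(X, y, D, dim, thresh, ineq)
                if best is None or err < best[0]:
                    best = (err, dim, thresh, ineq)
    return best


D = np.full(5, 0.2)                             # uniform weights, as in the first round
err, dim, thresh, ineq = best_stump(X, y, D)
print(err, dim, round(thresh, 2), ineq)
```

Only a strict `<` replaces the current best, so among stumps with equal error the first one found wins, which is the same tie-breaking as in the original code.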

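Then an AdaBoost classifier is built from the weak classifiers. A few parameters are worth explaining. D has dimension 5 and holds the weight of each case. After each round, the weights of misclassified cases go up and those of correctly classified cases go down, so the next weak classifier concentrates on the cases that are still wrong (which also means a very noisy case can end up with a large weight). alpha is the weight of each weak classifier, not of each feature; it is computed from that classifier's weighted error. The AdaBoost classifier is ada = sign(sum(classify_by_weak * alpha)). For how alpha and D are derived, see the original book; I have also summarized them briefly in the code comments.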
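A minimal sketch of one reweighting round, assuming the standard AdaBoost formulas that the code comments use: alpha = ½·ln((1 − ε)/ε), where ε is the weighted error, and D_i ← D_i·exp(−alpha·y_i·h(x_i)), followed by normalization. The helper name `reweight` is hypothetical.

```python
import numpy as np


def reweight(D, y, pred):
    """One AdaBoost round: weighted error -> alpha -> new normalized case weights."""
    eps = float(np.sum(D[pred != y]))                     # weighted error of this weak classifier
    alpha = 0.5 * np.log((1.0 - eps) / max(eps, 1e-16))    # weak-classifier weight
    D_new = D * np.exp(-alpha * y * pred)                  # x e^-alpha if correct, x e^+alpha if wrong
    return alpha, D_new / D_new.sum()                      # renormalize so the weights sum to 1


y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])
pred = np.array([-1.0, 1.0, -1.0, -1.0, 1.0])   # first weak classifier in the output at the end of the post
alpha, D = reweight(np.full(5, 0.2), y, pred)
print(round(float(alpha), 4), np.round(D, 3))
```

With ε = 0.2 this gives alpha = ½·ln 4 ≈ 0.693, and D becomes [0.5, 0.125, 0.125, 0.125, 0.125]: the single misclassified case now carries half of the total weight.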

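The output is posted at the bottom, and it looks good: each weak classifier on its own gets at most 80% of the cases right, while the combined AdaBoost classifier gets all 5 right.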

```python
'''
Created on Sep 1, 2013
@author: blacklaw
'''
import numpy as np

# 5 cases x 2 features (featureA, featureB)
DATMAT = np.asmatrix([[1. , 2.1],
                      [2. , 1.1],
                      [1.3, 1. ],
                      [1. , 1. ],
                      [2. , 1. ]])
# class label of each case
CLSLABELS = [1.0, 1.0, -1.0, -1.0, 1.0]


def stump_classify(data, dimen, threshold, ineq):
    # decision stump on feature `dimen`: returns an (n, 1) array of +1/-1
    result = np.ones((np.shape(data)[0], 1))
    if ineq == 'lt':
        result[data[:, dimen] <= threshold] = -1
    else:  # 'gt'
        result[data[:, dimen] > threshold] = -1
    return result


def classify(data, clser):
    # predictions of a single weak classifier, as a plain Python list
    return stump_classify(data, clser['dimen'], clser['thres'], clser['ineq']).T[0].tolist()


def ada_classify(data, clsers):
    # AdaBoost prediction: sign of the alpha-weighted sum of the weak classifiers
    class_sum = np.zeros((np.shape(data)[0], 1))
    for clser in clsers:
        class_sum += stump_classify(data, clser['dimen'], clser['thres'], clser['ineq']) * clser['alpha']
    result = np.ones((np.shape(data)[0], 1))
    result[class_sum < 0] = -1
    return result.T[0].tolist()


def evaluate(labels, prediction):
    # fraction of cases predicted correctly
    return float(np.mean(np.asarray(labels) == np.asarray(prediction)))


def build_stump(data, labels, D):
    STEP = 10
    count, dimen_l = np.shape(data)
    label_mat = np.asmatrix(labels).T          # (n, 1) column of labels
    best_stump = {}
    best_classify = None
    min_error = np.inf
    # traverse every feature, every threshold in [min - step, max] and both 'lt'/'gt' to find the best stump
    for d in range(dimen_l):
        min_v = data[:, d].min()
        max_v = data[:, d].max()
        step_size = (max_v - min_v) / float(STEP)
        # Caution: i must run from -1 to STEP so that thresholds below min and at max are tried too
        for i in range(-1, STEP + 1):
            threshold = min_v + float(i) * step_size
            for ineq in ['lt', 'gt']:
                re = stump_classify(data, d, threshold, ineq)
                re_mat = np.ones((count, 1))
                re_mat[re == label_mat] = 0        # 1 marks a misclassified case
                error = np.sum(re_mat.T * D)       # weighted error: sum of D over misclassified cases
                if error < min_error:
                    min_error = error
                    best_classify = re
                    best_stump['dimen'] = d
                    best_stump['ineq'] = ineq
                    best_stump['thres'] = threshold
    # weight of this weak classifier: alpha = 1/2 * ln((1 - error) / error)
    alpha = float(0.5 * np.log((1.0 - min_error) / max(min_error, 1e-16)))
    best_stump['alpha'] = alpha
    # new D: D(t+1) = D(t) * exp(-alpha * y * h(x)) / sum(D(t+1))
    # correctly predicted cases (y*h = +1) are scaled by e**(-alpha), so their weight decreases;
    # incorrectly predicted cases (y*h = -1) are scaled by e**(+alpha), so their weight increases
    expon = np.multiply(-1 * alpha * label_mat, best_classify)
    D = np.multiply(np.asmatrix(D).T, np.exp(expon))
    D = D / D.sum()
    return best_stump, D.A1


if __name__ == "__main__":
    n = DATMAT.shape[0]
    D = np.ones(n) / n                         # start with uniform case weights
    stumps = []
    print("****** weak classify **********")
    for i in range(4):
        stump, D = build_stump(DATMAT, CLSLABELS, D)
        stumps.append(stump)
        cls = classify(DATMAT, stump)
        print('classify: %s rate: %f' % (cls, evaluate(CLSLABELS, cls)))
    print("******* adaboost classify **********")
    cls = ada_classify(DATMAT, stumps)
    print('classify: %s rate: %f' % (cls, evaluate(CLSLABELS, cls)))
```

Output:

```text
****** weak classify **********
classify: [-1.0, 1.0, -1.0, -1.0, 1.0] rate: 0.800000
classify: [1.0, 1.0, -1.0, -1.0, -1.0] rate: 0.800000
classify: [1.0, 1.0, 1.0, 1.0, 1.0] rate: 0.600000
classify: [-1.0, 1.0, -1.0, -1.0, 1.0] rate: 0.800000
******* adaboost classify **********
classify: [1.0, 1.0, -1.0, -1.0, 1.0] rate: 1.000000
```
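The last line can be checked by hand: the AdaBoost prediction is the sign of the alpha-weighted vote of the four weak classifiers listed above. This sketch assumes the weighted errors of the four rounds are 1/5, 1/8, 1/7 and 1/6; those values come from my own trace of the D updates in the code, not from anything the program prints.

```python
import numpy as np

y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])
# Predictions of the four weak classifiers, copied from the output above
H = np.array([[-1.0, 1.0, -1.0, -1.0,  1.0],
              [ 1.0, 1.0, -1.0, -1.0, -1.0],
              [ 1.0, 1.0,  1.0,  1.0,  1.0],
              [-1.0, 1.0, -1.0, -1.0,  1.0]])
# Assumed weighted error of each round (hand trace, not printed by the program)
eps = np.array([1 / 5, 1 / 8, 1 / 7, 1 / 6])
alphas = 0.5 * np.log((1 - eps) / eps)   # = 1/2 ln 4, 1/2 ln 7, 1/2 ln 6, 1/2 ln 5
scores = alphas @ H                      # alpha-weighted vote for each case
pred = np.where(scores < 0, -1.0, 1.0)   # same sign rule as ada_classify
print(np.round(scores, 3), pred)
```

Case 0 is the interesting one: stumps 1 and 4 both misclassify it, but the combined weight of stumps 2 and 3 (½·ln 7 + ½·ln 6) outweighs theirs (½·ln 4 + ½·ln 5), so the vote still comes out positive.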