Machine Learning_Algorithms_AdaBoost


Reference:

http://download.csdn.net/detail/blacklaw0/5828223

Chapter 7, Improving classification with the AdaBoost meta-algorithm

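A quick introduction to the example: DATMAT is a matrix of 5 cases × 2 features. Each weak classifier does a simple classification of the 5 cases based on a single feature. The first step is to find several weak classifiers. The search simply traverses every possibility (each feature, each candidate threshold, and both inequality directions) and uses whichever has the smallest weighted error as the decision boundary.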
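
The exhaustive search described above can be sketched on its own, before any sample weights come into play. This is a minimal illustration that counts plain (unweighted) errors; the names `stump_predict` and `best_unweighted_stump` are made up for this sketch and are not part of the original script.

```python
import numpy as np

# Same 5 cases x 2 features and labels as in the post
X = np.array([[1.0, 2.1], [2.0, 1.1], [1.3, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])

def stump_predict(X, feature, threshold, ineq):
    """Predict -1 on one side of the threshold and +1 on the other."""
    col = X[:, feature]
    pred = np.ones(len(col))
    if ineq == 'lt':
        pred[col <= threshold] = -1.0
    else:  # 'gt'
        pred[col > threshold] = -1.0
    return pred

def best_unweighted_stump(X, y, steps=10):
    """Try every feature, threshold and direction; keep the fewest errors."""
    best = None
    for feature in range(X.shape[1]):
        lo, hi = X[:, feature].min(), X[:, feature].max()
        step_size = (hi - lo) / steps
        # i = -1 puts one threshold below the minimum value
        for i in range(-1, steps + 1):
            threshold = lo + i * step_size
            for ineq in ('lt', 'gt'):
                pred = stump_predict(X, feature, threshold, ineq)
                errors = int(np.sum(pred != y))
                if best is None or errors < best[0]:
                    best = (errors, feature, threshold, ineq)
    return best

errors, feature, threshold, ineq = best_unweighted_stump(X, y)
print(errors, feature, threshold, ineq)
```

No single-feature threshold separates this data perfectly, because both features contain a value shared by a positive and a negative case, so the best stump still misclassifies one case.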

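Then an AdaBoost classifier is built from the weak classifiers. A few parameters are worth explaining. D is a vector of dimension 5 that holds the weight of each case: after every round, the cases the current weak classifier got wrong have their weight raised and the correctly classified ones have it lowered, so the next weak classifier concentrates on the hard cases. alpha is the weight of each weak classifier (not of each feature), and the AdaBoost classifier is constructed as ada = sum(classify_by_weak * alpha). For how alpha and D are updated, see the original book; I have also described it briefly in my comments.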
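
To make the alpha and D updates concrete, here is a small worked sketch of a single boosting round. It assumes uniform starting weights and uses the prediction vector [-1, 1, -1, -1, 1] (one mistake, on the first case) as the weak classifier's output; the variable names are chosen for this illustration only.

```python
import numpy as np

labels = np.array([1.0, 1.0, -1.0, -1.0, 1.0])
pred = np.array([-1.0, 1.0, -1.0, -1.0, 1.0])   # wrong only on case 0
D = np.ones(5) / 5                               # uniform weights: 0.2 each

# Weighted error: total weight of the misclassified cases
error = np.sum(D[pred != labels])                # 0.2

# alpha = 1/2 * ln((1 - error) / error); the max() guards against error == 0
alpha = 0.5 * np.log((1.0 - error) / max(error, 1e-16))   # ln(2), about 0.693

# labels * pred is +1 for a correct case and -1 for a wrong one, so correct
# cases are scaled by e**(-alpha) and wrong cases by e**(+alpha)
D_new = D * np.exp(-alpha * labels * pred)
D_new = D_new / D_new.sum()                      # [0.5, 0.125, 0.125, 0.125, 0.125]

print(alpha, D_new)
```

After one round the single misclassified case carries half of the total weight, which is what pushes the next weak classifier to get it right.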

The results are posted at the bottom, and they look pretty good.

'''
Created on Sep 1, 2013
@author: blacklaw
'''
import numpy as np

# Columns: featureA, featureB; rows: 5 cases
DATMAT = np.array([[1. , 2.1],
                   [2. , 1.1],
                   [1.3, 1. ],
                   [1. , 1. ],
                   [2. , 1. ]])
# Class label of each case
CLSLABELS = [1.0, 1.0, -1.0, -1.0, 1.0]

def stump_classify(data, dimen, threshold, ineq):
    # Predict +1 everywhere, then flip one side of the threshold to -1
    result = np.ones((data.shape[0], 1))
    if ineq == 'lt':
        result[data[:, dimen] <= threshold] = -1
    else:  # 'gt'
        result[data[:, dimen] > threshold] = -1
    return result

def classify(data, clser):  # clser: one weak classifier (stump)
    return stump_classify(data, clser['dimen'], clser['thres'], clser['ineq']).ravel().tolist()

def ada_classify(data, clsers):
    # Weighted vote: sum of each stump's prediction times its alpha
    class_sum = np.zeros((data.shape[0], 1))
    for clser in clsers:
        class_sum += stump_classify(data, clser['dimen'], clser['thres'], clser['ineq']) * clser['alpha']
    result = np.ones((data.shape[0], 1))
    result[class_sum < 0] = -1
    return result.ravel().tolist()

def evaluate(labels, prediction):
    # Fraction of cases predicted correctly
    right = np.sum(np.array(labels) == np.array(prediction))
    return float(right) / len(labels)

def build_stump(data, labels, D):
    D = np.array(D, dtype=float)  # work on a copy of the weights
    STEP = 10.0
    count, dimen_l = data.shape
    labels_col = np.array(labels).reshape(-1, 1)
    best_stump = {}
    best_classify = None
    min_error = np.inf
    # Traverse every dimension, every threshold in range(min, max) and
    # both ['lt', 'gt'] to find the best stump
    for d in range(dimen_l):
        min_v = data[:, d].min()
        max_v = data[:, d].max()
        step_size = (max_v - min_v) / STEP
        # Caution: the range must start at -1 so that one threshold lies below the minimum
        for i in range(-1, int(STEP) + 1):
            threshold = min_v + float(i) * step_size
            for ineq in ['lt', 'gt']:
                re = stump_classify(data, d, threshold, ineq)
                re_mat = np.ones((count, 1))
                re_mat[re == labels_col] = 0
                # Weighted error: total weight of the misclassified cases
                error = np.sum(re_mat.T * D)
                if error < min_error:
                    min_error = error
                    best_classify = re
                    best_stump['dimen'] = d
                    best_stump['ineq'] = ineq
                    best_stump['thres'] = threshold
    # Weak classifier weight: alpha = 1/2 * ln((1 - error) / error)
    alpha = float(0.5 * np.log((1.0 - min_error) / max(min_error, 1e-16)))
    best_stump['alpha'] = alpha
    expon = -alpha * labels_col * best_classify
    '''
    calc new D: D(t+1) = D(t) * e**(+/-)alpha / Sum(D)
    Caution: a correctly predicted case uses -alpha to decrease its weight,
    an incorrectly predicted case uses +alpha to increase it
    '''
    D = D.reshape(-1, 1) * np.exp(expon)
    D = D / D.sum()
    return best_stump, D.ravel()

if __name__ == "__main__":
    l = DATMAT.shape[0]
    D = np.ones(l) / l
    stumps = []
    print("****** weak classify **********")
    for i in range(4):
        stump, D = build_stump(DATMAT, CLSLABELS, D)
        stumps.append(stump)
        cls = classify(DATMAT, stump)
        print('classify: %s rate: %f' % (cls, evaluate(CLSLABELS, cls)))
    print("******* adaboost classify **********")
    cls = ada_classify(DATMAT, stumps)
    print('classify: %s rate: %f' % (cls, evaluate(CLSLABELS, cls)))


Output:

****** weak classify **********
classify: [-1.0, 1.0, -1.0, -1.0, 1.0] rate: 0.800000
classify: [1.0, 1.0, -1.0, -1.0, -1.0] rate: 0.800000
classify: [1.0, 1.0, 1.0, 1.0, 1.0] rate: 0.600000
classify: [-1.0, 1.0, -1.0, -1.0, 1.0] rate: 0.800000
******* adaboost classify **********
classify: [1.0, 1.0, -1.0, -1.0, 1.0] rate: 1.000000
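
To see why four weak classifiers that each score at most 80% can combine into a 100% result, the boosting rounds can be replayed from the printed predictions alone. The sketch below copies the four prediction vectors from the output, recomputes each alpha with the same update rule, and takes the alpha-weighted vote; it is an illustration under those assumptions, not a replacement for the script.

```python
import numpy as np

labels = np.array([1.0, 1.0, -1.0, -1.0, 1.0])
# Predictions of the four weak classifiers, copied from the output
weak_preds = np.array([
    [-1.0, 1.0, -1.0, -1.0,  1.0],
    [ 1.0, 1.0, -1.0, -1.0, -1.0],
    [ 1.0, 1.0,  1.0,  1.0,  1.0],
    [-1.0, 1.0, -1.0, -1.0,  1.0],
])

D = np.ones(len(labels)) / len(labels)   # start with uniform weights
alphas = []
for pred in weak_preds:
    error = np.sum(D[pred != labels])    # weighted error of this round
    alpha = 0.5 * np.log((1.0 - error) / max(error, 1e-16))
    alphas.append(alpha)
    D = D * np.exp(-alpha * labels * pred)   # raise wrong, lower correct
    D = D / D.sum()

alphas = np.array(alphas)
votes = alphas @ weak_preds              # weighted sum of predictions per case
final = np.where(votes < 0, -1.0, 1.0)

print(alphas)
print(final.tolist())
```

Each weak classifier errs on different cases, and on every case the classifiers that are right together outweigh the ones that are wrong, so the sign of the weighted sum matches every label.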