Machine Learning_Algorithms_AdaBoost
Source: Internet. Editor: 程序博客网. Time: 2024/06/05 08:41
Reference:
http://download.csdn.net/detail/blacklaw0/5828223
Chapter 7, Improving classification with the AdaBoost meta-algorithm
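A quick introduction to this example: DATMAT is a 5 case * 2 feature matrix. Each weak classifier is a decision stump that splits the 5 cases on a single feature. The first job is to find a few such weak classifiers. The search is brute force: try every feature, every candidate threshold and both directions of the inequality, and keep whichever combination has the lowest (weighted) error as the decision boundary.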
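The brute-force search described above can be sketched as follows. This is a minimal illustration rather than the post's own routine: it counts plain, unweighted mistakes, and it uses the observed feature values as candidate thresholds (an assumption of this sketch; the post's code steps through a fixed grid instead). The helper names `stump_predict` and `best_stump_unweighted` are made up for the example; the data and labels are the ones from the post.

```python
import numpy as np

# Data and labels copied from the post's example (5 cases x 2 features).
X = np.array([[1.0, 2.1], [2.0, 1.1], [1.3, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])


def stump_predict(X, feature, threshold, ineq):
    """Predict -1 on one side of the threshold and +1 on the other."""
    pred = np.ones(len(X))
    if ineq == "lt":
        pred[X[:, feature] <= threshold] = -1.0
    else:  # "gt"
        pred[X[:, feature] > threshold] = -1.0
    return pred


def best_stump_unweighted(X, y):
    """Try every (feature, threshold, direction); keep the fewest mistakes.

    Hypothetical helper: thresholds are the observed feature values and
    errors are unweighted counts, both simplifications for illustration.
    """
    best = None
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for ineq in ("lt", "gt"):
                pred = stump_predict(X, feature, threshold, ineq)
                mistakes = int(np.sum(pred != y))
                if best is None or mistakes < best["mistakes"]:
                    best = {"feature": feature, "threshold": float(threshold),
                            "ineq": ineq, "mistakes": mistakes}
    return best


best = best_stump_unweighted(X, y)
print(best)
```

On this data no single stump separates the two classes perfectly, which is why several stumps have to be combined.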
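Next, an AdaBoost classifier is built from the weak classifiers. A few parameters deserve an explanation. D is a vector of length 5 that holds the weight of each case. It is updated after every round: cases the current weak classifier gets wrong gain weight and cases it gets right lose weight, so the next weak classifier concentrates on the cases that are still misclassified. alpha is the weight of each weak classifier (not of each feature) and is computed from that classifier's weighted error. The AdaBoost classifier is the sign of the weighted vote: ada = sign(sum(classify_by_weak * alpha)). For how alpha and D are derived, see the original book; I have also sketched it briefly in the code comments.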
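To make the roles of D and alpha concrete, here is a small sketch of a single boosting round using the standard AdaBoost formulas that the post's code comments refer to. The function name `adaboost_round` is an assumption made for this example; the prediction vector is the first weak classifier's result from the post's printed output.

```python
import numpy as np


def adaboost_round(D, labels, prediction):
    """One AdaBoost update: weighted error, alpha, and the new weights.

    Hypothetical helper for illustration, not part of the post's listing.
    """
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    # Weighted error: total weight of the misclassified cases.
    error = float(np.sum(D[prediction != labels]))
    # alpha = 1/2 * ln((1 - error) / error), guarded against error == 0.
    alpha = float(0.5 * np.log((1.0 - error) / max(error, 1e-16)))
    # labels * prediction is +1 when correct and -1 when wrong, so correct
    # cases are scaled by exp(-alpha) and wrong cases by exp(+alpha).
    new_D = D * np.exp(-alpha * labels * prediction)
    return error, alpha, new_D / new_D.sum()


labels = [1.0, 1.0, -1.0, -1.0, 1.0]
prediction = [-1.0, 1.0, -1.0, -1.0, 1.0]  # first weak classifier in the post's output
D0 = np.ones(5) / 5
error, alpha, D1 = adaboost_round(D0, labels, prediction)
print(error, alpha, D1)
```

With one mistake out of five equally weighted cases the error is 0.2, alpha is 0.5 * ln(4), and after normalisation the misclassified case carries half of the total weight.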
The results are pasted at the bottom, and they look pretty good.
'''
Created on Sep 1, 2013
@author: blacklaw
'''
import numpy as np

# 5 cases x 2 features (featureA, featureB)
DATMAT = np.array([[1.0, 2.1],
                   [2.0, 1.1],
                   [1.3, 1.0],
                   [1.0, 1.0],
                   [2.0, 1.0]])
# class label for each case
CLSLABELS = [1.0, 1.0, -1.0, -1.0, 1.0]


def stump_classify(data, dimen, threshold, ineq):
    """Decision stump: -1 on one side of the threshold, +1 on the other."""
    result = np.ones(data.shape[0])
    if ineq == 'lt':
        result[data[:, dimen] <= threshold] = -1
    else:  # 'gt'
        result[data[:, dimen] > threshold] = -1
    return result


def classify(data, clser):
    """Predict with a single weak classifier (stump)."""
    return stump_classify(data, clser['dimen'], clser['thres'], clser['ineq']).tolist()


def ada_classify(data, clsers):
    """Predict with the sign of the alpha-weighted sum of all stumps."""
    class_sum = np.zeros(data.shape[0])
    for clser in clsers:
        class_sum += stump_classify(data, clser['dimen'], clser['thres'], clser['ineq']) * clser['alpha']
    result = np.ones(data.shape[0])
    result[class_sum < 0] = -1
    return result.tolist()


def evaluate(labels, prediction):
    """Fraction of cases that were predicted correctly."""
    return float(np.mean(np.asarray(labels) == np.asarray(prediction)))


def build_stump(data, labels, D):
    """Find the stump with the lowest weighted error under the weights D.

    Returns the stump (including its alpha) and the updated, normalised weights.
    """
    STEP = 10.0
    labels = np.asarray(labels)
    count, dimen_l = data.shape
    best_stump = {}
    best_classify = None
    min_error = np.inf
    # try every feature, every threshold on a grid over [min, max], and both 'lt' and 'gt'
    for d in range(dimen_l):
        min_v = data[:, d].min()
        max_v = data[:, d].max()
        step_size = (max_v - min_v) / STEP
        # Caution: i starts at -1 so that a threshold just below the minimum is tried too
        for i in range(-1, int(STEP) + 1):
            threshold = min_v + float(i) * step_size
            for ineq in ['lt', 'gt']:
                pred = stump_classify(data, d, threshold, ineq)
                # weighted error: total weight of the misclassified cases
                error = np.sum((pred != labels) * D)
                if error < min_error:
                    min_error = error
                    best_classify = pred
                    best_stump['dimen'] = d
                    best_stump['ineq'] = ineq
                    best_stump['thres'] = threshold
    # weight of this weak classifier: alpha = 1/2 * ln((1 - error) / error)
    alpha = float(0.5 * np.log((1.0 - min_error) / max(min_error, 1e-16)))
    best_stump['alpha'] = alpha
    # new D: D(t+1) = D(t) * e**(-alpha * label * prediction) / Sum(D)
    # Caution: a correctly predicted case gets e**(-alpha), which lowers its weight;
    # an incorrectly predicted case gets e**(+alpha), which raises it
    D = D * np.exp(-alpha * labels * best_classify)
    D = D / D.sum()
    return best_stump, D


if __name__ == "__main__":
    n = len(DATMAT)
    D = np.ones(n) / n
    stumps = []
    print("****** weak classify **********")
    for i in range(4):
        stump, D = build_stump(DATMAT, CLSLABELS, D)
        stumps.append(stump)
        cls = classify(DATMAT, stump)
        print('classify: %s rate: %f' % (cls, evaluate(CLSLABELS, cls)))
    print("******* adaboost classify **********")
    cls = ada_classify(DATMAT, stumps)
    print('classify: %s rate: %f' % (cls, evaluate(CLSLABELS, cls)))
Output:
****** weak classify **********
classify: [-1.0, 1.0, -1.0, -1.0, 1.0] rate: 0.800000
classify: [1.0, 1.0, -1.0, -1.0, -1.0] rate: 0.800000
classify: [1.0, 1.0, 1.0, 1.0, 1.0] rate: 0.600000
classify: [-1.0, 1.0, -1.0, -1.0, 1.0] rate: 0.800000
******* adaboost classify **********
classify: [1.0, 1.0, -1.0, -1.0, 1.0] rate: 1.000000
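As a rough cross-check of the output above, the sketch below takes the four printed weak predictions as given, recomputes each round's alpha from uniform starting weights with the standard AdaBoost update, and then forms the weighted vote. It does not rerun the stump search, so it only confirms that these four predictions combine into the final result; the variable names are chosen for this example.

```python
import numpy as np

labels = np.array([1.0, 1.0, -1.0, -1.0, 1.0])
# The four weak-classifier predictions, copied from the printed output.
weak_predictions = np.array([
    [-1.0, 1.0, -1.0, -1.0, 1.0],
    [1.0, 1.0, -1.0, -1.0, -1.0],
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [-1.0, 1.0, -1.0, -1.0, 1.0],
])

D = np.ones(len(labels)) / len(labels)  # start from uniform weights
alphas = []
for pred in weak_predictions:
    error = np.sum(D[pred != labels])  # weighted error of this round
    alpha = 0.5 * np.log((1.0 - error) / max(error, 1e-16))
    alphas.append(float(alpha))
    D = D * np.exp(-alpha * labels * pred)  # raise wrong cases, lower right ones
    D = D / D.sum()

# Weighted vote: each case gets sum_t alpha_t * prediction_t, then the sign.
vote = np.asarray(alphas) @ weak_predictions
combined = np.where(vote < 0, -1.0, 1.0)
print(alphas)
print(combined.tolist())
```

Each weak classifier alone gets at least one case wrong, but the weighted vote matches all five labels, in line with the 1.000000 rate on the last output line.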