MLiA AdaBoost
Source: Internet · Editor: 程序博客网 · Time: 2024/06/05 06:32
1. A meta-algorithm is a way of combining other algorithms.
2. Boosting methods
3. The AdaBoost classifier
4. The single-level decision tree (decision stump) classifier
5. Imbalanced classification problems
6. Classifiers built from repeated sampling of the data set
6.1. Building classifiers by random resampling of the data: bootstrap aggregating (bagging). A more advanced bagging algorithm is the random forest.
6.2. Boosting: new classifiers are obtained by concentrating on the data that existing classifiers misclassify. A weak classifier is one whose performance is slightly better than random guessing, but not by much. AdaBoost is short for "adaptive boosting".
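The bootstrap-resampling step behind bagging (6.1) is easy to show on its own. A minimal sketch, with an assumed helper name `bootstrap_sample` that is not part of the book's code:

```python
import numpy as np

def bootstrap_sample(X, y, rng):
    """Draw one bootstrap replicate: sample n rows with replacement."""
    n = len(X)
    idx = rng.integers(0, n, size=n)  # indices drawn with replacement
    return X[idx], y[idx]

rng = np.random.default_rng(0)
X = np.arange(10).reshape(5, 2)
y = np.array([1, 1, -1, -1, 1])
Xb, yb = bootstrap_sample(X, y, rng)
print(Xb.shape)  # each replicate is the same size as the original data set
```

Bagging trains one classifier per replicate and votes their predictions; because each replicate omits roughly a third of the rows, the individual classifiers differ enough for the vote to reduce variance.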
The code is as follows:
```python
# -*- coding: utf-8 -*-
from numpy import *
import numpy as np


def loadSimDat():
    dataMat = matrix([[1.0, 2.1],
                      [2.0, 1.1],
                      [1.3, 1.0],
                      [1.0, 1.0],
                      [2.0, 1.0]])
    classLabels = [1.0, 1.0, -1.0, -1.0, 1.0]
    return dataMat, classLabels


# Decision stump classifier: just threshold one feature
def stumpClassify(dataMatrix, dimen, threshVal, threshIneq):
    retArray = ones((shape(dataMatrix)[0], 1))
    if threshIneq == 'lt':
        retArray[dataMatrix[:, dimen] <= threshVal] = -1.0
    else:
        retArray[dataMatrix[:, dimen] > threshVal] = -1.0
    return retArray


# Decision stump generation: find the best single-feature split under weights D
def buildStump(dataArr, classLabels, D):
    dataMatrix = mat(dataArr); labelMat = mat(classLabels).T
    m, n = shape(dataMatrix)
    numSteps = 10.0; bestStump = {}; bestClassEst = mat(zeros((m, 1)))
    minError = inf
    for i in range(n):
        rangeMin = dataMatrix[:, i].min(); rangeMax = dataMatrix[:, i].max()
        stepSize = (rangeMax - rangeMin) / numSteps
        for j in range(-1, int(numSteps) + 1):
            for inequal in ['lt', 'gt']:
                threshVal = rangeMin + float(j) * stepSize
                predictedVals = stumpClassify(dataMatrix, i, threshVal, inequal)
                errArr = mat(ones((m, 1)))
                errArr[predictedVals == labelMat] = 0
                # Weighted error rate: elementwise product of the error
                # vector errArr and the weight vector D, summed
                weightedError = D.T * errArr
                if weightedError < minError:
                    minError = weightedError
                    bestClassEst = predictedVals.copy()
                    bestStump['dim'] = i
                    bestStump['thresh'] = threshVal
                    bestStump['ineq'] = inequal
    return bestStump, minError, bestClassEst


# AdaBoost training based on decision stumps
def adaBoostTrainDS(dataArr, classLabels, numIt=40):
    weakClassArr = []
    m = shape(dataArr)[0]
    D = mat(ones((m, 1)) / m)
    aggClassEst = mat(zeros((m, 1)))
    for i in range(numIt):
        bestStump, error, classEst = buildStump(dataArr, classLabels, D)
        print("D:", D.T)
        # max(error, 1e-16) guards against a division-by-zero overflow
        # when there is no error
        alpha = float(0.5 * np.log((1.0 - error) / max(error, 1e-16)))
        bestStump['alpha'] = alpha
        weakClassArr.append(bestStump)
        print("classEst:", classEst.T)
        # The product's sign distinguishes correctly from incorrectly
        # classified samples
        expon = np.multiply(-1 * alpha * mat(classLabels).T, classEst)
        D = np.multiply(D, np.exp(expon))
        D = D / D.sum()  # normalization
        aggClassEst += alpha * classEst  # accumulate into the strong classifier
        print("aggClassEst:", aggClassEst.T)
        aggErrors = np.multiply(np.sign(aggClassEst) != mat(classLabels).T,
                                ones((m, 1)))
        errorRate = aggErrors.sum() / m
        print("total error: ", errorRate, "\n")
        if errorRate == 0.0:
            break
    return weakClassArr


# AdaBoost classification function
def adaClassify(datToClass, classifierArr):
    dataMatrix = mat(datToClass)
    m = shape(dataMatrix)[0]
    aggClassEst = mat(zeros((m, 1)))
    for i in range(len(classifierArr)):
        classEst = stumpClassify(dataMatrix, classifierArr[i]['dim'],
                                 classifierArr[i]['thresh'],
                                 classifierArr[i]['ineq'])
        aggClassEst += classifierArr[i]['alpha'] * classEst
        print(aggClassEst)
    return np.sign(aggClassEst)


datMat, classLabels = loadSimDat()
D = mat(ones((5, 1)) / 5)
classifierArray = adaBoostTrainDS(datMat, classLabels, 9)
print(adaClassify([[5, 5], [0, 0]], classifierArray))


# Case study: predicting horse mortality from colic symptoms
def loadDataSet(fileName):  # general function to parse tab-delimited floats
    numFeat = len(open(fileName).readline().split('\t'))  # auto-detect the number of features
    dataMat = []; labelMat = []
    fr = open(fileName)
    for line in fr.readlines():
        lineArr = []
        curLine = line.strip().split('\t')
        for i in range(numFeat - 1):
            lineArr.append(float(curLine[i]))
        dataMat.append(lineArr)
        labelMat.append(float(curLine[-1]))
    return dataMat, labelMat


datArr, labelArr = loadDataSet('horseColicTraining2.txt')
classifierArray = adaBoostTrainDS(datArr, labelArr, 10)
testArr, testLabelArr = loadDataSet('horseColicTest2.txt')
prediction10 = adaClassify(testArr, classifierArray)
errArr = mat(ones((67, 1)))
print(errArr[prediction10 != mat(testLabelArr).T].sum())
```
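The two key lines of the training loop can be checked by hand. With weighted error ε, each stump receives weight alpha = 0.5·ln((1−ε)/ε), and each sample weight is multiplied by exp(−alpha·y·h(x)) before renormalizing, so misclassified samples grow heavier. A stand-alone check with toy numbers (not the book's data set):

```python
import numpy as np

# Five samples, uniform initial weights, one stump that is wrong on index 2 only
D = np.ones(5) / 5
y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])
pred = np.array([1.0, 1.0, 1.0, -1.0, 1.0])

eps = D[pred != y].sum()                     # weighted error
alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-16))

D = D * np.exp(-alpha * y * pred)            # shrink correct, grow incorrect
D = D / D.sum()                              # renormalize to a distribution

print(round(eps, 3))   # 0.2
print(round(D[2], 3))  # 0.5 -- the misclassified sample now carries half the weight
```

With ε = 0.2, correct samples are scaled by exp(−alpha) = 0.5 and the wrong one by exp(+alpha) = 2, which after normalization gives the next stump a strong incentive to fix sample 2.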