ML: AdaBoost (Part 2), MATLAB Code
华电北风吹
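Key Laboratory of Cognitive Computing and Application, Tianjin University
Last modified: 2015/7/27
I read several introductions to AdaBoost online and felt that none of them explained it well enough to be fully understood. So I downloaded an implementation written by a foreign author (Cuneyt Mertayak, according to the code headers) and finally understood AdaBoost thoroughly. Now I can move on to Transfer. Here is the code:
Main script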
clear; clc;
%
% DEMONSTRATION OF ADABOOST_tr and ADABOOST_te
%
% Just type "demo" to run the demo.
%
% Using adaboost with linear threshold classifier
% for a two class classification problem.
%
% Bug Reporting: Please contact the author for bug reporting and comments.
%
% Cuneyt Mertayak
% email: cuneyt.mertayak@gmail.com
% version: 1.0
% date: 21/05/2007

% Creating the training and testing sets
%
tr_n = 200;
te_n = 200;
weak_learner_n = 20;
tr_set = abs(rand(tr_n,2))*100;
te_set = abs(rand(te_n,2))*100;
tr_labels = (tr_set(:,1)-tr_set(:,2) > 0) + 1;
te_labels = (te_set(:,1)-te_set(:,2) > 0) + 1;

% Displaying the training and testing sets
figure;
subplot(2,2,1);
hold on; axis square;
indices = tr_labels==1;
plot(tr_set(indices,1),tr_set(indices,2),'b*');
indices = ~indices;
plot(tr_set(indices,1),tr_set(indices,2),'r*');
title('Training set');

subplot(2,2,2);
hold on; axis square;
indices = te_labels==1;
plot(te_set(indices,1),te_set(indices,2),'b*');
indices = ~indices;
plot(te_set(indices,1),te_set(indices,2),'r*');
title('Testing set');

% Training and testing error rates
tr_error = zeros(1,weak_learner_n);
te_error = zeros(1,weak_learner_n);
for i=1:weak_learner_n
    adaboost_model = ADABOOST_tr(@threshold_tr,@threshold_te,tr_set,tr_labels,i);
    % Evaluate on the training samples
    [L_tr,hits_tr] = ADABOOST_te(adaboost_model,@threshold_te,tr_set,tr_labels);
    tr_error(i) = (tr_n-hits_tr)/tr_n;
    % Evaluate on the testing samples
    [L_te,hits_te] = ADABOOST_te(adaboost_model,@threshold_te,te_set,te_labels);
    te_error(i) = (te_n-hits_te)/te_n;
end

subplot(2,2,3);
plot(1:weak_learner_n,tr_error);
axis([1,weak_learner_n,0,1]);
title('Training Error');
xlabel('weak classifier number');
ylabel('error rate');
grid on;

subplot(2,2,4); axis square;
plot(1:weak_learner_n,te_error);
axis([1,weak_learner_n,0,1]);
title('Testing Error');
xlabel('weak classifier number');
ylabel('error rate');
grid on;

To get the error rate for every number of iterations, the script retrains the whole ensemble from scratch for each i, so the computer repeats earlier work as the iteration count grows.
The weak classifier training function:
function model = threshold_tr(train_set, sample_weights, labels)
%
% TRAINING THRESHOLD CLASSIFIER
%
% Training of the basic linear classifier where separation hyperplane
% is perpendicular to one dimension.
%
% model = threshold_tr(train_set, sample_weights, labels)
% train_set: an NxD-matrix, each row is a training sample in the D dimensional feature
%     space.
% sample_weights: an Nx1-vector, each entry is the weight of the corresponding training sample
% labels: Nx1 dimensional vector, each entry is the corresponding label (either 1 or 2)
%
% model: the output model. It consists of
%     1) min_error: training error
%     2) min_error_thr: threshold value
%     3) pos_neg: whether up-direction shows the positive region (label:2, 'pos') or
%        the negative region (label:1, 'neg')
%     4) dim: index of the feature dimension that is thresholded
%
% Bug Reporting: Please contact the author for bug reporting and comments.
%
% Cuneyt Mertayak
% email: cuneyt.mertayak@gmail.com
% version: 1.0
% date: 21/05/2007

model = struct('min_error',[],'min_error_thr',[],'pos_neg',[],'dim',[]);
sample_n = size(train_set,1);
min_error = sum(sample_weights);
min_error_thr = 0;
pos_neg = 'pos';

% for each dimension
for dim=1:size(train_set,2)
    sorted = sort(train_set(:,dim),1,'ascend');

    % for each interval in the specified dimension
    for i=1:(sample_n+1)
        if(i==1)
            thr = sorted(1)-0.5;
        elseif(i==sample_n+1)
            thr = sorted(sample_n)+0.5;
        else
            thr = (sorted(i-1)+sorted(i))/2;
        end

        % 'pos' orientation: label 1 below the threshold, label 2 above it
        ind1 = train_set(:,dim) < thr;
        ind2 = ~ind1;
        tmp_err = sum(sample_weights((labels.*ind1)==2))+sum(sample_weights((labels.*ind2)==1));
        if(tmp_err < min_error)
            min_error = tmp_err;
            min_error_thr = thr;
            pos_neg = 'pos';
            model.dim = dim;
        end

        % 'neg' orientation: label 2 below the threshold, label 1 above it
        ind1 = train_set(:,dim) < thr;
        ind2 = ~ind1;
        tmp_err = sum(sample_weights((labels.*ind1)==1))+sum(sample_weights((labels.*ind2)==2));
        if(tmp_err < min_error)
            min_error = tmp_err;
            min_error_thr = thr;
            pos_neg = 'neg';
            model.dim = dim;
        end
    end
end

model.min_error = min_error;
model.min_error_thr = min_error_thr;
model.pos_neg = pos_neg;

I won't go over the classifier's inputs and outputs. The classifier is the simplest one possible: a hyperplane perpendicular to a coordinate axis. Among all dim*(sample_n+1) candidate hyperplanes, each tried in both orientations, the model picks the one with the lowest weighted classification error as the best hyperplane under the current sample weights and returns it.
The weak classifier testing function:
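To make the search concrete, here is a minimal sketch of the same idea on a single feature. The four data points, the uniform weights, and the variable names (x, w, thr_list, best_thr, best_dir) are made up for illustration and are not part of the downloaded code. It relies on the fact that, with labels restricted to 1 and 2, the 'neg' orientation flips every prediction of the 'pos' orientation, so its weighted error is the complement.

```matlab
% Toy 1-D data: four samples, labels in {1, 2}, uniform weights (illustrative).
x = [1; 2; 3; 4];
labels = [1; 1; 2; 2];
w = ones(4, 1) / 4;

% Candidates: below the minimum, midpoints between neighbours, above the maximum.
s = sort(x);
thr_list = [s(1) - 0.5; (s(1:end-1) + s(2:end)) / 2; s(end) + 0.5];

orientations = {'pos', 'neg'};
best_err = inf; best_thr = NaN; best_dir = '';
for k = 1:numel(thr_list)
    below = x < thr_list(k);
    % 'pos': label 1 below the threshold, label 2 otherwise.
    err_pos = sum(w(below & labels == 2)) + sum(w(~below & labels == 1));
    % 'neg' flips every prediction, so its weighted error is the complement.
    err_neg = sum(w) - err_pos;
    [err, which] = min([err_pos, err_neg]);
    if err < best_err
        best_err = err;
        best_thr = thr_list(k);
        best_dir = orientations{which};
    end
end
fprintf('threshold %.1f, orientation %s, weighted error %.2f\n', ...
        best_thr, best_dir, best_err);
```

On this toy data the search settles on the midpoint 2.5 with the 'pos' orientation and zero weighted error, because the two classes are perfectly separable along the feature.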
function [L,hits,error_rate] = threshold_te(model,test_set,sample_weights,true_labels)
%
% TESTING THRESHOLD CLASSIFIER
%
% Testing of the basic linear classifier where separation hyperplane is
% perpendicular to one dimension.
%
% [L,hits,error_rate] = threshold_te(model,test_set,sample_weights,true_labels)
%
% model: the model that is output by threshold_tr. It consists of
%     1) min_error: training error
%     2) min_error_thr: threshold value
%     3) pos_neg: whether up-direction shows the positive region (label:2, 'pos') or
%        the negative region (label:1, 'neg')
% test_set: an NxD-matrix, each row is a testing sample in the D dimensional feature
%     space.
% sample_weights: an Nx1-vector, each entry is the weight of the corresponding test sample
% true_labels: Nx1 dimensional vector, each entry is the corresponding label (either 1 or 2)
%
% L: an Nx2-matrix showing likelihoods of each class
% hits: the number of hits
% error_rate: the error rate with the sample weights
%
%
% Bug Reporting: Please contact the author for bug reporting and comments.
%
% Cuneyt Mertayak
% email: cuneyt.mertayak@gmail.com
% version: 1.0
% date: 21/05/2007

feat = test_set(:,model.dim);
if(strcmp(model.pos_neg,'pos'))
    ind = (feat>model.min_error_thr)+1;
else
    ind = (feat<model.min_error_thr)+1;
end

hits = sum(ind==true_labels);
error_rate = sum(sample_weights(ind~=true_labels));

L = zeros(length(feat),2);
L(ind==1,1) = 1;
L(ind==2,2) = 1;

The testing function applies a trained model to the input data and returns metrics such as the error rate. It mirrors the training function, so once you understand that one, this one is simple: take the single feature dimension the model selected and classify each sample by comparing it with the threshold.
The AdaBoost training function:
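As a small illustration of that thresholding step, the sketch below applies a hand-written stump to four samples. The model values, the data, and the weights are hypothetical; only the field names (dim, min_error_thr, pos_neg) follow the code above.

```matlab
% Hypothetical stump: threshold feature 2 at 5.0, label 2 above the threshold.
model = struct('dim', 2, 'min_error_thr', 5.0, 'pos_neg', 'pos');

test_set = [1 9; 2 3; 7 6; 8 1];   % four samples, two features (illustrative)
true_labels = [2; 1; 1; 1];
w = [0.1; 0.2; 0.3; 0.4];          % sample weights summing to 1

% Only the selected feature dimension matters for the prediction.
feat = test_set(:, model.dim);
if strcmp(model.pos_neg, 'pos')
    pred = (feat > model.min_error_thr) + 1;
else
    pred = (feat < model.min_error_thr) + 1;
end

hits = sum(pred == true_labels);            % unweighted count of correct samples
error_rate = sum(w(pred ~= true_labels));   % weighted error

% One-hot "likelihood" matrix: row i has a 1 in the column of the predicted class.
L = zeros(numel(pred), 2);
L(sub2ind(size(L), (1:numel(pred))', pred)) = 1;
disp([pred, true_labels]);
```

Only the third sample is misclassified here, so hits is 3 and the weighted error equals that sample's weight, 0.3. Note that hits ignores the weights while error_rate uses them.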
function adaboost_model = ADABOOST_tr(tr_func_handle,te_func_handle,train_set,labels,no_of_hypothesis)
%
% ADABOOST TRAINING: A META-LEARNING ALGORITHM
% adaboost_model = ADABOOST_tr(tr_func_handle,te_func_handle,
%                              train_set,labels,no_of_hypothesis)
%
% 'tr_func_handle' and 'te_func_handle' are function handles for
% training and testing of a weak learner, respectively. The weak learner
% has to support the learning in weighted datasets. The prototypes
% of these functions have to be as follows.
%
% model = train_func(train_set,sample_weights,labels)
%     train_set: a TxD-matrix where each row is a training sample in
%         a D dimensional feature space.
%     sample_weights: a Tx1 dimensional vector, the i-th entry
%         of which denotes the weight of the i-th sample.
%     labels: a Tx1 dimensional vector, the i-th entry of which
%         is the label of the i-th sample.
%     model: the output model of the training phase, which can
%         consist of the estimated parameters.
%
% [L,hits,error_rate] = test_func(model,test_set,sample_weights,true_labels)
%     model: the output of train_func
%     test_set: a KxD dimensional matrix, each of whose rows is a
%         testing sample in a D dimensional feature space.
%     sample_weights: a Kx1 dimensional vector, the i-th entry
%         of which denotes the weight of the i-th sample.
%     true_labels: a Kx1 dimensional vector, the i-th entry of which
%         is the label of the i-th sample.
%     L: a KxC matrix of per-class likelihoods of the samples
%         (C is the number of classes).
%     hits: number of hits, calculated with the comparison of the
%         predicted labels and true_labels.
%     error_rate: sum of the weights of the misclassified samples.
%
%
% 'train_set' contains the samples for training and it is NxD matrix
% where N is the number of samples and D is the dimension of the
% feature space. 'labels' is an Nx1 matrix containing the class
% labels of the samples. 'no_of_hypothesis' is the number of weak
% learners to be used.
%
% The output 'adaboost_model' is a structure with the fields
%  - 'weights': 1x'no_of_hypothesis' matrix specifying the weights
%               of the resulted weighted majority voting combination
%  - 'parameters': 1x'no_of_hypothesis' structure matrix specifying
%                  the special parameters of the hypothesis that is
%                  created at the corresponding iteration of
%                  learning algorithm
%
% Specific Properties That Must Be Satisfied by The Function pointed
% by 'func_handle'
% ------------------------------------------------------------------
%
% Note: Labels must be positive integers from 1 up to the number of classes.
% Note-2: Weighting is done as specified in AIMA book, Stuart Russell et.al. (sec edition)
%
% Bug Reporting: Please contact the author for bug reporting and comments.
%
% Cuneyt Mertayak
% email: cuneyt.mertayak@gmail.com
% version: 1.0
% date: 21/05/2007
%

adaboost_model = struct('weights',zeros(1,no_of_hypothesis),'parameters',[]); %cell(1,no_of_hypothesis));

sample_n = size(train_set,1);
samples_weight = ones(sample_n,1)/sample_n;

for turn=1:no_of_hypothesis
    model=tr_func_handle(train_set,samples_weight,labels);
    adaboost_model.parameters{turn} =model;

    [L,hits,error_rate]=te_func_handle(adaboost_model.parameters{turn},train_set,samples_weight,labels);
    if(error_rate==1)
        error_rate=1-eps;
    elseif(error_rate==0)
        error_rate=eps;
    end

    % The weight of the turn-th weak classifier
    adaboost_model.weights(turn) = log10((1-error_rate)/error_rate);
    C=likelihood2class(L);
    t_labeled=(C==labels); % true labeled samples

    % Importance of the true classified samples is decreased for the next weak classifier
    samples_weight(t_labeled) = samples_weight(t_labeled)*((error_rate)/(1-error_rate));

    % Normalization
    samples_weight = samples_weight/sum(samples_weight);
end

% Normalization
adaboost_model.weights=adaboost_model.weights/sum(adaboost_model.weights);

The function runs for the requested number of iterations. Each round trains a new weak model on the current sample weights, computes that model's weight from its weighted error, shrinks the weights of the correctly classified samples, renormalizes, and moves on to the next round.
The AdaBoost testing function:
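A single round of this update can be worked through by hand. The sketch below uses five samples and assumes the weak classifier gets only the last one wrong; these values are illustrative and do not come from the demo. The formulas are the ones used in ADABOOST_tr.

```matlab
% Five samples with uniform initial weights; assume the weak classifier
% misclassifies only sample 5 (illustrative values, not from the demo).
w = ones(5, 1) / 5;
correct = logical([1; 1; 1; 1; 0]);

error_rate = sum(w(~correct));                  % weighted error: 0.2
alpha = log10((1 - error_rate) / error_rate);   % classifier weight: log10(4)

% Shrink the weights of correctly classified samples, then renormalize.
w(correct) = w(correct) * (error_rate / (1 - error_rate));
w = w / sum(w);

disp(alpha);
disp(w');
```

After the update the four correct samples carry 0.125 each and the misclassified one carries 0.5. This is a general property of the rule: the correct samples' total weight (1 - e) is scaled by e/(1 - e) to e, which equals the misclassified total, so after normalization each group holds half of the weight and the next weak learner is pushed toward the previous mistakes.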
function [L,hits] = ADABOOST_te(adaboost_model,te_func_handle,test_set,true_labels)
%
% ADABOOST TESTING
%
% [L,hits] = ADABOOST_te(adaboost_model,te_func_handle,train_set,
%                        true_labels)
%
% 'te_func_handle' is a handle to the testing function of a
% learning (weak) algorithm whose prototype is shown below.
%
% [L,hits,error_rate] = test_func(model,test_set,sample_weights,true_labels)
%     model: the output of train_func
%     test_set: a KxD dimensional matrix, each of whose rows is a
%         testing sample in a D dimensional feature space.
%     sample_weights: a Kx1 dimensional vector, the i-th entry
%         of which denotes the weight of the i-th sample.
%     true_labels: a Kx1 dimensional vector, the i-th entry of which
%         is the label of the i-th sample.
%     L: a KxC matrix of per-class likelihoods of the samples
%         (C is the number of classes).
%     hits: number of hits, calculated with the comparison of the
%         predicted labels and true_labels.
%     error_rate: sum of the weights of the misclassified samples.
%
% It is the corresponding testing
% module of the function that is specified in the training phase.
% 'test_set' is a NxD matrix where N is the number of samples
% in the test set and D is the dimension of the feature space.
% 'true_labels' is a Nx1 matrix specifying the class label of
% each corresponding sample's features (each row) in 'test_set'.
% 'adaboost_model' is the model that is generated by the function
% 'ADABOOST_tr'.
%
% 'L' is the likelihoods that are assigned by the 'ADABOOST_te'.
% 'hits' is the number of correctly predicted labels.
%
% Specific Properties That Must Be Satisfied by The Function pointed
% by 'func_handle'
% ------------------------------------------------------------------
%
% Notice: Labels must be positive integer values from 1 up to the number of classes.
%
% Bug Reporting: Please contact the author for bug reporting and comments.
%
% Cuneyt Mertayak
% email: cuneyt.mertayak@gmail.com
% version: 1.0
% date: 21/05/2007
%

hypothesis_n = length(adaboost_model.weights);
sample_n = size(test_set,1);
class_n = length(unique(true_labels));
temp_L = zeros(sample_n,class_n,hypothesis_n); % likelihoods for each weak classifier

% for each weak classifier, likelihoods of test samples are collected
for i=1:hypothesis_n
    [temp_L(:,:,i),hits,error_rate] = te_func_handle(adaboost_model.parameters{i},test_set,ones(sample_n,1),true_labels);
    temp_L(:,:,i) = temp_L(:,:,i)*adaboost_model.weights(i);
end

L = sum(temp_L,3);
hits = sum(likelihood2class(L)==true_labels);

Not much to say here: every trained weak model is run on the samples, each model's output is multiplied by that model's weight, and the weighted vote decides the final result.
A helper function that converts the results:
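The weighted vote can be illustrated with made-up numbers. In the sketch below, the one-hot votes of three hypothetical weak classifiers and their weights (alpha) are invented for the example; the combination step is the same sum of weighted likelihoods used in ADABOOST_te.

```matlab
% One-hot votes of three hypothetical weak classifiers on four samples
% (rows: samples, columns: classes 1 and 2, pages: classifiers).
votes = zeros(4, 2, 3);
votes(:, :, 1) = [1 0; 1 0; 0 1; 0 1];
votes(:, :, 2) = [1 0; 0 1; 0 1; 1 0];
votes(:, :, 3) = [0 1; 0 1; 0 1; 1 0];
alpha = [0.45, 0.35, 0.20];   % normalized classifier weights (illustrative)

% Accumulate each classifier's vote scaled by its weight.
L = zeros(4, 2);
for i = 1:3
    L = L + alpha(i) * votes(:, :, i);
end

% The class with the larger accumulated weight wins.
[~, predicted] = max(L, [], 2);
disp([L, predicted]);
```

Samples 2 and 4 show why the vote is weighted rather than decided by the single best classifier: classifier 1 has the largest weight, yet the other two together (0.55) outvote it (0.45).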
function classes = likelihood2class(likelihoods)
%
% LIKELIHOODS TO CLASSES
%
% classes = likelihood2class(likelihoods)
%
% Find the class assignment of the samples from the likelihoods
% 'likelihoods' an NxC matrix where N is the number of samples and
% C is the number of classes. 'likelihoods(i,j)' is
% the i-th sample's likelihood of belonging to class-j.
%
% 'classes' contains the maximum-likelihood class index of each sample
%
% Bug Reporting: Please contact the author for bug reporting and comments.
%
% Cuneyt Mertayak
% email: cuneyt.mertayak@gmail.com
% version: 1.0
% date: 21/05/2007
%

[sample_n,class_n] = size(likelihoods);
maxs = (likelihoods==repmat(max(likelihoods,[],2),[1,class_n]));

classes=zeros(sample_n,1);
for i=1:sample_n
    classes(i) = find(maxs(i,:),1);
end

I won't dwell on this one either. It converts the likelihood matrix into class labels: for each sample it returns the index of the class with the largest likelihood, taking the first such class on a tie. When reading someone else's code there is no need to study every detail; grasping the essence is enough. Time for a break.
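For reference, the same conversion can be written without a loop. This is an alternative sketch, not the original author's code: it uses the second output of MATLAB's max, which is the column index of the row maximum and, on ties, the first such column. The example matrix is made up, and the loop from likelihood2class is repeated only to check that both give the same labels.

```matlab
% Three samples, two classes; the last row is a deliberate tie (illustrative).
likelihoods = [0.8 0.2; 0.45 0.55; 0.5 0.5];

% Loop-free version: index of the row-wise maximum, first column on ties.
[~, classes] = max(likelihoods, [], 2);

% Reference: the loop used in likelihood2class.
[sample_n, class_n] = size(likelihoods);
maxs = (likelihoods == repmat(max(likelihoods, [], 2), [1, class_n]));
ref = zeros(sample_n, 1);
for i = 1:sample_n
    ref(i) = find(maxs(i, :), 1);
end

disp([classes, ref]);
```

Both columns agree, including on the tied row, where class 1 is chosen because it comes first.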
------------------
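Wishing you good health and all the best
华电北风吹
Key Laboratory of Cognitive Computing and Application, Tianjin University
92 Weijin Road, Tianjin
Postal code: 300072
Email: 1194603539@qq.com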