MATLAB toolkits for machine learning: CLOP

Source: Internet  Editor: 程序博客网  Time: 2024/05/16 09:16


I. Introduction

1, What is CLOP?

CLOP is built on the Spider package, so it also provides all of Spider's functionality:

– bias
– chain
– ensemble
– gentleboost
– gs
– kridge
– naive
– neural
– normalize
– pc_extract
– relief
– rf
– rffs
– shift_n_scale
– standardize
– subsample
– svc
– svcrfe
– s2n

Both are toolkits for doing machine learning in an object-oriented programming style.

2, How to install CLOP?

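Step 1. Download the CLOP archive and unzip it into a directory of your choice, e.g. root_dir: MyProjects/

Step 2. Use set_path to point MATLAB at the working directory MyProjects/

Step 3. Put the prepared datasets in MyProjects/Data/

Step 4. You can then run the CLOP examples through the main function.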

3, How to run CLOP?

See the main function.

4, Compilation of SVC

5, Object-oriented MATLAB

CLOP abstracts every model as an object:

•  An object is a structure (i.e. it has data members) with a number of programs (or methods) associated with it. The methods may modify the data members.
•  The methods of an object myObject are stored in a directory named @myObject, which must be in your MATLAB path if you want to use the object (e.g. call addpath).
•  One particular method, called the constructor, is named myObject. It is called (possibly with some parameters) to create a new object. For example:
>>  myObjectInstance  =  myObject(someParameters);
•  Once an instance is created, you can call a particular method. For example:
>>  myResults  =  myObjectMethod(myObjectInstance,  someOtherParameters);
•  Note that myObjectInstance should be the first argument of myObjectMethod. Because myObjectInstance is an instance of myObject, MATLAB knows it must call the method myObjectMethod found in the directory @myObject. This allows method overloading (i.e. calling methods of the same name for different objects).
•  Inheritance is supported in MATLAB, so an object may be derived from another object. A child object inherits the methods of its parents. For example:
>>  myResults  =  myParentMethod(myObjectInstance,  someOtherParameters);
In that case, the method myParentMethod is found in the parent directory @myObjectParent, unless of course it has been overloaded by a method of the same name found in @myObject.
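To make the @-directory mechanism concrete, here is a minimal sketch that writes a throwaway class to a temporary folder and then uses it. The class name myObject, its field value and the method doubleValue are hypothetical and are not part of CLOP; the sketch only assumes MATLAB's folder-based (old-style) class syntax, where the constructor calls class(s, 'myObject').

```matlab
% Build a tiny folder-based class on the fly (all names are hypothetical).
root_dir  = tempname;                         % unique temporary directory
class_dir = fullfile(root_dir, '@myObject');  % methods live in @myObject
mkdir(class_dir);

% Constructor file: @myObject/myObject.m
fid = fopen(fullfile(class_dir, 'myObject.m'), 'w');
fprintf(fid, '%s\n', ...
    'function obj = myObject(value)', ...
    '    s.value = value;              % data member', ...
    '    obj = class(s, ''myObject''); % turn the struct into an object', ...
    'end');
fclose(fid);

% Method file: @myObject/doubleValue.m
fid = fopen(fullfile(class_dir, 'doubleValue.m'), 'w');
fprintf(fid, '%s\n', ...
    'function r = doubleValue(obj)', ...
    '    r = 2 * obj.value;            % methods may read the data members', ...
    'end');
fclose(fid);

% The PARENT of the @-directory has to be on the path.
addpath(root_dir);

obj    = myObject(21);       % calls the constructor
result = doubleValue(obj);   % dispatches to @myObject/doubleValue.m
cls    = class(obj);
fprintf('class: %s, result: %d\n', cls, result);

% Clean up the path again.
clear obj
rmpath(root_dir);
```

The object is passed as the first argument, and MATLAB picks the method from the @-directory matching the object's class, which is the dispatch rule described in the list above.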

II. Sample program: a walkthrough of the example function main

1, main.m

(1) Initialization: clear the workspace

clear all
close all
%clc

(2) Initialization: set some parameters

my_root     = 'C:/Users/wenting.tu/Documents/MATLAB';  % Change that to the directory of your project
data_dir    = [my_root '/DataAgnos_w_labels'];  % Path to the five data directories downloaded (ADA, GINA, HIVA, NOVA, SYLVA).
resu_dir    = [my_root '/Results']; % Where the results will end up.
zip_dir     = [my_root '/Zipped'];  % Zipped files with results ready to go!
model_dir   = [my_root '/Models'];  % Where the trained models will end up.
code_dir    = [my_root '/Clop'];    % Path to the sample code or the
                                    % Challenge Learning Objects Package (CLOP).
score_dir   = [my_root '/Score'];   % Directory where the model scores are found.
ForceOverWrite  = 1;                % Change this value to 0 if you want to be warned when
                                    % a file already exists before saving a result or model.
DoNotLoadTestData   = 1;            % To save memory, does not load the test data
MergeDataSets   = 0;                % If this flag is zero, training is done on the
                                    % training data only. Otherwise training
                                    % and validation data are merged.
FoldNum = 0;                        % If this flag is positive,
                                    % k-fold cross-validation is performed
                                    % with k=FoldNum.
CorrectBias=0;                      % Post fitting of the bias by cross-validation
                                    % Works only if FoldNum>0
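As a side note on the directory settings above, the following minimal sketch shows one way to build such paths portably and to make sure the output directories exist before anything is saved. The project root used here (a folder named clop_demo_project under tempdir) is a hypothetical placeholder, and the sketch uses only built-in MATLAB functions, not CLOP's own helpers.

```matlab
% Hypothetical project root; replace it with your own directory.
my_root = fullfile(tempdir, 'clop_demo_project');

% fullfile inserts the correct file separator for the current platform.
resu_dir  = fullfile(my_root, 'Results');
zip_dir   = fullfile(my_root, 'Zipped');
model_dir = fullfile(my_root, 'Models');
code_dir  = fullfile(my_root, 'Clop');

% Create the output directories if they are missing.
out_dirs = {resu_dir, zip_dir, model_dir};
for i = 1:numel(out_dirs)
    if ~exist(out_dirs{i}, 'dir')
        mkdir(out_dirs{i});
    end
end

% Add the code directory to the path only when it is really there.
if exist(code_dir, 'dir')
    addpath(code_dir);
else
    fprintf('Code directory not found: %s\n', code_dir);
end
```

Checking with exist(..., 'dir') first avoids the warning that mkdir prints for an existing folder and the warning that addpath prints for a missing one.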

(3) Train/Test: loop over the models, training and testing each one

Step 1. Set the dataset name

% LOOP OVER DATASETS 
% ===================
for k = 1:length(dataset)
    
    data_name   = dataset{k};
    
    fprintf('\n-o-|-o-|-o-|-o-|-o-|-o-      %s      -o-|-o-|-o-|-o-|-o-|-o-\n', upper(data_name));
    fprintf('\n-o-|-o-|-o-|-o-|-o-|-o-|-o-|-o-|-o-|-o-|-o-|-o-|-o-|-o-|-o-|-o-\n\n');
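The excerpt does not show how dataset is defined, so here is a minimal, self-contained sketch of the iteration pattern: an outer loop over dataset names with an inner loop over model names, composing one result path per pair. The lowercase dataset names are an assumption based on the five directories mentioned in the parameter comments, and the model names model_a and model_b are purely hypothetical placeholders.

```matlab
% Assumed dataset names (lowercase; main.m upper-cases them for display).
dataset  = {'ada', 'gina', 'hiva', 'nova', 'sylva'};
% Hypothetical model names, only here to make the loop runnable.
modelset = {'model_a', 'model_b'};
resu_dir = 'Results';

n_runs = 0;
for k = 1:length(dataset)
    data_name = dataset{k};
    for j = 1:length(modelset)
        model_name = modelset{j};
        % One result file prefix per (model, dataset) pair.
        resu_name = [resu_dir '/' model_name '/' data_name];
        fprintf('%-6s x %-8s -> %s\n', upper(data_name), model_name, resu_name);
        n_runs = n_runs + 1;
    end
end
fprintf('%d model/dataset combinations\n', n_runs);
```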

Step 2. Set up the folder where the model is stored, and load the data

Loading the dataset and creating data structures:

The code automatically checks whether the data is already stored in .mat format. If not, it converts the data to .mat and saves it, so loading is faster on the next pass through the loop.

    % LOOP OVER MODELS 
    % ================
    for j = 1:length(modelset)
        
        model_name  = modelset{j};
        
        resu_name   = [resu_dir '/' model_name ];
        makedir(resu_name);
        resu_name   = [resu_name '/' data_name];
        
        % Create a data structure and check the data
        %===========================================
        % Note: it may seem wasteful to reload the data every time.
        % This is done because of memory management problems.
        fprintf('-- %s loading data --\n', upper(data_name));
        input_dir   = [data_dir '/' upper(data_name)];
        input_name  = [input_dir '/' data_name];
        D   = create_data_struct(input_name, DoNotLoadTestData);
        % New: compute data statistics
        data_stats(D);
        fprintf('-- %s data loaded --\n', upper(data_name));
        % Note: the data are saved as a Matlab structure 
        % so they will load faster the second time around. 
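The convert-once, load-fast-afterwards idea can be sketched with built-in functions only. This is not how create_data_struct is implemented (its source is not shown here); the file names and the toy matrix are hypothetical, and the sketch just illustrates the caching pattern: parse the text file on the first pass, save a .mat copy, and read the .mat copy on later passes.

```matlab
% Hypothetical text data file and its cached .mat counterpart.
txt_file = [tempname '.data'];
mat_file = [txt_file '.mat'];

% Write a small toy matrix as plain text so the example is self-contained.
X = magic(4);
save(txt_file, 'X', '-ascii');

for pass = 1:2
    if exist(mat_file, 'file')
        % Fast path: the binary copy already exists.
        S        = load(mat_file);
        X_loaded = S.X;
        source   = 'mat';
    else
        % Slow path: parse the text file, then cache it as .mat.
        X_loaded = load(txt_file, '-ascii');
        X        = X_loaded;
        save(mat_file, 'X');
        source   = 'text';
    end
    fprintf('pass %d: loaded %dx%d matrix from %s file\n', ...
            pass, size(X_loaded, 1), size(X_loaded, 2), source);
end
```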

Step 3. If a model already exists, load it directly

Calls the model_examples.m function with the model and dataset names in order to get back a valid CLOP or Spider model.

        % Build a model
        %==============
        fprintf('-- %s-%s building  model\n', upper(data_name), upper(model_name));
        % "model_examples" calls a model constructor and returns a learning object.
        % Enter at the prompt "> type model_examples" to view the examples.
        % All learning objects have the two methods "train" and "test".
        % To see the data members, type at the prompt "> struct(my_model)".
        my_model    = model_examples(model_name, data_name);
        % Save the model (untrained) in the result file
        save_model([resu_name , '_model'], my_model, ForceOverWrite, 0);
        % Reload it just to make sure
        my_model    = load_model([resu_name , '_model']);
        fprintf('-- %s-%s model built\n', upper(data_name), upper(model_name));

Step 4. If there is no model yet, it has to be built (trained)

        % Train the model
        %================
        debug=0; % Set to 1 to compute the training error in the process of training
        guessed_ber=[];
        if MergeDataSets
            % Get rid of the validation set
            if isempty(D.valid.Y), 
                fprintf('!! Cannot merge validation set, labels not available !!\n');
                return
            end
            D.train = data([D.train.X ; D.valid.X], [D.train.Y ; D.valid.Y]);
            % rmfield returns the modified struct, so the result must be assigned back
            D = rmfield(D, 'valid');
        end
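The merge step stacks the validation examples under the training examples. The sketch below reproduces that logic with plain structs and toy numbers standing in for CLOP data objects (an assumption made so it runs without CLOP). It also shows that rmfield does not modify its input in place: the returned struct has to be assigned back.

```matlab
% Toy stand-in for the data structure: plain structs, not CLOP data objects.
D.train.X = [1 2; 3 4; 5 6];   D.train.Y = [ 1; -1;  1];
D.valid.X = [7 8; 9 10];       D.valid.Y = [-1;  1];

MergeDataSets = 1;
if MergeDataSets
    if isempty(D.valid.Y)
        fprintf('Cannot merge validation set, labels not available\n');
    else
        % Stack validation rows under the training rows.
        D.train.X = [D.train.X; D.valid.X];
        D.train.Y = [D.train.Y; D.valid.Y];
        % rmfield returns a new struct; without the assignment D is unchanged.
        D = rmfield(D, 'valid');
    end
end

fprintf('training examples after merge: %d\n', size(D.train.X, 1));
fprintf('valid field still present: %d\n', isfield(D, 'valid'));
```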

Step 5. Cross-validate the model and compute the CV BER (balanced error rate)

        cv_ber=[];
        chain_length=[];
        correct_bias=0;
        if FoldNum > 0
            % Create a CV model
            cv_model    = cv(my_model, {['folds=' num2str(FoldNum)], 'store_all=0'});
            fprintf('-- %s-%s performing %d fold cross-validation\n', upper(data_name), upper(model_name), FoldNum);
            % Call the method "train" of the object "cv_model":
            cv_output   = train(cv_model, D.train); 
            % Collect the results
            OutX = []; OutY = []; ber =[];
            for kk = 1:FoldNum,
                outX    = cv_output.child{kk}.X;
                outY    = cv_output.child{kk}.Y;
                OutX    = [OutX; outX]; 
                OutY    = [OutY; outY]; 
                ber(kk) = balanced_errate(outX, outY);
            end
            % Check whether to do a bias correction
            if isa(my_model, 'chain') && CorrectBias
                chain_length=length(my_model.child);
                if isa(my_model{chain_length}, 'bias')
                    correct_bias=1;
                end
            end
            if correct_bias
                fprintf('== CV post-fitting of the bias ==\n');
                % Train a bias model with the cv outputs
                [d, cv_bias]=train(bias, data(OutX, OutY));            
                % Compensate the bias
                OutX=OutX+cv_bias.b0;
            end
            % Compute the CV error rate and error bar
            cv_ber   = balanced_errate(OutX, OutY);
            cv_ebar    = std(ber,1)/sqrt(FoldNum);
            fprintf('CV BER=%5.2f+-%5.2f%%\n', 100*cv_ber, 100*cv_ebar); 
            fprintf('-- %s-%s cross-validation done in %5.2f seconds\n', upper(data_name), upper(model_name), toc);
            fprintf('-- %s-%s training model on all training data\n', upper(data_name), upper(model_name));
            % Guessing the BER...
            guessed_ber=cv_ber; 
        end
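For readers unfamiliar with the two quantities printed above, here is a small sketch with toy numbers. It assumes the usual definition of the balanced error rate (the average of the error rate on the positive class and the error rate on the negative class) and reuses the error-bar formula from the code, std(ber,1)/sqrt(FoldNum). The helper ber_of is a hypothetical stand-in, not CLOP's balanced_errate, and it counts an output of exactly zero as an error for either class.

```matlab
% Hypothetical stand-in for a balanced error rate:
% average of the per-class error rates, with labels in {-1, +1}.
ber_of = @(out, y) 0.5 * (mean(out(y > 0) <= 0) + mean(out(y < 0) >= 0));

% Toy labels (4 positives, 6 negatives) and real-valued classifier outputs.
y   = [ 1;   1;    1;   1;  -1;   -1;   -1;  -1;   -1;   -1 ];
out = [0.9; 0.2; -0.3; 0.5; -0.8; -0.1; 0.4; -0.6; -0.2; -0.7];

ber       = ber_of(out, y);         % one of 4 positives and one of 6 negatives are wrong
plain_err = mean(sign(out) ~= y);   % ordinary error rate, for comparison

% Toy per-fold BER values and the error bar used in the code above.
fold_ber = [0.20 0.25 0.15 0.20 0.20];
FoldNum  = numel(fold_ber);
cv_ebar  = std(fold_ber, 1) / sqrt(FoldNum);   % std normalised by N, not N-1

fprintf('BER=%5.2f%%  plain error=%5.2f%%  error bar=%5.2f%%\n', ...
        100*ber, 100*plain_err, 100*cv_ebar);
```

With unbalanced classes the two error measures differ: here the BER is 5/24 while the plain error rate is 2/10, because the single mistake on the smaller positive class weighs more heavily in the balanced average.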