Reading the OpenCV Decision Tree Source Code (Part 1)

/*********************************************************************************************************
Module overview: Decision Trees
    1)The ML classes discussed in this section implement the Classification and Regression Tree
      (CART) algorithms described in [Breiman84].
    2)The class CvDTree represents a single decision tree that may be used alone or as a base class
      in tree ensembles (see Boosting and Random Trees).
    3)A decision tree is a binary tree (a tree where each non-leaf node has two child nodes). It can
      be used either for classification or for regression. For classification, each tree leaf is
      marked with a class label; multiple leaves may have the same label. For regression, a constant
      is assigned to each tree leaf, so the approximation function is piecewise constant.
Uses of decision trees:
    1)Predicting with decision trees: to reach a leaf node and obtain a response for the input
      feature vector, the prediction procedure starts at the root node. From each non-leaf node the
      procedure goes to the left (selects the left child node as the next observed node) or to the
      right, based on the value of a certain variable whose index is stored in the observed node.
      The following kinds of variables are possible:
        1)Ordered variables: the variable value is compared with a threshold that is also stored in
          the node. If the value is less than the threshold, the procedure goes to the left;
          otherwise it goes to the right. For example, if the weight is less than 1 kilogram, go to
          the left, else to the right.
        2)Categorical variables: a discrete variable value is tested to see whether it belongs to a
          certain subset of values (also stored in the node) from the limited set of values the
          variable can take. If it does, the procedure goes to the left; otherwise it goes to the
          right. For example, if the color is green or red, go to the left, else to the right.
      So, in each node, a pair of entities (variable_index, decision_rule (threshold/subset)) is
      used. This pair is called a split (a split on the variable variable_index). Once a leaf node
      is reached, the value assigned to that node is used as the output of the prediction procedure.
    2)Variable importance: besides prediction, which is the obvious use of decision trees, a tree
      can also be used for various data analyses. One of the key properties of the constructed
      decision tree algorithms is the ability to compute the importance (relative decisive power)
      of each variable. For example, in a spam filter that uses the set of words occurring in a
      message as a feature vector, the variable importance rating can be used to determine the most
      "spam-indicating" words and thus help keep the dictionary size reasonable.
      The importance of each variable is computed over all the splits on this variable in the tree,
      primary and surrogate ones. Thus, to compute variable importance correctly, surrogate splits
      must be enabled in the training parameters, even if there is no missing data.
*********************************************************************************************************/
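Before diving into the structures, here is a minimal sketch of the prediction descent described above, written against the CvDTreeSplit/CvDTreeNode structures shown below. It ignores surrogate splits and missing values for clarity; the function name and the var_type convention (negative = ordered, non-negative = categorical, mirroring the CvDTreeTrainData::var_type comment later in this article) are illustrative assumptions, not OpenCV API.

const CvDTreeNode* walk_to_leaf( const CvDTreeNode* node, const float* sample,
                                 const int* var_type )
{
    while( node->left )                        // every non-leaf node has two children
    {
        const CvDTreeSplit* split = node->split;   // primary split only
        float val = sample[split->var_idx];
        int dir;
        if( var_type[split->var_idx] < 0 )     // ordered variable: threshold test
            dir = val <= split->ord.c ? -1 : 1;
        else                                   // categorical variable: subset membership test
        {
            int c = (int)val;
            dir = ((split->subset[c >> 5] >> (c & 31)) & 1) ? -1 : 1;
        }
        if( split->inversed )                  // inversed splits swap the branches
            dir = -dir;
        node = dir < 0 ? node->left : node->right;
    }
    return node;                               // node->value holds the prediction
}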
/*********************************************************************************************************
Summary: the decision tree will be explained from the following aspects:
    1)The decision tree split structure ---------------- CvDTreeSplit
    2)The decision tree node structure ----------------- CvDTreeNode
    3)The decision tree parameter structure ------------ CvDTreeParams
    4)The decision tree training data structure -------- CvDTreeTrainData
    5)The decision tree class -------------------------- CvDTree
    6)The training function train() in machine learning
    7)The prediction function predict() in machine learning
    8)An introduction to CvStatModel, the common base class of the ML library
*********************************************************************************************************/
/*********************************************************************************************************
Module [1]:
    The structure represents a possible decision tree node split.
*********************************************************************************************************/
struct CvDTreeSplit
{
    int           var_idx;          //[1]Index of the variable on which the split is created
    int           condensed_idx;    //[2]Used internally by the training algorithm
    int           inversed;         //[3]If not 0, the left and right branches are swapped in the
                                    //   rule expressions below (the split rule is inversed)
    float         quality;          //[4]The split quality, a positive number. It is used to choose
                                    //   the best split, then to choose and sort the surrogate
                                    //   splits. After the tree is built, it is also used to compute
                                    //   variable importance
    CvDTreeSplit* next;             //[5]Pointer to the next split in the node's list of splits
    union
    {
        int subset[2];              //[6]Bit array indicating the value subset in case of a split on
                                    //   a categorical variable
        struct
        {
            float c;                //[7]The threshold value in case of a split on an ordered variable
            int   split_point;      //[8]Used internally by the training algorithm
        }
        ord;
    };
};
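As a reading aid, here is a hypothetical helper that shows how the fields above are meant to be interpreted (it ignores the inversed flag; the var_type convention again follows CvDTreeTrainData::var_type). Note that subset[2] packs at most 64 category bits.

#include <cstdio>

void print_primary_split( const CvDTreeSplit* s, const int* var_type )
{
    if( var_type[s->var_idx] < 0 )             // ordered variable: a single threshold
        std::printf( "split on var %d: go left if value <= %f (quality %f)\n",
                     s->var_idx, s->ord.c, s->quality );
    else                                       // categorical variable: a bit set of values
    {
        std::printf( "split on var %d: go left if value is in { ", s->var_idx );
        for( int c = 0; c < 64; c++ )          // subset[2] holds up to 64 bits
            if( (s->subset[c >> 5] >> (c & 31)) & 1 )
                std::printf( "%d ", c );
        std::printf( "} (quality %f)\n", s->quality );
    }
}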
/*********************************************************************************************************
Module [2]:
    The decision tree node structure: the structure represents a node in a decision tree.
Member description:
    1)class_idx: class index normalized to the 0..class_count-1 range and assigned to the node. It
      is used internally in classification trees and tree ensembles.
    2)Tn: tree index in the ordered sequence of pruned trees. The indices are used during and after
      the pruning procedure. The root node has the maximum value Tn of the whole tree, child nodes
      have Tn less than or equal to the parent's Tn, and nodes with Tn <= CvDTree::pruned_tree_idx
      are not used at the prediction stage (the corresponding branches are considered cut off),
      even if they have not been physically deleted from the tree at the pruning stage.
    3)value: value at the node: a class label in case of classification, or the estimated function
      value in case of regression.
    4)parent: pointer to the parent node.
    5)left: pointer to the left child node.
    6)right: pointer to the right child node.
    7)split: pointer to the first (primary) split in the node's list of splits.
    8)sample_count: the number of samples that fall into the node at the training stage. It is used
      to resolve the difficult case when the variable for the primary split is missing and all the
      variables for the other surrogate splits are missing too. In that case the sample is directed
      to the left if left->sample_count > right->sample_count, and to the right otherwise.
    9)depth: depth of the node. The root node depth is 0; a child node's depth is the parent's
      depth plus 1.
    10)The other members are used internally at the training stage.
*********************************************************************************************************/
struct CvDTreeNode
{
    int           class_idx;
    int           Tn;
    double        value;

    CvDTreeNode*  parent;
    CvDTreeNode*  left;
    CvDTreeNode*  right;

    CvDTreeSplit* split;

    int           sample_count;
    int           depth;
    int*          num_valid;
    int           offset;
    int           buf_idx;
    double        maxlr;

    // global pruning data
    int           complexity;
    double        alpha;
    double        node_risk, tree_risk, tree_error;

    // cross-validation pruning data
    int*          cv_Tn;
    double*       cv_node_risk;
    double*       cv_node_error;

    int   get_num_valid(int vi) { return num_valid ? num_valid[vi] : sample_count; }
    void  set_num_valid(int vi, int n) { if( num_valid ) num_valid[vi] = n; }
};
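A short sketch of the Tn / pruned_tree_idx rule described above: a node participates in prediction only while node->Tn > pruned_tree_idx, even when the pruned branch has not been physically removed. The function name is illustrative, not part of OpenCV.

int count_active_nodes( const CvDTreeNode* node, int pruned_tree_idx )
{
    if( !node || node->Tn <= pruned_tree_idx )
        return 0;                              // branch is cut off (or physically absent)
    return 1 + count_active_nodes( node->left,  pruned_tree_idx )
             + count_active_nodes( node->right, pruned_tree_idx );
}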
/*********************************************************************************************************
Module [3]:
    The decision tree parameter structure: CvDTreeParams
Parameter description:
    1)max_categories: the maximum number of categories (default 10 in the structure). If a
      categorical feature takes more than max_categories possible values, the algorithm clusters
      those values into no more than max_categories groups and looks for a suboptimal split over
      the groups, so each split tests at most max_categories value groups. The parameter has no
      effect on ordered or numerical features, for which the algorithm only needs to find the split
      threshold. Setting it to a smaller value reduces the computation time, at some loss of
      accuracy.
    2)max_depth: the maximum possible depth of the tree. The algorithm attempts to split a node
      only while its depth is less than max_depth. The actual depth may be smaller if other
      termination criteria are met or if the tree is pruned.
    3)min_sample_count: if the number of samples in a node is less than this parameter, the node
      will not be split.
    4)cv_folds: if cv_folds > 1, the tree is pruned with K-fold cross-validation, where K is equal
      to cv_folds.
    5)use_surrogates: if true, surrogate splits will be built. These splits allow working with
      missing data and computing variable importance correctly.
    6)use_1se_rule: if true, the pruning will be harsher. This makes the tree more compact and more
      resistant to noise in the training data, but a bit less accurate.
    7)truncate_pruned_tree: if true, pruned branches are physically removed from the tree.
      Otherwise they are retained, and it is possible to get results from the original unpruned
      (or less aggressively pruned) tree by decreasing the CvDTree::pruned_tree_idx parameter.
    8)regression_accuracy: termination criterion for regression trees. If all absolute differences
      between the estimated value in a node and the values of the training samples in that node are
      less than this parameter, the node will not be split.
    9)priors: the array of a priori class probabilities, sorted by class label value. The parameter
      can be used to tune the decision tree preferences toward a certain class. For example, if you
      want to detect some rare anomaly, the training base will likely contain many more normal
      cases than anomalies, so very good classification performance can be achieved just by
      labeling every case as normal. To avoid this, the priors can be specified, where the anomaly
      probability is artificially increased (up to 0.5 or even greater), so the weight of a
      misclassified anomaly becomes much bigger and the tree is adjusted accordingly. You can also
      think of this parameter as the weights of the prediction categories, which determine the
      relative cost you assign to misclassification: if the weight of the first category is 1 and
      the weight of the second category is 10, each mistake in predicting the second category is
      equivalent to making 10 mistakes in predicting the first category. In the poisonous-mushroom
      example, where we classify mushrooms as poisonous or edible, this lets us penalize
      classifying a poisonous mushroom as edible ten times more heavily than classifying an edible
      one as poisonous.
*********************************************************************************************************/
struct CV_EXPORTS_W_MAP CvDTreeParams
{
    CV_PROP_RW int   max_categories;
    CV_PROP_RW int   max_depth;
    CV_PROP_RW int   min_sample_count;
    CV_PROP_RW int   cv_folds;
    CV_PROP_RW bool  use_surrogates;
    CV_PROP_RW bool  use_1se_rule;
    CV_PROP_RW bool  truncate_pruned_tree;
    CV_PROP_RW float regression_accuracy;
    const float* priors;

    CvDTreeParams();
    CvDTreeParams( int max_depth, int min_sample_count,
                   float regression_accuracy, bool use_surrogates,
                   int max_categories, int cv_folds,
                   bool use_1se_rule, bool truncate_pruned_tree,
                   const float* priors );
};
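An example of filling in these parameters via the full constructor declared above (the argument order follows the declaration). The numbers are illustrative; the priors array mirrors the mushroom example, making mistakes on class 1 cost ten times as much as mistakes on class 0.

static const float priors[] = { 1.f, 10.f };

CvDTreeParams params( 8,        // max_depth
                      10,       // min_sample_count
                      0.01f,    // regression_accuracy (ignored for classification)
                      true,     // use_surrogates: needed for missing data and
                                //                 correct variable importance
                      10,       // max_categories
                      10,       // cv_folds: prune with 10-fold cross-validation
                      true,     // use_1se_rule: harsher pruning, more compact tree
                      true,     // truncate_pruned_tree: physically remove pruned branches
                      priors ); // a priori class weights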
/*********************************************************************************************************
Module [4]:
    The decision tree training data structure: CvDTreeTrainData
Structure description:
    1)Decision tree training data, shared by tree ensembles.
    2)This structure is mostly used internally for storing the data of a single decision tree or of
      a tree ensemble. It basically contains the following types of information:
        1)The training parameters, an instance of CvDTreeParams.
        2)The training data preprocessed to find the best splits more efficiently.
        3)For tree ensembles, this preprocessed data is reused by all the trees. Additionally, the
          training data characteristics shared by all trees in the ensemble are stored here: the
          variable types, the number of classes, a compressed mapping of the class labels, and so on.
        4)Buffers: memory storage for tree nodes, splits, and other elements of the constructed trees.
    3)There are two ways of using this structure:
        1)In simple cases (for example, a standalone tree, or a ready-to-use "black box" tree
          ensemble from ML, such as Random Trees or Boosting), there is no need to care about the
          structure at all. You just construct the needed statistical model, train it, and use it.
        2)For custom trees and other complex cases, the structure may be constructed and used
          explicitly, as follows:
          1)The structure is initialized using the default constructor followed by set_data(), or it
            is built using the full form of the constructor. The parameter _shared must be set to true.
          2)One or more trees are trained using this data.
          3)The structure is released as soon as all the trees using it are released.
*********************************************************************************************************/
struct CV_EXPORTS CvDTreeTrainData
{
    CvDTreeTrainData();
    CvDTreeTrainData( const CvMat* trainData, int tflag,
                      const CvMat* responses, const CvMat* varIdx=0,
                      const CvMat* sampleIdx=0, const CvMat* varType=0,
                      const CvMat* missingDataMask=0,
                      const CvDTreeParams& params=CvDTreeParams(),
                      bool _shared=false, bool _add_labels=false );
    virtual ~CvDTreeTrainData();

    virtual void set_data( const CvMat* trainData, int tflag,
                           const CvMat* responses, const CvMat* varIdx=0,
                           const CvMat* sampleIdx=0, const CvMat* varType=0,
                           const CvMat* missingDataMask=0,
                           const CvDTreeParams& params=CvDTreeParams(),
                           bool _shared=false, bool _add_labels=false,
                           bool _update_data=false );
    virtual void do_responses_copy();

    virtual void get_vectors( const CvMat* _subsample_idx, float* values, uchar* missing,
                              float* responses, bool get_class_idx=false );
    virtual CvDTreeNode* subsample_data( const CvMat* _subsample_idx );
    virtual void         write_params( CvFileStorage* fs ) const;
    virtual void         read_params( CvFileStorage* fs, CvFileNode* node );

    // release all the data
    virtual void clear();

    int get_num_classes() const;
    int get_var_type(int vi) const;
    int get_work_var_count() const { return work_var_count; }

    virtual const float* get_ord_responses( CvDTreeNode* n, float* values_buf, int* sample_indices_buf );
    virtual const int*   get_class_labels( CvDTreeNode* n, int* labels_buf );
    virtual const int*   get_cv_labels( CvDTreeNode* n, int* labels_buf );
    virtual const int*   get_sample_indices( CvDTreeNode* n, int* indices_buf );
    virtual const int*   get_cat_var_data( CvDTreeNode* n, int vi, int* cat_values_buf );
    virtual void         get_ord_var_data( CvDTreeNode* n, int vi, float* ord_values_buf, int* sorted_indices_buf,
                                           const float** ord_values, const int** sorted_indices, int* sample_indices_buf );
    virtual int          get_child_buf_idx( CvDTreeNode* n );

    ////////////////////////////////////

    virtual bool set_params( const CvDTreeParams& params );
    virtual CvDTreeNode* new_node( CvDTreeNode* parent, int count, int storage_idx, int offset );
    virtual CvDTreeSplit* new_split_ord( int vi, float cmp_val, int split_point, int inversed, float quality );
    virtual CvDTreeSplit* new_split_cat( int vi, float quality );
    virtual void free_node_data( CvDTreeNode* node );
    virtual void free_train_data();
    virtual void free_node( CvDTreeNode* node );

    int  sample_count, var_all, var_count, max_c_count;
    int  ord_var_count, cat_var_count, work_var_count;
    bool have_labels, have_priors;
    bool is_classifier;
    int  tflag;

    const CvMat* train_data;
    const CvMat* responses;
    CvMat*       responses_copy; // used in Boosting

    int buf_count, buf_size; // buf_size is obsolete, please do not use it,
                             // use expression ((int64)buf->rows * (int64)buf->cols / buf_count) instead
    bool shared;
    int is_buf_16u;

    CvMat* cat_count;
    CvMat* cat_ofs;
    CvMat* cat_map;
    CvMat* counts;
    CvMat* buf;

    inline size_t get_length_subbuf() const
    {
        size_t res = (size_t)(work_var_count + 1) * (size_t)sample_count;
        return res;
    }
    CvMat* direction;
    CvMat* split_buf;

    CvMat* var_idx;
    CvMat* var_type; // i-th element =
                     //   k<0  - ordered
                     //   k>=0 - categorical, see k-th element of cat_* arrays
    CvMat*        priors;
    CvMat*        priors_mult;

    CvDTreeParams params;

    CvMemStorage* tree_storage;
    CvMemStorage* temp_storage;

    CvDTreeNode*  data_root;

    CvSet*        node_heap;
    CvSet*        split_heap;
    CvSet*        cv_heap;
    CvSet*        nv_heap;

    cv::RNG*      rng;
};
/*********************************************************************************************************
Module [5]:
    The decision tree class: CvDTree
*********************************************************************************************************/
class CV_EXPORTS_W CvDTree : public CvStatModel
{
public:
    CV_WRAP CvDTree();
    virtual ~CvDTree();

    /*****************************************************************************************************
    Note:
        There are two kinds of training methods for a decision tree:
        1)The first is used when the decision tree is applied directly.
        2)The second is used inside ensemble methods (such as boosting or random forests).
    The following training method is the one used to train a standalone decision tree:
    *****************************************************************************************************/
    virtual bool train( const CvMat* trainData, int tflag,
                        const CvMat* responses, const CvMat* varIdx=0,
                        const CvMat* sampleIdx=0, const CvMat* varType=0,
                        const CvMat* missingDataMask=0,
                        CvDTreeParams params=CvDTreeParams() );

    /*****************************************************************************************************
    This training method is used inside ensemble learning (boosting or random forests):
    *****************************************************************************************************/
    virtual bool train( CvMLData* trainData, CvDTreeParams params=CvDTreeParams() );

    // type in {CV_TRAIN_ERROR, CV_TEST_ERROR}
    virtual float calc_error( CvMLData* trainData, int type, std::vector<float>* resp = 0 );

    virtual bool train( CvDTreeTrainData* trainData, const CvMat* subsampleIdx );

    virtual CvDTreeNode* predict( const CvMat* sample,
                                  const CvMat* missingDataMask=0,
                                  bool preprocessedInput=false ) const;

    CV_WRAP virtual bool train( const cv::Mat& trainData, int tflag,
                                const cv::Mat& responses, const cv::Mat& varIdx=cv::Mat(),
                                const cv::Mat& sampleIdx=cv::Mat(), const cv::Mat& varType=cv::Mat(),
                                const cv::Mat& missingDataMask=cv::Mat(),
                                CvDTreeParams params=CvDTreeParams() );

    CV_WRAP virtual CvDTreeNode* predict( const cv::Mat& sample,
                                          const cv::Mat& missingDataMask=cv::Mat(),
                                          bool preprocessedInput=false ) const;

    /*****************************************************************************************************
    Prototype:
        virtual const CvMat* get_var_importance()
    Description:
        Returns the importance of each feature variable as an N x 1 array of double-precision
        values, one per feature, where 1 means most important and 0 means least important.
        Unimportant features can be removed in a second round of training.
    *****************************************************************************************************/
    CV_WRAP virtual cv::Mat getVarImportance();
    virtual const CvMat* get_var_importance();
    CV_WRAP virtual void clear();

    virtual void read( CvFileStorage* fs, CvFileNode* node );
    virtual void write( CvFileStorage* fs, const char* name ) const;

    // special read & write methods for trees in the tree ensembles
    virtual void read( CvFileStorage* fs, CvFileNode* node, CvDTreeTrainData* data );
    virtual void write( CvFileStorage* fs ) const;

    /*****************************************************************************************************
    Prototype:
        const CvDTreeNode* get_root() const;
    Description:
        Returns the root node of the decision tree.
    *****************************************************************************************************/
    const CvDTreeNode* get_root() const;

    /*****************************************************************************************************
    Prototype:
        int get_pruned_tree_idx() const;
    Description:
        Returns the CvDTree::pruned_tree_idx parameter. This parameter is used to prune the decision
        tree; see the CvDTreeNode::Tn parameter for how it is used.
    *****************************************************************************************************/
    int get_pruned_tree_idx() const;

    /*****************************************************************************************************
    Prototype:
        CvDTreeTrainData* get_data();
    Description:
        Returns the training data used by the decision tree.
    *****************************************************************************************************/
    CvDTreeTrainData* get_data();

protected:
    friend struct cv::DTreeBestSplitFinder;

    virtual bool do_train( const CvMat* _subsample_idx );

    virtual void try_split_node( CvDTreeNode* n );
    virtual void split_node_data( CvDTreeNode* n );
    virtual CvDTreeSplit* find_best_split( CvDTreeNode* n );
    virtual CvDTreeSplit* find_split_ord_class( CvDTreeNode* n, int vi,
                            float init_quality = 0, CvDTreeSplit* _split = 0, uchar* ext_buf = 0 );
    virtual CvDTreeSplit* find_split_cat_class( CvDTreeNode* n, int vi,
                            float init_quality = 0, CvDTreeSplit* _split = 0, uchar* ext_buf = 0 );
    virtual CvDTreeSplit* find_split_ord_reg( CvDTreeNode* n, int vi,
                            float init_quality = 0, CvDTreeSplit* _split = 0, uchar* ext_buf = 0 );
    virtual CvDTreeSplit* find_split_cat_reg( CvDTreeNode* n, int vi,
                            float init_quality = 0, CvDTreeSplit* _split = 0, uchar* ext_buf = 0 );
    virtual CvDTreeSplit* find_surrogate_split_ord( CvDTreeNode* n, int vi, uchar* ext_buf = 0 );
    virtual CvDTreeSplit* find_surrogate_split_cat( CvDTreeNode* n, int vi, uchar* ext_buf = 0 );
    virtual double calc_node_dir( CvDTreeNode* node );
    virtual void complete_node_dir( CvDTreeNode* node );
    virtual void cluster_categories( const int* vectors, int vector_count,
        int var_count, int* sums, int k, int* cluster_labels );

    virtual void calc_node_value( CvDTreeNode* node );

    virtual void prune_cv();
    virtual double update_tree_rnc( int T, int fold );
    virtual int cut_tree( int T, int fold, double min_alpha );
    virtual void free_prune_data( bool cut_tree );
    virtual void free_tree();

    virtual void write_node( CvFileStorage* fs, CvDTreeNode* node ) const;
    virtual void write_split( CvFileStorage* fs, CvDTreeSplit* split ) const;
    virtual CvDTreeNode* read_node( CvFileStorage* fs, CvFileNode* node, CvDTreeNode* parent );
    virtual CvDTreeSplit* read_split( CvFileStorage* fs, CvFileNode* node );
    virtual void write_tree_nodes( CvFileStorage* fs ) const;
    virtual void read_tree_nodes( CvFileStorage* fs, CvFileNode* node );

    CvDTreeNode*      root;
    CvMat*            var_importance;
    CvDTreeTrainData* data;

public:
    int pruned_tree_idx;
};
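To tie the class together, here is an end-to-end sketch of the standalone usage: train a tree on a toy two-feature classification set and predict one sample. The data values are invented purely for illustration; the custom parameters (min_sample_count=1, cv_folds=0) are there only so that such a tiny set can actually be split.

#include <cstdio>
#include <opencv2/core/core.hpp>
#include <opencv2/ml/ml.hpp>

int main()
{
    float samples[] = { 1,1,  1,0,  0,1,  0,0 };   // 4 samples x 2 features, row-ordered
    float labels[]  = { 1, 1, 0, 0 };
    cv::Mat trainData( 4, 2, CV_32FC1, samples );
    cv::Mat responses( 4, 1, CV_32FC1, labels );

    // varType has var_count+1 entries: both features are ordered and the
    // response is categorical, so the tree solves a classification problem.
    cv::Mat varType( 3, 1, CV_8UC1, cv::Scalar(CV_VAR_ORDERED) );
    varType.at<uchar>(2) = CV_VAR_CATEGORICAL;

    CvDTreeParams params( 5, 1, 0.f, false, 10, 0, false, false, 0 );

    CvDTree tree;
    tree.train( trainData, CV_ROW_SAMPLE, responses,
                cv::Mat(), cv::Mat(), varType, cv::Mat(), params );

    float q[] = { 1.f, 0.f };
    cv::Mat sample( 1, 2, CV_32FC1, q );
    CvDTreeNode* node = tree.predict( sample );    // no missing-data mask
    std::printf( "predicted class: %f\n", node->value );
    return 0;
}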
/*********************************************************************************************************
Module [6]:
    The training function train() in ML (machine learning)
Parameter description:
    1)const CvMat* trainData: the train() method in ML takes different forms depending on the
      concrete algorithm, but every algorithm takes a pointer to a CvMat matrix as the training
      data. The matrix must be of type 32FC1 (32-bit floating-point, single-channel). The CvMat
      structure can hold multi-channel matrices, but the ML library only accepts single-channel
      ones. Normally this matrix stores the data samples as rows: each sample is represented by one
      row vector, and each column holds the values of one variable across the different samples,
      forming a two-dimensional data matrix. Remember the layout of the input matrix:
      (rows, columns) = (data samples, feature variables). However, some algorithms can work
      directly on the transposed input matrix; for those, the tflag parameter tells the algorithm
      that the training samples are stored as columns, so there is no need to transpose a large
      data matrix. If the algorithm can handle both row and column layouts, the following flags can
      be used:
        1)tflag = CV_ROW_SAMPLE: feature vectors are stored as rows (the default)
        2)tflag = CV_COL_SAMPLE: feature vectors are stored as columns
    2)const CvMat* responses: the responses may be class labels (such as the poisonous/edible labels
      in the mushroom example) or continuous values (such as body temperature read from a
      thermometer). The responses are usually a one-dimensional vector with one value per input
      sample, except for neural networks, which return a vector of responses per sample. The
      responses come in two types:
        1)For classification problems: integer type (32SC1)
        2)For regression problems: floating-point type (32FC1)
    3)Some algorithms can only handle classification problems, some only regression, and some can
      handle both; in the latter case the varType parameter specifies the kind of output:
        1)varType = CV_VAR_CATEGORICAL: a classification problem; the output is a discrete class label
        2)varType = CV_VAR_ORDERED: the output is a numerical (ordered) value
    4)varIdx and sampleIdx: many models in the ML library can be trained on a particular subset of
      the data or of the features. To make this convenient for the user, train() takes the vector
      parameters varIdx and sampleIdx, which specify which feature variables and which data samples
      to use.
    5)missingDataMask: additionally, some algorithms can handle missing data.
*********************************************************************************************************/
    virtual bool train( const CvMat* trainData, int tflag,
                        const CvMat* responses, const CvMat* varIdx=0,
                        const CvMat* sampleIdx=0, const CvMat* varType=0,
                        const CvMat* missingDataMask=0,
                        CvDTreeParams params=CvDTreeParams() );
/*********************************************************************************************************
Module [7]:
    The prediction function predict() in ML (machine learning)
Parameter description:
    1)const CvMat* sample: the floating-point feature vector to run prediction on
    2)missingDataMask: a byte vector of the same length and size whose non-zero values indicate
      that the corresponding feature is missing
    3)preprocessedInput: whether the input data has already been normalized/preprocessed
Return value:
    The prediction method returns a node of the decision tree; the predicted value can be obtained
    through (CvDTreeNode*)->value.
*********************************************************************************************************/
virtual CvDTreeNode* predict( const CvMat* sample,
                              const CvMat* missingDataMask=0,
                              bool         preprocessedInput=false ) const;
/*********************************************************************************************************
File description:
    CvStatModel --- the computer vision statistical learning model
*********************************************************************************************************/
class CV_EXPORTS_W CvStatModel
{
public:
    CvStatModel();                                        //[1]Constructor
    virtual ~CvStatModel();                               //[2]Destructor

    virtual void clear();

    //[3]Saves the trained model to disk in XML or YAML format
    CV_WRAP virtual void save( const char* filename, const char* name=0 ) const;
    //[4]First calls clear(), then loads an XML-format model through this function
    CV_WRAP virtual void load( const char* filename, const char* name=0 );

    virtual void write( CvFileStorage* storage, const char* name ) const;
    virtual void read( CvFileStorage* storage, CvFileNode* node );

protected:
    const char* default_model_name;
};
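A follow-up sketch: querying variable importance and persisting the model through the CvStatModel interface shown above. It assumes a trained `tree` as in the previous example; the CV_64F element type of the importance matrix is an assumption here, and the importance values are only computed correctly when use_surrogates was enabled during training.

cv::Mat importance = tree.getVarImportance();      // one value per feature variable
if( !importance.empty() )
    for( int i = 0; i < (int)importance.total(); i++ )
        std::printf( "importance of var %d: %f\n", i, importance.at<double>(i) );

tree.save( "dtree.xml" );        // write the model to disk as XML/YAML
CvDTree restored;
restored.load( "dtree.xml" );    // load() calls clear() internally first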
