Pedestrian Detection Paper: Integral Channel Features (Part 1)

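Piotr Dollár is a leading figure in pedestrian detection research, and even now that deep learning dominates the field, his papers are still well worth studying. Before starting on pedestrian detection, it is a good idea to read the authors' survey first: [1] P. Dollar, C. Wojek, B. Schiele, et al. Pedestrian detection: an evaluation of the state of the art [J]. IEEE Transactions.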

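An improved variant of the method is Aggregate Channel Features (聚合通道特征 in my Chinese rendering). The two methods are largely the same; below they are abbreviated as ICF and ACF respectively.
The author has released a MATLAB version of the ACF code on GitHub; the address is:
http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/
The code above requires OpenCV and the author's vision toolbox to be set up first. The toolbox address is: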
P. Dollár. Piotr’s Computer Vision Matlab T
http://vision.ucsd.edu/~pdollar/toolbox/doc/
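The toolbox is divided into: channels, classify, detector, filters, images, matlab, videos. Studying it carefully alongside the author's paper is well worth the effort.
Notes:
(1) The downloaded data must be placed in the appropriate location.
(2) To plot the ROC curve, if(0) must be changed to if(1).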

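ICF is the precursor of ACF. So what are integral channel features? Three sentences from the original paper explain it:

(1) multiple registered image channels are computed using linear and non-linear transformations of the input image
(2) features are extracted from each channel using sums over local rectangular regions.
(3) We refer to such features as integral channel features

In short, much as Haar-like features compute pixel differences between rectangular blocks, ICF first applies various linear or non-linear transformations to the image and then computes the sum of the pixels inside rectangular blocks, as in the figure below: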

[Figure: integral channel features]
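To make the "sums over local rectangular regions" concrete, here is a minimal sketch of an integral image for a single channel and a four-lookup rectangle sum. The names `Image`, `integralImage` and `rectSum` are illustrative assumptions, not part of the author's toolbox, and the row-major vector-of-vectors layout is chosen only for readability.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper types and names, used only for this illustration.
using Image = std::vector<std::vector<double>>;

// Builds an integral image with one extra row and column of zeros, so that
// ii[r][c] is the sum of all pixels in rows [0, r) and columns [0, c).
Image integralImage(const Image& channel) {
    const std::size_t rows = channel.size();
    const std::size_t cols = rows ? channel[0].size() : 0;
    Image ii(rows + 1, std::vector<double>(cols + 1, 0.0));
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            ii[r + 1][c + 1] =
                channel[r][c] + ii[r][c + 1] + ii[r + 1][c] - ii[r][c];
        }
    }
    return ii;
}

// Sum of the pixels in rows [r0, r1] and columns [c0, c1] (inclusive).
// Only four lookups are needed, whatever the size of the rectangle.
double rectSum(const Image& ii, std::size_t r0, std::size_t c0,
               std::size_t r1, std::size_t c1) {
    assert(r0 <= r1 && c0 <= c1);
    assert(r1 + 1 < ii.size() && c1 + 1 < ii[0].size());
    return ii[r1 + 1][c1 + 1] - ii[r0][c1 + 1] - ii[r1 + 1][c0] + ii[r0][c0];
}
```

In an ICF-style detector one such integral image would be built per channel once per image, after which every rectangle feature costs a constant number of memory reads.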

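Integral channel features use 10 channels: 6 gradient histogram channels (one per orientation), 3 LUV color channels, and 1 gradient magnitude channel. These channels can be computed efficiently and capture different kinds of information from the input image. Once the 10 channels have been computed, rectangles of random position and size are chosen within them and the sum of all pixel values inside each rectangle is taken; in the end, 30000 randomly selected rectangles make up the integral channel feature pool. In a grayscale image, the gradient magnitude and orientation at pixel (x, y) are: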

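$$G(x,y)=\sqrt{\left[H(x+1,y)-H(x-1,y)\right]^2+\left[H(x,y+1)-H(x,y-1)\right]^2} \tag{1}$$

$$\alpha(x,y)=\arctan\frac{H(x,y+1)-H(x,y-1)}{H(x+1,y)-H(x-1,y)} \tag{2}$$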

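where H(x, y) is the pixel value at (x, y). The gradient histogram is a weighted histogram: its bin index is determined by the gradient orientation, and its weight by the gradient magnitude. The gradient histogram channels are computed as: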

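$$Q_\theta(x,y)=G(x,y)\cdot L\left[\alpha(x,y)\in\theta\right] \tag{3}$$

In Eq. 3, G(x, y) is the gradient magnitude at pixel (x, y) and Q_θ(x, y) is the value of the gradient histogram channel for the quantized orientation θ. L is an indicator function, and θ is the quantization range of the gradient orientation α(x, y). Six gradient histogram channels are used here, so θ takes the ranges 0-30, 30-60, 60-90, 90-120, 120-150 and 150-180 degrees.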
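The gradient magnitude channel and the six orientation channels can be sketched as below. This is an illustration under stated assumptions rather than the toolbox implementation: gradients use central differences, border pixels are left at zero, and each pixel votes into exactly one 30-degree bin with no interpolation. The names `GradientChannels` and `gradientChannels` are hypothetical, and `gray[y][x]` is indexed as row y, column x.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Image = std::vector<std::vector<double>>;  // hypothetical helper type

struct GradientChannels {
    Image magnitude;               // G(x, y)
    std::vector<Image> histogram;  // Q_theta(x, y), one image per orientation bin
};

// Assumptions: central differences, zero border, hard binning of the
// unsigned orientation (0 to 180 degrees) into numBins equal ranges.
GradientChannels gradientChannels(const Image& gray, int numBins = 6) {
    assert(numBins > 0);
    constexpr double kPi = 3.14159265358979323846;
    const std::size_t rows = gray.size();
    const std::size_t cols = rows ? gray[0].size() : 0;

    GradientChannels out;
    out.magnitude.assign(rows, std::vector<double>(cols, 0.0));
    out.histogram.assign(static_cast<std::size_t>(numBins),
                         Image(rows, std::vector<double>(cols, 0.0)));

    for (std::size_t y = 1; y + 1 < rows; ++y) {
        for (std::size_t x = 1; x + 1 < cols; ++x) {
            const double gx = gray[y][x + 1] - gray[y][x - 1];
            const double gy = gray[y + 1][x] - gray[y - 1][x];
            const double g = std::sqrt(gx * gx + gy * gy);

            double angle = std::atan2(gy, gx);  // in [-pi, pi]
            if (angle < 0.0) angle += kPi;      // fold to an unsigned orientation
            int bin = static_cast<int>(angle / kPi * numBins);
            if (bin >= numBins) bin = 0;        // 180 degrees is the same orientation as 0

            out.magnitude[y][x] = g;
            // Indicator function: the magnitude goes into the matching bin only.
            out.histogram[static_cast<std::size_t>(bin)][y][x] = g;
        }
    }
    return out;
}
```

With this binning, summing the six orientation channels at any pixel gives back the gradient magnitude channel, which is the "weighted histogram" property described above.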
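The six gradient histogram channels resemble the Histogram of Oriented Gradients (HOG) used by Dalal. HOG+SVM for pedestrian detection was proposed by Dalal, then a researcher in France, at CVPR 2005; although many pedestrian detection algorithms have been proposed since, most still follow the HOG+SVM line of thinking. In ICF, too, the gradient histogram channels contribute the most to classification: when the gradient histograms, the LUV color channels and the gradient magnitude are each used alone for detection, the gradient histograms reach the highest detection rate, 87.2%, as shown in the figure:
[Figure: detection rates of the individual channel types]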
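The random feature pool described earlier (30000 rectangles spread over the 10 channels) could be sampled along the following lines. The post does not say how positions and sizes are drawn, so the uniform sampling of two corners inside the detection window is an assumption, as are the names `RectFeature` and `sampleFeaturePool` and the window size used in the usage note.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical description of one candidate feature: a rectangle in one channel.
struct RectFeature {
    int channel;         // 0 .. numChannels-1
    int row0, col0;      // top-left corner inside the detection window
    int rowEnd, colEnd;  // bottom-right corner, inclusive
};

// Assumption: both corners are drawn uniformly inside the window.
std::vector<RectFeature> sampleFeaturePool(int poolSize, int numChannels,
                                           int windowHeight, int windowWidth,
                                           unsigned seed) {
    assert(poolSize >= 0 && numChannels > 0);
    assert(windowHeight > 0 && windowWidth > 0);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pickChannel(0, numChannels - 1);
    std::uniform_int_distribution<int> pickRow(0, windowHeight - 1);
    std::uniform_int_distribution<int> pickCol(0, windowWidth - 1);

    std::vector<RectFeature> pool;
    pool.reserve(static_cast<std::size_t>(poolSize));
    for (int i = 0; i < poolSize; ++i) {
        const int channel = pickChannel(rng);
        const int ra = pickRow(rng), rb = pickRow(rng);
        const int ca = pickCol(rng), cb = pickCol(rng);
        pool.push_back(RectFeature{channel,
                                   std::min(ra, rb), std::min(ca, cb),
                                   std::max(ra, rb), std::max(ca, cb)});
    }
    return pool;
}
```

For example, `sampleFeaturePool(30000, 10, 128, 64, 42u)` would produce a pool of the size mentioned above for a hypothetical 128x64 window; boosting then picks the useful rectangles from this pool.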

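Once the features have been extracted, concatenated and normalized, a soft cascade is used for classification. The detection stage has 4 steps:

(1) There are 1000 weak classifiers. Each evaluation reads 3 values (featureId, thresholdValue, directionValue), which correspond to a 3000-row matrix.

(2) Compute the integral channel feature of each rectangle, denoted featureValueSat:

    if( (featureValueSat - thresholdValueSat) * directionValueSat >= 0 )
        weakClass =  1;
    else
        weakClass = -1;
    confidence = confidence + weakClass*alpha(classifierId);

(3) if(confidence < magicThreshold): once the accumulated confidence drops below -3, the loop over the 1000 weak classifiers stops early. magicThreshold = -3 is the soft cascade threshold.
(4) NMS (non-maximum suppression), to be covered later, removes duplicates among overlapping detection windows and keeps the window with the highest confidence.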
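Steps (1) to (3) can be sketched for a single window as follows. This is a simplified illustration: each weak classifier is a one-level stump over precomputed feature values, whereas the listing in this post evaluates a root test followed by a leaf test. The names `WeakClassifier`, `CascadeResult` and `runSoftCascade` are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical one-level weak classifier (decision stump).
struct WeakClassifier {
    int featureId;     // index into the feature values of the window
    double threshold;  // thresholdValue
    double direction;  // directionValue, +1 or -1
    double alpha;      // weight of this weak classifier
};

struct CascadeResult {
    double confidence;      // accumulated score
    std::size_t evaluated;  // number of weak classifiers actually run
    bool rejected;          // true if the window was dropped early
};

// Accumulates weighted votes and stops as soon as the running confidence
// falls below the rejection threshold (-3 in the post).
CascadeResult runSoftCascade(const std::vector<WeakClassifier>& classifiers,
                             const std::vector<double>& featureValues,
                             double rejectionThreshold = -3.0) {
    CascadeResult result{0.0, 0, false};
    for (const WeakClassifier& wc : classifiers) {
        assert(wc.featureId >= 0 &&
               static_cast<std::size_t>(wc.featureId) < featureValues.size());
        const double value = featureValues[static_cast<std::size_t>(wc.featureId)];
        const double weakClass =
            ((value - wc.threshold) * wc.direction >= 0.0) ? 1.0 : -1.0;
        result.confidence += weakClass * wc.alpha;
        ++result.evaluated;
        if (result.confidence < rejectionThreshold) {
            result.rejected = true;  // early exit: most windows end here
            break;
        }
    }
    return result;
}
```

The early exit is what makes the cascade fast: background windows are usually rejected after a handful of weak classifiers instead of all 1000.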

The whole detection pipeline is shown in the figure below:
[Figure: detection pipeline]
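The final NMS step of the pipeline is left for a later post, but a common greedy variant looks like the sketch below. It is not necessarily the variant used in the author's toolbox: the intersection-over-union overlap measure, the 0.5 default and the names `Detection`, `overlapRatio` and `nonMaximumSuppression` are all assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical detection window with its cascade confidence.
struct Detection {
    double x, y, width, height;
    double confidence;
};

// Intersection over union of two windows (assumed overlap measure).
double overlapRatio(const Detection& a, const Detection& b) {
    const double ix = std::max(0.0, std::min(a.x + a.width, b.x + b.width) -
                                        std::max(a.x, b.x));
    const double iy = std::max(0.0, std::min(a.y + a.height, b.y + b.height) -
                                        std::max(a.y, b.y));
    const double inter = ix * iy;
    const double uni = a.width * a.height + b.width * b.height - inter;
    return uni > 0.0 ? inter / uni : 0.0;
}

// Greedy NMS: visit windows from highest to lowest confidence and keep a
// window only if it does not overlap an already kept one too strongly.
std::vector<Detection> nonMaximumSuppression(std::vector<Detection> dets,
                                             double maxOverlap = 0.5) {
    assert(maxOverlap >= 0.0);
    std::sort(dets.begin(), dets.end(),
              [](const Detection& a, const Detection& b) {
                  return a.confidence > b.confidence;
              });
    std::vector<Detection> kept;
    for (const Detection& d : dets) {
        bool suppressed = false;
        for (const Detection& k : kept) {
            if (overlapRatio(d, k) > maxOverlap) {
                suppressed = true;
                break;
            }
        }
        if (!suppressed) kept.push_back(d);
    }
    return kept;
}
```

Because the windows are visited in order of decreasing confidence, each cluster of overlapping windows is represented by its most confident member.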

The C++ code that computes the integral features in the detection stage is as follows:
//*************************************
// 2. Run the soft cascade on the data
//*************************************
int windowCounter = 0;
double weakClass = 0;
for (col = -1; col < (nCols - windowWidth); col++) {        // 322/((2^(1/8))^34) = 16.92
    for (row = -1; row < (nRows - windowHeight); row++) {   // 248/((2^(1/8))^34) = 13.03
        // Run the detector on this window
        double confidence = 0;
        // nClassifiers = 1000. Debug note: the first window stops around the 7th
        // classifier, once confidence < -3 (the soft cascade threshold);
        // row = 0, col = -1 ends at classifierId = 9.
        for (classifierId = 0; classifierId < nClassifiers; classifierId++) {
            // Compute the value of the 3 features associated with this
            // weak classifier
            // 1. Root
            int featureId = feature(classifierId);             // e.g. 1949 2503 2795(2) 2213 173
            double thresholdValue = threshold(classifierId);   // e.g. 2.425 0.444 1.298 4.4 1.14
            double directionValue = direction(classifierId);   // e.g. 1 1 -1 1
            double featureValue;
            if (featureId < nBaseFeatures) {  // if feature 0 <= f <= 2999
                // WARNING: wrong way of accessing the data: lots of
                // cache faults. The data is in consecutive columns,
                // but the memory is contiguous along the rows. I should
                // transpose the matrix to solve this problem.
                // -1 because of matlab/c++ conventions
                int channelId = rectangles[featureId*nRectCols + 4] - 1;  // last entry of row 1950, minus 1 = 4; 10-1=9 5 2 4 1 ...3
                int row0    = rectangles[featureId*nRectCols + 0] + row;  // 23-1=22 8  8 18 12 5
                int col0    = rectangles[featureId*nRectCols + 1] + col;  // 6-1=5   9  2 6  3  6
                int rowEnd  = rectangles[featureId*nRectCols + 2] + row;  // 25-1=24 10 10 20 15 7
                int colEnd  = rectangles[featureId*nRectCols + 3] + col;  // 9-1=8   9  5 8  4  7
                // Should be the integral feature: corners 1 + 4 - 2 - 3
                // IC[4*249*323 + 9*249 + 25] = 323974
                featureValue =
                    + IC[channelId*(nRows+1)*(nCols+1) + (colEnd+1) *(nRows+1) + (rowEnd+1)]
                    - IC[channelId*(nRows+1)*(nCols+1) + (col0)     *(nRows+1) + (rowEnd+1)]
                    - IC[channelId*(nRows+1)*(nCols+1) + (colEnd+1) *(nRows+1) + (row0)    ]  // 2.5269-3.4851-7.65216+10.1123 = 1.50194
                    + IC[channelId*(nRows+1)*(nCols+1) + (col0)     *(nRows+1) + (row0)    ]; // 1.0576 .0999 0.1141 4.75 1.024 .. 2.16
            } else {
                // 3000 - 3000 = channel 0
                int channelId = featureId - nBaseFeatures;
                double outside = 0;
                double inside = 0;
                int c = col + 1;  // this way it will go from 0 to...
                int r = row + 1;  // this way it will go from 0 to...
                // IC = Integral Channels
                outside = + IC[channelId*(nRows+1)*(nCols+1) + (12+c) *(nRows+1) + (31 +r)]  // A  ??
                          - IC[channelId*(nRows+1)*(nCols+1) + ( 8+c) *(nRows+1) + (27 +r)]  // B
                          + IC[channelId*(nRows+1)*(nCols+1) + ( 8+c) *(nRows+1) + ( 3 +r)]  // C
                          + IC[channelId*(nRows+1)*(nCols+1) + ( 3+c) *(nRows+1) + (27 +r)]  // D
                          - IC[channelId*(nRows+1)*(nCols+1) + ( 3+c) *(nRows+1) + ( 3 +r)]  // E
                          - IC[channelId*(nRows+1)*(nCols+1) + (12+c) *(nRows+1) + ( 0 +r)]  // F
                          - IC[channelId*(nRows+1)*(nCols+1) + ( 0+c) *(nRows+1) + (31 +r)]  // G
                          + IC[channelId*(nRows+1)*(nCols+1) + ( 0+c) *(nRows+1) + ( 0 +r)]; // H
                inside  = + IC[channelId*(nRows+1)*(nCols+1) + ( 8+c) *(nRows+1) + (27 +r)]  // B
                          - IC[channelId*(nRows+1)*(nCols+1) + ( 8+c) *(nRows+1) + ( 3 +r)]  // C
                          - IC[channelId*(nRows+1)*(nCols+1) + ( 3+c) *(nRows+1) + (27 +r)]  // D
                          + IC[channelId*(nRows+1)*(nCols+1) + ( 3+c) *(nRows+1) + ( 3 +r)]; // E
                // NOTE: the assignment below is commented out in this listing, so
                // featureValue is never set on this branch.
                // featureValue = outside - inside; //TODO :: I added this line - Fabio
            }
            if ((featureValue - thresholdValue) * directionValue >= 0) {
                // 2. Satisfy leaf
                // I removed the +1 because the id's are in C++ style,
                // they start from 0
                int featureIdSat         = featureSat(classifierId);  // classifierId: index of the weak classifier among the 1000, starting from 0
                double thresholdValueSat = thresholdSat(classifierId);
                double directionValueSat = directionSat(classifierId);
                double featureValueSat;
                if (featureIdSat < nBaseFeatures) {  // if feature 0 <= f <= 2999
                    // WARNING: wrong way of accessing the data: lots of
                    // cache faults. The data is in consecutive columns,
                    // but the memory is contiguous along the rows.
                    // I should transpose the matrix to solve this problem.
                    // -1 because of matlab/c++ conventions
                    int channelId = rectangles[featureIdSat*nRectCols + 4] - 1;
                    int row0    = rectangles[featureIdSat*nRectCols + 0] + row;
                    int col0    = rectangles[featureIdSat*nRectCols + 1] + col;
                    int rowEnd  = rectangles[featureIdSat*nRectCols + 2] + row;
                    int colEnd  = rectangles[featureIdSat*nRectCols + 3] + col;
                    featureValueSat =
                        + IC[channelId*(nRows+1)*(nCols+1) + (colEnd+1) *(nRows+1) + (rowEnd+1)]
                        - IC[channelId*(nRows+1)*(nCols+1) + (col0)     *(nRows+1) + (rowEnd+1)]
                        - IC[channelId*(nRows+1)*(nCols+1) + (colEnd+1) *(nRows+1) + (row0)    ]
                        + IC[channelId*(nRows+1)*(nCols+1) + (col0)     *(nRows+1) + (row0)    ];
                } else {
                    // 3000 - 3000 = channel 0
                    int channelId = featureIdSat - nBaseFeatures;
                    if ((channelId < 0) || (channelId > 9)) {
                        cout << "error: negative channel\n";
                        cout << "Negative channel." << endl
                             << "Source code line: " << __FILE__
                             << " @ " << __LINE__ << endl;
                        return NULL;
                    }
                    double outside = 0;
                    double inside = 0;
                    int c = col + 1;  // this way it will go from 0 to...
                    int r = row + 1;  // this way it will go from 0 to...
                    outside = + IC[channelId*(nRows+1)*(nCols+1) + (12+c) *(nRows+1) + (31 +r)]  // A
                              - IC[channelId*(nRows+1)*(nCols+1) + ( 8+c) *(nRows+1) + (27 +r)]  // B
                              + IC[channelId*(nRows+1)*(nCols+1) + ( 8+c) *(nRows+1) + ( 3 +r)]  // C
                              + IC[channelId*(nRows+1)*(nCols+1) + ( 3+c) *(nRows+1) + (27 +r)]  // D
                              - IC[channelId*(nRows+1)*(nCols+1) + ( 3+c) *(nRows+1) + ( 3 +r)]  // E
                              - IC[channelId*(nRows+1)*(nCols+1) + (12+c) *(nRows+1) + ( 0 +r)]  // F
                              - IC[channelId*(nRows+1)*(nCols+1) + ( 0+c) *(nRows+1) + (31 +r)]  // G
                              + IC[channelId*(nRows+1)*(nCols+1) + ( 0+c) *(nRows+1) + ( 0 +r)]; // H
                    inside  = + IC[channelId*(nRows+1)*(nCols+1) + ( 8+c) *(nRows+1) + (27 +r)]  // B
                              - IC[channelId*(nRows+1)*(nCols+1) + ( 8+c) *(nRows+1) + ( 3 +r)]  // C
                              - IC[channelId*(nRows+1)*(nCols+1) + ( 3+c) *(nRows+1) + (27 +r)]  // D
                              + IC[channelId*(nRows+1)*(nCols+1) + ( 3+c) *(nRows+1) + ( 3 +r)]; // E
                    featureValueSat = outside - inside;
                }
                if ((featureValueSat - thresholdValueSat) * directionValueSat >= 0)
                    weakClass =  1;
                else
                    weakClass = -1;
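The IC feature computed in the code should be the sum of the pixels inside a rectangular block, but I have not fully worked out the details. If anyone understands it, corrections and explanations are very welcome.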
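One reading that fits the index arithmetic in the listing is sketched below. It assumes that IC holds, for each channel, a zero-padded integral image of size (nRows+1) x (nCols+1) stored in MATLAB's column-major order, which is what the `col*(nRows+1) + row` stride suggests. Under that assumption the four-term expression is exactly the pixel sum over rows row0..rowEnd and columns col0..colEnd. The helper names `icIndex`, `buildIntegralChannels` and `rectFeature` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flat index of entry (row, col) of one channel, assuming a padded
// (nRows+1) x (nCols+1) integral image per channel in column-major order.
std::size_t icIndex(std::size_t channel, std::size_t row, std::size_t col,
                    std::size_t nRows, std::size_t nCols) {
    return channel * (nRows + 1) * (nCols + 1) + col * (nRows + 1) + row;
}

// Builds the padded integral channels from column-major pixel data, where
// pixels[channel*nRows*nCols + col*nRows + row] is one channel value.
std::vector<double> buildIntegralChannels(const std::vector<double>& pixels,
                                          std::size_t nChannels,
                                          std::size_t nRows, std::size_t nCols) {
    assert(pixels.size() == nChannels * nRows * nCols);
    std::vector<double> IC(nChannels * (nRows + 1) * (nCols + 1), 0.0);
    for (std::size_t ch = 0; ch < nChannels; ++ch) {
        for (std::size_t col = 0; col < nCols; ++col) {
            for (std::size_t row = 0; row < nRows; ++row) {
                IC[icIndex(ch, row + 1, col + 1, nRows, nCols)] =
                    pixels[ch * nRows * nCols + col * nRows + row]
                    + IC[icIndex(ch, row, col + 1, nRows, nCols)]
                    + IC[icIndex(ch, row + 1, col, nRows, nCols)]
                    - IC[icIndex(ch, row, col, nRows, nCols)];
            }
        }
    }
    return IC;
}

// Same four-corner pattern as featureValue in the listing: the pixel sum over
// rows [row0, rowEnd] and columns [col0, colEnd], all zero-based and inclusive.
double rectFeature(const std::vector<double>& IC, std::size_t channel,
                   std::size_t row0, std::size_t col0,
                   std::size_t rowEnd, std::size_t colEnd,
                   std::size_t nRows, std::size_t nCols) {
    return + IC[icIndex(channel, rowEnd + 1, colEnd + 1, nRows, nCols)]
           - IC[icIndex(channel, rowEnd + 1, col0,       nRows, nCols)]
           - IC[icIndex(channel, row0,       colEnd + 1, nRows, nCols)]
           + IC[icIndex(channel, row0,       col0,       nRows, nCols)];
}
```

On this reading, starting the loops at `row = -1` and `col = -1` converts the 1-based MATLAB rectangle coordinates to 0-based ones, and the `outside - inside` branch subtracts an inner rectangle from a larger surrounding region in the same channel.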