Action Recognition Notes: Usage and Code Walkthrough of the iDT Algorithm


This article is reposted from http://blog.csdn.net/wzmsltw/article/details/53221179

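Basic Functionality

The iDT pipeline also includes two later stages, Fisher Vector encoding and SVM classification, but the code released by the authors stops at outputting the iDT features; the later steps require other code or tools. Someone has written a C++ program specifically for FV encoding of DT features: DTFV. For the SVM you can use liblinear, which is fairly fast on high-dimensional data.
The iDT code discussed here therefore takes a video as input and outputs a list of iDT features, one feature per line, each corresponding to one trajectory in the video. Each feature has 426 dimensions: Trajectory-30, HOG-96, HOF-108, MBH-192. How each of these dimension counts is derived is covered in the blog post on the iDT algorithm.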
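As a quick sanity check on the 426 figure, the arithmetic can be sketched as follows. This is only an illustration: it assumes a trajectory length of 15 frames, a 2x2x3 grid of spatio-temporal cells, and 8, 9 and 8 histogram bins for HOG, HOF and MBH (the bin counts match the InitDescInfo calls in the main program; the other values are assumed defaults). The struct and function names are hypothetical and do not come from the iDT code.

```cpp
#include <cassert>

// Hypothetical layout description; the values used with it are assumed defaults.
struct DescLayout {
    int track_length; // frames per trajectory (assumed 15)
    int nxy_cell;     // spatial cells per axis (assumed 2)
    int nt_cell;      // temporal cells (assumed 3)
};

// Dimension of one histogram descriptor: nxy * nxy * nt cells, n_bins per cell.
inline int histogram_dim(const DescLayout& l, int n_bins) {
    assert(n_bins > 0);
    return l.nxy_cell * l.nxy_cell * l.nt_cell * n_bins;
}

// Trajectory shape descriptor: one (dx, dy) displacement per tracked frame.
inline int trajectory_dim(const DescLayout& l) {
    return 2 * l.track_length;
}

// Total: Trajectory + HOG (8 bins) + HOF (9 bins) + MBHx and MBHy (8 bins each).
inline int idt_feature_dim(const DescLayout& l) {
    return trajectory_dim(l) + histogram_dim(l, 8) + histogram_dim(l, 9)
         + 2 * histogram_dim(l, 8);
}
```

With DescLayout{15, 2, 3} this gives 30 + 96 + 108 + 192 = 426, which agrees with the breakdown above.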

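Compilation and Basic Usage

See the README in the code folder for details.

The iDT code depends on two libraries:

OpenCV: the README recommends 2.4.2; in practice the latest 2.4.x release, 2.4.13, also works. I have not tried OpenCV 3, so I do not know whether it works.
ffmpeg: the README recommends 0.11.1; in practice the latest version also works.

Installation tutorials for both libraries are easy to find online, and both are very commonly used, so I will not cover them here.

Once both libraries are installed, you can compile the code: just run make in the code folder. The compiled executable is placed under ./release/.

To use it, pass the path of the video file as the argument: ./release/DenseTrackStab ./test_sequences/person01_boxing_d1_uncomp.avi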
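The program writes its features to standard output, so in practice the output is usually redirected to a file and parsed later. The sketch below shows one way to split a single output line into its parts. It assumes each line holds 436 whitespace-separated numbers: the 10 trajectory-information values printed by the printf calls in the main program (frame_num, mean_x, mean_y, var_x, var_y, length, scale, and three normalized positions), followed by the 426 descriptor values in the order Trajectory, HOG, HOF, MBHx, MBHy. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one parsed output line.
struct IdtFeature {
    std::vector<float> info; // 10 trajectory-information values
    std::vector<float> traj; // 30 values: normalized (dx, dy) displacements
    std::vector<float> hog;  // 96 values
    std::vector<float> hof;  // 108 values
    std::vector<float> mbh;  // 192 values: MBHx followed by MBHy
};

// Returns true and fills `out` only if the line holds exactly 436 numbers.
inline bool parse_idt_line(const std::string& line, IdtFeature& out) {
    std::istringstream in(line);
    std::vector<float> values;
    float v;
    while (in >> v) values.push_back(v);
    if (values.size() != 436) return false;

    std::vector<float>::const_iterator it = values.begin();
    // Copy the next n values and advance the read position.
    auto take = [&it](int n) {
        std::vector<float> part(it, it + n);
        it += n;
        return part;
    };
    out.info = take(10);
    out.traj = take(30);
    out.hog  = take(96);
    out.hof  = take(108);
    out.mbh  = take(192);
    return true;
}
```

A caller would read the redirected file line by line with std::getline and pass each line to parse_idt_line; lines that do not match the assumed layout are rejected rather than partially parsed.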

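Code Structure

The iDT code mainly consists of the following source files:

DenseTrackStab.cpp: the main program of the iDT algorithm
DenseTrackStab.h: trajectory-tracking parameters and the definitions of several data structures
Descriptors.h: functions related to the descriptors
Initialize.h: functions related to initialization
OpticalFlow.h: functions related to optical flow
Video.cpp: this program is unrelated to the iDT algorithm itself; the authors provide it only to test whether the two dependencies were installed successfully.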

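Bounding Boxes

A bounding box (called bound box in the code) gives the location of a person in a video frame. When the projective transformation matrix between consecutive frames is computed, matched point pairs that fall inside the human boxes are not used. This excludes the interference of human motion and makes the estimate of camera motion more accurate.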
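The masking idea can be illustrated without OpenCV. The following is a minimal sketch, not the InitMaskWithBox function from the iDT code: it builds a mask that is 1 outside every human box and 0 inside, and a matched point is kept only where the mask is nonzero. The type and function names are hypothetical, and treating the bottom-right corner as inclusive is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical box type: top-left (x1, y1) and bottom-right (x2, y2), in pixels.
struct Box { int x1, y1, x2, y2; };

using Mask = std::vector<std::vector<unsigned char>>; // indexed as mask[y][x]

// Build a mask that is 1 everywhere except inside the given boxes.
inline Mask make_human_mask(int width, int height, const std::vector<Box>& boxes) {
    assert(width > 0 && height > 0);
    Mask mask(height, std::vector<unsigned char>(width, 1));
    for (const Box& b : boxes) {
        // Clip the box to the image so out-of-range detections are harmless.
        int x1 = std::max(b.x1, 0), y1 = std::max(b.y1, 0);
        int x2 = std::min(b.x2, width - 1), y2 = std::min(b.y2, height - 1);
        for (int y = y1; y <= y2; ++y)
            for (int x = x1; x <= x2; ++x)
                mask[y][x] = 0; // inside a human box: excluded from matching
    }
    return mask;
}

// A matched point is used for camera-motion estimation only outside all boxes.
inline bool keep_match(const Mask& mask, int x, int y) {
    return mask[y][x] != 0;
}
```

Points on the moving person are dropped before the transformation is fitted, so the remaining matches mostly describe background motion, which is what the camera-motion estimate needs.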

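The files provided by the authors do not document the bb_file format, and the code has no interface for reading a bb_file; if you need it, you must add a statement to the code that reads the file (this has already been added in the code walkthrough below). The bb_file format is as follows:

frame_id a1 a2 a3 a4 a5 b1 b2 b3 b4 b5

Here frame_id is the frame index, starting from 0. The code also checks that the length of the bb_file equals the number of frames in the video.

The remaining values come in groups of five, one group per human box. In order they are: the x of the top-left corner, the y of the top-left corner, the x of the bottom-right corner, the y of the bottom-right corner, and the confidence. Note that although a confidence must be supplied, it does not appear to be used anywhere in the code, so any value will do.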
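To make the format concrete, here is a small sketch of parsing one bb_file line. It is not the LoadBoundBox function from the iDT code; the struct and function names are hypothetical, and accepting a line that contains only a frame_id (a frame with no detections) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical types mirroring the five values stored per box.
struct HumanBox { float x1, y1, x2, y2, confidence; };
struct FrameBoxes { int frame_id; std::vector<HumanBox> boxes; };

// Parse "frame_id x1 y1 x2 y2 conf [x1 y1 x2 y2 conf ...]".
// Returns false on a missing frame_id, a non-numeric token,
// or a value count that is not a multiple of five.
inline bool parse_bb_line(const std::string& line, FrameBoxes& out) {
    std::istringstream in(line);
    FrameBoxes parsed{};
    if (!(in >> parsed.frame_id)) return false;

    std::vector<float> values;
    float v;
    while (in >> v) values.push_back(v);
    // The loop must have stopped at end of input, not at a bad token.
    if (!in.eof() || values.size() % 5 != 0) return false;

    for (std::size_t i = 0; i < values.size(); i += 5)
        parsed.boxes.push_back(HumanBox{values[i], values[i + 1], values[i + 2],
                                        values[i + 3], values[i + 4]});
    out = parsed;
    return true;
}
```

For the line "0 10 20 110 220 0.9 5 5 50 60 0.5" this yields frame 0 with two boxes. Since the confidence appears to be unused, a detector that does not report one can simply write a constant.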

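As for how to obtain the bounding box data, the most brute-force way is manual annotation, but that is far too laborious. In our project we used the SSD (Single Shot MultiBox Detector) algorithm to detect the human boxes.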

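Main Program Walkthrough

The general flow of the iDT code is:

1. Read in a new frame
2. Compute the projective transformation matrix between the current frame and the previous frame from SURF features and optical flow
3. Warp the current frame with that matrix to remove the effect of camera motion
4. Compute optical flow between the warped current frame and the previous frame
5. Track the trajectories and compute the descriptors at each image scale
6. Save the information of the current frame and go back to step 1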
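Steps 2 and 3 can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: the thresholds 50 (matched points) and 25 (RANSAC inliers) are the ones used in the main program in this article, but here the estimated matrix is simply passed in rather than computed, and all type and function names are hypothetical. The real code calls OpenCV's findHomography only when there are enough matches.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Mat3 = std::array<std::array<double, 3>, 3>;
struct Point2 { double x, y; };

inline Mat3 identity3() {
    Mat3 m{}; // value-initialized to all zeros
    m[0][0] = m[1][1] = m[2][2] = 1.0;
    return m;
}

// Accept the estimated matrix only when it is supported by enough matches
// and enough inliers; otherwise fall back to the identity, which amounts
// to assuming there is no camera motion between the two frames.
inline Mat3 choose_homography(const Mat3& estimated,
                              std::size_t n_matches, int n_inliers) {
    if (n_matches > 50 && n_inliers > 25) return estimated;
    return identity3();
}

// Apply a projective transformation to a point using homogeneous coordinates.
inline Point2 apply_homography(const Mat3& H, Point2 p) {
    double w = H[2][0] * p.x + H[2][1] * p.y + H[2][2];
    assert(w != 0.0);
    return Point2{(H[0][0] * p.x + H[0][1] * p.y + H[0][2]) / w,
                  (H[1][0] * p.x + H[1][1] * p.y + H[1][2]) / w};
}
```

For a pure camera pan of (3, -2) pixels, the matrix is the identity with H[0][2] = 3 and H[1][2] = -2, and every background point moves by exactly that offset. Warping the current frame with the inverse of H undoes that offset, so the flow computed afterwards mostly reflects the motion of the subject.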

The code is explained below through some brief comments.

#include "DenseTrackStab.h"
#include "Initialize.h"
#include "Descriptors.h"
#include "OpticalFlow.h"

#include <time.h>

using namespace cv;

// Set show_track to 1 to visualize the trajectories
int show_track = 0;

int main(int argc, char** argv)
{
    // Read and open the video file
    VideoCapture capture;
    char* video = argv[1];
    int flag = arg_parse(argc, argv);
    capture.open(video);

    if(!capture.isOpened()) {
        fprintf(stderr, "Could not initialize capturing..\n");
        return -1;
    }

    // I added this line myself; the original source provides no input interface for bb_file.
    // It assumes the bb_file path, if given, is the second command-line argument.
    char* bb_file = argv[2];

    int frame_num = 0;
    TrackInfo trackInfo;
    DescInfo hogInfo, hofInfo, mbhInfo;

    // Initialize the trajectory and descriptor information
    InitTrackInfo(&trackInfo, track_length, init_gap);
    InitDescInfo(&hogInfo, 8, false, patch_size, nxy_cell, nt_cell);
    InitDescInfo(&hofInfo, 9, true, patch_size, nxy_cell, nt_cell);
    InitDescInfo(&mbhInfo, 8, false, patch_size, nxy_cell, nt_cell);

    SeqInfo seqInfo;
    InitSeqInfo(&seqInfo, video);

    // Initialize the bounding box information: load the contents of bb_file into bb_list
    std::vector<Frame> bb_list;
    if(bb_file) {
        LoadBoundBox(bb_file, bb_list);
        assert(bb_list.size() == seqInfo.length);
    }

    if(flag)
        seqInfo.length = end_frame - start_frame + 1;

    if(show_track == 1)
        namedWindow("DenseTrackStab", 0);

    // Initialize the SURF feature detector.
    // 200 is the threshold: the smaller the value, the more feature points are used for
    // matching; results may improve (not guaranteed) and the speed drops.
    SurfFeatureDetector detector_surf(200);
    SurfDescriptorExtractor extractor_surf(true, true);

    std::vector<Point2f> prev_pts_flow, pts_flow;
    std::vector<Point2f> prev_pts_surf, pts_surf;
    std::vector<Point2f> prev_pts_all, pts_all;

    std::vector<KeyPoint> prev_kpts_surf, kpts_surf;
    Mat prev_desc_surf, desc_surf;
    Mat flow, human_mask;

    Mat image, prev_grey, grey;

    std::vector<float> fscales(0);
    std::vector<Size> sizes(0);

    std::vector<Mat> prev_grey_pyr(0), grey_pyr(0), flow_pyr(0), flow_warp_pyr(0);
    std::vector<Mat> prev_poly_pyr(0), poly_pyr(0), poly_warp_pyr(0);

    std::vector<std::list<Track> > xyScaleTracks;
    int init_counter = 0; // records when new feature points should be sampled

    while(true) {
        Mat frame;
        int i, j, c;

        // Read in a new frame
        capture >> frame;
        if(frame.empty())
            break;

        if(frame_num < start_frame || frame_num > end_frame) {
            frame_num++;
            continue;
        }

        /*----------------------- Process the first frame -------------------------*/
        // Optical flow needs two frames, so no flow is computed for the first frame
        if(frame_num == start_frame) {
            image.create(frame.size(), CV_8UC3);
            grey.create(frame.size(), CV_8UC1);
            prev_grey.create(frame.size(), CV_8UC1);

            InitPry(frame, fscales, sizes);

            BuildPry(sizes, CV_8UC1, prev_grey_pyr);
            BuildPry(sizes, CV_8UC1, grey_pyr);
            BuildPry(sizes, CV_32FC2, flow_pyr);
            BuildPry(sizes, CV_32FC2, flow_warp_pyr);

            BuildPry(sizes, CV_32FC(5), prev_poly_pyr);
            BuildPry(sizes, CV_32FC(5), poly_pyr);
            BuildPry(sizes, CV_32FC(5), poly_warp_pyr);

            xyScaleTracks.resize(scale_num);

            frame.copyTo(image);
            cvtColor(image, prev_grey, CV_BGR2GRAY);

            // Densely sample feature points at each image scale
            for(int iScale = 0; iScale < scale_num; iScale++) {
                if(iScale == 0)
                    prev_grey.copyTo(prev_grey_pyr[0]);
                else
                    resize(prev_grey_pyr[iScale-1], prev_grey_pyr[iScale], prev_grey_pyr[iScale].size(), 0, 0, INTER_LINEAR);

                // Densely sample feature points
                std::vector<Point2f> points(0);
                DenseSample(prev_grey_pyr[iScale], points, quality, min_distance);

                // Save the feature points
                std::list<Track>& tracks = xyScaleTracks[iScale];
                for(i = 0; i < points.size(); i++)
                    tracks.push_back(Track(points[i], trackInfo, hogInfo, hofInfo, mbhInfo));
            }

            // compute polynomial expansion
            my::FarnebackPolyExpPyr(prev_grey, prev_poly_pyr, fscales, 7, 1.5);

            // human_mask marks the area outside the human boxes as 1 and the area inside as 0.
            // SURF features inside the boxes are not computed
            // (i.e. feature points on the person are not used for matching).
            human_mask = Mat::ones(frame.size(), CV_8UC1);
            if(bb_file)
                InitMaskWithBox(human_mask, bb_list[frame_num].BBs);

            detector_surf.detect(prev_grey, prev_kpts_surf, human_mask);
            extractor_surf.compute(prev_grey, prev_kpts_surf, prev_desc_surf);

            frame_num++;
            continue;
        }

        /*----------------------- Process the subsequent frames -------------------------*/
        init_counter++;
        frame.copyTo(image);
        cvtColor(image, grey, CV_BGR2GRAY);

        // Compute the SURF features of the new frame and match them against those of the previous frame.
        // SURF features are computed only at the original image scale.
        if(bb_file)
            InitMaskWithBox(human_mask, bb_list[frame_num].BBs);
        detector_surf.detect(grey, kpts_surf, human_mask);
        extractor_surf.compute(grey, kpts_surf, desc_surf);
        ComputeMatch(prev_kpts_surf, kpts_surf, prev_desc_surf, desc_surf, prev_pts_surf, pts_surf);

        // Compute optical flow at all scales and derive matches between the two frames from the flow
        my::FarnebackPolyExpPyr(grey, poly_pyr, fscales, 7, 1.5);
        my::calcOpticalFlowFarneback(prev_poly_pyr, poly_pyr, flow_pyr, 10, 2);
        MatchFromFlow(prev_grey, flow_pyr[0], prev_pts_flow, pts_flow, human_mask);

        // Merge the SURF matches with the optical-flow matches
        MergeMatch(prev_pts_flow, pts_flow, prev_pts_surf, pts_surf, prev_pts_all, pts_all);

        // Use the point matches to compute the projective transformation matrix H between the two frames.
        // To avoid a bad estimate when there are too few matched points,
        // the identity matrix is used as H when matches are scarce.
        Mat H = Mat::eye(3, 3, CV_64FC1);
        if(pts_all.size() > 50) {
            std::vector<unsigned char> match_mask;
            Mat temp = findHomography(prev_pts_all, pts_all, RANSAC, 1, match_mask);
            if(countNonZero(Mat(match_mask)) > 25)
                H = temp;
        }

        // Warp the current frame with H to remove the motion caused by the camera
        Mat H_inv = H.inv();
        Mat grey_warp = Mat::zeros(grey.size(), CV_8UC1);
        MyWarpPerspective(prev_grey, grey, grey_warp, H_inv); // warp the second frame

        // Recompute the optical flow at every scale from the warped image
        my::FarnebackPolyExpPyr(grey_warp, poly_warp_pyr, fscales, 7, 1.5);
        my::calcOpticalFlowFarneback(prev_poly_pyr, poly_warp_pyr, flow_warp_pyr, 10, 2);

        // Compute the descriptors at each scale
        for(int iScale = 0; iScale < scale_num; iScale++) {
            // Scale 0 is not resized; the other scales are resized by interpolation
            if(iScale == 0)
                grey.copyTo(grey_pyr[0]);
            else
                resize(grey_pyr[iScale-1], grey_pyr[iScale], grey_pyr[iScale].size(), 0, 0, INTER_LINEAR);

            int width = grey_pyr[iScale].cols;
            int height = grey_pyr[iScale].rows;

            // compute the integral histograms
            DescMat* hogMat = InitDescMat(height+1, width+1, hogInfo.nBins);
            HogComp(prev_grey_pyr[iScale], hogMat->desc, hogInfo);

            DescMat* hofMat = InitDescMat(height+1, width+1, hofInfo.nBins);
            HofComp(flow_warp_pyr[iScale], hofMat->desc, hofInfo);

            DescMat* mbhMatX = InitDescMat(height+1, width+1, mbhInfo.nBins);
            DescMat* mbhMatY = InitDescMat(height+1, width+1, mbhInfo.nBins);
            MbhComp(flow_warp_pyr[iScale], mbhMatX->desc, mbhMatY->desc, mbhInfo);

            // Track the feature points at the current scale and compute the related descriptors
            std::list<Track>& tracks = xyScaleTracks[iScale];
            for (std::list<Track>::iterator iTrack = tracks.begin(); iTrack != tracks.end();) {
                int index = iTrack->index;
                Point2f prev_point = iTrack->point[index];
                int x = std::min<int>(std::max<int>(cvRound(prev_point.x), 0), width-1);
                int y = std::min<int>(std::max<int>(cvRound(prev_point.y), 0), height-1);

                Point2f point;
                point.x = prev_point.x + flow_pyr[iScale].ptr<float>(y)[2*x];
                point.y = prev_point.y + flow_pyr[iScale].ptr<float>(y)[2*x+1];

                if(point.x <= 0 || point.x >= width || point.y <= 0 || point.y >= height) {
                    iTrack = tracks.erase(iTrack);
                    continue;
                }

                iTrack->disp[index].x = flow_warp_pyr[iScale].ptr<float>(y)[2*x];
                iTrack->disp[index].y = flow_warp_pyr[iScale].ptr<float>(y)[2*x+1];

                // get the descriptors for the feature point
                RectInfo rect;
                GetRect(prev_point, rect, width, height, hogInfo);
                GetDesc(hogMat, rect, hogInfo, iTrack->hog, index);
                GetDesc(hofMat, rect, hofInfo, iTrack->hof, index);
                GetDesc(mbhMatX, rect, mbhInfo, iTrack->mbhX, index);
                GetDesc(mbhMatY, rect, mbhInfo, iTrack->mbhY, index);
                iTrack->addPoint(point);

                // Visualize the trajectories at the original scale
                if(show_track == 1 && iScale == 0)
                    DrawTrack(iTrack->point, iTrack->index, fscales[iScale], image);

                // Once the trajectory reaches the preset length (which should be 15 in iDT),
                // the descriptors can be output
                if(iTrack->index >= trackInfo.length) {
                    std::vector<Point2f> trajectory(trackInfo.length+1);
                    for(int i = 0; i <= trackInfo.length; ++i)
                        trajectory[i] = iTrack->point[i]*fscales[iScale];

                    std::vector<Point2f> displacement(trackInfo.length);
                    for (int i = 0; i < trackInfo.length; ++i)
                        displacement[i] = iTrack->disp[i]*fscales[iScale];

                    float mean_x(0), mean_y(0), var_x(0), var_y(0), length(0);
                    if(IsValid(trajectory, mean_x, mean_y, var_x, var_y, length) && IsCameraMotion(displacement)) {
                        // output the trajectory
                        printf("%d\t%f\t%f\t%f\t%f\t%f\t%f\t", frame_num, mean_x, mean_y, var_x, var_y, length, fscales[iScale]);

                        // for spatio-temporal pyramid
                        printf("%f\t", std::min<float>(std::max<float>(mean_x/float(seqInfo.width), 0), 0.999));
                        printf("%f\t", std::min<float>(std::max<float>(mean_y/float(seqInfo.height), 0), 0.999));
                        printf("%f\t", std::min<float>(std::max<float>((frame_num - trackInfo.length/2.0 - start_frame)/float(seqInfo.length), 0), 0.999));

                        // output the trajectory
                        for (int i = 0; i < trackInfo.length; ++i)
                            printf("%f\t%f\t", displacement[i].x, displacement[i].y);

                        // In practice the trajectory descriptor performs only moderately and can be
                        // dropped, in which case outputting the descriptors below is enough.
                        // To save the output features, modify the PrintDesc function.
                        PrintDesc(iTrack->hog, hogInfo, trackInfo);
                        PrintDesc(iTrack->hof, hofInfo, trackInfo);
                        PrintDesc(iTrack->mbhX, mbhInfo, trackInfo);
                        PrintDesc(iTrack->mbhY, mbhInfo, trackInfo);
                        printf("\n");
                    }

                    iTrack = tracks.erase(iTrack);
                    continue;
                }
                ++iTrack;
            }
            ReleDescMat(hogMat);
            ReleDescMat(hofMat);
            ReleDescMat(mbhMatX);
            ReleDescMat(mbhMatY);

            if(init_counter != trackInfo.gap)
                continue;

            // detect new feature points every gap frames
            std::vector<Point2f> points(0);
            for(std::list<Track>::iterator iTrack = tracks.begin(); iTrack != tracks.end(); iTrack++)
                points.push_back(iTrack->point[iTrack->index]);

            DenseSample(grey_pyr[iScale], points, quality, min_distance);
            // save the new feature points
            for(i = 0; i < points.size(); i++)
                tracks.push_back(Track(points[i], trackInfo, hogInfo, hofInfo, mbhInfo));
        }

        // There are many copyTo prev_xxx calls here: optical flow, SURF matching and so on
        // all need information from the previous frame, so after each frame is processed
        // its information is saved for use when processing the next frame.
        init_counter = 0;
        grey.copyTo(prev_grey);
        for(i = 0; i < scale_num; i++) {
            grey_pyr[i].copyTo(prev_grey_pyr[i]);
            poly_pyr[i].copyTo(prev_poly_pyr[i]);
        }

        prev_kpts_surf = kpts_surf;
        desc_surf.copyTo(prev_desc_surf);

        frame_num++;

        if( show_track == 1 ) {
            imshow( "DenseTrackStab", image);
            c = cvWaitKey(3);
            if((char)c == 27) break;
        }
    }

    if( show_track == 1 )
        destroyWindow("DenseTrackStab");

    return 0;
}
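The per-point tracking step in the inner loop of the main program can be isolated as a small sketch. It assumes a dense flow field stored row-major with one (dx, dy) pair per pixel, and it uses std::lround where the original uses cvRound, so values that fall exactly halfway between two integers may round differently. The type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

// Hypothetical dense flow field: one (dx, dy) per pixel, stored row-major.
struct FlowField {
    int width, height;
    std::vector<Vec2> data;
    const Vec2& at(int x, int y) const { return data[y * width + x]; }
};

// Move a tracked point by the flow sampled at its rounded, clamped position.
// Returns false when the new point leaves the image, in which case the
// caller should drop the track, as the main loop does with tracks.erase().
inline bool advance_point(const FlowField& flow, Vec2 prev, Vec2& next) {
    assert(flow.width > 0 && flow.height > 0);
    assert(flow.data.size() ==
           static_cast<std::size_t>(flow.width) * static_cast<std::size_t>(flow.height));
    int x = std::min(std::max(static_cast<int>(std::lround(prev.x)), 0), flow.width - 1);
    int y = std::min(std::max(static_cast<int>(std::lround(prev.y)), 0), flow.height - 1);
    const Vec2& d = flow.at(x, y);
    next = Vec2{prev.x + d.x, prev.y + d.y};
    return !(next.x <= 0 || next.x >= flow.width ||
             next.y <= 0 || next.y >= flow.height);
}
```

In the main program the point itself is advanced with the unwarped flow (flow_pyr), while the displacement stored for the trajectory descriptor is read from the warped flow (flow_warp_pyr). A fuller sketch would therefore take both fields, using the second one only for the stored displacement.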

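The above is only a brief walkthrough of the program code; if you need to use the iDT code, you will still have to study it carefully yourself. This post is just a personal note. In my view, the ideas behind the iDT algorithm are classic and offer much worth borrowing, and the code is also well written and can be adapted for use elsewhere.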
