Understanding and Diagnosing Visual Tracking Systems


The paper breaks a tracker down into five components: motion model, feature extractor, observation model, model updater, and ensemble post-processor.

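For example, HOG can serve as the feature, with an SVM or ridge regression as the observation model. Most papers focus on the observation model, yet it affects final performance less than the choice of features does. The ensemble post-processor also has a fairly large effect. The components are defined as follows: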

Motion Model:

Based on the estimation from the previous frame, the motion model generates a set of candidate regions or bounding boxes which may contain the target in the current frame.

Feature Extractor:

The feature extractor represents each candidate in the candidate set using some features.

Observation Model:

The observation model judges whether a candidate is the target based on the features extracted from the candidate.

Model Updater:

The model updater controls the strategy and frequency of updating the observation model. It has to strike a balance between model adaptation and drift.

Ensemble Post-processor:

When a tracking system consists of multiple trackers, the ensemble post-processor takes the outputs of the constituent trackers and uses the ensemble learning approach to combine them into the final result.
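To make the division of labor concrete, here is a minimal sketch of how the first four components could be wired into a per-frame loop. The function name `track_video`, the `Box` layout `(x, y, w, h)`, and the callable signatures are all assumptions made for illustration; they are not taken from the paper. The ensemble post-processor is left out because it operates on the outputs of several such trackers.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), an assumed layout


def track_video(
    frames: Sequence[object],
    init_box: Box,
    motion_model: Callable[[Box], List[Box]],
    feature_extractor: Callable[[object, Box], Sequence[float]],
    observation_model: Callable[[Sequence[float]], float],
    model_updater: Callable[[object, Box, float], None],
) -> List[Box]:
    """Track one target through `frames`, returning one box per frame."""
    box = init_box
    results: List[Box] = []
    for frame in frames:
        # Motion model: propose candidates around the previous estimate.
        candidates = motion_model(box)
        # Feature extractor: describe every candidate.
        features = [feature_extractor(frame, c) for c in candidates]
        # Observation model: score every candidate and keep the best one.
        scores = [observation_model(f) for f in features]
        best = max(range(len(candidates)), key=scores.__getitem__)
        box = candidates[best]
        # Model updater: decide whether and how to adapt the observation model.
        model_updater(frame, box, scores[best])
        results.append(box)
    return results


# Toy usage: each "frame" is just the true x position of the target.
toy_frames = [1.0, 2.0, 3.0]
update_log: List[float] = []
toy_boxes = track_video(
    toy_frames,
    (0.0, 0.0, 10.0, 10.0),
    motion_model=lambda b: [(b[0] + dx, b[1], b[2], b[3]) for dx in (-1.0, 0.0, 1.0)],
    feature_extractor=lambda frame, b: [abs(b[0] - frame)],
    observation_model=lambda f: -f[0],
    model_updater=lambda frame, b, score: update_log.append(score),
)
```

In the toy usage every component is a one-line lambda, which is only meant to show where each piece plugs in.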

The paper illustrates the overall video-processing pipeline with a figure (not reproduced here).

The authors then analyze each component in turn (ordered from most to least important).

Feature Extractor
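The authors compare raw grayscale pixel values, color features (CIE), Haar-like features, HOG features, and HOG plus color features (CIE), and find that HOG plus color performs best. Features extracted by a CNN also perform well. The choice of features has a large impact on the results.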
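As a rough illustration of what a gradient-based feature looks like, the sketch below computes a single magnitude-weighted histogram of gradient orientations over a grayscale patch and appends a mean color. This is a deliberately simplified stand-in: real HOG also uses cells, blocks, and block normalization, and the mean-color part is an assumption, since these notes do not describe how the paper builds its color feature. The names `orientation_histogram` and `hog_plus_color` are hypothetical.

```python
import math
from typing import List, Sequence


def orientation_histogram(patch: Sequence[Sequence[float]], bins: int = 9) -> List[float]:
    """L2-normalized histogram of unsigned gradient orientations in a patch."""
    height, width = len(patch), len(patch[0])
    hist = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate the image gradient.
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            # Fold the direction into [0, pi) so opposite gradients share a bin.
            angle = math.atan2(gy, gx) % math.pi
            index = min(int(angle / math.pi * bins), bins - 1)
            hist[index] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


def hog_plus_color(
    gray_patch: Sequence[Sequence[float]],
    color_pixels: Sequence[Sequence[float]],
) -> List[float]:
    """Concatenate the orientation histogram with the mean of each color channel."""
    n = len(color_pixels)
    channels = len(color_pixels[0])
    mean_color = [sum(p[c] for p in color_pixels) / n for c in range(channels)]
    return orientation_histogram(gray_patch) + mean_color
```

Concatenating the two descriptors is the simplest way to combine them; a patch with a vertical edge puts all of its weight in the first orientation bin.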

Observation Model
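The authors compare logistic regression, ridge regression, SVM, and structured output SVM (SO-SVM). With weak features (raw grayscale pixel values), SO-SVM performs best, but with strong features (HOG plus color) the results are nearly identical across models.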
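For a sense of what the simplest of these observation models does, here is a small sketch of logistic regression trained by stochastic gradient descent on target (label 1) and background (label 0) feature vectors, then used to score a candidate. The learning rate, epoch count, and L2 weight are arbitrary illustrative values, not settings from the paper, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 200,
    l2: float = 1e-3,
) -> Tuple[List[float], float]:
    """Fit weights and bias with plain SGD on the regularized log loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * (err * xi + l2 * wi) for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def confidence(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability-like score that feature vector `x` belongs to the target."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

In a tracker, the candidate with the highest `confidence` would be taken as the new target location.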

Motion Model

The authors compare two approaches, the particle filter and the sliding window, and point out two advantages of the particle filter:

1: The particle filter approach can maintain a probabilistic estimation for each frame. Thus when several candidates have high probability of being the target, they will all be kept for the next frames. As a result, it can help to recover from tracker failure.

2: The particle filter framework can easily incorporate changes in scale, aspect ratio, and even rotation and skewness.
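The difference between the two candidate generators can be sketched as follows. The sliding window enumerates a fixed grid of translations around the previous box, while the particle version resamples weighted hypotheses and perturbs both position and scale. Boxes are assumed to be `(x, y, w, h)`; the function names and the default radius, step, and noise levels are illustrative assumptions, not values from the paper.

```python
import math
import random
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), an assumed layout


def sliding_window_candidates(prev: Box, radius: int = 8, step: int = 4) -> List[Box]:
    """Fixed grid of translated copies of the previous box (no scale change)."""
    x, y, w, h = prev
    offsets = range(-radius, radius + 1, step)
    return [(x + dx, y + dy, w, h) for dy in offsets for dx in offsets]


def particle_candidates(
    particles: Sequence[Box],
    weights: Sequence[float],
    n: int,
    rng: random.Random,
    pos_sigma: float = 4.0,
    scale_sigma: float = 0.05,
) -> List[Box]:
    """Resample particles in proportion to their weights, then add noise."""
    # Resampling keeps every hypothesis with noticeable weight alive,
    # so several plausible target locations can survive to the next frame.
    resampled = rng.choices(list(particles), weights=list(weights), k=n)
    out: List[Box] = []
    for x, y, w, h in resampled:
        # Log-normal factor keeps the perturbed size strictly positive.
        s = math.exp(rng.gauss(0.0, scale_sigma))
        out.append((x + rng.gauss(0.0, pos_sigma), y + rng.gauss(0.0, pos_sigma), w * s, h * s))
    return out
```

Aspect ratio or rotation could be handled the same way by adding more perturbed state variables, which is harder to do with an exhaustive grid because the number of windows grows multiplicatively.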

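Under scale variation or fast motion, the authors find that the parameters need tuning, and for egocentric video the motion model has to be designed with care. In the end they still use the particle filter as the motion model, but resize the input first, which matters a lot for the improvement in results. (I have not fully understood what the authors mean here; I could not quite follow the English.)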

Model Updater
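This component decides when the model is updated and how often. The main concern is to avoid introducing noise during updates while still performing the updates that are needed. Two update strategies are considered: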

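1: The confidence of the target falls below a threshold.

2: The difference between the confidence of the target and that of the negative samples falls below a threshold.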

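The thresholds here are evaluated with respect to overlap and center-pixel distance. The results depend on how the threshold is set, and the second strategy gives good results over a wider range of parameter values.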
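The two update triggers can be written as small predicates. This is a sketch under assumptions: the notes do not say how the confidences of several negative samples are aggregated, so the second function compares against the highest-scoring negative, and both function names are hypothetical.

```python
from typing import Sequence


def update_on_low_confidence(target_conf: float, threshold: float) -> bool:
    """Strategy 1: update when the target's own confidence drops below a threshold."""
    return target_conf < threshold


def update_on_small_margin(
    target_conf: float,
    negative_confs: Sequence[float],
    threshold: float,
) -> bool:
    """Strategy 2: update when the target barely beats the background.

    Assumption: the margin is measured against the strongest negative sample.
    """
    margin = target_conf - max(negative_confs)
    return margin < threshold
```

The margin-based rule looks at relative rather than absolute confidence, so it is unaffected by a constant shift applied to all scores, which may be one reason it tolerates a wider range of thresholds.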

Ensemble Post-processor

Unsurprisingly, running several trackers or methods and post-processing their outputs gives better results than a single type of tracker.
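As a very simple illustration of the idea, the sketch below picks, among the boxes reported by several trackers for one frame, the box that overlaps most with all the others. This consensus rule is a stand-in chosen for brevity; these notes do not describe the ensemble methods the paper actually evaluates. Boxes are assumed to be `(x, y, w, h)` with `(x, y)` the top-left corner.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), an assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def consensus_box(boxes: Sequence[Box]) -> Box:
    """Return the box with the highest total overlap with the other boxes."""
    def agreement(i: int) -> float:
        return sum(iou(boxes[i], boxes[j]) for j in range(len(boxes)) if j != i)

    return boxes[max(range(len(boxes)), key=agreement)]
```

A tracker that has drifted far from the others contributes zero overlap and is effectively outvoted, which hints at why an ensemble benefits when its members fail in different ways.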

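Finally, the authors note that many tracking methods do not fit this decomposition, for example the classic mean-shift method or deep-learning-based methods. The analysis also leaves speed out of consideration: their best combination runs at about 10 fps in MATLAB, which is not real time.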

For further details, consult the original paper.
