Online Object Tracking: A Benchmark


Wu Y, Lim J, Yang M H. Online object tracking: A benchmark [C]//Computer vision and pattern recognition (CVPR), 2013 IEEE Conference on. IEEE, 2013: 2411-2418.

Original post: http://blog.csdn.net/shanglianlm/article/details/47376323

1. Introduction

This paper evaluates 29 tracking algorithms on a benchmark of 50 video sequences.

2. Related Work

2.1 Representation Scheme

  • Templates:
    1. holistic template (raw intensity values)-based tracking approaches [25, 39, 2];
    2. subspace-based tracking approaches [11, 47];
    3. sparse-representation-based tracking approaches [40], later improved by [41, 57, 64, 10, 55, 42].

See [35] for details.

  • Besides templates, many other visual features have been adopted in tracking algorithms, such as color histograms [16], histograms of oriented gradients (HOG) [17, 52], covariance region descriptors [53, 46, 56] and Haar-like features [54, 22].

  • Discriminative models (which learn an online binary classifier to separate the target from the background) [15, 4] have also been adopted, such as SVM [3], structured output SVM [26], ranking SVM [7], boosting [4, 22], semi-boosting [23] and multi-instance boosting [5].

  • To handle pose variation and partial occlusion effectively, each part of the target can be represented by a descriptor or histogram: in [1], several local histograms represent the object in a pre-defined grid structure, while [32] automatically updates the topology of local patches to handle large pose changes.

  • To handle appearance variations effectively, some algorithms combine multiple representation schemes [62, 51, 33].

2.2 Search Mechanism

  • Deterministic methods: when tracking can be posed as an optimization problem (assuming the objective function is differentiable with respect to the motion parameters), gradient descent methods [16, 20, 49] can be used. However, such objective functions are usually nonlinear and contain many local minima. To alleviate this, dense sampling methods [22, 5, 26] have been proposed, at the cost of a high computational load.

  • Stochastic methods: owing to their relative insensitivity to local minima and their computational efficiency [47, 40, 30], particle filters [44] are widely used in tracking.
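As a concrete illustration of the stochastic approach, here is a minimal bootstrap particle filter step in Python. This is a generic sketch, not the method of any cited tracker; the random-walk motion model, the noise scale `motion_std`, and the `observe` likelihood are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, observe, motion_std=8.0):
    """One predict/update/estimate/resample cycle of a bootstrap particle filter.

    particles : (N, 2) array of candidate object centers (x, y).
    weights   : (N,) normalized importance weights.
    observe   : maps a candidate center to an appearance likelihood
                (e.g. similarity between a color histogram sampled at
                that location and the target model).
    """
    n = len(particles)
    # Predict: propagate particles with a random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Update: re-weight each particle by its appearance likelihood.
    weights = weights * np.array([observe(p) for p in particles])
    weights = weights / weights.sum()
    # Estimate: the weighted mean of the particle set.
    estimate = (weights[:, None] * particles).sum(axis=0)
    # Resample when the effective sample size drops too low.
    if 1.0 / np.sum(weights ** 2) < n / 2:
        idx = rng.choice(n, size=n, p=weights)
        particles, weights = particles[idx], np.full(n, 1.0 / n)
    return particles, weights, estimate
```

Because the likelihood only needs to be evaluated at sampled candidates, no gradient of the objective is required, which is why this family of methods tolerates non-differentiable appearance models.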

2.3 Model Update

  • [39] updates the template using a fixed reference template extracted from the first frame together with the most recent frame.
    Other update schemes include the online mixture model [29], online boosting [22], and incremental subspace update [47].

  • For discriminative models, the main issue is improving sample collection so that the online-trained classifier is more robust [23, 5, 31, 26].

  • Despite the many methods proposed, it remains difficult to obtain an adaptive appearance model that is free of drift.
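One common anti-drift heuristic is to anchor updates to the fixed first-frame template. The sketch below illustrates that idea only; it is not the method of [39] or any other cited tracker, and `alpha` and `drift_thresh` are made-up parameters.

```python
import numpy as np

def update_template(template, first_frame_patch, new_patch,
                    alpha=0.05, drift_thresh=0.3):
    """Conservatively update an intensity template.

    Blends the newly tracked patch into the current template, but falls
    back to the fixed first-frame template when the blended result has
    drifted too far from it. All patches are same-shape float arrays.
    """
    # Exponential blending of the new observation into the template.
    candidate = (1 - alpha) * template + alpha * new_patch
    # Normalized L2 drift from the first-frame reference.
    drift = np.linalg.norm(candidate - first_frame_patch) / (
        np.linalg.norm(first_frame_patch) + 1e-8)
    # Reset to the first-frame template if drift exceeds the threshold.
    return first_frame_patch if drift > drift_thresh else candidate
```

The trade-off is the one discussed above: a small `alpha` adapts slowly to genuine appearance change, while a large `alpha` absorbs occluders and background into the model.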

2.4 Context and Fusion of Trackers

  • Context information
    Contextual information around the target (auxiliary objects or local visual information) [59, 24, 18] becomes particularly useful when the target is fully occluded or leaves the image region.

  • Fusion
    [48] combines static, moderately adaptive and highly adaptive trackers to account for appearance changes.
    Multiple trackers [34] or multiple feature sets [61] can be maintained and selected in a Bayesian framework to better account for appearance changes.

3. Evaluated Algorithms and Datasets

3.1 Evaluated Algorithms

[Figure omitted: the evaluated tracking algorithms]
In addition, the mean shift (MS-V), template matching (TM-V), ratio shift (RS-V) and peak difference (PD-V) algorithms included in the VIVID testbed [1] are evaluated.

[1] R. Collins, X. Zhou, and S. K. Teh. An Open Source Tracking Testbed and Evaluation Web Site. In PETS, 2005.

3.2 Datasets

[Figure omitted: the 50 benchmark sequences]

3.3 Sequence Attributes

[Figure omitted: the annotated sequence attributes]

4. Evaluation Methodology

  • Two evaluation metrics are used: precision and success rate.
  • Precision plot: the percentage of frames in which the estimated location lies within a given threshold of the ground-truth position (20 pixels in this paper).
  • Success plot: based on the bounding-box overlap S = |r_t ∩ r_a| / |r_t ∪ r_a|, where r_t and r_a are the tracked and ground-truth bounding boxes, ∩ and ∪ denote their intersection and union, and |·| counts the pixels in a region. The success plot shows the fraction of frames whose overlap exceeds a threshold, and is summarized by the area under the curve (AUC).
  • Robustness Evaluation
    1. Temporal Robustness Evaluation (TRE): each sequence is partitioned into 20 segments; the tracker is run from the first frame of each segment to the end of the sequence, and the results are aggregated.
    2. Spatial Robustness Evaluation (SRE): the initial bounding box in the first frame is perturbed with 8 spatial shifts (4 center shifts and 4 corner shifts, each by 10% of the target size) and 4 scale factors (0.8, 0.9, 1.1 and 1.2), giving 12 runs in total.
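The precision and success metrics above can be sketched in a few lines of Python. This is an illustrative implementation, not the benchmark's official code; boxes are assumed to be `(x, y, w, h)` tuples, and the AUC is approximated by averaging the success rate over 101 evenly spaced overlap thresholds.

```python
import numpy as np

def center_error(box_a, box_b):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    ca = (box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2)
    cb = (box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2)
    return ((ca[0] - cb[0]) ** 2 + (ca[1] - cb[1]) ** 2) ** 0.5

def overlap(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[0] + box_a[2], box_b[0] + box_b[2])
    y2 = min(box_a[1] + box_a[3], box_b[1] + box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union

def precision(tracked, truth, threshold=20.0):
    """Fraction of frames whose center location error is within threshold."""
    errs = [center_error(t, g) for t, g in zip(tracked, truth)]
    return sum(e <= threshold for e in errs) / len(errs)

def success_auc(tracked, truth):
    """Average success rate over overlap thresholds in [0, 1] (approx. AUC)."""
    ious = np.array([overlap(t, g) for t, g in zip(tracked, truth)])
    thresholds = np.linspace(0.0, 1.0, 101)
    return float(np.mean([(ious >= t).mean() for t in thresholds]))
```

A TRE or SRE score is then just `precision`/`success_auc` averaged over the 20 temporal restarts or the 12 perturbed initializations.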

5. Evaluation Results

5.1. Overall Performance

For clarity, only the top 10 algorithms are shown in the success and precision plots below:

[Figure omitted: overall success and precision plots]

5.2. Attribute-based Performance Analysis

  1. When an object moves fast, dense sampling based trackers (e.g., Struck, TLD and CXT) perform much better than others.

  2. On the OCC subset, the Struck, SCM, TLD, LSK and ASLA methods outperform others. The results suggest that structured learning and local sparse representations are effective in dealing with occlusions.

  3. On the SV subset, ASLA, SCM and Struck perform best. The results show that trackers with affine motion models (e.g., ASLA and SCM) often handle scale variation better than those designed to account for only translational motion, with a few exceptions such as Struck.

5.3. Initialization with Different Scale

[Figures omitted: performance with different initialization scales]

  • The performance of TLD, CXT, DFT and LOT degrades as the initialization scale increases, indicating that these trackers are more sensitive to background clutter.

  • Some trackers perform better when the scale factor is smaller, such as L1APG, MTT, LOT and CPF. One reason for this in the case of L1APG and MTT is that the templates have to be warped to fit the size of the usually smaller canonical template so that if the initial template is small, more appearance details will be kept in the model.

  • On the other hand, some trackers perform well or even better when the initial bounding box is enlarged, such as Struck, OAB, SemiT, and BSBT. This indicates that the Haar-like features are somewhat robust to background clutters due to the summation operations when computing features.

  • Overall, Struck is less sensitive to scale variation than other well-performing methods.
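The summation argument for Haar-like features can be made concrete with an integral image (summed-area table), which yields any rectangle sum in constant time. The two-rectangle feature below is a generic sketch, not a feature from any particular evaluated tracker.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row/column prepended."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(0).cumsum(1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left (x, y) and size (w, h)."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half.

    Each term sums many pixels, so independent per-pixel clutter tends
    to average out -- the robustness argued for in the text above.
    """
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Since a feature costs only a handful of table lookups regardless of rectangle size, enlarging the initial bounding box adds little computation while the summation keeps the response stable.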

6. Concluding Remarks

The extensive experiments above lead to three conclusions:

  1. Background information is critical for effective tracking. It can be exploited by using advanced learning techniques to encode background information in the discriminative model implicitly (e.g., Struck), or by using it explicitly as the tracking context (e.g., CXT).

  2. Local models are important for tracking, as shown by the performance improvement of local sparse representations (e.g., ASLA and SCM) over holistic sparse representations (e.g., MTT and L1APG). They are particularly useful when the appearance of the target changes partially, such as under partial occlusion or deformation.

  3. The motion (dynamic) model is crucial for object tracking, especially when the motion of the target is large or abrupt, yet most of the evaluated trackers do not focus on this component. Good location prediction based on a dynamic model can reduce the search range and thus improve tracking efficiency and robustness. Improving these components will further advance the state of the art of online object tracking.

Appendix: Compared Methods

[1] A. Adam, E. Rivlin, and I. Shimshoni. Robust Fragments-based Tracking using the Integral Histogram. In CVPR, 2006.
[5] B. Babenko, M.-H. Yang, and S. Belongie. Visual Tracking with Online Multiple Instance Learning. In CVPR, 2009.
[10] C. Bao, Y. Wu, H. Ling, and H. Ji. Real Time Robust L1 Tracker Using Accelerated Proximal Gradient Approach. In CVPR, 2012.
[14] R. T. Collins. Mean-shift Blob Tracking through Scale Space. In CVPR, 2003.
[15] R. T. Collins, Y. Liu, and M. Leordeanu. Online Selection of Discriminative Tracking Features. PAMI, 27(10):1631–1643, 2005.
[16] D. Comaniciu, V. Ramesh, and P. Meer. Kernel-Based Object Tracking. PAMI, 25(5):564–577, 2003.
[18] T. B. Dinh, N. Vo, and G. Medioni. Context Tracker: Exploring Supporters and Distracters in Unconstrained Environments. In CVPR, 2011.
[22] H. Grabner, M. Grabner, and H. Bischof. Real-Time Tracking via On-line Boosting. In BMVC, 2006.
[23] H. Grabner, C. Leistner, and H. Bischof. Semi-supervised On-Line Boosting for Robust Tracking. In ECCV, 2008.
[26] S. Hare, A. Saffari, and P. H. S. Torr. Struck: Structured Output Tracking with Kernels. In ICCV, 2011.
[27] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista. Exploiting the Circulant Structure of Tracking-by-Detection with Kernels. In ECCV, 2012.
[30] X. Jia, H. Lu, and M.-H. Yang. Visual Tracking via Adaptive Structural Local Sparse Appearance Model. In CVPR, 2012.
[31] Z. Kalal, J. Matas, and K. Mikolajczyk. P-N Learning: Bootstrapping Binary Classifiers by Structural Constraints. In CVPR, 2010.
[33] J. Kwon and K. M. Lee. Visual Tracking Decomposition. In CVPR, 2010.
[34] J. Kwon and K. M. Lee. Tracking by Sampling Trackers. In ICCV, 2011.
[36] B. Liu, J. Huang, L. Yang, and C. Kulikowski. Robust Tracking using Local Sparse Appearance Model and K-Selection. In CVPR, 2011.
[43] S. Oron, A. Bar-Hillel, D. Levi, and S. Avidan. Locally Orderless Tracking. In CVPR, 2012.
[44] P. Pérez, C. Hue, J. Vermaak, and M. Gangnet. Color-Based Probabilistic Tracking. In ECCV, 2002.
[47] D. Ross, J. Lim, R.-S. Lin, and M.-H. Yang. Incremental Learning for Robust Visual Tracking. IJCV, 77(1):125–141, 2008.
[49] L. Sevilla-Lara and E. Learned-Miller. Distribution Fields for Tracking. In CVPR, 2012.
[50] S. Stalder, H. Grabner, and L. van Gool. Beyond Semi-Supervised Tracking: Tracking Should Be as Simple as Detection, but not Simpler than Recognition. In ICCV Workshop, 2009.
[58] Y. Wu, B. Shen, and H. Ling. Online Robust Image Alignment via Iterative Convex Optimization. In CVPR, 2012.
[63] K. Zhang, L. Zhang, and M.-H. Yang. Real-time Compressive Tracking. In ECCV, 2012.
[64] T. Zhang, B. Ghanem, S. Liu, and N. Ahuja. Robust Visual Tracking via Multi-task Sparse Learning. In CVPR, 2012.
[65] W. Zhong, H. Lu, and M.-H. Yang. Robust Object Tracking via Sparsity-based Collaborative Model. In CVPR, 2012.

References:
[1] A. Adam, E. Rivlin, and I. Shimshoni. Robust Fragments-based Tracking using the Integral Histogram. In CVPR, 2006.
[2] N. Alt, S. Hinterstoisser, and N. Navab. Rapid Selection of Reliable Templates for Visual Tracking. In CVPR, 2010.
[3] S. Avidan. Support Vector Tracking. PAMI, 26(8):1064–1072, 2004.
[4] S. Avidan. Ensemble Tracking. PAMI, 29(2):261–271, 2008.
