Structured Output Tracking with Kernels


http://blog.csdn.net/ikerpeng/article/details/39525321

http://blog.csdn.net/qianxin_dh/article/details/39377959


Paper Review: Struck: Structured Output Tracking with Kernels by Sam Hare, Amir Saffari, Philip H. S. Torr

Warning
My advisor asked me to review this paper on tracking. I enjoyed it, so I decided to write a short post about it: the paper is interesting, and since I tend to forget what I read, this post is a personal reminder of its cool ideas and the math behind them. There are still some parts of the paper whose derivation I do not fully understand, but the post should at least convey the main ideas. Let's start.

What problem is it solving?

This paper addresses tracking: given an initial bounding box in the first frame, update its position in subsequent frames so that it keeps following the object that was initially inside it. Specifically, the authors propose a variant of the tracking-by-detection framework (a detection task applied over time). These methods rely on classifiers that can be trained online in order to handle appearance changes of the object. Samples from incoming frames are used to update the classifier, so a labeller must decide each sample's label before feeding it to the classifier. The main research focus in tracking by detection is how to deal with poorly labelled samples; hence robust loss functions, semi-supervised learning, or multiple-instance learning methods are used to train the classifier.
At test time, a sliding window is applied around the expected position of the object, and the new position is assigned to the best-scoring detection.
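The sliding-window test step above can be sketched as follows. This is only an illustration, not the paper's code: `score_fn` stands in for whatever online classifier the tracker uses, and all names are mine.

```python
import numpy as np

def sliding_window_search(score_fn, box, offsets):
    """Evaluate a classifier score at each candidate translation of the
    current bounding box and return the best-scoring candidate.

    box      : (x, y, w, h) current bounding box
    offsets  : iterable of (dx, dy) candidate translations
    score_fn : maps a candidate box to a real-valued confidence
    """
    x, y, w, h = box
    candidates = [(x + dx, y + dy, w, h) for dx, dy in offsets]
    scores = [score_fn(c) for c in candidates]
    return candidates[int(np.argmax(scores))]
```

In a real tracker the offsets would cover a search region around the previous position, and `score_fn` would be the learned model evaluated on image features.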


What is this paper doing differently?

In the traditional approach, the classifier output tells you how similar the image inside the tested bounding box is to the trained model of the object; given this information, you then decide the position of the best bounding box. The classifier has two possible labels: "looks like the object" or "looks like something else".
In contrast, in this paper the output of the classifier is directly how much to translate the bounding box (adding rotations is left as future work). They use a polar grid with 5 radial and 16 angular divisions which, together with the zero translation, gives 81 possible translations. In other words, the classifier has 81 possible output labels or classes, named y, corresponding to the 81 possible translations.
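The 81 candidate translations can be generated like this (the maximum search radius is my assumption; the paper parameterizes the grid, but the 5 radial and 16 angular divisions plus the zero offset giving 5*16 + 1 = 81 labels is as described):

```python
import math

def polar_grid_offsets(max_radius=30.0, radial=5, angular=16):
    """Candidate (dx, dy) translations on a polar grid: `radial` rings
    times `angular` directions, plus the zero offset (no movement)."""
    offsets = [(0.0, 0.0)]
    for r_step in range(1, radial + 1):
        r = max_radius * r_step / radial
        for a_step in range(angular):
            theta = 2.0 * math.pi * a_step / angular
            offsets.append((r * math.cos(theta), r * math.sin(theta)))
    return offsets
```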
The structured SVM is the tool used as the classifier. Before going to that point, it is worth recalling the traditional SVM. In a binary SVM, you use your sample data to train the best hyperplane that separates the two classes (output labels). In the case of a linear SVM, the hyperplane is determined by the vector w, which is normal to it. You can build a multiclass SVM in several ways, including one-vs-all or one-vs-one (with voting)... but that is a different, long story that I am not going to tell now.
The point is that with a structured SVM you can train the multiclass classifier in a single optimization problem, which makes it a good alternative here, where there are 81 possible outputs. Our "structure" is simply a set of 81 alternatives; there is no real hierarchy like a parse tree in NLP.
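The structured-SVM prediction rule is a single argmax over a joint feature map, y* = argmax_y <w, Phi(x, y)>. A minimal sketch, assuming `phi` is any joint feature map returning a vector of the same length as w (the names here are illustrative, not the paper's):

```python
import numpy as np

def predict(w, phi, x, labels):
    """Structured prediction: return the label y maximizing <w, phi(x, y)>.
    With 81 translation labels, this single argmax replaces 81 separate
    one-vs-all classifiers."""
    scores = [np.dot(w, phi(x, y)) for y in labels]
    return labels[int(np.argmax(scores))]
```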
Structured SVM

The main trick in this paper

Turning to the training samples: x_i is the feature vector of a training sample, and y_i is the translation associated with that sample.
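With that notation, the structured-SVM training problem used in Struck can be written as follows (Delta is the loss between two translations; the paper sets it to one minus the bounding-box overlap, so nearby translations are penalized less):

```latex
\min_{w} \;\; \frac{1}{2}\|w\|^2 + C \sum_i \xi_i
\quad \text{s.t.} \quad \xi_i \ge 0,
\qquad
\langle w, \Phi(x_i, y_i) \rangle - \langle w, \Phi(x_i, y) \rangle
\;\ge\; \Delta(y_i, y) - \xi_i
\quad \forall i,\; \forall y \ne y_i
```

Intuitively, the score of the true translation y_i must beat the score of every other translation y by a margin that grows as y moves the box further from the truth.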
