[paper note] Human-In-The-Loop Person Re-Identification


ECCV 2016
Paper
Poster
Authors: Hanxiao Wang, Shaogang Gong, Xiatian Zhu, Tao (Tony) Xiang

Professor Shaogang Gong of Queen Mary, University of London, who works closely with Dr. Tony Xiang, is an expert in the person re-id field. He wrote a book titled Person Re-Identification. Since that book was published, supervised learning with CNN feature extractors has gradually come to dominate person re-id. Prof. Gong and his group, however, are looking for novel ways to solve the re-id problem. They have two person re-id papers at ECCV 2016, this one and Person Re-identification by Unsupervised L1 Graph Learning; neither follows the supervised learning scheme.

[Figure: model pipeline]

Highlight

Proposes Human Verification Incremental Learning (HVIL), an online learning approach for person re-id.
Does not need pre-labeled data. A human takes part in the training procedure, giving feedback on each probe-gallery image pair as true, similar (but not true), or dissimilar.
Small training set, large test set.
The Related Work section covers other attempts to relax the need for labeling, using semi-supervised, unsupervised and transfer learning approaches.

Modeling human feedback as a loss function

Incrementally optimised ranking function
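$err(f_{x^p}(x^g), y) = \mathcal{L}_y(rank(f_{x^p}(x^g)))$
where $f_{x^p}(x^g)$ is the ranking score of the pair $\{x^p, x^g\}$, defined as the negative Mahalanobis distance. $y$ denotes the human feedback, that is, $y \in$ {true-match, strong-negative, weak-negative} ({m, s, w}). $rank(\cdot)$ is simply an integer giving the rank of the gallery image in the list.
The re-id ranking loss $\mathcal{L}_y$ is defined as
$\mathcal{L}_y(k) = \begin{cases} \sum_{i=1}^{k} \alpha_i & \text{if } y \in \{m, w\} \\ \sum_{i=k+1}^{n_g} \alpha_i & \text{if } y \in \{s\} \end{cases}$
with $\alpha_1 \geq \alpha_2 \geq \cdots \geq 0$, where $n_g$ is the upper limit of the second sum (the number of gallery images).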
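To make the two cases of the ranking loss concrete, here is a minimal Python sketch. The function name `ranking_loss` and the weight values are illustrative assumptions, not taken from the paper; the only property used is that the weights are non-negative and non-increasing.

```python
def ranking_loss(k, feedback, alphas):
    """Ranking loss L_y(k) for a gallery image sitting at rank k.

    feedback: 'm' (true-match), 'w' (weak-negative) or 's' (strong-negative).
    alphas:   non-increasing, non-negative weights [alpha_1, ..., alpha_ng].
    """
    if not 0 <= k <= len(alphas):
        raise ValueError("k must lie in [0, n_g]")
    if feedback in ("m", "w"):
        # Loss grows as the image falls down the list: alpha_1 + ... + alpha_k.
        return sum(alphas[:k])
    if feedback == "s":
        # Loss shrinks as the image falls: alpha_{k+1} + ... + alpha_ng.
        return sum(alphas[k:])
    raise ValueError("feedback must be 'm', 'w' or 's'")


# Illustrative weights only (assumed, not from the paper).
alphas = [1.0, 0.5, 0.25, 0.25]
loss_match = ranking_loss(2, "m", alphas)    # alpha_1 + alpha_2
loss_strong = ranking_loss(2, "s", alphas)   # alpha_3 + alpha_4
```

A true match or weak negative is penalised more the lower it is ranked, while a strong negative is penalised more the higher it is ranked.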

Real-time Model Update for Instant Feedback Reward

Negative Mahalanobis Distance:
$f_{x^p}(x^g) = -[(x^p - x^g)^T M (x^p - x^g)], \quad M \in S_+^d$
$S_+^d$ denotes the set of $d \times d$ positive semi-definite matrices.
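The scoring function can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' code: the names `neg_mahalanobis` and `rank_gallery` are hypothetical, features are plain lists of floats, and M is given as a list of rows.

```python
def neg_mahalanobis(x_p, x_g, M):
    """Score f_{x^p}(x^g) = -(x_p - x_g)^T M (x_p - x_g)."""
    diff = [a - b for a, b in zip(x_p, x_g)]
    # Matrix-vector product M @ diff, one row at a time.
    m_diff = [sum(m * d for m, d in zip(row, diff)) for row in M]
    return -sum(d * md for d, md in zip(diff, m_diff))


def rank_gallery(x_p, gallery, M):
    """Gallery indices sorted from best score (closest) to worst."""
    scores = [neg_mahalanobis(x_p, x_g, M) for x_g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


# Toy data (assumed): with M = identity the score is the negative squared
# Euclidean distance.
identity = [[1.0, 0.0], [0.0, 1.0]]
probe = [0.0, 0.0]
gallery = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
order = rank_gallery(probe, gallery, identity)
```

Because the score is a negated distance, a higher score means a closer match, so the ranking sorts in descending order.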
Knowledge cumulation by online learning
$M_t = \arg\min_{M \in S_+^d} \Delta_F(M, M_{t-1}) + \eta \mathcal{L}^{(t)}$
The index $t$ indicates that the matrix $M$ of the Mahalanobis distance is learned in stages (knowledge cumulation). $\mathcal{L}^{(t)}$ is the loss from the human feedback at stage $t$. $\Delta_F$ is the Burg matrix divergence (a Bregman divergence, also known as the LogDet divergence):
$\Delta_F(M, M_{t-1}) = tr(M M_{t-1}^{-1}) - \log\det(M M_{t-1}^{-1})$
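A short worked step, added here as a reading aid rather than taken from the paper, shows why this term keeps the new matrix close to the previous one. Assume $M$ and $M_{t-1}$ are positive definite, and let $\lambda_1, \dots, \lambda_d$ be the eigenvalues of $M M_{t-1}^{-1}$ (real and positive, because this product is similar to a symmetric positive definite matrix). The trace is the sum of the eigenvalues and the determinant is their product, so
$\Delta_F(M, M_{t-1}) = \sum_{i=1}^{d} (\lambda_i - \log \lambda_i) \geq d$
with equality only when every $\lambda_i = 1$, that is, when $M = M_{t-1}$. The divergence term alone is therefore minimised by leaving the metric unchanged, and $\eta$ sets how far one piece of feedback may move it.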

Metric ensemble learning

Used when no human feedback is available.
Idea: re-use the pairs already verified by humans.
$f^{ens}_{ij} = f^{ens}_{x^p_i}(x^g_j) = -d_{ij}^T W d_{ij}$
Ideal ranking: $f^*_{ij} = 0$ for $c_i = c_j$ and $f^*_{ij} = -1$ for $c_i \neq c_j$.

Experiment

Settings
- For human feedback: a probe set of 300 people (images) and a gallery set of 1000 people (images). The top 50 of the rank list are returned for feedback.
- At most 3 rounds of feedback per probe, giving 300 (one round each) to 900 (three rounds each) indicative verifications.
The authors claim a 10-fold reduction in user input.
Better than other human-in-the-loop methods, with less feedback and less search time.
56.1% on CUHK-03, 78% on Market-1501.
Evaluate automated person re-id
- 168 pairs on CUHK-03, 234 pairs on Market-1501; a supervised model trained on 300 ground-truth samples is used for comparison.
- Also compared against the un-ensembled matrix after step τ ($M_\tau$) and the average of the matrices over steps 1 to τ ($M_{avg}$).
- The ensemble performs best.

0 0