Scalable Recognition with a Vocabulary Tree

1. Text Retrieval Approach

The text retrieval approach works as follows:
(1) Parse an article into words.
(2) Reduce words to their stems. For example, "walk", "walking", and "walks" are variants of the same stem, "walk".
(3) Exclude stop words. Words such as "the" and "an" are extremely common in articles and contribute almost nothing to text retrieval.
(4) Represent each article as a histogram vector in which each element is the frequency of a word (more precisely, of a stem). A common choice is TF (Term Frequency), in its standard form:
$$ tf_i = \frac{n_i}{\sum_{j=1}^{t} n_j} $$
where t is the total number of stems and n_i is the number of words in the article that have stem i.
(5) Weight the stems, because different words contribute differently to retrieval. A common choice is IDF (Inverse Document Frequency), in its standard form:
$$ w_i = \ln\frac{N}{N_i} $$
where w_i is the weight of stem i, N is the total number of articles, and N_i is the number of articles that contain stem i.
(6) Finally, represent an article as the vector
$$ d = (d_1, d_2, \dots, d_t) $$
where $d_i = tf_i \, w_i$.
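The six steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `STOP_WORDS`, `crude_stem`, `stems_of`, and `tf_idf_vectors` are hypothetical names, the stemmer is a crude suffix stripper rather than a real stemming algorithm, and TF and IDF are taken in their standard textbook forms.

```python
import math
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "an", "a", "and", "in", "is"}

def crude_stem(word):
    """Strip a few common suffixes; a stand-in, not a real stemmer."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def stems_of(article):
    """Steps (1)-(3): split into words, drop stop words, stem the rest."""
    words = article.lower().split()
    return [crude_stem(w) for w in words if w not in STOP_WORDS]

def tf_idf_vectors(articles):
    """Steps (4)-(6): one TF-IDF vector per article over a shared vocabulary."""
    stem_lists = [stems_of(a) for a in articles]
    vocab = sorted({s for stems in stem_lists for s in stems})
    n_articles = len(articles)
    # N_i: number of articles that contain stem i
    doc_freq = {s: sum(1 for stems in stem_lists if s in stems) for s in vocab}
    weights = {s: math.log(n_articles / doc_freq[s]) for s in vocab}
    vectors = []
    for stems in stem_lists:
        counts = Counter(stems)
        total = sum(counts.values()) or 1  # avoid dividing by zero for empty articles
        vectors.append([counts[s] / total * weights[s] for s in vocab])
    return vocab, vectors

articles = ["the dog walks", "an owl is walking", "the dog and the owl sleep"]
vocab, vectors = tf_idf_vectors(articles)
for article, vector in zip(articles, vectors):
    print(article, [round(x, 3) for x in vector])
```

Note that "walks" and "walking" end up in the same vector element, which is exactly the effect of stemming described in step (2).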

2. Visual Words

Construction of visual words:
(1) Extract SIFT keypoints from the images:
  Images ------> SIFT keypoints     vs.     Articles ------> Words
(2) Apply k-means clustering in the 128-D feature space to define k cluster centers:
  Keypoints in a cluster     vs.     Variants having the same stem
  Cluster center ------> Visual word
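The clustering step can be illustrated with a minimal k-means sketch in plain Python. It makes several assumptions: the toy "descriptors" are 2-D points rather than real 128-D SIFT descriptors, the function names (`squared_distance`, `nearest`, `k_means`) are hypothetical, and the loop is basic Lloyd iteration with random initialization, not a tuned implementation.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centers, point):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: squared_distance(centers[i], point))

def k_means(points, k, iterations=20, seed=0):
    """Basic Lloyd iteration; returns k cluster centers (the visual words)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centers, p)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old center if a cluster ends up empty
                centers[i] = tuple(sum(c) / len(group) for c in zip(*group))
    return centers

# Toy 2-D "descriptors" forming two well-separated blobs
descriptors = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
visual_words = k_means(descriptors, k=2)

# Quantization: each descriptor is replaced by the id of its nearest visual word
word_ids = [nearest(visual_words, d) for d in descriptors]
print(visual_words, word_ids)
```

Descriptors that fall in the same cluster receive the same word id, which mirrors how word variants share one stem.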

3. Vocabulary Tree

The vocabulary tree partitions the feature space hierarchically by hierarchical k-means clustering. Each node of the tree is a visual word and also a cluster center of the k-means clustering. The tree is built as follows:
(1) Extract SIFT features from the training image set; each 128-D descriptor vector is a point in the feature space.
(2) Partition the feature space into k parts by k-means clustering, which splits the descriptor vectors into k groups; k is the branch factor of the vocabulary tree.
(3) Apply the same process recursively to each group of descriptor vectors, splitting each group into k parts, up to a maximum number of levels L.
(4) Level 1 has k cluster centers, level 2 has k^2 cluster centers, ..., and level L has k^L cluster centers. Every cluster center is a visual word, so we get t = k + k^2 + k^3 + ... + k^L visual words.
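A recursive sketch of this construction is given below. It is an illustration under assumptions, not the paper's implementation: the descriptors are random 2-D points instead of SIFT vectors, all function names are hypothetical, and a group with fewer than k points is simply not split further, so the tree built from real data can have fewer nodes than the full count t.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Split points into one group per center (nearest center wins)."""
    groups = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda i: sq_dist(centers[i], p))
        groups[j].append(p)
    return groups

def k_means(points, k, rng, iterations=10):
    centers = rng.sample(points, k)
    for _ in range(iterations):
        groups = assign(points, centers)
        centers = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, assign(points, centers)

def build_tree(points, k, levels, rng):
    """Return the list of child nodes; each node is {'center', 'children'}."""
    if levels == 0 or len(points) < k:
        return []
    centers, groups = k_means(points, k, rng)
    return [{"center": c, "children": build_tree(g, k, levels - 1, rng)}
            for c, g in zip(centers, groups)]

def count_nodes(children):
    return sum(1 + count_nodes(n["children"]) for n in children)

def path(children, descriptor):
    """Index of the closest child at each level, from the top level down."""
    result = []
    while children:
        j = min(range(len(children)),
                key=lambda i: sq_dist(children[i]["center"], descriptor))
        result.append(j)
        children = children[j]["children"]
    return result

def full_tree_size(k, levels):
    """t = k + k^2 + ... + k^L for a fully populated tree."""
    return sum(k ** level for level in range(1, levels + 1))

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(200)]
tree = build_tree(points, k=3, levels=2, rng=rng)
word_path = path(tree, points[0])
print(count_nodes(tree), full_tree_size(3, 2), word_path)
```

Looking up a descriptor costs k distance computations per level, so k * L in total, instead of one comparison against every visual word.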

4. Representation of an Image

The database image d is represented as a vector $d = (d_1, d_2, \dots, d_t)$ with $d_i = m_i w_i$, where t is the total number of visual words.
The TF-IDF scheme is also used here:
(1) m_i is the number of descriptor vectors of the database image d with a path through node i, in other words the number of descriptor vectors assigned to visual word i.
(2) w_i is the weight of visual word i. The paper uses entropy weighting:

$$ w_i = \ln\frac{N}{N_i} $$

where N is the total number of images in the database, and N_i is the number of images with at least one descriptor vector path through node i (that is, at least one descriptor vector with visual word i).

In the same way, the query image q is represented as $q = (q_1, q_2, \dots, q_t)$ with $q_i = n_i w_i$, where n_i is the number of descriptor vectors of the query image with a path through node i.
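As a small worked example of this representation, the sketch below computes the entropy weights w_i = ln(N / N_i) and the weighted vectors from already-quantized images. The data and names are hypothetical: each image is given as the list of node ids visited by its descriptors (every node on each path), and vectors are stored sparsely as dictionaries.

```python
import math
from collections import Counter

def entropy_weights(images):
    """w_i = ln(N / N_i), where N_i counts the images that touch node i."""
    n_images = len(images)
    images_per_node = Counter()
    for node_ids in images.values():
        images_per_node.update(set(node_ids))  # count each image once per node
    return {i: math.log(n_images / n_i) for i, n_i in images_per_node.items()}

def weighted_vector(node_ids, weights):
    """Sparse vector {i: m_i * w_i}; nodes never seen in the database get weight 0."""
    counts = Counter(node_ids)
    return {i: m * weights.get(i, 0.0) for i, m in counts.items()}

# Hypothetical toy database for a tree with k=3, L=2:
# nodes 0-2 are level 1, nodes 3-11 are level 2.
database = {
    "img_a": [0, 3, 0, 4, 1, 6],   # paths (0,3), (0,4), (1,6)
    "img_b": [0, 3, 1, 7],         # paths (0,3), (1,7)
    "img_c": [1, 6, 1, 7, 2, 9],   # paths (1,6), (1,7), (2,9)
}

weights = entropy_weights(database)
vectors = {name: weighted_vector(ids, weights) for name, ids in database.items()}
print(weights)
print(vectors["img_a"])
```

Node 1 is visited by every image, so its weight is ln(3/3) = 0 and it does not help to tell the images apart, just like a stop word in text retrieval.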

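5. Similarity Measure Based on Relevance Score

The relevance score s of database image d is defined as the normalized difference between the query vector and database vector:
$$ s(q, d) = \left\| \frac{q}{\|q\|} - \frac{d}{\|d\|} \right\| $$
A smaller score means that the database image is more similar to the query.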
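The normalized difference can be computed directly on sparse vectors, as in the following sketch. The names `norm` and `relevance_score` are hypothetical, vectors are dictionaries from node id to value, the default p=1 is a choice made for this example, and both vectors are assumed to be non-zero so that the normalization is defined.

```python
def norm(vec, p=1):
    """Lp norm of a sparse vector stored as {index: value}."""
    return sum(abs(x) ** p for x in vec.values()) ** (1.0 / p)

def relevance_score(q, d, p=1):
    """|| q/||q|| - d/||d|| ||_p; lower means more similar.

    Assumes q and d each have at least one non-zero entry.
    """
    nq, nd = norm(q, p), norm(d, p)
    keys = set(q) | set(d)
    diff = {i: q.get(i, 0.0) / nq - d.get(i, 0.0) / nd for i in keys}
    return norm(diff, p)

query = {0: 1.0, 1: 3.0}
same_direction = {0: 2.0, 1: 6.0}   # same vector up to scale
disjoint = {2: 2.0}                 # shares no visual word with the query

print(relevance_score(query, same_direction))
print(relevance_score(query, disjoint))
```

Because both vectors are normalized first, an image with many descriptors is not favored over one with few, and two images with no visual word in common receive the largest possible L1 score of 2.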

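6. Fast Image Retrieval Based on Vocabulary Tree

Every node is associated with an inverted file that stores the ID numbers of the images in which that node occurs and, for each image, the term frequency m_i.
Assuming that the entropy of each node is fixed, the vectors representing database images can be precomputed and normalized to unit length. The query vector is normalized in the same way.
With q and d denoting the normalized vectors, the normalized difference in the Lp-norm is:
$$ \|q - d\|_p^p = \sum_i |q_i - d_i|^p = 2 + \sum_{i \mid q_i \neq 0,\, d_i \neq 0} \left( |q_i - d_i|^p - |q_i|^p - |d_i|^p \right) $$
For the case of the L2-norm, it simplifies to:
$$ \|q - d\|_2^2 = 2 - 2 \sum_{i \mid q_i \neq 0,\, d_i \neq 0} q_i d_i $$
For each non-zero query entry $q_i \neq 0$, the inverted file can be used to traverse the corresponding non-zero database entries $d_i \neq 0$.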
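The L2 case lends itself to a short sketch of inverted-file scoring. The code below is a simplified illustration with hypothetical names: for brevity each inverted file stores the already normalized value d_i per image rather than the raw term frequency m_i, and vectors are sparse dictionaries from node id to weighted value.

```python
import math
from collections import defaultdict

def l2_normalize(vec):
    """Scale a sparse, non-zero vector to unit L2 length."""
    length = math.sqrt(sum(x * x for x in vec.values()))
    return {i: x / length for i, x in vec.items()}

def build_inverted_files(database):
    """node i -> list of (image id, normalized d_i), only for non-zero d_i."""
    inverted = defaultdict(list)
    for image_id, vec in database.items():
        for i, d_i in l2_normalize(vec).items():
            if d_i != 0.0:
                inverted[i].append((image_id, d_i))
    return inverted

def query_scores(q, inverted, image_ids):
    """||q - d||_2^2 = 2 - 2 * sum_i q_i d_i, visiting only shared non-zero nodes."""
    dot = {image_id: 0.0 for image_id in image_ids}
    for i, q_i in l2_normalize(q).items():
        if q_i == 0.0:
            continue
        for image_id, d_i in inverted.get(i, []):
            dot[image_id] += q_i * d_i
    return {image_id: 2.0 - 2.0 * s for image_id, s in dot.items()}

# Hypothetical weighted vectors {node id: m_i * w_i}
database = {"a": {0: 1.0, 1: 1.0}, "b": {2: 3.0}, "c": {0: 1.0, 2: 1.0}}
inverted = build_inverted_files(database)
scores = query_scores({0: 2.0, 1: 2.0}, inverted, database.keys())
best = min(scores, key=scores.get)
print(scores, best)
```

The work per query is proportional to the total length of the inverted files that the query touches, not to the size of the whole database, which is what makes the scheme scale.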

References:
1. 词袋模型bag of words精讲 (A Detailed Explanation of the Bag-of-Words Model)
2. Scalable Recognition with a Vocabulary Tree

0 0