Notes on the SIFT Paper


Distinctive Image Features from Scale-Invariant Keypoints

- Introduction
- Related research
- Detection of scale-space extrema
  - 1 Local extrema detection
  - 2 Frequency of sampling in scale
  - 3 Frequency of sampling in the spatial domain
- Accurate keypoint localization
  - 1 Eliminating edge responses
- Orientation assignment
- The local image descriptor - assigning each keypoint a 128-dimensional orientation descriptor
  - 1 Descriptor representation
- Application to object recognition
  - 1 Keypoint matching
  - 2 Efficient nearest neighbor indexing
  - 3 Clustering with the Hough transform
- References

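This post distills Lowe's SIFT paper and marks the points I most needed to understand while reading it, so that I can review them later without rereading the paper from the start. (For other readers, treat it as a reference only.)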

1. Introduction

Scale-space extrema detection
Keypoint localization
Orientation assignment
Keypoint descriptor

…..

3. Detection of scale-space extrema

Detecting locations that are invariant to scale change of the image can be accomplished by searching for stable features across all possible scales, using a continuous function of scale known as scale space (Witkin, 1983).

Build the scale space.
Use the DoG as an approximation of the LoG to find keypoints (detect the extrema of the DoG scale space).
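The two steps above can be sketched in a few lines. This is only a minimal pure-Python illustration of D(x, y, σ) = L(x, y, kσ) − L(x, y, σ), where L is the Gaussian-blurred image. The function names, the 3σ kernel truncation, and the border clamping are my own assumptions, not details taken from the paper.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel, truncated at about 3 sigma (assumed cutoff)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the border."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(kernel[j + radius] * row[min(max(x + j, 0), width - 1)]
             for j in range(-radius, radius + 1))
         for x in range(width)]
        for row in image
    ]

def transpose(image):
    return [list(col) for col in zip(*image)]

def gaussian_blur(image, sigma):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))

def difference_of_gaussians(image, sigma, k):
    """D(x, y, sigma) = L(x, y, k * sigma) - L(x, y, sigma)."""
    low = gaussian_blur(image, sigma)
    high = gaussian_blur(image, k * sigma)
    return [[h - l for h, l in zip(row_h, row_l)]
            for row_h, row_l in zip(high, low)]
```

A real implementation would blur incrementally from one scale to the next and use an array library; the nested lists here only keep the sketch dependency-free.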

3.1 Local extrema detection

In order to detect the local maxima and minima of D(x, y, σ), each sample point is compared to its eight neighbors in the current image and nine neighbors in the scale above and below (see Figure 2). It is selected only if it is larger than all of these neighbors or smaller than all of them. The cost of this check is reasonably low due to the fact that most sample points will be eliminated following the first few checks.
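The 26-neighbor comparison can be written as a short check. This is a sketch under my own assumptions: `dog` is a list of same-sized 2-D DoG images ordered by scale, and the caller only passes interior indices (Python's negative indexing would otherwise silently wrap around).

```python
from itertools import product

# The 26 offsets (scale, row, column) around a sample, excluding the sample itself.
OFFSETS = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]

def is_local_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly larger or strictly smaller
    than all 26 neighbors in the current and the two adjacent scales."""
    value = dog[s][y][x]

    def neighbors():
        for ds, dy, dx in OFFSETS:
            yield dog[s + ds][y + dy][x + dx]

    # all() stops at the first failing comparison, which mirrors the
    # early-rejection behavior described in the quoted paragraph.
    return (all(value > n for n in neighbors())
            or all(value < n for n in neighbors()))
```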

3.2 Frequency of sampling in scale

To summarize, these experiments show that the scale-space difference-of-Gaussian function has a large number of extrema and that it would be very expensive to detect them all. Fortunately, we can detect the most stable and useful subset even with a coarse sampling of scales.
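To make "coarse sampling of scales" concrete: in Lowe's paper each octave is divided into s intervals with k = 2^(1/s), which requires s + 3 blurred images per octave, and the experiments settle on s = 3. The helper below is a hypothetical name of mine that just lists the blur levels of one octave; the default σ = 1.6 is the value the paper chooses in the next subsection.

```python
def octave_sigmas(sigma0=1.6, intervals=3):
    """Blur levels for one octave: s + 3 images, each k = 2**(1/s) times the last."""
    k = 2.0 ** (1.0 / intervals)
    return [sigma0 * k ** i for i in range(intervals + 3)]
```

The entry at index `intervals` has exactly twice the starting σ; that image is the one that gets downsampled to start the next octave.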

3.3 Frequency of sampling in the spatial domain

Just as we determined the frequency of sampling per octave of scale space, so we must determine the frequency of sampling in the image domain relative to the scale of smoothing. Given that extrema can be arbitrarily close together, there will be a similar trade-off between sampling frequency and rate of detection. Figure 4 shows an experimental determination of the amount of prior smoothing, σ, that is applied to each image level before building the scale space representation for an octave.

Of course, if we pre-smooth the image before extrema detection, we are effectively discarding the highest spatial frequencies. Therefore, to make full use of the input, the image can be expanded to create more sample points than were present in the original. We double the size of the input image using linear interpolation prior to building the first level of the pyramid.
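Doubling with linear interpolation can be sketched as follows. The sampling convention is my assumption (original samples are kept at even positions, midpoints go in between, and the last sample is repeated at the border); the paper does not spell out these details.

```python
def double_row(row):
    """Insert the midpoint between each sample and its right neighbor.
    The last sample is paired with itself (border clamping, an assumption)."""
    out = []
    for i, value in enumerate(row):
        nxt = row[min(i + 1, len(row) - 1)]
        out.extend([value, 0.5 * (value + nxt)])
    return out

def double_image(image):
    """Double the width, then the height, by 1-D linear interpolation."""
    rows = [double_row(r) for r in image]
    cols = [double_row(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```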

4. Accurate keypoint localization

Once a keypoint candidate has been found by comparing a pixel to its neighbors, the next step is to perform a detailed fit to the nearby data for location, scale, and ratio of principal curvatures. This information allows points to be rejected that have low contrast (and are therefore sensitive to noise) or are poorly localized along an edge.
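The "detailed fit" in the paper is a 3-D quadratic (second-order Taylor expansion of D) around the candidate: the sub-sample offset is x̂ = −H⁻¹g and the interpolated value is D(x̂) = D + ½ gᵀx̂, with extrema of |D(x̂)| below 0.03 discarded when pixel values are in [0, 1]. Below is a rough sketch of that fit. The function names, the finite-difference stencils, and the Cramer's-rule solver are my own choices, not code from the paper.

```python
def solve3(a, b):
    """Solve the 3x3 system a @ x = b by Cramer's rule; None if singular."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(a)
    if abs(d) < 1e-12:
        return None
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(det(m) / d)
    return solution

def refine_extremum(cube):
    """cube[s][y][x] is the 3x3x3 DoG neighborhood centered on the candidate.
    Returns ((dx, dy, ds), interpolated_value), or None if H is singular."""
    c = cube[1][1][1]
    # Gradient g by central differences, ordered (x, y, scale).
    g = [(cube[1][1][2] - cube[1][1][0]) / 2.0,
         (cube[1][2][1] - cube[1][0][1]) / 2.0,
         (cube[2][1][1] - cube[0][1][1]) / 2.0]
    # Hessian H by second differences.
    dxx = cube[1][1][2] - 2 * c + cube[1][1][0]
    dyy = cube[1][2][1] - 2 * c + cube[1][0][1]
    dss = cube[2][1][1] - 2 * c + cube[0][1][1]
    dxy = (cube[1][2][2] - cube[1][2][0] - cube[1][0][2] + cube[1][0][0]) / 4.0
    dxs = (cube[2][1][2] - cube[2][1][0] - cube[0][1][2] + cube[0][1][0]) / 4.0
    dys = (cube[2][2][1] - cube[2][0][1] - cube[0][2][1] + cube[0][0][1]) / 4.0
    h = [[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]]
    offset = solve3(h, [-v for v in g])  # x_hat = -H^-1 g
    if offset is None:
        return None
    value = c + 0.5 * sum(gi * oi for gi, oi in zip(g, offset))
    return offset, value

def is_low_contrast(value, threshold=0.03):
    """Contrast test on the interpolated value, assuming pixels in [0, 1]."""
    return abs(value) < threshold
```

In the paper, an offset larger than 0.5 in any dimension means the extremum lies closer to a different sample point, so the fit is repeated there; that loop is omitted here.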

4.1 Eliminating edge responses

For stability, it is not sufficient to reject keypoints with low contrast. The difference-of-Gaussian function will have a strong response along edges, even if the location along the edge is poorly determined and therefore unstable to small amounts of noise.
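The paper's remedy is a test on the 2x2 spatial Hessian of D at the keypoint: keep the point only if Tr(H)² / Det(H) < (r + 1)² / r, with r = 10, and discard it when the determinant is negative. A minimal sketch follows; the function name and argument order are mine.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a keypoint only if its principal-curvature ratio is below r."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures with different signs (or a degenerate Hessian):
        # the point is not a well-defined extremum, so reject it.
        return False
    return trace * trace / det < (r + 1) ** 2 / r
```

For example, a blob-like point with dxx = dyy = -2 passes, while a ridge-like point with dxx = -20, dyy = -1 is rejected.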

5. Orientation assignment

By assigning a consistent orientation to each keypoint based on local image properties, the keypoint descriptor can be represented relative to this orientation and therefore achieve invariance to image rotation. This approach contrasts with the orientation invariant descriptors of Schmid and Mohr (1997), in which each image property is based on a rotationally invariant measure. The disadvantage of that approach is that it limits the descriptors that can be used and discards image information by not requiring all measures to be based on a consistent rotation.

Peaks in the orientation histogram correspond to dominant directions of local gradients. The highest peak in the histogram is detected, and then any other local peak that is within 80% of the highest peak is used to also create a keypoint with that orientation. Therefore, for locations with multiple peaks of similar magnitude, there will be multiple keypoints created at the same location and scale but different orientations. Only about 15% of points are assigned multiple orientations, but these contribute significantly to the stability of matching. Finally, a parabola is fit to the 3 histogram values closest to each peak to interpolate the peak position for better accuracy.
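The peak selection described above can be sketched as follows. The histogram is assumed to be already built and non-empty (the paper uses 36 bins covering 360 degrees). Mapping bin index i to the angle i × bin width is my own convention, and the function name is hypothetical.

```python
def dominant_orientations(hist, peak_ratio=0.8):
    """Return interpolated angles (degrees) of every local peak that is
    at least peak_ratio times the highest histogram value."""
    n = len(hist)
    bin_width = 360.0 / n
    threshold = peak_ratio * max(hist)
    angles = []
    for i, center in enumerate(hist):
        left = hist[(i - 1) % n]   # the histogram is circular
        right = hist[(i + 1) % n]
        if center < threshold or center <= left or center <= right:
            continue
        # Vertex of the parabola through (-1, left), (0, center), (1, right).
        shift = 0.5 * (left - right) / (left - 2.0 * center + right)
        angles.append(((i + shift) * bin_width) % 360.0)
    return angles
```

Each returned angle would produce its own keypoint at the same location and scale.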

6. The local image descriptor - assigning each keypoint a 128-dimensional orientation descriptor

6.1 Descriptor representation

7. Application to object recognition

7.1 Keypoint matching

7.2 Efficient nearest neighbor indexing

No algorithms are known that can identify the exact nearest neighbors of points in high dimensional spaces that are any more efficient than exhaustive search. Our keypoint descriptor has a 128-dimensional feature vector, and the best algorithms, such as the k-d tree (Friedman et al., 1977) provide no speedup over exhaustive search for more than about 10 dimensional spaces. Therefore, we have used an approximate algorithm, called the Best-Bin-First (BBF) algorithm (Beis and Lowe, 1997).
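To show the idea behind BBF, here is a simplified sketch: a k-d tree whose unexplored branches are kept in a priority queue ordered by their distance to the query, with the search cut off after a fixed number of leaf checks (the paper stops after 200 candidates). The tree layout, the names, and the use of the splitting-plane distance as the queue priority are my own simplifications, not the Beis and Lowe implementation, and the point list is assumed to be non-empty.

```python
import heapq
import itertools

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_kd_tree(points, indices=None, depth=0):
    """Recursively split on one axis per level; each leaf holds one point index."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    if len(indices) == 1:
        return {"leaf": indices[0]}
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return {
        "axis": axis,
        "split": points[indices[mid]][axis],
        "left": build_kd_tree(points, indices[:mid], depth + 1),
        "right": build_kd_tree(points, indices[mid:], depth + 1),
    }

def bbf_nearest(tree, points, query, max_checks=200):
    """Approximate nearest neighbor: visit the closest bins first and
    stop after max_checks leaves. Returns (index, squared distance)."""
    best_index, best_dist = None, float("inf")
    counter = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(0.0, next(counter), tree)]
    checks = 0
    while heap and checks < max_checks:
        bound, _, node = heapq.heappop(heap)
        if bound >= best_dist:
            break  # no unexplored bin can contain a closer point
        # Descend to a leaf, queueing the branch not taken at each split.
        while "leaf" not in node:
            diff = query[node["axis"]] - node["split"]
            if diff < 0:
                near, far = node["left"], node["right"]
            else:
                near, far = node["right"], node["left"]
            heapq.heappush(heap, (diff * diff, next(counter), far))
            node = near
        checks += 1
        dist = squared_distance(points[node["leaf"]], query)
        if dist < best_dist:
            best_index, best_dist = node["leaf"], dist
    return best_index, best_dist
```

With `max_checks` at least as large as the number of points the search is exact; lowering it trades accuracy for speed, which is the point of the approximation.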

7.3 Clustering with the Hough transform

To maximize the performance of object recognition for small or highly occluded objects, we wish to identify objects with the fewest possible number of feature matches. We have found that reliable recognition is possible with as few as 3 features.
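A toy version of the pose-space voting may help here. In the paper each match predicts an object pose (location, orientation, scale), the bins are 30 degrees wide in orientation, a factor of 2 in scale, and 0.25 times the maximum projected training image dimension in location, and every bin with at least 3 votes goes on to verification. The sketch below simplifies this in two ways that are my own: the location bin size is a single fixed number passed in by the caller, and each match votes for one bin only (the paper votes for the 2 closest bins in each dimension). All names are hypothetical.

```python
import math
from collections import defaultdict

def pose_bin(x, y, orientation_deg, scale, location_bin):
    """Quantize one predicted pose into a coarse Hough bin."""
    return (
        math.floor(x / location_bin),
        math.floor(y / location_bin),
        math.floor((orientation_deg % 360.0) / 30.0),  # 30-degree bins
        math.floor(math.log2(scale)),                  # factor-of-2 bins
    )

def hough_clusters(pose_votes, location_bin, min_votes=3):
    """pose_votes: iterable of (match_id, (x, y, orientation_deg, scale)).
    Returns the bins that collected at least min_votes matches."""
    bins = defaultdict(list)
    for match_id, (x, y, orientation_deg, scale) in pose_votes:
        key = pose_bin(x, y, orientation_deg, scale, location_bin)
        bins[key].append(match_id)
    return {key: ids for key, ids in bins.items() if len(ids) >= min_votes}
```

Voting for a single bin makes the sketch sensitive to poses that fall near a bin boundary, which is exactly the problem the paper's multi-bin voting is meant to reduce.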

References

http://blog.csdn.net/abcjennifer/article/details/7639681
Lowe's original SIFT paper: http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf

0 0