- Kannav Mehta (2019101044)
- Raj Maheshwari (2019101039)
- Triansh Sharma (2019101006)
- Aashwin Vaish (2019114014)
GitHub link. The repository contains the SIFT algorithm implemented in C++.
The main objective of this project is to implement the SIFT algorithm described in the paper by David G. Lowe [1] from scratch (without any computer-vision dependencies). Distinctive invariant features are extracted from images and matched with those from other views of the object or scene. These features are invariant to scaling and rotation, and provide robust matching across a substantial range of affine transforms.
Following are the major stages of computation used to generate the set of image features using SIFT:
Finding scale-invariant locations in the image can be done by searching for stable features across all possible scales. The scale space of an image is defined as a function, $L(x, y, \sigma)$, produced by convolving a variable-scale Gaussian, $G(x, y, \sigma)$, with the input image, $I(x, y)$:

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y), \qquad G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/2\sigma^2}$$
The Gaussian is the only possible scale-space kernel under reasonable assumptions, as shown by Koenderink (1984) and Lindeberg (1994).
Scale invariance is achieved by efficiently detecting stable keypoint locations in scale space using the difference-of-Gaussian (DoG) function, $D(x, y, \sigma)$, computed from the difference of two nearby scales separated by a constant multiplicative factor $k$:

$$D(x, y, \sigma) = \big(G(x, y, k\sigma) - G(x, y, \sigma)\big) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$$
Lindeberg (1994) showed that normalisation of the Laplacian by the factor $\sigma^2$ is required for true scale invariance. The scale-normalized Laplacian $\sigma^2 \nabla^2 G$ is closely approximated by the difference-of-Gaussian function:

$$G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\,\sigma^2 \nabla^2 G$$

The constant factor $(k - 1)$ is the same over all scales and therefore does not influence the location of the extrema.
The images are incrementally convolved with Gaussian kernels as shown above. Every consecutive image differs by a factor of $k = 2^{1/s}$ in scale, where $s$ is the number of intervals into which each octave (a doubling of $\sigma$) is divided; after each octave, the image is downsampled by a factor of 2.
*The Gaussian-blurred images at successive scales for octaves 0–6 of the pyramid (images omitted).*
For each octave, adjacent Gaussian images are subtracted to produce the difference-of-Gaussian images in which extrema are then detected.
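As a concrete illustration, the following is a minimal sketch of the pyramid construction. The `Image` struct, helper functions, and parameter defaults here are simplified stand-ins for illustration, not the exact classes used in the repository:

```cpp
#include <cmath>
#include <vector>

// Minimal grayscale image type used throughout these sketches.
struct Image {
    int w = 0, h = 0;
    std::vector<float> px;            // row-major pixel data
    float  at(int x, int y) const { return px[(size_t)y * w + x]; }
    float& at(int x, int y)       { return px[(size_t)y * w + x]; }
};

// Separable Gaussian blur with clamped borders (naive but sufficient here).
Image gaussianBlur(const Image& src, float sigma) {
    int r = (int)std::ceil(3.0f * sigma);
    std::vector<float> kern(2 * r + 1);
    float sum = 0.0f;
    for (int i = -r; i <= r; ++i)
        sum += kern[i + r] = std::exp(-(float)(i * i) / (2 * sigma * sigma));
    for (float& v : kern) v /= sum;

    auto clamp = [](int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); };
    Image tmp = src, dst = src;
    for (int y = 0; y < src.h; ++y)           // horizontal pass
        for (int x = 0; x < src.w; ++x) {
            float acc = 0;
            for (int i = -r; i <= r; ++i)
                acc += kern[i + r] * src.at(clamp(x + i, 0, src.w - 1), y);
            tmp.at(x, y) = acc;
        }
    for (int y = 0; y < src.h; ++y)           // vertical pass
        for (int x = 0; x < src.w; ++x) {
            float acc = 0;
            for (int i = -r; i <= r; ++i)
                acc += kern[i + r] * tmp.at(x, clamp(y + i, 0, src.h - 1));
            dst.at(x, y) = acc;
        }
    return dst;
}

// Halve each dimension by keeping every other pixel.
Image downsample(const Image& src) {
    Image out;
    out.w = src.w / 2; out.h = src.h / 2;
    out.px.resize((size_t)out.w * out.h);
    for (int y = 0; y < out.h; ++y)
        for (int x = 0; x < out.w; ++x)
            out.at(x, y) = src.at(2 * x, 2 * y);
    return out;
}

// Build the Gaussian and DoG pyramids. With s intervals per octave,
// consecutive images differ in scale by k = 2^(1/s), and each octave
// holds s + 3 Gaussian images so extrema can be detected in s scales.
void buildPyramids(const Image& base, int numOctaves, int s, float sigma0,
                   std::vector<std::vector<Image>>& gauss,
                   std::vector<std::vector<Image>>& dog) {
    const float k = std::pow(2.0f, 1.0f / s);
    Image octaveBase = base;
    for (int o = 0; o < numOctaves; ++o) {
        std::vector<Image> g;
        g.push_back(octaveBase);
        for (int i = 1; i < s + 3; ++i) {
            // Incremental blur: extra sigma to go from sigma0*k^(i-1) to sigma0*k^i.
            float sigPrev = sigma0 * std::pow(k, (float)(i - 1));
            g.push_back(gaussianBlur(g.back(), sigPrev * std::sqrt(k * k - 1.0f)));
        }
        std::vector<Image> d;
        for (size_t i = 1; i < g.size(); ++i) {
            Image diff = g[i];
            for (size_t p = 0; p < diff.px.size(); ++p)
                diff.px[p] -= g[i - 1].px[p];   // DoG = L(k*sigma) - L(sigma)
            d.push_back(std::move(diff));
        }
        // The image with twice the octave's base sigma seeds the next octave.
        octaveBase = downsample(g[s]);
        gauss.push_back(std::move(g));
        dog.push_back(std::move(d));
    }
}
```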
Each candidate is found by comparing its pixel value with its 26 neighbours (8 neighbours in the current image and 9 in each of the adjacent scales above and below); it is selected only if it is larger than all of them or smaller than all of them (see the sketch below). Once each candidate is found, the next step is to perform a detailed fit to the nearby data for location, scale, and ratio of principal curvatures.
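A sketch of this neighbour comparison over one DoG octave, reusing the illustrative `Image` struct from the pyramid sketch above:

```cpp
#include <cmath>
#include <vector>

struct Candidate { int octave, scale, x, y; };

// Scan a DoG octave for discrete extrema: a pixel is a candidate only if
// it is strictly greater (or strictly smaller) than all 26 neighbours in
// its 3x3x3 scale-space neighbourhood.
std::vector<Candidate> findExtrema(const std::vector<Image>& dog, int octave,
                                   float preThresh = 0.5f * 0.03f) {
    std::vector<Candidate> out;
    for (int s = 1; s + 1 < (int)dog.size(); ++s) {
        const Image &lo = dog[s - 1], &mid = dog[s], &hi = dog[s + 1];
        for (int y = 1; y < mid.h - 1; ++y)
            for (int x = 1; x < mid.w - 1; ++x) {
                float v = mid.at(x, y);
                if (std::fabs(v) < preThresh) continue;   // cheap pre-filter
                bool isMax = true, isMin = true;
                for (int dy = -1; dy <= 1 && (isMax || isMin); ++dy)
                    for (int dx = -1; dx <= 1; ++dx)
                        for (const Image* img : {&lo, &mid, &hi}) {
                            if (img == &mid && dx == 0 && dy == 0) continue;
                            float n = img->at(x + dx, y + dy);
                            if (n >= v) isMax = false;
                            if (n <= v) isMin = false;
                        }
                if (isMax || isMin) out.push_back({octave, s, x, y});
            }
    }
    return out;
}
```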
Keypoints are selected based on measures of their stability. The location of the extremum, $\hat{\mathbf{x}}$, is given by

$$\hat{\mathbf{x}} = -\left(\frac{\partial^2 D}{\partial \mathbf{x}^2}\right)^{-1} \frac{\partial D}{\partial \mathbf{x}}$$

where the Taylor expansion (up to quadratic terms) of $D(x, y, \sigma)$, shifted so that the origin is at the sample point, is

$$D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{T} \mathbf{x} + \frac{1}{2}\, \mathbf{x}^{T} \frac{\partial^2 D}{\partial \mathbf{x}^2}\, \mathbf{x}$$
To obtain more stable keypoints, we apply a threshold on the value of $|D(\hat{\mathbf{x}})|$: extrema with $|D(\hat{\mathbf{x}})|$ below the threshold (Lowe suggests $0.03$ for image values in $[0, 1]$) are rejected as low-contrast.
We define the Hessian $\mathbf{H}$ as the $2 \times 2$ matrix of second-order derivatives of $D$,

$$\mathbf{H} = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}$$

and reject edge-like extrema whose ratio of principal curvatures exceeds $r$, i.e. those failing

$$\frac{\operatorname{Tr}(\mathbf{H})^2}{\operatorname{Det}(\mathbf{H})} < \frac{(r + 1)^2}{r}$$
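Both pruning tests are cheap to implement. A sketch, with default thresholds following the values Lowe suggests ($0.03$ for the contrast test and $r = 10$ for the edge test), again using the illustrative `Image` struct:

```cpp
#include <cmath>

// Contrast and edge-response pruning for a refined candidate at (x, y).
// D is the DoG image at the candidate's scale, with pixel values
// assumed normalized to [0, 1].
bool keepKeypoint(const Image& D, int x, int y,
                  float contrastThresh = 0.03f, float r = 10.0f) {
    // Reject low-contrast extrema.
    if (std::fabs(D.at(x, y)) < contrastThresh) return false;

    // 2x2 Hessian of D in the image plane, via finite differences.
    float dxx = D.at(x + 1, y) - 2 * D.at(x, y) + D.at(x - 1, y);
    float dyy = D.at(x, y + 1) - 2 * D.at(x, y) + D.at(x, y - 1);
    float dxy = (D.at(x + 1, y + 1) - D.at(x - 1, y + 1)
               - D.at(x + 1, y - 1) + D.at(x - 1, y - 1)) / 4.0f;

    float tr  = dxx + dyy;
    float det = dxx * dyy - dxy * dxy;
    if (det <= 0) return false;   // principal curvatures of opposite sign

    // Edge test: keep only if Tr(H)^2 / Det(H) < (r + 1)^2 / r.
    return tr * tr * r < det * (r + 1) * (r + 1);
}
```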
We varied the edge-response threshold $r$ and recorded the number of keypoints retained.
This shows that as the threshold value increases, the number of keypoints retained increases, following a roughly parabolic curve.
One or more orientations are assigned to each keypoint location based on local image gradient directions. All future operations are performed on image data that has been transformed relative to each feature's assigned orientation, scale, and location, thereby providing invariance to these transformations.
The scale of the keypoint is used to select the Gaussian-smoothed image, $L$, with the closest scale, so that all computations are performed in a scale-invariant manner. For each image sample $L(x, y)$ at this scale, the gradient magnitude $m(x, y)$ and orientation $\theta(x, y)$ are precomputed using pixel differences:

$$m(x, y) = \sqrt{\big(L(x+1, y) - L(x-1, y)\big)^2 + \big(L(x, y+1) - L(x, y-1)\big)^2}$$

$$\theta(x, y) = \tan^{-1}\!\left(\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}\right)$$
An orientation histogram is formed from the gradient orientations of sample points within a region around the keypoint. The histogram has 36 bins covering the 360-degree range of orientations, so each bin spans 10 degrees. We then shift the binned histogram by the orientation angle: if the orientation angle is 93 degrees clockwise, we shift the histogram by $\lfloor 93 / 10 \rfloor = 9$ bins, so that the descriptor is expressed relative to the dominant orientation.
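A sketch of the histogram construction, with the conventional weighting by gradient magnitude and by a Gaussian window of $1.5\times$ the keypoint scale (function and parameter names here are illustrative):

```cpp
#include <array>
#include <cmath>

// 36-bin orientation histogram around a keypoint at (kx, ky). L is the
// Gaussian image closest in scale to the keypoint (Image struct as in the
// pyramid sketch above).
std::array<float, 36> orientationHistogram(const Image& L, int kx, int ky,
                                           float scale) {
    const float PI = 3.14159265f;
    std::array<float, 36> hist{};
    float sigma = 1.5f * scale;
    int radius = (int)std::round(3.0f * sigma);
    for (int dy = -radius; dy <= radius; ++dy)
        for (int dx = -radius; dx <= radius; ++dx) {
            int x = kx + dx, y = ky + dy;
            if (x < 1 || x >= L.w - 1 || y < 1 || y >= L.h - 1) continue;
            float gx = L.at(x + 1, y) - L.at(x - 1, y);
            float gy = L.at(x, y + 1) - L.at(x, y - 1);
            float mag = std::sqrt(gx * gx + gy * gy);
            float ang = std::atan2(gy, gx);          // in [-pi, pi]
            if (ang < 0) ang += 2 * PI;              // map to [0, 2*pi)
            int bin = (int)(ang / (2 * PI) * 36) % 36;
            float w = std::exp(-(float)(dx * dx + dy * dy) / (2 * sigma * sigma));
            hist[(size_t)bin] += w * mag;
        }
    return hist;
}
// The highest peak (and any peak within 80% of it) defines a keypoint
// orientation; shifting the histogram so the peak lands in bin 0 is what
// makes the subsequent processing rotation invariant.
```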
*Feature matching on rotated views (rotational invariance) (images omitted).*
The local image gradients are measured at the selected scale in the region around each keypoint. These samples are accumulated into orientation histograms summarising the contents of 4×4 subregions, with the length of each arrow (in the standard SIFT illustration) corresponding to the sum of the gradient magnitudes near that direction within the region, weighted by a Gaussian function. This representation allows for significant levels of local shape distortion and change in illumination.
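A simplified descriptor sketch follows; it uses nearest-bin voting rather than the trilinear interpolation a full implementation would use, and the function name and window conventions are our illustrative choices:

```cpp
#include <algorithm>
#include <array>
#include <cmath>

// 4x4 subregions x 8 orientation bins = 128-d descriptor, sampled from a
// 16x16 window rotated by -theta so the descriptor is expressed relative
// to the keypoint orientation. Image struct as in the sketches above.
std::array<float, 128> computeDescriptor(const Image& L, int kx, int ky,
                                         float theta) {
    const float PI = 3.14159265f;
    std::array<float, 128> desc{};
    float c = std::cos(-theta), s = std::sin(-theta);
    for (int dy = -8; dy < 8; ++dy)
        for (int dx = -8; dx < 8; ++dx) {
            // Rotate the sample offset into the keypoint's frame.
            int x = kx + (int)std::round(c * dx - s * dy);
            int y = ky + (int)std::round(s * dx + c * dy);
            if (x < 1 || x >= L.w - 1 || y < 1 || y >= L.h - 1) continue;
            float gx = L.at(x + 1, y) - L.at(x - 1, y);
            float gy = L.at(x, y + 1) - L.at(x, y - 1);
            float mag = std::sqrt(gx * gx + gy * gy);
            float ang = std::atan2(gy, gx) - theta;  // angle relative to keypoint
            while (ang < 0) ang += 2 * PI;
            int cell = ((dy + 8) / 4) * 4 + (dx + 8) / 4;  // which 4x4 subregion
            int bin  = (int)(ang / (2 * PI) * 8) % 8;      // which orientation bin
            desc[(size_t)(cell * 8 + bin)] += mag;
        }
    // Normalize, clamp at 0.2, renormalize: robustness to illumination change.
    auto normalize = [&] {
        float n = 0; for (float v : desc) n += v * v;
        n = std::sqrt(n) + 1e-7f;
        for (float& v : desc) v /= n;
    };
    normalize();
    for (float& v : desc) v = std::min(v, 0.2f);
    normalize();
    return desc;
}
```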
Once features have been extracted from all $n$ images, they must be matched. Since each SIFT descriptor is a 128-dimensional vector, exact nearest-neighbour search over all descriptors is expensive, so we use approximate nearest-neighbour search.
The Fast Library for Approximate Nearest Neighbors (FLANN) is a library for performing fast approximate nearest-neighbour searches in high-dimensional spaces. It automatically chooses between two algorithms, as each performs predictably better in different situations. The two algorithms it selects from are:
- randomized kd-tree
- hierarchical k-means tree
The classic kd-tree algorithm repeatedly partitions the data with binary splits on the dimension that shows the highest variance. However, the classic kd-tree is efficient only in low dimensions.
An improved but approximate version proposed by Silpa-Anan and Hartley (2008) creates multiple randomized trees, choosing each split dimension at random from the top $N_D$ dimensions with the highest variance; all trees are then searched in parallel using a single shared priority queue.
This algorithm works by recursively splitting the datapoints into $K$ distinct regions using k-means clustering, then applying the same procedure within each region until the number of points in a region falls below a leaf-size threshold. The resulting tree is searched by descending from the root and exploring the closest branches first.
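For illustration, here is how descriptor matching could be set up with the standalone FLANN C++ API, combined with Lowe's ratio test. The memory layout (128 contiguous floats per descriptor), the tree count, and the search parameters are our assumptions, not values mandated by the library:

```cpp
#include <flann/flann.hpp>
#include <utility>
#include <vector>

// Match query descriptors (descA) against reference descriptors (descB)
// using four randomized kd-trees, then filter with Lowe's ratio test.
std::vector<std::pair<int, int>> matchDescriptors(std::vector<float>& descA,
                                                  size_t nA,
                                                  std::vector<float>& descB,
                                                  size_t nB) {
    flann::Matrix<float> dataset(descB.data(), nB, 128);
    flann::Matrix<float> queries(descA.data(), nA, 128);

    // KDTreeIndexParams(4) builds four randomized kd-trees; passing
    // flann::AutotunedIndexParams instead would let FLANN choose between
    // the kd-trees and the hierarchical k-means tree automatically.
    flann::Index<flann::L2<float>> index(dataset, flann::KDTreeIndexParams(4));
    index.buildIndex();

    std::vector<int> idxBuf(nA * 2);
    std::vector<float> distBuf(nA * 2);
    flann::Matrix<int>   indices(idxBuf.data(), nA, 2);
    flann::Matrix<float> dists(distBuf.data(), nA, 2);
    index.knnSearch(queries, indices, dists, 2, flann::SearchParams(128));

    // Ratio test on squared distances (flann::L2 reports squared L2), so
    // Lowe's 0.8 ratio on distances becomes 0.8^2 = 0.64 here.
    std::vector<std::pair<int, int>> matches;
    for (size_t i = 0; i < nA; ++i)
        if (dists[i][0] < 0.64f * dists[i][1])
            matches.emplace_back((int)i, indices[i][0]);
    return matches;
}
```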
RootSIFT is an enhanced SIFT descriptor: a square-root (Hellinger) kernel is used instead of the standard Euclidean distance to measure the similarity between SIFT descriptors.
Euclidean distance is the straight-line distance between two descriptor vectors. The Hellinger kernel between two $L_1$-normalized vectors $x$ and $y$ is defined as: $$H(x, y) = \sum_{i=1}^{n} \sqrt{x_i y_i}$$ Comparing RootSIFT descriptors with Euclidean distance is equivalent to comparing the original SIFT descriptors with the Hellinger kernel.
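The conversion from SIFT to RootSIFT is a small post-processing step; a sketch:

```cpp
#include <cmath>
#include <vector>

// Convert a SIFT descriptor to RootSIFT in place: L1-normalize, then take
// the element-wise square root. Dot products of the resulting vectors
// equal the Hellinger kernel of the originals, so plain Euclidean
// matching on RootSIFT realizes Hellinger-kernel matching on SIFT.
void toRootSIFT(std::vector<float>& desc) {
    float l1 = 1e-7f;                         // epsilon avoids division by zero
    for (float v : desc) l1 += std::fabs(v);  // SIFT entries are non-negative
    for (float& v : desc) v = std::sqrt(v / l1);
}
```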
RANSAC (random sample consensus) is a robust estimation procedure that uses a minimal set of randomly sampled correspondences to estimate image transformation parameters and finds a solution that has the best consensus with the data. In the case of panoramas, we select sets of $r = 4$ feature correspondences and compute the homography $\mathbf{H}$ between them using the direct linear transformation (DLT) method.
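A sketch of the RANSAC loop; `computeHomographyDLT` and `applyH` are hypothetical helpers standing in for the 4-point DLT solver and point projection through a 3×3 homography:

```cpp
#include <array>
#include <cstdlib>
#include <vector>

struct Pt { float x, y; };
struct Match { Pt a, b; };
using H33 = std::array<float, 9>;   // row-major 3x3 homography

H33 computeHomographyDLT(const std::array<Match, 4>& sample);  // assumed helper
Pt  applyH(const H33& H, Pt p);                                // assumed helper

H33 ransacHomography(const std::vector<Match>& matches,
                     int iters = 500, float inlierTol = 3.0f) {
    H33 best{};
    size_t bestInliers = 0;
    for (int it = 0; it < iters; ++it) {
        // 1. Sample a minimal set of r = 4 correspondences.
        std::array<Match, 4> sample;
        for (auto& m : sample) m = matches[std::rand() % matches.size()];
        // 2. Fit a homography to the sample via DLT.
        H33 H = computeHomographyDLT(sample);
        // 3. Count inliers: matches reprojected within tolerance.
        size_t inliers = 0;
        for (const Match& m : matches) {
            Pt p = applyH(H, m.a);
            float dx = p.x - m.b.x, dy = p.y - m.b.y;
            if (dx * dx + dy * dy < inlierTol * inlierTol) ++inliers;
        }
        if (inliers > bestInliers) { bestInliers = inliers; best = H; }
    }
    return best;   // typically refined by re-fitting to all inliers
}
```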
SIFT features are extracted from all of the images. After matching all of the features using a k-d tree, the $m$ images with the largest number of feature matches to a given image are checked for an image match. First, RANSAC is performed to compute the homography; then a probabilistic model is invoked to verify the image match based on the number of inliers. For each pair with a consistent set of feature matches, connected components of image matches are detected and stitched into panoramas.
To test and verify the keypoints and descriptors our implementation produces, we used them for image stitching. The use of invariant features (SIFT features in this case) enables reliable matching of panoramic image sequences despite rotation, zoom, and illumination change in the input images. Below are a few examples that use the features we generate to perform feature matching, estimate the homography from the matches, and stitch the images.
*Individual images, the stitched panorama, and feature matching (168 good matches) (images omitted).*
*Individual images and the stitched panorama (images omitted).*
We analyzed multiple images by applying various filters and affine transformations and matched features to evaluate the effectiveness of SIFT descriptors when matching with FLANN.
SIFT ensures rotational and scale invariance, examples of which appear above. Beyond these, we judged the effectiveness of SIFT under changes in illumination, blurring, and skew. The following results were obtained.
*Input image (size 512) with detected keypoints, and feature matching under different brightness levels (images omitted).*
SIFT descriptors are constructed so that robust image matching is possible in the presence of affine transformations; skew is one such example.
*Feature matching after a skew transformation (images omitted).*
*Feature matching after adding artificial noise to all three channels of the image (images omitted).*
The number of matched keypoints is reduced to 8. These correspond to the largest-magnitude maxima and minima in scale space, since the weaker extrema are eroded by the blurring.
*Feature matching under different levels of blur (images omitted).*
- Kannav: Gaussian Pyramid, DoG generation, Multithreading, Scale Space Extrema Detection, Sub-pixel Localization, Orientation Assignment, Keypoint Pruning, Descriptor Generation, Image Stitching & Feature Matching.
- Triansh: Gaussian Pyramid, DoG generation, Sub-pixel Localization, Scale Space Extrema Detection, Orientation Assignment, Keypoint Pruning, Descriptor Generation, Image Stitching & Feature Matching.
- Raj: Gaussian Pyramid, DoG generation, Scale Space Extrema Detection, Sub-pixel Localization, Orientation Assignment, Keypoint Pruning, Descriptor Generation.
- Aashwin: Hessian, Gradient Computation, Pixel Cube, Refactoring, Debugging.
- Brown, M., Lowe, D.G. Recognising Panoramas. In Proceedings of the 9th International Conference on Computer Vision (ICCV 2003), volume 2, pages 1218–1225, Nice, October 2003.
- Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision 60, 91–110 (2004).
- Brown, M., Lowe, D.G. Automatic Panoramic Image Stitching using Invariant Features. Int J Comput Vision 74, 59–73 (2007).
- Muja, M., Lowe, D.G. Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration. In Proceedings of the 4th International Conference on Computer Vision Theory and Applications (VISAPP 2009), volume 1, pages 331–340, 2009.