

Department of Mechanical Engineering
Korea Advanced Institute of Science and Technology (KAIST),
Daejeon, 34141, Republic of Korea.
{pranjayshyam, antabangun, kyungsookim}@kaist.ac.kr

* Denotes Equal Contribution
This paper was supported by Korea Civil-Military Combined Technology Development Project (Task No. 19-CMGU-02).

Retaining Image Feature Matching Performance Under Low Light Conditions

Pranjay Shyam, Antyanta Bangunharcana, and Kyung-Soo Kim
Abstract

Poor image quality in low light images may result in a reduced number of feature matches between images. In this paper, we investigate the performance of feature extraction algorithms in low light environments. To find an optimal setting for retaining feature matching performance in low light images, we study the effect of lowering the feature acceptance threshold of the feature detector and of adding pre-processing in the form of Low Light Image Enhancement (LLIE) prior to feature detection. We observe that even in low light images, feature matching with traditional hand-crafted feature detectors still performs reasonably well once the threshold parameter is lowered. We also show that applying LLIE algorithms can improve feature matching even further when paired with the right feature extraction algorithm.

keywords:
Feature Matching, Low Light Image Enhancement

1 Introduction

Feature matching between image pairs is a building block for high-level tasks such as Simultaneous Localization and Mapping (SLAM) [1, 2], image alignment [3], and 3D reconstruction. It relies on image feature extractors that detect feature points and compute the corresponding descriptors. Some well-known hand-crafted feature extractors are SIFT [4], SURF [5], ORB [6], AKAZE [7], and BRISK [8]. These hand-crafted features often perform robustly under proper illumination. However, in real-world scenarios illumination variations affect image quality by changing the color distribution of objects captured within an image, adversely affecting the performance of image processing algorithms that rely on feature matching.

Recently, deep learning based algorithms have demonstrated state-of-the-art performance in various tasks, with deep learning based image feature extractors [9] and matchers [10] gaining popularity. To retain performance under diverse illumination conditions, different works [11] recommend extending the training dataset to cover scenarios with extreme illumination variations using real [12] and synthetic [13, 14] samples. However, this requires collecting, aligning, and training on additional low light image pairs, which is a time consuming and expensive process. Thus, hand-crafted features are still widely relied upon for feature matching.

On the other hand, the performance of traditional hand-crafted feature extractors cannot be guaranteed in low light conditions, primarily because they were designed for well-illuminated images. This hinders their use in applications whose deployment conditions involve low light scenarios, e.g., moon rovers and disaster response robots. A simple approach to address this issue is to enhance the low light image as a preprocessing step prior to feature extraction, using enhancement mechanisms that have been extensively studied. However, directly applying CNN based enhancement techniques does not guarantee a performance improvement, primarily due to noise amplification and image stylization, which increase the number of extracted features that nevertheless cannot be matched between the image pairs. Furthermore, simply lowering the threshold of the feature detector generally improves matching quality. We summarize our contributions as follows:

  • We study the matching performance of handcrafted feature extractors in low light images with varying thresholds.

  • We analyze the effects of integrating image enhancement algorithms on performance of feature matching on our test dataset.

  • We also compare the enhancement results to determine the best approach for enhancing images while minimizing the noisy pixels.

To the best of our knowledge, no prior study has examined feature extraction performance in low light conditions. We hope that this study will inform future work on feature matching based computer vision tasks in low light environments.

2 Related Works

2.1 Feature Matching

The Harris corner detector [15] was one of the earliest works on feature extraction. Shi and Tomasi proposed a modified corner scoring in GFTT [16]. Lowe proposed SIFT [4], which detects features at multiple scales using a Gaussian scale space and extracts rotation invariant descriptors. It remains one of the most popular choices for feature matching due to its robustness, but comes with a high computational cost. SURF [5] improved the speed of feature detection using integral images. These two, however, extract floating point descriptors, resulting in a high computational cost for feature matching.

Figure 1: (a) Standard and (b) modified feature extraction pipeline obtained by integrating deep learning based light enhancement algorithms.

Various binary descriptors have been proposed to improve feature matching speed for real-time applications. BRIEF [17] presents an efficient binary descriptor. ORB [6] expands on this by making it rotation invariant and by modifying the fast feature detector FAST [18]. It became popular due to its successful application in real-time SLAM [1, 2]. BRISK [8] is another efficient algorithm that utilizes a FAST-based detector and a binary descriptor.

Unlike SIFT, KAZE [19] builds its scale space in a non-linear manner to better retain object boundaries in images instead of smoothing them with a Gaussian filter. AKAZE [7] is a faster and more efficient improvement of KAZE. Previous research [20] compared the performance of some of the aforementioned algorithms.

2.2 Low Light Image Enhancement

Improving the illumination quality of an image is a long-researched topic. Classical approaches such as histogram equalization [21, 22] and gamma correction [23] focus on improving the contrast of the complete image, ignoring region-specific enhancement, which leads to over- and under-saturation of regions within an image. To improve upon such systems, Retinex theory [24] was proposed, wherein an image is decomposed into reflectance and illumination maps that represent the color and lighting information, respectively. Leveraging the feature extraction capabilities of CNNs, different works have enhanced image quality at both local and global levels using large labelled sets of paired images. Specifically, MBLLEN [25] proposed a multi-branch enhancement network to extract and enrich features across multiple branches for improving the illumination within an image. GLADNet [26] first estimates a global illumination map and subsequently performs detail reconstruction to recover features lost during downsampling. KinD [27] and KinD++ [27] follow Retinex theory to decompose images into illumination and reflectance maps and estimate these maps concurrently in a supervised learning framework, whereas RetinexNet [28] follows a similar principle: it first decomposes the image into reflectance and illumination maps, enhances them, and subsequently reconstructs the enhanced image from the improved maps.
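For reference, the Retinex decomposition underlying KinD and RetinexNet can be written as follows (notation ours, not taken from the cited papers):

    S(x, y) = R(x, y) \circ I(x, y)

where S is the observed low light image, R is the reflectance map carrying color and texture, I is the illumination map carrying lighting, and \circ denotes the element-wise product. Enhancement then amounts to adjusting I and recombining it with R.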

3 Methodologies

In this section, we describe the procedure of our experiments to study the performance of various feature extractors in raw and enhanced low light images.

3.1 Feature Matching

We analyze the performance of feature extractors on images captured in low illumination settings by extracting feature points and matching the features between image pairs (Fig. 1(a)). However, such an approach does not ensure correct feature matches; we therefore use the homography between image pairs to obtain inliers and use them to filter the extracted matches.

For this study, the feature detector-descriptors we investigate are SIFT [4], SURF [5], ORB [6], AKAZE [7], BRISK [8], and the GFTT [16] detector paired with the BRIEF [17] descriptor. To match the features, we follow the Nearest Neighbor Distance Ratio (NNDR) method, accepting a match only when the distance to the closest matching candidate is less than 0.7 times the distance to the second-closest candidate. The L2-norm is used to compute the matching distance for SIFT and SURF, and the Hamming distance is used for the binary descriptors.
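A minimal sketch of this detection and NNDR matching step is given below using the OpenCV Python API (our illustration, not the authors' code; the detect_and_match helper is hypothetical, while the 0.7 ratio mirrors the description above):

    import cv2

    def detect_and_match(img1, img2, detector, norm=cv2.NORM_L2, ratio=0.7):
        # Detect keypoints and compute descriptors on both grayscale images.
        # (GFTT-BRIEF would instead need separate detect() and compute() calls.)
        kp1, des1 = detector.detectAndCompute(img1, None)
        kp2, des2 = detector.detectAndCompute(img2, None)

        # Brute-force matcher: NORM_L2 for SIFT/SURF, NORM_HAMMING for binary descriptors.
        matcher = cv2.BFMatcher(norm)
        knn_matches = matcher.knnMatch(des1, des2, k=2)

        # NNDR (Lowe's ratio) test: keep a match only if the best candidate is
        # sufficiently closer than the second-best candidate.
        good = [m for m, n in knn_matches if m.distance < ratio * n.distance]
        return kp1, kp2, good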

From the obtained feature matches, we find the matching inliers by using RANSAC to compute the homography transformation of the image pair. A match is rejected if the re-projection error of the homography-transformed point is larger than 10.0.
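The inlier filtering step can be sketched as follows (again an assumption about the implementation; filter_inliers is a hypothetical helper, and 10.0 is the re-projection threshold stated above):

    import numpy as np
    import cv2

    def filter_inliers(kp1, kp2, matches, reproj_thresh=10.0):
        if len(matches) < 4:  # a homography needs at least 4 correspondences
            return []
        src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        # RANSAC fits the homography and flags matches whose re-projection error
        # exceeds the threshold as outliers (mask entries of 0).
        H, mask = cv2.findHomography(src, dst, cv2.RANSAC, reproj_thresh)
        return [m for m, keep in zip(matches, mask.ravel()) if keep]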

Each of the aforementioned feature extractors has some form of threshold parameter that accepts or rejects candidate points based on feature strength. In low light images, feature strength is expected to be lower than in well-illuminated images, so lowering the feature acceptance threshold may improve the detection rate. However, the hand-crafted descriptors are designed for well-illuminated image features, so subsequent feature matching may still fail despite the increased number of extracted features. For that reason, we investigate the performance of the feature extraction algorithms with lowered thresholds in this paper.
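For illustration, the lowered thresholds can be set directly through the OpenCV factory functions (a sketch; the keyword names are those of the OpenCV Python bindings and the values follow Table 2):

    import cv2

    sift  = cv2.SIFT_create(contrastThreshold=0.001)    # default 0.04
    akaze = cv2.AKAZE_create(threshold=0.00001)          # default 0.001
    brisk = cv2.BRISK_create(thresh=10)                  # default 30
    orb   = cv2.ORB_create(fastThreshold=2)              # default 20
    gftt  = cv2.GFTTDetector_create(qualityLevel=0.001)  # default 0.01
    # SURF requires opencv-contrib: cv2.xfeatures2d.SURF_create(hessianThreshold=1)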

3.2 Dataset Description

Figure 2: Low light image pairs captured by rotating the camera at different outdoor and indoor locations, highlighting local illumination sources and their effect on the color distribution of objects within an image.

We collected 4 sets of image pairs captured in indoor (office room) and outdoor (parking lot, field, alley) low light environments, with each set containing more than 5 image pairs. To obtain image pairs related by a homography, the second image of each pair is captured by only rotating the camera after the first image is taken. Fig. 2 shows examples of captured image pairs from each of the 4 sets.
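Capturing the second image by pure rotation guarantees that the two views are related by an exact homography, independent of scene depth (a standard result; notation ours):

    x_2 \simeq K R K^{-1} x_1 = H x_1

where K is the camera intrinsic matrix, R the relative rotation between the two shots, and x_1, x_2 corresponding pixels in homogeneous coordinates. This is what allows the homography-based inlier filtering of Section 3.1 to act as ground truth for our image pairs.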

3.3 Low Light Image Enhancement

To determine whether pairing an image enhancement network with a feature extractor improves feature matching, we select several CNN based enhancement algorithms: MBLLEN [25], GLADNet [26], KinD [27], KinD++ [27], and RetinexNet [28]. Our motivation for using CNN based image enhancement originates from its superior performance on different datasets, arising from its ability to non-linearly enhance local regions. In this study, we retrain these algorithms on the LOL dataset [28], which comprises 500 image pairs divided into 485 training and 15 test pairs, and evaluate their performance using the peak signal-to-noise ratio (PSNR) and Structural Similarity (SSIM) [29] metrics; the results are summarized in Table 1.
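The two metrics can be computed per test pair as sketched below (an assumption; the paper does not state which toolbox was used, so we show the scikit-image implementation with hypothetical file names):

    import cv2
    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    enhanced  = cv2.imread("enhanced.png")    # hypothetical file names
    reference = cv2.imread("reference.png")   # ground-truth well-lit image

    psnr = peak_signal_noise_ratio(reference, enhanced, data_range=255)
    ssim = structural_similarity(reference, enhanced, channel_axis=-1, data_range=255)
    print(f"PSNR: {psnr:.4f}  SSIM: {ssim:.4f}")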

Table 1: Evaluation of different low light image enhancement algorithms on LOL dataset
  Algorithm PSNR SSIM
  Linear 12.1706 0.5604
MBLLEN[25] 17.8583 0.7247
GLADNet [26] 19.7182 0.6820
KinD[27] 17.6476 0.7715
KinD++[27] 17.7518 0.7581
RetinexNet[28] 16.7740 0.4250
 

4 Experiments

In this section, we first evaluate the performance of different CNN based enhancement techniques, focusing on noise amplification and image stylization, and subsequently evaluate the performance of the feature extractors by comparing the average numbers of extracted features, matched features, and accepted inliers across our low light image pairs.

The full results of our feature matching experiments are shown in Table 2. For each feature extractor, we show the matching results obtained directly on low light images as well as on LLIE-processed images. We also show results for multiple feature detector threshold parameters. Note that the left-most threshold value for each feature extractor is the default value of the OpenCV implementation.

We observe that at higher threshold values, CNN based enhancement techniques (unlike linear enhancement) improve the number of detected as well as matched features. This is expected, as image enhancement strengthens the edges and corners within the image, as shown in Fig. 3. However, simply lowering the feature acceptance threshold already improves performance substantially on raw images. The improvement is less pronounced on processed images, as most of the features were already strengthened in the first place. Moreover, adding image enhancement prior to feature extraction with lowered thresholds does not uniformly improve performance across feature extractors.

From Table 2, we infer that at lower thresholds, integrating enhancement networks such as KinD and MBLLEN improves the feature matching performance of AKAZE and BRISK, but reduces the performance of SURF and ORB while making a negligible difference for SIFT and GFTT-BRIEF. We attribute this inconsistency to the inconsistent introduction of new features after enhancement, as demonstrated in Fig. 3. While linear enhancement focuses on improving global contrast, different CNN based techniques introduce different categories of artifacts. Specifically, RetinexNet (Fig. 3 (c)) improves illumination but also generates stylized features, thereby destroying the natural features present in the image. Similarly, GLAD disturbs the natural image features by introducing large pixelated noisy features, arising mainly from its inability to extract and represent features from small pixel regions. In contrast, MBLLEN and KinD perform image enhancement without introducing a significant amount of noisy pixels or stylizing the image, which helps improve the underlying feature extraction and matching stages. Due to these side effects of image enhancement and the different ways each feature extractor detects points and computes descriptors, some image enhancement and feature extractor pairs work well while others do not.

Figure 3: Qualitative results from different enhancement networks for a given (a) low light image and the corresponding results from (b) linear enhancement, (c) RetinexNet, (d) KinD, (e) MBLLEN and (f) GLAD.

Based on these observations, for resource-constrained applications such as SLAM in mobile robotics or augmented reality on mobile phones, where computationally demanding deep learning techniques are infeasible, we recommend simply adjusting the threshold parameters to improve feature matching. However, if a further improvement in feature matching is desired, an image enhancement technique can be applied, provided it is paired with the right feature extractor. Of all the feature extractors we experimented with, AKAZE performs best in low light environments, followed by BRISK, with MBLLEN and KinD pairing well with both.

Table 2: Average number of extracted and matched features, as well as accepted inliers based on homography fitting, for various feature detector-descriptors with multiple thresholds on low light images enhanced with several LLIE algorithms. Underlined numbers represent the highest numbers for each feature extractor. Bold numbers are the overall highest numbers.
  Feature detector-descriptor LLIE algorithm # Features detected # Features matched # Inliers accepted
               Contrast Threshold : 0.04 / 0.01 / 0.001
SIFT Raw 61 / 584 / 2461 25 / 73 / 84 21 / 67 / 73
Linear 62 / 585 / 2463 24 / 72 / 83 21 / 66 / 72
MBLLEN 873 / 1863 / 2856 68 / 82 / 91 63 / 70 / 69
GLAD 1137 / 2475 / 2932 60 / 68 / 73 54 / 57 / 57
KinD 672 / 2304 / 2881 68 / 78 / 82 63 / 70 / 70
KinDpp 1619 / 2691 / 3152 77 / 81 / 101 72 / 72 / 74
RetinexNet 1635 / 2588 / 2878 38 / 39 / 43 34 / 34 / 34
               Hessian Threshold : 100 / 10 / 1
SURF Raw 123 / 650 / 1514 42 / 87 / 102 37 / 76 / 85
Linear 123 / 651 / 1513 41 / 87 / 101 36 / 75 / 86
MBLLEN 859 / 1249 / 1427 82 / 92 / 95 71 / 75 / 76
GLAD 1065 / 1721 / 1967 66 / 70 / 72 55 / 57 / 57
KinD 837 / 1393 / 1612 73 / 81 / 83 64 / 69 / 69
KinDpp 1271 / 1702 / 1871 71 / 74 / 75 60 / 62 / 62
RetinexNet 1465 / 2404 / 2744 40 / 45 / 48 31 / 33 / 33
               FAST Threshold : 20 / 2
ORB Raw 175 / 481 38 / 65 35 / 61
Linear 173 / 481 38 / 66 36 / 62
MBLLEN 456 / 487 54 / 55 51 / 52
GLAD 464 / 487 47 / 47 43 / 44
KinD 450 / 487 61 / 62 59 / 59
KinDpp 476 / 487 54 / 54 50 / 50
RetinexNet 469 / 487 27 / 28 25 / 24
               Threshold : 0.001 / 0.0001 / 0.00001
AKAZE Raw 26 / 130 / 714 13 / 52 / 147 11 / 50 / 139
Linear 26 / 130 / 715 13 / 53 / 148 11 / 50 / 141
MBLLEN 352 / 1055 / 1645 84 / 170 / 202 81 / 160 / 188
GLAD 256 / 1297 / 2191 59 / 146 / 171 57 / 137 / 160
KinD 177 / 1016 / 1871 54 / 157 / 194 52 / 150 / 182
KinDpp 457 / 1524 / 2127 80 / 143 / 156 78 / 136 / 146
RetinexNet 285 / 1661 / 2614 37 / 87 / 97 35 / 81 / 90
               Quality Level : 0.01 / 0.001
GFTT-BRIEF Raw 113 / 522 49 / 115 42 / 106
Linear 126 / 598 55 / 124 48 / 116
MBLLEN 574 / 710 111 / 124 102 / 113
GLAD 735 / 737 96 / 96 87 / 87
KinD 736 / 746 123 / 123 112 / 113
KinDpp 730 / 730 109 / 109 101 / 101
RetinexNet 732 / 732 51 / 51 46 / 46
               Threshold : 30 / 10
BRISK Raw 72 / 296 23 / 68 20 / 65
Linear 72 / 296 23 / 68 21 / 65
MBLLEN 727 / 2624 74 / 155 70 / 145
GLAD 1049 / 5049 64 / 123 60 / 115
KinD 397 / 3112 63 / 156 60 / 147
KinDpp 1305 / 4940 86 / 142 83 / 131
RetinexNet 2287 / 7370 43 / 67 40 / 58
 
Figure 4: Feature matching on a low light image pair and on the corresponding MBLLEN enhanced pair, using each feature detector-descriptor (SIFT, SURF, ORB, AKAZE, GFTT-BRIEF, BRISK) at its best performing threshold. Blue points are extracted features, green lines represent inlier matches, and red lines represent outlier matches.

Fig. 4 shows examples of feature matching results of the different feature extractors on raw images and on MBLLEN-processed images. The results shown use the lowest feature detection threshold, hence the large number of features.

5 Acknowledgements

This paper was supported by the Korea Civil-Military Combined Technology Development project (Task No. 19-CMGU-02).

References

  • [1] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardos, “Orb-slam: a versatile and accurate monocular slam system,” IEEE transactions on robotics, vol. 31, no. 5, pp. 1147–1163, 2015.
  • [2] R. Mur-Artal and J. D. Tardós, “Orb-slam2: An open-source slam system for monocular, stereo, and rgb-d cameras,” IEEE Transactions on Robotics, vol. 33, no. 5, pp. 1255–1262, 2017.
  • [3] R. Szeliski, “Image alignment and stitching: A tutorial,” Foundations and Trends® in Computer Graphics and Vision, vol. 2, no. 1, pp. 1–104, 2006.
  • [4] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International journal of computer vision, vol. 60, no. 2, pp. 91–110, 2004.
  • [5] H. Bay, T. Tuytelaars, and L. Van Gool, “Surf: Speeded up robust features,” in European conference on computer vision, pp. 404–417, Springer, 2006.
  • [6] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “Orb: An efficient alternative to sift or surf,” in 2011 International conference on computer vision, pp. 2564–2571, IEEE, 2011.
  • [7] P. F. Alcantarilla, J. Nuevo, and A. Bartoli, “Fast explicit diffusion for accelerated features in nonlinear scale spaces,” in British Machine Vision Conference (BMVC), 2013.
  • [8] S. Leutenegger, M. Chli, and R. Y. Siegwart, “Brisk: Binary robust invariant scalable keypoints,” in 2011 International conference on computer vision, pp. 2548–2555, IEEE, 2011.
  • [9] D. DeTone, T. Malisiewicz, and A. Rabinovich, “Superpoint: Self-supervised interest point detection and description,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 224–236, 2018.
  • [10] P.-E. Sarlin, D. DeTone, T. Malisiewicz, and A. Rabinovich, “Superglue: Learning feature matching with graph neural networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4938–4947, 2020.
  • [11] C. Michaelis, B. Mitzkus, R. Geirhos, E. Rusak, O. Bringmann, A. S. Ecker, M. Bethge, and W. Brendel, “Benchmarking robustness in object detection: Autonomous driving when winter is coming,” arXiv preprint arXiv:1907.07484, 2019.
  • [12] Y. P. Loh and C. S. Chan, “Getting to know low-light images with the exclusively dark dataset,” Computer Vision and Image Understanding, vol. 178, pp. 30–42, 2019.
  • [13] D. Dai and L. Van Gool, “Dark model adaptation: Semantic image segmentation from daytime to nighttime,” in 2018 21st International Conference on Intelligent Transportation Systems (ITSC), pp. 3819–3824, IEEE, 2018.
  • [14] T. Liu, Z. Chen, Y. Yang, Z. Wu, and H. Li, “Lane detection in low-light conditions using an efficient data enhancement: Light conditions style transfer,” arXiv preprint arXiv:2002.01177, 2020.
  • [15] C. G. Harris, M. Stephens, et al., “A combined corner and edge detector.,” in Alvey vision conference, vol. 15, pp. 10–5244, Citeseer, 1988.
  • [16] J. Shi et al., “Good features to track,” in 1994 Proceedings of IEEE conference on computer vision and pattern recognition, pp. 593–600, IEEE, 1994.
  • [17] M. Calonder, V. Lepetit, C. Strecha, and P. Fua, “Brief: Binary robust independent elementary features,” in European conference on computer vision, pp. 778–792, Springer, 2010.
  • [18] E. Rosten and T. Drummond, “Machine learning for high-speed corner detection,” in European conference on computer vision, pp. 430–443, Springer, 2006.
  • [19] P. F. Alcantarilla, A. Bartoli, and A. J. Davison, “Kaze features,” in European Conference on Computer Vision, pp. 214–227, Springer, 2012.
  • [20] S. A. K. Tareen and Z. Saleem, “A comparative analysis of sift, surf, kaze, akaze, orb, and brisk,” in 2018 International conference on computing, mathematics and engineering technologies (iCoMET), pp. 1–10, IEEE, 2018.
  • [21] E. D. Pisano, S. Zong, B. M. Hemminger, M. DeLuca, R. E. Johnston, K. Muller, M. P. Braeuning, and S. M. Pizer, “Contrast limited adaptive histogram equalization image processing to improve the detection of simulated spiculations in dense mammograms,” Journal of Digital imaging, vol. 11, no. 4, p. 193, 1998.
  • [22] C. Lee, C. Lee, and C.-S. Kim, “Contrast enhancement based on layered difference representation of 2d histograms,” IEEE transactions on image processing, vol. 22, no. 12, pp. 5372–5384, 2013.
  • [23] S.-C. Huang, F.-C. Cheng, and Y.-S. Chiu, “Efficient contrast enhancement using adaptive gamma correction with weighting distribution,” IEEE transactions on image processing, vol. 22, no. 3, pp. 1032–1041, 2012.
  • [24] E. H. Land, “The retinex theory of color vision,” Scientific american, vol. 237, no. 6, pp. 108–129, 1977.
  • [25] F. Lv, F. Lu, J. Wu, and C. Lim, “Mbllen: Low-light image/video enhancement using cnns,” in British Machine Vision Conference (BMVC), 2018.
  • [26] W. Wang, C. Wei, W. Yang, and J. Liu, “Gladnet: Low-light enhancement network with global awareness,” in Automatic Face & Gesture Recognition (FG 2018), 2018 13th IEEE International Conference, pp. 751–755, IEEE, 2018.
  • [27] Y. Zhang, J. Zhang, and X. Guo, “Kindling the darkness: A practical low-light image enhancer,” in Proceedings of the 27th ACM International Conference on Multimedia, MM ’19, (New York, NY, USA), pp. 1632–1640, ACM, 2019.
  • [28] C. Wei, W. Wang, W. Yang, and J. Liu, “Deep retinex decomposition for low-light enhancement,” in British Machine Vision Conference (BMVC), 2018.
  • [29] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE transactions on image processing, vol. 13, no. 4, pp. 600–612, 2004.