

Department of Mechanical Engineering
Korea Advanced Institute of Science and Technology (KAIST),
Daejeon, 34141, Republic of Korea.
{pranjayshyam, antabangun, kyungsookim}@kaist.ac.kr

* Denotes Equal Contribution
This paper was supported by Korea Civil-Military Combined Technology Development Project (Task No. 19-CMGU-02).

Retaining Image Feature Matching Performance Under Low Light Conditions

Pranjay Shyam, Antyanta Bangunharcana, and Kyung-Soo Kim
Abstract

Poor image quality in low light images may result in a reduced number of feature matches between images. In this paper, we investigate the performance of feature extraction algorithms in low light environments. To find an optimal setting for retaining feature matching performance in low light images, we study the effect of lowering the feature acceptance threshold of the feature detector and of adding pre-processing in the form of Low Light Image Enhancement (LLIE) prior to feature detection. We observe that even in low light images, feature matching with traditional hand-crafted feature detectors still performs reasonably well once the threshold parameter is lowered. We also show that applying LLIE algorithms can improve feature matching even further when paired with the right feature extraction algorithm.

keywords:
Feature Matching, Low Light Image Enhancement

1 Introduction

Feature matching between image pairs is a building block for high-level tasks such as Simultaneous Localization and Mapping (SLAM) [1, 2], image alignment [3], and 3D reconstruction. It relies on image feature extractors that detect feature points and compute the corresponding descriptors. Some well-known hand-crafted feature extractors are SIFT [4], SURF [5], ORB [6], AKAZE [7], and BRISK [8]. These hand-crafted features often perform robustly under proper illumination. However, in real-world scenarios illumination variations affect image quality by changing the color distribution of objects captured within an image, adversely affecting the performance of image processing algorithms that rely on feature matching.

Recently, deep learning based algorithms have demonstrated state-of-the-art performance in various tasks, with deep learning based image feature extractors [9] and matchers [10] gaining popularity. To retain performance under diverse illumination conditions, different works [11] recommend extending the training dataset to cover scenarios with extreme illumination variations using real [12] and synthetic [13, 14] samples. However, this requires collecting, aligning, and training on additional low light image pairs, which is a time consuming and expensive process. Thus, hand-crafted features are still widely relied upon for feature matching.

On the other hand, the performance of traditional hand-crafted feature extractors cannot be guaranteed in low light conditions, primarily because they were designed for well-illuminated images. This hinders their use in applications whose deployment conditions involve low light scenarios, e.g., moon rovers and disaster response robots. A simple approach to address this issue is to enhance the low light image as a preprocessing step prior to feature extraction, using enhancement mechanisms that have been extensively studied. However, directly applying CNN based enhancement techniques does not guarantee a performance improvement, primarily due to noise amplification and image stylization, which increase the number of extracted features that nevertheless cannot be matched between the image pairs. Furthermore, simply lowering the threshold of the feature detector generally improves matching quality. We summarize our contributions as follows:

  • We study the matching performance of handcrafted feature extractors in low light images with varying thresholds.

  • We analyze the effects of integrating image enhancement algorithms on performance of feature matching on our test dataset.

  • We also compare the enhancement results to determine the best approach for enhancing images while minimizing the noisy pixels.

To the best of our knowledge, no prior study has examined feature extraction performance in low light conditions. We hope that this study will inform future work on feature matching based computer vision tasks in low light environments.

2 Related Works

2.1 Feature Matching

The Harris corner detector [15] was one of the earliest works on feature extraction. Shi and Tomasi proposed a modified corner scoring in GFTT [16]. Lowe proposed SIFT [4], which detects features at multiple scales using a Gaussian scale space and extracts rotation invariant descriptors. It remains one of the most popular choices for feature matching due to its robustness, but comes with a high computational cost. SURF [5] improved the speed of feature detection using integral images. These two, however, extract floating point descriptors, resulting in a high computational cost for feature matching.

Figure 1: (a) Standard and (b) modified feature extraction pipeline obtained by integrating deep learning based light enhancement algorithms.

Various binary descriptors have been proposed to improve feature matching speed for real-time applications. BRIEF [17] presents an efficient binary descriptor. ORB [6] expands on this by making it rotation invariant and by modifying the fast feature detector FAST [18]. It became popular due to its successful application in real-time SLAM [1, 2]. BRISK [8] is another efficient algorithm that utilizes a FAST-based detector and a binary descriptor.

Unlike SIFT, KAZE [19] builds its scale space in a non-linear manner to better retain object boundaries in images instead of smoothing them with a Gaussian filter. AKAZE [7] is a faster and more efficient improvement of KAZE. Previous research [20] compared the performance of some of the aforementioned algorithms.

2.2 Low Light Image Enhancement

Improving the illumination quality of an image is a long-researched topic. Classical approaches such as histogram equalization [21, 22] and gamma correction [23] focus on improving the contrast of the complete image, ignoring region-specific enhancement, which leads to over- and under-saturation of regions within an image. To improve upon such systems, Retinex theory [24] was proposed, wherein an image is decomposed into reflectance and illumination maps that represent the color and lighting information, respectively. Leveraging the feature extraction capabilities of CNNs, different works have enhanced image quality at both local and global levels using large labelled sets of paired images. Specifically, MBLLEN [25] proposed a multi-branch enhancement network to extract and enrich features across multiple branches for improving the illumination within an image. GLADNet [26] first estimates a global illumination map and subsequently performs detail reconstruction to recover features lost during downsampling. KinD [27] and KinD++ [27] follow Retinex theory to decompose images into illumination and reflectance maps and estimate these maps concurrently in a supervised learning framework, whereas RetinexNet [28] follows a similar principle: it first decomposes the image into reflectance and illumination maps, enhances them, and subsequently reconstructs the enhanced image from the improved maps.
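For reference, the Retinex decomposition underlying KinD and RetinexNet can be written as follows (notation ours, not taken from the cited papers):

    S(x, y) = R(x, y) \circ I(x, y)

where S is the observed low light image, R is the reflectance map carrying color and texture, I is the illumination map carrying lighting, and \circ denotes the element-wise product. Enhancement then amounts to adjusting I and recombining it with R.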

3 Methodologies

In this section, we describe the procedure of our experiments to study the performance of various feature extractors in raw and enhanced low light images.

3.1 Feature Matching

We analyze the performance of feature extractors on images captured in low illumination settings by extracting feature points and matching the features between image pairs (Fig. 1(a)). However, such an approach does not ensure correct feature matches; we therefore use the homography between image pairs to obtain inliers and use them to filter the extracted matches.

For this study, the feature detector-descriptors we investigate are SIFT [4], SURF [5], ORB [6], AKAZE [7], BRISK [8], and the GFTT [16] detector paired with the BRIEF [17] descriptor. To match the features, we follow the Nearest Neighbor Distance Ratio (NNDR) method, accepting a match only when the distance to the closest matching candidate is less than 0.7 times the distance to the second-closest candidate. The L2-norm is used to compute the matching distance for SIFT and SURF, and the Hamming distance is used for the binary descriptors.
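A minimal sketch of this detection and NNDR matching step is given below using the OpenCV Python API (our illustration, not the authors' code; the detect_and_match helper is hypothetical, while the 0.7 ratio mirrors the description above):

    import cv2

    def detect_and_match(img1, img2, detector, norm=cv2.NORM_L2, ratio=0.7):
        # Detect keypoints and compute descriptors on both grayscale images.
        # (GFTT-BRIEF would instead need separate detect() and compute() calls.)
        kp1, des1 = detector.detectAndCompute(img1, None)
        kp2, des2 = detector.detectAndCompute(img2, None)

        # Brute-force matcher: NORM_L2 for SIFT/SURF, NORM_HAMMING for binary descriptors.
        matcher = cv2.BFMatcher(norm)
        knn_matches = matcher.knnMatch(des1, des2, k=2)

        # NNDR (Lowe's ratio) test: keep a match only if the best candidate is
        # sufficiently closer than the second-best candidate.
        good = [m for m, n in knn_matches if m.distance < ratio * n.distance]
        return kp1, kp2, good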

From the obtained feature matches, we find the matching inliers by using RANSAC to compute the homography transformation of the image pair. A match is rejected if the re-projection error of the homography-transformed point is larger than 10.0.
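The inlier filtering step can be sketched as follows (again an assumption about the implementation; filter_inliers is a hypothetical helper, and 10.0 is the re-projection threshold stated above):

    import numpy as np
    import cv2

    def filter_inliers(kp1, kp2, matches, reproj_thresh=10.0):
        if len(matches) < 4:  # a homography needs at least 4 correspondences
            return []
        src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        # RANSAC fits the homography and flags matches whose re-projection error
        # exceeds the threshold as outliers (mask entries of 0).
        H, mask = cv2.findHomography(src, dst, cv2.RANSAC, reproj_thresh)
        return [m for m, keep in zip(matches, mask.ravel()) if keep]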

Each of the aforementioned feature extractors has some form of threshold parameter that accepts or rejects candidate points based on feature strength. In low light images, feature strength is expected to be lower than in well-illuminated images, so lowering the feature acceptance threshold may improve the detection rate. However, the hand-crafted descriptors are designed for well-illuminated image features, so subsequent feature matching may still fail despite the increased number of extracted features. For that reason, we investigate the performance of the feature extraction algorithms with lowered thresholds in this paper.
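For illustration, the lowered thresholds can be set directly through the OpenCV factory functions (a sketch; the keyword names are those of the OpenCV Python bindings and the values follow Table 2):

    import cv2

    sift  = cv2.SIFT_create(contrastThreshold=0.001)    # default 0.04
    akaze = cv2.AKAZE_create(threshold=0.00001)          # default 0.001
    brisk = cv2.BRISK_create(thresh=10)                  # default 30
    orb   = cv2.ORB_create(fastThreshold=2)              # default 20
    gftt  = cv2.GFTTDetector_create(qualityLevel=0.001)  # default 0.01
    # SURF requires opencv-contrib: cv2.xfeatures2d.SURF_create(hessianThreshold=1)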

3.2 Dataset Description

Figure 2: Low light image pairs captured by rotating the camera at different outdoor and indoor locations, highlighting local illumination sources and their effect on the color distribution of objects within an image.

We collected 4 sets of image pairs captured in indoor (office room) and outdoor (parking lot, field, alley) low light environments, with each set containing more than 5 image pairs. To obtain image pairs related by a homography, the second image of each pair is captured by only rotating the camera after the first image is taken. Fig. 2 shows examples of captured image pairs from each of the 4 sets.
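Capturing the second image by pure rotation guarantees that the two views are related by an exact homography, independent of scene depth (a standard result; notation ours):

    x_2 \simeq K R K^{-1} x_1 = H x_1

where K is the camera intrinsic matrix, R the relative rotation between the two shots, and x_1, x_2 corresponding pixels in homogeneous coordinates. This is what allows the homography-based inlier filtering of Section 3.1 to act as ground truth for our image pairs.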

3.3 Low Light Image Enhancement

To determine whether pairing an image enhancement network with a feature extractor improves feature matching, we select several CNN based enhancement algorithms: MBLLEN [25], GLADNet [26], KinD [27], KinD++ [27], and RetinexNet [28]. Our motivation for using CNN based image enhancement originates from its superior performance on different datasets, arising from its ability to non-linearly enhance local regions. In this study, we retrain these algorithms on the LOL dataset [28], which comprises 500 image pairs divided into 485 training and 15 test pairs, and evaluate their performance using the peak signal-to-noise ratio (PSNR) and Structural Similarity (SSIM) [29] metrics; the results are summarized in Table 1.
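The two metrics can be computed per test pair as sketched below (an assumption; the paper does not state which toolbox was used, so we show the scikit-image implementation with hypothetical file names):

    import cv2
    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    enhanced  = cv2.imread("enhanced.png")    # hypothetical file names
    reference = cv2.imread("reference.png")   # ground-truth well-lit image

    psnr = peak_signal_noise_ratio(reference, enhanced, data_range=255)
    ssim = structural_similarity(reference, enhanced, channel_axis=-1, data_range=255)
    print(f"PSNR: {psnr:.4f}  SSIM: {ssim:.4f}")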

Table 1: Evaluation of different low light image enhancement algorithms on LOL dataset
  Algorithm PSNR SSIM
  Linear 12.1706 0.5604
MBLLEN[25] 17.8583 0.7247
GLADNet [26] 19.7182 0.6820
KinD[27] 17.6476 0.7715
KinD++[27] 17.7518 0.7581
RetinexNet[28] 16.7740 0.4250
 

4 Experiments

In this section, we first evaluate the performance of different CNN based enhancement techniques, focusing on noise amplification and image stylization, and subsequently evaluate the performance of the feature extractors by comparing the average numbers of extracted features, matched features, and accepted inliers across our low light image pairs.

The full results of our feature matching experiments are shown in Table 2. For each feature extractor, we show the matching results obtained directly on low light images as well as on LLIE-processed images. We also show results for multiple feature detector threshold parameters. Note that the left-most threshold value for each feature extractor is the default value of the OpenCV implementation.

We observe that at higher threshold values, CNN based enhancement techniques (unlike linear enhancement) improve the number of detected as well as matched features. This is expected, as image enhancement strengthens the edges and corners within the image, as shown in Fig. 3. However, simply lowering the feature acceptance threshold already improves performance substantially on raw images. The improvement is less pronounced on processed images, as most of the features were already strengthened in the first place. Moreover, adding image enhancement prior to feature extraction with lowered thresholds does not uniformly improve performance across feature extractors.

From Table 2, we infer that at lower thresholds, integrating enhancement networks such as KinD and MBLLEN improves the feature matching performance of AKAZE and BRISK, but reduces the performance of SURF and ORB while making a negligible difference for SIFT and GFTT-BRIEF. We attribute this inconsistency to the inconsistent introduction of new features after enhancement, as demonstrated in Fig. 3. While linear enhancement focuses on improving global contrast, different CNN based techniques introduce different categories of artifacts. Specifically, RetinexNet (Fig. 3 (c)) improves illumination but also generates stylized features, thereby destroying the natural features present in the image. Similarly, GLAD disturbs the natural image features by introducing large pixelated noisy features, arising mainly from its inability to extract and represent features from small pixel regions. In contrast, MBLLEN and KinD perform image enhancement without introducing a significant amount of noisy pixels or stylizing the image, which helps improve the underlying feature extraction and matching stages. Due to these side effects of image enhancement and the different ways each feature extractor detects points and computes descriptors, some image enhancement and feature extractor pairs work well while others do not.

Figure 3: Qualitative results from different enhancement networks for a given (a) low light image and the corresponding results from (b) linear enhancement, (c) RetinexNet, (d) KinD, (e) MBLLEN and (f) GLAD.

Based on these observations, for resource-constrained applications such as SLAM in mobile robotics or augmented reality on mobile phones, where computationally demanding deep learning techniques are infeasible, we recommend simply adjusting the threshold parameters to improve feature matching. However, if a further improvement in feature matching is desired, an image enhancement technique can be applied, provided it is paired with the right feature extractor. Of all the feature extractors we experimented with, AKAZE performs best in low light environments, followed by BRISK, with MBLLEN and KinD pairing well with both.

Table 2: Average number of extracted and matched features, as well as accepted inliers based on homography fitting, for various feature detector-descriptors with multiple thresholds on low light images enhanced with several LLIE algorithms. Underlined numbers represent the highest numbers for each feature extractor. Bold numbers are the overall highest numbers.
  Feature detector-descriptor LLIE algorithm # Features detected # Features matched # Inliers accepted
               Contrast Threshold : 0.04 / 0.01 / 0.001
SIFT Raw 61 / 584 / 2461 25 / 73 / 84 21 / 67 / 73
Linear 62 / 585 / 2463 24 / 72 / 83 21 / 66 / 72
MBLLEN 873 / 1863 / 2856 68 / 82 / 91 63 / 70 / 69
GLAD 1137 / 2475 / 2932 60 / 68 / 73 54 / 57 / 57
KinD 672 / 2304 / 2881 68 / 78 / 82 63 / 70 / 70
KinDpp 1619 / 2691 / 3152 77 / 81 / 101 72 / 72 / 74
RetinexNet 1635 / 2588 / 2878 38 / 39 / 43 34 / 34 / 34
               Hessian Threshold : 100 / 10 / 1
SURF Raw 123 / 650 / 1514 42 / 87 / 102 37 / 76 / 85
Linear 123 / 651 / 1513 41 / 87 / 101 36 / 75 / 86
MBLLEN 859 / 1249 / 1427 82 / 92 / 95 71 / 75 / 76
GLAD 1065 / 1721 / 1967 66 / 70 / 72 55 / 57 / 57
KinD 837 / 1393 / 1612 73 / 81 / 83 64 / 69 / 69
KinDpp 1271 / 1702 / 1871 71 / 74 / 75 60 / 62 / 62
RetinexNet 1465 / 2404 / 2744 40 / 45 / 48 31 / 33 / 33
               FAST Threshold : 20 / 2
ORB Raw 175 / 481 38 / 65 35 / 61
Linear 173 / 481 38 / 66 36 / 62
MBLLEN 456 / 487 54 / 55 51 / 52
GLAD 464 / 487 47 / 47 43 / 44
KinD 450 / 487 61 / 62 59 / 59
KinDpp 476 / 487 54 / 54 50 / 50
RetinexNet 469 / 487 27 / 28 25 / 24
               Threshold : 0.001 / 0.0001 / 0.00001
AKAZE Raw 26 / 130 / 714 13 / 52 / 147 11 / 50 / 139
Linear 26 / 130 / 715 13 / 53 / 148 11 / 50 / 141
MBLLEN 352 / 1055 / 1645 84 / 170 / 202 81 / 160 / 188
GLAD 256 / 1297 / 2191 59 / 146 / 171 57 / 137 / 160
KinD 177 / 1016 / 1871 54 / 157 / 194 52 / 150 / 182
KinDpp 457 / 1524 / 2127 80 / 143 / 156 78 / 136 / 146
RetinexNet 285 / 1661 / 2614 37 / 87 / 97 35 / 81 / 90
               Quality Level : 0.01 / 0.001
GFTT-BRIEF Raw 113 / 522 49 / 115 42 / 106
Linear 126 / 598 55 / 124 48 / 116
MBLLEN 574 / 710 111 / 124 102 / 113
GLAD 735 / 737 96 / 96 87 / 87
KinD 736 / 746 123 / 123 112 / 113
KinDpp 730 / 730 109 / 109 101 / 101
RetinexNet 732 / 732 51 / 51 46 / 46
               Threshold : 30 / 10
BRISK Raw 72 / 296 23 / 68 20 / 65
Linear 72 / 296 23 / 68 21 / 65
MBLLEN 727 / 2624 74 / 155 70 / 145
GLAD 1049 / 5049 64 / 123 60 / 115
KinD 397 / 3112 63 / 156 60 / 147
KinDpp 1305 / 4940 86 / 142 83 / 131
RetinexNet 2287 / 7370 43 / 67 40 / 58
 
Figure 4: Feature matching on a low light image pair and on the corresponding MBLLEN enhanced pair, using each feature detector-descriptor (SIFT, SURF, ORB, AKAZE, GFTT-BRIEF, BRISK) at its best performing threshold. Blue points are extracted features, green lines represent inlier matches, and red lines represent outlier matches.

Fig. 4 shows examples of feature matching results of the different feature extractors on raw images and on MBLLEN-processed images. The results shown use the lowest feature detection threshold, hence the large number of features.

5 Acknowledgements

This paper was supported by the Korea Civil-Military Combined Technology Development project (Task No. 19-CMGU-02).

References

  • [1] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardos, “Orb-slam: a versatile and accurate monocular slam system,” IEEE transactions on robotics, vol. 31, no. 5, pp. 1147–1163, 2015.
  • [2] R. Mur-Artal and J. D. Tardós, “Orb-slam2: An open-source slam system for monocular, stereo, and rgb-d cameras,” IEEE Transactions on Robotics, vol. 33, no. 5, pp. 1255–1262, 2017.
  • [3] R. Szeliski, “Image alignment and stitching: A tutorial,” Foundations and Trends® in Computer Graphics and Vision, vol. 2, no. 1, pp. 1–104, 2006.
  • [4] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International journal of computer vision, vol. 60, no. 2, pp. 91–110, 2004.
  • [5] H. Bay, T. Tuytelaars, and L. Van Gool, “Surf: Speeded up robust features,” in European conference on computer vision, pp. 404–417, Springer, 2006.
  • [6] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “Orb: An efficient alternative to sift or surf,” in 2011 International conference on computer vision, pp. 2564–2571, IEEE, 2011.
  • [7] P. F. Alcantarilla, J. Nuevo, and A. Bartoli, “Fast explicit diffusion for accelerated features in nonlinear scale spaces,” in British Machine Vision Conference (BMVC), 2013.
  • [8] S. Leutenegger, M. Chli, and R. Y. Siegwart, “Brisk: Binary robust invariant scalable keypoints,” in 2011 International conference on computer vision, pp. 2548–2555, IEEE, 2011.
  • [9] D. DeTone, T. Malisiewicz, and A. Rabinovich, “Superpoint: Self-supervised interest point detection and description,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 224–236, 2018.
  • [10] P.-E. Sarlin, D. DeTone, T. Malisiewicz, and A. Rabinovich, “Superglue: Learning feature matching with graph neural networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4938–4947, 2020.
  • [11] C. Michaelis, B. Mitzkus, R. Geirhos, E. Rusak, O. Bringmann, A. S. Ecker, M. Bethge, and W. Brendel, “Benchmarking robustness in object detection: Autonomous driving when winter is coming,” arXiv preprint arXiv:1907.07484, 2019.
  • [12] Y. P. Loh and C. S. Chan, “Getting to know low-light images with the exclusively dark dataset,” Computer Vision and Image Understanding, vol. 178, pp. 30–42, 2019.
  • [13] D. Dai and L. Van Gool, “Dark model adaptation: Semantic image segmentation from daytime to nighttime,” in 2018 21st International Conference on Intelligent Transportation Systems (ITSC), pp. 3819–3824, IEEE, 2018.
  • [14] T. Liu, Z. Chen, Y. Yang, Z. Wu, and H. Li, “Lane detection in low-light conditions using an efficient data enhancement: Light conditions style transfer,” arXiv preprint arXiv:2002.01177, 2020.
  • [15] C. G. Harris, M. Stephens, et al., “A combined corner and edge detector.,” in Alvey vision conference, vol. 15, pp. 10–5244, Citeseer, 1988.
  • [16] J. Shi et al., “Good features to track,” in 1994 Proceedings of IEEE conference on computer vision and pattern recognition, pp. 593–600, IEEE, 1994.
  • [17] M. Calonder, V. Lepetit, C. Strecha, and P. Fua, “Brief: Binary robust independent elementary features,” in European conference on computer vision, pp. 778–792, Springer, 2010.
  • [18] E. Rosten and T. Drummond, “Machine learning for high-speed corner detection,” in European conference on computer vision, pp. 430–443, Springer, 2006.
  • [19] P. F. Alcantarilla, A. Bartoli, and A. J. Davison, “Kaze features,” in European Conference on Computer Vision, pp. 214–227, Springer, 2012.
  • [20] S. A. K. Tareen and Z. Saleem, “A comparative analysis of sift, surf, kaze, akaze, orb, and brisk,” in 2018 International conference on computing, mathematics and engineering technologies (iCoMET), pp. 1–10, IEEE, 2018.
  • [21] E. D. Pisano, S. Zong, B. M. Hemminger, M. DeLuca, R. E. Johnston, K. Muller, M. P. Braeuning, and S. M. Pizer, “Contrast limited adaptive histogram equalization image processing to improve the detection of simulated spiculations in dense mammograms,” Journal of Digital imaging, vol. 11, no. 4, p. 193, 1998.
  • [22] C. Lee, C. Lee, and C.-S. Kim, “Contrast enhancement based on layered difference representation of 2d histograms,” IEEE transactions on image processing, vol. 22, no. 12, pp. 5372–5384, 2013.
  • [23] S.-C. Huang, F.-C. Cheng, and Y.-S. Chiu, “Efficient contrast enhancement using adaptive gamma correction with weighting distribution,” IEEE transactions on image processing, vol. 22, no. 3, pp. 1032–1041, 2012.
  • [24] E. H. Land, “The retinex theory of color vision,” Scientific american, vol. 237, no. 6, pp. 108–129, 1977.
  • [25] F. Lv, F. Lu, J. Wu, and C. Lim, “Mbllen: Low-light image/video enhancement using cnns,” in British Machine Vision Conference (BMVC), 2018.
  • [26] W. Wang, C. Wei, W. Yang, and J. Liu, “Gladnet: Low-light enhancement network with global awareness,” in Automatic Face & Gesture Recognition (FG 2018), 2018 13th IEEE International Conference, pp. 751–755, IEEE, 2018.
  • [27] Y. Zhang, J. Zhang, and X. Guo, “Kindling the darkness: A practical low-light image enhancer,” in Proceedings of the 27th ACM International Conference on Multimedia, MM ’19, (New York, NY, USA), pp. 1632–1640, ACM, 2019.
  • [28] C. Wei, W. Wang, W. Yang, and J. Liu, “Deep retinex decomposition for low-light enhancement,” in British Machine Vision Conference (BMVC), 2018.
  • [29] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE transactions on image processing, vol. 13, no. 4, pp. 600–612, 2004.