Quantitative and Qualitative Evaluation of NLM and Wavelet Methods in Image Enhancement
Abstract
This paper presents a comprehensive analysis of image denoising techniques, primarily focusing on Non-local Means (NLM) and Daubechies Soft Wavelet Thresholding, and their efficacy across various datasets. These methods are applied to the CURE-OR, CURE-TSD, CURE-TSR, SIDD, and Set12 datasets, followed by an evaluation using the Image Quality Assessment (IQA) metrics PSNR, SSIM, CW-SSIM, UNIQUE, MS-UNIQUE, CSV, and SUMMER. The results indicate that NLM and Wavelet Thresholding perform best on the Set12 and SIDD datasets, which is attributed to their ability to handle general additive and multiplicative noise masks effectively. However, their performance on the CURE datasets is limited by complex distortions such as Dirty Lens and Codec Error, which these methods are not well suited to address. A comparison of NLM and Wavelet Thresholding shows that while NLM generally offers superior visual quality, Wavelet Thresholding excels in specific IQA metrics, particularly SUMMER, due to its enhancement in the frequency domain as opposed to NLM’s spatial domain approach.
Index Terms:
Image Enhancement, Image Denoising, Non-local Means, NLM, Wavelet Thresholding, Image Quality Assessment.
I Introduction
In any image, noise and distortions may be present and reduce the clarity of the image. One of the biggest drivers in the field of Digital Image Processing is the process of understanding the noise and distortions in an image, denoising and enhancing the image, and then quantifying the results with Image Quality Assessment (IQA) metrics that either mimic the Human Visual System (HVS) or reveal more about the outcome of the denoising process. Denoising is also a crucial step for computer vision systems, which extends the application of image denoising and enhancement to object detection in challenging scenarios.
This paper applies two image denoising methods, Non-local Means (NLM) and Daubechies Soft Wavelet Thresholding, to the CURE-OR, CURE-TSD, CURE-TSR, SIDD, and Set12 datasets. The resulting images are evaluated using the IQA metrics PSNR, SSIM, CW-SSIM, UNIQUE, MS-UNIQUE, CSV, and SUMMER. The paper then analyzes the types of noise and distortion in each dataset, shows which parameters can be adjusted to improve each denoising method, and identifies the types of distortion that each method enhances best.
II Background
II-A Traditional Method for Image Denoising
Classical approaches to understanding the distortions in an image assume that neighboring pixels are highly correlated. An image is therefore denoised by averaging nearby pixels via Gaussian smoothing. More mathematically,
$$\hat{I}(x, y) = \sum_{(i, j) \in K} G_\sigma(i, j)\, I(x - i, y - j), \qquad G_\sigma(i, j) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{i^2 + j^2}{2\sigma^2}\right) \tag{1}$$
where $\hat{I}$ is the denoised image obtained from the noisy image $I$ by iterating over a kernel $K$, and $\sigma$ is the standard deviation of the Gaussian. This is effectively a low-pass filter in the frequency domain that creates a uniform blur across the image, eliminating a large portion of the noise at the cost of losing the fine details of the original image. This paper extends this method with NLM and Wavelet Threshold denoising.
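The smoothing step described above can be sketched in a few lines of standard-library Python. This is only an illustration: the function names, the kernel radius, and the edge-replication border handling are assumptions made for the example, not choices stated in the paper.

```python
import math

def gaussian_kernel_1d(sigma, radius):
    """Normalized 1-D Gaussian kernel with 2 * radius + 1 taps."""
    taps = [math.exp(-(k * k) / (2.0 * sigma * sigma))
            for k in range(-radius, radius + 1)]
    total = sum(taps)
    return [t / total for t in taps]

def smooth_rows(image, kernel):
    """Convolve every row with the kernel, replicating edge pixels."""
    radius = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        new_row = []
        for x in range(n):
            acc = 0.0
            for k, weight in enumerate(kernel):
                # Clamp the index so borders reuse the nearest valid pixel.
                idx = min(max(x + k - radius, 0), n - 1)
                acc += weight * row[idx]
            new_row.append(acc)
        out.append(new_row)
    return out

def transpose(image):
    return [list(col) for col in zip(*image)]

def gaussian_smooth(image, sigma=1.0, radius=2):
    """Separable 2-D Gaussian smoothing: filter rows, then columns."""
    kernel = gaussian_kernel_1d(sigma, radius)
    rows_done = smooth_rows(image, kernel)
    return transpose(smooth_rows(transpose(rows_done), kernel))

# A single bright pixel is spread over its neighbors, which is the
# loss of fine detail that the text describes.
impulse = [[0.0] * 7 for _ in range(7)]
impulse[3][3] = 1.0
blurred = gaussian_smooth(impulse)
```

Because the 2-D Gaussian is separable, filtering rows and then columns gives the same result as a full 2-D kernel at lower cost.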
II-B IQA Metrics
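Two popular objective image quality metrics are PSNR and SSIM. PSNR between two images $x$ and $y$ with $N$ pixels each can be calculated as follows:
$$\mathrm{PSNR}(x, y) = 10 \log_{10} \frac{L^2}{\mathrm{MSE}(x, y)}, \qquad \mathrm{MSE}(x, y) = \frac{1}{N} \sum_{k=1}^{N} (x_k - y_k)^2 \tag{2}$$
The SSIM is given by:
$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \tag{3}$$
where $\mu_x$ and $\sigma_x$ are the mean and standard deviation of image $x$, and $\sigma_{xy}$ is the covariance of $x$ and $y$. $C_1 = (K_1 L)^2$ and $C_2 = (K_2 L)^2$, with $K_1 = 0.01$, $K_2 = 0.03$, and the dynamic range $L$ being 255 [1].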
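As a rough illustration of the two formulas above, the sketch below computes PSNR and a single-window SSIM over small images stored as nested lists. Treating the whole image as one window is a simplification made for this example (SSIM is normally computed over local windows and averaged), and the function names and the defaults of 0.01 and 0.03 for the stabilizing constants are assumptions.

```python
import math

def _flatten(image):
    return [float(p) for row in image for p in row]

def psnr(x, y, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    a, b = _flatten(x), _flatten(y)
    mse = sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

def global_ssim(x, y, max_value=255.0, k1=0.01, k2=0.03):
    """SSIM evaluated once over the whole image (single window)."""
    a, b = _flatten(x), _flatten(y)
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((p - mu_a) ** 2 for p in a) / n
    var_b = sum((q - mu_b) ** 2 for q in b) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / n
    c1 = (k1 * max_value) ** 2
    c2 = (k2 * max_value) ** 2
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator

reference = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
shifted = [[p + 1 for p in row] for row in reference]  # every pixel off by 1
score_db = psnr(reference, shifted)
similarity = global_ssim(reference, shifted)
```

A uniform offset of one gray level gives an MSE of 1, so the PSNR depends only on the dynamic range, while SSIM stays close to 1 because the structure is unchanged.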
CW-SSIM (Complex Wavelet Structural Similarity Index) modifies SSIM for application in the complex wavelet domain, providing robustness to small geometric distortions.
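$$\tilde{S}(c_x, c_y) = \frac{2\left|\sum_{i=1}^{N} c_{x,i}\, c_{y,i}^{*}\right| + K}{\sum_{i=1}^{N} |c_{x,i}|^2 + \sum_{i=1}^{N} |c_{y,i}|^2 + K} \tag{4}$$
where $c_x$ and $c_y$ are the complex wavelet coefficients extracted at the same location from the two images, $^{*}$ denotes the complex conjugate, and $K$ is a small positive constant that improves the robustness of the CW-SSIM measure when the local signal-to-noise ratios are small [2].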
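The CW-SSIM index can be illustrated on two short lists of complex coefficients. The sketch below assumes the coefficients have already been extracted by some complex wavelet transform, which is not shown; the function name and the value of the stabilizing constant are arbitrary choices for the example.

```python
import cmath

def cw_ssim_index(cx, cy, k=0.01):
    """CW-SSIM index for two equally long lists of complex coefficients."""
    cross = sum(a * b.conjugate() for a, b in zip(cx, cy))
    energy = sum(abs(a) ** 2 for a in cx) + sum(abs(b) ** 2 for b in cy)
    return (2.0 * abs(cross) + k) / (energy + k)

coeffs = [complex(1, 2), complex(-0.5, 0.3), complex(2, -1)]

# The same coefficients with one common phase rotation applied to all of them.
rotated = [c * cmath.exp(1j * 0.3) for c in coeffs]

same = cw_ssim_index(coeffs, coeffs)
phase_shifted = cw_ssim_index(coeffs, rotated)
unrelated = cw_ssim_index([complex(1, 0), complex(0, 1)],
                          [complex(1, 0), complex(0, -1)])
```

A phase rotation shared by all coefficients leaves the index at 1, which is the mechanism behind the robustness to small geometric distortions, while coefficients whose relative phases disagree score close to 0.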
The CSV (Color, Structure, and Visual system) metric is a full-reference estimator that quantifies perceptual color degradations and structural and perceptual differences. It uses a color difference based on CIEDE2000, together with Color Name Distance, Structural Difference, and Retinal Ganglion Cell-based Difference, to closely match the fidelity perceived by the HVS [3].
UNIQUE (Unsupervised Image Quality Estimation) uses sparse representations obtained through unsupervised learning. Its quality estimation is based on the Spearman rank order correlation coefficient of sparse representations [4].
MS-UNIQUE extends UNIQUE using multiple linear decoders. Its quality estimation involves a weighted combination of filter responses [5].
SUMMER (SUM of Modified Error Ratios) assesses image quality by emphasizing perceptual features in the Fourier domain. It uses location information obtained by analyzing the magnitude spectra of each color channel and applies frequency-based weights to align the quality scores [6].
III Method
III-A Non-Local Means (NLM)
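The idea of NLM is to compute each pixel of the denoised image $\hat{I}$ as a weighted average of all pixels in the noisy image $I$:
$$\hat{I}(i) = \sum_{j \in I} w(i, j)\, I(j) \tag{5}$$
where $w(i, j)$ is the weight that measures the similarity between the neighborhoods around the $i$th pixel and the $j$th pixel:
$$w(i, j) = \frac{1}{Z(i)} \exp\left(-\frac{\|I(N_i) - I(N_j)\|_{2,a}^2}{h^2}\right) \tag{6}$$
where $Z(i)$ is the normalizing constant, defined as:
$$Z(i) = \sum_{j} \exp\left(-\frac{\|I(N_i) - I(N_j)\|_{2,a}^2}{h^2}\right) \tag{7}$$
Here $N_i$ is the neighborhood around the $i$th pixel in the image, $I(N_i)$ denotes the pixels in the neighborhood, $\|\cdot\|_{2,a}$ is the Euclidean norm weighted by a Gaussian kernel with standard deviation $a$, and $h$ controls the decay of the weight [1].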
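To make the weighting in equations (5) to (7) concrete, the following sketch applies NLM to a one-dimensional signal using only the standard library. It is a simplification under stated assumptions: patches are compared with a plain (unweighted) squared Euclidean distance instead of a Gaussian-weighted one, every sample in the signal is searched, and the function names and the decay parameter name `h` are illustrative.

```python
import math

def patch(signal, center, radius):
    """Neighborhood around `center`, replicating samples at the borders."""
    n = len(signal)
    return [signal[min(max(center + k, 0), n - 1)]
            for k in range(-radius, radius + 1)]

def nlm_weights(signal, i, h, radius=1):
    """Normalized similarity weights between sample i and every sample j."""
    ref = patch(signal, i, radius)
    raw = []
    for j in range(len(signal)):
        cand = patch(signal, j, radius)
        dist2 = sum((p - q) ** 2 for p, q in zip(ref, cand))
        raw.append(math.exp(-dist2 / (h * h)))
    z = sum(raw)  # normalizing constant, so the weights sum to 1
    return [r / z for r in raw]

def nlm_denoise_1d(signal, h, radius=1):
    """Each output sample is a weighted average of all input samples."""
    out = []
    for i in range(len(signal)):
        weights = nlm_weights(signal, i, h, radius)
        out.append(sum(w * v for w, v in zip(weights, signal)))
    return out

# Two noisy plateaus: samples are averaged with others on the same plateau,
# while the jump between the plateaus is preserved.
noisy = [0.0, 0.1, -0.1, 0.0, 10.0, 10.1, 9.9, 10.0]
denoised = nlm_denoise_1d(noisy, h=1.0)
```

A larger `h` flattens the weights and averages more aggressively; a smaller `h` keeps each sample closer to its original value.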
The denoise_nl_means() function in the skimage module of Python was used. The parameter $h$ was chosen to be $0.8\sigma$, where $\sigma$ is the estimated standard deviation of the Gaussian noise in the image. Multiplying $\sigma$ by the constant 0.8 reduces the blurring of fine details in the denoised image. The later analysis shows how changing this constant can improve the denoising of different kinds of distortions.
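A minimal sketch of how such a call might look with scikit-image is given below. The synthetic test image, the use of estimate_sigma() to obtain the noise estimate, and the decision to pass that estimate as the `sigma` argument are assumptions made for this example; the paper does not state its exact call or patch settings, so library defaults are used for everything else.

```python
import numpy as np
from skimage.restoration import denoise_nl_means, estimate_sigma

def nlm_denoise(noisy, h_factor=0.8):
    """NLM on a grayscale float image with h = h_factor * estimated sigma."""
    sigma_est = float(estimate_sigma(noisy))
    # Assumption: the noise estimate is also passed as `sigma`.
    return denoise_nl_means(noisy, h=h_factor * sigma_est,
                            sigma=sigma_est, fast_mode=True)

# Synthetic example (not from the paper): a gray square with Gaussian noise.
rng = np.random.default_rng(0)
clean = np.full((64, 64), 0.25)
clean[16:48, 16:48] = 0.75
noisy = np.clip(clean + rng.normal(0.0, 0.05, clean.shape), 0.0, 1.0)

denoised = nlm_denoise(noisy)
mse_before = float(np.mean((noisy - clean) ** 2))
mse_after = float(np.mean((denoised - clean) ** 2))
```

Exposing the constant as `h_factor` makes it easy to repeat the run with other values when studying the trade-off between noise removal and detail preservation.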
III-B Wavelet Thresholding
Wavelet Thresholding is a method for image denoising that exploits the multi-resolution representation of wavelet transforms. The denoise_wavelet() function from the skimage module in Python implements this technique. The approach involves decomposing an image into a set of wavelet coefficients, applying thresholding to these coefficients, and then reconstructing the image from the modified coefficients. The process can be described as follows:
Given a noisy image $I$, the wavelet transform decomposes it into a set of coefficients $w$, which includes both approximation coefficients (low-frequency components) and detail coefficients (high-frequency components). Soft thresholding is then applied to modify these coefficients according to the function:
$$\hat{w} = \operatorname{sign}(w)\, \max(|w| - T,\, 0) \tag{8}$$
where $\hat{w}$ are the thresholded wavelet coefficients, $w$ are the original wavelet coefficients, and $T$ is the threshold value, which is chosen based on the estimated noise standard deviation $\sigma$. This soft thresholding process attenuates the coefficients by reducing their magnitude, thus reducing noise while preserving significant image features [7]. The denoised image is then reconstructed from the thresholded coefficients by applying the inverse transform.
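The decompose, threshold, and reconstruct steps described above can be sketched on a one-dimensional signal. To stay self-contained, this example swaps in a one-level Haar transform in place of the Daubechies wavelet used in the paper, and the function names and threshold values are illustrative assumptions.

```python
import math

SQRT2 = math.sqrt(2.0)

def soft_threshold(w, t):
    """Shrink a coefficient toward zero by t; small coefficients become 0."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def haar_forward(signal):
    """One-level Haar transform of an even-length signal."""
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in pairs]
    return approx, detail

def haar_inverse(approx, detail):
    """Exact inverse of haar_forward."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out

def haar_soft_denoise(signal, t):
    """Threshold only the detail (high-frequency) coefficients."""
    approx, detail = haar_forward(signal)
    detail = [soft_threshold(d, t) for d in detail]
    return haar_inverse(approx, detail)

shrunk = [soft_threshold(w, 1.0) for w in [4.0, -2.5, 0.3]]
smoothed = haar_soft_denoise([1.0, 3.0, 5.0, 5.0], t=10.0)
```

With a threshold of 1, the coefficients 4.0, -2.5, and 0.3 become 3.0, -1.5, and 0.0. With a threshold large enough to remove every detail coefficient, each pair of samples collapses to its mean, which shows how over-thresholding removes fine structure along with noise.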
The Daubechies ’db2’ wavelet was used for its good balance between performance and computational efficiency. The denoise_wavelet() function sets the threshold $T$ automatically, and the noise standard deviation $\sigma$ is estimated in the same way as for NLM. The choice of soft thresholding gives a more natural denoising effect by preserving the continuity of the wavelet coefficients, which helps maintain the sharpness and textures of the original image.
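A call along the following lines would apply this configuration with scikit-image. It is a sketch under assumptions: the synthetic image is invented for the example, only the wavelet and thresholding mode named in the text are set explicitly, and every other argument is left at the library default, which may differ from what the authors used. The function also requires the PyWavelets package to be installed.

```python
import numpy as np
from skimage.restoration import denoise_wavelet

# Synthetic example (not from the paper): a gray square with Gaussian noise.
rng = np.random.default_rng(0)
clean = np.full((64, 64), 0.25)
clean[16:48, 16:48] = 0.75
noisy = np.clip(clean + rng.normal(0.0, 0.05, clean.shape), 0.0, 1.0)

# With sigma left unset, the noise level is estimated from the image and
# the threshold is chosen by the library.
denoised = denoise_wavelet(noisy, wavelet="db2", mode="soft")

mse_before = float(np.mean((noisy - clean) ** 2))
mse_after = float(np.mean((denoised - clean) ** 2))
```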
IV Results
Utilizing NLM and Wavelet Thresholding, images from the Set12, SIDD, CURE-OR, CURE-TSD, and CURE-TSR datasets were denoised [8-10]. The grayscale Set12 image set was given three levels of noise of varying difficulty. After each dataset was denoised, the IQA metrics were computed and averaged across all of the denoised images in that dataset.
Average IQA metrics after NLM denoising
IQA Metrics | Set12 | SIDD | CURE-OR | CURE-TSD | CURE-TSR |
PSNR (dB) | 32.190 | 34.509 | 25.257 | 23.956 | 23.644 |
SSIM | 0.7620 | 0.7795 | 0.4782 | 0.4473 | 0.4357 |
CW-SSIM | 0.9314 | 0.7492 | 0.5452 | 0.5395 | 0.5283 |
UNIQUE | 0.6464 | 0.8763 | 0.1496 | 0.1438 | 0.1395 |
MS-UNIQUE | 0.7602 | 0.9128 | 0.4124 | 0.3821 | 0.3913 |
SUMMER | 3.6747 | 3.4506 | 0.7162 | 0.6040 | 0.6150 |
CSV | 0.9803 | 0.9756 | 0.9547 | 0.8490 | 0.7931 |
Average IQA metrics after Wavelet Thresholding denoising
IQA Metrics | Set12 | SIDD | CURE-OR | CURE-TSD | CURE-TSR |
PSNR (dB) | 31.452 | 33.774 | 25.884 | 23.752 | 23.363 |
SSIM | 0.6901 | 0.7436 | 0.3791 | 0.3642 | 0.3588 |
CW-SSIM | 0.9261 | 0.7157 | 0.6232 | 0.5981 | 0.5877 |
UNIQUE | 0.5572 | 0.8690 | 0.2039 | 0.1895 | 0.1911 |
MS-UNIQUE | 0.7091 | 0.9148 | 0.2712 | 0.2680 | 0.2708 |
SUMMER | 3.8866 | 3.5701 | 2.2341 | 1.9863 | 2.0412 |
CSV | 0.9802 | 0.9769 | 0.9428 | 0.8691 | 0.8703 |
As shown in the tables above, NLM and Wavelet Thresholding performed significantly better across all IQA metrics on the Set12 and SIDD image sets than on the CURE image sets.
Another interesting result is that the average SUMMER scores are higher for Wavelet Thresholding than for NLM. This is likely because SUMMER is an IQA metric based on analyzing the spectrum of the image, so Wavelet Thresholding, a frequency-domain enhancement method, scores better than NLM, a spatial-domain enhancement method.
V Analysis
Figure 1 shows an image from the Set12 dataset with three levels of noise, each enhanced via NLM. NLM does a good job of removing distortions from the image, even under heavy noise. At the higher noise levels, there is a "smudging" effect on the image. In terms of IQA metrics, Figure 3 shows the PSNR and SSIM at the different noise levels. While the drop in PSNR is modest, the drop in SSIM is sharper, which more closely reflects the loss of fidelity from denoising higher levels of noise.
One way to reduce the smudging effect from NLM is to lower the value of $h$, as shown in Figure 2. A lower value preserves more of the fine details of the image. The cameraman image shows sharper edges along the coat and tripod, as well as finer details on the face and the camera. While lowering $h$ does improve image quality in this case, it does not guarantee improved performance for all images.
Additionally, while this adjustment improves results for general additive and multiplicative noise masks, it does not affect the more complex distortions in the CURE datasets.
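The effect of the NLM decay parameter discussed above could be explored with a small sweep such as the one sketched below. The helper name sweep_h, the list of factors, and the synthetic image are hypothetical choices for illustration, and no particular factor is implied to be best; as the text notes, the outcome depends on the image.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio
from skimage.restoration import denoise_nl_means, estimate_sigma

def sweep_h(clean, noisy, factors=(0.4, 0.6, 0.8, 1.0)):
    """Return {factor: PSNR in dB} for NLM with h = factor * estimated sigma."""
    sigma_est = float(estimate_sigma(noisy))
    scores = {}
    for factor in factors:
        denoised = denoise_nl_means(noisy, h=factor * sigma_est,
                                    sigma=sigma_est, fast_mode=True)
        scores[factor] = float(
            peak_signal_noise_ratio(clean, denoised, data_range=1.0))
    return scores

# Synthetic example (not from the paper): a gray square with Gaussian noise.
rng = np.random.default_rng(0)
clean = np.full((64, 64), 0.25)
clean[16:48, 16:48] = 0.75
noisy = np.clip(clean + rng.normal(0.0, 0.05, clean.shape), 0.0, 1.0)

scores = sweep_h(clean, noisy)
```

Comparing the scores alongside a visual check of edges and textures gives a fuller picture than PSNR alone, since the analysis above found that SSIM tracks the loss of fidelity more closely.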
In the SIDD dataset, the brighter images scored better overall on the IQA metrics after denoising than the darker images. This is likely because the noise in SIDD comes from smartphone cameras, which are limited by their small aperture and sensor size. When more light is available, these limitations are alleviated, so less noise is present in the image.
In the CURE datasets, NLM and Wavelet Thresholding caused more of a blurring effect in images with complex distortions such as Dirty Lens. While this slightly alleviated some of the original distortions, the perceptual quality of the images was diminished by the added blur.
Comparing NLM to Wavelet Thresholding directly, the IQA metrics show that NLM for the most part surpasses Wavelet Thresholding in image quality (the exception being SUMMER). This also matches how the images compare visually. However, NLM was computationally more intensive than Wavelet Thresholding because Wavelet Thresholding uses faster algorithms such as the Fast Wavelet Transform (FWT).
VI Conclusion
This paper evaluated the effectiveness of Non-local Means (NLM) and Daubechies Soft Wavelet Thresholding in denoising images across the datasets Set12, SIDD, and the CURE series. The results showed that while both methods excel in reducing general noise in Set12 and SIDD datasets, they struggle with complex distortions like Dirty Lens and Codec Error in the CURE datasets. This highlights a limitation in their adaptability to diverse distortion types.
Comparatively, NLM generally provides better visual quality, but Wavelet Thresholding has a distinct advantage in specific IQA metrics, particularly SUMMER, due to its frequency domain approach. This finding emphasizes the importance of method selection based on noise characteristics and the desired image quality.
The paper also explored parameter tuning, showing that adjustments in NLM can enhance detail preservation but do not universally guarantee improved performance. Additionally, lighting conditions significantly impact noise characteristics, especially in images from smartphone cameras.
Future work could focus on further parameter tuning to enhance the complex distortions found in the CURE datasets more effectively. Another potential improvement is in the quality assessment. An object recognition algorithm could be run on the denoised images of the CURE-TSD and CURE-TSR datasets to measure the improvement in traffic sign detection, which has applications in autonomous vehicles.
References
- [1] M. Zhou, “Comparison of CNN based and self-similarity based denoising methods,” 2020. [Online]. Available: https://stanford.edu/class/ee367/Winter2020/report/zhou_m_report.pdf (accessed Nov. 17, 2023).
- [2] M. Sampat et al., “Complex Wavelet Structural similarity: A new image Similarity index,” IEEE Transactions on Image Processing, vol. 18, no. 11, pp. 2385–2401, Nov. 2009. DOI: https://doi.org/10.1109/tip.2009.2025923.
- [3] D. Temel and G. AlRegib, “CSV: Image quality assessment based on color, structure, and visual system,” Signal Processing: Image Communication, vol. 48, pp. 92–103, 2016. DOI: https://doi.org/10.1016/j.image.2016.08.008.
- [4] D. Temel, M. Prabhushankar, and G. AlRegib, “UNIQUE: Unsupervised Image Quality Estimation,” IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1414–1418, 2016. DOI: https://doi.org/10.1109/LSP.2016.2601119.
- [5] M. Prabhushankar, D. Temel, and G. AlRegib, “MS-UNIQUE: Multi-model and Sharpness-weighted Unsupervised Image Quality Estimation,” Electronic Imaging, vol. 29, no. 12, pp. 30–35, Jan. 2017. DOI: http://dx.doi.org/10.2352/ISSN.2470-1173.2017.12.IQSP-223.
- [6] D. Temel and G. AlRegib, “Perceptual image quality assessment through spectral analysis of error representations,” Signal Processing: Image Communication, vol. 70, pp. 37–46, 2019. DOI: https://doi.org/10.1016/j.image.2018.09.005.
- [7] J.-Y. Lu, H. Lin, D. Ye, and Y.-S. Zhang, “A New Wavelet Threshold Function and Denoising Application,” Mathematical Problems in Engineering, vol. 2016, article 3195492, 2016. DOI: https://doi.org/10.1155/2016/3195492.
- [8] D. Temel, M. Chen, and G. AlRegib, “Traffic Sign Detection Under Challenging Conditions: A Deeper Look into Performance Variations and Spectral Characteristics,” IEEE Transactions on Intelligent Transportation Systems, pp. 1–11, 2019. DOI: https://doi.org/10.1109/TITS.2019.2931429. [Online]. Available: https://arxiv.org/abs/1908.11262.
- [9] D. Temel, T. Alshawi, M.-H. Chen, and G. AlRegib, “Challenging Environments for Traffic Sign Detection: Reliability Assessment under Inclement Conditions,” in arXiv:1902.06857, 2015.
- [10] D. Temel and G. AlRegib, “Traffic Signs in the Wild: Highlights from the IEEE Video and Image Processing Cup 2017 Student Competition [SP Competitions],” IEEE Signal Processing Magazine, vol. 35, no. 2, pp. 154–161, 2018. DOI: https://doi.org/10.1109/MSP.2017.2783449. [Online]. Available: https://arxiv.org/abs/1810.06169.