
Empowering Low-Light Image Enhancer through Customized Learnable Priors

Naishan Zheng 1*, Man Zhou 1*, Yanmeng Dong, Xiangyu Rui 2, Jie Huang 1, Chongyi Li 3, Feng Zhao 1†
1University of Science and Technology of China, 2Xi’an Jiaotong University, 3Nankai University
*Both authors contributed equally to this research. †Corresponding author.
{nszheng,manman,hj0117}@mail.ustc.edu.cn,
{dongym.aca,xyrui.aca,lichongyi25}@gmail.com, fzhao956@ustc.edu.cn
Abstract

Deep neural networks have achieved remarkable progress in enhancing low-light images by improving their brightness and eliminating noise. However, most existing methods construct end-to-end mapping networks heuristically, neglecting the intrinsic priors of the image enhancement task and lacking transparency and interpretability. Although some unfolding solutions have been proposed to relieve these issues, they rely on proximal operator networks that deliver ambiguous and implicit priors. In this work, we propose a paradigm for low-light image enhancement that explores the potential of customized learnable priors to improve the transparency of the deep unfolding paradigm. Motivated by the powerful feature representation capability of the Masked Autoencoder (MAE), we customize MAE-based illumination and noise priors and redevelop them from two perspectives: 1) structure flow: we train the MAE from a normal-light image to its illumination properties and then embed it into the proximal operator design of the unfolding architecture; and 2) optimization flow: we train the MAE from a normal-light image to its gradient representation and then employ it as a regularization term to constrain noise in the model output. These designs improve the interpretability and representation capability of the model. Extensive experiments on multiple low-light image enhancement datasets demonstrate the superiority of our proposed paradigm over state-of-the-art methods. Code is available at https://github.com/zheng980629/CUE.

Figure 1: Comparison between previous deep unfolding low-light enhancement methods and our proposed paradigm. (a) Previous works deliver ambiguous and implicit priors through heuristically designed proximal networks in a “black box” manner; and (b) our CUE explores the potential of customized learnable priors for low-light image enhancement to improve the transparency of the deep unfolding paradigm.

1 Introduction

Low-light conditions often result in images with limited visibility and noise, which can negatively affect downstream computer vision tasks. To recover details buried in low-light images and remove noise effects, the field of low-light image enhancement has received significant attention. Existing approaches can be classified into two categories: traditional methods and deep learning-based methods.

Traditional methods for low-light image enhancement involve formulating the problem as an optimization task and using image priors as regularization terms to constrain the solution space. One representative line of work is based on the Retinex theory [23], which assumes that an image can be decomposed into reflectance and illumination components, where the former remains consistent under any lighting condition and the latter reflects variations in brightness. Nonetheless, estimating the illumination and reflectance terms simultaneously is challenging. To overcome this issue, well-designed priors on the illumination and reflectance terms have been proposed on top of the Retinex theory. For example, an $\ell_2$ norm on illumination gradients is proposed by [11, 10] to ensure smoothness, but it generates blurred borders around areas where the illumination changes suddenly. [28] further introduces a noise term and develops an $\ell_1$ prior to constrain illumination gradients, maintaining the overall structure of the illumination map. However, hand-crafted priors are difficult to design and have limited representative ability in complex scenes, hindering their practical usage.

Inspired by the powerful learning capability of deep learning, a surge of learning-driven methods has ushered in a new era for low-light image enhancement [17, 16, 19, 58, 18, 15, 64, 65, 31, 39, 33, 38, 42, 22, 54, 37, 41, 30]. For example, RetinexNet [51] and KinD [61] are two popular methods that use deep learning to enhance low-light images. RetinexNet integrates Retinex decomposition and illumination adjustment into an end-to-end trainable network, while KinD separately trains layer decomposition, illumination adjustment, and reflectance restoration subnetworks. Despite their success, these methods often construct black-box networks without considering the intrinsic priors of the illumination and reflectance components, leading to a lack of transparency and interpretability.

To improve the interpretability, a model-driven deep unfolding paradigm, URetinexNet [53], has been proposed. It formulates the Retinex decomposition as an implicit prior regularized model but neglects the effects of noise pollution. Moreover, this method delivers the illumination and reflectance priors in a vague and ambiguous manner through empirically constructed proximal operator networks. Thus, the inherent properties of these two components are not considered, leading to an ambiguous and implicit prior principle (see Fig. 1). We therefore wonder, “Can we customize learnable priors for illumination and noise terms that leverage their intrinsic properties?”.

To answer this question, we first analyze the characteristics of the illumination and noise components: 1) illumination prior: Based on the Retinex theory, the illumination component should be smooth and preserve the structure while adapting to different lighting conditions; 2) noise prior: In low-light images, noise is inherent and cannot be removed simply by adjusting brightness, i.e., irrelevant to enhanced lightness.

Motivated by the above analysis, we aim to explore the potential of customized learnable priors for low-light image enhancement to improve the transparency of the deep unfolding paradigm. Our proposed method is called Customized Unfolding Enhancer (CUE). To achieve this, we utilize the innate feature representation capability of the Masked Autoencoder (MAE) to customize MAE-based illumination and noise priors with a masked image modeling strategy. Specifically, the illumination prior is trained from a normal-light image to its illumination map filtered by a bilateral filter, which reduces noise without altering the intrinsic structure [5]. The noise prior aims to learn the histograms of oriented gradients of a normal-light image, which capture the gradient variation while being irrelevant to enhanced lightness [7]. To integrate the customized priors into the low-light image enhancement process, we redevelop the two learned priors from two perspectives: 1) structure flow: embedding the learned illumination prior into the design of the proximal operator in the Retinex decomposition unfolding process; and 2) optimization flow: redeveloping the learned noise prior as a regularization term to eliminate noise by minimizing the gradient representation difference between the enhanced and normal-light images. Extensive experimental results demonstrate the superiority of our paradigm over state-of-the-art methods. Additionally, we verify the effectiveness of the proposed learnable noise prior for image denoising.

Our contributions are summarized as follows:

  • We activate the potential of customized learnable illumination and noise priors via a new deep unfolding paradigm for low-light image enhancement.

  • From the structure flow, we embed the MAE-based customized illumination prior into the unfolding architecture to improve the transparency and interpretability of the unfolding structure.

  • From the optimization flow, we redevelop the MAE-based customized noise prior as a regularization term to constrain the gradient representation consistency.

  • Our experiments on multiple low-light image benchmarks show that the proposed paradigm outperforms state-of-the-art methods, and our customized noise prior is effective for image denoising.

2 Related Work

2.1 Low-Light Image Enhancement

There are three groups of traditional image enhancement techniques: Histogram Equalization (HE), Gamma Correction (GC), and Retinex theory [23]. HE-based techniques stretch the dynamic range of low-light images with various complicated priors [20, 2, 4, 56]. GC-based methods [45, 48] expand the dynamic range of an image by applying an exponential function to each pixel individually. In addition, Retinex-based variants [11, 28, 47] have developed priors to constrain the solution space of the illumination and reflectance maps. However, the inflexible hand-crafted priors may not generalize well in complex scenes.

Figure 2: The end-to-end training paradigm of the proposed Customized Unfolding Enhancer (CUE) includes an unfolding Retinex decomposition step embedded with a Masked Autoencoder-based customized illumination prior (see Fig. 3) and an enhancement step. First, a low-light image $\mathbf{I}_l$ is decomposed into illumination, reflectance, and noise terms, $[\mathbf{L}_l, \mathbf{R}_l, \mathbf{N}_l]$, by the unfolding Retinex decomposition step. Then, the illumination adjustment and reflectance restoration networks are employed to enhance the illumination and restore the reflectance components. Finally, the MAE-based noise prior (see Fig. 6) is redeveloped as a regularization term to further eliminate noise in the enhanced image by constraining the gradient representation.

In the past decade, deep learning-based low-light image enhancement approaches [34, 51, 61, 29, 55, 35, 9, 27, 50, 62, 25, 68, 24, 67, 26, 66], have achieved remarkable performance gains. Wei et al. [51] first attempted to integrate Retinex decomposition and illumination adjustment into an end-to-end network. Zhang et al. [61] employed three subnetworks for decomposition, reflection restoration, and illumination adjustment. Wang et al. [46] proposed to enhance the underexposed images by estimating an illumination map. Guo et al. [12] adjusted the dynamic range of the low-light images by predicting pixel-wise mapping without paired or unpaired data. However, these learning-based methods empirically build network architectures in a black-box fashion, neglecting the inherent prior and lacking adequate interpretability. To address this limitation, Liu et al. [32] developed a Retinex-inspired unrolling strategy to discover the architectures embedded with atomic priors automatically. Wu et al. [53] proposed to unroll the Retinex decomposition and formulated it as an implicit prior regularized model. Zheng et al. [63] unfolded the total variant minimization algorithms to provide fidelity and smoothness constraints via a learnable noise level map. While these deep unfolding solutions improve the interpretability, the priors are still vaguely delivered by convolutional neural networks constructed empirically in a black-box manner, resulting in an ambiguous and implicit prior principle.

2.2 Masked Image Modeling

Inspired by the success of BERT [8] in NLP, BEiT [3] proposes a masked image modeling technique that predicts the visual token of each masked patch as a pretext task. MAE [14] presents an asymmetric encoder-decoder paradigm that focuses on the visible patches: the encoder operates only on these patches, while the decoder reconstructs the original image from the latent representation and mask tokens. To improve representation capability, CAE [6] decouples the encoding and pretext-task completion roles through an alignment constraint and a latent contextual regressor. MVP [52] aims to create robust and multi-modal representations by predicting the guidance from other modalities. However, the primary purpose of these methods is to serve as a pre-training technique that enhances performance on downstream tasks.

3 Methodology

In this section, we introduce the proposed Customized Unfolding Enhancer (CUE) paradigm, which consists of a structure flow and an optimization flow, as illustrated in Fig. 2. In the structure flow, we first describe an unfolding Retinex decomposition step embedded with an MAE-based customized illumination prior, followed by an enhancement step. We then describe the optimization flow, which includes the Retinex decomposition and enhancement losses as well as a gradient representation regularization derived from an MAE-based customized noise prior.

3.1 Structure Flow

3.1.1 Retinex Decomposition

Classical Retinex theory assumes that an observed image can be decomposed into illumination and reflectance components. Due to the inevitable noise in low-light images, [28] introduces a noise term:

$\mathbf{I} = \mathbf{R} \circ \mathbf{L} + \mathbf{N}$,  (1)

where $\mathbf{I}$, $\mathbf{R}$, $\mathbf{L}$, and $\mathbf{N}$ denote the observed image, reflectance, illumination, and noise, respectively, and the operator $\circ$ represents element-wise multiplication. Referring to the above observation model, the reflectance, illumination, and noise terms can be simultaneously obtained by solving the minimization problem:

$\arg\min_{\mathbf{R},\mathbf{L},\mathbf{N}} \|\mathbf{N}\|_1 + \gamma\rho_1(\mathbf{N}) + \beta\rho_2(\mathbf{L}) + \omega\rho_3(\mathbf{R})$,  (2a)
$\text{s.t. } \mathbf{I} = \mathbf{R} \circ \mathbf{L} + \mathbf{N}$,  (2b)

where $\|\cdot\|_1$ denotes the $\ell_1$ norm; $\rho_1$, $\rho_2$, and $\rho_3$ are regularization terms denoting the priors imposed on $\mathbf{N}$, $\mathbf{L}$, and $\mathbf{R}$; and $\gamma$, $\beta$, and $\omega$ are trade-off parameters. $\|\mathbf{N}\|_1$ is a general prior that simply constrains the noise's sparsity, which cannot accurately model the noise distribution in the low-light image. Therefore, we incorporate an implicit noise prior, $\rho_1(\mathbf{N})$, to further estimate the extreme noise.

By introducing a penalty function to remove the equality constraint, Eq. (2) can be rewritten as:

$\mathcal{L}(\mathbf{N},\mathbf{L},\mathbf{R}) = \|\mathbf{N}\|_1 + \gamma\rho_1(\mathbf{N}) + \beta\rho_2(\mathbf{L}) + \omega\rho_3(\mathbf{R}) + \frac{\mu}{2}\|\mathbf{I} - \mathbf{R} \circ \mathbf{L} - \mathbf{N}\|_F^2$,  (3)

where $\|\cdot\|_F$ represents the Frobenius norm and $\mu$ is a penalty parameter. The equivalent objective function can be solved by iteratively updating $\mathbf{L}$, $\mathbf{R}$, and $\mathbf{N}$ while treating the other variables, estimated in previous iterations, as constants. Here we present the sub-problem solutions for the $(k+1)$-th iteration.

L sub-problem: Given the reflectance and noise estimated at iteration $k$, $\mathbf{R}_k$ and $\mathbf{N}_k$, $\mathbf{L}$ can be updated as:

$\mathbf{L}_{k+1} = \arg\min_{\mathbf{L}} \mathcal{L}(\mathbf{N}_k, \mathbf{L}, \mathbf{R}_k) = \arg\min_{\mathbf{L}} \beta\rho_2(\mathbf{L}) + \underbrace{\frac{\mu_k}{2}\|\mathbf{I} - \mathbf{R}_k \circ \mathbf{L} - \mathbf{N}_k\|_F^2}_{g(\mathbf{L})}$.  (4)

By applying the proximal gradient method to Eq. (4), we can derive:

$\mathbf{L}_{k+1} = \operatorname{prox}_{\beta\rho_2}\left(\mathbf{L}_k - \alpha_1 \nabla g(\mathbf{L}_k)\right)$,  (5)

where $\operatorname{prox}_{\beta\rho_2}$ is the proximal gradient operator corresponding to the prior $\rho_2$, $\alpha_1$ denotes the updating step size, and $\nabla g(\mathbf{L}_k) = \mu_k \mathbf{R}_k \circ (\mathbf{R}_k \circ \mathbf{L}_k + \mathbf{N}_k - \mathbf{I})$.
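For concreteness, a minimal PyTorch sketch of this update is given below; the names (update_L, prox_L) are illustrative, and prox_L stands for whatever module realizes $\operatorname{prox}_{\beta\rho_2}$ (detailed in Eqs. (6)-(7) below).

```python
import torch

def update_L(L_k, R_k, N_k, I, mu_k, alpha_1, prox_L):
    """One unfolded L update (Eq. 5):
    L_{k+1} = prox_{beta*rho_2}(L_k - alpha_1 * grad_g(L_k)),
    with grad_g(L_k) = mu_k * R_k o (R_k o L_k + N_k - I)."""
    grad_g = mu_k * R_k * (R_k * L_k + N_k - I)  # all products are element-wise
    return prox_L(L_k - alpha_1 * grad_g)
```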

Figure 3: The MAE-based customized illumination prior. A UNet-like convolutional neural network with a masked image modeling strategy takes the whole randomly masked illumination map as input and predicts the corresponding map processed by a bilateral filter. After pre-training, the encoder $f_{MAE_L}$ is embedded into the design of the proximal operator in the L sub-problem.

Existing methods design the functional form of $\operatorname{prox}_{\beta\rho_2}$ by constructing implicit networks in a black-box manner, resulting in an ambiguous and implicit prior principle. However, according to Retinex theory, the illumination term should be smooth while preserving the structure and adapting to diverse lighting conditions [23]. Fortunately, the bilateral filter is inherently capable of filtering noise while preserving edges [5]. In this paper, standing on the shoulders of MAE [14], which is trained with the masked image modeling strategy and embraces a powerful feature representation capability, we pre-train an MAE whose target feature is the illumination map of a normal-light image filtered by the bilateral filter, thereby creating a customized illumination prior. We then embed the learned illumination prior into the design of $\operatorname{prox}_{\beta\rho_2}$ to enhance its transparency.

The original MAE is an asymmetric encoder-decoder architecture that reconstructs the masked pixels given partially visible patches. During pre-training, a high proportion of the input image is masked, and the encoder only maps the visible patches into a latent feature representation. Then, a lightweight decoder reconstructs the original images in pixels from the latent representation and mask tokens. After the pre-training, the decoder is discarded, and the encoder is applied to uncorrupted images (full sets of patches) for feature representation.

Figure 4: The details of the design of $\operatorname{prox}_{\beta\rho_2}$.

In this paper, as shown in Fig. 3, departing from the original MAE, we implement a UNet-like convolutional neural network with a customized target feature in the masked image modeling strategy (a sketch of the data preparation follows the list below):

  • employing the vanilla convolution operator to construct the encoder-decoder architecture;

  • implementing diverse lighting conditions by augmenting the input image with gamma transformation;

  • dividing the corresponding illumination maps (obtained by the max intensity of RGB channels) into regular non-overlapping regions, randomly sampling a subset of regions, and masking the remaining ones while maintaining the whole structure of the map;

  • processing all the regions (both visible and masked regions) through the encoder and decoder to reconstruct the illumination map filtered by the bilateral filter;

  • to emphasize, the input of the network is the complete image structure rather than image patches.
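A sketch of how one training pair for this prior could be constructed is given below; the region size, masking ratio, and bilateral-filter parameters (patch=16, mask_ratio=0.75, d=9, sigmaColor=0.1, sigmaSpace=9) are illustrative assumptions rather than the paper's settings.

```python
import cv2
import numpy as np

def illumination_prior_pair(img_rgb, gamma, patch=16, mask_ratio=0.75, rng=None):
    """Builds one (masked input, target) pair for pre-training the illumination prior:
    the gamma-augmented illumination map with random regions zeroed out is mapped to
    its bilateral-filtered counterpart."""
    rng = rng or np.random.default_rng()
    aug = np.clip(img_rgb.astype(np.float32) / 255.0, 0.0, 1.0) ** gamma  # lighting augmentation
    illum = aug.max(axis=2)                          # illumination map: max over RGB channels
    target = cv2.bilateralFilter(illum, 9, 0.1, 9)   # smooth noise while preserving edges
    masked = illum.copy()
    h, w = illum.shape
    for i in range(0, h - patch + 1, patch):         # regular non-overlapping regions
        for j in range(0, w - patch + 1, patch):
            if rng.random() < mask_ratio:            # mask this region, keep the global structure
                masked[i:i + patch, j:j + patch] = 0.0
    return masked, target
```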

After pre-training, we incorporate its encoder $f_{MAE_L}$ into the design of $\operatorname{prox}_{\beta\rho_2}$ as described in Fig. 4. Specifically, we feed the $(k+1)$-th iterate $\mathbf{L}_{k+1}$ into $f_{MAE_L}$ to generate a feature representation possessing the properties of the illumination map:

$\mathbf{L}_{fp} = f_{MAE_L}(\mathbf{L}_{k+1})$.  (6)

Then, based on the prior features, an illumination map $\hat{\mathbf{L}}_{k+1}$ integrated with the customized illumination prior is formulated as:

$\hat{\mathbf{L}}_{k+1} = \mathbf{L}_{k+1} * Sig(Conv(\mathbf{L}_{fp}\uparrow)) + Sig(Conv(\mathbf{L}_{fp}\uparrow))$,  (7)

where $\uparrow$ denotes upsampling to the same spatial size as $\hat{\mathbf{L}}_{k+1}$, $Conv$ indicates three convolution layers that generate the modulation parameter, and $Sig$ is the Sigmoid function. The qualitative and quantitative evaluations of the effectiveness of $f_{MAE_L}$ are presented in Fig. 8 and Table 4.
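A possible PyTorch realization of Eqs. (6)-(7) is sketched below. The channel sizes (feat_ch, hid_ch) are hypothetical, and sharing one Conv branch for both terms of Eq. (7) is our reading of the identical notation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IlluminationProx(nn.Module):
    """prox_{beta*rho_2}: the frozen illumination-prior encoder f_MAE_L modulates L_{k+1}."""
    def __init__(self, f_mae_l, feat_ch, hid_ch=16):
        super().__init__()
        self.f_mae_l = f_mae_l  # pre-trained encoder, parameters kept frozen
        self.conv = nn.Sequential(  # three conv layers producing the modulation parameter
            nn.Conv2d(feat_ch, hid_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hid_ch, hid_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hid_ch, 1, 3, padding=1))

    def forward(self, L_next):
        L_fp = self.f_mae_l(L_next)                                 # Eq. (6)
        L_fp = F.interpolate(L_fp, size=L_next.shape[-2:],
                             mode='bilinear', align_corners=False)  # upsampling
        m = torch.sigmoid(self.conv(L_fp))                          # Sig(Conv(.))
        return L_next * m + m                                       # Eq. (7)
```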

R sub-problem: Dropping the terms unrelated to $\mathbf{R}$ gives the following optimization problem:

$\mathbf{R}_{k+1} = \arg\min_{\mathbf{R}} \mathcal{L}(\mathbf{N}_k, \hat{\mathbf{L}}_{k+1}, \mathbf{R}) = \arg\min_{\mathbf{R}} \omega\rho_3(\mathbf{R}) + \underbrace{\frac{\mu_k}{2}\|\mathbf{I} - \mathbf{R} \circ \hat{\mathbf{L}}_{k+1} - \mathbf{N}_k\|_F^2}_{h(\mathbf{R})}$.  (8)

Similarly, Eq. (8) is written as:

$\mathbf{R}_{k+1} = \operatorname{prox}_{\omega\rho_3}\left(\mathbf{R}_k - \alpha_2 \nabla h(\mathbf{R}_k)\right)$,  (9)

where $\operatorname{prox}_{\omega\rho_3}$ is the proximal gradient operator corresponding to the prior $\rho_3$, implemented by two Conv layers followed by the ReLU activation, $\alpha_2$ indicates the updating step size, and $\nabla h(\mathbf{R}_k) = \mu_k \hat{\mathbf{L}}_{k+1} \circ (\hat{\mathbf{L}}_{k+1} \circ \mathbf{R}_k + \mathbf{N}_k - \mathbf{I})$.

N sub-problem: Collecting the terms related to $\mathbf{N}$ leads to the following problem:

$\mathbf{N}_{k+1} = \arg\min_{\mathbf{N}} \mathcal{L}(\mathbf{N}, \hat{\mathbf{L}}_{k+1}, \mathbf{R}_{k+1}) = \arg\min_{\mathbf{N}} \|\mathbf{N}\|_1 + \gamma\rho_1(\mathbf{N}) + \frac{\mu_k}{2}\|\mathbf{I} - \mathbf{R}_{k+1} \circ \hat{\mathbf{L}}_{k+1} - \mathbf{N}\|_F^2$.  (10)

The solution of Eq. (10) is formulated as:

$\mathbf{N}_{k+1} = \operatorname{prox}_{\gamma\rho_1}\left(\mathrm{Shrink}\left(\mathbf{I} - \mathbf{R}_{k+1} \circ \hat{\mathbf{L}}_{k+1}, \frac{1}{\mu_k}\right)\right)$,  (11)

where $\mathrm{Shrink}(\mathbf{X},\eta) = \max\{|\mathbf{X}|-\eta, 0\} \cdot \mathrm{sign}(\mathbf{X})$, $\mathrm{sign}(\cdot)$ denotes the sign function, and $\operatorname{prox}_{\gamma\rho_1}$ indicates the proximal gradient operator corresponding to the prior $\rho_1$, achieved by two Conv layers followed by the ReLU activation.
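The shrinkage operator and the resulting N update can be written compactly as below; prox_N is a placeholder for the two Conv+ReLU layers realizing $\operatorname{prox}_{\gamma\rho_1}$.

```python
import torch

def soft_shrink(X, eta):
    """Shrink(X, eta) = max(|X| - eta, 0) * sign(X)."""
    return torch.clamp(X.abs() - eta, min=0.0) * torch.sign(X)

def update_N(I, R_next, L_hat_next, mu_k, prox_N):
    """N update (Eq. 11): soft-threshold the residual, then apply prox_{gamma*rho_1}."""
    residual = I - R_next * L_hat_next
    return prox_N(soft_shrink(residual, 1.0 / mu_k))
```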

Fig. 5 presents the decomposed illumination, reflectance, and noise terms of paired low-/normal-light images.

Figure 5: The first and third rows present the decomposed illumination, reflectance, and noise components of the low-/normal-light images, $[\mathbf{L}_l, \mathbf{R}_l, \mathbf{N}_l]$ and $[\mathbf{L}_n, \mathbf{R}_n, \mathbf{N}_n]$, respectively, where $\mathbf{N}_n$ is equal to zero. The second row illustrates the enhanced image, the enhanced illumination map, and the restored reflectance map, $[\mathbf{I}_{en}, \mathbf{L}_{en}, \mathbf{R}_{re}]$.

3.1.2 Enhancement

After the Retinex decomposition, the low-light and normal-light images, $\mathbf{I}_l$ and $\mathbf{I}_n$, are decomposed into their illumination, reflectance, and noise terms, i.e., $[\mathbf{L}_l, \mathbf{R}_l, \mathbf{N}_l]$ and $[\mathbf{L}_n, \mathbf{R}_n, \mathbf{N}_n]$, where $\mathbf{N}_n$ is equal to zero.

Because there is no single optimal light level for images, a flexible illumination adjustment scheme is required. Following [61], an indicator with value $\varepsilon$ is combined with $\mathbf{L}_l$ as the input of the illumination adjustment network ($\varepsilon = \mathrm{mean}(\mathbf{L}_n/\mathbf{L}_l)$ for training; specified by users for inference). As shown in Fig. 2, the illumination adjustment network is implemented by a U-Net [44] with three scales of 12, 24, and 48 channels, respectively. Furthermore, the reflectance maps of low-light images are degraded by color deviation and noise pollution [61], which are strongly associated with the illumination and noise distribution. Thus, we feed the illumination and noise maps, $\mathbf{L}_l$ and $\mathbf{N}_l$, together with the degraded reflectance $\mathbf{R}_l$ into the reflectance restoration network, whose structure is the same as that of the illumination adjustment network. Finally, the enhanced image $\mathbf{I}_{en}$ is obtained by element-wise multiplication of the enhanced illumination and restored reflectance maps: $\mathbf{I}_{en} = \mathbf{L}_{en} \circ \mathbf{R}_{re}$. Fig. 5 presents the enhanced illumination, the restored reflectance component, and the corresponding enhanced image.
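A sketch of this enhancement step under the above definitions is given below; the network handles (illum_adjust_net, refl_restore_net) are placeholder names, and the small clamp guarding the division is our addition.

```python
import torch

def enhance(L_l, R_l, N_l, illum_adjust_net, refl_restore_net, L_n=None, eps=None):
    """Adjust illumination with a ratio indicator, restore the reflectance conditioned on
    illumination and noise, and recompose the enhanced image I_en = L_en o R_re."""
    if eps is None:  # training: eps = mean(L_n / L_l); inference: user-specified
        eps = torch.mean(L_n / L_l.clamp(min=1e-4))
    indicator = torch.full_like(L_l, float(eps))
    L_en = illum_adjust_net(torch.cat([L_l, indicator], dim=1))
    R_re = refl_restore_net(torch.cat([R_l, L_l, N_l], dim=1))
    return L_en * R_re
```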

3.2 Optimization Flow

Orthogonal to the structure design, we introduce elaborately designed loss functions that guide the network toward better optimization and thus produce pleasing images.

Retinex decomposition: Recovering the illumination, reflectance, and noise terms simultaneously is an ill-posed problem. Fortunately, paired low-/normal-light images are available, and the reflectance of a given scene should be shared across varying light conditions. Therefore, we maximize the similarity between the decomposed reflectance pair, i.e., $\mathcal{L}_{rs}^{D} = \|\mathbf{R}_l - \mathbf{R}_n\|_1$. Furthermore, the illumination maps should be piece-wise smooth and mutually consistent. The illumination smoothness [61] is implemented as $\mathcal{L}_{is}^{D} = \left\|\frac{\nabla\mathbf{L}_l}{\max(|\nabla\mathbf{I}_l|,\epsilon)}\right\|_1 + \left\|\frac{\nabla\mathbf{L}_n}{\max(|\nabla\mathbf{I}_n|,\epsilon)}\right\|_1$, where $\nabla$ denotes the first-order derivative operator in the horizontal and vertical directions, $|\cdot|$ is the absolute value operator, and $\epsilon$ is set to $0.01$ to avoid zero denominators. The mutual consistency [60] is formulated as $\mathcal{L}_{mc}^{D} = \|\mathbf{M} \circ \exp(-c \cdot \mathbf{M})\|_1$ with $\mathbf{M} = |\nabla\mathbf{L}_l| + |\nabla\mathbf{L}_n|$; it encourages strong mutual edges to be well preserved and weak edges to be effectively suppressed. In addition, the reconstruction error is constrained by $\mathcal{L}_{re}^{D} = \|\mathbf{I}_l - \mathbf{R}_l \circ \mathbf{L}_l - \mathbf{N}_l\|_1 + \|\mathbf{I}_n - \mathbf{R}_n \circ \mathbf{L}_n\|_1$. In total, the loss function of the Retinex decomposition is as follows:

$\mathcal{L}^{D} = \mathcal{L}_{re}^{D} + 0.009 * \mathcal{L}_{rs}^{D} + 0.15 * \mathcal{L}_{mc}^{D} + 0.2 * \mathcal{L}_{is}^{D}$.  (12)
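A minimal PyTorch sketch of $\mathcal{L}^{D}$ is given below, assuming single-channel illumination maps; the decay constant c of the mutual-consistency term and the grayscale conversion of $\mathbf{I}$ in the smoothness denominator are assumptions not specified above.

```python
import torch
import torch.nn.functional as F

def grad_xy(x):
    """First-order differences in the horizontal and vertical directions (zero-padded)."""
    dx = F.pad(x[..., :, 1:] - x[..., :, :-1], (0, 1, 0, 0))
    dy = F.pad(x[..., 1:, :] - x[..., :-1, :], (0, 0, 0, 1))
    return dx, dy

def decomposition_loss(I_l, I_n, L_l, L_n, R_l, R_n, N_l, c=10.0, eps=0.01):
    rs = (R_l - R_n).abs().mean()                                               # reflectance similarity
    re = (I_l - R_l * L_l - N_l).abs().mean() + (I_n - R_n * L_n).abs().mean()  # reconstruction
    is_loss = 0.0                                                               # illumination smoothness
    for L, I in ((L_l, I_l), (L_n, I_n)):
        for gL, gI in zip(grad_xy(L), grad_xy(I.mean(dim=1, keepdim=True))):
            is_loss = is_loss + (gL / gI.abs().clamp(min=eps)).abs().mean()
    M = sum(g.abs() for g in grad_xy(L_l)) + sum(g.abs() for g in grad_xy(L_n))
    mc = (M * torch.exp(-c * M)).mean()                                         # mutual consistency
    return re + 0.009 * rs + 0.15 * mc + 0.2 * is_loss
```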

Enhancement: The loss function in this step comprises illumination adjustment and reflectance restoration:

$\mathcal{L}^{E} = \|\mathbf{R}_{re} - \mathbf{R}_n\|_1 + (1 - \mathrm{SSIM}(\mathbf{R}_{re}, \mathbf{R}_n)) + \|\mathbf{L}_{en} - \mathbf{L}_n\|_1 + \|\nabla\mathbf{L}_{en} - \nabla\mathbf{L}_n\|_1$,  (13)

where $\mathrm{SSIM}(\cdot,\cdot)$ is the structural similarity measurement.
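In code, Eq. (13) maps to the sketch below; the third-party pytorch_msssim package is used as a stand-in for the SSIM term, which is an assumption since the SSIM implementation is not specified here.

```python
import torch
from pytorch_msssim import ssim  # assumed SSIM implementation

def grad_xy(x):
    """First-order differences in the horizontal and vertical directions."""
    return x[..., :, 1:] - x[..., :, :-1], x[..., 1:, :] - x[..., :-1, :]

def enhancement_loss(R_re, R_n, L_en, L_n):
    """Eq. (13): L1 + (1 - SSIM) on the restored reflectance, plus L1 terms on the
    enhanced illumination and its gradients."""
    loss = (R_re - R_n).abs().mean() + (1.0 - ssim(R_re, R_n, data_range=1.0))
    loss = loss + (L_en - L_n).abs().mean()
    for g_en, g_n in zip(grad_xy(L_en), grad_xy(L_n)):
        loss = loss + (g_en - g_n).abs().mean()
    return loss
```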

Figure 6: The MAE-based customized noise prior. An MAE-like encoder-decoder architecture takes the visible patches as input and predicts the HOG features of the masked patches. After pre-training, the encoder $f_{MAE_N}$ possesses a powerful gradient representation capability and is thus employed as a regularization term to constrain the underlying noise in the enhanced image.

Customized noise prior: Low-light image enhancement focuses on correcting the luminance while eliminating the noise amplified by the enhancement. Without a noise removal operation, the inherent noise in low-light images will not be eliminated regardless of the adjusted lightness, i.e., it is irrelevant to enhanced lightness. The Histogram of Oriented Gradients (HOG) [7] describes the distribution of gradient orientations and magnitudes within a local sub-region and is innately irrelevant to enhanced lightness owing to its local normalization operation. Therefore, as shown in Fig. 6, we customize a noise prior with the MAE, where the target feature is the HOG feature of normal-light images (a sketch of the target construction follows the list below):

  • obtaining the HOG feature of the whole image;

  • dividing the image into regular non-overlapping patches, randomly sampling a subset of patches, and masking the remaining ones;

  • flattening the histograms of masked patches and concatenating them into a 1-D vector as the target feature;

  • the encoder and decoder are implemented by vanilla vision transformers as in MAE.
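The target construction referenced above could look like the following sketch. For brevity it computes HOG per masked patch with scikit-image rather than once over the whole image; the patch size, masking ratio, and HOG parameters are illustrative assumptions.

```python
import numpy as np
from skimage.feature import hog

def hog_target(img_gray, patch=16, mask_ratio=0.75, rng=None):
    """Concatenates the flattened HOG histograms of the masked patches into a 1-D target."""
    rng = rng or np.random.default_rng()
    h, w = img_gray.shape
    feats = []
    for i in range(0, h - patch + 1, patch):          # regular non-overlapping patches
        for j in range(0, w - patch + 1, patch):
            if rng.random() < mask_ratio:             # this patch is masked -> predict its HOG
                cell = img_gray[i:i + patch, j:j + patch]
                feats.append(hog(cell, orientations=9, pixels_per_cell=(8, 8),
                                 cells_per_block=(2, 2), feature_vector=True))
    return np.concatenate(feats) if feats else np.zeros(0, dtype=np.float32)
```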

After pre-training, the encoder of the noise prior, $f_{MAE_N}$, possesses a powerful gradient representation capability. Thus, we redevelop it as a regularization term to suppress noise by imposing a gradient representation consistency constraint between the enhanced and normal-light images, formulated as:

$\mathcal{L}^{N} = \|f_{MAE_N}(\mathbf{I}_{en}) - f_{MAE_N}(\mathbf{I}_n)\|_1$.  (14)

The improvement brought by $f_{MAE_N}$ in noise elimination is illustrated in Fig. 8 and Table 4.
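In code, this regularizer reduces to an L1 distance in the feature space of the pre-trained encoder; f_mae_n below denotes the noise-prior encoder, and detaching the reference branch with no_grad is our choice.

```python
import torch

def noise_prior_loss(f_mae_n, I_en, I_n):
    """Eq. (14): L1 distance between the noise-prior encoder's gradient representations
    of the enhanced image and the normal-light reference."""
    with torch.no_grad():
        target = f_mae_n(I_n)            # reference representation; no gradient needed
    return (f_mae_n(I_en) - target).abs().mean()
```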

In total, the loss function of the proposed CUE for the end-to-end training paradigm is formulated as:

$\mathcal{L} = \mathcal{L}^{D} + \mathcal{L}^{E} + \mathcal{L}^{N}$.  (15)

4 Experiments

4.1 Experimental Settings

Datasets. To evaluate the performance of our proposed CUE, we conduct experiments on two commonly used benchmarks: LOL [51] and Huawei [13]. The LOL dataset contains 500 paired low-/normal-light images; following the official split, we use 485 for training and 15 for testing. For the Huawei dataset, we randomly select 2200 images for training and use the remaining 280 for testing.

Implementations. We implement our proposed method in PyTorch on a single NVIDIA RTX 3090 GPU. We employ the Adam optimizer with $\beta_1=0.9$ and $\beta_2=0.99$ to update our model for 3000K iterations with a batch size of 16. The initial learning rate is set to $1\times10^{-4}$ and decays by a factor of 0.5 every 50K iterations. The patch size is set to $64\times64$. The loss coefficients are kept the same for training on both the LOL and Huawei datasets.
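The reported schedule corresponds to a standard PyTorch setup along the following lines (a sketch; model and train_loader are placeholders, and the scheduler is stepped per iteration so that step_size=50_000 matches the 50K-iteration decay).

```python
import torch

def train(model, train_loader, total_iters):
    """Optimization schedule described above: Adam, lr 1e-4, halved every 50K iterations."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.99))
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50_000, gamma=0.5)
    it = 0
    while it < total_iters:
        for low, normal in train_loader:      # batches of 16 random 64x64 crops
            loss = model(low, normal)         # total objective L = L^D + L^E + L^N (Eq. 15)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            scheduler.step()                  # per-iteration LR schedule
            it += 1
            if it >= total_iters:
                break
```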

Table 1: Quantitative comparison on the LOL and Huawei datasets in terms of PSNR↑, SSIM↑, and NIQE↓. Each cell reports LOL/Huawei scores. The best and second-best results are marked in bold and underlined, respectively.
Method         PSNR↑ (LOL/Huawei)   SSIM↑ (LOL/Huawei)   NIQE↓ (LOL/Huawei)
SRIE           12.28/13.40          0.596/0.477          7.506/6.727
RetinexNet     16.77/16.65          0.425/0.485          8.878/7.548
MBLLEN         17.56/16.63          0.729/0.526          3.986/5.418
GLADNet        19.72/17.76          0.680/0.521          6.475/5.276
TBEFN          17.35/16.88          0.781/0.575          3.837/5.024
KinD           20.87/16.48          0.802/0.540          4.710/5.704
DRBN           18.65/18.46          0.801/0.635          4.454/5.378
URetinexNet    21.33/19.25          0.834/0.608          4.262/5.761
ZeroDCE++      15.53/16.03          0.567/0.507          7.793/6.641
EnGAN          17.48/17.03          0.674/0.514          4.154/5.148
RUAS           16.41/13.76          0.500/0.516          6.340/6.116
SCI            14.78/15.77          0.522/0.480          7.877/6.879
CUE (Ours)     21.67/20.31          0.774/0.658          3.776/4.908
CUE++ (Ours)   21.86/20.38          0.841/0.670          3.772/4.795
Figure 7: Visual comparison with state-of-the-art methods on the LOL and Huawei datasets. Please zoom in for details.

4.2 Comparison with State-of-the-Arts

We compare our CUE with 12 state-of-the-art approaches: SRIE [11], RetinexNet [51], MBLLEN [36], GLADNet [49], TBEFN [35], KinD [61], DRBN [55], URetinexNet [53], Zero-DCE++ [25], EnlightenGAN [62], RUAS [32], SCI [40] on the LOL and Huawei datasets.

Quantitative results. We employ the Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and NIQE [43] as numerical evaluation metrics. Note that URetinexNet [53] and KinD [61] are not trained end-to-end; their decomposition, unfolding, and enhancement stages are trained separately. For a fair comparison, we also implement our CUE by individually training the Retinex decomposition, reflectance restoration, and illumination adjustment stages and denote it as CUE++. To emphasize, CUE++ has the same model size, FLOPs, and runtime as CUE; the only difference is the training strategy. As illustrated in Table 1, CUE/CUE++ achieves the best results on the two benchmarks, substantiating its effectiveness.

Qualitative results. Fig. 7 shows the subjective visual quality comparison on challenging, extremely low-light images from the Huawei dataset. TBEFN [35], GLADNet [49], and URetinexNet [53] generate overexposed results, while MBLLEN [36], SCI [40], and Zero-DCE++ [25] yield underexposed results. Undeniably, DRBN [55] and KinD [61] predict well-exposed images; however, the color distortion in their results is non-negligible. RetinexNet [51], SCI [40], and TBEFN [35] produce results with color deviation and artifacts. In contrast, our CUE effectively enhances lightness and reveals details while eliminating noise.

Table 2: Quantitative comparison in terms of Parameters, FLOPs, and Runtime.
Method        Params   FLOPs (G)   Runtime (s)
RetinexNet    0.838M   148.54      0.023
MBLLEN        0.450M   21.37       0.159
GLADNet       1.128M   275.32      0.024
TBEFN         0.486M   26.33       0.035
KinD          8.540M   36.57       0.068
DRBN          0.577M   42.41       0.140
URetinexNet   0.360M   233.09      0.115
EnGAN         8.367M   72.61       0.011
SCI           384      0.14        0.001
CUE           0.25M    157.32      0.104

Computational complexity. We analyze the computational complexity of the baselines and our CUE. We exclude the traditional method (SRIE) and, among the three efficient LLIE approaches (Zero-DCE++, RUAS, and SCI), select SCI, which has the fewest parameters, for comparison. Table 2 reports the model parameters, FLOPs, and runtime averaged over 50 images of size 512×512. The running time is measured on a PC with an NVIDIA RTX 3070 GPU and an Intel i7-10700F CPU. Compared with the baselines, our CUE performs best while keeping a balanced model size and runtime.

4.3 Ablation Studies

We conduct ablation studies on the LOL dataset to further investigate the effectiveness of our proposed CUE under different configurations.

Effect of the number of unfolding stages: To explore the effect of the number of unfolding stages in the Retinex decomposition step, we conduct experiments on the proposed CUE with different numbers of stages $K$. As illustrated in Table 3, the performance increases with the number of stages until it reaches 3. The performance declines when $K$ is increased further, which may be caused by hampered gradient propagation. We set $K=3$ as the default stage number to balance performance and computation cost. The optimal $K$ and the performance trend on the Huawei dataset are consistent with those on LOL (see supplementary material).

Table 3: PSNR and SSIM scores of the proposed CUE with different numbers of stages on the LOL dataset.
Stage (K)   PSNR↑   SSIM↑
1           21.06   0.739
2           21.48   0.767
3           21.67   0.774
4           21.65   0.771
5           21.54   0.766

Effect of different components: We investigate the effectiveness of the proposed key components, i.e., $f_{MAE_L}$ and $f_{MAE_N}$. The quantitative and qualitative evaluations are presented in Table 4 and Fig. 8. $f_{MAE_L}$ denotes solving the L sub-problem with the encoder of the pre-trained illumination prior, and $f_{MAE_N}$ denotes employing the gradient representation prior as the regularization. As shown in Fig. 8, removing $f_{MAE_L}$ causes the illumination map and the enhanced image to be over-smoothed, and the enhanced image suffers from severe noise pollution when $f_{MAE_N}$ is discarded. Combining them achieves a notable performance improvement and satisfying visual quality. The supplementary material presents detailed visual results.

Figure 8: Ablation studies of $f_{MAE_L}$ and $f_{MAE_N}$.

4.4 Extension

We further verify the effectiveness of the proposed gradient representation consistency constraint on the image denoising task. We conduct experiments on the SIDD [1] dataset with two well-known baselines, DNCNN [59] and MPRNet [57]. Table 5 and Fig. 9 demonstrate that incorporating the gradient representation consistency constraint improves the performance of both baselines.

Table 4: PSNR and SSIM scores of the ablation studies for $f_{MAE_L}$ and $f_{MAE_N}$ on the LOL dataset.
$f_{MAE_L}$ / $f_{MAE_N}$   PSNR↑   SSIM↑
21.05 0.749
21.48 0.752
21.54 0.758
21.67 0.774
Figure 9: Extension experiments on the effectiveness of $f_{MAE_N}$ for noise suppression. The numbers under each figure represent the PSNR/SSIM scores.
Table 5: Extension experiments of the gradient representation consistency, $f_{MAE_N}$, on the image denoising task. The performance is evaluated on the SIDD [1] dataset.
Method                      PSNR↑   SSIM↑
DNCNN [59]                  32.43   0.841
DNCNN [59] + $f_{MAE_N}$    32.68   0.857
MPRNet [57]                 39.25   0.956
MPRNet [57] + $f_{MAE_N}$   39.32   0.957

5 Conclusion

In this paper, we have demonstrated the potential of customized learnable priors for enhancing low-light images and proposed a more transparent and interpretable deep unfolding paradigm. The customized illumination and noise priors are pre-trained using a masked image modeling strategy with specific target features. The learnable illumination prior is responsible for capturing the illumination properties of normal-light images, and we have embedded it into the unfolding architecture to improve its transparency in the structure flow. The noise prior aims to learn the gradient representation of normal-light images, and we have redeveloped it as a regularization term within the loss function to eliminate noise in the model output during the optimization flow. Extensive experiments on multiple low-light image datasets have shown the superiority of our deep unfolding paradigm.

Acknowledgments

This work was supported by the JKW Research Funds under Grant 20-163-14-LZ-001-004-01, and the Anhui Provincial Natural Science Foundation under Grant 2108085UD12. We acknowledge the support of GPU cluster built by MCC Lab of Information Science and Technology Institution, USTC.

References

  • [1] Abdelrahman Abdelhamed, Stephen Lin, and Michael S Brown. A high-quality denoising dataset for smartphone cameras. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1692–1700, 2018.
  • [2] Tarik Arici, Salih Dikbas, and Yucel Altunbasak. A histogram modification framework and its application for image contrast enhancement. IEEE Transactions on Image Processing, 18(9):1921–1935, 2009.
  • [3] Hangbo Bao, Li Dong, and Furu Wei. Beit: Bert pre-training of image transformers. arXiv preprint arXiv:2106.08254, 2021.
  • [4] Turgay Celik and Tardi Tjahjadi. Contextual and variational contrast enhancement. IEEE Transactions on Image Processing, 20(12):3431–3441, 2011.
  • [5] Jiawen Chen, Sylvain Paris, and Frédo Durand. Real-time edge-aware image processing with the bilateral grid. ACM Transactions on Graphics, 26(3):103–es, 2007.
  • [6] Xiaokang Chen, Mingyu Ding, Xiaodi Wang, Ying Xin, Shentong Mo, Yunhao Wang, Shumin Han, Ping Luo, Gang Zeng, and Jingdong Wang. Context autoencoder for self-supervised representation learning. arXiv preprint arXiv:2202.03026, 2022.
  • [7] Navneet Dalal and Bill Triggs. Histograms of oriented gradients for human detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 886–893, 2005.
  • [8] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
  • [9] Minhao Fan, Wenjing Wang, Wenhan Yang, and Jiaying Liu. Integrating semantic segmentation and Retinex model for low-light image enhancement. In Proceedings of the 28th ACM International Conference on Multimedia, pages 2317–2325, 2020.
  • [10] Xueyang Fu, Yinghao Liao, Delu Zeng, Yue Huang, Xiao-Ping Zhang, and Xinghao Ding. A probabilistic method for image enhancement with simultaneous illumination and reflectance estimation. IEEE Transactions on Image Processing, 24(12):4965–4977, 2015.
  • [11] Xueyang Fu, Delu Zeng, Yue Huang, Xiao-Ping Zhang, and Xinghao Ding. A weighted variational model for simultaneous reflectance and illumination estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2782–2790, 2016.
  • [12] Chunle Guo, Chongyi Li, Jichang Guo, Chen Change Loy, Junhui Hou, Sam Kwong, and Runmin Cong. Zero-reference deep curve estimation for low-light image enhancement. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1780–1789, 2020.
  • [13] Jiang Hai, Zhu Xuan, Ren Yang, Yutong Hao, Fengzhu Zou, Fang Lin, and Songchen Han. R2RNet: Low-light image enhancement via real-low to real-normal network. arXiv preprint arXiv:2106.14501, 2021.
  • [14] Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16000–16009, 2022.
  • [15] Jie Huang, Yajing Liu, Xueyang Fu, Man Zhou, Yang Wang, Feng Zhao, and Zhiwei Xiong. Exposure normalization and compensation for multiple-exposure correction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6043–6052, June 2022.
  • [16] Jie Huang, Yajing Liu, Feng Zhao, Keyu Yan, Jinghao Zhang, Yukun Huang, Man Zhou, and Zhiwei Xiong. Deep Fourier-based exposure correction network with spatial-frequency interaction. In Proceedings of the European Conference on Computer Vision, pages 163–180. Springer, 2022.
  • [17] Jie Huang, Zhiwei Xiong, Xueyang Fu, Dong Liu, and Zheng-Jun Zha. Hybrid image enhancement with progressive laplacian enhancing unit. In Proceedings of the 27th ACM International Conference on Multimedia, page 1614–1622, 2019.
  • [18] Jie Huang, Feng Zhao, Man Zhou, Jie Xiao, Naishan Zheng, Kaiwen Zheng, and Zhiwei Xiong. Learning sample relationship for exposure correction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9904–9913, June 2023.
  • [19] Jie Huang, Man Zhou, Yajing Liu, Mingde Yao, Feng Zhao, and Zhiwei Xiong. Exposure-consistency representation learning for exposure correction. In Proceedings of the 30th ACM International Conference on Multimedia, page 6309–6317, 2022.
  • [20] Haidi Ibrahim and Nicholas Sia Pik Kong. Brightness preserving dynamic histogram equalization for image contrast enhancement. IEEE Transactions on Consumer Electronics, 53(4):1752–1758, 2007.
  • [21] Yifan Jiang, Xinyu Gong, Ding Liu, Yu Cheng, Chen Fang, Xiaohui Shen, Jianchao Yang, Pan Zhou, and Zhangyang Wang. Enlightengan: Deep light enhancement without paired supervision. IEEE Transactions on Image Processing, 30:2340–2349, 2021.
  • [22] Dian Jin, Long Ma, Risheng Liu, and Xin Fan. Bridging the gap between low-light scenes: Bilevel learning for fast adaptation. In Proceedings of the 29th ACM International Conference on Multimedia, pages 2401–2409, 2021.
  • [23] Edwin H Land. The Retinex theory of color vision. Scientific American, 237(6):108–129, 1977.
  • [24] Chongyi Li, Chunle Guo, Linghao Han, Jun Jiang, Ming-Ming Cheng, Jinwei Gu, and Chen Change Loy. Low-light image and video enhancement using deep learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(12):9396–9416, 2021.
  • [25] Chongyi Li, Chunle Guo, and Chen Change Loy. Learning to enhance low-light image via zero-reference deep curve estimation. arXiv preprint arXiv:2103.00860, 2021.
  • [26] Chongyi Li, Chun-Le Guo, Man Zhou, Zhexin Liang, Shangchen Zhou, Ruicheng Feng, and Chen Change Loy. Embedding Fourier for ultra-high-definition low-light image enhancement. In International Conference on Learning Representations, 2023.
  • [27] Jiaqian Li, Juncheng Li, Faming Fang, Fang Li, and Guixu Zhang. Luminance-aware pyramid network for low-light image enhancement. IEEE Transactions on Multimedia, 23:3153–3165, 2020.
  • [28] Mading Li, Jiaying Liu, Wenhan Yang, Xiaoyan Sun, and Zongming Guo. Structure-revealing low-light image enhancement via robust Retinex model. IEEE Transactions on Image Processing, 27(6):2828–2841, 2018.
  • [29] Seokjae Lim and Wonjun Kim. DSLR: Deep stacked laplacian restorer for low-light image enhancement. IEEE Transactions on Multimedia, 23:4272–4284, 2020.
  • [30] Risheng Liu, Zhiying Jiang, Shuzhou Yang, and Xin Fan. Twin adversarial contrastive learning for underwater image enhancement and beyond. IEEE Transactions on Image Processing, 31:4922–4936, 2022.
  • [31] Risheng Liu, Long Ma, Tengyu Ma, Xin Fan, and Zhongxuan Luo. Learning with nested scene modeling and cooperative architecture search for low-light vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5):5953–5969, 2022.
  • [32] Risheng Liu, Long Ma, Jiaao Zhang, Xin Fan, and Zhongxuan Luo. Retinex-inspired unrolling with cooperative prior architecture search for low-light image enhancement. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 10561–10570, 2021.
  • [33] Risheng Liu, Long Ma, Yuxi Zhang, Xin Fan, and Zhongxuan Luo. Underexposed image correction via hybrid priors navigated deep propagation. IEEE Transactions on Neural Networks and Learning Systems, 33(8):3425–3436, 2021.
  • [34] Kin Gwn Lore, Adedotun Akintayo, and Soumik Sarkar. LLNet: A deep autoencoder approach to natural low-light image enhancement. Pattern Recognition, 61:650–662, 2017.
  • [35] Kun Lu and Lihong Zhang. TBEFN: A two-branch exposure-fusion network for low-light image enhancement. IEEE Transactions on Multimedia, 23:4093–4105, 2020.
  • [36] Feifan Lv, Feng Lu, Jianhua Wu, and Chongsoon Lim. MBLLEN: Low-light image/video enhancement using cnns. In Proceedings of the British Machine Vision Conference, pages 1–13, 2018.
  • [37] Long Ma, Dian Jin, Nan An, Jinyuan Liu, Xin Fan, and Risheng Liu. Bilevel fast scene adaptation for low-light image enhancement. arXiv preprint arXiv:2306.01343, 2023.
  • [38] Long Ma, Risheng Liu, Yiyang Wang, Xin Fan, and Zhongxuan Luo. Low-light image enhancement via self-reinforced retinex projection model. IEEE Transactions on Multimedia, 2022.
  • [39] Long Ma, Risheng Liu, Jiaao Zhang, Xin Fan, and Zhongxuan Luo. Learning deep context-sensitive decomposition for low-light image enhancement. IEEE Transactions on Neural Networks and Learning Systems, 33(10):5666–5680, 2021.
  • [40] Long Ma, Tengyu Ma, Risheng Liu, Xin Fan, and Zhongxuan Luo. Toward fast, flexible, and robust low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5637–5646, June 2022.
  • [41] Long Ma, Tianjiao Ma, Xinwei Xue, Xin Fan, Zhongxuan Luo, and Risheng Liu. Practical exposure correction: Great truths are always simple. arXiv preprint arXiv:2212.14245, 2022.
  • [42] Tengyu Ma, Long Ma, Xin Fan, Zhongxuan Luo, and Risheng Liu. PIA: Parallel architecture with illumination allocator for joint enhancement and detection in low-light. In Proceedings of the 30th ACM International Conference on Multimedia, pages 2070–2078, 2022.
  • [43] Anish Mittal, Rajiv Soundararajan, and Alan C Bovik. Making a “completely blind” image quality analyzer. IEEE Signal Processing Letters, 20(3):209–212, 2012.
  • [44] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015.
  • [45] Himanshu Singh, Anil Kumar, LK Balyan, and Girish Kumar Singh. A novel optimally gamma corrected intensity span maximization approach for dark image enhancement. In 2017 22nd International Conference on Digital Signal Processing (DSP), pages 1–5. IEEE, 2017.
  • [46] Ruixing Wang, Qing Zhang, Chi-Wing Fu, Xiaoyong Shen, Wei-Shi Zheng, and Jiaya Jia. Underexposed photo enhancement using deep illumination estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6849–6857, 2019.
  • [47] Shuhang Wang, Jin Zheng, Hai-Miao Hu, and Bo Li. Naturalness preserved enhancement algorithm for non-uniform illumination images. IEEE Transactions on Image Processing, 22(9):3538–3548, 2013.
  • [48] Wei Wang, Na Sun, and Michael K Ng. A variational gamma correction model for image contrast enhancement. Inverse Problems & Imaging, 13(3):461, 2019.
  • [49] Wenjing Wang, Chen Wei, Wenhan Yang, and Jiaying Liu. GLADNet: Low-light enhancement network with global awareness. In 2018 13th IEEE international conference on automatic face & gesture recognition (FG 2018), pages 751–755. IEEE, 2018.
  • [50] Yang Wang, Yang Cao, Zheng-Jun Zha, Jing Zhang, Zhiwei Xiong, Wei Zhang, and Feng Wu. Progressive Retinex: Mutually reinforced illumination-noise perception network for low-light image enhancement. In Proceedings of the 27th ACM International Conference on Multimedia, pages 2015–2023, 2019.
  • [51] Chen Wei, Wenjing Wang, Wenhan Yang, and Jiaying Liu. Deep Retinex decomposition for low-light enhancement. arXiv preprint arXiv:1808.04560, 2018.
  • [52] Longhui Wei, Lingxi Xie, Wengang Zhou, Houqiang Li, and Qi Tian. Mvp: Multimodality-guided visual pre-training. arXiv preprint arXiv:2203.05175, 2022.
  • [53] Wenhui Wu, Jian Weng, Pingping Zhang, Xu Wang, Wenhan Yang, and Jianmin Jiang. URetinex-net: Retinex-based deep unfolding network for low-light image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5901–5910, June 2022.
  • [54] Xinwei Xue, Jia He, Long Ma, Yi Wang, Xin Fan, and Risheng Liu. Best of both worlds: See and understand clearly in the dark. In Proceedings of the 30th ACM International Conference on Multimedia, pages 2154–2162, 2022.
  • [55] Wenhan Yang, Shiqi Wang, Yuming Fang, Yue Wang, and Jiaying Liu. From fidelity to perceptual quality: A semi-supervised approach for low-light image enhancement. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3063–3072, 2020.
  • [56] Se-Hwan Yun, Jin Heon Kim, and Suki Kim. Contrast enhancement using a weighted histogram equalization. In 2011 IEEE International Conference on Consumer Electronics (ICCE), pages 203–204. IEEE, 2011.
  • [57] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Multi-stage progressive image restoration. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14821–14831, 2021.
  • [58] Jinghao Zhang, Jie Huang, Mingde Yao, Man Zhou, and Feng Zhao. Structure- and texture-aware learning for low-light image enhancement. In Proceedings of the 30th ACM International Conference on Multimedia, page 6483–6492, 2022.
  • [59] Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE transactions on Image Processing, 26(7):3142–3155, 2017.
  • [60] Yonghua Zhang, Xiaojie Guo, Jiayi Ma, Wei Liu, and Jiawan Zhang. Beyond brightening low-light images. International Journal of Computer Vision, 129(4):1013–1037, 2021.
  • [61] Yonghua Zhang, Jiawan Zhang, and Xiaojie Guo. Kindling the darkness: A practical low-light image enhancer. In Proceedings of the ACM International Conference on Multimedia, pages 1632–1640, 2019.
  • [62] Zunjin Zhao, Bangshu Xiong, Lei Wang, Qiaofeng Ou, Lei Yu, and Fa Kuang. RetinexDIP: A unified deep framework for low-light image enhancement. IEEE Transactions on Circuits and Systems for Video Technology, pages 1–14, 2021.
  • [63] Chuanjun Zheng, Daming Shi, and Wentian Shi. Adaptive unfolding total variation network for low-light image enhancement. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4439–4448, 2021.
  • [64] Naishan Zheng, Jie Huang, Feng Zhao, Xueyang Fu, and Feng Wu. Unsupervised underexposed image enhancement via self-illuminated and perceptual guidance. IEEE Transactions on Multimedia, pages 1–16, 2022.
  • [65] Naishan Zheng, Jie Huang, Qi Zhu, Man Zhou, Feng Zhao, and Zheng-Jun Zha. Enhancement by your aesthetic: An intelligible unsupervised personalized enhancer for low-light images. In Proceedings of the 30th ACM International Conference on Multimedia, pages 6521–6529, 2022.
  • [66] Man Zhou, Jie Huang, Chun-Le Guo, and Chongyi Li. Fourmer: An efficient global modeling paradigm for image restoration. In International Conference on Machine Learning (ICML)-Oral, pages 42589–42601. PMLR, 2023.
  • [67] Shangchen Zhou, Chongyi Li, and Chen Change Loy. Lednet: Joint low-light enhancement and deblurring in the dark. In European Conference on Computer Vision (ECCV), 2022.
  • [68] Peixian Zhuang, Chongyi Li, and Jiamin Wu. Bayesian retinex underwater image enhancement. Engineering Applications of Artificial Intelligence [Paper Prize Award 2023-Application], 101:104171, 2021.