
Single Image Brightening via Multi-Scale Exposure Fusion with Hybrid Learning

Chaobing Zheng†, Zhengguo Li†, Yi Yang and Shiqian Wu* (†Joint first authors. The corresponding author is Shiqian Wu.) Chaobing Zheng, Yi Yang and Shiqian Wu are with the Institute of Robotics and Intelligent Systems, School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan 430081, China (e-mails: [email protected], [email protected], [email protected]). Zhengguo Li is with the Institute for Infocomm Research, Singapore, 138632 (email: [email protected]).
Abstract

A small ISO value and a short exposure time are usually used to capture an image in back-lit or low-light conditions, which results in an image with negligible motion blur and little noise but a dark appearance. In this paper, a single image brightening algorithm is introduced to brighten such an image. The proposed algorithm includes a unique hybrid learning framework to generate two virtual images with large exposure times. The virtual images are first generated via intensity mapping functions (IMFs) which are computed using camera response functions (CRFs); this is a model-driven approach. Both virtual images are then enhanced by a data-driven approach, i.e. a residual convolutional neural network, to bring them closer to the ground truth images. The model-driven and data-driven approaches compensate each other in the proposed hybrid learning framework. The final brightened image is obtained by fusing the original image and the two virtual images via a multi-scale exposure fusion algorithm with properly defined weights. Experimental results show that the proposed brightening algorithm outperforms existing algorithms in terms of the MEF-SSIM metric.

Index Terms:
Single image brightening, hybrid learning, virtual image, multi-scale exposure fusion, data-driven, model-driven

I Introduction

Exposure time and ISO value are the most important factors that must be considered together to capture high quality images under various lighting conditions. For example, in back-lit or low-light conditions, there are two common settings of exposure time and ISO value. The first is a long exposure time and a small ISO value; the captured image is clean but blurred. The second is a short exposure time and a large ISO value; the image is sharp but noisy. Clearly, both settings have difficulty capturing a high quality image, especially when the captured scene has a high dynamic range (HDR). A setting recommended in [1, 2] is a short exposure time and a small ISO value. The captured image is clean and sharp, and most of the high-light regions are not over-exposed, but the image is dark. Therefore, a single image brightening algorithm is highly desirable to increase the brightness so as to obtain a clean, sharp and bright image. Four challenging problems in single image brightening are: 1) noise in under-exposed regions could be amplified; 2) the high-light regions could be washed out; 3) there could be lightness distortion in the brightened image [3]; and 4) color could be distorted for pixels in the under-exposed regions [4].

There are two types of single image brightening algorithms. One is model-driven image processing technologies [5, 6, 7, 8, 9, 10, 11, 12] and the other is data-driven methods such as deep learning ones [13, 15]. The inputs to a model-driven image brightening algorithm are the image(s) to be processed and the related visual prior(s) [1]. The prior(s) are derived by researchers from their R&D experience, so the model-driven algorithms are built on human intelligence. No off-line training is required by a model-driven algorithm, but its on-line computational cost could be high. The inputs to a data-driven image brightening algorithm are the image(s) to be processed and the corresponding ground truth image(s) [13, 15]. Off-line training is required by a data-driven algorithm, but its on-line computational cost could be low. In [13], a separate network is trained for each camera. Considering the pros and cons of these two types of methods, fusing a model-driven method and a data-driven one might be an effective way to study single image brightening. Such a framework is called hybrid learning. Two objectives of this paper are: 1) to explore the feasibility of such a framework, rather than a sophisticated neural network for deep learning, such that the model-driven and data-driven methods can compensate each other; and 2) to address the four challenging problems in single image brightening.

In this paper, a new hybrid learning framework is introduced for single image brightening under the assumption that the camera response functions (CRFs) are available. This assumption is not an issue if the proposed algorithm is embedded in a digital camera, as differently exposed images can be acquired by the camera in advance to estimate the CRFs via the method in [16]. As in the algorithm in [1], two virtual images are generated to brighten different parts of the captured image. Instead of using the fixed ratio strategy in [1], these two images are first generated via a model-driven method, i.e. by using the intensity mapping functions (IMFs) which can be obtained from the CRFs. As such, there is no lightness distortion in the brightened images. It is noted that noise is amplified and color is distorted in under-exposed regions if the IMFs are used directly [4]. To handle the color distortion problem, a fixed ratio strategy is designed to brighten each pixel with at least one under-exposed color channel. The ratio is properly determined by solving a least-squares optimization problem to prevent visible seams from appearing at the boundaries between the under-exposed regions and their neighboring regions. Besides reducing the color distortion, it is also necessary to avoid amplifying noise in the under-exposed regions. A simple edge-preserving smoothing method is provided to address this problem: the input image is decomposed into a base layer and a detail layer by using the weighted guided image filter (WGIF) in [17], and only the base layer is multiplied by the ratio to brighten the underlying image.

Due to the limited representation capability of the IMFs, there are visible differences between the virtual images and their ground truth images. A deep learning method is adopted to enhance the virtual images such that they are closer to their ground truth images. Since color distortion could be produced by the model-driven method, an additional color loss function is introduced to reduce the possible color distortion in the enhanced virtual images. As such, the proposed loss function is composed of a restoration loss and a color loss. Under-exposed regions in a low-light image are usually dominated by noise, and the randomness of the noise complicates training of the network. An adaptive weight is thus incorporated in the restoration loss function to mitigate its influence on the feedback adjustment, making the network easier to converge. Both the convergence speed and the accuracy of the hybrid learning are improved compared to the existing residual deep learning.

Besides increasing the brightness of the input image, it is also important to prevent the brightest regions of the input image from being washed out in the brightened image. Two new weighting functions are proposed to achieve this objective. The input image and the two virtual images are fused via the multi-scale exposure fusion (MEF) algorithm in [25] to produce the final image, using the new weighting functions together with the weighting functions in [25]. Experimental results show that the proposed algorithm outperforms other single image brightening algorithms. As with the network in [13], a separate hybrid learning framework is trained for each camera. In other words, both the proposed framework and the network in [13] could be embedded in a smart phone or a digital camera. On the other hand, our input is an sRGB image rather than a raw image as in [13]. Besides brightening the darkest regions of the sRGB image, the brightest regions of the sRGB image are prevented from being washed out, while they could be washed out by existing brightening algorithms. In summary, our contributions are highlighted as follows:

1) A hybrid learning framework is formulated by integrating data-driven and model-driven methods for single image brightening. Such a solution combines the advantages of both types of methods to generate two virtual images of higher quality. This is a good example showing that model-driven and data-driven methods can compensate each other.

2) A new restoration loss function is introduced, in which adaptive weights are assigned to the loss caused by suspicious noise, and the convergence speed is improved.

3) A database, which consists of 300 low, medium and high exposure image triplets, is built up. Only the exposure time is changed while the other configurations of the camera are fixed. Both camera shaking and object movement are strictly controlled to ensure that only the illumination is changed.

The rest of this paper is organized as follows. Relevant works on single image brightening are reviewed in Section II. The proposed hybrid learning framework is summarized in Section III. Two initial virtual images are generated by using the IMFs in Section IV. They are enhanced via a deep learning method in Section V. The input image and the two virtual images are fused together in Section VI to produce the brightened image. Extensive experimental results are provided in Section VII to verify the proposed hybrid learning framework. Finally, concluding remarks are drawn and future works are discussed in Section VIII.

II Relevant Works on Single Image Brightening

In this section, existing works on single image brightening are summarized under two categories.

Conventional brightening algorithms such as histogram equalization and gamma correction are simple and intuitive ways to enhance a low-light image. Although these methods can stretch the contrast of such images and tackle the low visibility, other problems arise, such as noise amplification and detail loss in bright areas [18]. To address these problems, many single image brightening algorithms have been proposed in the past decade.

The single image brightening algorithms can be classified into two categories. One category is model-driven methods, such as low-light image enhancement (LIME) [22], which extends the concept of Max-RGB [26] to the pixel level. The illumination of each pixel is first estimated individually as $\max(R,G,B)$. A structure prior is then imposed on the estimated illumination to obtain the final illumination map, and the final image is obtained using this illumination map. Li et al. [1] proposed an image brightening algorithm using the MEF, in which two virtual images with large exposure times are generated via a fixed ratio strategy. The input image and the two virtual images are fused via the MEF algorithm to produce the final image. The fixed ratio strategy can avoid color distortion in under-exposed regions. However, it assumes that the brightness relationship between two co-located pixels in two differently exposed images is linear, which could result in lightness distortion [3], one of the challenging problems in single image brightening. An interesting CRF-based strategy was proposed in [3] for single image brightening, and experimental results show that it can reduce lightness distortion. A CRF-based strategy was also used in [4] to study ghost removal for differently exposed images with moving objects: one of the differently exposed images is selected as the reference image, all pixels in the other images are classified into consistent and inconsistent pixels, the consistent pixels are kept, and the inconsistent pixels are corrected using the CRFs. As indicated in [4], it is very challenging to correct an inconsistent pixel in an under-exposed region via the CRFs if the exposure time of the reference image is smaller than that of the image to be corrected. Color could be distorted and noise could be amplified by the CRFs, which are also two challenging problems for single image brightening.

The other category is data-driven methods such as deep learning ones. Deep learning has been widely applied to address image processing problems including single image haze removal [19, 36, 28], single image rain removal [35, 38], single image denoising [21, 32], as well as single image brightening. Li et al. [27] designed a LightenNet to predict the mapping relation between a weakly illuminated image and the corresponding illumination map. Excellent performance is achieved if the weakly illuminated image is of good quality, but the method fails to brighten a weakly illuminated image of low quality, e.g. one containing noise or JPEG compression distortion. Wei et al. [34] combined deep learning with a Retinex model to design Retinex-Net, which is composed of a Decom-Net for separating reflectance from illumination and an Enhance-Net for illumination adjustment. Wang et al. [15] introduced a new neural network to learn an image-to-illumination mapping rather than an image-to-image mapping. Chen et al. [13] proposed an interesting deep learning method to map a very dark raw image to a bright image, with a separate network trained for each camera. They intended to generate perceptually good images in low-light conditions, but the brightened image looks blurry and the high-light regions are washed out. The data-driven approach has the advantage of obtaining a mapping function without hand-crafted parameter tuning. Nevertheless, such techniques require a large amount of training data. All the deep learning based brightening algorithms focus on increasing the brightness of the input image. Unfortunately, the high-light regions of the input image could be washed out. It is necessary to address this challenging problem for single image brightening.

Due to the pros and cons of these two different categories of methods, it is worth fusing a model-driven method and a data-driven one for single image brightening. The objectives of this paper are to explore such a fusion of the model-driven method and the deep learning method so that they can compensate each other, and to address the four challenging problems for single image brightening.

III The Proposed Brightening Algorithm

In this section, a single image brightening algorithm is proposed by introducing a hybrid learning framework which is a combination of model-driven and data-driven methods. As in [13], a separate hybrid learning framework is trained for each camera. The brightness of the input image is increased while the high-light regions are prevented from being washed out.

Let $Z_1$ be an eight-bit image which is captured in a back-lit or low-light condition. Two eight-bit virtual images $Z_2$ and $Z_3$ with large exposure times are produced by using a model-driven method. The ground truth images of $Z_2$ and $Z_3$ are denoted as $Z_{T_2}$ and $Z_{T_3}$, respectively. They are captured together with the image $Z_1$ by using the method in [16]. Fig. 1 summarizes the pipeline of our network for single image brightening via multi-scale exposure fusion with hybrid learning. Since one objective of this paper is to explore the feasibility of fusing model-driven and data-driven methods rather than designing a sophisticated neural network for deep learning, the CNN used in the proposed framework is built on top of the network in [14], with the ReLU replaced by the PReLU.

Refer to caption
Figure 1: The diagram of the proposed single image brightening algorithm via the MEF. Two virtual images $Z_2$ and $Z_3$ with exposure times $\Delta t_2$ and $\Delta t_3$, which are larger than $\Delta t_1$, are first generated by a model-driven method, i.e. the IMF based method. $(Z_{T_2}-Z_2)$ and $(Z_{T_3}-Z_3)$ are then learnt via the data-driven residual CNN to enhance the initial virtual images. The input image and the two virtual images are finally fused to obtain a brightened image. The brightness of the image is increased while the brightest regions are prevented from being washed out.

The key component of the proposed algorithm is to generate two high-quality virtual images using a new concept of hybrid learning, which combines a model-driven method and a data-driven one. The necessity of such a fusion can be elaborated by borrowing wisdom from the field of nonlinear control systems [23, 24], where modelled dynamics and unmodelled dynamics are two basic concepts [23]. $Z_2$ and $Z_3$ produced by using the IMFs are the modelled information of $Z_{T_2}$ and $Z_{T_3}$, while $(Z_{T_2}-Z_2)$ and $(Z_{T_3}-Z_3)$ are the unmodelled information. The modelled and unmodelled information are analogous to the modelled and unmodelled dynamics.

Different from existing data-driven methods and model-driven ones, the virtual images $Z_2$ and $Z_3$ are first obtained by using a model-driven method, and then the unmodelled information $(Z_{T_2}-Z_2)$ and $(Z_{T_3}-Z_3)$ is learned by using a data-driven deep residual convolutional neural network. Clearly, the quality of the initial virtual images can be improved if part of the unmodelled information can be further represented. Such a framework can be regarded as hybrid learning.

The proposed hybrid learning has the following advantages. Firstly, compared with the model-driven method, the virtual images produced by the proposed method are enhanced by compensating the unmodelled information. Secondly, compared with the data-driven solution, the hybrid learning converges faster and requires fewer training samples, because $(Z_{T_2}-Z_2)$ and $(Z_{T_3}-Z_3)$ are sparser than $Z_2$ and $Z_3$, so it is easy to train the network using a residual architecture [31]. Clearly, the model-driven and data-driven methods can compensate each other.

Finally, the input image $Z_1$ and the two virtual images $Z_2$ and $Z_3$ are fused together using the MEF algorithm in [25] to produce the final image. The details of the proposed algorithm are given in the subsequent sections.
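To make the three-stage flow above concrete, a minimal sketch is given below in Python-style pseudocode. All callables passed in (the IMF-based virtual image generator, the trained residual CNNs and the MEF routine of [25]) are hypothetical placeholders for the components described in Sections IV, V and VI, not the authors' released implementation.

```python
import numpy as np

def brighten(Z1, make_virtual, residual_cnn_2, residual_cnn_3, mef):
    """Hybrid-learning brightening pipeline (illustrative sketch only).

    Z1             : low-light sRGB input, float array in [0, 255]
    make_virtual   : model-driven IMF-based generator (Section IV)
    residual_cnn_i : trained CNNs approximating (Z_Ti - Z_i) (Section V)
    mef            : multi-scale exposure fusion of [25] with the
                     weighting maps of Section VI
    """
    # Stage 1 (model-driven): two virtual images with larger exposure times
    Z2 = make_virtual(Z1, ratio=4)    # Delta t2 = 4 * Delta t1
    Z3 = make_virtual(Z1, ratio=16)   # Delta t3 = 16 * Delta t1

    # Stage 2 (data-driven): compensate the unmodelled information
    Z2 = np.clip(Z2 + residual_cnn_2(Z1), 0, 255)
    Z3 = np.clip(Z3 + residual_cnn_3(Z1), 0, 255)

    # Stage 3: fuse the input and the two enhanced virtual images
    return mef([Z1, Z2, Z3])
```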

IV Generation of Initial Virtual Images

Two initial virtual images $Z_2$ and $Z_3$ that are brighter than the input $Z_1$ will be produced by using the IMFs in this section.

Let the CRF be $f_l(\cdot)$, and the exposure times of $Z_1$, $Z_2$ and $Z_3$ be $\Delta t_1$, $\Delta t_2$ and $\Delta t_3$, respectively. Without loss of generality, it is assumed that $\Delta t_3 > \Delta t_2 > \Delta t_1$. It has been shown in [37] that there is relative brightness change in the fused image if the exposure ratios are too large. Thus, $\Delta t_3$ and $\Delta t_2$ are selected as $16\Delta t_1$ and $4\Delta t_1$ in this study. The IMFs between the input image and the two virtual images can be expressed as follows:

$$\Lambda_{1,i,l}(z)=f_l\left(\frac{f_l^{-1}(z)\,\Delta t_i}{\Delta t_1}\right),\quad i=2,3, \qquad (1)$$

where $l$ is a color channel and $f_l^{-1}(\cdot)$ is the inverse function of the CRF $f_l(\cdot)$.
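Since the CRF of an 8-bit camera can be tabulated over the 256 possible code values, Eq. (1) reduces to a per-channel lookup table. A minimal sketch is given below; the callables `crf` and `crf_inv` stand for $f_l(\cdot)$ and $f_l^{-1}(\cdot)$ and are assumed to be available, e.g. from the estimation method in [16].

```python
import numpy as np

def build_imf(crf, crf_inv, exposure_ratio):
    """Tabulate the IMF of Eq. (1) for one color channel.

    crf            : vectorized f_l, mapping exposure to an 8-bit value
    crf_inv        : vectorized inverse CRF f_l^{-1}
    exposure_ratio : Delta t_i / Delta t_1 (4 or 16 in this paper)
    Returns a 256-entry lookup table for Lambda_{1,i,l}.
    """
    z = np.arange(256, dtype=np.float64)
    mapped = crf(crf_inv(z) * exposure_ratio)
    return np.clip(np.round(mapped), 0, 255).astype(np.uint8)
```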

Instead of using the fixed ratio strategy in [1], two initial virtual images are generated by using the IMFs, so lightness distortion is prevented from appearing in the brightened images. As mentioned in [4], if the pixel value $z$ is larger than a threshold $\xi_L$, $\Lambda_{1,i,l}(z)$ is a one-to-one mapping, which is reliable. Otherwise, it is not reliable due to a one-to-many mapping. Here, the threshold $\xi_L$ is determined by the quality of the camera. Both cases are considered to produce the two initial virtual images as follows:

Case 1: the $l$th color channel $Z_{1,l}(p)$ of pixel $p$ is larger than the threshold $\xi_L$ for every channel $l$. The pixel values of the two virtual images can be computed by using the IMFs. The virtual pixels $Z_2(p)$ and $Z_3(p)$ are computed as

$$Z_2(p)=[\Lambda_{1,2,1}(Z_{1,1}(p)),\ \Lambda_{1,2,2}(Z_{1,2}(p)),\ \Lambda_{1,2,3}(Z_{1,3}(p))], \qquad (2)$$
$$Z_3(p)=[\Lambda_{1,3,1}(Z_{1,1}(p)),\ \Lambda_{1,3,2}(Z_{1,2}(p)),\ \Lambda_{1,3,3}(Z_{1,3}(p))]. \qquad (3)$$

Case 2: $Z_{1,l}(p)$ is smaller than the threshold $\xi_L$ for at least one channel, so the IMF is not reliable and yields color distortion. Hence, the fixed ratio strategy in [1] is adopted to generate the corresponding virtual pixels. Two challenging problems to be addressed are: 1) the fixed ratio strategy amplifies noise in the under-exposed regions of $Z_1$; and 2) there are visible seams at the boundaries between the under-exposed regions and their neighboring regions if the ratio is selected as in [1]. To address the first problem, $Z_1$ is decomposed into a base layer $Z_1^b$ and a detail layer $Z_1^e$ by the WGIF [17]. To address the second problem, the virtual pixels $Z_i(p)\ (i=2,3)$ are computed as

$$Z_i(p)=\tilde{\gamma}_i Z_1^b(p)+Z_1^e(p), \qquad (4)$$

where the values of $\tilde{\gamma}_i\ (i=2,3)$ are obtained by minimizing the following function:

$$\sum_{l=1}^{3}\tilde{w}(Z_{1,l}(p))\left(\Lambda_{1,i,l}(Z_{1,l}(p))-Z_{1,l}^{e}(p)-\tilde{\gamma}_i Z_{1,l}^{b}(p)\right)^{2}, \qquad (5)$$

and the function $\tilde{w}$ is defined as in [39]:

$$\tilde{w}(z)=\begin{cases}0; & \text{if } 0\leq z<\xi_L\\ 128-3h_1^{2}(z)+2h_1^{3}(z); & \text{if } \xi_L\leq z<\xi_U\\ 128; & \text{otherwise}.\end{cases} \qquad (9)$$

The function $h_1(z)$ is given as

$$h_1(z)=\frac{\xi_U-z}{\xi_U-\xi_L}. \qquad (10)$$

Similar to $\xi_L$, the value of $\xi_U$ is related to the camera quality. The higher the quality of the camera, the larger the value of $\xi_U$. In this study, we use $\xi_L=5$ and $\xi_U=60$ as the default settings.
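A minimal per-pixel sketch of the Case 2 branch is given below, assuming the base layer $Z_1^b$ and detail layer $Z_1^e$ have already been produced by the WGIF and that the IMF values of the three channels are available. The closed-form solution of Eq. (5) for $\tilde{\gamma}_i$ is our own derivation of the weighted least-squares minimizer, not code released by the authors.

```python
import numpy as np

XI_L, XI_U = 5.0, 60.0  # default thresholds used in this paper

def w_tilde(z):
    """Reliability weight of Eq. (9), with h1(z) from Eq. (10)."""
    h1 = (XI_U - z) / (XI_U - XI_L)
    return np.where(z < XI_L, 0.0,
           np.where(z < XI_U, 128.0 - 3.0 * h1 ** 2 + 2.0 * h1 ** 3, 128.0))

def case2_virtual_pixel(Z1_rgb, Z1b_rgb, Z1e_rgb, imf_rgb):
    """Compute the virtual pixel Z_i(p) of Eq. (4) for an under-exposed pixel.

    Z1_rgb, Z1b_rgb, Z1e_rgb : 3-vectors of Z_1, its base and detail layers
    imf_rgb                  : Lambda_{1,i,l}(Z_{1,l}(p)) for the 3 channels
    """
    w = w_tilde(Z1_rgb)
    # Closed-form minimizer of Eq. (5): weighted least squares in gamma
    num = np.sum(w * Z1b_rgb * (imf_rgb - Z1e_rgb))
    den = np.sum(w * Z1b_rgb ** 2) + 1e-12
    gamma = num / den
    return gamma * Z1b_rgb + Z1e_rgb
```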

The resultant virtual images are shown in Fig. 2. The virtual images are close to the corresponding ground truth images but the color needs to be corrected. In addition, due to the limited representation capability of the IMFs, the images $Z_2$ and $Z_3$ do not contain all the information in the ground truth images. Thus, both intensity and color need to be adjusted.

Refer to caption
Figure 2: The first column includes three input low-light images which are taken with a Nikon 7200 camera. The ISO value is set as 800, and the exposure time $\Delta t_1$ is very short. The second column is the set of initial virtual images $Z_2$, whose exposure time is $\Delta t_2$. The third column is the set of initial virtual images $Z_3$, whose exposure time is $\Delta t_3$, with $\Delta t_1<\Delta t_2<\Delta t_3$. The fourth column is the set of ground truth images $Z_{T_2}$ with exposure time $\Delta t_2$. The fifth column is the set of ground truth images $Z_{T_3}$ with exposure time $\Delta t_3$.

V Enhancement of Virtual Images Via Deep Learning

The IMF can be regarded as a model that represents the correlation among differently exposed images, but its representation capability is limited. As mentioned in Section III, the unmodelled information $(Z_{T_2}-Z_2)$ and $(Z_{T_3}-Z_3)$ can be further represented by a deep learning method. The unmodelled information is usually sparse, i.e., most values are likely to be zero or small, as shown in Fig. 3. It is expected that the unmodelled information can be compensated by a deep neural network. However, it is challenging for the deep neural network to compensate the IMF in the under-exposed regions due to the existence of noise in those regions.

The loss function used in the proposed hybrid learning is defined as

$$L=L_r+w_c L_c, \qquad (11)$$

where $w_c$ is a constant, which is selected as 2 in this study.

The restoration loss $L_r$ is usually defined as

$$L_r=\sum_{p,l}\left[Z_{T_i,l}(p)-Z_{i,l}(p)-f_i(Z_{1,l}(p))\right]^{2}. \qquad (12)$$

The above loss function can be replaced by the weighted MSE in [30], which is equivalent to the SSIM but is differentiable. Since the under-exposed regions in $Z_1$ contain random noise [1], a content adaptive weight is introduced into the restoration loss so as to reduce the effect of noise on the adjustment of the parameters. The loss function $L_r$ is given as

$$L_r=\sum_{p,l}W_{i,l}(p)\left[Z_{T_i,l}(p)-f_i(Z_{1,l}(p))-Z_{i,l}(p)\right]^{2}, \qquad (13)$$

where the weight function $W_{i,l}(p)$ is expressed as:

$$W_{i,l}(p)=\begin{cases}1; & \text{if } Z_{i,l}(p)\geq\nu\\ \dfrac{1}{\nu-Z_{i,l}(p)}; & \text{otherwise},\end{cases} \qquad (16)$$

and $\nu$ is a small positive constant, empirically selected as 6.0 in this paper if not specified otherwise. When the pixel value at position $p$ is smaller than $\nu$, it may be noise, so a small weight is assigned to the loss.

To minimize the possible color distortion in the two brightened images, the color loss is defined as

$$L_c=\sum_{p}\angle\left(Z_{T_i}(p),\,Z_i(p)+f_i(Z_1(p))\right), \qquad (17)$$

where $\angle(Z_{T_i}(p),\,f_i(Z_1(p))+Z_i(p))$ is the angle between the two 3D $(R,G,B)$ vectors $Z_{T_i}(p)$ and $(f_i(Z_1(p))+Z_i(p))$. Since the $L_r$ term only measures the color difference numerically, it cannot ensure that the color vectors have the same direction [15]. By employing the color loss in Eq. (17), the possible color distortion is reduced.
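For clarity, a NumPy sketch of the total training loss of Eqs. (11), (13), (16) and (17) is given below for one virtual image. It is a numerical illustration under the stated constants ($w_c=2$, $\nu=6$), not the actual TensorFlow training code.

```python
import numpy as np

W_C, NU = 2.0, 6.0  # loss weight w_c and noise threshold nu used in this paper

def hybrid_loss(Z_T, Z_i, f_Z1):
    """Total loss L = L_r + w_c * L_c of Eq. (11) for one virtual image.

    Z_T  : ground truth image Z_{T_i},         shape (H, W, 3)
    Z_i  : initial model-driven virtual image, shape (H, W, 3)
    f_Z1 : network output f_i(Z_1), the predicted residual (Z_{T_i} - Z_i)
    """
    pred = Z_i + f_Z1  # enhanced virtual image

    # Restoration loss, Eq. (13), with the adaptive weight of Eq. (16):
    # suspiciously dark (likely noisy) pixels contribute less to the loss.
    W = np.where(Z_i >= NU, 1.0, 1.0 / (NU - Z_i))
    L_r = np.sum(W * (Z_T - pred) ** 2)

    # Color loss, Eq. (17): angle between the (R, G, B) vectors per pixel.
    dot = np.sum(Z_T * pred, axis=-1)
    norms = np.linalg.norm(Z_T, axis=-1) * np.linalg.norm(pred, axis=-1) + 1e-8
    L_c = np.sum(np.arccos(np.clip(dot / norms, -1.0, 1.0)))

    return L_r + W_C * L_c
```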

The enhanced virtual images are expressed as $(f_i(Z_1)+Z_i)\ (i=2,3)$. It is shown in Fig. 3 that the virtual images enhanced by the deep learning method are much closer to the ground truth images, which implies that the unmodelled information is reduced significantly. The color of the plants in the second row is distorted by the model-driven method, while our result looks more natural and much closer to the ground truth. The color of the cup in the third row is also obviously distorted by the model-driven method, and the result of our method is much closer to the ground truth. It is worth noting that the deep learning method is implemented in TensorFlow and is trained with a mini-batch size of 32. An Adam optimizer with a fixed learning rate of $10^{-4}$ is used to optimize the entire network. Mirroring and cropping are employed to augment the training data.
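The training configuration stated above can be set up roughly as follows; the patch size and the paired-crop augmentation details are our assumptions, while the batch size, optimizer and learning rate come from the paper.

```python
import tensorflow as tf

BATCH_SIZE, LEARNING_RATE = 32, 1e-4
PATCH = 128  # crop size: an assumption, not specified in the paper

def augment(low, target):
    """Mirroring and cropping applied identically to an input/target pair."""
    pair = tf.concat([low, target], axis=-1)            # keep the pair aligned
    pair = tf.image.random_crop(pair, [PATCH, PATCH, 6])
    pair = tf.image.random_flip_left_right(pair)
    return pair[..., :3], pair[..., 3:]

optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE)
```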

Refer to caption
Figure 3: The first column shows the virtual images $Z_3$ generated by the model-driven method, the second column illustrates the results of using the deep learning method to enhance the virtual images, $(f(Z_1)+Z_3)$, the third column shows the ground truth images $Z_{T_3}$, the fourth column demonstrates the results of $(Z_{T_3}-Z_3)$, and the fifth column shows the results of $(Z_{T_3}-f(Z_1)-Z_3)$.

VI Multi-Scale Fusion of Input Image and Two Virtual Images

The input image $Z_1$ and the two virtual images $Z_2$ and $Z_3$ are fused together to produce the final image. As shown in [25], the weighting maps of the differently exposed images play an important role in the MEF algorithm. This section focuses on defining the weighting maps for the three images, while the MEF itself is the same as in [25].

Besides increasing the brightness of the image $Z_1$, the brightest regions of the image $Z_1$ are prevented from being washed out. To achieve this objective, two different functions are adopted to define the weights for the three images.

One function is used to determine the amplification factors of all the pixels in the image $Z_1$. The weight is defined as

$$\psi_1(z)=\begin{cases}2; & \text{if } z>160\\ 1+h_2^{2}(z)\,(3-2h_2(z)); & \text{if } 160\geq z>128\\ 1; & \text{otherwise},\end{cases} \qquad (21)$$

where the function $h_2(z)$ is $(z-128)/32$. The function $\psi_1(z)$ is inspired by a weighting function in [39].

The other function measures the contrast, well-exposedness level, and color saturation of each pixel in the three images, and it is defined the same as in [25]:

$$\psi_2(Z_i(p))=w_c(Z_i(p))\times w_s(Z_i(p))\times w_e(Z_i(p)), \qquad (22)$$

where $w_c(Z_i(p))$, $w_s(Z_i(p))$ and $w_e(Z_i(p))$ measure the contrast, color saturation, and well-exposedness of pixel $Z_i(p)$, respectively.

Let $Y_1$ be the luminance component of the image $Z_1$ in the YUV color space. The weight of the pixel $Z_1(p)$ is given as

$$W(Z_1(p))=\psi_1(Y_1(p))\,\psi_2(Z_1(p)), \qquad (23)$$

and the weight of the pixel $Z_i(p)\ (i=2,3)$ is given as

$$W(Z_i(p))=\psi_2(Z_i(p)). \qquad (24)$$

With the above weighting maps, all the images $Z_i\ (i=1,2,3)$ are fused together via the existing MEF algorithm in [25]. It can be seen from Eqs. (23) and (24) that the pixels in the brightest regions of the image $Z_1$ dominate the fusion. As such, the brightest regions of the image $Z_1$ are prevented from being washed out by the proposed algorithm.
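A minimal sketch of the weighting maps of Eqs. (21), (23) and (24) is given below; $\psi_2$ is passed in as a placeholder for the contrast, saturation and well-exposedness measure of [25], and the BT.601 luminance coefficients are our assumption for computing $Y_1$.

```python
import numpy as np

def psi_1(y):
    """Amplification-limiting weight of Eq. (21) applied to the luminance Y_1."""
    h2 = (y - 128.0) / 32.0
    return np.where(y > 160, 2.0,
           np.where(y > 128, 1.0 + h2 ** 2 * (3.0 - 2.0 * h2), 1.0))

def fusion_weights(images, psi_2):
    """Per-pixel weights of Eqs. (23) and (24) for the MEF of [25].

    images : [Z1, Z2, Z3], float arrays in [0, 255], channels last (R, G, B)
    psi_2  : callable returning the contrast * saturation * well-exposedness
             measure of Eq. (22) for one image (placeholder, defined in [25])
    """
    Z1 = images[0]
    # Luminance Y_1 of Z1 (BT.601 weights, assuming R, G, B channel order)
    Y1 = 0.299 * Z1[..., 0] + 0.587 * Z1[..., 1] + 0.114 * Z1[..., 2]
    weights = [psi_1(Y1) * psi_2(Z1)]           # Eq. (23): favor bright pixels of Z1
    weights += [psi_2(Z) for Z in images[1:]]   # Eq. (24)
    total = np.sum(weights, axis=0) + 1e-12
    return [w / total for w in weights]         # normalized as usual before the MEF
```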

VII EXPERIMENTAL RESULTS

Extensive experiments are carried out in this section to demonstrate the rationality and effectiveness of the proposed hybrid learning framework. The emphasis is on showing how the model-driven method and the data-driven one compensate each other. Readers are invited to view the electronic version of the full-size figures and zoom in on these figures in order to better appreciate the differences among images. Code is available at [41].

Refer to caption
Figure 4: (a) high exposure images. (b) middle exposure images. (c) low exposure images. The images are collected by changing the exposure time while the other configurations of the camera are fixed. The camera is fixed to mitigate the effects of jitter, and no moving objects appear in the images, ensuring that the only variable is the illumination.

VII-A Dataset

We have built up a dataset which comprises 300 sets of differently exposed images, and each set contains three images. The images of different scenes are captured using a Nikon 7200. The ISO is set as 800. Following [16], the exposure times are different while the other configurations of the camera are fixed; the interval between successive exposures is 2 exposure values (EVs). Both camera shaking and object movement are strictly controlled to ensure that only the exposure time is different. Some examples are shown in Fig. 4.

Refer to caption
Figure 5: The first row shows the brightened images by the model-driven method. The second row illustrates the brightened images by the deep learning method. The third row shows the brightened images of the proposed hybrid learning without color loss function. The fourth row demonstrates the brightened images of the proposed hybrid learning.
Refer to caption

(a)

Refer to caption

(b)

Figure 6: (a) Training curve for intermediate exposure images; (b) training curve for high exposure images. The Y-axis represents the loss and the X-axis represents the number of iterations. The blue line indicates the results of the proposed hybrid learning, and the red line indicates the results of the deep learning.
Refer to caption
Figure 7: Ablation study on the batch normalization (BN). The Y-axis represents the loss and the X-axis represents the number of iterations. The red line indicates the results without BN, and the blue line indicates the results with BN. Clearly, the BN is helpful for the convergence.

VII-B Comparison of Three Different Methods

The MEF-SSIM in [29] is adopted to illustrate the performance of three different methods: the model-driven method, the data-driven method, and the proposed hybrid learning. The experimental results are given in Fig. 5 and Table I. Clearly, the proposed hybrid learning can improve both the MEF-SSIM and the visual quality of the virtual images. In particular, the resultant images of the deep learning are blurry, as shown in Fig. 5, and details are also lost. Sometimes, the results of the deep learning are even worse than those of the model-driven method. Clearly, the proposed hybrid learning can make the virtual images much closer to the ground truth images.

TABLE I: MEF-SSIM of Four Different Methods
image  model-driven  data-driven  proposed without BN  proposed without $L_c$  proposed
furnace 0.8450 0.8494 0.8747 0.9185 0.9218
tea 0.8910 0.8654 0.8935 0.9251 0.9256
grass 0.9611 0.9360 0.9544 0.9686 0.9694
pavilion 0.9565 0.9428 0.9594 0.9605 0.9672
flower 0.9622 0.9597 0.9612 0.9662 0.9665
cup 0.8442 0.8421 0.8686 0.8927 0.8965
flower pot 0.8196 0.7582 0.8319 0.8621 0.8660
succulents 0.8741 0.8692 0.8816 0.9142 0.9163
electric fan 0.9516 0.9339 0.9528 0.9665 0.9668
bag 0.9608 0.9301 0.9494 0.9698 0.9701
dog 0.9070 0.8765 0.9106 0.9329 0.9340
building 0.9369 0.9371 0.9372 0.9413 0.9411
average 0.9091 0.8917 0.9146 0.9349 0.9368

Besides the quality of the virtual images, the convergence speed is also analyzed. There is a large difference between the input low-light image $Z_1$ and the ground truth medium/high exposure images $Z_{T_i}$. On the other hand, the virtual images $Z_i$ generated by the model-driven method are already close to $Z_{T_i}$. It is easy for the residual CNNs to represent $(Z_{T_i}-Z_i)$, and the networks converge much faster, as shown in Fig. 6. Clearly, the model-driven method and the data-driven one indeed compensate each other in the proposed hybrid learning framework.

It should be pointed out that the on-line cost of the proposed hybrid learning framework is slightly higher than that of the pure deep learning method due to the additional model-driven step. On the other hand, a complexity-scalable brightening scheme is enabled by the proposed hybrid learning. Such a framework is attractive for “capturing the moment” via mobile computational photography in the coming 5G era. The data-driven method can be adopted to produce an image for previewing on the mobile device. The captured image can be simultaneously sent to the cloud, where a high-quality image is produced immediately. The generated image in the cloud can be sent back to the mobile device instantly due to the low latency of 5G. If the photographer does not like the synthesized image, she/he can capture another image immediately.

VII-C Ablation Study

Since the main objective of this paper is to explore the feasibility of the hybrid learning framework rather than a more sophisticated neural network for deep learning, a simple ablation study is conducted on the network structure and loss functions.

The ablation study on the BN is shown in Fig. 7. Clearly, the BN is helpful for the convergence of the proposed network and it is thus adopted by the proposed network.

The under-exposed regions contain random noise, and it is difficult to obtain its regular pattern. Although deep learning methods have strong representation capability, it is still difficult to characterize random noise, which brings trouble to the network training. Therefore, an adaptive weight is introduced into the restoration loss function as in Eq. (13), which reduces the influence of noise on the network training. As shown in Fig. 8, the fluctuation of the loss is significantly reduced by adding the weights, which simplifies the network training.

Although the restoration loss $L_r$ can implicitly measure the color difference, it cannot guarantee that $(f_i(Z_1)+Z_i)$ and $Z_{T_i}$ have the same color direction. There may exist color distortion when using the restoration loss only, as shown in Fig. 5. By adding the color loss $L_c$, the color distortion can be reduced, as indicated by the results in Table I.

Refer to caption
Figure 8: Ablation study on loss functions. The Y-axis represents the loss and the X-axis represents the number of iterations. The red line indicates the results with the weights, and the blue line is the results without the weights. Clearly, the adaptive weights can improve the convergence.

VII-D Comparison of Seven Different Brightening Algorithms

In this subsection, the proposed algorithm is compared with six state-of-the-art image brightening algorithms: LIME [22], NPE [33], LECARM [3], SNIE [20], RetinexNet [34] and DeepUPE [15]. Both RetinexNet and DeepUPE are deep learning methods, and all the others are model-driven methods. The MEF-SSIM in [29] is adopted to objectively assess the performance of the seven algorithms. The assessment is shown in Table II. Clearly, our algorithm significantly outperforms the other six state-of-the-art algorithms in terms of the MEF-SSIM evaluation.

Refer to caption
Figure 9: The first column shows the low-light images, the second to seventh columns illustrate the results of NPE, SNIE, LIME, LECARM, RetinexNet and DeepUPE, respectively, and the last column shows our results. LIME, DeepUPE and our algorithm can all brighten the low-light images while reducing noise, but LIME and DeepUPE sometimes cause parts of the brightened images to be washed out.

Besides the objective evaluation, the brightened images of the seven algorithms are shown in Fig. 9 to demonstrate their effectiveness. It can be seen that there are visible distortions in the brightened images of NPE and RetinexNet. The visibility of the brightened images of SNIE and LECARM needs to be improved. LIME can achieve good results, but it cannot preserve the details in dark regions. DeepUPE can achieve the desired effect for dark images, but it results in over-saturation in bright regions. The proposed algorithm can preserve details, reduce noise, and avoid color distortion. The input image is brightened while the brightest regions are prevented from being washed out.

TABLE II: MEF-SSIM of Seven Different Algorithms
image NPE SNIE LIME LECARM RetinexNet DeepUPE Proposed
furnace 0.8638 0.7923 0.8877 0.7966 0.7787 0.9074 0.9218
tea 0.8286 0.7926 0.8788 0.8346 0.6974 0.9069 0.9256
grass 0.9134 0.8573 0.9252 0.8976 0.7572 0.9620 0.9694
pavilion 0.8960 0.8831 0.9272 0.9558 0.6503 0.8062 0.9672
flower 0.9060 0.8966 0.9310 0.9578 0.7102 0.6921 0.9665
cup 0.8221 0.7639 0.8625 0.8067 0.7278 0.8789 0.8965
flower pot 0.8107 0.6685 0.8179 0.7106 0.6922 0.8879 0.8660
succulents 0.8494 0.7744 0.8747 0.8295 0.6812 0.8931 0.9163
electric fan 0.8826 0.8280 0.9320 0.9098 0.7421 0.9437 0.9668
bag 0.8752 0.8429 0.9090 0.9118 0.6622 0.9367 0.9701
dog 0.8408 0.7703 0.8620 0.8344 0.7264 0.9247 0.9340
building 0.8922 0.8847 0.9192 0.9379 0.7401 0.6252 0.9411
average 0.8651 0.8129 0.8939 0.8653 0.7138 0.8637 0.9368
Refer to caption
Figure 10: Comparison of the brightened images by using inaccurate CRF and accurate CRF. The first column shows the input images, the second column illustrates the results by using inaccurate CRF, and the third column demonstrates brightened images by using accurate CRF.

VII-E Limitation of the Proposed Algorithm

Although the proposed algorithm outperforms the other six algorithms, there is room for further study. For example, the proposed algorithm assumes that accurate CRFs are available. For images from unknown sources, it is quite challenging to estimate the CRFs. If the estimated CRFs are not accurate, the differences between the virtual images and their ground truth images become larger, and the quality of the brightened images drops slightly, as shown in Fig. 10.

As indicated in [16], the CRFs can be estimated correctly if the lighting condition is not changed for all the differently exposed images. This is also required by the proposed algorithm to generate the two virtual images even though the MEF is not sensitive to the variable lighting environment [40].

VIII CONCLUDING REMARKS

A new hybrid learning framework is introduced to study single image brightening. This paper focuses on the mutual compensation of the model-driven and data-driven methods rather than on employing sophisticated neural networks. Two initial virtual images with large exposure times are first generated by using the model-driven method, and they are then enhanced by using the data-driven residual convolutional neural networks. The input image and the two virtual images are fused to obtain a brightened image. The brightness of the input image is increased while the high-light regions are prevented from being washed out. Experimental results show that the proposed algorithm outperforms the existing algorithms.

It was assumed by the proposed algorithm that the camera response functions (CRFs) are available. This is not an issue if the proposed algorithm is embedded into a digital camera or a smart phone, but it might not be true for an image downloaded from the Internet. A deep learning based algorithm or a single-image estimation method such as the LECARM [3] could be adopted to estimate the CRFs. Two virtual images with 2 EV gaps are generated in the proposed algorithm. It is interesting but challenging to determine the optimal number of virtual images with the optimal EV gaps. One more interesting topic is to apply the proposed hybrid learning to other image processing problems. It is also interesting to develop more sophisticated deep learning methods to replace the one used in this paper. All of them will be studied in our future R&D.

Acknowledgement

This work was supported by the National Natural Science Foundation of China under Project 61775172 and Project 61620106012.

References

  • [1] Z. G. Li and J. H. Zheng. “Single image brightening via exposure fusion,” in 2016 International Conference on Acoustics, Speech, and Signal Processing, pp. 1756–1760, May. 2016.
  • [2] Z. G. Li, Z. Wei, C. Y. Wen, and J. H. Zheng, “Detail-enhanced multi-scale exposure fusion,” IEEE Trans. on Image Processing, vol. 26, no. 3, pp. 1243-1252, Mar. 2017.
  • [3] Y. Ren, Z. Ying, T. Li, and G. Li, “LECARM: Low-light image enhancement using camera response model,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 29, no. 4, pp. 968-981, Apr. 2019.
  • [4] Z. G. Li, J. H. Zheng, Z. J. Zhu, and S. Q. Wu, “Selectively detailenhanced fusion of differently exposed images with moving objects,” IEEE Trans. on Image Processing, vol. 23, no. 10, pp. 4372–4382, Oct. 2014.
  • [5] A. Celebi, R. Duvar and O. Urhan, “Fuzzy fusion based high dynamic range imaging using adaptive histogram separation,” IEEE Trans. Consumer Electronics, vol. 61, no. 1, pp. 119-127, Feb. 2015.
  • [6] X. Fu, Y. Liao, D. Zeng, Y. Huang, X. P., Zhang, and X. Ding, “A probabilistic method for image enhancement with simultaneous illumination and reflectance estimation,” IEEE Trans. on Image Processing, vol. 24, no. 12, pp. 4965-4977, Dec. 2015.
  • [7] S. Park, S. Yu, B. Moon, S. Ko, and J. Paik, “Low-light image enhancement using variational optimization-based retinex model,” IEEE Trans. Consumer Electronics, vol. 63, no. 2, pp. 70–71, May 2017.
  • [8] H. Yue, J. Yang, X, Sun, F. Wu, and C. Hou, “Contrast enhancement based on intrinsic image decomposition,” IEEE Trans. on Image Processing, vol. 26, no. 8, pp. 3981-3994, Aug. 2017.
  • [9] M. Li, J. Liu, W. Yang, X, Sun, and Z. Guo, “Structure-revealing low-light image enhancement via robust retinex model,” IEEE Trans. on Image Processing, vol. 27, no. 6, pp. 2828–2841, Jun 2018.
  • [10] Y. Zhang, J. Zhang, and X. Guo, “Kindling the darkness: a practical low-light image enhancer,” in Proceedings of the 27th ACM International Conference on Multimedia, pp. 1632-1640, 2019.
  • [11] S. Liu and Y. Zhang, “Detail-preserving underexposed image enhancement via optimal weighted multi-exposure fusion,” IEEE Trans. on Consumer Electronics, vol. 65, no. 3, pp. 303–311, Aug. 2019.
  • [12] S. Hao, X. Han, Y. Guo, X. Xu, and M. Wang, “Low-light image enhancement with semi-decoupled decomposition,” IEEE Trans. on Multimedia, 2000.
  • [13] J. Xu, C. Chen, Q. Chen and V. Koltun, “Learning to see in the dark,” in IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2018.
  • [14] K. Zhang, W. Zuo, Y. Chen, D. Meng and L. Zhang, “Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising,” IEEE Trans. on Image Processing, vol. 26, no. 7, pp. 3142-3155, Jul. 2017.
  • [15] R. Wang, Z. Qing, C. Fu, X. Shen, W. Zheng, and J. Jia, “Underexposed photo enhancement using deep illumination estimation,” in IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2019.
  • [16] P. E. Debevec and J. Malik. “Recovering high dynamic range radiance maps from photographs,” in Proc. SIGGRAPH, pp. 369–378, 1997.
  • [17] Z. G. Li, J. H. Zheng, Z. J. Zhu, W. Yao, and S. Q. Wu. “Weighted guided image filtering,” IEEE Trans. on Image Processing, vol. 24, no. 1, pp. 20–129, Jan. 2015.
  • [18] T. Celik and T. Tjahjadi, “Contextual and variational contrast enhancement,” IEEE Trans. on Image Processing, vol. 20, no. 12, pp. 3431–3441, Dec. 2011.
  • [19] B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao, “Dehazenet: An end-to-end system for single image haze removal,” IEEE Trans. on Image Processing, vol. 25, no. 11, pp. 5187–5198, Nov. 2016.
  • [20] X. Fu, D. Zeng, Y. Huang, X. P. Zhang, and X. Ding, “A weighted variational model for simultaneous reflectance and illumination estimation,” in IEEE Conference on Computer Vision and Pattern Recognition, pp. 2782–2790, Jun. 2016.
  • [21] J. Guan, R. Lai, and A. Xiong. “Wavelet deep neural network for stripe noise removal,” IEEE Access, pp. 44544–44554, 2019.
  • [22] X. Guo, Y. Li, and H. Ling, “LIME: Low-light image enhancement via illumination map estimation,” IEEE Trans. on Image Processing, vol. 26, no. 2, pp. 982–993, Feb. 2017.
  • [23] H. K. Khalil and J. W. Grizzle, Nonlinear system, Prentice Hall, 2002.
  • [24] Z. G. Li, Y. C. Soh, and C. Wen, Switched and impulsive systems: Analysis, design and applications, vol. 313. Springer Science & Business Media, 2005
  • [25] F. Kou, Z. G. Li, C. Wen, and W. H. Chen, “Multi-scale exposure fusion via gradient domain guided image filtering,” in IEEE International Conference on Multimedia and Expo., Hong Kong, China, pp. 1105-1110, Jul. 2017.
  • [26] E. H. Land, “The retinex theory of color vision,” Scientific American, vol. 237, no. 6, pp. 108–128, Jun. 1977.
  • [27] C. Li, J. Guo, F. Porikli, and Y. Pang, “Lightennet: a convolutional neural network for weakly illuminated image enhancement,” Pattern Recognition Letters, vol. 104, pp. 15–22, 2018.
  • [28] R. Liu, X. Fan, M. Hou, Z. Jiang, Z. Luo, and L. Zhang, “Learning aggregated transmission propagation networks for haze removal and beyond,” IEEE Trans. on Neural Networks and Learning Systems, vol. 30, no. 10, pp. 2973-1986, Oct. 2019.
  • [29] K. Ma, K. Zeng, and Z. Wang, “Perceptual quality assessment for multi-exposure image fusion,” IEEE Trans. on Image Processing, vol. 24, no. 11, pp. 3345–3356, Nov. 2015.
  • [30] H. L. Tan, Z. G. Li, Y. H. Tan, S. Rahardja, and C. H. Yeo, “A perceptually relevant MSE-based image quality metric,” IEEE Trans. on Image Processing, vol. 22, no. 11, pp. 4447-4459, Nov. 2013.
  • [31] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conferece on Computer Vision Pattern Recognition, pp. 770–778, 2016.
  • [32] C. Tian, Y. Xu, L. Fei, J. Wang, and J. Wen, “Enhanced cnn for image denoising,” CAAI Trans. on Intelligence Technology, vol. 4, no. 1, pp. 7–23, Jan. 2019.
  • [33] S. Wang, J. Zheng, H. Hu, and B. Li, “Naturalness preserved enhancement algorithm for non-uniform illumination images,” IEEE Trans. on Image Processing, vol. 22, no. 9, pp. 3538–3578, Sept. 2013.
  • [34] C. Wei, W. Wang, W. Yang, and J. Liu, “Deep retinex decomposition for low-light enhancement,” in British Machine Vision Conference, 2018.
  • [35] P. Xiang, L. Wang, F. Wu, and J. Cheng, “Single-image de-raining with feature-supervised generative adversarial network,” IEEE Signal Processing Letters, vol. 26, no. 5, pp.650–654, May. 2019.
  • [36] X. Xu, K. Jia, C. Qing, and D. Tao, “Single image dehazing via deep learning-based image restoration,” in 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1609–1615, 2018.
  • [37] Y. Yang, W. Cao, S. Q. Wu, and Z. G. Li, “Multi-scale fusion of two large-exposure-ratio images,” IEEE Signal Processing Letters, vol. 25, no. 12, pp. 1885-1889, Dec. 2018.
  • [38] H. Zhang and V. M. Patel, “Density-aware single image de-raining using a multi-stream dense network,” in 2018 IEEE Conference on Computer Vision and Pattern Recognition, pp. 695–704, 2018.
  • [39] W. Yao, Z. G. Li, and S. Rahardja, “Noise reduction for differently exposed images,” in 2012 International Conference on Acoustics, Speech, and Signal Processing, pp. 917–920, Mar. 2012.
  • [40] T. Mertens, J. Kautz, and F. V. Reeth, “Exposure fusion: a simple and practical alternative to high dynamic range photography,” Computer Graphics Forum, vol. 28, no. 1, pp. 161-171, Jan. 2009.
  • [41] https://github.com/zhengchaobing/Single-Image-Brightening-via-MHybrid-Learning