Single Image Dehazing via Model-based Deep-Learning

Abstract

Model-based single image dehazing algorithms restore images with sharp edges and rich details at the expense of low PSNR values. Data-driven ones restore images with high PSNR values but with low contrast, and even some remaining haze. In this paper, a novel single image dehazing algorithm is introduced by integrating model-based and data-driven approaches. Both transmission map and atmospheric light are initialized by the model-based methods, and refined by deep learning based approaches which form a neural augmentation. Haze-free images are restored by using the transmission map and atmospheric light. Experimental results indicate that the proposed algorithm can remove haze well from real-world and synthetic hazy images.

1 Introduction

Due to the effect of light scattering through small particles accumulated in the air, hazy images suffer from contrast loss of captured objects [1], color distortion [2], and reduction of dynamic range [3]. Existing works on high-level image analysis such as object detection might not perform well on the hazy images, especially for those real-world hazy images with heavy haze [5]. It is highly demanded to study single image dehazing.

Single image dehazing is widely studied in the fields of image processing and computer vision. Two popular types of single image dehazing algorithms are model-based ones [2, 4, 5] and data-driven ones [6, 7]. The model-based ones are on top of the Koschmieders law [8]. They can improve the visibility of real-world hazy images well regardless of haze degree but they cannot achieve high PSNR and SSIM values on the synthetic sets of hazy images. On the other hand, the data-driven ones perform well on the synthetic sets while their performance is poor for real-world hazy images, especially for those hazy images with heavy haze [9]. It is thus desired to have a single image dehazing algorithm which is applicable for both the synthesized and the real-world hazy images.

In this paper, a novel single image dehazing algorithm is proposed by fusing the model-based and data-driven approaches. Same as the model-based methods in [2, 4, 5], both the atmospheric light and the transmission map are required by the proposed algorithm. They are initialized by using model-based methods, and refined by data-driven approaches. The atmospheric light is initialized by using the hierarchical searching method in [11] which is based on the Koschmieders law [8]. The transmission map is initialized by using the dark direct attenuation prior (DDAP) in [5] which is extended from the dark channel prior (DCP) in [2]. The DDAP is applicable to all the pixels in the hazy image. Unfortunately, the initial transmission map suffers from morphological artifacts. The morphological artifacts caused by the DDAP are then reduced by the novel haze line averaging algorithm in [5] which is designed by using the concept of haze line in [4].

The initial atmospheric light and transmission map are then refined via the popular generative adversarial network (GAN) [21]. The initial atmospheric light and transmission map can be regarded as noisy atmospheric light and transmission map. The main function of the GAN is to reduce the noise of the initial atmospheric light and transmission map. Following this idea, the generator of the proposed GAN is constructed on top of the latest DNN for noise reduction in [12] and the Res2Net in [13]. The discriminator of the proposed GAN is based on the PatchGAN in [14]. The proposed GAN is trained by using 500 hazy images from the multiple real-world foggy image defogging (MRFID) dataset in [9] and 500 hazy images from the realistic single image dehazing (RESIDE) dataset in [23]. Existing data-driven dehazing algorithms such as [6, 7, 16, 22] are trained by using more than 10K hazy images in the RESIDE dataset in [23]. All ground-truth images of haze-free images in the RESIDE dataset [23] are available. The $l_{1}$ loss function is applied to measure the restored images for those hazy images in the RESIDE dataset. There is a clean image for each hazy image in the MRFID dataset but the clean and hazy images are captured under different lighting conditions. A new loss function is derived by using an extreme channel which is extended from the dark channel in [2]. The new loss function, the adversarial loss function [21], and one more new loss function on the gradient of the restored image are used to measure the restored images for the hazy images in the MRFID dataset. The model-based estimation and the data-driven refinement forms a neural augmentation which can be applied to improve the interpretability of pure data-driven approaches [15]. Experimental results show that the proposed algorithm outperforms existing data-driven dehazing algorithms for the real-world hazy images from the dehazing quality index (DHQI) point of view [24]. The main contribution is to propose a novel model-based deep learning framework which is applicable to the synthetic and real-world hazy images. The number of training data can be reduced significantly.

Refer to caption — Fig. 1: The proposed framework for single image dehazing via model-based GAN. The generator is on top of the DNN in [12], and the discriminator is based on the PatchGAN in [14]. Each dual attention block (DAB) [12] contains spatial attention and channel attention modules. The Recursive Residual Group (RRG) contains several DABs and one $3*3$ convolution layers.

2 Neural Augmented Single Image Dehazing

The atmospheric light $A$ and the transmission map $t$ are initialized by model-based methods, and refined by data-driven approaches. The model-based and data-driven approaches form a neural augmentation [15]. The overall framework is shown in Figure 1.

2.1 Model-Based Initialization of $A$ and $t$

In this work, the hierarchical searching method in [11] is adopted to estimate an initial value of the $A$ . The hazy image $Z$ is divided into four rectangular regions. The score of each region is defined as the mean of the average pixel value subtracted by the standard deviation of the pixel values within the region along all the color channels. The region with the highest score is further divided into four smaller regions. This process is repeated until the size of the selected region is smaller than a pre-specified threshold. Within the finally selected region, the color vector, which minimizes the distance $\|[Z_{r}(p)-255,Z_{g}(p)-255,Z_{b}(p)-255]\|$ is selected as the atmospheric light. The initial $A$ might not be very accurate, it will be refined by using a data-driven approach.

The dark direct attenuation of a hazy image $Z$ is defined as

\displaystyle\psi_{\rho}(I*t)(p)=\min_{p^{\prime}\in\Omega_{\rho}(p)}\{\min_{c\in\{R,G,B\}}\{I_{c}(p^{\prime})t(p^{\prime})\}\},

(1)

where $\Omega_{\rho}(p)$ is a square window centered at the pixel $p$ of a radius $\rho$ which is usually selected as 7. By assuming that $\psi_{\rho}(I*t)(p)$ is zero, an initial transmission map $t_{0}(p)$ is estimated as [5]

\displaystyle t_{0}(p)=1-\psi_{\rho}(\frac{Z}{A})(p).

(2)

As shown in Figure 2(c), there are visibly morphological artifacts in the restored image $I$ if the $t_{0}(p)$ is directly applied to remove the haze from the hazy image $Z$ . The haze line averaging method in [5] is adopted to reduce the morphological artifacts. The haze line $H(p^{\prime\prime})$ is defined as [4]

\displaystyle H(p^{\prime\prime})=\{p|\sum_{c\in\{R,G,B\}}|I_{c}(p^{\prime\prime})-I_{c}(p)|=0\}.

(3)

Each haze line can be identified by using a color shift hazy pixel which is defined as $(Z(p)-A)$ . The $(Z(p)-A)$ can be converted into the spherical coordinates, $[r(p)$ , $\theta(p)$ , $\psi(p)]$ where $\theta(p)$ and $\psi(p)$ are the longitude and latitude, respectively, and $r(p)$ is $\|Z(p)-A\|$ . The morphological artifacts can be reduced by the following haze line averaging [5]:

\displaystyle t(p)=\frac{\sum_{p^{\prime}\in H_{s}(p^{\prime\prime})}t_{0}(p^{\prime})}{\sum_{p^{\prime}\in H_{s}(p^{\prime\prime})}r(p^{\prime})}r(p),

(4)

where $H_{s}(p^{\prime\prime})$ is a subset of $H(p^{\prime\prime})$ , and it is obtained as follows:

A 2-D histogram binning of $\theta$ and $\psi$ with uniform edges in the range $[0,2\pi]\times[0,\pi]$ is adopted to generate an initial set of $H(p^{\prime\prime})$ . The bin size is chosen as $\pi/720\times\pi/720$ . An upper bound $\nu$ is defined for the cardinality of the final sets. It is usually selected as 200. Let $|H(p^{\prime\prime})|$ be the cardinality of the set $H(p^{\prime\prime})$ . The set $H(p^{\prime\prime})$ is dividedly into $\max\{1,|H(p^{\prime\prime})|/\nu)$ sub-sets for a non-empty $H(p^{\prime\prime})$ .

As illustrated in Figure 2(d), the morphological artifacts are significantly reduced by the haze-line averaging (4). The remaining artifacts will be removed by a data-driven refinement.

2.2 Data-Driven Refinement of $A$ and $t$

Instead of using edge-preserving smoothing filters [18, 19, 20], the GAN [21] is utilized to refine the atmospheric light and transmission map. The generator is thus built up on top of the latest DNN for the noise reduction in [12]. The residual net in [12] is replaced by the Res2Net in [13] to well capture the multi-scale features at a granular level. Both spatial attention module and channel attention module are utilized in [12]. The modules can suppress the less useful features and only allow the propagation of more informative ones. Therefore, they effectively deal with the uneven distribution of haze. The DNN for the atmospheric light is simpler than the DNN for the transmission map. The discriminator is based on the PatchGAN in [14].

The proposed GAN is trained by using 500 images from the MRFID dataset [9] and 500 images with heavy haze regenerated from the RESIDE dataset [23]. Depth is estimated by using the algorithm in [26]. The scattering coefficients of 100 images are randomly generated in $[1.2,2.0]$ and those of the others are randomly selected in $[2.5,3.0]$ . The $A_{c}$ ’s are randomly generated in $[0.625,1.0]$ for the color channel $c$ independently. The heavily hazy images are then generated by using the Koschmieders law [8]. The MRFID dataset contains foggy and clean images of 200 outdoor scenes in different lighting conditions. For each scene, one clear image and four foggy images of different densities defined as slightly, moderately, highly, and extremely foggy, are manually selected from images taken from these scenes over the course of one calendar year.

Loss functions also play an important role in the proposed GAN. Since both the atmospheric light $A$ and transmission map $t(p)$ are refined by the GAN, the loss functions are defined by using the following restored image $I$ :

I(p)=\frac{Z(p)-A}{\max\{t(p),0.1\}}+A.

(5)

Due to the different lighting conditions, the conventional loss functions include $l_{1}$ and $l_{2}$ loss functions are not applicable for the MRFID dataset. A new loss function is proposed by introducing an extreme channel. Let $\bar{I}_{c}(p)$ be defined as

\displaystyle\bar{I}_{c}(p)=\min\{I_{c}(p),255-I_{c}(p)\},

(6)

and the extreme channel of the image $I$ is defined as

\displaystyle\phi_{\rho}(I)(p)=\psi_{\rho}(\bar{I})(p).

(7)

Considering a pair of hazy image $Z$ and clean image $T$ which are captured from the same scene, the corresponding haze-free image of the hazy image is $I$ . It can be known from the conventional imaging model [25] that

\displaystyle[T(p),I(p)]=[L_{T}(p),L_{I}(p)]R(p),

(8)

where $L_{T}(p)$ and $L_{I}(p)$ are the intensities of ambient light when the images $T$ and $I$ are captured. $R(p)$ is the ambient reflectance coefficient of surface, and it highly depends on the smoothness or texture of the surface. Since both the $L_{T}(p)$ and $L_{I}(p)$ are constant in a small neighborhood, $\phi_{\rho}(I)(p)$ and $\phi_{\rho}(T)(p)$ are usually determined by the reflection $R(p)$ . Thus, it follows that

\displaystyle\phi_{\rho}(I)(p)\approx\phi_{\rho}(T)(p)\approx 0.

(9)

The extreme channel of the restored image is required to match that of the clean image $T$ while the dark channel of the restored image is required to be zero in [27]. Let $W$ and $H$ be the width and height of the image $I$ , respectively. The first new loss function for the MRFID dataset is defined by using the extreme channels $\phi_{\rho}(I)$ and $\phi_{\rho}(T)$ as

L_{e}=\frac{1}{WH}\sum_{p}\|\phi_{\rho}(I)(p)-\phi_{\rho}(T)\|_{2}^{2}.

(10)

Table 1: Average PSNR value of 500 outdoor hazy images in the SOTS for different algorithms

\uparrow

	FD-GAN [22]	RefineDNet [16]	FFA-Net [6]	PSD [10]	DCP [18]	HLP [4]	MSBDN [7]	Ours
PSNR	20.78	20.80	32.13	15.15	17.49	18.06	30.25	21.42

Table 2: Average DHQI values of 79 real-world outdoor hazy images for different algorithms

\uparrow

	FD-GAN [22]	RefineDNet [16]	FFA-Net [6]	PSD [10]	DCP [18]	HLP [4]	MSBDN [7]	Ours
DHQI	51.00	57.57	55.33	50.60	51.92	52.75	54.32	58.88

Let $\nabla_{h}$ and $\nabla_{v}$ represent the horizontal and vertical gradients, respectively. The gradients of the restored image are required to approach those of the clean image $T$ rather than zeros as in [27]. The second new loss function for the MRFID dataset is defined by using the gradients of the restored image $I$ and clean image $T$ as

\displaystyle L_{t}=\frac{{\displaystyle\sum_{p,c}}(|\nabla_{h}I_{c}(p)-\nabla_{h}T_{c}(p)|+|\nabla_{v}I_{c}(p)-\nabla_{v}T_{c}(p)|)}{WH}.

(11)

The third adversarial loss function for the MRFID dataset is defined by using the restored image $I$ and clean image $T$ as

\displaystyle L_{adv}=\log(D(T))+\log(1-D(I)).

(12)

The overall loss function for the images in the MRFID dataset is defined as

L_{d,1}=w_{adv}L_{adv}+w_{e}L_{e}+L_{t},

(13)

where $w_{adv}$ and $w_{e}$ are two constants, and they are empirically selected as 1 and 200, respectively.

Since the ground-truth image $T$ is available in the RESIDE dataset, the loss function for the image in the RESIDE dataset is defined as

\displaystyle L_{d,2}=\frac{1}{WH}\sum_{p}|I(p)-T(p)|.

(14)

Step 1.

Initialize the atmospheric light $A_{0}$ from the hazy image $Z$ using the method in [11].
Step 2.

Initialize the transmission map $t_{0}$ by using the DDAP as in the equation (2).
Step 3.

Reduce the morphological artifacts of $t_{0}$ via the nonlocal haze line averaging (4).
Step 4.

Refine the atmospheric light and transmission map using the GAN in Figure 1.
Step 5.

Restore the haze free image $I$ via the equation (5).

Algorithm 1 Model-based single image deep dehazing

The atmospheric light $A$ and transmission map $t(p)$ in the equation (4) are refined by minimizing the following overall loss function in each batch:

\displaystyle L_{d}=w_{R}L_{d,2}+L_{d,1},

(15)

where $w_{R}$ is a constant, and it is empirically selected as 100 in this paper. The proposed algorithm is summarized in algorithm 1.

3 Experimental Results

Due to the space limitation, the ablation studies are omitted and this section focuses on comparing the proposed dehazing algorithm with seven state-of-the-art ones including RefineDNet [16], FFA-Net [6], PSD [10], DCP [18], HLP [4], MSBDN [7], and FD-GAN [22]. The algorithms HLP [4] and DCP [18] are model-based algorithms, the proposed algorithm and the RefineDNet [16] are combination of model-based and data-driven approaches, while the others are data-driven algorithms. All the results in [16, 6, 10, 7, 22] are generated by their publicly shared codes.

The proposed GAN is trained by using the hazy and clean images of 125 outdoor scenes from the MRFID dataset with the corresponding 500 hazy images as well as 500 images with heavy haze generated using ground-truth images from the RESIDE dataset [23]. The hazy images of the 25 outdoor scenes from the MRFID dataset with the corresponding 100 hazy images are selected as the the validation set. All these hazy images are randomly selected from the two datasets. The test images comprise 500 outdoor hazy images in the synthetic objective testing set (SOTS) [23], and 79 real-world hazy images in [5] which include 31 images are from the RESIDE [23], and the 19 images from [17] and the Internet.

The PSNR is first adopted to compare all these algorithms by using 500 synthetic outdoor hazy images in the SOTS [23]. The average PSNR values of the eight algorithms are given in Table 1. The FFA-Net [6] and MSBDN [7] are two CNN based dehazing algorithms, and they are optimized to provide high PSNR values on the SOTS dataset. Both the proposed algorithm and the algorithm in [16] indeed outperform the model-based algorithms in [4] and [18].

The quality index DHQI in [24] is then adopted to compare all these algorithms by using the 79 real-world hazy images in [5]. The average DHQI values of the 79 real-world outdoor hazy images are given in Table 2. The proposed algorithm outperforms others from the DHQI point of view.

Finally, all these dehazing algorithms are compared subjectively as in Figure 3. Readers are invited to view to electronic version of figures and zoom in them so as to better appreciate differences among all the images. Although the FFA-Net [6] and MSBDN [7] achieve high PSNR and SSIM values for the synthetic hazy images, their restored results are blurry for real-world hazy images and the dehazed images are not photo-realistic. In addition, the haze is not reduced well if it is heavy. The DCP [18], HLP [4], RefineDNet [16], FD-GAN [22], and the proposed algorithm can be applied to generate photo realistic images. There are visible morphological artifacts in the restored images by the PSD [10], RefineDNet [16] and FD-GAN [22]. Textures generated by the RefineDNet [16] and FD-GAN [22] are different from the real ones. The DCP [18] and HLP [4] restore more vivid and sharper images at the expense of amplified noise in sky regions. All these problems are overcome by the proposed algorithm.

4 Conclusion Remarks and Discussion

A new single image dehazing algorithm is introduced in this paper. Both transmission map and atmospheric light are obtained by a neural augmentation which consists of model-based estimation and data-driven refinement. They are then applied to restore a haze-free image. Experimental results validate that the proposed algorithm removes haze well from the synthetic and real-world hazy images. The proposed neural augmentation reduces the number of training data significantly. This paper focused on day-time hazy images. The proposed framework will be extended to study night-time hazy images [28] in our future research.

References

[1] S. G. Narasimhan and S. K. Nayar, “Contrast restoration of weather degraded images,” IEEE Trans. On Pattern Analysis and Machine Learning, vol. 25, no. 6, pp. 713-724, Jun. 2003.
[2] K. He, J. Sun, and X. Tang, “Single image haze removal using dark channel prior,” IEEE Trans. On Pattern Analysis and Machine Learning, vol 33, no. 12, pp. 2341-2353, Dec. 2011.
[3] F. Kou, Z. Wei, W. Chen, X. Wu, C. Wen, and Z. Li, “Intelligent detail enhancement for exposure fusion,” IEEE Trans. on Multimedia, vol. 20, no. 2, pp. 484–495, Feb. 2018.
[4] D. Berman, T. Treibitz, and S. Avidan, “Non-local Image Dehazing,” in IEEE Conference on Computer Vision and Pattern Recognition, pp. 1674-1682, 2016.
[5] Z. Li, H. Shu, and C. Zheng, “Multi-scale single image dehazing Using Laplacian and Gaussian pyramids,” IEEE Trans. on Image Processing, vol. 30, no. 12, pp. 9270-9279, Dec. 2021.
[6] X. Qin, Z. Wang, Y. Bai, X. Xie, and H. Jia, “FFA-Net: Feature fusion attention network for single image dehazing,” in AAAI Conference on Artificial Intelligence, pp. 11908-11915, 2020.
[7] H. Dong, J. Pan, L. Xiang, Z. Hu, X. Zhang, F. Wang, and M. Yang, “Multi-scale boosted dehazing network with dense feature fusion,” in IEEE Conference on Computer Vision and Pattern Recognition, 2020.
[8] H. Koschmieder, “Theorie der horizontalen sichtweite,” in Proc. Beitrage Phys. Freien Atmos., pp. 33-53, 1924.
[9] W. Liu, F. Zhou, T. Lu, J. Duan, and G. Qiu, “Image defogging quality assessment: real-world database and method,” IEEE Trans. on Image Processing, 30, pp. 176-190, 2021.
[10] Z. Chen, Y. Wang, Y. Yang, and D. Liu, “PSD: principled synthetic-to-real dehazing guided by physical priors,” in IEEE Conference on Computer Vision and Pattern Recognition, 2021.
[11] J. H. Kim, W. D. Jang, J. Y. Sim, and C. F. Kim, “Optimized contrast enhancement for real-time image and video dehazing,” Journal of Visual Communication and Image Representation, vol. 24, no. 3, pp. 410-425, Apr. 2013.
[12] S. Zamir, A. Arora, S. Khan, M. Hayat, F. Khan, M. Yang, and L. Shao, “CycleISP: real image restoration via improved data synthesis,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
[13] S. Gao, M. Cheng, K. Zhao, X. Zhang, M. Yang, and P. Torr, “Res2Net: A New Multi-scale Backbone Architecture,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 43, no. 2, pp. 652-662, Feb. 2021.
[14] P. Isola, J. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1125-1134, 2017.
[15] N. Shlezinger, J. Whang, Y. C. Eldar, and A. G. Dimakis, “Model-based deep learning,” arXiv Preprint, arXiv: 2012.08405v2, Jun 2021.
[16] S. Zhao, L. Zhang, Y. Shen, and Y. Zhou, “RefineDNet: A weakly supervised refinement framework for single image dehazing,” IEEE Trans. on Image Processing, vol. 30, pp. 3391-3404, 2021.
[17] R. Fattal, “Dehazing using color-lines,” ACM Trans. on Graphics, vol 34, no. 1, articale 13, Jan. 2014.
[18] K. He, J. Sun, and X. Tang, “Guided image filtering,” IEEE Trans. on Pattern Analyis and Machine Intelligence, vol. 35, no. 6, pp. 1397-1409, Jun. 2013.
[19] Z. Li, J. Zheng, Z. Zhu, W. Yao, and S. Wu, “Weighted guided image filtering,” IEEE Trans. on Image Processing, vol. 24, no. 1, pp. 120-129, Jan. 2015.
[20] F. Kou, W. Chen, Z. Li, and C. Wen, “Content adaptive image detail enhancement,” IEEE Signal Processing Letters, vol. 22, no. 2, pp. 211-215, Feb. 2015.
[21] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengi, “Generative adversarial nets,” in NIPS 2014, Canada, Dec. 2014.
[22] Y. Dong, Y. Liu, H. Zhang, S. Chen, and Y. Qiao, “FD-GAN: Generative adversarial networks with fusiondiscriminator for single image dehazing,” in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 10728-10736, 2020.
[23] B. Li, W. Ren, D. Fu, D. Tao, D. Feng, W. Zeng, and Z. Wang, “Benchmarking single-image dehazing and beyond,” IEEE Trans. on Image Processing vol. 28, no. 1, pp. 492-505, Jan. 2018.
[24] X. Min, G. Zhai, K. Gu, X. Yang, and X. Guan, “Objective quality evaluation of dehazed images,” IEEE Trans. on Intelligent Transportation Systems, vol. 20, no. 8, pp. 2879-2892, Aug. 2019.
[25] R. C. Gonzalez and R. E. Woods, Digital Image Processing, Englewood Cliffs, NJ, USA: Prentice-Hall, 2002.
[26] Z. Li and N. Snavely, “Megadepth: Learning single-view depth prediction from internet photos,” in IEEE/VF Conference on Computer Vision and Pattern Recognition, pp. 2041–2050, 2018.
[27] L. Li, Y. Dong, W. Ren, J. Pan, C. Gao, N. Sang, and M. Yang, “Semi-supervised image dehazing,” IEEE Trans. on Image Processing, vol. 29, pp. 2766-2779, 2020.
[28] M. Yang, J. Liu, and Z. Li, “Super-pixel based single nighttime image haze removal,” IEEE Trans. on Multimedia, vol. 20, no. 11, pp. 3008-3018, Nov. 2018.