Physical Model Guided Deep Image Deraining

Abstract

Single image deraining is an important task because rain-degraded images cause many computer vision systems, such as video surveillance and autonomous driving, to fail, so an effective deraining algorithm is needed. In this paper, we propose a novel network based on physical model guided learning for single image deraining, which consists of three sub-networks: a rain streaks network, a rain-free network, and a guide-learning network. The rain streaks and the rain-free image estimated by the first two sub-networks are concatenated and fed into the guide-learning network to guide further learning, and the direct sum of the two estimates is constrained to match the input rainy image according to the physical model of rainy images. Moreover, we develop a Multi-Scale Residual Block (MSRB) to better utilize multi-scale information, and experiments show that it boosts deraining performance. Quantitative and qualitative experimental results demonstrate that the proposed method outperforms state-of-the-art deraining methods. The source code will be available at https://supercong94.wixsite.com/supercong94.

Index Terms— Image deraining, Multi-Scale Residual Block (MSRB), guide-learning

1 Introduction

Rain is a very common weather phenomenon. Images and videos captured in rain contain raindrops and rain streaks with different speeds, directions and density levels, which can cause many computer vision systems to fail. Removing the rain components from rainy images or videos to recover a clear background scene is therefore needed. Deraining methods fall into two categories: single image-based methods [1, 2, 3, 4, 5, 6, 7] and video-based methods [8, 9, 10, 11]. Because video-based methods can leverage temporal information by analyzing the differences between adjacent frames, they face an easier problem than single image-based methods. In this paper, we explore the more difficult problem, single image deraining.

Image deraining has attracted much attention in recent years. It is generally based on the following physical model of rainy images: the observed rainy image is modeled as a linear sum of a rain-free background image and rain streaks. Mathematically, the model can be expressed as:

\bm{O}=\bm{B}+\bm{R},

(1)

where $\bm{O}$, $\bm{B}$, and $\bm{R}$ denote the observed rainy image, the clear background image, and the rain streaks, respectively. Based on Eq. (1), deraining methods should remove $\bm{R}$ from $\bm{O}$ to obtain $\bm{B}$. This is a highly ill-posed problem because, for a given $\bm{O}$, there are in theory many pairs $(\bm{B}, \bm{R})$ that satisfy Eq. (1).
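To make the ill-posedness concrete, the following minimal sketch in plain Python shows two different background and rain pairs that yield the same observation under Eq. (1). The pixel values and the helper name `compose` are hypothetical and chosen only for illustration.

```python
def compose(background, rain):
    """Apply Eq. (1) element-wise: O = B + R."""
    return [b + r for b, r in zip(background, rain)]


# Hypothetical gray levels for a 4-pixel observed rainy image O.
observed = [120, 200, 90, 255]

# Candidate 1: light rain on a brighter background.
background_1 = [100, 150, 90, 205]
rain_1 = [20, 50, 0, 50]

# Candidate 2: heavier rain on a darker background.
background_2 = [60, 100, 90, 155]
rain_2 = [60, 100, 0, 100]

# Both candidates reproduce the same observation, so O alone
# does not determine B and R.
same_observation = (
    compose(background_1, rain_1) == observed
    and compose(background_2, rain_2) == observed
)
```

Without further constraints, nothing in Eq. (1) favors one candidate over the other, which is why priors or learned models are needed.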

To make the problem tractable, numerous conventional methods adopt various priors on rain streaks or the clean background scene to constrain the solution space, such as sparse coding [1], image decomposition [2], low-rank models [3] and Gaussian mixture models [4]. These deraining methods typically make simple assumptions about $\bm{R}$, i.e. the rain streaks, for example that the rain streaks are sparse and have similar falling directions and shapes, so they only work in some specific cases.

With the rise of deep learning, numerous methods have achieved great success in many computer vision tasks [12, 13, 14] owing to their powerful feature representation capability. Deraining has also improved significantly through deep learning-based methods [15, 16, 5, 6, 17]. However, these methods still have some limitations.

On the one hand, many existing methods estimate only the rain streaks or only the rain-free image [17, 6, 7], and they neglect that the estimated rain streaks and rain-free image can together serve as a physical model guide for the deraining process. On the other hand, multi-scale operations can better capture rain streak information at different levels, which should benefit deraining. However, numerous deep learning-based methods [6, 16, 17] do not exploit multi-scale information for deraining.

To address the above limitations, we propose a novel network based on physical model guided learning that uses the physical model to guide the learning process and applies multi-scale operations to feature maps. Specifically, the sum of the estimated rain streaks and rain-free image is compared with the corresponding rainy image as a constraint term according to the rainy physical model in Eq. (1), and their concatenation is fed into the guide-learning network as a guide. Moreover, we design a Multi-Scale Residual Block (MSRB) to obtain features at different levels.

Our contributions are summarized as follows:

• We design the guide-learning network based on the rainy physical model, and the guide boosts deraining performance on both detail and texture information.
• We propose a Multi-Scale Residual Block (MSRB) to better utilize multi-scale information, and experiments show that the block improves the rain streaks representation capability.
• Our proposed network outperforms state-of-the-art methods on synthetic and real-world datasets, both quantitatively and qualitatively.

2 Related Work

In this section, we present a brief review of single image deraining approaches, which can be split into prior-based methods and deep learning-based methods.

Among prior-based methods, Kang et al. [2] first decomposed the rainy image into low- and high-frequency layers, and then utilized sparse coding to remove the rain streaks in the high-frequency layer. Chen et al. [3] assumed that rain streaks are low-rank and proposed an effective low-rank structure to model them. Luo et al. [1] proposed a discriminative sparse coding framework to accurately separate rain streaks from clean background scenes. Li et al. [4] used patch priors based on Gaussian mixture models for both the rain streaks and the clear background to remove the rain streaks.

Among deep learning-based methods, Fu et al. [5, 6] were the first to apply deep learning to single image deraining: they decompose the rainy image into low- and high-frequency parts, and then feed the high-frequency part into a Convolutional Neural Network (CNN) to estimate the residual rain streaks. Yang et al. [7] proposed a recurrent contextual network that can jointly detect and remove rain streaks. Zhang et al. [16] designed a generative adversarial network to prevent the degeneration of the background image and utilized a perceptual loss to refine visual quality. Fan et al. [18] proposed a residual-guide network for deraining. Li et al. [17] utilized squeeze-and-excitation to learn different weights for different rain streak layers. Ren et al. [19] considered the network architecture, input and output, and loss functions, and provided a better and simpler baseline deraining network.

3 Proposed Method

Fig. 1: Overall network framework. MSRB is shown in Fig. 2. The overall network consists of three sub-networks: Rain Streaks Network, Rain-free Network, and Guide-learning Network. The Rain Streaks Network and Rain-free Network learn to estimate the rain streaks and the rain-free image, respectively, and their outputs are concatenated and fed into the Guide-learning Network for further guided learning.

In this section, we describe our proposed method in more detail, including its overall network framework, the Multi-Scale Residual Block (MSRB) and the loss functions.

3.1 Overall framework

As shown in Fig. 1, the proposed network consists of three sub-networks: the rain streaks network, the rain-free network, and the guide-learning network. The first two sub-networks share the same encoder-decoder structure. To learn better spatial contextual information for restoring the clear image, the estimated rain streaks and rain-free image are concatenated and fed into the guide-learning network, which uses multi-stream dilation convolution to further refine the deraining results. Moreover, to make the rain streaks network and the rain-free network generate better corresponding results, the sum of the estimated rain streaks and rain-free image is constrained to match the input rainy image via the $L_{1}$ norm, according to the rainy physical model in Eq. (1). Furthermore, the MSRB is designed to acquire multi-scale information by combining multi-scale operations with a residual block.

3.2 Multi-Scale Residual Block (MSRB)

Multi-scale features have been widely leveraged in many computer vision systems, such as face alignment [20], semantic segmentation [21], depth estimation [22] and single image super-resolution [23]. Combining features at different scales can result in a better representation of an object and its surrounding context. We therefore propose the Multi-Scale Residual Block (MSRB), which combines the concatenation of feature maps at different scales with a residual block, as shown in Fig. 2.

We describe the MSRB mathematically. First, we apply the $Pooling$ operation with different kernel sizes and strides to obtain multi-scale features:

y_{i}=Pooling_{i}(x),\quad i=1,2,4,

(2)

where $Pooling_{i}$ denotes the $Pooling$ operation with an $i\times i$ kernel and stride $i$.
Then, all the scales are fused and fed into three convolution layers, and the original input signal $x$ is added to learn the residual:

z=H(Cat[Up_{1}(y_{1}),\cdots,Up_{4}(y_{4})])+x,

(3)

where $Up_{i}$ denotes the $i\times$ upsampling operation and $Cat$ denotes concatenation along the channel dimension. $H$ denotes a series of operations consisting of two $3\times 3$ convolutions and one $1\times 1$ convolution. The MSRB thus learns features at different scales, and all of these features are fused to learn the primary feature.
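The multi-scale part of Eqs. (2) and (3) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the text does not specify the pooling type or the upsampling method, so average pooling and nearest-neighbour upsampling are assumed here; the input is a single-channel map whose height and width are divisible by 4; and the learned convolutions $H$ are replaced by a fixed per-branch weighted sum, which is a hypothetical stand-in rather than the authors' layers. All function names are illustrative.

```python
def avg_pool(fmap, k):
    """Average pooling with a k x k kernel and stride k on a 2-D list (assumed pooling type)."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            sum(fmap[r + dr][c + dc] for dr in range(k) for dc in range(k)) / (k * k)
            for c in range(0, w, k)
        ]
        for r in range(0, h, k)
    ]


def upsample_nearest(fmap, k):
    """Nearest-neighbour upsampling by an integer factor k (assumed upsampling type)."""
    return [[v for v in row for _ in range(k)] for row in fmap for _ in range(k)]


def multi_scale_features(x, scales=(1, 2, 4)):
    """Return Cat[Up_i(Pooling_i(x))] as a list of equally sized 2-D maps."""
    return [upsample_nearest(avg_pool(x, k), k) for k in scales]


def msrb_sketch(x, fuse_weights):
    """z = H(Cat[...]) + x, with H replaced by a fixed weighted sum (stand-in only)."""
    branches = multi_scale_features(x)
    return [
        [
            sum(wt * br[r][c] for wt, br in zip(fuse_weights, branches)) + x[r][c]
            for c in range(len(x[0]))
        ]
        for r in range(len(x))
    ]


# Hypothetical 4 x 4 single-channel feature map.
x = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
branches = multi_scale_features(x)
z = msrb_sketch(x, fuse_weights=(0.5, 0.25, 0.25))
```

Each branch has the same spatial size as `x`, so the branches can be stacked along the channel dimension before fusion. The branch for $i=1$ reproduces the input, while the branches for $i=2$ and $i=4$ carry progressively coarser context.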

3.3 Loss function

We use the $L_{1}$-norm as the loss function.

For the rain streaks network and rain-free network:

L_{rain}=\lVert\widetilde{\bm{R}}-{\bm{R}}\rVert_{1},

(4)

L_{rain-free}=\lVert\widetilde{\bm{B}}-{\bm{B}}\rVert_{1},

(5)

where $\widetilde{\bm{R}}$ and $\widetilde{\bm{B}}$ denote the estimated rain streaks layer and clean background image, and ${\bm{R}}$ and ${\bm{B}}$ denote the ground truth rain streaks and rain-free image. For the guide-learning network:

L_{guide}=\lVert\widehat{\bm{B}}-{\bm{B}}\rVert_{1},

(6)

where $\widehat{\bm{B}}$ denotes the output of the guide-learning network, i.e. the final estimated rain-free image.

Moreover, we compute the $L_{1}$-norm of the difference between the input rainy image $\bm{O}$ and the sum of $\widetilde{\bm{R}}$ and $\widetilde{\bm{B}}$, in order to constrain the solution space of the rain streaks network and the rain-free network according to the rainy physical model in Eq. (1):

L_{p}=\lVert\widetilde{\bm{B}}+\widetilde{\bm{R}}-\bm{O}\rVert_{1}.

(7)

The overall loss function is then defined as:

L=L_{guide}+\alpha L_{rain}+\beta L_{rain-free}+\gamma L_{p},

(8)

where $\alpha$, $\beta$ and $\gamma$ are constant weights whose values are given in Section 4.1.
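As a worked illustration of Eqs. (4) to (8), the sketch below evaluates the four terms and their weighted sum on flat pixel lists in plain Python. It is a sketch under assumptions, not the authors' implementation: the $L_{1}$-norm is taken literally as a sum of absolute differences (a practical implementation might average instead), images are represented as flat lists, and the function and argument names are hypothetical. The default weights are the values reported in the training settings.

```python
def l1_norm(a, b):
    """Sum of absolute differences between two flat pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def total_loss(o, b_gt, r_gt, r_est, b_est, b_final,
               alpha=0.5, beta=0.5, gamma=0.001):
    """Eq. (8): L = L_guide + alpha * L_rain + beta * L_rain-free + gamma * L_p."""
    l_rain = l1_norm(r_est, r_gt)        # Eq. (4): rain streaks network
    l_rain_free = l1_norm(b_est, b_gt)   # Eq. (5): rain-free network
    l_guide = l1_norm(b_final, b_gt)     # Eq. (6): guide-learning network
    # Eq. (7): the two estimates should add up to the observed rainy image.
    recomposed = [b + r for b, r in zip(b_est, r_est)]
    l_p = l1_norm(recomposed, o)
    return l_guide + alpha * l_rain + beta * l_rain_free + gamma * l_p


# Hypothetical 2-pixel example: ground truth satisfies O = B + R.
b_gt, r_gt = [10, 20], [5, 0]
o = [15, 20]
loss = total_loss(o, b_gt, r_gt, r_est=[4, 0], b_est=[10, 22], b_final=[10, 21])
```

In this example the individual terms are $L_{rain}=1$, $L_{rain-free}=2$, $L_{guide}=1$ and $L_{p}=3$, so the small weight $\gamma$ makes the physical constraint a mild regularizer relative to the three supervised terms.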

Table 1: Quantitative experiments evaluated on three synthetic datasets. The best results are marked in bold.

Dataset	Metric	DSC [1]	LP [4]	DDN [6]	JORDER [7]	RESCAN [17]	PReNet [19]	Ours
Rain100H	PSNR	15.66	14.26	22.26	23.45	25.92	27.89	28.96
Rain100H	SSIM	0.42	0.54	0.69	0.74	0.84	0.89	0.90
Rain100L	PSNR	24.16	29.11	34.85	36.11	36.12	36.69	38.64
Rain100L	SSIM	0.87	0.88	0.95	0.97	0.96	0.98	0.99
Rain1200	PSNR	21.44	22.46	30.95	29.75	32.35	32.38	33.42
Rain1200	SSIM	0.79	0.80	0.86	0.87	0.89	0.92	0.93

4 Experimental Results


PSNR/SSIM	22.12/0.79	20.31/0.75	24.49/0.79	24.93/0.92	25.84/0.93	Inf/1

PSNR/SSIM	22.89/0.71	20.86/0.69	24.73/0.72	26.20/0.88	27.49/0.89	Inf/1

PSNR/SSIM	30.80/0.88	31.20/0.91	32.74/0.87	33.81/0.95	35.12/0.96	Inf/1
(a) Input	(b) DDN	(c) JORDER	(d) RESCAN	(e) PReNet	(f) Ours	(g) GT

Fig. 3: Visual and quantitative comparisons on three synthetic examples. The proposed method performs better than the other four deep learning-based methods, especially in the regions marked by boxes. Our results shown in (f) have the highest PSNR and SSIM values and are the cleanest.

In this section, we conduct deraining experiments on three synthetic datasets and on real-world images, and compare with six state-of-the-art deraining methods: discriminative sparse coding (DSC) [1], layer priors (LP) [4], deep detail network (DDN) [6], the recurrent version of joint rain detection and removal (JORDER) [7], RESCAN [17] and PReNet [19].

4.1 Experiment settings

Synthetic Datasets. We evaluate the performance of our method on three synthetic datasets: Rain100H, Rain100L, and Rain1200, all of which contain rain streaks with different sizes, shapes, and directions. There are 1800 image pairs for training and 200 image pairs for testing in Rain100H and Rain100L. In Rain1200, 12000 images are used for training and 1200 images for testing. We choose Rain100H as our analysis dataset.

Real-world Testing Images. We also evaluate the performance of our method on real-world images provided by Zhang et al. [16] and Yang et al. [7]. These images contain rain components that vary in both orientation and density.

Training Settings. In the training process, we randomly crop each training image pair into a $160\times 160$ patch pair. The batch size is 64. For each convolution layer except the last, we use leaky-ReLU with negative slope $\alpha=0.2$ as the activation function. We use the ADAM algorithm [24] to optimize our network. The initial learning rate is $5\times 10^{-4}$ and is multiplied by $1/10$ at epochs 1200 and 1600; training runs for 2000 epochs in total. The loss weights $\alpha$, $\beta$ and $\gamma$ in Eq. (8) are set to 0.5, 0.5 and 0.001, respectively. Our entire network is implemented in PyTorch and trained on 8 Nvidia GTX 1080Ti GPUs.
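The learning rate schedule described above can be written as a small step function. The sketch below is plain Python with a hypothetical function name; it assumes that epochs are counted from 0 and that each reduction takes effect at the start of the stated epoch, neither of which is specified in the text.

```python
def learning_rate(epoch, base_lr=5e-4, milestones=(1200, 1600), factor=0.1):
    """Step schedule: multiply base_lr by `factor` once for every milestone reached."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * factor ** drops


# Learning rate at a few epochs of the 2000-epoch run.
schedule = {epoch: learning_rate(epoch) for epoch in (0, 1199, 1200, 1600, 1999)}
```

In PyTorch, the same behaviour is available through `torch.optim.lr_scheduler.MultiStepLR` with `milestones=[1200, 1600]` and `gamma=0.1`; whether the authors used that scheduler is not stated.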

Evaluation Criteria. We use peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) to evaluate the quality of the recovered results in comparison with ground truth images. PSNR and SSIM are computed only for synthetic datasets, because they require not only the estimated rain-free images but also the corresponding ground truth images. Real-world images can only be evaluated by visual comparison.
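For reference, PSNR follows the standard definition $10\log_{10}(MAX^{2}/MSE)$. The sketch below computes it in plain Python on flat pixel lists; the function name and the default peak value of 255 are assumptions, since the text does not state the intensity range used. SSIM is omitted because it involves local window statistics that do not fit a short example.

```python
import math


def psnr(estimate, reference, max_value=255.0):
    """PSNR in dB between two equally sized flat pixel lists."""
    mse = sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical example: every pixel is off by 10 on a 0..100 intensity scale.
example = psnr([10, 10], [0, 0], max_value=100.0)
```

The infinite value for identical inputs is consistent with the Inf/1 entries listed for the ground truth panel in Fig. 3.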

4.2 Results on synthetic datasets

Tab. 1 shows quantitative comparisons between our method and six state-of-the-art deraining methods on Rain100H, Rain100L and Rain1200. There are two conventional methods, DSC [1] (ICCV15) and LP [4] (CVPR16), and four deep learning-based methods: DDN [6] (CVPR17), JORDER [7] (CVPR17), RESCAN [17] (ECCV18) and PReNet [19] (CVPR19). Our proposed method outperforms these state-of-the-art approaches on all three datasets, with PSNR gains over the strongest baseline, PReNet, of 1.07 dB, 1.95 dB and 1.04 dB on Rain100H, Rain100L and Rain1200, respectively.

We also show several challenging synthetic examples for visual comparison in Fig. 3. As the prior-based methods are clearly worse than the deep learning-based methods according to Tab. 1, we only compare visual results with the deep learning methods. The first column in Fig. 3 shows synthetic images that are severely degraded by rain streaks. Fig. 3 (b) and Fig. 3 (c) are the results of DDN [6] and JORDER [7], both of which fail to recover an acceptably clean image. Fig. 3 (d) and Fig. 3 (e) are the results of RESCAN [17] and PReNet [19], which show unpleasing artifacts in the marked boxes. As shown in Fig. 3 (f), our method produces the best deraining results both quantitatively and visually.

4.3 Results on real-world images

To evaluate the robustness of our method on real-world images, we also provide two real-world rainy examples in Fig. 4. For the first example, our method generates the clearest and cleanest result, while the other methods leave some obvious artifacts or rain streaks. For the second example, the other methods produce unpleasing artifacts in the marked box, while our approach generates a clearer result. We provide more examples on both synthetic and real-world datasets in our supplemental materials.

4.4 Ablation study

To explore the effectiveness of the multi-scale structure and the physical model constraint in our network, we design experiments with different combinations of the proposed network components: the three sub-networks, the multi-scale structure, the multi-stream dilation convolution, and $L_{p}$. Tab. 2 shows the comparative results, where W and W/O denote models with and without the multi-scale structure. The multi-scale structure boosts deraining performance on all models, which indicates that it is effective. Furthermore, Tab. 3 compares the effectiveness of the multi-stream dilation convolution and the physical model constraint $L_{p}$. Fig. 5 provides the outputs of the three sub-networks on two real-world rainy images. The cropped patches in Fig. 5 (d) show better detail and texture information than those in Fig. 5 (c), which demonstrates that the guide-learning network is effective in our proposed network.

Table 2: Ablation study on different models. The best results are marked in bold.

	Metric	M_1	M_2	M_3	M_4
W	PSNR	28.63	28.56	28.62	28.97
W	SSIM	0.8949	0.8946	0.8968	0.9015
W/O	PSNR	28.24	28.24	28.46	28.72
W/O	SSIM	0.8898	0.8900	0.8941	0.8986

• M_1: Only the rain streaks network.
• M_2: Only the rain-free network.
• M_3: Only the estimated rain-free image is input to the guide-learning network.
• M_4: (Default) The concatenation of the estimated rain streaks and rain-free image is input to the guide-learning network.
• R_1: Our proposed network without multi-stream dilation convolution.
• R_2: Our proposed network without $L_{p}$.
• R_3: Our proposed network with multi-stream dilation convolution and $L_{p}$, i.e. our proposed final network.

$L_{p}$ is the $L_{1}$-norm of the difference between the rainy image and the direct sum of the two images estimated by the first two sub-networks.

Table 3: Analysis on the effectiveness of multi-stream dilation convolution and physical model constraint. The best results are marked in bold.

Metric	R_1	R_2	R_3
PSNR	28.95	28.92	28.97
SSIM	0.9011	0.9012	0.9015
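The size of the effects in Tables 2 and 3 can be read off with a few subtractions. The short Python sketch below only reuses the numbers printed in the two tables; the variable names are illustrative.

```python
# (PSNR, SSIM) copied from Table 2; W = with, W/O = without the multi-scale structure.
table2 = {
    "M_1": {"W": (28.63, 0.8949), "W/O": (28.24, 0.8898)},
    "M_2": {"W": (28.56, 0.8946), "W/O": (28.24, 0.8900)},
    "M_3": {"W": (28.62, 0.8968), "W/O": (28.46, 0.8941)},
    "M_4": {"W": (28.97, 0.9015), "W/O": (28.72, 0.8986)},
}

# Gain from the multi-scale structure for each model: (PSNR gain in dB, SSIM gain).
gains = {
    model: (
        round(rows["W"][0] - rows["W/O"][0], 2),
        round(rows["W"][1] - rows["W/O"][1], 4),
    )
    for model, rows in table2.items()
}

# PSNR copied from Table 3; the drop is measured relative to the full network R_3.
table3_psnr = {"R_1": 28.95, "R_2": 28.92, "R_3": 28.97}
psnr_drop = {
    name: round(table3_psnr["R_3"] - value, 2)
    for name, value in table3_psnr.items()
    if name != "R_3"
}
```

The multi-scale structure adds between 0.16 dB and 0.39 dB of PSNR depending on the model, whereas removing the multi-stream dilation convolution or $L_{p}$ lowers PSNR by 0.02 dB and 0.05 dB, respectively, so the latter two components contribute smaller margins in this ablation.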

5 Conclusion

In this paper, we propose an effective method for single image deraining. Our network is based on the rainy physical model with guide-learning, and the experiments demonstrate that the physical model constraint and guide-learning are effective. A Multi-Scale Residual Block is proposed and verified to boost deraining performance. Quantitative and qualitative experimental results on both synthetic and real-world datasets demonstrate the advantages of our network for single image deraining.

Acknowledgement

This work was supported by the National Natural Science Foundation of China [grant number 61976041]; the National Key R&D Program of China [grant number 2018AAA0100301]; and the National Science and Technology Major Project [grant numbers 2018ZX04041001-007, 2018ZX04016001-011].

References

[1] Yu Luo, Yong Xu, and Hui Ji, “Removing rain from a single image via discriminative sparse coding,” in ICCV, 2015, pp. 3397–3405.
[2] Li-Wei Kang, Chia-Wen Lin, and Yu-Hsiang Fu, “Automatic single-image-based rain streaks removal via image decomposition,” IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 1742–1755, 2012.
[3] Yi-Lei Chen and Chiou-Ting Hsu, “A generalized low-rank appearance model for spatio-temporally correlated rain streaks,” in ICCV, 2013, pp. 1968–1975.
[4] Yu Li, Robby T. Tan, Xiaojie Guo, Jiangbo Lu, and Michael S. Brown, “Rain streak removal using layer priors,” in CVPR, 2016, pp. 2736–2744.
[5] Xueyang Fu, Jiabin Huang, Xinghao Ding, Yinghao Liao, and John Paisley, “Clearing the skies: A deep network architecture for single-image rain removal,” IEEE Transactions on Image Processing, vol. 26, no. 6, pp. 2944–2956, 2017.
[6] Xueyang Fu, Jiabin Huang, Delu Zeng, Yue Huang, Xinghao Ding, and John Paisley, “Removing rain from single images via a deep detail network,” in CVPR, 2017, pp. 1715–1723.
[7] Wenhan Yang, Robby T. Tan, Jiashi Feng, Jiaying Liu, Zongming Guo, and Shuicheng Yan, “Deep joint rain detection and removal from a single image,” in CVPR, 2017, pp. 1685–1694.
[8] Kshitiz Garg and Shree K. Nayar, “Vision and rain,” International Journal of Computer Vision, vol. 75, no. 1, pp. 3–27, 2007.
[9] Kshitiz Garg and Shree K. Nayar, “Detection and removal of rain from videos,” in CVPR, 2004, pp. 528–535.
[10] Jérémie Bossu, Nicolas Hautière, and Jean-Philippe Tarel, “Rain or snow detection in image sequences through use of a histogram of orientation of streaks,” International Journal of Computer Vision, vol. 93, no. 3, pp. 348–367, 2011.
[11] Abhishek Kumar Tripathi and Sudipta Mukhopadhyay, “Removal of rain from videos: a review,” Signal, Image and Video Processing, vol. 8, no. 8, pp. 1421–1430, 2014.
[12] Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, and Jian Sun, “Cascaded pyramid network for multi-person pose estimation,” in CVPR, 2018, pp. 7103–7112.
[13] Jonathan Long, Evan Shelhamer, and Trevor Darrell, “Fully convolutional networks for semantic segmentation,” in CVPR, 2015, pp. 3431–3440.
[14] Tsung-Yi Lin, Piotr Dollár, Ross B. Girshick, Kaiming He, Bharath Hariharan, and Serge J. Belongie, “Feature pyramid networks for object detection,” in CVPR, 2017, pp. 936–944.
[15] De-An Huang, Li-Wei Kang, Min-Chun Yang, Chia-Wen Lin, and Yu-Chiang Frank Wang, “Context-aware single image rain removal,” in ICME, 2012, pp. 164–169.
[16] He Zhang, Vishwanath Sindagi, and Vishal M. Patel, “Image de-raining using a conditional generative adversarial network,” CoRR, vol. abs/1701.05957, 2017.
[17] Xia Li, Jianlong Wu, Zhouchen Lin, Hong Liu, and Hongbin Zha, “Recurrent squeeze-and-excitation context aggregation net for single image deraining,” in ECCV, 2018, pp. 262–277.
[18] Zhiwen Fan, Huafeng Wu, Xueyang Fu, Yue Huang, and Xinghao Ding, “Residual-guide network for single image deraining,” in ACM MM, 2018, pp. 1751–1759.
[19] Dongwei Ren, Wangmeng Zuo, Qinghua Hu, Pengfei Zhu, and Deyu Meng, “Progressive image deraining networks: A better and simpler baseline,” in CVPR, 2019.
[20] Xi Peng, Rogerio S. Feris, Xiaoyu Wang, and Dimitris N. Metaxas, “A recurrent encoder-decoder network for sequential face alignment,” in ECCV, 2016, pp. 38–56.
[21] Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia, “Pyramid scene parsing network,” in CVPR, 2017, pp. 6230–6239.
[22] David Eigen, Christian Puhrsch, and Rob Fergus, “Depth map prediction from a single image using a multi-scale deep network,” CoRR, vol. abs/1406.2283v1, 2014.
[23] Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, and Yun Fu, “Residual dense network for image super-resolution,” CoRR, vol. abs/1802.08797, 2018.
[24] Diederik P. Kingma and Jimmy Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2014.





Fig. 5 panels: (a) Input, (b) Rain Streaks, (c) Rain-free, (d) Output.