Physical Model Guided Deep Image Deraining
Abstract
Single image deraining is an urgent task because degraded rainy images cause many computer vision systems, such as video surveillance and autonomous driving, to fail. An effective deraining algorithm is therefore needed. In this paper, we propose a novel network based on physical model guided learning for single image deraining, which consists of three sub-networks: a rain streaks network, a rain-free network, and a guide-learning network. The rain streaks and the rain-free image estimated by the first two sub-networks are concatenated and fed to the guide-learning network to guide further learning, while their direct sum is constrained to match the input rainy image according to the physical model of rainy images. Moreover, we develop the Multi-Scale Residual Block (MSRB) to better utilize multi-scale information, and experiments show that it boosts deraining performance. Quantitative and qualitative experimental results demonstrate that the proposed method outperforms state-of-the-art deraining methods. The source code will be available at https://supercong94.wixsite.com/supercong94.
Index Terms— Image deraining, Multi-Scale Residual Block (MSRB), guide-learning
1 Introduction
Rain is a very common weather phenomenon. Images and videos captured in rain contain raindrops and rain streaks with different speeds, directions and density levels, which can cause many computer vision systems to fail. Removing the rain components from rainy images or videos to obtain a clear background scene is therefore needed. Deraining methods fall into two categories: single image-based methods [1, 2, 3, 4, 5, 6, 7] and video-based methods [8, 9, 10, 11]. Because temporal information can be leveraged by analyzing the difference between adjacent frames in a video, video-based deraining is easier than single image-based deraining. In this paper, we explore the more difficult problem, single image deraining.
Image deraining has attracted much attention in recent years. It is generally based on the following physical model of rainy images: the observed rainy image is modeled as a linear sum of a rain-free background image and rain streaks. Mathematically, the model can be expressed as:
$$O = B + R \tag{1}$$
where $O$, $B$, and $R$ denote the observed rainy image, the clear background image, and the rain streaks, respectively. Based on Eq. (1), deraining methods should remove $R$ from $O$ to obtain $B$. This is a highly ill-posed problem because, for a given $O$, there are theoretically many possible solutions $(B, R)$.
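To make the ill-posedness concrete, the following minimal sketch shows two different background/rain pairs that produce exactly the same observation under the additive model. It is an illustration only: the pixel values and the helper name `compose` are assumptions and do not come from the paper.

```python
def compose(background, rain):
    """Additive rain model: observation = background + rain streaks, per pixel."""
    return [b + r for b, r in zip(background, rain)]


# A tiny 1-D "image" with made-up intensities in [0, 1].
observed = [0.75, 0.5, 1.0, 0.25]

# Candidate decomposition 1: no rain at all.
background_1 = [0.75, 0.5, 1.0, 0.25]
rain_1 = [0.0, 0.0, 0.0, 0.0]

# Candidate decomposition 2: a darker background plus a rain layer.
background_2 = [0.5, 0.5, 0.75, 0.25]
rain_2 = [0.25, 0.0, 0.25, 0.0]

# Both pairs reproduce the observation exactly, so the observation alone
# cannot tell them apart; priors or learned models must break the tie.
assert compose(background_1, rain_1) == observed
assert compose(background_2, rain_2) == observed
```

Any method that recovers the background from a single rainy image therefore has to bring in extra information, either hand-crafted priors or knowledge learned from data.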
To make the problem tractable, numerous conventional methods adopt priors on the rain streaks or the clean background scene to constrain the solution space, such as sparse coding [1], image decomposition [2], low-rank models [3] and Gaussian mixture models [4]. These methods typically make simple hypotheses about the rain streaks, for example that they are sparse and share similar falling directions and shapes, which hold only in some specific cases.
With the rise of deep learning, numerous methods have achieved great success in many computer vision tasks [12, 13, 14] thanks to their powerful feature representation capability. Deraining has also improved significantly with deep learning-based methods [15, 16, 5, 6, 17]. However, these methods still have some limitations.
On the one hand, many existing methods estimate only the rain streaks or only the rain-free image [17, 6, 7], neglecting that the estimated rain streaks and rain-free image together can serve as a physical model guide for the deraining process. On the other hand, multi-scale operations can better capture rain streak information at different levels, which should benefit deraining. However, numerous deep learning-based methods [6, 16, 17] do not consider multi-scale information.
To handle the above limitations, we propose a novel network based on physical model guided learning, which uses the physical model to guide the learning process and applies multi-scale processing to the feature maps. Specifically, the sum of the estimated rain streaks and rain-free image is compared with the corresponding rainy image as a constraint term according to the physical model in Eq. (1), and their concatenation is fed into the guide-learning network as guidance. Moreover, we design a Multi-Scale Residual Block (MSRB) to obtain features at different levels.
Our contributions are summarized as follows:
- We design the guide-learning network based on the rainy physical model, and this guidance boosts deraining performance on both detail and texture information.
- We propose a Multi-Scale Residual Block (MSRB) to better utilize multi-scale information, and experiments show that the block improves the representation of rain streaks.
- Our proposed network outperforms state-of-the-art methods on synthetic and real-world datasets, both quantitatively and qualitatively.
2 Related Work
In this section, we briefly review single image deraining approaches, which can be split into prior-based methods and deep learning-based methods.
For prior-based methods, Kang et al. [2] first decomposed the rainy image into low- and high-frequency layers, and then used sparse coding to remove the rain streaks in the high-frequency layer. Chen et al. [3] assumed that rain streaks are low-rank and proposed an effective low-rank structure to model them. Luo et al. [1] proposed a discriminative sparse coding framework to accurately separate rain streaks from clean background scenes. Li et al. [4] used patch priors based on Gaussian mixture models for both rain streaks and the clear background to remove rain streaks.
For deep learning-based methods, Fu et al. [5, 6] first applied deep learning to single image deraining: they decompose the rainy image into low- and high-frequency parts and feed the high-frequency part into a Convolutional Neural Network (CNN) to estimate the residual rain streaks. Yang et al. [7] proposed a recurrent contextual network that jointly detects and removes rain streaks. Zhang et al. [16] designed a generative adversarial network to prevent degradation of the background image and used a perceptual loss to refine visual quality. Fan et al. [18] proposed a residual-guide network for deraining. Li et al. [17] used squeeze-and-excitation to learn different weights for different rain streak layers. Ren et al. [19] considered the network architecture, input and output, and loss functions, and provided a better and simpler baseline deraining network.
3 Proposed Method
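Figure 1: Overall framework of the proposed network, consisting of the rain streaks network, the rain-free network, and the guide-learning network.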
In this section, we describe our proposed method in detail, including the overall network framework, the Multi-Scale Residual Block (MSRB) and the loss functions.
3.1 Overall framework
As shown in Fig. 1, the proposed network consists of three sub-networks: a rain streaks network, a rain-free network, and a guide-learning network. The first two sub-networks share the same encoder-decoder structure. To learn better spatial contextual information for restoring the clear image, the estimated rain streaks and rain-free image are concatenated and fed into the guide-learning network, which uses multi-stream dilation convolution to further refine the deraining result. Moreover, to push the rain streaks network and the rain-free network towards better estimates, the sum of the estimated rain streaks and rain-free image is constrained, via a norm-based loss, to match the rainy input according to the physical model in Eq. (1). Furthermore, the MSRB is designed to acquire multi-scale information by combining multi-scale operations with a residual block.
3.2 Multi-Scale Residual Block (MSRB)
Multi-scale features have been widely leveraged in many computer vision tasks, such as face alignment [20], semantic segmentation [21], depth estimation [22] and single image super-resolution [23]. Combining features at different scales can yield a better representation of an object and its surrounding context. We therefore propose the Multi-Scale Residual Block (MSRB), which combines feature maps at different scales with a residual block, as shown in Fig. 2.
Figure 2: Structure of the Multi-Scale Residual Block (MSRB).
We describe the MSRB mathematically. First, we apply a downsampling operation with different kernel sizes and strides to the input $x$ to obtain the multi-scale features:
$$y_s = D_s(x) \tag{2}$$
where $D_s$ denotes the downsampling operation whose kernel size and stride are set by the scale $s$.
Then, all the scales are fused and fed into three convolution layers, and the original input signal is added to learn the residual:
$$z = H\big(\mathrm{Cat}\,[\,U_s(y_s)\,]_s\big) + x \tag{3}$$
where $U_s$ denotes the upsampling operation that restores $y_s$ to the input resolution, $\mathrm{Cat}$ denotes concatenation along the channel dimension, and $H$ denotes a series of three convolution operations (two with one kernel size and one with another). The MSRB thus learns features at different scales and fuses them to learn the primary feature.
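The data flow of the block (downsample at several scales, upsample back, fuse, add the input) can be sketched in plain Python on a single-channel 2-D grid. This is a simplified illustration under several assumptions that are not taken from the paper: average pooling is used as the downsampling operation, the scales are (1, 2, 4), upsampling is nearest-neighbour, and the learned convolutions are replaced by a fixed average over the branches. All function names are hypothetical.

```python
def avg_pool(x, s):
    """Average-pool a 2-D grid with an s x s window and stride s."""
    h, w = len(x), len(x[0])
    return [
        [
            sum(x[i + di][j + dj] for di in range(s) for dj in range(s)) / (s * s)
            for j in range(0, w - s + 1, s)
        ]
        for i in range(0, h - s + 1, s)
    ]


def upsample_nearest(y, s):
    """Nearest-neighbour upsampling by an integer factor s."""
    out = []
    for row in y:
        wide = [v for v in row for _ in range(s)]
        out.extend([list(wide) for _ in range(s)])
    return out


def msrb_sketch(x, scales=(1, 2, 4)):
    """Multi-scale branches -> upsample -> fuse -> residual add.

    The height and width of x must be divisible by every scale.
    """
    h, w = len(x), len(x[0])
    # One branch per scale: downsample, then restore the input resolution.
    branches = [upsample_nearest(avg_pool(x, s), s) for s in scales]
    # "Concatenation" is represented as a per-pixel list of branch values.
    stacked = [[[b[i][j] for b in branches] for j in range(w)] for i in range(h)]
    # Stand-in for the learned convolutions: a fixed average over branches.
    fused = [[sum(px) / len(px) for px in row] for row in stacked]
    # Residual connection: add the original input back.
    return [[fused[i][j] + x[i][j] for j in range(w)] for i in range(h)]


x = [[1.0] * 4 for _ in range(4)]
out = msrb_sketch(x)  # same 4 x 4 shape as the input
```

In a real network each branch would carry many channels and the fusion would be learned; the sketch only shows that every branch returns to the input resolution before fusion, which is what makes the residual addition possible.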
3.3 Loss function
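We use a norm-based loss, written $\lVert\cdot\rVert$, for every term.
For the rain streaks network and the rain-free network:
$$L_r = \lVert \hat{R} - R \rVert \tag{4}$$
$$L_b = \lVert \hat{B} - B \rVert \tag{5}$$
where $\hat{R}$ and $\hat{B}$ denote the estimated rain streaks layer and clean background image, and $R$ and $B$ denote the ground truth rain streaks and rain-free image. For the guide-learning network:
$$L_g = \lVert \hat{B}_g - B \rVert \tag{6}$$
where $\hat{B}_g$ denotes the output of the guide-learning network, i.e. the final estimated rain-free image.
Moreover, we compute the norm of the difference between the input rainy image $O$ and the sum of $\hat{R}$ and $\hat{B}$, in order to constrain the solution space of the rain streaks and rain-free networks according to the rainy physical model in Eq. (1):
$$L_p = \lVert O - (\hat{R} + \hat{B}) \rVert \tag{7}$$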
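The overall loss function is defined as:
$$L = L_g + \alpha L_r + \beta L_b + \lambda L_p \tag{8}$$
where $L_r$, $L_b$, $L_g$ and $L_p$ are the loss terms of Eqs. (4), (5), (6) and (7), respectively, and $\alpha$, $\beta$ and $\lambda$ are constants.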
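The combination of the four loss terms can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the norm is taken to be a mean absolute difference, the guide-learning term is left unweighted, and the weights 0.5, 0.5 and 0.001 reported in the training settings are assigned to the rain streaks, rain-free and physical model terms, in that order. All function and parameter names are hypothetical.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized flat images."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def total_loss(rainy, rain_est, bg_est, final_est, rain_gt, bg_gt,
               w_rain=0.5, w_bg=0.5, w_phys=0.001):
    """Weighted sum of the four loss terms (weight assignment is an assumption)."""
    loss_rain = mean_abs_diff(rain_est, rain_gt)    # rain streaks network
    loss_bg = mean_abs_diff(bg_est, bg_gt)          # rain-free network
    loss_guide = mean_abs_diff(final_est, bg_gt)    # guide-learning network
    # Physical model term: the estimated layers should add up to the rainy input.
    recomposed = [r + b for r, b in zip(rain_est, bg_est)]
    loss_phys = mean_abs_diff(rainy, recomposed)
    return loss_guide + w_rain * loss_rain + w_bg * loss_bg + w_phys * loss_phys


# Toy example: ground truth layers and a rainy image that is their sum.
rain_gt = [0.5, 0.0]
bg_gt = [0.5, 0.5]
rainy = [1.0, 0.5]

# Perfect estimates give zero loss.
perfect = total_loss(rainy, rain_gt, bg_gt, bg_gt, rain_gt, bg_gt)
```

The physical model term is the only one that needs no ground truth layers: it compares the network outputs with the rainy input itself, which is why it can act as a consistency check between the two estimating sub-networks.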
Table 1: Quantitative comparison (PSNR and SSIM) on Rain100H, Rain100L and Rain1200.
Dataset | Metric | DSC [1] | LP [4] | DDN [6] | JORDER [7] | RESCAN [17] | PReNet [19] | Ours |
---|---|---|---|---|---|---|---|---|
Rain100H | PSNR | 15.66 | 14.26 | 22.26 | 23.45 | 25.92 | 27.89 | 28.96 |
Rain100H | SSIM | 0.42 | 0.54 | 0.69 | 0.74 | 0.84 | 0.89 | 0.90 |
Rain100L | PSNR | 24.16 | 29.11 | 34.85 | 36.11 | 36.12 | 36.69 | 38.64 |
Rain100L | SSIM | 0.87 | 0.88 | 0.95 | 0.97 | 0.96 | 0.98 | 0.99 |
Rain1200 | PSNR | 21.44 | 22.46 | 30.95 | 29.75 | 32.35 | 32.38 | 33.42 |
Rain1200 | SSIM | 0.79 | 0.80 | 0.86 | 0.87 | 0.89 | 0.92 | 0.93 |
4 Experimental Results
Figure 3: Deraining results on three synthetic examples. Panels: (a) Input, (b) DDN, (c) JORDER, (d) RESCAN, (e) PReNet, (f) Ours, (g) GT. PSNR/SSIM of each result against the ground truth:
Example | (b) DDN | (c) JORDER | (d) RESCAN | (e) PReNet | (f) Ours | (g) GT |
---|---|---|---|---|---|---|
1 | 22.12/0.79 | 20.31/0.75 | 24.49/0.79 | 24.93/0.92 | 25.84/0.93 | Inf/1 |
2 | 22.89/0.71 | 20.86/0.69 | 24.73/0.72 | 26.20/0.88 | 27.49/0.89 | Inf/1 |
3 | 30.80/0.88 | 31.20/0.91 | 32.74/0.87 | 33.81/0.95 | 35.12/0.96 | Inf/1 |
In this section, we conduct deraining experiments on three synthetic datasets and on real-world images, and compare with six state-of-the-art deraining methods: discriminative sparse coding (DSC) [1], layer priors (LP) [4], deep detail network (DDN) [6], the recurrent version of joint rain detection and removal (JORDER) [7], RESCAN [17] and PReNet [19].
4.1 Experiment settings
Synthetic Datasets. We carry out experiments to evaluate the performance of our method on three synthetic datasets: Rain100H, Rain100L, and Rain1200, all of which contain rain streaks of various sizes, shapes, and directions. Rain100H and Rain100L have 1800 image pairs for training and 200 image pairs for testing. Rain1200 has 12000 images for training and 1200 images for testing. We choose Rain100H as our analysis dataset.
Real-world Testing Images. We also evaluate the performance of our method on real-world images provided by Zhang et al. [16] and Yang et al. [7]. These images contain rain components that vary in orientation and density.
Training Settings. During training, we randomly crop each training image pair into patch pairs. The batch size is 64. Each convolution layer except the last uses leaky-ReLU as the activation function. We use the ADAM algorithm [24] to optimize our network. The learning rate is decayed twice, at epochs 1200 and 1600, and training runs for 2000 epochs in total. The three weighting constants of the overall loss are set to 0.5, 0.5 and 0.001, respectively. Our entire network is trained on 8 Nvidia GTX 1080Ti GPUs using PyTorch.
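The two-step learning rate decay described above can be written as a small function. This is a sketch only: the milestones 1200 and 1600 come from the text, while the initial learning rate and the decay factor passed in the example are placeholder values chosen for illustration. The function name is hypothetical.

```python
def learning_rate(epoch, base_lr, decay, milestones=(1200, 1600)):
    """Step schedule: multiply base_lr by `decay` once per milestone already reached."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * decay ** passed


# Placeholder values (not from the paper): base_lr=1.0, decay=0.5.
schedule = [learning_rate(e, 1.0, 0.5) for e in (0, 1199, 1200, 1600, 1999)]
```

In PyTorch, the same behaviour is available through `torch.optim.lr_scheduler.MultiStepLR`, which takes the milestones and a multiplicative factor `gamma`.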
Evaluation Criteria. We use peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) to evaluate the quality of the recovered results against the ground truth images. PSNR and SSIM are computed only for the synthetic datasets, because they require the corresponding ground truth images in addition to the estimated rain-free images. Real-world images can only be evaluated by visual comparison.
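For reference, PSNR can be computed as in the following sketch, which uses the standard definition $10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$. It assumes flat lists of pixel values scaled to [0, 1] (so the peak value is 1.0); the function name and the example values are illustrative. It also shows why a ground truth image compared with itself is reported as infinite PSNR.

```python
import math


def psnr(estimate, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB for two equally sized flat images."""
    mse = sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no error, so PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [0.0, 0.0, 0.0, 0.0]
noisy = [0.1, 0.1, 0.1, 0.1]           # uniform error of 0.1, so MSE is about 0.01
value = psnr(noisy, reference)          # close to 20 dB
same = psnr(reference, reference)       # infinite, as for a ground truth image
```

Because the reference image appears in the formula, the metric is undefined when no ground truth exists, which is the case for real-world rainy images.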
4.2 Results on synthetic datasets
Tab. 1 shows quantitative comparisons between our method and six state-of-the-art deraining methods on Rain100H, Rain100L and Rain1200. There are two conventional methods, DSC [1] (ICCV15) and LP [4] (CVPR16), and four deep learning-based methods, DDN [6] (CVPR17), JORDER [7] (CVPR17), RESCAN [17] (ECCV18) and PReNet [19] (CVPR19). Our proposed method outperforms these state-of-the-art approaches on all three datasets.
We also show several challenging synthetic examples for visual comparison in Fig. 3. Since the prior-based methods are clearly worse than the deep learning-based methods according to Tab. 1, we only compare visually with the deep learning methods. The first column in Fig. 3 shows synthetic images that are severely degraded by rain streaks. Fig. 3 (b) and Fig. 3 (c) are the results of DDN [6] and JORDER [7], both of which fail to recover an acceptable clean image. Fig. 3 (d) and Fig. 3 (e) are the results of RESCAN [17] and PReNet [19], which show unpleasing artifacts in the masked boxes. As shown in Fig. 3 (f), our method generates the best deraining results both quantitatively and visually.
Figure 4: Deraining results on two real-world rainy images. Panels: (a) Input, (b) DDN, (c) JORDER, (d) RESCAN, (e) PReNet, (f) Ours.
4.3 Results on real-world images
To evaluate the robustness of our method, we also provide two examples on real-world rainy images in Fig. 4. In the first example, our method generates the clearest and cleanest result, while the other methods leave obvious artifacts or rain streaks. In the second example, the other methods produce unpleasing artifacts in the masked box, while our approach generates a clearer result. We provide more examples on both synthetic and real-world datasets in our supplemental materials.
4.4 Ablation study
It is meaningful to explore the effectiveness of the multi-scale design and of the physical model constraint in our network. We therefore design experiments with different combinations of the proposed network components: the three sub-networks, the multi-scale structure, the multi-stream dilation convolution, and the physical model constraint loss. Tab. 2 shows the comparative results, where W and W/O indicate whether the multi-scale structure is used or not. The multi-scale design boosts deraining performance on all models, which shows that it is meaningful. Furthermore, Tab. 3 compares the effectiveness of the multi-stream dilation convolution and the physical model constraint loss. Fig. 5 provides the outputs of the three sub-networks on two real-world rainy images. The cropped patches in Fig. 5 (d) show better detail and texture information than those in Fig. 5 (c), which demonstrates that the guide-learning network is effective in our proposed network.
Table 2: Ablation on the multi-scale structure (W: with, W/O: without).
Multi-scale | Metric | M_1 | M_2 | M_3 | M_4 |
---|---|---|---|---|---|
W | PSNR | 28.63 | 28.56 | 28.62 | 28.97 |
W | SSIM | 0.8949 | 0.8946 | 0.8968 | 0.9015 |
W/O | PSNR | 28.24 | 28.24 | 28.46 | 28.72 |
W/O | SSIM | 0.8898 | 0.8900 | 0.8941 | 0.8986 |
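As a quick arithmetic check of the claim that the multi-scale structure helps every model, the PSNR gain (W minus W/O) can be computed directly from the values in Tab. 2. The variable names are illustrative.

```python
# PSNR values copied from the ablation table (W = with multi-scale, W/O = without).
with_ms = {"M_1": 28.63, "M_2": 28.56, "M_3": 28.62, "M_4": 28.97}
without_ms = {"M_1": 28.24, "M_2": 28.24, "M_3": 28.46, "M_4": 28.72}

# Gain in dB from adding the multi-scale structure, rounded to two decimals.
gains = {name: round(with_ms[name] - without_ms[name], 2) for name in with_ms}
print(gains)  # {'M_1': 0.39, 'M_2': 0.32, 'M_3': 0.16, 'M_4': 0.25}
```

The gain is positive for all four configurations, ranging from 0.16 dB to 0.39 dB.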
Figure 5: Outputs of the three sub-networks on two real-world rainy images. Panels, left to right: Input, Rain Streaks, Rain-free, Output.
- M_1: Only the rain streaks network.
- M_2: Only the rain-free network.
- M_3: Only the estimated rain-free image is input to the guide-learning network.
- M_4: (Default) The concatenation of the estimated rain streaks and rain-free image is input to the guide-learning network.
- R_1: Our proposed network without multi-stream dilation convolution.
- R_2: Our proposed network without the physical model constraint loss $L_p$.
- R_3: Our proposed network with multi-stream dilation convolution and $L_p$, i.e. our proposed final network.
$L_p$ is the norm of the difference between the rainy image and the direct sum of the two images estimated by the first two sub-networks.
Table 3: Ablation on the multi-stream dilation convolution and the physical model constraint loss.
Metric | R_1 | R_2 | R_3 |
---|---|---|---|
PSNR | 28.95 | 28.92 | 28.97 |
SSIM | 0.9011 | 0.9012 | 0.9015 |
5 Conclusion
In this paper, we propose an effective method for single image deraining. Our network is based on the rainy physical model with guide-learning, and the experiments demonstrate that the physical model constraint and guide-learning are meaningful. The Multi-Scale Residual Block is proposed and verified to boost deraining performance. Quantitative and qualitative experimental results on both synthetic and real-world datasets demonstrate the effectiveness of our network for single image deraining.
Acknowledgement
This work was supported by the National Natural Science Foundation of China [grant number 61976041]; the National Key R&D Program of China [grant number 2018AAA0100301]; and the National Science and Technology Major Project [grant numbers 2018ZX04041001-007, 2018ZX04016001-011].
References
- [1] Yu Luo, Yong Xu, and Hui Ji, “Removing rain from a single image via discriminative sparse coding,” in ICCV, 2015, pp. 3397–3405.
- [2] Li-Wei Kang, Chia-Wen Lin, and Yu-Hsiang Fu, “Automatic single-image-based rain streaks removal via image decomposition,” IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 1742–1755, 2012.
- [3] Yi-Lei Chen and Chiou-Ting Hsu, “A generalized low-rank appearance model for spatio-temporally correlated rain streaks,” in ICCV, 2013, pp. 1968–1975.
- [4] Yu Li, Robby T. Tan, Xiaojie Guo, Jiangbo Lu, and Michael S. Brown, “Rain streak removal using layer priors,” in CVPR, 2016, pp. 2736–2744.
- [5] Xueyang Fu, Jiabin Huang, Xinghao Ding, Yinghao Liao, and John Paisley, “Clearing the skies: A deep network architecture for single-image rain removal,” IEEE Transactions on Image Processing, vol. 26, no. 6, pp. 2944–2956, 2017.
- [6] Xueyang Fu, Jiabin Huang, Delu Zeng, Yue Huang, Xinghao Ding, and John Paisley, “Removing rain from single images via a deep detail network,” in CVPR, 2017, pp. 1715–1723.
- [7] Wenhan Yang, Robby T. Tan, Jiashi Feng, Jiaying Liu, Zongming Guo, and Shuicheng Yan, “Deep joint rain detection and removal from a single image,” in CVPR, 2017, pp. 1685–1694.
- [8] Kshitiz Garg and Shree K. Nayar, “Vision and rain,” International Journal of Computer Vision, vol. 75, no. 1, pp. 3–27, 2007.
- [9] Kshitiz Garg and Shree K. Nayar, “Detection and removal of rain from videos,” in CVPR, 2004, pp. 528–535.
- [10] Jérémie Bossu, Nicolas Hautière, and Jean-Philippe Tarel, “Rain or snow detection in image sequences through use of a histogram of orientation of streaks,” International Journal of Computer Vision, vol. 93, no. 3, pp. 348–367, 2011.
- [11] Abhishek Kumar Tripathi and Sudipta Mukhopadhyay, “Removal of rain from videos: a review,” Signal, Image and Video Processing, vol. 8, no. 8, pp. 1421–1430, 2014.
- [12] Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, and Jian Sun, “Cascaded pyramid network for multi-person pose estimation,” in CVPR, 2018, pp. 7103–7112.
- [13] Jonathan Long, Evan Shelhamer, and Trevor Darrell, “Fully convolutional networks for semantic segmentation,” in CVPR, 2015, pp. 3431–3440.
- [14] Tsung-Yi Lin, Piotr Dollár, Ross B. Girshick, Kaiming He, Bharath Hariharan, and Serge J. Belongie, “Feature pyramid networks for object detection,” in CVPR, 2017, pp. 936–944.
- [15] De-An Huang, Li-Wei Kang, Min-Chun Yang, Chia-Wen Lin, and Yu-Chiang Frank Wang, “Context-aware single image rain removal,” in ICME, 2012, pp. 164–169.
- [16] He Zhang, Vishwanath Sindagi, and Vishal M. Patel, “Image de-raining using a conditional generative adversarial network,” CoRR, vol. abs/1701.05957, 2017.
- [17] Xia Li, Jianlong Wu, Zhouchen Lin, Hong Liu, and Hongbin Zha, “Recurrent squeeze-and-excitation context aggregation net for single image deraining,” in ECCV, 2018, pp. 262–277.
- [18] Zhiwen Fan, Huafeng Wu, Xueyang Fu, Yue Huang, and Xinghao Ding, “Residual-guide network for single image deraining,” in ACM MM, 2018, pp. 1751–1759.
- [19] Dongwei Ren, Wangmeng Zuo, Qinghua Hu, Pengfei Zhu, and Deyu Meng, “Progressive image deraining networks: A better and simpler baseline,” in CVPR, 2019.
- [20] Xi Peng, Rogerio S. Feris, Xiaoyu Wang, and Dimitris N. Metaxas, “A recurrent encoder-decoder network for sequential face alignment,” in ECCV, 2016, pp. 38–56.
- [21] Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia, “Pyramid scene parsing network,” in CVPR, 2017, pp. 6230–6239.
- [22] David Eigen, Christian Puhrsch, and Rob Fergus, “Depth map prediction from a single image using a multi-scale deep network,” CoRR, vol. abs/1406.2283v1, 2014.
- [23] Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, and Yun Fu, “Residual dense network for image super-resolution,” CoRR, vol. abs/1802.08797, 2018.
- [24] Diederik P. Kingma and Jimmy Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2014.