Learning Discriminative Shrinkage Deep Networks for Image Deconvolution

2Nanjing University of Science and Technology  3Google Research  4University of California, Merced  5Yonsei University
Abstract
Most existing methods formulate the non-blind deconvolution problem within a maximum-a-posteriori framework and address it by manually designing a variety of regularization and data terms for the latent clear images. However, explicitly designing these two terms is challenging and usually leads to complex optimization problems that are difficult to solve. Moreover, most existing methods simply learn the regularization term with deep convolutional neural networks (CNNs) or radial basis functions. In contrast, this paper proposes an effective non-blind deconvolution approach that models both terms implicitly by learning discriminative shrinkage functions: we formulate both the data term and the regularization term as learnable ones and split the deconvolution model into data-related and regularization-related sub-problems according to the alternating direction method of multipliers. We explore the properties of the Maxout function and develop a deep CNN model with Maxout layers that learns discriminative shrinkage functions to directly approximate the solutions of these two sub-problems. Furthermore, since fast-Fourier-transform-based image restoration usually leads to ringing artifacts while the conjugate-gradient-based approach is time-consuming, we develop a Conjugate Gradient Network to restore the latent clear images effectively and efficiently. Experimental results show that the proposed method performs favorably against the state-of-the-art in terms of efficiency and accuracy. Source code, models, and more results are available at https://github.com/setsunil/DSDNet.
1 Introduction
Single-image deconvolution, or deblurring, aims to restore a clear and sharp image from a single blurry input image. Blind image deblurring has attracted the interest of many researchers [31, 24, 49, 43], and with the rapid development of deep learning, tremendous progress has been made recently [26, 62, 74, 13, 59]. Since blind methods provide an estimated kernel, how to utilize these kernels well remains an important issue; consequently, non-blind deconvolution has never lost the attention of researchers over the past decades [50, 14, 8, 10]. Due to the ill-posedness of the deconvolution problem, numerous methods explore the statistical properties of clear images as image priors (e.g., the hyper-Laplacian prior [23, 30]) to make the problem tractable. Although hand-crafted image priors facilitate the removal of ringing artifacts, fine details are not restored well, as these limited priors cannot sufficiently model the inherent properties of various latent images.
To overcome this problem, discriminative image priors can be learned from training examples [58, 56, 9, 48]. These methods usually leverage radial basis functions (RBFs) as the shrinkage functions of the image prior. However, RBFs contain many parameters, leading to complex optimization problems.
Deep convolutional neural networks (CNNs) have been developed to learn more effective regularization terms for the deconvolution problem [70]. These methods are motivated by [56] and directly estimate the solution of the regularization-related sub-problem with deep CNN models. As analyzed in [56], this solution can be obtained by combining shrinkage functions. However, as shrinkage functions are complex (e.g., non-monotonic), simply using a convolution operation followed by a common activation function, e.g., ReLU, cannot model their properties. Given the effectiveness of deep features, it is of great interest to learn discriminative shrinkage functions: if we can learn shrinkage functions complex enough to match the deep features, they should surpass the hand-crafted ones in solving the regularization-related sub-problem.
We note that image restoration involves an image deconvolution step, usually performed with the fast Fourier transform (FFT) [56, 43, 73] or the Conjugate Gradient (CG) method [2, 28, 10]. However, FFT-based approaches usually lead to ringing artifacts, while CG-based ones are time-consuming. Besides, both methods suffer from information loss: for the FFT, information is lost when the imaginary parts are discarded in the inverse real FFT; for the CG method, the number of iterations actually executed is usually far below the upper bound required for an exact solution. Therefore, developing an effective yet efficient image restoration method is also necessary.
In this paper, we develop a simple and effective model that discriminatively learns the shrinkage functions for non-blind deconvolution, called the Discriminative Shrinkage Deep Network (DSDNet). We formulate the data and regularization terms as learnable ones and split the image deconvolution model into a data-related sub-problem and a regularization-related sub-problem. As shrinkage functions can solve both sub-problems, and learnable Maxout functions can efficiently approximate complex functions, we directly learn the shrinkage functions of the sub-problems via a deep CNN model with Maxout layers [16]. To generate clear images from the outputs of the learned functions effectively and efficiently, we develop a fully convolutional Conjugate Gradient Network (CGNet) motivated by the mathematics of the CG method. Finally, we formulate our method as an end-to-end network and solve the proposed model with the Alternating Direction Method of Multipliers (ADMM) [44]. Experimental results show that the proposed method performs favorably against state-of-the-art ones.
The main contributions of this work are:
- We propose a simple yet effective non-blind deconvolution model that directly learns discriminative shrinkage functions to implicitly model the data and regularization terms for image deconvolution.
- We develop an efficient and effective CGNet that restores clear images without the drawbacks of CG and FFT.
- The architecture of DSDNet is carefully designed, making it flexible in model size and easy to train. Even the smallest DSDNet performs favorably against state-of-the-art methods in speed and accuracy.
2 Related Work
Because numerous image deconvolution methods have been proposed, we discuss those most relevant to this work.
Statistical Image Prior-Based Methods.
Since non-blind deconvolution is an ill-posed problem, conventional methods usually develop image priors based on the statistical properties of clear images. Representative methods include total variation [52, 6, 65, 46], the hyper-Laplacian prior [28, 23], and patch-based priors [61, 17, 75], to name a few. However, these hand-crafted priors may not model the inherent properties of latent images well; thus, these methods do not restore realistic images effectively.
Learning-Based Methods.
To overcome the above limitations of the hand-crafted priors, researchers have proposed learning-based approaches, e.g., Markov random fields [51, 54], Gaussian mixture models [75], conditional random fields [63, 19, 57, 55], and radial basis functions [56, 9].
Learning-based non-blind deconvolution has also grown deeper with the development of neural networks. Many methods use deep CNNs to model the regularization term and solve image restoration problems by unrolling existing optimization algorithms, for example, the Iterative Shrinkage-Thresholding Algorithm [69, 67], the Douglas-Rachford method [1], the Half-Quadratic Splitting algorithm [33, 71, 21, 70, 4, 32, 7, 22], gradient descent [15, 47], and ADMM [68]. These methods use deep CNN models to estimate the solution of the regularization-related sub-problem. As demonstrated in [56], these solutions are combinations of shrinkage functions; simply using deep CNN models does not model the shrinkage functions well, since most activation functions are too simple. Besides, most of these methods focus on the regularization term yet ignore the importance of the data term. In addition, the image restoration step in these methods usually depends on an FFT-based solution, which may produce results with ringing artifacts; even though edge tapering [25] alleviates such artifacts, they remain inevitable in many scenes.
To overcome these problems, we leverage Maxout layers to learn discriminative shrinkage functions for the regularization and data terms, and develop the CGNet to restore images better. Furthermore, we adopt average pooling for noise-level estimation and residual blocks for computing the re-weights. In other words, we design each component according to its mathematical characteristics rather than stacking as many convolutional layers as possible, as in most previous works.
Blind Deblurring Methods.
Numerous end-to-end deep networks [60, 40, 62, 26] have been developed to restore clear images directly from blurry ones. However, as demonstrated in [10], when the blur kernels are given, these methods do not perform as well as non-blind deconvolution methods. As non-blind deconvolution is vital for image restoration, we focus on this problem and develop a simple and effective approach to restoring high-quality images.
3 Revisiting Deep Unrolling-Based Methods
We first revisit deep unrolling-based methods for image deconvolution to motivate our work. Mathematically, the degradation process of the image blur is usually formulated as:
$\mathbf{y} = \mathbf{k} \ast \mathbf{x} + \mathbf{n}$   (1)
where $\ast$ denotes the convolution operator; $\mathbf{y}$, $\mathbf{k}$, $\mathbf{x}$, and $\mathbf{n}$ denote the blurry image, the blur kernel, the latent image, and noise, respectively. With the known kernel $\mathbf{k}$, we can formulate the deconvolution as a maximum-a-posteriori (MAP) problem:
$\mathbf{x} = \arg\max_{\mathbf{x}} p(\mathbf{x} \mid \mathbf{y}) = \arg\max_{\mathbf{x}} p(\mathbf{y} \mid \mathbf{x})\, p(\mathbf{x})$   (2)
where $p(\mathbf{y} \mid \mathbf{x})$ is the likelihood of the observation (the blurry image) $\mathbf{y}$, while $p(\mathbf{x})$ denotes an image prior of the latent image $\mathbf{x}$. Taking the negative logarithm, this problem is equivalent to
$\mathbf{x} = \arg\min_{\mathbf{x}} \rho(\mathbf{x}) + f(\mathbf{x})$   (3)

where $\rho(\mathbf{x})$ and $f(\mathbf{x})$ denote the regularization term and data term, respectively. As the data term is usually modeled in the form of the $\ell_2$-norm, (3) can be rewritten as

$\mathbf{x} = \arg\min_{\mathbf{x}} \tfrac{1}{2}\|\mathbf{H}\mathbf{x} - \mathbf{y}\|_2^2 + \rho(\mathbf{x})$   (4)
where $\mathbf{x}$ and $\mathbf{y}$ denote the vector forms of the latent and blurry images, respectively, and $\mathbf{H}$ denotes the Toeplitz matrix of the blur kernel $\mathbf{k}$. Image deconvolution with ADMM is usually achieved by solving:
$\min_{\mathbf{x}, \mathbf{z}} \tfrac{1}{2}\|\mathbf{H}\mathbf{x} - \mathbf{y}\|_2^2 + \rho(\mathbf{z}) + \tfrac{\lambda}{2}\|\mathbf{x} - \mathbf{z} + \mathbf{u}\|_2^2$   (5)

where $\mathbf{z}$ is an auxiliary variable for $\mathbf{x}$, $\mathbf{u}$ is a Lagrangian multiplier (in scaled form), and $\lambda$ is a weight parameter. The solution of (5) can be obtained by alternately solving:
$\mathbf{x}^{t+1} = \arg\min_{\mathbf{x}} \tfrac{1}{2}\|\mathbf{H}\mathbf{x} - \mathbf{y}\|_2^2 + \tfrac{\lambda}{2}\|\mathbf{x} - \mathbf{z}^{t} + \mathbf{u}^{t}\|_2^2$   (6a)

$\mathbf{z}^{t+1} = \arg\min_{\mathbf{z}} \rho(\mathbf{z}) + \tfrac{\lambda}{2}\|\mathbf{x}^{t+1} - \mathbf{z} + \mathbf{u}^{t}\|_2^2$   (6b)

$\mathbf{u}^{t+1} = \mathbf{u}^{t} + \mathbf{x}^{t+1} - \mathbf{z}^{t+1}$   (6c)
Existing methods [70, 73, 72] usually solve (6a) via the fast Fourier transform (FFT) or the Conjugate Gradient (CG) method. For (6b), the solution can be represented as a proximal operator:

$\mathbf{z}^{t+1} = \operatorname{prox}_{\rho/\lambda}(\mathbf{v}^{t+1})$   (7)

where $\mathbf{v}^{t+1} = \mathbf{x}^{t+1} + \mathbf{u}^{t}$. With this scaled formulation, the multiplier $\mathbf{u}$ in (6) and (7) can be absorbed [44]. As demonstrated in [56], (7) can be approximated by shrinkage functions. Existing methods usually use deep CNN models to approximate the solution of (7). However, simply using a convolution operation followed by a fixed activation function (e.g., ReLU) cannot model the shrinkage functions well, as they are far more complex (e.g., non-monotonic) [56]. To better approximate the solution of (6b), we develop a deep CNN model with the Maxout function [16], which can effectively approximate proximal functions. In addition, we note that using FFT to solve (6a) does not obtain better results than the CG method, as demonstrated in [28, 29]. However, the CG method is time-consuming and unstable in deep networks (see Section 5.4 for more details). To overcome this problem, we learn a differentiable CG network to restore clear images more efficiently and effectively.
4 Proposed Method
Different from existing methods that simply learn the regularization term or the data term [70, 4, 32], we formulate both the data term and the regularization term as learnable ones:
$\min_{\mathbf{x}, \mathbf{z}, \mathbf{w}} \sum_{i=1}^{N} \phi_i(\mathbf{z}_i) + \sum_{j=1}^{M} \psi_j(\mathbf{w}_j), \quad \text{s.t.} \;\; \mathbf{z}_i = \mathbf{F}_i \mathbf{x}, \;\; \mathbf{w}_j = \mathbf{G}_j(\mathbf{H}\mathbf{x} - \mathbf{y})$   (8)

where $\phi_i$ and $\psi_j$ denote the $i$-th and $j$-th learnable functions; $\mathbf{z}_i$ and $\mathbf{w}_j$ are auxiliary variables that correspond to the regularization and data terms; $\mathbf{F}_i$ and $\mathbf{G}_j$ are the $i$-th and $j$-th learnable filters for regularization and data, respectively.
By introducing the Lagrangian multipliers $\mathbf{u}_i$ and $\mathbf{v}_j$ corresponding to the regularization and data terms, we can solve (8) using the ADMM method by:
$\mathbf{z}_i^{t+1} = \arg\min_{\mathbf{z}_i} \phi_i(\mathbf{z}_i) + \tfrac{\alpha_i}{2}\|\mathbf{F}_i\mathbf{x}^{t} - \mathbf{z}_i + \mathbf{u}_i^{t}\|_2^2$   (9a)

$\mathbf{w}_j^{t+1} = \arg\min_{\mathbf{w}_j} \psi_j(\mathbf{w}_j) + \tfrac{\beta_j}{2}\|\mathbf{G}_j(\mathbf{H}\mathbf{x}^{t} - \mathbf{y}) - \mathbf{w}_j + \mathbf{v}_j^{t}\|_2^2$   (9b)

$\mathbf{x}^{t+1} = \arg\min_{\mathbf{x}} \sum_{i=1}^{N} \tfrac{\alpha_i}{2}\|\mathbf{F}_i\mathbf{x} - \mathbf{z}_i^{t+1} + \mathbf{u}_i^{t}\|_2^2 + \sum_{j=1}^{M} \tfrac{\beta_j}{2}\|\mathbf{G}_j(\mathbf{H}\mathbf{x} - \mathbf{y}) - \mathbf{w}_j^{t+1} + \mathbf{v}_j^{t}\|_2^2$   (9c)

$\mathbf{u}_i^{t+1} = \mathbf{u}_i^{t} + \mathbf{F}_i\mathbf{x}^{t+1} - \mathbf{z}_i^{t+1}$   (9d)

$\mathbf{v}_j^{t+1} = \mathbf{v}_j^{t} + \mathbf{G}_j(\mathbf{H}\mathbf{x}^{t+1} - \mathbf{y}) - \mathbf{w}_j^{t+1}$   (9e)
In the following, we develop deep CNN models with Maxout layers to approximate the functions of (9a) and (9b), and design a simple and effective deep CG network to solve (9c).
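The overall structure of one unrolled stage can be summarized by the following structural sketch; the module names, channel counts, and filter sizes are illustrative (the Maxout shrinkage module is sketched after (10) below, and `cgnet` stands for the differentiable solver of Section 4.1), not the released implementation.

```python
import torch.nn as nn

class DSDStage(nn.Module):
    """One unrolled ADMM stage implementing (9a)-(9e). shrink_z / shrink_w
    are learned shrinkage modules (e.g., the Maxout sketch after (10)),
    cgnet solves the quadratic subproblem (9c), and blur applies H.
    Channel counts and filter sizes are illustrative assumptions."""
    def __init__(self, shrink_z, shrink_w, cgnet, in_ch=3, n_filters=24):
        super().__init__()
        self.F = nn.Conv2d(in_ch, n_filters, 7, padding=3, bias=False)  # filters F_i
        self.G = nn.Conv2d(in_ch, n_filters, 7, padding=3, bias=False)  # filters G_j
        self.shrink_z, self.shrink_w, self.cgnet = shrink_z, shrink_w, cgnet

    def forward(self, x, y, blur, u, v):
        z = self.shrink_z(self.F(x) + u)            # (9a): learned proximal step
        w = self.shrink_w(self.G(blur(x) - y) + v)  # (9b): learned proximal step
        x = self.cgnet(x, y, z - u, w - v)          # (9c): quadratic x-update
        u = u + self.F(x) - z                       # (9d): multiplier update
        v = v + self.G(blur(x) - y) - w             # (9e): multiplier update
        return x, u, v
```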
4.1 Network Architecture
Learning Filters $\mathbf{F}_i$ and $\mathbf{G}_j$.
To learn the filters $\mathbf{F}_i$ and $\mathbf{G}_j$, we develop two sub-networks, each containing one convolutional layer: one generates the filters $\mathbf{F}_i$ for the regularization term, and the other generates the filters $\mathbf{G}_j$ for the data term, with both layers using filters of the same spatial size.
Learning Discriminative Shrinkage Functions for (9a) and (9b).
To better learn the unknown discriminative shrinkage functions in (9a) and (9b), we take advantage of Maxout layers [16]. Specifically, our convolutional Maxout layer consists of two Maxout units, each containing one convolutional layer followed by a channel-wise max-pooling layer. Given an input feature map $\mathbf{X} \in \mathbb{R}^{H \times W \times C}$ and the output feature map $\mathbf{Z} \in \mathbb{R}^{H \times W \times CK}$ of the convolutional layer, a Maxout unit is achieved by:

$\mathbf{M}_{h,w,c}(\mathbf{X}) = \max_{k \in \{1, \dots, K\}} \mathbf{Z}_{h,w,(c-1)K+k}$   (10)

where $h \in [1, H]$, $w \in [1, W]$, and $c \in [1, C]$; $\mathbf{Z}_{h,w,c'}$ is the element of $\mathbf{Z}$ at position $(h, w, c')$, and $\mathbf{M}_{h,w,c}(\cdot)$ outputs the $(h, w, c)$-th element of the output tensor. In our implementation, the output $\mathbf{M}(\mathbf{X})$ is of the same size as the input $\mathbf{X}$.
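A minimal PyTorch sketch of one such Maxout unit implementing (10) follows; the kernel size and the number K of pieces per channel are assumptions.

```python
import torch.nn as nn

class MaxoutShrink(nn.Module):
    """One Maxout unit for (10): a convolution expands each of the C input
    channels into K responses, and a channel-wise max keeps one, giving a
    learnable piecewise-linear function per channel. K and the kernel size
    are assumptions; the layer described above pairs two such units."""
    def __init__(self, channels, k=4, ksize=7):
        super().__init__()
        self.k = k
        self.conv = nn.Conv2d(channels, channels * k, ksize, padding=ksize // 2)

    def forward(self, x):
        b, c, h, w = x.shape
        z = self.conv(x)                   # (B, C*K, H, W)
        z = z.view(b, c, self.k, h, w)     # group the K pieces per channel
        return z.max(dim=2).values         # channel-wise max over the K pieces
```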
Learning a Differentiable CG Network for (9c).
As stated in Section 3, although using FFT with boundary processing operations (e.g., edge tapering and Laplacian smoothing [35]) can efficiently solve (9c), the results are not as good as those of a CG-based solver, as shown in Table 5. However, a CG-based solver is time-consuming. To generate latent clear images better, we develop a differentiable CG network to solve (9c). The CG method solves the linear equation:
$\mathbf{A}\mathbf{x} = \mathbf{b}$   (11)

where $\mathbf{A} = \sum_{i} \alpha_i \mathbf{F}_i^{\top}\mathbf{F}_i + \sum_{j} \beta_j \mathbf{H}^{\top}\mathbf{G}_j^{\top}\mathbf{G}_j\mathbf{H}$ collects the quadratic parts of (9c) and $\mathbf{b}$ collects the corresponding linear parts. Given an initial estimate $\mathbf{x}^{0}$, the CG method recursively computes conjugate vectors $\mathbf{p}^{t}$ and expresses the difference between the desired solution and the initial input as

$\mathbf{x}^{\ast} - \mathbf{x}^{0} = \sum_{t=1}^{T} \gamma_t \mathbf{p}^{t}$

where $T$ is the iteration number, upper-bounded by the dimension of $\mathbf{x}$, and $\gamma_t$ is the weight calculated from the residual $\mathbf{r}^{t} = \mathbf{b} - \mathbf{A}\mathbf{x}^{t}$.
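For reference, the textbook CG recursion that CGNet is designed to replace can be written as follows; `A` is any function applying the operator in (11).

```python
import numpy as np

def conjugate_gradient(A, b, x0, max_iter=50, tol=1e-6):
    """Textbook CG for A x = b with symmetric positive-definite A, given as
    a function applying the operator; all arrays are real. The solution is
    accumulated as x0 plus a weighted sum of conjugate vectors p^t."""
    x = x0.copy()
    r = b - A(x)                         # initial residual r^0
    p = r.copy()                         # first conjugate direction
    rs = np.vdot(r, r)
    for _ in range(max_iter):
        Ap = A(p)
        gamma = rs / np.vdot(p, Ap)      # weight gamma_t from the residual
        x = x + gamma * p                # accumulate gamma_t * p^t
        r = r - gamma * Ap
        rs_new = np.vdot(r, r)
        if np.sqrt(rs_new) < tol:        # residual small enough: stop early
            break
        p = r + (rs_new / rs) * p        # next conjugate direction
        rs = rs_new
    return x
```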
However, when the matrix $\mathbf{A}$ is large, solving (11) with CG incurs a high computational cost. To overcome this problem, we develop a differentiable CG network based on a U-Net to compute the solution. The network design is motivated by the following reasons:
- As one of the Krylov subspace methods [2], CG finds the solution in the Krylov subspace $\mathrm{span}\{\mathbf{r}, \mathbf{A}\mathbf{r}, \mathbf{A}^{2}\mathbf{r}, \dots\}$, where $\mathbf{r}$ is the residual vector. In other words, the CG solution is a function of $\mathbf{A}$ and $\mathbf{r}$. Our CGNet therefore takes the residual as input and embeds $\mathbf{A}$ as part of the network, so that its output behaves as the CG solution.
- For a typical deconvolution problem, $\mathbf{A}$ is composed of convolution and transposed-convolution pairs, as in the first term of (9c): $\mathbf{F}_i$ and $\mathbf{G}_j\mathbf{H}$ act as convolutions, and $\mathbf{F}_i^{\top}$ and $\mathbf{H}^{\top}\mathbf{G}_j^{\top}$ as transposed convolutions. This observation connects naturally to an encoder-decoder architecture, so we integrate the convolutions into the encoder and the transposed convolutions into the decoder (see the sketch after this list).
- As the Conjugate Gradient method is a recursive algorithm, the U-Net computes feature maps in a similarly recursive fashion.
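As referenced in the second bullet, the following sketch shows how the operator of (11) decomposes into convolution (encoder-like) and transposed-convolution (decoder-like) stages; it assumes single-channel images and omits the data filters $\mathbf{G}_j$ for brevity.

```python
import torch.nn.functional as Fn

def apply_A(x, f_weight, k_weight, alpha=1.0, beta=1.0):
    """Applies A x = alpha * sum_i F_i^T F_i x + beta * H^T H x from (9c)/(11),
    assuming a single-channel image x of shape (B, 1, H, W), regularization
    filters f_weight of shape (N, 1, kf, kf), and a blur kernel k_weight of
    shape (1, 1, kk, kk); the data filters G_j are omitted for brevity."""
    pf = f_weight.shape[-1] // 2
    pk = k_weight.shape[-1] // 2
    Fx = Fn.conv2d(x, f_weight, padding=pf)                # F_i x: forward convolutions
    FtFx = Fn.conv_transpose2d(Fx, f_weight, padding=pf)   # F_i^T(.): transposed convolutions
    Hx = Fn.conv2d(x, k_weight, padding=pk)                # H x
    HtHx = Fn.conv_transpose2d(Hx, k_weight, padding=pk)   # H^T(.)
    return alpha * FtFx + beta * HtHx
```

Such an operator can be handed directly to the CG routine above; CGNet instead replaces the hand-crafted recursion with a U-Net whose encoder mirrors the forward convolutions and whose decoder mirrors the transposed ones.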
In practical CG iterations, the weights $\alpha_i$ and $\beta_j$ are usually updated with iteratively reweighted least squares (IRLS) to exploit the sparsity of the priors [29]. We design a simple HypNet to estimate these re-weights; its architecture is shown in Fig. 1. In addition, we note that the values of $\alpha_i$ and $\beta_j$ in (9c) depend on the noise level, so we design a simple NLNet to estimate a noise map that plays a role similar to $\alpha_i$ and $\beta_j$ (see Fig. 1). In contrast to most conventional methods, NLNet computes the weight for each pixel and is thus locally adaptive.
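A minimal sketch of such a locally adaptive noise-map estimator, in the spirit of NLNet, is shown below; the exact architecture (channel widths, activation choices) is an assumption, and only the use of average pooling to aggregate local noise statistics follows the description above.

```python
import torch.nn as nn

class NLNetSketch(nn.Module):
    """Illustrative per-pixel noise-map estimator: average pooling
    aggregates local statistics, and the final 1x1 convolution maps them to
    a positive weight that plays the role of the penalty weights in (9c).
    All channel widths and activation choices are assumptions."""
    def __init__(self, in_ch=3, ch=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.AvgPool2d(3, stride=1, padding=1),  # local averaging of noise statistics
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 1), nn.Softplus(),    # positive per-pixel weight map
        )

    def forward(self, y):
        return self.net(y)                         # (B, 1, H, W), locally adaptive
```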
5 Experimental Results
5.1 Datasets and Implementation Details
Training Dataset.
Similar to [10, 11], the training data comprises 4,744 images from the Waterloo Exploration dataset [36] and 400 images from the Berkeley segmentation dataset (BSD) [38]. To synthesize blurry images, we first generate 33,333 blur kernels using the method of [55], covering a range of kernel sizes. We crop image patches of fixed size from each image and convolve them with randomly selected generated kernels to produce blurry images. Gaussian noise with a level randomly drawn from 1% to 5% is added to each blurry image.
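The synthesis pipeline can be sketched as follows, assuming pixel values in [0, 1]; `fftconvolve` stands in for whatever convolution routine is actually used.

```python
import numpy as np
from scipy.signal import fftconvolve

def synthesize_blurry(patch, kernel, rng):
    """Training-pair synthesis as described above: convolve a clean RGB
    patch (H, W, 3) in [0, 1] with a sampled kernel (kh, kw), then add
    Gaussian noise with a level drawn uniformly from 1%-5%."""
    blurred = fftconvolve(patch, kernel[..., None], mode="same")  # per-channel blur
    sigma = rng.uniform(0.01, 0.05)                               # 1%-5% noise level
    noisy = blurred + rng.normal(0.0, sigma, blurred.shape)
    return np.clip(noisy, 0.0, 1.0), sigma

# e.g., rng = np.random.default_rng(0)
```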
Test Datasets.
We evaluate our method on both synthetic and real datasets. For the synthetic case, we use the 100 images of BSD100 and 100 kernels generated by [55] to synthesize blurry images, following the same protocol as the training data. We also use the Set5 [3] dataset with kernels generated by [5] as a test set. In addition, we use the datasets of Levin [30] and Lai [27] for evaluation.
For the real-world case, we evaluate our model on the data of Pan [43], which contains 23 blurry images and 23 kernels estimated by their method.
Table 1: Configurations of the four DSDNet variants and their results on Set5.

| | Feather | Light | Heavy | Full |
|---|---|---|---|---|
| Stages | 2 | 3 | 3 | 4 |
| Filters | 24 | 24 | 49 | 49 |
| PSNR | 32.51 | 32.78 | 33.11 | 33.43 |
| Seconds | 1.893 | 2.191 | 2.672 | 3.065 |
Implementation Details.
We train the networks using the ADAM optimizer [20] with default parameter settings and a batch size of 8. The total number of training iterations is 1 million. The learning rate is gradually decayed within each 250,000-iteration cycle and reset to progressively smaller values at iterations 250,001, 500,001, and 750,001. To constrain the network training, we apply a standard pixel-wise norm loss between the ground truth and the network output. Data augmentation (90° and 180° rotations, vertical and horizontal flipping) is used. We train four models of different sizes, i.e., Feather, Light, Heavy, and Full, whose configurations and results on Set5 are shown in Table 1.
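A hypothetical schedule matching this description is sketched below; the base learning rate and both decay factors are assumptions, as the exact values are not recoverable from the text.

```python
def lr_at(iteration, base_lr=1e-4, cycle=250_000):
    """Hypothetical schedule matching the description: decay within each
    250k-iteration cycle and reset to a smaller value at iterations
    250,001, 500,001, and 750,001. base_lr and both decay factors are
    assumptions, not the paper's settings."""
    cycle_idx = iteration // cycle           # which of the four cycles
    start = base_lr * (0.5 ** cycle_idx)     # assumed reset value per cycle
    frac = (iteration % cycle) / cycle       # progress within the cycle
    return start * (0.1 ** frac)             # assumed within-cycle decay

# Usage: each step, set g["lr"] = lr_at(step) for g in optimizer.param_groups.
```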
5.2 Quantitative Evaluation
We compare the proposed DSDNet with nine state-of-the-art methods: IRCNN [73], SFARL [48], ADM_UDM [21], CPCR [12], KerUNC [41], VEM [42], DWDN [10], SVMAP [11], and DRUNet [72]. These methods are fine-tuned on the same training dataset as in Section 5.1, and for each we choose the better of the fine-tuned and original models for comparison.
PSNR and SSIM [66] are used for quantitative evaluation. All quantitative evaluations are conducted without border cropping for fair comparison.
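The evaluation protocol amounts to the following, using the scikit-image implementations of PSNR and SSIM and assuming float images in [0, 1].

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(restored, gt):
    """PSNR/SSIM on the full image, without border cropping; both images
    are float arrays in [0, 1] of shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(gt, restored, data_range=1.0)
    ssim = structural_similarity(gt, restored, data_range=1.0, channel_axis=-1)
    return psnr, ssim
```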
Table 2 shows the quantitative evaluation results on the synthetic datasets. The proposed method generates results with higher PSNR and SSIM values. In addition, the proposed lightweight model performs favorably against the state-of-the-art, showing the effectiveness of the proposed algorithm. Due to the space limit, we only report the results of the Full DSDNet hereafter.
Table 2: Quantitative evaluations on the synthetic datasets in terms of PSNR / SSIM. Within each dataset, successive rows correspond to increasing noise levels.

| Dataset | IRCNN [73] | SFARL [48] | ADM_UDM [21] | CPCR [12] | KerUNC [41] | VEM [42] | DWDN [10] | SVMAP [11] | DRUNet [72] | DSDNet (Light) | DSDNet (Full) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Levin [30] | 30.61 / 0.883 | 25.41 / 0.600 | 31.48 / 0.922 | 28.43 / 0.858 | 32.02 / 0.928 | 32.05 / 0.927 | 34.89 / 0.957 | 35.24 / 0.962 | 31.94 / 0.922 | 35.48 / 0.960 | 36.62 / 0.965 |
| | 29.70 / 0.864 | 16.82 / 0.255 | 28.61 / 0.812 | 25.61 / 0.765 | 21.72 / 0.416 | 29.47 / 0.867 | 31.94 / 0.916 | 31.20 / 0.893 | 30.86 / 0.905 | 32.13 / 0.918 | 32.89 / 0.925 |
| | 28.98 / 0.854 | 13.07 / 0.157 | 27.83 / 0.827 | 23.68 / 0.703 | 18.25 / 0.272 | 27.79 / 0.819 | 30.21 / 0.883 | 30.12 / 0.876 | 29.79 / 0.880 | 30.24 / 0.883 | 30.94 / 0.893 |
| BSD100 [38] | 29.20 / 0.817 | 24.21 / 0.568 | 29.39 / 0.836 | 28.77 / 0.829 | 29.23 / 0.829 | 29.54 / 0.848 | 31.10 / 0.881 | 31.52 / 0.888 | 30.36 / 0.872 | 31.50 / 0.892 | 32.01 / 0.898 |
| | 27.54 / 0.762 | 15.80 / 0.245 | 26.92 / 0.722 | 25.96 / 0.712 | 22.10 / 0.430 | 27.09 / 0.746 | 28.47 / 0.797 | 27.94 / 0.762 | 28.10 / 0.798 | 28.73 / 0.812 | 29.08 / 0.820 |
| | 27.04 / 0.756 | 12.56 / 0.146 | 26.04 / 0.697 | 25.75 / 0.688 | 18.99 / 0.297 | 26.11 / 0.698 | 27.50 / 0.762 | 27.59 / 0.763 | 27.19 / 0.767 | 27.64 / 0.774 | 27.96 / 0.782 |
| Set5 [3] | 30.15 / 0.853 | 26.21 / 0.632 | 30.52 / 0.868 | 30.59 / 0.875 | 30.45 / 0.864 | 31.00 / 0.875 | 32.18 / 0.893 | 32.31 / 0.892 | 30.84 / 0.881 | 32.78 / 0.899 | 33.43 / 0.905 |
| | 28.66 / 0.813 | 15.50 / 0.211 | 27.64 / 0.709 | 27.94 / 0.799 | 21.39 / 0.376 | 28.40 / 0.804 | 29.54 / 0.838 | 28.78 / 0.812 | 29.21 / 0.841 | 29.94 / 0.843 | 30.40 / 0.851 |
| | 27.55 / 0.789 | 11.91 / 0.122 | 26.75 / 0.756 | 26.64 / 0.754 | 17.74 / 0.241 | 26.46 / 0.732 | 28.13 / 0.806 | 28.02 / 0.793 | 27.85 / 0.805 | 28.46 / 0.804 | 28.89 / 0.814 |
Table 3: Quantitative evaluations on the Lai dataset [27] in terms of PSNR / SSIM.

| Subset | IRCNN [73] | ADM_UDM [21] | KerUNC [41] | VEM [42] | DWDN [10] | SVMAP [11] | DRUNet [72] | DSDNet |
|---|---|---|---|---|---|---|---|---|
| Manmade | 20.47 / 0.604 | 22.43 / 0.724 | 22.19 / 0.725 | 22.71 / 0.780 | 24.02 / 0.836 | 23.75 / 0.776 | 20.62 / 0.613 | 25.44 / 0.859 |
| Natural | 23.26 / 0.636 | 25.04 / 0.733 | 25.42 / 0.757 | 25.29 / 0.752 | 25.91 / 0.814 | 26.23 / 0.778 | 23.25 / 0.630 | 27.01 / 0.837 |
| People | 28.04 / 0.843 | 28.81 / 0.866 | 28.80 / 0.848 | 27.19 / 0.723 | 30.02 / 0.905 | 30.88 / 0.899 | 28.04 / 0.838 | 30.95 / 0.908 |
| Saturation | 16.99 / 0.642 | 17.57 / 0.627 | 17.70 / 0.640 | 17.65 / 0.600 | 17.90 / 0.695 | 18.75 / 0.733 | 17.14 / 0.658 | 18.38 / 0.734 |
| Text | 21.37 / 0.828 | 25.13 / 0.883 | 23.32 / 0.855 | 24.92 / 0.853 | 25.40 / 0.877 | 25.60 / 0.894 | 21.79 / 0.829 | 28.13 / 0.920 |
| Overall | 22.03 / 0.710 | 23.80 / 0.767 | 23.49 / 0.765 | 23.55 / 0.742 | 24.65 / 0.825 | 25.04 / 0.816 | 22.17 / 0.714 | 25.98 / 0.852 |
We then evaluate our method on the Lai dataset [27]. Because it contains Manmade, Natural, People, Saturation, and Text subsets, we present the evaluation accordingly. Table 3 shows that our method performs better than the evaluated methods and, similar to Table 2, achieves the highest PSNR and SSIM in most tests, except for the Saturation subset. We note that Dong et al. [11] specifically train a model for saturated scenes; thus, their method performs slightly better there. However, our model is trained only on common scenes yet remains comparable in PSNR on the Saturation images, and its SSIM values surpass those of SVMAP [11], demonstrating the effectiveness and robustness of our approach.
We also quantitatively evaluate our method on real-world blurry images with kernels estimated by Pan [43]. Since ground-truth images are unavailable, we use the no-reference BRISQUE [39] and PIQE [64] metrics for evaluation. Our model achieves the best score in BRISQUE and second place in PIQE, as shown in Table 4. As BRISQUE is trained on human subjective scores, Table 4 suggests that our model generates more subjectively satisfying results than the other state-of-the-art methods.
5.3 Qualitative Evaluation
We show visual comparisons on BSD100, Lai, and a real-world example in Fig. 3, Fig. 4, and Fig. 5, respectively.
Fig. 3 shows the deblurred results for a synthetic blurry image with Gaussian noise. Our method generates a clearer image with finer details, as shown in the regions enclosed by the red and green boxes.
Fig. 4 shows the results on a Manmade example from the Lai dataset [27]. The evaluated methods generate blurry results. In contrast, our method reconstructs better images (e.g., the wood texture is restored more faithfully, as shown in both the red and green boxes).
Fig. 5 shows the deblurred results for a real-world blurry image with a kernel estimated by [43]. Our method restores sharp text in the red boxes and the darkest, sharpest eyebrow in the green boxes. In contrast, the other methods cannot restore the text well and blend the eyebrow with the skin color; they also generate artifacts and noise in the olive background.
5.4 Ablation Study
Table 5: Ablation study based on the architecture of the Heavy DSDNet.

| | w/o F, G | ReLU | RBF | CG | CG† | FFT | FFT† | DSDNet |
|---|---|---|---|---|---|---|---|---|
| PSNR (dB) | 26.66 | 32.78 | 32.98 | 29.07 | 32.39 | 31.30 | 32.03 | 33.11 |
| FLOPs (G) | 136.26 | 464.94 | 468.03 | 470.73 | 559.97 | 288.89 | 391.57 | 466.32 |
| Parameters (M) | 1.04 | 237.85 | 237.85 | 0.31 | 32.95 | 0.27 | 32.09 | 237.87 |
| Gain (dB) | -6.45 | -0.33 | -0.13 | -4.04 | -0.72 | -1.81 | -1.08 | -/- |
In this section, we design experiments to demonstrate the effectiveness of the proposed discriminative shrinkage functions and the differentiable CGNet. Table 5 shows the ablation results w.r.t. different baselines. In this study, we train seven models based on the architecture of the Heavy DSDNet and compare the number of floating-point operations (FLOPs) and parameters.
To validate the effect of the learned filters $\mathbf{F}_i$ and $\mathbf{G}_j$, we train a model without estimating these two sets of filters, denoted "w/o F, G". Without the feature maps they provide, the shrinkage functions can only be learned from the RGB inputs. Table 5 shows that the PSNR of this baseline is at least 6.45 dB lower than that of our approach.
We also evaluate the effect of the Maxout layers by replacing them with ReLU. Table 5 shows that the method using ReLU does not generate better results than the proposed method, suggesting the effectiveness of the Maxout layers.
As RBFs are usually used to approximate shrinkage functions, one may wonder whether they would generate better results. To answer this question, we replace the Maxout layers with the commonly used Gaussian RBFs. Table 5 shows that the PSNR of the RBF variant is at least 0.13 dB lower than that of our method, indicating the effectiveness of the proposed method.
Finally, to demonstrate the efficiency of the proposed CGNet, we train two models with the conventional CG method and two with FFT-based deconvolution. The CG method is unstable with respect to even small perturbations [37, 18]; each optimization step during training, as well as the presence of noise, may cause it to diverge. Hence, we have to reduce the learning rate and apply gradient clipping to avoid exploding gradients during training, and the model still performs poorly even when training finishes. To make training more feasible, we first denoise the inputs with DRUNet [72]; this model is denoted "CG†". The training of "CG†" is smoother, its learning rate can be set as for DSDNet, and its performance is much better than that of the generic "CG". However, with the extra denoising model, the computational cost grows by about 26% in FLOPs, and training takes more than twice as long as DSDNet.
Similarly, we provide results of FFT-based deconvolution with the artifact-suppression operation of [25], denoted "FFT" and "FFT†". Although the FFT variants are much faster than the CG ones, the gap between "CG†" and "FFT†" is considerable, as discussed in Section 3. As for the test time on Set5: "CG" takes 5.6001 seconds, "CG†" 5.8252 seconds, "FFT" 3.6941 seconds, "FFT†" 4.0330 seconds, and DSDNet 2.6717 seconds. These results show the efficiency of the proposed CGNet. We include the ablation study on HypNet and NLNet in the supplementary material.
5.5 Execution Time Analysis
We analyze the execution time of the proposed method and state-of-the-art ones. All execution times are measured on an Nvidia RTX 2080Ti GPU. Fig. 6 shows that our models are faster and more accurate than the state-of-the-art methods. Among them, our Feather model is slightly faster than DWDN (by 0.0052 seconds) yet outperforms all other methods in PSNR.
5.6 Limitations
Although our method achieves better performance on various datasets, it has some limitations. Our model cannot handle blurry images containing significant saturated regions, which may lead to overflow. More analysis can be found in the supplementary material.
6 Conclusion
In this paper, we have presented a fully learnable MAP model for non-blind deconvolution. We formulate the data and regularization terms as learnable ones and split the deconvolution model into data-related and regularization-related sub-problems within the ADMM framework. Maxout layers are used to learn the discriminative shrinkage functions that directly approximate the solutions of these two sub-problems, and we have further developed a CGNet to restore images effectively and efficiently. With this design, the size of our model can be adjusted flexibly while remaining competitive in performance. Extensive evaluations on benchmark datasets demonstrate that the proposed model performs favorably against state-of-the-art non-blind deconvolution methods in terms of quantitative metrics, visual quality, and computational efficiency.
References
- [1] Aljadaany, R., Pal, D.K., Savvides, M.: Douglas-rachford networks: Learning both the image prior and data fidelity terms for blind image deconvolution. In: CVPR. pp. 10235–10244 (2019)
- [2] Barrett, R., Berry, M., Chan, T.F., Demmel, J., Donato, J., Dongarra, J., Eijkhout, V., Pozo, R., Romine, C., Van der Vorst, H.: Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. SIAM (1994)
- [3] Bevilacqua, M., Roumy, A., Guillemot, C., Alberi-Morel, M.L.: Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In: BMVC (2012)
- [4] Bigdeli, S.A., Zwicker, M., Favaro, P., Jin, M.: Deep mean-shift priors for image restoration. In: NeurIPS (2017)
- [5] Chakrabarti, A.: A neural approach to blind motion deblurring. In: ECCV. pp. 221–235 (2016)
- [6] Chan, T.F., Wong, C.K.: Total variation blind deconvolution. IEEE TIP 7(3), 370–375 (1998)
- [7] Chen, L., Zhang, J., Pan, J., Lin, S., Fang, F., Ren, J.S.: Learning a non-blind deblurring network for night blurry images. In: CVPR. pp. 10542–10550 (2021)
- [8] Cho, S., Wang, J., Lee, S.: Handling outliers in non-blind image deconvolution. In: ICCV. pp. 495–502 (2011)
- [9] Dong, J., Pan, J., Sun, D., Su, Z., Yang, M.H.: Learning data terms for non-blind deblurring. In: ECCV. pp. 748–763 (2018)
- [10] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: NeurIPS (2020)
- [11] Dong, J., Roth, S., Schiele, B.: Learning spatially-variant map models for non-blind image deblurring. In: CVPR. pp. 4886–4895 (2021)
- [12] Eboli, T., Sun, J., Ponce, J.: End-to-end interpretable learning of non-blind image deblurring. In: European Conference on Computer Vision. pp. 314–331. Springer (2020)
- [13] Gao, H., Tao, X., Shen, X., Jia, J.: Dynamic scene deblurring with parameter selective sharing and nested skip connections. In: CVPR. pp. 3848–3856 (2019)
- [14] Geman, D., Reynolds, G.: Constrained restoration and the recovery of discontinuities. IEEE TPAMI 14(3), 367–383 (1992)
- [15] Gong, D., Zhang, Z., Shi, Q., van den Hengel, A., Shen, C., Zhang, Y.: Learning deep gradient descent optimization for image deconvolution. IEEE transactions on neural networks and learning systems 31(12), 5468–5482 (2020)
- [16] Goodfellow, I., Warde-Farley, D., Mirza, M., Courville, A., Bengio, Y.: Maxout networks. In: ICML. pp. 1319–1327 (2013)
- [17] Gu, S., Zhang, L., Zuo, W., Feng, X.: Weighted nuclear norm minimization with application to image denoising. In: CVPR. pp. 2862–2869 (2014)
- [18] Hadamard, J.: Lectures on Cauchy’s Problem in Linear Partial Differential Equations. Courier Corporation (2003)
- [19] Jancsary, J., Nowozin, S., Rother, C.: Loss-specific training of non-parametric image restoration models: A new state of the art. In: ECCV. pp. 112–125 (2012)
- [20] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
- [21] Ko, H.C., Chang, J.Y., Ding, J.J.: Deep priors inside an unrolled and adaptive deconvolution model. In: ACCV (2020)
- [22] Kong, S., Wang, W., Feng, X., Jia, X.: Deep red unfolding network for image restoration. IEEE TIP 31, 852–867 (2021)
- [23] Krishnan, D., Fergus, R.: Fast image deconvolution using hyper-laplacian priors. In: NeurIPS. pp. 1033–1041 (2009)
- [24] Krishnan, D., Tay, T., Fergus, R.: Blind deconvolution using a normalized sparsity measure. In: CVPR. pp. 233–240 (2011)
- [25] Kruse, J., Rother, C., Schmidt, U.: Learning to push the limits of efficient fft-based image deconvolution. In: ICCV. pp. 4586–4594 (2017)
- [26] Kupyn, O., Budzan, V., Mykhailych, M., Mishkin, D., Matas, J.: Deblurgan: Blind motion deblurring using conditional adversarial networks. In: CVPR. pp. 8183–8192 (2018)
- [27] Lai, W.S., Huang, J.B., Hu, Z., Ahuja, N., Yang, M.H.: A comparative study for single image blind deblurring. In: CVPR. pp. 1701–1709 (2016)
- [28] Levin, A., Fergus, R., Durand, F., Freeman, W.T.: Image and depth from a conventional camera with a coded aperture. ACM TOG 26(3), 70–es (2007)
- [29] Levin, A., Weiss, Y.: User assisted separation of reflections from a single image using a sparsity prior. IEEE TPAMI 29(9), 1647–1654 (2007)
- [30] Levin, A., Weiss, Y., Durand, F., Freeman, W.T.: Understanding and evaluating blind deconvolution algorithms. In: CVPR. pp. 1964–1971 (2009)
- [31] Levin, A., Weiss, Y., Durand, F., Freeman, W.T.: Understanding blind deconvolution algorithms. IEEE TPAMI 33(12), 2354–2367 (2011)
- [32] Li, L., Pan, J., Lai, W.S., Gao, C., Sang, N., Yang, M.H.: Blind image deblurring via deep discriminative priors. IJCV 127(8), 1025–1043 (2019)
- [33] Li, Y., Tofighi, M., Geng, J., Monga, V., Eldar, Y.C.: Deep algorithm unrolling for blind image deblurring. arXiv preprint arXiv:1902.03493 (2019)
- [34] Liu, C.S.: Modifications of steepest descent method and conjugate gradient method against noise for ill-posed linear systems. Commun. Numer. Anal 2012 (2012)
- [35] Liu, R., Jia, J.: Reducing boundary artifacts in image deconvolution. In: ICIP. pp. 505–508 (2008)
- [36] Ma, K., Duanmu, Z., Wu, Q., Wang, Z., Yong, H., Li, H., Zhang, L.: Waterloo exploration database: New challenges for image quality assessment models. IEEE TIP 26(2), 1004–1016 (2016)
- [37] Marin, L., Háo, D.N., Lesnic, D.: Conjugate gradient-boundary element method for a Cauchy problem in the Lamé system. WIT Transactions on Modelling and Simulation 27 (2001)
- [38] Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: ICCV. pp. 416–423 (2001)
- [39] Mittal, A., Moorthy, A.K., Bovik, A.C.: No-reference image quality assessment in the spatial domain. IEEE TIP 21(12), 4695–4708 (2012). https://doi.org/10.1109/TIP.2012.2214050
- [40] Nah, S., Hyun Kim, T., Mu Lee, K.: Deep multi-scale convolutional neural network for dynamic scene deblurring. In: CVPR. pp. 3883–3891 (2017)
- [41] Nan, Y., Ji, H.: Deep learning for handling kernel/model uncertainty in image deconvolution. In: CVPR. pp. 2388–2397 (2020)
- [42] Nan, Y., Quan, Y., Ji, H.: Variational-em-based deep learning for noise-blind image deblurring. In: CVPR. pp. 3626–3635 (2020)
- [43] Pan, J., Sun, D., Pfister, H., Yang, M.H.: Blind image deblurring using dark channel prior. In: CVPR. pp. 1628–1636 (2016)
- [44] Parikh, N., Boyd, S.: Proximal algorithms. Foundations and Trends in Optimization 1(3), 127–239 (2014)
- [45] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., Chintala, S.: Pytorch: An imperative style, high-performance deep learning library. In: Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., Garnett, R. (eds.) NeurIPS. pp. 8024–8035. Curran Associates, Inc. (2019), http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
- [46] Perrone, D., Favaro, P.: Total variation blind deconvolution: The devil is in the details. In: CVPR. pp. 2909–2916 (2014)
- [47] Qiu, H., Hammernik, K., Qin, C., Rueckert, D.: Gradirn: Learning iterative gradient descent-based energy minimization for deformable image registration. arXiv preprint arXiv:2112.03915 (2021)
- [48] Ren, D., Zuo, W., Zhang, D., Zhang, L., Yang, M.H.: Simultaneous fidelity and regularization learning for image restoration. IEEE TPAMI (2019)
- [49] Ren, W., Cao, X., Pan, J., Guo, X., Zuo, W., Yang, M.H.: Image deblurring via enhanced low-rank prior. IEEE TIP 25(7), 3426–3437 (2016)
- [50] Richardson, W.H.: Bayesian-based iterative method of image restoration. JOSA 62(1), 55–59 (1972)
- [51] Roth, S., Black, M.J.: Fields of experts: A framework for learning image priors. In: CVPR. pp. 860–867 (2005)
- [52] Rudin, L.I., Osher, S.: Total variation based image restoration with free local constraints. In: ICIP. vol. 1, pp. 31–35 (1994)
- [53] Ryabtsev, A.: The error accumulation in the conjugate gradient method for degenerate problem. arXiv preprint arXiv:2004.10242 (2020)
- [54] Samuel, K.G., Tappen, M.F.: Learning optimized map estimates in continuously-valued mrf models. In: CVPR. pp. 477–484 (2009)
- [55] Schmidt, U., Jancsary, J., Nowozin, S., Roth, S., Rother, C.: Cascades of regression tree fields for image restoration. IEEE TPAMI 38(4), 677–689 (2015)
- [56] Schmidt, U., Roth, S.: Shrinkage fields for effective image restoration. In: CVPR. pp. 2774–2781 (2014)
- [57] Schmidt, U., Rother, C., Nowozin, S., Jancsary, J., Roth, S.: Discriminative non-blind deblurring. In: CVPR. pp. 604–611 (2013)
- [58] Schuler, C.J., Christopher Burger, H., Harmeling, S., Scholkopf, B.: A machine learning approach for non-blind image deconvolution. In: CVPR. pp. 1067–1074 (2013)
- [59] Suin, M., Purohit, K., Rajagopalan, A.: Spatially-attentive patch-hierarchical network for adaptive motion deblurring. In: CVPR. pp. 3606–3615 (2020)
- [60] Sun, J., Cao, W., Xu, Z., Ponce, J.: Learning a convolutional neural network for non-uniform motion blur removal. In: CVPR. pp. 769–777 (2015)
- [61] Sun, L., Cho, S., Wang, J., Hays, J.: Edge-based blur kernel estimation using patch priors. In: ICCP. pp. 1–8 (2013)
- [62] Tao, X., Gao, H., Shen, X., Wang, J., Jia, J.: Scale-recurrent network for deep image deblurring. In: CVPR. pp. 8174–8182 (2018)
- [63] Tappen, M.F., Liu, C., Adelson, E.H., Freeman, W.T.: Learning gaussian conditional random fields for low-level vision. In: CVPR. pp. 1–8 (2007)
- [64] Venkatanath, N., Praneeth, D., Bh, M.C., Channappayya, S.S., Medasani, S.S.: Blind image quality evaluation using perception based features. In: National Conference on Communications (NCC). pp. 1–6 (2015)
- [65] Wang, Y., Yang, J., Yin, W., Zhang, Y.: A new alternating minimization algorithm for total variation image reconstruction. SIAM Journal on Imaging Sciences 1(3), 248–272 (2008)
- [66] Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE TIP 13(4), 600–612 (2004)
- [67] Xiang, J., Dong, Y., Yang, Y.: Fista-net: Learning a fast iterative shrinkage thresholding network for inverse problems in imaging. IEEE TMI (2021)
- [68] Yang, Y., Sun, J., Li, H., Xu, Z.: Deep admm-net for compressive sensing mri. In: NeurIPS. pp. 10–18 (2016)
- [69] Zhang, J., Ghanem, B.: Ista-net: Interpretable optimization-inspired deep network for image compressive sensing. In: CVPR. pp. 1828–1837 (2018)
- [70] Zhang, J., shan Pan, J., Lai, W.S., Lau, R.W.H., Yang, M.H.: Learning fully convolutional networks for iterative non-blind deconvolution. In: CVPR. pp. 6969–6977 (2017)
- [71] Zhang, K., Gool, L.V., Timofte, R.: Deep unfolding network for image super-resolution. In: CVPR. pp. 3217–3226 (2020)
- [72] Zhang, K., Li, Y., Zuo, W., Zhang, L., Van Gool, L., Timofte, R.: Plug-and-play image restoration with deep denoiser prior. IEEE TPAMI (2021)
- [73] Zhang, K., Zuo, W., Gu, S., Zhang, L.: Learning deep cnn denoiser prior for image restoration. In: CVPR. pp. 3929–3938 (2017)
- [74] Zhang, K., Zuo, W., Zhang, L.: Deep plug-and-play super-resolution for arbitrary blur kernels. In: CVPR. pp. 1671–1681 (2019)
- [75] Zoran, D., Weiss, Y.: From learning models of natural image patches to whole image restoration. In: ICCV. pp. 479–486 (2011)