Learning Discriminative Shrinkage Deep Networks for Image Deconvolution

2Nanjing University of Science and Technology  3Google Research  4University of California, Merced  5Yonsei University
Abstract
Most existing methods formulate the non-blind deconvolution problem within a maximum-a-posteriori framework and address it by manually designing a variety of regularization and data terms for the latent clear images. However, explicitly designing these two terms is challenging and usually leads to complex optimization problems that are difficult to solve. Moreover, most existing methods simply learn the regularization term with deep convolutional neural networks (CNNs) or radial basis functions. In contrast, this paper proposes an effective non-blind deconvolution approach that models both terms implicitly by learning discriminative shrinkage functions: we formulate both the data term and the regularization term as learnable ones and split the deconvolution model into data-related and regularization-related sub-problems according to the alternating direction method of multipliers. We explore the properties of the Maxout function and develop a deep CNN model with Maxout layers that learns discriminative shrinkage functions to directly approximate the solutions of these two sub-problems. Furthermore, since fast-Fourier-transform-based image restoration usually leads to ringing artifacts while the conjugate-gradient-based approach is time-consuming, we develop a Conjugate Gradient Network to restore the latent clear images effectively and efficiently. Experimental results show that the proposed method performs favorably against the state-of-the-art in terms of efficiency and accuracy. Source code, models, and more results are available at https://github.com/setsunil/DSDNet.
1 Introduction
Single-image deconvolution, or deblurring, aims to restore a clear and sharp image from a single blurry input image. Blind image deblurring has attracted the interest of many researchers [31, 24, 49, 43], and with the rapid development of deep learning, tremendous progress has been made recently [26, 62, 74, 13, 59]. Since blind methods provide an estimated kernel, how to utilize these kernels well remains an important issue; consequently, non-blind deconvolution has never lost the attention of researchers over the past decades [50, 14, 8, 10]. Due to the ill-posedness of the deconvolution problem, numerous methods explore the statistical properties of clear images as image priors (e.g., the hyper-Laplacian prior [23, 30]) to make the problem tractable. Although hand-crafted image priors facilitate the removal of ringing artifacts, fine details are not restored well, as these limited priors cannot sufficiently model the inherent properties of various latent images.
To overcome this problem, discriminative image priors can be learned from training examples [58, 56, 9, 48]. These methods usually leverage radial basis functions (RBFs) as the shrinkage functions of the image prior. However, RBFs contain many parameters, leading to complex optimization problems.
Deep convolutional neural networks (CNNs) have been developed to learn more effective regularization terms for the deconvolution problem [70]. These methods are motivated by [56] and directly estimate the solution of the regularization-related sub-problem with deep CNN models. As analyzed in [56], this solution can be obtained by combining shrinkage functions. However, as shrinkage functions are complex (e.g., non-monotonic), simply using a convolution operation followed by a common activation function, e.g., ReLU, cannot model their properties. Given the effectiveness of deep features, it is of great interest to learn discriminative shrinkage functions: if we can learn shrinkage functions complex enough to match the deep features, they should surpass the hand-crafted ones in solving the regularization-related sub-problem.
We note that image restoration involves an image deconvolution step, usually performed with the fast Fourier transform (FFT) [56, 43, 73] or the Conjugate Gradient (CG) method [2, 28, 10]. However, FFT-based approaches usually lead to ringing artifacts, while CG-based ones are time-consuming. Besides, both methods suffer from information loss: for the FFT, information is lost when the imaginary parts are discarded in the inverse real FFT; for the CG method, the number of iterations actually executed is usually far below the upper bound required for an exact solution. Therefore, developing an effective yet efficient image restoration method is also necessary.
In this paper, we develop a simple and effective model that discriminatively learns the shrinkage functions for non-blind deconvolution, called the Discriminative Shrinkage Deep Network (DSDNet). We formulate the data and regularization terms as learnable ones and split the image deconvolution model into a data-related sub-problem and a regularization-related sub-problem. As shrinkage functions can solve both sub-problems, and learnable Maxout functions can efficiently approximate complex functions, we directly learn the shrinkage functions of the sub-problems via a deep CNN model with Maxout layers [16]. To generate clear images from the outputs of the learned functions effectively and efficiently, we develop a fully convolutional Conjugate Gradient Network (CGNet) motivated by the mathematics of the CG method. Finally, we formulate our method as an end-to-end network and solve the proposed model with the Alternating Direction Method of Multipliers (ADMM) [44]. Experimental results show that the proposed method performs favorably against state-of-the-art ones.
The main contributions of this work are:
- We propose a simple yet effective non-blind deconvolution model that directly learns discriminative shrinkage functions to implicitly model the data and regularization terms for image deconvolution.
- We develop an efficient and effective CGNet that restores clear images without the drawbacks of CG and FFT.
- The architecture of DSDNet is carefully designed, making it flexible in model size and easy to train. Even the smallest DSDNet performs favorably against state-of-the-art methods in speed and accuracy.
2 Related Work
Because numerous image deconvolution methods have been proposed, we discuss those most relevant to this work.
Statistical Image Prior-Based Methods.
Since non-blind deconvolution is an ill-posed problem, conventional methods usually develop image priors based on the statistical properties of clear images. Representative methods include total variation [52, 6, 65, 46], the hyper-Laplacian prior [28, 23], and patch-based priors [61, 17, 75], to name a few. However, these hand-crafted priors may not model the inherent properties of latent images well; thus, these methods do not restore realistic images effectively.
Learning-Based Methods.
To overcome the above limitations of the hand-crafted priors, researchers have proposed learning-based approaches, e.g., Markov random fields [51, 54], Gaussian mixture models [75], conditional random fields [63, 19, 57, 55], and radial basis functions [56, 9].
Learning-based non-blind deconvolution has also grown deeper with the development of neural networks. Many methods use deep CNNs to model the regularization term and solve image restoration problems by unrolling existing optimization algorithms, for example, the Iterative Shrinkage-Thresholding Algorithm [69, 67], the Douglas-Rachford method [1], the Half-Quadratic Splitting algorithm [33, 71, 21, 70, 4, 32, 7, 22], gradient descent [15, 47], and ADMM [68]. These methods use deep CNN models to estimate the solution of the regularization-related sub-problem. As demonstrated in [56], these solutions are combinations of shrinkage functions; simply using deep CNN models does not model the shrinkage functions well, since most activation functions are too simple. Besides, most of these methods focus on the regularization term yet ignore the importance of the data term. In addition, the image restoration step in these methods usually depends on an FFT-based solution, which may produce results with ringing artifacts; even though edge tapering [25] alleviates such artifacts, they remain inevitable in many scenes.
To overcome these problems, we leverage Maxout layers to learn discriminative shrinkage functions for the regularization and data terms, and develop the CGNet to restore images better. Furthermore, we adopt average pooling for noise-level estimation and residual blocks for computing the re-weights. In other words, we design each component according to its mathematical characteristics rather than stacking as many convolutional layers as possible, as in most previous works.
Blind Deblurring Methods.
Numerous end-to-end deep networks [60, 40, 62, 26] have been developed to restore clear images directly from blurry ones. However, as demonstrated in [10], when the blur kernels are given, these methods do not perform as well as non-blind deconvolution methods. As non-blind deconvolution is vital for image restoration, we focus on this problem and develop a simple and effective approach to restoring high-quality images.
3 Revisiting Deep Unrolling-Based Methods
We first revisit deep unrolling-based methods for image deconvolution to motivate our work. Mathematically, the degradation process of the image blur is usually formulated as:
$\mathbf{y} = \mathbf{k} \ast \mathbf{x} + \mathbf{n}$   (1)
where $\ast$ denotes the convolution operator; $\mathbf{y}$, $\mathbf{k}$, $\mathbf{x}$, and $\mathbf{n}$ denote the blurry image, the blur kernel, the latent image, and noise, respectively. With the known kernel $\mathbf{k}$, we can formulate the deconvolution as a maximum-a-posteriori (MAP) problem:
$\mathbf{x} = \arg\max_{\mathbf{x}} p(\mathbf{x} \mid \mathbf{y}) = \arg\max_{\mathbf{x}} p(\mathbf{y} \mid \mathbf{x})\, p(\mathbf{x})$   (2)
where $p(\mathbf{y} \mid \mathbf{x})$ is the likelihood of the observation (the blurry image) $\mathbf{y}$, while $p(\mathbf{x})$ denotes an image prior of the latent image $\mathbf{x}$. Taking the negative logarithm, this problem is equivalent to
$\mathbf{x} = \arg\min_{\mathbf{x}} \rho(\mathbf{x}) + f(\mathbf{x})$   (3)

where $\rho(\mathbf{x})$ and $f(\mathbf{x})$ denote the regularization term and data term, respectively. As the data term is usually modeled in the form of the $\ell_2$-norm, (3) can be rewritten as

$\mathbf{x} = \arg\min_{\mathbf{x}} \tfrac{1}{2}\|\mathbf{H}\mathbf{x} - \mathbf{y}\|_2^2 + \rho(\mathbf{x})$   (4)
where $\mathbf{x}$ and $\mathbf{y}$ denote the vector forms of the latent and blurry images, respectively, and $\mathbf{H}$ denotes the Toeplitz matrix of the blur kernel $\mathbf{k}$. Image deconvolution with ADMM is usually achieved by solving:
$\min_{\mathbf{x}, \mathbf{z}} \tfrac{1}{2}\|\mathbf{H}\mathbf{x} - \mathbf{y}\|_2^2 + \rho(\mathbf{z}) + \tfrac{\lambda}{2}\|\mathbf{x} - \mathbf{z} + \mathbf{u}\|_2^2$   (5)

where $\mathbf{z}$ is an auxiliary variable for $\mathbf{x}$, $\mathbf{u}$ is a Lagrangian multiplier (in scaled form), and $\lambda$ is a weight parameter. The solution of (5) can be obtained by alternately solving:
$\mathbf{x}^{t+1} = \arg\min_{\mathbf{x}} \tfrac{1}{2}\|\mathbf{H}\mathbf{x} - \mathbf{y}\|_2^2 + \tfrac{\lambda}{2}\|\mathbf{x} - \mathbf{z}^{t} + \mathbf{u}^{t}\|_2^2$   (6a)

$\mathbf{z}^{t+1} = \arg\min_{\mathbf{z}} \rho(\mathbf{z}) + \tfrac{\lambda}{2}\|\mathbf{x}^{t+1} - \mathbf{z} + \mathbf{u}^{t}\|_2^2$   (6b)

$\mathbf{u}^{t+1} = \mathbf{u}^{t} + \mathbf{x}^{t+1} - \mathbf{z}^{t+1}$   (6c)
Existing methods [70, 73, 72] usually solve (6a) via the fast Fourier transform (FFT) or the Conjugate Gradient (CG) method. For (6b), the solution can be represented as a proximal operator:

$\mathbf{z}^{t+1} = \operatorname{prox}_{\rho/\lambda}(\mathbf{v}^{t+1})$   (7)

where $\mathbf{v}^{t+1} = \mathbf{x}^{t+1} + \mathbf{u}^{t}$. With this scaled formulation, the multiplier $\mathbf{u}$ in (6) and (7) can be absorbed [44]. As demonstrated in [56], (7) can be approximated by shrinkage functions. Existing methods usually use deep CNN models to approximate the solution of (7). However, simply using a convolution operation followed by a fixed activation function (e.g., ReLU) cannot model the shrinkage functions well, as they are far more complex (e.g., non-monotonic) [56]. To better approximate the solution of (6b), we develop a deep CNN model with the Maxout function [16], which can effectively approximate proximal functions. In addition, we note that using FFT to solve (6a) does not obtain better results than the CG method, as demonstrated in [28, 29]. However, the CG method is time-consuming and unstable in deep networks (see Section 5.4 for more details). To overcome this problem, we learn a differentiable CG network to restore clear images more efficiently and effectively.
4 Proposed Method
Different from existing methods that simply learn the regularization term or the data term [70, 4, 32], we formulate both the data term and the regularization term as learnable ones:
$\min_{\mathbf{x}, \mathbf{z}, \mathbf{w}} \sum_{i=1}^{N} \phi_i(\mathbf{z}_i) + \sum_{j=1}^{M} \psi_j(\mathbf{w}_j), \quad \text{s.t.} \;\; \mathbf{z}_i = \mathbf{F}_i \mathbf{x}, \;\; \mathbf{w}_j = \mathbf{G}_j(\mathbf{H}\mathbf{x} - \mathbf{y})$   (8)

where $\phi_i$ and $\psi_j$ denote the $i$-th and $j$-th learnable functions; $\mathbf{z}_i$ and $\mathbf{w}_j$ are auxiliary variables that correspond to the regularization and data terms; $\mathbf{F}_i$ and $\mathbf{G}_j$ are the $i$-th and $j$-th learnable filters for regularization and data, respectively.
By introducing the Lagrangian multipliers $\mathbf{u}_i$ and $\mathbf{v}_j$ corresponding to the regularization and data terms, we can solve (8) using the ADMM method by:
$\mathbf{z}_i^{t+1} = \arg\min_{\mathbf{z}_i} \phi_i(\mathbf{z}_i) + \tfrac{\alpha_i}{2}\|\mathbf{F}_i\mathbf{x}^{t} - \mathbf{z}_i + \mathbf{u}_i^{t}\|_2^2$   (9a)

$\mathbf{w}_j^{t+1} = \arg\min_{\mathbf{w}_j} \psi_j(\mathbf{w}_j) + \tfrac{\beta_j}{2}\|\mathbf{G}_j(\mathbf{H}\mathbf{x}^{t} - \mathbf{y}) - \mathbf{w}_j + \mathbf{v}_j^{t}\|_2^2$   (9b)

$\mathbf{x}^{t+1} = \arg\min_{\mathbf{x}} \sum_{i=1}^{N} \tfrac{\alpha_i}{2}\|\mathbf{F}_i\mathbf{x} - \mathbf{z}_i^{t+1} + \mathbf{u}_i^{t}\|_2^2 + \sum_{j=1}^{M} \tfrac{\beta_j}{2}\|\mathbf{G}_j(\mathbf{H}\mathbf{x} - \mathbf{y}) - \mathbf{w}_j^{t+1} + \mathbf{v}_j^{t}\|_2^2$   (9c)

$\mathbf{u}_i^{t+1} = \mathbf{u}_i^{t} + \mathbf{F}_i\mathbf{x}^{t+1} - \mathbf{z}_i^{t+1}$   (9d)

$\mathbf{v}_j^{t+1} = \mathbf{v}_j^{t} + \mathbf{G}_j(\mathbf{H}\mathbf{x}^{t+1} - \mathbf{y}) - \mathbf{w}_j^{t+1}$   (9e)
In the following, we develop deep CNN models with Maxout layers to approximate the functions of (9a) and (9b), and design a simple and effective deep CG network to solve (9c).
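The overall structure of one unrolled stage can be summarized by the following structural sketch; the module names, channel counts, and filter sizes are illustrative (the Maxout shrinkage module is sketched after (10) below, and `cgnet` stands for the differentiable solver of Section 4.1), not the released implementation.

```python
import torch.nn as nn

class DSDStage(nn.Module):
    """One unrolled ADMM stage implementing (9a)-(9e). shrink_z / shrink_w
    are learned shrinkage modules (e.g., the Maxout sketch after (10)),
    cgnet solves the quadratic subproblem (9c), and blur applies H.
    Channel counts and filter sizes are illustrative assumptions."""
    def __init__(self, shrink_z, shrink_w, cgnet, in_ch=3, n_filters=24):
        super().__init__()
        self.F = nn.Conv2d(in_ch, n_filters, 7, padding=3, bias=False)  # filters F_i
        self.G = nn.Conv2d(in_ch, n_filters, 7, padding=3, bias=False)  # filters G_j
        self.shrink_z, self.shrink_w, self.cgnet = shrink_z, shrink_w, cgnet

    def forward(self, x, y, blur, u, v):
        z = self.shrink_z(self.F(x) + u)            # (9a): learned proximal step
        w = self.shrink_w(self.G(blur(x) - y) + v)  # (9b): learned proximal step
        x = self.cgnet(x, y, z - u, w - v)          # (9c): quadratic x-update
        u = u + self.F(x) - z                       # (9d): multiplier update
        v = v + self.G(blur(x) - y) - w             # (9e): multiplier update
        return x, u, v
```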
4.1 Network Architecture
Learning Filters $\mathbf{F}_i$ and $\mathbf{G}_j$.
To learn the filters $\mathbf{F}_i$ and $\mathbf{G}_j$, we develop two sub-networks, each containing one convolutional layer: one generates the filters $\mathbf{F}_i$ for the regularization term, and the other generates the filters $\mathbf{G}_j$ for the data term, with both layers using filters of the same spatial size.
Learning Discriminative Shrinkage Functions for (9a) and (9b).
To better learn the unknown discriminative shrinkage functions in (9a) and (9b), we take advantage of Maxout layers [16]. Specifically, our convolutional Maxout layer consists of two Maxout units, each containing one convolutional layer followed by a channel-wise max-pooling layer. Given an input feature map $\mathbf{X} \in \mathbb{R}^{H \times W \times C}$ and the output feature map $\mathbf{Z} \in \mathbb{R}^{H \times W \times CK}$ of the convolutional layer, a Maxout unit is achieved by:

$\mathbf{M}_{h,w,c}(\mathbf{X}) = \max_{k \in \{1, \dots, K\}} \mathbf{Z}_{h,w,(c-1)K+k}$   (10)

where $h \in [1, H]$, $w \in [1, W]$, and $c \in [1, C]$; $\mathbf{Z}_{h,w,c'}$ is the element of $\mathbf{Z}$ at position $(h, w, c')$, and $\mathbf{M}_{h,w,c}(\cdot)$ outputs the $(h, w, c)$-th element of the output tensor. In our implementation, the output $\mathbf{M}(\mathbf{X})$ is of the same size as the input $\mathbf{X}$.
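A minimal PyTorch sketch of one such Maxout unit implementing (10) follows; the kernel size and the number K of pieces per channel are assumptions.

```python
import torch.nn as nn

class MaxoutShrink(nn.Module):
    """One Maxout unit for (10): a convolution expands each of the C input
    channels into K responses, and a channel-wise max keeps one, giving a
    learnable piecewise-linear function per channel. K and the kernel size
    are assumptions; the layer described above pairs two such units."""
    def __init__(self, channels, k=4, ksize=7):
        super().__init__()
        self.k = k
        self.conv = nn.Conv2d(channels, channels * k, ksize, padding=ksize // 2)

    def forward(self, x):
        b, c, h, w = x.shape
        z = self.conv(x)                   # (B, C*K, H, W)
        z = z.view(b, c, self.k, h, w)     # group the K pieces per channel
        return z.max(dim=2).values         # channel-wise max over the K pieces
```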
Learning a Differentiable CG Network for (9c).
As stated in Section 3, although using FFT with boundary processing operations (e.g., edge tapering and Laplacian smoothing [35]) can efficiently solve (9c), the results are not as good as those of a CG-based solver, as shown in Table 5. However, a CG-based solver is time-consuming. To generate latent clear images better, we develop a differentiable CG network to solve (9c). The CG method solves the linear equation:
$\mathbf{A}\mathbf{x} = \mathbf{b}$   (11)

where $\mathbf{A} = \sum_{i} \alpha_i \mathbf{F}_i^{\top}\mathbf{F}_i + \sum_{j} \beta_j \mathbf{H}^{\top}\mathbf{G}_j^{\top}\mathbf{G}_j\mathbf{H}$ collects the quadratic parts of (9c) and $\mathbf{b}$ collects the corresponding linear parts. Given an initial estimate $\mathbf{x}^{0}$, the CG method recursively computes conjugate vectors $\mathbf{p}^{t}$ and expresses the difference between the desired solution and the initial input as

$\mathbf{x}^{\ast} - \mathbf{x}^{0} = \sum_{t=1}^{T} \gamma_t \mathbf{p}^{t}$

where $T$ is the iteration number, upper-bounded by the dimension of $\mathbf{x}$, and $\gamma_t$ is the weight calculated from the residual $\mathbf{r}^{t} = \mathbf{b} - \mathbf{A}\mathbf{x}^{t}$.
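For reference, the textbook CG recursion that CGNet is designed to replace can be written as follows; `A` is any function applying the operator in (11).

```python
import numpy as np

def conjugate_gradient(A, b, x0, max_iter=50, tol=1e-6):
    """Textbook CG for A x = b with symmetric positive-definite A, given as
    a function applying the operator; all arrays are real. The solution is
    accumulated as x0 plus a weighted sum of conjugate vectors p^t."""
    x = x0.copy()
    r = b - A(x)                         # initial residual r^0
    p = r.copy()                         # first conjugate direction
    rs = np.vdot(r, r)
    for _ in range(max_iter):
        Ap = A(p)
        gamma = rs / np.vdot(p, Ap)      # weight gamma_t from the residual
        x = x + gamma * p                # accumulate gamma_t * p^t
        r = r - gamma * Ap
        rs_new = np.vdot(r, r)
        if np.sqrt(rs_new) < tol:        # residual small enough: stop early
            break
        p = r + (rs_new / rs) * p        # next conjugate direction
        rs = rs_new
    return x
```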
However, when the matrix $\mathbf{A}$ is large, solving (11) with CG incurs a high computational cost. To overcome this problem, we develop a differentiable CG network based on a U-Net to compute the solution. The network design is motivated by the following reasons:
- As one of the Krylov subspace methods [2], CG finds the solution in the Krylov subspace $\mathrm{span}\{\mathbf{r}, \mathbf{A}\mathbf{r}, \mathbf{A}^{2}\mathbf{r}, \dots\}$, where $\mathbf{r}$ is the residual vector. In other words, the CG solution is a function of $\mathbf{A}$ and $\mathbf{r}$. Our CGNet therefore takes the residual as input and embeds $\mathbf{A}$ as part of the network, so that its output behaves as the CG solution.
- For a typical deconvolution problem, $\mathbf{A}$ is composed of convolution and transposed-convolution pairs, as in the first term of (9c): $\mathbf{F}_i$ and $\mathbf{G}_j\mathbf{H}$ act as convolutions, and $\mathbf{F}_i^{\top}$ and $\mathbf{H}^{\top}\mathbf{G}_j^{\top}$ as transposed convolutions. This observation connects naturally to an encoder-decoder architecture, so we integrate the convolutions into the encoder and the transposed convolutions into the decoder (see the sketch after this list).
- As the Conjugate Gradient method is a recursive algorithm, the U-Net computes feature maps in a similarly recursive fashion.
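As referenced in the second bullet, the following sketch shows how the operator of (11) decomposes into convolution (encoder-like) and transposed-convolution (decoder-like) stages; it assumes single-channel images and omits the data filters $\mathbf{G}_j$ for brevity.

```python
import torch.nn.functional as Fn

def apply_A(x, f_weight, k_weight, alpha=1.0, beta=1.0):
    """Applies A x = alpha * sum_i F_i^T F_i x + beta * H^T H x from (9c)/(11),
    assuming a single-channel image x of shape (B, 1, H, W), regularization
    filters f_weight of shape (N, 1, kf, kf), and a blur kernel k_weight of
    shape (1, 1, kk, kk); the data filters G_j are omitted for brevity."""
    pf = f_weight.shape[-1] // 2
    pk = k_weight.shape[-1] // 2
    Fx = Fn.conv2d(x, f_weight, padding=pf)                # F_i x: forward convolutions
    FtFx = Fn.conv_transpose2d(Fx, f_weight, padding=pf)   # F_i^T(.): transposed convolutions
    Hx = Fn.conv2d(x, k_weight, padding=pk)                # H x
    HtHx = Fn.conv_transpose2d(Hx, k_weight, padding=pk)   # H^T(.)
    return alpha * FtFx + beta * HtHx
```

Such an operator can be handed directly to the CG routine above; CGNet instead replaces the hand-crafted recursion with a U-Net whose encoder mirrors the forward convolutions and whose decoder mirrors the transposed ones.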
In practical CG iterations, the weights $\alpha_i$ and $\beta_j$ are usually updated with iteratively reweighted least squares (IRLS) to exploit the sparsity of the priors [29]. We design a simple HypNet to estimate these re-weights; its architecture is shown in Fig. 1. In addition, we note that the values of $\alpha_i$ and $\beta_j$ in (9c) depend on the noise level, so we design a simple NLNet to estimate a noise map that plays a role similar to $\alpha_i$ and $\beta_j$ (see Fig. 1). In contrast to most conventional methods, NLNet computes the weight for each pixel and is thus locally adaptive.
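A minimal sketch of such a locally adaptive noise-map estimator, in the spirit of NLNet, is shown below; the exact architecture (channel widths, activation choices) is an assumption, and only the use of average pooling to aggregate local noise statistics follows the description above.

```python
import torch.nn as nn

class NLNetSketch(nn.Module):
    """Illustrative per-pixel noise-map estimator: average pooling
    aggregates local statistics, and the final 1x1 convolution maps them to
    a positive weight that plays the role of the penalty weights in (9c).
    All channel widths and activation choices are assumptions."""
    def __init__(self, in_ch=3, ch=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.AvgPool2d(3, stride=1, padding=1),  # local averaging of noise statistics
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 1), nn.Softplus(),    # positive per-pixel weight map
        )

    def forward(self, y):
        return self.net(y)                         # (B, 1, H, W), locally adaptive
```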
5 Experimental Results
5.1 Datasets and Implementation Details
Training Dataset.
Similar to [10, 11], the training data comprises 4,744 images from the Waterloo Exploration dataset [36] and 400 images from the Berkeley segmentation dataset (BSD) [38]. To synthesize blurry images, we first generate 33,333 blur kernels using the method of [55], covering a range of kernel sizes. We crop image patches of fixed size from each image and convolve them with randomly selected generated kernels to produce blurry images. Gaussian noise with a level randomly drawn from 1% to 5% is added to each blurry image.
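The synthesis pipeline can be sketched as follows, assuming pixel values in [0, 1]; `fftconvolve` stands in for whatever convolution routine is actually used.

```python
import numpy as np
from scipy.signal import fftconvolve

def synthesize_blurry(patch, kernel, rng):
    """Training-pair synthesis as described above: convolve a clean RGB
    patch (H, W, 3) in [0, 1] with a sampled kernel (kh, kw), then add
    Gaussian noise with a level drawn uniformly from 1%-5%."""
    blurred = fftconvolve(patch, kernel[..., None], mode="same")  # per-channel blur
    sigma = rng.uniform(0.01, 0.05)                               # 1%-5% noise level
    noisy = blurred + rng.normal(0.0, sigma, blurred.shape)
    return np.clip(noisy, 0.0, 1.0), sigma

# e.g., rng = np.random.default_rng(0)
```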
Test Datasets.
We evaluate our method on both synthetic and real datasets. For the synthetic case, we use the 100 images of BSD100 and 100 kernels generated by [55] to synthesize blurry images, following the same protocol as the training data. We also use the Set5 [3] dataset with kernels generated by [5] as a test set. In addition, we use the datasets of Levin [30] and Lai [27] for evaluation.
For the real-world case, we evaluate our model on the data of Pan [43], which contains 23 blurry images and 23 kernels estimated by their method.
Table 1: Configurations of the four DSDNet variants and their results on Set5.

| | Feather | Light | Heavy | Full |
|---|---|---|---|---|
| Stages | 2 | 3 | 3 | 4 |
| Filters | 24 | 24 | 49 | 49 |
| PSNR | 32.51 | 32.78 | 33.11 | 33.43 |
| Seconds | 1.893 | 2.191 | 2.672 | 3.065 |
Implementation Details.
We train the networks using the ADAM optimizer [20] with default parameter settings and a batch size of 8. The total number of training iterations is 1 million. The learning rate is gradually decayed within each 250,000-iteration cycle and reset to progressively smaller values at iterations 250,001, 500,001, and 750,001. To constrain the network training, we apply a standard pixel-wise norm loss between the ground truth and the network output. Data augmentation (90° and 180° rotations, vertical and horizontal flipping) is used. We train four models of different sizes, i.e., Feather, Light, Heavy, and Full, whose configurations and results on Set5 are shown in Table 1.
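A hypothetical schedule matching this description is sketched below; the base learning rate and both decay factors are assumptions, as the exact values are not recoverable from the text.

```python
def lr_at(iteration, base_lr=1e-4, cycle=250_000):
    """Hypothetical schedule matching the description: decay within each
    250k-iteration cycle and reset to a smaller value at iterations
    250,001, 500,001, and 750,001. base_lr and both decay factors are
    assumptions, not the paper's settings."""
    cycle_idx = iteration // cycle           # which of the four cycles
    start = base_lr * (0.5 ** cycle_idx)     # assumed reset value per cycle
    frac = (iteration % cycle) / cycle       # progress within the cycle
    return start * (0.1 ** frac)             # assumed within-cycle decay

# Usage: each step, set g["lr"] = lr_at(step) for g in optimizer.param_groups.
```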
5.2 Quantitative Evaluation
We compare the proposed DSDNet with nine state-of-the-art methods: IRCNN [73], SFARL [48], ADM_UDM [21], CPCR [12], KerUNC [41], VEM [42], DWDN [10], SVMAP [11], and DRUNet [72]. These methods are fine-tuned on the same training dataset as in Section 5.1, and for each we choose the better of the fine-tuned and original models for comparison.
PSNR and SSIM [66] are used for quantitative evaluation. All quantitative evaluations are conducted without border cropping for fair comparison.
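The evaluation protocol amounts to the following, using the scikit-image implementations of PSNR and SSIM and assuming float images in [0, 1].

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(restored, gt):
    """PSNR/SSIM on the full image, without border cropping; both images
    are float arrays in [0, 1] of shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(gt, restored, data_range=1.0)
    ssim = structural_similarity(gt, restored, data_range=1.0, channel_axis=-1)
    return psnr, ssim
```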
Table 2 shows the quantitative evaluation results on the synthetic datasets. The proposed method generates results with higher PSNR and SSIM values. In addition, the proposed lightweight model performs favorably against the state-of-the-art, showing the effectiveness of the proposed algorithm. Due to the space limit, we only report the results of the Full DSDNet hereafter.
Table 2: Quantitative evaluations on the synthetic datasets in terms of PSNR / SSIM. Within each dataset, successive rows correspond to increasing noise levels.

| Dataset | IRCNN [73] | SFARL [48] | ADM_UDM [21] | CPCR [12] | KerUNC [41] | VEM [42] | DWDN [10] | SVMAP [11] | DRUNet [72] | DSDNet (Light) | DSDNet (Full) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Levin [30] | 30.61 / 0.883 | 25.41 / 0.600 | 31.48 / 0.922 | 28.43 / 0.858 | 32.02 / 0.928 | 32.05 / 0.927 | 34.89 / 0.957 | 35.24 / 0.962 | 31.94 / 0.922 | 35.48 / 0.960 | 36.62 / 0.965 |
| | 29.70 / 0.864 | 16.82 / 0.255 | 28.61 / 0.812 | 25.61 / 0.765 | 21.72 / 0.416 | 29.47 / 0.867 | 31.94 / 0.916 | 31.20 / 0.893 | 30.86 / 0.905 | 32.13 / 0.918 | 32.89 / 0.925 |
| | 28.98 / 0.854 | 13.07 / 0.157 | 27.83 / 0.827 | 23.68 / 0.703 | 18.25 / 0.272 | 27.79 / 0.819 | 30.21 / 0.883 | 30.12 / 0.876 | 29.79 / 0.880 | 30.24 / 0.883 | 30.94 / 0.893 |
| BSD100 [38] | 29.20 / 0.817 | 24.21 / 0.568 | 29.39 / 0.836 | 28.77 / 0.829 | 29.23 / 0.829 | 29.54 / 0.848 | 31.10 / 0.881 | 31.52 / 0.888 | 30.36 / 0.872 | 31.50 / 0.892 | 32.01 / 0.898 |
| | 27.54 / 0.762 | 15.80 / 0.245 | 26.92 / 0.722 | 25.96 / 0.712 | 22.10 / 0.430 | 27.09 / 0.746 | 28.47 / 0.797 | 27.94 / 0.762 | 28.10 / 0.798 | 28.73 / 0.812 | 29.08 / 0.820 |
| | 27.04 / 0.756 | 12.56 / 0.146 | 26.04 / 0.697 | 25.75 / 0.688 | 18.99 / 0.297 | 26.11 / 0.698 | 27.50 / 0.762 | 27.59 / 0.763 | 27.19 / 0.767 | 27.64 / 0.774 | 27.96 / 0.782 |
| Set5 [3] | 30.15 / 0.853 | 26.21 / 0.632 | 30.52 / 0.868 | 30.59 / 0.875 | 30.45 / 0.864 | 31.00 / 0.875 | 32.18 / 0.893 | 32.31 / 0.892 | 30.84 / 0.881 | 32.78 / 0.899 | 33.43 / 0.905 |
| | 28.66 / 0.813 | 15.50 / 0.211 | 27.64 / 0.709 | 27.94 / 0.799 | 21.39 / 0.376 | 28.40 / 0.804 | 29.54 / 0.838 | 28.78 / 0.812 | 29.21 / 0.841 | 29.94 / 0.843 | 30.40 / 0.851 |
| | 27.55 / 0.789 | 11.91 / 0.122 | 26.75 / 0.756 | 26.64 / 0.754 | 17.74 / 0.241 | 26.46 / 0.732 | 28.13 / 0.806 | 28.02 / 0.793 | 27.85 / 0.805 | 28.46 / 0.804 | 28.89 / 0.814 |
Table 3: Quantitative evaluations on the Lai dataset [27] in terms of PSNR / SSIM.

| Subset | IRCNN [73] | ADM_UDM [21] | KerUNC [41] | VEM [42] | DWDN [10] | SVMAP [11] | DRUNet [72] | DSDNet |
|---|---|---|---|---|---|---|---|---|
| Manmade | 20.47 / 0.604 | 22.43 / 0.724 | 22.19 / 0.725 | 22.71 / 0.780 | 24.02 / 0.836 | 23.75 / 0.776 | 20.62 / 0.613 | 25.44 / 0.859 |
| Natural | 23.26 / 0.636 | 25.04 / 0.733 | 25.42 / 0.757 | 25.29 / 0.752 | 25.91 / 0.814 | 26.23 / 0.778 | 23.25 / 0.630 | 27.01 / 0.837 |
| People | 28.04 / 0.843 | 28.81 / 0.866 | 28.80 / 0.848 | 27.19 / 0.723 | 30.02 / 0.905 | 30.88 / 0.899 | 28.04 / 0.838 | 30.95 / 0.908 |
| Saturation | 16.99 / 0.642 | 17.57 / 0.627 | 17.70 / 0.640 | 17.65 / 0.600 | 17.90 / 0.695 | 18.75 / 0.733 | 17.14 / 0.658 | 18.38 / 0.734 |
| Text | 21.37 / 0.828 | 25.13 / 0.883 | 23.32 / 0.855 | 24.92 / 0.853 | 25.40 / 0.877 | 25.60 / 0.894 | 21.79 / 0.829 | 28.13 / 0.920 |
| Overall | 22.03 / 0.710 | 23.80 / 0.767 | 23.49 / 0.765 | 23.55 / 0.742 | 24.65 / 0.825 | 25.04 / 0.816 | 22.17 / 0.714 | 25.98 / 0.852 |
We then evaluate our method on the Lai dataset [27]. Because it contains Manmade, Natural, People, Saturation, and Text subsets, we present the evaluation accordingly. Table 3 shows that our method performs better than the evaluated methods and, similar to Table 2, achieves the highest PSNR and SSIM in most tests, except for the Saturation subset. We note that Dong et al. [11] specifically train a model for saturated scenes; thus, their method performs slightly better there. However, our model is trained only on common scenes yet remains comparable in PSNR on the Saturation images, and its SSIM values surpass those of SVMAP [11], demonstrating the effectiveness and robustness of our approach.
We also quantitatively evaluate our method on real-world blurry images with kernels estimated by Pan [43]. Since ground-truth images are unavailable, we use the no-reference BRISQUE [39] and PIQE [64] metrics for evaluation. Our model achieves the best score in BRISQUE and second place in PIQE, as shown in Table 4. As BRISQUE is trained on human subjective scores, Table 4 suggests that our model generates more subjectively satisfying results than the other state-of-the-art methods.
5.3 Qualitative Evaluation
We show visual comparisons on BSD100, Lai, and a real-world example in Fig. 3, Fig. 4, and Fig. 5, respectively.
Fig. 3 shows the deblurred results for a synthetic blurry image with Gaussian noise. Our method generates a clearer image with finer details, as shown in the regions enclosed by the red and green boxes.
Fig. 4 shows the results on a Manmade example from the Lai dataset [27]. The evaluated methods generate blurry results. In contrast, our method reconstructs better images (e.g., the wood texture is restored more faithfully, as shown in both the red and green boxes).
Fig. 5 shows the deblurred results for a real-world blurry image with a kernel estimated by [43]. Our method restores sharp text in the red boxes and the darkest, sharpest eyebrow in the green boxes. In contrast, the other methods cannot restore the text well and blend the eyebrow with the skin color; they also generate artifacts and noise in the olive background.
5.4 Ablation Study
Table 5: Ablation study based on the architecture of the Heavy DSDNet.

| | w/o F, G | ReLU | RBF | CG | CG† | FFT | FFT† | DSDNet |
|---|---|---|---|---|---|---|---|---|
| PSNR (dB) | 26.66 | 32.78 | 32.98 | 29.07 | 32.39 | 31.30 | 32.03 | 33.11 |
| FLOPs (G) | 136.26 | 464.94 | 468.03 | 470.73 | 559.97 | 288.89 | 391.57 | 466.32 |
| Parameters (M) | 1.04 | 237.85 | 237.85 | 0.31 | 32.95 | 0.27 | 32.09 | 237.87 |
| Gain (dB) | -6.45 | -0.33 | -0.13 | -4.04 | -0.72 | -1.81 | -1.08 | -/- |
In this section, we design experiments to demonstrate the effectiveness of the proposed discriminative shrinkage functions and the differentiable CGNet. Table 5 shows the ablation results w.r.t. different baselines. In this study, we train seven models based on the architecture of the Heavy DSDNet and compare the number of floating-point operations (FLOPs) and parameters.
To validate the effect of the learned filters $\mathbf{F}_i$ and $\mathbf{G}_j$, we train a model without estimating these two sets of filters, denoted "w/o F, G". Without the feature maps they provide, the shrinkage functions can only be learned from the RGB inputs. Table 5 shows that the PSNR of this baseline is at least 6.45 dB lower than that of our approach.
We also evaluate the effect of the Maxout layers by replacing them with ReLU. Table 5 shows that the method using ReLU does not generate better results than the proposed method, suggesting the effectiveness of the Maxout layers.
As RBFs are usually used to approximate shrinkage functions, one may wonder whether they would generate better results. To answer this question, we replace the Maxout layers with the commonly used Gaussian RBFs. Table 5 shows that the PSNR of the RBF variant is at least 0.13 dB lower than that of our method, indicating the effectiveness of the proposed method.
Finally, to demonstrate the efficiency of the proposed CGNet, we train two models with the conventional CG method and two with FFT-based deconvolution. The CG method is unstable with respect to even small perturbations [37, 18]; each optimization step during training, as well as the presence of noise, may cause it to diverge. Hence, we have to reduce the learning rate and apply gradient clipping to avoid exploding gradients during training, and the model still performs poorly even when training finishes. To make training more feasible, we first denoise the inputs with DRUNet [72]; this model is denoted "CG†". The training of "CG†" is smoother, its learning rate can be set as for DSDNet, and its performance is much better than that of the generic "CG". However, with the extra denoising model, the computational cost grows by about 26% in FLOPs, and training takes more than twice as long as DSDNet.
Similarly, we provide results of FFT-based deconvolution with the artifact-suppression operation of [25], denoted "FFT" and "FFT†". Although the FFT variants are much faster than the CG ones, the gap between "CG†" and "FFT†" is considerable, as discussed in Section 3. As for the test time on Set5: "CG" takes 5.6001 seconds, "CG†" 5.8252 seconds, "FFT" 3.6941 seconds, "FFT†" 4.0330 seconds, and DSDNet 2.6717 seconds. These results show the efficiency of the proposed CGNet. We include the ablation study on HypNet and NLNet in the supplementary material.
5.5 Execution Time Analysis
We analyze the execution time of the proposed method and state-of-the-art ones. All execution times are measured on an Nvidia RTX 2080Ti GPU. Fig. 6 shows that our models are faster and more accurate than the state-of-the-art methods. Among them, our Feather model is slightly faster than DWDN (by 0.0052 seconds) yet outperforms all other methods in PSNR.
5.6 Limitations
Although our method achieves better performance on various datasets, it has some limitations. Our model cannot handle blurry images containing significant saturated regions, which may lead to overflow. More analysis can be found in the supplementary material.
6 Conclusion
In this paper, we have presented a fully learnable MAP model for non-blind deconvolution. We formulate the data and regularization terms as learnable ones and split the deconvolution model into data-related and regularization-related sub-problems within the ADMM framework. Maxout layers are used to learn the discriminative shrinkage functions that directly approximate the solutions of these two sub-problems, and we have further developed a CGNet to restore images effectively and efficiently. With this design, the size of our model can be adjusted flexibly while remaining competitive in performance. Extensive evaluations on benchmark datasets demonstrate that the proposed model performs favorably against state-of-the-art non-blind deconvolution methods in terms of quantitative metrics, visual quality, and computational efficiency.
References
- [1] Aljadaany, R., Pal, D.K., Savvides, M.: Douglas-rachford networks: Learning both the image prior and data fidelity terms for blind image deconvolution. In: CVPR. pp. 10235–10244 (2019)
- [2] Barrett, R., Berry, M., Chan, T.F., Demmel, J., Donato, J., Dongarra, J., Eijkhout, V., Pozo, R., Romine, C., Van der Vorst, H.: Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. SIAM (1994)
- [3] Bevilacqua, M., Roumy, A., Guillemot, C., Alberi-Morel, M.L.: Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In: BMVC (2012)
- [4] Bigdeli, S.A., Zwicker, M., Favaro, P., Jin, M.: Deep mean-shift priors for image restoration. In: NeurIPS (2017)
- [5] Chakrabarti, A.: A neural approach to blind motion deblurring. In: ECCV. pp. 221–235 (2016)
- [6] Chan, T.F., Wong, C.K.: Total variation blind deconvolution. IEEE TIP 7(3), 370–375 (1998)
- [7] Chen, L., Zhang, J., Pan, J., Lin, S., Fang, F., Ren, J.S.: Learning a non-blind deblurring network for night blurry images. In: CVPR. pp. 10542–10550 (2021)
- [8] Cho, S., Wang, J., Lee, S.: Handling outliers in non-blind image deconvolution. In: ICCV. pp. 495–502 (2011)
- [9] Dong, J., Pan, J., Sun, D., Su, Z., Yang, M.H.: Learning data terms for non-blind deblurring. In: ECCV. pp. 748–763 (2018)
- [10] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: NeurIPS (2020)
- [11] Dong, J., Roth, S., Schiele, B.: Learning spatially-variant map models for non-blind image deblurring. In: CVPR. pp. 4886–4895 (2021)
- [12] Eboli, T., Sun, J., Ponce, J.: End-to-end interpretable learning of non-blind image deblurring. In: European Conference on Computer Vision. pp. 314–331. Springer (2020)
- [13] Gao, H., Tao, X., Shen, X., Jia, J.: Dynamic scene deblurring with parameter selective sharing and nested skip connections. In: CVPR. pp. 3848–3856 (2019)
- [14] Geman, D., Reynolds, G.: Constrained restoration and the recovery of discontinuities. IEEE TPAMI 14(3), 367–383 (1992)
- [15] Gong, D., Zhang, Z., Shi, Q., van den Hengel, A., Shen, C., Zhang, Y.: Learning deep gradient descent optimization for image deconvolution. IEEE transactions on neural networks and learning systems 31(12), 5468–5482 (2020)
- [16] Goodfellow, I., Warde-Farley, D., Mirza, M., Courville, A., Bengio, Y.: Maxout networks. In: ICML. pp. 1319–1327 (2013)
- [17] Gu, S., Zhang, L., Zuo, W., Feng, X.: Weighted nuclear norm minimization with application to image denoising. In: CVPR. pp. 2862–2869 (2014)
- [18] Hadamard, J.: Lectures on Cauchy’s Problem in Linear Partial Differential Equations. Courier Corporation (2003)
- [19] Jancsary, J., Nowozin, S., Rother, C.: Loss-specific training of non-parametric image restoration models: A new state of the art. In: ECCV. pp. 112–125 (2012)
- [20] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
- [21] Ko, H.C., Chang, J.Y., Ding, J.J.: Deep priors inside an unrolled and adaptive deconvolution model. In: ACCV (2020)
- [22] Kong, S., Wang, W., Feng, X., Jia, X.: Deep red unfolding network for image restoration. IEEE TIP 31, 852–867 (2021)
- [23] Krishnan, D., Fergus, R.: Fast image deconvolution using hyper-laplacian priors. In: NeurIPS. pp. 1033–1041 (2009)
- [24] Krishnan, D., Tay, T., Fergus, R.: Blind deconvolution using a normalized sparsity measure. In: CVPR. pp. 233–240 (2011)
- [25] Kruse, J., Rother, C., Schmidt, U.: Learning to push the limits of efficient fft-based image deconvolution. In: ICCV. pp. 4586–4594 (2017)
- [26] Kupyn, O., Budzan, V., Mykhailych, M., Mishkin, D., Matas, J.: Deblurgan: Blind motion deblurring using conditional adversarial networks. In: CVPR. pp. 8183–8192 (2018)
- [27] Lai, W.S., Huang, J.B., Hu, Z., Ahuja, N., Yang, M.H.: A comparative study for single image blind deblurring. In: CVPR. pp. 1701–1709 (2016)
- [28] Levin, A., Fergus, R., Durand, F., Freeman, W.T.: Image and depth from a conventional camera with a coded aperture. ACM TOG 26(3), 70–es (2007)
- [29] Levin, A., Weiss, Y.: User assisted separation of reflections from a single image using a sparsity prior. IEEE TPAMI 29(9), 1647–1654 (2007)
- [30] Levin, A., Weiss, Y., Durand, F., Freeman, W.T.: Understanding and evaluating blind deconvolution algorithms. In: CVPR. pp. 1964–1971 (2009)
- [31] Levin, A., Weiss, Y., Durand, F., Freeman, W.T.: Understanding blind deconvolution algorithms. IEEE TPAMI 33(12), 2354–2367 (2011)
- [32] Li, L., Pan, J., Lai, W.S., Gao, C., Sang, N., Yang, M.H.: Blind image deblurring via deep discriminative priors. IJCV 127(8), 1025–1043 (2019)
- [33] Li, Y., Tofighi, M., Geng, J., Monga, V., Eldar, Y.C.: Deep algorithm unrolling for blind image deblurring. arXiv preprint arXiv:1902.03493 (2019)
- [34] Liu, C.S.: Modifications of steepest descent method and conjugate gradient method against noise for ill-posed linear systems. Commun. Numer. Anal 2012 (2012)
- [35] Liu, R., Jia, J.: Reducing boundary artifacts in image deconvolution. In: ICIP. pp. 505–508 (2008)
- [36] Ma, K., Duanmu, Z., Wu, Q., Wang, Z., Yong, H., Li, H., Zhang, L.: Waterloo exploration database: New challenges for image quality assessment models. IEEE TIP 26(2), 1004–1016 (2016)
- [37] Marin, L., Háo, D.N., Lesnic, D.: Conjugate gradient-boundary element method for a Cauchy problem in the Lamé system. WIT Transactions on Modelling and Simulation 27 (2001)
- [38] Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: ICCV. pp. 416–423 (2001)
- [39] Mittal, A., Moorthy, A.K., Bovik, A.C.: No-reference image quality assessment in the spatial domain. IEEE TIP 21(12), 4695–4708 (2012). https://doi.org/10.1109/TIP.2012.2214050
- [40] Nah, S., Hyun Kim, T., Mu Lee, K.: Deep multi-scale convolutional neural network for dynamic scene deblurring. In: CVPR. pp. 3883–3891 (2017)
- [41] Nan, Y., Ji, H.: Deep learning for handling kernel/model uncertainty in image deconvolution. In: CVPR. pp. 2388–2397 (2020)
- [42] Nan, Y., Quan, Y., Ji, H.: Variational-em-based deep learning for noise-blind image deblurring. In: CVPR. pp. 3626–3635 (2020)
- [43] Pan, J., Sun, D., Pfister, H., Yang, M.H.: Blind image deblurring using dark channel prior. In: CVPR. pp. 1628–1636 (2016)
- [44] Parikh, N., Boyd, S.: Proximal algorithms. Foundations and Trends in Optimization 1(3), 127–239 (2014)
- [45] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., Chintala, S.: Pytorch: An imperative style, high-performance deep learning library. In: Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., Garnett, R. (eds.) NeurIPS. pp. 8024–8035. Curran Associates, Inc. (2019), http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
- [46] Perrone, D., Favaro, P.: Total variation blind deconvolution: The devil is in the details. In: CVPR. pp. 2909–2916 (2014)
- [47] Qiu, H., Hammernik, K., Qin, C., Rueckert, D.: Gradirn: Learning iterative gradient descent-based energy minimization for deformable image registration. arXiv preprint arXiv:2112.03915 (2021)
- [48] Ren, D., Zuo, W., Zhang, D., Zhang, L., Yang, M.H.: Simultaneous fidelity and regularization learning for image restoration. IEEE TPAMI (2019)
- [49] Ren, W., Cao, X., Pan, J., Guo, X., Zuo, W., Yang, M.H.: Image deblurring via enhanced low-rank prior. IEEE TIP 25(7), 3426–3437 (2016)
- [50] Richardson, W.H.: Bayesian-based iterative method of image restoration. JOSA 62(1), 55–59 (1972)
- [51] Roth, S., Black, M.J.: Fields of experts: A framework for learning image priors. In: CVPR. pp. 860–867 (2005)
- [52] Rudin, L.I., Osher, S.: Total variation based image restoration with free local constraints. In: ICIP. vol. 1, pp. 31–35 (1994)
- [53] Ryabtsev, A.: The error accumulation in the conjugate gradient method for degenerate problem. arXiv preprint arXiv:2004.10242 (2020)
- [54] Samuel, K.G., Tappen, M.F.: Learning optimized map estimates in continuously-valued mrf models. In: CVPR. pp. 477–484 (2009)
- [55] Schmidt, U., Jancsary, J., Nowozin, S., Roth, S., Rother, C.: Cascades of regression tree fields for image restoration. IEEE TPAMI 38(4), 677–689 (2015)
- [56] Schmidt, U., Roth, S.: Shrinkage fields for effective image restoration. In: CVPR. pp. 2774–2781 (2014)
- [57] Schmidt, U., Rother, C., Nowozin, S., Jancsary, J., Roth, S.: Discriminative non-blind deblurring. In: CVPR. pp. 604–611 (2013)
- [58] Schuler, C.J., Christopher Burger, H., Harmeling, S., Scholkopf, B.: A machine learning approach for non-blind image deconvolution. In: CVPR. pp. 1067–1074 (2013)
- [59] Suin, M., Purohit, K., Rajagopalan, A.: Spatially-attentive patch-hierarchical network for adaptive motion deblurring. In: CVPR. pp. 3606–3615 (2020)
- [60] Sun, J., Cao, W., Xu, Z., Ponce, J.: Learning a convolutional neural network for non-uniform motion blur removal. In: CVPR. pp. 769–777 (2015)
- [61] Sun, L., Cho, S., Wang, J., Hays, J.: Edge-based blur kernel estimation using patch priors. In: ICCP. pp. 1–8 (2013)
- [62] Tao, X., Gao, H., Shen, X., Wang, J., Jia, J.: Scale-recurrent network for deep image deblurring. In: CVPR. pp. 8174–8182 (2018)
- [63] Tappen, M.F., Liu, C., Adelson, E.H., Freeman, W.T.: Learning gaussian conditional random fields for low-level vision. In: CVPR. pp. 1–8 (2007)
- [64] Venkatanath, N., Praneeth, D., Bh, M.C., Channappayya, S.S., Medasani, S.S.: Blind image quality evaluation using perception based features. In: National Conference on Communications (NCC). pp. 1–6 (2015)
- [65] Wang, Y., Yang, J., Yin, W., Zhang, Y.: A new alternating minimization algorithm for total variation image reconstruction. SIAM Journal on Imaging Sciences 1(3), 248–272 (2008)
- [66] Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE TIP 13(4), 600–612 (2004)
- [67] Xiang, J., Dong, Y., Yang, Y.: Fista-net: Learning a fast iterative shrinkage thresholding network for inverse problems in imaging. IEEE TMI (2021)
- [68] Yang, Y., Sun, J., Li, H., Xu, Z.: Deep admm-net for compressive sensing mri. In: NeurIPS. pp. 10–18 (2016)
- [69] Zhang, J., Ghanem, B.: Ista-net: Interpretable optimization-inspired deep network for image compressive sensing. In: CVPR. pp. 1828–1837 (2018)
- [70] Zhang, J., shan Pan, J., Lai, W.S., Lau, R.W.H., Yang, M.H.: Learning fully convolutional networks for iterative non-blind deconvolution. In: CVPR. pp. 6969–6977 (2017)
- [71] Zhang, K., Gool, L.V., Timofte, R.: Deep unfolding network for image super-resolution. In: CVPR. pp. 3217–3226 (2020)
- [72] Zhang, K., Li, Y., Zuo, W., Zhang, L., Van Gool, L., Timofte, R.: Plug-and-play image restoration with deep denoiser prior. IEEE TPAMI (2021)
- [73] Zhang, K., Zuo, W., Gu, S., Zhang, L.: Learning deep cnn denoiser prior for image restoration. In: CVPR. pp. 3929–3938 (2017)
- [74] Zhang, K., Zuo, W., Zhang, L.: Deep plug-and-play super-resolution for arbitrary blur kernels. In: CVPR. pp. 1671–1681 (2019)
- [75] Zoran, D., Weiss, Y.: From learning models of natural image patches to whole image restoration. In: ICCV. pp. 479–486 (2011)