
1 Graduate Institute of Electronics Engineering, National Taiwan University
2 Nanjing University of Science and Technology
3 Google Research   4 University of California, Merced   5 Yonsei University

Learning Discriminative Shrinkage Deep Networks for Image Deconvolution

Pin-Hung Kuo¹  Jinshan Pan²  Shao-Yi Chien¹  Ming-Hsuan Yang³,⁴,⁵
Abstract

Most existing methods formulate the non-blind deconvolution problem in a maximum-a-posteriori framework and address it by manually designing a variety of regularization and data terms for the latent clear images. However, explicitly designing these two terms is challenging and usually leads to complex optimization problems that are difficult to solve. This paper proposes an effective non-blind deconvolution approach that models these terms implicitly by learning discriminative shrinkage functions. Most existing methods simply use deep convolutional neural networks (CNNs) or radial basis functions to learn the regularization term. In contrast, we formulate both the data term and the regularization term as learnable, and split the deconvolution model into data-related and regularization-related sub-problems according to the alternating direction method of multipliers. We explore the properties of the Maxout function and develop a deep CNN model with Maxout layers to learn discriminative shrinkage functions, which directly approximate the solutions of these two sub-problems. Moreover, fast-Fourier-transform-based image restoration usually leads to ringing artifacts, while conjugate-gradient-based restoration is time-consuming; we therefore develop a Conjugate Gradient Network (CGNet) to restore the latent clear images effectively and efficiently. Experimental results show that the proposed method performs favorably against the state-of-the-art in terms of efficiency and accuracy. Source code, models, and more results are available at https://github.com/setsunil/DSDNet.

1 Introduction

Single image deconvolution, or deblurring, aims to restore a clear and sharp image from a single blurry input. Blind image deblurring has attracted interest from many researchers [31, 24, 49, 43], and with the rapid development of deep learning, tremendous progress has been made recently [26, 62, 74, 13, 59]. Since blur kernels can be estimated by blind methods, how to make good use of these kernels remains an important issue. Therefore, non-blind deconvolution has never lost the attention of researchers over the past decades [50, 14, 8, 10]. Due to the ill-posedness of the deconvolution problem, numerous methods explore the statistical properties of clear images as image priors (e.g., the hyper-Laplacian prior [23, 30]) to make the problem tractable. Although using hand-crafted image priors facilitates the removal of ringing artifacts, fine details are not restored well, as these limited priors may not sufficiently model the inherent properties of various latent images.

To overcome this problem, discriminative image priors are learned from training examples [58, 56, 9, 48]. These methods usually leverage radial basis functions (RBFs) as the shrinkage functions of the image prior. However, the RBFs contain many parameters, leading to complex optimization problems.

Deep convolutional neural networks (CNNs) have been developed to learn more effective regularization terms for the deconvolution problem [70]. These methods are motivated by [56] and directly estimate the solution to the regularization-related sub-problem with deep CNN models. As analyzed in [56], the solution to the regularization-related sub-problem can be obtained by combining shrinkage functions. However, as these shrinkage functions are complex (e.g., non-monotonic), simply using convolution operations followed by common activation functions, e.g., ReLU, cannot model their properties. Given the effectiveness of deep features, it is of great interest to learn discriminative shrinkage functions: if we can learn more complex shrinkage functions corresponding to the deep features, they should surpass hand-crafted ones in solving the regularization-related sub-problem.

We note that image restoration involves an image deconvolution step, which usually relies on the fast Fourier transform (FFT) [56, 43, 73] or the Conjugate Gradient (CG) method [2, 28, 10]. However, FFT-based approaches usually lead to ringing artifacts, while CG-based ones are time-consuming. Both also suffer from information loss: for FFT, some information is lost when the imaginary parts are discarded in the inverse real FFT; for the CG method, the number of iterations actually executed is usually far below the upper bound. Therefore, developing an effective yet efficient image restoration method is also necessary.

In this paper, we develop a simple and effective model that discriminatively learns the shrinkage functions for non-blind deconvolution, called the Discriminative Shrinkage Deep Network (DSDNet). We formulate the data and regularization terms as learnable ones and split the image deconvolution model into a data-related sub-problem and a regularization-related sub-problem. As both sub-problems can be solved by shrinkage functions, and learnable Maxout functions can efficiently approximate complex functions, we directly learn the shrinkage functions of the sub-problems via a deep CNN model with Maxout layers [16]. To effectively and efficiently generate clear images from the outputs of the learned functions, we develop a fully convolutional Conjugate Gradient Network (CGNet) motivated by the mathematics of the CG method. Finally, we formulate our method as an end-to-end network and solve it with the Alternating Direction Method of Multipliers (ADMM) [44]. Experimental results show that the proposed method performs favorably against state-of-the-art ones.

The main contributions of this work are:

  • We propose a simple yet effective non-blind deconvolution model to directly learn discriminative shrinkage functions to model the data and regularization terms for image deconvolution implicitly.

  • We develop an efficient and effective CGNet to restore clear images without the problems of CG and FFT.

  • The architecture of DSDNet is elaborately designed, making it flexible in model size and easy to train. Even the smallest DSDNet performs favorably against state-of-the-art methods in speed and accuracy.

2 Related Work

Because numerous image deconvolution methods have been proposed, we discuss those most relevant to this work.

Statistical Image Prior-Based Methods.

Since non-blind deconvolution is an ill-posed problem, conventional methods usually develop image priors based on the statistical properties of clear images. Representative methods include total variation [52, 6, 65, 46], hyper-Laplacian prior [28, 23], and patch-based prior [61, 17, 75], to name a few. However, these hand-crafted priors may not model the inherent properties of the latent image well; thus, these methods do not effectively restore realistic images.

Learning-Based Methods.

To overcome the above limitations of the hand-crafted priors, researchers have proposed learning-based approaches, e.g., Markov random fields [51, 54], Gaussian mixture models [75], conditional random fields [63, 19, 57, 55], and radial basis functions [56, 9].

Learning-based non-blind deconvolution has also grown deeper with the development of neural networks. Many methods use deep CNNs to model the regularization term and solve image restoration problems by unrolling existing optimization algorithms, for example, the Iterative Shrinkage-Thresholding algorithm [69, 67], the Douglas-Rachford method [1], the Half-Quadratic Splitting algorithm [33, 71, 21, 70, 4, 32, 7, 22], gradient descent [15, 47], and ADMM [68]. These methods use deep CNN models to estimate the solution to the regularization-related sub-problem. As demonstrated in [56], the solutions are combinations of shrinkage functions; simply using deep CNN models does not model the shrinkage functions well since most activation functions are too simple. Besides, most of these methods focus on the regularization term and ignore the importance of the data term. In addition, their image restoration step usually depends on an FFT-based solution, which may lead to results with ringing artifacts. Even though the edge taper [25] alleviates such artifacts, they remain inevitable in many scenes.

To overcome these problems, we leverage Maxout layers to learn discriminative shrinkage functions for both the regularization and data terms, and develop CGNet to restore images better. Furthermore, we adopt average pooling for noise-level estimation and residual blocks for computing the reweighting maps. In other words, we design each component according to its mathematical characteristics rather than stacking as many convolutional layers as possible, as in most previous works.

Blind Deblurring Methods.

Numerous end-to-end deep networks [60, 40, 62, 26] have been developed to restore clear images from blurry images directly. However, as demonstrated in [10], when the blur kernels are given, these methods do not perform well compared to the non-blind deconvolution methods. As non-blind deconvolution is vital for image restoration, we focus on this problem and develop a simple and effective approach to restoring high-quality images.

Figure 1: Overview of the proposed method. The blue blocks and lines are the layers and flow of the regularization terms; the yellow ones are those of the data terms. HypNet is responsible for the reweighted maps; NLNet learns to control the weights of the regularization and data terms according to the local noise level.
Figure 2: Network architecture of the proposed CGNet. The residual vector $\mathbf{r}_{0}=\mathbf{b}-\mathbf{A}\mathbf{x}^{t}$ is the input feature map; the operator $\mathbf{A}$, the convolution $\mathbf{A}_{e}$ and the transposed convolution $\mathbf{A}_{d}$ are composed of the input filters $\mathbf{F}_{i}$, $\mathbf{G}_{j}$ and $\mathbf{H}$. The reweighted maps $\mathbf{m}_{p}$ and $\mathbf{m}_{d}$ are multiplied in between $\mathbf{A}_{e}$ and $\mathbf{A}_{d}$, as in IRLS.

3 Revisiting Deep Unrolling-Based Methods

We first revisit deep unrolling-based methods for image deconvolution to motivate our work. Mathematically, the degradation process of the image blur is usually formulated as:

$y = k \ast x + n,$  (1)

where $\ast$ denotes the convolution operator; $y$, $k$, $x$ and $n$ denote the blurry image, the blur kernel, the latent image and noise, respectively. With a known kernel $k$, we usually formulate the deconvolution as a maximum-a-posteriori (MAP) problem:

$x=\arg\max_{x}p(x|y,k)=\arg\max_{x}p(y|x,k)\,p(x),$  (2)

where $p(y|x,k)$ is the likelihood of the observation (blurry image) $y$, while $p(x)$ denotes an image prior of the latent image $x$. This equation is equivalent to

$\min_{x}R(x)+D(y-k\ast x),$  (3)

where $R(x)$ and $D(y-k\ast x)$ denote the regularization term and the data term. In addition, the data term is usually modeled in the form of the $\ell_{2}$-norm, so (3) can be rewritten as

$\min_{\mathbf{x}}\frac{1}{2}\|\mathbf{y}-\mathbf{H}\mathbf{x}\|_{2}^{2}+R(\mathbf{x}),$  (4)

where $\mathbf{x}$ and $\mathbf{y}$ denote the vector forms of $x$ and $y$, respectively; $\mathbf{H}$ denotes the Toeplitz matrix of the blur kernel $k$. The ADMM method for image deconvolution is usually achieved by solving:

$\min_{\mathbf{x},\mathbf{v}}\frac{1}{2}\|\mathbf{y}-\mathbf{H}\mathbf{x}\|_{2}^{2}+R(\mathbf{v})+\mathbf{u}^{\top}(\mathbf{v}-\mathbf{x})+\frac{\rho}{2}\|\mathbf{v}-\mathbf{x}\|_{2}^{2},$  (5)

where $\mathbf{v}$ is an auxiliary variable, $\mathbf{u}$ is a Lagrangian multiplier, and $\rho$ is a weight parameter.

The solution of (5) can be obtained by alternately solving:

$\mathbf{x}^{t+1}=\arg\min_{\mathbf{x}}\|\mathbf{y}-\mathbf{H}\mathbf{x}\|_{2}^{2}+\rho\|\mathbf{v}^{t}-\mathbf{x}+\tfrac{\mathbf{u}^{t}}{\rho}\|_{2}^{2},$  (6a)
$\mathbf{v}^{t+1}=\arg\min_{\mathbf{v}}\frac{\rho}{2}\|\mathbf{v}-\mathbf{x}^{t+1}+\tfrac{\mathbf{u}^{t}}{\rho}\|_{2}^{2}+R(\mathbf{v}),$  (6b)
$\mathbf{u}^{t+1}=\mathbf{u}^{t}+\rho(\mathbf{v}^{t+1}-\mathbf{x}^{t+1}).$  (6c)

Existing methods [70, 73, 72] usually solve (6a) via the fast Fourier transform (FFT) or the Conjugate Gradient (CG) method. The solution of (6b) can be represented as a proximal operator:

$\text{prox}_{\lambda R}(\mathbf{x}^{t+1}-\mathbf{u}^{\prime t})=\arg\min_{\mathbf{v}}\frac{1}{2}\|\mathbf{v}-\mathbf{x}^{t+1}+\mathbf{u}^{\prime t}\|_{2}^{2}+\lambda R(\mathbf{v}),$  (7)

where $\lambda=1/\rho$. With $\mathbf{u}^{\prime t}=\lambda\mathbf{u}^{t}$, the multipliers in (6) and (7) can be absorbed [44]. As demonstrated in [56], (7) can be approximated by shrinkage functions. Existing methods usually use deep CNN models to approximate the solution of (7). However, simply using convolution operations followed by fixed activation functions (e.g., ReLU) cannot model the shrinkage functions well, as they are far more complex (e.g., non-monotonic) [56]. To better approximate the solution of (6b), we develop a deep CNN model with the Maxout function [16], which can effectively approximate proximal functions. In addition, we note that using FFT to solve (6a) does not obtain better results than the CG method, as demonstrated by [28, 29]. However, the CG method is time-consuming and unstable in deep networks (see Section 5.4 for more details). To overcome this problem, we learn a differentiable CG network to restore clear images more efficiently and effectively.
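For intuition, the proximal operator in (7) has a closed-form shrinkage function for simple hand-crafted priors. The following minimal sketch, assuming an $\ell_{1}$ prior $R(\mathbf{v})=\|\mathbf{v}\|_{1}$, shows the classical soft-thresholding function that the learned Maxout-based shrinkage functions in this work generalize (the tensor shapes are illustrative only):

```python
import torch

def soft_threshold(x: torch.Tensor, lam: float) -> torch.Tensor:
    """Proximal operator of lam * ||v||_1, i.e., the classical soft-shrinkage."""
    return torch.sign(x) * torch.clamp(x.abs() - lam, min=0.0)

# One regularization-related update of (6b)/(7) under an l1 prior:
# v^{t+1} = prox_{lam R}(x^{t+1} - u'^t)
x_minus_u = torch.randn(1, 3, 64, 64)   # placeholder for x^{t+1} - u'^t
v = soft_threshold(x_minus_u, lam=0.1)
```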

4 Proposed Method

Different from existing methods that simply learn the regularization term or the data term [70, 4, 32], we formulate both the data term and the regularization term as learnable:

$\min\limits_{\mathbf{v},\mathbf{z},\mathbf{x}}\sum\limits_{i=1}^{N}R_{i}(\mathbf{v}_{i})+\sum\limits_{j=N+1}^{M+N}R_{j}(\mathbf{z}_{j})$  (8)
$\text{s.t.}\quad\mathbf{F}_{i}\mathbf{x}=\mathbf{v}_{i},\quad\mathbf{G}_{j}(\mathbf{y}-\mathbf{H}\mathbf{x})=\mathbf{z}_{j},$

where $R_{i}$ denotes the $i$-th learnable function; $\mathbf{v}_{i}$ and $\mathbf{z}_{j}$ are auxiliary variables corresponding to the regularization and data terms; $\mathbf{F}_{i}$ and $\mathbf{G}_{j}$ are the $i$-th and $j$-th learnable filters for the regularization and data terms, respectively.

By introducing the Lagrangian multipliers $\mathbf{u}_{i}$ and $\mathbf{u}_{j}$ corresponding to the regularization and data terms, we can solve (8) using the ADMM method by:

$\mathbf{v}_{i}^{t+1}=\text{prox}_{\lambda_{i}R_{i}}(\mathbf{F}_{i}\mathbf{x}^{t}+\mathbf{u}_{i}^{t}),$  (9a)
$\mathbf{z}_{j}^{t+1}=\text{prox}_{\lambda_{j}R_{j}}(\mathbf{G}_{j}(\mathbf{y}-\mathbf{H}\mathbf{x}^{t})+\mathbf{u}_{j}^{t}),$  (9b)
$\Big(\sum_{i=1}^{N}\rho_{i}\mathbf{F}_{i}^{\top}\mathbf{F}_{i}+\sum_{j=N+1}^{N+M}\rho_{j}\mathbf{H}^{\top}\mathbf{G}_{j}^{\top}\mathbf{G}_{j}\mathbf{H}\Big)\mathbf{x}^{t+1}=\sum_{i=1}^{N}\rho_{i}\mathbf{F}_{i}^{\top}(\mathbf{v}_{i}^{t+1}-\mathbf{u}_{i}^{t})+\sum_{j=N+1}^{N+M}\rho_{j}\mathbf{H}^{\top}\mathbf{G}_{j}^{\top}(\mathbf{G}_{j}\mathbf{y}-\mathbf{z}_{j}^{t+1}+\mathbf{u}_{j}^{t}),$  (9c)
$\mathbf{u}_{i}^{t+1}=\mathbf{u}_{i}^{t}+\mathbf{F}_{i}\mathbf{x}^{t+1}-\mathbf{v}_{i}^{t+1},$  (9d)
$\mathbf{u}_{j}^{t+1}=\mathbf{u}_{j}^{t}+\mathbf{G}_{j}(\mathbf{y}-\mathbf{H}\mathbf{x}^{t+1})-\mathbf{z}_{j}^{t+1}.$  (9e)

In the following, we will develop deep CNN models with a Maxout layer to approximate the functions of (9a) and (9b). Moreover, we design a simple and effective deep CG network to solve (9c).
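For orientation, one unrolled stage following (9a)-(9e) can be sketched as below. This is a schematic outline only; the callables `H`, `filters_F`, `filters_G`, `prox_v`, `prox_z`, and `cg_net` are placeholders for the components described in Section 4.1, not the released implementation.

```python
def dsdnet_stage(x, y, H, filters_F, filters_G, prox_v, prox_z, cg_net, u_v, u_z):
    """One ADMM stage of (9a)-(9e), written schematically.

    H              -- callable applying the blur operator to an image
    filters_F      -- callable producing the responses F_i x        (N_F)
    filters_G      -- callable producing the responses G_j(y - Hx)  (N_G)
    prox_v, prox_z -- Maxout-based shrinkage networks for (9a), (9b)
    cg_net         -- CGNet solving the least-squares x-update (9c)
    u_v, u_z       -- Lagrangian multipliers of the two constraint sets
    """
    # (9a), (9b): learned shrinkage applied to filter responses plus multipliers
    v = prox_v(filters_F(x) + u_v)
    z = prox_z(filters_G(y - H(x)) + u_z)

    # (9c): image update via the differentiable conjugate-gradient network
    x_new = cg_net(x, y, v, z, u_v, u_z)

    # (9d), (9e): multiplier updates
    u_v = u_v + filters_F(x_new) - v
    u_z = u_z + filters_G(y - H(x_new)) - z
    return x_new, u_v, u_z
```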

4.1 Network Architecture

This section describes how to design our deep CNN models to effectively solve (9a)-(9c).

Learning Filters $\mathbf{F}_{i}$ and $\mathbf{G}_{j}$.

To learn the filters $\mathbf{F}_{i}$ and $\mathbf{G}_{j}$, we develop two networks ($\mathcal{N}_{F}$ and $\mathcal{N}_{G}$), each containing one convolutional layer. The convolutional layer of $\mathcal{N}_{F}$ has $N$ filters of $7\times 7$ pixels, and the convolutional layer of $\mathcal{N}_{G}$ contains $M$ filters of the same size. $\mathcal{N}_{F}$ and $\mathcal{N}_{G}$ are applied to $\mathbf{x}^{t}$ and $\mathbf{y}-\mathbf{H}\mathbf{x}^{t}$ to learn the filters $\mathbf{F}_{i}$ and $\mathbf{G}_{j}$, respectively.
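As a minimal sketch, $\mathcal{N}_{F}$ and $\mathcal{N}_{G}$ amount to single convolutional layers; the RGB input-channel count and the bias-free choice below are assumptions, and $N=M=49$ corresponds to the Heavy/Full configurations in Table 1.

```python
import torch.nn as nn

N, M = 49, 49  # numbers of regularization- and data-term filters (cf. Table 1)

# N_F: N filters of 7x7 applied to x^t; N_G: M filters of 7x7 applied to y - Hx^t.
net_F = nn.Conv2d(in_channels=3, out_channels=N, kernel_size=7, padding=3, bias=False)
net_G = nn.Conv2d(in_channels=3, out_channels=M, kernel_size=7, padding=3, bias=False)
```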

Learning Discriminative Shrinkage Functions for (9a) and (9b).

To better learn the unknown discriminative shrinkage functions of (9a) and (9b), we take advantage of Maxout layers [16]. Specifically, the convolutional Maxout layer consists of two Maxout units. Each Maxout unit contains one convolutional layer followed by a channel-wise max-pooling layer. Given an input feature map $\mathbf{X}\in\mathbb{R}^{H\times W\times C}$ and the output feature map $\mathbf{X}^{o}\in\mathbb{R}^{H\times W\times KC}$ of the convolutional layer, a Maxout unit is achieved by:

$o_{h,w,c}(\mathbf{X})=\max_{j\in[0,K)}x^{o}_{h,w,c\times K+j},$  (10)

where $h\in[0,H)$, $w\in[0,W)$ and $c\in[0,C)$; $x^{o}_{h,w,c}$ is the element of $\mathbf{X}^{o}$ at position $(h,w,c)$, and $o_{h,w,c}$ is the function that outputs the $(h,w,c)$-th element of the output tensor $\mathbf{O}$. In our implementation, $\mathbf{O}\in\mathbb{R}^{H\times W\times C}$ has the same size as the input $\mathbf{X}$, and $K=4$.

With two Maxout units, we obtain two output feature maps, $\mathbf{O}_{1}$ and $\mathbf{O}_{2}$; the final output of the Maxout layer is their difference, $\mathbf{O}_{1}-\mathbf{O}_{2}$. We note that Maxout networks are universal approximators that can effectively approximate complex functions. Thus, we use them to obtain the solutions of (9a) and (9b).
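A minimal PyTorch sketch of this layer is given below, using the channel-first layout; the kernel size of the inner convolutions is an assumption, since only $K=4$ and the two-unit structure are specified above.

```python
import torch
import torch.nn as nn

class MaxoutUnit(nn.Module):
    """One Maxout unit: a convolution with K*C output channels followed by a
    channel-wise max over each group of K feature maps, as in Eq. (10)."""
    def __init__(self, channels: int, k: int = 4, kernel_size: int = 3):
        super().__init__()
        self.k = k
        self.conv = nn.Conv2d(channels, channels * k, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        out = self.conv(x)                    # (B, K*C, H, W)
        out = out.view(b, c, self.k, h, w)    # group each output channel's K maps
        return out.max(dim=2).values          # channel-wise max -> (B, C, H, W)

class MaxoutLayer(nn.Module):
    """Learnable shrinkage function: the difference of two Maxout units, O1 - O2."""
    def __init__(self, channels: int, k: int = 4):
        super().__init__()
        self.unit1 = MaxoutUnit(channels, k)
        self.unit2 = MaxoutUnit(channels, k)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.unit1(x) - self.unit2(x)
```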

Learning a Differentiable CG Network for (9c).

As stated in Section 3, although using FFT with boundary-processing operations (e.g., edge taper and Laplacian smoothing [35]) can efficiently solve (9c), the results are not better than those of a CG-based solver, as observed in Table 5. However, a CG-based solver is time-consuming. To generate latent clear images better, we develop a differentiable CG network to solve (9c). The CG method solves the linear equation:

$\mathbf{A}\mathbf{x}^{t+1}=\mathbf{b},$  (11)

where $\mathbf{A}$ corresponds to the first term in (9c) and $\mathbf{b}$ to the right-hand side of (9c). Given $\mathbf{x}\in\mathbb{R}^{d}$, the CG method recursively computes conjugate vectors $\mathbf{p}_{l}$ and finds the difference between the desired $\mathbf{x}^{t+1}$ and the initial input $\mathbf{x}^{t}$ as

$\mathbf{x}^{t+1}-\mathbf{x}^{t}=\mathbf{s}_{L}=\sum_{l=0}^{L}\alpha_{l}\mathbf{p}_{l},$

where $L$ is the iteration number, upper-bounded by $d$, and $\alpha_{l}$ is the weight calculated with $\mathbf{p}_{l}$.
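For reference, the conventional CG solver that CGNet replaces can be sketched as follows; the operator $\mathbf{A}$ is passed as a callable applying the left-hand side of (9c), and the iteration count and tolerance are illustrative choices.

```python
import torch

def conjugate_gradient(A, b, x0, num_iters=20, tol=1e-6):
    """Classical conjugate gradient for A x = b, where A is a symmetric
    positive-definite linear operator given as a callable x -> A(x)."""
    x = x0.clone()
    r = b - A(x)                      # initial residual r_0 = b - A x^t
    p = r.clone()                     # first conjugate direction
    rs_old = torch.sum(r * r)
    for _ in range(num_iters):
        Ap = A(p)
        alpha = rs_old / torch.sum(p * Ap)
        x = x + alpha * p             # accumulates s_L = sum_l alpha_l p_l
        r = r - alpha * Ap
        rs_new = torch.sum(r * r)
        if torch.sqrt(rs_new) < tol:  # stop once the residual is small enough
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x
```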

However, if the matrix $\mathbf{A}$ is large, solving (11) with CG incurs a high computational cost. To overcome this problem, we develop a differentiable CG network based on a U-Net to compute $\mathbf{s}_{L}$. The network design is motivated by the following observations:

  • As one of the Krylov subspace methods [2], the solution $\mathbf{s}_{L}$ can be found in the Krylov subspace $\mathcal{K}_{L}(\mathbf{A},\mathbf{r}_{0})=\text{span}\{\mathbf{r}_{0},\mathbf{A}\mathbf{r}_{0},\dots,\mathbf{A}^{L-1}\mathbf{r}_{0}\}$, where $\mathbf{r}_{0}=\mathbf{b}-\mathbf{A}\mathbf{x}^{t}$ is the residual vector. In other words, the CG method is a function of $\mathbf{A}$ and $\mathbf{r}_{0}$. Our CGNet takes $\mathbf{r}_{0}$ as input and $\mathbf{A}$ as part of the network; its output is $\mathbf{s}_{L}$, so it behaves as the CG method does.

  • For a typical deconvolution problem, $\mathbf{A}$ is composed of convolution ($\mathbf{A}_{e}$) and transposed-convolution ($\mathbf{A}_{d}$) pairs, as in the first term of (9c): $\mathbf{A}_{e}$ stands for the operations $\mathbf{G}_{j}\mathbf{H}$ and $\mathbf{F}_{i}$, and $\mathbf{A}_{d}$ for $\mathbf{H}^{\top}\mathbf{G}_{j}^{\top}$ and $\mathbf{F}_{i}^{\top}$. This observation connects naturally to an encoder-decoder architecture, so we integrate $\mathbf{A}_{e}$ into the encoder and $\mathbf{A}_{d}$ into the decoder (see the sketch after this list).

  • The Conjugate Gradient method is sensitive to noise [34, 53]. With an encoder-decoder architecture, U-Net is robust to noise.

  • As the Conjugate Gradient method is a recursive algorithm, U-Net computes feature maps in a recursive fashion.
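Below is a small, purely illustrative sketch of the convolution/transposed-convolution pairing mentioned in the second item above, showing only the regularization branch $\sum_{i}\rho_{i}\mathbf{F}_{i}^{\top}\mathbf{F}_{i}\mathbf{x}$; the data-term branch, which additionally involves the blur operator $\mathbf{H}$, and the HypNet/NLNet reweighting maps are omitted, and all shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def apply_A_reg(x: torch.Tensor, filters_F: torch.Tensor, rho: float = 1.0) -> torch.Tensor:
    """Regularization part of A in (9c): rho * sum_i F_i^T F_i x. Each F_i acts as a
    convolution (the encoder role, A_e) and its adjoint as a transposed convolution
    (the decoder role, A_d), both sharing the same weights."""
    pad = filters_F.shape[-1] // 2
    responses = F.conv2d(x, filters_F, padding=pad)                     # A_e: F_i x
    return rho * F.conv_transpose2d(responses, filters_F, padding=pad)  # A_d: sum_i F_i^T(.)

# Example with a single-channel image and 49 learned 7x7 filters
x = torch.randn(1, 1, 64, 64)
filters_F = torch.randn(49, 1, 7, 7)
Ax = apply_A_reg(x, filters_F)
```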

In practical CG iterations, $\mathbf{A}_{e}$ and $\mathbf{A}_{d}$ are usually updated with iterative reweighted least squares (IRLS) to exploit the sparsity of the priors [29]. We design a simple HypNet to estimate these weights; its architecture is shown in Fig. 1. In addition, we note that the values of $\rho_{i}$ and $\rho_{j}$ in (9c) depend on the noise level. We therefore design a simple NLNet to estimate the noise map $\mathbf{m}_{n}$, which plays a role similar to $\rho_{i}$ and $\rho_{j}$ (see Fig. 1). In contrast to most conventional methods, NLNet computes a weight for each pixel and is thus locally adaptive.

5 Experimental Results

5.1 Datasets and Implementation Details

Training Dataset.

Similar to [10, 11], the training data is composed of 4,744 images from the Waterloo Exploration dataset [36] and 400 images from the Berkeley segmentation dataset (BSD) [38]. To synthesize blurry images, we first generate 33,333 blur kernels using [55], whose sizes range from $13\times 13$ to $35\times 35$ pixels. We crop image patches of $128\times 128$ pixels from each image and convolve them with randomly chosen generated kernels to produce blurry images. Gaussian noise with a level randomly chosen from 1% to 5% is added to each blurry image.
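A minimal sketch of this synthesis pipeline is given below; the uniform kernel is only a placeholder for the generated kernels of [55], and the clamping and exact noise sampling are assumptions.

```python
import torch
import torch.nn.functional as F

def synthesize_blurry_patch(sharp: torch.Tensor, kernel: torch.Tensor,
                            noise_level: float) -> torch.Tensor:
    """Blur a sharp RGB patch (3, H, W) with one kernel (kh, kw) and add Gaussian
    noise; noise_level is a fraction of the [0, 1] intensity range (e.g., 0.01-0.05)."""
    c, _, _ = sharp.shape
    kh, kw = kernel.shape
    k = kernel.view(1, 1, kh, kw).repeat(c, 1, 1, 1)   # one copy of the kernel per channel
    # Note: conv2d performs correlation; flip the kernel beforehand for a true convolution.
    blurred = F.conv2d(sharp.unsqueeze(0), k, padding=(kh // 2, kw // 2), groups=c)
    noisy = blurred + noise_level * torch.randn_like(blurred)
    return noisy.squeeze(0).clamp(0.0, 1.0)

# Example: a 128x128 patch, a 25x25 placeholder kernel, 3% noise
patch = torch.rand(3, 128, 128)
kernel = torch.ones(25, 25) / (25 * 25)
blurry = synthesize_blurry_patch(patch, kernel, noise_level=0.03)
```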

Test Datasets.

We evaluate our method on both synthetic and real datasets. For the synthetic case, we use the 100 images from BSD100 and 100 kernels generated by [55] to synthesize blurry images in the same way as the training data. We also use the Set5 [3] dataset with kernels generated by [5] as a test set. In addition, we use the datasets of Levin [30] and Lai [27] for evaluation.

For the real-world case, we evaluate our model on the data of Pan [43], which contains 23 blurry images and 23 kernels estimated by their method.

Table 1: Configuration of the four models. $T$ is the number of stages, i.e., how many duplicates are in the whole model. $N$ and $M$ denote the numbers of filters $\mathbf{F}_{i}$ and $\mathbf{G}_{j}$, respectively. The PSNR (dB) and execution time are measured on Set5.
          Feather   Light   Heavy   Full
$T$             2       3       3      4
$M, N$         24      24      49     49
PSNR        32.51   32.78   33.11  33.43
Seconds     1.893   2.191   2.672  3.065

Implementation Details.

We train the networks using the ADAM [20] optimizer with default parameter settings and a batch size of 8. The total number of training iterations is 1 million, and the learning rate ranges from $1\times 10^{-4}$ to $1\times 10^{-7}$. We gradually decay the learning rate to $1\times 10^{-7}$ every 250,000 iterations and reset it to $1\times 10^{-4}$, $5\times 10^{-5}$ and $2.5\times 10^{-5}$ at iterations 250,001, 500,001 and 750,001, respectively. To constrain the network training, we apply the commonly used $\ell_{1}$-norm loss between the ground truth and the network output $\mathbf{x}^{T}$. Data augmentation (including $\pm$90° and 180° rotations, and vertical and horizontal flipping) is used. In this paper, we train 4 models of different sizes, i.e., Feather, Light, Heavy and Full, whose configurations and results on Set5 are shown in Table 1.
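The schedule above can be sketched as the following function of the iteration index; the log-linear decay within each 250k-iteration segment is an assumption, since only the segment end-points are specified.

```python
def learning_rate(iteration: int) -> float:
    """Segment-wise decaying schedule: each 250k-iteration segment decays to 1e-7,
    restarting from 1e-4, 5e-5 and 2.5e-5 at iterations 250,001, 500,001 and 750,001."""
    segment_starts = [1e-4, 1e-4, 5e-5, 2.5e-5]
    seg_len = 250_000
    seg = min(iteration // seg_len, 3)
    t = (iteration % seg_len) / seg_len            # progress within the current segment
    lr_start, lr_end = segment_starts[seg], 1e-7
    return lr_start * (lr_end / lr_start) ** t     # assumed exponential interpolation
```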

It is worth noting that all the stages contain their own parameters and are trained end-to-end, rather than sharing weights [71] or being progressively trained [70]. We implement the networks in PyTorch [45] and train on one NVIDIA RTX 3090 GPU.

5.2 Quantitative Evaluation

We compare the proposed DSDNet with 9 state-of-the-art methods: IRCNN [73], SFARL [48], ADM_UDM [21], CPCR [12], KerUNC [41], VEM [42], DWDN [10], SVMAP [11] and DRUNet [72]. These methods are fine-tuned on the same training dataset as in Section 5.1, and for each method we report the better of its fine-tuned and original models.

PSNR and SSIM [66] are used for quantitative evaluation. All the quantitative evaluations are conducted without border cropping for fair comparisons.
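For reference, a minimal PSNR computation over the full frame (i.e., without border cropping) can be sketched as follows; SSIM is computed analogously with a standard implementation.

```python
import torch

def psnr(restored: torch.Tensor, reference: torch.Tensor, max_val: float = 1.0) -> float:
    """PSNR in dB computed over the entire image, without cropping borders."""
    mse = torch.mean((restored - reference) ** 2)
    return float(10.0 * torch.log10(max_val ** 2 / mse))
```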

Table 2 shows quantitative evaluation results on the synthetic datasets. The proposed method generates results with higher PSNR and SSIM values. In addition, we note that the proposed light-weight model generates favorable results against the state-of-the-art, showing the effectiveness of the proposed algorithm. Due to the space limit, we only present the evaluation results of the Full DSDNet hereafter.

Table 2: Average PSNR(dB)/SSIM of the deblurring results with Gaussian noise using different methods. We highlight the best and the second best results. Our Full DSDNet wins first place, while our Light one also performs favorably against these state-of-the-art methods.
Dataset noise IRCNN [73] SFARL[48] ADM_UDM [21] CPCR [12] KerUNC [41] VEM [42] DWDN [10] SVMAP [11] DRUNet [72] DSDNet(Light) DSDNet(Full)
PSNR / SSIM PSNR / SSIM PSNR / SSIM PSNR / SSIM PSNR / SSIM PSNR / SSIM PSNR / SSIM PSNR / SSIM PSNR / SSIM PSNR / SSIM PSNR / SSIM
Levin [30] 1% 30.61 / 0.883 25.41 / 0.600 31.48 / 0.922 28.43 / 0.858 32.02 / 0.928 32.05 / 0.927 34.89 / 0.957 35.24 / 0.962 31.94 / 0.922 35.48 / 0.960 36.62 / 0.965
3% 29.70 / 0.864 16.82 / 0.255 28.61 / 0.812 25.61 / 0.765 21.72 / 0.416 29.47 / 0.867 31.94 / 0.916 31.20 / 0.893 30.86 / 0.905 32.13 / 0.918 32.89 / 0.925
5% 28.98 / 0.854 13.07 / 0.157 27.83 / 0.827 23.68 / 0.703 18.25 / 0.272 27.79 / 0.819 30.21 / 0.883 30.12 / 0.876 29.79 / 0.880 30.24 / 0.883 30.94 / 0.893
BSD100 [38] 1% 29.20 / 0.817 24.21 / 0.568 29.39 / 0.836 28.77 / 0.829 29.23 / 0.829 29.54 / 0.848 31.10 / 0.881 31.52 / 0.888 30.36 / 0.872 31.50 / 0.892 32.01 / 0.898
3% 27.54 / 0.762 15.80 / 0.245 26.92 / 0.722 25.96 / 0.712 22.10 / 0.430 27.09 / 0.746 28.47 / 0.797 27.94 / 0.762 28.10 / 0.798 28.73 / 0.812 29.08 / 0.820
5% 27.04 / 0.756 12.56 / 0.146 26.04 / 0.697 25.75 / 0.688 18.99 / 0.297 26.11 / 0.698 27.50 / 0.762 27.59 / 0.763 27.19 / 0.767 27.64 / 0.774 27.96 / 0.782
Set5 [3] 1% 30.15 / 0.853 26.21 / 0.632 30.52 / 0.868 30.59 / 0.875 30.45 / 0.864 31.00 / 0.875 32.18 / 0.893 32.31 / 0.892 30.84 / 0.881 32.78 / 0.899 33.43 / 0.905
3% 28.66 / 0.813 15.50 / 0.211 27.64 / 0.709 27.94 / 0.799 21.39 / 0.376 28.40 / 0.804 29.54 / 0.838 28.78 / 0.812 29.21 / 0.841 29.94 / 0.843 30.40 / 0.851
5% 27.55 / 0.789 11.91 / 0.122 26.75 / 0.756 26.64 / 0.754 17.74 / 0.241 26.46 / 0.732 28.13 / 0.806 28.02 / 0.793 27.85 / 0.805 28.46 / 0.804 28.89 / 0.814
Table 3: Evaluation on the Lai dataset [27]. The best and second best results are highlighted as in Table 2. The Saturation results of SVMAP [11] are obtained by the model specifically trained for saturated scenes.
Subset IRCNN [73] ADM_UDM [21] KerUNC [41] VEM [42] DWDN [10] SVMAP [11] DRUNet [72] DSDNet
PSNR / SSIM PSNR / SSIM PSNR / SSIM PSNR / SSIM PSNR / SSIM PSNR / SSIM PSNR / SSIM PSNR / SSIM
Manmade 20.47 / 0.604 22.43 / 0.724 22.19 / 0.725 22.71 / 0.780 24.02 / 0.836 23.75 / 0.776 20.62 /0.613 25.44 / 0.859
Natural 23.26 / 0.636 25.04 / 0.733 25.42 / 0.757 25.29 / 0.752 25.91 / 0.814 26.23 / 0.778 23.25 /0.630 27.01 / 0.837
People 28.04 / 0.843 28.81 / 0.866 28.80 / 0.848 27.19 / 0.723 30.02 / 0.905 30.88 / 0.899 28.04 /0.838 30.95 / 0.908
Saturation 16.99 / 0.642 17.57 / 0.627 17.70 / 0.640 17.65 / 0.600 17.90 / 0.695 18.75 / 0.733 17.14 /0.658 18.38 / 0.734
Text 21.37 / 0.828 25.13 / 0.883 23.32 / 0.855 24.92 / 0.853 25.40 / 0.877 25.60 / 0.894 21.79 /0.829 28.13 / 0.920
Overall 22.03 / 0.710 23.80 / 0.767 23.49 / 0.765 23.55 / 0.742 24.65 / 0.825 25.04 / 0.816 22.17 /0.714 25.98 / 0.852

We then evaluate our method on the Lai dataset [27]. Because it contains Manmade, Natural, People, Saturation, and Text subsets, we report the evaluation accordingly. Table 3 shows that our method performs better than the evaluated methods. Similar to Table 2, our method achieves the highest PSNR and SSIM in most tests, except on the Saturation subset. We note that Dong et al. [11] specifically train a model for saturated scenes; thus, their method performs slightly better there. However, our model is trained only on common scenes yet remains comparable in PSNR on the Saturation images. Moreover, the SSIM values of our method are better than those of SVMAP [11], demonstrating the effectiveness and robustness of our approach.

We also quantitatively evaluate our method on real-world blurry images with kernels estimated by Pan [43]. Since ground-truth images are unavailable, we use the no-reference BRISQUE [39] and PIQE [64] metrics for evaluation. Our model achieves the best BRISQUE score and second place in PIQE, as shown in Table 4. As BRISQUE is a metric trained on subjective scores, Table 4 suggests that our model generates more subjectively satisfying results than the other state-of-the-art methods.

Table 4: Quantitative evaluation on the real cases of Pan [43]. The no-reference image quality metrics BRISQUE [39] and PIQE [64] are used.
IRCNN [73] ADM_UDM [21] KerUNC [41] VEM [42] DWDN [10] SVMAP [11] DRUNet [72] DSDNet
BRISQUE 43.484 36.598 37.816 33.663 34.027 35.508 46.774 33.129
PIQE 78.700 67.605 65.674 44.942 51.348 56.032 81.074 49.788
Figure 3: A synthetic case with 1% Gaussian noise from BSD100 [38]. Our approach restores the image with finer detailed structures. Panels: (a) Blurry input, (b) IRCNN [73], (c) ADM_UDM [21], (d) KerUNC [41], (e) VEM [42], (f) DWDN [10], (g) SVMAP [11], (h) DRUNet [72], (i) DSDNet (ours), (j) Ground truth.

5.3 Qualitative Evaluation


Figure 4: A synthetic case from Lai [27]. Our method restores clearer images with finer details (e.g., the wood texture). Panels: (a) Blurry input, (b) IRCNN [73], (c) ADM_UDM [21], (d) KerUNC [41], (e) VEM [42], (f) DWDN [10], (g) SVMAP [11], (h) DRUNet [72], (i) DSDNet (ours), (j) Ground truth.
Figure 5: A real-world case whose kernel is estimated by [43]. Our method generates a better image with clearer text and a sharper eyebrow; in the olive background, our method produces the sharpest result without noise. Panels: (a) Blurry input, (b) IRCNN [73], (c) ADM_UDM [21], (d) KerUNC [41], (e) DWDN [10], (f) SVMAP [11], (g) DRUNet [72], (h) DSDNet (ours).

We show some visual comparisons on BSD100, Lai and a real-world case in Fig. 3, Fig. 4 and Fig. 5, respectively.

Fig. 3 shows the deblurred results of a synthetic blurry image with 1% Gaussian noise. Our method generates a clearer image with finer details, as shown in the regions enclosed by the red and green boxes.

Fig. 4 shows the results on a Manmade image from the Lai dataset [27]. The evaluated methods generate results with residual blur. In contrast, our method reconstructs a better image (e.g., the wood texture is better restored, as shown in both the red and green boxes).

Fig. 5 shows the deblurred results of a real-world blurry image with the kernel estimated by [43]. Our method restores sharp text in the red boxes and the darkest, sharpest eyebrow in the green boxes. In contrast, the other methods cannot restore the text well and mix the eyebrow with the skin color; they also generate artifacts and noise in the olive background.

5.4 Ablation Study

Table 5: Ablation study on Set5. "w/o F, G" removes the convolutional layers before the Maxout layers; "ReLU" and "RBF" replace the Maxout layers with ReLU and RBF layers, respectively; "CG" performs the conventional Conjugate Gradient method for deconvolution rather than CGNet; "FFT" performs FFT deconvolution with edge taper [25]; the superscript * means the input is first denoised by DRUNet [72].
               w/o $\mathbf{F}$, $\mathbf{G}$   ReLU     RBF      CG     CG*     FFT    FFT*   DSDNet
PSNR (dB)              26.66                   32.78   32.98   29.07   32.39   31.30   32.03   33.11
FLOPs (G)             136.26                  464.94  468.03  470.73  559.97  288.89  391.57  466.32
Parameters (M)          1.04                  237.85  237.85    0.31   32.95    0.27   32.09  237.87
Gain (dB)              -6.45                   -0.33   -0.13   -4.04   -0.72   -1.81   -1.08     -/-

In this section, we design experiments to show the effectiveness of the proposed discriminative shrinkage functions and the differentiable CGNet. Table 5 shows the ablation results w.r.t. different baselines. In this study, we train 7 models based on the architecture of the Heavy DSDNet and compare the number of floating-point operations (FLOPs) and parameters.

To validate the effect of $\mathbf{F}_{i}$ and $\mathbf{G}_{j}$, we train a model without estimating these two sets of filters, denoted by "w/o $\mathbf{F}$, $\mathbf{G}$". Without the feature maps they produce, the shrinkage functions can only be learned from the RGB inputs. Table 5 shows that the PSNR of this baseline is at least 6.45 dB lower than that of our approach.

We also evaluate the effect of the Maxout layers by replacing them with ReLU. Table 5 shows that the method using ReLU does not generate better results than the proposed method, suggesting the effectiveness of the Maxout layers.

As RBFs are usually used to approximate shrinkage functions, one may wonder whether using them generates better results or not. To answer this question, we replace the Maxout layers with the commonly-used Gaussian RBFs. Table 5 shows that the PSNR value of the method using RBFs is at least 0.13dB lower than that of our method, indicating the effectiveness of the proposed method.

Finally, to demonstrate the efficiency of the proposed CGNet, we train 2 models with the conventional CG method and 2 models with FFT deconvolution. The CG method is unstable with respect to even small perturbations [37, 18]: each optimization step in training, as well as the presence of noise, may cause the CG iterations to diverge. Hence we have to reduce the learning rate and apply gradient clipping to avoid exploding gradients during training. However, the model still performs poorly even when training can be finished. To make training more feasible, we first denoise the inputs with DRUNet [72]; this model is denoted as "CG*". The training of "CG*" is smoother, its learning rate can be set the same as that of DSDNet, and its performance is much better than the generic "CG". However, with another model for denoising, the computational cost is about 26% more FLOPs, and training takes more than twice as long compared to DSDNet.

Similar to the CG models, we also provide results of deconvolution via FFT with the artifact-processing operation of [25], i.e., "FFT" and "FFT*". Although the FFT models are much faster than the CG ones, the gap between "CG*" and "FFT*" is considerable, as mentioned in Section 3. As for the test time on Set5, "CG" takes 5.6001 seconds, "CG*" 5.8252 seconds, "FFT" 3.6941 seconds, "FFT*" 4.0330 seconds, and DSDNet 2.6717 seconds. These results show the efficiency of the proposed CGNet. We include the ablation study on HypNet and NLNet in the supplementary material.

Figure 6: Speed and accuracy trade-off. The results are evaluated on the Set5 dataset with 1% Gaussian noise. The results of this work are shown as red points; even the fastest Feather DSDNet performs favorably against the state-of-the-art methods (blue points) in PSNR.

5.5 Execution Time Analysis

We analyze the execution time of the proposed method and the state-of-the-art ones. All execution times are measured on one NVIDIA RTX 2080Ti GPU. Fig. 6 shows that our models are faster and more accurate than the state-of-the-art methods. Among these methods, our Feather model is slightly faster than DWDN (by 0.0052 seconds) yet outperforms all other methods in PSNR.

5.6 Limitations

Although our method achieves better performance on various datasets, it has some limitations. Our model cannot handle blurry images containing significant saturated regions, which may cause overflow. More analysis can be found in the supplementary material.

6 Conclusion

In this paper, we present a fully learnable MAP model for non-blind deconvolution. We formulate the data and regularization terms as learnable ones and split the deconvolution model into data-related and regularization-related sub-problems within the ADMM framework. Maxout layers are used to learn discriminative shrinkage functions, which directly approximate the solutions of these two sub-problems. We further develop a CGNet to restore images effectively and efficiently. With this design, the size of our model can be flexibly adjusted while remaining competitive in performance. Extensive evaluations on benchmark datasets demonstrate that the proposed model performs favorably against state-of-the-art non-blind deconvolution methods in terms of quantitative metrics, visual quality, and computational efficiency.

References

  • [1] Aljadaany, R., Pal, D.K., Savvides, M.: Douglas-rachford networks: Learning both the image prior and data fidelity terms for blind image deconvolution. In: CVPR. pp. 10235–10244 (2019)
  • [2] Barrett, R., Berry, M., Chan, T.F., Demmel, J., Donato, J., Dongarra, J., Eijkhout, V., Pozo, R., Romine, C., Van der Vorst, H.: Templates for the solution of Linear Systems: Building Blocks for Iterative Methods. SIAM (1994)
  • [3] Bevilacqua, M., Roumy, A., Guillemot, C., Alberi-Morel, M.L.: Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In: BMVC (2012)
  • [4] Bigdeli, S.A., Zwicker, M., Favaro, P., Jin, M.: Deep mean-shift priors for image restoration. In: NeurIPS (2017)
  • [5] Chakrabarti, A.: A neural approach to blind motion deblurring. In: ECCV. pp. 221–235 (2016)
  • [6] Chan, T.F., Wong, C.K.: Total variation blind deconvolution. IEEE TIP 7(3), 370–375 (1998)
  • [7] Chen, L., Zhang, J., Pan, J., Lin, S., Fang, F., Ren, J.S.: Learning a non-blind deblurring network for night blurry images. In: CVPR. pp. 10542–10550 (2021)
  • [8] Cho, S., Wang, J., Lee, S.: Handling outliers in non-blind image deconvolution. In: ICCV. pp. 495–502 (2011)
  • [9] Dong, J., Pan, J., Sun, D., Su, Z., Yang, M.H.: Learning data terms for non-blind deblurring. In: ECCV. pp. 748–763 (2018)
  • [10] Dong, J., Roth, S., Schiele, B.: Deep wiener deconvolution: Wiener meets deep learning for image deblurring. In: NeurIPS (2020)
  • [11] Dong, J., Roth, S., Schiele, B.: Learning spatially-variant map models for non-blind image deblurring. In: CVPR. pp. 4886–4895 (2021)
  • [12] Eboli, T., Sun, J., Ponce, J.: End-to-end interpretable learning of non-blind image deblurring. In: European Conference on Computer Vision. pp. 314–331. Springer (2020)
  • [13] Gao, H., Tao, X., Shen, X., Jia, J.: Dynamic scene deblurring with parameter selective sharing and nested skip connections. In: CVPR. pp. 3848–3856 (2019)
  • [14] Geman, D., Reynolds, G.: Constrained restoration and the recovery of discontinuities. IEEE TPAMI 14(3), 367–383 (1992)
  • [15] Gong, D., Zhang, Z., Shi, Q., van den Hengel, A., Shen, C., Zhang, Y.: Learning deep gradient descent optimization for image deconvolution. IEEE transactions on neural networks and learning systems 31(12), 5468–5482 (2020)
  • [16] Goodfellow, I., Warde-Farley, D., Mirza, M., Courville, A., Bengio, Y.: Maxout networks. In: ICML. pp. 1319–1327 (2013)
  • [17] Gu, S., Zhang, L., Zuo, W., Feng, X.: Weighted nuclear norm minimization with application to image denoising. In: CVPR. pp. 2862–2869 (2014)
  • [18] Hadamard, J.: Lectures on Cauchy’s Problem in Linear Partial Differential Equations. Courier Corporation (2003)
  • [19] Jancsary, J., Nowozin, S., Rother, C.: Loss-specific training of non-parametric image restoration models: A new state of the art. In: ECCV. pp. 112–125 (2012)
  • [20] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
  • [21] Ko, H.C., Chang, J.Y., Ding, J.J.: Deep priors inside an unrolled and adaptive deconvolution model. In: ACCV (2020)
  • [22] Kong, S., Wang, W., Feng, X., Jia, X.: Deep red unfolding network for image restoration. IEEE TIP 31, 852–867 (2021)
  • [23] Krishnan, D., Fergus, R.: Fast image deconvolution using hyper-laplacian priors. In: NeurIPS. pp. 1033–1041 (2009)
  • [24] Krishnan, D., Tay, T., Fergus, R.: Blind deconvolution using a normalized sparsity measure. In: CVPR. pp. 233–240 (2011)
  • [25] Kruse, J., Rother, C., Schmidt, U.: Learning to push the limits of efficient fft-based image deconvolution. In: ICCV. pp. 4586–4594 (2017)
  • [26] Kupyn, O., Budzan, V., Mykhailych, M., Mishkin, D., Matas, J.: Deblurgan: Blind motion deblurring using conditional adversarial networks. In: CVPR. pp. 8183–8192 (2018)
  • [27] Lai, W.S., Huang, J.B., Hu, Z., Ahuja, N., Yang, M.H.: A comparative study for single image blind deblurring. In: CVPR. pp. 1701–1709 (2016)
  • [28] Levin, A., Fergus, R., Durand, F., Freeman, W.T.: Image and depth from a conventional camera with a coded aperture. ACM TOG 26(3), 70–es (2007)
  • [29] Levin, A., Weiss, Y.: User assisted separation of reflections from a single image using a sparsity prior. IEEE TPAMI 29(9), 1647–1654 (2007)
  • [30] Levin, A., Weiss, Y., Durand, F., Freeman, W.T.: Understanding and evaluating blind deconvolution algorithms. In: CVPR. pp. 1964–1971 (2009)
  • [31] Levin, A., Weiss, Y., Durand, F., Freeman, W.T.: Understanding blind deconvolution algorithms. IEEE TPAMI 33(12), 2354–2367 (2011)
  • [32] Li, L., Pan, J., Lai, W.S., Gao, C., Sang, N., Yang, M.H.: Blind image deblurring via deep discriminative priors. IJCV 127(8), 1025–1043 (2019)
  • [33] Li, Y., Tofighi, M., Geng, J., Monga, V., Eldar, Y.C.: Deep algorithm unrolling for blind image deblurring. arXiv preprint arXiv:1902.03493 (2019)
  • [34] Liu, C.S.: Modifications of steepest descent method and conjugate gradient method against noise for ill-posed linear systems. Commun. Numer. Anal 2012 (2012)
  • [35] Liu, R., Jia, J.: Reducing boundary artifacts in image deconvolution. In: ICIP. pp. 505–508 (2008)
  • [36] Ma, K., Duanmu, Z., Wu, Q., Wang, Z., Yong, H., Li, H., Zhang, L.: Waterloo exploration database: New challenges for image quality assessment models. IEEE TIP 26(2), 1004–1016 (2016)
  • [37] Marin, L., Háo, D.N., Lesnic, D.: Conjugate gradient-boundary element method for a cauchy problem in the lamé system. WIT Transactions on Modelling and Simulation 27 (2001)
  • [38] Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: ICCV. pp. 416–423 (2001)
  • [39] Mittal, A., Moorthy, A.K., Bovik, A.C.: No-reference image quality assessment in the spatial domain. IEEE TIP 21(12), 4695–4708 (2012). https://doi.org/10.1109/TIP.2012.2214050
  • [40] Nah, S., Hyun Kim, T., Mu Lee, K.: Deep multi-scale convolutional neural network for dynamic scene deblurring. In: CVPR. pp. 3883–3891 (2017)
  • [41] Nan, Y., Ji, H.: Deep learning for handling kernel/model uncertainty in image deconvolution. In: CVPR. pp. 2388–2397 (2020)
  • [42] Nan, Y., Quan, Y., Ji, H.: Variational-em-based deep learning for noise-blind image deblurring. In: CVPR. pp. 3626–3635 (2020)
  • [43] Pan, J., Sun, D., Pfister, H., Yang, M.H.: Blind image deblurring using dark channel prior. In: CVPR. pp. 1628–1636 (2016)
  • [44] Parikh, N., Boyd, S.: Proximal algorithms. Foundations and Trends in Optimization 1(3), 127–239 (2014)
  • [45] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., Chintala, S.: Pytorch: An imperative style, high-performance deep learning library. In: Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., Garnett, R. (eds.) NeurIPS. pp. 8024–8035. Curran Associates, Inc. (2019), http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
  • [46] Perrone, D., Favaro, P.: Total variation blind deconvolution: The devil is in the details. In: CVPR. pp. 2909–2916 (2014)
  • [47] Qiu, H., Hammernik, K., Qin, C., Rueckert, D.: Gradirn: Learning iterative gradient descent-based energy minimization for deformable image registration. arXiv preprint arXiv:2112.03915 (2021)
  • [48] Ren, D., Zuo, W., Zhang, D., Zhang, L., Yang, M.H.: Simultaneous fidelity and regularization learning for image restoration. IEEE TPAMI (2019)
  • [49] Ren, W., Cao, X., Pan, J., Guo, X., Zuo, W., Yang, M.H.: Image deblurring via enhanced low-rank prior. IEEE TIP 25(7), 3426–3437 (2016)
  • [50] Richardson, W.H.: Bayesian-based iterative method of image restoration. JoSA 62(1), 55–59 (1972)
  • [51] Roth, S., Black, M.J.: Fields of experts: A framework for learning image priors. In: CVPR. pp. 860–867 (2005)
  • [52] Rudin, L.I., Osher, S.: Total variation based image restoration with free local constraints. In: ICIP. vol. 1, pp. 31–35 (1994)
  • [53] Ryabtsev, A.: The error accumulation in the conjugate gradient method for degenerate problem. arXiv preprint arXiv:2004.10242 (2020)
  • [54] Samuel, K.G., Tappen, M.F.: Learning optimized map estimates in continuously-valued mrf models. In: CVPR. pp. 477–484 (2009)
  • [55] Schmidt, U., Jancsary, J., Nowozin, S., Roth, S., Rother, C.: Cascades of regression tree fields for image restoration. IEEE TPAMI 38(4), 677–689 (2015)
  • [56] Schmidt, U., Roth, S.: Shrinkage fields for effective image restoration. In: CVPR. pp. 2774–2781 (2014)
  • [57] Schmidt, U., Rother, C., Nowozin, S., Jancsary, J., Roth, S.: Discriminative non-blind deblurring. In: CVPR. pp. 604–611 (2013)
  • [58] Schuler, C.J., Christopher Burger, H., Harmeling, S., Scholkopf, B.: A machine learning approach for non-blind image deconvolution. In: CVPR. pp. 1067–1074 (2013)
  • [59] Suin, M., Purohit, K., Rajagopalan, A.: Spatially-attentive patch-hierarchical network for adaptive motion deblurring. In: CVPR. pp. 3606–3615 (2020)
  • [60] Sun, J., Cao, W., Xu, Z., Ponce, J.: Learning a convolutional neural network for non-uniform motion blur removal. In: CVPR. pp. 769–777 (2015)
  • [61] Sun, L., Cho, S., Wang, J., Hays, J.: Edge-based blur kernel estimation using patch priors. In: ICCP. pp. 1–8 (2013)
  • [62] Tao, X., Gao, H., Shen, X., Wang, J., Jia, J.: Scale-recurrent network for deep image deblurring. In: CVPR. pp. 8174–8182 (2018)
  • [63] Tappen, M.F., Liu, C., Adelson, E.H., Freeman, W.T.: Learning gaussian conditional random fields for low-level vision. In: CVPR. pp. 1–8 (2007)
  • [64] Venkatanath, N., Praneeth, D., Bh, M.C., Channappayya, S.S., Medasani, S.S.: Blind image quality evaluation using perception based features. In: National Conference on Communications (NCC). pp. 1–6 (2015)
  • [65] Wang, Y., Yang, J., Yin, W., Zhang, Y.: A new alternating minimization algorithm for total variation image reconstruction. SIAM Journal on Imaging Sciences 1(3), 248–272 (2008)
  • [66] Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE TIP 13(4), 600–612 (2004)
  • [67] Xiang, J., Dong, Y., Yang, Y.: Fista-net: Learning a fast iterative shrinkage thresholding network for inverse problems in imaging. IEEE TMI (2021)
  • [68] Yang, Y., Sun, J., Li, H., Xu, Z.: Deep admm-net for compressive sensing mri. In: NeurIPS. pp. 10–18 (2016)
  • [69] Zhang, J., Ghanem, B.: Ista-net: Interpretable optimization-inspired deep network for image compressive sensing. In: CVPR. pp. 1828–1837 (2018)
  • [70] Zhang, J., shan Pan, J., Lai, W.S., Lau, R.W.H., Yang, M.H.: Learning fully convolutional networks for iterative non-blind deconvolution. In: CVPR. pp. 6969–6977 (2017)
  • [71] Zhang, K., Gool, L.V., Timofte, R.: Deep unfolding network for image super-resolution. In: CVPR. pp. 3217–3226 (2020)
  • [72] Zhang, K., Li, Y., Zuo, W., Zhang, L., Van Gool, L., Timofte, R.: Plug-and-play image restoration with deep denoiser prior. IEEE TPAMI (2021)
  • [73] Zhang, K., Zuo, W., Gu, S., Zhang, L.: Learning deep cnn denoiser prior for image restoration. In: CVPR. pp. 3929–3938 (2017)
  • [74] Zhang, K., Zuo, W., Zhang, L.: Deep plug-and-play super-resolution for arbitrary blur kernels. In: CVPR. pp. 1671–1681 (2019)
  • [75] Zoran, D., Weiss, Y.: From learning models of natural image patches to whole image restoration. In: ICCV. pp. 479–486 (2011)