
Model-Guided Multi-Contrast Deep Unfolding Network for MRI Super-resolution Reconstruction

Gang Yang ([email protected], 0000-0001-9403-5818), University of Science and Technology of China; Li Zhang ([email protected], 0000-0003-1610-6056), University of Science and Technology of China; Man Zhou ([email protected], 0000-0003-2872-605X), University of Science and Technology of China; Aiping Liu ([email protected], 0000-0001-8849-5228), University of Science and Technology of China, USTC IAT-Huami Joint Laboratory for Brain-Machine Intelligence, Institute of Advanced Technology; Xun Chen ([email protected], 0000-0002-4922-8116), University of Science and Technology of China, USTC IAT-Huami Joint Laboratory for Brain-Machine Intelligence, Institute of Advanced Technology; Zhiwei Xiong ([email protected], 0000-0002-9787-7460), University of Science and Technology of China, Institute of Artificial Intelligence, Hefei Comprehensive National Science Center; and Feng Wu ([email protected]), University of Science and Technology of China, Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
(2022)
Abstract.

Magnetic resonance imaging (MRI) with high resolution (HR) provides more detailed information for accurate diagnosis and quantitative image analysis. Despite significant advances, most existing super-resolution (SR) reconstruction networks for medical images have two flaws: 1) they are designed as black boxes, lacking sufficient interpretability and thus limiting their practical applications, whereas interpretable neural network models are of significant interest since they enhance the trustworthiness required in clinical practice when dealing with medical images; 2) most existing SR reconstruction approaches use only a single contrast or a simple multi-contrast fusion mechanism, neglecting the complex relationships between different contrasts that are critical for SR improvement. To address these issues, in this paper, a novel Model-Guided interpretable Deep Unfolding Network (MGDUN) for medical image SR reconstruction is proposed. The model-guided image SR reconstruction approach solves manually designed objective functions to reconstruct HR MRI. We show how to unfold an iterative MGDUN algorithm into a novel model-guided deep unfolding network by taking the MRI observation matrix and an explicit multi-contrast relationship matrix into account during end-to-end optimization. Extensive experiments on the multi-contrast IXI dataset and the BraTs 2019 dataset demonstrate the superiority of our proposed model.

Model-Guided Network, MRI Super-Resolution, Deep Unfolding Network.
copyright: acmcopyright; journalyear: 2022; conference: Proceedings of the 30th ACM International Conference on Multimedia, October 10–14, 2022, Lisboa, Portugal; booktitle: Proceedings of the 30th ACM International Conference on Multimedia (MM ’22), October 10–14, 2022, Lisboa, Portugal; price: 15.00; doi: 10.1145/3503161.3548068; isbn: 978-1-4503-9203-7/22/10; submissionid: 1308; ccs: Computing methodologies, Reconstruction

1. Introduction

Magnetic resonance imaging (MRI) has been widely adopted in clinical and medical research. In comparison to other imaging modalities such as computed tomography (CT) and nuclear imaging, MRI enjoys the advantage of delivering detailed images of tissue architecture without the use of ionizing radiation (Feng et al., 2021a). The MRI system can be configured in a number of ways using pulse sequences to provide multi-contrast images such as T1, T2, and proton density (PD) weighted images that include essential physiological and pathological features. However, in real-world cases, high-resolution (HR) MR images often come at the cost of a longer scanning time, lower signal-to-noise ratio, and smaller spatial coverage (Zhang et al., 2021; Plenge et al., 2012). Additionally, the quality of MR images acquired in clinical practice may be insufficient owing to patients’ involuntary physiological movements (e.g., heart pounding and breathing) during the acquisition process. This is particularly problematic when protocols requiring a long echo time (TE) or repetition time (TR) are used. These scans may lead to inaccurate diagnosis as limited structural and textural information is provided for subsequent quantitative medical image analysis (Feng et al., 2021d). As a result, there is emerging interest in developing super-resolution (SR) techniques for reconstructing HR outputs from low-resolution (LR) images to increase the spatial resolution of magnetic resonance imaging.

Super-resolution can improve the quality of MR images without modifying the hardware, thereby overcoming the challenges of obtaining HR MRI scans. MRI super-resolution approaches can be broadly classified into two categories depending on the number of imaging modalities involved: single-contrast super-resolution (SCSR) methods and multi-contrast super-resolution (MCSR) methods. SCSR approaches have been extensively studied over the last several decades, with the goal of reconstructing the high-resolution counterpart of a given low-resolution image in a single contrast mode, thereby ignoring the complementary multi-contrast information. In contrast to SCSR methods, MCSR methods recover a target modality by synthesizing information from multiple modalities. Clinically, MRI generates multi-contrast images under a variety of imaging settings but with the same anatomical structure, including T1 and T2 weighted images (T1WIs and T2WIs) as well as proton density and fat-suppressed proton density weighted images (PDWIs and FS-PDWIs), which provide complementary information to each other (Zeng et al., 2018; Mai et al., 2011). Noting that contrasts with shorter acquisition times are easier to obtain, they can be used to supplement a single LR image with extra information. For example, relevant HR information from T1WIs or PDWIs may be utilized as auxiliary contrasts to aid in the generation of target contrasts.

Existing techniques for image super-resolution reconstruction include model-based and learning-based approaches. Model-based techniques utilize domain knowledge when modeling the physical mechanism underlying the problem. Typical optimization algorithms include the alternating direction method of multipliers (ADMM) (Sun et al., 2016) and the iterative shrinkage-thresholding algorithm (ISTA) (Zhang and Ghanem, 2018). Despite their theoretical attractiveness, model-based approaches are incapable of performing end-to-end optimization, resulting in limited performance. Alternatively, deep learning-based SR approaches have gained growing attention in recent years. For example, various architectures such as residual networks (Chaudhari et al., 2018), generative adversarial networks (Lyu et al., 2020), and densely connected networks (Chen et al., 2018a) have been utilized to reconstruct MR images. Nevertheless, neural networks with generalized structures lack transparency (i.e., the black-box design), and it is unclear how domain knowledge can be incorporated. When dealing with medical images, the accuracy and trustworthiness of reconstruction are critical for discovery and diagnosis. Therefore, balancing accuracy and interpretability is a non-trivial problem. The objective of designing interpretable neural networks is to bridge the gap between model-based and learning-based methods, which can be accomplished by unfolding the iterations of an inference algorithm into deep neural networks, thus making the learning process interpretable.

In this paper, a novel Model-Guided interpretable Deep Unfolding Network (MGDUN) for medical image SR reconstruction is proposed. The motivation of our approach is twofold. On the one hand, to fully exploit the domain knowledge of MRI SR and improve prediction performance, we formulate two manually designed objective functions for reconstructing HR MRI, each corresponding to a recovery process and incorporating domain knowledge. We then show how to solve these functions iteratively and how to unfold the iterative MGDUN algorithm into a neural network form by implementing specially designed modules. In contrast to conventional neural networks, the model-guided design results in transparent network architectures that are well-aligned with the emerging interpretable machine learning framework. On the other hand, we formulate MGDUN as a network that reconstructs an HR image of a target contrast from an LR input with the aid of other guide contrasts, which can better exploit the complex relationships between different modalities.

The main contributions of this paper are as follows:

  • We design a novel Model-Guided Deep Unfolding Network (MGDUN) for medical image SR reconstruction, which models multi-contrast MRI SR with other contrast images in an interpretable manner.

  • We elaborate on how to solve the manually designed objective functions and how to unfold the iterative algorithm into a neural network by incorporating domain knowledge with specially designed modules.

  • We reconstruct an HR image of a target contrast from an LR input with the aid of other guide contrasts, providing a new strategy for multi-contrast fusion.

  • Extensive experiments on the multi-contrast IXI dataset and BraTs 2019 dataset demonstrate the superiority of MGDUN.

2. RELATED WORK

2.1. Multi-contrast MR image representation

Clinically, MR images are usually acquired with multiple contrasts under a variety of imaging settings for comprehensive evaluation (Lyu et al., 2020; Feng et al., 2021b), and each contrast provides unique and complementary structural information about tissues (Zeng et al., 2018; Brown and Semelka, 2011). As a result, multi-contrast information has been exploited to improve representation ability for a variety of MR image tasks, including segmentation and SR. For example, Huo et al. trained and evaluated MRI segmentation using T1WIs and T2WIs (Huo et al., 2018). Different from the multi-contrast MRI segmentation task, the SR task requires dividing the images into auxiliary and target contrasts. The auxiliary contrast, which is usually easier to obtain, is utilized to assist the reconstruction of the target contrast. Rousseau et al., for instance, super-resolved the target image using anatomical intermodality priors from a reference image (Rousseau et al., 2010). Meanwhile, similar anatomical structures in the auxiliary contrast may be utilized to reconstruct an SR image from its LR counterpart (Jafari-Khouzani, 2014; Zheng et al., 2017, 2018). However, most current methods fall short of constructing a model-based interpretable network, making the investigation of the relationship between multiple contrasts challenging.

2.2. Medical Image Super-Resolution

Medical image super-resolution methods are classified into two broad categories: single-contrast super-resolution (SCSR) (Lim et al., 2017; Zhao et al., 2018; Kuklisova-Murgasova et al., 2012; Pham et al., 2017; McDonagh et al., 2017; Zhao et al., 2019; Zhang et al., 2018; Feng et al., 2021c) and multi-contrast super-resolution (MCSR) (Feng et al., 2021d; Zeng et al., 2018; Zheng et al., 2017; Lu et al., 2015; Manjón et al., 2010). Traditionally, because of their simplicity, bicubic and B-spline interpolations are two of the most frequently used SCSR methods in MRI practice. However, both methods invariably generate fuzzy edges and block artifacts. To address these issues, EDSR and MDSR (Lim et al., 2017) employ multiple blocks with linear residuals, contributing to improved performance in image super-resolution, whereas RCAN (Zhang et al., 2018) employs numerous residual groups with long skip connections and several residual blocks with short skip connections within each residual group. Recently, T2Net (Feng et al., 2021c), with joint MRI reconstruction and SR, enables representation and feature sharing across tasks, producing higher-quality, super-resolved, and motion-artifact-free images. In another line of studies, MCSR methods have shown superior performance and the ability to make full use of domain knowledge from multiple modalities. For example, Feng et al. (Feng et al., 2021d) developed a successful solution called SANet, a separable attention network that explores foreground and background areas in forward and backward directions using the auxiliary contrast.

2.3. Deep unfolding network

As a pioneering work, deep unfolding was first reported in (Gregor and LeCun, 2010), which designs a learned version of the iterative soft-thresholding algorithm (ISTA) that can be unfolded into a neural network form. Since then, a series of works (Kokkinos and Lefkimmiatis, 2018; You et al., 2021; Song et al., 2021; Ning et al., 2020; Zhang et al., 2020) have demonstrated that deep unfolding is applicable to certain optimization algorithms, since it can not only optimize the parameters in an end-to-end manner by minimizing the loss function over a large training set, but also integrate model-based and learning-based methods well. For instance, a fast network (Afonso et al., 2010; Ning et al., 2020) based on half-quadratic splitting has been proposed to solve the unconstrained optimization problem in image restoration and reconstruction. Another important work in image SR (Zhang et al., 2020) proposes an end-to-end trainable unfolding network that integrates the flexibility of model-based methods with the advantages of learning-based methods. Additionally, CoISTA (Deng and Dragotti, 2019) introduces a novel joint multi-modal dictionary learning (JMDL) method for modeling cross-modal dependency, converting the JMDL model into a deep neural network by unfolding the iterative shrinkage and thresholding algorithm (ISTA).

3. METHOD

Figure 1. The overall architecture of MGDUN. It is a model-guided interpretable network with T stages.

3.1. Motivation

An HR image $I_{HR}$ yields its LR counterpart $I_{LR}$ through a down-sampling process $f$, which can be expressed as follows:

(1) $I_{LR} = f(I_{HR}) = \phi(I_{HR}) + \mathcal{N}$

where $\phi$ denotes the down-sampling or blurring function, and $\mathcal{N}$ denotes the system noise. The SR process theoretically aims at exploring the inverse solution $f^{-1}$ of the original down-sampling function $f$. Since the SR process is an ill-posed problem, an exact inverse solution cannot be obtained; only approximate solutions are possible. The goal of the SR imaging process is therefore to find the most desirable approximation $g$ to the theoretical inverse solution $f^{-1}$:

(2) $I_{SR} = g(I_{LR}) \approx I_{HR}$

where $I_{SR}$ denotes the corresponding SR image. To obtain such an approximate solution $g$, image priors are necessary. The prior information available from single-contrast images is limited, so we take advantage of multi-contrast MR images. Based on the prior information of multi-contrast MRI, we propose a novel Model-Guided interpretable Deep Unfolding Network (MGDUN) for medical image SR reconstruction.

3.2. Model-Guided MRI SR algorithm

3.2.1. The objective functions


Let $X\in\mathbb{R}^{n\times C}$ represent the degraded observation and $Z\in\mathbb{R}^{N\times C}$ represent the unknown original image, where $C$ denotes the number of channels, $n=h\times w$, and $N=H\times W$. It is assumed that an LR image is obtained by down-sampling and blurring an HR image, so the linear relationship between the observed image and the original HR image can be typically formulated as follows:

(3) $X = DKZ + \mathcal{N}_1$

where $D\in\mathbb{R}^{n\times N}$ and $K\in\mathbb{R}^{N\times N}$ represent the down-sampling and blurring processes, respectively, and $\mathcal{N}_1$ denotes the noise.

Transform modal relationship. Considering multi-contrast MR images in the MCSR task, the transform relationship between the guide image $Y\in\mathbb{R}^{N\times C}$ and the unknown original image can be formulated by:

(4) $Y = PZ + \mathcal{N}_2$

where $P\in\mathbb{R}^{N\times N}$ is the transform function, and $\mathcal{N}_2$ represents the noise in this process. As a result, $Z$ can be obtained by solving the following objective function:

(5) $Z = \underset{Z}{\arg\min}\ \frac{1}{2}\|X - DKZ\|_2^2 + \frac{\eta}{2}\|Y - PZ\|_2^2 + \lambda_1\mathfrak{R}_1(Z) + \lambda_2\mathfrak{R}_2(Z)$

where the hyper-parameters ($\eta$, $\lambda_1$, $\lambda_2$) are trade-off coefficients, and the last two regularization terms correspond to the prior domain knowledge of the MCSR task, namely the noise prior of the typical degradation process and the transform modal noise prior of the multi-contrast image, respectively. The choice of regularization functions reflects different ways of incorporating prior knowledge about the unknown original HR MR image.
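For concreteness, the following minimal sketch (ours, not the released implementation) evaluates the two data-fidelity terms of Eq. 5, with generic callables standing in for the operators $DK$ and $P$; the regularization terms are omitted since they are later replaced by learned modules.

```python
import torch
import torch.nn.functional as F

def data_fidelity(Z, X, Y, downsample, transform, eta):
    """Evaluate the two data-fidelity terms of Eq. 5.

    Z: current HR estimate, X: LR observation, Y: guide-contrast image.
    `downsample` stands in for DK and `transform` for P; both are placeholders.
    """
    term_obs = 0.5 * torch.sum((X - downsample(Z)) ** 2)          # ||X - DKZ||_2^2 / 2
    term_guide = 0.5 * eta * torch.sum((Y - transform(Z)) ** 2)   # eta/2 * ||Y - PZ||_2^2
    return term_obs + term_guide

# Toy usage with simple stand-in operators (bilinear down-sampling, identity transform).
down = lambda z: F.interpolate(z, scale_factor=0.5, mode="bilinear", align_corners=False)
Z = torch.rand(1, 1, 64, 64)
X = down(Z) + 0.01 * torch.randn(1, 1, 32, 32)
print(data_fidelity(Z, X, Z.clone(), down, lambda z: z, eta=0.5))
```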

In the next section, we describe how to solve the objective function with an iterative algorithm. Then, we unfold the iterative algorithm into neural networks for image SR, obtaining an end-to-end reconstruction architecture.

3.2.2. The Proximal Gradient Descent Algorithm


Following the framework of half-quadratic splitting (HQS), we introduce two auxiliary splitting variables $U$ and $V$ for $Z$, associated with the different priors of the MCSR task. Eq. 5 can then be formulated as an unconstrained optimization problem:

(6) $\underset{Z,U,V}{\arg\min}\ \frac{1}{2}\|X - DKZ\|_2^2 + \frac{\eta}{2}\|Y - PZ\|_2^2 + \frac{\beta_1}{2}\|U - Z\|_2^2 + \frac{\beta_2}{2}\|V - Z\|_2^2 + \lambda_1\mathfrak{R}_1(U) + \lambda_2\mathfrak{R}_2(V)$

where $\beta_1$, $\beta_2$, $\lambda_1$, and $\lambda_2$ are the penalty parameters. To obtain an unrolled inference, Eq. 6 can be divided into the following three sub-problems and solved alternately:

(7) $U^{(t)} = \underset{U}{\arg\min}\ \frac{\beta_1}{2}\|U - Z^{(t)}\|_2^2 + \lambda_1\mathfrak{R}_1(U)$
(8) $V^{(t)} = \underset{V}{\arg\min}\ \frac{\beta_2}{2}\|V - Z^{(t)}\|_2^2 + \lambda_2\mathfrak{R}_2(V)$
(9) $Z^{(t+1)} = \underset{Z}{\arg\min}\ \frac{1}{2}\|X - DKZ\|_2^2 + \frac{\eta}{2}\|Y - PZ\|_2^2 + \frac{\beta_1}{2}\|U^{(t)} - Z\|_2^2 + \frac{\beta_2}{2}\|V^{(t)} - Z\|_2^2$

here, $t$ denotes the HQS iteration index.

For the objective function of Eq. 5 with the prior knowledge of the MCSR task, we employ the efficient Proximal Gradient Descent (PGD) algorithm to solve the above three sub-problems:

(10) $U^{(t)} = \mathrm{Prox}_{\mathfrak{R}_1}\big(U^{(t-1)} - \delta_1\nabla_U\mathcal{F}(U^{(t-1)})\big) = \mathrm{Prox}_{\mathfrak{R}_1}\big(U^{(t-1)} - \delta_1\beta_1(U^{(t-1)} - Z^{(t)})\big)$
(11) $V^{(t)} = \mathrm{Prox}_{\mathfrak{R}_2}\big(V^{(t-1)} - \delta_2\nabla_V\mathcal{F}(V^{(t-1)})\big) = \mathrm{Prox}_{\mathfrak{R}_2}\big(V^{(t-1)} - \delta_2\beta_2(V^{(t-1)} - Z^{(t)})\big)$
(12) $Z^{(t+1)} = Z^{(t)} - \delta_3\nabla_Z\mathcal{F}(Z^{(t)})$

where $\mathrm{Prox}_{\mathfrak{R}_1}(\cdot)$ and $\mathrm{Prox}_{\mathfrak{R}_2}(\cdot)$ are the proximal operators corresponding to the penalties $\mathfrak{R}_1(\cdot)$ and $\mathfrak{R}_2(\cdot)$, which integrate the information coming from the target-contrast LR MRI and the guidance-contrast HR MRI, respectively. The gradient term is given by:

(13) $\nabla_Z\mathcal{F}(Z^{(t)}) = (DK)^T(DKZ^{(t)} - X) + \eta P^T(PZ^{(t)} - Y) + \beta_1(Z^{(t)} - U^{(t)}) + \beta_2(Z^{(t)} - V^{(t)})$

In summary, the iterative algorithm for solving the MCSR task of Eq. 5 is given above, where we initialize $Z^{(0)}$ with a bicubic interpolated version of $X$. Under the framework of the interpretable MGDUN model, the PGD algorithm usually requires dozens of iterations to converge.
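Before unfolding, the plain iterative solver can be summarized by the sketch below. This is a schematic rendition in which soft-thresholding stands in for the proximal operators $\mathrm{Prox}_{\mathfrak{R}_1}$ and $\mathrm{Prox}_{\mathfrak{R}_2}$ (which MGDUN later replaces with learned denoising modules); the operators DK, DK_T, P, and P_T are generic callables, the 2x bicubic initialization assumes the 2x SR setting, and initializing U and V with $Z^{(0)}$ is our choice.

```python
import torch
import torch.nn.functional as F

def soft_threshold(x, tau):
    # l1 proximal operator, used only as a stand-in for Prox_R1 / Prox_R2,
    # which MGDUN replaces with learned denoising modules.
    return torch.sign(x) * torch.clamp(torch.abs(x) - tau, min=0.0)

def pgd_mcsr(X, Y, DK, DK_T, P, P_T, eta=0.5, beta1=0.1, beta2=0.1,
             deltas=(0.5, 0.5, 0.1), n_iter=30, scale=2):
    """Plain PGD/HQS iterations of Eqs. 10-13 with generic linear operators."""
    d1, d2, d3 = deltas
    # Z^(0): bicubic interpolation of the LR observation X.
    Z = F.interpolate(X, scale_factor=scale, mode="bicubic", align_corners=False)
    U, V = Z.clone(), Z.clone()
    for _ in range(n_iter):
        U = soft_threshold(U - d1 * beta1 * (U - Z), 1e-3)        # Eq. 10
        V = soft_threshold(V - d2 * beta2 * (V - Z), 1e-3)        # Eq. 11
        grad = (DK_T(DK(Z) - X) + eta * P_T(P(Z) - Y)             # Eq. 13
                + beta1 * (Z - U) + beta2 * (Z - V))
        Z = Z - d3 * grad                                         # Eq. 12
    return Z
```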

Figure 2. Architectures of MGDUN's submodules. (a) Denoising module; (b) INN blocks in the cross-modal transform module; (c) up-sampling blocks ($\boldsymbol{Up}$); (d) down-sampling blocks ($\boldsymbol{Down}$).

3.3. Model-Guided deep unfolding network

In Sec. 3.2, we have proposed a general model-guided MRI SR algorithm for MCSR tasks. As in other model-based image restoration methods, Eq. 7 and Eq. 8 are difficult to optimize due to the nonlinearity of the regularizers. Meanwhile, in traditional model-based approaches (Boyd et al., 2011; He et al., 2016), alternately solving the above three optimization problems requires many iterations to converge, leading to prohibitive computational cost. An alternative approach is to unfold the iterative optimization into a series of network implementations, as demonstrated in recent years (Wisdom et al., 2017; Chen et al., 2018b; Dong et al., 2018; Bertocchi et al., 2020). The total number of unfolding stages naturally corresponds to that of PGD iterations.

3.3.1. Model representation and overview


The main idea behind deep unfolding networks is that a conventional iterative algorithm such as the iterative soft-thresholding algorithm (ISTA) can be implemented equivalently by a stack of recurrent neural networks (Wisdom et al., 2017). Inspired by the principle of model-driven deep learning, we generalize Eq. 10 to Eq. 12 into network blocks, translating each step into deep learning terminology.

In each stage, the two auxiliary variables ($U$ and $V$) are updated first. Due to the existence of noise, the $\mathrm{Prox}(\cdot)$ operator can be implemented by a deep denoising module ($\boldsymbol{DM}$). In Eq. 10, given the evaluated HR image $Z^{(t)}$ of the current stage and the auxiliary splitting variable $U^{(t-1)}$ of the previous stage, it generates the auxiliary splitting variable $U^{(t)}$ of the current stage; $V^{(t)}$ is obtained analogously following Eq. 11. In the network, these two steps are implemented by:

(14) $U^{(t)} = \boldsymbol{DM_1}(U^{(t-1)} + \xi_1 Z^{(t)};\ C, C', C)$
(15) $V^{(t)} = \boldsymbol{DM_2}(V^{(t-1)} + \xi_2 Z^{(t)};\ C, C', C)$

where $\boldsymbol{DM}(\cdot;\ C, C', C)$ is used to obtain more expressive auxiliary splitting variables, and $C'$ is the number of channels of the intermediate feature maps.

Then, we reconstruct the estimated HR image according to Eq. 12 and Eq. 13. In our network, the reconstruction process is implemented by a reconstruction module consisting of a cross-modal transform module, a down-sampling block, and an up-sampling block; it takes the auxiliary variables $U^{(t)}$ and $V^{(t)}$, the evaluated HR image $Z^{(t)}$, the LR MRI $X$, and the guide image $Y$ as inputs and outputs the reconstructed image $Z^{(t+1)}$. The reconstruction step in the network is therefore:

(16) $Z^{(t+1)} = Z^{(t)} - \delta_3\big(\boldsymbol{Up}(\boldsymbol{Down}(Z^{(t)}) - X) + \eta\boldsymbol{P}^T(\boldsymbol{P}(Z^{(t)}) - Y) + \beta_1(Z^{(t)} - U^{(t)}) + \beta_2(Z^{(t)} - V^{(t)})\big)$

where $\boldsymbol{Up}(\cdot)$ and $\boldsymbol{Down}(\cdot)$ denote the up-sampling and down-sampling operators in spatial resolution, respectively, and $\boldsymbol{P}(\cdot)$ and $\boldsymbol{P}^T(\cdot)$ perform the cross-modal transform functions.
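A minimal PyTorch sketch of this reconstruction step is given below. The Up, Down, P and P^T operators are passed in as modules (they are detailed in Secs. 3.3.3 and 3.3.4), and keeping the trade-off weights and step size as learnable scalars is an assumption on our part.

```python
import torch
import torch.nn as nn

class ReconstructionStep(nn.Module):
    """One application of Eq. 16; Up, Down, P and P^T are supplied as modules."""
    def __init__(self, up, down, p, p_t):
        super().__init__()
        self.up, self.down, self.p, self.p_t = up, down, p, p_t
        # Keeping the trade-off weights and step size learnable is an assumption.
        self.eta = nn.Parameter(torch.tensor(0.5))
        self.beta1 = nn.Parameter(torch.tensor(0.1))
        self.beta2 = nn.Parameter(torch.tensor(0.1))
        self.delta3 = nn.Parameter(torch.tensor(0.1))

    def forward(self, Z, U, V, X, Y):
        grad = (self.up(self.down(Z) - X)                # Up(Down(Z^(t)) - X)
                + self.eta * self.p_t(self.p(Z) - Y)     # eta * P^T(P(Z^(t)) - Y)
                + self.beta1 * (Z - U)                   # beta_1 (Z^(t) - U^(t))
                + self.beta2 * (Z - V))                  # beta_2 (Z^(t) - V^(t))
        return Z - self.delta3 * grad                    # Z^(t+1)
```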

Then, the updated $Z^{(t+1)}$ is fed into the next stage to refine the estimates $U$ and $V$ again. The denoising module and the reconstruction module are alternately updated $T$ times until the final reconstruction is reached.

The overall network architecture of MGDUN is shown in Fig. 1, which contains $T$ stages that are intentionally designed to correspond to the $T$ iterations of the PGD optimization algorithm. Each stage of MGDUN consists of three specified network modules: a deep denoising module for denoising and updating the auxiliary variables, a cross-modal transform module for the cross-modal transform function, and a reconstruction module for reconstructing and updating $Z$. We elaborate on each module next.
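To summarize the data flow, a skeleton of the $T$-stage forward pass might look as follows; the denoising modules and the reconstruction step are placeholders for the components detailed in the next subsections (e.g., the ReconstructionStep sketched above), and the weighting coefficients of Eqs. 14 and 15 are omitted for brevity.

```python
import torch.nn as nn
import torch.nn.functional as F

class MGDUNSketch(nn.Module):
    """Skeleton of the T-stage unfolded forward pass (illustrative only)."""
    def __init__(self, denoiser_u, denoiser_v, recon_step, T=4, scale=2):
        super().__init__()
        self.dm1, self.dm2, self.recon = denoiser_u, denoiser_v, recon_step
        self.T, self.scale = T, scale

    def forward(self, X, Y):
        # Z^(0): bicubic interpolation of the LR target-contrast input X.
        Z = F.interpolate(X, scale_factor=self.scale, mode="bicubic", align_corners=False)
        U, V = Z.clone(), Z.clone()
        for _ in range(self.T):
            U = self.dm1(U + Z)              # Eq. 14 (weighting coefficient omitted)
            V = self.dm2(V + Z)              # Eq. 15
            Z = self.recon(Z, U, V, X, Y)    # Eq. 16
        return Z
```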

3.3.2. The deep denoising module


The deep denoising modules compute the updated estimates $U^{(t)}$ and $V^{(t)}$ in Eq. 10 and Eq. 11, respectively. In general, any existing image denoising network can be used as the denoising module here.

As shown in Fig. 1, the intermediate estimates $U^{(t-1)}$ and $V^{(t-1)}$ are fed into the proximal operator after being weighted with the intermediate estimate $Z^{(t)}$ for further refinement. In this paper, we adopt a variant of U-Net as the backbone of the deep denoising module; other, more effective networks for medical image denoising could also be adopted. The U-Net denoising network consists of an encoder and a decoder. As shown in Fig. 2(a), the encoder consists of four encoding blocks, each containing two convolutional layers with $3\times 3$ kernels and ReLU nonlinearity. Correspondingly, the decoder also consists of four decoding blocks, each containing two convolutional layers with $3\times 3$ kernels and ReLU nonlinearity. Instead of predicting the refined auxiliary variables directly, the denoising module predicts the residual via a skip connection from the input to the output. To reduce the number of network parameters and the risk of overfitting, all denoising modules share the same network parameters.
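As an illustration, a compact residual U-Net in the spirit of this design is sketched below; for brevity it uses two encoder/decoder levels instead of the four described above, and the channel width is a placeholder choice.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU, as in each encoding/decoding block.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class ResidualUNetDenoiser(nn.Module):
    """Residual U-Net denoiser (two levels here instead of the four described above)."""
    def __init__(self, channels=1, width=32):
        super().__init__()
        self.enc1 = conv_block(channels, width)
        self.enc2 = conv_block(width, 2 * width)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec1 = conv_block(3 * width, width)   # concatenated skip features
        self.out = nn.Conv2d(width, channels, 3, padding=1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))
        # Input-to-output skip connection: the module predicts the residual.
        return x + self.out(d1)
```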

3.3.3. Cross-Modal Transform Module


Note that Eq. 16 involves the cross-modal transform matrices $\boldsymbol{P}$ and $\boldsymbol{P}^T$, which are expensive to calculate explicitly. We observe that $\boldsymbol{P}$ and $\boldsymbol{P}^T$ perform a pair of mutually inverse cross-modal transforms: one from the target-contrast image to the guide-contrast image, and the other from the guide-contrast image back to the target-contrast image. The process can be formulated as:

(17) $\mathfrak{T}^{(t)} = \boldsymbol{P}(Z^{(t)})$
(18) $Z^{(t)} = \boldsymbol{P}^T(\mathfrak{T}^{(t)})$

As a result, we design the cross-modal transform module using the principle of invertible neural networks (INNs). INNs have been adopted in various inference tasks and achieved excellent performance due to their flexibility (Kingma et al., 2016; Lu et al., 2021). We formulate an INN architecture to serve as the cross-modal transform operation. It consists of two pixel shuffling layers (Dinh et al., 2016) and several INN blocks. As shown in Figure 2(b), the relevant invertible modules are embedded in the cross-modal transform module.

For the $t$-th stage, given an estimated HR MRI $Z^{(t)}$ to be refined, we first pass it through one pixel shuffling layer to change its dimensions, then through several INN blocks that execute the cross-modal transform function, and finally restore the original dimensions through the other pixel shuffling layer.

For the forward operation, a pixel shuffling layer first increases the channel dimension. The input $Z^{(t)}$ is then divided into $Z_1^{(t)}$ and $Z_2^{(t)}$ along the channel axis, and the corresponding cross-modal transform outputs are $\mathfrak{T}_1^{(t)}$ and $\mathfrak{T}_2^{(t)}$ (the two components of $\mathfrak{T}^{(t)}$). This process corresponds to the operation of the cross-modal transform matrix $\boldsymbol{P}$, in which an INN block can be expressed as:

(21) $\begin{cases}\mathfrak{T}_1^{(t)} = Z_1^{(t)} + \varphi(Z_2^{(t)})\\ \mathfrak{T}_2^{(t)} = Z_2^{(t)}\odot\exp(\rho(\mathfrak{T}_1^{(t)})) + \eta(\mathfrak{T}_1^{(t)})\end{cases}$

where $\varphi(\cdot)$, $\eta(\cdot)$, and $\rho(\cdot)$ are arbitrary functions, $\exp(\cdot)$ is the exponential function, and $\odot$ is the Hadamard product.

Accordingly, for the backward operation, given [$\mathfrak{T}_1^{(t)}$, $\mathfrak{T}_2^{(t)}$], it is easy to calculate [$Z_1^{(t)}$, $Z_2^{(t)}$] as:

(24) $\begin{cases}Z_2^{(t)} = (\mathfrak{T}_2^{(t)} - \eta(\mathfrak{T}_1^{(t)}))\odot\exp(-\rho(\mathfrak{T}_1^{(t)}))\\ Z_1^{(t)} = \mathfrak{T}_1^{(t)} - \varphi(Z_2^{(t)})\end{cases}$

This process corresponds to the operation of the matrix $\boldsymbol{P}^T$ in Eq. 18. The forward and backward operations are shown in Fig. 2.
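For illustration, one such invertible block with its forward (Eq. 21) and backward (Eq. 24) passes can be written as follows; realizing $\varphi$, $\rho$, and $\eta$ as small convolutional stacks is an assumption on our part, since the paper leaves them as arbitrary functions.

```python
import torch
import torch.nn as nn

def small_net(ch):
    # Placeholder realization of the arbitrary functions phi, rho, and eta.
    return nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                         nn.Conv2d(ch, ch, 3, padding=1))

class AffineCouplingBlock(nn.Module):
    """One INN block of the cross-modal transform module (Eqs. 21 and 24)."""
    def __init__(self, channels):
        super().__init__()
        assert channels % 2 == 0
        half = channels // 2
        self.phi, self.rho, self.eta = small_net(half), small_net(half), small_net(half)

    def forward(self, z):                                    # corresponds to P
        z1, z2 = z.chunk(2, dim=1)
        t1 = z1 + self.phi(z2)                               # Eq. 21, first row
        t2 = z2 * torch.exp(self.rho(t1)) + self.eta(t1)     # Eq. 21, second row
        return torch.cat([t1, t2], dim=1)

    def inverse(self, t):                                    # corresponds to P^T
        t1, t2 = t.chunk(2, dim=1)
        z2 = (t2 - self.eta(t1)) * torch.exp(-self.rho(t1))  # Eq. 24, first row
        z1 = t1 - self.phi(z2)                               # Eq. 24, second row
        return torch.cat([z1, z2], dim=1)
```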

3.3.4. Reconstruction Module


The reconstruction module corresponds to the update of the intermediate estimate $Z^{(t)}$ as described in Eq. 16. With the outputs of the denoising modules ($U^{(t)}$, $V^{(t)}$), the evaluated HR image $Z^{(t)}$, the LR MRI $X$, and the guide image $Y$, we reconstruct the updated image $Z^{(t+1)}$. The architecture of the reconstruction module follows Fig. 1 and Eq. 16, which still involve the degradation operators $(DK)^T$ and $DK$. This pair of operators can be implemented by up-sampling and down-sampling layers for greater modeling capability.

The operators $(DK)^T$ and $DK$ are each simulated by a small convolutional network. Specifically, $DK$ is simulated by a down-sampling block ($Down$) consisting of a convolutional layer with $3\times 3$ kernels and 64 channels, one max-pooling layer to decrease the spatial resolution, and two convolutional layers with $3\times 3$ kernels for reprojection to the original dimension (as shown in Figure 2(c)). Similarly, $(DK)^T$ is simulated by an up-sampling block ($Up$) consisting of a convolutional layer with $3\times 3$ kernels and 64 channels, one upsampling layer to increase the spatial resolution, and two convolutional layers with $3\times 3$ kernels for reprojection to the original dimension (as shown in Figure 2(d)).
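The following sketch gives one possible realization of these two blocks; the placement of the ReLU activations, the upsampling mode, and the 2x scale factor are assumptions on our part.

```python
import torch.nn as nn

class DownBlock(nn.Module):
    """Simulates DK: 3x3 conv to 64 channels, max pooling, then reprojection."""
    def __init__(self, channels=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                                  # decrease spatial resolution
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, channels, 3, padding=1),            # reproject to the original dimension
        )

    def forward(self, x):
        return self.body(x)

class UpBlock(nn.Module):
    """Simulates (DK)^T: 3x3 conv to 64 channels, upsampling, then reprojection."""
    def __init__(self, channels=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),  # increase spatial resolution
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, channels, 3, padding=1),
        )

    def forward(self, x):
        return self.body(x)
```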

3.3.5. Network Training


We apply this model to the super-resolution reconstruction of LR T2WIs with the aid of HR PDWIs (or HR T1WIs).

Our MGDUN is supervised by the $\mathcal{L}_1$ loss between $Z^{(T)}$ and the ground truth $Z$. The overall network is trained by minimizing the following loss function:

(25) $\Theta = \underset{\Theta}{\arg\min}\sum_{i=1}^{N}\big\|g(X_i, Y_i; \Theta) - Z_i\big\|_1$

where $X_i$, $Y_i$, and $Z_i$ denote the $i$-th triplet of target-contrast LR T2WI, guide HR PDWI, and original target-contrast HR T2WI, respectively, and $g(\cdot;\Theta)$ denotes the HR T2WI reconstructed by the network with parameters $\Theta$.
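A schematic PyTorch training loop corresponding to Eq. 25 is given below; the model interface model(lr_t2, guide) and a data loader yielding ($X_i$, $Y_i$, $Z_i$) triplets are placeholder assumptions.

```python
import torch
import torch.nn as nn

def train_mgdun(model, loader, epochs=200, lr=1e-5, device="cuda"):
    """Minimize the L1 loss of Eq. 25 over the training set (illustrative loop)."""
    model = model.to(device)
    criterion = nn.L1Loss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, betas=(0.9, 0.999))
    for _ in range(epochs):
        for lr_t2, guide, hr_t2 in loader:           # X_i, Y_i, Z_i
            lr_t2, guide, hr_t2 = (t.to(device) for t in (lr_t2, guide, hr_t2))
            pred = model(lr_t2, guide)               # g(X_i, Y_i; Theta)
            loss = criterion(pred, hr_t2)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```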

Table 1. Comparisons of average PSNR, SSIM, and MSE on the IXI and BraTs datasets with $2\times$ and $4\times$ enlargements. The best and second best results are highlighted in red and blue, respectively. Up or down arrows indicate whether higher or lower values are better.
Datasets IXI BraTs
Scales 2× SR 4× SR 2× SR 4× SR
Metrics PSNR \uparrow SSIM \uparrow MSE \downarrow PSNR \uparrow SSIM \uparrow MSE \downarrow PSNR \uparrow SSIM \uparrow MSE \downarrow PSNR \uparrow SSIM \uparrow MSE \downarrow
Bicubic 24.5537 0.7584 15.4654 24.3658 0.7508 15.8045 26.5771 0.8413 12.1369 26.3637 0.8360 12.4472
EDSR (Lim et al., 2017) 31.4066 0.9290 7.0586 29.5377 0.9028 8.7625 33.2810 0.9557 5.6549 31.9166 0.9419 6.6225
MDSR (Lim et al., 2017) 30.9519 0.9242 7.5883 29.6663 0.9042 8.6254 33.3624 0.9567 5.5991 32.0175 0.9430 6.5372
RCAN (Zhang et al., 2018) 31.9391 0.9351 6.6395 31.3783 0.9301 7.0721 34.1138 0.9631 5.1123 32.7279 0.9504 6.0093
CoISTA (Deng and Dragotti, 2019) 31.4199 0.9121 7.0657 29.4435 0.8757 8.516 29.2042 0.9028 9.088 27.4678 0.8812 11.0865
T2Net (Feng et al., 2021c) 30.0556 0.9117 8.2676 29.4629 0.9014 8.8596 32.9922 0.9530 5.8623 31.7000 0.9389 6.8210
SANet (Feng et al., 2021d) 36.7565 0.9683 3.7998 35.2765 0.9616 4.5115 35.0775 0.9636 4.4661 33.5158 0.9578 4.9808
MGDUN(Ours) 37.3366 0.9691 3.5598 35.9786 0.9637 4.1639 35.9690 0.9703 4.1529 34.5577 0.9615 4.8933
Figure 3. Qualitative comparisons of all methods on IXI dataset. The first row visualizes the visual effects of different methods, and the last row visualizes the error map between the SR results and the ground truths.
Table 2. The results of different configurations on the IXI dataset. Up or down arrows indicate whether higher or lower values are better.
configuration Stages PSNR \uparrow SSIM \uparrow MSE \downarrow
I 3 36.7565 0.9683 3.7998
II(Ours) 4 37.3366 0.9691 3.5598
III 5 37.9190 0.9716 3.3378
IV 6 38.0198 0.9722 3.3206
configuration Guide image PSNR SSIM MSE
V 34.7081 0.9568 4.8183
VI(concat) 35.9719 0.9630 4.1703
VII(Ours) 37.3366 0.9691 3.5598
configuration INN blocks PSNR SSIM MSE
VIII 1 36.8480 0.9682 3.7681
IX(Ours) 2 37.3366 0.9691 3.5598
X 3 37.4740 0.9703 3.5038
configuration Denoiser PSNR SSIM MSE
XI(Ours) U-Net 37.3366 0.9691 3.5598
XII Resnet 37.1130 0.9686 3.6551
Table 3. The parameters and testing time results of $2\times$ enlargement.
Methods Bicubic EDSR MDSR RCAN CoISTA T2Net SANet MGDUN
# Params (M) - 1.37 0.34 12.46 16.43 0.68 11.41 1.68
# Testing time (s) - 4.16 2.71 1.60 1.50 1.65 2.21 1.97

4. EXPERIMENT

4.1. Datasets

IXI Dataset. The IXI dataset contains registered T2 weighted and PD weighted MR images of 578 patients. T2 weighted images were used for SR, and the PD weighted images served as the guidance. We excluded a few slices of each volume, as the frontal slices are much noisier than the others, making their distribution different (more details can be obtained from http://brain-development.org/ixi-dataset/). We split the IXI dataset patient-wise into a ratio of 7:1:2 for training/validation/testing. The size of the original HR images of both T2 and PD weighted images is $256\times 256$.

BraTs Dataset. The BraTs dataset (2019) contains multimodal brain data, including registered T1, T1ce, T2, and PD weighted images. Similar to the IXI dataset, we chose T2 weighted images for SR and T1 weighted images for guidance. The size of an original HR image is $240\times 240$. 3350 pairs were used for training, and 1250 paired images were used for validation.

Finally, each T2 image was blurred with a $3\times 3$ Gaussian filter and down-sampled, and an LR image of the desired dimensions was obtained after bicubic interpolation. Before training, all images were normalized to the range of $[0, 1]$ for numerical stability.
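A sketch of this degradation pipeline is shown below; the Gaussian sigma and the use of torchvision's gaussian_blur are our assumptions, since the paper specifies only the 3x3 kernel size.

```python
import torch.nn.functional as F
from torchvision.transforms.functional import gaussian_blur

def make_lr_input(hr, scale=2):
    """Blur with a 3x3 Gaussian and bicubically down-sample an HR image batch.

    `hr`: tensor of shape (B, 1, H, W). The Gaussian sigma is our assumption;
    the paper specifies only the 3x3 kernel size.
    """
    blurred = gaussian_blur(hr, kernel_size=3, sigma=0.5)
    lr = F.interpolate(blurred, scale_factor=1.0 / scale, mode="bicubic", align_corners=False)
    # Normalize each image to [0, 1] for numerical stability.
    lo = lr.amin(dim=(-2, -1), keepdim=True)
    hi = lr.amax(dim=(-2, -1), keepdim=True)
    return (lr - lo) / (hi - lo + 1e-8)
```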

Metrics: The peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and mean-square error (MSE) were used to evaluate the image quality of the MCSR results (higher PSNR and SSIM and lower MSE indicate better performance).

4.2. Implementation Details

We implement MGDUN in the PyTorch framework with an NVIDIA GeForce RTX 3080Ti GPU. In the training phase, our model is trained using the ADAM optimizer with a learning rate of $1\times 10^{-5}$ for 200 epochs; the ADAM parameters $\beta_1$ and $\beta_2$ are empirically set to 0.9 and 0.999, and the batch size is set to 4. During testing, the unfolded network is applied to the whole image. The default stage number $T$ is set to 4, and the number of INN blocks in the cross-modal transform module is set to 2. The source code is available at https://github.com/yggame/MGDUN.

4.3. Results

We compare our results with a number of models including classical methods, single-contrast methods, multi-contrast methods, and unfolding methods. More specifically, the proposed method is compared with Bicubic, EDSR (Lim et al., 2017), MDSR (Lim et al., 2017), RCAN (Zhang et al., 2018), CoISTA (Deng and Dragotti, 2019), T2Net (Feng et al., 2021c), and SANet (Feng et al., 2021d). The benchmark methods are trained in the same way and on the same training data as described in their corresponding works.

4.3.1. Quantitative results


For quantitative evaluation, we use the average PSNR, SSIM, and MSE. Tab. 1 reports the target-contrast reconstruction performance on the different datasets under $2\times$ and $4\times$ enlargements, where the best and second best values are highlighted in red and blue, respectively. It is clearly noted that our method achieves the best performance on all enlargements of both datasets. The results demonstrate that our model can effectively fuse the two contrasts, which is beneficial to the SR reconstruction of the target contrast. This substantiates the effectiveness and flexibility of our method with a certain degree of generalization.

Figure 4. Qualitative comparison of all methods on BraTs dataset. The first row visualizes the visual effects of different methods, and the last row visualizes the error map between the SR results and the ground truths.

4.3.2. Qualitative results


We provide qualitative comparison results on the IXI dataset as well as the BraTs dataset, together with their corresponding error maps, in Fig. 3 and Fig. 4. The texture of the error maps represents the restoration error; the smoother the texture, the better the reconstruction. As we can see, the input has significant aliasing artifacts and lacks anatomical details. Our model recovers the image with fewer visible artifacts and reconstructs more details than the other competing methods. The quality improvement achieved by MGDUN may be associated with the full usage of the feature maps from the former stages to refine the final results.

4.4. Ablation Study

To further verify the performance of the proposed model under different configurations, a series of ablation studies are carried out, including 1) the effect of the number of stages; 2) the influence of the guide images; 3) the effect of the number of INN blocks in the cross-modal transform module; and 4) the effect of the denoising modules. In this section, only the IXI dataset is used.

Effect of the number of stages     To explore the impact of the number of unfolding stages on performance, we report results for different realizations of the proposed model with varying numbers of unfolding stages, as described in Eq. 10 to Eq. 12. Tab. 2 (I-IV) shows the performance for 3 to 6 stages. It can be observed that the performance increases as the number of stages increases. We choose $T=4$ in our implementation to balance performance and computational complexity.

Influence of the guide images     The guide images provide complementary information for recovering the target modality. To verify the effectiveness of the guide images, we conduct a series of ablation studies (e.g., removing the guide image and using a simple multi-contrast fusion mechanism instead). The results are shown in Tab. 2 (V-VII). As we can see, our fusion strategy is effective and improves performance.

Effect of the number of INN blocks in the cross-modal transform module     We additionally perform a comparative experiment to verify the effectiveness of the INN blocks in the cross-modal transform module. As shown in Tab. 2 (VIII-X), the performance increases as the number of INN blocks increases. In other words, when dealing with cross-modal transform functions, the reconstruction capability of our method can be improved by appropriately increasing the number of blocks.

Effect of the denoising modules     To verify the effectiveness of the U-Net denoising modules, we further implement an ablation study in which the U-Net denoising modules are replaced by ResNet denoising modules containing a similar number of parameters. As shown in Tab. 2 (XI-XII), the proposed method with the U-Net denoiser outperforms the ResNet variant.

4.5. Cost-performance Trade-off

To demonstrate the trade-off between cost and performance, we compare our model against several existing SR methods in Tab. 3. It is noted that our model achieves better performance than the others with a comparable model size. Although it does not achieve the best testing time, the proposed method still has notable advantages over other competing methods. The results demonstrate that our method yields satisfying performance with a good trade-off between cost and performance compared to other deep learning-based methods.

5. CONCLUSION

Interpretable deep learning models are a promising approach for the recovery of medical images, as trustworthiness is required in clinical practice. In this paper, we propose a novel Model-Guided interpretable Deep Unfolding Network (MGDUN) for medical image SR reconstruction and show how to unfold it into a deep convolutional network implementation for multi-contrast medical image SR reconstruction. Our MGDUN is capable of better exploring domain knowledge in MCSR tasks and optimizing the model-guided SR reconstruction algorithm in an interpretable manner.

Acknowledgements.
This work was supported by the National Natural Science Foundation of China (NSFC) under Grant 61922075, the USTC Research Funds of the Double First-Class Initiative under Grants YD2100002004 and KY2100000123, as well as the University Synergy Innovation Program of Anhui Province No. GXXT-2019-025. We acknowledge the support of the GPU cluster built by the MCC Lab of the Information Science and Technology Institution, USTC.

References

  • Afonso et al. (2010) Manya V Afonso, José M Bioucas-Dias, and Mário AT Figueiredo. 2010. Fast image recovery using variable splitting and constrained optimization. IEEE transactions on image processing 19, 9 (2010), 2345–2356.
  • Bertocchi et al. (2020) Carla Bertocchi, Emilie Chouzenoux, Marie-Caroline Corbineau, Jean-Christophe Pesquet, and Marco Prato. 2020. Deep unfolding of a proximal interior point method for image restoration. Inverse Problems 36, 3 (2020), 034005.
  • Boyd et al. (2011) Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, Jonathan Eckstein, et al. 2011. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine learning 3, 1 (2011), 1–122.
  • Brown and Semelka (2011) Mark A Brown and Richard C Semelka. 2011. MRI: basic principles and applications. John Wiley & Sons.
  • Chaudhari et al. (2018) Akshay S Chaudhari, Zhongnan Fang, Feliks Kogan, Jeff Wood, Kathryn J Stevens, Eric K Gibbons, Jin Hyung Lee, Garry E Gold, and Brian A Hargreaves. 2018. Super-resolution musculoskeletal MRI using deep learning. Magnetic resonance in medicine 80, 5 (2018), 2139–2154.
  • Chen et al. (2018b) Chang Chen, Zhiwei Xiong, Xinmei Tian, and Feng Wu. 2018b. Deep boosting for image denoising. In Proceedings of the European Conference on Computer Vision (ECCV). 3–18.
  • Chen et al. (2018a) Yuhua Chen, Yibin Xie, Zhengwei Zhou, Feng Shi, Anthony G Christodoulou, and Debiao Li. 2018a. Brain MRI super resolution using 3D deep densely connected neural networks. In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). IEEE, 739–742.
  • Deng and Dragotti (2019) Xin Deng and Pier Luigi Dragotti. 2019. Deep coupled ISTA network for multi-modal image super-resolution. IEEE Transactions on Image Processing 29 (2019), 1683–1698.
  • Dinh et al. (2016) Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. 2016. Density estimation using real nvp. arXiv preprint arXiv:1605.08803 (2016).
  • Dong et al. (2018) Weisheng Dong, Peiyao Wang, Wotao Yin, Guangming Shi, Fangfang Wu, and Xiaotong Lu. 2018. Denoising prior driven deep neural network for image restoration. IEEE transactions on pattern analysis and machine intelligence 41, 10 (2018), 2305–2318.
  • Feng et al. (2021a) Chun-Mei Feng, Huazhu Fu, Shuhao Yuan, and Yong Xu. 2021a. Multi-Contrast MRI Super-Resolution via a Multi-Stage Integration Network. In International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI).
  • Feng et al. (2021b) Chun-Mei Feng, Huazhu Fu, Shuhao Yuan, and Yong Xu. 2021b. Multi-contrast mri super-resolution via a multi-stage integration network. In International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 140–149.
  • Feng et al. (2021c) Chun-Mei Feng, Yunlu Yan, Huazhu Fu, Li Chen, and Yong Xu. 2021c. Task Transformer Network for Joint MRI Reconstruction and Super-Resolution. In International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI).
  • Feng et al. (2021d) Chun-Mei Feng, Yunlu Yan, Chengliang Liu, Huazhu Fu, Yong Xu, and Ling Shao. 2021d. Exploring Separable Attention for Multi-Contrast MR Image Super-Resolution. arXiv preprint arXiv:2109.01664 (2021).
  • Gregor and LeCun (2010) Karol Gregor and Yann LeCun. 2010. Learning fast approximations of sparse coding. In Proceedings of the 27th international conference on international conference on machine learning. 399–406.
  • He et al. (2016) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770–778.
  • Huo et al. (2018) Yuankai Huo, Zhoubing Xu, Shunxing Bao, Camilo Bermudez, Hyeonsoo Moon, Prasanna Parvathaneni, Tamara K Moyo, Michael R Savona, Albert Assad, Richard G Abramson, et al. 2018. Splenomegaly segmentation on multi-modal MRI using deep convolutional networks. IEEE transactions on medical imaging 38, 5 (2018), 1185–1196.
  • Jafari-Khouzani (2014) Kourosh Jafari-Khouzani. 2014. MRI upsampling using feature-based nonlocal means approach. IEEE transactions on medical imaging 33, 10 (2014), 1969–1985.
  • Kingma et al. (2016) Durk P Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya Sutskever, and Max Welling. 2016. Improved variational inference with inverse autoregressive flow. Advances in neural information processing systems 29 (2016), 4743–4751.
  • Kokkinos and Lefkimmiatis (2018) Filippos Kokkinos and Stamatios Lefkimmiatis. 2018. Deep image demosaicking using a cascade of convolutional residual denoising networks. In Proceedings of the European Conference on Computer Vision (ECCV). 303–319.
  • Kuklisova-Murgasova et al. (2012) Maria Kuklisova-Murgasova, Gerardine Quaghebeur, Mary A. Rutherford, Joseph V. Hajnal, and Julia A. Schnabel. 2012. Reconstruction of fetal brain MRI with intensity matching and complete outlier removal. Medical Image Analysis 16, 8 (2012), 1550–1564. https://doi.org/10.1016/j.media.2012.07.004
  • Lim et al. (2017) Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. 2017. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 136–144.
  • Lu et al. (2021) Shao-Ping Lu, Rong Wang, Tao Zhong, and Paul L Rosin. 2021. Large-capacity image steganography based on invertible neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10816–10825.
  • Lu et al. (2015) Xiaoqiang Lu, Zihan Huang, and Yuan Yuan. 2015. MR image super-resolution via manifold regularized sparse learning. Neurocomputing 162 (2015), 96–104.
  • Lyu et al. (2020) Qing Lyu, Hongming Shan, Cole Steber, Corbin Helis, Chris Whitlow, Michael Chan, and Ge Wang. 2020. Multi-contrast super-resolution MRI through a progressive network. IEEE transactions on medical imaging 39, 9 (2020), 2738–2749.
  • Mai et al. (2011) Zhenhua Mai, Jeny Rajan, Marleen Verhoye, and Jan Sijbers. 2011. Robust edge-directed interpolation of magnetic resonance images. Physics in Medicine & Biology 56, 22 (2011), 7287.
  • Manjón et al. (2010) José V Manjón, Pierrick Coupé, Antonio Buades, D Louis Collins, and Montserrat Robles. 2010. MRI superresolution using self-similarity and image priors. International journal of biomedical imaging 2010 (2010).
  • McDonagh et al. (2017) Steven McDonagh, Benjamin Hou, Amir Alansary, Ozan Oktay, Konstantinos Kamnitsas, Mary Rutherford, Jo V Hajnal, and Bernhard Kainz. 2017. Context-sensitive super-resolution for fast fetal magnetic resonance imaging. In Molecular Imaging, Reconstruction and Analysis of Moving Body Organs, and Stroke Imaging and Treatment. Springer, 116–126.
  • Ning et al. (2020) Qian Ning, Weisheng Dong, Guangming Shi, Leida Li, and Xin Li. 2020. Accurate and lightweight image super-resolution with model-guided deep unfolding network. IEEE Journal of Selected Topics in Signal Processing 15, 2 (2020), 240–252.
  • Pham et al. (2017) Chi-Hieu Pham, Aurélien Ducournau, Ronan Fablet, and François Rousseau. 2017. Brain MRI super-resolution using deep 3D convolutional networks. In 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017). IEEE, 197–200.
  • Plenge et al. (2012) Esben Plenge, Dirk HJ Poot, Monique Bernsen, Gyula Kotek, Gavin Houston, Piotr Wielopolski, Louise van der Weerd, Wiro J Niessen, and Erik Meijering. 2012. Super-resolution methods in MRI: can they improve the trade-off between resolution, signal-to-noise ratio, and acquisition time? Magnetic resonance in medicine 68, 6 (2012), 1983–1993.
  • Rousseau et al. (2010) François Rousseau, Alzheimer’s Disease Neuroimaging Initiative, et al. 2010. A non-local approach for image super-resolution using intermodality priors. Medical image analysis 14, 4 (2010), 594–605.
  • Song et al. (2021) Jiechong Song, Bin Chen, and Jian Zhang. 2021. Memory-Augmented Deep Unfolding Network for Compressive Sensing. In Proceedings of the 29th ACM International Conference on Multimedia. 4249–4258.
  • Sun et al. (2016) Jian Sun, Huibin Li, Zongben Xu, et al. 2016. Deep ADMM-Net for compressive sensing MRI. Advances in neural information processing systems 29 (2016).
  • Wisdom et al. (2017) Scott Wisdom, Thomas Powers, James Pitton, and Les Atlas. 2017. Building recurrent networks by unfolding iterative thresholding for sequential sparse recovery. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 4346–4350.
  • You et al. (2021) Di You, Jingfen Xie, and Jian Zhang. 2021. ISTA-Net++: flexible deep unfolding network for compressive sensing. In 2021 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 1–6.
  • Zeng et al. (2018) Kun Zeng, Hong Zheng, Congbo Cai, Yu Yang, Kaihua Zhang, and Zhong Chen. 2018. Simultaneous single-and multi-contrast super-resolution for brain MRI images based on a convolutional neural network. Computers in biology and medicine 99 (2018), 133–141.
  • Zhang and Ghanem (2018) Jian Zhang and Bernard Ghanem. 2018. ISTA-Net: Interpretable optimization-inspired deep network for image compressive sensing. In Proceedings of the IEEE conference on computer vision and pattern recognition. 1828–1837.
  • Zhang et al. (2020) Kai Zhang, Luc Van Gool, and Radu Timofte. 2020. Deep unfolding network for image super-resolution. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 3217–3226.
  • Zhang et al. (2021) Yulun Zhang, Kai Li, Kunpeng Li, and Yun Fu. 2021. MR Image Super-Resolution with Squeeze and Excitation Reasoning Attention Network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 13425–13434.
  • Zhang et al. (2018) Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, and Yun Fu. 2018. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European conference on computer vision (ECCV). 286–301.
  • Zhao et al. (2018) Can Zhao, Aaron Carass, Blake E Dewey, Jonghye Woo, Jiwon Oh, Peter A Calabresi, Daniel S Reich, Pascal Sati, Dzung L Pham, and Jerry L Prince. 2018. A deep learning based anti-aliasing self super-resolution algorithm for MRI. In International conference on medical image computing and computer-assisted intervention. Springer, 100–108.
  • Zhao et al. (2019) Xiaole Zhao, Yulun Zhang, Tao Zhang, and Xueming Zou. 2019. Channel splitting network for single MR image super-resolution. IEEE Transactions on Image Processing 28, 11 (2019), 5649–5662.
  • Zheng et al. (2017) Hong Zheng, Xiaobo Qu, Zhengjian Bai, Yunsong Liu, Di Guo, Jiyang Dong, Xi Peng, and Zhong Chen. 2017. Multi-contrast brain magnetic resonance image super-resolution using the local weight similarity. BMC medical imaging 17, 1 (2017), 1–13.
  • Zheng et al. (2018) Hong Zheng, Kun Zeng, Di Guo, Jiaxi Ying, Yu Yang, Xi Peng, Feng Huang, Zhong Chen, and Xiaobo Qu. 2018. Multi-contrast brain MRI image super-resolution with gradient-guided edge enhancement. IEEE Access 6 (2018), 57856–57867.