
1 Xiangtan University, Xiangtan, Hunan, China
2 Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
3 Pengcheng Laboratory, Shenzhen, Guangdong, China
4 Jinan University, Guangzhou, Guangdong, China
Email: [email protected], [email protected], [email protected]

SelfCoLearn: Self-supervised collaborative
learning for accelerating dynamic MR imaging

Juan Zou 1,2    Cheng Li 2    Sen Jia 2    Ruoyou Wu 2,3    Tingrui Pei 1,4    Hairong Zheng 2,3    Shanshan Wang 2,3
Abstract

Recently, deep learning has been extensively investigated for accelerating dynamic magnetic resonance (MR) imaging, with encouraging progress achieved. However, without fully sampled reference data for training, current approaches may have limited ability to recover fine details or structures. To address this challenge, this paper proposes a self-supervised collaborative learning framework (SelfCoLearn) for accurate dynamic MR image reconstruction from undersampled k-space data. The proposed framework is equipped with three important components: dual-network collaborative learning, reundersampling data augmentation, and a specially designed co-training loss. The framework is flexible and can be integrated with both data-driven networks and model-based iterative un-rolled networks. Our method has been evaluated on an in-vivo dataset and compared with four state-of-the-art methods. Results show that our method possesses strong capabilities in capturing essential and inherent representations for direct reconstruction from undersampled k-space data and thus enables high-quality and fast dynamic MR imaging.

Keywords:
Dynamic MR imaging · Self-supervised learning · Collaborative learning · Reundersampling data augmentation · Co-training loss

1 Introduction

Deep learning-based dynamic magnetic resonance (MR) imaging has attracted substantial attention in recent years. It draws knowledge from big datasets via network training and then uses the trained network to reconstruct dynamic images from undersampled k-space data. Compared with classical compressed sensing methods [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], deep learning-based methods have achieved encouraging performance and progress.

Depending on whether fully sampled reference data are required, existing methods can be roughly categorized into two types: fully-supervised methods and unsupervised ones. Fully-supervised methods need paired data, between the undersampled/corrupted data and the fully sampled reference data, to train the neural networks [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]. In this category, different network structures and prior knowledge have been explored [22, 23, 24, 25, 26, 27, 28, 29]. For example, Schlemper et al. [25] proposed a cascade network architecture composed of intermediate de-aliasing convolutional neural network (CNN) modules and data consistency layers. Qin et al. [26] applied a bidirectional convolutional recurrent neural network (CRNN) with interleaved data consistency to accelerate MR imaging. Qin et al. [27] further designed a parallel framework, comprising a time-frequency domain CRNN and an image domain CRNN, to simultaneously exploit spatiotemporal correlations. Huang et al. [28] unrolled the low-rank plus sparse method [9] into a deep neural network to learn the low-rank and sparse regularization. Ke et al. [29] exploited learned low-rank priors (SLR-Net). These methods have made great progress in accelerating dynamic MR imaging. However, one major challenge is that, in many practical imaging scenarios, obtaining high-quality fully sampled dynamic MRI data is infeasible due to various factors, such as physiological motion of patients and imaging speed restrictions. The requirement of fully sampled reference data for network training has therefore hindered the wide application of these methods.

To address this issue, researchers have developed unsupervised learning methods that train models without fully sampled reference data [30, 31, 32]. For example, Ke et al. [30] generated pseudo reference labels from undersampled data by merging neighbouring frames of undersampled k-space data. Yoo et al. [31] extended the deep image prior framework [33] to dynamic non-Cartesian MRI. Recently, Yaman et al. [34] proposed a classical self-supervised learning method (SSDU) for static MR imaging. SSDU divides the acquired undersampled data into two parts: one part is treated as the input data, and the other is utilized as the supervisory signal [35]. Subsequently, Acar et al. [32] applied SSDU to reconstruct dynamic MR images. All of these works have made great contributions to unsupervised dynamic MR image reconstruction. Nevertheless, there is still room for improvement in recovering fine details and structures, due to the incomplete inherent representation of undersampled data compared with fully sampled data.

To boost performance in accelerating dynamic MR imaging without fully sampled reference data, we propose a self-supervised collaborative learning framework (SelfCoLearn). Under the assumption that the latent representations of network predictions are consistent across different reundersampling data augmentations of the same data, SelfCoLearn applies collaborative training of dual networks with reundersampling data augmentation to explore more comprehensive prior knowledge than a single network. Specifically, from the undersampled k-space data, reundersampling data augmentation is applied to obtain two reundersampled inputs for the dual networks. The dual networks are trained collaboratively with a specially designed co-training loss in an end-to-end manner. With this collaborative training strategy, the proposed framework possesses strong capabilities in capturing essential and inherent representations from the undersampled k-space data in a self-supervised learning manner. Additionally, the framework is flexible and can be integrated with both data-driven networks and model-based iterative un-rolled networks [36] for dynamic MR imaging. The main contributions can be summarized as follows:

  • We present a self-supervised collaborative learning framework with reundersampling data augmentation for accelerating dynamic MR imaging. The proposed framework can be flexibly integrated with both data-driven networks and iterative un-rolled networks.

  • A co-training loss, which includes an undersampled consistency loss term and a contrastive consistency loss term, is designed to guide the end-to-end framework to capture essential and inherent representations from the undersampled k-space data.

  • Extensive experiments are conducted to evaluate the effectiveness of the proposed SelfCoLearn with both data-driven and iterative un-rolled networks, with more promising results obtained compared with four state-of-the-art methods.

The remainder of this paper is organized as follows: Section 2 states the dynamic MR imaging problem and presents the proposed SelfCoLearn with different backbone networks. Section 3 summarizes the experiments and results demonstrating the effectiveness of the proposed SelfCoLearn, and Section 4 presents the discussion. Section 5 concludes the work.

2 Methodology

2.1 Dynamic MR Imaging Formulation

The goal of dynamic MR imaging is to estimate a dynamic MR image sequence $\mathbf{x}\in\mathbb{C}^{N}$ from undersampled measurements $\mathbf{y}\in\mathbb{C}^{M}$ ($M\ll N$) in k-space, where $N=N_{h}N_{w}T$ is the length of the vectorized sequence, $N_{h}$ and $N_{w}$ are the height and width of each frame, respectively, and $T$ is the number of frames in each sequence. The imaging model can be described as follows:

\mathbf{y}=\mathbf{A}\mathbf{x}+\mathbf{e}   (1)

where $\mathbf{e}\in\mathbb{C}^{M}$ is the measurement noise and $\mathbf{A}=\mathbf{PF}$ is the undersampled Fourier encoding operator, with $\mathbf{F}$ the 2D Fourier transform applied to each frame in the dynamic image sequence and $\mathbf{P}$ the undersampling mask for each frame. In general, the reconstruction problem is formulated as an unconstrained optimization problem that exploits prior knowledge:

\mathbf{x}^{*}=\arg\min_{\mathbf{x}}\frac{1}{2}\|\mathbf{A}\mathbf{x}-\mathbf{y}\|_{2}^{2}+\lambda\mathcal{R}(\mathbf{x})   (2)

where $\mathcal{R}(\mathbf{x})$ represents a prior regularization term on $\mathbf{x}$, and $\lambda$ is the weight of the regularization. $\frac{1}{2}\|\mathbf{A}\mathbf{x}-\mathbf{y}\|_{2}^{2}$ is the data fidelity term, which ensures that the reconstruction result is consistent with the original undersampled measurements.
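As a concrete illustration, below is a minimal sketch of the forward model and its zero-filling adjoint, assuming a single-coil sequence of shape (T, Nh, Nw) and a binary sampling mask of the same shape; all names are illustrative rather than taken from the authors' code.

```python
import numpy as np

def A(x, P):
    """Undersampled Fourier encoding A = PF: frame-wise 2D FFT, then masking."""
    k_full = np.fft.fft2(x, axes=(-2, -1), norm="ortho")
    return P * k_full

def A_adjoint(y, P):
    """Adjoint F^H P: the zero-filling reconstruction."""
    return np.fft.ifft2(P * y, axes=(-2, -1), norm="ortho")

# Example: zero-filled reconstruction from y = Ax + e
T, Nh, Nw = 14, 128, 128
x = np.random.randn(T, Nh, Nw) + 1j * np.random.randn(T, Nh, Nw)
P = (np.random.rand(T, Nh, Nw) < 0.25).astype(float)   # ~4-fold random mask
y = A(x, P)
x_zf = A_adjoint(y, P)   # aliased input handed to the reconstruction network
```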

Fully-supervised deep learning methods typically use a CNN $f_{CNN}(\mathbf{y}\mid\theta)$ in place of the regularization term $\mathcal{R}(\mathbf{x})$, learning the mapping between undersampled/corrupted data and the corresponding fully sampled data with parameters $\theta$. The training objective can be given as:

\arg\min_{\theta}\sum_{i=1}^{S}\mathcal{L}\left(f_{CNN}\left(\mathbf{y}_{i}\mid\theta\right),\mathbf{x}_{i}^{ref}\right)   (3)

where $S$ is the number of dynamic image sequences in the training set, $\mathbf{x}_{i}^{ref}$ is the fully sampled data of subject $i$, and $\mathcal{L}(\cdot)$ denotes the loss function between the predicted reconstruction output and the fully sampled reference data, which typically adopts the $l_{1}$-norm or $l_{2}$-norm.
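For reference, here is a hedged sketch of one training step under Eq. (3), assuming real-valued (e.g., two-channel real/imaginary) image tensors and an $l_{1}$-norm loss; `model` and the variable names are placeholders, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def supervised_step(model, optimizer, x_zf, x_ref):
    """One gradient step of Eq. (3): x_zf is the zero-filling input,
    x_ref the fully sampled reference; both real-valued tensors."""
    optimizer.zero_grad()
    x_pred = model(x_zf)             # f_CNN(y | theta)
    loss = F.l1_loss(x_pred, x_ref)  # L(f_CNN(y | theta), x_ref)
    loss.backward()
    optimizer.step()
    return loss.item()
```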

2.2 The Overall Framework

In our work, we propose a simple but effective self-supervised training framework for dynamic MR imaging, whose paradigm is shown in Fig. 1. Our framework trains two independent reconstruction networks simultaneously, which have different inputs and different weight parameters. The backbone can adopt either a data-driven network architecture or an iterative un-rolled network, such as CRNN [26], k-t NEXT [24], and SLR-Net [29]. Based on the consistency between the two network predictions, each network provides complementary information for the to-be-reconstructed dynamic MR images to its peer, which serves as an additional regularization compared with existing unsupervised methods [32]. The two networks thereby converge to consistent reconstructions during training. Specifically, given a raw undersampled k-space data sequence $\Omega=\{\mathbf{y}_{\Omega}^{t}\}_{t=1}^{T}$, we reundersample the original k-space data $\mathbf{y}_{\Omega}^{t}$ to construct two partially sampled training sequences $\{\mathbf{y}_{u}^{t}\}_{t=1}^{T}$:

\mathbf{y}_{u}^{t}=P_{u}^{t}\left(\mathbf{y}_{\Omega}^{t}\right),\quad t=1,\ldots,T,\quad u=\Theta,\Lambda   (4)

where $t$ is the frame index, $u$ denotes the index of the two training sequences, and $P_{u}^{t}$ is the reundersampling mask for frame $t$. To make full use of all data points in $\mathbf{y}_{\Omega}^{t}$ for representation learning, and to ensure that each network can provide complementary information for the to-be-reconstructed dynamic MR images to its peer network, the training sequences are generated according to the following data augmentation principles: (1) the union of the data points in the two training sequences must equal $\mathbf{y}_{\Omega}^{t}$, i.e., $\mathbf{y}_{\Omega}^{t}=\mathbf{y}_{\Theta}^{t}\cup\mathbf{y}_{\Lambda}^{t}$; (2) the data points in the two training sequences should differ, i.e., $\mathbf{y}_{\Theta}^{t}\neq\mathbf{y}_{\Lambda}^{t}$; (3) each training sequence should include most of the low-frequency data points and part of the high-frequency data points. Following these principles, the two training sequences contain similar data points in the low-frequency region and different points in the high-frequency region, as illustrated in the sketch below. Note that data reundersampling is only necessary during training; reconstructed images can be inferred from the test data directly.
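The sketch below shows one way to implement this split for a single frame, under the assumption of a binary mask with a centered low-frequency band shared by both training sets; the band width and the 50/50 high-frequency split are illustrative choices, not the paper's exact recipe.

```python
import numpy as np

def split_kspace(y_omega, P, low_freq_lines=16, rng=None):
    """Split one frame's sampled points into two overlapping training sets
    satisfying principles (1)-(3); y_omega and P have shape (Nh, Nw)."""
    rng = np.random.default_rng(0) if rng is None else rng
    Nh = P.shape[0]
    c0 = Nh // 2 - low_freq_lines // 2
    low = np.zeros_like(P)
    low[c0:c0 + low_freq_lines] = P[c0:c0 + low_freq_lines]  # shared low-freq band
    high = P - low                                           # remaining sampled points
    pick = (rng.random(P.shape) < 0.5).astype(P.dtype)       # random high-freq split
    P_theta = low + high * pick
    P_lambda = low + high * (1.0 - pick)
    # Principle (1): the union of the two masks recovers the original mask P
    assert np.array_equal(np.maximum(P_theta, P_lambda), P)
    return P_theta * y_omega, P_lambda * y_omega, P_theta, P_lambda
```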

Figure 1: An overview of the proposed self-supervised collaborative training framework. The raw undersampled k-space data sequence $\mathbf{y}_{\Omega}^{t}$ is undersampled from the fully sampled data with the undersampling mask $\mathbf{P}^{t}$. The k-space sequences $\mathbf{y}_{\Theta}^{t}$ and $\mathbf{y}_{\Lambda}^{t}$ are reundersampled from $\mathbf{y}_{\Omega}^{t}$ with the reundersampling masks $\mathbf{P}_{\Theta}^{t}$ and $\mathbf{P}_{\Lambda}^{t}$. The two networks receive the zero-filling image sequences of $\mathbf{y}_{\Theta}^{t}$ and $\mathbf{y}_{\Lambda}^{t}$ as inputs, respectively. The predicted image sequences of the networks are transformed to k-space data $f(\mathbf{y}_{\Theta}^{t})$ and $f(\mathbf{y}_{\Lambda}^{t})$ with the two-dimensional Fourier transform. A co-training loss is calculated using $\mathbf{y}_{\Omega}^{t}$, $f(\mathbf{y}_{\Theta}^{t})$ and $f(\mathbf{y}_{\Lambda}^{t})$. The backbone can flexibly adopt either a data-driven network or an iterative un-rolled network, such as CRNN, k-t NEXT, and SLR-Net. Collaborative network-1 and collaborative network-2 share the same network structure but have different weight parameters $\theta_{\Theta}$ and $\theta_{\Lambda}$, respectively.

2.3 Network Architectures

1) Data-driven dynamic MR imaging: In data-driven settings, the common practice is to decouple Eq. (2) into a regularization term and a data fidelity term via the variable splitting technique [25, 26]. By introducing an auxiliary variable $\mathbf{z}=\mathbf{x}$, Eq. (2) can be re-formulated as the penalty function:

\arg\min_{\mathbf{x},\mathbf{z}}\lambda\mathcal{R}(\mathbf{z})+\frac{1}{2}\|\mathbf{A}\mathbf{x}-\mathbf{y}\|_{2}^{2}+\mu\|\mathbf{x}-\mathbf{z}\|_{2}^{2}   (5)

where $\mu$ is a penalty parameter. Eq. (5) can then be solved iteratively via alternating minimization over $\mathbf{z}$ and $\mathbf{x}$:

\mathbf{z}^{n}=\arg\min_{\mathbf{z}}\lambda\mathcal{R}(\mathbf{z})+\mu\|\mathbf{x}^{n-1}-\mathbf{z}\|_{2}^{2}   (6)
\mathbf{x}^{n}=\arg\min_{\mathbf{x}}\frac{1}{2}\|\mathbf{A}\mathbf{x}-\mathbf{y}\|_{2}^{2}+\mu\|\mathbf{x}-\mathbf{z}^{n}\|_{2}^{2}   (7)

where $n\in\{0,1,2,\ldots,N-1\}$ indexes the iterations, $\mathbf{x}^{0}$ is the zero-filling image transformed from the original undersampled measurements, $\mathbf{z}^{n}$ denotes the intermediate reconstruction sequence, and $\mathbf{x}^{n}$ denotes the final reconstruction sequence at each iteration. In Eq. (7), the operation on the intermediate reconstruction sequence $\mathbf{z}^{n}$ is a data consistency step, which replaces the corresponding data points in the reconstructed k-space with the originally sampled k-space data points [25]. The iterative optimization process in Eqs. (6) and (7) is unrolled into a neural network.
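A minimal sketch of this data consistency step, in its noiseless replacement form, is given below: sampled k-space positions in the prediction are overwritten with the acquired data. Complex-valued tensors are assumed, and the function names are illustrative.

```python
import torch

def data_consistency(x_pred, y, P):
    """x_pred: predicted complex image frames; y: acquired k-space; P: sampling mask."""
    k_pred = torch.fft.fft2(x_pred, norm="ortho")
    k_dc = (1 - P) * k_pred + P * y   # keep predictions only at unsampled positions
    return torch.fft.ifft2(k_dc, norm="ortho")
```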

CRNN [26] is a typical data-driven method that integrates data consistency in k-space. A single iteration of CRNN can be illustrated as the following process:

\mathbf{x}_{rnn}^{(n)}=\mathbf{x}_{rec}^{(n-1)}+\mathrm{CRNN}\left(\mathbf{x}_{rec}^{(n-1)}\right)   (8)
\mathbf{x}_{rec}^{(n)}=\mathrm{DC}\left(\mathbf{x}_{rnn}^{(n)};\mathbf{y},\lambda\right)   (9)

where $\mathbf{x}_{rnn}^{(n)}$ is the intermediate reconstruction sequence analogous to $\mathbf{z}^{n}$ in Eq. (6), and $\mathbf{x}_{rec}^{(n)}$ denotes the final predicted result at each iteration, analogous to $\mathbf{x}^{n}$ in Eq. (7). The regularization subproblem in Eq. (6) is solved with a convolutional recurrent neural network, while the data consistency subproblem in Eq. (7) is treated as a data consistency network layer. The unrolled architecture of CRNN is shown in Fig. 2. More details of the CRNN layers can be found in [26].
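Schematically, the unrolled alternation of Eqs. (8) and (9) is a loop over residual de-aliasing and data consistency, as sketched below; `crnn_block` stands in for the recurrent module and `data_consistency` is the DC step sketched above, so this mirrors the structure rather than the exact CRNN layers.

```python
def unrolled_reconstruction(crnn_block, x0, y, P, n_iters=5):
    """Unrolled loop of Eqs. (8)-(9); x0 is the zero-filling initialization."""
    x_rec = x0
    for _ in range(n_iters):
        x_rnn = x_rec + crnn_block(x_rec)       # Eq. (8): residual de-aliasing
        x_rec = data_consistency(x_rnn, y, P)   # Eq. (9): enforce data fidelity
    return x_rec
```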

Figure 2: The unrolled architecture of CRNN with N iterations.

2) Un-rolled dynamic MR imaging: Another widely-used strategy is un-rolled dynamic MR imaging, which constructs CNNs according to the iterations of traditional optimization algorithms; different optimization algorithms lead to different network architectures. SLR-Net, which formulates sparse and low-rank priors as regularization terms in an optimization algorithm [29], is a typical un-rolled method. In SLR-Net, by introducing an auxiliary variable $\mathbf{M}$, Eq. (2) can be decoupled into a fidelity term, a sparse regularization term, and a low-rank regularization term:

\arg\min_{\mathbf{x},\mathbf{M}}\frac{1}{2}\|\mathbf{A}\mathbf{x}-\mathbf{y}\|_{2}^{2}+\lambda_{1}\|D\mathbf{x}\|_{1}+\lambda_{2}\|\mathbf{M}\|_{*}   (10)

where $D$ is a sparse transform in a certain sparse domain. $\mathbf{M}=R\mathbf{x}$ is a matrix of size $(N_{h}N_{w},T)$, in which each column corresponds to one frame of the dynamic MR image sequence, $R$ is a reshaping operator, and $\|\mathbf{M}\|_{*}$ is the nuclear norm. Previous works have proven that nuclear norm minimization is effective for low-rank matrix recovery [37]. More details of the iterative process in SLR-Net can be found in [29].
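The nuclear-norm term in Eq. (10) induces a singular value soft-thresholding (SVT) step on the Casorati matrix $\mathbf{M}=R\mathbf{x}$; a hedged sketch is given below, with a fixed threshold `tau` standing in for the parameter that SLR-Net learns.

```python
import torch

def svt(x_seq, tau):
    """Singular value thresholding of the Casorati matrix of a complex
    sequence x_seq with shape (T, Nh, Nw)."""
    T, Nh, Nw = x_seq.shape
    M = x_seq.reshape(T, Nh * Nw).transpose(0, 1)        # Casorati matrix (Nh*Nw, T)
    U, S, Vh = torch.linalg.svd(M, full_matrices=False)
    S_thr = torch.clamp(S - tau, min=0)                  # soft-threshold singular values
    M_lr = U @ torch.diag(S_thr).to(M.dtype) @ Vh        # low-rank approximation
    return M_lr.transpose(0, 1).reshape(T, Nh, Nw)
```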

2.4 The Proposed Co-training Loss

We have designed a co-training loss to promote accurate dynamic MR image reconstruction in a self-supervised manner. Its core idea is to enforce consistency not only between the reconstruction results and the original undersampled k-space data, but also between the two network predictions. Compared with existing single-network methods, the consistency between the two network predictions is an additional regularization that guides the dual networks to learn more reliable information. Specifically, the co-training loss in SelfCoLearn, consisting of an undersampled consistency loss term and a contrastive consistency loss term, is calculated to optimize the whole framework.

Let $f_{SelfCoLearn}(\mathbf{y}_{\Omega}^{t})$ denote SelfCoLearn, where $\mathbf{y}_{\Omega}^{t}$ is the original undersampled k-space data. During training, two training sequences $\mathbf{y}_{\Theta}^{t}$ and $\mathbf{y}_{\Lambda}^{t}$ are generated from $\mathbf{y}_{\Omega}^{t}$ following the data augmentation principles in Section 2.2:

\mathbf{y}_{\Theta}^{t}=\mathbf{P}_{\Theta}^{t}\mathbf{y}_{\Omega}^{t},\quad\mathbf{y}_{\Lambda}^{t}=\mathbf{P}_{\Lambda}^{t}\mathbf{y}_{\Omega}^{t},   (11)

where $\mathbf{P}_{\Theta}^{t}$ and $\mathbf{P}_{\Lambda}^{t}$ are the reundersampling masks for $\mathbf{y}_{\Omega}^{t}$. The undersampled consistency loss concerns the actually sampled k-space points in $\mathbf{y}_{\Omega}^{t}$: it ensures that the corresponding sampled points in each network prediction are consistent with the actually sampled k-space points in $\mathbf{y}_{\Omega}^{t}$. The corresponding sampled points $\mathbf{y}_{\Theta\rightarrow\Omega}^{t}$ and $\mathbf{y}_{\Lambda\rightarrow\Omega}^{t}$ of the two network predictions can be written as:

\mathbf{y}_{\Theta\rightarrow\Omega}^{t}=\mathbf{P}^{t}f\left(\mathbf{y}_{\Theta}^{t}\right),\quad\mathbf{y}_{\Lambda\rightarrow\Omega}^{t}=\mathbf{P}^{t}f\left(\mathbf{y}_{\Lambda}^{t}\right),   (12)

where the k-space data $f(\mathbf{y}_{\Theta}^{t})$ and $f(\mathbf{y}_{\Lambda}^{t})$ are transformed from the predicted image sequences of the two networks, respectively, and $\mathbf{P}^{t}$ is the undersampling mask applied to generate the original undersampled k-space data $\mathbf{y}_{\Omega}^{t}$ from the fully sampled data. The undersampled consistency loss term is then the mean-square-error between the actually sampled k-space points in $\mathbf{y}_{\Omega}^{t}$ and those in each network prediction:

\mathcal{L}_{UC}=\left\|\mathbf{y}_{\Theta\rightarrow\Omega}^{t}-\mathbf{y}_{\Omega}^{t}\right\|_{2}^{2}+\left\|\mathbf{y}_{\Lambda\rightarrow\Omega}^{t}-\mathbf{y}_{\Omega}^{t}\right\|_{2}^{2}.   (13)

In the ideal case, when different reundersampled k-space data from the same data are fed into the two networks, the network predictions should approximate the fully sampled reference data as the networks are optimized. However, when fully sampled reference data are absent, the two networks may produce different predictions with the undersampled consistency loss alone, resulting in different reconstruction performances. Therefore, a contrastive consistency loss is defined to compute the mean-square-error between the two network predictions given different reundersampled inputs from the same data. Specifically, the contrastive consistency loss term concerns the points in the network predictions corresponding to the unsampled k-space points in $\mathbf{y}_{\Omega}^{t}$. These points $\bar{\mathbf{y}}_{\Theta\rightarrow\Omega}^{t}$ and $\bar{\mathbf{y}}_{\Lambda\rightarrow\Omega}^{t}$ in the two network predictions $f(\mathbf{y}_{\Theta}^{t})$ and $f(\mathbf{y}_{\Lambda}^{t})$ can be written as:

\bar{\mathbf{y}}_{\Theta\rightarrow\Omega}^{t}=\left(\mathbf{I}-\mathbf{P}^{t}\right)f\left(\mathbf{y}_{\Theta}^{t}\right),\quad\bar{\mathbf{y}}_{\Lambda\rightarrow\Omega}^{t}=\left(\mathbf{I}-\mathbf{P}^{t}\right)f\left(\mathbf{y}_{\Lambda}^{t}\right).   (14)

The contrastive consistency loss term is then formulated as:

\mathcal{L}_{CC}=\left\|\bar{\mathbf{y}}_{\Theta\rightarrow\Omega}^{t}-\bar{\mathbf{y}}_{\Lambda\rightarrow\Omega}^{t}\right\|_{2}^{2}.   (15)

Combining the two loss terms, our final co-training loss function can be written as:

\mathcal{L}_{co}=\mathcal{L}_{UC}+\gamma\mathcal{L}_{CC},   (16)

where $\gamma$ is a weight parameter balancing the undersampled consistency loss and the contrastive consistency loss. During the testing phase, the undersampled data sequence is fed to collaborative network-1 to obtain the final reconstruction result.
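A minimal sketch of the co-training loss of Eqs. (12)-(16), computed in k-space on the two predicted image sequences, is shown below; the tensor shapes and the mean-square-error reduction are assumptions of this sketch.

```python
import torch

def co_training_loss(x_pred1, x_pred2, y_omega, P, gamma=0.01):
    """x_pred1/x_pred2: complex predictions of the two networks;
    y_omega: acquired k-space; P: original undersampling mask."""
    k1 = torch.fft.fft2(x_pred1, norm="ortho")   # f(y_Theta) in k-space
    k2 = torch.fft.fft2(x_pred2, norm="ortho")   # f(y_Lambda) in k-space
    mse = lambda a, b: (a - b).abs().pow(2).mean()
    loss_uc = mse(P * k1, y_omega) + mse(P * k2, y_omega)   # Eq. (13)
    loss_cc = mse((1 - P) * k1, (1 - P) * k2)               # Eq. (15)
    return loss_uc + gamma * loss_cc                        # Eq. (16)
```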

3 Experimental Results

Extensive experiments have been performed to evaluate the effectiveness of the proposed method. The performance of SelfCoLearn is compared with that of four state-of-the-art fully-supervised and self-supervised learning methods at different acceleration factors. Besides, SelfCoLearn has been evaluated with different backbone networks, including both data-driven networks and iterative un-rolled networks for dynamic MR imaging. Then, results of ablation studies are reported to investigate the impacts of the undersampled consistency loss term and the contrastive consistency loss term. Finally, reconstruction results with the co-training loss calculated in different domains are reported to further validate the effectiveness of the proposed SelfCoLearn.

3.1 Experimental Setup

1) Dataset: A T1-weighted FLASH sequence was used to collect fully sampled cardiac data from 101 volunteers on a 3T scanner. All in vivo experiments were approved by the Institutional Review Board (IRB) of Shenzhen Institutes of Advanced Technology, and written informed consent was obtained from all volunteers. Each scan acquires a single slice with 25 temporal frames. The following parameters were used for the FLASH sequence: FOV 330×330 mm, acquisition matrix 192×192, slice thickness 6 mm, TR/TE = 50 ms/3 ms, and 24 receiving coils. The raw multi-coil data of each frame were combined by the adaptive coil combination method [38] to produce a single-channel complex-valued image. The complex-valued images were then transformed to k-space data to simulate fully sampled single-coil acquisitions. Since network training requires a large amount of data, we enlarged the dataset by data augmentation: the original images were translated along the x, y, and t directions, extracting sub-sequences of size 128×128×14 with strides of 12, 12, and 3 along the three directions, respectively. Finally, our dataset consists of 6214 complex-valued cardiac MR data sequences of size 128×128×14; 5950 sequences were randomly selected for training, 50 for validation, and the remaining sequences for testing.
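A sketch of this sliding-window augmentation is shown below, under the assumption that the translation amounts to extracting 128×128×14 sub-sequences with strides of (12, 12, 3); the offsets and ordering are illustrative.

```python
import numpy as np

def extract_subsequences(seq, patch=(14, 128, 128), stride=(3, 12, 12)):
    """seq: complex array of shape (T, Nh, Nw); yields shifted sub-sequences."""
    T, Nh, Nw = seq.shape
    pt, ph, pw = patch
    st, sh, sw = stride
    for t0 in range(0, T - pt + 1, st):
        for h0 in range(0, Nh - ph + 1, sh):
            for w0 in range(0, Nw - pw + 1, sw):
                yield seq[t0:t0 + pt, h0:h0 + ph, w0:w0 + pw]
```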

2) Reundersampling K-space Data Augmentation: In the proposed method, the fully sampled data are only used to generate the original undersampled k-space data $\mathbf{y}_{\Omega}^{t}$ with a 2D random retrospective undersampling mask $\mathbf{P}^{t}$. Following the principles of training data augmentation in Section 2.2, $\mathbf{y}_{\Omega}^{t}$ is augmented into two training sequences $\mathbf{y}_{\Theta}^{t}$ and $\mathbf{y}_{\Lambda}^{t}$ with two 2D random reundersampling masks $\mathbf{P}_{\Theta}^{t}$ and $\mathbf{P}_{\Lambda}^{t}$. $\mathbf{P}_{\Theta}^{t}$, with 2-fold acceleration, is used for collaborative network-1, and $\mathbf{P}_{\Lambda}^{t}$, which combines the complementary set of $\mathbf{P}_{\Theta}^{t}$ with some low-frequency data points of $\mathbf{P}^{t}$, is used for collaborative network-2.

3) Evaluation Metrics: To evaluate the reconstruction performance, the mean-square-error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM) [39] are calculated as follows:

\mathrm{MSE}=\|\mathrm{Ref}-\mathrm{Rec}\|_{2}^{2}   (17)
\mathrm{PSNR}=20\log_{10}\frac{\mathrm{MAX}_{Ref}}{\sqrt{\mathrm{MSE}}}   (18)
\mathrm{SSIM}=\frac{\left(2\mu_{Ref}\mu_{Rec}+c_{1}\right)\left(2\sigma_{Ref,Rec}+c_{2}\right)}{\left(\mu_{Ref}^{2}+\mu_{Rec}^{2}+c_{1}\right)\left(\sigma_{Ref}^{2}+\sigma_{Rec}^{2}+c_{2}\right)}   (19)

where $\mathrm{Rec}$ is the reconstructed image sequence and $\mathrm{Ref}$ represents the reference image sequence. $\mathrm{MAX}_{Ref}$ is the maximum possible value in the image, $\mu_{Ref}$ and $\mu_{Rec}$ are the averaged intensity values of the corresponding images, $\sigma_{Ref}$ and $\sigma_{Rec}$ are the variances, $\sigma_{Ref,Rec}$ is the covariance, and $c_{1}$ and $c_{2}$ are adjustable constants. The SSIM index is a multiplicative combination of the luminance term, the contrast term, and the structural term (details shown in [39]).
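For reference, here is a sketch of the three metrics on magnitude images; the mean (rather than summed) squared error is assumed for MSE, and SSIM delegates to scikit-image rather than re-implementing Eq. (19).

```python
import numpy as np
from skimage.metrics import structural_similarity

def mse(ref, rec):
    return np.mean(np.abs(ref - rec) ** 2)   # mean form of Eq. (17) assumed

def psnr(ref, rec):
    return 20 * np.log10(ref.max() / np.sqrt(mse(ref, rec)))   # Eq. (18)

def ssim(ref, rec):
    return structural_similarity(ref, rec, data_range=ref.max() - ref.min())
```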

4) Model Configuration and Implementation Details: The proposed framework can be flexibly integrated with both data-driven and iterative un-rolled networks. Most of our experiments adopt CRNN as the backbone network. In detail, the network is composed of a bidirectional CRNN layer, three CRNN layers, a 2D CNN layer, a residual connection that sums the output with the input, and a DC layer. The nonlinear activation function is the rectified linear unit (ReLU). For the bidirectional CRNN and CRNN layers, the number of convolution filters is set to $n_{f}=64$ with a kernel size of $k=3$. The 2D CNN contains one convolutional layer with $k=3$ and $n_{f}=2$. We use a stride of 1, and the padding is set to half the filter size (rounded down). The DC layer follows the 2D CNN layer and forces the sampled data points in the predicted k-space to be consistent with those in the inputs.

For model training, the number of iteration steps is set to $N=5$ and the batch size to 1. All training and test data are normalized to [0, 1]. The SelfCoLearn framework with CRNN and k-t NEXT is implemented in PyTorch 1.8.1, and that with SLR-Net is implemented in TensorFlow 2.2.0. The experiments are performed on a GPU server with an Nvidia Titan Xp Graphics Processing Unit (GPU, 12 GB memory). The model is trained by the Adam optimizer [40] with parameters $\beta_{1}=0.5$ and $\beta_{2}=0.999$. The weight parameter $\gamma$ in the co-training loss is set to 0.01, and the learning rate to $10^{-4}$. Training SelfCoLearn with CRNN for 40 epochs takes 42 hours, and reconstructing each cardiac MR data sequence takes roughly 0.5 seconds.

3.2 Comparisons to State-of-the-Art Unsupervised Methods

To demonstrate the superiority of the proposed SelfCoLearn, we compare it with two self-supervised methods, SS-DCCNN and SS-CRNN, at different acceleration factors. It is worth noting that the state-of-the-art self-supervised method SSDU [34] was developed for static MR imaging. The work in [32] adopted a similar self-supervised training scheme to SSDU for dynamic MR imaging and evaluated several backbone architectures, including DCCNN and CRNN, whereas SSDU adopted ResNet as the backbone network. We therefore choose the two self-supervised learning methods SS-DCCNN and SS-CRNN [32] for comparison. In this experiment, the proposed SelfCoLearn uses CRNN as the backbone network. For a fair comparison, all methods are carefully tuned to achieve their best performance on the current dataset.

Figure 3: Reconstruction results of different methods (SS-DCCNN, SS-CRNN, and the proposed SelfCoLearn) at 8-fold acceleration, 12-fold acceleration and 14-fold acceleration. The first three rows show diastolic reconstruction results at the tenth frame. The first row shows, from left to right, fully sampled image and the reconstruction results of respective methods (display range [0, 1]). The second row shows the enlarged views of the heart regions. The third row shows the error maps of corresponding methods (display range [0, 0.2]). The following three rows show systolic reconstruction at the fifth frame. The last two rows show y-t views (extraction of the 40th slice along the y and t dimensions) and the error maps.

Fig. 3 plots the reconstruction results of the different methods at 8-fold, 12-fold and 14-fold acceleration, respectively. The first three rows show the diastolic reconstruction at the tenth frame of the image sequence, and the following three rows show the systolic reconstruction at the fifth frame. The first row shows, from left to right, the fully sampled reference image and the reconstruction results of the respective methods (display range [0, 1]). The second row shows the enlarged views of the heart regions. The third row shows the corresponding error maps (display range [0, 0.2]). The y-t views at $x=40$ are shown in the seventh row, and their error maps in the last row. From the visualization results, the proposed SelfCoLearn generates better reconstructions than the two self-supervised methods, SS-DCCNN and SS-CRNN, at all acceleration factors. The images reconstructed by SelfCoLearn show finer structural details and more precise heart borders with fewer artifacts, while SS-DCCNN and SS-CRNN exhibit artifacts within the heart border and chambers. The error maps of SelfCoLearn also indicate smaller reconstruction errors, especially at the borders of the heart chambers.

The quantitative results of these self-supervised methods are listed in Table 1. Similar conclusions can be drawn that the proposed SelfCoLearn achieves better quantitative performance than the two existing self-supervised learning methods. Therefore, our collaborative learning strategy can effectively capture essential and inherent representations directly from undersampled k-space data.

Table 1: Quantitative reconstruction results of different methods (SS-DCCNN, SS-CRNN, and the proposed SelfCoLearn) at 8-fold, 12-fold and 14-fold acceleration factors (mean±std)
AF       Methods      Training pattern   PSNR(dB)     SSIM            MSE(*e-4)
8-fold   SS-DCCNN     Self-supervised    22.56±2.71   0.7263±0.0663   67.87±49.27
         SS-CRNN      Self-supervised    30.81±1.77   0.8994±0.0288   9.02±3.75
         SelfCoLearn  Self-supervised    37.27±2.40   0.9622±0.0201   2.17±1.22
12-fold  SS-DCCNN     Self-supervised    22.17±2.76   0.7014±0.0658   74.89±54.96
         SS-CRNN      Self-supervised    30.14±1.78   0.8952±0.0298   10.54±4.40
         SelfCoLearn  Self-supervised    35.19±2.24   0.9480±0.0246   3.44±1.78
14-fold  SS-DCCNN     Self-supervised    20.70±2.78   0.6667±0.0715   104.21±71.77
         SS-CRNN      Self-supervised    29.82±1.77   0.8911±0.0301   11.32±4.68
         SelfCoLearn  Self-supervised    34.38±2.23   0.9399±0.0273   4.14±2.11

Fig. 4 gives box plots displaying the median and interquartile range (25th-75th percentile) of the reconstruction results of the different self-supervised methods across all test cardiac cine data at 8-fold, 12-fold and 14-fold acceleration, respectively. For all dynamic cine sequences, SelfCoLearn outperforms the two self-supervised learning methods (SS-DCCNN and SS-CRNN) at all three acceleration factors.

Figure 4: Box plots of different methods (SS-DCCNN, SS-CRNN and the proposed SelfCoLearn) at 8-fold, 12-fold and 14-fold accelerations, which show the median and interquartile range of PSNR and SSIM on the cardiac cine test dataset.

3.3 Comparisons to State-of-the-Art Supervised Methods

We further compare our SelfCoLearn with supervised methods, including supervised U-Net and supervised CRNN [26], at different acceleration factors. Fig. 5 plots the reconstruction results of the different methods at 8-fold, 12-fold and 14-fold acceleration, respectively. From the visualization results, the proposed SelfCoLearn restores more precise anatomical details of the heart regions than supervised U-Net at all acceleration factors, while supervised U-Net fails to recover some details in the heart chambers. The error maps of SelfCoLearn also indicate smaller reconstruction errors than those of supervised U-Net.

Figure 5: Reconstruction results of different methods (Supervised U-Net, Supervised CRNN and the proposed SelfCoLearn) at 8-fold acceleration, 12-fold acceleration and 14-fold acceleration. The first three rows show diastolic reconstruction results at the tenth frame. The first row shows, from left to right, fully sampled image and the reconstruction results of respective methods (display range [0, 1]). The second row shows the enlarged views of the heart regions. The third row shows the error maps of corresponding methods (display range [0, 0.2]). The following three rows show systolic reconstruction at the fifth frame. The last two rows show y-t views (extraction of the 40th slice along the y and t dimensions) and the error maps.

In addition, SelfCoLearn generates reconstruction results comparable to those of supervised CRNN at low acceleration factors. From the enlarged heart-region plots, the images reconstructed by SelfCoLearn are as clear as those generated by supervised CRNN. At higher acceleration factors, such as 14-fold, the reconstructed images of SelfCoLearn become slightly blurred. Nevertheless, most of the structural details in the heart regions are still successfully restored by SelfCoLearn. The quantitative results at all acceleration factors (Table 2) also show the promising performance of SelfCoLearn. Therefore, it can be concluded that the proposed SelfCoLearn achieves reconstruction performance comparable to fully-supervised methods via self-supervised collaborative learning for accelerating dynamic MR imaging.

Table 2: Quantitative reconstruction results of different methods (Supervised U-Net, Supervised CRNN and the proposed SelfCoLearn) at 8-fold, 12-fold and 14-fold acceleration factors (mean±std)
AF       Methods      Training pattern   PSNR(dB)     SSIM            MSE(*e-4)
8-fold   U-Net        Supervised         32.63±1.97   0.9186±0.0301   6.06±2.88
         SelfCoLearn  Self-supervised    37.27±2.40   0.9622±0.0201   2.17±1.22
         CRNN         Supervised         38.09±2.52   0.9635±0.0204   1.83±1.07
12-fold  U-Net        Supervised         31.96±1.88   0.9111±0.0317   6.99±3.03
         SelfCoLearn  Self-supervised    35.19±2.24   0.9480±0.0246   3.44±1.78
         CRNN         Supervised         36.32±2.29   0.9513±0.0244   2.67±1.42
14-fold  U-Net        Supervised         31.51±1.99   0.9045±0.0330   7.86±3.83
         SelfCoLearn  Self-supervised    34.38±2.23   0.9399±0.0273   4.14±2.11
         CRNN         Supervised         35.74±2.28   0.9461±0.0261   3.05±1.59

4 Discussion

4.1 Network Backbone Architectures

In this section, we explore the reconstruction results of the proposed self-supervised learning strategy with different backbone networks for dynamic MR imaging. Experiments are conducted using SLR-Net [29], k-t NEXT [24], and CRNN [26] at 8-fold acceleration. The reconstruction results with the different backbone networks can be found in Fig. 6 and Table 3. Compared with SS-CRNN [32], the proposed self-supervised learning strategy achieves better results regardless of the backbone network used. Among the three backbone networks, SLR-Net generates worse results than k-t NEXT and CRNN. The reason for this phenomenon may be that SLR-Net needs to learn a singular value threshold, and the absence of fully sampled reference data causes the learned threshold to be suboptimal. Nevertheless, the proposed self-supervised learning strategy with SLR-Net still obtains acceptable reconstruction results. The qualitative results in Fig. 6 clearly show that the proposed SelfCoLearn restores structural details better and achieves clearer reconstructed MR images than SS-CRNN (especially in the heart regions around the red and yellow arrows). The quantitative results also indicate more accurate reconstructions by the proposed SelfCoLearn. These results confirm that our self-supervised learning framework is flexible regarding the adopted backbone network and achieves promising reconstruction results with both data-driven and iterative un-rolled networks for dynamic MR imaging.

Figure 6: Reconstruction results of SS-CRNN and the proposed self-supervised learning strategy with SLR-Net, k-t NEXT, and CRNN backbone networks at 8-fold acceleration. The first row shows, from left to right, fully sampled image, and the reconstruction results of SS-CRNN and the proposed self-supervised learning strategy with SLR-Net, k-t NEXT, and CRNN (display range [0, 1], 10th frame). The second row shows the enlarged views of the heart regions. The third row shows the error maps of two methods (display range [0, 0.2]). The last two rows show y-t views (extraction of the 40th slice along the y and t dimensions) and the error maps.
Table 3: Quantitative results of SS-CRNN and SelfCoLearn with different backbone networks at 8-fold acceleration (mean±std)
Methods                     Training pattern   PSNR(dB)     SSIM            MSE(*e-4)
SS-CRNN                     Self-supervised    30.81±1.77   0.8994±0.0288   9.02±3.75
SelfCoLearn with SLR-Net    Self-supervised    33.58±2.24   0.9495±0.0220   5.57±10.48
SelfCoLearn with k-t NEXT   Self-supervised    36.95±2.39   0.9613±0.0203   2.34±1.32
SelfCoLearn with CRNN       Self-supervised    37.27±2.40   0.9622±0.0201   2.17±1.22

4.2 Co-training Loss Function

In this section, we investigate the utility of the designed co-training loss function. The backbone network in these experiments is CRNN. Different training strategies at 8-fold acceleration are compared. Strategy B-I: a single reconstruction network is trained in a self-supervised manner, using only the loss between the network output $f(\mathbf{y}_{\Theta}^{t})$ and $\mathbf{y}_{\Lambda}^{t}$; this training strategy is similar to that of SSDU. Strategy B-II: a strategy similar to B-I, but the loss is calculated between the network output $f(\mathbf{y}_{\Theta}^{t})$ and the original undersampled k-space data $\mathbf{y}_{\Omega}^{t}$. SelfCoLearn: two networks are trained collaboratively with $\mathcal{L}_{UC}$ and $\mathcal{L}_{CC}$, using the same backbone network as in Strategy B-I. Reconstructed images for the different training strategies are plotted in Fig. 7, and quantitative results are listed in Table 4. From both the qualitative and quantitative results, we observe that SelfCoLearn (training two networks collaboratively with both loss terms) achieves the best performance (especially in the heart regions around the red and yellow arrows). In particular, the contrastive consistency loss term brings a large reconstruction performance improvement: PSNR improves from 31.04 dB (Strategy B-II) to 37.27 dB (SelfCoLearn).

Figure 7: Ablation studies utilizing different training strategies at 8-fold acceleration. The first row shows, from left to right, fully sampled image, and the reconstruction results of strategy B-I, strategy B-II and proposed SelfCoLearn (display range [0, 1], 10th frame). The second row shows the enlarged views of the heart regions. The third row shows the error maps of respective methods (display range [0, 0.2]). The last two rows show y-t views (extraction of the 40th slice along the y and t dimensions) and the error maps.
Table 4: Quantitative results of reconstruction models utilizing different training strategies at 8-fold acceleration (mean±std)
Methods        Training pattern   Single-Net   Parallel-Net   $\mathcal{L}_{UC}$   $\mathcal{L}_{CC}$   PSNR(dB)     SSIM            MSE(*e-4)
Strategy B-I   Self-supervised    ✓            ×              ×                    ×                    30.81±1.77   0.8994±0.0288   9.02±3.75
Strategy B-II  Self-supervised    ✓            ×              ✓                    ×                    31.04±1.74   0.9045±0.0274   8.53±3.50
SelfCoLearn    Self-supervised    ×            ✓              ✓                    ✓                    37.27±2.40   0.9622±0.0201   2.17±1.22

4.3 Loss Functions

In this section, we inspect the effects of the loss functions. The backbone network in these experiments is CRNN. Reconstruction results at 8-fold acceleration are given in Fig. 8 and Table 5. Three strategies utilizing different loss function settings are investigated. In Strategy C-I, two networks are trained collaboratively with $\mathcal{L}_{UC}$ and $\mathcal{L}_{CC}$, in which $\mathcal{L}_{UC}$ is calculated in the x-t domain and $\mathcal{L}_{CC}$ in the k-space domain. In Strategy C-II, both $\mathcal{L}_{UC}$ and $\mathcal{L}_{CC}$ are calculated in the x-t domain. In Strategy C-III, both are calculated in the k-space domain. From both the qualitative and quantitative results, we observe that the influence of the different loss function settings on the reconstruction performance is insignificant. All other experiments in this work adopt the loss function setting of Strategy C-III.

Table 5: Quantitative results of methods utilizing different loss function strategies at 8-fold acceleration (mean±std)
Methods         Training pattern   $\mathcal{L}_{UC}$   $\mathcal{L}_{CC}$   PSNR(dB)     SSIM            MSE(*e-4)
Strategy C-I    Self-supervised    x-t domain           k-space              37.00±2.35   0.9617±0.0201   2.30±1.29
Strategy C-II   Self-supervised    x-t domain           x-t domain           37.20±2.37   0.9617±0.0203   2.20±1.22
Strategy C-III  Self-supervised    k-space              k-space              37.27±2.40   0.9622±0.0201   2.17±1.22
Figure 8: Effects of loss functions calculated in different domains on the reconstruction results at 8-fold acceleration. The first row shows, from left to right, fully sampled image, and the reconstruction results of models utilizing Strategy C-I, C-II and C-III (display range [0, 1], 10th frame). The second row shows the enlarged views of the heart regions. The third row shows the error maps of respective strategies (display range [0, 0.2]). The last two rows show y-t views (extraction of the 40th slice along the y and t dimensions) and the error maps.

5 Conclusion

In this work, we propose a self-supervised collaborative training framework to boost image reconstruction performance for accelerating dynamic MR imaging. In particular, two independent reconstruction networks are trained collaboratively with different inputs, which are augmented from the same k-space data. To guide the two networks in capturing the detailed structural features and spatiotemporal correlations in dynamic image sequences, a co-training loss is designed to promote consistency between the two network predictions, providing complementary information for the to-be-reconstructed dynamic MR images. The framework can be flexibly integrated with both data-driven and model-based iterative un-rolled networks. Our method has been comprehensively evaluated on a cardiac cine dataset and compared with four state-of-the-art fully-supervised and self-supervised learning methods at different accelerations. SelfCoLearn achieves better results than the existing self-supervised learning methods and generates reconstruction performance comparable to existing supervised learning methods. These observations indicate that SelfCoLearn possesses strong capabilities in capturing essential and inherent representations directly from undersampled k-space data and thus enables high-quality and fast dynamic MR imaging.

References

  • [1] U. Gamper, P. Boesiger, and S. Kozerke, “Compressed sensing in dynamic mri,” Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine, vol. 59, no. 2, pp. 365–373, 2008.
  • [2] B. Zhao, J. P. Haldar, A. G. Christodoulou, and Z.-P. Liang, “Image reconstruction from highly undersampled (k, t)-space data with joint partial separability and sparsity constraints,” IEEE transactions on medical imaging, vol. 31, no. 9, pp. 1809–1820, 2012.
  • [3] H. Jung, J. C. Ye, and E. Y. Kim, “Improved k–t blast and k–t sense using focuss,” Physics in Medicine & Biology, vol. 52, no. 11, p. 3201, 2007.
  • [4] Y. Wang and L. Ying, “Compressed sensing dynamic cardiac cine mri using learned spatiotemporal dictionary,” IEEE transactions on Biomedical Engineering, vol. 61, no. 4, pp. 1109–1120, 2013.
  • [5] J. Caballero, A. N. Price, D. Rueckert, and J. V. Hajnal, “Dictionary learning and time sparsity for dynamic mr data reconstruction,” IEEE transactions on medical imaging, vol. 33, no. 4, pp. 979–994, 2014.
  • [6] J. Ji and T. Lang, “Dynamic mri with compressed sensing imaging using temporal correlations,” in 2008 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro.   IEEE, 2008, pp. 1613–1616.
  • [7] N. Zhao, D. O'Connor, A. Basarab, D. Ruan, and K. Sheng, “Motion compensated dynamic mri reconstruction with local affine optical flow estimation,” IEEE Transactions on Biomedical Engineering, vol. 66, no. 11, pp. 3050–3059, 2019.
  • [8] H. Jung, K. Sung, K. S. Nayak, E. Y. Kim, and J. C. Ye, “k-t focuss: a general compressed sensing framework for high resolution dynamic mri,” Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine, vol. 61, no. 1, pp. 103–116, 2009.
  • [9] R. Otazo, E. Candes, and D. K. Sodickson, “Low-rank plus sparse matrix decomposition for accelerated dynamic mri with separation of background and dynamic components,” Magnetic resonance in medicine, vol. 73, no. 3, pp. 1125–1136, 2015.
  • [10] F. Huang, J. Akao, S. Vijayakumar, G. R. Duensing, and M. Limkeman, “k-t grappa: A k-space implementation for dynamic mri with high reduction factor,” Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine, vol. 54, no. 5, pp. 1172–1184, 2005.
  • [11] S. Wang, Z. Su, L. Ying, X. Peng, S. Zhu, F. Liang, D. Feng, and D. Liang, “Accelerating magnetic resonance imaging via deep learning,” in 2016 IEEE 13th international symposium on biomedical imaging (ISBI).   IEEE, 2016, pp. 514–517.
  • [12] J. Zhang and B. Ghanem, “Ista-net: Interpretable optimization-inspired deep network for image compressive sensing,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 1828–1837.
  • [13] T. Eo, Y. Jun, T. Kim, J. Jang, H.-J. Lee, and D. Hwang, “Kiki-net: cross-domain convolutional neural networks for reconstructing undersampled magnetic resonance images,” Magnetic resonance in medicine, vol. 80, no. 5, pp. 2188–2201, 2018.
  • [14] J. Sun, H. Li, Z. Xu et al., “Deep admm-net for compressive sensing mri,” Advances in neural information processing systems, vol. 29, 2016.
  • [15] H. K. Aggarwal, M. P. Mani, and M. Jacob, “Modl: Model-based deep learning architecture for inverse problems,” IEEE transactions on medical imaging, vol. 38, no. 2, pp. 394–405, 2018.
  • [16] K. Hammernik, T. Klatzer, E. Kobler, M. P. Recht, D. K. Sodickson, T. Pock, and F. Knoll, “Learning a variational network for reconstruction of accelerated mri data,” Magnetic resonance in medicine, vol. 79, no. 6, pp. 3055–3071, 2018.
  • [17] B. Zhu, J. Z. Liu, S. F. Cauley, B. R. Rosen, and M. S. Rosen, “Image reconstruction by domain-transform manifold learning,” Nature, vol. 555, no. 7697, pp. 487–492, 2018.
  • [18] M. Akçakaya, S. Moeller, S. Weingärtner, and K. Uğurbil, “Scan-specific robust artificial-neural-networks for k-space interpolation (raki) reconstruction: Database-free deep learning for fast imaging,” Magnetic resonance in medicine, vol. 81, no. 1, pp. 439–453, 2019.
  • [19] M. Mardani, E. Gong, J. Y. Cheng, S. S. Vasanawala, G. Zaharchuk, L. Xing, and J. M. Pauly, “Deep generative adversarial neural networks for compressive sensing mri,” IEEE transactions on medical imaging, vol. 38, no. 1, pp. 167–179, 2018.
  • [20] O. Cohen, B. Zhu, and M. S. Rosen, “Mr fingerprinting deep reconstruction network (drone),” Magnetic resonance in medicine, vol. 80, no. 3, pp. 885–894, 2018.
  • [21] J. Yoon, E. Gong, I. Chatnuntawech, B. Bilgic, J. Lee, W. Jung, J. Ko, H. Jung, K. Setsompop, G. Zaharchuk et al., “Quantitative susceptibility mapping using deep neural network: Qsmnet,” Neuroimage, vol. 179, pp. 199–206, 2018.
  • [22] Q. Huang, Y. Xian, D. Yang, H. Qu, J. Yi, P. Wu, and D. N. Metaxas, “Dynamic mri reconstruction with end-to-end motion-guided network,” Medical Image Analysis, vol. 68, p. 101901, 2021.
  • [23] G. Seegoolam, J. Schlemper, C. Qin, A. Price, J. Hajnal, and D. Rueckert, “Exploiting motion for deep learning reconstruction of extremely-undersampled dynamic mri,” in International Conference on Medical Image Computing and Computer-Assisted Intervention.   Springer, 2019, pp. 704–712.
  • [24] C. Qin, J. Schlemper, J. Duan, G. Seegoolam, A. Price, J. Hajnal, and D. Rueckert, “k-t next: dynamic mr image reconstruction exploiting spatio-temporal correlations,” in International Conference on Medical Image Computing and Computer-Assisted Intervention.   Springer, 2019, pp. 505–513.
  • [25] J. Schlemper, J. Caballero, J. V. Hajnal, A. N. Price, and D. Rueckert, “A deep cascade of convolutional neural networks for dynamic mr image reconstruction,” IEEE transactions on Medical Imaging, vol. 37, no. 2, pp. 491–503, 2017.
  • [26] C. Qin, J. Schlemper, J. Caballero, A. N. Price, J. V. Hajnal, and D. Rueckert, “Convolutional recurrent neural networks for dynamic mr image reconstruction,” IEEE transactions on medical imaging, vol. 38, no. 1, pp. 280–290, 2018.
  • [27] C. Qin, J. Duan, K. Hammernik, J. Schlemper, T. Küstner, R. Botnar, C. Prieto, A. N. Price, J. V. Hajnal, and D. Rueckert, “Complementary time-frequency domain networks for dynamic parallel mr image reconstruction,” Magnetic Resonance in Medicine, vol. 86, no. 6, pp. 3274–3291, 2021.
  • [28] W. Huang, Z. Ke, Z.-X. Cui, J. Cheng, Z. Qiu, S. Jia, L. Ying, Y. Zhu, and D. Liang, “Deep low-rank plus sparse network for dynamic mr imaging,” Medical Image Analysis, vol. 73, p. 102190, 2021.
  • [29] Z. Ke, W. Huang, Z.-X. Cui, J. Cheng, S. Jia, H. Wang, X. Liu, H. Zheng, L. Ying, Y. Zhu et al., “Learned low-rank priors in dynamic mr imaging,” IEEE Transactions on Medical Imaging, vol. 40, no. 12, pp. 3698–3710, 2021.
  • [30] Z. Ke, J. Cheng, L. Ying, H. Zheng, Y. Zhu, and D. Liang, “An unsupervised deep learning method for multi-coil cine mri,” Physics in Medicine & Biology, vol. 65, no. 23, p. 235041, 2020.
  • [31] J. Yoo, K. H. Jin, H. Gupta, J. Yerly, M. Stuber, and M. Unser, “Time-dependent deep image prior for dynamic mri,” IEEE Transactions on Medical Imaging, vol. 40, no. 12, pp. 3337–3348, 2021.
  • [32] M. Acar, T. Çukur, and İ. Öksüz, “Self-supervised dynamic mri reconstruction,” in International Workshop on Machine Learning for Medical Image Reconstruction.   Springer, 2021, pp. 35–44.
  • [33] D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Deep image prior,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 9446–9454.
  • [34] B. Yaman, S. A. H. Hosseini, S. Moeller, J. Ellermann, K. Uğurbil, and M. Akçakaya, “Self-supervised learning of physics-guided reconstruction neural networks without fully sampled reference data,” Magnetic resonance in medicine, vol. 84, no. 6, pp. 3172–3191, 2020.
  • [35] M. Akçakaya, B. Yaman, H. Chung, and J. C. Ye, “Unsupervised deep learning methods for biological image reconstruction and enhancement: An overview from a signal processing perspective,” IEEE Signal Processing Magazine, vol. 39, no. 2, pp. 28–44, 2022.
  • [36] D. Liang, J. Cheng, Z. Ke, and L. Ying, “Deep magnetic resonance image reconstruction: Inverse problems meet neural networks,” IEEE Signal Processing Magazine, vol. 37, no. 1, pp. 141–151, 2020.
  • [37] E. J. Candès and B. Recht, “Exact matrix completion via convex optimization,” Foundations of Computational mathematics, vol. 9, no. 6, pp. 717–772, 2009.
  • [38] K. Lee and Y. Bresler, “Admira: Atomic decomposition for minimum rank approximation,” IEEE Transactions on Information Theory, vol. 56, no. 9, pp. 4402–4416, 2010.
  • [39] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE transactions on image processing, vol. 13, no. 4, pp. 600–612, 2004.
  • [40] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in ICLR (Poster), 2015.