SelfCoLearn: Self-supervised collaborative learning for accelerating dynamic MR imaging
Abstract
Lately, deep learning has been extensively investigated for accelerating dynamic magnetic resonance (MR) imaging, with encouraging progress. However, without fully sampled reference data for training, current approaches may have limited ability to recover fine details or structures. To address this challenge, this paper proposes a self-supervised collaborative learning framework (SelfCoLearn) for accurate dynamic MR image reconstruction from undersampled k-space data. The proposed framework has three key components: dual-network collaborative learning, reundersampling data augmentation, and a specially designed co-training loss. The framework can be flexibly integrated with both data-driven networks and model-based iterative un-rolled networks. Our method has been evaluated on an in vivo dataset and compared with four state-of-the-art methods. The results show that our method captures essential and inherent representations directly from the undersampled k-space data and thus enables high-quality and fast dynamic MR imaging.
Keywords: Dynamic MR imaging; Self-supervised learning; Collaborative learning; Reundersampling data augmentation; Co-training loss
1 Introduction
Deep learning-based dynamic magnetic resonance (MR) imaging has attracted substantial attention in recent years. Such methods learn prior knowledge from large datasets through network training and then use the trained network to reconstruct dynamic images from undersampled k-space data. Compared with classical compressed sensing methods [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], deep learning-based methods have shown encouraging performance.
Depending on whether they rely on fully sampled data, existing methods can be roughly divided into two categories: fully-supervised methods and unsupervised methods. Fully-supervised methods train the network on pairs of undersampled/corrupted data and fully sampled/reference data [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]. Within this category, various network structures and prior knowledge have been explored [22, 23, 24, 25, 26, 27, 28, 29]. For example, Schlemper et al. [25] proposed a cascade network architecture composed of an intermediate de-aliasing convolutional neural network (CNN) module and a data consistency layer. Chen et al. [26] applied a bidirectional convolutional recurrent neural network (CRNN) with interleaved data consistency to accelerate MR imaging. Chen et al. [27] designed a parallel framework comprising a time-frequency domain CRNN and an image domain CRNN to simultaneously exploit spatiotemporal correlations. Huang et al. [28] unrolled the low-rank plus sparse method [9] into a deep neural network to learn the low-rank and sparse regularization. Ke et al. [29] exploited sparse and low-rank priors (SLR-Net). These methods have made great progress in accelerating dynamic MR imaging. However, a major limitation is that, in many practical imaging scenarios, obtaining high-quality fully sampled dynamic MRI data is infeasible due to factors such as physiological motion of patients and restrictions on imaging speed. The requirement of fully sampled reference data for network training has therefore hindered the wide application of these methods.
To address this issue, researchers have developed unsupervised learning methods that train models without fully sampled reference data [30, 31, 32]. For example, Ke et al. [30] generated a pseudo reference label from undersampled data by merging neighbouring frames of undersampled k-space data. Jin et al. [31] extended the deep image prior framework [33] to dynamic non-Cartesian MRI. Recently, Yaman et al. [34] proposed SSDU, a self-supervised learning method for static MR imaging. SSDU divides the acquired undersampled data into two parts: one part serves as the network input, and the other is used as the supervisory signal [35]. Subsequently, Acar et al. [32] applied SSDU to reconstruct dynamic MR images. All of these works have made important contributions to unsupervised dynamic MR image reconstruction. Nevertheless, they still leave room for improvement in recovering fine details or structures, because undersampled data provide an incomplete representation compared with fully sampled data.
To boost the performance of accelerated dynamic MR imaging without fully sampled reference data, we propose a self-supervised collaborative learning framework (SelfCoLearn). Assuming that the latent representations of network predictions should be consistent under different reundersampling augmentations of the same data, SelfCoLearn trains two networks collaboratively with reundersampling data augmentation to exploit more prior knowledge than a single network can. Specifically, reundersampling data augmentation is applied to the undersampled k-space data to obtain two reundersampled inputs, one for each network. The two networks are trained collaboratively in an end-to-end manner with a specially designed co-training loss. With this collaborative training strategy, the proposed framework can capture essential and inherent representations from the undersampled k-space data in a self-supervised manner. Additionally, the framework can be flexibly integrated with both data-driven networks and model-based iterative un-rolled networks [36] for dynamic MR imaging. The main contributions are as follows:
- We present a self-supervised collaborative learning framework with reundersampling data augmentation for accelerating dynamic MR imaging. The proposed framework can be flexibly integrated with both data-driven networks and iterative un-rolled networks.
- A co-training loss, which includes an undersampled consistency loss term and a contrastive consistency loss term, is designed to guide the end-to-end framework to capture essential and inherent representations from the undersampled k-space data.
- Extensive experiments are conducted to evaluate the effectiveness of the proposed SelfCoLearn with both data-driven and iterative un-rolled networks, and it obtains more promising results than four state-of-the-art methods.
The remainder of this paper is organized as follows. Section 2 formulates the dynamic MR imaging problem and describes the proposed SelfCoLearn with different backbone networks. Section 3 presents the experiments and results that demonstrate the effectiveness of SelfCoLearn, and Section 4 provides a discussion. Section 5 concludes the work.
2 Methodology
2.1 Dynamic MR Imaging Formulation
The goal of dynamic MR imaging is to estimate a dynamic MR image sequence from undersampled measurements in k-space. Let $x \in \mathbb{C}^{N_x N_y N_t}$ denote the image sequence written as a vector, where $N_x$ and $N_y$ are the height and width of each frame, respectively, and $N_t$ is the number of frames in each sequence, and let $y$ denote the undersampled k-space measurements. The imaging model can then be described as:
$$y = Ex + \varepsilon, \tag{1}$$
where $\varepsilon$ is noise and $E = MF$ is the undersampled Fourier encoding operator; $F$ applies the 2D Fourier transform to each frame of the dynamic image sequence, and $M$ is the undersampling mask for each frame. In general, the reconstruction is formulated as an unconstrained optimization problem that incorporates prior knowledge:
$$\min_{x}\; \|Ex - y\|_2^2 + \lambda \mathcal{R}(x), \tag{2}$$
where $\mathcal{R}(x)$ is a prior regularization term on $x$ and $\lambda$ is the weight of the regularization. $\|Ex - y\|_2^2$ is the data fidelity term, which keeps the reconstruction consistent with the original undersampled measurements.
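To make the forward model concrete, the following Python/NumPy sketch implements the encoding operator $E = MF$ of Eq. (1), its adjoint (which yields the zero-filled image), and the data fidelity term of Eq. (2). It is a minimal illustration under stated assumptions, not the authors' code: the sequence is stored as an `(N_t, N_x, N_y)` array, $F$ is taken to be the orthonormal frame-wise 2D FFT, the mask is a binary array of the same shape, and all helper names are hypothetical.

```python
import numpy as np


def fft2c(x):
    """Orthonormal 2D FFT applied to every frame of an (Nt, Nx, Ny) array."""
    return np.fft.fft2(x, axes=(-2, -1), norm="ortho")


def ifft2c(k):
    """Inverse (and, because the FFT is orthonormal, adjoint) of fft2c."""
    return np.fft.ifft2(k, axes=(-2, -1), norm="ortho")


def forward_op(x, mask):
    """E x = M F x: frame-wise 2D FFT followed by the binary sampling mask."""
    return mask * fft2c(x)


def adjoint_op(y, mask):
    """E^H y = F^H M y; applied to measured data this is the zero-filled image."""
    return ifft2c(mask * y)


def data_fidelity(x, y, mask):
    """Data fidelity term ||E x - y||_2^2 of Eq. (2)."""
    residual = forward_op(x, mask) - y
    return float(np.sum(np.abs(residual) ** 2))


# Usage: simulate noiseless measurements (epsilon = 0 in Eq. (1)).
rng = np.random.default_rng(0)
nt, nx, ny = 4, 16, 16
x_true = rng.standard_normal((nt, nx, ny)) + 1j * rng.standard_normal((nt, nx, ny))
# Hypothetical mask: random phase-encoding lines along the last axis, per frame.
mask = np.broadcast_to(rng.random((nt, 1, ny)) < 0.3, (nt, nx, ny)).astype(float)
y = forward_op(x_true, mask)
x_zero_filled = adjoint_op(y, mask)
print(data_fidelity(x_true, y, mask))  # 0.0: the ground truth fits noiseless data exactly
```

Using the orthonormal FFT keeps $F^{H}$ equal to the inverse FFT, so the adjoint needs no extra scaling.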
Fully-supervised deep learning methods typically use a CNN $f(\cdot\,;\theta)$ with parameters $\theta$ as the regularization term $\mathcal{R}(x)$, learning the mapping between undersampled/corrupted data and the corresponding fully sampled data:
$$\hat{\theta} = \arg\min_{\theta}\; \frac{1}{N_s}\sum_{i=1}^{N_s} \mathcal{L}\big(f(y_i;\theta),\, x_{\mathrm{ref},i}\big), \tag{3}$$
where $N_s$ is the number of dynamic image sequences in the training set, $x_{\mathrm{ref},i}$ is the fully sampled reference of the $i$-th undersampled sequence $y_i$, and $\mathcal{L}$ is the loss function between the predicted reconstruction and the fully sampled reference, typically the $\ell_1$ or $\ell_2$ norm.
2.2 The Overall Framework
In this work, we propose a simple but effective self-supervised training framework for dynamic MR imaging, whose paradigm is shown in Fig. 1. The framework trains two independent reconstruction networks simultaneously, with different inputs and different weight parameters. The backbone network can be either a data-driven architecture or an iterative un-rolled network, e.g., CRNN [26], k-t NEXT [24], or SLR-Net [29]. Through the consistency between the two networks' predictions, each network provides complementary information about the to-be-reconstructed dynamic MR images to its peer, which acts as an additional regularization compared with existing unsupervised methods [32]. The two networks eventually produce consistent reconstructions during training. Specifically, given a raw undersampled k-space sequence $y_i$, we reundersample it to construct two training sequences $y_{i,j}$, each containing a subset of its data points:
$$y_{i,j} = \Omega_j\, y_i, \quad j \in \{1, 2\}, \tag{4}$$
where $i$ is the sequence index, $j$ indexes the two training sequences, and $\Omega_j$ is the reundersampling mask, applied frame by frame. To make full use of all data points in $y_i$ for representation learning, and to ensure that each network provides complementary information to its peer, the training sequences are generated according to the following data augmentation principles: (1) The union of the data points in the two training sequences must equal $y_i$, i.e., $y_{i,1} \cup y_{i,2} = y_i$. (2) The two training sequences should contain different data points, i.e., $y_{i,1} \neq y_{i,2}$. (3) Each training sequence should include most of the low-frequency data points and part of the high-frequency data points. Following these principles, the two training sequences contain similar data points in the low-frequency region and different points in the high-frequency region. Note that reundersampling is only needed during training; at test time, images are reconstructed directly from the acquired undersampled data.
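The three augmentation principles can be checked mechanically on binary masks. The sketch below is a hypothetical Python/NumPy helper, not part of the original method: it represents $M$, $\Omega_1$, and $\Omega_2$ as boolean arrays over k-space (True marks a retained point), assumes a user-supplied boolean `low_freq` region, and reads "most of the low-frequency points" as at least a fraction `min_low_frac` (default 0.5, an assumption) of the acquired low-frequency points.

```python
import numpy as np


def check_reundersampling(m, omega1, omega2, low_freq, min_low_frac=0.5):
    """Check principles (1)-(3) for a pair of reundersampling masks.

    m        -- original sampling mask M of y (True = acquired point)
    omega1/2 -- reundersampling masks selecting the points of y_1 and y_2
    low_freq -- True inside the (assumed) low-frequency region
    """
    m, o1, o2, low = (np.asarray(a, dtype=bool) for a in (m, omega1, omega2, low_freq))
    acquired_low = m & low
    acquired_high = m & ~low
    n_low = max(int(acquired_low.sum()), 1)
    return {
        # (1) the union of both training sequences recovers all acquired points
        "union_equals_original": bool(np.array_equal(o1 | o2, m)),
        # (2) the two training sequences are not identical
        "inputs_differ": not np.array_equal(o1, o2),
        # (3a) each keeps most of the acquired low-frequency points
        "mostly_low_freq": all((o & acquired_low).sum() >= min_low_frac * n_low for o in (o1, o2)),
        # (3b) each keeps part of the acquired high-frequency points
        "some_high_freq": all(bool((o & acquired_high).any()) for o in (o1, o2)),
    }


# Usage on a 1D toy k-space line of 16 points with a central low-frequency band.
idx = np.arange(16)
low = (idx >= 6) & (idx <= 9)
m = (idx % 2 == 0) | low
omega1 = low | (m & (idx % 4 == 0))
omega2 = low | (m & (idx % 4 == 2))
print(check_reundersampling(m, omega1, omega2, low))  # all four checks are True
```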

2.3 Network Architectures
1) Data-driven dynamic MR imaging: In the data-driven setting, the common practice is to decouple Eq. 2 into a regularization term and a data fidelity term using the variable splitting technique [25, 26]. By introducing an auxiliary variable $z$, Eq. 2 can be re-formulated as the penalty function:
$$\min_{x, z}\; \lambda \mathcal{R}(z) + \|Ex - y\|_2^2 + \mu \|x - z\|_2^2, \tag{5}$$
where $\mu$ is a penalty parameter. Eq. 5 can then be solved iteratively by alternating minimization over $z$ and $x$:
$$z^{(n+1)} = \arg\min_{z}\; \lambda \mathcal{R}(z) + \mu \|x^{(n)} - z\|_2^2, \tag{6}$$
$$x^{(n+1)} = \arg\min_{x}\; \|Ex - y\|_2^2 + \mu \|x - z^{(n+1)}\|_2^2, \tag{7}$$
where $n$ is the iteration index, $x^{(0)} = E^{H} y$ is the zero-filled image obtained from the original undersampled measurements, $z^{(n+1)}$ denotes the intermediate reconstruction sequence, and $x^{(n+1)}$ denotes the reconstruction sequence at each iteration. In Eq. 7, the operation on the intermediate reconstruction sequence is a data consistency step, which replaces the corresponding data points in the reconstructed k-space with the originally sampled k-space data points [25]. The iterative optimization in Eq. 6 and Eq. 7 is unrolled into a neural network.
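Assuming the $x$-subproblem takes the form $\min_x \|MFx - y\|_2^2 + \mu\|x - z\|_2^2$ with a unitary $F$, it separates over k-space locations and has a closed form: at sampled locations the updated k-space value is $(y + \mu\,Fz)/(1+\mu)$, and at unsampled locations it equals $Fz$; as $\mu \to 0$ this reduces to the replacement rule described above. The Python/NumPy sketch below implements that closed form as a minimal illustration, assuming an `(N_t, N_x, N_y)` layout, an orthonormal frame-wise FFT, a binary mask, and $y$ stored as zero-filled k-space of the same shape. The function name is hypothetical.

```python
import numpy as np


def data_consistency(z, y, mask, mu=0.0):
    """Closed-form x-update: argmin_x ||M F x - y||^2 + mu * ||x - z||^2.

    z    -- intermediate reconstruction (Nt, Nx, Ny), e.g. a CNN/CRNN output
    y    -- acquired k-space, zero-filled outside the mask, same shape
    mask -- binary sampling mask M, same shape
    mu   -- penalty weight; mu = 0 is the noiseless limit (pure replacement)
    """
    k_z = np.fft.fft2(z, axes=(-2, -1), norm="ortho")
    # Sampled points: weighted blend of measurement and prediction; others: prediction.
    k = np.where(mask > 0, (y + mu * k_z) / (1.0 + mu), k_z)
    return np.fft.ifft2(k, axes=(-2, -1), norm="ortho")


# Usage: after DC, the sampled k-space points of the output equal the measurements.
rng = np.random.default_rng(1)
shape = (2, 8, 8)
x_true = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mask = np.zeros(shape)
mask[:, :, :3] = 1.0
y = mask * np.fft.fft2(x_true, axes=(-2, -1), norm="ortho")
z = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)  # stand-in network output
x_dc = data_consistency(z, y, mask)
k_dc = np.fft.fft2(x_dc, axes=(-2, -1), norm="ortho")
print(np.allclose(k_dc[mask > 0], y[mask > 0]))  # True
```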
CRNN [26] is a typical data-driven method that integrates data consistency in k-space. A single iteration of CRNN can be written as:
$$x_{\mathrm{rnn}}^{(n)} = x_{\mathrm{rec}}^{(n-1)} + \mathrm{CRNN}\big(x_{\mathrm{rec}}^{(n-1)}\big), \tag{8}$$
$$x_{\mathrm{rec}}^{(n)} = \mathrm{DC}\big(x_{\mathrm{rnn}}^{(n)};\, y, M\big), \tag{9}$$
where $x_{\mathrm{rnn}}^{(n)}$ is the intermediate reconstruction sequence, analogous to $z$ in Eq. 6, and $x_{\mathrm{rec}}^{(n)}$ is the predicted result at each iteration, analogous to $x$ in Eq. 7. The regularization subproblem in Eq. 6 is solved by a convolutional recurrent neural network, and the data consistency subproblem in Eq. 7 is implemented as a data consistency (DC) network layer. The unrolled architecture of CRNN is shown in Fig. 2. More details of the CRNN layers can be found in [26].

2) Un-rolled dynamic MR imaging: Another widely used strategy is un-rolled dynamic MR imaging, which builds a network by unrolling the iterations of a traditional optimization algorithm. Different optimization algorithms lead to different network architectures. SLR-Net, which formulates sparse and low-rank priors as regularization terms in an optimization algorithm [29], is a typical un-rolled method. In SLR-Net, by introducing an auxiliary variable $T$, Eq. 2 can be decoupled into the data fidelity term, a sparse regularization term, and a low-rank regularization term:
$$\min_{x, T}\; \|Ex - y\|_2^2 + \lambda_1 \|\mathcal{D}x\|_1 + \lambda_2 \|T\|_* \quad \text{s.t.}\quad T = \mathcal{P}(x), \tag{10}$$
where $\mathcal{D}$ is a sparsifying transform in a certain sparse domain, $T$ is a matrix of size $N_xN_y \times N_t$ in which each column corresponds to one frame of the dynamic MR image sequence, $\mathcal{P}$ is a reshaping operator, and $\|\cdot\|_*$ is the nuclear norm. Previous works have shown that nuclear norm minimization is effective for low-rank matrix recovery [37]. More details of the iterative process in SLR-Net can be found in [29].
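The low-rank term acts on the matrix obtained by stacking each frame as a column, and the proximal operator of the nuclear norm is singular value soft-thresholding; this is the kind of singular value threshold that Section 4.1 notes SLR-Net must learn. The Python/NumPy sketch below shows the reshaping operator and a fixed-threshold version of that operation. It is a generic illustration rather than SLR-Net's implementation: the `(N_t, N_x, N_y)` layout, the fixed threshold `tau`, and the helper names are assumptions.

```python
import numpy as np


def casorati(x):
    """Reshape an (Nt, Nx, Ny) sequence into an (Nx*Ny, Nt) matrix, one frame per column."""
    return x.reshape(x.shape[0], -1).T


def inverse_casorati(t, shape):
    """Undo casorati() for a sequence of the given (Nt, Nx, Ny) shape."""
    return t.T.reshape(shape)


def singular_value_threshold(t, tau):
    """Proximal operator of tau * ||T||_*: soft-threshold the singular values of T."""
    u, s, vh = np.linalg.svd(t, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (u * s_shrunk) @ vh  # u * s_shrunk scales column k of u by s_shrunk[k]


# Usage: thresholding at the second-largest singular value keeps only the dominant component.
rng = np.random.default_rng(2)
x = rng.standard_normal((5, 4, 4))
t = casorati(x)                          # shape (16, 5)
s = np.linalg.svd(t, compute_uv=False)
t_low = singular_value_threshold(t, tau=s[1])
x_low = inverse_casorati(t_low, x.shape)
print(t.shape, x_low.shape)
```

In a learned un-rolled network, `tau` would become a trainable parameter rather than a fixed value.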
2.4 The Proposed Co-training Loss
We design a co-training loss to promote accurate dynamic MR image reconstruction in a self-supervised manner. The core idea is to enforce consistency not only between the reconstruction results and the original undersampled k-space data, but also between the predictions of the two networks. Compared with existing single-network methods, the consistency between the two network predictions is an additional regularization that guides the dual network toward more accurate reconstructions. Specifically, the co-training loss in SelfCoLearn, which includes an undersampled consistency loss term and a contrastive consistency loss term, is used to optimize the whole framework.
Let $f(\cdot\,;\theta_1)$ and $f(\cdot\,;\theta_2)$ denote the two collaborative networks of SelfCoLearn, and let $y$ be the original undersampled k-space data (the sequence index $i$ is omitted for brevity). During training, two training sequences $y_1$ and $y_2$ are generated from $y$ following the data augmentation principles in Section 2.2:
$$y_1 = \Omega_1 y, \qquad y_2 = \Omega_2 y, \tag{11}$$
where $\Omega_1$ and $\Omega_2$ are the reundersampling masks for $y$. The undersampled consistency loss acts on the actually sampled k-space points in $y$: it ensures that the corresponding points in each network prediction are consistent with the acquired points. The corresponding sampled points $\hat{y}_1$ and $\hat{y}_2$ in the two network predictions can be written as:
$$\hat{y}_1 = M \hat{k}_1, \qquad \hat{y}_2 = M \hat{k}_2, \tag{12}$$
where $\hat{k}_1 = F f(y_1;\theta_1)$ and $\hat{k}_2 = F f(y_2;\theta_2)$ are the k-space data transformed from the image sequences predicted by the two networks, and $M$ is the undersampling mask that was applied to the fully sampled data to generate $y$. The undersampled consistency loss term then computes the mean-square-error between the actually sampled k-space points in $y$ and those in each network prediction:
$$\mathcal{L}_{\mathrm{uc}} = \mathrm{MSE}(\hat{y}_1, y) + \mathrm{MSE}(\hat{y}_2, y). \tag{13}$$
Ideally, when different reundersampled versions of the same k-space data are fed into the two networks, both predictions should approach the fully sampled reference data as the networks are optimized. However, in the absence of fully sampled reference data, two networks trained only with the undersampled consistency loss may produce different predictions and thus different reconstruction performance. Therefore, a contrastive consistency loss is defined to compute the mean-square-error between the two network predictions obtained from different reundersampled inputs of the same data. Specifically, the contrastive consistency loss term acts on the points of the network predictions that correspond to the unsampled k-space points in $y$. These points $\tilde{y}_1$ and $\tilde{y}_2$ in the two network predictions $\hat{k}_1$ and $\hat{k}_2$ can be written as:
$$\tilde{y}_1 = (I - M)\hat{k}_1, \qquad \tilde{y}_2 = (I - M)\hat{k}_2. \tag{14}$$
The contrastive consistency loss term is therefore formulated as:
$$\mathcal{L}_{\mathrm{cc}} = \mathrm{MSE}(\tilde{y}_1, \tilde{y}_2). \tag{15}$$
Combining the two loss terms, the final co-training loss function can be written as:
$$\mathcal{L} = \mathcal{L}_{\mathrm{uc}} + \gamma\, \mathcal{L}_{\mathrm{cc}}, \tag{16}$$
where $\gamma$ balances the undersampled consistency loss and the contrastive consistency loss. During testing, the undersampled data sequence is fed to collaborative network-1 to obtain the final reconstruction.
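The co-training loss can be sketched in a few lines. The Python/NumPy example below computes the undersampled consistency term, the contrastive consistency term, and their weighted sum in k-space (the setting adopted in Section 4.3) from the two networks' predicted image sequences. It is a minimal, framework-free illustration, not the authors' PyTorch implementation. The array layout, the orthonormal FFT, averaging each MSE over the selected k-space points, placing the weight `gamma` on the contrastive term, and the default `gamma=0.01` (the value reported in Section 3.1) are assumptions; in practice the same expressions would be written with autograd tensors.

```python
import numpy as np


def fft2c(x):
    """Orthonormal 2D FFT over the spatial axes of an (Nt, Nx, Ny) array."""
    return np.fft.fft2(x, axes=(-2, -1), norm="ortho")


def mse(a, b):
    """Mean squared magnitude of the difference between two complex arrays."""
    return float(np.mean(np.abs(a - b) ** 2))


def co_training_loss(x_hat1, x_hat2, y, mask, gamma=0.01):
    """Return (total, l_uc, l_cc) for the two networks' predicted images.

    y    -- original undersampled k-space (zero-filled), shape (Nt, Nx, Ny)
    mask -- original sampling mask M of y (nonzero = acquired)
    """
    k1, k2 = fft2c(x_hat1), fft2c(x_hat2)
    sampled = mask > 0
    # Undersampled consistency: each prediction vs. the acquired points (M k_hat vs. y).
    l_uc = mse(k1[sampled], y[sampled]) + mse(k2[sampled], y[sampled])
    # Contrastive consistency: the two predictions vs. each other at unsampled points.
    l_cc = mse(k1[~sampled], k2[~sampled])
    return l_uc + gamma * l_cc, l_uc, l_cc


# Usage with toy data: identical, data-consistent predictions give zero loss.
rng = np.random.default_rng(3)
shape = (2, 8, 8)
x = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mask = np.zeros(shape)
mask[:, :, ::2] = 1.0
y = mask * fft2c(x)
print(co_training_loss(x, x, y, mask))  # (0.0, 0.0, 0.0)
```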
3 Experimental Results
Extensive experiments have been performed to evaluate the effectiveness of the proposed method. The performance of SelfCoLearn is compared with that of four state-of-the-art fully-supervised and self-supervised learning methods at different acceleration factors. SelfCoLearn is also evaluated with different backbone networks, including both data-driven networks and iterative un-rolled networks for dynamic MR imaging. Ablation studies are then reported to investigate the impact of the undersampled consistency loss term and the contrastive consistency loss term. Finally, reconstruction results with the co-training loss computed in different domains are reported to further validate the effectiveness of SelfCoLearn.
3.1 Experimental Setup
1) Dataset: Fully sampled cardiac data were collected from 101 volunteers on a 3T scanner using a T1-weighted FLASH sequence. All in vivo experiments were approved by the Institutional Review Board (IRB) of Shenzhen Institutes of Advanced Technology, and written informed consent was obtained from all volunteers. Each scan acquires a single slice with 25 temporal frames. The FLASH sequence parameters were: FOV 330×330 mm, acquisition matrix 192×192, slice thickness 6 mm, TR/TE = 50 ms/3 ms, and 24 receiving coils. The raw multi-coil data of each frame were combined with the adaptive coil combination method [38] to produce a single-channel complex-valued image. The complex-valued images were then transformed to k-space, which simulates a fully sampled single-coil acquisition. Because training neural networks requires a large amount of data, we used data augmentation to enlarge the dataset. Specifically, we extracted patches of size 128×128×14 by translating along the x, y, and t directions with strides of 12, 12, and 3, respectively. The final dataset consists of 6214 complex-valued cardiac MR data sequences of size 128×128×14. Of these, 5950 sequences were randomly selected for training, 50 for validation, and the remaining 214 for testing.
2) Reundersampling K-space Data Augmentation: In the proposed method, the fully sampled data are used only to generate the original undersampled k-space data $y$ with a 2D random retrospective undersampling mask $M$. Following the training data augmentation principles in Section 2.2, $y$ is augmented into two training sequences $y_1$ and $y_2$ with two 2D random reundersampling masks $\Omega_1$ and $\Omega_2$. $\Omega_1$, with 2-fold acceleration, is used for collaborative network-1, and $\Omega_2$, which combines the complement of $\Omega_1$ (within the sampled points of $y$) with some low-frequency data points of $y$, is used for collaborative network-2.
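One way to realize this recipe is sketched below in Python/NumPy. It is an illustrative assumption rather than the authors' exact sampling code: masks are boolean arrays of any shape, the low-frequency region `low_freq` is supplied by the user, and $\Omega_1$ keeps all acquired low-frequency points plus a random fraction `keep_frac` (0.5 here) of the acquired high-frequency points, which is slightly less than 2-fold reundersampling of $y$ because the low-frequency points are kept in full. $\Omega_2$ is then the complement of $\Omega_1$ within $M$ plus the acquired low-frequency points, so principles (1) to (3) hold by construction whenever the random split is non-trivial. The function name is hypothetical.

```python
import numpy as np


def reundersample(mask, low_freq, keep_frac=0.5, rng=None):
    """Split the acquired mask M into reundersampling masks (omega1, omega2).

    mask      -- boolean sampling mask M of the undersampled data y
    low_freq  -- boolean array marking the low-frequency region (same shape)
    keep_frac -- fraction of acquired high-frequency points given to omega1
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = np.asarray(mask, dtype=bool)
    low = mask & np.asarray(low_freq, dtype=bool)   # acquired low-frequency points
    high = mask & ~low                               # acquired high-frequency points
    pick = rng.random(mask.shape) < keep_frac
    omega1 = low | (high & pick)                     # input mask for network-1
    omega2 = low | (high & ~pick)                    # complement of omega1 in M, plus low freq
    return omega1, omega2


# Usage: per-frame random phase-encoding lines in a toy (Nt, Ny) k_y-t mask.
rng = np.random.default_rng(4)
nt, ny = 14, 128
ky = np.arange(ny)
low = np.broadcast_to(np.abs(ky - ny // 2) < 4, (nt, ny))
mask = (rng.random((nt, ny)) < 0.125) | low
omega1, omega2 = reundersample(mask, low, rng=rng)
print(np.array_equal(omega1 | omega2, mask))  # True
```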
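3) Evaluation Metrics: To evaluate reconstruction performance, the mean-square-error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM) [39] are calculated as follows:
$$\mathrm{MSE} = \frac{1}{N_x N_y N_t}\,\|\hat{x} - x_{\mathrm{ref}}\|_2^2, \tag{17}$$
$$\mathrm{PSNR} = 10 \log_{10} \frac{\mathrm{MAX}^2}{\mathrm{MSE}}, \tag{18}$$
$$\mathrm{SSIM} = \frac{(2\mu_{\hat{x}}\mu_{x_{\mathrm{ref}}} + c_1)(2\sigma_{\hat{x}x_{\mathrm{ref}}} + c_2)}{(\mu_{\hat{x}}^2 + \mu_{x_{\mathrm{ref}}}^2 + c_1)(\sigma_{\hat{x}}^2 + \sigma_{x_{\mathrm{ref}}}^2 + c_2)}, \tag{19}$$
where $\hat{x}$ is the reconstructed image sequence and $x_{\mathrm{ref}}$ is the reference image sequence. $\mathrm{MAX}$ is the maximum possible value in the image. $\mu_{\hat{x}}$ and $\mu_{x_{\mathrm{ref}}}$ are the average intensities of the corresponding images, $\sigma_{\hat{x}}^2$ and $\sigma_{x_{\mathrm{ref}}}^2$ are their variances, $\sigma_{\hat{x}x_{\mathrm{ref}}}$ is their covariance, and $c_1$ and $c_2$ are adjustable constants. The SSIM index is a multiplicative combination of a luminance term, a contrast term, and a structural term (see [39] for details).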
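For reference, the MSE and PSNR definitions translate directly into code. The Python/NumPy sketch below is a minimal illustration, assuming images normalized to [0, 1] so that MAX defaults to 1 (consistent with the normalization reported in Section 3.1); the function names are hypothetical. SSIM is omitted because its windowing and constants follow [39] and are best taken from an established implementation. Note that the tables report MSE in units of $10^{-4}$.

```python
import numpy as np


def mse(x_hat, x_ref):
    """Mean squared error over all pixels and frames (magnitude for complex data)."""
    return float(np.mean(np.abs(x_hat - x_ref) ** 2))


def psnr(x_hat, x_ref, max_val=1.0):
    """PSNR in dB; max_val = 1.0 assumes images normalized to [0, 1]."""
    return 10.0 * np.log10(max_val ** 2 / mse(x_hat, x_ref))


# Usage: a uniform error of 0.01 gives MSE of about 1e-4 (about 1.00 in table units)
# and a PSNR of about 40 dB.
x_ref = np.full((14, 128, 128), 0.5)
x_hat = x_ref + 0.01
print(mse(x_hat, x_ref) / 1e-4, psnr(x_hat, x_ref))
```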
4) Model Configuration and Implementation Details: The proposed framework can be flexibly integrated with both data-driven and iterative un-rolled networks. Most of our experiments adopt CRNN as the backbone network. Specifically, the network is composed of a bidirectional CRNN layer, three CRNN layers, a 2D CNN layer, a residual connection that adds the output to the input, and a DC layer. The nonlinear activation function is the rectified linear unit (ReLU). For the bidirectional CRNN and CRNN layers, the number of convolution filters is set to with a kernel size of . The 2D CNN contains one convolutional layer with and . We use and the padding is set to half of the filter size (rounded down). The 2D CNN layer is followed by the DC layer, which forces the sampled data points in the predicted k-space data to be consistent with those in the input.
For model training, the number of iteration step is set to . The batch size is set to 1. All training and test data are normalized to [0, 1]. SelfCoLearn with CRNN and k-t NEXT is implemented in PyTorch 1.8.1, and SelfCoLearn with SLR-Net is implemented in TensorFlow 2.2.0. The experiments are performed on a GPU server with an NVIDIA Titan Xp graphics processing unit (GPU, 12 GB memory). The model is trained with the Adam optimizer [40] with parameters and . The weight parameter in the co-training loss is set to 0.01. The learning rate is set to . Training SelfCoLearn with CRNN for 40 epochs takes 42 hours, and reconstructing each cardiac MR data sequence takes roughly 0.5 s.
3.2 Comparisons to State-of-the-Art Unsupervised Methods
To demonstrate the advantages of the proposed SelfCoLearn, we compare it with two self-supervised methods, SS-DCCNN and SS-CRNN, at different acceleration factors. Note that the state-of-the-art self-supervised method SSDU [34] was developed for static MR imaging and adopted ResNet as the backbone network. Acar et al. [32] applied a similar self-supervised training scheme to dynamic MR imaging and evaluated several backbone architectures, including DCCNN and CRNN. We therefore choose the two self-supervised learning methods SS-DCCNN and SS-CRNN [32] for comparison. In this experiment, SelfCoLearn uses CRNN as the backbone network. For a fair comparison, all methods are carefully tuned to obtain their best performance on the current dataset.

Fig. 3 shows the reconstruction results of different methods at 8-fold, 12-fold, and 14-fold acceleration. The first three rows show the diastolic reconstruction at the tenth frame of the image sequence, and the following three rows show the systolic reconstruction at the fifth frame. Within each group, the first row shows, from left to right, the fully sampled reference image and the reconstruction results of the respective methods (display range [0, 1]); the second row shows enlarged views of the heart regions; and the third row shows the corresponding error maps (display range [0, 0.2]). The y-t views at are shown in the seventh row, and the corresponding error maps of the y-t views are shown in the last row. Visually, SelfCoLearn generates better reconstructions than the two self-supervised methods, SS-DCCNN and SS-CRNN, at all acceleration factors. The images reconstructed by SelfCoLearn show finer structural details and more precise heart borders with fewer artifacts, whereas SS-DCCNN and SS-CRNN exhibit artifacts within the heart borders and chambers. The error maps of SelfCoLearn also show smaller reconstruction errors, especially at the borders of the heart chambers.
The quantitative results of these self-supervised methods are listed in Table 1. Consistent with the visual comparison, SelfCoLearn achieves better quantitative performance than the two existing self-supervised learning methods, indicating that the collaborative learning strategy can effectively capture essential and inherent representations directly from undersampled k-space data.
AF | Methods | Training pattern | PSNR (dB) | SSIM | MSE (×10⁻⁴) |
---|---|---|---|---|---|
8-fold | SS-DCCNN | Self-supervised | 22.56±2.71 | 0.7263±0.0663 | 67.87±49.27 |
8-fold | SS-CRNN | Self-supervised | 30.81±1.77 | 0.8994±0.0288 | 9.02±3.75 |
8-fold | SelfCoLearn | Self-supervised | 37.27±2.40 | 0.9622±0.0201 | 2.17±1.22 |
12-fold | SS-DCCNN | Self-supervised | 22.17±2.76 | 0.7014±0.0658 | 74.89±54.96 |
12-fold | SS-CRNN | Self-supervised | 30.14±1.78 | 0.8952±0.0298 | 10.54±4.40 |
12-fold | SelfCoLearn | Self-supervised | 35.19±2.24 | 0.9480±0.0246 | 3.44±1.78 |
14-fold | SS-DCCNN | Self-supervised | 20.70±2.78 | 0.6667±0.0715 | 104.21±71.77 |
14-fold | SS-CRNN | Self-supervised | 29.82±1.77 | 0.8911±0.0301 | 11.32±4.68 |
14-fold | SelfCoLearn | Self-supervised | 34.38±2.23 | 0.9399±0.0273 | 4.14±2.11 |
Fig. 4 gives box plots of the median and interquartile range (25th-75th percentile) of the reconstruction results of the different self-supervised methods across all test cardiac cine data at 8-fold, 12-fold, and 14-fold acceleration. Across the dynamic cine sequences, SelfCoLearn outperforms the two self-supervised learning methods (SS-DCCNN and SS-CRNN) at all three acceleration factors.

3.3 Comparisons to State-of-the-Art Supervised Methods
We further compare SelfCoLearn with supervised methods, including supervised U-Net and supervised CRNN [26], at different acceleration factors. Fig. 5 shows the reconstruction results of different methods at 8-fold, 12-fold, and 14-fold acceleration. Visually, SelfCoLearn restores more precise anatomical details of the heart regions than supervised U-Net at all acceleration factors, whereas supervised U-Net fails to recover some details in the heart chambers. The error maps of SelfCoLearn also show smaller reconstruction errors than those of supervised U-Net.

In addition, SelfCoLearn generates reconstructions comparable to those of supervised CRNN at low acceleration factors. In the enlarged heart regions, the images reconstructed by SelfCoLearn are as clear as those generated by supervised CRNN. At higher acceleration factors, such as 14-fold, the images reconstructed by SelfCoLearn become slightly blurred. Nevertheless, most of the structural details in the heart regions are still successfully restored. The quantitative results at all acceleration factors (Table 2) also show the promising performance of SelfCoLearn, whose PSNR remains within about 1.4 dB of supervised CRNN. It can therefore be concluded that the proposed SelfCoLearn achieves reconstruction performance comparable to that of fully-supervised methods via self-supervised collaborative learning for accelerating dynamic MR imaging.
AF | Methods | Training pattern | PSNR (dB) | SSIM | MSE (×10⁻⁴) |
---|---|---|---|---|---|
8-fold | U-Net | Supervised | 32.63±1.97 | 0.9186±0.0301 | 6.06±2.88 |
8-fold | SelfCoLearn | Self-supervised | 37.27±2.40 | 0.9622±0.0201 | 2.17±1.22 |
8-fold | CRNN | Supervised | 38.09±2.52 | 0.9635±0.0204 | 1.83±1.07 |
12-fold | U-Net | Supervised | 31.96±1.88 | 0.9111±0.0317 | 6.99±3.03 |
12-fold | SelfCoLearn | Self-supervised | 35.19±2.24 | 0.9480±0.0246 | 3.44±1.78 |
12-fold | CRNN | Supervised | 36.32±2.29 | 0.9513±0.0244 | 2.67±1.42 |
14-fold | U-Net | Supervised | 31.51±1.99 | 0.9045±0.0330 | 7.86±3.83 |
14-fold | SelfCoLearn | Self-supervised | 34.38±2.23 | 0.9399±0.0273 | 4.14±2.11 |
14-fold | CRNN | Supervised | 35.74±2.28 | 0.9461±0.0261 | 3.05±1.59 |
4 Discussion
4.1 Network Backbone Architectures
In this section, we examine the reconstruction results of the proposed self-supervised learning strategy with different backbone networks for dynamic MR imaging. Experiments are conducted using SLR-Net [29], k-t NEXT [24], and CRNN [26] at 8-fold acceleration. The reconstruction results with different backbone networks are given in Fig. 6 and Table 3. Compared with SS-CRNN [32], the proposed self-supervised learning strategy achieves better results regardless of the backbone network used. Among the three backbone networks, SLR-Net yields worse results than k-t NEXT and CRNN. A possible reason is that SLR-Net needs to learn a singular value threshold, and without fully sampled reference data the learned threshold may be suboptimal. Nevertheless, the proposed self-supervised learning strategy with SLR-Net still obtains acceptable reconstruction results. The qualitative results in Fig. 6 show that SelfCoLearn better restores structural details and produces clearer MR images than SS-CRNN (especially in the heart regions around the red and yellow arrows). The quantitative results also indicate more accurate reconstructions with SelfCoLearn. These results confirm that the proposed self-supervised learning framework is flexible with respect to the backbone network and achieves promising reconstruction results with both data-driven and iterative un-rolled networks for dynamic MR imaging.

Methods | Training pattern | PSNR (dB) | SSIM | MSE (×10⁻⁴) |
---|---|---|---|---|
SS-CRNN | Self-supervised | 30.81±1.77 | 0.8994±0.0288 | 9.02±3.75 |
SelfCoLearn with SLR-Net | Self-supervised | 33.58±2.24 | 0.9495±0.0220 | 5.57±10.48 |
SelfCoLearn with k-t NEXT | Self-supervised | 36.95±2.39 | 0.9613±0.0203 | 2.34±1.32 |
SelfCoLearn with CRNN | Self-supervised | 37.27±2.40 | 0.9622±0.0201 | 2.17±1.22 |
4.2 Co-training Loss Function
In this section, we investigate the utility of the designed co-training loss function. The backbone network in these experiments is CRNN. Different training strategies at 8-fold acceleration are compared. Strategy B-I: a single reconstruction network is trained in a self-supervised manner, using only the loss function between the output of the network and . This training strategy is similar to that of SSDU. Strategy B-II: similar to B-I, but the loss function is calculated between the output of the network and the original undersampled k-space data $y$. SelfCoLearn: two networks are trained collaboratively with and , and both collaborative networks adopt the same backbone network as in Strategy B-I. Reconstructed images obtained with the different training strategies are shown in Fig. 7, and quantitative results are listed in Table 4. Both the qualitative and quantitative results show that SelfCoLearn (training two networks collaboratively with both loss terms) achieves the best performance (especially in the heart regions around the red and yellow arrows). In particular, collaborative training with the contrastive consistency loss term yields a large improvement; for example, PSNR increases from 31.04 dB (Strategy B-II) to 37.27 dB (SelfCoLearn).

Methods | Training pattern | Single-Net | Parallel-Net | PSNR (dB) | SSIM | MSE (×10⁻⁴) |
---|---|---|---|---|---|---|
Strategy B-I | Self-supervised | ✓ | | 30.81±1.77 | 0.8994±0.0288 | 9.02±3.75 |
Strategy B-II | Self-supervised | ✓ | | 31.04±1.74 | 0.9045±0.0274 | 8.53±3.50 |
SelfCoLearn | Self-supervised | | ✓ | 37.27±2.40 | 0.9622±0.0201 | 2.17±1.22 |
4.3 Loss Functions
In this section, we examine the effects of the loss function settings. The backbone network in these experiments is CRNN. Reconstruction results at 8-fold acceleration are given in Fig. 8 and Table 5. Three strategies with different loss function settings are investigated. In Strategy C-I, the two networks are trained collaboratively with and , in which is calculated in the x-t domain and is calculated in the k-space domain. In Strategy C-II, both and are calculated in the x-t domain. In Strategy C-III, both and are calculated in the k-space domain. Both the qualitative and quantitative results show that the loss function setting has only a minor influence on reconstruction performance (PSNR differs by at most 0.27 dB across the three settings). All other experiments in this work adopt the loss function setting of Strategy C-III.
Methods | Training pattern | | | PSNR (dB) | SSIM | MSE (×10⁻⁴) |
---|---|---|---|---|---|---|
Strategy C-I | Self-supervised | x-t domain | k-space | 37.00±2.35 | 0.9617±0.0201 | 2.30±1.29 |
Strategy C-II | Self-supervised | x-t domain | x-t domain | 37.20±2.37 | 0.9617±0.0203 | 2.20±1.22 |
Strategy C-III | Self-supervised | k-space | k-space | 37.27±2.40 | 0.9622±0.0201 | 2.17±1.22 |

5 Conclusion
In this work, we propose a self-supervised collaborative training framework to boost image reconstruction performance for accelerated dynamic MR imaging. Two independent reconstruction networks are trained collaboratively with different inputs, which are augmented from the same k-space data. To guide the two networks in capturing detailed structural features and spatiotemporal correlations in dynamic image sequences, a co-training loss function is designed to promote consistency between the two network predictions, so that each provides complementary information about the to-be-reconstructed dynamic MR images. The framework can be flexibly integrated with both data-driven and model-based iterative un-rolled networks. Our method has been comprehensively evaluated on a cardiac cine dataset and compared with four state-of-the-art fully-supervised and self-supervised learning methods at different acceleration factors. SelfCoLearn achieves better results than existing self-supervised learning methods and reconstruction performance comparable to existing supervised learning methods. These observations indicate that SelfCoLearn can capture essential and inherent representations directly from undersampled k-space data and thus enables high-quality and fast dynamic MR imaging.
References
- [1] U. Gamper, P. Boesiger, and S. Kozerke, “Compressed sensing in dynamic MRI,” Magnetic Resonance in Medicine, vol. 59, no. 2, pp. 365–373, 2008.
- [2] B. Zhao, J. P. Haldar, A. G. Christodoulou, and Z.-P. Liang, “Image reconstruction from highly undersampled (k, t)-space data with joint partial separability and sparsity constraints,” IEEE Transactions on Medical Imaging, vol. 31, no. 9, pp. 1809–1820, 2012.
- [3] H. Jung, J. C. Ye, and E. Y. Kim, “Improved k–t BLAST and k–t SENSE using FOCUSS,” Physics in Medicine & Biology, vol. 52, no. 11, p. 3201, 2007.
- [4] Y. Wang and L. Ying, “Compressed sensing dynamic cardiac cine MRI using learned spatiotemporal dictionary,” IEEE Transactions on Biomedical Engineering, vol. 61, no. 4, pp. 1109–1120, 2013.
- [5] J. Caballero, A. N. Price, D. Rueckert, and J. V. Hajnal, “Dictionary learning and time sparsity for dynamic MR data reconstruction,” IEEE Transactions on Medical Imaging, vol. 33, no. 4, pp. 979–994, 2014.
- [6] J. Ji and T. Lang, “Dynamic MRI with compressed sensing imaging using temporal correlations,” in 2008 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro. IEEE, 2008, pp. 1613–1616.
- [7] N. Zhao, D. O’Connor, A. Basarab, D. Ruan, and K. Sheng, “Motion compensated dynamic MRI reconstruction with local affine optical flow estimation,” IEEE Transactions on Biomedical Engineering, vol. 66, no. 11, pp. 3050–3059, 2019.
- [8] H. Jung, K. Sung, K. S. Nayak, E. Y. Kim, and J. C. Ye, “k-t FOCUSS: a general compressed sensing framework for high resolution dynamic MRI,” Magnetic Resonance in Medicine, vol. 61, no. 1, pp. 103–116, 2009.
- [9] R. Otazo, E. Candès, and D. K. Sodickson, “Low-rank plus sparse matrix decomposition for accelerated dynamic MRI with separation of background and dynamic components,” Magnetic Resonance in Medicine, vol. 73, no. 3, pp. 1125–1136, 2015.
- [10] F. Huang, J. Akao, S. Vijayakumar, G. R. Duensing, and M. Limkeman, “k-t GRAPPA: A k-space implementation for dynamic MRI with high reduction factor,” Magnetic Resonance in Medicine, vol. 54, no. 5, pp. 1172–1184, 2005.
- [11] S. Wang, Z. Su, L. Ying, X. Peng, S. Zhu, F. Liang, D. Feng, and D. Liang, “Accelerating magnetic resonance imaging via deep learning,” in 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI). IEEE, 2016, pp. 514–517.
- [12] J. Zhang and B. Ghanem, “ISTA-Net: Interpretable optimization-inspired deep network for image compressive sensing,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 1828–1837.
- [13] T. Eo, Y. Jun, T. Kim, J. Jang, H.-J. Lee, and D. Hwang, “KIKI-net: cross-domain convolutional neural networks for reconstructing undersampled magnetic resonance images,” Magnetic Resonance in Medicine, vol. 80, no. 5, pp. 2188–2201, 2018.
- [14] J. Sun, H. Li, Z. Xu et al., “Deep ADMM-Net for compressive sensing MRI,” Advances in Neural Information Processing Systems, vol. 29, 2016.
- [15] H. K. Aggarwal, M. P. Mani, and M. Jacob, “MoDL: Model-based deep learning architecture for inverse problems,” IEEE Transactions on Medical Imaging, vol. 38, no. 2, pp. 394–405, 2018.
- [16] K. Hammernik, T. Klatzer, E. Kobler, M. P. Recht, D. K. Sodickson, T. Pock, and F. Knoll, “Learning a variational network for reconstruction of accelerated MRI data,” Magnetic Resonance in Medicine, vol. 79, no. 6, pp. 3055–3071, 2018.
- [17] B. Zhu, J. Z. Liu, S. F. Cauley, B. R. Rosen, and M. S. Rosen, “Image reconstruction by domain-transform manifold learning,” Nature, vol. 555, no. 7697, pp. 487–492, 2018.
- [18] M. Akçakaya, S. Moeller, S. Weingärtner, and K. Uğurbil, “Scan-specific robust artificial-neural-networks for k-space interpolation (RAKI) reconstruction: Database-free deep learning for fast imaging,” Magnetic Resonance in Medicine, vol. 81, no. 1, pp. 439–453, 2019.
- [19] M. Mardani, E. Gong, J. Y. Cheng, S. S. Vasanawala, G. Zaharchuk, L. Xing, and J. M. Pauly, “Deep generative adversarial neural networks for compressive sensing MRI,” IEEE Transactions on Medical Imaging, vol. 38, no. 1, pp. 167–179, 2018.
- [20] O. Cohen, B. Zhu, and M. S. Rosen, “MR fingerprinting deep reconstruction network (DRONE),” Magnetic Resonance in Medicine, vol. 80, no. 3, pp. 885–894, 2018.
- [21] J. Yoon, E. Gong, I. Chatnuntawech, B. Bilgic, J. Lee, W. Jung, J. Ko, H. Jung, K. Setsompop, G. Zaharchuk et al., “Quantitative susceptibility mapping using deep neural network: QSMnet,” NeuroImage, vol. 179, pp. 199–206, 2018.
- [22] Q. Huang, Y. Xian, D. Yang, H. Qu, J. Yi, P. Wu, and D. N. Metaxas, “Dynamic MRI reconstruction with end-to-end motion-guided network,” Medical Image Analysis, vol. 68, p. 101901, 2021.
- [23] G. Seegoolam, J. Schlemper, C. Qin, A. Price, J. Hajnal, and D. Rueckert, “Exploiting motion for deep learning reconstruction of extremely-undersampled dynamic MRI,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 704–712.
- [24] C. Qin, J. Schlemper, J. Duan, G. Seegoolam, A. Price, J. Hajnal, and D. Rueckert, “k-t NEXT: dynamic MR image reconstruction exploiting spatio-temporal correlations,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 505–513.
- [25] J. Schlemper, J. Caballero, J. V. Hajnal, A. N. Price, and D. Rueckert, “A deep cascade of convolutional neural networks for dynamic MR image reconstruction,” IEEE Transactions on Medical Imaging, vol. 37, no. 2, pp. 491–503, 2017.
- [26] C. Qin, J. Schlemper, J. Caballero, A. N. Price, J. V. Hajnal, and D. Rueckert, “Convolutional recurrent neural networks for dynamic MR image reconstruction,” IEEE Transactions on Medical Imaging, vol. 38, no. 1, pp. 280–290, 2018.
- [27] C. Qin, J. Duan, K. Hammernik, J. Schlemper, T. Küstner, R. Botnar, C. Prieto, A. N. Price, J. V. Hajnal, and D. Rueckert, “Complementary time-frequency domain networks for dynamic parallel mr image reconstruction,” Magnetic Resonance in Medicine, vol. 86, no. 6, pp. 3274–3291, 2021.
- [28] W. Huang, Z. Ke, Z.-X. Cui, J. Cheng, Z. Qiu, S. Jia, L. Ying, Y. Zhu, and D. Liang, “Deep low-rank plus sparse network for dynamic MR imaging,” Medical Image Analysis, vol. 73, p. 102190, 2021.
- [29] Z. Ke, W. Huang, Z.-X. Cui, J. Cheng, S. Jia, H. Wang, X. Liu, H. Zheng, L. Ying, Y. Zhu et al., “Learned low-rank priors in dynamic MR imaging,” IEEE Transactions on Medical Imaging, vol. 40, no. 12, pp. 3698–3710, 2021.
- [30] Z. Ke, J. Cheng, L. Ying, H. Zheng, Y. Zhu, and D. Liang, “An unsupervised deep learning method for multi-coil cine MRI,” Physics in Medicine & Biology, vol. 65, no. 23, p. 235041, 2020.
- [31] J. Yoo, K. H. Jin, H. Gupta, J. Yerly, M. Stuber, and M. Unser, “Time-dependent deep image prior for dynamic MRI,” IEEE Transactions on Medical Imaging, vol. 40, no. 12, pp. 3337–3348, 2021.
- [32] M. Acar, T. Çukur, and İ. Öksüz, “Self-supervised dynamic MRI reconstruction,” in International Workshop on Machine Learning for Medical Image Reconstruction. Springer, 2021, pp. 35–44.
- [33] D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Deep image prior,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 9446–9454.
- [34] B. Yaman, S. A. H. Hosseini, S. Moeller, J. Ellermann, K. Uğurbil, and M. Akçakaya, “Self-supervised learning of physics-guided reconstruction neural networks without fully sampled reference data,” Magnetic Resonance in Medicine, vol. 84, no. 6, pp. 3172–3191, 2020.
- [35] M. Akçakaya, B. Yaman, H. Chung, and J. C. Ye, “Unsupervised deep learning methods for biological image reconstruction and enhancement: An overview from a signal processing perspective,” IEEE Signal Processing Magazine, vol. 39, no. 2, pp. 28–44, 2022.
- [36] D. Liang, J. Cheng, Z. Ke, and L. Ying, “Deep magnetic resonance image reconstruction: Inverse problems meet neural networks,” IEEE Signal Processing Magazine, vol. 37, no. 1, pp. 141–151, 2020.
- [37] E. J. Candès and B. Recht, “Exact matrix completion via convex optimization,” Foundations of Computational Mathematics, vol. 9, no. 6, pp. 717–772, 2009.
- [38] K. Lee and Y. Bresler, “ADMiRA: Atomic decomposition for minimum rank approximation,” IEEE Transactions on Information Theory, vol. 56, no. 9, pp. 4402–4416, 2010.
- [39] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
- [40] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in ICLR (Poster), 2015.