[email protected]
2 Pengcheng Laboratory, Shenzhen 518055, China
3 University of Chinese Academy of Sciences, Beijing 100049, China
4 School of Computer Science and Engineering, Beihang University, Beijing 100191, China
5 School of Physics and Electronic Science, Changsha University of Science and Technology, Changsha 410114, China
CSR-dMRI: Continuous Super-Resolution of Diffusion MRI with Anatomical Structure-assisted Implicit Neural Representation Learning
Abstract
Deep learning-based dMRI super-resolution methods can effectively enhance image resolution by leveraging the learning capabilities of neural networks on large datasets. However, these methods tend to learn a fixed mapping between low-resolution (LR) and high-resolution (HR) images at a single scale, overlooking radiologists' need to rescale images to arbitrary resolutions. Moreover, pixel-wise losses in the image domain tend to generate over-smoothed results that lose fine texture and edge information. To address these issues, we propose a novel continuous super-resolution method for dMRI, called CSR-dMRI, which utilizes anatomical structure-assisted implicit neural representation learning. Specifically, the CSR-dMRI model consists of two components. The first is a latent feature extractor, which extracts latent-space feature maps from LR dMRI and anatomical images while learning structural prior information from the anatomical images. The second is an implicit function network, which takes voxel coordinates and latent feature vectors and generates voxel intensities at the corresponding positions. Additionally, a frequency-domain-based loss is introduced to preserve structural and texture information, further enhancing image quality. Extensive experiments on the publicly available HCP dataset validate the effectiveness of our approach. Furthermore, our method demonstrates superior generalization capability and can be applied to arbitrary-scale super-resolution, including non-integer scale factors, expanding its applicability beyond conventional approaches.
Keywords:
Diffusion MRI · Continuous super-resolution · Implicit neural representation
1 Introduction
Diffusion magnetic resonance imaging (dMRI) can reflect early microstructural changes of brain tissue in neurological diseases by measuring the diffusion displacement distribution of water molecules in the brain tissue [11, 5]. As a non-invasive method, dMRI is extensively utilized in the diagnosis of brain diseases and provides a distinctive avenue for exploring the neural foundations of human cognitive behavior. However, accurately estimating quantitative evaluation parameters for diffusion tensor imaging (DTI) or diffusion kurtosis imaging (DKI) typically requires HR data, which prolongs dMRI acquisition time and can lead to motion artifacts and patient discomfort. In clinical practice, acquisition time is often reduced by sacrificing the spatial resolution of dMRI images, which in turn degrades the estimation accuracy of DTI or DKI parameters. Therefore, reconstructing HR dMRI images from LR acquisitions holds significant clinical value.
Image post-processing is one feasible solution to this issue. Traditional interpolation methods, such as nearest-neighbor and linear interpolation, can be utilized; however, Van Ouwerkerk [17] pointed out that interpolation often blurs image edges and cannot recover fine details. Super-resolution (SR) reconstruction methods are an attractive alternative that can generate HR images from LR images [21, 10, 16, 19, 18]. Existing SR methods can be roughly divided into two categories: model-based methods [8, 12, 20, 15, 9, 13, 14] and data-driven methods [7, 3, 4]. Model-based methods rely on mathematical models to relate LR and HR images, while data-driven methods use neural networks to learn the nonlinear mapping between them. With the rapid development of deep learning (DL), DL-based SR methods have been widely investigated [22, 6, 24]. For example, Lim et al. [6] proposed EDSR, which improves reconstruction quality by stacking deeper or wider networks to extract more information within the same computational budget. Zhang et al. [24] introduced the residual dense network (RDN), which uses residual dense blocks (RDBs) with densely connected convolutional layers to extract rich local features; direct connections from all layers of the preceding RDB to the current RDB make full use of the hierarchical features extracted from the original LR image. These methods have shown promising performance in enhancing the quality of reconstructed super-resolution images.
Recently, researchers have begun applying SR techniques to diffusion MRI [7, 1], primarily with learning-based methods. For instance, Chatterjee et al. [1] proposed the ShuffleUNet architecture for super-resolution of dMRI data, addressing image blurring and over-smoothing by replacing strided convolutions with lossless pooling layers. Luo et al. [7] proposed a sub-pixel convolution generative adversarial attentional network (SPC-GAAN) for super-resolution reconstruction of dMRI data, achieving promising results. However, these methods can only reconstruct super-resolution dMRI at a fixed integer scale factor, limiting their applicability. In practice, differences in scanning protocols mean that acquired dMRI data have inconsistent resolutions; to facilitate clinical analysis, they must be reconstructed to a common resolution, which the aforementioned methods struggle to achieve. Additionally, the image-domain pixel-wise loss used in these methods tends to generate over-smoothed results, losing fine texture and edge information.
To address these issues, we propose an anatomical structure-assisted implicit neural representation learning framework for continuous super-resolution of dMRI images. To the best of our knowledge, this is the first attempt to apply implicit neural representation learning to dMRI super-resolution. By introducing implicit neural representation into dMRI super-resolution, our method enables arbitrary-scale super-resolution and can be used to transform dMRI data across different resolutions. Our contributions can be summarized as follows:
1) We propose CSR-dMRI, a novel paradigm for arbitrary-scale super-resolution of diffusion MRI that combines anatomical structure assistance with implicit neural representation learning. It can transform dMRI data acquired at different resolutions into a common resolution, facilitating clinical analysis.
2) The detail and texture information of the image is improved by introducing anatomical images and a frequency-domain-based loss.
3) Extensive experiments on the public HCP dataset demonstrate the effectiveness of our approach. Furthermore, our method exhibits better generalization and can be applied to non-integer scale factors, expanding the applicability of this approach.
2 Methods
2.1 Implicit Neural Representation Learning
In implicit neural representation learning, an image volume can be represented by a neural network as a continuous function. The network $f_{\theta}$ with parameters $\theta$ can be defined as:

\[
v = f_{\theta}(x) \tag{1}
\]

where the input $x$ is a normalized coordinate index in the spatial field of the volume, and the output $v$ is the corresponding voxel intensity. The network function $f_{\theta}$ maps coordinates to voxel intensities, effectively encoding the internal information of the entire volume into the network parameters; a network with parameters $\theta$ is therefore also referred to as the neural representation of that volume. Our voxel coordinates are constructed based on the size of the dMRI data and normalized along each dimension to the range [-1, 1]. Such a network, however, represents only a single dMRI volume and is not suitable for multiple dMRI images; additional auxiliary information must be provided.
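To make the formulation concrete, the following is a minimal sketch of such a coordinate-to-intensity network in PyTorch; the depth, width, and ReLU activation are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Minimal sketch of Eq. (1): a coordinate-to-intensity MLP f_theta that
# encodes a single volume. Layer sizes and activation are assumptions.
class VoxelINR(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # intensity v at coordinate x
        )

    def forward(self, coords):  # coords: (N, 3), normalized to [-1, 1]
        return self.net(coords)

# Fitting one volume: regress sampled coordinates onto intensities.
model = VoxelINR()
coords = torch.rand(1024, 3) * 2 - 1    # normalized voxel coordinates
intensities = torch.rand(1024, 1)       # placeholder target intensities
loss = nn.functional.l1_loss(model(coords), intensities)
```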
[Figure 1: Overall architecture of the proposed CSR-dMRI model.]
2.2 Model Overview
The overall architecture of the proposed CSR-dMRI model is illustrated in Figure 1. Because dMRI data primarily capture tissue microstructure and diffusion information, they provide limited anatomical detail; we therefore introduce T1-weighted data to supply anatomical prior information. In addition, a frequency-domain-based loss is incorporated to preserve texture and structural information in the image, enhancing image quality. The entire network consists of a latent feature extractor and an implicit function network $f_{\theta}$. The latent feature extractor extracts features from the LR dMRI and anatomical images, while the implicit function network predicts voxel intensities at arbitrary coordinates by combining coordinates with the corresponding latent feature vectors, thereby achieving super-resolution. Specifically, consider a training set $\{(x^{i}_{LR}, x^{i}_{HR}, s_{i})\}_{i=1}^{N}$, where $x^{i}_{LR}$ is the channel-wise concatenation of the LR dMRI and anatomical images (hence two input channels), $x^{i}_{HR}$ is the corresponding HR dMRI, $N$ is the total number of samples, and $s_{i}$ is the upsampling scale factor of the $i$-th sample pair. We first use the latent feature extractor to transform $x^{i}_{LR}$ into a feature map $F^{i}$, where each element $F^{i}(p)$ is the feature vector of the LR dMRI at coordinate $p$. For any voxel coordinate $q$ in the HR dMRI, we obtain the corresponding feature vector $F^{i}(q)$ by bilinear interpolation of $F^{i}$. The queried coordinate $q$ and the feature vector $F^{i}(q)$ are then concatenated as input to $f_{\theta}$, whose output is the predicted voxel intensity $\hat{v}_{q}$ at spatial coordinate $q$. By minimizing the difference between the predicted intensity $\hat{v}_{q}$ and the real intensity $v_{q}$, the latent feature extractor and the implicit function network are optimized jointly.
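The query step described above can be sketched as follows; `grid_sample` performs the feature-map interpolation (trilinear for 3D volumes, which PyTorch labels 'bilinear'), and all tensor shapes and names are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def query_intensities(feature_map, coords, implicit_fn):
    """Sketch of the CSR-dMRI query step: interpolate the latent feature
    map F at HR voxel coordinates q and feed [q, F(q)] to f_theta.

    feature_map: (1, C, D, H, W) output of the latent feature extractor
    coords:      (N, 3) coordinates normalized to [-1, 1], (z, y, x) order
    implicit_fn: MLP mapping (N, 3 + C) -> (N, 1)
    """
    # grid_sample expects (x, y, z) ordering; its 'bilinear' mode performs
    # trilinear interpolation on 5D inputs.
    grid = coords.flip(-1).reshape(1, -1, 1, 1, 3)
    feats = F.grid_sample(feature_map, grid, mode='bilinear',
                          align_corners=True)             # (1, C, N, 1, 1)
    feats = feats.reshape(feature_map.shape[1], -1).t()   # (N, C)
    return implicit_fn(torch.cat([coords, feats], dim=-1))
```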
2.2.1 Latent Feature Extractor.
For the latent feature extractor, inspired by prior works [2, 23], we use a Residual Dense Network (RDN) [24] to extract latent-space feature vectors from the LR dMRI and anatomical image data. The latent feature extractor takes the LR data as input and outputs the feature map $F$. This latent-feature extraction approach helps the implicit function network effectively integrate local image information, enabling the recovery of details in HR images even at large upsampling scale factors. Additionally, to keep the scale of the extracted features unchanged, we remove the upsampling operation from the last layer of the original RDN and expand all 2D convolutional layers to 3D convolutional layers. To ensure a sufficient number of features at each coordinate position, the output channel number of the last layer is set to 128. A minimal sketch of this adaptation is given below.
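The sketch assumes a growth rate, layer/block counts, and a simplified fusion scheme (the full RDN also fuses features globally across blocks); only the 3D convolutions, the removed upsampling tail, and the 128-channel output follow the text.

```python
import torch
import torch.nn as nn

class RDB3D(nn.Module):
    """One residual dense block with 3D convolutions (simplified)."""
    def __init__(self, channels=64, growth=32, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Conv3d(channels + i * growth, growth, 3, padding=1)
            for i in range(n_layers)
        ])
        self.fuse = nn.Conv3d(channels + n_layers * growth, channels, 1)

    def forward(self, x):
        feats = [x]
        for conv in self.layers:          # densely connected conv layers
            feats.append(torch.relu(conv(torch.cat(feats, dim=1))))
        return x + self.fuse(torch.cat(feats, dim=1))  # local residual

class LatentExtractor3D(nn.Module):
    def __init__(self, in_ch=2, channels=64, n_blocks=4, out_ch=128):
        super().__init__()
        self.head = nn.Conv3d(in_ch, channels, 3, padding=1)
        self.blocks = nn.Sequential(*[RDB3D(channels) for _ in range(n_blocks)])
        self.tail = nn.Conv3d(channels, out_ch, 3, padding=1)  # no upsampling

    def forward(self, x):  # x: (B, 2, d, h, w) = dMRI and T1w, concatenated
        return self.tail(self.blocks(self.head(x)))
```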
2.2.2 Implicit Function Network.
The implicit function network $f_{\theta}$ consists of a sequence of 8 consecutive fully connected layers, each followed by an activation layer. A residual connection is established between the input of the network and the output of the fourth activation layer. The goal of $f_{\theta}$ is to predict the voxel intensity at any spatial coordinate $q$. The specific process is as follows:
\[
\hat{v}_{q} = f_{\theta}\big([q, F(q)]\big) \tag{2}
\]
where $q$ represents the spatial coordinate to be predicted, $F(q)$ represents the latent feature vector of the voxel at spatial coordinate $q$, and $[\cdot, \cdot]$ denotes concatenation. Instead of using spatial coordinates alone as inputs to the implicit function network, we combine the spatial coordinates with their corresponding latent feature vectors. This effectively integrates local semantic information from the image, enhancing the ability of $f_{\theta}$ to recover image details. A sketch of this architecture follows.
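In the sketch below, the hidden width, ReLU activations, and the linear projection used to match dimensions in the residual addition are assumptions not specified in the text.

```python
import torch
import torch.nn as nn

# Sketch of f_theta: 8 fully connected layers with a residual connection
# from the network input to the output of the fourth activation.
class ImplicitFunction(nn.Module):
    def __init__(self, in_dim=3 + 128, hidden=256):
        super().__init__()
        self.first_half = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.second_half = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.skip = nn.Linear(in_dim, hidden)  # dimension-matching skip (assumed)
        self.out = nn.Linear(hidden, 1)        # eighth layer: intensity

    def forward(self, x):  # x: (N, 3 + C) = [coords, latent feature vector]
        h = self.first_half(x) + self.skip(x)  # residual connection
        return self.out(self.second_half(h))
```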
2.2.3 Loss Function.
In image super-resolution tasks, the pixel-wise loss is commonly used to improve model performance and convergence [25]. However, since the pixel-wise loss does not take the perceptual quality of the image into account, the generated results tend to be smooth and lack high-frequency details. To address this issue, we introduce a frequency-domain-based loss that better captures the frequency content of the image, preserving both structure and texture details and thereby further improving image quality. It is defined as follows:
\[
\mathcal{L}_{freq} = \frac{1}{N}\sum_{i=1}^{N}\big\| \mathcal{F}(\hat{x}^{i}_{HR}) - \mathcal{F}(x^{i}_{HR}) \big\|_{1} \tag{3}
\]
where $\mathcal{F}(\hat{x}^{i}_{HR})$ and $\mathcal{F}(x^{i}_{HR})$ represent the fast Fourier transforms of the predicted and ground-truth HR images $\hat{x}^{i}_{HR}$ and $x^{i}_{HR}$, respectively, as shown in Figure 1, and $N$ denotes the total number of samples. Finally, the objective function of the CSR-dMRI model is defined as:
\[
\mathcal{L} = \mathcal{L}_{1} + \lambda \mathcal{L}_{freq} \tag{4}
\]
where $\mathcal{L}_{1}$ denotes the L1 loss between the predicted and ground-truth voxel intensities, and we set $\lambda = 0.01$ empirically to balance the two losses.
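Assuming the predicted voxel intensities are assembled back into HR patches during training, the two losses can be sketched as follows; taking the magnitude of the complex spectral difference is one common realization of Eq. (3), not necessarily the paper's exact form.

```python
import torch

def frequency_loss(pred_hr, gt_hr):
    """Sketch of Eq. (3): L1 distance between the FFTs of predicted and
    ground-truth HR patches (magnitude of the complex difference)."""
    pred_f = torch.fft.fftn(pred_hr, dim=(-3, -2, -1))
    gt_f = torch.fft.fftn(gt_hr, dim=(-3, -2, -1))
    return (pred_f - gt_f).abs().mean()

def total_loss(pred_hr, gt_hr, lam=0.01):
    """Eq. (4): image-domain L1 loss plus the weighted frequency loss."""
    l1 = (pred_hr - gt_hr).abs().mean()
    return l1 + lam * frequency_loss(pred_hr, gt_hr)
```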
3 Experiments
3.1 Datasets
In this paper, we utilize data from the Human Connectome Project (HCP) WU-Minn-Ox Consortium public dataset (https://www.humanconnectome.org). We select 100 pre-processed diffusion and T1-weighted MRI datasets: 70 for training, 10 for validation, and 20 for testing. The whole-brain diffusion MRI data were acquired at an isotropic resolution of 1.25 mm with four b-values (0, 1000, 2000, 3000 s/mm²). For the diffusion MRI data, we use only the b=1000 volumes, normalized by dividing by the average of the 18 b=0 volumes. We first extract 9 patches from each sample, crop them to generate HR patches, and finally downsample the HR patches by bicubic interpolation to obtain LR patches. The scale factor is randomly sampled from a uniform distribution $\mathcal{U}(2, 3)$.
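A sketch of this LR-patch generation step follows; the $\mathcal{U}(2, 3)$ range is an inference (consistent with 2× and 3× being in-distribution below), the patch size is a placeholder, and SciPy's cubic-spline zoom stands in for the bicubic interpolation named in the text.

```python
import numpy as np
from scipy.ndimage import zoom

rng = np.random.default_rng(0)

def make_lr_patch(hr_patch):
    """Downsample an HR patch by a randomly drawn scale factor."""
    scale = rng.uniform(2.0, 3.0)              # assumed U(2, 3)
    lr_patch = zoom(hr_patch, 1.0 / scale, order=3)  # cubic-spline downsample
    return lr_patch, scale

hr = np.random.rand(48, 48, 48).astype(np.float32)  # placeholder HR patch
lr, s = make_lr_patch(hr)
```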
3.2 Experimental Setup
All networks are trained using the PyTorch framework on one NVIDIA RTX A6000 GPU (48 GB memory). The Adam optimizer is employed for model training, and the batch size is set to 9. The learning rate is multiplied by 0.5 every 200 epochs, and the model is trained for a total of 1000 epochs. We compare our CSR-dMRI model with several state-of-the-art methods: a traditional interpolation method (bicubic), deep learning-based fixed-scale super-resolution methods (DCSRN [3] and ResCNN3D [4]), and an arbitrary-scale super-resolution method (ArSSR [23]). PSNR and SSIM are used for quantitative evaluation.
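The optimizer and schedule translate directly into PyTorch; in this sketch the initial learning rate is an assumption (its value did not survive extraction), and the model is a stand-in for the full extractor plus implicit network.

```python
import torch

model = torch.nn.Linear(131, 1)  # stand-in for extractor + implicit network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed initial lr
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)

for epoch in range(1000):
    # ... iterate over batches of 9 patch pairs, compute the loss in
    # Eq. (4), call loss.backward() and optimizer.step() ...
    scheduler.step()  # halve the learning rate every 200 epochs
```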
3.3 Results
To comprehensively assess the effectiveness of our approach, we conduct experiments in three settings. The first involves in-distribution experiments at 2× and 3× scale factors. The second involves out-of-distribution experiments, testing at a 4× scale factor. The third involves non-integer scale factors, testing at 2.4×.
Table 1. Quantitative comparison of different methods at in-distribution (2×, 3×), out-of-distribution (4×), and non-integer (2.4×) scale factors.

| Method | 2× (in-dist.) PSNR | 2× SSIM | 3× (in-dist.) PSNR | 3× SSIM | 4× (out-of-dist.) PSNR | 4× SSIM | 2.4× (non-integer) PSNR | 2.4× SSIM |
|---|---|---|---|---|---|---|---|---|
| Bicubic | 25.8880 | 0.9075 | 23.6657 | 0.8360 | 22.6657 | 0.7774 | 24.6268 | 0.8778 |
| DCSRN | 26.4039 | 0.8901 | 24.3236 | 0.8479 | 23.1425 | 0.7974 | 25.1044 | 0.8797 |
| ResCNN3D | 26.7900 | 0.9291 | 24.7841 | 0.8667 | 23.3626 | 0.8135 | 25.4493 | 0.8927 |
| ArSSR | 26.6123 | 0.9343 | 25.3040 | 0.9059 | 23.8770 | 0.8554 | 26.2907 | 0.9266 |
| CSR-dMRI | 27.3611 | 0.9458 | 26.0762 | 0.9235 | 24.5061 | 0.8752 | 27.1196 | 0.9410 |
3.3.1 Quantitative Results.
Table 1 shows the quantitative results of the different methods under in-distribution, out-of-distribution, and non-integer scale-factor conditions. Since DCSRN and ResCNN3D are trained at fixed integer scale factors, for the non-integer scale factor of 2.4× we first apply the fixed-scale model for super-resolution and then downsample its output to obtain the result; for the out-of-distribution scale factor of 4× we first apply the fixed-scale model and then upsample its output. The arbitrary-scale method ArSSR and our CSR-dMRI model are both trained at scale factors from 2× to 3×. Overall, our method achieves the best results across all scales. In out-of-distribution scenarios, our method outperforms existing approaches and maintains better performance even with non-integer scale factors. The experimental results confirm that our method excels in performance and generalization compared to existing methods and accommodates non-integer scale factors effectively, meeting the diverse imaging-resolution needs of medical professionals.
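For reference, the PSNR and SSIM values in Table 1 follow their standard definitions; the sketch below uses scikit-image, with the data range and SSIM window settings as assumptions, since the paper does not specify them.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(pred, gt):
    """Compute PSNR/SSIM between a reconstructed and a ground-truth volume."""
    data_range = gt.max() - gt.min()
    psnr = peak_signal_noise_ratio(gt, pred, data_range=data_range)
    ssim = structural_similarity(gt, pred, data_range=data_range)
    return psnr, ssim
```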
[Figure 2: Reconstruction results of different methods at the non-integer scale factor (2.4×) in axial, sagittal, and coronal views, with local zoomed-in regions.]
3.3.2 Qualitative Results.
To further compare the performance of the models, Figure 2 displays the reconstruction results of the various methods at a non-integer scale factor (2.4×) in the axial, sagittal, and coronal directions, along with local zoomed-in views of two regions. Visually, our method preserves more detailed information. Compared to ArSSR, CSR-dMRI performs better on image details, benefiting from the introduced anatomical images and the frequency-domain loss constraint. The locally magnified views also show that our method eliminates block artifacts and preserves texture details better. Overall, our method achieves the best results.
3.4 Ablation Study
To evaluate the effectiveness of the key components of the CSR-dMRI model, we conduct ablation experiments on the different components, as shown in Table 2. INR indicates whether implicit neural representation is used (as in ArSSR [23]), T1w indicates the inclusion of anatomical images, and $\mathcal{L}_{freq}$ indicates the introduction of the frequency-domain-based loss. The quantitative results in Table 2 show that each component contributes positively to the results. Furthermore, the structural prior information provided by anatomical images effectively enhances image quality.
Table 2. Ablation study of the key components of CSR-dMRI at in-distribution (2×, 3×), out-of-distribution (4×), and non-integer (2.4×) scale factors.

| Setting | INR | T1w | $\mathcal{L}_{freq}$ | 2× PSNR | 2× SSIM | 3× PSNR | 3× SSIM | 4× PSNR | 4× SSIM | 2.4× PSNR | 2.4× SSIM |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | | | | 26.9584 | 0.9352 | 24.8094 | 0.8789 | 23.3321 | 0.8240 | 25.4377 | 0.9015 |
| M1 | ✓ | | | 26.6123 | 0.9343 | 25.3040 | 0.9059 | 23.8770 | 0.8554 | 26.2907 | 0.9266 |
| M2 | ✓ | ✓ | | 27.0454 | 0.9444 | 25.8534 | 0.9222 | 24.2769 | 0.8734 | 26.9627 | 0.9408 |
| M3 | ✓ | | ✓ | 26.8047 | 0.9354 | 25.5623 | 0.9089 | 24.0327 | 0.8575 | 26.5996 | 0.9294 |
| CSR-dMRI | ✓ | ✓ | ✓ | 27.3611 | 0.9458 | 26.0762 | 0.9235 | 24.5061 | 0.8752 | 27.1196 | 0.9410 |
4 Conclusion
In this paper, we propose CSR-dMRI, a method for continuous super-resolution of diffusion MRI with anatomical structure-assisted implicit neural representation learning. Details and texture information in the image are preserved by incorporating anatomical images and a frequency-domain-based loss during training. Extensive experiments on the HCP dataset demonstrate the superior performance and generalization of our CSR-dMRI model, including its applicability to non-integer scale factors. This helps address the clinical demand for images at different resolutions.
4.0.1 Acknowledgements.
This research was partly supported by the National Natural Science Foundation of China (62222118, U22A2040), Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application (2022B1212010011), Shenzhen Science and Technology Program (RCYX20210706092104034, JCYJ20220531100213029), and Youth Innovation Promotion Association CAS.
References
- [1] Chatterjee, S., Sciarra, A., Dünnwald, M., Mushunuri, R.V., Podishetti, R., Rao, R.N., Gopinath, G.D., Oeltze-Jafra, S., Speck, O., Nürnberger, A.: ShuffleUNet: Super resolution of diffusion-weighted MRIs using deep learning. In: 2021 29th European Signal Processing Conference (EUSIPCO). pp. 940–944. IEEE (2021)
- [2] Chen, Y., Liu, S., Wang, X.: Learning continuous image representation with local implicit image function. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 8628–8638 (2021)
- [3] Chen, Y., Xie, Y., Zhou, Z., Shi, F., Christodoulou, A.G., Li, D.: Brain MRI super resolution using 3D deep densely connected neural networks. In: 2018 IEEE 15th international symposium on biomedical imaging (ISBI 2018). pp. 739–742. IEEE (2018)
- [4] Du, J., He, Z., Wang, L., Gholipour, A., Zhou, Z., Chen, D., Jia, Y.: Super-resolution reconstruction of single anisotropic 3D MR images using residual convolutional neural network. Neurocomputing 392, 209–220 (2020)
- [5] Li, C., Li, W., Liu, C., Zheng, H., Cai, J., Wang, S.: Artificial intelligence in multiparametric magnetic resonance imaging: A review. Medical physics 49(10), e1024–e1054 (2022)
- [6] Lim, B., Son, S., Kim, H., Nah, S., Mu Lee, K.: Enhanced deep residual networks for single image super-resolution. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops. pp. 136–144 (2017)
- [7] Luo, S., Zhou, J., Yang, Z., Wei, H., Fu, Y.: Diffusion MRI super-resolution reconstruction via sub-pixel convolution generative adversarial network. Magnetic Resonance Imaging 88, 101–107 (2022)
- [8] Nedjati-Gilani, S., Alexander, D.C., Parker, G.J.: Regularized super-resolution for diffusion MRI. In: 2008 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro. pp. 875–878. IEEE (2008)
- [9] Ning, L., Setsompop, K., Michailovich, O., Makris, N., Westin, C.F., Rathi, Y.: A compressed-sensing approach for super-resolution reconstruction of diffusion MRI. In: Information Processing in Medical Imaging: 24th International Conference, IPMI 2015, Sabhal Mor Ostaig, Isle of Skye, UK, June 28-July 3, 2015, Proceedings 24. pp. 57–68. Springer (2015)
- [10] Park, S.C., Park, M.K., Kang, M.G.: Super-resolution image reconstruction: A technical overview. IEEE signal processing magazine 20(3), 21–36 (2003)
- [11] Razek, A.A.K.A., Ashmalla, G.A.: Assessment of paraspinal neurogenic tumors with diffusion-weighted MR imaging. European Spine Journal 27, 841–846 (2018)
- [12] Scherrer, B., Gholipour, A., Warfield, S.K.: Super-resolution in diffusion-weighted imaging. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2011: 14th International Conference, Toronto, Canada, September 18-22, 2011, Proceedings, Part II 14. pp. 124–132. Springer (2011)
- [13] Shi, F., Cheng, J., Wang, L., Yap, P.T., Shen, D.: LRTV: MR image super-resolution with low-rank and total variation regularizations. IEEE transactions on medical imaging 34(12), 2459–2466 (2015)
- [14] Shi, F., Cheng, J., Wang, L., Yap, P.T., Shen, D.: Super-resolution reconstruction of diffusion-weighted images using 4D low-rank and total variation. In: Computational Diffusion MRI: MICCAI Workshop, Munich, Germany, October 9th, 2015. pp. 15–25. Springer (2016)
- [15] Tobisch, A., Neher, P.F., Rowe, M.C., Maier-Hein, K.H., Zhang, H.: Model-based super-resolution of diffusion MRI. In: Computational Diffusion MRI and Brain Connectivity: MICCAI Workshops, Nagoya, Japan, September 22nd, 2013. pp. 25–34. Springer (2014)
- [16] Umirzakova, S., Ahmad, S., Khan, L.U., Whangbo, T.: Medical image super-resolution for smart healthcare applications: A comprehensive survey. Information Fusion p. 102075 (2023)
- [17] Van Ouwerkerk, J.: Image super-resolution survey. Image and vision Computing 24(10), 1039–1052 (2006)
- [18] Wang, S., Cao, G., Wang, Y., Liao, S., Wang, Q., Shi, J., Li, C., Shen, D.: Review and prospect: Artificial intelligence in advanced medical imaging. Frontiers in radiology 1, 781868 (2021)
- [19] Wang, S., Wu, R., Jia, S., Diakite, A., Li, C., Liu, Q., Zheng, H., Ying, L.: Knowledge-driven deep learning for fast MR imaging: Undersampled MR image reconstruction from supervised to un-supervised learning. Magnetic Resonance in Medicine 92(2), 496–518 (2024)
- [20] Wang, S., Wu, R., Li, C., Zou, J., Zhang, Z., Liu, Q., Xi, Y., Zheng, H.: PARCEL: Physics-based unsupervised contrastive representation learning for multi-coil MR imaging. IEEE/ACM Transactions on Computational Biology and Bioinformatics 20(5), 2659–2670 (2022)
- [21] Wang, S., Xiao, T., Liu, Q., Zheng, H.: Deep learning for fast MR imaging: A review for learning reconstruction from incomplete k-space data. Biomedical Signal Processing and Control 68, 102579 (2021)
- [22] Wang, Z., Chen, J., Hoi, S.C.: Deep learning for image super-resolution: A survey. IEEE transactions on pattern analysis and machine intelligence 43(10), 3365–3387 (2020)
- [23] Wu, Q., Li, Y., Sun, Y., Zhou, Y., Wei, H., Yu, J., Zhang, Y.: An arbitrary scale super-resolution approach for 3D MR images via implicit neural representation. IEEE Journal of Biomedical and Health Informatics 27(2), 1004–1015 (2022)
- [24] Zhang, Y., Tian, Y., Kong, Y., Zhong, B., Fu, Y.: Residual dense network for image super-resolution. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2472–2481 (2018)
- [25] Zhao, H., Gallo, O., Frosio, I., Kautz, J.: Loss functions for image restoration with neural networks. IEEE Transactions on computational imaging 3(1), 47–57 (2016)