
¹ Image Processing Center, Beihang University, Beijing, China
² Robotics Institute, Beihang University, Beijing, China
³ Tongji Hospital, Huazhong University of Science and Technology, Wuhan, China
Email: {tsai, jackybxz}@buaa.edu.cn

MambaMorph: a Mamba-based Framework for Medical MR-CT Deformable Registration

Tao Guo¹, Yinuo Wang¹, Shihao Shu¹, Diansheng Chen², Zhouping Tang³, Cai Meng¹, Xiangzhi Bai¹
Abstract

Capturing voxel-wise spatial correspondence across distinct modalities is crucial for medical image analysis. However, current registration approaches fall short in both registration accuracy and clinical applicability. In this paper, we introduce MambaMorph, a novel multi-modality deformable registration framework. Specifically, MambaMorph utilizes a Mamba-based registration module and a fine-grained, yet simple, feature extractor for efficient long-range correspondence modeling and high-dimensional feature learning, respectively. Additionally, we develop a well-annotated brain MR-CT registration dataset, SR-Reg, to address the scarcity of data in multi-modality registration. To validate MambaMorph’s multi-modality registration capabilities, we conduct quantitative experiments on both our SR-Reg dataset and a public T1-T2 dataset. The experimental results on both datasets demonstrate that MambaMorph significantly outperforms the current state-of-the-art learning-based registration methods in terms of registration accuracy. Further study underscores the efficiency of the Mamba-based registration module and the lightweight feature extractor, which achieve notable registration quality at reasonable computational cost and speed. We believe that MambaMorph holds significant potential for practical applications in medical image registration. The code for MambaMorph is available at https://github.com/Guo-Stone/MambaMorph.

Keywords:
Multi-modality registration · Mamba · Feature learning.

1 Introduction

Deformable image registration is crucial in medical image analysis, compensating for non-rigid anatomical deformation caused by factors such as surgical intervention and differing imaging techniques. Accurate spatial alignment is essential before analysis; for example, deformations of up to 10 mm [20] between preoperative MR and intraoperative CT scans cannot be rectified by affine registration alone. Effective registration enables precise anatomical brain segmentation and surgical navigation.

While traditional registration methods, as illustrated by references such as [1, 7], can compute diffeomorphic displacement fields and achieve considerable registration accuracy, their extensive computational demands and lengthy processing times limit their applicability in real-time scenarios. Over the last decade, learning-based registration techniques, exemplified by VoxelMorph [2], have demonstrated their efficiency in fast registration, occasionally matching the accuracy of traditional methods. Within a similar registration framework, VoxelMorph and subsequent variants [4, 6, 13, 14, 16, 17, 18] concatenate the moving and fixed volumes into a single volume, which is then processed by a UNet-like module to compute the deformation field. Specifically, LKU-Net [16] integrates a parallel convolutional block to enlarge its receptive field. Similarly, TransMorph [4] employs a Swin Transformer in its encoder to capture long-distance correspondence, thereby enhancing registration precision. These approaches, which merge feature extraction and registration into a single process, place considerable demands on the network’s ability to learn effective features and ensure accurate alignment [21]. To mitigate these challenges, XMorpher [21] adopts a full transformer architecture, leveraging cross-attention to refine volume representation. Despite XMorpher’s impressive registration performance, it incurs a notable computational expense.

Volumes from different imaging modalities, such as MR and CT, display markedly different intensity distributions, which challenges existing registration methods. In multi-modality deformable registration, researchers typically face two main hurdles: (1) the scarcity of annotated data and (2) the difficulty of efficiently representing volumes from different modalities. To address the lack of multi-modality data, SynthMorph [13] generates volumes with a broad range of intensity distributions for training the registration network, enhancing the model’s ability to adapt to new modalities; however, it tends to converge slowly owing to the poor quality of the synthetic data. To reduce reliance on segmentation labels, ContraReg [5] uses a pretrained autoencoder to assess similarity between multi-modality volumes in a representation space. Similarly, SAME [19] employs a pretrained feature extractor to provide high-quality representations. Although these strategies improve registration accuracy, they struggle to adapt quickly to novel modalities. An alternative strategy for MR-CT registration [10] converts the multi-modality registration problem into two single-modality registrations via image synthesis, which removes the need for complex feature learning and multi-modality similarity measurement but results in an oversized, hard-to-train, and thus impractical framework. Moreover, a cross-modal attention mechanism [22] has been introduced to directly extract features from transrectal ultrasound and MR for both rigid and deformable registration, but its quadratic complexity and the associated heavy computational load restrict its application to small volume sizes.

To tackle the issues mentioned above, we propose a registration framework, MambaMorph, equipped with a novel registration module and a fine-grained feature extractor, and develop a well-annotated brain MR-CT registration dataset. To summarize, our contributions are as follows: (1) We introduce a registration module that incorporates an advanced sequence modeling technique named Mamba. Mamba [8] has proven its excellence in long-range modeling with linear complexity, particularly in natural language processing. Drawing inspiration from TransMorph [4], we integrate Mamba blocks into the encoder of our registration module. This integration, illustrated in Fig. 1, is designed to efficiently capture long-range spatial relationships while optimizing memory usage. (2) We propose a fine-grained feature extractor, placed before the registration module, for high-dimensional feature learning. Decoupled from the registration phase, this extractor is a simple UNet with only one down-sampling step, thereby capturing pixel-wise features for deformable registration. (3) We develop a brain MR-CT registration dataset, named SR-Reg, to address the data deficiency in brain MR-CT registration. SR-Reg comprises 180 pairs of well-aligned, skull-stripped, and intensity-rectified MR-CT volumes, each accompanied by corresponding segmentation labels. (4) Experimental results on two datasets demonstrate MambaMorph’s exceptional registration accuracy and practicality. On our SR-Reg dataset, MambaMorph surpasses the second-best method by nearly 6% in Dice score and achieves an inference speed of 0.27 seconds per pair. Additionally, MambaMorph attains the highest accuracy on a public T1-T2 dataset.

2 Method

Figure 1: The framework of MambaMorph.

To perform multi-modality (e.g., MR-CT, T1-T2) deformable registration, MambaMorph adopts two proposed blocks: a Mamba-based registration module and a fine-grained feature extractor. The framework of MambaMorph is illustrated in Fig. 1.

Given a pair of moving volume $x_m$ and fixed volume $x_f$, their corresponding segmentations $s_m$ and $s_f$, a registration module $\mathcal{R}_{\psi}(\cdot,\cdot)$, and a feature extractor $\mathcal{F}_{\theta}(\cdot)$, the training objective of registration in this paper can be formulated as follows:

$$\min_{\psi,\theta}\ \mathcal{L}_{dice}(s_m\circ\phi,\, s_f)+\lambda_s\,\mathcal{L}_{smooth}(\phi) \qquad (1)$$

where the deformation field $\phi=\mathcal{R}_{\psi}(f_m,f_f)$; the features $f_z=\mathcal{F}_{\theta}(x_z)$ for $z\in\{m,f\}$; the warping operation $\circ$ is applied by a spatial transformer network (STN); and $\mathcal{L}_{dice}(\cdot,\cdot)$ and $\mathcal{L}_{smooth}(\cdot)$ are the weakly-supervised loss and the smoothness loss from VoxelMorph [2].
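For concreteness, a minimal PyTorch sketch of this objective is given below (our own illustration, not the released code; `extractor`, `registration_module`, and `stn` are hypothetical names):

```python
import torch

def dice_loss(warped_seg, fixed_seg, eps=1e-5):
    # warped_seg, fixed_seg: one-hot label maps of shape (B, K, H, W, D)
    dims = (2, 3, 4)
    inter = (warped_seg * fixed_seg).sum(dims)
    union = warped_seg.sum(dims) + fixed_seg.sum(dims)
    return 1 - ((2 * inter + eps) / (union + eps)).mean()

def smooth_loss(phi):
    # phi: displacement field of shape (B, 3, H, W, D);
    # penalize squared finite-difference gradients, as in VoxelMorph [2].
    dx = (phi[:, :, 1:] - phi[:, :, :-1]).pow(2).mean()
    dy = (phi[:, :, :, 1:] - phi[:, :, :, :-1]).pow(2).mean()
    dz = (phi[:, :, :, :, 1:] - phi[:, :, :, :, :-1]).pow(2).mean()
    return (dx + dy + dz) / 3

# One training step (lambda_s = 0.1, cf. Sec. 3.1.2):
#   f_m, f_f = extractor(x_m), extractor(x_f)          # shared weights
#   phi = registration_module(f_m, f_f)
#   loss = dice_loss(stn(s_m, phi), s_f) + 0.1 * smooth_loss(phi)
```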

In this section, we introduce our Mamba-based registration module $\mathcal{R}_{\psi}(\cdot,\cdot)$ in subsection 2.1 and our feature extractor $\mathcal{F}_{\theta}(\cdot)$ in subsection 2.2.

2.1 Mamba-based registration module

Following the framework of TransMorph [4], MambaMorph’s registration module makes two modifications (i.e., Mamba blocks and position embedding), as shown in Fig. 1. After splitting a full-size volume (e.g., 192×208×176 voxels in this paper) into patches, one obtains a sequence of roughly 0.1M tokens (patches). Given its quadratic complexity with respect to sequence length, a transformer would impose a significant computational burden and thus lacks practicality in this context. As a potential replacement for the transformer, Mamba [8] is better suited to handling considerably long sequences (>1M tokens), with linear complexity. We therefore utilize Mamba blocks in our registration module to improve registration performance and reduce computational demands.
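As a quick back-of-the-envelope check of the sequence length (the 4×4×4 patch size here is our illustrative assumption):

```python
# Token count when a 192×208×176 volume is split into 4×4×4 patches
# (the patch size is an illustrative assumption).
H, W, D, p = 192, 208, 176, 4
n_tokens = (H // p) * (W // p) * (D // p)
print(n_tokens)  # 109824, i.e. roughly 0.1M tokens
```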

Developed from the SSM (State Space Model) [9], Mamba [8] can be seen as an input-dependent variant of the RNN. Similar to RNNs and transformers, Mamba accepts an input tensor of shape $B\times L\times C$ and outputs a tensor of the same shape, where $B$, $L$, $C$ denote the batch size, sequence length, and number of channels, respectively. In the main stream of a Mamba block, the input is first projected linearly via an MLP and then sent to a 1D convolution layer. After a SiLU/Swish activation, the tensor is processed by the SSM layer. Concretely, the original SSM layer [9] defines a sequence-to-sequence mapping $\boldsymbol{x}\in\mathbb{R}^{L\times C}\rightarrow\boldsymbol{y}\in\mathbb{R}^{L\times C}$. The formulation is:

$$\boldsymbol{z}_{l} = \boldsymbol{\bar{A}}\boldsymbol{z}_{l-1} + \boldsymbol{\bar{B}}\boldsymbol{x}_{l} \qquad (2)$$
$$\boldsymbol{y}_{l} = \boldsymbol{C}\boldsymbol{z}_{l} \qquad (3)$$

where $l\in\{1,\dots,L\}$ denotes the token index; $\boldsymbol{z}\in\mathbb{R}^{L\times N}$ is the hidden state and $N$ is the state dimension; $\boldsymbol{\bar{A}}\in\mathbb{R}^{N\times N}$, $\boldsymbol{\bar{B}}\in\mathbb{R}^{N\times C}$, and $\boldsymbol{C}\in\mathbb{R}^{C\times N}$ are three transition matrices, wherein $\boldsymbol{\bar{A}}$ and $\boldsymbol{\bar{B}}$ are discretized via zero-order hold. In this formulation, the parameters of the transition matrices are constant while the tokens vary, which limits the capacity of the original SSM layer. In the SSM layer of Mamba, by contrast, the three transition matrices are calculated, directly or indirectly, through linear projections of the input $\boldsymbol{x}_{l}$, making them input-dependent. With this input-dependent projection, the matrices can be viewed as gates in an LSTM [12] that efficiently establish correlations between tokens. Thanks to the input-dependent mechanism and several hardware-aware techniques, Mamba is more capable of long-range modeling at less computational cost than the transformer. Another branch of the Mamba block introduces a gating mechanism that is selective with respect to the data. Notably, we add a sinusoidal position embedding to the split input to make it position-aware.
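To make Eqs. (2)-(3) concrete, below is a sequential reference sketch of the original, input-independent SSM scan; Mamba replaces this Python loop with a hardware-aware parallel scan and computes $\boldsymbol{\bar{B}}$, $\boldsymbol{C}$, and the discretization step from the input itself:

```python
import torch

def ssm_scan(x, A_bar, B_bar, C):
    # x: (L, C_in); A_bar: (N, N); B_bar: (N, C_in); C: (C_in, N).
    # Plain recurrence of Eqs. (2)-(3) with constant transition matrices.
    z = torch.zeros(A_bar.shape[0])
    ys = []
    for x_l in x:
        z = A_bar @ z + B_bar @ x_l   # Eq. (2): hidden-state update
        ys.append(C @ z)              # Eq. (3): readout
    return torch.stack(ys)            # (L, C_in)
```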

MambaMorph’s registration module takes as input a volume concatenated from the moving and fixed features. The input flows into two branches: a horizontal one operating on the full volume and a UNet-like one where the volume is split. In the latter branch, the volume, added with sinusoidal position embedding, is split into a large number of patches (as shown in Fig. 1, split and position emb.) and treated as a sequence of tokens. After a linear projection, consecutive stages of Mamba blocks and patch merging are applied to the sequence. Patch merging is a variant of down-sampling for image sequences: it decreases the number of tokens from $B\times(H\times W\times D)\times C$ to $B\times(H/2\times W/2\times D/2)\times 8C$ and then projects the number of channels from $8C$ to $2C$, as sketched below. In the UNet-like module, the Mamba-based encoder aims to capture long-range correspondence while the CNN-based decoder handles local features.
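A minimal sketch of this 3D patch-merging step (our own illustration, modeled on the Swin-style 2D version):

```python
import torch
import torch.nn as nn

class PatchMerging3D(nn.Module):
    # Gather each 2x2x2 token neighborhood into one 8C-dim token,
    # then project the channels from 8C down to 2C.
    def __init__(self, dim):
        super().__init__()
        self.reduction = nn.Linear(8 * dim, 2 * dim, bias=False)

    def forward(self, x, H, W, D):
        # x: (B, H*W*D, C) token sequence with known 3D grid shape
        B, L, C = x.shape
        assert L == H * W * D
        x = x.view(B, H, W, D, C)
        # concatenate the 8 tokens of each 2x2x2 block along channels
        x = torch.cat([x[:, i::2, j::2, k::2, :]
                       for i in (0, 1) for j in (0, 1) for k in (0, 1)], dim=-1)
        x = x.view(B, (H // 2) * (W // 2) * (D // 2), 8 * C)
        return self.reduction(x)  # (B, (H/2)(W/2)(D/2), 2C)
```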

2.2 Fine-grained feature extraction

From our perspective, the current methods [2, 6, 14, 13, 4] can be seen as pure registration modules lacking effective feature extraction. These frameworks concatenate two volumes and treat them as a single volume to establish spatial correspondence. While effective for registering single-modality volumes, whose appearances are similar, this strategy may encounter difficulties in multi-modality registration. In that case, it is crucial to extract corresponding features from identical regions across two volumes, especially when the volumes exhibit significantly different appearances. Besides, GVSL [11] shows that deformable registration and feature extraction can mutually enhance each other. Considering that deformable registration is a pixel-wise task, we propose a fine-grained but simple UNet with only one down-sampling operation as our feature extractor, as shown in Fig. 1. The relatively small receptive field of this UNet is designed to maximize the retention of local information, supporting the fine-grained nature of the task. Unlike SAME [19], which leverages pretrained features, MambaMorph learns to extract features and register simultaneously. In our setting, the two feature extractors share weights.
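A minimal sketch of such an extractor, assuming single-channel inputs and the depth-two, 16-channel configuration reported in Sec. 3.1.2 (the specific layer choices, e.g., LeakyReLU and trilinear up-sampling, are our assumptions):

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv3d(c_in, c_out, 3, padding=1),
                         nn.LeakyReLU(0.2))

class FineGrainedExtractor(nn.Module):
    # A UNet with a single down-sampling step and 16 channels throughout;
    # the small receptive field preserves local, pixel-wise information.
    def __init__(self, c=16):
        super().__init__()
        self.enc0 = conv_block(1, c)
        self.down = nn.MaxPool3d(2)
        self.enc1 = conv_block(c, c)
        self.up = nn.Upsample(scale_factor=2, mode='trilinear',
                              align_corners=False)
        self.dec = conv_block(2 * c, c)

    def forward(self, x):
        e0 = self.enc0(x)              # full-resolution features
        e1 = self.enc1(self.down(e0))  # the single down-sampling step
        return self.dec(torch.cat([self.up(e1), e0], dim=1))  # skip connection

# Weight sharing: the same instance processes both volumes.
# extractor = FineGrainedExtractor()
# f_m, f_f = extractor(x_m), extractor(x_f)
```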

Figure 2: Example registration results among baselines and MambaMorph in two test cases of SR-Reg. Contours of the lateral and fourth ventricle segmentations (defined in the fixed CT volumes) are depicted in red and green, respectively.

3 Experiments

3.1 Experimental settings

3.1.1 Data usage and metrics

The SynthRAD 2023 dataset [23], originally introduced for a synthetic CT (sCT) challenge (https://synthrad2023.grand-challenge.org/), comprises 180 MR-CT pairs that are both accessible and well-registered, with each pair belonging to the same subject. Therefore, the brain segmentation labels for the CT scans are presumed to be identical to those for the corresponding MR scans [10]. Based on this dataset, we develop a new brain MR-CT registration dataset, named SR-Reg (SynthRAD Registration), provided under a CC-BY-NC 4.0 International license (https://creativecommons.org/licenses/by-nc/4.0/) and available at https://drive.google.com/drive/folders/1qxUM-PuvWe1S6GvWudyKUXY8p_jnP_gN?usp=drive_link. The main processing steps include brain segmentation using SynthSeg [3], skull stripping with SynthStrip [15], and intensity rectification. As a result, the SR-Reg dataset consists of 180 well-aligned, skull-stripped, and intensity-rectified MR-CT pairs with corresponding segmentation labels. The data processing flowchart is shown in the supplementary material.

We compare deformable registration methods on our SR-Reg dataset (MR-CT) and the public IXI dataset (https://brain-development.org/ixi-dataset/) (T1-T2). For training, validation, and testing, subjects are split 150/10/20 for SR-Reg and 427/50/100 for IXI, respectively. All volumes have dimensions of 192×208×176 voxels at a resolution of 1×1×1 mm³. In our experiments, the moving volumes are MR and T1, and the fixed volumes are CT and T2, respectively. We note that the volume pairs in the test set remain constant, and the intensity values of all volumes are normalized to [0,1] by min-max normalization.

We use the mean Dice similarity coefficient (Dice) over the 16 regions of interest and the 95% Hausdorff distance (HD95) to measure registration accuracy, the percentage of non-positive Jacobian determinants ($P_{|J_{\phi}|\leq 0}$) to evaluate the diffeomorphic property of the deformation field, and the inference time (Time), GPU memory usage (Memory), and number of parameters to evaluate the practicality of each model.
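For illustration, the $P_{|J_{\phi}|\leq 0}$ metric can be computed from a dense displacement field roughly as follows (a NumPy sketch; finite differences approximate the spatial gradients):

```python
import numpy as np

def nonpositive_jacobian_ratio(phi):
    # phi: displacement field of shape (H, W, D, 3), in voxel units.
    # The deformation is identity + displacement, so its Jacobian is
    # I + grad(phi); report the percentage of voxels with det <= 0.
    grads = np.stack(np.gradient(phi, axis=(0, 1, 2)), axis=-1)  # (H, W, D, 3, 3)
    det = np.linalg.det(grads + np.eye(3))
    return 100.0 * np.mean(det <= 0)
```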

3.1.2 Model selection and implementation

To validate MambaMorph’s superiority in multi-modality deformable registration, we compare it with three representative learning-based methods (i.e., VoxelMorph [2], TransMorph [4], and XMorpher [21]). The feature extractor of MambaMorph is a UNet of depth two with 16 channels at every layer. In the further study, we introduce two extra models: TransMorph-feat (TransMorph with our feature extractor) and MambaMorph-reg (MambaMorph without the feature extractor). Due to the large volume size, we reduce the depth of the UNet-like networks to 3 layers in all models except VoxelMorph. The weight of $\mathcal{L}_{smooth}$ is set to 0.1 and the learning rate to 0.0001. Each model is trained with the Adam optimizer for 100 epochs. All experiments are implemented in PyTorch on NVIDIA GeForce RTX 4090 GPUs.

3.2 Results and Analysis

3.2.1 Quantitative comparison

As summarized in Tab. 1, MambaMorph significantly outperforms three advanced registration methods on the SR-Reg dataset in terms of Dice and HD95. In particular, the Dice of MambaMorph is about 6% higher than that of the second-best method, the transformer-based XMorpher. We attribute this advantage to Mamba’s superior long-range modeling capability. As illustrated in Fig. 2, MambaMorph achieves visually superior registration results. On the IXI dataset as well, MambaMorph demonstrates superior registration accuracy. However, the diffeomorphic property of MambaMorph ($P_{|J_{\phi}|\leq 0}$) is not as outstanding as its registration accuracy (Dice and HD95). We speculate that the smoothness of Mamba’s output does not match that produced by a CNN.

Table 1: Quantitative comparison of the investigated methods on the SR-Reg and IXI test sets. Methods in bold are ours; values in bold are the best.
Methods          | SR-Reg (MR-CT)                                    | IXI (T1-T2)
                 | Dice (%) ↑   HD95 (mm) ↓   P_{|Jϕ|≤0} (%) ↓       | Dice (%) ↑    HD95 (mm) ↓   P_{|Jϕ|≤0} (%) ↓
Initial          | 62.42±3.29   3.73±0.41     -                      | 45.66±14.07   7.25±3.52     -
VoxelMorph [2]   | 74.20±2.86   2.81±0.39     0.14±0.02              | 81.36±2.42    2.12±0.36     0.38±0.12
TransMorph [4]   | 75.08±2.20   2.76±0.35     0.36±0.04              | 86.89±1.13    1.56±0.18     1.70±0.44
XMorpher [21]    | 76.84±2.54   2.44±0.31     0.17±0.04              | 80.45±2.16    2.14±0.32     0.73±0.29
MambaMorph       | 82.71±1.45   2.00±0.22     0.34±0.02              | 87.52±1.51    1.53±0.24     0.77±0.18

3.2.2 Further study

The results presented in Tab. 2 demonstrate the effectiveness of the Mamba block and the feature extractor, as well as the practicality of MambaMorph, in multi-modality deformable registration. Notably, MambaMorph-reg differs from TransMorph only in the encoder architecture. This slight difference allows MambaMorph-reg to surpass TransMorph in Dice by a large margin, illustrating the advantage of Mamba over the transformer in medical image registration. Comparing the Dice of MambaMorph-reg and MambaMorph, the inclusion of a feature extractor in the latter leads to a substantial improvement. The remarkable gain achieved with such a simple feature extractor proves the significance of feature learning in multi-modality registration. In addition, when both are equipped with a feature extractor, MambaMorph outperforms TransMorph-feat in almost all aspects, which again proves the efficacy of Mamba. Overall, MambaMorph not only reaches high registration accuracy but also maintains an acceptable computational load.

Table 2: Further study on SR-Reg. Feat denotes the feature extractor. TransMorph-feat is TransMorph with the feature extractor; MambaMorph-reg is the Mamba-based registration module alone.
Method            | Feat | Dice (%)     | Time (s) | Memory (GB) | Params (M)
VoxelMorph [2]    |  -   | 74.20±2.86   | 0.09     | 4.61        | 0.33
XMorpher [21]     |  -   | 76.84±2.54   | 2.19     | 3.62        | 8.40
TransMorph [4]    |  -   | 75.08±2.20   | 0.16     | 6.15        | 14.29
TransMorph-feat   |  ✓   | 80.60±1.44   | 0.27     | 8.34        | 14.57
MambaMorph-reg    |  -   | 78.98±2.02   | 0.16     | 5.41        | 7.31
MambaMorph        |  ✓   | 82.71±1.45   | 0.27     | 7.60        | 7.59

4 Conclusion

In this paper, we propose a Mamba-based framework for multi-modality deformable registration, named MambaMorph. The encoder of MambaMorph’s registration module adopts the recent Mamba network, which captures long-range correlations more effectively than the transformer. Recognizing the critical role of feature learning in multi-modality registration, we introduce a simple feature extractor in MambaMorph to learn fine-grained features. To alleviate the data deficiency in brain MR-CT registration, we develop an MR-CT dataset called SR-Reg. Experiments on both the SR-Reg and IXI datasets highlight Mamba’s potential and underscore the importance of feature learning in this setting. Although we have not yet fully explored the potential of Mamba and feature learning, these areas remain a focus of our future research. To the best of our knowledge, MambaMorph is the first method to integrate Mamba into image registration. We anticipate its adoption in clinical practice.


Acknowledgements

This work was supported in part by ***.

References

  • [1] Avants, B.B., Epstein, C.L., Grossman, M., Gee, J.C.: Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Medical image analysis 12(1), 26–41 (2008). https://doi.org/10.1016/j.media.2007.06.004
  • [2] Balakrishnan, G., Zhao, A., Sabuncu, M.R., Guttag, J., Dalca, A.V.: Voxelmorph: a learning framework for deformable medical image registration. IEEE transactions on medical imaging 38(8), 1788–1800 (2019). https://doi.org/10.1109/TMI.2019.2897538
  • [3] Billot, B., Greve, D.N., Puonti, O., Thielscher, A., Van Leemput, K., Fischl, B., et al.: Synthseg: Segmentation of brain mri scans of any contrast and resolution without retraining. Medical image analysis 86, 102789 (2023). https://doi.org/10.1016/j.media.2023.102789
  • [4] Chen, J., Frey, E.C., He, Y., Segars, W.P., Li, Y., Du, Y.: Transmorph: Transformer for unsupervised medical image registration. Medical image analysis 82, 102615 (2022). https://doi.org/10.1016/j.media.2022.102615
  • [5] Dey, N., Schlemper, J., Salehi, S.S.M., Zhou, B., Gerig, G., Sofka, M.: Contrareg: Contrastive learning of multi-modality unsupervised deformable image registration. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 66–77. Springer (2022). https://doi.org/10.1007/978-3-031-16446-0_7
  • [6] Dou, H., Bi, N., Han, L., Huang, Y., Mann, R., Yang, X., et al.: Gsmorph: Gradient surgery for cine-mri cardiac deformable registration. arXiv preprint arXiv:2306.14687 (2023), https://arxiv.org/abs/2306.14687
  • [7] Fischl, B.: Freesurfer. Neuroimage 62(2), 774–781 (2012). https://doi.org/10.1016/j.neuroimage.2012.01.021
  • [8] Gu, A., Dao, T.: Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752 (2023), https://arxiv.org/abs/2312.00752
  • [9] Gu, A., Goel, K., Ré, C.: Efficiently modeling long sequences with structured state spaces. arXiv preprint arXiv:2111.00396 (2021), https://arxiv.org/abs/2111.00396
  • [10] Han, R., Jones, C.K., Lee, J., Wu, P., Vagdargi, P., Uneri, A., et al.: Deformable mr-ct image registration using an unsupervised, dual-channel network for neurosurgical guidance. Medical image analysis 75, 102292 (2022). https://doi.org/10.1016/j.media.2021.102292
  • [11] He, Y., Yang, G., Ge, R., Chen, Y., Coatrieux, J.L., Wang, B., et al.: Geometric visual similarity learning in 3d medical image self-supervised pre-training. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 9538–9547 (2023)
  • [12] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735
  • [13] Hoffmann, M., Billot, B., Greve, D.N., Iglesias, J.E., Fischl, B., Dalca, A.V.: Synthmorph: learning contrast-invariant registration without acquired images. IEEE transactions on medical imaging 41(3), 543–558 (2021). https://doi.org/10.1109/TMI.2021.3116879
  • [14] Hoopes, A., Hoffmann, M., Fischl, B., Guttag, J., Dalca, A.V.: Hypermorph: Amortized hyperparameter learning for image registration. In: Information Processing in Medical Imaging: 27th International Conference, IPMI 2021, Virtual Event, June 28–June 30, 2021, Proceedings 27. pp. 3–17. Springer (2021). https://doi.org/10.1007/978-3-030-78191-0_1
  • [15] Hoopes, A., Mora, J.S., Dalca, A.V., Fischl, B., Hoffmann, M.: Synthstrip: Skull-stripping for any brain image. NeuroImage 260, 119474 (2022). https://doi.org/10.1016/j.neuroimage.2022.119474
  • [16] Jia, X., Bartlett, J., Zhang, T., Lu, W., Qiu, Z., Duan, J.: U-net vs transformer: Is u-net outdated in medical image registration? In: International Workshop on Machine Learning in Medical Imaging. pp. 151–160. Springer (2022). https://doi.org/10.1007/978-3-031-21014-3_16
  • [17] Joshi, A., Hong, Y.: R2net: Efficient and flexible diffeomorphic image registration using lipschitz continuous residual networks. Medical Image Analysis 89, 102917 (2023). https://doi.org/10.1016/j.media.2023.102917
  • [18] Kim, B., Kim, D.H., Park, S.H., Kim, J., Lee, J.G., Ye, J.C.: Cyclemorph: cycle consistent unsupervised deformable image registration. Medical image analysis 71, 102036 (2021). https://doi.org/10.1016/j.media.2021.102036
  • [19] Liu, F., Yan, K., Harrison, A.P., Guo, D., Lu, L., Yuille, A.L., et al.: Same: Deformable image registration based on self-supervised anatomical embeddings. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part IV 24. pp. 87–97. Springer (2021). https://doi.org/10.1007/978-3-030-87202-1_9
  • [20] Nowell, M., Rodionov, R., Diehl, B., Wehner, T., Zombori, G., Kinghorn, J., et al.: A novel method for implementation of frameless stereoeeg in epilepsy surgery. Neurosurgery 10(4),  525 (2014). https://doi.org/10.1227/NEU.0000000000000544
  • [21] Shi, J., He, Y., Kong, Y., Coatrieux, J.L., Shu, H., Yang, G., et al.: Xmorpher: Full transformer for deformable medical image registration via cross attention. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 217–226. Springer (2022). https://doi.org/10.1007/978-3-031-16446-0_21
  • [22] Song, X., Chao, H., Xu, X., Guo, H., Xu, S., Turkbey, B., et al.: Cross-modal attention for multi-modal image registration. Medical Image Analysis 82, 102612 (2022). https://doi.org/10.1016/j.media.2022.102612
  • [23] Thummerer, A., van der Bijl, E., Galapon Jr, A., Verhoeff, J.J., Langendijk, J.A., Both, S., et al.: Synthrad2023 grand challenge dataset: Generating synthetic ct for radiotherapy. Medical Physics (2023). https://doi.org/10.1002/mp.16529