Annealed Score-Based Diffusion Model for MR Motion Artifact Reduction
Abstract
Motion artifact reduction is one of the important research topics in MR imaging, as the motion artifact degrades image quality and makes diagnosis difficult. Recently, many deep learning approaches have been studied for motion artifact reduction. Unfortunately, most existing models are trained in a supervised manner, requiring paired motion-corrupted and motion-free images, or are based on a strict motion-corruption model, which limits their use for real-world situations. To address this issue, here we present an annealed score-based diffusion model for MRI motion artifact reduction. Specifically, we train a score-based model using only motion-free images, and then motion artifacts are removed by applying forward and reverse diffusion processes repeatedly to gradually impose a low-frequency data consistency. Experimental results verify that the proposed method successfully reduces both simulated and in vivo motion artifacts, outperforming the state-of-the-art deep learning methods.
Index Terms:
MRI, motion artifact, score-based models, diffusion models
I Introduction
Magnetic resonance imaging (MRI) is an imaging technique that provides various types of contrast images without radiation exposure or invasive procedure. Despite many advantages, MRI requires a long acquisition time due to its imaging physics. Furthermore, the long acquisition time leads to motion artifacts due to the movement of the patient, so the motion artifact is considered one of the main problems of MRI.
In addition, contrast agent injection may cause motion artifacts in MRI. For example, gadoxetic acid (Gd-EOB-DTPA) is a liver-specific MRI contrast agent that aids the diagnosis of diseases such as hepatocellular carcinoma and liver metastases [1, 2] by providing hepatobiliary phase (HBP) imaging [3]. However, the administration of Gd-EOB-DTPA can cause acute transient dyspnea, resulting in transient severe motion (TSM) [4]. If TSM occurs, the image quality of the arterial phase is degraded, and the accuracy of diagnosis can be affected. An algorithm to correct the motion artifact due to TSM is therefore required.
There have been several attempts to reduce the motion artifact of MRI by tracking the motion [5, 6], or changing sampling trajectory or imaging sequence [7, 8, 9]. However, they require additional devices or scan time, and the types of motion artifacts corrected by these methods are limited.
Motion artifact correction algorithms based on compressed sensing (CS) [10, 11, 12] have also been investigated. CS-based algorithms have shown high-quality results, but they have limitations such as the difficulty of hyperparameter tuning and high computational complexity. Furthermore, many CS algorithms require raw k-space data, which are rarely available in clinical environments due to storage limitations.
Recent studies for MRI motion artifact reduction are based on deep learning methods [13, 14, 15, 16, 17, 18, 19]. Deep learning methods have shown improved performance and reduced run time compared to previous methods. However, most of the deep learning methods are based on supervised learning approaches. Since paired motion-free and corrupted data are difficult to obtain, these methods usually utilize simulated motion artifact images to train the networks. Therefore, it is difficult to apply them to real motion-corrupted data.
To overcome the limitation of simulation-based deep learning methods, deep learning methods using unpaired data have also been explored. Some methods interpret the motion artifact reduction problem as image-to-image translation [20, 21], and address it with the CycleGAN architecture [22]. Although they utilize real motion artifact data, the performance of these algorithms is often limited because there is no explicit motion artifact rejection mechanism.
Recently, we proposed an algorithm for MR motion artifact reduction using bootstrap subsampling and aggregation [23]. Under the assumption that the motion artifact appears as k-space outliers, the method removes the motion artifact by rejecting k-space outliers in a probabilistic manner. Although our prior method outperforms other simulation-based or unpaired deep learning methods, it is limited when the motion artifact does not appear as sparse outliers in k-space.
Recently, score-based diffusion models [24, 25, 26] have shown remarkable performance in the field of image generation. In score-based diffusion models, a network that estimates the score, the gradient of the log probability density function, is first trained, and then images can be generated by solving the reverse-time stochastic differential equation (SDE). Furthermore, it has been verified that unconditionally trained diffusion models can be applied to solve various inverse problems by adjusting sampling procedure using constraints [27, 28, 29, 30, 31]. Importantly, the unconditionally trained diffusion models do not require paired data, so it is possible to solve inverse problems in an unsupervised manner.
Inspired by this, here we propose a novel MRI motion artifact reduction method using score-based diffusion models. Fig. 1 shows the overall procedure of the proposed method. During reverse diffusion, low-frequency data consistency is gradually imposed in an iterative manner, so that the overall structure of the original image is maintained and only the motion artifact components are removed.
In particular, our constraint is designed based on the observation that motion artifacts in MRI usually occur in the high-frequency region of k-space. This is because k-space acquisition is usually performed first in the center region and motion typically occurs some time after the start of acquisition, so k-space samples that include motion artifacts generally appear in high-frequency regions. Therefore, during reverse diffusion, the low-frequency region needs to be maintained and only the high-frequency region should be corrected by diffusion sampling. However, because the high-frequency region of k-space also contains fine image details, detailed structures can be altered or lost if the data consistency step in Eq. (9) is applied directly. To address this issue, we propose an annealed reverse sampling procedure in which the data consistency step is applied gradually and repeatedly to maintain the high-frequency details of the measurements.
The remainder of the paper is organized as follows. Section II introduces the background of score-based diffusion models. Section III presents the key idea of the proposed method. The experimental setting is explained in Section IV, and qualitative and quantitative results are shown in Section V. Sections VI and VII contain the discussion and conclusion of our paper.
II Backgrounds
II-A Score-Based Diffusion Models
A continuous diffusion process can be represented as $\{\mathbf{x}(t)\}_{t=0}^{1}$, where $t \in [0, 1]$ denotes the time variable. Here, $\mathbf{x}(0) \sim p_0$ where $p_0$ is the data distribution, and $\mathbf{x}(1) \sim p_1$ where $p_1$ refers to the noise distribution, which is commonly set to a Gaussian distribution. Then, the diffusion process can be modeled by the solution of the following stochastic differential equation:
$$d\mathbf{x} = \mathbf{f}(\mathbf{x}, t)\,dt + g(t)\,d\mathbf{w}, \tag{1}$$
where $\mathbf{f}$ is a drift coefficient, $g$ is a diffusion coefficient, and $\mathbf{w}$ denotes a standard Wiener process. By solving Eq. (1), it is possible to transmit a sample from the data distribution to that of the noise distribution through the forward diffusion process.
If it is possible to reverse the diffusion process in Eq. (1), then we can obtain samples of the data distribution from samples of the noise distribution. In [32], it was shown that the reverse process is also a diffusion process that can be modeled by the following reverse SDE:
$$d\mathbf{x} = \left[\mathbf{f}(\mathbf{x}, t) - g(t)^2 \nabla_{\mathbf{x}} \log p_t(\mathbf{x})\right] dt + g(t)\,d\bar{\mathbf{w}}, \tag{2}$$
where $\bar{\mathbf{w}}$ is also a standard Wiener process from time 1 to 0, and $\nabla_{\mathbf{x}} \log p_t(\mathbf{x})$ denotes the score function. Therefore, if the score function can be estimated, it is possible to derive the reverse diffusion process and generate samples of data distribution from random Gaussian noise.
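Among the many possible choices of $\mathbf{f}$ and $g$, we choose variance exploding SDE (VE-SDE) [26], where $\mathbf{f}$ and $g$ are defined by
$$\mathbf{f}(\mathbf{x}, t) = \mathbf{0} \tag{3}$$
and
$$g(t) = \sqrt{\frac{d[\sigma^2(t)]}{dt}}. \tag{4}$$
Then, the reverse SDE in Eq. (2) can be rewritten as:
$$d\mathbf{x} = -\frac{d[\sigma^2(t)]}{dt} \nabla_{\mathbf{x}} \log p_t(\mathbf{x})\,dt + \sqrt{\frac{d[\sigma^2(t)]}{dt}}\,d\bar{\mathbf{w}}. \tag{5}$$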
Here, the time index is usually discretized uniformly into intervals, and and can be defined as
(6) |
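To make the discretization concrete, the following is a minimal NumPy sketch. It assumes the geometric noise schedule that is commonly used with the VE-SDE in [26]; the function names and the values of `sigma_min`, `sigma_max`, and `n_steps` are illustrative assumptions, not the settings of this paper.

```python
import numpy as np

def ve_sigmas(n_steps, sigma_min, sigma_max):
    """Geometric noise schedule: sigma grows from sigma_min to sigma_max."""
    t = np.linspace(0.0, 1.0, n_steps)          # uniformly discretized time index
    return sigma_min * (sigma_max / sigma_min) ** t

def ve_forward(x0, sigma, rng):
    """Sample x(t) given x(0) under the VE-SDE: add Gaussian noise of std sigma."""
    return x0 + sigma * rng.standard_normal(x0.shape)

rng = np.random.default_rng(0)
sigmas = ve_sigmas(n_steps=1000, sigma_min=0.01, sigma_max=50.0)  # illustrative values
x0 = np.zeros((64, 64))                  # stand-in for a clean image
xt = ve_forward(x0, sigmas[-1], rng)     # heavily perturbed sample at the last step
```

Because the drift is zero in the VE-SDE, the forward process only adds noise, so a sample at any noise level can be drawn in one step without simulating the intermediate times.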
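The score function is generally estimated by training a neural network $s_\theta(\mathbf{x}(t), t)$ with denoising score matching [33]. The training of the score-based model with denoising score matching can be done by minimizing the following objective function:
$$\min_\theta \mathbb{E}_{t}\left\{\lambda(t)\,\mathbb{E}_{\mathbf{x}(0)}\mathbb{E}_{\mathbf{x}(t)|\mathbf{x}(0)}\left[\left\| s_\theta(\mathbf{x}(t), t) - \nabla_{\mathbf{x}(t)} \log p_{0t}(\mathbf{x}(t)\,|\,\mathbf{x}(0)) \right\|_2^2\right]\right\}, \tag{7}$$
where $\lambda(t)$ is a positive weighting function and $p_{0t}$ is the transition density from $\mathbf{x}(0)$ to $\mathbf{x}(t)$. After training the neural network and plugging it into Eq. (5), the reverse SDE can be solved by numerical SDE solvers or predictor-corrector (PC) samplers [26].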
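As an illustration of denoising score matching, the sketch below estimates the loss at a single noise level for a VE-type perturbation. It is a toy under stated assumptions: the weighting is taken as the squared noise level, the "network" is replaced by a closed-form score for Gaussian data, and `dsm_loss` and `gaussian_score` are hypothetical helper names.

```python
import numpy as np

def dsm_loss(score_fn, x0, sigma, rng):
    """Monte Carlo estimate of the denoising score matching loss at one noise level.

    With weighting sigma**2, the per-sample residual is sigma * s(x_t, sigma) + z,
    because the conditional target score is -z / sigma.
    """
    z = rng.standard_normal(x0.shape)
    xt = x0 + sigma * z                              # VE perturbation of clean data
    residual = sigma * score_fn(xt, sigma) + z
    return np.mean(np.sum(residual.reshape(len(x0), -1) ** 2, axis=1))

def gaussian_score(x, sigma):
    """Exact score of N(0, (1 + sigma^2) I), the noisy marginal of N(0, I) data."""
    return -x / (1.0 + sigma ** 2)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((2048, 16))                 # toy "clean" data from N(0, I)
loss_true = dsm_loss(gaussian_score, x0, sigma=1.0, rng=rng)
loss_zero = dsm_loss(lambda x, s: np.zeros_like(x), x0, sigma=1.0, rng=rng)
```

The exact marginal score gives a lower loss than the trivial zero predictor, which is the behaviour the objective is meant to reward. The loss does not reach zero even for the exact score, since the conditional target is noisy.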
II-B Come-Closer-Diffuse-Faster (CCDF)
The main drawback of score-based diffusion models is their slow sampling. Because sampling starts from random Gaussian noise and usually requires thousands of steps, generating a single image takes a long time. In the prior work [28], the authors proposed a method called Come-Closer-Diffuse-Faster (CCDF) to reduce the sampling time of diffusion models when solving inverse problems. Specifically, instead of starting sampling from random Gaussian noise, forward diffusion is first applied to an initial reconstruction, so that only a few steps of reverse diffusion are needed to obtain the final reconstruction.
More specifically, Algorithm 1 shows the CCDF sampling procedure using the PC sampler, where denotes the initial measurement, and is the number of reverse diffusion steps where . Here, the data consistency step should be non-expansive to maintain the stochastic contraction mapping nature of reverse diffusion sampling [28]. With a better initialization followed by one-step forward diffusion, CCDF largely reduces the reverse sampling time for solving inverse problems [28].
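The CCDF idea can be sketched on a toy problem as follows. This is not Algorithm 1 itself: it uses a predictor-only reverse step (no corrector), a closed-form score for data drawn from N(0, 1) instead of a trained network, and an optional `data_consistency` callable as a placeholder. All names and values are assumptions made for illustration.

```python
import numpy as np

def ve_sigmas(n_steps, sigma_min=0.01, sigma_max=50.0):
    t = np.linspace(0.0, 1.0, n_steps)
    return sigma_min * (sigma_max / sigma_min) ** t

def ccdf_sample(x_init, score_fn, sigmas, n_start, rng, data_consistency=None):
    """Forward-diffuse x_init to step n_start, then run n_start reverse steps."""
    # One-step forward diffusion from the initialization
    x = x_init + sigmas[n_start] * rng.standard_normal(x_init.shape)
    for i in range(n_start, 0, -1):
        step = sigmas[i] ** 2 - sigmas[i - 1] ** 2
        z = rng.standard_normal(x.shape)
        # Discretized reverse VE-SDE step (predictor only)
        x = x + step * score_fn(x, sigmas[i]) + np.sqrt(step) * z
        if data_consistency is not None:
            x = data_consistency(x)          # should be non-expansive
    return x

def toy_score(x, sigma):
    """Exact score of the noisy marginal when clean data follow N(0, 1)."""
    return -x / (1.0 + sigma ** 2)

rng = np.random.default_rng(0)
sigmas = ve_sigmas(1000)
n_start = 500                                # half of the full chain, for illustration
x_init = np.full(4096, 3.0)                  # "corrupted" initialization far from the data
x_out = ccdf_sample(x_init, toy_score, sigmas, n_start, rng)
```

In this toy setting the samples are pulled from the initialization toward the data distribution while using only part of the full reverse chain, which is the source of the speed-up.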
III Main Contribution
III-A Motivation
In our prior work [23], we solved the motion artifact reduction problem by regarding motion artifacts as sparse outliers in k-space. Specifically, if the motion is caused by translation or rotation, it is assumed to result in a k-space phase shift or rotation at specific phase encoding lines:
(8) |
where denotes the motion-corrupted k-space with the indices along the frequency encoding direction and phase encoding direction , and is the motion-free image, denotes the Fourier transform, denotes the rotation operation with the angle , is the displacement in radians, and are the phase encoding indices where the rotation or translation occurred.
Based on this assumption, the network is trained to reconstruct fully sampled motion-free images from images randomly sub-sampled along the phase encoding direction, so that the corrupted k-space data can be removed in a probabilistic manner. Although this method does not require simulated motion artifact images and shows improved performance, it is difficult to apply when the motion artifacts cannot be considered as sparse outliers in k-space. Furthermore, because the index of outliers is not known, some outliers that are not removed by subsampling can remain in the reconstructed image.
III-B Proposed Method
Rather than using the sparse outlier assumption in Eq. (8), our method is based on a more relaxed assumption that the motion artifacts in MRI mainly occur in the high-frequency region of k-space. This is because k-space acquisition is usually performed first in the center region and motion typically occurs some time after the start of acquisition, so that k-space samples with motion artifacts generally appear in high-frequency regions. Therefore, the high-frequency region of k-space should be corrected to remove the motion artifact.
The application of CCDF in Algorithm 1 starts from the one-step forward diffusion from the initialization. Then, a naive way of using data consistency for reverse diffusion would be to impose the low-frequency region consistency:
(9) |
where is the operator that samples only the low-frequency region of k-space. In other words, during reverse diffusion, the low-frequency region is maintained so that only the high-frequency region is corrected by the diffusion model.
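A low-frequency data consistency step of this kind can be sketched with NumPy's FFT as below. This is one plausible reading rather than the exact operator of Eq. (9): the mask is taken as a central band of phase-encoding lines along axis 0, and `lowfreq_mask`, `lowfreq_dc`, and `keep_fraction` are hypothetical names introduced for the example.

```python
import numpy as np

def lowfreq_mask(shape, keep_fraction, pe_axis=0):
    """Boolean mask selecting a central band of phase-encoding lines (centred k-space)."""
    n = shape[pe_axis]
    half = max(1, int(round(n * keep_fraction / 2)))
    band = np.zeros(n, dtype=bool)
    band[n // 2 - half : n // 2 + half] = True
    view = [1] * len(shape)
    view[pe_axis] = n
    return np.broadcast_to(band.reshape(view), shape)

def fft2c(img):
    """2-D FFT with the zero frequency shifted to the array centre."""
    return np.fft.fftshift(np.fft.fft2(img))

def ifft2c(k):
    """Inverse of fft2c."""
    return np.fft.ifft2(np.fft.ifftshift(k))

def lowfreq_dc(x, y, mask):
    """Take low-frequency k-space from the measurement y and the rest from the estimate x."""
    return ifft2c(np.where(mask, fft2c(y), fft2c(x)))

rng = np.random.default_rng(0)
y = rng.standard_normal((32, 32))            # stand-in for the motion-corrupted image
x = rng.standard_normal((32, 32))            # stand-in for the current diffusion estimate
mask = lowfreq_mask(y.shape, keep_fraction=0.25)
x_dc = lowfreq_dc(x, y, mask)                # complex-valued in general
```

After this step the estimate agrees with the measurement in the central k-space band and is left untouched elsewhere, so only the high-frequency content is determined by the diffusion model.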
However, because the high-frequency region of k-space also contains fine image details, detailed structures can be altered or lost if the data consistency step in Eq. (9) is applied directly. To address this issue, we propose an annealed data consistency step to maintain the high-frequency details of the measurements, as illustrated in Fig. 2:
(10) |
where is the annealing hyperparameter to control the weight of high-frequency components of the measurement. Furthermore, as shown in Algorithm 2, we choose relatively small , and repeat forward and reverse processes times so that the high-frequency components of the measurement are gradually added at each data consistency step.
Here, Eq. (10) can be written as
where
Since [28], it is also true that . Therefore, is a non-expansive mapping, so it can accelerate the reverse diffusion process through the CCDF principle [28].
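The non-expansiveness argument can be checked numerically on a small example. The sketch below assumes a specific convex-blend form for the annealed step (measured k-space inside the low-frequency mask, and a weighted mix of the estimate and the measurement outside it). This form, the name `annealed_dc`, and the weight `lam` are assumptions for illustration and may differ from Eq. (10).

```python
import numpy as np

def fft2c(img):
    """Unitary centred 2-D FFT, so Euclidean norms are preserved."""
    return np.fft.fftshift(np.fft.fft2(img, norm="ortho"))

def ifft2c(k):
    return np.fft.ifft2(np.fft.ifftshift(k), norm="ortho")

def annealed_dc(x, y, mask, lam):
    """Assumed blend: k-space of y inside the mask; outside it,
    (1 - lam) * estimate + lam * measurement."""
    kx, ky = fft2c(x), fft2c(y)
    k_high = (1.0 - lam) * kx + lam * ky
    return ifft2c(np.where(mask, ky, k_high))

rng = np.random.default_rng(0)
shape = (32, 32)
mask = np.zeros(shape, dtype=bool)
mask[12:20, :] = True                        # central phase-encoding lines
y = rng.standard_normal(shape)
x1 = rng.standard_normal(shape)
x2 = rng.standard_normal(shape)

lam = 0.01                                   # illustrative weight
d_in = np.linalg.norm(x1 - x2)
d_out = np.linalg.norm(annealed_dc(x1, y, mask, lam) - annealed_dc(x2, y, mask, lam))
```

Under this assumed form, the distance between two estimates cannot grow after the step, because the measurement terms cancel and the remaining high-frequency part is scaled by a factor of at most one.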
III-C Implementation Details
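In our implementation, we choose VE-SDE, which results in the following one-step forward sampling:
$$\mathbf{x}(t) = \mathbf{x}(0) + \sigma(t)\,\mathbf{z}, \tag{11}$$
where $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ and $\mathbf{x}(0)$ is the clean training data. By plugging this in Eq. (7), we have the following cost function [26]: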
(12) |
Here, we choose the number of discretized steps , and and in Eq. (6) are set to and , respectively. We train the score model for 1.3M iterations and follow [30] for the setting of other hyperparameters such as optimization, batch size, learning rate, gradient clipping, or exponential moving average.
In addition, for , and in Algorithm 2, we choose , , and
(13) |
where . In other words, linearly decreases to 0 as goes to 1, so the weight of high-frequency components of the measurement decreases as reverse diffusion proceeds.
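A linearly decaying annealing weight can be written in a few lines. The exact form of Eq. (13) is not reproduced here, so the schedule below is only an assumed instance: the step index runs from `n_reverse` down to 1, and the weight starts at `lam0` and reaches 0 at the last step. Both names and the values used are hypothetical.

```python
import numpy as np

def linear_annealing(lam0, n_reverse):
    """Weights for reverse steps i = n_reverse, ..., 1, decaying linearly to 0 at i = 1."""
    if n_reverse == 1:
        return np.zeros(1)
    i = np.arange(n_reverse, 0, -1)
    return lam0 * (i - 1) / (n_reverse - 1)

weights = linear_annealing(lam0=0.01, n_reverse=10)   # illustrative values
```

With such a schedule, the measurement's high-frequency content has the most influence at the start of each reverse pass and none at the final step.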
In CCDF [28], it was shown that a better initialization provides faster reverse sampling. Accordingly, a neural network (NN) initialization can be used if available, as it is better than the original artifact-corrupted image. We therefore employed NN initialization with [20] for the brain dataset and [23] for the liver dataset.
IV Methods
IV-A Experimental Data
In our experiments, we use two MR datasets. The first is the Human Connectome Project (HCP) dataset, a public dataset that contains human brain MR images. It was acquired on a Siemens 3T system with a 3D spin echo sequence, and the scan parameters are as follows: TR = 3200 ms, TE = 565 ms, echo train duration = 1105, matrix size = 320 × 320, voxel size = 0.7 × 0.7 × 0.7 mm³, and phase encoding direction = anterior-posterior. Because the HCP dataset does not contain motion-corrupted images, it is used for quantitative evaluation with motion artifact simulation. The score model is trained with 3000 motion-free MR images from 150 subjects, and another 800 images from 40 subjects are used for testing.
The second dataset was collected at Chungnam National University Hospital (CNUH), and it includes Gd-EOB-DTPA-enhanced liver MR images. It was obtained on a 3T Philips Achieva MR system with the following scan parameters: TR = 3.1 ms, TE = 1.5 ms, flip angle = 10°, field of view = 256 × 256 mm², slice thickness/intersection gap = 2/0 mm, acquisition matrix = 320 × 192, and phase encoding direction = anterior-posterior. Dynamic imaging including various phases was obtained, but only arterial phase images are used for experiments because TSM usually occurs during the arterial phase. The liver dataset consists of two groups: motion-free images and motion-corrupted images. For the training of the score model, 3097 motion-free images from 18 subjects are used. After training, 444 simulated motion-corrupted images from 5 subjects are selected for the quantitative evaluation, and 38 MR volumes with in vivo motion-corrupted images are used for qualitative and radiologist evaluations.
IV-B Artifact Simulation
For the quantitative evaluation, we used simulated motion artifacts. The simulation was performed similarly to prior works [16, 23]. The first type of motion artifact that we simulate is random translation and rotation. We simulate the first type of motion artifact with the HCP brain dataset. The motion artifact with random translation and rotation can be simulated by Eq. (8) with
(14) |
where denotes the rotation angle, and denote the degree of motion along and direction, respectively, and is the delay time of the phase error due to the centric k-space filling. In our simulation, is fixed to , is randomly sampled from , and are sampled from and , respectively, at each k-space line.
The second type of simulated motion is respiratory motion, which appears as a sinusoidal function in k-space [16, 23]:
(15) |
where , , and denote the amplitude, period, and phase shift of the sinusoidal function, respectively. Because the respiratory motion appears in abdominal MR images, we simulate it with the liver MR image dataset. Parameters for the simulation are sampled as follows: , , , and , where denotes the uniform distribution with the interval
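The translation part of such a simulation can be illustrated with the Fourier shift theorem: shifting the object multiplies each k-space line by a linear phase. The sketch below is a simplified stand-in, not Eq. (14) or Eq. (15): rotation and the sinusoidal respiratory model are omitted, axis 0 is assumed to be the phase-encoding direction, and `max_shift_px` and `clean_fraction` are hypothetical parameters.

```python
import numpy as np

def simulate_translation_artifact(img, max_shift_px, clean_fraction, rng):
    """Apply a random per-line translation phase to the outer phase-encoding lines.

    Returns the corrupted complex image and a boolean index of the untouched lines.
    """
    ny, nx = img.shape
    k = np.fft.fftshift(np.fft.fft2(img))
    ky = np.fft.fftshift(np.fft.fftfreq(ny))[:, None]    # cycles per pixel
    kx = np.fft.fftshift(np.fft.fftfreq(nx))[None, :]
    dx = rng.uniform(-max_shift_px, max_shift_px, size=(ny, 1))
    dy = rng.uniform(-max_shift_px, max_shift_px, size=(ny, 1))
    # Centre lines are acquired first (centric filling), before motion starts
    clean = np.abs(np.arange(ny) - ny // 2) < clean_fraction * ny / 2
    dx[clean] = 0.0
    dy[clean] = 0.0
    phase = np.exp(-2j * np.pi * (kx * dx + ky * dy))    # Fourier shift theorem
    return np.fft.ifft2(np.fft.ifftshift(k * phase)), clean

rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))                      # stand-in for a motion-free image
corrupted, clean = simulate_translation_artifact(
    img, max_shift_px=2.0, clean_fraction=0.3, rng=rng)
```

Only the phase of the outer lines changes, so the k-space magnitude is preserved while the image shows ghosting along the phase-encoding direction.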
IV-C Comparison Methods
We compared our method with three state-of-the-art methods to verify the performance of the method. The first comparison method is MARC [16], a method for reducing liver MRI motion artifacts. Because it is a supervised method, we train MARC models using simulated motion-corrupted images with Eqs. (14) and (15).
The second comparison method is Cycle-MedGAN V2.0 [20], an unpaired deep learning method based on CycleGAN [22]. Cycle-MedGAN V2.0 can be trained with either simulated or in vivo motion-corrupted data, but we train it with only simulated motion-corrupted data because its training was unstable when using in vivo data.
We also employed the bootstrap subsampling and aggregation method in [23] as a comparison method. Because this method requires only motion-free images during training, simulated or in vivo motion-corrupted images were not used during training.
IV-D Evaluation Methods
For the quantitative evaluation, we used the peak signal-to-noise ratio (PSNR) and the structural similarity index metric (SSIM). Because there is no ground truth matched with in vivo motion-corrupted images, the quantitative evaluation was performed with simulated motion-corrupted images.
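For reference, PSNR can be computed as in the short sketch below. The choice of `data_range` (here the dynamic range of the reference image unless given explicitly) is an assumption, since the normalization used in the experiments is not stated.

```python
import numpy as np

def psnr(reference, estimate, data_range=None):
    """Peak signal-to-noise ratio in dB between a reference and an estimate."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    if data_range is None:
        data_range = reference.max() - reference.min()
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

ref = np.linspace(0.0, 1.0, 256).reshape(16, 16)
est = ref + 0.1                              # constant error of 0.1, so MSE = 0.01
value = psnr(ref, est, data_range=1.0)       # 10 * log10(1 / 0.01), about 20 dB
```

SSIM is more involved because it compares local means, variances, and covariances; an existing implementation such as `skimage.metrics.structural_similarity` is a common choice.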
In addition, we conducted a clinical evaluation of the results obtained with in vivo motion-corrupted data. Specifically, a radiologist with 13 years of experience in abdominal MR imaging analyzed the results of the various methods from several perspectives. First, the performance in reducing motion artifacts is rated using a 5-point scoring system: 1 = non-diagnostic (severe artifacts causing impaired diagnostic capability of the readers); 2 = substantial artifacts with image quality decrease, but without diagnostic performance impairment; 3 = mild artifacts, no significant (only mild) image quality disturbance; 4 = minimal artifacts, sharp image; 5 = no artifacts. The image noise level is also evaluated with the following scoring system: 1 = non-diagnostic (severe noise causing impaired diagnostic capability of the readers); 2 = substantial noise with image quality decrease, but without diagnostic performance impairment; 3 = mild noise, no significant (only mild) image quality disturbance; 4 = minimal noise; 5 = no noise. Next, because blurring can be induced when reducing the motion artifact, the image blurring level is rated: 1 = non-diagnostic (severely pixelated texture causing impaired diagnostic capability of the readers); 2 = substantially pixelated, artificial sensation with concerns about the loss of normal texture, without diagnostic performance impairment; 3 = mildly pixelated, artificial sensation, without image quality decrease; 4 = minimal alteration of image texture; 5 = no alteration of image texture.
Furthermore, because the hepatic artery (HA) on the arterial phase should be visualized clearly, the vessel clarity is evaluated with the following scoring system: 1 = not delineated due to motion or low signal-to-noise ratio (SNR); 2 = blur or decreased SNR; 3 = clear common hepatic artery (CHA) and proper hepatic artery (PHA), but blurred HA and gastroduodenal artery (GDA); 4 = entire HA is clearly visible, clear CHA, GDA, bilateral HA; 5 = as in score 4, with a strong contrast-to-noise ratio. Last, the overall image quality is assessed by the following scoring system: 1 = non-diagnostic; 2 = not satisfactory image quality, but re-examination is not needed; 3 = acceptable image quality (image quality may not be very good, but clinically acceptable); 4 = good image quality without significant artifact; 5 = excellent image quality without artifact and good spatial resolution. The score is rated for each volume in all assessments. The results were presented to the radiologist in a random order without any labeling for a fair comparison.
V Results
V-A Results with Simulated Data
Fig. 3 shows the motion artifact reduction results of various methods with random simulated motion-corrupted data. As shown in Fig. 3(a), it is hard to recognize detailed structures of brains due to motion artifacts. MARC [16] reduces the motion artifact, but its output images are overly blurred or smoothed (Fig. 3(b)). In the results of MARC, the boundary between gray matter and white matter is not clear (the first row in Fig. 3(b)), and the structure of the choroid plexus is not properly restored (the second row in Fig. 3(b)). Next, in Fig. 3(c) and (d), Cycle-MedGAN V2.0 [20] and bootstrap subsampling and aggregation [23] remove random motion artifacts significantly and show improved quantitative results compared to input images. However, there are some differences between label images and outputs of Cycle-MedGAN V2.0 as shown in difference maps, and bootstrap subsampling and aggregation [23] shows blurrier edge details compared to label images. On the other hand, as shown in Fig. 3(e), the proposed method shows the best qualitative and quantitative results among all methods. In particular, the proposed method shows the sharpest boundary between gray and white matter among the methods, as shown in the first row of Fig. 3.
Next, we compare motion artifact reduction methods using simulated respiratory motion-corrupted data. In Fig. 4(a), the vasculature of the liver is damaged or blurred due to motion artifacts. In particular, artifacts appear most severe around blood vessels. MARC removes motion artifacts and achieves high quantitative metric values, but the blood vessels still look blurry as shown in Fig. 4(b). On the other hand, Cycle-MedGAN V2.0 [20] produces sharp reconstructed results, but its PSNR is lower than that of the input images (Fig. 4(c)). This may be because Cycle-MedGAN V2.0 changes image intensity or details. Results of bootstrap subsampling and aggregation [23] are shown in Fig. 4(d), with reduced motion artifacts and improved quantitative metrics compared to input images. However, some motion artifacts near the blood vessels remain (the first row of Fig. 4(d)), and it is hard to recognize the vessel due to blurring and remaining artifacts (the second row of Fig. 4(d)). Meanwhile, the proposed method shows the most similar restoration results to the label images as shown in Fig. 4(e) and (f). Specifically, the vascular structure is most clearly and accurately restored by the proposed method. Furthermore, our method significantly reduces motion artifacts around the blood vessels compared to other methods.
TABLE I shows the quantitative metric values of motion artifact reduction methods. In experiments using simulated random motion-corrupted data, the proposed method achieves the highest PSNR and SSIM, which is consistent with the qualitative results in Fig. 3. On the other hand, MARC shows the highest quantitative results when using simulated respiratory motion-corrupted data. However, as confirmed in Figs. 3 and 4, reconstructed images by MARC are extremely blurred, so the detailed structures are indistinguishable. Compared to MARC, the proposed method removes the motion artifacts without losing information on image details. Furthermore, the quantitative metric values of our method are the highest among those of the unpaired/unsupervised methods.
V-B Results with In Vivo Data
Because the simulated motion artifacts only consider rigid motion, it should be verified that the method can also be applied to non-rigid in vivo motion artifact removal. In Fig. 5(a), motion artifacts due to transient dyspnea degrade the quality of the liver MR image. We attempt to remove the motion artifacts in Fig. 5(a), and the results are shown in Fig. 5(b) to (e). Again, MARC removes not only motion artifacts but also detailed structures of blood vessels, so the reconstructed image is extremely blurry (Fig. 5(b)). Conversely, in Fig. 5(c), Cycle-MedGAN V2.0 makes the image sharper, but it also amplifies motion artifacts or noise in the input image. Next, the bootstrap subsampling and aggregation method also fails to remove the motion artifacts. Specifically, as shown in the yellow and green boxes of Fig. 5(d), motion artifacts around the blood vessels remain in the output image. Unlike the comparison methods, the proposed method successfully removes the motion artifacts and reduces the noise level of the input image. Furthermore, our method reconstructs detailed structures. For example, in the yellow box of Fig. 5(e), the sharpness of the lesion increases as the motion artifact disappears. Also, the vascular structure is recovered due to the reduction of motion artifacts, as shown in the green box of Fig. 5(e). Through the experiment using in vivo motion-corrupted data, we confirmed that the proposed method also removes in vivo motion artifacts that contain the non-rigid motion of patients.
TABLE II: Radiologist scores for each method (mean ± standard deviation)
Method | Motion artifact | Noise | Blurring | Vessel clarity | Overall quality |
Input | 3.03 ± 0.91 | 3.03 ± 0.68 | 3.92 ± 0.71 | 3.45 ± 1.20 | 3.00 ± 1.09 |
MARC [16] | 3.37 ± 0.79 | 3.34 ± 0.81 | 2.29 ± 0.87 | 2.97 ± 1.15 | 2.50 ± 1.03 |
Cycle [20] | 3.42 ± 0.92 | 3.13 ± 0.81 | 3.97 ± 0.94 | 3.47 ± 1.18 | 3.21 ± 1.09 |
BSA [23] | 3.45 ± 1.22 | 3.39 ± 0.75 | 3.89 ± 1.06 | 3.45 ± 1.29 | 3.29 ± 1.18 |
Proposed | 3.63 ± 1.10 | 3.58 ± 0.76 | 3.97 ± 0.91 | 3.71 ± 1.31 | 3.45 ± 1.25 |
V-C Clinical Evaluation
Because it is impossible to quantitatively evaluate results using in vivo motion-corrupted datasets due to the lack of paired motion-free data, we evaluate motion artifact reduction results by clinical evaluation.
TABLE II shows the scores of each method on the various criteria. MARC achieved scores of 3.37 and 3.34 in terms of motion artifact and noise evaluation, respectively, while input images score 3.03 in both evaluations. These results indicate that MARC is effective at reducing motion artifacts and noise. However, MARC scored 2.29 in the blurring evaluation, which is lower than the score of input images (score: 3.92). The blurring effect of MARC can also be confirmed in Fig. 5(b). Therefore, the overall quality score of MARC (score: 2.50) is lower than that of input images (score: 3.00). On the other hand, Cycle-MedGAN V2.0 achieved the highest score in the blurring evaluation (score: 3.97, tied with the proposed method). However, Cycle-MedGAN V2.0 scored 3.13 in the noise evaluation, which is lower than the scores of the other methods. This high noise level lowers the image quality, so Cycle-MedGAN V2.0 scores only 3.21 for overall image quality. As shown in Fig. 5 and TABLE II, the bootstrap subsampling and aggregation method shows higher scores than the other existing methods in most assessments. However, its outputs were slightly blurred, so its score was lower than that of the input images in the blurring evaluation.
While the other methods each showed drawbacks, the proposed method achieved the highest score in all evaluations. First, in terms of motion artifact removal, the proposed method achieves the highest score (score: 3.63) while the other methods obtain lower scores (score: 3.37-3.45). Next, our method scored 3.58 and 3.97 in the noise and blurring evaluations, respectively. These results confirm that our method neither amplifies the image noise level nor blurs the output images. Moreover, the motion-corrupted input images scored 3.45 in terms of vessel clarity. The proposed method improves the vessel clarity score (score: 3.71), while the vessel clarity of the other three methods is similar to or lower than that of the motion-corrupted input images (score: 2.97-3.47). Finally, our method obtains the best score (score: 3.45) for overall image quality. To sum up, the proposed method achieves the highest score in all clinical evaluations, which suggests that our method can be useful in clinical practice.
VI Discussion
VI-A Comparison with Other Methods
In Section V, it was verified that MARC [16] generates blurry outputs in both the simulation and in vivo studies. The blurring may be a limitation of methods based on supervised learning. Because supervised learning minimizes a loss (e.g., L1 or mean squared error (MSE)) between output and label, it achieves high quantitative results as shown in TABLE I. However, it can also lead to the loss of information on image details because L1 or MSE losses do not ensure the perceptual quality of output images.
Unlike MARC, Cycle-MedGAN V2.0 [20] is an unpaired method that does not require paired input and label images. Instead of using losses between input and label, it translates an image from one domain to another domain by utilizing cycle consistency loss and adversarial loss. Because the discriminators of Cycle-MedGAN V2.0 distinguish real and fake generated images, the generators of Cycle-MedGAN V2.0 provide realistic images with sharp details. However, we have confirmed that Cycle-MedGAN V2.0 also magnifies the artifacts or noise of images. We conjecture that it is because the networks of Cycle-MedGAN V2.0 consider resolution degradation due to the motion artifacts to be the main difference between the two image domains. Therefore, the networks of Cycle-MedGAN V2.0 try to improve resolution rather than eliminate motion artifacts.
Compared to the previous two methods, bootstrap subsampling and aggregation [23] showed stable qualitative and quantitative results. Nevertheless, because [23] works under the assumption that the motion artifact appears as sparse outliers in k-space, the performance of this method is degraded if the assumption is not satisfied. For example, we simulated the respiratory motion with Eq. (15), so the respiratory motion appears as a continuous sinusoidal form in k-space. Because the motion did not appear as sparse outliers, the performance of [23] dropped compared to its performance on simulated random motion-corrupted data.
On the other hand, our proposed method presented outstanding results compared to other comparison methods. The proposed method successfully removes motion artifacts and retrieves high-frequency image details in both simulation and in vivo studies.
Nevertheless, our method is not free of limitations. Because score-based diffusion models require several steps of reverse diffusion, it takes a long time to generate outputs. Although we utilized the CCDF algorithm to reduce the inference time, our method still requires 19.30 s per image with the selected hyperparameters, as shown in TABLE III. Therefore, the proposed method should be further accelerated for clinical use.
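TABLE III: Effect of hyperparameters on PSNR, SSIM, and inference time
Hyperparameter | Value | PSNR (dB) | SSIM | Time/image (sec) |
Annealing hyperparameter | 0 | 36.36 | 0.935 | 19.30 |
Annealing hyperparameter | 0.01 | 37.01 | 0.940 | 19.30 |
Annealing hyperparameter | 0.1 | 36.58 | 0.927 | 19.30 |
Reverse diffusion steps | 1 | 36.45 | 0.935 | 1.834 |
Reverse diffusion steps | 10 | 37.01 | 0.940 | 19.30 |
Reverse diffusion steps | 100 | 36.88 | 0.938 | 195.6 |
Iterations | 1 | 36.43 | 0.934 | 6.358 |
Iterations | 3 | 37.01 | 0.940 | 19.30 |
Iterations | 5 | 37.28 | 0.942 | 32.48 |
Reverse diffusion steps, iterations | 10, 3 | 37.01 | 0.940 | 19.30 |
Reverse diffusion steps, iterations | 30, 1 | 36.90 | 0.938 | 19.30 |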
VI-B Effects of Annealing Hyperparameters
In our method, we injected high-frequency components of the measurements (k-space of motion-corrupted images), controlled by an injection hyperparameter, to preserve detailed structures of MR images. To confirm the effect of high-frequency component injection, we applied our method to simulated liver motion-corrupted images with various values of this hyperparameter. As shown in TABLE III, the proposed method with the injection hyperparameter set to 0 shows lower quantitative results than with 0.01. This is because detailed structures such as vessels cannot be reconstructed perfectly without high-frequency component injection. When the hyperparameter is increased to 0.1, the quantitative results drop again compared to the results with 0.01. We conjecture that this is because the high-frequency components of the measurements also contain motion artifacts, and the remaining artifacts degrade the quality of the reconstructed images. Therefore, we inject high-frequency components with the hyperparameter set to 0.01 in our experiments.
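The idea of combining low-frequency data consistency with high-frequency injection can be illustrated with a minimal NumPy sketch. This is only one plausible reading of the description above and is not taken from the authors' implementation: the function names, the circular low-frequency mask, the `radius` parameter, and the linear blending rule controlled by `weight` are all assumptions made for illustration.

```python
import numpy as np

def centered_low_freq_mask(shape, radius):
    """Boolean mask, True inside a centered disc of `radius` k-space samples (assumed shape)."""
    ny, nx = shape
    ky = np.arange(ny) - ny // 2  # zero at the DC row after fftshift
    kx = np.arange(nx) - nx // 2  # zero at the DC column after fftshift
    dist = np.sqrt(ky[:, None] ** 2 + kx[None, :] ** 2)
    return dist <= radius

def data_consistency(estimate, measured, radius, weight):
    """Hypothetical update: keep measured low frequencies, blend in measured high frequencies.

    estimate : current image estimate (2D array)
    measured : motion-corrupted image whose k-space plays the role of the measurements
    radius   : assumed low-frequency cutoff, in k-space samples
    weight   : assumed injection hyperparameter in [0, 1]
    """
    k_est = np.fft.fftshift(np.fft.fft2(estimate))
    k_meas = np.fft.fftshift(np.fft.fft2(measured))
    low = centered_low_freq_mask(estimate.shape, radius)
    # Outside the low-frequency disc, mix the estimate with the measurements.
    k_high = (1.0 - weight) * k_est + weight * k_meas
    # Inside the disc, use the measured k-space directly.
    k_out = np.where(low, k_meas, k_high)
    return np.fft.ifft2(np.fft.ifftshift(k_out))
```

In this sketch, `weight=0` leaves the high-frequency content entirely to the generative model, while larger values copy more of the measured high frequencies, including any motion artifacts they carry, which mirrors the trade-off discussed above.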
Next, we also confirm the effect of the hyperparameter that is varied over 1, 10, and 100 in TABLE III. When it is set to 1, the motion artifacts remain in the output images, so the quantitative results deteriorate. On the other hand, our method also shows degraded performance when it is set to 100. This may be because structures that cannot be seen in the input image are generated during the iterations of the reverse diffusion process. Moreover, the inference time of the proposed method with a value of 100 is quite long, as shown in TABLE III, so we choose a value of 10, which shows the best qualitative and quantitative performance.
Finally, the number of iterations of the reverse diffusion process is also one of the important hyperparameters of our method. Through experiments on the number of iterations, we find that the proposed method cannot completely remove motion artifacts with a single iteration. On the other hand, with 5 iterations, the inference time for one image is too long while the performance gain is negligible compared to 3 iterations. Therefore, 3 iterations are used in our experiments.
In addition, we also verify the effect of combining the number of iterations with the hyperparameter that is varied over 1, 10, and 100 in TABLE III. The proposed method shows different results depending on this combination, as shown in TABLE III, even when the inference time is the same. The proposed method with the setting (30, 1), i.e., a single iteration, shows lower quantitative performance than with the setting (10, 3), i.e., three iterations. This is because the motion artifacts cannot be removed perfectly with only one iteration of the diffusion process, even though the other hyperparameter is large. Through this experiment, we verify that the combination (10, 3) is better than (30, 1) for the performance of our proposed method.
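The equal inference times of the two combinations can be checked directly against the timings in TABLE III. The short Python sketch below divides each reported time by the product of the two settings. It assumes, based on the rows that share the 37.01 dB and 19.30 s entries, that rows varying one setting keep the other at its selected value (10 and 3, respectively); the variable names are illustrative only.

```python
# (hyperparameter varied over 1/10/100, number of iterations) -> reported seconds per image.
# Pairing the single-parameter rows with the selected values 10 and 3 is an assumption.
timings = {
    (1, 3): 1.834,
    (10, 3): 19.30,
    (100, 3): 195.6,
    (10, 1): 6.358,
    (10, 5): 32.48,
    (30, 1): 19.30,
}

# Reported time divided by the product of the two settings.
time_per_unit = {cfg: t / (cfg[0] * cfg[1]) for cfg, t in timings.items()}

for cfg, unit in time_per_unit.items():
    print(cfg, round(unit, 3))
```

Under this assumption every ratio falls between roughly 0.61 and 0.65 seconds, which is consistent with the cost growing with the product of the two settings and with (10, 3) and (30, 1) taking the same time despite their different image quality.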
VII Conclusion
In this paper, we proposed a novel MRI motion artifact reduction method using an annealed score-based diffusion model. By applying the diffusion process iteratively and gradually imposing data consistency with high-frequency injection, the proposed method successfully reduced simulated and in vivo motion artifacts in MR images. Furthermore, we verified that our method provides higher-quality and more clinically meaningful images than other state-of-the-art deep learning methods. We believe that our algorithm can be a useful framework for MRI motion artifact reduction.
VIII Acknowledgement
This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00075, Artificial Intelligence Graduate School Program (KAIST)), the National Research Foundation (NRF) of Korea grant NRF-2020R1A2B5B03001980, and the KAIST Key Research Institute (Interdisciplinary Research Group) Project.
References
- [1] A. Nishie, D. Kakihara, Y. Asayama, Y. Ushijima, Y. Takayama, N. Fujita, D. Shimamoto, K. Shirabe, T. Hida, and H. Honda, “Detectability of hepatocellular carcinoma on gadoxetic acid-enhanced MRI at 3T in patients with severe liver dysfunction: clinical impact of dual-source parallel radiofrequency excitation,” Clinical radiology, vol. 70, no. 3, pp. 254–261, 2015.
- [2] N. Verloh, K. Utpatel, M. Haimerl, F. Zeman, C. Fellner, S. Fichtner-Feigl, A. Teufel, C. Stroszczynski, M. Evert, and P. Wiggermann, “Liver fibrosis and Gd-EOB-DTPA-enhanced MRI: A histopathologic correlation,” Scientific reports, vol. 5, no. 1, pp. 1–10, 2015.
- [3] K. Kubota, T. Tamura, N. Aoyama, M. Nogami, N. Hamada, A. Nishioka, and Y. Ogawa, “Correlation of liver parenchymal gadolinium-ethoxybenzyl diethylenetriaminepentaacetic acid enhancement and liver function in humans with hepatocellular carcinoma,” Oncology letters, vol. 3, no. 5, pp. 990–994, 2012.
- [4] M. S. Davenport, B. L. Viglianti, M. M. Al-Hawary, E. M. Caoili, R. K. Kaza, P. S. Liu, K. E. Maturen, T. L. Chenevert, and H. K. Hussain, “Comparison of acute transient dyspnea after intravenous administration of gadoxetate disodium and gadobenate dimeglumine: effect on arterial phase image quality,” Radiology, vol. 266, no. 2, pp. 452–461, 2013.
- [5] N. White, C. Roddey, A. Shankaranarayanan, E. Han, D. Rettmann, J. Santos, J. Kuperman, and A. Dale, “PROMO: real-time prospective motion correction in MRI using image-based tracking,” Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine, vol. 63, no. 1, pp. 91–105, 2010.
- [6] N. Todd, O. Josephs, M. F. Callaghan, A. Lutti, and N. Weiskopf, “Prospective motion correction of 3D echo-planar imaging data for functional MRI using optical tracking,” Neuroimage, vol. 113, pp. 1–12, 2015.
- [7] J.-R. Liao, J. M. Pauly, T. J. Brosnan, and N. J. Pelc, “Reduction of motion artifacts in cine MRI using variable-density spiral trajectories,” Magnetic resonance in medicine, vol. 37, no. 4, pp. 569–575, 1997.
- [8] J. G. Pipe, “Motion correction with PROPELLER MRI: application to head motion and free-breathing cardiac imaging,” Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine, vol. 42, no. 5, pp. 963–969, 1999.
- [9] S. S. Vasanawala, Y. Iwadate, D. G. Church, R. J. Herfkens, and A. C. Brau, “Navigated abdominal T1-W MRI permits free-breathing image acquisition with less motion artifact,” Pediatric radiology, vol. 40, no. 3, pp. 340–344, 2010.
- [10] M. Usman, D. Atkinson, F. Odille, C. Kolbitsch, G. Vaillant, T. Schaeffter, P. G. Batchelor, and C. Prieto, “Motion corrected compressed sensing for free-breathing dynamic cardiac MRI,” Magnetic resonance in medicine, vol. 70, no. 2, pp. 504–516, 2013.
- [11] Z. Yang, C. Zhang, and L. Xie, “Sparse MRI for motion correction,” in 2013 IEEE 10th International Symposium on Biomedical Imaging. IEEE, 2013, pp. 962–965.
- [12] K. H. Jin, J.-Y. Um, D. Lee, J. Lee, S.-H. Park, and J. C. Ye, “MRI artifact correction using sparse + low-rank decomposition of annihilating filter-based Hankel matrix,” Magnetic resonance in medicine, vol. 78, no. 1, pp. 327–340, 2017.
- [13] B. A. Duffy, W. Zhang, H. Tang, L. Zhao, M. Law, A. W. Toga, and H. Kim, “Retrospective correction of motion artifact affected structural MRI images using deep learning of simulated motion,” 2018.
- [14] I. Oksuz, J. Clough, B. Ruijsink, E. Puyol-Antón, A. Bustin, G. Cruz, C. Prieto, D. Rueckert, A. P. King, and J. A. Schnabel, “Detection and correction of cardiac MRI motion artefacts during reconstruction from k-space,” in International conference on medical image computing and computer-assisted intervention. Springer, 2019, pp. 695–703.
- [15] J. Liu, M. Kocak, M. Supanich, and J. Deng, “Motion artifacts reduction in brain MRI by means of a deep residual network with densely connected multi-resolution blocks (DRN-DCMB),” Magnetic resonance imaging, vol. 71, pp. 69–79, 2020.
- [16] D. Tamada, M.-L. Kromrey, S. Ichikawa, H. Onishi, and U. Motosugi, “Motion artifact reduction using a convolutional neural network for dynamic contrast enhanced MR imaging of the liver,” Magnetic resonance in medical sciences, vol. 19, no. 1, p. 64, 2020.
- [17] Q. Lyu, H. Shan, Y. Xie, A. C. Kwan, Y. Otaki, K. Kuronuma, D. Li, and G. Wang, “Cine cardiac MRI motion artifact reduction using a recurrent neural network,” IEEE Transactions on Medical Imaging, vol. 40, no. 8, pp. 2170–2181, 2021.
- [18] M. A. Al-Masni, S. Lee, J. Yi, S. Kim, S.-M. Gho, Y. H. Choi, and D.-H. Kim, “Stacked U-Nets with self-assisted priors towards robust correction of rigid motion artifact in brain MRI,” NeuroImage, vol. 259, p. 119411, 2022.
- [19] E. Kuzmina, A. Razumov, O. Y. Rogov, E. Adalsteinsson, J. White, and D. V. Dylov, “Autofocusing+: Noise-resilient motion correction in magnetic resonance imaging,” arXiv preprint arXiv:2203.05569, 2022.
- [20] K. Armanious, A. Tanwar, S. Abdulatif, T. Küstner, S. Gatidis, and B. Yang, “Unsupervised adversarial correction of rigid MR motion artifacts,” in 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI). IEEE, 2020, pp. 1494–1498.
- [21] S. Liu, K.-H. Thung, L. Qu, W. Lin, D. Shen, and P.-T. Yap, “Learning MRI artefact removal with unpaired data,” Nature Machine Intelligence, vol. 3, no. 1, pp. 60–67, 2021.
- [22] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2223–2232.
- [23] G. Oh, J. E. Lee, and J. C. Ye, “Unpaired MR motion artifact deep learning using outlier-rejecting bootstrap aggregation,” IEEE Transactions on Medical Imaging, vol. 40, no. 11, pp. 3125–3139, 2021.
- [24] Y. Song and S. Ermon, “Generative modeling by estimating gradients of the data distribution,” Advances in Neural Information Processing Systems, vol. 32, 2019.
- [25] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in Neural Information Processing Systems, vol. 33, pp. 6840–6851, 2020.
- [26] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” in International Conference on Learning Representations, 2020.
- [27] Y. Song, L. Shen, L. Xing, and S. Ermon, “Solving inverse problems in medical imaging with score-based generative models,” arXiv preprint arXiv:2111.08005, 2021.
- [28] H. Chung, B. Sim, and J. C. Ye, “Come-closer-diffuse-faster: Accelerating conditional diffusion models for inverse problems through stochastic contraction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 12413–12422.
- [29] H. Chung, B. Sim, D. Ryu, and J. C. Ye, “Improving diffusion models for inverse problems using manifold constraints,” in Advances in Neural Information Processing Systems, A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho, Eds., 2022. [Online]. Available: https://openreview.net/forum?id=nJJjv0JDJju
- [30] H. Chung and J. C. Ye, “Score-based diffusion models for accelerated MRI,” Medical Image Analysis, p. 102479, 2022.
- [31] C. Saharia, J. Ho, W. Chan, T. Salimans, D. J. Fleet, and M. Norouzi, “Image super-resolution via iterative refinement,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
- [32] B. D. Anderson, “Reverse-time diffusion equation models,” Stochastic Processes and their Applications, vol. 12, no. 3, pp. 313–326, 1982.
- [33] P. Vincent, “A connection between score matching and denoising autoencoders,” Neural computation, vol. 23, no. 7, pp. 1661–1674, 2011.