Send correspondence to Mark A. Anastasio & Rongping Zeng. E-mail: {[email protected] & [email protected]}.
Estimating Task-based Performance Bounds for Accelerated MRI Image Reconstruction Methods by Use of Learned-Ideal Observers
Abstract
Medical imaging systems are commonly assessed and optimized by the use of objective measures of image quality (IQ). The performance of the ideal observer (IO) acting on imaging measurements has long been advocated as a figure-of-merit to guide the optimization of imaging systems. For computed imaging systems, the performance of the IO acting on imaging measurements also sets an upper bound on task performance that no image reconstruction method can exceed. As such, estimation of IO performance can provide valuable guidance when designing under-sampled data-acquisition techniques by enabling the identification of designs that will not permit the reconstruction of diagnostically appropriate images for a specified task, no matter how advanced the reconstruction method is or how plausible the reconstructed images appear. The need for such analysis is urgent because of the substantial increase in medical device submissions on deep learning-based image reconstruction methods and the fact that they may produce clean images that disguise the potential loss of diagnostic information when data are aggressively under-sampled. Recently, convolutional neural network (CNN) approximated IOs (CNN-IOs) were investigated for estimating the performance of data space IOs to establish task-based performance bounds for image reconstruction in an X-ray computed tomographic (CT) context. In this work, the application of such data space CNN-IO analysis to multi-coil magnetic resonance imaging (MRI) systems is explored. This study utilized stylized multi-coil sensitivity encoding (SENSE) MRI systems and deep-generated stochastic brain models to demonstrate the approach. Signal-known-statistically and background-known-statistically (SKS/BKS) binary signal detection tasks were selected to study the impact of different acceleration factors on the data space IO performance.
1 Introduction
Objective measures of image quality (IQ) that quantify the ability of an observer to perform a specific task are widely employed in the field of medical imaging [1, 2, 3, 4, 5, 6]. For the purpose of optimizing imaging system designs, objective IQ measures based on the performance of the Bayesian Ideal Observer (IO) acting on imaging measurements with consideration of different diagnostic tasks have been advocated [1, 2]. In this way, the task-specific information in the imaging measurements is used to the fullest extent. Moreover, for computed imaging systems, the performance of the IO acting on imaging measurements sets an upper bound on task performance that no image reconstruction method can improve upon. The ability to compute performance bounds for computed imaging systems is now more important than ever for evaluating deep learning-based image reconstruction methods (DLIRMs). A variety of DLIRMs are being actively developed, primarily for applications of under-sampled data acquisitions in which conventional image reconstruction methods are not fully effective. For example, a possible use case for DLIRMs is image reconstruction from highly incomplete and noisy tomographic measurements. In some cases, DLIRMs can yield visually plausible images that achieve encouraging scores as measured by physical, non-task-based metrics such as MSE, PSNR, and SSIM. However, these metrics may disguise situations in which incomplete and noisy tomographic measurement data will not permit the reconstruction of a diagnostically useful image, no matter how advanced the DLIRM is or how plausible the reconstructed images appear. This can occur when features of the object that are important for performing the diagnostic task are not sufficiently represented in the measurement data. A natural way to identify such situations is to estimate the performance of the IO acting on the measurement data as a performance bound for any image reconstruction method.
Recently, convolutional neural network (CNN) approximated IOs (CNN-IOs) have been investigated for estimating the performance of data space IOs to establish task-based performance bounds for image reconstruction [7]. The effectiveness of the data space CNN-IO has been demonstrated in a relatively simple simulation study relevant to the X-ray computed tomographic (CT) context. The application of such data space CNN-IO analysis to advanced multi-coil magnetic resonance imaging (MRI) systems and more realistic object variability remains unexplored. Considering the active development of DLIRMs for multi-coil accelerated MRI acquisitions, there exists an urgent need to explore the applications of data space CNN-IO in MRI.
In this work, the application of data space CNN-IOs was explored in a realistically simulated MRI context. The study simulated a multi-coil sensitivity encoding (SENSE) parallel MRI system. To characterize more realistic object variability, an advanced diffusion model was trained with a large set of high-quality MRI brain images to establish the stochastic object model (SOM) of the human brain. In addition, signal-known-statistically and background-known-statistically (SKS/BKS) binary signal detection tasks with random signal locations were considered to study the impact of acceleration factors on task-based performance bounds for image reconstruction.
2 Background
2.1 Formulation of binary signal detection tasks
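A continuous-to-discrete (C-D) description of a linear imaging system [1] is considered as $\mathbf{g} = \mathcal{H} f(\mathbf{r}) + \mathbf{n}$, where $\mathbf{g}$ is the measured image vector, $f(\mathbf{r})$ denotes the object function that is dependent on the coordinate $\mathbf{r}$, $\mathcal{H}$ denotes a linear imaging operator that maps $f(\mathbf{r})$ to the noiseless measurement data, and $\mathbf{n}$ denotes the measurement noise. When its spatial dependence is not important to highlight, $f(\mathbf{r})$ will be denoted as $\mathbf{f}$.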
A binary signal detection task requires an observer to classify the measured image data $\mathbf{g}$ as satisfying either a signal-present hypothesis $H_1$ or a signal-absent hypothesis $H_0$. These two hypotheses can be described as:

$$H_0: \mathbf{g} = \mathcal{H}\mathbf{f}_b + \mathbf{n} = \mathbf{b} + \mathbf{n}, \tag{1a}$$

$$H_1: \mathbf{g} = \mathcal{H}(\mathbf{f}_b + \mathbf{f}_s) + \mathbf{n} = \mathbf{b} + \mathbf{s} + \mathbf{n}, \tag{1b}$$

where $\mathbf{f}_s$ and $\mathbf{f}_b$ denote the signal and background object, respectively, and $\mathbf{s} \equiv \mathcal{H}\mathbf{f}_s$ and $\mathbf{b} \equiv \mathcal{H}\mathbf{f}_b$ denote the measured signal and background image data. To perform this task, a deterministic observer computes a test statistic $t(\mathbf{g})$ that maps the measured image $\mathbf{g}$ to a real-valued scalar variable that is compared to a predetermined threshold $\tau$ to determine if $\mathbf{g}$ satisfies $H_0$ or $H_1$. By varying the threshold $\tau$, a ROC curve can be formed to quantify the trade-off between the false-positive fraction (FPF) and the true-positive fraction (TPF) [1]. The area under the ROC curve (AUC) can be subsequently calculated as a figure-of-merit (FOM) for signal detection performance.
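The threshold sweep that produces the ROC curve, and the AUC computed from it, can be sketched as follows. This is a minimal illustration rather than the evaluation code used in the study: the function names `roc_curve` and `auc_trapezoid` are hypothetical, and the observer is assumed to decide signal-present when its test statistic exceeds the threshold.

```python
import numpy as np

def roc_curve(t_absent, t_present):
    """Empirical ROC from test statistics under the two hypotheses.

    t_absent  : test statistics computed on signal-absent data
    t_present : test statistics computed on signal-present data
    Assumption: the observer decides signal-present when t > threshold.
    """
    t_absent = np.asarray(t_absent, dtype=float)
    t_present = np.asarray(t_present, dtype=float)
    # Sweep the threshold from +inf down to -inf over all observed values,
    # so the curve runs from (FPF, TPF) = (0, 0) to (1, 1).
    observed = np.unique(np.concatenate((t_absent, t_present)))[::-1]
    thresholds = np.concatenate(([np.inf], observed, [-np.inf]))
    fpf = np.array([(t_absent > th).mean() for th in thresholds])
    tpf = np.array([(t_present > th).mean() for th in thresholds])
    return fpf, tpf

def auc_trapezoid(fpf, tpf):
    """Area under the empirical ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpf) * (tpf[1:] + tpf[:-1]) / 2.0))

# Toy usage with synthetic Gaussian test statistics (illustrative only).
rng = np.random.default_rng(0)
fpf, tpf = roc_curve(rng.normal(0.0, 1.0, 2000), rng.normal(1.0, 1.0, 2000))
print(f"AUC = {auc_trapezoid(fpf, tpf):.3f}")
```

With tied test statistics, the trapezoidal rule counts each tie as half a correct ordering, which matches the usual Mann-Whitney interpretation of the AUC.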
2.2 The data space ideal observer (IO) and the CNN-approximated data space IO
The Bayesian Ideal Observer (IO) sets an upper limit of observer performance for signal detection tasks and has been advocated for use in optimizing medical imaging systems and data-acquisition designs [1]. The IO test statistic is any monotonic transformation of the likelihood ratio $\Lambda(\mathbf{g}) = p(\mathbf{g} \mid H_1) / p(\mathbf{g} \mid H_0)$, where $p(\mathbf{g} \mid H_1)$ and $p(\mathbf{g} \mid H_0)$ are the conditional probability density functions that describe the measured data $\mathbf{g}$ under the hypotheses $H_1$ and $H_0$, respectively. The likelihood ratio is often analytically intractable. In this study, a supervised learning-based method is employed to approximate the IO test statistic [4, 7] on raw measurement data. This is accomplished by use of an appropriately designed CNN-based classifier [4]. Specifically, convolutional layers are gradually added to the classifier until the model’s detection performance converges. The CNN-approximated IO (CNN-IO) has been successfully applied to raw measurement data in our previous study [7].
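One way to see why a trained classifier can serve as an IO approximation is that the posterior probability of the signal-present hypothesis is itself a monotonic transformation of the likelihood ratio. The short derivation below follows from Bayes' rule. The prior probabilities $\Pr(H_0)$ and $\Pr(H_1)$ and the symbol $\Lambda(\mathbf{g})$ for the likelihood ratio $p(\mathbf{g} \mid H_1)/p(\mathbf{g} \mid H_0)$ are notation introduced here for illustration.

```latex
\begin{align}
\Pr(H_1 \mid \mathbf{g})
  &= \frac{p(\mathbf{g} \mid H_1)\,\Pr(H_1)}
          {p(\mathbf{g} \mid H_1)\,\Pr(H_1) + p(\mathbf{g} \mid H_0)\,\Pr(H_0)} \\
  &= \frac{\Lambda(\mathbf{g})}
          {\Lambda(\mathbf{g}) + \Pr(H_0)/\Pr(H_1)} .
\end{align}
```

Because the right-hand side increases monotonically with $\Lambda(\mathbf{g})$, a classifier whose output approaches $\Pr(H_1 \mid \mathbf{g})$ yields the same ROC curve, and therefore the same AUC, as the likelihood ratio itself.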
3 Numerical Studies
Computer-simulation studies were conducted to investigate the use of CNN-IOs to establish task-based performance bounds for image reconstruction based on binary signal detection tasks. The study employed a stylized simulation of brain magnetic resonance imaging (MRI). Simulated k-space data were computed using a discrete-to-discrete (D-D) forward operator corresponding to a multi-coil sensitivity encoding (SENSE) parallel MRI system. Details regarding the simulated multi-coil MRI system, stochastic object models, and noise statistics are described below.
3.1 Stylized multi-coil SENSE MRI systems
Multi-coil SENSE MRI systems with 8 coils were modeled [8]. For the $i$-th coil, the corresponding k-space measurement was simulated as $\mathbf{g}_i = \mathbf{M}\mathcal{F}(\mathbf{S}_i \mathbf{f}) + \mathbf{n}_i$, where $\mathbf{f}$ was the to-be-imaged object, $\mathbf{M}$ was the Cartesian sampling mask [9], $\mathcal{F}$ represented the discrete Fourier transform, $\mathbf{S}_i$ was the simulated coil sensitivity map [10], and $\mathbf{n}_i$ was complex-valued Gaussian noise with standard deviation of 15.
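The per-coil measurement model described above (a Cartesian sampling mask applied to the discrete Fourier transform of the sensitivity-weighted object, plus complex Gaussian noise) can be sketched in NumPy as follows. This is an illustrative sketch under several assumptions that are not taken from the study: the Gaussian-shaped sensitivity maps in `gaussian_coil_maps`, the equispaced mask with a fully sampled centre band in `cartesian_mask`, the interpretation of the noise standard deviation as applying to the real and imaginary parts separately, and the choice to keep noise only at sampled k-space locations.

```python
import numpy as np

def gaussian_coil_maps(n, n_coils=8, width=0.6):
    """Hypothetical smooth sensitivity maps: Gaussians centred on a ring."""
    y, x = np.mgrid[-1:1:1j * n, -1:1:1j * n]
    maps = []
    for i in range(n_coils):
        angle = 2.0 * np.pi * i / n_coils
        cx, cy = np.cos(angle), np.sin(angle)
        magnitude = np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * width ** 2))
        maps.append(magnitude * np.exp(1j * angle))  # constant phase per coil
    return np.stack(maps)  # shape (n_coils, n, n)

def cartesian_mask(n, accel, n_center=8):
    """Keep every accel-th phase-encode line plus a fully sampled centre band."""
    mask = np.zeros((n, n), dtype=bool)
    mask[::accel, :] = True
    c = n // 2
    mask[c - n_center // 2 : c + n_center // 2, :] = True
    return mask

def fft2c(x):
    """Centred, orthonormal 2D DFT over the last two axes."""
    shifted = np.fft.ifftshift(x, axes=(-2, -1))
    return np.fft.fftshift(np.fft.fft2(shifted, norm="ortho"), axes=(-2, -1))

def sense_forward(f, maps, mask, noise_std=15.0, rng=None):
    """Simulate multi-coil k-space data for object f (shape (n, n))."""
    rng = np.random.default_rng() if rng is None else rng
    kspace = fft2c(maps * f)  # Fourier transform of each coil-weighted image
    # Assumption: noise_std applies to the real and imaginary parts separately.
    noise = noise_std * (rng.standard_normal(kspace.shape)
                         + 1j * rng.standard_normal(kspace.shape))
    # Assumption: unsampled locations are never acquired, so they stay zero.
    return mask * (kspace + noise)

# Toy usage on a random object (illustrative only).
n = 64
f = np.random.default_rng(0).random((n, n))
g = sense_forward(f, gaussian_coil_maps(n), cartesian_mask(n, accel=4))
print(g.shape)
```

Because the centre band is always retained in this sketch, the effective acceleration is somewhat lower than the nominal `accel` value.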
3.2 Stochastic object model and signal
A stochastic object model (SOM) was employed to create ensembles of to-be-imaged objects. Specifically, the SOM described 2D axial slices of human brain MRI images. A diffusion model, i.e., denoising diffusion probabilistic model (DDPM) [11], was employed to establish the SOM. It has been reported that compared with other generative models, the recently developed diffusion models can generate images with better visual quality [11]. The DDPM model was trained by use of the axial brain MRI slices from the Human Connectome Project (HCP) dataset. In our study, only the slices that contain the cerebrospinal fluid (CSF) area were considered to minimize object variability due to slice location and instead emphasize differences between patients. Gaussian signals with random locations in the white matter area were considered in our study as the to-be-detected signal. The standard deviation of the Gaussian was 2 mm and its amplitude was 0.7. Figure 1 shows a realization of background, signal, and signal-present objects.
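The construction of a signal-present object from a background realization can be sketched as below. The sketch assumes an additive signal, consistent with the signal-present hypothesis in Section 2.1. The pixel spacing `pixel_mm` and the binary white-matter mask `wm_mask` are hypothetical inputs, since the pixel size and the segmentation procedure are not specified here.

```python
import numpy as np

def gaussian_signal(shape, center, sigma_px, amplitude=0.7):
    """2D Gaussian signal with peak `amplitude` at pixel `center` (row, col)."""
    rows, cols = np.indices(shape)
    r2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return amplitude * np.exp(-r2 / (2.0 * sigma_px ** 2))

def random_signal_present(background, wm_mask, sigma_mm=2.0, pixel_mm=1.0,
                          amplitude=0.7, rng=None):
    """Insert a Gaussian signal at a random white-matter location.

    pixel_mm is a hypothetical pixel spacing used to convert the 2 mm
    standard deviation into pixels; wm_mask is a hypothetical boolean mask.
    """
    rng = np.random.default_rng() if rng is None else rng
    candidates = np.argwhere(wm_mask)  # (row, col) of white-matter pixels
    center = candidates[rng.integers(len(candidates))]
    signal = gaussian_signal(background.shape, center,
                             sigma_mm / pixel_mm, amplitude)
    return background + signal, signal, tuple(int(c) for c in center)

# Toy usage with a flat background and a rectangular stand-in mask.
background = np.zeros((64, 64))
wm_mask = np.zeros((64, 64), dtype=bool)
wm_mask[20:40, 24:44] = True
f_present, f_signal, location = random_signal_present(
    background, wm_mask, rng=np.random.default_rng(0))
print(location, f_signal.max())
```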

3.3 Impact of acceleration factor
A study was conducted to investigate the impact of acceleration factors on the established task-based performance bounds. The acceleration factor was gradually increased from 1x (fully sampled) to 12x, and the corresponding impact on the established bounds was investigated. SKS/BKS binary signal detection tasks with random signal locations were considered. Realizations of the established SOM were employed as the random background. To assess the validity of the performance bounds for image reconstruction methods, both U-Net-based reconstruction methods [12] and root sum-of-squares (rSOS) were considered as examples of to-be-evaluated reconstruction methods.
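For reference, the rSOS baseline can be sketched as follows: each coil's k-space data are inverse Fourier transformed, and the coil images are combined as the square root of the sum of their squared magnitudes. This is a generic sketch of the standard technique, not the study's code. The centred, orthonormal FFT convention in the hypothetical helper `ifft2c` is an assumption.

```python
import numpy as np

def ifft2c(k):
    """Centred, orthonormal inverse 2D DFT over the last two axes."""
    shifted = np.fft.ifftshift(k, axes=(-2, -1))
    return np.fft.fftshift(np.fft.ifft2(shifted, norm="ortho"), axes=(-2, -1))

def rsos(kspace):
    """Root sum-of-squares reconstruction.

    kspace : complex array of shape (n_coils, n, n); unsampled entries are
             assumed to be zero-filled.
    Returns a real, non-negative image of shape (n, n).
    """
    coil_images = ifft2c(kspace)
    return np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=0))

# Toy usage on random zero-filled data (illustrative only).
rng = np.random.default_rng(0)
k = rng.standard_normal((8, 64, 64)) + 1j * rng.standard_normal((8, 64, 64))
k[:, 1::2, :] = 0.0  # drop every other phase-encode line
print(rsos(k).shape)
```

Because rSOS applies no regularization or learned prior, aliasing from the zero-filled lines passes directly into the image, which is why it serves as a simple point of comparison for learned methods.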
4 Results

Metric | Method | Acceleration 1x | Acceleration 4x | Acceleration 8x | Acceleration 12x |
---|---|---|---|---|---|
RMSE | rSOS | 0.0971 | 0.1080 | 0.1302 | 0.1548 |
RMSE | U-Net | 0.0163 | 0.0230 | 0.0264 | 0.0318 |
SSIM | rSOS | 0.7048 | 0.6764 | 0.6066 | 0.5530 |
SSIM | U-Net | 0.9833 | 0.9696 | 0.9647 | 0.9543 |

Figure 2 shows the estimated task-based performance bounds for different acceleration factors (1x-12x). As expected, the established bounds decreased as the acceleration factor increased in all cases. As such, estimation of IO performance can provide valuable guidance when designing data-acquisition techniques and reconstruction methods for a certain task.
In addition, as shown in Fig. 2, Tab. 1, and Fig. 4, the U-Net-based method greatly improved traditional IQ measures and visual appearance, but not task-based IQ measures, when compared with the rSOS method for all considered acceleration factors. This is consistent with the understanding that traditional IQ measures may not correlate with specific task-based measures of IQ.
5 NEW OR BREAKTHROUGH RESULTS TO BE PRESENTED
This work explored the application of data space CNN-IOs in a multi-coil SENSE parallel MRI system to estimate performance bounds for DLIRMs in location-unknown signal detection tasks. Brain object variability was characterized by an advanced diffusion model. This work will help advance the use of data space IO for analyses of imaging technologies under clinically relevant conditions.
6 Conclusion
In this paper, studies were designed to investigate the use of CNN-IO in raw data space to estimate task-based performance bounds for image reconstruction methods, under a realistic multi-coil SENSE MRI context. Advanced diffusion models were employed to establish the SOM that generated realistic object variability for location-unknown signal detection tasks. A preliminary numerical study demonstrated that the CNN-IO could potentially be employed to estimate the task-based performance bounds for various acceleration factors. Importantly, our study demonstrated that although traditional IQ metrics such as RMSE and SSIM can reflect the effectiveness of DLIRMs relative to data fidelity and visual perception, they may not accurately reflect the effectiveness in preserving task-relevant information. In particular, our results showed that the U-Net-based DLIRM could not restore small-sized signals as the acceleration factor increased to 4 or above. Therefore, the effectiveness of novel DLIRMs should not be determined solely using data fidelity or perceptual metrics, and task-based figures of merit must also be considered.
7 Acknowledgments
This work was supported in part by NIH Awards P41EB031772 (sub-project 6366), R01EB034249, R01CA233873, R01CA287778, and R56DE033344. Kaiyan Li acknowledges funding by an appointment to the Research Participation Program at the Center for Devices and Radiological Health administered by the Oak Ridge Institute for Science and Education through an inter-agency agreement between the U.S. Department of Energy and U.S. Food and Drug Administration.
References
- [1] Barrett, H. H. and Myers, K. J., [Foundations of image science ], John Wiley & Sons (2013).
- [2] Kupinski, M. A., Hoppin, J. W., Clarkson, E., and Barrett, H. H., “Ideal-observer computation in medical imaging with use of Markov-chain Monte Carlo techniques,” JOSA A 20(3), 430–438 (2003).
- [3] Li, K., Zhou, W., Li, H., and Anastasio, M. A., “Assessing the impact of deep neural network-based image denoising on binary signal detection tasks,” IEEE Transactions on Medical Imaging (2021).
- [4] Zhou, W., Li, H., and Anastasio, M. A., “Approximating the ideal observer and Hotelling observer for binary signal detection tasks by use of supervised learning methods,” IEEE Transactions on Medical Imaging 38(10), 2456–2468 (2019).
- [5] Li, K., Li, H., and Anastasio, M. A., “A task-informed model training method for deep neural network-based image denoising,” in [Medical Imaging 2022: Image Perception, Observer Performance, and Technology Assessment ], 12035, 249–255, SPIE (2022).
- [6] Li, K., Li, H., and Anastasio, M. A., “On the impact of incorporating task-information in learning-based image denoising,” arXiv preprint arXiv:2211.13303 (2022).
- [7] Li, K., Villa, U., Li, H., and Anastasio, M. A., “Application of learned ideal observers for estimating task-based performance bounds for computed imaging systems,” Journal of Medical Imaging 11(2), 026002 (2024).
- [8] Ohliger, M. A. and Sodickson, D. K., “An introduction to coil array design for parallel MRI,” NMR in Biomedicine 19(3), 300–315 (2006).
- [9] Zbontar, J., Knoll, F., Sriram, A., Murrell, T., Huang, Z., Muckley, M. J., Defazio, A., Stern, R., Johnson, P., Bruno, M., et al., “fastMRI: An open dataset and benchmarks for accelerated MRI,” arXiv preprint arXiv:1811.08839 (2018).
- [10] Guerquin-Kern, M., Lejeune, L., Pruessmann, K. P., and Unser, M., “Realistic analytical phantoms for parallel magnetic resonance imaging,” IEEE Transactions on Medical Imaging 31(3), 626–636 (2011).
- [11] Ho, J., Jain, A., and Abbeel, P., “Denoising diffusion probabilistic models,” Advances in Neural Information Processing Systems 33, 6840–6851 (2020).
- [12] Jin, K. H., McCann, M. T., Froustey, E., and Unser, M., “Deep convolutional neural network for inverse problems in imaging,” IEEE Transactions on Image Processing 26(9), 4509–4522 (2017).