Diffusion Models for Unsupervised Anomaly Detection in Fetal Brain Ultrasound
Abstract
Ultrasonography is an essential tool in mid-pregnancy for assessing fetal development, valued for its non-invasive, real-time imaging capabilities. Yet the interpretation of ultrasound images is often complicated by acoustic shadows, speckle noise, and other artifacts that obscure crucial diagnostic details. To address these challenges, we present a novel unsupervised anomaly detection framework designed specifically for fetal ultrasound imaging. The framework incorporates gestational age filtering, precise identification of fetal standard planes, and targeted segmentation of brain regions to enhance diagnostic accuracy. Furthermore, we introduce denoising diffusion probabilistic models to this setting, enabling the detection of anomalies not seen during training. We rigorously evaluated the framework with various diffusion-based anomaly detection methods, noise types, and noise levels. AutoDDPM achieved the highest area under the precision-recall curve, 79.8%, in detecting anomalies. This advancement holds promise for improving the tools available for nuanced and effective prenatal diagnostics.

Keywords: Fetal Ultrasound · Screening · Medical Imaging

1 Introduction
Ultrasonography (US) is an indispensable tool in prenatal care, widely used to monitor fetal development because of its safety, real-time imaging capabilities, and cost-effectiveness [18]. In particular, the mid-pregnancy US scan at around 22 weeks is crucial for assessing fetal growth and identifying potential anomalies, including those affecting the brain. However, interpreting US images remains challenging because of artifacts such as acoustic shadows, speckle noise, and motion blur. These artifacts arise from the complex interactions between US waves and biological tissues; they can obscure critical diagnostic details and complicate both manual and automated analyses [16].
The application of deep learning (DL) to fetal US image analysis has shown significant promise in helping clinicians detect anomalies [11]. However, supervised DL methods face considerable challenges due to anatomical diversity and the imbalance between healthy and anomalous samples [17, 26]. Given these limitations, unsupervised anomaly detection (UAD) is a viable alternative. UAD methods train exclusively on healthy samples, establishing a model of normality without the need for labeled anomaly data. Although this approach generalizes better across various pathologies [22], it remains largely unexplored for fetal brain US.
In this study, we propose a novel UAD framework for detecting brain anomalies in fetal US images. The framework includes several preprocessing steps that improve the quality and consistency of the US scans: we filter by gestational age, detect and localize standard planes using SonoNet [2], and remove background noise to focus on the brain region. Furthermore, we are the first to apply denoising diffusion probabilistic models (DDPMs) in this context. DDPMs have shown strong performance in capturing complex distributions and generating high-fidelity images [12], which makes them well suited to identifying previously unseen anomalies in fetal US scans.
We thoroughly evaluated our framework with various diffusion-based UAD methods, noise types, and noise levels to understand their performance and robustness. Among these, AutoDDPM [4] achieved the highest area under the precision-recall curve, 79.8%, in anomaly detection.
2 Related work
This section reviews existing research on anomaly detection (AD) in brain imaging, focusing on two main areas: AD methods for brain MRI and AD methods for fetal US. This overview establishes the context for our novel approach using diffusion-based methods for fetal brain US anomaly detection.
Anomaly Detection in Brain MRI.
Magnetic resonance imaging (MRI) is vital for detecting brain abnormalities, providing detailed tissue contrasts and revealing various pathological changes [15]. Traditional AD methods in brain MRI include Autoencoders [5, 9, 28] and Generative Adversarial Networks [20, 1], which learn the distribution of healthy anatomy by compressing and decompressing image data. Recently, diffusion models have demonstrated superior performance by offering better mode coverage and sample quality. Denoising diffusion probabilistic models (DDPMs) iteratively learn the data distribution through noising and denoising processes, showing significant success in brain MRI applications [3, 23, 4]. However, their application to fetal brain US remains unexplored.
Anomaly Detection in Fetal Images. In fetal imaging, anomaly detection research has focused largely on other organs, particularly the heart. Chotzoglou et al. [10] proposed an unsupervised approach for detecting hypoplastic left heart syndrome in fetal US images. Research on AD for fetal brain imaging has also seen notable contributions. FOAC-NET, a supervised convolutional neural network (CNN) architecture, was developed to detect fetal organ anomalies in MRI [14]. For fetal brain US, Xie et al. [25] developed a CNN-based system that classifies US images as normal or abnormal with high accuracy. However, unsupervised anomaly detection remains largely unexplored in fetal brain US.
3 Methods
3.1 Background
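Denoising Diffusion Probabilistic Models (DDPMs) use a forward diffusion process, $q(x_{1:T}\mid x_0)$, to incrementally corrupt data from a target distribution, $q(x_0)$, to a normal distribution. A reverse process, $p_\theta(x_{0:T})$, generates samples by transforming noise back to $q(x_0)$. The forward process is defined as:

$$q(x_{1:T}\mid x_0)=\prod_{t=1}^{T} q(x_t\mid x_{t-1}),\qquad q(x_t\mid x_{t-1})=\mathcal{N}\big(x_t;\sqrt{1-\beta_t}\,x_{t-1},\,\beta_t\mathbf{I}\big), \tag{1}$$

with a variance schedule, $\beta_1,\dots,\beta_T$, increasing linearly from $\beta_1=10^{-4}$ to $\beta_T=0.02$ [13]. The reverse generative model, with parameters $\theta$, begins with $p(x_T)=\mathcal{N}(x_T;\mathbf{0},\mathbf{I})$ and proceeds from $t=T$ to $t=1$:

$$p_\theta(x_{0:T})=p(x_T)\prod_{t=1}^{T} p_\theta(x_{t-1}\mid x_t),\qquad p_\theta(x_{t-1}\mid x_t)=\mathcal{N}\big(x_{t-1};\mu_\theta(x_t,t),\,\Sigma_\theta(x_t,t)\big), \tag{2}$$

where $\alpha_t=1-\beta_t$ and $\bar\alpha_t=\prod_{s=1}^{t}\alpha_s$. A U-Net [19] is used to learn the noise $\epsilon_\theta(x_t,t)$ and approximate $\mu_\theta(x_t,t)$. The loss function, $\mathcal{L}$, targeting the marginal likelihood $p_\theta(x_0)$, is:

$$\mathcal{L}=\mathbb{E}_q\Big[D_{\mathrm{KL}}\big(q(x_T\mid x_0)\,\|\,p(x_T)\big)+\sum_{t>1}D_{\mathrm{KL}}\big(q(x_{t-1}\mid x_t,x_0)\,\|\,p_\theta(x_{t-1}\mid x_t)\big)-\log p_\theta(x_0\mid x_1)\Big], \tag{3}$$

where $D_{\mathrm{KL}}$ is the Kullback-Leibler divergence. We use Ho et al.'s simplified objective [13]:

$$\mathcal{L}_{\text{simple}}=\mathbb{E}_{t,x_0,\epsilon}\Big[\big\|\epsilon-\epsilon_\theta\big(\sqrt{\bar\alpha_t}\,x_0+\sqrt{1-\bar\alpha_t}\,\epsilon,\;t\big)\big\|^2\Big]. \tag{4}$$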
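Because every forward step is Gaussian, $x_t$ can be drawn from $x_0$ in a single step as $x_t=\sqrt{\bar\alpha_t}\,x_0+\sqrt{1-\bar\alpha_t}\,\epsilon$ with $\epsilon\sim\mathcal{N}(\mathbf{0},\mathbf{I})$ and $\bar\alpha_t=\prod_{s\le t}(1-\beta_s)$, which is what makes the simplified objective cheap to evaluate: sample $t$, sample $\epsilon$, noise $x_0$ once, and regress $\epsilon$. The NumPy sketch below computes one Monte Carlo sample of $\mathcal{L}_{\text{simple}}$. It assumes $T=1000$ steps (Ho et al.'s default; this paper does not state $T$), and `zero_model` is a hypothetical placeholder for the U-Net noise predictor, not the trained network.

```python
import numpy as np

T = 1000  # assumed number of diffusion steps (Ho et al. default; not stated in the text)
betas = np.linspace(1e-4, 0.02, T)  # linear variance schedule beta_1..beta_T
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)  # alpha_bar[t-1] = prod_{s=1}^{t} alpha_s


def q_sample(x0, t, eps):
    """Sample x_t ~ q(x_t | x_0) in closed form for a 1-indexed step t."""
    ab = alpha_bar[t - 1]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps


def simple_loss(x0, eps_model, rng):
    """One Monte Carlo sample of L_simple (mean over pixels instead of the squared norm)."""
    t = int(rng.integers(1, T + 1))  # t ~ Uniform{1, ..., T}
    eps = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, eps)
    return float(np.mean((eps - eps_model(x_t, t)) ** 2))


def zero_model(x_t, t):
    """Hypothetical stand-in for the U-Net: always predicts zero noise."""
    return np.zeros_like(x_t)


rng = np.random.default_rng(0)
x0 = rng.uniform(0.0, 1.0, size=(64, 64))
print(simple_loss(x0, zero_model, rng))  # close to 1.0, the variance of eps
```

With a model that always predicts zero noise, the loss reduces to the mean of $\epsilon^2$ and is therefore close to 1; training the U-Net is what pushes it below that.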
AnoDDPM [24] applies DDPMs to anomaly segmentation, corrupting the input with either Gaussian or Simplex noise. AutoDDPM [4], a more recent method, uses a conditional diffusion model to produce more accurate pseudo-healthy counterfactuals.
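At inference, AnoDDPM-style methods corrupt a test image only partially, up to a noise level $t<T$, and let the model denoise it back into a pseudo-healthy reconstruction; regions the model "repairs" appear in the residual. The sketch below assumes Gaussian noise, the linear schedule with $T=1000$, and a hypothetical `denoise_fn` that runs the reverse chain from step $t$ down to 0. It omits Simplex noise and AutoDDPM's masking, stitching, and resampling [4], so it is a simplified illustration rather than either method's implementation. The toy `oracle` simply returns a known healthy template.

```python
import numpy as np

T = 1000  # assumed number of diffusion steps
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))


def reconstruct_and_score(x0, t, denoise_fn, rng):
    """Partially noise x0 to step t, reconstruct it, and return the residual map."""
    eps = rng.standard_normal(x0.shape)  # Gaussian noise; AnoDDPM can use Simplex noise instead
    ab = alpha_bar[t - 1]
    x_t = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps
    x_rec = denoise_fn(x_t, t)  # hypothetical: reverse chain from t to 0
    anomaly_map = np.abs(x0 - x_rec)  # pixel-wise absolute residual
    return x_rec, anomaly_map


# Toy example: an oracle "model" that always returns the healthy template.
healthy = np.full((32, 32), 0.5)
test_img = healthy.copy()
test_img[10:14, 10:14] = 1.0  # synthetic "lesion"


def oracle(x_t, t):
    return healthy


_, amap = reconstruct_and_score(test_img, 250, oracle, np.random.default_rng(0))
print(amap.max(), amap[0, 0])  # 0.5 inside the lesion, 0.0 elsewhere
```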
3.2 Fetal Brain UAD Framework
Selecting US data from the initial dataset requires meticulous attention to ensure the high quality of the training, validation, and testing data. The selection must meet specific criteria, such as the correct gestational age, the transventricular plane view, and the visibility of key brain structures. To reduce the noise inherent in US images, each image must also be segmented into brain and background. Performing this selection manually is extremely time-consuming. We therefore propose a semi-automatic data preprocessing pipeline (see Fig. 2) that assists data curation and thereby improves the quality of the training data. After constructing the final training, validation, and testing datasets with this pipeline, we train DDPMs to remove artificially added noise and reconstruct pseudo-healthy images. Because the inference setup of our method is modular, we can test different inference strategies, including AnoDDPM and AutoDDPM, with various noise types such as Simplex and Gaussian.

3.2.1 Standard Plane Detection.
We use SonoNet [2] to automatically select images showing the transventricular plane of the fetal brain. SonoNet was designed for real-time detection of fetal standard scan planes in US images and classifies each image into one of 13 standard plane categories, including the brain view at the posterior horn of the ventricle (Brain (tv.)). Its core is a CNN inspired by the VGG16 architecture. In our experiments, we used SonoNet-32, whose first convolutional layer has 32 kernels; it achieves an F1-score of 0.798 in plane classification. SonoNet also outputs a confidence score for each prediction, which enables a stricter selection of candidate images: we discard all images that are not labeled "Brain (tv.)" or whose confidence score is below 0.9.
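The selection rule reduces to a simple filter over classifier outputs. In the sketch below, `PlanePrediction` is a hypothetical container for one image's predicted label and confidence; SonoNet's actual output format may differ.

```python
from dataclasses import dataclass


@dataclass
class PlanePrediction:
    """Hypothetical record holding one image's plane-classifier output."""
    image_id: str
    label: str          # predicted standard-plane category
    confidence: float   # classifier confidence for that label


TARGET_PLANE = "Brain (tv.)"
MIN_CONFIDENCE = 0.9


def select_transventricular(preds):
    """Keep images labeled 'Brain (tv.)' whose confidence is at least 0.9."""
    return [p.image_id for p in preds
            if p.label == TARGET_PLANE and p.confidence >= MIN_CONFIDENCE]


preds = [
    PlanePrediction("img_001", "Brain (tv.)", 0.97),
    PlanePrediction("img_002", "Brain (tv.)", 0.85),  # too uncertain
    PlanePrediction("img_003", "Abdominal", 0.99),    # wrong plane
]
print(select_transventricular(preds))  # ['img_001']
```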
3.2.2 Brain Segmentation.
In the second step of our pipeline, we segment and crop all images to eliminate background noise. We use a probabilistic deep learning approach with a U-Net segmentation network to segment the head in US images. Whereas [21, 7] fit ellipses to the segmented contours for biometric measurements, we use only the segmentation model. Finally, a manual review verifies the quality of the automatically selected and preprocessed data.
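Once a head mask is available, removing the background and cropping amounts to zeroing pixels outside the mask and slicing the image to the mask's bounding box. This NumPy sketch assumes a binary mask aligned with the image; the `margin` padding is an illustrative parameter, not a value from this work.

```python
import numpy as np


def mask_and_crop(image, head_mask, margin=4):
    """Zero out background pixels and crop to the head's bounding box plus a margin."""
    if not head_mask.any():
        raise ValueError("empty head mask")
    masked = np.where(head_mask, image, 0.0)  # background -> 0
    rows = np.flatnonzero(head_mask.any(axis=1))  # rows containing head pixels
    cols = np.flatnonzero(head_mask.any(axis=0))  # columns containing head pixels
    r0 = max(rows[0] - margin, 0)
    r1 = min(rows[-1] + margin + 1, image.shape[0])
    c0 = max(cols[0] - margin, 0)
    c1 = min(cols[-1] + margin + 1, image.shape[1])
    return masked[r0:r1, c0:c1]


# Toy example: an elliptical "head" mask in a 100 x 100 image.
yy, xx = np.ogrid[:100, :100]
mask = ((yy - 50) / 20) ** 2 + ((xx - 50) / 30) ** 2 <= 1
img = np.random.default_rng(0).uniform(0.1, 1.0, (100, 100))
cropped = mask_and_crop(img, mask)
print(cropped.shape)
```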
3.2.3 Diffusion-based Anomaly Detection.
The final step in our methodology involves the use of DDPMs for anomaly detection. The inference phase of our method is modular, allowing the use of different strategies for anomaly detection. We explore the use of both AnoDDPM and AutoDDPM methods.
4 Experiments and Results
4.1 Experimental Setup
Datasets. For our experiments, we used data from both a public dataset [8] and a private clinical dataset. The clinical dataset comprised 234 control patients, of whom only those with a gestational age between 19 and 22+6 weeks were included. After filtering and curation, the final dataset for training, validation, and testing contained 76 patients from the clinical dataset and 19 from the public dataset, totaling 252 images. For the evaluation of the downstream task, we selected 8 anomalous and 5 healthy control patients from the private clinical dataset, yielding 18 and 12 images, respectively. We adjusted the input pixel values to fall within the range of (0, 1) for Gaussian noise and for Simplex noise. We normalized the images to the 98th percentile, resized them to a resolution of , and applied rotations and horizontal and vertical flips as data augmentation.
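The gestational age criterion and intensity normalization can be sketched as follows. Two assumptions: gestational age is given as completed weeks plus days (so 22+6 means 22 weeks and 6 days), and "normalized to the 98th percentile" is read as dividing by the 98th-percentile intensity and clipping to [0, 1], the range stated for Gaussian noise.

```python
import numpy as np


def in_gestational_window(weeks, days):
    """True if gestational age lies in 19+0 .. 22+6 weeks inclusive (133..160 days)."""
    total_days = 7 * weeks + days
    return 19 * 7 <= total_days <= 22 * 7 + 6


def normalize_98th(image):
    """Divide by the 98th-percentile intensity and clip to [0, 1] (assumed reading)."""
    p98 = np.percentile(image, 98)
    if p98 <= 0:
        return np.zeros_like(image, dtype=float)
    return np.clip(image / p98, 0.0, 1.0)


print(in_gestational_window(22, 6), in_gestational_window(23, 0))  # True False
```

Clipping maps the brightest 2% of pixels, typically specular reflections, to 1 instead of letting them compress the rest of the intensity range.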
Evaluation Metrics. To evaluate performance on healthy scans, we used the mean absolute error (MAE) as a measure of reconstruction error, together with the structural similarity index (SSIM) and the learned perceptual image patch similarity (LPIPS) [27] as measures of reconstruction accuracy. For anomaly detection, we assessed the algorithm's ability to correctly classify images as healthy or anomalous, based on true positives, true negatives, false positives, and false negatives. We report the area under the precision-recall curve (AUPRC) and the area under the receiver operating characteristic curve (AUROC), which summarize classification performance across all decision thresholds.
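For reference, both summary metrics can be computed from image-level anomaly scores without external libraries. The sketch below assumes anomalous images are the positive class (label 1). It estimates AUPRC with average precision, one common estimator (the same definition as scikit-learn's `average_precision_score`), and computes AUROC as the probability that a random anomalous image scores higher than a random healthy one, counting ties as one half.

```python
def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """AUPRC estimate: sum over thresholds of (R_n - R_{n-1}) * P_n."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    ap, prev_recall, i = 0.0, 0.0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this threshold before updating P and R.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Toy example: higher score = more anomalous; label 1 = anomalous.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
print(auroc(scores, labels), average_precision(scores, labels))
```

The two metrics have different chance levels: a random scorer has an expected AUROC of 0.5, whereas its expected AUPRC equals the share of positives, which for the 18 anomalous and 12 healthy evaluation images is 18/30 = 0.6.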
4.2 Anomaly Detection Performance
| Method | Healthy: SSIM | Healthy: LPIPS | Pathological: AUPRC (%) | Pathological: AUROC (%) |
|---|---|---|---|---|
| AnoDDPM [24] with Simplex (t=50) | 0.81±0.05 | 0.26±0.05 | 78.9 | 70.8 |
| AnoDDPM [13] with Gaussian (t=250) | 0.88±0.01 | 0.05±0.01 | 73.0 | 63.8 |
| AnoDDPM [13] with Gaussian (t=300) | 0.87±0.02 | 0.05±0.01 | 73.5 | 57.4 |
| AutoDDPM [4] with Gaussian (t=300) | 0.88±0.02 | 0.05±0.02 | 79.8 | 66.6 |


We assessed AnoDDPM with both Gaussian and Simplex noise, alongside AutoDDPM as described in [4], at various noise levels $t$ to analyze the models' robustness under different conditions.
The classification outcomes are succinctly summarized in Table 1, with the most effective noise variations and types for both AnoDDPM and AutoDDPM highlighted. Corresponding visual results are depicted in Fig. 3. For a thorough examination of all noise levels and types, including metrics on reconstruction errors and accuracy, readers are directed to the supplementary materials, which provide an extensive overview of all conducted experiments.
Influence of Noise Type and Levels. We evaluated AnoDDPM using Gaussian and Simplex noise at various noise levels, as summarized in Tables 2 and 3 in the supplementary materials. Gaussian noise at a low noise level yields the best reconstruction of healthy images (MAE of 0.013, SSIM of 0.898), but only modest anomaly detection performance (67.5 AUPRC and 43.5 AUROC). This reflects a typical challenge in medical imaging: settings that alter the input images as little as possible also tend to preserve anomalies in the reconstruction, which weakens anomaly detection. In contrast, higher noise levels improve anomaly detection, reaching up to 73 AUPRC and 63.8 AUROC, at the cost of lower image quality metrics. This observation aligns with the "noise paradox" discussed in [4], which describes the trade-off between faithfully reconstructing healthy images and effectively detecting anomalies in pathological cases. The paradox is also visible in Fig. 4, where higher noise levels enable detection of the pathology but introduce false positive detections.
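The trade-off can be made concrete by looking at how much of the input survives partial noising. In closed form, $x_t=\sqrt{\bar\alpha_t}\,x_0+\sqrt{1-\bar\alpha_t}\,\epsilon$, so the coefficient $\sqrt{\bar\alpha_t}$ measures how much of the original image, anomalies included, remains at step $t$. The sketch below prints both coefficients for the noise levels listed in Table 1, assuming Gaussian noise and Ho et al.'s linear schedule with $T=1000$ steps; the paper does not state $T$, so the exact numbers are illustrative only.

```python
import numpy as np

T = 1000  # assumed number of diffusion steps (not stated in the paper)
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))  # linear beta schedule


def scales(t):
    """Return (signal, noise) coefficients of x_t = sqrt(ab)*x_0 + sqrt(1-ab)*eps."""
    ab = alpha_bar[t - 1]  # steps are 1-indexed
    return float(np.sqrt(ab)), float(np.sqrt(1.0 - ab))


for t in (50, 250, 300):
    signal, noise = scales(t)
    print(f"t={t}: signal {signal:.3f}, noise {noise:.3f}")
```

A larger $t$ leaves less of the input, and of any lesion, in $x_t$, so the model must regenerate more content from its learned healthy prior. That helps erase anomalies from the reconstruction, but it also lets healthy detail drift, which shows up as false positives.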
At the lowest noise level tested, AnoDDPM with Simplex noise achieves its best reconstruction of healthy images, with an MAE of 0.034 and an SSIM of 0.808. These values lag well behind those obtained with Gaussian noise, highlighting a substantial gap in denoising effectiveness. As the noise level increases, both MAE and SSIM deteriorate gradually for Simplex noise; e.g., at a higher noise level, the MAE increases to 0.045 and the SSIM decreases to 0.741. For anomaly detection, Simplex noise at t = 50 delivers the strongest results, with an AUPRC of 78.9 and an AUROC of 70.8. These are the highest scores among the AnoDDPM configurations tested, and the AUROC is the highest of all methods in Table 1, underscoring the ability of Simplex noise to detect anomalies under specific conditions despite its weaker reconstruction performance.

Figure 3 shows that models using Gaussian noise achieve high-quality reconstructions and effectively highlight anomalies in fetal brain US images. In contrast, Simplex noise introduces noticeable artifacts in both healthy and pathological scans, although it remains effective for anomaly detection.
Influence of Anomaly Maps. We analyzed how different anomaly maps affect anomaly detection performance: MAE, LPIPS [6], and their combination (MAE*LPIPS). Detailed results are shown in Table 4 and Figures 1 and 2 in the supplementary materials. The combined anomaly maps perform best overall, as they better balance the identification of pathological regions against the suppression of false positive detections.
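The combined map is an element-wise product, so a pixel receives a high score only where both the intensity residual and the perceptual difference are high. In the sketch below, `lpips_map` is assumed to be a precomputed per-pixel LPIPS map at image resolution, and `image_score` (the mean of the map) is a hypothetical image-level aggregation; the paper does not specify either detail.

```python
import numpy as np


def combined_anomaly_map(x, x_rec, lpips_map):
    """Element-wise product of the absolute residual (MAE map) and a per-pixel LPIPS map."""
    mae_map = np.abs(x - x_rec)
    if lpips_map.shape != mae_map.shape:
        raise ValueError("lpips_map must match the image shape")
    return mae_map * lpips_map


def image_score(anomaly_map):
    """Hypothetical image-level score: mean of the anomaly map."""
    return float(anomaly_map.mean())


# Toy example: only pixel (0, 0) is high in both maps.
x = np.zeros((4, 4))
x[0, 0] = 1.0   # residual with perceptual change (true anomaly)
x[3, 3] = 1.0   # residual without perceptual change (e.g. speckle)
x_rec = np.zeros((4, 4))
lp = np.zeros((4, 4))
lp[0, 0] = 0.8
lp[1, 1] = 0.8  # perceptual change without residual
amap = combined_anomaly_map(x, x_rec, lp)
print(amap[0, 0], amap[3, 3], amap[1, 1])  # 0.8 0.0 0.0
```

Either map alone would flag two pixels here; the product keeps only the one both agree on, which is the false-positive suppression described above.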
5 Discussion
Our study of an automatic anomaly detection pipeline for fetal brain US based on DDPMs demonstrates effective pseudo-healthy synthesis and precise anomaly localization. Notably, newer diffusion models such as AutoDDPM [4] show significant promise for further enhancing these capabilities. However, the behavior of Simplex noise raises specific concerns: its interaction with the noise inherent in US images suggests that it may not be well suited to this application. Further analysis is needed to fully understand and mitigate the adverse effects of this type of artificial noise.
Although the results are promising and the dataset was large enough for the models to capture the underlying normative distribution, statistically validating these findings requires larger datasets. This project is part of an ongoing initiative within the clinic, and more US images are continuously being collected to enrich our dataset. Additionally, assessing the clinical utility of this automated pathology screening is vital: evaluating how well our pipeline supports operators and sonographers in detecting and diagnosing conditions will help determine the practical benefits of integrating this technology into everyday clinical practice.
Future research directions will focus on evaluating our pipeline on US video data to simulate real clinical environments more accurately. This includes capturing the dynamic aspects of fetal movements and variations in US probe positioning, which can significantly impact image quality and diagnostic accuracy. Moreover, expanding the diversity of the dataset with images from different US machines and settings will help improve the generalizability and robustness of the proposed methods.
6 Conclusion
In this work, we introduced a novel framework to enable fetal brain anomaly detection on US scans. We designed the framework to automatically reduce the noise and randomness inherent in the US images, allowing models to learn the normative distribution effectively. For the first time, we applied diffusion-based models to automatically processed fetal brain US scans and evaluated their performance under different noise types and levels. Our results indicate that diffusion models, particularly AutoDDPM, hold significant potential for improving the accuracy and reliability of fetal brain anomaly detection in clinical settings.
References
- [1] Akcay, S., Atapour-Abarghouei, A., Breckon, T.P.: GANomaly: Semi-supervised anomaly detection via adversarial training. In: Computer Vision–ACCV 2018: 14th Asian Conference on Computer Vision, Perth, Australia, December 2–6, 2018, Revised Selected Papers, Part III 14. pp. 622–637. Springer (2019)
- [2] Baumgartner, C.F., Kamnitsas, K., Matthew, J., Fletcher, T.P., Smith, S., Koch, L.M., Kainz, B., Rueckert, D.: SonoNet: Real-time detection and localisation of fetal standard scan planes in freehand ultrasound (2017)
- [3] Behrendt, F., Bhattacharya, D., Krüger, J., Opfer, R., Schlaefer, A.: Patched diffusion models for unsupervised anomaly detection in brain MRI. International Conference on Medical Imaging with Deep Learning (2023)
- [4] Bercea, C.I., Neumayr, M., Rueckert, D., Schnabel, J.A.: Mask, stitch, and re-sample: Enhancing robustness and generalizability in anomaly detection through automatic diffusion models (2023)
- [5] Bercea, C.I., Rueckert, D., Schnabel, J.A.: What do AEs learn? Challenging common assumptions in unsupervised anomaly detection. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 304–314. Springer (2023)
- [6] Bercea, C.I., Wiestler, B., Rueckert, D., Schnabel, J.A.: Generalizing unsupervised anomaly detection: towards unbiased pathology screening. In: Medical Imaging with Deep Learning (2023)
- [7] Budd, S., Sinclair, M., Khanal, B., Matthew, J., Lloyd, D., Gomez, A., Toussaint, N., Robinson, E.C., Kainz, B.: Confident head circumference measurement from ultrasound with real-time feedback for sonographers. In: Shen, D., Liu, T., Peters, T.M., Staib, L.H., Essert, C., Zhou, S., Yap, P.T., Khan, A. (eds.) Medical Image Computing and Computer Assisted Intervention – MICCAI 2019. pp. 683–691. Springer International Publishing, Cham (2019)
- [8] Burgos-Artizzu, X.P., Coronado-Gutierrez, D., Valenzuela-Alcaraz, B., Bonet-Carne, E., Eixarch, E., Crispi, F., Gratacós, E.: FETAL_PLANES_DB: Common maternal-fetal ultrasound images (Jun 2020). https://doi.org/10.5281/zenodo.3904280
- [9] Chen, X., You, S., Tezcan, K.C., Konukoglu, E.: Unsupervised lesion detection via image restoration with a normative prior. Medical Image Analysis 64 (2020)
- [10] Chotzoglou, E., Day, T., Tan, J., Matthew, J., Lloyd, D., Razavi, R., Simpson, J., Kainz, B., et al.: Learning normal appearance for fetal anomaly screening: Application to the unsupervised detection of hypoplastic left heart syndrome. Machine Learning for Biomedical Imaging 1(September 2021 issue), 1–25 (2021)
- [11] Fiorentino, M.C., Villani, F.P., Di Cosmo, M., Frontoni, E., Moccia, S.: A review on deep-learning algorithms for fetal ultrasound-image analysis. Medical Image Analysis 83, 102629 (Jan 2023). https://doi.org/10.1016/j.media.2022.102629
- [12] Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. CoRR abs/2006.11239 (2020), https://arxiv.org/abs/2006.11239
- [13] Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models (2020)
- [14] Lo, J., Lim, A., Wagner, M.W., Ertl-Wagner, B., Sussman, D.: Fetal organ anomaly classification network for identifying organ anomalies in fetal MRI. Frontiers in Artificial Intelligence 5 (2022). https://doi.org/10.3389/frai.2022.832485
- [15] Luo, G., Xie, W., Gao, R., Zheng, T., Chen, L., Sun, H.: Unsupervised anomaly detection in brain MRI: Learning abstract distribution from massive healthy brains. Computers in Biology and Medicine 154, 106610 (2023). https://doi.org/10.1016/j.compbiomed.2023.106610
- [16] Meng, L., Zhao, D., Yang, Z., Wang, B.: Automatic display of fetal brain planes and automatic measurements of fetal brain parameters by transabdominal three-dimensional ultrasound. Journal of Clinical Ultrasound 48 (07 2019). https://doi.org/10.1002/jcu.22762
- [17] Pang, G., Shen, C., Cao, L., Hengel, A.V.D.: Deep learning for anomaly detection: A review. ACM Computing Surveys 54(2), 1–38 (Mar 2021). https://doi.org/10.1145/3439950
- [18] Reddy, U., Filly, R., Copel, J.: Prenatal imaging: Ultrasonography and magnetic resonance imaging. Obstetrics and gynecology 112, 145–57 (08 2008). https://doi.org/10.1097/01.AOG.0000318871.95090.d9
- [19] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation (2015)
- [20] Schlegl, T., Seeböck, P., Waldstein, S.M., Langs, G., Schmidt-Erfurth, U.: f-AnoGAN: Fast unsupervised anomaly detection with generative adversarial networks. Medical Image Analysis 54, 30–44 (2019)
- [21] Sinclair, M., Baumgartner, C.F., Matthew, J., Bai, W., Martinez, J.C., Li, Y., Smith, S., Knight, C.L., Kainz, B., Hajnal, J., King, A.P., Rueckert, D.: Human-level performance on automatic head biometrics in fetal ultrasound using fully convolutional neural networks. In: 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). pp. 714–717 (2018). https://doi.org/10.1109/EMBC.2018.8512278
- [22] Tschuchnig, M.E., Gadermayr, M.: Anomaly Detection in Medical Imaging - A Mini Review, p. 33–38. Springer Fachmedien Wiesbaden (2022). https://doi.org/10.1007/978-3-658-36295-9_5
- [23] Wolleb, J., Bieder, F., Sandkühler, R., Cattin, P.C.: Diffusion models for medical anomaly detection. Medical Image Computing and Computer Assisted Intervention pp. 35–45 (2022)
- [24] Wyatt, J., Leach, A., Schmon, S.M., Willcocks, C.G.: AnoDDPM: Anomaly detection with denoising diffusion probabilistic models using simplex noise. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. pp. 650–656 (06 2022)
- [25] Xie, H.N., Wang, N., He, M., Zhang, L.H., Cai, H.M., Xian, J.B., Lin, M.F., Zheng, J., Yang, Y.Z.: Using deep-learning algorithms to classify fetal brain ultrasound images as normal or abnormal. Ultrasound in Obstetrics & Gynecology 56(4), 579–587 (2020). https://doi.org/10.1002/uog.21967
- [26] Zhang, H., Guo, W., Zhang, S., Lu, H., Zhao, X.: Unsupervised deep anomaly detection for medical images using an improved adversarial autoencoder. Journal of Digital Imaging 35 (01 2022). https://doi.org/10.1007/s10278-021-00558-8
- [27] Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric (2018)
- [28] Zimmerer, D., Isensee, F., Petersen, J., Kohl, S., Maier-Hein, K.: Unsupervised anomaly localization using variational auto-encoders (2019)