
Self-supervised Brain Lesion Generation for Effective Data Augmentation of Medical Images

Jiayu Huo Sébastien Ourselin Rachel Sparks School of Biomedical Engineering and Imaging Sciences (BMEIS), King’s College London, London, UK.
Abstract

Accurate brain lesion delineation is important for planning neurosurgical treatment. Automatic brain lesion segmentation methods based on convolutional neural networks have demonstrated remarkable performance. However, neural network performance is constrained by the lack of large-scale well-annotated training datasets. In this manuscript, we propose a comprehensive framework to efficiently generate new samples for training a brain lesion segmentation model. We first train a lesion generator, based on an adversarial autoencoder, in a self-supervised manner. Next, we utilize a novel image composition algorithm, Soft Poisson Blending, to seamlessly combine synthetic lesions and brain images to obtain training samples. Finally, to effectively train the brain lesion segmentation model with augmented images, we introduce a new prototype consistency regularization to align real and synthetic features. Our framework is validated by extensive experiments on two public brain lesion segmentation datasets: ATLAS v2.0 and Shift MS. Our method outperforms existing brain image data augmentation schemes. For instance, our method improves the Dice score from 50.36% to 60.23% compared to a U-Net trained with conventional data augmentation techniques for the ATLAS v2.0 dataset.

keywords:
Brain Lesion Segmentation, Data Augmentation, Poisson Blending, Prototype Learning.
journal: Medical Image Analysis

1 Introduction

Brain lesions are often indicative of serious neurological conditions, from cancer to stroke. Magnetic Resonance (MR) Imaging is widely used to detect brain lesions as MR provides excellent soft-tissue contrast, allowing for a clear distinction between healthy and abnormal brain tissue. Accurate segmentation of brain lesions is crucial for quantitative analysis of lesion progression and planning surgical treatments. The current clinical standard is human delineation of the brain lesion boundary by an expert which is tedious, time-consuming, and costly. Neural networks have emerged as a promising technique to automate brain lesion segmentation [31, 26]. However, training neural networks requires large amounts of well-annotated images to ensure good performance. The need for large training datasets limits the development of automatic brain lesion segmentation models since the scale of brain lesion datasets is often limited.

Data augmentation is a widely used technique to increase training dataset size and diversity in order to improve model performance. Conventional data augmentation strategies include random rotations, brightness adjustment, etc. Although these simple spatial and appearance transformations improve segmentation performance to some extent, they do not fundamentally increase the diversity of the dataset and provide a smaller boost in performance compared to acquiring new data. Recently, methods to perform data augmentation based on image fusion (such as Mixup [43], CutMix [42], etc. ) have been developed to increase training dataset diversity. However, these approaches may shift the distribution of the original dataset [29], which is catastrophic for small datasets as the model learns non-representative features in the augmented images [43]. Neural network approaches, such as Generative Adversarial Networks (GANs) [7] have been proposed to synthesize data for model training. However, the performance of GANs, similar to other neural networks, is constrained by the size of the training dataset, posing challenges in generating realistic images when only limited data is available. Regardless of the method used to create augmented samples, combining augmented and real samples does not guarantee the segmentation model will learn discriminative features to segment lesions on real samples as there is no supervision in the feature space. Therefore, we raise two questions: How can we generate realistic images that will not shift the original data distribution with limited training samples? How can we effectively use synthetic samples to train a segmentation model to perform well on real samples?

We present a comprehensive framework to effectively augment brain imaging data with synthetic lesions to train a lesion segmentation model. Our framework has three stages: (1) train a lesion generator in a self-supervised manner, and sample feasible latent vectors from a constrained embedding space to create realistic paired lesion images and masks; (2) blend lesion images into existing brain images leveraging our novel image composition technique called Soft Poisson Blending (SPB); (3) train the segmentation model using a prototype consistency regularization term to align real and synthetic lesion features for better performance. The main contributions of our work are:

  • 1.

    We develop a data augmentation method to enhance the performance of neural networks trained to perform segmentation tasks. It comprises a two-stage adversarial autoencoder (AAE), consisting of shape and intensity generation networks, that generates new foreground regions on-the-fly during training. Distinct from other GANs, we designed a self-supervised approach in which we simulate image-label pairs for training our AAE. This enables our AAE to learn a larger distribution of lesions and improves the quality of synthetic results.

  • 2.

    We introduce Soft Poisson Blending (SPB), based on Poisson Image Editing [27], to ensure realistic and smooth boundaries when inserting generated lesions into a brain image. SPB computes a refined guidance vector field that adjusts intensity values to make the synthetic lesion appear similar to the surrounding brain tissues.

  • 3.

    We design a new loss term, prototype consistency regularization, to learn common features between synthetic and real training samples.

Refer to caption
Fig. 1: (a) Real lesions (green arrow) and synthetic lesions (orange arrow) in images from the ATLAS v2.0 (top) and Shift MS (bottom) datasets, demonstrating that the synthetic lesions have a similar appearance to real lesions. (b) t-SNE of the intensity embedding space for real lesions, synthetic lesions, and training samples. (c) t-SNE of the shape embedding space for real lesions, synthetic lesions, and training samples.

2 Related Work

2.1 Data Augmentation for Data Scarcity

To mitigate the problem of data scarcity when training neural networks, data augmentation techniques are used to increase the training dataset size and thereby improve model performance. Conventional data augmentation techniques include shape transformations (flip, rotation, scaling, etc.) and appearance transformations (color jittering, brightness, and contrast adjustment, etc.) [15, 13]. However, the diversity of the dataset achieved by such transformations is limited. Furthermore, the intrinsic characteristics of the training samples are not fundamentally changed, limiting improvements in model performance. The development of GANs enabled more advanced data augmentation where entirely new samples are created by the model. GANs can generate realistic samples for both natural [2] and medical [24, 34] images. GANs may also be designed to perform image-to-image translation, rather than generating images from noise, in order to create images of different modalities or characteristics. Nevertheless, GANs require large datasets when training the model to enable accurate image generation. Recently, image-manipulation-based data augmentation techniques [43, 42, 44] have been developed to increase training dataset size and variety by using a set of image manipulation rules to generate new samples. For instance, CutMix [42] cuts and fuses patches from different images to create new training samples. Such techniques must be carefully designed and may shift the distribution of the training dataset, especially when the original training dataset size is small [29]. In this study, we combine the strengths of generative models and image-manipulation-based data augmentation techniques to synthesize realistic brain images with limited training samples.

2.2 Poisson Blending in Deep Learning

Poisson blending is an image processing technique to seamlessly integrate regions of a source image into a target image. In the context of data augmentation, this typically involves identifying the foreground of a source image and integrating this region into a target image. The blending process is guided by the gradient of the source image and the intensity of the target image [27]. Poisson blending generates more coherent images compared to simpler fusion methods such as Copy-Paste [6]. Poisson blending has been used for data augmentation for a variety of natural images [20]. In the context of medical imaging, [36] developed a self-supervised learning strategy to detect abnormalities in chest X-ray images, and abnormalities were generated by fusing image patches into the target image using Poisson blending. [38] utilized Poisson blending to generate retinal images with lesions and different appearances than in the original training dataset to improve the performance of a lesion segmentation model. [17] introduced Poisson blending to the gastric disease classification task for better model generalization ability. However, all of these applications were applied to 2D images, and to our knowledge, Poisson blending has not yet been applied to 3D medical images.

2.3 Prototype Learning

Prototype networks [35] were first presented for few-shot learning, where the network learns to represent each class using a prototype (i.e., a representative vector in the embedding space). Each class prototype is typically obtained by computing the average feature of samples belonging to the class. This method was initially applied in image classification, where distances between different class prototypes were maximized across training samples [35]. Prototype learning was extended to image segmentation by computing cosine similarity between the class prototypes and individual pixel features. In this context, each test pixel is assigned to the class whose prototype it is most similar to. [39] designed a bidirectional prototype alignment mechanism for the few-shot image segmentation task. [16] utilized class prototypes to augment training samples in the feature space. [40] introduced a cyclic prototype consistency framework for semi-supervised medical image segmentation. In our work, we draw on the idea of prototype consistency to introduce a regularization term during segmentation model training to align features between synthetic and real samples.

3 Methodology

Fig. 2 illustrates our entire framework comprised of three stages: I. training the lesion generator, II. inserting synthetic lesions into brain images, and III. using the generated images for segmentation model training and incorporating prototype consistency regularization. Exemplar synthetic lesions and real lesions are shown in Fig. 1 (a). The synthetic lesions have appearances similar to those of real lesions.

Refer to caption
Fig. 2: Overview of our framework containing three stages. First, the lesion generator is trained via a self-supervised learning strategy and used to generate synthetic lesions based on constrained latent space sampling in Stage I. In Stage II, we seamlessly compose synthetic lesions into full brain images using the proposed Soft Poisson Blending (SPB) to increase the number of training samples. In Stage III, we train the downstream segmentation model with the prototype consistency regularization to align real and synthetic features.

3.1 Self-supervised AAE for Brain Lesion Synthesis

Training a 3D generative model with a small dataset is challenging and achieving good model performance is unlikely; therefore, we decompose brain lesion generation into two sub-tasks to reduce model complexity and use a self-supervised learning strategy during training. We designed two models, a shape adversarial auto-encoder (shape AAE) and an intensity adversarial auto-encoder (intensity AAE), to first create lesion masks and then perform texture synthesis to generate lesion images corresponding to the masks. Both the shape and intensity AAEs are trained in a self-supervised manner. For lesion synthesis, real lesion images are used to define the data distribution in the latent space, and synthetic lesion images are created by sampling latent vectors from this distribution only and passing them through the decoder blocks of the trained AAEs.

3.1.1 Shape and Intensity AAE Design

We follow the model architecture proposed in [30] to design the shape and intensity AAEs. As shown in Fig. 2, each AAE contains an encoder $E$, a decoder $G$, an image discriminator $D_{x}$, and a latent discriminator $D_{z}$. Although the structures are similar, for the intensity AAE we introduce a mask embedding block (MEB) [12] to provide shape guidance when generating the lesion intensity. The detailed structure of the MEB is shown in Fig. 3. It embeds the mask into the feature space to control the shape of synthetic lesions.

Refer to caption
Fig. 3: The detailed structure of the MEB block.

For training the AAE models we use a three-term loss: a reconstruction loss $\mathcal{L}_{rec}$, a latent adversarial loss $\mathcal{L}_{adv\_z}$, and an image adversarial loss $\mathcal{L}_{adv\_x}$. $\mathcal{L}_{rec}$ is computed as the mean absolute error (MAE) between the input image $\mathcal{I}$ and the reconstructed image $\hat{\mathcal{I}}$:

$\mathcal{L}_{rec}=\|\mathcal{I}-\hat{\mathcal{I}}\|_{1}.$ (1)

$\mathcal{L}_{rec}$ encourages $\hat{\mathcal{I}}$ and $\mathcal{I}$ to be globally similar.

The latent adversarial loss $\mathcal{L}_{adv\_z}$ is formulated to ensure that the latent space of the lesions has a normal distribution:

$\mathcal{L}_{adv\_z}(D)=\mathbb{E}[\max(0,1+D_{z}(E(\mathcal{I})))]+\mathbb{E}[\max(0,1-D_{z}(\mathcal{N}(0,1)))],$ (2)
$\mathcal{L}_{adv\_z}(G)=-\mathbb{E}[D_{z}(E(\mathcal{I}))],$ (3)

where $\mathcal{N}(0,1)$ is a normal distribution with mean 0 and standard deviation 1. Similarly, the image adversarial loss $\mathcal{L}_{adv\_x}$ is designed to ensure the reconstructed image is realistic in appearance. It is defined as:

$\mathcal{L}_{adv\_x}(D)=\mathbb{E}[\max(0,1+D_{x}(\hat{\mathcal{I}}))]+\mathbb{E}[\max(0,1-D_{x}(\mathcal{I}))],$ (4)
$\mathcal{L}_{adv\_x}(G)=-\mathbb{E}[D_{x}(\hat{\mathcal{I}})].$ (5)
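
For illustration, a minimal PyTorch sketch of Eqs. (1)-(5) might look as follows; the handles passed in (encoded latents, discriminators `d_z` and `d_x`) stand for the AAE components described above and are not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(x, x_hat):
    # Eq. (1): mean absolute error between input and reconstruction.
    return F.l1_loss(x_hat, x)

def latent_adv_loss_d(d_z, z_prior, z_encoded):
    # Eq. (2): hinge loss for the latent discriminator; the prior sample
    # z_prior ~ N(0, 1) is treated as "real", the encoded vector as "fake".
    return (torch.relu(1.0 + d_z(z_encoded)).mean()
            + torch.relu(1.0 - d_z(z_prior)).mean())

def latent_adv_loss_g(d_z, z_encoded):
    # Eq. (3): encoder side of the latent adversarial loss.
    return -d_z(z_encoded).mean()

def image_adv_loss_d(d_x, x_real, x_hat):
    # Eq. (4): hinge loss for the image discriminator.
    return (torch.relu(1.0 + d_x(x_hat)).mean()
            + torch.relu(1.0 - d_x(x_real)).mean())

def image_adv_loss_g(d_x, x_hat):
    # Eq. (5): decoder side of the image adversarial loss.
    return -d_x(x_hat).mean()
```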

3.1.2 Training Set Generation

Unlike other generative models, which are trained with real image-mask pairs, we train our model in a self-supervised manner by generating lesion-mask pairs. As shown in Fig. 1 (b) and (c), the latent distribution of real lesions (pink triangles) is a subset of the pre-generated lesions (grey dots), which indicates that training on pre-generated lesion-mask pairs is sufficient to learn a representative feature embedding space for real lesion-mask pairs.

The pre-generated lesion-mask pairs are created as follows. Inspired by [10], we first generate $n \sim \mathcal{U}(1,5)$ overlapping ellipsoids to simulate a general lesion shape. The half-axis lengths along the three directions follow the uniform distribution $\mathcal{U}(5,15)$. Elastic deformations controlled by $\sigma \sim \mathcal{U}(3,6)$ are applied to the ellipsoids to make the shape more natural and irregular. Finally, we add Perlin noise [28] to make the boundary more irregular. A comparison between real lesion masks and those generated by this approach is shown in Fig. 4 (a) and (b). Note the complexity of the pre-generated masks exceeds that of real lesion masks; this enhances the AAE's ability to learn the reconstruction task.
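
A minimal NumPy/SciPy sketch of this pre-generation procedure is given below. It is an illustrative reconstruction rather than the authors' code: the ellipsoid centre range, the deformation amplitude `alpha`, and the use of smoothed Gaussian noise as a stand-in for the Perlin noise of [28] are our assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def random_ellipsoid_mask(size=64, rng=None):
    # 1-5 overlapping ellipsoids with half-axis lengths ~ U(5, 15) (Sec. 3.1.2).
    if rng is None:
        rng = np.random.default_rng()
    zz, yy, xx = np.mgrid[:size, :size, :size]
    mask = np.zeros((size, size, size), dtype=bool)
    for _ in range(rng.integers(1, 6)):              # n ~ U(1, 5)
        c = rng.uniform(size * 0.3, size * 0.7, 3)   # centre kept near the middle (assumption)
        a = rng.uniform(5, 15, 3)                    # half-axis lengths ~ U(5, 15)
        mask |= (((zz - c[0]) / a[0]) ** 2 +
                 ((yy - c[1]) / a[1]) ** 2 +
                 ((xx - c[2]) / a[2]) ** 2) <= 1.0
    return mask

def elastic_deform(mask, sigma, alpha=10.0, rng=None):
    # Random elastic deformation; sigma ~ U(3, 6) controls smoothness, alpha the amplitude.
    if rng is None:
        rng = np.random.default_rng()
    shape = mask.shape
    disp = [gaussian_filter(rng.uniform(-1, 1, shape), sigma) * alpha for _ in range(3)]
    coords = np.meshgrid(*[np.arange(s) for s in shape], indexing="ij")
    warped = [c + d for c, d in zip(coords, disp)]
    return map_coordinates(mask.astype(float), warped, order=1) > 0.5

def roughen_boundary(mask, sigma=2.0, amplitude=0.3, rng=None):
    # Stand-in for the Perlin-noise step [28]: perturb the smoothed mask with
    # band-limited random noise so the boundary becomes irregular.
    if rng is None:
        rng = np.random.default_rng()
    noise = gaussian_filter(rng.standard_normal(mask.shape), sigma)
    return (gaussian_filter(mask.astype(float), 1.0) + amplitude * noise) > 0.5

# Example: one pre-generated 64^3 lesion mask.
rng = np.random.default_rng(0)
mask = random_ellipsoid_mask(rng=rng)
mask = elastic_deform(mask, sigma=rng.uniform(3, 6), rng=rng)
mask = roughen_boundary(mask, rng=rng)
```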

To generate the lesion images, we randomly select a location within a brain image from the training set and then extract the intensity values for voxels inside the pre-generated mask within that region. To increase variation in the training set, we apply a foreground intensity perturbation [11] to randomly adjust the intensity values. Fig. 4 (c) and (d) show real lesions and those generated by our approach. Note the styles of the images are similar, which ensures the pre-generated lesion-mask pairs are suitable for model training.
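
A possible sketch of this step, assuming $64^3$ crops matching the generator input and a simple global intensity scaling in place of the full foreground intensity perturbation of [11]:

```python
import numpy as np

def pregenerate_lesion_image(brain, mask, rng=None, scale_range=(0.7, 1.3)):
    # Extract intensities under the pre-generated mask at a random location inside
    # the brain volume, then apply a random multiplicative perturbation.
    if rng is None:
        rng = np.random.default_rng()
    d, h, w = mask.shape
    z = rng.integers(0, brain.shape[0] - d)
    y = rng.integers(0, brain.shape[1] - h)
    x = rng.integers(0, brain.shape[2] - w)
    crop = brain[z:z + d, y:y + h, x:x + w].astype(float)
    lesion = crop * mask                       # keep only voxels inside the mask
    lesion *= rng.uniform(*scale_range)        # simplified intensity perturbation
    return lesion
```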

3.1.3 Constrained Sampling for New Lesion Synthesis

Once AAE model training is complete, we freeze both shape AAE and intensity AAE model weights. As we see in Fig. 1 (b) and (c), the latent space of real lesions is a subset of the latent space of the lesions generated for training the AAE models. Therefore, we use a constrained sampling strategy to synthesize lesions that are more similar to real lesions in the latent space. Specifically, we first obtain the latent embedding vectors for real lesion masks and lesion intensity images using the shape encoder $E_{S}$ and the intensity encoder $E_{I}$. Next, we apply Principal Component Analysis (PCA) to these vectors to obtain a dimensionality-reduced latent space representation. Note we apply PCA to the shape and intensity latent embedding vectors separately. We keep the top $K$ principal components for each space, which cover 90% of the latent embedding vector variance.

To create a new synthetic lesion, we uniformly sample two vectors, one from the dimensionality-reduced shape latent space and one from the dimensionality-reduced intensity latent space. We then map these dimensionality-reduced latent vectors back to the original latent embedding spaces. Finally, we use these two latent embedding vectors as the input to the shape decoder $G_{S}$ and intensity decoder $G_{I}$, respectively, to generate new synthetic lesion masks and images.
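
A scikit-learn sketch of the constrained sampling is shown below. Sampling uniformly within the per-component minimum and maximum of the real-lesion embeddings is our assumption, as the paper does not spell out the support of the uniform distribution.

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_constrained_sampler(real_latents, var_kept=0.90):
    # Fit PCA on the encoder outputs of the real lesions (shape or intensity codes).
    # A float n_components keeps the top-K components explaining ~90% of the variance.
    pca = PCA(n_components=var_kept)
    reduced = pca.fit_transform(real_latents)          # (N, K)
    return pca, reduced.min(axis=0), reduced.max(axis=0)

def sample_new_latent(pca, low, high, rng=None):
    # Uniformly sample a point inside the range spanned by real lesions, then map
    # it back to the original latent space for the frozen decoder (G_S or G_I).
    if rng is None:
        rng = np.random.default_rng()
    z_reduced = rng.uniform(low, high)
    return pca.inverse_transform(z_reduced[None, :])[0]
```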

Refer to caption
Fig. 4: Lesion masks in three views for (a) real masks and (b) pre-generated masks for the shape AAE model training. Lesion images in three views for (c) real images and (d) pre-generated images for the intensity AAE model training.

3.2 Lesion Image Composition

After we generate the synthetic lesion images, we have to fuse the lesion images with a brain image to generate training samples for the segmentation model training. We first select a plausible location in the brain for the generated lesion, then create a composite image using our modified Poisson image editing method called Soft Poisson Blending (SPB) to ensure the boundary between the image and lesion is seamless. We detail this approach below.

3.2.1 Lesion Location Selection

The brain has a regular anatomical structure, with distinct regions including the ventricles and brain stem. The location of brain lesions depends on the underlying pathology (e.g., stroke or multiple sclerosis). For the datasets in this work, we use FastSurfer [8] to segment the white matter of the original patient's brain as a region proposal for lesion location selection. After obtaining the white matter mask, we apply a morphological binary erosion to shrink the masked region so that the synthetic lesion area will not exceed the mask boundary. We then randomly (following a uniform distribution) select a voxel in the masked area as the center of the synthetic lesion.
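
As a sketch, assuming a boolean white-matter mask (e.g., from FastSurfer) and an illustrative erosion radius:

```python
import numpy as np
from scipy.ndimage import binary_erosion

def pick_lesion_center(wm_mask, erosion_iters=8, rng=None):
    # Shrink the white-matter mask so a lesion centred on the sampled voxel stays
    # inside it, then pick a centre uniformly at random. The number of erosion
    # iterations is an illustrative choice tied to the expected lesion radius.
    if rng is None:
        rng = np.random.default_rng()
    eroded = binary_erosion(wm_mask, iterations=erosion_iters)
    candidates = np.argwhere(eroded if eroded.any() else wm_mask)
    return tuple(candidates[rng.integers(len(candidates))])
```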

3.2.2 Soft Poisson Blending

We developed Soft Poisson Blending (SPB), a modified implementation of Poisson Image Editing [27]. The key idea of Poisson Blending is to use the Poisson partial differential equation under a Dirichlet boundary condition to specify the intensity values at the boundary. We first adapt the conventional Poisson Blending approach to 3D images. Second, we refine the guidance vector field to ensure that the lesion foreground exhibits a natural internal appearance while having edge consistency with the background image.

For a brain image $s$, we define $\Omega$ as the target region that we would like to blend with the lesion image $g$, and $\partial\Omega$ as the boundary of $\Omega$. Let $f$ be the value function defined over $\Omega$, which takes the value $f^{*}$ on $\partial\Omega$. We solve the optimization problem:

$\min_{f}\iint_{\Omega}|\nabla f-V(x)|^{2},\quad f|_{\partial\Omega}=f^{*}|_{\partial\Omega},\ x\in\Omega$ (6)

where $V$ is the guidance vector field and $\nabla$ is the gradient operator. This equation guarantees that: i) the gradient of the foreground content is as close as possible to $V$; ii) the boundary voxel values of the foreground are consistent with the existing $s$, i.e., a seamless transition. The solution under the Dirichlet boundary condition is the Poisson equation:

$\Delta f=\operatorname{div}V(x),\quad f|_{\partial\Omega}=f^{*}|_{\partial\Omega},\ x\in\Omega$ (7)

where $\Delta$ is the Laplacian operator and $\operatorname{div}$ is the divergence operator. In the original formulation, the guidance vector field $V$ (Fig. 5 (f)) is calculated as the mixed gradient of the brain image $s$ (Fig. 5 (a)) and the synthetic lesion image $g$ (Fig. 5 (d)) by selecting between $\nabla s$ and $\nabla g$. However, using this definition of $V$ to construct the blended image can lead to an unnatural appearance (Fig. 5 (c)). This unnatural appearance is caused by the absolute value of $\nabla g$ on $\partial\Omega$ being much higher than that of $\nabla s$, since the regions of $g$ outside of the foreground are zero. Based on this observation, for Soft Poisson Blending we modify the computation of $V$ as follows:

$V(x)|_{x\in\Omega}=\begin{cases}\nabla s(x) & \text{if }|\nabla s(x)|>|\nabla g(x)|\text{ and }x\in\partial\Omega,\\ \nabla g(x) & \text{otherwise.}\end{cases}$ (8)

This results in the blended image becoming more realistic (Fig. 5 (b)) compared to the original Poisson Blending algorithm (Fig. 5 (c)).
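
A compact 3D sketch of Eqs. (7)-(8) is given below, using a plain Jacobi iteration as the Poisson solver; the solver choice and iteration count are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import binary_erosion

def spb_guidance_field(s, g, omega):
    # Eq. (8): on the boundary of omega, keep the background gradient whenever its
    # magnitude exceeds the lesion gradient (suppressing the spurious step edge of g);
    # elsewhere use the lesion gradient.
    boundary = omega & ~binary_erosion(omega)
    V = []
    for ds, dg in zip(np.gradient(s.astype(float)), np.gradient(g.astype(float))):
        v = dg.copy()
        use_s = boundary & (np.abs(ds) > np.abs(dg))
        v[use_s] = ds[use_s]
        V.append(v)
    return V

def soft_poisson_blend(s, g, omega, n_iter=2000):
    # Solve the discrete Poisson equation (7) on omega with Dirichlet boundary values
    # taken from s, via Jacobi iterations (a sparse direct solve would be faster).
    V = spb_guidance_field(s, g, omega)
    div_v = sum(np.gradient(v, axis=i) for i, v in enumerate(V))
    f = s.astype(float).copy()
    for _ in range(n_iter):
        neighbours = sum(np.roll(f, shift, axis) for axis in range(3) for shift in (1, -1))
        f_new = (neighbours - div_v) / 6.0     # 6-neighbour discrete Laplacian in 3D
        f[omega] = f_new[omega]                # voxels outside omega keep the values of s
    return f
```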

Refer to caption
Fig. 5: (a) The target brain image used for the background image. (b) The composited image based on SPB. (c) The composited image based on the original Poisson Blending. (d) The synthetic brain lesion. (e) The guidance vector field used for SPB. (f) The guidance vector field used for the original Poisson Blending. The yellow arrow points to the gradient values on the region boundary.

3.3 Prototype Consistency for Feature Alignment

After synthetic lesions are blended into the brain images, the next step is training a segmentation model. Note that here the training dataset is a mixture of real and synthetic lesions, which provides us with a unique opportunity to use this information to improve the lesion representations at the feature map level. We propose a consistency regularization to prefer networks where the feature maps for the two types of lesions (real and synthetic) are similar. We hypothesize this will tend towards feature maps that are more general to the segmentation problem and less specific to features of the particular training dataset, thereby increasing segmentation model robustness.

The feature map of real lesions is denoted as $F_{real}=F\cdot\mathds{1}[M=1]$, and the feature map of synthetic lesions is denoted as $F_{syn}=F\cdot\mathds{1}[M=2]$. Here $F$ indicates the feature map of the composited image $\hat{I}$, and $\mathds{1}(\cdot)$ is an indicator function whose value is 1 if the condition is true and 0 otherwise. Inspired by the prototypical network [35], we aim to force the segmentation model to learn similar feature distributions for $F_{real}$ and $F_{syn}$ via class prototypes. Specifically, we first obtain feature prototypes for both lesion types by averaging the feature maps of the specific lesion type over the spatial dimensions:

$\mathcal{P}_{real}=\frac{\sum_{x,y,z}F^{(x,y,z)}_{real}}{\sum_{x,y,z}\mathds{1}[M^{(x,y,z)}=1]},$ (9)
$\mathcal{P}_{syn}=\frac{\sum_{x,y,z}F^{(x,y,z)}_{syn}}{\sum_{x,y,z}\mathds{1}[M^{(x,y,z)}=2]},$ (10)

where $(x,y,z)$ is the spatial coordinate. The prototype difference loss can then be computed as:

$\mathcal{L}_{pd}=\|\mathcal{P}_{real}-\mathcal{P}_{syn}\|_{1},$ (11)

where $\|\cdot\|_{1}$ is the L1-norm of a vector.

Only optimizing $\mathcal{L}_{pd}$ neglects the intrinsic relation between class-specific features since it only minimizes the discrepancy between the two class prototypes [1]. To this end, we develop a prototype relationship loss to maximize the consistency between relationship matrices constructed from the class prototypes and class-specific features. Since the number of voxels in real and synthetic lesion areas can differ, we randomly sample $k$ feature vectors within each class to obtain $\mathcal{F}_{real}\in\mathbb{R}^{n\times c\times k}$ and $\mathcal{F}_{syn}\in\mathbb{R}^{n\times c\times k}$, where $c$ is the number of feature channels. To measure consistency we compute cosine similarity:

$\cos(\mathcal{P},\mathcal{F})=\frac{\mathcal{P}\cdot\mathcal{F}}{\|\mathcal{P}\|_{2}\cdot\|\mathcal{F}\|_{2}},$ (12)

where $\|\cdot\|_{2}$ denotes the L2-norm. The prototype relationship loss is computed as:

$\mathcal{L}_{prd}=\|\cos(\mathcal{P}^{i}_{real},\mathcal{F}^{i}_{real})-\cos(\mathcal{P}^{i}_{real},\mathcal{F}^{i}_{syn})\|_{1}+\|\cos(\mathcal{P}^{i}_{syn},\mathcal{F}^{i}_{real})-\cos(\mathcal{P}^{i}_{syn},\mathcal{F}^{i}_{syn})\|_{1}.$ (13)

The loss function for training the segmentation model is:

$\mathcal{L}=\mathcal{L}_{seg}+\underbrace{\lambda_{1}\mathcal{L}_{pd}+\lambda_{2}\mathcal{L}_{prd}}_{\mathcal{L}_{pc}},$ (14)

where $\lambda_{1}$ and $\lambda_{2}$ are weight factors. $\mathcal{L}_{pc}$ is the prototype consistency loss, comprised of the difference and relationship loss terms, and $\mathcal{L}_{seg}$ is a compound loss combining the Dice and cross-entropy loss functions.
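
A PyTorch sketch of Eqs. (9)-(14) for a single composited image is given below. The per-image handling, the fallback when one class is absent in a crop, and the function names are our assumptions; the label convention (1 = real, 2 = synthetic) follows the notation above.

```python
import torch
import torch.nn.functional as F

def prototype_consistency_loss(feat, mask, k=64):
    # feat: (C, D, H, W) feature map of one composited image (at mask resolution);
    # mask: (D, H, W) with 1 = real lesion, 2 = synthetic lesion.
    C = feat.shape[0]
    flat = feat.reshape(C, -1)
    m = mask.reshape(-1)
    f_real = flat[:, m == 1]                           # (C, N_real)
    f_syn = flat[:, m == 2]                            # (C, N_syn)
    if f_real.shape[1] == 0 or f_syn.shape[1] == 0:    # one class missing in this crop
        return feat.new_zeros(()), feat.new_zeros(())

    # Eqs. (9)-(10): class prototypes as spatial averages; Eq. (11): L1 difference.
    p_real, p_syn = f_real.mean(dim=1), f_syn.mean(dim=1)
    l_pd = (p_real - p_syn).abs().sum()

    # Eq. (13): relationship loss over k randomly sampled feature vectors per class.
    def sample_k(f):
        idx = torch.randint(f.shape[1], (k,), device=f.device)
        return f[:, idx]                               # (C, k)

    s_real, s_syn = sample_k(f_real), sample_k(f_syn)

    def rel(p, s):
        # Eq. (12): cosine similarity between a prototype and each sampled vector.
        return F.cosine_similarity(p[:, None], s, dim=0)   # (k,)

    l_prd = ((rel(p_real, s_real) - rel(p_real, s_syn)).abs().sum()
             + (rel(p_syn, s_real) - rel(p_syn, s_syn)).abs().sum())
    return l_pd, l_prd

# Eq. (14): total loss, with lambda_1 = 1 and lambda_2 = 50 as reported in Sec. 4.2:
# loss = seg_loss + 1.0 * l_pd + 50.0 * l_prd
```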

4 Experiments

Our framework is evaluated on two public brain lesion segmentation datasets: the ATLAS v2.0 dataset and the Shift MS segmentation dataset, as described in Section 4.1. We compare our approach to existing methods (Section 4.3) to show its superiority in both the lesion synthesis and segmentation tasks. We further conduct ablation studies to validate the effectiveness of each component in our framework (Section 4.4).

Table 1: Shift MS dataset details, including scanner location, type, magnetic field strength, and dataset split.
Dataset | Location | Scanner | Field | Trn | Dev_in | Dev_out | Evl_in
MSSEG-1 | Rennes | S Verio | 3.0T | 8 | 2 | 0 | 5
MSSEG-1 | Bordeaux | GE Disc | 3.0T | 5 | 1 | 0 | 2
MSSEG-1 | Lyon | S Aera / P Ingenia | 1.5T / 3.0T | 10 | 2 | 0 | 17
ISBI | Best | P Medical | 3.0T | 10 | 2 | 0 | 9
PubMRI | Ljubljana | S Mag | 3.0T | 0 | 0 | 25 | 0

4.1 Dataset and Preprocessing

4.1.1 ATLAS v2.0 Dataset

The ATLAS v2.0 dataset [19] is a large stroke segmentation dataset that contains 655 T1-weighted brain images collected from 33 research cohorts. All images undergo intensity standardization and are registered to the MNI-152 template (1 mm³ voxel spacing). A defacing step is applied to anonymize the scans. All lesion masks are annotated and then checked by two neurological experts. We randomly select 80% of the dataset as the training set, and keep the remaining 20% for model evaluation.

4.1.2 Shift MS Dataset

The Shift MS dataset [23] is a multi-center white matter multiple sclerosis segmentation dataset comprised of MSSEG-1 [5], ISBI [3], PubMRI [18], and a private dataset collected from the University of Lausanne. We use the three public datasets for model training and evaluation since the private dataset is not publicly available. Detailed information on the dataset is shown in Table 1. The training set (Trn) is used for model training; the in-domain development set (Dev_in), the out-domain development set (Dev_out), and the in-domain evaluation set (Evl_in) are used for evaluating model segmentation performance only.

Each subject has two available modalities, FLAIR and T1 with contrast. In this work, we only use the FLAIR modality. All of the subjects have been preprocessed with image denoising, skull stripping, and bias field correction. We resample all images to 1 mm³ isotropic spacing. The ground-truth segmentation mask is determined by the consensus of annotations acquired from clinical experts.

Table 2: Quantitative results of our method and other generative models for lesion synthesis. Here ’real’ indicates models are trained using only real images, and ’synt’ indicates models are trained with the self-supervised strategy.
Methods | Training Data Type | #Param | ATLAS v2.0 PSNR↑ | ATLAS v2.0 MAE↓ | Shifts MS PSNR↑ | Shifts MS MAE↓
AE [33] | real | 64.11M | 31.72 | 0.0014 | 30.42 | 0.0017
AE [33] | synt | 64.11M | 32.07 | 0.0011 | 31.56 | 0.0014
AAE [22] | real | 64.41M | 30.66 | 0.0016 | 31.33 | 0.0014
AAE [22] | synt | 64.41M | 32.11 | 0.0010 | 32.84 | 0.0009
f-AnoGAN [34] | real | 78.15M | 29.86 | 0.0019 | 28.35 | 0.0023
f-AnoGAN [34] | synt | 78.15M | 30.22 | 0.0017 | 30.05 | 0.0018
DDPM [9] | real | 74.23M | 32.53 | 0.0009 | 32.38 | 0.0009
DDPM [9] | synt | 74.23M | 34.48 | 0.0007 | 34.23 | 0.0008
PNDM [21] | real | 74.23M | 32.56 | 0.0009 | 32.32 | 0.0009
PNDM [21] | synt | 74.23M | 34.87 | 0.0007 | 34.68 | 0.0007
Ours | real | 67.08M | 35.23 | 0.0006 | 34.38 | 0.0007
Ours | synt | 67.08M | 37.42 | 0.0004 | 36.57 | 0.0005
Table 3: Quantitative results of our method compared to other segmentation models for the Shift MS dataset. * indicates that the p-value < 0.05 compared to the second-best model, computed with a paired Student's t-test.
Methods | Shifts MS Dev_in (DSC↑ / ASD↓ / HD95↓) | Shifts MS Dev_out (DSC↑ / ASD↓ / HD95↓) | Shifts MS Evl_in (DSC↑ / ASD↓ / HD95↓)
UNet [31] | 62.73±21.04 / 7.71±9.55 / 20.40±18.91 | 61.58±21.70 / 4.34±8.57 / 14.52±13.60 | 51.74±20.29 / 14.01±18.34 / 29.02±23.83
Attention UNet [25] | 72.34±13.24 / 2.46±2.77 / 12.53±16.88 | 64.80±22.98 / 3.66±8.41 / 12.83±15.65 | 65.62±15.21 / 3.34±5.07 / 13.92±16.64
SwinUNETR [37] | 66.65±14.34 / 3.54±4.51 / 18.04±17.42 | 61.39±22.29 / 4.79±8.24 / 15.27±14.60 | 57.83±13.85 / 5.38±8.24 / 20.94±20.77
MedNeXt [32] | 69.21±14.39 / 10.93±12.96 / 53.96±69.47 | 64.99±22.99 / 2.72±6.32 / 14.97±17.89 | 59.20±19.88 / 21.54±37.39 / 54.39±68.26
Ours | 78.35±8.74* / 0.91±1.09 / 8.15±12.06* | 68.52±15.60* / 1.73±3.24* / 12.26±19.37 | 69.51±12.63* / 1.60±2.17* / 7.30±7.75*

4.2 Implementation Details

The framework is implemented in PyTorch 1.13.1. All model training and validation were performed using an NVIDIA A100 40G GPU. For the lesion generator, the input mask and image size is $64\times64\times64$. We used the AdamW optimizer to train the shape and intensity models, with the learning rate set to 1e-5. The batch size was 4 and the total number of training epochs was 100. For the segmentation model, we use UNet [31] as the backbone model. The input patch size is $128\times128\times128$ and the batch size is 2. We used the AdamW optimizer for the segmentation model training with the learning rate set to 1e-3 and reduced with a cosine annealing strategy. A total of 500 epochs were used for the ATLAS v2.0 dataset and 1000 epochs for the Shift MS dataset. The loss function coefficients were set to $\lambda_{1}=1$ and $\lambda_{2}=50$ for both datasets; these values were chosen empirically.
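
For reference, a minimal PyTorch sketch of the segmentation-stage optimizer and scheduler described above; the backbone stand-in and the loop body are placeholders, not the actual UNet training code.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Conv3d(1, 2, kernel_size=3, padding=1)    # stand-in for the UNet backbone
num_epochs = 500                                           # 500 for ATLAS v2.0, 1000 for Shift MS

optimizer = AdamW(model.parameters(), lr=1e-3)
scheduler = CosineAnnealingLR(optimizer, T_max=num_epochs)  # cosine-annealed learning rate

for epoch in range(num_epochs):
    # ... one epoch over real and composited samples, minimising Eq. (14) ...
    scheduler.step()
```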

4.3 Model Evaluation

We evaluated our framework on (1) its ability to generate lesions, (2) the performance of the segmentation model trained with images generated using our framework compared to other models and (3) comparing our framework with other data augmentation techniques for training the segmentation model.

4.3.1 Lesion Synthesis Performance

To evaluate the effectiveness of our generative model and self-supervised training strategy, we compare the synthetic lesions generated by our framework to other generative models. We use the peak signal-to-noise ratio (PSNR) and mean absolute error (MAE) to evaluate synthetic results. Structural similarity (SSIM) is not suitable for this scenario because the large proportion of background dominates the small foreground area, leading to an inaccurate representation of the actual image quality. The measures are reported in Table 2. We compare our approach to both GAN and diffusion models. For each method, the model is trained either with only real lesions or only pre-generated image-mask pairs created using the self-supervised strategy (see Section 3.1.2). Across all models, training with pre-generated data yields superior metrics compared to training with the real dataset, underscoring the ability of appropriate synthetic data to enhance model training. Additionally, our approach achieves the highest PSNR and lowest MAE, indicating its robustness and adaptability. Notably, the increased performance of our model is not attributable to the model's size, as evidenced by the comparison with f-AnoGAN, which has the largest number of parameters but the lowest quantitative performance. The performance gains of our method are attributed to its architecture and the utilization of the self-supervised training strategy.
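
For clarity, the two reconstruction metrics can be computed as follows (a sketch assuming intensities normalized to [0, 1]):

```python
import numpy as np

def psnr_mae(x, x_hat, data_range=1.0):
    # Mean absolute error and peak signal-to-noise ratio between a lesion image x
    # and its reconstruction x_hat.
    mae = np.abs(x - x_hat).mean()
    mse = ((x - x_hat) ** 2).mean()
    psnr = 10.0 * np.log10(data_range ** 2 / max(mse, 1e-12))
    return psnr, mae
```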

4.3.2 Downstream Segmentation Model Performance

We compare the segmentation model performance using our framework to existing segmentation models. All models were trained from scratch. Note that our framework can use any backbone model to perform the segmentation task; here we consider UNet as the base segmentation model. Segmentation performance for all models was evaluated using the Dice similarity coefficient (DSC), average surface distance (ASD), and 95% Hausdorff distance (HD95). Table 4 shows the segmentation model performance for the ATLAS v2.0 dataset. Our approach consistently demonstrates improved performance for all metrics compared to all competing models, irrespective of the base model. This consistent improvement is attributed to two pivotal enhancements: the use of an augmented brain lesion dataset that closely mimics real-world appearance for model training, and the incorporation of the prototype consistency loss to align model features for real and synthetic lesions. These enhancements not only elevate the base model's ability to accurately segment images but also emphasize a crucial insight: the quality of the dataset plays a more critical role than the network architecture in achieving model generalization for segmentation tasks. Our approach, by leveraging high-fidelity synthetic data and strategic feature alignment, effectively unleashes the potential of classical segmentation models like UNet, establishing that dataset augmentation and targeted modifications to the training loss function can improve segmentation performance for brain stroke lesions.

Table 4: Quantitative results of our method and other segmentation models for the ATLAS v2.0 dataset. * indicates that the p-value < 0.05 compared to the second-best model using a paired Student's t-test.
Methods | DSC↑ | ASD↓ | HD95↓
UNet [31] | 50.36±30.28 | 20.25±25.46 | 39.42±37.18
Attention UNet [25] | 52.64±29.66 | 19.75±23.70 | 43.65±38.92
SwinUNETR [37] | 53.19±29.82 | 21.47±51.44 | 37.33±56.20
MedNeXt [32] | 48.24±32.79 | 21.67±52.65 | 35.81±54.92
Ours | 60.23±29.48* | 6.32±13.68* | 20.26±25.81*

The quantitative metrics for model segmentation performance on the Shift MS dataset are reported in Table 3. As with ATLAS v2.0, our framework has improved performance compared to the other models for the out-domain development set (Dev_out), which demonstrates that our framework improves domain generalization ability.

Fig. 6 and Fig. 7 show qualitative segmentation results for the ATLAS v2.0 and Shift MS Dataset, respectively. These results demonstrate that our framework consistently provides more accurate predictions compared to other models. Notably, in regions indicated by the yellow arrows, other models have either false positives or fail to segment the foreground. The improvements in our framework are particularly evident in the segmentation of small lesions, which are often challenging for other segmentation models to recognize.

Refer to caption
Fig. 6: Visualization of lesion segmentation in the ATLAS v2.0 dataset for different models. Here ’Att UNet’ is short for the Attention UNet.

4.3.3 Comparisons with Different Data Augmentation Methods

We compare the performance of our framework with other data augmentation methods, including voxel-based methods and GAN-based methods. Voxel-based data augmentation methods use a combination of two existing images to create a new synthetic image. Here we adopted six methods: Mixup [43], CutMix [42], Copy-Paste [6], TumorCP [41], CarveMix [44], and SelfMix [45]. For the GAN-based methods [4], a conditional GAN was adapted to generate either a deformation field (D), an intensity field (I), or a combination of both (D+I) to change the structure and/or the appearance of images. Regardless of the data augmentation method, UNet was the segmentation model used.

Quantitative segmentation model performance on the ATLAS v2.0 dataset and the Shift MS dataset is shown in Table 5 and Table 6, respectively. For the ATLAS v2.0 dataset, our framework achieves the best segmentation metrics, improving DSC by 3.65, ASD by 5.99 mm, and HD95 by 3.27 mm compared to the second-best method. A similar trend is seen for the Shift MS dataset. One potential reason for the improvements seen with our framework is that the comparison voxel-based data augmentation methods create unrealistic foreground areas, which may shift the decision boundary of the segmentation model and reduce its ability to generalize. For the GAN-based data augmentation methods, the generated deformation and intensity fields do not change the intrinsic characteristics of the original images, which results in models that may overfit the training data.

Refer to caption
Fig. 7: Visualization of lesion segmentation on the Shift MS dataset for different models. Here ’Att UNet’ is short for the Attention UNet.
Table 5: UNet segmentation model performance for our method and other data augmentation techniques on the ATLAS v2.0 dataset. For the GAN method, we consider deformation augmentation (D), intensity augmentation (I), and both (D+I). * indicates that our framework significantly improves segmentation performance (p-value < 0.05) compared to the second-best model using a paired Student's t-test.
Methods | Type | DSC↑ | ASD↓ | HD95↓
Mixup [43] | Voxel | 43.75±35.41 | 23.45±20.48 | 43.76±32.54
CutMix [42] | Voxel | 47.32±32.46 | 22.71±20.03 | 40.35±31.56
Copy-Paste [6] | Voxel | 55.24±31.42 | 15.57±15.42 | 25.78±27.44
TumorCP [41] | Voxel | 55.64±32.59 | 15.43±15.08 | 24.86±26.94
CarveMix [44] | Voxel | 56.58±31.05 | 12.31±16.54 | 23.53±25.98
SelfMix [45] | Voxel | 54.13±31.24 | 16.78±17.55 | 25.97±26.88
cGAN (D) [4] | GAN | 52.79±32.23 | 20.43±22.48 | 36.72±30.23
cGAN (I) [4] | GAN | 50.66±30.54 | 21.14±23.84 | 39.46±32.36
cGAN (D+I) [4] | GAN | 51.48±31.26 | 22.58±28.56 | 37.22±31.28
Ours | Mixed | 60.23±29.48* | 6.32±13.68* | 20.26±25.81
Table 6: UNet segmentation model performance for our framework and other data augmentation techniques on the Shift MS dataset. For the GAN method, we consider deformation augmentation (D), intensity augmentation (I), and both (D+I). * indicates that our framework significantly improves segmentation performance (p-value < 0.05) compared to the second-best model using a paired Student's t-test.
Methods | Type | Shifts MS Dev_in (DSC↑ / ASD↓ / HD95↓) | Shifts MS Dev_out (DSC↑ / ASD↓ / HD95↓) | Shifts MS Evl_in (DSC↑ / ASD↓ / HD95↓)
Mixup [43] | Voxel | 52.64±32.68 / 12.68±13.45 / 26.75±28.46 | 48.76±23.45 / 13.54±14.23 / 23.45±24.35 | 38.45±21.56 / 14.69±15.46 / 38.58±39.43
CutMix [42] | Voxel | 54.23±30.69 / 11.54±12.53 / 24.54±26.58 | 50.43±22.59 / 12.59±13.56 / 21.76±22.69 | 40.23±22.63 / 13.42±12.53 / 36.44±37.22
Copy-Paste [6] | Voxel | 60.36±15.48 / 8.42±9.64 / 22.43±25.46 | 58.46±19.53 / 11.23±10.77 / 20.69±21.34 | 43.56±23.28 / 10.85±11.49 / 25.12±26.41
TumorCP [41] | Voxel | 61.55±13.55 / 6.46±7.69 / 20.15±27.33 | 62.44±18.46 / 9.53±6.75 / 18.43±20.67 | 51.12±20.51 / 9.34±8.32 / 18.32±15.46
CarveMix [44] | Voxel | 65.28±12.47 / 5.24±5.53 / 15.63±18.42 | 63.86±17.63 / 8.75±7.53 / 15.23±20.49 | 54.53±18.24 / 6.43±7.22 / 15.44±12.75
SelfMix [45] | Voxel | 62.45±14.62 / 7.53±6.45 / 19.44±20.47 | 60.54±18.45 / 10.84±7.53 / 16.29±21.54 | 52.36±19.42 / 7.52±6.31 / 16.32±11.42
cGAN (D) [4] | GAN | 62.87±22.54 / 9.53±7.65 / 19.78±22.23 | 62.12±20.68 / 6.52±7.98 / 17.32±21.34 | 52.65±18.23 / 12.89±10.25 / 28.36±24.35
cGAN (I) [4] | GAN | 60.23±25.65 / 10.54±8.33 / 23.45±24.25 | 60.98±22.45 / 8.51±8.21 / 18.55±20.35 | 51.25±18.78 / 13.45±11.54 / 30.48±26.35
cGAN (D+I) [4] | GAN | 61.27±24.47 / 9.23±8.11 / 22.36±21.69 | 61.73±21.59 / 6.78±8.07 / 17.21±19.24 | 52.73±17.63 / 12.45±10.48 / 26.45±24.51
Ours | Mixed | 78.35±8.74* / 0.97±1.09* / 8.15±12.06* | 68.52±15.60* / 1.73±3.24* / 12.26±19.37 | 69.51±12.63* / 1.60±2.17* / 7.30±7.75*

4.4 Ablation Studies

We validate the effectiveness of the three modules in our framework: synthetic lesions (SL), Soft Poisson Blending (SPB), and prototype consistency (PC). All ablation studies were conducted on the ATLAS v2.0 dataset and the Shift MS dataset.

4.4.1 Synthetic Lesion

We validated the performance of the segmentation model when using only synthetic lesions for training. The results are shown in the first rows of Table 7 and Table 8 for the ATLAS v2.0 and Shift MS datasets, respectively. For the ATLAS v2.0 dataset, training with only synthetic lesions improves segmentation performance compared to using only the real data for model training. In contrast, segmentation performance on the Shift MS dataset is worse. We believe the reason for this discrepancy is dataset size: the ATLAS v2.0 dataset contains 655 images, whereas the Shift MS dataset has only 33 training images. While the foreground mismatch and boundary artifacts caused by directly inserting the synthetic lesions can increase the model's generalization, they are catastrophic for a small dataset like the Shift MS dataset, since synthetic lesions with boundary artifacts will shift the segmentation model's feature distribution.

4.4.2 Soft Poisson Blending

Applying SPB to achieve a consistent appearance with real lesions and a seamless boundary improves the segmentation model performance for both datasets (the second row of Table 7 and Table 8). Our results demonstrate that resolving the inconsistent appearance of synthetic lesions improves the model performance.

4.4.3 Prototype Consistency

To address the potential feature gap between synthetic and real images, we introduced prototype consistency regularization. This penalty, applied to both real and synthetic lesions, ensures the segmentation model learns similar features for lesions regardless of origin. Results shown in the third row of Tables 7 and 8 demonstrate that applying prototype consistency regularization, even without SPB, yields improved segmentation model performance. Moreover, integrating this regularization with a consistent lesion appearance further enhances segmentation performance, as evidenced in the fourth row of Table 7. The Shift MS dataset (Table 8) shows a substantial improvement in segmentation performance compared to models where the consistency penalty was not employed, highlighting that feature alignment is most important for small datasets, where even a small shift in the synthetic lesion distribution can affect segmentation performance.

Table 7: Ablation study on the components of our framework: synthetic lesions (SL), Soft Poisson Blending (SPB), and prototype consistency loss (PC) for the ATLAS v2.0 dataset. UNet is the segmentation model.
SL | SPB | PC | DSC↑ | ASD↓ | HD95↓
✓ |   |   | 54.86±29.84 | 16.96±24.59 | 34.98±36.01
✓ | ✓ |   | 58.71±27.20 | 11.30±17.57 | 27.17±30.28
✓ |   | ✓ | 59.38±30.80 | 18.93±72.62 | 30.00±73.03
✓ | ✓ | ✓ | 60.23±29.48 | 6.32±13.68 | 20.26±25.81
Table 8: Ablation study on the components of our framework: synthetic lesions (SL), Soft Poisson Blending (SPB), and prototype consistency loss (PC) for the Shift MS dataset. UNet is the segmentation model.
Shifts MS Dev_in
SL | SPB | PC | DSC↑ | ASD↓ | HD95↓
✓ |   |   | 59.54±24.61 | 9.68±12.64 | 22.36±17.31
✓ | ✓ |   | 67.16±14.26 | 5.98±6.74 | 20.06±16.99
✓ |   | ✓ | 70.05±20.63 | 3.11±4.39 | 13.49±17.06
✓ | ✓ | ✓ | 78.35±8.74 | 0.97±1.09 | 8.15±12.06
Shifts MS Dev_out
SL | SPB | PC | DSC↑ | ASD↓ | HD95↓
✓ |   |   | 58.48±25.06 | 6.89±9.77 | 17.75±15.71
✓ | ✓ |   | 59.23±26.13 | 7.11±10.79 | 16.36±16.76
✓ |   | ✓ | 59.99±24.63 | 3.30±7.16 | 15.88±17.47
✓ | ✓ | ✓ | 68.52±15.60 | 1.73±3.24 | 12.26±19.37
Shifts MS Evl_in
SL | SPB | PC | DSC↑ | ASD↓ | HD95↓
✓ |   |   | 42.48±23.76 | 18.42±19.71 | 33.52±23.88
✓ | ✓ |   | 53.89±22.22 | 18.20±28.73 | 41.92±54.82
✓ |   | ✓ | 54.55±25.35 | 6.05±7.33 | 17.30±15.15
✓ | ✓ | ✓ | 69.51±12.63 | 1.60±2.17 | 7.30±7.75

5 Conclusions & Discussions

We presented a comprehensive framework to augment existing training samples for brain lesion segmentation via a two-stage adversarial autoencoder (AAE) that generates new lesion images. The AAE is trained in a self-supervised manner but generates synthetic lesions with the same latent space distribution as real lesions. We then augment the synthetic images by using Soft Poisson Blending (SPB) to create a seamless boundary between foreground and background, eliminating boundary artifacts. Finally, we introduce a prototype consistency regularization term during segmentation model training to ensure similar features are learned across synthetic and real lesions. The synthetic lesion samples boost segmentation model performance under the supervision of the prototype consistency penalty. Experiments on two public datasets demonstrate that our framework outperforms other data augmentation approaches and methods that simply add augmented samples to model training. We do not compare our approach to pre-trained models such as SAM [14] because they are pre-trained on a large-scale dataset, making direct comparisons unfair. Moreover, SAM-based methods largely depend on accurate user prompts to achieve good segmentation results, which differs from our fully automatic setting requiring no prompts. Currently, our framework is validated on brain lesion MRI datasets. Extending our framework to other image modalities and other organs will be future work. Additionally, we will explore adding conditions to further control the process of lesion image synthesis for controllable data augmentation.

Acknowledgments

This work was supported by Centre for Doctoral Training in Surgical and Interventional Engineering at King’s College London; the funding from the Wellcome Trust Award (218380/Z/19/Z) and the Wellcome/EPSRC Centre for Medical Engineering (WT203148/Z/16/Z).

References

  • Asadi et al. [2023] Asadi, N., Davari, M., Mudur, S., Aljundi, R., Belilovsky, E., 2023. Prototype-sample relation distillation: towards replay-free continual learning, in: International Conference on Machine Learning, PMLR. pp. 1093–1106.
  • Brock et al. [2018] Brock, A., Donahue, J., Simonyan, K., 2018. Large scale gan training for high fidelity natural image synthesis, in: International Conference on Learning Representations.
  • Carass et al. [2017] Carass, A., Roy, S., Jog, A., Cuzzocreo, J.L., Magrath, E., Gherman, A., Button, J., Nguyen, J., Prados, F., Sudre, C.H., et al., 2017. Longitudinal multiple sclerosis lesion segmentation: resource and challenge. NeuroImage 148, 77–102.
  • Chaitanya et al. [2021] Chaitanya, K., Karani, N., Baumgartner, C.F., Erdil, E., Becker, A., Donati, O., Konukoglu, E., 2021. Semi-supervised task-driven data augmentation for medical image segmentation. Medical Image Analysis 68, 101934.
  • Commowick et al. [2018] Commowick, O., Istace, A., Kain, M., Laurent, B., Leray, F., Simon, M., Pop, S.C., Girard, P., Ameli, R., Ferré, J.C., et al., 2018. Objective evaluation of multiple sclerosis lesion segmentation using a data management and processing infrastructure. Scientific reports 8, 13650.
  • Ghiasi et al. [2021] Ghiasi, G., Cui, Y., Srinivas, A., Qian, R., Lin, T.Y., Cubuk, E.D., Le, Q.V., Zoph, B., 2021. Simple copy-paste is a strong data augmentation method for instance segmentation, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 2918–2928.
  • Goodfellow et al. [2014] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y., 2014. Generative adversarial nets. Advances in neural information processing systems 27.
  • Henschel et al. [2020] Henschel, L., Conjeti, S., Estrada, S., Diers, K., Fischl, B., Reuter, M., 2020. Fastsurfer-a fast and accurate deep learning based neuroimaging pipeline. NeuroImage 219, 117012.
  • Ho et al. [2020] Ho, J., Jain, A., Abbeel, P., 2020. Denoising diffusion probabilistic models. Advances in neural information processing systems 33, 6840–6851.
  • Hu et al. [2023] Hu, Q., Chen, Y., Xiao, J., Sun, S., Chen, J., Yuille, A.L., Zhou, Z., 2023. Label-free liver tumor segmentation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7422–7432.
  • Huo et al. [2023] Huo, J., Liu, Y., Ouyang, X., Granados, A., Ourselin, S., Sparks, R., 2023. Arhnet: Adaptive region harmonization for lesion-aware augmentation to improve segmentation performance, in: International Workshop on Machine Learning in Medical Imaging, Springer. pp. 377–386.
  • Huo et al. [2022] Huo, J., Vakharia, V., Wu, C., Sharan, A., Ko, A., Ourselin, S., Sparks, R., 2022. Brain lesion synthesis via progressive adversarial variational auto-encoder, in: International Workshop on Simulation and Synthesis in Medical Imaging, Springer. pp. 101–111.
  • Isensee et al. [2021] Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H., 2021. nnu-net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods 18, 203–211.
  • Kirillov et al. [2023] Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W.Y., et al., 2023. Segment anything, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4015–4026.
  • Krizhevsky et al. [2012] Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 25.
  • Kuo et al. [2020] Kuo, C.W., Ma, C.Y., Huang, J.B., Kira, Z., 2020. Featmatch: Feature-based augmentation for semi-supervised learning, in: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XVIII 16, Springer. pp. 479–495.
  • Lee and Cho [2023] Lee, H.s., Cho, H.c., 2023. Improving classification performance in gastric disease through realistic data augmentation technique based on poisson blending. Journal of Electrical Engineering & Technology , 1–8.
  • Lesjak et al. [2018] Lesjak, Ž., Galimzianova, A., Koren, A., Lukin, M., Pernuš, F., Likar, B., Špiclin, Ž., 2018. A novel public mr image dataset of multiple sclerosis patients with lesion segmentations based on multi-rater consensus. Neuroinformatics 16, 51–63.
  • Liew et al. [2022] Liew, S.L., Lo, B.P., Donnelly, M.R., Zavaliangos-Petropulu, A., Jeong, J.N., Barisano, G., Hutton, A., Simon, J.P., Juliano, J.M., Suri, A., et al., 2022. A large, curated, open-source stroke neuroimaging dataset to improve lesion segmentation algorithms. Scientific data 9, 320.
  • Liu et al. [2021] Liu, C., Wang, Z., Wang, S., Tang, T., Tao, Y., Yang, C., Li, H., Liu, X., Fan, X., 2021. A new dataset, poisson gan and aquanet for underwater object grabbing. IEEE Transactions on Circuits and Systems for Video Technology 32, 2831–2844.
  • Liu et al. [2022] Liu, L., Ren, Y., Lin, Z., Zhao, Z., 2022. Pseudo numerical methods for diffusion models on manifolds, in: International Conference on Learning Representations.
  • Makhzani et al. [2015] Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., Frey, B., 2015. Adversarial autoencoders. arXiv preprint arXiv:1511.05644 .
  • Malinin et al. [2022] Malinin, A., Athanasopoulos, A., Barakovic, M., Cuadra, M.B., Gales, M.J., Granziera, C., Graziani, M., Kartashev, N., Kyriakopoulos, K., Lu, P.J., et al., 2022. Shifts 2.0: Extending the dataset of real distributional shifts. arXiv preprint arXiv:2206.15407 .
  • Nie et al. [2017] Nie, D., Trullo, R., Lian, J., Petitjean, C., Ruan, S., Wang, Q., Shen, D., 2017. Medical image synthesis with context-aware generative adversarial networks, in: Medical Image Computing and Computer Assisted Intervention- MICCAI 2017: 20th International Conference, Quebec City, QC, Canada, September 11-13, 2017, Proceedings, Part III 20, Springer. pp. 417–425.
  • Oktay et al. [2018] Oktay, O., Schlemper, J., Folgoc, L.L., Lee, M., Heinrich, M., Misawa, K., Mori, K., McDonagh, S., Hammerla, N.Y., Kainz, B., et al., 2018. Attention u-net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999 .
  • Pereira et al. [2016] Pereira, S., Pinto, A., Alves, V., Silva, C.A., 2016. Brain tumor segmentation using convolutional neural networks in mri images. IEEE transactions on medical imaging 35, 1240–1251.
  • Pérez et al. [2023] Pérez, P., Gangnet, M., Blake, A., 2023. Poisson image editing, in: Seminal Graphics Papers: Pushing the Boundaries, Volume 2, pp. 577–582.
  • Perlin [1985] Perlin, K., 1985. An image synthesizer. ACM Siggraph Computer Graphics 19, 287–296.
  • Pinto et al. [2022] Pinto, F., Yang, H., Lim, S.N., Torr, P., Dokania, P., 2022. Using mixup as a regularizer can surprisingly improve accuracy & out-of-distribution robustness. Advances in Neural Information Processing Systems 35, 14608–14622.
  • Rombach et al. [2022] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B., 2022. High-resolution image synthesis with latent diffusion models, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 10684–10695.
  • Ronneberger et al. [2015] Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, Springer. pp. 234–241.
  • Roy et al. [2023] Roy, S., Koehler, G., Ulrich, C., Baumgartner, M., Petersen, J., Isensee, F., Jaeger, P.F., Maier-Hein, K.H., 2023. Mednext: transformer-driven scaling of convnets for medical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer. pp. 405–415.
  • Rumelhart et al. [1986] Rumelhart, D.E., Hinton, G.E., Williams, R.J., 1986. Learning representations by back-propagating errors. nature 323, 533–536.
  • Schlegl et al. [2019] Schlegl, T., Seeböck, P., Waldstein, S.M., Langs, G., Schmidt-Erfurth, U., 2019. f-anogan: Fast unsupervised anomaly detection with generative adversarial networks. Medical image analysis 54, 30–44.
  • Snell et al. [2017] Snell, J., Swersky, K., Zemel, R., 2017. Prototypical networks for few-shot learning. Advances in neural information processing systems 30.
  • Tan et al. [2021] Tan, J., Hou, B., Day, T., Simpson, J., Rueckert, D., Kainz, B., 2021. Detecting outliers with poisson image interpolation, in: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part V 24, Springer. pp. 581–591.
  • Tang et al. [2022] Tang, Y., Yang, D., Li, W., Roth, H.R., Landman, B., Xu, D., Nath, V., Hatamizadeh, A., 2022. Self-supervised pre-training of swin transformers for 3d medical image analysis, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20730–20740.
  • Wang et al. [2022] Wang, H., Zhou, Y., Zhang, J., Lei, J., Sun, D., Xu, F., Xu, X., 2022. Anomaly segmentation in retinal images with poisson-blending data augmentation. Medical Image Analysis 81, 102534.
  • Wang et al. [2019] Wang, K., Liew, J.H., Zou, Y., Zhou, D., Feng, J., 2019. Panet: Few-shot image semantic segmentation with prototype alignment, in: proceedings of the IEEE/CVF international conference on computer vision, pp. 9197–9206.
  • Xu et al. [2022] Xu, Z., Wang, Y., Lu, D., Yu, L., Yan, J., Luo, J., Ma, K., Zheng, Y., Tong, R.K.y., 2022. All-around real label supervision: Cyclic prototype consistency learning for semi-supervised medical image segmentation. IEEE Journal of Biomedical and Health Informatics 26, 3174–3184.
  • Yang et al. [2021] Yang, J., Zhang, Y., Liang, Y., Zhang, Y., He, L., He, Z., 2021. Tumorcp: A simple but effective object-level data augmentation for tumor segmentation, in: Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part I 24, Springer. pp. 579–588.
  • Yun et al. [2019] Yun, S., Han, D., Oh, S.J., Chun, S., Choe, J., Yoo, Y., 2019. Cutmix: Regularization strategy to train strong classifiers with localizable features, in: Proceedings of the IEEE/CVF international conference on computer vision, pp. 6023–6032.
  • Zhang et al. [2018] Zhang, H., Cisse, M., Dauphin, Y.N., Lopez-Paz, D., 2018. mixup: Beyond empirical risk minimization, in: International Conference on Learning Representations.
  • Zhang et al. [2023] Zhang, X., Liu, C., Ou, N., Zeng, X., Zhuo, Z., Duan, Y., Xiong, X., Yu, Y., Liu, Z., Liu, Y., et al., 2023. Carvemix: a simple data augmentation method for brain lesion segmentation. NeuroImage 271, 120041.
  • Zhu et al. [2022] Zhu, Q., Wang, Y., Yin, L., Yang, J., Liao, F., Li, S., 2022. Selfmix: a self-adaptive data augmentation method for lesion segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer. pp. 683–692.