TumorCP: A Simple but Effective Object-Level Data Augmentation for Tumor Segmentation
Abstract
Deep learning models are notoriously data-hungry, so there is an urgent need for data-efficient techniques in medical image analysis, where well-annotated data are costly and time-consuming to collect. Motivated by the recently revived “Copy-Paste” augmentation, we propose TumorCP, a simple but effective object-level data augmentation method tailored for tumor segmentation. TumorCP is online and stochastic, providing unlimited augmentation possibilities for tumors’ subjects, locations, appearances, and morphologies. Experiments on a kidney tumor segmentation task demonstrate that TumorCP surpasses a strong baseline by a remarkable margin of 7.12% on tumor Dice. Moreover, combined with image-level data augmentation, it beats the current state-of-the-art by 2.32% on tumor Dice. Comprehensive ablation studies validate the effectiveness of TumorCP. We further show that TumorCP yields striking improvements in extremely low-data regimes: evaluated with only 10% of the labeled data, TumorCP significantly boosts tumor Dice by 21.87%. To the best of our knowledge, this is the first work exploring and extending the “Copy-Paste” design in the medical imaging domain. Code is available at: https://github.com/YaoZhang93/TumorCP.
Keywords: Data efficiency · Tumor segmentation · Data augmentation.
1 Introduction
Deep learning (DL) models have worked remarkably well over the past few years in computer vision tasks, including medical image analysis. Though DL models have become the de facto standard, they are notoriously data-hungry, demanding large and well-annotated datasets more than ever to achieve robust performance [16]. However, high-quality annotated datasets require intense labor and domain knowledge, and are even more expensive to obtain in the medical domain.
To improve data-efficient learning, several successful approaches have been proposed from different perspectives, such as leveraging unlabeled data for semi-supervised self-training [16, 1, 15] or self-supervised pre-training [19, 1, 12], distilling priors from data as explicit constraints for model training [10, 9], generating new data by reconstructing anatomy from a different imaging modality [8, 11], or utilizing appropriate data augmentation methods to increase data diversity [5, 14, 2, 15]. Some of these are designed specifically for medical images. In particular, Zhou et al. [19] designed a unified self-supervised learning framework integrating multiple proxy tasks to exploit unlabeled medical data, and showed performance gains on downstream tasks. Xue et al. [17] and Shin et al. [13] used GANs to generate additional training data for histopathology image classification and brain tumor segmentation, respectively. The “realness” of synthesized training data dramatically affects model performance due to the risk of overfitting to fake data. Eaton-Rosen et al. [3] studied Mixup [18] augmentation for brain tumor segmentation; however, it requires a specific patch-level operation that involves complicated strategies, e.g., sampling small patches to be mixed up.
Distinct from the trend toward increasingly sophisticated methods such as GANs, we investigate “Copy-Paste”, a straightforward augmentation technique [4, 2] that has recently been revisited and achieved breakthroughs in natural-image instance segmentation [5]. Copy-Paste augmentation avoids a costly generation process from representation space to pixel space by simply pasting labeled instances onto new background images as additional training data. Despite its success on natural images, the method is largely unexplored in the medical imaging realm. Moreover, its effectiveness for medical tasks remains questionable, since context information tends to be ignored in Copy-Paste. For instance, in tumor segmentation, one could argue that surrounding visual clues, i.e., context, matter for the emergence of a tumor, and that the inherent anatomical structures in medical images make context indispensable. In this work, we also aim to fill this gap in understanding the role of context in the medical domain by examining the effectiveness of Copy-Paste augmentation for tumor segmentation.
We propose TumorCP, a simple but effective object-level data augmentation method based on Copy-Paste for tumor segmentation tasks. TumorCP randomly chooses a tumor from a source image and pastes it onto the organ in a target image after a series of spatial, contrast, and blurring augmentations. We use the kidney tumor segmentation task (KiTS19 dataset [6]) and a state-of-the-art model (nnUNet [7]) as the benchmark to evaluate the proposed method. We empirically show that, although TumorCP inevitably generates artifacts after Copy-Paste, it consistently provides solid gains across all settings in our experiments. Specifically, with only rigid spatial transformation and Copy-Paste within the same patient, TumorCP surpasses the baseline by 6.24% tumor Dice. Together with inter-patient Copy-Paste and other tumor-oriented augmentations, TumorCP outperforms the baseline by 7.12% tumor Dice. Moreover, with image-level data augmentation (ImgDA), our best version beats the state-of-the-art by 2.32%. Going one step further, we also study TumorCP in an extremely low-data regime, where only 10% of the labeled data is used for training. Under this setting, TumorCP with ImgDA improves tumor Dice by 21.87% over no data augmentation (noDA), which is unprecedented to our knowledge and convincingly demonstrates the effectiveness of TumorCP for data-efficient learning.
The success of TumorCP provides empirical evidence supporting context-decoupled learning even in the medical domain. We briefly discuss our understanding of the open question of context’s role and of why TumorCP works in Sec. 2.2. We hope our work provides useful data points for the community and sheds light on the power of Copy-Paste augmentation, which is effective yet nearly absent in the medical imaging field.
2 Method

TumorCP is an online, stochastic augmentation process designed for tumor segmentation, and it is straightforward to implement. As illustrated in Fig. 1, given a set of training samples $\mathcal{D}$, with probability $1-p_{\mathrm{cp}}$ TumorCP does nothing; otherwise, TumorCP samples a pair of images $(x^{src}, x^{tgt})$ from $\mathcal{D}$ and conducts Copy-Paste once. Let $T^{src}$ denote the set of tumor(s) on $x^{src}$, $O^{tgt}$ the set of volumetric coordinates of organ(s) on $x^{tgt}$, and $\Phi$ the set of stochastic data transformations, each with a probability parameter $p_{\phi}$. To perform one Copy-Paste, TumorCP first samples a tumor $t \in T^{src}$, a set of transformation(s) $\phi \subseteq \Phi$, and a target location $l \in O^{tgt}$, then pastes the transformed tumor $\phi(t)$ centered at $l$, replacing the original data and annotation. To fully leverage the advantage of TumorCP, we carefully design two modes of Copy-Paste for tumors: intra-patient and inter-patient Copy-Paste. Meanwhile, we enhance Copy-Paste with several object-level transformations to obtain abundant augmentations.
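For concreteness, the following is a minimal NumPy/SciPy sketch of one such TumorCP step. The array layout, the label values, and the bounding-box pasting logic are illustrative assumptions of ours, not the released implementation:

```python
import numpy as np
from scipy import ndimage

def tumorcp_step(img_src, seg_src, img_tgt, seg_tgt, transforms,
                 p_cp=0.5,  # placeholder; the paper fixes one value (Sec. 3.2)
                 tumor_label=2, organ_label=1, rng=np.random.default_rng()):
    """One stochastic TumorCP step on 3D volumes (sketch, not the official code)."""
    if rng.random() > p_cp:                          # with prob. 1 - p_cp: no-op
        return img_tgt, seg_tgt
    # 1) Sample one connected tumor component from the source segmentation.
    labeled, n = ndimage.label(seg_src == tumor_label)
    if n == 0:
        return img_tgt, seg_tgt
    comp = int(rng.integers(1, n + 1))
    box = ndimage.find_objects(labeled)[comp - 1]    # tumor bounding box
    patch, pmask = img_src[box].copy(), labeled[box] == comp
    # 2) Apply the sampled object-level transforms (rigid/elastic/gamma/blur).
    for t in transforms:
        patch, pmask = t(patch, pmask, rng)
    # 3) Sample a target voxel on the organ, paste the tumor centered there,
    #    and overwrite both the image data and the annotation.
    organ_voxels = np.argwhere(seg_tgt == organ_label)
    center = organ_voxels[int(rng.integers(len(organ_voxels)))]
    starts = [int(c) - s // 2 for c, s in zip(center, patch.shape)]
    tgt_sl = tuple(slice(max(0, st), min(dim, st + s))
                   for st, s, dim in zip(starts, patch.shape, img_tgt.shape))
    src_sl = tuple(slice(ts.start - st, ts.stop - st)
                   for ts, st in zip(tgt_sl, starts))
    m = pmask[src_sl]
    img_tgt[tgt_sl][m] = patch[src_sl][m]
    seg_tgt[tgt_sl][m] = tumor_label
    return img_tgt, seg_tgt
```

Here `transforms` stands for the object-level transformations of Sec. 2.1.2, each wrapped with its own firing probability.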
2.1 TumorCP’s augmentation
2.1.1 Intra-/Inter- Copy-Paste.
To study the effect of inter-patient variance on TumorCP, we define two base settings: 1) intra-patient Copy-Paste (intra-CP), where the source and target images are identical, i.e., both from the same patient; and 2) inter-patient Copy-Paste (inter-CP), where they differ. From the perspective of data distribution, intra-CP is preferable because the pasted tumor’s intensity agrees with the rest of the image, but this limits data diversity. From the perspective of data diversity, inter-CP is favored as it unlocks access to both new backgrounds and new foregrounds from other patients, but it also introduces distribution discrepancy. Perhaps surprisingly, we empirically show that inter-CP significantly outperforms intra-CP in the ablation study in Sec. 3.2.1.
2.1.2 Copy-Paste with Transformations.
Building on plain Copy-Paste, we naturally extend it with four object-level transformations, each motivated by a different objective, as follows. Detailed implementations are summarized in the appendix.
– Spatial transformation decouples context and improves morphology diversity. Since acquired CT images are fixed, tumors always appear together with their surrounding visual context. Though image-level spatial augmentation increases data diversity in terms of viewpoint (e.g., mirroring and slight rotation), it still processes an image as a whole, leaving the coupling between foreground and background intact. The model can therefore latch onto, and tend to overfit, plausible but in fact irrelevant surrounding clues. Note that plain Copy-Paste already addresses this problem by offering new backgrounds via the most basic spatial transformation, shifting. We further increase morphology diversity by applying i) rigid transformation, which includes scaling, rotation, and mirroring, and ii) elastic transformation, which deforms tumors. Fig. 1 shows examples of transformed tumors.
– Gamma transformation enhances contrast and improves intensity diversity. Given a tumor, we apply gamma transformation to adjust its intensity distribution while retaining the whole intensity range. On the one hand, tumor intensity diversity is enhanced by randomly sampling the gamma parameter; on the other hand, local contrast is enhanced by the power-law non-linearity, facilitating tumor discrimination.
– Blurring transformation improves texture diversity. We use a Gaussian filter as the blurring transformation. Intuitively, Gaussian filters with different sigma values filter out noise and smooth the tumor to varying extents. Aggregating these noise-perturbed low-level textures indirectly increases diversity at relatively high-level textures.
The whole pipeline can be combined with image-level augmentation; a sketch of the stochastic composition follows below. It is worth mentioning that the entire instance augmentation process is both online and random, bringing unlimited possibilities for tumors’ locations and appearances within and across subjects.
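To make the composition concrete, here is a minimal sketch of how the per-transform probabilities $p_{\phi}$ might gate the object-level pipeline; the function names in the usage comment are hypothetical, and the 0.5 value matches the ablations in Sec. 3.2:

```python
import numpy as np

def compose_object_transforms(transforms, probs, rng=np.random.default_rng()):
    """Chain object-level transforms; each fires independently with its own
    probability p_phi (sketch; 0.5 for every transform in our ablations)."""
    def pipeline(patch, mask):
        for fn, p in zip(transforms, probs):
            if rng.random() < p:
                patch, mask = fn(patch, mask, rng)
        return patch, mask
    return pipeline

# Hypothetical usage, with transform sketches like those in the appendix:
# pipeline = compose_object_transforms(
#     [rigid_transform, elastic_deform, gamma_wrap, blur_wrap], probs=[0.5] * 4)
```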
2.2 Intuitions on TumorCP’s Effectiveness
As mentioned above, TumorCP has two goals: i) increasing data diversity, and ii) encouraging the model to learn high-level, invariant representations of tumors. Data diversity is increased because the augmentation generates new combinations of tumors and their surroundings. For learning high-level information, we discuss three properties of TumorCP that explain its effectiveness.
Eliminated background bias by context-invariant prediction. As mentioned before, the semantic context is fixed in acquired medical images. A Convolutional Neural Network (CNN) inevitably convolves surrounding visual context together with the objects themselves. This can bias the model towards plausible but in fact tumor-irrelevant clues, increasing the risk of overfitting. With random, online spatial transformation, TumorCP gives tumors access to previously untouched regions and thus provides unlimited possibilities for their surrounding context. It forces the model’s predictions to be invariant across different visual surroundings and eliminates background bias.
Improved generalizability by transformation-invariant prediction. The model should capture both high-level semantic information and low-level boundary information for successful segmentation. With random, online gamma and blurring transformations, TumorCP generates diverse tumors in terms of size, shape, intensity, and texture, which increases intra-class disparity. This tasks the model with capturing the true semantics of the data. In other words, it enforces the model’s predictions to be invariant across different data transformations (which potentially resemble real-world variations) and improves generalizability.
Oversampling behavior. Data imbalance is a widely encountered problem; typical solutions re-weight the loss function or re-sample training data according to the class distribution. In this work, the distribution of background, organ, and tumor voxels is extremely imbalanced. From this perspective, TumorCP acts as a data re-sampler that multiplies the volume of tumor voxels at a minor cost.
3 Experiments and Discussion
3.1 Experiment settings
We evaluate TumorCP on KiTS19 [6], a publicly available dataset for kidney tumor segmentation. We randomly split the published 210 images into a training set of 168 images and a validation set of 42 images. Due to limited computational resources, we mainly report ablation results on this validation set unless otherwise specified. Note that the validation set is unaugmented and unseen, i.e., used neither to tune hyper-parameters nor to monitor training. We use the Sørensen–Dice coefficient (Dice) in all experiments, which measures the overlap between the model’s prediction $P$ and the ground truth $G$, formulated as $\mathrm{Dice}(P, G) = \frac{2|P \cap G|}{|P| + |G|}$. The mean and standard deviation of the Dice score over all patients are reported.
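For reference, a minimal NumPy sketch of this per-case metric (the empty-mask convention is our assumption):

```python
import numpy as np

def dice(pred, gt):
    """Sørensen–Dice coefficient between two binary masks: 2|P ∩ G| / (|P| + |G|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else 2.0 * (pred & gt).sum() / denom
```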
We implement TumorCP on the publicly available, state-of-the-art nnUNet codebase (https://github.com/MIC-DKFZ/nnUNet), which includes data pre-processing, leading image-level augmentation pipelines, and top-performing models; it tops almost all biomedical image segmentation benchmarks [7]. This paper focuses on a general augmentation method for tumor segmentation, so the choices of dataset and model are orthogonal to our goal: TumorCP can generalize to other segmentation models and tumor segmentation datasets at no cost.
All experiments are conducted on an Nvidia V100 GPU, training the 3d_fullres nnUNet for 500 epochs instead of nnUNet’s default 1000. The training batch size is 2. Each epoch takes 250 iterations, i.e., 250 batches of data are sampled and learned per epoch. All other training settings remain at their defaults; we refer readers to [7] and the codebase linked above for details.
3.2 Ablation Study
For simplicity and unification, we use the same Copy-Paste probability $p_{\mathrm{cp}}$ for all experiments.
Ablation on intra-CP with different transformations. We first investigate TumorCP under intra-CP with various object-level transformations; no image-level augmentation is applied in this ablation. Every object-level transformation has a 0.5 probability of being invoked: for example, Intra-CP&Rigid means the rigid transformation is applied with probability 0.5 whenever Intra-CP is triggered. Table 1 compares the different methods. As the first group of Table 1 shows, all models trained with TumorCP consistently outperform the no-data-augmentation (noDA) baseline. Specifically, vanilla intra-CP alone brings a 1.09% Dice improvement over the baseline, and TumorCP with only the rigid transformation increases tumor Dice by 6.24%.
Table 1. Mean Dice ± std / improvement over baseline (%).

| Method | Kidney | Tumor |
| --- | --- | --- |
| noDA | 96.62±2.41 / baseline | 72.59±26.97 / baseline |
| Intra-CP | 96.81±2.02 / +0.19 | 73.68±26.99 / +1.09 |
| Intra-CP&Elastic | 96.75±1.88 / +0.13 | 73.95±28.20 / +1.36 |
| Intra-CP&Rigid | 96.78±1.92 / +0.16 | 78.83±19.77 / +6.24 |
| Intra-CP&Gamma | 96.81±1.89 / +0.19 | 76.32±23.97 / +3.73 |
| Intra-CP&Blur | 96.89±1.92 / +0.27 | 76.46±24.86 / +3.87 |
| Intra-CP | 96.81±2.02 / +0.19 | 73.68±26.99 / +1.09 |
| Inter-CP | 96.73±2.03 / +0.11 | 77.22±23.67 / +4.63 |
| Intra-&Inter-CP | 96.78±1.98 / +0.16 | 77.44±23.46 / +4.85 |
3.2.1 Ablation on intra-/inter-CP.
Here we study the effect of intra- vs. inter-CP, following the considerations in Sec. 2.1.1. The second group in Table 1 shows that inter-CP outperforms intra-CP by a significant 3.54% tumor Dice, yielding a 4.63% improvement over the baseline. Though somewhat surprising, this result matches our expectation, since the diversity of tumors and backgrounds within a single patient is limited compared with that across patients: copying other patients’ tumors and pasting them onto the current patient unlocks more novel combinations and brings more data diversity. We also aggregate intra- and inter-CP by giving each a 50% chance when sampling data pairs from the dataset. The last row in Table 1 presents the result, which is the best entry in this ablation, demonstrating the benefit of combining intra- and inter-patient context exchange.
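A minimal sketch of this 50/50 source-case choice (dataset access is schematic):

```python
import numpy as np

def sample_copy_paste_pair(dataset, tgt_idx, rng=np.random.default_rng()):
    """Pick the Copy-Paste source case: the target itself (intra-CP) or a
    different random case (inter-CP), each with probability 0.5 (sketch)."""
    if rng.random() < 0.5:                          # intra-CP
        src_idx = tgt_idx
    else:                                           # inter-CP
        others = [i for i in range(len(dataset)) if i != tgt_idx]
        src_idx = others[int(rng.integers(len(others)))]
    return dataset[src_idx], dataset[tgt_idx]
```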
Ablation on compatibility. Finally, we combine all object-level transformations with Intra-&Inter-CP to constitute TumorCP⋆. The improvements above are measured against the noDA baseline; here we also explore the compatibility between TumorCP and image-level augmentation. The image-level augmentation follows the default nnUNetTrainerV2 setting detailed at https://git.io/Jqvro [7]. The results in Table 2 show that TumorCP⋆ is compatible with image-level augmentation and can thus act as a plug-in module in general augmentation pipelines. TumorCP⋆ improves tumor Dice by 7.12% over the no-image-level-augmentation (noDA) baseline, and combined with image-level augmentation it improves by 2.32% over the image-level-augmentation (ImgDA) baseline. It is worth mentioning that the ImgDA baseline still holds the state-of-the-art performance on the KiTS dataset, which means TumorCP⋆ can further boost existing art to higher performance, and it generalizes to other models and datasets at almost no cost.
Table 2. Mean Dice ± std / improvement over baseline (%).

| Method | Kidney | Tumor |
| --- | --- | --- |
| noDA | 96.62±2.41 / baseline | 72.59±26.97 / baseline |
| TumorCP⋆ | 96.86±1.91 / +0.24 | 79.71±22.56 / +7.12 |
| ImgDA | 97.06±1.48 / baseline | 82.43±21.29 / baseline |
| TumorCP⋆ + ImgDA | 97.15±1.43 / +0.09 | 84.75±20.87 / +2.32 |
TumorCP also improves organ segmentation. Though TumorCP is intended for better tumor segmentation, it also consistently improves kidney segmentation over the corresponding baselines. This matches our intuition: from the kidney’s perspective, tumors are part of the context and background, so the same “eliminated background bias by context-invariant prediction” argument applies to the kidney as well.
3.3 Towards extremely low-data regime
Finally, we demonstrate the potential of TumorCP in an extremely low-data regime via additional ablations. Specifically, we randomly select 10% of the data from the same training set as before, then train three models, noDA, ImgDA, and TumorCP⋆ + ImgDA, on this 10% subset, and evaluate them on the same validation set. Table 3 shows the results. Under this setting, our method improves over noDA by 21.87% tumor Dice, which is, to the best of our knowledge, unprecedented and convincingly demonstrates the effectiveness of TumorCP for data-efficient learning. It breaks the trend of relying on sophisticated methods or strategies to achieve promising results in the low-data regime of tumor segmentation.
Table 3. Mean Dice ± std / improvement over baseline (%).

| Method | Kidney | Tumor |
| --- | --- | --- |
| 10%-data noDA | 93.25±4.41 / baseline | 41.12±39.58 / baseline |
| 10%-data ImgDA | 95.41±3.25 / +2.16 | 54.34±31.59 / +13.22 |
| 10%-data TumorCP⋆ + ImgDA | 95.53±3.25 / (+2.16/+2.28) | 62.99±26.92 / (+13.22/+21.87) |
4 Conclusion and Future Work
The key contribution of our work is the proposal and comprehensive study of TumorCP, a simple but effective object-level data augmentation for tumor segmentation. Extensive experiments confirm the remarkable effectiveness of our method. In addition to surpassing the current art in kidney tumor segmentation by 2.32% tumor Dice, we also demonstrate the potential of TumorCP in the extremely low-data regime. We prefer to regard TumorCP as a new baseline, as it involves neither sophisticated techniques nor extensive hyper-parameter tuning while achieving a new state-of-the-art. Moreover, TumorCP does not explicitly handle the distribution mismatch in the inter-CP setting, yet still achieves excellent performance. Future work can easily extend TumorCP to other medical segmentation tasks without significant modifications and is worth pursuing to further improve state-of-the-art accuracy.
References
- [1] Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International conference on machine learning. pp. 1597–1607. PMLR (2020)
- [2] Dwibedi, D., Misra, I., Hebert, M.: Cut, paste and learn: Surprisingly easy synthesis for instance detection. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 1301–1310 (2017)
- [3] Eaton-Rosen, Z., Bragman, F., Ourselin, S., Cardoso, M.J.: Improving data augmentation for medical image segmentation (2018)
- [4] Fang, H.S., Sun, J., Wang, R., Gou, M., Li, Y.L., Lu, C.: Instaboost: Boosting instance segmentation via probability map guided copy-pasting. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 682–691 (2019)
- [5] Ghiasi, G., Cui, Y., Srinivas, A., Qian, R., Lin, T.Y., Cubuk, E.D., Le, Q.V., Zoph, B.: Simple copy-paste is a strong data augmentation method for instance segmentation. arXiv preprint arXiv:2012.07177 (2020)
- [6] Heller, N., Sathianathen, N., Kalapara, A., Walczak, E., Moore, K., Kaluzniak, H., Rosenberg, J., Blake, P., Rengel, Z., Oestreich, M., et al.: The KiTS19 challenge data: 300 kidney tumor cases with clinical context, CT semantic segmentations, and surgical outcomes. arXiv preprint arXiv:1904.00445 (2019)
- [7] Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods 18(2), 203–211 (2021)
- [8] Liang, Y., Qiu, L., Lu, T., Fang, Z., Tu, D., Yang, J., Shao, Y., Wang, K., Chen, X.A., He, L.: Oralviewer: 3d demonstration of dental surgeries for patient education with oral cavity reconstruction from a 2d panoramic x-ray. In: 26th International Conference on Intelligent User Interfaces. pp. 553–563 (2021)
- [9] Liang, Y., Song, W., Dym, J.P., Wang, K., He, L.: Comparenet: Anatomical segmentation network with deep non-local label fusion. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 292–300 (2019)
- [10] Liang, Y., Song, W., Yang, J., Qiu, L., Wang, K., He, L.: Atlas-aware convnet for accurate yet robust anatomical segmentation. In: Asian Conference on Machine Learning. pp. 113–128. PMLR (2020)
- [11] Liang, Y., Song, W., Yang, J., Qiu, L., Wang, K., He, L.: X2teeth: 3d teeth reconstruction from a single panoramic radiograph. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 400–409 (2020)
- [12] Mitrovic, J., McWilliams, B., Walker, J., Buesing, L., Blundell, C.: Representation learning via invariant causal mechanisms. arXiv preprint arXiv:2010.07922 (2020)
- [13] Shin, H.C., Tenenholtz, N.A., Rogers, J.K., Schwarz, C.G., Senjem, M.L., Gunter, J.L., Andriole, K.P., Michalski, M.: Medical image synthesis for data augmentation and anonymization using generative adversarial networks. In: International workshop on simulation and synthesis in medical imaging. pp. 1–11. Springer (2018)
- [14] Shorten, C., Khoshgoftaar, T.M.: A survey on image data augmentation for deep learning. Journal of Big Data 6(1), 1–48 (2019)
- [15] Sohn, K., Berthelot, D., Li, C.L., Zhang, Z., Carlini, N., Cubuk, E.D., Kurakin, A., Zhang, H., Raffel, C.: Fixmatch: Simplifying semi-supervised learning with consistency and confidence. arXiv preprint arXiv:2001.07685 (2020)
- [16] Xie, Q., Luong, M.T., Hovy, E., Le, Q.V.: Self-training with noisy student improves imagenet classification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10687–10698 (2020)
- [17] Xue, Y., Zhou, Q., Ye, J., Long, L.R., Antani, S., Cornwell, C., Xue, Z., Huang, X.: Synthetic augmentation and feature-based filtering for improved cervical histopathology image classification. In: International conference on medical image computing and computer-assisted intervention. pp. 387–396. Springer (2019)
- [18] Zhang, H., Cisse, M., Dauphin, Y.N., Lopez-Paz, D.: mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412 (2017)
- [19] Zhou, Z., Sodha, V., Siddiquee, M.M.R., Feng, R., Tajbakhsh, N., Gotway, M.B., Liang, J.: Models genesis: Generic autodidactic models for 3d medical image analysis. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 384–393. Springer (2019)
Appendix 0.A Data augmentation details
0.A.1 Rigid Transformation
Rigid transformation consists of three transformations:
– Mirroring: with probability 0.5, randomly choose one of the eight possible mirroring-axis combinations, i.e., none, (x), (y), (z), (x,y), (x,z), (y,z), (x,y,z).
– Rotation: since 3D abdominal CT data are usually anisotropic, we only rotate the instance around the z-axis to maintain spacing consistency. The tumor is rotated randomly within a preset angle range with probability 0.5.
– Scaling: with probability 0.5, the tumor is re-scaled by a factor sampled from a preset range, using the resize function of the skimage package with interpolation order 3.
0.A.2 Deformable Elastic Transformation
We use the implementation from the batchgenerators python library (publicly available at https://git.io/JqfTt), with preset alpha and sigma ranges.
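The library call itself is not reproduced here; below is an equivalent scipy-based sketch of the underlying idea (a random displacement field smoothed with a Gaussian of width sigma, scaled by alpha, then resampled). The alpha/sigma defaults are placeholders:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_deform(patch, mask, rng, alpha=200.0, sigma=10.0):
    """Elastic deformation (scipy-based stand-in for the batchgenerators call)."""
    shape = patch.shape
    disp = [gaussian_filter(rng.uniform(-1, 1, shape), sigma) * alpha
            for _ in range(3)]                       # per-axis displacement field
    grid = np.meshgrid(*[np.arange(s) for s in shape], indexing="ij")
    coords = [g + d for g, d in zip(grid, disp)]     # deformed sampling positions
    patch = map_coordinates(patch, coords, order=3, mode="nearest")
    mask = map_coordinates(mask.astype(float), coords, order=0,
                           mode="nearest").astype(bool)
    return patch, mask
```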
0.A.3 Gamma Transformation
For the gamma transformation, we keep the mean and standard deviation of the copied instance unchanged. The gamma value is sampled from the range (0.7, 1.5).
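A sketch of this mean/std-preserving gamma transform; the (0.7, 1.5) range is from the text, while the min-max normalization step is our assumption:

```python
import numpy as np

def gamma_transform(patch, rng):
    """Gamma augmentation preserving the patch's mean and std (sketch)."""
    mu, sd = patch.mean(), patch.std()
    lo, hi = patch.min(), patch.max()
    norm = (patch - lo) / (hi - lo + 1e-8)           # map intensities to [0, 1]
    out = norm ** rng.uniform(0.7, 1.5)              # power-law non-linearity
    out = (out - out.mean()) / (out.std() + 1e-8)    # re-impose original stats
    return out * sd + mu
```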
0.A.4 Blurring Transformation
We use a Gaussian filter, with sigma sampled from a preset range, as the blurring transformation.
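A minimal sketch; the sigma range is a placeholder, since the exact values are not reproduced above:

```python
from scipy.ndimage import gaussian_filter

def blur_transform(patch, rng, sigma_range=(0.5, 1.5)):  # range is a placeholder
    """Gaussian blurring of the copied tumor patch (sketch)."""
    return gaussian_filter(patch, sigma=rng.uniform(*sigma_range))
```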