Medical Image Segmentation with Limited Supervision: A Review of Deep Network Models
Abstract
Despite the remarkable performance of deep learning methods on various tasks, most cutting-edge models rely heavily on large-scale annotated training examples, which are often unavailable for clinical and health care tasks. The labeling costs for medical images are very high, especially in medical image segmentation, which typically requires intensive pixel/voxel-wise labeling. Therefore, the strong capability of learning and generalizing from limited supervision, including a limited amount of annotations, sparse annotations, and inaccurate annotations, is crucial for the successful application of deep learning models in medical image segmentation. However, due to its intrinsic difficulty, segmentation with limited supervision is challenging, and specific model design and/or learning strategies are needed. In this paper, we provide a systematic and up-to-date review of solutions to these problems, with summaries and comments on the methodologies. We also highlight several open problems in this field and discuss future directions that warrant further investigation.
Index Terms:
Medical image segmentation, semi-supervised segmentation, partially-supervised segmentation, noisy label, sparse annotation

I Introduction
Medical image segmentation, identifying the pixels/voxels of anatomical or pathological structures from background biomedical images, is of vital importance in many biomedical applications, such as computer-assisted diagnosis, radiotherapy planning, surgery simulation, treatment, and follow-up of many diseases. Typical medical image segmentation tasks include brain and tumor segmentation [1, 2, 3], cardiac segmentation [4], liver and tumor segmentation [5, 6, 7, 8], segmentation of cells and subcellular structures [9, 10, 11], multi-organ segmentation [12], lung and pulmonary nodule segmentation [13], vessel segmentation [14], etc., all of which deliver crucial information about the objects of interest. While semantic segmentation of medical images involves labeling each pixel/voxel with the semantic class, instance segmentation (such as cell segmentation) extends semantic segmentation to discriminate each instance within the same class. Recently, deep learning methods have achieved impressive performance improvements on various medical image segmentation tasks and set the new state of the art. Numerous image segmentation algorithms have been developed in the literature, with great progress in the design and performance of deep network models [15, 16].

However, the scarcity of high-quality annotated training data has been a significant challenge for medical image segmentation. The strong generalization capabilities of most cutting-edge segmentation models, which are usually deep and wide networks, rely heavily on large-scale, high-quality pixel-wise annotated data, which are often unavailable for clinical and health care tasks. In fact, manually annotating medical images at the pixel level is an expensive and time-consuming process that requires the knowledge of experienced clinical experts. The scarcity of annotated medical imaging data is further exacerbated by differences in patient populations, acquisition parameters and protocols, sequences, vendors, and centers, which may result in significant statistical shifts. It is therefore challenging even to collect a sufficiently large amount of training data, owing to the heterogeneous nature of medical imaging data and the strict legal and ethical requirements for patient privacy. The data scarcity problem is much more severe for emerging tasks and new environments, where quick model deployment is expected but only a limited amount of annotations with limited quality is available. The high cost of pixel-level labeling and concerns about data privacy and security thus hinder model training and scalability to novel images from emerging tasks and new environments, which subsequently hampers the application of deep segmentation models in real-world clinical and health care usage. Learning strong and robust segmentation models from limited labeled data and readily available unlabeled data is therefore crucial for the successful application of deep learning models in clinical usage and health care.
These challenges have inspired many research efforts on learning with limited supervision, where the training data only have a limited amount of annotated examples, accurate but sparse annotations, inaccurate annotations, coarse-level annotations, or combinations thereof. However, due to its intrinsic difficulty, segmentation with limited supervision is challenging, and specific model design and/or learning strategies are needed. Despite these challenges, researchers have introduced a diverse set of deep network models [16] that can handle incomplete, sparse, inaccurate, or coarse annotations. However, progress has been slower than that of fully supervised learning. In this paper, we take a systematic and up-to-date look at recent technologies that exploit unlabeled examples and prior knowledge to address the limited supervision and small data problem.
Several comprehensive surveys exist on deep learning methods [15, 17] or their subcategories, such as generative adversarial networks (GAN) [18], for general medical image analysis (including classification, reconstruction, detection, registration, and segmentation) [15, 17, 19] or for specialized topics [1, 4, 9, 12, 13], such as digital pathology image analysis [12, 20] and incorporating domain knowledge [21]. While several reviews cover the application of deep learning to specific segmentation applications, such as cardiac image segmentation [4], brain tissue segmentation [1], brain tumor segmentation [22], and segmentation for COVID-19 [13], the surveys in [23, 16, 24] review advances in deep network architectures, losses, and training strategies for medical image segmentation. There are also several reviews closely related to our paper. Karimi et al. [25] reviewed deep learning methods dealing with label noise for medical image analysis, where most of the representative studies concern medical image classification. Zhang et al. [26] provided a review of deep learning methods that tackle small-sample problems for various medical image analysis tasks, such as classification, detection, and segmentation. The most relevant survey to our study is [27], which focused on deep learning solutions for medical image segmentation with an imperfect training set. The current survey focuses on deep network models for medical image segmentation with limited supervision and provides a more up-to-date review of recent advancements. An overview of the main body of this survey is shown in Fig. 1.
To sum up, the main contributions of this paper are:
• We provide a systematic and up-to-date review of medical image segmentation with limited supervision. One can quickly identify the frontier ideas in this field and, more importantly, obtain an overall picture of the problems and methodologies in this research area.
• We categorize the problem of limited-supervised segmentation into semi-supervised segmentation, partially supervised segmentation, and inaccurately-supervised segmentation and offer a structured review of recent advances in methods that can be used to address these problems. We also offer summaries and comments about the pros and cons of the methodologies in each category, and the connections between methods in different categories.
• We also highlight several problems in this field and discuss the limitations, new trends, and future directions for medical image segmentation with limited supervision.
The paper is organized as follows. In Section II, we provide preliminary knowledge about medical image segmentation and the basic deep network architectures for this task, as well as the categorization of medical image segmentation with limited supervision. Sections III to V provide a detailed review of methods for semi-supervised segmentation, partially-supervised segmentation, and inaccurately supervised segmentation, respectively. Lastly, Section VI discusses future directions and concludes this survey.



II Overview
Medical image segmentation involves delineating anatomical or pathological structures from medical images of various modalities. As pointed out in [17], medical images are heterogeneous with imbalanced classes and have multiple modalities with sparse annotations. Thus, it is complicated and challenging to analyze various medical images. Here, we focus on the medical image segmentation problem, which typically consists of semantic segmentation and instance segmentation. Semantic segmentation refers to the task of assigning each pixel/voxel a semantic category label (such as liver, kidney, etc.). Thus, semantic segmentation generates per-pixel segmentation masks, and multiple objects of the same category are treated as one entity. In contrast, instance segmentation delineates the instances of each category. The difference between semantic segmentation and instance segmentation is illustrated in Fig. 2. For semantic segmentation of nuclei, all nuclei pixels are annotated with the same label, whereas instance segmentation assigns different nuclei different labels.
Deep network for image segmentation. The convolutional neural network (CNN) has been the de-facto solution for medical image segmentation. CNNs have shown striking improvement over traditional methods, such as machine learning methods using hand-crafted features [28, 10], graph cut methods [29], shape deformation [30], and variational methods [31]. Image segmentation has recently been tackled by end-to-end learning and fully convolutional networks (FCN) [32], especially in encoder-decoder architecture [33, 34]. Compared to a classical CNN, an FCN is composed of convolutional layers without any fully-connected layer at the end of the network and can transform the feature maps of intermediate layers back to the size of the input image. Thus, the prediction of an FCN has a spatial one-to-one correspondence with the input image, which has dramatically promoted semantic segmentation research. Many models with improved network architectures have been introduced, such as SegNet [33] with encoder-decoder architecture, the U-Net [34], PSP-Net [35] with pyramid pooling, DeepLab [36] with atrous spatial pyramid pooling, Attention U-Net with attention modules [37], etc. An FCN in encoder-decoder architecture typically consists of a contracting sub-net, i.e., the encoder, that gradually reduces the feature maps and captures high-level features, and an expanding sub-net, i.e., the decoder, that gradually recovers the spatial information and fine boundaries. A demonstration of the difference between a CNN and an FCN in encoder-decoder architecture is shown in Fig. 3. Notably, the U-Net introduces additional skip connections between the encoder and decoder (as shown in Fig. 4) and has produced very impressive results in the domain of medical image segmentation. Dense skip connections were introduced in DenseNet [38] and have been widely used in many segmentation models [15]. Please refer to [39, 40, 15] for comprehensive reviews of recent improvements of FCN models for semantic segmentation of natural and medical images.
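To make the encoder-decoder design with skip connections more concrete, the following PyTorch sketch builds a small U-Net-style network; the layer widths, depth, and names are illustrative assumptions rather than the configuration of any cited model.

```python
# A minimal sketch (not the exact U-Net in [34]) of an encoder-decoder FCN with
# skip connections: decoder features are concatenated with encoder features of
# the same spatial resolution before further convolution.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU, as commonly used in U-Net-like models.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, num_classes=2, base=16):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)           # encoder level 1
        self.enc2 = conv_block(base, base * 2)        # encoder level 2
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(base * 2, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)    # input: upsampled + skip from enc2
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)        # input: upsampled + skip from enc1
        self.head = nn.Conv2d(base, num_classes, 1)   # per-pixel class scores

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)  # logits with the same spatial size as the input

logits = TinyUNet()(torch.randn(1, 1, 64, 64))  # -> shape (1, 2, 64, 64)
```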


Image segmentation with limited supervision. The cost of labor-intensive, pixel-level annotation of large-scale medical imaging data can be reduced by utilizing 1) a small subset of labeled training data, also known as semi-supervised learning or few-shot learning; 2) partial annotations (including sparse annotations), i.e., partially-supervised learning; or 3) inaccurate annotations, including noisy labels, bounding boxes, and boundary scribbles. It is noteworthy that, though the labeled data are scarce in the semi-supervised setting, these annotations are typically assumed to be precise and reliable, which differs from the inaccurate and partial annotation settings. The extreme case of limited supervision is the unsupervised setting, where no labeled data are available at all. However, methods that only explore unlabeled data, such as clustering, are usually task-agnostic and tend to show very low performance on complicated segmentation tasks. Recently, auxiliary tasks, such as the adaptation of a well-trained model from a similar domain with a similar task [41, 11], have been leveraged to mitigate this problem. Although we will not cover unsupervised segmentation and its solutions, such as unsupervised domain adaptation (UDA) [42] and zero-shot learning [43], we mention it here so that all settings can be viewed in the big picture. In this paper, we focus on methods that learn to segment medical images with incomplete, inexact, and inaccurate annotations by jointly leveraging a small amount of labeled data and a large number of unlabeled examples.
III Semi-supervised Segmentation
Semi-supervised segmentation is a common scenario in medical applications, where only a small subset of the training images is assumed to have full pixel-wise annotations, while an abundance of unlabeled images can be used to improve both accuracy and generalization. Since unlabeled data do not involve labor-intensive annotations, any performance gain conferred by using unlabeled data comes at a low cost. The major challenge of this learning scenario lies in how to efficiently and thoroughly exploit a large quantity of unlabeled data. The most common approaches for semi-supervised segmentation include 1) general strategies, e.g., transfer learning, data augmentation, prior knowledge learning, curriculum learning, and few-shot learning, and 2) specialized methods that make use of unlabeled data, e.g., self-training [44, 45, 46, 47], consistency regularization [48], co-training, self-supervised learning, and adversarial learning.
In the following subsections, we first review general methods that can be used to address the small labeled data problem, i.e., transfer learning, data augmentation, prior knowledge learning, and curriculum learning, as shown in Fig. 5. Then, we discuss specialized methods designed for semi-supervised learning. Finally, we elaborate on few-shot learning, which learns to generalize from a few examples with prior knowledge.

III-A Transfer learning
Transfer learning refers to reusing a model developed for a task as the starting point for a model on a second task, which may speed up the learning process, alleviate the problem of limited training data, and improve generalization on the second task. In contrast, training an entire deep network from scratch usually requires a large-scale labeled dataset. The "model pretraining and fine-tuning" strategy, a notable example of transfer learning, has been a simple but effective paradigm in many deep network applications since many tasks are related. In many deep learning studies, transfer learning also narrowly refers to the "model pretraining and fine-tuning" strategy. Typically, transfer learning from natural image data to medical datasets involves starting with standard network architectures, e.g., VGG [49] and ResNet [50], using weights pre-trained on large-scale external sources of natural images, namely ImageNet [51] and PASCAL VOC [52], as initialization or as a fixed feature extractor, and then fine-tuning the model on medical imaging data. This model-reuse strategy tends to work if the features are general and suitable for both the source and target tasks. The transferability of features at different layers of a deep network was investigated in [53], which showed that transferring features even from distant tasks can be better than using random features.
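A minimal sketch of this pretraining and fine-tuning strategy is given below, using torchvision's ResNet-18 with ImageNet weights and an assumed 4-class target task; the two options in the comments (fixed feature extractor vs. end-to-end fine-tuning) mirror the choices described above.

```python
# A minimal sketch of "pretraining and fine-tuning": an ImageNet-pretrained
# backbone is reused on a (hypothetical) 4-class medical target task.
import torch.nn as nn
import torchvision

# Load ImageNet weights (older torchvision versions use pretrained=True instead).
backbone = torchvision.models.resnet18(weights="IMAGENET1K_V1")

# Option 1: freeze the backbone and use it as a fixed feature extractor.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the classification head with a new, trainable task-specific head.
backbone.fc = nn.Linear(backbone.fc.in_features, 4)

# Option 2: leave (or later set) requires_grad=True and fine-tune the whole
# network on the medical dataset, typically with a small learning rate.
```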
It is noteworthy that many medical applications involve segmentation of 3D medical images, which hinders the transfer of models pre-trained on 2D natural images to the task at hand. While it is straightforward to reformulate volume image segmentation as slice-by-slice 2D segmentation, rich 3D spatial context is inevitably lost. Possible solutions for transferring 2D networks to 3D networks include 1) copying the 2D kernels along an axis [54] and 2) padding the pre-trained 2D kernels with zeros along an axis [55, 56]. For instance, Liu et al. [56] proposed to transfer convolutional features learned from 2D images to 3D anisotropic volumes and obtained the desired strong generalization capability from the pre-trained 2D network.
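The two 2D-to-3D kernel transfer strategies mentioned above can be sketched as follows; this is a simplified illustration assuming a pretrained 2D kernel tensor, not the exact procedure of [54, 55, 56].

```python
# A minimal sketch of inflating pretrained 2D convolution kernels into 3D ones,
# assuming `w2d` has shape (out_ch, in_ch, k, k).
import torch

def inflate_by_copying(w2d, depth=3):
    # Replicate the 2D kernel along the new depth axis and rescale so that the
    # response to a depth-constant input matches the original 2D response.
    w3d = w2d.unsqueeze(2).repeat(1, 1, depth, 1, 1)   # (out, in, depth, k, k)
    return w3d / depth

def inflate_by_zero_padding(w2d, depth=3):
    # Place the 2D kernel in the central depth slice and pad the rest with zeros.
    out_ch, in_ch, k, _ = w2d.shape
    w3d = torch.zeros(out_ch, in_ch, depth, k, k)
    w3d[:, :, depth // 2] = w2d
    return w3d

w2d = torch.randn(64, 3, 3, 3)                  # e.g. the first conv of a 2D backbone
print(inflate_by_copying(w2d).shape)            # torch.Size([64, 3, 3, 3, 3])
print(inflate_by_zero_padding(w2d).shape)       # torch.Size([64, 3, 3, 3, 3])
```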
For medical image analysis, another challenge of models pre-trained on large-scale natural image sets is the significant domain gap between natural images and medical images, and even between medical images of different modalities. Tajbakhsh et al. [57] investigated the effectiveness of pre-trained deep CNNs with sufficient fine-tuning compared to training a deep network from scratch on four different medical imaging applications. They showed that, in most cases, fine-tuned networks outperformed those trained from scratch and were more robust. However, Raghu et al. [58] recently evaluated the properties of transfer learning from ImageNet on two large-scale medical imaging tasks and demonstrated a contrasting result: transfer learning offered little performance benefit, and simple, lightweight models can perform comparably to large pre-trained networks. Zoph et al. [59] demonstrated that stronger data augmentation and more labeled data diminish the benefit of pretraining for vision applications, but self-training is always helpful.
While a model well pre-trained on a large-scale medical image dataset may be more valuable for medical image segmentation, no ImageNet-scale annotated dataset exists in the medical domain. To obtain a universal pre-trained model with promising transferability and generalization ability for medical image analysis, several studies have proposed to pre-train models on medical datasets that are limited to specific modalities or tasks. Zhou et al. [60] built a 3D pre-trained model, called Genesis Chest CT, using unlabeled 3D chest Computed Tomography (CT) images with a novel self-supervised learning method. Similar pre-trained models were also built for specific image domains, such as Genesis Chest CT 2D and Genesis Chest X-ray, which used 2D chest CT and chest X-ray images, respectively. A universal 3D model was learned in [61] by leveraging a self-supervised learning scheme over multiple unlabeled source datasets of different modalities and distinctive scan regions.
III-B Data augmentation
Since deep networks rely heavily on big data to learn discriminative representations and avoid overfitting, data augmentation [62] has been considered a simple yet effective data-space solution to the problem of limited annotated data. Specifically, data augmentation aims to artificially enhance the size, diversity, and quality of the training data without collecting and manually labeling new data. Typical data augmentation methods include not only data warping methods [62] such as random affine and elastic transformations, random cropping [50], random erasing [63, 64], intensity transformations, and adversarial data augmentation [65, 66], but also methods that synthesize more diverse and realistic labeled examples, such as mixing images [67, 68, 69, 70], feature space augmentation [71], and generative adversarial networks [72, 18, 73, 74]. While general transformation-based augmentation methods such as random affine transformations, elastic transformations, and intensity transformations are easy to implement and have shown performance improvements in abundant applications [34, 75, 2], they do not take advantage of the knowledge in unlabeled training data. Recently, there has been growing interest in developing augmentation that can simulate real variations of the data, and thus task-driven approaches [76, 77, 78] are a promising direction. Schlesinger et al. [79] provided a recent review of data augmentation methods for brain tumor segmentation.
Mixing and cutting images [67, 68, 69, 80] constitutes a class of simple but effective augmentation methods in many applications [70]. Specifically, Mixup [67] linearly interpolates a random pair of training images and, correspondingly, their labels. Recently, Mixup has been improved in [69] with learned mixing policies to prevent manifold intrusion. Cutout [64] adopts the idea of regional dropout, that is, occluding a portion of an image during training. Alternatively, CutMix [68] combines aspects of Mixup and Cutout by replacing a portion of an image with a portion of a different image. For medical image segmentation, Panfilov et al. [70] tested the effectiveness of Mixup for knee MRI segmentation and showed improved model robustness.
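A minimal sketch of Mixup adapted to segmentation is given below; it assumes one-hot label maps and training with a soft Dice or cross-entropy loss, which is one possible way to apply the idea rather than the exact setup of the cited works.

```python
# A minimal sketch of Mixup [67] for segmentation: a random pair of images and
# their one-hot masks are interpolated with the same coefficient lambda.
import torch

def mixup_segmentation(images, onehot_masks, alpha=0.4):
    # images: (B, C, H, W); onehot_masks: (B, num_classes, H, W)
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))
    mixed_images = lam * images + (1 - lam) * images[perm]
    mixed_masks = lam * onehot_masks + (1 - lam) * onehot_masks[perm]
    return mixed_images, mixed_masks  # soft label maps; train with a soft loss

imgs, masks = torch.randn(4, 1, 64, 64), torch.rand(4, 2, 64, 64)
mixed_imgs, mixed_masks = mixup_segmentation(imgs, masks)
```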
Adversarial data augmentation harnesses adversarial examples to train models that are robust against unforeseen data corruptions or distribution shifts [81, 66, 65] and is thus a plausible way to cope with limited labeled training data. When applied to medical image segmentation, designing and constructing more realistic adversarial perturbations is a crucial problem [82, 65, 77]. For MR image segmentation, Chen et al. [77] introduced intensity inhomogeneity as a new type of adversarial attack, using a realistic intensity transformation function learned with adversarial training to amplify intensity non-uniformity in MR images and simulate potential image artifacts, such as the bias field. To obtain adversarial samples subject to a given transformation model, Olut et al. [83] proposed to learn a statistical deformation model that captures plausible anatomical variations from unlabeled data via deep registration models. A similar idea was adopted in [84].
Generative adversarial networks (GAN) [72, 18] have also been utilized to conduct medical data augmentation by directly synthesizing new labeled data. Costa et al. [85] proposed training a generative model with adversarial learning to synthesize both realistic retinal vessel trees and retinal color images. For semi-supervised medical image segmentation, Chaitanya et al. [86] proposed to learn a generative network to synthesize new samples from both labeled and unlabeled data by simultaneously learning and applying realistic spatial deformation fields and additive intensity transformation fields. To improve cross-modal segmentation with limited training samples, Cai et al. [87] developed a cross-modality data synthesis approach to generate realistic looking 2D/3D images of a specific modality as data augmentation. Yu et al. [88] integrated edge information into conditional GAN [89] for cross-modality MR image synthesis. To segment pulmonary nodules, Qin et al. [90] augmented the training set with synthetic CT images and labels and achieved promising results. For one-shot brain segmentation, Zhao et al. [76] used a data-driven approach for synthesizing labeled images as data augmentation. Specifically, they proposed to model the set of spatial and appearance transformations between all the training data, including both the labeled and unlabeled images, and then applied the learned transformations on the single labeled image to synthesize new labeled images.
III-C Prior knowledge learning
A group of methods has addressed semi-supervised segmentation by incorporating prior/domain knowledge, such as anatomical priors about the objects of interest, into the segmentation model as a strong regularization [91, 92, 93, 94, 95, 96]. In fact, prior knowledge about location, shape, anatomy, and context is also crucial for manual annotation, especially in the presence of fuzzy boundaries or low image contrast. For semantic segmentation with deep networks, model training is typically guided by local or pixel-wise loss functions (e.g., Dice loss [97] and cross-entropy loss), which may not be sufficient to learn informative features about the underlying anatomical structures and global dependencies. Anatomical-prior guided methods usually assume the plausible solution space can be expressed in the form of a prior distribution, enforcing the network to generate more anatomically plausible segmentations.
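For reference, a minimal sketch of the soft Dice loss commonly paired with cross-entropy is shown below; its purely pixel-wise/regional nature is exactly why the anatomical priors discussed in this subsection are useful as an additional regularization.

```python
# A minimal sketch of the soft (multi-class) Dice loss in the spirit of [97].
import torch

def soft_dice_loss(logits, onehot_target, eps=1e-6):
    # logits: (B, C, H, W); onehot_target: (B, C, H, W)
    probs = torch.softmax(logits, dim=1)
    dims = (0, 2, 3)                                     # sum over batch and space
    intersection = (probs * onehot_target).sum(dims)
    cardinality = probs.sum(dims) + onehot_target.sum(dims)
    dice_per_class = (2 * intersection + eps) / (cardinality + eps)
    return 1 - dice_per_class.mean()                     # averaged over classes
```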
Atlas-based segmentation [98, 99] with a single atlas or multiple atlases has been widely used in medical image segmentation to exploit prior knowledge from previously labeled training images. An atlas consists of a reference model with labels related to the anatomical structures. Thus, it can provide crucial knowledge, such as information about location, texture, shape, spatial relationships, etc., for segmentation, especially when limited labeled data are available for model training. Atlas-based methods essentially treat the segmentation problem as a registration problem, and non-rigid registration is typically used to account for the anatomical differences between subjects. Wang et al. [91] addressed one-shot segmentation of brain structures from Magnetic Resonance Images (MRIs) with single-atlas-based segmentation, where reversible voxel-wise correspondences between the atlas and the unlabelled images were learned with a correspondence-learning deep network. Ito et al. [100] considered semi-supervised segmentation of brain tissue from MRI. Specifically, they relied on image registration with one or more atlases to generate pseudo labels on unlabeled data. The expectation-maximization (EM) algorithm was used to alternately update model parameters and pseudo labels. However, image registration, the process of geometrically aligning two or more images, is computation-intensive, which may hamper its practical application. Please refer to [101] for a comprehensive review of both affine and deformable image registration with deep learning methods. A similar idea was employed by Chi et al. [102], who generated pseudo-labels by utilizing deformable image registration to propagate atlas labels onto unlabeled images. Xu and Niethammer [92] proposed to jointly learn two deep networks for weakly-supervised image registration and semi-supervised segmentation, assuming that these two tasks can mutually guide each other’s training on unlabeled images. He et al. [103] further proposed an improved joint learning model, which added a perturbation factor to the registration to sustain its data augmentation ability and a discriminator to extract registration confidence maps for better guidance of the segmentation task. For 3D left ventricle (LV) segmentation on echocardiography with limited annotated data, Dong et al. [104] introduced a deep atlas network with a lightweight registration network and a multi-level information consistency constraint. However, registration is a computation-intensive and challenging task that is not essential for segmentation. For semi-supervised 3D liver segmentation, Zheng et al. [105] proposed to combine a probabilistic atlas, which provides shape and position priors, with deep segmentation networks using a prior-weighted cross-entropy loss. The probabilistic atlas was obtained by averaging the manually labeled liver masks after aligning all labeled training images. Vakalopoulou et al. [106] developed AtlasNet, which consists of multiple deep networks trained after co-aligning multiple anatomies through multi-metric deformable registration. The multiple deep networks were used to map all training images to common subspaces to reduce biological variability.
Shape-prior based segmentation [94, 95, 96, 107, 108, 109, 110, 111, 112, 113] has been an active research topic in the context of deep learning to obtain more accurate and anatomically plausible segmentation. While principal component analysis (PCA) based statistical shape models (SSMs) [30] were widely adopted by traditional segmentation methods, it is not straightforward to combine SSMs with deep networks. Ambellan et al. [114] combined 3D SSMs with 2D and 3D deep convolutional networks to obtain a robust and accurate segmentation of even highly pathological knee bone and cartilage. Specifically, they used SSM adjustment as a shape regularization of the outputs of the segmentation networks. Oktay et al. [94] first used a stacked convolutional autoencoder to learn non-linear shape representations, which was integrated with the segmentation network to enforce its predictions to follow the learned anatomical priors. With the shape prior, their method obtained highly competitive performance for cardiac image segmentation while learning from a limited number (30) of labeled cases. Rather than using the compact codes produced by an autoencoder as the shape constraint in [94], Yue et al. [113] used the reconstructions of the predicted segmentations to maintain a realistic shape of the resulting segmentation.
While the frameworks in [94, 113] incorporated the learned anatomical prior into deep networks through a regularization term, Painchaud et al. [115] incorporated the anatomical priors through an additional post-processing stage. Specifically, they warped initial segmentation results toward the closest anatomically correct cardiac shape, which was learned and generated with a constrained variational autoencoder. Ravishankar et al. [116] introduced a shape regularization network (a convolutional autoencoder) after the segmentation. Larrazabal et al. [107] learned lower-dimensional representations of plausible shapes with a denoising autoencoder and used them as a post-processing step to impose shape constraints on the coarse output of the segmentation network.
As a novel extension of template deformation methods [30] in the context of deep networks, Lee et al. [110] introduced a template transformer network, where a shape template is deformed to match the underlying structure of interest through an end-to-end trained spatial transformer network. Zotti et al. [117] introduced a probabilistic image estimated by computing the pixel-wise empirical proportion of each class based on aligned ground truth label fields of the training images. The probabilistic shape-prior image was concatenated with network features for prior guidance. For the semi-supervised 3D segmentation of renal arteries, He et al. [103] proposed assisting the segmentation network with multi-scale semantic features extracted from unlabeled data with an autoencoder.
Other types of anatomical priors, such as the star shape prior [108, 118, 119, 120, 121], convex shape prior [122], topology [123, 124, 125, 126, 127], size [128, 129, 130], etc., have also been introduced to improve segmentation robustness and anatomical accuracy.


III-D Curriculum learning
Given the greater complexity of semi-supervised segmentation over classification and the importance of starting small, the concept of curriculum learning [131], or the easy-to-hard strategy, has also been utilized. Curriculum learning describes a type of learning strategy that starts with easier aspects of the task or easier subtasks and then gradually increases the difficulty level. In a broad sense, the most widely used curriculum learning strategies include data curriculum learning and task curriculum learning. While early studies focused on data curriculum learning by reweighting the target training distribution, recent studies have also investigated the varied easiness among different tasks [132], i.e., task curriculum learning.
In data curriculum learning, non-uniform sampling of examples or mini-batches from the entire training data, rather than uniform sampling as in standard deep network training, is used in model training. Therefore, the core tasks are how to rank the training examples and how to guide the order of presentation of examples based on this ranking [133]. Thus, it is flexible to incorporate prior knowledge about the data and task. It has been empirically demonstrated that this learning paradigm is useful in avoiding bad local minima and in achieving better generalization ability [134]. Data curriculum learning has recently been used in several medical applications, especially localization and classification tasks [135, 136, 137, 78], but rarely in segmentation tasks [138, 139]. To train a deep network for the classification and localization of thoracic diseases on chest radiographs, Tang et al. [135] first ranked the training images according to difficulty (indicated by the severity levels of the disease) and then fed them to the deep network to gradually boost the representation learning. For fracture classification, Jiménez-Sánchez et al. [137, 140] assigned a degree of difficulty to each training example according to medical decision trees and inconsistencies in multiple experts’ annotations. In addition to predefining the curriculum from prior knowledge and keeping it fixed thereafter, the curriculum can also be dynamically determined to adapt to the feedback of the learner, also known as self-paced curriculum learning [141] or self-paced learning [142]. For lung nodule segmentation/detection with extreme class imbalance, Jesson et al. [138] introduced an adaptive sampling strategy, which favors difficult-to-classify examples. For instance-level segmentation of pulmonary nodules, Wang et al. [139] employed pseudo labels as the surrogate of ground truth labels on unlabeled data. To utilize the pseudo-labeled data, they followed the idea of self-paced curriculum learning [141] and embedded the curriculum design as a regularization term in the learning objective.
Task curriculum learning consists of tackling easy but related tasks first to provide auxiliary information for more complicated tasks, which are solved later. Task curriculum learning is highly related to multi-stage learning in segmentation [143, 144, 2], where easier tasks such as localization or coarse segmentation are first solved with a simple method. After that, the more complex pixel-level segmentation is addressed. For example, for cross-domain segmentation of natural images, Zhang et al. [145] proposed to solve easy tasks first to infer necessary properties about the target domain. Specifically, they first estimated label distributions over both global images and some landmark superpixels of the target domain. They then enforced the semantic segmentation network to follow those target-domain properties as much as possible. For left ventricle segmentation in MR images, Kervadec et al. [146] introduced a curriculum-style strategy that first learns the size of the target region, which is the easier task, and then regularizes the more difficult segmentation task with the pre-learned region size.
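A minimal sketch of such a curriculum-style size regularizer, in the spirit of [146] but under simplified assumptions (binary foreground/background, the pre-estimated target size passed in directly), is given below.

```python
# A minimal sketch of a size-based regularizer: the predicted foreground size is
# encouraged to match a size estimated by an easier auxiliary regression task.
import torch

def size_regularization(logits, target_size):
    # logits: (B, 2, H, W); target_size: (B,) estimated foreground pixel counts.
    fg_probs = torch.softmax(logits, dim=1)[:, 1]     # soft foreground map
    predicted_size = fg_probs.sum(dim=(1, 2))         # differentiable size proxy
    return ((predicted_size - target_size) ** 2).mean()
```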
III-E Self-training strategy
Self-training [147, 148, 44, 149] (also called pseudo-labeling) is an iterative process that alternately generates pseudo-labels on the unlabeled data and retrains the learner on the combined labeled and pseudo-labeled data. An illustration of self-training based semi-supervised segmentation is shown in Fig. 6 (a) (with a comparison to the human-in-the-loop strategy, i.e., active learning for interactive segmentation, which will be discussed in Sec. IV). The generality and flexibility of self-training have been validated in many applications [59]. A fundamental property of the self-training strategy is that it can be combined with any supervised learner and provides a straightforward but effective way of leveraging unlabeled data.
Self-training usually follows a teacher-student paradigm [150, 151] (as shown in Fig. 7), which consists of first learning a teacher model from ground truth annotations and then using the predictions of the teacher model to generate pseudo-labels on the unlabeled data. The ground truth annotations and high-confidence pseudo labels are then jointly digested iteratively to learn a powerful student model. Bai et al. [44] first trained a teacher neural network using the labeled data and then used the confident predictions (i.e., the probability predictions followed by conditional random field (CRF) refinement) of the teacher model on the unlabeled data as pseudo labels. Fan et al. [152] applied a similar strategy to lung infection segmentation from CT images. Typically, the self-training method is iterative, and the quality of pseudo-labels should be gradually improved for a successful self-training approach [151]. The main challenge of the self-training strategy lies in generating reliable pseudo labels and handling the negative impact of adding incomplete and incorrect pseudo-labels, which may confuse the model training.
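A minimal sketch of one teacher-student pseudo-labeling round is shown below; the confidence threshold and the way confident pixels are masked are illustrative assumptions, and details vary across the cited methods.

```python
# A minimal sketch of teacher-student self-training with confidence filtering.
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate_pseudo_labels(teacher, unlabeled_images, threshold=0.9):
    probs = torch.softmax(teacher(unlabeled_images), dim=1)
    confidence, pseudo_labels = probs.max(dim=1)          # (B, H, W)
    mask = confidence > threshold                          # keep only confident pixels
    return pseudo_labels, mask

def self_training_loss(student, labeled_images, labels,
                       unlabeled_images, pseudo_labels, mask):
    sup = F.cross_entropy(student(labeled_images), labels)             # supervised term
    unsup = F.cross_entropy(student(unlabeled_images), pseudo_labels,
                            reduction='none')                          # per-pixel loss
    unsup = (unsup * mask.float()).sum() / mask.float().sum().clamp(min=1.0)
    return sup + unsup
```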
A promising direction for improving the quality of pseudo labels and reducing the negative impact of noisy pseudo labels is uncertainty or confidence estimation [153, 154, 155, 156, 157]. To this end, it is desirable to let the teacher model simultaneously generate the segmentation predictions as pseudo labels and estimate uncertainty maps for the unlabeled images. The uncertainty maps can then be used as guidance to retain reliable predictions [158, 159, 160, 161, 162, 163, 164, 165].
There are two categories of uncertainty [154, 157] one can model: aleatoric uncertainty (data uncertainty), which is an inherent, irreducible property of the data distribution, and epistemic uncertainty (model uncertainty), which can be reduced through the collection of additional data. Popular approaches to generate pseudo-labels and quantify uncertainties in deep networks include Bayesian neural networks [155, 166], Monte Carlo Dropout [153, 46, 45, 167], Monte Carlo batch normalization [168], and deep ensembles [169]. Bayesian neural networks capture model uncertainty by learning a posterior distribution over parameters. Since Bayesian networks are often hard to implement and computationally slow to train [153, 169], non-Bayesian strategies, including Monte Carlo Dropout and deep ensembles, are more attractive. Jungo et al. [160] evaluated several widely-used pixel-wise uncertainty measures concerning their reliability and limitations for medical image segmentation, and also highlighted the importance of developing subject-wise uncertainty estimations.
The model uncertainty estimated with Monte Carlo Dropout [153] can be interpreted as an approximation of Bayesian uncertainty. Concretely, the predictive uncertainty is estimated by averaging the results of multiple stochastic forward passes of the deep network under random dropout. The widely-used uncertainty measures include normalized entropy of the varied probabilistic predictions, the variance of the Monte Carlo samples, mutual information, and predicted variance. Nair et al. [158] provided an in-depth analysis of the different measures based on medical image segmentation performance. Camarasa et al. [170] conducted a quantitative and statistical comparison of several uncertainty measures of Monte Carlo Dropout based on the task of multiclass segmentation. Given the uncertainty estimated by Monte Carlo Dropout [153], Yu et al. [46] introduced an uncertainty-aware consistency loss for the learning of the student model and applied it to the semi-supervised segmentation of the left atrium. Similarly, Sedai et al. [45] conducted semi-supervised segmentation of retinal layers in OCT images with uncertainty guidance estimated with Monte Carlo Dropout.
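A minimal sketch of Monte Carlo Dropout uncertainty estimation using predictive entropy, one of the measures mentioned above, is given below; the number of passes and the use of train mode to keep dropout active are simplifying assumptions (in practice, one usually enables only the dropout layers rather than the whole model).

```python
# A minimal sketch of Monte Carlo Dropout [153]: average T stochastic forward
# passes and use the entropy of the mean prediction as a pixel-wise uncertainty map.
import torch

def mc_dropout_uncertainty(model, image, num_passes=8):
    model.train()  # keeps dropout active; ideally enable only dropout layers
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(image), dim=1)
                             for _ in range(num_passes)], dim=0)
    mean_probs = probs.mean(dim=0)                                      # (B, C, H, W)
    entropy = -(mean_probs * torch.log(mean_probs + 1e-8)).sum(dim=1)   # (B, H, W)
    return mean_probs, entropy  # entropy serves as the uncertainty map
```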
Deep ensembles [169] were theoretically motivated by the bootstrap and have been empirically demonstrated to be a promising approach for boosting the accuracy and robustness of deep networks. Concretely, multiple networks using different training subsets and/or different initializations are separately trained to enforce variability, and then the predictions are combined by averaging for the uncertainty estimation. Mehrtash et al. [163] used deep ensembles for confidence calibration, where they trained multiple models with different initializations and random shuffling of the training data. They applied their confidence calibrated model for brain, heart, and prostate segmentation.
Ayhan and Berens [171] showed that applying traditional augmentation at the test time can be an effective and efficient estimation of heteroscedastic aleatoric uncertainty in deep networks, and they applied their method on fundus image analysis. Kendall and Gal [157] introduced a unified Bayesian framework to combine aleatoric and epistemic uncertainty estimations for deep networks. Wang et al. [164] validated the effectiveness of test-time augmentation as aleatoric uncertainty estimation on the segmentation of fetal brains and brain tumors.



III-F Co-training
Co-training, initially introduced by Blum and Mitchell [172], exploits multiview data descriptions to learn from a limited number of labeled examples and a large amount of unlabeled data. The underlying assumption is that the training examples can be described by two or more different but complementary sets of features, called views, which are ideally assumed to be conditionally independent given the category. As an extension of self-training to multiple base learners, the original co-training for classification first learns a separate learner for each view using the labeled examples; the most confident predictions of all base learners on unlabeled data are then gradually added to the labeled data of the other base learners to continue the iterative training. By enforcing prediction agreement between the different but related views, the goal is to allow inexpensive unlabeled data to augment a much smaller set of labeled examples. Moreover, it is essential to ensure that the different base learners provide different and complementary information about each instance [173], namely the view difference constraint or diversity criterion. Peng et al. [174] applied the idea of co-training to semi-supervised segmentation of medical images. Concretely, they trained multiple models on different subsets of the labeled training data and used a common set of unlabeled training images to exchange information among the models. Diversity across models was enforced by utilizing adversarial samples generated from both the labeled and unlabeled data, as in [175]. For semi-supervised multi-organ segmentation from 3D medical images, Zhou et al. [176] introduced multi-planar co-training, which involves training different segmentation models on multiple planes, i.e., the axial, coronal, and sagittal planes, of a volume image in the teacher-student paradigm. Xia et al. [177] incorporated uncertainty estimation into the multi-planar co-training approach of [176] to generate more reliable pseudo labels for unlabeled data.
III-G Consistency regularization
Consistency regularization [178, 179, 48] utilizes unlabeled data by relying on the assumption that a well-behaved model should generate consistent predictions for similar inputs. More specifically, the trained model should output the same predictions for classification, or equivariant predictions for segmentation, when fed perturbed or transformed input. To this end, methods of this category learn to minimize the difference in predictions when perturbed or transformed versions of a training sample are passed through the deep network, aiming to obtain a model with better generalization ability. The conceptual idea of consistency regularization in both single-model and dual-model architectures is shown in Fig. 8. Cui et al. [180] adapted the mean teacher model [151], an improved teacher-student self-training strategy that also considers consistency regularization, to semi-supervised brain lesion segmentation. Specifically, they minimized the differences between the predictions of the teacher model and the student model for the same input under different noise perturbations. Yu et al. [46] further introduced uncertainty estimation into the mean teacher learning framework. Li et al. [181] introduced a geometric-transformation-consistent loss, which was integrated into the mean teacher learning framework [151] and applied to the semi-supervised segmentation of skin lesions, the optic disc, and liver tumors. In the mean teacher framework, Zhou et al. [182] encouraged the predictions of the teacher and student networks to be consistent at both the feature and semantic levels under small perturbations. They applied their model to semi-supervised instance segmentation of cervical cells. In the teacher-student paradigm, Fotedar et al. [183] further considered consistency under extreme transformations, including a diverse set of intensity-based, geometric, and image-mixing transformations, and conducted semi-supervised lesion segmentation and retinal vessel segmentation from skin and fundus images, respectively. Liu et al. [184] explicitly enforced the consistency of relationships among different samples under perturbations in the teacher-student framework. Instead of using the self-training strategy (i.e., the teacher-student paradigm), Bortsova et al. [48] enforced transformation consistency on both the labeled and unlabeled data within a Siamese network and achieved state-of-the-art performance on chest X-ray segmentation. A similar idea was adopted in [185] for weakly supervised segmentation of COVID-19 in CT images. For semi-supervised medical image segmentation, Peng et al. [186] further employed a mutual-information-based clustering loss to explicitly enforce prediction consistency between nearby pixels in the unlabeled images and randomly perturbed unlabeled images. Fang and Li [187] developed a convolutional network with two decoder branches of different architectures and minimized the difference between the soft masks generated by the two decoders. They applied their method to kidney tumor and brain tumor segmentation and showed promising results.
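A minimal sketch of mean-teacher-style consistency regularization is shown below; the additive Gaussian noise, the MSE consistency loss, and the EMA decay value are illustrative assumptions rather than the exact choices of [151] or the medical adaptations above.

```python
# A minimal sketch of mean-teacher consistency: the student matches the teacher's
# prediction on a differently perturbed input, and the teacher is an EMA of the student.
import torch
import torch.nn.functional as F

def consistency_loss(student, teacher, unlabeled_images, noise_std=0.1):
    noisy_student_in = unlabeled_images + noise_std * torch.randn_like(unlabeled_images)
    noisy_teacher_in = unlabeled_images + noise_std * torch.randn_like(unlabeled_images)
    student_probs = torch.softmax(student(noisy_student_in), dim=1)
    with torch.no_grad():
        teacher_probs = torch.softmax(teacher(noisy_teacher_in), dim=1)
    return F.mse_loss(student_probs, teacher_probs)

@torch.no_grad()
def update_teacher(student, teacher, ema_decay=0.99):
    # Exponential moving average update of the teacher weights.
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(ema_decay).add_(s_param, alpha=1 - ema_decay)
```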

III-H Self-supervised learning
Self-supervised learning [188, 189, 190] (as shown in Fig. 9), a form of unsupervised learning, has been widely used to explore unlabeled data and has shown rapidly improving performance on representation learning by discovering data-inherent patterns. Self-supervised learning leverages unlabeled data with automatically generated supervisory signals and benefits downstream tasks through self-supervised model pretraining. Then, the pretrained model and the learned features are adapted to the target tasks of interest. Therefore, self-supervised learning aims to obtain a good representation of the training data without using any manual label. A popular solution is to learn useful features by introducing various pretext tasks, such as Jigsaw puzzles [189], rotation prediction [191], inpainting [192], colorization [190], relative position [188], and combinations of them, for the networks to solve, as demonstrated in Fig. 9. In this way, unlabeled training data can also be leveraged to acquire generic knowledge under different concepts, which can be transferred to various downstream tasks. These pretext tasks share one common property: labels for the pretext task can be automatically generated based on a certain degree of image understanding. For semi-supervised medical image segmentation, Li et al. [47] proposed generating pseudo-labels by recurrently optimizing the neural network with a self-supervised task, where Jigsaw puzzles were used as the pretext task. Tajbakhsh et al. [193] used three pretext tasks, i.e., rotation, reconstruction, and colorization, to pre-train a deep network for different medical image segmentation tasks in the context of having limited quantities of labeled training data. Taleb et al. [194] extended five different pretext tasks, including contrastive predictive coding, rotation prediction, Jigsaw puzzles, relative patch location, and exemplar networks, to the 3D context, and showed competitive results on brain tumor segmentation from 3D MRI and pancreas tumor segmentation from 3D CT images. In [195], Taleb et al. introduced a multimodal puzzle task to pretrain a model from multi-modal images, which was then finetuned on a limited set of labeled data for the downstream segmentation task.
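As an illustration of how pretext labels are generated "for free", the sketch below implements the rotation-prediction pretext task in the spirit of [191]; the 4-way classification head and batch construction are illustrative assumptions, and the pretrained encoder would subsequently be fine-tuned on the segmentation task.

```python
# A minimal sketch of the rotation-prediction pretext task: unlabeled images are
# rotated by 0/90/180/270 degrees and the network predicts which rotation was applied.
import torch
import torch.nn.functional as F

def rotation_pretext_batch(images):
    # images: (B, C, H, W) unlabeled; returns rotated images and rotation labels.
    rotated, labels = [], []
    for k in range(4):                                   # 0, 90, 180, 270 degrees
        rotated.append(torch.rot90(images, k, dims=(2, 3)))
        labels.append(torch.full((images.size(0),), k, dtype=torch.long))
    return torch.cat(rotated), torch.cat(labels)

def pretext_loss(encoder_with_head, images):
    # encoder_with_head: any network mapping an image to 4 rotation logits.
    x, y = rotation_pretext_batch(images)
    return F.cross_entropy(encoder_with_head(x), y)      # 4-way classification
```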
III-I Adversarial learning
Generative adversarial learning, introduced by Goodfellow et al. [72], involves training two subnetworks: one serves as a discriminator that aims to identify whether a sample is drawn from the true data or generated by the generator, and the other serves as a generator that aims to generate samples that are not distinguishable by the discriminator. The generator and discriminator are trained in a minimax two-player game. Adversarial training has been used in many applications, including fully-supervised image segmentation [196, 197, 198], semi-supervised segmentation [199, 200, 103, 201, 187, 105, 202, 203], and domain-adaptive segmentation [41, 204, 11, 205, 206, 207]. For semi-supervised segmentation tasks, a straightforward strategy [208, 209] is to augment the standard segmentation network (the generator) with a discriminator network designed to distinguish between the predicted segmentations and the ground truths and to choose reliable pseudo labels on the unlabeled data. Zhang et al. [209] applied adversarial learning to biomedical image segmentation with a model consisting of two subnetworks: a segmentation network (generator) to conduct segmentation and an evaluation network (discriminator) to assess segmentation quality. Han et al. [198] introduced Spine-GAN, a recurrent generative adversarial network, to segment multiple spinal structures from MRIs. For semi-supervised medical image segmentation, Nie et al. [208] followed a strategy similar to that introduced in [199] and utilized an adversarial network to select the trustworthy regions of unlabeled data to train the segmentation network. Generative adversarial learning has also been used as a data-space solution to the small data problem by directly synthesizing more realistic-looking data. For the segmentation of unpaired multi-modal cardiovascular volumes with limited training data, Zhang et al. [210] utilized a cycle-consistent adversarial network to train a cross-modality synthesis model, which can synthesize realistic-looking 3D images. Cross-modality shape consistency was enforced to guarantee the shape invariance of the synthetic images.
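A minimal sketch of the discriminator-based strategy described above is shown below; the binary real/fake formulation and the hypothetical discriminator D (any network mapping a label map to a realism score) are simplifying assumptions in the spirit of [199, 208, 209], not the exact losses of those works.

```python
# A minimal sketch of adversarial training for segmentation: D judges whether a
# (soft) label map looks like a ground-truth mask; the segmentation network tries to fool it.
import torch
import torch.nn.functional as F

def discriminator_loss(D, pred_probs, gt_onehot):
    real = D(gt_onehot)                      # ground-truth masks -> "real" (1)
    fake = D(pred_probs.detach())            # predicted masks -> "fake" (0)
    return (F.binary_cross_entropy_with_logits(real, torch.ones_like(real)) +
            F.binary_cross_entropy_with_logits(fake, torch.zeros_like(fake)))

def generator_adversarial_loss(D, pred_probs):
    # Added to the usual supervised segmentation loss for the generator.
    fake = D(pred_probs)
    return F.binary_cross_entropy_with_logits(fake, torch.ones_like(fake))
```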
III-J Few-shot segmentation
Few-shot segmentation (FSS) [211] aims at learning a model on base semantic classes and then performing segmentation on a novel semantic class with only a few labeled images (i.e., k-shot) of this unseen class, without retraining the model. The image-label pairs for the new class are typically referred to as the support set. Given the support set, FSS predicts a binary mask of the novel class for each query image. It is noteworthy that, in FSS, the base classes for model training are assumed to have sufficient labeled training data, and the novel class, i.e., the testing class, is not seen by the model during training. Although few-shot learning has shown promising performance for classification and detection, its application to segmentation is immensely challenging due to the need for pixel-wise prediction. A comprehensive review of few-shot learning (FSL) [212] has been provided in [213]. The application of FSL to semantic segmentation of natural images was initially introduced in [211]. Rather than fine-tuning the pre-trained model on the small support set as in [214], Shaban et al. [211] introduced a two-branch approach, where the first branch takes the labeled image as input and predicts, in a single forward pass, a set of parameters, which are used by the second branch to generate a prediction for a query image. It is noteworthy that fine-tuning a large network on a very small support set is prone to overfitting. Roy et al. [215] considered the few-shot segmentation of organs from medical volumetric images, where only a few annotated slices are available. Following the two-branch paradigm of [211], they introduced strong interactions at multiple locations between the two branches by using Channel Squeeze & Spatial Excitation modules [216, 217], which differs from the single interaction at the final layer in [211]. Ouyang et al. [218] introduced a superpixel-based self-supervision technique for few-shot segmentation of medical images and showed a promising ability to generalize to unseen semantic classes.
III-K Summary
In previous sections, we summarize popular techniques for semi-supervised segmentation of medical images. In summary, these methods address three crucial problems: 1) how to learn a reliable model from just a few labeled data without overfitting, 2) how to make the best use of the unlabeled data, and 3) how to use domain knowledge to learn a robust model with better generalization. Note that the three problems are not independent. The first problem can be easier to address when additional unlabeled data are available, or specific domain knowledge can be exploited.
The first problem can be partially addressed by data augmentation, curriculum learning, and transfer learning. Data augmentation is a simple yet effective data-space solution that artificially augments the labeled data; recent methods also exploit unlabeled data to capture real variations of the data. Data curriculum learning works in the data space by taking advantage of human knowledge about the training data, although it is not always effective. Transfer learning relies on the availability of large external benchmark datasets for model pretraining and can be regarded as a model-space solution. However, the effectiveness of transfer learning, that is, adapting the pretrained network to the current dataset, depends on the nature of the current dataset, such as its similarity to the benchmark dataset and its size. Generally, when the similarity of the two datasets is high and the size of the current dataset is small, the performance gain is significant. When transferring from natural image benchmarks to medical datasets, where the data similarity is relatively low, the benefit of transfer learning is not always significant [58]. Thus, pretraining on relevant domains and applying the model to the current domain with supervised or semi-supervised training, known as domain adaptation [219] or domain generalization, has received growing attention. Please refer to [42] for comprehensive reviews of domain adaptation for semantic segmentation.
More methods leverage unlabeled data, including self-training, consistency regularization, adversarial learning, and self-supervised learning. The self-supervised learning strategy also follows the "pretrain, fine-tune" pipeline but conducts model pretraining on the current unlabeled data in an unsupervised or self-supervised manner, which is different from the supervised pretraining in transfer learning. In contrast, the consistency regularization strategy introduces unsupervised losses on the unlabeled data, which are jointly optimized with the supervised loss on the labeled data. Self-training directly augments the labeled data by pseudo-labeling the unlabeled data.
There are many types of domain knowledge, such as anatomical priors from shape or atlas modeling, and data or task priors from curriculum learning and transfer learning. Incorporating domain knowledge has proven effective in regularizing model training and is especially valuable in medical image segmentation. Few-shot learning aims to generalize from a few labeled examples with prior knowledge. Thus, it can help relieve the burden of data collection and annotation and help learn from rare cases, which is crucial for biomedical applications.
IV Partially-supervised Segmentation
While semi-supervised segmentation addresses the scenario in which a small subset of the training data is fully annotated, partially-supervised segmentation refers to more challenging cases wherein only partial annotations are available for all examples or for a subset of examples. Obviously, a model that requires only partial annotations will further reduce the workload of manual labeling. However, this problem is more challenging than semi-supervised learning.
Volume segmentation with sparsely annotated slices. For 3D medical image segmentation, uniformly sampled slices with annotations were used in [220, 221, 222, 223, 224] to train a 3D deep network model by assigning a zero weight to unannotated voxels in the loss function. Bai et al. [221] performed label propagation from annotated slices to unannotated slices based on non-rigid registration and introduced an exponentially weighted loss function for model training. Bitarafan et al. [223] considered a partially-supervised segmentation problem where only one 2D slice of each volume in the training data was annotated. They addressed this problem with a self-training framework that alternately generates pseudo labels and updates the 3D segmentation model. Specifically, they utilized the registration of consecutive 2D slices to propagate labels to unlabeled voxels. To segment 3D medical volumes with sparsely annotated 2D slices, Zheng et al. [224] utilized uncertainty-guided self-training to gradually boost the segmentation accuracy. Before training segmentation models with sparsely annotated slices, Zheng et al. [225] first identified the most influential and diverse slices for manual annotation with a deep network. After manual annotation of the selected slices, they conducted segmentation with a self-training strategy. Wang et al. [226] considered a 3D image training dataset with mixed types of annotations, i.e., image volumes with a few annotated consecutive slices, a few sparsely annotated slices, or full annotation. Under the self-training framework, they iteratively generated pseudo labels and updated the model with augmented labeled data. To take advantage of the incompletely annotated data, they introduced a hybrid loss, including a boundary regression loss on labeled data and a voxel classification loss on both labeled data and unlabeled data.
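The zero-weighting of unannotated voxels mentioned at the start of this paragraph can be sketched as a simple masked loss; the tensor shapes and the binary slice mask below are illustrative assumptions.

```python
# A minimal sketch of a masked loss for volumes with sparsely annotated slices:
# voxels on unannotated slices receive zero weight.
import torch
import torch.nn.functional as F

def sparse_slice_loss(logits, labels, annotated_mask):
    # logits: (B, C, D, H, W); labels: (B, D, H, W);
    # annotated_mask: (B, D, H, W), 1 on voxels of annotated slices and 0 elsewhere.
    loss = F.cross_entropy(logits, labels, reduction='none')     # (B, D, H, W)
    loss = loss * annotated_mask.float()
    return loss.sum() / annotated_mask.float().sum().clamp(min=1.0)
```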
Segmentation with partially annotated regions. As sparse annotations, partially annotated regions or scribbles provide location and label information at a few pixels or partial regions. They have been widely used in segmentation tasks [29, 227, 228, 229, 230], especially in the context of interactive segmentation [231, 232, 233, 234, 235, 236], where users give feedback to iteratively refine the segmentation. Scribbles have been recognized as a user-friendly form of interaction for both natural and medical image segmentation [231, 227, 29, 6]. While scribble-supervised segmentation was traditionally addressed by optimizing a graphical model [29] or a variational model [31], tackling this problem with deep networks has also become a hot topic.
Since medical images usually suffer from low tissue contrast, fuzzy boundaries, and image artifacts such as noise and intensity bias, an interactive strategy is also valuable for medical image segmentation. Zhang et al. [237] considered interactive medical image segmentation via point-based interaction, where the physician clicks on rough central points of the objects in each testing image. For MR image segmentation, Wang et al. [238] employed scribbles as user interaction to fine-tune a coarse segmentation in the context of deep learning. Zhou et al. [239] introduced an interactive editing network trained with simulated user interactions to refine an existing segmentation. Liao et al. [240] proposed to model the dynamic process of iteratively interactive segmentation of 3D medical images with multi-agent reinforcement learning, where each voxel is treated as an agent with shared behaviors.
To reduce the annotation effort in the context of interactive segmentation, especially for instance segmentation, researchers have explored methods to suggest annotations [241, 139, 242, 243] or select informative samples [244]. A promising solution is active learning, the process of selecting the examples or regions that should receive human labels. In this way, a model reaching the desired accuracy can be obtained faster and with lower annotation cost. The flowchart of active learning for interactive segmentation on a conceptual level is shown in Fig. 6 (b), together with a comparison to self-training, which involves no human interaction. Yang et al. [241] combined a deep network model with active learning to identify the most representative and uncertain areas for annotation. In instance-level segmentation of pulmonary nodules, Wang et al. [139] utilized active learning to overcome the annotation bottleneck by querying the most confusing unannotated instances, identified by high uncertainty, for manual annotation. For breast cancer segmentation on immunohistochemistry images, Sourati et al. [243] introduced a new active learning method with Fisher information for deep networks to identify a small number of the most informative samples to be manually annotated. To achieve a rapid increase in segmentation performance, Shen et al. [245] designed three criteria, i.e., dissatisfaction, representativeness, and diverseness, in the framework of active learning to select an informative subset for labeling, which can substantially reduce the cost of annotation. Given an initial segmentation, Wang et al. [235] used uncertainty estimation to identify the subset of slices that require user interactions.
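A simple entropy-based query criterion illustrates the selection step that these active learning methods build on; the sketch below is a generic baseline under stated assumptions (the dataloader yields index-image pairs, and mean pixel-wise entropy is used as the informativeness score), not the specific criteria of the works above.

```python
import torch

def rank_by_uncertainty(model, unlabeled_loader, device="cuda"):
    """Rank unlabeled images by mean prediction entropy (higher = more informative).

    Returns image indices sorted from most to least uncertain; the top-ranked
    images would be proposed to an expert for annotation.
    """
    model.eval()
    scores = []
    with torch.no_grad():
        for idx, image in unlabeled_loader:
            probs = torch.softmax(model(image.to(device)), dim=1)     # (B, C, H, W)
            entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=1)   # (B, H, W)
            scores += list(zip(idx.tolist(), entropy.mean(dim=(1, 2)).tolist()))
    scores.sort(key=lambda s: s[1], reverse=True)
    return [i for i, _ in scores]
```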
Rather than performing interactive segmentation, Lin et al. [227] directly used the given sparse scribbles as supervision to train a deep convolutional network for natural image segmentation. Tang et al. [228] investigated partially-supervised segmentation with scribble annotations and introduced several regularized losses, including a CRF loss [29], a high-order normalized cut loss, and a kernel cut loss, in the context of deep convolutional networks. In [229], Tang et al. further introduced the normalized cut loss. Ji et al. [246] investigated the segmentation of brain tumor substructures with whole-tumor/normal-brain scribbles and image-level labels as supervision; to capture fine tumor boundaries, they augmented the segmentation network with a dense CRF [247] loss. For 3D instance segmentation in medical images, Zhao et al. [248] considered a mix of 3D bounding boxes for all instances and voxel-wise annotations for a small fraction of the instances. They addressed this problem with a cascade of two stages: an instance detection stage trained with the bounding-box annotations and an instance segmentation stage trained with the full annotations of a small number of instances.
For semantic segmentation of emphysema with both annotated and unannotated areas in the training data, Peng et al. [249] worked in the self-training framework and utilized the similarities of deeply learned features between labeled and unlabeled areas to guide label propagation to the unannotated areas; the selected regions with confident pseudo-labels were then used to enrich the training data. For the segmentation of cancerous regions in gigapixel whole slide images (WSIs), Cheng et al. [230] considered a partially labeled scenario, where only part of the cancer regions in WSIs were annotated by pathologists due to time constraints or misinterpretation. To tackle this problem, they integrated the teacher-student learning paradigm with self-similarity learning to enforce nearby patches in a WSI to be similar in feature space. A prediction ensemble strategy was also used to generate pseudo labels, which were in turn used to filter out noisy labels.
Dong et al. [250] considered neuron segmentation from macaque brain images, where both central points and rough masks were used as supervision. Zheng et al. [251] proposed to use boundary scribbles, i.e., coarse lesion edges, as weak supervision for tumor segmentation. While boundary scribbles provide the locations of lesions and more accurate boundary information than bounding boxes, they still do not delineate the exact boundaries. For cell segmentation with scribble annotations, Lee and Jeong [252] proposed to generate reliable labels through the integration of pseudo-labeling and label filtering in the mean teacher framework [151].
Segmentation with point annotations. An extremely sparse form of annotation is point annotation [253], which labels only one point in each object, as exemplified in Fig. 10. Point annotations [253, 254, 255, 256, 257] have also been considered to reduce the cost of manual labeling, and are especially useful for multiclass and instance-level segmentation. Point annotation is one of the fastest ways to label objects; as shown in [253], it is significantly cheaper than dense pixel-level annotation. Despite their cost-efficiency, point annotations are extremely sparse and contain only location information. Thus, most studies have utilized point annotations for object detection and counting tasks [258, 259, 260], such as cell detection [261, 260] and nuclei detection [262]. Yoo et al. [263] investigated nuclei segmentation with point annotations. Since point annotations do not contain nucleus boundary information, they augmented the segmentation network with an auxiliary edge-detection network supervised by the Sobel-filtered prediction map of the segmentation network. To segment mitoses from breast histopathology images with centroid labels, Li et al. [264] expanded the single-pixel label into a label with concentric circles, where the inner circle was regarded as a mitotic region and the regions outside the outer ring were regarded as non-mitosis. They introduced a concentric loss so that the segmentation network is trained only with the estimated labels inside the inner circle and outside the outer ring. For nuclei segmentation, Qu et al. [265, 256] addressed a more challenging case with only sparse point annotations, i.e., only a small portion of the nuclei in each image were annotated with center points. Their method consists of two stages: the first stage conducts nuclei detection with a self-training strategy, and the second stage performs semi-supervised segmentation with pseudo-labels generated from the Voronoi diagram and k-means clustering. The dense CRF loss was utilized during training to refine the segmentation.
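A common first step in such pipelines is turning point annotations into coarse pseudo-labels. The sketch below computes a Voronoi partition by assigning each pixel to its nearest annotated center; it is a simplified stand-in for the Voronoi-plus-clustering label generation described above, and the example coordinates are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

def voronoi_pseudo_labels(points, shape):
    """Assign every pixel to the nearest annotated point (2D Voronoi partition).

    points: (N, 2) array of (row, col) nucleus centers
    shape:  (H, W) image size
    Returns an (H, W) map of point indices; the edges between Voronoi cells are
    often used as background/boundary cues when training with point labels.
    """
    rows, cols = np.indices(shape)
    pixels = np.stack([rows.ravel(), cols.ravel()], axis=1)
    _, nearest = cKDTree(points).query(pixels)  # index of nearest point per pixel
    return nearest.reshape(shape)

# Example usage with three hypothetical nucleus centers in a 64x64 patch.
labels = voronoi_pseudo_labels(np.array([[10, 12], [30, 40], [50, 20]]), (64, 64))
```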
Multiclass segmentation from multiple few-class-labeled datasets. Segmentation of various anatomical structures from medical images, such as multi-organ segmentation, is a fundamental problem in medical image analysis and downstream clinical usage. However, beyond the cost of data collection, obtaining sufficient multiclass annotations on a large dataset is in itself a labor-intensive and often impossible task. In contrast, there are various datasets annotated by different medical centers for their own clinical and research purposes, but with annotations missing for several classes or even with only single-class annotations available. Most publicly available medical image datasets are designed and annotated for a specific clinical or research purpose, such as Sliver07 [23] for the evaluation of liver segmentation [7, 8, 6, 266], LiTS [267] for liver-tumor segmentation, KiTS [268] for kidney-tumor segmentation [269], LUNA [270] for lung nodule analysis [271, 272], and BRATS [273] for brain tumor analysis [3, 2]. Thus, a significant challenge is how to learn a universal multi-class segmentation model from multiple partially annotated datasets with missing annotated classes.
Zhou et al. [274] first considered a partially-supervised multi-organ segmentation problem where a small fully labeled dataset and several partially labeled datasets are available. They developed a prior-aware neural network that explicitly incorporated anatomical priors on abdominal organ sizes as domain-specific knowledge to guide the training process. Dmitriev et al. [275] further removed the need for fully labeled data and investigated the problem of multi-class (e.g., multi-organ) segmentation from single-class (e.g., single-organ) labeled datasets [274]. They proposed to condition a single convolutional network for multi-class segmentation on non-overlapping single-class datasets for training; concretely, they inserted the conditional information as an intermediate activation between the convolutional operation and the activation function. Huang et al. [276] tackled partially-supervised multi-organ segmentation in the co-training framework, where multiple networks were trained collaboratively and each network was taught by the others on unannotated organs. Yan et al. [277] proposed a universal lesion detection algorithm to detect a comprehensive variety of lesions from multiple datasets with partial labels; specifically, they introduced intra-patient lesion matching and cross-dataset lesion mining to address missing annotations, and utilized feature sharing, proposal fusion, and annotation mining to integrate the different datasets. Shi et al. [278] addressed partially-supervised multi-organ segmentation with two novel losses: a marginal loss that merges all unlabeled organ pixels into the background label, and an exclusion loss that constrains the organ classes to be mutually exclusive. To segment multiple organs and tumors from multiple partially labeled datasets, Zhang et al. [279] proposed an encoder-decoder network with a single but dynamic head, in which the kernels are generated adaptively by a controller conditioned on both the input image and the assigned task. Dong et al. [250] considered a special case where full labels for all classes are not available on the whole training set, but labels of different classes are available on different subsets; they addressed this problem with a data augmentation strategy exploiting the assumption that patients share anatomical similarities.
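The class-merging idea behind the marginal loss can be sketched as follows; this is a minimal illustration of merging unlabeled organ classes into the background, written under our own assumptions (background has index 0 and the annotated classes are passed as a list), not a reproduction of the loss in [278].

```python
import torch
import torch.nn.functional as F

def marginal_loss(logits, target, labeled_classes):
    """Cross-entropy where classes unlabeled in the current dataset merge into background.

    logits: (B, C, H, W); target: (B, H, W) with values in {0} plus labeled_classes,
    where 0 denotes background; labeled_classes lists the organ indices actually
    annotated in this dataset.
    """
    probs = torch.softmax(logits, dim=1)
    num_classes = logits.shape[1]
    unlabeled = [c for c in range(1, num_classes) if c not in labeled_classes]

    # Marginal probability of "background or any unlabeled organ".
    merged_bg = probs[:, [0] + unlabeled].sum(dim=1, keepdim=True)
    merged = torch.cat([merged_bg, probs[:, labeled_classes]], dim=1)

    # Remap targets into the merged label space: background -> 0, k-th labeled class -> k.
    remapped = torch.zeros_like(target)
    for new_idx, cls in enumerate(labeled_classes, start=1):
        remapped[target == cls] = new_idx

    return F.nll_loss(torch.log(merged + 1e-8), remapped)
```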
Summary. Reducing annotation cost echoes real-world environments, where annotations are often incomplete or even sparse. This section covers four types of partial annotations: partially annotated slices for 3D images, partially annotated regions, point annotations, and multiple few-class-labeled datasets. As shown in the previous subsections, most methods addressing these scenarios are based on self-training and regularization techniques. When collecting annotated data with a human in the loop, suggestive annotation can significantly reduce the annotation effort, especially when large-scale data or a large number of instances need to be annotated. A critical question is thus which data samples or image regions should be selected for annotation to reach high-quality performance faster. This active learning paradigm, as exemplified in Fig. 6 (b), has been an active research field where more efforts are still needed.
V Inaccurately-supervised Segmentation
Segmentation with inaccurate or imprecise annotations refers to the scenario where the ground truth labels are corrupted with (random, class-conditional, or instance-conditional [280, 281]) noise, and is therefore also referred to as noisy-label learning [282, 283]. Imprecise boundaries and mislabeled regions are also forms of inaccurate annotation. Moreover, bounding-box annotations can be treated as annotations with inaccurate boundaries and mislabeled regions. Note that, as shown in [284], boundary-localized errors are more challenging than random label errors.
Learning from noisy labels has recently drawn much attention in many applications, including medical image analysis [25, 285, 286]. It is expensive and sometimes infeasible to obtain accurate labels, especially for medical imaging data, where labeling requires domain expertise and annotating huge imaging volumes is inherently a daunting task. In contrast, noisy labels, such as those generated by non-experts [287] or computers [236], are easy to obtain. Moreover, it is impractical to manually correct label errors, which is not only time-consuming but also requires a stronger committee of experts. Karimi et al. [25] provided a review of the state-of-the-art deep learning methods (published in 2019 or earlier) for handling label noise. However, most approaches for dealing with noisy (low-quality) annotations have been developed for classification [288, 289, 290, 291, 292] and detection [293]. Herein, we focus on medical image segmentation with noisy labels.
While one class of methods strives to model and learn the label noise [294, 295], other methods select confident examples to reduce its side effects without explicitly modeling the noise [296, 289, 297], such as the co-teaching paradigm [296] and the reweighting strategy [294]. To reduce the influence of inaccurate labels on segmentation, Zhu et al. [297] developed a label quality evaluation strategy with a deep neural network to automatically assess label quality, and trained the segmentation model only on examples with clean annotations. For chest X-ray segmentation with imperfect labels, Xue et al. [298] adopted a cascade of two stages: a sample selection stage, which selects cleanly annotated examples as in the co-teaching paradigm, and a label correction and model learning stage, which learns the segmentation model from both the corrected labels and the original labels. To segment skin lesions from noisy annotations, Mirikharaji et al. [299] adopted a spatially adaptive reweighting approach to emphasize learning from clean labels and reduce the side effect of noisy pixel-level annotations; a meta-learning approach was adopted to assign higher importance to pixels with reliable labels. Shu et al. [300] proposed to enhance the supervision from noisy labels by capturing local visual saliency features, which are less affected by inaccurate supervised signals. For noisy-labeled medical image segmentation, Zhang et al. [301] integrated confident learning [302], which identifies label errors by estimating the joint distribution between the noisy annotations and the true (latent) annotations, into the teacher-student framework to identify corrupted labels at the pixel level; soft label correction based on spatial label smoothing regularization was also adopted to generate high-quality labels. Rather than using fully manual annotations for vessel segmentation, Zhang et al. [236] proposed to learn the segmentation from noisy pseudo labels obtained from automatic vessel enhancement, which usually carries systematic bias. To tackle this problem, they adopted improved self-paced learning with online guidance from additional sparse manual annotations; the self-paced learning strategy lets model training focus on easy pixels, which have a higher chance of being correctly labeled. To minimize the manual annotations, they introduced a model-vesselness uncertainty estimation for suggestive annotation. To weaken the influence of noisy pseudo labels in semi-supervised segmentation, Min et al. [303] introduced a two-stream mutual attention network with hierarchical distillation, where multiple attention layers are used to discover incorrect labels and indicate potentially incorrect gradients.
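The confident-example selection idea shared by several of these works can be illustrated with a co-teaching-style small-loss criterion; the sketch below is a generic illustration under our own assumptions (two peer networks, per-image losses, a fixed keep ratio), not the exact procedure of any cited method.

```python
import torch
import torch.nn.functional as F

def small_loss_selection(logits_a, logits_b, labels, keep_ratio=0.7):
    """Co-teaching-style selection: each network keeps the samples its peer finds easiest.

    logits_a, logits_b: predictions of two networks on the same batch, (B, C, H, W)
    labels: possibly noisy masks, (B, H, W)
    keep_ratio: fraction of samples treated as clean.
    Returns the losses each network should back-propagate.
    """
    per_sample_a = F.cross_entropy(logits_a, labels, reduction="none").mean(dim=(1, 2))
    per_sample_b = F.cross_entropy(logits_b, labels, reduction="none").mean(dim=(1, 2))
    k = max(1, int(keep_ratio * labels.shape[0]))

    # Network A learns from the samples with the smallest loss under network B, and vice versa.
    idx_for_a = torch.topk(per_sample_b, k, largest=False).indices
    idx_for_b = torch.topk(per_sample_a, k, largest=False).indices
    return per_sample_a[idx_for_a].mean(), per_sample_b[idx_for_b].mean()
```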
Segmentation with bounding-box annotations. An appealing form of weak supervision is bounding-box annotations [304, 305, 306], which are easy to obtain and yield reliable information about the background as well as rich information about the foreground. Moreover, a bounding box, as shown in Fig. 10, can be represented simply by two corners, which makes it lightweight to store. Given the uncertainty of figure-ground separation within each bounding box [307], one of the core tasks in bounding-box supervised segmentation is to generate accurate pseudo-labels. A popular pseudo-label generation approach is GrabCut [231], which iteratively estimates the foreground and background distributions and conducts segmentation with CRF models such as graph cut [29]. The iterative strategy that alternately updates segmentation model parameters and pseudo labels has been widely used to handle bounding-box annotations [304, 308]. In the context of natural image segmentation with deep networks, the BoxSup model [304] iterated between automatically generating region proposals and training convolutional networks. For fetal brain segmentation from MR images with bounding-box annotations, Rajchl et al. [308] introduced the DeepCut model, an extension of the GrabCut method that estimates the distributions by training a deep network classifier; specifically, they iteratively optimized a densely-connected CRF model and a deep convolutional network. Kervadec et al. [309] leveraged the classical bounding-box tightness prior [310] to regularize the output of a deep segmentation network. Concretely, the tightness prior was reformulated as a set of foreground constraints and a global background emptiness constraint, which enforces the regions outside the bounding box to contain no foreground; the resulting energy function with inequality constraints was optimized through a sequence of unconstrained losses based on an extension of the log-barrier method. Wang et al. [311] investigated the segmentation of male pelvic organs in CT from 3D bounding-box annotations, which was addressed by iteratively learning the deep network model and the pseudo labels; a label denoising module, which evaluates the consistency of predictions across consecutive iterations, was designed to identify voxels with unreliable labels. Zheng et al. [251] proposed to use boundary scribbles, i.e., coarse lesion edges, as weak supervision for tumor segmentation; while boundary scribbles indicate the locations of lesions and provide more accurate boundary information than bounding boxes, they still do not delineate the exact boundaries.
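As an illustration of the GrabCut-based pseudo-labeling step, a minimal sketch using OpenCV's implementation is given below; the function name and the fixed iteration count are our own choices for the example.

```python
import numpy as np
import cv2

def box_to_pseudo_mask(image, box, iters=5):
    """Generate a foreground pseudo-mask from a bounding box with GrabCut [231].

    image: (H, W, 3) uint8 image; box: (x, y, w, h) bounding box.
    Pixels outside the box are treated as definite background; pixels inside are
    refined by iterating the foreground/background distribution estimation.
    """
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    bgd_model = np.zeros((1, 65), dtype=np.float64)
    fgd_model = np.zeros((1, 65), dtype=np.float64)
    cv2.grabCut(image, mask, box, bgd_model, fgd_model, iters, cv2.GC_INIT_WITH_RECT)
    # GC_FGD / GC_PR_FGD mark (probable) foreground; everything else is background.
    return np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
```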
Summary. Lowering the requirement for precise annotations can also significantly reduce annotation effort. In this section, we have reviewed two types of inaccurate annotations, i.e., noisy labels and bounding-box labels. While the partial annotations reviewed in Section IV are reliable annotations for the positive classes (except for the background class), the inaccurate annotations in this section refer to unreliable labels. For example, noisy labels can be regarded as labels corrupted from the ground truth, and bounding-box annotations, as shown in Fig. 10 (d), contain both foreground and background pixels. It is known that deep network models are susceptible to label corruption [312, 25]. Thus, addressing label noise has gained increasing attention in recent years and has been a popular topic at top conference venues.
VI Discussion and future directions
In this section, we discuss some ongoing or future directions of medical image segmentation with limited supervision.
Task-agnostic versus task-specific use of unlabeled data. Semi-supervised segmentation methods partially differ in how they leverage unlabeled data. There are two typical ways to make use of unlabeled data: 1) the task-agnostic approach, which leverages unlabeled data through unsupervised or self-supervised pretraining, such as the self-supervised learning strategy in Sec. III-H; and 2) the task-specific approach, which jointly leverages the labeled and unlabeled data by enforcing some form of regularization, such as the consistency regularization strategy in Sec. III-G and self-training in Sec. III-E. While task-agnostic approaches use the unlabeled data for unsupervised representation learning followed by supervised fine-tuning, task-specific approaches use the unlabeled data either to directly augment the labeled data through pseudo-labeling or to regularize the supervised model learning through consistency regularization. Both paradigms have shown promising results and received substantial attention in medical imaging and computer vision. Recently, an encouraging advance in self-supervised learning is contrastive learning [313, 314], which formulates representation learning as the task of discriminating similar from dissimilar examples. The Momentum Contrast model [314] with contrastive unsupervised pretraining outperformed its supervised pretrained counterpart on several natural image segmentation tasks. The contrastive learning strategy has also been used in medical image segmentation with limited annotations and has shown promising results [315]. However, the gap between the objectives of self-supervised pretraining and the downstream segmentation task is non-negligible, and more work in this direction is expected to push the boundaries of medical image segmentation. Another promising direction is integrating the task-agnostic and task-specific approaches in an elegant way; a possible solution is introduced in [316], which first fine-tunes the unsupervised pretrained model and then distills it into a smaller one with the unlabeled data.
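For concreteness, a minimal InfoNCE-style contrastive objective of the kind used for such pretraining is sketched below; the batch-level positive/negative pairing and the temperature value are generic choices for the illustration, not the exact formulation of [313, 314, 315].

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    """A minimal InfoNCE-style contrastive loss on two embedding views.

    z1, z2: (B, D) embeddings of two augmented views of the same image batch
    (e.g., from a projection head on top of a segmentation encoder). For each
    anchor in z1 the positive is the matching row of z2; all other rows act as negatives.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature              # (B, B) scaled cosine similarities
    targets = torch.arange(z1.shape[0], device=z1.device)
    return F.cross_entropy(logits, targets)
```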
More constructive theoretical analysis is needed. Although diverse strategies, such as self-supervised learning and curriculum learning, have been introduced and have achieved promising results, more studies are needed to identify their mechanisms. For example, it is still unclear how to automatically design an adaptive curriculum for a given segmentation task instead of using a predefined curriculum [131], and when curriculum-like strategies, especially data curricula, will benefit deep model training. Possible directions for automatic curriculum design are the self-paced paradigm [141] and the teacher-student paradigm [317]. Whereas curriculum learning has achieved success on classification and detection tasks [135, 136, 137, 78], it has had relatively limited application to semi-supervised medical image segmentation [138, 139]. In addition to more extensive experimental results on diverse segmentation tasks, there is also a need for theoretical guarantees on their effectiveness [133, 318], which is the foundation for their application to specific tasks. A solid foundation is also needed for segmentation methods based on self-supervised learning, especially those using consistency or contrastive losses [319].
Lightweight and efficient segmentation models are favorable. Deep and wide models are slow to train and, more importantly, may easily overfit on datasets with limited annotation. Lightweight models with few parameters and modest computational requirements are favorable for model training and for deployment on computationally limited platforms, which may significantly improve the efficiency of clinical applications [320, 2, 321]. Two strategies are usually employed: model compression [322] and efficient model architecture design [323, 320, 2]. Moreover, strategies that maintain and combine multiple networks, such as model cascades, model ensembles, and the self-training scheme, are commonly used; they inevitably increase system complexity and degrade training efficiency [2]. Thus, keeping the model system simple is a challenging future direction.
Hyper-parameter searching is challenging. Segmentation methods with limited supervision usually involve more hyper-parameters, such as trade-off weights between losses. However, there are not enough labeled data for reliable hyper-parameter searching, resulting in high variance in performance. A possible solution is meta-learning [324], the goal of which is ‘learning to learn better’. In other words, meta-learning seeks to improve the learning algorithm itself with either task-agnostic or task-specific prior knowledge and thus can improve both data and computational efficiency. Accordingly, there is rapidly growing interest in meta-learning and its various applications, including medical image segmentation with limited supervision [325]. Utilizing a meta-loss on a small set of labeled data has shown promising results in few-shot learning [324, 326, 327].
Complex label noises are challenging. Label noise in real-world applications is usually a mix of several types, such as class-dependent noise, instance-dependent noise, and adversarial noise, which tends to confuse models on ambiguous regions or instances. Thus, training models with the ability to handle complex noise is valuable for real-world clinical applications. Karimi et al. [25] reviewed deep learning methods for dealing with label noise in medical image analysis, where most of the representative studies concern medical image classification. However, the challenge of dealing with label noise is particularly significant in segmentation tasks, since pixel-wise labeling of large datasets is resource-intensive and requires experts’ domain knowledge. The limited imaging resolution also makes it difficult for annotators to identify small objects and fuzzy boundaries, and more label noise exists in the large number of annotations produced by non-experts or by automatic labeling software with little human refinement. To analyze and address various kinds of label errors, an important step is to construct large-scale datasets with real noise, which is in itself a challenging task. Currently, most studies still use public datasets with simulated label perturbations [284, 298, 301] or private datasets [300, 25]. Building up public benchmarks with real noise is crucial for further breakthroughs, especially for clinical usage.
Learning to represent and integrate domain knowledge is still challenging. Although domain knowledge has dramatically boosted medical image segmentation methods, especially in settings with limited supervision, the selection and representation of prior knowledge remain challenging since they are usually highly dependent on the specific task. Xie et al. [21] summarized recent progress on integrating domain knowledge into deep learning models for medical image analysis. Moreover, translating the original representation of prior knowledge in clinical settings into representations that are ready for integration with deep networks is challenging.
VII Conclusions
In this review, we covered effective solutions for the segmentation of biomedical images with limited supervision, namely, semi-supervised segmentation, partially-supervised segmentation, and inaccurately-supervised segmentation, and reviewed a diverse set of methods for these problems. For semi-supervised segmentation, we provided a taxonomy of existing methods according to their ability to leverage labeled data, unlabeled data, and prior knowledge. For partially-supervised segmentation, we considered segmentation with partially annotated regions, point annotations, or partially annotated slices, interactive segmentation, and multi-class segmentation from multiple partial-class-labeled datasets, and presented the current technical status of recent solutions. For inaccurately-supervised segmentation, we summarized the methods addressing noisy labels and bounding-box annotations. We also discussed possible future directions for further studies.
References
- [1] L. Dora, S. Agrawal, R. Panda, and A. Abraham, “State-of-the-art methods for brain tissue segmentation: A review,” IEEE Reviews in Biomedical Engineering, vol. 10, pp. 235–249, 2017.
- [2] Z. Luo, Z. Jia, Z. Yuan, and J. Peng, “Hdc-net: Hierarchical decoupled convolution network for brain tumor segmentation,” IEEE Journal of Biomedical and Health Informatics, 2020.
- [3] G. Wang, W. Li, S. Ourselin, and T. Vercauteren, “Automatic brain tumor segmentation using cascaded anisotropic convolutional neural networks,” in International MICCAI Brainlesion Workshop. Springer, 2017, pp. 178–190.
- [4] C. Chen, C. Qin, H. Qiu, G. Tarroni, J. Duan, W. Bai, and D. Rueckert, “Deep learning for cardiac image segmentation: A review,” Frontiers in Cardiovascular Medicine, vol. 7, p. 25, 2020.
- [5] X. Li, H. Chen, X. Qi, Q. Dou, C.-W. Fu, and P.-A. Heng, “H-denseunet: hybrid densely connected unet for liver and tumor segmentation from ct volumes,” IEEE Transactions on Medical Imaging, vol. 37, no. 12, pp. 2663–2674, 2018.
- [6] J. Peng, Y. Wang, and D. Kong, “Liver segmentation with constrained convex variational model,” Pattern Recognition Letters, vol. 43, pp. 81–88, 2014.
- [7] J. Peng, P. Hu, F. Lu, Z. Peng, D. Kong, and H. Zhang, “3d liver segmentation using multiple region appearances and graph cuts,” Medical Physics, vol. 42, no. 12, pp. 6840–6852, 2015.
- [8] P. Hu, F. Wu, J. Peng, P. Liang, and D. Kong, “Automatic 3d liver segmentation based on deep learning and globally optimized surface evolution,” Physics in Medicine & Biology, vol. 61, no. 24, p. 8676, 2016.
- [9] S. Deng, X. Zhang, W. Yan, E. I.-C. Chang, Y. Fan, M. Lai, and Y. Xu, “Deep learning in digital pathology image analysis: a survey,” Frontiers of Medicine, pp. 1–18, 2020.
- [10] J. Peng and Z. Yuan, “Mitochondria segmentation from em images via hierarchical structured contextual forest,” IEEE Journal of Biomedical and Health Informatics, vol. 10, no. 8, pp. 2251–2259, 2020.
- [11] J. Peng, J. Yi, and Z. Yuan, “Unsupervised mitochondria segmentation in em images via domain adaptive multi-task learning,” IEEE Journal of Selected Topics in Signal Processing, vol. 14, no. 6, pp. 1199–1209, 2020.
- [12] J. J. Cerrolaza, M. L. Picazo, L. Humbert, Y. Sato, D. Rueckert, M. Á. G. Ballester, and M. G. Linguraru, “Computational anatomy for multi-organ analysis in medical imaging: A review,” Medical Image Analysis, vol. 56, pp. 44–67, 2019.
- [13] F. Shi, J. Wang, J. Shi, Z. Wu, Q. Wang, Z. Tang, K. He, Y. Shi, and D. Shen, “Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for covid-19,” IEEE Reviews in Biomedical Engineering, 2020.
- [14] S. Moccia, E. De Momi, S. El Hadji, and L. S. Mattos, “Blood vessel segmentation algorithms-review of methods, datasets and evaluation metrics,” Computer Methods and Programs in Biomedicine, vol. 158, pp. 71–91, 2018.
- [15] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, and C. I. Sánchez, “A survey on deep learning in medical image analysis,” Medical image analysis, vol. 42, pp. 60–88, 2017.
- [16] S. A. Taghanaki, K. Abhishek, J. P. Cohen, J. Cohen-Adad, and G. Hamarneh, “Deep semantic segmentation of natural and medical images: A review,” Artificial Intelligence Review, pp. 1–42, 2020.
- [17] S. K. Zhou, H. Greenspan, C. Davatzikos, J. S. Duncan, B. van Ginneken, A. Madabhushi, J. L. Prince, D. Rueckert, and R. M. Summers, “A review of deep learning in medical imaging: Image traits, technology trends, case studies with progress highlights, and future promises,” arXiv preprint arXiv:2008.09104, 2020.
- [18] X. Yi, E. Walia, and P. Babyn, “Generative adversarial network in medical imaging: A review,” Medical Image Analysis, vol. 58, p. 101552, 2019.
- [19] F. Altaf, S. M. Islam, N. Akhtar, and N. K. Janjua, “Going deep in medical image analysis: Concepts, methods, challenges, and future directions,” IEEE Access, vol. 7, pp. 99 540–99 572, 2019.
- [20] C. L. Srinidhi, O. Ciga, and A. L. Martel, “Deep neural network models for computational histopathology: A survey,” Medical Image Analysis, p. 101813, 2020.
- [21] X. Xie, J. Niu, X. Liu, Z. Chen, and S. Tang, “A survey on domain knowledge powered deep learning for medical image analysis,” arXiv preprint arXiv:2004.12150, 2020.
- [22] A. Wadhwa, A. Bhardwaj, and V. S. Verma, “A review on brain tumor segmentation of mri images,” Magnetic Resonance Imaging, vol. 61, pp. 247–259, 2019.
- [23] T. Heimann, B. Van Ginneken, M. A. Styner, Y. Arzhaeva, V. Aurich, C. Bauer, A. Beck, C. Becker, R. Beichel, G. Bekes et al., “Comparison and evaluation of methods for liver segmentation from ct datasets,” IEEE Transactions on Medical Imaging, vol. 28, no. 8, pp. 1251–1265, 2009.
- [24] S. Minaee, Y. Boykov, F. Porikli, A. Plaza, N. Kehtarnavaz, and D. Terzopoulos, “Image segmentation using deep learning: A survey,” arXiv preprint arXiv:2001.05566, 2020.
- [25] D. Karimi, H. Dou, S. K. Warfield, and A. Gholipour, “Deep learning with noisy labels: Exploring techniques and remedies in medical image analysis,” Medical Image Analysis, vol. 65, p. 101759, 2020.
- [26] P. Zhang, Y. Zhong, Y. Deng, X. Tang, and X. Li, “A survey on deep learning of small sample in biomedical image analysis,” arXiv preprint arXiv:1908.00473, 2019.
- [27] N. Tajbakhsh, L. Jeyaseelan, Q. Li, J. N. Chiang, Z. Wu, and X. Ding, “Embracing imperfect datasets: A review of deep learning solutions for medical image segmentation,” Medical Image Analysis, p. 101693, 2020.
- [28] J. Shotton, M. Johnson, and R. Cipolla, “Semantic texton forests for image categorization and segmentation,” in IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2008, pp. 1–8.
- [29] Y. Y. Boykov and M.-P. Jolly, “Interactive graph cuts for optimal boundary & region segmentation of objects in nd images,” in Proceedings of IEEE International Conference on Computer Vision., vol. 1. IEEE, 2001, pp. 105–112.
- [30] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, “Active shape models-their training and application,” Computer Vision and Image Understanding, vol. 61, no. 1, pp. 38–59, 1995.
- [31] T. F. Chan, S. Esedoglu, and M. Nikolova, “Algorithms for finding global minimizers of image segmentation and denoising models,” SIAM Journal on Applied Mathematics, vol. 66, no. 5, pp. 1632–1648, 2006.
- [32] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
- [33] H. Noh, S. Hong, and B. Han, “Learning deconvolution network for semantic segmentation,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1520–1528.
- [34] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
- [35] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid scene parsing network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881–2890.
- [36] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834–848, 2017.
- [37] O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N. Y. Hammerla, B. Kainz et al., “Attention u-net: Learning where to look for the pancreas,” arXiv preprint arXiv:1804.03999, 2018.
- [38] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.
- [39] S. Hao, Y. Zhou, and Y. Guo, “A brief survey on semantic segmentation with deep learning,” Neurocomputing, vol. 406, pp. 302–321, 2020.
- [40] S. A. Taghanaki, K. Abhishek, J. P. Cohen, J. Cohen-Adad, and G. Hamarneh, “Deep semantic segmentation of natural and medical images: A review,” Artificial Intelligence Review, vol. 54, no. 1, pp. 137–178, 2021.
- [41] K. Kamnitsas, C. Baumgartner, C. Ledig, V. Newcombe, J. Simpson, A. Kane, D. Menon, A. Nori, A. Criminisi, D. Rueckert et al., “Unsupervised domain adaptation in brain lesion segmentation with adversarial networks,” in International Conference on Information Processing in Medical Imaging. Springer, 2017, pp. 597–609.
- [42] M. Toldo, A. Maracani, U. Michieli, and P. Zanuttigh, “Unsupervised domain adaptation in semantic segmentation: a review,” arXiv preprint arXiv:2005.10876, 2020.
- [43] W. Wang, V. W. Zheng, H. Yu, and C. Miao, “A survey of zero-shot learning: Settings, methods, and applications,” ACM Transactions on Intelligent Systems and Technology, vol. 10, no. 2, pp. 1–37, 2019.
- [44] W. Bai, O. Oktay, M. Sinclair, H. Suzuki, M. Rajchl, G. Tarroni, B. Glocker, A. King, P. M. Matthews, and D. Rueckert, “Semi-supervised learning for network-based cardiac mr image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2017, pp. 253–260.
- [45] S. Sedai, B. Antony, R. Rai, K. Jones, H. Ishikawa, J. Schuman, W. Gadi, and R. Garnavi, “Uncertainty guided semi-supervised segmentation of retinal layers in oct images,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 282–290.
- [46] L. Yu, S. Wang, X. Li, C.-W. Fu, and P.-A. Heng, “Uncertainty-aware self-ensembling model for semi-supervised 3d left atrium segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 605–613.
- [47] Y. Li, J. Chen, X. Xie, K. Ma, and Y. Zheng, “Self-loop uncertainty: A novel pseudo-label for semi-supervised medical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2020, pp. 614–623.
- [48] G. Bortsova, F. Dubost, L. Hogeweg, I. Katramados, and M. de Bruijne, “Semi-supervised medical image segmentation via learning consistency under transformations,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 810–818.
- [49] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
- [50] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
- [51] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition. Ieee, 2009, pp. 248–255.
- [52] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, “The pascal visual object classes (voc) challenge,” International Journal of Computer Vision, vol. 88, no. 2, pp. 303–338, 2010.
- [53] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, “How transferable are features in deep neural networks?” in Advances in Neural Information Processing Systems, 2014, pp. 3320–3328.
- [54] J. Carreira and A. Zisserman, “Quo vadis, action recognition? a new model and the kinetics dataset,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6299–6308.
- [55] H. Shan, Y. Zhang, Q. Yang, U. Kruger, M. K. Kalra, L. Sun, W. Cong, and G. Wang, “3-d convolutional encoder-decoder network for low-dose ct via transfer learning from a 2-d trained network,” IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1522–1534, 2018.
- [56] S. Liu, D. Xu, S. K. Zhou, O. Pauly, S. Grbic, T. Mertelmeier, J. Wicklein, A. Jerebko, W. Cai, and D. Comaniciu, “3d anisotropic hybrid network: Transferring convolutional features from 2d images to 3d anisotropic volumes,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2018, pp. 851–858.
- [57] N. Tajbakhsh, J. Y. Shin, S. R. Gurudu, R. T. Hurst, C. B. Kendall, M. B. Gotway, and J. Liang, “Convolutional neural networks for medical image analysis: Full training or fine tuning?” IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1299–1312, 2016.
- [58] M. Raghu, C. Zhang, J. Kleinberg, and S. Bengio, “Transfusion: Understanding transfer learning for medical imaging,” arXiv preprint arXiv:1902.07208, 2019.
- [59] B. Zoph, G. Ghiasi, T.-Y. Lin, Y. Cui, H. Liu, E. D. Cubuk, and Q. Le, “Rethinking pre-training and self-training,” Advances in Neural Information Processing Systems, vol. 33, 2020.
- [60] Z. Zhou, V. Sodha, M. M. R. Siddiquee, R. Feng, N. Tajbakhsh, M. B. Gotway, and J. Liang, “Models genesis: Generic autodidactic models for 3d medical image analysis,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 384–393.
- [61] X. Zhang, Y. Zhang, X. Zhang, and Y. Wang, “Universal model for 3d medical image analysis,” arXiv preprint arXiv:2010.06107, 2020.
- [62] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
- [63] Z. Zhong, L. Zheng, G. Kang, S. Li, and Y. Yang, “Random erasing data augmentation,” arXiv preprint arXiv:1708.04896, 2017.
- [64] T. DeVries and G. W. Taylor, “Improved regularization of convolutional neural networks with cutout,” arXiv preprint arXiv:1708.04552, 2017.
- [65] R. Volpi, H. Namkoong, O. Sener, J. C. Duchi, V. Murino, and S. Savarese, “Generalizing to unseen domains via adversarial data augmentation,” in Advances in Neural Information Processing Systems, 2018, pp. 5334–5344.
- [66] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” in International Conference on Learning Representations, 2018.
- [67] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz, “mixup: Beyond empirical risk minimization,” arXiv preprint arXiv:1710.09412, 2017.
- [68] S. Yun, D. Han, S. J. Oh, S. Chun, J. Choe, and Y. Yoo, “Cutmix: Regularization strategy to train strong classifiers with localizable features,” in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 6023–6032.
- [69] H. Guo, Y. Mao, and R. Zhang, “Mixup as locally linear out-of-manifold regularization,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 3714–3722.
- [70] E. Panfilov, A. Tiulpin, S. Klein, M. T. Nieminen, and S. Saarakkala, “Improving robustness of deep learning based knee mri segmentation: Mixup and adversarial domain adaptation,” in Proceedings of the IEEE International Conference on Computer Vision Workshops, 2019, pp. 0–0.
- [71] T. DeVries and G. W. Taylor, “Dataset augmentation in feature space,” arXiv preprint arXiv:1702.05538, 2017.
- [72] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
- [73] N. H. N. Tegang, J.-R. Fouefack, B. Borotikar, V. Burdin, T. S. Douglas, and T. E. Mutsvangwa, “A gaussian process model based generative framework for data augmentation of multi-modal 3d image volumes,” in International Workshop on Simulation and Synthesis in Medical Imaging. Springer, 2020, pp. 90–100.
- [74] J. M. J. Valanarasu, R. Yasarla, P. Wang, I. Hacihaliloglu, and V. M. Patel, “Learning to segment brain anatomy from 2d ultrasound with less data,” IEEE Journal of Selected Topics in Signal Processing, vol. 14, no. 6, pp. 1221–1234, 2020.
- [75] Z. Yuan, J. Yi, Z. Luo, Z. Jia, and J. Peng, “Em-net: Centerline-aware mitochondria segmentation in em images via hierarchical view-ensemble convolutional network,” in 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI). IEEE, 2020, pp. 1219–1222.
- [76] A. Zhao, G. Balakrishnan, F. Durand, J. V. Guttag, and A. V. Dalca, “Data augmentation using learned transformations for one-shot medical image segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 8543–8553.
- [77] C. Chen, C. Qin, H. Qiu, C. Ouyang, S. Wang, L. Chen, G. Tarroni, W. Bai, and D. Rueckert, “Realistic adversarial data augmentation for mr image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2020, pp. 667–677.
- [78] I. Oksuz, B. Ruijsink, E. Puyol-Antón, J. R. Clough, G. Cruz, A. Bustin, C. Prieto, R. Botnar, D. Rueckert, J. A. Schnabel et al., “Automatic cnn-based detection of cardiac mr motion artefacts using k-space data augmentation and curriculum learning,” Medical Image Analysis, vol. 55, pp. 136–147, 2019.
- [79] D. E. Schlesinger and C. M. Stultz, “Deep learning for cardiovascular risk stratification,” Current Treatment Options in Cardiovascular Medicine, vol. 22, no. 8, pp. 1–14, 2020.
- [80] V. Verma, A. Lamb, C. Beckham, A. Najafi, I. Mitliagkas, D. Lopez-Paz, and Y. Bengio, “Manifold mixup: Better representations by interpolating hidden states,” in International Conference on Machine Learning. PMLR, 2019, pp. 6438–6447.
- [81] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572, 2014.
- [82] A. J. Ratner, H. Ehrenberg, Z. Hussain, J. Dunnmon, and C. Ré, “Learning to compose domain-specific transformations for data augmentation,” in Advances in Neural Information Processing Systems, 2017, pp. 3236–3246.
- [83] S. Olut, Z. Shen, Z. Xu, S. Gerber, and M. Niethammer, “Adversarial data augmentation via deformation statistics,” in European Conference on Computer Vision. Springer, 2020, pp. 643–659.
- [84] Z. Shen, Z. Xu, S. Olut, and M. Niethammer, “Anatomical data augmentation via fluid-based image registration,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2020, pp. 318–328.
- [85] P. Costa, A. Galdran, M. I. Meyer, M. Niemeijer, M. Abràmoff, A. M. Mendonça, and A. Campilho, “End-to-end adversarial retinal image synthesis,” IEEE Transactions on Medical Imaging, vol. 37, no. 3, pp. 781–791, 2017.
- [86] K. Chaitanya, N. Karani, C. F. Baumgartner, A. Becker, O. Donati, and E. Konukoglu, “Semi-supervised and task-driven data augmentation,” in International Conference on Information Processing in Medical Imaging. Springer, 2019, pp. 29–41.
- [87] J. Cai, Z. Zhang, L. Cui, Y. Zheng, and L. Yang, “Towards cross-modal organ translation and segmentation: A cycle-and shape-consistent generative adversarial network,” Medical Image Analysis, vol. 52, pp. 174–184, 2019.
- [88] B. Yu, L. Zhou, L. Wang, Y. Shi, J. Fripp, and P. Bourgeat, “Ea-gans: edge-aware generative adversarial networks for cross-modality mr image synthesis,” IEEE Transactions on Medical Imaging, vol. 38, no. 7, pp. 1750–1762, 2019.
- [89] M. Mirza and S. Osindero, “Conditional generative adversarial nets,” arXiv preprint arXiv:1411.1784, 2014.
- [90] Y. Qin, H. Zheng, X. Huang, J. Yang, and Y.-M. Zhu, “Pulmonary nodule segmentation with ct sample synthesis using adversarial networks,” Medical Physics, vol. 46, no. 3, pp. 1218–1229, 2019.
- [91] S. Wang, S. Cao, D. Wei, R. Wang, K. Ma, L. Wang, D. Meng, and Y. Zheng, “Lt-net: Label transfer by learning reversible voxel-wise correspondence for one-shot medical image segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 9162–9171.
- [92] Z. Xu and M. Niethammer, “Deepatlas: Joint semi-supervised learning of image registration and segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 420–429.
- [93] R. El Jurdi, C. Petitjean, P. Honeine, and F. Abdallah, “Bb-unet: U-net with bounding box prior,” IEEE Journal of Selected Topics in Signal Processing, vol. 14, no. 6, pp. 1189–1198, 2020.
- [94] O. Oktay, E. Ferrante, K. Kamnitsas, M. Heinrich, W. Bai, J. Caballero, S. A. Cook, A. De Marvao, T. Dawes, D. P. O‘Regan et al., “Anatomically constrained neural networks (acnns): application to cardiac image enhancement and segmentation,” IEEE Transactions on Medical Imaging, vol. 37, no. 2, pp. 384–395, 2017.
- [95] A. V. Dalca, J. Guttag, and M. R. Sabuncu, “Anatomical priors in convolutional networks for unsupervised biomedical segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 9290–9299.
- [96] A. J. Larrazabal, C. Martinez, and E. Ferrante, “Anatomical priors for image segmentation via post-processing with denoising autoencoders,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 585–593.
- [97] F. Milletari, N. Navab, and S.-A. Ahmadi, “V-net: Fully convolutional neural networks for volumetric medical image segmentation,” in Fourth International Conference on 3D vision (3DV). IEEE, 2016, pp. 565–571.
- [98] J. E. Iglesias and M. R. Sabuncu, “Multi-atlas segmentation of biomedical images: a survey,” Medical Image Analysis, vol. 24, no. 1, pp. 205–219, 2015.
- [99] M. Cabezas, A. Oliver, X. Lladó, J. Freixenet, and M. B. Cuadra, “A review of atlas-based segmentation for magnetic resonance brain images,” Computer Methods and Programs in Biomedicine, vol. 104, no. 3, pp. e158–e177, 2011.
- [100] R. Ito, K. Nakae, J. Hata, H. Okano, and S. Ishii, “Semi-supervised deep learning of brain tissue segmentation,” Neural Networks, vol. 116, pp. 25–34, 2019.
- [101] B. D. de Vos, F. F. Berendsen, M. A. Viergever, H. Sokooti, M. Staring, and I. Išgum, “A deep learning framework for unsupervised affine and deformable image registration,” Medical Image Analysis, vol. 52, pp. 128–143, 2019.
- [102] W. Chi, L. Ma, J. Wu, M. Chen, W. Lu, and X. Gu, “Deep learning based medical image segmentation with limited labels,” Physics in Medicine & Biology, 2020.
- [103] Y. He, G. Yang, Y. Chen, Y. Kong, J. Wu, L. Tang, X. Zhu, J.-L. Dillenseger, P. Shao, S. Zhang et al., “Dpa-densebiasnet: Semi-supervised 3d fine renal artery segmentation with dense biased network and deep priori anatomy,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 139–147.
- [104] S. Dong, G. Luo, C. Tam, W. Wang, K. Wang, S. Cao, B. Chen, H. Zhang, and S. Li, “Deep atlas network for efficient 3d left ventricle segmentation on echocardiography,” Medical Image Analysis, vol. 61, p. 101638, 2020.
- [105] H. Zheng, L. Lin, H. Hu, Q. Zhang, Q. Chen, Y. Iwamoto, X. Han, Y.-W. Chen, R. Tong, and J. Wu, “Semi-supervised segmentation of liver using adversarial learning with deep atlas prior,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 148–156.
- [106] M. Vakalopoulou, G. Chassagnon, N. Bus, R. Marini, E. I. Zacharaki, M.-P. Revel, and N. Paragios, “Atlasnet: multi-atlas non-linear deep networks for medical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2018, pp. 658–666.
- [107] A. J. Larrazabal, C. Martínez, B. Glocker, and E. Ferrante, “Post-dae: Anatomically plausible segmentation via post-processing with denoising autoencoders,” IEEE Transactions on Medical Imaging, 2020.
- [108] Z. Mirikharaji and G. Hamarneh, “Star shape prior in fully convolutional networks for skin lesion segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2018, pp. 737–745.
- [109] J. Duan, G. Bello, J. Schlemper, W. Bai, T. J. Dawes, C. Biffi, A. de Marvao, G. Doumoud, D. P. O’Regan, and D. Rueckert, “Automatic 3d bi-ventricular segmentation of cardiac images by a shape-refined multi-task deep learning approach,” IEEE Transactions on Medical Imaging, vol. 38, no. 9, pp. 2151–2164, 2019.
- [110] M. C. H. Lee, K. Petersen, N. Pawlowski, B. Glocker, and M. Schaap, “Tetris: template transformer networks for image segmentation with shape priors,” IEEE Transactions on Medical Imaging, vol. 38, no. 11, pp. 2596–2606, 2019.
- [111] Y. Lu, K. Zheng, W. Li, Y. Wang, A. P. Harrison, C. Lin, S. Wang, J. Xiao, L. Lu, C.-F. Kuo et al., “Contour transformer network for one-shot segmentation of anatomical structures,” IEEE Transactions on Medical Imaging, 2021.
- [112] S. Tilborghs, T. Dresselaers, P. Claus, J. Bogaert, and F. Maes, “Shape constrained cnn for cardiac mr segmentation with simultaneous prediction of shape and pose parameters,” Lecture Notes in Computer Science, 2020.
- [113] Q. Yue, X. Luo, Q. Ye, L. Xu, and X. Zhuang, “Cardiac segmentation from lge mri using deep neural network incorporating shape and spatial priors,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 559–567.
- [114] F. Ambellan, A. Tack, M. Ehlke, and S. Zachow, “Automated segmentation of knee bone and cartilage combining statistical shape knowledge and convolutional neural networks: Data from the osteoarthritis initiative,” Medical Image Analysis, vol. 52, pp. 109–118, 2019.
- [115] N. Painchaud, Y. Skandarani, T. Judge, O. Bernard, A. Lalande, and P.-M. Jodoin, “Cardiac segmentation with strong anatomical guarantees,” IEEE Transactions on Medical Imaging, vol. 39, no. 11, pp. 3703–3713, 2020.
- [116] H. Ravishankar, R. Venkataramani, S. Thiruvenkadam, P. Sudhakar, and V. Vaidya, “Learning and incorporating shape models for semantic segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2017, pp. 203–211.
- [117] C. Zotti, Z. Luo, A. Lalande, and P.-M. Jodoin, “Convolutional neural network with shape prior applied to cardiac mri segmentation,” IEEE Journal of Biomedical and Health Informatics, vol. 23, no. 3, pp. 1119–1128, 2018.
- [118] J. Liu, X. Wang, and X.-c. Tai, “Deep convolutional neural networks with spatial regularization, volume and star-shape priori for image segmentation,” arXiv preprint arXiv:2002.03989, 2020.
- [119] M. Weigert, U. Schmidt, R. Haase, K. Sugawara, and G. Myers, “Star-convex polyhedra for 3d object detection and segmentation in microscopy,” in The IEEE Winter Conference on Applications of Computer Vision, 2020, pp. 3666–3673.
- [120] M. Rak, J. Steffen, A. Meyer, C. Hansen, and K.-D. Tönnies, “Combining convolutional neural networks and star convex cuts for fast whole spine vertebra segmentation in mri,” Computer Methods and Programs in Biomedicine, vol. 177, pp. 47–56, 2019.
- [121] U. Schmidt, M. Weigert, C. Broaddus, and G. Myers, “Cell detection with star-convex polygons,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2018, pp. 265–273.
- [122] J. Liu, X.-C. Tai, and S. Luo, “Convex shape prior for deep neural convolution network based eye fundus images segmentation,” arXiv preprint arXiv:2005.07476, 2020.
- [123] A. BenTaieb and G. Hamarneh, “Topology aware fully convolutional networks for histology gland segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2016, pp. 460–468.
- [124] K. Huang, H.-D. Cheng, Y. Zhang, B. Zhang, P. Xing, and C. Ning, “Medical knowledge constrained semantic breast ultrasound image segmentation,” in 24th International Conference on Pattern Recognition. IEEE, 2018, pp. 1193–1198.
- [125] J. R. Clough, I. Oksuz, N. Byrne, J. A. Schnabel, and A. P. King, “Explicit topological priors for deep-learning based image segmentation using persistent homology,” in International Conference on Information Processing in Medical Imaging. Springer, 2019, pp. 16–28.
- [126] J. Clough, N. Byrne, I. Oksuz, V. A. Zimmer, J. A. Schnabel, and A. King, “A topological loss function for deep-learning based image segmentation using persistent homology,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
- [127] N. Byrne, J. R. Clough, G. Montana, and A. P. King, “A persistent homology-based topological loss function for multi-class cnn segmentation of cardiac mri,” arXiv preprint arXiv:2008.09585, 2020.
- [128] H. Kervadec, J. Dolz, M. Tang, E. Granger, Y. Boykov, and I. B. Ayed, “Constrained-cnn losses for weakly supervised segmentation,” Medical Image Analysis, vol. 54, pp. 88–99, 2019.
- [129] M. Bateson, H. Kervadec, J. Dolz, H. Lombaert, and I. B. Ayed, “Constrained domain adaptation for segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 326–334.
- [130] X. Chen, B. M. Williams, S. R. Vallabhaneni, G. Czanner, R. Williams, and Y. Zheng, “Learning active contour models for medical image segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 11 632–11 640.
- [131] Y. Bengio, J. Louradour, R. Collobert, and J. Weston, “Curriculum learning,” in Proceedings of the 26th Annual International Conference on Machine Learning, 2009, pp. 41–48.
- [132] A. Pentina, V. Sharmanska, and C. H. Lampert, “Curriculum learning of multiple tasks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5492–5500.
- [133] G. Hacohen and D. Weinshall, “On the power of curriculum learning in training deep networks,” arXiv preprint arXiv:1904.03626, 2019.
- [134] F. Khan, B. Mutlu, and J. Zhu, “How do humans teach: On curriculum learning and teaching dimension,” Advances in Neural Information Processing Systems, vol. 24, pp. 1449–1457, 2011.
- [135] Y. Tang, X. Wang, A. P. Harrison, L. Lu, J. Xiao, and R. M. Summers, “Attention-guided curriculum learning for weakly supervised classification and localization of thoracic diseases on chest radiographs,” in International Workshop on Machine Learning in Medical Imaging. Springer, 2018, pp. 249–258.
- [136] R. Zhao, X. Chen, Z. Chen, and S. Li, “Egdcl: An adaptive curriculum learning framework for unbiased glaucoma diagnosis,” in European Conference on Computer Vision. Springer, 2020, pp. 190–205.
- [137] A. Jiménez-Sánchez, D. Mateus, S. Kirchhoff, C. Kirchhoff, P. Biberthaler, N. Navab, M. A. G. Ballester, and G. Piella, “Medical-based deep curriculum learning for improved fracture classification,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 694–702.
- [138] A. Jesson, N. Guizard, S. H. Ghalehjegh, D. Goblot, F. Soudan, and N. Chapados, “Cased: curriculum adaptive sampling for extreme data imbalance,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2017, pp. 639–646.
- [139] W. Wang, Y. Lu, B. Wu, T. Chen, D. Z. Chen, and J. Wu, “Deep active self-paced learning for accurate pulmonary nodule segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2018, pp. 723–731.
- [140] A. Jiménez-Sánchez, D. Mateus, S. Kirchhoff, C. Kirchhoff, P. Biberthaler, N. Navab, M. A. G. Ballester, and G. Piella, “Curriculum learning for annotation-efficient medical image analysis: scheduling data with prior knowledge and uncertainty,” arXiv preprint arXiv:2007.16102, 2020.
- [141] L. Jiang, D. Meng, Q. Zhao, S. Shan, and A. G. Hauptmann, “Self-paced curriculum learning,” in AAAI, vol. 29, no. 1, 2015, pp. 2694–2700.
- [142] M. P. Kumar, B. Packer, and D. Koller, “Self-paced learning for latent variable models,” in Advances in Neural Information Processing Systems, 2010, pp. 1189–1197.
- [143] J. Dai, K. He, and J. Sun, “Instance-aware semantic segmentation via multi-task network cascades,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3150–3158.
- [144] D. M. Vigneault, W. Xie, C. Y. Ho, D. A. Bluemke, and J. A. Noble, “Ω-net (omega-net): fully automatic, multi-view cardiac mr detection, orientation, and segmentation with deep neural networks,” Medical Image Analysis, vol. 48, pp. 95–106, 2018.
- [145] Y. Zhang, P. David, and B. Gong, “Curriculum domain adaptation for semantic segmentation of urban scenes,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2020–2030.
- [146] H. Kervadec, J. Dolz, É. Granger, and I. B. Ayed, “Curriculum semi-supervised segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 568–576.
- [147] D.-H. Lee, “Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks,” in ICML Workshop on Challenges in Representation Learning, vol. 3, no. 2, 2013.
- [148] I. Triguero, S. García, and F. Herrera, “Self-labeled techniques for semi-supervised learning: taxonomy, software and empirical study,” Knowledge and Information Systems, vol. 42, no. 2, pp. 245–284, 2015.
- [149] J. Li, S. Yang, X. Huang, Q. Da, X. Yang, Z. Hu, Q. Duan, C. Wang, and H. Li, “Signet ring cell detection with a semi-supervised learning framework,” in International Conference on Information Processing in Medical Imaging. Springer, 2019, pp. 842–854.
- [150] S. Laine and T. Aila, “Temporal ensembling for semi-supervised learning,” arXiv preprint arXiv:1610.02242, 2016.
- [151] A. Tarvainen and H. Valpola, “Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results,” in Advances in Neural Information Processing Systems, 2017, pp. 1195–1204.
- [152] D.-P. Fan, T. Zhou, G.-P. Ji, Y. Zhou, G. Chen, H. Fu, J. Shen, and L. Shao, “Inf-net: Automatic covid-19 lung infection segmentation from ct images,” IEEE Transactions on Medical Imaging, 2020.
- [153] Y. Gal and Z. Ghahramani, “Dropout as a bayesian approximation: Representing model uncertainty in deep learning,” in International Conference on Machine Learning, 2016, pp. 1050–1059.
- [154] A. Der Kiureghian and O. Ditlevsen, “Aleatory or epistemic? does it matter?” Structural Safety, vol. 31, no. 2, pp. 105–112, 2009.
- [155] C. Blundell, J. Cornebise, K. Kavukcuoglu, and D. Wierstra, “Weight uncertainty in neural networks,” in Proceedings of the 32nd International Conference on International Conference on Machine Learning, 2015, pp. 1613–1622.
- [156] Y. Ovadia, E. Fertig, J. Ren, Z. Nado, D. Sculley, S. Nowozin, J. Dillon, B. Lakshminarayanan, and J. Snoek, “Can you trust your model’s uncertainty? evaluating predictive uncertainty under dataset shift,” in Advances in Neural Information Processing Systems, 2019, pp. 13 991–14 002.
- [157] A. Kendall and Y. Gal, “What uncertainties do we need in bayesian deep learning for computer vision?” in Advances in Neural Information Processing Systems, 2017, pp. 5574–5584.
- [158] T. Nair, D. Precup, D. L. Arnold, and T. Arbel, “Exploring uncertainty measures in deep networks for multiple sclerosis lesion detection and segmentation,” Medical Image Analysis, vol. 59, p. 101557, 2020.
- [159] S. Graham, H. Chen, J. Gamper, Q. Dou, P.-A. Heng, D. Snead, Y. W. Tsang, and N. Rajpoot, “Mild-net: minimal information loss dilated network for gland instance segmentation in colon histology images,” Medical Image Analysis, vol. 52, pp. 199–211, 2019.
- [160] A. Jungo and M. Reyes, “Assessing reliability and challenges of uncertainty estimations for medical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 48–56.
- [161] X. Cao, H. Chen, Y. Li, Y. Peng, S. Wang, and L. Cheng, “Uncertainty aware temporal-ensembling model for semi-supervised abus mass segmentation,” IEEE Transactions on Medical Imaging, 2020.
- [162] J. Wen, N. Zheng, J. Yuan, Z. Gong, and C. Chen, “Bayesian uncertainty matching for unsupervised domain adaptation,” in Proceedings of the 28th International Joint Conference on Artificial Intelligence. AAAI Press, 2019, pp. 3849–3855.
- [163] A. Mehrtash, W. M. Wells, C. M. Tempany, P. Abolmaesumi, and T. Kapur, “Confidence calibration and predictive uncertainty estimation for deep medical image segmentation,” IEEE Transactions on Medical Imaging, vol. 39, no. 12, pp. 3868–3878, 2020.
- [164] G. Wang, W. Li, M. Aertsen, J. Deprest, S. Ourselin, and T. Vercauteren, “Aleatoric uncertainty estimation with test-time augmentation for medical image segmentation with convolutional neural networks,” Neurocomputing, vol. 338, pp. 34–45, 2019.
- [165] Y. Hiasa, Y. Otake, M. Takao, T. Ogawa, N. Sugano, and Y. Sato, “Automated muscle segmentation from clinical ct using bayesian u-net for personalized musculoskeletal modeling,” IEEE Transactions on Medical Imaging, vol. 39, no. 4, pp. 1030–1040, 2019.
- [166] H. Wang and D.-Y. Yeung, “Towards bayesian deep learning: A framework and some existing methods,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 12, pp. 3395–3408, 2016.
- [167] L. Venturini, A. T. Papageorghiou, J. A. Noble, and A. I. Namburete, “Uncertainty estimates as data selection criteria to boost omni-supervised learning,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2020, pp. 689–698.
- [168] M. Teye, H. Azizpour, and K. Smith, “Bayesian uncertainty estimation for batch normalized deep networks,” in International Conference on Machine Learning, 2018.
- [169] B. Lakshminarayanan, A. Pritzel, and C. Blundell, “Simple and scalable predictive uncertainty estimation using deep ensembles,” in Advances in Neural Information Processing Systems, 2017, pp. 6402–6413.
- [170] R. Camarasa, D. Bos, J. Hendrikse, P. Nederkoorn, E. Kooi, A. van der Lugt, and M. de Bruijne, “Quantitative comparison of monte-carlo dropout uncertainty measures for multi-class segmentation,” in Uncertainty for Safe Utilization of Machine Learning in Medical Imaging, and Graphs in Biomedical Image Analysis. Springer, 2020, pp. 32–41.
- [171] M. S. Ayhan and P. Berens, “Test-time data augmentation for estimation of heteroscedastic aleatoric uncertainty in deep neural networks,” in Medical Imaging with Deep Learning, 2018.
- [172] A. Blum and T. Mitchell, “Combining labeled and unlabeled data with co-training,” in Proceedings of the Eleventh Annual Conference on Computational Learning Theory, 1998, pp. 92–100.
- [173] W. Wang and Z.-H. Zhou, “A new analysis of co-training,” in Proceedings of the 27th International Conference on International Conference on Machine Learning, 2010, pp. 1135–1142.
- [174] J. Peng, G. Estrada, M. Pedersoli, and C. Desrosiers, “Deep co-training for semi-supervised image segmentation,” Pattern Recognition, p. 107269, 2020.
- [175] S. Qiao, W. Shen, Z. Zhang, B. Wang, and A. Yuille, “Deep co-training for semi-supervised image recognition,” in Proceedings of the European Conference on Computer Vision, 2018, pp. 135–152.
- [176] Y. Zhou, Y. Wang, P. Tang, W. Shen, E. K. Fishman, and A. L. Yuille, “Semi-supervised multi-organ segmentation via multi-planar co-training,” arXiv preprint arXiv:1804.02586, 2018.
- [177] Y. Xia, D. Yang, Z. Yu, F. Liu, J. Cai, L. Yu, Z. Zhu, D. Xu, A. Yuille, and H. Roth, “Uncertainty-aware multi-view co-training for semi-supervised medical image segmentation and domain adaptation,” Medical Image Analysis, vol. 65, p. 101766, 2020.
- [178] M. Sajjadi, M. Javanmardi, and T. Tasdizen, “Regularization with stochastic transformations and perturbations for deep semi-supervised learning,” Advances in Neural Information Processing Systems, vol. 29, pp. 1163–1171, 2016.
- [179] P. Bachman, O. Alsharif, and D. Precup, “Learning with pseudo-ensembles,” in Advances in Neural Information Processing Systems, 2014, pp. 3365–3373.
- [180] W. Cui, Y. Liu, Y. Li, M. Guo, Y. Li, X. Li, T. Wang, X. Zeng, and C. Ye, “Semi-supervised brain lesion segmentation with an adapted mean teacher model,” in International Conference on Information Processing in Medical Imaging. Springer, 2019, pp. 554–565.
- [181] X. Li, L. Yu, H. Chen, C.-W. Fu, L. Xing, and P.-A. Heng, “Transformation-consistent self-ensembling model for semisupervised medical image segmentation,” IEEE Transactions on Neural Networks and Learning Systems, 2020.
- [182] Y. Zhou, H. Chen, H. Lin, and P.-A. Heng, “Deep semi-supervised knowledge distillation for overlapping cervical cell instance segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2020, pp. 521–531.
- [183] G. Fotedar, N. Tajbakhsh, S. Ananth, and X. Ding, “Extreme consistency: Overcoming annotation scarcity and domain shifts,” arXiv preprint arXiv:2004.11966, 2020.
- [184] Q. Liu, L. Yu, L. Luo, Q. Dou, and P. A. Heng, “Semi-supervised medical image classification with relation-driven self-ensembling model,” IEEE Transactions on Medical Imaging, vol. 39, no. 11, pp. 3429–3440, 2020.
- [185] I. Laradji, P. Rodriguez, O. Manas, K. Lensink, M. Law, L. Kurzman, W. Parker, D. Vazquez, and D. Nowrouzezahrai, “A weakly supervised consistency-based learning method for covid-19 segmentation in ct images,” arXiv preprint arXiv:2007.02180, 2020.
- [186] J. Peng, M. Pedersoli, and C. Desrosiers, “Mutual information deep regularization for semi-supervised segmentation,” in Medical Imaging with Deep Learning. PMLR, 2020, pp. 601–613.
- [187] K. Fang and W.-J. Li, “Dmnet: Difference minimization network for semi-supervised segmentation in medical images,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2020, pp. 532–541.
- [188] C. Doersch, A. Gupta, and A. A. Efros, “Unsupervised visual representation learning by context prediction,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1422–1430.
- [189] M. Noroozi and P. Favaro, “Unsupervised learning of visual representations by solving jigsaw puzzles,” in European Conference on Computer Vision. Springer, 2016, pp. 69–84.
- [190] G. Larsson, M. Maire, and G. Shakhnarovich, “Learning representations for automatic colorization,” in European Conference on Computer Vision. Springer, 2016, pp. 577–593.
- [191] S. Gidaris, P. Singh, and N. Komodakis, “Unsupervised representation learning by predicting image rotations,” arXiv preprint arXiv:1803.07728, 2018.
- [192] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros, “Context encoders: Feature learning by inpainting,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2536–2544.
- [193] N. Tajbakhsh, Y. Hu, J. Cao, X. Yan, Y. Xiao, Y. Lu, J. Liang, D. Terzopoulos, and X. Ding, “Surrogate supervision for medical image analysis: Effective deep learning from limited quantities of labeled data,” in IEEE 16th International Symposium on Biomedical Imaging. IEEE, 2019, pp. 1251–1255.
- [194] A. Taleb, W. Loetzsch, N. Danz, J. Severin, T. Gaertner, B. Bergner, and C. Lippert, “3d self-supervised methods for medical imaging,” in Advances in Neural Information Processing Systems, 2020.
- [195] A. Taleb, C. Lippert, T. Klein, and M. Nabi, “Multimodal self-supervised learning for medical image analysis,” arXiv preprint arXiv:1912.05396, 2019.
- [196] P. Luc, C. Couprie, S. Chintala, and J. Verbeek, “Semantic segmentation using adversarial networks,” in NIPS Workshop on Adversarial Training, 2016.
- [197] D. Yang, D. Xu, S. K. Zhou, B. Georgescu, M. Chen, S. Grbic, D. Metaxas, and D. Comaniciu, “Automatic liver segmentation using an adversarial image-to-image network,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2017, pp. 507–515.
- [198] Z. Han, B. Wei, A. Mercado, S. Leung, and S. Li, “Spine-gan: Semantic segmentation of multiple spinal structures,” Medical Image Analysis, vol. 50, pp. 23–35, 2018.
- [199] W. C. Hung, Y. H. Tsai, Y. T. Liou, Y. Y. Lin, and M. H. Yang, “Adversarial learning for semi-supervised semantic segmentation,” in 29th British Machine Vision Conference, 2018.
- [200] N. Souly, C. Spampinato, and M. Shah, “Semi and weakly supervised semantic segmentation using generative adversarial network,” arXiv preprint arXiv:1703.09695, 2017.
- [201] L. Han, Y. Huang, H. Dou, S. Wang, S. Ahamad, H. Luo, Q. Liu, J. Fan, and J. Zhang, “Semi-supervised segmentation of lesion from breast ultrasound images with attentional generative adversarial network,” Computer Methods and Programs in Biomedicine, vol. 189, p. 105275, 2020.
- [202] Z. Dai, Z. Yang, F. Yang, W. W. Cohen, and R. R. Salakhutdinov, “Good semi-supervised learning that requires a bad gan,” Advances in Neural Information Processing Systems, vol. 30, pp. 6510–6520, 2017.
- [203] S. Li, C. Zhang, and X. He, “Shape-aware semi-supervised 3d semantic segmentation for medical images,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2020, pp. 552–561.
- [204] Q. Dou, C. Ouyang, C. Chen, H. Chen, and P.-A. Heng, “Unsupervised cross-modality domain adaptation of convnets for biomedical image segmentations with adversarial loss,” in Proceedings of the 27th International Joint Conference on Artificial Intelligence, 2018, pp. 691–697.
- [205] Q. Dou, C. Ouyang, C. Chen, H. Chen, B. Glocker, X. Zhuang, and P.-A. Heng, “Pnp-adanet: Plug-and-play adversarial domain adaptation network at unpaired cross-modality cardiac segmentation,” IEEE Access, vol. 7, pp. 99 065–99 076, 2019.
- [206] S. Wang, L. Yu, X. Yang, C.-W. Fu, and P.-A. Heng, “Patch-based output space adversarial learning for joint optic disc and cup segmentation,” IEEE Transactions on Medical Imaging, vol. 38, no. 11, pp. 2485–2495, 2019.
- [207] C. Chen, Q. Dou, H. Chen, J. Qin, and P. A. Heng, “Unsupervised bidirectional cross-modality adaptation via deeply synergistic image and feature alignment for medical image segmentation,” IEEE Transactions on Medical Imaging, vol. 39, no. 7, pp. 2494–2505, 2020.
- [208] D. Nie, Y. Gao, L. Wang, and D. Shen, “Asdnet: Attention based semi-supervised deep networks for medical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2018, pp. 370–378.
- [209] Y. Zhang, L. Yang, J. Chen, M. Fredericksen, D. P. Hughes, and D. Z. Chen, “Deep adversarial networks for biomedical image segmentation utilizing unannotated images,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2017, pp. 408–416.
- [210] Z. Zhang, L. Yang, and Y. Zheng, “Translating and segmenting multimodal medical volumes with cycle-and shape-consistency generative adversarial network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 9242–9251.
- [211] A. Shaban, S. Bansal, Z. Liu, I. Essa, and B. Boots, “One-shot learning for semantic segmentation,” arXiv preprint arXiv:1709.03410, 2017.
- [212] L. Fei-Fei, R. Fergus, and P. Perona, “One-shot learning of object categories,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 4, pp. 594–611, 2006.
- [213] Y. Wang, Q. Yao, J. T. Kwok, and L. M. Ni, “Generalizing from a few examples: A survey on few-shot learning,” ACM Computing Surveys (CSUR), vol. 53, no. 3, pp. 1–34, 2020.
- [214] S. Caelles, K.-K. Maninis, J. Pont-Tuset, L. Leal-Taixé, D. Cremers, and L. Van Gool, “One-shot video object segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 221–230.
- [215] A. G. Roy, S. Siddiqui, S. Pölsterl, N. Navab, and C. Wachinger, “‘squeeze & excite’ guided few-shot segmentation of volumetric images,” Medical Image Analysis, vol. 59, p. 101587, 2020.
- [216] A. G. Roy, N. Navab, and C. Wachinger, “Concurrent spatial and channel ‘squeeze & excitation’ in fully convolutional networks,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2018, pp. 421–429.
- [217] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132–7141.
- [218] C. Ouyang, C. Biffi, C. Chen, T. Kart, H. Qiu, and D. Rueckert, “Self-supervision with superpixels: Training few-shot medical image segmentation without annotation,” in European Conference on Computer Vision. Springer, 2020, pp. 762–780.
- [219] S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan, “A theory of learning from different domains,” Machine Learning, vol. 79, no. 1, pp. 151–175, 2010.
- [220] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, “3d u-net: learning dense volumetric segmentation from sparse annotation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2016, pp. 424–432.
- [221] W. Bai, H. Suzuki, C. Qin, G. Tarroni, O. Oktay, P. M. Matthews, and D. Rueckert, “Recurrent neural networks for aortic image sequence segmentation with sparse annotations,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2018, pp. 586–594.
- [222] Z. Zhang, J. Li, Z. Zhong, Z. Jiao, and X. Gao, “A sparse annotation strategy based on attention-guided active learning for 3d medical image segmentation,” arXiv preprint arXiv:1906.07367, 2019.
- [223] A. Bitarafan, M. Nikdan, and M. Soleymani Baghshah, “3d image segmentation with sparse annotation by self-training and internal registration,” IEEE Journal of Biomedical and Health Informatics, 2020.
- [224] H. Zheng, S. M. M. Perrine, M. K. Pitirri, K. Kawasaki, C. Wang, J. T. Richtsmeier, and D. Z. Chen, “Cartilage segmentation in high-resolution 3d micro-ct images via uncertainty-guided self-training with very sparse annotation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2020, pp. 802–812.
- [225] H. Zheng, Y. Zhang, L. Yang, C. Wang, and D. Z. Chen, “An annotation sparsification strategy for 3d medical image segmentation via representative selection and self-training.” in AAAI, 2020, pp. 6925–6932.
- [226] S. Wang, D. Nie, L. Qu, Y. Shao, J. Lian, Q. Wang, and D. Shen, “Ct male pelvic organ segmentation via hybrid loss network with incomplete annotation,” IEEE Transactions on Medical Imaging, vol. 39, no. 6, pp. 2151–2162, 2020.
- [227] D. Lin, J. Dai, J. Jia, K. He, and J. Sun, “Scribblesup: Scribble-supervised convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3159–3167.
- [228] M. Tang, F. Perazzi, A. Djelouah, I. Ben Ayed, C. Schroers, and Y. Boykov, “On regularized losses for weakly-supervised cnn segmentation,” in Proceedings of the European Conference on Computer Vision, 2018, pp. 507–522.
- [229] M. Tang, A. Djelouah, F. Perazzi, Y. Boykov, and C. Schroers, “Normalized cut loss for weakly-supervised cnn segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 1818–1827.
- [230] H.-T. Cheng, C.-F. Yeh, P.-C. Kuo, A. Wei, K.-C. Liu, M.-C. Ko, K.-H. Chao, Y.-C. Peng, and T.-L. Liu, “Self-similarity student for partial label histopathology image segmentation,” in European Conference on Computer Vision. Springer, 2020, pp. 117–132.
- [231] C. Rother, V. Kolmogorov, and A. Blake, “‘Grabcut’: interactive foreground extraction using iterated graph cuts,” ACM Transactions on Graphics (TOG), vol. 23, no. 3, pp. 309–314, 2004.
- [232] S. Majumder and A. Yao, “Content-aware multi-level guidance for interactive instance segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 11 602–11 611.
- [233] Y. Han and Z. Zhang, “Deep learning assisted image interactive framework for brain image segmentation,” IEEE Access, vol. 8, pp. 117 028–117 035, 2020.
- [234] G. Wang, M. A. Zuluaga, W. Li, R. Pratt, P. A. Patel, M. Aertsen, T. Doel, A. L. David, J. Deprest, S. Ourselin et al., “Deepigeos: a deep interactive geodesic framework for medical image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 7, pp. 1559–1572, 2018.
- [235] G. Wang, M. Aertsen, J. Deprest, S. Ourselin, T. Vercauteren, and S. Zhang, “Uncertainty-guided efficient interactive refinement of fetal brain segmentation from stacks of mri slices,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2020, pp. 279–288.
- [236] J. Zhang, G. Wang, H. Xie, S. Zhang, N. Huang, S. Zhang, and L. Gu, “Weakly supervised vessel segmentation in x-ray angiograms by self-paced learning from noisy labels with suggestive annotation,” Neurocomputing, vol. 417, pp. 114–127, 2020.
- [237] J. Zhang, Y. Shi, J. Sun, L. Wang, L. Zhou, Y. Gao, and D. Shen, “Interactive medical image segmentation via a point-based interaction,” Artificial Intelligence in Medicine, p. 101998, 2020.
- [238] G. Wang, W. Li, M. A. Zuluaga, R. Pratt, P. A. Patel, M. Aertsen, T. Doel, A. L. David, J. Deprest, S. Ourselin et al., “Interactive medical image segmentation using deep learning with image-specific fine tuning,” IEEE Transactions on Medical Imaging, vol. 37, no. 7, pp. 1562–1573, 2018.
- [239] B. Zhou, L. Chen, and Z. Wang, “Interactive deep editing framework for medical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 329–337.
- [240] X. Liao, W. Li, Q. Xu, X. Wang, B. Jin, X. Zhang, Y. Wang, and Y. Zhang, “Iteratively-refined interactive 3d medical image segmentation with multi-agent reinforcement learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 9394–9402.
- [241] L. Yang, Y. Zhang, J. Chen, S. Zhang, and D. Z. Chen, “Suggestive annotation: A deep active learning framework for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2017, pp. 399–407.
- [242] S. Belharbi, I. B. Ayed, L. McCaffrey, and E. Granger, “Deep active learning for joint classification & segmentation with weak annotator,” arXiv preprint arXiv:2010.04889, 2020.
- [243] J. Sourati, A. Gholipour, J. G. Dy, X. Tomas-Fernandez, S. Kurugol, and S. K. Warfield, “Intelligent labeling based on fisher information for medical image segmentation using deep learning,” IEEE Transactions on Medical Imaging, vol. 38, no. 11, pp. 2642–2653, 2019.
- [244] D. Mahapatra, B. Bozorgtabar, J.-P. Thiran, and M. Reyes, “Efficient active learning for image classification and segmentation using a sample selection and conditional generative adversarial network,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2018, pp. 580–588.
- [245] H. Shen, K. Tian, P. Dong, J. Zhang, K. Yan, S. Che, J. Yao, P. Luo, and X. Han, “Deep active learning for breast cancer segmentation on immunohistochemistry images,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2020, pp. 509–518.
- [246] Z. Ji, Y. Shen, C. Ma, and M. Gao, “Scribble-based hierarchical weakly supervised learning for brain tumor segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 175–183.
- [247] P. Krähenbühl and V. Koltun, “Efficient inference in fully connected crfs with gaussian edge potentials,” Advances in Neural Information Processing Systems, vol. 24, pp. 109–117, 2011.
- [248] Z. Zhao, L. Yang, H. Zheng, I. H. Guldner, S. Zhang, and D. Z. Chen, “Deep learning based instance segmentation in 3d biomedical images using weak annotation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2018, pp. 352–360.
- [249] L. Peng, L. Lin, H. Hu, Y. Zhang, H. Li, Y. Iwamoto, X. Han, and Y. W. Chen, “Semi-supervised learning for semantic segmentation of emphysema with partial annotations,” IEEE Journal of Biomedical and Health Informatics, vol. 24, no. 8, pp. 2327–2336, 2020.
- [250] M. Dong, D. Liu, Z. Xiong, X. Chen, Y. Zhang, Z.-J. Zha, G. Bi, and F. Wu, “Towards neuron segmentation from macaque brain images: A weakly supervised approach,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2020, pp. 194–203.
- [251] H. Zheng, Z. Zhuang, Y. Qin, Y. Gu, J. Yang, and G.-Z. Yang, “Weakly supervised deep learning for breast cancer segmentation with coarse annotations,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2020, pp. 450–459.
- [252] H. Lee and W.-K. Jeong, “Scribble2label: Scribble-supervised cell segmentation via self-generating pseudo-labels with consistency,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2020, pp. 14–23.
- [253] A. Bearman, O. Russakovsky, V. Ferrari, and L. Fei-Fei, “What’s the point: Semantic segmentation with point supervision,” in European Conference on Computer Vision. Springer, 2016, pp. 549–565.
- [254] Y. B. Can, K. Chaitanya, B. Mustafa, L. M. Koch, E. Konukoglu, and C. F. Baumgartner, “Learning to segment medical images with scribble-supervision alone,” in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Springer, 2018, pp. 236–244.
- [255] L. Lejeune, J. Grossrieder, and R. Sznitman, “Iterative multi-path tracking for video and volume segmentation with sparse point supervision,” Medical Image Analysis, vol. 50, pp. 65–81, 2018.
- [256] H. Qu, P. Wu, Q. Huang, J. Yi, Z. Yan, K. Li, G. M. Riedlinger, S. De, S. Zhang, and D. N. Metaxas, “Weakly supervised deep nuclei segmentation using partial points annotation in histopathology images,” IEEE Transactions on Medical Imaging, vol. 39, no. 11, pp. 3655–3666, 2020.
- [257] H. Qu, J. Yi, Q. Huang, P. Wu, and D. Metaxas, “Nuclei segmentation using mixed points and masks selected from uncertainty,” in 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI). IEEE, 2020, pp. 973–976.
- [258] I. H. Laradji, N. Rostamzadeh, P. O. Pinheiro, D. Vazquez, and M. Schmidt, “Where are the blobs: Counting by localization with point supervision,” in Proceedings of the European Conference on Computer Vision, 2018, pp. 547–562.
- [259] Y. Liu, M. Shi, Q. Zhao, and X. Wang, “Point in, box out: Beyond counting persons in crowds,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 6469–6478.
- [260] Z. Gao, P. Puttapirat, J. Shi, and C. Li, “Renal cell carcinoma detection and subtyping with minimal point-based annotation in whole-slide images,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2020, pp. 439–448.
- [261] P. Kainz, M. Urschler, S. Schulter, P. Wohlhart, and V. Lepetit, “You should use regression to detect cells,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 276–283.
- [262] Y. Zhou, Q. Dou, H. Chen, J. Qin, and P.-A. Heng, “Sfcn-opi: Detection and fine-grained classification of nuclei using sibling fcn with objectness prior interaction,” in Thirty-Second AAAI Conference on Artificial Intelligence, 2018, pp. 2652–2659.
- [263] I. Yoo, D. Yoo, and K. Paeng, “Pseudoedgenet: Nuclei segmentation only with point annotations,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 731–739.
- [264] C. Li, X. Wang, W. Liu, L. J. Latecki, B. Wang, and J. Huang, “Weakly supervised mitosis detection in breast histopathology images using concentric loss,” Medical Image Analysis, vol. 53, pp. 165–178, 2019.
- [265] H. Qu, P. Wu, Q. Huang, J. Yi, G. M. Riedlinger, S. De, and D. N. Metaxas, “Weakly supervised deep nuclei segmentation using points annotation in histopathology images,” in International Conference on Medical Imaging with Deep Learning, 2019, pp. 390–400.
- [266] F. Lu, F. Wu, P. Hu, Z. Peng, and D. Kong, “Automatic 3d liver location and segmentation via convolutional neural network and graph cut,” International Journal of Computer Assisted Radiology and Surgery, vol. 12, no. 2, pp. 171–182, 2017.
- [267] P. Bilic, P. F. Christ, E. Vorontsov, G. Chlebus, H. Chen, Q. Dou, C.-W. Fu, X. Han, P.-A. Heng, J. Hesser et al., “The liver tumor segmentation benchmark (lits),” arXiv preprint arXiv:1901.04056, 2019.
- [268] N. Heller, N. Sathianathen, A. Kalapara, E. Walczak, K. Moore, H. Kaluzniak, J. Rosenberg, P. Blake, Z. Rengel, M. Oestreich et al., “The kits19 challenge data: 300 kidney tumor cases with clinical context, ct semantic segmentations, and surgical outcomes,” arXiv preprint arXiv:1904.00445, 2019.
- [269] N. Heller, F. Isensee, K. H. Maier-Hein, X. Hou, C. Xie, F. Li, Y. Nan, G. Mu, Z. Lin, M. Han et al., “The state of the art in kidney and kidney tumor segmentation in contrast-enhanced ct imaging: Results of the kits19 challenge,” Medical Image Analysis, vol. 67, p. 101821, 2021.
- [270] A. A. A. Setio, A. Traverso, T. De Bel, M. S. Berens, C. van den Bogaard, P. Cerello, H. Chen, Q. Dou, M. E. Fantacci, B. Geurts et al., “Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the luna16 challenge,” Medical Image Analysis, vol. 42, pp. 1–13, 2017.
- [271] Y. Xie, J. Zhang, and Y. Xia, “Semi-supervised adversarial model for benign–malignant lung nodule classification on chest ct,” Medical Image Analysis, vol. 57, pp. 237–248, 2019.
- [272] H. Cao, H. Liu, E. Song, G. Ma, X. Xu, R. Jin, T. Liu, and C.-C. Hung, “A two-stage convolutional neural networks for lung nodule detection,” IEEE Journal of Biomedical and Health Informatics, vol. 24, no. 7, pp. 2006–2015, 2020.
- [273] B. H. Menze, A. Jakab, S. Bauer, J. Kalpathy-Cramer, K. Farahani, J. Kirby, Y. Burren, N. Porz, J. Slotboom, R. Wiest et al., “The multimodal brain tumor image segmentation benchmark (brats),” IEEE Transactions on Medical Imaging, vol. 34, no. 10, pp. 1993–2024, 2014.
- [274] Y. Zhou, Z. Li, S. Bai, C. Wang, X. Chen, M. Han, E. Fishman, and A. L. Yuille, “Prior-aware neural network for partially-supervised multi-organ segmentation,” in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 10 672–10 681.
- [275] K. Dmitriev and A. E. Kaufman, “Learning multi-class segmentations from single-class datasets,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 9501–9511.
- [276] R. Huang, Y. Zheng, Z. Hu, S. Zhang, and H. Li, “Multi-organ segmentation via co-training weight-averaged models from few-organ datasets,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2020, pp. 146–155.
- [277] K. Yan, J. Cai, Y. Zheng, A. P. Harrison, D. Jin, Y.-B. Tang, Y.-X. Tang, L. Huang, J. Xiao, and L. Lu, “Learning from multiple datasets with heterogeneous and partial labels for universal lesion detection in ct,” arXiv preprint arXiv:2009.02577, 2020.
- [278] G. Shi, L. Xiao, Y. Chen, and S. K. Zhou, “Marginal loss and exclusion loss for partially supervised multi-organ segmentation,” arXiv preprint arXiv:2007.03868, 2020.
- [279] J. Zhang, Y. Xie, Y. Xia, and C. Shen, “Dodnet: Learning to segment multi-organ and tumors from multiple partially labeled datasets,” arXiv preprint arXiv:2011.10217, 2020.
- [280] A. K. Menon, B. Van Rooyen, and N. Natarajan, “Learning from binary labels with instance-dependent noise,” Machine Learning, vol. 107, no. 8-10, pp. 1561–1595, 2018.
- [281] J. Cheng, T. Liu, K. Ramamohanarao, and D. Tao, “Learning with bounded instance and label-dependent label noise,” in International Conference on Machine Learning. PMLR, 2020, pp. 1789–1799.
- [282] N. Natarajan, I. S. Dhillon, P. K. Ravikumar, and A. Tewari, “Learning with noisy labels,” Advances in Neural Information Processing Systems, vol. 26, pp. 1196–1204, 2013.
- [283] D. Angluin and P. Laird, “Learning from noisy examples,” Machine Learning, vol. 2, no. 4, pp. 343–370, 1988.
- [284] N. Heller, J. Dean, and N. Papanikolopoulos, “Imperfect segmentation labels: How much do they matter?” in Intravascular Imaging and Computer Assisted Stenting and Large-Scale Annotation of Biomedical Data and Expert Label Synthesis. Springer, 2018, pp. 112–120.
- [285] H. Le, D. Samaras, T. Kurc, R. Gupta, K. Shroyer, and J. Saltz, “Pancreatic cancer detection in whole slide images using noisy label annotations,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 541–549.
- [286] B.-B. Gao, C. Xing, C.-W. Xie, J. Wu, and X. Geng, “Deep label distribution learning with label ambiguity,” IEEE Transactions on Image Processing, vol. 26, no. 6, pp. 2825–2838, 2017.
- [287] T. Zhang, L. Yu, N. Hu, S. Lv, and S. Gu, “Robust medical image segmentation from non-expert annotations with tri-network,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2020, pp. 249–258.
- [288] Y. Dgani, H. Greenspan, and J. Goldberger, “Training a neural network based on unreliable human annotation of medical images,” in IEEE 15th International Symposium on Biomedical Imaging. IEEE, 2018, pp. 39–42.
- [289] C. Xue, Q. Dou, X. Shi, H. Chen, and P.-A. Heng, “Robust learning at noisy labeled medical images: Applied to skin lesion classification,” in IEEE 16th International Symposium on Biomedical Imaging. IEEE, 2019, pp. 1280–1283.
- [290] B. Han, J. Yao, G. Niu, M. Zhou, I. Tsang, Y. Zhang, and M. Sugiyama, “Masking: A new perspective of noisy supervision,” Advances in Neural Information Processing Systems, vol. 31, pp. 5836–5846, 2018.
- [291] G. Algan, I. Ulusoy, Ş. Gönül, B. Turgut, and B. Bakbak, “Deep learning from small amount of medical data with noisy labels: A meta-learning approach,” arXiv preprint arXiv:2010.06939, 2020.
- [292] Z. Cao, G. Yang, Q. Chen, X. Chen, and F. Lv, “Breast tumor classification through learning from noisy labeled ultrasound images,” Medical Physics, vol. 47, no. 3, pp. 1048–1057, 2020.
- [293] D. Karimi, J. M. Peters, A. Ouaalam, S. P. Prabhu, M. Sahin, D. A. Krueger, A. Kolevzon, C. Eng, S. K. Warfield, and A. Gholipour, “Learning to detect brain lesions from noisy annotations,” in IEEE 17th International Symposium on Biomedical Imaging. IEEE, 2020, pp. 1910–1914.
- [294] M. Ren, W. Zeng, B. Yang, and R. Urtasun, “Learning to reweight examples for robust deep learning,” arXiv preprint arXiv:1803.09050, 2018.
- [295] J. Du and Z. Cai, “Modelling class noise with symmetric and asymmetric distributions,” in Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015, pp. 2589–2595.
- [296] B. Han, Q. Yao, X. Yu, G. Niu, M. Xu, W. Hu, I. Tsang, and M. Sugiyama, “Co-teaching: Robust training of deep neural networks with extremely noisy labels,” in Advances in Neural Information Processing Systems, 2018, pp. 8527–8537.
- [297] H. Zhu, J. Shi, and J. Wu, “Pick-and-learn: Automatic quality evaluation for noisy-labeled image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 576–584.
- [298] C. Xue, Q. Deng, X. Li, Q. Dou, and P.-A. Heng, “Cascaded robust learning at imperfect labels for chest x-ray segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2020, pp. 579–588.
- [299] Z. Mirikharaji, Y. Yan, and G. Hamarneh, “Learning to segment skin lesions from noisy annotations,” in Domain Adaptation and Representation Transfer and Medical Image Learning with Less Labels and Imperfect Data. Springer, 2019, pp. 207–215.
- [300] Y. Shu, X. Wu, and W. Li, “Lvc-net: Medical image segmentation with noisy label based on local visual cues,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2019, pp. 558–566.
- [301] M. Zhang, J. Gao, Z. Lyu, W. Zhao, Q. Wang, W. Ding, S. Wang, Z. Li, and S. Cui, “Characterizing label errors: Confident learning for noisy-labeled image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2020, pp. 721–730.
- [302] C. G. Northcutt, L. Jiang, and I. L. Chuang, “Confident learning: Estimating uncertainty in dataset labels,” arXiv preprint arXiv:1911.00068, 2019.
- [303] S. Min, X. Chen, Z.-J. Zha, F. Wu, and Y. Zhang, “A two-stream mutual attention network for semi-supervised biomedical segmentation with noisy labels,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 4578–4585.
- [304] J. Dai, K. He, and J. Sun, “Boxsup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1635–1643.
- [305] G. Papandreou, L.-C. Chen, K. P. Murphy, and A. L. Yuille, “Weakly-and semi-supervised learning of a deep convolutional network for semantic image segmentation,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1742–1750.
- [306] A. Khoreva, R. Benenson, J. Hosang, M. Hein, and B. Schiele, “Simple does it: Weakly supervised instance and semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 876–885.
- [307] C.-C. Hsu, K.-J. Hsu, C.-C. Tsai, Y.-Y. Lin, and Y.-Y. Chuang, “Weakly supervised instance segmentation using the bounding box tightness prior,” in Advances in Neural Information Processing Systems, 2019, pp. 6586–6597.
- [308] M. Rajchl, M. C. Lee, O. Oktay, K. Kamnitsas, J. Passerat-Palmbach, W. Bai, M. Damodaram, M. A. Rutherford, J. V. Hajnal, B. Kainz et al., “Deepcut: Object segmentation from bounding box annotations using convolutional neural networks,” IEEE Transactions on Medical Imaging, vol. 36, no. 2, pp. 674–683, 2016.
- [309] H. Kervadec, J. Dolz, S. Wang, E. Granger, and I. B. Ayed, “Bounding boxes for weakly supervised segmentation: global constraints get close to full supervision,” arXiv preprint arXiv:2004.06816, 2020.
- [310] V. Lempitsky, P. Kohli, C. Rother, and T. Sharp, “Image segmentation with a bounding box prior,” in IEEE 12th International Conference on Computer Vision. IEEE, 2009, pp. 277–284.
- [311] S. Wang, Q. Wang, Y. Shao, L. Qu, C. Lian, J. Lian, and D. Shen, “Iterative label denoising network: Segmenting male pelvic organs in ct from 3d bounding box annotations,” IEEE Transactions on Biomedical Engineering, vol. 67, no. 10, pp. 2710–2720, 2020.
- [312] A. Chakraborty, M. Alam, V. Dey, A. Chattopadhyay, and D. Mukhopadhyay, “Adversarial attacks and defences: A survey,” arXiv preprint arXiv:1810.00069, 2018.
- [313] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A simple framework for contrastive learning of visual representations,” in International Conference on Machine Learning. PMLR, 2020, pp. 1597–1607.
- [314] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, “Momentum contrast for unsupervised visual representation learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 9729–9738.
- [315] K. Chaitanya, E. Erdil, N. Karani, and E. Konukoglu, “Contrastive learning of global and local features for medical image segmentation with limited annotations,” Advances in Neural Information Processing Systems, vol. 33, 2020.
- [316] T. Chen, S. Kornblith, K. Swersky, M. Norouzi, and G. Hinton, “Big self-supervised models are strong semi-supervised learners,” arXiv preprint arXiv:2006.10029, 2020.
- [317] T. Matiisen, A. Oliver, T. Cohen, and J. Schulman, “Teacher-student curriculum learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 9, pp. 3732–3740, 2020.
- [318] D. Weinshall and D. Amir, “Theory of curriculum learning, with convex loss functions,” Journal of Machine Learning Research, vol. 21, no. 222, pp. 1–19, 2020.
- [319] S. Arora, H. Khandeparkar, M. Khodak, O. Plevrakis, and N. Saunshi, “A theoretical analysis of contrastive unsupervised representation learning,” arXiv preprint arXiv:1902.09229, 2019.
- [320] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mobilenetv2: Inverted residuals and linear bottlenecks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510–4520.
- [321] S. Karakanis and G. Leontidis, “Lightweight deep learning models for detecting covid-19 from chest x-ray images,” Computers in Biology and Medicine, vol. 130, p. 104181, 2021.
- [322] Y. Cheng, D. Wang, P. Zhou, and T. Zhang, “Model compression and acceleration for deep neural networks: The principles, progress, and challenges,” IEEE Signal Processing Magazine, vol. 35, no. 1, pp. 126–136, 2018.
- [323] X. Zhang, X. Zhou, M. Lin, and J. Sun, “Shufflenet: An extremely efficient convolutional neural network for mobile devices,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6848–6856.
- [324] S. Ravi and H. Larochelle, “Optimization as a model for few-shot learning,” in International Conference on Learning Representations, 2017.
- [325] P. Khandelwal and P. Yushkevich, “Domain generalizer: A few-shot meta learning framework for domain generalization in medical imaging,” in Domain Adaptation and Representation Transfer, and Distributed and Collaborative Learning. Springer, 2020, pp. 73–84.
- [326] P. Tian, Z. Wu, L. Qi, L. Wang, Y. Shi, and Y. Gao, “Differentiable meta-learning model for few-shot semantic segmentation.” in AAAI, 2020, pp. 12 087–12 094.
- [327] M. Ren, E. Triantafillou, S. Ravi, J. Snell, K. Swersky, J. B. Tenenbaum, H. Larochelle, and R. S. Zemel, “Meta-learning for semi-supervised few-shot classification,” arXiv preprint arXiv:1803.00676, 2018.