Human-machine Interactive Tissue Prototype Learning for Label-efficient Histopathology Image Segmentation

Tencent AI Lab, Shenzhen, China
Department of Automation, Tsinghua University, Beijing, China
University of California, Los Angeles, USA
The Chinese University of Hong Kong, Hong Kong, China
[email protected]
Abstract
Deep learning has greatly advanced histopathology image segmentation but usually requires abundant annotated data. However, due to the gigapixel scale of whole slide images (WSIs) and pathologists' heavy daily workload, obtaining pixel-level labels for supervised learning in clinical practice is often infeasible. Alternatively, weakly-supervised segmentation methods have been explored with less laborious image-level labels, but their performance is unsatisfactory due to the lack of dense supervision. Inspired by the recent success of self-supervised learning, we present a label-efficient tissue prototype dictionary building pipeline and propose to use the obtained prototypes to guide histopathology image segmentation. Particularly, taking advantage of self-supervised contrastive learning, an encoder is trained to project unlabeled histopathology image patches into a discriminative embedding space, where the patches are clustered so that tissue prototypes can be identified through efficient visual examination by pathologists. Then, the encoder is used to map images into the embedding space, and pixel-level pseudo tissue masks are generated by querying the tissue prototype dictionary. Finally, the pseudo masks are used to train a segmentation network with dense supervision for better performance. Experiments on two public datasets demonstrate that our method achieves segmentation performance comparable to fully-supervised baselines with much less annotation burden and outperforms other weakly-supervised methods. Code is available at https://github.com/WinterPan2017/proto2seg.
Keywords:
WSI Segmentation · Label-efficient Learning · Clustering

1 Introduction
Visual examination of tissue sections under a microscope is a crucial step for disease assessment and prognosis. Automatic segmentation of whole slide images (WSIs) is in high demand because it helps pathologists quantify tissue distribution. So far, supervised learning approaches have shown state-of-the-art performance on WSI segmentation given abundant annotated data [4]. Unfortunately, it takes hours to annotate pixel-wise labels for a gigapixel WSI, as shown in Fig. 1 (a). Thus, abundant pixel-level labels for supervised learning are often infeasible in busy daily clinical practice. Alternatively, weakly-supervised segmentation methods have been explored to supervise model training with less laborious image-level labels, as shown in Fig. 1 (b), and estimate segmentation results. However, these weakly-supervised methods suffer from unsatisfactory pixel-level predictions due to the lack of dense supervision. Therefore, we ask whether there exist other WSI analysis pipelines that can provide pixel-level supervision without a heavy labeling workload.

Recent years have witnessed substantial progress in unsupervised learning for natural image analysis, especially through contrastive learning [7, 9]. Inspired by previous works [22, 21], in which researchers achieved high tissue classification accuracy by applying clustering algorithms to self-supervised histopathology representations, in this work we take a further step and build a bridge between contrastive-learning-based WSI patch pre-training and pixel-level tissue segmentation with a human-machine interactive tissue prototype learning pipeline, namely Proto2Seg. Particularly, taking advantage of self-supervised contrastive learning, we crop unlabeled WSIs into local patches and use them to train an encoder that projects these patches into a discriminative embedding space. These patches are then divided into different clusters in the embedding space via unsupervised clustering. By examining dozens of representative tissue patches in each cluster, as shown in Fig. 1 (c), pathologists can efficiently determine whether a cluster corresponds to a target tissue type. The centroids of pathologist-selected clusters are collected to build a tissue prototype dictionary, with which we can use the encoder to map the original WSIs into the embedding space and generate pseudo tissue masks by querying the nearest prototype to each local region under flexible settings. We further adopt a refinement strategy that uses the generated masks as dense supervision for training a segmentation network from scratch. As such, pathologists only need to spend several minutes examining representative patches in every cluster for WSI segmentation model training, rather than hours of dense annotation.
In summary, our contributions are as follows: (1) We make one of the early attempts to bridge contrastive-learning-based WSI patch pre-training and dense segmentation via a low-labor-cost, human-machine interactive tissue prototype dictionary. (2) We propose an effective framework, namely Proto2Seg, that generates coarse tissue masks by querying the tissue prototype dictionary, with a customized query process that further improves the coarse segmentation results. (3) By using the coarse tissue masks to supervise the training of histopathology segmentation networks, we achieve segmentation performance comparable to the fully-supervised upper bound and superior to other weakly-supervised methods, as demonstrated by quantitative and qualitative results on two public datasets.
2 Related Work
Self-supervised WSI Analysis: Inspired by the recent success of contrastive learning [7, 9] in natural image analysis, several works [22, 23] have fine-tuned models with contrastive-learning-based pre-trained weights under image-level supervision for the WSI patch classification task. That is, manual labels are still required in the fine-tuning process, although the model is pre-trained in a self-supervised fashion. In a recent work [21], researchers made an early attempt to distinguish different tissues with a recursive clustering algorithm, showing that integrating clustering with contrastive-learning-based histopathology representations can achieve high patch classification accuracy. In contrast to these existing works, we focus on building a bridge between clustering-based tissue classification and pixel-level WSI segmentation, moving the topic forward toward label-efficient WSI analysis.
Weakly-supervised WSI Segmentation: Since it is difficult to obtain pixel-level labels for WSI segmentation, weakly-supervised WSI segmentation has been an active research area for years. Existing methods can be roughly categorized into CAM-based and MIL-based solutions. The CAM-based methods are built on Class Activation Map variants [24, 16, 5] produced by well-trained classification models. Generally, these predictions are error-prone and need to be refined by complex post-processing strategies [4]. MIL-based solutions [12, 17, 14] regard a WSI as a bag of local patches and learn to predict the classification label of each local patch; the patch classification results are then merged into segmentation masks. Most MIL-based methods [12, 17] are designed for the binary segmentation scenario. Different from these works, our method handles both binary and multi-tissue segmentation tasks with efficient human-machine interactive steps and better results.
3 Methodology
Our framework, as shown in Fig. 2, includes three steps: (1) contrastive learning-based encoder training, (2) prototype identification based on clustering, and (3) coarse segmentation prediction and refinement.

3.1 Contrastive Learning based Encoder Training
When pathologists read WSIs, they recognize different tissues by comparing the visual appearance of local cells and their surrounding micro-environments. Similarly, we need an encoder that can project local histopathology visual patterns into a discriminative space to identify different tissues. Inspired by recent studies on contrastive learning, we adopt SimCLR [7] to pre-train the encoder without labels in a self-supervised fashion.
The detail of this step is shown in Fig. 2 Step 1. Given a set of WSIs, we crop them into non-overlapping patches of small resolution (empirically set to 128×128 in this study). During the training stage, given $N$ patches in a mini-batch, we obtain $2N$ patches by applying different augmentations to each patch. Two augmented patches derived from the same patch are regarded as a positive pair; all other pairs are treated as negative pairs. We employ a ResNet18 [11] backbone, with the final linear classification layer and global average pooling layer removed, as the encoder $f$. Each patch $x_i$ is encoded by $f$ and then fed into a global average pooling layer to generate an embedding $h_i$ of dimension 512. Same as [7], a projection head $g$ is used to map $h_i$ to $z_i$, where the contrastive loss is applied. To achieve our goal, the contrastive loss pulls positive pairs close and pushes negative pairs apart in the embedding space. Given a sample's embedding $z_i$, its positive sample's embedding $z_i^{+}$, and the negative samples' embeddings $\{z_j^{-}\}$, the contrastive loss is defined as $\ell_i = -\log \frac{\exp(\mathrm{sim}(z_i, z_i^{+})/\tau)}{\exp(\mathrm{sim}(z_i, z_i^{+})/\tau) + \sum_{j}\exp(\mathrm{sim}(z_i, z_j^{-})/\tau)}$, where $\mathrm{sim}(\cdot,\cdot)$ denotes the similarity function and $\tau$ denotes the temperature parameter. In practice, we utilize cosine similarity and set $\tau$ to 0.5. After training, $f$ is used to project histopathology patches into a discriminative 512-dimensional embedding space for prototype identification.
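For concreteness, the following is a minimal PyTorch sketch of this NT-Xent objective; the function name and the batch layout (two views stacked so that row $i$ and row $i+N$ form a positive pair) are our illustrative choices, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """NT-Xent loss over 2N projected embeddings: rows [0..N-1] and
    [N..2N-1] hold the two augmented views of the same N patches."""
    z = F.normalize(z, dim=1)              # normalized, so dot product = cosine sim
    n2 = z.shape[0]                        # n2 = 2N
    sim = z @ z.t() / tau                  # (2N, 2N) similarity logits
    sim.fill_diagonal_(float('-inf'))      # a sample is never its own negative
    # the positive of row i sits at row (i + N) mod 2N
    targets = (torch.arange(n2, device=z.device) + n2 // 2) % n2
    return F.cross_entropy(sim, targets)
```

Here `z` would be the output of the projection head $g$ applied to the concatenated batch of two augmented views.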
3.2 Prototype Identification based on Clustering
Having obtained a discriminative histopathology image encoding space, we use clustering to mine potential tissue prototypes (i.e., cluster centroids), which capture the inter- and intra-class heterogeneity of different tissues and represent the entire encoding space in a dictionary. To associate these prototypes with corresponding labels, experts' examination is needed only in this step. In practice, we ask an expert to examine sampled representative patches to efficiently determine the labels of the corresponding clusters, without laboriously evaluating all the patches, as shown in Fig. 2 Step 2. In our experiments, we simulate the visual inspection process with ground-truth labels for easy reproducibility and evaluation of the segmentation performance.
To begin with, we illustrate how to perform the clustering with the pre-trained encoder $f$. To reduce computational overhead, we randomly sample a subset $\mathcal{X}_s$ from the entire histopathology patch set $\mathcal{X}$. Then, $f$ and global average pooling are utilized to project the patches into the discriminative embedding space, yielding the embedding set $\mathcal{H}_s$. Next, we adopt unsupervised clustering on $\mathcal{H}_s$ to find tissue prototypes without accessing any manual label. For computational efficiency, K-Means++ [2] is employed to generate $K$ clusters from $\mathcal{H}_s$. Naturally, the first problem arises: how to determine a proper cluster number $K$?
Intuitively, the selection of $K$ involves a trade-off between reducing annotation workload and purifying clusters, i.e., a larger $K$ results in smaller tissue clusters with higher intra-class similarity, but costs more labor as the pathologist needs to inspect more clusters. In order to select a good cluster number without accessing any manual labels, we adopt the elbow method [13] to determine a proper $K$ beyond which the performance gain no longer justifies the additional cost. We use an annotation-free, distance-based measurement, the intra-class embedding squared distance $D_K$, to assess clustering quality: $D_K = \sum_{k=1}^{K} \sum_{h \in C_k} \lVert h - \mu_k \rVert^2$, where $\mu_k$ denotes the centroid of cluster $C_k$ and $h$ denotes the embeddings of patches in cluster $C_k$. Generally, $D_K$ decreases as $K$ grows. To search for a proper $K$, we plot the curve of the relative reduction of $D_K$ over a range of candidate values of $K$ and choose the elbow point, where the reduction of $D_K$ slows down, as the final cluster number; a minimal sketch is given below.
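The sketch below illustrates this selection with scikit-learn, exploiting the fact that `KMeans.inertia_` is exactly the intra-class squared distance $D_K$ defined above; the candidate grid and function name are our assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def elbow_curve(embeddings: np.ndarray, candidates: list[int]):
    """Return D_K for each candidate K plus the relative reduction
    between consecutive candidates; the elbow is where the
    reduction curve flattens out."""
    d_k = [KMeans(n_clusters=k, init='k-means++', n_init=10)
           .fit(embeddings).inertia_ for k in candidates]
    reduction = [(d_k[i - 1] - d_k[i]) / d_k[i - 1]
                 for i in range(1, len(d_k))]
    return d_k, reduction
```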
Note that identifying the proper cluster number is an open problem, and we also evaluate other clustering methods. Our experiments show that K-Means++ performs best under the current setting, as reported in the ablation study below.
Based on the obtained clusters, the second problem arises: how can human experts efficiently determine the tissue type of every cluster? We propose to let human experts evaluate only sampled representative patches to give prototype-level labels, instead of inspecting all the patches. The representative patches are sampled by a central sampling strategy: given a target cluster, we sample the $n$ patches closest to the cluster centroid. More sampling strategies are discussed in the experiments. To simulate the pathologists' visual examination process, here we count the proportion of each tissue according to the pixel-level labels. If one tissue type accounts for more than 80% of the pixels, we consider the cluster to represent this tissue and add the prototype (the cluster centroid embedding) to the prototype dictionary. Otherwise, we assume the cluster may be a mixture of different tissue types (e.g., the border area between organs) and drop it. A sketch of this procedure is given below.
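The following sketch covers both the central sampling step and the simulated inspection rule; names such as `central_samples` and `accept_cluster` are ours, and the 80% threshold follows the text above.

```python
import numpy as np

def central_samples(embeddings, cluster_ids, centroids, n=10):
    """For each cluster, return the indices of the n patches whose
    embeddings lie closest to the cluster centroid (central sampling)."""
    picks = {}
    for k, mu in enumerate(centroids):
        members = np.where(cluster_ids == k)[0]
        dists = np.linalg.norm(embeddings[members] - mu, axis=1)
        picks[k] = members[np.argsort(dists)[:n]]
    return picks

def accept_cluster(tissue_fractions, threshold=0.8):
    """Simulated visual inspection: keep the cluster as a prototype of
    tissue t if t covers more than `threshold` of the inspected pixels;
    otherwise treat it as a mixture and drop it (return None)."""
    t = max(tissue_fractions, key=tissue_fractions.get)
    return t if tissue_fractions[t] > threshold else None
```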

To help readers better understand, Fig. 3 illustrates the dictionary building process on the BCSS dataset [1]. We visualize the 20,000 randomly sampled patches used for dictionary building with t-SNE in Fig. 3 (a), colored according to the manual pixel-level labels. Particularly, these patches are divided into 6 types, of which 5 (tumor, stroma, inflammatory infiltration, necrosis, and others) are defined by the dataset providers, and the mixture type is defined by us. Our tissue dictionary is built on the clustering results drawn in Fig. 3 (b) by K-Means++ with $K=30$. Intuitively, we can observe that the clustering results maintain a high degree of consistency with the manual labels. By sampling 10 representative patches from each cluster center and calculating the tissue proportions, we find that 13/5/3/1/3 clusters are identified as tumor/stroma/inflammatory infiltration/necrosis/others, respectively. Note that Cluster 1 is formed by blank background patches and therefore spreads in a centrosymmetric distribution. The remaining 5 clusters exhibit mixtures of multiple tissues and are excluded from the dictionary.
After the above process, we obtain the prototype dictionary, denoted as $\mathcal{D} = \{(p_m, y_m)\}_{m=1}^{M}$, where $p_m$ are the prototype embeddings and $y_m$ are the prototype-level labels given by the pathologist.
3.3 Coarse Segmentation Prediction and Refinement
Having obtained the pre-trained encoder $f$ and the prototype dictionary $\mathcal{D}$, we can generate coarse segmentation masks for a given WSI by the process shown in Fig. 2 Step 3. We first feed the WSI into the encoder to obtain a semantic vector map. Then, we query $\mathcal{D}$ to determine the tissue type of each location. Intuitively, for each location, we can directly retrieve the prototype nearest to the current semantic vector from $\mathcal{D}$ and label the location with the corresponding tissue type. We term this query setting Direct-Query (DQ). However, DQ treats each location independently, ignoring the intrinsic similarity between the embeddings. Thus, we further improve DQ with an enhanced Cluster-then-Query (CQ) setting to suppress outlier query results. Particularly, to exploit the intrinsic similarity between the embeddings in the given feature map, we divide the locations into different groups by performing K-Means++ on the 512-d vector set. After that, the cluster centroids of the groups are used to query $\mathcal{D}$ for labeling each location. As for the cluster number for the target WSI, we find it better to decide it according to the tissue types present in the given WSI. From the DQ query results, we can roughly determine that there may exist $T$ tissue types in the target WSI. Considering the possible spatial dispersion of the same tissue across an image, we divide the current feature map into $\alpha T$ clusters, where $\alpha$ is a scale factor empirically set to 5. Both query settings are sketched below.
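The sketch below contrasts the two settings under stated assumptions: function names are ours, `proto_labels` is a NumPy integer array, and Euclidean distance is used for the nearest-prototype query (the paper does not pin down the query metric).

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def direct_query(feat_map, protos, proto_labels):
    """DQ: label every spatial location with the tissue type of its
    nearest prototype. feat_map: (H, W, 512); protos: (M, 512);
    proto_labels: (M,) integer tissue labels."""
    h, w, c = feat_map.shape
    flat = feat_map.reshape(-1, c)
    nearest = cdist(flat, protos).argmin(axis=1)   # per-location nearest prototype
    return proto_labels[nearest].reshape(h, w)

def cluster_then_query(feat_map, protos, proto_labels, alpha=5):
    """CQ: group the locations by K-Means++ first, then query the
    dictionary once per group centroid to suppress outlier results."""
    h, w, c = feat_map.shape
    flat = feat_map.reshape(-1, c)
    # rough tissue count T from the DQ result, scaled by alpha
    t = len(np.unique(direct_query(feat_map, protos, proto_labels)))
    km = KMeans(n_clusters=alpha * t, init='k-means++', n_init=10).fit(flat)
    group = proto_labels[cdist(km.cluster_centers_, protos).argmin(axis=1)]
    return group[km.labels_].reshape(h, w)
```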
Now we have a tissue segmentation map, which is upsampled to restore the resolution of the given WSI. The coarse map has zigzag edges, but it can offer dense supervision for segmentation network training. To refine the final prediction, we further use these coarse masks as pseudo tissue masks to train a segmentation network from scratch; a minimal training loop is sketched below. As such, we finally build a bridge from prototype learning to semantic segmentation.
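A minimal sketch of the refinement training, assuming `seg_net` (e.g., a LinkNet) and a `loader` yielding (image, pseudo_mask) batches exist; the optimizer, loss, and hyperparameters here are illustrative rather than the authors' exact setting.

```python
import torch
import torch.nn as nn

def train_refinement(seg_net: nn.Module, loader, epochs=50, lr=1e-3,
                     device='cuda'):
    """Train a segmentation network from scratch, using the coarse
    Proto2Seg masks as dense pseudo labels."""
    seg_net = seg_net.to(device)
    opt = torch.optim.Adam(seg_net.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for image, pseudo_mask in loader:
            image, pseudo_mask = image.to(device), pseudo_mask.to(device)
            loss = criterion(seg_net(image), pseudo_mask.long())
            opt.zero_grad()
            loss.backward()
            opt.step()
    return seg_net
```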
4 Experiments
Table 1: Segmentation results on CAMELYON16 and BCSS.

| Setting | Method | CAMELYON16 Pix. Acc. | CAMELYON16 Tum. Dice | CAMELYON16 Nor. Dice | BCSS Pix. Acc. | BCSS Tum. Dice | BCSS Stm. Dice | BCSS Inf. Dice | BCSS Nec. Dice | BCSS Oth. Dice |
|---|---|---|---|---|---|---|---|---|---|---|
| Pixel-level Supervised | LinkNet | 0.9391 | 0.9078 | 0.9896 | 0.8335 | 0.8773 | 0.8162 | 0.7662 | 0.7640 | 0.9653 |
| Coarse Seg. | Grad-Cam | 0.8243 | 0.7480 | 0.9745 | 0.5799 | 0.4415 | 0.6190 | 0.5122 | 0.3251 | 0.7244 |
| Coarse Seg. | Grad-Cam++ | 0.7462 | 0.6169 | 0.9641 | 0.6121 | 0.6327 | 0.6041 | 0.3606 | 0.2757 | 0.7638 |
| Coarse Seg. | ABMIL | 0.8455 | 0.8033 | 0.9805 | N/A | N/A | N/A | N/A | N/A | N/A |
| Coarse Seg. | CLAM | 0.8700 | 0.7752 | 0.9743 | 0.6441 | 0.7082 | 0.6256 | 0.6404 | 0.3571 | 0.9109 |
| Coarse Seg. | Our Proto2Seg(DQ) | 0.9174 | 0.8401 | 0.9810 | 0.7421 | 0.7907 | 0.7240 | 0.6590 | 0.6176 | 0.9430 |
| Coarse Seg. | Our Proto2Seg(CQ) | 0.9176 | 0.8491 | 0.9824 | 0.7484 | 0.7956 | 0.7281 | 0.6576 | 0.6381 | 0.9429 |
| Refine. Seg. | CLAM + LinkNet | 0.9189 | 0.8741 | 0.9859 | 0.7323 | 0.7797 | 0.7281 | 0.6784 | 0.6337 | 0.9054 |
| Refine. Seg. | Our Proto2Seg(DQ) + LinkNet | 0.9266 | 0.8766 | 0.9859 | 0.7491 | 0.8029 | 0.7604 | 0.6800 | 0.6143 | 0.9479 |
| Refine. Seg. | Our Proto2Seg(CQ) + LinkNet | 0.9231 | 0.8824 | 0.9868 | 0.7750 | 0.8063 | 0.7769 | 0.6799 | 0.6979 | 0.9477 |
4.1 Setup
Datasets: Our experiments are conducted on two public datasets. CAMELYON16: we use the training set, which contains 270 WSIs of sentinel lymph nodes with pixel-wise cancerous/normal region annotations [3]. BCSS: contains 151 region-of-interest images cropped from breast cancer WSIs in TCGA with tissue-level annotations [1]. Five tissue types are labeled: tumor, stroma, inflammatory infiltration, necrosis, and 'others' (including background, blood, etc.). Similar to previous works [17, 21], we generate 2048×2048 / 1024×1024 images from CAMELYON16/BCSS at appropriate specimen-level pixel sizes to obtain a proper view of biological tissues as the image-level data. For BCSS, we follow the official train-test split with 2151/976 images for training/testing. For CAMELYON16, we randomly divide patients into train and test sets at a 2:1 ratio following the hold-out setting, yielding 1348/848 images for training/testing.
Baselines & Evaluation: We compare our framework with the pixel-level supervised upper bound, established by training a LinkNet [6] with ground-truth annotations, and with several weakly-supervised baselines, including Grad-Cam [16], Grad-Cam++ [5], ABMIL [12], and CLAM [14]. The images from CAMELYON16/BCSS are cropped into patches for the MIL methods. All networks are built on ResNet18. We report macro-average pixel accuracy and each tissue's Dice score to compare the methods.
Our Proto2Seg: We first crop the training sets into non-overlapping 128×128 patches for CAMELYON16/BCSS. All the patches are used to train a ResNet18-based encoder following the original SimCLR [7] setting for 200 epochs. For computational efficiency in the clustering process, 20,000 patches are randomly sampled for tissue prototype clustering on both datasets. Note that the testing set is not exposed to the encoder during training or the dictionary building process. We set the tissue cluster number of CAMELYON16/BCSS to 15/30 via the elbow method. We employ the prototype dictionary and the CQ setting to generate the coarse segmentation. The training of the refinement segmentation network follows the same setting as our pixel-level supervised baseline, except that the coarse segmentation masks are used as supervision.
4.2 Main Results
Table 2: Annotation cost comparison on the BCSS training set.

| Annotation Type | Quantity | Average Time | Total Time | Ratio |
|---|---|---|---|---|
| Pixel-level | 2151 images | 8 min 50 s per image | 316.7 h | 1583.5× |
| Image-level | 2151 images | 1 min 59 s per image | 71.1 h | 355.5× |
| Prototype-level (ours) | 30 clusters | 26 s per cluster | 0.2 h | 1× |
Quantitative results: The segmentation results are summarized in Table 1. Our Proto2Seg(DQ) coarse segmentation results, obtained by directly querying the tissue prototype dictionary, already surpass the other weakly-supervised baselines by significant margins. By using the Proto2Seg(CQ) coarse masks as dense supervision, the refined segmentation results achieve pixel accuracy and Dice comparable to the pixel-level supervised upper bound on CAMELYON16 for binary segmentation. For BCSS, most tissue segmentation results are further improved by Proto2Seg(DQ) + refinement training, except for necrosis. We conjecture that the coarse masks for necrosis, backed by only one prototype as shown in Fig. 3 (b), provide inaccurate supervision and mislead the refinement. Our Proto2Seg(CQ) setting, which takes the embeddings' intrinsic similarity and the tissue's spatial dispersion into consideration, further improves the coarse segmentation performance. We also observe that the CQ-based coarse segmentation masks offer better supervision signals than the masks generated by DQ. We additionally enhance the CLAM segmentation results with the same refinement training step and find that our solution is still superior to CLAM + refinement. In conclusion, our solution narrows the performance gap between pixel-level fully-supervised and weakly-supervised methods with prototypes.
Note that we achieve better segmentation results with a lower annotation burden than other methods. To compare labeling costs, we invited 3 pathologists to annotate 20 randomly sampled 1024×1024 BCSS patches with pixel-/image-level labels and use the average time to estimate the labeling cost for the BCSS training set, reported in Table 2. Obviously, our method is much more labor-saving. Also note that we used 1024×1024 cropped BCSS images, whereas original WSIs can reach resolutions of 10,000×10,000 or more, i.e., giving pixel-/image-level labels for the original WSIs would be even more time-consuming, while the cost of our method is unaffected by larger resolutions.

Qualitative results: We visualize the qualitative results in Fig. 4. Our methods handle both binary and multi-tissue segmentation better than the other weakly-supervised methods. Because the DQ setting treats every location independently, outlier errors exist in its coarse segmentation results. We use white arrows to indicate where the coarse segmentation results are further improved by our CQ strategy. By using the coarse segmentation as dense supervision, the refined results correct some erroneous predictions and have smoother edges.
4.3 Ablation Study
Clustering Numbers for Prototype Identification: Fig. 5 (a) illustrates how the selection of the cluster number $K$ for dictionary building affects the results. When the number of clusters is too small, it is difficult for our method to distinguish different tissues, because too few clusters cannot capture enough inter-/intra-class heterogeneity. Once $K$ is sufficiently large, the coarse segmentation masks achieve satisfactory results. As mentioned above, we follow the elbow method to set $K$ without accessing any labels, although the optimal Dice is achieved at a different $K$. That is, determining the proper $K$ remains an open challenge for unsupervised clustering, but our method is robust across a wide range of sufficiently large $K$.

Table 3: Comparison of clustering algorithms.

| Clustering Method | K-Means | K-Means++ | SpectralClustering | DBSCAN | FINCH |
|---|---|---|---|---|---|
| Time (s) | 12.82 | 10.67 | 108.9 | 10.85 | 8.08 |
| Mean Dice (DQ) | 0.9000 | 0.9107 | 0.9045 | 0.7802 | 0.8942 |
Patch Sampling Strategies for Visual Inspection: Fig. 5 (b) shows how the patch sampling strategy affects the results on BCSS. We experiment with central/equidistant sampling strategies under different values of $n$ and report the macro-average Dice over the 5 tissues. In the equidistant sampling setting, we randomly sample $n$ equidistant patches in each cluster from center to border for inspection. We conclude that central sampling is consistently better. We also observe that checking too few patches leads to a performance decrease, while checking more patches leads to higher accuracy at a heavier workload. As a trade-off, we employ central sampling with $n=10$.
Different clustering algorithms: Five clustering algorithms are evaluated within our framework, as shown in Table 3. The cluster numbers of the first three algorithms are determined with the same elbow method described above. DBSCAN [8] and FINCH [15] are $K$-free, i.e., they automatically determine the cluster number. We observe that K-Means++ [2] outperforms the other methods.
5 Conclusion
In this paper, we make one of the early attempts to bridge contrastive-learning-based WSI patch pre-training and semantic segmentation with a human-machine interactive tissue prototype dictionary. Experiments on two public datasets demonstrate that our method is comparable to the supervised upper bounds and outperforms other weakly-supervised methods. The major limitations are that we simulate the pathologists' visual examination in the current experiments and that the coarse masks inevitably contain noise. Future work will investigate how human factors may affect the visual inspection process and incorporate techniques [20, 19, 18] to alleviate the negative impact of label noise.
5.0.1 Acknowledgement.
This research was partly supported by the National Key R&D Program of China (Grant No. 2020AAA0108303), the Shenzhen Science and Technology Project (Grant No. JCYJ20200109143041798), the Shenzhen Stable Supporting Program (Grant No. WDZC20200820200655001), and the Shenzhen Key Laboratory of Next Generation Interactive Media Innovative Technology (Grant No. ZDSYS20210623092001004).
References
- [1] Amgad, M., Elfandy, H., et al.: Structured crowdsourcing enables convolutional segmentation of histology images. Bioinformatics 35(18), 3461–3467 (2019)
- [2] Arthur, D., Vassilvitskii, S.: K-means++: The advantages of careful seeding. In: SODA. pp. 1027–1035 (2007)
- [3] Bejnordi, B.E., et al.: Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318(22), 2199–2210 (2017)
- [4] Chan, L., Hosseini, M.S., Rowsell, C., et al.: Histosegnet: Semantic segmentation of histological tissue type in whole slide images. In: ICCV. pp. 10662–10671 (2019)
- [5] Chattopadhay, A., Sarkar, A., et al.: Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks. In: WACV. pp. 839–847 (2018)
- [6] Chaurasia, A., Culurciello, E.: Linknet: Exploiting encoder representations for efficient semantic segmentation. In: VCIP. pp. 1–4 (2017)
- [7] Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: ICML. pp. 1597–1607 (2020)
- [8] Ester, M., Kriegel, H.P., Sander, J., et al.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD. vol. 96, pp. 226–231 (1996)
- [9] Grill, J.B., Strub, F., Altché, F., et al.: Bootstrap your own latent: A new approach to self-supervised learning. In: NeurIPS. pp. 21271–21284 (2020)
- [10] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: CVPR. pp. 9729–9738 (2020)
- [11] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR. pp. 770–778 (2016)
- [12] Ilse, M., Tomczak, J., Welling, M.: Attention-based deep multiple instance learning. In: ICML. pp. 2127–2136 (2018)
- [13] Liu, F., Deng, Y.: Determine the number of unknown targets in open world based on elbow method. IEEE Transactions on Fuzzy Systems 29(5), 986–995 (2020)
- [14] Lu, M.Y., et al.: Data-efficient and weakly supervised computational pathology on whole-slide images. Nature Biomedical Engineering 5(6), 555–570 (2021)
- [15] Sarfraz, S., Sharma, V., Stiefelhagen, R.: Efficient parameter-free clustering using first neighbor relations. In: CVPR. pp. 8934–8943 (2019)
- [16] Selvaraju, R.R., Cogswell, M., Das, A., et al.: Grad-cam: Visual explanations from deep networks via gradient-based localization. In: ICCV. pp. 618–626 (2017)
- [17] Xu, G., Song, Z., Sun, Z., et al.: Camel: A weakly supervised learning framework for histopathology image segmentation. In: CVPR. pp. 10682–10691 (2019)
- [18] Xu, Z., Lu, D., Luo, J., et al.: Anti-interference from noisy labels: Mean-teacher-assisted confident learning for medical image segmentation. IEEE Transactions on Medical Imaging 41(11), 3062–3073 (2022)
- [19] Xu, Z., Lu, D., Wang, Y., et al.: Noisy labels are treasure: mean-teacher-assisted confident learning for hepatic vessel segmentation. In: MICCAI. pp. 3–13 (2021)
- [20] Xu, Z., Lu, D., et al.: Denoising for relaxing: Unsupervised domain adaptive fundus image segmentation without source data. In: MICCAI. pp. 214–224 (2022)
- [21] Yan, J., Chen, H., Li, X., Yao, J.: Deep contrastive learning based tissue clustering for annotation-free histopathology image analysis. Computerized Medical Imaging and Graphics 97, 102053 (2022)
- [22] Yang, J., et al.: Towards better understanding and better generalization of low-shot classification in histology images with contrastive learning. In: ICLR (2022)
- [23] Yang, P., Hong, Z., Yin, X., Zhu, C., Jiang, R.: Self-supervised visual representation learning for histopathological images. In: MICCAI. pp. 47–57 (2021)
- [24] Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: CVPR. pp. 2921–2929 (2016)