Lifelong Histopathology Whole Slide Image Retrieval via Distance Consistency Rehearsal
Abstract
Content-based histopathological image retrieval (CBHIR) has gained attention in recent years, offering the capability to return histopathology images that are content-wise similar to a query image from an established database. However, in clinical practice, the continuously expanding size of WSI databases limits the practical application of current CBHIR methods. In this paper, we propose a Lifelong Whole Slide Retrieval (LWSR) framework to address the challenge of catastrophic forgetting by progressive model updating on a continuously growing retrieval database. Our framework aims to achieve a balance between stability and plasticity during continual learning. To preserve system plasticity, we utilize a local memory bank with a reservoir sampling method to save instances, which comprehensively encompasses the feature spaces of both old and new tasks. Furthermore, a distance consistency rehearsal (DCR) module is designed to ensure the consistency of the retrieval queues for previous tasks, which we regard as the stability of a lifelong CBHIR system. We evaluated the proposed method on four public WSI datasets from TCGA projects. The experimental results demonstrate that the proposed method is effective and superior to state-of-the-art methods. The code is available at https://github.com/OliverZXY/LWSR.
Keywords:
Histopathology image analysis · CBHIR · Continual learning

1 Introduction
With the development of digital pathology and artificial intelligence, computer-aided cancer diagnosis methods (CAD) on histopathological image analysis have been widely studied [19, 8, 6, 7]. Content-based histopathological image retrieval (CBHIR) is an emerging technology in this domain [5], which offers the capability to return histopathology whole slide images (WSIs) that are content-wise similar to query WSIs from an established database.
Current CBHIR systems have shown excellent performance on stationary databases [29, 28]. Nevertheless, they forget previously learned knowledge while learning new data, a phenomenon known as catastrophic forgetting [11, 12, 3, 13] in artificial intelligence systems. In image retrieval systems, catastrophic forgetting manifests as inconsistencies in the returned queues for old tasks before and after learning new tasks. This is particularly serious for an image retrieval system in the medical scenario, because doctors may depend on the retrieved results for diagnosis. Inconsistencies in these results over time can severely undermine the reliability of the system and potentially mislead doctors in diagnosis. This inevitably limits the practical application of CBHIR systems in data-incremental clinical scenarios.
Recently, continual learning (CL) has been proposed [18, 9] to mitigate catastrophic forgetting when learning continuously from a non-stationary data stream. The numerous proposed CL methods can be roughly grouped into three categories: replay methods, regularization-based methods, and parameter isolation methods [9]. Among them, replay methods achieve promising performance by storing a subset of the passing data stream as exemplars for experience replay. For natural scenes, many replay methods have been proposed to alleviate catastrophic forgetting in downstream tasks such as classification [21, 27, 26] and semantic segmentation [10, 20]. In digital pathology, rehearsal has been successfully applied to the WSI classification task [13]. Whether in natural scenes or in digital pathology, the most crucial aspects of a replay method are how to sample rehearsal instances and how to replay them. This holds true for digital pathology retrieval tasks as well; however, in the domain of CBHIR, there is still a lack of research on continual retrieval frameworks.
In this paper, we propose a novel continual whole slide retrieval framework named lifelong whole slide retrieval (LWSR). LWSR achieves good retrieval performance by balancing the trade-off between stability and plasticity during continual learning. To preserve the model's plasticity, we utilize a local memory bank to save instances of old tasks, which aims to encompass the feature space of old tasks via a reservoir random sampling algorithm [24]. Furthermore, we regard the consistency of the retrieval queues for previous tasks as the stability of a lifelong CBHIR system and design a distance consistency rehearsal (DCR) module, which is verified to be effective in improving both retrieval performance and the stability of the returned queues after continual learning. The proposed framework was evaluated on a public TCGA continual dataset comprising TCGA-NSCLC, TCGA-BRCA, TCGA-RCC, and TCGA-GAST. The experimental results demonstrate that LWSR is effective for continual WSI retrieval and superior to typical classification-oriented continual learning approaches.
The contribution of this paper can be summarized in two aspects. (1) We propose a novel lifelong whole slide retrieval (LWSR) framework to tackle the challenge of continual learning in the histopathology image retrieval task. To the best of our knowledge, this is the first work to address the continual learning problem in the domain of histopathology image retrieval. (2) A novel distance consistency rehearsal (DCR) module is proposed to achieve consistency of the result queues for old tasks. Together with the local memory bank, the proposed framework achieves a balance between stability and plasticity.

2 Method
2.1 Data Preparation and Problem Definition
Continual learning for CBHIR can be defined as training a model on non-stationary data as new data are added to the WSI database.
The proposed framework is illustrated in Fig. 1. For an original WSI, we first divide it into several non-overlapping patches of a fixed size. Then, we utilize the pre-trained encoder PLIP [14] to extract patch features, for its excellent performance in understanding image semantic information. Afterwards, the feature cube of a WSI can be represented as $F \in \mathbb{R}^{n \times d}$, where $d$ is the dimension of the feature and $n$ is the number of patches in the WSI. To alleviate storage pressure, we save feature cubes in the memory bank instead of the original WSIs.
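To make the preprocessing concrete, the following is a minimal sketch (not the authors' released code) of how a feature cube could be built and cached; `encoder` stands in for the frozen PLIP patch encoder, and all names are illustrative:

```python
import torch

@torch.no_grad()
def build_feature_cube(patches, encoder, device="cuda"):
    """Encode the non-overlapping patches of one WSI into a feature cube.

    `patches` is an (n, 3, H, W) tensor of patch images and `encoder` is a
    frozen patch-level feature extractor (PLIP in the paper). Returns an
    (n, d) tensor, where d is the patch feature dimension.
    """
    encoder.eval().to(device)
    feats = []
    for chunk in patches.split(256):  # encode in chunks to bound GPU memory
        feats.append(encoder(chunk.to(device)).cpu())
    return torch.cat(feats, dim=0)    # the (n, d) feature cube of the WSI

# The cube, not the raw gigapixel WSI, is what gets cached and later stored
# in the memory bank, e.g. torch.save(cube, f"features/{slide_id}.pt").
```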
As shown in Fig. 1(I), we define the database as a sequence of datasets $\{D_1, D_2, \ldots, D_T\}$, where $D_t$ is the dataset of task $t$ and $T$ is the total number of tasks. Dataset $D_t = \{(W_i^t, y_i^t)\}_{i=1}^{N_t}$ contains $N_t$ labeled WSIs, where $y_i^t$ is the class label of WSI $W_i^t$. In the class-incremental CBHIR scenario, the model needs to return relevant WSI sequences for query WSIs from both old and new tasks. The universal training process for every task is illustrated in Fig. 1(III). When the data of a new task arrive, as described in Fig. 1(a-b), a mini-batch of training data and replay samples from the memory bank (see the sketch below) are input to the WSI encoder to obtain WSI representations. Then, the representations of the current task and previous tasks are combined to calculate retrieval-related losses, which improve the retrieval precision of the model. Meanwhile, we perform distance consistency rehearsal on the representations of old tasks to maintain the stability of the returned queues, as shown in Fig. 1(IV).
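As an illustration of how the memory bank and replay draws described above could be organized, here is a hedged Python sketch; the class name, capacity, and sampling interface are our assumptions, with the buffer filled by reservoir sampling as in [24]:

```python
import random
from typing import List, Tuple
import torch

class MemoryBank:
    """Slide-level buffer filled by reservoir sampling [24] (a sketch; the
    class name and interface are our assumptions, not the authors' code)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data: List[Tuple[torch.Tensor, int]] = []  # (feature cube, label)
        self.n_seen = 0  # how many instances have streamed past the bank

    def add(self, cube: torch.Tensor, label: int) -> None:
        self.n_seen += 1
        if len(self.data) < self.capacity:
            self.data.append((cube, label))
        else:
            # Every streamed instance is kept with probability capacity/n_seen,
            # so the buffer stays a uniform sample of all tasks seen so far.
            j = random.randrange(self.n_seen)
            if j < self.capacity:
                self.data[j] = (cube, label)

    def sample(self, k: int):
        """Draw a replay mini-batch; the returned indices are kept so the
        stored target distance matrix can be sliced later (Sec. 2.2)."""
        idx = random.sample(range(len(self.data)), min(k, len(self.data)))
        return idx, [self.data[i] for i in idx]
```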
2.2 Distance Consistency Rehearsal
As outlined in Section 1, we design the distance consistency rehearsal (DCR) module to maintain the stability of the result queues for old tasks, as detailed in Fig. 1(IV) and Algorithm 1.
In practical scenarios, the stability of a CBHIR system is demonstrated by the consistency of the result queues for old tasks. In terms of the feature space, this stability is reflected by maintaining constant distances between instances of old tasks even after learning new tasks. Based on this prior, we propose distance consistency rehearsal (DCR), which is achieved by minimizing the difference between the distances among the representations of the current replay samples and the corresponding distance matrix saved in the memory bank. The rehearsal process is elaborated as follows.
First, we construct the target distance matrix of the instances saved in the memory bank. At the end of task $t$, we execute the reservoir sampling algorithm [24] on dataset $D_t$ and the memory bank to obtain sampled feature cubes $F^{t}$ and $F^{m}$ for the memory bank of task $t$, where $n_t$ and $n_m$ denote the sampled numbers of $F^{t}$ and $F^{m}$, respectively. Then, the sampled feature cubes are fed into the task encoder to obtain feature representations $H^{t} \in \mathbb{R}^{n_t \times d}$ and $H^{m} \in \mathbb{R}^{n_m \times d}$, where $d$ is the dimension of the whole slide representation. With the combination of $H^{t}$ and $H^{m}$, denoted as $H$, we calculate the Euclidean distance between every pair of elements in the set $H$ to obtain the target distance matrix $M$, which is formulated as

$$M_{i,j} = \left\| h_i - h_j \right\|_2, \tag{1}$$

where $i, j \in \{1, \ldots, N\}$, $N = n_t + n_m$ denotes the total number of instances sampled at the end of task $t$, and $h_i$ and $h_j$ represent the $i$-th and $j$-th sampled instances.
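In PyTorch, Eq. (1) amounts to a pairwise Euclidean distance computation; a minimal sketch (assuming the representations are stacked row-wise) is:

```python
import torch

def distance_matrix(reps: torch.Tensor) -> torch.Tensor:
    """Eq. (1): pairwise Euclidean distances between the N sampled slide
    representations, stacked row-wise as an (N, d) tensor."""
    return torch.cdist(reps, reps, p=2)  # the (N, N) target matrix M

# M is computed once when the memory bank of task t is frozen and is stored
# alongside the sampled feature cubes for rehearsal in later tasks.
```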
Then, distance consistency rehearsal is applied to the current task. In every mini-batch of the current dataset, we obtain the representations of the current data and of the instances sampled from the memory bank, denoted as $H^{c}$ and $\hat{H}^{m}$. For $\hat{H}^{m}$, the distance matrix $\hat{M}$ is obtained in the same way as in Eq. (1) and as illustrated in Algorithm 1. To maintain the distance consistency of the rehearsal samples, we minimize the Mean Squared Error (MSE) loss between the distance matrices of the current replay samples and the stored targets:

$$\mathcal{L}_{dc} = \frac{1}{m^2} \sum_{i=1}^{m} \sum_{j=1}^{m} \left( \hat{M}_{i,j} - \tilde{M}_{i,j} \right)^2, \tag{2}$$

where $m$ denotes the total number of feature cubes sampled from the memory bank and $\tilde{M}$ is the sub-matrix sampled from $M$ according to the indexes of $\hat{H}^{m}$.
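A sketch of the DCR loss of Eq. (2) follows, assuming the indices of the replayed instances within the memory bank are tracked so the stored target matrix can be sliced accordingly:

```python
import torch
import torch.nn.functional as F

def dcr_loss(replay_reps: torch.Tensor,
             target_M: torch.Tensor,
             replay_idx: list) -> torch.Tensor:
    """Eq. (2): MSE between current and stored replay-sample distances.

    replay_reps: (m, d) representations of the replayed cubes under the
        current encoder.
    target_M:    (N, N) distance matrix saved with the memory bank.
    replay_idx:  positions of the replayed instances inside the bank, used
        to slice the matching (m, m) sub-matrix out of target_M.
    """
    cur_M = torch.cdist(replay_reps, replay_reps, p=2)  # distances now
    sub_M = target_M[replay_idx][:, replay_idx]         # distances at save time
    return F.mse_loss(cur_M, sub_M)
```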
2.3 Objective for LWSR
Above all, we design the distance consistency loss to maintain the queue stability of a continual CBHIR system. Meanwhile, the pair-wise loss and the cross-entropy loss have been proven effective in image retrieval tasks [28, 29]. As a result, the model is optimized in an end-to-end fashion by

$$\mathcal{L} = \mathcal{L}_{pair} + \mathcal{L}_{ce}(z, y) + \lambda \mathcal{L}_{dc}, \tag{3}$$

where $\mathcal{L}_{pair}$ is the pair-wise loss, $\mathcal{L}_{ce}$ is the cross-entropy loss, $\mathcal{L}_{dc}$ is the distance consistency loss, $y$ and $z$ are the ground-truth labels and output logits of the input instances, and $\lambda$ is a balancing factor. Through Eq. (3), we attempt not only to improve the retrieval precision of LWSR but also to maintain queue consistency as much as possible.
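Putting the pieces together, one training step could combine the three terms as below. This reuses `dcr_loss` from the sketch above; since the paper does not spell out the exact form of the pair-wise loss, a contrastive loss with margin 1.0 is used as a stand-in, and `model` is assumed to map a batch of feature cubes to (logits, slide representations):

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, cubes, labels,
                  replay_cubes, replay_labels, target_M, replay_idx,
                  lambda_dc=0.01):
    """One LWSR update combining the three terms of Eq. (3) (a sketch)."""
    # Current-task and replayed cubes share the encoder in one forward pass.
    all_cubes = torch.cat([cubes, replay_cubes], dim=0)
    all_labels = torch.cat([labels, replay_labels], dim=0)
    logits, reps = model(all_cubes)
    replay_reps = reps[len(cubes):]                    # replayed slides only

    # Pair-wise loss sketch: pull same-label representations together,
    # push different-label ones apart (illustrative form, margin = 1.0).
    dists = torch.cdist(reps, reps, p=2)
    same = (all_labels[:, None] == all_labels[None, :]).float()
    pair = (same * dists + (1.0 - same) * F.relu(1.0 - dists)).mean()

    loss = (pair
            + F.cross_entropy(logits, all_labels)
            + lambda_dc * dcr_loss(replay_reps, target_M, replay_idx))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```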
3 Experiments and Results
To comprehensively evaluate our proposed method, we construct a sequential WSI retrieval dataset, the details of which are given in Table 1. The sequential dataset contains four datasets from The Cancer Genome Atlas (TCGA): non-small cell lung carcinoma (NSCLC), renal cell carcinoma (RCC), invasive breast carcinoma (BRCA), and gastrointestinal adenocarcinoma (GAST). We define the continual histopathology image database as a dynamically growing database. Each time, one dataset containing two classes of cancer is added to the database, which can be seen as a class-incremental retrieval scenario. The arrival sequence of the continual database is NSCLC, RCC, BRCA, and GAST.
We used the pre-trained encoder PLIP [14] to extract patch features at 20× magnification. TransMIL [22] was utilized as the WSI encoder (shown in Fig. 1(c)), as its effectiveness has been verified on various WSI analysis tasks [22, 25]. All the methods were implemented in Python 3.8 with PyTorch 1.12 and CUDA 12.2, and the experiments were run on a computer with two Nvidia GeForce RTX 4090 GPUs.
For the evaluation of retrieval precision, we report six metrics: slide-level mAP, R@3, and P@5 for the 8-class tumor type retrieval, and site mAP, site R@3, and site P@5 for the 4-anatomic-site retrieval. Furthermore, to evaluate the stability of our method, we adopt two statistical correlation metrics, Spearman's rank correlation coefficient (SRC) [23] and Kendall's rank correlation coefficient (KRC) [15], to assess the consistency of the returned queues for old tasks.
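For reference, both consistency metrics can be computed with SciPy; the sketch below (our illustration, not the paper's evaluation script) compares the result queue of an old-task query before and after an update:

```python
from scipy.stats import kendalltau, spearmanr

def queue_consistency(queue_before, queue_after):
    """SRC/KRC between the result queues returned for the same old-task query
    before and after learning a new task (a sketch; assumes both queues rank
    the same candidate slide ids)."""
    rank_after = {sid: r for r, sid in enumerate(queue_after)}
    ranks_before = list(range(len(queue_before)))
    ranks_after = [rank_after[sid] for sid in queue_before]
    src, _ = spearmanr(ranks_before, ranks_after)
    krc, _ = kendalltau(ranks_before, ranks_after)
    return src, krc

# A perfectly preserved queue gives SRC = KRC = 1.0, e.g.
# queue_consistency(["s1", "s2", "s3"], ["s1", "s2", "s3"])  ->  (1.0, 1.0)
```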
Table 1. Details of the sequential TCGA retrieval dataset.

| Dataset | Anatomic site | Tumor type | Cases | Total Cases |
|---|---|---|---|---|
| NSCLC | Pulmonary | Lung adenocarcinoma (LUAD) | 496 | 995 |
| | | Lung squamous cell carcinoma (LUSC) | 499 | |
| RCC | Urinary | Clear cell renal cell carcinoma (KIRC) | 492 | 779 |
| | | Papillary renal cell carcinoma (KIRP) | 287 | |
| BRCA | Breast | Invasive ductal carcinoma (IDC) | 794 | 998 |
| | | Invasive lobular carcinoma (ILC) | 204 | |
| GAST | Gastrointestinal | Colon adenocarcinoma (COAD) | 369 | 731 |
| | | Stomach adenocarcinoma (STAD) | 362 | |
| Total Count | | | | 3053 |
3.1 Model Verification
We first trained a model with all datasets under full supervision as the upper bound, identified as JointTrain in Table 2. Subsequently, the model was trained sequentially on the four datasets as the lower bound, identified as Finetune. According to Table 2, the model subjected to Finetune exhibits the poorest performance, indicating that catastrophic forgetting occurs as the database size increases. In contrast, our method significantly mitigates catastrophic forgetting, performing nearly as well as the JointTrain approach. Although our method does not achieve as high an mAP as JointTrain, it delivers comparable R@3 and P@5 values, which means pathologists can efficiently receive diagnostically relevant cases by assessing a few top-returned results.

DCR is designed to maintain the distance relationships between cases from old retrieval tasks, which provides finer guidance to the model in learning the similarities between WSIs than the common pair-wise loss. As highlighted in Table 2 and illustrated in Fig. 2, DCR markedly improves fine-grained retrieval, as demonstrated by enhancements in both the precision and consistency metrics.

Moreover, we combined Breakup-Reorganize (BuRo) rehearsal, the main component of ConSlide [13], with our framework, presented as LWSR w/ BuRo in Table 2. BuRo achieves data augmentation by randomly selecting and combining patches from WSIs of the same category. However, for retrieval tasks, the representations are expected to describe the complete semantic information of WSIs, and the virtual cases created by BuRo mislead the retrieval model in describing real WSIs. It can be clearly observed that the BuRo rehearsal method performs poorly in the WSI retrieval task, with an mAP of 0.563, showing a significant performance gap from both LWSR w/o DCR and the full LWSR.
Table 2. Verification of the proposed framework. mAP, R@3, P@5, site mAP, site R@3, and site P@5 are retrieval precision metrics; SRC and KRC are consistency metrics.

| Methods | mAP | R@3 | P@5 | site mAP | site R@3 | site P@5 | SRC | KRC |
|---|---|---|---|---|---|---|---|---|
| JointTrain | 0.908 | 0.914 | 0.903 | 0.833 | 0.987 | 0.983 | - | - |
| Finetune | 0.426 | 0.877 | 0.635 | 0.557 | 0.965 | 0.824 | 0.408 | 0.295 |
| LWSR w/ BuRo | 0.563 | 0.940 | 0.809 | 0.628 | 0.988 | 0.948 | 0.687 | 0.535 |
| LWSR w/o DCR | 0.789 | 0.938 | 0.861 | 0.757 | 0.988 | 0.974 | 0.852 | 0.707 |
| LWSR | 0.796 | 0.960 | 0.871 | 0.804 | 0.997 | 0.990 | 0.878 | 0.736 |
3.2 Comparison with Other Continual Learning Frameworks
We compare our method with several classic continual learning approaches, including the regularization-based methods LwF [17] and EWC [16] and the replay-based methods ER-ACE [2], A-GEM [4], and DER++ [1].
Table 3. Comparison with classic continual learning methods under different buffer sizes. mAP, R@3, P@5, site mAP, site R@3, and site P@5 are retrieval precision metrics; SRC and KRC are consistency metrics.

| Methods | mAP | R@3 | P@5 | site mAP | site R@3 | site P@5 | SRC | KRC |
|---|---|---|---|---|---|---|---|---|
| *Regularization-based* | | | | | | | | |
| LwF [17] | 0.432 | 0.895 | 0.647 | 0.560 | 0.965 | 0.829 | 0.473 | 0.339 |
| EWC [16] | 0.437 | 0.881 | 0.647 | 0.573 | 0.970 | 0.836 | 0.440 | 0.319 |
| *Replay-based (buffer size 5 WSIs)* | | | | | | | | |
| ER-ACE [2] | 0.672 | 0.931 | 0.815 | 0.685 | 0.985 | 0.954 | 0.818 | 0.687 |
| A-GEM [4] | 0.439 | 0.897 | 0.687 | 0.597 | 0.972 | 0.850 | 0.659 | 0.507 |
| DER++ [1] | 0.655 | 0.910 | 0.786 | 0.728 | 0.985 | 0.948 | 0.835 | 0.696 |
| LWSR | 0.773 | 0.945 | 0.859 | 0.788 | 0.998 | 0.979 | 0.863 | 0.723 |
| *Replay-based (buffer size 10 WSIs)* | | | | | | | | |
| ER-ACE [2] | 0.733 | 0.930 | 0.836 | 0.761 | 0.981 | 0.964 | 0.854 | 0.716 |
| A-GEM [4] | 0.496 | 0.905 | 0.711 | 0.652 | 0.971 | 0.882 | 0.635 | 0.510 |
| DER++ [1] | 0.680 | 0.927 | 0.799 | 0.749 | 0.987 | 0.949 | 0.839 | 0.695 |
| LWSR | 0.796 | 0.960 | 0.871 | 0.804 | 0.997 | 0.990 | 0.878 | 0.736 |
| *Replay-based (buffer size 15 WSIs)* | | | | | | | | |
| ER-ACE [2] | 0.746 | 0.925 | 0.835 | 0.732 | 0.985 | 0.955 | 0.849 | 0.705 |
| A-GEM [4] | 0.489 | 0.887 | 0.696 | 0.668 | 0.972 | 0.873 | 0.707 | 0.563 |
| DER++ [1] | 0.693 | 0.924 | 0.799 | 0.739 | 0.985 | 0.943 | 0.852 | 0.714 |
| LWSR | 0.808 | 0.931 | 0.855 | 0.826 | 0.992 | 0.983 | 0.866 | 0.715 |
According to Table 3, the regularization-based methods fail to effectively alleviate catastrophic forgetting and achieve only a slight improvement over Finetune. Compared to the other replay-based continual learning methods, our method consistently achieves the best performance under various settings. The main reason is that these methods were originally proposed to solve catastrophic forgetting in the classification domain and do not take the interaction between the current task and previous tasks into account. Compared with the second-best methods under a buffer size of 10 WSIs, LWSR achieves increases of 6.3%/3.0%/3.5% in mAP, R@3, and P@5, and increases of 4.3%/1.6%/2.6% in site mAP, site R@3, and site P@5. This demonstrates the effectiveness of the proposed method for the task of lifelong CBHIR. Furthermore, as the buffer size grows, our method achieves further improvements. It is also apparent that the performance gain from expanding the buffer from 10 to 15 WSIs is less pronounced than that from enlarging it from 5 to 10. A 10-WSI buffer balances the diversity between current and previous tasks, helping the model achieve high retrieval precision after sequential learning, whereas a 15-WSI buffer may overly emphasize past tasks at the expense of the current one, leading to decreased focus on the present task.

4 Conclusion
In this paper, we have proposed a novel Lifelong Whole Slide Retrieval (LWSR) framework to address the challenge of catastrophic forgetting caused by progressive model updating on a continuously growing retrieval database. The experiments on a large sequential retrieval dataset have proven that LWSR is effective in the scenario of class-incremental retrieval tasks and achieves state-of-the-art overall performance.
4.0.1 Acknowledgements
This work was partly supported by Beijing Natural Science Foundation (Grant No. 7242270), partly supported by the National Natural Science Foundation of China (Grant No. 62171007, 61901018, and 61906058), and partly supported by the Fundamental Research Funds for the Central Universities of China (grant No. YWF-23-Q-1075 and JZ2022HGTB0285).
4.0.2 Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
References
- [1] Buzzega, P., Boschini, M., Porrello, A., Abati, D., Calderara, S.: Dark experience for general continual learning: a strong, simple baseline. Advances in neural information processing systems 33, 15920–15930 (2020)
- [2] Caccia, L., Aljundi, R., Asadi, N., Tuytelaars, T., Pineau, J., Belilovsky, E.: New insights on reducing abrupt representation change in online continual learning. arXiv preprint arXiv:2104.05025 (2021)
- [3] Chaudhry, A., Dokania, P.K., Ajanthan, T., Torr, P.H.: Riemannian walk for incremental learning: Understanding forgetting and intransigence. In: Proceedings of the European conference on computer vision (ECCV). pp. 532–547 (2018)
- [4] Chaudhry, A., Ranzato, M., Rohrbach, M., Elhoseiny, M.: Efficient lifelong learning with a-gem. arXiv preprint arXiv:1812.00420 (2018)
- [5] Chen, C., Lu, M.Y., Williamson, D.F., Chen, T.Y., Schaumberg, A.J., Mahmood, F.: Fast and scalable search of whole-slide images via self-supervised deep learning. Nature Biomedical Engineering 6(12), 1420–1434 (2022)
- [6] Chen, R.J., Chen, C., Li, Y., Chen, T.Y., Trister, A.D., Krishnan, R.G., Mahmood, F.: Scaling vision transformers to gigapixel images via hierarchical self-supervised learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 16144–16155 (June 2022)
- [7] Chen, R.J., Krishnan, R.G.: Self-supervised vision transformers learn visual concepts in histopathology. Learning Meaningful Representations of Life, NeurIPS 2021 (2021)
- [8] Chen, R.J., Lu, M.Y., Williamson, D.F., Chen, T.Y., Lipkova, J., Shaban, M., Shady, M., Williams, M., Joo, B., Noor, Z., et al.: Pan-cancer integrative histology-genomic analysis via multimodal deep learning. Cancer Cell (2022)
- [9] De Lange, M., Aljundi, R., Masana, M., Parisot, S., Jia, X., Leonardis, A., Slabaugh, G., Tuytelaars, T.: A continual learning survey: Defying forgetting in classification tasks. IEEE transactions on pattern analysis and machine intelligence 44(7), 3366–3385 (2021)
- [10] Douillard, A., Chen, Y., Dapogny, A., Cord, M.: Plop: Learning without forgetting for continual semantic segmentation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 4040–4050 (2021)
- [11] French, R.M.: Catastrophic forgetting in connectionist networks. Trends in cognitive sciences 3(4), 128–135 (1999)
- [12] Goodfellow, I.J., Mirza, M., Xiao, D., Courville, A., Bengio, Y.: An empirical investigation of catastrophic forgetting in gradient-based neural networks. arXiv preprint arXiv:1312.6211 (2013)
- [13] Huang, Y., Zhao, W., Wang, S., Fu, Y., Jiang, Y., Yu, L.: ConSlide: Asynchronous hierarchical interaction transformer with breakup-reorganize rehearsal for continual whole slide image analysis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 21349–21360 (2023)
- [14] Huang, Z., Bianchi, F., Yuksekgonul, M., Montine, T.J., Zou, J.: A visual-language foundation model for pathology image analysis using medical twitter. Nature Medicine pp. 1–10 (2023)
- [15] Kendall, M.G.: A new measure of rank correlation. Biometrika 30(1/2), 81–93 (1938)
- [16] Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A.A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., et al.: Overcoming catastrophic forgetting in neural networks. Proceedings of the national academy of sciences 114(13), 3521–3526 (2017)
- [17] Li, Z., Hoiem, D.: Learning without forgetting. IEEE transactions on pattern analysis and machine intelligence 40(12), 2935–2947 (2017)
- [18] Lopez-Paz, D., Ranzato, M.: Gradient episodic memory for continual learning. Advances in neural information processing systems 30 (2017)
- [19] Lu, M.Y., Williamson, D.F., Chen, T.Y., Chen, R.J., Barbieri, M., Mahmood, F.: Data-efficient and weakly supervised computational pathology on whole-slide images. Nature Biomedical Engineering 5(6), 555–570 (2021)
- [20] Maracani, A., Michieli, U., Toldo, M., Zanuttigh, P.: Recall: Replay-based continual learning in semantic segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 7026–7035 (2021)
- [21] Rebuffi, S.A., Kolesnikov, A., Sperl, G., Lampert, C.H.: icarl: Incremental classifier and representation learning. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition. pp. 2001–2010 (2017)
- [22] Shao, Z., Bian, H., Chen, Y., Wang, Y., Zhang, J., Ji, X., et al.: Transmil: Transformer based correlated multiple instance learning for whole slide image classification. Advances in Neural Information Processing Systems 34, 2136–2147 (2021)
- [23] Spearman, C.: The proof and measurement of association between two things. The American journal of psychology 100(3/4), 441–471 (1987)
- [24] Vitter, J.S.: Random sampling with a reservoir. ACM Transactions on Mathematical Software (TOMS) 11(1), 37–57 (1985)
- [25] Wang, X., Yang, S., Zhang, J., Wang, M., Zhang, J., Yang, W., Huang, J., Han, X.: Transformer-based unsupervised contrastive learning for histopathological image classification. Medical image analysis 81, 102559 (2022)
- [26] Wang, Z., Zhang, Z., Lee, C.Y., Zhang, H., Sun, R., Ren, X., Su, G., Perot, V., Dy, J., Pfister, T.: Learning to prompt for continual learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 139–149 (2022)
- [27] Yan, S., Xie, J., He, X.: Der: Dynamically expandable representation for class incremental learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 3014–3023 (2021)
- [28] Zheng, Y., Jiang, Z., Shi, J., Xie, F., Zhang, H., Huai, J., Cao, M., Yang, X.: Diagnostic regions attention network (dranet) for histopathology wsi recommendation and retrieval. IEEE Transactions on Medical Imaging (2020). doi:10.1109/TMI.2020.3046636
- [29] Zheng, Y., Jiang, Z., Zhang, H., Xie, F., Shi, J.: Tracing diagnosis paths on histopathology wsis for diagnostically relevant case recommendation. In: Medical Image Computing and Computer-Assisted Intervention. pp. 459–469 (2020). doi:10.1007/978-3-030-59722-1_44
Supplementary Material of Lifelong Histopathology Whole Slide Image Retrieval via Distance Consistency Rehearsal
Xinyu Zhu, Zhiguo Jiang, Kun Wu, Jun Shi, Yushan Zheng ✉
5 Evaluation metrics
| Metric Name | Definition | Description |
|---|---|---|
| Spearman's Rank Correlation Coefficient | $\rho = 1 - \dfrac{6\sum_{i} d_i^2}{n(n^2-1)}$ | $d_i$ is the difference between the ranks of the retrieval sequence of an old task before and after subsequent tasks' training, and $n$ is the number of instances in each sequence. |
| Kendall's Rank Correlation Coefficient | $\tau = \dfrac{N_c - N_d}{n(n-1)/2}$ | $N_c$ is the number of concordant pairs of the retrieval sequence of an old task before and after subsequent tasks' training, $N_d$ is the number of discordant pairs, and $n$ is the number of instances in each sequence. |
6 Implementation details
Hyperparameter Name | Hyperparameter Value
---|---
Patch Sampling Number per WSI | 2048 |
Pair-wise Loss Weight | 1.0 |
Cross-entropy Loss Weight | 1.0 |
Distance Consistency Loss Weight | 0.01 |
Learning Rate | 1e-5 |
Buffer Size | 100, 200, 300 |
Batch Size | 10 |
Minibatch Size | 30 |
Number of Epochs | 70 |
Optimizer | Adam |
Scheduler | StepLR |
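For illustration, the listed optimizer and scheduler settings could be wired up as follows; the WSI encoder is replaced by a stand-in module, and StepLR's step size and decay factor are not reported in the paper, so the values below are placeholders:

```python
import torch

# Stand-in for the TransMIL WSI encoder; only the optimizer/scheduler setup
# is meant to reflect the table above.
model = torch.nn.Linear(512, 8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)

for epoch in range(70):
    # ... iterate over mini-batches of the current task plus replay samples ...
    scheduler.step()
```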