
Non-Exemplar Online Class-incremental Continual Learning via
Dual-prototype Self-augment and Refinement
——Appendix——

Fushuo Huo1, Wenchao Xu1, Jingcai Guo1,2, Haozhao Wang3 (Corresponding author), Yunfeng Fan1

Overview

The appendix presents additional experimental settings, results, and analyses as follows:

Appendix A: More implementation details.

Appendix B: More results on different dataset partitions.

Appendix C: Results and analysis on different base session training strategies.

Appendix D: Hyperparameter analysis.

Appendix E: Computation overhead analysis.

Appendix A: More Implementation Details

Dataset Overview. We conduct experiments on three widely used datasets: CORE-50 (Lomonaco and Maltoni 2017), CIFAR100 (Krizhevsky and Hinton 2009), and Mini-ImageNet (Vinyals et al. 2016). Here we give brief introductions. CORE-50 is a benchmark designed for class-incremental learning with 50 classes. Each class has around 2,398 training images and 900 testing images of size $3\times 128\times 128$. CIFAR100 contains 60,000 images of size $32\times 32$ from 100 classes; each class includes 500 training images and 100 testing images. Mini-ImageNet contains 100 classes and is divided into 10 sub-datasets for 10 disjoint tasks, each containing 10 classes. Each task comprises 5,000 training images and 1,000 testing images, all of size $3\times 84\times 84$.

Training Details. For OCL methods, we employ the same dataset partitions and training protocols as NO-CL, i.e., pre-training on the base classes and then online class-incremental learning with exemplar buffers. Other hyperparameters are kept at their default values. The exemplar buffers are stored and retrieved throughout the training procedure with the default updating pipeline. MIR (Aljundi et al. 2019), GDumb (Prabhu, Torr, and Dokania 2020), ASER (Shim et al. 2021), SCR (Mai et al. 2021), and DVC (Gu et al. 2022) are based on the OCL codebase (https://github.com/RaptorMai/online-continual-learning); the remaining methods are implemented with their publicly released code. For the FS-CL methods FACT (Zhou et al. 2022a) and ALICE (Peng et al. 2022), we also adopt the same protocols as NO-CL. The prototypes of novel classes are computed from all data samples rather than few-shot samples. During the inference phase, FACT and ALICE directly classify incremental data samples via the computed prototypes without finetuning the network. For NE-CL methods (Zhu et al. 2021b, 2022), as (Zhu et al. 2022) does not provide training scripts, we adopt the third-party code (Zhou et al. 2022b, https://github.com/G-U-N/PyCIL) on the CIFAR100 dataset. For (Zhu et al. 2021b), we employ the same dataset partitions and training and testing protocols as NO-CL. Note that all methods employ the same reduced ResNet-18 as the feature extractor for fair comparison. All experiments are conducted on an NVIDIA RTX3090 GPU with CUDA 11.4 using the PyTorch framework.
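For reference, a minimal sketch of this prototype-based inference is given below. It assumes generic feature tensors and nearest-class-mean classification; the function names are ours for illustration and do not come from the released code.

```python
import torch

@torch.no_grad()
def compute_prototypes(features, labels, num_classes):
    # Class prototype = mean feature over all samples of that class
    # (all available samples, not a few-shot subset).
    return torch.stack([features[labels == c].mean(dim=0)
                        for c in range(num_classes)])

@torch.no_grad()
def prototype_predict(query_features, prototypes):
    # Nearest-class-mean inference: assign each query to the class
    # whose prototype is closest in Euclidean distance.
    dists = torch.cdist(query_features, prototypes)  # (N, C)
    return dists.argmin(dim=1)
```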

Base Session Training Details. Here, we give the details of our base session training strategy. We apply base training losses to the outputs of the feature extractor and the projection module to obtain vanilla and high-dimensional prototypes for the sequential online sessions: $L^{base}=L^{base}_{vp}+L^{base}_{hp}$, where $L^{base}_{vp}=Loss(Proj_{vp}(\theta_{1}(x)),y)$ and $L^{base}_{hp}=Loss(Proj_{hp}(\theta_{2}(\theta_{1}(x))),y)$. Here $x$, $y$, $\theta_{1}$, and $\theta_{2}$ denote input samples, labels, the feature extractor, and the projection module, respectively. $Proj_{vp/hp}$ are linear layers that align the vanilla and high-dimensional prototypes for loss calculation. For the cross-entropy (CE) loss, $Proj_{vp/hp}$ are one-layer MLPs whose output dimension equals the number of base classes. For the supervised contrastive (SC) loss (Khosla et al. 2020), we follow SCR (Mai et al. 2021) and adopt its hyperparameters: $Proj_{vp/hp}$ are two-layer MLPs with dimensions of 160 and 128, and the temperature is set to 0.1.
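A minimal PyTorch sketch of the CE variant of this objective is given below. The module structure is illustrative, not the released implementation: we assume a single linear layer for the projection module $\theta_{2}$, pass the feature extractor in as a generic backbone, and pick example dimensions (hd_dim=2048 follows Table 5).

```python
import torch.nn as nn
import torch.nn.functional as F

class BaseSessionModel(nn.Module):
    # Sketch of the base-session objective L^base = L^base_vp + L^base_hp.
    def __init__(self, backbone, feat_dim=160, hd_dim=2048, num_base=60):
        super().__init__()
        self.theta1 = backbone                         # feature extractor
        self.theta2 = nn.Linear(feat_dim, hd_dim)      # projection module (assumed linear)
        self.proj_vp = nn.Linear(feat_dim, num_base)   # Proj_vp: one-layer head
        self.proj_hp = nn.Linear(hd_dim, num_base)     # Proj_hp: one-layer head

    def base_loss(self, x, y):
        f = self.theta1(x)                             # vanilla features
        h = self.theta2(f)                             # high-dimensional features
        loss_vp = F.cross_entropy(self.proj_vp(f), y)  # L^base_vp
        loss_hp = F.cross_entropy(self.proj_hp(h), y)  # L^base_hp
        return loss_vp + loss_hp                       # L^base
```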

Appendix B: Results on Different Dataset Partitions

Due to space constraints in the main paper, in this subsection we report additional results on different dataset partitions. Concretely, as shown in Tables 1 and 2, we conduct experiments with the configurations $40\%+6\%\times 10$ and $80\%+2\%\times 10$, where $40\%$ and $80\%$ of the classes are selected as base classes and the remaining classes are continually fed to the network over 10 sessions. Moreover, a setting with more incremental sessions (i.e., 20 sessions), $60\%+2\%\times 20$, is compared in Table 3. Five representative state-of-the-art methods are compared under the same training and inference protocols as Non-exemplar Online Class-incremental continual Learning (NO-CL), including Online Class-incremental continual Learning (OCL) methods (i.e., SCR (Mai et al. 2021), OCM (Guo, Liu, and Zhao 2022), and DVC (Gu et al. 2022)), a Non-Exemplar Class-incremental continual Learning (NE-CL) method (i.e., PASS (Zhu et al. 2021b)), and a Few-Shot Class-incremental Learning (FS-CL) method (i.e., ALICE (Peng et al. 2022)). As Tables 1 and 2 show, fewer base classes result in poorer performance on both base and novel classes, because the network lacks sufficient pre-trained information to generalize to novel classes. Meanwhile, since our method depends on the inner-prototype computed by the pre-trained backbone, a well-trained backbone substantially benefits prototype refinement. Notably, even with only $40\%$ base classes, our method still achieves the best performance in the Acc and HM metrics, which validates the robustness of the prototype refinement strategies. Moreover, with more incremental sessions in Table 3, the performance drops only slightly. Overall, experiments on different dataset partitions validate the effectiveness and robustness of our method.
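For clarity, the class partitions used above can be generated as in the following sketch (the helper name and the fixed class ordering are our assumptions):

```python
def make_partition(num_classes=100, base_frac=0.4, num_sessions=10):
    # e.g. 40% + 6% x 10: 40 base classes, then 6 novel classes per session.
    num_base = int(num_classes * base_frac)
    per_session = (num_classes - num_base) // num_sessions
    base_classes = list(range(num_base))
    sessions = [list(range(num_base + s * per_session,
                           num_base + (s + 1) * per_session))
                for s in range(num_sessions)]
    return base_classes, sessions

# 40% + 6% x 10 on a 100-class dataset: 40 base classes, 10 sessions of 6.
base_classes, sessions = make_partition(100, 0.4, 10)
```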

Methods CORE-50 CIFAR100 Mini-ImageNet
Metrics Acc(base/novel)||HM Acc(base/novel)||HM Acc(base/novel)||HM
ALICE 35.0(49.3/25.5)||33.6 37.1(59.2/22.3)||32.4 36.5(58.6/21.8)||31.8
PASS 25.2(62.8/0.2)||0.4 26.4(65.1/0.6)||1.2 25.8(63.4/0.8)||1.6
MS 1000 2000 1000 2000 1000 2000
SCR 37.3(36.9/37.5)||37.2 38.7(38.9/38.6)||38.7 32.7(34.2/31.7)||32.9 34.9(36.9/33.6)||35.2 29.8(29.1/30.3)||29.7 36.3(39.2/34.4)||36.6
SCRft 34.2(45.2/26.8)||33.6 37.7(49.8/29.6)||37.1 30.9(45.2/21.3)||21.3 36.4(50.8/26.8)||35.1 32.9(43.2/26.1)||32.5 36.8(50.8/27.5)||35.7
OCM 37.6(37.5/37.6)||37.5 39.5(40.5/38.9)||39.7 31.6(32.9/30.8)||31.8 37.1(36.1/37.8)||36.9 31.3(30.8/31.6)||31.2 31.7(32.9/30.9)||31.9
OCMft 35.8(43.1/30.9)||36.0 36.7(46.1/30.6)||36.7 32.4(48.2/21.8)||30.1 35.9(50.6/26.1)||36.8 30.2(38.1/24.9)||31.4 37.7(51.2/28.7)||36.8
DVC 37.5(36.9/37.9)||37.3 38.7(39.8/38.0)||38.9 29.9(30.0/29.8)||29.9 37.2(34.6/39.0)||36.6 29.7(29.9/29.6)||29.7 33.3(35.6/31.8)||33.6
DVCft 36.7(45.8/30.6)||36.7 37.9(45.1/33.1)||38.2 32.3(43.6/24.7)||31.5 36.5(46.5/29.8)||36.3 31.4(39.4/26.1)||31.3 32.6(38.5/28.6)||32.8
Ours 45.5(44.2/46.4)||45.3 38.6(43.8/35.2)||39.0 40.4(55.4/30.4)||39.3
Table 1: Quantitative analysis of the dataset partition $40\%+6\%\times 10$. Class-wise accuracy (Acc) at the end of training on all classes, base classes, and novel classes, together with the harmonic accuracy (HM), is reported. MS and ft denote the exemplar memory size and the finetuning variants, respectively. The best results are marked in bold.
Methods CORE-50 CIFAR100 Mini-ImageNet
Metrics Acc(base/novel)||HM Acc(base/novel)||HM Acc(base/novel)||HM
ALICE 44.7(48.2/30.6)||37.4 50.5(57.2/23.8)||33.6 50.3(56.4/26.1)||35.7
PASS 48.4(60.3/0.6)||1.2 50.4(62.8/0.8)||1.6 49.8(62.0/0.9)||1.8
MS 1000 2000 1000 2000 1000 2000
SCR 39.9(39.2/42.9)||40.9 42.9(43.5/40.8)||42.1 40.7(42.2/34.7)||38.1 46.3(47.2/42.8)||44.9 40.4(40.6/39.7)||40.1 44.5(43.8/47.3)||45.5
SCRft 45.4(48.9/31.2)||38.1 49.7(53.7/33.7)||54.6 47.4(50.6/34.8)||41.2 49.2(53.2/33.4)||41.0 40.2(45.2/32.8)||38.0 47.0(49.2/38.5)||43.5
OCM 43.4(43.8/42.2)||42.9 43.2(43.6/41.8)||42.7 39.9(39.8/40.6)||40.2 43.6(43.9/42.3)||43.1 39.5(38.9/41.8)||40.3 43.5(42.9/46.0)||44.3
OCMft 46.8(49.8/34.8)||40.9 47.8(49.8/39.7)||44.2 46.1(48.8/35.1)||40.8 48.7(51.8/36.3)||42.7 44.1(46.1/36.3)||40.6 46.3(48.6/37.2)||42.1
DVC 43.0(42.8/43.8)||43.3 45.2(45.8/42.8)||44.2 39.4(38.6/42.5)||40.5 40.1(39.7/41.9)||40.8 41.6(42.6/37.4)||39.8 45.4(45.6/44.8)||45.2
DVCft 48.0(50.3/39.0)||43.9 50.3(53.7/36.9)||43.7 41.1(41.6/38.9)||40.2 42.4(43.9/36.4)||39.8 45.3(48.6/31.9)||38.5 47.8(50.8/36.2)||42.2
Ours 55.2(55.6/53.7)||54.6 54.3(56.2/46.8)||51.1 52.2(52.6/50.8)||51.7
Table 2: Quantitative analysis of the dataset partition $80\%+2\%\times 10$. Class-wise accuracy (Acc) at the end of training on all classes, base classes, and novel classes, together with the harmonic accuracy (HM), is reported. MS and ft denote the exemplar memory size and the finetuning variants, respectively. The best results are marked in bold.
Methods CORE-50 CIFAR100 Mini-ImageNet
Metrics Acc(base/novel)||HM Acc(base/novel)||HM Acc(base/novel)||HM
ALICE 39.5(46.2/29.5)||36.0 42.5(53.5/25.9)||34.9 41.1(51.4/25.7)||34.3
PASS 35.2(58.3/0.8)||1.6 37.9(62.6/1.0)||2.0 37.2(61.2/1.1)||2.2
MS 1000 2000 1000 2000 1000 2000
SCR 38.6(37.2/40.6)||38.8 38.6(39.2/37.6)||38.4 36.9(38.8/34.1)||36.3 40.7(42.8/37.6)||40.0 34.4(34.6/34.2)||34.4 35.1(38.6/29.8)||33.6
SCRft 36.6(42.4/27.9)||33.7 41.3(49.8/28.6)||36.3 38.6(49.2/22.8)||31.2 41.1(52.8/23.7)||32.7 37.9(43.2/30.0)||35.4 39.5(43.7/33.2)||37.7
OCM 38.3(38.8/37.6)||38.2 41.1(42.1/39.6)||40.8 36.3(36.5/35.9)||36.2 41.4(42.7/39.6)||41.1 35.9(35.2/37.1)||36.1 39.5(43.7/33.2)||37.7
OCMft 37.7(43.1/29.7)||35.2 43.3(47.9/36.4)||41.4 39.7(45.7/30.7)||36.7 40.9(45.9/33.4)||38.7 36.8(39.8/32.2)||35.4 41.5(43.5/38.4)||40.7
DVC 38.1(37.8/38.6)||38.2 40.5(41.8/38.6)||40.1 37.5(37.2/38.1)||37.7 40.2(41.6/38.2)||39.8 34.4(32.9/36.7)||34.7 37.1(35.4/39.6)||37.4
DVCft 39.8(45.9/30.7)||36.8 41.8(46.3/35.0)||39.9 37.8(42.1/31.4)||35.9 40.6(44.8/34.3)||38.9 35.1(38.9/29.4)||33.4 37.7(38.7/36.1)||37.4
Ours 49.5(49.1/50.2)||49.6 47.6(51.6/41.7)||46.1 49.1(53.8/42.1)||47.2
Table 3: Quantitative analysis of the dataset partition $60\%+2\%\times 20$. Class-wise accuracy (Acc) at the end of training on all classes, base classes, and novel classes, together with the harmonic accuracy (HM), is reported. MS and ft denote the exemplar memory size and the finetuning variants, respectively. The best results are marked in bold.

Appendix C: Results and Analysis on Different Base Session Training Strategies

The stability-plasticity dilemma is a thorny problem in continual learning. To deal with it, previous NE-CL (Zhu et al. 2021b, a) and FS-CL (Peng et al. 2022; Kalla and Biswas 2022) methods employ self-supervised learning (Jing and Tian 2021) as well as class and data augmentation to learn task-agnostic and transferable representations. For the NO-CL problem, the base session training strategy also matters for this dilemma. For fair comparison, similar to (Mai et al. 2021; Gu et al. 2022; Guo, Liu, and Zhao 2022), we also employ supervised contrastive (SC) learning. Here, we evaluate two additional training strategies: adding an extra self-supervised learning loss (Lee, Hwang, and Shin 2020) (+SSL) as in (Zhu et al. 2021b; Kalla and Biswas 2022), and using the data augmentation strategy (+DA) proposed by (Zhu et al. 2021a; Peng et al. 2022); a sketch of the former is given below. The results in Table 4 show that these elaborately designed pre-training strategies improve accuracy on both base and novel classes. Therefore, developing more robust pre-training strategies is a promising direction for the proposed NO-CL problem.
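The sketch below illustrates the +SSL variant with a rotation-based label-augmentation loss in the spirit of (Lee, Hwang, and Shin 2020); the joint classification head and all names are illustrative assumptions rather than the exact released implementation.

```python
import torch
import torch.nn.functional as F

def ssl_label_augment_loss(backbone, joint_head, x, y):
    # Rotate each image by 0/90/180/270 degrees and predict the joint
    # (class, rotation) label with a (num_classes * 4)-way head.
    rotations = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]
    x_aug = torch.cat(rotations, dim=0)                        # (4B, C, H, W)
    y_joint = torch.cat([y * 4 + k for k in range(4)], dim=0)  # joint labels
    logits = joint_head(backbone(x_aug))                       # (4B, num_classes * 4)
    return F.cross_entropy(logits, y_joint)
```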

Ablations CIFAR100 Mini-ImageNet
Metrics Acc(base/novel) Acc(base/novel)
Ours(+CE) 45.8(50.0/39.6) 47.7(52.6/40.3)
Ours(+SC) 48.6(52.4/42.9) 50.7(56.1/42.6)
+SSL 50.4(53.8/45.2) 52.2(57.3/44.6)
+DA 51.2(55.7/44.6) 53.2(58.2/45.8)
Table 4: Results of different training strategies on CIFAR100 and Mini-ImageNet. SSL and DA denote self-supervised learning and data augmentation, respectively.
Figure 1: Quantitative results of varying $\lambda$.
Figure 2: Quantitative results of varying $T$.
Figure 3: Quantitative results of varying $K_{0}$, $K_{1}$, and $K_{2}$.
CIFAR100 Mini-ImageNet
Ablations Acc(base/novel)||HM Acc(base/novel)||HM
w/ 256 44.8(49.5/37.9)||42.9 47.4(53.6/38.1)||44.5
w/ 1024 47.4(51.8/40.8)||45.7 49.7(55.4/41.1)||47.2
w/ 2048 48.6(52.4/42.9)||47.2 50.7(56.1/42.6)||48.4
w/ 3074 48.4(52.3/42.7)||47.0 50.8(56.3/42.6)||48.4
Table 5: Quantitative results of varying the dimension of the hyperdimensional embedding.
Figure 4: Training losses of the bi-level optimization procedure on (a) CORE-50, (b) CIFAR100, and (c) Mini-ImageNet.
CIFAR100 Mini-ImageNet
Metrics ALICE SSRE SCR DVC OCM Ours ALICE SSRE SCR DVC OCM Ours
Time (s) 512 291 165 126 561 35 793 457 254 194 831 61
Memory (GB) 1.9 3.2 1.8 1.6 12.8 1.4 4.4 6.8 4.1 2.8 21.4 1.9
Table 6: Quantitative comparisons of computation overhead in terms of online training time and memory footprint.

Appendix D: Hyperparameter Analysis

Here, we analyze the hyperparameters, including the feature transform coefficient $\lambda$, the number of online iterations $T$, and the number of sampled prototypes $K$. We provide quantitative results on the Mini-ImageNet dataset in Figures 1, 2, and 3. In addition, experiments varying the dimension of the hyperdimensional embedding are reported in Table 5.

In Figure 1, we can see that $\lambda>1$ leads to degraded performance, as the feature distribution becomes more concentrated close to 0. Meanwhile, decreasing $\lambda$ too much makes the distribution scattered and less aligned with the calibrated Gaussian distribution. Therefore, we set $\lambda$ to 0.5 in our experiments.
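The exact transform is defined in the main paper; purely for intuition, the behavior described above is consistent with a Tukey-style power transform, sketched below under that assumption.

```python
import torch

def power_transform(features, lam=0.5, eps=1e-6):
    # Assumed Tukey-style power transform, for intuition only: for typical
    # sub-unit non-negative features, lam > 1 concentrates values near 0,
    # while a very small lam scatters them away from the calibrated Gaussian.
    return torch.pow(features.clamp(min=0) + eps, lam)
```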

In Figure 2, we vary the number of online iterations $T$. Too few iterations prevent the network from accommodating novel classes online by refining the hyperdimensional prototypes and aligning the projection module, while more iterations lead to only slight degradation, which validates the plasticity of our method. Therefore, we set $T$ to 20 to achieve a stability-plasticity trade-off.
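A simplified sketch of this online loop is given below; the prototype refinement objective is a stand-in placeholder, and the actual bi-level formulation is given in the main paper.

```python
import torch
import torch.nn.functional as F

def online_refine(proj, hd_protos, vanilla_protos, refine_loss, T=20, lr=1e-3):
    # Alternate for T iterations between (i) refining the hyperdimensional
    # prototypes and (ii) aligning the projection module to them; the
    # feature extractor stays frozen throughout.
    hd_protos = hd_protos.clone().requires_grad_(True)
    proto_opt = torch.optim.SGD([hd_protos], lr=lr)
    proj_opt = torch.optim.SGD(proj.parameters(), lr=lr)
    for _ in range(T):
        proto_opt.zero_grad()
        refine_loss(hd_protos).backward()   # stand-in refinement objective
        proto_opt.step()
        proj_opt.zero_grad()
        F.mse_loss(proj(vanilla_protos), hd_protos.detach()).backward()
        proj_opt.step()
    return hd_protos.detach(), proj
```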

In Figure 3, we vary the number of sampled prototypes $K$. Concretely, we vary the number of sampled prototypes of the base classes $K_{base}$, the novel classes $K_{novel}$, and all classes $K$. We can see that imbalanced sampling across classes leads to performance degradation, which resembles the class imbalance problem (Hou et al. 2019; Wu et al. 2019). Also, increasing $K$ brings few gains while inducing computation overhead. Therefore, we set $K=20$.

From Table 5, we can see that increasing the dimension of the hyperdimensional embedding benefits the proposed method, while an overly large dimension brings little gain. Therefore, we set the dimension of the hyperdimensional embedding to 2048 in our experiments.

Appendix E: Computation Overhead Analysis

Computation overhead during online learning is a common concern in OCL scenarios (Fini et al. 2020); we therefore provide quantitative comparisons with OCL, NE-CL, and FS-CL methods in Table 6. The batch size of exemplar-based methods is set to 10. Since we only align prototypes by finetuning the projection module, which is much more efficient than training the whole network, our method has clear advantages in computation overhead for online continual learning, as the sketch below illustrates. Meanwhile, the bi-level optimization converges quickly, as shown in Figure 4.
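This efficiency stems from updating only the lightweight projection module during online sessions; below is a minimal sketch, reusing the illustrative backbone and projection names from the Appendix A sketch.

```python
import torch

# Only the lightweight projection module is updated during online sessions;
# the backbone is frozen, keeping per-step time and memory low (cf. Table 6).
for p in backbone.parameters():
    p.requires_grad_(False)
optimizer = torch.optim.SGD(proj.parameters(), lr=0.01)
```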

References

  • Aljundi et al. (2019) Aljundi, R.; Belilovsky, E.; Tuytelaars, T.; Charlin, L.; Caccia, M.; Lin, M.; and Page-Caccia, L. 2019. Online Continual Learning with Maximal Interfered Retrieval. In NeurIPS.
  • Fini et al. (2020) Fini, E.; Lathuilière, S.; Sangineto, E.; Nabi, M.; and Ricci, E. 2020. Online Continual Learning under Extreme Memory Constraints. In ECCV, 720–735.
  • Gu et al. (2022) Gu, Y.; Yang, X.; Wei, K.; and Deng, C. 2022. Not Just Selection, but Exploration: Online Class-Incremental Continual Learning via Dual View Consistency. In CVPR, 7442–7451.
  • Guo, Liu, and Zhao (2022) Guo, Y.; Liu, B.; and Zhao, D. 2022. Online Continual Learning through Mutual Information Maximization. In ICML.
  • Hou et al. (2019) Hou, S.; Pan, X.; Loy, C. C.; Wang, Z.; and Lin, D. 2019. Learning a Unified Classifier Incrementally via Rebalancing. In CVPR.
  • Jing and Tian (2021) Jing, L.; and Tian, Y. 2021. Self-Supervised Visual Feature Learning With Deep Neural Networks: A Survey. IEEE TPAMI, 43(11): 4037–4058.
  • Kalla and Biswas (2022) Kalla, J.; and Biswas, S. 2022. S3C: Self-Supervised Stochastic Classifiers for Few-Shot Class-Incremental Learning. In ECCV, 432–448. Cham.
  • Khosla et al. (2020) Khosla, P.; Teterwak, P.; Wang, C.; Sarna, A.; Tian, Y.; Isola, P.; Maschinot, A.; Liu, C.; and Krishnan, D. 2020. Supervised Contrastive Learning. In NeurIPS, 18661–18673.
  • Krizhevsky and Hinton (2009) Krizhevsky, A.; and Hinton, G. 2009. Learning multiple layers of features from tiny images. In Technical Report.
  • Lee, Hwang, and Shin (2020) Lee, H.; Hwang, S. J.; and Shin, J. 2020. Self-supervised Label Augmentation via Input Transformations. In ICML, 5714–5724.
  • Lomonaco and Maltoni (2017) Lomonaco, V.; and Maltoni, D. 2017. CORe50: a New Dataset and Benchmark for Continuous Object Recognition. In CoRL, 17–26.
  • Mai et al. (2021) Mai, Z.; Li, R.; Kim, H.; and Sanner, S. 2021. Supervised Contrastive Replay: Revisiting the Nearest Class Mean Classifier in Online Class-Incremental Continual Learning. In CVPR Workshops, 3589–3599.
  • Peng et al. (2022) Peng, C.; Zhao, K.; Wang, T.; Li, M.; and Lovell, B. C. 2022. Few-Shot Class-Incremental Learning from an Open-Set Perspective. In ECCV, 382–397.
  • Prabhu, Torr, and Dokania (2020) Prabhu, A.; Torr, P. H. S.; and Dokania, P. K. 2020. GDumb: A Simple Approach that Questions Our Progress in Continual Learning. In ECCV, 524–540.
  • Shim et al. (2021) Shim, D.; Mai, Z.; Jeong, J.; Sanner, S.; Kim, H.; and Jang, J. 2021. Online Class-Incremental Continual Learning with Adversarial Shapley Value. AAAI.
  • Vinyals et al. (2016) Vinyals, O.; Blundell, C.; Lillicrap, T.; Kavukcuoglu, K.; and Wierstra, D. 2016. Matching Networks for One Shot Learning. In Lee, D.; Sugiyama, M.; Luxburg, U.; Guyon, I.; and Garnett, R., eds., NeurIPS.
  • Wu et al. (2019) Wu, Y.; Chen, Y.; Wang, L.; Ye, Y.; Liu, Z.; Guo, Y.; and Fu, Y. 2019. Large Scale Incremental Learning. In CVPR.
  • Zhou et al. (2022a) Zhou, D.-W.; Wang, F.-Y.; Ye, H.-J.; Ma, L.; Pu, S.; and Zhan, D.-C. 2022a. Forward Compatible Few-Shot Class-Incremental Learning. In CVPR, 9046–9056.
  • Zhou et al. (2022b) Zhou, D.-W.; Wang, F.-Y.; Ye, H.-J.; and Zhan, D.-C. 2022b. PyCIL: A Python Toolbox for Class-Incremental Learning. SCIENCE CHINA Information Sciences.
  • Zhu et al. (2021a) Zhu, F.; Cheng, Z.; Zhang, X.-y.; and Liu, C.-l. 2021a. Class-Incremental Learning via Dual Augmentation. In NeurIPS, 14306–14318.
  • Zhu et al. (2021b) Zhu, F.; Zhang, X.-Y.; Wang, C.; Yin, F.; and Liu, C.-L. 2021b. Prototype Augmentation and Self-Supervision for Incremental Learning. In CVPR, 5871–5880.
  • Zhu et al. (2022) Zhu, K.; Zhai, W.; Cao, Y.; Luo, J.; and Zha, Z.-J. 2022. Self-Sustaining Representation Expansion for Non-Exemplar Class-Incremental Learning. In CVPR, 9296–9305.