Non-Exemplar Online Class-incremental Continual Learning via
Dual-prototype Self-augment and Refinement
——Appendix——
Overview
The appendix presents more experimental settings, results, and analyses as follows:
Appendix A: More Implementation Details
Appendix B: Results on Different Dataset Partitions
Appendix C: Results and Analysis on Different Base Session Training Strategies
Appendix D: Hyper Parameter Analysis
Appendix E: Computation Overhead Analysis
Appendix A: More Implementation Details
Dataset Overview. We conduct experiments on three widely used datasets: CORE-50 (Lomonaco and Maltoni 2017), CIFAR100 (Krizhevsky and Hinton 2009), and Mini-ImageNet (Vinyals et al. 2016). Here we give brief introductions. CORE-50 is a benchmark designed for class-incremental learning with 50 classes. Each class has around 2,398 training images and 900 testing images, with the size of . CIFAR100 contains 60,000 images of size 32×32 from 100 classes, and each class includes 500 training images and 100 test images. Mini-ImageNet contains 100 classes and is divided into 10 sub-datasets for 10 disjoint tasks, each containing 10 classes. Each task comprises 5,000 training images and 1,000 testing images, all with the size of .
Training Details. For OCL methods, we employ the same dataset partitions and training protocols as NO-CL, i.e., pre-training on the base classes followed by online class-incremental learning with example buffers. Other hyperparameters are kept at their defaults. The example buffers are stored and retrieved throughout training with the default updating pipeline. MIR (Aljundi et al. 2019), GD (Prabhu, Torr, and Dokania 2020), ASER (Shim et al. 2021), SCR (Mai et al. 2021), and DVC (Gu et al. 2022) are based on the OCL codebase (https://github.com/RaptorMai/online-continual-learning). Other methods are implemented with their publicly released code. For the FS-CL methods, FACT (Zhou et al. 2022a) and ALICE (Peng et al. 2022), we also adopt the same protocols as NO-CL. The prototypes of novel classes are computed from all data samples rather than from few-shot samples. During the inference phase, FACT and ALICE directly classify incremental data samples via the computed prototypes without fine-tuning the network. Both methods are implemented with their publicly released code. For NE-CL methods (Zhu et al. 2021b, 2022), as (Zhu et al. 2022) does not provide training scripts, we adopt the third-party code of PyCIL (Zhou et al. 2022b) (https://github.com/G-U-N/PyCIL) on the CIFAR100 dataset. For (Zhu et al. 2021b), we employ the same dataset partitions and the same training and testing protocols as NO-CL. Note that all methods employ the same reduced ResNet-18 as the feature extractor for fair comparison. All experiments are conducted on an NVIDIA RTX 3090 GPU with CUDA 11.4 using the PyTorch framework.
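The prototype-based protocol used for FACT and ALICE above (class prototypes computed from all samples of a class, then inference by comparing features to prototypes without fine-tuning) can be illustrated with a minimal plain-Python sketch. The function names and toy features are hypothetical, and cosine similarity is an assumption made for this illustration; it is not taken from the released code of either method.

```python
import math


def class_prototypes(features, labels):
    """Average all feature vectors of each class into one prototype."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {c: [s / counts[c] for s in total] for c, total in sums.items()}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict(feature, prototypes):
    """Return the class whose prototype is most similar to the feature."""
    return max(prototypes, key=lambda c: cosine(feature, prototypes[c]))


# Toy 2-D features standing in for backbone outputs (hypothetical values).
features = [[1.0, 0.1], [0.8, 0.3], [0.1, 1.0], [0.2, 0.7]]
labels = ["cat", "cat", "dog", "dog"]

prototypes = class_prototypes(features, labels)
prediction = predict([0.9, 0.2], prototypes)
```

No network weights are updated at inference time in this sketch; adding a new class only requires adding one more entry to `prototypes`.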
Base Session Training Details. Here, we give the details of our base session training strategy. We employ base training functions on the outputs of the feature extractor and projection module to obtain vanilla and high-dimensional prototypes for sequentially online sessions: . and . , , , and denote input samples, labels, feature extractor, and projection module. are linear layers to align vanilla- and high-dimensional prototypes for loss calculations. For cross-entropy (CE) loss functions, are one-layer MLP with the output dimension of base class. For supervised contrastive (SC) loss (Khosla et al. 2020), we follow SCR (Mai et al. 2021) and adopt the same hyper-parameters of SCR. are two-layer MLP with the dimension of 160 and 128 and the temperature is set to 0.1.
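As a rough illustration of the SC objective mentioned above, the following plain-Python sketch computes a supervised contrastive loss in the style of Khosla et al. (2020) with the temperature of 0.1. It is not the PyTorch implementation used in the experiments: the function names and toy embeddings are hypothetical, and the two-layer projection head is assumed to have been applied to the inputs already.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss averaged over anchors that have positives."""
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, num_anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Scaled similarities between the anchor and every other sample.
        logits = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log of the softmax denominator.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values())
        )
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0


# Toy embeddings: samples 0 and 1 are close, samples 2 and 3 are close.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_good = supcon_loss(embeddings, [0, 0, 1, 1])  # labels match the clusters
loss_bad = supcon_loss(embeddings, [0, 1, 0, 1])   # labels cut across clusters
```

The loss is small when samples of the same class are already close in the embedding space and large when same-class samples are far apart, which is the property the base session training relies on.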
Appendix B: Results on Different Dataset Partitions
Due to space constraints in the main paper, we report additional results on different dataset partitions in this appendix. Concretely, as shown in Tables 1 and 2, we conduct experiments on two further configurations with different numbers of base classes, where the remaining classes are continually fed to the network in 10 sessions. Moreover, a configuration with more incremental sessions (i.e., 20 sessions) is compared in Table 3. Five representative state-of-the-art methods are compared under the same training and inference protocols as Non-exemplar Online Class-incremental continual Learning (NO-CL), including Online Class-incremental continual Learning (OCL) methods (i.e., SCR (Mai et al. 2021), OCM (Guo, Liu, and Zhao 2022), and DVC (Gu et al. 2022)), a Non-Exemplar Class-incremental continual Learning (NE-CL) method (i.e., PASS (Zhu et al. 2021b)), and a Few-Shot Class-incremental Learning (FS-CL) method (i.e., ALICE (Peng et al. 2022)). As we can see from Tables 1 and 2, fewer base classes result in poorer performance on both base and novel classes, because the network lacks sufficient pre-trained information to generalize to novel classes. Meanwhile, as our method depends on the inner-prototype computed by the pre-trained backbone, a well-trained backbone greatly benefits the prototype refinement. Notably, even with fewer base classes, our method still achieves the best performance in the Acc and HM metrics, which validates the robustness of the prototype refinement strategies. Moreover, with more incremental sessions in Table 3, the performance drops only slightly. Overall, the experiments on different dataset partitions validate the effectiveness and robustness of our method.
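The kind of partition described above (a set of base classes followed by equally sized incremental sessions) can be sketched as follows. The helper name and the numbers (100 classes, 40 base classes, 10 sessions) are hypothetical choices for illustration only and are not claimed to be the exact configurations evaluated here.

```python
def make_sessions(num_classes, num_base, num_sessions):
    """Split class ids into base classes and equally sized incremental sessions."""
    remaining = num_classes - num_base
    if remaining <= 0 or remaining % num_sessions != 0:
        raise ValueError("remaining classes must split evenly across sessions")
    per_session = remaining // num_sessions
    base_classes = list(range(num_base))
    sessions = [
        list(range(num_base + s * per_session, num_base + (s + 1) * per_session))
        for s in range(num_sessions)
    ]
    return base_classes, sessions


# Hypothetical example: 100 classes, 40 base classes, 10 online sessions.
base_classes, sessions = make_sessions(num_classes=100, num_base=40, num_sessions=10)
```

In this example each online session introduces 6 previously unseen classes, and no class appears in more than one session.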
Methods | CORE-50 | | CIFAR100 | | Mini-ImageNet | |
---|---|---|---|---|---|---|
Metrics | Acc(base/novel)HM | | Acc(base/novel)HM | | Acc(base/novel)HM | |
ALICE | 35.0(49.3/25.5)33.6 | | 37.1(59.2/22.3)32.4 | | 36.5(58.6/21.8)31.8 | |
PASS | 25.2(62.8/0.2)0.4 | | 26.4(65.1/0.6)1.2 | | 25.8(63.4/0.8)1.6 | |
MS | 1000 | 2000 | 1000 | 2000 | 1000 | 2000 |
SCR | 37.3(36.9/37.5)37.2 | 38.7(38.9/38.6)38.7 | 32.7(34.2/31.7)32.9 | 34.9(36.9/33.6)35.2 | 29.8(29.1/30.3)29.7 | 36.3(39.2/34.4)36.6 |
SCRft | 34.2(45.2/26.8)33.6 | 37.7(49.8/29.6)37.1 | 30.9(45.2/21.3)21.3 | 36.4(50.8/26.8)35.1 | 32.9(43.2/26.1)32.5 | 36.8(50.8/27.5)35.7 |
OCM | 37.6(37.5/37.6)37.5 | 39.5(40.5/38.9)39.7 | 31.6(32.9/30.8)31.8 | 37.1(36.1/37.8)36.9 | 31.3(30.8/31.6)31.2 | 31.7(32.9/30.9)31.9 |
OCMft | 35.8(43.1/30.9)36.0 | 36.7(46.1/30.6)36.7 | 32.4(48.2/21.8)30.1 | 35.9(50.6/26.1)36.8 | 30.2(38.1/24.9)31.4 | 37.7(51.2/28.7)36.8 |
DVC | 37.5(36.9/37.9)37.3 | 38.7(39.8/38.0)38.9 | 29.9(30.0/29.8)29.9 | 37.2(34.6/39.0)36.6 | 29.7(29.9/29.6)29.7 | 33.3(35.6/31.8)33.6 |
DVCft | 36.7(45.8/30.6)36.7 | 37.9(45.1/33.1)38.2 | 32.3(43.6/24.7)31.5 | 36.5(46.5/29.8)36.3 | 31.4(39.4/26.1)31.3 | 32.6(38.5/28.6)32.8 |
Ours | 45.5(44.2/46.4)45.3 | | 38.6(43.8/35.2)39.0 | | 40.4(55.4/30.4)39.3 | |
Methods | CORE-50 | | CIFAR100 | | Mini-ImageNet | |
---|---|---|---|---|---|---|
Metrics | Acc(base/novel)HM | | Acc(base/novel)HM | | Acc(base/novel)HM | |
ALICE | 44.7(48.2/30.6)37.4 | | 50.5(57.2/23.8)33.6 | | 50.3(56.4/26.1)35.7 | |
PASS | 48.4(60.3/0.6)1.2 | | 50.4(62.8/0.8)1.6 | | 49.8(62.0/0.9)1.8 | |
MS | 1000 | 2000 | 1000 | 2000 | 1000 | 2000 |
SCR | 39.9(39.2/42.9)40.9 | 42.9(43.5/40.8)42.1 | 40.7(42.2/34.7)38.1 | 46.3(47.2/42.8)44.9 | 40.4(40.6/39.7)40.1 | 44.5(43.8/47.3)45.5 |
SCRft | 45.4(48.9/31.2)38.1 | 49.7(53.7/33.7)54.6 | 47.4(50.6/34.8)41.2 | 49.2(53.2/33.4)41.0 | 40.2(45.2/32.8)38.0 | 47.0(49.2/38.5)43.5 |
OCM | 43.4(43.8/42.2)42.9 | 43.2(43.6/41.8)42.7 | 39.9(39.8/40.6)40.2 | 43.6(43.9/42.3)43.1 | 39.5(38.9/41.8)40.3 | 43.5(42.9/46.0)44.3 |
OCMft | 46.8(49.8/34.8)40.9 | 47.8(49.8/39.7)44.2 | 46.1(48.8/35.1)40.8 | 48.7(51.8/36.3)42.7 | 44.1(46.1/36.3)40.6 | 46.3(48.6/37.2)42.1 |
DVC | 43.0(42.8/43.8)43.3 | 45.2(45.8/42.8)44.2 | 39.4(38.6/42.5)40.5 | 40.1(39.7/41.9)40.8 | 41.6(42.6/37.4)39.8 | 45.4(45.6/44.8)45.2 |
DVCft | 48.0(50.3/39.0)43.9 | 50.3(53.7/36.9)43.7 | 41.1(41.6/38.9)40.2 | 42.4(43.9/36.4)39.8 | 45.3(48.6/31.9)38.5 | 47.8(50.8/36.2)42.2 |
Ours | 55.2(55.6/53.7)54.6 | | 54.3(56.2/46.8)51.1 | | 52.2(52.6/50.8)51.7 | |
Methods | CORE-50 | | CIFAR100 | | Mini-ImageNet | |
---|---|---|---|---|---|---|
Metrics | Acc(base/novel)HM | | Acc(base/novel)HM | | Acc(base/novel)HM | |
ALICE | 39.5(46.2/29.5)36.0 | | 42.5(53.5/25.9)34.9 | | 41.1(51.4/25.7)34.3 | |
PASS | 35.2(58.3/0.8)1.6 | | 37.9(62.6/1.0)2.0 | | 37.2(61.2/1.1)2.2 | |
MS | 1000 | 2000 | 1000 | 2000 | 1000 | 2000 |
SCR | 38.6(37.2/40.6)38.8 | 38.6(39.2/37.6)38.4 | 36.9(38.8/34.1)36.3 | 40.7(42.8/37.6)40.0 | 34.4(34.6/34.2)34.4 | 35.1(38.6/29.8)33.6 |
SCRft | 36.6(42.4/27.9)33.7 | 41.3(49.8/28.6)36.3 | 38.6(49.2/22.8)31.2 | 41.1(52.8/23.7)32.7 | 37.9(43.2/30.0)35.4 | 39.5(43.7/33.2)37.7 |
OCM | 38.3(38.8/37.6)38.2 | 41.1(42.1/39.6)40.8 | 36.3(36.5/35.9)36.2 | 41.4(42.7/39.6)41.1 | 35.9(35.2/37.1)36.1 | 39.5(43.7/33.2)37.7 |
OCMft | 37.7(43.1/29.7)35.2 | 43.3(47.9/36.4)41.4 | 39.7(45.7/30.7)36.7 | 40.9(45.9/33.4)38.7 | 36.8(39.8/32.2)35.4 | 41.5(43.5/38.4)40.7 |
DVC | 38.1(37.8/38.6)38.2 | 40.5(41.8/38.6)40.1 | 37.5(37.2/38.1)37.7 | 40.2(41.6/38.2)39.8 | 34.4(32.9/36.7)34.7 | 37.1(35.4/39.6)37.4 |
DVCft | 39.8(45.9/30.7)36.8 | 41.8(46.3/35.0)39.9 | 37.8(42.1/31.4)35.9 | 40.6(44.8/34.3)38.9 | 35.1(38.9/29.4)33.4 | 37.7(38.7/36.1)37.4 |
Ours | 49.5(49.1/50.2)49.6 | | 47.6(51.6/41.7)46.1 | | 49.1(53.8/42.1)47.2 | |
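The HM values reported in these tables are consistent with the harmonic mean of the base and novel accuracies. The short sketch below, with a hypothetical helper name, reproduces two CORE-50 entries (ALICE and PASS) from the first of the three tables and shows why a method with near-zero novel accuracy obtains a near-zero HM even when its base accuracy is high.

```python
def harmonic_mean(base_acc, novel_acc):
    """Harmonic mean of base and novel accuracy (both in percent)."""
    if base_acc + novel_acc == 0:
        return 0.0
    return 2 * base_acc * novel_acc / (base_acc + novel_acc)


# ALICE on CORE-50: base 49.3, novel 25.5, reported HM 33.6.
alice_hm = round(harmonic_mean(49.3, 25.5), 1)

# PASS on CORE-50: base 62.8, novel 0.2, reported HM 0.4.
pass_hm = round(harmonic_mean(62.8, 0.2), 1)
```

Because the harmonic mean is dominated by the smaller of its two arguments, HM rewards methods that balance retaining base classes with learning novel ones.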
Appendix C: Results and Analysis on Different Base Session Training Strategies
The stability-plasticity dilemma is a thorny problem in continual learning. To deal with this dilemma, previous NE-CL (Zhu et al. 2021b, a) and FS-CL (Peng et al. 2022; Kalla and Biswas 2022) methods employ self-supervised learning (Jing and Tian 2021) and class and data augmentation to learn task-agnostic and transferable representations. For the NO-CL problem, the base session training strategy also matters for the stability-plasticity dilemma. For fair comparison, similar to (Mai et al. 2021; Gu et al. 2022; Guo, Liu, and Zhao 2022), we also employ supervised contrastive (SC) learning. Here, we evaluate two additional training strategies. Concretely, we add an extra self-supervised learning loss (Lee, Hwang, and Shin 2020) (+SSL), as in (Zhu et al. 2021b; Kalla and Biswas 2022), and use the data augmentation strategy (+DA) proposed by (Zhu et al. 2021a; Peng et al. 2022). The results in Table 4 show that these elaborately designed pre-training strategies improve the accuracy on both base and novel classes. Therefore, developing more robust pre-training strategies is a promising direction for the proposed NO-CL problem.
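To make the +SSL variant more concrete, the sketch below illustrates rotation-based label augmentation in the spirit of Lee, Hwang, and Shin (2020): each image is rotated by multiples of 90 degrees and each (class, rotation) pair is treated as its own label. This is a simplified plain-Python illustration under that assumption, with hypothetical function names and a toy 2×2 image; it is not the training code used for Table 4.

```python
def rotate90(image):
    """Rotate a 2-D list (rows of pixels) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotation_augment(image, label, num_rotations=4):
    """Return (rotated image, joint label) pairs for 0, 90, 180, 270 degrees."""
    samples = []
    current = [list(row) for row in image]
    for k in range(num_rotations):
        # Joint label: the original class index expanded by the rotation index.
        samples.append((current, label * num_rotations + k))
        current = rotate90(current)
    return samples


# Toy 2x2 "image" belonging to class 2 (hypothetical values).
image = [[1, 2],
         [3, 4]]
samples = rotation_augment(image, label=2)
joint_labels = [joint for _, joint in samples]
```

With four rotations, a base session with C classes is trained as a 4C-way problem, which is one way to encourage more transferable features during pre-training.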
Ablations | CIFAR100 | Mini-ImageNet |
---|---|---|
Metrics | Acc(base/novel) | Acc(base/novel) |
Ours(+CE) | 45.8(50.0/39.6) | 47.7(52.6/40.3) |
Ours(+SC) | 48.6(52.4/42.9) | 50.7(56.1/42.6) |
+SSL | 50.4(53.8/45.2) | 52.2(57.3/44.6) |
+DA | 51.2(55.7/44.6) | 53.2(58.2/45.8) |



Ablations | CIFAR100 | Mini-ImageNet |
---|---|---|
Metrics | Acc(base/novel)/HM | Acc(base/novel)/HM |
w/ 256 | 44.8(49.5/37.9)/42.9 | 47.4(53.6/38.1)/44.5 |
w/ 1024 | 47.4(51.8/40.8)/45.7 | 49.7(55.4/41.1)/47.2 |
w/ 2048 | 48.6(52.4/42.9)/47.2 | 50.7(56.1/42.6)/48.4 |
w/ 3074 | 48.4(52.3/42.7)/47.0 | 50.8(56.3/42.6)/48.4 |



Datasets | CIFAR100 | | | | | | Mini-ImageNet | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Metrics | ALICE | SSRE | SCR | DVC | OCM | Ours | ALICE | SSRE | SCR | DVC | OCM | Ours |
Time(s) | 512 | 291 | 165 | 126 | 561 | 35 | 793 | 457 | 254 | 194 | 831 | 61 |
Memory(GB) | 1.9 | 3.2 | 1.8 | 1.6 | 12.8 | 1.4 | 4.4 | 6.8 | 4.1 | 2.8 | 21.4 | 1.9 |
Appendix D: Hyper Parameter Analysis
Here, we analyse the hyperparameters, including the feature transform coefficient, the number of online iterations, and the number of sampled prototypes. We provide quantitative results on the Mini-ImageNet dataset in Figures 1, 2, and 3. In addition, experiments with varying dimensions of the hyperdimensional embedding are reported in Table 5.
In Figure 1, we can see that leads to degraded performance as the feature distribution is more concentrated close to 0. Meanwhile, decreasing the coefficient too much makes the distribution scattered and less aligned with the calibrated Gaussian distribution. Therefore, we set the coefficient to 0.5 in our experiments.
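The effect described above can be illustrated with a simple element-wise power transform. This is a sketch under the assumption that the feature transform is a power transform applied to non-negative features (for example, post-ReLU activations); the function name, the small `eps` offset, and the toy values are hypothetical and not taken from this appendix.

```python
import math


def power_transform(features, coefficient=0.5, eps=1e-6):
    """Element-wise power transform for non-negative features.

    A coefficient of 1 leaves the features essentially unchanged, smaller
    values spread out features that are concentrated near 0, and a
    coefficient of 0 is treated as a logarithm.
    """
    if coefficient == 0:
        return [math.log(x + eps) for x in features]
    return [(x + eps) ** coefficient for x in features]


# Toy features concentrated near 0, with one large activation.
raw = [0.0, 0.01, 0.04, 0.09, 1.0]
transformed = power_transform(raw, coefficient=0.5)  # roughly 0.001, 0.1, 0.2, 0.3, 1.0
unchanged = power_transform(raw, coefficient=1.0)    # approximately equal to raw
```

With a coefficient of 0.5 the small activations are spread more evenly over the range, which makes the resulting distribution less skewed than the raw one.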
In Figure 2, we vary the number of online iterations. Too few iterations prevent the network from accommodating online novel classes through refining the hyperdimensional prototypes and aligning the projection module. Meanwhile, more online iterations lead to only slight degradation, which validates the plasticity of our method. Therefore, we set the number of online iterations to 20 to achieve the stability-plasticity trade-off.
In Figure 3, we vary the number of sampled prototypes . Concretely, we vary the number of sampled prototypes of base classes , novel classes , and all classes . We can see that the imbalance sampling of classes leads to performance degradation, which is similar to the class imbalance problem (Hou et al. 2019; Wu et al. 2019). Also, increasing does not bring many gains while inducing computation overheads. Therefore, we set .
From Table 5, we can see that increasing the dimension of the hyperdimensional embedding benefits the proposed method, while an overly large dimension brings little additional gain. Therefore, we set the dimension of the hyperdimensional embedding to 2048 in our experiments.
Appendix E: Computation Overhead Analysis
For the computation overhead during online learning, which is usually considered in OCL scenarios (Fini et al. 2020), we provide quantitative comparisons with OCL, NE-CL, and FS-CL methods in Table 6. The batch size of example-based methods is set to 10. We only align prototypes by fine-tuning the projection module, which is much more efficient than training the whole network. Therefore, our method has clear advantages in computation overhead for online continual learning. Meanwhile, the bi-level optimization converges quickly, as shown in Figure 4.
References
- Aljundi et al. (2019) Aljundi, R.; Belilovsky, E.; Tuytelaars, T.; Charlin, L.; Caccia, M.; Lin, M.; and Page-Caccia, L. 2019. Online Continual Learning with Maximal Interfered Retrieval. In NeurIPS.
- Fini et al. (2020) Fini, E.; Lathuilière, S.; Sangineto, E.; Nabi, M.; and Ricci, E. 2020. Online Continual Learning under Extreme Memory Constraints. In ECCV, 720–735.
- Gu et al. (2022) Gu, Y.; Yang, X.; Wei, K.; and Deng, C. 2022. Not Just Selection, but Exploration: Online Class-Incremental Continual Learning via Dual View Consistency. In CVPR, 7442–7451.
- Guo, Liu, and Zhao (2022) Guo, Y.; Liu, B.; and Zhao, D. 2022. Online Continual Learning through Mutual Information Maximization. In ICML.
- Hou et al. (2019) Hou, S.; Pan, X.; Loy, C. C.; Wang, Z.; and Lin, D. 2019. Learning a Unified Classifier Incrementally via Rebalancing. In CVPR.
- Jing and Tian (2021) Jing, L.; and Tian, Y. 2021. Self-Supervised Visual Feature Learning With Deep Neural Networks: A Survey. IEEE TPAMI, 43(11): 4037–4058.
- Kalla and Biswas (2022) Kalla, J.; and Biswas, S. 2022. S3C: Self-Supervised Stochastic Classifiers for Few-Shot Class-Incremental Learning. In ECCV, 432–448.
- Khosla et al. (2020) Khosla, P.; Teterwak, P.; Wang, C.; Sarna, A.; Tian, Y.; Isola, P.; Maschinot, A.; Liu, C.; and Krishnan, D. 2020. Supervised Contrastive Learning. In NeurIPS, 18661–18673.
- Krizhevsky and Hinton (2009) Krizhevsky, A.; and Hinton, G. 2009. Learning Multiple Layers of Features from Tiny Images. Technical Report.
- Lee, Hwang, and Shin (2020) Lee, H.; Hwang, S. J.; and Shin, J. 2020. Self-supervised Label Augmentation via Input Transformations. In ICML, 5714–5724.
- Lomonaco and Maltoni (2017) Lomonaco, V.; and Maltoni, D. 2017. CORe50: a New Dataset and Benchmark for Continuous Object Recognition. In CoRL, 17–26.
- Mai et al. (2021) Mai, Z.; Li, R.; Kim, H.; and Sanner, S. 2021. Supervised Contrastive Replay: Revisiting the Nearest Class Mean Classifier in Online Class-Incremental Continual Learning. In CVPR Workshops, 3589–3599.
- Peng et al. (2022) Peng, C.; Zhao, K.; Wang, T.; Li, M.; and Lovell, B. C. 2022. Few-Shot Class-Incremental Learning from an Open-Set Perspective. In ECCV, 382–397.
- Prabhu, Torr, and Dokania (2020) Prabhu, A.; Torr, P. H. S.; and Dokania, P. K. 2020. GDumb: A Simple Approach that Questions Our Progress in Continual Learning. In ECCV, 524–540.
- Shim et al. (2021) Shim, D.; Mai, Z.; Jeong, J.; Sanner, S.; Kim, H.; and Jang, J. 2021. Online Class-Incremental Continual Learning with Adversarial Shapley Value. In AAAI.
- Vinyals et al. (2016) Vinyals, O.; Blundell, C.; Lillicrap, T.; Kavukcuoglu, K.; and Wierstra, D. 2016. Matching Networks for One Shot Learning. In Lee, D.; Sugiyama, M.; Luxburg, U.; Guyon, I.; and Garnett, R., eds., NeurIPS.
- Wu et al. (2019) Wu, Y.; Chen, Y.; Wang, L.; Ye, Y.; Liu, Z.; Guo, Y.; and Fu, Y. 2019. Large Scale Incremental Learning. In CVPR.
- Zhou et al. (2022a) Zhou, D.-W.; Wang, F.-Y.; Ye, H.-J.; Ma, L.; Pu, S.; and Zhan, D.-C. 2022a. Forward Compatible Few-Shot Class-Incremental Learning. In CVPR, 9046–9056.
- Zhou et al. (2022b) Zhou, D.-W.; Wang, F.-Y.; Ye, H.-J.; and Zhan, D.-C. 2022b. PyCIL: A Python Toolbox for Class-Incremental Learning. SCIENCE CHINA Information Sciences.
- Zhu et al. (2021a) Zhu, F.; Cheng, Z.; Zhang, X.-y.; and Liu, C.-l. 2021a. Class-Incremental Learning via Dual Augmentation. In NeurIPS, 14306–14318.
- Zhu et al. (2021b) Zhu, F.; Zhang, X.-Y.; Wang, C.; Yin, F.; and Liu, C.-L. 2021b. Prototype Augmentation and Self-Supervision for Incremental Learning. In CVPR, 5871–5880.
- Zhu et al. (2022) Zhu, K.; Zhai, W.; Cao, Y.; Luo, J.; and Zha, Z.-J. 2022. Self-Sustaining Representation Expansion for Non-Exemplar Class-Incremental Learning. In CVPR, 9296–9305.