Supplementary Materials of
Hint-Aug: Drawing Hints from Vision Foundation Models
towards Boosted Few-shot Parameter-Efficient ViT Tuning
Table 1: Accuracy (%) of few-shot parameter-efficient tuning on MAE-pretrained ViT-Base across five datasets under 8-shot and 16-shot settings.

| # Shot | Tuning Technique | Augment | Food | Pets | Cars | Flowers | Aircraft |
|---|---|---|---|---|---|---|---|
| 8-shot | Adapter | No-aug | 48.12 | 89.45 | 50.63 | 93.91 | 39.87 |
| | | [zhang2022neural] | 53.59 | 90.35 | 52.15 | 94.82 | 34.29 |
| | | Proposed | 57.19 | 91.41 | 55.44 | 95.17 | 42.35 |
| | VPT | No-aug | 49.13 | 92.12 | 48.97 | 89.64 | 38.79 |
| | | [zhang2022neural] | 53.12 | 90.71 | 47.74 | 94.36 | 36.12 |
| | | Proposed | 55.85 | 92.48 | 49.29 | 94.83 | 40.15 |
| | LoRA | No-aug | 46.39 | 88.31 | 47.68 | 93.75 | 39.62 |
| | | [zhang2022neural] | 53.16 | 89.75 | 55.81 | 95.70 | 38.61 |
| | | Proposed | 55.11 | 90.22 | 57.12 | 96.01 | 41.10 |
| 16-shot | Adapter | No-aug | 58.41 | 91.09 | 73.56 | 95.86 | 59.41 |
| | | [zhang2022neural] | 63.05 | 91.58 | 75.03 | 96.77 | 54.97 |
| | | Proposed | 65.01 | 92.45 | 77.54 | 97.48 | 62.42 |
| | VPT | No-aug | 57.53 | 94.51 | 69.95 | 90.41 | 55.12 |
| | | [zhang2022neural] | 62.06 | 91.63 | 69.99 | 95.83 | 50.29 |
| | | Proposed | 62.87 | 95.04 | 71.83 | 96.51 | 58.39 |
| | LoRA | No-aug | 58.69 | 90.24 | 73.28 | 97.12 | 58.81 |
| | | [zhang2022neural] | 62.37 | 91.93 | 77.74 | 97.65 | 56.62 |
| | | Proposed | 63.10 | 92.38 | 78.81 | 98.10 | 59.49 |
1 Overview and Outline
In this supplement, we provide additional experiments and analyses that complement the main content, outlined below:

- We evaluate Hint-Aug's ability to boost FViTs' few-shot parameter-efficient tuning accuracy across different pretraining strategies in Supple. 2.
- We evaluate Hint-Aug's ability to boost FViTs' accuracy under the less data-limited transfer learning setting in Supple. 3.
- We conduct an ablation study on Hint-Aug's sensitivity to the hyperparameter in Eq. 3 of the main paper in Supple. 4.
2 Hint-Aug’s Achieved Accuracy Across Different Pretraining Strategies
The pretraining strategy is a key factor in fully exploiting FViTs' potential [he2021masked, bao2021beit, chen2021empirical]. As noted in [he2021masked, chen2021empirical], it can play an even more important role than the model architecture itself, given that we benchmark transformer-based models of similar architectures (e.g., applying the pretraining strategy of [he2021masked] to ViT-Base can compensate for its accuracy gap with ViT-Huge, which has nine times more parameters). With this in mind, we aim to validate Hint-Aug's capability of boosting FViTs' accuracy on models trained with different pretraining strategies. To do this, we apply the proposed Hint-Aug on top of ViT-Base [dosovitskiy2020image] models pretrained with a widely used SOTA pretraining strategy, MAE [he2021masked], while keeping all other experiment settings the same as in the main paper.
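As a reminder of what the benchmarked parameter-efficient tuning techniques adapt, the following is a minimal NumPy sketch of a LoRA-style update, one of the three techniques in Table 1. The dimensions and initialization scale are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 768, 4  # ViT-Base hidden size; low rank r << d (r is an assumption here)
W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                 # zero-init, so the update starts at zero

x = rng.normal(size=(d,))
# LoRA: effective weight is W + B @ A; only A and B are tuned.
y = W @ x + B @ (A @ x)

# At initialization the adapted output equals the frozen output.
assert np.allclose(y, W @ x)

# Trainable parameters are a tiny fraction (2r/d) of the full matrix.
trainable_ratio = (A.size + B.size) / W.size
```

Because only `A` and `B` receive gradients, the number of tuned parameters stays around 1% of the frozen weight matrix, which is what makes such techniques attractive in the few-shot settings studied here.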
As shown in Table 1, Hint-Aug consistently boosts the achievable accuracy over that of the baseline tuning methods under both the 8-shot and 16-shot settings. This validates that Hint-Aug is a general method that consistently improves the effectiveness of few-shot parameter-efficient tuning of FViTs.
3 Hint-Aug’s Ability in Boosting Transfer Learning Accuracy
We further validate Hint-Aug's achieved accuracy when more tuning data is available, as in the conventional transfer learning scenario, to evaluate Hint-Aug's potential as a general technique for improving parameter-efficient tuning accuracy under various scenarios. In this set of experiments, we tune the ImageNet-pretrained ViT-Base [dosovitskiy2020image] on five datasets: Caltech-101 [fei2004learning], CIFAR-100 [krizhevsky2009learning], DTD [cimpoi2014describing], Flowers [nilsback2006visual], and Pets [parkhi2012cats]. We follow the same tuning settings as [zhang2022neural] and report our results in Table 2. Hint-Aug achieves up to 2.47%, 0.90%, and 3.77% accuracy improvement over the baseline method when tuning Adapter, VPT, and LoRA, respectively. This further validates that (1) Hint-Aug can generally boost the achievable accuracy across various parameter-efficient tuning settings, and (2) the features generated by Hint-Aug are of high quality and can further improve feature diversity even when more data is provided (e.g., the 50,000 tuning images of CIFAR-100).
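Adapter, the second technique tuned in Table 2, inserts a small residual bottleneck into each frozen block. A minimal NumPy sketch, with illustrative (assumed) dimensions and a ReLU standing in for the usual GELU:

```python
import numpy as np

rng = np.random.default_rng(0)

d, m = 768, 64  # hidden size; bottleneck width m << d (m is an assumption)
W_down = rng.normal(size=(m, d)) * 0.01  # trainable down-projection
W_up = np.zeros((d, m))                  # zero-init up-projection => identity at start

x = rng.normal(size=(d,))
h = np.maximum(W_down @ x, 0.0)  # bottleneck activation (ReLU here for simplicity)
out = x + W_up @ h               # residual: frozen features + adapter delta

# With a zero-initialized up-projection, the adapter starts as an identity map.
assert np.allclose(out, x)
```

Only `W_down` and `W_up` are tuned, so as with LoRA the per-task cost is a small fraction of the frozen backbone.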
Table 2: Transfer learning accuracy (%) of ImageNet-pretrained ViT-Base tuned on five datasets.

| Tuning Technique | Augment | Caltech-101 | CIFAR-100 | DTD | Flowers | Pets |
|---|---|---|---|---|---|---|
| Adapter | No-aug | 90.10 | 69.22 | 68.04 | 98.61 | 89.88 |
| | [zhang2022neural] | 88.58 | 64.79 | 70.32 | 98.68 | 90.61 |
| | Proposed | 91.14 | 71.67 | 70.64 | 99.01 | 90.86 |
| VPT | No-aug | 90.83 | 76.76 | 65.78 | 98.03 | 88.27 |
| | [zhang2022neural] | 86.32 | 57.14 | 68.78 | 98.41 | 90.57 |
| | Proposed | 91.64 | 77.23 | 69.68 | 98.83 | 90.71 |
| LoRA | No-aug | 91.37 | 67.12 | 69.40 | 98.55 | 90.38 |
| | [zhang2022neural] | 88.35 | 65.91 | 71.07 | 98.69 | 91.11 |
| | Proposed | 92.43 | 70.87 | 71.21 | 99.06 | 91.81 |
4 Ablation Study on the Hyperparameter's Impact on Hint-Aug's Achievable Accuracy
To validate Hint-Aug's applicability to new tasks, we perform an ablation study on the impact of Hint-Aug's only hyperparameter, introduced in Eq. 3 of the main paper, under 8-shot Adapter tuning of ViT-Base [dosovitskiy2020image] on the Food [bossard2014food] dataset. As shown in Table 3, different values of this hyperparameter lead to marginal differences in the achievable accuracy (i.e., lower than 0.2%). This validates that Hint-Aug is not sensitive to the chosen hyperparameter value, alleviating the potential burden of hyperparameter tuning when applying Hint-Aug to a new task.
Table 3: Hint-Aug's achieved accuracy under different values of its hyperparameter (8-shot Adapter tuning on Food).

| Hyperparameter value | 0.2 | 0.1 | 0.05 |
|---|---|---|---|
| Accuracy (%) | 70.88 | 71.04 | 70.85 |
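The claimed insensitivity can be checked directly from the reported numbers; the snippet below simply restates Table 3's accuracies and computes their spread:

```python
# Accuracies (%) from Table 3, keyed by the swept hyperparameter value.
results = {0.2: 70.88, 0.1: 71.04, 0.05: 70.85}

# Spread between the best and worst setting.
spread = max(results.values()) - min(results.values())

# Matches the claimed sensitivity of less than 0.2%.
assert round(spread, 2) < 0.2
```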