
Supplementary Materials of
Hint-Aug: Drawing Hints from Vision Foundation Models
towards Boosted Few-shot Parameter-Efficient ViT Tuning

First Author
Institution1
Institution1 address
[email protected]
   Second Author
Institution2
First line of institution2 address
[email protected]
Table 1: Benchmarking Hint-Aug with ViT-Base pretrained with MAE. Best results are marked in bold.
# Shot | Tuning Technique | Augment | Food | Pets | Cars | Flowers | Aircraft
8-shot | Adapter | No-aug | 48.12 | 89.45 | 50.63 | 93.91 | 39.87
8-shot | Adapter | [zhang2022neural] | 53.59 | 90.35 | 52.15 | 94.82 | 34.29
8-shot | Adapter | Proposed | 57.19 | 91.41 | 55.44 | 95.17 | 42.35
8-shot | VPT | No-aug | 49.13 | 92.12 | 48.97 | 89.64 | 38.79
8-shot | VPT | [zhang2022neural] | 53.12 | 90.71 | 47.74 | 94.36 | 36.12
8-shot | VPT | Proposed | 55.85 | 92.48 | 49.29 | 94.83 | 40.15
8-shot | LoRA | No-aug | 46.39 | 88.31 | 47.68 | 93.75 | 39.62
8-shot | LoRA | [zhang2022neural] | 53.16 | 89.75 | 55.81 | 95.70 | 38.61
8-shot | LoRA | Proposed | 55.11 | 90.22 | 57.12 | 96.01 | 41.10
16-shot | Adapter | No-aug | 58.41 | 91.09 | 73.56 | 95.86 | 59.41
16-shot | Adapter | [zhang2022neural] | 63.05 | 91.58 | 75.03 | 96.77 | 54.97
16-shot | Adapter | Proposed | 65.01 | 92.45 | 77.54 | 97.48 | 62.42
16-shot | VPT | No-aug | 57.53 | 94.51 | 69.95 | 90.41 | 55.12
16-shot | VPT | [zhang2022neural] | 62.06 | 91.63 | 69.99 | 95.83 | 50.29
16-shot | VPT | Proposed | 62.87 | 95.04 | 71.83 | 96.51 | 58.39
16-shot | LoRA | No-aug | 58.69 | 90.24 | 73.28 | 97.12 | 58.81
16-shot | LoRA | [zhang2022neural] | 62.37 | 91.93 | 77.74 | 97.65 | 56.62
16-shot | LoRA | Proposed | 63.10 | 92.38 | 78.81 | 98.10 | 59.49

1 Overview and Outline

In this supplement, we provide additional experiments and analyses that complement the main content, outlined below:

  • We evaluate Hint-Aug’s ability to boost FViTs’ few-shot parameter-efficient tuning accuracy across different pretraining strategies in Supple. 2.

  • We evaluate Hint-Aug’s ability to boost FViTs’ accuracy under the conventional, less data-limited transfer learning setting in Supple. 3.

  • We conduct an ablation study on Hint-Aug’s sensitivity to λ in Eq. 3 of the main paper in Supple. 4.

2 Hint-Aug’s Achieved Accuracy Across Different Pretraining Strategies

The pretraining strategy is a key factor in fully exploiting the potential of FViTs [he2021masked, bao2021beit, chen2021empirical]. As noted in [he2021masked, chen2021empirical], it can play an even more important role than the FViT architecture itself, given that the benchmarked transformer-based models share similar architectures (e.g., applying the pretraining strategy of [he2021masked] to ViT-Base can compensate for its accuracy gap with ViT-Huge, which has nine times more parameters). With this in mind, we aim to validate Hint-Aug’s capability to boost FViTs’ accuracy across models trained with different pretraining strategies. To this end, we apply the proposed Hint-Aug on top of ViT-Base [dosovitskiy2020image] models pretrained with MAE [he2021masked], a widely used SOTA pretraining strategy, while keeping all other experimental settings the same as in the main paper.

As shown in Table 1, Hint-Aug consistently boosts the achievable accuracy over the strongest baseline tuning method by 0.31%∼3.60% and 0.45%∼3.27% under the 8-shot and 16-shot settings, respectively. This validates that Hint-Aug is a general method that consistently improves the effectiveness of few-shot parameter-efficient tuning of FViTs.
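As a quick sanity check on the quoted ranges (a sketch added here for illustration, not part of the paper’s released code), the per-entry gains of Hint-Aug over the stronger of the two baselines can be recomputed directly from the accuracies in Table 1:

```python
# Accuracies from Table 1 (MAE-pretrained ViT-Base).
# Per technique: (No-aug row, [zhang2022neural] row, Proposed row),
# datasets ordered as Food, Pets, Cars, Flowers, Aircraft.
table1 = {
    "8-shot": {
        "Adapter": ([48.12, 89.45, 50.63, 93.91, 39.87],
                    [53.59, 90.35, 52.15, 94.82, 34.29],
                    [57.19, 91.41, 55.44, 95.17, 42.35]),
        "VPT":     ([49.13, 92.12, 48.97, 89.64, 38.79],
                    [53.12, 90.71, 47.74, 94.36, 36.12],
                    [55.85, 92.48, 49.29, 94.83, 40.15]),
        "LoRA":    ([46.39, 88.31, 47.68, 93.75, 39.62],
                    [53.16, 89.75, 55.81, 95.70, 38.61],
                    [55.11, 90.22, 57.12, 96.01, 41.10]),
    },
    "16-shot": {
        "Adapter": ([58.41, 91.09, 73.56, 95.86, 59.41],
                    [63.05, 91.58, 75.03, 96.77, 54.97],
                    [65.01, 92.45, 77.54, 97.48, 62.42]),
        "VPT":     ([57.53, 94.51, 69.95, 90.41, 55.12],
                    [62.06, 91.63, 69.99, 95.83, 50.29],
                    [62.87, 95.04, 71.83, 96.51, 58.39]),
        "LoRA":    ([58.69, 90.24, 73.28, 97.12, 58.81],
                    [62.37, 91.93, 77.74, 97.65, 56.62],
                    [63.10, 92.38, 78.81, 98.10, 59.49]),
    },
}

def gain_range(setting):
    """Min/max gain of Hint-Aug over the stronger baseline,
    taken across all tuning techniques and datasets."""
    gains = [
        round(prop - max(no_aug, zhang), 2)
        for no_aug_row, zhang_row, prop_row in setting.values()
        for no_aug, zhang, prop in zip(no_aug_row, zhang_row, prop_row)
    ]
    return min(gains), max(gains)

print(gain_range(table1["8-shot"]))   # -> (0.31, 3.6)
print(gain_range(table1["16-shot"]))  # -> (0.45, 3.27)
```

The recomputed extremes match the 0.31%∼3.60% (8-shot) and 0.45%∼3.27% (16-shot) ranges reported above.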

3 Hint-Aug’s Ability to Boost Transfer Learning Accuracy

We further validate Hint-Aug’s achieved accuracy when more tuning data is available, as in the conventional transfer learning scenario, since this evaluates Hint-Aug’s potential as a general technique for improving parameter-efficient tuning accuracy under various scenarios. In this set of experiments, we tune the ImageNet-pretrained ViT-Base [dosovitskiy2020image] on five datasets: Caltech-101 [fei2004learning], CIFAR-100 [krizhevsky2009learning], DTD [cimpoi2014describing], Flowers [nilsback2006visual], and Pets [parkhi2012cats]. We follow the same tuning settings as in [zhang2022neural] and report our results in Table 2. Hint-Aug achieves as high as 2.47%, 0.90%, and 3.77% accuracy improvement over the baseline method when tuning Adapter, VPT, and LoRA, respectively. This further validates that (1) Hint-Aug can generally boost the achievable accuracy under various parameter-efficient tuning settings, and (2) the features generated by Hint-Aug are of high quality and can further improve feature diversity even when more data (e.g., the 50,000 training images of the CIFAR-100 dataset) are provided.

Table 2: Benchmarking Hint-Aug with ViT-Base on transfer learning datasets. Best results are marked in bold.
Tuning Technique | Augment | Caltech-101 | CIFAR-100 | DTD | Flowers | Pets
Adapter | No-aug | 90.10 | 69.22 | 68.04 | 98.61 | 89.88
Adapter | [zhang2022neural] | 88.58 | 64.79 | 70.32 | 98.68 | 90.61
Adapter | Proposed | 91.14 | 71.67 | 70.64 | 99.01 | 90.86
VPT | No-aug | 90.83 | 76.76 | 65.78 | 98.03 | 88.27
VPT | [zhang2022neural] | 86.32 | 57.14 | 68.78 | 98.41 | 90.57
VPT | Proposed | 91.64 | 77.23 | 69.68 | 98.83 | 90.71
LoRA | No-aug | 91.37 | 67.12 | 69.40 | 98.55 | 90.38
LoRA | [zhang2022neural] | 88.35 | 65.91 | 71.07 | 98.69 | 91.11
LoRA | Proposed | 92.43 | 70.87 | 71.21 | 99.06 | 91.81

4 Ablation Study on λ’s Impact on Hint-Aug’s Achievable Accuracy

To validate Hint-Aug’s robustness when applied to new tasks, we perform an ablation study on the impact of its only hyperparameter, λ in Eq. 3, under 8-shot Adapter tuning of ViT-Base [dosovitskiy2020image] on the Food [bossard2014food] dataset. As shown in Table 3, different values of λ lead to only marginal differences in the achievable accuracy (i.e., lower than 0.2%). This validates that Hint-Aug is not sensitive to the hyperparameter being used, alleviating the potential burden of hyperparameter tuning when applying Hint-Aug to a new task.

Table 3: Ablation study on λ’s impact.
λ | 0.2 | 0.1 | 0.05
Accuracy (%) | 70.88 | 71.04 | 70.85
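The sub-0.2% sensitivity claim can be verified directly from Table 3; the snippet below (an illustrative check added here, not from the paper) computes the accuracy spread across the tested λ values:

```python
# Accuracies from Table 3: 8-shot Adapter tuning of ViT-Base on Food,
# for each value of the hyperparameter lambda in Eq. 3 of the main paper.
acc_by_lambda = {0.2: 70.88, 0.1: 71.04, 0.05: 70.85}

# Worst-case accuracy difference across lambda values.
spread = round(max(acc_by_lambda.values()) - min(acc_by_lambda.values()), 2)
print(spread)  # -> 0.19, below the 0.2% bound stated above
```

A spread of 0.19% across a 4x range of λ (0.05 to 0.2) is consistent with the claim that Hint-Aug needs little to no hyperparameter tuning on new tasks.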