Supplementary Materials of
Hint-Aug: Drawing Hints from Vision Foundation Models
towards Boosted Few-shot Parameter-Efficient ViT Tuning
Table 1: Accuracy (%) of few-shot parameter-efficient tuning on MAE-pretrained ViT-Base across five datasets under 8-shot and 16-shot settings.

| # Shot | Tuning Technique | Augment | Food | Pets | Cars | Flowers | Aircraft |
|---|---|---|---|---|---|---|---|
| 8-shot | Adapter | No-aug | 48.12 | 89.45 | 50.63 | 93.91 | 39.87 |
| | | [zhang2022neural] | 53.59 | 90.35 | 52.15 | 94.82 | 34.29 |
| | | Proposed | 57.19 | 91.41 | 55.44 | 95.17 | 42.35 |
| | VPT | No-aug | 49.13 | 92.12 | 48.97 | 89.64 | 38.79 |
| | | [zhang2022neural] | 53.12 | 90.71 | 47.74 | 94.36 | 36.12 |
| | | Proposed | 55.85 | 92.48 | 49.29 | 94.83 | 40.15 |
| | LoRA | No-aug | 46.39 | 88.31 | 47.68 | 93.75 | 39.62 |
| | | [zhang2022neural] | 53.16 | 89.75 | 55.81 | 95.70 | 38.61 |
| | | Proposed | 55.11 | 90.22 | 57.12 | 96.01 | 41.10 |
| 16-shot | Adapter | No-aug | 58.41 | 91.09 | 73.56 | 95.86 | 59.41 |
| | | [zhang2022neural] | 63.05 | 91.58 | 75.03 | 96.77 | 54.97 |
| | | Proposed | 65.01 | 92.45 | 77.54 | 97.48 | 62.42 |
| | VPT | No-aug | 57.53 | 94.51 | 69.95 | 90.41 | 55.12 |
| | | [zhang2022neural] | 62.06 | 91.63 | 69.99 | 95.83 | 50.29 |
| | | Proposed | 62.87 | 95.04 | 71.83 | 96.51 | 58.39 |
| | LoRA | No-aug | 58.69 | 90.24 | 73.28 | 97.12 | 58.81 |
| | | [zhang2022neural] | 62.37 | 91.93 | 77.74 | 97.65 | 56.62 |
| | | Proposed | 63.10 | 92.38 | 78.81 | 98.10 | 59.49 |
1 Overview and Outline
In this supplement, we provide additional experiments and analyses that complement the main content, outlined below:

- We evaluate Hint-Aug's ability to boost FViTs' few-shot parameter-efficient tuning accuracy across different pretraining strategies in Supple. 2.
- We evaluate Hint-Aug's ability to boost FViTs' accuracy under the less data-limited transfer learning setting in Supple. 3.
- We conduct an ablation study on Hint-Aug's sensitivity to the hyperparameter in Eq. 3 of the main paper in Supple. 4.
2 Hint-Aug’s Achieved Accuracy Across Different Pretraining Strategies
The pretraining strategy is a key factor in fully exploiting FViTs' potential [he2021masked, bao2021beit, chen2021empirical]. As noted in [he2021masked, chen2021empirical], it can play an even more important role than the model architecture itself, given that we benchmark transformer-based models of similar architectures (e.g., applying the pretraining strategy of [he2021masked] to ViT-Base can compensate for its accuracy gap with ViT-Huge, which has nine times more parameters). With this in mind, we aim to validate Hint-Aug's capability of boosting FViTs' accuracy on models trained with different pretraining strategies. To do this, we apply the proposed Hint-Aug on top of ViT-Base [dosovitskiy2020image] models pretrained with a widely used SOTA pretraining strategy, MAE [he2021masked], while keeping all other experiment settings the same as in the main paper.
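As a reminder of what the benchmarked parameter-efficient tuning techniques adapt, the following is a minimal NumPy sketch of a LoRA-style update, one of the three techniques in Table 1. The dimensions and initialization scale are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 768, 4  # ViT-Base hidden size; low rank r << d (r is an assumption here)
W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                 # zero-init, so the update starts at zero

x = rng.normal(size=(d,))
# LoRA: effective weight is W + B @ A; only A and B are tuned.
y = W @ x + B @ (A @ x)

# At initialization the adapted output equals the frozen output.
assert np.allclose(y, W @ x)

# Trainable parameters are a tiny fraction (2r/d) of the full matrix.
trainable_ratio = (A.size + B.size) / W.size
```

Because only `A` and `B` receive gradients, the number of tuned parameters stays around 1% of the frozen weight matrix, which is what makes such techniques attractive in the few-shot settings studied here.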
As shown in Table 1, Hint-Aug consistently boosts the achievable accuracy over that of the baseline tuning methods under both the 8-shot and 16-shot settings. This validates that Hint-Aug is a general method that consistently improves the effectiveness of few-shot parameter-efficient tuning of FViTs.
3 Hint-Aug’s Ability in Boosting Transfer Learning Accuracy
We further validate Hint-Aug's achieved accuracy when more tuning data is available, as in the conventional transfer learning scenario, to evaluate Hint-Aug's potential as a general technique for improving parameter-efficient tuning accuracy under various scenarios. In this set of experiments, we tune the ImageNet-pretrained ViT-Base [dosovitskiy2020image] on five datasets: Caltech-101 [fei2004learning], CIFAR-100 [krizhevsky2009learning], DTD [cimpoi2014describing], Flowers [nilsback2006visual], and Pets [parkhi2012cats]. We follow the same tuning settings as [zhang2022neural] and report our results in Table 2. Hint-Aug achieves up to 2.47%, 0.90%, and 3.77% accuracy improvement over the baseline method when tuning Adapter, VPT, and LoRA, respectively. This further validates that (1) Hint-Aug can generally boost the achievable accuracy across various parameter-efficient tuning settings, and (2) the features generated by Hint-Aug are of high quality and can further improve feature diversity even when more data is provided (e.g., the 50,000 tuning images of CIFAR-100).
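Adapter, the second technique tuned in Table 2, inserts a small residual bottleneck into each frozen block. A minimal NumPy sketch, with illustrative (assumed) dimensions and a ReLU standing in for the usual GELU:

```python
import numpy as np

rng = np.random.default_rng(0)

d, m = 768, 64  # hidden size; bottleneck width m << d (m is an assumption)
W_down = rng.normal(size=(m, d)) * 0.01  # trainable down-projection
W_up = np.zeros((d, m))                  # zero-init up-projection => identity at start

x = rng.normal(size=(d,))
h = np.maximum(W_down @ x, 0.0)  # bottleneck activation (ReLU here for simplicity)
out = x + W_up @ h               # residual: frozen features + adapter delta

# With a zero-initialized up-projection, the adapter starts as an identity map.
assert np.allclose(out, x)
```

Only `W_down` and `W_up` are tuned, so as with LoRA the per-task cost is a small fraction of the frozen backbone.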
Table 2: Transfer learning accuracy (%) of ImageNet-pretrained ViT-Base tuned on five datasets.

| Tuning Technique | Augment | Caltech-101 | CIFAR-100 | DTD | Flowers | Pets |
|---|---|---|---|---|---|---|
| Adapter | No-aug | 90.10 | 69.22 | 68.04 | 98.61 | 89.88 |
| | [zhang2022neural] | 88.58 | 64.79 | 70.32 | 98.68 | 90.61 |
| | Proposed | 91.14 | 71.67 | 70.64 | 99.01 | 90.86 |
| VPT | No-aug | 90.83 | 76.76 | 65.78 | 98.03 | 88.27 |
| | [zhang2022neural] | 86.32 | 57.14 | 68.78 | 98.41 | 90.57 |
| | Proposed | 91.64 | 77.23 | 69.68 | 98.83 | 90.71 |
| LoRA | No-aug | 91.37 | 67.12 | 69.40 | 98.55 | 90.38 |
| | [zhang2022neural] | 88.35 | 65.91 | 71.07 | 98.69 | 91.11 |
| | Proposed | 92.43 | 70.87 | 71.21 | 99.06 | 91.81 |
4 Ablation Study on the Hyperparameter's Impact on Hint-Aug's Achievable Accuracy
To validate Hint-Aug's applicability to new tasks, we perform an ablation study on the impact of Hint-Aug's only hyperparameter, introduced in Eq. 3 of the main paper, under 8-shot Adapter tuning of ViT-Base [dosovitskiy2020image] on the Food [bossard2014food] dataset. As shown in Table 3, different values of this hyperparameter lead to marginal differences in the achievable accuracy (i.e., lower than 0.2%). This validates that Hint-Aug is not sensitive to the chosen hyperparameter value, alleviating the potential burden of hyperparameter tuning when applying Hint-Aug to a new task.
Table 3: Hint-Aug's achieved accuracy under different values of its hyperparameter (8-shot Adapter tuning on Food).

| Hyperparameter value | 0.2 | 0.1 | 0.05 |
|---|---|---|---|
| Accuracy (%) | 70.88 | 71.04 | 70.85 |
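The claimed insensitivity can be checked directly from the reported numbers; the snippet below simply restates Table 3's accuracies and computes their spread:

```python
# Accuracies (%) from Table 3, keyed by the swept hyperparameter value.
results = {0.2: 70.88, 0.1: 71.04, 0.05: 70.85}

# Spread between the best and worst setting.
spread = max(results.values()) - min(results.values())

# Matches the claimed sensitivity of less than 0.2%.
assert round(spread, 2) < 0.2
```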