Supplementary Material
1 Analysis on DomainNet-126
Here, we provide a detailed analysis on DomainNet-126. Tables 7 and 8 of the main paper report the average accuracy across the seven domain shifts; below, we provide the per-domain results for the ablation on the loss components (Table 1), as well as detailed results of AdaContrast (Table 2) and pSTarC (Table 3) for varying batch sizes.
Table 1: Ablation on the loss components of pSTarC on DomainNet-126 (✓ marks the loss terms enabled in each row).

Loss terms    RC     RP     PC     CS     SP     RS     PR     Avg
✓ ✓           57.1   66.6   57.1   47.8   54.7   54.1   74.0   58.8
✓ ✓           56.0   63.6   56.0   51.8   61.7   49.9   78.7   59.7
✓ ✓           60.1   67.3   59.6   55.0   64.8   54.6   79.8   63.0
✓ ✓ ✓         60.8   67.7   60.3   55.6   65.3   55.8   80.2   63.7
Table 2: Accuracy (%) of AdaContrast on each DomainNet-126 domain shift for varying batch sizes.

Batch size    RC     RP     PC     CS     SP     RS     PR     Average
8             44.9   54.7   45.8   44.1   52.1   42.2   66.9   50.1
16            54.4   62.8   54.6   49.6   59.3   49.3   75.2   57.9
32            57.8   65.4   58.1   52.0   61.9   52.9   77.6   60.8
64            60.0   66.9   59.7   53.6   63.7   54.3   78.6   62.4
128           60.1   67.1   60.2   54.4   64.2   54.3   78.7   62.4
Table 3: Accuracy (%) of pSTarC on each DomainNet-126 domain shift for varying batch sizes.

Batch size    RC     RP     PC     CS     SP     RS     PR     Average
8             53.5   62.6   51.2   41.2   54.1   46.2   69.9   54.1
16            56.5   65.7   56.2   49.6   59.8   51.4   75.5   59.2
32            59.4   67.2   58.0   51.1   61.9   54.5   77.0   61.3
64            61.6   68.9   60.5   54.7   64.3   57.0   79.4   63.8
128           60.8   67.7   60.3   55.6   65.3   55.8   80.2   63.7
2 Choice of parameter $n$
For pseudo-source feature generation, we set the total number of features to $N = n \times C$, where $C$ is the number of classes and $n$ is the number of features we expect to be generated per class. During test-time adaptation, we set the number of positives to 5 for all experiments. Hence, ensuring that the generated feature bank contains at least 5 samples per class should suffice for the algorithm to work well without any significant degradation in accuracy.
We use an Adam optimizer with a learning rate of 0.01 and optimize the feature bank for 50 steps. However, since we are optimizing the feature bank, naively setting $n = 5$ may not ensure an adequate number of features (5 in this case) per class. The optimum of the second term in the loss occurs only when there is an equal number of samples in each class, i.e., when the class distribution becomes uniform. This cannot always be guaranteed while using the same optimization parameters across datasets. For example, on setting the number of features to 120 for VisDA with 12 classes (i.e., $n = 10$), at least 5 features per class were generated; but for DomainNet-126, using the same optimization scheme, we observed that some classes had fewer than 5 generated features. Instead of tuning the optimizer hyperparameters for each dataset, we set $n$ sufficiently large (20 here) so that the same value can be used across all datasets. We observe that on setting $N = 20 \times C$ features for all datasets, using the same optimizer parameters and number of steps, we could ensure that at least 5 features per class were generated, enabling their seamless use during TTA. Hence, we set $n$ to 20 features per class for all datasets: VisDA (12 classes), DomainNet-126 (126 classes), Office-Home (65 classes) and CIFAR-100 (100 classes).
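For concreteness, the following is a minimal sketch of this feature-bank optimization, assuming a frozen source classifier head. The function name, the feature dimension, and the exact form of the two loss terms (a per-sample entropy term encouraging confident assignments, and a diversity term whose optimum is a uniform class distribution, matching the description above) are our assumptions, not the released implementation.

```python
import torch

def generate_pseudo_source_features(classifier, num_classes, feat_dim,
                                    n_per_class=20, steps=50, lr=0.01):
    # Randomly initialize the feature bank with N = n_per_class * num_classes
    # entries, then optimize the bank itself (the classifier stays frozen).
    bank = torch.randn(n_per_class * num_classes, feat_dim, requires_grad=True)
    opt = torch.optim.Adam([bank], lr=lr)
    for _ in range(steps):
        probs = classifier(bank).softmax(dim=1)
        # Assumed first term: per-sample entropy, so each generated feature
        # is confidently assigned to some class by the frozen classifier.
        ent = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
        # Assumed second term: negative entropy of the mean prediction; it is
        # minimized only when the class distribution over the bank is uniform,
        # which is why per-class counts are not guaranteed in general.
        mean_p = probs.mean(dim=0)
        div = (mean_p * mean_p.clamp_min(1e-8).log()).sum()
        loss = ent + div
        opt.zero_grad()
        loss.backward()
        opt.step()
    return bank.detach()

# Hypothetical usage: a frozen source classifier head for VisDA (12 classes).
# head = torch.nn.Linear(256, 12)
# bank = generate_pseudo_source_features(head, num_classes=12, feat_dim=256)
```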
3 Pseudo Code for pSTarC
We will publicly release the code upon acceptance. Parts of this code are adapted from AaD.
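Until then, the following is an illustrative sketch of a single pSTarC adaptation step, assuming an online-adapted feature extractor and the pseudo-source feature bank generated above. The InfoNCE-style formulation with 5 positives is our reading of the attraction/dispersion objective; the function name pstarc_step and the temperature value are hypothetical.

```python
import torch
import torch.nn.functional as F

def pstarc_step(feature_extractor, bank, x, optimizer, k_pos=5, tau=0.1):
    """One adaptation step on a test batch x (hedged sketch).

    Each target feature is attracted to its k_pos most similar pseudo-source
    features (the 5 positives mentioned above) and dispersed from the
    remaining bank entries via an InfoNCE-style objective.
    """
    f = F.normalize(feature_extractor(x), dim=1)   # (B, D) target features
    b = F.normalize(bank, dim=1)                   # (N, D) pseudo-source bank
    sim = f @ b.t() / tau                          # (B, N) similarities
    pos = sim.topk(k_pos, dim=1).values            # top-5 positives per sample
    # -log( sum_pos exp(sim) / sum_all exp(sim) ): pulls targets toward their
    # positives while pushing them away from the rest of the bank.
    loss = (torch.logsumexp(sim, dim=1) - torch.logsumexp(pos, dim=1)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the optimizer would be constructed over the feature extractor's parameters, e.g. torch.optim.SGD(feature_extractor.parameters(), lr=...), and the bank stays fixed during adaptation.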
4 Augmentation
We use the same augmentations as AdaContrast. For better reproducibility, we explicitly report the series of augmentations along with their ranges.
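As a reference point, the following sketches a MoCo-v2-style strong augmentation pipeline of the kind AdaContrast builds on, together with an assumed weak view. The exact probabilities, ranges, and kernel size here follow common MoCo-v2 defaults and are our assumptions; they should be checked against the released AdaContrast configuration.

```python
from torchvision import transforms as T

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

# Strong augmentation (assumed MoCo-v2-style recipe).
strong_aug = T.Compose([
    T.RandomResizedCrop(224, scale=(0.2, 1.0)),
    T.RandomApply([T.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    T.RandomGrayscale(p=0.2),
    T.RandomApply([T.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0))], p=0.5),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
])

# Weak augmentation (assumed): resize, crop, and flip only.
weak_aug = T.Compose([
    T.Resize(256),
    T.RandomCrop(224),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
])
```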