Learning Class Unique Features in Fine-Grained Visual Classification: Appendix
1 Appendix 1: Experimental Details
Our experiments are performed on fine-grained visual classification (FGVC) benchmarks: CUB-200-2011 (wah2011caltech), FGVC-Aircraft (maji2013fine), and Stanford Cars (KrauseStarkDengFei-Fei_3DRR2013), as well as on standard visual classification benchmarks: CIFAR-10 (krizhevsky2009learning), CIFAR-100 (krizhevsky2009learning), and STL-10 (coates2011analysis). The statistics of the six datasets are shown in Table 1.
Different methods are compared on ResNet18 (he2016deep), VGGNet11 (simonyan2014very), and DenseNet161 (huang2017densely). All experiments are conducted with the PyTorch framework (paszke2019pytorch) on NVIDIA 2080Ti GPUs. On every dataset, we perform three-fold cross-validation to choose the hyper-parameters (including the algorithm-specific parameters, learning rate, decay rate, and weight decay; see Table 2). We then evaluate all models over three runs and report the mean and standard deviation on the test set. Model parameters are updated with Stochastic Gradient Descent (SGD). All images are normalized and augmented by random crop and random horizontal flip. For algorithm-specific hyper-parameters, we set the weight of Confidence Penalty (CP) to 1.00, the smoothing rate of Label Smoothing (LS) to 0.10, and the hyper-parameter of Minimax Loss (MM) to 0.85. For FRL, we set and
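As a reference point, here is a minimal PyTorch sketch of the CP and LS objectives with the weights above. The function names are ours, and the `label_smoothing` argument of `F.cross_entropy` assumes PyTorch 1.10 or newer.

```python
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, targets, smoothing=0.10):
    # LS: cross-entropy against smoothed targets (smoothing rate 0.10).
    return F.cross_entropy(logits, targets, label_smoothing=smoothing)

def confidence_penalty_loss(logits, targets, weight=1.00):
    # CP: cross-entropy minus a weighted entropy term, which penalizes
    # over-confident (low-entropy) predictive distributions.
    ce = F.cross_entropy(logits, targets)
    log_p = F.log_softmax(logits, dim=-1)
    entropy = -(log_p.exp() * log_p).sum(dim=-1).mean()
    return ce - weight * entropy
```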
Table 1: Statistics of the six datasets.

Dataset | #Training | #Testing | #Categories |
---|---|---|---|
CUB-200-2011 | 5994 | 5794 | 200 |
FGVC-Aircraft | 6667 | 3333 | 100 |
Stanford Cars | 8144 | 8041 | 196 |
CIFAR-10 | 50000 | 10000 | 10 |
CIFAR-100 | 50000 | 10000 | 100 |
STL-10 | 5000 | 8000 | 10 |
Table 2: Training hyper-parameters for each dataset. An LR policy of the form x/y multiplies the learning rate by x every y epochs.

Dataset | Image size | Crop size | Batch size | Epochs | Learning rate | Weight decay | LR policy |
---|---|---|---|---|---|---|---|
CIFAR-10 | 32×32 | 32×32 | 128 | 200 | 0.1 | 0.0005 | 0.2/50 |
CIFAR-100 | 32×32 | 32×32 | 128 | 200 | 0.1 | 0.0005 | 0.2/50 |
STL-10 | 96×96 | 96×96 | 64 | 200 | 0.1 | 0.0005 | 0.2/50 |
CUB-200-2011 | 512×512 | 448×448 | 16 | 60 | 0.004 | 0.0005 | 0.9/2 |
FGVC-Aircraft | 512×512 | 448×448 | 16 | 60 | 0.008 | 0.0005 | Cosine |
Stanford Cars | 512×512 | 448×448 | 16 | 60 | 0.01 | 0.0005 | Cosine |
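To make Table 2 concrete, below is a minimal sketch of the CUB-200-2011 training setup in PyTorch. The ImageNet normalization statistics and the SGD momentum of 0.9 are assumptions (the text above only states that images are normalized and that SGD is used), and the 0.9/2 policy is read as a step decay.

```python
import torch
from torchvision import models, transforms

# Augmentation for the FGVC datasets (Table 2): resize to 512x512,
# random-crop to 448x448, random horizontal flip, then normalize.
# The mean/std below are the standard ImageNet statistics (assumed).
train_transform = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.RandomCrop(448),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet18(num_classes=200)  # 200 classes for CUB-200-2011
optimizer = torch.optim.SGD(model.parameters(), lr=0.004,
                            momentum=0.9, weight_decay=0.0005)
# LR policy "0.9/2": multiply the learning rate by 0.9 every 2 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.9)
```

For FGVC-Aircraft and Stanford Cars, the step scheduler would be replaced by `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=60)` to match the cosine policy in Table 2.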
2 Appendix 2: The Proofs
2.1 Proof of Lemma 1
Proof.
From previous discussions, we have
On the other hand, when the conditional probability distribution over non-target classes is uniform, we have
Therefore , which, combined with the fact that mutual information (MI) is always non-negative, gives . Finally, we have , and thus . ∎
2.2 Proof of Theorem 1
Proof.
Since , it follows that . We then have:
which concludes Theorem 1. ∎
2.3 Proof of Theorem 2
Proof.
Let . Since , for any , we have
which concludes Theorem 2. ∎
2.4 Proof of Theorem 3
Proof.
Under the strategy of , the expected payoff of the adversary can be calculated as:
Since contains all the indices except under the condition that distributes all the probabilities uniformly over the non-target classes, we have . Moreover, contains only one action, which is equal to , thus:
Assume that there is another strategy that gives a higher payoff to the adversary; then:
which contradicts the assumption, proving that is the best response to .
Likewise, we can prove that is the best response to . Thus we have proved that forms a Nash equilibrium. ∎
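For reference, the pair of best-response conditions verified in this proof has the following general form in a two-player game (the symbols below are generic placeholders rather than the paper's notation, with $u_i$ denoting player $i$'s expected payoff):

```latex
% Two-player Nash equilibrium: each strategy is a best response to the other.
\begin{align}
u_1(\sigma_1^*, \sigma_2^*) &\ge u_1(\sigma_1, \sigma_2^*) && \forall \sigma_1, \\
u_2(\sigma_1^*, \sigma_2^*) &\ge u_2(\sigma_1^*, \sigma_2) && \forall \sigma_2.
\end{align}
```

The proof above establishes exactly these two inequalities, one for the adversary and one for the other player.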