Removing Undesirable Feature Contributions Using Out-of-Distribution Data
Abstract
Several data augmentation methods deploy unlabeled-in-distribution (UID) data to bridge the gap between the training and inference of neural networks. However, these methods have clear limitations in terms of the availability of UID data and the dependence of the algorithms on pseudo-labels. Herein, we propose a data augmentation method that improves generalization in both adversarial and standard learning by using out-of-distribution (OOD) data, which are free of the abovementioned issues. We show theoretically how to improve generalization using OOD data in each learning scenario and complement our theoretical analysis with experiments on CIFAR-10, CIFAR-100, and a subset of ImageNet. The results indicate that undesirable features are shared even among image data that seem to have little correlation from a human point of view. We also present the advantages of the proposed method through comparison with other data augmentation methods, which can be used in the absence of UID data. Furthermore, we demonstrate that the proposed method can further improve the existing state-of-the-art adversarial training.
1 Introduction
Training on enormous amounts of data under the empirical risk minimization (ERM) principle (Vapnik & Vapnik, 1998) has allowed deep neural networks (DNNs) to perform outstandingly on many tasks, including computer vision (Krizhevsky et al., 2012) and speech recognition (Hinton et al., 2012). However, most of the practical problems encountered by DNNs have high-dimensional input spaces, and nontrivial generalization errors arise owing to the curse of dimensionality (Bellman, 1961). Moreover, neural networks have been found to be easily deceived by adversarial perturbations with a high degree of confidence (Szegedy et al., 2013). Several studies (Goodfellow et al., 2014; Krizhevsky et al., 2012) have been conducted to address these generalization problems resulting from ERM, most of them by extending the training distribution (Madry et al., 2017; Lee et al., 2020). Nevertheless, it has been demonstrated that more data are needed to achieve better generalization (Schmidt et al., 2018). Recent methods (Carmon et al., 2019; Xie et al., 2019) introduced unlabeled-in-distribution (UID) data to compensate for the lack of training samples. However, these methods have limitations. First, obtaining suitable UID data for selected classes is challenging. Second, when applying supervised learning methods to pseudo-labeled data, the effect of data augmentation depends heavily on the accuracy of the pseudo-label generator.
In our study, to break through the limitations outlined above, we propose an approach that promotes robust and standard generalization using out-of-distribution (OOD) data. In particular, motivated by previous studies demonstrating the existence of a common adversarial space among different images or even datasets (Naseer et al., 2019; Poursaeed et al., 2018), we show that OOD data can be leveraged for adversarial learning. Likewise, if the OOD data share the same undesirable features as the in-distribution data in terms of standard generalization, they can be leveraged for standard learning. By definition, in this work, the classes of the OOD data differ from those of the in-distribution data, and our method does not use the label information of the OOD data. Therefore, the proposed method is free from the previously mentioned problems caused by UID data. We present a theoretical model that demonstrates how to improve generalization using OOD data in both adversarial and standard learning. In our theoretical model, we separate desirable and undesirable features and show how training on OOD data that share undesirable features with the in-distribution data changes the weight values of the classifier. Based on this theoretical analysis, we introduce out-of-distribution data augmented training (OAT), which assigns a uniform distribution label to all OOD data samples to remove the influence of undesirable features in adversarial and standard learning. In the proposed method, each batch is composed of training data and OOD data, and the OOD data regularize the training so that only features that are strongly correlated with the class labels are learned. We complement our theoretical findings with experiments on CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), and a subset of ImageNet (Deng et al., 2009). In addition, we present empirical evidence for the transferability of undesirable features through further studies on various datasets including Simpson Characters (Attia, 2018), Fashion Product Images (Aggarwal, 2018), SVHN (Netzer et al., 2011), Places365 (Zhou et al., 2017), and VisDA-17 (Peng et al., 2017).
Contributions
(i) We propose a simple method, out-of-distribution data augmented training (OAT), to leverage OOD data for adversarial and standard learning, and our theoretical analyses demonstrate how the proposed method can improve robust and standard generalization. (ii) Experiments on CIFAR-10, CIFAR-100, and a subset of ImageNet suggest that OAT helps reduce the generalization gap in adversarial and standard learning. (iii) By applying OAT using various OOD datasets, we show that undesirable features are shared among diverse image datasets. We also demonstrate, through comparison with other data augmentation methods that can be employed in the absence of UID data, that OAT can effectively extend the training distribution. (iv) The state-of-the-art adversarial training method using UID data is shown to improve further when combined with the proposed method of leveraging OOD data.
2 Background
Undesirable features in adversarial learning
Tsipras et al. (2018) demonstrated the existence of a trade-off between standard accuracy and adversarial robustness based on the distinction between robust and non-robust features. They showed that adversarial robustness can be incompatible with standard accuracy by constructing a binary classification task in which input-label pairs $(x, y)$ are sampled from the following distribution:
$y \overset{\text{u.a.r.}}{\sim} \{-1, +1\}, \quad x_1 = \begin{cases} +y, & \text{w.p. } p \\ -y, & \text{w.p. } 1-p \end{cases}, \quad x_2, \dots, x_{d+1} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(\eta y, 1) \quad (1)$
Here, $x_1$ is a robust feature that is strongly correlated with the label, and the other features $x_2, \dots, x_{d+1}$ are non-robust features that are weakly correlated with the label. The constant $\eta$ is small but sufficiently large that a simple classifier attains a high standard accuracy, and $p \ge 0.5$. To characterize adversarial robustness, the expected standard loss and the expected adversarial loss for a data distribution $\mathcal{D}$ are defined as follows:
$\mathcal{L}_{\mathrm{std}}(\theta) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\mathcal{L}(x, y; \theta)\big], \qquad \mathcal{L}_{\mathrm{adv}}(\theta) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\max_{\delta\in\Delta}\mathcal{L}(x+\delta, y; \theta)\Big] \quad (2)$
Here, $\mathcal{L}$ is the loss function of the model, and $\Delta$ represents the set of perturbations that the adversary can apply to deceive the model. For the distribution in Equation (1), Tsipras et al. (2018) showed that the following classifier can yield a small expected standard loss:
$f_{\mathrm{avg}}(x) = \operatorname{sign}\big(w_{\mathrm{unif}}^{\top} x\big), \qquad w_{\mathrm{unif}} = \Big[0, \tfrac{1}{d}, \dots, \tfrac{1}{d}\Big] \quad (3)$
They also proved that the classifier is vulnerable to adversarial perturbations, and that adversarial training results in a classifier that assigns zero weight values to non-robust features.
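To make this toy model concrete, the following NumPy sketch samples data from the distribution in Equation (1) and measures the standard and adversarial accuracy of the averaging classifier in Equation (3). The constants d, p, eta, and the attack budget eps are illustrative choices, not values taken from the original analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p, n = 1000, 0.95, 5000           # illustrative constants
eta = 2.0 / np.sqrt(d)               # small, but enough for high standard accuracy
eps = 2.0 * eta                      # adversary's l_inf budget

y = rng.choice([-1.0, 1.0], size=n)                       # labels, uniform over {-1, +1}
x1 = np.where(rng.random(n) < p, y, -y)                   # robust feature: agrees with y w.p. p
x_rest = rng.normal(eta * y[:, None], 1.0, size=(n, d))   # non-robust features ~ N(eta*y, 1)
x = np.column_stack([x1, x_rest])

w = np.concatenate(([0.0], np.full(d, 1.0 / d)))          # the averaging classifier of Eq. (3)

std_acc = np.mean(np.sign(x @ w) == y)
# The worst-case l_inf perturbation shifts every non-robust feature by eps toward -y.
x_adv = x.copy()
x_adv[:, 1:] -= eps * y[:, None]
adv_acc = np.mean(np.sign(x_adv @ w) == y)
print(f"standard accuracy ~ {std_acc:.3f}, adversarial accuracy ~ {adv_acc:.3f}")
```

Running this shows the trade-off described above: the classifier that averages the non-robust features reaches high standard accuracy but collapses under a small, well-aligned perturbation.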
Transferability of Adversarial Perturbations
Naseer et al. (2019) produced domain-agnostic adversarial perturbations, thereby showing that different datasets share a common adversarial space. They showed that an adversarial function trained on Paintings, Cartoons, or Medical data can deceive a classifier on ImageNet data with a high success rate. These findings indicate that even datasets from considerably different domains share non-robust features, which motivates the method presented herein for supplementing the data needed for adversarial training.
Undesirable features in standard learning
Wang et al. (2020) noted that convolutional neural networks (CNNs) can capture high-frequency components of images that are almost imperceptible to a human. This ability is thought to be closely related to the generalization behavior of CNNs, especially their capacity to memorize random labels. Several studies (Geirhos et al., 2018; Bahng et al., 2019) reported that CNNs are biased toward local image features and that generalization performance can be improved by regularizing this bias. In this context, we propose a method of regularizing undesirable feature contributions using OOD data, assuming that undesirable features arise from the bias of CNNs or from insufficient training data and are widely distributed in the input space.
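As a concrete illustration of the high-frequency argument of Wang et al. (2020), the following NumPy sketch splits an image into low- and high-frequency components with a hard Fourier mask; the cutoff radius is an arbitrary illustrative choice, not a value used in any of the cited studies.

```python
import numpy as np

def frequency_split(img, radius=8):
    """Split a grayscale image (H x W array) into low- and high-frequency components
    using a hard circular mask in the Fourier domain; the radius is an arbitrary cutoff."""
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    low = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)))
    high = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * ~mask)))
    return low, high

# The two components sum back to the original image; the high-frequency part is nearly
# imperceptible in magnitude yet, per Wang et al. (2020), can dominate a CNN's prediction.
img = np.random.rand(32, 32)
low, high = frequency_split(img)
print(np.abs(img - (low + high)).max())  # ~1e-15
```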
3 Methods
3.1 Theoretical motivation
In this section, we theoretically analyze how OOD data can be used to compensate for insufficient training samples in adversarial training, based on the dichotomy between robust and non-robust features. The theoretical motivation for using OOD data to reduce the contribution of undesirable features in standard learning can be found in Appendix B.
Setup and overview
We denote in-distribution data as target data. Given a target dataset $\mathcal{D}_t$ sampled from a data distribution over the input space $\mathcal{X}$, we suppose that a feature extractor $g$ and a linear classification model $f$ are trained on $\mathcal{D}_t$ and on the target feature-label pairs $(g(x), y)$, respectively, so that the classification model attains a small expected standard loss. We then define an OOD dataset $\mathcal{D}_o$, sampled from a data distribution that has the same distribution of non-robust features as that of $\mathcal{D}_t$, with reference to the preceding studies (Naseer et al., 2019; Moosavi-Dezfooli et al., 2017). After fixing $g$ to facilitate the theoretical analysis of this framework, we demonstrate how adversarial training on the OOD dataset affects the weight values of our classifier.
Our data model
The feature extractor $g$ can be considered to consist of several feature extractors $g_1, \dots, g_{d+1}$. Hence, we can set the distributions of the target feature-label pair and of the OOD feature vector $z = (z_1, \dots, z_{d+1})$ as follows:
(4)
where "u.a.r." stands for uniformly at random. From here on, we deal only with the OOD data; therefore, the accents (tilde and caret) that distinguish between target data and OOD data are omitted. In Equation (4), the feature $z_1$ is the output of the robust feature extractor $g_1$, and the other features $z_2, \dots, z_{d+1}$ are the outputs of the non-robust feature extractors $g_2, \dots, g_{d+1}$. Since the OOD input vectors do not have the same robust features as the target input vectors, $z_1$ has zero mean and a small variance. Furthermore, because the OOD data have the same distribution of non-robust features as the target data, $z_2, \dots, z_{d+1}$ have a non-zero mean and a larger variance than the robust feature. In addition, $y$ represents the unknown label associated with the non-robust features, and $\eta$ is a non-negative constant that represents the degree of correlation between the non-robust features and the unknown label. Note that the input space of our classifier is the output space of $g$ in our data model; therefore, $\eta$ is not limited to a small value even in the context of a bounded adversary. Rather, the high degree of confidence that DNNs show for adversarial examples (Goodfellow et al., 2014) suggests that $\eta$ is large.
Our linear classification model
According to Section 2, we know that our linear classification model (logistic regression), defined as follows, yields a low expected standard loss while demonstrating high adversarial vulnerability.
$f(z) = \sigma\big(w^{\top} z\big) = \dfrac{1}{1 + e^{-w^{\top} z}} \quad (5)$
To observe the effect of adversarial training on the OOD dataset, we train our classifier by applying the stochastic gradient descent algorithm to the cross-entropy loss function $\mathcal{L}$.
First, we construct the adversarial feature vector $z^{\mathrm{adv}}$ against our classifier for adversarial training.
Theorem 1.
Let $t$ be the given target value for the feature vector $z$ in our classification model, and let $\epsilon$ be a non-negative constant. Then, for an $\epsilon$-bounded adversary, the expectation of the adversarial feature vector $z^{\mathrm{adv}}$ is
(6)
(All the proofs of the theorems in this paper can be found in Appendix A.) Here, we assume an $\epsilon$-bounded adversary. Theorem 1 shows that the adversary pushes the non-robust features farther in the direction of the unknown label $y$, which coincides with our intuition. When the given target value is $t = 1/2$, the adversary pushes the output of our classification model toward zero or one to yield a large loss.
Our classification model is trained on the adversarial features shown in Theorem 1.
Theorem 2.
When $t = 1/2$, the expected gradient of the loss function with respect to the weight vector $w$ of our classification model is
(7)
Thus, adversarial training with $t = 1/2$ on the OOD dataset leads to the weight values corresponding to the non-robust features converging to zero while preserving the weight $w_1$ on the robust feature from the gradient update. This shows that we can reduce the impact of non-robust features using the OOD dataset. However, we should not only reduce the influence of the non-robust features but also improve the classification accuracy using the robust feature, which can be achieved through adversarial training on the target dataset. Accordingly, we show the effect of adversarial training on the OOD dataset when $w_1 \neq 0$ in our example.
Theorem 3.
When $w_1 \neq 0$ and $t = 1/2$, the expected gradient of the loss function with respect to the weight vector $w$ of our classification model is
(8)
Theorem 3 shows that when $w_1 \neq 0$, adversarial training with $t = 1/2$ on the OOD dataset reduces the influence of all the features in $z$. However, the expected gradients for the weight values associated with the non-robust features are always greater than the expected gradient for the weight value associated with the robust feature. In addition, the larger a feature's contribution, the faster the weight value associated with it converges to zero, which means that the contributions of non-robust features with high influence decrease rapidly. In the case of multiclass classification, it is straightforward that $t = 1/2$ corresponds to the uniform distribution label $y_{\mathrm{unif}} = [1/K, \dots, 1/K]$, where $K$ is the number of classes. Intuitively, $t = 1/2$ ($y_{\mathrm{unif}}$) means that the input lies on the decision boundary. To reduce the training loss for $t = 1/2$ ($y_{\mathrm{unif}}$), the classifier must learn that the features of the OOD data do not contribute to any specific class, which can be understood as removing the contributions of those features.
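The following NumPy simulation illustrates this mechanism under the assumptions of our data model. All constants and the initial weights are illustrative, and the adversarial perturbation step is omitted (the update is a plain SGD step on OOD features with target 1/2), so it is a sketch of the effect described by Theorems 2 and 3 rather than a reproduction of them.

```python
import numpy as np

rng = np.random.default_rng(0)
d, eta, lr, steps = 50, 1.0, 0.1, 5000   # illustrative constants

# OOD feature vectors: the robust feature has (near-)zero mean and small variance,
# while the non-robust features follow the target data's non-robust distribution,
# i.e. mean eta * y for an unknown label y drawn uniformly from {-1, +1}.
y = rng.choice([-1.0, 1.0], size=steps)
z_robust = rng.normal(0.0, 0.1, size=(steps, 1))
z_nonrobust = rng.normal(eta * y[:, None], 1.0, size=(steps, d))
z = np.hstack([z_robust, z_nonrobust])

w = np.concatenate(([1.0], np.full(d, 1.0 / d)))  # non-zero weights everywhere (Theorem 3 case)
t = 0.5                                           # uniform label for the binary case

for i in range(steps):
    p = 1.0 / (1.0 + np.exp(-z[i] @ w))           # logistic model output
    w -= lr * (p - t) * z[i]                      # SGD step on the cross-entropy loss

print("robust weight:", w[0])
print("mean |non-robust weight|:", np.abs(w[1:]).mean())
# The non-robust weights collapse toward zero much faster than the robust-feature weight.
```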
3.2 Out-of-distribution data augmented training
Based on our theoretical analysis, we introduce out-of-distribution data augmented training (OAT). OAT is training on the union of the target dataset $\mathcal{D}_t$ and the OOD dataset $\mathcal{D}_o$. When applying our proposed method, we need to consider the following two points: 1) temporary labels associated with the OOD data samples are required for supervised learning, and 2) the loss functions corresponding to $\mathcal{D}_t$ and $\mathcal{D}_o$ should be properly combined.
First, we assign the uniform distribution label $y_{\mathrm{unif}}$ to all OOD data samples, as indicated by our theoretical analysis. This labeling method enables us to leverage OOD data for supervised learning at no extra cost. Moreover, it means that our method is completely free from the limitations of the methods using UID data (see Section 1).
Second, although OOD data can be used to improve the standard and robust generalization of neural networks, training on target data is essential to enhance the classification accuracy of neural networks. In addition, according to Theorem 3, adversarial training on pairs of OOD data samples and $y_{\mathrm{unif}}$ affects the weight for robust features as well as that for non-robust features. Hence, the balance between the losses from $\mathcal{D}_t$ and $\mathcal{D}_o$ is important in OAT. For this reason, we introduce a hyperparameter $\alpha$ into our proposed method and train neural networks as follows:
$\min_{\theta}\ \mathbb{E}_{(x,y)\sim\mathcal{D}_t}\Big[\max_{\delta\in\mathcal{S}}\mathcal{L}(x+\delta, y; \theta)\Big] + \alpha\,\mathbb{E}_{\tilde{x}\sim\mathcal{D}_o}\Big[\max_{\delta\in\mathcal{S}}\mathcal{L}(\tilde{x}+\delta, y_{\mathrm{unif}}; \theta)\Big]$ (OAT-A), $\quad \min_{\theta}\ \mathbb{E}_{(x,y)\sim\mathcal{D}_t}\big[\mathcal{L}(x, y; \theta)\big] + \alpha\,\mathbb{E}_{\tilde{x}\sim\mathcal{D}_o}\big[\mathcal{L}(\tilde{x}, y_{\mathrm{unif}}; \theta)\big]$ (OAT-S) (9)
Here, OAT-A and OAT-S represent OOD data augmented adversarial and standard learning, respectively. The pseudo-code for the overall procedure of our method is presented in Algorithm 1.
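The following PyTorch-style sketch shows one OAT-S training step consistent with the description above: a standard loss on the target batch plus an alpha-weighted cross-entropy against the uniform distribution label on the OOD batch. The function name and the exact loss combination are illustrative assumptions (Equation (9) and Algorithm 1 define the method), and the adversarial-example generation required for OAT-A is omitted.

```python
import torch
import torch.nn.functional as F

def oat_s_step(model, optimizer, x_target, y_target, x_ood, alpha=1.0):
    """One OAT-S training step: cross-entropy on the target batch plus an alpha-weighted
    cross-entropy between the OOD predictions and the uniform distribution label."""
    model.train()
    optimizer.zero_grad()

    logits_t = model(x_target)
    loss_target = F.cross_entropy(logits_t, y_target)

    logits_o = model(x_ood)
    num_classes = logits_o.size(1)
    y_unif = torch.full_like(logits_o, 1.0 / num_classes)   # uniform distribution label
    # Cross-entropy against a uniform target = mean negative log-softmax over classes.
    loss_ood = -(y_unif * F.log_softmax(logits_o, dim=1)).sum(dim=1).mean()

    loss = loss_target + alpha * loss_ood
    loss.backward()
    optimizer.step()
    return loss.item()
```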
4 Related studies
Adversarial examples
Many adversarial attack methods have been proposed, including the projected gradient descent (PGD) and Carlini & Wagner (CW) attacks (Madry et al., 2017; Carlini & Wagner, 2017). The PGD attack applies an iterative procedure of the fast gradient sign method (FGSM) (Goodfellow et al., 2014) to find worst-case examples that maximize the training loss. The CW attack finds adversarial examples using the CW loss instead of the cross-entropy loss. Recently, Croce & Hein (2020) proposed AutoAttack (AA), a powerful ensemble attack combining two extensions of the PGD attack with two existing attacks (Croce & Hein, 2019; Andriushchenko et al., 2019). To defend against these adversarial attacks, various adversarial defense methods have been developed (Goodfellow et al., 2014; Kannan et al., 2018). Madry et al. (2017) introduced adversarial training, which uses adversarial examples as training data, and Zhang et al. (2019) proposed TRADES, which optimizes a surrogate loss that is the sum of the natural error and the boundary error.
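For reference, a minimal sketch of the untargeted ℓ∞ PGD attack is given below; the step size, iteration count, and random start are common defaults rather than the exact settings of the cited papers.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Untargeted l_inf PGD: iterated FGSM steps with projection onto the eps-ball
    around the clean input (step size and iteration count are illustrative)."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)       # random start
    x_adv = x_adv.clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()          # ascend the loss
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps) # project onto the l_inf ball
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```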
OOD detection
Lee et al. (2017) and Hendrycks et al. (2018) dealt with the overconfidence problem of confidence score-based OOD detectors. They used uniform distribution labels, as in our method, to resolve the overconfidence issue. In particular, Augustin et al. (2020) addressed the overconfidence problem in adversarial settings and proposed a method that is practically identical to OAT-A by combining adversarial training with ACET (Hein et al., 2019). They did not, however, address the generalization problems of neural networks. Our theoretical results allow us to explain the classification performance improvements that were regarded only as secondary effects in the abovementioned studies. Further related works can be found in Appendix C.
5 Experimental results and discussion
5.1 Experimental setup
OOD datasets
We created OOD datasets from the 80 Million Tiny Images dataset (Torralba et al., 2008) (80M-TI), following the procedure of Carmon et al. (2019), for CIFAR-10 and CIFAR-100, respectively. In addition, we resized (using bilinear interpolation) ImageNet to dimensions of 64 x 64 and 160 x 160 and divided it into datasets containing 10 and 990 classes, respectively; these are called ImgNet10 and ImgNet990. Furthermore, we resized Places365 and VisDA-17 for the experiments on ImgNet10 and cropped the Simpson Characters (Simpson) and Fashion Product (Fashion) datasets to dimensions of 32 x 32 for the experiments on CIFAR-10 and CIFAR-100. The details of sourcing the OOD datasets can be found in Appendix D.
Implementation details
Implementation details, including the hyperparameter $\alpha$, architectures, batch sizes, and training iterations, are summarized in Appendix E. All models compared on the same target dataset are trained with the same total batch size and number of training iterations to ensure a fair comparison; in other words, an OAT batch contains both target and OOD samples but has the same total size as the batches used for the compared models. To evaluate the adversarial robustness of the models in our experiments, we apply several adversarial attacks, including PGD, CW, and AA. Note that we denote PGD and CW attacks with k iterative steps as PGDk and CWk, respectively, and the original test set as Clean. We compare the following models in our experiments (our code is available at https://github.com/Saehyung-Lee/OAT):
1. Standard: The model which is normally trained on the target dataset.
2. PGD: The model trained using PGD-based adversarial training on the target dataset.
3. TRADES: The model trained using TRADES on the target dataset.
4. OATPGD: The model which is adversarially trained with OAT based on a PGD approach.
5. OATTRADES: The model which is adversarially trained with OAT based on TRADES.
6. OATOOD: The model which is normally trained with OAT, where the subscript denotes the OOD dataset used (e.g., OAT80M-TI).
5.2 Study on the effectiveness of OAT in adversarial learning
Model | Target | OOD | Clean | PGD100 | CW100 | AA |
Standard | CIFAR10 | - | 95.48 | 0.00 | 0.00 | 0.00 |
PGD | - | 87.48 | 49.92 | 50.80 | 48.29 | |
PGD+CutMix | - | 89.35 | 53.39 | 52.35 | 49.05 | |
TRADES | - | 85.24 | 55.69 | 54.04 | 52.83 | |
OATPGD | 80M-TI | 86.63 | 56.77 | 52.38 | 49.98 | |
OATTRADES | 80M-TI | 86.76 | 59.66 | 55.71 | 54.63 | |
Standard | CIFAR100 | - | 78.57 | 0.02 | 0.00 | 0.00 |
PGD | - | 61.37 | 24.66 | 24.68 | 22.76 | |
TRADES | - | 58.84 | 30.24 | 27.97 | 26.91 | |
OATPGD | 80M-TI | 61.54 | 30.02 | 27.85 | 25.36 | |
OATTRADES | 80M-TI | 63.07 | 34.23 | 29.02 | 27.83 | |
Standard | ImgNet10 (64 x 64) | - | 86.03 | 0.11 | 0.06 | 0.00 |
PGD | - | 82.80 | 48.77 | 48.86 | 48.34 | |
OATPGD | ImgNet990 | 81.91 | 59.03 | 54.69 | 53.83 |
Target | CIFAR10 | | | | ImgNet10 (64 x 64) | |
OOD | None | SVHN | Simpson | Fashion | None | Places365 | VisDA-17
Clean | 87.48 | 86.16 | 86.79 | 85.84 | 82.80 | 82.37 | 82.46
PGD20 | 50.41 | 53.70 | 53.88 | 53.27 | 49.00 | 59.86 | 55.34
CW20 | 51.11 | 52.21 | 52.15 | 51.70 | 48.91 | 56.23 | 53.80
Evaluating adversarial robustness
The improvements in the robust generalization performance of the PGD and TRADES models obtained by applying OAT are evaluated against PGD100, CW100, and AA with $\ell_\infty$ perturbation bounds of 8/255 and 0.031. The results are summarized in Table 1 and indicate that OAT improves the robust generalization of all adversarial training methods tested, regardless of the target dataset. In particular, the results against AA show that the effectiveness of OAT does not rely on obfuscated gradients (Athalye et al., 2018), because AA removes the possibility of gradient masking through a combination of strong adaptive attacks (Croce & Hein, 2019; 2020) and a black-box attack (Andriushchenko et al., 2019).
However, while OAT brings a significant improvement in robustness against PGD attacks, it is relatively less effective against CW attacks. According to our theoretical analysis, these results imply that the OOD data contain relatively few of the non-robust features used by CW attacks. In other words, powerful targeted attacks, such as the CW attack, use gradient information more selectively than untargeted attacks, such as the PGD attack. To gain insight into this phenomenon, we train models with various values of the hyperparameter $\alpha$. The results suggest that the greater the influence of OOD data on the training process, the higher the robustness against PGD attacks and the lower the robustness against CW attacks (see Appendix G for more details).
In the absence of UID data, various data augmentation methods other than OAT can be employed. CutMix (Yun et al., 2019), a method of amplifying training data by cutting and pasting patches between training images, was recently proposed and exhibited excellent performance in classification and transfer learning tasks. By applying CutMix to adversarial training, we assess the advantages of OAT over other data augmentation methods. As deduced from Table 1, OAT brings a higher level of robustness than CutMix. Because adversarial examples are closely related to the high-frequency components of images (Wang et al., 2020) and adversarial training reduces the model's sensitivity to these components, CutMix is a relatively ineffective augmentation for adversarial training: although it extends the global feature distribution, it makes little difference to the local feature distribution. In contrast, OAT can effectively regularize the classifier by exposing it to the various undesirable features contained in the additional data during the learning process.
OAT with diverse OOD datasets
Non-robust features are widely shared among different datasets, as demonstrated by applying OAT using various OOD datasets. Table 2 indicates that OAT improves robust generalization for all OOD datasets, including those that have little correlation with the target dataset from a human perspective. In addition, given that relatively simple datasets with no background (Fashion and VisDA-17) are less effective than the others, we can suppose that, for natural image datasets such as CIFAR and ImageNet, non-robust features arise from the high complexity (Snodgrass & Vanderwart, 1980) of the images.
When UID data are available
In adversarial training, in-distribution data reduce the sensitivity of neural networks to non-robust features and provide robustly generalizable features. On the other hand, our theory shows that OAT obtains the data-amplification effect only for non-robust features when using OOD data. Therefore, the effectiveness of OAT is expected to decrease as more in-distribution data are added. To observe such a trend empirically, the effect of OAT is investigated as a function of the amount of additional in-distribution data, using previously published pseudo-labeled data (Carmon et al., 2019).

Surprisingly, Figure 1(a) shows that OAT still improves robust generalization even when many pseudo-labeled data are used. OAT is also combined with RST (Carmon et al., 2019), which recently achieved state-of-the-art adversarial robustness using UID data; Figure 1(b) demonstrates that OAT can further improve this state-of-the-art adversarial training method. This is presumably because the additional data include noisily labeled samples, which induce memorization in the learning process and thus impair generalization performance (Zhang et al., 2016). OAT appears to achieve a higher level of robust generalization by effectively suppressing the effects of such noisy data. Additional details on the experiments illustrated in Figure 1 are provided in Appendix H.
5.3 Study on the effectiveness of OAT in standard learning
Randomization test
The effect of OAT is analyzed based on the randomization test (Zhang et al., 2016). The randomization test is an experiment aiming to observe the effective capacity of neural networks and the effect of regularization by training the model on a copy of the data where the true labels are replaced by random labels.
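A minimal sketch of the label-randomization step of this test is shown below; the dataset wrapper and the class count are illustrative assumptions.

```python
import torch
from torch.utils.data import TensorDataset

def randomize_labels(dataset, num_classes, seed=0):
    """Return a copy of an (inputs, labels) TensorDataset in which every label is
    replaced by a uniformly random class, as in the randomization test of
    Zhang et al. (2016)."""
    inputs, _ = dataset.tensors
    generator = torch.Generator().manual_seed(seed)
    random_labels = torch.randint(0, num_classes, (inputs.size(0),), generator=generator)
    return TensorDataset(inputs, random_labels)

# Usage sketch: train the Standard and OAT models on this randomized copy and compare
# how quickly (if at all) each model drives the training error to zero.
# random_train = randomize_labels(train_set, num_classes=10)
```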

In Figure 2(a), it can be seen that the Standard model memorizes all training samples to reach a low training error, whereas the OAT model continues to have a high training error. These results show that OAT effectively regularizes neural networks so that they learn only features with a strong correlation with the class labels. Figure 2(b) indicates that the OAT model learns more slowly than the Standard model owing to the influence of the strong regularizer in the early stages of training; however, it achieves better generalization performance at the end of training.
Evaluating classification performance
The effect of OAT on classification accuracy is shown in Table 3. When the number of training samples N is small, the influence of undesirable features is expected to be large. Therefore, the experiments are grouped by the number of training samples N.
Dataset | CIFAR10 | CIFAR100 |
N | 2,500 / Full | 2,500 / Full
Standard | 65.44 / 94.46 | 24.41 / 74.87 |
OATSVHN | 68.56 / 94.45 | 24.82 / 75.65 |
OATSimpson | 70.08 / 94.43 | 27.04 / 76.03 |
OAT80M-TI | 72.49 / 95.20 | 26.13 / 76.30 |
Pseudo-label | - / 95.28 | - / 77.24 |
Fusion | - / 95.53 | - / 77.36 |
Dataset | ImgNet10 (64 x 64) | ImgNet10 (160 x 160)
N | 100 / Full | 100 / Full
Standard | 37.90 / 86.93 | 33.36 / 90.91 | ||||
OATVisDA17 | 36.21 / 86.71 | 35.93 / 91.23 | ||||
OATPlaces365 | 41.84 / 88.37 | 40.11 / 91.42 | ||||
OATImgNet990 | 42.18 / 87.88 | 40.41 / 91.87 |
Table 3 indicates that the effect of OAT is large when the amount of training data is small, as predicted. Additionally, OAT enhances generalization performance more when using 80M-TI, ImgNet990, and Places365, which have input distributions similar to those of the target datasets, than when using the other OOD datasets. This empirically supports our theoretical analysis: OOD data that follow the same undesirable feature distribution as the target data can improve generalization through OAT. In addition, the results of the Pseudo-label and Fusion models show that even when pseudo-labeled data are available, OOD data can be leveraged to further improve standard generalization performance. Moreover, the proposed method, which requires no complex operations and is very simple to implement, can yield higher performance when combined with existing data augmentation methods. As an example, the effectiveness of Mixup (Zhang et al., 2017) is enhanced by applying OAT; the experimental results are provided in Appendix I.
Finally, OAT in a standard learning scheme using the entire target dataset is generally less effective than OAT in an adversarial training scheme (see Appendix F for more details). It can therefore be inferred that the transferability of undesirable features is greater in adversarial settings than in standard settings. In other words, these results experimentally support the observation that the number of training samples required for robust generalization is large compared with that required for standard generalization.
6 Conclusions and future directions
In this study, a method is proposed to compensate for insufficient training data by using OOD data, which are less restrictive than UID data. It is theoretically demonstrated, in a simple Gaussian model, that training with OOD data can remove undesirable feature contributions. Experiments are performed on various OOD datasets and, surprisingly, demonstrate that even OOD datasets that apparently have little correlation with the target dataset from the human perspective can help standard and robust generalization through the proposed method. These results imply that a common undesirable feature space exists among diverse datasets. In addition, the effectiveness of the proposed method is evaluated when extra UID data are available, and the results indicate that OAT can improve generalization performance even when substantial pseudo-labeled data are used.
Nevertheless, some limitations need to be acknowledged. First, it is challenging to predict the effectiveness of the proposed method before applying it to a specific target-OOD dataset pair. Second, our method is less effective against strong targeted adversarial attacks, such as CW attacks, because it is difficult to generate deliberate adversarial attacks on in-distribution data in the process of OAT. Therefore, as a future research direction, we aim to quantify the degree to which undesirable features are shared between target and OOD datasets and to construct strong adversarial attacks using OOD data.
Acknowledgements:
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (Ministry of Science and ICT) [2018R1A2B3001628], the BK21 FOUR program of the Education and Research Program for Future ICT Pioneers, Seoul National University in 2020, and AIR Lab (AI Research Lab) in Hyundai & Kia Motor Company through HKMC-SNU AI Consortium Fund.
References
- Aggarwal (2018) Param Aggarwal. Fashion product images (small), 2018. data retrieved from Kaggle, https://www.kaggle.com/paramaggarwal/fashion-product-images-small.
- Andriushchenko et al. (2019) Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, and Matthias Hein. Square attack: a query-efficient black-box adversarial attack via random search. arXiv preprint arXiv:1912.00049, 2019.
- Athalye et al. (2018) Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.
- Attia (2018) Alexandre Attia. The simpsons characters data, 2018. data retrieved from Kaggle, https://www.kaggle.com/alexattia/the-simpsons-characters-dataset.
- Augustin et al. (2020) Maximilian Augustin, Alexander Meinke, and Matthias Hein. Adversarial robustness on in-and out-distribution improves explainability. In European Conference on Computer Vision, pp. 228–245. Springer, 2020.
- Bahng et al. (2019) Hyojin Bahng, Sanghyuk Chun, Sangdoo Yun, Jaegul Choo, and Seong Joon Oh. Learning de-biased representations with biased representations. arXiv preprint arXiv:1910.02806, 2019.
- Bellman (1961) Robert Bellman. Curse of dimensionality. Adaptive control processes: a guided tour. Princeton, NJ, 3:2, 1961.
- Ben-Tal et al. (2013) Aharon Ben-Tal, Dick Den Hertog, Anja De Waegenaere, Bertrand Melenberg, and Gijs Rennen. Robust solutions of optimization problems affected by uncertain probabilities. Management Science, 59(2):341–357, 2013.
- Carlini & Wagner (2017) Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. IEEE, 2017.
- Carmon et al. (2019) Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, John C Duchi, and Percy S Liang. Unlabeled data improves adversarial robustness. In Advances in Neural Information Processing Systems, pp. 11190–11201, 2019.
- Chan et al. (2020) Alvin Chan, Yi Tay, and Yew-Soon Ong. What it thinks is important is important: Robustness transfers through input gradients. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
- Croce & Hein (2019) Francesco Croce and Matthias Hein. Minimally distorted adversarial examples with a fast adaptive boundary attack. arXiv preprint arXiv:1907.02044, 2019.
- Croce & Hein (2020) Francesco Croce and Matthias Hein. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. arXiv preprint arXiv:2003.01690, 2020.
- Dai et al. (2017) Zihang Dai, Zhilin Yang, Fan Yang, William W Cohen, and Russ R Salakhutdinov. Good semi-supervised learning that requires a bad gan. In Advances in neural information processing systems, pp. 6510–6520, 2017.
- Deng et al. (2009) J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09, 2009.
- Geirhos et al. (2018) Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A Wichmann, and Wieland Brendel. Imagenet-trained cnns are biased towards texture; increasing shape bias improves accuracy and robustness. arXiv preprint arXiv:1811.12231, 2018.
- Goodfellow et al. (2014) Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
- He et al. (2016) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European conference on computer vision, pp. 630–645. Springer, 2016.
- Hein et al. (2019) Matthias Hein, Maksym Andriushchenko, and Julian Bitterwolf. Why relu networks yield high-confidence predictions far away from the training data and how to mitigate the problem. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 41–50, 2019.
- Hendrycks et al. (2018) Dan Hendrycks, Mantas Mazeika, and Thomas Dietterich. Deep anomaly detection with outlier exposure. arXiv preprint arXiv:1812.04606, 2018.
- Hendrycks et al. (2019) Dan Hendrycks, Kimin Lee, and Mantas Mazeika. Using pre-training can improve model robustness and uncertainty. arXiv preprint arXiv:1901.09960, 2019.
- Hinton et al. (2012) Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Brian Kingsbury, et al. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal processing magazine, 29, 2012.
- Howard. Jeremy Howard. Imagenette. URL https://github.com/fastai/imagenette/.
- Kannan et al. (2018) Harini Kannan, Alexey Kurakin, and Ian Goodfellow. Adversarial logit pairing. arXiv preprint arXiv:1803.06373, 2018.
- Krizhevsky et al. (2009) Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
- Krizhevsky et al. (2012) Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp. 1097–1105, 2012.
- Lee et al. (2017) Kimin Lee, Honglak Lee, Kibok Lee, and Jinwoo Shin. Training confidence-calibrated classifiers for detecting out-of-distribution samples. arXiv preprint arXiv:1711.09325, 2017.
- Lee et al. (2020) Saehyung Lee, Hyungyu Lee, and Sungroh Yoon. Adversarial vertex mixup: Toward better adversarially robust generalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
- Madry et al. (2017) Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
- Moosavi-Dezfooli et al. (2017) Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1765–1773, 2017.
- Najafi et al. (2019) Amir Najafi, Shin-ichi Maeda, Masanori Koyama, and Takeru Miyato. Robustness to adversarial perturbations in learning from incomplete data. In Advances in Neural Information Processing Systems, pp. 5542–5552, 2019.
- Naseer et al. (2019) Muhammad Muzammal Naseer, Salman H Khan, Muhammad Haris Khan, Fahad Shahbaz Khan, and Fatih Porikli. Cross-domain transferability of adversarial perturbations. In Advances in Neural Information Processing Systems, pp. 12885–12895, 2019.
- Netzer et al. (2011) Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. 2011.
- Peng et al. (2017) Xingchao Peng, Ben Usman, Neela Kaushik, Judy Hoffman, Dequan Wang, and Kate Saenko. Visda: The visual domain adaptation challenge, 2017.
- Poursaeed et al. (2018) Omid Poursaeed, Isay Katsman, Bicheng Gao, and Serge Belongie. Generative adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4422–4431, 2018.
- Schmidt et al. (2018) Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry. Adversarially robust generalization requires more data. In Advances in Neural Information Processing Systems, pp. 5014–5026, 2018.
- Snodgrass & Vanderwart (1980) Joan G Snodgrass and Mary Vanderwart. A standardized set of 260 pictures: norms for name agreement, image agreement, familiarity, and visual complexity. Journal of experimental psychology: Human learning and memory, 6(2):174, 1980.
- Stanforth et al. (2019) Robert Stanforth, Alhussein Fawzi, Pushmeet Kohli, et al. Are labels required for improving adversarial robustness? arXiv preprint arXiv:1905.13725, 2019.
- Szegedy et al. (2013) Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
- Torralba et al. (2008) Antonio Torralba, Rob Fergus, and William T Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE transactions on pattern analysis and machine intelligence, 30(11):1958–1970, 2008.
- Tsipras et al. (2018) Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. Robustness may be at odds with accuracy. arXiv preprint arXiv:1805.12152, 2018.
- Vapnik & Vapnik (1998) Vladimir Vapnik and Vladimir Vapnik. Statistical learning theory. Wiley, New York, 1998.
- Wang et al. (2020) Haohan Wang, Xindi Wu, Zeyi Huang, and Eric P. Xing. High-frequency component helps explain the generalization of convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
- Xie et al. (2019) Qizhe Xie, Eduard Hovy, Minh-Thang Luong, and Quoc V Le. Self-training with noisy student improves imagenet classification. arXiv preprint arXiv:1911.04252, 2019.
- Yun et al. (2019) Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo. Cutmix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE International Conference on Computer Vision, pp. 6023–6032, 2019.
- Zagoruyko & Komodakis (2016) Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.
- Zhang et al. (2016) Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530, 2016.
- Zhang et al. (2019) Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. Theoretically principled trade-off between robustness and accuracy. In Kamalika Chaudhuri and Ruslan Salakhutdinov (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 7472–7482, Long Beach, California, USA, 09–15 Jun 2019. PMLR. URL http://proceedings.mlr.press/v97/zhang19p.html. https://github.com/yaodongyu/TRADES.
- Zhang et al. (2017) Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412, 2017.
- Zhou et al. (2017) Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
Appendix A Proofs
Theorem 1.
Let $t$ be the given target value for the feature vector $z$ in our classification model, and let $\epsilon$ be a non-negative constant. Then, for an $\epsilon$-bounded adversary, the expectation of the adversarial feature vector $z^{\mathrm{adv}}$ is
(10) |
Proof.
Let $\delta$ be the adversarial perturbation in the input space. Then,
(11) |
Equation (11) suggests that constructing the adversarial perturbation in the input space can be approximated by finding the adversarial perturbation in the feature space; the two are related through a linear approximation of $g$. Hence, without loss of generality, the perturbed feature vector can be approximated by $z + \delta_z$ with a bounded perturbation $\delta_z$. Then,
(12) |
Because our classification model was trained to minimize the expected standard loss, can be approximated by . Then,
(13) |
∎
Theorem 2.
When $t = 1/2$, the expected gradient of the loss function with respect to the weight vector $w$ of our classification model is
(14) |
Proof.
Based on the adversarial vulnerability of our classifier, can be approximated by . Therefore,
(15) |
∎
Theorem 3.
When $w_1 \neq 0$ and $t = 1/2$, the expected gradient of the loss function with respect to the weight vector $w$ of our classification model is
(16) |
Proof.
(17) |
∎
Appendix B The Theoretical Motivation of OAT in Standard Learning
Based on the same setup described in Section 3 of the main manuscript, we construct the data model for OAT in the following manner:
(18) |
For simplicity, we consider only one desirable feature extractor $g_d$ and one undesirable feature extractor $g_u$. In Equation (18), the features $z_d$ and $z_u$ represent the outputs of the feature extractors $g_d$ and $g_u$, respectively. As the OOD input vectors do not have the same desirable features as the target input vectors, the mean of $z_d$ is zero. On the other hand, because the OOD data have the same distribution of undesirable features as the target data, the mean of $z_u$ is non-zero. In addition, considering that the feature extractors are trained to respond sensitively to the target input vectors, the output variance of the undesirable feature extractor is much larger than that of the desirable feature extractor for OOD data, especially in the high-dimensional case. Here, $y$ represents the unknown label associated with the undesirable feature, and $\eta$ is a non-negative constant that represents the degree of correlation between the undesirable feature and the unknown label.
We can prove the following theorems for a logistic regression model that is not strongly dependent on the desirable feature.
Theorem 4.
The OOD data barely affect the gradient update of the weight associated with the desirable feature.
Proof.
Because we assumed a linear classifier that is not strongly dependent on the desirable feature owing to the bias of CNNs or insufficient training data, can be approximated by . Then, for the target value , we specified the feature as shown in the following equation:
(19) |
∎
Note that in Equation (19), the gradient of the loss function with respect to the weight value is zero regardless of . The optimal is then a value that makes converge to zero, thereby removing the influence of the undesirable feature.
Theorem 5.
When $t = 1/2$, standard learning on the OOD data leads to the weight value corresponding to the undesirable feature converging to 0.
Proof.
(20) |
When , with high probability. Hence,
(21) |
Similarly,
(22) |
∎
Appendix C Further Related Works
Using Unlabeled Data to Improve Adversarial Robustness
Stanforth et al. (2019) and Carmon et al. (2019) analyzed the large sample complexity required for adversarially robust generalization (Schmidt et al., 2018). Based on the model described in prior work (Schmidt et al., 2018), they theoretically proved that unlabeled data can alleviate the need for the large sample complexity of robust generalization. They therefore proposed semi-supervised learning techniques that augment the training dataset with extra unlabeled data, using pseudo-labels obtained from a model trained on the existing training dataset. They experimented on CIFAR-10 using the 80 Million Tiny Images dataset (Torralba et al., 2008) (80M-TI) as an extra training dataset and improved the adversarial robustness of the model. Najafi et al. (2019) also used unlabeled data for adversarial robustness by extending distributionally robust learning (Ben-Tal et al., 2013) to semi-supervised learning scenarios. Instead of pseudo-labels, they used soft labels, which are chosen from a set of labels and softened according to the loss values of the data. Their results include experiments on the MNIST, CIFAR-10, and SVHN datasets.
Comparison with Bad GAN
Dai et al. (2017) theoretically showed that a perfect generator cannot enhance generalization performance and that good semi-supervised learning actually requires a bad generator. Through theoretical analysis, they proposed an empirical formulation to generate samples in low-density regions of the input space. There are two main differences between Bad GAN and OAT. First, Dai et al. (2017) considered only semi-supervised learning settings, whereas OAT can also be applied to adversarial settings. Second, data augmentation using a GAN has clear limitations: Dai et al. (2017) penalized high-density samples to generate low-density samples in the input space, but this approach is inefficient in a high-dimensional input space. OAT, on the other hand, directly uses various OOD data to regularize the model; a similar shift from synthetic data to real data can also be found in OOD detection (Hendrycks et al., 2018).
Appendix D Sourcing OOD Datasets
We created OOD datasets from 80M-TI, following the procedure of Carmon et al. (2019), for CIFAR-10 and CIFAR-100, respectively. In other words, we trained an 11-way classifier (one extra class for OOD) on a training set consisting of the CIFAR-10 dataset and 1M images randomly sampled from 80M-TI with keywords that do not appear in CIFAR-10. We then applied the classifier to 80M-TI and sorted the images by their confidence in the OOD class. The 1M and 5M images selected in order of highest confidence were used for OAT-A and OAT-S, respectively. In Table 3, the Fusion model uses batches that contain OOD data in addition to pseudo-labeled data, with the same total batch size as the Pseudo-label model. In addition, we resized (using bilinear interpolation) ImageNet to dimensions of 64 x 64 and 160 x 160 and divided it into datasets containing 10 and 990 classes, respectively; these are called ImgNet10 (train set size = 9,894 and test set size = 3,500) and ImgNet990. The classes were divided based on the Imagenette dataset (Howard), and the experimental results for a differently divided dataset (Imagewoof) can be seen in Appendix F. We increased the number of ImgNet990 images tenfold through random cropping and created the OOD datasets by the same process as for 80M-TI. Furthermore, we resized (bilinear interpolation) Places365 (Zhou et al., 2017) and VisDA-17 (Peng et al., 2017) for the experiments on ImgNet10 and cropped Simpson Characters (Simpson) (Attia, 2018) and Fashion Product (Fashion) (Aggarwal, 2018) to dimensions of 32 x 32 for the experiments on CIFAR.
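A sketch of the confidence-based selection step described above is given below; the classifier, the candidate loader, and the in-memory top-k selection are simplifying assumptions (the full 80M-TI selection would have to be processed in chunks).

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_ood_by_confidence(classifier, candidate_loader, ood_class, k):
    """Score every candidate image by the 11-way classifier's softmax confidence in the
    extra OOD class and keep the k most confident ones. This is a sketch of the selection
    step described above; classifier and candidate_loader are assumed to exist, and the
    in-memory top-k is only practical for a manageable candidate pool."""
    classifier.eval()
    all_scores, all_images = [], []
    for images, _ in candidate_loader:
        probs = F.softmax(classifier(images), dim=1)
        all_scores.append(probs[:, ood_class].cpu())
        all_images.append(images.cpu())
    scores = torch.cat(all_scores)
    images = torch.cat(all_images)
    top = torch.topk(scores, k).indices
    return images[top]
```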
Target | architecture | α | training steps | batch size
CIFAR | WRN-34-10 (Zagoruyko & Komodakis, 2016) | 1.0 | 80K | 128 |
ImgNet10 | ResNet18 (He et al., 2016) | 1.0 | 15.4K | 128 |
Target | N | architecture | α | training steps | batch size
CIFAR | 2500 | ResNet18 | 1.0 | 4000 | 128 |
50K | 78K | 128 | |||
ImgNet10 (64 x 64) | 100 | WRN-22-10 | 1.0 | 200 | 100 |
9894 | 0.2 | 15.4K | 128 | ||
ImgNet10 (160 x 160) | 100 | ResNet18 | 1.0 | 400 | 100 |
9894 | 0.1 | 38.5K | 128 |
Appendix E Implementation Details
For all the experiments except TRADES and OATTRADES, the initial learning rate is set to 0.1. The learning rate is multiplied by 0.1 at 50% and 75% of the total training steps, and weight decay is applied. We use the same adversarial perturbation budget as in Madry et al. (2017). We recorded the maximum adversarial robustness of the models on the test set after the first learning-rate decay in adversarial training, and the empirical upper bound of the test accuracy of the models during standard learning. The other details are summarized in Tables 4 and 5. For TRADES and OATTRADES, we use a batch size of 64 and train the models with the same configurations as Zhang et al. (2019).
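For concreteness, the following sketch builds the optimizer and the step learning-rate schedule described above; the momentum and weight-decay values are common defaults and are not taken from the paper.

```python
import torch

def make_optimizer_and_scheduler(model, total_steps, lr=0.1, momentum=0.9,
                                 weight_decay=5e-4):
    """SGD with the step schedule described above: the learning rate starts at 0.1 and
    is multiplied by 0.1 at 50% and 75% of the total training steps. The momentum and
    weight-decay values are common defaults, not values taken from the paper."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum,
                                weight_decay=weight_decay)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer,
        milestones=[total_steps // 2, (3 * total_steps) // 4],
        gamma=0.1,
    )
    # Call scheduler.step() once per training step so the milestones are step indices.
    return optimizer, scheduler
```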
Appendix F Further Discussion about the Effectiveness of OAT
We created a new ImgNet10 dataset with reference to Imagewoof (Howard) to learn more about the effects of OAT. Imagewoof is a subset of ImageNet in which all classes are dog breeds. We tested whether OAT is effective on this newly constructed ImgNet10 dataset; the experimental results for OAT-A and OAT-S are shown in Tables 6 and 7, respectively.
Model | OOD | Clean | PGD20 | CW20 |
Standard | - | 75.77 | - | - |
PGD | - | 53.64 | 16.18 | 14.81 |
OATPGD | ImgNet990 | 48.24 | 22.09 | 17.1 |
Places365 | 50.45 | 22.57 | 17.37 |
Target | N | 100 | 250 | 500 | 1250 | 2500 | Full |
ImgNet10 (64 x 64) | Standard | 19.42 | 25.48 | 31.49 | 47.04 | 57.12 | 75.77 |
OATImgNet990 | 22.74 | 30.07 | 35.16 | 48.84 | 58.14 | 75.17 | |
OATPlaces365 | 22.36 | 30.15 | 33.57 | 48.16 | 57.15 | 75.63 | |
ImgNet10 (160 x 160) | Standard | 20.03 | 25.36 | 30.69 | 50.69 | 64.82 | 81.22 |
OATImgNet990 | 23.64 | 30.05 | 37.09 | 60.01 | 70.39 | 83.16 | |
OATPlaces365 | 23.09 | 31.74 | 37.22 | 59.45 | 69.90 | 83.04 |
Tables 6 and 7 show that OAT is largely useful for generalization on the new ImgNet10. However, from the results for ImgNet10 (64 x 64) in Table 7, we can speculate that OAT helps training as a regularizer only when an excessive number of features are involved in the classification, because OAT shows no performance improvement on the full new ImgNet10 (64 x 64) dataset. The new ImgNet10 (64 x 64) is believed to have lost much of its categorical features, since all the classes are dog breeds and the images are heavily downscaled. In other words, OAT is beneficial for the overfitting problem but not for the underfitting problem. We can confirm this by artificially reducing the number of training samples, which increases the number of features that can separate the categorical sample distributions, and then observing the improvement in generalization performance obtained by applying OAT.
Appendix G Ablation study
α | PGD20 | CW20
1.0 | 57.45 | 52.65 |
2.0 | 58.25 | 52.16 |
3.0 | 58.31 | 52.02 |
4.0 | 59.04 | 51.71 |
5.0 | 59.48 | 51.24 |
Our experiments indicate that it is possible to enhance adversarial robustness by removing, through OAT, the contributions of non-robust features existing in additional data and transferring this effect to in-distribution data. However, the increase in robustness against targeted attacks appears smaller than that against untargeted attacks with our method. To gain insight into this phenomenon, we train OAT models for various values of $\alpha$ and investigate the difference between untargeted and targeted attacks. Table 8 shows that the greater the influence of OOD data on the training process, the higher the robustness against PGD20 and the lower the robustness against CW20. According to our theoretical analysis, this can be understood as the OOD data containing many of the features used by untargeted attacks but relatively few of those used by targeted attacks. Because targeted attacks are stronger than untargeted attacks (Carlini & Wagner, 2017), the results in Table 8 provide empirical evidence for a trade-off between the transferability of adversarial perturbations and the strength of adversarial attacks.
A similar trend was reported by Chan et al. (2020), who showed that input gradient adversarial matching (IGAM) can transfer robustness across different tasks. Their results show that IGAM-trained models have similar or higher robustness than baseline models against weak attacks, such as FGSM or low-step PGD attacks, but are vulnerable to strong attacks. The fine-tuning-based method of Hendrycks et al. (2019) can also be regarded as an attempt to increase robustness by using other datasets, but the abovementioned phenomenon is not observed there. This is because that method does not transfer robustness learned from other datasets to the target dataset; rather, it reuses, with little modification, a function learned from a dataset that has a large sample complexity and a data distribution similar to that of the target data. This can be confirmed from the small number of training iterations and the small learning rate involved in the fine-tuning process. Moreover, applying the same method to a dataset with a small sample size or one far from the target data distribution has no effect (Chan et al., 2020).
Appendix H When UID data are available
Model | Target | Ratio | Error rate |
Standard | CIFAR10 | - | 5.54 |
OAT80M-TI | - | 4.80 | |
Mixup | 0.0 | 4.11 | |
OAT80M-TI+Mixup | 0.6 | 3.88 | |
Standard | CIFAR100 | - | 25.13 |
OAT80M-TI | - | 24.35 | |
Mixup | 0.0 | 22.63 | |
OAT80M-TI+Mixup | 0.6 | 21.60 | |
Standard | ImgNet10 (64 x 64) | - | 13.07 |
OATImgNet990 | - | 12.12 | |
OATPlaces365 | - | 11.63 | |
Mixup | 0.0 | 11.60 | |
OATImgNet990+Mixup | 0.7 | 10.77 | |
OATPlaces365+Mixup | 0.7 | 10.39 | |
Standard | ImgNet10 (160 x 160) | - | 9.09 |
OATImgNet990 | - | 8.13 | |
OATPlaces365 | - | 8.58 | |
Mixup | 0.0 | 7.04 | |
OATImgNet990+Mixup | 0.3 | 6.46 | |
OATPlaces365+Mixup | 0.3 | 6.90 |
We train the OAT models in conjunction with UID data as follows:
(23) |
In the experiment of Figure 1(a), the Pseudo-label models are trained via a PGD-based approach, and every batch consists of 64 CIFAR-10 samples and 64 pseudo-labeled samples. The OAT+Pseudo-label models are also trained via a PGD-based approach, and every batch consists of 64 CIFAR-10 samples, 32 pseudo-labeled samples, and 32 OOD samples (80M-TI). The hyperparameters in Equation (23) are set accordingly. All other conditions are the same as described in Appendix E. In the experiment of Figure 1(b), we train the models with the same configurations as Carmon et al. (2019), but every batch for the OAT+RST model comprises 128 pseudo-labeled samples, 64 CIFAR-10 samples, and 64 OOD samples (the RST model has a batch size of 256). The hyperparameters are set to (0.25, 0.25, 0.5).
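A sketch of the combined objective assumed for Equation (23) is given below; the mapping of the reported weight triple to the three loss terms and the omission of adversarial-example generation are illustrative assumptions.

```python
import torch.nn.functional as F

def oat_uid_loss(model, x_target, y_target, x_pseudo, y_pseudo, x_ood,
                 weights=(0.25, 0.25, 0.5)):
    """Hypothetical combined objective for Equation (23): a weighted sum of the target
    loss, the pseudo-labeled loss, and the uniform-label OOD loss. The assignment of the
    weight triple to the three terms is an assumption."""
    loss_target = F.cross_entropy(model(x_target), y_target)
    loss_pseudo = F.cross_entropy(model(x_pseudo), y_pseudo)
    # Cross-entropy against the uniform label reduces to the mean negative log-softmax.
    loss_ood = -F.log_softmax(model(x_ood), dim=1).mean()
    w_t, w_p, w_o = weights
    return w_t * loss_target + w_p * loss_pseudo + w_o * loss_ood
```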
Appendix I Experimental results for Mixup
Zhang et al. (2017) proposed a data augmentation method named Mixup. Mixup generates training examples as $\tilde{x} = \lambda x_i + (1-\lambda) x_j$ and $\tilde{y} = \lambda y_i + (1-\lambda) y_j$, where $(x_i, y_i)$ and $(x_j, y_j)$ are two examples drawn at random from the training data, and $\lambda \in [0, 1]$. Here, we show that the region outside the convex combinations of the training data can also be effectively regularized using OOD data. Table 9 shows that the models trained with the combination of OAT and Mixup always achieve the highest accuracy. The OAT+Mixup models are trained by combining the two methods, as sketched below.
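The listing below is a minimal sketch of one plausible OAT+Mixup training step, assuming that Mixup is applied to the labeled target batch while the OOD batch is trained toward the uniform distribution label as in OAT-S; the exact combination used in our experiments may differ.

```python
import numpy as np
import torch
import torch.nn.functional as F

def oat_mixup_step(model, optimizer, x_target, y_target, x_ood,
                   alpha_oat=1.0, mixup_alpha=1.0):
    """A sketch of one OAT+Mixup step (assumed combination, not necessarily the exact
    algorithm used in the experiments): Mixup on the labeled target batch plus the
    uniform-label OOD loss of OAT-S."""
    model.train()
    optimizer.zero_grad()

    # Mixup on the target batch (Zhang et al., 2017).
    lam = np.random.beta(mixup_alpha, mixup_alpha)
    perm = torch.randperm(x_target.size(0))
    x_mix = lam * x_target + (1.0 - lam) * x_target[perm]
    logits_t = model(x_mix)
    loss_target = lam * F.cross_entropy(logits_t, y_target) \
        + (1.0 - lam) * F.cross_entropy(logits_t, y_target[perm])

    # Uniform-label regularization on the OOD batch.
    logits_o = model(x_ood)
    loss_ood = -F.log_softmax(logits_o, dim=1).mean()

    loss = loss_target + alpha_oat * loss_ood
    loss.backward()
    optimizer.step()
    return loss.item()
```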