AED-PADA: Improving Generalizability of Adversarial Example Detection via Principal Adversarial Domain Adaptation
Abstract.
Adversarial example detection, which can be conveniently applied in many scenarios, is important in the area of adversarial defense. Unfortunately, existing detection methods suffer from poor generalization performance, because their training process usually relies on the examples generated from a single known adversarial attack and there exists a large discrepancy between the training and unseen testing adversarial examples. To address this issue, we propose a novel method, named Adversarial Example Detection via Principal Adversarial Domain Adaptation (AED-PADA). Specifically, our approach identifies the Principal Adversarial Domains (PADs), i.e., a combination of features of the adversarial examples generated by different attacks, which covers a large portion of the entire adversarial feature space. Subsequently, we pioneer the exploitation of Multi-source Unsupervised Domain Adaptation in adversarial example detection, with PADs as the source domains. Experimental results demonstrate the superior generalization ability of our proposed AED-PADA. Note that this superiority is particularly evident in challenging scenarios characterized by the minimal magnitude constraint for the perturbations.
1. Introduction
Recently, Deep Neural Networks (DNNs) have been playing prominent roles in many applications. Unfortunately, numerous studies have demonstrated that DNNs can be easily deceived if certain imperceptible perturbations are introduced to their inputs (SzegedyZSBEGF13, ; NguyenYC15, ; shi2021, ; SSA, ; ILA-DA, ; a2sc, ). These perturbed inputs, also known as adversarial examples, force DNNs to produce erroneous decisions and have become a significant security concern in safety-sensitive scenarios, such as autonomous driving (attack_driving, ) and medical diagnosis (attack_medical, ).
Nowadays, adversarial training has been proved to be an effective adversarial defense strategy (FGSM, ; AD_tripletloss, ; XieTGWYL20, ; adversarial_training_tomm, ). However, it requires sufficient knowledge about the classification models and necessitates substantial computational costs to retrain them. On the contrary, adversarial example detection, a.k.a. adversarial detection, defends against adversarial attacks by distinguishing whether the inputs are benign or manipulated. This type of method can be efficiently deployed to protect many applications without requiring extra knowledge about the core model.
Generalization ability is vital for adversarial detection methods in real-world scenarios, because these methods tend to encounter unseen attacks and are expected to perform consistently. Current detection techniques (lid, ; md, ; steg, ; SID, ; txt_advdetection, ) typically achieve considerable generalization ability over a few conventional attacks, such as BIM (BIM, ), PGD (pgd, ), and C&W (CW, ). However, we empirically observe that these methods exhibit instability and inadequate performance against recent attacks, such as SSA (SSA, ), Jitter (jitter, ) and ILA-DA (ILA-DA, ). These methods tend to give unsatisfactory generalization performance, because their training process usually relies on a single known adversarial attack and they have not been developed from the perspective of boosting generalization ability.
In this paper, we provide a new perspective to further analyze the generalization ability of adversarial detection methods. Clearly, the threat model of all different adversarial attacks should be identical to ensure a fair analysis and comparison. Thus, the features extracted by the model from different attacks are of the same dimension within a shared feature space. Due to the variations in configurations such as attack objectives, parameters, and loss functions, the features of the examples generated by each attack form a distinct domain, named an Adversarial Domain (AD), as depicted in Fig. 1(a). Formally, the Adversarial Domain (AD) of a particular adversarial attack is defined as the cumulative representations of all the adversarial examples generated by that attack. Existing detection methods typically select a random single attack to generate the training samples, i.e., they only select one AD as the source domain, as shown in Fig. 1(b). Apparently, there are few intersections between the source and unseen target domains, which usually leads to poor generalization performance. Additionally, the randomness inherent in the selection of the source domain induces considerable fluctuations in generalization performance.

A straightforward solution to the above issues, as shown in Fig. 1(c), is to randomly select multiple ADs as the source domains. This strategy tends to create a larger overlap with the target domain(s) and thereby improves the generalization performance of adversarial detection. Nonetheless, this strategy also induces uncertainty, and selecting similar ADs incurs additional training costs without sufficient performance gains.
To further address the aforementioned problems, we propose a novel detection method, named Adversarial Example Detection via Principal Adversarial Domain Adaptation (AED-PADA). As shown in Fig. 1(d), by selecting Principal Adversarial Domains (PADs) as the source domains, which significantly enlarge the coverage of the entire adversarial feature space and create a larger overlap with the target domain, our method offers superior generalization performance.
Specifically, AED-PADA contains two stages, i.e., Principal Adversarial Domains Identification (PADI) and Principal Adversarial Domain Adaptation (PADA). In the stage of PADI, since the discrepancies between the adversarial examples from various adversarial attacks are quite different from those in ordinary classification tasks, we exploit adversarial supervised contrastive learning (Adv-SCL) to construct distinguishable ADs. Then, the selection of the most representative ADs must meet two key criteria. Firstly, there should be a clear distinction between candidate ADs to avoid redundancy caused by selecting similar ADs. Secondly, the combination of the candidate ADs should cover as much of the entire feature space as possible. To select the most representative ADs, we propose a Coverage of Entire Feature Space (CEFS) metric. With our CEFS metric, the formed PADs possess broad coverage of the entire feature space, and thus effectively improve the likelihood of capturing the location of the unseen target AD(s).
In the stage of PADA, we pioneer the exploitation of the Multi-source Unsupervised Domain Adaptation (MUDA) mechanism to effectively utilize the rich knowledge acquired from PADs, in order to detect the unseen adversarial examples in the target domain. The framework of PADA is compatible with various existing MUDA methods. Since typical MUDA methods only focus on extracting semantic features from the spatial domain, we propose an adversarial feature enhancement module to extract features from both the spatial and frequency domains to construct a more comprehensive representation of adversarial examples.
Our major contributions can be summarized as follows.
• We propose a novel adversarial example detection method, named Adversarial Example Detection via Principal Adversarial Domain Adaptation, to significantly improve the generalization performance of adversarial detection.
• We propose Principal Adversarial Domains Identification to identify the PADs, which possess a large coverage of the entire adversarial example feature space, with the help of the constructed AD clustering and the proposed CEFS metric.
• We propose Principal Adversarial Domain Adaptation for detecting adversarial examples, by exploiting adversarial feature enhancement based Multi-source Unsupervised Domain Adaptation (MUDA), which is compatible with various existing MUDA methods. To the best of our knowledge, this is the first work to exploit MUDA for adversarial example detection.
2. Related Work
2.1. Adversarial Example Detection
The majority of existing adversarial example detection methods rely on statistical features (HendrycksG17, ; magnet, ; FarAwayDis1, ; FarAwayDis2, ; Hidden4, ; Erase-and-Restore, ). They usually assume that the benign and adversarial examples originate from different distributions, and construct detectors based on the distinct statistical characteristics of these examples. Specifically, Grosse et al. (STest1, ) utilize Maximum Mean Discrepancy for adversarial example detection. Li et al. employ Principal Component Analysis (PCA) to extract statistical features, and construct a cascade classifier based on Support Vector Machines (SVMs) (PCAInconsistency, ). Feinman et al. (KDBU, ) carry out the detection based on Kernel Density (KD) and Bayesian-Uncertainty (BU) estimation. Ma et al. (lid, ) exploit the concept of Local Intrinsic Dimensionality (LID) to calculate the distance between the distribution of inputs and their neighbors. Lee et al. (md, ) utilize Gaussian Discriminant Analysis (GDA) to model the difference between the benign and adversarial samples, and differentiate them based on the Mahalanobis Distance (MD). Liu et al. (steg, ) point out that steganalysis can be applied to adversarial example detection, and propose a steganalysis-based detection method (Steg). Tian et al. (SID, ) reveal the inconsistency in the boundary fluctuations between the adversarial and benign examples, and construct the Sensitivity Inconsistency Detector (SID) to identify the adversarial examples. Wang et al. (txt_advdetection, ) embed hidden-layer feature maps of DNNs into word vectors, and detect adversarial examples via Sentiment Analysis (SA).
Existing adversarial detection methods exhibit poor generalization because their training typically depends on a single known attack, which can vastly differ from the unseen testing attacks. In this paper, we propose a novel adversarial detection method, which substantially increases the coverage of the entire adversarial feature space and creates a larger overlap with the testing adversarial attacks.
2.2. Multi-source Unsupervised Domain Adaptation
Transfer learning (transfer_learning_survey1, ; transfer_learning_survey2, ; transfer_learning_survey3, ) is a deep learning technique which leverages knowledge acquired from the source task(s) to improve learning efficiency and performance on a related but different target task. Unsupervised domain adaptation (UDA) (unsuperviesed_domain_adaptation1, ) is a popular type of transfer learning which aims to migrate knowledge learned from the labeled source domain(s) to the target domain, where only unlabeled target data are available for training. Single-source Unsupervised Domain Adaptation (SUDA) (single-uda1, ; single-uda2, ; single-uda3, ), which transfers knowledge from a single source domain to one target domain, has been widely explored in previous research. Compared to SUDA, Multi-source Unsupervised Domain Adaptation (MUDA) acquires richer information while introducing a new challenge, i.e., how to effectively bridge the domain gaps between all source domains and the target domain.
Various distribution alignment schemes have been proposed to achieve alignment between source and target domains. For example, Multiple Feature Spaces Adaptation Network (MFSAN) (aaai-mda, ) leverages Maximum Mean Discrepancy (MMD) to align the distributions of each pair of source and target domains in multiple specific feature spaces and aligns the outputs of classifiers by utilizing the domain-specific decision boundaries. Peng et al. (moment-mda, ) provide new theoretical insights specifically for moment matching to align the sources with each other and with the target. Owing to the development of generative adversarial networks, adversarial learning is widely used to find a domain-invariant feature space. It either focuses on approximating all combinations of pairwise domain discrepancies between each source and the target (cocktail, ; DARN, ) or uses a single domain discriminator (MIAN, ). Other explicit measures of discrepancy, such as Wasserstein distance (mdmn, ; Wasserstein, ), are also employed in MUDA to align the distribution of features. In addition to distribution alignment, the graph-matching metric (graph-muda1, ; graph-muda2, ) also considers the structural and geometric information, which achieves the alignment between the source and target domains by mapping both nodes and edges in a graph.
In this paper, we argue that the poor generalization performance of adversarial detection is due to the significant discrepancy between the source domains utilized for training and the target domain utilized for testing. Therefore, based on the MUDA approach, we propose a viable solution, Principal Adversarial Domain Adaptation, to reduce this great gap.
3. Methodology
To improve the generalization ability of adversarial example detection, we propose a novel adversarial example detection method, named Adversarial Example Detection via Principal Adversarial Domain Adaptation (AED-PADA). The entire framework of our AED-PADA is shown in Fig. 2. AED-PADA contains two stages, Principal Adversarial Domains Identification (PADI) and Principal Adversarial Domain Adaptation (PADA). In the stage of PADI, we first incorporate adversarial supervised contrastive learning (Adv-SCL) to acquire distinguishable ADs. Then, we construct AD clustering to group ADs into different clusters. With the proposed Coverage of Entire Feature Space (CEFS) metric, we select the most representative ADs from each cluster to form the PADs. In the stage of PADA, we propose an adversarial feature enhancement method based on the original MUDA method to effectively leverage PADs to detect unseen adversarial attack methods. Note that Secs. 3.2, 3.3, 3.4 and 3.5 introduce PADI, while Sec. 3.6 presents PADA.
3.1. Notations
Suppose we have a labeled $C$-class classification dataset $X=\{(x_i, y_i)\}_{i=1}^{N}$ with $N$ samples, where the label $y_i \in \{1, 2, \ldots, C\}$. A classifier $f$ is trained on $X$ to classify an input sample $x$ into one of the $C$ classes. An adversarial attack aims to fool $f$ into assigning incorrect labels via generating adversarial examples. We have a set of adversarial attack methods $\mathcal{A}=\{A_1, A_2, \ldots, A_K\}$, which consists of $K$ distinct adversarial attack methods. $X^{adv}_i=\{(x^{adv}_{i,j}, \hat{y}_{i,j})\}$ is the adversarial dataset generated by the $i$-th adversarial attack method $A_i$, where $x^{adv}_{i,j}$ denotes the $j$-th adversarial example generated by the $i$-th adversarial attack $A_i$, and $\hat{y}_{i,j}$ represents its corresponding prediction label. We define $g$ as the mapping from an adversarial example to its attack method, i.e., $g(x^{adv}_{i,j})=A_i$. Consequently, we construct a set of adversarial examples based on $\mathcal{A}$, which comprises $K$ types of adversarial examples, $X^{adv}=\bigcup_{i=1}^{K} X^{adv}_i$.

3.2. Adversarial Domain Acquisition
Typically, when we use common CNNs to extract the features of adversarial examples generated from different untargeted attacks, the features tend to spread in an indistinguishable manner in the feature space. To acquire distinguishable representations of adversarial examples from different attacks, we exploit adversarial supervised contrastive learning (Adv-SCL) to extract features. Then, we form the Adversarial Domains (ADs), each of which is defined as the representations of all the adversarial examples generated from a particular adversarial attack.
Based on the supervised contrastive learning (SCL) method (KhoslaTWSTIMLK20, ), Adv-SCL neglects the classification result of the adversarial examples and focuses solely on identifying their generation methods. Specifically, each adversarial example in $X^{adv}$ is characterized by the method used to generate it, i.e., the adversarial attack method $g(x^{adv})$. The attack method of an adversarial example serves as the key criterion for determining whether two examples form a positive or negative pair. Here, a pair of examples from the same attack is considered positive, while a pair from different attacks is considered negative. This learning strategy amplifies the dissimilarities across examples from various attacks and generates a more appropriate representation.
As shown in Fig. 2(a), the input of AD Acquisition is the adversarial example set $X^{adv}$. Adv-SCL consists of an Encoder Network and a Projection Network. The Encoder Network extracts a feature vector from the input adversarial example, and the Projection Network further projects this representation to an auxiliary vector, which is discarded after training.
Previous studies (transforming1, ; transform2, ) show that transformation strategies, such as cropping and rescaling, bit-depth reduction, JPEG compression, and randomization, can be used to defend against adversarial examples. Therefore, to prevent such transformations from potentially invalidating the adversarial examples, we only use normalization as the data augmentation operation in the stage of AD Acquisition.
We randomly sample $N_b$ example-label pairs $\{(x_k, a_k)\}_{k=1}^{N_b}$ from $X^{adv}$, where $a_k = g(x_k)$ is the adversarial attack of the $k$-th adversarial example. The corresponding batch employed for training consists of $2N_b$ pairs $\{(\tilde{x}_k, \tilde{a}_k)\}_{k=1}^{2N_b}$, where $\tilde{x}_{2k-1}$ and $\tilde{x}_{2k}$ are two views of $x_k$, and they are from the same adversarial attack method, i.e., $\tilde{a}_{2k-1} = \tilde{a}_{2k} = a_k$. The loss function of Adv-SCL is defined as,
(1) $\mathcal{L}_{Adv\text{-}SCL} = \sum_{i \in I} \mathcal{L}_i$

(2) $\mathcal{L}_i = \dfrac{-1}{|P(i)|} \sum_{p \in P(i)} \log \dfrac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)}$
Here, $I$ denotes the index set of the augmented samples, and $z_i$ is the projected feature of $\tilde{x}_i$. $A(i)$ is a subset of $I$ which includes all indexes except $i$. $P(i)$ represents the indices of the positive samples in the batch except $i$, i.e., the samples generated by the same attack as $\tilde{x}_i$, and $|P(i)|$ stands for its cardinality. $\cdot$ denotes the dot product. $\tau$ is the temperature parameter which scales the similarity values.
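To make the training objective concrete, the following PyTorch sketch implements the Adv-SCL loss of Eqs. (1)-(2), where positives are defined by shared attack identifiers rather than classification labels. It is a minimal illustration rather than the authors' released code; the function name, tensor shapes, and the final averaging over the batch are our own assumptions.

```python
import torch
import torch.nn.functional as F

def adv_scl_loss(z, attack_ids, temperature=0.07):
    """Adv-SCL loss (Eqs. (1)-(2)): positives share the same attack identifier.

    z          : (2N, d) outputs of the projection network for the augmented batch
    attack_ids : (2N,)   attack identifier of each augmented sample
    """
    z = F.normalize(z, dim=1)                        # cosine similarity via dot product
    sim = z @ z.T / temperature                      # (2N, 2N) scaled similarities

    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))  # A(i): every index except i
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # P(i): indices with the same attack identifier as i, excluding i itself
    pos_mask = (attack_ids.unsqueeze(0) == attack_ids.unsqueeze(1)) & ~self_mask
    pos_count = pos_mask.sum(dim=1).clamp(min=1)     # |P(i)|

    per_sample = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_count  # Eq. (2)
    return per_sample.mean()                          # Eq. (1), averaged over the batch
```

In practice, `z` would be the projection-network output for the two augmented views of each sampled adversarial example, and `attack_ids` would carry the corresponding attack labels.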
3.3. Adversarial Domain Clustering
After the acquisition of ADs, we obtain $D_i$ as the AD of the corresponding adversarial attack $A_i$. For the attack set $\mathcal{A}$, we construct a set of ADs, $\mathcal{D} = \{D_1, D_2, \ldots, D_K\}$. Since different ADs tend to distribute differently in the feature space and many of them possess various portions of overlaps, it is quite difficult to directly select the most representative ADs from scratch. Therefore, it is vital to explore the similarities among different ADs.
To address this issue, we perform the selection via two steps, i.e., clustering to assess the similarities of ADs, and selecting the most representative ADs from each cluster. We construct a viable solution named Adversarial Domain Clustering, as shown in Fig. 2(a). This strategy groups ADs into different clusters, ensuring that the similarity among ADs within the same cluster is maximized, while the similarity among ADs across different clusters is minimized. AD clustering avoids the redundancy and additional costs caused by repeatedly selecting similar attack methods, and it facilitates the selection of the most representative ADs in the subsequent steps.
Since the samples to be clustered here are collections of features, rather than individual data points, traditional clustering methods such as K-Means (kmeans, ) cannot be directly applied. Since spectral clustering (NgJW01, ) only requires the similarity matrix among samples, it is utilized as the clustering step in our AD clustering.
For the estimation of the similarity matrix in spectral clustering, which represents the similarities between different ADs, we propose Adversarial Domain Similarity Measurement (ADSM) based on the Jensen-Shannon divergence (JSD) (ErvenH14, ), which quantifies the similarity between two probability distributions. To compute the similarities, we transform each $D_i$ in $\mathcal{D}$ into a probability distribution $P_i$ by converting the features into probabilities and performing normalizations. Since a smaller value of JSD between two ADs implies a smaller discrepancy in their probability distributions, the JSD between $D_i$ and $D_j$ can be computed via
(3) $\mathrm{JSD}(P_i \,\|\, P_j) = \dfrac{1}{2}\mathrm{KL}(P_i \,\|\, M) + \dfrac{1}{2}\mathrm{KL}(P_j \,\|\, M), \quad M = \dfrac{1}{2}(P_i + P_j)$
By letting the element $S_{ij}$ at the $i$-th row and $j$-th column of the similarity matrix $S$ refer to the similarity between $D_i$ and $D_j$, $S$ can be calculated by
(4) $S_{ij} = 1 - \mathrm{JSD}(P_i \,\|\, P_j)$
Since the spectral clustering cannot automatically determine the optimal number of clusters, the Calinski-Harabasz score (CH score) (mann2023proposed, ), which requires no knowledge of the cluster shape, is utilized to evaluate the clustering performance and estimate the optimal number of clusters. Note that it measures both the within-cluster and between-cluster distances, thereby offering a more comprehensive view of the clustering performance. CH score is calculated by
(5) $\mathrm{CH} = \dfrac{\mathrm{tr}(B_k)}{\mathrm{tr}(W_k)} \times \dfrac{n-k}{k-1}$
where $k$ and $n$ are the number of clusters and the number of data points respectively, $B_k$ denotes the between-cluster covariance matrix, $W_k$ denotes the within-cluster covariance matrix, and $\mathrm{tr}(\cdot)$ denotes the trace of a matrix.
A higher value of $\mathrm{CH}$ indicates better clustering. We posit that the inherent structure of the entire feature space composed of different ADs is highly complex. Given that the CH score often awards the highest evaluation when the number of clusters is very small, we choose to commence our consideration from a larger number of clusters in this paper. With the help of the CH score, our AD clustering can automatically group the ADs into the optimal number of clusters.
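For illustration, the sketch below assembles the ADSM similarity matrix of Eqs. (3)-(4), performs spectral clustering on it, and picks the number of clusters with the Calinski-Harabasz score of Eq. (5), using scikit-learn. It is a simplified sketch: the conversion of each AD into a probability vector (here, a softmax over its mean feature) and the candidate cluster numbers are our own assumptions.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import calinski_harabasz_score

def to_prob(domain_feats):
    """Convert an AD (n x d feature matrix) into a probability vector.
    Assumption: softmax over the mean feature; the paper only states that
    features are converted into probabilities and normalized."""
    mean = domain_feats.mean(axis=0)
    e = np.exp(mean - mean.max())
    return e / e.sum()

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence with base-2 log, bounded in [0, 1] (Eq. (3))."""
    p, q = p + eps, q + eps
    m = 0.5 * (p + q)
    kl_pm = np.sum(p * np.log2(p / m))
    kl_qm = np.sum(q * np.log2(q / m))
    return 0.5 * kl_pm + 0.5 * kl_qm

def cluster_ads(ad_features, k_candidates=(3, 4, 5)):
    """ad_features: list of K arrays, each (n_i x d), one per adversarial attack."""
    probs = [to_prob(f) for f in ad_features]
    K = len(probs)
    sim = np.ones((K, K))
    for i in range(K):
        for j in range(i + 1, K):
            sim[i, j] = sim[j, i] = 1.0 - jsd(probs[i], probs[j])   # Eq. (4)

    best_k, best_score, best_labels = None, -np.inf, None
    for k in k_candidates:
        labels = SpectralClustering(n_clusters=k, affinity='precomputed',
                                    random_state=0).fit_predict(sim)
        # CH score (Eq. (5)) evaluated on the per-domain probability vectors
        score = calinski_harabasz_score(np.stack(probs), labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels
```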
3.4. Principal Adversarial Domains Selection
After similar ADs are clustered, the selection step can be performed. To form the most effective Principal Adversarial Domains (PADs), it is vital to select appropriate ADs from different clusters. Thus, we propose the Coverage of Entire Feature Space (CEFS) metric to guide the PADs selection process.
CEFS is a ratio-based metric which involves two aspects, Intra-Domain Dispersion (IDD) and Discrepancy between Adversarial Domains (DAD). IDD represents the dispersion among features within each AD, where a higher value indicates a larger coverage of the feature space. For any AD $D = \{f_1, f_2, \ldots, f_n\}$ with $f_i \in \mathbb{R}^d$, where $n$ and $d$ denote the number and dimension of the features in $D$, respectively, IDD can be computed by
(6) $\mathrm{IDD}(D) = \dfrac{1}{n}\sum_{i=1}^{n} \big\| f_i - \bar{f} \big\|_2^2, \quad \bar{f} = \dfrac{1}{n}\sum_{i=1}^{n} f_i$
DAD represents the discrepancy between two ADs, which can be quantified using a distance metric $\mathrm{dist}(\cdot, \cdot)$, such as the Kullback-Leibler divergence (kl-divergence, ) or Maximum Mean Discrepancy (MMD) (mmd, ). A lower DAD value indicates a greater similarity between the two ADs. DAD can be calculated as,
(7) $\mathrm{DAD}(D_i, D_j) = \mathrm{dist}(D_i, D_j)$
Then, CEFS can be obtained via
(8) $\mathrm{CEFS} = \dfrac{\sum_{i=1}^{m} \mathrm{IDD}(D_{s_i})}{\mathrm{DAD}\big(D_{s_1} \oplus \cdots \oplus D_{s_m},\; D_1 \oplus \cdots \oplus D_K\big)}$
where $m$ and $K$ are the number of ADs in the PADs (indexed by $s_1, \ldots, s_m$) and the number of ADs in the AD set $\mathcal{D}$, respectively, and $\oplus$ denotes a concatenation operation.
CEFS is a ratio-based metric that quantifies the coverage of the selected ADs within the entire feature space (EFS). In Eq. (8), the numerator, the sum of the Intra-Domain Dispersions (IDD), indicates the feature dispersion within each selected AD. The denominator measures the discrepancy between the selected ADs and the EFS. As CEFS increases, the numerator grows, indicating a larger feature space covered by each AD, and the denominator decreases, suggesting a greater similarity between the selected ADs and the EFS. Consequently, the selected ADs obtain a larger coverage of the EFS, increasing the likelihood of capturing unseen ADs. We utilize CEFS to select the PADs, which give a larger coverage of the feature space with the same number of ADs. PADs effectively enhance the probability of capturing the location of unseen ADs, thereby improving the generalization performance.
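A possible implementation of the CEFS-guided PADs selection is sketched below. The concrete forms of IDD (mean squared distance to the domain centroid) and of the distance metric in DAD (an RBF-kernel MMD) are illustrative assumptions consistent with Eqs. (6)-(8), not necessarily the exact choices of the paper; the selection enumerates one AD per cluster and keeps the combination with the highest CEFS.

```python
import itertools
import numpy as np

def idd(feats):
    """Intra-Domain Dispersion (Eq. (6)), sketched as the mean squared
    distance of the features to the domain centroid."""
    centroid = feats.mean(axis=0)
    return np.mean(np.sum((feats - centroid) ** 2, axis=1))

def mmd_rbf(x, y, gamma=1.0):
    """Squared MMD with an RBF kernel, used here as the distance metric in Eq. (7)."""
    def k(a, b):
        d2 = np.sum(a ** 2, 1)[:, None] + np.sum(b ** 2, 1)[None, :] - 2 * a @ b.T
        return np.exp(-gamma * d2)
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def cefs(selected, all_domains):
    """Coverage of Entire Feature Space (Eq. (8)): dispersion of the selected ADs
    over their discrepancy to the entire feature space."""
    numerator = sum(idd(d) for d in selected)
    efs = np.concatenate(all_domains, axis=0)            # the entire feature space
    denominator = mmd_rbf(np.concatenate(selected, axis=0), efs)
    return numerator / (denominator + 1e-12)

def select_pads(ad_features, cluster_labels):
    """Pick one AD per cluster so that the combination maximizes CEFS."""
    clusters = {}
    for idx, c in enumerate(cluster_labels):
        clusters.setdefault(c, []).append(idx)
    best_combo, best_score = None, -np.inf
    for combo in itertools.product(*clusters.values()):
        score = cefs([ad_features[i] for i in combo], ad_features)
        if score > best_score:
            best_combo, best_score = combo, score
    return best_combo, best_score
```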
3.5. Training process of Principal Adversarial Domains Identification
The process of Principal Adversarial Domains Identification (PADI) consists of Adversarial Domain Acquisition (AD Acquisition), Adversarial Domain Clustering (AD Clustering) and Principal Adversarial Domains Selection (PADs Selection). The training process of Principal Adversarial Domains Identification is described in Algorithm 1.
For the original dataset $X$, we divide it into two non-overlapping subsets $X_{acq}$ and $X_{clu}$, which are used for AD Acquisition and AD Clustering, respectively. The adversarial example sets for AD Acquisition are denoted as $X^{adv}_{acq} = \{X^{adv}_{acq,1}, \ldots, X^{adv}_{acq,K}\}$, where $K$ is the number of adversarial attacks, $X^{adv}_{acq,i}$ is the adversarial example set generated by the $i$-th adversarial attack on $X_{acq}$, and $|X^{adv}_{acq,i}|$ is its cardinality.
Likewise, the adversarial example sets for AD Clustering are denoted as $X^{adv}_{clu} = \{X^{adv}_{clu,1}, \ldots, X^{adv}_{clu,K}\}$, where $K$ is the number of adversarial attacks, $X^{adv}_{clu,i}$ is the adversarial example set generated by the $i$-th adversarial attack on $X_{clu}$, and $|X^{adv}_{clu,i}|$ is its cardinality. The rationale behind this strategy is to prevent the model trained via contrastive learning from overfitting to the data used for AD Acquisition, which would otherwise lead to subpar performance in AD Clustering and PADs Selection.
3.6. Principal Adversarial Domain Adaptation
To transfer the learned knowledge from PADs to the target domain, i.e., the adversarial examples generated from unseen attack methods, we propose Principal Adversarial Domain Adaptation (PADA) to detect adversarial examples, as depicted in Fig. 2(b).
Ideally, the inputs of PADA comprise the source data and the unseen target data. The source data consists of an equal number of benign and adversarial examples, where the adversarial examples contain the $m$ types determined by the PADs. Since the unseen target data is unavailable, we can only use the training data as a proxy. The proxy data also contains an equal number of benign and adversarial examples, drawn from the training benign set and the training adversarial set, respectively, and its adversarial examples cover all $K$ types of training attacks. During the PADI stage, we select the most representative ADs, i.e., $m$ types of ADs out of these $K$ types, to form the PADs. Consequently, employing the proxy data as a stand-in for the unseen target data serves two key purposes. Firstly, it prevents overlap between the source data used for training and the unseen target data used for testing. Secondly, the PADA process compels the transfer of specific knowledge from the PADs to the more extensive set of attacks. This transfer aims to boost the generalization capability of the PADA networks, with the goal of enhancing their performance on the unseen target data.
PADA consists of three sequential components: feature extraction, feature alignment and classification. The framework of PADA is compatible with various widely used Multi-source Unsupervised Domain Adaptation (MUDA) methods (moment-mda, ; DARN, ; mdmn, ; aaai-mda, ). The experimental results indicate that our PADA possesses excellent generalization capabilities when built upon various existing MUDA methods. Due to the simplicity and effectiveness of MFSAN (aaai-mda, ), along with its superior detection performance compared to other MUDA methods, we select MFSAN as the basic MUDA method for our PADA.

In the feature extraction component, unfortunately, existing MUDA methods, including MFSAN, only extract spatial features. Previous studies (GeirhosRMBWB19, ; WangWHX20, ; FanLCZG21, ) indicate that the high-frequency components of an image play a crucial role in the predictions of deep neural networks, and adversarial perturbations are more likely to be concealed in the high-frequency information of images. To capture more comprehensive features of adversarial examples, we propose an adversarial feature enhancement (AFE) module as the feature extraction component. AFE contains both a spatial feature extraction branch and a frequency feature extraction branch. For frequency feature extraction, we design Perturbation Extraction Filters (PEF), a plug-and-play operation based on the Spatial Rich Model (SRM) (SRM, ), to capture subtle perturbation signals hidden in the adversarial examples.
SRM typically uses 30 basic kernels to capture textures and discontinuities of images, and has been employed in image forensics to detect subtle and irregular manipulations or hidden information. Subsequent research (SRM_detection, ) indicates that in image manipulation detection, employing only three kernels can achieve considerable detection performance, and using more basic kernels does not further enhance performance. Essentially, these kernels are high-pass filters, which enhance the high-frequency signals and remove the low-frequency components of the inputs. Since adversarial perturbations are typically hidden within the high-frequency information of an image, our PEF uses the same three kernels to extract subtle perturbation signals. As shown in Fig. 3, PEF consists of three filters which are implemented by convolution kernels with fixed parameters, and the output channel size of PEF is 3. Both the spatial and frequency feature extraction branches employ the encoder network from Sec. 3.2, whose architecture is aligned with the threat model, specifically ResNet-18 or VGG-16. Then, the enhanced adversarial features are fed into the next module.
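The sketch below shows how PEF can be realized as a plug-and-play convolution layer with fixed high-pass kernels and three output channels. The 5×5 'KV' kernel is a standard SRM filter; the other two kernels are illustrative high-pass stand-ins rather than the exact kernels adopted by PEF.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def _pad_to_5x5(k):
    """Embed a smaller kernel into a 5x5 grid so all filters share one conv."""
    k = torch.tensor(k, dtype=torch.float32)
    out = torch.zeros(5, 5)
    r, c = k.shape
    out[(5 - r) // 2:(5 - r) // 2 + r, (5 - c) // 2:(5 - c) // 2 + c] = k
    return out

class PEF(nn.Module):
    """Perturbation Extraction Filters: a conv layer with fixed high-pass kernels
    (output channels = 3). The first kernel is the standard 5x5 SRM 'KV' filter;
    the other two are illustrative high-pass stand-ins."""
    def __init__(self):
        super().__init__()
        kv = torch.tensor([[-1,  2,  -2,  2, -1],
                           [ 2, -6,   8, -6,  2],
                           [-2,  8, -12,  8, -2],
                           [ 2, -6,   8, -6,  2],
                           [-1,  2,  -2,  2, -1]], dtype=torch.float32) / 12.0
        second_order = _pad_to_5x5([[-1,  2, -1],
                                    [ 2, -4,  2],
                                    [-1,  2, -1]]) / 4.0
        first_order = _pad_to_5x5([[0.0, 0.0], [1.0, -1.0]])
        kernels = torch.stack([kv, second_order, first_order])       # (3, 5, 5)
        # apply each filter to every input channel and average over channels
        weight = kernels.unsqueeze(1).repeat(1, 3, 1, 1) / 3.0        # (3, 3, 5, 5)
        self.register_buffer('weight', weight)                        # fixed, not trained

    def forward(self, x):                                             # x: (B, 3, H, W)
        return F.conv2d(x, self.weight, padding=2)                    # (B, 3, H, W)
```

Registering the kernels as a buffer keeps them fixed during training while still letting them move with the module across devices.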
In the alignment component, we assign a specific network $H_i$ to map each source domain and the proxy domain into a domain-specific feature space, and utilize MMD as the distance metric to align them. The loss $\mathcal{L}_{mmd}$ is used to align the features between each source domain and the proxy domain,
(9) $\mathcal{L}_{mmd} = \dfrac{1}{m}\sum_{i=1}^{m} \mathrm{MMD}\big(H_i(F(X_{s_i})),\; H_i(F(X_{p}))\big)$
where $F(\cdot)$ denotes the enhanced adversarial features produced by AFE, $m$ is the number of source domains, $X_{s_i}$ and $X_p$ denote the $i$-th source domain and the proxy domain, respectively, and each source domain is associated with its corresponding network $H_i$.
In the classification component, we utilize the cross-entropy loss $\mathcal{L}_{cls}$ to ensure correct classification. Since the different domain-specific classifiers $C_i$ are trained on their respective source domains, there exist significant discrepancies among their predictions for the same proxy-domain sample. To address this, $\mathcal{L}_{disc}$ is used to minimize the differences among the predictions of the various classifiers,
(10) $\mathcal{L}_{disc} = \dfrac{2}{m(m-1)}\sum_{i=1}^{m-1}\sum_{j=i+1}^{m} \mathbb{E}_{x \sim X_p}\big| C_i(H_i(F(x))) - C_j(H_j(F(x))) \big|$
Overall, the total loss is formulated as follows, where $\lambda$ and $\gamma$ are hyperparameters used to adjust the weights of $\mathcal{L}_{mmd}$ and $\mathcal{L}_{disc}$, respectively.
(11) $\mathcal{L} = \mathcal{L}_{cls} + \lambda \mathcal{L}_{mmd} + \gamma \mathcal{L}_{disc}$
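For clarity, the following sketch assembles the three loss terms of Eqs. (9)-(11) for one training step in an MFSAN-style manner. The module names, the linear-kernel MMD estimate, and the use of softmax outputs in the discrepancy term are simplifying assumptions rather than the exact implementation.

```python
import torch
import torch.nn.functional as F

def mmd_linear(x, y):
    """A simple (linear-kernel) MMD estimate between two feature batches."""
    return ((x.mean(dim=0) - y.mean(dim=0)) ** 2).sum()

def pada_losses(F_ext, heads, clfs, src_batches, proxy_x, lam=1.0, gamma=1.0):
    """Total loss of Eq. (11) for one step, following an MFSAN-style design.

    F_ext       : shared feature extractor (AFE: spatial branch + PEF branch)
    heads       : list of m domain-specific networks H_i
    clfs        : list of m domain-specific classifiers C_i
    src_batches : list of m (x, y) pairs, one per source domain (PAD)
    proxy_x     : unlabeled proxy-domain batch
    """
    m = len(heads)
    fp_shared = F_ext(proxy_x)

    cls_loss, mmd_loss, proxy_preds = 0.0, 0.0, []
    for i in range(m):
        xs, ys = src_batches[i]
        fs = heads[i](F_ext(xs))
        fp = heads[i](fp_shared)
        cls_loss += F.cross_entropy(clfs[i](fs), ys)          # L_cls
        mmd_loss += mmd_linear(fs, fp)                        # Eq. (9)
        proxy_preds.append(torch.softmax(clfs[i](fp), dim=1))
    mmd_loss = mmd_loss / m

    disc_loss = 0.0                                           # Eq. (10)
    for i in range(m):
        for j in range(i + 1, m):
            disc_loss += (proxy_preds[i] - proxy_preds[j]).abs().mean()
    if m > 1:
        disc_loss = 2.0 * disc_loss / (m * (m - 1))

    return cls_loss + lam * mmd_loss + gamma * disc_loss      # Eq. (11)
```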
4. Experiments
4.1. Experimental Setups
4.1.1. DNN backbones
We evaluate the performance of the proposed method, by employing two widely used DNN architectures, i.e., ResNet-18 (HeZRS16, ) and VGG-16 (SimonyanZ14a, ), according to (SID, ).
4.1.2. Datasets
We evaluate the performance of the proposed method on three popular datasets, CIFAR-10 (cifar10, ), SVHN (SVHN, ) and ImageNet (imagenet, ). All the images in ImageNet are resized to a uniform resolution via pre-processing.
As shown in Table 1, each of the three datasets is divided into three category-balanced and non-overlapping subsets: Train-acq-src, Train-clu-pro and Test. The training data for AD Acquisition in the PADI stage and the source data in the PADA stage are selected from Train-acq-src. Similarly, the training data for AD Clustering in the PADI stage and the proxy data in the PADA stage are selected from Train-clu-pro.
For CIFAR-10, we divide the official CIFAR-10 training set into two halves, each of which contains 25,000 images, to form Train-acq-src and Train-clu-pro. Besides, we form Test with all the 10,000 images in the official CIFAR-10 testing set. For SVHN, we set the number of Train-acq-src and Train-clu-pro to 20,000 instead of 25,000, due to the presence of category imbalance in the official SVHN training dataset, to construct a category-balanced training dataset. Test of SVHN also consists of 10,000 images, which are randomly selected from the official SVHN testing set. For ImageNet, we divide the official ImageNet (ILSVRC2012) validation set, which consists of 50,000 images, into two parts: 40,000 images for training and 10,000 images for Test. The 40,000 training images are then equally divided into two subsets, Train-acq-src and Train-clu-pro, each of which contains 20,000 images.
Ten types of training adversarial examples are created based on the benign examples from the Train-acq-src and Train-clu-pro subsets. Subsequently, another seven attacks are applied to generate the testing adversarial examples on the Test set. These specific adversarial attack methods are detailed in Sec. 4.1.4. Note that the attack methods for training and testing are entirely different, ensuring that the training data and testing data are mutually exclusive.
4.1.3. Data splitting strategy for the training stage
Table 2 presents the data splitting strategy for the training stage of our AED-PADA. AED-PADA contains two stages during training. In the PADI stage, to improve the efficiency, we randomly select 10,000 images from each type of adversarial attack and their corresponding benign images from Train-acq-src for supervised contrastive learning. To alleviate the overfitting problem, we select 10,000 images per attack from Train-clu-pro for AD Clustering and PADs Selection.
In the PADA stage, the inputs of PADA are the images from the source domains and the unseen target domains. Each source domain contains 10,000 samples, with an equal split of 5,000 benign examples and 5,000 adversarial examples, which are all randomly selected from Train-acq-src. Since the data from unseen domains is unavailable, we can only utilize the training examples as a proxy, which are generated from all the 10 types of training attacks. Specifically, the proxy domain contains 10,000 samples, i.e., 5,000 benign examples and 5,000 adversarial examples, which are all randomly selected from the Train-clu-pro. Note that the adversarial examples in the proxy domain are obtained via 10 different training attacks, with each attack providing 500 examples. This specific arrangement has two benefits. Firstly, it ensures zero overlap between the source and proxy domains during training, to avoid data leakage. Secondly, it enhances the diversity of the training data, to further benefit the generalization of the detection model.
Dataset | Train-acq-src | Train-clu-pro | Test | |||
benign | adv | benign | adv | benign | adv | |
CIFAR-10 | ||||||
SVHN | ||||||
ImageNet |
Training stage | | benign examples | adversarial examples
PADI | AD Acquisition | - | 10,000/attack
 | AD Clustering | - | 10,000/attack
PADA | source domain | 5,000/source | 5,000/source
 | proxy domain | 5,000 | 5,000 (500/attack)
4.1.4. Baseline adversarial attack methods
To evaluate the generalization capabilities of the detection methods, it is important to consider a diverse set of attack methods, including both earlier and recent techniques. Here, 10 earlier attack methods are utilized to generate adversarial examples for training, including FGSM (FGSM, ), BIM (BIM, ), C&W (CW, ), DeepFool (DeepFool, ), PGD (pgd, ), MI-FGSM (MIM, ), DIM (DIM, ), ILA (ILA, ), YA-ILA (ILA++, ) and SI-NI-FGSM (si-ni-fgsm, ). Seven SOTA attack methods are employed to generate adversarial examples for testing, including APGD (APGD, ), ILA-DA (ILA-DA, ), Jitter (jitter, ), SSA (SSA, ), TI-FGSM (TIM, ), VMI-FGSM (vmifgsm, ) and VNI-FGSM (vmifgsm, ). The adversarial attacks for training and testing are entirely distinct, so the adversarial detection results on the unseen testing attacks indicate the generalization performance of our proposed detection method.
4.1.5. Baseline adversarial detection methods
We compare the generalizability of our AED-PADA with five state-of-the-art adversarial detection methods, LID (lid, ), MD (md, ), Steg (steg, ), SID (SID, ) and SA (txt_advdetection, ). These detection methods differ from ours as they employ only a single attack method for training. To ensure a fair comparison with our AED-PADA, it is also necessary to consider the scenario of training with multiple attacks. Consequently, we consider the following two configurations for the SOTA detection methods: (1) For single-attack training, we utilize all data from Train-acq-src to ensure a sufficient data volume for effective training. (2) For multi-attack training, the methods are trained on the adversarial examples in the PADs, which are consistent with our AED-PADA. Each source domain contains 10,000 samples from Train-acq-src, evenly divided into 5,000 benign and 5,000 adversarial examples.
Dataset (Backbone) | Detector | Accuracy on Unseen SOTA Adversarial Attacks (%) | Averaged Accuracy (%) | ||||||
APGD | ILA-DA | Jitter | SSA | TI-FGSM | VMI-FGSM | VNI-FGSM | |||
CIFAR-10 (ResNet-18) | LID (lid, ) | 90.044 | 94.121 | 78.240 | 59.839 | 90.863 | 91.056 | 86.496 | 84.380 |
LID-PADs | 78.903 | 86.068 | 70.045 | 57.330 | 82.278 | 82.333 | 79.978 | 76.705 | |
MD (md, ) | 65.976 | 96.657 | 61.764 | 54.225 | 69.282 | 69.653 | 69.726 | 69.612 | |
MD-PADs | 63.371 | 97.970 | 62.058 | 51.501 | 68.100 | 68.814 | 69.351 | 68.738 | |
Steg (steg, ) | 83.521 | 94.692 | 86.759 | 58.064 | 90.496 | 90.814 | 92.057 | 85.200 | |
Steg-PADs | 84.085 | 94.155 | 88.885 | 62.305 | 90.715 | 90.860 | 91.690 | 86.099 | |
SID (SID, ) | 86.113 | 60.999 | 74.629 | 53.732 | 87.323 | 87.330 | 82.100 | 76.032 | |
SID-PADs | 86.520 | 61.820 | 74.860 | 54.010 | 88.285 | 88.220 | 82.935 | 76.664 | |
SA (txt_advdetection, ) | 84.907 | 90.731 | 87.365 | 81.323 | 89.275 | 90.013 | 91.967 | 87.940 | |
SA-PADs | 89.125 | 93.485 | 91.925 | 88.885 | 94.760 | 94.875 | 95.595 | 92.664 | |
AED-PADA | 90.545 | 97.855 | 97.065 | 84.255 | 97.700 | 97.765 | 97.675 | 94.694 | |
CIFAR-10 (VGG-16) | LID (lid, ) | 85.024 | 93.357 | 75.092 | 58.639 | 88.201 | 88.173 | 81.540 | 81.432 |
LID-PADs | 70.493 | 89.403 | 65.190 | 58.588 | 75.553 | 75.408 | 73.500 | 72.590 | |
MD (md, ) | 51.322 | 72.013 | 50.904 | 52.012 | 51.809 | 51.799 | 50.963 | 54.403 | |
MD-PADs | 50.033 | 81.779 | 50.028 | 51.349 | 50.005 | 50.856 | 51.349 | 55.057 | |
Steg (steg, ) | 79.280 | 92.756 | 88.466 | 57.470 | 88.854 | 88.891 | 89.390 | 83.586 | |
Steg-PADs | 79.900 | 92.265 | 88.705 | 59.360 | 89.015 | 88.800 | 89.235 | 83.897 | |
SID (SID, ) | 83.235 | 66.047 | 73.260 | 53.373 | 84.743 | 84.582 | 76.210 | 74.493 | |
SID-PADs | 83.105 | 66.240 | 73.230 | 53.375 | 84.605 | 84.385 | 76.145 | 74.441 | |
SA (txt_advdetection, ) | 80.198 | 85.036 | 81.003 | 63.984 | 85.356 | 86.293 | 87.627 | 81.357 | |
SA-PADs | 79.720 | 83.825 | 77.985 | 63.335 | 83.765 | 84.475 | 85.755 | 79.873 | |
AED-PADA | 84.370 | 94.365 | 93.260 | 65.765 | 93.135 | 93.410 | 93.545 | 88.264 |
Dataset (Backbone) | Detector | Accuracy on Unseen SOTA Adversarial Attacks (%) | Averaged Accuracy (%) | ||||||
APGD | ILA-DA | Jitter | SSA | TI-FGSM | VMI-FGSM | VNI-FGSM | |||
SVHN (ResNet-18) | LID (lid, ) | 66.431 | 87.940 | 63.697 | 60.324 | 71.403 | 71.384 | 64.952 | 69.447 |
LID-PADs | 61.849 | 86.846 | 60.551 | 58.515 | 69.581 | 69.639 | 64.182 | 67.309 | |
MD (md, ) | 61.167 | 66.439 | 59.906 | 58.270 | 67.625 | 67.609 | 60.870 | 63.127 | |
MD-PADs | 60.768 | 50.190 | 59.637 | 57.967 | 67.257 | 67.337 | 60.688 | 60.549 | |
Steg (steg, ) | 71.591 | 51.509 | 94.268 | 59.865 | 96.943 | 96.874 | 95.954 | 81.000 | |
Steg-PADs | 73.460 | 50.860 | 97.115 | 74.360 | 98.005 | 98.040 | 97.575 | 84.202 | |
SID (SID, ) | 68.859 | 92.207 | 63.763 | 56.571 | 74.074 | 73.995 | 64.150 | 70.517 | |
SID-PADs | 69.670 | 93.420 | 64.155 | 56.800 | 75.285 | 75.120 | 64.780 | 71.319 | |
SA (txt_advdetection, ) | 68.092 | 73.066 | 65.288 | 69.276 | 82.758 | 82.823 | 82.467 | 74.824 | |
SA-PADs | 69.490 | 73.100 | 63.610 | 66.550 | 78.155 | 78.220 | 77.785 | 72.416 | |
AED-PADA | 74.455 | 99.730 | 99.730 | 87.305 | 99.730 | 99.730 | 99.730 | 94.344 | |
SVHN (VGG-16) | LID (lid, ) | 65.645 | 96.605 | 62.731 | 64.252 | 74.205 | 74.263 | 70.296 | 72.571 |
LID-PADs | 69.710 | 97.154 | 63.626 | 63.955 | 76.258 | 76.306 | 70.119 | 73.875 | |
MD (md, ) | 51.511 | 58.333 | 51.233 | 51.859 | 51.870 | 52.268 | 51.465 | 52.648 | |
MD-PADs | 51.920 | 56.484 | 51.578 | 52.745 | 51.103 | 50.285 | 51.748 | 52.266 | |
Steg (steg, ) | 70.586 | 98.661 | 93.882 | 58.496 | 95.774 | 95.575 | 94.549 | 86.789 | |
Steg-PADs | 72.925 | 98.975 | 97.700 | 58.905 | 98.485 | 98.480 | 98.085 | 89.079 | |
SID (SID, ) | 68.159 | 69.813 | 61.320 | 60.722 | 74.468 | 74.454 | 67.213 | 68.021 | |
SID-PADs | 68.365 | 62.450 | 61.145 | 56.790 | 70.075 | 70.080 | 63.470 | 64.625 | |
SA (txt_advdetection, ) | 65.862 | 72.558 | 63.304 | 67.745 | 72.087 | 72.183 | 73.811 | 69.650 | |
SA-PADs | 65.060 | 69.765 | 59.365 | 62.605 | 66.795 | 66.955 | 67.385 | 65.419 | |
AED-PADA | 73.500 | 99.220 | 98.995 | 68.990 | 99.295 | 99.310 | 99.220 | 91.219 |
Dataset (Backbone) | Detector | Accuracy on Unseen SOTA Adversarial Attacks (%) | Averaged Accuracy (%) | ||||||
APGD | ILA-DA | Jitter | SSA | TI-FGSM | VMI-FGSM | VNI-FGSM | |||
ImageNet (ResNet-18) | LID (lid, ) | 52.868 | 54.078 | 50.991 | 53.263 | 53.793 | 56.566 | 56.549 | 54.015 |
LID-PADs | 57.340 | 63.910 | 55.195 | 58.150 | 58.890 | 60.900 | 61.245 | 59.376 | |
MD (md, ) | 53.760 | 52.031 | 51.728 | 51.603 | 53.533 | 59.438 | 58.536 | 54.376 | |
MD-PADs | 54.560 | 50.425 | 50.685 | 52.765 | 54.695 | 62.165 | 61.040 | 55.191 | |
Steg (steg, ) | 78.415 | 86.954 | 86.553 | 85.431 | 65.101 | 94.679 | 94.606 | 84.534 | |
Steg-PADs | 90.725 | 94.735 | 94.700 | 94.535 | 89.660 | 94.875 | 94.875 | 93.444 | |
SID (SID, ) | 51.885 | 56.475 | 51.745 | 51.425 | 55.285 | 55.335 | 54.995 | 53.878 | |
SID-PADs | 55.070 | 56.975 | 53.785 | 52.335 | 59.790 | 59.860 | 58.400 | 56.602 | |
SA (txt_advdetection, ) | 85.738 | 87.149 | 85.912 | 85.712 | 86.960 | 85.988 | 86.014 | 86.210 | |
SA-PADs | 66.955 | 69.230 | 67.665 | 66.930 | 67.385 | 68.490 | 68.550 | 67.886 | |
AED-PADA | 98.830 | 99.935 | 99.965 | 99.965 | 99.070 | 99.965 | 99.960 | 99.670 | |
ImageNet (VGG-16) | LID (lid, ) | 53.144 | 53.672 | 52.029 | 51.694 | 52.056 | 57.081 | 56.173 | 53.693 |
LID-PADs | 56.295 | 59.330 | 55.110 | 54.825 | 55.695 | 59.960 | 59.310 | 57.218 | |
MD (md, ) | 53.760 | 52.031 | 51.728 | 51.603 | 53.533 | 59.438 | 58.536 | 54.376 | |
MD-PADs | 53.470 | 50.705 | 51.960 | 50.970 | 52.430 | 57.385 | 56.585 | 53.358 | |
Steg (steg, ) | 78.931 | 90.915 | 85.852 | 84.279 | 66.300 | 94.695 | 94.969 | 85.134 | |
Steg-PADs | 90.585 | 94.185 | 94.115 | 94.165 | 90.375 | 94.530 | 94.535 | 93.213 | |
SID (SID, ) | 41.718 | 50.624 | 50.477 | 44.102 | 51.015 | 51.567 | 51.528 | 48.719 | |
SID-PADs | 43.238 | 46.165 | 53.794 | 50.002 | 49.125 | 53.257 | 46.382 | 48.852 | |
SA (txt_advdetection, ) | 87.747 | 93.025 | 89.939 | 88.176 | 86.256 | 92.133 | 92.255 | 89.933 | |
SA-PADs | 91.590 | 94.910 | 93.415 | 92.310 | 91.280 | 94.250 | 94.280 | 93.148 | |
AED-PADA | 98.960 | 99.930 | 99.935 | 99.890 | 98.225 | 99.935 | 99.930 | 99.544 |
4.1.6. Implementation details
In this paper, adversarial examples are generated by untargeted white-box attacks under the $L_\infty$ norm constraint. The perturbation of the training adversarial examples is constrained to a challenging scenario, where the maximum magnitude of the adversarial perturbation is set to 2. The step size and number of iterations for the adversarial attacks are set to 1/255 and 10, respectively. During the PADA stage, we utilize MFSAN (aaai-mda, ) as the basic MUDA method, employ the same training strategy as MFSAN, and set the trade-off parameters $\lambda = 1.0$ and $\gamma = 1.0$, which respectively control the importance of $\mathcal{L}_{mmd}$ and $\mathcal{L}_{disc}$. All the adversarial example detection methods are trained consistently for 100 epochs. To evaluate the performance of our proposed method, the widely used Accuracy is employed as the metric for the adversarial example detection task. The experiments are conducted on an NVIDIA GeForce RTX 3080Ti GPU.
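As an illustration of this attack configuration, the sketch below generates a few of the training attacks with the torchattacks library (assumed to be available). Interpreting the maximum magnitude of 2 as 2/255 on [0, 1]-scaled images, consistent with the 1/255 step size, is our own assumption.

```python
import torch
import torchattacks  # assumed available: pip install torchattacks

def generate_training_adv(model, loader, device='cuda'):
    """Generate L_inf-bounded adversarial examples matching the paper's setup:
    untargeted white-box attacks, 10 iterations, step size 1/255, and a maximum
    magnitude of 2 (interpreted here as 2/255 on [0, 1]-scaled images)."""
    model.eval()
    attacks = {
        'FGSM':    torchattacks.FGSM(model, eps=2/255),
        'BIM':     torchattacks.BIM(model, eps=2/255, alpha=1/255, steps=10),
        'PGD':     torchattacks.PGD(model, eps=2/255, alpha=1/255, steps=10),
        'MI-FGSM': torchattacks.MIFGSM(model, eps=2/255, alpha=1/255, steps=10),
    }
    adv_sets = {name: [] for name in attacks}
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        for name, atk in attacks.items():
            adv_sets[name].append(atk(images, labels).cpu())
    return {name: torch.cat(batches) for name, batches in adv_sets.items()}
```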
4.2. Performance Evaluations
Here, we compare our method with 5 SOTA detection methods, including LID (lid, ), MD (md, ), Steg (steg, ), SID (SID, ) and SA (txt_advdetection, ).
4.2.1. Generalization performances against unseen adversarial attacks
Tables 3, 4, and 5 present the cross-attack detection results across 7 unseen testing adversarial attacks on CIFAR-10, SVHN, and ImageNet, respectively. Existing adversarial detection methods, denoted without ‘-PADs’, are trained on a single attack with their original settings. Each of their results is the detection accuracy averaged over using each of the 10 training attacks in turn during training. On the contrary, the methods denoted with ‘-PADs’ are trained on multiple attacks, to be fairly compared with our AED-PADA.
These results provide compelling evidence that our approach clearly outperforms the state-of-the-art adversarial detection methods in terms of generalization ability. Note that the performance of our approach (99.544%) surpasses that of SID (48.719%) by a significant margin, i.e., 50.825%, under the setting of ImageNet and VGG-16. Besides, our superiority is particularly achieved in challenging scenarios, where the maximum magnitude of the adversarial perturbation is merely set to 2, whereas in previous studies the maximum magnitude is typically set to 4 or 8. The reduction in the magnitude of the perturbations significantly increases the difficulty of detecting adversarial examples and reduces the detection performance. The experimental results in Tables 3, 4, and 5 demonstrate that our AED-PADA is effective in the scenarios characterized by subtle adversarial perturbations.
As can be observed, our AED-PADA demonstrates better generalization ability on ImageNet compared to that on CIFAR-10 and SVHN, because the image resolution in ImageNet is higher than that in CIFAR-10 and SVHN (32×32). Intuitively, a larger image tends to provide a larger space for adversarial perturbation generation, which makes the differences between the perturbations obtained from various attack methods more pronounced, i.e., it reduces the overlaps between different adversarial attacks in the adversarial feature space. By selecting PADs, our AED-PADA is able to occupy a larger feature space, thus achieving better generalization ability.
Besides, PADs may not be suitable for direct application to the existing methods. As can be observed, if the SOTA methods directly adopt PADs for training (denoted as X-PADs), their performances may not always increase. For instance, the generalization performance of LID-PADs on CIFAR-10 is inferior to that of LID. We postulate that this discrepancy can be attributed partially to the reduction in the volume of the training data, and partially to the incompatibility between PADs and the framework of LID. This observation further verifies the effectiveness of the PADA stage in our AED-PADA framework.
4.2.2. Generalization performances across different backbones and datasets
To thoroughly evaluate the generalization ability of adversarial detection methods, it is also vital to assess their performances in the scenarios where the training and testing adversarial examples are from different backbones and datasets. A detection method with better cross-attack, cross-backbone, and cross-dataset capabilities tends to be better suited for practical applications in complex environments.
Table 6 displays the generalization performances of different detection methods across different backbones and datasets. The cross-backbone results are obtained on CIFAR-10, and the cross-dataset results are obtained with ResNet-18 as the backbone. In general, the results from both scenarios demonstrate that our method exhibits the best generalization ability. Since the structures tend to vary significantly among different types of backbones, which results in different feature dimensions between training and testing, both LID and MD are not applicable in the cross-backbone experiments, i.e., they lack the capability to generalize across backbones, because they utilize the intermediate features of the backbones for detection.
Detector | Averaged Accuracy (%) | ||||
Cross-Backbone | Cross-Dataset | ||||
ResNet-18 → VGG-16 | VGG-16 → ResNet-18 | CIFAR-10 → SVHN | SVHN → CIFAR-10 | |
LID (lid, ) | - | - | 71.619 | 80.233 | |
MD (md, ) | - | - | 62.437 | 61.429 | |
Steg (steg, ) | 82.894 | 85.157 | 53.249 | 78.155 | |
SID (SID, ) | 74.169 | 76.458 | 68.928 | 73.442 | |
SA (txt_advdetection, ) | 80.751 | 87.773 | 58.910 | 51.694 | |
Ours | 89.301 | 91.624 | 82.964 | 85.113 |
Detector | Known | Unseen | ||||
2 | 1 | 4 | 8 | avg | ||
LID [13] | 84.4 | 56.2 | 71.0 | 76.1 | 67.8 | |
MD [14] | 69.6 | 57.1 | 66.2 | 66.4 | 63.2 | |
Steg [15] | 85.2 | 67.8 | 90.0 | 90.4 | 82.7 | |
SID [16] | 76.0 | 55.2 | 68.0 | 64.6 | 62.6 | |
SA [17] | 87.9 | 75.7 | 89.8 | 88.9 | 84.8 | |
Ours | 94.7 | 89.9 | 97.0 | 95.2 | 94.0 |
Detector | Known | Unseen | ||||
2 | 1 | 4 | 8 | avg | ||
LID [13] | 69.4 | 55.0 | 69.9 | 75.0 | 66.6 | |
MD [14] | 63.1 | 55.4 | 63.4 | 65.2 | 61.3 | |
Steg [15] | 81.0 | 58.3 | 94.7 | 95.4 | 82.8 | |
SID [16] | 70.5 | 51.9 | 64.8 | 65.4 | 60.7 | |
SA [17] | 74.8 | 55.7 | 83.9 | 84.1 | 74.6 | |
Ours | 93.3 | 89.1 | 98.6 | 99.5 | 95.7 |
4.2.3. Generalization performances across different maximum perturbation magnitudes
Here, we evaluate the robustness of the detection models by focusing on a challenging scenario where the maximum perturbation magnitude of the adversarial examples to be detected is unknown. All the detection models are trained on adversarial examples with a maximum perturbation magnitude of 2, while the maximum magnitude of the testing adversarial perturbations is varied across 1, 2, 4 and 8, respectively. Tables 7 and 8 present the generalization performances across different maximum perturbation magnitudes on CIFAR-10 and SVHN, respectively. Based on these results, our AED-PADA consistently outperforms the state-of-the-art adversarial detection methods across all the testing scenarios, demonstrating the superior generalization ability of our method against adversarial perturbations with unseen magnitudes.
4.3. Effectiveness of AD clustering and PADs Selection
Selecting multiple adversarial attacks from the same cluster leads to redundancy, as these candidate attacks are quite similar, and their combination covers only a small feature space, causing poor generalization. On the other hand, choosing from different clusters effectively avoids this issue. Consequently, as shown in Table 9 and Table 10, we set up two groups of experiments, i.e., selecting ADs from the Same Cluster and across Cross Clusters, to verify the effectiveness of the AD Clustering and PADs Selection of AED-PADA, against the 7 unseen attacks. ‘Same Cluster’ and ‘Cross Cluster’ respectively represent that the ADs employed as the source domains are selected from the same AD cluster and from different AD clusters.
Table 9 presents the generalization performances on CIFAR-10 with ResNet-18. The result of AD clustering is {FGSM, PGD, DIM, MI-FGSM, SI-NI-FGSM}-{BIM, ILA, YA-ILA}-{C&W, DeepFool}. For ‘Same Cluster’, the best, worst and mean values of the averaged results are 93.494%, 90.444%, and 92.150%, respectively. For ‘Cross Cluster’, the best, worst, and mean values are 94.694%, 92.429%, and 93.892%, respectively. Similarly, Table 10 shows the results on CIFAR-10 with VGG-16. The result of AD clustering is {BIM, ILA, YA-ILA, PGD}-{DIM, MI-FGSM, FGSM}-{CW, DeepFool, SI-NI-FGSM}. For ‘Same Cluster’, the best, worst and mean values of the averaged results are 88.019%, 87.132%, and 87.461%, respectively. For ‘Cross Cluster’, the best, worst, and mean values are 88.264%, 87.406%, and 87.818%, respectively.
Origin | Detector | Averaged Accuracy (%) | CEFS() |
Same Cluster | FGSM+SI-NI-FGSM+PGD | 90.444 | - |
BIM+ILA+YA-ILA | 92.513 | - | |
MI-FGSM+SI-NI-FGSM+DIM | 93.494 | - | |
Cross Cluster | DIM+BIM+C&W | 94.694 | 80.856 |
MI-FGSM+BIM+C&W | 94.621 | 72.625 | |
SI-NI-FGSM+BIM+C&W | 94.624 | 58.649 | |
FGSM+BIM+C&W | 94.600 | 58.543 | |
PGD+BIM+C&W | 94.222 | 50.669 | |
FGSM+BIM+DeepFool | 93.407 | 50.142 | |
FGSM+ILA+DeepFool | 92.539 | 49.818 | |
PGD+ILA+DeepFool | 92.429 | 48.015 |
Origin | Detector | Averaged Accuracy (%) | CEFS() |
Same Cluster | BIM+ILA+YA-ILA | 87.389 | - |
BIM+ILA+PGD | 87.153 | - | |
BIM+YA-ILA+PGD | 87.132 | - | |
ILA+YA-ILA+PGD | 87.439 | - | |
DIM+MIM+FGSM | 87.634 | - | |
CW+DeepFool+SI-NI-FGSM | 88.019 | - | |
Cross Cluster | ILA+DIM+CW | 88.264 | 18.692 |
PGD+DIM+CW | 88.236 | 18.605 | |
ILA+FGSM+CW | 87.969 | 18.532 | |
YA-ILA+FGSM+CW | 87.897 | 18.527 | |
YA-ILA+MI-FGSM+SI-NI-FGSM | 87.670 | 18.518 | |
BIM+DIM+SI-NI-FGSM | 87.676 | 18.410 | |
YA-ILA+FGSM+SI-NI-FGSM | 87.425 | 18.388 | |
YA-ILA+DIM+SI-NI-FGSM | 87.406 | 18.385 |
Based on the results, we can draw three observations. Firstly, the mean result of ‘Cross Cluster’ selection is higher than that of ‘Same Cluster’ selection, and the best result of ‘Cross Cluster’ selection is also superior. Specifically, in the setting of CIFAR-10 and ResNet-18, the mean result of ‘Cross Cluster’ selection is even higher than the best result of ‘Same Cluster’ selection. This verifies the effectiveness of our AD Clustering. Apparently, source domains selected across clusters can certainly enhance the generalization ability of the detection methods.
Secondly, although ‘Cross Cluster’ selection in general achieves better results, it cannot guarantee that randomly selected ADs from different clusters always give better performance than the ADs selected from the same cluster. When PADs with a very low CEFS score are employed as the source domains for detection, the generalization performance may be worse than that of ‘Same Cluster’ selection. This observation further verifies the effectiveness of our PADs Selection.
Thirdly, as shown in the ‘Cross Cluster’ parts of Tables 9 and 10, for the majority of results, a higher CEFS value of the PADs induces a higher averaged accuracy of the proposed detection. This observation suggests that the proposed CEFS is a proper guide for selecting PADs, i.e., a higher CEFS score indicates that the corresponding PADs can cover a larger proportion of the entire feature space, and the PADA stage, which uses these PADs as the source domains, gives better generalization performance.
4.4. The effect of different numbers of clusters
Here, we apply AD Clustering to the 10 adversarial attack methods in the training set and present the impact of different numbers of clusters (i.e., the number of adversarial attacks in the PADs). As depicted in Table 11, the detection performance increases as the number of clusters increases, until a certain value is reached. This trend is attributed to the utilization of an increased number of ADs as the source domains, which undeniably provides a larger coverage of the entire feature space. Note that the values obtained with the number of clusters automatically determined based on the CH score represent the generalization performance of our AED-PADA, and the bolded values represent the best generalization performance.
As can be observed, our automatic CH score based method yields superior performance on SVHN with VGG-16 as the backbone, and gives performance comparable to the best result in the other settings. Although the best result obtained by manually selecting the number of clusters exceeds that of our automatic method by 0.2%-0.3% in terms of the averaged accuracy, it requires more parameters and gives a slower speed. Moreover, in real-world scenarios, the adversarial attacks in the training set may far exceed 10 types, and it is clearly unwise to test the detection performance of each clustering result individually. Consequently, our method actually achieves a better balance between the training costs and the generalization performance.
Dataset (Backbone) | Averaged Accuracy (%) | |||||
CIFAR-10 (ResNet-18) | 91.936 | 94.784 | 94.571 | 94.561 | 95.018 | |
CIFAR-10 (VGG-16) | 87.946 | 87.964 | 87.991 | 88.104 | 88.498 | |
SVHN (ResNet-18) | 91.497 | 91.257 | 91.726 | 94.999 | 94.801 | |
SVHN (VGG-16) | 89.183 | 89.269 | 90.056 | 90.026 | 90.713 |
Dataset | Feature Extraction | Averaged Accuracy (%) | Average | |
ResNet-18 | VGG-16 | |||
CIFAR-10 | spatial | 93.411 | 88.251 | 90.831 |
freq | 93.379 | 86.755 | 90.067 | |
spatial + freq | 94.694 | 88.264 | 91.479 | |
SVHN | spatial | 94.612 | 89.721 | 92.166 |
freq | 88.157 | 91.580 | 89.869 | |
spatial + freq | 94.344 | 91.219 | 92.781 |
4.5. Effectiveness of the adversarial feature enhancement in PADA
In this experiment, we utilize three distinct feature extraction strategies, i.e., spatial feature extraction, frequency feature extraction, and our adversarial feature enhancement (spatial + freq), in our framework. The same 7 unseen testing attacks are employed, and the averaged results are reported in Table 12. As can be observed, on both datasets, the average results of our adversarial feature enhancement (spatial + freq) are superior to both the spatial feature extraction and the frequency feature extraction. This indicates that using adversarial feature enhancement as the feature extraction method for PADA is effective. Furthermore, for both spatial feature extraction and frequency feature extraction alone, the performance across different backbones is inconsistent, as evidenced by the fluctuating averaged detection performance across different backbones on SVHN. Our adversarial feature enhancement, which combines both the spatial and frequency aspects, can more comprehensively capture the adversarial perturbation signals, yielding superior adversarial detection performance.
4.6. The compatibility with different MUDA methods
To demonstrate the compatibility of our framework with existing Multi-source Unsupervised Domain Adaptation (MUDA) methods, we employ four widely used MUDA methods, M3DA (moment-mda, ), DARN (DARN, ), MDMN (mdmn, ) and MFSAN (aaai-mda, ). All four MUDA methods utilize the adversarial feature enhancement as the feature extraction component. Table 13 shows that the performances of all four MUDA methods surpass the existing SOTA adversarial detection methods. This verifies that our framework possesses excellent compatibility with existing MUDA methods. In other words, the first exploration of MUDA methods in adversarial example detection has proven to be successful, and Principal Adversarial Domain Adaptation can effectively transfer the knowledge from PADs to the unseen target domain. Besides, since the average result of MFSAN outperforms the other MUDA methods, we select MFSAN as the basic MUDA component in our PADA.
Dataset | Backbone | Averaged Accuracy of different MUDA methods (%)
 | | M3DA (moment-mda, ) | DARN (DARN, ) | MDMN (mdmn, ) | MFSAN (Ours) (aaai-mda, )
CIFAR-10 | ResNet-18 | 94.976 | 93.948 | 93.832 | 94.694
CIFAR-10 | VGG-16 | 87.819 | 87.624 | 91.463 | 88.864
SVHN | ResNet-18 | 93.892 | 93.934 | 93.888 | 94.344
SVHN | VGG-16 | 90.960 | 92.244 | 89.826 | 91.219
Average | | 91.912 | 91.938 | 92.252 | 92.280
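For readers unfamiliar with MFSAN-style training, the following simplified sketch shows one training step in the spirit of the MUDA component used in PADA: each PAD keeps its own classifier, source and target features are aligned with an MMD term, and the source-specific classifiers are encouraged to agree on the unlabeled target data. The single-kernel MMD, the module shapes, and the weight names lambda_mmd and lambda_disc are illustrative assumptions, not the exact losses or hyper-parameters used in our experiments.

```python
# A simplified MFSAN-style multi-source training step (sketch only).
import torch
import torch.nn as nn
import torch.nn.functional as F

def rbf_mmd(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Squared MMD between two feature batches under a single RBF kernel."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def mfsan_style_step(extractor, classifiers, src_batches, tgt_x,
                     lambda_mmd=1.0, lambda_disc=1.0):
    """src_batches: list of (x, y) pairs, one per PAD; tgt_x: unlabeled target."""
    tgt_feat = extractor(tgt_x)
    cls_loss, mmd_loss, tgt_preds = 0.0, 0.0, []
    for (x, y), clf in zip(src_batches, classifiers):
        src_feat = extractor(x)
        cls_loss = cls_loss + F.cross_entropy(clf(src_feat), y)
        mmd_loss = mmd_loss + rbf_mmd(src_feat, tgt_feat)
        tgt_preds.append(F.softmax(clf(tgt_feat), dim=1))
    # Discrepancy: pairwise L1 distance between the classifiers' target outputs.
    disc_loss = 0.0
    for i in range(len(tgt_preds)):
        for j in range(i + 1, len(tgt_preds)):
            disc_loss = disc_loss + (tgt_preds[i] - tgt_preds[j]).abs().mean()
    return cls_loss + lambda_mmd * mmd_loss + lambda_disc * disc_loss

# Toy usage with two PADs as sources and benign/adversarial labels.
if __name__ == "__main__":
    extractor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU())
    classifiers = [nn.Linear(64, 2) for _ in range(2)]
    srcs = [(torch.randn(8, 3, 32, 32), torch.randint(0, 2, (8,))) for _ in range(2)]
    loss = mfsan_style_step(extractor, classifiers, srcs, torch.randn(8, 3, 32, 32))
    print(loss.item())
```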
4.7. Parameter sensitivity
We utilize MFSAN as the basic MUDA method in the PADA stage of our framework, and study the sensitivity to its two loss trade-off parameters. Specifically, we sample the value of each trade-off parameter from {0.5, 1.0, 2.0}, and perform the experiments under the setting of CIFAR-10 and ResNet-18.
First trade-off parameter | Second trade-off parameter | Averaged Accuracy (%)
0.5 | 0.5 | 94.503
0.5 | 1.0 | 94.591
0.5 | 2.0 | 94.355
1.0 | 0.5 | 94.646
1.0 | 1.0 | 94.694
1.0 | 2.0 | 94.603
2.0 | 0.5 | 94.598
2.0 | 1.0 | 94.629
2.0 | 2.0 | 94.223
Table 14 indicates that our proposed method maintains consistently high performance under various parameter settings when employing MFSAN as the basic MUDA method. The best averaged accuracy is 94.694%, obtained when both trade-off parameters are set to 1.0. Variations of the parameters typically do not induce noticeable fluctuations in the detection performance. The mean value and the standard deviation of all the averaged accuracies are 94.538% and 0.154%, respectively. This indicates that our PADA framework is effective and robust, since it is insensitive to different parameters and consistently offers high detection accuracy.
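The sensitivity study itself amounts to a small grid sweep; the sketch below shows such a sweep and also checks the summary statistics quoted above against the values in Table 14. The helper train_and_evaluate and its keyword arguments are hypothetical placeholders.

```python
# 3x3 sensitivity sweep over the two trade-off weights (sketch), plus a check
# of the mean/std reported for Table 14.
from itertools import product
from statistics import mean, stdev

def sweep(train_and_evaluate, grid=(0.5, 1.0, 2.0)):
    accs = {}
    for w1, w2 in product(grid, grid):
        accs[(w1, w2)] = train_and_evaluate(lambda_mmd=w1, lambda_disc=w2)
    best = max(accs, key=accs.get)
    print("best setting:", best, "accuracy:", accs[best])
    print("mean / std over the grid:", mean(accs.values()), stdev(accs.values()))
    return accs

# Reproducing the reported statistics from the nine entries of Table 14.
reported = [94.503, 94.591, 94.355, 94.646, 94.694, 94.603, 94.598, 94.629, 94.223]
print(round(mean(reported), 3), round(stdev(reported), 3))  # 94.538 0.154
```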
4.8. The computational costs
Cost | LID (lid, ) | MD (md, ) | Steg (steg, ) | SID (SID, ) | SA (txt_advdetection, ) | Ours
Time (ms/image) | 8.13 | 18.05 | 23.85 | 2.16 | 1.21 | 0.60
Hardware (MB) | 7458 | 7248 | 1614 | 1592 | 8564 | 6124
Table 15 presents the time and hardware costs of deploying the state-of-the-art adversarial example detection methods and our AED-PADA. All experiments are conducted on an NVIDIA GeForce RTX 3080Ti GPU under the CIFAR-10 and ResNet-18 settings. The results indicate that our method incurs the lowest time cost and a moderate hardware (GPU memory) cost under the same settings.
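For reference, the per-image latency and peak GPU memory in Table 15 can be measured with a routine similar to the sketch below; the detector module passed in, the batch shape, and the warm-up length are placeholders, and the absolute numbers naturally depend on the hardware.

```python
# Sketch: measuring per-image inference time (ms) and peak GPU memory (MB).
import torch

def profile_detector(detector, n_images=1000, shape=(1, 3, 32, 32), device="cuda"):
    detector = detector.to(device).eval()
    x = torch.randn(shape, device=device)
    torch.cuda.reset_peak_memory_stats(device)
    with torch.no_grad():            # warm-up, excluded from the timing
        for _ in range(10):
            detector(x)
    torch.cuda.synchronize(device)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    with torch.no_grad():
        for _ in range(n_images):
            detector(x)
    end.record()
    torch.cuda.synchronize(device)
    ms_per_image = start.elapsed_time(end) / n_images
    peak_mb = torch.cuda.max_memory_allocated(device) / 1024 ** 2
    return ms_per_image, peak_mb
```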
Furthermore, our AED-PADA is suitable for deployment in real-world scenarios. Real-world scenarios demand that adversarial example detection methods not only possess the capability of real-time detection, but also maintain robust detection performance in environments where the adversarial attacks, datasets, and backbones encountered during testing are entirely unseen. Firstly, our detector is trained only once and can be deployed without retraining for new unseen attacks, and it has the shortest inference time, which enables real-time detection. Secondly, we train on earlier attacks and test on more advanced attacks. Table 3 demonstrates that, under this real-world-aligned setting, our AED-PADA exhibits superior generalization performance and can be expected to maintain considerable detection capabilities against future unseen attacks. Lastly, with its strong performance across various backbones and datasets, as shown in Table 6, our proposed method also performs well in environments with unseen datasets and backbones. Consequently, our AED-PADA is well-suited for practical applications in complex real-world environments.
5. Conclusion
In this paper, we proposed a novel and effective adversarial example detection method, named Adversarial Example Detection via Principal Adversarial Domain Adaptation, to improve the generalization ability of adversarial detection. Specifically, AED-PADA contains two stages, i.e., Principal Adversarial Domains Identification (PADI) and Principal Adversarial Domain Adaptation (PADA). In PADI, we acquired ADs from scratch and constructed PADs as the source domains for PADA. In PADA, we proposed an adversarial feature enhancement based Multi-source Unsupervised Domain Adaptation framework, which is compatible with various existing MUDA methods, to effectively leverage PADs to achieve adversarial example detection. Experimental results demonstrated the superiority of our work, compared to the state-of-the-art detection methods.
Acknowledgements.
This work was supported in part by the National Natural Science Foundation of China under Grant 62272020, U20B2069 and 62176253, in part by the State Key Laboratory of Complex & Critical Software Environment under Grant SKLSDE2023ZX-16, and in part by the Fundamental Research Funds for Central Universities.

References
- [1] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014.
- [2] Anh Mai Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 427–436, 2015.
- [3] Mengte Shi, Sheng Li, Zhaoxia Yin, Xinpeng Zhang, and Zhenxing Qian. On generating jpeg adversarial images. In International Conference on Multimedia and Expo, pages 1–6, 2021.
- [4] Cheng Luo, Qinliang Lin, Weicheng Xie, Bizhu Wu, Jinheng Xie, and Linlin Shen. Frequency-driven imperceptible adversarial attack on semantic similarity. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15294–15303, 2022.
- [5] Chiu Wai Yan, Tsz-Him Cheung, and Dit-Yan Yeung. ILA-DA: improving transferability of intermediate level attack with data augmentation. In International Conference on Learning Representations, 2023.
- [6] Yikun Xu, Xingxing Wei, Pengwen Dai, and Xiaochun Cao. A2sc: Adversarial attacks on subspace clustering. ACM Transactions on Multimedia Computing, Communications and Applications, 19(6):1–23, 2023.
- [7] Prasanth Buddareddygari, Travis Zhang, Yezhou Yang, and Yi Ren. Targeted attack on deep rl-based autonomous driving with learned visual patterns. In International Conference on Robotics and Automation, pages 10571–10577, 2022.
- [8] Xingjun Ma, Yuhao Niu, Lin Gu, Yisen Wang, Yitian Zhao, James Bailey, and Feng Lu. Understanding adversarial attacks on deep learning based medical image analysis systems. Pattern Recognition, 110:107332, 2021.
- [9] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.
- [10] Pengcheng Li, Jinfeng Yi, Bowen Zhou, and Lijun Zhang. Improving the robustness of deep neural networks via adversarial training with triplet loss. In International Joint Conference on Artificial Intelligence, pages 2909–2915, 2019.
- [11] Cihang Xie, Mingxing Tan, Boqing Gong, Jiang Wang, Alan L. Yuille, and Quoc V. Le. Adversarial examples improve image recognition. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 816–825, 2020.
- [12] Shangxi Wu, Jitao Sang, Kaiyan Xu, Guanhua Zheng, and Changsheng Xu. Adaptive adversarial logits pairing. ACM Transactions on Multimedia Computing, Communications and Applications, 20(2):56:1–56:16, 2024.
- [13] Xingjun Ma, Bo Li, Yisen Wang, Sarah M. Erfani, Sudanthi N. R. Wijewickrema, Grant Schoenebeck, Dawn Song, Michael E. Houle, and James Bailey. Characterizing adversarial subspaces using local intrinsic dimensionality. In International Conference on Learning Representations, 2018.
- [14] Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In Advances in Neural Information Processing Systems, pages 7167–7177, 2018.
- [15] Jiayang Liu, Weiming Zhang, Yiwei Zhang, Dongdong Hou, Yujia Liu, Hongyue Zha, and Nenghai Yu. Detection based defense against adversarial examples from the steganalysis point of view. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4825–4834, 2019.
- [16] Jinyu Tian, Jiantao Zhou, Yuanman Li, and Jia Duan. Detecting adversarial examples from sensitivity inconsistency of spatial-transform domain. In AAAI Conference on Artificial Intelligence, pages 9877–9885, 2021.
- [17] Yulong Wang, Tianxiang Li, Shenghong Li, Xin Yuan, and Wei Ni. New adversarial image detection based on sentiment analysis. IEEE Transactions on Neural Networks and Learning Systems, pages 1–15, 2023.
- [18] Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial examples in the physical world. In International Conference on Learning Representations Workshop, 2017.
- [19] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
- [20] Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In ACM Workshop on Artificial Intelligence and Security, pages 3–14, 2017.
- [21] Leo Schwinn, René Raab, An Nguyen, Dario Zanca, and Bjoern Eskofier. Exploring misclassifications of robust neural networks to enhance adversarial attacks. Applied Intelligence, pages 1–17, 2023.
- [22] Dan Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. In International Conference on Learning Representations, 2017.
- [23] Dongyu Meng and Hao Chen. Magnet: A two-pronged defense against adversarial examples. In ACM SIGSAC Conference on Computer and Communications Security, pages 135–147, 2017.
- [24] Yang Song, Taesup Kim, Sebastian Nowozin, Stefano Ermon, and Nate Kushman. Pixeldefend: Leveraging generative models to understand and defend against adversarial examples. In International Conference on Learning Representations, 2018.
- [25] Jonathan Aigrain and Marcin Detyniecki. Detecting adversarial examples and other misclassifications in neural networks by introspection. In International Conference on Learning Representations, 2019.
- [26] Philip Sperl, Ching-Yu Kao, Peng Chen, and Konstantin Böttinger. Dla: Dense-layer-analysis for adversarial example detection. In IEEE European Symposium on Security and Privacy, pages 198–215, 2019.
- [27] Fei Zuo and Qiang Zeng. Exploiting the sensitivity of L2 adversarial examples to erase-and-restore. In ACM Asia Conference on Computer and Communications Security, pages 40–51, 2021.
- [28] Kathrin Grosse, Praveen Manoharan, Nicolas Papernot, Michael Backes, and Patrick McDaniel. On the (statistical) detection of adversarial examples. arXiv preprint arXiv:1702.06280, 2017.
- [29] Xin Li and Fuxin Li. Adversarial examples detection in deep networks with convolutional filter statistics. In IEEE/CVF International Conference on Computer Vision, pages 5775–5783, 2017.
- [30] Reuben Feinman, Ryan R Curtin, Saurabh Shintre, and Andrew B Gardner. Detecting adversarial samples from artifacts. arXiv preprint arXiv:1703.00410, 2017.
- [31] Liang Zhao, Zhikui Chen, Laurence T. Yang, M. Jamal Deen, and Z. Jane Wang. Deep semantic mapping for heterogeneous multimedia transfer learning using co-occurrence data. ACM Transactions on Multimedia Computing, Communications, and Applications, 15(1s):9:1–9:21, 2019.
- [32] Yang Yang, Yi Yang, and Heng Tao Shen. Effective transfer tagging from image to video. ACM Transactions on Multimedia Computing, Communications, and Applications, 9(2):14:1–14:20, 2013.
- [33] Yoshua Bengio. Deep learning of representations for unsupervised and transfer learning. In International Conference on Machine Learning, volume 27, pages 17–36, 2012.
- [34] Feng Liu, Guangquan Zhang, and Jie Lu. Heterogeneous domain adaptation: An unsupervised approach. IEEE Transactions on Neural Networks and Learning Systems, 31(12):5588–5602, 2020.
- [35] Subhankar Roy, Aliaksandr Siarohin, Enver Sangineto, Samuel Rota Bulo, Nicu Sebe, and Elisa Ricci. Unsupervised domain adaptation using feature-whitening and consensus loss. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9471–9480, 2019.
- [36] Artem Rozantsev, Mathieu Salzmann, and Pascal Fua. Beyond sharing weights for deep domain adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(4):801–814, 2019.
- [37] Minghao Chen, Shuai Zhao, Haifeng Liu, and Deng Cai. Adversarial-learned loss for domain adaptation. In AAAI Conference on Artificial Intelligence, pages 3521–3528, 2020.
- [38] Yongchun Zhu, Fuzhen Zhuang, and Deqing Wang. Aligning domain-specific distribution and classifier for cross-domain classification from multiple sources. In AAAI Conference on Artificial Intelligence, pages 5989–5996, 2019.
- [39] Xingchao Peng, Qinxun Bai, Xide Xia, Zijun Huang, Kate Saenko, and Bo Wang. Moment matching for multi-source domain adaptation. In IEEE/CVF International Conference on Computer Vision, pages 1406–1415, 2019.
- [40] Ruijia Xu, Ziliang Chen, Wangmeng Zuo, Junjie Yan, and Liang Lin. Deep cocktail network: Multi-source unsupervised domain adaptation with category shift. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3964–3973, 2018.
- [41] Junfeng Wen, Russell Greiner, and Dale Schuurmans. Domain aggregation networks for multi-source domain adaptation. In International Conference on Machine Learning, pages 10214–10224, 2020.
- [42] Geon Yeong Park and Sang Wan Lee. Information-theoretic regularization for multi-source domain adaptation. In IEEE/CVF International Conference on Computer Vision, pages 9194–9203, 2021.
- [43] Yitong Li, Michael Murias, Geraldine Dawson, and David E. Carlson. Extracting relationships by multi-domain matching. In Advances in Neural Information Processing Systems, 2018.
- [44] Sicheng Zhao, Guangzhi Wang, Shanghang Zhang, Yang Gu, Yaxian Li, Zhichao Song, Pengfei Xu, Runbo Hu, Hua Chai, and Kurt Keutzer. Multi-source distilling domain adaptation. In AAAI Conference on Artificial Intelligence, volume 34, pages 12975–12983, 2020.
- [45] Baoyao Yang and Pong C Yuen. Cross-domain visual representations via unsupervised graph alignment. In AAAI Conference on Artificial Intelligence, volume 33, pages 5613–5620, 2019.
- [46] Hang Wang, Minghao Xu, Bingbing Ni, and Wenjun Zhang. Learning to combine: Knowledge aggregation for multi-source domain adaptation. In Proceedings of the European Conference on Computer Vision (ECCV), volume 12353, pages 727–744, 2020.
- [47] Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. Supervised contrastive learning. In Advances in Neural Information Processing Systems, pages 18661–18673, 2020.
- [48] Chuan Guo, Mayank Rana, Moustapha Cissé, and Laurens van der Maaten. Countering adversarial images using input transformations. In International Conference on Learning Representations, 2018.
- [49] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan L. Yuille. Mitigating adversarial effects through randomization. In International Conference on Learning Representations, 2018.
- [50] James MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pages 281–297, 1967.
- [51] Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems, pages 849–856, 2001.
- [52] Tim van Erven and Peter Harremoës. Rényi divergence and kullback-leibler divergence. IEEE Transactions on Information Theory, 60(7):3797–3820, 2014.
- [53] Supreet Kaur Mann and Sonal Chawla. A proposed hybrid clustering algorithm using k-means and birch for cluster based cab recommender system (cbcrs). International Journal of Information Technology, 15(1):219–227, 2023.
- [54] Solomon Kullback and Richard A Leibler. On information and sufficiency. Annals of Mathematical Statistics, 22(1):79–86, 1951.
- [55] Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexander J. Smola. A kernel two-sample test. The Journal of Machine Learning Research, 13:723–773, 2012.
- [56] Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, and Wieland Brendel. Imagenet-trained cnns are biased towards texture; increasing shape bias improves accuracy and robustness. In International Conference on Learning Representations, 2019.
- [57] Haohan Wang, Xindi Wu, Zeyi Huang, and Eric P. Xing. High-frequency component helps explain the generalization of convolutional neural networks. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8681–8691, 2020.
- [58] Lijie Fan, Sijia Liu, Pin-Yu Chen, Gaoyuan Zhang, and Chuang Gan. When does contrastive learning preserve adversarial robustness from pretraining to finetuning? In Advances in Neural Information Processing Systems, pages 21480–21492, 2021.
- [59] Jessica Fridrich and Jan Kodovsky. Rich models for steganalysis of digital images. IEEE Transactions on Information Forensics and Security, 7(3):868–882, 2012.
- [60] Peng Zhou, Xintong Han, Vlad I. Morariu, and Larry S. Davis. Learning rich features for image manipulation detection. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
- [61] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
- [62] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015.
- [63] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
- [64] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
- [65] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
- [66] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: A Simple and Accurate Method to Fool Deep Neural Networks. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.
- [67] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9185–9193, 2018.
- [68] Cihang Xie, Zhishuai Zhang, Yuyin Zhou, Song Bai, Jianyu Wang, Zhou Ren, and Alan L. Yuille. Improving transferability of adversarial examples with input diversity. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2730–2739, 2019.
- [69] Qian Huang, Isay Katsman, Zeqi Gu, Horace He, Serge J. Belongie, and Ser-Nam Lim. Enhancing adversarial example transferability with an intermediate level attack. In IEEE/CVF International Conference on Computer Vision, pages 4732–4741, 2019.
- [70] Qizhang Li, Yiwen Guo, and Hao Chen. Yet another intermediate-level attack. In Proceedings of the European Conference on Computer Vision (ECCV), pages 241–257, 2020.
- [71] Jiadong Lin, Chuanbiao Song, Kun He, Liwei Wang, and John E. Hopcroft. Nesterov accelerated gradient and scale invariance for adversarial attacks. In International Conference on Learning Representations, 2020.
- [72] Francesco Croce and Matthias Hein. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In International Conference on Machine Learning, pages 2206–2216, 2020.
- [73] Yinpeng Dong, Tianyu Pang, Hang Su, and Jun Zhu. Evading defenses to transferable adversarial examples by translation-invariant attacks. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4312–4321, 2019.
- [74] Xiaosen Wang and Kun He. Enhancing the transferability of adversarial attacks through variance tuning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1924–1933, 2021.