
AED-PADA: Improving Generalizability of Adversarial Example Detection via Principal Adversarial Domain Adaptation

Heqi Peng, Yunhong Wang, Ruijie Yang, Beichen Li (Beihang University, Beijing, China); Rui Wang (Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China); and Yuanfang Guo (Beihang University, Beijing, China)
Abstract.

Adversarial example detection, which can be conveniently applied in many scenarios, is important in the area of adversarial defense. Unfortunately, existing detection methods suffer from poor generalization performance, because their training process usually relies on the examples generated from a single known adversarial attack and there exists a large discrepancy between the training and unseen testing adversarial examples. To address this issue, we propose a novel method, named Adversarial Example Detection via Principal Adversarial Domain Adaptation (AED-PADA). Specifically, our approach identifies the Principal Adversarial Domains (PADs), i.e., a combination of features of the adversarial examples generated by different attacks, which covers a large portion of the entire adversarial feature space. Subsequently, we are the first to exploit Multi-source Unsupervised Domain Adaptation in adversarial example detection, with PADs as the source domains. Experimental results demonstrate the superior generalization ability of our proposed AED-PADA. Notably, this superiority is especially pronounced in the challenging scenario where a minimal magnitude constraint is imposed on the perturbations.

Adversarial defense, adversarial example detection, generalization ability.
copyright: acmlicensed; journal: TOMM; journal year: 2024; doi: 10.1145/3706061; isbn: 978-1-4503-XXXX-X/18/06; ccs: Security and privacy, Social aspects of security and privacy; ccs: Computing methodologies, Computer vision

1. Introduction

Recently, Deep Neural Networks (DNNs) have been playing prominent roles in many applications. Unfortunately, considerable studies have demonstrated that DNNs can be easily deceived if certain imperceptible perturbations are introduced to their inputs (SzegedyZSBEGF13, ; NguyenYC15, ; shi2021, ; SSA, ; ILA-DA, ; a2sc, ). These perturbed inputs, also known as adversarial examples, force DNNs to produce erroneous decisions and have become a significant security concern in safety-sensitive scenarios, such as autonomous driving (attack_driving, ) and medical diagnosis (attack_medical, ).

Nowadays, adversarial training has been proven to be an effective adversarial defense strategy (FGSM, ; AD_tripletloss, ; XieTGWYL20, ; adversarial_training_tomm, ). However, it requires sufficient knowledge about the classification models and incurs substantial computational costs to retrain them. In contrast, adversarial example detection, a.k.a. adversarial detection, defends against adversarial attacks by distinguishing whether the inputs are benign or manipulated. This type of method can be efficiently deployed to defend many applications without requiring extra knowledge about the core model.

Generalization ability is vital for adversarial detection methods in real-world scenarios, because these methods inevitably encounter unseen attacks and are expected to perform consistently against them. Current detection techniques (lid, ; md, ; steg, ; SID, ; txt_advdetection, ) typically achieve considerable generalization ability over a few conventional attacks, such as BIM (BIM, ), PGD (pgd, ), and C&W (CW, ). However, we empirically observe that these methods exhibit instability and inadequate performance against recent attacks, such as SSA (SSA, ), Jitter (jitter, ) and ILA-DA (ILA-DA, ). These methods tend to give unsatisfactory generalization performance, because their training process usually relies on a single known adversarial attack and they have not been developed from the perspective of boosting generalization ability.

In this paper, we provide a new perspective to further analyze the generalization ability of adversarial detection methods. Clearly, the threat model of all different adversarial attacks should be identical to ensure fair analysis and comparison. Thus, the features extracted by the model from different attacks are of the same dimension and lie within a shared feature space. Due to the variations in configurations such as attack objectives, parameters, and loss functions, the features of the examples generated by each attack form a distinct domain, named Adversarial Domain (AD), as depicted in Fig. 1(a). Formally, the Adversarial Domain (AD) of a particular adversarial attack is defined as the cumulative representation of all the adversarial examples generated by that attack. Existing detection methods typically select a random single attack to generate the training samples, i.e., they only select one AD as the source domain, as shown in Fig. 1(b). Apparently, there are few intersections between the source and unseen target domains, which usually leads to poor generalization performance. Additionally, the randomness inherent in the selection of the source domains induces considerable fluctuations in generalization performance.

Refer to caption
Figure 1. Schematic illustration of our proposed work. (a) represents the entire adversarial feature space as an instance, which contains 9 adversarial domains $\{A_{1},A_{2},\ldots,A_{9}\}$. (b) represents the mechanism of existing detection methods, which usually perform the training via a single source domain to detect the examples in an unseen adversarial domain, e.g., $T$ ($T=A_{9}$). (c) is a straightforward solution to improve generalization ability via randomly selecting multiple source domains. (d) presents the intuition behind our work. We construct PADs, which possess a larger coverage of the entire feature space, to create more potential overlaps with the target domain. This strategy is designed to significantly enhance the detection generalization ability.

A straightforward solution to the above issues, as shown in Fig. 1(c), is to randomly select multiple ADs as the source domains. This strategy tends to create a larger overlap with the target domain(s) and thereby improves the generalization performance of adversarial detection. Nonetheless, this strategy also introduces uncertainty, and selecting similar ADs incurs additional training costs without sufficient performance gains.

To further address the aforementioned problems, we propose a novel detection method, named Adversarial Example Detection via Principal Adversarial Domain Adaptation (AED-PADA). As shown in Fig. 1(d), by selecting Principal Adversarial Domains (PADs) as the source domains, which significantly enlarges the coverage of the entire adversarial feature space and creates a larger overlap with the target domain, our method offers superior generalization performance.

Specifically, AED-PADA contains two stages, i.e., Principal Adversarial Domains Identification (PADI) and Principal Adversarial Domain Adaptation (PADA). In the stage of PADI, since the discrepancies between the adversarial examples from various adversarial attacks differ substantially from those between the classes of ordinary classification tasks, we exploit adversarial supervised contrastive learning (Adv-SCL) to construct distinguishable ADs. Then, the selection of the most representative ADs must meet two key criteria. Firstly, there should be a clear distinction between candidate ADs to avoid the redundancy caused by selecting similar ADs. Secondly, the combination of the candidate ADs should cover as much of the entire feature space as possible. To select the most representative ADs, we propose a Coverage of Entire Feature Space (CEFS) metric. With our CEFS metric, the formed PADs possess broad coverage of the entire feature space, and thus effectively improve the likelihood of capturing the location of the unseen target AD(s).

In the stage of PADA, we are the first to exploit the mechanism of Multi-source Unsupervised Domain Adaptation (MUDA) to effectively utilize the rich knowledge acquired by PADs, so as to detect the unseen adversarial examples in the target domain. The framework of PADA is compatible with various existing MUDA methods. Since typical MUDAs only focus on extracting semantic features from the spatial domain, we propose an adversarial feature enhancement module to extract features from both the spatial and frequency domains, constructing a more comprehensive representation of adversarial examples.

Our major contributions can be summarized as follows.

  • We propose a novel adversarial example detection method, named Adversarial Example Detection via Principal Adversarial Domain Adaptation, to significantly improve the generalization performance of adversarial detection.

  • We propose Principal Adversarial Domains Identification to identify the PADs, which possess a large coverage of the entire adversarial example feature space, with the help of the constructed AD clustering and proposed CEFS metric.

  • We propose Principal Adversarial Domain Adaptation for detecting adversarial examples, by exploiting adversarial feature enhancement based Multi-source Unsupervised Domain Adaptation (MUDA), which is compatible with various existing MUDA methods. To the best of our knowledge, this is the first work to exploit MUDA for adversarial example detection.

2. Related Work

2.1. Adversarial Example Detection

The majority of existing adversarial example detection methods rely on statistical features (HendrycksG17, ; magnet, ; FarAwayDis1, ; FarAwayDis2, ; Hidden4, ; Erase-and-Restore, ). They usually assume that the benign and adversarial examples originate from different distributions, and construct detectors based on the distinct statistical characteristics of these examples. Specifically, Grosse et al. (STest1, ) utilize Maximum Mean Discrepancy for adversarial example detection. Li et al. employ Principal Component Analysis (PCA) to extract statistical features, and construct a cascade classifier based on Support Vector Machines (SVMs) (PCAInconsistency, ). Feinman et al. (KDBU, ) carry out the detection based on Kernel Density (KD) and Bayesian-Uncertainty (BU) estimation. Ma et al. (lid, ) exploit the concept of Local Intrinsic Dimensionality (LID) to calculate the distance between the distribution of inputs and their neighbors. Lee et al. (md, ) utilize Gaussian Discriminant Analysis (GDA) to model the difference between the benign and adversarial samples, and differentiate them based on Mahalanobis Distance (MD). Liu et al. (steg, ) point out that steganalysis can be applied to adversarial example detection, and propose a steganalysis-based detection method (Steg). Tian et al. (SID, ) reveal the inconsistency in the boundary fluctuations between the adversarial and benign examples, and construct a Sensitivity Inconsistency Detector (SID) to identify the adversarial examples. Wang et al. (txt_advdetection, ) embed hidden-layer feature maps of DNNs into word vectors, and detect adversarial examples via Sentiment Analysis (SA).

Existing adversarial detection methods exhibit poor generalization because their training typically depends on a single known attack which vastly differs from the unseen test attacks. In this paper, we propose a novel adversarial detection method, which can substantially increase the coverage of the entire adversarial feature space and create a larger overlap with the test adversarial attacks.

2.2. Multi-source Unsupervised Domain Adaptation

Transfer learning (transfer_learning_survey1, ; transfer_learning_survey2, ; transfer_learning_survey3, ) is a deep learning technique which leverages knowledge acquired from the source task(s) to improve learning efficiency and performance on a related but different target task. Unsupervised domain adaptation (UDA) (unsuperviesed_domain_adaptation1, ) is a popular type of transfer learning method which aims to transfer knowledge learned from the labeled source domain(s) to the target domain, where only unlabeled target data are available for training. Single-source Unsupervised Domain Adaptation (SUDA) (single-uda1, ; single-uda2, ; single-uda3, ), which transfers knowledge from a single source to one target domain, has been widely explored in previous research. Compared to SUDA, Multi-source Unsupervised Domain Adaptation (MUDA) acquires richer information while introducing a new challenge, i.e., how to effectively bridge the domain gaps between all source domains and the target domain.

Various distribution alignment schemes have been proposed to achieve alignment between source and target domains. For example, Multiple Feature Spaces Adaptation Network (MFSAN) (aaai-mda, ) leverages Maximum Mean Discrepancy (MMD) to align the distributions of each pair of source and target domains in multiple specific feature spaces and aligns the outputs of classifiers by utilizing the domain-specific decision boundaries. Peng et al. (moment-mda, ) provide new theoretical insights specifically for moment matching to align the sources with each other and with the target. Owing to the development of generative adversarial networks, adversarial learning is widely used to find a domain-invariant feature space. It either focuses on approximating all combinations of pairwise domain discrepancies between each source and the target (cocktail, ; DARN, ) or uses a single domain discriminator (MIAN, ). Other explicit measures of discrepancy, such as Wasserstein distance (mdmn, ; Wasserstein, ), are also employed in MUDA to align the distribution of features. In addition to distribution alignment, the graph-matching metric (graph-muda1, ; graph-muda2, ) also considers the structural and geometric information, which achieves the alignment between the source and target domains by mapping both nodes and edges in a graph.

In this paper, we argue that the poor generalization performance of adversarial detection is due to the significant discrepancy between the source domains utilized for training and the target domain utilized for testing. Therefore, based on the MUDA approach, we propose a viable solution, Principal Adversarial Domain Adaptation, to reduce this gap.

3. Methodology

To improve the generalization ability of adversarial example detection, we propose a novel adversarial example detection method, named Adversarial Example Detection via Principal Adversarial Domain Adaptation (AED-PADA). The entire framework of our AED-PADA is shown in Fig. 2. AED-PADA contains two stages, Principal Adversarial Domains Identification (PADI) and Principal Adversarial Domain Adaptation (PADA). In the stage of PADI, we first incorporate adversarial supervised contrastive learning (Adv-SCL) to acquire distinguishable ADs. Then, we construct AD clustering to group the ADs into different clusters. By proposing the Coverage of Entire Feature Space (CEFS) metric, we select the most representative ADs from each cluster to form the PADs. In the stage of PADA, we propose an adversarial feature enhancement module on top of the original MUDA method to effectively leverage the PADs to detect unseen adversarial attack methods. Note that Secs. 3.2, 3.3, 3.4 and 3.5 introduce PADI while Sec. 3.6 presents PADA.

3.1. Notations

Suppose we have a labeled $C$-class classification dataset $\mathbb{D}=\{(x_{i},y_{i})\}_{i=1}^{N}$ with $N$ samples, where the label $y\in\{1,2,\ldots,C\}$. A classifier $f$ is trained on $\mathbb{D}$ to classify an input sample into one of the $C$ classes, $f(x)\rightarrow\mathbb{Z}_{C}$. An adversarial attack $\psi$ aims to fool $f$ into assigning incorrect labels by generating adversarial examples. We have a set of adversarial attack methods, $\Psi=\{\psi^{m}\}_{m=1}^{M}$, which consists of $M$ distinct adversarial attack methods. $\mathbb{D}^{m}=\{(x_{i}^{m},y_{i}^{m})\}_{i=1}^{N}$ is the adversarial dataset generated by the $m$-th adversarial attack method $\psi^{m}$, where $x_{i}^{m}$ denotes the $i$-th adversarial example generated by $\psi^{m}$, and $y_{i}^{m}$ denotes its corresponding prediction label. We define $\varphi$ as a mapping from an adversarial example to its attack method, i.e., $\varphi(x_{i}^{m})=\psi^{m}$. Consequently, we construct a set of adversarial examples $\mathcal{D}$ based on $\mathbb{D}$, which comprises $M$ types of adversarial examples, $\mathcal{D}=\{\mathbb{D}^{m}\}_{m=1}^{M}$.

Refer to caption
Figure 2. Our AED-PADA framework contains two stages: (a) Principal Adversarial Domains Identification, which consists of Adversarial Domain Acquisition, Adversarial Domain Clustering and Principal Adversarial Domains Selection, and (b) Principal Adversarial Domain Adaptation.

3.2. Adversarial Domain Acquisition

Typically, when we extract the features of the adversarial examples, which are generated from different untargeted attacks via common CNNs, the features tend to spread in an indistinguishable manner in the feature space. To acquire distinguishable representations of adversarial examples from different attacks, we exploit adversarial supervised contrastive learning (Adv-SCL) to extract features. Then, we form the Adversarial Domains (ADs), each of which is defined as the representations of all the adversarial examples generated from a particular adversarial attack.

Based on the supervised contrastive learning (SCL) method (KhoslaTWSTIMLK20, ), Adv-SCL neglects the classification result of the adversarial examples and focuses solely on identifying their generation methods. Specifically, each adversarial example $x_{i}^{m}$ in $\mathcal{D}$ is characterized by the method used to generate it, i.e., the adversarial attack method $\psi^{m}$. The attack $\psi^{m}$ of an adversarial example $x_{i}^{m}$ serves as the key discriminant for determining whether two examples form a positive or negative pair. Here, a pair of examples from the same attack is considered positive, while a pair from different attacks is considered negative. This learning strategy amplifies the dissimilarities across examples from various attacks and generates a more appropriate representation.

As shown in Fig. 2(a), the input of AD acquisition is the adversarial example set $\mathcal{D}$. Adv-SCL consists of an Encoder Network ($Enc$) and a Projection Network ($Proj$). $Enc(\cdot)$ extracts a feature vector from the input adversarial example $x_{i}^{m}$, and $Proj(\cdot)$ further projects this representation vector to an auxiliary vector $z_{i}=Proj(Enc(x_{i}^{m}))$, which is discarded after training.

Previous studies (transforming1, ; transform2, ) show that transformation strategies, such as cropping and rescaling, bit-depth reduction, JPEG compression, and randomization, have been used to defend against adversarial examples. Therefore, to prevent transformations from potentially invalidating the adversarial examples, we only use normalization ($Norm$) as the data augmentation operation in the stage of AD Acquisition.

We randomly sample $n$ example-label pairs from $\mathcal{D}$, $\{(x_{k}^{m_{k}},y_{k}^{m_{k}})\}_{k=1}^{n}$, where $m_{k}$ denotes the adversarial attack of the $k$-th adversarial example. The corresponding batch employed for training consists of $2n$ pairs, $\{(\tilde{x}_{l},\tilde{y}_{l})\}_{l=1}^{2n}$, where $\tilde{x}_{2k}$ and $\tilde{x}_{2k-1}$ are two views of $x_{k}^{m_{k}}$, and they come from the same adversarial attack method $\psi^{m_{k}}$. The loss function of Adv-SCL is defined as,

(1) $\mathcal{L}^{sup}=\sum_{i\in I}\mathcal{L}_{i}^{sup},$
(2) $\mathcal{L}_{i}^{sup}=\frac{-1}{|P(i)|}\sum_{p\in P(i)}\log\frac{\exp(z_{i}\cdot z_{p}/\tau)}{\sum_{a\in A(i)}\exp(z_{i}\cdot z_{a}/\tau)}.$

Here, $i\in I=\{1,\ldots,2n\}$ denotes the index of the augmented samples. $A(i)$ is the subset of $I$ which includes all indices except $i$. $P(i)=\{p\in A(i):\varphi(\tilde{x}_{p})=\varphi(\tilde{x}_{i})\}$ represents the indices of the positive samples in the batch, excluding $i$, and $|P(i)|$ stands for its cardinality. $\cdot$ denotes the dot product, and $\tau$ is the temperature parameter which scales the similarity values.
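To make the training objective concrete, the following is a minimal PyTorch sketch of Eqs. (1)-(2). It assumes the projected vectors are L2-normalized before the dot products, as in standard SCL; the function name and argument layout are illustrative, not the exact implementation.

import torch
import torch.nn.functional as F

def adv_scl_loss(z, attack_ids, tau=0.07):
    # z: (2n, d) projected features of an augmented batch; attack_ids: (2n,) attack index per sample
    z = F.normalize(z, dim=1)                      # cosine similarity via dot products (assumption)
    sim = torch.matmul(z, z.T) / tau               # (2n, 2n) temperature-scaled similarities
    n2 = z.size(0)
    self_mask = torch.eye(n2, dtype=torch.bool, device=z.device)
    # positives P(i): samples generated by the same attack, excluding the anchor itself
    pos_mask = (attack_ids.unsqueeze(0) == attack_ids.unsqueeze(1)) & ~self_mask
    # softmax denominator runs over A(i), i.e., all samples except the anchor
    sim = sim.masked_fill(self_mask, float('-inf'))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # average over positives P(i) (Eq. (2)), then sum over anchors i (Eq. (1))
    loss_i = -(log_prob * pos_mask).sum(1) / pos_mask.sum(1).clamp(min=1)
    return loss_i.sum()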

3.3. Adversarial Domain Clustering

After the acquisition of ADs, we obtain $\mathbb{H}^{m}=Enc(\mathbb{D}^{m})$ as the AD of the corresponding adversarial attack $\psi^{m}$. For $\mathcal{D}$, we construct a set of ADs, $\mathcal{H}=\{\mathbb{H}^{m}\}_{m=1}^{M}$. Since different ADs tend to distribute differently in the feature space and many of them overlap with each other to various extents, it is quite difficult to directly select the most representative ADs from scratch. Therefore, it is vital to explore the similarities among different ADs.

To address this issue, we intend to perform the selection via two steps, i.e., clustering to assess the similarities of ADs and selecting the most representative ADs from each cluster. Then, we construct a viable solution named Adversarial Domain Clustering, as shown in Fig. 2(a). This strategy groups ADs into different clusters, by ensuring that the similarity among ADs within the same cluster is maximized, while the similarity among ADs across different clusters is minimized. AD clustering avoids the redundancy and additional costs by preventing the repeated selection of similar attack methods, and it facilitates the selection of the most representative ADs in the subsequent steps.

Since the samples to be clustered here are collections of features, rather than individual data points, traditional clustering methods such as K-Means (kmeans, ) cannot be directly applied. Since spectral clustering (NgJW01, ) only requires the similarity matrix among samples, it is utilized to construct the clustering step in our AD clustering.

For the estimation of the similarity matrix $W\in R^{M\times M}$ in spectral clustering, which represents the similarities between different ADs, we propose the Adversarial Domain Similarity Measurement (ADSM) based on the Jensen-Shannon divergence (JSD) (ErvenH14, ), which quantifies the similarity between two probability distributions. To compute the similarities, we transform each $\mathbb{H}^{i}$ in $\mathcal{H}$ to $\overline{\mathbb{H}^{i}}$ by converting $\mathbb{H}^{i}$ into probabilities and performing normalization. Since a smaller JSD value between two ADs implies a smaller discrepancy between their probability distributions, $\text{ADSM}(\mathbb{H}^{i},\mathbb{H}^{j})$ can be computed via

(3) $\text{ADSM}(\mathbb{H}^{i},\mathbb{H}^{j})=\frac{1}{\text{JSD}(\overline{\mathbb{H}^{i}},\overline{\mathbb{H}^{j}})}.$

By letting the element $W_{i,j}$ at the $i$-th row and $j$-th column refer to the similarity between $\mathbb{H}^{i}$ and $\mathbb{H}^{j}$, $W_{i,j}$ can be calculated by

(4) $W_{i,j}=\left\{\begin{array}{cc}\text{ADSM}(\mathbb{H}^{i},\mathbb{H}^{j})&i\neq j\\ 0&i=j\end{array}\right..$
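A sketch of the ADSM similarity matrix in Eqs. (3)-(4) is given below. The paper does not specify how the features are converted into probabilities; averaging a per-sample softmax into one distribution per AD is an assumption made here for illustration, and the helper names are hypothetical.

import numpy as np
from scipy.stats import entropy  # entropy(p, q) computes KL(p || q)

def to_prob(H):
    # H: (n_samples, d) features of one AD; softmax per sample, then average (assumption)
    e = np.exp(H - H.max(axis=1, keepdims=True))
    p = e / e.sum(axis=1, keepdims=True)
    return p.mean(axis=0)

def jsd(p, q):
    # Jensen-Shannon divergence between two distributions
    m = 0.5 * (p + q)
    return 0.5 * entropy(p, m) + 0.5 * entropy(q, m)

def adsm_matrix(H_list, eps=1e-12):
    # H_list: list of M feature arrays, one per adversarial domain
    probs = [to_prob(H) for H in H_list]
    M = len(H_list)
    W = np.zeros((M, M))
    for i in range(M):
        for j in range(M):
            if i != j:
                W[i, j] = 1.0 / (jsd(probs[i], probs[j]) + eps)  # Eq. (3); Eq. (4) keeps the diagonal at 0
    return W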

Since spectral clustering cannot automatically determine the optimal number of clusters, the Calinski-Harabasz score (CH score) (mann2023proposed, ), which requires no knowledge of the cluster shape, is utilized to evaluate the clustering performance and estimate the optimal number of clusters. Note that it measures both the within-cluster and between-cluster distances, thereby offering a more comprehensive view of the clustering performance. The CH score is calculated by

(5) $\text{CH}(K)=\frac{Tr(BC_{K})/(K-1)}{Tr(WC_{K})/(N-K)},$

where $K$ and $N$ are the numbers of clusters and data points, respectively, $BC_{K}$ denotes the between-cluster covariance matrix, $WC_{K}$ denotes the within-cluster covariance matrix, and $Tr(\cdot)$ denotes the trace of a matrix.

A higher value of $\text{CH}(K)$ indicates better clustering. We posit that the inherent structure of the entire feature space composed of different ADs is highly complex. Since the CH score often attains its highest value when the number of clusters is $K=2$, we start our search from $K=3$ in this paper. With the help of the CH score, our AD clustering can automatically group the ADs into the optimal number of clusters.
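A sketch of this clustering step with scikit-learn follows. It assumes the precomputed similarity matrix W from Eq. (4) and one summary vector per AD (e.g., its probability representation from the previous sketch) for evaluating the CH score; both representation choices are assumptions, and the search is capped at K = M - 1 because the CH score is undefined when every AD forms its own cluster.

import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import calinski_harabasz_score

def cluster_ads(W, ad_vectors, k_min=3):
    # W: (M, M) precomputed similarity matrix; ad_vectors: (M, d) one summary vector per AD
    M = W.shape[0]
    best_k, best_score, best_labels = None, -np.inf, None
    for k in range(k_min, M):                      # K = 3 .. M-1, following Algorithm 1
        labels = SpectralClustering(n_clusters=k, affinity='precomputed',
                                    random_state=0).fit_predict(W)
        score = calinski_harabasz_score(ad_vectors, labels)  # CH score, Eq. (5)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels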

3.4. Principal Adversarial Domains Selection

After similar ADs are clustered, the selection step can be performed. To form the most effective Principal Adversarial Domains (PADs), it is vital to select appropriate ADs from different clusters. Thus, we propose the Coverage of Entire Feature Space (CEFS) metric to guide the PADs selection process.

CEFS is a ratio-based metric which contains two aspects, Intra-Domain Dispersion (IDD) and Discrepancy between Adversarial Domains (DAD). IDD represents the dispersion among the features within each AD, where a higher value indicates a larger coverage of the feature space. For any AD, $\mathbb{H}^{i}=\{h_{1}^{i},h_{2}^{i},\ldots,h_{v}^{i}\}\in R^{v\times d}$, where $v$ and $d$ denote the number and dimension of the features in $\mathbb{H}^{i}$, respectively, IDD can be computed by

(6) $\text{IDD}(x)=\frac{1}{v}\sum_{i=1}^{v}\left\|\frac{x_{i}}{\|x_{i}\|_{2}}-\frac{1}{v}\sum_{j=1}^{v}\frac{x_{j}}{\|x_{j}\|_{2}}\right\|_{2}.$

DAD represents the discrepancy between two selected ADs, which can be quantified using a distance metric $Dist(\cdot,\cdot)$, such as the Kullback-Leibler divergence (kl-divergence, ) or Maximum Mean Discrepancy (MMD) (mmd, ). A lower DAD value indicates a greater similarity between the two selected ADs. DAD can be calculated as,

(7) $\text{DAD}(\mathbb{H}^{i},\mathbb{H}^{j})=Dist(\overline{\mathbb{H}^{i}},\overline{\mathbb{H}^{j}}).$

Then, CEFS can be obtained via

(8) $\text{CEFS}=\frac{\sum_{i=1}^{K}\text{IDD}(\mathbb{H}^{i})}{\text{DAD}((\mathbb{H}^{1}\parallel\cdots\parallel\mathbb{H}^{K}),(\mathbb{H}^{1}\parallel\cdots\parallel\mathbb{H}^{M}))},$

where $K$ and $M$ are the number of ADs in the PADs and the number of ADs in the AD set $\mathcal{H}$, respectively, and $\parallel$ denotes the concatenation operation.

CEFS is a ratio-based metric which quantifies the coverage of the selected ADs within the entire feature space (EFS). In Eq. (8), the numerator, the sum of the Intra-Domain Dispersions, indicates the feature dispersion within each selected AD. The denominator measures the discrepancy between the selected ADs and the EFS. A larger CEFS corresponds to a larger numerator, indicating a larger feature space covered by each AD, and a smaller denominator, indicating a greater similarity between the selected ADs and the EFS. Consequently, the selected ADs cover a larger portion of the EFS, increasing the likelihood of capturing unseen ADs. We utilize CEFS to select the PADs, which give a larger coverage of the feature space with the same number of ADs. The PADs thus enhance the probability of capturing the location of unseen ADs, thereby improving the generalization performance.
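A sketch of Eqs. (6)-(8) is given below. The distance function for DAD (e.g., MMD or KL divergence on the chosen probability representation, per Eq. (7)) is passed in rather than fixed, and the helper names are illustrative.

import numpy as np

def idd(H):
    # Eq. (6): mean L2 distance of L2-normalized features to their normalized centroid
    Hn = H / np.linalg.norm(H, axis=1, keepdims=True)
    centroid = Hn.mean(axis=0)
    return np.mean(np.linalg.norm(Hn - centroid, axis=1))

def cefs(selected, all_ads, dist_fn):
    # selected: list of the K chosen AD feature arrays; all_ads: list of all M ADs
    # dist_fn: a distance such as MMD or KL on the chosen representation (Eq. (7))
    numerator = sum(idd(H) for H in selected)                       # sum of IDDs
    denominator = dist_fn(np.concatenate(selected, axis=0),         # selected ADs concatenated
                          np.concatenate(all_ads, axis=0))          # entire feature space
    return numerator / denominator                                  # Eq. (8)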

3.5. Training process of Principal Adversarial Domains Identification

The process of Principal Adversarial Domains Identification (PADI) consists of Adversarial Domain Acquisition (AD Acquisition), Adversarial Domain Clustering (AD Clustering) and Principal Adversarial Domains Selection (PADs Selection). The training process of Principal Adversarial Domains Identification is described in Algorithm 1.

Algorithm 1 Training process of Principal Adversarial Domains Identification
1: Input: $\mathcal{D}_{acq}$, $\mathcal{D}_{clu}$.
2: Output: Principal Adversarial Domains (PADs).
3: ▷ Adversarial Domain Acquisition
4: for $x_{i}^{m_{i}}$ in $\mathcal{D}_{acq}$ do
5:     Augment $x_{i}^{m_{i}}$ to $(\tilde{x}_{2i-1},\tilde{x}_{2i})$ with the augmentation $Norm$
6:     $z_{i}=Proj(Enc(\tilde{x}_{i}))$
7:     Obtain $Enc$ and $Proj$ via minimizing $\mathcal{L}^{sup}$ in Eq. (1)
8: end for
9: ▷ Adversarial Domain Clustering
10: Generate the AD pool $\mathcal{H}_{clu}$ based on $\mathcal{D}_{clu}$
11: Generate the similarity matrix $W$ by Eq. (3)
12: for $k$ in $[3,M]$ do
13:     $r_{k}=SC(W,k)$ ▷ $r_{k}$ is the clustering result obtained via spectral clustering.
14:     Compute $\text{CH}(k)$ by Eq. (5)
15: end for
16: $K=\operatorname{arg\,max}_{k}(\text{CH}(k))$ ▷ $K$ is the best number of clusters.
17: $r_{K}=SC(W,K)=\{clu_{i}\}_{i=1}^{K}$ ▷ $r_{K}$ is the best clustering result, which contains $K$ clusters.
18: ▷ Principal Adversarial Domains Selection
19: $C=\Pi_{i=1}^{K}n_{i}$ ▷ $n_{i}$ is the number of adversarial attacks in $clu_{i}$, and $\Sigma_{i=1}^{K}n_{i}=M$.
20: Generate the combination list $L=\{l_{i}\}_{i=1}^{C}$ ▷ $L$ is the list of all cross-cluster combinations in $r_{K}$; each $l_{i}$ contains $K$ types of attacks.
21: for $l_{i}$ in $L$ do
22:     Generate $\mathcal{H}_{i}=\{\mathbb{H}_{i}^{j}\}_{j=1}^{K}$ based on $l_{i}$ ▷ $\mathcal{H}_{i}$ is the list of ADs based on $l_{i}$
23:     Compute CEFS($\mathcal{H}_{i}$) by Eq. (8)
24: end for
25: $\text{PADs}=\operatorname{arg\,max}_{l_{i}}(\text{CEFS}(\mathcal{H}_{i}))$

We divide the original dataset $\mathbb{D}$ into two non-overlapping datasets $\mathbb{D}_{acq}$ and $\mathbb{D}_{clu}$, which are used for AD Acquisition and AD Clustering, respectively. The adversarial example set for AD Acquisition is denoted as $\mathcal{D}_{acq}=\{\mathbb{D}_{acq}^{m}\}_{m=1}^{M}$, $\mathbb{D}_{acq}^{m}=\{(x_{i}^{m},y_{i}^{m})\}_{i=1}^{N_{m}}$, where $M$ is the number of adversarial attacks, $\mathbb{D}_{acq}^{m}$ is the adversarial example set generated by the $m$-th adversarial attack, and $N_{m}$ is the cardinality of $\mathbb{D}_{acq}^{m}$.

Likewise, the adversarial example set for AD Clustering is denoted as $\mathcal{D}_{clu}=\{\mathbb{D}_{clu}^{q}\}_{q=1}^{M}$, $\mathbb{D}_{clu}^{q}=\{(x_{i}^{q},y_{i}^{q})\}_{i=1}^{N_{q}}$, where $M$ is the number of adversarial attacks, $\mathbb{D}_{clu}^{q}$ is the adversarial example set generated by the $q$-th adversarial attack, and $N_{q}$ is the cardinality of $\mathbb{D}_{clu}^{q}$. The rationale behind this strategy is to prevent the model trained via contrastive learning from overfitting to the data used for AD Acquisition, which would lead to subpar performance in AD Clustering and PADs Selection.

3.6. Principal Adversarial Domain Adaptation

To transfer the learned knowledge from PADs to the target domain, i.e., the adversarial examples generated from unseen methods, we propose Principal Adversarial Domain Adaptation (PADA) to detect adversarial examples, as depicted in Fig. 2(b).

Ideally, the inputs of PADA comprise source data and unseen target data. The source data, denoted as $Src=\{Src_{i}\}_{i=1}^{K}$, consist of an equal number of benign and adversarial examples, where the adversarial examples contain the $K$ types of adversarial examples determined by the PADs. Since the unseen target data are unavailable, we can only use the training data as a proxy. The proxy data, denoted as $PD$, also contain an equal number of benign and adversarial examples, drawn from the training benign set $\mathbb{D}$ and the training adversarial set $\mathcal{D}$, respectively. $\mathcal{D}$ contains $M$ types of adversarial examples. During the PADI stage, we select the most representative ADs, i.e., $K$ types out of these $M$ types, to form the PADs. Consequently, employing $\mathcal{D}$ as a proxy for the unseen target data serves two key purposes. Firstly, it prevents overlap between the source data used for training and the unseen target data used for testing. Secondly, the PADA process compels the transfer of the specific knowledge acquired from the PADs to the more extensive set $\mathcal{D}$. This transfer aims to boost the generalization capability of the PADA networks, with the goal of enhancing their performance on the unseen target data.

PADA consists of three sequential components, feature extraction, feature alignment and classification. The framework of PADA is compatible with various widely used Multi-source Unsupervised Domain Adaptation (MUDA) methods (moment-mda, ; DARN, ; mdmn, ; aaai-mda, ). The experimental results indicate that our PADA possesses excellent generalization capabilities based on various existing MUDA methods. Due to the simplicity and effectiveness of MFSAN (aaai-mda, ), along with its superior detection performance compared to other MUDA methods, we select MFSAN as the basic MUDA method for our PADA.

Refer to caption
Figure 3. The three kernels of PEF employed to capture the perturbation signals hidden in the adversarial inputs.

In the feature extraction component, unfortunately, existing MUDA methods, including MFSAN, only extract spatial features. Previous studies (GeirhosRMBWB19, ; WangWHX20, ; FanLCZG21, ) indicate that the high-frequency components of an image play a crucial role in the predictions of deep neural networks. Adversarial perturbations are more likely to be concealed in the high-frequency information of images. To capture more comprehensive features of adversarial examples, we propose an adversarial feature enhancement (AFE) module as the feature extraction component. AFE contains both a spatial feature extraction branch $\mathcal{S}$ and a frequency feature extraction branch $\mathcal{F}$. For frequency feature extraction, we design perturbation extraction filters (PEF), a plug-and-play operation based on the Spatial Rich Model (SRM) (SRM, ), to capture the subtle perturbation signals hidden in the adversarial examples.

SRM typically uses 30 basic kernels to capture textures and discontinuities of images, and has been employed in image forensics to detect subtle and irregular manipulation or hidden information. Subsequent research (SRM_detection, ) indicates that in image manipulation detection, employing only three kernels can achieve considerable detection performance, and using more basic kernels does not further enhance it. Essentially, these kernels are high-pass filters, which enhance the high-frequency signals and remove the low-frequency components of the inputs. Adversarial perturbations are typically hidden within the high-frequency information of an image. Therefore, our PEF uses the same three kernels to extract the subtle perturbation signals. As shown in Fig. 3, PEF consists of three filters implemented by convolution kernels with fixed parameters. We set the kernel size of PEF to $5\times 5\times 3$, and the number of output channels of PEF to 3. Both $\mathcal{S}$ and $\mathcal{F}$ employ $Enc$ from Sec. 3.2, aligned with the threat models, specifically ResNet-18 or VGG-16. Then, the enhanced adversarial features, denoted as $r(x)=[\mathcal{S}(x),\mathcal{F}(x)]$, are fed into the next module.
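Below is a sketch of how PEF can be realized as a convolution with fixed, non-trainable $5\times 5\times 3$ kernels. Only the standard KV SRM kernel is written out, and repeating it across the three output channels is purely illustrative; the actual PEF uses the three distinct SRM kernels shown in Fig. 3.

import torch
import torch.nn as nn

class PEF(nn.Module):
    # Perturbation extraction filters: fixed high-pass kernels applied to the RGB input.
    # The KV kernel below is the common 5x5 SRM filter; the other two SRM kernels
    # (second- and first-order) would be defined analogously (values omitted here).
    def __init__(self):
        super().__init__()
        kv = torch.tensor([[-1.,  2.,  -2.,  2., -1.],
                           [ 2., -6.,   8., -6.,  2.],
                           [-2.,  8., -12.,  8., -2.],
                           [ 2., -6.,   8., -6.,  2.],
                           [-1.,  2.,  -2.,  2., -1.]]) / 12.0
        # (out_channels=3, in_channels=3, 5, 5); repeating the same kernel is for illustration only
        weight = kv.expand(3, 3, 5, 5).clone()
        self.conv = nn.Conv2d(3, 3, kernel_size=5, padding=2, bias=False)
        self.conv.weight = nn.Parameter(weight, requires_grad=False)  # fixed, non-trainable parameters

    def forward(self, x):
        # x: (B, 3, H, W) image batch; output: high-pass residual of the same spatial size
        return self.conv(x)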

In the alignment component, we assign a specific network $Q_{i}$ to map each source domain $Src_{i}$ and the proxy domain $PD$ into a domain-specific feature space, and utilize MMD as the distance metric to align them. $L_{d}$ is used to align the features between the source domains and the proxy domain,

(9) $L_{d}=\frac{1}{K}\sum_{i=1}^{K}\text{MMD}(Q_{i}(r(Src_{i})),Q_{i}(r(PD))),$

where $r$ denotes the enhanced adversarial features after AFE, and $K$ is the number of source domains. Each source domain is associated with its corresponding network $Q_{i}$.

In the classification component, we utilize the cross-entropy loss $L_{cls}$ to ensure correct classification. Since the domain-specific classifiers $C_{i}$ are trained on their respective source domains, significant discrepancies arise among their predictions for the same proxy domain. To address this, $L_{disc}$ is used to minimize the differences among the predictions of the various classifiers,

(10) $L_{disc}=\frac{2}{K\times(K-1)}\sum_{j=1}^{K-1}\sum_{i=j+1}^{K}\mathbb{E}_{x\sim PD}|C_{i}(Q_{i}(r(x)))-C_{j}(Q_{j}(r(x)))|.$

Overall, the total loss is formulated as follows, where $\lambda$ and $\gamma$ are hyperparameters used to adjust the weights of $L_{d}$ and $L_{disc}$, respectively.

(11) $L_{total}=L_{cls}+\lambda L_{d}+\gamma L_{disc}.$
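A sketch of the PADA objective in Eqs. (9)-(11) is given below. It assumes the domain-specific mappers Q_i and classifiers C_i are PyTorch modules and an external MMD implementation is available; averaging L_cls over the K sources and computing L_disc on softmax outputs follow MFSAN-style implementations and are assumptions rather than the exact recipe.

import torch
import torch.nn.functional as F

def pada_losses(Q, C, r_src, y_src, r_pd, mmd, lam=1.0, gamma=1.0):
    # Q, C: lists of K domain-specific feature mappers and classifiers
    # r_src[i]: enhanced AFE features of the i-th source domain; y_src[i]: its benign/adversarial labels
    # r_pd: enhanced AFE features of the proxy domain; mmd: an MMD implementation (assumed given)
    K = len(Q)

    # classification loss on the labeled source domains (averaging over sources is an assumption)
    L_cls = sum(F.cross_entropy(C[i](Q[i](r_src[i])), y_src[i]) for i in range(K)) / K

    # Eq. (9): align each source with the proxy domain in its own feature space
    L_d = sum(mmd(Q[i](r_src[i]), Q[i](r_pd)) for i in range(K)) / K

    # Eq. (10): pairwise L1 discrepancy among classifier predictions on the proxy domain
    probs = [F.softmax(C[i](Q[i](r_pd)), dim=1) for i in range(K)]
    L_disc = 0.0
    for j in range(K - 1):
        for i in range(j + 1, K):
            L_disc = L_disc + (probs[i] - probs[j]).abs().mean()
    L_disc = 2.0 / (K * (K - 1)) * L_disc

    # Eq. (11): total loss
    return L_cls + lam * L_d + gamma * L_disc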

4. Experiments

4.1. Experimental Setups

4.1.1. DNN backbones

We evaluate the performance of the proposed method, by employing two widely used DNN architectures, i.e., ResNet-18  (HeZRS16, ) and VGG-16 (SimonyanZ14a, ), according to (SID, ).

4.1.2. Datasets

We evaluate the performance of the proposed method on three popular datasets, CIFAR-10 (cifar10, ), SVHN (SVHN, ) and ImageNet (imagenet, ). All the images in ImageNet are resized to $224\times 224\times 3$ via pre-processing.

As shown in Table 1, each of the three datasets is divided into three category-balanced and non-overlapping subsets: Train-acq-src, Train-clu-pro and Test. The training data for AD Acquisition in the PADI stage and the source data in the PADA stage are selected from Train-acq-src. Similarly, the training data for AD Clustering in the PADI stage and the proxy data in the PADA stage are selected from Train-clu-pro.

For CIFAR-10, we divide the official CIFAR-10 training set into two halves, each of which contains 25,000 images, to form Train-acq-src and Train-clu-pro. Besides, we form Test with all the 10,000 images in the official CIFAR-10 testing set. For SVHN, we set the sizes of Train-acq-src and Train-clu-pro to 20,000 instead of 25,000 to construct category-balanced training subsets, due to the category imbalance in the official SVHN training set. Test of SVHN also consists of 10,000 images, which are randomly selected from the official SVHN testing set. For ImageNet, we divide the official ImageNet (ILSVRC2012) validation set, which consists of 50,000 images, into two parts, 40,000 images for training and 10,000 images for Test. The 40,000 training images are then equally divided into two subsets, Train-acq-src and Train-clu-pro, each of which contains 20,000 images.

Ten types of training adversarial examples are created based on the benign examples from the Train-acq-src and Train-clu-pro subsets. Subsequently, another seven attacks are applied to generate the testing adversarial examples on the Test set. These specific adversarial attack methods are detailed in Sec. 4.1.4. Note that the attack methods for training and testing are entirely different, ensuring that the training data and testing data are mutually exclusive.

4.1.3. Data splitting strategy for the training stage

Table 2 presents the data splitting strategy for the training stage of our AED-PADA. AED-PADA contains two stages during training. In the PADI stage, to improve efficiency, we randomly select 10,000 images from each type of adversarial attack and their corresponding benign images from Train-acq-src for supervised contrastive learning. To alleviate the overfitting problem, we select 10,000 images per attack from Train-clu-pro for AD Clustering and PADs Selection.

In the PADA stage, the inputs of PADA are the images from the source domains and the unseen target domains. Each source domain contains 10,000 samples, with an equal split of 5,000 benign examples and 5,000 adversarial examples, which are all randomly selected from Train-acq-src. Since the data from unseen domains is unavailable, we can only utilize the training examples as a proxy, which are generated from all the 10 types of training attacks. Specifically, the proxy domain contains 10,000 samples, i.e., 5,000 benign examples and 5,000 adversarial examples, which are all randomly selected from the Train-clu-pro. Note that the adversarial examples in the proxy domain are obtained via 10 different training attacks, with each attack providing 500 examples. This specific arrangement has two benefits. Firstly, it ensures zero overlap between the source and proxy domains during training, to avoid data leakage. Secondly, it enhances the diversity of the training data, to further benefit the generalization of the detection model.

Table 1. Datasets for training and testing.
Dataset Train-acq-src Train-clu-pro Test
benign adv benign adv benign adv
CIFAR-10 25,000 10×25,000 25,000 10×25,000 10,000 7×10,000
SVHN 20,000 10×20,000 20,000 10×20,000 10,000 7×10,000
ImageNet 20,000 10×20,000 20,000 10×20,000 10,000 7×10,000
Table 2. Data splitting for the training stage of our AED-PADA.
Training stage benign examples adversarial examples
PADI AD Acquisition - 10,000/attack
AD Clustering - 10,000/attack
PADA source domain 5,000/source 5,000/source
proxy domain 5,000 5,000

4.1.4. Baseline adversarial attack methods

To evaluate the generalization capabilities of the detection methods, it is important to consider a diverse set of attack methods, including both earlier and recent techniques. Here, 10 earlier attack methods are utilized to generate adversarial examples for training, including FGSM (FGSM, ), BIM (BIM, ), C&W (CW, ), DeepFool (DeepFool, ), PGD (pgd, ), MI-FGSM (MIM, ), DIM (DIM, ), ILA (ILA, ), YA-ILA (ILA++, ) and SI-NI-FGSM (si-ni-fgsm, ). Seven SOTA attack methods are employed to generate adversarial examples for testing, including APGD (APGD, ), ILA-DA (ILA-DA, ), Jitter (jitter, ), SSA (SSA, ), TI-FGSM (TIM, ), VMI-FGSM (vmifgsm, ) and VNI-FGSM (vmifgsm, ). The adversarial attacks for training and testing are entirely distinct, so the adversarial detection results on the unseen testing attacks indicate the generalization performance of our proposed detection method.

4.1.5. Baseline adversarial detection methods

We compare the generalizability of our AED-PADA with five state-of-the-art adversarial detection methods, LID (lid, ), MD (md, ), Steg (steg, ), SID (SID, ) and SA (txt_advdetection, ). These detection methods differ from ours as they employ only a single attack method for training. To ensure a fair comparison with our AED-PADA, it is necessary to also consider the scenario of training with multiple attacks. Consequently, we consider the following two configurations for the SOTA detection methods: (1) For single-attack training, we utilize all data from Train-acq-src to ensure sufficient data volume for effective training. (2) For multiple-attack training, the methods are trained on the adversarial examples in the PADs, which are consistent with our AED-PADA. Each source domain contains 10,000 samples from Train-acq-src, evenly divided into 5,000 benign and 5,000 adversarial examples.

Table 3. Comparison of the generalization performances on CIFAR-10 between the state-of-the-art adversarial detection methods and our AED-PADA. Averaged Accuracy is the average result across seven unseen state-of-the-art testing adversarial attacks. The bolded and the underlined values represent the best and the second best results for each column, respectively.
Dataset (Backbone) Detector Accuracy on Unseen SOTA Adversarial Attacks (%) Averaged Accuracy (%)
APGD ILA-DA Jitter SSA TI-FGSM VMI-FGSM VNI-FGSM
CIFAR-10 (ResNet-18) LID (lid, ) 90.044 94.121 78.240 59.839 90.863 91.056 86.496 84.380
LID-PADs 78.903 86.068 70.045 57.330 82.278 82.333 79.978 76.705
MD (md, ) 65.976 96.657 61.764 54.225 69.282 69.653 69.726 69.612
MD-PADs 63.371 97.970 62.058 51.501 68.100 68.814 69.351 68.738
Steg (steg, ) 83.521 94.692 86.759 58.064 90.496 90.814 92.057 85.200
Steg-PADs 84.085 94.155 88.885 62.305 90.715 90.860 91.690 86.099
SID (SID, ) 86.113 60.999 74.629 53.732 87.323 87.330 82.100 76.032
SID-PADs 86.520 61.820 74.860 54.010 88.285 88.220 82.935 76.664
SA (txt_advdetection, ) 84.907 90.731 87.365 81.323 89.275 90.013 91.967 87.940
SA-PADs 89.125 93.485 91.925 88.885 94.760 94.875 95.595 92.664
AED-PADA 90.545 97.855 97.065 84.255 97.700 97.765 97.675 94.694
CIFAR-10 (VGG-16) LID (lid, ) 85.024 93.357 75.092 58.639 88.201 88.173 81.540 81.432
LID-PADs 70.493 89.403 65.190 58.588 75.553 75.408 73.500 72.590
MD (md, ) 51.322 72.013 50.904 52.012 51.809 51.799 50.963 54.403
MD-PADs 50.033 81.779 50.028 51.349 50.005 50.856 51.349 55.057
Steg (steg, ) 79.280 92.756 88.466 57.470 88.854 88.891 89.390 83.586
Steg-PADs 79.900 92.265 88.705 59.360 89.015 88.800 89.235 83.897
SID (SID, ) 83.235 66.047 73.260 53.373 84.743 84.582 76.210 74.493
SID-PADs 83.105 66.240 73.230 53.375 84.605 84.385 76.145 74.441
SA (txt_advdetection, ) 80.198 85.036 81.003 63.984 85.356 86.293 87.627 81.357
SA-PADs 79.720 83.825 77.985 63.335 83.765 84.475 85.755 79.873
AED-PADA 84.370 94.365 93.260 65.765 93.135 93.410 93.545 88.264
Table 4. Comparison of the generalization performances on SVHN between the state-of-the-art adversarial detection methods and our AED-PADA. Averaged Accuracy is the average result across seven unseen state-of-the-art testing adversarial attacks. The bolded and the underlined values represent the best and the second best results for each column, respectively.
Dataset (Backbone) Detector Accuracy on Unseen SOTA Adversarial Attacks (%) Averaged Accuracy (%)
APGD ILA-DA Jitter SSA TI-FGSM VMI-FGSM VNI-FGSM
SVHN (ResNet-18) LID (lid, ) 66.431 87.940 63.697 60.324 71.403 71.384 64.952 69.447
LID-PADs 61.849 86.846 60.551 58.515 69.581 69.639 64.182 67.309
MD (md, ) 61.167 66.439 59.906 58.270 67.625 67.609 60.870 63.127
MD-PADs 60.768 50.190 59.637 57.967 67.257 67.337 60.688 60.549
Steg (steg, ) 71.591 51.509 94.268 59.865 96.943 96.874 95.954 81.000
Steg-PADs 73.460 50.860 97.115 74.360 98.005 98.040 97.575 84.202
SID (SID, ) 68.859 92.207 63.763 56.571 74.074 73.995 64.150 70.517
SID-PADs 69.670 93.420 64.155 56.800 75.285 75.120 64.780 71.319
SA (txt_advdetection, ) 68.092 73.066 65.288 69.276 82.758 82.823 82.467 74.824
SA-PADs 69.490 73.100 63.610 66.550 78.155 78.220 77.785 72.416
AED-PADA 74.455 99.730 99.730 87.305 99.730 99.730 99.730 94.344
SVHN (VGG-16) LID (lid, ) 65.645 96.605 62.731 64.252 74.205 74.263 70.296 72.571
LID-PADs 69.710 97.154 63.626 63.955 76.258 76.306 70.119 73.875
MD (md, ) 51.511 58.333 51.233 51.859 51.870 52.268 51.465 52.648
MD-PADs 51.920 56.484 51.578 52.745 51.103 50.285 51.748 52.266
Steg (steg, ) 70.586 98.661 93.882 58.496 95.774 95.575 94.549 86.789
Steg-PADs 72.925 98.975 97.700 58.905 98.485 98.480 98.085 89.079
SID (SID, ) 68.159 69.813 61.320 60.722 74.468 74.454 67.213 68.021
SID-PADs 68.365 62.450 61.145 56.790 70.075 70.080 63.470 64.625
SA (txt_advdetection, ) 65.862 72.558 63.304 67.745 72.087 72.183 73.811 69.650
SA-PADs 65.060 69.765 59.365 62.605 66.795 66.955 67.385 65.419
AED-PADA 73.500 99.220 98.995 68.990 99.295 99.310 99.220 91.219
Table 5. Comparison of the generalization performances on ImageNet between the state-of-the-art adversarial detection methods and our AED-PADA. Averaged Accuracy is the average result across seven unseen state-of-the-art testing adversarial attacks. The bolded and the underlined values represent the best and the second best results for each column, respectively.
Dataset (Backbone) Detector Accuracy on Unseen SOTA Adversarial Attacks (%) Averaged Accuracy (%)
APGD ILA-DA Jitter SSA TI-FGSM VMI-FGSM VNI-FGSM
ImageNet (ResNet-18) LID (lid, ) 52.868 54.078 50.991 53.263 53.793 56.566 56.549 54.015
LID-PADs 57.340 63.910 55.195 58.150 58.890 60.900 61.245 59.376
MD (md, ) 53.760 52.031 51.728 51.603 53.533 59.438 58.536 54.376
MD-PADs 54.560 50.425 50.685 52.765 54.695 62.165 61.040 55.191
Steg (steg, ) 78.415 86.954 86.553 85.431 65.101 94.679 94.606 84.534
Steg-PADs 90.725 94.735 94.700 94.535 89.660 94.875 94.875 93.444
SID (SID, ) 51.885 56.475 51.745 51.425 55.285 55.335 54.995 53.878
SID-PADs 55.070 56.975 53.785 52.335 59.790 59.860 58.400 56.602
SA (txt_advdetection, ) 85.738 87.149 85.912 85.712 86.960 85.988 86.014 86.210
SA-PADs 66.955 69.230 67.665 66.930 67.385 68.490 68.550 67.886
AED-PADA 98.830 99.935 99.965 99.965 99.070 99.965 99.960 99.670
ImageNet (VGG-16) LID (lid, ) 53.144 53.672 52.029 51.694 52.056 57.081 56.173 53.693
LID-PADs 56.295 59.330 55.110 54.825 55.695 59.960 59.310 57.218
MD (md, ) 53.760 52.031 51.728 51.603 53.533 59.438 58.536 54.376
MD-PADs 53.470 50.705 51.960 50.970 52.430 57.385 56.585 53.358
Steg (steg, ) 78.931 90.915 85.852 84.279 66.300 94.695 94.969 85.134
Steg-PADs 90.585 94.185 94.115 94.165 90.375 94.530 94.535 93.213
SID (SID, ) 41.718 50.624 50.477 44.102 51.015 51.567 51.528 48.719
SID-PADs 43.238 46.165 53.794 50.002 49.125 53.257 46.382 48.852
SA (txt_advdetection, ) 87.747 93.025 89.939 88.176 86.256 92.133 92.255 89.933
SA-PADs 91.590 94.910 93.415 92.310 91.280 94.250 94.280 93.148
AED-PADA 98.960 99.930 99.935 99.890 98.225 99.935 99.930 99.544

4.1.6. Implementation details

In this paper, adversarial examples are generated by untargeted white-box attacks under the $l_{\infty}$ norm constraint. The perturbation of the training adversarial examples is constrained to a challenging scenario, where the maximum magnitude of the adversarial perturbation is set to 2. The step size and the number of iterations of the adversarial attacks are set to 1/255 and 10, respectively. During the PADA stage, we utilize MFSAN (aaai-mda, ) as the basic MUDA method, employ the same training strategy as MFSAN, and set the trade-off parameters $\lambda=\gamma=1.0$, which respectively control the importance of $L_{d}$ and $L_{disc}$. All the adversarial example detection methods are trained consistently for 100 epochs. To evaluate the performance of our proposed method, the widely used Accuracy is employed as the metric for the adversarial example detection task. The experiments are conducted on an NVIDIA GeForce RTX 3080Ti GPU.
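As a hedged sketch of this attack configuration, the snippet below assumes the torchattacks library is used for attack generation and that the maximum magnitude of 2 is expressed on the 0-255 pixel scale; model, images and labels are placeholders for the threat model and its inputs.

import torchattacks  # assumption: torchattacks is used to generate the attacks

# PGD under the L_inf constraint with the training configuration described above:
# maximum perturbation 2 (i.e., 2/255 on [0,1]-normalized inputs), step size 1/255, 10 iterations.
attack = torchattacks.PGD(model, eps=2/255, alpha=1/255, steps=10)
adv_images = attack(images, labels)  # untargeted white-box adversarial examples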

4.2. Performance Evaluations

Here, we compare our method with 5 SOTA detection methods, including LID (lid, ), MD (md, ), Steg (steg, ), SID (SID, ) and SA (txt_advdetection, ).

4.2.1. Generalization performances against unseen adversarial attacks

Tables 3, 4, and 5 present the cross-attack detection results across 7 unseen testing adversarial attacks on CIFAR-10, SVHN, and ImageNet, respectively. Existing adversarial detection methods, i.e., those whose names do not carry the '-PADs' suffix, are trained on a single attack with their original settings. Each of their results is the detection accuracy averaged over the 10 training attacks, utilized one after another in the training process. On the contrary, the methods with the '-PADs' suffix are trained on multiple attacks so that they can be fairly compared to our AED-PADA.

These results provide compelling evidence that our approach clearly outperforms these state-of-the-art adversarial detection methods in terms of generalization ability. Note that the performance of our approach (99.544%) surpasses that of SID (48.719%) by a significant margin, i.e., 50.825%, under the settings of ImageNet and VGG-16. Besides, our superiority is particularly evident in challenging scenarios, where the maximum magnitude of the adversarial perturbation is merely set to 2. In previous studies, by contrast, the maximum magnitude is typically set to 4 or 8. The reduction in the magnitude of the perturbations significantly increases the difficulty of detecting adversarial examples and reduces the detection performance. The experimental results in Tables 3, 4, and 5 demonstrate that our AED-PADA is effective in scenarios characterized by subtle adversarial perturbations.

As can be observed, our AED-PADA demonstrates better generalization ability on ImageNet compared to that on CIFAR-10 and SVHN, because the image resolution in ImageNet ($224\times 224$) is higher than that in CIFAR-10 and SVHN ($32\times 32$). Intuitively, a larger image tends to provide a larger space for adversarial perturbation generation, which makes the differences between the perturbations obtained from various attack methods more pronounced, i.e., reducing the overlaps between different adversarial attacks in the adversarial feature space. By selecting PADs, our AED-PADA is able to occupy a larger feature space, thus achieving better generalization ability.

Besides, PADs may not be suitable to be directly applied to the existing methods. As can be observed, if the SOTA methods directly adopt PADs for training (denoted as X-PADs), their performances may not always increase. For instance, the generalization performance of LID-PADs on CIFAR-10 is inferior to that of LID. We postulate that this discrepancy can be attributed partially to the reduction in the volume of the training data, and partially to the incompatibility between PADs and the framework of LID. This observation further verifies the effectiveness of our PADA in our AED-PADA framework.

4.2.2. Generalization performances across different backbones and datasets

To thoroughly evaluate the generalization ability of adversarial detection methods, it is also vital to assess their performances in the scenarios where the training and testing adversarial examples are from different backbones and datasets. A detection method with better cross-attack, cross-backbone, and cross-dataset capabilities tends to be better suited for practical applications in complex environments.

Table 6 displays the generalization performances of different detection methods across different backbones and datasets. The cross-backbone results are obtained on CIFAR-10, and the cross-dataset results are obtained when the backbone is ResNet-18. In general, the results from both scenarios demonstrate that our method exhibits the best generalization ability. Since the structures tend to vary significantly among different types of backbones, which results in different feature dimensions between training and testing, both LID and MD are not applicable in the cross-backbone experiments, i.e., they lack the capability to generalize across backbones, because they utilize the intermediate features of the backbones for detection.

Table 6. Comparison of generalization performances across different backbones and datasets. The bolded and the underlined values represent the best and the second best results for each column, respectively.
Detector Averaged Accuracy (%)
Cross-Backbone Cross-Dataset
ResNet-18\rightarrow VGG-16 VGG-16\rightarrow ResNet-18 CIFAR-10\rightarrow SVHN SVHN\rightarrow CIFAR-10
LID (lid, ) - - 71.619 80.233
MD (md, ) - - 62.437 61.429
Steg (steg, ) 82.894 85.157 53.249 78.155
SID (SID, ) 74.169 76.458 68.928 73.442
SA (txt_advdetection, ) 80.751 87.773 58.910 51.694
Ours 89.301 91.624 82.964 85.113
Table 7. Comparison of generalization performances across different maximum perturbation magnitudes on CIFAR-10 when the backbone is ResNet-18. The maximum magnitude of the testing perturbation is set to 1, 2, 4, 8, respectively.
Detector Known Unseen
2 1 4 8 avg
LID [13] 84.4 56.2 71.0 76.1 67.8
MD [14] 69.6 57.1 66.2 66.4 63.2
Steg [15] 85.2 67.8 90.0 90.4 82.7
SID [16] 76.0 55.2 68.0 64.6 62.6
SA [17] 87.9 75.7 89.8 88.9 84.8
Ours 94.7 89.9 97.0 95.2 94.0
Table 8. Comparison of generalization performances across different maximum perturbation magnitudes on SVHN when the backbone is ResNet-18. The maximum magnitude of the testing perturbation is set to 1, 2, 4, 8, respectively.
Detector Known Unseen
2 1 4 8 avg
LID [13] 69.4 55.0 69.9 75.0 66.6
MD [14] 63.1 55.4 63.4 65.2 61.3
Steg [15] 81.0 58.3 94.7 95.4 82.8
SID [16] 70.5 51.9 64.8 65.4 60.7
SA [17] 74.8 55.7 83.9 84.1 74.6
Ours 93.3 89.1 98.6 99.5 95.7

4.2.3. Generalization performances across different maximum perturbation magnitudes

Here, we evaluate the robustness of the detection models by focusing on a challenging scenario where the maximum perturbation magnitude of the adversarial examples to be detected is unknown. All the detection models are trained on adversarial examples with a maximum perturbation magnitude of 2, while the maximum magnitude of the testing adversarial perturbations is varied across 1, 2, 4 and 8. Tables 7 and 8 present the generalization performances across different maximum perturbation magnitudes on CIFAR-10 and SVHN, respectively. Based on these results, our AED-PADA consistently outperforms the state-of-the-art adversarial detection methods across all the testing scenarios, demonstrating the superior generalization ability of our method against adversarial perturbations with unseen magnitudes.

4.3. Effectiveness of AD clustering and PADs Selection

Selecting multiple adversarial attacks from the same cluster leads to redundancy, because such candidate attacks are quite similar and their combination covers only a small portion of the feature space, which causes poor generalization. Choosing attacks from different clusters effectively avoids this issue. Consequently, as shown in Table 9 and Table 10, we set up two groups of experiments, i.e., selecting ADs from the Same Cluster and from Cross Clusters, to verify the effectiveness of the AD Clustering and PADs Selection of AED-PADA against the 7 unseen attacks. ‘Same Cluster’ and ‘Cross Cluster’ indicate that the ADs employed as the source domains are selected from the same AD cluster and from different AD clusters, respectively.

Table 9 presents the generalization performances on CIFAR-10 and ResNet-18. The AD clustering result is {FGSM, PGD, DIM, MI-FGSM, SI-NI-FGSM}-{BIM, ILA, YA-ILA}-{C&W, DeepFool}. For ‘Same Cluster’, the best, worst and mean values of the averaged results are 93.494%, 90.444%, and 92.150%, respectively. For ‘Cross Cluster’, the best, worst, and mean values are 94.694%, 92.429%, and 93.892%, respectively. Similarly, Table 10 shows the results on CIFAR-10 and VGG-16. The AD clustering result is {BIM, ILA, YA-ILA, PGD}-{DIM, MI-FGSM, FGSM}-{C&W, DeepFool, SI-NI-FGSM}. For ‘Same Cluster’, the best, worst and mean values of the averaged results are 88.019%, 87.132%, and 87.461%, respectively. For ‘Cross Cluster’, the best, worst, and mean values are 88.264%, 87.406%, and 87.818%, respectively.

Table 9. Detection Results of different source domains on CIFAR-10 when the backbone is ResNet-18.
Origin Detector Averaged Accuracy (%) CEFS (↑)
Same Cluster FGSM+SI-NI-FGSM+PGD 90.444 -
BIM+ILA+YA-ILA 92.513 -
MI-FGSM+SI-NI-FGSM+DIM 93.494 -
Cross Cluster DIM+BIM+C&W 94.694 80.856
MI-FGSM+BIM+C&W 94.621 72.625
SI-NI-FGSM+BIM+C&W 94.624 58.649
FGSM+BIM+C&W 94.600 58.543
PGD+BIM+C&W 94.222 50.669
FGSM+BIM+DeepFool 93.407 50.142
FGSM+ILA+DeepFool 92.539 49.818
PGD+ILA+DeepFool 92.429 48.015
Table 10. Detection Results of different source domains on CIFAR-10 when the backbone is VGG-16.
Origin Detector Averaged Accuracy (%) CEFS (↑)
Same Cluster BIM+ILA+YA-ILA 87.389 -
BIM+ILA+PGD 87.153 -
BIM+YA-ILA+PGD 87.132 -
ILA+YA-ILA+PGD 87.439 -
DIM+MI-FGSM+FGSM 87.634 -
C&W+DeepFool+SI-NI-FGSM 88.019 -
Cross Cluster ILA+DIM+C&W 88.264 18.692
PGD+DIM+C&W 88.236 18.605
ILA+FGSM+C&W 87.969 18.532
YA-ILA+FGSM+C&W 87.897 18.527
YA-ILA+MI-FGSM+SI-NI-FGSM 87.670 18.518
BIM+DIM+SI-NI-FGSM 87.676 18.410
YA-ILA+FGSM+SI-NI-FGSM 87.425 18.388
YA-ILA+DIM+SI-NI-FGSM 87.406 18.385

Based on the results, we can make three observations. Firstly, the mean result of ‘Cross Cluster’ selection is higher than that of ‘Same Cluster’ selection, and the best result of ‘Cross Cluster’ selection is also superior. Notably, on CIFAR-10 with ResNet-18, the mean result of ‘Cross Cluster’ selection even exceeds the best result of ‘Same Cluster’ selection. This verifies the effectiveness of our AD Clustering, i.e., selecting source domains across clusters enhances the generalization ability of the detection method.

Secondly, although ‘Cross Cluster’ selection achieves better results in general, it cannot guarantee that randomly selected cross-cluster ADs always outperform ADs selected from the same cluster. When PADs with a very low CEFS score are employed as the source domains, the generalization performance may be worse than that of ‘Same Cluster’ selection. This observation further verifies the effectiveness of our PADs Selection.

Thirdly, as shown in the ‘Cross Cluster’ parts of Tables 9 and 10, for the majority of results, a higher CEFS value of the PADs induces a higher averaged accuracy of the proposed detector. This observation suggests that CEFS is a proper guide for selecting PADs, i.e., a higher CEFS score indicates that the corresponding PADs cover a larger proportion of the entire feature space, and the PADA stage, which uses these PADs as source domains, then yields better generalization performance.
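For clarity, the sketch below illustrates the cross-cluster selection guided by CEFS. The clustering result in the usage example is the one reported above; `cefs_score` is a hypothetical callable standing in for the CEFS computation defined earlier in the paper, so this illustrates the selection logic rather than the exact implementation.

```python
from itertools import product

def select_pads(clusters, cefs_score):
    """Pick one AD per cluster and keep the combination with the highest CEFS."""
    best_combo, best_score = None, float("-inf")
    for combo in product(*clusters):        # every cross-cluster combination
        score = cefs_score(combo)           # larger score -> larger feature-space coverage
        if score > best_score:
            best_combo, best_score = combo, score
    return best_combo, best_score

# Example with the CIFAR-10 / ResNet-18 clustering reported above (cefs_score is hypothetical):
# pads, score = select_pads(
#     [["FGSM", "PGD", "DIM", "MI-FGSM", "SI-NI-FGSM"],
#      ["BIM", "ILA", "YA-ILA"],
#      ["C&W", "DeepFool"]],
#     cefs_score=my_cefs)
```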

4.4. The effect of different numbers of clusters

Here, we apply AD Clustering to the 10 adversarial attack methods in the training set and study the impact of the number of clusters K (i.e., the number of adversarial attacks in the PADs). As depicted in Table 11, the detection performance increases as K increases, until a certain value of K is reached. This trend is attributed to the utilization of more ADs as the source domains, which provides a larger coverage of the entire feature space. Note that the values marked with † represent the generalization performance of our AED-PADA, in which the number of clusters is automatically determined based on the CH score, and the bolded values represent the best generalization performance.

As can be observed, our automatic CH-score-based method yields superior performance on SVHN with VGG-16 as the backbone, and gives performance comparable to the best result in the other settings. Although the best result obtained by manually selecting K exceeds our automatic method by 0.2%-0.3% in terms of the averaged accuracy, it requires 2.3× more parameters and runs 3× slower. Moreover, in real-world scenarios, the adversarial attacks in the training set may far exceed 10 types, and it is clearly impractical to test the detection performance of every clustering result individually. Consequently, our method achieves a better balance between the training costs and the generalization performance.
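A minimal sketch of the automatic selection of K via the CH score is given below; `ad_features` is a hypothetical matrix of adversarial-domain representations, and scikit-learn's KMeans merely stands in for the clustering algorithm used in PADI.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

def choose_k_by_ch(ad_features, k_range=range(2, 8)):
    """Cluster the ADs for each candidate K and keep the K with the highest CH score."""
    best_k, best_ch, best_labels = None, -np.inf, None
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(ad_features)
        ch = calinski_harabasz_score(ad_features, labels)
        if ch > best_ch:
            best_k, best_ch, best_labels = k, ch, labels
    return best_k, best_labels
```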

Table 11. Detection performance across the seven unseen testing adversarial attacks with different numbers of clusters K.
Dataset (Backbone) Averaged Accuracy (%)
K=2 K=3 K=4 K=5 K=6 K=7
CIFAR-10 (ResNet-18) 91.936 94.694† 94.784 94.571 94.561 95.018
CIFAR-10 (VGG-16) 87.946 88.264† 87.964 87.991 88.104 88.498
SVHN (ResNet-18) 91.497 94.344† 91.257 91.726 94.999 94.801
SVHN (VGG-16) 89.183 89.269 91.219† 90.056 90.026 90.713
Table 12. Detection Performance of different feature extractions in PADA of our AED-PADA.
Dataset Feature Extraction Averaged Accuracy (%) Average
ResNet-18 VGG-16
CIFAR-10 spatial 93.411 88.251 90.831
freq 93.379 86.755 90.067
spatial + freq 94.694 88.264 91.479
SVHN spatial 94.612 89.721 92.166
freq 88.157 91.580 89.869
spatial + freq 94.344 91.219 92.781

4.5. Effectiveness of the adversarial feature enhancement in PADA

In this experiment, we compare three feature extraction schemes, i.e., spatial feature extraction, frequency feature extraction, and our adversarial feature enhancement (spatial + freq), within our framework. The same 7 unseen testing attacks are employed and the averaged results are reported in Table 12. As can be observed, on both datasets, the average results of our adversarial feature enhancement (spatial + freq) surpass those of both the spatial feature extraction and the frequency feature extraction, which indicates that adopting the adversarial feature enhancement as the feature extraction component of PADA is effective. Furthermore, the performance of single-domain feature extraction, whether spatial or frequency, is inconsistent across different backbones, as evidenced by the fluctuating averaged detection performance on SVHN. Our adversarial feature enhancement, which combines both the spatial and frequency aspects, captures the adversarial perturbation signals more comprehensively and thus yields superior adversarial detection performance.
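As a conceptual sketch (not the exact architecture), the adversarial feature enhancement can be viewed as a spatial branch operating on the image and a frequency branch operating on its amplitude spectrum, with the two feature vectors concatenated before the detector head; the branch modules below are placeholders.

```python
import torch
import torch.nn as nn

class SpatialFreqFeatures(nn.Module):
    """Concatenate spatial features with features of the FFT amplitude spectrum."""
    def __init__(self, spatial_branch: nn.Module, freq_branch: nn.Module):
        super().__init__()
        self.spatial_branch = spatial_branch    # e.g., a small CNN on RGB pixels
        self.freq_branch = freq_branch          # e.g., a small CNN on the spectrum

    def forward(self, x):                       # x: (B, C, H, W) in [0, 1]
        f_spatial = self.spatial_branch(x)
        spectrum = torch.fft.fft2(x, norm="ortho").abs()   # per-channel amplitude spectrum
        f_freq = self.freq_branch(spectrum)
        return torch.cat([f_spatial, f_freq], dim=1)       # enhanced adversarial feature
```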

4.6. The compatibility with different MUDA methods

To demonstrate the compatibility of our framework with existing Multi-source Unsupervised Domain Adaptation (MUDA) methods, we employ four widely used MUDA methods, i.e., M3DA (moment-mda, ), DARN (DARN, ), MDMN (mdmn, ) and MFSAN (aaai-mda, ). All four MUDA methods utilize the adversarial feature enhancement as the feature extraction component. Table 13 shows that our framework surpasses the existing state-of-the-art adversarial detection methods with any of the four MUDA methods, which verifies that our framework possesses excellent compatibility with existing MUDA methods. In other words, the first exploration of MUDA in adversarial example detection has proven successful, and Principal Adversarial Domain Adaptation can effectively transfer the knowledge from the PADs to the unseen target domain. Besides, since the average result of MFSAN outperforms those of the other MUDA methods, we select MFSAN as the basic MUDA component in our PADA.
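To illustrate how a MUDA method plugs into our framework, the schematic training step below follows the spirit of MFSAN: the adversarial feature enhancement is shared, while each PAD (source) owns its own classifier and alignment term. The modules, the `mmd` estimator and the exact losses are simplified placeholders, so this is a sketch of the interface rather than any specific method's implementation; swapping the MUDA component amounts to replacing the alignment and discrepancy terms while keeping the feature extractor and the PADs unchanged.

```python
import torch.nn.functional as F

def muda_step(feat_extractor, classifiers, source_batches, target_x, mmd):
    """One schematic optimization step over the PADs (sources) and the unlabeled target."""
    cls_loss, align_loss = 0.0, 0.0
    target_feat = feat_extractor(target_x)
    target_preds = []
    for (x_s, y_s), clf in zip(source_batches, classifiers):
        feat_s = feat_extractor(x_s)
        cls_loss += F.cross_entropy(clf(feat_s), y_s)    # supervised loss on each PAD
        align_loss += mmd(feat_s, target_feat)           # per-source domain alignment (L_d)
        target_preds.append(F.softmax(clf(target_feat), dim=1))
    # discrepancy term (L_disc): domain-specific classifiers should agree on the target
    disc_loss = sum((p - q).abs().mean()
                    for i, p in enumerate(target_preds)
                    for q in target_preds[i + 1:])
    return cls_loss, align_loss, disc_loss               # weighted as discussed in Section 4.7
```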

Table 13. Detection performance of our framework with different MUDA methods.
Dataset Backbone Averaged Accuracy of different MUDA methods (%)
M3DA (moment-mda, ) DARN (DARN, ) MDMN (mdmn, ) MFSAN (Ours) (aaai-mda, )
CIFAR-10 ResNet-18 94.976 93.948 93.832 94.694
CIFAR-10 VGG-16 87.819 87.624 91.463 88.864
SVHN ResNet-18 93.892 93.934 93.888 94.344
SVHN VGG-16 90.960 92.244 89.826 91.219
Average 91.912 91.938 92.252 92.280

4.7. Parameter sensitivity

We utilize MFSAN as the basic MUDA method in the PADA stage of our framework, where the hyper-parameters λ and γ control the importance of L_d and L_disc, respectively, and are both set to 1.0 by default. To study the parameter sensitivity, we sample the values of λ and γ from {0.5, 1.0, 2.0} and perform the experiments on CIFAR-10 with ResNet-18.
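For reference, the overall objective optimized in this setting can be written as the weighted combination below; this is a reconstruction implied by the description above, with L_cls denoting the classification loss on the PADs, and the precise form of each term follows the adopted MUDA method, MFSAN.

L_total = L_cls + λ · L_d + γ · L_disc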

Table 14. Sensitivity of the parameters λ and γ in the PADA stage.
λ γ Averaged Accuracy (%)
λ=0.5 γ=0.5 94.503
λ=0.5 γ=1.0 94.591
λ=0.5 γ=2.0 94.355
λ=1.0 γ=0.5 94.646
λ=1.0 γ=1.0 94.694
λ=1.0 γ=2.0 94.603
λ=2.0 γ=0.5 94.598
λ=2.0 γ=1.0 94.629
λ=2.0 γ=2.0 94.223

Table 14 indicates that our proposed method maintains consistently high performance under various parameter settings when MFSAN is employed as the basic MUDA method. The best averaged accuracy is 94.694% when λ = γ = 1.0. Variations of the parameters induce only minor fluctuations in the detection performance: the mean and the standard deviation of all the averaged accuracies are 94.538% and 0.154%, respectively. This indicates that our PADA framework is effective and robust, since it is insensitive to the parameter choices and consistently offers high detection accuracy.

4.8. The computational costs

Table 15. Comparison of the time and hardware costs. Batch size for each method is 1000.
Cost Detector
LID (lid, ) MD (md, ) Steg (steg, ) SID (SID, ) SA (txt_advdetection, ) Ours
Time (ms/image) 8.13 18.05 23.85 2.16 1.21 0.60
Hardware (MB) 7458 7248 1614 1592 8564 6124

Table 15 presents the time and hardware costs of deploying the state-of-the-art adversarial example detection methods and our AED-PADA. The results indicate that, under the same settings, our method incurs the lowest time cost and a moderate hardware (GPU memory) cost. All experiments are conducted on an NVIDIA GeForce RTX 3080Ti GPU with CIFAR-10 and ResNet-18.
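Latency and memory numbers of this kind can be obtained with a generic PyTorch benchmarking routine such as the sketch below (batch size 1000, as in Table 15); this illustrates the measurement procedure rather than our exact profiling script.

```python
import torch

@torch.no_grad()
def benchmark(detector, batch, warmup=10, iters=50):
    """Return the per-image latency (ms) and the peak GPU memory (MB) of a detector."""
    detector.eval()
    torch.cuda.reset_peak_memory_stats()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    for _ in range(warmup):                 # warm-up iterations are excluded from timing
        detector(batch)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        detector(batch)
    end.record()
    torch.cuda.synchronize()
    ms_per_image = start.elapsed_time(end) / (iters * batch.size(0))
    peak_mb = torch.cuda.max_memory_allocated() / (1024 ** 2)
    return ms_per_image, peak_mb
```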

Furthermore, our AED-PADA is well suited for deployment in real-world scenarios. Such scenarios demand that an adversarial example detection method not only performs real-time detection but also maintains robust performance in environments where the adversarial attacks, datasets, and backbones encountered during testing are entirely unseen. Firstly, our detector is trained only once and can be deployed without retraining for new unseen attacks, and it achieves the shortest inference time, i.e., the best real-time detection performance. Secondly, we train on earlier attacks and test on more recent attacks. Table 3 demonstrates that, under this real-world-aligned setting, our AED-PADA exhibits superior generalization performance and is expected to maintain considerable detection capability against future unseen attacks. Lastly, given its strong performance across various backbones and datasets in Table 6, our proposed method also performs better in environments with unseen datasets and backbones. Consequently, our AED-PADA is well suited for practical applications in complex real-world environments.

5. Conclusion

In this paper, we proposed a novel and effective adversarial example detection method, named Adversarial Example Detection via Principal Adversarial Domain Adaptation (AED-PADA), to improve the generalization ability of adversarial detection. Specifically, AED-PADA contains two stages, i.e., Principal Adversarial Domains Identification (PADI) and Principal Adversarial Domain Adaptation (PADA). In PADI, we acquired ADs from scratch and constructed PADs as the source domains for PADA. In PADA, we proposed an adversarial feature enhancement based Multi-source Unsupervised Domain Adaptation framework, which is compatible with various existing MUDA methods, to effectively leverage the PADs for adversarial example detection. Experimental results demonstrated the superiority of our work compared to the state-of-the-art detection methods.

Acknowledgements.
This work was supported in part by the National Natural Science Foundation of China under Grant 62272020, U20B2069 and 62176253, in part by the State Key Laboratory of Complex & Critical Software Environment under Grant SKLSDE2023ZX-16, and in part by the Fundamental Research Funds for the Central Universities.

References

  • [1] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014.
  • [2] Anh Mai Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 427–436, 2015.
  • [3] Mengte Shi, Sheng Li, Zhaoxia Yin, Xinpeng Zhang, and Zhenxing Qian. On generating jpeg adversarial images. In International Conference on Multimedia and Expo, pages 1–6, 2021.
  • [4] Cheng Luo, Qinliang Lin, Weicheng Xie, Bizhu Wu, Jinheng Xie, and Linlin Shen. Frequency-driven imperceptible adversarial attack on semantic similarity. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15294–15303, 2022.
  • [5] Chiu Wai Yan, Tsz-Him Cheung, and Dit-Yan Yeung. ILA-DA: improving transferability of intermediate level attack with data augmentation. In International Conference on Learning Representations, 2023.
  • [6] Yikun Xu, Xingxing Wei, Pengwen Dai, and Xiaochun Cao. A2sc: Adversarial attacks on subspace clustering. ACM Transactions on Multimedia Computing, Communications and Applications, 19(6):1–23, 2023.
  • [7] Prasanth Buddareddygari, Travis Zhang, Yezhou Yang, and Yi Ren. Targeted attack on deep rl-based autonomous driving with learned visual patterns. In International Conference on Robotics and Automation, pages 10571–10577, 2022.
  • [8] Xingjun Ma, Yuhao Niu, Lin Gu, Yisen Wang, Yitian Zhao, James Bailey, and Feng Lu. Understanding adversarial attacks on deep learning based medical image analysis systems. Pattern Recognition, 110:107332, 2021.
  • [9] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.
  • [10] Pengcheng Li, Jinfeng Yi, Bowen Zhou, and Lijun Zhang. Improving the robustness of deep neural networks via adversarial training with triplet loss. In International Joint Conference on Artificial Intelligence, pages 2909–2915, 2019.
  • [11] Cihang Xie, Mingxing Tan, Boqing Gong, Jiang Wang, Alan L. Yuille, and Quoc V. Le. Adversarial examples improve image recognition. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 816–825, 2020.
  • [12] Shangxi Wu, Jitao Sang, Kaiyan Xu, Guanhua Zheng, and Changsheng Xu. Adaptive adversarial logits pairing. ACM Transactions on Multimedia Computing, Communications and Applications, 20(2):56:1–56:16, 2024.
  • [13] Xingjun Ma, Bo Li, Yisen Wang, Sarah M. Erfani, Sudanthi N. R. Wijewickrema, Grant Schoenebeck, Dawn Song, Michael E. Houle, and James Bailey. Characterizing adversarial subspaces using local intrinsic dimensionality. In International Conference on Learning Representations, 2018.
  • [14] Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In Advances in Neural Information Processing Systems, pages 7167–7177, 2018.
  • [15] Jiayang Liu, Weiming Zhang, Yiwei Zhang, Dongdong Hou, Yujia Liu, Hongyue Zha, and Nenghai Yu. Detection based defense against adversarial examples from the steganalysis point of view. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4825–4834, 2019.
  • [16] Jinyu Tian, Jiantao Zhou, Yuanman Li, and Jia Duan. Detecting adversarial examples from sensitivity inconsistency of spatial-transform domain. In AAAI Conference on Artificial Intelligence, pages 9877–9885, 2021.
  • [17] Yulong Wang, Tianxiang Li, Shenghong Li, Xin Yuan, and Wei Ni. New adversarial image detection based on sentiment analysis. IEEE Transactions on Neural Networks and Learning Systems, pages 1–15, 2023.
  • [18] Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial examples in the physical world. In International Conference on Learning Representations Workshop, 2017.
  • [19] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
  • [20] Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In ACM Workshop on Artificial Intelligence and Security, pages 3–14, 2017.
  • [21] Leo Schwinn, René Raab, An Nguyen, Dario Zanca, and Bjoern Eskofier. Exploring misclassifications of robust neural networks to enhance adversarial attacks. Applied Intelligence, pages 1–17, 2023.
  • [22] Dan Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. In International Conference on Learning Representations, 2017.
  • [23] Dongyu Meng and Hao Chen. Magnet: A two-pronged defense against adversarial examples. In ACM SIGSAC Conference on Computer and Communications Security, pages 135–147, 2017.
  • [24] Yang Song, Taesup Kim, Sebastian Nowozin, Stefano Ermon, and Nate Kushman. Pixeldefend: Leveraging Generative Models to Understand and Defend Against Adversarial Examples. In International Conference on Learning Representations, 2018.
  • [25] Jonathan Aigrain and Marcin Detyniecki. Detecting adversarial examples and other misclassifications in neural networks by introspection. In International Conference on Learning Representations, 2019.
  • [26] Philip Sperl, Ching-Yu Kao, Peng Chen, and Konstantin Böttinger. Dla: Dense-layer-analysis for adversarial example detection. In IEEE European Symposium on Security and Privacy, pages 198–215, 2019.
  • [27] Fei Zuo and Qiang Zeng. Exploiting the sensitivity of L2 adversarial examples to erase-and-restore. In ACM Asia Conference on Computer and Communications Security, pages 40–51, 2021.
  • [28] Kathrin Grosse, Praveen Manoharan, Nicolas Papernot, Michael Backes, and Patrick McDaniel. On the (Statistical) Detection of Adversarial Examples. arXiv preprint arXiv:1702.06280, 2017.
  • [29] Xin Li and Fuxin Li. Adversarial Examples Detection in Deep Networks with Convolutional Filter Statistics. In IEEE/CVF International Conference on Computer Vision, pages 5775–5783, 2017.
  • [30] Reuben Feinman, Ryan R Curtin, Saurabh Shintre, and Andrew B Gardner. Detecting Adversarial Samples from Artifacts. arXiv preprint arXiv:1703.00410, 2017.
  • [31] Liang Zhao, Zhikui Chen, Laurence T. Yang, M. Jamal Deen, and Z. Jane Wang. Deep semantic mapping for heterogeneous multimedia transfer learning using co-occurrence data. ACM Transactions on Multimedia Computing, Communications, and Applications, 15(1s):9:1–9:21, 2019.
  • [32] Yang Yang, Yi Yang, and Heng Tao Shen. Effective transfer tagging from image to video. ACM Transactions on Multimedia Computing, Communications, and Applications, 9(2):14:1–14:20, 2013.
  • [33] Yoshua Bengio. Deep learning of representations for unsupervised and transfer learning. In International Conference on Machine Learning, volume 27, pages 17–36, 2012.
  • [34] Feng Liu, Guangquan Zhang, and Jie Lu. Heterogeneous domain adaptation: An unsupervised approach. IEEE Transactions on Neural Networks and Learning Systems, 31(12):5588–5602, 2020.
  • [35] Subhankar Roy, Aliaksandr Siarohin, Enver Sangineto, Samuel Rota Bulo, Nicu Sebe, and Elisa Ricci. Unsupervised domain adaptation using feature-whitening and consensus loss. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9471–9480, 2019.
  • [36] Artem Rozantsev, Mathieu Salzmann, and Pascal Fua. Beyond sharing weights for deep domain adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(4):801–814, 2019.
  • [37] Minghao Chen, Shuai Zhao, Haifeng Liu, and Deng Cai. Adversarial-learned loss for domain adaptation. In AAAI Conference on Artificial Intelligence, pages 3521–3528, 2020.
  • [38] Yongchun Zhu, Fuzhen Zhuang, and Deqing Wang. Aligning domain-specific distribution and classifier for cross-domain classification from multiple sources. In AAAI Conference on Artificial Intelligence, pages 5989–5996, 2019.
  • [39] Xingchao Peng, Qinxun Bai, Xide Xia, Zijun Huang, Kate Saenko, and Bo Wang. Moment matching for multi-source domain adaptation. In IEEE/CVF International Conference on Computer Vision, pages 1406–1415, 2019.
  • [40] Ruijia Xu, Ziliang Chen, Wangmeng Zuo, Junjie Yan, and Liang Lin. Deep cocktail network: Multi-source unsupervised domain adaptation with category shift. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3964–3973, 2018.
  • [41] Junfeng Wen, Russell Greiner, and Dale Schuurmans. Domain aggregation networks for multi-source domain adaptation. In International Conference on Machine Learning, pages 10214–10224, 2020.
  • [42] Geon Yeong Park and Sang Wan Lee. Information-theoretic regularization for multi-source domain adaptation. In IEEE/CVF International Conference on Computer Vision, pages 9194–9203, 2021.
  • [43] Yitong Li, Michael Murias, Geraldine Dawson, and David E. Carlson. Extracting relationships by multi-domain matching. In Advances in Neural Information Processing Systems, 2018.
  • [44] Sicheng Zhao, Guangzhi Wang, Shanghang Zhang, Yang Gu, Yaxian Li, Zhichao Song, Pengfei Xu, Runbo Hu, Hua Chai, and Kurt Keutzer. Multi-source distilling domain adaptation. In AAAI Conference on Artificial Intelligence, volume 34, pages 12975–12983, 2020.
  • [45] Baoyao Yang and Pong C Yuen. Cross-domain visual representations via unsupervised graph alignment. In AAAI Conference on Artificial Intelligence, volume 33, pages 5613–5620, 2019.
  • [46] Hang Wang, Minghao Xu, Bingbing Ni, and Wenjun Zhang. Learning to combine: Knowledge aggregation for multi-source domain adaptation. In Proceedings of the European Conference on Computer Vision (ECCV), volume 12353, pages 727–744, 2020.
  • [47] Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, and Dilip Krishnan. Supervised contrastive learning. In Advances in Neural Information Processing Systems, pages 18661–18673, 2020.
  • [48] Chuan Guo, Mayank Rana, Moustapha Cissé, and Laurens van der Maaten. Countering adversarial images using input transformations. In International Conference on Learning Representations, 2018.
  • [49] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan L. Yuille. Mitigating adversarial effects through randomization. In International Conference on Learning Representations, 2018.
  • [50] James MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pages 281–297, 1967.
  • [51] Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems, pages 849–856, 2001.
  • [52] Tim van Erven and Peter Harremoës. Rényi divergence and kullback-leibler divergence. IEEE Transactions on Information Theory, 60(7):3797–3820, 2014.
  • [53] Supreet Kaur Mann and Sonal Chawla. A proposed hybrid clustering algorithm using k-means and birch for cluster based cab recommender system (cbcrs). International Journal of Information Technology, 15(1):219–227, 2023.
  • [54] Solomon Kullback and Richard A Leibler. On information and sufficiency. Annals of Mathematical Statistics, 22(1):79–86, 1951.
  • [55] Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexander J. Smola. A kernel two-sample test. The Journal of Machine Learning Research, 13:723–773, 2012.
  • [56] Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, and Wieland Brendel. Imagenet-trained cnns are biased towards texture; increasing shape bias improves accuracy and robustness. In International Conference on Learning Representations, 2019.
  • [57] Haohan Wang, Xindi Wu, Zeyi Huang, and Eric P. Xing. High-frequency component helps explain the generalization of convolutional neural networks. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8681–8691, 2020.
  • [58] Lijie Fan, Sijia Liu, Pin-Yu Chen, Gaoyuan Zhang, and Chuang Gan. When does contrastive learning preserve adversarial robustness from pretraining to finetuning? In Advances in Neural Information Processing Systems, pages 21480–21492, 2021.
  • [59] Jessica Fridrich and Jan Kodovsky. Rich models for steganalysis of digital images. IEEE Transactions on Information Forensics and Security, 7(3):868–882, 2012.
  • [60] Peng Zhou, Xintong Han, Vlad I. Morariu, and Larry S. Davis. Learning rich features for image manipulation detection. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
  • [61] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
  • [62] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015.
  • [63] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
  • [64] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading Digits in Natural Images with Unsupervised Feature Learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
  • [65] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
  • [66] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: A Simple and Accurate Method to Fool Deep Neural Networks. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.
  • [67] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9185–9193, 2018.
  • [68] Cihang Xie, Zhishuai Zhang, Yuyin Zhou, Song Bai, Jianyu Wang, Zhou Ren, and Alan L. Yuille. Improving transferability of adversarial examples with input diversity. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2730–2739, 2019.
  • [69] Qian Huang, Isay Katsman, Zeqi Gu, Horace He, Serge J. Belongie, and Ser-Nam Lim. Enhancing adversarial example transferability with an intermediate level attack. In IEEE/CVF International Conference on Computer Vision, pages 4732–4741, 2019.
  • [70] Qizhang Li, Yiwen Guo, and Hao Chen. Yet another intermediate-level attack. In Proceedings of the European Conference on Computer Vision (ECCV), pages 241–257, 2020.
  • [71] Jiadong Lin, Chuanbiao Song, Kun He, Liwei Wang, and John E. Hopcroft. Nesterov accelerated gradient and scale invariance for adversarial attacks. In International Conference on Learning Representations, 2020.
  • [72] Francesco Croce and Matthias Hein. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In International Conference on Machine Learning, pages 2206–2216, 2020.
  • [73] Yinpeng Dong, Tianyu Pang, Hang Su, and Jun Zhu. Evading defenses to transferable adversarial examples by translation-invariant attacks. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4312–4321, 2019.
  • [74] Xiaosen Wang and Kun He. Enhancing the transferability of adversarial attacks through variance tuning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1924–1933, 2021.