Boosting Adversarial Transferability with Learnable Patch-wise Masks

Xingxing Wei^∗ , and Shiji Zhao Xingxing Wei, Shiji Zhao were at the Institute of Artificial Intelligence, Beihang University, No.37, Xueyuan Road, Haidian District, Beijing, 100191, P.R. China. (E-mail: {xxwei, zhaoshiji123}@buaa.edu.cn).Xingxing Wei is the corresponding author.

Abstract

Adversarial examples have attracted widespread attention in security-critical applications because of their transferability across different models. Although many methods have been proposed to boost adversarial transferability, a gap still exists between capabilities and practical demand. In this paper, we argue that the model-specific discriminative regions are a key factor causing overfitting to the source model, and thus reducing the transferability to the target model. For that, a patch-wise mask is utilized to prune the model-specific regions when calculating adversarial perturbations. To accurately localize these regions, we present a learnable approach to automatically optimize the mask. Specifically, we simulate the target models in our framework, and adjust the patch-wise mask according to the feedback of the simulated models. To improve the efficiency, the differential evolutionary (DE) algorithm is utilized to search for patch-wise masks for a specific image. During iterative attacks, the learned masks are applied to the image to drop out the patches related to model-specific regions, thus making the gradients more generic and improving the adversarial transferability. The proposed approach is a preprocessing method and can be integrated with existing methods to further boost the transferability. Extensive experiments on the ImageNet dataset demonstrate the effectiveness of our method. We incorporate the proposed approach with existing methods to perform ensemble attacks and achieve an average success rate of 93.01% against seven advanced defense methods, which can effectively enhance the state-of-the-art transfer-based attack performance.

Index Terms:

DNNs, Adversarial Attack, Adversarial Transferability.

I Introduction

Deep neural networks (DNNs) have achieved remarkable performance in a variety of computer vision tasks, including image classification [10, 20], object detection [4] and semantic segmentation [17]. However, DNNs are vulnerable to adversarial examples [24, 5], which are generated by adding well-designed imperceptible perturbations to benign images, making deep models output wrong predictions expected by the adversary. More importantly, adversarial examples show transferability, which means that the adversarial examples crafted for one DNN model may cause other unseen DNN models to make mistakes. These properties make adversarial examples attract widespread attention, especially in security-sensitive application scenarios, e.g., face recognition [49, 50], self-driving [14].

Refer to caption — Figure 1: Illustrations of the discriminative regions. We adapt smoothed gradient [21] to visualize the saliency maps of three DNN models, i.e., ResNet=50 [6], Inception-v3 [23] and VGG-16 [20]. The left denotes the discriminative regions on the original images, and the right denotes the discriminative regions on the masked images. Red numbers represent the clustering coefficient [29, 7] of the saliency maps.

Intuitively speaking, transfer-based attacks utilize the assumption that different DNN models trained on the same dataset have similar discriminative regions. Therefore, an adversarial example generated against the white-box source models can be transferred to attack the black-box target model. However, because the generated adversarial perturbation is highly correlated with the discriminative regions of the white-box source model at the given input point, overfitting may occur. We argue that the discriminative regions of a DNN model can be divided into model-generic regions and model-specific regions. As their names imply, model-generic regions are the most similar discriminative regions across DNN models, and model-specific regions are the most differential regions for various DNN models, which leads to overfitting and restricts transferability. For that, we illustrate the discriminative regions of three popular models by visualizing their gradients¹¹1Gradients are directly correlated with the model’s discriminative regions. Here we choose a popular gradient visualization method: smoothgrad [21] to remove the noise effects of gradients. Other popular visualization methods can also be used. in Figure 1 (a). We can see that these three models focus on similar discriminative regions, but various model-specific discriminative regions indeed exist for each DNN model, such as some highlighted saliency maps appearing in the background. When the generated adversarial perturbations overfit to the source model, these model-specific perturbations become key for mitigating the improvements versus the transferability.

From this viewpoint, a variety of existing methods can be regarded to boost transferability by searching for the model-generic discriminative regions via various strategies and then generating the transferable adversarial examples. For example, input transformation attacks [3, 34, 31, 27, 18] adapt various input transformations to create diverse input patterns, and then average the gradients on these patterns to avoid overfitting, which actually searches for the model-generic discriminative regions via data augmentation. Feature-level object-aware attacks [30, 28, 37] mitigate overfitting to specific blind spots of the source model by measuring and highlighting the important features of the object region to search for the model-generic regions.

In addition to searching for the model-generic discriminative regions, we argue that another available way exists to boost the transferability in the opposite direction, i.e., localizing and pruning these model-specific regions, and thus the complete model-generic regions can be better retained. However, model-specific regions change with different DNN models, and we cannot remove these regions one by one. Moreover, how to localize these model-specific regions is a challenging problem because we do not have a quantitative metric to accurately distinguish and pick the model-specific regions.

To address this issue, in this paper, we propose a preprocessing method named Learnable Patch-wise Mask (LPM) to accurately localize the model-specific regions, and then prune them according to the learned masks. Because discriminative regions of DNN models are based on the image’s content, we aim to drop out the image’s patches related to the model-specific discriminative regions, thus making the semantic information of dropped patches meaningless. In this way, different DNN models will not pay attention to model-specific regions. To accurately localize these patches, we align a patch-wise binary mask to the input image. To better learn this patch-wise mask, we construct ensemble DNN models to simulate a target model (called simulated models). Then, we adjust the patch-wise mask according to the feedback from simulated models. If the mask is appropriate (i.e., the model-specific regions are pruned well), the transfer-based attack performance versus these simulated models will be high, and vice versa. To improve the solving efficiency, we utilize the differential evolution (DE) algorithm [1] to optimize the patch-wise mask. The pipeline of our method is illustrated in Figure 2. The code can be found in https://github.com/zhaoshiji123/LPM.

Because of the absent model-specific regions, the poor transferability caused by overfitting will be mitigated. The masked images and the gradient maps are illustrated in Figure 1 (b). We see that these models focus on more generic discriminative regions of the masked images than the benign ones. In this way, our method can be considered as a preprocessing method to generate masked images, which can be applied to enhance the performance of transfer-based attack methods when generating adversarial examples.

The main contributions can be summarized as follows:

•

We argue that the model-specific discriminative regions are a key factor causing overfitting to the source model, and thus reducing the transferability. For that, we propose a preprocessing method named Learnable Patch-wise Mask (LPM) to learn and prune the model-specific regions for different DNN models in a unified manner. In this way, different models have consistent model-generic regions, and the adversarial perturbations can have better attack transferability.
•

We propose a learnable method that utilizes the differential evolutionary algorithm to search for a patch-wise binary mask. We use the learned masks to drop out the image patches related to the model-specific regions and boost transfer-based attacks by mitigating overfitting to the source model.
•

Extensive experimental results show that the proposed method can remarkably enhance the attack transferability of existing transfer-based attack methods in both mainstream undefended and defended models, which shows the effectiveness of our LPM.

The rest of this paper is structured as follows: Section 2 reviews the related works on transfer-based adversarial attacks, and the corresponding defense methods. Section 3 describes the detailed method for the proposed method. Section 4 presents and analyzes the experimental results. Section 5 concludes the paper.

II Related Work

II-A Transfer-based Adversarial Attacks

Existing transfer-based adversarial attacks can be divided into three classes, i.e. gradient optimization attacks [2, 13, 26, 35, 45, 46, 48], input transformation attacks [3, 34, 31, 27, 18], and feature-level object-aware attacks [28, 37, 47].

The advanced gradient calculation methods can stabilize the update directions and escape from poor local maxima during the iterations, resulting in more transferable adversarial examples [2]. Lin et al. [13] introduce the Nesterov accelerated gradient into iterative attacks to effectively look ahead and improve the transferability of adversarial examples. Wang et al. [26] further consider the gradient variance of the previous iteration to tune the current gradient to stabilize the update direction and avoid overfitting.

Input transformation attacks adapt various input transformations to create diverse input patterns to avoid overfitting and improve adversarial transferability [34, 3]. Admix [27] calculates the gradient on the input image admixed with a small portion of each image to craft more transferable adversarial images. Wu et al. [31] claim that these prior data augmentation methods employ fixed transformations, which lead to inferior results. They propose exploiting an adversarial transformation network to automatically model various transformations. From another perspective, Long et al. [18] propose a spectrum transformation based on a discrete cosine transform to diversify images and improve adversarial transferability.

Feature-level object-aware attacks mitigate overfitting to the source model by measuring and highlighting the important features, which can be achieved by maximizing the distance between the adversarial image and the benign image on the feature map. Wang et al. [28] perform attacks in the intermediate layers by suppressing positive features and promoting negative features directly to enhance transferability. Zhang et al. [37] propose a neuron attribution-based attack to conduct attacks with more accurate neuron importance estimations.

Existing attacks boost the transferability by searching for the model-generic discriminative regions via various strategies. However, we aim to prune the model-specific regions, which have a different mechanism to boost transferability. In the experiments, we enhance the attack transferability of state-of-the-art methods to verify the effectiveness of our method.

II-B Defense against Adversarial Attacks

To defend against adversarial examples, several methods have been proposed, including adversarial training [5, 44] and preprocessing [33, 12]. Adversarial training injects adversarial examples into the training process. The adversarial training models can resist the perturbations in the gradient direction of the loss function, but the accuracy on benign samples will decrease. Preprocessing methods purify the adversarial examples by removing or destroying adversarial perturbations but reducing the accuracy of benign images. These defenses include the high-level representation denoiser (HGD) [12], random resizing and padding (R&P) [32], randomization smoothing (RS) [9], feature distillation (FD) [16], feature squeezing method: bit reduction (BIT) [36] and a neural representation purifier (NRP) [19].

We employ some representative state-of-the-art defense methods in the experiments to test the performance of our method and evaluate its effectiveness against defenses.

III Methodology

III-A Preliminaries

Given a classification network $f_{\theta}$ parameterized by $\theta$ , let $x,y$ denote the benign image and its corresponding ground-truth label. The goal of the adversarial attack is to find an example $x^{adv}$ that is in the vicinity of $x$ but misclassified by the network. In most cases, we use the $L_{p}$ norm to limit the adversarial perturbations below a threshold $\epsilon$ , where $p$ could be $0$ , $2$ , $\infty$ . This can be expressed as:

f_{\theta}(x^{adv})\neq y,\ s.t.\ \left\|x^{adv}-x\right\|_{p}\leq\epsilon.

(1)

The iterative version of FGSM (I-FGSM) [11] iteratively applies the fast gradient sign method multiple times with a small step size $\alpha$ , which can be expressed as:

x_{t+1}^{adv}=x_{t}^{adv}+\alpha\cdot sign(\bigtriangledown_{x_{t}^{adv}}J(x_{t}^{adv},y)),

(2)

where $x_{0}^{adv}=x$ . $\bigtriangledown_{x_{t}^{adv}}J$ is the gradient of the loss function $J(\cdot)$ with respect to $x_{t}^{adv}$ and cross-entropy loss is often used. $sign(\cdot)$ is the sign function to limit perturbations conformin to the $L_{\infty}$ norm bound.

III-B Attack Based on Learnable Patch-wise Mask

The adversarial example generated by existing transfer-based attacks has poor transferability due to overfitting to the source model. As previosly mentioned, we argue that the model-specific discriminative regions of different DNN models are the key factor causing this overfitting (see Figure 1). To boost transferability, we propose a mask to drop out the image’s patches related to these regions in the process of generating adversarial perturbations. To implement this idea, we employ a learnable strategy to predict a patch-wise mask (LPM) to localize the corresponding patches in the images. For convenience of description, we apply the LPM combined with the I-FGSM as an example to introduce our method. The update of perturbations in the LPM-I-FGSM is formalized as:

x^{adv}_{t+1}=x^{adv}_{t}+\alpha\cdot sign(\bigtriangledown_{x^{adv}_{t}}J(x^{adv}_{t}\odot M,y)),

(3)

where $M\in\{0,1\}^{m\times n}$ is the learned patch-wise binary matrix, and the patches on the image with $0$ will be dropped out. $m\times n$ are the total number of patches normally divided via the predefined patch size $p_{s}$ .

We simulate the target model $t(\cdot)$ to compute feedback to adjust the mask and call it the simulated model $s(\cdot)$ , which can be one or more DNN models. If the performance is strong on the simulated model, indicating that the adversarial examples have a good generalization, it will also perform well on the target model. We can choose a simulated model that is closely related to the target model according to prior knowledge. If no prior knowledge of the target model exists, we utilize the representative DNN models. In this work, we select ResNet-50 [6], VGG-16 [20], and DenseNet-161 [8], and use the ensemble version as the simulated model. The effect of the simulated models will be explored in the ablation study.

The differential evolution (DE) algorithm is employed to optimize the patch-wise mask. The input includes randomly initialized masks $\mathcal{M}(0)$ and feedback $\phi$ on the simulated model. This can be expressed as:

M^{*}=D(\mathcal{M}(0),\phi),

(4)

where $D(\cdot)$ is the DE algorithm. The feedback is crucial for adjusting the mask. The cross-entropy loss with the ground-truth label can reflect the classification results, and the loss value represents a greater propensity for misclassification. In addition, since multiple simulated models are utilized to guide the optimization of masks, we use both the mean and variance of cross-entropy loss as the feedback to prevent falling into the local optimal solution of a certain simulated model, which can be formalized as:

\phi=-\sum_{i=0}^{n_{s}}CE(p_{i},y)+\sum_{i=0}^{n_{s}}\big{(}CE(p_{i},y)-\frac{1}{n_{s}}\sum_{j=0}^{n_{s}}CE(p_{j},y)\big{)}^{2},

(5)

where $n_{s}$ is the number of simulated models, and $y$ indicates the one-hot label. $p_{i}$ is the predicted probability by the $i$ -th simulated model $s_{i}(\cdot)$ , and $CE$ denotes the cross-entropy loss.

The whole framework of the proposed method is illustrated in Figure 2. We initialize a patch-wise mask and combine it with the input image by an elementwise product. The adversarial perturbations are calculated by the I-FGSM attack on the source model $f(\cdot)$ for the modified input image. Then, the generated adversarial image is input to the simulated model $s(\cdot)$ to calculate the feedback $\phi$ . The differential evolution algorithm adjusts the patch-wise masks according to the feedback until the optimal mask is achieved. Finally, we test the transferability of adversarial examples on the target model $t(\cdot)$ . The details can be found in Algorithm 1.

In our method, the patch size is a hyperparameter and we explore its influence in the experimental section. Furthermore, to remove the randomness of the predicted mask solution, we predict multiple masks for one image and employ the intersection of multiple learned masks for every step of the final attack. We find that this method can further improve the transfer-based attack success rate.

III-C Learning the Mask by Differential Evolution

The DE algorithm utilizes crossover strategy, mutation strategy, and selection strategy to heuristically search for the optimal solution with the guidance of feedback. Specifically, in our evolutionary algorithm, a population represents a set of solution masks. Given the population size $P$ , the $k$ -th generation solution $\mathcal{M}(k)$ is represented as:

\mathcal{M}(k)=\{{M}_{i}(k)|{M}_{i}(k)\in\{0,1\}^{m\times n},i=1,...,P\},

(6)

where $M_{i}(k)$ denotes the $i$ -th individual mask solution in the $k$ -th generation.

Algorithm 1 Attacks Based on Learnable Patch-wise Mask

0: Source model

f(\cdot)

, simulated model

s(\cdot)

, clean image

x

and ground-truth label

y

. population size

P

, DE iterations

T_{DE}

, patch size

p_{s}

, hyper-parameter

r

\rho

, attack iterations

T_{m}

for searching population, final attack iterations

T

0: The generated adversarial image

X^{adv}

1: Initialize patch-wise masks

\mathcal{M}(0)

randomly with the zeros’ rate of

r

and patch size of

p_{s}

, and get the superior individual

\mathcal{M}^{*}(0)

by calculate the feedback of

\mathcal{M}(0)

in Eq. 5.

2: for

k

1

T_{DE}

3: Generate the candidate generation solutions

\mathcal{M}^{{}^{\prime}}(k+1)

: Get

{M}_{i}^{{}^{\prime}}(k+1)

by Eq. 7,

i\in(0,\rho*P]

; // Crossover. Get

{M}_{i}^{{}^{\prime}}(k+1)

by Eq. 8,

i\in(\rho*P,P]

; // Mutation.

4: for

i

1

P

5: Drop the image

x

according to mask

M_{i}^{{}^{\prime}}(k+1)

;

6: Generate

x_{i}^{adv}

by I-FGSM with

T_{m}

based on Eq. 3;

7: Calculate feedback

\phi_{i}

x_{i}^{adv}

on simulated model

s(\cdot)

according to Eq. 5;

8: end for

9: Get the next generation solutions

\mathcal{M}(k+1)

: Get

{M}_{i}(k+1)

by Eq. 9,

i\in(0,P]

;// Selection.

10: end for

11: Get the final adversarial image

X^{adv}

with

T

by I-FGSM on source model

f(\cdot)

with

x

modified by

\mathcal{M}^{*}

from

\mathcal{M}(k+1)

;

12: return

X^{adv}

// Notes: I-FGSM can be replaced by other attack methods, e.g., DI-FGSM, MI-FGSM, TI-FGSM, etc.

Crossover Strategy. The crossover strategy generates the candidate next-generation individuals based on the superior individuals in the $k$ -th generation. To determine the superior individuals in the $k$ -th generation $\mathcal{M}^{*}(k)$ , we calculate the feedback $\phi_{i}$ of each $M_{i}(k)$ , and take those with the best $\rho*P$ as superior individuals. Then, we perform crossover among superior individuals to generate part of the candidate solutions for the $k+1$ -th generation. The crossover operation is formalized as:

{M}_{i}^{{}^{\prime}}(k+1)=I[{M}_{\pi_{1}}^{*}(k)+{M}_{\pi_{2}}^{*}(k)],i=1,...\rho*P,

(7)

where ${M}_{i}^{{}^{\prime}}(k+1)$ is the $i$ -th candidate individual mask solution in the ${k+1}$ -th generation. $M_{\pi_{1}}^{*}(k)$ and $M_{\pi_{2}}^{*}(k)$ are randomly selected from the superior individuals, and $\pi_{1},\pi_{2}\in(0,\rho*P]$ . $\rho$ is the rate of crossover individuals to the total individuals. $I(\cdot)$ is the indicator function, $I(x)=1$ if $x=2$ ; $I(x)=0$ if $x=0$ ; $I(x)=0$ or $1$ (with the probability of $0.5$ ) if $x=1$ . The crossover strategy helps to find the location of commonality among masks.

Mutation Strategy. The mutation strategy aims to avoid the suboptimal solution and generate another part of the candidate individuals, which can be expressed as:

{M}_{i}^{{}^{\prime}}(k+1)=E(M_{i}(k),p_{m}),i=\rho*P,...,P,

(8)

where $E(\cdot)$ is the mutation function, which randomly changes the value of each patch position in the mask with probability $p_{m}$ . We set $p_{m}$ to 1, which is equivalent to randomly generating mutation individuals who have no direct connection with the previous individuals. Meanwhile, we keep the number of 0 or 1 in the individuals constant during the mutation strategy to ensure that the drop rate does not change.

Selection Strategy. After the crossover and mutation operation, we select the best $P$ nonrepeating individuals as final $k+1$ generation solutions based on the feedback $\phi$ from the $k$ generation individuals $\mathcal{M}(k)$ and the $k+1$ generation candidate individuals from $\mathcal{M^{{}^{\prime}}}(k+1)$ , which can be formalized as follows:

\mathcal{M}(k+1)=Top(\mathcal{M}(k),\mathcal{M}^{{}^{\prime}}(k+1);P),

(9)

where $Top(\cdot;P)$ denotes the selection function and obtains the best $P$ unique individuals based on feedback $\phi$ . we repeat the above process for iteration $T_{DE}$ . Then we select the final individual $\mathcal{M}(T_{DE})$ as the final solution $\mathcal{M}^{*}$ .

III-D The Explanation of Model-specific Regions

We visualize the evolution process of masks and analyze the proposed method. As Figure 3 shows, the dropped positions tend to be stable with continuous evolution. We find that the foreground and its related regions are basically preserved.

Here, we provide a reasonable explanation of this phenomenon. Since the foreground contains the most direct semantic information connection with the label, models will be more inclined to pay attention to the foreground with the label guidance in the training process, leading to the model-generic discriminative regions; In contrast, as a part of the image, the background is utilized in the training process, but this type of information lacks semantic guidance, so different models may focus on different parts of the background information, leading to the model-specific discriminative regions.

This show that the learned mask is meaningful and can drop the model-specific regions, which reduces overfitting and enhances the transferability of the perturbations.

IV Experiments

IV-A Experimental Settings

Dataset. We apply the ImageNet-compatible dataset²²2https://github.com/cleverhans-lab/cleverhans/tree/master/cleverhans_v3.1.0/examples/nips17_adversarial_competition/dataset following previous work [2, 37, 34], which contains 1000 images from ImageNet [10] with size 299 $\times$ 299 $\times$ 3.

Models. The proposed method is evaluated on six normally trained models, including Inception-v3 (Inc-v3) [23], Inception-v4 (Inc-v4) [22], Inception-Resnet-v2 (IRes-v2) [22], ResNet-101 (Res-101) [6] and ResNet-152 (Res-152), and three adversarially trained models, i.e., adv-Inception-v3 (adv-Inc-v3), ens4-adv-Inception-v3 (Inc-v3_ens4) and ens-adv-Inception-ResNet-v2 (IRes-v2_ens) [25].

It should be mentioned that all the models used in our paper are implemented in an open source manner, coming from PyTorch’s official pretrained models or the open source model library timm ³³3https://github.com/rwightman/pytorch-image-models, and our code is based on the PyTorch framework. Due to the different implementations, there may be small differences between the compared transfer-based attacks versus the attack success rates in their original papers; however, we ensure that all the performances claimed in the paper are obtained by fair testing under the same models.

In addition, we also evaluate our methods on four other advanced defense strategies, including R&P [32], FD [16], RS [9], and NRP [19]. The purified images of FD, RS, and NRP are fed to adv-Inc-v3 to give the final prediction.

SOTA Methods. We choose some existing state-of-the-art attacks, including baseline methods I-FGSM [11], advanced gradient calculation methods, i.e., MI-FGSM [2], NI-FGSM [13], and VNI-FGSM [26], and input transformation attacks, i.e., TI-FGSM [3], DI-FGSM [34], and S²I-FGSM [18].

Hyperparameters. We follow the setting in [2, 26, 34] with the maximum perturbation $\epsilon$ = 16, the attack iteration $T=10$ , and the step size $\alpha=1.6/255$ among all experiments. For MI-FGSM, we set $\mu=1.0$ as recommended in [2]. The diverse probability of DI-FGSM is set as $0.5$ . We adopt Gaussian kernel size of $7\times 7$ for the TI-FGSM. The spectrum transformation number of S²I-FGSM is 20 following the original setting. For our LPM, we set $T_{DE}=10$ , $P=40$ , $\rho=0.3$ , and $r=0.1$ . The number of masks is set to $12$ , and the patch size is set to $32$ . The attack iterations $T_{m}$ in searching model-specific regions is set to 10. Attack Success Rates is used as the evaluation metric, and it refers to the percentage of images that are misclassified by the target model.

TABLE I: The attack success rates (%) of I-FGSM [11], TI-FGSM [3], DI-FGSM [34], MI-FGSM [2], S²I-FGSM [18], and enhanced version by our method (LPM). The adversarial examples are generated by Inc-v3.

*

indicates the white-box model being attacked.

Attack	Inc-v3	IRes-v2	Inc-v4	Res-101	Res-152	adv-Inc-v3	Inc-v3_ens4	IRes-v2_ens	R&P	FD	RS	NRP	Avg
I-FGSM	100^∗	16.7	26.7	24.1	21.5	20.3	10.6	8.9	5.9	22.9	17.1	27.8	25.21
LPM-I-FGSM	100^∗	21.2	33.3	30.2	28.8	21.5	11.0	8.3	5.6	25.2	18.1	28.7	27.66
TI-FGSM	100^∗	16.8	24.8	21.0	19.6	20.8	11.6	9.9	7.8	26.5	18.0	27.6	25.37
LPM-TI-FGSM	100^∗	22.4	34.6	28.6	24.3	22.3	13.8	11.4	8.0	30.7	19.4	28.0	28.63
DI-FGSM	99.7^∗	29.8	46.4	33.2	32.9	22.3	12.7	10.9	7.3	27.7	17.9	28.7	30.79
LPM-DI-FGSM	100^∗	44.6	59.5	47.7	45.3	25.6	17.4	12.0	10.6	32.4	19.6	28.8	36.96
MI-FGSM	100^∗	40.1	48.3	43.1	41.0	31.4	16.8	14.1	10.5	36.3	24.6	32.6	36.57
LPM-MI-FGSM	100^∗	51.5	60.7	56.1	53.7	36.6	20.9	16.5	12.4	43.3	26.8	34.4	42.74
S²I-FGSM	99.5^∗	53.1	61.6	42.0	44.3	35.3	27.2	18.5	15.9	48.5	23.9	35.3	42.09
LPM-S²I-FGSM	100^∗	67.2	73.0	58.7	56.9	40.7	32.2	23.0	20.8	55.3	25.7	36.1	49.13

TABLE II: The attack success rates (%) of I-FGSM [11], TI-FGSM [3], DI-FGSM [34], MI-FGSM [2], S²I-FGSM [18], and enhanced version by our method (LPM). The adversarial examples are generated by IRes-v2.

*

indicates the white-box model being attacked.

Attack	IRes-v2	Inc-v3	Inc-v4	Res-101	Res-152	adv-Inc-v3	Inc-v3_ens4	IRes-v2_ens	R&P	FD	RS	NRP	Avg
I-FGSM	99.7^∗	24.3	23.8	19.8	17.5	21.7	9.8	10.7	5.5	23.9	17.0	27.5	25.10
LPM-I-FGSM	100^∗	27.2	26.5	23.0	20.0	21.8	11.1	10.6	6.7	26.5	18.4	28.7	26.71
TI-FGSM	99.5^∗	26.6	26.3	19.4	18.4	22.6	11.2	13.0	7.6	30.4	17.6	27.9	26.71
LPM-TI-FGSM	100^∗	30.5	30.3	24.5	21.4	23.7	12.3	14.0	8.0	32.5	19.4	29.4	28.83
DI-FGSM	98.4^∗	39.7	41.5	28.6	27.2	25.0	13.9	13.5	8.1	32.0	18.7	29.6	31.35
LPM-DI-FGSM	100^∗	49.7	52.4	37.2	35.4	27.1	15.8	16.1	10.7	36.3	20.5	30.0	35.93
MI-FGSM	99.6^∗	49.8	48.0	44.5	40.2	34.5	18.3	20.6	11.6	39.6	27.5	34.4	39.05
LPM-MI-FGSM	100^∗	63.1	59.2	54.7	50.9	38.2	21.4	23.8	14.2	45.8	29.6	34.9	44.65
S²I-FGSM	99.2^∗	59.2	58.2	41.0	40.6	40.2	27.7	28.3	18.6	54.3	25.0	36.3	44.05
LPM-S²I-FGSM	99.9^∗	64.7	64.1	49.3	48.7	43.8	27.4	29.6	21.6	58.6	26.8	36.5	47.58

TABLE III: The attack success rates (%) of I-FGSM [11], TI-FGSM [3], DI-FGSM [34], MI-FGSM [2], S²I-FGSM [18], and enhanced version by our method (LPM). The adversarial examples are generated by Res-152.

*

indicates the white-box model being attacked.

Attack	Res-152	Inc-v3	IRes-v2	Inc-v4	Res-101	adv-Inc-v3	Inc-v3_ens4	IRes-v2_ens	R&P	FD	RS	NRP	Avg
I-FGSM	100^∗	27.8	14.6	26.1	87.5	19.9	10.8	8.8	7.0	22.2	19.0	27.8	30.96
LPM-I-FGSM	100^∗	34.0	18.3	31.5	95.7	20.7	11.2	9.8	7.3	23.0	19.9	28.5	33.33
TI-FGSM	100^∗	31.5	17.5	26.7	88.2	22.2	12.4	10.4	10.0	28.4	22.3	28.0	33.13
LPM-TI-FGSM	100^∗	41.0	24.8	37.7	95.6	23.9	16.2	12.8	11.3	31.2	24.6	29.8	37.41
DI-FGSM	100^∗	53.3	33.3	51.9	95.8	23.6	14.9	11.8	11.4	28.6	21.9	29.4	39.66
LPM-DI-FGSM	100^∗	71.5	45.9	69.3	99.5	25.9	18.8	14.7	14.1	32.0	24.1	30.7	45.54
MI-FGSM	100^∗	52.2	36.7	46.8	96.3	30.2	18.3	15.0	12.4	32.8	31.1	34.1	42.16
LPM-MI-FGSM	100^∗	64.9	45.7	57.8	98.9	31.6	20.7	17.6	14.9	37.4	34.1	36.6	46.68
S²I-FGSM	100^∗	72.6	62.8	71.4	98.7	41.3	33.8	27.7	29.5	53.3	33.7	37.5	55.19
LPM-S²I-FGSM	100^∗	78.2	65.9	76.3	99.2	43.4	36.1	32.6	31.6	56.7	34.6	38.6	57.77

TABLE IV: The attack success rates (%) toward seven defense methods in the ensemble attack setting. DTS denotes the combination of TI-FGSM [3], DI-FGSM [34], and SI-FGSM [13]. The adversarial examples are generated via an ensemble of Inc-v3, IRes-v2, and Res-101, the weight of each is 1/3.

Attack	Inc-v4	Res-152	adv-Inc-v3	Inc-v3_ens4	IRes-v2_ens	R&P	FD	RS	NRP	Avg
MI-FGSM-DTS	94.5	98.0	83.4	87.9	80.5	85.6	93.1	78.5	55.0	80.57
LPM-MI-FGSM-DTS	98.1	99.5	89.8	93.1	87.0	90.5	96.7	86.1	58.5	85.96
NI-FGSM-DTS	96.2	98.5	83.7	90.1	82.1	85.3	93.5	78.7	41.8	79.31
LPM-NI-FGSM-DTS	98.2	99.7	88.6	94.3	87.7	91.7	96.2	85.9	57.5	85.99
VNI-FGSM-DTS	93.9	96.7	89.6	90.9	85.2	86.9	96.3	86.5	74.5	87.13
LPM-VNI-FGSM-DTS	95.9	98.0	92.3	92.6	88.1	88.2	96.7	85.1	78.0	88.71
S²I-MI-FGSM-DTS	96.9	97.5	95.8	95.6	92.7	94.1	97.7	92.4	79.9	92.60
LPM-S²I-MI-FGSM-DTS	98.0	97.8	96.3	96.4	93.1	94.5	98.2	94.3	78.3	93.01

IV-B Effect of Hyperparameters in the Proposed Method

In the experiments, we find that the patch size plays an important role in boosting transferability, and we explore its effect on the results. The curves of different patch sizes are shown in Figure 4 (left). The patch size of $1$ , i.e., pixel-level drop, has a negative effect on transferability. We think that pixel-level drop is not effective enough to prune the model-specific regions, resulting in poor gradients. As the patch size increases within a certain range, the attack success rate is continuously improved. Based on the results, the patch size is set to 32 in later experiments.

In addition to the patch size $p_{s}$ , the number of masks is also an important hyperparameter. Figure 4 (right) shows the results of adjusting the mask number (i.e., how many patch-wise masks to learn for each image by using the differential evolution algorithm). It can be found that the greater the mask number is, the stronger the transferability of generated adversarial examples. Considering the computational complexity and attack success rate, we set the number of masks to 12.

IV-C Ablation Study

To explore the effectiveness of the proposed learning strategy based on the differential evolution algorithm, we conducted ablation studies. The experimental results are shown in Figure 5. RMI-FGSM represents MI-FGSM with random patch-wise masks (randomly generate a set of masks with the same number and size of LPM, other settings are the same as LPM-MI-FGSM as described in Section IV.A.). LPM-MI-FGSM1, LPM-MI-FGSM2, and LPM-MI-FGSM3 represent MI-FGSM with learnable patch-wise masks, and the numbers of simulated models are 1 (ResNet-50), 2 (ResNet-50 and VGG-16), and 3 (ResNet-50, VGG-16, and DenseNet-161), respectively. As seen from the results, attacks with learned patch-wise masks outperform random masks and no masks, which demonstrates the effectiveness of the proposed learnable strategy for searching model-specific regions on images and improving adversarial transferability. For example, the success rate of RMI-FGSM is 34.4% on adv-Inc-v3, while that of the proposed learning-based attack LPM-MI-FGSM1 is 35.2%. As the number of simulated models increases, the attack success rate of LPM-MI-FGSM1 is 36.6% on adv-Inc-v3. The results for the IRes-v2 and Res-152 also follow this trend.

IV-D Comparison of Transferability

We conduct experiments based on the existing advanced methods (i.e.,I-FGSM [11], TI-FGSM [3], DI-FGSM [34], MI-FGSM [2], and S²I-FGSM [18]) and enhance these methods with our LPM. The results are reported in Table I, Table II, and Table III. From the table, we can see that the LPM can effectively strengthen the existing SOTA methods when attacking white-box models. More importantly, the proposed method can significantly improve the success rate of transfer attacks. Specifically, when we generate adversarial examples using Inc-v3 as the white-box model, LPM-I-FGSM, LPM-TI-FGSM, LPM-DI-FGSM, LPM-MI-FGSM, and LPM-S²I-FGSM achieve average transfer success rates of 27.66%, 28.63%, 36.96%, 42.74%, and 49.13%, which outperform the original attack performance of 2.45%, 3.26%, 6.17%, 6.17%, and 7.04% respectively. This reveals that our LPM can further improve the adversarial transferability of existing methods.

IV-E Combined with Existing Methods

The momentum term can stabilize update directions and improve adversarial transferability [2]. Lin et al. [13] show that the combination (DTS) of translation-invariant (TI) [3], diverse inputs (DI) [34], and scale invariant (SI) [13] can further improve attack success rates. Liu et al. [15] find that attacking multiple models simultaneously can effectively improve the transferability of adversarial images. Here, we combine all of them with existing SOTA methods and adopt a model ensemble attack by averaging the logit outputs of Inc-v3, IRes-v2, and ResNet-152 to calculate the loss to generate adversarial examples, and the results are reported in Table IV. We can observe that our LPM can further enhance the adversarial transferability for existing methods MI-FGSM-DTS [2], NI-FGSM-DTS [13], VNI-FGSM-DTS [26], and S²I-MI-FGSM-DTS [18], which can achieve an increase in the attack success rate of 5.39%, 6.68%, 1.58%, and 0.41%, respectively. This is a remarkable improvement and indicates that our approach has good scalability and can be combined with existing methods to further improve transferability. The effect of the attack is shown in Figure 6.

V Attack on Vision Transformer Model

To comprehensively evaluate the proposed method, we conduct experiments on vision transformers (ViTs). Eight ViTs are selected, i.e., ViT-B [39], PiT-B [42], DeiT-B [43], ViT-S [39], PiT-S [42], Visformer-S [38], LeViT-256 [41], and ConViT-B [40]. All ViTs are pretrained on ImageNet. We adopt model ensemble attack and use Inc-v3, IRes-v2, and Res-101 as white-box models to generate adversarial examples. The results are reported in Table V. We can see that LPM can effectively improve the average attack success rate for the existing methods MI-FGSM-DTS [2], NI-FGSM-DTS [13], VNI-FGSM-DTS [26], and S²I-MI-FGSM-DTS [18], which can achieve an increase in the attack success rate of 7.29%, 6.13%, 1.85%, and 0.55%, respectively. The experimental results in Table V illustrate the applicability of the proposed method to ViTs, and indicate that the current ViTs are prone to the threat of adversarial examples.

V-A Trade-off between Performance and Efficiency

In this subsection, we discuss the trade-off between the performance and efficiency of the LPM. As a preprocessing method, the time consumption of LPM is mainly concentrated in the process of searching the model-specific regions (DE). The time complexity of the DE algorithm is determined by these hyperparameters: the DE iterations $T_{DE}$ (step 2 in Algorithm 1), the population size $P$ (step 4 in Algorithm 1), and the attack iterations $T_{m}$ (step 6 in Algorithm 1). Because the time cost caused by $P$ can be avoided by parallelizing the processing of the population, we mainly discuss the impact of the DE iterations $T_{DE}$ and the attack iterations $T_{m}$ . The $T_{DE}$ and $T_{m}$ determine whether proper individual masks can be selected and alleviate the instability of the performance caused by randomness. The experiments are conducted on LPM-I-FGSM with NVIDIA GeForce RTX 3080.

TABLE V: The attack success rates (%) toward ViTs in the ensemble attack setting. DTS denotes the combination of TI [3], DI [34], and SI [13]. The adversarial examples are generated via an ensemble of Inc-v3, IRes-v2, and Res-101, the weight of each is 1/3.

Attack	ViT-B	ViT-S	DeiT-S	PiT-B	PiT-S	Visformer-S	LeViT-256	ConViT-B	Avg
MI-FGSM-DTS	67.2	79.1	64.4	75	83.2	87.8	87.2	68.2	76.51
LPM-MI-FGSM-DTS	74.8	85.2	73.4	82.6	89.7	93.4	93.7	77.6	83.80
NI-FGSM-DTS	67.3	78.8	66.0	75.6	84.4	90.0	89.3	69.6	77.63
LPM-NI-FGSM-DTS	74.1	85.0	73.3	82.6	89.8	94.2	94.6	76.5	83.76
VNI-FGSM-DTS	77.0	84.5	64.2	68.7	81.5	84.8	85.2	69.0	76.86
LPM-VNI-FGSM-DTS	78.9	86.1	65.4	70.9	83.6	86.6	88.5	69.7	78.71
S²I-MI-FGSM-DTS	83.6	89.9	76.7	83.2	89.6	92.1	92.9	79.3	85.91
LPM-S²I-MI-FGSM-DTS	84.2	90.8	77.4	84.1	89.9	92.5	93.2	79.6	86.46

TABLE VI: The attack success rates (%) and time cost (s) with different

T_{DE}

and

T_{m}

of LPM, other setting is the same as mentioned before. The time cost refers to the time applying a single image to search for a learnable mask. The adversarial examples are generated by Inc-v3.

T_{DE}=0

denotes that we apply generated Patch-wise masks while the iteration of DE is 0, which is euqal to random generated masks.

$T_{DE}$	$T_{m}$	IRes-v2	Inc-v4	Res-101	Res-152	Time (s)
0	-	19.3	30.8	27.4	23.8	2.7
5	5	21.2	32.9	28.7	25.8	3.8
5	8	21.2	35.0	29.8	27.0	6.5
5	10	22.9	32.5	28.5	26.7	8.3
10	5	22.1	33.1	29.8	25.6	7.5
10	8	22.1	34.1	30.9	27.0	11.9
10	10	21.2	33.3	30.2	28.8	14.9

The results in Table VI show that $T_{DE}$ and $T_{m}$ will obviously affect the time cost. When $T_{DE}$ and $T_{m}$ decrease, the time cost will correspondingly decrease. Meanwhile, the attack performance will have a minor fluctuation in a certain range and still maintain a competitive performance, e.g., $T_{DE}=5$ and $T_{m}=8$ is a cost-effective solution compared with the original setting. Therefore, our LPM can be chosen with different hyperparameter settings in the DE algorithm according to the actual requirement.

V-B Quantification of Gradient Aggregation

To measure the impact of our method on gradient aggregation, we introduce the clustering coefficient [29, 7] to measure the aggregation degree of saliency maps [21] for clean images and images masked by our DE algorithm.

Figure 1(a) shows that owing to model-specific regions, the saliency maps show a cluttered distribution. When these model-specific regions are masked, the saliency maps become clustered (Figure 1(b)). Therefore, we can use the aggregation degree of saliency maps in an image to reflect whether our hypothesis about the model-specific is true. Here we introduce the local clustering coefficient [29, 7] to compute quantitative evidence. In saliency maps, if one pixel is greater than a certain threshold, it is seen as a vertex. We compute $C_{i}$ for each vertex, which denotes how close one vertex’s neighbors are to being a clique. The $C_{i}$ can be defined as follows:

C=\frac{1}{N}\sum_{i=1}^{N}{C_{i}},~{}C_{i}=\frac{2|\{e_{jk}:v_{j},v_{k}\in L_{i}\}|}{k_{i}(k_{i}-1)},

(10)

where $N$ is the total vertex, and $L_{i}$ represents the set of immediately connected vertices with vertex $v_{i}$ . $v_{j}$ and $v_{k}$ are two vertices that belong to $L_{i}$ . $e_{jk}$ denotes the edge connecting vertex $v_{j}$ with vertex $v_{k}$ , and $k_{i}$ denotes the number of neighbors of vertex $v_{i}$ . $|\cdot|$ denotes the number of edges.

Here we also compute the average clustering coefficient of saliency maps generated by three DNN models, i.e., ResNet-50, Inception-v3, and VGG-16. The result in Table VII shows a higher aggregation degree of mask images than benign images, which means that our masks prune model-specific regions and lead to clustered saliency maps.

TABLE VII: The average clustering coefficient of three DNN models.

Saliency map	Inc-v3	VGG-16	Res-50
Benign image	0.179	0.153	0.20
Masked image	0.212	0.190	0.23

VI Conclusion

In this paper, we argued that model-specific discriminative regions caused overfitting to the source model and thus reduced adversarial transferability. Then, we proposed a learning strategy based on the differential evolution algorithm to search for the patch-wise mask (LPM), which was used to prune model-specific regions when calculating adversarial perturbations. LPM as a preprocessing operation could be integrated with existing gradient-based methods and effectively improve these methods’ transfer attack success rates. In the ensemble attack setting, the proposed approach achieved an average success rate of 93.01% against seven advanced defense mechanisms, demonstrating the effectiveness of our method.

Acknowledgement

This work was supported by the Project of the National Natural Science Foundation of China (No.62076018), and the Fundamental Research Funds for the Central Universities.

References

[1] Uday K Chakraborty. Advances in differential evolution, volume 143. Springer, 2008.
[2] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 9185–9193, 2018.
[3] Yinpeng Dong, Tianyu Pang, Hang Su, and Jun Zhu. Evading defenses to transferable adversarial examples by translation-invariant attacks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4312–4321, 2019.
[4] Ross Girshick. Fast r-cnn. In Proceedings of the IEEE international conference on computer vision, pages 1440–1448, 2015.
[5] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In Yoshua Bengio and Yann LeCun, editors, 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
[6] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
[7] Paul W Holland and Samuel Leinhardt. Transitivity in structural models of small groups. Comparative group studies, 2(2):107–124, 1971.
[8] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4700–4708, 2017.
[9] Jinyuan Jia, Xiaoyu Cao, Binghui Wang, and Neil Zhenqiang Gong. Certified robustness for top-k predictions against adversarial perturbations via randomized smoothing. arXiv preprint arXiv:1912.09899, 2019.
[10] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25:1097–1105, 2012.
[11] Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial examples in the physical world. CoRR, abs/1607.02533, 2016.
[12] Fangzhou Liao, Ming Liang, Yinpeng Dong, Tianyu Pang, Xiaolin Hu, and Jun Zhu. Defense against adversarial attacks using high-level representation guided denoiser. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1778–1787, 2018.
[13] Jiadong Lin, Chuanbiao Song, Kun He, Liwei Wang, and John E Hopcroft. Nesterov accelerated gradient and scale invariance for adversarial attacks. In International Conference on Learning Representations, 2019.
[14] Aishan Liu, Xianglong Liu, Jiaxin Fan, Yuqing Ma, Anlan Zhang, Huiyuan Xie, and Dacheng Tao. Perceptual-sensitive gan for generating adversarial patches. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 1028–1035, 2019.
[15] Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770, 2016.
[16] Zihao Liu, Qi Liu, Tao Liu, Nuo Xu, Xue Lin, Yanzhi Wang, and Wujie Wen. Feature distillation: Dnn-oriented jpeg compression against adversarial examples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
[17] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440, 2015.
[18] Yuyang Long, Qilong Zhang, Boheng Zeng, Lianli Gao, Xianglong Liu, Jian Zhang, and Jingkuan Song. Frequency domain model augmentation for adversarial attack. In European Conference on Computer Vision, pages 549–566. Springer, 2022.
[19] Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Fatih Porikli. A self-supervised approach for adversarial robustness. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 262–271, 2020.
[20] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[21] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
[22] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander Alemi. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31, 2017.
[23] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2818–2826, 2016.
[24] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
[25] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian J. Goodfellow, Dan Boneh, and Patrick D. McDaniel. Ensemble adversarial training: Attacks and defenses. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018.
[26] Xiaosen Wang and Kun He. Enhancing the transferability of adversarial attacks through variance tuning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1924–1933, June 2021.
[27] Xiaosen Wang, Xuanran He, Jingdong Wang, and Kun He. Admix: Enhancing the transferability of adversarial attacks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 16158–16167, 2021.
[28] Zhibo Wang, Hengchang Guo, Zhifei Zhang, Wenxin Liu, Zhan Qin, and Kui Ren. Feature importance-aware transferable adversarial attacks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7639–7648, 2021.
[29] Duncan J Watts and Steven H Strogatz. Collective dynamics of ‘small-world’networks. nature, 393:440–442, 1998.
[30] Weibin Wu, Yuxin Su, Xixian Chen, Shenglin Zhao, Irwin King, Michael R Lyu, and Yu-Wing Tai. Boosting the transferability of adversarial samples via attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1161–1170, 2020.
[31] Weibin Wu, Yuxin Su, Michael R Lyu, and Irwin King. Improving the transferability of adversarial samples with adversarial transformations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9024–9033, 2021.
[32] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan Yuille. Mitigating adversarial effects through randomization. arXiv preprint arXiv:1711.01991, 2017.
[33] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan L. Yuille. Mitigating adversarial effects through randomization. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018.
[34] Cihang Xie, Zhishuai Zhang, Yuyin Zhou, Song Bai, Jianyu Wang, Zhou Ren, and Alan L Yuille. Improving transferability of adversarial examples with input diversity. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2730–2739, 2019.
[35] Yifeng Xiong, Jiadong Lin, Min Zhang, John E Hopcroft, and Kun He. Stochastic variance reduced ensemble adversarial attack for boosting the adversarial transferability. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14983–14992, 2022.
[36] Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. In 25th Annual Network and Distributed System Security Symposium, NDSS 2018, San Diego, California, USA, February 18-21, 2018. The Internet Society, 2018.
[37] Jianping Zhang, Weibin Wu, Jen-tse Huang, Yizhan Huang, Wenxuan Wang, Yuxin Su, and Michael R Lyu. Improving adversarial transferability via neuron attribution-based attacks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14993–15002, 2022.
[38] Zhengsu Chen, Lingxi Xie, Jianwei Niu, Xuefeng Liu, Longhui Wei, and Qi Tian. Visformer: The vision-friendly transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 589–598, 2021.
[39] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
[40] Stéphane d’Ascoli, Hugo Touvron, Matthew L Leavitt, Ari S Morcos, and Levent Sagun. Convit: Improving vision transformers with soft convolutional inductive biases. In International Conference on Machine Learning, pages 2286–2296. PMLR, 2021.
[41] Benjamin Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, and Matthijs Douze. Levit: a vision transformer in convnet’s clothing for faster inference. In Proceedings of the IEEE/CVF international conference on computer vision, pages 12259–12269, 2021.
[42] Byeongho Heo, Sangdoo Yun, Dongyoon Han, Sanghyuk Chun, Junsuk Choe, and Seong Joon Oh. Rethinking spatial dimensions of vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 11936–11945, 2021.
[43] Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, and Alexandre Sablayrolles. Training data-efficient image transformers & distillation through attention. In International Conference on Machine Learning, pages 10347–10357. PMLR, 2021.
[44] Zhao, S., Yu, J., Sun, Z., Zhang, B., Wei, X.: Enhanced accuracy and robustness via multi-teacher adversarial distillation. In: Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part IV. pp. 585–602. Springer (2022)
[45] Gao, Lianli and Huang, Zijie and Song, Jingkuan and Yang, Yang and Shen, Heng Tao. Push & pull: Transferable adversarial examples with attentive attack. In IEEE Transactions on Multimedia, volume 24, pages 2329–2338. IEEE, 2021.
[46] Wan, Chen and Huang, Fangjun and Zhao, Xianfeng. Average gradient-based adversarial attack. In IEEE Transactions on Multimedia, IEEE, 2023.
[47] Zhang, Shihui and Zuo, Dongxu and Yang, Yongliang and Zhang, Xiaowei. A Transferable Adversarial Belief Attack with Salient Region Perturbation Restriction. In IEEE Transactions on Multimedia, IEEE, 2022.
[48] Yuan, Haojie and Chu, Qi and Zhu, Feng and Zhao, Rui and Liu, Bin and Yu, Neng-Hai. AutoMA: towards automatic model augmentation for transferable adversarial attacks. In IEEE Transactions on Multimedia, IEEE, 2021.
[49] Wei, X., Guo, Y., Yu, J.: Adversarial sticker: A stealthy attack method in the physical world. IEEE Transactions on Pattern Analysis and Machine Intelligence (2022)
[50] Wei, X., Guo, Y., Yu, J., Zhang, B.: Simultaneously optimizing perturbations and positions for black-box adversarial patch attacks. IEEE Transactions on Pattern Analysis and Machine Intelligence (2022)