\useunder

\ul

Improving Transferability of Adversarial Examples via Bayesian Attacks

Qizhang Li, Yiwen Guo, Xiaochen Yang, Wangmeng Zuo, Hao Chen Q. Li is with the School of Computer Science and Technology, Harbin Institute of Technology, China. E-mail: [email protected]. Y. Guo is with ByteDance AI Lab. E-mail: [email protected]. X. Yang is with the School of Mathematics and Statistics, University of Glasgow, UK. E-mail: [email protected]. W. Zuo is with the School of Computer Science and Technology, Harbin Institute of Technology, China. E-mail: [email protected]. H. Chen is with the Department of Computer Science, University of California, Davis, US. Email: [email protected] received June 1, 2023.

Abstract

This paper presents a substantial extension of our work published at ICLR [1]. Our ICLR work advocated for enhancing transferability in adversarial examples by incorporating a Bayesian formulation into model parameters, which effectively emulates the ensemble of infinitely many deep neural networks, while, in this paper, we introduce a novel extension by incorporating the Bayesian formulation into the model input as well, enabling the joint diversification of both the model input and model parameters. Our empirical findings demonstrate that: 1) the combination of Bayesian formulations for both the model input and model parameters yields significant improvements in transferability; 2) by introducing advanced approximations of the posterior distribution over the model input, adversarial transferability achieves further enhancement, surpassing all state-of-the-arts when attacking without model fine-tuning. Moreover, we propose a principled approach to fine-tune model parameters in such an extended Bayesian formulation. The derived optimization objective inherently encourages flat minima in the parameter space and input space. Extensive experiments demonstrate that our method achieves a new state-of-the-art on transfer-based attacks, improving the average success rate on ImageNet and CIFAR-10 by 19.14% and 2.08%, respectively, when comparing with our ICLR basic Bayesian method. We will make our code publicly available.

Index Terms:

Deep neural networks, adversarial examples, transferability, generalization ability.

1 Introduction

Deep neural networks (DNNs) have demonstrated remarkable performances in various applications, such as computer vision [2], natural language processing [3], and speech recognition [4]. Nevertheless, these models have been found to be vulnerable to adversarial examples [5], which are maliciously perturbed input data that could mislead the models into making incorrect predictions. This raises serious concerns about the safety of DNNs, particularly when applying them in security-critical domains. More severely, adversarial examples are transferable [6]. An attacker can manipulate adversarial examples on a substitute model to attack unknown victim models with different architectures and parameters. An in-depth exploration of adversarial transferability is crucial as it provides valuable insights into the nature of DNNs, enables more comprehensive evaluation of their robustness, and promotes the design of stronger defense methods and fully resistant networks.

Over the past several years, considerable effort has been devoted to improving the transferability of adversarial examples. In particular, ensemble-based methods [7, 8, 9] attract our attention as they can be readily combined with almost all other methods, such as optimizing intermediate-level representations [10, 11] or modifying the backpropagation computation [12, 13]. The effectiveness of introducing ensemble-based attack is generally related to the number of models available in the bucket, thus we consider a statistical ensemble which consists of infinite many substitute models in some sense by introducing a Bayesian formulation. To model the randomness of models comprehensively, we introduce distributions to the model parameters and model input. Parameters of these distributions can be obtained from an off-the-shelf model, and if fine-tuning is possible, these parameters can further be optimized. Adversarial examples are then crafted by maximizing the mean prediction loss averaged across the parameters and inputs sampled from their respective distributions.

We evaluate our method in attacking a variety of models on ImageNet [14] and CIFAR-10 [15]. The proposed method outperforms state-of-the-arts considerably. We also show that our method can be readily integrated with existing methods for further improving the attack performance.

What’s new in comparison to [1]: 1) We introduce a probability measure to model input in conjunction with the parameters of the substitute models. This enables the joint diversification of both the model and input throughout the iterative process of generating adversarial examples. 2) With the incorporation of input randomness in our Bayesian formulation, we newly derive a principled optimization objective for fine-tuning that encourages flat minima in both the parameter space and the input space.

2 Background and Related Work

Background on adversarial examples. Given a benign input ${\mathbf{x}}$ with label $y$ and a substitute model $f_{\mathbf{w}}$ with full knowledge of architecture and parameters ${\mathbf{w}}$ , gradient-based approaches generate adversarial examples by optimizing $\ell_{p}$ -bounded perturbations $\mathbf{\Delta x}$ to maximize the prediction loss $L$ :

\operatorname*{arg\,max}_{\|\mathbf{\Delta x}\|_{p}\leq\epsilon}L({\mathbf{x}}+\mathbf{\Delta x},y,{\mathbf{w}}),

where $\epsilon$ is the perturbation budget.

FGSM [16] is a simple one-step attack method to obtain adversarial examples in the $p=\infty$ setting:

{\mathbf{x}}^{\text{adv}}={\mathbf{x}}+\epsilon\cdot\mathrm{sign}(\nabla_{{\mathbf{x}}}L({\mathbf{x}},y,{\mathbf{w}})).

The iterative variant of FGSM, I-FGSM [17], is capable of generating more powerful attacks:

{\mathbf{x}}^{\text{adv}}_{0}={\mathbf{x}},\ {\mathbf{x}}^{\text{adv}}_{t+1}=\mathrm{clip}({\mathbf{x}}^{\text{adv}}_{t}+\eta\cdot\mathrm{sign}(\nabla_{{\mathbf{x}}}L({\mathbf{x}},y,{\mathbf{w}})),

where $\eta$ is the step size and the $\mathrm{clip}$ function ensures that the generated adversarial examples remain within the pre-specified range.

2.1 Transfer-based Attacks

FGSM and I-FGSM require calculating gradients of the victim model. Nevertheless, in practical scenarios, the attacker may not have enough knowledge about the victim model for calculating gradients. To address this issue, many attacks rely on the transferability of adversarial examples, meaning that adversarial examples crafted for one classification model (using, for example, FGSM or I-FGSM) can often successfully attack other victim models. It is normally assumed to be able to query the victim model to annotate training samples, collect a set of samples from the same distribution as that modeled by the victim models, or collect a pre-trained substitute model that is trained to accomplish the same task as the victim models.

To enhance the transferability of adversarial examples, several groups of methods have been proposed. Advancements include improved optimizer and gradient computation techniques for updating ${\mathbf{x}}^{\text{adv}}$ [18, 19, 13, 10, 20, 11, 21], innovative strategies of training and fine-tuning substitute models [22, 23], and two groups of methods closely related to this paper: random input augmentation [24, 25, 19, 26] and substitute model augmentation (i.e., ensemble) [7, 9, 27, 28]. For instance, in input augmentation methods, Dong et al. [24], Xie et al. [25], Lin et al. [19] introduced different sorts of transformations into the iterative update of adversarial examples. Comparing with these methods, our method introduces input diversity via a principled Bayesian formulation and for the first time takes such transformation into account during substitute model fine-tuning. For substitute model augmentation (i.e., ensemble), Liu et al. [7] proposed to generate adversarial examples on an ensemble of multiple substitute models that differ in their architectures. Additionally, Xiong et al. [9] proposed stochastic variance reduced ensemble to reduce the variance of gradients of different substitute models following the spirit of stochastic variance reduced gradient [29]. Gubri et al. [28] suggested fine-tuning with a fixed and large learning rate to collect multiple models along the training trajectory for the ensemble attack. In this paper, we consider the diversity in both the substitute models and model inputs by introducing a Bayesian approximation for achieving this.

2.2 Bayesian DNNs

If a deep neural network (DNN) is considered as a probabilistic model, the process of training its parameters, denoted as ${\mathbf{w}}$ , can be seen as maximum likelihood estimation or maximum a posteriori estimation (with regularization). In Bayesian deep learning, the approach involves simultaneously estimating the posterior distribution of the parameters given the data. The prediction for any new input instance is obtained by taking the expectation over this posterior distribution. Due to the large number of parameters typically involved in DNNs, optimizing Bayesian models becomes more challenging compared to shallow models. As a result, numerous studies have been conducted to address this issue, leading to the development of various scalable approximations. Effective methods utilize variational inference [30, 31, 32, 33, 34, 35, 36, 37], dropout inference [38, 39, 40], Laplace approximation [41, 42, 43], or stochastic gradient descent (SGD)-based approximation [44, 45, 46, 47]. Taking SWAG [45] as an example, which is an SGD-based approximation, it approximates the posterior using a Gaussian distribution with the stochastic weight averaging (SWA) solution as its first raw moment and the composition of a low rank matrix and a diagonal matrix as its second central moment. Our method is developed in a Bayesian spirit and we shall discuss SWAG thoroughly later in this paper. In addition to approximating the posterior over model parameters, our work also involves approximating the posterior over adversarial perturbations. This is implemented in order to introduce randomization into the model input during each iteration of iterative attacks.

In recent years, there has been research on studying the robustness of Bayesian DNNs. Besides exploring the probabilistic robustness and safety measures of such models [48, 49], attacks have been adapted [50, 51] to evaluate the robustness of these models in practice. While Bayesian models are often considered to be more robust [52, 53], adversarial training has also been proposed for further safeguarding them, as can be seen in the work by Liu et al. [50]. However, these studies have not specifically focused on adversarial transferability as we do in this paper.

3 Bayesian Attack for Improved Transferability

An intuition for improving the transferability of adversarial examples suggests improving the diversity during back-propagation. As an extension of [1], this paper considers model diversity and input diversity jointly.

3.1 Generate Adversarial Examples via Bayesian Modeling

Bayesian learning aims to discover a distribution of likely models rather than a single deterministic model. Let $\mathcal{D}=\{({\mathbf{x}}_{i},y_{i})\}_{i=1}^{N}$ denote a training set. Bayesian inference incorporates a prior belief about the parameters ${\mathbf{w}}$ through the prior distribution $p({\mathbf{w}})$ and updates this belief after observing the data $\mathcal{D}$ using Bayes’ theorem, resulting in the posterior distribution: $p(\mathbf{w}|\mathcal{D})\propto p(\mathcal{D}|{\mathbf{w}})p(\mathbf{w})$ . For a new input ${\mathbf{x}}$ , the predictive distribution of its class label $y$ is given by the Bayesian model averaging , i.e.,

p(y|{\mathbf{x}},\mathcal{D})=\int_{{\mathbf{w}}}p(y|{\mathbf{x}},{\mathbf{w}})p({\mathbf{w}}|\mathcal{D})d{\mathbf{w}},

(1)

where $p(y|{\mathbf{x}},{\mathbf{w}})$ is the conditional probability, obtained from the DNN output followed by a softmax function.

To perform attack on such a Bayesian model, a straightforward idea is to minimize the probability of the true class for a given input, as in [1]:

		$\displaystyle\operatorname*{arg\,min}_{\\|\mathbf{\Delta x}\\|_{p}\leq\epsilon}p(y\|{\mathbf{x}}+\mathbf{\Delta x},\mathcal{D})$		(2)
	$\displaystyle=$	$\displaystyle\operatorname*{arg\,min}_{\\|\mathbf{\Delta x}\\|_{p}\leq\epsilon}\int_{{\mathbf{w}}}p(y\|{\mathbf{x}}+\mathbf{\Delta x},{\mathbf{w}})p({\mathbf{w}}\|\mathcal{D})d{\mathbf{w}}.$		(2)

An iterative optimizer (e.g., I-FGSM) is often applied, and at the $t$ -the iteration, it seeks $\widetilde{\mathbf{\Delta x}}$ to minimize

\int_{{\mathbf{w}}}p(y|{\mathbf{x}}+\widehat{\mathbf{\Delta x}}_{t}+\widetilde{\mathbf{\Delta x}},{\mathbf{w}})p({\mathbf{w}}|\mathcal{D})d{\mathbf{w}}

(3)

while ensuring $\|\widehat{\mathbf{\Delta x}}_{t}+\widetilde{\mathbf{\Delta x}}\|_{p}\leq\epsilon$ . $\widehat{\mathbf{\Delta x}}_{1}=0$ , and for $t>1$ , $\widehat{\mathbf{\Delta x}}_{t}$ is the sum of all perturbations accumulated over the previous $t-1$ iterations. Optimizing Eq. (2) or minimizing Eq. (3) can be regarded as generating adversarial examples that could succeed on a distribution of models, and it has been proved effective in our previous work [1].

In Eqs. (2) and (3), Bayesian model averaging is used to predict the label of a deterministic model input. Such perturbation optimization processes solely consider the diversity of models, while the diversity of model inputs is overlooked. Nevertheless, considering that different DNN models may be equipped with different pre-processing operations, given the same benign input (e.g., a clean image), ${\mathbf{x}}$ can be different for different models after their specific pre-processing steps. Moreover, even given the same ${\mathbf{x}}$ (which is the pre-processed model input), different models obtain different $\widehat{\mathbf{\Delta x}}_{t}$ . Therefore, introducing input diversity into Eqs. (2) and (3) may also be beneficial to the transferability of generated adversarial examples.

To incorporate such diversity, we simply introduce some randomness to the model input by rewriting Eq. (3) as:

\int_{{\mathbf{w}},{\mathbf{e}}}p(y|{\mathbf{x}}+\widehat{\mathbf{\Delta x}}_{t}+{\mathbf{e}}+\widetilde{\mathbf{\Delta x}},{\mathbf{w}})p({\mathbf{w}}|\mathcal{D})p({\mathbf{e}})d{\mathbf{w}}d{\mathbf{e}}.

(4)

Here the randomness term ${\mathbf{e}}$ is added linearly to the input ${\mathbf{x}}$ and/or accumulated perturbations $\widehat{\mathbf{\Delta x}}_{t}$ . We can also introduce more complex modifications, such as by using a Gaussian filter $g({\mathbf{x}},\widehat{\mathbf{\Delta x}}_{t},{\mathbf{e}})$ with a random standard deviation [54]; yet, for simplicity, we discuss the linear case in this paper. Due to the very large number of parameters, it is intractable to perform exact inference using Eq. (4). Instead we adopt the Monte Carlo sampling to approximate the integral, where a set of $M$ models, each parameterized by ${\mathbf{w}}_{j}$ , are sampled from the posterior $p({\mathbf{w}}|\mathcal{D})$ and $S$ inputs are sampled from $p({\mathbf{e}})$ . The optimization problem can then be cast to maximizing

	$\displaystyle\frac{1}{MS}\sum_{j=1}^{M}\sum_{k=1}^{S}L({\mathbf{x}}+\widehat{\mathbf{\Delta x}}_{t}+{\mathbf{e}}_{k}+\widetilde{\mathbf{\Delta x}},y,{\mathbf{w}}_{j}),$		(5)
	$\displaystyle\operatorname{s.t.}{\mathbf{w}}_{j}\sim p({\mathbf{w}}\|\mathcal{D}),{\mathbf{e}}_{k}\sim p({\mathbf{e}}),$		(5)

where $L(\cdot,\cdot,{\mathbf{w}}_{j})$ is a function evaluating the prediction loss of a DNN model parameterized by $\mathbf{w}_{j}$ .

3.2 Construct a Posterior without Fine-tuning

Given any pre-trained DNN model whose parameters are $\hat{{\mathbf{w}}}$ , we can simply obtain a posterior $\mathcal{N}(\hat{\mathbf{w}},\sigma^{2}\mathbf{I})$ without fine-tuning by assuming it as an isotropic Gaussian. $\sigma$ is a positive constant for controlling the diversity of distribution. Similarly, we can consider ${\mathbf{e}}_{k}\sim\mathcal{N}(\mathbf{0},\sigma_{{\mathbf{e}}}^{2}\mathbf{I})$ . Figure 1 compares the effectiveness of generating transferable attacks using Eq. (5) and the original implementation in our ICLR paper [1]. The experiment was conducted on ImageNet with ResNet-50 used as the substitute model; experiment details are deferred to Section 4.1. Apparently, introducing either model diversity or input diversity in a Bayesian manner could outperform the baseline I-FGSM on all victim models. More significantly, a joint diversification could further enhance adversarial transferability with a 17.05% increase in the average success rate compared with considering the model diversity alone [1].

Refer to caption — Figure 1: Comparison of success rates in attacking 10 different victim models when adversarial examples are generated on a substitute model (ResNet-50) in a non-Bayesian manner (i.e., I-FGSM) and using Bayesian modeling for model parameters, inputs, and both (without fine-tuning). Dotted lines indicate the average success rate across all 10 victim models. We performed $\ell_{\infty}$ attacks with $\epsilon=8/255$ . Best viewed in color.

3.3 Obtain a more Suitable Posterior via Fine-tuning

In this subsection, we explain the optimization procedure of the posterior when fine-tuning the Bayesian model from a pre-trained model is possible. Following prior work, we consider a threat model in which fine-tuning can be performed on datasets collected for the same task as the victim models.

As in Section 3.2, we assume an isotropic Gaussian posterior $\mathcal{N}(\hat{\mathbf{w}},\sigma^{2}\mathbf{I})$ ; however, the mean vector $\hat{\mathbf{w}}$ is now considered as a trainable parameter. Optimization of the Bayesian model, or more specifically $\hat{\mathbf{w}}$ , can be formulated as:

\max_{\hat{\mathbf{w}}}\frac{1}{N}\sum_{i=1}^{N}\mathbb{E}_{{\mathbf{w}}\sim\mathcal{N}(\hat{\mathbf{w}},\sigma^{2}\mathbf{I}),{\mathbf{e}}\sim\mathcal{N}(0,\sigma_{{\mathbf{e}}}^{2}\mathbf{I})}p(y_{i}|{\mathbf{x}}_{i},{\mathbf{e}},{\mathbf{w}}).

(6)

By adopting Monte Carlo sampling, it can further be reformulated as:

	$\displaystyle\min_{\hat{\mathbf{w}}}\frac{1}{NMS}\sum_{i=1}^{N}\sum_{j=1}^{M}\sum_{k=1}^{S}L({\mathbf{x}}_{i}+{\mathbf{e}}_{i,k},y_{i},\hat{\mathbf{w}}+\mathbf{\Delta w}_{j}),$		(7)
	$\displaystyle\operatorname{s.t.}\,\mathbf{\Delta w}_{j}\sim{\mathcal{N}}(\mathbf{0},\sigma^{2}\mathbf{I}),{\mathbf{e}}_{i,k}\sim\mathcal{N}(\mathbf{0},\sigma_{{\mathbf{e}}}^{2}\mathbf{I})$		(7)

The computational complexity of Eq. (7) is high, thus we focus on the worst-case performance in the distributions, whose loss bounds the objective in Eq. (7) from below. The optimization problem then becomes:

$\displaystyle\min_{\hat{\mathbf{w}}}\max_{\mathbf{\Delta w},\{{\mathbf{e}}_{i}\}}$	$\displaystyle\frac{1}{N}\sum_{i=1}^{N}L({\mathbf{x}}_{i}+{\mathbf{e}}_{i},y_{i},\hat{\mathbf{w}}+\mathbf{\Delta w}),$	(8)
$\displaystyle\operatorname{s.t.}$	$\displaystyle\,\mathbf{\Delta w}\sim{\mathcal{N}}(\mathbf{0},\sigma^{2}\mathbf{I})\ \mathrm{and}\ p(\mathbf{\Delta w})\geq\varepsilon,$
	$\displaystyle\,{\mathbf{e}}_{i}\sim\mathcal{N}(\mathbf{0},\sigma_{{\mathbf{e}}}^{2}\mathbf{I})\ \mathrm{and}\ p({\mathbf{e}}_{i})\geq\varepsilon_{{\mathbf{e}}},$

where $\varepsilon$ and $\varepsilon_{{\mathbf{e}}}$ control the confidence region of the Gaussian distributions.

By applying the first-order Taylor approximation to the objective function in Eq. (8), we can approximate the optimal solutions $\mathbf{\Delta w}^{\ast}$ and ${\mathbf{e}}_{i}^{\ast}$ to the inner-maximization problem using $\lambda_{\varepsilon,\sigma}\nabla_{\hat{\mathbf{w}}}\sum_{i}L({\mathbf{x}}_{i},y_{i},\hat{\mathbf{w}})/\|\nabla_{\hat{\mathbf{w}}}\sum_{i}L({\mathbf{x}}_{i},y_{i},\hat{\mathbf{w}})\|$ and $\lambda_{\varepsilon_{{\mathbf{e}}},\sigma_{{\mathbf{e}}}}\nabla_{{\mathbf{x}}_{i}}L({\mathbf{x}}_{i},y_{i},\hat{\mathbf{w}})/\|\nabla_{{\mathbf{x}}_{i}}L({\mathbf{x}}_{i},y_{i},\hat{\mathbf{w}})\|$ , respectively. $\lambda_{\varepsilon,\sigma}$ and $\lambda_{\varepsilon_{{\mathbf{e}}},\sigma_{{\mathbf{e}}}}$ are computed using the quantile function of the Gaussian distributions. Thereafter, the outer gradient for solving Eq. (8) is:

\nabla_{\hat{\mathbf{w}}}\frac{1}{N}\sum_{i=1}^{N}L({\mathbf{x}}_{i},y_{i},\hat{\mathbf{w}})+\mathbf{H_{\hat{{\mathbf{w}}},\hat{{\mathbf{w}}}}}\mathbf{\Delta w}^{\ast}+\mathbf{H_{\hat{{\mathbf{w}}},{\mathbf{x}}_{i}}}{\mathbf{e}}_{i}^{\ast},

(9)

which involves second-order partial derivatives in the Hessian matrix, and it can be approximately calculated using the finite difference method. That said, we use

	$\displaystyle\mathbf{H_{\hat{{\mathbf{w}}},\hat{{\mathbf{w}}}}}\mathbf{\Delta w}^{\ast}\approx\frac{1}{\gamma}\left(\nabla_{\hat{\mathbf{w}}}\frac{1}{N}\sum_{i=1}^{N}L({\mathbf{x}}_{i},y_{i},\hat{\mathbf{w}}+\gamma\mathbf{\Delta w}^{\ast})\right.$		(10)
	$\displaystyle\left.-\nabla_{\hat{\mathbf{w}}}\frac{1}{N}\sum_{i=1}^{N}L({\mathbf{x}}_{i},y_{i},\hat{\mathbf{w}})\right),$		(10)

where $\gamma$ is a small positive constant. $\mathbf{H_{\hat{{\mathbf{w}}},{\mathbf{x}}_{i}}}{\mathbf{e}}_{i}^{\ast}$ is approximated similarly as

	$\displaystyle\mathbf{H_{\hat{{\mathbf{w}}},{\mathbf{x}}_{i}}}{\mathbf{e}}_{i}^{\ast}\approx\frac{1}{\gamma}\left(\nabla_{\hat{\mathbf{w}}}\frac{1}{N}\sum_{i=1}^{N}L({\mathbf{x}}_{i}+\gamma{\mathbf{e}}_{i}^{\ast},y_{i},\hat{\mathbf{w}})\right.$		(11)
	$\displaystyle\left.-\nabla_{\hat{\mathbf{w}}}\frac{1}{N}\sum_{i=1}^{N}L({\mathbf{x}}_{i},y_{i},\hat{\mathbf{w}})\right).$		(11)

By introducing $\mathbf{H_{\hat{{\mathbf{w}}},\hat{{\mathbf{w}}}}}\mathbf{\Delta w}^{\ast}$ and $\mathbf{H_{\hat{{\mathbf{w}}},{\mathbf{x}}_{i}}}{\mathbf{e}}_{i}^{\ast}$ in Eq. (9), we encourage flat minima in the parameter space and the input space, respectively. The former is known to be beneficial to the generalization ability of DNNs [55], while the later has been paid little attention during training/fine-tuning.

To evaluate the effectiveness of fine-tuning, an experiment was carried out on ImageNet using ResNet-50 as the substitute model, and results are shown in Figure 2. It clearly suggests that fine-tuning leads to more significant adversarial transferability. More detailed comparison results are provided in Table I.

TABLE I: Comparing transferability of FGSM and I-FGSM adversarial examples generated on a deterministic substitute model and the Bayesian substitute model (with isotropic or improved Gaussian posteriors and with or without fine-tune) under the

\ell_{\infty}

constraint with

\epsilon=8/255

on ImageNet. The architecture of the substitute models is ResNet-50, and “Average” was calculated over all ten victim models.

		fine- tune	VGG -19 [56]	Inception v3 [57]	ResNet -152 [58]	DenseNet -121 [59]	ConvNeXt -B [60]	ViT -B [61]	DeiT -B [62]	Swin -B [63]	BEiT -B [64]	Mixer -B [65]	\ul Average
FGSM	-	✗	41.26%	28.08%	33.64%	41.20%	14.50%	9.04%	8.04%	9.52%	9.30%	17.72%	21.23%
	Isotropic	✗	52.02%	38.10%	46.40%	53.84%	17.06%	12.50%	11.90%	13.00%	13.62%	23.18%	28.16%
		✓	75.50%	57.14%	66.08%	76.78%	24.94%	26.02%	24.00%	23.58%	30.12%	38.04%	44.22%
	Improved	✗	-	-	-	-	-	-	-	-	-	-	-
		✓	84.50%	68.04%	77.10%	85.94%	31.06%	33.32%	30.76%	29.42%	38.62%	45.06%	52.38%
I-FGSM	-	✗	44.52%	24.06%	36.14%	44.40%	15.16%	6.50%	5.68%	8.24%	7.24%	12.76%	20.47%
	Isotropic	✗	82.86%	60.96%	79.94%	85.42%	36.92%	18.52%	15.86%	19.82%	23.46%	27.14%	45.09%
		✓	97.32%	89.94%	96.58%	98.84%	49.44%	43.68%	35.00%	34.56%	59.62%	49.70%	65.47%
	Improved	✗	92.40%	77.22%	91.42%	94.48%	45.44%	29.20%	26.32%	31.36%	36.74%	39.32%	56.39%
		✓	99.12%	95.64%	99.12%	99.66%	64.28%	58.20%	51.66%	49.26%	73.98%	61.04%	75.20%

3.4 Improved Distribution Modeling

In Sections 3.2 and 3.3, we have demonstrated the superiority of adopting the Bayesian formulation for generating transferable adversarial examples. However, it should be noted that our approach relies on a relatively strong assumption that the posterior follows an isotropic distribution. Taking one step further, we remove the assumption about the covariance matrix and try to learn it from data in this subsection.

Numerous methods have been proposed to learn covariance matrices, but in this paper we opt for SWAG [45] due to its simplicity and scalability. SWAG offers an enhanced Gaussian approximation to the distribution of ${\mathbf{w}}$ and ${\mathbf{e}}_{i}$ (for brevity, the subscript $i$ will be dropped in this subsection). Specifically, we adopt the SWA solution [66] as the mean of ${\mathbf{w}}$ , and decompose the covariance matrix into a diagonal term, a low-rank term, and a scaled identity term, i.e., ${\mathbf{w}}\sim{\mathcal{N}}({\mathbf{w}}_{\mathrm{SWA}},\mathbf{\Sigma}_{\mathbf{w}})$ , where

\mathbf{\Sigma}_{\mathbf{w}}=\alpha(\mathbf{\Sigma}_{\mathrm{diag}}+\mathbf{\Sigma}_{\mathrm{low-rank}})+\beta\mathbf{I}.

(12)

$\alpha\geq 0$ represent the scaling factor of SWAG for disassociating the learning rate of the covariance [45] and $\beta\geq 0$ controls the covariance matrix of the isotropic Gaussian distribution. ${\mathbf{w}}_{\mathrm{SWA}}$ , $\mathbf{\Sigma}_{\mathrm{diag}}$ , and $\mathbf{\Sigma}_{\mathrm{low-rank}}$ are obtained after fine-tuning converges, thus we keep the fine-tuning loss and fine-tuning mechanism as in Section 3.3 even with the improved distribution modeling.

For multi-step attacks, e.g., in I-FGSM, ${\mathbf{e}}$ can be similarly obtained from the distribution of perturbations. Unlike for ${\mathbf{w}}_{\mathrm{SWA}}$ and $\mathbf{\Sigma}_{\mathbf{w}}$ , it does not necessarily require model fine-tuning to obtain the mean vector and covariance matrix of the distribution, as they model the randomness in model input instead of in model parameters and the learning dynamics of the adversarial inputs are obtained during I-FGSM. Yet, for single-step attacks like FGSM, such a distribution modeling boils down to be equivalent to the isotropic case.

We compare performance of methods with and without such improved distribution, in single-step and multi-step attacks, in Table I. Approximating the covariance matrix of parameters through SWAG improves the attack success rates on all victim models, leading to an average increase of 8.16% (from 44.22% to 52.38%) when using single-step attacks based on FGSM. This indicates that the more general distributional assumption of model parameters aligns better with the distribution of victim parameters in practice. A similar benefit is observed when applying the improved distribution modeling to the model input, where the average success rate increases from 45.09% to 56.39% in using I-FGSM. The best performance is achieved when both improved posteriors and fine-tuning are adopted. However, note that with fine-tuning, we suggest not to adopt such improved distribution modeling of model input and model parameters simultaneously, as the two distributions will not be independent under such circumstances and there will be more hyper-parameters to be tuned. Following this suggestion, in Table I and subsequent experiments, when fine-tuning is performed, we only adopt SWAG for the model parameters and use the isotropic formulation for the model inputs.

TABLE II: Success rates of transfer-based attacks on ImageNet using ResNet-50 as substitute architecture and I-FGSM as the back-end attack, under the

\ell_{\infty}

constraint with

\epsilon=8/255

in the untargeted setting. “Average” was calculated over all ten victim models.

Method	VGG -19 [56]	Inception v3 [57]	ResNet -152 [58]	DenseNet -121 [59]	ConvNeXt -B [60]	ViT -B [61]	DeiT -B [62]	Swin -B [63]	BEiT -B [64]	Mixer -B [65]	\ul Average
I-FGSM	44.52%	24.06%	36.14%	44.40%	15.16%	6.50%	5.68%	8.24%	7.24%	12.76%	20.47%
MI-FGSM (2018) [18]	54.98%	34.08%	46.76%	55.48%	20.22%	10.14%	8.82%	11.24%	10.94%	17.82%	27.05%
TI-FGSM (2019) [24]	49.54%	25.10%	40.88%	49.60%	14.36%	9.76%	7.22%	9.28%	11.12%	14.48%	23.13%
DI²-FGSM (2019) [25]	81.92%	57.12%	81.40%	86.28%	35.44%	17.58%	17.64%	20.04%	21.70%	25.54%	44.47%
SI-FGSM (2019) [19]	56.10%	37.04%	50.76%	61.42%	18.82%	9.14%	8.44%	10.10%	10.40%	16.52%	27.87%
NI-FGSM (2019) [19]	54.80%	34.52%	46.92%	55.30%	20.46%	10.34%	8.68%	10.86%	11.04%	17.70%	27.06%
ILA (2019) [10]	75.30%	53.54%	71.40%	76.88%	35.34%	15.02%	14.04%	19.10%	17.34%	23.00%	40.10%
SGM (2020) [12]	73.02%	47.40%	62.22%	70.72%	34.74%	17.22%	15.22%	19.60%	16.92%	26.44%	38.35%
LinBP (2020) [13]	77.84%	51.00%	63.70%	75.66%	24.58%	10.82%	8.42%	12.74%	13.38%	20.88%	35.90%
Admix (2021) [26]	74.76%	54.08%	71.22%	79.70%	26.32%	12.64%	10.86%	13.88%	16.58%	21.84%	38.19%
NAA (2022) [67]	79.00%	63.78%	72.94%	82.78%	27.44%	13.70%	12.48%	16.96%	17.14%	26.28%	41.25%
ILA++ (2022) [21]	78.22%	59.16%	75.46%	80.44%	41.30%	17.26%	16.96%	21.60%	21.06%	25.80%	43.73%
Ours (w/o fine-tune)	92.40%	77.22%	91.42%	94.48%	45.44%	29.20%	26.32%	31.36%	36.74%	39.32%	56.39%
LGV (2022) [28]	90.80%	61.86%	84.90%	93.76%	35.94%	14.20%	12.64%	16.50%	18.08%	23.36%	45.20%
Our BasicBayesian (2023) [1]	97.10%	84.34%	96.16%	98.82%	52.82%	24.04%	17.88%	24.84%	33.60%	31.04%	56.06%
Ours	99.12%	95.64%	99.12%	99.66%	64.28%	58.20%	51.66%	49.26%	73.98%	61.04%	75.20%

TABLE III: Success rates of transfer-based attacks on CIFAR-10 using ResNet-18 as substitute architecture and I-FGSM as the back-end attack, under the

\ell_{\infty}

constraint with

\epsilon=4/255

in the untargeted setting. “Average” was calculated over all six victim models.

Method	VGG -19 [56]	WRN -28-10 [68]	ResNeXt -29 [69]	DenseNet -BC [59]	PyramidNet -272 [70]	GDAS [71]	\ul Average
I-FGSM	38.47%	57.43%	58.61%	53.23%	12.82%	40.52%	43.51%
MI-FGSM (2018) [18]	42.42%	62.25%	63.84%	57.45%	14.15%	44.21%	47.39%
TI-FGSM (2019) [24]	38.52%	57.51%	58.74%	53.43%	12.94%	40.62%	43.63%
DI²FGSM (2019) [25]	55.91%	71.97%	73.14%	66.51%	21.37%	52.32%	56.87%
SI-FGSM (2019) [19]	47.28%	63.74%	68.45%	62.42%	16.95%	43.36%	50.37%
NI-FGSM (2019) [19]	42.38%	62.23%	63.50%	57.05%	14.10%	44.23%	47.25%
ILA (2019) [10]	56.16%	76.37%	76.78%	72.78%	22.49%	56.26%	60.14%
SGM (2020) [12]	40.85%	59.61%	62.50%	55.48%	13.84%	45.17%	46.24%
LinBP (2020) [13]	58.51%	78.83%	81.52%	76.98%	27.44%	61.34%	64.10%
Admix (2021) [26]	50.22%	68.65%	72.36%	66.65%	18.31%	45.87%	53.68%
NAA (2022) [67]	39.40%	57.94%	60.26%	56.29%	13.74%	45.47%	45.52%
ILA++ (2022) [21]	60.31%	77.82%	78.43%	74.71%	26.25%	59.44%	62.83%
Ours (w/o fine-tune)	67.44%	83.31%	84.21%	80.25%	27.83%	65.15%	68.03%
LGV (2022) [28]	80.91%	92.57%	93.05%	90.94%	41.00%	77.67%	79.36%
Our BasicBayesian (2023) [1]	85.23%	94.37%	95.38%	93.56%	48.46%	83.61%	83.44%
Ours	88.04%	94.39%	95.61%	93.65%	55.80%	85.63%	85.52%

4 Experiments

We evaluate the effectiveness of our method by comparing it to recent state-of-the-arts in this section.

4.1 Experimental Settings

To be consistent with [1], we focused on untargeted $\ell_{\infty}$ attacks to study the adversarial transferability. All experiments were conducted on the same set of ImageNet [14] models collected from the timm repository [72], i.e., ResNet-50 [58], VGG-19 [56], Inception v3 [57], ResNet-152 [58], DenseNet-121 [59], ConvNeXt-B [60], ViT-B [61], DeiT-B [62], Swin-B [63], BEiT-B [64], and MLP-Mixer-B [65]. These models are well-known and encompass CNN, transformer, and MLP architectures, making the experiments more comprehensive. We randomly sampled 5000 test images that can be correctly classified by all these models from the ImageNet validation set for evaluation. Since some victim models are different from those in [1], the test images are also different. The ResNet-50 was chosen as the substitute model, same as in [1]. For the experiments conducted on CIFAR-10, we adhered to the settings established in prior work [1]. Specifically, we performed attacks on VGG-19 [56], WRN-28-10 [68], ResNeXt-29 [69], DenseNet-BC [59], PyramidNet-272 [70], and GDAS [71], employing ResNet-18 [58] as the substitute model. The test images used are the entire test set of the CIFAR-10 dataset. We ran 50 iterations with a step size of 1/255 for all the iterative attacks.

In this paper, we set $\gamma$ to a fixed constant, which is slightly different from the approach described in [1]. In [1], $\gamma$ was set to be $0.1/\|\mathbf{\Delta w}^{*}\|_{2}$ , dependent on the value of $\|\mathbf{\Delta w}^{*}\|_{2}$ . However, for our experiments on ImageNet and CIFAR-10, we set $\gamma$ to 0.1 with $\lambda_{\varepsilon,\sigma}=1$ , and $\gamma$ to 0.5 with $\lambda_{\varepsilon,\sigma}=0.2$ , respectively. These hyper-parameters match the ones actually used in [1]. We set $\lambda_{\varepsilon_{\mathbf{e}},\sigma_{\mathbf{e}}}$ to be 1 and 0.01 for ImageNet and CIFAR-10, respectively. We used a learning rate of 0.05, an SGD optimizer with a momentum of 0.9 and a weight decay of 0.0005, a batch size of 1024 for ImageNet and 128 for CIFAR-10, and a number of epochs of 10. We fixed $\sigma=0.006$ and $0.012$ for ImageNet and CIFAR-10, respectively. The $\sigma_{{\mathbf{e}}}$ was set to be $0.01$ and $0.05$ with and without fine-tuning, respectively. Considering Eq. (12), we used $\alpha=1$ for the posterior distribution over model parameters. For the posterior distribution over model input, we set $\alpha=100$ and $25$ for ImageNet and CIFAR-10, respectively. Due to the negligible difference in success rates observed between using a diagonal matrix and a combination of diagonal and low-rank matrices as the covariance for the SWAG posterior, we opted for simplicity and consistently employed the diagonal matrix. We also set $\beta=0$ in the experiments for the same reason. For our current method with fine-tuning, we used SWAG to approximate the posterior over model parameters, while we used an isotropic Gaussian distribution for the posterior over model inputs. This choice was made because modeling the posterior over model inputs becomes challenging in situations where there is a high degree of randomness in the model parameters. We set $M=5$ and $S=5$ unless otherwise specified.

For compared competitors, we followed their official implementations. For LGV [28], we sampled 5 models from the collected model set at each iteration. By doing so, the performance will be better compared with sampling only one model as in LGV’s default setting. For DRA [23], we tested it on ImageNet using the ResNet-50 model provided by the authors. All experiments were performed on an NVIDIA V100 GPU.

4.2 Comparison with State-of-the-arts

We compared our method with recent state-of-the-arts in Tables II and III. A variety of methods were included in the comparison, including methods that adopt advanced optimizers (MI-FGSM [18] and NI-FGSM [19]), increase input diversity (TI-FGSM [24], DI²-FGSM [25], SI-FGSM [19], and Admix [26]), use advanced gradient computations (ILA [10], SGM [12], LinBP [13], NAA [67], and ILA++ [21]), and employ substitute model fine-tuning (LGV [28]). We also evaluated our Bayesian formulation that only models the diversity in substitute model parameters [1]; for clarity, it is referred to as “BasicBayesian” in this paper. The performance of all compared methods was evaluated in the task of attacking 10 victim models on ImageNet and 6 victim models on CIFAR-10.

It can be observed from the tables that our current method outperforms all these methods. It can achieve the best average success rate on ImageNet even without fine-tuning. When fine-tuning is possible, the average success rate improves remarkably, achieving $75.20\%$ on ImageNet and 85.52% on CIFAR-10.

4.3 Combination with Other Methods

We would also like to mention that it is possible to combine our method with other attack methods to further enhance the transferability. In Table IV, we report the attack success rate of our method, in combination with MI-FGSM, DI²-FGSM, SGM, LinBP, and ILA++. It can be seen that the transferability to all victim models gets improved. The best performance is obtained when combining our current method with DI²-FGSM, achieving an average success rate of $78.73\%$ (which shows a $+34.26\%$ gain in comparison with the original performance of DI²-FGSM).

TABLE IV: Success rates of combining our method with other state-of-the-arts on ImageNet using ResNet-50 as substitute architecture and I-FGSM as the back-end attack, under the

\ell_{\infty}

constraint with

\epsilon=8/255

in the untargeted setting. “Average” was calculated over all ten victim models.

Method	VGG -19 [56]	Inception v3 [57]	ResNet -152 [58]	DenseNet -121 [59]	ConvNeXt -B [60]	ViT -B [61]	DeiT -B [62]	Swin -B [63]	BEiT -B [64]	Mixer -B [65]	\ul Average
Our BasicBayesian [1]	97.10%	84.34%	96.16%	98.82%	52.82%	24.04%	17.88%	24.84%	33.60%	31.04%	56.06%
Ours (w/o fine-tune)	92.40%	77.22%	91.42%	94.48%	45.44%	29.20%	26.32%	31.36%	36.74%	39.32%	56.39%
Ours	99.12%	95.64%	99.12%	99.66%	64.28%	58.20%	51.66%	49.26%	73.98%	61.04%	75.20%
MI-FGSM (2018) [18]	54.98%	34.08%	46.76%	55.48%	20.22%	10.14%	8.82%	11.24%	10.94%	17.82%	27.05%
+Our BasicBayesian [1]	98.48%	89.92%	97.50%	99.40%	51.94%	30.00%	22.24%	29.44%	42.32%	37.96%	59.92%
+Ours (w/o fine-tune)	92.58%	78.64%	91.42%	94.72%	46.38%	29.98%	26.82%	31.98%	36.98%	39.68%	56.92%
+Ours	99.02%	95.52%	98.82%	99.48%	62.46%	59.62%	53.80%	50.80%	74.40%	65.44%	75.94%
DI²-FGSM (2019) [25]	81.92%	57.12%	81.40%	86.28%	35.44%	17.58%	17.64%	20.04%	21.70%	25.54%	44.47%
+Our BasicBayesian [1]	99.58%	96.48%	99.58%	99.88%	64.56%	46.86%	36.66%	44.18%	64.68%	49.48%	70.19%
+Ours (w/o fine-tune)	96.62%	87.70%	96.90%	98.16%	55.62%	40.24%	38.34%	41.82%	50.40%	49.02%	65.48%
+Ours	99.32%	96.58%	99.42%	99.70%	63.40%	66.66%	59.08%	53.54%	81.62%	68.02%	78.73%
SGM (2020) [12]	73.02%	47.40%	62.22%	70.72%	34.74%	17.22%	15.22%	19.60%	16.92%	26.44%	38.35%
+Our BasicBayesian [1]	98.58%	89.98%	97.04%	99.18%	56.90%	28.24%	21.78%	30.24%	40.32%	37.72%	60.00%
+Ours (w/o fine-tune)	94.40%	81.66%	92.34%	95.74%	50.50%	33.74%	32.30%	35.54%	41.34%	47.50%	60.51%
+Ours	99.16%	95.76%	98.92%	99.64%	63.12%	58.28%	51.96%	49.96%	74.10%	64.04%	75.49%
LinBP (2020) [13]	77.84%	51.00%	63.70%	75.66%	24.58%	10.82%	8.42%	12.74%	13.38%	20.88%	35.90%
+Our BasicBayesian [1]	98.24%	90.10%	97.18%	99.32%	53.78%	27.08%	18.92%	28.38%	40.06%	32.66%	58.57%
+Ours (w/o fine-tune)	94.30%	80.94%	92.68%	95.76%	48.56%	31.58%	28.50%	34.14%	40.52%	40.58%	58.76%
+Ours	99.16%	95.72%	99.14%	99.68%	62.94%	60.84%	51.74%	49.88%	76.10%	61.78%	75.70%
ILA++ (2022) [21]	78.22%	59.16%	75.46%	80.44%	41.30%	17.26%	16.96%	21.60%	21.06%	25.80%	43.73%
+Our BasicBayesian [1]	98.76%	92.66%	98.10%	99.34%	59.22%	44.26%	40.64%	47.94%	57.00%	50.90%	68.88%
+Ours (w/o fine-tune)	92.04%	80.60%	91.62%	93.72%	48.46%	32.20%	31.38%	33.80%	41.06%	39.78%	58.47%
+Ours	98.58%	94.30%	98.26%	99.18%	63.06%	63.28%	58.56%	57.12%	74.96%	67.40%	77.47%

4.4 Attacking Defensive Models

It is also of interest to evaluate the transferability of adversarial examples to robust models, and we compared the performance of competitive methods in this setting in Table V. The victim models used in this study were collected from RobustBench [73]. All these models were trained using some sorts of advanced adversarial training [74, 75, 76], and they exhibit high robust accuracy against AutoAttack [77] on the official ImageNet validation set. These models included a robust ConvNeXt-B [78], a robust Swin-B [78], and a robust ViT-B-CvSt [79]. Following the setting in [1], two models from Bai et al.’s open-source repository [80], namely a robust ResNet-50-GELU and a robust DeiT-S, were also adopted as the robust victim. The tested robust ConvNeXt-B, robust Swin-B, and robust ViT-B-CvSt show higher prediction accuracy (i.e., 55.82%, 56.16%, and 54.66%, respectively, against AutoAttack) than that of the robust ResNet-50-GELU and robust DeiT-S (35.51% and 35.50%, respectively).

We still used the ResNet-50 substitute model which was trained just as normal and not robust to adversarial examples at all. From Table V, we can observe that our newly proposed method improves the transferability of adversarial examples to these defensive models, compared with the basic Bayesian formulation in [1].

Since the objective of adversarial training is different from that of normal training, in the sense that the distribution of input is different, we suggest increasing $\lambda_{\varepsilon_{{\mathbf{e}}},\sigma_{{\mathbf{e}}}}$ to achieve better alignment between the distributions and to further enhance transferability. When we increase $\lambda_{\varepsilon_{{\mathbf{e}}},\sigma_{{\mathbf{e}}}}$ from 1 to 5 or even 10, we indeed obtain even better results in attacking robust models on ImageNet, as will be discussed in the ablation study.

In addition to adversarial training, we conducted experiments on robust models obtained through randomized smoothing [81], which is widely recognized as one of the leading techniques for achieving certified robustness. When attacking such a defensive model with a ResNet-50 architecture, we observe a substantial improvement compared to [1] (26.76% $\rightarrow$ 85.04%).

TABLE V: Success rates of attacking adversarially trained models on ImageNet using ResNet-50 as substitute architecture and I-FGSM as the back-end attack, under the

\ell_{\infty}

constraint with

\epsilon=8/255

in the untargeted setting. “Average” was calculated over all five victim models.

Method	ConvNeXt -B [78]	Swin -B [78]	ViT -B [79]	ResNet -50 [80]	DeiT -S [80]	\ulAverage
Ours BasicBayesian [1]	6.80%	7.08%	7.22%	16.86%	17.64%	11.12%
Ours (w/o fine-tune)	6.98%	7.22%	7.32%	17.40%	18.10%	11.40%
Ours	8.14%	8.04%	8.74%	21.00%	20.74%	13.33%

4.5 Ablation Study

We conduct a series of ablation experiments to study the impact of different hyper-parameters.

The effect of $\lambda_{\varepsilon,\sigma}$ and $\lambda_{\varepsilon_{{\mathbf{e}}},\sigma_{{\mathbf{e}}}}$ . When adopting fine-tuning, we have two main hyper-parameters that have an effect, namely $\lambda_{\varepsilon,\sigma}$ and $\lambda_{\varepsilon_{{\mathbf{e}}},\sigma_{{\mathbf{e}}}}$ . We conducted an empirical study to demonstrate how the performance of our method varies with the values of these two hyper-parameters on ImageNet. We varied $\lambda_{\varepsilon,\sigma}$ from the set $\{0,0.1,0.2,0.5,1,2\}$ and varied $\lambda_{\varepsilon_{{\mathbf{e}}},\sigma_{{\mathbf{e}}}}$ from the set $\{0,0.5,1,5,10\}$ for attacking models which are the same as in Table II and Table V. The average success rates of attacking these normally trained models and robust models are given in Table VI. To achieve the best performance for each $\lambda_{\varepsilon,\sigma}$ and $\lambda_{\varepsilon_{{\mathbf{e}}},\sigma_{{\mathbf{e}}}}$ , we tuned the other hyper-parameters using 500 randomly selected images from the validation set. These images show no overlap with the 5000 test images.

From the table, it can be observed that increasing the values of $\lambda_{\varepsilon,\sigma}$ and $\lambda_{\varepsilon_{{\mathbf{e}}},\sigma_{{\mathbf{e}}}}$ from 0 to 0.5 enhances the transferability of adversarial examples in attacking normally trained models. However, excessively large values of these hyper-parameters can lead to inferior performance. The best performance of attacking normally trained models is achieved by setting $\lambda_{\varepsilon,\sigma}=0.5$ and $\lambda_{\varepsilon_{{\mathbf{e}}},\sigma_{{\mathbf{e}}}}=1$ , which results in an average success rate of $75.60\%$ . Considering the performance on attacking robust models, it can be observed that the average success rate peaks a larger value of $\lambda_{\varepsilon_{{\mathbf{e}}},\sigma_{{\mathbf{e}}}}$ . This is partially because that reducing the prediction loss of the perturbed inputs during fine-tuning in Eq. (8) resembles performing adversarial training, and larger $\lambda_{\varepsilon_{{\mathbf{e}}},\sigma_{{\mathbf{e}}}}$ implies making the substitute model robust in a larger neighborhood of benign inputs. A recent related method, namely DRA [23], also suggests that performing regularized fine-tuning before attacking is beneficial to attacking defensive models, and it achieves an average success rate of $62.03\%$ and $17.68\%$ in attacking normally trained models and adversarially trained models, respectively. By setting $\lambda_{\varepsilon_{{\mathbf{e}}},\sigma_{{\mathbf{e}}}}=5$ , our method has an average success rate ranging from $65.84\%$ to $67.51\%$ in attacking normally trained models and ranging from $18.48\%$ to $20.47\%$ in attacking adversarially trained models, which outperforms DRA considerably.

TABLE VI: Average success rates of attacking normally trained victim models and adversarially trained victim models on ImageNet, with different choices of

\lambda_{\varepsilon,\sigma}

and

\lambda_{\varepsilon_{{\mathbf{e}}},\sigma_{{\mathbf{e}}}}

. The substitute model is ResNet-50. The normally trained victim models are the same as those in Table II, and the adversarially trained models are the same as those in Table V. The symbol “-” signifies that the training does not converge.

		$\lambda_{\varepsilon_{{\mathbf{e}}},\sigma_{{\mathbf{e}}}}$
		0	0.5	1	5	10
$\lambda_{\varepsilon,\sigma}$	0	47.66%/10.86%	71.22%/12.77%	74.33%/14.11%	66.00%/20.47%	47.43%/22.72%
	0.1	50.74%/10.93%	73.19%/12.83%	74.08%/14.04%	66.14%/20.38%	48.23%/22.70%
	0.2	51.97%/11.04%	73.62%/12.80%	74.19%/13.84%	66.65%/19.94%	49.57%/22.22%
	0.5	54.24%/11.14%	73.78%/12.64%	75.60%/13.56%	67.51%/19.09%	49.09%/21.70%
	1	55.58%/11.26%	72.81%/12.54%	75.20%/13.33%	65.84%/18.48%	-
	2	51.67%/11.48%	66.02%/12.66%	70.78%/13.36%	-	-

The effect of $M$ and $S$ . Our method is proposed based on the principle that increasing the diversity of model parameters and model input can enhance the transferability of adversarial examples. In Figure 3, we conducted experiments to evaluate the transferability of adversarial examples crafted using different choices of the number of model parameters and input noise sampled at each attack iteration, i.e., $M$ and $S$ , in the cases of with and without fine-tuning. The average success rates were obtained by attacking the same victim models as in Table II. The results apparently show that sampling more substitute models and more input noise can indeed enhance the transferability of adversarial examples, just as expected.

5 Conclusion

In this paper, we aim at improving the transferability of adversarial examples. We have developed a Bayesian formulation for performing attacks, which can be equivalently regarded as generating adversarial examples on a set of infinitely many substitute models with input augmentations. We also advocated possible fine-tuning and advanced posterior approximations for improving the Bayesian model. Extensive experiments have been conducted on ImageNet and CIFAR-10 to demonstrate the effectiveness of the proposed method in generating transferable adversarial examples. It has been shown our method outperforms recent state-of-the-arts by large margins in attacking more than 10 DNNs, including convolutional networks and vision transformers and MLPs, as well as in attacking defensive models. We have also showcased the compatibility of our method with existing transfer-based attack methods, leading to even more powerful adversarial transferability.

References

[1] Q. Li, Y. Guo, W. Zuo, and H. Chen, “Making substitute models more bayesian can enhance transferability of adversarial examples,” in International Conference on Learning Representations, 2023.
[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Communications of the ACM, vol. 60, no. 6, pp. 84–90, 2017.
[3] R. Collobert and J. Weston, “A unified architecture for natural language processing: Deep neural networks with multitask learning,” in Proceedings of the 25th international conference on Machine learning, 2008, pp. 160–167.
[4] O. Abdel-Hamid, A.-r. Mohamed, H. Jiang, L. Deng, G. Penn, and D. Yu, “Convolutional neural networks for speech recognition,” IEEE/ACM Transactions on audio, speech, and language processing, vol. 22, no. 10, pp. 1533–1545, 2014.
[5] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” in ICLR, 2014.
[6] N. Papernot, P. McDaniel, and I. Goodfellow, “Transferability in machine learning: from phenomena to black-box attacks using adversarial samples,” arXiv preprint arXiv:1605.07277, 2016.
[7] Y. Liu, X. Chen, C. Liu, and D. Song, “Delving into transferable adversarial examples and black-box attacks,” in ICLR, 2017.
[8] Y. Li, S. Bai, Y. Zhou, C. Xie, Z. Zhang, and A. Yuille, “Learning transferable adversarial examples via ghost networks,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, 2020, pp. 11 458–11 465.
[9] Y. Xiong, J. Lin, M. Zhang, J. E. Hopcroft, and K. He, “Stochastic variance reduced ensemble adversarial attack for boosting the adversarial transferability,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 14 983–14 992.
[10] Q. Huang, I. Katsman, H. He, Z. Gu, S. Belongie, and S.-N. Lim, “Enhancing adversarial example transferability with an intermediate level attack,” in ICCV, 2019.
[11] Q. Li, Y. Guo, and H. Chen, “Yet another intermediate-leve attack,” in ECCV, 2020.
[12] D. Wu, Y. Wang, S.-T. Xia, J. Bailey, and X. Ma, “Rethinking the security of skip connections in resnet-like neural networks,” in ICLR, 2020.
[13] Y. Guo, Q. Li, and H. Chen, “Backpropagating linearly improves transferability of adversarial examples,” in NeurIPS, 2020.
[14] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “Imagenet large scale visual recognition challenge,” IJCV, 2015.
[15] A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,” Citeseer, Tech. Rep., 2009.
[16] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in ICLR, 2015.
[17] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial machine learning at scale,” in ICLR, 2017.
[18] Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li, “Boosting adversarial attacks with momentum,” in CVPR, 2018.
[19] J. Lin, C. Song, K. He, L. Wang, and J. E. Hopcroft, “Nesterov accelerated gradient and scale invariance for adversarial attacks,” arXiv preprint arXiv:1908.06281, 2019.
[20] Y. Huang and A. W.-K. Kong, “Transferable adversarial attack based on integrated gradients,” arXiv preprint arXiv:2205.13152, 2022.
[21] Y. Guo, Q. Li, W. Zuo, and H. Chen, “An intermediate-level attack framework on the basis of linear regression,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
[22] J. Springer, M. Mitchell, and G. Kenyon, “A little robustness goes a long way: Leveraging robust features for targeted transfer attacks,” Advances in Neural Information Processing Systems, vol. 34, 2021.
[23] Y. Zhu, Y. Chen, X. Li, K. Chen, Y. He, X. Tian, B. Zheng, Y. Chen, and Q. Huang, “Toward understanding and boosting adversarial transferability from a distribution perspective,” IEEE Transactions on Image Processing, vol. 31, pp. 6487–6501, 2022.
[24] Y. Dong, T. Pang, H. Su, and J. Zhu, “Evading defenses to transferable adversarial examples by translation-invariant attacks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 4312–4321.
[25] C. Xie, Z. Zhang, Y. Zhou, S. Bai, J. Wang, Z. Ren, and A. L. Yuille, “Improving transferability of adversarial examples with input diversity,” in CVPR, 2019.
[26] X. Wang, X. He, J. Wang, and K. He, “Admix: Enhancing the transferability of adversarial attacks,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 16 158–16 167.
[27] M. Gubri, M. Cordy, M. Papadakis, Y. Le Traon, and K. Sen, “Efficient and transferable adversarial examples from bayesian neural networks,” in Uncertainty in Artificial Intelligence. PMLR, 2022, pp. 738–748.
[28] M. Gubri, M. Cordy, M. Papadakis, Y. L. Traon, and K. Sen, “Lgv: Boosting adversarial example transferability from large geometric vicinity,” arXiv preprint arXiv:2207.13129, 2022.
[29] R. Johnson and T. Zhang, “Accelerating stochastic gradient descent using predictive variance reduction,” Advances in neural information processing systems, vol. 26, 2013.
[30] A. Graves, “Practical variational inference for neural networks,” Advances in neural information processing systems, vol. 24, 2011.
[31] C. Blundell, J. Cornebise, K. Kavukcuoglu, and D. Wierstra, “Weight uncertainty in neural network,” in International conference on machine learning. PMLR, 2015, pp. 1613–1622.
[32] D. P. Kingma, T. Salimans, and M. Welling, “Variational dropout and the local reparameterization trick,” Advances in neural information processing systems, vol. 28, 2015.
[33] M. Khan, D. Nielsen, V. Tangkaratt, W. Lin, Y. Gal, and A. Srivastava, “Fast and scalable bayesian deep learning by weight-perturbation in adam,” in International Conference on Machine Learning. PMLR, 2018, pp. 2611–2620.
[34] G. Zhang, S. Sun, D. Duvenaud, and R. Grosse, “Noisy natural gradient as variational inference,” in International Conference on Machine Learning. PMLR, 2018, pp. 5852–5861.
[35] A. Wu, S. Nowozin, E. Meeds, R. E. Turner, J. M. Hernandez-Lobato, and A. L. Gaunt, “Deterministic variational inference for robust bayesian neural networks,” arXiv preprint arXiv:1810.03958, 2018.
[36] K. Osawa, S. Swaroop, M. E. E. Khan, A. Jain, R. Eschenhagen, R. E. Turner, and R. Yokota, “Practical deep learning with bayesian principles,” Advances in neural information processing systems, vol. 32, 2019.
[37] M. Dusenberry, G. Jerfel, Y. Wen, Y. Ma, J. Snoek, K. Heller, B. Lakshminarayanan, and D. Tran, “Efficient and scalable bayesian neural nets with rank-1 factors,” in International conference on machine learning. PMLR, 2020, pp. 2782–2792.
[38] Y. Gal and Z. Ghahramani, “Dropout as a bayesian approximation: Representing model uncertainty in deep learning,” in international conference on machine learning. PMLR, 2016, pp. 1050–1059.
[39] A. Kendall and Y. Gal, “What uncertainties do we need in bayesian deep learning for computer vision?” Advances in neural information processing systems, vol. 30, 2017.
[40] Y. Gal, J. Hron, and A. Kendall, “Concrete dropout,” Advances in neural information processing systems, vol. 30, 2017.
[41] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska et al., “Overcoming catastrophic forgetting in neural networks,” Proceedings of the national academy of sciences, vol. 114, no. 13, pp. 3521–3526, 2017.
[42] H. Ritter, A. Botev, and D. Barber, “A scalable laplace approximation for neural networks,” in 6th International Conference on Learning Representations, ICLR 2018-Conference Track Proceedings, vol. 6. International Conference on Representation Learning, 2018.
[43] D. X. Li, “On default correlation: A copula function approach,” The Journal of Fixed Income, vol. 9, no. 4, pp. 43–54, 2000.
[44] S. Mandt, M. D. Hoffman, and D. M. Blei, “Stochastic gradient descent as approximate bayesian inference,” arXiv preprint arXiv:1704.04289, 2017.
[45] W. J. Maddox, P. Izmailov, T. Garipov, D. P. Vetrov, and A. G. Wilson, “A simple baseline for bayesian uncertainty in deep learning,” Advances in Neural Information Processing Systems, vol. 32, 2019.
[46] W. Maddox, S. Tang, P. Moreno, A. G. Wilson, and A. Damianou, “Fast adaptation with linearized neural networks,” in International Conference on Artificial Intelligence and Statistics. PMLR, 2021, pp. 2737–2745.
[47] A. G. Wilson and P. Izmailov, “Bayesian deep learning and a probabilistic perspective of generalization,” Advances in neural information processing systems, vol. 33, pp. 4697–4708, 2020.
[48] L. Cardelli, M. Kwiatkowska, L. Laurenti, N. Paoletti, A. Patane, and M. Wicker, “Statistical guarantees for the robustness of bayesian neural networks,” arXiv preprint arXiv:1903.01980, 2019.
[49] M. Wicker, L. Laurenti, A. Patane, and M. Kwiatkowska, “Probabilistic safety for bayesian neural networks,” in Conference on Uncertainty in Artificial Intelligence. PMLR, 2020, pp. 1198–1207.
[50] X. Liu, Y. Li, C. Wu, and C.-J. Hsieh, “Adv-bnn: Improved adversarial defense through robust bayesian neural network,” arXiv preprint arXiv:1810.01279, 2018.
[51] M. Yuan, M. Wicker, and L. Laurenti, “Gradient-free adversarial attacks for bayesian neural networks,” arXiv preprint arXiv:2012.12640, 2020.
[52] G. Carbone, M. Wicker, L. Laurenti, A. Patane, L. Bortolussi, and G. Sanguinetti, “Robustness of bayesian neural networks to gradient-based attacks,” Advances in Neural Information Processing Systems, vol. 33, pp. 15 602–15 613, 2020.
[53] Y. Li, J. Bradshaw, and Y. Sharma, “Are generative classifiers more robust to adversarial attacks?” in International Conference on Machine Learning. PMLR, 2019, pp. 3804–3814.
[54] Z. Wang, Y. Guo, and W. Zuo, “Deepfake forensics via an adversarial game,” IEEE Transactions on Image Processing, vol. 31, pp. 3541–3552, 2022.
[55] P. Foret, A. Kleiner, H. Mobahi, and B. Neyshabur, “Sharpness-aware minimization for efficiently improving generalization,” arXiv preprint arXiv:2010.01412, 2020.
[56] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in ICLR, 2015.
[57] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in CVPR, 2016.
[58] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016.
[59] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in CVPR, 2017.
[60] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, “A convnet for the 2020s,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 11 976–11 986.
[61] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.
[62] H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jégou, “Training data-efficient image transformers & distillation through attention,” in International Conference on Machine Learning. PMLR, 2021, pp. 10 347–10 357.
[63] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10 012–10 022.
[64] H. Bao, L. Dong, S. Piao, and F. Wei, “Beit: Bert pre-training of image transformers,” in International Conference on Learning Representations, 2021.
[65] I. O. Tolstikhin, N. Houlsby, A. Kolesnikov, L. Beyer, X. Zhai, T. Unterthiner, J. Yung, A. Steiner, D. Keysers, J. Uszkoreit et al., “Mlp-mixer: An all-mlp architecture for vision,” Advances in Neural Information Processing Systems, vol. 34, pp. 24 261–24 272, 2021.
[66] P. Izmailov, D. Podoprikhin, T. Garipov, D. Vetrov, and A. G. Wilson, “Averaging weights leads to wider optima and better generalization,” arXiv preprint arXiv:1803.05407, 2018.
[67] J. Zhang, W. Wu, J.-t. Huang, Y. Huang, W. Wang, Y. Su, and M. R. Lyu, “Improving adversarial transferability via neuron attribution-based attacks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 14 993–15 002.
[68] S. Zagoruyko and N. Komodakis, “Wide residual networks,” in BMVC, 2016.
[69] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, “Aggregated residual transformations for deep neural networks,” in CVPR, 2017.
[70] D. Han, J. Kim, and J. Kim, “Deep pyramidal residual networks,” in CVPR, 2017, pp. 5927–5935.
[71] X. Dong and Y. Yang, “Searching for a robust neural architecture in four gpu hours,” in CVPR, 2019.
[72] R. Wightman, “Pytorch image models,” https://github.com/rwightman/pytorch-image-models, 2019.
[73] F. Croce, M. Andriushchenko, V. Sehwag, E. Debenedetti, N. Flammarion, M. Chiang, P. Mittal, and M. Hein, “Robustbench: a standardized adversarial robustness benchmark,” arXiv preprint arXiv:2010.09670, 2020.
[74] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” in ICLR, 2018.
[75] E. Wong, L. Rice, and J. Z. Kolter, “Fast is better than free: Revisiting adversarial training,” arXiv preprint arXiv:2001.03994, 2020.
[76] C. Xie, M. Tan, B. Gong, J. Wang, A. L. Yuille, and Q. V. Le, “Adversarial examples improve image recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 819–828.
[77] F. Croce and M. Hein, “Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks,” in ICML, 2020.
[78] C. Liu, Y. Dong, W. Xiang, X. Yang, H. Su, J. Zhu, Y. Chen, Y. He, H. Xue, and S. Zheng, “A comprehensive study on robustness of image classification models: Benchmarking and rethinking,” arXiv preprint arXiv:2302.14301, 2023.
[79] N. D. Singh, F. Croce, and M. Hein, “Revisiting adversarial training for imagenet: Architectures, training and generalization across threat models,” arXiv preprint arXiv:2303.01870, 2023.
[80] Y. Bai, J. Mei, A. Yuille, and C. Xie, “Are transformers more robust than cnns?” in Thirty-Fifth Conference on Neural Information Processing Systems, 2021.
[81] J. Cohen, E. Rosenfeld, and Z. Kolter, “Certified adversarial robustness via randomized smoothing,” in International Conference on Machine Learning. PMLR, 2019, pp. 1310–1320.