Reducing Adversarial Training Cost with Gradient Approximation
Abstract
Deep learning models have achieved state-of-the-art performance in various domains, yet they are vulnerable to inputs with well-crafted but small perturbations, known as adversarial examples (AEs). Among the many strategies for improving model robustness against AEs, Projected Gradient Descent (PGD) based adversarial training is one of the most effective. Unfortunately, the prohibitive computational overhead of generating sufficiently strong AEs, due to the inner maximization of the loss function, sometimes makes regular PGD adversarial training impractical for larger and more complicated models. In this paper, we propose to approximate the adversarial loss by a partial sum of its Taylor series. Building on this, we approximate the gradient of the adversarial loss and propose a new and efficient adversarial training method, adversarial training with gradient approximation (GAAT), to reduce the cost of building robust models. Extensive experiments demonstrate that this efficiency improvement comes with little or no loss in accuracy on natural and adversarial examples: our method saves up to 60% of the training time while achieving comparable test accuracy on the MNIST, CIFAR-10 and CIFAR-100 datasets.
1 Introduction
In recent years, deep learning models have achieved impressive performance in a variety of domains, including autonomous driving, machine translation, automatic speech recognition and cyber defence. However, these deep networks are extremely vulnerable to specially designed inputs with imperceptible perturbations, known as adversarial examples. To defend against this threat, researchers have proposed numerous robust deep networks, which generally fall into two main types: provable defences Cohen et al. (2019); Salman et al. (2019); Singla and Feizi (2020) and empirical defences Kurakin et al. (2016); Madry et al. (2018). Among the proposed methods, adversarial training, a well-known empirical defence, is one of the most effective Cai et al. (2018); Wang et al. (2019); Hendrycks et al. (2019). In each adversarial training epoch, adversarial versions of the natural training data are generated and then used to train the model, increasing its robustness to such adversaries.
Table 1: Training time of standard training and regular adversarial training (RAT) on the CIFAR-10 dataset.

| Model | Std Training | RAT |
|---|---|---|
| VGGNet-16 | 31 min | 421 min |
| ResNet-18 | 17 min | 189 min |
| ResNet-50 | 41 min | 563 min |
| WideResNet-28x4 | 94 min | 1345 min |
The bulk of research interest in adversarial examples concerns generating stronger attacks and improving the adversarial robustness of deep models, while far less attention is paid to the cost of training a robust model. As Table 1 shows, regular adversarial training (RAT) is much more costly than standard training on the CIFAR-10 image classification task for four common deep models, using the same hardware, training dataset, training hyper-parameters and total number of epochs. Reducing the computational cost of adversarial training is therefore worthwhile. One related line of work trains an ensemble of deep models Strauss et al. (2017); Tramèr et al. (2018); Pang et al. (2019), which greatly reduces the training time of each individual model. In Shafahi et al. (2019), the authors simultaneously generated adversarial examples and updated model parameters, reducing the number of training epochs while maintaining comparable accuracy.
In this paper, we propose a novel adversarial training method, adversarial training with gradient approximation (GAAT), which significantly reduces the cost of adversarial training by approximating the gradient of the adversarial loss. Specifically, exploiting the imperceptibility (i.e., small magnitude) of the adversarial perturbation, we approximate the adversarial loss using the standard loss, its gradient and second-order derivative with respect to the inputs, and the adversarial perturbation itself. We then use this approximation to generate strong adversarial examples, which are finally used for adversarial training to improve the robustness of deep models.
Our contributions. We propose a simple but effective method, adversarial training with gradient approximation (GAAT), which trains robust deep models faster than RAT. The key idea of GAAT is to replace the costly gradient of the adversarial loss with an approximate gradient. Based on popular deep learning models (shallow neural networks Wong and Kolter (2018), VGGNet Simonyan and Zisserman (2014), ResNet He et al. (2015) and WideResNet Zagoruyko and Komodakis (2016)), our robust models trained on the MNIST, CIFAR-10 and CIFAR-100 datasets save up to 60% of the training time while achieving robustness comparable to models trained with RAT. Moreover, we compare our method with delayed adversarial training Gupta et al. (2020) and combine the two to further improve training efficiency.
2 Background and Related Work
Since they were proposed in 2014 Szegedy et al. (2014), adversarial examples have attracted intense interest among machine learning researchers. To date, many papers have studied adversarial attacks and robust models. In this section, we only highlight the methods and concepts most relevant to our work.
Regular adversarial training. The most commonly used methods for creating adversarial examples are based on projected gradient descent (PGD), which originally referred to the basic iterative method proposed by Kurakin et al. Kurakin et al. (2017). Several stronger threat models were subsequently proposed; in particular, Madry et al. Madry et al. (2018) introduced the well-known PGD attack, which is regarded as the most effective first-order attack. Interestingly, this attack can be turned into a heuristic defence by using the resulting adversarial examples in adversarial training, which serves as an effective defence in practice. Concretely, let $x$ be an image with label $y$, and let $\mathcal{B}_\epsilon$ be some ball around $x$ with radius $\epsilon$. The PGD attack is formulated as the following iteration:

$$\delta^{t+1} = \mathcal{P}_{\mathcal{B}_\epsilon}\!\left(\delta^{t} + \alpha\,\operatorname{sign}\!\big(\nabla_{\delta}\,\ell(f_\theta(x+\delta^{t}),\, y)\big)\right), \quad t = 0, 1, \ldots, T-1, \qquad (1)$$

where $\delta^{0}$ is the initial perturbation (usually set to zero or a random value), $\alpha$ is the step size, and $\mathcal{P}_{\mathcal{B}_\epsilon}$ projects the result of the gradient step back into the $\epsilon$-ball. In this paper, we consider the $\ell_\infty$ ball $\mathcal{B}_\epsilon = \{\delta : \|\delta\|_\infty \le \epsilon\}$. The $\operatorname{sign}(\cdot)$ operator is used to make the iteration converge faster, because the raw gradient steps are typically too small. If the iteration runs for $T$ steps, it is called a $T$-step PGD attack. The larger $T$ is, the stronger the adversarial examples are, and the greater the chance that a well-trained classifier $f_\theta$ misclassifies them. Beyond its effectiveness as an attack on a trained classifier, PGD also suggests how to defend against such strong first-order adversaries. Specifically, instead of minimizing the loss function evaluated at a natural example $x$, we minimize the loss function on an adversarial example $x + \delta^{*}$, where $\delta^{*}$ is generated by the $T$-step PGD attack (1) over some ball $\mathcal{B}_\epsilon$. We refer to this PGD-based adversarial training as regular adversarial training (RAT) and summarize it in Algorithm 1.
Algorithm 1: Regular Adversarial Training (RAT)
Input: batches of natural images $\{(x, y)\}$
Parameter: training epochs $N$, model $f_\theta$ to be trained, learning rate $\eta$, PGD parameters $\{T, \alpha, \epsilon\}$
Output: robust model parameters $\theta$
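To make the preceding description concrete, the following PyTorch-style sketch implements the $T$-step $\ell_\infty$ PGD attack (1) and the RAT loop that consumes it. The function names, the cross-entropy loss, the random start and the clamp of pixels to $[0, 1]$ are our assumptions, not details taken from Algorithm 1.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha, steps, random_start=True):
    """T-step l_inf PGD attack, a sketch of iteration (1)."""
    # delta^0: zero or a random point inside the eps-ball
    delta = (torch.empty_like(x).uniform_(-eps, eps) if random_start
             else torch.zeros_like(x))
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)          # l(f_theta(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]           # gradient w.r.t. delta
        # ascent step with sign(), then projection back into the eps-ball
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
    return (x + delta).clamp(0, 1).detach()                  # assumes inputs in [0, 1]

def regular_adversarial_training(model, loader, optimizer, epochs,
                                 eps, alpha, steps, device="cuda"):
    """Sketch of RAT: train on T-step PGD adversaries of every mini-batch."""
    criterion = torch.nn.CrossEntropyLoss()
    model.to(device).train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            x_adv = pgd_attack(model, x, y, eps, alpha, steps)  # inner maximization
            optimizer.zero_grad()
            loss = criterion(model(x_adv), y)                   # outer minimization
            loss.backward()
            optimizer.step()
    return model
```

The paper's hyper-parameters (Section 5.1) would be substituted for eps, alpha and steps.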
Improving efficiency of adversarial training. Another relevant line of work aims to reduce the overhead of adversarial training. Shafahi et al. Shafahi et al. (2019) took advantage of the gradient information from model updates to reduce the cost of generating adversaries and reported training about 7 times faster than RAT. However, their method must be run multiple times to select a better model, which adds to the overall training time; this matters because suboptimal models achieve suboptimal generalization and robustness Gupta et al. (2020). Wong et al. Wong et al. (2020) proposed fast adversarial training (Fast AT), which reuses the well-known fast gradient sign method Goodfellow et al. (2015). By initializing the perturbation with a random value, Fast AT is as effective as RAT at a much lower cost. Nevertheless, Fast AT has its limitations: it sometimes fails due to catastrophic overfitting, as illustrated in Wong et al. (2020); Li et al. (2020). Gupta et al. Gupta et al. (2020) showed that using natural examples for the initial epochs and adversarial examples for the remaining epochs achieves robustness comparable to RAT; this method, denoted delayed adversarial training (DAT), consumes much less time and fewer computational resources.
Taylor series. Last but not least, our method relies on the Taylor series, an infinite sum of terms calculated from the derivatives of a function at a single point. Let $C^{n}[a,b]$ denote the set of all functions $h$ such that $h$ and its first, second, …, $n$th-order derivatives are continuous on the computational interval $[a,b]$. The Taylor series of a function $h \in C^{\infty}[a,b]$ about a point $x_{0} \in [a,b]$ is given by

$$h(x) = \sum_{k=0}^{\infty} \frac{h^{(k)}(x_{0})}{k!}\,(x - x_{0})^{k},$$

where $h^{(k)}$ denotes the $k$th-order derivative of $h$. In mathematics and engineering, e.g., future minimization Zhang et al. (2017) and quadratic programming Zhang et al. (2019), a partial sum of the Taylor series is often used to approximate a target function to an acceptable precision Mathews and Fink (2005). In recent years, Taylor series have also increasingly been applied in machine learning and computer vision, for example to rectifier networks Balduzzi et al. (2017), saliency-map interpretation Singla et al. (2019) and meta-learning Gonzalez and Miikkulainen (2020).
3 Motivating Gradient Approximation
By analysing the cost of the PGD adversarial training method, we find that one of its most time-consuming operations is computing the gradient of the adversarial loss, $\nabla_{\delta}\,\ell(f_\theta(x+\delta), y)$. The classifier $f_\theta$ is a highly complex non-linear function (multiple layers of linear combinations followed by non-linear activations), so computing this gradient $T$ times in the $T$-step PGD adversarial training method is very costly. We find that the adversarial loss can be approximated by a partial Taylor series, which lets us circumvent the expensive gradient computation and thereby accelerate adversarial training.
3.1 Adversarial Loss Approximation
First of all, we present the third-order Taylor series of a scalar function $h \in C^{3}[a,b]$ as follows:

$$h(x + \Delta x) = h(x) + h'(x)\,\Delta x + \frac{1}{2}h''(x)\,\Delta x^{2} + R_{3}(x + \Delta x),$$

where $R_{3}(x + \Delta x) = \frac{h'''(\xi)}{3!}\,\Delta x^{3}$, with $\xi$ lying between $x$ and $x + \Delta x$, is the Lagrangian remainder. Because $x$ is a constant at the expansion point, we can rewrite the remainder as $R_{3}(\Delta x)$, a function of $\Delta x$. If $\Delta x$ is small enough, we can omit the remainder $R_{3}(\Delta x)$ and approximate $h(x + \Delta x)$ as follows:

$$h(x + \Delta x) \approx h(x) + h'(x)\,\Delta x + \frac{1}{2}h''(x)\,\Delta x^{2}. \qquad (2)$$
In our setting, due to the small value of the imperceptible perturbation $\delta$, it is reasonable to approximate the adversarial loss (for convenience of presentation, we abbreviate the adversarial loss $\ell(f_\theta(x+\delta), y)$ as $\ell(x+\delta)$) in matrix form by Formula (2):

$$\ell(x + \delta) \approx \ell(x) + J\delta + \frac{1}{2}\,\delta^{\top} H \delta, \qquad (3)$$

where $J = \partial \ell(x)/\partial x$ is the first-order derivative matrix, i.e., the Jacobian matrix, and $H = \partial^{2} \ell(x)/\partial x^{2}$ is the second-order derivative matrix, i.e., the Hessian matrix.
3.2 Empirical Observations of Adversarial Loss Approximation
In order to evaluate the effectiveness of the approximation (3), we define the residual error between the true adversarial loss and the approximate loss as

$$\varepsilon = \big|\, \ell(x+\delta) - \hat{\ell}(x+\delta) \,\big|, \qquad (4)$$

where $|\cdot|$ computes the absolute value and $\hat{\ell}(x+\delta)$ denotes the approximation on the right-hand side of (3); the Hessian matrix $H$ in (3) is approximated by Equation (8).
We measure $\varepsilon$ for two residual networks, ResNet-18 and ResNet-50, trained with RAT on the CIFAR-10 dataset, where the adversaries are generated with 1, 4, 10 and 16 steps of the PGD attack (1). $\varepsilon$ is computed throughout training on the CIFAR-10 training set. The results are shown in Figure 1. In the initial epochs, $\varepsilon$ is relatively large, and it decreases as training proceeds. Moreover, the more PGD steps are used, the larger $\varepsilon$ is, yet it still converges to very small values for both ResNet-18 and ResNet-50. However, training with even more PGD steps does not make $\varepsilon$ grow further, e.g., $\varepsilon$ for 16-step PGD is approximately equal to $\varepsilon$ for 10-step PGD.
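As an illustration only, the sketch below measures a residual of the form (4) on a single batch, using the Gauss-Newton Hessian (8) introduced in Section 4.2, under which the quadratic term of (3) reduces to $\frac{1}{2}(J\delta)^{2}$. The function name is ours, and since any normalization applied in (4) is not recoverable from the text, the raw absolute difference is returned.

```python
import torch
import torch.nn.functional as F

def residual_error(model, x, y, delta):
    """|l(x + delta) - l_hat(x + delta)|, with l_hat from (3) and H ~ J^T J as in (8)."""
    x = x.clone().requires_grad_(True)
    loss_nat = F.cross_entropy(model(x), y)                    # l(x), mean over the batch
    g = torch.autograd.grad(loss_nat, x)[0]                    # J (as a gradient tensor)

    g_dot_d = (g * delta).sum()                                # J delta
    loss_hat = loss_nat.detach() + g_dot_d + 0.5 * g_dot_d ** 2
    with torch.no_grad():
        loss_adv = F.cross_entropy(model(x + delta), y)        # l(x + delta)
    return (loss_adv - loss_hat).abs()
```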
4 Adversarial Training with Gradient Approximation
From the section above, we make the empirical observation that $\varepsilon$ decreases as training proceeds. In this section, based on the approximation (3), we derive an approximation of the gradient of the adversarial loss $\ell(x+\delta)$ that can be used to accelerate adversarial training.
4.1 Gradient Approximation of Adversarial Loss
Instead of computing the partial derivative of the true adversarial loss with respect to $\delta$, we take the partial derivative of the approximate loss (3) and obtain:

$$\frac{\partial \ell(x+\delta)}{\partial \delta} \approx J^{\top} + H\delta. \qquad (5)$$

Furthermore, we replace the update formula of Line 6 in Algorithm 1 with

$$\delta^{t+1} = \mathcal{P}_{\mathcal{B}_\epsilon}\!\left(\delta^{t} + \alpha\,\operatorname{sign}\!\big(J^{\top} + H\delta^{t}\big)\right). \qquad (6)$$
In $T$-step PGD adversarial training, the regular PGD method computes the partial derivative $\nabla_{\delta}\,\ell(f_\theta(x+\delta), y)$ $T$ times per batch. In contrast, apart from some very cheap operations (additions and multiplications), adversarial training with the gradient approximation (5) only computes the Jacobian matrix $J$ and the Hessian matrix $H$ of $\ell$ with respect to the natural examples once.
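The inner loop implied by (5) and (6) can be sketched as follows: the gradient at the natural example is computed once and reused by every PGD step, with $H\delta$ evaluated here as an exact Hessian-vector product via double backpropagation. Note that an exact Hessian-vector product still costs a backward pass per step, which is precisely why Section 4.2 replaces $H$ with cheaper approximations; a sketch of that cheap variant follows Algorithm 2 below. Function and argument names are ours.

```python
import torch
import torch.nn.functional as F

def gaat_perturb_exact_hvp(model, x, y, eps, alpha, steps):
    """PGD-style perturbation with the approximate gradient (5), J^T + H delta,
    where H delta is computed exactly as a Hessian-vector product."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    g = torch.autograd.grad(loss, x, create_graph=True)[0]      # J^T, graph kept for HVPs

    delta = torch.zeros_like(x)
    for _ in range(steps):
        # H @ delta = d/dx (g . delta), with delta treated as a constant
        hvp = torch.autograd.grad((g * delta).sum(), x, retain_graph=True)[0]
        approx_grad = g.detach() + hvp                           # approximation (5)
        delta = (delta + alpha * approx_grad.sign()).clamp(-eps, eps)  # update (6)
    return (x + delta).detach()
```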
4.2 Hessian Matrix Approximation
In Equations (5) and (6), the Jacobian matrix $J$ is easily obtained by backpropagation during adversarial training, whereas the Hessian matrix $H$ is usually difficult or very expensive to compute. Therefore, we also use approximations to simplify this calculation.
Use $J^{\top}J$ to approximate $H$. Referring to the Gauss-Newton method, we use the Jacobian matrix to roughly approximate the Hessian matrix. The Gauss-Newton method is a specialized method for minimizing a least-squares cost. Given a point $x_{k}$, the Gauss-Newton method in our setting is based on the following objective function:

$$\min_{x}\ \frac{1}{2}\,\big\|\ell(x_{k}) + J(x_{k})\,(x - x_{k})\big\|^{2}, \qquad (7)$$

where we use the first-order Taylor series to approximate $\ell(x)$. The gradient of the underlying least-squares cost $\frac{1}{2}\|\ell(x)\|^{2}$ is $J^{\top}\ell$ (here, we abbreviate $\ell(x_{k})$ as $\ell$ and $J(x_{k})$ as $J$), and its Hessian matrix is $J^{\top}J + \sum_{i}\ell_{i}\,\nabla^{2}\ell_{i}$. Omitting the second term, we obtain:

$$H \approx J^{\top}J. \qquad (8)$$
Although the objective function (7) differs from the objective function in our task, we can still use (8) to roughly approximate the Hessian matrix $H$ in (5) and (6), simplifying the calculation.
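Because the loss is a scalar, its Jacobian with respect to the input is simply the gradient $g$, so (8) becomes the outer product $H \approx g\,g^{\top}$ and the curvature term in (5) collapses to a per-example rescaling of $g$, i.e., $H\delta \approx g\,(g^{\top}\delta)$, with no second backward pass. A small helper sketch (the name, the NCHW shape and the per-example treatment of $g^{\top}\delta$ are our assumptions):

```python
import torch

def approx_grad_gauss_newton(g: torch.Tensor, delta: torch.Tensor) -> torch.Tensor:
    """Approximation (5) with the Gauss-Newton Hessian (8):
    J^T + (g g^T) delta = g * (1 + g . delta). Assumes shape (N, C, H, W)."""
    g_dot_delta = (g * delta).flatten(1).sum(dim=1).view(-1, 1, 1, 1)  # per-image g . delta
    return g + g * g_dot_delta
```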
Omit partial derivatives to approximate $H$. Given an image $x \in \mathbb{R}^{d}$ (for convenience, we reshape the image matrix as a vector), the Hessian matrix can be written as

$$H = \begin{bmatrix} \dfrac{\partial^{2}\ell}{\partial x_{1}^{2}} & \dfrac{\partial^{2}\ell}{\partial x_{1}\partial x_{2}} & \cdots & \dfrac{\partial^{2}\ell}{\partial x_{1}\partial x_{d}} \\ \dfrac{\partial^{2}\ell}{\partial x_{2}\partial x_{1}} & \dfrac{\partial^{2}\ell}{\partial x_{2}^{2}} & \cdots & \dfrac{\partial^{2}\ell}{\partial x_{2}\partial x_{d}} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^{2}\ell}{\partial x_{d}\partial x_{1}} & \dfrac{\partial^{2}\ell}{\partial x_{d}\partial x_{2}} & \cdots & \dfrac{\partial^{2}\ell}{\partial x_{d}^{2}} \end{bmatrix}, \qquad (9)$$

where $x_{i}$ (or $x_{j}$) is the $i$th (or $j$th) element of the image vector, $i, j \in \{1, 2, \ldots, d\}$, and $\ell = \ell(x)$. When $i \neq j$, the mixed partial derivatives $\partial^{2}\ell/\partial x_{i}\partial x_{j}$ are very difficult to calculate; however, when $i = j$, the second-order derivatives $\partial^{2}\ell/\partial x_{i}^{2}$ are easier to obtain. So another idea for approximating $H$ is to omit the mixed partial derivatives $\partial^{2}\ell/\partial x_{i}\partial x_{j}$ ($i \neq j$) by setting them to zero, which yields the following diagonal matrix for approximating the Hessian matrix:

$$H \approx \operatorname{diag}\!\left(\frac{\partial^{2}\ell}{\partial x_{1}^{2}},\ \frac{\partial^{2}\ell}{\partial x_{2}^{2}},\ \ldots,\ \frac{\partial^{2}\ell}{\partial x_{d}^{2}}\right). \qquad (10)$$
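For completeness, the toy snippet below forms the diagonal approximation (10) by materializing a full Hessian and keeping only its diagonal. It is purely illustrative: the scalar loss here is our own stand-in, and building the full Hessian of a real image-classification loss is exactly the cost the approximations above are meant to avoid.

```python
import torch

torch.manual_seed(0)
w = torch.randn(4)                          # stand-in parameters for a toy scalar loss

def toy_loss(x):
    """A smooth scalar function of the input, playing the role of l(x)."""
    return torch.tanh(w @ x).pow(2)

x = torch.randn(4)
H = torch.autograd.functional.hessian(toy_loss, x)   # full 4x4 Hessian, as in (9)
H_diag = torch.diag(torch.diagonal(H))               # diagonal approximation (10)
```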
Before ending this section, we summarize adversarial training with gradient approximation (GAAT) in Algorithm 2, using (8) to approximate $H$.
Algorithm 2: Adversarial Training with Gradient Approximation (GAAT)
Input: batches of natural images $\{(x, y)\}$
Parameter: training epochs $N$, model $f_\theta$ to be trained, learning rate $\eta$, PGD parameters $\{T, \alpha, \epsilon\}$
Output: robust model parameters $\theta$
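Putting the pieces together, one mini-batch step of Algorithm 2 with the Gauss-Newton form (8) might look like the sketch below: a single backward pass at the natural images supplies $g$, and the $T$ PGD steps then reuse it through the cheap surrogate gradient $g\,(1 + g^{\top}\delta)$. The names, the $[0, 1]$ pixel range and the per-example treatment of $g^{\top}\delta$ are our assumptions; the paper's step size and radius would be supplied as alpha and eps.

```python
import torch
import torch.nn.functional as F

def gaat_train_step(model, optimizer, x, y, eps, alpha, steps=10):
    """One mini-batch step of GAAT (Algorithm 2) with H ~ g g^T from (8)."""
    # One backward pass at the natural images gives g (the Jacobian of the loss).
    x_nat = x.clone().requires_grad_(True)
    loss_nat = F.cross_entropy(model(x_nat), y)
    g = torch.autograd.grad(loss_nat, x_nat)[0].detach()

    # T PGD steps that never touch the network again: update (6) with (5) and (8).
    delta = torch.zeros_like(x)
    for _ in range(steps):
        g_dot_d = (g * delta).flatten(1).sum(dim=1).view(-1, 1, 1, 1)
        approx_grad = g + g * g_dot_d
        delta = (delta + alpha * approx_grad.sign()).clamp(-eps, eps)
    x_adv = (x + delta).clamp(0, 1).detach()

    # Standard update on the generated adversarial examples.
    optimizer.zero_grad()
    loss_adv = F.cross_entropy(model(x_adv), y)
    loss_adv.backward()
    optimizer.step()
    return loss_adv.item()
```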
Table 2: Training time, time saved by GAAT relative to RAT, and test accuracy of all robust models on the different datasets.

| Dataset | Model | Method | Training time | Time saved | Nat acc | Adv acc |
|---|---|---|---|---|---|---|
| CIFAR-10 | VGGNet-16 | RAT | 421 min | — | 87.00% | 46.64% |
| CIFAR-10 | VGGNet-16 | GAAT | 226 min | 46.32% | 89.05% | 45.50% |
| CIFAR-10 | ResNet-18 | RAT | 189 min | — | 82.54% | 42.10% |
| CIFAR-10 | ResNet-18 | GAAT | 132 min | 30.16% | 85.49% | 41.39% |
| CIFAR-10 | ResNet-50 | RAT | 563 min | — | 86.60% | 46.26% |
| CIFAR-10 | ResNet-50 | GAAT | 277 min | 50.80% | 87.83% | 44.54% |
| CIFAR-10 | WideResNet-28x4 | RAT | 1345 min | — | 87.72% | 47.07% |
| CIFAR-10 | WideResNet-28x4 | GAAT | 537 min | 60.07% | 90.34% | 46.27% |
| CIFAR-100 | ResNet-18 | RAT | 189 min | — | 58.24% | 11.55% |
| CIFAR-100 | ResNet-18 | GAAT | 133 min | 29.63% | 59.58% | 9.90% |
| CIFAR-100 | ResNet-50 | RAT | 564 min | — | 63.86% | 15.30% |
| CIFAR-100 | ResNet-50 | GAAT | 273 min | 51.60% | 66.66% | 13.25% |
| MNIST | Convolutional ReLU | RAT | 32 min | — | 99.31% | 96.13% |
| MNIST | Convolutional ReLU | GAAT | 28 min | 12.50% | 99.30% | 95.13% |
5 Extensive Experiments and Results
In this section, we evaluate our proposed method for improving adversarial training efficiency. We compared the training time and test accuracy (i.e., natural accuracy and adversarial accuracy) of the models trained with GAAT as well as the models trained with RAT. To test adversarial accuracy, we used the powerful PGD attack (1). MNIST, CIFAR-10 and CIFAR-100 datasets were used in our experiments. Besides, we compared the training time and test accuracy between our method and delayed adversarial training (DAT) Gupta et al. (2020). Additionally, our method was combined with DAT to further improve the training efficiency of the robust models.
5.1 Evaluation Setup
For MNIST, we used the convolutional ReLU model from Wong and Kolter (2018), with two convolutional layers of 16 and 32 filters, respectively, followed by a fully connected layer with 100 units, and a batch size of 512. For CIFAR-10, we used four popular models, VGGNet-16, ResNet-18, ResNet-50 and WideResNet-28x4, with a batch size of 128. For CIFAR-100, we used the ResNet-18 and ResNet-50 models with a batch size of 128.
Generally, we used 200 epochs in total, SGD with momentum 0.9 and weight decay as the optimizer, and an initial learning rate of 0.1, decayed by a constant factor after epochs 60, 120 and 160. For adversarial training, the parameters are represented as a tuple $\{T, \alpha, \epsilon\}$. The adversarial parameters were set to $\{10, \cdot, \cdot\}$ for CIFAR-10 and CIFAR-100, and to $\{100, 0.01, 0.1\}$ for MNIST. All adversarial examples were generated by adding an initial random perturbation.
5.2 Training Time and Test Accuracy
Table 2 reports the training time and test accuracy of all robust models on the different datasets. Table 3 shows the elapsed time for different models to generate adversarial examples with RAT and GAAT on the CIFAR-10 dataset.
Table 3: Elapsed time of generating 10-step adversarial examples with RAT and GAAT on the CIFAR-10 dataset (the line ranges refer to Algorithms 1 and 2 and are explained in the text below).

| Model | RAT (Alg. 1), Lines 5-8 | RAT (Alg. 1), Lines 6-7 | GAAT (Alg. 2), Lines 4-10 | GAAT (Alg. 2), Lines 8-9 | GAAT (Alg. 2), Lines 4-6 |
|---|---|---|---|---|---|
| VGGNet-16 | 284 ms | 27 ms | 140 ms | 9 ms | 4 ms |
| ResNet-18 | 126 ms | 10 ms | 87 ms | 5 ms | 5 ms |
| ResNet-50 | 389 ms | 38 ms | 180 ms | 12 ms | 13 ms |
| WideResNet-28x4 | 899 ms | 67 ms | 327 ms | 31 ms | 8 ms |
Training time. To assess how much training time the approximate gradient can save, we recorded the elapsed time of generating 10-step adversarial examples with the true gradient and with the approximate gradient. The results on the CIFAR-10 dataset with different models are shown in Table 3, where Lines 5-8 of Algorithm 1 generate adversarial examples with RAT; Lines 6-7 of Algorithm 1 denote one PGD step with RAT; Lines 4-10 of Algorithm 2 generate adversarial examples with GAAT; Lines 8-9 of Algorithm 2 denote one PGD step with GAAT; and Lines 4-6 of Algorithm 2 compute the Hessian matrix. Evidently, our method greatly reduces the time needed to generate adversarial examples, and for larger and more complicated models, e.g., ResNet-50 and WideResNet-28x4, the reduction is even larger. The time to compute the Hessian matrix (Lines 4-6 of Algorithm 2) depends on the depth of the model, but it is negligible compared with the time spent generating adversarial examples.
We also evaluated the total training time of RAT and GAAT with different models on different datasets. The results in Table 2 show that the overhead is remarkably decreased with our proposed method. For all models, we obtained a significant reduction in time overhead; note that the more complex the model, the greater the reduction our method achieves. For instance, our method saves 60% of the time overhead when training a robust WideResNet-28x4 model.
Test accuracy. From Table 2, we find that the test accuracy of models trained with our method is very close to that of models trained with RAT. In fact, the natural accuracy (tested on natural examples) of models trained with our method is slightly higher than that of RAT-trained models, while the adversarial accuracy (tested on adversarial examples) is slightly lower. A likely reason is that we used an approximate gradient in place of the true gradient: the approximate gradient is smaller in magnitude because it omits the higher-order terms of the true gradient, so our method effectively trains with slightly weaker attacks than RAT. This small loss in adversarial accuracy is acceptable given the large savings in training time.
5.3 Comparison and Combination with DAT
Delayed adversarial training (DAT), which uses natural training instead of adversarial training for the initial epochs, greatly reduces the time overhead while achieving comparable accuracy. In this subsection, we compare it with our method and then combine the two to further improve adversarial training efficiency. Additionally, the early stopping strategy Rice et al. (2020) is regarded as an effective way to avert adversarial overfitting, so we also used it to save training time and improve model robustness. Table 4 shows the training time and test accuracy of the RAT, DAT, GAAT and DAT+GAAT methods on the CIFAR-10 dataset with the ResNet-50 model. In DAT and DAT+GAAT, we switched from natural training to adversarial training after epoch 100.

As shown in Table 4, GAAT and DAT both significantly reduce the time cost of building a robust model, with GAAT saving more time. More interestingly, combining DAT with GAAT reduces the time overhead even further, saving about 70% of the training time compared with RAT (80% with early stopping). Moreover, the early stopping strategy improves both training efficiency and model robustness.
Table 4: Training time and test accuracy of RAT, DAT, GAAT and DAT+GAAT (ResNet-50 on CIFAR-10), with and without early stopping.

| Method | Training time | Nat acc | Adv acc |
|---|---|---|---|
| RAT | 563 min | 86.60% | 46.26% |
| RAT + early stopping | 479 min | 87.24% | 46.33% |
| GAAT | 277 min | 87.83% | 44.54% |
| GAAT + early stopping | 167 min | 88.53% | 45.23% |
| DAT | 307 min | 86.82% | 43.08% |
| DAT + early stopping | 233 min | 87.04% | 44.93% |
| DAT+GAAT | 159 min | 88.48% | 42.32% |
| DAT+GAAT + early stopping | 119 min | 88.66% | 42.73% |
5.4 Generalization to Attacks of Different Strength
We assessed the robustness of the four models in Table 4 against attacks of different strength that they were not trained to defend against, varying the number of PGD steps and the value of $\epsilon$ used in the test attacks. Figure 2 shows the performance of the RAT, DAT, GAAT and DAT+GAAT methods on the CIFAR-10 dataset. The models trained with the three efficient methods, i.e., DAT, GAAT and DAT+GAAT, achieve robustness comparable to RAT against a wide range of PGD attacks and follow the same pattern.
6 Conclusion
In this paper, we aimed to reduce the computational cost of regular adversarial training. We introduced the idea of using an approximate adversarial gradient to generate adversarial examples and proposed a variant of regular adversarial training, adversarial training with gradient approximation, which remarkably improves the efficiency of building robust deep models. Extensive experiments showed that our proposed method achieves comparable performance while significantly reducing the computational cost.
References
- Balduzzi et al. [2017] David Balduzzi, Brian McWilliams, and Tony Butler-Yeoman. Neural Taylor approximations: Convergence and exploration in rectifier networks. In ICML, 2017.
- Cai et al. [2018] Qi-Zhi Cai, Chang Liu, and Dawn Song. Curriculum adversarial training. In IJCAI, 2018.
- Cohen et al. [2019] Jeremy M. Cohen, Elan Rosenfeld, and J. Zico Kolter. Certified adversarial robustness via randomized smoothing. In ICML, 2019.
- Gonzalez and Miikkulainen [2020] Santiago Gonzalez and Risto Miikkulainen. Optimizing loss functions through multivariate Taylor polynomial parameterization. arXiv preprint arXiv:2002.00059, 2020.
- Goodfellow et al. [2015] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
- Gupta et al. [2020] Sidharth Gupta, Parijat Dube, and Ashish Verma. Improving the affordability of robustness training for DNNs. In CVPR, 2020.
- He et al. [2015] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015.
- Hendrycks et al. [2019] Dan Hendrycks, Kimin Lee, and Mantas Mazeika. Using pre-training can improve model robustness and uncertainty. In ICML, 2019.
- Kurakin et al. [2016] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016.
- Kurakin et al. [2017] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. In ICLR Workshop Track Proceedings, 2017.
- Li et al. [2020] Bai Li, Shiqi Wang, Suman Jana, and Lawrence Carin. Towards understanding fast adversarial training. arXiv preprint arXiv:2006.03089, 2020.
- Madry et al. [2018] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In ICLR, 2018.
- Mathews and Fink [2005] J. Mathews and K. Fink. Numerical Methods Using MATLAB. Prentice Hall, New Jersey, 2005.
- Pang et al. [2019] Tianyu Pang, Kun Xu, Chao Du, Ning Chen, and Jun Zhu. Improving adversarial robustness via promoting ensemble diversity. In ICML, 2019.
- Rice et al. [2020] Leslie Rice, Eric Wong, and J. Zico Kolter. Overfitting in adversarially robust deep learning. In ICML, 2020.
- Salman et al. [2019] Hadi Salman, Greg Yang, Jerry Li, Pengchuan Zhang, Huan Zhang, Ilya Razenshteyn, and Sébastien Bubeck. Provably robust deep learning via adversarially trained smoothed classifiers. In NeurIPS, 2019.
- Shafahi et al. [2019] Ali Shafahi, Mahyar Najibi, Mohammad Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S. Davis, Gavin Taylor, and Tom Goldstein. Adversarial training for free! In NeurIPS, 2019.
- Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Singla and Feizi [2020] Sahil Singla and Soheil Feizi. Second-order provable defenses against adversarial attacks. In ICML, 2020.
- Singla et al. [2019] Sahil Singla, Eric Wallace, Shi Feng, and Soheil Feizi. Understanding impacts of high-order loss approximations and features in deep learning interpretation. In ICML, 2019.
- Strauss et al. [2017] Thilo Strauss, Markus Hanselmann, Andrej Junginger, and Holger Ulmer. Ensemble methods as a defense to adversarial perturbations against deep neural networks. arXiv preprint arXiv:1709.03423, 2017.
- Szegedy et al. [2014] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In ICLR, 2014.
- Tramèr et al. [2018] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. In ICLR, 2018.
- Wang et al. [2019] Yisen Wang, Xingjun Ma, James Bailey, Jinfeng Yi, Bowen Zhou, and Quanquan Gu. On the convergence and robustness of adversarial training. In ICML, 2019.
- Wong and Kolter [2018] Eric Wong and J. Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In ICML, 2018.
- Wong et al. [2020] Eric Wong, Leslie Rice, and J. Zico Kolter. Fast is better than free: Revisiting adversarial training. In ICLR, 2020.
- Zagoruyko and Komodakis [2016] Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.
- Zhang et al. [2017] Yunong Zhang, Huihui Gong, Jian Li, Huanchang Huang, and Min Yang. Euler-precision ZFD formula 3NPFD_G extended to future minimization with theoretical guarantees and numerical experiments. In IAEAC, 2017.
- Zhang et al. [2019] Yunong Zhang, Huihui Gong, Min Yang, Jian Li, and Xuyun Yang. Stepsize range and optimal value for Taylor–Zhang discretization formula applied to zeroing neurodynamics illustrated via future equality-constrained quadratic programming. IEEE Transactions on Neural Networks and Learning Systems, 2019.