
Making Adversarial Examples More Transferable and Indistinguishable

Junhua Zou, Yexin Duan, Boyu Li, Wu Zhang, Yu Pan, Zhisong Pan (corresponding authors)
Abstract

Fast gradient sign attacks are a popular family of methods for generating adversarial examples. However, most approaches built on the fast gradient sign framework cannot balance indistinguishability and transferability due to the limitations of the basic sign structure. To address this problem, we propose the Adam Iterative Fast Gradient Tanh Method (AI-FGTM) to generate indistinguishable adversarial examples with high transferability. In addition, smaller kernels and a dynamic step size are applied to further increase the attack success rates. Extensive experiments on an ImageNet-compatible dataset show that our method generates more indistinguishable adversarial examples and achieves higher attack success rates without extra running time or resources. Our best transfer-based attack, NI-TI-DI-AITM, fools six classic defense models with an average success rate of 89.3% and three advanced defense models with an average success rate of 82.7%, both higher than the state-of-the-art gradient-based attacks. Additionally, our method reduces the mean perturbation by nearly 20%. We expect that our method will serve as a new baseline for generating adversarial examples with better transferability and indistinguishability.

Introduction

Despite their great success on many tasks, deep neural networks (DNNs) have been shown to be vulnerable to adversarial examples (Goodfellow, Shlens, and Szegedy 2015; Szegedy et al. 2014), i.e., inputs with imperceptible perturbations that cause DNNs to produce incorrect results. Moreover, a tougher problem, termed transferability (Liu et al. 2017; Moosavi-Dezfooli et al. 2017), is that adversarial examples crafted on a known DNN can also fool other unknown DNNs. Consequently, adversarial examples present severe threats to real-world applications (Athalye et al. 2018; Eykholt et al. 2018; Kurakin, Goodfellow, and Bengio 2017a) and have motivated extensive research on defense methods (Madry et al. 2018; Liao et al. 2018; Guo et al. 2018; Raghunathan, Steinhardt, and Liang 2018; Wong and Kolter 2018; Pang, Du, and Zhu 2018; Samangouei, Kabkab, and Chellappa 2018). Foolbox (Rauber et al. 2020) roughly categorizes attack methods into three types: gradient-based methods (Dong et al. 2018; Goodfellow, Shlens, and Szegedy 2015; Kurakin, Goodfellow, and Bengio 2017b), score-based methods (Narodytska and Kasiviswanathan 2017), and decision-based methods (Brendel, Rauber, and Bethge 2018; Chen, Jordan, and Wainwright 2020). In this paper, we focus on the gradient-based methods. Although the adversarial examples crafted with gradient-based methods satisfy the $L_p$ bound and continually achieve higher black-box success rates, they can be identified easily. In addition, approaches built on the basic sign structure are inherently limited. Taking TI-MI-FGSM (the combination of the translation-invariant method (Dong et al. 2019) and the momentum iterative fast gradient sign method (Dong et al. 2018)) as an example, the gradient processing steps, such as Gaussian blur, gradient normalization, and the sign function, severely damage the gradient information. Additionally, the sign function also increases the perturbation size.

Figure 1: Overview of our method. (a) We replace the sign function with the tanh function to generate smaller perturbations. (b) We use Adam instead of the momentum method and gradient normalization to obtain larger losses in only ten iterations. (c) We use smaller kernels in Gaussian blur to avoid the loss of gradient information. (d) We gradually increase the step size.
Figure 2: The comparison of clean examples and adversarial examples crafted by three combined attacks. (1) TI-DI-AITM ($\lambda=0.65$) achieves similar success rates to TI-DIM and generates much more indistinguishable adversarial examples. (2) TI-DI-AITM ($\lambda=1.3$) achieves much higher success rates than TI-DIM and generates more indistinguishable adversarial examples.

In this paper, we propose a method, called the Adam Iterative Fast Gradient Tanh Method (AI-FGTM), which improves the indistinguishability and transferability of adversarial examples. The fast gradient sign attack series iteratively processes gradient information with transformation, normalization, and the sign function. To preserve the gradient information as much as possible, AI-FGTM modifies the major gradient processing steps. Taking TI-MI-FGSM as an example again, to avoid the loss of gradient information and generate imperceptible perturbations, we replace the momentum algorithm and the sign function with Adam (Kingma and Ba 2015) and the tanh function, respectively. Then, we employ a dynamic step size and smaller kernels in Gaussian blur. The overview of AI-FGTM is shown in Fig. 1, and the detailed process is given in Methodology. Furthermore, combining existing attack methods with AI-FGTM yields much smaller perturbations and delivers state-of-the-art success rates. Fig. 2 shows the comparison of different examples, where the adversarial examples are crafted by three combined attacks, namely, DIM (Xie et al. 2019), TI-DIM (Dong et al. 2019) and TI-DI-AITM (the combination of TI-DIM and our method).

In summary, we make the following contributions:

  • 1.

    Motivated by the limitations of the fast gradient sign series, we propose AI-FGTM, in which the major gradient processing steps are improved to boost the indistinguishability and transferability of adversarial examples.

  • 2.

    We show that AI-FGTM integrated with other transfer-based attacks can obtain much smaller perturbations and larger losses than the current sign attack series.

  • 3.

    Empirical experiments show that, without extra running time or resources, our best attack fools six classic defense models with an average success rate of 89.3% and three advanced defense models with an average success rate of 82.7%, both higher than the state-of-the-art gradient-based attacks.

Review of Existing Attack Methods

Problem definition

Let $\{(f_{W_i})_{i\in[N]}, (f_{B_j})_{j\in[M]}\}$ be a set of pre-trained classifiers, where $(f_{W_i})_{i\in[N]}$ denotes the white-box classifiers and $(f_{B_j})_{j\in[M]}$ represents the unknown classifiers. Given a clean example $x$ that is correctly classified to the ground-truth label $y^{true}$ by all pre-trained classifiers, it is possible to craft an adversarial example $x^{adv}$ satisfying $\|x^{adv}-x\|_p \leq \varepsilon$ by using the white-box classifiers, where $p$ could be $0$, $1$, $2$, or $\infty$, and $\varepsilon$ is the perturbation size. In this paper, we focus on non-targeted attacks with $p=\infty$. Note that the adversarial example $x^{adv}$ should mislead the white-box classifiers and the unknown classifiers simultaneously.

The gradient-based methods

Here, we introduce the family of the gradient-based methods.

Fast Gradient Sign Method (FGSM) (Goodfellow, Shlens, and Szegedy 2015) establishes the basic framework of the gradient-based methods. It efficiently crafts an adversarial example $x^{adv}$ by using a one-step update that maximizes the loss function $J(x^{adv}, y^{true})$ of a given classifier as

$x^{adv} = x + \varepsilon \cdot \mathrm{sign}\big(\nabla_x J(x, y^{true})\big)$,  (1)

where $\nabla_x J(\cdot,\cdot)$ computes the gradient of the loss function w.r.t. $x$, $\mathrm{sign}(\cdot)$ is the sign function, and $\varepsilon$ is the given scalar value that restricts the $L_\infty$ norm of the perturbation.
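As a concrete illustration, the one-step update of Eq. 1 can be written in a few lines. The sketch below assumes a hypothetical PyTorch classifier `model`, a batched image tensor `x`, labels `y_true`, and cross-entropy as the loss $J$; it is a minimal sketch, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y_true, eps):
    """Eq. 1: one signed-gradient step of size eps that increases the loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y_true)
    grad = torch.autograd.grad(loss, x_adv)[0]
    return (x + eps * grad.sign()).detach()
```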

Basic Iterative Method (BIM) (Kurakin, Goodfellow, and Bengio 2017a) is the iterative version of FGSM; it performs better in white-box attacks but is less effective in transfer-based attacks. It iteratively updates the adversarial example $x_t^{adv}$ with a small step size $\alpha$ as

$x_{t+1}^{adv} = \mathrm{Clip}_\varepsilon^x\big\{x_t^{adv} + \alpha \cdot \mathrm{sign}\big(\nabla_{x_t^{adv}} J(x_t^{adv}, y^{true})\big)\big\}$,  (2)

where $\alpha = \varepsilon/T$ with $T$ denoting the number of iterations. $\mathrm{Clip}_\varepsilon^x\{\cdot\}$ performs per-pixel clipping as

$\mathrm{Clip}_\varepsilon^x\{x'\} = \min\big\{255,\ x+\varepsilon,\ \max\{0,\ x-\varepsilon,\ x'\}\big\}$.  (3)
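A minimal sketch of the BIM loop together with the clipping of Eq. 3, under the same assumptions as the FGSM sketch above (hypothetical `model`, images in the [0, 255] range):

```python
import torch
import torch.nn.functional as F

def clip_eps(x_adv, x, eps):
    """Eq. 3: keep x_adv inside the eps-ball around x and inside the valid [0, 255] range."""
    return torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0.0, 255.0)

def bim(model, x, y_true, eps, num_iter=10):
    """Eq. 2 with step size alpha = eps / T."""
    alpha = eps / num_iter
    x_adv = x.clone().detach()
    for _ in range(num_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_true)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = clip_eps(x_adv.detach() + alpha * grad.sign(), x, eps)
    return x_adv
```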

Momentum Iterative Fast Gradient Sign Method (MI-FGSM) (Dong et al. 2018) enhances the transferability of adversarial examples by incorporating a momentum term into the gradient processing, given as

$g_{t+1} = \mu \cdot g_t + \dfrac{\nabla_{x_t^{adv}} J(x_t^{adv}, y^{true})}{\big\|\nabla_{x_t^{adv}} J(x_t^{adv}, y^{true})\big\|_1}$,  (4)
$x_{t+1}^{adv} = \mathrm{Clip}_\varepsilon^x\big\{x_t^{adv} + \alpha \cdot \mathrm{sign}(g_{t+1})\big\}$,  (5)

where $g_{t+1}$ denotes the accumulated gradient at the $(t+1)$-th iteration, and $\mu$ is the decay factor of $g_{t+1}$.
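A minimal sketch of Eqs. 4-5, assuming 4-D image tensors of shape (batch, channels, height, width) and reusing the `clip_eps` helper from the BIM sketch above:

```python
import torch
import torch.nn.functional as F

def mi_fgsm(model, x, y_true, eps, num_iter=10, mu=1.0):
    alpha = eps / num_iter
    g = torch.zeros_like(x)
    x_adv = x.clone().detach()
    for _ in range(num_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_true)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Eq. 4: accumulate the L1-normalized gradient with decay factor mu.
        g = mu * g + grad / grad.abs().sum(dim=(1, 2, 3), keepdim=True)
        # Eq. 5: signed update, then project back into the eps-ball.
        x_adv = clip_eps(x_adv.detach() + alpha * g.sign(), x, eps)
    return x_adv
```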

Nesterov Iterative Method (NIM) (Lin et al. 2020) integrates an anticipatory update into MI-FGSM and further increases the transferability of adversarial examples. The update procedures are expressed as

$x_t^{nes} = x_t^{adv} + \alpha \cdot \mu \cdot g_t$,  (6)
$g_{t+1} = \mu \cdot g_t + \dfrac{\nabla_{x_t^{nes}} J(x_t^{nes}, y^{true})}{\big\|\nabla_{x_t^{nes}} J(x_t^{nes}, y^{true})\big\|_1}$,  (7)
$x_{t+1}^{adv} = \mathrm{Clip}_\varepsilon^x\big\{x_t^{adv} + \alpha \cdot \mathrm{sign}(g_{t+1})\big\}$.  (8)
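The anticipatory update of Eqs. 6-8 only changes where the gradient is evaluated. A minimal sketch, again reusing `clip_eps` and the same assumptions as the MI-FGSM sketch:

```python
import torch
import torch.nn.functional as F

def ni_fgsm(model, x, y_true, eps, num_iter=10, mu=1.0):
    alpha = eps / num_iter
    g = torch.zeros_like(x)
    x_adv = x.clone().detach()
    for _ in range(num_iter):
        # Eq. 6: look-ahead (Nesterov) point along the accumulated momentum.
        x_nes = (x_adv + alpha * mu * g).requires_grad_(True)
        loss = F.cross_entropy(model(x_nes), y_true)
        grad = torch.autograd.grad(loss, x_nes)[0]
        # Eq. 7: momentum accumulation on the look-ahead gradient.
        g = mu * g + grad / grad.abs().sum(dim=(1, 2, 3), keepdim=True)
        # Eq. 8: signed update and projection into the eps-ball.
        x_adv = clip_eps(x_adv + alpha * g.sign(), x, eps)
    return x_adv
```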

Scale-Invariant Method (SIM) (Lin et al. 2020) applies scaled copies of the input image to further improve the transferability. However, SIM requires much more running time and resources.

Diverse Input Method (DIM) (Xie et al. 2019) applies random resizing and padding to the adversarial examples with probability $p$ at each iteration. DIM can be easily integrated into other gradient-based methods to further boost the transferability of adversarial examples. The transformation function $T(x_t^{adv}, p)$ is

$T(x_t^{adv}, p) = \begin{cases} T(x_t^{adv}) & \text{with probability } p, \\ x_t^{adv} & \text{with probability } 1-p. \end{cases}$  (9)
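A minimal sketch of the DIM transform in Eq. 9; the 299-to-330 resize-and-pad range follows the common Inception-v3 setting and is an assumption of this sketch rather than something fixed by the paper:

```python
import random
import torch.nn.functional as F

def diverse_input(x, p=0.5, low=299, high=330):
    """Eq. 9: with probability p, randomly resize x (N x 3 x H x W) and zero-pad it
    back to high x high; otherwise return it unchanged."""
    if random.random() >= p:
        return x
    rnd = random.randint(low, high - 1)
    resized = F.interpolate(x, size=(rnd, rnd), mode="nearest")
    pad_total = high - rnd
    pad_left = random.randint(0, pad_total)
    pad_top = random.randint(0, pad_total)
    # F.pad order for 4D input: (left, right, top, bottom).
    return F.pad(resized, (pad_left, pad_total - pad_left, pad_top, pad_total - pad_top))
```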

Translation-Invariant Method (TIM) (Dong et al. 2019) optimizes an adversarial example by an ensemble of translated examples as

$\arg\max_{x^{adv}} \sum_{i,j} w_{ij} J\big(T_{ij}(x^{adv}), y^{true}\big), \quad \text{s.t.}\ \|x^{adv} - x^{real}\|_\infty \leq \varepsilon$,  (10)

where $T_{ij}(x_t^{adv})$ denotes the translation function that shifts the input $x_t^{adv}$ by $i$ and $j$ pixels along the two dimensions, respectively. TIM then calculates the gradient of the loss function at a point $\hat{x}_t^{adv}$, convolves the gradient with a pre-defined Gaussian kernel $W$, and updates as

$g' = \nabla_{x_t^{adv}}\Big(\sum_{i,j} w_{ij} J\big(T_{ij}(x_t^{adv}), y^{true}\big)\Big)\Big|_{x_t^{adv}=\hat{x}_t^{adv}} \approx \sum_{i,j} w_{ij} T_{-i,-j}\big(\nabla_{x_t^{adv}} J(x_t^{adv}, y^{true})\big)\Big|_{x_t^{adv}=\hat{x}_t^{adv}} \approx W * \nabla_{x_t^{adv}} J(x_t^{adv}, y^{true})$,  (11)
$x_{t+1}^{adv} = \mathrm{Clip}_\varepsilon^x\big\{x_t^{adv} + \alpha \cdot \mathrm{sign}(g')\big\}$.  (12)

Note that, with limited running time and computing resources, the combination of NIM, TIM and DIM (NI-TI-DIM) is the strongest transfer-based attack method so far.
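The convolution approximation of Eq. 11 can be implemented as a depthwise convolution of the gradient with a Gaussian kernel $W$. The sketch below builds such a kernel with `scipy.stats.norm` (one common construction, and an assumption of this sketch) and applies it in PyTorch:

```python
import numpy as np
import scipy.stats as st
import torch
import torch.nn.functional as F

def gaussian_kernel(kernel_size=9, sigma=3.0):
    """2D Gaussian kernel W for Eq. 11; 9x9 is the smaller kernel our method uses
    (the baseline TIM setting is 15x15)."""
    x = np.linspace(-sigma, sigma, kernel_size)
    k1d = st.norm.pdf(x)
    k2d = np.outer(k1d, k1d)
    return (k2d / k2d.sum()).astype(np.float32)

def smooth_gradient(grad, kernel):
    """Approximate the translation ensemble of Eq. 11 by a depthwise convolution
    of the gradient (shape N x 3 x H x W) with W."""
    k = torch.from_numpy(kernel)[None, None].repeat(3, 1, 1, 1).to(grad.device, grad.dtype)
    return F.conv2d(grad, k, padding=kernel.shape[0] // 2, groups=3)
```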

Methodology

Motivations

Our observations, based on the contradiction that existing adversarial examples achieve high success rates but can be identified easily, are as follows:

  • 1.

    The sign function in gradient-based methods has two main disadvantages. On the one hand, the sign function normalizes all gradient values to 1, -1 or 0, and thus leads to the loss of gradient information. On the other hand, the sign function maps even small gradient values to 1 or -1, which increases the perturbation size. In contrast, the tanh function saturates large gradient values like the sign function, but approximately preserves small gradient values, behaving like the function $y=x$ near zero (a small numeric sketch follows at the end of this subsection). Therefore, the tanh function can replace the sign function and reduce the perturbation size.

  • 2.

    With only $T=10$ iterations, the applications of Nesterov's accelerated gradient (NAG) (Lin et al. 2020) and the momentum algorithm (Dong et al. 2018) in adversarial attacks demonstrate that we can migrate other optimization methods to the generation of adversarial examples. Moreover, the gradient at the $t$-th iteration, $\nabla_{x_t^{adv}} J(x_t^{adv}, y^{true})$, is normalized by its own $L_1$ norm before the momentum update. Intuitively, given the behavior of traditional convergence algorithms, Adam can achieve larger losses than the momentum algorithm within such a small number of iterations. Additionally, Adam normalizes the gradient with $m_t/\sqrt{v_t+\delta}$, where $m_t$ denotes the first moment vector, $v_t$ is the second moment vector, and $\delta = 10^{-8}$.

  • 3.

    Traditional convergence algorithms employ learning rate decay to improve model performance, while existing gradient-based attacks use a fixed step size $\alpha=\varepsilon/T$. Intuitively, we can improve the transferability by varying the step size. Different from traditional convergence algorithms, attack methods under the $\varepsilon$-ball restriction aim to maximize the loss function of the target models. Hence, we use an increasing step size with $\sum_{t=0}^{T-1}\alpha_t=\varepsilon$.

  • 4.

    Dong et al. (2019) show that Gaussian blur with a large kernel improves the transferability of adversarial examples. However, Gaussian blur with larger kernels also leads to the loss of gradient information. With the modifications mentioned above, the gradient information is better preserved and plays a more important role in generating adversarial examples. Consequently, we apply smaller kernels in Gaussian blur to avoid the loss of gradient information.

Based on the above four observations, we propose AI-FGTM to craft adversarial examples that are expected to be more transferable and indistinguishable.
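As a toy illustration of observation 1, the snippet below compares the two normalizations on made-up gradient values: the sign function pushes even tiny gradients to $\pm1$, while $\tanh(\lambda g)$ keeps them small and only saturates large values.

```python
import numpy as np

# Made-up gradient values purely for illustration.
g = np.array([0.02, -0.05, 0.3, -1.5, 4.0])
lam = 1.3  # scale factor lambda used in our experiments

print(np.sign(g))        # [ 1. -1.  1. -1.  1.]                   -> every entry takes a full-size step
print(np.tanh(lam * g))  # approx. [ 0.03 -0.06  0.37 -0.96  1.00] -> small gradients stay small
```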

AI-FGTM

Adam (Kingma and Ba 2015) uses exponential moving averages of squared past gradients to mitigate the rapid decay of the learning rate. Essentially, this algorithm limits the reliance of the update on only the past few gradients through the following simple recursion:

$m_{t+1} = \beta_1 m_t + (1-\beta_1) g_{t+1}$,  (13)
$v_{t+1} = \beta_2 v_t + (1-\beta_2) g_{t+1}^2$,  (14)
$\theta_{t+1} = \theta_t - \alpha \cdot \dfrac{\sqrt{1-\beta_2^t}}{1-\beta_1^t} \cdot \dfrac{m_{t+1}}{\sqrt{v_{t+1}+\delta}}$,  (15)

where $m_t$ denotes the first moment vector, $v_t$ represents the second moment vector, and $\beta_1$ and $\beta_2$ are the exponential decay rates.
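For reference, the recursion of Eqs. 13-15 in (minimization) code form; the default hyper-parameter values below are the common Adam defaults and are an assumption of this sketch:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, delta=1e-8):
    """One bias-corrected Adam step following Eqs. 13-15 (t starts at 1)."""
    m = beta1 * m + (1.0 - beta1) * grad       # Eq. 13: first moment
    v = beta2 * v + (1.0 - beta2) * grad ** 2  # Eq. 14: second moment
    theta = theta - alpha * np.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t) * m / np.sqrt(v + delta)  # Eq. 15
    return theta, m, v
```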

Table 1: Abbreviations used in the paper
Abbreviation | Definition
TI-DIM | The combination of MI-FGSM, TIM and DIM
NI-TI-DIM | The combination of MI-FGSM, NIM, TIM and DIM
SI-NI-TI-DIM | The combination of MI-FGSM, NIM, SIM, TIM and DIM
TI-DI-AITM | The combination of AI-FGTM, TIM and DIM
NI-TI-DI-AITM | The combination of AI-FGTM, NIM, TIM and DIM
SI-NI-TI-DI-AITM | The combination of AI-FGTM, NIM, SIM, TIM and DIM
Table 2: The running time (s) of generating 1000 adversarial examples for Inc-v3, Inc-v4, IncRes-v2, Res-v2-101 and the ensemble of these four models.
Attack | Inc-v3 | Inc-v4 | IncRes-v2 | Res-v2-101 | Model ensemble
TI-DIM | 172.8 | 261.2 | 277.8 | 234.0 | 767.5
NI-TI-DIM | 174.5 | 238.9 | 291.8 | 243.0 | 830.2
SI-NI-TI-DIM | 608.2 | 1086.3 | 1156.2 | 1096.2 | 3490.2
TI-DI-AITM | 170.6 | 258.5 | 280.4 | 239.3 | 762.7
NI-TI-DI-AITM | 173.5 | 253.7 | 288.1 | 242.1 | 770.1
SI-NI-TI-DI-AITM | 603.6 | 1103.9 | 1119.4 | 1123.1 | 3341.6

Due to the opposite optimization objective, we apply Adam to the adversarial attack with some modifications. Starting with $x_0^{adv}=x$, $m_0=0$ and $v_0=0$, the first moment estimate and the second moment estimate are given as follows:

$m_{t+1} = m_t + \mu_1 \cdot \nabla_{x_t^{adv}} J(x_t^{adv}, y^{true})$,  (16)
$v_{t+1} = v_t + \mu_2 \cdot \big(\nabla_{x_t^{adv}} J(x_t^{adv}, y^{true})\big)^2$,  (17)

where $\mu_1$ and $\mu_2$ denote the first moment factor and the second moment factor, respectively. We replace the sign function with the tanh function and update $x_{t+1}^{adv}$ as

$\alpha_t = \dfrac{\varepsilon}{\sum_{t'=0}^{T-1}\frac{1-\beta_1^{t'+1}}{\sqrt{1-\beta_2^{t'+1}}}} \cdot \dfrac{1-\beta_1^{t+1}}{\sqrt{1-\beta_2^{t+1}}}$,  (18)
$x_{t+1}^{adv} = \mathrm{Clip}_\varepsilon^x\Big\{x_t^{adv} + \alpha_t \cdot \tanh\Big(\lambda \dfrac{m_{t+1}}{\sqrt{v_{t+1}}+\delta}\Big)\Big\}$,  (19)

where $\beta_1$ and $\beta_2$ are exponential decay rates, and $\lambda$ denotes the scale factor. Specifically, $\alpha_t$ is an increasing step size with $\sum_{t=0}^{T-1}\alpha_t=\varepsilon$. The tanh function then reduces the perturbations of adversarial examples without any reduction in success rate. Furthermore, $m_{t+1}/(\sqrt{v_{t+1}}+\delta)$ replaces the $L_1$ normalization and the first moment estimate of Eq. 4, because Adam increases the loss faster than the momentum algorithm (as shown in Fig. 1(b)).
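Putting Eqs. 16-19 together, a minimal PyTorch sketch of AI-FGTM is given below. The hyper-parameter values follow the settings reported in the Appendix; the hypothetical `model`, the cross-entropy loss, the [0, 255] pixel range, and the `clip_eps` helper from the BIM sketch are assumptions of this sketch, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def ai_fgtm(model, x, y_true, eps, num_iter=10,
            mu1=1.5, mu2=1.9, beta1=0.9, beta2=0.99, lam=1.3, delta=1e-8):
    # Eq. 18: increasing step sizes alpha_t with sum(alphas) == eps.
    weights = [(1 - beta1 ** (t + 1)) / (1 - beta2 ** (t + 1)) ** 0.5 for t in range(num_iter)]
    alphas = [eps * w / sum(weights) for w in weights]

    m, v = torch.zeros_like(x), torch.zeros_like(x)
    x_adv = x.clone().detach()
    for t in range(num_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_true)
        grad = torch.autograd.grad(loss, x_adv)[0]
        m = m + mu1 * grad                 # Eq. 16: un-decayed first moment
        v = v + mu2 * grad ** 2            # Eq. 17: un-decayed second moment
        # Eq. 19: tanh replaces sign, Adam-style scaling replaces the L1 normalization.
        step = alphas[t] * torch.tanh(lam * m / (v.sqrt() + delta))
        x_adv = clip_eps(x_adv.detach() + step, x, eps)
    return x_adv
```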

The combination of AI-FGTM and NIM

NIM integrates an anticipatory update into MI-FGSM; similarly, we can integrate an anticipatory update into AI-FGTM. We first calculate the step size in each iteration as in Eq. 18, and the Nesterov term can be expressed as

$x_t^{nes} = x_t^{adv} + \alpha_t \cdot \dfrac{m_t}{\sqrt{v_t}+\delta}$.  (20)

The remaining update procedures are similar to Eq. 16, Eq. 17 and Eq. 19, which can be expressed as

$m_{t+1} = m_t + \mu_1 \cdot \nabla_{x_t^{nes}} J(x_t^{nes}, y^{true})$,  (21)
$v_{t+1} = v_t + \mu_2 \cdot \big(\nabla_{x_t^{nes}} J(x_t^{nes}, y^{true})\big)^2$,  (22)
$x_{t+1}^{adv} = \mathrm{Clip}_\varepsilon^x\Big\{x_t^{adv} + \alpha_t \cdot \tanh\Big(\lambda \dfrac{m_{t+1}}{\sqrt{v_{t+1}}+\delta}\Big)\Big\}$.  (23)

We summarize NI-TI-DI-AITM as the combination of AI-FGTM, NIM, TIM and DIM, and the procedure is given in Algorithm 1.

Algorithm 1 NI-TI-DI-AITM

Input: A clean example $x$ and its ground-truth label $y^{true}$;
Parameters: The perturbation size $\varepsilon$; the iteration number $T$; the decay factors $\mu_1$ and $\mu_2$; the exponential decay rates $\beta_1$ and $\beta_2$; the scale factor $\lambda$; the probability $p$.
Output: An adversarial example $x^{adv}$.

1:  $m_0 = 0$; $v_0 = 0$; $x_0^{adv} = x$;
2:  for $t = 0$ to $T-1$ do
3:     Input $x_t^{adv}$;
4:     Update the step size $\alpha_t$ by Eq. 18;
5:     Obtain the Nesterov term $x_t^{nes}$ by Eq. 20;
6:     Obtain the diverse input $T(x_t^{nes}, p)$ by Eq. 9;
7:     Compute the gradient $\nabla_{x_t^{adv}} J(T(x_t^{nes}, p), y^{true})$;
8:     Obtain the processed gradient $g'$ by Eq. 11;
9:     Update $m_{t+1}$ by $m_{t+1} = m_t + \mu_1 \cdot g'$;
10:     Update $v_{t+1}$ by $v_{t+1} = v_t + \mu_2 \cdot (g')^2$;
11:     Update $x_{t+1}^{adv}$ by Eq. 23;
12:  end for
13:  Return $x^{adv} = x_T^{adv}$.
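A minimal PyTorch sketch of Algorithm 1 follows, reusing the hypothetical helpers introduced in the earlier sketches (`clip_eps`, `diverse_input`, `gaussian_kernel`, `smooth_gradient`); it illustrates the control flow under those assumptions and is not a faithful reproduction of the released code.

```python
import torch
import torch.nn.functional as F

def ni_ti_di_aitm(model, x, y_true, eps, num_iter=10, p=0.5,
                  mu1=1.5, mu2=1.9, beta1=0.9, beta2=0.99, lam=1.3, delta=1e-8):
    # Eq. 18: increasing step sizes alpha_t that sum to eps.
    weights = [(1 - beta1 ** (t + 1)) / (1 - beta2 ** (t + 1)) ** 0.5 for t in range(num_iter)]
    alphas = [eps * w / sum(weights) for w in weights]
    kernel = gaussian_kernel(kernel_size=9)       # smaller Gaussian kernel

    m, v = torch.zeros_like(x), torch.zeros_like(x)
    x_adv = x.clone().detach()
    for t in range(num_iter):
        # Step 5 / Eq. 20: Nesterov look-ahead point.
        x_nes = (x_adv + alphas[t] * m / (v.sqrt() + delta)).requires_grad_(True)
        # Steps 6-7 / Eq. 9: diverse input, then the loss gradient.
        loss = F.cross_entropy(model(diverse_input(x_nes, p)), y_true)
        grad = torch.autograd.grad(loss, x_nes)[0]
        # Step 8 / Eq. 11: translation-invariant smoothing of the gradient.
        g = smooth_gradient(grad, kernel)
        m = m + mu1 * g                            # Step 9  / Eq. 21
        v = v + mu2 * g ** 2                       # Step 10 / Eq. 22
        # Step 11 / Eq. 23: tanh update and projection into the eps-ball.
        x_adv = clip_eps(x_adv + alphas[t] * torch.tanh(lam * m / (v.sqrt() + delta)), x, eps)
    return x_adv
```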

Experiments

In this section, we provide extensive experimental results on an ImageNet-compatible dataset to validate our method. First, we introduce the experimental setting. Then, we compare the running efficiency of different transfer-based attacks. Next, we present an ablation study of the effects of the different parts of our method. Finally, we compare our attacks with the baseline attacks. Table 1 presents the definitions of the abbreviations used in this paper.

Experimental setting

Dataset. We utilize the 1000 images from the NIPS 2017 adversarial competition (https://github.com/tensorflow/cleverhans/tree/master/examples/nips17_adversarial_competition/dataset) to conduct the following experiments.

Models. In this paper, we employ thirteen models to perform the following experiments. Four non-defense models (Inception v3 (Inc-v3) (Szegedy et al. 2016), Inception v4 (Inc-v4), Inception ResNet v2 (IncRes-v2) (Szegedy et al. 2017), and ResNet v2-101 (Res-v2-101) (He et al. 2016)) are used as white-box models to craft adversarial examples. Six defense models (Inc-v3ens3, Inc-v3ens4, IncRes-v2ens (Tramèr et al. 2018), high-level representation guided denoiser (HGD) (Liao et al. 2018), input transformation through random resizing and padding (R&P) (Xie et al. 2018), and the rank-3 submission (https://github.com/anlthms/nips-2017/tree/master/mmd) in the NIPS 2017 adversarial competition) are employed as classic models to evaluate the crafted adversarial examples. In addition, we also evaluate the attacks with three advanced defenses (Feature Distillation (Liu et al. 2019), Comdefend (Jia et al. 2019), and Randomized Smoothing (Cohen, Rosenfeld, and Kolter 2019)).

Baselines. We focus on the comparison of TI-DIM, NI-TI-DIM, TI-DI-AITM and NI-TI-DI-AITM, where TI-DIM and NI-TI-DIM are both state-of-the-art methods.

Hyper-parameters. Following TI-DIM (Dong et al. 2019) and NI-FGSM (Lin et al. 2020), we set the maximum perturbation $\varepsilon=16$ and the number of iterations $T=10$. Specifically, we set the kernel size to $15\times15$ in the normal TI-DIM and NI-TI-DIM and $9\times9$ in TI-DI-AITM. The exploration of the appropriate settings of our method is illustrated in the Appendix.

The mean perturbation size. For an adversarial example $x^{adv}$ of size $M\times N\times3$, the mean perturbation size $P_m$ can be calculated as

$P_m = \dfrac{\sum_{i=1}^{M}\sum_{j=1}^{N}\sum_{k=1}^{3}\big|x_{ijk}^{adv} - x_{ijk}\big|}{M\times N\times3}$,  (24)

where $x_{ijk}$ denotes the value of channel $k$ of the image $x$ at coordinates $(i,j)$.
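Eq. 24 amounts to a mean absolute difference; a minimal NumPy sketch, assuming uint8 images of shape (M, N, 3):

```python
import numpy as np

def mean_perturbation(x_adv, x):
    """Eq. 24: mean absolute per-pixel difference between x_adv and x."""
    return np.abs(x_adv.astype(np.float64) - x.astype(np.float64)).mean()

# Example with random images; with eps = 16 the value is at most 16.
x = np.random.randint(0, 256, (299, 299, 3), dtype=np.uint8)
x_adv = np.clip(x.astype(int) + np.random.randint(-16, 17, x.shape), 0, 255).astype(np.uint8)
print(mean_perturbation(x_adv, x))
```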

Table 3: Ablation study of the effects of the tanh function, the Adam optimizer, the kernel size and dynamic step size. The adversarial examples are generated for Inc-v3, Inc-v4, IncRes-v2, Res-v2-101 respectively using TI-DIM and TI-DIM with different parts of our method. We compare the mean perturbations and the mean attack success rates of the generated adversarial examples against six classic defense models.
Attack | tanh | Adam | smaller kernels | dynamic step size | mean success rate (%) | mean perturbation
TI-DIM |  |  |  |  | 82.0 | 10.46
 | ✓ |  |  |  | 82.4 | 9.14
 |  | ✓ |  |  | 83.6 | 9.20
 | ✓ | ✓ |  |  | 83.1 | 7.86
 | ✓ | ✓ | ✓ |  | 86.5 | 7.82
 | ✓ | ✓ | ✓ | ✓ | 88.0 | 8.11
Table 4: The success rates (%) of adversarial attacks against six defense models under single-model setting. The adversarial examples are generated for Inc-v3, Inc-v4, IncRes-v2, Res-v2-101 respectively using TI-DIM, NI-TI-DIM, TI-DI-AITM and NI-TI-DI-AITM.
Model | Attack | Inc-v3ens3 | Inc-v3ens4 | IncRes-v2ens | HGD | R&P | NIPS-r3
Inc-v3 | TI-DIM | 46.5 | 47.3 | 38.1 | 38.0 | 36.9 | 41.1
Inc-v3 | TI-DI-AITM | 53.8 | 53.3 | 39.0 | 40.2 | 39.1 | 45.7
Inc-v3 | NI-TI-DIM | 48.3 | 48.6 | 36.9 | 37.3 | 36.8 | 42.5
Inc-v3 | NI-TI-DI-AITM | 51.9 | 52.4 | 38.2 | 39.3 | 38.1 | 44.6
Inc-v4 | TI-DIM | 48.2 | 47.9 | 39.1 | 40.6 | 39.3 | 41.5
Inc-v4 | TI-DI-AITM | 53.2 | 51.8 | 42.4 | 43.7 | 42.5 | 44.6
Inc-v4 | NI-TI-DIM | 52.4 | 51.8 | 41.3 | 41.9 | 41.1 | 42.7
Inc-v4 | NI-TI-DI-AITM | 54.8 | 53.7 | 41.7 | 43.9 | 43.2 | 44.1
IncRes-v2 | TI-DIM | 60.8 | 59.6 | 59.3 | 58.4 | 60.7 | 61.3
IncRes-v2 | TI-DI-AITM | 64.9 | 61.8 | 62.1 | 62.7 | 64.8 | 65.1
IncRes-v2 | NI-TI-DIM | 61.5 | 60.4 | 59.9 | 60.1 | 62.2 | 63.1
IncRes-v2 | NI-TI-DI-AITM | 66.5 | 63.8 | 62.0 | 63.2 | 65.6 | 65.8
Res-v2-101 | TI-DIM | 56.1 | 55.4 | 49.8 | 51.3 | 50.4 | 52.3
Res-v2-101 | TI-DI-AITM | 62.8 | 62.8 | 54.4 | 55.3 | 54.2 | 57.1
Res-v2-101 | NI-TI-DIM | 59.5 | 57.7 | 50.4 | 51.9 | 50.8 | 54.6
Res-v2-101 | NI-TI-DI-AITM | 64.0 | 61.0 | 54.6 | 54.8 | 53.4 | 57.6
Figure 3: Results of adversarial examples generated for the ensemble of Inc-v3, Inc-v4, IncRes-v2, and Res-v2-101 using different attacks.
Table 5: The success rates (%) of adversarial attacks against six defense models under model ensemble setting. The adversarial examples are generated for the ensemble of Inc-v3, Inc-v4, IncRes-v2, and Res-v2-101 using TI-DIM, NI-TI-DIM, TI-DI-AITM and NI-TI-DI-AITM.
Attack | Inc-v3ens3 | Inc-v3ens4 | IncRes-v2ens | HGD | R&P | NIPS-r3 | Average
TI-DIM | 83.9 | 83.2 | 78.4 | 81.9 | 81.2 | 83.6 | 82.0
TI-DI-AITM | 90.2 | 88.5 | 85.4 | 88.3 | 87.1 | 88.7 | 88.0
NI-TI-DIM | 85.5 | 85.9 | 80.1 | 83.6 | 82.9 | 84.3 | 83.7
NI-TI-DI-AITM | 91.8 | 90.3 | 85.8 | 89.4 | 88.6 | 90.1 | 89.3
Table 6: The success rates (%) of adversarial attacks against Feature Distillation (Liu et al. 2019), Comdefend (Jia et al. 2019), and Randomized Smoothing (Cohen, Rosenfeld, and Kolter 2019) under model ensemble setting. The adversarial examples are generated for the ensemble of Inc-v3, Inc-v4, IncRes-v2, and Res-v2-101 using TI-DIM, NI-TI-DIM, TI-DI-AITM and NI-TI-DI-AITM.
Attack | Feature Distillation | Comdefend | Randomized Smoothing | Average
TI-DIM | 83.1 | 78.2 | 49.9 | 70.4
TI-DI-AITM | 90.6 | 87.9 | 63.7 | 80.7
NI-TI-DIM | 82.1 | 84.7 | 58.6 | 75.1
NI-TI-DI-AITM | 91.4 | 90.3 | 66.4 | 82.7

The comparison of running efficiency

We compare the running time of each attack listed in Table 1 on a single Nvidia GTX 1080 Ti GPU. Table 2 shows the running time under the single-model setting and the model-ensemble setting. It can be seen that attacks combined with our method AI-FGTM do not incur extra running time. Additionally, SIM requires at least two GPUs under the model-ensemble setting and costs much more running time than the other attacks under both settings. Therefore, we exclude SIM in the following experiments.

Ablation study

Table 3 shows the ablation study of the effects of the different parts of our method. We compare the mean perturbations and the mean success rates of the adversarial examples against six classic defense models. Our observations are as follows:

  • 1.

    Both the tanh function and Adam reduce the perturbation size. Additionally, Adam also improves the transferability of adversarial examples.

  • 2.

    The combination of the tanh function and Adam greatly reduces the perturbation size, but only slightly improves the transferability of adversarial examples.

  • 3.

    Using smaller kernels and a dynamic step size improves the transferability of adversarial examples, even though the dynamic step size slightly increases the perturbation size.

The validation results in the single-model attack scenario

In this section, we compare the success rates of the AI-FGTM based attacks and the baseline attacks against six classic defenses. We generate adversarial examples for Inc-v3, Inc-v4, IncRes-v2, and Res-v2-101 using TI-DIM, TI-DI-AITM, NI-TI-DIM and NI-TI-DI-AITM, respectively.

As shown in Table 4, we find that our attack methods consistently outperform the baseline attacks by a large margin. Furthermore, according to Table 4 and Fig. 3(b), we observe that our method can generate adversarial examples with much better transferability and indistinguishability.

The validation results in the model ensemble attack scenario

In this section, we present the success rates of adversarial examples generated for an ensemble of the four non-defense models. Table 5 gives the results of the transfer-based attacks against six classic defense models. It shows that our methods achieve higher success rates than the baseline attacks. In particular, without extra running time or resources, TI-DI-AITM and NI-TI-DI-AITM fool the six defense models with average success rates of 88.0% and 89.3%, respectively, which are higher than the state-of-the-art gradient-based attacks.

We also validate our method by comparing the results of DIM, TI-DIM and TI-DI-AITM in Fig. 3. Adversarial examples are generated for the ensemble of Inc-v3, Inc-v4, IncRes-v2 and Res-v2-101 using the different attack methods. Fig. 3(a) shows that the tanh function does not hurt the performance of adversarial examples and that Adam boosts the attack success rates. Fig. 3(b) shows that our method significantly reduces the mean perturbation size of adversarial examples; in particular, it reduces the perturbation by 40% while delivering stable performance. Fig. 3(c) shows that our approach with $\lambda=1.3$ obtains the largest loss of all the methods.

We evaluate the attacks with three more advanced defenses, namely Feature Distillation (Liu et al. 2019), Comdefend (Jia et al. 2019) and Randomized Smoothing (Cohen, Rosenfeld, and Kolter 2019). Table 6 shows the success rates of TI-DIM, TI-DI-AITM, NI-TI-DIM and NI-TI-DI-AITM against these defenses in the ensemble attack scenario.

In Table 6, we find that the attacks with AI-FGTM consistently outperform the attacks with MI-FGSM. In general, our methods can fool these defenses with high success rates.

Based on the above experimental results, it is reasonable to state that the proposed TI-DI-AITM and NI-TI-DI-AITM generate adversarial examples with much better indistinguishability and transferability. Meanwhile, TI-DI-AITM and NI-TI-DI-AITM raise a security challenge for the development of more effective defense models.

Conclusion

In this paper, we propose AI-FGTM to craft adversarial examples that are more indistinguishable and transferable. AI-FGTM modifies the major gradient processing steps of the basic sign structure to address the limitations faced by the existing sign-based methods. Compared with the state-of-the-art attacks, extensive experiments on an ImageNet-compatible dataset show that our method generates more indistinguishable adversarial examples and achieves higher attack success rates without extra running time or resources. Our best attack, NI-TI-DI-AITM, fools six classic defense models with an average success rate of 89.3% and three advanced defense models with an average success rate of 82.7%, both higher than the state-of-the-art gradient-based attacks. Additionally, our method reduces the mean perturbation by nearly 20%. We expect that our method will serve as a new baseline for generating adversarial examples with higher transferability and indistinguishability.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 62106281. This paper was finished with the encouragement of Zou's wife, Maojuan Tian. Zou would like to thank her and tell her: 'The most romantic thing I can imagine is gradually getting old with you in scientific exploration.'

References

  • Athalye et al. (2018) Athalye, A.; Engstrom, L.; Ilyas, A.; and Kwok, K. 2018. Synthesizing Robust Adversarial Examples. In Proceedings of the 35th International Conference on Machine Learning, 284–293.
  • Brendel, Rauber, and Bethge (2018) Brendel, W.; Rauber, J.; and Bethge, M. 2018. Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models. In 6th International Conference on Learning Representations.
  • Chen, Jordan, and Wainwright (2020) Chen, J.; Jordan, M. I.; and Wainwright, M. J. 2020. HopSkipJumpAttack: A Query-Efficient Decision-Based Attack. In 2020 IEEE Symposium on Security and Privacy, SP 2020, San Francisco, CA, USA, May 18-21, 2020, 1277–1294. IEEE.
  • Cohen, Rosenfeld, and Kolter (2019) Cohen, J. M.; Rosenfeld, E.; and Kolter, J. Z. 2019. Certified Adversarial Robustness via Randomized Smoothing. In Proceedings of the 36th International Conference on Machine Learning, 1310–1320.
  • Dong et al. (2018) Dong, Y.; Liao, F.; Pang, T.; Su, H.; Zhu, J.; Hu, X.; and Li, J. 2018. Boosting Adversarial Attacks With Momentum. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, 9185–9193.
  • Dong et al. (2019) Dong, Y.; Pang, T.; Su, H.; and Zhu, J. 2019. Evading Defenses to Transferable Adversarial Examples by Translation-Invariant Attacks. In IEEE Conference on Computer Vision and Pattern Recognition, 4312–4321.
  • Eykholt et al. (2018) Eykholt, K.; Evtimov, I.; Fernandes, E.; Li, B.; Rahmati, A.; Xiao, C.; Prakash, A.; Kohno, T.; and Song, D. 2018. Robust Physical-World Attacks on Deep Learning Visual Classification. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, 1625–1634.
  • Goodfellow, Shlens, and Szegedy (2015) Goodfellow, I. J.; Shlens, J.; and Szegedy, C. 2015. Explaining and Harnessing Adversarial Examples. In 3rd International Conference on Learning Representations.
  • Guo et al. (2018) Guo, C.; Rana, M.; Cissé, M.; and van der Maaten, L. 2018. Countering Adversarial Images using Input Transformations. In 6th International Conference on Learning Representations.
  • He et al. (2016) He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Identity Mappings in Deep Residual Networks. In Computer Vision - ECCV 2016 - 14th European Conference, 630–645.
  • Jia et al. (2019) Jia, X.; Wei, X.; Cao, X.; and Foroosh, H. 2019. ComDefend: An Efficient Image Compression Model to Defend Adversarial Examples. In IEEE Conference on Computer Vision and Pattern Recognition, 6084–6092.
  • Kingma and Ba (2015) Kingma, D. P.; and Ba, J. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations.
  • Kurakin, Goodfellow, and Bengio (2017a) Kurakin, A.; Goodfellow, I. J.; and Bengio, S. 2017a. Adversarial examples in the physical world. In 5th International Conference on Learning Representations.
  • Kurakin, Goodfellow, and Bengio (2017b) Kurakin, A.; Goodfellow, I. J.; and Bengio, S. 2017b. Adversarial Machine Learning at Scale. In 5th International Conference on Learning Representations.
  • Liao et al. (2018) Liao, F.; Liang, M.; Dong, Y.; Pang, T.; Hu, X.; and Zhu, J. 2018. Defense Against Adversarial Attacks Using High-Level Representation Guided Denoiser. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, 1778–1787.
  • Lin et al. (2020) Lin, J.; Song, C.; He, K.; Wang, L.; and Hopcroft, J. E. 2020. Nesterov Accelerated Gradient and Scale Invariance for Adversarial Attacks. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net.
  • Liu et al. (2017) Liu, Y.; Chen, X.; Liu, C.; and Song, D. 2017. Delving into Transferable Adversarial Examples and Black-box Attacks. In 5th International Conference on Learning Representations.
  • Liu et al. (2019) Liu, Z.; Liu, Q.; Liu, T.; Xu, N.; Lin, X.; Wang, Y.; and Wen, W. 2019. Feature Distillation: DNN-Oriented JPEG Compression Against Adversarial Examples. In IEEE Conference on Computer Vision and Pattern Recognition, 860–868.
  • Madry et al. (2018) Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; and Vladu, A. 2018. Towards Deep Learning Models Resistant to Adversarial Attacks. In 6th International Conference on Learning Representations.
  • Moosavi-Dezfooli et al. (2017) Moosavi-Dezfooli, S.; Fawzi, A.; Fawzi, O.; and Frossard, P. 2017. Universal Adversarial Perturbations. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, 86–94.
  • Narodytska and Kasiviswanathan (2017) Narodytska, N.; and Kasiviswanathan, S. P. 2017. Simple Black-Box Adversarial Attacks on Deep Neural Networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2017, Honolulu, HI, USA, July 21-26, 2017, 1310–1318. IEEE Computer Society.
  • Pang, Du, and Zhu (2018) Pang, T.; Du, C.; and Zhu, J. 2018. Max-Mahalanobis Linear Discriminant Analysis Networks. In Proceedings of the 35th International Conference on Machine Learning, 4013–4022.
  • Raghunathan, Steinhardt, and Liang (2018) Raghunathan, A.; Steinhardt, J.; and Liang, P. 2018. Certified Defenses against Adversarial Examples. In 6th International Conference on Learning Representations.
  • Rauber et al. (2020) Rauber, J.; Zimmermann, R. S.; Bethge, M.; and Brendel, W. 2020. Foolbox Native: Fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX. J. Open Source Softw., 5(53): 2607.
  • Samangouei, Kabkab, and Chellappa (2018) Samangouei, P.; Kabkab, M.; and Chellappa, R. 2018. Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models. In 6th International Conference on Learning Representations.
  • Szegedy et al. (2017) Szegedy, C.; Ioffe, S.; Vanhoucke, V.; and Alemi, A. A. 2017. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 4278–4284.
  • Szegedy et al. (2016) Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; and Wojna, Z. 2016. Rethinking the Inception Architecture for Computer Vision. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2818–2826.
  • Szegedy et al. (2014) Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I. J.; and Fergus, R. 2014. Intriguing properties of neural networks. In 2nd International Conference on Learning Representations.
  • Tramèr et al. (2018) Tramèr, F.; Kurakin, A.; Papernot, N.; Goodfellow, I. J.; Boneh, D.; and McDaniel, P. D. 2018. Ensemble Adversarial Training: Attacks and Defenses. In 6th International Conference on Learning Representations.
  • Wong and Kolter (2018) Wong, E.; and Kolter, J. Z. 2018. Provable Defenses against Adversarial Examples via the Convex Outer Adversarial Polytope. In Proceedings of the 35th International Conference on Machine Learning, 5283–5292.
  • Xie et al. (2018) Xie, C.; Wang, J.; Zhang, Z.; Ren, Z.; and Yuille, A. L. 2018. Mitigating Adversarial Effects Through Randomization. In 6th International Conference on Learning Representations.
  • Xie et al. (2019) Xie, C.; Zhang, Z.; Zhou, Y.; Bai, S.; Wang, J.; Ren, Z.; and Yuille, A. L. 2019. Improving Transferability of Adversarial Examples With Input Diversity. In IEEE Conference on Computer Vision and Pattern Recognition, 2730–2739.

Appendix

In this supplementary material, we provide more results from our experiments. In Sec. A, we evaluate the indistinguishability with PSNR and SSIM. In Sec. B, we investigate the effects of different hyper-parameters of AI-FGTM. In Sec. C, we report the success rates of white-box attacks to show the effectiveness of AI-FGTM. In Sec. D, we present visual examples generated by different attacks to show the better indistinguishability of the adversarial examples generated by our methods. In Sec. E, we present the value distributions of the accumulated gradients across iterations to demonstrate the limitation of the basic sign attack series. Code is available at https://github.com/278287847/AI-FGTM.

A. The indistinguishability with PSNR and SSIM

We evaluate the indistinguishability with PSNR and SSIM (higher values indicate better indistinguishability). In detail, we generate adversarial examples for Inc-v3 with TI-DIM, TI-DI-AITM, NI-TI-DIM and NI-TI-DI-AITM, and compute the mean PSNR and SSIM between the adversarial examples and the clean examples. The results are shown in Table 7. We find that the attacks with AI-FGTM consistently perform better.

Table 7: The indistinguishability evaluated with PSNR and SSIM.
Attack | PSNR | SSIM
TI-DIM | 26.81 | 0.78
TI-DI-AITM | 28.64 | 0.84
NI-TI-DIM | 26.83 | 0.77
NI-TI-DI-AITM | 28.63 | 0.82
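Such PSNR/SSIM values can be computed per image pair and then averaged over the dataset. A minimal sketch using scikit-image (the `channel_axis` argument assumes scikit-image ≥ 0.19); the uint8 images and the 255 data range are assumptions of this sketch:

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def psnr_ssim(x, x_adv):
    """PSNR and SSIM between a clean image and its adversarial counterpart,
    both assumed to be uint8 arrays of shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(x, x_adv, data_range=255)
    ssim = structural_similarity(x, x_adv, channel_axis=-1, data_range=255)
    return psnr, ssim
```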

B. The effects of different hyper-parameters

We explore the effects of different hyper-parameters of AI-FGTM and aim to find appropriate settings that balance the success rates of both white-box and black-box attacks. The adversarial examples are generated for the ensemble of Inc-v3, Inc-v4, IncRes-v2, and Res-v2-101 using TI-DI-AITM. We first show the results of white-box attacks against the four known models, and then we present the performance of black-box attacks against three defense models in Fig. 4. It can be seen that the appropriate settings are $\lambda=1.3$, $\mu_1=1.5$, $\mu_2=1.9$, $\beta_1=0.9$, $\beta_2=0.99$, and a kernel length of 9.

Figure 4: The success rates (%) of adversarial attacks against Inc-v3, Inc-v4, IncRes-v2, Res-v2-101, Inc-v3ens3, Inc-v3ens4 and IncResv2ens.

C. The effectiveness in the white-box case

In order to demonstrate the effectiveness of our method, we report the success rates of white-box attacks. We present the validation results in the single-model attack scenario and the model ensemble attack scenario. Table 8 presents the success rates of TI-DIM, TI-DI-AITM, NI-TI-DIM, and NI-TI-DI-AITM against white-box models. The abbreviations are defined in Table 1.

As shown in Table 8, we find that NI-TI-DI-AITM consistently outperforms NI-TI-DIM in the single-model attack scenario. Compared with MI-FGSM, the experimental results show that our method can improve the performance of NIM. Fig. 2 also demonstrates that our method can generate adversarial examples with better indistinguishability.

Table 8: The success rates (%) of adversarial attacks against six black-box defense models under single-model setting. The adversarial examples are generated for Inc-v3, Inc-v4, IncRes-v2, Res-v2-101 respectively using NI-TI-DIM and NI-TI-DI-AITM.
Model | Attack | Inc-v3ens3 | Inc-v3ens4 | IncRes-v2ens | HGD | R&P | NIPS-r3
Inc-v3 | NI-TI-DIM | 48.3 | 48.6 | 36.9 | 37.3 | 36.8 | 42.5
Inc-v3 | NI-TI-DI-AITM | 51.9 | 52.4 | 38.2 | 39.3 | 38.1 | 44.6
Inc-v4 | NI-TI-DIM | 52.4 | 51.8 | 41.3 | 41.9 | 41.1 | 42.7
Inc-v4 | NI-TI-DI-AITM | 54.8 | 53.7 | 41.7 | 43.9 | 43.2 | 44.1
IncRes-v2 | NI-TI-DIM | 61.5 | 60.4 | 59.9 | 60.1 | 62.2 | 63.1
IncRes-v2 | NI-TI-DI-AITM | 66.5 | 63.8 | 62.0 | 63.2 | 65.6 | 65.8
Res-v2-101 | NI-TI-DIM | 59.5 | 57.7 | 50.4 | 51.9 | 50.8 | 54.6
Res-v2-101 | NI-TI-DI-AITM | 64.0 | 61.0 | 54.6 | 54.8 | 53.4 | 57.6

D. Visualization of adversarial examples

We visualize six groups of adversarial examples generated by six different attacks in Fig. 5. The adversarial examples are crafted on the ensemble of Inc-v3, Inc-v4, IncRes-v2 and Res-v2-101, using TI-DIM, TI-DI-AITM ($\lambda=1.3$), TI-DI-AITM ($\lambda=0.65$), NI-TI-DIM, NI-TI-DI-AITM ($\lambda=1.3$) and NI-TI-DI-AITM ($\lambda=0.65$). We can see that the adversarial examples generated by our method have better indistinguishability.

Figure 5: Visualization of six groups of adversarial examples generated by six different attacks.

E. The value distributions of the accumulated gradients across iterations

We present the value distributions of the accumulated gradients across iterations to demonstrate the limitation of the basic sign attack series. With the iteration number $T=10$, we show the value distributions of the accumulated gradients in each iteration in Fig. 6. As the number of iterations increases, the values of the accumulated gradients tend to be greater than 1 or less than -1. However, a large number of values lie in the range $(-0.5, 0.5)$ in the first three iterations. With a constant step size $\alpha=\varepsilon/T$, the sign function maps these small values to $\pm1$, so the perturbation size of the adversarial examples is greatly enlarged. Hence, we replace the sign function with the tanh function and gradually increase the step size.

Figure 6: The value distributions of the accumulated gradients across iterations.