Rethinking PGD Attack: Is Sign Function Necessary?

Junjie Yang1, Tianlong Chen2, Xuxi Chen3, Zhangyang Wang3, Yingbin Liang1
1The Ohio State University, 2MIT@CSAIL, 3University of Texas at Austin
Abstract

Neural networks have demonstrated success in various domains, yet their performance can be significantly degraded by even a small input perturbation. Consequently, the construction of such perturbations, known as adversarial attacks, has gained significant attention, many of which fall within "white-box" scenarios where we have full access to the neural network. Existing attack algorithms, such as the projected gradient descent (PGD), commonly take the sign function on the raw gradient before updating adversarial inputs, thereby neglecting gradient magnitude information. In this paper, we present a theoretical analysis of how such a sign-based update algorithm influences step-wise attack performance, as well as its caveats. We also explain why previous attempts to directly use raw gradients failed. Based on that, we further propose a new raw gradient descent (RGD) algorithm that eliminates the use of sign. Specifically, we convert the constrained optimization problem into an unconstrained one, by introducing a new hidden variable of non-clipped perturbation that can move beyond the constraint. The effectiveness of the proposed RGD algorithm has been demonstrated extensively in experiments, outperforming PGD and other competitors in various settings, without incurring any additional computational overhead. The code is available at https://github.com/JunjieYang97/RGD.

1 Introduction

Neural networks have been widely adopted in many areas, e.g., computer vision (Krizhevsky et al., 2012; He et al., 2016) and natural language processing (Hochreiter and Schmidhuber, 1997). Generally, a well-trained neural network can make very accurate predictions when classifying images. However, existing works (Goodfellow et al., 2015; Madry et al., 2018; Zhang et al., 2020, 2021) have shown that merely tiny perturbations of the neural network input, which would not affect human judgment, can cause the network to make significant mistakes. These perturbations can be intentionally generated using various algorithms and are usually referred to as adversarial attacks. Adversarial attacks are commonly categorized into white-box attacks and black-box attacks. In white-box attacks, we have access to all neural network parameters, allowing us to exploit this information (e.g., through back-propagation) to generate adversarial inputs. In contrast, the neural network architectures are inaccessible in the scenario of black-box attacks, and one can only test different inputs and their corresponding outputs to identify successful adversarial examples.

This work focuses exclusively on the white-box scenario, particularly the $L_{\infty}$ norm-based projected gradient descent (PGD) (Madry et al., 2018) attack. In $L_{\infty}$ norm attacks, the learned perturbation $\delta$ is constrained within an $\epsilon$-ball, ensuring that the absolute magnitudes of all pixel values do not exceed $\epsilon$. Under these constraints, most white-box algorithms employ the "signed gradient" to maximize the perturbation values and approach the $\epsilon$ boundary, thereby generating stronger attacks. For instance, the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2015) first introduced the sign operation for a one-step perturbation update. Subsequently, Madry et al. (2018) proposed PGD to iteratively update the perturbation using the signed gradient. Follow-up PGD-based approaches, such as MIM (Dong et al., 2018) and Auto-Attack (Croce and Hein, 2020), unanimously follow the practice of using the sign function. Previous work (Agarwal et al., 2020) has empirically demonstrated that the "signed gradient" significantly outperforms the raw gradient in the FGSM context. However, the signed gradient, compared to the raw gradient, abandons magnitude information in PGD, which may also discard useful adversarial information. This motivates the following questions:

  • Despite the loss of gradient value information, sign-based attacks remain the preferred choice in $L_{\infty}$ norm scenarios. What are the critical factors in determining the quality of perturbations in PGD? Why does the raw gradient fail to generate more effective adversarial attacks?

Note that the raw gradient is found effective in $L_{2}$ contexts. For instance, the C&W attack (Carlini and Wagner, 2017), a widely used algorithm, utilizes the raw gradient and incorporates constraint terms and variable changes for adversarial attacks. Therefore, more pertinent questions arise:

  • Why does the "signed gradient" appear necessary in $L_{\infty}$ attacks? What are the pros and cons of the sign function? Can we achieve more general attack success (including $L_{\infty}$) without the "sign" operation, and can that even outperform signed gradients?

1.1 Main Contributions

This work rigorously analyzes how the perturbation update method affects adversarial quality and further empirically shows why the raw gradient is not favored in the current PGD algorithm. Moreover, we propose a new raw gradient-based algorithm that outperforms vanilla PGD across various scenarios without introducing any extra computational cost.

In this study, our first objective is to theoretically characterize the effect of update mechanisms on adversarial samples at each step. We observe that a stronger attack in the previous step, or a larger perturbation change, leads to a greater attack improvement in the subsequent step. Furthermore, our empirical observations reveal that in the later update steps, vanilla PGD, which employs the sign function, exhibits a larger change in perturbation compared to raw gradient-based PGD. Such a larger change pushes more pixels toward the $\epsilon$ boundary, indicating a stronger attack update according to the theoretical results. This explains the preference for the signed gradient in PGD.

Next, we point out that the failure of the raw gradient in PGD is due not only to its insufficient update magnitude but also to the clipping design. In PGD, all perturbations are clipped into the $\epsilon$-ball at every step. This design results in the loss of significant magnitude information for the raw gradient update, particularly when most pixels approach the $\epsilon$ boundary. Instead, we introduce a new hidden variable of non-clipped perturbation for updating. In this way, the $L_{\infty}$ norm-based adversarial attack, a constrained optimization problem, is transformed into an unconstrained optimization problem, where the proposed hidden variable is allowed to surpass the $\epsilon$ boundary. Extensive experiments further demonstrate that our proposed method significantly improves the performance of raw update-based PGD and even outperforms vanilla PGD, without incurring additional computational overhead.

2 Related Works

Adversarial robustness has been explored extensively in recent years. Szegedy et al. (2013) were the first to observe that imperceptible perturbations can cause networks to misclassify images. Then, based on gradients with respect to the input, Goodfellow et al. (2015) proposed the one-step FGSM for generating adversarial examples. Furthermore, iterative adversarial attacks such as BIM (Kurakin et al., 2018) and PGD (Madry et al., 2018) have been proposed for white-box attacks; the difference between these algorithms lies in whether they adopt random initialization. The C&W attack (Carlini and Wagner, 2017) introduced a regularized loss parameterized by $c$ and demonstrated that defensive distillation does not significantly enhance robustness. DeepFool (Moosavi-Dezfooli et al., 2016) generated perturbations by projecting the data onto the closest hyperplane. Auto-Attack (Croce and Hein, 2020) combines four different attacks, two of which are PGD variants. Regarding the role of the "sign" function, Agarwal et al. (2020) demonstrated that gradient magnitude alone cannot yield a successful FGSM attack; meanwhile, by manipulating the images in the opposite direction of the gradient, the classification error rates can be significantly reduced. Furthermore, several works (Zhang et al., 2022; Liu et al., 2019; Shaham et al., 2015) study the sign function within bilevel optimization or adversarial training frameworks.

Black-box attacks have also received significant attention in the field. For instance, Chen et al. (2017) introduced the use of zeroth-order information to estimate the neural network gradients in order to generate adversarial examples. In a different approach, Ilyas et al. (2018) incorporated natural evolutionary strategies to enhance query efficiency. Furthermore, Al-Dujaili and O’Reilly (2020) proposed a sign-based gradient estimation approach, which replaced continuous gradient estimation with binary black-box optimization. The Square Attack algorithm, proposed by Andriushchenko et al. (2020), introduced randomized localized square-shaped updates, resulting in a substantial improvement in query efficiency. Another approach, Rays (Chen and Gu, 2020), transformed the continuous problem of finding the closest decision boundary into a discrete problem and achieved success in hard-label contexts.

3 Methodology

In this section, we introduce the formulation of adversarial attacks and present our algorithm. Let us consider the input $x\in\mathbb{R}^{n}$ with its true label $y$. We define the prediction model $f(x;w)$, where $w$ represents the model's parameters. The loss function, denoted as $l(f(x;w),y)$, quantifies the discrepancy between the model's output $f(x;w)$ and the true label $y$. Hence, our objective in an $L_{\infty}$ norm-based adversarial attack is to maximize the loss function $g(x+\delta)$, while constraining the perturbation $\delta$ within an $\epsilon$-ball, as follows:

$$\max_{\delta:\|\delta\|_{\infty}\leq\epsilon} l(f(x+\delta;w),y)=g(x+\delta).$$

In practice, we adopt the $\epsilon$-ball projection $\sigma_{\epsilon}(\cdot)$ to construct the projected perturbation $\delta_{c}=\sigma_{\epsilon}(\delta)$, which satisfies $\|\delta_{c}\|_{\infty}\leq\epsilon$. The $\sigma_{\epsilon}(\cdot)$ operation can be characterized as follows:

$$\delta_{c}=\sigma_{\epsilon}(\delta)=\max(\min(\delta,\epsilon),-\epsilon).$$

Note that the $\sigma_{\epsilon}$ projection operates element-wise.
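
For concreteness, this element-wise projection is a single clamp. Below is a minimal PyTorch sketch; the helper name `project_linf` is ours, not from the released code.

```python
import torch

def project_linf(delta: torch.Tensor, eps: float) -> torch.Tensor:
    """Element-wise projection of a perturbation onto the L-infinity eps-ball."""
    return torch.clamp(delta, min=-eps, max=eps)
```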

Inspired by the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2015), most attack algorithms utilize the sign function for the perturbation update, among which Projected Gradient Descent (PGD) (Madry et al., 2018) is the most widely studied and applied. Specifically, the perturbation $\delta$ is iteratively updated using the signed gradient in PGD. If we represent the perturbation at the $t$-th step as $\delta^{t}$ $(t=0,\ldots,T-1)$, the update procedure in PGD can be described as follows:

$$\textit{(Vanilla PGD)}\quad\delta^{t+1}=\delta^{t}_{c}+\alpha\,\text{sign}(\nabla_{\delta}g(x+\delta^{t}_{c})), \tag{1}$$

where $\delta_{c}^{t}=\sigma_{\epsilon}(\delta^{t})$ represents the element-wise clipped (or projected) perturbation at the $t$-th step. The sign() function assigns a value of 1 to positive elements and -1 to negative elements. The adversarial performance is evaluated by measuring the performance of the clipped final-step perturbation, denoted as $g(x+\delta_{c}^{T})$.
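
As an illustration, a single vanilla PGD update (eq. 1) can be sketched in PyTorch as follows; `pgd_step`, `model`, and `loss_fn` are placeholder names we introduce here, not identifiers from the released code.

```python
import torch

def pgd_step(model, loss_fn, x, y, delta, alpha, eps):
    """One vanilla PGD step (eq. 1): clip the perturbation, take the sign of the
    gradient of the loss w.r.t. the clipped perturbation, then update."""
    delta_c = torch.clamp(delta, -eps, eps).detach().requires_grad_(True)
    loss = loss_fn(model(x + delta_c), y)          # g(x + delta_c^t)
    grad, = torch.autograd.grad(loss, delta_c)     # raw gradient w.r.t. the perturbation
    return delta_c.detach() + alpha * grad.sign()  # signed update
```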

To maximize the adversarial loss $g(x+\delta_{c}^{T})$, a natural question arises: can we eliminate the sign() function in eq. 1, considering that the raw gradient contains more information before being collapsed to a binary value? A naive variant of PGD with raw updates is defined as follows:

$$\textit{(PGD raw gradient)}\quad\delta^{t+1}=\delta^{t}_{c}+\alpha\nabla_{\delta}g(x+\delta^{t}_{c}), \tag{2}$$

where we simply remove the sign() function. However, our experimental results indicate that the update in eq. 2 results in worse performance than the signed gradient update in terms of robust accuracy.

The failure of utilizing the raw gradient arises from the use of the clipped perturbation for the update. In Section 4.2, we show that during the update process, more than half of the perturbation elements cross the $\epsilon$ boundary and are projected back onto it. Consequently, a significant amount of raw gradient magnitude information is eliminated in $\delta_{c}^{t}$, making its use for the update ineffective in strengthening the attack.

To address this issue, we introduce a hidden unclipped perturbation $\delta^{t}$ for the update and propose the Raw Gradient Descent (RGD) algorithm, outlined as follows:

$$\textit{(Proposed RGD)}\quad\delta^{t+1}=\delta^{t}+\alpha\nabla_{\delta}g(x+\delta_{c}^{t}). \tag{3}$$

It is important to note that the intermediate update of $\delta^{t+1}$ depends on the unclipped $\delta^{t}$ and is allowed to cross the $\epsilon$ boundary. The projection is applied only inside the gradient computation $\nabla_{\delta}g(x+\delta_{c}^{t})$. As a result, the constrained optimization problem over $\delta_{c}^{t}$ is transformed into an unconstrained optimization problem over $\delta^{t}$, with the constraint implicitly enforced through the gradient computation $\nabla_{\delta}g(x+\delta_{c}^{t})$. The last-iterate perturbation $\delta^{T}$ is clipped to satisfy the constraint and serves as the final output. This design allows the intermediate perturbation to learn the genuine adversarial distribution without any loss of magnitude information. Its effectiveness has been demonstrated through the experiments in Section 5, where the clipped objective function $g(x+\delta_{c}^{T})$ was evaluated. Furthermore, our proposed algorithm does not incur any additional computational cost compared to the vanilla PGD in eq. 1, apart from storing the extra variable $\delta^{t}$.
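
To make the difference from eq. 1 and eq. 2 concrete, here is a minimal PyTorch sketch of an RGD step and a full attack loop under the setup above; the function names (`rgd_step`, `rgd_attack`) are ours, and the step size and step count are placeholders.

```python
import torch

def rgd_step(model, loss_fn, x, y, delta, alpha, eps):
    """One RGD step (eq. 3): the gradient is evaluated at the clipped perturbation,
    but the update is applied to the unclipped hidden variable delta."""
    delta_c = torch.clamp(delta, -eps, eps).detach().requires_grad_(True)
    loss = loss_fn(model(x + delta_c), y)
    grad, = torch.autograd.grad(loss, delta_c)
    return delta.detach() + alpha * grad           # no sign(), delta itself stays unclipped

def rgd_attack(model, loss_fn, x, y, alpha, eps, steps=7):
    """Run RGD from zero initialization and clip only the final output."""
    delta = torch.zeros_like(x)
    for _ in range(steps):
        delta = rgd_step(model, loss_fn, x, y, delta, alpha, eps)
    return x + torch.clamp(delta, -eps, eps)       # final perturbation projected to the eps-ball
```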

Metric               Algorithm   Step 1  Step 2  Step 3  Step 4  Step 5  Step 6  Step 7
Robust Accuracy (%)  PGD          65.51   52.8    46.74   44.89   43.84   43.43   43.19
                     PGD (raw)    58.74   50.36   46.7    45.58   44.98   44.68   44.54
                     RGD          55.65   47.58   45.14   43.97   43.4    43.04   42.88
Boundary Ratio (%)   PGD          31.7    52.5    74.5    78.1    83.2    85.0    86.9
                     PGD (raw)    24.2    39.6    47.8    52.2    55.3    57.4    59.1
                     RGD          34.5    52.5    60.1    64.3    67.0    69.0    70.6
Table 1: Comparison of update algorithms for robust accuracy and perturbation pixel-wise boundary ratio in different steps.

4 Understanding How Update Influences PGD Performance

In this section, we begin by conducting a theoretical analysis of the PGD update and examine how the step-wise update influences the attack performance. Subsequently, we integrate our theoretical findings with experimental results to elucidate why the sign operation has been favored in PGD.

[Figure 1: Comparison of update algorithms for perturbation pixel distribution in different steps. Panels show coefficient histograms for PGD, PGD (raw), and RGD at steps 1, 3, 5, and 7.]

4.1 Theoretical Insights

In this part, we characterize the influence of the update procedure on the adversarial gain $g(x+\delta_{c}^{t+1})-g(x+\delta_{c}^{t})$ at each step $t$. Inspired by existing works (Du et al., 2019; Arora et al., 2019), we consider a model function with one ReLU (Nair and Hinton, 2010) hidden layer, defined as follows:

$$f(x;w)=w_{1}^{T}h(w_{2}*x), \tag{4}$$

where $h(x)=\max(0,x)$ denotes the element-wise ReLU function, $x\in\mathbb{R}^{n\times 1}$ denotes the input, and $w_{1}\in\mathbb{R}^{m\times 1}$ and $w_{2}\in\mathbb{R}^{m\times n}$ denote the output weight vector and the first-layer weight matrix, respectively. Assuming the Mean Square Error (MSE) loss, the loss function $g(x+\delta_{c})$ is defined as follows:

$$g(x+\delta_{c})=\frac{1}{2}[f(x+\delta_{c};w)-y^{\ast}(x)]^{2}, \tag{5}$$

where $y^{\ast}(x)$ denotes the real label value for input $x$.

Theorem 1.

Considering $g(x)=\frac{1}{2}[w_{1}^{T}h(w_{2}*x)-y^{\ast}(x)]^{2}$ with the activation function $h(x)$ as ReLU, and defining $|\cdot|$ as the element-wise absolute operation, the adversarial step gain $g(x+\delta_{c}^{t+1})-g(x+\delta_{c}^{t})$ is bounded as follows:

$$g(x+\delta_{c}^{t+1})-g(x+\delta_{c}^{t})\leq\frac{\sqrt{2}}{2}|w_{1}^{T}||w_{2}||\delta_{c}^{t+1}-\delta_{c}^{t}|\left(\sqrt{g(x+\delta_{c}^{t+1})}+\sqrt{g(x+\delta_{c}^{t})}\right),$$

where $\delta_{c}^{t}$ denotes the clipped perturbation at the $t$-th step.

The theorem demonstrates that the adversarial gain at step $t$ depends on two factors: (i) the magnitude of the perturbation change $\delta_{c}^{t+1}-\delta_{c}^{t}$; if the update algorithm induces a large perturbation change, the achievable adversarial gain increases; (ii) the previous adversarial loss $g(x+\delta_{c}^{t})$ and the updated loss $g(x+\delta_{c}^{t+1})$. Note that the adversarial loss is non-negative, and we expect the updated $g(x+\delta_{c}^{t+1})$ to be greater than the previous $g(x+\delta_{c}^{t})$. Therefore, the step gain $g(x+\delta_{c}^{t+1})-g(x+\delta_{c}^{t})$ is highly influenced by the previous loss $g(x+\delta_{c}^{t})$: a larger previous adversarial loss implies a better step gain. In summary, if the update algorithm induces a significant perturbation change $\delta_{c}^{t+1}-\delta_{c}^{t}$ and starts from a substantial adversarial loss $g(x+\delta_{c}^{t})$, it can achieve a larger adversarial gain $g(x+\delta_{c}^{t+1})-g(x+\delta_{c}^{t})$.
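
As a quick sanity check of the bound, the following self-contained snippet evaluates both sides of Theorem 1 for a random one-hidden-layer ReLU model; all shapes, seeds, and perturbation values here are illustrative, not taken from the paper's experiments.

```python
import torch

torch.manual_seed(0)
n, m, eps = 8, 16, 8 / 255
w1, w2 = torch.randn(m, 1), torch.randn(m, n)
x, y_star = torch.rand(n, 1), torch.tensor([[1.0]])

def g(z):                                           # adversarial MSE loss, eq. (5)
    return 0.5 * (w1.T @ torch.relu(w2 @ z) - y_star).pow(2).squeeze()

# two clipped perturbations playing the roles of delta_c^t and delta_c^{t+1}
d_t = torch.clamp(eps * torch.randn(n, 1), -eps, eps)
d_t1 = torch.clamp(d_t + 0.3 * eps * torch.randn(n, 1), -eps, eps)

gain = g(x + d_t1) - g(x + d_t)
bound = (2 ** 0.5 / 2) * (w1.abs().T @ w2.abs() @ (d_t1 - d_t).abs()).squeeze() * (
    g(x + d_t1).sqrt() + g(x + d_t).sqrt()
)
print(float(gain), float(bound))                    # the gain should not exceed the bound
```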

Dataset                  Model                              PGD           PGD (raw)     RGD
CIFAR-10 (ϵ=8/255)       WRN-28 (Ding et al., 2020)         52.64±0.052   53.98±0.067   52.56±0.013
                         PreRN-18 (Wong et al., 2020)       47.45±0.052   47.50±0.013   47.31±0.005
                         RN-18 (Addepalli et al., 2022)     57.06±0.044   57.11±0.005   57.03±0.000
                         RN-18 (Engstrom et al., 2019)      43.23±0.023   44.50±0.015   42.87±0.010
CIFAR-10 (ϵ=16/255)      WRN-28 (Ding et al., 2020)         37.53±0.143   40.93±0.068   37.19±0.024
                         PreRN-18 (Wong et al., 2020)       13.32±0.073   14.98±0.012   12.99±0.015
                         RN-18 (Addepalli et al., 2022)     25.57±0.094   25.87±0.016   25.27±0.006
                         RN-18 (Engstrom et al., 2019)      14.19±0.033   19.11±0.048   13.37±0.028
CIFAR-100 (ϵ=16/255)     WRN-28 (Wang et al., 2023)         21.10±0.075   20.72±0.000   20.61±0.000
                         XCiT (Debenedetti et al., 2022)    15.28±0.067   15.44±0.000   15.28±0.004
                         WRN-34 (Addepalli et al., 2022)    16.37±0.010   16.16±0.019   15.94±0.013
                         RN-18 (Addepalli et al., 2022)     14.45±0.082   14.40±0.008   14.20±0.010
                         PreRN-18 (Rice et al., 2020)       6.14±0.035    7.36±0.018    5.78±0.010
ImageNet (ϵ=4/255)       WRN-50 (Salman et al., 2020)       42.02±0.041   42.57±0.032   41.88±0.015
                         RN-50 (Wong et al., 2020)          29.0±0.028    28.56±0.013   27.85±0.027
                         RN-18 (Salman et al., 2020)        30.08±0.054   30.45±0.010   30.08±0.008
ImageNet (ϵ=8/255)       WRN-50 (Salman et al., 2020)       19.35±0.041   21.75±0.050   18.97±0.032
                         RN-50 (Wong et al., 2020)          11.77±0.030   12.31±0.016   11.03±0.010
                         RN-18 (Salman et al., 2020)        12.88±0.027   13.69±0.010   12.68±0.008
Table 2: Robust accuracy comparison of PGD, PGD with raw update, and RGD for a 7-step attack. Model names are abbreviated: XCiT-S12 as XCiT, WideResNet as WRN, ResNet as RN, and PreActResNet as PreRN.

4.2 Empirical Study

We next conduct a further study to analyze how the update process shapes adversarial samples in practice. Specifically, we target a robust PGD-trained model (available at https://github.com/ndb796/Pytorch-Adversarial-Training-CIFAR, trained on CIFAR-10 with the $\epsilon=8/255$ $L_{\infty}$ setting) on the CIFAR-10 test set within the $\epsilon=8/255$ ball. The perturbation distribution at each step for the three update algorithms (PGD, PGD with raw update, proposed RGD) is illustrated in Figure 1. The results reveal that, within the limited 7 steps, the majority of pixels (86.9%) converge to the boundary in the PGD update, whereas its corresponding raw update shows a lower boundary ratio (59.1%), with a considerable number of pixels remaining stuck at the zero initial point. More detailed information regarding the boundary ratio and robust accuracy can be found in Table 1. It reveals that PGD achieves a lower robust accuracy (43.19%) compared to its raw update version (44.54%). Considering the theoretical results presented in Theorem 1, which indicate that larger perturbation changes lead to better performance gains, we can conclude that the success of PGD, in contrast to PGD with naive raw update, can be attributed to the "sign" operation, which facilitates more pixel changes and convergence to the boundary.
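
The boundary ratio reported in Table 1 and Figure 1 can be measured with a simple check of the following form; this is our own sketch, with `tol` an assumed numerical tolerance.

```python
import torch

def boundary_ratio(delta: torch.Tensor, eps: float, tol: float = 1e-6) -> float:
    """Fraction of perturbation entries whose magnitude has (numerically) reached eps."""
    delta_c = torch.clamp(delta, -eps, eps)
    return (delta_c.abs() >= eps - tol).float().mean().item()
```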

Furthermore, it is worth noting that RGD outperforms PGD in terms of attack performance, despite having fewer boundary pixels (70.6% vs. 86.9%). This can be attributed to the fact that the adversarial distribution learned by RGD exhibits better attack performance, as indicated by higher values of $g(x+\delta_{c}^{t})$, resulting in larger adversarial gains at each step, as demonstrated in Table 1. Consequently, RGD is able to learn a better adversarial distribution even without a significant perturbation change. It is important to highlight that PGD (raw), despite its use of raw updates, does not effectively preserve most of the magnitude information due to the clip operation. As a result, it performs significantly worse than RGD.

To ensure a fair comparison, we carefully fine-tune the step size $\alpha$ for all three algorithms using grid search. Following the principle of minimal robust accuracy, we use a zero initial point for both PGD (raw) and RGD, and a random initial point for PGD. Note that the choice of initial point does not influence the final boundary ratio; a detailed comparison of initializations is available in Appendix B.

5 Experimental Results

In this section, we begin by comparing the proposed RGD algorithm with PGD (sign/raw updates) for adversarial attacks on different architectures, with varying adversarial perturbation levels and datasets. Then we compare RGD and PGD in the adversarial training setting and showcase the robust accuracy boosts brought by RGD. Moreover, we utilize RGD for transfer attacks and conduct extensive experiments to validate its remarkable improvement in adversarial transferability. All experiments are conducted with five independent runs with different random seeds, and the standard deviations are reported to affirm the significance of the results. The step sizes $\alpha$ for all algorithms are carefully tuned through grid search. For PGD, we grid search the step size over the list ($2\epsilon$, $1.5\epsilon$, $\epsilon$, $0.8\epsilon$, $0.5\epsilon$, $0.25\epsilon$, $0.2\epsilon$), and the RGD and PGD (raw) step sizes are tuned over the list (1, 3, 10, 30, 100, 300, 1000, 3000, 1e4, 3e4, etc.). The step sizes for RGD and PGD (raw) are larger than those for PGD because the raw gradient with respect to the input is small, which calls for a large step size. Meanwhile, we utilize random initialization for PGD and zero initialization for the other methods. The choice of initialization follows the principle of minimal robust accuracy; a detailed comparison of initializations is available in Appendix B. All experiments use NVIDIA Volta V100 GPUs.
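
For reference, the step-size selection described above amounts to a one-dimensional grid search under the minimal-robust-accuracy criterion; the sketch below assumes a user-supplied `evaluate_robust_accuracy` routine and is not part of the released code.

```python
# Hypothetical step-size grid search over the candidate lists described above.
def tune_step_size(candidate_alphas, evaluate_robust_accuracy):
    """Return the step size that minimizes robust accuracy (i.e., the strongest attack)."""
    scores = {alpha: evaluate_robust_accuracy(alpha) for alpha in candidate_alphas}
    return min(scores, key=scores.get), scores

eps = 8 / 255
pgd_grid = [2 * eps, 1.5 * eps, eps, 0.8 * eps, 0.5 * eps, 0.25 * eps, 0.2 * eps]
rgd_grid = [1, 3, 10, 30, 100, 300, 1000, 3000, 1e4, 3e4]
```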

5.1 Comparison of Algorithms for Adversarial Attack

We compare PGD with sign/raw updates and the proposed RGD using a 7-step attack. The datasets include CIFAR-10, CIFAR-100, and ImageNet, and we attack their respective test or validation sets. The attacked models are sourced from RobustBench (Croce et al., 2021). The adversarial attack settings follow the approach of Ding et al. (2019), and the results are presented in Table 2.

Our results demonstrate that the introduced non-clipped perturbation significantly improves the raw update method, resulting in a substantial performance boost from PGD (raw) to RGD. Furthermore, RGD outperforms PGD across various datasets, model architectures, and adversarial budgets, highlighting the superiority of RGD. It is worth noting that RGD is particularly advantageous in scenarios with larger $\epsilon$ values (e.g., $\epsilon=16/255$). This is likely because a larger adversarial budget allows RGD to exhibit a more effective adversarial distribution. Conversely, if a smaller $\epsilon$-ball is used, most RGD perturbation pixels will be clipped to the $\epsilon$ boundary in the final step.

5.2 Comparison of Algorithms for Adversarial Training

For adversarial training, we conduct experiments with different architectures and attack steps on CIFAR-10 to compare PGD and RGD. We specifically train models including a ResNet, a WideResNet, and a convolutional neural network, using both 5-step and 10-step attack strategies. Each model comprises 6 blocks. The robustness of these models is assessed using a 10-step PGD attack, with all experiments conducted within an $\epsilon=8/255$ constraint. Both clean and robust accuracy results are presented in Table 3.
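
For completeness, one epoch of RGD-based adversarial training can be sketched as below. This is a hedged reconstruction of the described setup (5- or 10-step inner maximization within $\epsilon=8/255$), not the authors' training script; all names and the inner step size are ours.

```python
import torch

def adversarial_train_epoch(model, loader, optimizer, loss_fn, eps=8/255, alpha=100.0, steps=5):
    """One epoch of adversarial training whose inner maximization uses RGD (eq. 3).
    The inner step size alpha is a placeholder and would be tuned in practice."""
    model.train()
    for x, y in loader:
        delta = torch.zeros_like(x)                       # zero initialization for RGD
        for _ in range(steps):
            delta_c = torch.clamp(delta, -eps, eps).detach().requires_grad_(True)
            adv_loss = loss_fn(model(x + delta_c), y)
            grad, = torch.autograd.grad(adv_loss, delta_c)
            delta = delta.detach() + alpha * grad         # update the unclipped hidden variable
        x_adv = x + torch.clamp(delta, -eps, eps)         # project only the final perturbation
        optimizer.zero_grad()
        loss_fn(model(x_adv), y).backward()
        optimizer.step()
```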

Setting                 Method   Clean Accuracy   Robust Accuracy
ResNet (Step=5)         PGD      84.41%           29.32%
                        RGD      76.35%           44.67% (15.35% ↑)
ResNet (Step=10)        PGD      79.41%           38.15%
                        RGD      78.35%           47.16% (9.01% ↑)
WideResNet (Step=5)     PGD      89.83%           37.86%
                        RGD      86.17%           52.15% (14.29% ↑)
WideResNet (Step=10)    PGD      80.53%           48.33%
                        RGD      86.04%           52.19% (3.86% ↑)
CNN (Step=5)            PGD      86.85%           26.00%
                        RGD      82.72%           42.76% (16.76% ↑)
CNN (Step=10)           PGD      82.10%           41.35%
                        RGD      82.63%           43.03% (1.68% ↑)
Table 3: Adversarial training comparison of PGD and RGD. CNN refers to a convolutional neural network. The number in parentheses shows the robust accuracy improvement from switching PGD to RGD.

From the table, it can be observed that RGD significantly improves robust accuracy across all architectures and step counts. Specifically, for the WideResNet step=10 setting, RGD improves both clean and robust accuracy. For the CNN step=5 setting, RGD sacrifices a few points of clean accuracy but improves robust accuracy by 16.76% over PGD. All these experimental results further validate the superiority of RGD over PGD, especially in the 5-step setting.

Dataset                  Model               Type     PGD           PGD (raw)     RGD
ImageNet (ϵ=8/255)       ResNet-50           Source   99.97±0.001   96.73±0.068   99.30±0.015
                         DenseNet-121        Clean    54.52±0.15    24.34±0.355   59.42±0.379
                         VGG19-BN            Clean    51.06±0.155   26.51±0.241   59.40±0.358
                         Inception-V3        Clean    29.04±0.349   22.42±0.355   34.82±0.207
                         PreActResNet-18     Robust   30.09±0.077   30.51±0.118   30.88±0.027
                         WideResNet-50       Robust   12.13±0.051   12.55±0.07    12.77±0.079
                         ResNet-50           Robust   18.14±0.096   18.81±0.116   19.11±0.047
ImageNet (ϵ=16/255)      ResNet-50           Source   100±0.008     98.08±0.194   99.30±0.008
                         DenseNet-121        Clean    77.58±0.221   42.76±0.363   81.1±0.196
                         VGG19-BN            Clean    72.67±0.142   49.98±0.289   80.62±0.246
                         Inception-V3        Clean    44.04±0.414   35.01±0.193   54.6±0.253
                         PreActResNet-18     Robust   34±0.041      37.66±0.054   38.22±0.093
                         WideResNet-50       Robust   15.03±0.131   19.12±0.113   19.62±0.078
                         ResNet-50           Robust   21.96±0.054   26.81±0.183   27.52±0.109
CIFAR-10 (ϵ=8/255)       ResNet-50           Source   98.76±0.065   78.99±0.316   98.22±0.026
                         DenseNet-121        Clean    65.74±0.3     43.05±0.145   79.66±0.124
                         VGG19-BN            Clean    58.20±0.159   45.87±0.341   72.23±0.155
                         Inception-V3        Clean    64.28±0.17    49.87±0.357   77.3±0.1
                         WideResNet-28       Robust   17.41±0.039   17.86±0.052   18.88±0.037
                         PreActResNet-18¹    Robust   18.11±0.057   18.06±0.049   19.01±0.049
                         PreActResNet-18²    Robust   21.46±0.079   21.62±0.063   22.54±0.039
CIFAR-10 (ϵ=16/255)      ResNet-50           Source   99.9±0.024    90.32±0.192   99.22±0.017
                         DenseNet-121        Clean    89.36±0.348   76.5±0.198    95.51±0.082
                         VGG19-BN            Clean    82.90±0.206   79.17±0.306   90.8±0.086
                         Inception-V3        Clean    86.61±0.181   79.7±0.234    93.49±0.044
                         WideResNet-28       Robust   19.9±0.098    22.42±0.148   23.90±0.035
                         PreActResNet-18¹    Robust   19.98±0.069   21.67±0.103   23.53±0.041
                         PreActResNet-18²    Robust   23.23±0.124   24.49±0.071   26.03±0.039
Table 4: Comparison of transfer attack success rates in a 10-step attack for PGD, PGD with raw update, and RGD. For ImageNet, the PreActResNet-18 model is from Wong et al. (2020), the WideResNet-50 model is from Salman et al. (2020), and the ResNet-50 model is from Engstrom et al. (2019). For CIFAR-10, the WideResNet-28 model is from Ding et al. (2020), the PreActResNet-18¹ model is from Wong et al. (2020), and the PreActResNet-18² model is from Andriushchenko and Flammarion (2020).
[Figure 2: Comparison of robust accuracy of PGD, PGD with raw update, and the proposed RGD with different $\epsilon$ sizes.]

[Figure 3: Comparison of robust accuracy of PGD, PGD with raw update, and RGD with different update steps when attacking the robust ResNet18 model on CIFAR-10.]

5.3 Transfer Attack Study

In this section, we examine and compare the transferability of RGD with PGD (sign/raw). We begin by generating adversarial data through a 10-step attack on the clean ResNet50 model (He et al., 2016). These adversarial samples are then transferred to attack clean (Simonyan and Zisserman, 2015; Huang et al., 2017; Szegedy et al., 2015) and robust (Croce et al., 2021) target models. The ImageNet validation set and CIFAR-10 testing set are used as the data for the attacks, following the settings in Zhao et al. (2022). The detailed results of the attack success rates for the source and target models can be found in Table 4.

The findings demonstrate that RGD consistently achieves the highest target success rates while maintaining source success rates similar to PGD. It consistently improves the success rate by at least 5% when attacking most clean models and by around 3% for robust models with a larger boundary ($\epsilon=16/255$). Furthermore, PGD (raw) outperforms signed PGD when attacking robust models, indicating that raw updates enhance transferability to robust models. Our proposed RGD method further improves both robust and clean transferability compared to PGD (raw).
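
The transfer success rates in Table 4 correspond to the fraction of adversarial inputs misclassified by the target model; a minimal evaluation sketch (our naming, not from the released code) is shown below.

```python
import torch

@torch.no_grad()
def transfer_success_rate(target_model, x_adv: torch.Tensor, y: torch.Tensor) -> float:
    """Attack success rate on a target model: fraction of adversarial inputs it misclassifies."""
    preds = target_model(x_adv).argmax(dim=1)
    return (preds != y).float().mean().item()
```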

6 Ablation and Visualization

6.1 Adversarial Perturbation Level Study

In this section, we conduct a comprehensive comparison under different levels of adversarial perturbation. Specifically, we consider three different values of $\epsilon$ ($4/255$, $8/255$, $16/255$) for generating adversarial examples, and evaluate the robust accuracy of PGD with sign or raw gradient updates, as well as RGD, when attacking the robust ResNet18 model under the same settings as in Section 4.2. We perform a grid search to determine the step sizes for each algorithm, and the results are presented in Figure 2, where lower robust accuracy indicates a stronger attack.

Our results reveal the following observations. In the context of a small $\epsilon$ ($\epsilon=4/255$), all algorithms converge to a similar performance point, suggesting comparable attack effectiveness. However, for a larger budget ($\epsilon=16/255$), PGD with the raw gradient update performs relatively poorly, while RGD outperforms the other algorithms with noticeable improvements. Therefore, when a larger perturbation is allowed, RGD is the preferred choice for achieving stronger attack performance.

Furthermore, the raw update-based algorithms, namely PGD (raw) and RGD, exhibit lower robust accuracy in the early steps. It is important to mention that we utilize zero initialization for PGD (raw) and RGD, while using a stronger uniform random initialization for PGD. This suggests that the raw update enables the generation of high-quality adversarial examples in the early steps.

6.2 Adversarial Update Step Study

In this section, we compare the performance of PGD with sign/raw updates and RGD at different numbers of update steps. Specifically, we evaluate the algorithms at 5, 10, and 20 steps. The experimental setup follows Section 4.2, where step sizes are carefully fine-tuned. We fix the adversarial size $\epsilon$ at $16/255$, and the results are presented in Figure 3.

Our findings reveal that in the early stages, RGD achieves a lower robust accuracy compared to PGD, which can be attributed to its generation of stronger adversarial examples through a genuine adversarial distribution. However, as the number of update steps increases, the performance gap between RGD and PGD diminishes. This can be explained by the stable perturbation change introduced by the sign function in PGD, allowing it to achieve performance comparable to RGD in the later steps. In summary, RGD is a preferable choice for scenarios requiring a few-step attack.

Additionally, it is evident that PGD with raw update remains in a suboptimal performance region and shows limited improvement in the later steps. This observation further highlights the advantage of the non-clipping design in RGD, which facilitates the generation of stronger adversarial examples.

7 Conclusion

This work provides a theoretical analysis of how update procedures impact adversarial performance and offers an empirical explanation for why the sign operation is preferred in PGD. Additionally, we introduce the hidden unclipped perturbation and propose a novel attack algorithm called RGD, which transforms the constrained optimization problem into an unconstrained one. Extensive experiments demonstrate the superiority of the proposed algorithm in practical scenarios.

References

  • Addepalli et al. (2022) S. Addepalli, S. Jain, et al. Efficient and effective augmentation strategy for adversarial training. Advances in Neural Information Processing Systems (NeurIPS), 35:1488–1501, 2022.
  • Agarwal et al. (2020) A. Agarwal, R. Singh, and M. Vatsa. The role of 'sign' and 'direction' of gradient on the performance of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 646–647, 2020.
  • Al-Dujaili and O’Reilly (2020) A. Al-Dujaili and U.-M. O’Reilly. Sign bits are all you need for black-box attacks. In International Conference on Learning Representations, 2020.
  • Andriushchenko and Flammarion (2020) M. Andriushchenko and N. Flammarion. Understanding and improving fast adversarial training. Advances in Neural Information Processing Systems (NeurIPS), 33:16048–16059, 2020.
  • Andriushchenko et al. (2020) M. Andriushchenko, F. Croce, N. Flammarion, and M. Hein. Square attack: a query-efficient black-box adversarial attack via random search. In European Conference on Computer Vision (ECCV), pages 484–501. Springer, 2020.
  • Arora et al. (2019) S. Arora, S. Du, W. Hu, Z. Li, and R. Wang. Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks. In International Conference on Machine Learning (ICML), pages 322–332, 2019.
  • Carlini and Wagner (2017) N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.
  • Chen and Gu (2020) J. Chen and Q. Gu. Rays: A ray searching method for hard-label adversarial attack. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), pages 1739–1747, 2020.
  • Chen et al. (2017) P.-Y. Chen, H. Zhang, Y. Sharma, J. Yi, and C.-J. Hsieh. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM workshop on artificial intelligence and security, pages 15–26, 2017.
  • Croce and Hein (2020) F. Croce and M. Hein. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In International Conference on Machine Learning (ICML), pages 2206–2216. PMLR, 2020.
  • Croce et al. (2021) F. Croce, M. Andriushchenko, V. Sehwag, E. Debenedetti, N. Flammarion, M. Chiang, P. Mittal, and M. Hein. Robustbench: a standardized adversarial robustness benchmark. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2021. URL https://openreview.net/forum?id=SSKZPJCt7B.
  • Debenedetti et al. (2022) E. Debenedetti, V. Sehwag, and P. Mittal. A light recipe to train robust vision transformers. arXiv preprint arXiv:2209.07399, 2022.
  • Ding et al. (2019) G. W. Ding, L. Wang, and X. Jin. AdverTorch v0.1: An adversarial robustness toolbox based on pytorch. arXiv preprint arXiv:1902.07623, 2019.
  • Ding et al. (2020) G. W. Ding, Y. Sharma, K. Y. C. Lui, and R. Huang. Mma training: Direct input space margin maximization through adversarial training. In International Conference on Learning Representations (ICLR), 2020.
  • Dong et al. (2018) Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li. Boosting adversarial attacks with momentum. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 9185–9193, 2018.
  • Du et al. (2019) S. S. Du, X. Zhai, B. Poczos, and A. Singh. Gradient descent provably optimizes over-parameterized neural networks. In International Conference on Learning Representations (ICLR), 2019.
  • Engstrom et al. (2019) L. Engstrom, A. Ilyas, H. Salman, S. Santurkar, and D. Tsipras. Robustness (python library), 2019. URL https://github.com/MadryLab/robustness.
  • Goodfellow et al. (2015) I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), 2015.
  • He et al. (2016) K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pages 770–778, 2016.
  • Hochreiter and Schmidhuber (1997) S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
  • Huang et al. (2017) G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pages 4700–4708, 2017.
  • Huang et al. (2021) H. Huang, Y. Wang, S. Erfani, Q. Gu, J. Bailey, and X. Ma. Exploring architectural ingredients of adversarially robust deep neural networks. Advances in Neural Information Processing Systems (NeurIPS), 34:5545–5559, 2021.
  • Ilyas et al. (2018) A. Ilyas, L. Engstrom, A. Athalye, and J. Lin. Black-box adversarial attacks with limited queries and information. In International conference on machine learning, pages 2137–2146. PMLR, 2018.
  • Jia et al. (2022) X. Jia, Y. Zhang, B. Wu, K. Ma, J. Wang, and X. Cao. Las-at: adversarial training with learnable attack strategy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13398–13408, 2022.
  • Krizhevsky et al. (2012) A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In F. Pereira, C. Burges, L. Bottou, and K. Weinberger, editors, Advances in Neural Information Processing Systems (NeurIPS), volume 25. Curran Associates, Inc., 2012.
  • Kurakin et al. (2018) A. Kurakin, I. J. Goodfellow, and S. Bengio. Adversarial examples in the physical world. In Artificial intelligence safety and security, pages 99–112. Chapman and Hall/CRC, 2018.
  • Liu et al. (2019) S. Liu, P.-Y. Chen, X. Chen, and M. Hong. signSGD via zeroth-order oracle. In International Conference on Learning Representations (ICLR), 2019.
  • Madry et al. (2018) A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations (ICLR), 2018.
  • Moosavi-Dezfooli et al. (2016) S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pages 2574–2582, 2016.
  • Nair and Hinton (2010) V. Nair and G. E. Hinton. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML), 2010.
  • Rebuffi et al. (2021) S.-A. Rebuffi, S. Gowal, D. A. Calian, F. Stimberg, O. Wiles, and T. Mann. Fixing data augmentation to improve adversarial robustness. arXiv preprint arXiv:2103.01946, 2021.
  • Rice et al. (2020) L. Rice, E. Wong, and Z. Kolter. Overfitting in adversarially robust deep learning. In International Conference on Machine Learning (ICML), pages 8093–8104, 2020.
  • Salman et al. (2020) H. Salman, A. Ilyas, L. Engstrom, A. Kapoor, and A. Madry. Do adversarially robust imagenet models transfer better? Advances in Neural Information Processing Systems (NeurIPS), 33:3533–3545, 2020.
  • Shaham et al. (2015) U. Shaham, Y. Yamada, and S. Negahban. Understanding adversarial training: Increasing local stability of neural nets through robust optimization. arXiv preprint arXiv:1511.05432, 2015.
  • Simonyan and Zisserman (2015) K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations (ICLR), 2015.
  • Szegedy et al. (2013) C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
  • Szegedy et al. (2015) C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pages 1–9, 2015.
  • Wang et al. (2023) Z. Wang, T. Pang, C. Du, M. Lin, W. Liu, and S. Yan. Better diffusion models further improve adversarial training. arXiv preprint arXiv:2302.04638, 2023.
  • Wong et al. (2020) E. Wong, L. Rice, and J. Z. Kolter. Fast is better than free: Revisiting adversarial training. In International Conference on Learning Representations (ICLR), 2020.
  • Wu et al. (2020) D. Wu, S.-T. Xia, and Y. Wang. Adversarial weight perturbation helps robust generalization. Advances in Neural Information Processing Systems (NeurIPS), 33:2958–2969, 2020.
  • Zhang et al. (2020) J. Zhang, J. Zhu, G. Niu, B. Han, M. Sugiyama, and M. Kankanhalli. Geometry-aware instance-reweighted adversarial training. arXiv preprint arXiv:2010.01736, 2020.
  • Zhang et al. (2021) Y. Zhang, M. Gong, T. Liu, G. Niu, X. Tian, B. Han, B. Schölkopf, and K. Zhang. Causaladv: Adversarial robustness through the lens of causality. arXiv preprint arXiv:2106.06196, 2021.
  • Zhang et al. (2022) Y. Zhang, G. Zhang, P. Khanduri, M. Hong, S. Chang, and S. Liu. Revisiting and advancing fast adversarial training through the lens of bi-level optimization. In International Conference on Machine Learning (ICML), pages 26693–26712. PMLR, 2022.
  • Zhao et al. (2022) Z. Zhao, H. Zhang, R. Li, R. Sicre, L. Amsaleg, and M. Backes. Towards good practices in evaluating transfer adversarial attacks. arXiv preprint arXiv:2211.09565, 2022.

Supplementary Materials

Appendix A Comparison with AutoAttack

AutoAttack (Croce and Hein, 2020) is an ensemble adversarial algorithm known for its success in adversarial attacks. One of its components, APGD-CE, is a variant of the PGD algorithm. The results from Sections 4.2 and 6.2 demonstrate that RGD is capable of learning the genuine adversarial distribution, resulting in stronger attacks in the early steps. In contrast, PGD benefits from the larger perturbation changes introduced by the sign function, which leads to strong performance in the later steps. Therefore, we integrate RGD into the 100-step APGD-CE algorithm by replacing the first two APGD updates with RGD updates. Specifically, we implement APGD-CE+RGD without restarting, and RGD is initialized with zero values. The step size $\alpha$ for RGD is carefully fine-tuned based on the performance of the first two steps. The selection of two steps was motivated by its ability to yield the most significant enhancements in most scenarios. A comprehensive comparison of the improvements observed in the initial two steps is available in Appendix D. Following the same experimental setup as Croce and Hein (2020), we present a comparison between APGD-CE and APGD-CE+RGD in terms of final robust accuracy in Table 5.
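
A hedged sketch of this combination is given below: the first two of the 100 updates use RGD from zero initialization, and the remaining steps are handed off to an existing APGD-CE implementation. The hand-off hook `apgd_ce_from` and the step size `alpha_rgd` are assumptions of ours, not an actual AutoAttack API.

```python
import torch

def apgd_ce_with_rgd(model, loss_fn, x, y, eps, alpha_rgd, apgd_ce_from, total_steps=100):
    """Replace the first two APGD-CE updates with RGD steps (eq. 3), then continue with APGD-CE."""
    delta = torch.zeros_like(x)                           # zero initialization for the RGD phase
    for _ in range(2):                                    # the two replaced updates
        delta_c = torch.clamp(delta, -eps, eps).detach().requires_grad_(True)
        loss = loss_fn(model(x + delta_c), y)
        grad, = torch.autograd.grad(loss, delta_c)
        delta = delta.detach() + alpha_rgd * grad
    delta = torch.clamp(delta, -eps, eps)                 # project before handing off
    return apgd_ce_from(x, y, delta, steps=total_steps - 2)
```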

Dataset                  Model                              APGD-CE       APGD-CE+RGD
CIFAR-10 (ϵ=16/255)      WRN-34 (Huang et al., 2021)        23.48±0.036   23.35±0.028
                         WRN-28 (Wu et al., 2020)           28.0±0.061    27.81±0.012
                         PreRN-18 (Wong et al., 2020)       9.63±0.037    9.58±0.01
CIFAR-100 (ϵ=16/255)     WRN-28 (Wang et al., 2023)         19.43±0.082   19.31±0.005
                         WRN-70 (Rebuffi et al., 2021)      16.88±0.039   16.55±0.002
                         XCiT (Debenedetti et al., 2022)    13.77±0.082   13.71±0.004
                         WRN-34 (Jia et al., 2022)          11.56±0.036   11.28±0.016
ImageNet (ϵ=4/255)       RN-18 (Salman et al., 2020)        29.32±0.033   29.26±0.020
                         XCiT (Debenedetti et al., 2022)    42.73±0.010   42.73±0.010
                         PreRN-18 (Wong et al., 2020)       27.18±0.023   27.07±0.035
                         WRN-50 (Salman et al., 2020)       40.86±0.029   40.76±0.032
Table 5: Comparison of robust accuracy in a 100-step attack between APGD-CE (AutoAttack) and APGD-CE+RGD. The methods are abbreviated as in Table 2.

Our results demonstrate that despite the initial disadvantage of zero initialization, RGD achieves a lower robust accuracy compared to APGD in the first two steps as shown in Appendix D, highlighting its superiority. Furthermore, RGD maintains this advantage in the final stage across most scenarios as shown in Table 5. It is worth noting that such considerable performance improvement is achieved by only modifying the first two steps out of the 100 updates.

Appendix B Comparison of Initials

For initialization, we choose methods based on their suitability: PGD favors random initialization, while RGD and PGD (raw) lean towards zero initialization. To illustrate, we present the robust accuracy outcomes when attacking the robust ResNet18 model using various initial values:

Method       Random Initial   Zero Initial
PGD          43.2%            43.27%
PGD (raw)    44.65%           44.49%
RGD          42.91%           42.88%
Table 6: Robust accuracy comparison of PGD, PGD (raw), and RGD with random/zero initialization for a 7-step attack.

From Table 6, it is evident that PGD benefits from random initialization, whereas PGD (raw) and RGD perform better with zero initialization. Moreover, RGD exhibits superior performance in both contexts.

Appendix C Pixel-wise Experiments

To better illustrate that PGD enjoys a larger per-step perturbation change than RGD, we calculate the average pixel-wise perturbation change when attacking the robust ResNet18 model. The results are shown below:

Algorithm    Step 1   Step 2   Step 3   Step 4   Step 5   Step 6   Step 7
PGD          0.0167   0.0119   0.0077   0.005    0.0041   0.0035   0.0032
PGD (raw)    0.0116   0.0082   0.006    0.0054   0.0049   0.0048   0.0048
RGD          0.0146   0.0084   0.0048   0.003    0.0022   0.0019   0.0017
Table 7: Comparison of update algorithms for average perturbation change in different steps.
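
The per-step quantity in Table 7 can be computed as the mean absolute difference between consecutive clipped perturbations; a minimal sketch (our naming) is given below.

```python
import torch

def mean_perturbation_change(delta_c_prev: torch.Tensor, delta_c_next: torch.Tensor) -> float:
    """Average absolute pixel-wise change of the clipped perturbation between two steps."""
    return (delta_c_next - delta_c_prev).abs().mean().item()
```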

From Table 7, it is evident that RGD undergoes smaller perturbation shifts than PGD, leading to a decreased boundary ratio as depicted in Table 1. Despite these minor perturbation variations, RGD, benefiting from genuine adversarial perturbations, surpasses PGD in performance. However, the edge of this improvement narrows with increasing iterations.

In the case of PGD (raw), while it might show pronounced perturbation changes in later stages, its adversarial loss remains suboptimal and it lacks consistent convergence stability. As a result, the performance of PGD (raw) is notably inferior to both PGD and RGD.

Appendix D AutoAttack Extra Experimental Results

Dataset                  Model                              APGD-CE         APGD-CE+RGD
CIFAR-10 (ϵ=16/255)      WRN-34 (Huang et al., 2021)        90.01 → 47.73   91.23 → 45.79
                         WRN-28 (Wu et al., 2020)           86.46 → 41.71   88.25 → 39.3
                         PreRN-18 (Wong et al., 2020)       81.85 → 24.75   83.34 → 21.72
CIFAR-100 (ϵ=16/255)     WRN-28 (Wang et al., 2023)         70.72 → 25.6    72.58 → 24.17
                         WRN-70 (Rebuffi et al., 2021)      61.73 → 22.75   63.56 → 21.44
                         XCiT (Debenedetti et al., 2022)    65.64 → 19.88   67.34 → 19.16
                         WRN-34 (Jia et al., 2022)          65.11 → 21.44   67.31 → 20.33
ImageNet (ϵ=4/255)       RN-18 (Salman et al., 2020)        52.64 → 31.28   52.88 → 30.68
                         XCiT (Debenedetti et al., 2022)    72.04 → 46.58   72.06 → 45.72
                         PreRN-18 (Wong et al., 2020)       52.97 → 30.94   53.46 → 30.12
                         WRN-50 (Salman et al., 2020)       68.36 → 44.96   68.62 → 43.9
Table 8: Comparison of robust accuracy between APGD-CE (AutoAttack) and APGD-CE+RGD. The table presents the change in accuracy during the first two steps. The methods are abbreviated as follows: XCiT-S12 is denoted as XCiT, WideResNet as WRN, ResNet as RN, and PreActResNet as PreRN.

In this section, we evaluate the improvement in the first two steps of accuracy for APGD and APGD+RGD. The results are shown in Table 8. We can observe that despite having a worse initialization (zero initialization), RGD achieves a lower robust accuracy after two steps compared to APGD, indicating its superiority. Furthermore, APGD+RGD maintains these advantages in the final accuracy, as demonstrated in Table 5.

Appendix E Related Lemmas

Lemma 1.

Consider the activation function $h(x)$ as ReLU, i.e., $h(x)=\max(x,0)=\frac{|x|+x}{2}$. Define $|\cdot|$ as the element-wise absolute operation, $w_{1+}=\frac{1}{2}(w_{1}+|w_{1}|)$, and $w_{1-}=\frac{1}{2}(w_{1}-|w_{1}|)$. We obtain

$$\begin{aligned}
& w_{1}^{T}h(w_{2}(x+\delta_{c}^{t+1}))-w_{1}^{T}h(w_{2}(x+\delta_{c}^{t}))\\
&\overset{(i)}{=} \frac{1}{2}w_{1}^{T}\big(|w_{2}(x+\delta_{c}^{t+1})|-|w_{2}(x+\delta_{c}^{t})|\big)+\frac{1}{2}w_{1}^{T}w_{2}(\delta_{c}^{t+1}-\delta_{c}^{t})\\
&\overset{(ii)}{=} \frac{1}{2}w_{1+}^{T}\big(|w_{2}(x+\delta_{c}^{t+1})|-|w_{2}(x+\delta_{c}^{t})|\big)+\frac{1}{2}w_{1-}^{T}\big(|w_{2}(x+\delta_{c}^{t+1})|-|w_{2}(x+\delta_{c}^{t})|\big)+\frac{1}{2}w_{1}^{T}w_{2}(\delta_{c}^{t+1}-\delta_{c}^{t})\\
&\overset{(iii)}{\leq} \frac{1}{2}w_{1+}^{T}|w_{2}(x+\delta_{c}^{t+1})-w_{2}(x+\delta_{c}^{t})|-\frac{1}{2}w_{1-}^{T}|w_{2}(x+\delta_{c}^{t+1})-w_{2}(x+\delta_{c}^{t})|+\frac{1}{2}w_{1}^{T}w_{2}(\delta_{c}^{t+1}-\delta_{c}^{t})\\
&= \frac{1}{2}(w_{1+}^{T}-w_{1-}^{T})|w_{2}(x+\delta_{c}^{t+1})-w_{2}(x+\delta_{c}^{t})|+\frac{1}{2}w_{1}^{T}w_{2}(\delta_{c}^{t+1}-\delta_{c}^{t})\\
&= \frac{1}{2}(w_{1+}^{T}-w_{1-}^{T})|w_{2}(\delta_{c}^{t+1}-\delta_{c}^{t})|+\frac{1}{2}w_{1}^{T}w_{2}(\delta_{c}^{t+1}-\delta_{c}^{t})\\
&\overset{(iv)}{=} \frac{1}{2}|w_{1}^{T}||w_{2}(\delta_{c}^{t+1}-\delta_{c}^{t})|+\frac{1}{2}w_{1}^{T}w_{2}(\delta_{c}^{t+1}-\delta_{c}^{t}),
\end{aligned}$$

where $(i)$ follows because $h(x)=\frac{|x|+x}{2}$, $(ii)$ follows because $w_{1}^{T}=w_{1+}^{T}+w_{1-}^{T}$, $(iii)$ follows because all elements in $w_{1+}^{T}$ are non-negative and all elements in $w_{1-}^{T}$ are non-positive, and $(iv)$ follows because $w_{1+}-w_{1-}=|w_{1}|$. Then, taking the absolute value of both sides, we obtain:

$$\big|w_{1}^{T}h(w_{2}(x+\delta_{c}^{t+1}))-w_{1}^{T}h(w_{2}(x+\delta_{c}^{t}))\big|\leq|w_{1}^{T}||w_{2}(\delta_{c}^{t+1}-\delta_{c}^{t})|\leq|w_{1}^{T}||w_{2}||\delta_{c}^{t+1}-\delta_{c}^{t}|,$$

where $|\cdot|$ is defined as the element-wise absolute operation.

Lemma 2.

Considering $g(x)=\frac{1}{2}[w_{1}^{T}h(w_{2}*x)-y^{\ast}(x)]^{2}$, we obtain

$$\begin{aligned}
&\bigg|\frac{1}{2}w_{1}^{T}h(w_{2}(x+\delta_{c}^{t+1}))+\frac{1}{2}w_{1}^{T}h(w_{2}(x+\delta_{c}^{t}))-y^{\ast}(x)\bigg|\\
&\leq \frac{1}{2}\big|w_{1}^{T}h(w_{2}(x+\delta_{c}^{t+1}))-y^{\ast}(x)\big|+\frac{1}{2}\big|w_{1}^{T}h(w_{2}(x+\delta_{c}^{t}))-y^{\ast}(x)\big|\\
&\overset{(i)}{=} \frac{\sqrt{2}}{2}\big[\sqrt{g(x+\delta_{c}^{t+1})}+\sqrt{g(x+\delta_{c}^{t})}\big],
\end{aligned}$$

where $(i)$ follows from the definition of $g(x)$.

Appendix F Proof of Theorem 1

Considering $g(x)=\frac{1}{2}[w_{1}^{T}h(w_{2}*x)-y^{\ast}(x)]^{2}$ with the activation function $h(x)$ as ReLU, we define $|\cdot|$ as the element-wise absolute operation and characterize the adversarial step gain $g(x+\delta_{c}^{t+1})-g(x+\delta_{c}^{t})$ as follows:

$$\begin{aligned}
& g(x+\delta_{c}^{t+1})-g(x+\delta_{c}^{t})\\
&\overset{(i)}{=} \frac{1}{2}[w_{1}^{T}h(w_{2}(x+\delta_{c}^{t+1}))-y^{\ast}(x)]^{2}-\frac{1}{2}[w_{1}^{T}h(w_{2}(x+\delta_{c}^{t}))-y^{\ast}(x)]^{2}\\
&= \frac{1}{2}[w_{1}^{T}h(w_{2}(x+\delta_{c}^{t+1}))-w_{1}^{T}h(w_{2}(x+\delta_{c}^{t}))][w_{1}^{T}h(w_{2}(x+\delta_{c}^{t+1}))+w_{1}^{T}h(w_{2}(x+\delta_{c}^{t}))]\\
&\quad -[w_{1}^{T}h(w_{2}(x+\delta_{c}^{t+1}))-w_{1}^{T}h(w_{2}(x+\delta_{c}^{t}))]y^{\ast}(x)\\
&= [w_{1}^{T}h(w_{2}(x+\delta_{c}^{t+1}))-w_{1}^{T}h(w_{2}(x+\delta_{c}^{t}))]\left[\frac{1}{2}w_{1}^{T}h(w_{2}(x+\delta_{c}^{t+1}))+\frac{1}{2}w_{1}^{T}h(w_{2}(x+\delta_{c}^{t}))-y^{\ast}(x)\right]\\
&\leq \big|w_{1}^{T}h(w_{2}(x+\delta_{c}^{t+1}))-w_{1}^{T}h(w_{2}(x+\delta_{c}^{t}))\big|\,\bigg|\frac{1}{2}w_{1}^{T}h(w_{2}(x+\delta_{c}^{t+1}))+\frac{1}{2}w_{1}^{T}h(w_{2}(x+\delta_{c}^{t}))-y^{\ast}(x)\bigg|\\
&\overset{(ii)}{\leq} \frac{\sqrt{2}}{2}|w_{1}^{T}||w_{2}||\delta_{c}^{t+1}-\delta_{c}^{t}|\left(\sqrt{g(x+\delta_{c}^{t+1})}+\sqrt{g(x+\delta_{c}^{t})}\right),
\end{aligned}$$

where $(i)$ follows from the definition of $g(x+\delta_{c}^{t})$ and $(ii)$ follows from Lemmas 1 and 2.