\ArticleType

RESEARCH PAPER \Year2022 \Month \Vol \No \DOI \ArtNo \ReceiveDate \ReviseDate \AcceptDate \OnlineDate

Pareto Adversarial Robustness: Balancing Spatial Robustness and Sensitivity-based Robustness

zlin@pku.edu.cn

\AuthorMark

Author A

\AuthorCitation

Author A, Author B, Author C, et al

Pareto Adversarial Robustness: Balancing Spatial Robustness and Sensitivity-based Robustness

Ke Sun Mingjie Li Zhouchen Lin State Key Lab of General AI, School of Intelligence Science and Technology, Peking University Institute for Artificial Intelligence, Peking University Pazhou Laboratory (Huangpu)

Abstract

Adversarial robustness, which primarily comprises sensitivity-based robustness and spatial robustness, plays an integral part in achieving robust generalization. In this paper, we endeavor to design strategies to achieve universal adversarial robustness. To achieve this, we first investigate the relatively less-explored realm of spatial robustness. Then, we integrate the existing spatial robustness methods by incorporating both local and global spatial vulnerability into a unified spatial attack and adversarial training approach. Furthermore, we present a comprehensive relationship between natural accuracy, sensitivity-based robustness, and spatial robustness, supported by strong evidence from the perspective of robust representation. Crucially, to reconcile the interplay between the mutual impacts of various robustness components into one unified framework, we incorporate the Pareto criterion into the adversarial robustness analysis, yielding a novel strategy called Pareto Adversarial Training for achieving universal robustness. The resulting Pareto front, which delineates the set of optimal solutions, provides an optimal balance between natural accuracy and various adversarial robustness. This sheds light on solutions for achieving universal robustness in the future. To the best of our knowledge, we are the first to consider universal adversarial robustness via multi-objective optimization.

keywords:

Deep Learning, Adversarial Robustness, Reliable Machine Learning, Pareto Optimization, Spatial Robustness

1 Introduction

Robust generalization serves as an extension of the traditional generalization that is normally achieved via Empirical Risk Minimization for i.i.d. data [33]. However, the test environment could be slightly or dramatically different from the training environment [15] in a robust generalization scenario. Lately, improving the robustness of deep neural networks has been one of the pivotal areas of research, encompassing different threads of research such as adversarial robustness [8, 28], non-adversarial robustness [10, 37], Bayesian deep learning [23, 7] and causality [1]. In this paper, we focus on adversarial robustness, where adversarial examples are carefully manipulated by humans to fool machine learning models, e.g., deep neural networks, which could pose serious threats, especially in safety-critical applications. Currently, adversarial training [8, 21, 4, 36] is regarded as a promising and widely accepted strategy to address this issue.

Like Out-of-Distribution (OoD) robustness, adversarial robustness also has several aspects [9, 20, 3], including sensitivity-based robustness [30], i.e., robustness against pixel-wise perturbations (normally within the constraints of an $l_{p}$ ball), and spatial robustness, i.e., robustness against multiple spatial transformations. Computer vision and graphics literature provide a deeper insight into these two aspects, revealing that two main factors determine the appearance of a pictured object [35, 29]: (1) lighting and materials, and (2) geometry. Most previous studies on adversarial robustness have focused only on the first factor [35] by examining pixel-wise perturbations, e.g., Projected Gradient Descent (PGD) attacks [21], assuming that the underlying geometry stays the same after the adversarial perturbation. Only a small proportion of research works have attempted to tackle the less-studied second factor, which includes Flow-based [35] and Rotation-Translation (RT)-based attacks [6, 5].

However, it is crucial to consider spatial robustness for achieving universal robustness, the ultimate objective of robust generalization. One of the most important reasons is that sensitivity-based robustness, which is generally based on the $l_{p}$ -distance, is not sufficient to maintain perceptual similarity [25, 6, 5, 35]. Specifically, although spatial attacks or geometric transformations result in small perceptual differences, they yield large $l_{p}$ distances.

A clear relationship between accuracy, sensitivity-based and spatial robustness is the key to achieving universal adversarial robustness. While the trade-off between sensitivity-based robustness and accuracy has been revealed by several studies [40, 32, 24], the comprehensive relationships among spatial robustness and them are still unclear. Although previous studies [31, 12] have explored this issue, they only focused on Rotation-Translation spatial robustness and did not consider Flow-based spatial robustness [35, 39]. Surprisingly, we find that Flow-based spatial robustness presents a relationship contrary to the one revealed previously, making the previous conclusion less reliable.

Based on this important finding, we start our exploration of clearer relationships between different robustnesses, and we eventually harmonize the conflicting relationships within them by leveraging the Pareto criterion [14, 13, 38], thus achieving an optimal balanced universal robustness. A recent study [24] attributes the conflicting relationships among the various robustnesses to overparametrization, while we uncovered it from the perspective of different shape-biased representations. Another report [34] examined the trade-off in the inference time, while we target more comprehensive relationships between different robustnesses with a different methodology.

In this paper, we first try to gain deeper insights into the robustness relationships by investigating the two main spatial robustness branches, i.e., Flow-based spatial attack [35] and Rotation-Translation (RT) attack [5]. After revealing their impact on local and global spatial sensitivity, we propose integrated spatial attack and spatial adversarial training, which can incorporate comprehensive spatial vulnerabilities or robustness. Based on this understanding, we present a comprehensive relationship among the accuracy, sensitivity-based robustness, and the two branches of spatial robustness by investigating their different saliency maps from the perspectives of shape-bias, sparse or dense representation. It turns out that while the relationship between sensitivity-based and RT robustness is a fundamental trade-off, sensitivity-based and Flow-based spatial robustness are highly correlated, providing a vital supplementary for previous conclusions. Thus, comprehensive relationships between accuracy and the various robustnesses are not pure trade-offs, motivating us to introduce the Pareto criterion [14, 13, 38], the general multi-objective optimization principle, into the universal adversarial robustness analysis. The Pareto criterion enables an optimal balance between the interplay of natural accuracy and the different adversarial robustnesses, leading to universal adversarial robustness in a Pareto manner. By incorporating a two-moment term that can capture the interaction between loss of accuracy and different robustnesses, we propose a bi-level optimization framework called Pareto Adversarial Training. The resulting Pareto front provides a set of optimal solutions that can balance perfectly all the relationships under consideration, outperforming other existing strategies.

Our contributions can be summarized as follows:

•

We reveal the existence of both local and global spatial robustness and propose integrated spatial attack and spatial adversarial training, incorporating comprehensive spatial vulnerabilities.
•

We present comprehensive relationships among accuracy, sensitivity-based, and different spatial robustnesses, supported by strong and intuitive evidence from the perspective of robust representation.
•

We incorporate the Pareto criterion into adversarial robustness analysis, and the resulting Pareto Adversarial Training can optimally balance multiple adversarial robustness, yielding universal adversarial robustness.

2 Local and Global Spatial Robustness

To present the comprehensive relationships between accuracy and different adversarial robustnesses, we first provide a fine-grained understanding of spatial robustness. We summarize several studies about spatial robustness [6, 5, 35, 39, 31, 12] into two major branches: (1) Flow-based Attacks, and (2) Rotation-Translation (RT) Attacks. In particular, we find that the former mainly focuses on the local spatial vulnerability while the latter tends to capture the global spatial sensitivity. Based on this finding, integrated spatial attack and spatial adversarial training are proposed.

Refer to caption — Figure 1: Visualization of Flow-based, RT and Our Integrated Spatial adversarial examples on MNIST, CIFAR- $10$ and Caltech- $256$ . More images and detailed discussions are provided in A.

2.1 Local Spatial Robustness: Flow-based Attacks

The most representative Flow-based Attack is the Spatial Transformed Attack [35], wherein a differentiable flow vector $w_{F}=(\Delta\mu,\Delta v)$ is introduced in the 2D coordinates $(\mu,v)$ to craft adversarial spatial transformation. The vanilla-targeted Flow-based attack [35] follows the optimization manner ( $\kappa=0$ ):

\displaystyle w_{F}^{*}=\operatorname*{arg\,min}_{w_{F}}\max_{i\neq t}f_{\theta}^{i}(x_{w_{F}})-f_{\theta}^{t}(x_{w_{F}})+\tau\mathcal{L}_{flow}(w_{F}),

(1)

where $f_{\theta}(x)=\left(f_{\theta}^{1}(x),\ldots,f_{\theta}^{K}(x)\right)$ is the classifier in the $K$ -classification task. $x_{w_{F}}$ is a Flow-based adversarial example parameterized by the flow vector $w_{F}$ . $\mathcal{L}_{flow}$ , which measures the local smoothness of the spatial transformation balanced by $\tau$ .

Interestingly, our empirical study shown in the left part of Figure 1 suggests that the Flow-based attack tends to yield local permutations among pixels in some specific regions, irrespective of the option of $\tau$ , rather than a global spatial transformation based on their shapes. Our analysis indicates that this phenomenon is due to two factors: 1) Local permutations, especially in regions where colors of pixels change dramatically, are already sufficiently sensitive to manipulations, as demonstrated by our empirical results shown in Figure 1. 2) The manner of optimization does not incorporate any sort of shape transformation information, e.g., a parametric equation of rotation, as opposed to the vanilla Rotation-Translation attack, which we present in the following. Therefore, we conclude that Flow-based attacks tend to capture the local spatial vulnerability. Further, to design the integrated spatial attack, we transform Eq 1 into its untargeted version under cross-entropy loss with flow vector bounded by an $\epsilon_{F}$ -ball:

\displaystyle w^{*}_{F}=\operatorname*{arg\,max}_{w_{F}}\mathcal{L}^{\text{CE}}_{\theta}(x_{w_{F}},y)\ \ s.t.\ \|w_{F}\|\leq\epsilon_{F},

(2)

where $\mathcal{L}_{\theta}^{\mathrm{CE}}(x,y)=\log\sum_{j}\exp\left(f_{\theta}^{j}(x)\right)-f_{\theta}^{y}(x)$ . To maintain a uniform optimization form in our integrated spatial attack, we replace local smoothness term $\mathcal{L}_{flow}$ in Eq. 1 with our familiar $l_{p}$ constraint and leverage the cross-entropy loss instead of the $max$ operation as suggested in [2]. Proposition 1 reveals the correlation between the two losses, indicating that the smooth approximation version of $max$ operation in Eq. 1, denoted as $\mathcal{L}^{S}_{\theta}$ , has a parallel updating direction with cross-entropy loss related to $w_{F}$ . Proof can be found in B.

Proposition 1.

Consider $\mathcal{L}^{S}_{\theta}(x,y)=\log\sum_{i\neq y}\exp\left(f_{\theta}^{i}(x)\right)-f_{\theta}^{y}(x)$ as the smooth version loss of Eq. 1 without a local smoothness term. For a fixed $(x_{w_{F}},y)$ and $\theta$ , we have

\displaystyle\nabla_{w_{F}}\mathcal{L}_{\theta}^{\text{CE}}(x_{w_{F}},y)=r(x_{w_{F}},y)\nabla_{w_{F}}\mathcal{L}_{\theta}^{S}(x_{w_{F}},y),

(3)

where $r(x_{w_{F}},y)=\sum_{i\neq y}\exp\left(f_{\theta}^{i}(x_{w_{F}})\right)/\sum_{i}\exp\left(f_{\theta}^{i}(x_{w_{F}})\right)$ .

2.2 Global Spatial Robustness: Rotation-Translation Attacks

The original Rotation-Translation attack [6, 5] applies parametric equation constraints on 2D coordinates, thus capturing the global spatial information:

\displaystyle\left[\begin{array}[]{l}u^{\prime}\\ v^{\prime}\end{array}\right]=\left[\begin{array}[]{cc}\cos\theta&-\sin\theta\\ \sin\theta&\cos\theta\end{array}\right]\left[\begin{array}[]{l}u\\ v\end{array}\right]+\left[\begin{array}[]{l}\delta u\\ \delta v\end{array}\right].

(4)

To design a generic spatial transformation matrix that can simultaneously consider rotation, translation, cropping, and scaling, we re-parameterize the transform matrix as a generic 6-dimensional affine transformation one, inspired by Spatial Transformer Networks [11]:

\displaystyle\left[\begin{array}[]{l}u^{\prime}\\ v^{\prime}\end{array}\right]=(\left[\begin{array}[]{ccc}1&0&0\\ 0&1&0\end{array}\right]+\left[\begin{array}[]{ccc}w_{RT}^{11}&w_{RT}^{12}&w_{RT}^{13}\\ w_{RT}^{21}&w_{RT}^{22}&w_{RT}^{23}\end{array}\right])\left[\begin{array}[]{l}u\\ v\\ 1\end{array}\right],

(5)

where we denote $A_{w_{RT}}$ as the generic 6-dimensional affine transformation matrix, in which each entry of $w_{RT}$ indicates the increment in different spatial aspects. For example, $(w_{RT}^{13},w_{RT}^{23})$ determines translation. Finally, the optimization form of the resulting generic and differentiable RT-based attack bounded by $\epsilon_{RT}$ -ball is expressed as:

\displaystyle w^{*}_{RT}=\operatorname*{arg\,max}_{w_{RT}}\mathcal{L}^{\text{CE}}_{\theta}(x_{w_{RT}},y)\ \ s.t.\ \|w_{RT}\|\leq\epsilon_{RT}.

(6)

2.3 Integrated Spatial Attack

The key to achieving integrated spatial robustness is to design an integrated parameterized sampling grid $\mathcal{T}_{w_{RT},w_{F}}\left(G\right)$ that can wrap the regular grid with both flow and affine transformation, where $G$ is the generated grid. We show our integrated approach as shown below:

	$\displaystyle\mathcal{T}_{w_{RT},w_{F}}\left(G\right)$	$\displaystyle=A_{w_{RT}}\left[\begin{array}[]{l}u\\ v\\ 1\end{array}\right]+w_{F},$		(7)
	$\displaystyle x^{adv}$	$\displaystyle=\mathcal{T}_{w_{RT},w_{F}}\left(G\right)\circ x.$		(7)

Then we sample new $x^{adv}$ by $\mathcal{T}_{w_{RT},w_{F}}\left(G\right)$ via the differentiable bilinear interpolation [11]. Note that $w_{F}$ has the same dimensions as the grid $G$ , which are different from the impact of two-dimensional translation parameters in $w_{RT}$ . Then the final loss function of the integrated spatial attack can be presented as:

\displaystyle w^{*}=\operatorname*{arg\,max}_{w}\mathcal{L}^{\text{CE}}_{\theta}(x+\eta_{w},y),\ \ s.t.\ \|w\|\leq\epsilon,

(8)

where $\eta_{w}$ is the crafted integrated spatial perturbation parameterized by $w=[w_{F},w_{RT}]^{T}$ , simultaneously considering both Flow-based and RT spatial sensitivity. Note that $\eta_{w}$ itself does not necessarily satisfy the $l_{p}$ constraint directly. For the implementation, we follow the PGD procedure [21], a common practice in sensitivity-based attacks. We consider the infinity norm of $w$ and different learning rates for the two types of spatial robustness. Therefore, the updating rule of $w$ in each iteration is:

$\displaystyle\left[\begin{array}[]{l}\bar{w}^{t+1}_{F}\\ \bar{w}^{t+1}_{RT}\end{array}\right]$	$\displaystyle=\left[\begin{array}[]{l}w^{t}_{F}\\ w^{t}_{RT}\end{array}\right]+\left[\begin{array}[]{l}\alpha_{F}\\ \alpha_{RT}\end{array}\right]\text{sign}(\nabla_{w}\mathcal{L}^{\text{CE}}_{\theta}(x^{t}_{w^{t}},y)),$	(9)
$\displaystyle\left[\begin{array}[]{l}w^{t+1}_{F}\\ w^{t+1}_{RT}\end{array}\right]$	$\displaystyle=\text{clip}_{\epsilon}(\left[\begin{array}[]{l}\bar{w}^{t+1}_{F}\\ \bar{w}^{t+1}_{RT}\end{array}\right]),$
$\displaystyle x^{t+1}_{w^{t+1}}$	$\displaystyle=\mathcal{T}_{w^{t+1}}\left(G\right)\circ x,$

where $w^{t+1}=[w^{t+1}_{F},w^{t+1}_{RT}]^{T}$ is element-wisely clipped from $\bar{w}^{t+1}$ by $\epsilon=[\bm{\epsilon}_{F},\bm{\epsilon}_{RT}]^{T}$ . From Figure 1, we can observe that our Integrated Spatial Attack can construct both local and global spatial transformations on images. Thus, it can simultaneously yield local pixel-wise permutations and global shape transformations.

Then, we visualize the loss surface under this Integrated Spatial Attack leveraging “filter normalization” [18] as illustrated in Figure 2. We strictly follow the implementation from [18] to achieve the desired visualization of the loss landscape of our integrated adversarial attacks for all the differentiable parameters $w$ . Specifically, we view $w_{F}$ and $w_{RT}$ as two parameterized filters, which is analogous to the “filter normalization?? technique proposed by [18]. In the left part of Figure 2, we adjust the initialization of the variance of $w$ , which then can provide a distant view of loss landscape before the optimization in Eq. 8. It exhibits a highly regular loss landscape, and its non-concavity w.r.t. only rotation and translation [5] has been tremendously improved. In the middle of Figure 2, we then provide a closer view of the loss landscape before the optimization. It shows a highly convex surface around the $w$ to be optimized, facilitating the following optimization. In the right part of Figure 2, we also present the loss landscape around the maxima $w^{*}$ after the optimization in Eq. 8 of our integrated spatial attack, exhibiting a highly concave surface as well. In summary, the highly non-concave loss landscape concerning only rotation and translation raised by [5] has been largely alleviated by considering both local and global spatial vulnerabilities. This integrated form smooths the optimization process, which guarantees the efficacy of our Integrated Spatial Attack.

2.4 Spatial Adversarial Training

As Eq. 9 incorporates local and global spatial robustness simultaneously, it is natural to leverage it to construct Spatial Adversarial Training, which we deploy in Experiment 4.4.

3 Relationship Between Sensitivity and Spatial Robustness

In this section, we will empirically investigate the relationships between different robustnesses and then explain them from the perspective of shape-based representation by leveraging a saliency map.

3.1 Relationships

We conduct rigorous experiments on MNIST, CIFAR- $10$ , and Caltech- $256$ datasets to empirically examine the behavior of local and global spatial robustness as the sensitivity-based robustness increases. Specifically, after adversarially training multiple PGD (sensitivity-based) robust models with different numbers of PGD iterations, we further compute their test accuracy under Flow-based and RT-based spatial attacks via methods proposed in Section 2. The accuracy is computed on correctly classified test data for the model under consideration to mitigate the impact of the slightly different generalizations of these PGD-trained models. We fix both $\epsilon_{F}$ and $\epsilon_{RT}$ as 0.3 on MNIST, and choose $\epsilon_{F}$ and $\epsilon_{RT}$ as 0.3 and 1.0, respectively, on CIFAR- $10$ and Caltech- $256$ . Then, we can control their strength of perturbations by adjusting the number of iterations in Flow-based and RT-based spatial attacks.

In Figure 3, the X-axis shows adversarially PGD-trained models with different numbers of PGD iterations, which can measure the different strengths of a model’s PGD (sensitivity-based) robustness. The Y-axis represents the computed test accuracy of the corresponding PGD-trained models under different spatial attacks, and a high-level test accuracy reflects a model’s high spatial robustness. It turns out that Flow-based spatial robustness (red lines) presents a steady ascending tendency across three datasets as the PGD sensitivity-based robustness increases, while the trend of RT-based spatial robustness (blue lines) fluctuates conversely. This result reveals that the sensitivity-based and RT-based spatial robustness is a trade-off relationship, consistent with the previous conclusion [12, 31]. However, this trade-off does not (even on the contrary) apply to the local spatial sensitivity, where sensitivity-based and Flow-based spatial robustness is positively correlated. We provide strong and intuitive evidence from the perspective of shape-biased representation below.

3.2 Explanation from the Shape-bias Representation

We show first with our brief conclusion: the sensitivity-based robustness corresponds to the sparse and shape-bias representation [26, 41], indicating that sensitivity-based robust models rely more on the global shape during prediction rather than the local texture. Nevertheless, the local and global spatial robustness are associated with different representation manners.

We visualize the saliency maps of naturally trained, PGD, Flow-based, and RT adversarially trained models on some randomly selected images on Caltech- $256$ , which are exhibited in Figure 4 to examine the shape-biased representation. Specifically, visualizing the saliency maps aims at assigning a sensitivity value, sometimes also called “attribution”, to show the sensitivity of the output to each pixel of an input image. Following [26, 41], we leverage SmoothGrad [27] to calculate the saliency map $S(x)$ of an image $x$ , which alleviates the noises in the gradient by averaging over the gradient of $n$ noisy copies of an input:

\displaystyle S(x)=\frac{1}{n}\sum_{i=1}^{n}\frac{\partial f^{y}_{\theta}\left(x_{i}\right)}{\partial x_{i}},

(10)

where $x_{i}=x+q_{i}$ , and $q_{i}$ are noises drawn i.i.d from a Gaussian distribution $\mathcal{N}(0,\sigma^{2})$ . In our experiment, we set $n=100$ and the noise level $\sigma/(x_{max}-x_{min})=0.1$ .

Figure 4 shows that PGD-trained models tend to learn a sparse and shape-biased representation for all pixels of an image, while two types of spatially adversarially trained models suggest a converse representation. In particular, the representation from the Flow-based training model presents a noisy and shape-biased one as it places extreme values, although noisy, on pixels around the shape of objects, e.g., the edge between the horse and the background shown in Flow AT in Figure 4. On the contrary, RT-based models rely less on the shape of objects, and the saliency values tend to be dense, smoothly scattering around more pixels of an image.

We calculate the distance of saliency maps from different models across all test data on the Caltech- $256$ dataset and then compute their skewness in Figure 5. Specifically, we compute the pixel-wise distance between the saliency maps of the two models, and then we calculate the median of the skewness of the saliency map difference for all test data. Note that if two saliency maps have no statistical difference, then the difference in the values will follow a symmetric normal distribution with skewness 0. Negative skewness indicates that the original saliency map (representation) is sparse as compared to the model under consideration. We plot the tendency of skewness as the strength of some specific robustness increases in Figure 5. We summarize the observations into two conclusions:

1.

Based on the first and fourth sub-pictures, both PGD and Flow-based robust models tend to learn a sparse and shape-biased representation compared with the natural model. However, the Flow-based trained model is less sparse (we call it noisy shape-biased) in comparison with the PGD-trained one.
2.

In contrast, RT-based robust models tend to learn a dense representation. This is intuitive because the RT-trained model is expected to memorize broader pixel locations to cope with potential rotations and transformations in the test data.

Overall, the divergent representation (sparse vs. dense) between RT-based and sensitivity robustness verifies that the trade-off shown in Figure 3 is fundamental. More importantly, the positive correlation of sensitivity-based and local spatial robustness, shown in Figure 3, can also be explained by their similar shape-biased representation, although the latter tends to be noisy.

4 Pareto Adversarial Robustness

4.1 Motivation

Multi-objective Optimization. Given the insights garnered from our analysis of the relationships between natural accuracy and different kinds of adversarial robustness, a natural question that comes up is how to design a training strategy that can perfectly balance their mutual impacts, which mainly results from their different representation manners. In most cases, their relationships exhibit trade-offs, except for the positive correlation between sensitivity robustness and local spatial robustness. We use $\mathcal{L}_{nat},\mathcal{L}_{\text{PGD}},\mathcal{L}_{\text{Flow}}$ and $\mathcal{L}_{\text{RT}}$ to represent the natural loss, the PGD adversarial loss, the Flow-based and the RT-based adversarial loss, respectively. We cast obtaining universal adversarial robustness as well as maintaining natural generalization ability as a multi-objective optimization problem [16], encompassing all of the aforementioned losses with a loss vector:

\displaystyle\min_{\theta}\mathcal{L}^{\theta}=(\mathcal{L}_{0}^{\theta},\mathcal{L}_{1}^{\theta},\mathcal{L}_{2}^{\theta},\mathcal{L}_{3}^{\theta})^{\top},

(11)

where $\mathcal{L}_{0}^{\theta},\mathcal{L}_{1}^{\theta},\mathcal{L}_{2}^{\theta},\mathcal{L}_{3}^{\theta}$ represent $\mathcal{L}_{\text{nat}},\mathcal{L}_{\text{PGD}},\mathcal{L}_{\text{Flow}},\mathcal{L}_{\text{RT}}$ respectively for simplicity, sharing the same model parameter $\theta$ . The multi-objective optimization is to optimize all loss functions simultaneously by exploiting the shared knowledge and structure, e.g., the representation.

Pareto Optimization. To harmonize these competing optimization objectives in the context of adversarial robustness, we introduce Pareto Optimization [14, 19, 17], which is successfully applied when optimal decisions need to be taken in the presence of trade-offs between multiple conflicting objectives. Pareto optimization endeavors to achieve Pareto Optimality, a balanced situation between all objectives, where none of the objective functions can be improved in value without degrading some of the other objective values. Mathematically, we have the following definitions [42, 19].

Pareto Dominance in Adversarial Robustness. Let $\theta^{1},\theta^{2}$ be two parameters in the space $\Omega$ . $\theta^{1}$ dominates $\theta^{2}$ , i.e., $\theta^{1}\prec\theta^{2}$ , if and only if $\mathcal{L}_{i}^{\theta^{1}}\leq\mathcal{L}_{i}^{\theta^{2}},\forall i\in\{0,1,2,3\}$ and $\mathcal{L}_{j}^{\theta^{1}}<\mathcal{L}_{j}^{\theta^{2}},\exists j\in\{0,1,2,3\}$ .

Pareto Optimality. $\theta^{*}$ is a Pareto optimal point, and $\mathcal{L}^{\theta^{*}}$ is a Pareto optimal objective vector if it does not exist $\hat{\theta}\in\Omega$ such that $\hat{\theta}\prec\theta^{*}$ . The resulting Pareto front contains all Pareto optimal solutions.

Pareto Adversarial Robustness. Based on the insights presented above, a natural approach for incorporating Pareto criteria into multi-objective optimization in the context of adversarial training is to achieve universal adversarial robustness as well as maintain a desirable natural accuracy. The resulting Pareto Front contains all optimal, adversarially trained models for the given different constraints. The detailed formulation is presented later in Section 4.3.

4.2 Limitations of the Existing Strategies.

We denote $\mathcal{R}_{\text{adv}}(f;S_{i}):=\mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\max_{\bm{r}\in S_{i}}\mathcal{L}(f(x+r),y)\right]$ as the adversarial risk under perturbation sets $S_{i},i=1,...,m.$ Our goal is to find $f_{\theta}$ that can achieve uniform risk minimization across all $S_{i}$ as well as the minimal risk in the natural data. There are two common strategies to handle this issue.

1) Average adversarial training (Ave AT) [31]. $\mathcal{R}_{\text{ave}}(f;S):=\mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\frac{1}{m}\sum_{i=1}^{m}\max_{\bm{r}\in S_{i}}\mathcal{L}(f(x+r),y)\right]$ , regards each adversarial robustness as having equal status. Intuitively, it may yield unsatisfactory solutions when the strength of different attacks mixed in the training are not balanced.

2) Max adversarial training (Max AT) [31, 22], i.e., $\mathcal{R}_{\text{max}}(f;S):=\mathbb{E}_{(x,y)\sim\mathcal{D}}[\max_{i}\{\max_{\bm{r}\in S_{i}}\mathcal{L}(f(x+r),y)\}]$ tries to optimize over the max loss from the largest perturbations.

Overfitting issue of Max AT. Intuitively, Max AT may overfit to one specific type of adversarial robustness if its adversarial attack used for training is too strong. In Figure 6, we plot the difference in robust accuracy between Max AT and single PGD adversarial training. It turns out that as the strength of PGD attack $\epsilon$ used in Max AT increases, the difference among the three kinds of robust accuracy between Max AT and a single PGD AT tends to vanish. This indicates that the comprehensive robustness of Max AT degenerates to a single PGD adversarial training because the PGD loss tends to dominate as the strength of the PGD attack increases.

Overfitting issue of Ave AT based on its Relationship with Max AT. We consider the generalization issue based on different risks and then set the risk in Max AT and Ave AT as $\mathcal{R}_{\text{max}}=\max_{i}\mathcal{R}(f,S_{i})=\max_{i}\mathcal{R}^{S_{i}}$ and $\mathcal{R}_{\text{ave}}=\frac{1}{m}\sum_{i=1}^{m}\mathcal{R}(f_{\theta},S_{i})=\frac{1}{m}\sum_{i=1}^{m}\mathcal{R}^{S_{i}}$ . Proposition 2 informs that Max AT is closely associated with some form of Ave AT. This indicates that Max AT is likely to perform similarly to the specific form of Ave AT, which also suffers from unsatisfactory solutions when the strength of different attacks mixed in training is imbalanced. We verified this claim in Table 2 under a stronger PGD attack in Section 4.3.

Proposition 2.

Given KKT differentiability and qualification conditions, $\exists\lambda_{i}\geq 0$ , such that the risk minimizer in Max AT, i.e., $\mathcal{R}^{\star}_{\text{max}}$ is a first-order stationary point of $\sum_{S_{i}\in\mathcal{S}}\lambda_{i}\mathcal{R}^{S_{i}}$ regardless of the relationship of $S_{i}$ .

Remark. We point out that both Ave AT and Max AT may suffer from the robustness overfitting issue and thus fail in certain scenarios. However, a clever combination choice among all involved adversarial losses has the potential to alleviate the overfitting issues, thus outperforming both Max AT and Ave AT in terms of universal robustness. Motivated by this, we propose Pareto Adversarial Training in the next section, which will provide strong empirical evidence to support this intuition.

4.3 Pareto Adversarial Training

We apply linear scalarization to solve the multi-objective optimization, which is the most commonly used approach. We denote $\alpha=(\alpha_{0},\alpha_{1},\alpha_{2},\alpha_{3})$ as the combination coefficients for various losses. Thus, the objective function is $\min_{\theta}\sum_{i=0}^{3}\mathbb{E}_{x}\left[\alpha_{i}^{*}(\theta)\mathcal{L}_{i}^{\theta}\right]$ . Further, within the context of Pareto optimality, our goal is to find optimal combinations $\alpha$ between natural accuracy, sensitivity-based, and spatial robustness to perfectly balance their mutual impacts during the whole training process. Furthermore, we train a model $f_{\theta}$ under the optimal combinations $\alpha^{*}$ of different losses, and the computation of $\alpha^{*}$ in training is also associated with different losses determined by model parameters $\theta$ . This implies a bilevel optimization problem with $\theta$ as the upper-level variable and $\alpha$ as the lower-level variable. In the construction of low-level optimization regarding $\alpha$ , we apply a two-moment objective function concerning all losses. We name this bi-level optimization as Pareto Adversarial Training, which is formulated as:

		$\displaystyle\min_{\theta}\sum_{i=0}^{3}\mathbb{E}_{x}\left[\alpha_{i}^{*}(\theta)\mathcal{L}_{i}^{\theta}\right],$		(12)
		$\displaystyle\ \text{s.t.}\ \alpha^{}=\operatorname{arg\,min}_{\alpha}\sum_{i=0}^{3}\sum_{j=0}^{3}\mathbb{E}_{x}\left[(\alpha_{i}\mathcal{L}_{i}^{\theta}-\alpha_{j}\mathcal{L}_{j}^{\theta})^{2}\right],r=\sum_{i=1}^{3}\alpha_{i}\mathbb{E}_{x}\left[\mathcal{L}_{i}^{\theta}\right],\sum_{i=0}^{3}\alpha_{i}=1,\alpha_{i}\geq 0,\forall i=0,1,2,3,$		(12)

where $r$ indicates the expectation of one-moment over all robust losses, i.e., spatial and sensitivity-based losses, which reflects the strength of comprehensive robustness we require after solving this quadratic lower-level optimization regarding $\alpha$ . In particular, given the model parameter $\theta$ in each training step, the larger $r$ we require will push the resulting $\alpha_{i},i=1,2,3$ larger, thus increasing the weight of the robust losses rather than the natural loss to pursue more robustness.

Algorithm 1 Bi-level Optimization in Pareto Adversarial Training.

Input: Training data ( $\mathcal{X}$ , $\mathcal{Y}$ ). Batch size $M$ and adjustable hyper-parameter $r$ . Initialization of $\alpha$ as $[1/4,1/4,1/4,1/4]$ .
Output: Classifier $f_{\theta}$ .

1: repeat

2: Sample

\{\mathbf{y}_{1},...,\mathbf{y}_{M}\}

and

\{\mathbf{x}_{1},...,\mathbf{x}_{M}\}

from all training data.

3: / * Step 1: Compute loss in Eq. LABEL:eq_alpha * /

4: Compute natural loss

\mathcal{L}_{\text{nat}}

, and adversarial loss

\mathcal{L}_{\text{PGD}},\mathcal{L}_{\text{Flow}},\mathcal{L}_{\text{RT}}

based on natural cross entropy loss, PGD loss and Eq. 2 and 6, respectively.

5: / * Step 2: Upper-level Optimization over

\theta

* /

6: Given the current

\alpha

, update

f_{\theta}

by descending its stochastic gradient of:

	$\displaystyle\frac{1}{M}\sum_{i=1}^{M}\mathcal{L}^{\text{CE}}(f_{\theta}(\mathbf{x}_{i}),\mathbf{y}_{i})$	$\displaystyle=\frac{1}{M}\sum_{i=1}^{M}\alpha_{0}\mathcal{L}_{\text{nat}}(f_{\theta}(\mathbf{x}_{i}),\mathbf{y}_{i})+\alpha_{1}\mathcal{L}_{\text{PGD}}(f_{\theta}(\mathbf{x}_{i}),\mathbf{y}_{i})$
		$\displaystyle+\alpha_{2}\mathcal{L}_{\text{Flow}}(f_{\theta}(\mathbf{x}_{i}),\mathbf{y}_{i})+\alpha_{3}\mathcal{L}_{\text{RT}}(f_{\theta}(\mathbf{x}_{i}),\mathbf{y}_{i})$

7: / * Step 3: Lower-level Optimization over

\alpha

* /

8: Compute

\hat{\mu}

and

\hat{\Sigma}

by sliding window technique in Eq. 13.

9: Evaluate

P

in the quadratic form shown in Eq. 13.

10: Solve Eq. 13 via CVXOPT tool to obtain the

\alpha

11: until Convergence

Two-Moment Objective Function. The two-moment form is a common practice in Pareto optimization. For example, in the financial portfolio theory, the mean-variance optimization is normally leveraged to compute the Pareto Efficient Front, where the risk of the asset portfolio, measured by their variances, is minimized to balance the different correlations of these assets given an expected return from the investor. Similarly, the square loss of the difference between each loss pair in Eq. LABEL:eq_alpha measures their mutual impacts. For instance, a decrease in $\mathcal{L}_{\text{PGD}}$ tends to increase $\mathcal{L}_{\text{RT}}$ as they have a fundamental trade-off relationship. We hope to mitigate all these mutual impacts, measured by the weighted quadratic differences, among all losses given an expected robustness level of $r$ . In the implementation, as we regard all losses as random variables with their stochasticity arising from the mini-batch sampling from data, we leverage the sliding windows technique to compute their expectations. Our bi-level optimization within a batch is (1) $\theta$ : update parameters $\theta$ via SGD and (2) $\alpha$ : solve $\alpha$ via quadratic programming. Denote the random variables $\mathcal{L}_{0},\mathcal{L}_{1},\mathcal{L}_{2},\mathcal{L}_{3}$ with mean vector $\mu$ and covariance matrix $\Sigma$ . We transform our lower-level optimization regarding $\alpha$ as the following standard quadratic form:

\displaystyle\min_{\theta,\alpha}

\displaystyle\ \ \alpha^{T}P\alpha\quad\text{ s.t. }

\displaystyle\left[\begin{array}[]{cccc}0&\mu_{1}&\mu_{2}&\mu_{3}\\ 1&1&1&1\\ \end{array}\right]\alpha=\left[\begin{array}[]{c}r\\ 1\end{array}\right],-\alpha\leq\textbf{0},

(13)

where $P=8(\text{diag}(\Sigma)+\text{diag}(\mu\mu^{T}))-2(\Sigma+\mu\mu^{T})$ . We utilize the CVXOPT tool to solve this quadratic optimization within each mini-batch training. CVXOPT is probably the most popular free software package for convex optimization based on the Python programming language that can solve quadratic programming effectively. We also provide proof of the quadratic formulation in D.

A detailed algorithm description is given in Algorithm 1. In the lower-level procedure of Pareto adversarial training, we solve the quadratic optimization regarding $\alpha$ given $\theta$ in each training step to obtain the optimal combinations among natural loss, sensitivity-based, and spatial adversarial loss. Then in the upper-level optimization, we leverage our familiar SGD method to update $\theta$ based on $\alpha^{*}$ calculated from the lower-level problem. Note that the computation complexity of our method is similar to Ave AT and Max AT, which are still competitive in computation.

4.4 Approximated Pareto Front

Dataset	Robustness Score ( $\%$ )	Natural Model	PGD AT	Spatial AT	Max AT	Ave AT	Pareto AT ( $r=2.2$ )
MNIST	Sensitivity-based Robustness	29.40	98.42	0.24	65.16	92.70	88.06
	Local Spatial Robustness	14.36	38.23	27.59	53.02	48.51	58.70
	Global Spatial Robustness	16.70	12.77	78.76	51.47	88.76	90.40
	Universal Robustness	0.0	88.97	46.14	109.19	169.52	176.71
Dataset	Robustness Score ( $\%$ )	Natural Model	PGD AT	Spatial AT	Max AT	Ave AT	Pareto AT ( $r=4.0$ )
CIFAR-10	Sensitivity-based Robustness	0.82	70.24	12.11	52.04	49.68	51.65
	Local Spatial Robustness	9.34	72.29	83.63	85.31	79.88	80.38
	Global Spatial Robustness	57.28	18.94	40.96	40.06	64.98	66.36
	Universal Robustness	0.0	94.04	69.26	109.98	127.11	130.96
Dataset	Robustness Score ( $\%$ )	Natural Model	PGD AT	Spatial AT	Max AT	Ave AT	Pareto AT ( $r=6.0$ )
Caltech-256	Sensitivity-based Robustness	4.74	82.43	6.94	59.81	71.60	76.52
	Local Spatial Robustness	34.59	87.96	88.75	65.89	86.67	87.39
	Global Spatial Robustness	49.73	21.71	65.04	64.64	53.68	50.00
	Universal Robustness	0.0	103.05	71.66	101.28	122.89	124.85

Table 1: Robustness Score of each type of adversarial robustness on MNIST, CIFAR-10, and Caltech-256. Each type of robustness (

\%

) is the average test accuracy under different strengths of perturbations. We choose the universal robustness of the Natural Model as the baseline and set it as 0. We use the difference in average test accuracy between other models and the Natural Model and then sum them as the universal robustness.

By adjusting the upper bound of the expected adversarial robustness loss $r$ , we can evenly generate Pareto optimal solutions where the obtained models will have different levels of robustness under optimal combinations. The set of all Pareto optimal solutions then forms the Pareto front. Rigorously, it is almost impossible to attain all Pareto optimal solutions for a general continuous multi-objective optimization problem unless a closed-form solution exists for each $r$ . Alternatively, we leverage the limited solutions obtained by solving a series of multi-objective optimization problems for various $r$ to approximate the Pareto front.

Thus, we train deep neural networks under different adversarial training strategies, i.e., PGD Adversarial Training (PGD AT), Spatial Adversarial Training (Spatial AT) proposed in Section 2.4, Max AT, Ave AT, and Pareto Adversarial Training (Pareto AT) under different $r$ , in which we apply a proper iteration. Then we evaluate their test accuracy under PGD, Flow-based, and RT attacks under different perturbation strengths. Next, we average the test accuracies for each type of attack, and the result is a quantitative measure of the specific robustness, called Robustness Score. To evaluate the universal robustness, we further compute the average of Robustness Scores for all kinds of robustness and use the increment over the naturally trained model as the metric called Universal Robustness Score. We report the Robustness Scores of all models on CIFAR-10 in Table 1, and the results on the other two datasets are similar. All implementation details are provided in E. It shows that Pareto AT ( $r=4.0$ ) has the best universal robustness score among all the models considered, although the highest specific robustness normally exists in the adversarial training model that only focuses on it.

Finally, we plot the universal robustness scores and the sacrificed clean accuracy of all methods across three datasets in Figure 7, where multiple Pareto AT models (red points) are trained under different $r$ . The Pareto criterion exhibited in Figure 7 provides an optimality principle, which enables Pareto Adversarial Training to achieve the best universal robustness among all the methods considered, given a certain tolerable level of sacrificed clean accuracy. By adjusting the expected universal robustness $r$ in Pareto Adversarial Training, we can develop the set of Pareto optimal solutions, i.e., the Pareto front. It shows that all other methods are above our Pareto front and are less effective than our proposal.

Robustness Score ( $\%$ )	Natural Model	Ave AT	Pareto AT ( $r=3.5$ )	Pareto AT ( $r=3.7$ )	Pareto AT ( $r=4.0$ )	Pareto AT ( $r=4.2$ )
Natural Accuracy	91.43	56.39	82.64	79.69	71.68	61.53
Sensitivity-based	0.82	64.11	53.73	58.70	63.28	65.19
Local Spatial	9.34	82.45	80.10	77.62	81.01	82.38
Global Spatial	57.28	51.36	66.57	67.56	59.69	52.04
Universal Robustness	0.0	197.92	200.39(+2.37)	203.88(+5.96)	203.98(+6.06)	199.61(+1.69)

Table 2: Robustness Score on CIFAR-10 with a larger step size 8/255 and

\epsilon

as 16/255 in PGD perturbations used for both Ave AT and Pareto AT across different

r

Overfitting Issue of Ave AT. Note that although the perturbation strength adopted in Table 1 is mild, we need to point out that the superiority of Pareto AT over Ave AT can be higher if the overfitting issue is severe. We demonstrate this claim in Table 2, where we apply a stronger PGD perturbation used in AT. Finally, we find that Ave AT overfits sensitivity robustness more severely, achieving much less universal robustness and sacrificing more clean accuracy than Pareto AT. Pareto Adversarial Training can mitigate the overfitting issue regarding an overly strong perturbation in AT because Pareto AT can automatically adjust the weights $\alpha$ while training, which is the key advantage of Pareto AT over Ave AT.

Sensitivity Analysis. Comparing universal adversarial robustness between Table 1 and Table 2, it can be seen that Pareto AT achieves more consistent universal adversarial robustness. In addition to this sensitivity analysis in terms of perturbation sizes, we also investigate the variation of universal adversarial robustness by changing the expected adversarial robustness loss $r$ . Results are provided in Table 2. It suggests that Pareto AT with a mild $r$ can achieve the best universal robustness score, while Pareto AT with an excessively large or small $r$ may not have sufficient universal robustness. Moreover, Pareto Front in Figure 7 also serves as the sensitivity analysis results in terms of different $r$ .

Overall, we conclude that Pareto adversarial training perfectly balances the mutual impacts of sensitivity-based robustness and spatial robustness under the Pareto criterion.

5 Discussion and Conclusion

The principal purpose of our work is to design a novel approach to achieve universal adversarial robustness. We first analyze the two main branches of spatial robustness and then integrate them into one attack and adversarial training design. Furthermore, we investigate the comprehensive relationships between sensitivity-based and two distinct spatial robustnesses from the perspective of representation. Based on the understanding of the mutual impacts of different kinds of adversarial robustness, we introduce the Pareto criterion into the adversarial training framework to develop Pareto Adversarial Training. The resulting Pareto front provides optimal solutions over existing baselines, given the universal robustness level we hope to attain. In the future, we hope to apply Pareto analysis to more general Out-of-Distribution generalization settings.

\Acknowledgements

Z. Lin was supported by National Key R $\&$ D Program of China (2022ZD0160300), the NSF China (No. 62276004), and Qualcomm.

References

[1] Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. Invariant risk minimization. arXiv preprint arXiv:1907.02893, 2019.
[2] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 ieee symposium on security and privacy (sp), pages 39–57. IEEE, 2017.
[3] Zhaohui Che, Ali Borji, Guangtao Zhai, Suiyi Ling, Jing Li, Xiongkuo Min, Guodong Guo, and Patrick Le Callet. Smgea: A new ensemble adversarial attack powered by long-term gradient memories. IEEE Transactions on Neural Networks and Learning Systems, 2020.
[4] Gavin Weiguang Ding, Yash Sharma, Kry Yik Chau Lui, and Ruitong Huang. Max-margin adversarial (mma) training: Direct input space margin maximization through adversarial training. International Conference on Learning Representations, ICLR 2020, 2018.
[5] Logan Engstrom, Brandon Tran, Dimitris Tsipras, Ludwig Schmidt, and Aleksander Madry. Exploring the landscape of spatial robustness. In International Conference on Machine Learning, pages 1802–1811, 2019.
[6] Logan Engstrom, Dimitris Tsipras, Ludwig Schmidt, and Aleksander Madry. A rotation and a translation suffice: Fooling cnns with simple transformations. arXiv preprint arXiv:1712.02779, 1(2):3, 2017.
[7] Yarin Gal. Uncertainty in deep learning. University of Cambridge, 1(3), 2016.
[8] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. International Conference on Learning Representations, 2014.
[9] Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, et al. The many faces of robustness: A critical analysis of out-of-distribution generalization. arXiv preprint arXiv:2006.16241, 2020.
[10] Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. International Conference on Learning Representations, 2019.
[11] Max Jaderberg, Karen Simonyan, Andrew Zisserman, et al. Spatial transformer networks. In Advances in neural information processing systems, pages 2017–2025, 2015.
[12] Sandesh Kamath, Amit Deshpande, and KV Subrahmanyam. Invariance vs. robustness of neural networks. arXiv preprint arXiv:2002.11318, 2020.
[13] Il Yong Kim and OL De Weck. Adaptive weighted sum method for multiobjective optimization: a new method for pareto front generation. Structural and multidisciplinary optimization, 31(2):105–116, 2006.
[14] Il Yong Kim and Oliver L De Weck. Adaptive weighted-sum method for bi-objective optimization: Pareto front generation. Structural and multidisciplinary optimization, 29(2):149–158, 2005.
[15] David Krueger, Ethan Caballero, Joern-Henrik Jacobsen, Amy Zhang, Jonathan Binas, Remi Le Priol, and Aaron Courville. Out-of-distribution generalization via risk extrapolation (rex). arXiv preprint arXiv:2003.00688, 2020.
[16] Man-Fai Leung and Jun Wang. A collaborative neurodynamic approach to multiobjective optimization. IEEE transactions on neural networks and learning systems, 29(11):5738–5748, 2018.
[17] Cong Li, Michael Georgiopoulos, and Georgios C Anagnostopoulos. Pareto-path multitask multiple kernel learning. IEEE transactions on neural networks and learning systems, 26(1):51–61, 2014.
[18] Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, and Tom Goldstein. Visualizing the loss landscape of neural nets. In Advances in Neural Information Processing Systems, pages 6389–6399, 2018.
[19] Xi Lin, Hui-Ling Zhen, Zhenhua Li, Qing-Fu Zhang, and Sam Kwong. Pareto multi-task learning. In Advances in Neural Information Processing Systems, pages 12060–12070, 2019.
[20] Qi Liu and Wujie Wen. Model compression hardens deep neural networks: A new perspective to prevent adversarial attacks. IEEE Transactions on Neural Networks and Learning Systems, 2021.
[21] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. International Conference on Learning Representations, ICLR 2018, 2017.
[22] Pratyush Maini, Eric Wong, and J Zico Kolter. Adversarial robustness against the union of multiple perturbation models. International Conference on Machine Learning, 2019.
[23] Radford M Neal. Bayesian learning for neural networks, volume 118. Springer Science & Business Media, 2012.
[24] Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John Duchi, and Percy Liang. Understanding and mitigating the tradeoff between robustness and accuracy. International Conference on Machine Learning, 2020.
[25] Mahmood Sharif, Lujo Bauer, and Michael K Reiter. On the suitability of lp-norms for creating and preventing adversarial examples. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 1605–1613, 2018.
[26] Baifeng Shi, Dinghuai Zhang, Qi Dai, Zhanxing Zhu, Yadong Mu, and Jingdong Wang. Informative dropout for robust representation learning: A shape-bias perspective. International Conference on Machine Learning, 2020.
[27] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
[28] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
[29] Richard Szeliski. Computer vision: algorithms and applications. Springer Science & Business Media, 2010.
[30] Florian Tramèr, Jens Behrmann, Nicholas Carlini, Nicolas Papernot, and Jörn-Henrik Jacobsen. Fundamental tradeoffs between invariance and sensitivity to adversarial perturbations. ICML, 2020.
[31] Florian Tramèr and Dan Boneh. Adversarial training and robustness for multiple perturbations. In Advances in Neural Information Processing Systems, pages 5866–5876, 2019.
[32] Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. Robustness may be at odds with accuracy. International Conference on Learning Representations, ICLR 2019, 2018.
[33] Vladimir N Vapnik and A Ya Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. In Measures of complexity, pages 11–30. Springer, 2015.
[34] Haotao Wang, Tianlong Chen, Shupeng Gui, Ting-Kuei Hu, Ji Liu, and Zhangyang Wang. Once-for-all adversarial training: In-situ tradeoff between robustness and accuracy for free. Advances in Neural Information Processing Systems, 2020.
[35] Chaowei Xiao, Jun-Yan Zhu, Bo Li, Warren He, Mingyan Liu, and Dawn Song. Spatially transformed adversarial examples. International Conference on Learning Representations, ICLR 2018, 2018.
[36] Nanyang Ye, Qianxiao Li, Xiao-Yun Zhou, and Zhanxing Zhu. An annealing mechanism for adversarial training acceleration. IEEE Transactions on Neural Networks and Learning Systems, 2021.
[37] Dong Yin, Raphael Gontijo Lopes, Jon Shlens, Ekin Dogus Cubuk, and Justin Gilmer. A fourier perspective on model robustness in computer vision. In Advances in Neural Information Processing Systems, pages 13276–13286, 2019.
[38] Milan Zeleny. Multiple criteria decision making Kyoto 1975, volume 123. Springer Science & Business Media, 2012.
[39] Haichao Zhang and Jianyu Wang. Joint adversarial training: Incorporating both spatial and pixel attacks. arXiv preprint arXiv:1907.10737, 2019.
[40] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P Xing, Laurent El Ghaoui, and Michael I Jordan. Theoretically principled trade-off between robustness and accuracy. International Conference on Machine Learning, 2019.
[41] Tianyuan Zhang and Zhanxing Zhu. Interpreting adversarially trained convolutional neural networks. International Conference on Machine Learning, 2019.
[42] Eckart Zitzler and Lothar Thiele. Multiobjective evolutionary algorithms: a comparative case study and the strength pareto approach. IEEE transactions on Evolutionary Computation, 3(4):257–271, 1999.

Appendix A Visualization of Various Attacks

To better present the visual effect of various kinds of adversarial attacks, we provide high-resolution results on Caltech- $256$ in Figure 8. It turns out that Flow-based attacks focus on local spatial vulnerability that mainly blurs pixels in some local regions, while RT attacks cause a shape-based global spatial transformation. More importantly, our integrated spatial attacks are more comprehensive in the sense of spatial robustness, combining both local and local spatial sensitivity.

Appendix B Proof of Proposition 1

Proof.

Firstly, we have the following equations according to the definitions of the loss function:

	$\displaystyle\mathcal{L}^{\text{CE}}_{\theta}(x_{w_{F}},y)$	$\displaystyle=\log\sum_{i=1}^{K}\exp\left(f_{\theta}^{i}(x_{w_{F}})\right)-f_{\theta}^{y}(x_{w_{F}})$		(14)
	$\displaystyle\mathcal{L}^{S}_{\theta}(x_{w_{F}},y)$	$\displaystyle=\log\sum_{i\neq y}\exp\left(f_{\theta}^{i}(x_{w_{F}})\right)-f_{\theta}^{y}(x_{w_{F}})$		(14)

Then, we compute their gradients for the flow vector $x_{w_{F}}$ . The gradient of $\mathcal{L}^{\text{CE}}_{\theta}(x_{w_{F}},y)$ is shown as follows:

		$\displaystyle\nabla_{w_{F}}\mathcal{L}^{\text{CE}}_{\theta}(x_{w_{F}},y)$		(15)
		$\displaystyle=\frac{\sum_{i=1}^{K}\exp(f_{\theta}^{i}(x_{w_{F}}))\cdot\nabla_{x_{w_{F}}}f_{\theta}^{i}(x_{w_{F}})\cdot\nabla_{w_{F}}x_{w_{F}}}{\sum_{i=1}^{K}\exp(f_{\theta}^{i}(x_{w_{F}}))}-\nabla_{x_{w_{F}}}f_{\theta}^{y}(x_{w_{F}})\cdot\nabla_{w_{F}}x_{w_{F}}$
		$\displaystyle=\frac{\sum_{i=1}^{K}\exp(f_{\theta}^{i}(x_{w_{F}}))\nabla_{w_{F}}x_{w_{F}}(\nabla_{x_{w_{F}}}f_{\theta}^{i}(x_{w_{F}})-\nabla_{x_{w_{F}}}f_{\theta}^{y}(x_{w_{F}}))}{\sum_{i=1}^{K}\exp(f_{\theta}^{i}(x_{w_{F}}))}$

Similarly, the gradient of $\mathcal{L}^{S}_{\theta}(x_{w_{F}},y)$ is:

\displaystyle\nabla_{w_{F}}\mathcal{L}^{S}_{\theta}(x_{w_{F}},y)=\frac{1}{\sum_{i\neq y}^{K}\exp(f_{\theta}^{i}(x_{w_{F}}))}\cdot(\sum_{i\neq y}^{K}\exp(f_{\theta}^{i}(x_{w_{F}}))\nabla_{w_{F}}x_{w_{F}}(\nabla_{x_{w_{F}}}f_{\theta}^{i}(x_{w_{F}})-\nabla_{x_{w_{F}}}f_{\theta}^{y}(x_{w_{F}})))

(16)

Then we take the multiplication of $\nabla_{w_{F}}\mathcal{L}^{S}_{\theta}(x_{w_{F}},y)$ by a term $\frac{\sum_{i\neq y}^{K}\exp(f_{\theta}^{i}(x_{w_{F}}))}{\sum_{i=1}^{K}\exp(f_{\theta}^{i}(x_{w_{F}}))}$ , finally we attain:

$\displaystyle\nabla_{w_{F}}\mathcal{L}^{S}_{\theta}(x_{w_{F}},y)\cdot\frac{\sum_{i\neq y}^{K}\exp(f_{\theta}^{i}(x_{w_{F}}))}{\sum_{i=1}^{K}\exp(f_{\theta}^{i}(x_{w_{F}}))}=$	$\displaystyle\frac{\sum_{i\neq y}^{K}\exp(f_{\theta}^{i}(x_{w_{F}}))\nabla_{w_{F}}x_{w_{F}}(\nabla_{x_{w_{F}}}f_{\theta}^{i}(x_{w_{F}})-\nabla_{x_{w_{F}}}f_{\theta}^{y}(x_{w_{F}}))+0}{\sum_{i=1}^{K}\exp(f_{\theta}^{i}(x_{w_{F}}))}$	(17)
$\displaystyle=$	$\displaystyle\frac{\sum_{i=1}^{K}\exp(f_{\theta}^{i}(x_{w_{F}}))\nabla_{w_{F}}x_{w_{F}}(\nabla_{x_{w_{F}}}f_{\theta}^{i}(x_{w_{F}})-\nabla_{x_{w_{F}}}f_{\theta}^{y}(x_{w_{F}}))}{\sum_{i=1}^{K}\exp(f_{\theta}^{i}(x_{w_{F}}))}$
$\displaystyle=$	$\displaystyle\nabla_{w_{F}}\mathcal{L}^{\text{CE}}_{\theta}(x_{w_{F}},y)$

Finally, we denote $\frac{\sum_{i\neq y}^{K}\exp(f_{\theta}^{i}(x_{w_{F}}))}{\sum_{i=1}^{K}\exp(f_{\theta}^{i}(x_{w_{F}}))}$ as $r(x_{w_{F}},y)$ . ∎

Appendix C Proof of Proposition 2

Proof.

Let f as the minimizer, e.g., the neural networks after the optimization:

	$\displaystyle f^{\star}$	$\displaystyle\in\min_{f}\max_{i}\mathcal{R}(f,S_{i})$		(18)
	$\displaystyle M^{\star}$	$\displaystyle=\max_{i}\mathcal{R}(f,S_{i})$		(18)

Then the optimization can be equivalent to a constrained version:

	$\displaystyle\min_{f,M}$	$\displaystyle\ \ M$		(19)
	s.t.	$\displaystyle\mathcal{R}(f,S_{i})\leq M\text{ for all }S_{i}\in\mathcal{S}$		(19)

with Lagrangian $L(f,M,\lambda)=M+\sum_{S_{i}\in\mathcal{S}}\lambda_{i}\left(\mathcal{R}(f,S_{i})-M\right)$ . If this optimization problem satisfies KKT condition, then $\exists\lambda_{i}\geq 0$ with $\nabla_{f}L\left(f^{\star},M^{\star},\lambda\right)=0$ such that

\left.\nabla_{f}\right|_{f=f^{\star}}\sum_{S_{i}\in\mathcal{S}}\lambda_{i}\mathcal{R}(f,S_{i})=0

∎

Remark

We point out that our conclusion is made under the assumption that the KKT condition holds and the stationary point of $f$ regarding the Lagrangian function can be attained, which normally requires the convexity condition. However, under these assumptions, we can still establish the close correlation between Max AT and Ave AT, indicating they are likely to perform similarly in many cases.

Appendix D Optimization analysis on the Pareto Adversarial Training and Algorithm

We provide the proof of $P$ in the following:

Proof.

$\displaystyle\sum_{i=0}^{3}\sum_{j=0}^{3}\mathbb{E}(\alpha_{i}\mathcal{L}_{i}-\alpha_{j}\mathcal{L}_{j})^{2}=$	$\displaystyle\sum_{i=0}^{3}\sum_{j=0}^{3}\mathbb{E}((\alpha_{i}\mathcal{L}_{i}-\mathbb{E}(\alpha_{i}\mathcal{L}_{i}))-(\alpha_{j}\mathcal{L}_{j}-\mathbb{E}(\alpha_{j}\mathcal{L}_{j}))+(\mathbb{E}(\alpha_{i}\mathcal{L}_{i})-\mathbb{E}(\alpha_{j}\mathcal{L}_{j})))^{2}$	(20)
$\displaystyle=$	$\displaystyle\sum_{i=0}^{3}\sum_{j=0}^{3}\mathbb{E}((\alpha_{i}\mathcal{L}_{i}-\mathbb{E}(\alpha_{i}\mathcal{L}_{i})-(\alpha_{j}\mathcal{L}_{j}-\mathbb{E}(\alpha_{j}\mathcal{L}_{j}))^{2}+(\mathbb{E}(\alpha_{i}\mathcal{L}_{i})-\mathbb{E}(\alpha_{j}\mathcal{L}_{j}))^{2}+0$
$\displaystyle=$	$\displaystyle\sum_{i=0}^{3}\sum_{j=0}^{3}(\alpha_{i}^{2}\sigma_{ii}+\alpha_{j}^{2}\sigma_{jj}-2\alpha_{i}\alpha_{j}\sigma_{ij})+(\alpha_{i}^{2}\mu_{i}^{2}+\alpha_{j}^{2}\mu_{j}^{2}-2\alpha_{i}\alpha_{j}\mu_{i}\mu_{j})$
$\displaystyle=$	$\displaystyle\ 8\alpha^{T}\text{diag}(\Sigma)\alpha-2\alpha^{T}\Sigma\alpha+8\alpha^{T}\text{diag}(\mu\mu^{T})\alpha-2\alpha^{T}(\mu\mu^{T})\alpha$
$\displaystyle=$	$\displaystyle\ \alpha^{T}(8(\text{diag}(\Sigma)+\text{diag}(\mu\mu^{T}))-2(\Sigma+\mu\mu^{T}))\alpha$

∎

Appendix E Implementation

Implementation Details. For MNIST comparison, we train the Simple CNN in [40] on MNIST for $100$ epochs. As for the CIFAR- $10$ dataset, we choose the widely used Pre-Act ResNet- $18$ with grouped normalization and trained the network for $76$ epochs. The other details of our implementation on MNIST and CIFAR- $10$ are based on [40], while the implementation on Caltech- $256$ has to refer to [41] with $10$ epochs to finetune a pre-trained ResNet- $18$ .

•

PGD Attack. We apply the widely accepted setting on these three datasets. We set step size as $0.01$ , $\epsilon$ as $0.3$ on MNIST while the step size is $0.007$ and $\epsilon$ is $0.031$ on both CIFAR- $10$ and Caltech- $256$ datasets. To evaluate the different levels of robustness, we evaluate PGD attack under $10,20,30,40$ iterations on MNIST and $5,10,15,20$ iterations on CIFAR- $10$ and Caltech- $256$ datasets.
•

Flow-based and RT Attacks. On MNIST, we set step size $\alpha_{F}$ and $\alpha_{RT}$ as $0.01$ and $0.1$ , and choose $\epsilon_{F},\epsilon_{RT}$ as $0.3$ . We select $5,10,15,20$ as the attack iterations for the evaluation of both two attacks. On CIFAR- $10$ , we set step size $\alpha_{F}$ as $1e-3$ and $\alpha_{RT}$ as $0.05$ , and choose $\epsilon_{F},\epsilon_{RT}$ as $0.3,1.0$ . We select $3,5,10,15$ as the attack iterations for the evaluation of both two attacks. On Caltech- $256$ , we set step size $\alpha_{F}$ as $1e-5$ and $\alpha_{RT}$ as $0.1$ , and choose $\epsilon_{F},\epsilon_{RT}$ as $0.3$ and $1.0$ for the two attacks, respectively. We select $3,5,10,15$ as the attack iterations for the evaluation of both two attacks.
•

PGD Adversarial Training. We choose PGD iterations as 30, 3, and 5 in the PGD adversarial training on MNIST, CIFAR-10, and Caltech-256, respectively. The adversarial attack strength is the same as PGD attacks for each dataset, respectively.
•

Spatial Adversarial Training. Our integrated spatial adversarial training is based on our proposed integrated spatial attacks that unify both Flow-based and RT-based attacks. We set the iterations as $20$ , $5$ , $10$ and on MNIST, CIFAR- $10$ and Caltech- $256$ , respectively. Other hyper-parameters are the same as those in their corresponding attacks.
•

Pareto Adversarial Training. The parameter $r$ is the measure of comprehensive adversarial robustness. We select a sequence of $r$ to train multiple Pareto Adversarial training models. Particularly, on MNIST, we choose $r$ in $[0.2,0.5,0.8,1.0,1.2,1.5,1.8,2.0,2.2]$ , and $r$ in $[0.5,1.0,1.25,1.5,2.25,3.0,3.5,4.0]$ on CIFAR- $10$ and Caltech- $256$ . Other parameters follow the corresponding methods above, respectively.