
General Adversarial Defense Against Black-box Attacks via Pixel Level and Feature Level Distribution Alignments

Xiaogang Xu, Hengshuang Zhao, Philip Torr, and Jiaya Jia X. Xu and J. Jia are with the Department of Computer Science and Engineering, The Chinese University of Hong Kong. E-mail: {xgxu, leojia}@cse.cuhk.edu.hk. H. Zhao is with the Department of Computer Science, The University of Hong Kong. E-mail: [email protected]. P. Torr is with the Department of Engineering Science, University of Oxford. E-mail: [email protected].
Abstract

Deep Neural Networks (DNNs) are vulnerable to black-box adversarial attacks, which are highly transferable. This threat comes from the distribution gap between adversarial and clean samples in the feature space of the target DNNs. In this paper, we use Deep Generative Networks (DGNs) with a novel training mechanism to eliminate this distribution gap. The trained DGNs align the distribution of adversarial samples with that of clean ones for the target DNNs by translating pixel values. Different from previous work, we propose a more effective pixel-level training constraint to make this achievable, thus enhancing robustness on adversarial samples. Further, a class-aware feature-level constraint is formulated for integrated distribution alignment. Our approach is general and applicable to multiple tasks, including image classification, semantic segmentation, and object detection. We conduct extensive experiments on different datasets. Our strategy demonstrates its unique effectiveness and generality against black-box attacks.

Index Terms:
Deep Generative Model, Adversarial Defense, Distribution Alignment.

1 Introduction

Most deep learning models are vulnerable to adversarial samples [1, 2, 3, 4], which are maliciously generated to fool the target model by adding adversarial perturbations to the original input. Such perturbations are imperceptible to the human visual system, yet severely threaten real-world deep learning applications such as face recognition [5, 6] and self-driving cars [7, 8]. Since it is hard to have full knowledge of the target model in practice, attacks often adopt a black-box mechanism that exploits the transferability of adversarial samples. In this paper, we focus on defending against such attacks, following the setting of [9, 10].

To safeguard DNNs from black-box adversarial attacks, a major class of adversarial defenses applies an input transformation to the samples before they are processed by the target DNNs. A fundamental strategy is to use image processing operations without learning (e.g., image compression [11] and quilting [12]). Alternatively, learning-based methods [13, 14, 9] train Deep Generative Networks (DGNs) to accomplish such transformation, resulting in higher robustness. The network is trained with adversarial/clean samples as input and synthesizes output on which the target model works well. The essence of these methods is to weaken the difference between adversarial samples and the corresponding clean samples by updating their pixel values with the network. Current approaches to train such DGNs can be divided into three categories: 1) setting pixel-level constraints to reduce the distance between adversarial and clean samples [15, 13]; 2) adopting constraints at the feature level of target models [14]; and 3) employing constraints at both the pixel and feature levels [9, 16]. All these methods ignore the overall distribution alignment in the feature spaces of target models, which is a potential problem affecting robustness.

[Figure 1: four t-SNE panels — (a) $x^{c}$ in $\mathcal{O}$, (b) $x^{a}$ in $\mathcal{O}$, (c) $\widehat{x}^{c}$ in $\mathcal{O}$, (d) $\widehat{x}^{a}$ in $\mathcal{O}$.]
Figure 1: t-SNE visualizations in the feature space of the target model on CIFAR10 [17]. The target model $\mathcal{O}$ has a proper distribution for clean samples $x^{c}$ and a disordered distribution for adversarial samples $x^{a}$. Our trained generator $\mathcal{G}$ turns $x^{c}$ into $\widehat{x}^{c}$ and $x^{a}$ into $\widehat{x}^{a}$ for the correct distribution.
[Figure 2: (a) pixel-level training constraints; (b) feature-level training constraints.]

Figure 2: Our overall framework for training the deep generative network $\mathcal{G}$. To align the distributions of clean samples $x^{c}$ and adversarial samples $x^{a}$ for the target model, the training constraints are set at the pixel and feature levels of the target model $\mathcal{O}$.

In this paper, we train DGNs to protect target models by aligning the distributions of clean and adversarial samples in the feature spaces of the target models. Compared with existing methods, novel training constraints are introduced at the pixel and feature levels. At the pixel level, we match adversarial samples with clean ones in the output space of the DGNs. At the feature level, we design a class-aware constraint that aligns the central features of clean and adversarial samples within each class, maximizing the inter-class distance and minimizing the intra-class distance for all categories. Notably, our trained DGNs can be generalized to protect models that have not appeared during training.

DGNs trained with our method align the distribution of adversarial samples to that of clean ones, as exhibited in Fig. 1. By design, our defense is general across several high-level computer vision tasks. We apply it to image classification, semantic segmentation, and object detection, using diverse datasets, models, and attacks.

In summary, our contributions are as follows.

  • A novel input transformation strategy that defends against black-box attacks by aligning the distributions of clean and adversarial samples, effectively blocking the transferability of unseen adversarial samples.

  • New training constraints at both the pixel and feature levels of target models.

  • Extensive experiments on various tasks. Our method yields high robustness, effectiveness, and generality.

2 Related Work

Adversarial attack. Adversarial attacks include white-box attacks [18, 2], where attackers have full knowledge of the target model and the defense strategy; gray-box attacks [12], where attackers have access to the target model but not to the defense strategy; and black-box attacks [19], where attackers know neither the target model nor the defense and often exploit the transferability of adversarial samples. Existing attacks on the classification task usually compute or simulate the gradient information of target models [2, 20, 21, 22]. Meanwhile, semantic segmentation [4, 23, 3] and object detection networks [4, 24, 25, 26, 27, 28] are also vulnerable to adversarial attacks. Black-box attacks are more common than white-box and gray-box attacks in real-world applications, and it is worth exploring how to defend against them across different tasks.

Adversarial defense. Several defense strategies have been proposed to eliminate the threat of adversarial perturbations. A major class of defenses transforms the input images for higher robustness [12, 29]. Such approaches translate the pixel values of adversarial/clean samples to remove the influence of highly transferable adversarial perturbations.

Current input-transformation-based defenses that employ DGNs can be divided into three categories according to their training constraints: 1) using pixel-level constraints to reduce the differences in pixel values between clean and adversarial samples [15, 13, 30, 31, 32, 33]; 2) applying feature-level constraints to unify the representations of clean and adversarial samples in the feature space of the target model [14]; and 3) simultaneously setting pixel- and feature-level constraints, which has proved more advantageous [9, 16]. We note that at the feature level, existing approaches only use the distance between clean and adversarial samples as the constraint to optimize, without aligning the distributions in feature space. For example, Huang et al. [33] proposed to train a network with a predictive perturbation-aware filtering mechanism, which removes the adversarial perturbation through a denoising operation. In this paper, we propose a novel input transformation strategy in which the DGNs are trained with exact pixel-level and feature-level distribution alignment.

Moreover, although some existing adversarial training approaches have considered feature-level alignment, our method differs from them noticeably. For instance, Mustafa et al. [34, 35] designed a prototype conformity loss that forces the features of each class to lie inside a convex polytope maximally separated from the polytopes of other classes; however, such a loss cannot achieve distribution alignment between adversarial and clean samples in the deep feature space. Hou et al. [36] proposed to use a discriminator with an adversarial learning strategy, similar to GAN training [37], to make the features of adversarial and clean samples indistinguishable. However, such adversarial learning cannot explicitly enforce complete distribution alignment, and misalignment in the deep feature space remains likely. Song et al. [38] proposed to incorporate distribution alignment constraints into adversarial training. Nevertheless, that method does not pursue the exact distribution alignment that we target: not only should the distribution shapes of adversarial and clean samples be aligned, but the features of each paired adversarial sample and its corresponding clean sample should also be aligned.

[Figure 3: (a) visual illustration of traditional pixel-level training constraints; (b) visual illustration of our pixel-level training constraints.]
Figure 3: Illustration of the differences between traditional pixel-level constraints and ours.

3 Our Method

We train a network $\mathcal{O}$ for one task $\mathcal{S}$, where $\mathcal{S}$ can be image classification, semantic segmentation, or object detection. The network $\mathcal{O}$ is usually trained with a set of clean samples $x^{c}$, and we suppose $x^{c}\sim\mathcal{C}$, where $\mathcal{C}$ is the distribution of clean samples. The trained network $\mathcal{O}$ behaves decently on $x^{c}\sim\mathcal{C}$ for task $\mathcal{S}$, while its performance remarkably degrades after adding an adversarial perturbation $\epsilon$ to $x^{c}$.

Adversarial samples are denoted as $x^{a}$ ($x^{a}=x^{c}+\epsilon$), and we represent the distribution of adversarial samples as $\mathcal{A}$. As shown in Fig. 1(a)&(b), although the clean samples $x^{c}$ and the adversarial samples $x^{a}$ are visually indistinguishable, there is a large gap between $\mathcal{C}$ and $\mathcal{A}$ in the feature space of the target model $\mathcal{O}$, which causes $\mathcal{O}$ to fail on adversarial samples.

To eliminate the threat of adversarial samples, we align $\mathcal{C}$ and $\mathcal{A}$ for a target model $\mathcal{O}$, as exhibited in Fig. 1(c)&(d). Based on this motivation, we propose to train a network $\mathcal{G}$ that aligns $\mathcal{C}$ and $\mathcal{A}$ in the feature space of the target model by modifying the pixel values of $x^{c}$ and $x^{a}$.

As a result, $\mathcal{G}$ translates $x^{c}$ into $\widehat{x}^{c}$ and $x^{a}$ into $\widehat{x}^{a}$, and $\mathcal{O}$ achieves decent results on both $\widehat{x}^{c}$ and $\widehat{x}^{a}$. To train $\mathcal{G}$, we set constraints for alignment at both the pixel and feature levels, as shown in Fig. 2, and push the distributions of $\widehat{x}^{c}$ and $\widehat{x}^{a}$ toward $\mathcal{C}$, since $\mathcal{O}$ performs well on $x^{c}\sim\mathcal{C}$. Furthermore, Sec. 4.9 verifies that the trained $\mathcal{G}$ can also be generalized to protect models $\mathcal{O}^{\prime}$ that have not appeared during training.

3.1 Pixel-level Alignment

Motivation. The pixel-level training constraints are employed to weaken the difference in pixel RGB values between clean and adversarial samples and to promote alignment in the feature space of the target model $\mathcal{O}$. Previous work validated that "distance metrics between clean and adversarial samples" and "adversarial learning" [37] are two practical pixel-level training constraints. Traditional pixel-level constraints [15, 13, 30, 31, 32] mainly adopt $x^{c}$ to guide the formulation of both $\widehat{x}^{c}$ and $\widehat{x}^{a}$ through the generator $\mathcal{G}$, as displayed in Fig. 3(a). They compute the distance metric between $\widehat{x}^{c}$ and $x^{c}$, and between $\widehat{x}^{a}$ and $x^{c}$, and they set $x^{c}$ as real samples and $\widehat{x}^{c}$, $\widehat{x}^{a}$ as fake samples to conduct adversarial learning.

We propose a new scheme for the pixel-level training constraints, in which we use $x^{c}$ to guide the formulation of $\widehat{x}^{c}$ and then use $\widehat{x}^{c}$ as the matching target for $\widehat{x}^{a}$, as shown in Fig. 3(b). In this setting, $\widehat{x}^{c}$ serves as an intermediate to effectively shorten the discrepancy between $\widehat{x}^{a}$ and $x^{c}$. Comprehensive experiments illustrate that this novel setting results in conspicuous improvement on adversarial samples compared to the traditional schemes.

Our pixel-level training constraints also include the distance metric as well as adversarial learning.

Reconstruction loss. Given clean samples $x^{c}$, adversarial samples $x^{a}$ are obtained by adding an adversarial perturbation $\epsilon$ to $x^{c}$, and the generator $\mathcal{G}$ synthesizes outputs $\widehat{x}^{c}$ and $\widehat{x}^{a}$ from inputs $x^{c}$ and $x^{a}$. Based on this, we define the reconstruction loss term $\mathcal{L}_{r}$ as

\begin{aligned}
\widehat{x}^{a} &= \mathcal{G}(x^{a}), \quad \widehat{x}^{c} = \mathcal{G}(x^{c}), \\
\mathcal{L}_{r} &= \mathbb{E}(\|\widehat{x}^{c}-x^{c}\|_{1}) + \mathbb{E}(\|\widehat{x}^{a}-\widehat{x}^{c}\|_{1}),
\end{aligned} \qquad (1)

where $\mathbb{E}$ computes the mean value and $\|\cdot\|_{1}$ is the $L_{1}$ distance. Moreover, to help synthesize images at high resolution, we use a perceptual loss term [39, 40] $\mathcal{L}_{p}$, computed as the reconstruction distance in the feature space of an ImageNet-pretrained VGG-16 network [41] between $\widehat{x}^{c}$ and $x^{c}$, as well as between $\widehat{x}^{a}$ and $\widehat{x}^{c}$. Note that the perceptual loss term is not a constraint for feature-level alignment, since the VGG-16 network is not task-specific; we mainly employ the perceptual loss for the visual similarity of $\widehat{x}^{c}$ and $x^{c}$ (or $\widehat{x}^{a}$ and $\widehat{x}^{c}$) at the pixel level.
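To make the pixel-level constraints concrete, the following PyTorch sketch implements $\mathcal{L}_{r}$ and $\mathcal{L}_{p}$ under our own assumptions; the VGG-16 layer cutoff and the single feature scale are illustrative choices, not the paper's exact configuration.

```python
import torch.nn as nn
import torchvision.models as models

class PixelLevelLoss(nn.Module):
    """Sketch of the pixel-level reconstruction (Eq. 1) and perceptual terms.

    x_hat_c is matched to x_c, and x_hat_a is matched to x_hat_c; a frozen
    VGG-16 feature extractor supplies the perceptual distance.
    """
    def __init__(self):
        super().__init__()
        # Frozen ImageNet-pretrained VGG-16 up to an intermediate layer (assumed cutoff).
        vgg = models.vgg16(weights="IMAGENET1K_V1").features[:16].eval()
        for p in vgg.parameters():
            p.requires_grad = False
        self.vgg = vgg
        self.l1 = nn.L1Loss()

    def forward(self, x_hat_c, x_hat_a, x_c):
        # Reconstruction loss L_r (Eq. 1).
        loss_r = self.l1(x_hat_c, x_c) + self.l1(x_hat_a, x_hat_c)
        # Perceptual loss L_p on VGG-16 features.
        loss_p = self.l1(self.vgg(x_hat_c), self.vgg(x_c)) + \
                 self.l1(self.vgg(x_hat_a), self.vgg(x_hat_c))
        return loss_r, loss_p
```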

Adversarial loss. To match the global patterns of $\mathcal{C}$ and $\mathcal{A}$ at the pixel level, we implement adversarial learning loss terms with sampling. We employ a discriminator $\mathcal{D}$ and set the loss terms in the form of LSGAN [42] as

\begin{aligned}
\mathcal{L}_{GAN_{d}} &= \mathbb{E}_{x^{c}\sim\mathcal{C}}\big((\mathcal{D}(\widehat{x}^{c})-1)^{2}\big) + \mathbb{E}_{x^{a}\sim\mathcal{A}}\big((\mathcal{D}(\widehat{x}^{a})-0)^{2}\big), \\
\mathcal{L}_{GAN_{g}} &= \mathbb{E}_{x^{a}\sim\mathcal{A}}\big((\mathcal{D}(\widehat{x}^{a})-1)^{2}\big),
\end{aligned} \qquad (2)

where $\mathcal{L}_{GAN_{d}}$ is used for the discriminator and $\mathcal{L}_{GAN_{g}}$ is adopted for the generator. Additionally, a feature matching loss is adopted as an auxiliary part of the adversarial loss [43]. We obtain intermediate features from $\mathcal{D}$ for fake and real samples and compute their distance as

\mathcal{L}_{m} = \mathbb{E}(\|\mathcal{F}(\widehat{x}^{c}) - \mathcal{F}(\widehat{x}^{a})\|_{1}), \qquad (3)

where $\mathcal{F}(x)$ denotes the intermediate features obtained from the discriminator for a real or fake sample $x$.
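A minimal sketch of these adversarial-learning terms is given below; it assumes a discriminator that returns both a realness score and an intermediate feature map, which is an interface choice on our part rather than the paper's implementation.

```python
import torch

def generator_gan_losses(D, x_hat_c, x_hat_a):
    """L_GAN_g (Eq. 2) and the feature-matching term L_m (Eq. 3) for updating G."""
    score_a, feat_a = D(x_hat_a)
    _, feat_c = D(x_hat_c)
    # Push processed adversarial samples toward the "real" label.
    loss_gan_g = ((score_a - 1) ** 2).mean()
    # Match intermediate discriminator features of the two processed samples.
    loss_m = torch.nn.functional.l1_loss(feat_c, feat_a)
    return loss_gan_g, loss_m

def discriminator_gan_loss(D, x_hat_c, x_hat_a):
    """L_GAN_d (Eq. 2); generator outputs are detached when updating D."""
    score_c, _ = D(x_hat_c.detach())   # processed clean samples act as "real" (label 1)
    score_a, _ = D(x_hat_a.detach())   # processed adversarial samples act as "fake" (label 0)
    return ((score_c - 1) ** 2).mean() + (score_a ** 2).mean()
```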

Algorithm 1 Our strategy to train the generator $\mathcal{G}$ for the target model $\mathcal{O}$ on the target task $\mathcal{S}$

Parameters: training data $(x^{c}, y^{c})$, initialized $\mathcal{G}$ and $\mathcal{D}$, maximum number of iterations $T_{max}$, iteration counter $T\leftarrow 0$

1:while $T\neq T_{max}$ do
2:     Read a minibatch of data $D_{b}=\{x^{c}_{1},...,x^{c}_{b}\}$, $Y_{b}=\{y^{c}_{1},...,y^{c}_{b}\}$.
3:     Use the chosen attack algorithm and $\mathcal{O}$ to generate adversarial samples $A_{b}=\{x^{a}_{1},...,x^{a}_{b}\}$.
4:     Compute $\mathcal{L}_{r}$, $\mathcal{L}_{p}$, $\mathcal{L}_{GAN_{g}}$, and $\mathcal{L}_{m}$ using $D_{b}$, $A_{b}$, and the discriminator $\mathcal{D}$.
5:     Forward $D_{b}$ and $A_{b}$ through $\mathcal{O}$ to obtain features of $K$ classes, according to $Y_{b}$.
6:     Compute the clustering center of each class from the features in this batch.
7:     Compute $\mathcal{L}_{F_{class}}$ using the features of the $K$ classes and the corresponding clustering centers.
8:     Compute $\mathcal{L}_{g}$ to update the generator $\mathcal{G}$; compute $\mathcal{L}_{GAN_{d}}$ to update the discriminator $\mathcal{D}$. $T\leftarrow T+1$
9:end while
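For readers who prefer code, the following PyTorch-style sketch mirrors one iteration of Algorithm 1; the helpers attack_fn and the entries of loss_fns are hypothetical stand-ins for the loss terms defined above, not the authors' released code.

```python
def train_step(G, D, O, x_c, y_c, attack_fn, opt_g, opt_d, loss_fns, weights):
    """One training iteration following Algorithm 1 (a sketch under assumed interfaces)."""
    # Step 3: craft adversarial samples against the target model O.
    x_a = attack_fn(O, x_c, y_c)

    # Forward both clean and adversarial samples through the generator.
    x_hat_c, x_hat_a = G(x_c), G(x_a)

    # Steps 4-7: pixel-level and feature-level loss terms.
    loss_r, loss_p = loss_fns["pixel"](x_hat_c, x_hat_a, x_c)
    loss_gan_g, loss_m = loss_fns["gan_g"](D, x_hat_c, x_hat_a)
    loss_task, loss_frec = loss_fns["feature"](O, x_hat_c, x_hat_a, x_c, y_c)
    loss_class = loss_fns["class_aware"](O, x_hat_a, x_c, y_c)   # already weighted as in Eq. 8

    # Step 8: update the generator with the overall constraint (Eq. 9), then the discriminator.
    loss_g = weights["lambda4"] * (loss_r + loss_p + loss_m + loss_gan_g
                                   + loss_task + loss_frec) + loss_class
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    loss_d = loss_fns["gan_d"](D, x_hat_c.detach(), x_hat_a.detach())
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
```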

Theoretical analysis. In the traditional setting, to weaken the discrepancy between adversarial samples and the corresponding clean samples, $\|\widehat{x}^{a}-x^{c}\|_{1}$ is set as the objective to optimize when training $\mathcal{G}$. However, $\|\widehat{x}^{a}-x^{c}\|_{1}$ is large at the beginning of training, and there exist multiple solutions $\widehat{x}^{a}$ with the same value of $\|\widehat{x}^{a}-x^{c}\|_{1}$ at each training step. Thus, this setting is ill-posed during training and is very likely to result in local optima (i.e., the $\widehat{x}^{a}$ generated by $\mathcal{G}$ does not approach $x^{c}$ closely enough).

In our pixel-level constraint, we instead set $\|\widehat{x}^{a}-\widehat{x}^{c}\|_{1}$ as the objective. The distance between $\widehat{x}^{a}$ and $\widehat{x}^{c}$ is smaller than the distance between $\widehat{x}^{a}$ and $x^{c}$, since $\widehat{x}^{a}$ and $\widehat{x}^{c}$ lie in the same output space of $\mathcal{G}$. To demonstrate this, we adopt the Proxy-A distance [44] to measure the distance between two domains' distributions. Given the generalization error $\kappa$ of a classifier discriminating between target and source samples, the Proxy-A distance is defined as $2(1-2\kappa)$. As shown in Fig. 4, the Proxy-A distance between $\widehat{x}^{a}$ and $\widehat{x}^{c}$ is smaller at the beginning of training, stays shorter than the Proxy-A distance between $\widehat{x}^{a}$ and $x^{c}$, and finally approaches zero.
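In practice, the Proxy-A distance can be estimated by training a simple domain classifier to separate the two sample sets and plugging its generalization error $\kappa$ into $2(1-2\kappa)$. A minimal sketch is shown below; the logistic-regression classifier and the 50/50 split are our own illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def proxy_a_distance(source, target):
    """Estimate the Proxy-A distance 2(1 - 2*kappa) between two sample sets.

    source, target: arrays of shape (n_samples, n_features), e.g. flattened
    images x_hat_a vs. x_c (or x_hat_a vs. x_hat_c).
    """
    X = np.concatenate([source, target], axis=0)
    y = np.concatenate([np.zeros(len(source)), np.ones(len(target))])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    kappa = 1.0 - clf.score(X_te, y_te)   # generalization error of the domain classifier
    return 2.0 * (1.0 - 2.0 * kappa)
```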

Thus, our setting has a smaller solution space during training and is better suited to avoiding local optima, since it is simpler to train $\mathcal{G}$ to make $\widehat{x}^{a}$ very close to $\widehat{x}^{c}$. We also find that optimizing the distance between $\widehat{x}^{a}$ and $\widehat{x}^{c}$ leads to a narrower distance between $\widehat{x}^{a}$ and $x^{c}$, as verified by the visualizations in Fig. 1 and Fig. 4.

Moreover, the target model $\mathcal{O}$ performs well on both $x^{c}$ and $\widehat{x}^{c}$, as validated in our experiments. The $\widehat{x}^{a}$ obtained with our setting is close to $x^{c}$ because $\widehat{x}^{c}$ is close to $x^{c}$ and $\widehat{x}^{a}$ is very near $\widehat{x}^{c}$. Therefore, $\mathcal{O}$ behaves better on $\widehat{x}^{a}$ with our setting than with traditional ones.

[Figure 4]
Figure 4: Proxy-A distance between "$\widehat{x}^{a}$ and $x^{c}$" as well as "$\widehat{x}^{a}$ and $\widehat{x}^{c}$" at each training epoch. With our pixel-level training constraint, $\widehat{x}^{a}$ can be better aligned with both $\widehat{x}^{c}$ and $x^{c}$.

3.2 Feature-level Alignment

Motivation. Besides the pixel-level training constraints, recent work revealed the necessity of feature-level training constraints on the target model $\mathcal{O}$ [14, 9] to enhance protection. Feature-level alignment helps formulate robust representations for networks of a given task. Existing methods formulate feature-level training constraints as distances, in the feature space of $\mathcal{O}$, between synthesized and clean samples. However, they ignore the constraint for aligning the overall distributions $\mathcal{C}$ and $\mathcal{A}$ at the feature level. We instead propose a class-aware feature-level training constraint in addition to the distance metric in the feature space of $\mathcal{O}$. Our class-aware constraint aligns the integrated distributions of clean and adversarial samples within each category for the target model, minimizing the intra-class distance and maximizing the inter-class distance for the adversarial samples' distribution.

Task-oriented loss. Suppose $\mathcal{O}$ is trained with paired data $(x^{c}, y^{c})$, where $y^{c}$ is the ground truth for $x^{c}$, and the loss term to train $\mathcal{O}$ is represented as $\mathcal{L}_{o}(x^{c}, y^{c})$ (e.g., $\mathcal{L}_{o}$ is the cross-entropy loss for image classification and semantic segmentation, and the bounding-box regression and classification loss for object detection). To align the behaviors of clean and adversarial samples in the feature space of $\mathcal{O}$, we adopt a feature-level loss

\mathcal{L}_{F_{task}} = \mathcal{L}_{o}(\widehat{x}^{c}, y^{c}) + \mathcal{L}_{o}(\widehat{x}^{a}, y^{c}). \qquad (4)

Reconstruction loss. Similar to the reconstruction loss at the pixel level, we define a loss $\mathcal{L}_{F_{rec}}$ to minimize the distances between adversarial and clean samples in the feature space of $\mathcal{O}$ as

\begin{aligned}
\widehat{z}^{a} &= \mathcal{O}(\widehat{x}^{a}), \quad \widehat{z}^{c} = \mathcal{O}(\widehat{x}^{c}), \quad z^{c} = \mathcal{O}(x^{c}), \\
\mathcal{L}_{F_{rec}} &= \mathbb{E}(\|\widehat{z}^{c}-z^{c}\|_{1}) + \mathbb{E}(\|\widehat{z}^{a}-z^{c}\|_{1}),
\end{aligned} \qquad (5)

where $\widehat{z}^{a}$, $\widehat{z}^{c}$, and $z^{c}$ are the intermediate features of $\widehat{x}^{a}$, $\widehat{x}^{c}$, and $x^{c}$ in the feature space of $\mathcal{O}$. The choice of feature space used to compute the feature-level constraints for different tasks is illustrated in the supplementary file.
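As an illustration for the classification case, the sketch below computes $\mathcal{L}_{F_{task}}$ and $\mathcal{L}_{F_{rec}}$; the O.features(...) interface for extracting intermediate features is an assumption made for clarity.

```python
import torch.nn.functional as F

def feature_level_losses(O, x_hat_c, x_hat_a, x_c, y_c):
    """Sketch of the task-oriented loss (Eq. 4) and feature reconstruction loss (Eq. 5)."""
    # Task-oriented loss: processed clean and adversarial samples should both
    # still be classified correctly by the target model O.
    loss_task = F.cross_entropy(O(x_hat_c), y_c) + F.cross_entropy(O(x_hat_a), y_c)

    # Feature reconstruction loss in the target model's feature space.
    z_c = O.features(x_c).detach()                     # reference features of clean samples
    z_hat_c, z_hat_a = O.features(x_hat_c), O.features(x_hat_a)
    loss_frec = F.l1_loss(z_hat_c, z_c) + F.l1_loss(z_hat_a, z_c)
    return loss_task, loss_frec
```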

Distribution alignment loss. We also note that previous methods do not consider aligning the overall distributions $\mathcal{C}$ and $\mathcal{A}$ in the feature space of $\mathcal{O}$. In contrast, we formulate our class-aware training constraint for overall distribution alignment. Suppose there are $K$ classes within the distributions $\mathcal{C}$ and $\mathcal{A}$. We denote by $z^{c(k)}$ and $\widehat{z}^{a(k)}$ the features of $x^{c}$ and $\widehat{x}^{a}$ of the $k$-th class extracted from $\mathcal{O}$. The clustering centers of $z^{c(k)}$ and $\widehat{z}^{a(k)}$ are denoted as $m^{c(k)}$ and $\widehat{m}^{a(k)}$ in the distributions $\mathcal{C}$ and $\mathcal{A}$, respectively. The class-aware constraint consists of three terms. First, we compute a loss term as the distance between $m^{c(k)}$ and $\widehat{m}^{a(k)}$ to align the distribution of adversarial samples to that of clean samples:

\mathcal{L}_{F_{align}} = \sum_{k=1}^{K} \mathbb{E}(\|m^{c(k)} - \widehat{m}^{a(k)}\|_{1}). \qquad (6)

Further, favorable distributions in the feature spaces of the target models for high-level tasks should have wide distances between features from different classes and narrow separation among features from the same class. To this end, we set intra- and inter-class losses as

\begin{aligned}
\mathcal{L}_{F_{intra}} &= \sum_{k=1}^{K} \mathbb{E}(\|\widehat{z}^{a(k)} - \widehat{m}^{a(k)}\|_{1}), \\
\mathcal{L}_{F_{inter}} &= \sum_{k=1}^{K} \sum_{\substack{i=1 \\ i\neq k}}^{K} \mathbb{E}(M - \|\widehat{m}^{a(k)} - \widehat{m}^{a(i)}\|_{1}),
\end{aligned} \qquad (7)

where $M$ is a pre-defined hyper-parameter controlling the inter-class distance. The overall class-aware training constraint is written as

\mathcal{L}_{F_{class}} = \lambda_{1}\mathcal{L}_{F_{align}} + \lambda_{2}\mathcal{L}_{F_{inter}} + \lambda_{3}\mathcal{L}_{F_{intra}}, \qquad (8)

where $\lambda_{1}$, $\lambda_{2}$, and $\lambda_{3}$ are loss weights.
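A compact sketch of the class-aware constraint is given below, using per-batch class means as the clustering centers (our reading of the paper's "clustering center") and the loss weights reported in Sec. 3.3.

```python
import torch

def class_aware_loss(z_c, z_hat_a, labels, num_classes, margin,
                     lambda_align=0.1, lambda_inter=1.0, lambda_intra=0.005):
    """Sketch of L_F_class (Eqs. 6-8).

    z_c, z_hat_a: features of clean and processed adversarial samples from the
    target model, shape (N, L); labels: (N,) integer class labels.
    """
    centers_c, centers_a, loss_intra = [], [], 0.0
    for k in range(num_classes):
        mask = labels == k
        if mask.sum() == 0:
            continue
        m_c = z_c[mask].mean(dim=0)            # clean clustering center of class k
        m_a = z_hat_a[mask].mean(dim=0)        # adversarial clustering center of class k
        centers_c.append(m_c)
        centers_a.append(m_a)
        # Intra-class compactness of adversarial features (Eq. 7, first term).
        loss_intra = loss_intra + (z_hat_a[mask] - m_a).abs().mean()

    centers_c = torch.stack(centers_c)
    centers_a = torch.stack(centers_a)

    # Align adversarial class centers with clean class centers (Eq. 6).
    loss_align = (centers_c - centers_a).abs().mean(dim=1).sum()

    # Push adversarial class centers apart, up to a margin M (Eq. 7, second term).
    loss_inter = 0.0
    for k in range(len(centers_a)):
        for i in range(len(centers_a)):
            if i != k:
                loss_inter = loss_inter + (margin - (centers_a[k] - centers_a[i]).abs().mean())

    return lambda_align * loss_align + lambda_inter * loss_inter + lambda_intra * loss_intra
```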

3.3 Overall Training Constraint

In summary, the overall training constraint for the generator $\mathcal{G}$ can be written as

\begin{aligned}
\mathcal{L}_{g} = {} & \lambda_{4}(\mathcal{L}_{r} + \mathcal{L}_{p} + \mathcal{L}_{m} + \mathcal{L}_{GAN_{g}} + \mathcal{L}_{F_{task}} + \mathcal{L}_{F_{rec}}) \\
& + \lambda_{1}\mathcal{L}_{F_{align}} + \lambda_{2}\mathcal{L}_{F_{inter}} + \lambda_{3}\mathcal{L}_{F_{intra}},
\end{aligned} \qquad (9)

where $\lambda_{1}$ to $\lambda_{4}$ are loss weights. Our training algorithm for $\mathcal{G}$ is summarized in Alg. 1. $\lambda_{1}$, $\lambda_{2}$, $\lambda_{3}$, and $\lambda_{4}$ are set by grid search on the validation set; we adopt $\lambda_{1}=0.1$, $\lambda_{2}=1$, $\lambda_{3}=0.005$, and $\lambda_{4}=50$ in our experiments. We apply the same loss weights when training on all tasks.

Note that the adversarial samples $x^{a}$ used during training are synthesized from the target model $\mathcal{O}$ only, while the trained model $\mathcal{G}$ can defend against unseen adversarial perturbations transferred from models with different structures.

4 Experiments

Our framework is applicable to several important tasks. In the following, we conduct extensive experimental evaluations on image classification, semantic segmentation, and object detection on multiple datasets. We show the effect of our proposed pixel- and feature-level constraints through ablation studies and illustrate the superiority of our method compared with other alternatives.

4.1 Datasets

To demonstrate the performance of our method on the three key tasks, we select representative datasets for each task. For image classification, we choose CIFAR10 [17], CIFAR100 [17], and ImageNet [45] (we adopt its subset, Tiny-ImageNet, with 200 classes); for semantic segmentation, we employ Cityscapes [46] and VOC2012 [47]; for object detection, the VOC07+12 [47] setting is adopted. The train and test splits follow the official settings.

4.2 Structure of DGNs in Experiments

In our experiments, the generator $\mathcal{G}$ is implemented as the "global generator" structure in pix2pixHD [43]. For classification, images from CIFAR10 and CIFAR100 have size $32\times 32$ and images from Tiny-ImageNet have size $64\times 64$ as the input to both the target model and the generator; the generator contains two down-sample/up-sample layers. For the semantic segmentation task, images from Cityscapes and VOC2012 are shaped as $512\times 512$ for the generator, which has four down-sample/up-sample layers, and as $425\times 425$ and $417\times 417$ for the respective target models. For object detection, the input size of the generator, which is built with four down-sample/up-sample layers, is $256\times 256$ for VOC07+12, and the input size of the target model is $300\times 300$. Note that our training constraints are suitable for generators $\mathcal{G}$ built with various structures.

4.3 Target Models

For the classification task, $\mathcal{O}$ adopts the structure of WideResNet [48]/ResNet50 [49]; for semantic segmentation, we employ $\mathcal{O}$ with the architecture of PSPNet [50]/DeepLabv3 [51]; for object detection, $\mathcal{O}$ uses the framework of SSD [52]/RFBNet [53]. We use $\mathcal{O}$ (trained without defense) to train $\mathcal{G}$ in input transformation strategies (including ours), and the results on processed clean/adversarial samples $\widehat{x}^{c}$/$\widehat{x}^{a}$ are computed by $\mathcal{O}$ during evaluation. In Sec. 4.9, it is verified that $\mathcal{G}$ can also protect target models $\mathcal{O}^{\prime}$ that have different structures and parameters from $\mathcal{O}$ and have not appeared during training.

Table I: Comparison on the classification tasks. “WideResNet \rightarrow ResNet50”: attacking WideResNet while generating adversarial samples from ResNet50.
WideResNet \rightarrow ResNet50 CIFAR10 (Accuracy %) CIFAR100 (Accuracy %) Tiny-ImageNet (Accuracy %)
clean PGD DeepFool C&W clean PGD DeepFool C&W clean PGD DeepFool C&W
No Defense 95.1 2.1 5.3 6.4 78.1 7.5 9.3 10.4 64.5 19.2 20.4 21.2
No Defense (finetune) 95.6 3.8 7.5 8.7 79.5 8.0 9.7 10.9 63.9 22.1 23.2 23.6
TRADES [48] 87.3 79.7 87.1 87.2 62.8 53.9 62.6 62.7 58.5 45.3 58.3 58.5
TRADES (finetune) [48] 85.4 78.5 85.3 85.4 78.1 53.9 55.8 44.2 43.3 42.2 43.2 43.4
Free-adv [54] 77.1 73.1 76.7 77.0 49.2 46.6 49.0 49.2 55.5 45.8 55.4 55.5
Free-adv (finetune) [54] 88.5 79.8 88.4 88.5 63.7 54.1 63.5 63.7 42.2 40.9 48.6 48.7
Defense [30] 39.9 38.7 38.1 39.3 31.1 30.5 30.9 31.0 20.4 18.5 19.7 19.9
SR [13] 48.0 47.2 48.0 48.1 33.6 33.1 33.5 33.8 31.1 30.5 30.9 31.2
FPD [16] 48.5 47.6 48.4 48.5 52.5 41.2 42.4 42.5 39.7 33.7 39.5 39.8
APE [15] 90.2 41.9 89.1 89.8 73.2 37.2 65.7 67.7 62.4 30.5 61.6 62.3
Denoise [14] 89.8 80.0 90.1 90.0 67.2 53.1 66.8 66.1 59.8 45.3 59.4 59.9
NRP [9] 91.8 79.0 89.2 89.8 70.3 52.8 66.6 67.2 59.1 45.5 59.0 59.2
Ours 90.7 81.4 90.4 90.6 68.7 54.0 67.5 68.4 62.9 46.4 62.2 63.2

4.4 Training

Training parameters. We employ the Adam optimizer with $\beta_{1}$ and $\beta_{2}$ set to 0.5 and 0.999. The learning rate is 0.0002, and the batch size is 1 for Cityscapes and VOC2012, 4 for VOC07+12, 16 for Tiny-ImageNet, and 64 for CIFAR10/CIFAR100. The number of training epochs is 80 for Cityscapes, 20 for VOC2012, 80 for VOC07+12, 30 for Tiny-ImageNet, and 100 for CIFAR10/CIFAR100. Our method is implemented with PyTorch [55] and runs on a TITAN X GPU.
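A minimal sketch of this optimizer configuration is shown below; the generator and discriminator modules are placeholders standing in for the pix2pixHD-style networks.

```python
import torch
import torch.nn as nn

# Placeholder modules standing in for the actual generator and discriminator.
G = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))
D = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1))

# Adam with beta1 = 0.5, beta2 = 0.999 and learning rate 2e-4, as described above.
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
```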

Data augmentation. For training on the three tasks, we adopt the data augmentation policies described in [48, 50, 52]. For CIFAR10 and CIFAR100, we first pad the input to $40\times 40$, randomly crop a patch of size $32\times 32$, and then apply random horizontal flipping; for Tiny-ImageNet, the data augmentation includes random crop (pad size 4, crop size 64) and random horizontal flip. For the segmentation task (Cityscapes and VOC2012), we employ the official augmentations in [56], including RandScale, RandRotate, RandomGaussianBlur, RandomHorizontalFlip, and random crop. For the detection task (VOC07+12), the augmentation strategies in [52] are utilized.

Details of our training constraints for image classification. For the classification task, we adopt WideResNet [48] and ResNet50 [49] for experiments and implement the feature-level training constraints by extracting the feature of the fully connected layer before the final layer. Specifically, each image $x\in\mathbb{R}^{H\times W\times 3}$ ($H$ and $W$ are the height and width of the image) has one class label $y\in\mathbb{R}^{K}$ and one feature vector $f\in\mathbb{R}^{L}$ ($L$ is the length of the vector). We group the features into $K$ classes according to $y$.

Details of our training constraints for segmentation. For the semantic segmentation task, we use PSPNet [50] and DeepLabv3 [51] with a ResNet50 backbone for experiments and compute the feature-level constraints using the feature of the convolution layer before the final layer. Different from the classification task, each image in semantic segmentation has multiple class labels and thus multiple feature vectors of different classes: an image $x\in\mathbb{R}^{H\times W\times 3}$ has a corresponding segmentation map $y\in\mathbb{R}^{H\times W\times K}$, and the feature of the image can be denoted as $f^{\prime}\in\mathbb{R}^{h\times w\times L}$ with a resized segmentation map $y^{\prime}\in\mathbb{R}^{h\times w\times K}$. We then obtain a set of feature vectors with their corresponding labels, $\{y_{1},...,y_{h\times w}\}, y_{i}\in\mathbb{R}^{K}$ and $\{f_{1},...,f_{h\times w}\}, f_{i}\in\mathbb{R}^{L}$, reshaped from $y^{\prime}$ and $f^{\prime}$. This set of feature vectors is grouped into $K$ classes and utilized in the feature-level constraints.
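The following sketch shows how such per-class feature sets could be collected from a segmentation feature map; the nearest-neighbor resizing of the label map to the feature resolution is an assumption for illustration.

```python
import torch.nn.functional as F

def group_segmentation_features(feat, seg_map, num_classes):
    """Group the h*w feature vectors of a segmentation feature map by class.

    feat: (L, h, w) feature map from the layer before the final classifier;
    seg_map: (H, W) integer label map.
    """
    L, h, w = feat.shape
    # Resize the label map to the feature resolution (nearest neighbor, assumed).
    labels = F.interpolate(seg_map[None, None].float(), size=(h, w),
                           mode="nearest").long().view(-1)          # (h*w,)
    vectors = feat.permute(1, 2, 0).reshape(-1, L)                   # (h*w, L)
    # One feature set per class that actually appears in this image.
    return {k: vectors[labels == k] for k in range(num_classes)
            if (labels == k).any()}
```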

Details of our training constraints for object detection. We employ SSD [52] and RFBNet [53] with a VGG16 backbone for experiments on the object detection task. The features for the feature-level constraints are obtained as the outputs of the backbone and are cropped with the ground-truth bounding boxes. In this task, each image has multiple class labels (one per bounding box) and thus also multiple feature vectors of different classes: an image $x\in\mathbb{R}^{H\times W\times 3}$ can have $B$ bounding boxes with one class label each. The feature of the image can be denoted as $f^{\prime}\in\mathbb{R}^{h\times w\times L}$, and we obtain the feature of each bounding box on this feature map with the corresponding class label, i.e., $\{y_{1},...,y_{B}\}, y_{i}\in\mathbb{R}^{K}$ and $\{f_{1},...,f_{B}\}, f_{i}\in\mathbb{R}^{L}$.
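A sketch of extracting one labeled feature vector per ground-truth box is shown below; average pooling inside each cropped region is our assumption for reducing a box to a single vector.

```python
def box_features(feat, boxes, labels, stride):
    """Extract one feature vector per ground-truth box.

    feat: (L, h, w) backbone output; boxes: (B, 4) as (x1, y1, x2, y2) in
    image coordinates; stride: image-to-feature downsampling factor.
    """
    out = []
    for (x1, y1, x2, y2), y in zip(boxes.tolist(), labels.tolist()):
        c1, r1 = int(x1 // stride), int(y1 // stride)
        c2, r2 = max(c1 + 1, int(x2 // stride)), max(r1 + 1, int(y2 // stride))
        region = feat[:, r1:r2, c1:c2]            # crop the box region on the feature map
        out.append((y, region.mean(dim=(1, 2))))  # one L-dimensional vector per box
    return out
```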

Table II: Comparison with existing defense methods on the semantic segmentation and object detection tasks. “PSPNet \rightarrow DeepLabv3”: attacking PSPNet while obtaining adversarial samples with DeepLabv3; “SSD \rightarrow RFBNet”: attacking SSD while acquiring adversarial samples from RFBNet.
PSPNet \rightarrow DeepLabv3 Cityscapes (mIoU %) VOC2012 (mIoU %) VOC07+12 (mAP %)
SSD \rightarrow RFBNet clean BIM DeepFool C&W clean BIM DeepFool C&W clean cls loc cls+loc
No Defense 73.5 3.6 38.6 12.5 76.4 11.1 46.1 16.6 72.5 17.9 14.5 15.0
No Defense (finetune) 73.3 3.6 37.1 12.5 76.3 11.2 45.2 15.4 73.6 19.6 16.8 16.1
SAT [57] 65.7 50.5 52.5 51.0 73.9 57.1 64.3 69.3
SAT (finetune) [57] 66.3 42.1 47.7 42.4 74.2 60.5 60.1 63.4
DDCAT [57] 67.7 51.4 54.4 51.8 75.1 58.8 65.1 69.3
DDCAT (finetune) [57] 68.3 43.3 49.4 43.1 76.0 62.4 62.4 64.8
CLS [58] 47.8 34.5 43.3 44.1
LOC [58] 52.9 36.7 38.8 39.9
CON [58] 40.7 31.3 39.5 40.3
MTD [58] 49.1 42.0 44.3 44.1
Defense [30] 22.2 20.2 19.9 21.2 23.5 21.9 22.6 23.3 34.6 30.2 30.8 29.7
SR [13] 60.7 52.0 50.6 51.5 71.2 58.1 60.3 67.2 54.6 36.2 37.5 34.5
FPD [16] 55.9 53.0 53.2 55.0 61.5 57.5 58.6 59.7 57.2 58.1 55.8 57.8
APE [15] 54.5 34.7 33.3 45.4 74.3 37.8 54.3 59.9 62.3 57.9 58.0 55.8
Denoise [14] 64.4 55.1 53.8 64.0 70.4 61.9 61.5 67.8 61.6 52.2 50.9 51.7
NRP [9] 65.0 55.2 49.3 64.1 70.5 62.6 59.3 68.5 60.4 59.9 55.4 58.9
Ours 67.6 59.5 62.0 64.7 71.2 63.6 66.0 69.5 60.5 61.2 58.3 60.9
Table III: Comparison on the classification tasks. “ResNet50 \rightarrow WideResNet”: attacking ResNet50 while generating adversarial samples from WideResNet.
ResNet50 \rightarrow WideResNet CIFAR10 (Accuracy %) CIFAR100 (Accuracy %) Tiny-ImageNet (Accuracy %)
clean PGD DeepFool C&W clean PGD DeepFool C&W clean PGD DeepFool C&W
No Defense 94.3 8.2 4.3 5.8 76.1 17.3 18.3 16.6 62.1 22.7 21.7 22.2
No Defense (finetune) 94.7 8.1 6.6 7.5 77.8 16.6 18.3 16.6 61.3 22.7 25.8 26.1
TRADES [48] 85.3 82.3 85.1 85.2 59.8 57.3 59.7 59.8 44.3 43.0 44.3 44.3
TRADES (finetune) [48] 83.2 82.0 83.1 83.2 56.3 55.2 56.3 56.3 43.3 42.4 43.3 43.3
Free-adv [54] 86.3 81.6 86.2 86.3 64.0 55.8 63.9 64.0 53.4 41.6 53.3 53.4
Free-adv (finetune) [54] 88.3 82.9 88.2 88.3 63.4 55.2 63.3 63.4 39.9 38.5 39.7 39.8
Defense [30] 40.1 38.8 39.7 39.8 31.1 30.2 30.5 30.9 21.5 19.9 20.3 21.0
SR [13] 45.3 45.1 45.3 45.3 35.6 35.3 35.9 36.3 30.6 29.5 30.0 30.1
FPD [16] 53.0 52.7 52.9 53.0 50.0 46.8 49.7 50.0 34.5 31.7 34.2 34.4
APE [15] 89.1 64.9 88.9 89.6 71.7 51.6 66.4 67.2 60.2 35.1 59.0 59.1
Denoise [14] 88.3 80.7 87.9 88.2 66.7 56.3 65.3 65.5 56.6 42.3 56.5 56.7
NRP [9] 90.3 80.9 88.1 88.3 66.7 56.6 66.2 66.6 56.5 43.3 56.1 56.5
Ours 90.0 84.0 89.8 90.2 67.3 57.7 67.5 68.1 61.0 43.1 59.9 60.8
Table IV: Comparison with existing defense methods on the semantic segmentation and object detection tasks. “DeepLabv3 \rightarrow PSPNet”: attacking DeepLabv3 while obtaining adversarial samples with PSPNet; “RFBNet \rightarrow SSD”: attacking RFBNet while acquiring adversarial samples from SSD.
DeepLabv3 \rightarrow PSPNet Cityscape (mIoU %) VOC2012 (mIoU %) VOC07+12 (mAP %)
RFBNet \rightarrow SSD clean BIM DeepFool C&W clean BIM DeepFool C&W clean cls loc cls+loc
No Defense 73.2 3.8 32.3 13.5 76.9 11.8 49.0 20.2 80.6 16.9 15.6 15.9
No Defense (finetune) 73.5 3.9 33.1 13.3 75.7 11.7 49.8 18.4 80.8 17.2 15.8 16.3
SAT [57] 64.3 49.2 50.7 49.9 72.8 56.2 65.3 68.6
SAT (finetune) [57] 65.4 42.5 48.5 42.8 74.8 60.1 64.1 69.9
DDCAT [57] 67.7 50.3 52.9 50.8 74.2 61.5 66.7 69.9
DDCAT (finetune) [57] 68.2 43.3 50.1 43.6 76.2 63.4 65.4 70.2
CLS [58] 52.0 39.6 48.8 49.1
LOC [58] 57.6 41.7 42.8 44.4
CON [58] 43.5 35.3 43.8 44.1
MTD [58] 53.7 47.1 48.3 48.9
Defense [30] 23.3 21.2 20.4 22.9 22.5 20.7 22.0 22.4 42.2 37.8 38.2 37.1
SR [13] 62.8 51.6 50.6 52.8 70.1 60.5 64.0 68.6 56.2 38.6 39.3 36.2
FPD [16] 49.5 49.6 50.0 50.8 62.1 59.7 60.8 61.7 58.7 59.3 57.3 59.2
APE [15] 54.2 27.2 22.7 35.6 75.5 40.8 57.6 63.4 68.5 60.0 61.3 58.8
Denoise [14] 64.6 58.5 55.0 64.0 67.2 63.4 64.7 68.8 68.5 59.3 57.9 58.4
NRP [9] 65.6 58.0 50.7 64.1 68.1 64.7 62.8 69.2 68.5 67.2 63.5 67.2
Ours 67.3 60.3 62.7 64.9 69.1 65.2 66.9 70.7 70.2 70.0 68.1 70.0

Attacks employed in training. In this paper, all attacks are conducted under the $L_{\infty}$ constraint and in untargeted form (except the experiments in Sec. 4.10). For the image classification task, we follow [48] to set the attack parameters during training and testing, and we employ the attack parameter settings in [57] and [58] for semantic segmentation and object detection, respectively. Therefore, for classification, the adversarial perturbation $\epsilon$ during training is generated by PGD [59] with the KL criterion; for semantic segmentation, we employ BIM [22] during training; for object detection, the classification attack ("cls") and localization attack ("loc") [58] are utilized during training. The details are as follows. For the classification task, we utilize the PGD attack with the KL criterion during training, with perturbation range $\epsilon=0.031\times 255$, step size $\alpha=0.0175\times 255$, and $n=4$ attack iterations. For the semantic segmentation task, we use BIM ($\epsilon=0.03\times 255$, $\alpha=0.01\times 255$, $n=3$) during training. For the object detection task, the attacks on the classification loss and the localization loss are employed, and the perturbation range is 8 for pixel values within $[0, 255]$.
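For reference, a minimal untargeted $L_{\infty}$ PGD sketch with the training-time iteration count and step size quoted above is given below; it is written for inputs normalized to $[0, 1]$ and uses the cross-entropy criterion for simplicity rather than the KL criterion of [48].

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.031, alpha=0.0175, n_iter=4):
    """Untargeted L_inf PGD sketch (random start, sign of the gradient, projection)."""
    x_adv = x.clone().detach()
    x_adv = (x_adv + torch.empty_like(x_adv).uniform_(-eps, eps)).clamp(0, 1)  # random start
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = x.clone() + (x_adv - x).clamp(-eps, eps)   # project back into the L_inf ball
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```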

4.5 Evaluation

Attack types. During evaluation, for the classification tasks, the adversarial perturbation is obtained by PGD with the cross-entropy criterion and translation-invariant form [60], the DeepFool attack [61], and the C&W attack [62]. For semantic segmentation, we adopt BIM, DeepFool, and C&W for evaluation; for object detection, "cls", "loc", and "cls+loc" (simultaneously conducting the classification and localization attacks) are employed for testing.

We experiment with transferable attacks to demonstrate our method's effect. In such evaluations, attackers cannot utilize the exact gradient information of the target model. Instead, they usually obtain gradient information from a substitute network trained on the same dataset [19, 63, 64] with a different model structure. Thus, in the evaluation of the classification task, we use perturbations computed from ResNet50 to attack the defense framework trained with WideResNet. For semantic segmentation, adversarial samples obtained from DeepLabv3 are adopted to attack PSPNet. For object detection, attacks on SSD are implemented by employing adversarial perturbations generated from RFBNet.

Attack parameters. Following [48, 57, 58], for the classification task we adopt PGD with the cross-entropy criterion ($\epsilon=0.031\times 255$, $\alpha=0.0075\times 255$, $n=8$), the DeepFool attack ($\epsilon=0.031\times 255$), and the C&W attack ($\epsilon=0.031\times 255$, step size $0.0075\times 255$). For semantic segmentation, we utilize BIM ($\epsilon=0.03\times 255$, $\alpha=0.01\times 255$, $n=4$), the DeepFool attack ($\epsilon=0.03\times 255$), and the C&W attack ($\epsilon=0.03\times 255$, step size $0.01\times 255$) for evaluation. Moreover, the perturbation range is set to 8 for the object detection task. Note that we adopt different attack types and parameters for training and evaluation, to verify the generalization ability of the trained DGNs toward unseen attacks.

Metrics. Classification accuracy, mean class-wise intersection over union (mIoU), and mean average precision (mAP) are adopted as the quality indicators for image classification, semantic segmentation, and object detection, respectively.

4.6 Comparison with Existing Methods

Baselines. We choose input transformation and adversarial training approaches for comparison. Most existing adversarial training work focuses on image classification, and we adopt two recent methods, [48] and [54], for comparison. For the semantic segmentation task, [57] first conducted a comprehensive exploration of the impact of adversarial training on semantic segmentation; we employ the two strategies proposed in [57] (SAT and DDC-AT). For object detection, [58] proposed four variants of adversarial training. All these adversarial training approaches are trained with their original configurations, and "finetune" means finetuning pre-trained models (without defense) via adversarial training. For input transformation strategies, we use six representative methods [9, 14, 15, 16, 30, 13]. We adopt their original generator structures and re-train them with the same epochs, batch size, and target models as ours.

Table V: Evaluation with various attacks on CIFAR10, CIFAR100, and Tiny-ImageNet. "WideResNet \rightarrow ResNet50": attacking WideResNet while generating adversarial samples from ResNet50 (except the decision-based attacks, which do not utilize the transferability of adversarial samples).
WideResNet \rightarrow ResNet50 CIFAR10 (Accuracy %)
SSP [9] FDA [65] TAP [66] MI [21] Auto [67] Square [68] Cu&Wh [69] DIM [70] PPDA [71] GeoDA [72] SF [73] DR [74]
Denoise [14] 60.2 65.6 67.1 71.4 61.4 59.6 58.4 64.1 58.2 59.5 56.2 56.9
NRP [9] 62.0 66.2 69.2 71.9 62.3 61.1 60.7 65.3 59.7 60.3 57.7 58.4
Ours 64.1 68.0 70.1 73.2 65.2 63.3 62.5 67.4 63.4 64.7 61.3 63.9
WideResNet \rightarrow ResNet50 CIFAR100 (Accuracy %)
SSP [9] FDA [65] TAP [66] MI [21] Auto [67] Square [68] Cu&Wh [69] DIM [70] PPDA [71] GeoDA [72] SF [73] DR [74]
Denoise [14] 40.8 46.3 46.7 48.3 43.2 40.6 38.8 40.5 35.1 36.9 33.7 36.1
NRP [9] 41.7 45.8 45.9 46.3 42.1 39.1 40.2 42.3 38.8 40.4 36.9 39.2
Ours 44.9 47.6 48.1 50.3 45.5 41.8 42.2 46.7 42.3 43.5 40.2 43.0
WideResNet \rightarrow ResNet50 Tiny-ImageNet (Accuracy %)
SSP [9] FDA [65] TAP [66] MI [21] Auto [67] Square [68] Cu&Wh [69] DIM [70] PPDA [71] GeoDA [72] SF [73] DR [74]
Denoise [14] 34.5 40.9 38.5 41.2 34.2 31.8 31.7 36.2 32.9 35.0 31.0 34.7
NRP [9] 35.5 39.1 37.2 40.6 36.7 30.5 32.4 37.6 33.4 34.8 31.3 35.1
Ours 37.3 40.6 41.8 44.5 38.8 33.1 34.2 40.2 34.9 36.6 32.8 35.2
Table VI: Results of statistically significant calculation for the comparison in Table I and II. We report the p-value.
CIFAR10 CIFAR100
PGD DeepFool C&W PGD DeepFool C&W
Ours vs Denoise 2.7e-17 3.2e-8 5.6e-7 5.8e-17 4.3e-16 3.6e-7
Ours vs NRP 1.6e-13 2.6e-7 7.4e-10 7.5e-9 6.8e-7 9.2e-12
Tiny-ImageNet Cityscapes
PGD DeepFool C&W BIM DeepFool C&W
Ours vs Denoise 5.5e-7 8.2e-8 1.7e-7 7.2e-11 8.4e-10 2.9e-8
Ours vs NRP 6.7e-8 2.4e-11 5.0e-10 7.3e-8 3.3e-10 9.1e-9
VOC2012 VOC07+12
BIM DeepFool C&W cls loc cls+loc
Ours vs Denoise 2.6e-11 8.3e-7 9.4e-7 1.1e-8 9.8e-10 6.9e-7
Ours vs NRP 4.4e-9 5.7e-7 1.8e-10 4.8e-9 7.1e-7 8.8e-10
Table VII: Comparison on classification, semantic segmentation, and object detection tasks in terms of quality enhancement.
WideResNet \rightarrow ResNet50 CIFAR10 (dB) CIFAR100 (dB) Tiny-ImageNet (dB)
clean PGD DeepFool C&W clean PGD DeepFool C&W clean PGD DeepFool C&W
Defense [30] 23.70 23.54 23.50 23.64 20.86 20.54 20.72 20.79 23.63 23.05 23.34 23.40
SR [13] 24.58 24.05 24.11 24.26 21.20 21.08 21.14 21.18 25.27 25.04 25.09 25.21
FPD [16] 24.64 24.16 24.24 24.45 22.08 21.85 21.91 21.93 26.14 25.67 26.06 26.09
APE [15] 27.52 23.77 26.82 26.91 25.65 21.46 23.88 24.15 28.45 25.30 28.01 28.15
Denoise [14] 27.43 25.24 26.85 26.98 24.87 22.21 24.04 23.96 28.14 26.11 27.67 27.87
NRP [9] 27.81 25.03 27.03 27.07 25.32 22.03 24.13 24.20 28.01 26.32 27.88 27.92
Ours 27.69 25.52 27.17 27.26 25.09 22.67 24.55 24.87 28.52 26.74 28.25 28.36
PSPNet \rightarrow DeepLabv3 Cityscapes (dB) VOC2012 (dB) VOC07+12 (dB)
SSD \rightarrow RFBNet clean BIM DeepFool C&W clean BIM DeepFool C&W clean cls loc cls+loc
Defense [30] 21.06 20.52 20.36 20.92 20.83 20.08 20.34 20.50 23.21 22.88 23.03 22.84
SR [13] 28.14 26.16 26.05 26.13 29.14 26.06 26.31 28.02 24.46 23.34 23.52 23.19
FPD [16] 27.70 26.00 26.19 26.69 27.38 26.04 26.27 26.53 25.87 25.72 25.01 25.33
APE [15] 27.15 24.87 24.21 25.38 30.06 22.50 25.80 26.15 26.23 25.06 25.07 24.80
Denoise [14] 28.93 27.19 27.04 28.84 29.02 26.71 26.55 28.06 26.15 24.21 23.86 24.08
NRP [9] 29.51 27.04 26.02 29.41 29.07 26.85 26.03 28.11 25.90 25.27 24.63 25.02
Ours 30.02 27.45 28.93 29.66 29.25 27.13 28.22 28.64 25.94 25.81 25.08 25.37

Quantitative results. The results on the classification task are summarized in Table I. Although a few approaches yield comparable effects on clean samples, our full setting results in the highest robustness on adversarial samples, proving the effectiveness of our method. Further, as exhibited in Table II, for semantic segmentation and object detection, our results outperform all other competing methods on adversarial samples. Only a fraction of the approaches work decently on clean samples, and most lack a similar level of robustness on adversarial samples (e.g., APE).

We also compare our approach with current methods using the target model structures of ResNet50 [49], DeepLabv3 [51], and RFBNet [53]. The results, summarized in Tables III and IV, all support the superiority of our approach over current methods.

Evaluation with various attacks on classification tasks. Furthermore, there are other state-of-the-art black-box attack methods designed for the classification task. We include these attacks for evaluation: the SSP attack [9], MI-FGSM [21], FDA [65], the TAP attack [66], the DIM attack [70], the Square attack [68], AutoAttack [67], the Curls&Whey (Cu&Wh) attack [69], the PPDA attack [71], the GeoDA attack [72], the SF attack [73], and the Dispersion Reduction (DR) attack [74]. All are implemented as untargeted attacks. The SSP attack uses 100 attack iterations, perturbation range $\epsilon=0.031\times 255$, and step size $\alpha=0.0075\times 255$, and is completed using the VGG16 layers. MI-FGSM uses 100 attack iterations, $\epsilon=0.031\times 255$, $\alpha=0.0075\times 255$, and decay factor 1.0. The FDA, TAP, and DIM attacks use 100 attack iterations, $\epsilon=0.031\times 255$, and $\alpha=0.0075\times 255$; in particular, the DIM attack is implemented in the translation-invariant form [60] with decay factor 1.0 and probability 0.5 for the stochastic transformation function. The Square attack uses $\epsilon=0.031\times 255$ and 10000 attack iterations, with the cross-entropy and margin losses and a probability of 0.3 for changing a coordinate. AutoAttack is implemented with the PGD attack using the cross-entropy loss, 1000 attack iterations, 5 restarts, $\rho=0.75$, and $\epsilon=0.031\times 255$; it utilizes the transferability of adversarial samples. The Curls&Whey attack uses 100 attack iterations, Gaussian noise variance 2, step size $0.0075\times 255$, 12 binary search steps, and $\epsilon=0.031\times 255$. The PPDA attack uses a maximum of 4000 iterations, a dimension-reduction size of 1500, a per-step amplitude of $0.0075\times 255$, and $\epsilon=0.031\times 255$. For the GeoDA attack, the maximum number of iterations is 4000, $\lambda=0.6$, the subspace dimension is 75, and $\epsilon=0.031\times 255$. For the SF attack, the maximum number of iterations is 20000, $\epsilon=0.031\times 255$, and the dimensionality reduction rate is 2.0. For the DR attack, the number of attack iterations is 100, $\epsilon=0.031\times 255$, and $\alpha=0.0075\times 255$. The results of our method and two strong baselines (Denoise [14] and NRP [9]) are reported in Table V. These results demonstrate that our approach remains robust toward various types of attacks and outperforms both strong baselines.

[Figure 5: rows of clean samples $x^{c}$, adversarial samples $x^{a}$, and processed adversarial samples $\widehat{x}^{a}$ with the corresponding segmentation results.]
Figure 5: Visual illustration of results on clean samples $x^{c}$, adversarial samples $x^{a}$ (PGD attack), and processed adversarial samples $\widehat{x}^{a}$ with our trained $\mathcal{G}$, for semantic segmentation on Cityscapes.
[Figure 6: t-SNE visualizations for the classification task on CIFAR10 [17], the segmentation task on Cityscapes [46], and the detection task on VOC07+12 [75]; panels (a) $x^{c}$ in $\mathcal{O}$, (b) $x^{a}$ in $\mathcal{O}$, (c) $\widehat{x}^{c}$ in $\mathcal{O}$, (d) $\widehat{x}^{a}$ in $\mathcal{O}$.]
Figure 6: The target model $\mathcal{O}$ has ideal distributions for clean samples $x^{c}$ but disordered distributions for adversarial samples $x^{a}$. Our generator $\mathcal{G}$ turns $x^{c}$/$x^{a}$ into $\widehat{x}^{c}$/$\widehat{x}^{a}$ with corrected distributions.
[Figure 7: e.g., a clean albatross image is classified as albatross (score 0.9998), its adversarial version is misclassified as goose (score 1.000), and its processed version is again classified as albatross (score 0.9887); similarly, a goldfish (score 0.9998) is misclassified as oboe (score 1.000) and recovered as goldfish (score 0.9973).]
Figure 7: Visual illustration of results on clean samples $x^{c}$, adversarial samples $x^{a}$ (PGD attack), and processed adversarial samples $\widehat{x}^{a}$ with our trained $\mathcal{G}$, for classification on Tiny-ImageNet.
[Figure 8: rows of clean samples $x^{c}$, adversarial samples $x^{a}$, and processed adversarial samples $\widehat{x}^{a}$ with the corresponding detection results.]
Figure 8: Visual illustration of results on clean samples $x^{c}$, adversarial samples $x^{a}$ (cls+loc attack), and processed adversarial samples $\widehat{x}^{a}$ with our trained $\mathcal{G}$, for object detection on VOC07+12.
Table VIII: Results of the ablation study for effects of “novel pixel-level constraints” and “integrated distribution alignment”.
WideResNet \rightarrow ResNet50 CIFAR10 (Accuracy %) CIFAR100 (Accuracy %) Tiny-ImageNet (Accuracy %)
clean PGD DeepFool C&W clean PGD DeepFool C&W clean PGD DeepFool C&W
No Defense 95.1 2.1 5.3 6.4 78.1 7.5 9.3 10.4 64.5 19.2 20.4 21.2
$\mathcal{L}_{I}$ 90.4 66.4 85.3 86.1 67.9 46.4 66.6 67.5 63.4 34.3 60.7 61.7
$\mathcal{L}_{I+F(w/o\ c)}$ 90.7 78.2 89.0 89.5 68.1 53.4 67.0 67.8 63.2 46.1 61.5 62.4
$\mathcal{L}_{T}$ 92.9 49.5 84.4 85.5 71.2 39.3 65.1 66.6 63.4 32.0 58.3 59.2
$\mathcal{L}_{T+F(w/o\ c)}$ 92.5 74.1 87.6 88.3 70.3 50.4 65.9 66.8 63.2 44.4 60.2 61.1
Full 90.7 81.4 90.4 90.6 68.7 54.0 67.5 68.4 62.9 46.4 62.2 63.2
PSPNet \rightarrow DeepLabv3 Cityscapes (mIoU %) VOC2012 (mIoU %) VOC07+12 (mAP %)
SSD \rightarrow RFBNet clean BIM DeepFool C&W clean BIM DeepFool C&W clean cls loc cls+loc
No Defense 73.5 3.6 38.6 12.5 76.4 11.1 46.1 16.6 72.5 17.9 14.5 15.0
$\mathcal{L}_{I}$ 61.5 50.8 52.4 56.1 73.1 59.8 63.0 68.3 63.4 59.4 54.4 58.2
$\mathcal{L}_{I+F(w/o\ c)}$ 64.9 58.7 60.7 63.4 70.3 62.9 64.7 68.6 58.7 60.7 57.4 60.1
$\mathcal{L}_{T}$ 63.8 46.3 50.9 54.6 74.1 45.0 57.2 63.1 65.1 58.3 53.7 56.3
$\mathcal{L}_{T+F(w/o\ c)}$ 65.0 55.6 59.1 61.3 71.0 61.3 62.4 66.1 60.0 59.6 55.0 58.2
Full 67.6 59.5 62.0 64.7 71.2 63.6 66.0 69.5 60.5 61.2 58.3 60.9

Statistical significance. To further analyze the superiority of our method over the baselines, we conduct a paired t-test on the results in Tables I and II with the null hypothesis H0: "the scores of the two methods do not differ significantly." We use a significance level of 0.001 and calculate the p-values shown in Table VI using the Microsoft Excel T-TEST function. All p-values under the different attacks are smaller than 0.001; hence, we can reject H0 and conclude that our results differ from the others at the 0.001 significance level.
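The same paired t-test can be reproduced with standard tools; the sketch below uses SciPy with placeholder score lists rather than our actual per-sample measurements.

```python
from scipy import stats

# Placeholder per-sample scores of two methods (not the paper's measurements).
ours   = [0.81, 0.83, 0.80, 0.82, 0.84]
theirs = [0.78, 0.80, 0.77, 0.79, 0.80]

t_stat, p_value = stats.ttest_rel(ours, theirs)   # paired (related-samples) t-test
print(f"t = {t_stat:.3f}, p = {p_value:.3g}")
reject_h0 = p_value < 0.001                        # significance level used in the paper
```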

The evaluation of quality enhancement. For input transformation strategies, we also compare image quality enhancement, i.e., we compute the PSNR between the transformed adversarial samples $\widehat{x}^{a}$ and the clean samples $x^{c}$. The comparison between our method and other input transformation strategies is listed in Table VII. Our method is also superior in terms of image quality enhancement.
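For completeness, the PSNR used in Table VII can be computed as follows (assuming pixel values in $[0, 255]$):

```python
import torch

def psnr(x_hat_a, x_c, max_val=255.0):
    """Peak signal-to-noise ratio (in dB) between processed adversarial samples
    and the corresponding clean samples, as reported in Table VII."""
    mse = torch.mean((x_hat_a.float() - x_c.float()) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```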

Distribution visualization results. Furthermore, we provide t-SNE visualizations for the classification, semantic segmentation, and object detection tasks in Fig. 6. The adversarial perturbations are obtained with the PGD, BIM, and "cls+loc" attacks, respectively.

Qualitative results. We provide visual illustrations to demonstrate the defense effect of our framework. Visual cases for semantic segmentation are displayed in Fig. 5, for image classification in Fig. 7, and for object detection in Fig. 8. Compared with the situation without our defense, our trained $\mathcal{G}$ leads to satisfactory defense on adversarial samples in all three tasks.

Table IX: Results of the ablation study for loss functions.
WideResNet \rightarrow ResNet50 CIFAR10 (Accuracy %) CIFAR100 (Accuracy %) Tiny-ImageNet (Accuracy %)
clean PGD DeepFool C&W clean PGD DeepFool C&W clean PGD DeepFool C&W
Ours w/o $\mathcal{L}_{r}$ 87.3 72.8 86.7 87.1 61.8 48.9 60.9 61.6 62.7 42.6 58.8 60.6
Ours w/o $\mathcal{L}_{p}$ 88.4 78.6 87.1 87.5 63.5 52.8 62.4 64.7 63.1 44.2 60.4 62.0
Ours w/o $\mathcal{L}_{m}$ 90.2 80.1 89.7 90.2 67.3 53.6 67.0 67.2 62.2 45.8 61.7 62.5
Ours w/o $\mathcal{L}_{GAN_{g}}$ 87.0 73.4 81.6 82.0 62.1 50.4 61.6 62.3 60.6 40.3 57.2 58.4
Ours w/o $\mathcal{L}_{F_{task}}$ 92.6 62.5 85.5 85.8 70.5 40.2 66.3 67.0 63.3 35.5 57.9 58.7
Ours w/o $\mathcal{L}_{F_{rec}}$ 91.8 60.7 82.3 83.1 68.6 43.7 64.2 65.3 62.8 38.2 56.3 57.9
Ours w/o $\mathcal{L}_{F_{align}}$ 90.3 70.6 83.0 84.4 70.5 47.2 62.2 63.0 62.7 41.1 55.3 56.8
Ours w/o $\mathcal{L}_{F_{inter}}$ 91.1 73.8 85.2 86.7 69.4 49.1 63.9 64.7 61.8 42.5 57.0 57.6
Ours w/o $\mathcal{L}_{F_{intra}}$ 90.8 71.3 86.1 87.2 70.8 47.9 65.1 66.0 62.5 44.1 59.2 60.5
Full 90.7 81.4 90.4 90.6 68.7 54.0 67.5 68.4 62.9 46.4 62.2 63.2
PSPNet \rightarrow DeepLabv3 Cityscapes (mIoU %) VOC2012 (mIoU %) VOC07+12 (mAP %)
SSD \rightarrow RFBNet clean BIM DeepFool C&W clean BIM DeepFool C&W clean cls loc cls+loc
Ours w/o r\mathcal{L}_{r} 61.5 50.8 52.4 56.1 66.4 54.3 56.2 61.6 56.2 58.3 55.6 55.8
Ours w/o p\mathcal{L}_{p} 64.6 53.2 57.3 59.5 69.1 58.2 61.4 63.7 57.8 59.5 56.2 57.2
Ours w/o m\mathcal{L}_{m} 67.2 57.8 61.3 62.4 70.1 62.4 64.5 67.6 59.6 60.1 57.3 58.5
Ours w/o GANg\mathcal{L}_{GAN_{g}} 60.2 51.3 53.1 53.9 67.3 55.6 58.3 59.0 55.8 56.3 53.1 53.7
Ours w/o Ftask\mathcal{L}_{F_{task}} 64.5 47.7 51.2 53.6 72.6 48.2 59.1 61.8 63.5 53.6 55.0 56.4
Ours w/o Frec\mathcal{L}_{F_{rec}} 63.7 50.2 53.4 55.0 72.0 51.3 57.6 58.2 62.3 50.4 52.7 53.4
Ours w/o Falign\mathcal{L}_{F_{align}} 64.5 51.4 55.7 57.2 71.6 57.8 59.3 60.1 60.3 55.2 51.8 53.0
Ours w/o Finter\mathcal{L}_{F_{inter}} 64.8 52.5 56.8 58.3 70.8 59.2 60.5 62.6 59.7 57.6 52.4 55.8
Ours w/o Fintra\mathcal{L}_{F_{intra}} 65.1 54.7 58.2 60.8 70.5 61.0 61.2 64.3 60.4 58.2 54.3 57.0
Full 67.6 59.5 62.0 64.7 71.2 63.6 66.0 69.5 60.5 61.2 58.3 60.9

4.7 Ablation Study

Novel pixel-level constraints. For each task, we conduct an ablation study to analyze the impact of each loss term and to verify the superiority of our pixel- and feature-level constraints. We denote by \mathcal{L}_{I} the results obtained with only our pixel-level constraints, and by \mathcal{L}_{I+F(w/o\ c)} the results with our full constraints except \mathcal{L}_{F_{class}}. \mathcal{L}_{T} represents the results with only traditional pixel-level constraints, and \mathcal{L}_{T+F(w/o\ c)} refers to the results with traditional pixel- and feature-level constraints, again excluding \mathcal{L}_{F_{class}}. The results are reported in Table VIII.

Compared with traditional pixel-level constraints, our novel pixel-level alignment strategy achieves much higher robustness on adversarial samples at the cost of only a negligible degradation on clean samples. This superiority is prominent in the comparisons between \mathcal{L}_{I} and \mathcal{L}_{T}, and between \mathcal{L}_{I+F(w/o\ c)} and \mathcal{L}_{T+F(w/o\ c)}. It demonstrates that adversarial samples can be better aligned with clean samples when we match them with the clean ones in the output space of the generator.

Integrated distribution alignment. Further, compared to \mathcal{L}_{I+F(w/o\ c)}, our full setting exhibits stable improvement on both clean and adversarial samples (Table VIII), manifesting the impact of our proposed \mathcal{L}_{F_{class}}. The positive effect of \mathcal{L}_{F_{class}} confirms the importance of aligning the overall distribution in the feature space of the target model.

Loss functions. There are several loss terms in Eq. 9, and we conduct ablation studies to verify their importance by removing each loss term from Eq. 9 individually. The corresponding results are reported in Table IX.

Alternative pixel-level loss terms. As shown in Fig. 3, the traditional pixel-level training constraints and ours employ different reconstruction and adversarial loss terms, and we conduct an ablation study to analyze the effectiveness of mixing the two strategies. This yields two settings.

I) We use the reconstruction loss from the traditional strategy (\|\widehat{x}^{c}-x^{c}\| and \|\widehat{x}^{a}-x^{c}\|) together with the adversarial loss from our strategy (Eq. 2 and Eq. 3). This ablation setting is called “Full-abla-pixel-I”.

II) We use the reconstruction loss from our strategy (\|\widehat{x}^{c}-x^{c}\| and \|\widehat{x}^{a}-\widehat{x}^{c}\|) together with the adversarial loss from the traditional strategy. Specifically, the adversarial loss can be written as

\mathcal{L}_{GAN_{d}} = \mathbb{E}_{x^{c}\sim\mathcal{C}}((\mathcal{D}(x^{c})-1)^{2}) + \mathbb{E}_{x^{c}\sim\mathcal{C}}((\mathcal{D}(\widehat{x}^{c})-0)^{2}) + \mathbb{E}_{x^{a}\sim\mathcal{A}}((\mathcal{D}(\widehat{x}^{a})-0)^{2}),
\mathcal{L}_{GAN_{g}} = \mathbb{E}_{x^{c}\sim\mathcal{C}}((\mathcal{D}(\widehat{x}^{c})-1)^{2}) + \mathbb{E}_{x^{a}\sim\mathcal{A}}((\mathcal{D}(\widehat{x}^{a})-1)^{2}),    (10)
\mathcal{L}_{m} = \mathbb{E}(\|\mathcal{F}(\widehat{x}^{a})-\mathcal{F}(x^{c})\|_{1}) + \mathbb{E}(\|\mathcal{F}(\widehat{x}^{c})-\mathcal{F}(x^{c})\|_{1}),    (11)

and such an ablation setting is called “Full-abla-pixel-II”. The corresponding results are shown in Table X.
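For clarity, a minimal sketch of the traditional least-squares adversarial losses of Eq. 10 (as used in “Full-abla-pixel-II”) is given below, assuming \mathcal{D} returns a real-valued score map; the variable names are illustrative, not taken from our implementation.

```python
# Traditional LSGAN-style losses of Eq. 10, assuming `D` returns a real-valued
# score map and x_c, x_hat_c, x_hat_a are the clean sample and the generator
# outputs for clean / adversarial inputs.
import torch

def lsgan_d_loss(D, x_c, x_hat_c, x_hat_a):
    # Discriminator: real clean samples -> 1, both generator outputs -> 0.
    return (((D(x_c) - 1) ** 2).mean()
            + (D(x_hat_c) ** 2).mean()
            + (D(x_hat_a) ** 2).mean())

def lsgan_g_loss(D, x_hat_c, x_hat_a):
    # Generator: both generated outputs should be scored as real (-> 1).
    return ((D(x_hat_c) - 1) ** 2).mean() + ((D(x_hat_a) - 1) ** 2).mean()
```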

Table X: Results of the ablation study for “alternative pixel-level loss terms”, “alternative feature-level loss terms”, and “the layer choice in the discriminator”.
WideResNet \rightarrow ResNet50 CIFAR10 (Accuracy %) CIFAR100 (Accuracy %) Tiny-ImageNet (Accuracy %)
clean PGD DeepFool C&W clean PGD DeepFool C&W clean PGD DeepFool C&W
Full-abla-pixel-I 89.7 77.3 87.3 88.0 67.6 50.7 64.4 65.2 63.4 45.1 59.4 60.3
Full-abla-pixel-II 89.4 76.7 86.8 87.3 67.9 51.2 65.0 65.4 63.2 44.9 60.3 61.4
Full-abla-feature-I 90.6 79.2 85.9 86.2 68.8 53.5 67.1 67.6 62.7 46.0 60.1 61.2
Full-abla-feature-II 90.5 80.5 89.1 89.5 68.0 52.6 66.3 68.0 62.1 45.7 58.6 59.7
deleting 1-st layer 89.2 80.1 88.3 88.7 67.3 52.8 66.2 67.5 61.4 45.2 60.8 62.1
deleting 2-nd layer 90.1 79.6 87.5 88.0 66.8 50.3 64.8 64.2 62.1 44.5 60.2 61.5
deleting 3-rd layer 88.3 79.1 87.0 86.8 66.2 53.1 64.0 65.1 61.8 43.2 58.7 60.3
deleting 4-th layer 87.8 77.3 85.3 85.6 65.7 51.8 62.8 63.7 60.4 43.8 58.0 60.6
deleting 5-th layer 89.5 78.9 86.2 87.2 67.1 52.5 63.7 64.9 62.3 44.9 59.3 59.8
Full 90.7 81.4 90.4 90.6 68.7 54.0 67.5 68.4 62.9 46.4 62.2 63.2
PSPNet \rightarrow DeepLabv3 Cityscapes (mIoU %) VOC2012 (mIoU %) VOC07+12 (mAP %)
SSD \rightarrow RFBNet clean BIM DeepFool C&W clean BIM DeepFool C&W clean cls loc cls+loc
Full-abla-pixel-I 64.7 55.4 57.0 60.5 71.4 58.3 63.2 67.6 59.2 58.7 54.4 57.2
Full-abla-pixel-II 64.0 55.1 56.8 58.7 70.8 57.1 62.8 66.5 59.5 59.0 55.1 56.4
Full-abla-feature-I 66.1 58.2 59.4 61.6 72.0 61.2 64.1 68.3 60.7 60.6 57.5 60.1
Full-abla-feature-II 65.6 57.9 58.5 59.2 72.2 60.5 63.7 67.8 60.4 59.3 56.8 59.5
deleting 1-st layer 67.0 58.4 61.2 63.8 70.4 62.7 65.2 68.4 60.1 60.4 57.2 60.3
deleting 2-nd layer 66.3 57.5 60.8 63.2 70.8 61.8 64.3 67.5 60.8 58.8 55.6 59.4
deleting 3-rd layer 67.1 57.0 61.7 62.7 71.6 60.4 63.8 67.1 59.7 58.1 56.2 58.8
deleting 4-th layer 65.8 55.8 59.4 60.3 71.4 60.1 61.5 65.6 59.2 56.5 54.7 56.2
deleting 5-th layer 66.7 56.3 60.5 61.9 70.1 61.3 63.1 64.9 60.3 57.0 55.0 57.6
Full 67.6 59.5 62.0 64.7 71.2 63.6 66.0 69.5 60.5 61.2 58.3 60.9

Alternative feature-level loss terms. In the feature-level alignment, we can also adopt a strategy similar to that used in the pixel-level alignment: using the features of the transformed clean samples, \widehat{z}^{c}, to guide those of the transformed adversarial samples, \widehat{z}^{a}. To demonstrate the drawback of such an alternative strategy, we conduct two experiments.

I) We modify Eq. 5 to the following form:

\widehat{z}^{a}=\mathcal{O}(\widehat{x}^{a}),\ \widehat{z}^{c}=\mathcal{O}(\widehat{x}^{c}),\ z^{c}=\mathcal{O}(x^{c}),    (12)
\mathcal{L}_{F_{rec}}=\mathbb{E}(\|\widehat{z}^{c}-z^{c}\|_{1})+\mathbb{E}(\|\widehat{z}^{a}-\widehat{z}^{c}\|_{1}),

and we keep the other loss terms unchanged. Such an ablation setting is called “Full-abla-feature-I”.

II) We use the distribution of \widehat{z}^{c} to align the distribution of \widehat{z}^{a} and change Eq. 6 to the following equation:

\mathcal{L}_{F_{align}} = \sum_{k=1:K}\mathbb{E}(\|\widehat{m}^{c(k)}-\widehat{m}^{a(k)}\|_{1}),    (13)

where \widehat{m}^{c(k)} denotes the clustering center of \widehat{z}^{c(k)}. We keep the other loss terms unchanged and call this ablation setting “Full-abla-feature-II”. The corresponding results are shown in Table X.
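As an illustration of these two alternatives, a minimal sketch of the terms in Eq. 12 and Eq. 13 is given below. Here \mathcal{O} is the feature extractor of the target model, and the per-class feature mean is used as the clustering center, which is a simplifying assumption for illustration only.

```python
# Sketch of the alternative feature-level terms in Eq. 12 and Eq. 13.
import torch

def l1_expect(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # E(||a - b||_1): per-sample L1 norm, averaged over the batch.
    return (a - b).abs().flatten(1).sum(dim=1).mean()

def feature_rec_loss(O, x_hat_a, x_hat_c, x_c):
    # Eq. 12: guide the adversarial features with the transformed clean features.
    z_hat_a, z_hat_c, z_c = O(x_hat_a), O(x_hat_c), O(x_c)
    return l1_expect(z_hat_c, z_c) + l1_expect(z_hat_a, z_hat_c)

def feature_align_loss(z_hat_c, z_hat_a, labels, num_classes):
    # Eq. 13: L1 distance between per-class centers of z_hat_c and z_hat_a
    # (class-wise mean used as the clustering center; an assumption here).
    loss = z_hat_c.new_zeros(())
    for k in range(num_classes):
        mask = labels == k
        if mask.any():
            loss = loss + (z_hat_c[mask].mean(0) - z_hat_a[mask].mean(0)).abs().sum()
    return loss
```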

The layer choice in the discriminator. In Eq. 3, we employ a loss that measures the distances between clean and adversarial samples in terms of the discriminative feature maps extracted from the discriminator. We adopt the “MultiscaleDiscriminator” following the setting of pix2pixHD [43]. The discriminator has 5 layers, and we use all of them to compute \mathcal{L}_{m} since this yields the optimal results. To verify this, we remove individual layers from the computation of \mathcal{L}_{m} (denoted as “deleting 1-st/2-nd/3-rd/4-th/5-th layer”) and report the results in Table X, which confirm that our layer choice is optimal.
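The sketch below illustrates how such a distance over the discriminator’s intermediate layers can be computed, and how excluding one index reproduces the “deleting k-th layer” ablation. The accessor D.features is hypothetical; the exact pairing of samples entering \mathcal{L}_{m} follows Eq. 3.

```python
# L1 distance between two images in the space of the discriminator's
# intermediate feature maps, with a configurable layer subset.
import torch

def disc_feature_distance(D, x1, x2, layer_ids=(0, 1, 2, 3, 4)):
    # `D.features(x)` is assumed to return the list of per-layer feature maps
    # (a hypothetical accessor; pix2pixHD-style discriminators expose them similarly).
    f1, f2 = D.features(x1), D.features(x2)
    # Dropping an index from `layer_ids` reproduces the "deleting k-th layer" ablation.
    return sum((f1[i] - f2[i]).abs().mean() for i in layer_ids)
```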

Table XI: Results of changing hyper-parameters λ1λ4\lambda_{1}\sim\lambda_{4} in Eq. 9. The evaluation setting is the same as that of Table I and II.
CIFAR10 (Accuracy %) CIFAR100 (Accuracy %) Tiny-ImageNet (Accuracy %)
clean PGD DeepFool C&W clean PGD DeepFool C&W clean PGD DeepFool C&W
(λ1λ4)(\lambda_{1}\sim\lambda_{4}) ×\times 0.1 86.6 78.2 87.7 88.2 66.4 51.2 65.3 64.2 60.3 43.8 59.4 60.1
(λ1λ4)(\lambda_{1}\sim\lambda_{4}) ×\times 10 85.3 77.5 86.9 87.4 65.8 53.0 66.1 66.7 61.2 44.1 60.5 61.4
original (λ1λ4)(\lambda_{1}\sim\lambda_{4}) 90.7 81.4 90.4 90.6 68.7 54.0 67.5 68.4 62.9 46.4 62.2 63.2
Cityscapes (mIoU %) VOC2012 (mIoU %) VOC07+12 (mAP %)
clean BIM DeepFool C&W clean BIM DeepFool C&W clean cls loc cls+loc
(λ1λ4)(\lambda_{1}\sim\lambda_{4}) ×\times 0.1 63.5 56.3 58.4 60.4 69.3 61.4 62.6 67.8 56.6 57.7 54.1 57.2
(λ1λ4)(\lambda_{1}\sim\lambda_{4}) ×\times 10 64.7 54.7 60.8 63.6 67.1 60.7 61.5 65.1 57.9 58.3 57.0 58.1
original (λ1λ4)(\lambda_{1}\sim\lambda_{4}) 67.6 59.5 62.0 64.7 71.2 63.6 66.0 69.5 60.5 61.2 58.3 60.9
Table XII: Comparison among our approach, existing methods and ablation settings on the classification task, under the model transfer evaluation setting. “WideResNet \Rightarrow ResNet50” means that the generator is trained with the target model of WideResNet, while we evaluate its defense effect for ResNet50.
WideResNet \Rightarrow ResNet50 CIFAR10 (Accuracy %) CIFAR100 (Accuracy %)
PGD DeepFool C&W PGD DeepFool C&W
No Defense 8.2 4.3 5.8 17.3 18.3 16.6
I\mathcal{L}_{I} 44.5 84.6 85.7 29.6 63.4 64.2
I+F(w/oc)\mathcal{L}_{I+F(w/o\ c)} 67.1 88.2 88.6 40.1 64.1 65.5
T\mathcal{L}_{T} 18.1 84.0 84.7 14.9 62.6 63.4
T+F(w/oc)\mathcal{L}_{T+F(w/o\ c)} 59.9 87.0 87.2 36.6 63.2 65.2
Defense [30] 38.5 37.8 38.9 21.5 22.8 22.5
SR [13] 44.8 45.2 45.3 24.8 25.5 25.1
FPD [16] 44.2 44.0 44.4 30.5 31.1 31.2
APE [15] 10.1 86.2 86.4 11.4 64.8 65.4
Denoise [14] 73.0 87.7 87.1 35.7 63.7 64.2
NRP [9] 60.4 87.3 88.0 36.2 63.5 64.6
Ours 73.3 88.4 89.1 42.0 64.9 65.8
Table XIII: Comparison among our approach, existing methods and ablation settings on the semantic segmentation task, under the model transfer evaluation setting. “PSPNet \Rightarrow DeepLabv3” means the generator is trained with the target model of PSPNet, while we evaluate its defense effect for DeepLabv3.
PSPNet \Rightarrow DeepLabv3 Cityscapes (mIoU %) VOC2012 (mIoU %)
BIM DeepFool C&W BIM DeepFool C&W
No Defense 3.8 32.3 13.5 11.8 49.0 20.2
I\mathcal{L}_{I} 48.5 49.9 55.0 54.9 59.8 68.0
I+F(w/oc)\mathcal{L}_{I+F(w/o\ c)} 57.8 60.3 62.2 59.9 63.2 68.2
T\mathcal{L}_{T} 43.0 48.9 53.7 33.2 49.9 53.7
T+F(w/oc)\mathcal{L}_{T+F(w/o\ c)} 53.8 58.6 60.0 55.1 60.3 65.8
Defense [30] 20.1 19.7 20.7 21.6 21.2 23.1
SR [13] 41.6 40.5 42.5 52.1 56.9 66.8
FPD [16] 51.5 51.7 53.6 56.7 57.3 60.5
APE [15] 28.5 26.8 40.0 25.7 46.9 44.5
Denoise [14] 52.7 51.1 63.4 60.1 53.7 67.2
NRP [9] 53.4 47.8 63.0 53.1 51.2 67.4
Ours 59.0 61.7 64.3 60.5 64.6 68.9

4.8 Hyper-parameters Analysis

The hyper-parameters of our method are the loss weights in Eq. 9. If all loss weights are scaled by 10 or by 0.1, the results are as shown in Table XI (the evaluation setting is the same as that of Tables I and II). The table demonstrates that the results with the original hyper-parameters are the highest, while the accuracy in classification, the mIoU in segmentation, and the mAP in detection change by no more than 6.1%, 8.0%, and 7.2%, respectively. Thus, our model is not overly sensitive to the hyper-parameters, and our chosen values are reasonable.

Table XIV: Quantitative comparison on the classification tasks with the targeted attack.
WideResNet \rightarrow ResNet50 CIFAR10 (Accuracy %) CIFAR100 (Accuracy %) Tiny-ImageNet (Accuracy %)
clean PGD DeepFool C&W clean PGD DeepFool C&W clean PGD DeepFool C&W
No Defense 95.1 1.6 3.8 5.1 78.1 7.1 7.7 9.1 64.5 17.3 17.7 19.6
No Defense (finetune) 95.6 3.1 4.8 6.5 79.5 7.6 8.0 9.4 63.9 20.6 21.1 21.7
TRADES [48] 87.3 76.3 85.4 85.7 62.8 51.2 60.4 60.1 58.5 42.7 55.6 55.9
TRADES (finetune) [48] 85.4 75.0 83.1 83.8 78.1 51.0 52.7 43.6 43.3 40.1 41.7 42.0
Free-adv [54] 77.1 70.2 72.6 74.3 49.2 43.8 45.8 46.8 55.5 43.2 53.5 53.8
Free-adv (finetune) [54] 88.5 77.6 86.3 87.6 63.7 52.6 61.5 61.9 42.2 40.5 46.0 46.3
Defense [30] 39.9 36.2 36.5 37.0 31.1 28.4 28.8 30.3 20.4 16.3 17.4 17.8
SR [13] 48.0 45.8 47.7 47.5 33.6 31.8 32.1 32.5 31.1 29.4 29.7 30.0
FPD [16] 48.5 45.7 47.2 47.8 52.5 40.3 41.0 41.4 39.7 32.7 37.1 38.1
APE [15] 90.2 40.5 88.4 89.2 73.2 35.5 63.6 66.2 62.4 29.9 60.3 60.7
Denoise [14] 89.8 78.4 88.6 89.1 67.2 51.7 65.2 65.7 59.8 44.6 56.8 57.2
NRP [9] 91.8 76.7 87.9 88.4 70.3 50.2 65.4 66.0 59.1 43.0 56.6 57.5
Ours 90.7 80.1 88.8 89.7 68.7 53.1 66.3 66.7 62.9 44.8 61.2 62.4
Table XV: Quantitative comparison on the classification tasks with AT methods that train with feature-level constraints. “+Ours” denotes the setting where the corresponding original feature-level constraints are replaced with ours.
WideResNet \rightarrow ResNet50 CIFAR10 (Accuracy %) CIFAR100 (Accuracy %) Tiny-ImageNet (Accuracy %)
clean PGD DeepFool C&W clean PGD DeepFool C&W clean PGD DeepFool C&W
No Defense 95.1 2.1 5.3 6.4 78.1 7.5 9.3 10.4 64.5 19.2 20.4 21.2
No Defense (finetune) 95.6 3.8 7.5 8.7 79.5 8.0 9.7 10.9 63.9 22.1 23.2 23.6
PCL [34] 88.7 80.3 87.2 87.8 65.4 56.2 64.7 65.0 60.2 45.8 59.1 60.3
PCL [34]+Ours 88.4 81.0(+0.7) 88.3(+1.1) 88.7(+0.9) 64.6 57.1(+0.9) 65.4(+0.7) 65.8(+0.8) 60.9 46.2(+0.4) 60.4(+1.3) 61.9(+1.6)
CADA [36] 83.8 74.6 83.6 84.3 60.8 52.6 60.1 60.9 55.7 42.0 57.0 57.8
CADA [36]+Ours 84.2 75.1(+0.5) 85.2(+1.6) 85.7(+1.4) 61.3 53.3(+0.7) 60.9(+0.8) 61.7(+0.8) 56.2 43.1(+1.1) 57.6(+0.6) 58.4(+0.6)
ATDA [38] 87.5 78.2 86.7 87.0 63.5 55.8 62.7 63.2 58.0 43.7 58.4 59.1
ATDA [38]+Ours 87.9 79.5(+1.3) 87.5(+0.8) 87.9(+0.9) 64.2 56.4(+0.6) 63.8(+1.1) 64.5(+1.3) 58.7 44.5(+0.8) 59.5(+1.1) 60.5(+1.4)
Ours 90.7 81.4 90.4 90.6 68.7 54.0 67.5 68.4 62.9 46.4 62.2 63.2

4.9 Experiments under Model Transfer Evaluation

For defense via input transformation with deep generative models, we use a target model \mathcal{O} for feature-level training, and the trained generator \mathcal{G} has been verified to yield excellent defense quality for \mathcal{O}. In addition, \mathcal{G} can be deployed as a plug-and-play module to safeguard a different target model \mathcal{O}^{\prime} that was not employed during training. Evaluation with the model transfer setting [14] demonstrates this property. In the classification task, we defend ResNet50 against black-box attacks while the generator \mathcal{G} is trained with WideResNet as the target model; for semantic segmentation, \mathcal{G} is trained with PSPNet and used to protect DeepLabv3. The results are summarized in Tables XII and XIII. They provide empirical evidence that the trained generator \mathcal{G} can safeguard target models not adopted during training, outperforming the ablation settings and most existing methods.
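A minimal sketch of this model-transfer evaluation is given below: the trained generator is frozen and simply placed in front of an unseen target model; the function and variable names are illustrative.

```python
# Plug-and-play deployment of the trained generator G in front of a target
# model O_prime that was never used during G's training.
import torch

@torch.no_grad()
def transfer_accuracy(G, O_prime, loader, device="cuda"):
    G.eval(); O_prime.eval()
    correct = total = 0
    for x_adv, y in loader:
        x_adv, y = x_adv.to(device), y.to(device)
        pred = O_prime(G(x_adv)).argmax(dim=1)  # G acts as a frozen purifier
        correct += (pred == y).sum().item()
        total += y.numel()
    return 100.0 * correct / total
```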

4.10 Evaluation under Targeted Attack

For image classification, another major threat is the targeted attack, and we provide the corresponding evaluation in this section. In contrast to untargeted attacks, targeted adversarial samples aim to make the classifier output specific labels that can be chosen by the adversary. Following the setting in [67], the target labels are the 9 classes attaining the highest scores at the original point (excluding the correct one), and all approaches are evaluated with the same setting. The results on CIFAR10, CIFAR100 and Tiny-ImageNet are shown in Table XIV. Our framework still shows clear advantages over the baselines, further demonstrating its effectiveness.
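The target-class selection described above can be sketched as follows; this is an illustrative implementation of the selection rule from [67], not the authors’ code.

```python
# For each sample, pick the 9 classes with the highest clean scores,
# excluding the ground-truth class.
import torch

def select_target_classes(logits: torch.Tensor, labels: torch.Tensor, num_targets: int = 9):
    scores = logits.clone()
    scores.scatter_(1, labels.unsqueeze(1), float("-inf"))  # mask out the correct class
    return scores.topk(num_targets, dim=1).indices          # (N, num_targets) target labels
```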

4.11 Applying Our Feature-level Constraints in AT

Some existing adversarial training approaches have also considered feature-level alignment. However, as discussed in Sec. 2, these methods [34, 35, 36, 38] cannot achieve the overall distribution alignment and the alignment of paired samples (an adversarial sample and its corresponding clean sample) simultaneously. To demonstrate the superiority of our feature-level constraints, which achieve integrated distribution alignment, we replace the feature-level loss in the frameworks of [34, 35, 36, 38] with ours and observe the change in performance. To unify the training and evaluation settings, we retrain their models on the different datasets with our chosen attack settings during training and evaluation. As shown in Table XV, after replacing their feature-level losses with our feature-level constraints, the performance of the corresponding methods increases, while remaining lower than that of our full framework (“Ours”).

4.12 What If The Attacker Knows About Defense

In this section, we analyze the situation where the attacker knows about the existence of the defense, as described in [9]. In this case, the attacker has access to the training data and mechanism, trains a local defense similar to our trained \mathcal{G}, and adopts BPDA [76] to bypass the defense. To simulate this attack, we adopt [77] as the structure of the local \mathcal{G} and train it with our training mechanism defined in Eq. 9.
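The BPDA bypass can be sketched as a straight-through gradient approximation of the local defense; this is a schematic illustration under that assumption, not the exact attack implementation of [76].

```python
# BPDA-style bypass: apply the attacker's local defense in the forward pass,
# but approximate its gradient with the identity in the backward pass.
import torch

class BPDADefense(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, defense):
        with torch.no_grad():
            return defense(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimate: d(defense(x))/dx ≈ I.
        return grad_output, None

def attack_logits(target_model, local_defense, x):
    # Gradients w.r.t. x flow "through" the defense as if it were the identity.
    return target_model(BPDADefense.apply(x, local_defense))
```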

We then apply PGD (with the translation-invariant attack), BIM, and “cls+loc” together with BPDA to implement the attacks. Under this setting, the accuracy of our framework on CIFAR10, CIFAR100 and Tiny-ImageNet (with WideResNet) is 74.5%, 48.8%, and 40.3%, while NRP obtains 72.4%, 45.1%, and 38.2%, respectively; the mIoU of our approach on Cityscapes and VOC2012 (with PSPNet) is 53.6% and 56.2%, while NRP obtains 50.7% and 53.2%; the mAP of our method on VOC07+12 (with SSD) is 53.9%, while that of NRP is 51.1%. Clearly, BPDA cannot circumvent our defense, and our defense outperforms NRP under this challenging setting.

5 Conclusion

In this paper, we have proposed a novel training scheme for DGNs that aligns the distribution of adversarial samples with that of clean samples for a given target model. The effectiveness of our strategy stems from the proposed pixel- and feature-level constraints. As a general approach, our framework is applicable to various tasks, including image classification, semantic segmentation, and object detection. Extensive experiments reveal the effect of our novel constraints and illustrate the advantage of our method over existing state-of-the-art defense strategies.

References

  • [1] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” in ICLR, 2014.
  • [2] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” ICLR, 2014.
  • [3] A. Arnab, O. Miksik, and P. H. Torr, “On the robustness of semantic segmentation models to adversarial attacks,” in CVPR, 2018.
  • [4] C. Xie, J. Wang, Z. Zhang, Y. Zhou, L. Xie, and A. Yuille, “Adversarial examples for semantic segmentation and object detection,” in CVPR, 2017.
  • [5] Y. Dong, H. Su, B. Wu, Z. Li, W. Liu, T. Zhang, and J. Zhu, “Efficient decision-based black-box adversarial attacks on face recognition,” in CVPR, 2019.
  • [6] A. Joshi, A. Mukherjee, S. Sarkar, and C. Hegde, “Semantic adversarial attacks: Parametric transformations that fool deep classifiers,” in CVPR, 2019.
  • [7] Y. Jia, Y. Lu, J. Shen, Q. A. Chen, H. Chen, Z. Zhong, and T. Wei, “Fooling detection alone is not enough: Adversarial attack against multiple object tracking,” in ICLR, 2019.
  • [8] Z. Kong, J. Guo, A. Li, and C. Liu, “Physgan: Generating physical-world-resilient adversarial examples for autonomous driving,” in CVPR, 2020.
  • [9] M. Naseer, S. Khan, M. Hayat, F. S. Khan, and F. Porikli, “A self-supervised approach for adversarial robustness,” in CVPR, 2020.
  • [10] R. Theagarajan and B. Bhanu, “Defending black box facial recognition classifiers against adversarial attacks,” in CVPRW, 2020.
  • [11] Z. Liu, Q. Liu, T. Liu, N. Xu, X. Lin, Y. Wang, and W. Wen, “Feature distillation: Dnn-oriented jpeg compression against adversarial examples,” in CVPR, 2019.
  • [12] C. Guo, M. Rana, M. Cisse, and L. Van Der Maaten, “Countering adversarial images using input transformations,” ICLR, 2018.
  • [13] A. Mustafa, S. H. Khan, M. Hayat, J. Shen, and L. Shao, “Image super-resolution as a defense against adversarial attacks,” TIP, 2019.
  • [14] F. Liao, M. Liang, Y. Dong, T. Pang, X. Hu, and J. Zhu, “Defense against adversarial attacks using high-level representation guided denoiser,” in CVPR, 2018.
  • [15] S. Shen, G. Jin, K. Gao, and Y. Zhang, “Ape-gan: Adversarial perturbation elimination with gan,” in ICASSP, 2017.
  • [16] G. Li, S. Ding, J. Luo, and C. Liu, “Enhancing intrinsic adversarial robustness via feature pyramid decoder,” in CVPR, 2020.
  • [17] A. Krizhevsky, G. Hinton et al., “Learning multiple layers of features from tiny images,” 2009.
  • [18] A. Athalye and N. Carlini, “On the robustness of the CVPR 2018 white-box adversarial example defenses,” arXiv:1804.03286, 2018.
  • [19] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, “Practical black-box attacks against machine learning,” in ACM on Asia conference on computer and communications security, 2017.
  • [20] F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, and P. McDaniel, “Ensemble adversarial training: Attacks and defenses,” ICLR, 2017.
  • [21] Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li, “Boosting adversarial attacks with momentum,” in CVPR, 2018.
  • [22] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial examples in the physical world,” ICLR, 2016.
  • [23] J. H. Metzen, M. C. Kumar, T. Brox, and V. Fischer, “Universal adversarial perturbations against semantic image segmentation,” in ICCV, 2017.
  • [24] Y. Li, D. Tian, X. Bian, S. Lyu et al., “Robust adversarial perturbation on deep proposal-based models,” in BMVC, 2018.
  • [25] D. Song, K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, F. Tramer, A. Prakash, and T. Kohno, “Physical adversarial examples for object detectors,” in 12th USENIX Workshop on Offensive Technologies, 2018.
  • [26] J. Lu, H. Sibai, and E. Fabry, “Adversarial examples that fool detectors,” arXiv:1712.02494, 2017.
  • [27] X. Wei, S. Liang, N. Chen, and X. Cao, “Transferable adversarial attacks for image and video object detection,” in IJCAI, 2018.
  • [28] X. Liu, H. Yang, Z. Liu, L. Song, H. Li, and Y. Chen, “Dpatch: An adversarial patch attack on object detectors,” arXiv:1806.02299, 2018.
  • [29] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. Yuille, “Mitigating adversarial effects through randomization,” ICLR, 2017.
  • [30] P. Samangouei, M. Kabkab, and R. Chellappa, “Defense-gan: Protecting classifiers against adversarial attacks using generative models,” in ICLR, 2018.
  • [31] A. Prakash, N. Moran, S. Garber, A. DiLillo, and J. Storer, “Deflecting adversarial attacks with pixel deflection,” in CVPR, 2018.
  • [32] Y. Song, T. Kim, S. Nowozin, S. Ermon, and N. Kushman, “Pixeldefend: Leveraging generative models to understand and defend against adversarial examples,” in ICLR, 2018.
  • [33] Y. Huang, Q. Guo, F. Juefei-Xu, L. Ma, W. Miao, Y. Liu, and G. Pu, “Advfilter: Predictive perturbation-aware filtering against adversarial attack via multi-domain learning,” in ACMMM, 2021.
  • [34] A. Mustafa, S. Khan, M. Hayat, R. Goecke, J. Shen, and L. Shao, “Adversarial defense by restricting the hidden space of deep neural networks,” in ICCV, 2019.
  • [35] A. Mustafa, S. H. Khan, M. Hayat, R. Goecke, J. Shen, and L. Shao, “Deeply supervised discriminative learning for adversarial defense,” TPAMI, 2020.
  • [36] X. Hou, J. Liu, B. Xu, X. Wang, B. Liu, and G. Qiu, “Class-aware domain adaptation for improving adversarial robustness,” Image and Vision Computing, 2020.
  • [37] I. Goodfellow, J. Pouget-Abadie, and M. Mirza, “Generative adversarial nets,” in NIPS, 2014.
  • [38] C. Song, K. He, L. Wang, and J. E. Hopcroft, “Improving the generalization of adversarial training with domain adaptation,” in ICLR, 2019.
  • [39] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in ECCV, 2016.
  • [40] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in ICCV, 2017.
  • [41] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in ICLR, 2015.
  • [42] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. Paul Smolley, “Least squares generative adversarial networks,” in ICCV, 2017.
  • [43] T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, A. Tao, J. Kautz, and B. Catanzaro, “High-resolution image synthesis and semantic manipulation with conditional gans,” in CVPR, 2018.
  • [44] S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira, “Analysis of representations for domain adaptation,” NIPS, 2006.
  • [45] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in CVPR, 2009.
  • [46] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, “The cityscapes dataset for semantic urban scene understanding,” in CVPR, 2016.
  • [47] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results,” http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html, 2012.
  • [48] H. Zhang, Y. Yu, J. Jiao, E. P. Xing, L. E. Ghaoui, and M. I. Jordan, “Theoretically principled trade-off between robustness and accuracy,” in ICML, 2019.
  • [49] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016.
  • [50] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid scene parsing network,” in CVPR, 2017.
  • [51] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking atrous convolution for semantic image segmentation,” arXiv:1706.05587, 2017.
  • [52] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “Ssd: Single shot multibox detector,” in ECCV, 2016.
  • [53] S. Liu, D. Huang et al., “Receptive field block net for accurate and fast object detection,” in ECCV, 2018.
  • [54] A. Shafahi, M. Najibi, M. A. Ghiasi, Z. Xu, J. Dickerson, C. Studer, L. S. Davis, G. Taylor, and T. Goldstein, “Adversarial training for free!” in NIPS, 2019.
  • [55] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., “Pytorch: An imperative style, high-performance deep learning library,” in NIPS, 2019.
  • [56] H. Zhao, “semseg,” https://github.com/hszhao/semseg, 2019.
  • [57] X. Xu, H. Zhao, and J. Jia, “Dynamic divide-and-conquer adversarial training for robust semantic segmentation,” in ICCV, 2021.
  • [58] H. Zhang and J. Wang, “Towards adversarially robust object detection,” in CVPR, 2019.
  • [59] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” in ICLR, 2018.
  • [60] Y. Dong, T. Pang, H. Su, and J. Zhu, “Evading defenses to transferable adversarial examples by translation-invariant attacks,” in CVPR, 2019.
  • [61] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, “Deepfool: a simple and accurate method to fool deep neural networks,” in CVPR, 2016.
  • [62] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in IEEE symposium on security and privacy, 2017.
  • [63] N. Papernot, P. McDaniel, and I. Goodfellow, “Transferability in machine learning: from phenomena to black-box attacks using adversarial samples,” arXiv:1605.07277, 2016.
  • [64] Y. Liu, X. Chen, C. Liu, and D. Song, “Delving into transferable adversarial examples and black-box attacks,” in ICLR, 2017.
  • [65] A. Ganeshan and R. V. Babu, “Fda: Feature disruptive attack,” in ICCV, 2019.
  • [66] W. Zhou, X. Hou, Y. Chen, M. Tang, X. Huang, X. Gan, and Y. Yang, “Transferable adversarial perturbations,” in ECCV, 2018.
  • [67] F. Croce and M. Hein, “Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks,” in ICML, 2020.
  • [68] M. Andriushchenko, F. Croce, N. Flammarion, and M. Hein, “Square attack: a query-efficient black-box adversarial attack via random search,” in ECCV, 2020.
  • [69] Y. Shi, S. Wang, and Y. Han, “Curls & whey: Boosting black-box adversarial attacks,” in CVPR, 2019.
  • [70] C. Xie, Z. Zhang, Y. Zhou, S. Bai, J. Wang, Z. Ren, and A. L. Yuille, “Improving transferability of adversarial examples with input diversity,” in CVPR, 2019.
  • [71] J. Li, R. Ji, H. Liu, J. Liu, B. Zhong, C. Deng, and Q. Tian, “Projection & probability-driven black-box attack,” in CVPR, 2020.
  • [72] A. Rahmati, S.-M. Moosavi-Dezfooli, P. Frossard, and H. Dai, “Geoda: a geometric framework for black-box adversarial attacks,” in CVPR, 2020.
  • [73] W. Chen, Z. Zhang, X. Hu, and B. Wu, “Boosting decision-based black-box adversarial attacks with random sign flip,” in ECCV, 2020.
  • [74] Y. Lu, Y. Jia, J. Wang, B. Li, W. Chai, L. Carin, and S. Velipasalar, “Enhancing cross-task black-box transferability of adversarial examples with dispersion reduction,” in CVPR, 2020.
  • [75] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, “The pascal visual object classes (voc) challenge,” IJCV, 2010.
  • [76] A. Athalye, N. Carlini, and D. Wagner, “Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples,” in ICML, 2018.
  • [77] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in MICCAI, 2015.
Xiaogang Xu is currently a fourth-year PhD student at The Chinese University of Hong Kong. He received his bachelor's degree from Zhejiang University. He obtained the Hong Kong PhD Fellowship in 2018. He serves as a reviewer for CVPR, ICCV, ECCV, AAAI, ICLR, NIPS, and IJCV. His research interests include deep learning, generative adversarial networks, adversarial attack and defense, etc.
Hengshuang Zhao is currently an Assistant Professor in the Department of Computer Science at The University of Hong Kong. He received the PhD degree in Computer Science and Engineering from The Chinese University of Hong Kong. He worked as a postdoctoral researcher at the University of Oxford and Massachusetts Institute of Technology. He and his team won several champions in competitive academic challenges like ImageNet Scene Parsing, LSUN Semantic Segmentation, WAD Drivable Area Segmentation, Embodied AI Social Navigation, etc. His general research interests cover the broad area of computer vision and machine learning, with special emphasis on high-level scene recognition and pixel-level scene understanding. He is a member of the IEEE.
Philip Torr received the PhD degree from the University of Oxford. After working for another three years at Oxford as a research fellow, he worked for six years in Microsoft Research, first in Redmond, then in Cambridge, founding the vision side of the Machine Learning and Perception Group. He then became a Professor in Computer Vision and Machine Learning at Oxford Brookes University. He is now a professor at Oxford University. He is a BMVA Distinguished Fellow, Ellis Fellow, Royal Academy of Engineering Fellow, Royal Society Fellow, and Turing AI World-Leading Researcher Fellow.
Jiaya Jia received the PhD degree in Computer Science from Hong Kong University of Science and Technology in 2004 and is currently a full professor in Department of Computer Science and Engineering at the Chinese University of Hong Kong (CUHK). He was a visiting scholar at Microsoft Research Asia from March 2004 to August 2005 and conducted collaborative research at Adobe Systems in 2007. He is an Associate Editor-in-Chief of IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) and is also in the editorial board of International Journal of Computer Vision (IJCV). He continuously served as area chairs for ICCV, CVPR, AAAI, ECCV, and several other conferences for organization. He was on program committees of major conferences in graphics and computational imaging, including ICCP, SIGGRAPH, and SIGGRAPH Asia. He received the Young Researcher Award 2008 and Research Excellence Award 2009 from CUHK. He is a Fellow of the IEEE.