General Adversarial Defense Against Black-box Attacks via Pixel Level and Feature Level Distribution Alignments
Abstract
Deep Neural Networks (DNNs) are vulnerable to black-box adversarial attacks, which are highly transferable. This threat comes from the distribution gap between adversarial and clean samples in the feature space of the target DNNs. In this paper, we use Deep Generative Networks (DGNs) with a novel training mechanism to eliminate the distribution gap. The trained DGNs align the distribution of adversarial samples with that of clean ones for the target DNNs by translating pixel values. Different from previous work, we propose a more effective pixel-level training constraint to make this achievable, thus enhancing robustness on adversarial samples. Further, a class-aware feature-level constraint is formulated for integrated distribution alignment. Our approach is general and applicable to multiple tasks, including image classification, semantic segmentation, and object detection. We conduct extensive experiments on different datasets, and our strategy demonstrates its unique effectiveness and generality against black-box attacks.
Index Terms:
Deep Generative Model, Adversarial Defense, Distribution Alignment.
1 Introduction
Most deep learning models are vulnerable to adversarial samples [1, 2, 3, 4], which are maliciously generated to fool the target model by adding adversarial perturbation to the original input. Such perturbations are imperceptible to the human visual system, yet they severely threaten real-world deep learning applications such as face recognition [5, 6] and self-driving cars [7, 8]. In practice it is hard to have full knowledge of the target model, so attacks often adopt a black-box mechanism that exploits the transferability of adversarial samples. In this paper, we focus on defending against such attacks, following the setting of [9, 10].
To safeguard DNNs from black-box adversarial attacks, a major class of adversarial defenses applies input transformation to the input samples before they are processed by the target DNNs. A fundamental strategy is to use image processing operations without learning (e.g., image compression [11] and quilting [12]). Alternatively, learning-based methods [13, 14, 9] train Deep Generative Networks (DGNs) to accomplish such transformation, resulting in higher robustness. The network is trained with adversarial/clean samples as input and synthesizes output on which the target model works well. The essence of these methods is to weaken the difference between adversarial samples and the corresponding clean samples by updating their pixel values with the network. Current approaches to train such DGNs can be divided into three categories: 1) setting pixel-level constraints to reduce the distance between adversarial and clean samples [15, 13]; 2) adopting constraints at the feature level of the target models [14]; 3) employing constraints at both the pixel and feature levels [9, 16]. All these methods ignore the overall distribution alignment in the feature spaces of the target models, which is a potential problem affecting robustness.
[Figure: distributions of clean and adversarial samples before and after our transformation, panels (a)-(d).]

[Figure: (a) pixel-level training constraints; (b) feature-level training constraints.]
In this paper, we train DGNs to protect target models by aligning the distributions of clean and adversarial samples in the feature spaces of the target models. Compared with existing methods, novel training constraints are introduced at the pixel and feature levels. At the pixel level, we match adversarial samples with clean ones in the output space of the DGNs. At the feature level, we design a class-aware constraint that aligns the central features of clean and adversarial samples within each class, maximizing the inter-class distance and minimizing the intra-class distance for all categories. Notably, our trained DGNs can be generalized to protect models that have not appeared during training.
DGNs trained with our method align the distribution of adversarial samples to that of clean ones, as exhibited in Fig. 1. By design, our defense is general for several high-level computer vision tasks. We apply it to image classification, semantic segmentation, and object detection, with diverse datasets, models, and attacks.
In summary, our contributions are the following.
• A novel input transformation strategy that achieves defense against black-box attacks by distribution alignment for clean and adversarial samples, effectively blocking the transferability of unseen adversarial samples.
• New training constraints at both the pixel and feature levels of target models.
• Extensive experiments on various tasks. Our method yields high robustness, effectiveness, and generality.
2 Related Work
Adversarial attack. Adversarial attacks include white-box attacks [18, 2], where attackers have full knowledge of the target model and the defense strategy; gray-box attacks [12], where attackers have access to the target model but not the defense strategy; and black-box attacks [19], where attackers know neither the target model nor the defense and often exploit the transferability of adversarial samples. Existing attacks on the classification task usually compute or simulate the gradient information of target models [2, 20, 21, 22]. Meanwhile, semantic segmentation [4, 23, 3] and object detection networks [4, 24, 25, 26, 27, 28] are also vulnerable to adversarial attacks. In real-world applications, black-box attacks are more common than white-box and gray-box attacks, and it is worth exploring how to defend against them for different tasks.
Adversarial defense. Several defense strategies have been proposed to eliminate the threat of adversarial perturbations. A major class of defenses transforms the input images for higher robustness [12, 29]. Such approaches translate the pixel values of adversarial/clean samples to remove the influence of highly transferable adversarial perturbations.
Current input-transformation-based defenses that employ DGNs can be divided into three categories according to their training constraints: 1) using pixel-level constraints to reduce the differences in pixel values between clean and adversarial samples [15, 13, 30, 31, 32, 33]; 2) applying feature-level constraints to unify the representations of clean and adversarial samples in the feature space of the target model [14]; 3) simultaneously setting pixel- and feature-level constraints, which has proved more advantageous [9, 16]. We note that at the feature level, existing approaches only use the distance between clean and adversarial samples as the constraint to optimize, without aligning their distributions in feature space. For example, Huang et al. [33] proposed to train a network with a predictive perturbation-aware filtering mechanism, which removes the adversarial perturbation through a denoising operation. In this paper, we propose a novel input transformation strategy, where the DGNs are trained with exact pixel-level and feature-level distribution alignment.
Moreover, although some existing adversarial training approaches have considered feature-level alignment (while lacking pixel-level alignment), our method differs noticeably from them. For instance, Mustafa et al. [34, 35] designed a prototype conformity loss that forces the features of each class to lie inside a convex polytope maximally separated from the polytopes of other classes, but such a loss cannot achieve distribution alignment between adversarial and clean samples in the deep feature space. Hou et al. [36] proposed to set a discriminator and adopt an adversarial learning strategy, similar to GAN training [37], to make the features of adversarial and clean samples indistinguishable. However, such adversarial learning cannot explicitly enforce complete distribution alignment, and misalignment in the deep feature space remains likely. Song et al. [38] proposed to incorporate distribution alignment constraints into adversarial training. Nevertheless, that method does not achieve the exact distribution alignment we target: not only should the distribution shapes of adversarial and clean samples be aligned, but the features of each pair (an adversarial sample and its corresponding clean sample) should also be aligned.


3 Our Method
We train a target network $M$ for one task $T$, where $T$ can be image classification, semantic segmentation, or object detection. The network is usually trained with a set of clean samples $x$, and we suppose $x \sim P_{clean}$, where $P_{clean}$ is the distribution of clean samples. The trained network $M$ behaves decently on $x$ for task $T$, while its performance remarkably degrades after adversarial perturbation is added to $x$.
Adversarial samples are denoted as $x_{adv}$ ($x_{adv} = x + \delta$, where $\delta$ is the adversarial perturbation), and we represent the distribution of adversarial samples as $P_{adv}$. As shown in Fig. 1(a)&(b), although the clean samples and the adversarial samples are visually indistinguishable, there is a large gap between $P_{clean}$ and $P_{adv}$ in the feature space of the target model, which causes $M$ to fail on adversarial samples.
To eliminate the threat of adversarial samples, we align $P_{adv}$ and $P_{clean}$ for a target model $M$, as exhibited in Fig. 1(c)&(d). Based on this motivation, we propose to train a generator network $G$ that aligns $P_{adv}$ and $P_{clean}$ in the feature space of the target model by modifying the pixel values of $x$ and $x_{adv}$.
As a result, $G$ translates $x$ into $\hat{x} = G(x)$ and $x_{adv}$ into $\hat{x}_{adv} = G(x_{adv})$, and $M$ achieves decent results on both $\hat{x}$ and $\hat{x}_{adv}$. To train $G$, we set constraints for alignment at both the pixel and feature levels as shown in Fig. 2, and push the distributions of $\hat{x}$ and $\hat{x}_{adv}$ to approach $P_{clean}$, since $M$ performs well on $P_{clean}$. Furthermore, Sec. 4.9 shows that the trained $G$ can also be generalized to protect models that have not appeared during training.
3.1 Pixel-level Alignment
Motivation. The pixel-level training constraints are employed to weaken the difference in pixel RGB values between clean and adversarial samples, and to promote alignment in the feature space of the target model $M$. Previous work validated that a distance metric between clean and adversarial samples and adversarial learning [37] are two practical pixel-level training constraints. Traditional pixel-level constraints [15, 13, 30, 31, 32] mainly adopt the clean sample $x$ to guide the formulation of $\hat{x}$ and $\hat{x}_{adv}$ through the generator $G$, as displayed in Fig. 3(a). They compute the distance metric between $\hat{x}$ and $x$, and between $\hat{x}_{adv}$ and $x$, and they set $x$ as real samples and $\hat{x}$, $\hat{x}_{adv}$ as fake samples to conduct adversarial learning.
We propose a different scheme for the pixel-level training constraints, where we use $x$ to guide the formulation of $\hat{x}$ and utilize $\hat{x}$ to help match $\hat{x}_{adv}$, as shown in Fig. 3(b). In this setting, $\hat{x}$ serves as an intermediary that effectively shortens the discrepancy between $\hat{x}_{adv}$ and $x$. Comprehensive experiments illustrate that our novel setting results in a conspicuous improvement in performance on adversarial samples compared to the traditional schemes.
Our pixel-level training constraints also include the distance metric as well as adversarial learning.
Reconstruction loss. Given clean samples $x$, adversarial samples $x_{adv}$ are obtained by adding adversarial perturbation to $x$, and the generator $G$ synthesizes outputs $\hat{x}$ and $\hat{x}_{adv}$ with inputs $x$ and $x_{adv}$. Based on this, we define the reconstruction loss term as
$\mathcal{L}_{rec} = \mathbb{E}\big[\,\|\hat{x} - x\|_2\,\big] + \mathbb{E}\big[\,\|\hat{x}_{adv} - \hat{x}\|_2\,\big] \quad (1)$
where $\mathbb{E}[\cdot]$ computes the mean value and $\|\cdot\|_2$ is the Euclidean distance. Moreover, to help synthesize images with high visual quality, we use a perceptual loss term [39, 40], denoted $\mathcal{L}_{perc}$, computed as the reconstruction distance in the feature space of an ImageNet-pretrained VGG-16 network [41] between $\hat{x}$ and $x$, as well as between $\hat{x}_{adv}$ and $\hat{x}$. Note that the perceptual loss term is not a constraint for feature-level alignment, since the VGG-16 network is not task-specific; we mainly employ the perceptual loss for the visual similarity of $\hat{x}$ and $x$ (or $\hat{x}_{adv}$ and $\hat{x}$) at the pixel level.
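To make the pixel-level terms concrete, a minimal PyTorch sketch is given below. The generator `G`, the batches `x`/`x_adv`, the VGG-16 layer slice, and the choice to detach the synthesized clean sample when it guides the adversarial one are illustrative assumptions, not the exact implementation.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

# Frozen ImageNet-pretrained VGG-16, used only as a perceptual feature extractor.
vgg_features = vgg16(pretrained=True).features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def pixel_level_losses(G, x, x_adv):
    """Reconstruction + perceptual terms of the pixel-level constraint (a sketch).

    x guides the synthesized clean sample x_hat, while x_hat (detached here, an
    illustrative choice) guides the synthesized adversarial sample x_hat_adv.
    """
    x_hat = G(x)
    x_hat_adv = G(x_adv)

    # Mean Euclidean (L2) reconstruction distances over the batch, as in Eq. (1).
    l_rec = (x_hat - x).flatten(1).norm(dim=1).mean() + \
            (x_hat_adv - x_hat.detach()).flatten(1).norm(dim=1).mean()

    # Perceptual distance in the task-agnostic VGG-16 feature space.
    f_x, f_hat, f_hat_adv = vgg_features(x), vgg_features(x_hat), vgg_features(x_hat_adv)
    l_perc = F.mse_loss(f_hat, f_x) + F.mse_loss(f_hat_adv, f_hat.detach())

    return l_rec, l_perc, x_hat, x_hat_adv
```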
Adversarial loss. To match the global pattern between $\hat{x}_{adv}$ and $\hat{x}$ at the pixel level, we implement loss terms of adversarial learning with sampling. We employ a discriminator $D$ and set the loss terms in the form of LSGAN [42] as
$\mathcal{L}_{adv}^{D} = \mathbb{E}\big[(D(\hat{x}) - 1)^2\big] + \mathbb{E}\big[D(\hat{x}_{adv})^2\big], \qquad \mathcal{L}_{adv}^{G} = \mathbb{E}\big[(D(\hat{x}_{adv}) - 1)^2\big] \quad (2)$
where $\mathcal{L}_{adv}^{D}$ is set for the discriminator and $\mathcal{L}_{adv}^{G}$ is adopted for the generator. Additionally, a feature match loss is adopted as an auxiliary part of the adversarial loss [43]. We obtain intermediate features from $D$ for fake and real samples and compute their distances as
$\mathcal{L}_{fm} = \sum_{i} \mathbb{E}\big[\,\|D_i(\hat{x}) - D_i(\hat{x}_{adv})\|_1\,\big] \quad (3)$
where $D_i(\cdot)$ denotes the intermediate features obtained from the $i$-th layer of the discriminator for the real or fake samples.
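A corresponding sketch of the LSGAN-style objectives with the auxiliary feature-matching term follows; treating the synthesized clean sample as "real" and the synthesized adversarial sample as "fake", and a discriminator whose forward pass returns a list of intermediate feature maps, are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def lsgan_losses(D, x_hat, x_hat_adv):
    """LSGAN discriminator/generator losses plus feature matching (a sketch).

    D is assumed to return a list of intermediate feature maps, whose last
    entry is used as the real/fake prediction map. In a training loop, d_loss
    and g_loss would be back-propagated in separate steps.
    """
    real_feats = D(x_hat.detach())           # "real": synthesized clean samples
    fake_feats = D(x_hat_adv)                # "fake": synthesized adversarial samples

    # Least-squares GAN objectives (real -> 1, fake -> 0).
    d_loss = ((real_feats[-1] - 1) ** 2).mean() + (D(x_hat_adv.detach())[-1] ** 2).mean()
    g_loss = ((fake_feats[-1] - 1) ** 2).mean()

    # Feature matching: L1 distance between discriminator features of fake and real.
    fm_loss = sum(F.l1_loss(f, r.detach())
                  for f, r in zip(fake_feats[:-1], real_feats[:-1]))

    return d_loss, g_loss, fm_loss
```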
[Algorithm 1 (training of $G$). Parameters: training data, initialized $G$ and $D$, maximum number of iterations, iteration counter.]
Theoretical analysis. In the traditional setting, to weaken the discrepancy between adversarial samples and the corresponding clean samples, the distance between $\hat{x}_{adv}$ and $x$ is set as the objective to optimize for training $G$. However, this distance is large at the beginning of training, and multiple solutions share the same value for it at each step of training. Thus, this setting is ill-posed during training and is very likely to result in local optima (i.e., $\hat{x}_{adv}$ generated from $G$ does not approach $x$ closely enough).
In our pixel-level constraint, we instead set the distance between $\hat{x}_{adv}$ and $\hat{x}$ as the objective. This distance is narrower than the distance between $\hat{x}_{adv}$ and $x$, since $\hat{x}_{adv}$ and $\hat{x}$ lie in the same output space of $G$. To demonstrate this, we adopt the Proxy-A distance [44] to measure the distance between two domains' distributions. Given the generalization error $\epsilon$ of a classifier discriminating between the target and source samples, the Proxy-A distance is defined as $d_{\mathcal{A}} = 2(1 - 2\epsilon)$. It is observed that the Proxy-A distance between $\hat{x}_{adv}$ and $\hat{x}$ is smaller at the beginning of training, always shorter than the Proxy-A distance between $\hat{x}_{adv}$ and $x$, as shown in Fig. 4, and it finally approaches zero.
Thus, our setting has a smaller solution space during training and better avoids local optima, since it is simpler to train $G$ to make $\hat{x}_{adv}$ very close to $\hat{x}$. We also find that optimizing the distance between $\hat{x}_{adv}$ and $\hat{x}$ leads to a narrower distance between $\hat{x}_{adv}$ and $x$, which is verified by the visualizations in Fig. 1 and Fig. 4.
Moreover, the target model $M$ performs well on both $\hat{x}$ and $x$, as validated in the experiments. $\hat{x}_{adv}$ obtained with our setting is close to $x$ because $\hat{x}_{adv}$ is close to $\hat{x}$, and $\hat{x}$ is very near $x$. Therefore, $M$ behaves better on $\hat{x}_{adv}$ with our setting than with traditional ones.
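In practice, the Proxy-A distance used in this analysis can be estimated by training a simple domain classifier and plugging its test error into $2(1 - 2\epsilon)$; the linear-SVM probe below is an illustrative choice, not necessarily the classifier used for Fig. 4.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

def proxy_a_distance(source, target):
    """Estimate the Proxy-A distance 2 * (1 - 2 * err) between two sample sets.

    source, target: arrays of shape (n_samples, n_features), e.g. flattened
    images or features from the two domains being compared.
    """
    X = np.concatenate([source, target], axis=0)
    y = np.concatenate([np.zeros(len(source)), np.ones(len(target))])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, stratify=y, random_state=0)

    clf = LinearSVC(C=1.0, max_iter=5000).fit(X_tr, y_tr)
    err = 1.0 - clf.score(X_te, y_te)   # generalization error of the domain classifier
    return 2.0 * (1.0 - 2.0 * err)
```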

3.2 Feature-level Alignment
Motivation. Besides the pixel-level training constraints, recent work revealed the necessity of feature-level training constraints for the target model [14, 9] to enhance protection. Feature-level alignment helps formulate robust representations for networks of a given task. Existing methods formulate feature-level training constraints as distances in the feature space of $M$ between synthesized and clean samples. However, they ignore the constraint for aligning the overall distributions $P_{adv}$ and $P_{clean}$ at the feature level. We instead propose a class-aware feature-level training constraint, in addition to the distance metric in the feature space of $M$. Our class-aware constraint aligns the integrated distribution of clean and adversarial samples within each category for the target model, minimizing the intra-class distance and maximizing the inter-class distance for the adversarial samples' distribution.
Task-oriented loss. Suppose $M$ is trained with paired data $(x, y)$, where $y$ is the ground truth for $x$, and the loss term to train $M$ is represented as $\mathcal{L}_T$ (e.g., $\mathcal{L}_T$ is the cross-entropy loss for image classification and semantic segmentation, and the loss for bounding box regression and classification in the object detection task). To align the behaviors of clean and adversarial samples in the feature space of $M$, we adopt a feature-level loss as
$\mathcal{L}_{task} = \mathcal{L}_T\big(M(\hat{x}),\, y\big) + \mathcal{L}_T\big(M(\hat{x}_{adv}),\, y\big) \quad (4)$
Reconstruction loss. Similar to the reconstruction loss at the pixel level, we set a loss term to minimize the distances between adversarial and clean samples in the feature space of $M$ as
$\mathcal{L}_{rec}^{f} = \mathbb{E}\big[\,\|f_{\hat{x}} - f_{x}\|_2\,\big] + \mathbb{E}\big[\,\|f_{\hat{x}_{adv}} - f_{x}\|_2\,\big] \quad (5)$
where $f_{\hat{x}}$, $f_{\hat{x}_{adv}}$, and $f_{x}$ are the intermediate features of $\hat{x}$, $\hat{x}_{adv}$, and $x$ in the feature space of $M$. The choice of feature space used to compute the feature-level constraints for different tasks is illustrated in the supplementary file.
Distribution alignment loss. We also note that previous methods do not consider aligning the overall distributions of clean and adversarial samples in the feature space of $M$. In contrast, we formulate a class-aware training constraint for overall distribution alignment. Suppose there are $K$ classes within the distributions. We represent $f^{k}$ and $f_{adv}^{k}$ as the features of the clean samples and the transformed adversarial samples $\hat{x}_{adv}$ belonging to the $k$-th class, extracted from $M$, and denote their clustering centers as $c^{k}$ and $c_{adv}^{k}$ in the clean and adversarial distributions, respectively. The class-aware constraint consists of three terms. First, we compute a loss term as the distance between $c^{k}$ and $c_{adv}^{k}$ to align the distribution of adversarial samples to that of clean samples as
$\mathcal{L}_{align} = \sum_{k=1}^{K} \big\|c_{adv}^{k} - c^{k}\big\|_2 \quad (6)$
Further, favorable distributions in the feature spaces of target models for high-level tasks should have wide distances between features from different classes and narrow separation among features from the same class. To this end, we set the intra- and inter-class losses as
$\mathcal{L}_{intra} = \sum_{k=1}^{K} \mathbb{E}\big[\,\|f_{adv}^{k} - c_{adv}^{k}\|_2\,\big], \qquad \mathcal{L}_{inter} = \sum_{k=1}^{K} \sum_{j \neq k} \max\big(0,\, m - \|c_{adv}^{k} - c_{adv}^{j}\|_2\big) \quad (7)$
where $m$ is a pre-defined hyper-parameter controlling the inter-class distance. The overall class-aware training constraint is written as
$\mathcal{L}_{ca} = \alpha_1 \mathcal{L}_{align} + \alpha_2 \mathcal{L}_{intra} + \alpha_3 \mathcal{L}_{inter} \quad (8)$
where $\alpha_1$, $\alpha_2$, and $\alpha_3$ are loss weights.
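A compact sketch of the class-aware constraint (Eqs. 6-8) is shown below; the batch-wise class centers, the margin-based inter-class term, and the default weights are illustrative assumptions.

```python
import torch

def class_aware_loss(feats_clean, feats_adv, labels, m=1.0, a1=1.0, a2=1.0, a3=1.0):
    """Class-aware distribution alignment (a sketch).

    feats_clean, feats_adv: (N, d) features of clean and adversarial samples from
    the target model; labels: (N,) class indices. Centers are batch-wise means.
    """
    classes = labels.unique()
    centers_clean = torch.stack([feats_clean[labels == k].mean(0) for k in classes])
    centers_adv = torch.stack([feats_adv[labels == k].mean(0) for k in classes])

    # Eq. (6): align adversarial class centers to the clean ones.
    l_align = (centers_adv - centers_clean).norm(dim=1).sum()

    # Eq. (7): intra-class compactness of adversarial features around their centers.
    l_intra = sum((feats_adv[labels == k] - centers_adv[i]).norm(dim=1).mean()
                  for i, k in enumerate(classes))

    # Eq. (7): inter-class separation between adversarial class centers (margin m).
    dists = torch.cdist(centers_adv, centers_adv)
    off_diag = ~torch.eye(len(classes), dtype=torch.bool, device=dists.device)
    l_inter = torch.clamp(m - dists[off_diag], min=0).sum()

    return a1 * l_align + a2 * l_intra + a3 * l_inter
```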
3.3 Overall Training Constraint
In summary, the overall training constraint for the generator $G$ can be written as
$\mathcal{L}_{G} = \lambda_1 \mathcal{L}_{rec} + \lambda_2 \mathcal{L}_{perc} + \lambda_3 \mathcal{L}_{adv}^{G} + \lambda_4 \mathcal{L}_{fm} + \lambda_5 \mathcal{L}_{task} + \lambda_6 \mathcal{L}_{rec}^{f} + \lambda_7 \mathcal{L}_{ca} \quad (9)$
where $\lambda_1$ to $\lambda_7$ are loss weights. Our training algorithm for $G$ is summarized in Alg. 1. The loss weights are set using grid search on the validation set, and we apply the same loss weights for training on the various tasks.
Note that adversarial samples during training are synthesized from the target model $M$ only, while the trained model can defend against unseen adversarial perturbations transferred from models with different structures.
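Finally, one training iteration in the spirit of Alg. 1 could be organized as follows, reusing the hypothetical helpers sketched above; the `attack` routine, the `M.loss`/`M.features` wrappers around the frozen target model, and the loss-weight dictionary `w` are assumptions for illustration (shown here for the classification case).

```python
def train_step(G, D, M, x, y, attack, opt_G, opt_D, w):
    """One training iteration for the generator G and discriminator D (a sketch)."""
    x_adv = attack(M, x, y)                                   # e.g. PGD/BIM perturbation of x

    # --- discriminator update ---
    _, _, x_hat, x_hat_adv = pixel_level_losses(G, x, x_adv)
    d_loss, _, _ = lsgan_losses(D, x_hat, x_hat_adv)
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # --- generator update (fresh forward pass) ---
    l_rec, l_perc, x_hat, x_hat_adv = pixel_level_losses(G, x, x_adv)
    _, g_adv, l_fm = lsgan_losses(D, x_hat, x_hat_adv)
    l_task = M.loss(x_hat, y) + M.loss(x_hat_adv, y)          # task-oriented loss, Eq. (4)
    f_x, f_hat, f_hat_adv = M.features(x), M.features(x_hat), M.features(x_hat_adv)
    l_feat = (f_hat - f_x).norm(dim=1).mean() + (f_hat_adv - f_x).norm(dim=1).mean()  # Eq. (5)
    l_ca = class_aware_loss(f_hat, f_hat_adv, y)              # class-aware constraint, Eq. (8)

    g_loss = (w["rec"] * l_rec + w["perc"] * l_perc + w["adv"] * g_adv + w["fm"] * l_fm
              + w["task"] * l_task + w["feat"] * l_feat + w["ca"] * l_ca)
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()
```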
4 Experiments
Our framework is applicable to several important tasks. In the following, we conduct extensive experimental evaluation on image classification, semantic segmentation and object detection, on multiple datasets. We show the effect of our proposed pixel- and feature-level constraints through ablation study and illustrate the superiority of our method compared with other alternatives.
4.1 Datasets
To demonstrate the performance of our method on the three key tasks, we select representative datasets for each task. For image classification, we choose CIFAR10 [17], CIFAR100 [17], and ImageNet [45] (we adopt its subset, Tiny-ImageNet, with 200 classes); for semantic segmentation, we employ Cityscapes [46] and VOC2012 [47]; for object detection, the VOC07+12 [47] setting is adopted. The train and test splits follow their official settings.
4.2 Structure of DGNs in Experiments
In our experiments, the generator $G$ is implemented with the "global generator" structure of pix2pixHD [43]. In classification, images from CIFAR10 and CIFAR100 are of size $32\times 32$ and images from Tiny-ImageNet are of size $64\times 64$ for the input of the target model as well as the generator; here the generator contains two down-sample/up-sample layers. In the semantic segmentation task, images from Cityscapes and VOC2012 are cropped to fixed sizes for the generator, which has four down-sample/up-sample layers, and resized to the corresponding input sizes of the target models. As for object detection on VOC07+12, the generator is also built with four down-sample/up-sample layers, and the generator and the target model take fixed input sizes. Note that our training constraints are suitable for generators built with various structures.
4.3 Target Models
In the classification task, $M$ is adopted with the structure of WideResNet [48]/ResNet50 [49]; for semantic segmentation, we employ $M$ with the architecture of PSPNet [50]/DeepLabv3 [51]; in object detection, $M$ is set as the framework of SSD [52]/RFBNet [53]. We use $M$ (trained without defense) to train $G$ in input transformation strategies (including ours), and the results on processed clean/adversarial samples $\hat{x}$/$\hat{x}_{adv}$ are computed by $M$ during evaluation. In Sec. 4.9, it is verified that $G$ can also protect target models that have different structures and parameters from $M$ and have not appeared during training.
WideResNet ResNet50 | CIFAR10 (Accuracy %) | CIFAR100 (Accuracy %) | Tiny-ImageNet (Accuracy %) | |||||||||
clean | PGD | DeepFool | C&W | clean | PGD | DeepFool | C&W | clean | PGD | DeepFool | C&W | |
No Defense | 95.1 | 2.1 | 5.3 | 6.4 | 78.1 | 7.5 | 9.3 | 10.4 | 64.5 | 19.2 | 20.4 | 21.2 |
No Defense (finetune) | 95.6 | 3.8 | 7.5 | 8.7 | 79.5 | 8.0 | 9.7 | 10.9 | 63.9 | 22.1 | 23.2 | 23.6 |
TRADES [48] | 87.3 | 79.7 | 87.1 | 87.2 | 62.8 | 53.9 | 62.6 | 62.7 | 58.5 | 45.3 | 58.3 | 58.5 |
TRADES (finetune) [48] | 85.4 | 78.5 | 85.3 | 85.4 | 78.1 | 53.9 | 55.8 | 44.2 | 43.3 | 42.2 | 43.2 | 43.4 |
Free-adv [54] | 77.1 | 73.1 | 76.7 | 77.0 | 49.2 | 46.6 | 49.0 | 49.2 | 55.5 | 45.8 | 55.4 | 55.5 |
Free-adv (finetune) [54] | 88.5 | 79.8 | 88.4 | 88.5 | 63.7 | 54.1 | 63.5 | 63.7 | 42.2 | 40.9 | 48.6 | 48.7 |
Defense [30] | 39.9 | 38.7 | 38.1 | 39.3 | 31.1 | 30.5 | 30.9 | 31.0 | 20.4 | 18.5 | 19.7 | 19.9 |
SR [13] | 48.0 | 47.2 | 48.0 | 48.1 | 33.6 | 33.1 | 33.5 | 33.8 | 31.1 | 30.5 | 30.9 | 31.2 |
FPD [16] | 48.5 | 47.6 | 48.4 | 48.5 | 52.5 | 41.2 | 42.4 | 42.5 | 39.7 | 33.7 | 39.5 | 39.8 |
APE [15] | 90.2 | 41.9 | 89.1 | 89.8 | 73.2 | 37.2 | 65.7 | 67.7 | 62.4 | 30.5 | 61.6 | 62.3 |
Denoise [14] | 89.8 | 80.0 | 90.1 | 90.0 | 67.2 | 53.1 | 66.8 | 66.1 | 59.8 | 45.3 | 59.4 | 59.9 |
NRP [9] | 91.8 | 79.0 | 89.2 | 89.8 | 70.3 | 52.8 | 66.6 | 67.2 | 59.1 | 45.5 | 59.0 | 59.2 |
Ours | 90.7 | 81.4 | 90.4 | 90.6 | 68.7 | 54.0 | 67.5 | 68.4 | 62.9 | 46.4 | 62.2 | 63.2 |
4.4 Training
Training parameters. We employ the Adam optimizer with $\beta_1$ and $\beta_2$ set to 0.5 and 0.999. The learning rate is set to 0.0002, and the batch size is 1 for Cityscapes and VOC2012, 4 for VOC07+12, 16 for Tiny-ImageNet, and 64 for CIFAR10/CIFAR100. The number of training epochs is 80 for Cityscapes, 20 for VOC2012, 80 for VOC07+12, 30 for Tiny-ImageNet, and 100 for CIFAR10/CIFAR100. Our method is implemented with PyTorch [55] and runs on a TITAN X GPU.
Data augmentation. For training on the three tasks, we adopt the data augmentation policies described in [48, 50, 52]. For CIFAR10 and CIFAR100, we first pad the input, randomly crop a patch of the original $32\times 32$ size, and then randomly horizontally flip the patch; for Tiny-ImageNet, the data augmentation includes random crop (the pad size is 4 and the crop size is 64) and random horizontal flip. For the segmentation task (Cityscapes and VOC2012), we employ the official augmentations in [56], including RandScale, RandRotate, RandomGaussianBlur, RandomHorizontalFlip, and random crop. For the detection task (VOC07+12), the augmentation strategies in [52] are utilized.
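As an illustration, the CIFAR-style augmentation described above could be written with torchvision as follows; the pad size of 4 for CIFAR is an assumption (only the Tiny-ImageNet pad size is stated), and normalization is omitted for brevity.

```python
from torchvision import transforms

# CIFAR10 / CIFAR100: pad, random 32x32 crop, random horizontal flip (pad=4 assumed).
cifar_train_tf = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Tiny-ImageNet: pad size 4, crop size 64, random horizontal flip (as stated above).
tiny_imagenet_train_tf = transforms.Compose([
    transforms.RandomCrop(64, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```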
Details of our training constraints for image classification. For the classification task, we adopt WideResNet [48] and ResNet50 [49] for experiments and implement the feature-level training constraints by extracting the feature of the fully connected layer before the final layer. In particular, each image (of height $H$ and width $W$) has one class label and one feature vector $f \in \mathbb{R}^{d}$ (where $d$ is the length of the vector), and we group the features into classes according to their labels.
Details of our training constraints for segmentation. In the semantic segmentation task, we use PSPNet [50] and DeepLabv3 [51] with a ResNet50 backbone for experiments and compute the feature-level constraints on the feature of the convolution layer before the final layer. Different from the classification task, each image in semantic segmentation has multiple class labels and thus multiple feature vectors with different classes: an image $x$ has a corresponding segmentation map $y$, its feature can be denoted as $f \in \mathbb{R}^{h \times w \times d}$, and the segmentation map is resized to $h \times w$ accordingly. We then obtain a set of feature vectors with their corresponding labels by reshaping $f$ and the resized segmentation map. This set of feature vectors is grouped into classes and utilized in the feature-level constraints.
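Below is a minimal sketch of how per-pixel features and the segmentation map can be flattened into labeled feature vectors for the class-aware constraint; the tensor layout, the nearest-neighbor resizing, and the ignore label of 255 are illustrative assumptions rather than the exact implementation.

```python
import torch
import torch.nn.functional as F

def group_segmentation_features(feat_map, seg_map, ignore_label=255):
    """Turn a (d, h, w) feature map and a full-resolution segmentation map into
    (feature vector, class label) pairs for the class-aware constraint (a sketch).
    """
    d, h, w = feat_map.shape
    # Resize the segmentation map to the spatial size of the feature map.
    seg_small = F.interpolate(seg_map[None, None].float(), size=(h, w), mode="nearest")
    seg_small = seg_small.long().view(-1)                # (h*w,)
    feats = feat_map.permute(1, 2, 0).reshape(-1, d)     # (h*w, d)

    valid = seg_small != ignore_label                    # drop ignored pixels
    return feats[valid], seg_small[valid]
```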
Details of our training constraints for object detection. We employ SSD [52] and RFBNet [53] with a VGG16 backbone for experiments in the object detection task. The features for the feature-level constraints are obtained as the outputs of the backbone and are cropped with the ground-truth bounding boxes. In this task, each image has multiple class labels for the bounding boxes it contains, and thus also has multiple feature vectors with different classes: an image can have several bounding boxes with one class label for each box. The feature of the image can be denoted as $f \in \mathbb{R}^{h \times w \times d}$, and we obtain the feature of each bounding box on this feature map together with the corresponding class label.
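A similar sketch for the detection task pools one feature vector per ground-truth box from the backbone feature map; the use of torchvision's `roi_align`, the square-input assumption, and the 1×1 output size are illustrative choices, not necessarily the cropping used in the paper.

```python
import torch
from torchvision.ops import roi_align

def box_features(backbone_feat, gt_boxes, gt_labels, image_size):
    """Pool one feature vector per ground-truth box (a sketch).

    backbone_feat: (1, d, h, w) feature map of one image; gt_boxes: (B, 4) float
    boxes in (x1, y1, x2, y2) image coordinates; gt_labels: (B,) class indices.
    """
    _, d, h, w = backbone_feat.shape
    scale = h / float(image_size)                         # assumes square input images
    batch_idx = torch.zeros(len(gt_boxes), 1, dtype=gt_boxes.dtype, device=gt_boxes.device)
    rois = torch.cat([batch_idx, gt_boxes], dim=1)        # (B, 5): batch index + box
    pooled = roi_align(backbone_feat, rois, output_size=(1, 1), spatial_scale=scale)
    return pooled.view(len(gt_boxes), d), gt_labels
```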
PSPNet DeepLabv3 | Cityscapes (mIoU %) | VOC2012 (mIoU %) | VOC07+12 (mAP %) | |||||||||
SSD RFBNet | clean | BIM | DeepFool | C&W | clean | BIM | DeepFool | C&W | clean | cls | loc | cls+loc |
No Defense | 73.5 | 3.6 | 38.6 | 12.5 | 76.4 | 11.1 | 46.1 | 16.6 | 72.5 | 17.9 | 14.5 | 15.0 |
No Defense (finetune) | 73.3 | 3.6 | 37.1 | 12.5 | 76.3 | 11.2 | 45.2 | 15.4 | 73.6 | 19.6 | 16.8 | 16.1 |
SAT [57] | 65.7 | 50.5 | 52.5 | 51.0 | 73.9 | 57.1 | 64.3 | 69.3 | – | – | – | – |
SAT (finetune) [57] | 66.3 | 42.1 | 47.7 | 42.4 | 74.2 | 60.5 | 60.1 | 63.4 | – | – | – | – |
DDCAT [57] | 67.7 | 51.4 | 54.4 | 51.8 | 75.1 | 58.8 | 65.1 | 69.3 | – | – | – | – |
DDCAT (finetune) [57] | 68.3 | 43.3 | 49.4 | 43.1 | 76.0 | 62.4 | 62.4 | 64.8 | – | – | – | – |
CLS [58] | – | – | – | – | – | – | – | – | 47.8 | 34.5 | 43.3 | 44.1 |
LOC [58] | – | – | – | – | – | – | – | – | 52.9 | 36.7 | 38.8 | 39.9 |
CON [58] | – | – | – | – | – | – | – | – | 40.7 | 31.3 | 39.5 | 40.3 |
MTD [58] | – | – | – | – | – | – | – | – | 49.1 | 42.0 | 44.3 | 44.1 |
Defense [30] | 22.2 | 20.2 | 19.9 | 21.2 | 23.5 | 21.9 | 22.6 | 23.3 | 34.6 | 30.2 | 30.8 | 29.7 |
SR [13] | 60.7 | 52.0 | 50.6 | 51.5 | 71.2 | 58.1 | 60.3 | 67.2 | 54.6 | 36.2 | 37.5 | 34.5 |
FPD [16] | 55.9 | 53.0 | 53.2 | 55.0 | 61.5 | 57.5 | 58.6 | 59.7 | 57.2 | 58.1 | 55.8 | 57.8 |
APE [15] | 54.5 | 34.7 | 33.3 | 45.4 | 74.3 | 37.8 | 54.3 | 59.9 | 62.3 | 57.9 | 58.0 | 55.8 |
Denoise [14] | 64.4 | 55.1 | 53.8 | 64.0 | 70.4 | 61.9 | 61.5 | 67.8 | 61.6 | 52.2 | 50.9 | 51.7 |
NRP [9] | 65.0 | 55.2 | 49.3 | 64.1 | 70.5 | 62.6 | 59.3 | 68.5 | 60.4 | 59.9 | 55.4 | 58.9 |
Ours | 67.6 | 59.5 | 62.0 | 64.7 | 71.2 | 63.6 | 66.0 | 69.5 | 60.5 | 61.2 | 58.3 | 60.9 |
ResNet50 WideResNet | CIFAR10 (Accuracy %) | CIFAR100 (Accuracy %) | Tiny-ImageNet (Accuracy %) | |||||||||
clean | PGD | DeepFool | C&W | clean | PGD | DeepFool | C&W | clean | PGD | DeepFool | C&W | |
No Defense | 94.3 | 8.2 | 4.3 | 5.8 | 76.1 | 17.3 | 18.3 | 16.6 | 62.1 | 22.7 | 21.7 | 22.2 |
No Defense (finetune) | 94.7 | 8.1 | 6.6 | 7.5 | 77.8 | 16.6 | 18.3 | 16.6 | 61.3 | 22.7 | 25.8 | 26.1 |
TRADES [48] | 85.3 | 82.3 | 85.1 | 85.2 | 59.8 | 57.3 | 59.7 | 59.8 | 44.3 | 43.0 | 44.3 | 44.3 |
TRADES (finetune) [48] | 83.2 | 82.0 | 83.1 | 83.2 | 56.3 | 55.2 | 56.3 | 56.3 | 43.3 | 42.4 | 43.3 | 43.3 |
Free-adv [54] | 86.3 | 81.6 | 86.2 | 86.3 | 64.0 | 55.8 | 63.9 | 64.0 | 53.4 | 41.6 | 53.3 | 53.4 |
Free-adv (finetune) [54] | 88.3 | 82.9 | 88.2 | 88.3 | 63.4 | 55.2 | 63.3 | 63.4 | 39.9 | 38.5 | 39.7 | 39.8 |
Defense [30] | 40.1 | 38.8 | 39.7 | 39.8 | 31.1 | 30.2 | 30.5 | 30.9 | 21.5 | 19.9 | 20.3 | 21.0 |
SR [13] | 45.3 | 45.1 | 45.3 | 45.3 | 35.6 | 35.3 | 35.9 | 36.3 | 30.6 | 29.5 | 30.0 | 30.1 |
FPD [16] | 53.0 | 52.7 | 52.9 | 53.0 | 50.0 | 46.8 | 49.7 | 50.0 | 34.5 | 31.7 | 34.2 | 34.4 |
APE [15] | 89.1 | 64.9 | 88.9 | 89.6 | 71.7 | 51.6 | 66.4 | 67.2 | 60.2 | 35.1 | 59.0 | 59.1 |
Denoise [14] | 88.3 | 80.7 | 87.9 | 88.2 | 66.7 | 56.3 | 65.3 | 65.5 | 56.6 | 42.3 | 56.5 | 56.7 |
NRP [9] | 90.3 | 80.9 | 88.1 | 88.3 | 66.7 | 56.6 | 66.2 | 66.6 | 56.5 | 43.3 | 56.1 | 56.5 |
Ours | 90.0 | 84.0 | 89.8 | 90.2 | 67.3 | 57.7 | 67.5 | 68.1 | 61.0 | 43.1 | 59.9 | 60.8 |
DeepLabv3 PSPNet | Cityscapes (mIoU %) | VOC2012 (mIoU %) | VOC07+12 (mAP %) |
RFBNet SSD | clean | BIM | DeepFool | C&W | clean | BIM | DeepFool | C&W | clean | cls | loc | cls+loc |
No Defense | 73.2 | 3.8 | 32.3 | 13.5 | 76.9 | 11.8 | 49.0 | 20.2 | 80.6 | 16.9 | 15.6 | 15.9 |
No Defense (finetune) | 73.5 | 3.9 | 33.1 | 13.3 | 75.7 | 11.7 | 49.8 | 18.4 | 80.8 | 17.2 | 15.8 | 16.3 |
SAT [57] | 64.3 | 49.2 | 50.7 | 49.9 | 72.8 | 56.2 | 65.3 | 68.6 | – | – | – | – |
SAT (finetune) [57] | 65.4 | 42.5 | 48.5 | 42.8 | 74.8 | 60.1 | 64.1 | 69.9 | – | – | – | – |
DDCAT [57] | 67.7 | 50.3 | 52.9 | 50.8 | 74.2 | 61.5 | 66.7 | 69.9 | – | – | – | – |
DDCAT (finetune) [57] | 68.2 | 43.3 | 50.1 | 43.6 | 76.2 | 63.4 | 65.4 | 70.2 | – | – | – | – |
CLS [58] | – | – | – | – | – | – | – | – | 52.0 | 39.6 | 48.8 | 49.1 |
LOC [58] | – | – | – | – | – | – | – | – | 57.6 | 41.7 | 42.8 | 44.4 |
CON [58] | – | – | – | – | – | – | – | – | 43.5 | 35.3 | 43.8 | 44.1 |
MTD [58] | – | – | – | – | – | – | – | – | 53.7 | 47.1 | 48.3 | 48.9 |
Defense [30] | 23.3 | 21.2 | 20.4 | 22.9 | 22.5 | 20.7 | 22.0 | 22.4 | 42.2 | 37.8 | 38.2 | 37.1 |
SR [13] | 62.8 | 51.6 | 50.6 | 52.8 | 70.1 | 60.5 | 64.0 | 68.6 | 56.2 | 38.6 | 39.3 | 36.2 |
FPD [16] | 49.5 | 49.6 | 50.0 | 50.8 | 62.1 | 59.7 | 60.8 | 61.7 | 58.7 | 59.3 | 57.3 | 59.2 |
APE [15] | 54.2 | 27.2 | 22.7 | 35.6 | 75.5 | 40.8 | 57.6 | 63.4 | 68.5 | 60.0 | 61.3 | 58.8 |
Denoise [14] | 64.6 | 58.5 | 55.0 | 64.0 | 67.2 | 63.4 | 64.7 | 68.8 | 68.5 | 59.3 | 57.9 | 58.4 |
NRP [9] | 65.6 | 58.0 | 50.7 | 64.1 | 68.1 | 64.7 | 62.8 | 69.2 | 68.5 | 67.2 | 63.5 | 67.2 |
Ours | 67.3 | 60.3 | 62.7 | 64.9 | 69.1 | 65.2 | 66.9 | 70.7 | 70.2 | 70.0 | 68.1 | 70.0 |
Attacks employed in training. In this paper, all attacks are conducted in untargeted form under an $\ell_\infty$ constraint (except the experiments in Sec. 4.10). For the image classification task, we follow [48] to set the attack parameters during training and testing, and we employ the attack parameter settings of [57] and [58] for the semantic segmentation and object detection tasks, respectively. Accordingly, for classification experiments, the adversarial perturbation during training is generated by PGD [59] with the KL criterion, using the perturbation range, step size, and number of attack iterations specified in [48]; for semantic segmentation, we employ BIM [22] during training with the parameters of [57]; in object detection, the classification attack ("cls") and localization attack ("loc") [58] are utilized during training, and the perturbation range is 8 for pixel values within [0, 255].
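For reference, an untargeted $\ell_\infty$ PGD/BIM-style attack of the kind used to craft the training perturbations could be sketched as follows; the default eps, step size, iteration count, and [0, 1] pixel range are placeholders, since the exact values follow [48], [57], and [58].

```python
import torch

def pgd_attack(model, loss_fn, x, y, eps=8/255, alpha=2/255, steps=10, random_start=True):
    """Untargeted L-infinity PGD / BIM attack (a sketch with placeholder parameters)."""
    x_adv = x.clone().detach()
    if random_start:                                    # PGD starts from a random point in the eps-ball
        x_adv = (x_adv + torch.empty_like(x_adv).uniform_(-eps, eps)).clamp(0, 1)

    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                       # ascend the loss
            x_adv = x.clone() + (x_adv - x).clamp(-eps, eps)          # project into the eps-ball
            x_adv = x_adv.clamp(0, 1)                                 # keep a valid image
    return x_adv.detach()
```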
4.5 Evaluation
Attack types. During evaluation, in the classification task, the adversarial perturbation is obtained by PGD with the cross-entropy criterion and the translation-invariant form [60], the DeepFool attack [61], and the C&W attack [62]. For semantic segmentation, we adopt BIM, DeepFool, and C&W for evaluation; for object detection, "cls", "loc", and "cls+loc" (simultaneously conducting classification and localization attacks) are employed for testing.
We experiment with transferable attacks to demonstrate our method's effect. In such an evaluation, attackers cannot utilize the exact gradient information of the target model. Instead, they obtain gradient information from a substitute network trained on the same dataset [19, 63, 64] with a different model structure. Thus, in the evaluation of the classification task, we use the perturbations computed from ResNet50 to attack the defense framework trained with WideResNet. For semantic segmentation, the adversarial samples obtained from DeepLabv3 are adopted to attack PSPNet. As for object detection, attacks on SSD are implemented by employing the adversarial perturbations generated from RFBNet.
Attack parameters. Following [48, 57, 58], for the classification task we adopt PGD with the cross-entropy criterion, the DeepFool attack, and the C&W attack; for semantic segmentation we utilize BIM, the DeepFool attack, and the C&W attack for evaluation; and the perturbation range is set as 8 for the object detection task. Note that we adopt different attack types and parameters for training and evaluation, to verify the generalization ability of the trained DGNs towards unseen attacks.
Metrics. Classification accuracy, mean class-wise intersection over union (mIoU), and mean average precision (mAP) are adopted as the quality indicators for image classification, semantic segmentation, and object detection, respectively.
4.6 Comparison with Existing Methods
Baselines. We choose input transformation and adversarial training approaches for comparison. Most existing adversarial training work focuses on image classification, and we adopt two recent methods, [48] and [54], for comparison. For the semantic segmentation task, [57] first conducted a comprehensive exploration of adversarial training for semantic segmentation, and we employ the two strategies proposed in [57] (SAT and DDC-AT). For object detection, [58] proposed four variants of adversarial training. All these adversarial training approaches are trained with their original configurations, and "finetune" means finetuning pre-trained models (without defense) via adversarial training. For input transformation strategies, we use six representative methods [9, 14, 15, 16, 30, 13]. We adopt their original generator structures and re-train them with the same epochs, batch size, and target models as ours.
WideResNet ResNet50 | CIFAR10 (Accuracy %) | |||||||||||
SSP [9] | FDA [65] | TAP [66] | MI [21] | Auto [67] | Square [68] | Cu&Wh [69] | DIM [70] | PPDA [71] | GeoDA [72] | SF [73] | DR [74] | |
Denoise [14] | 60.2 | 65.6 | 67.1 | 71.4 | 61.4 | 59.6 | 58.4 | 64.1 | 58.2 | 59.5 | 56.2 | 56.9 |
NRP [9] | 62.0 | 66.2 | 69.2 | 71.9 | 62.3 | 61.1 | 60.7 | 65.3 | 59.7 | 60.3 | 57.7 | 58.4 |
Ours | 64.1 | 68.0 | 70.1 | 73.2 | 65.2 | 63.3 | 62.5 | 67.4 | 63.4 | 64.7 | 61.3 | 63.9 |
WideResNet ResNet50 | CIFAR100 (Accuracy %) | |||||||||||
SSP [9] | FDA [65] | TAP [66] | MI [21] | Auto [67] | Square [68] | Cu&Wh [69] | DIM [70] | PPDA [71] | GeoDA [72] | SF [73] | DR [74] | |
Denoise [14] | 40.8 | 46.3 | 46.7 | 48.3 | 43.2 | 40.6 | 38.8 | 40.5 | 35.1 | 36.9 | 33.7 | 36.1 |
NRP [9] | 41.7 | 45.8 | 45.9 | 46.3 | 42.1 | 39.1 | 40.2 | 42.3 | 38.8 | 40.4 | 36.9 | 39.2 |
Ours | 44.9 | 47.6 | 48.1 | 50.3 | 45.5 | 41.8 | 42.2 | 46.7 | 42.3 | 43.5 | 40.2 | 43.0 |
WideResNet ResNet50 | Tiny-ImageNet (Accuracy %) | |||||||||||
SSP [9] | FDA [65] | TAP [66] | MI [21] | Auto [67] | Square [68] | Cu&Wh [69] | DIM [70] | PPDA [71] | GeoDA [72] | SF [73] | DR [74] | |
Denoise [14] | 34.5 | 40.9 | 38.5 | 41.2 | 34.2 | 31.8 | 31.7 | 36.2 | 32.9 | 35.0 | 31.0 | 34.7 |
NRP [9] | 35.5 | 39.1 | 37.2 | 40.6 | 36.7 | 30.5 | 32.4 | 37.6 | 33.4 | 34.8 | 31.3 | 35.1 |
Ours | 37.3 | 40.6 | 41.8 | 44.5 | 38.8 | 33.1 | 34.2 | 40.2 | 34.9 | 36.6 | 32.8 | 35.2 |
CIFAR10 | CIFAR100 | |||||
PGD | DeepFool | C&W | PGD | DeepFool | C&W | |
Ours vs Denoise | 2.7e-17 | 3.2e-8 | 5.6e-7 | 5.8e-17 | 4.3e-16 | 3.6e-7 |
Ours vs NRP | 1.6e-13 | 2.6e-7 | 7.4e-10 | 7.5e-9 | 6.8e-7 | 9.2e-12 |
Tiny-ImageNet | Cityscapes | |||||
PGD | DeepFool | C&W | BIM | DeepFool | C&W | |
Ours vs Denoise | 5.5e-7 | 8.2e-8 | 1.7e-7 | 7.2e-11 | 8.4e-10 | 2.9e-8 |
Ours vs NRP | 6.7e-8 | 2.4e-11 | 5.0e-10 | 7.3e-8 | 3.3e-10 | 9.1e-9 |
VOC2012 | VOC07+12 | |||||
BIM | DeepFool | C&W | cls | loc | cls+loc | |
Ours vs Denoise | 2.6e-11 | 8.3e-7 | 9.4e-7 | 1.1e-8 | 9.8e-10 | 6.9e-7 |
Ours vs NRP | 4.4e-9 | 5.7e-7 | 1.8e-10 | 4.8e-9 | 7.1e-7 | 8.8e-10 |
WideResNet ResNet50 | CIFAR10 (dB) | CIFAR100 (dB) | Tiny-ImageNet (dB) | |||||||||
clean | PGD | DeepFool | C&W | clean | PGD | DeepFool | C&W | clean | PGD | DeepFool | C&W | |
Defense [30] | 23.70 | 23.54 | 23.50 | 23.64 | 20.86 | 20.54 | 20.72 | 20.79 | 23.63 | 23.05 | 23.34 | 23.40 |
SR [13] | 24.58 | 24.05 | 24.11 | 24.26 | 21.20 | 21.08 | 21.14 | 21.18 | 25.27 | 25.04 | 25.09 | 25.21 |
FPD [16] | 24.64 | 24.16 | 24.24 | 24.45 | 22.08 | 21.85 | 21.91 | 21.93 | 26.14 | 25.67 | 26.06 | 26.09 |
APE [15] | 27.52 | 23.77 | 26.82 | 26.91 | 25.65 | 21.46 | 23.88 | 24.15 | 28.45 | 25.30 | 28.01 | 28.15 |
Denoise [14] | 27.43 | 25.24 | 26.85 | 26.98 | 24.87 | 22.21 | 24.04 | 23.96 | 28.14 | 26.11 | 27.67 | 27.87 |
NRP [9] | 27.81 | 25.03 | 27.03 | 27.07 | 25.32 | 22.03 | 24.13 | 24.20 | 28.01 | 26.32 | 27.88 | 27.92 |
Ours | 27.69 | 25.52 | 27.17 | 27.26 | 25.09 | 22.67 | 24.55 | 24.87 | 28.52 | 26.74 | 28.25 | 28.36 |
PSPNet DeepLabv3 | Cityscapes (dB) | VOC2012 (dB) | VOC07+12 (dB) | |||||||||
SSD RFBNet | clean | BIM | DeepFool | C&W | clean | BIM | DeepFool | C&W | clean | cls | loc | cls+loc |
Defense [30] | 21.06 | 20.52 | 20.36 | 20.92 | 20.83 | 20.08 | 20.34 | 20.50 | 23.21 | 22.88 | 23.03 | 22.84 |
SR [13] | 28.14 | 26.16 | 26.05 | 26.13 | 29.14 | 26.06 | 26.31 | 28.02 | 24.46 | 23.34 | 23.52 | 23.19 |
FPD [16] | 27.70 | 26.00 | 26.19 | 26.69 | 27.38 | 26.04 | 26.27 | 26.53 | 25.87 | 25.72 | 25.01 | 25.33 |
APE [15] | 27.15 | 24.87 | 24.21 | 25.38 | 30.06 | 22.50 | 25.80 | 26.15 | 26.23 | 25.06 | 25.07 | 24.80 |
Denoise [14] | 28.93 | 27.19 | 27.04 | 28.84 | 29.02 | 26.71 | 26.55 | 28.06 | 26.15 | 24.21 | 23.86 | 24.08 |
NRP [9] | 29.51 | 27.04 | 26.02 | 29.41 | 29.07 | 26.85 | 26.03 | 28.11 | 25.90 | 25.27 | 24.63 | 25.02 |
Ours | 30.02 | 27.45 | 28.93 | 29.66 | 29.25 | 27.13 | 28.22 | 28.64 | 25.94 | 25.81 | 25.08 | 25.37 |
Quantitative results. The results for the classification task are summarized in Table I. Although a few approaches yield comparable effect on clean samples, our full setting results in the highest robustness on adversarial samples, proving the effectiveness of our method. Further, as exhibited in Table II, in semantic segmentation and object detection our results outperform all other competing methods on adversarial samples. Only a fraction of the approaches work decently on clean samples, and most lack a similar level of robustness on adversarial samples (e.g., APE).
We also compare our approach with current methods using the target model structures of ResNet50 [49], DeepLabv3 [51], and RFBNet [53]. The results are summarized in Tables III and IV, and they all support the superiority of our approach over current methods.
Evaluation with various attacks on classification tasks. Furthermore, there are other state-of-the-art black-box attack methods designed for the classification task. We include them for evaluation: the SSP attack [9], MI-FGSM [21], FDA [65], the TAP attack [66], the DIM attack [70], the Square attack [68], AutoAttack [67], the Curls&Whey (Cu&Wh) attack [69], the PPDA attack [71], the GeoDA attack [72], the SF attack [73], and the Dispersion Reduction (DR) attack [74]. They are all implemented as untargeted attacks. The SSP attack is run for 100 iterations and is completed by using the VGG16 layers. MI-FGSM is run for 100 iterations with a decay factor of 1.0. The FDA, TAP, and DIM attacks are run for 100 iterations; in particular, the DIM attack is implemented in the translation-invariant form [60] with a decay factor of 1.0 and a probability of 0.5 for the stochastic transformation function. The Square attack is run for 10000 iterations, with the cross-entropy loss and margin loss as the loss functions and the probability of changing a coordinate set to 0.3. AutoAttack is implemented with the PGD attack using the cross-entropy loss, with 1000 iterations and 5 restarts, and it utilizes the transferability of adversarial samples. The Curls&Whey attack is run for 100 iterations with the variance of the Gaussian noise set to 2 and 12 binary search steps. The PPDA attack is run for at most 4000 iterations with the dimension-reduction size set to 1500. For the GeoDA attack, the maximal iteration number is 4000 and the dimension of the subspace is 75. For the SF attack, the maximal iteration number is 20000 and the dimensionality reduction rate is 2.0. For the DR attack, the attack iteration number is 100. The results of our method and two strong baselines (Denoise [14] and NRP [9]) are reported in Table V. These results demonstrate that our approach stays robust under various types of attacks and outperforms these two strong baselines.
[Figure: qualitative semantic segmentation results; columns show the result on the clean sample, the adversarial sample, and the transformed adversarial sample.]
[Figure 6: t-SNE visualizations for the classification task on CIFAR10 [17], the segmentation task on Cityscapes [46], and the detection task on VOC07+12 [75]; panels (a)-(d).]
[Figure 7: classification examples with predictions and confidence scores: albatross (0.9998) / goose (1.000) / albatross (0.9887), and goldfish (0.9998) / oboe (1.000) / goldfish (0.9973), for the clean, adversarial, and transformed adversarial inputs respectively.]
[Figure 8: qualitative object detection results on clean samples, adversarial samples, and transformed adversarial samples.]
WideResNet ResNet50 | CIFAR10 (Accuracy %) | CIFAR100 (Accuracy %) | Tiny-ImageNet (Accuracy %) | |||||||||
clean | PGD | DeepFool | C&W | clean | PGD | DeepFool | C&W | clean | PGD | DeepFool | C&W | |
No Defense | 95.1 | 2.1 | 5.3 | 6.4 | 78.1 | 7.5 | 9.3 | 10.4 | 64.5 | 19.2 | 20.4 | 21.2 |
90.4 | 66.4 | 85.3 | 86.1 | 67.9 | 46.4 | 66.6 | 67.5 | 63.4 | 34.3 | 60.7 | 61.7 | |
90.7 | 78.2 | 89.0 | 89.5 | 68.1 | 53.4 | 67.0 | 67.8 | 63.2 | 46.1 | 61.5 | 62.4 | |
92.9 | 49.5 | 84.4 | 85.5 | 71.2 | 39.3 | 65.1 | 66.6 | 63.4 | 32.0 | 58.3 | 59.2 | |
92.5 | 74.1 | 87.6 | 88.3 | 70.3 | 50.4 | 65.9 | 66.8 | 63.2 | 44.4 | 60.2 | 61.1 | |
Full | 90.7 | 81.4 | 90.4 | 90.6 | 68.7 | 54.0 | 67.5 | 68.4 | 62.9 | 46.4 | 62.2 | 63.2 |
PSPNet DeepLabv3 | Cityscapes (mIoU %) | VOC2012 (mIoU %) | VOC07+12 (mAP %) | |||||||||
SSD RFBNet | clean | BIM | DeepFool | C&W | clean | BIM | DeepFool | C&W | clean | cls | loc | cls+loc |
No Defense | 73.5 | 3.6 | 38.6 | 12.5 | 76.4 | 11.1 | 46.1 | 16.6 | 72.5 | 17.9 | 14.5 | 15.0 |
61.5 | 50.8 | 52.4 | 56.1 | 73.1 | 59.8 | 63.0 | 68.3 | 63.4 | 59.4 | 54.4 | 58.2 | |
64.9 | 58.7 | 60.7 | 63.4 | 70.3 | 62.9 | 64.7 | 68.6 | 58.7 | 60.7 | 57.4 | 60.1 | |
63.8 | 46.3 | 50.9 | 54.6 | 74.1 | 45.0 | 57.2 | 63.1 | 65.1 | 58.3 | 53.7 | 56.3 | |
65.0 | 55.6 | 59.1 | 61.3 | 71.0 | 61.3 | 62.4 | 66.1 | 60.0 | 59.6 | 55.0 | 58.2 | |
Full | 67.6 | 59.5 | 62.0 | 64.7 | 71.2 | 63.6 | 66.0 | 69.5 | 60.5 | 61.2 | 58.3 | 60.9 |
Statistical significance calculation. To further analyze the superiority of our method over the baselines, we conduct a paired t-test on the results in Tables I and II with the null hypothesis H0: "the scores of two different methods have no obvious difference." We use a significance level of 0.001 and calculate the p-values, shown in Table VI, using the Microsoft Excel T-TEST function. All p-values under different attacks are smaller than 0.001. Hence, we can reject H0 and conclude that our results differ from the others at a significance level of 0.001.
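The same paired t-test can be reproduced with SciPy instead of the Excel T-TEST function; the per-run score lists below are hypothetical and only illustrate the call.

```python
from scipy import stats

# Hypothetical per-run scores of two methods evaluated on the same splits.
ours = [81.4, 81.1, 81.6, 81.0, 81.3]
baseline = [80.0, 79.6, 80.2, 79.8, 80.1]

t_stat, p_value = stats.ttest_rel(ours, baseline)   # paired (dependent) t-test
print(f"p-value = {p_value:.3g}; reject H0 at the 0.001 level: {p_value < 0.001}")
```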
The evaluation of quality enhancement. For the input transformation strategy, we also compare the results of image quality enhancement, i.e., the PSNR value between the transformed adversarial samples $\hat{x}_{adv}$ and the clean samples $x$. The comparison between our method and other input transformation strategies is listed in Table VII. Our method is also superior in terms of image quality enhancement.
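For completeness, the PSNR between the transformed adversarial samples and the clean samples can be computed with the standard definition below (where `max_val` should match the pixel range).

```python
import torch

def psnr(x_hat_adv, x, max_val=1.0):
    """PSNR (in dB) between transformed adversarial samples and clean samples."""
    mse = torch.mean((x_hat_adv - x) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```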
Distribution visualization results. Furthermore, we provide t-SNE visualizations for the classification, semantic segmentation, and object detection tasks, as shown in Fig. 6. The adversarial perturbations are obtained with the PGD attack, the BIM attack, and the "cls+loc" attack, respectively.
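Plots in the spirit of Fig. 6 can be produced with scikit-learn's t-SNE on features extracted from the target model; the perplexity, marker size, and function name below are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(feats_clean, feats_adv, path="tsne.png"):
    """2-D t-SNE of clean vs. adversarial features from the target model (a sketch)."""
    feats = np.concatenate([feats_clean, feats_adv], axis=0)
    emb = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(feats)
    n = len(feats_clean)
    plt.scatter(emb[:n, 0], emb[:n, 1], s=3, label="clean")
    plt.scatter(emb[n:, 0], emb[n:, 1], s=3, label="adversarial")
    plt.legend()
    plt.savefig(path)
```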
Qualitative results. We provide visual illustrations to demonstrate the defense effect of our framework. Visual cases for semantic segmentation are displayed in Fig. 5, for image classification in Fig. 7, and for object detection in Fig. 8. Compared with the situations without our defense, our trained $G$ leads to satisfactory defense on adversarial samples in all three tasks.
WideResNet ResNet50 | CIFAR10 (Accuracy %) | CIFAR100 (Accuracy %) | Tiny-ImageNet (Accuracy %) | |||||||||
clean | PGD | DeepFool | C&W | clean | PGD | DeepFool | C&W | clean | PGD | DeepFool | C&W | |
Ours w/o | 87.3 | 72.8 | 86.7 | 87.1 | 61.8 | 48.9 | 60.9 | 61.6 | 62.7 | 42.6 | 58.8 | 60.6 |
Ours w/o | 88.4 | 78.6 | 87.1 | 87.5 | 63.5 | 52.8 | 62.4 | 64.7 | 63.1 | 44.2 | 60.4 | 62.0 |
Ours w/o | 90.2 | 80.1 | 89.7 | 90.2 | 67.3 | 53.6 | 67.0 | 67.2 | 62.2 | 45.8 | 61.7 | 62.5 |
Ours w/o | 87.0 | 73.4 | 81.6 | 82.0 | 62.1 | 50.4 | 61.6 | 62.3 | 60.6 | 40.3 | 57.2 | 58.4 |
Ours w/o | 92.6 | 62.5 | 85.5 | 85.8 | 70.5 | 40.2 | 66.3 | 67.0 | 63.3 | 35.5 | 57.9 | 58.7 |
Ours w/o | 91.8 | 60.7 | 82.3 | 83.1 | 68.6 | 43.7 | 64.2 | 65.3 | 62.8 | 38.2 | 56.3 | 57.9 |
Ours w/o | 90.3 | 70.6 | 83.0 | 84.4 | 70.5 | 47.2 | 62.2 | 63.0 | 62.7 | 41.1 | 55.3 | 56.8 |
Ours w/o | 91.1 | 73.8 | 85.2 | 86.7 | 69.4 | 49.1 | 63.9 | 64.7 | 61.8 | 42.5 | 57.0 | 57.6 |
Ours w/o | 90.8 | 71.3 | 86.1 | 87.2 | 70.8 | 47.9 | 65.1 | 66.0 | 62.5 | 44.1 | 59.2 | 60.5 |
Full | 90.7 | 81.4 | 90.4 | 90.6 | 68.7 | 54.0 | 67.5 | 68.4 | 62.9 | 46.4 | 62.2 | 63.2 |
PSPNet DeepLabv3 | Cityscapes (mIoU %) | VOC2012 (mIoU %) | VOC07+12 (mAP %) | |||||||||
SSD RFBNet | clean | BIM | DeepFool | C&W | clean | BIM | DeepFool | C&W | clean | cls | loc | cls+loc |
Ours w/o | 61.5 | 50.8 | 52.4 | 56.1 | 66.4 | 54.3 | 56.2 | 61.6 | 56.2 | 58.3 | 55.6 | 55.8 |
Ours w/o | 64.6 | 53.2 | 57.3 | 59.5 | 69.1 | 58.2 | 61.4 | 63.7 | 57.8 | 59.5 | 56.2 | 57.2 |
Ours w/o | 67.2 | 57.8 | 61.3 | 62.4 | 70.1 | 62.4 | 64.5 | 67.6 | 59.6 | 60.1 | 57.3 | 58.5 |
Ours w/o | 60.2 | 51.3 | 53.1 | 53.9 | 67.3 | 55.6 | 58.3 | 59.0 | 55.8 | 56.3 | 53.1 | 53.7 |
Ours w/o | 64.5 | 47.7 | 51.2 | 53.6 | 72.6 | 48.2 | 59.1 | 61.8 | 63.5 | 53.6 | 55.0 | 56.4 |
Ours w/o | 63.7 | 50.2 | 53.4 | 55.0 | 72.0 | 51.3 | 57.6 | 58.2 | 62.3 | 50.4 | 52.7 | 53.4 |
Ours w/o | 64.5 | 51.4 | 55.7 | 57.2 | 71.6 | 57.8 | 59.3 | 60.1 | 60.3 | 55.2 | 51.8 | 53.0 |
Ours w/o | 64.8 | 52.5 | 56.8 | 58.3 | 70.8 | 59.2 | 60.5 | 62.6 | 59.7 | 57.6 | 52.4 | 55.8 |
Ours w/o | 65.1 | 54.7 | 58.2 | 60.8 | 70.5 | 61.0 | 61.2 | 64.3 | 60.4 | 58.2 | 54.3 | 57.0 |
Full | 67.6 | 59.5 | 62.0 | 64.7 | 71.2 | 63.6 | 66.0 | 69.5 | 60.5 | 61.2 | 58.3 | 60.9 |
4.7 Ablation Study
Novel pixel-level constraints. For each task, we conduct an ablation study to analyze the impact of each loss term and verify the superiority of our pixel- and feature-level constraints. We compare four settings: only the pixel-level constraints of our approach; our full constraints apart from the distribution alignment loss; only the traditional pixel-level constraints; and the traditional pixel- and feature-level constraints without distribution alignment. The results are reported in Table VIII.
Compared with the traditional pixel-level constraints, our novel pixel-level alignment strategy achieves much higher robustness on adversarial samples at the cost of a negligible degradation on clean samples. This superiority is prominent when comparing our settings with the corresponding traditional ones, and it demonstrates that adversarial samples can be better aligned with clean samples when we match them with clean ones in the output space of the generator.
Integrated distribution alignment. Further, compared to the setting without the distribution alignment loss, our full setting exhibits stable improvement on clean and adversarial samples (as shown in Table VIII), manifesting the impact of the proposed class-aware constraint. These positive effects prove the importance of aligning the overall distribution in the feature space of the target model.
Loss functions. There are several loss terms in Eq. 9, and we conduct ablation studies to verify their importance by deleting the loss terms from Eq. 9 individually. The corresponding results are recorded in Table IX.
Alternative pixel-level loss terms. As shown in Fig. 3, the traditional and our pixel-level training constraints use different reconstruction and adversarial loss terms, and we set up an ablation study to analyze the effectiveness of mixed strategies. There are two settings.
I) Use the reconstruction loss of the traditional strategy and the adversarial loss of our strategy (Eq. 2 and Eq. 3). This ablation setting is called "Full-abla-pixel-I".
II) Use the reconstruction loss of our strategy and the adversarial loss of the traditional strategy, where the clean samples $x$ are real and both generator outputs are fake. In this case, the adversarial loss can be written as
$\mathcal{L}_{adv}^{D\text{-}trad} = \mathbb{E}\big[(D(x) - 1)^2\big] + \mathbb{E}\big[D(\hat{x})^2\big] + \mathbb{E}\big[D(\hat{x}_{adv})^2\big] \quad (10)$
$\mathcal{L}_{adv}^{G\text{-}trad} = \mathbb{E}\big[(D(\hat{x}) - 1)^2\big] + \mathbb{E}\big[(D(\hat{x}_{adv}) - 1)^2\big] \quad (11)$
and such an ablation setting is called “Full-abla-pixel-II”. The corresponding results are shown in Table X.
WideResNet ResNet50 | CIFAR10 (Accuracy %) | CIFAR100 (Accuracy %) | Tiny-ImageNet (Accuracy %) | |||||||||
clean | PGD | DeepFool | C&W | clean | PGD | DeepFool | C&W | clean | PGD | DeepFool | C&W | |
Full-abla-pixel-I | 89.7 | 77.3 | 87.3 | 88.0 | 67.6 | 50.7 | 64.4 | 65.2 | 63.4 | 45.1 | 59.4 | 60.3 |
Full-abla-pixel-II | 89.4 | 76.7 | 86.8 | 87.3 | 67.9 | 51.2 | 65.0 | 65.4 | 63.2 | 44.9 | 60.3 | 61.4 |
Full-abla-feature-I | 90.6 | 79.2 | 85.9 | 86.2 | 68.8 | 53.5 | 67.1 | 67.6 | 62.7 | 46.0 | 60.1 | 61.2 |
Full-abla-feature-II | 90.5 | 80.5 | 89.1 | 89.5 | 68.0 | 52.6 | 66.3 | 68.0 | 62.1 | 45.7 | 58.6 | 59.7 |
deleting 1-st layer | 89.2 | 80.1 | 88.3 | 88.7 | 67.3 | 52.8 | 66.2 | 67.5 | 61.4 | 45.2 | 60.8 | 62.1 |
deleting 2-nd layer | 90.1 | 79.6 | 87.5 | 88.0 | 66.8 | 50.3 | 64.8 | 64.2 | 62.1 | 44.5 | 60.2 | 61.5 |
deleting 3-rd layer | 88.3 | 79.1 | 87.0 | 86.8 | 66.2 | 53.1 | 64.0 | 65.1 | 61.8 | 43.2 | 58.7 | 60.3 |
deleting 4-th layer | 87.8 | 77.3 | 85.3 | 85.6 | 65.7 | 51.8 | 62.8 | 63.7 | 60.4 | 43.8 | 58.0 | 60.6 |
deleting 5-th layer | 89.5 | 78.9 | 86.2 | 87.2 | 67.1 | 52.5 | 63.7 | 64.9 | 62.3 | 44.9 | 59.3 | 59.8 |
Full | 90.7 | 81.4 | 90.4 | 90.6 | 68.7 | 54.0 | 67.5 | 68.4 | 62.9 | 46.4 | 62.2 | 63.2 |
PSPNet DeepLabv3 | Cityscapes (mIoU %) | VOC2012 (mIoU %) | VOC07+12 (mAP %) | |||||||||
SSD RFBNet | clean | BIM | DeepFool | C&W | clean | BIM | DeepFool | C&W | clean | cls | loc | cls+loc |
Full-abla-pixel-I | 64.7 | 55.4 | 57.0 | 60.5 | 71.4 | 58.3 | 63.2 | 67.6 | 59.2 | 58.7 | 54.4 | 57.2 |
Full-abla-pixel-II | 64.0 | 55.1 | 56.8 | 58.7 | 70.8 | 57.1 | 62.8 | 66.5 | 59.5 | 59.0 | 55.1 | 56.4 |
Full-abla-feature-I | 66.1 | 58.2 | 59.4 | 61.6 | 72.0 | 61.2 | 64.1 | 68.3 | 60.7 | 60.6 | 57.5 | 60.1 |
Full-abla-feature-II | 65.6 | 57.9 | 58.5 | 59.2 | 72.2 | 60.5 | 63.7 | 67.8 | 60.4 | 59.3 | 56.8 | 59.5 |
deleting 1-st layer | 67.0 | 58.4 | 61.2 | 63.8 | 70.4 | 62.7 | 65.2 | 68.4 | 60.1 | 60.4 | 57.2 | 60.3 |
deleting 2-nd layer | 66.3 | 57.5 | 60.8 | 63.2 | 70.8 | 61.8 | 64.3 | 67.5 | 60.8 | 58.8 | 55.6 | 59.4 |
deleting 3-rd layer | 67.1 | 57.0 | 61.7 | 62.7 | 71.6 | 60.4 | 63.8 | 67.1 | 59.7 | 58.1 | 56.2 | 58.8 |
deleting 4-th layer | 65.8 | 55.8 | 59.4 | 60.3 | 71.4 | 60.1 | 61.5 | 65.6 | 59.2 | 56.5 | 54.7 | 56.2 |
deleting 5-th layer | 66.7 | 56.3 | 60.5 | 61.9 | 70.1 | 61.3 | 63.1 | 64.9 | 60.3 | 57.0 | 55.0 | 57.6 |
Full | 67.6 | 59.5 | 62.0 | 64.7 | 71.2 | 63.6 | 66.0 | 69.5 | 60.5 | 61.2 | 58.3 | 60.9 |
Alternative feature-level loss terms. For the feature-level alignment, we could also utilize the strategy of our pixel-level alignment, i.e., using the features of $\hat{x}$ to guide the formulation of $\hat{x}_{adv}$. To demonstrate the drawback of such an alternative strategy, we set up two experiments.
I) We modify Eq. 5 to the following form:
$\mathcal{L}_{rec}^{f\prime} = \mathbb{E}\big[\,\|f_{\hat{x}} - f_{x}\|_2\,\big] + \mathbb{E}\big[\,\|f_{\hat{x}_{adv}} - f_{\hat{x}}\|_2\,\big] \quad (12)$
and we keep other loss terms unchanged. Such an ablation setting is called “Full-abla-feature-I”.
II) We use the distribution of $\hat{x}$ to align the distribution of $\hat{x}_{adv}$ and change Eq. 6 to the following equation:
$\mathcal{L}_{align}' = \sum_{k=1}^{K} \big\|c_{adv}^{k} - \hat{c}^{k}\big\|_2 \quad (13)$
where $\hat{c}^{k}$ denotes the clustering center of the features of $\hat{x}$ of the $k$-th class. We preserve the other loss terms unmodified and call this ablation setting "Full-abla-feature-II". The corresponding results are shown in Table X.
The layer choice in the discriminator. In Eq. 3, we employ the loss to measure the distances between clean and adversarial samples in terms of the discriminative feature maps extracted from the discriminator. We adopt the "MultiscaleDiscriminator" following the setting of pix2pixHD [43]. There are 5 layers in the discriminator, and we choose all of them to compute the feature match loss since the corresponding results are optimal. To prove this, we delete individual layers from the computation (denoted as "deleting 1-st/2-nd/3-rd/4-th/5-th layer"), and the experimental results in Table X demonstrate that our layer selection is optimal.
CIFAR10 (Accuracy %) | CIFAR100 (Accuracy %) | Tiny-ImageNet (Accuracy %) | ||||||||||
clean | PGD | DeepFool | C&W | clean | PGD | DeepFool | C&W | clean | PGD | DeepFool | C&W | |
0.1 | 86.6 | 78.2 | 87.7 | 88.2 | 66.4 | 51.2 | 65.3 | 64.2 | 60.3 | 43.8 | 59.4 | 60.1 |
10 | 85.3 | 77.5 | 86.9 | 87.4 | 65.8 | 53.0 | 66.1 | 66.7 | 61.2 | 44.1 | 60.5 | 61.4 |
original | 90.7 | 81.4 | 90.4 | 90.6 | 68.7 | 54.0 | 67.5 | 68.4 | 62.9 | 46.4 | 62.2 | 63.2 |
Cityscapes (mIoU %) | VOC2012 (mIoU %) | VOC07+12 (mAP %) | ||||||||||
clean | BIM | DeepFool | C&W | clean | BIM | DeepFool | C&W | clean | cls | loc | cls+loc | |
0.1 | 63.5 | 56.3 | 58.4 | 60.4 | 69.3 | 61.4 | 62.6 | 67.8 | 56.6 | 57.7 | 54.1 | 57.2 |
10 | 64.7 | 54.7 | 60.8 | 63.6 | 67.1 | 60.7 | 61.5 | 65.1 | 57.9 | 58.3 | 57.0 | 58.1 |
original | 67.6 | 59.5 | 62.0 | 64.7 | 71.2 | 63.6 | 66.0 | 69.5 | 60.5 | 61.2 | 58.3 | 60.9 |
WideResNet ResNet50 | CIFAR10 (Accuracy %) | CIFAR100 (Accuracy %) | ||||
PGD | DeepFool | C&W | PGD | DeepFool | C&W | |
No Defense | 8.2 | 4.3 | 5.8 | 17.3 | 18.3 | 16.6 |
44.5 | 84.6 | 85.7 | 29.6 | 63.4 | 64.2 | |
67.1 | 88.2 | 88.6 | 40.1 | 64.1 | 65.5 | |
18.1 | 84.0 | 84.7 | 14.9 | 62.6 | 63.4 | |
59.9 | 87.0 | 87.2 | 36.6 | 63.2 | 65.2 | |
Defense [30] | 38.5 | 37.8 | 38.9 | 21.5 | 22.8 | 22.5 |
SR [13] | 44.8 | 45.2 | 45.3 | 24.8 | 25.5 | 25.1 |
FPD [16] | 44.2 | 44.0 | 44.4 | 30.5 | 31.1 | 31.2 |
APE [15] | 10.1 | 86.2 | 86.4 | 11.4 | 64.8 | 65.4 |
Denoise [14] | 73.0 | 87.7 | 87.1 | 35.7 | 63.7 | 64.2 |
NRP [9] | 60.4 | 87.3 | 88.0 | 36.2 | 63.5 | 64.6 |
Ours | 73.3 | 88.4 | 89.1 | 42.0 | 64.9 | 65.8 |
PSPNet DeepLabv3 | Cityscapes (mIoU %) | VOC2012 (mIoU %) | ||||
BIM | DeepFool | C&W | BIM | DeepFool | C&W | |
No Defense | 3.8 | 32.3 | 13.5 | 11.8 | 49.0 | 20.2 |
48.5 | 49.9 | 55.0 | 54.9 | 59.8 | 68.0 | |
57.8 | 60.3 | 62.2 | 59.9 | 63.2 | 68.2 | |
43.0 | 48.9 | 53.7 | 33.2 | 49.9 | 53.7 | |
53.8 | 58.6 | 60.0 | 55.1 | 60.3 | 65.8 | |
Defense [30] | 20.1 | 19.7 | 20.7 | 21.6 | 21.2 | 23.1 |
SR [13] | 41.6 | 40.5 | 42.5 | 52.1 | 56.9 | 66.8 |
FPD [16] | 51.5 | 51.7 | 53.6 | 56.7 | 57.3 | 60.5 |
APE [15] | 28.5 | 26.8 | 40.0 | 25.7 | 46.9 | 44.5 |
Denoise [14] | 52.7 | 51.1 | 63.4 | 60.1 | 53.7 | 67.2 |
NRP [9] | 53.4 | 47.8 | 63.0 | 53.1 | 51.2 | 67.4 |
Ours | 59.0 | 61.7 | 64.3 | 60.5 | 64.6 | 68.9 |
4.8 Hyper-parameters Analysis
The hyper-parameters of our method include the loss weights in Eq. 9. If the values of all loss weights are scaled to ten times or one tenth of their original values, the results are as shown in Table XI (the evaluation setting is the same as that of Tables I and II). This table demonstrates that the results with the original hyper-parameters are the highest, and that accuracy in the classification task, mIoU in the segmentation task, and mAP in the detection task are altered by no more than 6.1%, 8.0%, and 7.2%, respectively. Thus, our model's effect is not sensitive to the hyper-parameters, and our chosen parameters are reasonable.
| WideResNet ResNet50 | CIFAR10 (Accuracy %) |  |  |  | CIFAR100 (Accuracy %) |  |  |  | Tiny-ImageNet (Accuracy %) |  |  |  |
|  | clean | PGD | DeepFool | C&W | clean | PGD | DeepFool | C&W | clean | PGD | DeepFool | C&W |
| No Defense | 95.1 | 1.6 | 3.8 | 5.1 | 78.1 | 7.1 | 7.7 | 9.1 | 64.5 | 17.3 | 17.7 | 19.6 |
| No Defense (finetune) | 95.6 | 3.1 | 4.8 | 6.5 | 79.5 | 7.6 | 8.0 | 9.4 | 63.9 | 20.6 | 21.1 | 21.7 |
| TRADES [48] | 87.3 | 76.3 | 85.4 | 85.7 | 62.8 | 51.2 | 60.4 | 60.1 | 58.5 | 42.7 | 55.6 | 55.9 |
| TRADES (finetune) [48] | 85.4 | 75.0 | 83.1 | 83.8 | 78.1 | 51.0 | 52.7 | 43.6 | 43.3 | 40.1 | 41.7 | 42.0 |
| Free-adv [54] | 77.1 | 70.2 | 72.6 | 74.3 | 49.2 | 43.8 | 45.8 | 46.8 | 55.5 | 43.2 | 53.5 | 53.8 |
| Free-adv (finetune) [54] | 88.5 | 77.6 | 86.3 | 87.6 | 63.7 | 52.6 | 61.5 | 61.9 | 42.2 | 40.5 | 46.0 | 46.3 |
| Defense [30] | 39.9 | 36.2 | 36.5 | 37.0 | 31.1 | 28.4 | 28.8 | 30.3 | 20.4 | 16.3 | 17.4 | 17.8 |
| SR [13] | 48.0 | 45.8 | 47.7 | 47.5 | 33.6 | 31.8 | 32.1 | 32.5 | 31.1 | 29.4 | 29.7 | 30.0 |
| FPD [16] | 48.5 | 45.7 | 47.2 | 47.8 | 52.5 | 40.3 | 41.0 | 41.4 | 39.7 | 32.7 | 37.1 | 38.1 |
| APE [15] | 90.2 | 40.5 | 88.4 | 89.2 | 73.2 | 35.5 | 63.6 | 66.2 | 62.4 | 29.9 | 60.3 | 60.7 |
| Denoise [14] | 89.8 | 78.4 | 88.6 | 89.1 | 67.2 | 51.7 | 65.2 | 65.7 | 59.8 | 44.6 | 56.8 | 57.2 |
| NRP [9] | 91.8 | 76.7 | 87.9 | 88.4 | 70.3 | 50.2 | 65.4 | 66.0 | 59.1 | 43.0 | 56.6 | 57.5 |
| Ours | 90.7 | 80.1 | 88.8 | 89.7 | 68.7 | 53.1 | 66.3 | 66.7 | 62.9 | 44.8 | 61.2 | 62.4 |
| WideResNet ResNet50 | CIFAR10 (Accuracy %) |  |  |  | CIFAR100 (Accuracy %) |  |  |  | Tiny-ImageNet (Accuracy %) |  |  |  |
|  | clean | PGD | DeepFool | C&W | clean | PGD | DeepFool | C&W | clean | PGD | DeepFool | C&W |
| No Defense | 95.1 | 2.1 | 5.3 | 6.4 | 78.1 | 7.5 | 9.3 | 10.4 | 64.5 | 19.2 | 20.4 | 21.2 |
| No Defense (finetune) | 95.6 | 3.8 | 7.5 | 8.7 | 79.5 | 8.0 | 9.7 | 10.9 | 63.9 | 22.1 | 23.2 | 23.6 |
| PCL [34] | 88.7 | 80.3 | 87.2 | 87.8 | 65.4 | 56.2 | 64.7 | 65.0 | 60.2 | 45.8 | 59.1 | 60.3 |
| PCL [34]+Ours | 88.4 | 81.0 (+0.7) | 88.3 (+1.1) | 88.7 (+0.9) | 64.6 | 57.1 (+0.9) | 65.4 (+0.7) | 65.8 (+0.8) | 60.9 | 46.2 (+0.4) | 60.4 (+1.3) | 61.9 (+1.6) |
| CADA [36] | 83.8 | 74.6 | 83.6 | 84.3 | 60.8 | 52.6 | 60.1 | 60.9 | 55.7 | 42.0 | 57.0 | 57.8 |
| CADA [36]+Ours | 84.2 | 75.1 (+0.5) | 85.2 (+1.6) | 85.7 (+1.4) | 61.3 | 53.3 (+0.7) | 60.9 (+0.8) | 61.7 (+0.8) | 56.2 | 43.1 (+1.1) | 57.6 (+0.6) | 58.4 (+0.6) |
| ATDA [38] | 87.5 | 78.2 | 86.7 | 87.0 | 63.5 | 55.8 | 62.7 | 63.2 | 58.0 | 43.7 | 58.4 | 59.1 |
| ATDA [38]+Ours | 87.9 | 79.5 (+1.3) | 87.5 (+0.8) | 87.9 (+0.9) | 64.2 | 56.4 (+0.6) | 63.8 (+1.1) | 64.5 (+1.3) | 58.7 | 44.5 (+0.8) | 59.5 (+1.1) | 60.5 (+1.4) |
| Ours | 90.7 | 81.4 | 90.4 | 90.6 | 68.7 | 54.0 | 67.5 | 68.4 | 62.9 | 46.4 | 62.2 | 63.2 |
4.9 Experiments under Model Transfer Evaluation
For defense via input transformation with deep generative models, we use a target model for feature-level training. It has been verified that the trained generator yields excellent defense quality for this target model. On the other hand, the generator can also be deployed as a plug-and-play module to safeguard different target models that were not employed during training. Evaluation under the model transfer setting [14] demonstrates this property. In the classification task, we defend ResNet50 against black-box attack while the generator is trained with WideResNet as the target model; for semantic segmentation, the generator is trained with PSPNet and then utilized to protect DeepLabv3. The results are summarized in Tables XII and XIII. They provide empirical evidence that the trained generator is applicable to safeguarding target models that are not adopted during training, and it outperforms the ablation settings and most existing methods.
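Deployment in this transfer setting amounts to prepending the frozen generator to any target model at inference time. The sketch below illustrates this wrapping; the class and argument names are placeholders of our own, not the released code.

```python
import torch
import torch.nn as nn

class PurifiedModel(nn.Module):
    """Plug-and-play defense: purify inputs with a frozen generator,
    then feed them to a target model the generator never saw in training."""

    def __init__(self, generator: nn.Module, target_model: nn.Module):
        super().__init__()
        self.generator = generator.eval()
        self.target_model = target_model.eval()
        for p in self.generator.parameters():
            p.requires_grad_(False)  # the generator is fixed at deployment

    @torch.no_grad()
    def forward(self, x):
        return self.target_model(self.generator(x))

# e.g., a generator trained with WideResNet reused to protect ResNet50:
# defended = PurifiedModel(generator=g_trained_with_wideresnet, target_model=resnet50)
```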
4.10 Evaluation under Targeted Attack
For image classification, another major threat is the targeted attack, and we evaluate our method under it in this section. In contrast to the untargeted attack, targeted adversarial samples aim to fool the classifier into outputting specific labels chosen by the adversary. Following the setting in [67], the targets are the 9 classes attaining the highest scores at the original point (excluding the correct one), and all approaches are evaluated under the same setting. The results on CIFAR10, CIFAR100, and Tiny-ImageNet are shown in Table XIV. Our framework still maintains its advantage over the baselines, further demonstrating the effectiveness of our approach.
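For reference, the target-label selection in this protocol can be sketched as follows: for each input, the 9 classes with the highest clean scores, excluding the ground-truth class, serve as the attack targets. This is a simplified illustration of the selection step only (not the full attack of [67]), and the function name is a placeholder.

```python
import torch

def select_target_labels(logits, true_labels, num_targets=9):
    """Pick the `num_targets` classes with the highest scores at the
    original (clean) point, excluding the correct class, per sample.

    logits: (B, C) clean predictions of the target classifier.
    true_labels: (B,) ground-truth labels.
    Returns a (B, num_targets) tensor of target classes.
    """
    masked = logits.clone()
    masked.scatter_(1, true_labels.unsqueeze(1), float("-inf"))  # exclude correct class
    return masked.topk(num_targets, dim=1).indices
```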
4.11 Applying Our Feature-level Constraints in AT
Some existing adversarial training approaches have also considered feature-level alignment. However, as discussed in Sec. 2, these methods [34, 35, 36, 38] cannot achieve the overall distribution alignment and the alignment of paired samples (an adversarial sample and its corresponding clean sample) simultaneously. To demonstrate the superiority of our feature-level constraints, which achieve integrated distribution alignment, we replace the feature-level losses in the frameworks of [34, 35, 36, 38] with ours and observe the change in performance. To unify the training and evaluation settings, we retrain their models on the different datasets with our chosen attack settings for both training and evaluation. As shown in Table XV, after replacing their feature-level losses with our constraints, the performance of the corresponding methods increases, while it remains lower than that of our full framework (“Ours”).
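As a rough illustration of the class-aware feature-level constraint being substituted into these frameworks, the sketch below aligns the per-class centroids of clean and adversarial features (intra-class) and pushes centroids of different classes apart (inter-class). It is a simplified version under our own assumptions: the margin, the MSE/Euclidean distances, and the function name are placeholders rather than the exact formulation in the paper.

```python
import torch
import torch.nn.functional as F

def class_aware_alignment(feat_clean, feat_adv, labels, num_classes, margin=1.0):
    """Align per-class centroids of clean/adversarial features, and keep
    centroids of different classes at least `margin` apart."""
    intra, inter = 0.0, 0.0
    centroids = []
    for c in range(num_classes):
        mask = labels == c
        if mask.sum() == 0:
            continue
        mu_clean = feat_clean[mask].mean(dim=0)
        mu_adv = feat_adv[mask].mean(dim=0)
        intra = intra + F.mse_loss(mu_adv, mu_clean)   # pull adversarial centroid to clean one
        centroids.append((mu_clean + mu_adv) / 2)
    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            dist = torch.norm(centroids[i] - centroids[j])
            inter = inter + F.relu(margin - dist)      # push different classes apart
    return intra + inter
```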
4.12 What If the Attacker Knows About the Defense
In this section, we analyze the situation where the attacker knows of the existence of the defense, as described in [9]. In this case, the attacker accesses the training data and mechanism, trains a local defense generator similar to our trained one, and adopts BPDA [76] to bypass the defense. To simulate this attack, we adopt the U-Net [77] as the structure of the local generator and train it with our training mechanism defined in Eq. (9).
Besides, we apply PGD (translation-invariant attack), BIM, and “cls+loc” together with BPDA to implement the attacks. Under this setting, the accuracy of our framework on CIFAR10, CIFAR100, and Tiny-ImageNet (with WideResNet) is 74.5%, 48.8%, and 40.3%, while NRP achieves 72.4%, 45.1%, and 38.2%, respectively; the mIoU of our approach on Cityscapes and VOC2012 (with PSPNet) is 53.6% and 56.2%, compared with 50.7% and 53.2% for NRP; the mAP of our method on VOC07+12 (with SSD) is 53.9%, while that of NRP is 51.1%. Hence, BPDA cannot circumvent our defense, and our defense outperforms NRP under this challenging setting.
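For completeness, the BPDA [76] bypass applied by such an attacker can be sketched as follows: the local purifier is used in the forward pass, while gradients are propagated as if it were the identity. This is a simplified illustration with placeholder names, not the attacker's actual implementation.

```python
import torch

class BPDAIdentity(torch.autograd.Function):
    """Backward Pass Differentiable Approximation: run the (possibly
    non-differentiable) defense forward, but backpropagate as identity."""

    @staticmethod
    def forward(ctx, x, defense):
        with torch.no_grad():
            return defense(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # identity gradient w.r.t. x; no grad for the defense module

def attacked_forward(x, defense, target_model):
    # Gradient-based attacks (e.g., PGD/BIM) call this instead of target_model(x).
    purified = BPDAIdentity.apply(x, defense)
    return target_model(purified)
```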
5 Conclusion
In this paper, we have proposed a novel training scheme for DGNs that aligns the distribution of adversarial samples with that of clean samples for a given target model. The effectiveness of our strategy stems from the pixel- and feature-level constraints. As a general approach, our framework is applicable to various tasks, including image classification, semantic segmentation, and object detection. Extensive experiments reveal the effect of our novel constraints and illustrate the advantage of our method compared with existing state-of-the-art defense strategies.
References
- [1] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” in ICLR, 2014.
- [2] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” ICLR, 2014.
- [3] A. Arnab, O. Miksik, and P. H. Torr, “On the robustness of semantic segmentation models to adversarial attacks,” in CVPR, 2018.
- [4] C. Xie, J. Wang, Z. Zhang, Y. Zhou, L. Xie, and A. Yuille, “Adversarial examples for semantic segmentation and object detection,” in CVPR, 2017.
- [5] Y. Dong, H. Su, B. Wu, Z. Li, W. Liu, T. Zhang, and J. Zhu, “Efficient decision-based black-box adversarial attacks on face recognition,” in CVPR, 2019.
- [6] A. Joshi, A. Mukherjee, S. Sarkar, and C. Hegde, “Semantic adversarial attacks: Parametric transformations that fool deep classifiers,” in CVPR, 2019.
- [7] Y. Jia, Y. Lu, J. Shen, Q. A. Chen, H. Chen, Z. Zhong, and T. Wei, “Fooling detection alone is not enough: Adversarial attack against multiple object tracking,” in ICLR, 2019.
- [8] Z. Kong, J. Guo, A. Li, and C. Liu, “Physgan: Generating physical-world-resilient adversarial examples for autonomous driving,” in CVPR, 2020.
- [9] M. Naseer, S. Khan, M. Hayat, F. S. Khan, and F. Porikli, “A self-supervised approach for adversarial robustness,” in CVPR, 2020.
- [10] R. Theagarajan and B. Bhanu, “Defending black box facial recognition classifiers against adversarial attacks,” in CVPRW, 2020.
- [11] Z. Liu, Q. Liu, T. Liu, N. Xu, X. Lin, Y. Wang, and W. Wen, “Feature distillation: Dnn-oriented jpeg compression against adversarial examples,” in CVPR, 2019.
- [12] C. Guo, M. Rana, M. Cisse, and L. Van Der Maaten, “Countering adversarial images using input transformations,” ICLR, 2018.
- [13] A. Mustafa, S. H. Khan, M. Hayat, J. Shen, and L. Shao, “Image super-resolution as a defense against adversarial attacks,” TIP, 2019.
- [14] F. Liao, M. Liang, Y. Dong, T. Pang, X. Hu, and J. Zhu, “Defense against adversarial attacks using high-level representation guided denoiser,” in CVPR, 2018.
- [15] S. Shen, G. Jin, K. Gao, and Y. Zhang, “Ape-gan: Adversarial perturbation elimination with gan,” in ICASSP, 2017.
- [16] G. Li, S. Ding, J. Luo, and C. Liu, “Enhancing intrinsic adversarial robustness via feature pyramid decoder,” in CVPR, 2020.
- [17] A. Krizhevsky, G. Hinton et al., “Learning multiple layers of features from tiny images,” 2009.
- [18] A. Athalye and N. Carlini, “On the robustness of the CVPR 2018 white-box adversarial example defenses,” arXiv:1804.03286, 2018.
- [19] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, “Practical black-box attacks against machine learning,” in ACM on Asia conference on computer and communications security, 2017.
- [20] F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, and P. McDaniel, “Ensemble adversarial training: Attacks and defenses,” ICLR, 2017.
- [21] Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, and J. Li, “Boosting adversarial attacks with momentum,” in CVPR, 2018.
- [22] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial examples in the physical world,” ICLR, 2016.
- [23] J. H. Metzen, M. C. Kumar, T. Brox, and V. Fischer, “Universal adversarial perturbations against semantic image segmentation,” in ICCV, 2017.
- [24] Y. Li, D. Tian, X. Bian, S. Lyu et al., “Robust adversarial perturbation on deep proposal-based models,” in BMVC, 2018.
- [25] D. Song, K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, F. Tramer, A. Prakash, and T. Kohno, “Physical adversarial examples for object detectors,” in 12th USENIX Workshop on Offensive Technologies, 2018.
- [26] J. Lu, H. Sibai, and E. Fabry, “Adversarial examples that fool detectors,” arXiv:1712.02494, 2017.
- [27] X. Wei, S. Liang, N. Chen, and X. Cao, “Transferable adversarial attacks for image and video object detection,” in IJCAI, 2018.
- [28] X. Liu, H. Yang, Z. Liu, L. Song, H. Li, and Y. Chen, “Dpatch: An adversarial patch attack on object detectors,” arXiv:1806.02299, 2018.
- [29] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. Yuille, “Mitigating adversarial effects through randomization,” ICLR, 2017.
- [30] P. Samangouei, M. Kabkab, and R. Chellappa, “Defense-gan: Protecting classifiers against adversarial attacks using generative models,” in ICLR, 2018.
- [31] A. Prakash, N. Moran, S. Garber, A. DiLillo, and J. Storer, “Deflecting adversarial attacks with pixel deflection,” in CVPR, 2018.
- [32] Y. Song, T. Kim, S. Nowozin, S. Ermon, and N. Kushman, “Pixeldefend: Leveraging generative models to understand and defend against adversarial examples,” in ICLR, 2018.
- [33] Y. Huang, Q. Guo, F. Juefei-Xu, L. Ma, W. Miao, Y. Liu, and G. Pu, “Advfilter: Predictive perturbation-aware filtering against adversarial attack via multi-domain learning,” in ACMMM, 2021.
- [34] A. Mustafa, S. Khan, M. Hayat, R. Goecke, J. Shen, and L. Shao, “Adversarial defense by restricting the hidden space of deep neural networks,” in ICCV, 2019.
- [35] A. Mustafa, S. H. Khan, M. Hayat, R. Goecke, J. Shen, and L. Shao, “Deeply supervised discriminative learning for adversarial defense,” TPAMI, 2020.
- [36] X. Hou, J. Liu, B. Xu, X. Wang, B. Liu, and G. Qiu, “Class-aware domain adaptation for improving adversarial robustness,” Image and Vision Computing, 2020.
- [37] I. Goodfellow, J. Pouget-Abadie, and M. Mirza, “Generative adversarial nets,” in NIPS, 2014.
- [38] C. Song, K. He, L. Wang, and J. E. Hopcroft, “Improving the generalization of adversarial training with domain adaptation,” in ICLR, 2019.
- [39] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in ECCV, 2016.
- [40] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in ICCV, 2017.
- [41] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in ICLR, 2015.
- [42] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. Paul Smolley, “Least squares generative adversarial networks,” in ICCV, 2017.
- [43] T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, A. Tao, J. Kautz, and B. Catanzaro, “High-resolution image synthesis and semantic manipulation with conditional gans,” in CVPR, 2018.
- [44] S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira, “Analysis of representations for domain adaptation,” NIPS, 2006.
- [45] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in CVPR, 2009.
- [46] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, “The cityscapes dataset for semantic urban scene understanding,” in CVPR, 2016.
- [47] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results,” http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html, 2012.
- [48] H. Zhang, Y. Yu, J. Jiao, E. P. Xing, L. E. Ghaoui, and M. I. Jordan, “Theoretically principled trade-off between robustness and accuracy,” in ICML, 2019.
- [49] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016.
- [50] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid scene parsing network,” in CVPR, 2017.
- [51] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking atrous convolution for semantic image segmentation,” arXiv:1706.05587, 2017.
- [52] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “Ssd: Single shot multibox detector,” in ECCV, 2016.
- [53] S. Liu, D. Huang et al., “Receptive field block net for accurate and fast object detection,” in ECCV, 2018.
- [54] A. Shafahi, M. Najibi, M. A. Ghiasi, Z. Xu, J. Dickerson, C. Studer, L. S. Davis, G. Taylor, and T. Goldstein, “Adversarial training for free!” in NIPS, 2019.
- [55] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., “Pytorch: An imperative style, high-performance deep learning library,” in NIPS, 2019.
- [56] H. Zhao, “semseg,” https://github.com/hszhao/semseg, 2019.
- [57] X. Xu, H. Zhao, and J. Jia, “Dynamic divide-and-conquer adversarial training for robust semantic segmentation,” in ICCV, 2021.
- [58] H. Zhang and J. Wang, “Towards adversarially robust object detection,” in CVPR, 2019.
- [59] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” in ICLR, 2018.
- [60] Y. Dong, T. Pang, H. Su, and J. Zhu, “Evading defenses to transferable adversarial examples by translation-invariant attacks,” in CVPR, 2019.
- [61] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, “Deepfool: a simple and accurate method to fool deep neural networks,” in CVPR, 2016.
- [62] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in IEEE symposium on security and privacy, 2017.
- [63] N. Papernot, P. McDaniel, and I. Goodfellow, “Transferability in machine learning: from phenomena to black-box attacks using adversarial samples,” arXiv:1605.07277, 2016.
- [64] Y. Liu, X. Chen, C. Liu, and D. Song, “Delving into transferable adversarial examples and black-box attacks,” in ICLR, 2017.
- [65] A. Ganeshan and R. V. Babu, “Fda: Feature disruptive attack,” in ICCV, 2019.
- [66] W. Zhou, X. Hou, Y. Chen, M. Tang, X. Huang, X. Gan, and Y. Yang, “Transferable adversarial perturbations,” in ECCV, 2018.
- [67] F. Croce and M. Hein, “Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks,” in ICML, 2020.
- [68] M. Andriushchenko, F. Croce, N. Flammarion, and M. Hein, “Square attack: a query-efficient black-box adversarial attack via random search,” in ECCV, 2020.
- [69] Y. Shi, S. Wang, and Y. Han, “Curls & whey: Boosting black-box adversarial attacks,” in CVPR, 2019.
- [70] C. Xie, Z. Zhang, Y. Zhou, S. Bai, J. Wang, Z. Ren, and A. L. Yuille, “Improving transferability of adversarial examples with input diversity,” in CVPR, 2019.
- [71] J. Li, R. Ji, H. Liu, J. Liu, B. Zhong, C. Deng, and Q. Tian, “Projection & probability-driven black-box attack,” in CVPR, 2020.
- [72] A. Rahmati, S.-M. Moosavi-Dezfooli, P. Frossard, and H. Dai, “Geoda: a geometric framework for black-box adversarial attacks,” in CVPR, 2020.
- [73] W. Chen, Z. Zhang, X. Hu, and B. Wu, “Boosting decision-based black-box adversarial attacks with random sign flip,” in ECCV, 2020.
- [74] Y. Lu, Y. Jia, J. Wang, B. Li, W. Chai, L. Carin, and S. Velipasalar, “Enhancing cross-task black-box transferability of adversarial examples with dispersion reduction,” in CVPR, 2020.
- [75] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, “The pascal visual object classes (voc) challenge,” IJCV, 2010.
- [76] A. Athalye, N. Carlini, and D. Wagner, “Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples,” in ICML, 2018.
- [77] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in MICCAI, 2015.
Xiaogang Xu is currently a fourth-year PhD student at the Chinese University of Hong Kong. He received his bachelor's degree from Zhejiang University and obtained the Hong Kong PhD Fellowship in 2018. He serves as a reviewer for CVPR, ICCV, ECCV, AAAI, ICLR, NIPS, and IJCV. His research interests include deep learning, generative adversarial networks, adversarial attack and defense, etc.
Hengshuang Zhao is currently an Assistant Professor in the Department of Computer Science at The University of Hong Kong. He received the PhD degree in Computer Science and Engineering from The Chinese University of Hong Kong and worked as a postdoctoral researcher at the University of Oxford and the Massachusetts Institute of Technology. He and his team won several championships in competitive academic challenges such as ImageNet Scene Parsing, LSUN Semantic Segmentation, WAD Drivable Area Segmentation, and Embodied AI Social Navigation. His general research interests cover the broad area of computer vision and machine learning, with special emphasis on high-level scene recognition and pixel-level scene understanding. He is a member of the IEEE.
Philip Torr received the PhD degree from the University of Oxford. After working for another three years at Oxford as a research fellow, he worked for six years at Microsoft Research, first in Redmond and then in Cambridge, founding the vision side of the Machine Learning and Perception Group. He then became a Professor in Computer Vision and Machine Learning at Oxford Brookes University. He is now a professor at Oxford University. He is a BMVA Distinguished Fellow, ELLIS Fellow, Royal Academy of Engineering Fellow, Royal Society Fellow, and Turing AI World-Leading Researcher Fellow.
Jiaya Jia received the PhD degree in Computer Science from Hong Kong University of Science and Technology in 2004 and is currently a full professor in the Department of Computer Science and Engineering at the Chinese University of Hong Kong (CUHK). He was a visiting scholar at Microsoft Research Asia from March 2004 to August 2005 and conducted collaborative research at Adobe Systems in 2007. He is an Associate Editor-in-Chief of IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) and serves on the editorial board of the International Journal of Computer Vision (IJCV). He has continuously served as an area chair for ICCV, CVPR, AAAI, ECCV, and several other conferences, and has been on the program committees of major conferences in graphics and computational imaging, including ICCP, SIGGRAPH, and SIGGRAPH Asia. He received the Young Researcher Award 2008 and the Research Excellence Award 2009 from CUHK. He is a Fellow of the IEEE.