Adversarial Parameter Attack on Deep Neural Networks††thanks: This work is partially supported by NSFC grant No.11688101 and NKRDP grant No.2018YFA0704705.
Abstract
In this paper, a new parameter perturbation attack on DNNs, called the adversarial parameter attack, is proposed, in which small perturbations to the parameters of the DNN are made such that the accuracy of the attacked DNN does not decrease much, but its robustness becomes much lower. The adversarial parameter attack is stronger than previous parameter perturbation attacks in that the attack is more difficult for users to recognize and the attacked DNN gives a wrong label for any modified sample input with high probability. The existence of adversarial parameters is proved. For a DNN with a parameter set satisfying certain conditions, it is shown that if the depth of the DNN is sufficiently large, then there exists an adversarial parameter set for such that the accuracy of is equal to that of , but the robustness measure of is smaller than any given bound. An effective training algorithm is given to compute adversarial parameters, and numerical experiments are used to demonstrate that the algorithm is effective in producing high-quality adversarial parameters.
Keyword. Adversarial parameter attack, adversarial samples, robustness measurement, adversarial accuracy, mathematical theory for safe DNN.
1 Introduction
The deep neural network (DNN) [15] has become the most powerful machine learning method and has been successfully applied in computer vision, natural language processing, and many other fields. Safety is a key desired feature of DNNs and has been studied extensively [1, 5, 42].
The most widely studied safety issue for DNNs is the adversarial sample attack [31], that is, it is possible to intentionally make small modifications to a sample, essentially imperceptible to the human eye, such that the DNN outputs a wrong label or even a label chosen by the adversary. The existence of adversarial samples makes DNNs vulnerable in safety-critical applications, and many effective methods were proposed to develop DNNs that are more robust against adversarial attacks [20, 1, 5, 42]. However, it was shown that adversarial samples are inevitable for current DNN models in a certain sense [6, 3, 27].
More recently, parameter perturbation attacks [18, 43, 7, 33, 30, 37, 34, 35] were studied and shown to be another serious safety threat to DNNs. It was shown that by making small parameter perturbations, the attacked DNN can give wrong or desired labels to specified input samples while still giving the correct labels to other samples [18, 43, 34, 35].
In this paper, the adversarial parameter attack is proposed, in which small perturbations to the parameters of a DNN are made such that the attack on the DNN is essentially imperceptible to the user, but the robustness of the DNN becomes much lower. The adversarial parameter attack is stronger than previous parameter perturbation attacks in that not only the accuracy but also the robustness of the DNN is considered.
1.1 Contributions
Let be a DNN with as the parameter set. A parameter perturbation is called a set of adversarial parameters of or , if the following conditions are satisfied: 1) is a small modification of , for instance for a small positive number ; 2) the accuracy of over a distribution of samples is almost the same as that of ; 3) is much less robust than , that is, has many more adversarial samples than . It is clear that conditions 1) and 2) make the attack difficult for users to recognize, and condition 3) makes the new DNN less safe. The DNN obtained by the above attack is called an adversarial DNN, which has high accuracy but low robustness.
The existence of adversarial parameters is proved under certain assumptions. It is shown that if the depth of a trained DNN is sufficiently large, then there exist adversarial parameters such that the accuracy of is equal to that of , but the robustness measure of is as small as desired (refer to Corollaries 4.4 and 4.5). Since is a continuous function of , if is an adversarial parameter set for , then there exists a small sphere with as center such that all parameters in are also adversarial parameters for . These results imply that adversarial parameters are inevitable in a certain sense, similar to adversarial samples [6, 3, 27].
The existence of adversarial samples is usually demonstrated with numerical experiments, apart from a few cases to be mentioned in the next section. As an application of adversarial parameters, we can construct DNNs which are guaranteed to have adversarial samples. For a trained DNN satisfying certain conditions, it is shown that there exist adversarial parameters such that the accuracy of is equal to that of , but has adversarial samples near a given normal sample (refer to Theorem 4.1), or the probability for to have adversarial samples over a distribution of samples is at least (refer to Theorem 4.2).
Finally, an effective training algorithm is given to compute adversarial parameters, and numerical experiments are used to demonstrate that the algorithm is effective in producing high-quality adversarial parameters for the networks VGG19 and Resnet56 on the CIFAR-10 dataset.
1.2 Related work
There is a vast literature on adversarial attacks, which can be found in the survey papers [1, 5, 42]. We focus on the work that is most closely related to ours.
Parameter perturbation attacks. Parameter perturbation attacks were given under different names, such as fault injection attack, fault sneaking attack, stealth attack, and weight corruption. The fault injection attack was first proposed by Liu et al. [18], where it was shown that parameter perturbations can be used to misclassify one given input sample. In [7], it was shown that laser injection techniques can be used as a successful fault injection attack in real-world applications. In [43], the fault sneaking attack was proposed, where multiple input samples were misclassified and other samples were still given the correct labels. In [37], lower bounds were given for parameter perturbations under which the network still gives the correct label for a given sample. In [33], upper bounds were given for the changes of the pairwise class margin function and the Rademacher complexity under parameter perturbations, and new loss functions were given to obtain more robust networks. In [30], the maximum change of the loss function over given samples was used as an indicator to measure the robustness of DNNs against parameter perturbations, and gradient descent methods were used to compute the indicator. In [34, 35], the stealth attack was proposed, which is guaranteed to make the attacked DNN give a desired label for a sample outside of the validation set while keeping the correct labels for samples in the validation set. The stealth attack has the form , where is the DNN to be attacked and is a DNN with one hidden layer.
The adversarial parameter attack proposed in this paper is stronger than previous parameter perturbation attacks in the following aspects. First, by keeping the accuracy and eliminating the robustness, the adversarial parameter attack is more difficult to recognize, because the attacked DNN performs almost the same as the original DNN on the test set. Second, by reducing the robustness of the DNN, the attacked DNN gives a wrong label for any modified input sample with high probability, while previous parameter attacks usually misclassify certain given samples. Finally, we prove the existence of adversarial parameters under reasonable assumptions.
Mathematical theories of adversarial samples. The existence of adversarial samples was usually demonstrated with numerical experiments, and mathematical theories are desired. In [6], it was proved that for DNNs with a fixed architecture, there exist uncountably many classification functions and distributions of samples such that adversarial samples always exist for any successfully trained DNN with the given architecture and sample distribution. For the stealth attack [34, 35], it was proved that there exist attacked DNNs which give a desired label for a sample outside of the validation set by modifying the DNN. In this paper, we show that by making small perturbations to the parameters of the DNN, the DNN has adversarial samples with high probability.
Theories for certified robustness of DNNs were given in several aspects. Let be a sample for which the DNN gives the correct label. Due to the continuity of the DNN function, a sphere with as center does not contain adversarial samples if its radius is sufficiently small, and such a sphere is called a robust sphere of . In [12], lower bounds for the robustness radius were computed and used to enhance the robustness of the DNN. In [24], for shallow networks, upper bounds on the changes of the network under input perturbations were given and used to obtain more robust DNNs. In [8], the random smoothing method was proposed and lower bounds for the radius of the robust spheres were given. In [39], lower bounds for the average radius of robust spheres over a distribution of samples were given. Universal lower bounds on stability in terms of the dimension of the domain of the classification function were also given in [27, 34]. However, these bounds are usually inverse-exponentially dependent on the depth of the DNN and hence very small for the deep networks used in real-world applications. In [40], the information-theoretically safe bias classifier was introduced by making the gradient of the DNN random.
Algorithms to train robust DNNs. Many effective methods were proposed to train more robust DNNs to defend against adversarial samples [1, 5, 42, 38]. Methods to train DNNs which are more robust against parameter perturbation attacks were also proposed [18, 43]. The adversarial training method proposed by Madry et al. [20] can reduce adversarial samples significantly; it minimizes the value of the loss function of the worst adversary in a small neighborhood of each training sample. In this paper, the idea of adversarial training is used to compute adversarial parameters.
2 Adversarial parameters
In this section, we define the adversarial parameters and give a measurement for the quality of the adversarial parameters.
2.1 Adversarial parameters of DNNs
Let us consider a standard DNN. Denote and for . In this paper, we assume that is a classification DNN for objects, which has hidden-layers, all hidden-layers use ReLU as the activation function, and the output layer has no activation function. can be written as
(1) |
Denote to be the parameter set of and is denoted as if the parameters need to be mentioned explicitly, where .
Let be a trained network with the parameter set . Then a new parameter set is called a set of adversarial parameters of if 1) is a small perturbation of ; 2) the accuracy of is almost the same as that of ; 3) is much less robust compared to , that is, has more adversarial samples than .
We assume that the objects to be classified satisfy a distribution in , and a sample is called a normal sample. Let be the parameter set of a trained network . For , denote to be the label of and to be the classification result of . Then the accuracy of for the normal samples is
(2) |
In order to measure the quality of adversarial parameters, we need a robustness measure of for the normal samples. There exist several definitions for [20, 39]. In this paper, two kinds of robustness measures are used.
We first give two robustness measures of on a given sample . The robustness radius of under the norm for is defined to be
(3) |
If , then the robustness radius of is zero. It is difficult to obtain a good estimate of the robustness radius, and the following approximation to the robustness radius under the norm [12] is often used
(4) |
where is the -th coordinate of , , satisfy ( iff ), and the indicator takes value 1 if its condition is true and 0 otherwise.
For a distribution of samples, we define two global robustness measures corresponding to and . The adversarial accuracy can be used as . For and , the adversarial accuracy of is
(5) |
Corresponding to in (4), we have the following global robustness measure
(6) |
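To make the two robustness measures concrete, the sketch below estimates a per-sample robustness radius and averages it over a data set. Since the exact expression in (4) is not displayed above, the sketch substitutes the standard margin-over-gradient bound of [12]; the function names, the PyTorch framework, and the choice of dual norm are our own illustrative assumptions, and the average plays the role of the global measure in (6).

```python
import torch

def approx_robust_radius(model, x, label, num_classes=10):
    """Margin-over-gradient estimate of the robustness radius at a single
    sample x, in the spirit of [12]; the exact expression (4) may differ."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x.unsqueeze(0)).squeeze(0)
    if int(logits.argmax()) != label:
        return 0.0                                  # misclassified: radius is zero
    radius = float("inf")
    for j in range(num_classes):
        if j == label:
            continue
        margin = logits[label] - logits[j]
        grad = torch.autograd.grad(margin, x, retain_graph=True)[0]
        denom = grad.abs().sum().clamp_min(1e-12)   # l1 norm, dual of l-infinity
        radius = min(radius, float(margin / denom))
    return radius

def average_approx_radius(model, dataset):
    """Empirical counterpart of the global measure (6): the average of the
    per-sample estimates over a set of (sample, label) pairs."""
    values = [approx_robust_radius(model, x, int(y)) for x, y in dataset]
    return sum(values) / max(len(values), 1)
```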
We now define a measurement for an adversarial parameter set using the accuracy and robustness of .
Definition 2.1.
Let be an adversarial parameter set of , a robustness measure of for normal samples, and
(7) |
Then the adversarial rate of is defined to be , where .
In general, we have and . The value of measures the ability of to keep the accuracy of on normal samples; if is large, then the attack is more difficult to detect. The value of measures the ability of to break the robustness of ; if is large, then the parameter attack is more powerful. Hence, the adversarial rate measures the quality of the adversarial parameter attack: the larger the adversarial rate, the better the attack. If attains its maximal value , then and , and the adversarial parameter attack is a perfect attack in that it does not change the accuracy of but totally destroys its robustness.
Remark 2.1.
In order to make the attack very hard to detect, we can give a lower bound to . If , we consider to be a failed attack.
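Since the precise formula (7) is not displayed above, the following small function is a reconstruction of the adversarial rate rather than the paper's definition: it combines a1 (how well the accuracy is kept) and a2 (how much of the robustness is destroyed) as a product, a choice that reproduces the adversarial rates reported in the tables of section 5; the threshold of Remark 2.1 is also not shown, so 0.9 is used purely for illustration.

```python
def adversarial_rate(acc_orig, acc_new, rob_orig, rob_new, min_acc_ratio=0.9):
    """Reconstructed adversarial rate of Definition 2.1: a1 measures how well
    the attacked network keeps the accuracy, a2 how much of the robustness it
    destroys; the rate is their product and equals 1 for a perfect attack."""
    a1 = min(1.0, acc_new / acc_orig)     # ability to keep the accuracy
    a2 = 1.0 - rob_new / rob_orig         # ability to break the robustness
    failed = a1 < min_acc_ratio           # Remark 2.1: accuracy dropped too much
    return a1 * a2, failed

# Example with the Table 4 row AC 80 -> 77, AA 45 -> 30: rate about 0.32.
print(adversarial_rate(0.80, 0.77, 0.45, 0.30))   # (0.3208..., False)
```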
2.2 Adversarial parameter attacks for other purposes
According to the requirements of specific applications, we may define other types of adversarial parameter attacks.
The adversarial parameters defined in section 2.1 are for all the samples. In certain applications, it is desired to make the network less robust on one specific class of samples, which motivates the following definitions.
The simplest case is adversarial parameters for a given sample. A small perturbation of is called adversarial parameters for a given sample , if gives the correct label to and has adversarial samples of in for a given . Let be a measure of robustness of at sample , and
(8) |
Then the adversarial rate of is defined to be .
We can also break the stability of samples with a given label. A small perturbation of is called adversarial parameters for samples with a given label , if keeps the accuracy for all normal samples and the robustness for normal samples whose label is not , but breaks the robustness of samples with label . Let and . For such adversarial parameters , let
(9) |
Then the adversarial rate of is defined as .
Finally, instead of breaking the robustness of samples with label , we can break the accuracies for samples with label . Such adversarial parameters are called direct adversarial parameters. Let be a direct adversarial parameter set and
(10) |
Then the adversarial rate is defined as . The above definition is similar to the attacks in [18, 43], but robustness is considered as an extra objective.
3 Algorithm
In this section, we give algorithms to compute adversarial parameters.
3.1 Compute adversarial parameters under norm
We formulate the adversarial parameter attack for a trained DNN under the norm as the following optimization problem for a given .
(11) |
where and are the accuracy and a robustness measure of over a distribution of samples .
Remark 3.1.
Theoretically, the adversarial parameter attack should be formulated as a multi-objective optimization problem, that is, to maximize the accuracy and to minimize the robustness. However, such an optimization problem is difficult to solve.
Remark 3.2.
In the rest of this section, we show how to turn formula (11) into an effective algorithm for computing norm adversarial parameters using the robustness measure in (5). We first show how to compute the robustness in (5) explicitly. We use adversarial training [20] to do this, which is the most effective way to find adversarial samples. For a sample and a small number , we first compute
with PGD [20] and then use
(12) |
to measure the robustness of at .
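A minimal PyTorch sketch of this step is given below: pgd_attack is the standard PGD adversary of [20] inside the ball of the given radius, and robust_at is a simple correct/incorrect indicator used in place of the measure (12), which is not displayed above; the step-size heuristic and the 10-step default are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha=None, steps=10):
    """Projected gradient descent of Madry et al. [20] inside the
    l-infinity ball of radius eps around x (a batch of inputs in [0,1])."""
    alpha = alpha if alpha is not None else 2.5 * eps / steps  # heuristic step size (assumption)
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()             # gradient ascent step
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)    # project back into the ball
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()

def robust_at(model, x, y, eps):
    """Simple stand-in for the per-sample measure (12): 1 if the PGD
    adversary fails to change any label in the batch, 0 otherwise."""
    x_adv = pgd_attack(model, x, y, eps)
    with torch.no_grad():
        return int((model(x_adv).argmax(dim=1) == y).all())
```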
We need a training set to find the adversarial parameters. The training procedure consists of two phases. In the first pre-training phase, the loss function
(13) |
is used to reduce the adversarial accuracy of . In the second main training phase, the loss function
(14) |
is used to promote the accuracy while keeping the adversarial accuracy of low, which corresponds to formula (11).
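The following sketch outlines this two-phase procedure in PyTorch. Since the losses (13) and (14) are not displayed above, the pre-training phase simply maximizes the cross-entropy on PGD adversaries of the training samples (to reduce the adversarial accuracy), and the main phase adds a clean cross-entropy term (to recover the accuracy); the learning rate, the trade-off weight lam, and the epoch counts are illustrative, pgd_attack is the sketch given earlier, and project_parameters is the projection sketched after Remark 3.3.

```python
import copy
import torch
import torch.nn.functional as F

def adversarial_parameter_attack(model, loader, xi, eps_x,
                                 pre_epochs=1, main_epochs=3, lr=1e-4, lam=1.0):
    """Two-phase search for adversarial parameters (an illustrative rendition,
    not the paper's Algorithm 1)."""
    theta0 = copy.deepcopy(model.state_dict())       # reference parameters
    opt = torch.optim.SGD(model.parameters(), lr=lr)

    def step(x, y, with_clean_term):
        x_adv = pgd_attack(model, x, y, eps_x)
        loss = -F.cross_entropy(model(x_adv), y)               # push adversarial accuracy down
        if with_clean_term:
            loss = loss + lam * F.cross_entropy(model(x), y)   # keep the clean accuracy
        opt.zero_grad()
        loss.backward()
        opt.step()
        project_parameters(model, theta0, xi)        # stay inside the perturbation set

    for _ in range(pre_epochs):                      # phase 1: role of loss (13)
        for x, y in loader:
            step(x, y, with_clean_term=False)
    for _ in range(main_epochs):                     # phase 2: role of loss (14)
        for x, y in loader:
            step(x, y, with_clean_term=True)
    return model
```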
We will compute a more general norm parameter perturbation. Let and let be the -th coordinate of . Then the parameter perturbation will be found in
It is clear that the usual norm parameter perturbation is a special case of the above general form. We use this general form because we want to include more types of parameter perturbations, which are given in section 5.2. A sketch of the algorithm is given below.
Remark 3.3.
We give more details for the algorithm.
(1). maps into as follows (see the code sketch after this remark): for :
If , ;
If , ;
If , .
(2). We will reduce the training steps as the training goes on.
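The clamp described in item (1) can be written as a single projection step. The sketch below assumes the reference parameters are stored as a copied state_dict, and the option relative=True corresponds to the perturbation set used in section 5.2, where the bound on each coordinate is proportional to the original weight.

```python
import torch

@torch.no_grad()
def project_parameters(model, theta0, eps, relative=True):
    """Proj operator of Remark 3.3: clamp every parameter coordinate back into
    [theta_i - eta_i, theta_i + eta_i], where theta is the original parameter
    set stored in the (copied) state_dict theta0."""
    for name, p in model.named_parameters():
        ref = theta0[name]
        eta = eps * ref.abs() if relative else torch.full_like(ref, eps)
        p.copy_(torch.min(torch.max(p, ref - eta), ref + eta))
```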
3.2 Algorithms for other kinds of adversarial parameters
The algorithms to find adversarial parameters under other norms and robustness measures can be developed similarly. In what follows, we show how to compute adversarial parameters under the norm, which is different from the other cases. The overall algorithm is similar to Algorithm 1, except that we use a new method to update the parameters. Suppose that is the parameter to be updated and the value of in Algorithm 1 has been found. We will show how to update the parameters. We only change some weight matrices, as follows.
- Randomly select two entries and of until is satisfied.
- Exchange and in to obtain the new parameters.
It is clear that the change will make become smaller. In total, we update a given number of weight matrices, and for each such matrix, we change a given percentage of its entries. The details of the algorithm are omitted. Note that the above parameter perturbation keeps the sparsity and the values of the entries of the weight matrices. As a consequence the Proj operator in the algorithm can be taken as the identity map.
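A sketch of this swap-based update is given below; the accept hook stands in for the selection condition on the chosen pair, which is not displayed above, and the fraction of swapped entries is an illustrative parameter.

```python
import torch

@torch.no_grad()
def swap_perturb(weight, swap_fraction=0.01, accept=None):
    """Swap randomly chosen pairs of entries of a weight matrix (section 3.2).
    The sparsity and the set of entry values are preserved, so the projection
    step can be the identity map."""
    flat = weight.view(-1)                                   # view shares storage with weight
    n_swaps = max(1, int(swap_fraction * flat.numel() / 2))
    done = 0
    while done < n_swaps:
        i, j = torch.randint(0, flat.numel(), (2,)).tolist()
        if i == j:
            continue
        if accept is not None and not accept(flat[i].item(), flat[j].item()):
            continue                                         # pair rejected by the condition
        flat[i], flat[j] = flat[j].item(), flat[i].item()    # exchange the two entries
        done += 1
```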
The adversarial parameters defined in section 2.2 can also be obtained similarly. For instance, to compute the adversarial parameters for one sample , we just need to let in Algorithm 1 to be .
To compute adversarial parameters for samples with a given label , by (9) we can use the following loss function
(15) |
to increase the robustness and accuracy of samples whose labels are not and to reduce the robustness of samples with label .
To compute direct adversarial parameters for samples with label , by (10) we can use the following loss function
(16) |
to increase the robustness and accuracy of samples whose labels are not and to reduce the accuracy for samples with labels .
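The two label-targeted losses can be sketched with a single helper that treats the target label differently from the rest. Since (15) and (16) are not displayed above, the cross-entropy terms below are stand-ins chosen to match the verbal descriptions, with lam an illustrative trade-off weight and pgd_attack the sketch from section 3.1.

```python
import torch
import torch.nn.functional as F

def label_attack_loss(model, x, y, target_label, eps_x, direct=False, lam=1.0):
    """Samples whose label differs from target_label contribute terms that
    preserve clean and adversarial accuracy; samples with the target label
    contribute a negated term that breaks their robustness (role of (15)) or,
    with direct=True, their accuracy (role of (16))."""
    mask = (y == target_label)
    x_adv = pgd_attack(model, x, y, eps_x)          # PGD sketch from section 3.1
    zero = x.new_zeros(())
    keep = zero
    if (~mask).any():                               # keep accuracy and robustness
        keep = F.cross_entropy(model(x[~mask]), y[~mask]) \
             + F.cross_entropy(model(x_adv[~mask]), y[~mask])
    attack = zero
    if mask.any():
        x_l = x[mask] if direct else x_adv[mask]    # direct attack hurts accuracy
        attack = F.cross_entropy(model(x_l), y[mask])
    return lam * keep - attack
```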
4 Existence of adversarial parameters
In this section, we will show that adversarial parameters with high adversarial rates exist under certain conditions.
4.1 Adversarial parameters to achieve low adversary accuracy
In this section, we use the robustness radius in (3) and the adversarial accuracy in (5) as the robustness measures, and hence the existence of adversarial parameters implies low adversarial accuracy.
We introduce several notations. Let for , and for , where is the -th row of . If is a network, we use to denote the -th coordinate of .
In this section, we consider the following network with one hidden layer
(17) |
where . is the parameter set of , where .
The network defined in (17) has just one hidden-layer. We will show that when the width of its hidden-layer is large enough, adversarial parameters exist under certain conditions.
We will consider adversarial parameters. For , the hypothesis space for the adversarial networks of is
The following theorem shows the existence of adversarial parameters for a given sample . The proof of the theorem is given in section 6.1.
Theorem 4.1.
Let be a trained network with structure in (17), which gives the correct label for a sample . Further assume the following conditions.
- Let such that for all and .
- for all , .
- At least coordinates of are bigger than , where and .
For such that , if , then there exists an such that and has adversarial samples to in .
We have the following observations from Theorem 4.1.
Corollary 4.1.
Remark 4.1.
From Theorem 4.1, if the width of is sufficiently large, then has adversarial parameters which is as close as possible to and has adversarial samples which are as close as possible to .
Remark 4.2.
Since is a continuous function of , if is an adversarial parameter set for , then there exists a small sphere with as center such that all parameters in are also adversarial parameters for .
From the above two remarks, we may say that adversarial parameters are inevitable in this case.
The following theorem shows that when is large enough, adversarial parameters exist for a distribution with high probability. The proof of the theorem is given in section 6.1.
Theorem 4.2.
Let be a trained DNN with structure in (17) and the set of normal samples. Further assume the following conditions.
- Let such that for all and .
- for all , , where .
- For all , at least coordinates of are bigger than , where .
- The dimension of is lower than .
For such that , if , then there exists an such that the accuracy of over is greater than or equal to that of and
Corollary 4.2.
Let the adversarial accuracy of be ; then Theorem 4.2 implies that there exist adversarial parameters of with adversarial rate at least .
Remark 4.3.
The conditions of Theorems 4.1 and 4.2 can be satisfied by most DNNs. The parameters and in condition clearly exist. Since the training procedure usually terminates near a local minimum of the loss function, the weights can be considered as random values [39], and hence conditions and can be satisfied. For practical examples such as MNIST and CIFAR-10, condition is clearly satisfied.
4.2 Adversarial parameters for DNNs
In this section, we consider networks of the following form
(18) |
Let be the parameters and , where . We use to represent the output of the -th layer of , where . We will show that when becomes large, adversarial parameters exist.
We first prove the existence of adversarial parameters for a given sample. We use the following robustness measure for network at a sample
It is easy to see that this is the square of in (4) with .
Theorem 4.3.
Let be a trained network with structure in (18), which gives the correct label for a sample . Further assume the following conditions.
- for .
- for .
- for , and .
- For , has a column such that the angle between and is bigger than and smaller than , where .
Then for , there exists an such that and
where . In other words, there exists an with adversarial rate .
The proof of the theorem is given in section 6.2. As a consequence of Theorem 4.3, there exist adversarial parameters for sample , whose robustness measure is as small as possible.
Corollary 4.3.
For , if , then the adversarial rate of is .
Corollary 4.4.
For satisfying , if , then .
To find adversarial parameters for samples under a distribution , we use the following robustness measure for :
This is a variant of in (6) with . The following theorem shows that adversarial parameters exist for this robustness measure. The proof of the theorem is given in section 6.2.
Theorem 4.4.
Let be a trained DNN with structure in (18) and the set of normal samples satisfying distribution . Further assume the following conditions.
- for all samples and .
- , where and .
- for , where and .
- is in a low dimensional subspace of and , where , , and .
For , let be the set of networks in , whose accuracies are equal to or larger than that of . Then
where . In other words, there exists an with adversarial rate .
We can make the robustness of the perturbed network as small as possible.
Corollary 4.5.
In Theorem 4.4, if satisfy , , , , for , then
Furthermore, for , if , then there exists an whose adversarial rate is .
Furthermore, for satisfying , if , then there exists an such that .
Remark 4.4.
From Corollary 4.5, if the depth of the DNN is sufficiently large, then there exist adversarial parameters such that the attacked network has a robustness measure as small as desired.
Remark 4.5.
In practical computation, we use a finite set of samples satisfying , and is approximately computed as . Since is a continuous function of , if is an adversarial parameter set for and is used as the robustness measure, then there exists a small sphere with as center such that all parameters in are also adversarial parameters for .
The above remarks show that adversarial parameters are inevitable in a certain sense.
Remark 4.6.
In the model (18), two simplifications are made. However, the results proved in this section can be generalized to general DNNs. First, the bias vectors are not considered, which can be included as parts of the weight matrices by extending the input space slightly, similar to [22]. Second, it is assumed that for . This assumption could be removed by assuming .
Remark 4.7.
Using , the results in Theorem 4.4 cannot be obtained yet. However, we will use numerical experiments to show that the result is also valid for .
5 Experimental results
5.1 The setting
We use two networks: VGG19 [29] and Resnet56 [11]. We write VGG19 as and Resnet56 as ; both are trained with adversarial training [20]. The experimental results are for the CIFAR-10 dataset.
We use both the adversarial accuracy in (5) and the approximate robustness radius in (6) to compute the adversarial rate. For a given data set , the adversarial accuracy defined in (5) can be approximately computed with PGD [20] as follows
where . In the experiment, we set . The approximate robust radius in (6) can be computed as follows
where the -norm is used, since we consider adversarial samples.
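The two evaluation quantities can be estimated as in the sketch below: adversarial_accuracy is the empirical version of (5) using the PGD sketch from section 3.1 with 10 steps, and the AAR column can be obtained from the average_approx_radius sketch given in section 2.1; clean_accuracy computes the accuracy column. The function names and the batching are our own.

```python
import torch

@torch.no_grad()
def clean_accuracy(model, loader, device="cpu"):
    """Fraction of correctly classified samples (the AC column)."""
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        correct += int((model(x).argmax(dim=1) == y).sum())
        total += y.numel()
    return correct / total

def adversarial_accuracy(model, loader, eps, device="cpu"):
    """Empirical version of (5): fraction of samples that keep the correct
    label under a 10-step PGD attack with budget eps (the AA column)."""
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y, eps, steps=10)   # sketch from section 3.1
        with torch.no_grad():
            correct += int((model(x_adv).argmax(dim=1) == y).sum())
        total += y.numel()
    return correct / total
```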
The accuracies, adversarial accuracies, and AARs of and under the norm attack are given in Table 1, which are close to the state-of-the-art results for these DNNs.
Net | AC | AA | AAR
---|---|---|---
VGG19 | 80 | 45 | 0.0770
Resnet56 | 83 | 49 | 0.0194
5.2 Adversarial parameter attack
Let be the parameter set of or , and two kinds of parameter perturbation attacks will be carried out:
perturbation for : We consider parameter perturbations in , where for . In other words, is the perturbation ratio. Also, the BN-layers will be changed to compute this kind of perturbations.
perturbation for : weight matrices are perturbed and pairs of weights are changed for with the method given in section 3.2 ( pairs of weights are changed for ), where is the number of entries of . The BN-layers will not be changed to compute this kind of perturbations.
We set in Remark 2.1 to be , that is, if the accuracy of the perturbed network is less than of that of the original DNN, then the attack is considered failed.
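For reference, the random baseline used in the next subsection can be drawn as in the sketch below; uniform sampling inside the coordinate-wise interval is our assumption, since the text only states that the perturbation is random.

```python
import torch

@torch.no_grad()
def random_perturbation(model, xi):
    """Move every parameter coordinate by a random amount inside
    [-xi*|theta_i|, xi*|theta_i|], i.e. within the perturbation ratio xi."""
    for p in model.parameters():
        p.add_((2 * torch.rand_like(p) - 1) * xi * p.abs())
```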
5.2.1 Random parameter perturbation
We do random parameter perturbations and will use them as bases for comparisons. The results are given in Table 2.
Attack | AC | AA | AR |
No attack | 80 | 45 | 0 |
80 | 39 | 0.13 | |
80 | 37 | 0.17 | |
79 | 36 | 0.2 | |
78 | 35 | 0.22 | |
78 | 34 | 0.24 | |
67 | 23 | 0.41(fail) | |
61 | 20 | 0.42(fail) | |
57 | 19 | 0.41(fail) |
For the perturbations, the accuracies are kept high, but the robustness does not decrease much, so the adversarial rates are low. For the perturbations, the accuracy decreases too much and the attacks are considered failed. In either case, random perturbations are not good adversarial parameters. Thus adversarial parameters are sparse around the trained parameters, which is similar to adversarial samples [40].
Attack | AC | AA | AR |
---|---|---|---|
No attack | 83.1 | 48.5 | 0 |
82.9 | 48.0 | 0.004 | |
82.7 | 48.0 | 0.011 | |
82.3 | 47.5 | 0.022 | |
81.7 | 46.6 | 0.039 | |
81.0 | 45.5 | 0.061 | |
81.5 | 47.8 | 0.016 | |
80.7 | 45.8 | 0.054 | |
80.5 | 45.7 | 0.057 |
Results of random perturbations for are given in Table 3. From the results, we can see that network is much more robust against random parameter perturbations than .
5.2.2 Adversarial parameter attack on and
We use algorithms in section 3 to create adversarial parameters. The training set contains 500 samples for which give the correct label. The average results are given in Table 4.
Attack | AC | AA in (5) | ARR in (6) | ||
---|---|---|---|---|---|
AR | AR | ||||
No attack | 80 | 45 | 0 | 0.0770 | 0 |
78 | 38 | 0.15 | 0.0667 | 0.13 | |
77 | 30 | 0.32 | 0.0481 | 0.36 | |
76 | 22 | 0.49 | 0.0372 | 0.49 | |
76 | 10 | 0.74 | 0.0195 | 0.71 | |
77 | 8 | 0.79 | 0.0143 | 0.78 | |
72 | 27 | 0.36 | 0.0441 | 0.38 | |
76 | 24 | 0.44 | 0.0443 | 0.40 | |
74 | 22 | 0.47 | 0.0404 | 0.44 |
Comparing Tables 2 and 4, we can see that the algorithms in section 3 can be used to create good adversarial parameters, especially for the attack. From Figure 1, we can see that the adversarial rate and the adversarial accuracy for the attack are nearly linear in when is small, and gradually stabilize as increases. Also, when is very small, say , the adversarial parameter attacks do not create good results, which means the network is approximately safe against these attacks for . For the attacks, we can see that the accuracies increase a lot compared to the random perturbations and adversarial parameters are obtained successfully. Also, for the two kinds of robustness measurements, the adversarial rates are very close. For the network , similar results are obtained and are given in Table 5.
Figure 1: Adversarial accuracy and adversarial rate of the adversarial parameter attack as functions of the perturbation ratio.
Attack | AC | AA in (5) | ARR in (6) | ||
---|---|---|---|---|---|
AR | AR | ||||
No attack | 83 | 49 | 0 | 0.0194 | 0 |
84 | 39 | 0.20 | 0.0151 | 0.22 | |
85 | 27 | 0.45 | 0.0127 | 0.35 | |
86 | 14 | 0.72 | 0.0092 | 0.53 | |
87 | 6 | 0.87 | 0.0064 | 0.67 | |
87 | 1 | 0.98 | 0.0044 | 0.77 | |
80 | 26 | 0.45 | 0.0193 | 0 | |
78 | 18 | 0.59 | 0.0187 | 0.03 | |
72 | 9 | 0.71 | 0.0125 | 0.33 |
5.2.3 Effect of network depth and width on the adversarial parameter attack
We check how the network depth and width affect the adversarial parameter attack. We use the adversarial parameter attack for . Let be the network which has the same width as but has more layers, and the network which has the same depth as but has times the width of . The results are given in Table 6. We can see that when the depth becomes larger, the attack becomes easier. This validates the results in section 4.2, for instance Corollary 4.5, which shows that when the depth of the network becomes large, adversarial parameters exist.
The attack is much less sensitive to the width. The reason may be that there is much redundancy in the width, similar to the results in [17, 25, 26], and this redundancy can lead to limited search directions in the feature space and poor generalization performance, as shown in [21], so the attack is hard to improve when the width becomes larger.
Network | ||||||||
---|---|---|---|---|---|---|---|---|
AC | AA | AC | AA | AR | AC | AA | AR | |
80 | 45 | 78 | 38 | 0.15 | 77 | 30 | 0.32 | |
80 | 47 | 76 | 37 | 0.20 | 78 | 32 | 0.31 | |
78 | 44 | 74 | 35 | 0.19 | 75 | 27 | 0.37 | |
79 | 43 | 73 | 30 | 0.28 | 73 | 20 | 0.49 | |
76 | 44 | 72 | 28 | 0.34 | 71 | 19 | 0.53 | |
80 | 42 | 77 | 37 | 0.12 | 76 | 29 | 0.29 | |
81 | 41 | 78 | 36 | 0.12 | 77 | 30 | 0.26 | |
78 | 44 | 76 | 39 | 0.11 | 74 | 31 | 0.28 | |
80 | 44 | 79 | 35 | 0.20 | 76 | 30 | 0.30 |
We can use to measure the robustness and similar results are obtained, which are given in Table 7.
Network | |||||
---|---|---|---|---|---|
AR | AR | ||||
0.0770 | 0.0667 | 0.13 | 0.0481 | 0.36 | |
0.0776 | 0.0686 | 0.11 | 0.0534 | 0.30 | |
0.0815 | 0.0652 | 0.19 | 0.0479 | 0.40 | |
0.0817 | 0.0630 | 0.21 | 0.0388 | 0.49 | |
0.0808 | 0.0608 | 0.23 | 0.0392 | 0.48 | |
0.0750 | 0.0663 | 0.11 | 0.0524 | 0.29 | |
0.0763 | 0.0670 | 0.12 | 0.0563 | 0.25 | |
0.0750 | 0.0650 | 0.13 | 0.0499 | 0.32 | |
0.0775 | 0.0678 | 0.12 | 0.0489 | 0.35 |
5.3 Direct adversarial parameters
We give experimental results for direct adversarial parameters introduced in section 2.2. We try to decrease the accuracies for samples with label 0 and keep the accuracies and robustness for other samples. The experimental results are for the network and CIFAR-10 and are given in Table 8.
Attack | AC1 | AA1 | AC0 | AR |
77 | 35 | 11 | 0.65 | |
78 | 40 | 3 | 0.83 | |
79 | 42 | 1 | 0.92 | |
80 | 43 | 1 | 0.95 | |
80 | 45 | 1 | 0.99 |
Comparing Tables 8 and 4, we can see that direct adversarial parameters for a given label are much easier to compute than adversarial parameters. High-quality direct adversarial parameters can be obtained using perturbation ratios . Results for the network are given in Table 9, from which we can see that it is slightly more difficult to attack .
Attack | AC1 | AA1 | AC0 | AR |
82 | 46 | 52 | 0.38 | |
81 | 47 | 35 | 0.57 | |
81 | 47 | 10 | 0.83 | |
81 | 46 | 4 | 0.90 | |
80 | 46 | 1 | 0.93 |
5.4 Adversarial parameters for a given sample
We give experimental results for adversarial parameters for a given sample, introduced in section 2.2. is used to measure the robustness of at a sample . Let be a subset of the test set containing 100 samples such that gives the correct label for all of them and all samples in are robust in that no adversarial samples were found using PGD-10 with .
Attack | AR | |||
before attack | 0.078 | 100 | 100 | 0 |
0.016 | 0 | 100 | 0.79 | |
0.010 | 0 | 100 | 0.87 | |
0.008 | 0 | 100 | 0.89 | |
0.006 | 0 | 100 | 0.92 | |
0.005 | 0 | 100 | 0.94 |
For each sample , we compute adversarial parameters, and the average results are given in Table 10, where is the number of robust samples and the number of samples which are given the correct labels. Comparing Tables 8, 4, and 10, we can see that the adversarial rates for a single sample are about the same as those for a given label. Similar results for the network are given in Table 11.
Attack | AR | |||
before attack | 0.057 | 100 | 100 | 0 |
0.008 | 0 | 100 | 0.86 | |
0.008 | 0 | 100 | 0.86 | |
0.008 | 0 | 100 | 0.86 | |
0.008 | 0 | 100 | 0.86 | |
0.008 | 0 | 100 | 0.86 |
6 Proofs for the theorems in section 4
6.1 Proofs of Theorems 4.1 and 4.2
We introduce several notations. Let for , and for , where is the -th row of . If is a network, we use to denote the -th coordinate of . A lemma is proved first.
Lemma 6.1.
Let and . Then there exists a vector such that , and .
Proof.
Let . We can assume . For any such that and , we have , which means .
We now define . Set if . Select a such that and let if and , and . For , let if . It is easy to check that , , and . The lemma is proved. ∎
We now prove Theorem 4.1.
Proof.
By Lemma 6.1, there exists a vector such that , and . Moreover, we can assume that at least coordinates of are bigger than . If this is not valid, we just need to change to , and then , since for all . By condition , at least coordinates of are bigger than , but fewer than coordinates of are bigger than , so at least coordinates of are bigger than .
Let such that , , all of whose rows are (the transposition of ), and . Let and
We will show that satisfies the condition of the theorem.
Since , we have and gives the correct label for . Since , we have , and thus .
So it suffices to show that will not give label , which means that has adversarial samples to in . Since
we have
Since for all , we have
Since and , each weight of is at least . Note that if and , then . As a consequence, if satisfies , then . Since at least coordinates of are bigger than , we have . By condition , it is easy to see
that is, . By condition , we have . Then we have
Thus if , then and the label of is not . The theorem is proved. ∎
We now prove Theorem 4.2.
Proof.
By condition , for , there exist such that , for , . Then .
By condition , at least coordinates of are bigger than or at least coordinates of are bigger than , similar to the proof of Theorem 4.1.
For convenience, we write as , where is the vector with entries 1. It is easy to see that, is the number of coordinates of that are bigger than . So we have or for all , and hence for , we have
For events and , . We thus have or . Without loss of generality, we can assume that for any , . Therefore,
For , let , where . Now assume , whose rows are all , and . Let , and
We will show that satisfies the conditions of the theorem.
It is easy to see that is in . For any , , which means and the accuracy of over is equal to that of .
Let satisfy and . By conditions and and similar to the proof of Theorem 4.1, we have
Thus does not give label to and has an adversarial sample of in . Furthermore, since , we have
The theorem is proved. ∎
6.2 Proofs of Theorems 4.3 and 4.4
We first prove two lemmas.
Lemma 6.2.
For , let be a non-empty bounded closed subset of such that implies . Also let be a non-empty bounded closed subset of such that implies . Let and for . Define maps: by , and for , by . Then
Proof.
For , let , which exists because is bounded and closed. Then
The lemma is proved. ∎
Lemma 6.3.
For , let be a non-empty closed subset of bounded functions from to such that implies . Also let be a non-empty closed subset of bounded functions from to such that implies . Assume , for . Define maps: by ; and by for . Then we have
Proof.
Denote , which must exist because is closed and its elements are bounded. Then
The lemma is proved. ∎
We now prove Theorem 4.3.
Proof.
Let be the set of satisfying and for . Note that satisfies linear equations and has variables. Since in DNNs, we can assume that is not empty. Let be the set of networks satisfying
where for all . We have for . So if . It is also easy to see that . So it suffices to prove
Let . Then
Then for all , we have
So we have
(19) |
To prove the theorem, we will first find a lower bound for . Let for . Then
Let and for . Then we have
Let , , . Then for all ,
Denote for . We have
Now, let , , where , . By Lemma 6.2, we have
It is easy to see that for any , , so by condition , we have . Let be the subset of containing those which have at most one nonzero row. Hence, for and , if at most one row of is nonzero, we have , where is the -th weight of and is the weight of at the -th row and -th column. Thus
Moreover, by condition , there exists a column of such that , where is the angle between . Therefore, there exists a vector such that , and considering condition , we have .
Then , because there must exist a whose only nonzero row is and . For , by condition , we have
Similarly,
Then we obtain the desired lower bound:
By condition and the lower bound just obtained, we have
The theorem is proved. ∎
We now prove Theorem 4.4.
Proof.
The proof is similar to that of Theorem 4.3, so certain details are omitted. Let , and the set of such that and for all and . must contain non-zero elements because of condition .
Let be the set of networks
where . Then for all , we have for and , so . As a consequence,
Let . Then
and
Therefore,
We will estimate . Let , where . Then . Also, for all ,
Denote . Then
Let , where . Then
and
.
Let , where , . By Lemma 6.3,
Let be the matrix whose -th row is equal to -th row of , and other rows are 0. Let be the -th row of . Then
By condition , we know that , and by condition and the pigeonhole principle, there exists a such that . Thus there exists a such that
Then
Let be such that only one row of is not . Then for , we have
By conditions and , we have . By condition and the pigeonhole principle, there exists a such that . So, there exists a such that
Then
Similarly, by condition , we have
Then we have the desired lower bound
By condition and the lower bound just obtained, we have
The theorem is proved. ∎
7 Conclusion
The adversarial parameter attack for DNNs is proposed. In this attack, the adversary makes small changes to the parameters of a trained DNN such that the attacked DNN keeps the accuracy of the original DNN as much as possible, but has robustness as low as possible. The goal of the attack is that the attack is imperceptible to the user while the robustness of the DNN is broken. The existence of adversarial parameters is proved under certain conditions, and effective adversarial parameter attack algorithms are also given.
In general, it is still out of reach to provide provably safe DNNs for real-world applications, and one of the ways forward is to develop safer DNN models and training methods and to evaluate the safety of the trained model against existing attack methods. In other words, a DNN to be deployed is considered safe if it is safe against existing attacks in a certain sense. From this viewpoint, it is valuable to have more attack methods. This is similar to cryptanalysis [13], where a much more mature theory and attack methods have been developed.
References
- [1] N. Akhtar and A. Mian. Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey. arXiv:1801.00553v3, 2018.
- [2] B. Allen, S. Agarwal, J. Kalpathy-Cramer, K. Dreyer. Democratizing AI. J. Am. Coll. Radiol., 16(7), 961-963, 2019.
- [3] A. Azulay and Y. Weiss. Why Do Deep Convolutional Networks Generalize so Poorly to Small Image Transformations? Journal of Machine Learning Research, 20, 1-25, 2019.
- [4] A. Athalye, N. Carlini, D. Wagner. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. Proc. ICML, PMLR, 2018: 274-283.
- [5] T. Bai, J. Luo, J. Zhao. Recent Advances in Understanding Adversarial Robustness of Deep Neural Networks. ArXiv:2011.01539, 2020.
- [6] A. Bastounis, A.C. Hansen, V. Vlačić. The Mathematics of Adversarial Attacks in AI - Why Deep Learning is Unstable Despite the Existence of Stable Neural Networks. arXiv:2109.06098, 2021.
- [7] J. Breier, X. Hou, D. Jap, L. Ma, S. Bhasin, Y. Liu. DeepLaser: Practical Fault Attack on Deep Neural Networks. arXiv:1806.05859, 2018.
- [8] J. Cohen, E. Rosenfeld, Z. Kolter. Certified Adversarial Robustness via Randomized Smoothing. Proc. ICML’2019, PMLR, 1310-1320, 2019.
- [9] G. Cybenko. Approximation by Superpositions of a Sigmoidal Function. Mathematics of Control, Signals and Systems, 2(4): 303-314, 1989.
- [10] C. Etmann, S. Lunz, P. Maass, C.B. Schonlieb. On the Connection Between Adversarial Robustness and Saliency Map Interpretability. arXiv:1905.04172, 2019.
- [11] K. He, X. Zhang, S. Ren, J. Sun. Deep Residual Learning for Image Recognition. Proc. CVPR, 770-778, 2016.
- [12] M. Hein and M. Andriushchenko. Formal Guarantees on the Robustness of a Classifier Against Adversarial Manipulation. Proc. NIPS, 2266-2276, 2017.
- [13] O. Goldreich. Foundations of Cryptography, Volume II, Basic Tools. Cambridge University Press, 2009.
- [14] I.J. Goodfellow, J. Shlens, C. Szegedy. Explaining and Harnessing Adversarial Examples. ArXiv:1412.6572, 2014.
- [15] Y. LeCun, Y. Bengio, G. Hinton. Deep Learning. Nature, 521(7553), 436-444, 2015.
- [16] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner. Gradient-based Learning Applied to Document Recognition. Proc. of the IEEE, 86(11): 2278-2324, 1998.
- [17] H. Li, A. Kadav, I. Durdanovic, H. Samet, H.P. Graf. Pruning Filters for Efficient Convnets. arXiv:1608.08710, 2016.
- [18] Y. Liu, L. Wei, B. Luo, Q. Xu. Fault Injection Attack on Deep Neural Network. Proc. of the IEEE/ACM International Conference on Computer-Aided Design, 131-138, 2017.
- [19] A. Ma, F. Faghri, N. Papernot, A.M. Farahmand. SOAR: Second-Order Adversarial Regularization. arXiv:2004.01832, 2020.
- [20] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, A. Vladu. Towards Deep Learning Models Resistant to Adversarial Attacks. ArXiv:1706.06083, 2017.
- [21] A.S. Morcos, D.G.T. Barrett, N.C. Rabinowitz, M. Botvinick. On the Importance of Single Directions for Generalization. arXiv:1803.06959, 2018.
- [22] B. Neyshabur, R. Tomioka, N. Srebro. Norm-based Capacity Control in Neural Networks. Proc. COLT’15, 1376-1401, 2015.
- [23] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z.B. Celik, A. Swami. The Limitations of Deep Learning in Adversarial Settings. IEEE European Symposium on Security and Privacy, IEEE Press, 2016: 372-387.
- [24] A. Raghunathan, J. Steinhardt, P. Liang. Certified Defenses Against Adversarial Examples. ArXiv: 1801.09344, 2018.
- [25] A. RoyChowdhury, P. Sharma, E. Learned-Miller. Reducing Duplicate Filters in Deep Neural Networks. NIPS workshop on Deep Learning: Bridging Theory and Practice, 1, 2017.
- [26] W. Shang, K. Sohn, D. Almeida, H. Lee. Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units. Proc. ICML, PMLR, 2217-2225, 2016.
- [27] A. Shafahi, W.R. Huang, C. Studer, S. Feizi, T. Goldstein. Are Adversarial Examples Inevitable? ArXiv:1809.02104, 2018.
- [28] C.J. Simon-Gabriel, Y. Ollivier, L. Bottou, B. Schölkopf, D. Lopez-Paz. First-order Adversarial Vulnerability of Neural Networks and Input Dimension. Proc. ICML, PMLR, 5809-5817, 2019.
- [29] K. Simonyan and A. Zisserman. Very Deep Convolutional Networks for Large-scale Image Recognition. arXiv:1409.1556, 2014.
- [30] X. Sun, Z. Zhang, X. Ren, R. Luo, L. Li. Exploring the Vulnerability of Deep Neural Networks: A Study of Parameter Corruption. arXiv:2006.05620, 2020.
- [31] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I.J. Goodfellow, R. Fergus. Intriguing Properties of Neural Networks. ArXiv:1312.6199, 2013.
- [32] F. Tramèr, N. Papernot, I. Goodfellow, D. Boneh, P. McDaniel. The Space of Transferable Adversarial Examples. arXiv:1704.03453, 2017.
- [33] Y.L. Tsai, C.Y. Hsu, C.M. Yu, P.Y. Chen. Formalizing Generalization and Robustness of Neural Networks to Weight Perturbations. arXiv:2103.02200, 2021.
- [34] I.Y. Tyukin, D.J. Higham, A.N. Gorban. On Adversarial Examples and Stealth Attacks in Artificial Intelligence Systems. 2020 International Joint Conference on Neural Networks, 1-6, IEEE Press, 2020.
- [35] I.Y. Tyukin, D.J. Higham, A.N. Gorban, E. Woldegeorgis. The Feasibility and Inevitability of Stealth Attacks. arXiv:2106.13997, 2021.
- [36] Z. Wang, C. Xiang, W. Zou, C. Xu. MMA Regularization: Decorrelating Weights of Neural Networks by Maximizing the Minimal Angles. arXiv:2006.06527, 2020.
- [37] T.W. Weng, P. Zhao, S. Liu, P.Y. Chen, X. Lin, L. Daniel. Towards Certificated Model Robustness Against Weight Perturbations. Proc. of the AAAI, 34(04): 6356-6363, 2020.
- [38] H. Xu, Y. Ma, H.C. Liu, D. Deb, H. Liu, J.L. Tang, A.K. Jain. Adversarial Attacks and Defenses in Images, Graphs and Text: A Review. International Journal of Automation and Computing, 17(2), 151-178, 2020.
- [39] L. Yu and X.S. Gao. Improve the Robustness and Accuracy of Deep Neural Network with Normalization, arXiv:2010.04912, 2020.
- [40] L. Yu and X.S. Gao. Robust and Information-theoretically Safe Bias Classifier against Adversarial Attacks. arXiv:2111.04404, 2021.
- [41] H. Zhang, Y. Yu, J. Jiao, E.P. Xing, L.E. Ghaoui, M.I. Jordan. Theoretically Principled Trade-off Between Robustness and Accuracy. Proc. ICML, 2019.
- [42] X.Y. Zhang, C.L. Liu, C.Y. Suen. Towards Robust Pattern Recognition: A Review. Proc. of the IEEE, 108(6), 894-922, 2020.
- [43] P. Zhao, S. Wang, C. Gongye, Y. Wang, Y. Fei, X. Lin. Fault Sneaking Attack: a Stealthy Framework for Misleading Deep Neural Networks. Proc. of the 56th Annual Design Automation Conference, 2019.