Second-Order Provable Defenses against Adversarial Attacks
Abstract
A robustness certificate is the minimum distance of a given input to the decision boundary of the classifier (or a lower bound on it). For any input perturbation with a magnitude smaller than the certificate value, the classification output provably remains unchanged. Computing exact robustness certificates for neural networks is difficult since it requires solving a non-convex optimization. In this paper, we provide computationally-efficient robustness certificates for neural networks with differentiable activation functions in two steps. First, we show that if the eigenvalues of the Hessian of the network are bounded, we can compute a robustness certificate in the ℓ2 norm efficiently using convex optimization. Second, we derive a computationally-efficient differentiable upper bound on the curvature of a deep network. We also use the curvature bound as a regularization term during training to boost the network's certified robustness. Putting these results together leads to our proposed Curvature-based Robustness Certificate (CRC) and Curvature-based Robust Training (CRT). Our numerical results show that CRT leads to significantly higher certified robust accuracy compared to interval-bound propagation (IBP) based training. We achieve certified robust accuracies of 69.79%, 57.78%, and 53.19%, while IBP-based methods achieve 44.96%, 44.74%, and 44.66% on 2, 3, and 4 layer networks, respectively, on the MNIST dataset.
1 Introduction
Modern neural networks achieve high accuracy on tasks such as image classification and speech recognition, but are known to be brittle to small, adversarially chosen perturbations of their inputs [46]. A classifier that correctly classifies an image can be fooled by an adversary into misclassifying an adversarial example that is indistinguishable from the original image to a human. Adversarial examples can also fool systems when they are printed out on paper and photographed with a smartphone [23]. Even in a black-box threat model, where the adversary has no access to the model parameters, attackers could target autonomous vehicles by using stickers or paint to create an adversarial stop sign that the vehicle would interpret as a 'yield' or another sign [36]. This trend is worrisome and suggests that adversarial vulnerabilities need to be appropriately addressed before neural networks can be deployed in security-critical applications.
In this work, we propose a new approach for developing provable defenses against ℓ2-bounded adversarial attacks as well as for computing robustness certificates of pre-trained deep networks with differentiable activations. In contrast to existing certificates [55, 50] that use first-order information (upper and lower bounds on the slope), our approach is based on second-order information (upper and lower bounds on curvature values, i.e. eigenvalues of the Hessian). Our approach rests on two theoretically justified steps: First, in Theorems 1 and 2, we show that if the eigenvalues of the Hessian of the network (the curvatures of the network) are bounded (globally or locally), we can efficiently compute a robustness certificate and develop a defense method against ℓ2-bounded adversarial attacks using convex optimization. Second, in Theorem 4, we derive a computationally-efficient, differentiable bound on the curvature (eigenvalues of the Hessian) of a deep network. We derive this bound by explicitly characterizing the Hessian of a deep network in Lemma 1.
Although the problem of finding the closest adversarial example to a given point for deep networks is a non-convex optimization problem, our proposed Curvature-based Robustness Certificate (CRC) is able, under some verifiable conditions, to compute points on the decision boundary that are provably closest to the input. That is, it provides the tightest certificate in those cases. For example, for 2, 3, and 4 layer networks trained on MNIST, we can find provably closest adversarial points in 44.17%, 22.59%, and 19.53% of cases, respectively (Table 2). To the best of our knowledge, our method is the first approach that can efficiently compute provably closest adversarial examples for a significant fraction of examples in non-trivial neural networks.
We note that un-regularized networks, especially deep ones, can have large curvature bounds, which can lead to small robustness certificates. However, by using the derived curvature bound as a regularizer during training, we significantly decrease the curvature values of the network with little or no decrease in its performance (Table 5(b), Figure 1). Using this technique, our method significantly outperforms interval-bound propagation (IBP) [57, 52] and achieves state-of-the-art certified accuracy (Tables 3 and 4). In particular, our method achieves certified robust accuracies of 69.79%, 57.78%, and 53.19%, while IBP-based methods achieve 44.96%, 44.74%, and 44.66% on 2, 3, and 4 layer networks, respectively, on the MNIST dataset (with similar results on Fashion-MNIST).
Other recent works (e.g. Moosavi Dezfooli et al. [34], Qin et al. [38]) empirically show that using an estimate of the curvature at inputs as a regularizer leads to empirical robustness on par with adversarial training. In this work, however, we use a bound on the absolute value of the curvature (and not an estimate) as a regularizer and show that it results in high certified robustness. Moreover, previous works have tried to certify robustness by bounding the Lipschitz constant of the neural network [46, 37, 56, 1, 20]. Our approach, in contrast, is based on bounding the Lipschitz constant of the gradient, which in turn leads to a bound on the eigenvalues of the Hessian of deep neural networks.
| | Certificate problem | Attack problem |
|---|---|---|
| Primal problem | | |
| Dual function | | |
| When is the dual solvable? | | |
| Dual problem | | |
| When is primal = dual? | | |
In summary, we make the following contributions:
- We show that if the eigenvalues of the Hessian of the network (its curvature values) are bounded, the robustness certificate and attack problems can be solved efficiently using convex optimization (Theorems 1 and 2).
- We derive a computationally-efficient, differentiable bound on the curvature of deep networks with differentiable activations (Theorem 4), building on a closed-form expression for the Hessian (Lemma 1).
- We provide verifiable conditions under which our method is able to compute points on the decision boundary that are provably closest to the input. Empirically, we show that this condition holds for a significant fraction of examples (Table 2).
- We show that using our proposed curvature bounds as a regularizer during training leads to improved certified accuracy on 2, 3, and 4 layer networks (on the MNIST and Fashion-MNIST datasets) compared to IBP-based adversarial training [51, 57] (Tables 3 and 4). Our robustness certificate (CRC) significantly outperforms CROWN's certificate [55] when the network is trained with our regularizer (Table 5(b)).
To the best of our knowledge, this is the first work that (a) demonstrates the utility of second-order information for provable robustness, (b) derives a framework to find the exact robustness certificate in the ℓ2 norm and the exact worst-case adversarial perturbation in an ℓ2 ball of a given radius under some conditions, and (c) derives an exact closed-form expression for the Hessian and bounds on the curvature values using it.
2 Related work
In the last couple of years, several empirical defenses have been proposed for training classifiers to be robust against adversarial perturbations [30, 42, 58, 35, 24, 32, 59]. Although these defenses robustify classifiers against particular types of attacks, they can still be vulnerable to stronger attacks [3, 7, 47, 2]. For example, [3] showed that most of the empirical defenses proposed at ICLR 2018 can be broken by developing tailored attacks for each of them.
To end the cycle between defenses and attacks, a line of work on certified defenses has gained attention, where the goal is to train classifiers whose predictions are provably robust within some given region [21, 22, 15, 8, 9, 29, 12, 17, 5, 48, 51, 49, 52, 40, 39, 13, 14, 11, 44, 19, 18, 31, 55, 50, 57]. These methods, however, do not scale to the large and practical networks used in modern machine learning problems. Another line of defense work focuses on randomized smoothing, where the prediction is robust within some region around the input with a user-chosen probability [28, 6, 26, 27, 10, 41]. Although these methods can scale to large networks, certifying robustness with probability close to 1 often requires generating a large number of noisy samples around the input, which leads to high inference-time computational complexity. We discuss existing works in more detail in Appendix A.
3 Notation
Consider a fully connected neural network with layers and neurons in the layer ( and ) for a multi-label classification problem with classes (). The corresponding function of the neural network is , where is the dimension of the input. For an input , we use and to denote the input (before applying the activation function) and the output (after applying the activation function) of the neurons in the hidden layer of the network, respectively. To simplify notation, and when no confusion arises, we leave the dependency on the input implicit. We define and .
With a fully connected architecture, each and is computed using a transformation matrix , the bias vector and an activation function as follows:
We use as a shorthand for .
We use to denote the set and to denote the set . We use small letters, etc., to denote indices over the entries of a vector or the rows of a matrix, and capital letters to denote indices over the layers of the network. The element in the position of a vector is given by , the vector in the row of a matrix is , while the element in the row and column of is . We use and to denote the 2-norm and the operator 2-norm of the vector and the matrix , respectively. We use and to denote the vector and matrix constructed by taking elementwise absolute values. We use and to denote the largest and smallest eigenvalues of a symmetric matrix . We use to denote the diagonal matrix constructed by placing each element of along the diagonal. We use to denote the Hadamard product and to denote the identity matrix. We use and to denote linear matrix inequalities (LMIs): given two symmetric matrices and , means positive semi-definite (PSD).
4 Using duality to solve the attack and certificate problems
Consider an input with true label and attack target . In the certificate problem, our goal is to find a lower bound on the minimum distance between and the decision boundary where . The problem of computing the exact distance (the primal) can be written as:
(1) |
However, solving the above problem can be hard in general. Using the max-min inequality (weak duality), we can write the dual of the above problem as follows:
(2) |
From the theory of duality, we know that the dual function, for each value of the dual variable, gives a lower bound on the exact certification value (the primal solution) . However, since is non-convex, solving the inner minimization for every value of the dual variable can be difficult. In the next section, we will prove that the curvature of the function is bounded globally:
(3) |
In this case, we have the following theorem ( is defined in Table 1):
Theorem 1.
is a convex optimization problem for . Moreover, if is the solution to such that , then .
Below, we briefly outline the proof; the full proof is presented in Appendix E.1. The Hessian of the objective function of the dual , i.e. the function inside the inner minimization, is given by:
From equation (3), we know that the eigenvalues of are bounded between if , and in if . In both cases, we can see that for , all eigenvalues will be non-negative, making the objective function convex. When satisfies , we have . Using the duality theorem we have and from the definition of , we have . Combining the two inequalities, we get .
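As a concrete illustration of this argument, the following minimal sketch numerically checks the Hessian of the inner dual objective for a small, randomly initialized sigmoid network. It assumes the primal objective is the squared ℓ2 distance and uses a simplified two-layer curvature bound in the spirit of Theorem 3; the network sizes, the constant 0.1 bounding |σ''|, and the tested values of the dual variable are illustrative, not the paper's exact formulation.

```python
# Numerically check convexity of the inner dual objective for a toy network.
# Assumed form (illustrative): L(x, eta) = ||x - x0||^2 + eta * (f_y(x) - f_t(x)).
import torch

torch.manual_seed(0)
d, h = 6, 8                                   # input dim, hidden width
W1, b1 = torch.randn(h, d), torch.randn(h)
W2, b2 = torch.randn(2, h), torch.randn(2)    # two logits: class y and class t

def margin(x):                                # f_y(x) - f_t(x)
    z = torch.sigmoid(W1 @ x + b1)
    logits = W2 @ z + b2
    return logits[0] - logits[1]

def inner_objective(x, x0, eta):
    return torch.sum((x - x0) ** 2) + eta * margin(x)

x0 = torch.randn(d)
# Crude curvature bound K on the eigenvalues of the Hessian of the margin,
# in the spirit of Theorem 3 (|sigma''| <= 0.1 for the sigmoid):
K = 0.1 * torch.linalg.matrix_norm(
        W1.T @ torch.diag((W2[0] - W2[1]).abs()) @ W1, ord=2)

for eta in (0.5 / K, 1.0 / K, 10.0 / K):      # the last value may break convexity
    H = torch.autograd.functional.hessian(
            lambda x: inner_objective(x, x0, eta), torch.randn(d))
    lam_min = torch.linalg.eigvalsh(H).min()
    print(f"eta*K = {float(eta * K):5.1f}   min eigenvalue = {float(lam_min):+.4f}")
```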
Next, we consider the attack problem. The goal here is to find an adversarial example inside an ball of radius such that is minimized. Using similar arguments, we can get the following theorem for the attack problem (, and are defined in Table 1):
Theorem 2.
is a convex optimization problem for . Moreover, if is the solution to such that ,
The proof is presented in Appendix E.2. We note that both Theorems 1 and 2 hold for any non-convex function with continuous gradients. Thus they can also be of interest in problems such as optimization of neural nets.
Using Theorems 1 and 2, we have the following definitions for certification and attack optimizations:
Definition 1.
(Curvature-based Certificate Optimization) Given an input with true label , false target , we define as the solution of the following max-min optimization:
We refer to as the Curvature-based Robustness Certificate (CRC).
Definition 2.
(Curvature-based Attack Optimization) Given input with label , false target , and the ball radius , we define as the solution of the following optimization:
When is used for training in an adversarial training framework, we call the method Curvature-based Robust Training (CRT).
A direct implication of Theorems 1 and 2 is that the tightness of our robustness certificate crucially depends on the tightness of our curvature bounds, and . If and are very large compared to the true eigenvalue bounds of the Hessian of the network, the resulting robustness certificate will be vacuous. In Table 5(b) (and Figure 1), we show that adding the derived bound as a regularization term during training significantly decreases the curvature bounds of the network, with little or no decrease in its performance. This leads to high certified robustness against adversarial attacks.
5 Curvature Bounds for deep networks
In this section, we provide a computationally efficient approach to compute the curvature bounds for neural networks with differentiable activation functions. To the best of our knowledge, there is no prior work on finding provable bounds on the curvature values of deep neural networks.
5.1 Closed form expression for the Hessian
Using the chain rule of second derivatives, we can derive as a sum of matrix products:
Lemma 1.
Given an layer neural network, the Hessian of the hidden unit with respect to the input , i.e is given by the following formula:
where is the Jacobian of with respect to (dimensions ), and is the Jacobian of with respect to (dimensions ).
The proof is presented in Appendix E.3. Using the chain rule, we can compute the matrices and appearing in Lemma 1 recursively as follows:
This leads to a fast, back-propagation-like procedure for computing the Hessian. Note that Lemma 1 only assumes a matrix multiplication operation from to . Since a convolution can also be expressed as a matrix multiplication, the lemma extends directly to deep convolutional networks. Furthermore, Lemma 1 can also be of independent interest in related problems such as second-order interpretation methods for deep learning (e.g. [45]).
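To illustrate Lemma 1 and the recursion above, the sketch below assembles the Hessian of one output unit of a small sigmoid network layer by layer and checks it against automatic differentiation. The layer sizes and the unit index `m` are arbitrary choices for the example; the forward and backward recursions follow the chain-rule structure sketched in the text.

```python
# Assemble the Hessian of output unit m via the layer-wise sum of Lemma 1 and
# compare it with the Hessian obtained from automatic differentiation.
import torch

torch.manual_seed(0)
sizes = [5, 7, 6, 3]                            # input, hidden, hidden, output
Ws = [torch.randn(sizes[i + 1], sizes[i]) for i in range(3)]
bs = [torch.randn(sizes[i + 1]) for i in range(3)]
m = 2                                           # output unit of interest

sig = torch.sigmoid
dsig = lambda z: sig(z) * (1 - sig(z))                      # sigma'
ddsig = lambda z: sig(z) * (1 - sig(z)) * (1 - 2 * sig(z))  # sigma''

def f(x):
    a = x
    for W, b in zip(Ws[:-1], bs[:-1]):
        a = sig(W @ a + b)
    return Ws[-1] @ a + bs[-1]

x = torch.randn(sizes[0])

# Forward recursion: pre-activations z_l and Jacobians B_l = dz_l/dx.
zs, Bs, a = [], [], x
for W, b in zip(Ws[:-1], bs[:-1]):
    z = W @ a + b
    B = W if not Bs else W @ torch.diag(dsig(zs[-1])) @ Bs[-1]
    zs.append(z); Bs.append(B); a = sig(z)

# Backward recursion: g_l = d f_m / d a_l (gradient w.r.t. hidden activations).
gs = [None] * len(zs)
gs[-1] = Ws[-1][m]
for l in range(len(zs) - 2, -1, -1):
    gs[l] = Ws[l + 1].T @ (dsig(zs[l + 1]) * gs[l + 1])

# Lemma 1: Hessian = sum over layers of B_l^T diag(g_l * sigma''(z_l)) B_l.
H_lemma = sum(B.T @ torch.diag(g * ddsig(z)) @ B for B, g, z in zip(Bs, gs, zs))
H_auto = torch.autograd.functional.hessian(lambda x: f(x)[m], x)
print("max abs difference:", float((H_lemma - H_auto).abs().max()))   # ~1e-6
```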
5.2 Curvature bounds for Two Layer networks
For a two-layer network and using Lemma 1, is given by:
In the above equation, note that only the term depends on . We can maximize and minimize each element in the diag term independently, subject to the constraint that is bounded. Using this procedure, we construct matrices and that satisfy the properties given in the following theorem:
Theorem 3.
Given a two layer network whose activation function has bounded second derivative:
- (a) We have the following linear matrix inequalities (LMIs):
- (b) If and , then is PSD and is NSD.
- (c) This gives the following global bounds on the eigenvalues of the Hessian:
(4)
where and are independent of and defined in equations (58) and (59) in Appendix E.4.
The proof is presented in Appendix E.4. Because power iteration finds the eigenvalue with the largest magnitude, we can use it to find and only when is PSD and is NSD. We solve for and for the sigmoid, tanh, and softplus activation functions in Appendix F and show that this is indeed the case for them.
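The following sketch illustrates the two-layer construction for a sigmoid network: each diagonal coefficient of the Hessian is bounded independently using a bound on |σ''|, and power iteration on the resulting PSD matrix yields the curvature bound. The random weights, the constant 0.1 (a slightly loose bound on |σ''| for the sigmoid), and the number of power-iteration steps are illustrative assumptions.

```python
# Two-layer curvature bound: bound each diagonal coefficient of the Hessian of
# f_y - f_t independently, then run power iteration on the resulting PSD matrix.
import torch

torch.manual_seed(0)
d, hdim, classes = 10, 20, 3
W1 = torch.randn(hdim, d)
W2 = torch.randn(classes, hdim)
y, t = 0, 1                       # true label and attack target
h = 0.1                           # |sigma''| <= 1/(6*sqrt(3)) ~ 0.0962 < 0.1

# The Hessian of f_y - f_t is W1^T diag((W2[y] - W2[t]) * sigma''(z1)) W1, so an
# elementwise-maximal PSD upper bound uses the coefficients h * |W2[y] - W2[t]|.
M_plus = W1.T @ torch.diag(h * (W2[y] - W2[t]).abs()) @ W1   # PSD by construction

def power_iteration(A, iters=100):
    v = torch.randn(A.shape[0])
    for _ in range(iters):
        v = A @ v
        v = v / v.norm()
    return float(v @ A @ v)       # Rayleigh quotient ~ largest eigenvalue of A

K = power_iteration(M_plus)
print("curvature bound K:", K)
print("check, lambda_max(M_plus):", float(torch.linalg.eigvalsh(M_plus).max()))
# By symmetry of the construction, the eigenvalues of the true Hessian of
# f_y - f_t lie in [-K, K] for every input.
```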
We note that this result does not hold for ReLU networks since the ReLU function is not differentiable everywhere. However, in Appendix G, we devise a method to compute the certificate for a two layer ReLU network by finding a quadratic function that is a provable lower bound for . We show that the resulting method significantly outperforms CROWN-Ada (see Appendix Table 15).
5.3 Curvature bounds for Deep networks
Using Lemma 1, we know that is a sum of products of the matrices and . Thus, if we can find upper bounds for and , we can get upper bounds for . Using this intuition (the proof is presented in Appendix E.5), we have the following result:
Theorem 4.
Given an layer neural network whose activation function satisfies:
the absolute value of eigenvalues of is globally bounded by the following quantity:
where and are independent of and defined recursively as:
(5) | |||
(6) |
The above expressions allow for an extremely efficient computation of the curvature bounds for deep networks. We consider simplifications of this result for the sigmoid, tanh, and softplus activations in Appendix F. The curvature bounds for can be computed by replacing with in Theorem 4. The resulting bound is independent of and only depends on the network weights , the true label , and the target . We denote it with . To simplify notation, when no confusion arises we denote it with . In our experiments, for two-layer networks, we use and from Theorem 3 since they provide tighter curvature bounds. For deeper networks (), we use .
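The sketch below computes a simplified global curvature bound for a deep sigmoid network from the ingredients of Lemma 1: a forward pass bounding the Jacobian norms and a backward pass bounding the gradients with respect to the hidden activations. It is meant only to illustrate the structure of the bound; the exact recursion in equations (5) and (6) may be tighter, and the layer sizes and activation constants are assumptions.

```python
# Simplified global curvature bound for a deep sigmoid network, built from the
# decomposition of Lemma 1: a forward pass bounds ||dz_l/dx||_2, a backward
# pass bounds |d(f_y - f_t)/da_l| elementwise, and the per-layer terms are summed.
import torch

torch.manual_seed(0)
sizes = [784, 128, 128, 128, 10]
Ws = [torch.randn(sizes[i + 1], sizes[i]) * 0.05 for i in range(len(sizes) - 1)]
y, t = 3, 5
G1, G2 = 0.25, 0.1        # |sigma'| <= 1/4, |sigma''| <= 1/(6*sqrt(3)) ~ 0.0962

def spectral_norm(W):
    return float(torch.linalg.matrix_norm(W, ord=2))

L = len(Ws)               # number of weight matrices; L - 1 hidden layers
# Forward: bound on the operator norm of dz_l/dx for each hidden layer l.
jac_bound = []
for W in Ws[:-1]:
    prev = jac_bound[-1] * G1 if jac_bound else 1.0
    jac_bound.append(prev * spectral_norm(W))

# Backward: elementwise bound r_l on |d(f_y - f_t)/da_l| for each hidden layer.
r = [None] * (L - 1)
r[-1] = (Ws[-1][y] - Ws[-1][t]).abs()
for l in range(L - 3, -1, -1):
    r[l] = G1 * (Ws[l + 1].abs().T @ r[l + 1])

# ||Hessian||_2 <= sum_l ||dz_l/dx||_2^2 * max_i |d(f_y - f_t)/da_{l,i}| * max|sigma''|
K = sum(jb ** 2 * float(rl.max()) * G2 for jb, rl in zip(jac_bound, r))
print("global curvature bound K:", K)
```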
[Figure 1: Curvature upper bound from Theorem 4 versus an empirical lower bound on the curvature, for networks of increasing depth trained with and without curvature regularization.]
6 Adversarial training with curvature regularization
Since the term in Lemma 1 is the Jacobian of with respect to , bounding its operator norm amounts to bounding the Lipschitz constant of the neural network constructed from the first layers of the original network. Finding tight bounds on the Lipschitz constant is an active area of research [16, 50, 43], and the product of the operator norms of the weight matrices is known to be a loose bound on the Lipschitz constant for deep networks. Since we use the same product to compute the bound for in Theorem 4, the resulting curvature bound is likely to be loose for very deep networks.
In Figure 1, we observe the same trend: as the depth of the network increases, the upper bound computed using Theorem 4 becomes significantly larger than the lower bound (computed by taking the maximum of the largest eigenvalue of the Hessian across all test images with label and second largest logit , and then averaging across different pairs). However, by regularizing the network to have small curvature during training, the bound becomes significantly tighter. Interestingly, using curvature regularization, even with this loose curvature bound for deep networks, we achieve significantly higher robust accuracy than the current state-of-the-art methods while also enjoying significantly higher standard accuracy (see Tables 3 and 4).
To regularize the network to have small curvature values, we penalize the curvature bound during training. To compute the gradient of with respect to the network weights, note that, using Theorem 4, we can compute using absolute values, matrix multiplications, and operator norms. Since the gradient of the operator norm is not available in standard libraries, we implemented a custom layer in which the gradient of , i.e. , is given by:
Note that , , and can be computed using power iteration. Since the network weights do not change significantly during a single training step, we can use the singular vectors and computed in the previous training step and update them using one iteration of the power method. This approach for computing the gradient of the largest singular value of a matrix has also been used in previously published work [33]. Thus, the per-sample loss for training with curvature regularization is given by:
(7) |
where denotes the cross-entropy loss, is the true label of the input , is the attack target, and is the regularization coefficient penalizing large curvature values. Similar to adversarial training, in CRT we use instead of in equation (7).
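A minimal sketch of a training step with the objective in equation (7) is shown below. The architecture, the value of the regularization coefficient, the runner-up choice of the attack target, and the use of PyTorch's built-in (differentiable) spectral norm in place of the custom power-iteration layer are illustrative assumptions; the curvature penalty reuses the simplified bound sketched in Section 5.3 rather than the paper's exact recursion.

```python
# One training step with the regularized per-sample loss of equation (7):
# cross-entropy plus gamma times a (differentiable) curvature bound.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(784, 128), nn.Sigmoid(),
                      nn.Linear(128, 128), nn.Sigmoid(),
                      nn.Linear(128, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
gamma = 0.01                           # curvature regularization coefficient
G1, G2 = 0.25, 0.1                     # bounds on |sigma'| and |sigma''|

def curvature_bound(Ws, y, t):
    """Simplified differentiable bound on the curvature of f_y - f_t."""
    back = [(Ws[-1][y] - Ws[-1][t]).abs()]           # backward elementwise bounds
    for W in reversed(Ws[1:-1]):
        back.insert(0, G1 * (W.abs().T @ back[0]))
    jac, bound = 1.0, 0.0                            # forward Jacobian-norm bound
    for l, W in enumerate(Ws[:-1]):
        jac = jac * (G1 if l > 0 else 1.0) * torch.linalg.matrix_norm(W, ord=2)
        bound = bound + jac ** 2 * back[l].max() * G2
    return bound

x = torch.randn(32, 784)               # dummy batch; replace with real data
y = torch.randint(0, 10, (32,))
logits = model(x)
t = logits.detach().scatter(1, y[:, None], float("-inf")).argmax(1)  # runner-up
Ws = [m.weight for m in model if isinstance(m, nn.Linear)]
reg = torch.stack([curvature_bound(Ws, yi.item(), ti.item())
                   for yi, ti in zip(y, t)]).mean()
loss = F.cross_entropy(logits, y) + gamma * reg
opt.zero_grad(); loss.backward(); opt.step()
print("loss:", float(loss), "curvature penalty:", float(reg))
```

In CRT, the clean input in the cross-entropy term would be replaced by the adversarial example produced by the attack optimization, as described above.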
7 Experiments
Certified robust accuracy is the fraction of correctly classified test samples whose robustness certificates (computed using CRC) are greater than a pre-specified radius . Unless otherwise specified, we use the class with the second largest logit as the attack target (i.e. the class ). The notation (, activation) denotes a neural network with layers and the specified activation; () denotes standard training with set to , while (CRT, ) denotes CRT training with . Certificates are computed over 150 randomly chosen, correctly classified images. We use a single NVIDIA GeForce RTX 2080 Ti GPU.
7.1 Fraction of inputs with tightest robustness certificate
Using the verifiable conditions of Theorems 1 and 2, our approach is able to (1) find points that are provably the worst-case adversarial perturbations (in the ℓ2 norm) in the attack problem and (2) find points on the decision boundary that are provably closest to the input in the ℓ2 norm in the certification problem. In particular, in Table 2, we observe that for curvature-regularized networks, our approach finds provably worst-case adversarial perturbations for all of the inputs with a small drop in accuracy. Moreover, for 2, 3, and 4 layer networks, our method finds provably closest adversarial examples for 44.17%, 22.59%, and 19.53% of inputs in the MNIST test set, respectively.
Network | Reg. coefficient | Accuracy | Attack success | Certificate success
---|---|---|---|---
, sigmoid | 0.0 | 98.77% | 5.05% | 2.24%
 | 0.03 | 98.30% | 100% | 44.17%
, sigmoid | 0.0 | 98.52% | 0% | 0.12%
 | 0.05 | 97.60% | 100% | 22.59%
, sigmoid | 0.0 | 98.22% | 0% | 0.01%
 | 0.07 | 95.24% | 100% | 19.53%
Note that the technique presented in this work is not applicable to ReLU networks due to the absence of curvature information. However, since verifying the robustness property in a ball around the input is known to be an NP-complete problem for ReLU networks [22], it is computationally infeasible even to verify that a given adversarial perturbation is the worst-case perturbation in polynomial time unless P=NP. We show, however, that it is possible to find (and not just verify) the exact worst-case perturbation (and robustness certificate) for neural networks with smooth activation functions. We believe these theoretical and empirical results provide evidence that bounding the curvature of the network and using smooth activation functions can be critical to achieving high robustness guarantees.
7.2 Comparison with existing provable defenses
We compare against the certified defense techniques proposed in Wong et al. [52] and Zhang et al. [57] in Table 3 for the MNIST dataset [25] and Table 4 for the Fashion-MNIST dataset [53] with radius . Even though our proposed CRT requires fully differentiable activation functions such as softplus, sigmoid, and tanh, we include comparisons with ReLU networks because the methods proposed in Wong et al. [52] and Zhang et al. [57] use ReLU. Since CROWN-IBP can be trained using the softplus activation function, we include it in our comparison. A similar comparison with radius is given in Appendix Table 8 (MNIST) and Table 9 (Fashion-MNIST). We observe that CRT (certified with CRC) gives significantly higher certified robust accuracy as well as standard accuracy compared to either of these methods on both MNIST and Fashion-MNIST, for both values of the attack radius. Since shallow fully connected networks are known to perform poorly on the CIFAR-10 dataset, we do not include those results in our comparison.
Network | Training | Standard Accuracy | Certified Robust Accuracy |
---|---|---|---|
, softplus | CRT, 0.01 | 98.68% | 69.79% |
CROWN-IBP | 88.48% | 42.36% | |
, relu | COAP | 89.33% | 44.29% |
CROWN-IBP | 89.49% | 44.96% | |
, softplus | CRT, 0.05 | 97.43% | 57.78% |
CROWN-IBP | 86.58% | 42.14% | |
, relu | COAP | 89.12% | 44.21% |
CROWN-IBP | 87.77% | 44.74% | |
, softplus | CRT, 0.07 | 95.60% | 53.19% |
CROWN-IBP | 82.74% | 41.34% | |
, relu | COAP | 90.17% | 44.66% |
CROWN-IBP | 84.4% | 43.83% |
Network | Training | Standard Accuracy | Certified Robust Accuracy |
---|---|---|---|
, softplus | CRT, 0.01 | 80.31% | 54.39% |
CROWN-IBP | 69.23% | 47.19% | |
, relu | COAP | 74.1% | 46.3% |
CROWN-IBP | 70.73% | 48.61% | |
, softplus | CRT, 0.05 | 78.39% | 53.4% |
CROWN-IBP | 68.72% | 46.52% | |
, relu | COAP | 73.9% | 46.3% |
CROWN-IBP | 70.79% | 48.69% | |
, softplus | CRT, 0.07 | 75.61% | 49.6% |
CROWN-IBP | 68.31% | 46.21% | |
, relu | COAP | 73.6% | 45.1% |
CROWN-IBP | 70.21% | 48.08% |
In Appendix Table 11, we compare CRT with Randomized Smoothing [10]. For 2- and 3-layer networks, we achieve higher robust accuracy. However, since our certificate is deterministic while the smoothing-based certificate is probabilistic (although it holds with high probability), the results are not directly comparable. As a separate result, we also prove that randomized smoothing bounds the curvature of the network (Theorem 5 in Appendix E.6). We also include a comparison with empirical defense methods, namely PGD and TRADES, in Appendix Table 14.
Network | Training | Standard Accuracy | Certified Robust Accuracy |
---|---|---|---|
, sigmoid | standard | 98.37% | 54.17% |
98.08% | 83.53% | ||
CRT, 0.01 | 98.57% | 95.59% | |
, sigmoid | standard | 98.37% | 0.00% |
97.71% | 88.33% | ||
CRT, 0.01 | 97.23% | 94.99% | |
, sigmoid | standard | 98.39% | 0.00% |
97.41% | 89.61% | ||
CRT, 0.01 | 97.83% | 93.41% |
Network | Training | Certificate (mean) | |
---|---|---|---|
CROWN | CRC | ||
, sigmoid | standard | 0.28395 | 0.48500 |
0.32548 | 0.84719 | ||
CRT, 0.01 | 0.43061 | 1.54673 | |
, sigmoid | standard | 0.24644 | 0.06874 |
0.39799 | 1.07842 | ||
CRT, 0.01 | 0.39799 | 1.07842 | |
, sigmoid | standard | 0.19501 | 0.00454 |
0.40620 | 1.05323 | ||
CRT, 0.01 | 0.40327 | 1.06208 |
7.3 Comparison with existing certificates
In Table 5(b), we compare CRC with CROWN-general [54]. For 2-layer networks, CRC outperforms CROWN significantly. For deeper networks, CRC works better than CROWN when the network is trained with curvature regularization. Moreover, with a small , we see a significant increase in CRC but only a very small drop in test accuracy (without any adversarial training). We can see that with , non-trivial certified accuracies of can be achieved on layer sigmoid networks, respectively, without any adversarial training. Adversarial training using CRT further boosts the certified accuracy to and , respectively. We show some results on the CIFAR-10 dataset in Appendix Table 13. We again observe improvements in the robustness certificate and certified robust accuracy using CRC and CRT.
7.4 Results using local curvature bounds
From Theorems 1 and 2, we can observe that if the curvature is locally bounded within a convex region around the input (which we call the "safe" region), then the corresponding dual problems (, ) are again convex optimization problems, provided the optimization trajectory does not escape the safe region.
Theorem 3 can be directly extended to compute local curvature bounds using bounds on the second derivatives, i.e. in the local region. In Table 6, we show significant improvements in the CRC certificate for two-layer sigmoid networks on the MNIST dataset for . With curvature regularization, however, the difference is insignificant. We also observe that the certified accuracy for (CRT, 0.0) improves from 95.04% to 95.31% and for standard training improves from 54.17% to 58.06%. The certified accuracy remains the same in the other cases. Implementation details are given in Appendix Section C.6.
Computing local curvature bounds for deeper networks, however, is more challenging due to the presence of terms involving products of first and second derivatives. A straightforward extension of Theorem 4, wherein we compute upper bounds on and in a local region around the input across all neurons in all layers, does not yield significant improvements over the global method; therefore, we do not include those results in our comparison.
Network | Training | CRC (Global) | CRC (Local) |
---|---|---|---|
, sigmoid | standard | 0.5013 | 0.5847 |
CRT, | 1.0011 | 1.1741 | |
CRT, | 1.5705 | 1.6047 | |
CRT, | 1.6720 | 1.6831 |
8 Conclusion
In this paper, we develop computationally-efficient convex relaxations for the robustness certification and adversarial attack problems when the classifier has bounded curvature. We also show that this convex relaxation is tight under some general verifiable conditions. To make the proposed certification and attack optimizations usable, we derive global curvature bounds for deep networks with differentiable activation functions. This result is a consequence of a closed-form expression that we derive for the Hessian of a deep network. Adversarial training using our attack method, coupled with curvature regularization, results in significantly higher certified robust accuracy than existing provable defense methods. Our proposed curvature-based robustness certificate significantly outperforms the CROWN certificate when the network is trained with our regularizer. Scaling up our proposed curvature-based robustness certification and training methods, as well as further tightening the derived curvature bounds, are interesting directions for future work.
Appendix
Appendix A Related work
Many defenses have been proposed to make neural networks robust against adversarial examples. These methods can be classified into empirical defenses, which empirically seem to be robust against known adversarial attacks, and certified defenses, which are provably robust against such attacks.
Empirical defenses The best known empirical defense is adversarial training [24, 30, 58]. In this method, a neural network is trained to minimize the worst-case loss over a region around the input. Although such defenses seem to work against existing attacks, there is no guarantee that a more powerful attack would not break them. In fact, most such defenses proposed in the literature were later broken by stronger attacks [3, 7, 47, 2]. To end this arms race between defenses and attacks, a number of works have focused on certified defenses that come with formal robustness guarantees.
Certified defenses A classifier is said to be certifiably robust if one can easily obtain a guarantee that its prediction remains constant within some region around the input. Such defenses typically rely on certification methods that are either exact or conservative. Exact methods report whether or not there exists an adversarial perturbation inside some norm ball. In contrast, conservative methods either certify that no adversarial perturbation exists or decline to make a certification; they may decline even when no such perturbation exists. Exact methods are usually based on Satisfiability Modulo Theories [21, 22, 15, 8] and mixed-integer linear programming [9, 29, 12, 17, 5]. Unfortunately, they are computationally inefficient and difficult to scale up to even moderately sized neural networks. In contrast, conservative methods are more scalable and efficient, which makes them useful for building certified defenses [48, 51, 49, 52, 40, 39, 13, 14, 11, 44, 19, 18, 31, 55, 50]. However, even these methods have not been shown to scale to practical networks that are large and expressive enough to perform well on, for example, ImageNet. To scale to such large networks, randomized smoothing has been proposed as a probabilistically certified defense.
Randomized smoothing Randomized smoothing was previously proposed by several works [28, 6] as an empirical defense without formal guarantees. [26] first proved robustness guarantees for the randomized smoothing classifier using inequalities from differential privacy. [27] improved upon this using tools from information theory. Recently, [10] provided an even tighter robustness guarantee for randomized smoothing. [41] proposed a method of adversarial training for the randomized smoothing classifier, giving state-of-the-art results in the ℓ2 norm metric.
Appendix B The Attack problem
For a given input with true label and attack target , consider the attack problem. We are given that the eigenvalues of the Hessian are bounded below i.e:
Here (since is not convex in general).
The goal here is to find an adversarial example inside a ball of radius such that is minimized. That is, we want to solve the following optimization:
(8) |
This optimization can be hard in general. Using the max-min inequality (weak duality), we have:
(9) |
We know that for every gives a lower bound to the primal solution . But solving for any can be hard unless the objective is convex. We prove that if the eigenvalues of the Hessian are bounded below i.e:
In general , since is non-convex.
is a convex optimization problem for . Equivalently the objective function, i.e the function inside the :
is a convex function in for .
The Hessian of the above function is given by:
Since we know that eigenvalues of , we know that eigenvalues of the above Hessian are . For , the eigenvalues are positive implying that the objective function is convex.
Since gives a lower bound to for every , we get the following result:
(10) |
Note that if is the solution to such that: , by the definition of :
But then by the definition of , implying that the duality gap is zero, i.e. . This procedure leads to Theorem 2.
Appendix C Implementation Details
C.1 Computing the derivative of largest singular value
Our objective is to compute the derivative of the largest singular value, i.e. , with respect to . Let be the singular vectors such that . Then the derivative is given by:
can be computed by running power iteration on . can be computed using the identity:
We use 25 iterations of the power method to compute the above quantities.
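A sketch of this computation as a custom autograd function is given below: the forward pass runs 25 power-iteration steps as described above, and the backward pass returns the closed-form gradient. The class name and the final consistency check against a full SVD are illustrative.

```python
# Custom autograd function: forward = largest singular value via 25 steps of
# power iteration on W^T W; backward = the closed-form gradient u1 v1^T.
import torch

class SpectralNorm(torch.autograd.Function):
    @staticmethod
    def forward(ctx, W, iters=25):
        with torch.no_grad():
            v = torch.randn(W.shape[1], device=W.device)
            for _ in range(iters):            # power iteration on W^T W
                v = W.T @ (W @ v)
                v = v / v.norm()
            u = W @ v
            sigma = u.norm()
            u = u / sigma
        ctx.save_for_backward(u, v)
        return sigma

    @staticmethod
    def backward(ctx, grad_out):
        u, v = ctx.saved_tensors
        return grad_out * torch.outer(u, v), None   # d sigma_max / dW = u1 v1^T

W = torch.randn(50, 30, requires_grad=True)
s = SpectralNorm.apply(W)
s.backward()
print(float(s), float(torch.linalg.matrix_norm(W.detach(), ord=2)))  # should be close
U, S, Vh = torch.linalg.svd(W.detach())
print("gradient error:", float((W.grad - torch.outer(U[:, 0], Vh[0])).abs().max()))
```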
C.2 Update equation for the certificate problem
Our goal is to minimize such that . We know that the Hessian satisfies the following LMIs:
(11) |
is given by Theorem 4 for a neural network of any depth (). For layer networks, and are given by Theorem 3. But for deeper networks (), , . In either case, . Thus, we also have:
(12) |
We will solve the dual () of the certificate problem ().
The primal problem is given by:
Using inequality (11) and Theorem 1 part (a), we know that the dual of the above problem is convex when .
The corresponding dual problem () is given by:
For a given , we have the following optimization:
We will use majorization-minimization to solve this optimization.
At a point , we aim to solve for the point that decreases the objective function. Using Taylor's theorem at the point , we have:
where is the gradient of at and is the Hessian at a point on the line connecting and .
Multiplying both sides by , we get the following equation:
(13) |
Using inequality (12), we know that ,
(14) |
Using equation (13) and inequality (14):
Adding to both sides, we get the following inequality:
LHS is the objective function of and RHS is an upper bound. In majorization-minimization, we minimize an upper bound on the objective function. Thus we set the gradient of RHS with respect to to zero and solve for :
This gives the following iterative equation:
(15) |
C.3 Update equation for the attack problem
Our goal is to minimize within an ball of radius of . We know that the Hessian satisfies the following LMIs:
(16) |
is given by Theorem 4 for a neural network of any depth (). For layer networks, and are given by Theorem 3. But for deeper networks (), , . In either case, . Thus, we also have:
(17) |
We solve the dual () of the attack problem () for the given radius .
The primal problem is given by:
Using inequality (16) and Theorem 2 part (a), we know that the dual of the above problem is convex when .
The corresponding dual problem is given by:
where is given as follows:
For a given , we have the following optimization:
We will use majorization-minimization to solve this optimization.
At a point , we have to solve for the point that decreases the objective function. Using Taylor's theorem at the point , we have:
(18) |
where is the gradient of at and is the Hessian at a point on the line connecting and .
Using inequality (17), we know that ,
(19) |
Using equation (18) and inequality (19):
Adding to both sides, we get the following inequality:
LHS is the objective function of and RHS is an upper bound. In majorization-minimization, we minimize an upper bound on the objective function. Thus we set the gradient of RHS with respect to to zero and solve for :
Rearranging the above equation, we get:
This gives the following iterative equation:
(20) |
C.4 Algorithm to compute the certificate
We start with the following initial values of :
To solve the dual for a given value of , we run iterations of the following update (derived in Appendix C.2):
To maximize the dual over in the range , we use a bisection method: if the solution for a given value of satisfies , set ; else set . Set the new and repeat. The maximum number of updates to is set to 30. This method exhibits linear convergence. The routine to compute the certificate is given in Algorithm 1.
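The sketch below illustrates the outer bisection loop with an abstract inner solver. The squared-ℓ2 form of the dual value, the branching test on the sign of the margin at the inner solution, and the toy quadratic example (for which the inner problem has a closed-form minimizer) are assumptions made for illustration; the actual inner update is equation (15).

```python
# Outer bisection over eta in [0, eta_max]; every eta yields a lower bound on
# the primal, so the best dual value seen is kept.
import torch

def certificate_bisection(x0, margin_fn, solve_inner, eta_max, outer_iters=30):
    """margin_fn(x)     -> f_y(x) - f_t(x)
       solve_inner(eta) -> minimizer of the (convex) inner dual objective"""
    eta_lo, eta_hi, best = 0.0, eta_max, 0.0
    for _ in range(outer_iters):
        eta = 0.5 * (eta_lo + eta_hi)
        x_star = solve_inner(eta)
        dual_val = torch.norm(x_star - x0) ** 2 + eta * margin_fn(x_star)
        best = max(best, float(dual_val))
        if margin_fn(x_star) > 0:      # assumed test: constraint not yet active,
            eta_lo = eta               # so push eta up
        else:
            eta_hi = eta
    return best ** 0.5                 # lower bound on the l2 distance

# Toy usage: the "network" margin is 1 - 0.5*||x||^2 (curvature bound K = 1),
# so the decision boundary is the circle of radius sqrt(2) and eta_max = 2/K = 2.
x0 = torch.tensor([0.5, 0.5])
margin_fn = lambda x: 1.0 - 0.5 * (x ** 2).sum()
solve_inner = lambda eta: x0 / (1.0 - 0.5 * eta)   # closed-form inner minimizer
print(certificate_bisection(x0, margin_fn, solve_inner, eta_max=2.0))
# prints ~0.7071 = sqrt(2) - ||x0||, the exact distance to the boundary
```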
C.5 Algorithm to compute the attack
We start with the following initial values of :
To solve the dual for a given value of , we run iterations of the following update (derived in Appendix C.3):
To maximize the dual over in the range , we use a bisection method: if the solution for a given value of satisfies , set ; else set . Set the new and repeat. The maximum number of updates to is set to 30. This method exhibits linear convergence. The routine to compute the attack example is given in Algorithm 2.
C.6 Computing certificate using local curvature bounds
To compute the robustness certificate in a local region around the input, we first compute the certificate using the global bounds on the curvature. Using this certificate as the initial radius of the safe region, we then refine the certificate. Since the local curvature bound cannot be larger than the global one, this can only increase the value of the certificate. We then use the new robustness certificate as the new radius of the safe region and repeat. We iterate over this process times to compute the local version of our robustness certificate.
To ensure that the optimization trajectory does not escape the safe region, whenever a gradient descent step lands outside the "safe" region, we reduce the step size by a factor of two until the step lies inside the region.
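A compact sketch of this refinement loop is given below; the helper functions for the local curvature bound and for the certificate computation, as well as the number of refinement rounds, are hypothetical placeholders.

```python
# Local refinement of the certificate (Appendix C.6). Hypothetical helpers:
#   global_bound(x0)     -> global curvature bound
#   local_bound(x0, r)   -> curvature bound valid inside an l2 ball of radius r
#   certify(x0, K)       -> robustness certificate computed with curvature K
def refine_certificate(x0, global_bound, local_bound, certify, rounds=3):
    radius = certify(x0, global_bound(x0))   # start from the global certificate
    for _ in range(rounds):
        K_local = local_bound(x0, radius)    # curvature inside the "safe" region
        radius = certify(x0, K_local)        # smaller K => larger certificate
    return radius
```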
Appendix D Summary table comparing our certification method against existing methods
Table 7 provides a summary comparing our certification method against existing methods.
Method | Non-trivial bound | Multi-layer | Activation functions | Norm |
[46] | ✗ | ✓ | All | |
[22] | ✓ | ✓ | ReLU | |
[20] | ✓ | ✗ | Differentiable | |
[39] | ✓ | ✗ | ReLU | |
[51] | ✓ | ✓ | ReLU | |
[50] | ✓ | ✓ | ReLU | |
[55] | ✓ | ✓ | All | |
[10] | ✓ | ✓ | All | |
Ours | ✓ | ✓ | Differentiable |
Appendix E Proofs
E.1 Proof of Theorem 1
- (a) We are given that the Hessian satisfies the following LMIs:
The eigenvalues of are bounded between:
We are given that satisfies the following inequalities where since is neither convex, nor concave as a function of :
We have the following inequalities:
Thus, is a PSD matrix for all when .
Thus, is a convex function in and is a convex optimization problem.
- (b) For every value of , is a lower bound for . Thus, is a lower bound for , i.e.:
(21) Let be the solution of the above dual optimization () such that
(22) is given by the following:
Since we are given that , we get the following equation for :
(23) Since is given by the following equation:
(24)
E.2 Proof of Theorem 2
- (a) Since the Hessian is bounded below:
The eigenvalues of are bounded below:
Since .
Thus is a PSD matrix for all when .
Thus, is a convex function in and is a convex optimization problem.
- (b) For every value of , is a lower bound for . Thus, is a lower bound for :
(27) Let be the solution of the above dual optimization () such that
(28) is given by the following:
(29) Since we are given that , we get the following equation for :
(30) Since is given by the following equation:
(31) Using equations (28) and (31), is the minimum value of :
(32) From equation (30), we know that . Thus, we get:
(33) Using equation (27) we have and using (33),
E.3 Proof of Lemma 1
We have to prove that for an layer neural network, the hessian of the hidden unit in the layer with respect to the input , i.e is given by the following formula:
(34) |
where , is a matrix of size defined as follows:
(35) |
and is a matrix of size defined as follows:
(36) |
can be written in terms of the activations of the previous layer using the following formula:
(37) |
Using the chain rule of the Hessian and , we can write in terms of and as the following:
(38) |
Replacing using equation (38) into equation (37), we get:
(39) | |||
(40) |
For each , we define the matrix as the following:
(41) | |||
(42) |
Substituting using equation into equation (40), we get:
(43) |
We first simplify the expression for . Note that is a sum of symmetric rank one matrices with the coefficient for each . We create a diagonal matrix for the coefficients and another matrix such that each row of is the vector . This leads to the following equation:
(44) |
where is a matrix of size defined as follows:
Thus is the jacobian of with respect to the input .
Using the chain rule of the gradient, we have the following properties of :
(45) | |||
(46) |
Similarly, where is a matrix of size defined as follows:
Thus is the jacobian of with respect to the activations .
Using the chain rule of the gradient, we have the following properties for :
(47) | |||
(48) |
Recall that in our notation: For a matrix , denotes the column vector constructed by taking the transpose of the row of the matrix . Thus row of is and is . Equating the rows in equation (48), we get:
Taking the transpose of both the sides and expressing the RHS as a summation, we get:
(49) |
Substituting using equation (47) into equation (44), we get:
(50) |
Substituting using equation (50) into (43), we get:
(51) |
Thus, equation (51) allows us to write the hessian of unit at layer , i.e in terms of the hessian of unit at layer , i.e .
We will prove the following using induction:
(52) |
Note that for . Thus using (51) we have:
Hence the induction hypothesis (52) is true for .
Now we will assume (52) is true for . Thus we have:
(53) |
We will prove the same for .
Using equation (51), we have:
In the next set of steps, we will be working with the second term of the above equation, i.e:
Substituting using equation (53) we get:
(54) | |||
Combining the two summations in the second term, we get:
Exchanging the summation over and summation over :
Since is independent of , we take it out of the summation over :
Using the property, ; we can move the summation inside the diagonal:
Since is independent of , we can take it out of the summation over :
Using equation (49), we can replace with :
E.4 Proof of Theorem 3
Using Lemma 1, we have the following formula for :
(55) |
We are also given that the activation function satisfies the following property:
(56) |
- (a) We have to prove the following linear matrix inequalities (LMIs):
(57) where and are given as following:
(58) (59) (62) (65) We first prove: :
We substitute and from equations (55) and (59) respectively in :Thus is a weighted sum of symmetric rank one matrices i.e, and it is PSD if and only if coefficient of each rank one matrix i.e, is positive. Using equations (56) and (65), we have the following:
Putting the above results together we have:
(66) Thus is a PSD matrix i.e:
(67) Now we prove that :
We substitute and from equations (55) and (59) respectively in :Thus is a weighted sum of symmetric rank one matrices i.e, and it is PSD if and only if coefficient of each rank one matrix i.e, is positive. Using equations (56) and (65), we have the following:
Putting the above results together we have:
(68) Thus is PSD matrix i.e:
(69) -
(b)
We have to prove that if and , is a PSD matrix, is a NSD matrix.
We are given , . Using equation (65), we have the following:Putting these results together we have:
(70) Thus is a weighted sum of symmetric rank one matrices i.e, and each coefficient is positive.
Using equation (65), we have the following:
Putting these results together we have:
(71) Thus is a PSD and is a NSD matrix if and .
- (c)
E.5 Proof of Theorem 4
We are given that the activation function is such that are bounded, i.e:
(76) |
We have to prove the following:
where is a matrix of size defined as follows:
(77) |
and is a scalar defined as follows:
(78) |
We will prove the same in steps.
In step (a), we will prove:
(79) |
In step (b), we will prove:
(80) |
In step (c), we will use (a) and (b) to prove:
(81) |
- (a) We have to prove that for :
where is a matrix of size defined as follows:
We first prove the case when .
Using equation (47), .
Since :Hence for , we have equality in (79). Hence proved.
Now, we will use proof by induction.
To prove the base case , note that is the only possible value for . Thus, using the result for , the theorem holds for . This proves the base case.
Now we assume the induction hypothesis is true for depth and prove it for depth . Since we have already proven the case , we prove it for .
Using equation (49), we have the following formula for :Taking the element of the vectors on both sides:
(82) By induction hypothesis, we know that:
(83) Using the absolute value properties for equation (82), we have:
Using (inequality (76)) :
Using the induction hypothesis (inequality (83)):
Using equation (77) for definition of :
Hence we prove (79) for all and using induction.
- (b) We have to prove that for :
where is a scalar given as follows:
- (c) We have to prove that:
Using Lemma 1, we have the following equation for :
Using the properties of norm we have:
In the last inequality, we use the property that norm of a diagonal matrix is the maximum absolute value of the diagonal element. Using the product property of absolute value, we get:
Since and are positive terms:
Since is bounded by :
Using inequality (79):
Using inequality (80):
E.6 Proof of Theorem 5
Theorem 5.
For a binary classifier , let denote the indicator function such that . Let be the function constructed by applying randomized smoothing on such that:
then the curvature of the resulting function is bounded i.e:
Proof.
Since , , and :
∎
Appendix F Computing and for different activation functions
F.1 Softplus activation
F.2 Sigmoid activation
For sigmoid activation, we have the following. We use to denote sigmoid:
The second derivative of sigmoid can be bounded using standard differentiation. Let denote . We know that :
To solve for both and , we first differentiate with respect to :
Solving for , we get the solutions:
Since both lie between 0 and 1, we check for the second derivatives:
At , .
At , .
Thus, is a local minimum and is a local maximum.
Substituting the two critical points into , we get , .
Thus, (for use in Theorem 3) and (for use in Theorem 4).
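As a sanity check (not part of the derivation above), the closed-form maximum of |σ''| for the sigmoid can be verified numerically; the grid range and resolution below are arbitrary.

```python
# Numerical check of the bound on the sigmoid's second derivative.
import numpy as np

x = np.linspace(-10, 10, 2_000_001)
s = 1.0 / (1.0 + np.exp(-x))
second = s * (1 - s) * (1 - 2 * s)       # sigma''(x) for the sigmoid
print(second.max(), second.min())        # ~ +0.09623 and -0.09623
print(1 / (6 * np.sqrt(3)))              # closed-form maximum, ~ 0.09623
```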
F.3 Tanh activation
For tanh activation, we have the following:
The second derivative of tanh , i.e can be bounded using standard differentiation. Let denote . We know that :
To solve for both and , we first differentiate with respect to :
Solving for , we get the solutions:
Since both lie between -1 and 1, we check for the second derivatives:
At , .
At , .
Thus, is a local minimum and is a local maximum.
Substituting the two critical points into , we get , .
Thus, (for use in Theorem 3) and (for use in Theorem 4).
Appendix G Quadratic bounds for two-layer ReLU networks
For a 2 layer network with ReLU activation, such that the input lies in the ball , we can compute the bounds over directly:
Thus we can get a lower bound and upper bound for each . We define and as the following:
(86) | |||
(87) |
We can derive the following quadratic lower and upper bounds for each :
The above steps are exactly the same as the quadratic upper and lower bounds used in [54].
Using the above two inequalities and the identity:
we can compute a quadratic lower bound for in terms of by taking the lower bound for when and upper bound when . Furthermore since , we can express the resulting quadratic in terms of . Thus, we get the following quadratic function :
The coefficients , and can be determined using the above procedure. Note that unlike in [54], RHS can be a non-convex function.
Thus, it becomes an optimization problem where the goal is to minimize the distance subject to the RHS (which is quadratic in ) being zero. That is, both our objective and our constraint are quadratic functions. In the optimization literature, this is called the S-procedure and is one of the few non-convex problems that can be solved efficiently [4].
We start with two initial values called (initialized to 0) and (initialized to 5).
We start with an initial value of , initialized at to compute (eq. (86)) and (eq. (87)). If the final distance after solving the S-procedure is less than , we set . if the final distance is greater than , we set . Set new . Repeat until convergence.
Appendix H Additional experiments
Empirical accuracy means the fraction of test samples that are correctly classified after running a PGD attack [30] with an bound on the adversarial perturbations. Certified accuracy means the fraction of test samples that are classified correctly initially and have a robustness certificate greater than a pre-specified attack radius . Unless otherwise specified, for both empirical and certified accuracy, we use . Unless otherwise specified, we use the class with the second largest logit as the attack target for the given input (i.e. the class ). Unless specified otherwise, the experiments were run on the MNIST dataset; we note that our results scale to more complex datasets. The notation (, activation) denotes a neural network with layers and the specified activation function, () denotes standard training with set to , and (CRT, ) denotes CRT training with . The CROWN and CRC certificates are computed over 150 correctly classified images.
H.1 Computing and
First, note that does not depend on the input, but on network weights , label and target . Different images may still have different because label and target may be different.
To compute in the table, we first find, for each pair and , the largest eigenvalue of the Hessian over all test images that have label and second largest logit of class . This gives a rough estimate of the largest curvature in the vicinity of test images with label and target . We could directly take the mean across all such pairs to compute . However, we find that some pairs and are infrequent (with barely 1-2 test images in them), so for such pairs we cannot get a good estimate of the largest curvature in their vicinity. We therefore select all pairs and that have at least 100 images and compute by taking the mean across all such pairs.
To compute in the table, we compute for all pairs and that have at least 100 images, i.e. at least 100 images should have label and target , and then we take the mean across all pairs that satisfy this condition. This was done to enable a fair comparison with . Figure 2 shows a plot of and with increasing for a sigmoid network (with 4 layers).
[Figure 2: The two curvature estimates described above, plotted for a 4-layer sigmoid network.]
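The estimation procedure described above can be sketched as follows. The model, the data loader, and the function names are illustrative; only the per-(label, target) aggregation and the 100-image threshold follow the text.

```python
# Empirical curvature estimate: per-image largest Hessian eigenvalue, maxed
# over each (label, runner-up) pair and averaged over sufficiently large pairs.
import torch
from collections import defaultdict

def empirical_curvature(model, loader, min_images=100):
    per_pair_max = defaultdict(lambda: float("-inf"))
    per_pair_count = defaultdict(int)
    for x_batch, y_batch in loader:
        for x, y in zip(x_batch, y_batch):
            x = x.flatten()
            logits = model(x)
            t = logits.scatter(0, y.view(1), float("-inf")).argmax()  # runner-up
            margin = lambda z: model(z)[y] - model(z)[t]
            H = torch.autograd.functional.hessian(margin, x)
            lam_max = float(torch.linalg.eigvalsh(H).max())
            key = (int(y), int(t))
            per_pair_max[key] = max(per_pair_max[key], lam_max)
            per_pair_count[key] += 1
    vals = [v for k, v in per_pair_max.items() if per_pair_count[k] >= min_images]
    return sum(vals) / len(vals)
```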
H.2 Comparison with provable defenses
In this section, we compare Curvature-based Robust Training (ours) against state-of-the-art interval-bound-propagation-based adversarial training methods: COAP, i.e. Convex Outer Adversarial Polytope [51], and CROWN-IBP [57], with different attack radii on the MNIST and Fashion-MNIST datasets. For CROWN-IBP, we vary the final_beta parameter between 0.5 and 3 (using an interval of 0.1) and choose the model with the best certified accuracy.
Network | Training | Standard Accuracy | Certified Accuracy |
---|---|---|---|
, softplus | CRT, 0.01 | 98.69% | 95.5% |
CROWN-IBP | 98.72% | 89.31% | |
, relu | CROWN-IBP | 98.69% | 91.38% |
COAP | 98.8% | 90.2% | |
, softplus | CRT, 0.01 | 98.56% | 94.44% |
CROWN-IBP | 98.55% | 88.67% | |
, relu | CROWN-IBP | 98.9% | 90.67% |
COAP | 98.9% | 89.0% | |
, softplus | CRT, 0.01 | 98.43% | 93.35% |
CROWN-IBP | 98.34% | 87.41% | |
, relu | CROWN-IBP | 98.78% | 90.45% |
COAP | 98.9% | 89.0% |
Network | Training | Standard Accuracy | Certified Robust Accuracy |
, softplus | CRT, 0.01 | 88.45% | 78.45% |
, relu | COAP | 86.0% | 74.0% |
CROWN-IBP | 85.89% | 74.62% | |
, softplus | CRT, 0.01 | 86.21% | 76.94% |
, relu | COAP | 85.9% | 74.3% |
CROWN-IBP | 86.27% | 74.56% | |
, softplus | CRT, 0.01 | 86.37% | 75.02% |
, relu | COAP | 85.9% | 74.2% |
CROWN-IBP | 86.03% | 74.38% |
Network | Training | Standard Accuracy | Certified Robust Accuracy |
---|---|---|---|
, softplus | CRT, 0.01 | 98.68% | 69.79% |
CROWN-IBP | 88.48% | 42.36% | |
, relu | COAP | 89.33% | 44.29% |
CROWN-IBP | 89.49% | 44.96% | |
, softplus | CRT, 0.01 | 98.26% | 14.21% |
CRT, | 97.82% | 50.72% | |
CRT, 0.05 | 97.43% | 57.78% | |
CROWN-IBP | 86.58% | 42.14% | |
, relu | COAP | 89.12% | 44.21% |
CROWN-IBP | 87.77% | 44.74% | |
, softplus | CRT, | 97.80% | 6.25% |
CRT, | 97.09% | 29.64% | |
CRT, | 96.33% | 44.44% | |
CRT, 0.07 | 95.60% | 53.19% | |
CROWN-IBP | 82.74% | 41.34% | |
, relu | COAP | 90.17% | 44.66% |
CROWN-IBP | 84.4% | 43.83% |
H.3 Comparing Randomized Smoothing with CRT
Network | Randomized Smoothing | CRT | ||
---|---|---|---|---|
_ | ||||
, sigmoid | 93.75% | 93.09% | 88.91% | 95.61% |
, tanh | 94.61% | 93.08% | 82.26% | 95.00% |
, sigmoid | 94.00% | 93.03% | 86.58% | 94.99% |
, tanh | 93.69% | 91.68% | 80.55% | 94.16% |
, sigmoid | 93.68% | 92.45% | 84.99% | 93.41% |
, tanh | 93.57% | 92.19% | 83.90% | 91.37% |
Since randomized smoothing is designed to work in untargeted attack settings while CRT is for targeted attacks, we make the following changes to randomized smoothing. First, we use initial samples to select the label class () and the false target class (). The number of samples used for estimation was and the failure probability was . Then we use the binary version of randomized smoothing for estimation, i.e. classifying between and . To find the adversarial example for adversarial training, we use the cross-entropy loss over the two classes ( and ).
H.4 Additional experiments
Network | Accuracy | Attack success rate | Certificate success rate | |
---|---|---|---|---|
, sigmoid | 0. | 98.77% | 5.05% | 2.24% |
0.01 | 98.57% | 100% | 15.68% | |
0.02 | 98.59% | 100% | 31.56% | |
0.03 | 98.30% | 100% | 44.17% | |
, sigmoid | 0. | 98.52% | 0.% | 0.12% |
0.01 | 98.23% | 44.86% | 3.34% | |
0.03 | 97.86% | 100% | 11.51% | |
0.05 | 97.60% | 100% | 22.59% | |
, sigmoid | 0. | 98.22% | 0.% | 0.01% |
0.01 | 97.24% | 24.42% | 2.68% | |
0.03 | 96.27% | 44.42% | 6.45% | |
0.05 | 95.77% | 99.97% | 12.40% | |
0.06 | 95.52% | 100% | 15.87% | |
0.07 | 95.24% | 100% | 19.53% |
Network | Training | Standard Accuracy | Empirical Robust Accuracy | Certified Robust Accuracy | Certificate (mean) | |
CROWN | CRC | |||||
, sigmoid | standard | 46.23% | 37.82% | 14.10% | 0.37219 | 0.38173 |
45.42% | 38.17% | 26.50% | 0.40540 | 0.55010 | ||
, sigmoid | standard | 48.57% | 34.80% | 0.00% | 0.19127 | 0.01404 |
50.31% | 39.87% | 18.28% | 0.24778 | 0.37895 | ||
, sigmoid | standard | 46.04% | 34.38% | 0.00% | 0.19340 | 0.00191 |
48.28% | 40.10% | 21.07% | 0.29654 | 0.40005 |
Network | Training | Standard Accuracy | Empirical Robust Accuracy | Certified Robust Accuracy | Certificate (mean) | |
CROWN | CRC | |||||
, sigmoid | PGD | 98.80% | 96.26% | 93.37% | 0.37595 | 0.82702 |
TRADES | 98.87% | 96.76% | 95.13% | 0.41358 | 0.92300 | |
CRT, | 98.57% | 96.28% | 95.59% | 0.43061 | 1.54673 | |
, tanh | PGD | 98.76% | 95.79% | 84.11% | 0.30833 | 0.61340 |
TRADES | 98.63% | 96.20% | 93.72% | 0.40601 | 0.86287 | |
CRT, | 98.52% | 95.90% | 95.00% | 0.37691 | 1.47016 | |
, sigmoid | PGD | 98.84% | 96.14% | 0.00% | 0.29632 | 0.07290 |
TRADES | 98.95% | 96.79% | 0.00% | 0.30576 | 0.09108 | |
CRT, | 98.23% | 95.70% | 94.99% | 0.39603 | 1.24100 | |
, tanh | PGD | 98.78% | 94.92% | 0.00% | 0.12706 | 0.03036 |
TRADES | 98.16% | 94.78% | 0.00% | 0.15875 | 0.02983 | |
CRT, | 98.15% | 95.00% | 94.16% | 0.28004 | 1.14995 | |
, sigmoid | PGD | 98.84% | 96.26% | 0.00% | 0.25444 | 0.00658 |
TRADES | 98.76% | 96.67% | 0.00% | 0.26128 | 0.00625 | |
CRT, | 97.83% | 94.65% | 93.41% | 0.40327 | 1.06208 | |
, tanh | PGD | 98.53% | 94.53% | 0.00% | 0.07439 | 0.00140 |
TRADES | 97.08% | 92.85% | 0.00% | 0.11889 | 0.00068 | |
CRT, | 97.24% | 93.05% | 91.37% | 0.33649 | 0.93890 |
| Network | Training | Target | Certificate (mean): CROWN | Certificate (mean): CRC | Time per image (s): CROWN | Time per image (s): CRC |
|---|---|---|---|---|---|---|
| , relu | standard | runner-up | 0.50110 | 0.59166 | 0.1359 | 2.3492 |
| | | random | 0.68506 | 0.83080 | 0.2213 | 3.5942 |
| | | least | 0.86386 | 1.04883 | 0.1904 | 3.0292 |
| , sigmoid | standard | runner-up | 0.28395 | 0.48500 | 0.1818 | 0.1911 |
| | | random | 0.38501 | 0.69087 | 0.1870 | 0.1912 |
| | | least | 0.47639 | 0.85526 | 0.1857 | 0.1920 |
| | CRT, | runner-up | 0.43061 | 1.54673 | 0.1823 | 0.1910 |
| | | random | 0.52847 | 1.99918 | 0.1853 | 0.1911 |
| | | least | 0.62319 | 2.41047 | 0.1873 | 0.1911 |
| , tanh | standard | runner-up | 0.23928 | 0.40047 | 0.1672 | 0.1973 |
| | | random | 0.31281 | 0.52025 | 0.1680 | 0.1986 |
| | | least | 0.38964 | 0.63081 | 0.1726 | 0.1993 |
| | CRT, | runner-up | 0.37691 | 1.47016 | 0.1633 | 0.1963 |
| | | random | 0.45896 | 1.87571 | 0.1657 | 0.1982 |
| | | least | 0.52800 | 2.21704 | 0.1697 | 0.1981 |
| , sigmoid | standard | runner-up | 0.24644 | 0.06874 | 1.6356 | 0.5012 |
| | | random | 0.29496 | 0.08275 | 1.5871 | 0.5090 |
| | | least | 0.33436 | 0.09771 | 1.6415 | 0.5056 |
| | CRT, | runner-up | 0.39603 | 1.24100 | 1.5625 | 0.5013 |
| | | random | 0.46808 | 1.54622 | 1.6142 | 0.4974 |
| | | least | 0.51906 | 1.75916 | 1.6054 | 0.4967 |
| , tanh | standard | runner-up | 0.08174 | 0.01169 | 1.4818 | 0.4908 |
| | | random | 0.10012 | 0.01432 | 1.5906 | 0.4963 |
| | | least | 0.12132 | 0.01757 | 1.5888 | 0.5076 |
| | CRT, | runner-up | 0.28004 | 1.14995 | 1.4832 | 0.4926 |
| | | random | 0.32942 | 1.41032 | 1.5637 | 0.4957 |
| | | least | 0.38023 | 1.65692 | 1.5626 | 0.4930 |
| , sigmoid | standard | runner-up | 0.19501 | 0.00454 | 4.7814 | 0.8107 |
| | | random | 0.21417 | 0.00542 | 4.6313 | 0.8377 |
| | | least | 0.22706 | 0.00609 | 4.7973 | 0.8313 |
| | CRT, | runner-up | 0.40327 | 1.06208 | 4.1830 | 0.8088 |
| | | random | 0.47038 | 1.29095 | 4.3922 | 0.7333 |
| | | least | 0.52249 | 1.49521 | 4.4676 | 0.7879 |
| , tanh | standard | runner-up | 0.03554 | 0.00028 | 5.7016 | 0.8836 |
| | | random | 0.04247 | 0.00036 | 5.8379 | 0.8602 |
| | | least | 0.04895 | 0.00044 | 5.8298 | 0.9045 |
| | CRT, | runner-up | 0.33649 | 0.93890 | 3.8815 | 0.8182 |
| | | random | 0.41617 | 1.18956 | 4.0013 | 0.8215 |
| | | least | 0.47778 | 1.41429 | 4.3856 | 0.8311 |
| Network | γ | Standard Accuracy | Empirical Robust Accuracy | Certified Robust Accuracy | Curvature bound (mean) | |
|---|---|---|---|---|---|---|
| , sigmoid | 0.0 | 98.77% | 96.17% | 95.04% | 7.2031 | 72.0835 |
| | 0.005 | 98.82% | 96.33% | 95.61% | 3.8411 | 8.2656 |
| | 0.01 | 98.57% | 96.28% | 95.59% | 2.8196 | 5.4873 |
| | 0.02 | 98.59% | 95.97% | 95.22% | 2.2114 | 3.7228 |
| | 0.03 | 98.30% | 95.73% | 94.94% | 1.8501 | 2.9219 |
| , tanh | 0.0 | 98.65% | 95.48% | 92.69% | 12.8434 | 107.5689 |
| | 0.005 | 98.71% | 95.88% | 94.76% | 4.8116 | 10.1860 |
| | 0.01 | 98.52% | 95.90% | 95.00% | 3.4269 | 6.3529 |
| | 0.02 | 98.35% | 95.71% | 94.77% | 2.3943 | 4.1513 |
| | 0.03 | 98.29% | 95.39% | 94.54% | 1.9860 | 3.933 |
| , sigmoid | 0.0 | 98.52% | 90.26% | 0.00% | 19.2131 | 3294.9070 |
| | 0.005 | 98.41% | 95.81% | 94.91% | 2.6249 | 13.4985 |
| | 0.01 | 98.23% | 95.70% | 94.99% | 1.9902 | 8.6654 |
| | 0.02 | 97.99% | 95.33% | 94.64% | 1.4903 | 5.4380 |
| | 0.03 | 97.86% | 94.98% | 94.15% | 1.2396 | 4.1409 |
| | 0.04 | 97.73% | 94.60% | 93.88% | 1.0886 | 3.3354 |
| | 0.05 | 97.60% | 94.45% | 93.65% | 0.9677 | 2.7839 |
| , tanh | 0.0 | 98.19% | 86.38% | 0.00% | 133.7992 | 17767.5918 |
| | 0.005 | 98.13% | 94.56% | 93.01% | 3.2461 | 17.5500 |
| | 0.01 | 98.15% | 95.00% | 94.16% | 2.2347 | 10.8635 |
| | 0.02 | 97.84% | 94.79% | 94.05% | 1.6556 | 6.7072 |
| | 0.03 | 97.70% | 94.19% | 93.42% | 1.3546 | 5.0533 |
| | 0.04 | 97.57% | 94.04% | 92.95% | 1.1621 | 4.0071 |
| | 0.05 | 97.31% | 93.66% | 92.65% | 1.0354 | 3.3439 |
| , sigmoid | 0.0 | 98.22% | 83.04% | 0.00% | 86.9974 | 343582.3125 |
| | 0.01 | 97.83% | 94.65% | 93.41% | 1.6823 | 10.2289 |
| | 0.02 | 97.33% | 94.02% | 92.94% | 1.2089 | 6.5573 |
| | 0.03 | 97.07% | 93.52% | 92.65% | 1.0144 | 4.9576 |
| | 0.04 | 96.70% | 92.78% | 91.95% | 0.8840 | 3.9967 |
| | 0.05 | 96.38% | 92.29% | 91.33% | 0.7890 | 3.4183 |
| | 0.07 | 96.08% | 91.83% | 90.67% | 0.6614 | 2.6905 |
| , tanh | 0.0 | 97.45% | 75.18% | 0.00% | 913.6984 | 37148156 |
| | 0.01 | 97.24% | 93.05% | 91.37% | 1.9114 | 12.2148 |
| | 0.02 | 96.82% | 92.65% | 91.35% | 1.3882 | 7.1771 |
| | 0.03 | 96.27% | 91.43% | 90.09% | 1.1643 | 5.1671 |
| | 0.04 | 95.62% | 90.69% | 89.41% | 0.9620 | 3.9061 |
| | 0.05 | 95.77% | 90.69% | 89.40% | 0.9160 | 3.2909 |
| | 0.07 | 95.24% | 89.51% | 87.91% | 0.7540 | 2.5635 |
| Network | γ | Standard Accuracy | Empirical Robust Accuracy | Certified Robust Accuracy | Certificate (mean): CROWN | Certificate (mean): CRC |
|---|---|---|---|---|---|---|
| , sigmoid | 0.0 | 98.37% | 76.28% | 54.17% | 0.28395 | 0.48500 |
| | 0.005 | 97.96% | 88.65% | 82.68% | 0.36125 | 0.83367 |
| | 0.01 | 98.08% | 88.82% | 83.53% | 0.32548 | 0.84719 |
| | 0.02 | 97.88% | 88.90% | 83.68% | 0.34744 | 0.86632 |
| | 0.03 | 97.73% | 89.28% | 84.73% | 0.35387 | 0.90490 |
| , tanh | 0.0 | 98.34% | 79.10% | 14.42% | 0.23938 | 0.40047 |
| | 0.005 | 98.01% | 89.95% | 85.70% | 0.27262 | 0.89672 |
| | 0.01 | 97.99% | 90.17% | 86.18% | 0.28647 | 0.93819 |
| | 0.02 | 97.64% | 90.13% | 86.40% | 0.30075 | 0.99166 |
| | 0.03 | 97.52% | 89.96% | 86.22% | 0.30614 | 0.98771 |
| , sigmoid | 0.0 | 98.37% | 85.19% | 0.00% | 0.24644 | 0.06874 |
| | 0.005 | 97.98% | 91.93% | 88.66% | 0.38030 | 0.99044 |
| | 0.01 | 97.71% | 91.49% | 88.33% | 0.39799 | 1.07842 |
| | 0.02 | 97.50% | 91.34% | 88.38% | 0.38091 | 1.08396 |
| | 0.03 | 97.16% | 91.10% | 88.63% | 0.41015 | 1.15505 |
| | 0.04 | 97.03% | 90.96% | 88.48% | 0.42704 | 1.18073 |
| | 0.05 | 96.76% | 90.65% | 88.30% | 0.43884 | 1.19296 |
| , tanh | 0.0 | 97.91% | 77.40% | 0.00% | 0.08174 | 0.01169 |
| | 0.005 | 97.45% | 91.32% | 88.57% | 0.28196 | 0.95367 |
| | 0.01 | 97.29% | 90.98% | 88.31% | 0.31237 | 1.05915 |
| | 0.02 | 97.04% | 90.21% | 87.77% | 0.30901 | 1.08607 |
| | 0.03 | 96.88% | 90.02% | 87.52% | 0.34148 | 1.11717 |
| | 0.04 | 96.53% | 89.61% | 86.87% | 0.36583 | 1.11307 |
| | 0.05 | 96.31% | 89.25% | 86.26% | 0.38519 | 1.11689 |
| , sigmoid | 0.0 | 98.39% | 83.27% | 0.00% | 0.19501 | 0.00454 |
| | 0.01 | 97.41% | 91.71% | 89.61% | 0.40620 | 1.05323 |
| | 0.02 | 96.47% | 90.03% | 87.77% | 0.45074 | 1.14219 |
| | 0.03 | 96.24% | 90.40% | 88.14% | 0.47961 | 1.30671 |
| | 0.04 | 95.65% | 89.61% | 87.54% | 0.49987 | 1.35129 |
| | 0.05 | 95.36% | 89.10% | 87.09% | 0.51187 | 1.36064 |
| | 0.07 | 95.23% | 88.03% | 85.93% | 0.54754 | 1.27948 |
| , tanh | 0.0 | 97.65% | 69.20% | 0.00% | 0.03554 | 0.00028 |
| | 0.01 | 96.52% | 89.38% | 86.40% | 0.34778 | 0.97365 |
| | 0.02 | 96.09% | 88.79% | 86.09% | 0.41662 | 1.10860 |
| | 0.03 | 95.74% | 88.36% | 85.65% | 0.44981 | 1.17400 |
| | 0.04 | 95.10% | 87.50% | 84.74% | 0.48356 | 1.21957 |
| | 0.05 | 95.14% | 87.72% | 84.77% | 0.49113 | 1.25076 |
| | 0.07 | 94.34% | 86.67% | 83.90% | 0.49750 | 1.24198 |
References
- Anil et al. [2018] Anil, C., Lucas, J., and Grosse, R. B. Sorting out Lipschitz function approximation. In ICML, 2018.
- Athalye & Carlini [2018] Athalye, A. and Carlini, N. On the robustness of the CVPR 2018 white-box adversarial example defenses. ArXiv, abs/1804.03286, 2018.
- Athalye et al. [2018] Athalye, A., Carlini, N., and Wagner, D. A. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In ICML, 2018.
- Boyd & Vandenberghe [2004] Boyd, S. and Vandenberghe, L. Convex Optimization. Cambridge University Press, New York, NY, USA, 2004. ISBN 0521833787.
- Bunel et al. [2017] Bunel, R., Turkaslan, I., Torr, P. H. S., Kohli, P., and Mudigonda, P. K. A unified view of piecewise linear neural network verification. In NeurIPS, 2017.
- Cao & Gong [2017] Cao, X. and Gong, N. Z. Mitigating evasion attacks to deep neural networks via region-based classification. ArXiv, abs/1709.05583, 2017.
- Carlini & Wagner [2017] Carlini, N. and Wagner, D. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, AISec ’17, 2017.
- Carlini et al. [2017] Carlini, N., Katz, G., Barrett, C. E., and Dill, D. L. Provably minimally-distorted adversarial examples. 2017.
- Cheng et al. [2017] Cheng, C.-H., Nührenberg, G., and Ruess, H. Maximum resilience of artificial neural networks. In ATVA, 2017.
- Cohen et al. [2019] Cohen, J. M., Rosenfeld, E., and Kolter, J. Z. Certified adversarial robustness via randomized smoothing. In ICML, 2019.
- Croce et al. [2018] Croce, F., Andriushchenko, M., and Hein, M. Provable robustness of ReLU networks via maximization of linear regions. ArXiv, abs/1810.07481, 2018.
- Dutta et al. [2018] Dutta, S., Jha, S., Sankaranarayanan, S., and Tiwari, A. Output range analysis for deep feedforward neural networks. In NFM, 2018.
- Dvijotham et al. [2018a] Dvijotham, K., Gowal, S., Stanforth, R., Arandjelovic, R., O’Donoghue, B., Uesato, J., and Kohli, P. Training verified learners with learned verifiers. ArXiv, abs/1805.10265, 2018a.
- Dvijotham et al. [2018b] Dvijotham, K., Stanforth, R., Gowal, S., Mann, T. A., and Kohli, P. A dual approach to scalable verification of deep networks. In UAI, 2018b.
- Ehlers [2017] Ehlers, R. Formal verification of piece-wise linear feed-forward neural networks. ArXiv, abs/1705.01320, 2017.
- Fazlyab et al. [2019] Fazlyab, M., Robey, A., Hassani, H., Morari, M., and Pappas, G. J. Efficient and accurate estimation of Lipschitz constants for deep neural networks. CoRR, abs/1906.04893, 2019. URL http://arxiv.org/abs/1906.04893.
- Fischetti & Jo [2018] Fischetti, M. and Jo, J. Deep neural networks and mixed integer linear optimization. Constraints, 23:296–309, 2018.
- Gehr et al. [2018] Gehr, T., Mirman, M., Drachsler-Cohen, D., Tsankov, P., Chaudhuri, S., and Vechev, M. T. AI2: Safety and robustness certification of neural networks with abstract interpretation. 2018 IEEE Symposium on Security and Privacy (SP), pp. 3–18, 2018.
- Gowal et al. [2018] Gowal, S., Dvijotham, K., Stanforth, R., Bunel, R., Qin, C., Uesato, J., Arandjelovic, R., Mann, T. A., and Kohli, P. On the effectiveness of interval bound propagation for training verifiably robust models. ArXiv, abs/1810.12715, 2018.
- Hein & Andriushchenko [2017] Hein, M. and Andriushchenko, M. Formal guarantees on the robustness of a classifier against adversarial manipulation. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 30, pp. 2266–2276. 2017.
- Huang et al. [2016] Huang, X., Kwiatkowska, M. Z., Wang, S., and Wu, M. Safety verification of deep neural networks. ArXiv, abs/1610.06940, 2016.
- Katz et al. [2017] Katz, G., Barrett, C. W., Dill, D. L., Julian, K., and Kochenderfer, M. J. Reluplex: An efficient SMT solver for verifying deep neural networks. In CAV, 2017.
- Kurakin et al. [2016a] Kurakin, A., Goodfellow, I. J., and Bengio, S. Adversarial examples in the physical world. ArXiv, abs/1607.02533, 2016a.
- Kurakin et al. [2016b] Kurakin, A., Goodfellow, I. J., and Bengio, S. Adversarial machine learning at scale. ArXiv, abs/1611.01236, 2016b.
- LeCun & Cortes [2010] LeCun, Y. and Cortes, C. MNIST handwritten digit database. 2010. URL http://yann.lecun.com/exdb/mnist/.
- Lécuyer et al. [2018] Lécuyer, M., Atlidakis, V., Geambasu, R., Hsu, D., and Jana, S. K. K. Certified robustness to adversarial examples with differential privacy. In IEEE Symposium on Security and Privacy (S&P), 2019.
- Li et al. [2018] Li, B. H., Chen, C., Wang, W., and Carin, L. Certified adversarial robustness with additive Gaussian noise. 2018.
- Liu et al. [2017] Liu, X., Cheng, M., Zhang, H., and Hsieh, C.-J. Towards robust neural networks via random self-ensemble. ArXiv, abs/1712.00673, 2017.
- Lomuscio & Maganti [2017] Lomuscio, A. and Maganti, L. An approach to reachability analysis for feed-forward ReLU neural networks. ArXiv, abs/1706.07351, 2017.
- Madry et al. [2018] Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rJzIBfZAb.
- Mirman et al. [2018] Mirman, M., Gehr, T., and Vechev, M. T. Differentiable abstract interpretation for provably robust neural networks. In ICML, 2018.
- Miyato et al. [2017] Miyato, T., Maeda, S.-i., Koyama, M., and Ishii, S. Virtual adversarial training: A regularization method for supervised and semi-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41:1979–1993, 2017.
- Miyato et al. [2018] Miyato, T., Kataoka, T., Koyama, M., and Yoshida, Y. Spectral normalization for generative adversarial networks. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=B1QRgziT-.
- Moosavi Dezfooli et al. [2019] Moosavi Dezfooli, S. M., Fawzi, A., Uesato, J., and Frossard, P. Robustness via curvature regularization, and vice versa. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
- Papernot et al. [2016a] Papernot, N., McDaniel, P., Wu, X., Jha, S., and Swami, A. Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pp. 582–597, May 2016. doi: 10.1109/SP.2016.41.
- Papernot et al. [2016b] Papernot, N., McDaniel, P. D., Goodfellow, I. J., Jha, S., Celik, Z. B., and Swami, A. Practical black-box attacks against deep learning systems using adversarial examples. CoRR, abs/1602.02697, 2016. URL http://arxiv.org/abs/1602.02697.
- Peck et al. [2017] Peck, J., Roels, J., Goossens, B., and Saeys, Y. Lower bounds on the robustness to adversarial perturbations. In NIPS, 2017.
- Qin et al. [2019] Qin, C., Martens, J., Gowal, S., Krishnan, D., Fawzi, A., De, S., Stanforth, R., and Kohli, P. Adversarial robustness through local linearization. arXiv preprint arXiv:1907.02610, 2019.
- Raghunathan et al. [2018a] Raghunathan, A., Steinhardt, J., and Liang, P. Certified defenses against adversarial examples. ArXiv, abs/1801.09344, 2018a.
- Raghunathan et al. [2018b] Raghunathan, A., Steinhardt, J., and Liang, P. Semidefinite relaxations for certifying robustness to adversarial examples. In NeurIPS, 2018b.
- Salman et al. [2019] Salman, H., Yang, G., Li, J., Zhang, P., Zhang, H., Razenshteyn, I. P., and Bubeck, S. Provably robust deep learning via adversarially trained smoothed classifiers. ArXiv, abs/1906.04584, 2019.
- Samangouei et al. [2018] Samangouei, P., Kabkab, M., and Chellappa, R. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=BkJ3ibb0-.
- Scaman & Virmaux [2018] Scaman, K. and Virmaux, A. Lipschitz regularity of deep neural networks: analysis and efficient estimation, 2018.
- Singh et al. [2018] Singh, G., Gehr, T., Mirman, M., Püschel, M., and Vechev, M. T. Fast and effective robustness certification. In NeurIPS, 2018.
- Singla et al. [2019] Singla, S., Wallace, E., Feng, S., and Feizi, S. Understanding impacts of high-order loss approximations and features in deep learning interpretation. In ICML, 2019.
- Szegedy et al. [2014] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014. URL http://arxiv.org/abs/1312.6199.
- Uesato et al. [2018] Uesato, J., O’Donoghue, B., Kohli, P., and van den Oord, A. Adversarial risk and the dangers of evaluating against weak attacks. In ICML, 2018.
- Wang et al. [2018a] Wang, S., Chen, Y., Abdou, A., and Jana, S. K. K. Mixtrain: Scalable training of verifiably robust neural networks. 2018a.
- Wang et al. [2018b] Wang, S., Pei, K., Whitehouse, J., Yang, J., and Jana, S. K. K. Efficient formal safety analysis of neural networks. In NeurIPS, 2018b.
- Weng et al. [2018] Weng, T.-W., Zhang, H., Chen, H., Song, Z., Hsieh, C.-J., Boning, D. S., Dhillon, I. S., and Daniel, L. Towards fast computation of certified robustness for ReLU networks. ArXiv, abs/1804.09699, 2018.
- Wong & Kolter [2017] Wong, E. and Kolter, J. Z. Provable defenses against adversarial examples via the convex outer adversarial polytope. ArXiv, abs/1711.00851, 2017.
- Wong et al. [2018] Wong, E., Schmidt, F. R., Metzen, J. H., and Kolter, J. Z. Scaling provable adversarial defenses. In NeurIPS, 2018.
- Xiao et al. [2017] Xiao, H., Rasul, K., and Vollgraf, R. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. CoRR, abs/1708.07747, 2017. URL http://arxiv.org/abs/1708.07747.
- Zhang et al. [2018a] Zhang, H., Weng, T.-W., Chen, P.-Y., Hsieh, C.-J., and Daniel, L. Efficient neural network robustness certification with general activation functions. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 31, pp. 4939–4948. Curran Associates, Inc., 2018a.
- Zhang et al. [2018b] Zhang, H., Weng, T.-W., Chen, P.-Y., Hsieh, C.-J., and Daniel, L. Efficient neural network robustness certification with general activation functions. ArXiv, abs/1811.00866, 2018b.
- Zhang et al. [2018c] Zhang, H., Zhang, P., and Hsieh, C.-J. Recurjac: An efficient recursive algorithm for bounding jacobian matrix of neural networks and its applications. In AAAI, 2018c.
- Zhang et al. [2019a] Zhang, H., Chen, H., Xiao, C., Li, B., Boning, D. S., and Hsieh, C. Towards stable and efficient training of verifiably robust neural networks. CoRR, abs/1906.06316, 2019a. URL http://arxiv.org/abs/1906.06316.
- Zhang et al. [2019b] Zhang, H., Yu, Y., Jiao, J., Xing, E. P., Ghaoui, L. E., and Jordan, M. I. Theoretically principled trade-off between robustness and accuracy. In ICML, 2019b.
- Zheng et al. [2016] Zheng, S., Song, Y., Leung, T., and Goodfellow, I. J. Improving the robustness of deep neural networks via stability training. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4480–4488, 2016.