Robustness Verification for Classifier Ensembles

(Research funded by the project NWA.BIGDATA.2019.002: “EXoDuS - EXplainable Data Science” and the FWO G030020N project “SAILor”.)
Abstract
We give a formal verification procedure that decides whether a classifier ensemble is robust against arbitrary randomized attacks. Such attacks consist of a set of deterministic attacks and a distribution over this set. The robustness-checking problem consists of assessing, given a set of classifiers and a labelled data set, whether there exists a randomized attack that induces a certain expected loss against all classifiers. We show the NP-hardness of the problem and provide an upper bound on the number of attacks that is sufficient to form an optimal randomized attack. These results provide an effective way to reason about the robustness of a classifier ensemble. We provide SMT and MILP encodings to compute optimal randomized attacks or prove that there is no attack inducing a certain expected loss. In the latter case, the classifier ensemble is provably robust. Our prototype implementation verifies multiple neural-network ensembles trained for image-classification tasks. The experimental results using the MILP encoding are promising both in terms of scalability and the general applicability of our verification procedure.
Keywords: Adversarial attacks · Ensemble classifiers · Robustness

1 Introduction
Recent years have seen rapid progress in machine learning (ML), with a strong impact on fields like autonomous systems, computer vision, and robotics. As a consequence, many systems employing ML interact increasingly with aspects of our everyday life; consider, for instance, autonomous cars operating amongst pedestrians and bicycles. While studies indicate that self-driving cars, inherently relying on ML techniques, make around 80% fewer traffic mistakes than human drivers [19], verifiable safety remains a major open challenge [26, 13, 23, 5].
In the context of self-driving cars, for instance, certain camera data may contain noise that is introduced randomly or actively via so-called adversarial attacks. We focus on the particular problem of such attacks in image classification. A successful attack perturbs the original image in such a way that a human does not recognize any difference while the ML classifier misclassifies the image. A measure of the difference between the ground-truth classification, for instance by a human, and a potentially perturbed ML classifier is referred to as the loss.
A standard way to render image classification more robust against adversarial attacks is to employ a set of classifiers, also referred to as a classifier ensemble [2, 20, 21, 3]. The underlying idea is to obscure the actual classifier from the attacker. One possible formalization of the competition between an adversarial attacker and the ensemble is that of a zero-sum game [20]: the attacker chooses first, and the ensemble reacts to the attack with minimal loss, that is, it chooses a classifier that induces maximal classification accuracy.
In this setting, the attacker may need to use randomization to behave optimally (see Fig. 1, cf. [6]). Such an attack is called optimal if the expected loss is maximized regardless of the choice of classifier.
Inspired by previous approaches for single classifiers [15, 12], we develop a formal verification procedure that decides if a classifier ensemble is robust against any randomized attack. In particular, the formal problem is the following. Given a set of classifiers and a labelled data set, we want to find a probability distribution and a set of attacks that induce an optimal randomized attack. Akin to the setting in [20], one can provide thresholds on potential perturbations of data points and the minimum shift in classification values. Thereby, it may happen that no optimal attack exists, in which case we call the classifier ensemble robust. Our aim is the development of a principled and effective method that is able to either find the optimal attack or prove that the ensemble is robust with respect to the predefined thresholds.
To that end, we first establish a number of theoretical results. First, we show that the underlying formal problem is NP-hard. Towards computational tractability, we also show that there exists an upper bound on the number of deterministic attacks needed to form an optimal randomized attack. Using these results, we provide an SMT encoding that computes suitable randomized attacks for a set of convolutional neural networks with ReLU activation functions and a labelled data set. In case there is no solution to that problem, the set of neural networks forms a robust classifier ensemble, see Fig. 2. Together with the state-of-the-art SMT solver Z3 [9], this encoding provides a complete method to solve the problem at hand. Yet, our experiments reveal that it scales only to small examples. We therefore outline the necessary steps to formulate the problem as a mixed-integer linear program (MILP), enabling the use of efficient optimization solvers like Gurobi [14].
In our experiments, we show the applicability of our approach by means of a benchmark set of binary classifiers, which were trained on the MNIST and German traffic sign datasets [10, 25].
Related work
It is noteworthy that there is some recent work on robustness checking of decision-tree ensembles [22]. However, that approach is based on abstract interpretation and is thus not complete. Other approaches for robustness checking of machine-learning classifiers focus on single classifiers (see, e.g., [15, 12, 7, 18, 24]). Akin to our approach, some of these works employ SMT solving. In [4], MILP solving is used for verification tasks on (single) recurrent neural networks. In contrast, our framework allows us to compute attacks for classifier ensembles.
In [11], Dreossi et al. propose a robustness framework which unifies the optimization and verification views on the robustness-checking problem and encompasses several existing approaches. They explicitly mention that their framework applies to local robustness and argue that most of the existing work on finding adversarial examples and verifying robustness fits their framework. Our work, in the special case of a single classifier and a singleton data set, fits precisely into their framework. However, we generalize in those two dimensions by averaging over the robustness target value (in their jargon) for all points in a data set, and by considering classifier ensembles. This means that our point of view of the adversarial environment is neither that of a white-box attacker nor that of a black-box attacker. Indeed, we know the set of classifiers but we do not know what strategy is used to choose which (convex combination of) classifiers to apply. Our environment is thus a gray-box attacker.
2 Preliminaries
Let $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$ be a vector. We write $\lVert x \rVert_1$ for the “Manhattan norm” of $x$, that is $\lVert x \rVert_1 = \sum_{i=1}^{n} |x_i|$.
We will make use of a partial inverse of the $\max$ function. Consider a totally ordered set $Y$ and a function $h \colon X \to Y$. Throughout this work we define the $\arg\max$ (arguments of the maxima) partial function as follows. For all $x \in X$ we set $\arg\max h = x$ if $x$ is the unique element of $X$ such that $h(x) = \max_{x' \in X} h(x')$. If more than one element of $X$ witnesses the maximum then $\arg\max h$ is undefined.
A probability distribution over a finite set $X$ is a function $\mu \colon X \to [0, 1]$ with $\sum_{x \in X} \mu(x) = 1$. The set of all distributions on $X$ is $\mathit{Distr}(X)$.
2.1 Neural networks
We loosely follow the neural-network notation from [15, 12]. A feed-forward neural network (NN for short) with $n$ inputs and $o$ outputs encodes a function $f \colon \mathbb{R}^n \to \mathbb{R}^o$. We focus on NNs with ReLU activation functions. Formally, the function $f$ is given in the form of

- a sequence of weight matrices $W_1, \ldots, W_k$ with $W_i \in \mathbb{R}^{d_i \times d_{i-1}}$, for all $1 \leq i \leq k$, and
- a sequence of bias vectors $b_1, \ldots, b_k$ with $b_i \in \mathbb{R}^{d_i}$, for all $1 \leq i \leq k$.

Additionally, we have that $d_0 = n$ and $d_k = o$. We then set $f(x) = z_k$ for all $x \in \mathbb{R}^n$, where for all $1 \leq i \leq k$ we define
$$z_i = \mathrm{ReLU}(W_i z_{i-1} + b_i)$$
and $z_0 = x$. The ReLU function on vectors is the element-wise maximum between $0$ and the vector entries, that is, if $v = (v_1, \ldots, v_d)$ then $\mathrm{ReLU}(v) = (\max(0, v_1), \ldots, \max(0, v_d))$.
We sometimes refer to each $z_i$ (equivalently, each pair $(W_i, b_i)$) as a layer. Note that each layer is fully determined by its corresponding weight matrix and bias vector.
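As an illustration, the following minimal sketch (not part of the original toolchain; the helper name nn_eval and the toy weights are ours) evaluates such an NN, applying ReLU after every layer as in the definition above.

```python
import numpy as np

def relu(v):
    # element-wise maximum between 0 and the vector entries
    return np.maximum(0.0, v)

def nn_eval(weights, biases, x):
    # f(x) = z_k with z_0 = x and z_i = ReLU(W_i z_{i-1} + b_i)
    z = np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):
        z = relu(W @ z + b)
    return z

# toy network: 2 inputs, one hidden layer of size 3, 2 outputs
weights = [np.array([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]),
           np.array([[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]])]
biases = [np.zeros(3), np.zeros(2)]
print(nn_eval(weights, biases, [0.2, 0.7]))
```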
2.2 Neural-network classifiers
A data set $\mathcal{D} \subseteq \mathbb{R}^n$ is a finite set of (real-valued) data points of dimension $n$. A classifier is a partial function $c \colon \mathbb{R}^n \rightharpoonup \mathit{Labels}$ that attaches to each data point a label from $\mathit{Labels}$, the set of labels. We denote the set of all classifiers over $\mathbb{R}^n$ by $\mathcal{C}$. An NN-encoded classifier is simply a partial function $c_f$ given as an NN $f \colon \mathbb{R}^n \to \mathbb{R}^o$ that assigns to each data point $x$ the label $c_f(x) = \arg\max_{1 \leq j \leq o} f(x)_j$, where $\mathit{Labels} = \{1, \ldots, o\}$. Intuitively, the label is the index of the largest entry in the vector resulting from applying $f$ to $x$. Note that if the image of $x$ according to $f$ has several maximal entries then the $\arg\max$ and the output label are undefined.
Definition 1 (Labelled data set)
A labelled data set $(\mathcal{D}, g)$ consists of a data set $\mathcal{D}$ and a total classifier $g$ for $\mathcal{D}$, i.e. $g \colon \mathcal{D} \to \mathit{Labels}$ is a total function.
In particular, $g$ is defined for all $x \in \mathcal{D}$ and is considered to be the “ground truth” classification for the whole data set $\mathcal{D}$.
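Continuing the NumPy sketch above (the classify helper and the toy labelled data set below are ours), an NN-encoded classifier returns the argmax label and is undefined, here rendered as None, when the maximal entry of $f(x)$ is not unique.

```python
def classify(weights, biases, x):
    # arg max of f(x) as a 1-based label, or None if the maximum is not unique
    out = nn_eval(weights, biases, x)
    winners = np.flatnonzero(out == np.max(out))
    return int(winners[0]) + 1 if len(winners) == 1 else None

# a labelled data set: data points plus a total ground-truth classifier g
D = [np.array([0.2, 0.7]), np.array([0.9, 0.1])]
g = {0: 1, 1: 2}  # ground-truth label per data-point index (illustrative values)
```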
3 Problem Statement
We state the formal problem and provide the required notation. Recall that in our setting we assume an ensemble of classifiers $\mathcal{F} \subseteq \mathcal{C}$. Such an ensemble is attacked by a set of attacks that are selected randomly.
Definition 2 (Deterministic attack)
A deterministic attack for a labelled data set $(\mathcal{D}, g)$ and a classifier $c$ is a function $a \colon \mathcal{D} \to \mathbb{R}^n$. An attack $a$ induces a misclassification for $x \in \mathcal{D}$ and $c$ if $c(x + a(x)) \neq g(x)$ or if $c(x + a(x))$ is undefined. The set of all deterministic attacks is $\mathit{Att}$. An attack $a$ is $\varepsilon$-bounded if $\lVert a(x) \rVert_1 \leq \varepsilon$ holds for all $x \in \mathcal{D}$.
We sometimes call the value $x + a(x)$ the attack point. Note that the classifier is not necessarily perfect, that is, for some $x \in \mathcal{D}$, already the zero-attack with $a(x) = \mathbf{0}$ may lead to a misclassification.
We extend deterministic attacks by means of probabilities.
Definition 3 (Randomized attack)
A finite set of deterministic attacks $A \subseteq \mathit{Att}$ together with a probability distribution $\mu \in \mathit{Distr}(A)$ is a randomized attack $(A, \mu)$. A randomized attack is $\varepsilon$-bounded if for all attacks $a \in A$ with $\mu(a) > 0$ it holds that $\lVert a(x) \rVert_1 \leq \varepsilon$ for all $x \in \mathcal{D}$.
In general, a loss function describes the penalty incurred by a classifier with respect to a labelled data point and an attack. In this work, we will focus on the widely used zero-one loss.
Definition 4 (Zero-one loss function)
The 0/1-loss function $\mathcal{L}$ for a labelled data set $(\mathcal{D}, g)$, a classifier $c$, and a deterministic attack $a$ is given by the following for all $x \in \mathcal{D}$:
$$\mathcal{L}(c, a, x) = \begin{cases} 0 & \text{if } c(x + a(x)) = g(x), \\ 1 & \text{otherwise.} \end{cases}$$
In particular, the loss function yields one if the image of the classifier $c$ is undefined for the attack point $x + a(x)$. Note furthermore that the loss is measured with respect to the ground-truth classifier $g$. Thereby, the classifier $c$ and the zero function as deterministic attack do not necessarily induce a loss of zero with respect to $g$. This assumption is realistic as, while we expect classifiers to perform well with regard to the ground truth, we cannot assume perfect classification in realistic settings.
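A direct transcription of this definition as a sketch (the classifier argument is any callable that returns a label, or None where undefined, e.g. the classify helper sketched above):

```python
import numpy as np

def zero_one_loss(classifier, a, x, g_label):
    # 0 iff the classifier is defined at the attack point x + a(x) and agrees with g
    label = classifier(x + a(x))
    return 0.0 if label == g_label else 1.0

# the zero-attack may still incur loss 1 if the classifier already errs on x
zero_attack = lambda x: np.zeros_like(x)
```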
We now connect a randomized attack to an ensemble, that is, a finite set of classifiers. In particular, we quantify the overall value a randomized attack induces with respect to the loss function and the ensemble.
Definition 5 (Misclassification value)
The misclassification value of a randomized attack $(A, \mu)$ with respect to a labelled data set $(\mathcal{D}, g)$ and a finite set of classifiers $\mathcal{F}$ is given by
$$\mathrm{val}(A, \mu) \;=\; \min_{c \in \mathcal{F}} \; \frac{1}{|\mathcal{D}|} \sum_{x \in \mathcal{D}} \sum_{a \in A} \mu(a) \cdot \mathcal{L}(c, a, x). \qquad (1)$$
This value is the minimum (over all classifiers) mean expected loss with respect to the randomized attack and the classifiers from $\mathcal{F}$. An optimal adversarial attack against $\mathcal{F}$ with respect to a labelled data set $(\mathcal{D}, g)$ is a randomized attack which maximizes the value in Equation (1).
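Equation (1) translates directly into code; the sketch below uses the zero_one_loss helper from above and represents a randomized attack as a list of attacks A with a matching list of probabilities mu (names are ours).

```python
def misclassification_value(F, D, g, A, mu):
    # min over classifiers of the mean (over D) expected (over A) 0-1 loss, cf. Eq. (1)
    def value_for(c):
        total = 0.0
        for i, x in enumerate(D):
            for a, p in zip(A, mu):
                total += p * zero_one_loss(c, a, x, g[i])
        return total / len(D)
    return min(value_for(c) for c in F)
```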
We are now ready to formalize a notion of robustness in terms of $\varepsilon$-bounded attacks and a robustness bound $\alpha$, as proposed in [20], for a set of classifiers.
Definition 6 (Bounded robustness)
A set of classifiers $\mathcal{F}$ for a labelled data set $(\mathcal{D}, g)$ is called robust bounded by $\alpha$ and $\varepsilon$ ($(\alpha, \varepsilon)$-robust) if it holds that
$$\mathrm{val}(A, \mu) \;<\; \alpha, \qquad (2)$$
where the $(A, \mu)$ range over all $\varepsilon$-bounded randomized attacks.
In other words, an $(\alpha, \varepsilon)$-robust ensemble is such that for all possible $\varepsilon$-bounded randomized attacks $(A, \mu)$, there is at least one classifier $c \in \mathcal{F}$ such that $\frac{1}{|\mathcal{D}|} \sum_{x \in \mathcal{D}} \sum_{a \in A} \mu(a) \cdot \mathcal{L}(c, a, x) < \alpha$. Conversely, an ensemble is not $(\alpha, \varepsilon)$-robust if there is an $\varepsilon$-bounded randomized attack with a misclassification value of at least $\alpha$.
4 Theoretical Results
In this section we establish two key results that carry essential practical implications for our setting. First, we show that in order to obtain an optimal randomized attack, only a bounded number of deterministic attacks is needed; an analogue of this property was already observed by Perdomo and Singer [20, Section 3] for the case when classifiers are chosen randomly. Thereby, we only need to take a bounded number of potential attacks into account in order to prove the $(\alpha, \varepsilon)$-robustness of a set of classifiers and a given labelled data set. Second, we establish that our problem is in fact NP-hard, justifying the use of SMT and MILP solvers to (1) compute an optimal randomized attack and, more importantly, to (2) prove robustness against any such attack.
4.1 Bounding the number of attacks
In the following, we assume a fixed labelled data set $(\mathcal{D}, g)$. For every classifier $c$ and every deterministic attack $a$, let us write $\mathit{val}_c(a)$ to denote the value $\frac{1}{|\mathcal{D}|} \sum_{x \in \mathcal{D}} \mathcal{L}(c, a, x)$. Observe that $\mathit{val}_c(a) \in [0, 1]$ for all $c$ and $a$. Furthermore, for all $c \in \mathcal{F}$ and randomized attacks $(A, \mu)$ it holds that:
$$\sum_{a \in A} \mu(a) \cdot \mathit{val}_c(a) \;=\; \frac{1}{|\mathcal{D}|} \sum_{x \in \mathcal{D}} \sum_{a \in A} \mu(a) \cdot \mathcal{L}(c, a, x).$$
We get that Equation (2) from Def. 6 is false if and only if the following holds for some $\varepsilon$-bounded randomized attack $(A, \mu)$:
$$\min_{c \in \mathcal{F}} \sum_{a \in A} \mu(a) \cdot \mathit{val}_c(a) \;\geq\; \alpha. \qquad (3)$$
Proposition 1 (Bounded number of attacks)
Let $\varepsilon \geq 0$ and consider the labelled data set $(\mathcal{D}, g)$ together with a finite set of classifiers $\mathcal{F}$. For all randomized attacks $(A, \mu)$, there exists a randomized attack $(A', \mu')$ such that

- $\mathrm{val}(A', \mu') = \mathrm{val}(A, \mu)$,
- $|A'| \leq (|\mathcal{D}| + 1)^{|\mathcal{F}|}$, and
- $(A', \mu')$ is $\varepsilon$-bounded if $(A, \mu)$ is $\varepsilon$-bounded.
Proof
We proceed by contradiction. Let $(A, \mu)$ be an $\varepsilon$-bounded randomized attack with a misclassification value of $v$ such that $|A| > (|\mathcal{D}| + 1)^{|\mathcal{F}|}$. Further suppose that $(A, \mu)$ is minimal (with respect to the size of $A$) amongst all such randomized attacks. Since every $\mathit{val}_c(a)$ is a multiple of $\frac{1}{|\mathcal{D}|}$ in $[0, 1]$, the vector $(\mathit{val}_c(a))_{c \in \mathcal{F}}$ can take at most $(|\mathcal{D}| + 1)^{|\mathcal{F}|}$ distinct values. It follows that there are attacks $a \neq a' \in A$ such that $\mathit{val}_c(a) = \mathit{val}_c(a')$ for all $c \in \mathcal{F}$. We thus have that
$$\sum_{b \in A} \mu(b) \cdot \mathit{val}_c(b) \;=\; \big(\mu(a) + \mu(a')\big) \cdot \mathit{val}_c(a) + \sum_{b \in A \setminus \{a, a'\}} \mu(b) \cdot \mathit{val}_c(b) \quad \text{for all } c \in \mathcal{F}.$$
Consider now the randomized attack $(A', \mu')$ obtained by modifying $(A, \mu)$ so that $\mu'(a) = \mu(a) + \mu(a')$ and $a'$ is removed from $A$. From the above discussion and Equation (3) it follows that $(A', \mu')$, just like $(A, \mu)$, has a misclassification value of $v$. Furthermore, since $A' \subseteq A$, we have that $(A', \mu')$ is $\varepsilon$-bounded and that $|A'| < |A|$. This contradicts our assumption regarding the minimality of $(A, \mu)$ amongst $\varepsilon$-bounded randomized attacks with the same value $v$. ∎
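The construction in the proof can be mimicked programmatically: attacks whose loss profile $(\mathit{val}_c(a))_{c \in \mathcal{F}}$ coincides are merged, which preserves the misclassification value. A sketch using the hypothetical helpers from Sec. 3:

```python
def shrink_attack(F, D, g, A, mu):
    # merge attacks with identical loss profiles (val_c(a) for all c in F);
    # the misclassification value of the randomized attack is unchanged
    merged = {}
    for a, p in zip(A, mu):
        profile = tuple(
            sum(zero_one_loss(c, a, x, g[i]) for i, x in enumerate(D)) / len(D)
            for c in F)
        rep, mass = merged.get(profile, (a, 0.0))
        merged[profile] = (rep, mass + p)
    new_A = [rep for rep, _ in merged.values()]
    new_mu = [mass for _, mass in merged.values()]
    return new_A, new_mu  # at most (|D|+1)^|F| attacks remain
```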
4.2 NP-hardness of non-robustness checking
It is known that checking whether linear properties hold for a given NN with ReLU activation functions is NP-hard [15]. We restate this using our notation.
Proposition 2 (From [15, Appendix I])
The following problem is NP-hard: Given an NN-encoded function $f \colon \mathbb{R}^n \to \mathbb{R}^o$ and closed nonnegative intervals $I_1, \ldots, I_n$ and $O_1, \ldots, O_o$, decide whether there exists $x \in I_1 \times \dots \times I_n$ such that $f(x) \in O_1 \times \dots \times O_o$.
Intuitively, determining whether there exists a point in a given box in $\mathbb{R}^n$ (that is, a Cartesian product of intervals) whose image according to $f$ lies in a given box in $\mathbb{R}^o$ is NP-hard. We will now reduce this problem to that of determining whether there is a randomized attack whose misclassification value is at least a given threshold.
Theorem 4.1
The following problem is NP-hard: For a labelled data set $(\mathcal{D}, g)$, a set of classifiers $\mathcal{F}$, and a value $\alpha$, decide if there exists an $\varepsilon$-bounded randomized attack $(A, \mu)$ w.r.t. $(\mathcal{D}, g)$ and $\mathcal{F}$ such that $\mathrm{val}(A, \mu) \geq \alpha$.
Proof
We use Proposition 2 and show how to construct, for any NN-encoded function $f$ and any interval constraint $f(x)_j \in [\ell, u]$, two classifiers $c_{\geq \ell}$ and $c_{\leq u}$ such that a single deterministic attack causes $\mathbf{0}$, the single data point, to be misclassified by both $c_{\geq \ell}$ and $c_{\leq u}$ if and only if the constraint holds. Note that the bound $\varepsilon$ can be chosen large enough so that the $\varepsilon$-ball around $\mathbf{0}$ contains the box $I_1 \times \dots \times I_n$, and that the identity function over nonnegative numbers is NN-encodable, that is, using the identity matrix as weight matrix and a zero bias vector. For every instance of the problem from Proposition 2 we can therefore construct NN-encoded classifiers, two per interval constraint, that encode all input and output constraints: those for the input constraints are based on the identity function and those for the output constraints are based on the input NN from the given instance. It follows that determining if there exists an $\varepsilon$-bounded deterministic attack that causes $\mathbf{0}$ to be simultaneously misclassified by a given classifier ensemble is NP-hard. Hence, to conclude, it suffices to argue that the latter problem reduces to our robustness-value threshold problem. The result follows from Lemmas 1 and 2. ∎
4.2.1 Enforcing interval constraints.
Let $f \colon \mathbb{R}^n \to \mathbb{R}^o$ be an NN-encoded function and $j$ an output index with $1 \leq j \leq o$. Consider now the constraint $f(x)_j \in [\ell, u]$. Henceforth we will focus on the labelled data set $(\{\mathbf{0}\}, g)$ with $g(\mathbf{0}) = 2$.
Lower-bound constraint.
We obtain $c_{\geq \ell}$ by adding to the NN encoding of $f$ a new final layer with weight matrix and bias vector
$$W = \begin{pmatrix} e_j^\top \\ -e_j^\top \end{pmatrix}, \qquad b = \begin{pmatrix} -\ell \\ \ell \end{pmatrix},$$
where $e_j$ is the $j$-th standard unit vector, to obtain the NN-encoded function $f_{\geq \ell}$. Note that $f_{\geq \ell}(x) = \big(\mathrm{ReLU}(f(x)_j - \ell),\; \mathrm{ReLU}(\ell - f(x)_j)\big)$ for all $x \in \mathbb{R}^n$. It follows that $c_{\geq \ell}(x) = 1$ if $f(x)_j > \ell$ and $c_{\geq \ell}(x)$ is undefined if $f(x)_j = \ell$. In all other cases the classifier yields $2$.
Upper-bound constraint.
To obtain $c_{\leq u}$ we add to the NN encoding of $f$ two new layers. The corresponding weight matrices and bias vectors are as follows:
$$W_1 = e_j^\top, \quad b_1 = -u, \qquad W_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad b_2 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
Let us denote by $f_{\leq u}$ the resulting function. Observe that $f_{\leq u}(x) = \big(0,\; \mathrm{ReLU}(f(x)_j - u)\big)$ for all $x \in \mathbb{R}^n$. Hence, we have that $c_{\leq u}(x)$ is undefined if and only if $f(x)_j \leq u$ and yields $2$ otherwise.
Lemma 1
Let $f \colon \mathbb{R}^n \to \mathbb{R}^o$ be an NN-encoded function and consider the constraint $f(x)_j \in [\ell, u]$. One can construct NN-encoded classifiers $c_{\geq \ell}$ and $c_{\leq u}$, of size linear with respect to $f$, for the labelled data set $(\{\mathbf{0}\}, g)$ such that the deterministic attack $a$ with $a(\mathbf{0}) = x$

- induces a misclassification of $\mathbf{0}$ with respect to $c_{\geq \ell}$ if and only if $f(x)_j \geq \ell$, and
- induces a misclassification of $\mathbf{0}$ with respect to $c_{\leq u}$ if and only if $f(x)_j \leq u$.
In summary, the data point $\mathbf{0}$ is misclassified by both $c_{\geq \ell}$ and $c_{\leq u}$ under the attack $a$ with $a(\mathbf{0}) = x$ if and only if the constraint $f(x)_j \in [\ell, u]$ holds. It remains to enforce that all constructed classifiers misclassify $\mathbf{0}$ simultaneously.
4.2.2 Enforcing universal misclassification.
A randomized attack with misclassification value $1$ can be assumed to be deterministic. Indeed, from Equation (3) it follows that for any such randomized attack $(A, \mu)$ we must have $\mathit{val}_c(a) = 1$ for all $c \in \mathcal{F}$ and all $a \in A$ with $\mu(a) > 0$. Hence, we can choose any such $a$ and consider the randomized attack $(\{a\}, \mu')$ with $\mu'(a) = 1$, which also has misclassification value $1$.
Lemma 2
Consider the labelled data set $(\{\mathbf{0}\}, g)$ with the finite set of classifiers $\mathcal{F}$. There exists an $\varepsilon$-bounded randomized attack $(A, \mu)$ with $\mathrm{val}(A, \mu) = 1$ if and only if there exists a deterministic attack $a$ such that

- $\lVert a(\mathbf{0}) \rVert_1 \leq \varepsilon$, and
- for all $c \in \mathcal{F}$ we have that either $c(\mathbf{0} + a(\mathbf{0}))$ is undefined or it is not equal to $g(\mathbf{0})$.
5 SMT and MILP Encodings
In this section, we describe the main elements of our SMT and MILP encodings to compute (optimal) randomized attacks or prove the robustness of classifier ensembles. We start with a base encoding and will afterwards explain how to explicitly encode the classifiers and the loss function.
5.1 Base problem encoding
First, we assume a labelled data set $(\mathcal{D}, g)$, the attack bound $\varepsilon$, and the robustness bound $\alpha$ are input to the problem. In particular, the data set has data points $x_i \in \mathbb{R}^n$ for $1 \leq i \leq |\mathcal{D}|$. Furthermore, we assume the number $m$ of attacks that shall be computed is fixed. Recall that, to show that the set of classifiers is robust, we can compute a sufficient bound on the number of attacks (see Sec. 4.1).
For readability, we assume the classifiers $c \in \mathcal{F}$ and the loss function $\mathcal{L}$ are given as functions that can directly be used in the encodings, and we will use the absolute value $|\cdot|$ for the norm constraints. Afterwards, we discuss how to actually encode classifiers and the loss function. We use the following variables:
- For the attacks from $A = \{a_1, \ldots, a_m\}$, we introduce variables $\eta_{j,i} \in \mathbb{R}^n$ with $1 \leq j \leq m$ and $1 \leq i \leq |\mathcal{D}|$. Specifically, $\eta_{j,i}$ shall be assigned the attack values for the $j$-th attack from $A$ at the $i$-th data point. That is, $\eta_{j,i}$ is the attack for the data point $x_i$ with $\eta_{j,i} = a_j(x_i)$ for $1 \leq j \leq m$ and $1 \leq i \leq |\mathcal{D}|$.
- We introduce variables $p_1, \ldots, p_m \in [0, 1]$ to form a probability distribution over deterministic attacks; $p_j$ is assigned the probability to execute attack $a_j$.
The classifier ensemble $\mathcal{F}$ is not $(\alpha, \varepsilon)$-robust as in Definition 6 if and only if the following constraints are satisfiable.
$$\frac{1}{|\mathcal{D}|} \sum_{i=1}^{|\mathcal{D}|} \sum_{j=1}^{m} p_j \cdot \mathcal{L}(c, \eta_{j,i}, x_i) \;\geq\; \alpha \qquad \text{for all } c \in \mathcal{F} \qquad (4)$$
$$\sum_{k=1}^{n} \big|\eta_{j,i,k}\big| \;\leq\; \varepsilon \qquad \text{for all } 1 \leq j \leq m,\ 1 \leq i \leq |\mathcal{D}| \qquad (5)$$
$$0 \leq p_j \leq 1 \qquad \text{for all } 1 \leq j \leq m \qquad (6)$$
$$\sum_{j=1}^{m} p_j = 1 \qquad (7)$$
Indeed, (4) enforces the misclassification value to be at least $\alpha$, where, slightly abusing notation, $\mathcal{L}(c, \eta_{j,i}, x_i)$ denotes the 0/1 loss of $c$ at the attack point $x_i + \eta_{j,i}$, and $\eta_{j,i,k}$ denotes the $k$-th component of $\eta_{j,i}$; (5) ensures an $\varepsilon$-bounded randomized attack; finally, by (6) and (7) the probability variables induce a valid probability distribution.
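The following is a minimal sketch of constraints (4)-(7) in z3py, not the authors' implementation; variable names follow the notation above, and the 0/1-loss terms are left as free variables here since they are fixed by the classifier encoding of Sec. 5.2 (omitted).

```python
from z3 import Real, RealVector, Solver, Sum, If

n, m, N, K = 4, 2, 3, 2        # input dimension, #attacks, #data points, #classifiers (toy sizes)
alpha, eps = 0.5, 1.0

s = Solver()
eta = [[RealVector(f"eta_{j}_{i}", n) for i in range(N)] for j in range(m)]   # attack values
p = [Real(f"p_{j}") for j in range(m)]                                        # probabilities
# 0/1-loss terms l[c][j][i]; in the full encoding these are determined by the classifiers
loss = [[[Real(f"l_{c}_{j}_{i}") for i in range(N)] for j in range(m)] for c in range(K)]

abs_ = lambda e: If(e >= 0, e, -e)
for j in range(m):
    s.add(p[j] >= 0, p[j] <= 1)                                  # (6)
    for i in range(N):
        s.add(Sum([abs_(e) for e in eta[j][i]]) <= eps)          # (5)
s.add(Sum(p) == 1)                                               # (7)
for c in range(K):                                               # (4)
    s.add(Sum([p[j] * loss[c][j][i] for j in range(m) for i in range(N)]) / N >= alpha)

print(s.check())   # sat: candidate attack found; unsat: robust (w.r.t. the full encoding)
```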
Specific encodings.
For the SMT encoding, we can make use of the native if-then-else (ITE) operator to implement the absolute value. In particular for the MILP, however, we employ so-called “big-M” tricks to encode max functions and a (restricted) product operation (cf. [7]). Specifically, the product is required to obtain the value resulting from the multiplication of the loss function and probability variables.
As an example of a “big-M” trick, suppose we have a binary variable $a \in \{0, 1\}$, a variable $b$, and a constant $M$ such that $0 \leq b \leq M$. We introduce a variable $c$ and add the following constraints, which clearly enforce that $c = a \cdot b$:
$$c \leq M \cdot a, \qquad c \leq b, \qquad c \geq b - M \cdot (1 - a), \qquad c \geq 0.$$
Note that $M$ can be chosen to be the constant $1$ in this case, since $b$ is a probability.
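A sketch of this linearization in Gurobi's Python API (our own toy model, not the paper's encoding); here a plays the role of a 0/1-loss value and b that of a probability variable, so M = 1 suffices.

```python
import gurobipy as gp
from gurobipy import GRB

model = gp.Model("bigM_product")
a = model.addVar(vtype=GRB.BINARY, name="a")   # e.g. a 0/1-loss value
b = model.addVar(lb=0.0, ub=1.0, name="b")     # e.g. a probability p_j
c = model.addVar(lb=0.0, ub=1.0, name="c")     # will be forced to equal a * b
M = 1.0                                        # valid big-M constant since 0 <= b <= 1

model.addConstr(c <= M * a)
model.addConstr(c <= b)
model.addConstr(c >= b - M * (1 - a))
# c >= 0 is already enforced by the variable's lower bound

model.addConstr(b == 0.7)                      # tiny test: with a = 1 we expect c = 0.7
model.addConstr(a == 1)
model.optimize()
print(c.X)
```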
Encoding the loss function.
We encode the zero-one loss function from Def. 4 as an if-then-else (ITE) expression making use of the ground truth classifier $g$. We introduce one variable $\ell^c_{j,i}$ per classifier $c \in \mathcal{F}$ for all attacks $1 \leq j \leq m$ and all data points $x_i \in \mathcal{D}$. In the SMT encoding, we can then define
$$\ell^c_{j,i} \;=\; \mathrm{ite}\big(c(x_i + \eta_{j,i}) = g(x_i),\; 0,\; 1\big) \qquad (8)$$
so that $\ell^c_{j,i} = \mathcal{L}(c, \eta_{j,i}, x_i)$. In our MILP encoding we have to simulate the ITE primitive using constraints similar to the ones mentioned above.
5.2 Classifier encoding
As mentioned in the preliminaries, neural networks implement functions by way of layer composition. Intuitively, the input of a layer is the output of the previous layer. When fed forward, input values are multiplied by weights, and a bias value is added to the result. The matrix operations realized by a neural network can thus be encoded as linear functions. For max-pooling operations and the ReLU activation function, one can use the native ITE-based maximum in the SMT encoding or implement the maximum using a “big-M” trick in the MILP encoding. For the latter, a suitable constant has to be obtained beforehand (cf. [7]). We also encode the (discrete) convolution operation, which is again a linear function.
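As an illustration of the MILP side, the sketch below (our own, hypothetical helper add_relu_layer, not the paper's implementation) encodes one dense layer z = ReLU(W z_prev + b) with the usual big-M constraints; M must be a valid bound on the absolute value of every pre-activation.

```python
import gurobipy as gp
from gurobipy import GRB

def add_relu_layer(model, z_prev, W, b, M):
    # encode z = ReLU(W z_prev + b); M bounds |row . z_prev + bias| for every row
    z = []
    for i, row in enumerate(W):
        pre = gp.quicksum(w * v for w, v in zip(row, z_prev)) + b[i]   # pre-activation
        y = model.addVar(lb=0.0, name=f"relu_out_{i}")
        d = model.addVar(vtype=GRB.BINARY, name=f"relu_on_{i}")
        model.addConstr(y >= pre)                  # y >= pre, and y >= 0 via its lower bound
        model.addConstr(y <= pre + M * (1 - d))    # if d = 1 then y <= pre, hence y = pre
        model.addConstr(y <= M * d)                # if d = 0 then y = 0
        z.append(y)
    return z

# usage: chain such layers to encode a whole NN-encoded classifier over attack variables
model = gp.Model("relu_layer")
x = [model.addVar(lb=-1.0, ub=1.0, name=f"x_{k}") for k in range(2)]
h = add_relu_layer(model, x, W=[[1.0, -1.0], [0.5, 0.5]], b=[0.0, 0.1], M=10.0)
```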
6 Experiments
In the previous section, we showed that our problem of neural network robustness verification is NP-hard. Meaningful comparison between approaches, therefore, needs to be experimental. To that end, we use classifiers trained on multiple image data sets and report on the comparison between the SMT and MILP encodings. In what follows, we analyze the running-time behavior of the different verifiers, the generated attacks and the misclassification value for the given data points, and whether a set of classifiers is robust against predefined thresholds.
6.1 Experimental setup
For each experiment, we define a set of classifiers $\mathcal{F}$, our data points $\mathcal{D}$, the number of attacks $m$, and both the $\alpha$- and $\varepsilon$-values. Then, we generate the attacks and the probability distribution using SMT and MILP solvers. If no randomized attack is found (UNSAT), we have shown that our set of classifiers is $(\alpha, \varepsilon)$-robust with respect to the predefined thresholds.
Toolchain.
Our NN robustness verifier (available at https://tinyurl.com/ensemble-robustness) is implemented as part of a Python 3.x toolchain. We use the SMT solver Z3 [9] and the MILP solver Gurobi [14] with their standard settings. To support arbitrary classifiers, we created a generic pipeline using the TensorFlow API, and support max-pooling layers, convolutional layers, and dense layers with ReLU activation functions [1]. We focus on binary classifiers by defining certain classification boundaries. We train the classifiers using the Adam optimizer [17] as well as stochastic gradient descent.
Data sets.
MNIST consists of 70 000 images of handwritten digits [10] and is widely used for benchmarking in the field of machine learning [16, 8]. We trained classifiers to have a test accuracy of at least 90%.
German traffic sign is a multi-class and single-image classification data set, containing more than 50 000 images and more than 40 classes [25]. Traffic sign recognition and potential attacks are of utmost importance for self-driving cars [27]. We extracted the images for “Give way” and “priority” traffic signs from the data set and trained classifiers on this subset to have an accuracy of at least 80%.
Table 1: Comparison of the SMT and MILP encodings (excerpt). For each benchmark we list the number of classifiers, the number of attacks, the size of the data set, the image dimension (dim), the robustness bound α, and the attack bound ε; for each solver we report the runtime in seconds and the misclassification value of the computed attack, or “robust” if no attack exists (TO = timeout).
ID | Name | #classifiers | #attacks | #points | dim | α | ε | Time (SMT) | value (SMT) | Time (MILP) | value (MILP)
2 | mnist_0_1 | 3 | 2 | 4 | 7x7 | 0.2 | 100 | -TO- | – | 12.43 | 0.25 |
4 | mnist_0_1_2convs | 3 | 2 | 4 | 8x8 | 0.4 | 100 | -TO- | – | 53.92 | 0.4 |
7 | mnist_0_1 | 3 | 2 | 4 | 8x8 | 0.4 | 1000 | -TO- | – | 0.34 | 0.4 |
8 | mnist_0_1 | 3 | 2 | 4 | 8x8 | 0.9 | 1000 | -TO- | – | 50.09 | 0.9 |
9 | mnist_0_1 | 3 | 3 | 4 | 8x8 | 0.9 | 60 | -TO- | – | 34.02 | 1 |
13 | mnist_4_5 | 3 | 4 | 4 | 10x10 | 0.9 | 50 | -TO- | – | 144.32 | 1 |
14 | mnist_7_8 | 3 | 4 | 4 | 6x6 | 0.1 | 60 | -TO- | – | 18.94 | 0.25 |
16 | mnist_4_5 | 3 | 2 | 4 | 10x10 | 0.1 | 1000 | 155.73 | 0.38 | 101.16 | 0.1 |
17 | mnist_4_5 | 3 | 3 | 4 | 10x10 | 0.1 | 80 | 403.25 | 0.25 | 101.47 | 0.25 |
18 | mnist_4_5 | 3 | 2 | 4 | 10x10 | 0.15 | 80 | 216.65 | 0.38 | 44.26 | 0.15 |
19 | mnist_4_5 | 3 | 2 | 4 | 10x10 | 0.2 | 100 | 156.63 | 0.38 | 54.36 | 0.25 |
22 | mnist_7_8 | 3 | 2 | 4 | 6x6 | 0.9 | 0.1 | -TO- | – | 4 | robust |
26 | traffic_signs | 3 | 2 | 4 | 10x10 | 1 | 0.01 | -TO- | – | 17 | robust |
27 | traffic_signs | 3 | 2 | 4 | 10x10 | 1 | 0.1 | -TO- | – | -TO- | – |
Table 2: Comparison of the plain MILP encoding and the MaxMILP encoding with objective function (excerpt); column layout as in Table 1, with optimal MaxMILP solutions marked by *.
ID | Name | #classifiers | #attacks | #points | dim | α | ε | Time (MILP) | value (MILP) | Time (MaxMILP) | value (MaxMILP)
1 | mnist_0_1 | 3 | 2 | 4 | 7x7 | 0.1 | 1000 | 57.79 | 0.25 | 46.23 | 1* |
3 | mnist_0_1_2convs | 3 | 2 | 4 | 8x8 | 0.2 | 1000 | 738.76 | 0.5 | 93.54 | 1* |
7 | mnist_0_1 | 3 | 2 | 4 | 8x8 | 0.4 | 1000 | 0.34 | 0.4 | 0.34 | 1* |
10 | mnist_0_1 | 3 | 4 | 4 | 8x8 | 0.9 | 60 | 51.39 | 1 | 51.39 | 1* |
14 | mnist_7_8 | 3 | 4 | 4 | 6x6 | 0.1 | 60 | 18.94 | 0.25 | 21.20 | 1 |
17 | mnist_4_5 | 3 | 3 | 4 | 10x10 | 0.1 | 80 | 101.47 | 0.25 | 88.39 | 1 |
20 | mnist_3_6 | 2 | 9 | 2 | 8x8 | 1 | 0.005 | 7 | robust | 7 | robust |
21 | mnist_7_8 | 3 | 2 | 4 | 6x6 | 1 | 0.1 | 4 | robust | 4 | robust |
24 | mnist_0_2 | 3 | 27 | 4 | 9x9 | 1 | 0.01 | 108 | robust | 108 | robust |
25 | mnist_0_2 | 3 | 30 | 4 | 9x9 | 1 | 0.01 | 120 | robust | 120 | robust |
28 | traffic_signs | 3 | 3 | 4 | 10x10 | 1 | 0.01 | 45 | robust | 45 | robust |
29 | traffic_signs | 3 | 3 | 4 | 10x10 | 1 | 0.01 | –TO– | - | –TO– | – |
Optimal attacks.
As MILP inherently solves optimization problems, we augment the base encoding from Equations (4)-(7) with the following objective function:
$$\text{maximize} \quad \sum_{c \in \mathcal{F}} \frac{1}{|\mathcal{D}|} \sum_{i=1}^{|\mathcal{D}|} \sum_{j=1}^{m} p_j \cdot \ell^c_{j,i}.$$
An optimal solution with respect to the objective may yield a randomized attack inducing the maximal misclassification value among all $\varepsilon$-bounded attacks. (Note that we sum over the classifiers instead of minimizing, as required in Equation (1).)
Alternative attacks.
Our method generates attacks taking the whole ensemble of classifiers into account, which is computationally harder than just considering single classifiers [15, 12] due to an increased number of constraints and variables in the underlying encodings. To showcase the need for our approach, we implemented two other ways to generate attacks that are based on individual classifiers and subsequently lifted to the whole ensemble. Recall that we assume the attacker does not know which classifier from the ensemble will be chosen.
First, for the classifier ensemble $\mathcal{F} = \{c_1, \ldots, c_k\}$ we compute, using a simplified version of our MILP encoding, an optimal attack $a_i$ for each classifier $c_i$. Each such $a_i$ maximizes the misclassification value, that is, the loss, for the classifier $c_i$. The attack set $A = \{a_1, \ldots, a_k\}$ together with a uniform distribution over $A$ forms the so-called uniform attacker (UA).
Second, to compare with deterministic attacks, we calculate for each attack from $A$ the misclassification value over all classifiers. The best deterministic attack (BDA) is any attack from $A$ inducing a maximal misclassification value.
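Both baselines are easily expressed with the misclassification_value helper sketched in Sec. 3 (hypothetical names; the per-classifier optimal attacks are assumed to be given):

```python
def uniform_attacker(per_classifier_attacks):
    # UA: uniform distribution over one optimal attack per classifier
    A = list(per_classifier_attacks)
    return A, [1.0 / len(A)] * len(A)

def best_deterministic_attack(F, D, g, per_classifier_attacks):
    # BDA: the single attack inducing the maximal misclassification value on the ensemble
    return max(per_classifier_attacks,
               key=lambda a: misclassification_value(F, D, g, [a], [1.0]))
```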
6.2 Evaluation
We report on our experimental results using the aforementioned data sets and attacks. For all experiments we used a timeout of 7200 seconds (TO). Each benchmark has an ID, a name, the number of classifiers, the number of attacks, the size of the data set, the dimension of the image (dim), the robustness bound $\alpha$, and the attack bound $\varepsilon$. The names of the MNIST benchmarks are of the form “mnist_i_j”, where i and j are the two digit labels; the additional suffix “_convs” indicates that the classifier has convolutional layers. We provide an excerpt of our experiments; the full tables are available in the appendix.
SMT versus MILP.
In Table 1, we report on the comparison between SMT and MILP. Note that for these benchmarks, the MILP solver just checks the feasibility of the constraints without an objective function. We list for both solvers the time in seconds and the misclassification value, rounded to 2 decimal places, of the generated randomized attack $(A, \mu)$, if it can be computed before the timeout. If there is no solution, the classifier ensemble is $(\alpha, \varepsilon)$-robust, and instead of a misclassification value we list “robust”.
We observe that SMT is only able to find solutions within the timeout for small $\alpha$ and large $\varepsilon$ values. Moreover, if the ensemble is robust, that is, the problem is not satisfiable, the SMT solver does not terminate for any benchmark. Nevertheless, for some benchmarks (see Table 1, entries 16-19) the SMT solver yields a higher misclassification value than the MILP solver, that is, it finds a better attack. The MILP solver, on the other hand, solves most of our benchmarks, mostly within less than a minute, including the robust ones. Despite the reasonably low timeout of 7200 seconds, we are thus able to verify the robustness of NNs with around 6 layers. Running times also visibly differ with other factors such as the layer types.
MILP versus MaxMILP.
Table 2 compares some of the MILP results to those where we optimize the aforementioned objective function, denoted by MaxMILP. The MILP solver Gurobi offers the possibility of so-called callbacks, that is, while an intermediate solution is not yet proven to be optimal, it may already be feasible. In case optimality cannot be shown within the timeout, we list the current (feasible) solution, and mark optimal solutions with *. The misclassification value for the MaxMILP solver is always 1. It is interesting to see that the MaxMILP encoding sometimes needs less time than the plain MILP encoding.
MaxMILP versus alternative attacks.
Table 3: Comparison of the uniform attacker (UA), the best deterministic attacker (BDA), and MaxMILP (excerpt); the last three columns list the misclassification value achieved by each method, with optimal MaxMILP solutions marked by *.
ID | Name | #classifiers | #attacks | #points | dim | α | ε | value (UA) | value (BDA) | value (MaxMILP)
3 | mnist_0_1_2convs | 3 | 2 | 4 | 8x8 | 0.2 | 1000 | 0.33 | 0.25 | 1* |
7 | mnist_0_1 | 3 | 2 | 4 | 8x8 | 0.4 | 1000 | 0.33 | 0.5 | 1* |
8 | mnist_0_1 | 3 | 2 | 4 | 8x8 | 0.9 | 1000 | 0.33 | 0.5 | 1* |
9 | mnist_0_1 | 3 | 3 | 4 | 8x8 | 0.9 | 60 | 0.33 | 0.5 | 1* |
10 | mnist_0_1 | 3 | 4 | 4 | 8x8 | 0.9 | 60 | 0.33 | 0.5 | 1* |
11 | mnist_4_5 | 3 | 2 | 4 | 10x10 | 0.2 | 100 | 0.33 | 0.25 | 1* |
14 | mnist_7_8 | 3 | 4 | 4 | 6x6 | 0.1 | 60 | 0.33 | 0.5 | 1 |
15 | mnist_7_8 | 3 | 10 | 4 | 6x6 | 0.1 | 60 | 0.33 | 0.5 | 1 |
16 | mnist_4_5 | 3 | 2 | 4 | 10x10 | 0.1 | 1000 | 0.33 | 0.75 | 1* |
17 | mnist_4_5 | 3 | 3 | 4 | 10x10 | 0.1 | 80 | 0.33 | 0.5 | 1 |
18 | mnist_4_5 | 3 | 2 | 4 | 10x10 | 0.15 | 80 | 0.33 | 0.5 | 1* |
In Table 3, we compare the MaxMILP method to the uniform attacker (UA) and the best deterministic attacker (BDA). We see that the best deterministic attacker usually achieves higher misclassification values than the uniform attacker, but neither of them is able to reach the actual optimum of 1.
Discussion of the results.
Within the timeout, our method is able to generate optimal results for medium-sized neural networks. The running time is mainly influenced by the number and type of the layers used; in particular, it is governed by convolutional and max-pooling layers, as these involve more matrix operations than dense layers. As expected, larger values of the robustness bound $\alpha$ and smaller values of the attack bound $\varepsilon$ typically increase the running times.
7 Conclusion and Future Work
We presented a new method to formally verify the robustness or, vice versa, compute optimal attacks for an ensemble of classifiers. Despite the theoretical hardness, we were able to, in particular by using MILP-solving, provide results for meaningful benchmarks. In future work, we will render our method more scalable towards a standalone verification tool for neural network ensembles. Moreover, we will explore settings where we do not have white-box access to the classifiers and employ state-of-the-art classifier stealing methods.
References
- [1] Abadi, M.: Tensorflow: learning functions at scale. In: Garrigue, J., Keller, G., Sumii, E. (eds.) ICFP. p. 1. ACM (2016)
- [2] Abbasi, M., Gagné, C.: Robustness to adversarial examples through an ensemble of specialists. In: ICLR (Workshop). OpenReview.net (2017)
- [3] Abbasi, M., Rajabi, A., Gagné, C., Bobba, R.B.: Toward adversarial robustness by diversity in an ensemble of specialized deep neural networks. In: Canadian Conference on AI. LNCS, vol. 12109, pp. 1–14. Springer (2020)
- [4] Akintunde, M.E., Kevorchian, A., Lomuscio, A., Pirovano, E.: Verification of RNN-based neural agent-environment systems. In: AAAI. pp. 6006–6013. AAAI Press (2019)
- [5] Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., Mané, D.: Concrete problems in ai safety. CoRR abs/1606.06565 (2016)
- [6] Apt, K.R., Grädel, E.: Lectures in game theory for computer scientists. Cambridge University Press (2011)
- [7] Bunel, R., Turkaslan, I., Torr, P.H.S., Kohli, P., Mudigonda, P.K.: A unified view of piecewise linear neural network verification. In: NeurIPS. pp. 4795–4804 (2018)
- [8] Cohen, G., Afshar, S., Tapson, J., van Schaik, A.: EMNIST: extending MNIST to handwritten letters. In: IJCNN. pp. 2921–2926. IEEE (2017)
- [9] De Moura, L., Bjørner, N.: Z3: An efficient SMT solver. In: TACAS. pp. 337–340. Springer (2008)
- [10] Deng, L.: The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Process. Mag. 29(6), 141–142 (2012)
- [11] Dreossi, T., Ghosh, S., Sangiovanni-Vincentelli, A.L., Seshia, S.A.: A formalization of robustness for deep neural networks. CoRR abs/1903.10033 (2019)
- [12] Ehlers, R.: Formal verification of piece-wise linear feed-forward neural networks. In: ATVA. LNCS, vol. 10482, pp. 269–286. Springer (2017)
- [13] Freedman, R.G., Zilberstein, S.: Safety in AI-HRI: Challenges complementing user experience quality. In: AAAI Fall Symposium Series (2016)
- [14] Gurobi Optimization, Inc.: Gurobi optimizer reference manual. http://www.gurobi.com (2013)
- [15] Katz, G., Barrett, C.W., Dill, D.L., Julian, K., Kochenderfer, M.J.: Reluplex: An efficient SMT solver for verifying deep neural networks. In: CAV. LNCS, vol. 10426, pp. 97–117. Springer (2017)
- [16] Keysers, D.: Comparison and combination of state-of-the-art techniques for handwritten character recognition: topping the mnist benchmark. arXiv preprint arXiv:0710.2231 (2007)
- [17] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) ICLR (2015), http://arxiv.org/abs/1412.6980
- [18] Kwiatkowska, M.Z.: Safety verification for deep neural networks with provable guarantees (invited paper). In: CONCUR. LIPIcs, vol. 140, pp. 1:1–1:5. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)
- [19] Nyholm, S.: The ethics of crashes with self-driving cars: A roadmap, ii. Philosophy Compass 13(7) (2018)
- [20] Perdomo, J.C., Singer, Y.: Robust attacks against multiple classifiers. CoRR abs/1906.02816 (2019)
- [21] Pinot, R., Ettedgui, R., Rizk, G., Chevaleyre, Y., Atif, J.: Randomization matters. how to defend against strong adversarial attacks. CoRR abs/2002.11565 (2020)
- [22] Ranzato, F., Zanella, M.: Robustness verification of decision tree ensembles. In: OVERLAY@AI*IA. vol. 2509, pp. 59–64. CEUR-WS.org (2019)
- [23] National Science and Technology Council (NSTC): Preparing for the Future of Artificial Intelligence (2016)
- [24] Singh, G., Gehr, T., Püschel, M., Vechev, M.T.: An abstract domain for certifying neural networks. Proc. ACM Program. Lang. 3(POPL), 41:1–41:30 (2019). https://doi.org/10.1145/3290354, https://doi.org/10.1145/3290354
- [25] Stallkamp, J., Schlipsing, M., Salmen, J., Igel, C.: The german traffic sign recognition benchmark: a multi-class classification competition. In: IJCNN. pp. 1453–1460. IEEE (2011)
- [26] Stoica, I., Song, D., Popa, R.A., Patterson, D., Mahoney, M.W., Katz, R., Joseph, A.D., Jordan, M., Hellerstein, J.M., Gonzalez, J.E., et al.: A Berkeley view of systems challenges for AI. CoRR abs/1712.05855 (2017)
- [27] Yan, C., Xu, W., Liu, J.: Can you trust autonomous vehicles: Contactless attacks against sensors of self-driving vehicle. DEF CON 24(8), 109 (2016)
Appendix A Full Experimental Results
Full comparison of the SMT and MILP encodings (cf. Table 1).
ID | Name | #classifiers | #attacks | #points | dim | α | ε | Time (SMT) | value (SMT) | Time (MILP) | value (MILP)
1 | mnist_0_1 | 3 | 2 | 4 | 7x7 | 0.1 | 1000 | -TO- | – | 57.79 | 0.25 |
2 | mnist_0_1 | 3 | 2 | 4 | 7x7 | 0.2 | 100 | -TO- | – | 12.43 | 0.25 |
3 | mnist_0_1_2convs | 3 | 2 | 4 | 8x8 | 0.2 | 1000 | -TO- | – | 738.76 | 0.5 |
4 | mnist_0_1_2convs | 3 | 2 | 4 | 8x8 | 0.4 | 100 | -TO- | – | 53.92 | 0.4 |
5 | mnist_0_1_2convs | 3 | 2 | 4 | 8x8 | 0.9 | 100 | -TO- | – | 44.24 | 1 |
6 | mnist_0_1_2convs | 3 | 3 | 4 | 8x8 | 0.9 | 100 | -TO- | – | 96.78 | 1 |
7 | mnist_0_1 | 3 | 2 | 4 | 8x8 | 0.4 | 1000 | -TO- | – | 0.34 | 0.4 |
8 | mnist_0_1 | 3 | 2 | 4 | 8x8 | 0.9 | 1000 | -TO- | – | 50.09 | 0.9 |
9 | mnist_0_1 | 3 | 3 | 4 | 8x8 | 0.9 | 60 | -TO- | – | 34.02 | 1 |
10 | mnist_0_1 | 3 | 4 | 4 | 8x8 | 0.9 | 60 | -TO- | – | 51.39 | 1 |
11 | mnist_4_5 | 3 | 2 | 4 | 10x10 | 0.2 | 100 | -TO- | – | 57.92 | 0.25 |
12 | mnist_4_5 | 3 | 4 | 4 | 10x10 | 0.3 | 50 | -TO- | – | 143.86 | 0.5 |
13 | mnist_4_5 | 3 | 4 | 4 | 10x10 | 0.9 | 50 | -TO- | – | 144.32 | 1 |
14 | mnist_7_8 | 3 | 4 | 4 | 6x6 | 0.1 | 60 | -TO- | – | 18.94 | 0.25 |
15 | mnist_7_8 | 3 | 10 | 4 | 6x6 | 0.1 | 60 | -TO- | – | 76.25 | 0.5 |
16 | mnist_4_5 | 3 | 2 | 4 | 10x10 | 0.1 | 1000 | 155.73 | 0.38 | 101.16 | 0.1 |
17 | mnist_4_5 | 3 | 3 | 4 | 10x10 | 0.1 | 80 | 403.25 | 0.25 | 101.47 | 0.25 |
18 | mnist_4_5 | 3 | 2 | 4 | 10x10 | 0.15 | 80 | 216.65 | 0.38 | 44.26 | 0.15 |
19 | mnist_4_5 | 3 | 2 | 4 | 10x10 | 0.2 | 100 | 156.63 | 0.38 | 54.36 | 0.25 |
20 | mnist_3_6 | 2 | 9 | 2 | 8x8 | 1 | 0.005 | -TO- | – | 7 | robust |
21 | mnist_7_8 | 3 | 2 | 4 | 6x6 | 1 | 0.1 | -TO- | – | 4 | robust |
22 | mnist_7_8 | 3 | 2 | 4 | 6x6 | 0.9 | 0.1 | -TO- | – | 4 | robust |
23 | mnist_7_8 | 3 | 7 | 4 | 6x6 | 0.9 | 0.3 | -TO- | – | 71 | robust |
24 | mnist_0_2 | 3 | 27 | 4 | 9x9 | 1 | 0.01 | -TO- | – | 108 | robust |
25 | mnist_0_2 | 3 | 30 | 4 | 9x9 | 1 | 0.01 | -TO- | – | 120 | robust |
26 | traffic_signs | 3 | 2 | 4 | 10x10 | 1 | 0.01 | -TO- | – | 17 | robust |
27 | traffic_signs | 3 | 2 | 4 | 10x10 | 1 | 0.1 | -TO- | – | -TO- | – |
28 | traffic_signs | 3 | 3 | 4 | 10x10 | 1 | 0.01 | -TO- | – | 45 | robust |
29 | traffic_signs | 3 | 3 | 4 | 10x10 | 1 | 0.01 | -TO- | – | -TO- | – |
Full comparison of the plain MILP and MaxMILP encodings (cf. Table 2).
ID | Name | #classifiers | #attacks | #points | dim | α | ε | Time (MILP) | value (MILP) | Time (MaxMILP) | value (MaxMILP)
1 | mnist_0_1 | 3 | 2 | 4 | 7x7 | 0.1 | 1000 | 57.79 | 0.25 | 46.23 | 1* |
2 | mnist_0_1 | 3 | 2 | 4 | 7x7 | 0.2 | 100 | 12.43 | 0.25 | 12.43 | 1* |
3 | mnist_0_1_2convs | 3 | 2 | 4 | 8x8 | 0.2 | 1000 | 738.76 | 0.5 | 93.54 | 1* |
4 | mnist_0_1_2convs | 3 | 2 | 4 | 8x8 | 0.4 | 100 | 53.92 | 0.4 | 53.92 | 1* |
5 | mnist_0_1_2convs | 3 | 2 | 4 | 8x8 | 0.9 | 100 | 44.24 | 1 | 44.24 | 1* |
6 | mnist_0_1_2convs | 3 | 3 | 4 | 8x8 | 0.9 | 100 | 96.78 | 1 | 96.78 | 1* |
7 | mnist_0_1 | 3 | 2 | 4 | 8x8 | 0.4 | 1000 | 0.34 | 0.4 | 0.34 | 1* |
8 | mnist_0_1 | 3 | 2 | 4 | 8x8 | 0.9 | 1000 | 50.09 | 0.9 | 50.09 | 1* |
9 | mnist_0_1 | 3 | 3 | 4 | 8x8 | 0.9 | 60 | 34.02 | 1 | 34.02 | 1* |
10 | mnist_0_1 | 3 | 4 | 4 | 8x8 | 0.9 | 60 | 51.39 | 1 | 51.39 | 1* |
11 | mnist_4_5 | 3 | 2 | 4 | 10x10 | 0.2 | 100 | 57.92 | 0.25 | 52.95 | 1* |
12 | mnist_4_5 | 3 | 4 | 4 | 10x10 | 0.3 | 50 | 143.86 | 0.5 | 158.20 | 1 |
13 | mnist_4_5 | 3 | 4 | 4 | 10x10 | 0.9 | 50 | 144.32 | 1 | 144.32 | 1 |
14 | mnist_7_8 | 3 | 4 | 4 | 6x6 | 0.1 | 60 | 18.94 | 0.25 | 21.20 | 1 |
15 | mnist_7_8 | 3 | 10 | 4 | 6x6 | 0.1 | 60 | 76.25 | 0.5 | 76.25 | 1 |
16 | mnist_4_5 | 3 | 2 | 4 | 10x10 | 0.1 | 1000 | 101.16 | 0.1 | 106.95 | 1* |
17 | mnist_4_5 | 3 | 3 | 4 | 10x10 | 0.1 | 80 | 101.47 | 0.25 | 88.39 | 1 |
18 | mnist_4_5 | 3 | 2 | 4 | 10x10 | 0.15 | 80 | 44.26 | 0.15 | 52.00 | 1* |
19 | mnist_4_5 | 3 | 2 | 4 | 10x10 | 0.2 | 100 | 54.36 | 0.25 | 44.00 | 1* |
20 | mnist_3_6 | 2 | 9 | 2 | 8x8 | 1 | 0.005 | 7 | robust | 7 | robust |
21 | mnist_7_8 | 3 | 2 | 4 | 6x6 | 1 | 0.1 | 4 | robust | 4 | robust |
22 | mnist_7_8 | 3 | 2 | 4 | 6x6 | 0.9 | 0.1 | 4 | robust | 4 | robust |
23 | mnist_7_8 | 3 | 7 | 4 | 6x6 | 0.9 | 0.3 | 71 | robust | 71 | robust |
24 | mnist_0_2 | 3 | 27 | 4 | 9x9 | 1 | 0.01 | 108 | robust | 108 | robust |
25 | mnist_0_2 | 3 | 30 | 4 | 9x9 | 1 | 0.01 | 120 | robust | 120 | robust |
26 | traffic_signs | 3 | 2 | 4 | 10x10 | 1 | 0.01 | 17 | robust | 17 | robust |
27 | traffic_signs | 3 | 2 | 4 | 10x10 | 1 | 0.1 | –TO– | - | –TO– | – |
28 | traffic_signs | 3 | 3 | 4 | 10x10 | 1 | 0.01 | 45 | robust | 45 | robust |
29 | traffic_signs | 3 | 3 | 4 | 10x10 | 1 | 0.01 | –TO– | - | –TO– | – |
Full comparison of the uniform attacker (UA), the best deterministic attacker (BDA), and MaxMILP (cf. Table 3).
ID | Name | #classifiers | #attacks | #points | dim | α | ε | value (UA) | value (BDA) | value (MaxMILP)
1 | mnist_0_1 | 3 | 2 | 4 | 7x7 | 0.1 | 1000 | 0.33 | 0.5 | 1* |
2 | mnist_0_1 | 3 | 2 | 4 | 7x7 | 0.2 | 100 | 0.33 | 0.5 | 1* |
3 | mnist_0_1_2convs | 3 | 2 | 4 | 8x8 | 0.2 | 1000 | 0.33 | 0.5 | 1* |
4 | mnist_0_1_2convs | 3 | 2 | 4 | 8x8 | 0.4 | 100 | 0.33 | 0.25 | 1* |
5 | mnist_0_1_2convs | 3 | 2 | 4 | 8x8 | 0.9 | 100 | 0.33 | 0.25 | 1* |
6 | mnist_0_1_2convs | 3 | 3 | 4 | 8x8 | 0.9 | 100 | 0.33 | 0.25 | 1* |
7 | mnist_0_1 | 3 | 2 | 4 | 8x8 | 0.4 | 1000 | 0.33 | 0.5 | 1* |
8 | mnist_0_1 | 3 | 2 | 4 | 8x8 | 0.9 | 1000 | 0.33 | 0.5 | 1* |
9 | mnist_0_1 | 3 | 3 | 4 | 8x8 | 0.9 | 60 | 0.33 | 0.5 | 1* |
10 | mnist_0_1 | 3 | 4 | 4 | 8x8 | 0.9 | 60 | 0.33 | 0.5 | 1* |
11 | mnist_4_5 | 3 | 2 | 4 | 10x10 | 0.2 | 100 | 0.33 | 0.25 | 1* |
12 | mnist_4_5 | 3 | 4 | 4 | 10x10 | 0.3 | 50 | 0.33 | 0.75 | 1 |
13 | mnist_4_5 | 3 | 4 | 4 | 10x10 | 0.9 | 50 | 0.33 | 0.5 | 1 |
14 | mnist_7_8 | 3 | 4 | 4 | 6x6 | 0.1 | 60 | 0.33 | 0.5 | 1 |
15 | mnist_7_8 | 3 | 10 | 4 | 6x6 | 0.1 | 60 | 0.33 | 0.5 | 1 |
16 | mnist_4_5 | 3 | 2 | 4 | 10x10 | 0.1 | 1000 | 0.33 | 0.75 | 1* |
17 | mnist_4_5 | 3 | 3 | 4 | 10x10 | 0.1 | 80 | 0.33 | 0.5 | 1 |
18 | mnist_4_5 | 3 | 2 | 4 | 10x10 | 0.15 | 80 | 0.33 | 0.5 | 1* |
19 | mnist_4_5 | 3 | 2 | 4 | 10x10 | 0.2 | 100 | 0.33 | 0.5 | 1* |