It’s Simplex! Disaggregating Measures to Improve Certified Robustness ^†^†thanks: Identify applicable funding agency here. If none, delete this.

Andrew C. Cullen13, Paul Montague2, Shijie Liu1, Sarah M. Erfani1, and Benjamin I.P. Rubinstein 1 1School of Computing and Information Systems, University of Melbourne, Parkville, Australia 2Defence Science and Technology Group, Adelaide, Australia 3 [email protected]

Abstract

Certified robustness circumvents the fragility of defences against adversarial attacks, by endowing model predictions with guarantees of class invariance for attacks up to a calculated size. While there is value in these certifications, the techniques through which we assess their performance do not present a proper accounting of their strengths and weaknesses, as their analysis has eschewed consideration of performance over individual samples in favour of aggregated measures. By considering the potential output space of certified models, this work presents two distinct approaches to improve the analysis of certification mechanisms, that allow for both dataset-independent and dataset-dependent measures of certification performance. Embracing such a perspective uncovers new certification approaches, which have the potential to more than double the achievable radius of certification, relative to current state-of-the-art. Empirical evaluation verifies that our new approach can certify $9\%$ more samples at noise scale $\sigma=1$ , with greater relative improvements observed as the difficulty of the predictive task increases.

Index Terms:

certified robustness, adversarial machine learning, adversarial attack, differential privacy

1 Introduction

Despite their excellent benchmark performance, the black-box nature of deep neural networks makes them prone to unexpected behaviours and instability. Of particular interest are adversarial examples, in which model outputs are changed by way of human imperceptible input perturbations [2, 26, 10, 6]. The spectre of such attacks poses risks to deployed models, in a fashion that can materially impact both the model deployer and their users.

While numerous defences against the mechanisms that produce these examples exist, they typically only tackle a single vulnerability and be circumvented by considering alternate attack mechanisms. This intrinsic limitation motivated the development of certifiably robust models, which provide a pointwise guarantee of resistance to attacks up to a fixed, calculable size. These certifications are typically constructed by exploiting either convex relaxation or randomised smoothing [15], and measure how close the nearest-possible adversarial example could be, independent of the technique employed to identify said attack.

Whenever a new certification mechanism is proposed, its utility is demonstrated by considering its average performance over a large number of samples. However, these aggregate measures do not align with the motivations of potential attackers, who may seek to attack individual samples. As such, aggregate measures of certification may disguise how the risk profile of individual samples may change, and more broadly removes the ability to interrogate the factors that drive differential performance of different certification schemes.

In contrast to these aggregated measures of performance, this work takes the contrary, disaggregated perspective, and considers how the performance of different certification schemes depends upon where an individual sample exists within the simplex of permissible output scores. This is made possible by way of some simple-yet-powerful observations relating to the analytic nature of certification mechanisms, which allow for analytic, dataset independent comparisons to be performed. These techniques are then merged with a dataset-dependent, sample-wise analysis, which considers a dataset’s distribution in the context of the output simplex.

In taking this approach, we are able to both better understand the relative performance of certification schemes, and the nature of adversarial risk in certified systems more broadly. This is of critical importance for deployed systems, in which samples will likely be considered to hold differing levels of adversarial risk. Moreover, it may well be that the distribution properties of standard tested datasets may not reflect those observed within vulnerable deployed systems. If this was true, then being able to assess certification performance in a fashion that is aware but not beholden to the output distribution has significant ramifications for understanding the generalisation of certification techniques.

This disaggregated perspective reveals the potential for two additions to the certification oeuvre, which we document within this work. The first of which involves a revised mechanism for constructing certifications by way of differential privacy, which for a subset of samples can produce a more than two-fold increase in the achievable certification for what we will categorise as multinomial style certifications, and a more than five-fold increase for softmax certifications. Our second contribution involves treating certifications not as the product of any one mechanism, but rather as the best calculated value across a set of different approaches, an approach that is only made possible by considering certification performance through a disaggregated lens.

These improvements in both how we consider certifications, and how we construct certifications are supported by Sections 2 and 3, which introduce the current range of extant certification approaches, and how their analytic nature can be used to construct dataset-independent comparisons. From this, Section 4 then considers how these measures can be used to help reveal the potential for our two new certification approaches, both of which have the potential to help improve the sample-wise performance of certifications through consideration of the simplex of permissible output scores. Section 6 then demonstrates how our new approaches uniformly outperforms prior techniques when considering expectations over models that output softmax probability distributions, while providing significant advantages in a subset of the certification domain for multinomial classifiers.

Refer to caption — Figure 1: A representation of the process of certifying a single image. Within this diagram, the blue squares represent repeated independent calculations, over which the ensemble expectations are calculated. In this paper, the ensemble itself is taken over the maximum certifications of the Cohen et al. [5], Li et al. [16], and Our approaches.

2 Preliminaries

Certification mechanisms use a mixture of computational and analytical techniques to provide guarantees of a models resistance to all adversarial attack. While this approach can be applied to training-time processes [17], within this work we are specifically interested in guarding against $\ell_{2}$ -norm perturbations to images at evaluation-time. A learned classifier $f_{\boldsymbol{\theta}}\in\mathbb{R}^{K}$ acting upon an input sample $\mathbf{x}\in\mathbb{R}^{d}$ is considered to be robust to attacks $\boldsymbol{\gamma}\in\mathbb{R}^{d}$ of bounded size $\|\boldsymbol{\gamma}\|_{p}\leq L$ —henceforth referred to as $B_{P}(L)$ —if

	$\displaystyle\operatorname*{arg\,max}_{\forall i\in\mathcal{K}}f_{\boldsymbol{\theta_{i}}}=j\text{ and }$
$\displaystyle\forall\boldsymbol{\gamma}\in B_{p}(L)\centerdot$	$\displaystyle f_{\boldsymbol{\theta_{j}}}(\mathbf{x}+\boldsymbol{\gamma})>\max_{i:i\neq j}f_{\boldsymbol{\theta_{i}}}(\mathbf{x}+\boldsymbol{\gamma})\enspace.$	(1)
	$\displaystyle\text{where }\mathcal{K}=\{1,\ldots,K\}$

The simplicity of this statement stands in stark contrast to the difficulty of proving it, as exploring the entire feasible space of $\boldsymbol{\gamma}$ is computationally intractable, especially as $d$ increases. As such, in order to establish the robustness of a model, certifications mechanisms instead construct provable lower bounds on the distance to the nearest adversarial example, making such certifications inherently conservative.

In attempting to certify against $\ell_{2}$ -norm bounded perturbations, two primary frameworks have been considered, which can be broadly categorised as statistical certifications, and those that exploit knowledge of the model’s architecture. Of these, the latter involves constructing bounds on the output of a model by inspecting and tracing bifurcations under norm-bounded perturbations [19, 27, 33, 33, 24]. Framed in general as convex relaxation, these techniques opt to use linear relaxation to construct bounding polytopes of a model’s outputs over bounded perturbations [23]. These approaches have been extended by adopting augmented loss functions to promote tight output bounds [28]. However, these approaches require significant amounts of computational resources to construct their certifications, which typically leads to these approaches failing to scale beyond datasets of the size of CIFAR- $10$ .

In contrast statistical methods typically leverage a process known as randomised smoothing, in which repeated model draws under noise are employed to produce what is known as a smoothed classifier, the properties of which can be exploited to construct guarantees of model robustness. This is made possible by attempting to parameterise worst-case behaviours under attack. While the addition of this noise is not cost-free, it is an embarrassingly parallel process that requires significantly fewer resources to scale to large models and complex datasets than are required with convex relaxation. Moreover, randomised smoothing does not require any modifications to the core model architecture, nor to the training and testing loops, which significantly reduces the level of engineering required to support the deployment of certified guarantees. It is due to these factors that for the remainder of this work we will only consider such statistical certification techniques.

2.1 Randomised Smoothing

To construct the smoothed classifier $g$ , the model is exposed to repeated samples under noise $\mathcal{N}(\mathbf{x},\sigma^{2}\mathbf{I})$ . However, rather than producing a stochastic model, by taking the expectation of the model outputs under noise, $g$ becomes deterministic a deterministic property, which can be translated into a certification by attempting to parameterise the worst-case response of the model to perturbations.

To date, a number of different parameterisation approaches have been proposed, producing certifications of varying tightness. However, these works often leverage different mechanisms to perform their smoothing, the nuance of which has not been fleshed out in prior works. To help formally distinguish between techniques, we will henceforth refer to techniques as drawing upon either the softmax expectation (sometimes referred to as the soft expectation), which represents the expected model output under noise; or the multinomial expectation (sometimes referred to as the hard expectation), which represents the expectation of the $\operatorname*{arg\,max}$ of a models outputs, which is equivalent to the expected predicted class under noise. While conceptually similar, these two approaches can mathematically be respectively represented by way of $\tilde{\mathbf{Y}}$ and $\tilde{\mathbf{Y}}^{\prime}$ , where

	$\displaystyle\tilde{\mathbf{X}}$	$\displaystyle\sim\mathcal{N}(\mathbf{x},\sigma^{2}\mathbf{I})$	$\displaystyle\tilde{\mathbf{Z}}$	$\displaystyle=f_{\boldsymbol{\theta}}(\tilde{\mathbf{X}})$
	$\displaystyle\tilde{\mathbf{Y}}$	$\displaystyle=\Pi(\tilde{\mathbf{Z}})$	$\displaystyle\tilde{Y}^{\prime}_{j}$	$\displaystyle=\begin{cases}1,&\tilde{Z}_{j}>\max_{k\in\mathcal{K}\setminus j}\tilde{Z}_{k}\\ 0,&\mbox{otherwise\enspace,}\end{cases}$		(2)

were $\Pi(\cdot)$ represents the softmax operator. Deterministic expectations over these classes are estimated with high probability by constructing a Monte-Carlo estimate over $\mathcal{D}$ , requiring $n$ i.i.d. draws either $(\tilde{\mathbf{x}}_{i},\tilde{\mathbf{z}}_{i},\tilde{\mathbf{y}}_{i})$ or $(\tilde{\mathbf{x}}_{i},\tilde{\mathbf{z}}_{i},\tilde{\mathbf{y}}^{\prime}_{i})$ . From this, the output of the smoothed classifier $g$ corresponds to the expectations over $\tilde{\mathbf{Y}}$ and $\tilde{\mathbf{Y}}^{\prime}$ , where

\displaystyle\mathbb{E}_{\mathcal{D}}[\tilde{\mathbf{Y}}]=\frac{1}{n}\sum_{i=1}^{n}\tilde{\mathbf{y}}_{i}

\displaystyle\mathbb{E}_{\mathcal{D}}[\tilde{\mathbf{Y}}^{\prime}]=\frac{1}{n}\sum_{i=1}^{n}\tilde{\mathbf{y}}^{\prime}_{i}\enspace.

(3)

The stability of these expectations at inference time are supported by augmenting each training time sample with noise drawn from $\mathcal{N}(\mathbf{0},\sigma^{2}\mathbf{I})$ .

Given that certification mechanisms seek to guarantee the behaviour of models under potential attack, the introduction of Monte-Carlo estimates of the expectation may appear to be inherently contradictory and unsuitable. However, if we are able to definitively calculate the worst-case expectations for a given Monte-Carlo samplings output, then any subsequent certification can still be confidently considered as a worst-case, conservative bound upon the existence of any potential adversarial examples.

In the case of softmax expectations, the calculated and worst-case expectations are related by the well-known Hoeffding inequality [13], which provides a high-probability tail bound for a confidence level $\alpha>0$ , for which

P\left(\left|\mathbb{E}_{S}[\mathbf{Y}]-\mathbb{E}[\tilde{\mathbf{Y}}]\right|\leq\sqrt{\frac{\log_{e}(2/\alpha)}{2n}}\right)\geq 1-\alpha\enspace,

(4)

for the Monte-Carlo estimate $\mathbb{E}[\tilde{\mathbf{Y}}]$ and true, worst case softmax expectation $\mathbb{E}_{S}[\mathbf{Y}]$ .

For multinomial output distributions, which we will henceforth label as $\mathbb{E}_{S}[\mathbf{Y}^{\prime}]$ , we propose treating the two highest class outputs as distinct and unique outputs of a binomial distribution, and measuring uncertainties as such. Such bounds can be estimated by way of the Beta distribution, which reliably produces bounds that achieve the nominal coverage [3]. These uncertainties are calculated subject to the Bonferroni correction to $\alpha$ [8], to account for the two measures being were drawn from the same sampling process. Taking this approach is significantly computationally cheaper than other, more comprehensive mechanisms for constructing bounds upon the expectations [25, 11], while still producing guaranteed coverage.

For future clarity, we will henceforth refer to the sorted softmax and multinomial class expectations as $E_{S}$ and $E_{M}$ respectively, where the first element of each— $E_{S,0}$ and $E_{M,0}$ —employ the calculated lower bound on the estimated expectations, while the second element— $E_{S,1}$ and $E_{M,1}$ —correspond to the calculated upper bounds.

2.2 Certification Mechanisms

While previous works have considered the softmax and multinomial expectations to be broadly interchangeable, it is important to emphasise that the conceptual differences between the value of these outputs means that they are addressing fundamentally similar-but-distinct problem spaces. As such, we will now summarise key certification mechanisms for $\ell_{2}$ threat models in a fashion that reflects the applicable expectation framework for the technique at hand.

The first randomised smoothing based certifications drew upon differential privacy [9] in order to bound the response of models under noise-based perturbation, leading to what is known as the Lecuyer et al. [15] approach. Under this framework, certified robustness over $\boldsymbol{\gamma}\in B_{p}(L)$ can be calculated by way of

		$\displaystyle L_{\text{Lecuyer}}=$		(5)
		$\displaystyle\qquad\max_{\epsilon\in(0,1]}\frac{\sigma\epsilon}{\bigtriangleup\sqrt{2\log(1.25(1+e^{\epsilon})/(E_{S,0}-e^{2\epsilon}E_{S,1}))}}\enspace.$		(6)

Here $\bigtriangleup$ is a variant of the local Lipschitz continuity with respect to input perturbations of the base model $f(\cdot)$ , which for $\ell_{2}$ -norm-bounded perturbation corresponds to

\bigtriangleup=\max_{\mathbf{x},\mathbf{x}+\boldsymbol{\gamma}}\frac{\|f(\mathbf{x})-f(\mathbf{x}+\boldsymbol{\gamma})\|_{2}}{\|\mathbf{x}-(\mathbf{x}+\boldsymbol{\gamma})\|_{2}}\enspace.

(7)

While Equation 5 does rely upon finding a maximum, failing to reach a global maxima does not void the certification, as the established bound is provably true for all $\epsilon$ .

In practice, while Lecuyer et al. explicitly framed this approach in terms of the softmax output distribution, it can be applied to systems which only return a multinomial output distribution.

While the above certification mechanism was the first to provide guarantees of robustness for data sets as large as ImageNet, the conservative nature of the established bounds has left scope for new techniques to try and extend the size of achievable certifications. This was demonstrated by Li et al. [16], who exploited Rényi Divergence to provide an improved guarantee of size

L_{\text{Li}}=\sup_{\omega>1}\sigma\sqrt{\begin{aligned} \log\left[\vphantom{\frac{1}{2}}\right.&1-E_{M,0}-E_{M,1}+2\left(\frac{1}{2}E_{M,0}^{1-\omega}\right.\\ &+\left.\left.\frac{1}{2}E_{M,1}^{1-\omega}\right)^{\frac{1}{1-\omega}}\right]^{\frac{-2}{\omega}}\end{aligned}}\enspace.

(8)

Unlike the approach of Lecuyer et al., this this approach does not apply to outputs employing the softmax expectations.

The most popular mechanism in the current literature was developed by Cohen et al. [5, 22], and constructs certifications in terms of the multinomial output by way of the Gaussian quantile function $\Phi^{-1}$ , yielding certifications:

L_{\text{Cohen}}=\sigma\left(\Phi^{-1}\left(E_{M,0}\right)\right)\enspace.

(9)

While this work was presented alongside a second certification in terms of $E_{M,0}$ and $E_{M,1}$ that presents a tighter bound, their experiments exclusively considered the form above, which we will follow. It must also be noted that previous implementations of Equation 9 have used a process based on the binomial distribution, which introduces a low probability chance of incorrectly selecting the wrong output class and producing a failed certification—a detail that is further discussed in Section 5. To alleviate these concerns we will consider Cohen et al. to refer to Equation 9 subject to the same multinomial distribution as Li et al..

We note that recent works have provided further extensions upon the radii of certification achievable through these mechanisms. Some attempt to improve the mechanisms through which we calculate certifications [7]; while others attempt to induce shifts in the output distribution through training time loss-function modifications that incentivise larger certification radii [22, 32], with MACER being particularly popular [31]. However, deploying all of these approaches introduce significant increases in the requisite computational time, with MACER inducing a $40$ -fold increase in training time on our system. Moreover, it is crucial to emphasise that all of these systems for modifying training time certified robustness still derive their certifications using the approach of Cohen et al.. As such, within this work we will focus our improvements upon these core certification regimes, rather than their extensions, as improvements to the these core routines will still yield improvements when the modified mechanisms are deployed.

3 Comparing Certification Performance

Each of the aforementioned certification mechanisms has demonstrated its utility by showing improvements over the previously established state of the art. In each of these works, the core metric has been the certified accuracy, which corresponds to the proportion of samples that are correctly predicted with a radius greater than $r$ , equivalent to

c_{A}(r)=\frac{1}{N}\sum_{i=1}^{N}\mathds{1}[\operatorname*{arg\,max}g(\mathbf{x}_{i})=l(\mathbf{x}_{i})]\mathds{1}[L(g(\mathbf{x}_{i}))>r]\enspace,

(10)

where $l(\mathbf{x})$ is the correct class label for the sample $\mathbf{x}$ , and $L(\cdot)$ is the certification stemming from the smoothed classifier $g$ .

While such a measure allows for comparisons between techniques to be easily parsed, it also presents a picture that suggests that improvements in certification performance between approaches are uniform across all samples. In doing so, the drivers of certification performance in different techniques are distorted, and more difficult to interrogate. This is intrinsically problematic, as it both limits our ability to understand how a technique may generalise to new, semantically different datasets; and the capacity to assess certification performance for datasets where samples have differing adversarial risks.

In order to resolve these limitations in how we assess certifications schemes, we introduce a simple but powerful observation: certification mechanisms often decompose such that the only determinant of certification performance are the model output expectations. This would appear to be immediately contradicted by Equations 5, 8, and 9, however each of these exhibits linear, multiplicative proportionality to $\sigma$ , and as such, the only determinant of performance is the expectation set. While this is an obvious statement, it is also not a consideration that any prior work has considered, and allows for a dataset, learner, and model agnostic mechanism for comparing certification approaches, while also providing a framework for disaggregated analysis of the performance of any specific combination of dataset, learner, and model.

We introduce Definition 3.1 to formalise this statement, and to emphasise that certification mechanisms strictly depend upon the projection of the output $C_{K}$ space onto the $C_{3}$ -simplex. Thus any comparison between technique performance can also be considered in terms of this permissible space, yielding what we term as a dataset independent comparison. In doing so, the performance of the certifications can be compared in a fashion that is agnostic to the choice of model, dataset, or training infrastructure employed.

Definition 3.1.

Consider the set of smoothed classifiers $\mathcal{G}$ , from which any smoothed classifier $g\in\mathcal{G}$ constructs the mapping $g:\mathbb{R}^{d}\to C_{K}$ from input $\mathbf{x}$ to a point in the $K$ -simplex, for $K$ output classes. A certification mechanism for the model family $\mathcal{G}$ is a mapping $h:\mathcal{G}\times\mathbb{R}^{d}\to\mathbb{R}^{\geq 0}$ from a model and input instance to a certified radius.

To explore this concept, Figure 2 considers the relative certification performance between the Cohen et al. and Li et al. approaches, by considering the permissible space of expectations across the $C_{3}$ simplex as inputs, rather than the output of any specific model. While prior works have considered the Cohen et al. approach to uniformly produce the largest achievable radii of certification, in practice the commonly employed form of Cohen only out-certifies in the neighbourhood of $E_{0}+E_{1}\approx 1$ . Outside of this region—which is likely to be seen in datasets which exhibit significant semantic overlap between classes—Li et al. begins to produce significantly larger certifications.

3.1 Distributionally Aware Comparisons

The above framing demonstrates that the performance of certification schemes can be considered strictly in context of the permissible output space in $C_{3}$ . Doing so allows for a direct comparison of the performance of certification schemes without relying upon the model, dataset, or any other parts of the learning infrastructure. However, Definition 3.1 also suggests a second feasible framework for assessing certification performance. Consider the output distribution of $g(\mathbf{x})$ where the samples $\mathbf{x}$ are drawn from some data distribution $\mathcal{P}$ . Assessing this output distribution in the context of Figure 2, the dataset specific drivers of certification performance can be considered. It is this form of comparison that we will refer to as a distributionally-aware analysis of certification performance.

Such a perspective is valuable as it inherently allows us to better understand the factors driving certification performance for a particular trained model. In doing so also allows for inferences to be made about the potential certification performance of new and untested datasets, based upon their semantic complexity. Such an analysis may also allow deployed systems to develop an understanding of the risk of attack for particular samples.

4 An Improved Differential Privacy Based Mechanism

Extending the dataset-independent analysis of Figure 2 to incorporate Lecuyer et al. reveals that for multinomial outputs it is uniformly outperformed across the entirety of permissible outputs. However, the very notion that differential performance is possible upon a sample-wise basis suggests that improving upon the bounds delivered by Lecuyer et al. may achieve improvements over a subset of the output space. This is especially true as recent works have shown the differential privacy mechanism that underpins Lecuyer et al. underestimates the level of privacy that can be achieved for a given level of added noise [1, 34]. Within the remainder of this section, we will demonstrate how this improved mechanism can be incorporated into a new certification regime, that both uniformly outperforms Lecuyer et al. across the $C_{3}$ simplex; and yields improvements over Li et al. and Cohen et al. for the majority of the permissible output space.

In aide of this goal, we begin by introducing some core concepts of differential privacy: differential privacy as a stability condition on output distributions and how it translates to the stability of expected outputs (Lemma 4.1); the post-processing inequality (Lemma 4.2) and how it captures the invariance of differential privacy to data-independent compositions; and the improved analysis of the Gaussian mechanism. Of these, Lemma 4.1 and Lemma 4.2 follow Lecuyer [15]; and the improved analysis of the Gaussian mechanism follows [1].

Lemma 4.1 (Expected Output Stability Bound).

Consider a randomised function $A:\mathbb{R}^{n}\to[0,1]$ that preserves $(\epsilon,\delta)$ -DP, then it must be that $\mathbb{E}[A(\mathbf{x})]\leq e^{\epsilon}\mathbb{E}[A(\mathbf{x}+\boldsymbol{\gamma})]+\delta,$ where the expectations are taken over the randomness in $A$ .

The familiar post-processing inequality of differential privacy [9] is critical for certification in that it permits privacy-preserving randomisation to be applied at early network layers.

Lemma 4.2 (Post-Processing Inequality).

Consider any randomised algorithm $A$ acting on databases, and any (possibly randomised) algorithm $B$ with domain $range(A)$ . If $A$ is $(\epsilon,\delta)$ -DP then so too is $B\circ A$ . Moreover, at the level of database pairs $R_{\epsilon,\delta}(A)\subseteq R_{\epsilon,\delta}(B\circ A)$ .

The $\epsilon$ -DP of a random mechanism $A(\mathbf{x})$ is captured by the privacy loss random variable

l_{A,\mathbf{x},\mathbf{x}^{\prime}}=\log\left(\frac{P(A(\mathbf{x})=\mathcal{O})}{P(A(\mathbf{x}^{\prime})=\mathcal{O})}\right)\enspace,

(11)

where $\mathbf{x}^{\prime}=\mathbf{x}+\boldsymbol{\gamma}$ . By then introducing $L_{A,\mathbf{x},\mathbf{x}^{\prime}}=l_{A,\mathbf{x},\mathbf{x}^{\prime}}(Y)$ [1], an equivalent condition for differential privacy is

P[L_{A,\mathbf{x},\mathbf{x}^{\prime}}\geq\epsilon]-e^{\epsilon}P[L_{A,\mathbf{x},\mathbf{x}^{\prime}}\leq-\epsilon]\leq\delta\enspace.

(12)

To elaborate upon these probabilities, consider a mechanism of the form $y\sim h(\mathbf{x})+\mathcal{N}(0,\sigma^{2}\mathbf{I})$ , where $h(x)$ is any arbitrary function. Taking such a framing allows the privacy loss random variable to be analytically expressed as

$\displaystyle l_{A,\mathbf{x},\mathbf{x}^{\prime}}$	$\displaystyle=-\frac{1}{2\sigma^{2}}\left(\left\lVert\mathbf{y}-h(\mathbf{x})\right\rVert_{2}^{2}-\left\lVert\mathbf{y}-h(\mathbf{x}^{\prime})\right\rVert_{2}^{2}\right)\enspace,$
	$\displaystyle=\frac{1}{2\sigma^{2}}\left\lVert h(\mathbf{x})-h(\mathbf{x}^{\prime})\right\rVert_{2}^{2}+\frac{1}{\sigma^{2}}\langle\mathbf{y}-h(\mathbf{x}),$	(13)
	$\displaystyle\hskip 99.58464pth(\mathbf{x})-h(\mathbf{x}^{\prime})\rangle\enspace.$

A consequence of the fact that the inner product is equivalent to $\mathcal{N}\left(0,\sigma^{2}\left\lVert h(\mathbf{x})-h(\mathbf{x}^{\prime})\right\rVert_{2}^{2}\right)$ is that

A_{\mathbf{x},\mathbf{x}^{\prime}}=\mathcal{N}\left(\frac{\left\lVert h(\mathbf{x})-h(\mathbf{x}^{\prime})\right\rVert_{2}^{2}}{2\sigma^{2}},\frac{\left\lVert h(\mathbf{x})-h(\mathbf{x}^{\prime})\right\rVert_{2}^{2}}{\sigma^{2}}\right)\enspace.

(14)

Based upon this framing, the components of Equation (12) can be constructed as

		$\displaystyle P[L_{A,\mathbf{x},\mathbf{x}^{\prime}}\geq\epsilon]=$
		$\displaystyle\qquad\Phi\left(\frac{\left\lVert h(\mathbf{x})-h(\mathbf{x}^{\prime})\right\rVert_{2}}{2\sigma}-\frac{\epsilon\sigma}{\left\lVert h(\mathbf{x})-h(\mathbf{x}^{\prime})\right\rVert_{2}}\right)\enspace,$
		$\displaystyle P[L_{A,\mathbf{x},\mathbf{x}^{\prime}}\leq-\epsilon]=$		(15)
		$\displaystyle\qquad\Phi\left(-\frac{\left\lVert h(\mathbf{x})-h(\mathbf{x}^{\prime})\right\rVert_{2}}{2\sigma}-\frac{\epsilon\sigma}{\left\lVert h(\mathbf{x})-h(\mathbf{x}^{\prime})\right\rVert_{2}}\right)\enspace,$

Extending this concept to certified robustness requires the application of the post-processing inequality of 4.2. If we consider a function $h(\cdot)$ such that $h(\mathbf{x})=\mathbf{x}$ , then Equations 12 and 4 can be combined to take the form

\Phi\left(\frac{L}{2\sigma}-\frac{\epsilon\sigma}{L}\right)-e^{\epsilon}\Phi\left(-\frac{L}{2\sigma}-\frac{\epsilon\sigma}{L}\right)\leq\delta\enspace,

(16)

where $L=\|\mathbf{x}-\mathbf{x}^{\prime}\|$ is the certified radius. If this is true for any function $y\sim h(\mathbf{x})+\mathcal{N}(0,\sigma^{2}\mathbf{I})$ , then by virtue of the post-processing inequality the equivalent privacy relationship also holds for any mechanism $f(\cdot)$ , which allows us to define our randomised mechanism as

	$\displaystyle A(\mathbf{x})$	$\displaystyle=f(h(\mathbf{x})+\mathcal{N}(0,\sigma^{2}\mathbf{I}))$
		$\displaystyle=f(\mathbf{x}+\mathcal{N}(0,\sigma^{2}\mathbf{I}))\enspace.$		(17)

This definition of $A(\mathbf{x})$ then becomes equivalent to the $g(\mathbf{x})$ of Section 2.1, if the expectations were to be taken over a single draw of noise.

In a similar fashion to Equation 5, this differential privacy based certification scheme can be framed as a maximisation problem, especially as Equation 16 does not admit an analytic inverse. However, rather than strictly considering $\epsilon\in(0,1]$ as the optimisation criteria, the above criteria can be recast as a constrained optimisation problem over $\epsilon\geq 0$ and $\delta\in[0,1]$ in order to construct a certification by way of

\displaystyle\begin{split}L&=\max_{\epsilon\geq 0,\delta\in[0,1]}L^{\prime}\\ &\textrm{s.t. }\Phi\left(\frac{\bigtriangleup L^{\prime}}{2\sigma}-\frac{\epsilon\sigma}{\bigtriangleup L^{\prime}}\right)-e^{\epsilon}\Phi\left(-\frac{\bigtriangleup L^{\prime}}{2\sigma}-\frac{\epsilon\sigma}{\bigtriangleup L^{\prime}}\right)\leq\delta\enspace,\\ &E[A_{i}(\mathbf{x})]-\max_{j\in\mathcal{K}\setminus\{i\}}E[A_{j}(\mathbf{x})]e^{2\epsilon}-(1+e^{\epsilon})\delta\geq 0\enspace.\end{split}

(18)

where $i$ corresponds to the predicted class, and $E[A_{i}(\mathbf{x})]$ and $E[A_{j}(\mathbf{x})]$ are respectively upper and lower bounded as per Section 2.1. While the form of the above equation is complex and nonlinear, the constraint functions exhibit near-monotonic behaviour in $(\epsilon,\delta,L)$ , and as such can be quickly solved with any conventional constrained numerical optimisation tools. That this approach is only conditional upon $A(\mathbf{x})\in[0,1]$ , it can be applied to both softmax and multinomial distributions, or indeed any $A(\mathbf{x})\in[0,c]$ , as the latter case is simply a uniform scaling of $[0,1]$ . The provably true nature of these guarantees is a direct consequence of Lecuyer et al. [15], as our approach tightens their bounds.

Beyond its ability to incorporate the improved privacy mechanism, framing the certification process as an optimisation process presents an additional advantage over the differentially privacy certifications of Lecuyer et al.: we remove the need to arbitrarily set the $(\epsilon,\delta)$ -privacy level prior to certification. This is meaningful as most of the contexts for which certification is useful do not care for a specific, fixed privacy level across all samples, rather they value producing the largest achievable certification, in order to more accurately gauge adversarial risks.

In the context of the dataset-agnostic comparisons across the $C_{3}$ simplex, Figure 3 reveals that for a softmax output distribution, our approach uniformly outperforms Lecuyer et al. across the entire simplex, exhibiting a maximal relative $5$ -fold improvement in the calculated certification.

For a multinomial output distribution, our new technique yields improved certifications against both Cohen et al. and Li et al. over the majority of the output space. When incorporating our technique into the comparison, across the $C_{3}$ simplex Cohen et al. produces strong certifications only in the neighbourhood of $E_{1}=1-E_{0}$ ; while Li et al. [16] produces the strongest certification near $(E_{0},E_{1})\to(1,0)$ . However, as $E_{0}$ and $E_{1}$ both decrease—as seen in semantically complex samples—our comparisons demonstrate that it is possible to inrease the certification more than two-fold.

4.1 Cost-Free Improvements To Certifications

Our next key observation builds upon Section 3: if each base certification mechanism is superior to all other mechanisms under consideration even on only one point in the output simplex, then taking the maximum across an ensemble of mechanisms’ radii provably dominates the performance of any single mechanisms.

Corollary 4.3 (Ensembling Certifications).

Consider the set of certification mechanisms $L_{1},\ldots,L_{n}$ , each of which incorporate a mapping from the $L_{i}:C_{3}\to\mathbb{R}^{\geq 0}$ . Each of these yields a certification $k_{i}(g,\cdot)=(L_{i}\cdot g)(\cdot)$ , where $k_{i}:\mathcal{R}^{d}\to\mathbb{R}^{\geq 0}$ . If $L^{\prime}(\mathbf{s})=\max_{i\in{1,\ldots,n}}L_{i}(\mathbf{s})$ then $k(g,\mathbf{x})\geq k_{i}(g,\mathbf{x})$ for all $g\in\mathcal{G},\mathbf{x}\in\mathbb{R}^{d}$ . Moreover, if each region of superiority each region of superiority $H_{i}=\{\mathbf{s}\in C:k_{i}(\mathbf{s})>k_{j}(\mathbf{s}),\forall j\neq i\}$ is non-empty, then $k$ dominates each base mechanism $g_{1},\ldots,g_{n}$ .

Figure 1 diagrammatically represents this ensembling mechanism, while the differential performance of Figure 3 demonstrates both the nature of the elements of $H_{i}$ and the functional differences that empower the ensembling process. It must be stressed is that if certifications in terms of a softmax output distribution are sought then only our technique and that of Lecuyer et al. can be certified. However, if we certify in terms of the multinomial distribution, all of the randomised smoothing based techniques can be applied, although in practice the Lecuyer approach is uniformly outperformed by all other multinomial approaches.

It is important to emphasise that this ensemble certification process is almost cost-free, as the dominant computation in certification is evaluating $f(\mathbf{x})$ by Monte-Carlo estimation, and as such the incremental cost of ensembling by Corollary 4.3 is minimal, as all the techniques build upon the same expectations. The evaluation of subsequent $h_{i}(\cdot)$ involves a handful of arithmetic calculations or simple numerical library calls (for Normal quantiles), which is trivial by comparison to the cost of completely restarting the certification process from scratch.

Reusing the expectations across the ensembling process also obviates the need to adjust the confidence intervals, even though multiple experiments are being performed. This stems from the fact that we are calculating expectation ranges with high probability, and the worst-case variant of these is being applied to the certification mechanisms. That these certification mechanisms are deterministic interpretations of said expectations eliminates any considerations regarding potential multiple hypothesis testing.

The concept of ensembling can be further extended by incorporating the convex relaxation style certifications as described in Section 2. However, while the randomised smoothing mechanisms share the majority of their computational burden between the techniques, no such opportunities to optimise the aggregate performance exist for convex relaxation methods. Due to this consideration, and the broader limitations of the convex relaxation based techniques, within this work we restrict our focus to those approaches that leverage randomised smoothing.

While contemporaneous work [30] has considered the robustness of ensembling neural network models, our work considers ensembles of certification mechanisms acting on a single common model. An ensemble of models requires multiple independent (and often costly) training loops, before requiring independent evaluations for each constituent model. In contrast, an ensemble of certification mechanisms allows for the majority of the computational burden of certification to be recycled between techniques, with the only additional burden the computational cost associated with solving the analytic certification equations.

5 Implementation

Algorithm 1 Implementation of all tested algorithms under the Multinomial framework. Here

O-M

denotes Our Multinomial approach.

1:function Multinomial-Certify(

n,\mathcal{K},\sigma,\alpha,\mathbf{x},f

)

\triangleright

n

samples for class selection and certification;

\mathcal{K}

set of output classes,

\sigma

s.d of noise;

\alpha

percentage confidence;

\mathbf{x},f

sample and function

E=\frac{\text{Count}(n,\mathcal{K},\sigma,\mathbf{x},f)}{n}

j=\operatorname*{arg\,max}E

E=\text{Sort}(E)

\triangleright

Sort in descending order

E_{0}=E[0]-\text{Confint}(E[0]\times n,n,\alpha)

\triangleright

Largest element of

E

. Confint using Beta-method

E_{1}=E[1]+\text{Confint}(E[1]\times n,n,\alpha)

7: if

E_{0}>0.5

then

L_{\text{C}}=

Equation 9 else 0

8: if

E_{0}>E_{1}

then

L_{\text{Li}}=

Equation 8

10:

L_{\text{O-M}}=

Equation 18

11: else

12:

L_{\text{Li}}=0,\qquad L_{\text{O-M}}=0

13: end if

14:

L_{\text{Ensemble}}=\max\left(L_{\text{C}},L_{\text{Li}},L_{\text{O-M}}\right)

15: return

j,\left(L_{\text{Ensemble}},L_{\text{C}},L_{\text{Li}},L_{\text{O-M}}\right)

16:end function

17:

18:function Count(

n,\mathcal{K},\sigma,\mathbf{x},f

)

19:

c_{j}=0\text{ }\forall j\in\mathcal{K}

20: for all

i\in n

21:

c_{j}=c_{j}+1

\operatorname*{arg\,max}f(\mathbf{x}+\mathcal{N}(\boldsymbol{0},\sigma^{2}\mathbf{I}))=j

22: end for

23: return

c_{j}

24:end function

Algorithm 2 Softmax Certification framework. Here

\Pi(\cdot)

represents the softmax operator and

O-S

denotes Our-Softmax implementation.

1:function Softmax-Certify(

n,\sigma,\alpha,\mathbf{x},f

)

\triangleright

n

samples for class selection and certification;

\mathcal{S}

set of output classes,

\sigma

s.d of noise;

\alpha

percentage confidence;

\mathbf{x},f

sample and function

E=\frac{1}{n}\sum_{i=1}^{n}\Pi(f(\mathbf{x}+\mathcal{N}(\boldsymbol{0},\sigma^{2}I))

j=\operatorname*{arg\,max}E

E=\text{Sort}(E)

E_{0}=E[0]-\text{Hoeffding}(E[0]\times n,n,\alpha)

E_{1}=E[1]+\text{Hoeffding}(E[1]\times n,n,\alpha)

7: if

E_{0}>E_{1}

then

L_{\text{Lec}}=

Equation 5

L_{\text{O-S}}=

Equation 18

10: else

11:

L_{\text{Lec}}=0,\qquad L_{\text{O-S}}=0

12: end if

13: return

j,\left(L_{\text{Lec}},L_{\text{O-S}}\right)

14:end function

To demonstrate how the aforementioned processes can be implemented, Algorithms 1 and 2 cover certifications across all of the techniques for all of the multinomial and softmax approaches. For a given test sample $x$ , the functions Multinomial-Certify and Softmax-Certify return the predicted class and certifications, with both tasks performed through randomised smoothing. Within this algorithm, the function Confint refers to the Beta function approach with Bonferroni correction, as is described within Section 2.1.

Algorithm 3 Cohen Original

1:function Binomial-Certify(

n_{0},n,\mathcal{K},\sigma,\alpha,\mathbf{x},f

)

\triangleright

n_{0}

and

n

(

n_{0}\ll n

) are samples for class selection and certification;

\mathcal{S}

set of output classes,

\sigma

s.d of noise;

\alpha

percentage confidence;

\mathbf{x},f

sample and function

j=\operatorname*{arg\,max}\text{Count}(n_{0},\mathcal{K},\sigma,\mathbf{x},f)

E_{0}=\frac{\text{Count}(n,j,\sigma,\mathbf{x},f)}{n}

\triangleright

Returns a single value

E_{0}=E_{0}-\text{Confint}(E_{0}\times n,n,\alpha)

5: if

E_{0}>0.5

then

L_{\text{C}}=\sigma\Phi^{-1}(E_{0})

7: else

L_{C}=0

9: end ifreturn

j,L_{C}

10:end function

Every effort was made to accurately recreate the implementations of Li et al. and Lecuyer et al., in order to perform a fair comparison. However, as was alluded to in Section 2, the approach of Cohen et al. has the potential to incorrectly classify samples, and in doing so fail to certify. This stems from Algorithm 3’s sampling approach, in which classification is based on a small number of samples—as few as $n_{0}=100$ in Cohen et al., before a second sampling draw over $n$ samples estimates the expectation based solely upon the count of times the classifier’s class is selected.

While selecting a large enough $n$ produces tight bounds on the uncertainties for the expectation, the uncertainties for the $100$ samples are often large enough to not be able to accurately classify the output solution. Moreover, the very nature of the binomial sampling of Cohen—in which the expectation of the classifier’s output is compared to the likelihood of the intersection of all other classes—means that if the incorrect class is chosen, the mistake cannot be rectified without completely re-sampling. As such, we instead selected the expectation for Cohen based not upon the binomial sampling process, but the same multinomial sampling process employed by both us and Li et al..

TABLE I: ImageNet performance for

n=10,000

across a range of

\sigma

. Metrics cover the Top

-1

accuracy (T1), median certification (M), the proportion of samples for which a technique produces the largest certification (L), and the proportion of samples for which

r>0.05

, as an additional measure of significance. The strongest measure for each without-ensembling has been bolded, as well as all metrics in the Ensemble columns as by its nature it must outperform any individual technique.

$\sigma$		Cohen			Li			Ours			Ensemble
	T $1$	M.	L.	$>$	M.	L.	$>$	M.	L.	$>$	M.	$>$
$\mathbf{1/4}$	$0.64$	$\mathbf{0.77}$	$\mathbf{0.89}$	$0.94$	$0.61$	$0.01$	$\mathbf{0.96}$	$0.35$	$0.11$	$\mathbf{0.96}$	$\mathbf{0.77}$	$\mathbf{0.96}$
$\mathbf{0.5}$	$0.49$	$\mathbf{1.18}$	$\mathbf{0.80}$	$0.93$	$1.02$	$0.03$	$\mathbf{0.97}$	$0.67$	$0.18$	$\mathbf{0.97}$	$\mathbf{1.18}$	$\mathbf{0.97}$
$\mathbf{1}$	$0.43$	$\mathbf{1.57}$	$\mathbf{0.66}$	$0.89$	$1.40$	$0.06$	$0.97$	$1.20$	$0.28$	$\mathbf{0.98}$	$\mathbf{1.59}$	$\mathbf{0.98}$
$\mathbf{2}$	$0.27$	$1.49$	$0.37$	$0.76$	$1.69$	$0.17$	$\mathbf{0.97}$	$\mathbf{1.75}$	$\mathbf{0.49}$	$\mathbf{0.97}$	$\mathbf{1.76}$	$\mathbf{0.97}$
$\mathbf{3}$	$0.17$	$0.82$	$0.23$	$0.62$	$1.68$	$0.22$	$\mathbf{0.94}$	$\mathbf{1.81}$	$\mathbf{0.65}$	$\mathbf{0.94}$	$\mathbf{1.81}$	$\mathbf{0.94}$
$\mathbf{4}$	$0.12$	$0.06$	$0.17$	$0.5$	$1.90$	$0.22$	$\mathbf{0.93}$	$\mathbf{2.06}$	$\mathbf{0.72}$	$\mathbf{0.93}$	$\mathbf{2.06}$	$\mathbf{0.93}$
$\mathbf{5}$	$0.09$	$0.00$	$0.17$	$0.42$	$1.87$	$0.17$	$\mathbf{0.91}$	$\mathbf{2.07}$	$\mathbf{0.79}$	$\mathbf{0.91}$	$\mathbf{2.07}$	$\mathbf{0.91}$

6 Results

To validate the applicability of our improved Gaussian mechanism and ensembling approach, experiments were performed for $\ell_{2}$ -norm-bounded certifications utilising both CIFAR- $10$ [14], and the latest face-blurred variant of ImageNet [29], which respectively exist under a MIT and a BSD $3-$ Clause license. While this blurring has been shown to introduce a slight degradation of predictive performance, the privacy-preserving protections that are introduced are important for the integrity of vision research. Our results highlight ImageNet, due to presence of human indistinguishable adversarial examples within trained models [26].

To support this, training was performed upon NVIDIA P $100$ GPUs using in PyTorch [20] and a cross-entropy loss, with a fixed random seed employed to ensure reproducibility. Certification employed $n=100,000$ samples for estimating expectations. Confidence intervals were set at $\alpha=0.001$ for a $0.1\%$ chance of any produced certification over-estimating the radius of certification. These parameters mirror those of Cohen et al. [5]. In the case of CIFAR- $10$ , two GPUs and a $110$ -layer residual network were trained for $90$ epochs under added noise $\sigma\in\{0.12,0.25,0.5,1.0\}$ , with certification then being applied using the appropriate matched $\sigma$ . This training process used stochastic gradient descent subject to an initial learning rate of $0.1$ , momentum of $0.9$ and weight decay of $10^{-4}$ , and batch size set at $400$ . The learning rate was refined in a step-wise fashion where $L_{r}=0.1\times 0.1^{\lfloor\frac{e}{30}\rfloor}$ , where $e$ is the current epoch.

For ImageNet, training was performed using a mixed precision [18] ResNet- $50$ model using a ten-node system, where each node also had access to two GPUs. In order to understand the influence of noise on the more-complex model, training and certification were performed at levels $\sigma\in\{0.25,0.5,1,2,3,4,5\}$ . Due to the increased complexity of ResNet- $50$ , training was modified to match best practices under available system resources. Both GPU cores on every node were trained with a batch size of $200$ , and in the fashion of [12] the learning rate is set at $L_{r}=0.1563$ , to be equivalent to a learning rate of $0.1$ for a single node training with a batch size of $256$ . To improve convergence, the first three epochs were performed under $L_{r}=0.01563$ , before reverting to the original rate. This was then scaled by $0.1$ at the $30$ , $60$ and $80$ -th epochs.

6.1 Certified Accuracy

To assess the level of certified robustness provided, we adopt the now standard concept of certified accuracy. This records the proportion of samples correctly predicted by randomised smoothing with a certified $r\geq R$ , and in doing so captures both the accuracy of the model under noise, and the level of certification that can be provided to samples. Reflecting our analytic analysis of the softmax techniques, Figure 4 demonstrates a uniform improvement over Lecuyer et al., with both a larger certified radius as $r\to 0$ , and a slower rate of decay in the certified accuracy as the radius increases. These performance increases reflect the analytic improvements demonstrated within Figure 3.

While prior works have indicated that Cohen et al. [5] uniformly outperformed all other techniques under a multinomial distribution by considering aggregate statistics, our experiments clearly demonstrate that both our approach and Li et al. are able to certify a greater proportion of samples. This is a product of the implicit restriction in Cohen et al. that $E_{0}\geq 0.5$ . Relative to Li et al., our technique yields improvements for samples in which $r<1$ , however beyond this point the distribution decays due to the tightness of the analytic bound as $E_{0}\to 1$ , relative to the other tested approaches.

The ensemble approach clearly improves upon both the number of samples certified, and the radii at which these samples are certified across all $\sigma$ , as per Table I and Figures 4 and 8. While the relative performance is maintained across both datasets, for CIFAR- $10$ there is an across the board increase in the overall certified accuracy, due to the decreased prediction difficulty in the $10$ -class CIFAR relative to the $1000$ -class ImageNet. Such differences in relative performance align with the multinomial comparison of Figure 3 and underscore the importance of our proposed ensemble certification approach, as it leverages the regions of the simplex of output score in which each technique produces the largest certification radii.

Figure 7 reinforces this, by demonstrating the proportion of samples for which the ensemble produces more than a given percentage-improvement, relative to Cohen et al.. We reiterate the observation of Algorithm 1 abstaining from certifying a greater number of samples than the alternate techniques, and thus the ensemble produces an infinite percentage improvement relative to Cohen et al. for these samples. More broadly, the level of out-performance of the Ensemble relative to Cohen et al. is confirmed by Table II, which demonstrates the relative performance of the Ensemble against the prior state of the art of Cohen et al.. We note that as the Wilcoxon test produces highly significant statistics as it is comparing the prior state of the art in Cohen et al. to the ensemble, which also incorporates Cohen et al.. However, broadly Table II demonstrate that as $\sigma$ increases the Ensemble is able to reliably improve upon a significant proportion of samples, with a notable effect on the mean certification.

TABLE II: Metrics of performance, comparing the ensemble to the state of the art of Cohen et al.. Here the columns p and Statistic refer to the results of the Wilcoxon signed rank test (with the Pratt signed-rank zero procedure) [21]; Proportion is the percentage of samples which yield an improved certification due to the ensembling process, and the median and mean (exluding where

L_{\text{Cohen}}=0

) columns cover the improvements. (in both absolute and percentage-values).

Dataset	$\sigma$	p	Statistic	Proportion	Absolute	$\%$	Absolute	$\%$
					Median		Mean
Cifar	$0.12$	$<10^{-5}$	$>10^{4}$	$14.4\%$	$0$	-	$0.0020$	$14.9\%$
	$0.25$	$<10^{-5}$	$>10^{4}$	$26.1\%$	$0$	-	$0.0079$	$52.2\%$
	$0.5$	$<10^{-5}$	$>10^{4}$	$47.4\%$	$0$	-	$0.035$	$46.1\%$
	$1.0$	$<10^{-5}$	$>10^{4}$	$70.2\%$	$0.079$	$32.2\%$	$0.12$	$307.1\%$
Imagenet	$0.25$	$<10^{-5}$	$>10^{4}$	$22.9\%$	$0$	-	$0.0083$	$37.1\%$
	$0.5$	$<10^{-5}$	$>10^{4}$	$36.5\%$	$0$	-	$0.031$	$181.3\%$
	$1.0$	$<10^{-5}$	$>10^{4}$	$60.2\%$	$0.053$	$16.6\%$	$0.12$	$99.2\%$

6.2 Computational Costs

The nature of randomised smoothing—in that it requires repeated sampling—inherently introduces a significant computational cost, even if this process can be trivially parallelised. Intuitively it would appear that the computational cost of any ensembling approach would scale multiplicatively with the number of techniques being employed within the ensemble. However, as the expectations can be reused across all the ensembled techniques, this dominant component of the computational cost only has to be performed once. In practice, on NVIDIA P $100$ GPU the time to calculate the radius of certification (after the expectations have been estimated) is less than $0.1$ seconds for per technique, which is equivalent to less than the cost of $100$ performing samples under noise, as result that is borne out by Figure 6b.

When considering computational cost, it is important to emphasise that mechanistic improvements in how we perform certifications are useful not just for the increases in certified radius, but also to potentially decrease the computational cost. Due to the inherent link between the sample size and uncertainty levels, and these uncertainty levels with the certified radius (as is seen in Figure 6a), improved certifications can be considered as an offset to the number of samples required, leading to commensurate decreases in the overall computational time.

6.3 Simplex Coverage

As was established within Figure 3 and Section 2, any experimental comparison of these approaches is an implicit function of the model, dataset, training procedure, and hyperparameters employed in training. This dependency is visible in Figures 5 and 9, in which transitioning from CIFAR- $10$ to ImageNet induces a distributional shift in the distribution of label-space expectations towards the region of the simplex of output scores that favours our certification approach, with the average $E_{0}$ and $E_{1}$ both decreasing. That this shift occurs reflects the greater semantic similarities between classes within ImageNet, which results in output expectations that are more evenly spread across classes.

This sensitivity to the input parameters is reinforced by Table I, which demonstrates the inherent coupling between predictive difficulty (as indicated by the Top-1 accuracy) and a translation of the output distribution towards a region that is favourable to our differential privacy based approach. These changes in the output distribution in turn induce a shift in performance, from the initial strong performance of Cohen et al. (with exception to the proportion of samples certified with radii greater than $0.05$ ), to metrics that uniformly favour our new approach as $\sigma>1$ . These results also demonstrate that our approach can certify $9\%$ more samples above $0.05$ at $\sigma=1$ , and $49\%$ more as $\sigma$ is increased to $5$ . Increasing $\sigma$ above $1$ also leads to our technique exhibiting a monotonic improvement in the median certified radii, which stands in stark contrast to the approach of Cohen et al., which exhibits a consistent decrease in the median certification radius to $0$ at $\sigma=5$ . This behaviour in Cohen et al. is a product of the smoothing effect of additive noise, which results in fewer and fewer samples having a highest class expectation above $0.5$ , preventing certification.

Both Figure 5 and Table I demonstrate that our analytic comparison and ensemble frameworks take advantage of simplex regions of differential performance to generate ensemble certifications that produce consistent, best-in-class results. Moreover, we can be confident that the outperformance of this technique will be maintained irrespective of the complexity of the underlying dataset.

This work presents both ensembling and disaggregated analysis in the context of an $\ell_{2}$ bounded threat model, due to the relative maturity of attacks in this space. However, our approaches are equally applicable to analysing certification performance against any threat model, and future works expanding the oeuvre of certified threat models should exploit the techniques described within this work in order to better understand and maximise certification performance.

6.4 Alternative Training Approaches

The same independence of our approaches to a specific threat model also holds true when considering alternate certification frameworks— including MACER [31], denoising [4], or Geometrically Informed Certified Robustness [7]—as they each construct certifications in an identical manner. While these techniques shift the distributions seen within Figure 9, our analysis still holds. While Figure 10 demonstrates that MACER shifts the overall point distribution towards the region where Cohen et al. is favoured, a significant proportion of samples are still located within the region where our new technique yields improved certifications.

7 Conclusions

By considering certification performance through a disaggregated lens, this work demonstrates that it is possible to better understand the drivers of certification performance, from both a dataset and mechanistic perspective. This form of analysis demonstrated the utility of our multiple improvements to to the current oeuvre of certification techniques, including an improved differential privacy based Gaussian mechanism that for some samples can produce a more than two-fold increase in the achievable multinomial certification, or up to five-fold in the case of softmax certifications. These improvements are particularly evident in the case where the largest class expectation is diminished.

Beyond this, our work also demonstrates that a simple ensemble-of-certifications can reuse the costly components of certifications, in order to improve upon performance relative to any one single technique. This technique, which introduces almost no additional computational burden, is able to certify $98\%$ of samples above $r=0.05$ at $\sigma=1$ for ImageNet, relative to the $89\%$ achieved by the prior state-of-the-art in Cohen et al. [5]. Our technique’s advantage over other certification mechanisms grows with both the semantic complexity of the dataset and $\sigma$ .

Through this works mechanisms, we have demonstrated how minor changes to certification systems can be used to construct larger certifications, which in turn would allow for a greater degree of confidence in the adversarial resistance of systems deployed in contexts where they may be implemented. Moreover, our approach of assessing certification performance within the context of the simplex of output scores has the potential to allow for a more nuanced view of adversarial risk. Operationalising this perspective, in the context of our improvements to the achievable radii of certification, has the potential to reduce reduce the need for domain experts to manually verify inputs that have the potential to be adversarially influenced, or to guide a greater understanding of adversarial risk in deployed systems.

Acknowledgements

This research was undertaken using the LIEF HPC-GPGPU Facility hosted at the University of Melbourne. This Facility was established with the assistance of LIEF Grant LE170100200. This work was also supported in part by the Australian Department of Defence Next Generation Technologies Fund, as part of the CSIRO/Data61 CRP AMLC project. Sarah Erfani is in part supported by Australian Research Council (ARC) Discovery Early Career Researcher Award (DECRA) DE220100680.

Resource Availability

The full suite of code required to replicate the experiments contained within this work can be found at https://github.com/andrew-cullen/ensemble-simplex-certifications

Ethics Statement

The techniques and processes described within this paper have the potential to decrease the vulnerability of deployed machine learning systems to adversarial example. However, in doing so there is also the potential to also counter beneficial applications of attacks, such as stylometric privacy. However, we believe that the value in minimising risks to deployed systems significantly outweighs these concerns.

References

[1] Borja Balle and Yu-Xiang Wang. Improving the Gaussian Mechanism for Differential Privacy: Analytical Calibration and Optimal Denoising. In International Conference on Machine Learning, pages 394–403. PMLR, 2018.
[2] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion Attacks Against Machine Learning at Test Time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, ECMLPKDD, pages 387–402. Springer, 2013.
[3] Ewan Cameron. On the Estimation of Confidence Intervals for Binomial Population Proportions in Astronomy: The Simplicity and Superiority of the Bayesian Approach. Publications of the Astronomical Society of Australia, 28(2):128–139, 2011.
[4] Nicholas Carlini, Florian Tramer, J Zico Kolter, et al. (certified!!) adversarial robustness for free! arXiv preprint arXiv:2206.10550, 2022.
[5] Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified Adversarial Robustness via Randomized Smoothing. In International Conference on Machine Learning, ICML, pages 1310–1320. PMLR, 2019.
[6] Andrew C. Cullen, Paul Montague, Shijie Liu, Sarah Monazam Erfani, and Benjamin I.P. Rubinstein. The Certification Paradox: Certifications Admit Better Attacks. arXiv preprint arXiv:2302.04379, 2022.
[7] Andrew C. Cullen, Paul Montague, Shijie Liu, Sarah Monazam Erfani, and Benjamin I.P. Rubinstein. Double Bubble, Toil and Trouble: Enhancing Certified Robustness through Transitivity. In Advances in Neural Information Processing Systems, volume 35, pages 19099–19112. NeurIPS, 2022.
[8] Olive Jean Dunn. Multiple Comparisons Among Means. Journal of the American Statistical Association, 56(293):52–64, 1961.
[9] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating Noise to Sensitivity in Private Data Analysis. In Theory of Cryptography Conference, TCC, pages 265–284. Springer, 2006.
[10] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and Harnessing Adversarial Examples. In International Conference on Learning Representations, ICLR, 2015.
[11] Leo A Goodman. On Simultaneous Confidence Intervals for Multinomial Proportions. Technometrics, 7(2):247–254, 1965.
[12] Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, Large Minibatch SGD: Training Imagenet in 1 Hour. arXiv preprint arXiv:1706.02677, 2017.
[13] Wassily Hoeffding. Probability Inequalities for Sums of Bounded Random Variables. In The Collected Works of Wassily Hoeffding, pages 409–426. Springer, 1994.
[14] Alex Krizhevsky, Geoffrey Hinton, et al. Learning Multiple Layers of Features from Tiny Images. Technical report, University of Toronto, 2009.
[15] Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, and Suman Jana. Certified Robustness to Adversarial Examples with Differential Privacy. In 2019 IEEE Symposium on Security and Privacy (S & P), pages 656–672. IEEE, 2019.
[16] Bai Li, Changyou Chen, Wenlin Wang, and Lawrence Carin. Certified Adversarial Robustness with Additive Noise. In Advances in Neural Information Processing Systems, volume 32, pages 9459–9469. NeurIPS, 2019.
[17] Shijie Liu, Andrew C Cullen, Paul Montague, Sarah M Erfani, and Benjamin IP Rubinstein. Enhancing the antidote: Improved pointwise certifications against poisoning attacks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 8861–8869, 2023.
[18] Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, and Hao Wu. Mixed Precision Training. In International Conference on Learning Representations, ICLR, 2018.
[19] Matthew Mirman, Timon Gehr, and Martin Vechev. Differentiable Abstract Interpretation for Provably Robust Neural Networks. In International Conference on Machine Learning, ICML, pages 3578–3586. PMLR, 2018.
[20] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An Imperative Style, High-Performance Deep Learning Library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32, pages 8024–8035. NeurIPS, 2019.
[21] John W Pratt. Remarks on zeros and ties in the wilcoxon signed rank procedures. Journal of the American Statistical Association, 54(287):655–667, 1959.
[22] Hadi Salman, Jerry Li, Ilya Razenshteyn, Pengchuan Zhang, Huan Zhang, Sebastien Bubeck, and Greg Yang. Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers. In Advances in Neural Information Processing Systems, volume 32, pages 11292–11303. NeurIPS, 2019.
[23] Hadi Salman, Greg Yang, Huan Zhang, Cho-Jui Hsieh, and Pengchuan Zhang. A Convex Relaxation Barrier to Tight Robustness Verification of Neural Networks. In Advances in Neural Information Processing Systems, volume 32, pages 9835–9846. NeurIPS, 2019.
[24] Gagandeep Singh, Timon Gehr, Markus Püschel, and Martin Vechev. An Abstract Domain for Certifying Neural Networks. Proceedings of the ACM on Programming Languages, 3(POPL):1–30, 2019.
[25] Cristina P Sison and Joseph Glaz. Simultaneous Confidence Intervals and Sample Size Determination for Multinomial Proportions. Journal of the American Statistical Association, 90(429):366–369, 1995.
[26] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing Properties of Neural Networks. In International Conference on Learning Representations, ICLR, 2014.
[27] Lily Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Luca Daniel, Duane Boning, and Inderjit Dhillon. Towards Fast Computation of Certified Robustness for ReLU Networks. In International Conference on Machine Learning, pages 5276–5285. PMLR, 2018.
[28] Kaidi Xu, Zhouxing Shi, Huan Zhang, Yihan Wang, Kai-Wei Chang, Minlie Huang, Bhavya Kailkhura, Xue Lin, and Cho-Jui Hsieh. Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond. In Advances in Neural Information Processing Systems, volume 33, pages 1129–1141. NeurIPS, 2020.
[29] Kaiyu Yang, Jacqueline Yau, Li Fei-Fei, Jia Deng, and Olga Russakovsky. A Study of Face Obfuscation in Imagenet. arXiv preprint arXiv:2103.06191, 2021.
[30] Zhuolin Yang, Linyi Li, Xiaojun Xu, Bhavya Kailkhura, Tao Xie, and Bo Li. On the Certified Robustness for Ensemble Models and Beyond. arXiv preprint arXiv:2107.10873, 2021.
[31] Runtian Zhai, Chen Dan, Di He, Huan Zhang, Boqing Gong, Pradeep Ravikumar, Cho-Jui Hsieh, and Liwei Wang. MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius. In International Conference on Learning Representations, ICLR, 2020.
[32] Huan Zhang, Hongge Chen, Chaowei Xiao, Sven Gowal, Robert Stanforth, Bo Li, Duane Boning, and Cho-Jui Hsieh. Towards Stable and Efficient Training of Verifiably Robust Neural Networks. In International Conference on Learning Representations, 2020.
[33] Huan Zhang, Tsui-Wei Weng, Pin-Yu Chen, Cho-Jui Hsieh, and Luca Daniel. Efficient Neural Network Robustness Certification with General Activation Functions. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31, pages 4939–4948. NeurIPS, 2018.
[34] Jun Zhao, Teng Wang, Tao Bai, Kwok-Yan Lam, Zhiying Xu, Shuyu Shi, Xuebin Ren, Xinyu Yang, Yang Liu, and Han Yu. Reviewing and Improving the Gaussian Mechanism for Differential Privacy. arXiv preprint arXiv:1911.12060, 2019.

It’s Simplex! Disaggregating Measures to Improve Certified Robustness ††thanks: Identify applicable funding agency here. If none, delete this.

Abstract

Index Terms:

1 Introduction

2 Preliminaries

2.1 Randomised Smoothing

2.2 Certification Mechanisms

3 Comparing Certification Performance

Definition 3.1.

3.1 Distributionally Aware Comparisons

4 An Improved Differential Privacy Based Mechanism

Lemma 4.1 (Expected Output Stability Bound).

Lemma 4.2 (Post-Processing Inequality).

4.1 Cost-Free Improvements To Certifications

Corollary 4.3 (Ensembling Certifications).

5 Implementation

6 Results

6.1 Certified Accuracy

6.2 Computational Costs

6.3 Simplex Coverage

6.4 Alternative Training Approaches

7 Conclusions

Acknowledgements

Resource Availability

Ethics Statement

References

It’s Simplex! Disaggregating Measures to Improve Certified Robustness ^†^†thanks: Identify applicable funding agency here. If none, delete this.