The Laplace Mechanism has optimal utility for differential privacy over continuous queries
Abstract
Differential Privacy protects individuals' data when statistical queries are published from aggregated databases: applying ``obfuscating'' mechanisms to the query results makes the released information less specific but, unavoidably, also decreases its utility. Yet it has been shown that for discrete data (e.g. counting queries), a mandated degree of privacy and a reasonable interpretation of loss of utility, the Geometric obfuscating mechanism is optimal: it loses as little utility as possible [Ghosh et al.[1]].
For continuous query results however (e.g. real numbers) the optimality result does not hold. Our contribution here is to show that optimality is regained by using the Laplace mechanism for the obfuscation.
The technical apparatus involved includes the earlier discrete result [Ghosh op. cit.], recent work on abstract channels and their geometric representation as hyper-distributions [Alvim et al.[2]], and the dual interpretations of distance between distributions provided by the Kantorovich-Rubinstein Theorem.
Index Terms:
Differential privacy, utility, Laplace mechanism, optimal mechanisms, quantitative information flow, abstract channels, hyper-distributions.
I Introduction
I-A The existing optimality result, and our extension
Differential Privacy (DP) concerns databases from which (database-) queries produce statistics: a database of information about people can be queried e.g. to reveal their average height, or how many of them are men. But a risk is that from a general statistic, specific information might be revealed about individuals' data: whether a specific person is a man, or his height, or even both. Differentially-private ``obfuscating'' mechanisms diminish that risk by perturbing their inputs (the raw query results) to produce outputs (the reported query results) that are slightly wrong in a probabilistically unpredictable way. That diminishes the personal privacy risk (good) but also diminishes the statistics' utility (bad).
The existing optimality result is that for a mandated differential privacy parameter, some ε, and under conditions we will explain, the Geometric obfuscating mechanism $G_\epsilon$ (depending on ε) loses the least utility of any ε-Differentially Private oblivious obfuscating mechanism for the same ε, that loss being caused by the observer's having to use the perturbed statistic instead of the real one [1].
A conspicuous feature of ε-DP (that is ε-differential privacy) is that it is achieved without having to know the nature of the individual's privacy that it is protecting: it is simply made ``ε-difficult'' to determine whether any of his data is in the database at all. Similarly the minimisation of an observer's loss (of utility) is achieved by the optimal obfuscation without knowing precisely how the obfuscation affects her: instead, the existence of a ``loss function'' is postulated that monetises her loss (think ``dollars'') based on the raw query (which she does not know) and the obfuscated query (which she does know) — and optimality of $G_\epsilon$ holds wrt. all loss functions (within certain realistic constraints) and all (other) ε-DP mechanisms.
In summary: The existing result states that the ε-DP Geometric obfuscating mechanism minimises loss of utility to an observer when the query results are discrete, e.g. counting queries in some {0,…,N}, and certain reasonable constraints apply to the monetisation of loss. But the result does not hold when the query results are continuous, e.g. in the unit interval [0,1]. We show that optimality is regained by using the ε-DP Laplace mechanism $L_\epsilon$.
II Differential privacy, loss of utility, and optimality
II-A Differential privacy
Differential privacy begins with a database that is a multiset of rows drawn from some set R [3]; thus the type of a database is $\mathbb B R$ (using ``$\mathbb B$'' for ``bag''). A query is a function from a database to a query-result in some set X, the input of the mechanism, and is thus of type $\mathbb B R \to X$.
A distance function between databases measures how different two databases are from each other. Often used is the Hamming distance H (also known as the symmetric distance), which gives (as an integer) how many whole rows would have to be removed or inserted to transform one database into another: given two databases we define their H-distance to be the size of their (multiset-) symmetric difference. Thus in particular two databases that differ only because a row has been removed from one of them have Hamming distance 1, and we say that such databases are adjacent.
We define also a distance function (metric) between –for the moment– discrete distributions over a set Y of observations, the output of the mechanism. Given two distributions $\delta_{1,2}$ on Y, their distance $d_{\mathcal D}$ (for ``Dwork'') is based on the largest ratio, over all subsets Z of Y, between the probabilities assigned to Z by $\delta_1$ and by $\delta_2$ — it is

$$d_{\mathcal D}(\delta_1,\delta_2) \;=\; \sup_{Z \subseteq Y}\,\Bigl|\,\ln \frac{\delta_1(Z)}{\delta_2(Z)}\,\Bigr| \tag{1}$$

where $\delta(Z)$ is the probability δ assigns to the whole subset Z of Y, and the logarithm is introduced to make the distance satisfy the triangle inequality that metrics require. (This distance is also known as ``max divergence''.)
Following the presentation of Chatzikokolakis et al. [4], once we have chosen a metric d on databases, we say that a mechanism M achieves ε-Differential Privacy wrt. that d and some query f, i.e. is (ε/d)-DP for f, just when

$$d_{\mathcal D}\bigl(M(f(D_1)),\,M(f(D_2))\bigr) \;\le\; \epsilon\,\mathrm d(D_1,D_2) \quad\text{for all databases } D_1,D_2. \tag{2}$$
In the special case when d is the Hamming distance H, the above definition becomes

$$d_{\mathcal D}\bigl(M(f(D_1)),\,M(f(D_2))\bigr) \;\le\; \epsilon \quad\text{for all adjacent databases } D_1,D_2. \tag{3}$$
With the above metric-based point of view we can say that an ε-DP mechanism is (simply) an ε-Lipschitz function from databases with metric d to distributions of observations with metric $d_{\mathcal D}$ [4].
Definition 1
((ε/d)-DP for mechanisms) A Lipschitz mechanism M from X (raw query outputs) to $\mathbb D Y$ (observations) is (ε/d)-Differentially Private just when

$$d_{\mathcal D}\bigl(M(x_1),\,M(x_2)\bigr) \;\le\; \epsilon\,\mathrm d(x_1,x_2) \quad\text{for all } x_1,x_2 \text{ in } X, \tag{4}$$

in which we elide ε and d when they are clear from context. In (2) we gave the special case where M's inputs were raw query-results $f(D_{1,2})$, i.e. with two databases acted on by the same query-function f. And (3) was further specialised to where the two databases were adjacent and the metric was H, the Hamming distance.
II-B ``Counting'' queries
Counting queries on databases are the special case where the codomain of the query (the mechanism input) is the non-negative integers and the query returns the number of database rows satisfying some criterion, like ``being a man''. The ``average height'' query is not a counting query.
When the database metric is the Hamming distance H, a counting query can be characterised more generally as one that is a 1-Lipschitz function wrt. H and the usual metric (absolute difference) on the integers, i.e. one whose result changes by at most 1 between adjacent databases. Since composition of Lipschitz functions (merely) multiplies their Lipschitz factors, the composition of a counting query and an obfuscating mechanism is ε-DP as a whole if the mechanism on its own (i.e. without the query, acting ``obliviously'' on X) is ε-Lipschitz. That is why for counting queries we can concentrate on the mechanisms alone (whose type is $X \to \mathbb D Y$) rather than including the databases and their type $\mathbb B R$ in our analysis.
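In symbols, the composition argument is the short chain below — a sketch using only the definitions above, with f a 1-Lipschitz counting query and M an ε-Lipschitz mechanism as in Def. 1:

$$d_{\mathcal D}\bigl(M(f(D_1)),\,M(f(D_2))\bigr) \;\le\; \epsilon\,\bigl|f(D_1)-f(D_2)\bigr| \;\le\; \epsilon\,H(D_1,D_2),$$

which is exactly the ε-DP condition (2) for the composition.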
II-C Prior knowledge, open-source and the observer
Although the database contents are not (generally) known, often the distribution of its query results is known: this is ``prior knowledge'', where e.g. it is known that a database of heights in the Netherlands is likely to contain higher values than a similar database in other countries — and that knowledge is different from the (unknown) fact we are trying to protect, i.e. whether a particular person's height is in that database.
We abstract from prior knowledge of the database by concentrating instead on the prior knowledge π of the distribution of raw queries, the inputs to the mechanism, that is induced as the push-forward of the query-function (an ``open source'' aggregating function) over the known distribution of possible databases themselves. Knowing π on the input in X, the observer can use her knowledge of the mechanism (also open source) to deduce a distribution on the output observations in Y that will result from applying it and –further– she can also deduce a posterior distribution on X based on any particular y in Y that she observes.
II-D The Geometric mechanism is ε-DP for H
II-D1 Specialising to the Hamming distance
Recall from §II-B that H, the Hamming distance, is what is typically used for counting queries. In that case we see as follows from (2) that the Geometric mechanism can be made ε-DP.
The Geometric distribution centred on 0 with parameter α assigns (discrete) probability

$$\frac{1-\alpha}{1+\alpha}\;\alpha^{|z|} \tag{5}$$

to any integer z (positive or negative) [1]. It implements an ε-DP Geometric mechanism $G_\epsilon$ by obfuscating the query according to (5) above: thus set $\alpha = e^{-\epsilon}$ and define

$$G_\epsilon(x)(y) \;=\; \frac{1-\alpha}{1+\alpha}\;\alpha^{|x-y|} \tag{6}$$

to be the probability that when integer x is input, integer y is output. Thus applied to some x, the effect of $G_\epsilon$ (with $\alpha = e^{-\epsilon}$, so that $\epsilon = \ln(1/\alpha)$) is to leave x as it is with probability $\frac{1-\alpha}{1+\alpha}$ and to split the remaining probability equally between adding 1's or subtracting them: it continues (in the same direction) with repeated probability α until, with probability 1−α, it stops.
As explained in §II-C, we now concentrate on $G_\epsilon$ alone and how it perturbs its input (a query result), i.e. no longer considering the database from which the query came. (Note that although the (raw) query is output from the database, it is input to the obfuscating mechanism. That is why we refer to x as ``input''.)
II-D2 The geometric mechanism truncated
In (6) the mechanism can effect arbitrarily large perturbations. But in practice its output is constrained (in the discrete case) to a finite set {0,…,N} by (re-)assigning all probabilities for negative observations to observation 0, and all probabilities for observations greater than N to N. For example with α = 1/2 and restricting to {0,1,2} we have $G_\epsilon(0)(0) = 2/3$ and $G_\epsilon(0)(1) = 1/6$ and $G_\epsilon(0)(2) = 1/6$. It can be shown [5] however that truncation makes no difference to our results, and so from here on we will assume that truncation has been applied to $G_\epsilon$.
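To make the truncation concrete, here is a minimal computational sketch (ours, not from [1]; the helper name truncated_geometric is our own): it builds the truncated Geometric channel on inputs and outputs {0,…,N} for a given α, then checks that each row is a distribution and that entries in the same column, for adjacent inputs, differ by a factor of at most 1/α = e^ε.

```python
# A minimal sketch (not from [1]): the truncated Geometric channel on
# {0..N}, with the two tails' mass swept onto the endpoints 0 and N.
from fractions import Fraction

def truncated_geometric(N, alpha):
    C = []
    for x in range(N + 1):
        row = []
        for y in range(N + 1):
            if y == 0:                              # all mass at or below 0
                p = alpha ** x / (1 + alpha)
            elif y == N:                            # all mass at or above N
                p = alpha ** (N - x) / (1 + alpha)
            else:                                   # interior outputs keep the geometric mass
                p = (1 - alpha) * alpha ** abs(x - y) / (1 + alpha)
            row.append(p)
        C.append(row)
    return C

alpha = Fraction(1, 2)
C = truncated_geometric(2, alpha)
print(C[0])                                         # Fractions 2/3, 1/6, 1/6, as in the text
assert all(sum(row) == 1 for row in C)              # rows are distributions
for y in range(3):                                  # adjacent inputs: column ratio within 1/alpha
    for x in range(2):
        assert C[x][y] * alpha <= C[x + 1][y] <= C[x][y] / alpha
```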
II-E Discrete optimality
It has been shown [1] that when X is discrete (and hence the prior π on it is also), and when the obfuscation is via $G_\epsilon$, and when the observer applies a ``loss function'' ℓ of her choice to monetise the loss of utility to her if the raw query was x but she assumes it was x′, then any other ε-DP mechanism acting on X can only lose more utility (on average) according to that π and ℓ than $G_\epsilon$ does. That is, the ε-DP Geometric mechanism is optimal for minimising loss (maximising utility) over all priors and all (``legal'') loss functions under a mandated ε-DP obfuscation. A loss function is said to be legal if it is monotone (increasing) wrt. the difference between the guess (x′) and the actual value (x) of the query. As explained in [1] this means that the loss takes the form of a function $\ell(|x'{-}x|,\,x)$, which must be monotone (increasing) in its first argument.
II-F The Geometric mechanism is never ε-DP on dense continuous inputs, e.g. when d on X is Euclidean
If the input metric for $G_\epsilon$ is not the Hamming distance H, e.g. when $G_\epsilon$'s input is continuous, still $G_\epsilon$'s output remains discrete, taking some number of steps, each of fixed length say s, in either direction. That is, any input x is perturbed to $x + ks$ for some integer k.
Now because X is continuous and dense, we can vary the input x itself, by a tiny amount δ ≠ 0, to some $x' = x + \delta$, no matter how small δ might be, producing perturbations $x' + ks$ each of which is distant that same (constant) δ from the original $x + ks$ and, precisely because δ ≠ 0, those new perturbations cannot overlap the ones based on the original x.
Thus the two distributions produced by $G_\epsilon$ acting on x and on x′ have supports that do not intersect at all. And therefore the $d_{\mathcal D}$-distance between the two distributions is infinite, meaning that $G_\epsilon$ cannot be ε-DP for any (finite) ε. That is, for a database producing truly real query results, a standard (discrete) $G_\epsilon$ cannot establish ε-DP for any ε, however large it might be.
There are two possible solutions. The first solution, both obvious and practical, is to ``discretise'' the input and to scale ε appropriately: a person's height, say in metres, would instead become a whole number of centimetres. A second solution however is motivated by taking a more theoretical approach. Rather than discretise the type of the query results, we leave it continuous — and seek our optimal mechanism among those that –unlike the Geometric– do not take only discrete steps. It will turn out to be based on the Laplace distribution.
II-G Our result — continuous optimality
In the discrete case typically the set of raw queries is {0,…,N} for some N, and the prior knowledge is a (discrete) distribution on that. For our continuous setting we will use for raw queries the unit interval [0,1], and the discrete distribution will become a proper measure on [0,1] expressed as a probability density function. The ε-DP obfuscating mechanisms, now K for ``kontinuous'', will take a raw query from a continuous set rather than a discrete one. And the metric on [0,1] will be Euclidean.
Our (continuous) optimality result, formalised at Thm. 5, is that the ε-DP (truncated) Laplace mechanism minimises loss over all continuous priors π on [0,1] and all legal loss functions, under a mandated ε-DP obfuscation with respect to the Euclidean metric on the continuous input [0,1].
III An outline of the proof
We access the existing discrete results in §I-A and §II-E from within the continuous [0,1] by ``pixelating'' it, that is defining $X_N = \{0, \tfrac1N, \tfrac2N, \dots, 1\}$ for integer N, and mapping {0,…,N} isomorphically onto that discrete subset. We then establish near optimality for a similarly pixelated Laplace mechanism, showing that ``near'' becomes ``equal to'' when N tends to infinity. In more detail:
(a) (We show in §VI-B that) Any (discrete) prior on $X_N$ corresponds to some prior on the original {0,…,N}, but can also be obtained by pixelating some continuous prior π on all of [0,1], concentrating its (now discrete) probabilities onto elements of $X_N$ only: e.g. the probability of the entire (1/N)-sized interval $[\,i/N,\,(i{+}1)/N\,)$ is moved onto the point i/N. We write it $\pi_N$.
(b) (§VI-C) Any function acting on all of [0,1] can be made into an N-step function by first restricting its inputs to $X_N$ and then filling in the ``missing'' values for x in $[0,1]\setminus X_N$ by copying the value for the nearest $X_N$-point below x. If M is an ε-DP mechanism, we write its N-stepped version as $M^N$, and note that it remains ε-DP when restricted to the points in $X_N$ only.
If ℓ is a loss function (on guesses and on X) we write $\ell^N$ for its stepped version.
(c) (§VII) The discrete optimality result of [1] then applies within the pixelation: among ε-DP mechanisms on $X_N$, a suitably rescaled Geometric mechanism $G^N_\epsilon$ loses the least utility.
(d) (§VII-B; Thm. 13) The replacement of $G^N_\epsilon$ by the Laplace (both compared as N-step functions on [0,1]) is via pixelating the output (continuous) distribution of the Laplace to a multiple of 1/M. The Kantorovich-Rubinstein Theorem, provided additionally that the loss function is κ-Lipschitz for some κ independent of N, shows that the (additive) difference between the Geometric's loss and the Laplace's loss, for any $\pi_N$ and $\ell^N$ and M a multiple of N, tends to zero as N increases.
(e) (§VI-D) Then we remove the subscript N on the prior, and the stepping on the mechanisms, relying now on the ε-DP of the two mechanisms to make the (multiplicative) ratio between the losses they cause tend to 1.
(f) (§VIII) The final step, removing the subscript N altogether, is that the loss-calculating procedure is continuous and that the N-indexed mechanisms tend to the truncated Laplace $TL_\epsilon$ as N tends to infinity.
IV Channels; loss functions; hyper-distributions; refinement
In this section we provide a summary of the more general Quantitative Information Flow techniques that we will need for the subsequent development.
IV-A Channels, priors, marginals, posteriors
The standard treatment of information flow is via Shannon's (unreliable) channels: they take an input from say X and deliver an output that for a perfect channel would be that same input again, but for an imperfect channel might be some other y in Y instead. For example, an imperfect channel transmitting bits might ``flip'' input bits so that with probability p say an input 0 becomes an output 1 and vice versa [6]. In the discrete case, and generalising to allow outputs of possibly a different type Y, such channels are matrices whose (x-row, y-column) element is the probability that input x will produce output y. A perfect channel would be the identity matrix on X; a completely broken channel would for any x have $C(x,y) = 1/\#Y$.
The x-th row of a (channel) matrix C is $C(x,{-})$; and the y-th column is $C({-},y)$. Since each row sums to 1 (making C a stochastic matrix), the row $C(x,{-})$ determines a discrete distribution in $\mathbb D Y$; for the ``broken'' channel above it would be the uniform distribution, which we write u.
As a matrix, a channel has type $X{\times}Y \to [0,1]$ (but with 1-summing rows); isomorphically it also has type $X \to \mathbb D Y$. We'll write C for both, provided context makes it clear which one we are using.
If a prior distribution π on X is known, then the channel C can be applied to π to create a joint distribution in $\mathbb D(X{\times}Y)$ on both input and output together, written $\pi\triangleright C$ and where $(\pi\triangleright C)(x,y) = \pi(x)\cdot C(x,y)$. For that joint distribution, the left marginal $\sum_y (\pi\triangleright C)(x,y)$ gives the prior again (no matter what C might be), i.e. the probability that the input was x. The right marginal $\sum_x (\pi\triangleright C)(x,y)$ is the probability that the output is y, given both π and C.
The y-posterior distribution on X, given π,C and a particular y, is the conditional distribution on X if that y was output: it is the y-th column of the joint distribution divided by the marginal probability of that y (provided the marginal is not zero).
If we fix π and C, and use the conventional abbreviation J for the resulting joint distribution $\pi\triangleright C$, then the usual notations for the above are $J_X$ for the left marginal and $J_X(x)$ for its value at a particular x, with $J_X = \pi$, and similarly $J_Y$ for the right marginal. Then $J_{X|y}(x)$ is the posterior probability of the original input's being x when y has been observed. Further, we can write just p(x) and p(y) and p(x|y) when context makes the (missing) subscripts clear.
IV-B Loss functions; remapping
Our obfuscating mechanisms $G_\epsilon$ and (later) the Laplace are channels like C in the discrete case — the result x of the query is the channel's input, and the (perturbed) value y the observer sees is the channel's output. The loss functions will quantify the loss to her of seeing (only) y, and then choosing x′, when what she really wants to know is x. Such ε-DP mechanisms have earlier been modelled this way, i.e. as channels, by Alvim et al. [7] and Chatzikokolakis et al. [4], who observed that for ε-DP the ratios of their entries must satisfy the ε-DP constraints, because the definition at §II-A(4) reduces to comparing (multiplicatively) adjacent entries in channel columns.
The connection between the observation y and the loss-function parameter x′ is that the observer does not necessarily have to ``take what she sees'' — there might be good reasons for her making a different choice. For example, in a word-guessing game where the last, obfuscated letter ? in a word SA? is shown on the board, the observer might have to guess what it really is. Even if it looks like a blurry Y (value 4 in Scrabble), she might instead guess X (value 8) because that would earn more points on average if from prior knowledge she knows that X is strictly more than half as likely as Y — i.e. it's worth her taking the risk. Thus rather than mandating that the observer must accept what she thinks the letter is most likely to be, she uses the obfuscated query to deduce information about the whole posterior distribution of the actual query… and that might suggest that she guess some x′ ≠ y, because the expected loss of doing that is less than (the expected utility is greater than) it would be if she simply accepted the y she saw. That rational strategy is called ``remapping'' [1]. Thus she sees y, but remapping tells her that x′ is what she should choose as her least-loss-inducing guess for x. That is, the simplest strategy is ``take what you see''; but it might not be the best one. In general (and now using M again for the mechanism), we write $\mathcal L_\ell(\pi, M)$ for the expected loss to a rational observer, given the π and M she knows and the loss function ℓ she has chosen: it is

$$\mathcal L_\ell(\pi, M) \;=\; \sum_{y}\, p(y)\;\min_{x'}\,\sum_{x}\, p(x|y)\,\ell(x',x), \tag{7}$$

that is the expected value, over all possible observations y and their marginal probabilities, of the least loss she could rationally achieve over all her possible choices x′, given the knowledge that y will have provided about the posterior distribution of the actual raw input x. Note that π and M determine (from §IV-A) the p(y) and p(x|y) that appear in (7). We remark that this formulation for measuring expected loss corresponds precisely to the formulation used by Ghosh et al. in the optimality theorem [1].
IV-C The relevance of hyper-distributions, abstract channels
It is important to remember that the expected-loss formula (7) does not use the actual mechanism-output values y in any way directly: instead it takes only an expectation over what they might be. All that matters is their marginal probabilities and the posterior distributions that they induce. That allows us to abstract from Y altogether.
A hyper-distribution expresses that abstraction: it is a distribution of distributions on X alone, that is of type $\mathbb D(\mathbb D X)$; we abbreviate those as ``hyper'' and ``$\mathbb D^2 X$''. Given a joint distribution J, we write [J] for the hyper-distribution whose support is J's posterior distributions on X (in the hyper-distribution literature these are called ``inners'' [8]) and which assigns the corresponding marginal probability to each. (Zero-valued marginals are left out.) We now re-express (7) in those terms.
If we write $\ell_{x'}$ for the function on X that ℓ determines once x′ is fixed, and write $\mathbb E_{\delta}[f]$ for the expected value of random variable f with distribution δ, then $\min_{x'} \mathbb E_{p(\cdot|y)}[\ell_{x'}]$ is the inner part of (7). Then fix some ℓ and define for general distribution δ that

$$H_\ell(\delta) \;=\; \min_{x'}\, \mathbb E_{\delta}[\ell_{x'}] \tag{8}$$

(using H for ``entropy'') so that $H_\ell$ is itself a real-valued function on distributions (as e.g. Shannon entropy is). With that preparation, the expression (7) becomes the expected value of $H_\ell$ over the hyper produced by abstracting from the joint distribution as above. That is, (7) gives equivalently

$$\mathcal L_\ell(\pi, M) \;=\; \mathbb E_{[\pi\triangleright M]}\bigl[H_\ell\bigr], \tag{9}$$

in which the π and M now explicitly appear and where –we recall– the brackets [−] convert the joint distribution to a hyper. (If $H_\ell$ were in fact Shannon entropy, then (9) would be the conditional Shannon entropy. But the $H_\ell$'s are much more general than Shannon entropy alone [2, 9].)
Finally, using hypers we define an abstract channel to be a function from prior to hyper, i.e. of type $\mathbb D X \to \mathbb D^2 X$, realised from some concrete channel C as $[\,{-}\triangleright C\,]$. It is ``abstract'' because the type Y no longer appears: it is unnecessary because if M(π) is the application of M as a function to prior π, then from (9) the worst rational expected loss is written simply $\mathbb E_{M(\pi)}[H_\ell]$.
(Recall from §IV-B that this naturally takes into account the ``rational observers'' and the remapping they might perform, as described in [1].)
IV-C1 Example of a channel representation of a mechanism
If we have a discrete input X = {x₁,…,x_n} and discrete output Y = {y₁,…,y_m}, we can represent an obfuscating mechanism as a channel matrix C in the style described above.
As described in §IV-A, the row for input x corresponds to the probability distribution of outputs in Y for that x. For example the top-left number is the probability that output y₁ is observed when the input is x₁. We can interpret this as an ε-DP mechanism once we know the metric d on X. In particular §II-A(1) simplifies to comparing ratios of entries in the same column. Thus from §II-A(4), now applied to C, we can say that if C is ε-DP then it satisfies $C(x_i, y) \le e^{\epsilon\,\mathrm d(x_i, x_j)}\, C(x_j, y)$ for all inputs $x_i, x_j$ and every output y.
IV-C2 Example of a loss function calculation
Now suppose that we choose the loss function known as ``Bayes Risk'', defined on X×X as above:
$\ell(x',x) = 0$ if $x' = x$, and 1 otherwise. Letting the input prior be the uniform distribution over X, we can compute the loss by selecting, for each output y, the x′ which makes the expected value of $\ell_{x'}$ over the y-posterior the least. We then take the expected value of these least values over the marginal p(y): that is exactly (7).
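The calculation just described is mechanical, and the sketch below (our illustration; the 3×3 channel is hypothetical, chosen only to have 1-summing rows) computes the rational observer's expected loss (7) for Bayes Risk under a uniform prior.

```python
# A sketch of formula (7): expected loss to a rational (remapping)
# observer. The channel below is a hypothetical example, not the paper's.
C = [[2/3, 1/6, 1/6],
     [1/3, 1/3, 1/3],
     [1/6, 1/6, 2/3]]
prior = [1/3, 1/3, 1/3]
loss = lambda guess, actual: 0.0 if guess == actual else 1.0   # Bayes Risk

def expected_loss(prior, C, loss):
    X, Y = range(len(prior)), range(len(C[0]))
    total = 0.0
    for y in Y:
        marginal = sum(prior[x] * C[x][y] for x in X)          # p(y)
        if marginal == 0:
            continue
        posterior = [prior[x] * C[x][y] / marginal for x in X] # p(x|y)
        # the observer chooses the guess minimising her posterior expected loss
        best = min(sum(posterior[x] * loss(g, x) for x in X) for g in X)
        total += marginal * best
    return total

print(expected_loss(prior, C, loss))                           # 0.444... = 4/9 here
```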
IV-D Refinement of hypers and mechanisms
The hypers on X have a partial order ``refinement'' (written ⊑) [10] that we will need in the proof of our main result. It admits several equivalent interpretations in this context. Below, we write $\Delta_{1,2}$ etc. for general hypers in $\mathbb D^2 X$.
We have that $\Delta_1 \sqsubseteq \Delta_2$, that hyper $\Delta_1$ is refined by hyper $\Delta_2$, under any of these equivalent conditions:
(a) when $\mathbb E_{\Delta_1}[H_\ell] \,\le\, \mathbb E_{\Delta_2}[H_\ell]$ for all loss functions ℓ (i.e. whether legal or not);
(b) when there is an ``earth move'' taking $\Delta_1$ to $\Delta_2$, i.e. a redistribution of the (outer) probability that $\Delta_1$ assigns to each of its posteriors among the posteriors of $\Delta_2$, such that each posterior of $\Delta_2$ is the weighted average of the $\Delta_1$-posteriors whose mass it receives [8];
(c) when, for joint-distribution matrices $J_1$ in $\mathbb D(X{\times}Y_1)$ generating $\Delta_1$, and $J_2$ in $\mathbb D(X{\times}Y_2)$ generating $\Delta_2$, there is a ``post-processing matrix'' P of type $Y_1{\times}Y_2$ such that as matrices we have $J_2 = J_1 P$ via matrix multiplication.
And we say that one mechanism M₁ is refined by another M₂ just when $[\pi\triangleright M_1] \sqsubseteq [\pi\triangleright M_2]$ for all priors π. When this occurs we also write $M_1 \sqsubseteq M_2$. From formulation (a) we will use the fact that the ⊑-infimum of a convergent sequence of hypers (indexed over a sequence of N's) is just its limit itself [13] and [15, Lem. 20, Appendix §B].
Formulation (b) is particularly useful. If we find a specific earth move from $\Delta_1$ to $\Delta_2$ that defines a refinement, we can then use the equivalent (a) to deduce that $\mathbb E_{\Delta_1}[H_\ell] \le \mathbb E_{\Delta_2}[H_\ell]$. However if we can also compute the cost of the particular earth move (the cost is determined by the amount of ``earth'' to be moved, and the distance it must be moved; see for example [14]), we can conclude in addition that the difference must be bounded above by an amount we can compute. This follows from the well-known Kantorovich-Rubinstein duality [11], which says that $\mathbb E_{\Delta_2}[H_\ell] - \mathbb E_{\Delta_1}[H_\ell]$ is no more than the minimal cost incurred by any earth move transforming $\Delta_1$ to $\Delta_2$, scaled by the ``Lipschitz constant'' of $H_\ell$ (the Lipschitz constant of a function is the amount by which the difference in outputs can vary when compared to the difference in inputs). We use these ideas in Lem. 11 and Thm. 13.
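As a numeric illustration of those ideas (our sketch, with made-up distributions on a grid in [0,1]): in one dimension the Kantorovich distance is the area between the cumulative distribution functions, and by the duality it bounds the difference in expected values of any 1-Lipschitz function.

```python
# A sketch of Kantorovich-Rubinstein duality on a grid in [0,1]:
# the 1-D Kantorovich (earth-moving) distance is the area between
# the two cdfs, and it bounds |E_d1[f] - E_d2[f]| for 1-Lipschitz f.
pts = [i / 4 for i in range(5)]                # grid 0, 1/4, ..., 1
d1  = [0.4, 0.3, 0.2, 0.1, 0.0]                # two made-up distributions
d2  = [0.1, 0.2, 0.2, 0.3, 0.2]

def kantorovich(p, q, pts):
    acc = diff = 0.0
    for i in range(len(pts) - 1):
        diff += p[i] - q[i]                    # cdf difference on this cell
        acc  += abs(diff) * (pts[i + 1] - pts[i])
    return acc

f = lambda x: abs(x - 0.5)                     # a 1-Lipschitz test function
gap = abs(sum(p * f(x) for p, x in zip(d1, pts)) -
          sum(q * f(x) for q, x in zip(d2, pts)))
assert gap <= kantorovich(d1, d2, pts) + 1e-12
print(gap, kantorovich(d1, d2, pts))           # 0.025 <= 0.325
```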
V Measures on continuous X and Y
V-A Measures via probability density functions
Continuous analogues of the priors, mechanisms and loss functions will be our principal concern here: ultimately we will use [0,1] for our measurable spaces X and Y, and will suppose for simplicity that X = Y = [0,1]. (More generality is achieved by simple scaling.)
Measures (that is, continuous priors and mechanism outputs) will be given as probability density functions, where a pdf F say determines the probability $\int_S F(x)\,\mathrm dx$ assigned to a (measurable) sample S using the standard Borel measure on [0,1], and more generally the expected value of some random variable f on [0,1] given by pdf F is $\int_0^1 f(x)\,F(x)\,\mathrm dx$.
Even though π is of type pdf, we abuse notation to write for example π[a,b] for the probability that π assigns to that interval, and π(x) for the probability π assigns to the point x alone, i.e. some p just when the actual pdf-value of π at x is the Dirac delta-function scaled by p, written $p\,\delta_x$.
V-B Continuous mechanisms over continuous priors
Our mechanisms, up to now discrete M's, will now become ``kontinuous'', renamed K as a mnemonic. Thus a continuous mechanism K given input x produces a measure K(x) on the observations Y. And given a whole continuous prior π, that same K therefore determines a joint measure over X×Y. (See [15, Appendix §A2].) By analogy with (8,9) we have
Definition 2
(Continuous version of (7)) The expected loss due to continuous prior π, continuous mechanism K and loss function ℓ is given by

$$\mathcal L_\ell(\pi, K) \;=\; \int_Y \inf_{x'} \int_X \ell(x',x)\; K(x)(y)\;\pi(x)\;\mathrm dx\,\mathrm dy. \tag{10}$$

(This is well defined whenever the x′-indexed family of functions of y given by the inner integral contains a countable subset whose infimum equals the infimum over the whole family [16]. This is clear if X is finite, and whenever x′ can be taken to range over the rationals.)
V-C The truncated Laplace mechanism
As for the Geometric mechanism, the Laplace mechanism is based on the Laplace distribution. It is defined as follows:
Definition 3
(Laplace distribution) The ε-Laplace mechanism $L_\epsilon$, with input x in [0,1] and output y in ℝ, is usually written as a pdf in y (for given x) as [17]

$$L_\epsilon(x)(y) \;=\; \frac{\epsilon}{2}\,e^{-\epsilon|y-x|}.$$
The ε/2 is a normalising factor. It is known [4] that the mechanism satisfies ε-DP over ℝ (where the underlying metric on ℝ is Euclidean). Just as for the Geometric mechanism we truncate $L_\epsilon$'s outputs so that they also lie inside [0,1]. We do so in the same manner, by remapping all outputs greater than 1 to 1, and all outputs less than 0 to 0.
Definition 4
(truncated Laplace mechanism) As earlier for $G_\epsilon$, we truncate the Laplace mechanism to make $TL_\epsilon$, for inputs restricted to [0,1] and output restricted to [0,1], in the following way (as a pdf):

$$TL_\epsilon(x)(y) \;=\; c_0\,\delta_0(y) \;+\; \frac{\epsilon}{2}\,e^{-\epsilon|y-x|} \;+\; c_1\,\delta_1(y) \quad\text{for } y \text{ in } [0,1],$$

where the constants are $c_0 = \tfrac12 e^{-\epsilon x}$ and $c_1 = \tfrac12 e^{-\epsilon(1-x)}$ respectively, and $\delta_p$ is the Dirac delta-function with weight 1 at p.
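As a consistency check on the constants (as we have reconstructed them above), the point masses are exactly the tail probabilities that truncation sweeps onto the endpoints: for x in [0,1],

$$c_0 \;=\; \int_{-\infty}^{0}\frac{\epsilon}{2}\,e^{-\epsilon(x-y)}\,\mathrm dy \;=\; \tfrac12\,e^{-\epsilon x} \qquad\text{and}\qquad c_1 \;=\; \int_{1}^{\infty}\frac{\epsilon}{2}\,e^{-\epsilon(y-x)}\,\mathrm dy \;=\; \tfrac12\,e^{-\epsilon(1-x)},$$

so that $c_0 + \int_0^1 \frac{\epsilon}{2}e^{-\epsilon|y-x|}\,\mathrm dy + c_1 = 1$, as a pdf must.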
We can now state our principal contribution. It is to show that the truncated Laplace $TL_\epsilon$ is universally optimal, in this continuous setting, in the same way that $G_\epsilon$ was optimal in the discrete setting:
Theorem 5
(truncated Laplace is optimal) Let K be any continuous ε-DP mechanism with input and output both [0,1], and let π be any continuous (prior) probability distribution over [0,1] and ℓ any Lipschitz-continuous (Lipschitz continuity is less general than continuity: it means that the difference in outputs is within a constant scaling factor of the difference between the inputs), legal loss function on [0,1].
Then $\mathcal L_\ell(\pi,\,TL_\epsilon) \;\le\; \mathcal L_\ell(\pi,\,K)$.
As we foreshadowed in the proof outline in §III, Thm. 5 relies ultimately on the earlier-proven optimality in the discrete case: we must show how we can approximate continuous ε-DP mechanisms in discrete form, each approximation satisfying the conditions under which the earlier result applies; in §VI we fill in the details. Along the way we show how the Laplace mechanism provides a smooth approximation to the Geometric with discrete inputs.
VI Approximating Continuity for [0,1]
VI-A Connecting continuous and discrete
Our principal tool for connecting the discrete and continuous settings is the evenly-spaced discrete subset $X_N = \{0, \tfrac1N, \tfrac2N, \dots, 1\}$ of the unit interval, for ever-increasing N.
The separation 1/N is the interval width.
VI-B Approximations of continuous priors
The N-approximation of prior π is $\pi_N$, of type $\mathbb D X_N$, i.e. yielding actual probabilities (not densities), and is defined

$$\pi_N(i/N) \;=\; \pi\bigl[\,i/N,\;(i{+}1)/N\,\bigr) \;\text{ for } 0 \le i < N, \qquad \pi_N(1) \;=\; \pi[1,1].$$

The discrete $\pi_N$ gathers each of the continuous (1/N)-interval's measure onto its left point, with as a special case [1,1] from [0,1] included onto the point 1 of $X_N$.
As an example take N to be 2, and π to be the uniform (continuous) distribution over [0,1], which can be represented by the constant-1 pdf. Since the interval width is 1/2, we see that $\pi_2$ assigns probability 1/2 to both 0 and 1/2, and zero to all other points in $X_2$ (here, just the point 1).
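A small computational sketch of that pixelation (ours; it assumes the prior is given by its cumulative distribution function cdf, and that the prior has no atom except possibly at 1):

```python
# A sketch of prior pixelation pi_N: gather each 1/N-interval's
# probability onto its left endpoint; pi_N(1) is pi({1}), zero for
# atomless priors.
def pixelate(cdf, N):
    pts = [i / N for i in range(N + 1)]
    probs = [cdf((i + 1) / N) - cdf(i / N) for i in range(N)] + [0.0]
    return dict(zip(pts, probs))

print(pixelate(lambda x: x, 2))   # uniform prior: {0.0: 0.5, 0.5: 0.5, 1.0: 0.0}
```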
VI-C N-step mechanisms and loss functions
In the other direction, we can lift a discrete mechanism and loss function on $X_N$ into the continuous [0,1] by replicating their values for the x's not in $X_N$ in a way that constructs N-step functions: we have
Definition 6
For x in [0,1] define $\lfloor x\rfloor_N \;=\; \lfloor Nx\rfloor / N$.
Definition 7
Given mechanism M on $X_N$, define $M^N$ so that $M^N(x) \;=\; M(\lfloor x\rfloor_N)$.
Note that we have not yet committed here to whether M produces discrete or continuous distributions on its output Y. We are concentrating only on its input (from [0,1]).
Similarly, given loss function ℓ, define $\ell^N$ so that $\ell^N(x',x) \;=\; \ell(x',\,\lfloor x\rfloor_N)$.
Say that mechanisms and loss functions over [0,1] are N-step functions just when they are constructed as above.
The important property enabled by the above definitions is the correspondence between loss functions' values in their pixelated and original versions, which will allow us to apply the earlier discrete-optimality result, based on Lem. 9 to come. That is, we have
Lemma 8
For any continuous prior π in $\mathbb D[0,1]$, mechanism M in $X_N \to \mathbb D Y$ and loss function ℓ on $X_N$ we have
$$\mathcal L_\ell(\pi_N,\,M) \;=\; \mathcal L_{\ell^N}(\pi,\,M^N).$$
That is, the loss realised via a pixelated $\pi_N$, and the (already discrete) M and ℓ, all operating on $X_N$, is the same as the loss realised via the original continuous π and the lifted (and thus N-step) mechanism $M^N$ and loss $\ell^N$, now operating over all of [0,1].
Proof:
We interpret the losses using Def. 2, focussing on the integrand of the inner integral. Note that we can split that integral up into a finite sum of integrals of the form $\int_{i/N}^{(i+1)/N}\ell^N(x',x)\,M^N(x)(y)\,\pi(x)\,\mathrm dx$. When we do that for the right-hand (continuous) formula, we see that throughout the interval $[\,i/N, (i{+}1)/N\,)$ the contribution of the mechanism and the loss is constant, i.e. its value at i/N. This means the integral becomes
$$\ell(x',\,i/N)\;M(i/N)(y)\,\int_{i/N}^{(i+1)/N}\pi(x)\,\mathrm dx,$$
which is equal to $\ell(x',i/N)\,M(i/N)(y)\,\pi_N(i/N)$. A similar argument applies to the last interval (which includes 1), compensated for by the definitions of $\pi_N$ and $\lfloor-\rfloor_N$ to take their corresponding values from the point 1 itself.
Looking now at the left-hand (discrete) formula, we see that it is exactly the finite sum of the integrals just described. ∎
VI-D Approximating continuous ε-DP mechanisms
The techniques above give good discrete approximations for continuous-input ε-DP mechanisms acting on continuous priors, simply by considering $X_N$'s for increasing N's, using §VI-C. As a convenient abuse of notation, when we start with a continuous-input mechanism K on [0,1] we write $K^N$ to mean the N-step mechanism that is made by first restricting K to the subset $X_N$ of [0,1] and then lifting that restriction ``back again'' as in Def. 7, effectively converting it into an N-step function. When we do this we find that the posterior loss wrt. N-step loss functions can be bounded above and below by using pixelated priors and N-stepped mechanisms.
Lemma 9
Let K be a continuous-input ε-DP mechanism, π in $\mathbb D[0,1]$ a continuous prior, and ℓ a (non-negative) N-step loss function. Then the following inequalities hold:
$$e^{-\epsilon/N}\;\mathcal L_\ell(\pi,\,K^N) \;\;\le\;\; \mathcal L_\ell(\pi,\,K) \;\;\le\;\; e^{\epsilon/N}\;\mathcal L_\ell(\pi,\,K^N).$$
(Notice that in the middle formula the mechanism K is not N-stepped, but in the formulae on either side it is, as in Lem. 8.)
Proof:
The proof is as for Lem. 8, but noting also that K's being ε-DP implies that for all x, y we have $e^{-\epsilon/N}\,K(\lfloor x\rfloor_N)(y) \;\le\; K(x)(y) \;\le\; e^{\epsilon/N}\,K(\lfloor x\rfloor_N)(y)$, since $|x - \lfloor x\rfloor_N| \le 1/N$. (Here we are using the ε-DP constraints applied to the pdf K(x).) ∎
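The pointwise bound used in that proof can be checked numerically for the (untruncated) Laplace density — a small sketch, with ε, N and the sample points our own arbitrary choices:

```python
# Sketch: moving the Laplace density's centre by at most 1/N changes
# the density value by a factor of at most e^{eps/N} -- the pointwise
# eps-DP (Lipschitz) bound used in the proof of Lem. 9.
import math

eps, N = 1.0, 10
lap = lambda x, y: (eps / 2) * math.exp(-eps * abs(y - x))
for x in (0.03, 0.51, 0.99):
    xN = math.floor(N * x) / N                 # the pixel point below x
    for y in (0.0, 0.25, 0.5, 0.75, 1.0):
        r = lap(x, y) / lap(xN, y)
        assert math.exp(-eps / N) <= r <= math.exp(eps / N)
print("pointwise bound holds at the sampled points")
```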
With Lem. 8 and Lem. 9 we can study optimality of K on the finite discrete inputs $X_N$. We will see that, although Geometric mechanisms are still optimal for the (effectively) discrete inputs $X_N$, the Laplace mechanism provides increasingly good approximate optimality for $X_N$ as N increases, and is in fact (truly) optimal in the limit.
VII The Laplace and Geometric mechanisms
In this section we make precise the restriction $G^N_\epsilon$ of the Geometric mechanism (§II-D1) to inputs and outputs both in $X_N$ (a subset of [0,1]): with $\alpha = e^{-\epsilon/N}$, for x, y both in $X_N$ we define

$$G^N_\epsilon(x)(y) \;=\; \begin{cases} \alpha^{Nx}/(1{+}\alpha) & \text{if } y = 0\\[0.5ex] \frac{1-\alpha}{1+\alpha}\,\alpha^{N|x-y|} & \text{if } 0 < y < 1\\[0.5ex] \alpha^{N(1-x)}/(1{+}\alpha) & \text{if } y = 1. \end{cases} \tag{11}$$

As an illustration, we take $X_2 = \{0, 1/2, 1\}$ and α = 1/2, in which the 2 comes from N = 2 and the 1/2 comes from the α of the Geometric distribution used to make the mechanism. Using the three points 0, 1/2 and 1 of the input, we compute the truncated geometric mechanism as the channel below, where the rows' labels are (invisibly) the inputs 0, 1/2, 1, and the columns are similarly labelled by the outputs (also 0, 1/2, 1 in this case). This means that if the input was 0, then the output (after truncation) will be 0 with probability 2/3, and 1/2 with probability 1/6 etc:

$$\begin{bmatrix} 2/3 & 1/6 & 1/6\\ 1/3 & 1/3 & 1/3\\ 1/6 & 1/6 & 2/3 \end{bmatrix}$$
Notice now that the ratios of adjacent probabilities in the same column satisfy the ε-DP constraint, so for example $\frac{2/3}{1/3} = 2 = e^{\epsilon/2}$ (so that here ε = 2 ln 2). Notice also that the distance between adjacent inputs in $X_2$ under the Euclidean distance is 1/2, not 1 as it would be in the conventional {0,1,2}.
Suppose now that we consider $X_4$ instead, consisting of the five points 0, 1/4, 1/2, 3/4 and 1, and we adjust the α in the underlying Geometric distribution from §II-D1(5) to $\sqrt{1/2}$. The ε-DP parameter, now wrt. adjacent inputs at distance 1/4, is the same as before — and the resulting 5×5 matrix is computed from (11) in just the same way.
As before though, the ratios of adjacent probabilities in the same column satisfy the ε-DP constraint over all of $X_4$: now we have $\frac{G^4_\epsilon(0)(y)}{G^4_\epsilon(1/4)(y)} = \sqrt 2 = e^{\epsilon/4}$.
This amplifies the explanation in (2) that the ε-DP constraints over discrete inputs must take into account the underlying metric on the input space. More generally, whenever we double N in $X_N$, the α-parameter must become $\sqrt\alpha$.
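A tiny sketch of that scaling rule, using the running example's ε = 2 ln 2 (our own choice of values to print):

```python
# Sketch: keeping eps fixed while refining X_N forces alpha = e^{-eps/N},
# so doubling N replaces alpha by its square root, as claimed above.
import math

eps = 2 * math.log(2)
for N in (2, 4, 8):
    print(N, math.exp(-eps / N))   # 2 -> 0.5, 4 -> 0.7071..., 8 -> 0.8409...
```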
At this point, we have enough to be able to appeal to the discrete optimality result, to bound below the losses for continuous mechanisms, provided that the loss is $X_N$-legal, i.e. that its legality obtains at least for the distinct points in $X_N$.
Lemma 10
For any continuous prior π in $\mathbb D[0,1]$, ε-DP mechanism K and loss function ℓ such that ℓ is $X_N$-legal, we have:
$$\mathcal L_\ell(\pi_N,\,G^N_\epsilon) \;\le\; e^{\epsilon/N}\,\mathcal L_\ell(\pi,\,K).$$
Our next task is to study the relationship between the Geometric and Laplace mechanisms. We show first that $G^N_\epsilon$ is refined (§IV-D) by the truncated Laplace mechanism when the latter is also restricted to inputs in $X_N$. Since $TL_\epsilon$ is already defined over the whole of [0,1] we continue to write its restriction to $X_N$ as $TL_\epsilon$. This will immediately show that losses under the Geometric are no more than those under the Laplace (§IV-D(a)), consistent with observations that, on discrete inputs, Laplace obfuscation does not necessarily minimise the loss. Since the output of $TL_\epsilon$ is continuous, we proceed by first approximating it using post-processing to make Laplace-based mechanisms $L^M_\epsilon$, defined below, which have discrete output, and which form an anti-refinement chain converging to $TL_\epsilon$. We are then able to show separately the refinements between $G^N_\epsilon$ and the $L^M_\epsilon$'s, using methods designed for finite mechanisms.
The M-Laplace mechanisms $L^M_\epsilon$ approximate $TL_\epsilon$ by M-pixelation of their outputs. Here x is (still) in [0,1] but y is in $X_M$:

$$L^M_\epsilon(x)(y) \;=\; TL_\epsilon(x)\bigl[\,y,\;y{+}\tfrac1M\,\bigr) \;\text{ for } y < 1, \qquad L^M_\epsilon(x)(1) \;=\; TL_\epsilon(x)[1,1]. \tag{12}$$

That is, we pixelate the output using 1/M for the Laplace (independently of the 1/N we use for $G^N_\epsilon$). This is illustrated in Fig. 1(a).
Observe that as this is a post-processing (§IV-D(c)) of the output of $TL_\epsilon$, the refinement $TL_\epsilon \sqsubseteq L^M_\epsilon$ follows.
VII-A Refinement between N-Geometric and M-Laplace mechanisms
We now demonstrate the crucial fact that $G^N_\epsilon$ is refined by $L^M_\epsilon$. We use version (b) of refinement, described in §IV-D, and establish a Wasserstein-style earth move between the hypers $[u \triangleright G^N_\epsilon]$ and $[u \triangleright L^M_\epsilon]$ (i.e. for uniform prior u).
[Fig. 1: the posteriors of $G^N_\epsilon$ (orange points) and of $L^M_\epsilon$ (blue points) depicted as points in the space of posterior distributions; the width of the central ``vertical slice'' is 1/M.]
Lemma 11
For all integers M that are multiples of N we have that $G^N_\epsilon \;\sqsubseteq\; L^M_\epsilon$.
Proof:
Take $[u\triangleright G^N_\epsilon]$ and $[u\triangleright L^M_\epsilon]$ as hypers, both with finite supports. We can depict such hypers in Euclidean space by locating their supports, each of which is a point in $\mathbb R^{\#X_N}$, where the axes of the diagram correspond to each point in $X_N$. For example if we take $\Delta_1$ to be the hyper-distribution $[u\triangleright G^2_\epsilon]$, it has three posterior distributions, which are 1-summing triples in ℝ³. They are depicted by the orange points in Fig. 1. Similarly the supports of a hyper-distribution $\Delta_2$ taken to be $[u\triangleright L^M_\epsilon]$ are represented by the blue dots. Notice that the blue dots are contained in the convex hull of the orange dots, and this observation allows us to prove that the mechanisms $G^N_\epsilon$ and $L^M_\epsilon$ are in a refinement relation.
We use the following fact [8, Lem. 12.2] about refinement ⊑.
Let $C_{1,2}$ be channels and let u be the uniform prior. If the supports of $[u\triangleright C_1]$ are linearly independent when considered as vectors in Euclidean space, and their convex hull encloses the supports of $[u\triangleright C_2]$, then $C_1 \sqsubseteq C_2$. (The lemma applies to channels because of the direct correspondence between channels and the supports of hyper-distributions formed from uniform priors.)
To apply this result, we let $C_1$ be $G^N_\epsilon$ and recall that indeed the supports of $[u\triangleright G^N_\epsilon]$ are linearly independent (see for example [5]). Moreover in general, the supports of $[u\triangleright L^M_\epsilon]$ are also contained in that convex hull. We provide details of this latter fact in [15, Appendix §B]. ∎
Finally we can show full refinement between the Laplace and the Geometric mechanism, which follows from continuity of refinement [13].
Theorem 12
$G^N_\epsilon \;\sqsubseteq\; TL_\epsilon$.
We have shown that the Laplace mechanism is a refinement of the Geometric mechanism. This means that it genuinely leaks less information than does the Geometric mechanism, and therefore affords greater privacy protection. On the other hand it means that we have lost utility with respect to the aggregated information. In the next section we turn to the comparison of the Laplace and Geometric mechanisms with respect to that loss.
VII-B The Laplace approximates the Geometric
The geometrical interpretation of the Laplace and Geometric mechanisms set out above indicates how the Laplace approximates the Geometric as $X_N$'s interval-width 1/N approaches 0. In particular the refinement relationship established in Thm. 12 describes how the posteriors of the Laplace all lie in between pairs of posteriors of the Geometric. This relationship between posteriors translates to a bound between the corresponding expected losses via the Kantorovich-Rubinstein theorem, applied to the hypers $[u\triangleright G^N_\epsilon]$ and $[u\triangleright L^M_\epsilon]$. We sketch the argument in the next theorem, and provide full details in [15, Appendix §D].
Theorem 13
Let ℓ be a κ-Lipschitz loss function, and u the uniform distribution over [0,1]. Then

$$0 \;\le\; \mathcal L_\ell(u,\,TL_\epsilon) \;-\; \mathcal L_\ell(u,\,G^N_\epsilon) \;\le\; \frac{c\,\kappa}{N}, \tag{13}$$

where c is constant for fixed ε.
Proof:
We appeal to the Kantorovich-Rubinstein theorem, which states that the ``Kantorovich distance'' between probability distributions bounds above the difference between expected values of f whenever f satisfies the κ-Lipschitz condition. In our case the relevant distributions are the hyper-distributions $[u\triangleright G^N_\epsilon]$ and $[u\triangleright L^M_\epsilon]$, and the relevant Lipschitz functions are the $H_\ell$'s for loss functions ℓ. (Some f is κ-Lipschitz if $|f(\delta_1)-f(\delta_2)| \le \kappa\,W(\delta_1,\delta_2)$ for distributions $\delta_{1,2}$, where W is the Kantorovich distance between them.)
We write $W(\Delta_1,\Delta_2)$ for the Wasserstein distance between hyper-distributions, which is determined by the minimal earth-moving cost to transform $\Delta_1$ to $\Delta_2$. For any such earth move each posterior of $\Delta_1$ is reassigned to a selection of posteriors of $\Delta_2$ in proportion to the probability mass that $\Delta_1$ assigns to it. The cost of the move is the expected value of the distances between each posterior and its reassignments (weighted by the proportions of the reassignment). Thus the cost of any specific earth move provides an upper bound for $W(\Delta_1,\Delta_2)$. (All the costs are determined by the underlying metric used to define the probability distributions; for us this is determined by the Euclidean distance on the interval [0,1].) Importantly for us, the relation of refinement determines a specific earth move [8] whose cost we can calculate.
Referring to Lem. 11 and Fig. 1, we see that the refinement between the approximation $L^M_\epsilon$ to the Laplace and $G^N_\epsilon$ reassigns the Geometric's posteriors (the orange dots) to the Laplace's posteriors (the blue dots). Crucially though, the Geometric's posteriors form a linear order according to their distance from one another, and the refinement described in Lem. 11 shows how each Laplace posterior lies in between adjacent pairs of Geometric posteriors (according to the linear ordering), provided that N divides M. Therefore any redistribution of a Geometric posterior is bounded above by the distance to one or other of its adjacent posteriors. We show in [15, Appendix §D] that the distance between adjacent pairs is bounded above by c/N, and therefore $W\bigl([u\triangleright G^N_\epsilon],\,[u\triangleright L^M_\epsilon]\bigr) \le c/N$.
Next we observe that if ℓ is a κ-Lipschitz function on X (as a function of x), then $H_\ell$ is a κ-Lipschitz function on distributions, and so by the Kantorovich-Rubinstein theorem we must have (recalling from (9)) that:

$$\mathbb E_{[u\triangleright L^M_\epsilon]}[H_\ell] \;-\; \mathbb E_{[u\triangleright G^N_\epsilon]}[H_\ell] \;\le\; \kappa\;W\bigl([u\triangleright G^N_\epsilon],\,[u\triangleright L^M_\epsilon]\bigr) \;\le\; \frac{c\,\kappa}{N}. \tag{14}$$
By Thm. 12 and post-processing we see that $G^N_\epsilon \sqsubseteq TL_\epsilon \sqsubseteq L^M_\epsilon$. Recall from §IV-D(a) that refinement means that the corresponding losses are also ordered, i.e.

$$\mathcal L_\ell(u,\,G^N_\epsilon) \;\le\; \mathcal L_\ell(u,\,TL_\epsilon) \;\le\; \mathcal L_\ell(u,\,L^M_\epsilon),$$

and so the difference $\mathcal L_\ell(u, TL_\epsilon) - \mathcal L_\ell(u, G^N_\epsilon)$ must be no more than the difference $\mathcal L_\ell(u, L^M_\epsilon) - \mathcal L_\ell(u, G^N_\epsilon)$; thus (13) follows from (14). Full details are set out in [15, Appendix §D]. ∎
More generally, (13) holds whatever the prior.
VII-C Approximating monotone loss functions
The final piece needed to complete our generalisation of Ghosh et al.'s optimality theorem is monotonicity. We describe here how to approximate continuous monotone loss functions, and expose the limitations for the Laplace mechanism.
Definition 15
The loss function ℓ is said to be monotone if there is some mapping $r: Y \to X$ such that
$$\ell(y,x) \;=\; m\bigl(|r(y)-x|,\;x\bigr), \tag{15}$$
where m is monotone (increasing) in its first argument.
Notice how r takes care of any remapping that might need to be applied when computing expected losses. Interestingly, step functions are not in general monotone on the whole of the continuous input [0,1], but fortunately they are for the restrictions to $X_N$ that we need.
Lemma 16
Let ℓ be monotone. Then the approximation $\ell^N$ restricted to $X_M$ is monotone whenever M is a multiple of N.
Proof:
If x is in $X_M$ then $\lfloor x\rfloor_N$ is also in $X_M$, since N divides M. ∎
Examples of continuous monotone loss functions include $\ell_1$ and $\ell_2$, where $\ell_1(y,x) = |y-x|$, and

$$\ell_2(y,x) \;=\; (y-x)^2. \tag{16}$$

Note that $\ell_1$ is 1-Lipschitz and $\ell_2$ is 2-Lipschitz (on [0,1]).
We note finally that as the pixellation N of $\ell^N$ increases, the approximations converge to ℓ.
VIII Universal optimality for the Laplace Mechanism
We finally have all the pieces in place to prove our main result, Thm. 5 from §V-C — the generalisation of discrete optimality [1].
Let K be any continuous ε-DP mechanism with input [0,1], and let π be a (continuous) probability distribution over [0,1] and ℓ a legal (i.e. continuous, monotone, κ-Lipschitz) loss function. Then:

$$\mathcal L_\ell(\pi,\,TL_\epsilon) \;\le\; \mathcal L_\ell(\pi,\,K). \tag{17}$$
Proof:
We use the above results to approximate the expected posterior loss by step functions; these approximations are equivalent to posterior losses over discrete mechanisms satisfying ε-DP, enabling appeal to Ghosh et al.'s universal optimality result on discrete mechanisms. We reason as follows:

$$\begin{aligned} \mathcal L_{\ell^N}(\pi,\,K) \;&\ge\; e^{-\epsilon/N}\,\mathcal L_{\ell^N}(\pi,\,K^N) &&\text{Lem. 9}\\ &=\; e^{-\epsilon/N}\,\mathcal L_{\ell}(\pi_N,\,K) &&\text{Lem. 8, with } K \text{ restricted to } X_N\\ &\ge\; e^{-\epsilon/N}\,\mathcal L_{\ell}(\pi_N,\,G^N_\epsilon) &&\text{discrete optimality [1]; Lem. 10, Lem. 16}\\ &\ge\; e^{-\epsilon/N}\,\bigl(\mathcal L_{\ell}(\pi_N,\,TL_\epsilon) - c\kappa/N\bigr) &&\text{Thm. 13, extended to general priors.} \end{aligned}$$

The result now follows by taking N to infinity, and noting that $\pi_N$, $\ell^N$ and $G^N_\epsilon$ converge to π, ℓ and $TL_\epsilon$ respectively, and that taking expected values over fixed distributions is continuous. ∎
Note that Thm. 5 only holds for mechanisms that are ε-DP. An arbitrary embedding of a discrete mechanism into [0,1] is not necessarily ε-DP, and in particular Thm. 5 does not apply to $G^N_\epsilon$ itself. Also the continuity property on ℓ is required, because $\ell^N$ must be monotone for all N. Thus arbitrary step functions do not satisfy this property, and so the Laplace mechanism is not in general universally optimal wrt. arbitrary step loss functions. Two popular loss functions however are continuous, and thus we have the following corollary.
Corollary 17
The Laplace mechanism is universally optimal for $\ell_1$ and $\ell_2$ of §VII-C.
IX Related work
The study of (universally) optimal mechanisms is one way to understand the cost of obfuscation, needed to implement privacy, but with a concomitant loss of utility of queries to databases. Pai and Roth [18] provide a detailed survey of the principles underlying the design of mechanisms, including the need to trade utility against privacy, and Dinur and Nissim [19] explore the relationship between how much noise needs to be added to database queries relative to the usefulness of the data released. Our use of loss functions to measure utility follows both that of Ghosh et al. [1] and Alvim et al. [9], and concerns optimality for entire mechanisms that satisfy a particular level of ε-DP. For example, the mean error and the mean squared error can be used to evaluate loss, as described by Ghosh et al. [1] and mentioned in §VII-C.
The Laplace mechanism as a way to implement differential privacy has been shown for example by Dwork and Roth [20]. Moreover Chatzikokolakis et al. [4] showed how it satisfies ε-DP as formulated here, using the Euclidean metric.
Whilst rare, optimality results avoid the need to design bespoke implementations of privacy mechanisms that are tailored to particular datasets. The Geometric mechanism appears to be special for discrete inputs, as Ghosh et al. [1] showed when utility is measured using their ``legal'' loss functions. On the other hand, although the Laplace mechanism continues to be a popular obfuscation mechanism, its deficiencies in terms of utility have been demonstrated by others when the inputs to the obfuscation are discrete [21], and where the optimisation is based on minimising the probability of reporting an incorrect value, subject to the ε-DP constraint. Similarly Geng et al. [22] show that adding noise according to a kind of ``pixellated'' (staircase) distribution appears to produce the best utility for arbitrary discrete datasets. Such examples are consistent with our Thm. 12, which shows that the Laplace mechanism is a refinement of the Geometric mechanism (loses more utility) when restricted to a discrete input (to the obfuscation). We mention also that optimal mechanisms have been studied by Gupte and Sundararajan [23] wrt. minimax agents, rather than average-case minimising agents.
Other works have shown the Laplace mechanism to be optimal for metric differential privacy in particular non-Bayesian scenarios. Koufogiannis et al. [24] show that the Laplace mechanism is optimal for the mean-squared error function under Lipschitz privacy, equivalent to metric differential privacy; and Wang et al. [25] show that the Laplace mechanism minimises loss measured by Shannon entropy, again for metric differential privacy. Our result on [0,1] includes those results as specific cases; however, those works do go further in that they demonstrate optimality for their respective loss functions on more general domains. We leave the study of those domains in the Bayesian setting to future work.
We also note that the linear ordering of the underlying query results seems to be important for finding optimality results. For example Brenner and Nissim [26] have demonstrated that for non-linearly-ordered inputs there are no optimal ε-DP mechanisms for a fixed level of ε. Although their result finds that only counting queries have optimal mechanisms, their context (oblivious mechanisms on database queries) does not include the possibility of continuous-valued query results with a linear order; our result does not contradict their impossibility, and can be seen rather as an extension of their result to a continuous setting.
Alvim et al. [27] also use the framework of Quantitative Information Flow to study the relationship between the privacy and the utility of ε-DP mechanisms. In their work they model utility in terms of a leakage measure, where leakage is defined as the ratio of input gain to output gain wrt. a mechanism modelled as a channel. Their gain is entirely dual to our loss here, and is a model of an adversary trying to infer as much as possible about the secret input. Other notions of optimality have also been studied in respect of showing that the Laplace mechanism is not optimal: Asi and Duchi [28] work with ``near instance'' optimality, and Geng et al. [22] show how to scale the Laplace in various ways to obtain good utility. Note also that these definitions of optimality do not use a prior, and therefore represent the special case of utility per exact input, rather than a scenario where the observer's prior knowledge is included.
X Conclusion
We have studied the relationship between differential privacy (good) and loss of utility (bad) when the input can be over an interval of the reals, rather than being discrete as in the optimality result of Ghosh et al. [1, 31]. Here we have instead used as input space the continuous interval [0,1]; but we note that the result extends straightforwardly to any finite interval of ℝ. Our result also imposes the condition that the loss functions must be κ-Lipschitz for some finite κ. We do not know whether this condition can be removed in general.
We observe that for N-step loss functions the Laplace mechanism is not optimal, and in fact a bespoke Geometric mechanism will be optimal for such loss functions. However our Thm. 13 provides a way to estimate the error, relative to the optimal loss, when using the Laplace mechanism.
Finally we note that the space of -DP mechanisms is very rich, even for discrete inputs, suggesting that the optimality result given here will be useful whenever the input domain can be linearly ordered.
Acknowledgements
We thank Catuscia Palamidessi for suggesting this problem to us.
The appendices may be found at [15].
References
- [1] A. Ghosh, T. Roughgarden, and M. Sundararajan, ``Universally utility-maximizing privacy mechanisms,'' SIAM J. Comput., vol. 41, no. 6, pp. 1673–1693, 2012.
- [2] M. S. Alvim, K. Chatzikokolakis, A. McIver, C. Morgan, C. Palamidessi, and G. Smith, ``Additive and multiplicative notions of leakage, and their capacities,'' in IEEE 27th Computer Security Foundations Symposium, CSF 2014, Vienna, Austria, 19-22 July, 2014. IEEE, 2014, pp. 308–322. [Online]. Available: http://dx.doi.org/10.1109/CSF.2014.29
- [3] C. Dwork, F. McSherry, K. Nissim, and A. D. Smith, ``Calibrating noise to sensitivity in private data analysis,'' in Theory of Cryptography, Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006, Proceedings, ser. Lecture Notes in Computer Science, S. Halevi and T. Rabin, Eds., vol. 3876. Springer, 2006, pp. 265–284. [Online]. Available: https://doi.org/10.1007/11681878_14
- [4] K. Chatzikokolakis, M. Andrés, N. Bordenabe, and C. Palamidessi, ``Broadening the scope of differential privacy using metrics,'' in International Symposium on Privacy Enhancing Technologies Symposium, ser. LNCS, vol. 7981. Springer, 2013.
- [5] K. Chatzikokolakis, N. Fernandes, and C. Palamidessi, ``Comparing systems: Max-case refinement orders and application to differential privacy,'' in Proc. CSF. IEEE Press, 2019.
- [6] C. Shannon, ``A mathematical theory of communication,'' Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, 1948.
- [7] M. S. Alvim, M. E. Andrés, K. Chatzikokolakis, and C. Palamidessi, ``On the relation between differential privacy and quantitative information flow,'' in Automata, Languages and Programming - 38th International Colloquium, ICALP 2011, Zurich, Switzerland, July 4-8, 2011, Proceedings, Part II, 2011, pp. 60–76. [Online]. Available: https://doi.org/10.1007/978-3-642-22012-8_4
- [8] M. S. Alvim, K. Chatzikokolakis, A. McIver, C. Morgan, C. Palamidessi, and G. Smith, The Science of Quantitative Information Flow, ser. Information Security and Cryptography. Springer International Publishing, 2020.
- [9] M. S. Alvim, K. Chatzikokolakis, C. Palamidessi, and G. Smith, ``Measuring information leakage using generalized gain functions,'' in Proc. 25th IEEE Computer Security Foundations Symposium (CSF 2012), Jun. 2012, pp. 265–279.
- [10] A. McIver, C. Morgan, L. Meinicke, G. Smith, and B. Espinoza, ``Abstract channels, gain functions and the information order,'' in FCS 2013 Workshop on Foundations of Computer Security, 2013.
- [11] S. Rachev and L. Ruschendorf, Mass transportation problems. Springer, 1998, vol. 1.
- [12] Y. Deng and W. Du, ``The Kantorovich Metric in computer science: A brief survey,'' Electron. Notes Theor. Comput. Sci., vol. 253, no. 3, pp. 73–82, Nov. 2009. [Online]. Available: http://dx.doi.org/10.1016/j.entcs.2009.10.006
- [13] A. McIver, L. Meinicke, and C. Morgan, ``A Kantorovich-monadic powerdomain for information hiding, with probability and nondeterminism,'' in Proc. LiCS 2012, 2012.
- [14] E. Lawler, Combinatorial optimization: Networks and Matroids. Holt, Rinehart and Winston, 1976.
- [15] N. Fernandes, A. McIver, and C. Morgan, ``The Laplace Mechanism has optimal utility for differential privacy over continuous queries,'' April 2021, full version of this paper with appendices. [Online]. Available at http://www.cse.unsw.edu.au/carrollm/LiCS2021-210420.pdf
- [16] P. Meyer-Nieberg, Banach Lattices. Springer-Verlag, 1991.
- [17] E. Wilson, ``First and second laws of error,'' JASA, vol. 18, no. 143, 1923.
- [18] M. M. Pai and A. Roth, ``Privacy and mechanism design,'' SIGecom Exch., vol. 12, no. 1, pp. 8–29, 2013. [Online]. Available: https://doi.org/10.1145/2509013.2509016
- [19] I. Dinur and K. Nissim, ``Revealing information while preserving privacy,'' in Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 9-12, 2003, San Diego, CA, USA, F. Neven, C. Beeri, and T. Milo, Eds. ACM, 2003, pp. 202–210. [Online]. Available: https://doi.org/10.1145/773153.773173
- [20] C. Dwork and A. Roth, ``The algorithmic foundations of differential privacy,'' Foundations and Trends in Theoretical Computer Science, vol. 9, no. 3-4, pp. 211–407, 2014.
- [21] J. Soria-Comas and J. Domingo-Ferrer, ``Optimal data-independent noise for differential privacy,'' Information Sciences, vol. 250, pp. 200–214, 2012.
- [22] Q. Geng, P. Kairouz, S. Oh, and P. Viswanath, ``The staircase mechanism in differential privacy,'' IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 7, 2015.
- [23] M. Gupte and M. Sundararajan, ``Universally optimal privacy mechanisms for minimax agents,'' in Proc. Symp. Principles of Database Sytems, ser. PODS '10. New York, NY, USA: Association for Computing Machinery, 2010, pp. 135–146. [Online]. Available: https://doi.org/10.1145/1807085.1807105
- [24] F. Koufogiannis, S. Han, and G. J. Pappas, ``Optimality of the laplace mechanism in differential privacy,'' arXiv preprint arXiv:1504.00065, 2015.
- [25] Y. Wang, Z. Huang, S. Mitra, and G. E. Dullerud, ``Entropy-minimizing mechanism for differential privacy of discrete-time linear feedback systems,'' in 53rd IEEE conference on decision and control. IEEE, 2014, pp. 2130–2135.
- [26] H. Brenner and K. Nissim, ``Impossibility of differentially private universally optimal mechanisms,'' in 2010 IEEE 51st Annual Symposium on Foundations of Computer Science (FOCS). Los Alamitos, CA, USA: IEEE Computer Society, Oct. 2010, pp. 71–80. [Online]. Available: https://doi.ieeecomputersociety.org/10.1109/FOCS.2010.13
- [27] M. S. Alvim, M. E. Andrés, K. Chatzikokolakis, P. Degano, and C. Palamidessi, ``On the information leakage of differentially-private mechanisms,'' Journal of Computer Security, vol. 23, no. 4, pp. 427–469, 2015. [Online]. Available: https://doi.org/10.3233/JCS-150528
- [28] H. Asi and J. C. Duchi, ``Near instance-optimality in differential privacy,'' arXiv:2005.10630v1, 2020.
- [29] Y. Wang, X. Wu, and L. Wu, ``Differential privacy preserving spectral graph analysis,'' in Advances in Knowledge Discovery and Data Mining, J. Pei, V. S. Tseng, L. Cao, H. Motoda, and G. Xu, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 329–340.
- [30] N. Phan, X. Wu, H. Hu, and D. Dou, ``Adaptive laplace mechanism: Differential privacy preservation in deep learning,'' in 2017 IEEE International Conference on Data Mining (ICDM), 2017, pp. 385–394.
- [31] A. Ghosh, T. Roughgarden, and M. Sundararajan, ``Universally utility-maximizing privacy mechanisms,'' in Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, ser. STOC '09. New York, NY, USA: Association for Computing Machinery, 2009, pp. 351–360. [Online]. Available: https://doi.org/10.1145/1536414.1536464