
The Laplace Mechanism has optimal utility
for differential privacy over continuous queries

Natasha Fernandes$^{1,3,4}$, Annabelle McIver$^{1}$, Carroll Morgan$^{2,3}$

©2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

$^{1}$Department of Computing, Macquarie University, Sydney  $^{2}$School of Computer Science and Engineering, UNSW, Sydney  $^{3}$Data61, CSIRO, Sydney  $^{4}$Inria and École Polytechnique, IPP, France
Abstract

Differential Privacy protects individuals' data when statistical queries are published from aggregated databases: applying ``obfuscating'' mechanisms to the query results makes the released information less specific but, unavoidably, also decreases its utility. Yet it has been shown that for discrete data (e.g. counting queries), a mandated degree of privacy and a reasonable interpretation of loss of utility, the Geometric obfuscating mechanism is optimal: it loses as little utility as possible [Ghosh et al.[1]].

For continuous query results however (e.g. real numbers) the optimality result does not hold. Our contribution here is to show that optimality is regained by using the Laplace mechanism for the obfuscation.

The technical apparatus involved includes the earlier discrete result [Ghosh op. cit.], recent work on abstract channels and their geometric representation as hyper-distributions [Alvim et al.[2]], and the dual interpretations of distance between distributions provided by the Kantorovich-Rubinstein Theorem.

Index Terms:
 Differential privacy, utility, Laplace mechanism, optimal mechanisms, quantitative information flow, abstract channels, hyper-distributions.

I Introduction

I-A The existing optimality result, and our extension

Differential Privacy (DP) concerns databases from which (database-) queries produce statistics: a database of information about people can be queried e.g. to reveal their average height, or how many of them are men. But a risk is that from a general statistic, specific information might be revealed about individuals' data: whether a specific person is a man, or his height, or even both. Differentially-private ``obfuscating'' mechanisms diminish that risk by perturbing their inputs (the raw query results) to produce outputs (the query reported) that are slightly wrong in a probabilistically unpredictable way. That diminishes the personal privacy risk (good) but also diminishes the statistics' utility (bad).

The existing optimality result is that for a mandated differential-privacy parameter, some $\varepsilon>0$, and under conditions we will explain, the Geometric obfuscating mechanism $G^{\varepsilon}$ (depending on $\varepsilon$) loses the least utility of any $\varepsilon$-differentially private oblivious obfuscating mechanism for the same $\varepsilon$, that loss being caused by the observer's having to use the perturbed statistic instead of the real one [1].

A conspicuous feature of $\varepsilon$-DP (that is, $\varepsilon$-differential privacy) is that it is achieved without having to know the nature of the individual's privacy that it is protecting: it is simply made ``$\varepsilon$-difficult'' to determine whether any of his data is in the database at all. Similarly the minimisation of an observer's loss (of utility) is achieved by the optimal obfuscation without knowing precisely how the obfuscation affects her: instead, the existence of a ``loss function'' is postulated that monetises her loss (think ``dollars'') based on the raw query (which she does not know) and the obfuscated query (which she does know) — and optimality of $G^{\varepsilon}$ holds wrt. all loss functions (within certain realistic constraints) and all (other) $\varepsilon$-DP mechanisms $M^{\varepsilon}$.

In summary: the existing result states that the $\varepsilon$-DP Geometric obfuscating mechanism $G^{\varepsilon}$ minimises loss of utility to an observer when the query results are discrete, e.g. counting queries in some $(0..N)$, and certain reasonable constraints apply to the monetisation of loss. But the result does not hold when the query results are continuous, e.g. in the unit interval $[0,1]$. We show that optimality is regained by using the $\varepsilon$-DP Laplace mechanism $L^{\varepsilon}$.

II Differential privacy, loss of utility
and optimality

II-A Differential privacy

Differential privacy begins with a database that is a multiset of rows drawn from some set $R$ [3]; thus the type of a database is $\mathbb{B}R$ (using ``$\mathbb{B}$'' for ``bag''). A query $q$ is a function from a database to a query result in some set $\mathcal{X}$, the input of the mechanism, and is thus of type $\mathbb{B}R \rightarrow \mathcal{X}$.

A distance function between databases $\textsf{d}\colon \mathbb{B}R \times \mathbb{B}R \rightarrow \mathbb{R}$ measures how different two databases are from each other. Often used is the Hamming distance $\textsf{d}_H$ (also known as the symmetric distance), which gives (as an integer) how many whole rows would have to be removed or inserted to transform one database into another: given two databases $b_1,b_2\colon \mathbb{B}R$ we define $\textsf{d}_H(b_1,b_2)$ to be the size $\#(b_1 \mathbin{\triangle} b_2)$ of their (multiset) symmetric difference. Thus in particular two databases that differ only because a row has been removed from one of them have Hamming distance 1, and we say that such databases are adjacent.

We define also a distance function (metric) between –for the moment– discrete distributions $\mathbb{D}\mathcal{Y}$ over a set of observations $\mathcal{Y}$, the output of the mechanism. Given two distributions $\delta_1,\delta_2$ on $\mathcal{Y}$, their distance $\textsf{d}_D(\delta_1,\delta_2)$ (for ``Dwork'') is based on the largest ratio over all $Y \subseteq \mathcal{Y}$ between the probabilities assigned to $Y$ by $\delta_1$ and $\delta_2$ — it is

$\textsf{d}_D(\delta_1,\delta_2) \;:=\; \max_{Y\subseteq\mathcal{Y}}\,\lvert\ln(\delta_1(Y)/\delta_2(Y))\rvert$ (1)

where $\delta(Y)$ is the probability $\delta$ assigns to the whole subset $Y$ of $\mathcal{Y}$, and the logarithm is introduced to make the distance satisfy the triangle inequality that metrics require. (This distance is also known as ``max divergence''.)

Following the presentation of Chatzikokolakis et al. [4], once we have chosen a metric $\textsf{d}$ on databases, we say that a mechanism $M$ achieves $\varepsilon$-differential privacy wrt. that $\textsf{d}$ and some query $q$, i.e. is $\varepsilon$-DP for $\textsf{d},q$, just when

for all databases $b_1,b_2$ in $\mathbb{B}R$ we have
$\textsf{d}_D(M(q(b_1)),M(q(b_2))) \;\leq\; \varepsilon\cdot\textsf{d}(b_1,b_2)$ . (2)

In the special case when $\textsf{d}$ is the Hamming distance $\textsf{d}_H$, the above definition becomes

for all databases $b_1,b_2$ in $\mathbb{B}R$ with $\textsf{d}_H(b_1,b_2)\leq 1$, i.e. that differ only in the presence/absence of a single row, and for all subsets $Y$ of $\mathcal{Y}$ we have
$\Pr(M(q(b_1))\in Y) \;\leq\; e^{\varepsilon}\cdot\Pr(M(q(b_2))\in Y)$ . (3)

With the above metric-based point of view we can say that an $\varepsilon$-DP mechanism is (simply) an $\varepsilon$-Lipschitz function from databases $\mathbb{B}R$ with metric $\textsf{d}$ to distributions of observations $\mathbb{D}\mathcal{Y}$ with metric $\textsf{d}_D$ [4].

Definition 1

($\textsf{d}_D$/$\textsf{d}$ $\varepsilon$-DP for mechanisms) A Lipschitz mechanism $M$ from $\mathcal{X}$ (raw query outputs) to $\mathcal{Y}$ (observations) is ($\textsf{d}_D$/$\textsf{d}$) $\varepsilon$-differentially private just when

for all inputs $x_1,x_2$ in $\mathcal{X}$ to $M$ we have
$\textsf{d}_D(M(x_1),M(x_2)) \;\leq\; \varepsilon\cdot\textsf{d}(x_1,x_2)$ , (4)

in which we elide $\textsf{d}_D$ and $\textsf{d}$ when they are clear from context. In (2) we gave the special case where $M$'s inputs $x_1,x_2$ were raw query results $q(b_1),q(b_2)$, i.e. with $b_1,b_2$ two databases acted on by the same query function $q$. And (3) was further specialised to where the two databases were adjacent and the metric was $\textsf{d}_H$, the Hamming distance.
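For finite mechanisms Definition 1 can be checked mechanically by comparing output distributions pairwise. The sketch below (Python; the helper names are ours, not the paper's) represents a mechanism as a map from inputs to discrete output distributions and tests the Lipschitz condition (4):

```python
import math

def d_dwork(row1, row2):
    """Max-divergence (1) between two full-support discrete distributions.
    For such distributions the max over subsets Y is attained at a single point."""
    return max(abs(math.log(p / q)) for p, q in zip(row1, row2))

def is_eps_dp(mechanism, metric, epsilon):
    """Check Definition 1: d_D(M(x1), M(x2)) <= eps * d(x1, x2) for all input pairs.
    'mechanism' maps each input to its output distribution (a list of probabilities)."""
    xs = list(mechanism)
    return all(
        d_dwork(mechanism[x1], mechanism[x2]) <= epsilon * metric(x1, x2) + 1e-12
        for x1 in xs for x2 in xs if x1 != x2
    )
```

For example a binary channel with rows $(3/4,1/4)$ and $(1/4,3/4)$ under the metric $|x_1-x_2|$ is $\ln 3$-DP but not $1$-DP, since its column ratios are 3.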

II-B ``Counting'' queries

Counting queries on databases are the special case where the codomain $\mathcal{X}$ of the query (the mechanism input) is the non-negative integers and the query $q$ returns the number of database rows satisfying some criterion, like ``being a man''. The ``average height'' query is not a counting query.

When the database metric is the Hamming distance $\textsf{d}_H$, a counting query can be characterised more generally as one that is a 1-Lipschitz function wrt. $\textsf{d}_H$ and the usual metric (absolute difference) on the integers, i.e. one whose result changes by at most 1 between adjacent databases. Since composition of Lipschitz functions (merely) multiplies their Lipschitz factors, the composition of a counting query and an obfuscating mechanism is $\varepsilon$-DP as a whole if the mechanism on its own (i.e. without the query, acting ``obliviously'' on $x=q(b)$) is $\varepsilon$-Lipschitz. That is why for counting queries we can concentrate on the mechanisms alone (whose type is $\mathcal{X} \rightarrow \mathbb{D}\mathcal{Y}$) rather than including the databases and their type $\mathbb{B}R$ in our analysis.

II-C Prior knowledge, open-source and the observer

Although the database contents are not (generally) known, often the distribution of its query results is known: this is ``prior knowledge'', where e.g. it is known that a database of heights in the Netherlands is likely to contain higher values than a similar database in other countries — and that knowledge is different from the (unknown) fact we are trying to protect, i.e. whether a particular person's height is in that database.

We abstract from prior knowledge of the database by concentrating instead on the prior knowledge $\pi$, a distribution on $\mathcal{X}$ of raw queries (the inputs $x$ to the mechanism), induced as the push-forward of the query function (an ``open-source'' aggregating function) over the known distribution of the possible databases themselves. Knowing $\pi$ on the input in $\mathcal{X}$, the observer can use her knowledge of the mechanism (also open source) to deduce a distribution on the output observations in $\mathcal{Y}$ that will result from applying it and –further– she can also deduce a posterior distribution on $\mathcal{X}$ based on any particular $y$ in $\mathcal{Y}$ that she observes.

II-D The Geometric mechanism is $\varepsilon$-DP for $\textsf{d}_H$

II-D1 Specialising to $\textsf{d}_H$

Recall from §II-B that $\textsf{d}_H$, the Hamming distance, is what is typically used for counting queries. In that case we see as follows from (2) that the Geometric mechanism $G$ can be made $\varepsilon$-DP.

The Geometric distribution centred on 0 with parameter $\alpha$ assigns (discrete) probability

$G_\alpha(n) \;:=\; \frac{1-\alpha}{1+\alpha}\cdot\alpha^{|n|}$ (5)

to any integer $n$ (positive or negative) [1]. It implements an $\varepsilon$-DP Geometric mechanism by obfuscating the query according to (5) above: thus set $\alpha := e^{-\varepsilon}$ and define

$G^{\varepsilon}(n)(n') \;:=\; G_\alpha(n'-n) \;=\; \frac{1-\alpha}{1+\alpha}\cdot\alpha^{|n'-n|}$ (6)

to be the probability that integer $n$ is input and $n'$ is output. Thus applied to some $n$, the effect of $G^{\varepsilon}$ with $\varepsilon := -\ln\alpha$ (the $\alpha$ in $G_\alpha$ is ${<}1$, so $\varepsilon>0$) is to leave $n$ as it is with probability $\frac{1-\alpha}{1+\alpha}$ and to split the remaining probability $\frac{2\alpha}{1+\alpha}$ equally between adding 1's or subtracting them: $G^{\varepsilon}$ continues (in the same direction) with repeated probability $\alpha$ until, with probability $1-\alpha$, it stops.
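The distribution (5) and the stay/step description just given can be sketched directly; the snippet below is a minimal illustration in our own naming, not the paper's code:

```python
import math
import random

def geometric_pmf(alpha, n):
    """Probability (5) that the two-sided Geometric centred on 0 takes integer n."""
    return (1 - alpha) / (1 + alpha) * alpha ** abs(n)

def geometric_mechanism(x, epsilon, rng=random):
    """Perturb integer x by two-sided Geometric noise with alpha = e^-epsilon:
    stay put with probability (1-alpha)/(1+alpha); otherwise pick a direction
    and keep stepping, each further step taken with probability alpha."""
    alpha = math.exp(-epsilon)
    if rng.random() < (1 - alpha) / (1 + alpha):
        return x
    step = rng.choice([-1, 1])
    n = 1
    while rng.random() < alpha:
        n += 1
    return x + step * n
```

A direction is chosen with probability $\frac{\alpha}{1+\alpha}$ each, and then $k$ steps are taken with probability $\alpha^{k-1}(1-\alpha)$, so each output $x{\pm}k$ has probability $\frac{1-\alpha}{1+\alpha}\alpha^{k}$, matching (6).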

As explained in §II-C, we now concentrate on $G^{\varepsilon}$ alone and how it perturbs its input (a query result), i.e. no longer considering the database from which the query came. (Note that although the (raw) query is output from the database, it is input to the obfuscating mechanism. That is why we refer to $\mathcal{X}$ as ``input''.)

II-D2 The geometric mechanism truncated

In (6) the mechanism $G^{\varepsilon}$ can effect arbitrarily large perturbations. But in practice its output is constrained (in the discrete case) to a finite set $(0..N)$ by (re-)assigning all probabilities for negative observations to observation 0, and all probabilities for observations ${\geq}N$ to $N$. For example with $e^{-\varepsilon}=\alpha=1/2$ and restricting to $(0..2)$ we have $G^{\varepsilon}(0)(0) = \cdots + 1/12 + 1/6 + 1/3 = 2/3$ and $G^{\varepsilon}(0)(1) = 1/6$ and $G^{\varepsilon}(0)(2) = 1/12 + 1/24 + \cdots = 1/6$. It can be shown [5] however that truncation makes no difference to our results, and so from here on we will assume that truncation has been applied to $G^{\varepsilon}$.
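The truncation just described can be checked exactly with rational arithmetic, using the closed form of the folded geometric tails; the helper below (our own naming, a sketch rather than the paper's construction) reproduces the $2/3,\,1/6,\,1/6$ example:

```python
from fractions import Fraction

def truncated_geometric_row(alpha, N, n):
    """Row n (0 <= n <= N) of the truncated Geometric channel on (0..N):
    the tail mass below 0 folds onto 0, the tail mass above N folds onto N."""
    c = (1 - alpha) / (1 + alpha)            # normalising constant of (5)
    geo = lambda k: c * alpha ** k           # c * alpha^k for k >= 0
    row = []
    for m in range(N + 1):
        if m == 0:
            row.append(geo(n) / (1 - alpha))       # sum_{k<=0} c*alpha^(n-k)
        elif m == N:
            row.append(geo(N - n) / (1 - alpha))   # sum_{k>=N} c*alpha^(k-n)
        else:
            row.append(geo(abs(m - n)))
    return row
```

Calling it with `alpha=Fraction(1, 2)`, `N=2`, `n=0` gives exactly `[2/3, 1/6, 1/6]`, and every row sums to 1.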

II-E Discrete optimality

It has been shown [1] that when $\mathcal{X}$ is discrete (and hence the prior $\pi$ on $\mathcal{X}$ is also), and when the obfuscation is via $G^{\varepsilon}$, and when the observer applies a ``loss function'' $\ell(w,x)$ of her choice to monetise in $\mathbb{R}^{\geq}$ the loss of utility to her if the raw query was $x$ but she assumes it was $w$, then any other $\varepsilon$-DP mechanism $M^{\varepsilon}$ acting on $\mathcal{X}$ can only lose more utility (on average) according to that $\pi$ and $\ell$ than $G^{\varepsilon}$ does. That is, the $\varepsilon$-DP Geometric mechanism is optimal for minimising loss (maximising utility) over all priors $\pi$ and all (``legal'') loss functions $\ell$ under a mandated $\varepsilon$-DP obfuscation. A loss function is said to be legal if it is monotone (increasing) wrt. the difference between the guess $w$ and the actual value $x$ of the query. As explained in [1] this means that the loss $\ell(w,x)$ takes the form of a function $m(|w-x|,x)$, which must be monotone (increasing) in its first argument.

II-F The Geometric mechanism is never $\varepsilon$-DP on dense continuous inputs, e.g. when $\textsf{d}$ on $\mathcal{X}$ is Euclidean

If the input metric for $G$ is not the Hamming distance $\textsf{d}_H$, e.g. when $G$'s input $\mathcal{X}$ is continuous, still $G$'s output remains discrete, taking some number of steps, each of fixed length say $\lambda>0$, in either direction. That is, any $G$ input $x$ is perturbed to $x+i\lambda$ for some integer $i$.

Now because $\mathcal{X}$ is continuous and dense, we can vary the input $x$ itself, by a tiny amount, to some $x'$ so that $\textsf{d}(x,x')<\lambda$ no matter how small $\lambda$ might be, producing perturbations $x'+i\lambda$ each of which is distant that same (constant) $\textsf{d}(x,x')$ from the original $x+i\lambda$ and, precisely because $\textsf{d}(x,x')<\lambda$, those new perturbations cannot overlap the ones based on the original $x$.

Thus the two distributions produced by $G$ acting on $x$ and on $x'$ have supports that do not intersect at all. And therefore the $\textsf{d}_D$ distance between the two distributions is infinite, meaning that $G$ cannot be $\varepsilon$-DP for any (finite) $\varepsilon$. That is, for a database producing truly real query results $\mathcal{X}$, a standard (discrete) $G$ cannot establish $\varepsilon$-DP for any $\varepsilon$, however large $\varepsilon$ might be.

There are two possible solutions. The first solution, both obvious and practical, is to ``discretise'' the input and to scale appropriately: a person's height of $1.75\,$m would become $175\,$cm instead. A second solution however is motivated by taking a more theoretical approach. Rather than discretise the type of the query results, we leave it continuous — and seek our optimal mechanism among those that –unlike the Geometric– do not take only discrete steps. It will turn out to be the Laplace distribution.
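The Laplace mechanism adds continuous noise of scale $1/\varepsilon$ (for sensitivity-1 queries such as those on the unit interval). A minimal sketch using inverse-transform sampling, with names of our own choosing:

```python
import math
import random

def laplace_mechanism(x, epsilon, sensitivity=1.0, rng=random):
    """Add Laplace(0, sensitivity/epsilon) noise to a real-valued query result.
    Inverse-transform sampling: for u ~ Uniform(-1/2, 1/2) the value
    -scale * sign(u) * ln(1 - 2|u|) is Laplace-distributed with that scale."""
    scale = sensitivity / epsilon
    u = rng.random() - 0.5
    return x - scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
```

The noise density $\frac{\varepsilon}{2}e^{-\varepsilon|y-x|}$ makes the ratio of densities at inputs $x_1,x_2$ at most $e^{\varepsilon|x_1-x_2|}$, which is exactly the continuous Lipschitz condition (4) for the Euclidean metric.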

II-G Our result — continuous optimality

In the discrete case typically the set $\mathcal{X}$ of raw queries is $(0..N)$ for some $N\geq 0$, and the prior knowledge $\pi$ is a (discrete) distribution on that. For our continuous setting we will use $\mathcal{X}=[0,1]$ for raw queries, the unit interval $\mathcal{U}$, and the discrete distribution $\pi$ will become a proper measure on $[0,1]$ expressed as a probability density function. The $\varepsilon$-DP obfuscating mechanisms, now $K^{\varepsilon}$ for ``kontinuous'', will take a raw query $x$ from a continuous set $\mathcal{X}$ rather than a discrete one. And the metric on $\mathcal{X}=\mathcal{U}$ will be Euclidean.

Our (continuous) optimality result, formalised at Thm. 5, is that the $\varepsilon$-DP Laplace mechanism $L^{\varepsilon}$ minimises loss over all continuous priors $\pi$ on $\mathcal{X}=\mathcal{U}$ and all legal loss functions $\ell$ under a mandated $\varepsilon$-DP obfuscation with respect to the Euclidean metric on the continuous input $\mathcal{X}=[0,1]$.

The theorem requires that all mechanisms satisfy (2) with $\textsf{d}$ the Euclidean distance on continuous inputs. We write $\varepsilon$-DP for such mechanisms. The argument in §II-F above shows therefore that Geometric mechanisms are no longer suitable (for optimality) because on continuous $\mathcal{X}$ they are no longer $\varepsilon$-DP.

III An outline of the proof

We access the existing discrete results of §I-A and §II-E from within the continuous $\mathcal{U}$ by ``pixelating'' it, that is defining $\mathcal{U}_N = \{0, 1/N, 2/N, \ldots, (N{-}1)/N, 1\}$ for integer $N>0$, and mapping $(0..N)$ isomorphically onto that discrete subset. We then establish near-optimality for a similarly pixelated Laplace mechanism, showing that ``near'' becomes ``equal to'' when $N$ tends to infinity. In more detail:

  (a) (We show in §VI-B that) Any (discrete) prior on $\mathcal{U}_N$ corresponds to some prior on the original $\mathcal{U}$, but can also be obtained by pixelating some continuous prior $\pi$ on all of $\mathcal{U}$, concentrating its (now discrete) probabilities onto elements of $\mathcal{U}_N$ only: e.g. the probability $\pi[n/N,(n{+}1)/N)$ of the entire $1/N$-sized interval is moved onto the point $n/N$. We write it $\pi_N$.

  (b) (§VI-C) Any function $f$ acting on all of $\mathcal{U}$ can be made into an $N$-step function by first restricting its inputs to $\mathcal{U}_N$ and then filling in the ``missing'' values $f(x)$ for $x$ in $(n/N,(n{+}1)/N)$ by copying the value of $f(n/N)$. If $f$ is an $\varepsilon$-DP mechanism $K^{\varepsilon}$, we write its $N$-stepped version as $K^{\varepsilon}_N$, and note that $K^{\varepsilon}_N$ remains $\varepsilon$-DP when restricted to the points in $\mathcal{U}_N$ only.

  If $f$ is a loss function on ($\mathcal{W}$ and) $\mathcal{X}$ we write $\ell_N$ for its stepped version.

  (c) (§VII-C; Lem. 10; Lem. 16) Now for any $N$, mechanism $K^{\varepsilon}_N$, prior $\pi_N$, and legal $N$-step loss function $\ell_N$ we can appeal to the discrete optimality result: for the pixelated prior $\pi_N$ and the $N$-step and legal $\ell_N$, the loss due to $G^{\varepsilon}_N$ is $\leq$ the loss due to $K^{\varepsilon}_N$.

  (d) (§VII-B; Thm. 13) The replacement of $G^{\varepsilon}_N$ by $L^{\varepsilon}_N$ (both $N$-step functions on $[0,1]$) is via pixelating the output (continuous) distribution of $L^{\varepsilon}_N$ to a multiple $T$ of $N$: we write that ${}^{T}L^{\varepsilon}_N$. The Kantorovich-Rubinstein Theorem, provided additionally that $\ell_N$ is $p$-Lipschitz for some $p>0$ independent of $N$, shows that the (additive) difference between the $G^{\varepsilon}_N$-loss and the ${}^{T}L^{\varepsilon}_N$-loss, for any $\pi_N$ and $\ell_N$ and $T$ a multiple of $N$, tends to zero as $N$ increases.

  (e) (§VI-D) Then we remove the subscript $N$ on the prior $\pi_N$, and on the mechanisms $K^{\varepsilon}_N$ and ${}^{T}L^{\varepsilon}_N$, relying now on the $\varepsilon$-DP of the two mechanisms to make the (multiplicative) ratio between the losses they cause tend to 1.

  (f) (§VIII) The final step, removing the subscript $N$ from $\ell_N$, relies on the fact that the loss-calculating procedure is continuous and that $\ell_N$ tends to $\ell$ as $N$ tends to infinity.
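Step (a), pixelating a continuous prior onto $\mathcal{U}_N$, can be sketched numerically; the helper below is our own illustration (midpoint Riemann sums stand in for the exact interval masses $\pi[n/N,(n{+}1)/N)$):

```python
def pixelate_prior(pdf, N, samples=1000):
    """Pixelate a density on [0,1] onto U_N = {0, 1/N, ..., 1}: the mass of
    each interval [n/N, (n+1)/N) is moved onto its left endpoint n/N."""
    masses = []
    for n in range(N):
        a = n / N
        step = 1 / (samples * N)
        # midpoint Riemann sum approximating the integral of pdf over the interval
        masses.append(sum(pdf(a + (k + 0.5) * step) for k in range(samples)) * step)
    masses.append(0.0)  # for a density the single point 1 carries no interval mass
    total = sum(masses)  # renormalise against quadrature error
    return [m / total for m in masses]
```

For the uniform density on $[0,1]$ and $N=4$ this returns (approximately) mass $1/4$ on each of $0, 1/4, 1/2, 3/4$ and mass 0 on the point 1.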

IV Channels; loss functions; hyper-distributions; refinement

In this section we provide a summary of the more general Quantitative Information Flow techniques that we will need for the subsequent development.

IV-A Channels, priors, marginals, posteriors

The standard treatment of information flow is via Shannon's (unreliable) channels: they take an input $x$ from say $\mathcal{X}$ and deliver an output that for a perfect channel will be $x$ again, but for an imperfect channel might be some other $x'$ in $\mathcal{X}$ instead. For example, an imperfect channel transmitting bits might ``flip'' input bits so that with probability say $1/4$ an input 0 becomes an output 1 and vice versa [6]. In the discrete case, and generalising to allow outputs of possibly a different type $\mathcal{Y}$, such channels are $\mathcal{X}\times\mathcal{Y}$ matrices $C$ whose row-$x$, column-$y$ element $C_{x,y}$ is the probability that input $x$ will produce output $y$. A perfect channel would be the identity matrix on $\mathcal{X}\times\mathcal{X}$; a completely broken channel on $\mathcal{X}\times\mathcal{Y}$ for any $\mathcal{Y}$ would have $C_{x,y}=1/\#\mathcal{Y}$.

The $x$-th row of a (channel) matrix $C$ is $C_{x,-}$; and the $y$-th column is $C_{-,y}$. Since each row sums to 1 (making $C$ a stochastic matrix), the row $C_{x,-}$ determines a discrete distribution in $\mathbb{D}\mathcal{Y}$; for the ``broken'' channel it would be the uniform distribution, which we write $\odot$.

As a matrix, a channel has type $\mathcal{X}\times\mathcal{Y} \rightarrow \mathbb{R}$ (but with 1-summing rows); isomorphically it also has type $\mathcal{X} \rightarrow \mathbb{D}\mathcal{Y}$. We'll write $\mathcal{X} \mathbin{\rightarrowtriangle} \mathcal{Y}$ for both, provided context makes it clear which one we are using.

If a prior distribution $\pi\colon \mathbb{D}\mathcal{X}$ on $\mathcal{X}$ is known, then the channel $C$ can be applied to $\pi$ to create a joint distribution $J$ in $\mathbb{D}(\mathcal{X}\times\mathcal{Y})$ on both input and output together, written $\pi \mathbin{\triangleright} C$ and where $J_{x,y} := \pi_x C_{x,y}$. For that $J$, the left marginal $\sum_y J_{x,y}$ gives the prior $\pi$ again (no matter what $C$ might be), i.e. the probability that the input was $x$ — thus $\pi_x = J_{x,\Sigma}$ if we use that notation for the marginal. The right marginal $J_{\Sigma,y}$ is the probability that the output is $y$, given both $\pi$ and $C$.

The $y$-posterior distribution on $\mathcal{X}$, given $\pi,C$ and a particular $y$, is the conditional distribution on $\mathcal{X}$ if that $y$ was output: it is the $y$-th column divided by the marginal probability of that $y$, that is $J_{-,y}/J_{\Sigma,y}$ (provided the marginal is not zero).

If we fix $\pi$ and $C$, and use the conventional abbreviation $p_{XY}$ for the resulting joint distribution $(\pi \mathbin{\triangleright} C)$, then the usual notations for the above are $p_X$ for the left marginal $({=}\,\pi)$ and $p_X(x)$ for its value $\pi_x$ at a particular $x$, with $p_Y$ and $p_Y(y)$ similarly for the right marginal. Then $p_{X|y}(x)$ is the posterior probability of the original input's being $x$ when $y$ has been observed. Further, we can write just $p(x)$ and $p(y)$ and $p(x|y)$ when context makes the (missing) subscripts clear.
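For finite channels these joint, marginal and posterior constructions are direct to compute; a minimal sketch with helper names of our own choosing:

```python
def joint(prior, C):
    """Joint distribution J[x][y] = pi_x * C[x][y] from prior pi and channel C."""
    return [[p * c for c in row] for p, row in zip(prior, C)]

def marginal_Y(J):
    """Right marginal p(y) = sum_x J[x][y]."""
    return [sum(col) for col in zip(*J)]

def posterior(J, y):
    """Posterior p(X|y): column y of J normalised by its marginal p(y)."""
    p_y = sum(row[y] for row in J)
    return [row[y] / p_y for row in J]
```

For a binary symmetric channel with flip probability $1/4$ under a uniform prior, each output has marginal probability $1/2$ and the posterior given output 0 is $(3/4, 1/4)$.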

IV-B Loss functions; remapping

Our obfuscating mechanisms $M$ and $G^{\varepsilon}$ are channels like $C$ in the discrete case — the result of the query is the channel's input $x$, and the (perturbed) value the observer sees is the channel's output $y$. The loss functions $\ell(w,x)$ will quantify the loss to her of seeing (only) $y$, and then choosing $w$, when what she really wants to know is $x$. Such $\varepsilon$-DP mechanisms have earlier been modelled this way, i.e. as channels, by Alvim et al. [7] and Chatzikokolakis et al. [4], who observed that for $\varepsilon$-DP the ratios of their entries must satisfy the $\varepsilon$-DP constraints, because the definition at §II-A(4) reduces to comparing (multiplicatively) adjacent entries in channel columns.

The connection between the observation $y$ and the loss-function parameter $w$ is that the observer does not necessarily have to ``take what she sees'' — there might be good reasons for her making a different choice. For example, in a word-guessing game where the last, obfuscated letter ? in a word SA? is shown on the board, the observer might have to guess what it really is. Even if it looks like a blurry Y (value 4 in Scrabble), she might instead guess X (value 8), because that would earn more points on average if from prior knowledge she knows that X is strictly more than half as likely as Y — i.e. it's worth her taking the risk. Thus rather than mandating that the observer must accept what she thinks the letter is most likely to be, she uses the obfuscated query $y$ to deduce information about the whole posterior distribution of the actual query… which might suggest that she guess some $w \neq y$, because the expected loss of doing that is less than (the expected utility is greater than) it would be if she simply accepted the $y$ she saw. That rational strategy is called ``remapping'' [1]. Thus she sees $y$, but $y$ tells her that $w$ is what she should choose as her least-loss-inducing guess for $x$. That is, the simplest strategy is ``take what you see''; but it might not be the best one. In general (and now using $M$ again for the mechanism), we write $\textsf{\$}(\pi,M,\ell)$ for the expected loss to a rational observer, given the $\pi,M$ she knows and the loss function $\ell$ she has chosen: it is

$\sum_y p(y)\;\min_w \sum_x \ell(w,x)\,p(x|y)$ , (7)

that is, the expected value, over all possible observations $y$ and their marginal probabilities, of the least loss she could rationally achieve over all her possible choices $w$ given the knowledge that $y$ will have provided about the posterior distribution $p(X|y)$ of the actual raw input $x$. Note that $M$ and $\pi$ determine (from §IV-A) the $p(y)$ and $p(x|y)$ that appear in (7). We remark that this formulation for measuring expected loss corresponds precisely to the formulation used by Ghosh et al. in the optimality theorem.
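Formula (7) can be computed directly for a finite prior, channel and loss function. In the sketch below (our own naming, with the guess set $\mathcal{W}$ taken equal to $\mathcal{X}$) the factor $p(y)$ cancels against the normalisation of $p(x|y)$, so only joint entries appear:

```python
def expected_loss(prior, C, loss):
    """Expected loss (7): sum_y p(y) * min_w sum_x loss(w, x) * p(x|y).
    'loss' maps (w, x) index pairs to a non-negative real; W is taken = X."""
    n_x, n_y = len(C), len(C[0])
    total = 0.0
    for y in range(n_y):
        if sum(prior[x] * C[x][y] for x in range(n_x)) == 0:
            continue  # zero-probability outputs contribute nothing
        # min over guesses w of the joint-weighted loss; the p(y) factors cancel
        total += min(
            sum(loss(w, x) * prior[x] * C[x][y] for x in range(n_x))
            for w in range(n_x)
        )
    return total
```

For the binary symmetric channel with flip probability $1/4$, uniform prior and 0-1 loss, the rational observer simply takes what she sees and the expected loss is $1/4$.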

IV-C The relevance of hyper-distributions, abstract channels

It is important to remember that the expected-loss formula (7) does not use the actual mechanism-output values $y$ in any way directly: instead it takes only the expected value over what they might be. All that matters is their marginal probabilities $p(y)$ and the posterior distributions $p(X|y)$ that they induce. That allows us to abstract from $\mathcal{Y}$ altogether.

A hyper-distribution expresses that abstraction: it is a distribution of distributions on $\mathcal{X}$ alone, that is of type $\mathbb{D}\mathbb{D}\mathcal{X}$; abbreviate those as ``hyper'' and ``$\mathbb{D}^2\mathcal{X}$''. Given a joint distribution $J\colon \mathbb{D}(\mathcal{X}\times\mathcal{Y})$, we write $[J]$ for the hyper-distribution whose support is the posterior distributions $p(X|y)$ on $X$ (in the hyper-distribution literature these are called ``inners'' [8]) and which assigns the corresponding marginal probability $p(y)$ to each. (Zero-valued marginals are left out.) We now re-express (7) in those terms.

If we write $\ell(w,-)$ for the function on $\mathcal{X}$ that $\ell$ determines once $w$ is fixed, and write $\mathcal{E}_{\textsc{dist}}\,\textsc{rv}$ for the expected value of random variable rv with distribution dist, then $\min_w \mathcal{E}_{p(X|y)}\,\ell(w,-)$ is the inner part of (7). Then fix some $\ell$ and define for a general distribution $\delta\colon \mathbb{D}\mathcal{X}$ that

$Y_\ell(\delta) \;:=\; \min_w\,\mathcal{E}_\delta\,\ell(w,-)$ , (8)

(using $Y$ for ``entrop$Y$'') so that $Y_\ell$ is itself a real-valued function on distributions $\delta$ (as e.g. Shannon entropy is). With that preparation, the expression (7) becomes the expected value of $Y_\ell$ over the hyper produced by abstracting from $J = \pi \mathbin{\triangleright} M$ as above. That is, (7) gives equivalently

$\textsf{\$}(\pi,M,\ell) \;=\; \mathcal{E}_{[\pi\triangleright M]}\,Y_\ell$ , (9)

in which the $M$ and $\pi$ now explicitly appear and where –we recall– the brackets $[-]$ convert the joint distribution $\pi \mathbin{\triangleright} M$ to a hyper. (If $Y_\ell$ were in fact Shannon entropy, then (9) would be the conditional Shannon entropy. But the $Y_\ell$'s are much more general than Shannon entropy alone [2, 9].)

Finally, using hypers we define an abstract channel to be a function from prior to hyper, i.e. of type $\mathbb{D}\mathcal{X} \rightarrow \mathbb{D}^2\mathcal{X}$, realised from some concrete channel $M\colon \mathcal{X} \mathbin{\rightarrowtriangle} \mathcal{Y}$ as $\pi \mapsto [\pi \mathbin{\triangleright} M]$. It is ``abstract'' because the type $\mathcal{Y}$ no longer appears: it is unnecessary because if $M(\pi)$ is the application of $M$ as a function applied to prior $\pi$, then from (9) the worst rational expected loss is written simply $\mathcal{E}_{M(\pi)}\,Y_\ell$ .

(Recall from §IV-B that this naturally takes into account the ``rational observers'' and the remapping they might perform, as described in [1].)

IV-C1 Example of a channel representation of a mechanism

If we have a discrete input $\mathcal{X} := \{x_0,x_1,x_2\}$ and discrete output $\mathcal{Y} := \{y_0,y_1,y_2,y_3,y_4\}$, we can represent an obfuscating mechanism $M$ with the channel matrix $M$ below.

M  =  [ 2/3    1/6    1/12   1/24   1/24
        1/6    1/6    1/3    1/6    1/6
        1/24   1/24   1/12   1/6    2/3  ]

As described in §IV-A above, the row M_{x,−} corresponds to the probability distribution of outputs y in 𝒴 for that x. For example the top-left number 2/3 = M_{x0,y0} is the probability that output y0 is observed when the input is x0. We can interpret this as an ε-DP mechanism once we know the metric d on 𝒳. In particular §II-A(1) simplifies to comparing ratios of entries in the same column, and when we do that we find for example that d_D(M(x0), M(x1)) = ln 4. Thus from §II-A(4), now applied to 𝒳, we can say that if M is ε-DP then ε satisfies

d_D(M(x0), M(x1))  =  ln 4  ≤  ε·d(x0, x1) .
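The column-ratio check can be made concrete. The sketch below (ours, not from the paper) computes the max-divergence distance d_D between two rows of the channel M above, using exact rationals for the matrix entries:

```python
from fractions import Fraction as F
from math import log

# The channel matrix M from the example, rows indexed by x0, x1, x2.
M = [
    [F(2,3),  F(1,6),  F(1,12), F(1,24), F(1,24)],
    [F(1,6),  F(1,6),  F(1,3),  F(1,6),  F(1,6)],
    [F(1,24), F(1,24), F(1,12), F(1,6),  F(2,3)],
]

def d_D(row_a, row_b):
    """Max-divergence distance: max over outputs y of |ln(a_y / b_y)|."""
    return max(abs(log(a / b)) for a, b in zip(row_a, row_b))

print(d_D(M[0], M[1]))  # ln 4 ≈ 1.386, the extreme column ratio being 4 (or 1/4)
```

Here the extreme ratio is attained in the first column (2/3 against 1/6) and again, inverted, in columns y2 through y4.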

IV-C2 Example of a loss function calculation

Now suppose that we choose a loss function known as ``Bayes Risk'', written br, defined on 𝒳 := {x0, x1, x2} as above:

br(w,x)  :=  1 if x ≠ w else 0 ,

where 𝒲 := 𝒳. Letting the input prior be the uniform distribution ⊙ over 𝒳, we can compute the loss $(⊙, M, br) by selecting for each output y the w which makes the expected value of br(w,−) over the posterior p_{X|y} least. We then take the expected value of these least values over the marginal p_Y. For y0 for example, that least expected value occurs at w = x0, and for y1 it occurs either for w = x0 or w = x1. Overall the total expected loss is 1/3.
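As a cross-check of this calculation, the following sketch (ours) computes the Bayes-risk loss output-by-output with exact fractions, using the channel M of §IV-C1 and the uniform prior:

```python
from fractions import Fraction as F

# Channel M from §IV-C1; uniform prior 1/3 on {x0, x1, x2}.
M = [
    [F(2,3),  F(1,6),  F(1,12), F(1,24), F(1,24)],
    [F(1,6),  F(1,6),  F(1,3),  F(1,6),  F(1,6)],
    [F(1,24), F(1,24), F(1,12), F(1,6),  F(2,3)],
]
prior = [F(1,3)] * 3

# Joint distribution J[x][y] = prior(x) * M[x][y].
J = [[prior[x] * M[x][y] for y in range(5)] for x in range(3)]

# For each output y, the best guess w minimises the expected Bayes risk
# over the posterior; that minimum is (column mass - largest entry).
total = sum(
    sum(J[x][y] for x in range(3)) - max(J[x][y] for x in range(3))
    for y in range(5)
)
print(total)  # 1/3
```

The per-output minima confirm the ties mentioned above: for y1 the largest joint entry is shared by x0 and x1.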

IV-D Refinement of hypers and mechanisms

The hypers 𝔻²𝒳 on 𝒳 have a partial order (⊑), ``refinement'' [10], that we will need in the proof of our main result. It admits several equivalent interpretations in this context. Below, we write Δ etc. for general hypers in 𝔻²𝒳.

We have that Δ ⊑ Δ′, that hyper Δ is refined by hyper Δ′, under any of these equivalent conditions:

(a) when ℰ_Δ Y_ℓ ≤ ℰ_{Δ′} Y_ℓ for all loss functions ℓ (i.e. whether legal or not);

(b) when, considered as distributions on posteriors 𝔻𝒳, it is possible to convert Δ into Δ′ via a Wasserstein-style ``earth move'' of probability from one posterior to another [11, 12, 8];

(c) when there are joint-distribution matrices D in 𝔻(𝒳×𝒴) generating Δ, and D′ in 𝔻(𝒳×𝒴′) generating Δ′, and a ``post-processing matrix'' R of type 𝒴⇸𝒴′ such that as matrices we have D·R = D′ via matrix multiplication.

And we say that one mechanism M is refined by another M′ just when [π▷M] ⊑ [π▷M′] for all priors π. When this occurs we also write M ⊑ M′. From formulation (a) we will use the fact that the (⊑)-infimum of the ^T L^ε_N's (indexed over a sequence of T's) is just L^ε itself [13] and [15, Lem. 20, Appendix §B].
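Formulation (c) can be made concrete with a toy example (ours, not from the paper): merging two output columns of the channel M of §IV-C1 by a post-processing matrix R witnesses a refinement D·R = D′ of the corresponding joints.

```python
from fractions import Fraction as F

# Joint matrix D = (uniform prior) ▷ M, for the channel M of §IV-C1.
M = [
    [F(2,3),  F(1,6),  F(1,12), F(1,24), F(1,24)],
    [F(1,6),  F(1,6),  F(1,3),  F(1,6),  F(1,6)],
    [F(1,24), F(1,24), F(1,12), F(1,6),  F(2,3)],
]
D = [[F(1,3) * p for p in row] for row in M]

# Post-processing matrix R: 𝒴 ⇸ 𝒴' merging outputs y3 and y4.
# Each row of R is a (here deterministic) distribution over 4 new outputs.
R = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]

# D' = D · R is the joint matrix of the refined (coarser) mechanism.
Dp = [[sum(D[x][y] * R[y][yp] for y in range(5)) for yp in range(4)]
      for x in range(3)]
print(Dp[0])  # last entry merges 1/72 + 1/72 into 1/36
```

Because R only merges observations, the refined D′ reveals less about the input, which is exactly the (⊑) direction described above.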

Formulation (b) is particularly useful. If we find a specific earth move from Δ to Δ′ that defines a refinement, we can then use the equivalent (a) to deduce that ℰ_Δ Y_ℓ ≤ ℰ_{Δ′} Y_ℓ. However if we can also compute the cost of the particular earth move (the cost is determined by the amount of ``earth'' to be moved and the distance it must be moved; see for example [14]), we can conclude in addition that the difference |ℰ_Δ Y_ℓ − ℰ_{Δ′} Y_ℓ| must be bounded above by an amount we can compute. This follows from the well-known Kantorovich-Rubinstein duality [11], which says that |ℰ_Δ Y_ℓ − ℰ_{Δ′} Y_ℓ| is no more than the minimal cost incurred by any earth move transforming Δ to Δ′, scaled by the ``Lipschitz constant'' of Y_ℓ (the Lipschitz constant of a function is the amount by which the difference in outputs can vary when compared to the difference in inputs). We use these ideas in Lem. 11 and Thm. 13.

V Measures on continuous 𝒳 and 𝒴

V-A Measures via probability density functions

Continuous analogues of the π, M and ℓ will be our principal concern here: ultimately we will use 𝕄[0,1] for our measurable spaces 𝒳 and 𝒴, and will suppose for simplicity that 𝒳 = 𝒴 = [0,1]. (More generality is achieved by simple scaling.)

Measures in 𝕄[0,1] (that is, 𝕄𝒳 and 𝕄𝒴) will be given as probability density functions: a pdf, say μ: [0,1] → ℝ^≥, determines the probability ∫_a^b μ assigned to the sample [a,b] ⊆ [0,1] using the standard Borel measure on [0,1], and more generally the expected value of some random variable V on [a,b) given by pdf μ is ∫_a^{b−} μ(x)V(x) dx .

Even though μ is of type pdf, we abuse notation to write for example μ[a,b) for the probability ∫_a^{b−} μ that μ assigns to that interval, and μ_a for the probability μ assigns to the point a alone: μ_a is some r just when the actual pdf-value of μ(a) is the Dirac delta-function scaled by r, written 𝜹_r .

V-B Continuous mechanisms over continuous priors

Our mechanisms M, up to now discrete, will now become ``kontinuous'', renamed K as a mnemonic. Thus a continuous mechanism K: 𝒳 → 𝕄𝒴 given input x produces measure K(x) on the observations 𝒴 = [0,1]. And given a whole continuous prior π: 𝕄[0,1], that same K therefore determines a joint measure over 𝒳×𝒴 (see [15, Appendix §A2]). By analogy with (8,9) we have

Definition 2

(Continuous version of (7)) The expected loss $(π,K,ℓ) due to continuous prior π, continuous mechanism K and loss function ℓ is given by the formula below. (It is well defined whenever the 𝒲-indexed family of functions of y given by ∫_0^1 ℓ(w,x)π(x)K(x)(y) dx contains a countable subset 𝒲′ such that the inf over 𝒲 is equal to the inf over 𝒲′ [16]. This is clear if 𝒲 is finite, and whenever 𝒲′ can be taken to be the rationals.)

∫_0^1 ( inf_w ( ∫_0^1 ℓ(w,x) π(x) K(x)(y) dx ) ) dy .    (10)

The continuous version of uncertainty (8) is now

Y_ℓ(δ)  :=  inf_{w:𝒲} ∫_0^1 ℓ(w,x) δ(x) dx

and the continuous version of expected loss (9) is now

$(π,K,ℓ)  =  ∫_{y:𝒴} Y_ℓ dK(π) .

V-C The truncated Laplace mechanism

Just as the Geometric mechanism is based on the Geometric distribution, the Laplace mechanism is based on the Laplace distribution. It is defined as follows:

Definition 3

(Laplace distribution) The ε-Laplace mechanism with input x in 𝒳 = [0,1] and probability density in ℝ^≥ for output y in 𝒴 is usually written as a pdf in y (for given x) as [17]

L^ε(x,y)  :=  ε/2 · e^{−ε|y−x|} .

The ε/2 is a normalising factor. It is known [4] that the mechanism L^ε satisfies ε-DP over [0,1] (where the underlying metric on 𝒳 is Euclidean). Just as for the Geometric mechanism, we truncate L^ε's outputs so that they also lie inside 𝒰. We do so in the same manner, by remapping all outputs greater than 1 to 1, and all outputs less than 0 to 0.
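The ε-DP property of L^ε follows from the triangle inequality: the pdf ratio between any two inputs is bounded by e^{ε|x−x′|}. The sketch below (ours) checks the bound numerically at a grid of sample points:

```python
from math import exp

def laplace_pdf(eps, x, y):
    """pdf of the ε-Laplace mechanism centred at x, evaluated at y."""
    return eps / 2 * exp(-eps * abs(y - x))

eps = 1.0
xs = [i / 10 for i in range(11)]

# ε-DP over the Euclidean metric: for any inputs x, x' and output y,
# the pdf ratio is at most exp(ε·|x - x'|).
ok = all(
    laplace_pdf(eps, x, y)
    <= exp(eps * abs(x - xp)) * laplace_pdf(eps, xp, y) + 1e-12
    for x in xs for xp in xs for y in xs
)
print(ok)  # True
```

The small additive slack guards only against floating-point rounding; the inequality itself is exact.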

Definition 4

(truncated Laplace mechanism) As earlier for G^ε, we truncate the Laplace mechanism L^ε to Ł^ε for inputs restricted to [0,1] and outputs restricted to [0,1], in the following way (as a pdf):

Ł^ε(x)(y)  :=  𝜹_a        if y = 0
               L^ε(x,y)    if 0 < y < 1
               𝜹_b        if y = 1 ,

where the constants a,b are ∫_{−∞}^0 L^ε(x,y) dy = e^{−εx}/2 and ∫_1^∞ L^ε(x,y) dy = e^{−ε(1−x)}/2 respectively, and 𝜹_r is the Dirac delta-function with weight r.
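As a sanity check (our sketch), each Ł^ε(x) is a probability measure: the two point masses plus the continuous mass on (0,1) sum to one, all three available in closed form:

```python
from math import exp

def trunc_laplace_weights(eps, x):
    """Point masses at 0 and 1, plus the continuous mass on (0,1),
    for the truncated Laplace mechanism Ł^ε with input x in [0,1]."""
    a = exp(-eps * x) / 2             # tail below 0, pushed onto y = 0
    b = exp(-eps * (1 - x)) / 2       # tail above 1, pushed onto y = 1
    # ∫_0^1 ε/2 · e^{-ε|y-x|} dy in closed form:
    middle = 1 - exp(-eps * x) / 2 - exp(-eps * (1 - x)) / 2
    return a, middle, b

eps = 2.0
for x in (0.0, 0.25, 0.7, 1.0):
    a, mid, b = trunc_laplace_weights(eps, x)
    print(round(a + mid + b, 12))  # 1.0 each time
```

Note that at x = 0 the atom a is 1/2: half of the untruncated Laplace mass sits below the input itself.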

We can now state our principal contribution: it is to show that the truncated Laplace Ł^ε is universally optimal in this continuous setting, in the same way that G^ε was optimal in the discrete setting:

Theorem 5

(truncated Laplace is optimal) Let K^ε be any continuous ε-DP mechanism with input and output both [0,1], let π be any continuous (prior) probability distribution over [0,1], and let ℓ be any Lipschitz-continuous, legal loss function on 𝒳 = 𝒰. (Lipschitz continuity is less general than continuity: it means that the difference in outputs is within a constant κ > 0 scaling factor of the difference between the inputs.)

Then $(π, Ł^ε, ℓ) ≤ $(π, K^ε, ℓ) .


As we foreshadowed in the proof outline in §III, Thm. 5 relies ultimately on the earlier-proven optimality of G^ε in the discrete case: we must show how to approximate continuous ε-DP mechanisms in discrete form, each one satisfying the conditions under which the earlier result applies; in §VI we fill in the details. Along the way we show how the Laplace mechanism provides a smooth approximation to the Geometric mechanism with discrete inputs.

VI Approximating Continuity for 𝒳

VI-A Connecting continuous and discrete

Our principal tool for connecting the discrete and continuous settings is the evenly-spaced discrete subset 𝒰_N = {0, 1/N, 2/N, …, (N−1)/N, 1} of the unit interval 𝒰 = [0,1], for ever-increasing N > 0.

The separation 1/N is the interval width.

VI-B Approximations of continuous priors

The N-approximation π_N of prior π: 𝕄𝒰 is of type 𝔻𝒰_N, i.e. yielding actual probabilities (not densities), and is defined

π_N(n/N)  :=  π[ n/N, (n+1)/N )    if n < N−1
              π[ n/N, 1 ]           if n = N−1
              0                     otherwise .

The discrete π_N gathers each continuous π-interval's measure onto the interval's left point; as a special case, the measure that π gives to the whole of the last interval, including the point 1, is gathered onto the point (N−1)/N of π_N.

As an example take N to be 2, and π to be the uniform (continuous) distribution over 𝒰, which can be represented by the constant-1 pdf. Since the interval width is 1/2, we see that π_N assigns probability 1/2 to both 0 and 1/2, and zero to all other points in 𝒰.
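The pixelation π_N can be sketched in code (ours, with midpoint-rule integration standing in for the exact interval measures); the N = 2 uniform example above is recovered:

```python
def pixelate(pdf, N, steps=10_000):
    """N-approximation π_N of a prior given as a pdf on [0,1]:
    the mass of each interval [n/N, (n+1)/N) is gathered onto n/N,
    with the final interval closed at 1.  Midpoint-rule integration."""
    def mass(lo, hi):
        h = (hi - lo) / steps
        return sum(pdf(lo + (i + 0.5) * h) for i in range(steps)) * h
    return [mass(n / N, (n + 1) / N) for n in range(N)]

# Uniform prior: constant-1 pdf; with N = 2 this gives 1/2 at 0 and 1/2 at 1/2.
print(pixelate(lambda x: 1.0, 2))  # [0.5, 0.5] up to integration error
```

For a constant pdf the midpoint rule is exact, so the only error here is floating-point summation.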

VI-C N-step mechanisms and loss functions

In the other direction, we can lift a discrete M and loss function ℓ on 𝒰_N into the continuous 𝒳 = 𝒰 by replicating their values for the x's not in 𝒰_N, in a way that constructs N-step functions: we have

Definition 6

For x in 𝒰 = [0,1] define ⌊x⌋_N := ⌊Nx⌋/N.

Definition 7

Given mechanism M: 𝒰_N ⇸ 𝒴, define its lifting M_N on all of [0,1] so that

M_N(x)  :=  M(⌊x⌋_N)      if 0 ≤ x < 1
            M((N−1)/N)     if x = 1 .

Note that we have not yet committed here to whether M produces discrete or continuous distributions on its output 𝒴. We are concentrating only on its input (from 𝒳).

Similarly, given loss function ℓ: 𝒲×𝒰_N → ℝ^≥, define ℓ_N: 𝒲×[0,1] → ℝ^≥ so that

ℓ_N(w,x)  :=  ℓ(w, ⌊x⌋_N)      if 0 ≤ x < 1
              ℓ(w, (N−1)/N)     if x = 1 .
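Defs. 6 and 7 are easily sketched in code; this is our illustration, and the absolute-error loss used as ℓ is a hypothetical example, not taken from the paper:

```python
from math import floor

def floor_N(x, N):
    """⌊x⌋_N := ⌊Nx⌋ / N  (Def. 6)."""
    return floor(N * x) / N

def lift_loss(ell, N):
    """Lift ℓ: 𝒲 × 𝒰_N → ℝ≥ to an N-step function ℓ_N on 𝒲 × [0,1]."""
    def ell_N(w, x):
        if x == 1:                    # special case: 1 takes its value from (N-1)/N
            return ell(w, (N - 1) / N)
        return ell(w, floor_N(x, N))
    return ell_N

# Example with N = 4 and the absolute-error loss ℓ(w,x) = |w - x|.
ell_4 = lift_loss(lambda w, x: abs(w - x), 4)
print(ell_4(0.0, 0.30))  # 0.25, since ⌊0.30⌋_4 = 1/4
print(ell_4(0.0, 1.0))   # 0.75, the x = 1 case using 3/4
```

The lifted ℓ_4 is constant on each interval [n/4, (n+1)/4), which is exactly the N-step property used below.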

Say that mechanisms and loss functions over [0,1] are N-step functions just when they are constructed as above.

The important property enabled by the above definitions is the correspondence between loss functions' values in their pixelated and original versions, which will allow us to apply the earlier discrete-optimality result, based on Lem. 9 to come. That is, we have

Lemma 8

For any continuous prior π in 𝕄𝒰, mechanism M in 𝒰_N ⇸ 𝒴 and loss function ℓ in 𝒲×𝒰_N → ℝ^≥ we have

$(π, M_N, ℓ_N)  =  $(π_N, M, ℓ) ,
(continuous 𝒳)     (discrete 𝒳)

That is, the loss realised via a pixelated π_N and (already discrete) M and ℓ, all operating on 𝒰_N, is the same as the loss realised via the original continuous π and the lifted (and thus N-step) mechanism M_N and loss function ℓ_N, now operating over all of 𝒳 = 𝒰.

Proof:

We interpret the losses using Def. 2, focussing on the integrand of the inner integral. Note that we can split it up into a finite sum of integrals of the form ∫_{n/N}^{(n+1)/N−} π(x)V(x) dx. When we do that for the left-hand formula $(π, M_N, ℓ_N) we see that throughout the interval [n/N, (n+1)/N) the contribution of the mechanism and the loss is constant, i.e. M_N(x)(y)·ℓ_N(w,x) = M(n/N)(y)·ℓ(w,n/N). This means the integral becomes

M(n/N)(y) · ℓ(w,n/N) · ∫_{n/N}^{(n+1)/N−} π(x) dx

which is equal to M(n/N)(y)·ℓ(w,n/N)·π_N(n/N). A similar argument applies to the last interval (which includes 1), compensated for by the definitions of ℓ_N and M_N, which take their corresponding values from (N−1)/N.

Looking now at the right-hand formula $(π_N, M, ℓ), we see that it is exactly the finite sum of the integrals just described. ∎

VI-D Approximating continuous ε-DP mechanisms

The techniques above give good discrete approximations for continuous-input ε-DP mechanisms M acting on continuous priors, simply by considering M_N's for increasing N's, using §VI-C. As a convenient abuse of notation, when we start with a continuous-input mechanism M on [0,1] we write M_N for the N-step mechanism made by first restricting M to the subset 𝒰_N of [0,1] and then lifting that restriction ``back again'' as in Def. 7, effectively converting it into an N-step function. When we do this we find that the posterior loss wrt. N-step loss functions can be bounded above and below by using pixelated priors and N-stepped mechanisms.

Lemma 9

Let K be a continuous-input ε-DP mechanism, π in 𝕄[0,1] a continuous prior, and ℓ a (non-negative) N-step loss function. Then the following inequalities hold:

e^{−ε/N} · $(π_N, K_N, ℓ)   ≤   $(π, K, ℓ)   ≤   e^{ε/N} · $(π_N, K_N, ℓ) ,
      (discrete 𝒳)            (continuous 𝒳)          (discrete 𝒳)

(Notice that in the middle formula $(π,K,ℓ) the mechanism K is not N-stepped, but in the formulae on either side it is, as in Lem. 8.)

Proof:

The proof is as for Lem. 8, but noting also that K's being ε-DP implies that for all N we have K(⌊x⌋_N)(y)·e^{−ε/N} ≤ K(x)(y) ≤ K(⌊x⌋_N)(y)·e^{ε/N}. (Here we are using the ε-DP constraints applied to the pdf K(x)(y).) ∎

With Lem. 8 and Lem. 9 we can study optimality of Ł^ε on finite discrete inputs 𝒰_N. We will see that, although Geometric mechanisms are still optimal for the (effectively) discrete inputs 𝒰_N, the Laplace mechanism provides increasingly good approximate optimality for 𝒰_N as N increases, and is in fact (truly) optimal in the limit.

VII The Laplace and Geometric mechanisms

In this section we make precise the restriction of the Geometric mechanism G^ε (§II-D1) to inputs and outputs both in 𝒰_N (a subset of [0,1]): for both x,y in 𝒰_N we define

G_N^ε(x)(y)  :=  G^{ε/N}(Nx)(Ny) ,    (11)
with the left-hand side on 𝒰_N and the right-hand side on (0..N).

As an illustration, we take ε = 2 ln 4 and input 𝒳 = 𝒰_2, in which the 2 comes from 𝒰_2 and the ln 4 comes from the α = 1/4 of the Geometric distribution used to make the mechanism G^ε. Using the three points 0, 1/2 and 1 of the input, we compute the truncated Geometric mechanism G_2^ε as the channel below, where the rows are labelled (invisibly) by the inputs 𝒰_2, and the columns similarly by the outputs (also 𝒰_2 in this case). This means that if the input was 0, then the output (after truncation) will be 0 with probability 4/5, and 1/2 with probability 3/20, etc.:

G_2^ε  =  [ 4/5    3/20   1/20
            1/5    3/5    1/5
            1/20   3/20   4/5  ] .

Notice now that the ratios of adjacent probabilities in the same column satisfy the ε-DP constraint, so for example 4/5 ÷ 1/5 = 3/5 ÷ 3/20 = 4 ≤ e^{(2 ln 4)/2}. Notice also that the distance between adjacent inputs in 𝒰_2 under the Euclidean distance is 1/2, not 1 as it would be in the conventional 𝒳 = {0, 1, 2}.
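The truncation step can be reconstructed directly from the α-Geometric distribution. The sketch below (ours) folds the two infinite tails onto the end points and reproduces the first row of G_2^ε above:

```python
from fractions import Fraction as F

def trunc_geometric(alpha, n_points):
    """Truncated Geometric channel on inputs/outputs {0, …, n_points-1}:
    p(y|x) ∝ α^|y-x| over all integers y, with the two infinite tails
    folded onto the end points.  Normaliser is (1-α)/(1+α)."""
    c = (1 - alpha) / (1 + alpha)
    rows = []
    for x in range(n_points):
        row = [c * alpha ** abs(y - x) for y in range(n_points)]
        # geometric tail below 0 folds onto output 0 (closed form)
        row[0] += c * alpha ** x * alpha / (1 - alpha)
        # geometric tail above the last point folds onto it
        row[-1] += c * alpha ** (n_points - 1 - x) * alpha / (1 - alpha)
        rows.append(row)
    return rows

G2 = trunc_geometric(F(1, 4), 3)
print(G2[0])  # [4/5, 3/20, 1/20], matching the first row of G_2^ε
```

Running the same construction with α = 1/2 and five points reproduces the G_4^ε matrix shown next.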

Suppose now that we consider 𝒰_4 instead, consisting of the five points 0, 1/4, 1/2, 3/4 and 1, and we adjust the α in the underlying Geometric distribution G_α from §II-D1(5). The ε-DP parameter ε, now written 4 ln 2, is the same as before, and the resulting matrix is

G_4^ε  =  [ 2/3    1/6    1/12   1/24   1/24
            1/3    1/3    1/6    1/12   1/12
            1/6    1/6    1/3    1/6    1/6
            1/12   1/12   1/6    1/3    1/3
            1/24   1/24   1/12   1/6    2/3  ]

As before though, the ratios of adjacent probabilities in the same column satisfy the ε-DP constraint over all of 𝒰_4: now we have 2/3 ÷ 1/3 = 1/3 ÷ 1/6 = 2 ≤ e^{(4 ln 2)/4}.

This amplifies the explanation in (2) that the ε-DP constraints over discrete inputs 𝒰_N must take into account the underlying metric on the input space. More generally, whenever we double N in 𝒰_N, the α-parameter must become √α.

At this point we have enough to be able to appeal to the discrete optimality result, to bound below the losses for continuous mechanisms, provided that the loss ℓ_N is N-legal, i.e. that its legality obtains at least for the distinct points in 𝒰_N.

Lemma 10

For any continuous prior π in 𝕄𝒰, ε-DP mechanism M: 𝒰 ⇸ 𝒴 and loss function ℓ: 𝒲×𝒰 → ℝ^≥ such that ℓ_N is N-legal, we have:

$(π_N, G_N^ε, ℓ_N)  ≤  $(π, M_N, ℓ_N)
Proof:

Follows from Lem. 8, noting that M restricted to 𝒰_N satisfies the conditions for universal discrete optimality [1]. ∎

Our next task is to study the relationship between the Geometric and Laplace mechanisms. We show first that G_N^ε is refined (§IV-D) by the truncated Laplace mechanism, also restricted to 𝒰_N. Since Ł^ε is already defined over the whole of 𝒰, we continue to write its restriction to 𝒰_N as Ł^ε. This will immediately show that losses under the Geometric are no more than those under the Laplace (§IV-D(1)), consistent with observations that, on discrete inputs, Laplace obfuscation does not necessarily minimise the loss. Since the output 𝒴 of Ł^ε is continuous, we proceed by first approximating it, using post-processing, to make Laplace-based mechanisms ^T Ł^ε, defined below, which have discrete output and which form an anti-refinement chain converging to Ł^ε. We are then able to show separately the refinements between G_N^ε and ^T Ł^ε, using methods designed for finite mechanisms.

The T,N-Laplace mechanisms approximate Ł^ε by T-pixelation of their outputs. Here x is (still) in 𝒰_N but y is in 𝒰_T.

^T Ł^ε(x)(y)  :=  Ł^ε(x)[ y, y+1/T )     if y < 1 − 1/T
                  Ł^ε(x)[ 1−1/T, 1 ]      otherwise .    (12)

That is, we pixelate the 𝒴 using T for the Laplace (independently of the N we use for 𝒳). This is illustrated in Fig. 1(a).
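The pixelation (12) can be sketched via the CDF of the truncated Laplace, whose closed form absorbs the two point masses; this is our illustration, not code from the paper:

```python
from math import exp

def cdf_trunc_laplace(eps, x, y):
    """CDF of the truncated Laplace Ł^ε(x) at y in [0,1]; the point
    masses at 0 and 1 are absorbed into the CDF automatically."""
    if y >= 1:
        return 1.0
    if y <= x:
        return exp(-eps * (x - y)) / 2
    return 1 - exp(-eps * (y - x)) / 2

def T_laplace_row(eps, x, T):
    """Row of the T-pixelated mechanism ^T Ł^ε for input x: output k/T
    receives the mass of [k/T, (k+1)/T), with the last bin closed at 1."""
    # left-limit CDF values at the bin edges: F(0^-) = 0, then F(k/T)
    F = [0.0] + [cdf_trunc_laplace(eps, x, k / T) for k in range(1, T)]
    row = [F[k + 1] - F[k] for k in range(T - 1)]
    row.append(1.0 - F[T - 1])   # bin [1 - 1/T, 1], keeping the atom at 1
    return row

row = T_laplace_row(2.0, 0.5, 8)
print(sum(row))  # each row is a probability distribution over 𝒰_T
```

Seeding the first edge with F(0^-) = 0 rather than F(0) ensures the atom at 0 lands in the first bin, matching the case split in (12).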

Observe that since this is a post-processing (§IV-D(3)) of the output of Ł^ε, the refinement Ł^ε ⊑ ^T Ł^ε follows.

VII-A Refinement between N-Geometric and T,N-Laplace mechanisms

We now demonstrate the crucial fact that G_N^ε is refined by ^T Ł^ε. We use version (b) of refinement, described in §IV-D, and establish a Wasserstein-style earth move between the hypers [⊙▷G_N^ε] and [⊙▷^T Ł^ε] (i.e. for the uniform prior ⊙).


The width of the central ``vertical slice'' is 1/T.

(a) Illustrates batching the output for ^T L (similar for ^T Ł^ε). The outputs (shown here as pdf plots) are batched into output segments of length 1/T, in this example for T = 8. The segment [x, x+1/T) is indicated by the two vertical lines. The probability assigned to this segment is the area under the relevant curves: for the red curve it is the sum of the white and blue regions; for the green curve it is the sum of the white, blue and green regions; and for the black curve it is only the white region.
(b) The supports of hypers [⊙▷G_2^ε] (orange) and [⊙▷^8 Ł^ε] (blue) for inputs {0, 1/2, 1}, placed within the (triangular) probability simplex. The blue points are within the convex hull of the orange points.
Figure 1: N-Geometric and T,N-Laplace mechanisms.
Lemma 11

For all integers T > 0 we have that G_N^ε ⊑ ^T Ł^ε.

Proof:

Take Δ, Δ′ in 𝔻²𝒰_N to be hypers, both with finite supports. We can depict such hypers in ℝ^{N+1}-space by locating their supports, each of which is a point in ℝ^{N+1}, where the axes of the diagram correspond to the points of 𝒰_N. For example if we take Δ to be the hyper-distribution [⊙▷G_2^ε], it has three posterior distributions, which are 1-summing triples in ℝ³. They are depicted by the orange points in Fig. 1. Similarly the supports of the hyper-distribution Δ′, taken to be [⊙▷^T Ł^ε], are represented by the blue dots. Notice that the blue dots are contained in the convex hull of the orange dots, and this observation allows us to prove that the mechanisms G_2^ε and ^8 Ł^ε are in a refinement relation.

We use the following fact [8, Lem. 12.2] about refinement (⊑).

Let C, C′: 𝒰_N ⇸ 𝒰_T be channels and let ⊙ be the uniform prior. If the supports of [⊙▷C] are linearly independent when considered as vectors in ℝ^{N+1}, and their convex hull encloses the supports of [⊙▷C′], then C ⊑ C′. (The lemma applies to channels because of the direct correspondence between channels and the supports of hyper-distributions formed from uniform priors.)

To apply this result we let C be G_N^ε, and recall that indeed the supports of [⊙▷G_N^ε] are linearly independent (see for example [5]). Moreover, in general the supports of [⊙▷^T Ł^ε] are contained in their convex hull. We provide details of this latter fact in [15, Appendix §B]. ∎

Finally we can show full refinement between the Laplace and the Geometric mechanisms, which follows from continuity of refinement [13].

Theorem 12

G_N^ε ⊑ Ł^ε .

Proof:

We first form an anti-refinement chain ⋯ ⊑ ^{T_1} Ł^ε ⊑ ^{T_0} Ł^ε so that (a) Ł^ε ⊑ ^{T_i} Ł^ε for all i, and (b) the chain converges to Ł^ε.

We reason as follows:

     G_N^ε ⊑ Ł^ε
iff  G_N^ε ⊑ ^{T_i} Ł^ε for all i ≥ 0        ``(⊑) is continuous; (a), (b) above''

which follows from Lem. 11. We provide details of (a) and (b) just above in [15, Appendix §B]. ∎

We have shown that the Laplace mechanism is a refinement of the Geometric mechanism. This means that it genuinely leaks less information than does the Geometric mechanism, and therefore affords greater privacy protection; on the other hand, it means that we have lost utility with respect to the aggregated information. In the next section we turn to the comparison of the Laplace and Geometric mechanisms with respect to that loss.

VII-B The Laplace approximates the Geometric

The geometrical interpretation of the Laplace and Geometric mechanisms set out above indicates how the Laplace approximates the Geometric as ${\cal U}_N$'s interval-width approaches 0. In particular the refinement relationship established in Thm. 12 describes how the posteriors of $[\odot \triangleright {}^{T}\textit{\L}^{\varepsilon}]$ all lie in between pairs of posteriors of $[\odot \triangleright G_N^{\varepsilon}]$. This relationship between posteriors translates to a bound between the corresponding expected losses $\textsf{\$}(\odot, \textit{\L}^{\varepsilon}, \ell)$ and $\textsf{\$}(\odot, G_N^{\varepsilon}, \ell)$ via the Kantorovich-Rubinstein theorem applied to the hypers $[\odot \triangleright {}^{T}\textit{\L}^{\varepsilon}]$ and $[\odot \triangleright G_N^{\varepsilon}]$. We sketch the argument in the next theorem, and provide full details in [15, Appendix §D].

Theorem 13

Let $\ell$ be a $\kappa$-Lipschitz loss function, and $\odot$ the uniform distribution over ${\cal U}_N$. Then

$\textsf{\$}(\odot, \textit{\L}^{\varepsilon}, \ell) - \textsf{\$}(\odot, G_N^{\varepsilon}, \ell) \;\leq\; c\kappa/N~,$ (13)

where $c = 3/(1 - e^{-\varepsilon})^2$ is constant for fixed $\varepsilon$.

Proof:

We appeal to the Kantorovich-Rubinstein theorem, which states that the ``Kantorovich distance'' between probability distributions $\Delta, \Delta^{\prime}$ bounds above the difference between expected values $|{\cal E}_{\Delta}\,f - {\cal E}_{\Delta^{\prime}}\,f|$ whenever $f$ satisfies the $\kappa$-Lipschitz condition. In our case the relevant distributions are the hyper-distributions $[\odot \triangleright {}^{T}\textit{\L}^{\varepsilon}]$ and $[\odot \triangleright G_N^{\varepsilon}]$, and the relevant Lipschitz functions are $Y_{\ell}$ for loss functions $\ell$. (A function $f : {\mathbb{D}}{\cal X} \mathbin{\rightarrow} {\mathbb{R}}$ is $\kappa$-Lipschitz if $|f(\delta) - f(\delta^{\prime})| \leq \kappa\,{\mathbb{W}}(\delta, \delta^{\prime})$ for $\kappa > 0$, where ${\mathbb{W}}(\delta, \delta^{\prime})$ is the Kantorovich distance between $\delta, \delta^{\prime}$.)

We write ${\mathbb{W}}(\Delta, \Delta^{\prime})$ for the Wasserstein distance between hyper-distributions $\Delta, \Delta^{\prime}$, which is determined by the minimal earth-moving cost to transform $\Delta$ into $\Delta^{\prime}$. For any such earth move, each posterior $\delta$ of $\Delta$ is reassigned to a selection of posteriors of $\Delta^{\prime}$ in proportion to the probability mass that $\Delta$ assigns to $\delta$. The cost of the move is the expected value of the distances between the reassigned posteriors, weighted by the proportions of the reassignment. Thus the cost of any specific earth move provides an upper bound on ${\mathbb{W}}(\Delta, \Delta^{\prime})$. (All the costs are determined by the underlying metric on the space over which the distributions are defined; for us this is the Euclidean distance on the interval $[0,1]$.) Importantly for us, the relation of refinement $\mathrel{\sqsubseteq}$ determines a specific earth move [8] whose cost we can calculate.
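The earth-moving picture can be illustrated on a small example. In the sketch below (illustrative only: the $\Delta, \Delta^{\prime}$ of the proof are hyper-distributions, whereas here we use ordinary distributions on points of $[0,1]$) we compute the exact Kantorovich distance on the line via the CDF formula and check that the cost of one particular feasible earth move, the independent coupling, bounds it from above:

```python
import numpy as np

# Two distributions on points of [0,1].
xs = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
a  = np.array([0.1, 0.2, 0.4, 0.2, 0.1])   # distribution Delta
b  = np.array([0.3, 0.1, 0.2, 0.1, 0.3])   # distribution Delta'

# Exact Kantorovich (Wasserstein-1) distance on the line: the integral of
# |CDF_a - CDF_b|, piecewise constant between consecutive support points.
Fa, Fb = np.cumsum(a), np.cumsum(b)
w1 = np.sum(np.abs(Fa - Fb)[:-1] * np.diff(xs))

# One feasible (generally suboptimal) earth move: the independent coupling,
# which reassigns each atom of Delta to all atoms of Delta' in proportion
# to their mass.  Its cost upper-bounds the true distance.
plan_cost = np.sum(np.outer(a, b) * np.abs(xs[:, None] - xs[None, :]))

assert w1 <= plan_cost + 1e-12
print(f"W1 = {w1:.4f}, independent-coupling cost = {plan_cost:.4f}")
```

Any feasible transport plan gives such an upper bound; the refinement-induced earth move used in the proof is simply one particularly convenient plan.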

Referring to Lem. 11 and Fig. 1, we see that the refinement between the approximation to the Laplace $[\odot \triangleright {}^{T}\textit{\L}^{\varepsilon}]$ and $[\odot \triangleright G_N^{\varepsilon}]$ reassigns the Geometric's posteriors (the orange dots) to the Laplace's posteriors (the blue dots). Crucially, though, the Geometric's posteriors form a linear order according to their distance from one another, and the refinement described in Lem. 11 shows how each Laplace posterior lies in between adjacent pairs of Geometric posteriors (according to that linear ordering), provided that $N$ divides $T$. Therefore any redistribution of a Geometric posterior is bounded above by the distance to one or other of its adjacent posteriors. We show in [15, Appendix §D] that the distances between adjacent pairs are bounded above by $c/N$, and therefore ${\mathbb{W}}([\odot \triangleright {}^{T}\textit{\L}^{\varepsilon}], [\odot \triangleright G_N^{\varepsilon}]) \leq c/N$.

Next we observe that if $\ell(w,x)$ is a $\kappa$-Lipschitz function on $[0,1]$ (as a function of $x$), then $Y_{\ell}$ is a $\kappa$-Lipschitz function; and so, recalling from (9) that $\textsf{\$}(\pi, M, \ell) = {\cal E}_{[\pi \triangleright M]}\,Y_{\ell}$, by the Kantorovich-Rubinstein theorem we must have

$\textsf{\$}(\odot, {}^{T}\textit{\L}^{\varepsilon}, \ell) - \textsf{\$}(\odot, G_N^{\varepsilon}, \ell) \;\leq\; c\kappa/N~.$ (14)

By Thm. 12 and post-processing we see that $G_N^{\varepsilon} \mathrel{\sqsubseteq} \textit{\L}^{\varepsilon} \mathrel{\sqsubseteq} {}^{T}\textit{\L}^{\varepsilon}$. Recall from (a) that refinement means the corresponding losses are ordered accordingly, i.e.

$\textsf{\$}(\odot, G_N^{\varepsilon}, \ell) \;\leq\; \textsf{\$}(\odot, \textit{\L}^{\varepsilon}, \ell) \;\leq\; \textsf{\$}(\odot, {}^{T}\textit{\L}^{\varepsilon}, \ell)$

and so the difference $\textsf{\$}(\odot, \textit{\L}^{\varepsilon}, \ell) - \textsf{\$}(\odot, G_N^{\varepsilon}, \ell)$ must be no more than the difference $\textsf{\$}(\odot, {}^{T}\textit{\L}^{\varepsilon}, \ell) - \textsf{\$}(\odot, G_N^{\varepsilon}, \ell)$; thus (13) follows from (14). Full details are set out in [15, Appendix §D].∎

More generally, (13) holds whatever the prior.

Theorem 14

Let $\ell$ be a $\kappa$-Lipschitz loss function, and $\pi$ any prior over ${\cal U}_N$. Then

$\textsf{\$}(\pi, \textit{\L}^{\varepsilon}, \ell) - \textsf{\$}(\pi, G_N^{\varepsilon}, \ell) \;\leq\; c\kappa/N~.$ (15)
Proof:

This follows as for Thm. 13, by direct calculation, noting that for discrete distributions we have $\textsf{\$}(\odot, M, \ell^{*}) = \textsf{\$}(\pi_N, M, \ell)$, where $\ell^{*}(w,x) := \ell(w,x) \times \pi_N(x) \times N$. Details are given in [15, Appendix §D].∎
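The rescaling trick used here can be checked directly on a toy discrete example. The sketch below (our own instantiation, using the Bayes-risk form of expected posterior loss $\textsf{\$}(\pi,M,\ell) = \sum_y \min_w \sum_x \pi(x)M(x,y)\ell(w,x)$) verifies the identity $\textsf{\$}(\odot,M,\ell^{*}) = \textsf{\$}(\pi_N,M,\ell)$ for a random channel, prior and loss:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5

# Random channel M, prior pi and loss ell over a finite set of guesses w.
M = rng.random((N, N)); M /= M.sum(axis=1, keepdims=True)
pi = rng.random(N); pi /= pi.sum()
ell = np.abs(np.subtract.outer(np.arange(N), np.arange(N)) / N)  # ell[w, x]

def exp_post_loss(prior, M, loss):
    # $(prior, M, loss) = sum_y min_w sum_x prior(x) M(x,y) loss(w,x)
    joint = prior[:, None] * M                 # joint[x, y]
    return sum(min(loss[w] @ joint[:, y] for w in range(loss.shape[0]))
               for y in range(M.shape[1]))

uniform = np.full(N, 1.0 / N)
ell_star = ell * pi[None, :] * N               # ell*(w,x) = ell(w,x) pi(x) N

lhs = exp_post_loss(uniform, M, ell_star)
rhs = exp_post_loss(pi, M, ell)
assert np.isclose(lhs, rhs)
print(f"$(uniform, M, ell*) = {lhs:.6f} = $(pi, M, ell)")
```

The equality holds term by term: each summand $\tfrac{1}{N} M(x,y)\,\ell(w,x)\pi_N(x)N$ reduces to $\pi_N(x) M(x,y) \ell(w,x)$, so even the inner minimum over $w$ is unaffected.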

VII-C Approximating monotonic functions

The final piece needed to complete our generalisation of Ghosh et al.'s optimality theorem is monotonicity. We describe here how to approximate continuous monotone functions, and expose the limitations for the Laplace mechanism.

Definition 15

The loss function $\ell : {\cal W} \times {\cal X} \mathbin{\rightarrow} {\mathbb{R}}$ is said to be monotone if there is some mapping $\theta : {\cal W} \mathbin{\rightarrow} [0,1]$ such that

$\ell(w,x) \;:=\; m(|\theta(w) - x|,\, x)~,$

where $m : {\mathbb{R}} \times {\mathbb{R}} \mathbin{\rightarrow} {\mathbb{R}}$ is monotone in its first argument.

Notice how $\theta$ takes care of any remapping that might need to be applied when computing expected losses. Interestingly, step functions are not in general monotone on the whole of the continuous input $[0,1]$, but fortunately they are for the restrictions to ${\cal U}_N$ that we need.

Lemma 16

Let $\ell$ be monotone. Then the approximation $\ell_T$ restricted to ${\cal U}_N$ is monotone whenever $T$ is a multiple of $N$.

Proof:

If $x \in {\cal U}_N$ then $\lfloor x \rfloor_T = x$, since $N$ divides $T$. ∎
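This one-line proof can be sanity-checked mechanically. The sketch below (with ${\cal U}_N = \{0, 1/N, \dots, (N{-}1)/N\}$ and $\lfloor x\rfloor_T = \lfloor xT\rfloor/T$, our reading of the paper's conventions) uses exact rationals to avoid floating-point artefacts:

```python
from fractions import Fraction

def floor_T(x, T):
    # Snap x down to the grid of width 1/T:  floor_T(x) = floor(x*T) / T
    return Fraction(int(x * T), T)

N, T = 5, 20                          # T a multiple of N
U_N = [Fraction(k, N) for k in range(N)]

# Every point of U_N is a fixed point of floor_T when N divides T ...
assert all(floor_T(x, T) == x for x in U_N)
# ... but not in general when it does not, e.g. T = 7:
assert any(floor_T(x, 7) != x for x in U_N)
print("floor_T(x) = x for all x in U_N when N | T")
```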

Examples of continuous monotone loss functions include $\mathpzc{len}$ and $\mathpzc{len}^2$, where $x, w \in [0,1]$ and

$\mathpzc{len}(w,x) \;:=\; |x - w|~.$ (16)

Note that $\mathpzc{len}$ is 1-Lipschitz and $\mathpzc{len}^2$ is 2-Lipschitz.
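These constants are easy to confirm numerically; the sketch below estimates the Lipschitz constant in $x$ by finite differences over a grid (a check, not a proof):

```python
import numpy as np

def lipschitz_in_x(loss, n=2001):
    # Largest slope |loss(w,x) - loss(w,x')| / |x - x'| over adjacent grid
    # points, for many w -- a lower estimate of the Lipschitz constant in x.
    xs = np.linspace(0.0, 1.0, n)
    ws = np.linspace(0.0, 1.0, 201)
    vals = loss(ws[:, None], xs[None, :])          # vals[w, x]
    slopes = np.abs(np.diff(vals, axis=1)) / np.diff(xs)
    return slopes.max()

length  = lambda w, x: np.abs(x - w)               # len
length2 = lambda w, x: (x - w) ** 2                # len^2

assert lipschitz_in_x(length)  <= 1.0 + 1e-9       # len is 1-Lipschitz
assert lipschitz_in_x(length2) <= 2.0 + 1e-9       # len^2 is 2-Lipschitz
print("grid check passed: len within slope 1, len^2 within slope 2")
```

For $\mathpzc{len}^2$ the bound follows analytically too: its derivative in $x$ is $2(x-w)$, which on $[0,1]$ never exceeds 2 in magnitude.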

We note finally that as the pixellation $N$ of $\ell$ increases, the approximations $\ell_N$ converge to $\ell$.

VIII Universal optimality for the Laplace Mechanism

We finally have all the pieces in place to prove our main result, Thm. 5 from §V-C: the generalisation of the discrete optimality result [1].

Theorem 5

Let $K^{\varepsilon}$ be any continuous $\varepsilon$-DP mechanism with input $[0,1]$, and let $\pi$ be a (continuous) probability distribution over $[0,1]$ and $\ell$ a legal (i.e. continuous, monotone, $\kappa$-Lipschitz) loss function. Then:

$\textsf{\$}(\pi, \textit{\L}^{\varepsilon}, \ell) \;\leq\; \textsf{\$}(\pi, K^{\varepsilon}, \ell)~.$ (17)
Proof:

We use the above results to approximate the expected posterior loss by step functions; these approximations are equivalent to posterior losses over discrete mechanisms satisfying $\varepsilon$-DP, enabling an appeal to Ghosh et al.'s universal-optimality result for discrete mechanisms. We reason as follows:


$\textsf{\$}(\pi, K^{\varepsilon}, \ell_N) \times e^{\varepsilon/N}$
$\geq\; \textsf{\$}(\pi_N, K_N^{\varepsilon}, \ell_N)$ ``Lem. 9''
$\geq\; \textsf{\$}(\pi_N, G_N^{\varepsilon}, \ell_N)$ ``Lem. 10: $\ell_N$ is legal by Lem. 16''
$\geq\; \textsf{\$}(\pi_N, \textit{\L}^{\varepsilon}, \ell_N) - c\kappa/N$ ``Thm. 14; $\ell_N$ is $\kappa$-Lipschitz''
$\geq\; \textsf{\$}(\pi, \textit{\L}^{\varepsilon}, \ell_N) \times e^{-\varepsilon/N} - c\kappa/N~.$ ``Lem. 9''

The result now follows by taking $N$ to $\infty$, noting that $e^{\varepsilon/N}$, $e^{-\varepsilon/N}$, $c\kappa/N$ and $\ell_N$ converge to $1, 1, 0, \ell$ respectively, and that taking expected values over fixed distributions is continuous. ∎

Note that Thm. 5 only holds for mechanisms that are $\varepsilon$-DP. An arbitrary embedding $K_N$ is not necessarily $\varepsilon$-DP, and in particular Thm. 5 does not apply to $G_N^{\varepsilon}$. Also the continuity requirement on $\ell$ is needed, because $\ell_N$ must be monotone for all $N$. Arbitrary step functions do not satisfy this property, and so the Laplace mechanism is not in general universally optimal wrt. arbitrary step functions. Two popular loss functions however are continuous, and thus we have the following corollary.

Corollary 17

The Laplace mechanism is universally optimal for $\mathpzc{len}$ and $\mathpzc{len}^2$.

IX Related work

The study of (universally) optimal mechanisms is one way to understand the cost of the obfuscation needed to implement privacy, with its concomitant loss of utility for queries to databases. Pai and Roth [18] provide a detailed survey of the principles underlying the design of mechanisms, including the need to trade utility against privacy, and Dinur and Nissim [19] explore how much noise needs to be added to database queries relative to the usefulness of the data released. Our use of loss functions to measure utility follows both Ghosh et al. [1] and Alvim et al. [9], and concerns optimality for entire mechanisms that satisfy a particular level of $\varepsilon$-DP. For example, the mean error $\mathpzc{len}$ and the mean squared error $\mathpzc{len}^2$ can be used to evaluate loss, as described by Ghosh et al. [1] and mentioned in §VII-C.

The use of the Laplace mechanism to implement differential privacy has been described for example by Dwork and Roth [20]. Moreover Chatzikokolakis et al. [4] showed how it satisfies $\varepsilon$-DP as formulated here, using the Euclidean metric.

Whilst rare, optimality results avoid the need to design bespoke implementations of privacy mechanisms tailored to particular datasets. The Geometric mechanism appears to be special for discrete inputs, as Ghosh et al. [1] showed when utility is measured using their ``legal'' loss functions. On the other hand, although the Laplace mechanism continues to be a popular obfuscation mechanism, its deficiencies in terms of utility have been demonstrated by others when the inputs to the obfuscation are discrete [21], where the optimisation is based on minimising the probability of reporting an incorrect value, subject to the $\varepsilon$-DP constraint. Similarly Geng et al. [22] show that adding noise according to a kind of ``pixellated'' distribution appears to produce the best utility for arbitrary discrete datasets. Such examples are consistent with our Thm. 12, which shows that the Laplace mechanism is a refinement of the Geometric mechanism (i.e. loses more utility) when restricted to a discrete input (to the obfuscation). We mention also that optimal mechanisms have been studied by Gupte and Sundararajan [23] wrt. minimax agents, rather than average-case minimising agents.

Other works have shown that the Laplace mechanism is optimal for metric differential privacy in particular non-Bayesian scenarios. Koufogiannis et al. [24] show that the Laplace mechanism is optimal for the mean-squared error function under Lipschitz privacy, which is equivalent to metric differential privacy; and Wang et al. [25] show that the Laplace mechanism minimises loss measured by Shannon entropy, again for metric differential privacy. Our result on ${\mathbb{R}}$ includes those results as specific cases; however, those works go further in that they demonstrate optimality for their respective loss functions on ${\mathbb{R}}^n$. We leave the study of those domains in the Bayesian setting to future work.

We also note that a linear ordering of the underlying query results seems to be important for finding optimality results. For example Brenner and Nissim [26] demonstrated that for non-linearly-ordered inputs there are no optimal $\varepsilon$-DP mechanisms for a fixed level of $\varepsilon$. Although their result finds that only counting queries have optimal mechanisms, their context (oblivious mechanisms on database queries) does not include the possibility of continuous-valued query results with a linear order; our result does not contradict their impossibility result, and can be seen rather as an extension of it to a continuous setting.

Alvim et al. [27] also use the framework of Quantitative Information Flow to study the relationship between the privacy and the utility of $\varepsilon$-DP mechanisms. In their work they model utility in terms of a leakage measure, where leakage is defined as the ratio of input gain to output gain wrt. a mechanism modelled as a channel. Their gain is entirely dual to our loss here, and models an adversary trying to infer as much as possible about the secret input. Other notions of optimality have also been studied in respect of showing that the Laplace mechanism is not optimal: Asi and Duchi [28] work with ``near instance'' optimality, and Geng et al. [22] show how to scale the Laplace in various ways to obtain good utility. Note also that these definitions of optimality do not use a prior, and therefore represent the special case of utility per exact input, rather than a scenario where the observer's prior knowledge is included.

The use of the Laplace mechanism in real privacy applications has been demonstrated by Chatzikokolakis et al. [4] for geolocation privacy, by Wang et al. [29] for privacy-preserving graph analysis, and by Phan et al. [30] in deep learning.

Information-theoretic channel models for studying differential privacy were originally proposed by Alvim et al. [7, 27], and extended to arbitrary metrics in [4].

X Conclusion

We have studied the relationship between differential privacy (good) and loss of utility (bad) when the input space ${\cal X}$ is an interval of the reals, rather than a discrete space as in the optimality result of Ghosh et al. [1, 31]. Here we have used the continuous interval $[0,1]$ as input space; we note that the result extends straightforwardly to any finite interval $[a,b]$ of ${\mathbb{R}}$. Our result also imposes the condition that the loss functions be $\kappa$-Lipschitz for some finite $\kappa$; we do not know whether this condition can be removed in general.

We observe that for $N$-step loss functions the Laplace mechanism is not optimal; in fact a bespoke Geometric mechanism will be optimal for such loss functions. However our Thm. 14 provides a way to estimate the error, relative to the optimal loss, of using the Laplace mechanism.
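Thm. 14 thus gives a simple recipe for choosing the discretisation: for a given $\varepsilon$, a $\kappa$-Lipschitz loss and a target error $\delta$, any $N \geq c\kappa/\delta$ suffices, with $c = 3/(1-e^{-\varepsilon})^2$ as in Thm. 13. A small helper of our own making:

```python
import math

def laplace_error_bound(eps, kappa, N):
    """Upper bound c*kappa/N (Thm. 14) on the extra expected loss of the
    Laplace mechanism over the optimal Geometric mechanism on U_N."""
    c = 3.0 / (1.0 - math.exp(-eps)) ** 2
    return c * kappa / N

def min_N(eps, kappa, target):
    """Smallest N guaranteeing that the bound is at most `target`."""
    c = 3.0 / (1.0 - math.exp(-eps)) ** 2
    return math.ceil(c * kappa / target)

# e.g. eps = 1 with the 1-Lipschitz loss len and target error 0.01:
N = min_N(1.0, 1.0, 0.01)
assert laplace_error_bound(1.0, 1.0, N) <= 0.01
print(f"N = {N} suffices for error at most 0.01 at eps = 1")
```

Note that the bound grows like $1/\varepsilon^2$ as $\varepsilon \to 0$, so stronger privacy requires a finer discretisation to keep the same guarantee.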

Finally we note that the space of $\varepsilon$-DP mechanisms is very rich, even for discrete inputs, suggesting that the optimality result given here will be useful whenever the input domain can be linearly ordered.

Acknowledgements

We thank Catuscia Palamidessi for suggesting this problem to us.

The appendices may be found at [15].

References

  • [1] A. Ghosh, T. Roughgarden, and M. Sundararajan, ``Universally utility-maximizing privacy mechanisms,'' SIAM J. Comput., vol. 41, no. 6, pp. 1673–1693, 2012.
  • [2] M. S. Alvim, K. Chatzikokolakis, A. McIver, C. Morgan, C. Palamidessi, and G. Smith, ``Additive and multiplicative notions of leakage, and their capacities,'' in IEEE 27th Computer Security Foundations Symposium, CSF 2014, Vienna, Austria, 19-22 July, 2014.   IEEE, 2014, pp. 308–322. [Online]. Available: http://dx.doi.org/10.1109/CSF.2014.29
  • [3] C. Dwork, F. McSherry, K. Nissim, and A. D. Smith, ``Calibrating noise to sensitivity in private data analysis,'' in Theory of Cryptography, Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006, Proceedings, ser. Lecture Notes in Computer Science, S. Halevi and T. Rabin, Eds., vol. 3876.   Springer, 2006, pp. 265–284. [Online]. Available: https://doi.org/10.1007/11681878_14
  • [4] K. Chatzikokolakis, M. Andrés, N. Bordenabe, and C. Palamidessi, ``Broadening the scope of differential privacy using metrics,'' in International Symposium on Privacy Enhancing Technologies Symposium, ser. LNCS, vol. 7981.   Springer, 2013.
  • [5] K. Chatzikokolakis, N. Fernandes, and C. Palamidessi, ``Comparing systems: Max-case refinement orders and application to differential privacy,'' in Proc. CSF.   IEEE Press, 2019.
  • [6] C. Shannon, ``A mathematical theory of communication,'' Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, 1948.
  • [7] M. S. Alvim, M. E. Andrés, K. Chatzikokolakis, and C. Palamidessi, ``On the relation between differential privacy and quantitative information flow,'' in Automata, Languages and Programming - 38th International Colloquium, ICALP 2011, Zurich, Switzerland, July 4-8, 2011, Proceedings, Part II, 2011, pp. 60–76. [Online]. Available: https://doi.org/10.1007/978-3-642-22012-8_4
  • [8] M. S. Alvim, K. Chatzikokolakis, A. McIver, C. Morgan, C. Palamidessi, and G. Smith, The Science of Quantitative Information Flow, ser. Information Security and Cryptography.   Springer International Publishing, 2020.
  • [9] M. S. Alvim, K. Chatzikokolakis, C. Palamidessi, and G. Smith, ``Measuring information leakage using generalized gain functions,'' in Proc. 25th IEEE Computer Security Foundations Symposium (CSF 2012), Jun. 2012, pp. 265–279.
  • [10] A. McIver, C. Morgan, L. Meinicke, G. Smith, and B. Espinoza, ``Abstract channels, gain functions and the information order,'' in FCS 2013 Workshop on Foundations of Computer Security, 2013.
  • [11] S. Rachev and L. Ruschendorf, Mass transportation problems.   Springer, 1998, vol. 1.
  • [12] Y. Deng and W. Du, ``The Kantorovich Metric in computer science: A brief survey,'' Electron. Notes Theor. Comput. Sci., vol. 253, no. 3, pp. 73–82, Nov. 2009. [Online]. Available: http://dx.doi.org/10.1016/j.entcs.2009.10.006
  • [13] A. McIver, L. Meinicke, and C. Morgan, ``A Kantorovich-monadic powerdomain for information hiding, with probability and nondeterminism,'' in Proc. LiCS 2012, 2012.
  • [14] E. Lawler, Combinatorial optimization: Networks and Matroids.   Holt, Rinehart and Winston, 1976.
  • [15] N. Fernandes, A. McIver, and C. Morgan, ``The Laplace Mechanism has optimal utility for differential privacy over continuous queries,'' April 2021, full version of this paper with appendices. [Online]. Available: http://www.cse.unsw.edu.au/~carrollm/LiCS2021-210420.pdf
  • [16] P. Meyer-Nieberg, Banach Lattices.   Springer-Verlag, 1991.
  • [17] E. Wilson, ``First and second laws of error,'' JASA, vol. 18, no. 143, 1923.
  • [18] M. M. Pai and A. Roth, ``Privacy and mechanism design,'' SIGecom Exch., vol. 12, no. 1, pp. 8–29, 2013. [Online]. Available: https://doi.org/10.1145/2509013.2509016
  • [19] I. Dinur and K. Nissim, ``Revealing information while preserving privacy,'' in Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 9-12, 2003, San Diego, CA, USA, F. Neven, C. Beeri, and T. Milo, Eds.   ACM, 2003, pp. 202–210. [Online]. Available: https://doi.org/10.1145/773153.773173
  • [20] C. Dwork and A. Roth, ``The algorithmic foundations of differential privacy,'' Foundations and Trends in Theoretical Computer Science, vol. 9, no. 3-4, pp. 211–407, 2014.
  • [21] J. Soria-Comas and J. Domingo-Ferrer, ``Optimal data-independent noise for differential privacy,'' Information Sciences, vol. 250, pp. 200–214, 2012.
  • [22] Q. Geng, P. Kairouz, S. Oh, and P. Viswanath, ``The staircase mechanism in differential privacy,'' IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 7, 2015.
  • [23] M. Gupte and M. Sundararajan, ``Universally optimal privacy mechanisms for minimax agents,'' in Proc. Symp. Principles of Database Sytems, ser. PODS '10.   New York, NY, USA: Association for Computing Machinery, 2010, pp. 135–146. [Online]. Available: https://doi.org/10.1145/1807085.1807105
  • [24] F. Koufogiannis, S. Han, and G. J. Pappas, ``Optimality of the Laplace mechanism in differential privacy,'' arXiv preprint arXiv:1504.00065, 2015.
  • [25] Y. Wang, Z. Huang, S. Mitra, and G. E. Dullerud, ``Entropy-minimizing mechanism for differential privacy of discrete-time linear feedback systems,'' in 53rd IEEE conference on decision and control.   IEEE, 2014, pp. 2130–2135.
  • [26] H. Brenner and K. Nissim, ``Impossibility of differentially private universally optimal mechanisms,'' in 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.   Los Alamitos, CA, USA: IEEE Computer Society, Oct. 2010, pp. 71–80. [Online]. Available: https://doi.ieeecomputersociety.org/10.1109/FOCS.2010.13
  • [27] M. S. Alvim, M. E. Andrés, K. Chatzikokolakis, P. Degano, and C. Palamidessi, ``On the information leakage of differentially-private mechanisms,'' Journal of Computer Security, vol. 23, no. 4, pp. 427–469, 2015. [Online]. Available: https://doi.org/10.3233/JCS-150528
  • [28] H. Asi and J. C. Duchi, ``Near instance-optimality in differential privacy,'' arXiv preprint arXiv:2005.10630, 2020.
  • [29] Y. Wang, X. Wu, and L. Wu, ``Differential privacy preserving spectral graph analysis,'' in Advances in Knowledge Discovery and Data Mining, J. Pei, V. S. Tseng, L. Cao, H. Motoda, and G. Xu, Eds.   Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 329–340.
  • [30] N. Phan, X. Wu, H. Hu, and D. Dou, ``Adaptive laplace mechanism: Differential privacy preservation in deep learning,'' in 2017 IEEE International Conference on Data Mining (ICDM), 2017, pp. 385–394.
  • [31] A. Ghosh, T. Roughgarden, and M. Sundararajan, ``Universally utility-maximizing privacy mechanisms,'' in Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, ser. STOC '09.   New York, NY, USA: Association for Computing Machinery, 2009, pp. 351–360. [Online]. Available: https://doi.org/10.1145/1536414.1536464