The Laplace Mechanism has optimal utility for differential privacy over continuous queries
Abstract
Differential Privacy protects individuals' data when statistical queries are published from aggregated databases: applying ``obfuscating'' mechanisms to the query results makes the released information less specific but, unavoidably, also decreases its utility. Yet it has been shown that for discrete data (e.g. counting queries), a mandated degree of privacy and a reasonable interpretation of loss of utility, the Geometric obfuscating mechanism is optimal: it loses as little utility as possible [Ghosh et al.[1]].
For continuous query results however (e.g. real numbers) the optimality result does not hold. Our contribution here is to show that optimality is regained by using the Laplace mechanism for the obfuscation.
The technical apparatus involved includes the earlier discrete result [Ghosh op. cit.], recent work on abstract channels and their geometric representation as hyper-distributions [Alvim et al.[2]], and the dual interpretations of distance between distributions provided by the Kantorovich-Rubinstein Theorem.
Index Terms:
Differential privacy, utility, Laplace mechanism, optimal mechanisms, quantitative information flow, abstract channels, hyper-distributions.
I Introduction
I-A The existing optimality result, and our extension
Differential Privacy (DP) concerns databases from which (database-) queries produce statistics: a database of information about people can be queried e.g. to reveal their average height, or how many of them are men. But a risk is that from a general statistic, specific information might be revealed about individuals' data: whether a specific person is a man, or his height, or even both. Differentially-private ``obfuscating'' mechanisms diminish that risk by perturbing their inputs (the raw query results) to produce outputs (the reported query results) that are slightly wrong in a probabilistically unpredictable way. That diminishes the personal privacy risk (good) but also diminishes the statistics' utility (bad).
The existing optimality result is that for a mandated differential privacy parameter, some ε, and under conditions we will explain, the Geometric obfuscating mechanism $G_\epsilon$ (depending on ε) loses the least utility of any ε-Differentially Private oblivious obfuscating mechanism for the same ε, that loss being caused by the observer's having to use the perturbed statistic instead of the real one [1].
A conspicuous feature of ε-DP (that is ε-differential privacy) is that it is achieved without having to know the nature of the individual's privacy that it is protecting: it is simply made ``ε-difficult'' to determine whether any of his data is in the database at all. Similarly the minimisation of an observer's loss (of utility) is achieved by the optimal obfuscation without knowing precisely how the obfuscation affects her: instead, the existence of a ``loss function'' is postulated that monetises her loss (think ``dollars'') based on the raw query (which she does not know) and the obfuscated query (which she does know) — and optimality of $G_\epsilon$ holds wrt. all loss functions (within certain realistic constraints) and all (other) ε-DP mechanisms.
In summary: The existing result states that the ε-DP Geometric obfuscating mechanism minimises loss of utility to an observer when the query results are discrete, e.g. counting queries in some {0,…,N}, and certain reasonable constraints apply to the monetisation of loss. But the result does not hold when the query results are continuous, e.g. in the unit interval [0,1]. We show that optimality is regained by using the ε-DP Laplace mechanism $L_\epsilon$.
II Differential privacy, loss of utility, and optimality
II-A Differential privacy
Differential privacy begins with a database that is a multiset of rows drawn from some set R [3]; thus the type of a database is $\mathbb B R$ (using ``$\mathbb B$'' for ``bag''). A query is a function from a database to a query-result in some set X, the input of the mechanism, and is thus of type $\mathbb B R \to X$.
A distance function between databases measures how different two databases are from each other. Often used is the Hamming distance H (also known as the symmetric distance), which gives (as an integer) how many whole rows would have to be removed or inserted to transform one database into another: given two databases we define their H-distance to be the size of their (multiset-) symmetric difference. Thus in particular two databases that differ only because a row has been removed from one of them have Hamming distance 1, and we say that such databases are adjacent.
We define also a distance function (metric) between –for the moment– discrete distributions over a set Y of observations, the output of the mechanism. Given two distributions $\delta_{1,2}$ on Y, their distance $d_{\mathcal D}$ (for ``Dwork'') is based on the largest ratio, over all subsets Z of Y, between the probabilities assigned to Z by $\delta_1$ and by $\delta_2$ — it is

$$d_{\mathcal D}(\delta_1,\delta_2) \;=\; \sup_{Z \subseteq Y}\,\Bigl|\,\ln \frac{\delta_1(Z)}{\delta_2(Z)}\,\Bigr| \tag{1}$$

where $\delta(Z)$ is the probability δ assigns to the whole subset Z of Y, and the logarithm is introduced to make the distance satisfy the triangle inequality that metrics require. (This distance is also known as ``max divergence''.)
Following the presentation of Chatzikokolakis et al. [4], once we have chosen a metric d on databases, we say that a mechanism M achieves ε-Differential Privacy wrt. that d and some query f, i.e. is (ε/d)-DP for f, just when

$$d_{\mathcal D}\bigl(M(f(D_1)),\,M(f(D_2))\bigr) \;\le\; \epsilon\,\mathrm d(D_1,D_2) \quad\text{for all databases } D_1,D_2. \tag{2}$$
In the special case when d is the Hamming distance H, the above definition becomes

$$d_{\mathcal D}\bigl(M(f(D_1)),\,M(f(D_2))\bigr) \;\le\; \epsilon \quad\text{for all adjacent databases } D_1,D_2. \tag{3}$$
With the above metric-based point of view we can say that an ε-DP mechanism is (simply) an ε-Lipschitz function from databases with metric d to distributions of observations with metric $d_{\mathcal D}$ [4].
Definition 1
((ε/d)-DP for mechanisms) A Lipschitz mechanism M from X (raw query outputs) to $\mathbb D Y$ (observations) is (ε/d)-Differentially Private just when

$$d_{\mathcal D}\bigl(M(x_1),\,M(x_2)\bigr) \;\le\; \epsilon\,\mathrm d(x_1,x_2) \quad\text{for all } x_1,x_2 \text{ in } X, \tag{4}$$

in which we elide ε and d when they are clear from context. In (2) we gave the special case where M's inputs were raw query-results $f(D_{1,2})$, i.e. with two databases acted on by the same query-function f. And (3) was further specialised to where the two databases were adjacent and the metric was H, the Hamming distance.
II-B ``Counting'' queries
Counting queries on databases are the special case where the codomain of the query (the mechanism input) is the non-negative integers and the query returns the number of database rows satisfying some criterion, like ``being a man''. The ``average height'' query is not a counting query.
When the database metric is the Hamming distance H, a counting query can be characterised more generally as one that is a 1-Lipschitz function wrt. H and the usual metric (absolute difference) on the integers, i.e. one whose result changes by at most 1 between adjacent databases. Since composition of Lipschitz functions (merely) multiplies their Lipschitz factors, the composition of a counting query and an obfuscating mechanism is ε-DP as a whole if the mechanism on its own (i.e. without the query, acting ``obliviously'' on X) is ε-Lipschitz. That is why for counting queries we can concentrate on the mechanisms alone (whose type is $X \to \mathbb D Y$) rather than including the databases and their type $\mathbb B R$ in our analysis.
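In symbols, the composition argument is the short chain below — a sketch using only the definitions above, with f a 1-Lipschitz counting query and M an ε-Lipschitz mechanism as in Def. 1:

$$d_{\mathcal D}\bigl(M(f(D_1)),\,M(f(D_2))\bigr) \;\le\; \epsilon\,\bigl|f(D_1)-f(D_2)\bigr| \;\le\; \epsilon\,H(D_1,D_2),$$

which is exactly the ε-DP condition (2) for the composition.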
II-C Prior knowledge, open-source and the observer
Although the database contents are not (generally) known, often the distribution of its query results is known: this is ``prior knowledge'', where e.g. it is known that a database of heights in the Netherlands is likely to contain higher values than a similar database in other countries — and that knowledge is different from the (unknown) fact we are trying to protect, i.e. whether a particular person's height is in that database.
We abstract from prior knowledge of the database by concentrating instead on the prior knowledge π of the distribution of raw queries, the inputs to the mechanism, that is induced as the push-forward of the query-function (an ``open source'' aggregating function) over the known distribution of possible databases themselves. Knowing π on the input in X, the observer can use her knowledge of the mechanism (also open source) to deduce a distribution on the output observations in Y that will result from applying it and –further– she can also deduce a posterior distribution on X based on any particular y in Y that she observes.
II-D The Geometric mechanism is ε-DP for H
II-D1 Specialising to the Hamming distance
Recall from §II-B that H, the Hamming distance, is what is typically used for counting queries. In that case we see as follows from (2) that the Geometric mechanism can be made ε-DP.
The Geometric distribution centred on 0 with parameter α assigns (discrete) probability

$$\frac{1-\alpha}{1+\alpha}\;\alpha^{|z|} \tag{5}$$

to any integer z (positive or negative) [1]. It implements an ε-DP Geometric mechanism $G_\epsilon$ by obfuscating the query according to (5) above: thus set $\alpha = e^{-\epsilon}$ and define

$$G_\epsilon(x)(y) \;=\; \frac{1-\alpha}{1+\alpha}\;\alpha^{|x-y|} \tag{6}$$

to be the probability that when integer x is input, integer y is output. Thus applied to some x, the effect of $G_\epsilon$ (with $\alpha = e^{-\epsilon}$, so that $\epsilon = \ln(1/\alpha)$) is to leave x as it is with probability $\frac{1-\alpha}{1+\alpha}$ and to split the remaining probability equally between adding 1's or subtracting them: it continues (in the same direction) with repeated probability α until, with probability 1−α, it stops.
As explained in §II-C, we now concentrate on $G_\epsilon$ alone and how it perturbs its input (a query result), i.e. no longer considering the database from which the query came. (Note that although the (raw) query is output from the database, it is input to the obfuscating mechanism. That is why we refer to x as ``input''.)
II-D2 The geometric mechanism truncated
In (6) the mechanism can effect arbitrarily large perturbations. But in practice its output is constrained (in the discrete case) to a finite set {0,…,N} by (re-)assigning all probabilities for negative observations to observation 0, and all probabilities for observations greater than N to N. For example with α = 1/2 and restricting to {0,1,2} we have $G_\epsilon(0)(0) = 2/3$ and $G_\epsilon(0)(1) = 1/6$ and $G_\epsilon(0)(2) = 1/6$. It can be shown [5] however that truncation makes no difference to our results, and so from here on we will assume that truncation has been applied to $G_\epsilon$.
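To make the truncation concrete, here is a minimal computational sketch (ours, not from [1]; the helper name truncated_geometric is our own): it builds the truncated Geometric channel on inputs and outputs {0,…,N} for a given α, then checks that each row is a distribution and that entries in the same column, for adjacent inputs, differ by a factor of at most 1/α = e^ε.

```python
# A minimal sketch (not from [1]): the truncated Geometric channel on
# {0..N}, with the two tails' mass swept onto the endpoints 0 and N.
from fractions import Fraction

def truncated_geometric(N, alpha):
    C = []
    for x in range(N + 1):
        row = []
        for y in range(N + 1):
            if y == 0:                              # all mass at or below 0
                p = alpha ** x / (1 + alpha)
            elif y == N:                            # all mass at or above N
                p = alpha ** (N - x) / (1 + alpha)
            else:                                   # interior outputs keep the geometric mass
                p = (1 - alpha) * alpha ** abs(x - y) / (1 + alpha)
            row.append(p)
        C.append(row)
    return C

alpha = Fraction(1, 2)
C = truncated_geometric(2, alpha)
print(C[0])                                         # Fractions 2/3, 1/6, 1/6, as in the text
assert all(sum(row) == 1 for row in C)              # rows are distributions
for y in range(3):                                  # adjacent inputs: column ratio within 1/alpha
    for x in range(2):
        assert C[x][y] * alpha <= C[x + 1][y] <= C[x][y] / alpha
```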
II-E Discrete optimality
It has been shown [1] that when X is discrete (and hence the prior π on it is also), and when the obfuscation is via $G_\epsilon$, and when the observer applies a ``loss function'' ℓ of her choice to monetise the loss of utility to her if the raw query was x but she assumes it was x′, then any other ε-DP mechanism acting on X can only lose more utility (on average) according to that π and ℓ than $G_\epsilon$ does. That is, the ε-DP Geometric mechanism is optimal for minimising loss (maximising utility) over all priors and all (``legal'') loss functions under a mandated ε-DP obfuscation. A loss function is said to be legal if it is monotone (increasing) wrt. the difference between the guess (x′) and the actual value (x) of the query. As explained in [1] this means that the loss takes the form of a function $\ell(|x'{-}x|,\,x)$, which must be monotone (increasing) in its first argument.
II-F The Geometric mechanism is never ε-DP on dense continuous inputs, e.g. when d on X is Euclidean
If the input metric for $G_\epsilon$ is not the Hamming distance H, e.g. when $G_\epsilon$'s input is continuous, still $G_\epsilon$'s output remains discrete, taking some number of steps, each of fixed length say s, in either direction. That is, any input x is perturbed to $x + ks$ for some integer k.
Now because X is continuous and dense, we can vary the input x itself, by a tiny amount δ ≠ 0, to some $x' = x + \delta$, no matter how small δ might be, producing perturbations $x' + ks$ each of which is distant that same (constant) δ from the original $x + ks$ and, precisely because δ ≠ 0, those new perturbations cannot overlap the ones based on the original x.
Thus the two distributions produced by $G_\epsilon$ acting on x and on x′ have supports that do not intersect at all. And therefore the $d_{\mathcal D}$-distance between the two distributions is infinite, meaning that $G_\epsilon$ cannot be ε-DP for any (finite) ε. That is, for a database producing truly real query results, a standard (discrete) $G_\epsilon$ cannot establish ε-DP for any ε, however large it might be.
There are two possible solutions. The first solution, both obvious and practical, is to ``discretise'' the input and to scale ε appropriately: a person's height, say in metres, would instead become a whole number of centimetres. A second solution however is motivated by taking a more theoretical approach. Rather than discretise the type of the query results, we leave it continuous — and seek our optimal mechanism among those that –unlike the Geometric– do not take only discrete steps. It will turn out to be based on the Laplace distribution.
II-G Our result — continuous optimality
In the discrete case typically the set of raw queries is {0,…,N} for some N, and the prior knowledge is a (discrete) distribution on that. For our continuous setting we will use for raw queries the unit interval [0,1], and the discrete distribution will become a proper measure on [0,1] expressed as a probability density function. The ε-DP obfuscating mechanisms, now K for ``kontinuous'', will take a raw query from a continuous set rather than a discrete one. And the metric on [0,1] will be Euclidean.
Our (continuous) optimality result, formalised at Thm. 5, is that the ε-DP (truncated) Laplace mechanism minimises loss over all continuous priors π on [0,1] and all legal loss functions, under a mandated ε-DP obfuscation with respect to the Euclidean metric on the continuous input [0,1].
III An outline of the proof
We access the existing discrete results in §I-A and §II-E from within the continuous [0,1] by ``pixelating'' it, that is defining $X_N = \{0, \tfrac1N, \tfrac2N, \dots, 1\}$ for integer N, and mapping {0,…,N} isomorphically onto that discrete subset. We then establish near optimality for a similarly pixelated Laplace mechanism, showing that ``near'' becomes ``equal to'' when N tends to infinity. In more detail:
(a) (We show in §VI-B that) Any (discrete) prior on $X_N$ corresponds to some prior on the original {0,…,N}, but can also be obtained by pixelating some continuous prior π on all of [0,1], concentrating its (now discrete) probabilities onto elements of $X_N$ only: e.g. the probability of the entire (1/N)-sized interval $[\,i/N,\,(i{+}1)/N\,)$ is moved onto the point i/N. We write it $\pi_N$.
(b) (§VI-C) Any function acting on all of [0,1] can be made into an N-step function by first restricting its inputs to $X_N$ and then filling in the ``missing'' values for x in $[0,1]\setminus X_N$ by copying the value for the nearest $X_N$-point below x. If M is an ε-DP mechanism, we write its N-stepped version as $M^N$, and note that it remains ε-DP when restricted to the points in $X_N$ only.
If ℓ is a loss function (on guesses and on X) we write $\ell^N$ for its stepped version.
(c) (§VII) The discrete optimality result of [1] then applies within the pixelation: among ε-DP mechanisms on $X_N$, a suitably rescaled Geometric mechanism $G^N_\epsilon$ loses the least utility.
(d) (§VII-B; Thm. 13) The replacement of $G^N_\epsilon$ by the Laplace (both compared as N-step functions on [0,1]) is via pixelating the output (continuous) distribution of the Laplace to a multiple of 1/M. The Kantorovich-Rubinstein Theorem, provided additionally that the loss function is κ-Lipschitz for some κ independent of N, shows that the (additive) difference between the Geometric's loss and the Laplace's loss, for any $\pi_N$ and $\ell^N$ and M a multiple of N, tends to zero as N increases.
(e) (§VI-D) Then we remove the subscript N on the prior, and the stepping on the mechanisms, relying now on the ε-DP of the two mechanisms to make the (multiplicative) ratio between the losses they cause tend to 1.
(f) (§VIII) The final step, removing the subscript N altogether, is that the loss-calculating procedure is continuous and that the N-indexed mechanisms tend to the truncated Laplace $TL_\epsilon$ as N tends to infinity.
IV Channels; loss functions; hyper-distributions; refinement
In this section we provide a summary of the more general Quantitative Information Flow techniques that we will need for the subsequent development.
IV-A Channels, priors, marginals, posteriors
The standard treatment of information flow is via Shannon's (unreliable) channels: they take an input from say X and deliver an output that for a perfect channel would be that same input again, but for an imperfect channel might be some other y in Y instead. For example, an imperfect channel transmitting bits might ``flip'' input bits so that with probability p say an input 0 becomes an output 1 and vice versa [6]. In the discrete case, and generalising to allow outputs of possibly a different type Y, such channels are matrices whose (x-row, y-column) element is the probability that input x will produce output y. A perfect channel would be the identity matrix on X; a completely broken channel would for any x have $C(x,y) = 1/\#Y$.
The x-th row of a (channel) matrix C is $C(x,{-})$; and the y-th column is $C({-},y)$. Since each row sums to 1 (making C a stochastic matrix), the row $C(x,{-})$ determines a discrete distribution in $\mathbb D Y$; for the ``broken'' channel above it would be the uniform distribution, which we write u.
As a matrix, a channel has type $X{\times}Y \to [0,1]$ (but with 1-summing rows); isomorphically it also has type $X \to \mathbb D Y$. We'll write C for both, provided context makes it clear which one we are using.
If a prior distribution π on X is known, then the channel C can be applied to π to create a joint distribution in $\mathbb D(X{\times}Y)$ on both input and output together, written $\pi\triangleright C$ and where $(\pi\triangleright C)(x,y) = \pi(x)\cdot C(x,y)$. For that joint distribution, the left marginal $\sum_y (\pi\triangleright C)(x,y)$ gives the prior again (no matter what C might be), i.e. the probability that the input was x. The right marginal $\sum_x (\pi\triangleright C)(x,y)$ is the probability that the output is y, given both π and C.
The y-posterior distribution on X, given π,C and a particular y, is the conditional distribution on X if that y was output: it is the y-th column of the joint distribution divided by the marginal probability of that y (provided the marginal is not zero).
If we fix π and C, and use the conventional abbreviation J for the resulting joint distribution $\pi\triangleright C$, then the usual notations for the above are $J_X$ for the left marginal and $J_X(x)$ for its value at a particular x, with $J_X = \pi$, and similarly $J_Y$ for the right marginal. Then $J_{X|y}(x)$ is the posterior probability of the original input's being x when y has been observed. Further, we can write just p(x) and p(y) and p(x|y) when context makes the (missing) subscripts clear.
IV-B Loss functions; remapping
Our obfuscating mechanisms $G_\epsilon$ and (later) the Laplace are channels like C in the discrete case — the result x of the query is the channel's input, and the (perturbed) value y the observer sees is the channel's output. The loss functions will quantify the loss to her of seeing (only) y, and then choosing x′, when what she really wants to know is x. Such ε-DP mechanisms have earlier been modelled this way, i.e. as channels, by Alvim et al. [7] and Chatzikokolakis et al. [4], who observed that for ε-DP the ratios of their entries must satisfy the ε-DP constraints, because the definition at §II-A(4) reduces to comparing (multiplicatively) adjacent entries in channel columns.
The connection between the observation y and the loss-function parameter x′ is that the observer does not necessarily have to ``take what she sees'' — there might be good reasons for her making a different choice. For example, in a word-guessing game where the last, obfuscated letter ? in a word SA? is shown on the board, the observer might have to guess what it really is. Even if it looks like a blurry Y (value 4 in Scrabble), she might instead guess X (value 8) because that would earn more points on average if from prior knowledge she knows that X is strictly more than half as likely as Y — i.e. it's worth her taking the risk. Thus rather than mandating that the observer must accept what she thinks the letter is most likely to be, she uses the obfuscated query to deduce information about the whole posterior distribution of the actual query… and that might suggest that she guess some x′ ≠ y, because the expected loss of doing that is less than (the expected utility is greater than) it would be if she simply accepted the y she saw. That rational strategy is called ``remapping'' [1]. Thus she sees y, but remapping tells her that x′ is what she should choose as her least-loss-inducing guess for x. That is, the simplest strategy is ``take what you see''; but it might not be the best one. In general (and now using M again for the mechanism), we write $\mathcal L_\ell(\pi, M)$ for the expected loss to a rational observer, given the π and M she knows and the loss function ℓ she has chosen: it is

$$\mathcal L_\ell(\pi, M) \;=\; \sum_{y}\, p(y)\;\min_{x'}\,\sum_{x}\, p(x|y)\,\ell(x',x), \tag{7}$$

that is the expected value, over all possible observations y and their marginal probabilities, of the least loss she could rationally achieve over all her possible choices x′, given the knowledge that y will have provided about the posterior distribution of the actual raw input x. Note that π and M determine (from §IV-A) the p(y) and p(x|y) that appear in (7). We remark that this formulation for measuring expected loss corresponds precisely to the formulation used by Ghosh et al. in the optimality theorem [1].
IV-C The relevance of hyper-distributions, abstract channels
It is important to remember that the expected-loss formula (7) does not use the actual mechanism-output values y in any way directly: instead it takes only an expectation over what they might be. All that matters is their marginal probabilities and the posterior distributions that they induce. That allows us to abstract from Y altogether.
A hyper-distribution expresses that abstraction: it is a distribution of distributions on X alone, that is of type $\mathbb D(\mathbb D X)$; we abbreviate those as ``hyper'' and ``$\mathbb D^2 X$''. Given a joint distribution J, we write [J] for the hyper-distribution whose support is J's posterior distributions on X (in the hyper-distribution literature these are called ``inners'' [8]) and which assigns the corresponding marginal probability to each. (Zero-valued marginals are left out.) We now re-express (7) in those terms.
If we write $\ell_{x'}$ for the function on X that ℓ determines once x′ is fixed, and write $\mathbb E_{\delta}[f]$ for the expected value of random variable f with distribution δ, then $\min_{x'} \mathbb E_{p(\cdot|y)}[\ell_{x'}]$ is the inner part of (7). Then fix some ℓ and define for general distribution δ that

$$H_\ell(\delta) \;=\; \min_{x'}\, \mathbb E_{\delta}[\ell_{x'}] \tag{8}$$

(using H for ``entropy'') so that $H_\ell$ is itself a real-valued function on distributions (as e.g. Shannon entropy is). With that preparation, the expression (7) becomes the expected value of $H_\ell$ over the hyper produced by abstracting from the joint distribution as above. That is, (7) gives equivalently

$$\mathcal L_\ell(\pi, M) \;=\; \mathbb E_{[\pi\triangleright M]}\bigl[H_\ell\bigr], \tag{9}$$

in which the π and M now explicitly appear and where –we recall– the brackets [−] convert the joint distribution to a hyper. (If $H_\ell$ were in fact Shannon entropy, then (9) would be the conditional Shannon entropy. But the $H_\ell$'s are much more general than Shannon entropy alone [2, 9].)
Finally, using hypers we define an abstract channel to be a function from prior to hyper, i.e. of type $\mathbb D X \to \mathbb D^2 X$, realised from some concrete channel C as $[\,{-}\triangleright C\,]$. It is ``abstract'' because the type Y no longer appears: it is unnecessary because if M(π) is the application of M as a function to prior π, then from (9) the worst rational expected loss is written simply $\mathbb E_{M(\pi)}[H_\ell]$.
(Recall from §IV-B that this naturally takes into account the ``rational observers'' and the remapping they might perform, as described in [1].)
IV-C1 Example of a channel representation of a mechanism
If we have a discrete input X = {x₁,…,x_n} and discrete output Y = {y₁,…,y_m}, we can represent an obfuscating mechanism as a channel matrix C in the style described above.
As described in §IV-A, the row for input x corresponds to the probability distribution of outputs in Y for that x. For example the top-left number is the probability that output y₁ is observed when the input is x₁. We can interpret this as an ε-DP mechanism once we know the metric d on X. In particular §II-A(1) simplifies to comparing ratios of entries in the same column. Thus from §II-A(4), now applied to C, we can say that if C is ε-DP then it satisfies $C(x_i, y) \le e^{\epsilon\,\mathrm d(x_i, x_j)}\, C(x_j, y)$ for all inputs $x_i, x_j$ and every output y.
IV-C2 Example of a loss function calculation
Now suppose that we choose the loss function known as ``Bayes Risk'', defined on X×X as above:
$\ell(x',x) = 0$ if $x' = x$, and 1 otherwise. Letting the input prior be the uniform distribution over X, we can compute the loss by selecting, for each output y, the x′ which makes the expected value of $\ell_{x'}$ over the y-posterior the least. We then take the expected value of these least values over the marginal p(y): that is exactly (7).
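The calculation just described is mechanical, and the sketch below (our illustration; the 3×3 channel is hypothetical, chosen only to have 1-summing rows) computes the rational observer's expected loss (7) for Bayes Risk under a uniform prior.

```python
# A sketch of formula (7): expected loss to a rational (remapping)
# observer. The channel below is a hypothetical example, not the paper's.
C = [[2/3, 1/6, 1/6],
     [1/3, 1/3, 1/3],
     [1/6, 1/6, 2/3]]
prior = [1/3, 1/3, 1/3]
loss = lambda guess, actual: 0.0 if guess == actual else 1.0   # Bayes Risk

def expected_loss(prior, C, loss):
    X, Y = range(len(prior)), range(len(C[0]))
    total = 0.0
    for y in Y:
        marginal = sum(prior[x] * C[x][y] for x in X)          # p(y)
        if marginal == 0:
            continue
        posterior = [prior[x] * C[x][y] / marginal for x in X] # p(x|y)
        # the observer chooses the guess minimising her posterior expected loss
        best = min(sum(posterior[x] * loss(g, x) for x in X) for g in X)
        total += marginal * best
    return total

print(expected_loss(prior, C, loss))                           # 0.444... = 4/9 here
```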
IV-D Refinement of hypers and mechanisms
The hypers on X have a partial order ``refinement'' (written ⊑) [10] that we will need in the proof of our main result. It admits several equivalent interpretations in this context. Below, we write $\Delta_{1,2}$ etc. for general hypers in $\mathbb D^2 X$.
We have that $\Delta_1 \sqsubseteq \Delta_2$, that hyper $\Delta_1$ is refined by hyper $\Delta_2$, under any of these equivalent conditions:
(a) when $\mathbb E_{\Delta_1}[H_\ell] \,\le\, \mathbb E_{\Delta_2}[H_\ell]$ for all loss functions ℓ (i.e. whether legal or not);
(b) when there is an ``earth move'' taking $\Delta_1$ to $\Delta_2$, i.e. a redistribution of the (outer) probability that $\Delta_1$ assigns to each of its posteriors among the posteriors of $\Delta_2$, such that each posterior of $\Delta_2$ is the weighted average of the $\Delta_1$-posteriors whose mass it receives [8];
(c) when, for joint-distribution matrices $J_1$ in $\mathbb D(X{\times}Y_1)$ generating $\Delta_1$, and $J_2$ in $\mathbb D(X{\times}Y_2)$ generating $\Delta_2$, there is a ``post-processing matrix'' P of type $Y_1{\times}Y_2$ such that as matrices we have $J_2 = J_1 P$ via matrix multiplication.
And we say that one mechanism M₁ is refined by another M₂ just when $[\pi\triangleright M_1] \sqsubseteq [\pi\triangleright M_2]$ for all priors π. When this occurs we also write $M_1 \sqsubseteq M_2$. From formulation (a) we will use the fact that the ⊑-infimum of a convergent sequence of hypers (indexed over a sequence of N's) is just its limit itself [13] and [15, Lem. 20, Appendix §B].
Formulation (b) is particularly useful. If we find a specific earth move from $\Delta_1$ to $\Delta_2$ that defines a refinement, we can then use the equivalent (a) to deduce that $\mathbb E_{\Delta_1}[H_\ell] \le \mathbb E_{\Delta_2}[H_\ell]$. However if we can also compute the cost of the particular earth move (the cost is determined by the amount of ``earth'' to be moved, and the distance it must be moved; see for example [14]), we can conclude in addition that the difference must be bounded above by an amount we can compute. This follows from the well-known Kantorovich-Rubinstein duality [11], which says that $\mathbb E_{\Delta_2}[H_\ell] - \mathbb E_{\Delta_1}[H_\ell]$ is no more than the minimal cost incurred by any earth move transforming $\Delta_1$ to $\Delta_2$, scaled by the ``Lipschitz constant'' of $H_\ell$ (the Lipschitz constant of a function is the amount by which the difference in outputs can vary when compared to the difference in inputs). We use these ideas in Lem. 11 and Thm. 13.
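As a numeric illustration of those ideas (our sketch, with made-up distributions on a grid in [0,1]): in one dimension the Kantorovich distance is the area between the cumulative distribution functions, and by the duality it bounds the difference in expected values of any 1-Lipschitz function.

```python
# A sketch of Kantorovich-Rubinstein duality on a grid in [0,1]:
# the 1-D Kantorovich (earth-moving) distance is the area between
# the two cdfs, and it bounds |E_d1[f] - E_d2[f]| for 1-Lipschitz f.
pts = [i / 4 for i in range(5)]                # grid 0, 1/4, ..., 1
d1  = [0.4, 0.3, 0.2, 0.1, 0.0]                # two made-up distributions
d2  = [0.1, 0.2, 0.2, 0.3, 0.2]

def kantorovich(p, q, pts):
    acc = diff = 0.0
    for i in range(len(pts) - 1):
        diff += p[i] - q[i]                    # cdf difference on this cell
        acc  += abs(diff) * (pts[i + 1] - pts[i])
    return acc

f = lambda x: abs(x - 0.5)                     # a 1-Lipschitz test function
gap = abs(sum(p * f(x) for p, x in zip(d1, pts)) -
          sum(q * f(x) for q, x in zip(d2, pts)))
assert gap <= kantorovich(d1, d2, pts) + 1e-12
print(gap, kantorovich(d1, d2, pts))           # 0.025 <= 0.325
```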
V Measures on continuous X and Y
V-A Measures via probability density functions
Continuous analogues of the priors, mechanisms and loss functions will be our principal concern here: ultimately we will use [0,1] for our measurable spaces X and Y, and will suppose for simplicity that X = Y = [0,1]. (More generality is achieved by simple scaling.)
Measures (that is, continuous priors and mechanism outputs) will be given as probability density functions, where a pdf F say determines the probability $\int_S F(x)\,\mathrm dx$ assigned to a (measurable) sample S using the standard Borel measure on [0,1], and more generally the expected value of some random variable f on [0,1] given by pdf F is $\int_0^1 f(x)\,F(x)\,\mathrm dx$.
Even though π is of type pdf, we abuse notation to write for example π[a,b] for the probability that π assigns to that interval, and π(x) for the probability π assigns to the point x alone, i.e. some p just when the actual pdf-value of π at x is the Dirac delta-function scaled by p, written $p\,\delta_x$.
V-B Continuous mechanisms over continuous priors
Our mechanisms, up to now discrete M's, will now become ``kontinuous'', renamed K as a mnemonic. Thus a continuous mechanism K given input x produces a measure K(x) on the observations Y. And given a whole continuous prior π, that same K therefore determines a joint measure over X×Y. (See [15, Appendix §A2].) By analogy with (8,9) we have
Definition 2
(Continuous version of (7)) The expected loss due to continuous prior π, continuous mechanism K and loss function ℓ is given by

$$\mathcal L_\ell(\pi, K) \;=\; \int_Y \inf_{x'} \int_X \ell(x',x)\; K(x)(y)\;\pi(x)\;\mathrm dx\,\mathrm dy. \tag{10}$$

(This is well defined whenever the x′-indexed family of functions of y given by the inner integral contains a countable subset whose infimum equals the infimum over the whole family [16]. This is clear if X is finite, and whenever x′ can be taken to range over the rationals.)
V-C The truncated Laplace mechanism
As for the Geometric mechanism, the Laplace mechanism is based on the Laplace distribution. It is defined as follows:
Definition 3
(Laplace distribution) The ε-Laplace mechanism $L_\epsilon$, with input x in [0,1] and output y in ℝ, is usually written as a pdf in y (for given x) as [17]

$$L_\epsilon(x)(y) \;=\; \frac{\epsilon}{2}\,e^{-\epsilon|y-x|}.$$
The ε/2 is a normalising factor. It is known [4] that the mechanism satisfies ε-DP over ℝ (where the underlying metric on ℝ is Euclidean). Just as for the Geometric mechanism we truncate $L_\epsilon$'s outputs so that they also lie inside [0,1]. We do so in the same manner, by remapping all outputs greater than 1 to 1, and all outputs less than 0 to 0.
Definition 4
(truncated Laplace mechanism) As earlier for $G_\epsilon$, we truncate the Laplace mechanism to make $TL_\epsilon$, for inputs restricted to [0,1] and output restricted to [0,1], in the following way (as a pdf):

$$TL_\epsilon(x)(y) \;=\; c_0\,\delta_0(y) \;+\; \frac{\epsilon}{2}\,e^{-\epsilon|y-x|} \;+\; c_1\,\delta_1(y) \quad\text{for } y \text{ in } [0,1],$$

where the constants are $c_0 = \tfrac12 e^{-\epsilon x}$ and $c_1 = \tfrac12 e^{-\epsilon(1-x)}$ respectively, and $\delta_p$ is the Dirac delta-function with weight 1 at p.
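As a consistency check on the constants (as we have reconstructed them above), the point masses are exactly the tail probabilities that truncation sweeps onto the endpoints: for x in [0,1],

$$c_0 \;=\; \int_{-\infty}^{0}\frac{\epsilon}{2}\,e^{-\epsilon(x-y)}\,\mathrm dy \;=\; \tfrac12\,e^{-\epsilon x} \qquad\text{and}\qquad c_1 \;=\; \int_{1}^{\infty}\frac{\epsilon}{2}\,e^{-\epsilon(y-x)}\,\mathrm dy \;=\; \tfrac12\,e^{-\epsilon(1-x)},$$

so that $c_0 + \int_0^1 \frac{\epsilon}{2}e^{-\epsilon|y-x|}\,\mathrm dy + c_1 = 1$, as a pdf must.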
We can now state our principal contribution. It is to show that the truncated Laplace $TL_\epsilon$ is universally optimal, in this continuous setting, in the same way that $G_\epsilon$ was optimal in the discrete setting:
Theorem 5
(truncated Laplace is optimal) Let K be any continuous ε-DP mechanism with input and output both [0,1], and let π be any continuous (prior) probability distribution over [0,1] and ℓ any Lipschitz-continuous (Lipschitz continuity is less general than continuity: it means that the difference in outputs is within a constant scaling factor of the difference between the inputs), legal loss function on [0,1].
Then $\mathcal L_\ell(\pi,\,TL_\epsilon) \;\le\; \mathcal L_\ell(\pi,\,K)$.
As we foreshadowed in the proof outline in §III, Thm. 5 relies ultimately on the earlier-proven optimality in the discrete case: we must show how we can approximate continuous ε-DP mechanisms in discrete form, each approximation satisfying the conditions under which the earlier result applies; in §VI we fill in the details. Along the way we show how the Laplace mechanism provides a smooth approximation to the Geometric with discrete inputs.
VI Approximating Continuity for [0,1]
VI-A Connecting continuous and discrete
Our principal tool for connecting the discrete and continuous settings is the evenly-spaced discrete subset $X_N = \{0, \tfrac1N, \tfrac2N, \dots, 1\}$ of the unit interval, for ever-increasing N.
The separation 1/N is the interval width.
VI-B Approximations of continuous priors
The N-approximation of prior π is $\pi_N$, of type $\mathbb D X_N$, i.e. yielding actual probabilities (not densities), and is defined

$$\pi_N(i/N) \;=\; \pi\bigl[\,i/N,\;(i{+}1)/N\,\bigr) \;\text{ for } 0 \le i < N, \qquad \pi_N(1) \;=\; \pi[1,1].$$

The discrete $\pi_N$ gathers each of the continuous (1/N)-interval's measure onto its left point, with as a special case [1,1] from [0,1] included onto the point 1 of $X_N$.
As an example take N to be 2, and π to be the uniform (continuous) distribution over [0,1], which can be represented by the constant-1 pdf. Since the interval width is 1/2, we see that $\pi_2$ assigns probability 1/2 to both 0 and 1/2, and zero to all other points in $X_2$ (here, just the point 1).
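A small computational sketch of that pixelation (ours; it assumes the prior is given by its cumulative distribution function cdf, and that the prior has no atom except possibly at 1):

```python
# A sketch of prior pixelation pi_N: gather each 1/N-interval's
# probability onto its left endpoint; pi_N(1) is pi({1}), zero for
# atomless priors.
def pixelate(cdf, N):
    pts = [i / N for i in range(N + 1)]
    probs = [cdf((i + 1) / N) - cdf(i / N) for i in range(N)] + [0.0]
    return dict(zip(pts, probs))

print(pixelate(lambda x: x, 2))   # uniform prior: {0.0: 0.5, 0.5: 0.5, 1.0: 0.0}
```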
VI-C N-step mechanisms and loss functions
In the other direction, we can lift a discrete mechanism and loss function on $X_N$ into the continuous [0,1] by replicating their values for the x's not in $X_N$ in a way that constructs N-step functions: we have
Definition 6
For x in [0,1] define $\lfloor x\rfloor_N \;=\; \lfloor Nx\rfloor / N$.
Definition 7
Given mechanism M on $X_N$, define $M^N$ so that $M^N(x) \;=\; M(\lfloor x\rfloor_N)$.
Note that we have not yet committed here to whether M produces discrete or continuous distributions on its output Y. We are concentrating only on its input (from [0,1]).
Similarly, given loss function ℓ, define $\ell^N$ so that $\ell^N(x',x) \;=\; \ell(x',\,\lfloor x\rfloor_N)$.
Say that mechanisms and loss functions over [0,1] are N-step functions just when they are constructed as above.
The important property enabled by the above definitions is the correspondence between loss functions' values in their pixelated and original versions, which will allow us to apply the earlier discrete-optimality result, based on Lem. 9 to come. That is, we have
Lemma 8
For any continuous prior π in $\mathbb D[0,1]$, mechanism M in $X_N \to \mathbb D Y$ and loss function ℓ on $X_N$ we have
$$\mathcal L_\ell(\pi_N,\,M) \;=\; \mathcal L_{\ell^N}(\pi,\,M^N).$$
That is, the loss realised via a pixelated $\pi_N$, and the (already discrete) M and ℓ, all operating on $X_N$, is the same as the loss realised via the original continuous π and the lifted (and thus N-step) mechanism $M^N$ and loss $\ell^N$, now operating over all of [0,1].
Proof:
We interpret the losses using Def. 2, focussing on the integrand of the inner integral. Note that we can split that integral up into a finite sum of integrals of the form $\int_{i/N}^{(i+1)/N}\ell^N(x',x)\,M^N(x)(y)\,\pi(x)\,\mathrm dx$. When we do that for the right-hand (continuous) formula, we see that throughout the interval $[\,i/N, (i{+}1)/N\,)$ the contribution of the mechanism and the loss is constant, i.e. its value at i/N. This means the integral becomes
$$\ell(x',\,i/N)\;M(i/N)(y)\,\int_{i/N}^{(i+1)/N}\pi(x)\,\mathrm dx,$$
which is equal to $\ell(x',i/N)\,M(i/N)(y)\,\pi_N(i/N)$. A similar argument applies to the last interval (which includes 1), compensated for by the definitions of $\pi_N$ and $\lfloor-\rfloor_N$ to take their corresponding values from the point 1 itself.
Looking now at the left-hand (discrete) formula, we see that it is exactly the finite sum of the integrals just described. ∎
VI-D Approximating continuous ε-DP mechanisms
The techniques above give good discrete approximations for continuous-input ε-DP mechanisms acting on continuous priors, simply by considering $X_N$'s for increasing N's, using §VI-C. As a convenient abuse of notation, when we start with a continuous-input mechanism K on [0,1] we write $K^N$ to mean the N-step mechanism that is made by first restricting K to the subset $X_N$ of [0,1] and then lifting that restriction ``back again'' as in Def. 7, effectively converting it into an N-step function. When we do this we find that the posterior loss wrt. N-step loss functions can be bounded above and below by using pixelated priors and N-stepped mechanisms.
Lemma 9
Let K be a continuous-input ε-DP mechanism, π in $\mathbb D[0,1]$ a continuous prior, and ℓ a (non-negative) N-step loss function. Then the following inequalities hold:
$$e^{-\epsilon/N}\;\mathcal L_\ell(\pi,\,K^N) \;\;\le\;\; \mathcal L_\ell(\pi,\,K) \;\;\le\;\; e^{\epsilon/N}\;\mathcal L_\ell(\pi,\,K^N).$$
(Notice that in the middle formula the mechanism K is not N-stepped, but in the formulae on either side it is, as in Lem. 8.)
Proof:
The proof is as for Lem. 8, but noting also that K's being ε-DP implies that for all x, y we have $e^{-\epsilon/N}\,K(\lfloor x\rfloor_N)(y) \;\le\; K(x)(y) \;\le\; e^{\epsilon/N}\,K(\lfloor x\rfloor_N)(y)$, since $|x - \lfloor x\rfloor_N| \le 1/N$. (Here we are using the ε-DP constraints applied to the pdf K(x).) ∎
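The pointwise bound used in that proof can be checked numerically for the (untruncated) Laplace density — a small sketch, with ε, N and the sample points our own arbitrary choices:

```python
# Sketch: moving the Laplace density's centre by at most 1/N changes
# the density value by a factor of at most e^{eps/N} -- the pointwise
# eps-DP (Lipschitz) bound used in the proof of Lem. 9.
import math

eps, N = 1.0, 10
lap = lambda x, y: (eps / 2) * math.exp(-eps * abs(y - x))
for x in (0.03, 0.51, 0.99):
    xN = math.floor(N * x) / N                 # the pixel point below x
    for y in (0.0, 0.25, 0.5, 0.75, 1.0):
        r = lap(x, y) / lap(xN, y)
        assert math.exp(-eps / N) <= r <= math.exp(eps / N)
print("pointwise bound holds at the sampled points")
```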
With Lem. 8 and Lem. 9 we can study optimality of K on the finite discrete inputs $X_N$. We will see that, although Geometric mechanisms are still optimal for the (effectively) discrete inputs $X_N$, the Laplace mechanism provides increasingly good approximate optimality for $X_N$ as N increases, and is in fact (truly) optimal in the limit.
VII The Laplace and Geometric mechanisms
In this section we make precise the restriction $G^N_\epsilon$ of the Geometric mechanism (§II-D1) to inputs and outputs both in $X_N$ (a subset of [0,1]): with $\alpha = e^{-\epsilon/N}$, for x, y both in $X_N$ we define

$$G^N_\epsilon(x)(y) \;=\; \begin{cases} \alpha^{Nx}/(1{+}\alpha) & \text{if } y = 0\\[0.5ex] \frac{1-\alpha}{1+\alpha}\,\alpha^{N|x-y|} & \text{if } 0 < y < 1\\[0.5ex] \alpha^{N(1-x)}/(1{+}\alpha) & \text{if } y = 1. \end{cases} \tag{11}$$

As an illustration, we take $X_2 = \{0, 1/2, 1\}$ and α = 1/2, in which the 2 comes from N = 2 and the 1/2 comes from the α of the Geometric distribution used to make the mechanism. Using the three points 0, 1/2 and 1 of the input, we compute the truncated geometric mechanism as the channel below, where the rows' labels are (invisibly) the inputs 0, 1/2, 1, and the columns are similarly labelled by the outputs (also 0, 1/2, 1 in this case). This means that if the input was 0, then the output (after truncation) will be 0 with probability 2/3, and 1/2 with probability 1/6 etc:

$$\begin{bmatrix} 2/3 & 1/6 & 1/6\\ 1/3 & 1/3 & 1/3\\ 1/6 & 1/6 & 2/3 \end{bmatrix}$$
Notice now that the ratios of adjacent probabilities in the same column satisfy the ε-DP constraint, so for example $\frac{2/3}{1/3} = 2 = e^{\epsilon/2}$ (so that here ε = 2 ln 2). Notice also that the distance between adjacent inputs in $X_2$ under the Euclidean distance is 1/2, not 1 as it would be in the conventional {0,1,2}.
Suppose now that we consider $X_4$ instead, consisting of the five points 0, 1/4, 1/2, 3/4 and 1, and we adjust the α in the underlying Geometric distribution from §II-D1(5) to $\sqrt{1/2}$. The ε-DP parameter, now wrt. adjacent inputs at distance 1/4, is the same as before — and the resulting 5×5 matrix is computed from (11) in just the same way.
As before though, the ratios of adjacent probabilities in the same column satisfy the ε-DP constraint over all of $X_4$: now we have $\frac{G^4_\epsilon(0)(y)}{G^4_\epsilon(1/4)(y)} = \sqrt 2 = e^{\epsilon/4}$.
This amplifies the explanation in (2) that the ε-DP constraints over discrete inputs must take into account the underlying metric on the input space. More generally, whenever we double N in $X_N$, the α-parameter must become $\sqrt\alpha$.
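A tiny sketch of that scaling rule, using the running example's ε = 2 ln 2 (our own choice of values to print):

```python
# Sketch: keeping eps fixed while refining X_N forces alpha = e^{-eps/N},
# so doubling N replaces alpha by its square root, as claimed above.
import math

eps = 2 * math.log(2)
for N in (2, 4, 8):
    print(N, math.exp(-eps / N))   # 2 -> 0.5, 4 -> 0.7071..., 8 -> 0.8409...
```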
At this point, we have enough to be able to appeal to the discrete optimality result, to bound below the losses for continuous mechanisms, provided that the loss is $X_N$-legal, i.e. that its legality obtains at least for the distinct points in $X_N$.
Lemma 10
For any continuous prior π in $\mathbb D[0,1]$, ε-DP mechanism K and loss function ℓ such that ℓ is $X_N$-legal, we have:
$$\mathcal L_\ell(\pi_N,\,G^N_\epsilon) \;\le\; e^{\epsilon/N}\,\mathcal L_\ell(\pi,\,K).$$
Our next task is to study the relationship between the Geometric and Laplace mechanisms. We show first that $G^N_\epsilon$ is refined (§IV-D) by the truncated Laplace mechanism when the latter is also restricted to inputs in $X_N$. Since $TL_\epsilon$ is already defined over the whole of [0,1] we continue to write its restriction to $X_N$ as $TL_\epsilon$. This will immediately show that losses under the Geometric are no more than those under the Laplace (§IV-D(a)), consistent with observations that, on discrete inputs, Laplace obfuscation does not necessarily minimise the loss. Since the output of $TL_\epsilon$ is continuous, we proceed by first approximating it using post-processing to make Laplace-based mechanisms $L^M_\epsilon$, defined below, which have discrete output, and which form an anti-refinement chain converging to $TL_\epsilon$. We are then able to show separately the refinements between $G^N_\epsilon$ and the $L^M_\epsilon$'s, using methods designed for finite mechanisms.
The M-Laplace mechanisms $L^M_\epsilon$ approximate $TL_\epsilon$ by M-pixelation of their outputs. Here x is (still) in [0,1] but y is in $X_M$:

$$L^M_\epsilon(x)(y) \;=\; TL_\epsilon(x)\bigl[\,y,\;y{+}\tfrac1M\,\bigr) \;\text{ for } y < 1, \qquad L^M_\epsilon(x)(1) \;=\; TL_\epsilon(x)[1,1]. \tag{12}$$

That is, we pixelate the output using 1/M for the Laplace (independently of the 1/N we use for $G^N_\epsilon$). This is illustrated in Fig. 1(a).
Observe that as this is a post-processing (§IV-D(c)) of the output of $TL_\epsilon$, the refinement $TL_\epsilon \sqsubseteq L^M_\epsilon$ follows.
VII-A Refinement between N-Geometric and M-Laplace mechanisms
We now demonstrate the crucial fact that $G^N_\epsilon$ is refined by $L^M_\epsilon$. We use version (b) of refinement, described in §IV-D, and establish a Wasserstein-style earth move between the hypers $[u \triangleright G^N_\epsilon]$ and $[u \triangleright L^M_\epsilon]$ (i.e. for uniform prior u).
[Fig. 1: the posteriors of $G^N_\epsilon$ (orange points) and of $L^M_\epsilon$ (blue points) depicted as points in the space of posterior distributions; the width of the central ``vertical slice'' is 1/M.]
Lemma 11
For all integers M that are multiples of N we have that $G^N_\epsilon \;\sqsubseteq\; L^M_\epsilon$.
Proof:
Take $[u\triangleright G^N_\epsilon]$ and $[u\triangleright L^M_\epsilon]$ as hypers, both with finite supports. We can depict such hypers in Euclidean space by locating their supports, each of which is a point in $\mathbb R^{\#X_N}$, where the axes of the diagram correspond to each point in $X_N$. For example if we take $\Delta_1$ to be the hyper-distribution $[u\triangleright G^2_\epsilon]$, it has three posterior distributions, which are 1-summing triples in ℝ³. They are depicted by the orange points in Fig. 1. Similarly the supports of a hyper-distribution $\Delta_2$ taken to be $[u\triangleright L^M_\epsilon]$ are represented by the blue dots. Notice that the blue dots are contained in the convex hull of the orange dots, and this observation allows us to prove that the mechanisms $G^N_\epsilon$ and $L^M_\epsilon$ are in a refinement relation.
We use the following fact [8, Lem. 12.2] about refinement ⊑.
Let $C_{1,2}$ be channels and let u be the uniform prior. If the supports of $[u\triangleright C_1]$ are linearly independent when considered as vectors in Euclidean space, and their convex hull encloses the supports of $[u\triangleright C_2]$, then $C_1 \sqsubseteq C_2$. (The lemma applies to channels because of the direct correspondence between channels and the supports of hyper-distributions formed from uniform priors.)
To apply this result, we let $C_1$ be $G^N_\epsilon$ and recall that indeed the supports of $[u\triangleright G^N_\epsilon]$ are linearly independent (see for example [5]). Moreover in general, the supports of $[u\triangleright L^M_\epsilon]$ are also contained in that convex hull. We provide details of this latter fact in [15, Appendix §B]. ∎
Finally we can show full refinement between the Laplace and the Geometric mechanism, which follows from continuity of refinement [13].
Theorem 12
$G^N_\epsilon \;\sqsubseteq\; TL_\epsilon$.
We have shown that the Laplace mechanism is a refinement of the Geometric mechanism. This means that it genuinely leaks less information than does the Geometric mechanism, and therefore affords greater privacy protection. On the other hand it means that we have lost utility with respect to the aggregated information. In the next section we turn to the comparison of the Laplace and Geometric mechanisms with respect to that loss.
VII-B The Laplace approximates the Geometric
The geometrical interpretation of the Laplace and Geometric mechanisms set out above indicates how the Laplace approximates the Geometric as $X_N$'s interval-width 1/N approaches 0. In particular the refinement relationship established in Thm. 12 describes how the posteriors of the Laplace all lie in between pairs of posteriors of the Geometric. This relationship between posteriors translates to a bound between the corresponding expected losses via the Kantorovich-Rubinstein theorem, applied to the hypers $[u\triangleright G^N_\epsilon]$ and $[u\triangleright L^M_\epsilon]$. We sketch the argument in the next theorem, and provide full details in [15, Appendix §D].
Theorem 13
Let ℓ be a κ-Lipschitz loss function, and u the uniform distribution over [0,1]. Then

$$0 \;\le\; \mathcal L_\ell(u,\,TL_\epsilon) \;-\; \mathcal L_\ell(u,\,G^N_\epsilon) \;\le\; \frac{c\,\kappa}{N}, \tag{13}$$

where c is constant for fixed ε.
Proof:
We appeal to the Kantorovich-Rubinstein theorem, which states that the ``Kantorovich distance'' between probability distributions bounds above the difference between expected values of f whenever f satisfies the κ-Lipschitz condition. In our case the relevant distributions are the hyper-distributions $[u\triangleright G^N_\epsilon]$ and $[u\triangleright L^M_\epsilon]$, and the relevant Lipschitz functions are the $H_\ell$'s for loss functions ℓ. (Some f is κ-Lipschitz if $|f(\delta_1)-f(\delta_2)| \le \kappa\,W(\delta_1,\delta_2)$ for distributions $\delta_{1,2}$, where W is the Kantorovich distance between them.)
We write $W(\Delta_1,\Delta_2)$ for the Wasserstein distance between hyper-distributions, which is determined by the minimal earth-moving cost to transform $\Delta_1$ to $\Delta_2$. For any such earth move each posterior of $\Delta_1$ is reassigned to a selection of posteriors of $\Delta_2$ in proportion to the probability mass that $\Delta_1$ assigns to it. The cost of the move is the expected value of the distances between each posterior and its reassignments (weighted by the proportions of the reassignment). Thus the cost of any specific earth move provides an upper bound for $W(\Delta_1,\Delta_2)$. (All the costs are determined by the underlying metric used to define the probability distributions; for us this is determined by the Euclidean distance on the interval [0,1].) Importantly for us, the relation of refinement determines a specific earth move [8] whose cost we can calculate.
Referring to Lem. 11 and Fig. 1, we see that the refinement between the approximation $L^M_\epsilon$ to the Laplace and $G^N_\epsilon$ reassigns the Geometric's posteriors (the orange dots) to the Laplace's posteriors (the blue dots). Crucially though, the Geometric's posteriors form a linear order according to their distance from one another, and the refinement described in Lem. 11 shows how each Laplace posterior lies in between adjacent pairs of Geometric posteriors (according to the linear ordering), provided that N divides M. Therefore any redistribution of a Geometric posterior is bounded above by the distance to one or other of its adjacent posteriors. We show in [15, Appendix §D] that the distance between adjacent pairs is bounded above by c/N, and therefore $W\bigl([u\triangleright G^N_\epsilon],\,[u\triangleright L^M_\epsilon]\bigr) \le c/N$.
Next we observe that if ℓ is a κ-Lipschitz function on X (as a function of x), then $H_\ell$ is a κ-Lipschitz function on distributions, and so by the Kantorovich-Rubinstein theorem we must have (recalling from (9)) that:

$$\mathbb E_{[u\triangleright L^M_\epsilon]}[H_\ell] \;-\; \mathbb E_{[u\triangleright G^N_\epsilon]}[H_\ell] \;\le\; \kappa\;W\bigl([u\triangleright G^N_\epsilon],\,[u\triangleright L^M_\epsilon]\bigr) \;\le\; \frac{c\,\kappa}{N}. \tag{14}$$
By Thm. 12 and post-processing we see that $G^N_\epsilon \sqsubseteq TL_\epsilon \sqsubseteq L^M_\epsilon$. Recall from §IV-D(a) that refinement means that the corresponding losses are also ordered, i.e.

$$\mathcal L_\ell(u,\,G^N_\epsilon) \;\le\; \mathcal L_\ell(u,\,TL_\epsilon) \;\le\; \mathcal L_\ell(u,\,L^M_\epsilon),$$

and so the difference $\mathcal L_\ell(u, TL_\epsilon) - \mathcal L_\ell(u, G^N_\epsilon)$ must be no more than the difference $\mathcal L_\ell(u, L^M_\epsilon) - \mathcal L_\ell(u, G^N_\epsilon)$; thus (13) follows from (14). Full details are set out in [15, Appendix §D]. ∎
More generally, (13) holds whatever the prior.
VII-C Approximating monotone loss functions
The final piece needed to complete our generalisation of Ghosh et al.'s optimality theorem is monotonicity. We describe here how to approximate continuous monotone loss functions, and expose the limitations for the Laplace mechanism.
Definition 15
The loss function ℓ is said to be monotone if there is some mapping $r: Y \to X$ such that
$$\ell(y,x) \;=\; m\bigl(|r(y)-x|,\;x\bigr), \tag{15}$$
where m is monotone (increasing) in its first argument.
Notice how r takes care of any remapping that might need to be applied when computing expected losses. Interestingly, step functions are not in general monotone on the whole of the continuous input [0,1], but fortunately they are for the restrictions to $X_N$ that we need.
Lemma 16
Let ℓ be monotone. Then the approximation $\ell^N$ restricted to $X_M$ is monotone whenever M is a multiple of N.
Proof:
If x is in $X_M$ then $\lfloor x\rfloor_N$ is also in $X_M$, since N divides M. ∎
Examples of continuous monotone loss functions include $\ell_1$ and $\ell_2$, where $\ell_1(y,x) = |y-x|$, and

$$\ell_2(y,x) \;=\; (y-x)^2. \tag{16}$$

Note that $\ell_1$ is 1-Lipschitz and $\ell_2$ is 2-Lipschitz (on [0,1]).
We note finally that as the pixellation N of $\ell^N$ increases, the approximations converge to ℓ.
VIII Universal optimality for the Laplace Mechanism
We finally have all the pieces in place to prove our main result, Thm. 5 from §V-C — the generalisation of discrete optimality [1].
Let K be any continuous ε-DP mechanism with input [0,1], and let π be a (continuous) probability distribution over [0,1] and ℓ a legal (i.e. continuous, monotone, κ-Lipschitz) loss function. Then:

$$\mathcal L_\ell(\pi,\,TL_\epsilon) \;\le\; \mathcal L_\ell(\pi,\,K). \tag{17}$$
Proof:
We use the above results to approximate the expected posterior loss by step functions; these approximations are equivalent to posterior losses over discrete mechanisms satisfying ε-DP, enabling appeal to Ghosh et al.'s universal optimality result on discrete mechanisms. We reason as follows:

$$\begin{aligned} \mathcal L_{\ell^N}(\pi,\,K) \;&\ge\; e^{-\epsilon/N}\,\mathcal L_{\ell^N}(\pi,\,K^N) &&\text{Lem. 9}\\ &=\; e^{-\epsilon/N}\,\mathcal L_{\ell}(\pi_N,\,K) &&\text{Lem. 8, with } K \text{ restricted to } X_N\\ &\ge\; e^{-\epsilon/N}\,\mathcal L_{\ell}(\pi_N,\,G^N_\epsilon) &&\text{discrete optimality [1]; Lem. 10, Lem. 16}\\ &\ge\; e^{-\epsilon/N}\,\bigl(\mathcal L_{\ell}(\pi_N,\,TL_\epsilon) - c\kappa/N\bigr) &&\text{Thm. 13, extended to general priors.} \end{aligned}$$

The result now follows by taking N to infinity, and noting that $\pi_N$, $\ell^N$ and $G^N_\epsilon$ converge to π, ℓ and $TL_\epsilon$ respectively, and that taking expected values over fixed distributions is continuous. ∎
Note that Thm. 5 only holds for mechanisms that are ε-DP. An arbitrary embedding of a discrete mechanism into [0,1] is not necessarily ε-DP, and in particular Thm. 5 does not apply to $G^N_\epsilon$ itself. Also the continuity property on ℓ is required, because $\ell^N$ must be monotone for all N. Thus arbitrary step functions do not satisfy this property, and so the Laplace mechanism is not in general universally optimal wrt. arbitrary step loss functions. Two popular loss functions however are continuous, and thus we have the following corollary.
Corollary 17
The Laplace mechanism is universally optimal for $\ell_1$ and $\ell_2$ of §VII-C.
IX Related work
The study of (universally) optimal mechanisms is one way to understand the cost of obfuscation, needed to implement privacy, but with a concomitant loss of utility of queries to databases. Pai and Roth [18] provide a detailed survey of the principles underlying the design of mechanisms, including the need to trade utility against privacy, and Dinur and Nissim [19] explore the relationship between how much noise needs to be added to database queries relative to the usefulness of the data released. Our use of loss functions to measure utility follows both that of Ghosh et al. [1] and Alvim et al. [9], and concerns optimality for entire mechanisms that satisfy a particular level of ε-DP. For example, the mean error and the mean squared error can be used to evaluate loss, as described by Ghosh et al. [1] and mentioned in §VII-C.
The Laplace mechanism as a way to implement differential privacy has been shown for example by Dwork and Roth [20]. Moreover Chatzikokolakis et al. [4] showed how it satisfies ε-DP as formulated here, using the Euclidean metric.
Whilst rare, optimality results avoid the need to design bespoke implementations of privacy mechanisms that are tailored to particular datasets. The Geometric mechanism appears to be special for discrete inputs, as Ghosh et al. [1] showed when utility is measured using their ``legal'' loss functions. On the other hand, although the Laplace mechanism continues to be a popular obfuscation mechanism, its deficiencies in terms of utility have been demonstrated by others when the inputs to the obfuscation are discrete [21], and where the optimisation is based on minimising the probability of reporting an incorrect value, subject to the ε-DP constraint. Similarly Geng et al. [22] show that adding noise according to a kind of ``pixellated'' (staircase) distribution appears to produce the best utility for arbitrary discrete datasets. Such examples are consistent with our Thm. 12, which shows that the Laplace mechanism is a refinement of the Geometric mechanism (loses more utility) when restricted to a discrete input (to the obfuscation). We mention also that optimal mechanisms have been studied by Gupte and Sundararajan [23] wrt. minimax agents, rather than average-case minimising agents.
Other works have shown the Laplace mechanism to be optimal for metric differential privacy in particular non-Bayesian scenarios. Koufogiannis et al. [24] show that the Laplace mechanism is optimal for the mean-squared error function under Lipschitz privacy, equivalent to metric differential privacy; and Wang et al. [25] show that the Laplace mechanism minimises loss measured by Shannon entropy, again for metric differential privacy. Our result on [0,1] includes those results as specific cases; however, those works do go further in that they demonstrate optimality for their respective loss functions on more general domains. We leave the study of those domains in the Bayesian setting to future work.
We also note that the linear ordering of the underlying query results seems to be important for finding optimality results. For example Brenner and Nissim [26] have demonstrated that for non-linearly-ordered inputs there are no optimal ε-DP mechanisms for a fixed level of ε. Although their result finds that only counting queries have optimal mechanisms, their context (oblivious mechanisms on database queries) does not include the possibility of continuous-valued query results with a linear order; our result does not contradict their impossibility, and can be seen rather as an extension of their result to a continuous setting.
Alvim et al. [27] also use the framework of Quantitative Information Flow to study the relationship between the privacy and the utility of ε-DP mechanisms. In their work they model utility in terms of a leakage measure, where leakage is defined as the ratio of input gain to output gain wrt. a mechanism modelled as a channel. Their gain is entirely dual to our loss here, and is a model of an adversary trying to infer as much as possible about the secret input. Other notions of optimality have also been studied in respect of showing that the Laplace mechanism is not optimal: Asi and Duchi [28] work with ``near instance'' optimality, and Geng et al. [22] show how to scale the Laplace in various ways to obtain good utility. Note also that these definitions of optimality do not use a prior, and therefore represent the special case of utility per exact input, rather than a scenario where the observer's prior knowledge is included.
X Conclusion
We have studied the relationship between differential privacy (good) and loss of utility (bad) when the input can be over an interval of the reals, rather than being discrete as in the optimality result of Ghosh et al. [1, 31]. Here we have instead used as input space the continuous interval [0,1]; but we note that the result extends straightforwardly to any finite interval of ℝ. Our result also imposes the condition that the loss functions must be κ-Lipschitz for some finite κ. We do not know whether this condition can be removed in general.
We observe that for N-step loss functions the Laplace mechanism is not optimal, and in fact a bespoke Geometric mechanism will be optimal for such loss functions. However our Thm. 13 provides a way to estimate the error, relative to the optimal loss, when using the Laplace mechanism.
Finally we note that the space of -DP mechanisms is very rich, even for discrete inputs, suggesting that the optimality result given here will be useful whenever the input domain can be linearly ordered.
Acknowledgements
We thank Catuscia Palamidessi for suggesting this problem to us.
The appendices may be found at [15].
References
- [1] A. Ghosh, T. Roughgarden, and M. Sundararajan, ``Universally utility-maximizing privacy mechanisms,'' SIAM J. Comput., vol. 41, no. 6, pp. 1673–1693, 2012.
- [2] M. S. Alvim, K. Chatzikokolakis, A. McIver, C. Morgan, C. Palamidessi, and G. Smith, ``Additive and multiplicative notions of leakage, and their capacities,'' in IEEE 27th Computer Security Foundations Symposium, CSF 2014, Vienna, Austria, 19-22 July, 2014. IEEE, 2014, pp. 308–322. [Online]. Available: http://dx.doi.org/10.1109/CSF.2014.29
- [3] C. Dwork, F. McSherry, K. Nissim, and A. D. Smith, ``Calibrating noise to sensitivity in private data analysis,'' in Theory of Cryptography, Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006, Proceedings, ser. Lecture Notes in Computer Science, S. Halevi and T. Rabin, Eds., vol. 3876. Springer, 2006, pp. 265–284. [Online]. Available: https://doi.org/10.1007/11681878_14
- [4] K. Chatzikokolakis, M. Andrés, N. Bordenabe, and C. Palamidessi, ``Broadening the scope of differential privacy using metrics,'' in International Symposium on Privacy Enhancing Technologies Symposium, ser. LNCS, vol. 7981. Springer, 2013.
- [5] K. Chatzikokolakis, N. Fernandes, and C. Palamidessi, ``Comparing systems: Max-case refinement orders and application to differential privacy,'' in Proc. CSF. IEEE Press, 2019.
- [6] C. Shannon, ``A mathematical theory of communication,'' Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, 1948.
- [7] M. S. Alvim, M. E. Andrés, K. Chatzikokolakis, and C. Palamidessi, ``On the relation between differential privacy and quantitative information flow,'' in Automata, Languages and Programming - 38th International Colloquium, ICALP 2011, Zurich, Switzerland, July 4-8, 2011, Proceedings, Part II, 2011, pp. 60–76. [Online]. Available: https://doi.org/10.1007/978-3-642-22012-8_4
- [8] M. S. Alvim, K. Chatzikokolakis, A. McIver, C. Morgan, C. Palamidessi, and G. Smith, The Science of Quantitative Information Flow, ser. Information Security and Cryptography. Springer International Publishing, 2020.
- [9] M. S. Alvim, K. Chatzikokolakis, C. Palamidessi, and G. Smith, ``Measuring information leakage using generalized gain functions,'' in Proc. 25th IEEE Computer Security Foundations Symposium (CSF 2012), Jun. 2012, pp. 265–279.
- [10] A. McIver, C. Morgan, L. Meinicke, G. Smith, and B. Espinoza, ``Abstract channels, gain functions and the information order,'' in FCS 2013 Workshop on Foundations of Computer Security, 2013.
- [11] S. Rachev and L. Ruschendorf, Mass transportation problems. Springer, 1998, vol. 1.
- [12] Y. Deng and W. Du, ``The Kantorovich Metric in computer science: A brief survey,'' Electron. Notes Theor. Comput. Sci., vol. 253, no. 3, pp. 73–82, Nov. 2009. [Online]. Available: http://dx.doi.org/10.1016/j.entcs.2009.10.006
- [13] A. McIver, L. Meinicke, and C. Morgan, ``A Kantorovich-monadic powerdomain for information hiding, with probability and nondeterminism,'' in Proc. LiCS 2012, 2012.
- [14] E. Lawler, Combinatorial optimization: Networks and Matroids. Holt, Rinehart and Winston, 1976.
- [15] N. Fernandes, A. McIver, and C. Morgan, ``The Laplace Mechanism has optimal utility for differential privacy over continuous queries,'' April 2021, full version of this paper with appendices. [Online]. Available at http://www.cse.unsw.edu.au/carrollm/LiCS2021-210420.pdf
- [16] P. Meyer-Nieberg, Banach Lattices. Springer-Verlag, 1991.
- [17] E. Wilson, ``First and second laws of error,'' JASA, vol. 18, no. 143, 1923.
- [18] M. M. Pai and A. Roth, ``Privacy and mechanism design,'' SIGecom Exch., vol. 12, no. 1, pp. 8–29, 2013. [Online]. Available: https://doi.org/10.1145/2509013.2509016
- [19] I. Dinur and K. Nissim, ``Revealing information while preserving privacy,'' in Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 9-12, 2003, San Diego, CA, USA, F. Neven, C. Beeri, and T. Milo, Eds. ACM, 2003, pp. 202–210. [Online]. Available: https://doi.org/10.1145/773153.773173
- [20] C. Dwork and A. Roth, ``The algorithmic foundations of differential privacy,'' Foundations and Trends in Theoretical Computer Science, vol. 9, no. 3-4, pp. 211–407, 2014.
- [21] J. Soria-Comas and J. Domingo-Ferrer, ``Optimal data-independent noise for differential privacy,'' Information Sciences, vol. 250, pp. 200–214, 2012.
- [22] Q. Geng, P. Kairouz, S. Oh, and P. Viswanath, ``The staircase mechanism in differential privacy,'' IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 7, 2015.
- [23] M. Gupte and M. Sundararajan, ``Universally optimal privacy mechanisms for minimax agents,'' in Proc. Symp. Principles of Database Sytems, ser. PODS '10. New York, NY, USA: Association for Computing Machinery, 2010, pp. 135–146. [Online]. Available: https://doi.org/10.1145/1807085.1807105
- [24] F. Koufogiannis, S. Han, and G. J. Pappas, ``Optimality of the laplace mechanism in differential privacy,'' arXiv preprint arXiv:1504.00065, 2015.
- [25] Y. Wang, Z. Huang, S. Mitra, and G. E. Dullerud, ``Entropy-minimizing mechanism for differential privacy of discrete-time linear feedback systems,'' in 53rd IEEE conference on decision and control. IEEE, 2014, pp. 2130–2135.
- [26] H. Brenner and K. Nissim, ``Impossibility of differentially private universally optimal mechanisms,'' in 2010 IEEE 51st Annual Symposium on Foundations of Computer Science (FOCS). Los Alamitos, CA, USA: IEEE Computer Society, Oct. 2010, pp. 71–80. [Online]. Available: https://doi.ieeecomputersociety.org/10.1109/FOCS.2010.13
- [27] M. S. Alvim, M. E. Andrés, K. Chatzikokolakis, P. Degano, and C. Palamidessi, ``On the information leakage of differentially-private mechanisms,'' Journal of Computer Security, vol. 23, no. 4, pp. 427–469, 2015. [Online]. Available: https://doi.org/10.3233/JCS-150528
- [28] H. Asi and J. C. Duchi, ``Near instance-optimality in differential privacy,'' arXiv:2005.10630v1, 2020.
- [29] Y. Wang, X. Wu, and L. Wu, ``Differential privacy preserving spectral graph analysis,'' in Advances in Knowledge Discovery and Data Mining, J. Pei, V. S. Tseng, L. Cao, H. Motoda, and G. Xu, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 329–340.
- [30] N. Phan, X. Wu, H. Hu, and D. Dou, ``Adaptive laplace mechanism: Differential privacy preservation in deep learning,'' in 2017 IEEE International Conference on Data Mining (ICDM), 2017, pp. 385–394.
- [31] A. Ghosh, T. Roughgarden, and M. Sundararajan, ``Universally utility-maximizing privacy mechanisms,'' in Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, ser. STOC '09. New York, NY, USA: Association for Computing Machinery, 2009, pp. 351–360. [Online]. Available: https://doi.org/10.1145/1536414.1536464