This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Connect the Dots: Tighter Discrete Approximations of Privacy Loss Distributions

Vadym Doroshenko Google, E-mail: [email protected] Badih Ghazi Google Research, E-mail: [email protected] Pritish Kamath Google Research, E-mail: [email protected] Ravi Kumar Google Research, E-mail: [email protected] Pasin Manurangsi Google Research, E-mail: [email protected]
Abstract

The privacy loss distribution (PLD) provides a tight characterization of the privacy loss of a mechanism in the context of differential privacy (DP). Recent work [24, 19, 20, 18] has shown that PLD-based accounting allows for tighter (ε,δ)(\varepsilon,\delta)-DP guarantees for many popular mechanisms compared to other known methods. A key question in PLD-based accounting is how to approximate any (potentially continuous) PLD with a PLD over any specified discrete support.
We present a novel approach to this problem. Our approach supports both pessimistic estimation, which overestimates the hockey-stick divergence (i.e., δ\delta) for any value of ε\varepsilon, and optimistic estimation, which underestimates the hockey-stick divergence. Moreover, we show that our pessimistic estimate is the best possible among all pessimistic estimates. Experimental evaluation shows that our approach can work with much larger discretization intervals while keeping a similar error bound compared to previous approaches and yet give a better approximation than an existing method [24].

1 Introduction

Differential privacy (DP) [9, 8] has become widely adopted as a notion of privacy in analytics and machine learning applications, leading to numerous practical deployments including in industry [13, 29, 16, 3, 7] and government agencies [2]. The DP guarantee of a (randomized) algorithm is parameterized by two real numbers ε>0\varepsilon>0 and δ[0,1]\delta\in[0,1]; the smaller these values, the more private the algorithm.

The appeal of DP stems from the strong privacy that it guarantees (which holds even if the adversary controls the inputs of all other users in the database), and from its nice mathematical properties. These include composition, whose basic form [8] says that executing an (ε1,δ1)(\varepsilon_{1},\delta_{1})-DP algorithm and an (ε2,δ2)(\varepsilon_{2},\delta_{2})-DP algorithm and returning their results gives an algorithm that is (ε1+ε2,δ1+δ2)(\varepsilon_{1}+\varepsilon_{2},\delta_{1}+\delta_{2})-DP. While basic composition can be used to bound the DP properties of kk algorithms, it is known to not be tight, in particular for large values of kk. In fact, advanced composition [12] yields a general improvement, often translating to k\approx\sqrt{k} reduction in the ε\varepsilon privacy bound of the composition of kk mechanisms each of which being (ε0,δ0)(\varepsilon_{0},\delta_{0})-DP. Such a reduction can be sizeable in practical deployments, and therefore much research has been focusing on obtaining tighter composition bounds in various settings.

In the aforementioned setting where each mechanism has the same DP parameters, Kairouz et al. [17] derived the optimal composition bound. For the more general case of composing kk mechanisms whose privacy parameters are possibly different, i.e., the iith mechanism is guaranteed to be (εi,δi)(\varepsilon_{i},\delta_{i})-DP for some parameters εi,δi\varepsilon_{i},\delta_{i}, computing the (exact) DP parameters of the composed mechanism is known to be #P-complete [27].

While the results of [17, 27] provide a complete picture of privacy accounting when we assume only that the iith mechanism is (εi,δi)(\varepsilon_{i},\delta_{i})-DP, we can often arrive at tighter bounds when taking into account some additional information about the privacy loss of the mechanisms. For example, the Moments Accountant [1] and Rényi DP [26] methods keep track of (upper bounds on) the Renyi divergences of the output distributions on two adjacent databases; this allows one to compute upper bounds on the privacy parameters. These tools were originally introduced in the context of deep learning with DP (where the composition is over multiple iterations of the learning algorithm) in which they provide significant improvements over simply using the DP parameters of each mechanism. Other known tools that can also be used to upper-bound the privacy parameters of composed mechanisms include concentrated DP [11, 6] and its truncated variant [5]. These methods are however all known not to be tight, and do not allow a high-accuracy estimation of the privacy parameters.

A numerical method for estimating the privacy parameters of a DP mechanism to an arbitrary accuracy, which has been the subject of several recent works starting with [24, 30], relies on the privacy loss distribution (PLD). This is the probability mass function of the so-called privacy loss random variable in the case of discrete mechanisms, and its probability density function in the case of continuous mechanisms. From the PLD of a mechanism, one can easily obtain its (tight) privacy parameters. Moreover, a crucial property is that the PLD of a composition of multiple mechanisms is the convolution of their individual PLDs. Thus, [19] used the Fast Fourier Transform (FFT) in order to speed up the computation of the PLD of the composition. Furthermore, explicit bounds on the approximation error for the resulting algorithm were derived in [19, 20, 18, 15]. The PLD has been the basis of multiple open-source implementations from both industry and academia including [23, 22, 25]. We note that the PLD can be applied to mechanisms whose privacy loss random variables do not have bounded moments, and thus for which composition cannot be analyzed using the Moments Accountant or Rényi DP methods. An example such mechanism is DP-SGD-JL from [4].

A crucial step in previous papers that use PLDs is in approximating the distribution so that it has finite support; this is especially needed in the case where the PLD is continuous or has a support of a very large size, as otherwise the FFT cannot be performed efficiently. With the exception of [15]111Gopi et al. [15] uses an estimator that is neither pessimistic nor optimistic, and instead derive their final values of δ\delta using concentration bound-based error estimates., previous PLD-based accounting approaches [24, 19, 20, 18] employ pessimistic estimators and optimistic estimators of PLDs. Roughly speaking, the former overestimate (i.e., give upper bounds on) δ\delta, whereas the latter underestimate δ\delta. For efficiency reasons, we would like the support of the approximate PLDs to be as small as possible, while retaining the accuracy of the estimates.

Our Contributions

Our main contributions are the following:

  • \triangleright

    We obtain a new pessimistic estimator for a PLD and a given desired support set. Our pessimistic estimator is simple to construct and is based on the idea of “connecting the dots” of the hockey-stick curve at the discretization intervals. Interestingly, we show that this is the best possible pessimistic estimator (and therefore is at least as good as previous estimators).

  • \triangleright

    We complement the above result by obtaining a new optimistic estimator that underestimates the PLD. This estimator is based on the combination of a greedy algorithm and a convex hull computation. In contrast to the pessimistic case, we prove that there is no “best” possible optimistic estimator.

  • \triangleright

    We conduct an experimental evaluation showing that our estimators can work with much larger discretization intervals while keeping a similar error bound compared to previous approaches and yet give a better approximation than existing methods.

2 Preliminaries

For kk\in{\mathbb{N}}, we use [k][k] to denote {1,,k}\{1,\dots,k\}. For a set S{,+}S\subseteq{\mathbb{R}}\cup\{-\infty,+\infty\}, we write exp(S)\exp(S) to denote {eaaS}\{e^{a}\mid a\in S\}. Similarly, for S0{+}S\subseteq{\mathbb{R}}_{\geq 0}\cup\{+\infty\}, we use log(S)\log(S) to denote {log(a)aS}\{\log(a)\mid a\in S\}. Here we use the (standard) convention that e+=+e^{+\infty}=+\infty and e=0e^{-\infty}=0; we also use the convention that (+)0=()0=0(+\infty)\cdot 0=(-\infty)\cdot 0=0. Moreover, we use [x]+[x]_{+} as a shorthand for max{x,0}\max\{x,0\}.

We use supp(P)\operatorname*{supp}(P) to denote the support of a probability distribution PP. For two distributions P,QP,Q, we use PQP\otimes Q to denote the product distribution of the two. Furthermore, when P,QP,Q are over an additive group, we use PQP\ast Q to denote the convolution of the two distributions.

2.1 Hockey-Stick Divergence and Curve

Let α0\alpha\geq 0. The α\alpha-hockey-stick divergence between two probability distributions PP and QQ over a domain Ω\Omega is given as

Dα(P||Q):=supS[P(S)αQ(S)]+,D_{\alpha}(P||Q):=\sup_{S}\left[P(S)-\alpha\cdot Q(S)\right]_{+}, (1)

where supS\sup_{S} is over all measurable sets SΩS\subseteq\Omega.

For any pair (A,B)(A,B) of distributions, let h(A,B):0{+}[0,1]h_{(A,B)}:{\mathbb{R}}_{\geq 0}\cup\left\{+\infty\right\}\to[0,1] be its hockey-stick curve, given as h(A,B)(α):=Dα(A||B)=supS[A(S)αB(S)]+h_{(A,B)}(\alpha):=D_{\alpha}(A||B)=\sup_{S}[A(S)-\alpha\cdot B(S)]_{+}.

The following characterization of hockey-stick curves, due to [32], is helpful:

Lemma 2.1 ([32]).

A function h:0{+}[0,1]h:{\mathbb{R}}_{\geq 0}\cup\left\{+\infty\right\}\to[0,1] is a hockey-stick curve for some pair of distributions if and only if the following three conditions hold:

  1. (i)

    hh is convex and non-increasing,

  2. (ii)

    h(0)=1h(0)=1,

  3. (iii)

    h(α)[1α]+h(\alpha)\geq[1-\alpha]_{+} for all α0{+}\alpha\in{\mathbb{R}}_{\geq 0}\cup\left\{+\infty\right\}.

2.2 Differential Privacy

The definition of differential privacy (DP) [9, 8]222For more background on differential privacy, we refer the reader to the monograph [10]. can be stated in terms of the hockey-stick divergence as follows.

Definition 2.2.

For a notion of adjacent datasets, a mechanism \mathcal{M} is said to satisfy (ε,δ)(\varepsilon,\delta)-differential privacy (denoted, (ε,δ)(\varepsilon,\delta)-DP) if for all adjacent datasets SSS\sim S^{\prime}, it holds that Deε((S)||(S))δD_{e^{\varepsilon}}(\mathcal{M}(S)||\mathcal{M}(S^{\prime}))\leq\delta.

We point out that the techniques developed in this paper are general and do not depend on the specific adjacency relation. For the rest of the paper, for convenience, we always use α\alpha to denote eεe^{\varepsilon}.

In most situations however, mechanisms satisfy (ε,δ)(\varepsilon,\delta)-DP for multiple values of ε\varepsilon and δ\delta. This is captured by the privacy loss profile δ:\delta_{\mathcal{M}}:{\mathbb{R}}\to{\mathbb{R}} of a mechanism \mathcal{M} given as δ(ε):=supSS𝒟eε((S)||(S))\delta_{\mathcal{M}}(\varepsilon):=\sup_{S\sim S^{\prime}}\mathcal{D}_{e^{\varepsilon}}(\mathcal{M}(S)||\mathcal{M}(S^{\prime})). It will be more convenient to consider the hockey-stick curve instead of the privacy profile. The only difference is that the hockey-stick curve takes α=eε\alpha=e^{\varepsilon} as parameter instead of ε\varepsilon as in the privacy profile.

2.3 Dominating Pairs

A central notion in our work is that of a dominating pair for a mechanism, defined by Zhu et al. [32].

Definition 2.3 (Dominating Pairs [32]).

A pair (P,Q)(P,Q) of distributions dominates a pair (A,B)(A,B) of distributions if it holds that

α0:Dα(A||B)Dα(P||Q);\forall\alpha\geq 0\ :\ D_{\alpha}(A||B)\leavevmode\nobreak\ \leq\leavevmode\nobreak\ D_{\alpha}(P||Q);

we denote this as (A,B)(P,Q)(A,B)\preceq(P,Q).

A pair (P,Q)(P,Q) of distributions is a dominating pair for a mechanism \mathcal{M} if for all adjacent datasets SSS\sim S^{\prime}, it holds that ((S),(S))(P,Q)(\mathcal{M}(S),\mathcal{M}(S^{\prime}))\preceq(P,Q); we denote this as (P,Q)\mathcal{M}\preceq(P,Q).

A pair (P,Q)(P,Q) of distributions is a tightly dominating pair for \mathcal{M} if for every α0\alpha\geq 0, it holds that Dα(P||Q)=supSSDα((S)||(S))D_{\alpha}(P||Q)=\sup_{S\sim S^{\prime}}D_{\alpha}(\mathcal{M}(S)||\mathcal{M}(S^{\prime})).

Note that, by definition, (P,Q)(A,B)(P,Q)\succeq(A,B) if and only if h(P,Q)h_{(P,Q)} is no smaller than h(A,B)h_{(A,B)} pointwise, i.e., h(P,Q)(α)h(A,B)(α)h_{(P,Q)}(\alpha)\geq h_{(A,B)}(\alpha) for all α0{+}\alpha\in{\mathbb{R}}_{\geq 0}\cup\left\{+\infty\right\}.

The following result highlights the importance of dominating pairs.

Theorem 2.4 ([32]).

If (P,Q)\mathcal{M}\preceq(P,Q) and (P,Q)\mathcal{M}^{\prime}\preceq(P^{\prime},Q^{\prime}), then (PP,QQ)\mathcal{M}\circ\mathcal{M}^{\prime}\preceq(P\otimes P^{\prime},Q\otimes Q^{\prime}), where \mathcal{M}\circ\mathcal{M}^{\prime} is the composition of \mathcal{M} and \mathcal{M}^{\prime}. Furthermore, this holds even for adaptive composition.333In adaptive composition of \mathcal{M}^{\prime}\circ\mathcal{M}, \mathcal{M}^{\prime} can also take the output of \mathcal{M} as an auxiliary input. Here the (P,Q)\mathcal{M}^{\prime}\preceq(P^{\prime},Q^{\prime}) has to hold for all possible auxiliary input.

Thus, in order to upper bound the privacy loss profile δ(ε):=supSSDeε((S)||(S))\delta_{\mathcal{M}}(\varepsilon):=\sup_{S\sim S^{\prime}}D_{e^{\varepsilon}}(\mathcal{M}(S)||\mathcal{M}(S^{\prime})), it suffices to compute Deε(P||Q)D_{e^{\varepsilon}}(P||Q) for a dominating (P,Q)(P,Q) pair for \mathcal{M}.

2.4 Privacy Loss Distribution

Privacy Loss Distribution (PLD) [11, 30] is yet another way to represent the privacy loss. For simplicity, we give a definition below specific to discrete distributions P,QP,Q; it can be extended, e.g., to continuous distributions by replacing the probability masses P(o),Q(o)P(o),Q(o) with probability densities of P,QP,Q at oo.

Definition 2.5 ([11]).

The privacy loss distribution (PLD) of a pair (P,Q)(P,Q) of discrete distributions, denoted by 𝖯𝖫𝖣(P,Q)\mathsf{PLD}_{(P,Q)}, is the distribution of the privacy loss random variable LL generated by drawing oPo\sim P and let L=P(o)/Q(o)L=P(o)/Q(o).

As alluded to earlier, PLD can be used to compute the hockey-stick divergence [24, 30] (proof provided in Appendix A for completeness):

Lemma 2.6.

For any pair (P,Q)(P,Q) of discrete distributions and ε{,+}\varepsilon\in{\mathbb{R}}\cup\{-\infty,+\infty\}, we have

Deε(P||Q):=εsupp(𝖯𝖫𝖣(P,Q))[1eεε]+𝖯𝖫𝖣(P,Q)(ε).\displaystyle D_{e^{\varepsilon}}(P||Q):=\sum_{\varepsilon^{\prime}\in\operatorname*{supp}(\mathsf{PLD}_{(P,Q)})}[1-e^{\varepsilon-\varepsilon^{\prime}}]_{+}\cdot\mathsf{PLD}_{(P,Q)}(\varepsilon^{\prime}).

Note that the RHS term above depends only on 𝖯𝖫𝖣(P,Q)\mathsf{PLD}_{(P,Q)} and not directly on P,QP,Q themselves. For convenience, we will abbreviate the RHS term as Deε(𝖯𝖫𝖣(P,Q))D_{e^{\varepsilon}}(\mathsf{PLD}_{(P,Q)}).

The main advantage in dealing with PLDs is that composition simply corresponds to convolution of PLDs [24, 30]:

Lemma 2.7.

Let P,Q,P,QP,Q,P^{\prime},Q^{\prime} be discrete distributions. Then we have

𝖯𝖫𝖣(PP,QQ)=𝖯𝖫𝖣(P,Q)𝖯𝖫𝖣(P,Q).\displaystyle\mathsf{PLD}_{(P\otimes P^{\prime},Q\otimes Q^{\prime})}=\mathsf{PLD}_{(P,Q)}\ast\mathsf{PLD}_{(P^{\prime},Q^{\prime})}.

2.5 Accounting Framework via Dominating Pairs and PLDs

Dominating pairs and PLDs form a powerful set of building blocks to perform privacy accounting. Recall that in privacy accounting, we typically have a mechanism =1k\mathcal{M}=\mathcal{M}_{1}\circ\cdots\circ\mathcal{M}_{k} where each i\mathcal{M}_{i} is a “simple” mechanism (e.g., Laplace or Gaussian mechanisms) and we would like to understand the privacy profile of \mathcal{M}.

The approach taken in previous works [24, 19, 20, 18] can be summarized as follows.444Note that their results are not phrased in terms of dominating pairs, since the latter is only defined and studied in [32]. Nonetheless, these previous works use similar (but more restricted) notions for “worst case” distributions.

  1. 1.

    Identify a dominating pair (Ai,Bi)(A_{i},B_{i}) for each i\mathcal{M}_{i}.

  2. 2.

    Find a pessimistic estimate555We remark that this is slightly inaccurate as the “pessimistic estimate” in previous works may actually not be a valid PLD; please see Section 4.1 for a more detailed explanation. (Pi,Qi)(Ai,Bi)(P^{\uparrow}_{i},Q^{\uparrow}_{i})\succeq(A_{i},B_{i}) such that 𝖯𝖫𝖣(Pi,Qi)\mathsf{PLD}_{(P^{\uparrow}_{i},Q^{\uparrow}_{i})} is supported on a certain set of prespecified values.

  3. 3.

    Compute 𝖯𝖫𝖣=𝖯𝖫𝖣(P1,Q1)𝖯𝖫𝖣(Pk,Qk)\mathsf{PLD}^{\uparrow}=\mathsf{PLD}_{(P^{\uparrow}_{1},Q^{\uparrow}_{1})}\ast\cdots\ast\mathsf{PLD}_{(P^{\uparrow}_{k},Q^{\uparrow}_{k})}.

  4. 4.

    Compute δ(ε)\delta^{\uparrow}(\varepsilon) from 𝖯𝖫𝖣\mathsf{PLD}^{\uparrow} using the formula from Lemma 2.6.

By Theorem 2.4 and Lemmas 2.7 and 2.6, we can conclude that δ(ε)δ(ε)\delta^{\uparrow}(\varepsilon)\geq\delta_{\mathcal{M}}(\varepsilon); in other words, the mechanism \mathcal{M} is (ε,δ(ε))(\varepsilon,\delta^{\uparrow}(\varepsilon))-DP as desired.

Note that the reason that one needs 𝖯𝖫𝖣(Pi,Qi)\mathsf{PLD}_{(P^{\uparrow}_{i},Q^{\uparrow}_{i})} to have finite support in the second step is so that it can be computed efficiently via the Fast Fourier Transform (FFT). Currently, there is only one approach used in previous works, called Privacy Buckets [24]. Roughly speaking, this amounts simply to rounding the PLD up to the nearest point in the specified support set. (See Section 4.1 for a more formal description.)

While the above method gives us an upper bound δ(ε)\delta^{\uparrow}(\varepsilon) of δ(ε)\delta_{\mathcal{M}}(\varepsilon), there are scenarios where we would like to find a lower bound on δ(ε)\delta_{\mathcal{M}}(\varepsilon); for example, this can be helpful in determining how tight our upper bound is. Computing such a lower bound is also possible under the similar framework, except that we need to know a list of tightly dominating pairs (A1,B1),,(Ak,Bk)(A^{*}_{1},B^{*}_{1}),\dots,(A^{*}_{k},B^{*}_{k}) such that there exists an adjacent datasets S,SS,S^{\prime} for which Deε((S)||(S))=Deε(A1Ak||B1Bk)D_{e^{\varepsilon}}(\mathcal{M}(S)||\mathcal{M}(S^{\prime}))=D_{e^{\varepsilon}}(A^{*}_{1}\otimes\cdots\otimes A^{*}_{k}||B^{*}_{1}\otimes\cdots\otimes B^{*}_{k}). If such tightly dominating pairs can be identified, then we can follow the same blueprint as above except we replace a pessimistic estimate with an optimistic estimate (P,Q)(Ai,Bi)(P^{\downarrow},Q^{\downarrow})\preceq(A^{*}_{i},B^{*}_{i}). This would indeed gives us a lower bound δ(ε)\delta^{\downarrow}(\varepsilon) of δ(ε)\delta_{\mathcal{M}}(\varepsilon).

The described framework is illustrated in Figure 1.

Mechanism 1\mathcal{M}_{1}Mechanism 2\mathcal{M}_{2}\vdotsMechanism k\mathcal{M}_{k}(A1,B1)(A_{1},B_{1})(A2,B2)(A_{2},B_{2})\vdots(Ak,Bk)(A_{k},B_{k}) Dominating Pairs \preceq\preceq\preceq(P1,Q1)(P^{\uparrow}_{1},Q^{\uparrow}_{1})(P2,Q2)(P^{\uparrow}_{2},Q^{\uparrow}_{2})\vdots(Pk,Qk)(P^{\uparrow}_{k},Q^{\uparrow}_{k}) Convolution (Lemma 2.7) 𝖯𝖫𝖣\mathsf{PLD}^{\uparrow} Compute δ\delta (Lemma 2.6) Pessimistic Estimate (A1,B1)(A^{*}_{1},B^{*}_{1})(A2,B2)(A^{*}_{2},B^{*}_{2})\vdots(Ak,Bk)(A^{*}_{k},B^{*}_{k}) Tightly Dominating Pairs \preceq\preceq\preceq(P1,Q1)(P^{\downarrow}_{1},Q^{\downarrow}_{1})(P2,Q2)(P^{\downarrow}_{2},Q^{\downarrow}_{2})\vdots(Pk,Qk)(P^{\downarrow}_{k},Q^{\downarrow}_{k}) Convolution (Lemma 2.7) 𝖯𝖫𝖣\mathsf{PLD}^{\downarrow} Compute δ\delta (Lemma 2.6) Optimistic Estimate δ(ε)δ(ε)δ(ε)\delta^{\downarrow}(\varepsilon)\leq\delta_{\mathcal{M}}(\varepsilon)\leq\delta^{\uparrow}(\varepsilon)
Fig. 1: Illustration of the framework for privacy accounting using PLDs and the notion of (tightly) dominating pairs.

3 Finitely-Supported PLDs

As alluded to in the previous section, to take full advantage of FFT, it is important that a PLD is discretized in a way such that its support is finite. For exposition purposes, we will assume that the discretization points include -\infty and ++\infty. We will use \mathcal{E} for discretization points for the PLD and 𝒜\mathcal{A} for the corresponding discretization points for the hockey-stick curve:

Assumption 3.1.

Let 𝒜={α0,,αk}\mathcal{A}=\{\alpha_{0},\dots,\alpha_{k}\} be any finite subset of 0{+}{\mathbb{R}}_{\geq 0}\cup\left\{+\infty\right\} such that 0=α0<α1<<αk=+0=\alpha_{0}<\alpha_{1}<\cdots<\alpha_{k}=+\infty, and let ={ε0,,εk}\mathcal{E}=\{\varepsilon_{0},\dots,\varepsilon_{k}\} be such that ε0=,εk=+\varepsilon_{0}=-\infty,\varepsilon_{k}=+\infty and εi=log(αi)\varepsilon_{i}=\log(\alpha_{i}) for all i[k1]i\in[k-1].

For the remaining of this work, we will operate under the above assumption and we will not state this explicitly for brevity.

Using the characterization in Lemma 2.1, we can also characterize the hockey-stick curve of PLDs whose support is on a prespecified finite set \mathcal{E}, stated more precisely in the lemma below. Furthermore, the “inverse” part of the lemma yields an algorithm (Algorithm 1) that can construct A,BA,B given (h(αi))αi𝒜(h(\alpha_{i}))_{\alpha_{i}\in\mathcal{A}} which we will use in the sequel.

Lemma 3.2.

A function h:0{+}[0,1]h:{\mathbb{R}}_{\geq 0}\cup\left\{+\infty\right\}\to[0,1] is a hockey-stick curve for some pair (P,Q)(P,Q) such that supp(𝖯𝖫𝖣(P,Q))\operatorname*{supp}(\mathsf{PLD}_{(P,Q)})\subseteq\mathcal{E} if and only if the following conditions hold:

  1. (i)

    hh is convex and non-increasing,

  2. (ii)

    h(0)=1h(0)=1,

  3. (iii)

    h(αi)[1αi]+h(\alpha_{i})\geq[1-\alpha_{i}]_{+} for all αi𝒜\alpha_{i}\in\mathcal{A},

  4. (iv)

    For all i[k1]i\in[k-1], the curve hh restricted to [αi1,αi][\alpha_{i-1},\alpha_{i}] is linear: i.e., for all α[αi1,αi)\alpha\in[\alpha_{i-1},\alpha_{i}), we have h(α)=ααiαi1αih(αi1)+αi1ααi1αih(αi)h(\alpha)=\frac{\alpha-\alpha_{i}}{\alpha_{i-1}-\alpha_{i}}\cdot h(\alpha_{i-1})+\frac{\alpha_{i-1}-\alpha}{\alpha_{i-1}-\alpha_{i}}\cdot h(\alpha_{i}).

  5. (v)

    For all α>αk1\alpha>\alpha_{k-1}, h(α)=h(+)h(\alpha)=h(+\infty).

A consequence of Lemma 3.2 is that hh is completely specified by h(𝒜)h(\mathcal{A}). More formally, given f:𝒜[0,1]f:\mathcal{A}\to[0,1], the only possible extension of ff to a hockey-stick curve is its piecewise-linear extension f¯\overline{f} defined by

f¯(α):=\displaystyle\overline{f}(\alpha):=
{ααiαi1αif(αi1)+αi1ααi1αif(αi)if α[αi1,αi)f(+)if α>αk1,\displaystyle\begin{cases}\frac{\alpha-\alpha_{i}}{\alpha_{i-1}-\alpha_{i}}f(\alpha_{i-1})+\frac{\alpha_{i-1}-\alpha}{\alpha_{i-1}-\alpha_{i}}f(\alpha_{i})&\text{if }\alpha\in[\alpha_{i-1},\alpha_{i})\\ f(+\infty)&\text{if }\alpha>\alpha_{k-1},\end{cases}

for all α0{+}\alpha\in{\mathbb{R}}_{\geq 0}\cup\{+\infty\}. Note that this f¯\overline{f} may still not be a hockey-stick curve, as it may not be convex.

  • Proof of Lemma 3.2.

    ()(\Leftarrow) We start with the converse direction by describing an algorithm that, given h(α0),,h(αk)h(\alpha_{0}),\dots,h(\alpha_{k}), can construct the desired P,QP,Q. In fact, we will construct distributions PP and QQ with supports contained in 𝒜\mathcal{A} satisfying the following:

    1. (Π1\Pi_{1})

      P(α)=αQ(α)P(\alpha)=\alpha\cdot Q(\alpha) for all α𝒜{+}\alpha\in\mathcal{A}\setminus\{+\infty\},

    2. (Π2\Pi_{2})

      Q()=0Q(\infty)=0,

    3. (Π3\Pi_{3})

      Dα(P||Q)=h(α)D_{\alpha}(P||Q)=h(\alpha) for all α𝒜\alpha\in\mathcal{A}.

    The first two conditions imply that supp(𝖯𝖫𝖣(P,Q))\operatorname*{supp}(\mathsf{PLD}_{(P,Q)})\subseteq\mathcal{E} and the last condition implies that h(P,Q)=hh_{(P,Q)}=h as desired.

    The construction of P,QP,Q is described in Algorithm 1.

    Algorithm 1 PLD Discretization.
    procedure DiscretizePLD(h(α0),,h(αk)h(\alpha_{0}),\dots,h(\alpha_{k}))
         Q(αk)0Q(\alpha_{k})\leftarrow 0 \triangleright αk=+\alpha_{k}=+\infty
         for i=k1,,1i=k-1,\dots,1 do
             Q(αi)h(αi1)h(αi)αiαi1h(αi)h(αi+1)αi+1αiQ(\alpha_{i})\leftarrow\frac{h(\alpha_{i-1})-h(\alpha_{i})}{\alpha_{i}-\alpha_{i-1}}-\frac{h(\alpha_{i})-h(\alpha_{i+1})}{\alpha_{i+1}-\alpha_{i}}      
         Q(α0)1j[k1]Q(αj)Q(\alpha_{0})\leftarrow 1-\sum_{j\in[k-1]}Q(\alpha_{j}) \triangleright α0=0\alpha_{0}=0
         P(α0)0P(\alpha_{0})\leftarrow 0 \triangleright α0=0\alpha_{0}=0
         for i=1,,k1i=1,\dots,k-1 do
             P(αi)αiQ(αi)P(\alpha_{i})\leftarrow\alpha_{i}\cdot Q(\alpha_{i})      
         P(αk)h(αk)P(\alpha_{k})\leftarrow h(\alpha_{k}) \triangleright αk=+\alpha_{k}=+\infty

Let us now verify that both PP and QQ are valid probability distributions. First, notice that Q(αi)0Q(\alpha_{i})\geq 0 due to the convexity of hh. Furthermore,

Q(0)=1j[k1]Q(αj)=11h(α1)α10,\displaystyle Q(0)=1-\sum_{j\in[k-1]}Q(\alpha_{j})=1-\frac{1-h(\alpha_{1})}{\alpha_{1}}\geq 0,

where the last inequality follows from (iii). Thus, QQ is indeed a probability distribution. As for PP, notice that P(αi)0P(\alpha_{i})\geq 0 for all αi𝒜\alpha_{i}\in\mathcal{A}. Finally, we also have

i{0,,k}P(αi)\displaystyle\sum_{i\in\{0,\dots,k\}}P(\alpha_{i})
=h(+)+i{0,,k1}αiQ(αi)\displaystyle=h(+\infty)+\sum_{i\in\{0,\dots,k-1\}}\alpha_{i}\cdot Q(\alpha_{i})
=h(+)+\displaystyle=h(+\infty)+
i{0,,k1}αi(h(αi1)h(αi)αiαi1h(αi)h(αi+1)αi+1αi)\displaystyle\sum_{i\in\{0,\dots,k-1\}}\alpha_{i}\left(\frac{h(\alpha_{i-1})-h(\alpha_{i})}{\alpha_{i}-\alpha_{i-1}}-\frac{h(\alpha_{i})-h(\alpha_{i+1})}{\alpha_{i+1}-\alpha_{i}}\right)
=h(+)+i[k1](αi+1αi)h(αi)h(αi+1)αi+1αi\displaystyle=h(+\infty)+\sum_{i\in[k-1]}(\alpha_{i+1}-\alpha_{i})\cdot\frac{h(\alpha_{i})-h(\alpha_{i+1})}{\alpha_{i+1}-\alpha_{i}}
=h(+)+(h(0)h(+))\displaystyle=h(+\infty)+(h(0)-h(+\infty))
=1,\displaystyle=1,

meaning that PP is a probability distribution as desired.

Properties 1 and 2 are immediate from the construction. We will now check Property 3, based on two cases whether α>αk1\alpha>\alpha_{k-1}.

  • \triangleright

    Case I: ααk1\alpha\geq\alpha_{k-1}. In this case, we have Dα(P||Q)=P(+)eεQ(+)=h(+)D_{\alpha}(P||Q)=P(+\infty)-e^{\varepsilon}Q(+\infty)=h(+\infty).

  • \triangleright

    Case II: α<αk1\alpha<\alpha_{k-1}. Suppose that α[αi1,αi)\alpha\in[\alpha_{i-1},\alpha_{i}) for i[k1]i\in[k-1]. We have

    Dα(P||Q)\displaystyle D_{\alpha}(P||Q) =P({αi,,αk})αQ({αi,,αk})\displaystyle=P(\{\alpha_{i},\dots,\alpha_{k}\})-\alpha\cdot Q(\{\alpha_{i},\dots,\alpha_{k}\})
    =j=ik(αjα)Q(αj)\displaystyle=\sum_{j=i}^{k}(\alpha_{j}-\alpha)\cdot Q(\alpha_{j})
    =ααiαi1αij=ik(αjαi1)Q(αj)\displaystyle=\frac{\alpha-\alpha_{i}}{\alpha_{i-1}-\alpha_{i}}\cdot\sum_{j=i}^{k}(\alpha_{j}-\alpha_{i-1})\cdot Q(\alpha_{j})
    +αi1ααi1αij=ik(αjαi)Q(αj).\displaystyle\quad+\frac{\alpha_{i-1}-\alpha}{\alpha_{i-1}-\alpha_{i}}\cdot\sum_{j=i}^{k}(\alpha_{j}-\alpha_{i})\cdot Q(\alpha_{j}).

    Furthermore, we have

    j=ik(αjαi1)Q(αj)\displaystyle\sum_{j=i}^{k}(\alpha_{j}-\alpha_{i-1})\cdot Q(\alpha_{j})
    =j=ik(αjαi1)\displaystyle=\sum_{j=i}^{k}(\alpha_{j}-\alpha_{i-1})
    (h(αj1)h(αj)αjαj1h(αj)h(αj+1)αj+1αj)\displaystyle\qquad\cdot\left(\frac{h(\alpha_{j-1})-h(\alpha_{j})}{\alpha_{j}-\alpha_{j-1}}-\frac{h(\alpha_{j})-h(\alpha_{j+1})}{\alpha_{j+1}-\alpha_{j}}\right)
    =h(αi1).\displaystyle=h(\alpha_{i-1}).

    Similarly, we also have j=ik(αjαi)Q(αj)=h(αi)\sum_{j=i}^{k}(\alpha_{j}-\alpha_{i})\cdot Q(\alpha_{j})=h(\alpha_{i}). Combining the three equalities, we arrive at

    Dα(P||Q)\displaystyle D_{\alpha}(P||Q) =ααiαi1αih(αi1)+αi1ααi1αih(αi),\displaystyle=\frac{\alpha-\alpha_{i}}{\alpha_{i-1}-\alpha_{i}}\cdot h(\alpha_{i-1})+\frac{\alpha_{i-1}-\alpha}{\alpha_{i-1}-\alpha_{i}}\cdot h(\alpha_{i}),

    which is equal to h(α)h(\alpha) due to assumption (iv).

As a result, h(P,Q)=hh_{(P,Q)}=h as desired.

()(\Rightarrow) We will now prove this direction. (i), (ii), and (iii) follow immediately from Lemma 2.1. As a result, it suffices to only prove (iv) and (v). Suppose that h(P,Q)=hh_{(P,Q)}=h for some pair (P,Q)(P,Q) such that supp(𝖯𝖫𝖣(P,Q))\operatorname*{supp}(\mathsf{PLD}_{(P,Q)})\subseteq\mathcal{E}. Let RR be a shorthand for the distribution of exp(𝖯𝖫𝖣(P,Q))\exp(\mathsf{PLD}_{(P,Q)}). To prove (iv), consider any α[αi1,αi)\alpha\in[\alpha_{i-1},\alpha_{i}) for some i[k1]i\in[k-1]. We then have

h(α)\displaystyle h(\alpha) =Dα(P||Q)\displaystyle=D_{\alpha}(P||Q)
=j=ik(1α/αj)R(αj)\displaystyle=\sum_{j=i}^{k}(1-\alpha/\alpha_{j})\cdot R(\alpha_{j})
=ααiαi1αij=ik(1αi1/αj)R(αj)\displaystyle=\frac{\alpha-\alpha_{i}}{\alpha_{i-1}-\alpha_{i}}\cdot\sum_{j=i}^{k}(1-\alpha_{i-1}/\alpha_{j})\cdot R(\alpha_{j})
αi1ααi1αij=ik(1αi/αj)R(αj)\displaystyle\qquad\frac{\alpha_{i-1}-\alpha}{\alpha_{i-1}-\alpha_{i}}\cdot\sum_{j=i}^{k}(1-\alpha_{i}/\alpha_{j})\cdot R(\alpha_{j})
=ααiαi1αih(αi1)+αi1ααi1αih(αi),\displaystyle=\frac{\alpha-\alpha_{i}}{\alpha_{i-1}-\alpha_{i}}\cdot h(\alpha_{i-1})+\frac{\alpha_{i-1}-\alpha}{\alpha_{i-1}-\alpha_{i}}\cdot h(\alpha_{i}),

which completes the proof of (iv).

Next, we prove (v). Consider any ααk1\alpha\geq\alpha_{k-1}. We have

h(α)=Dα(P||Q)=R(+)=h(+),\displaystyle h(\alpha)=D_{\alpha}(P||Q)=R(+\infty)=h(+\infty),

thereby completing our proof. ∎

4 Pessimistic PLDs with Finite Support

Refer to caption
(a) Gaussian Mechanism
Refer to caption
(b) Laplace Mechanism
Fig. 2: Illustrations of the hockey-stick curves of the Gaussian and Laplace mechanisms (with noise multipliers equal to 1 and 2/32/3 respectively), and their pessimistic estimates from our approach (labelled “pessimistic”) and the Privacy Bucket (PB) approach (labelled “PB pessimistic”) of [24]. The horizontal lines represent the discretization points in the set 𝒜\mathcal{A}. As corroborated by Corollary 4.3, our pessimistic estimates is closer to the true curves (labelled “exact”) compared to the PB pessimistic estimates for all α\alpha.

As we have described in Figure 1, pessimistic estimates of PLDs with finite supports are crucial in the PLD-based privacy accounting framework. The better these pessimistic estimates approximate the true PLD, the more accurate is the resulting upper bound δ(ε)\delta^{\uparrow}(\varepsilon).

Equipped with tools developed in the previous section, we will now describe our finite-support pessimistic estimate of a PLD. Specifically, given a pair (A,B)(A,B) of distributions, we would like to compute (P,Q)(P^{\uparrow},Q^{\uparrow}) such that (P,Q)(A,B)(P^{\uparrow},Q^{\uparrow})\succeq(A,B) with supp(𝖯𝖫𝖣(P,Q))\operatorname*{supp}(\mathsf{PLD}_{(P^{\uparrow},Q^{\uparrow})})\subseteq\mathcal{E}. In fact, as we will show below (Lemma 4.1), our choice of 𝖯𝖫𝖣(P,Q)\mathsf{PLD}_{(P^{\uparrow},Q^{\uparrow})} “best approximates” 𝖯𝖫𝖣(A,B)\mathsf{PLD}_{(A,B)}.

Our construction of the pair (P,Q)(P^{\uparrow},Q^{\uparrow}) is simple: run DiscretizePLD (Algorithm 1) on the input h(A,B)(α0),,h(A,B)(αk)h_{(A,B)}(\alpha_{0}),\dots,h_{(A,B)}(\alpha_{k}).

Recall from the proof of Lemma 3.2 that this construction simply gives h(P,Q)h_{(P^{\uparrow},Q^{\uparrow})}, which is a piecewise-linear extension of h(A,B)(α0),,h(A,B)(αk)h_{(A,B)}(\alpha_{0}),\dots,h_{(A,B)}(\alpha_{k}). In other words, we simply “connect the dots” to construct the hockey-stick curve of our pessimistic estimate. Note that this, together with the convexity of h(A,B)h_{(A,B)} (Lemma 2.1), implies that (P,Q)(A,B)(P^{\uparrow},Q^{\uparrow})\succeq(A,B) as desired.

Additionally, it is not hard to observe that our choice of pessimistic PLD is the best possible, in sense that (P,Q)(P^{\uparrow},Q^{\uparrow}) is the least element (under the domination partial order) among all pairs that dominate (A,B)(A,B):

Lemma 4.1.

Let P,QP,Q be any pair of distributions such that 𝖯𝖫𝖣(P,Q)\mathsf{PLD}_{(P,Q)} is supported on 𝒜\mathcal{A} and (P,Q)(A,B)(P,Q)\succeq(A,B). Then, we must have (P,Q)(P,Q)(P,Q)\succeq(P^{\uparrow},Q^{\uparrow})

  • Proof.

    Recall that it suffices to prove that h(P,Q)(α)h(P,Q)(α)h_{(P,Q)}(\alpha)\geq h_{(P^{\uparrow},Q^{\uparrow})}(\alpha) for all α0{+}\alpha\in{\mathbb{R}}_{\geq 0}\cup\left\{+\infty\right\}. We will consider two cases based on the value of α\alpha:

    • \triangleright

      Case I: ααk1\alpha\geq\alpha_{k-1}. From Lemma 3.2, we simply have h(P,Q)(α)=h(P,Q)(+)h(A,B)(+)=h(P,Q)(α)h_{(P,Q)}(\alpha)=h_{(P,Q)}(+\infty)\geq h_{(A,B)}(+\infty)=h_{(P^{\uparrow},Q^{\uparrow})}(\alpha).

    • \triangleright

      Case II: αk1>α0\alpha_{k-1}>\alpha\geq 0. Suppose that α[αi1,αi)\alpha\in[\alpha_{i-1},\alpha_{i}). From Lemma 3.2(ii), we have

      h(P,Q)(α)\displaystyle h_{(P,Q)}(\alpha)
      =ααiαi1αih(P,Q)(αi1)+αi1ααi1αih(P,Q)(αi)\displaystyle=\frac{\alpha-\alpha_{i}}{\alpha_{i-1}-\alpha_{i}}\cdot h_{(P,Q)}(\alpha_{i-1})+\frac{\alpha_{i-1}-\alpha}{\alpha_{i-1}-\alpha_{i}}\cdot h_{(P,Q)}(\alpha_{i})
      ααiαi1αih(αi1)+αi1ααi1αih(αi)\displaystyle\geq\frac{\alpha-\alpha_{i}}{\alpha_{i-1}-\alpha_{i}}\cdot h(\alpha_{i-1})+\frac{\alpha_{i-1}-\alpha}{\alpha_{i-1}-\alpha_{i}}\cdot h(\alpha_{i})
      =h(P,Q)(α),\displaystyle=h_{(P^{\uparrow},Q^{\uparrow})}(\alpha),

      where the first inequality follows from (P,Q)(A,B)(P,Q)\succeq(A,B) and the last equality follows from our construction of (P,Q)(P^{\uparrow},Q^{\uparrow}). ∎

We remark that 𝖯𝖫𝖣(P,Q)\mathsf{PLD}_{(P^{\uparrow},Q^{\uparrow})} also has a simple form, due to the properties 1 and 2:

𝖯𝖫𝖣(P,Q)(εi):=P(αi)\displaystyle\mathsf{PLD}_{(P^{\uparrow},Q^{\uparrow})}(\varepsilon_{i}):=P^{\uparrow}(\alpha_{i})

for all i[k]i\in[k].

4.1 Comparison to Privacy Loss Buckets

The primary previous work that also derived a pessimistic estimate of PLD is that of Meiser and Mohammadi [24], which has also been used (implicitly) in later works [19, 20, 18]. In our terminology, the Privacy Buckets (PB) algorithm of Meiser and Mohammadi [24]666This is referred to as grid approximation in [19, 20, 18]. can be restated as follows: let the pessimistic-PB estimate 𝖯𝖫𝖣~(A,B)\widetilde{\mathsf{PLD}}^{\uparrow}_{(A,B)} be the probability distribution where

𝖯𝖫𝖣~(A,B)(εi)=𝖯𝖫𝖣(A,B)((εi1,εi]),\displaystyle\widetilde{\mathsf{PLD}}^{\uparrow}_{(A,B)}(\varepsilon_{i})=\mathsf{PLD}_{(A,B)}((\varepsilon_{i-1},\varepsilon_{i}]),

for all i[k]i\in[k]. In other words, 𝖯𝖫𝖣~(A,B)\widetilde{\mathsf{PLD}}^{\uparrow}_{(A,B)} is a probability distribution on \mathcal{E} that stochastically dominates 𝖯𝖫𝖣(A,B)\mathsf{PLD}_{(A,B)}; furthermore, 𝖯𝖫𝖣~(A,B)\widetilde{\mathsf{PLD}}^{\uparrow}_{(A,B)} is the least such distribution under stochastic dominant (partial) ordering. In previous works [24, 19, 20, 18], such an estimate is then used in place of the true (non-discretized) PLD for accounting and computing δ\delta’s (via Lemma 2.7 and Lemma 2.6).

A priori, it is not clear whether 𝖯𝖫𝖣~(A,B)\widetilde{\mathsf{PLD}}^{\uparrow}_{(A,B)} is even a valid PLD (for some pair of distributions). However, it not hard to prove that this is indeed the case:

Lemma 4.2.

There exists a pair (PPB,QPB)(P_{\mathrm{PB}}^{\uparrow},Q_{\mathrm{PB}}^{\uparrow}) of distributions such that 𝖯𝖫𝖣~(A,B)=𝖯𝖫𝖣(PPB,QPB)\widetilde{\mathsf{PLD}}^{\uparrow}_{(A,B)}=\mathsf{PLD}_{(P_{\mathrm{PB}}^{\uparrow},Q_{\mathrm{PB}}^{\uparrow})}.

  • Proof.

    Let PPBP_{\text{PB}}^{\uparrow} be defined by

    PPB(αi)=𝖯𝖫𝖣~(A,B)(εi)\displaystyle P_{\text{PB}}^{\uparrow}(\alpha_{i})=\widetilde{\mathsf{PLD}}^{\uparrow}_{(A,B)}(\varepsilon_{i})

    for all i{0,,k}i\in\{0,\dots,k\}. It is clear that PPBP_{\text{PB}}^{\uparrow} is a valid distribution.

    Then, define QPBQ_{\text{PB}}^{\uparrow} by

    QPB(α)=PPB(α)/α,\displaystyle Q_{\text{PB}}^{\uparrow}(\alpha)=P_{\text{PB}}^{\uparrow}(\alpha)/\alpha,

    for all α𝒜{0}\alpha\in\mathcal{A}\setminus\{0\} and let QPB(0)=1α𝒜{0}QPB(α)Q_{\text{PB}}^{\uparrow}(0)=1-\sum_{\alpha\in\mathcal{A}\setminus\{0\}}Q_{\text{PB}}^{\uparrow}(\alpha). To check that QPBQ_{\text{PB}}^{\uparrow} is a valid distribution, it suffices to show that QPB(0)0Q_{\text{PB}}^{\uparrow}(0)\geq 0. This is true because

    α𝒜{0}QPB(α)\displaystyle\sum_{\alpha\in\mathcal{A}\setminus\{0\}}Q_{\text{PB}}^{\uparrow}(\alpha) =i[k]PPB(αi)/αi\displaystyle\leavevmode\nobreak\ =\leavevmode\nobreak\ \sum_{i\in[k]}P_{\text{PB}}^{\uparrow}(\alpha_{i})/\alpha_{i}
    =i[k]𝖯𝖫𝖣~(A,B)(εi)/αi\displaystyle\leavevmode\nobreak\ =\leavevmode\nobreak\ \sum_{i\in[k]}\widetilde{\mathsf{PLD}}^{\uparrow}_{(A,B)}(\varepsilon_{i})/\alpha_{i}
    =i[k]𝖯𝖫𝖣(A,B)((εi1,εi])/αi\displaystyle\leavevmode\nobreak\ =\leavevmode\nobreak\ \sum_{i\in[k]}\mathsf{PLD}_{(A,B)}((\varepsilon_{i-1},\varepsilon_{i}])/\alpha_{i}
    =i[k]osupp(B)A(o)/B(o)(αi1,αi]A(o)/αi\displaystyle\leavevmode\nobreak\ =\leavevmode\nobreak\ \sum_{i\in[k]}\sum_{\begin{subarray}{c}o\in\operatorname*{supp}(B)\\ A(o)/B(o)\in(\alpha_{i-1},\alpha_{i}]\end{subarray}}A(o)/\alpha_{i}
    i[k]osupp(B)A(o)/B(o)(αi1,αi]B(o)\displaystyle\leavevmode\nobreak\ \leq\leavevmode\nobreak\ \sum_{i\in[k]}\sum_{\begin{subarray}{c}o\in\operatorname*{supp}(B)\\ A(o)/B(o)\in(\alpha_{i-1},\alpha_{i}]\end{subarray}}B(o)
    1.\displaystyle\leavevmode\nobreak\ \leq\leavevmode\nobreak\ 1.

    Finally, it is obvious from the definitions of PPB,QPBP_{\text{PB}}^{\uparrow},Q_{\text{PB}}^{\uparrow} that 𝖯𝖫𝖣(PPB,QPB)=𝖯𝖫𝖣~(A,B)\mathsf{PLD}_{(P_{\text{PB}}^{\uparrow},Q_{\text{PB}}^{\uparrow})}=\widetilde{\mathsf{PLD}}^{\uparrow}_{(A,B)}. ∎

Combining the above lemma and the fact that 𝖯𝖫𝖣~(A,B)\widetilde{\mathsf{PLD}}^{\uparrow}_{(A,B)} dominates 𝖯𝖫𝖣(A,B)\mathsf{PLD}_{(A,B)} with Lemma 4.1, we can conclude that our estimate is no worse than the PB estimate:

Corollary 4.3.

Let (P,Q)(P^{\uparrow},Q^{\uparrow}) be as defined above. Then, for all α0\alpha\geq 0, we have h(P,Q)(α)Dα(𝖯𝖫𝖣~(A,B))h_{(P^{\uparrow},Q^{\uparrow})}(\alpha)\leq D_{\alpha}(\widetilde{\mathsf{PLD}}^{\uparrow}_{(A,B)}).

Illustrations of the exact hockey-stick divergence and its pessimistic estimates from our approach and PB approach can be found in Figure 2. A more detailed evaluation of the error from the two approaches (after compositions) can be found in Section 6.

5 Optimistic PLDs with Finite Support

We next consider optimistic PLDs, i.e., 𝖯𝖫𝖣(P,Q)\mathsf{PLD}_{(P,Q)} dominated by a given 𝖯𝖫𝖣(A,B)\ \mathsf{PLD}_{(A,B)}. We start by showing that, unlike pessimistic PLDs for which there is the “best” possible choice (Lemma 4.1), there is no such a choice for optimistic PLDs:

Lemma 5.1.

There exists a pair (A,B)(A,B) of distributions and a finite set 𝒜\mathcal{A} such that, for any pair (P,Q)(A,B)(P,Q)\preceq(A,B) such that supp(𝖯𝖫𝖣(P,Q))\operatorname*{supp}(\mathsf{PLD}_{(P,Q)})\subseteq\mathcal{E}, there exists a pair (P,Q)(A,B)(P^{\prime},Q^{\prime})\preceq(A,B) such that 𝖯𝖫𝖣(P,Q)\mathsf{PLD}_{(P^{\prime},Q^{\prime})} is supported on \mathcal{E} and (P,Q)(P,Q)(P^{\prime},Q^{\prime})\npreceq(P,Q).

  • Proof.

    Let (A,B)(A,B) be the result of the ε\varepsilon-DP binary randomized response, i.e.,

    A(0)=B(1)=eεeε+1,\displaystyle A(0)=B(1)=\frac{e^{\varepsilon}}{e^{\varepsilon}+1},
    A(1)=B(0)=1eε+1.\displaystyle A(1)=B(0)=\frac{1}{e^{\varepsilon}+1}.

    It is simple to verify that

    h(A,B)(α)={1α if αeε,eεeε+1αeε+1 if eε<α<eε,0 if αε.\displaystyle h_{(A,B)}(\alpha)=\begin{cases}1-\alpha&\text{ if }\alpha\leq e^{-\varepsilon},\\ \frac{e^{\varepsilon}}{e^{\varepsilon}+1}-\frac{\alpha}{e^{\varepsilon}+1}&\text{ if }e^{-\varepsilon}<\alpha<e^{\varepsilon},\\ 0&\text{ if }\alpha\geq\varepsilon.\end{cases}

    Let 𝒜\mathcal{A} be {0,α1,α2,+}\{0,\alpha_{1},\alpha_{2},+\infty\} where α1=eεγ,α2=eε+γ\alpha_{1}=e^{-\varepsilon}-\gamma,\alpha_{2}=e^{-\varepsilon}+\gamma for any γ<min{eεeε,eε}\gamma<\min\{e^{\varepsilon}-e^{-\varepsilon},e^{-\varepsilon}\}.

    Let h1:𝒜{+}[0,1]h_{1}:\mathcal{A}\cup\{+\infty\}\to[0,1] be defined as

    h1(0)\displaystyle h_{1}(0) =1,\displaystyle=1,
    h1(α1)\displaystyle h_{1}(\alpha_{1}) =h(A,B)(α1),\displaystyle=h_{(A,B)}(\alpha_{1}),
    h1(α2)\displaystyle h_{1}(\alpha_{2}) =1α2,\displaystyle=1-\alpha_{2},
    h1(+)\displaystyle h_{1}(+\infty) =0,\displaystyle=0,

    and let h¯1\overline{h}_{1} be its piecewise-linear extension. It is again simple to verify that h¯1\overline{h}_{1} satisfies the conditions in Lemma 3.2 and therefore h¯1=h(P1,Q1)\overline{h}_{1}=h_{(P_{1},Q_{1})} for some P1,Q1P_{1},Q_{1} such that 𝖯𝖫𝖣(P1,Q1)\mathsf{PLD}_{(P_{1},Q_{1})} is supported on \mathcal{E}. Furthermore, it can be checked from our definition that (A,B)(P1,Q1)(A,B)\succeq(P_{1},Q_{1}).

    Similarly, let h2:𝒜{+}[0,1]h_{2}:\mathcal{A}\cup\{+\infty\}\to[0,1] be defined as

    h2(0)\displaystyle h_{2}(0) =1,\displaystyle=1,
    h2(α1)\displaystyle h_{2}(\alpha_{1}) =1eε+γeε+1,\displaystyle=1-e^{-\varepsilon}+\frac{\gamma}{e^{\varepsilon}+1},
    h2(α2)\displaystyle h_{2}(\alpha_{2}) =h(A,B)(α2),\displaystyle=h_{(A,B)}(\alpha_{2}),
    h2(+)\displaystyle h_{2}(+\infty) =0,\displaystyle=0,

    and let h¯2\overline{h}_{2} be its piecewise-linear extension. Again, h¯2=h(P2,Q2)\overline{h}_{2}=h_{(P_{2},Q_{2})} for some P2,Q2P_{2},Q_{2} such that 𝖯𝖫𝖣(P2,Q2)\mathsf{PLD}_{(P_{2},Q_{2})} is supported on \mathcal{E} and (A,B)(P2,Q2)(A,B)\succeq(P_{2},Q_{2}).

    Now, consider any (P,Q)(A,B)(P,Q)\preceq(A,B) such that 𝖯𝖫𝖣(P,Q)\mathsf{PLD}_{(P,Q)} is supported on \mathcal{E}. We claim that (P,Q)(P^1,Q^1)(P,Q)\nsucceq(\hat{P}_{1},\hat{Q}_{1}) or (P,Q)(P2,Q2)(P,Q)\nsucceq(P_{2},Q_{2}). To prove this, assume for the sake of contradiction that (P,Q)(P1,Q1)(P,Q)\nsucceq(P_{1},Q_{1}) and (P,Q)(P2,Q2)(P,Q)\nsucceq(P_{2},Q_{2}). This means that

    h(P,Q)(α1)\displaystyle h_{(P,Q)}(\alpha_{1}) h1(α1)=h(A,B)(α1),\displaystyle\geq h_{1}(\alpha_{1})=h_{(A,B)}(\alpha_{1}),
    h(P,Q)(α2)\displaystyle h_{(P,Q)}(\alpha_{2}) h2(α2)=h(A,B)(α2).\displaystyle\geq h_{2}(\alpha_{2})=h_{(A,B)}(\alpha_{2}).

    From piecewise-linearity of h(P,Q)h_{(P,Q)} restricted to [α1,α2][\alpha_{1},\alpha_{2}] (Lemma 3.2), we then have h(P,Q)(eε)=12(h(P,Q)(α1)+h(P,Q)(α2))>h(A,B)(eε)h_{(P,Q)}(e^{\varepsilon})=\frac{1}{2}\left(h_{(P,Q)}(\alpha_{1})+h_{(P,Q)}(\alpha_{2})\right)>h_{(A,B)}(e^{\varepsilon}), a contradiction to the assumption that (P,Q)(A,B)(P,Q)\preceq(A,B). ∎

5.1 A Greedy and Convex Hull Construction

The previous lemma shows that, unfortunately, there is no canonical choice for an optimistic estimate for a given PLD. Due to this, we propose a simple greedy algorithm to construct an optimistic estimate for the PLD of a given pair (A,B)(A,B). Similar to before, it will be more convenient to deal directly with the hockey-stick curve. Here we would like to construct f:𝒜[0,1]f:\mathcal{A}\to[0,1] such that its piecewise-linear extension f¯\overline{f} point-wise lower bounds h(A,B)h_{(A,B)}. The distribution P,QP,Q (and 𝖯𝖫𝖣(P,Q)\mathsf{PLD}_{(P,Q)}) can then be computed using Algorithm 1.

Our algorithm will assume that we can compute the derivative of hh at any given point α0\alpha\in{\mathbb{R}}_{\geq 0} (denoted by h(α)h^{\prime}(\alpha)). We remark that, for many widely used mechanisms including Laplace and Gaussian mechanisms, the closed-formed formula for h(α)h^{\prime}(\alpha) can be easily computed.

First Greedy Attempt. Before describing our algorithm, let us describe an approach that does not work; this will demonstrate the hurdles we have to overcome. Consider the following simple greedy algorithm: start with f(α0)=0f(\alpha_{0})=0 and, if we are currently at f(αi)f(\alpha_{i}), then find the largest possible f(αi+1)f(\alpha_{i+1}) such that the line f(αi),f(αi+1)f(\alpha_{i}),f(\alpha_{i+1}) is below h(A,B)h_{(A,B)}. (In other words, the line f(αi),f(αi+1)f(\alpha_{i}),f(\alpha_{i+1}) is tangent to h(A,B)h_{(A,B)}.)

While this is a natural approach, there are two issues with this algorithm:

  • \triangleright

    First and more importantly, it is possible that at some discretization point f(αi+1)f(\alpha_{i+1}) becomes negative! Obviously this invalidates the construction as ff will not correspond to a hockey-stick curve.

  • \triangleright

    Secondly, each computation of tangent line requires several (and sequential) computations of the derivative hh^{\prime}—rendering the algorithm inefficient—and is also subject to possible numerical instability.

We remark that if we instead start from right (i.e., f(αk)f(\alpha_{k})) and proceed greedily to the left (in decreasing order of ii), then the first issue will become that f(αi)f(\alpha_{i}) can be smaller than [1αi]+[1-\alpha_{i}]_{+}, which also makes it an invalid hockey-stick curve due to Lemma 2.1(iii). As will be explained below, we will combine these two directions of greedy together with a convex hull algorithm in our revised approach.

An Additional Assumption. For our algorithm, we will also need a couple of assumptions. The first one is that 1 belongs to the discretization set:

Assumption 5.2.

1𝒜1\in\mathcal{A} (or equivalently 00\in\mathcal{E}).

For the remainder of this section, we will use i[k]i^{*}\in[k] to denote the index for which αi=1\alpha_{i^{*}}=1 (i.e., εi=0\varepsilon_{i^{*}}=0).

We show below that this assumption is necessary. When it does not hold, then it may simply be impossible to find an optimistic estimate at all:

Lemma 5.3.

There exists a pair (A,B)(A,B) of distributions such that any (P,Q)(A,B)(P,Q)\preceq(A,B) satisfies 0supp(𝖯𝖫𝖣(P,Q))0\in\operatorname*{supp}(\mathsf{PLD}_{(P,Q)}).

  • Proof.

    Let A=BA=B. We simply have h(A,B)(α)=[1α]+h_{(A,B)}(\alpha)=[1-\alpha]_{+}. Due to Lemma 2.1(iii), we must have h(P,Q)(α)=[1α]+h_{(P,Q)}(\alpha)=[1-\alpha]_{+}. This simply implies 𝖯𝖫𝖣(P,Q)(0)=1\mathsf{PLD}_{(P,Q)}(0)=1 as desired. ∎

Our Greedy + Convex Hull Algorithm. We are now ready to describe our final algorithm. The main idea is to not attempt to create the curve in one left-to-right or right-to-left sweep, but rather to simply generate a “candidate set” FiF_{i} for each f(αi)f(\alpha_{i}). (Such a set will in fact be a singleton for all ii except i=ii=i^{*}, for which |Fi|2|F_{i}|\leq 2, but we will refer to FiF_{i}’s as sets here for simplicity of notation.) We then compute the convex hull of these points {(αi,fi)}i[k1],fiFi\{(\alpha_{i},f_{i})\}_{i\in[k-1],f_{i}\in F_{i}} and take it (or more precisely its lower curve) as our optimistic estimate. This last step immediately ensures the convexity of our curve, which is required for it to be a valid hockey-stick curve (Lemma 2.1(i)).

To construct the candidate set FiF_{i}, we combine the left-to-right and right-to-left greedy approaches. Specifically, for i={0,,i1}i=\{0,\dots,i^{*}-1\}, we draw the tangent line of the true curve hh at h(αi)h(\alpha_{i}) and let its intersection with the vertical line α=αi+1\alpha=\alpha_{i+1} be (αi+1,fi+1)(\alpha_{i+1},f^{\rightarrow}_{i+1}); then we add fi+1f^{\rightarrow}_{i+1} into Fi+1F_{i+1}. Similarly, for i={k1,,i+1}i=\{k-1,\dots,i^{*}+1\}, we draw the tangent line of the true curve hh at h(αi)h(\alpha_{i}) and let its intersection with the vertical line α=αi1\alpha=\alpha_{i-1} be (αi1,fi1)(\alpha_{i-1},f^{\leftarrow}_{i-1}); then we add fi1f^{\leftarrow}_{i-1} into Fi1F_{i-1}. At the very end points i=0i=0 and i=k1i=k-1, we also add 11 and 0 respectively to FiF_{i}.

Notice that this algorithm, unlike the previous (failed) greedy approach, only requires a calculation of hh^{\prime} at each αi𝒜{1,+}\alpha_{i}\in\mathcal{A}\setminus\{1,+\infty\}, which can be done in parallel. Furthermore, efficient algorithms for convex hull are well known in the literature and can be used directly.

The complete and more precise description of our algorithm is given in Algorithm 2. We also note here that |Fi|=1|F_{i}|=1 for all iii\neq i^{*} and |Fi|2|F_{i^{*}}|\leq 2 (as the point constructed from the left may be different from the point from the right). Nonetheless, we write FiF_{i}’s as sets for simplicity of notation. An illustration of the algorithm can be found in Figure 4.

Algorithm 2 Optimistic PLD Construction.
procedure OptimisticPLD(h,𝒜h,\mathcal{A})
     for i=0,i1i=0,\ldots i^{*}-1 do
         fi+1=h(αi)+(αi+1αi)h(αi)f^{\rightarrow}_{i+1}=h(\alpha_{i})+(\alpha_{i+1}-\alpha_{i})\cdot h^{\prime}(\alpha_{i})      
     f01f^{\rightarrow}_{0}\leftarrow 1 \triangleright h(α0)=1h(\alpha_{0})=1
     for i=k1,i+1i=k-1,\ldots i^{*}+1 do
         fi1=h(αi)(αiαi1)h(αi)f^{\leftarrow}_{i-1}=h(\alpha_{i})-(\alpha_{i}-\alpha_{i-1})\cdot h^{\prime}(\alpha_{i})      
     fk10f^{\leftarrow}_{k-1}\leftarrow 0
     HH\leftarrow ConvexHull({(αi,fi)}i{0,,i}{(αi,fi)}i{i,,k1}\{(\alpha_{i},f^{\rightarrow}_{i})\}_{i\in\{0,\dots,i^{*}\}}\cup\{(\alpha_{i},f^{\leftarrow}_{i})\}_{i\in\{i^{*},\dots,k-1\}})
     for i=0,,k1i=0,\ldots,k-1 do
         (αi,f(αi))(\alpha_{i},f(\alpha_{i}))\leftarrow lowest intersection point between HH and the vertical line α=αi\alpha=\alpha_{i}      
     f(αk)0f(\alpha_{k})\leftarrow 0 \triangleright αk=+\alpha_{k}=+\infty return DiscretizePLD(f(α0),,f(αk)f(\alpha_{0}),\dots,f(\alpha_{k}))

Having described our algorithm, we will now proof its correctness, i.e., that it outputs a pair of distributions dominated by the input pair.

Theorem 5.4.

Let (P,Q)(P^{\downarrow},Q^{\downarrow}) denote the output of OptimisticPLD(h,𝒜)\textsc{OptimisticPLD}(h,\mathcal{A}) where h=h(A,B)h=h_{(A,B)}. Then, under 5.2, we have (P,Q)(A,B)(P^{\downarrow},Q^{\downarrow})\preceq(A,B).

To prove Theorem 5.4, it will be crucial to have the following lower bounds on the candidate points.

Lemma 5.5.
  1. (i)

    For all i{0,,i}i\in\{0,\dots,i^{*}\}, f(αi)1αif^{\rightarrow}(\alpha_{i})\geq 1-\alpha_{i}.

  2. (ii)

    For all i{i,,k1}i\in\{i^{*},\dots,k-1\}, f(αi)0f^{\leftarrow}(\alpha_{i})\geq 0.

  3. (iii)

    For all i{0,,i}i\in\{0,\dots,i^{*}\}, f(αi)h(αi)f^{\rightarrow}(\alpha_{i})\leq h(\alpha_{i}).

  4. (iv)

    For all i{i,,k1}i\in\{i^{*},\dots,k-1\}, f(αi)h(αi)f^{\leftarrow}(\alpha_{i})\leq h(\alpha_{i}).

  • Proof.
    1. (i)

      The statement obviously holds for i=0i=0. Next, consider i[i]i\in[i^{*}]. From (ii) and (iii) of Lemma 2.1, we have h(0)1h^{\prime}(0)\geq-1. Furthermore, the convexity of hh (Lemma 2.1(i)) implies that h(αk1)h(0)1h^{\prime}(\alpha_{k-1})\geq h^{\prime}(0)\geq-1. Therefore, we have

      f(αi)\displaystyle f^{\rightarrow}(\alpha_{i}) =h(αi1)+(αiαi1)h(αi1)\displaystyle=h(\alpha_{i-1})+(\alpha_{i}-\alpha_{i-1})\cdot h^{\prime}(\alpha_{i-1})
      h(αi1)(αiαi1)\displaystyle\geq h(\alpha_{i-1})-(\alpha_{i}-\alpha_{i-1})
      (1αi1)(αiαi1)\displaystyle\geq(1-\alpha_{i-1})-(\alpha_{i}-\alpha_{i-1})
      =1αi,\displaystyle=1-\alpha_{i},

      where the second inequality is due to Lemma 2.1(iii).

    2. (ii)

      The statement obviously holds for i=k1i=k-1. Next, consider i{i,,k2}i\in\{i^{*},\dots,k-2\}. Since hh is non-increasing (from Lemma 2.1(i)), we have h(αi+1)0h^{\prime}(\alpha_{i+1})\leq 0. Therefore, f(αi)h(αi+1)0f^{\leftarrow}(\alpha_{i})\geq h(\alpha_{i+1})\geq 0.

    3. (iii)

      The statement obviously holds for i=0i=0. For i[i]i\in[i^{*}], the convexity of hh (Lemma 2.1(i)) immediately implies that

      f(αi)=h(αi1)+(αiαi1)h(αi1)h(αi).\displaystyle f^{\rightarrow}(\alpha_{i})=h(\alpha_{i-1})+(\alpha_{i}-\alpha_{i-1})\cdot h^{\prime}(\alpha_{i-1})\leq h(\alpha_{i}).
    4. (iv)

      The statement obviously holds for i=k1i=k-1. For i{i,,k2}i\in\{i^{*},\dots,k-2\}, the convexity of hh (Lemma 2.1(i)) immediately implies that

      f(αi)\displaystyle f^{\rightarrow}(\alpha_{i}) =h(αi+1)+(αi+1αi)h(αi+1)\displaystyle=h(\alpha_{i+1})+(\alpha_{i+1}-\alpha_{i})\cdot h^{\prime}(\alpha_{i+1})
      h(αi).\displaystyle\leq h(\alpha_{i}).\qed

We are now ready to prove Theorem 5.4.

  • Proof of Theorem 5.4.

    We start by observing that the vertices of the convex hull HH consists of the points (α0,f(α0)),,(αq,f(αq)),(αq+1,f(αq+1)),(\alpha_{\ell_{0}},f^{\rightarrow}(\alpha_{\ell_{0}})),\dots,(\alpha_{\ell_{q}},f^{\rightarrow}(\alpha_{\ell_{q}})),(\alpha_{\ell_{q+1}},f^{\leftarrow}(\alpha_{\ell_{q+1}})), ,(αm,f(αm))\dots,(\alpha_{\ell_{m}},f^{\leftarrow}(\alpha_{\ell_{m}})), where 0=0<<m=k10=\ell_{0}<\cdots<\ell_{m}=k-1 and qiq+1\ell_{q}\leq i^{*}\leq\ell_{q+1}.

    First, we have to show that f(α0),,f(αk)f(\alpha_{0}),\dots,f(\alpha_{k}) constitute a valid input to the DiscretizePLD algorithm. Per Lemma 3.2, we only need to show that (i) f¯\overline{f} is non-increasing, (ii) f¯\overline{f} is convex, and (iii) f¯(α)[1α]+\overline{f}(\alpha)\geq[1-\alpha]_{+} for all α0\alpha\in{\mathbb{R}}_{\geq 0}. Notice also that f¯\overline{f} is simply the piecewise-linear curve connecting (α0,f(α0)),,(αq,f(αq)),(αq+1,f(αq+1)),(\alpha_{\ell_{0}},f^{\rightarrow}(\alpha_{\ell_{0}})),\dots,(\alpha_{\ell_{q}},f^{\rightarrow}(\alpha_{\ell_{q}})),(\alpha_{\ell_{q+1}},f^{\leftarrow}(\alpha_{\ell_{q+1}})), ,(αm,f(αm))\dots,(\alpha_{\ell_{m}},f^{\leftarrow}(\alpha_{\ell_{m}})).

To see that (i) holds, observe that the second-rightmost point in the convex hull must be (αj,fj)(\alpha_{j},f_{j}) for some j{0,,k2}j\in\{0,\dots,k-2\} where fj=fjf_{j}=f^{\rightarrow}_{j} or fj=fjf_{j}=f^{\leftarrow}_{j}. In either case, Lemma 5.5 implies that fj0=f(αk1)f_{j}\geq 0=f^{\leftarrow}(\alpha_{k-1}). Since f¯\overline{f} is the lower curve of the convex hull HH and (αj,fj),(αk1,0)(\alpha_{j},f_{j}),(\alpha_{k-1},0) is its rightmost segment, we can conclude that f¯\overline{f} is non-increasing in the range [α0,αk1][\alpha_{0},\alpha_{k-1}]. Finally, since we simply have f¯(α)=0\overline{f}(\alpha)=0 for all α>αk1\alpha>\alpha_{k-1}, it is also non-increasing in the range [αk1,+)[\alpha_{k-1},+\infty), thereby proving (i).

As for (ii), since f¯|[α0,αk1]\overline{f}|_{[\alpha_{0},\alpha_{k-1}]} forms the lower boundary of the convex hull HH, f¯\overline{f} is convex in the range [α0,αk1][\alpha_{0},\alpha_{k-1}]. Again, since we simply have f¯(α)=0\overline{f}(\alpha)=0 for all α>αk1\alpha>\alpha_{k-1}, we can conclude that it is convex for the entire range [0,+)[0,+\infty).

Finally, for (iii), Lemma 5.5 states that all the points {(αi,fi)}i{0,,i}{(αi,fi)}i{i,,k1}\{(\alpha_{i},f^{\rightarrow}_{i})\}_{i\in\{0,\dots,i^{*}\}}\cup\{(\alpha_{i},f^{\leftarrow}_{i})\}_{i\in\{i^{*},\dots,k-1\}} is above the curve α[1α]+\alpha\mapsto[1-\alpha]_{+}. Since f¯\overline{f} is in the convex hull HH, we can conclude that f¯\overline{f} also lies above this curve, as desired.

Now that we have proved that f(α0),,f(αk)f(\alpha_{0}),\dots,f(\alpha_{k}) is a valid input to the DiscretizePLD algorithm (and therefore the output (P,Q)(P^{\uparrow},Q^{\uparrow}) is a pair of valid probability distributions), we will next show that (P,Q)(A,B)(P^{\uparrow},Q^{\uparrow})\preceq(A,B). This is equivalent to showing that f¯(α)h(α)\overline{f}(\alpha)\leq h(\alpha) for all α0{+}\alpha\geq{\mathbb{R}}_{\geq 0}\cup\{+\infty\}. To prove this, we consider three cases based on the value of α\alpha. For brevity, we say that a curve C1C_{1} is below a curve C2C_{2} when they share the same domain Ω\Omega and C1(o)C2(o)C_{1}(o)\leq C_{2}(o) for all oΩo\in\Omega.

  • \triangleright

    Case I: ααk1\alpha\geq\alpha_{k-1}. In this case, f¯(α)=0h(α)\overline{f}(\alpha)=0\leq h(\alpha).

  • \triangleright

    Case II: α[1,αk1)\alpha\in[1,\alpha_{k-1}). Suppose that α[αi,αi+1)\alpha\in[\alpha_{i},\alpha_{i+1}). Let L1L_{1} denote the line segment from (αi,f(αi))(\alpha_{i},f^{\leftarrow}(\alpha_{i})) to (αi+1,f(αi+1))(\alpha_{i+1},f^{\leftarrow}(\alpha_{i+1})), and L2L_{2} denote the line segment from (αi,f(αi))(\alpha_{i},f^{\leftarrow}(\alpha_{i})) to (αi+1,h(αi+1))(\alpha_{i+1},h(\alpha_{i+1})). From Lemma 5.5(iv), L1L_{1} is below L2L_{2}. Furthermore, since f¯|[αi,αi+1)\overline{f}|_{[\alpha_{i},\alpha_{i+1})} is a lower boundary of the convex hull HH containing L1L_{1}, it must also be below L1L_{1}. Therefore, we have

    f¯(α)L2(α)\displaystyle\overline{f}(\alpha)\leq L_{2}(\alpha) L1(α)\displaystyle\leq L_{1}(\alpha)
    =h(αi+1)(αi+1α)h(αi+1)\displaystyle=h(\alpha_{i+1})-(\alpha_{i+1}-\alpha)h^{\prime}(\alpha_{i+1})
    h(α),\displaystyle\leq h(\alpha),

    where the last inequality follows from convexity of hh.

  • \triangleright

    Case III: α[0,1)\alpha\in[0,1). Suppose that α[αi,αi+1)\alpha\in[\alpha_{i},\alpha_{i+1}). Similarly to the previous case, let L3L_{3} denote the line segment from (αi,f(αi))(\alpha_{i},f^{\rightarrow}(\alpha_{i})) to (αi+1,f(αi+1))(\alpha_{i+1},f^{\rightarrow}(\alpha_{i+1})), and L4L_{4} denote the line segment from (αi,h(αi))(\alpha_{i},h(\alpha_{i})) to (αi+1,f(αi+1))(\alpha_{i+1},f^{\rightarrow}(\alpha_{i+1})). From Lemma 5.5(iii), L3L_{3} is below L4L_{4}. Furthermore, since f¯|[αi,αi+1)\overline{f}|_{[\alpha_{i},\alpha_{i+1})} is a lower boundary of the convex hull HH containing L3L_{3}, it must also be below L3L_{3}. Therefore, we have

    f¯(α)L3(α)\displaystyle\overline{f}(\alpha)\leq L_{3}(\alpha) L4(α)\displaystyle\leq L_{4}(\alpha)
    =h(αi)+(αi+1α)h(αi)\displaystyle=h(\alpha_{i})+(\alpha_{i+1}-\alpha)h^{\prime}(\alpha_{i})
    h(α),\displaystyle\leq h(\alpha),

    where the last inequality follows from convexity of hh.

As a result, we can conclude that (P,Q)(A,B)(P^{\uparrow},Q^{\uparrow})\preceq(A,B), completing our proof. ∎

5.2 Comparison to Privacy Loss Buckets

Similar to Section 4.1, PB [24] can also be applied for optimistic-estimate: let 𝖯𝖫𝖣~(A,B)\widetilde{\mathsf{PLD}}^{\downarrow}_{(A,B)} be the probability distribution where

𝖯𝖫𝖣~(A,B)(εi1)=𝖯𝖫𝖣(A,B)([εi1,εi)),\displaystyle\widetilde{\mathsf{PLD}}^{\downarrow}_{(A,B)}(\varepsilon_{i-1})=\mathsf{PLD}_{(A,B)}([\varepsilon_{i-1},\varepsilon_{i})),

for all i[k]i\in[k]. That is, 𝖯𝖫𝖣~(A,B)\widetilde{\mathsf{PLD}}^{\downarrow}_{(A,B)} is a probability distribution on \mathcal{E} that is stochastically dominated by 𝖯𝖫𝖣(A,B)\mathsf{PLD}_{(A,B)}; furthermore, 𝖯𝖫𝖣~(A,B)\widetilde{\mathsf{PLD}}^{\downarrow}_{(A,B)} is the greatest such distribution under stochastic dominant (partial) ordering. It is important to note that, unlike the pessimistic-PB estimate, the optimistic-PB estimate 𝖯𝖫𝖣~(A,B)\widetilde{\mathsf{PLD}}^{\downarrow}_{(A,B)} is not necessarily a valid PLD for some pair of distributions. This can easily be seen by, e.g., taking a PLD of any ε\varepsilon-DP mechanism for finite ε\varepsilon and let ={,+}\mathcal{E}=\{-\infty,+\infty\}; the optimistic-PB estimate puts all of its mass at 0, which is clearly not a valid PLD.

We present illustrations of our optimistic PLD and optimistic-PB estimates in Figure 3. Recall that in the pessimistic case, we can show that the pessimistic-PB estimate is no better than our approach (Corollary 4.3). Although we observe similar behaviours in the optimistic case in simple examples (e.g. Figure 3) and also in our experiments in Section 6, this unfortunately does not hold in general. Indeed, if 𝖯𝖫𝖣(A,B)\mathsf{PLD}_{(A,B)} has a non-zero mass at ++\infty (or equivalently h(+)0h(+\infty)\neq 0), then the optimistic-PB estimate still keeps this mass while our does not. The latter is because we set f(+)=0f(+\infty)=0 in Algorithm 2. Note here that we cannot set f(+)=h(+)f(+\infty)=h(+\infty) here because the monotonicity may not hold anymore; it is possible that f(αi)<h(+)f^{\rightarrow}(\alpha_{i})<h(+\infty) for some i[i]i\in[i^{*}]. Such examples highlight the challenge in finding a good optimistic estimate (especially in light of the non-existence of the best one, i.e., Lemma 5.1), and we provide further discussion regarding this in Section 7.

Refer to caption
(a) Gaussian Mechanism
Refer to caption
(b) Laplace Mechanism
Fig. 3: Illustrations of the hockey-stick curves of the Gaussian and Laplace mechanisms , and their optimistic estimates from our approach and the Privacy Bucket (PB) approach of [24]. The setting of parameters and labels are similar to Figure 2.
Refer to caption
(a) One Step of Candidate Generation for i<ii<i^{*}
Refer to caption
(b) One Step of Candidate Generation for i>ii>i^{*}
Refer to caption
(c) All Candidate Points and Its Convex Hull
Fig. 4: Illustrations of our optimistic PLD construction algorithm (Algorithm 2) for Laplace mechanism with noise multiplier 1. Figure 4(a) demonstrates one step of how the candidate points are generated when i<ii<i^{*}. Specifically, a line tangent to the hockey-stick curve is drawn at each point (αi,h(αi))(\alpha_{i},h(\alpha_{i})); the intersections with the vertical line at αi+1\alpha_{i+1} give the candidate points (αi+1,fi+1)(\alpha_{i+1},f^{\rightarrow}_{i+1}). Similarly, Figure 4(b) shows such a step for i>ii>i^{*}; in this case, the same line is drawn and its intersection with the vertical line at αi1\alpha_{i-1} give the candidate points (αi1,fi1)(\alpha_{i-1},f^{\leftarrow}_{i-1}). Figure 4(c) shows all the candidate points generated together with its convex hull, which we use as our optimistic estimate.

6 Evaluation

We compare our algorithm with the Privacy Buckets algorithm [24] as implemented in the Google DP library777Even though there are several other papers [19, 20, 18] that build on PB, all of them still use the same PB-based approximation, with the differences being how the truncation is computed for FFT. We use the implementation in the Google DP library github.com/google/differential-privacy/tree/main/python, and the algorithm of Gopi et al. [15] implemented in Microsoft PRV Accountant.888Implementation from github.com/microsoft/prv_accountant Gopi et al.’s algorithm does not fit into the pessimistic/optimistic framework as described in Section 2.5. Instead, their algorithm uses an approximation of PLD that is neither optimistic nor pessimistic and uses a concentration bound to derive pessimistic and optimistic estimates. Indeed, their approximate distribution maintains the same expectation as the true PLD, which is the main ingredient in their improvement over previous work.

Refer to caption
Fig. 5: ε\varepsilon vs Number of Compositions of the Gaussian mechanism with noise scale 8080. All methods evaluated with the same discretization interval of 0.0050.005.

As a first cut, we evaluate pessimistic and optimistic estimates on the privacy parameter ε\varepsilon, for a fixed value of δ=105\delta=10^{-5}, for varying number of compositions of the Gaussian mechanism with noise scale 8080, while comparing our approach to the two other implementations mentioned above. We use the same discretization interval to evaluate each algorithm. The reason for choosing the Gaussian mechanism is that the exact value of ε\varepsilon can be computed explicitly. We find that the estimates given by our approach are the tightest.

Remark 6.1.

For any specified discretization interval, each algorithm has a different choice of how many discretization points are included in \mathcal{E}. Our implementation uses the same set of discretization points as used by the Google DP implementation. The number of discretization points increases with the number of self compositions (we use the Google DP implementation to perform self composition999We found that the Google DP implementation has a significantly worse running time when computing optimistic estimates, due to lack of truncation. We modify the self-composition method in the Google DP library to incorporate truncation when computing optimistic estimates, and evaluate both ours and Google DP implementation with this minor modification. These do not change the estimates significantly, but drastically reduce the running time.). On the other hand, Microsoft PRV Accountant chooses a number of discretization points, depending on the number of compositions desired, and this number does not change after self composition. In all the evaluation experiments mentioned in this paper, we find that the number of discretization points in our approach are lower than the number of discretization points in the PRV Accountant (even after composition).

Our main evaluation involves computing pessimistic and optimistic estimates on the privacy parameter ε\varepsilon, for a fixed value of δ\delta, for varying number of compositions of the Poisson sub-sampled Gaussian mechanism and comparing our approach to each of the two other implementations. Note that this particular mechanism is quite popular in that it captures the privacy analysis of DP-SGD where the number of compositions is equal to the number of iterations of the training algorithm, and the subsampling rate is equal to the fraction of the batch size divided by the total number of training examples [1]. In particular, we consider the Gaussian mechanism with noise scale 11, Poisson-subsampled with probability 0.010.01. We compare against each competing algorithm twice, once where both algorithms use the same discretization interval, and once where our approach uses a larger discretization interval than the competing algorithm. We additionally plot the running time required for this computation for each number of compositions; we ran the evaluation for each number of compositions 20×20\times and plot the mean running time along with a shaded region indicating 25th–75th percentiles of running time.

Comparison with Google DP.

The comparison with Google DP is presented in Figure 6. Figures 6(a) and 6(c) compare the ε\varepsilon’s and runtimes for both methods using the same discretization interval, and finds that our method gives a significantly tighter estimate for a mildly larger running time. Figures 6(b) and 6(d) compare the ε\varepsilon’s and runtimes for both methods with different discretization intervals, and using a discretization interval that is 66.66×66.66\times larger, our method gives comparable estimates, with a drastic speed-up (300×\sim 300\times).

Refer to caption
(a) ε\varepsilon comparison with discretization interval 0.005 for both this work and Google DP
Refer to caption
(b) ε\varepsilon comparison with discretization interval 0.005 for this work, 0.000075 for Google DP
Refer to caption
(c) Running time comparison with discretization interval 0.005 for both this work and Google DP (shaded region indicates 25-75 percentile range across 20 independent runs)
Refer to caption
(d) Running time comparison with discretization interval 0.005 for this work, 0.000075 for Google DP (shaded region indicates 25-75 percentile range across 20 independent runs)
Fig. 6: Computation of pessimistic/optimistic estimates of the privacy parameter ε\varepsilon (for fixed parameter δ=105\delta=10^{-5}) for self-composition of the Gaussian mechanism with noise scale 11, Poisson-subsampled with probability 0.010.01, using the Google DP implementation of the PB approach [24] vs. our approach. Figures 6(a) and 6(c) compare the ε\varepsilon’s and runtimes for both methods using the same discretization interval. Figures 6(b) and 6(d) compare the ε\varepsilon’s and runtimes for both methods with different discretization intervals.
Comparison with Microsoft PRV Accountant.

The comparison with Microsoft PRV Accountant is presented in Figure 7. Figures 7(a) and 7(c) compare the ε\varepsilon’s and runtimes for both methods using the same discretization interval, and finds that our method gives a significantly tighter estimate with already shorter running time. Figures 7(b) and 7(d) compare the ε\varepsilon’s and runtimes for both methods with different discretization intervals, and using a discretization interval that is 6.66×6.66\times larger, our method gives comparable estimates, with an even larger speed-up.

Refer to caption
(a) ε\varepsilon comparison with discretization interval 0.005 for both this work and PRV Accountant
Refer to caption
(b) ε\varepsilon comparison with discretization interval 0.005 for this work, 0.00075 for PRV Accountant (shaded region indicates 25-75 percentile range across 20 independent runs)
Refer to caption
(c) Running time comparison with discretization interval 0.005 for both this work and PRV Accountant (shaded region indicates 25-75 percentile range across 20 independent runs)
Refer to caption
(d) Running time comparison with discretization interval 0.005 for this work, 0.00075 for PRV Accountant
Fig. 7: Computation of pessimistic/optimistic estimates of the privacy parameter ε\varepsilon (for fixed parameter δ=105\delta=10^{-5}) for self-composition of the Gaussian mechanism with noise scale 11, Poisson-subsampled with probability 0.010.01, using the Microsoft PRV Accountant [15] vs. our approach. We note that the pessimistic and optimistic curves of the Microsoft PRV Accountant [15] are identical in Figure 7(c) and Figure 7(d).

In Appendix B, we perform a similar evaluation for the Poisson-subsampled Laplace mechanism.

7 Discussion & Open Problems

In this work, we have proposed a novel approach to pessimistic and optimistic estimates for PLDs, which outperforms previous approaches under similar discretization intervals, and allows for a more compact representation of PLDs while retaining similar error guarantees. There are still several interesting future directions that one could consider.

As we have proved in Lemma 5.1, there is no unique “best” way to pick an optimistic estimate, and we proposed a greedy algorithm (Algorithm 2) for this task. However, it is difficult to determine how good this greedy algorithm is in general. Instead, it might also be interesting to find (P,Q)(A,B)(P^{\downarrow},Q^{\downarrow})\preceq(A,B) that minimizes a certain objective involving h(P,Q)h_{(P^{\downarrow},Q^{\downarrow})} and h(A,B)h_{(A,B)}. For example, one could consider the area between the two curves, or the Fréchet distance between them. An intriguing direction here is to determine (i) which objective captures the notion of “good approximation” better in terms of composition, and (ii) for a given objective, whether there is an efficient algorithm to compute such (P,Q)(P^{\downarrow},Q^{\downarrow}). We remark that for some objectives, such as the area between the two curves, it is possible to discretize the candidate values for each f(αi)f(\alpha_{i}) and use dynamic programming in an increasing order of ii (with the state being f(αi1),f(αi)f(\alpha_{i-1}),f(\alpha_{i})). Even for these objectives, it remains interesting to determine whether such discretization is necessary and whether more efficient algorithms exist.

Also related is the question of how to theoretically explain our experimental findings (Section 6). Although we see significant numerical improvements, it is intriguing to understand theoretically where these improvements come from and which properties of PLDs govern how big such improvements are. More broadly, given an estimate of a PLD, how can we quantify how “good” it is? Previous work (e.g., [19, 15]) has obtained certain theoretical bounds on the errors; it would be interesting to investigate whether these bounds can help answer the aforementioned question.

Furthermore, the entire line of work on PLD-based accounting [24, 19, 20, 18], including this paper, has so far considered only non-interactive compositions, meaning that the mechanisms that are run in subsequent steps cannot be changed based on the outputs from the previous steps. This is not a coincidence: interactive composition is highly non-trivial and in fact it is known that the advanced composition theorem (which is even a more specific form of PLD) does not hold in this regime [28]. Several solutions have been proposed here, such as modified formulae for advanced compositions [28, 31] and Renyi DP [14, 21]. However, as discussed earlier, these methods may be loose even in the non-interactive setting, which is the original motivation for PLD-based accounting. Therefore, it would be interesting to understand whether there is a tighter method similar to PLDs that also works in the interactive setting.

Acknowledgments

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

References

  • [1] Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In CCS, pages 308–318, 2016.
  • [2] John M Abowd. The US Census Bureau adopts differential privacy. In KDD, pages 2867–2867, 2018.
  • [3] Apple Differential Privacy Team. Learning with privacy at scale. Apple Machine Learning Journal, 2017.
  • [4] Zhiqi Bu, Sivakanth Gopi, Janardhan Kulkarni, Yin Tat Lee, Judy Hanwen Shen, and Uthaipon Tantipongpipat. Fast and memory efficient differentially private-SGD via JL projections. In NeurIPS, 2021.
  • [5] Mark Bun, Cynthia Dwork, Guy N Rothblum, and Thomas Steinke. Composable and versatile privacy via truncated CDP. In STOC, pages 74–86, 2018.
  • [6] Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In TCC, pages 635–658, 2016.
  • [7] Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin. Collecting telemetry data privately. In NeurIPS, pages 3571–3580, 2017.
  • [8] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, pages 486–503, 2006.
  • [9] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, pages 265–284, 2006.
  • [10] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., 9(3-4):211–407, 2014.
  • [11] Cynthia Dwork and Guy N Rothblum. Concentrated differential privacy. arXiv:1603.01887, 2016.
  • [12] Cynthia Dwork, Guy N Rothblum, and Salil Vadhan. Boosting and differential privacy. In FOCS, pages 51–60, 2010.
  • [13] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In CCS, pages 1054–1067, 2014.
  • [14] Vitaly Feldman and Tijana Zrnic. Individual privacy accounting via a Rényi filter. In NeurIPS, 2021.
  • [15] Sivakanth Gopi, Yin Tat Lee, and Lukas Wutschitz. Numerical composition of differential privacy. In NeurIPS, 2021.
  • [16] Andy Greenberg. Apple’s “differential privacy” is about collecting your data – but not your data. Wired, June, 13, 2016.
  • [17] Peter Kairouz, Sewoong Oh, and Pramod Viswanath. The composition theorem for differential privacy. In ICML, pages 1376–1385, 2015.
  • [18] Antti Koskela and Antti Honkela. Computing differential privacy guarantees for heterogeneous compositions using FFT. arXiv:2102.12412, 2021.
  • [19] Antti Koskela, Joonas Jälkö, and Antti Honkela. Computing tight differential privacy guarantees using FFT. In AISTATS, pages 2560–2569, 2020.
  • [20] Antti Koskela, Joonas Jälkö, Lukas Prediger, and Antti Honkela. Tight differential privacy for discrete-valued mechanisms and for the subsampled Gaussian mechanism using FFT. In AISTATS, pages 3358–3366, 2021.
  • [21] Mathias Lécuyer. Practical privacy filters and odometers with Rényi differential privacy and applications to differentially private deep learning. arXiv:2103.01379, 2021.
  • [22] Google’s Differential Privacy Libraries. DP Accounting Library. https://github.com/google/differential-privacy/tree/main/python/dp_accounting, 2020.
  • [23] Antti Koskela. Lukas Prediger. Code for computing tight guarantees for differential privacy. https://github.com/DPBayes/PLD-Accountant, 2020.
  • [24] Sebastian Meiser and Esfandiar Mohammadi. Tight on budget? Tight bounds for rr-fold approximate differential privacy. In CCS, pages 247–264, 2018.
  • [25] Microsoft. A fast algorithm to optimally compose privacy guarantees of differentially private (DP) mechanisms to arbitrary accuracy. https://github.com/microsoft/prv_accountant, 2021.
  • [26] Ilya Mironov. Rényi differential privacy. In CSF, pages 263–275, 2017.
  • [27] Jack Murtagh and Salil Vadhan. The complexity of computing the optimal composition of differential privacy. In TCC, pages 157–175, 2016.
  • [28] Ryan M. Rogers, Salil P. Vadhan, Aaron Roth, and Jonathan R. Ullman. Privacy odometers and filters: Pay-as-you-go composition. In NIPS, pages 1921–1929, 2016.
  • [29] Stephen Shankland. How Google tricks itself to protect Chrome user privacy. CNET, October, 2014.
  • [30] David M Sommer, Sebastian Meiser, and Esfandiar Mohammadi. Privacy loss classes: The central limit theorem in differential privacy. PoPETS, 2019(2):245–269, 2019.
  • [31] Justin Whitehouse, Aaditya Ramdas, Ryan Rogers, and Steven Wu. Improved privacy filters and odometers: Time-uniform bounds in privacy composition. In TPDP, 2021.
  • [32] Yuqing Zhu, Jinshuo Dong, and Yu-Xiang Wang. Optimal accounting of differential privacy via characteristic function. In AISTATS, pages 4782–4817, 2022.

Appendix A Proofs

See 2.6

  • Proof.

    We have from the definition of hockey-stick divergence that

    Deε(P||Q)\displaystyle D_{e^{\varepsilon}}(P||Q) =ω[P(ω)eεQ(ω)]+\displaystyle\leavevmode\nobreak\ =\leavevmode\nobreak\ \sum_{\omega}\ [P(\omega)-e^{\varepsilon}\cdot Q(\omega)]_{+}
    =ω[1eεlog(P(ω)/Q(ω))]+P(ω)\displaystyle\leavevmode\nobreak\ =\leavevmode\nobreak\ \sum_{\omega}\ [1-e^{\varepsilon-\log(P(\omega)/Q(\omega))}]_{+}\cdot P(\omega)
    =εsupp(𝖯𝖫𝖣(P,Q))[1eεε]+𝖯𝖫𝖣(P,Q)(ε)\displaystyle\leavevmode\nobreak\ =\leavevmode\nobreak\ \sum_{\varepsilon^{\prime}\in\operatorname*{supp}(\mathsf{PLD}_{(P,Q)})}[1-e^{\varepsilon-\varepsilon^{\prime}}]_{+}\cdot\mathsf{PLD}_{(P,Q)}(\varepsilon^{\prime})

    where the last line follows from the fact that

    𝖯𝖫𝖣(P,Q)(ε):=ω:log(P(ω)/Q(ω))=εP(ω).\mathsf{PLD}_{(P,Q)}(\varepsilon^{\prime})\leavevmode\nobreak\ :=\leavevmode\nobreak\ \sum_{\omega:\log(P(\omega)/Q(\omega))=\varepsilon^{\prime}}P(\omega)\,.\qed

Appendix B Evaluation of Poisson-Subsampled Laplace Mechanism

Similar to Section 6, we compute pessimistic and optimistic estimates on the privacy parameter ε\varepsilon, for a fixed value of δ\delta, for varying number of compositions of the Poisson sub-sampled Laplace mechanism and comparing our approach to the Google DP implementation.101010we were unable to compare against Microsoft PRV Accountant, since their implementation does not have support for the Laplace mechanism yet. In particular, we consider the Laplace mechanism with noise scale 55, Poisson-subsampled with probability 0.010.01. We compare against Google DP implementation twice, once where both algorithms use the same discretization interval, and once where our approach uses a larger discretization interval than the competing algorithm. We additionally plot the running time required for this computation for each number of compositions; we ran the evaluation for each number of compositions 20×20\times and plot the mean running time along with a shaded region indicating 25th–75th percentiles of running time.

The comparison with Google DP is presented in Figure 8. Figures 6(a) and 8(c) compare the ε\varepsilon’s and runtimes for both methods using the same discretization interval, and finds that our method gives a significantly tighter estimate for a mildly larger running time. Figures 8(b) and 8(d) compare the ε\varepsilon’s and runtimes for both methods with different discretization intervals, and using a discretization interval that is 100×100\times larger, our method gives comparable estimates, with a significant running time speed-up (75×\sim 75\times).

Refer to caption
(a) ε\varepsilon comparison with discretization interval 0.0002 for both this work and Google DP
Refer to caption
(b) ε\varepsilon comparison with discretization interval 0.0002 for this work, 0.000002 for Google DP (shaded region indicates 25-75 percentile range across 20 independent runs)
Refer to caption
(c) Running time comparison with discretization interval 0.0002 for both this work and Google DP (shaded region indicates 25-75 percentile range across 20 independent runs)
Refer to caption
(d) Running time comparison with discretization interval 0.0002 for this work, 0.000002 for Google DP
Fig. 8: Computation of pessimistic/optimistic estimates of the privacy parameter ε\varepsilon (for fixed parameter δ=105\delta=10^{-5}) for self-composition of the Laplace mechanism with noise scale 11, Poisson-subsampled with probability 0.010.01, using the Google DP implementation of the PB approach [24] vs. our approach.Figures 8(a) and 8(c) compare the ε\varepsilon’s and runtimes for both methods using the same discretization interval. Figures 8(b) and 8(d) compare the ε\varepsilon’s and runtimes for both methods with different discretization intervals.

Appendix C Inaccuracies from Floating-Point Arithmetic

We briefly discuss the errors due to floating-point arithmetic. In our implementation, we use the default float datatype in python, which conforms to IEEE-754 “double precision”. Roughly speaking, this means that the resolution of each floating-point number of 2531.110162^{-53}\approx 1.1\cdot 10^{-16}. The number of operations performed in our algorithms scales linearly with the support size of the (discretized) PLD, which is less than 10410^{4} in all of our experiments. Therefore, a rough heuristic suggests that the numerical error for δ\delta here would be less than 101110^{-11}. We stress however that this is just a heuristic and is not a formal guarantee: achieving a formal guarantee is much more complicated, e.g., our optimistic algorithm requires computing a convex hull and one would have to formalize how the numerical error from convex hull computation affects the final δ\delta.

Finally, we also remark that, while Gopi et al. [15, Appendix A] note that they experience numerical issues around δ109\delta\approx 10^{-9}, we do not experience the same issues in our algorithm even for similar setting of parameters even for δ\delta as small as 101210^{-12}.