
Faster Privacy Accounting via Evolving Discretization

Badih Ghazi    Pritish Kamath    Ravi Kumar    Pasin Manurangsi
Abstract

We introduce a new algorithm for numerical composition of privacy random variables, useful for computing accurate differential privacy parameters for compositions of mechanisms. Our algorithm achieves a running time and memory usage of $\mathrm{polylog}(k)$ for the task of self-composing a mechanism, from a broad class of mechanisms, $k$ times; this class, e.g., includes the sub-sampled Gaussian mechanism that appears in the analysis of differentially private stochastic gradient descent. By comparison, recent work by Gopi et al. (2021) has obtained a running time of $\widetilde{O}(\sqrt{k})$ for the same task. Our approach extends to the case of composing $k$ different mechanisms in the same class, improving upon their running time and memory usage from $\widetilde{O}(k^{1.5})$ to $\widetilde{O}(k)$.


1 Introduction

Differential privacy (DP) (Dwork et al., 2006b, a) has become the preferred notion of privacy in both academia and industry. Fueled by the increased awareness of and demand for privacy, several systems that use DP mechanisms to guard users' privacy have been deployed in industry (Erlingsson et al., 2014; Shankland, 2014; Greenberg, 2016; Apple Differential Privacy Team, 2017; Ding et al., 2017; Kenthapadi & Tran, 2018; Rogers et al., 2021) and by the US Census (Abowd, 2018). Besides handling large volumes of data, many of these systems offer real-time private data analytics and inference capabilities, which entail strict computational efficiency requirements on the underlying DP operations.

We recall the definition of DP (Dwork et al., 2006b, a):

Definition 1.1 (DP).

Let $\varepsilon > 0$ and $\delta \in [0,1]$. A randomized algorithm $\mathcal{M}:\mathcal{X}^{n}\to\mathcal{Y}$ is $(\varepsilon,\delta)$-DP if, for all $x, x' \in \mathcal{X}^{n}$ differing on a single index and all outputs $S \subseteq \mathcal{Y}$, we have $\Pr[\mathcal{M}(x)\in S] \leq e^{\varepsilon}\cdot\Pr[\mathcal{M}(x')\in S] + \delta$.

The DP guarantees of a mechanism are captured by the privacy parameters $\varepsilon$ and $\delta$; the smaller these parameters, the more private the mechanism. Often a mechanism is simultaneously DP for multiple privacy parameters; this is captured by studying the privacy loss of a mechanism $\mathcal{M}$: the curve $\delta_{\mathcal{M}}(\cdot)$ for which the mechanism is $(\varepsilon, \delta_{\mathcal{M}}(\varepsilon))$-DP.

A fundamental mathematical property satisfied by DP is composition, which prescribes the privacy guarantees of executing multiple DP mechanisms. In basic composition (Dwork et al., 2006a), a mechanism that returns the results of executing an $(\varepsilon_1,\delta_1)$-DP mechanism and an $(\varepsilon_2,\delta_2)$-DP mechanism is $(\varepsilon_1+\varepsilon_2, \delta_1+\delta_2)$-DP. Unfortunately, this bound becomes weak when composing a large number of mechanisms. Advanced composition (Dwork et al., 2010) offers stronger bounds: roughly speaking, composing $k$ mechanisms that are each $(\varepsilon,\delta)$-DP yields a mechanism whose privacy parameters are of the order of $\sqrt{k}\,\varepsilon$ and $k\delta$, clearly more desirable than basic composition. In fact, obtaining tight composition bounds has been an active research topic. Kairouz et al. (2015) showed how to obtain the exact privacy parameters of self-composition (a mechanism composed with itself), while Murtagh & Vadhan (2016) showed that the corresponding problem for the more general case is #P-complete.
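To make the gap concrete, the following minimal Python sketch compares the basic and advanced composition bounds for a hypothetical choice of $k$, $\varepsilon$, and $\delta'$; the constants follow one standard statement of advanced composition, and the specific parameter values are for illustration only.

```python
import math

# Illustrative comparison of basic vs. advanced composition (hypothetical parameters).
k, eps, delta_prime = 10_000, 0.01, 1e-6

basic_eps = k * eps
# One standard form of the advanced composition bound (Dwork et al., 2010);
# constants vary slightly across statements in the literature.
adv_eps = eps * math.sqrt(2 * k * math.log(1 / delta_prime)) + k * eps * (math.exp(eps) - 1)

print(f"basic:    eps = {basic_eps:.2f}")   # eps = 100.00
print(f"advanced: eps = {adv_eps:.2f}")     # eps ~ 6.26, at the cost of an extra delta' in delta
```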

Privacy Loss Distribution (PLD).

Tighter bounds for privacy parameters that go beyond advanced composition are possible if the privacy loss of the mechanism is taken into account. Meiser & Mohammadi (2018); Sommer et al. (2019) initiated the study of numerical methods for accurately estimating the privacy parameters, using the privacy loss distribution (PLD) of a mechanism. The PLD is the probability mass function of the so-called privacy loss random variable (PRV) for discrete mechanisms, and its density function for continuous mechanisms (see Section 2 for formal definitions). PLDs have two nice properties: (i) tight privacy parameters can be computed from the PLD of a mechanism and (ii) the PLD of a composed mechanism is the convolution of the individual PLDs. Property (ii) makes PLD amenable to FFT-based methods.

Koskela et al. (2020) exploited this property to speed up the computation of the PLD of the composition. An important step to retain efficiency using PLDs is approximating the distribution so that it has finite support; this is especially needed in the case when the PLD is continuous or has a large support. PLD implementations have been at the heart of many DP systems and open-source libraries including (Lukas Prediger, 2020; Google, 2020; Microsoft, 2021). To enable scale and support latency considerations, the PLD composition has to be as efficient as possible. This is the primary focus of our paper.
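As a toy illustration of properties (i) and (ii), the following sketch composes the PLD of randomized response (a two-point PLD at $\pm\varepsilon_0$) with itself and reads off $\delta(\varepsilon)$ from the composed PLD; the formula used for $\delta(\varepsilon)$ is formalized in Section 2, and the variable names are ours.

```python
import numpy as np

# Toy PLD: randomized response with eps0 = 0.5 has a PRV supported on {-eps0, +eps0}
# with probabilities 1/(1+e^{eps0}) and e^{eps0}/(1+e^{eps0}), respectively.
eps0 = 0.5
pld = np.array([1.0, np.exp(eps0)]) / (1.0 + np.exp(eps0))   # masses at -eps0, +eps0

# Property (ii): the PLD of the k-fold composition is the k-fold convolution.
k = 8
composed = np.array([1.0])
for _ in range(k):
    composed = np.convolve(composed, pld)        # FFT-based for large supports
grid = np.linspace(-k * eps0, k * eps0, k + 1)   # support of the composed PRV

# Property (i): tight delta(eps) can be read off the composed PLD
# as E[(1 - e^{eps - Y})_+] (see Section 2).
eps = 1.0
delta = float(np.sum(np.clip(1.0 - np.exp(eps - grid), 0.0, None) * composed))
print(delta)
```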

Our starting point is the work of Koskela et al. (2020, 2021); Koskela & Honkela (2021), who derived explicit error bounds for the approximation obtained by the FFT-based algorithm. The running time of this algorithm for $k$-fold self-composition of a mechanism $\mathcal{M}$ that is $(\varepsilon_0, 0)$-DP is $\widetilde{O}\left(\frac{k^{2}\varepsilon_{0}}{\delta_{\mathrm{err}}}\right)$, where $\delta_{\mathrm{err}}$ is the additive error in $\delta$; for any positive $T$, we denote by $\widetilde{O}(T)$ any quantity of the form $O(T\cdot\mathrm{polylog}(T))$. When $\varepsilon_{0}=\frac{1}{\sqrt{k\log(1/\delta_{\mathrm{err}})}}$, this running time is $\widetilde{O}\left(\frac{k^{1.5}}{\delta_{\mathrm{err}}}\right)$. Gopi et al. (2021) improved this bound to obtain an algorithm with running time of

\widetilde{O}\left(\frac{k^{0.5}\sqrt{\log\frac{1}{\delta_{\mathrm{err}}}}}{\varepsilon_{\mathrm{err}}}\right),

where $\varepsilon_{\mathrm{err}}$ is the additive error in $\varepsilon$. When composing $k$ different mechanisms, their running time is

\widetilde{O}\left(\frac{k^{1.5}\sqrt{\log\frac{1}{\delta_{\mathrm{err}}}}}{\varepsilon_{\mathrm{err}}}\right).

Our Contributions.

We design and study new algorithms for $k$-fold numerical composition of PLDs. Specifically, for reasonable choices of mechanisms, we

  • obtain (Section 3) a two-stage algorithm for self-composing PLDs, with running time

    \widetilde{O}\left(\frac{k^{0.25}\sqrt{\log\frac{1}{\delta_{\mathrm{err}}}}}{\varepsilon_{\mathrm{err}}}\right).

  • provide (Section 4) an experimental evaluation of the two-stage algorithm, comparing its runtime to that of the algorithm of Gopi et al. (2021). We find that the speedup obtained by our algorithm improves with $k$.

  • extend (Section 5) the two-stage algorithm to a recursive multi-stage algorithm with a running time of

    \widetilde{O}\left(\frac{\mathrm{polylog}(k)\sqrt{\log\frac{1}{\delta_{\mathrm{err}}}}}{\varepsilon_{\mathrm{err}}}\right).

Both the two-stage and multi-stage algorithms extend to the case of composing $k$ different mechanisms. In each case, the running time increases by a multiplicative $O(k)$ factor. Note that this factor is inevitable since the input “size” is $k$: indeed, the algorithm needs to read the $k$ input PLDs. We defer the details of this extension to Appendix B.

Algorithm Overview.

The main technique underlying our algorithms is the evolution of the discretization and truncation intervals of the approximations of the PRV with the number of compositions. To describe our approach, we first present a high-level description of the algorithm of Gopi et al. (2021). Their algorithm discretizes an $O(1)$-length interval into bucket intervals each with mesh size $\approx\frac{1}{k^{0.5}}$, thus leading to a total of $O(k^{0.5})$ buckets and a running time of $\widetilde{O}(k^{0.5})$ for the FFT convolution algorithm. Both these aspects of their algorithm are in some sense necessary: a truncation interval of length $\ll O(1)$ would lose significant information about the $k$-fold composition PRV, whereas a discretization interval of length $\gg\frac{1}{k^{0.5}}$ would lose significant information about the original PRV; so relaxing either would lead to large approximation error.

The key insight in our work is that it is possible to avoid having both these aspects simultaneously. In particular, in our two-stage algorithm, the first stage performs a $k^{0.5}$-fold composition using an interval of length $\approx\frac{1}{k^{0.25}}$ discretized into bucket intervals with mesh size $\approx\frac{1}{k^{0.5}}$, followed by another $k^{0.5}$-fold composition using an interval of length $O(1)$ discretized into bucket intervals with mesh size $\approx\frac{1}{k^{0.25}}$. Thus, in each stage the number of discretization buckets is $\widetilde{O}(k^{0.25})$, leading to a final running time of $\widetilde{O}(k^{0.25})$ for performing two FFT convolutions.

The recursive multi-stage algorithm extends this idea to $O(\log k)$ stages, each performed with an increasing discretization interval and truncation interval, ensuring that the number of discretization buckets at each step is $O(\mathrm{polylog}(k))$.

Experimental Evaluation.

We implement our two-stage algorithm and compare it to baselines from the literature. We consider the sub-sampled Gaussian mechanism, which is a fundamental primitive in private machine learning and constitutes a core primitive of training algorithms that use differentially private stochastic gradient descent (DP-SGD) (see, e.g., (Abadi et al., 2016)). For $2^{16}$ compositions with a standard deviation of $\approx 226.86$ and a subsampling probability of $0.2$, we obtain a speedup of $2.66\times$ compared to the state-of-the-art. We also consider compositions of the widely-used Laplace mechanism. For $2^{16}$ compositions and a scale parameter of $\approx 1133.84$ for the Laplace distribution, we achieve a speedup of $2.3\times$. The parameters were chosen such that the composed mechanism satisfies $(\varepsilon=1.0, \delta=10^{-6})$-DP. See Section 4 for more details.

Related Work.

In addition to Moments Accountant (Abadi et al., 2016) and Rényi DP (Mironov, 2017) (which were originally used to bound the privacy loss in DP deep learning), several other tools can also be used to upper-bound the privacy parameters of composed mechanisms, including concentrated DP (Dwork & Rothblum, 2016; Bun & Steinke, 2016), its truncated variant (Bun et al., 2018), and Gaussian DP (Dong et al., 2019; Bu et al., 2020). However, none of these methods is tight; furthermore, none of them allows a high-accuracy estimation of the privacy parameters. In fact, some of them require that the moments of the PRV are bounded; the PLD composition approach does not have such restrictions and hence is applicable to mechanisms such as DP-SGD-JL (Bu et al., 2021).

In a recent work, Doroshenko et al. (2022) proposed a different discretization procedure based on whether we want the discretized PLD to be “optimistic” or “pessimistic”. They do not analyze the error bounds explicitly, but it can be seen that their running time is $\widetilde{O}(k)$, which is slower than both our algorithm and that of Gopi et al. (2021).

Another recent work (Zhu et al., 2022) developed a rigorous notion of “worst-case” PLD for mechanisms, under the name dominating PLDs. Our algorithms can be used for compositions of dominating PLDs; indeed, our experimental results for Laplace and (subsampled) Gaussian mechanisms are already doing this implicitly. Furthermore, Zhu et al. (2022) propose a different method for computing tight DP composition bounds. However, their algorithm requires an analytical expression for the characteristic function of the PLDs. This may not exist, e.g., we are unaware of such an analytical expression for subsampled Gaussian mechanisms.

2 Preliminaries

Let $\mathbb{Z}_{>0}$ denote the set of all positive integers, $\mathbb{R}_{\geq 0}$ the set of all non-negative real numbers, and let $\overline{\mathbb{R}}=\mathbb{R}\cup\{-\infty,+\infty\}$. For any two random variables $X$ and $Y$, we denote by $d_{\mathrm{TV}}(X,Y)$ their total variation distance.

2.1 Privacy Loss Random Variables

We will use the following privacy definitions and tools that appeared in previous works on PLDs (Sommer et al., 2019; Koskela et al., 2020; Gopi et al., 2021).

For any mechanism $\mathcal{M}$ and any $\varepsilon\in\mathbb{R}_{\geq 0}$, we denote by $\delta_{\mathcal{M}}(\varepsilon)$ the smallest value $\delta$ such that $\mathcal{M}$ is $(\varepsilon,\delta)$-DP. The resulting function $\delta_{\mathcal{M}}(\cdot)$ is said to be the privacy curve of the mechanism $\mathcal{M}$. A closely related notion is the privacy curve $\delta(X||Y):\mathbb{R}_{\geq 0}\to[0,1]$ between two random variables $X, Y$, which is defined, for any $\varepsilon\in\mathbb{R}_{\geq 0}$, as

\delta(X||Y)(\varepsilon)=\sup_{S\subset\Omega}\ \Pr[Y\in S]-e^{\varepsilon}\Pr[X\in S],

where $\Omega$ is the support of $X$ and $Y$. The privacy loss random variables associated with a privacy curve $\delta_{\mathcal{M}}$ are random variables $(X,Y)$ such that $\delta_{\mathcal{M}}$ is the same curve as $\delta(X||Y)$, and that satisfy certain additional properties (which make them unique). While PRVs have been defined earlier in Dwork & Rothblum (2016); Bun & Steinke (2016), we use the definition of Gopi et al. (2021):

Definition 2.1 (PRV).

Given a privacy curve $\delta_{\mathcal{M}}:\mathbb{R}_{\geq 0}\to[0,1]$, we say that random variables $(X,Y)$ are privacy loss random variables (PRVs) for $\delta_{\mathcal{M}}$ if (i) $X$ and $Y$ are supported on $\overline{\mathbb{R}}$, (ii) $\delta(X||Y)=\delta_{\mathcal{M}}$, (iii) $Y(t)=e^{t}X(t)$ for every $t\in\mathbb{R}$, and (iv) $Y(-\infty)=X(\infty)=0$, where $X(t)$ and $Y(t)$ denote the PDFs of $X$ and $Y$, respectively.

Theorem 2.2 (Gopi et al. (2021)).

Let $\delta$ be a privacy curve that is identical to $\delta(P||Q)$ for some random variables $P$ and $Q$. Then, the PRVs $(X,Y)$ for $\delta$ are given by

X=\log\left(\frac{Q(\omega)}{P(\omega)}\right)\text{ where }\omega\sim P,

and

Y=\log\left(\frac{Q(\omega)}{P(\omega)}\right)\text{ where }\omega\sim Q.

Moreover, we define $\delta_{Y}(\varepsilon):=\operatorname{\mathbb{E}}_{Y}[(1-e^{\varepsilon-Y})_{+}]$ for every $\varepsilon\in\mathbb{R}$, and define $\varepsilon_{Y}(\delta)$ as the smallest $\varepsilon$ such that $\delta_{Y}(\varepsilon)\leq\delta$.

Note that $\delta_{Y}(\varepsilon)$ is well defined even for a $Y$ that is not a privacy loss random variable.
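As a quick illustration of Theorem 2.2 and of the definition of $\delta_Y(\varepsilon)$, the following Monte Carlo sketch estimates $\delta_Y(\varepsilon)$ for the Gaussian mechanism with sensitivity 1 and noise standard deviation $\sigma$ (so $P=\mathcal{N}(0,\sigma^2)$ and $Q=\mathcal{N}(1,\sigma^2)$); the sampling-based estimate is only for illustration, since the accountants discussed here work with the CDF of $Y$ rather than with samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Theorem 2.2 for the Gaussian mechanism: Y = log(Q(w)/P(w)) = (2w - 1) / (2 sigma^2)
# with w ~ Q = N(1, sigma^2).
sigma, eps = 1.0, 1.0
w = rng.normal(loc=1.0, scale=sigma, size=1_000_000)
Y = (2.0 * w - 1.0) / (2.0 * sigma**2)

# delta_Y(eps) = E[(1 - e^{eps - Y})_+], estimated by the sample mean.
delta = float(np.mean(np.clip(1.0 - np.exp(eps - Y), 0.0, None)))
print(delta)   # ~0.127 for sigma = 1, eps = 1
```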

If $\delta_{1}$ is a privacy curve identical to $\delta(X_{1}||Y_{1})$ and $\delta_{2}$ is a privacy curve identical to $\delta(X_{2}||Y_{2})$, then the composition $\delta_{1}\otimes\delta_{2}$ is defined as the privacy curve $\delta((X_{1},X_{2})||(Y_{1},Y_{2}))$, where $X_{1}$ is independent of $X_{2}$, and $Y_{1}$ is independent of $Y_{2}$. A crucial property is that composition of privacy curves corresponds to addition of PRVs:

Theorem 2.3 (Dwork & Rothblum (2016)).

Let $\delta_{1}$ and $\delta_{2}$ be privacy curves with PRVs $(X_{1},Y_{1})$ and $(X_{2},Y_{2})$, respectively. Then, the PRVs for the privacy curve $\delta_{1}\otimes\delta_{2}$ are given by $(X_{1}+X_{2},\ Y_{1}+Y_{2})$.

We interchangeably use the same letter to denote both a random variable and its corresponding probability distribution. For any two distributions $Y_{1}, Y_{2}$, we use $Y_{1}\oplus Y_{2}$ to denote their convolution (the distribution of the random variable $Y_{1}+Y_{2}$). We use $Y^{\oplus k}$ to denote the $k$-fold convolution of $Y$ with itself.

Finally, we use the following tail bounds for PRVs.

Lemma 2.4 (Gopi et al. (2021)).

For all PRVs $Y$, $\varepsilon\geq 0$, and $\alpha>0$, it holds that

\Pr[|Y|\geq\varepsilon+\alpha] ~\leq~ \delta_{Y}(\varepsilon)\cdot\frac{1+e^{-\varepsilon-\alpha}}{1-e^{-\alpha}} ~\leq~ \frac{4}{\alpha}\,\delta_{Y}(\varepsilon)\quad\text{if }\alpha<1.

2.2 Coupling Approximation

To describe and analyze our algorithm, we use the coupling approximation tool used by Gopi et al. (2021). They showed that, in order to provide an approximation guarantee on the privacy loss curve, it suffices to approximate a PRV according to the following coupling notion of approximation:

Definition 2.5 (Coupling Approximation).

For two random variables $Y_{1}, Y_{2}$, we write $|Y_{1}-Y_{2}|\leq_{\eta}h$ if there exists a coupling between them such that $\Pr[|Y_{1}-Y_{2}|>h]\leq\eta$.

We use the following properties of coupling approximation shown by Gopi et al. (2021). We provide the proofs in Appendix A for completeness.

Lemma 2.6 (Properties of Coupling Approximation).

Let $X, Y, Z$ be random variables.

  1. If $|X-Y|\leq_{\delta_{\mathrm{err}}}\varepsilon_{\mathrm{err}}$, then for all $\varepsilon\in\mathbb{R}_{\geq 0}$,

    \delta_{Y}(\varepsilon+\varepsilon_{\mathrm{err}})-\delta_{\mathrm{err}} ~\leq~ \delta_{X}(\varepsilon) ~\leq~ \delta_{Y}(\varepsilon-\varepsilon_{\mathrm{err}})+\delta_{\mathrm{err}}.

  2. If $|X-Y|\leq_{\eta_{1}}h_{1}$ and $|Y-Z|\leq_{\eta_{2}}h_{2}$, then $|X-Z|\leq_{\eta_{1}+\eta_{2}}h_{1}+h_{2}$ (“triangle inequality”).

  3. If $d_{\mathrm{TV}}(X,Y)\leq\eta$, then $|X-Y|\leq_{\eta}0$; furthermore, for all $k\in\mathbb{Z}_{>0}$, it holds that $|X^{\oplus k}-Y^{\oplus k}|\leq_{k\eta}0$.

  4. If $|X-Y-\mu|\leq_{0}h$ (for any $\mu$) and $\operatorname{\mathbb{E}}[X]=\operatorname{\mathbb{E}}[Y]$, then for all $\eta>0$ and $k\in\mathbb{Z}_{>0}$,

    \left|X^{\oplus k}-Y^{\oplus k}\right| ~\leq_{\eta}~ h\sqrt{2k\log\frac{2}{\eta}}.

2.3 Discretization Procedure

We adapt the discretization procedure from (Gopi et al., 2021). The only difference is that we assume the support of the input distribution is already in the specified range as opposed to being truncated as part of the algorithm. A complete description of the procedure is given in Algorithm 1.

Algorithm 1 $\mathsf{DiscretizeRV}$ (adapted from Gopi et al., 2021)
  Input: $\mathsf{CDF}_{Y}(\cdot)$ of a RV $Y$, mesh size $h$, truncation parameter $L\in h\cdot(\frac{1}{2}+\mathbb{Z}_{>0})$.
  Constraint: Support of $Y$ is contained in $(-L,L]$.
  Output: $\mathsf{PDF}$ of an approximation $\widetilde{Y}$ supported on $\mu+(h\mathbb{Z}\cap[-L,L])$ for some $\mu\in[-\frac{h}{2},\frac{h}{2}]$.
  
  $n\leftarrow\frac{L-\frac{h}{2}}{h}$
  for $i=-n$ to $n$ do
     $q_{i}\leftarrow\mathsf{CDF}_{Y}(ih+h/2)-\mathsf{CDF}_{Y}(ih-h/2)$
  end for
  $q\leftarrow q/(\sum_{i=-n}^{n}q_{i})$ \triangleright Normalize $q$
  $\mu\leftarrow\operatorname{\mathbb{E}}[Y]-\sum_{i=-n}^{n}ih\cdot q_{i}$
  $\widetilde{Y}\leftarrow ih+\mu$ w.p. $q_{i}$, for $-n\leq i\leq n$
  return $\widetilde{Y}$

The procedure can be shown to satisfy the following key property.

Proposition 2.7.

For any random variable $Y$ supported in $(-L,L]$, the output $\widetilde{Y}$ of $\mathsf{DiscretizeRV}$ with mesh size $h$ and truncation parameter $L$ satisfies $\operatorname{\mathbb{E}}[Y]=\operatorname{\mathbb{E}}[\widetilde{Y}]$ and $|Y-\widetilde{Y}-\mu|\leq_{0}\frac{h}{2}$, for some $\mu$ with $|\mu|\leq\frac{h}{2}$.
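For concreteness, here is a minimal NumPy sketch of Algorithm 1; the function name and the Gaussian-PRV usage example (which follows from Theorem 2.2) are ours, and the sketch assumes $O(1)$-time access to the CDF, as in the running time analysis below.

```python
import numpy as np
from scipy.stats import norm

def discretize_rv(cdf, mean, h, L):
    """Sketch of DiscretizeRV: bucket the CDF onto the grid {-n,...,n} * h with
    L = (n + 1/2) * h, normalize, and shift by mu so that the mean is preserved."""
    n = int(round((L - h / 2) / h))
    i = np.arange(-n, n + 1)
    q = cdf(i * h + h / 2) - cdf(i * h - h / 2)
    q = q / q.sum()                        # normalize q
    mu = mean - float((i * h * q).sum())   # mean-preserving shift, |mu| <= h/2
    return i * h + mu, q                   # support points and probabilities of ~Y

# Example: the PRV of the Gaussian mechanism (sensitivity 1, noise std sigma) is
# N(1/(2 sigma^2), 1/sigma^2); discretize it with mesh 1e-3 and truncation 4.0005.
# (The tiny Gaussian mass outside (-L, L] is folded in by the normalization, which
# matches the conditioning step used to form Y_0 in Algorithm 2.)
sigma = 1.0
m, s = 1 / (2 * sigma**2), 1 / sigma
support, probs = discretize_rv(lambda x: norm.cdf(x, loc=m, scale=s), m, 1e-3, 4.0005)
```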

3 Two-Stage Composition Algorithm

Our two-stage algorithm for the case of $k$-fold self-composition is given in Algorithm 2. We assume $k=k_{1}\cdot k_{2}+r$ where $k_{1},k_{2}\in\mathbb{Z}_{>0}$, $r<k_{1}$, and $k_{1}\approx k_{2}\approx\sqrt{k}$, which for instance can be achieved by taking $k_{1}=\lfloor\sqrt{k}\rfloor$, $k_{2}=\lfloor k/k_{1}\rfloor$, and $r=k-k_{1}\cdot k_{2}$.

The algorithm implements the circular convolution $\oplus_{L}$ using the Fast Fourier Transform (FFT). For any $L$ and $x\in\mathbb{R}$, we define $x\ (\mathrm{mod}\ 2L)=x-2Ln$, where $n\in\mathbb{Z}$ is such that $x-2Ln\in(-L,L]$. Given $x,y\in\mathbb{R}$, the circular addition is defined as

x\oplus_{L}y~:=~x+y\ (\mathrm{mod}\ 2L).

Similarly, for random variables $X, Y$, we define $X\oplus_{L}Y$ to be their convolution modulo $2L$ and $Y^{\oplus_{L}k}$ to be the $k$-fold convolution of $Y$ with itself modulo $2L$.
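In code, the $k$-fold circular convolution of a PMF supported on the grid $\{-n,\ldots,n\}\cdot h$ (so that $2L=(2n+1)h$) can be computed by exponentiating its DFT; the following NumPy sketch (with our own function name) illustrates this.

```python
import numpy as np

def self_convolve_mod(pmf, k):
    """k-fold circular self-convolution via FFT. pmf[m] is the probability of grid
    point (m - n) * h; ifftshift/fftshift align index arithmetic with addition of
    grid values modulo 2L = (2n + 1) * h."""
    f = np.fft.fft(np.fft.ifftshift(pmf))
    out = np.fft.ifft(f**k).real
    return np.clip(np.fft.fftshift(out), 0.0, None)

# Tiny check on a 5-point grid (n = 2): a point mass at +2h, convolved with itself,
# wraps around to -h, since 4h = 4h - 5h (mod 5h).
p = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
print(np.round(self_convolve_mod(p, 2), 6))   # mass at index 1, i.e., grid point -h
```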

Observe that $\mathsf{DiscretizeRV}$ with mesh size $h$ and truncation parameter $L$ runs in time $O(L/h)$, assuming $O(1)$-time access to $\mathsf{CDF}_{Y}(\cdot)$. The FFT convolution step runs in time $O\left(\frac{L_{i}}{h_{i}}\log\frac{L_{i}}{h_{i}}\right)$; thus the overall running time of $\mathsf{TwoStageSelfComposePRV}$ is

O\left(\frac{L_{1}}{h_{1}}\log\left(\frac{L_{1}}{h_{1}}\right)+\frac{L_{2}}{h_{2}}\log\left(\frac{L_{2}}{h_{2}}\right)\right).
Algorithm 2 $\mathsf{TwoStageSelfComposePRV}$
  Input: PRV $Y$, number of compositions $k=k_{1}\cdot k_{2}+r$ (for $r<k_{1}$), mesh sizes $h_{1}\leq h_{2}$, truncation parameters $L_{1}\leq L_{2}$, where each $L_{i}\in h_{i}\cdot(\frac{1}{2}+\mathbb{Z}_{>0})$.
  Output: PDF of an approximation $Y_{2}$ for $Y^{\oplus k}$. $Y_{2}$ will be supported on $\mu+(h_{2}\mathbb{Z}\cap[-L_{2},L_{2}])$ for some $\mu\in[0,\frac{h_{2}}{2}]$.
  
  $Y_{0}\leftarrow Y|_{|Y|\leq L_{1}}$ \triangleright $Y$ conditioned on $|Y|\leq L_{1}$
  $\widetilde{Y}_{0}\leftarrow\mathsf{DiscretizeRV}(Y_{0},h_{1},L_{1})$
  $Y_{1}\leftarrow\widetilde{Y}_{0}^{\oplus_{L_{1}}k_{1}}$ \triangleright $k_{1}$-fold FFT convolution
  $\widetilde{Y}_{1}\leftarrow\mathsf{DiscretizeRV}(Y_{1},h_{2},L_{2})$
  $Y_{2}\leftarrow\widetilde{Y}_{1}^{\oplus_{L_{2}}k_{2}}$ \triangleright $k_{2}$-fold FFT convolution
  $Y_{r}\leftarrow\widetilde{Y}_{0}^{\oplus_{L_{1}}r}$ \triangleright $r$-fold FFT convolution
  $\widetilde{Y}_{r}\leftarrow\mathsf{DiscretizeRV}(Y_{r},h_{2},L_{2})$
  return $Y_{2}\oplus_{L_{2}}\widetilde{Y}_{r}$ \triangleright FFT convolution
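The following NumPy sketch mirrors Algorithm 2 for the case $r=0$ (so $k=k_1 k_2$). It reuses the FFT-based circular convolution from the sketch above (repeated here so the snippet runs on its own) and, for brevity, omits the mean-preserving shift $\mu$ of Algorithm 1; all function names are ours.

```python
import numpy as np

def self_convolve_mod(pmf, k):
    """k-fold circular self-convolution via FFT (as in the earlier sketch)."""
    f = np.fft.fft(np.fft.ifftshift(pmf))
    return np.clip(np.fft.fftshift(np.fft.ifft(f**k).real), 0.0, None)

def discretize(cdf, h, L):
    """Bucket a CDF onto the grid {-n,...,n} * h with L = (n + 1/2) * h."""
    n = int(round((L - h / 2) / h))
    grid = np.arange(-n, n + 1) * h
    q = cdf(grid + h / 2) - cdf(grid - h / 2)
    return grid, q / q.sum()

def rebin(grid, pmf, h, L):
    """Re-discretize an existing PMF onto a coarser grid (mesh h, truncation L)."""
    n = int(round((L - h / 2) / h))
    new_grid = np.arange(-n, n + 1) * h
    idx = np.clip(np.rint(grid / h).astype(int) + n, 0, 2 * n)
    q = np.zeros(2 * n + 1)
    np.add.at(q, idx, pmf)
    return new_grid, q / q.sum()

def two_stage_self_compose(cdf, k1, k2, h1, L1, h2, L2):
    """Sketch of TwoStageSelfComposePRV with r = 0 (mean-preserving shift omitted)."""
    grid1, q0 = discretize(cdf, h1, L1)    # ~Y_0: Y|{|Y|<=L1} on the fine grid
    q1 = self_convolve_mod(q0, k1)         # Y_1: k1-fold FFT convolution mod 2*L1
    grid2, q1c = rebin(grid1, q1, h2, L2)  # ~Y_1 on the coarse grid
    q2 = self_convolve_mod(q1c, k2)        # Y_2: k2-fold FFT convolution mod 2*L2
    return grid2, q2
```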

The approximation guarantees provided by our two-stage algorithm are captured in the following theorem. For convenience, we assume that $k$ is a perfect square (we set $k_{1}=k_{2}=\sqrt{k}$). The complete proof is in Section 3.1.

Theorem 3.1.

For any PRV $Y$, the approximation $Y_{2}$ returned by $\mathsf{TwoStageSelfComposePRV}$ satisfies

|Y^{\oplus k}-Y_{2}|\leq_{\delta_{\mathrm{err}}}\varepsilon_{\mathrm{err}},

when invoked with $k_{1}=k_{2}=k^{0.5}$ (assumed to be an integer) and the parameters below (using $\eta:=\frac{\delta_{\mathrm{err}}}{8\sqrt{k}+16}$):

h_{1}:=\frac{\varepsilon_{\mathrm{err}}}{k^{0.5}\sqrt{2\log\frac{1}{\eta}}}\,,\qquad h_{2}:=\frac{\varepsilon_{\mathrm{err}}}{k^{0.25}\sqrt{2\log\frac{1}{\eta}}}\,,

L_{1} ~\geq~ \max\left\{\varepsilon_{Y}\left(\frac{\varepsilon_{\mathrm{err}}\delta_{\mathrm{err}}}{16k^{1.25}}\right),\ \varepsilon_{Y^{\oplus\sqrt{k}}}\left(\frac{\varepsilon_{\mathrm{err}}\delta_{\mathrm{err}}}{64k^{0.75}}\right)\right\}+\frac{\varepsilon_{\mathrm{err}}}{k^{0.25}}\,,

L_{2} ~\geq~ \max\left\{\varepsilon_{Y^{\oplus k}}\left(\frac{\varepsilon_{\mathrm{err}}\delta_{\mathrm{err}}}{16}\right)+2\varepsilon_{\mathrm{err}},\ L_{1}\right\}.

In terms of a concrete running time bound, Theorem 3.1 implies:

Corollary 3.2.

For any DP algorithm $\mathcal{M}$, the privacy curve $\delta_{\mathcal{M}^{\circ k}}(\varepsilon)$ of the $k$-fold (adaptive) composition of $\mathcal{M}$ can be approximated in time $\widetilde{O}\left(\frac{\varepsilon_{\mathrm{up}}\cdot k^{0.25}\cdot\sqrt{\log(k/\delta_{\mathrm{err}})}}{\varepsilon_{\mathrm{err}}}\right)$, where $\varepsilon_{\mathrm{err}}$ is the additive error in $\varepsilon$, $\delta_{\mathrm{err}}$ is the additive error in $\delta$, and $\varepsilon_{\mathrm{up}}$ is an upper bound on

\max\left\{\varepsilon_{Y^{\oplus k}}\left(\frac{\varepsilon_{\mathrm{err}}\delta_{\mathrm{err}}}{16}\right),\ \sqrt[4]{k}\cdot\varepsilon_{Y^{\oplus\sqrt{k}}}\left(\frac{\varepsilon_{\mathrm{err}}\delta_{\mathrm{err}}}{64k^{0.75}}\right),\ \sqrt[4]{k}\cdot\varepsilon_{Y}\left(\frac{\varepsilon_{\mathrm{err}}\delta_{\mathrm{err}}}{16k^{1.25}}\right)\right\}+\varepsilon_{\mathrm{err}}.

In many practical regimes of interest, $\varepsilon_{\mathrm{up}}/\varepsilon_{\mathrm{err}}$ is a constant. For ease of exposition in what follows, we assume that $\varepsilon_{\mathrm{err}}$ is a small constant, e.g., $0.1$, and suppress the dependence on $\varepsilon_{\mathrm{err}}$. Suppose the original mechanism $\mathcal{M}$ underlying $Y$ satisfies $(\varepsilon=\frac{1}{\sqrt{k\log(1/\delta_{\mathrm{err}})}},\ \delta=o_{k}(\frac{1}{k^{1.25}}))$-DP. Then by advanced composition (Dwork et al., 2010), $\mathcal{M}^{\circ t}$ satisfies $(\varepsilon\sqrt{2t\log\frac{1}{\delta'}}+2t\varepsilon(e^{\varepsilon}-1),\ t\delta+\delta')$-DP. If $t\delta+\delta'\lesssim\frac{t\delta_{\mathrm{err}}}{k^{1.25}}$, then $\varepsilon_{Y^{\oplus t}}\left(\frac{t\delta_{\mathrm{err}}}{k^{1.25}}\right)\lesssim\sqrt{\frac{t}{k}\ln\frac{k}{t\delta_{\mathrm{err}}}}$. Instantiating this with $t=1,\sqrt{k}$, and $k$ gives that $\varepsilon_{\mathrm{up}}$ is at most a constant.

Note that, to set the values of $L_{1}$ and $L_{2}$, we do not need the exact value of $\varepsilon_{Y^{\oplus k}}$ (or $\varepsilon_{Y^{\oplus\sqrt{k}}}$ or $\varepsilon_{Y}$). We only need an upper bound on $\varepsilon_{Y^{\oplus k}}$, which can often be obtained by using the RDP accountant or some other method.

For the case when $k$ is not a perfect square, using a similar analysis, it is easy to see that the approximation error is no worse than the error in $k_{2}(k_{1}+1)$ self-compositions. The running time increases by a constant factor because of the additional step of $r$-fold convolution to get $Y_{r}$ and the final convolution step to get $Y_{2}\oplus_{L_{2}}\widetilde{Y}_{r}$; however, this does not affect the asymptotic time complexity of the algorithm. Moreover, as seen in Section 4, even with this additional cost, $\mathsf{TwoStageSelfComposePRV}$ is still faster than the algorithm of Gopi et al. (2021).
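As a convenience, the parameter choices of Theorem 3.1 are easy to compute once upper bounds on $\varepsilon_{Y}(\cdot)$, $\varepsilon_{Y^{\oplus\sqrt{k}}}(\cdot)$, and $\varepsilon_{Y^{\oplus k}}(\cdot)$ are available (e.g., from an RDP accountant); the following small helper, with hypothetical names, simply evaluates the formulas from the theorem and rounds $L_1$ and $L_2$ up onto the required grids.

```python
import math

def two_stage_parameters(k, eps_err, delta_err, eps_up_Y, eps_up_Y_sqrtk, eps_up_Y_k):
    """Evaluate the parameter choices of Theorem 3.1 for a perfect square k.
    The three eps_up_* arguments are upper bounds on eps_Y(eps_err*delta_err/(16 k^1.25)),
    eps_{Y^(sqrt k)}(eps_err*delta_err/(64 k^0.75)), and eps_{Y^k}(eps_err*delta_err/16)."""
    eta = delta_err / (8 * math.sqrt(k) + 16)
    h1 = eps_err / (k**0.5 * math.sqrt(2 * math.log(1 / eta)))
    h2 = eps_err / (k**0.25 * math.sqrt(2 * math.log(1 / eta)))
    L1 = max(eps_up_Y, eps_up_Y_sqrtk) + eps_err / k**0.25
    L2 = max(eps_up_Y_k + 2 * eps_err, L1)
    # Round each L_i up to the grid h_i * (1/2 + Z_{>0}), as Algorithm 1 requires.
    L1 = (math.floor(L1 / h1 - 0.5) + 1.5) * h1
    L2 = (math.floor(L2 / h2 - 0.5) + 1.5) * h2
    return h1, h2, L1, L2
```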

3.1 Analysis

In this section we establish Theorem 3.1. The proof proceeds as follows. We establish coupling approximations between consecutive random variables in the following sequence:

Y^{\oplus k_{1}k_{2}},\ \ Y_{0}^{\oplus k_{1}k_{2}},\ \ \widetilde{Y}_{0}^{\oplus k_{1}k_{2}},\ \ Y_{1}^{\oplus k_{2}},\ \ \widetilde{Y}_{1}^{\oplus k_{2}},\ \ Y_{2},

and then apply Lemma 2.6(2).

To establish each coupling approximation, we use Lemmas 2.6(3) and 2.6(4).

Coupling $\left(Y^{\oplus k_{1}k_{2}},\ Y_{0}^{\oplus k_{1}k_{2}}\right)$.

Since $d_{\mathrm{TV}}(Y,Y_{0})=\Pr[|Y|>L_{1}]=:\delta_{0}$, we have from Lemma 2.6(3) that

|Y^{\oplus k_{1}k_{2}}-Y_{0}^{\oplus k_{1}k_{2}}| ~\leq_{k_{1}k_{2}\delta_{0}}~ 0. \qquad (1)

Coupling $\left(Y_{0}^{\oplus k_{1}k_{2}},\ \widetilde{Y}_{0}^{\oplus k_{1}k_{2}}\right)$ and $\left(Y_{1}^{\oplus k_{2}},\ \widetilde{Y}_{1}^{\oplus k_{2}}\right)$.

We have from Proposition 2.7 that $\operatorname{\mathbb{E}}[Y_{0}]=\operatorname{\mathbb{E}}[\widetilde{Y}_{0}]$ and that $|Y_{0}-\widetilde{Y}_{0}-\mu|\leq_{0}\frac{h_{1}}{2}$. Thus, by applying Lemma 2.6(4), we have that (for any $\eta$, to be chosen later)

\left|Y_{0}^{\oplus k_{1}}-\widetilde{Y}_{0}^{\oplus k_{1}}\right| ~\leq_{\eta}~ h_{1}\sqrt{\tfrac{k_{1}}{2}\log\tfrac{2}{\eta}}, \qquad (2)

\left|Y_{0}^{\oplus k_{1}k_{2}}-\widetilde{Y}_{0}^{\oplus k_{1}k_{2}}\right| ~\leq_{\eta}~ h_{1}\sqrt{\tfrac{k_{1}k_{2}}{2}\log\tfrac{2}{\eta}}. \qquad (3)

Similarly, we have that

\left|Y_{1}^{\oplus k_{2}}-\widetilde{Y}_{1}^{\oplus k_{2}}\right| ~\leq_{\eta}~ h_{2}\sqrt{\tfrac{k_{2}}{2}\log\tfrac{2}{\eta}}. \qquad (4)

Coupling $\left(\widetilde{Y}_{0}^{\oplus k_{1}k_{2}},\ Y_{1}^{\oplus k_{2}}\right)$ and $\left(\widetilde{Y}_{1}^{\oplus k_{2}},\ Y_{2}\right)$.

Since $d_{\mathrm{TV}}(\widetilde{Y}_{0}^{\oplus k_{1}},\widetilde{Y}_{0}^{\oplus_{L_{1}}k_{1}})\leq\Pr\left[\left|\widetilde{Y}_{0}^{\oplus k_{1}}\right|>L_{1}\right]=:\delta_{1}$, and $Y_{1}=\widetilde{Y}_{0}^{\oplus_{L_{1}}k_{1}}$, it holds via Lemma 2.6(3) that

\left|\widetilde{Y}_{0}^{\oplus k_{1}k_{2}}-Y_{1}^{\oplus k_{2}}\right| ~\leq_{k_{2}\delta_{1}}~ 0. \qquad (5)

Similarly, for $\delta_{2}:=\Pr\left[\left|\widetilde{Y}_{1}^{\oplus k_{2}}\right|>L_{2}\right]$, we have that

\left|\widetilde{Y}_{1}^{\oplus k_{2}}-Y_{2}\right| ~\leq_{\delta_{2}}~ 0. \qquad (6)

Towards combining the bounds.

Combining Equations 1, 3, 5, 4 and 6 using Lemma 2.6(2), we have that

\left|Y^{\oplus k_{1}k_{2}}-Y_{2}\right| ~\leq_{\delta_{\mathrm{err}}}~ \varepsilon_{\mathrm{err}},

\text{where }\ \delta_{\mathrm{err}} ~=~ 2\eta+k_{1}k_{2}\delta_{0}+k_{2}\delta_{1}+\delta_{2}, \qquad (7)

\text{and }\ \varepsilon_{\mathrm{err}} ~=~ \left(h_{1}\sqrt{k_{1}k_{2}}+h_{2}\sqrt{k_{2}}\right)\sqrt{\tfrac{1}{2}\log\tfrac{2}{\eta}}. \qquad (8)

We set $h_{1}:=\frac{\varepsilon_{\mathrm{err}}}{\sqrt{2k_{1}k_{2}\log\frac{2}{\eta}}}$ and $h_{2}:=\frac{\varepsilon_{\mathrm{err}}}{\sqrt{2k_{2}\log\frac{2}{\eta}}}$. The key step remaining is to bound $\delta_{0}$, $\delta_{1}$, and $\delta_{2}$ in terms of the $h_{i}$'s, $L_{i}$'s, and $\eta$.

To do so, we use Lemma 2.4 with parameters $\alpha_{i}$ to be chosen later.

Bounding $\delta_{0}$.

We have $\delta_{0}:=\Pr\left[|Y|>L_{1}\right]$ and hence

\delta_{0} ~\leq~ \frac{4}{\alpha_{0}}\,\delta_{Y}(L_{1}-\alpha_{0}).

Bounding $\delta_{1}$.

For $\widetilde{h}_{1}:=h_{1}\sqrt{\frac{k_{1}}{2}\log\frac{2}{\eta}}=\frac{\varepsilon_{\mathrm{err}}}{2\sqrt{k_{2}}}$, we have

\delta_{1} ~:=~ \Pr\left[\left|\widetilde{Y}_{0}^{\oplus k_{1}}\right|>L_{1}\right] ~\leq~ \Pr\left[\left|\widetilde{Y}_{0}^{\oplus k_{1}}-Y_{0}^{\oplus k_{1}}\right|>\widetilde{h}_{1}\right]+\Pr\left[\left|Y_{0}^{\oplus k_{1}}\right|>L_{1}-\widetilde{h}_{1}\right] ~\leq~ \eta+\Pr\left[\left|Y^{\oplus k_{1}}\right|>L_{1}-\widetilde{h}_{1}\right]

\Longrightarrow\quad \delta_{1} ~\leq~ \eta+\frac{4}{\alpha_{1}}\,\delta_{Y^{\oplus k_{1}}}\left(L_{1}-\frac{\varepsilon_{\mathrm{err}}}{2\sqrt{k_{2}}}-\alpha_{1}\right),

where the second inequality uses Equation 2 together with the fact that the tails of $Y^{\oplus k_{1}}$ are only heavier than the tails of $Y_{0}^{\oplus k_{1}}$ (since $Y_{0}$ is a truncation of $Y$), and the final implication uses Lemma 2.4 with parameter $\alpha_{1}$.

Bounding $\delta_{2}$.

First, we combine Equations 3, 4 and 5 to get (recall $\varepsilon_{\mathrm{err}}$ in Equation 8)

\left|Y_{0}^{\oplus k_{1}k_{2}}-\widetilde{Y}_{1}^{\oplus k_{2}}\right| ~\leq_{2\eta+k_{2}\delta_{1}}~ \varepsilon_{\mathrm{err}}.

Using this, we get

\delta_{2} ~:=~ \Pr\left[\left|\widetilde{Y}_{1}^{\oplus k_{2}}\right|>L_{2}\right] ~\leq~ \Pr\left[\left|\widetilde{Y}_{1}^{\oplus k_{2}}-Y_{0}^{\oplus k_{1}k_{2}}\right|>\varepsilon_{\mathrm{err}}\right]+\Pr\left[\left|Y_{0}^{\oplus k_{1}k_{2}}\right|>L_{2}-\varepsilon_{\mathrm{err}}\right] ~\leq~ 2\eta+k_{2}\delta_{1}+\Pr\left[\left|Y^{\oplus k_{1}k_{2}}\right|>L_{2}-\varepsilon_{\mathrm{err}}\right] ~\leq~ 2\eta+k_{2}\delta_{1}+\frac{4}{\alpha_{2}}\,\delta_{Y^{\oplus k_{1}k_{2}}}\left(L_{2}-\varepsilon_{\mathrm{err}}-\alpha_{2}\right)

\Longrightarrow\quad \delta_{2} ~\leq~ (k_{2}+2)\eta+k_{2}\cdot\frac{4}{\alpha_{1}}\,\delta_{Y^{\oplus k_{1}}}\left(L_{1}-\frac{\varepsilon_{\mathrm{err}}}{2\sqrt{k_{2}}}-\alpha_{1}\right)+\frac{4}{\alpha_{2}}\,\delta_{Y^{\oplus k_{1}k_{2}}}\left(L_{2}-\varepsilon_{\mathrm{err}}-\alpha_{2}\right),

where the second step uses the displayed coupling approximation together with the fact that the tails of $Y^{\oplus k_{1}k_{2}}$ are only heavier than the tails of $Y_{0}^{\oplus k_{1}k_{2}}$ (since $Y_{0}$ is a truncation of $Y$), the third step uses Lemma 2.4, and the last step substitutes the bound on $\delta_{1}$.

Putting it all together.

Plugging these bounds for $\delta_{0}$, $\delta_{1}$, and $\delta_{2}$ into Equation 7, we get that

\left|Y^{\oplus k_{1}k_{2}}-Y_{2}\right| ~\leq_{\delta_{\mathrm{err}}}~ \varepsilon_{\mathrm{err}},

\text{where }\ \delta_{\mathrm{err}} ~\leq~ (2k_{2}+4)\eta+k_{1}k_{2}\cdot\frac{4}{\alpha_{0}}\,\delta_{Y}(L_{1}-\alpha_{0})+2k_{2}\cdot\frac{4}{\alpha_{1}}\,\delta_{Y^{\oplus k_{1}}}\left(L_{1}-\frac{\varepsilon_{\mathrm{err}}}{2\sqrt{k_{2}}}-\alpha_{1}\right)+\frac{4}{\alpha_{2}}\,\delta_{Y^{\oplus k_{1}k_{2}}}\left(L_{2}-\varepsilon_{\mathrm{err}}-\alpha_{2}\right), \qquad (9)

\text{and }\ \varepsilon_{\mathrm{err}} ~=~ \left(h_{1}\sqrt{k_{1}k_{2}}+h_{2}\sqrt{k_{2}}\right)\sqrt{\tfrac{1}{2}\log\tfrac{2}{\eta}}.

Thus, we can set the parameters $L_{1}$, $L_{2}$, and $\eta$ as follows, such that each of the four terms in Equation 9 is at most $\delta_{\mathrm{err}}/4$, to satisfy the above:

\eta ~=~ \frac{\delta_{\mathrm{err}}}{8k_{2}+16},

L_{1} ~\geq~ \max\left\{\varepsilon_{Y}\left(\frac{\alpha_{0}\delta_{\mathrm{err}}}{16k_{1}k_{2}}\right)+\alpha_{0},\ \varepsilon_{Y^{\oplus k_{1}}}\left(\frac{\alpha_{1}\delta_{\mathrm{err}}}{32k_{2}}\right)+\frac{\varepsilon_{\mathrm{err}}}{2\sqrt{k_{2}}}+\alpha_{1}\right\},

L_{2} ~\geq~ \max\left\{\varepsilon_{Y^{\oplus k_{1}k_{2}}}\left(\frac{\alpha_{2}\delta_{\mathrm{err}}}{16}\right)+\varepsilon_{\mathrm{err}}+\alpha_{2},\ L_{1}\right\}.

Setting $k_{1}=k_{2}=\sqrt{k}$ (assumed to be an integer) and $\alpha_{0}=\frac{\varepsilon_{\mathrm{err}}}{\sqrt{k_{2}}}$, $\alpha_{1}=\frac{\varepsilon_{\mathrm{err}}}{2\sqrt{k_{2}}}$, and $\alpha_{2}=\varepsilon_{\mathrm{err}}$ completes the proof of Theorem 3.1.
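As a sanity check on the substitutions, the following short SymPy sketch (names ours) evaluates the arguments appearing in the bounds on $L_1$ and $L_2$ after setting $k_1=k_2=\sqrt{k}$ and the above choices of $\alpha_0,\alpha_1,\alpha_2$, recovering the expressions stated in Theorem 3.1.

```python
import sympy as sp

k, eps_err, delta_err = sp.symbols('k eps_err delta_err', positive=True)
k1 = k2 = sp.sqrt(k)
alpha0, alpha1, alpha2 = eps_err / sp.sqrt(k2), eps_err / (2 * sp.sqrt(k2)), eps_err

# Arguments of eps_Y(.), eps_{Y^(k1)}(.), and eps_{Y^(k1 k2)}(.) after substitution:
print(sp.simplify(alpha0 * delta_err / (16 * k1 * k2)))  # = eps_err*delta_err / (16 k^{5/4})
print(sp.simplify(alpha1 * delta_err / (32 * k2)))       # = eps_err*delta_err / (64 k^{3/4})
print(sp.simplify(alpha2 * delta_err / 16))              # = eps_err*delta_err / 16
# Additive slack in L1: alpha0 and eps_err/(2 sqrt(k2)) + alpha1 both equal eps_err / k^{1/4}.
print(sp.simplify(alpha0), sp.simplify(eps_err / (2 * sp.sqrt(k2)) + alpha1))
```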

Runtime analysis.

As argued earlier, the total running time of $\mathsf{TwoStageSelfComposePRV}$ is $O\left(\frac{L_{1}}{h_{1}}\log\frac{L_{1}}{h_{1}}+\frac{L_{2}}{h_{2}}\log\frac{L_{2}}{h_{2}}\right)$. Substituting in the bounds for $L_{1}$, $L_{2}$, $h_{1}$, and $h_{2}$, we get a final running time of

\widetilde{O}\left(\frac{\varepsilon_{\mathrm{up}}\cdot k^{0.25}\cdot\sqrt{\log(k/\delta_{\mathrm{err}})}}{\varepsilon_{\mathrm{err}}}\right),

where $\varepsilon_{\mathrm{up}}$ is as in Corollary 3.2.

3.2 Heterogeneous Compositions

Algorithm 2 can be easily generalized to handle heterogeneous compositions of $k$ different mechanisms, with a running time blow-up of $k$ over the homogeneous case. We defer the details to Appendix B.

4 Experimental Evaluation of $\mathsf{TwoStageSelfComposePRV}$

Figure 1: Compositions of the Laplace mechanism. (a) Running times (average over 20 runs; shaded region indicates 20th–80th percentiles); (b) number of discretization buckets; (c) delta estimates.

Figure 2: Compositions of the Poisson subsampled (with probability $\gamma=0.2$) Gaussian mechanism. (a) Running times (average over 20 runs; shaded region indicates 20th–80th percentiles); (b) number of discretization buckets; (c) delta estimates.

Figure 3: Noise parameters used for experiments. (a) For the Laplace mechanism (Figure 1); (b) for the Poisson subsampled Gaussian mechanism (Figure 2).

We empirically evaluate $\mathsf{TwoStageSelfComposePRV}$ on the task of self-composing two kinds of mechanisms acting on a dataset of real values $x_{1},\ldots,x_{n}$ (a small sampling sketch of the corresponding PRVs follows this list):

  • Laplace Mechanism: returns $\sum_{i}x_{i}$ plus noise drawn from $L(0,b)$, the distribution with $\mathsf{PDF}$ $P(x)=e^{-|x|/b}/(2b)$.

  • Poisson Subsampled Gaussian Mechanism with probability $\gamma$: subsamples a random subset $S$ of indices by including each index independently with probability $\gamma$, and returns $\sum_{i\in S}x_{i}$ plus noise drawn from the Gaussian distribution $\mathcal{N}(0,\sigma^{2})$.
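The sketch below samples the PRV $Y=\log(Q(\omega)/P(\omega))$, $\omega\sim Q$ (Theorem 2.2), for one standard dominating pair of each mechanism, assuming sensitivity 1 (cf. Zhu et al., 2022); it is only meant to make the two PRVs concrete, since the accountants work with the CDF of $Y$ rather than with samples.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000

# Laplace mechanism with scale b: dominating pair P = Lap(0, b), Q = Lap(1, b),
# so Y = (|w| - |w - 1|) / b with w ~ Q.
b = 1133.84
w = rng.laplace(loc=1.0, scale=b, size=N)
Y_laplace = (np.abs(w) - np.abs(w - 1.0)) / b

# Poisson subsampled Gaussian with std sigma and subsampling probability gamma:
# dominating pair P = N(0, sigma^2), Q = (1 - gamma) N(0, sigma^2) + gamma N(1, sigma^2),
# so Y = log(1 + gamma * (exp((2w - 1) / (2 sigma^2)) - 1)) with w ~ Q.
sigma, gamma = 226.86, 0.2
shift = (rng.random(N) < gamma).astype(float)
w = rng.normal(loc=shift, scale=sigma)
Y_subsampled_gaussian = np.log1p(gamma * np.expm1((2.0 * w - 1.0) / (2.0 * sigma**2)))
```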

Both these mechanisms are widely used in practice. For each mechanism, we compare against the implementation by Gopi et al. (2021) (github.com/microsoft/prv_accountant), referred to as GLW, on three fronts: (i) the running time of the algorithm, (ii) the number of discretized buckets used, and (iii) the final estimates of $\delta_{Y^{\oplus k}}(\varepsilon)$, which includes comparing the lower bounds $\delta_{Y_{2}}(\varepsilon+\varepsilon_{\mathrm{err}})-\delta_{\mathrm{err}}$, the estimates $\delta_{Y_{2}}(\varepsilon)$, and the upper bounds $\delta_{Y_{2}}(\varepsilon-\varepsilon_{\mathrm{err}})+\delta_{\mathrm{err}}$. We use $\varepsilon_{\mathrm{err}}=0.1$ and $\delta_{\mathrm{err}}=10^{-10}$ in all the experiments.

We run each algorithm for a varying number $k$ of self-compositions of a basic mechanism. The noise parameter of the basic mechanism is chosen such that the final $\delta(\varepsilon)$ value after $k$-fold composition equals $10^{-6}$ for each value of $k$ (these values were computed using the Google DP accountant: github.com/google/differential-privacy/tree/main/python). The exact choices of noise parameters are shown in Figure 3.

The comparison for the Laplace mechanism is shown in Figure 1, and that for the subsampled Gaussian mechanism is shown in Figure 2. In terms of accuracy, we find that for the same choice of $\varepsilon_{\mathrm{err}}$ and $\delta_{\mathrm{err}}$, the estimates returned by $\mathsf{TwoStageSelfComposePRV}$ are nearly identical to those returned by GLW for the subsampled Gaussian mechanism. For the Laplace mechanism, the estimates returned by the two algorithms are similar and mutually consistent, but, strictly speaking, incomparable.

5 Multi-Stage Recursive Composition

We extend the approach in $\mathsf{TwoStageSelfComposePRV}$ to give a multi-stage algorithm (Algorithm 3), presented only for $k$ a power of $2$ for ease of notation. Similar to the running time analysis of $\mathsf{TwoStageSelfComposePRV}$, the running time of $\mathsf{RecursiveSelfComposePRV}$ is given as

O\left(\sum_{i=1}^{t}\frac{L_{i}}{h_{i}}\log\left(\frac{L_{i}}{h_{i}}\right)\right),

assuming $O(1)$-time access to $\mathsf{CDF}_{Y}(\cdot)$.

Algorithm 3 $\mathsf{RecursiveSelfComposePRV}$
  Input: PRV $Y$, number of compositions $k=2^{t}$, mesh sizes $h_{1}\leq\dots\leq h_{t}$, truncation parameters $L_{1}\leq\dots\leq L_{t}$, where $L_{i}\in h_{i}\cdot(\frac{1}{2}+\mathbb{Z}_{>0})$ for all $i$.
  Output: PDF of an approximation $Y_{t}$ for $Y^{\oplus k}$. $Y_{t}$ will be supported on $\mu+(h_{t}\mathbb{Z}\cap[-L_{t},L_{t}])$ for some $\mu\in[0,\frac{h_{t}}{2}]$.
  $Y_{0}\leftarrow Y|_{|Y|\leq L_{1}}$ \triangleright $Y$ conditioned on $|Y|\leq L_{1}$
  for $i=0$ to $t-1$ do
     $\widetilde{Y}_{i}\leftarrow\mathsf{DiscretizeRV}(Y_{i},h_{i+1},L_{i+1})$
     $Y_{i+1}\leftarrow\widetilde{Y}_{i}\oplus_{L_{i+1}}\widetilde{Y}_{i}$ \triangleright FFT convolution
  end for
  return $Y_{t}$
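A minimal NumPy sketch of the loop in Algorithm 3 is below; it reuses the `discretize`, `rebin`, and `self_convolve_mod` helpers from the sketch following Algorithm 2 (and, like that sketch, omits the mean-preserving shift).

```python
import numpy as np

# Assumes discretize, rebin, and self_convolve_mod from the sketch after Algorithm 2.
def recursive_self_compose(cdf, t, meshes, truncs):
    """Sketch of RecursiveSelfComposePRV for k = 2**t.
    meshes[i] and truncs[i] play the role of h_{i+1} and L_{i+1} (both nondecreasing)."""
    grid, pmf = discretize(cdf, meshes[0], truncs[0])       # ~Y_0 on the first grid
    for i in range(t):
        if i > 0:                                            # move Y_i to the next grid
            grid, pmf = rebin(grid, pmf, meshes[i], truncs[i])
        pmf = self_convolve_mod(pmf, 2)                      # Y_{i+1}: square via FFT mod 2*L_{i+1}
    return grid, pmf
```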

Theorem 5.1.

For all PRVs $Y$ and $k=2^{t}$, the approximation $Y_{t}$ returned by $\mathsf{RecursiveSelfComposePRV}$ satisfies

|Y^{\oplus k}-Y_{t}|\leq_{\delta_{\mathrm{err}}}\varepsilon_{\mathrm{err}},

using a choice of parameters satisfying, for all $i\leq t$,

h_{i} ~=~ \Omega\left(\frac{\varepsilon_{\mathrm{err}}}{t^{1.5}\sqrt{2^{t-i}\log\frac{1}{\delta_{\mathrm{err}}}}}\right),

L_{i} ~\geq~ \varepsilon_{Y^{\oplus 2^{i}}}\left(\frac{\varepsilon_{\mathrm{err}}\delta_{\mathrm{err}}}{2^{O(t)}}\right)+h_{i}\cdot\left(3+2i\sqrt{\tfrac{1}{2}\log\tfrac{2}{\eta}}\right),

L_{t} ~\geq~ \cdots ~\geq~ L_{1}.

Proof Outline.

We establish coupling approximations between consecutive random variables in the sequence

Y^{\oplus 2^{t}},\ Y_{0}^{\oplus 2^{t}},\ \widetilde{Y}_{0}^{\oplus 2^{t}},\ Y_{1}^{\oplus 2^{t-1}},\ \ldots,\ Y_{t-1}^{\oplus 2},\ \widetilde{Y}_{t-1}^{\oplus 2},\ Y_{t},

using a similar approach as in the proof of Theorem 3.1. The full analysis is in Appendix C.

Running time analysis.

The overall running time is at most $O\left(\sum_{i}\frac{L_{i}}{h_{i}}\log\frac{L_{i}}{h_{i}}\right)$, which can be upper bounded by

\widetilde{O}\left(\frac{\varepsilon_{\mathrm{up}}\cdot t^{2.5}\sqrt{\log\frac{1}{\delta_{\mathrm{err}}}}}{\varepsilon_{\mathrm{err}}}\right),

where $\varepsilon_{\mathrm{up}}:=\max_{i}\left(\sqrt{2^{t-i}}\cdot\varepsilon_{Y^{\oplus 2^{i}}}\left(\frac{\varepsilon_{\mathrm{err}}\delta_{\mathrm{err}}}{2^{O(t)}}\right)\right)$.

In many practical regimes of interest, $\varepsilon_{\mathrm{up}}/\varepsilon_{\mathrm{err}}$ is at most $\mathrm{polylog}(t)=\mathrm{polyloglog}(k)$. For ease of exposition in what follows, we assume that $\varepsilon_{\mathrm{err}}$ is a small constant, e.g., $0.1$, and suppress the dependence on $\varepsilon_{\mathrm{err}}$. Suppose the original mechanism $\mathcal{M}$ underlying $Y$ satisfies $(\varepsilon=\frac{1}{\sqrt{k\log(1/\delta_{\mathrm{err}})}},\ \delta=\frac{o_{k}(1)}{k^{O(1)}})$-DP. Then by advanced composition (Dwork et al., 2010), $\mathcal{M}^{\circ 2^{i}}$ satisfies $(\varepsilon\sqrt{2^{i+1}\log\frac{1}{\delta'}}+2^{i+1}\varepsilon(e^{\varepsilon}-1),\ 2^{i}\delta+\delta')$-DP. If $2^{i}\delta+\delta'\lesssim\frac{\delta_{\mathrm{err}}}{2^{O(t)}}$, then $\varepsilon_{Y^{\oplus 2^{i}}}\left(\frac{\delta_{\mathrm{err}}}{2^{O(t)}}\right)\lesssim\sqrt{\frac{1}{2^{t-i}}\ln\frac{2^{O(t)}}{\delta_{\mathrm{err}}}}$. Instantiating this with $i=1,\ldots,t$ gives that $\varepsilon_{\mathrm{up}}$ is at most $\mathrm{polylog}(k)$.

6 Conclusions and Discussion

In this work, we presented an algorithm with a running time and memory usage of $\mathrm{polylog}(k)$ for the task of self-composing a broad class of DP mechanisms $k$ times. We also extended our algorithm to the case of composing $k$ different mechanisms in the same class, resulting in a running time and memory usage of $\widetilde{O}(k)$; both of these improve the state-of-the-art by roughly a factor of $\sqrt{k}$. We also demonstrated the practical benefits of our framework compared to the state-of-the-art by evaluating on the sub-sampled Gaussian mechanism and the Laplace mechanism, both of which are widely used in the literature and in practice.

For future work, it would be interesting to tighten the log factors in our bounds. A related future direction is to make the $\mathsf{RecursiveSelfComposePRV}$ algorithm more practical, since the current recursive analysis is quite loose. Note that $\mathsf{RecursiveSelfComposePRV}$ could also be performed with an arity larger than $2$; e.g., with an arity of $100$, one would perform $100^{3}$ compositions as a three-stage composition. For any constant arity, our algorithm gives an asymptotic runtime of $O(\mathrm{polylog}(k))$ as $k\to\infty$; however, for practical considerations, one may also consider adapting the arity with $k$ to tighten the log factors. We avoided doing so for simplicity, since our focus in obtaining an $O(\mathrm{polylog}(k))$ running time was primarily theoretical.

Acknowledgments

We would like to thank the anonymous reviewers for their thoughtful comments that have improved the quality of the paper.

References

  • Abadi et al. (2016) Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., and Zhang, L. Deep learning with differential privacy. In CCS, pp.  308–318, 2016.
  • Abowd (2018) Abowd, J. M. The US Census Bureau adopts differential privacy. In KDD, pp.  2867–2867, 2018.
  • Apple Differential Privacy Team (2017) Apple Differential Privacy Team. Learning with privacy at scale. Apple Machine Learning Journal, 2017.
  • Bu et al. (2020) Bu, Z., Dong, J., Long, Q., and Su, W. J. Deep learning with Gaussian differential privacy. Harvard Data Science Review, 2020(23), 2020.
  • Bu et al. (2021) Bu, Z., Gopi, S., Kulkarni, J., Lee, Y. T., Shen, J. H., and Tantipongpipat, U. Fast and memory efficient differentially private-SGD via JL projections. In NeurIPS, 2021.
  • Bun & Steinke (2016) Bun, M. and Steinke, T. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In TCC, pp.  635–658, 2016.
  • Bun et al. (2018) Bun, M., Dwork, C., Rothblum, G. N., and Steinke, T. Composable and versatile privacy via truncated CDP. In STOC, pp.  74–86, 2018.
  • Ding et al. (2017) Ding, B., Kulkarni, J., and Yekhanin, S. Collecting telemetry data privately. In NeurIPS, pp.  3571–3580, 2017.
  • Dong et al. (2019) Dong, J., Roth, A., and Su, W. J. Gaussian differential privacy. arXiv:1905.02383, 2019.
  • Doroshenko et al. (2022) Doroshenko, V., Ghazi, B., Kamath, P., Kumar, R., and Manurangsi, P. Connect the dots: Tighter discrete approximations of privacy loss distributions. In PETS (to appear), 2022.
  • Dwork & Rothblum (2016) Dwork, C. and Rothblum, G. N. Concentrated differential privacy. arXiv:1603.01887, 2016.
  • Dwork et al. (2006a) Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., and Naor, M. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, pp.  486–503, 2006a.
  • Dwork et al. (2006b) Dwork, C., McSherry, F., Nissim, K., and Smith, A. D. Calibrating noise to sensitivity in private data analysis. In TCC, pp.  265–284, 2006b.
  • Dwork et al. (2010) Dwork, C., Rothblum, G. N., and Vadhan, S. Boosting and differential privacy. In FOCS, pp.  51–60, 2010.
  • Erlingsson et al. (2014) Erlingsson, Ú., Pihur, V., and Korolova, A. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In CCS, pp.  1054–1067, 2014.
  • Google (2020) Google. DP Accounting Library. https://github.com/google/differential-privacy/tree/main/python/dp_accounting, 2020.
  • Gopi et al. (2021) Gopi, S., Lee, Y. T., and Wutschitz, L. Numerical composition of differential privacy. In NeurIPS, 2021.
  • Greenberg (2016) Greenberg, A. Apple’s “differential privacy” is about collecting your data – but not your data. Wired, June, 13, 2016.
  • Kairouz et al. (2015) Kairouz, P., Oh, S., and Viswanath, P. The composition theorem for differential privacy. In ICML, pp.  1376–1385, 2015.
  • Kenthapadi & Tran (2018) Kenthapadi, K. and Tran, T. T. L. Pripearl: A framework for privacy-preserving analytics and reporting at LinkedIn. In CIKM, pp.  2183–2191, 2018.
  • Koskela & Honkela (2021) Koskela, A. and Honkela, A. Computing differential privacy guarantees for heterogeneous compositions using FFT. arXiv:2102.12412, 2021.
  • Koskela et al. (2020) Koskela, A., Jälkö, J., and Honkela, A. Computing tight differential privacy guarantees using FFT. In AISTATS, pp.  2560–2569, 2020.
  • Koskela et al. (2021) Koskela, A., Jälkö, J., Prediger, L., and Honkela, A. Tight differential privacy for discrete-valued mechanisms and for the subsampled Gaussian mechanism using FFT. In AISTATS, pp.  3358–3366, 2021.
  • Lukas Prediger (2020) Lukas Prediger, A. K. Code for computing tight guarantees for differential privacy. https://github.com/DPBayes/PLD-Accountant, 2020.
  • Meiser & Mohammadi (2018) Meiser, S. and Mohammadi, E. Tight on budget? Tight bounds for rr-fold approximate differential privacy. In CCS, pp.  247–264, 2018.
  • Microsoft (2021) Microsoft. A fast algorithm to optimally compose privacy guarantees of differentially private (DP) mechanisms to arbitrary accuracy. https://github.com/microsoft/prv_accountant, 2021.
  • Mironov (2017) Mironov, I. Rényi differential privacy. In CSF, pp.  263–275, 2017.
  • Murtagh & Vadhan (2016) Murtagh, J. and Vadhan, S. The complexity of computing the optimal composition of differential privacy. In TCC, pp.  157–175, 2016.
  • Rogers et al. (2021) Rogers, R., Subramaniam, S., Peng, S., Durfee, D., Lee, S., Kancha, S. K., Sahay, S., and Ahammad, P. LinkedIn’s audience engagements API: A privacy preserving data analytics system at scale. Journal of Privacy and Confidentiality, 11(3), 2021.
  • Shankland (2014) Shankland, S. How Google tricks itself to protect Chrome user privacy. CNET, October, 2014.
  • Sommer et al. (2019) Sommer, D. M., Meiser, S., and Mohammadi, E. Privacy loss classes: The central limit theorem in differential privacy. PoPETS, 2019(2):245–269, 2019.
  • Zhu et al. (2022) Zhu, Y., Dong, J., and Wang, Y. Optimal accounting of differential privacy via characteristic function. In AISTATS, pp.  4782–4817, 2022.

Appendix A Proofs of Coupling Approximation Properties

For the sake of completeness, we include proofs of the lemmas we use from Gopi et al. (2021).

Proof of Lemma 2.6(2).

There exist couplings $(X,Y)$ and $(Y,Z)$ such that $\Pr[|X-Y|\geq h_{1}]\leq\eta_{1}$ and $\Pr[|Y-Z|\geq h_{2}]\leq\eta_{2}$. From these two couplings, we can construct a coupling between $(X,Z)$: sample $X$, sample $Y$ from $Y|X$ (given by the coupling $(X,Y)$), and finally sample $Z$ from $Z|Y$ (given by the coupling $(Y,Z)$). Therefore, for this coupling, we have:

\Pr[|X-Z|\geq h_{1}+h_{2}] ~\leq~ \Pr[|(X-Y)+(Y-Z)|\geq h_{1}+h_{2}] ~\leq~ \Pr[|X-Y|+|Y-Z|\geq h_{1}+h_{2}] ~\leq~ \Pr[|X-Y|\geq h_{1}]+\Pr[|Y-Z|\geq h_{2}] ~\leq~ \eta_{1}+\eta_{2}. \qed
Proof of Lemma 2.6(3).

The first part follows from the fact that there exists a coupling between $X$ and $Y$ such that $\Pr[X\neq Y]=d_{\mathrm{TV}}(X,Y)$. The second part follows from the first part and the fact that $d_{\mathrm{TV}}(X^{\oplus k},Y^{\oplus k})\leq k\cdot d_{\mathrm{TV}}(X,Y)$. ∎

Proof of Lemma 2.6(4).

Let $X=Y-\widetilde{Y}$, where $(Y,\widetilde{Y})$ are coupled such that $|Y-\widetilde{Y}-\mu|\leq h$ with probability $1$. Then $\operatorname{\mathbb{E}}[X]=0$ and $X\in[\mu-h,\mu+h]$ w.p. $1$, and hence by Hoeffding's inequality,

\Pr\left[|X^{\oplus k}|\geq t\right]\leq 2\exp\left(-\frac{2t^{2}}{k(2h)^{2}}\right)=\eta,\qquad\text{ for }t=h\sqrt{2k\log\frac{2}{\eta}}. \qed

Appendix B Two-Stage Heterogeneous Composition

We can handle the composition of $k$ different PRVs $Y^{1},\ldots,Y^{k}$ with a slight modification to $\mathsf{TwoStageSelfComposePRV}$, as given in Algorithm 4. The approximation analysis remains similar as before. The main difference is that $L_{1}$ and $L_{2}$ are to be chosen as

L_{1} ~\geq~ O\left(\max\left\{\max_{i}\left\{\varepsilon_{Y^{i}}\left(\frac{\varepsilon_{\mathrm{err}}\delta_{\mathrm{err}}}{16k_{1}k_{2}^{1.5}}\right)\right\},\ \max_{t}\left\{\varepsilon_{Y^{tk_{1}+1:(t+1)k_{1}}}\left(\frac{\varepsilon_{\mathrm{err}}\delta_{\mathrm{err}}}{64k_{2}^{1.5}}\right)\right\}\right\}+\frac{\varepsilon_{\mathrm{err}}}{\sqrt{k_{2}}}\right),

L_{2} ~\geq~ O\left(\max\left\{\varepsilon_{Y^{1:k}}\left(\frac{\varepsilon_{\mathrm{err}}\delta_{\mathrm{err}}}{16}\right)+\varepsilon_{\mathrm{err}},\ L_{1}\right\}\right),

where we denote $Y^{i:j}=Y^{i}\oplus\cdots\oplus Y^{j}$. In the case of $k_{1}=k_{2}=\sqrt{k}$ (assumed to be an integer) and $r=0$, this leads to a final running time of

\widetilde{O}\left(\frac{\varepsilon_{\mathrm{up}}\cdot k^{1.25}\cdot\sqrt{\log(k/\delta_{\mathrm{err}})}}{\varepsilon_{\mathrm{err}}}\right),

where $\varepsilon_{\mathrm{up}}$ can be bounded as

max{εY1:k(εerrδerr16),k4maxtεYtk+1:(t+1)k(εerrδerr64k0.75),k4maxiεYi(εerrδerr16k1.25)}+εerr.\max\left\{\varepsilon_{Y^{1:k}}\left(\frac{\varepsilon_{\mathrm{err}}\delta_{\mathrm{err}}}{16}\right),\ \ \sqrt[4]{k}\cdot\max_{t}\ \varepsilon_{Y^{t\sqrt{k}+1:(t+1)\sqrt{k}}}\left(\frac{\varepsilon_{\mathrm{err}}\delta_{\mathrm{err}}}{64k^{0.75}}\right),\ \ \sqrt[4]{k}\cdot\max_{i}\varepsilon_{Y^{i}}\left(\frac{\varepsilon_{\mathrm{err}}\delta_{\mathrm{err}}}{16k^{1.25}}\right)\right\}\ \ +\ \ \varepsilon_{\mathrm{err}}.
Algorithm 4 $\mathsf{TwoStageComposePRV}$ for heterogeneous compositions
  Input: PRVs $Y^{1},\ldots,Y^{k}$, number of compositions $k=k_{1}\cdot k_{2}+r$ with $r<k_{1}$, mesh sizes $h_{1}\leq h_{2}$, truncation parameters $L_{1}\leq L_{2}$, where each $L_{i}\in h_{i}\cdot(\frac{1}{2}+\mathbb{Z}_{>0})$.
  Output: PDF of an approximation $\widehat{Y}_{2}$ for $Y^{1}\oplus\cdots\oplus Y^{k}$. $\widehat{Y}_{2}$ will be supported on $\mu+(h_{2}\mathbb{Z}\cap[-L_{2},L_{2}])$ for some $\mu\in[-\frac{h_{2}}{2},\frac{h_{2}}{2}]$.
  
  for i=1i=1 to k1k2k_{1}k_{2} do
     Y0iYi||Yi|L1Y_{0}^{i}\leftarrow Y^{i}|_{|Y^{i}|\leq L_{1}} \triangleright YiY^{i} conditioned on |Yi|L1|Y^{i}|\leq L_{1}
     Y~0i𝖣𝗂𝗌𝖼𝗋𝖾𝗍𝗂𝗓𝖾𝖱𝖵(Y0i,h1,L1)\widetilde{Y}^{i}_{0}\leftarrow\mathsf{DiscretizeRV}(Y^{i}_{0},h_{1},L_{1})
  end for
  for i=k1k2+1i=k_{1}k_{2}+1 to kk do
     Y0iYi||Yi|L2Y_{0}^{i}\leftarrow Y^{i}|_{|Y^{i}|\leq L_{2}} \triangleright YiY^{i} conditioned on |Yi|L2|Y^{i}|\leq L_{2}
     Y~0i𝖣𝗂𝗌𝖼𝗋𝖾𝗍𝗂𝗓𝖾𝖱𝖵(Y0i,h2,L2)\widetilde{Y}^{i}_{0}\leftarrow\mathsf{DiscretizeRV}(Y^{i}_{0},h_{2},L_{2})
  end for
  
  for t=0t=0 to k21k_{2}-1 do
     Y1tY~0tk1+1L1L1Y~0(t+1)k1Y^{t}_{1}\leftarrow\widetilde{Y}^{tk_{1}+1}_{0}\oplus_{L_{1}}\cdots\oplus_{L_{1}}\widetilde{Y}^{(t+1)k_{1}}_{0} \triangleright k1k_{1}-fold FFT convolution
     Y~1t𝖣𝗂𝗌𝖼𝗋𝖾𝗍𝗂𝗓𝖾𝖱𝖵(Y1t,h2,L2)\widetilde{Y}^{t}_{1}\leftarrow\mathsf{DiscretizeRV}(Y^{t}_{1},h_{2},L_{2})
  end for
  
  Y2Y~11L2L2Y~1k2Y_{2}\leftarrow\widetilde{Y}^{1}_{1}\oplus_{L_{2}}\cdots\oplus_{L_{2}}\widetilde{Y}^{k_{2}}_{1} \triangleright k2k_{2}-fold FFT convolution
  return  Y2L2Y~0k1k2+1L2L2Y~0kY_{2}\oplus_{L_{2}}\widetilde{Y}_{0}^{k_{1}\cdot k_{2}+1}\oplus_{L_{2}}\cdots\oplus_{L_{2}}\widetilde{Y}_{0}^{k}
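To make the data flow of Algorithm 4 concrete, here is a minimal Python sketch under simplifying assumptions (all names are ours): each PRV is already given as a PMF on a symmetric grid centered at 0 (fine mesh h1 for the first k1·k2 PRVs, coarse mesh h2 with half-width m2·h2 ≈ L2 for the remaining r), DiscretizeRV is modeled as rounding mass to the nearest coarse grid point (the mean-preserving adjustment is omitted), the truncated convolution ⊕_L is modeled as a convolution followed by conditioning on the original support, and np.convolve stands in for an FFT-based convolution.

  import numpy as np

  def truncated_convolve(p, q):
      # Convolve two PMFs on a common mesh, then condition the result back onto
      # the original symmetric support (a stand-in for the truncated convolution).
      full = np.convolve(p, q)
      m = (len(p) - 1) // 2
      mid = (len(full) - 1) // 2
      out = full[mid - m: mid + m + 1]
      return out / out.sum()

  def rediscretize(pmf, h_fine, h_coarse, m_coarse):
      # Round mass from the fine grid onto the coarse grid {-m_coarse..m_coarse} * h_coarse.
      m_fine = (len(pmf) - 1) // 2
      fine_pts = np.arange(-m_fine, m_fine + 1) * h_fine
      idx = np.clip(np.rint(fine_pts / h_coarse).astype(int), -m_coarse, m_coarse) + m_coarse
      out = np.zeros(2 * m_coarse + 1)
      np.add.at(out, idx, pmf)
      return out

  def two_stage_compose(pmfs_fine, pmfs_coarse, k1, h1, h2, m2):
      # First stage: convolve groups of k1 fine-mesh PMFs, then move to the coarse grid.
      k2 = len(pmfs_fine) // k1
      stage1 = []
      for t in range(k2):
          block = pmfs_fine[t * k1]
          for p in pmfs_fine[t * k1 + 1:(t + 1) * k1]:
              block = truncated_convolve(block, p)
          stage1.append(rediscretize(block, h1, h2, m2))
      # Second stage: fold in the k2 coarse PMFs and the r leftover coarse-mesh PMFs.
      out = stage1[0]
      for p in stage1[1:] + list(pmfs_coarse):
          out = truncated_convolve(out, p)
      return out

An efficient implementation would instead carry out the pairwise convolutions with FFTs over the fixed grids of sizes roughly 2L1/h1 and 2L2/h2, which is where the running-time bound above comes from.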

Appendix C Analysis of 𝗥𝗲𝗰𝘂𝗿𝘀𝗶𝘃𝗲𝗦𝗲𝗹𝗳𝗖𝗼𝗺𝗽𝗼𝘀𝗲𝗣𝗥𝗩\mathsf{RecursiveSelfComposePRV}

We establish coupling approximations between consecutive random variables in the sequence:

Y2t,Y02t,Y~02t,Y12t1,Y~12t1,Y22t2,,Yt12,Y~t12,Yt,Y^{\oplus 2^{t}},\ Y_{0}^{\oplus 2^{t}},\ \widetilde{Y}_{0}^{\oplus 2^{t}},\ Y_{1}^{\oplus 2^{t-1}},\ \widetilde{Y}_{1}^{\oplus 2^{t-1}},\ Y_{2}^{\oplus 2^{t-2}},\ \ldots,\ Y_{t-1}^{\oplus 2},\ \widetilde{Y}_{t-1}^{\oplus 2},\ Y_{t}\,,

Coupling 𝒀𝟐𝒕Y^{\oplus 2^{t}} and 𝒀𝟎𝟐𝒕Y_{0}^{\oplus 2^{t}}.

Since dTV(Y,Y0)=Pr[|Y|>L1]=:δ0d_{\mathrm{TV}}(Y,Y_{0})=\Pr[|Y|>L_{1}]=:\delta_{0}, we have from Lemma 2.6(3) that

|Y2tY02t|\displaystyle|Y^{\oplus 2^{t}}-Y_{0}^{\oplus 2^{t}}| 2tδ00.\displaystyle~{}\leq_{2^{t}\delta_{0}}~{}0\,. (10)

Coupling 𝒀𝒊𝟐𝒕𝒊Y_{i}^{\oplus 2^{t-i}} and 𝒀~𝒊𝟐𝒕𝒊\widetilde{Y}_{i}^{\oplus 2^{t-i}}.

We have from Proposition 2.7 that 𝔼[Yi]=𝔼[Y~i]\operatorname{\mathbb{E}}[Y_{i}]=\operatorname{\mathbb{E}}[\widetilde{Y}_{i}] and that |YiY~iμi|0hi+12|Y_{i}-\widetilde{Y}_{i}-\mu_{i}|\leq_{0}\frac{h_{i+1}}{2} for some μi\mu_{i} satisfying |μi|hi+12|\mu_{i}|\leq\frac{h_{i+1}}{2}. Thus, applying Lemma 2.6(4), we have (for η\eta to be chosen later) that

|Yi2tiY~i2ti|ηhi+12ti1log2η=:h~i+1.\displaystyle\left|Y_{i}^{\oplus 2^{t-i}}-\widetilde{Y}_{i}^{\oplus 2^{t-i}}\right|~{}\leq_{\eta}~{}h_{i+1}\sqrt{2^{t-i-1}\log\frac{2}{\eta}}~{}=:~{}\widetilde{h}_{i+1}. (11)

Coupling 𝒀~𝒊𝟐𝒕𝒊\widetilde{Y}_{i}^{\oplus 2^{t-i}} and 𝒀𝒊+𝟏𝟐𝒕𝒊𝟏Y_{i+1}^{\oplus 2^{t-i-1}}.

Since dTV(Y~iY~i,Y~iLi+1Y~i)Pr[|Y~iY~i|>Li+1]=:δi+1d_{\mathrm{TV}}(\widetilde{Y}_{i}\oplus\widetilde{Y}_{i},\widetilde{Y}_{i}\oplus_{L_{i+1}}\widetilde{Y}_{i})\leq\Pr[|\widetilde{Y}_{i}\oplus\widetilde{Y}_{i}|>L_{i+1}]=:\delta_{i+1}, it holds via Lemma 2.6(3) that

|Y~i2tiYi+12ti1|2ti1δi+10.\displaystyle|\widetilde{Y}_{i}^{\oplus 2^{t-i}}-Y_{i+1}^{\oplus 2^{t-i-1}}|\leq_{2^{t-i-1}\delta_{i+1}}0\,. (12)

Putting things together.

Thus, combining Equations 11 and 12 for i{0,,t1}i\in\left\{0,\ldots,t-1\right\}, using Lemma 2.6(2), we have that

|Y02tYt|δε,\displaystyle\left|Y_{0}^{\oplus 2^{t}}-Y_{t}\right|~{}\leq_{\delta^{*}}~{}\varepsilon^{*}, (13)
where δ:=δt\displaystyle\text{where }\qquad\delta^{*}~{}:=~{}\delta_{t}^{*} :=tη+j=1t2tjδj,\displaystyle\textstyle~{}:=~{}t\eta+\sum_{j=1}^{t}2^{t-j}\delta_{j}\,, (14)
and ε:=εt\displaystyle\text{and }\qquad\varepsilon^{*}~{}:=~{}\varepsilon_{t}^{*} :=j=1thj2tjlog2η.\displaystyle\textstyle~{}:=~{}\sum_{j=1}^{t}h_{j}\sqrt{2^{t-j}\log\frac{2}{\eta}}.

More generally, the same analysis shows that for any 1it1\leq i\leq t,

|Y02iYi|δiεi,\displaystyle\left|Y_{0}^{\oplus 2^{i}}-Y_{i}\right|~{}\leq_{\delta_{i}^{*}}~{}\varepsilon_{i}^{*},
where δi\displaystyle\text{where }\qquad\delta_{i}^{*} :=iη+j=1i2ijδj,\displaystyle\textstyle~{}:=~{}i\eta+\sum_{j=1}^{i}2^{i-j}\delta_{j}\,, (15)
and εi\displaystyle\text{and }\qquad\varepsilon_{i}^{*} :=j=1ihj2ijlog2η.\displaystyle\textstyle~{}:=~{}\sum_{j=1}^{i}h_{j}\sqrt{2^{i-j}\log\frac{2}{\eta}}\,.

To simplify our analysis going forward, we fix the choice of hih_{i}’s that we will use, namely, hi=εt2tilog2ηh_{i}=\frac{\varepsilon^{*}}{t\sqrt{2^{t-i}\log\frac{2}{\eta}}} (for η\eta that will be chosen later). This implies that

εi\displaystyle\varepsilon_{i}^{*} =j=1ihj2ijlog2η=iεt2ti=hi+1i12log2η,\displaystyle~{}=~{}\sum_{j=1}^{i}h_{j}\sqrt{2^{i-j}\log\frac{2}{\eta}}~{}=~{}i\cdot\frac{\varepsilon^{*}}{t\sqrt{2^{t-i}}}~{}=~{}h_{i+1}\cdot i\sqrt{\frac{1}{2}\log\frac{2}{\eta}},

where the last two equalities use the choice of the hjh_{j}’s above.
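The displayed identity can be checked numerically. The following sketch (with arbitrary illustrative values of ε*, t, and η, which are our choices) evaluates ε_i* from the chosen h_j's and compares it with the closed form i·ε*/(t·√(2^{t-i})).

  import numpy as np

  eps_star, t, eta = 0.5, 6, 1e-3
  log_term = np.log(2 / eta)
  h = [eps_star / (t * np.sqrt(2.0 ** (t - j) * log_term)) for j in range(1, t + 1)]
  for i in range(1, t + 1):
      lhs = sum(h[j - 1] * np.sqrt(2.0 ** (i - j) * log_term) for j in range(1, i + 1))
      rhs = i * eps_star / (t * np.sqrt(2.0 ** (t - i)))
      assert np.isclose(lhs, rhs)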

In order to get our final bound, we need to bound δi\delta_{i}’s in terms of η\eta, the mesh sizes hih_{i}’s, and truncation parameters LiL_{i}’s. For ease of notation, we let δ0=0\delta_{0}^{*}=0. We have for 0i<t0\leq i<t that

δi+1\displaystyle\delta_{i+1} =Pr[|Y~iY~i|>Li+1]\displaystyle~{}=~{}\Pr\left[\left|\widetilde{Y}_{i}\oplus\widetilde{Y}_{i}\right|>L_{i+1}\right]
Pr[|YiYi|>Li+12hi+1](since |YiY~i|hi+1 w.p. 1)\displaystyle~{}\leq~{}\Pr\left[\left|Y_{i}\oplus Y_{i}\right|>L_{i+1}-2h_{i+1}\right]\qquad\text{(since $\left|Y_{i}-\widetilde{Y}_{i}\right|\leq h_{i+1}$ w.p. $1$)}
2Pr[|YiY02i|>εi]+Pr[|Y02i+1|>Li+12hi+12εi]\displaystyle~{}\leq~{}2\Pr\left[\left|Y_{i}-Y_{0}^{\oplus 2^{i}}\right|>\varepsilon_{i}^{*}\right]+\Pr\left[\left|Y_{0}^{\oplus 2^{i+1}}\right|>L_{i+1}-2h_{i+1}-2\varepsilon_{i}^{*}\right]
2δi+Pr[|Y2i+1|>Li+12hi+12εi]\displaystyle~{}\leq~{}2\delta_{i}^{*}+\Pr\left[\left|Y^{\oplus 2^{i+1}}\right|>L_{i+1}-2h_{i+1}-2\varepsilon_{i}^{*}\right]
δi+1\displaystyle\delta_{i+1} 2δi+4αi+1δY2i+1(L~i+1),\displaystyle~{}\leq~{}2\delta_{i}^{*}+\frac{4}{\alpha_{i+1}}\delta_{Y^{\oplus 2^{i+1}}}(\widetilde{L}_{i+1}), (16)

where in the penultimate step we use that the tails of Y02iY_{0}^{\oplus 2^{i}} are no larger than the tails of Y2iY^{\oplus 2^{i}} since Y0Y_{0} is a truncation of YY, and in the last step we use Lemma 2.4 with L~i+1:=Li+12hi+1(1+i12log2η)αi+1\widetilde{L}_{i+1}:=L_{i+1}-2h_{i+1}(1+i\sqrt{\frac{1}{2}\log\frac{2}{\eta}})-\alpha_{i+1} (eventually we set αi+1=hi+1\alpha_{i+1}=h_{i+1}).

We show using an inductive argument that for Ci=8iC_{i}=8^{i},

δi\displaystyle\delta_{i} 2Ci(η+j=1i4αjδY2j(L~j)),\displaystyle\textstyle~{}\leq~{}2C_{i}\cdot\left(\eta+\sum_{j=1}^{i}\frac{4}{\alpha_{j}}\delta_{Y^{\oplus 2^{j}}}\left(\widetilde{L}_{j}\right)\right)\,, (17)
andδi\displaystyle\text{and}\qquad\delta_{i}^{*} Ci+1(η+j=1i4αjδY2j(L~j)).\displaystyle\textstyle~{}\leq~{}C_{i+1}\cdot\left(\eta+\sum_{j=1}^{i}\frac{4}{\alpha_{j}}\delta_{Y^{\oplus 2^{j}}}\left(\widetilde{L}_{j}\right)\right). (18)

The base case holds since δ14α1δY2(L~1)\delta_{1}~{}\leq~{}\frac{4}{\alpha_{1}}\delta_{Y^{\oplus 2}}(\widetilde{L}_{1}); note C1>1C_{1}>1. From (15), we have

δi\displaystyle\delta_{i}^{*} iη+j=1i2ijδj\displaystyle\textstyle~{}\leq~{}i\eta+\sum_{j=1}^{i}2^{i-j}\delta_{j}
iη+j=1i2ij(2Cj(η+=1j4αδY2(L~)))\displaystyle\textstyle~{}\leq~{}i\eta+\sum_{j=1}^{i}2^{i-j}\left(2C_{j}\cdot\left(\eta+\sum_{\ell=1}^{j}\frac{4}{\alpha_{\ell}}\delta_{Y^{\oplus 2^{\ell}}}\left(\widetilde{L}_{\ell}\right)\right)\right)
iη+(j=1i2ij2Cj)(η+j=1i4αjδY2j(L~j))\displaystyle\textstyle~{}\leq~{}i\eta+\left(\sum_{j=1}^{i}2^{i-j}\cdot 2C_{j}\right)\cdot\left(\eta+\sum_{j=1}^{i}\frac{4}{\alpha_{j}}\delta_{Y^{\oplus 2^{j}}}\left(\widetilde{L}_{j}\right)\right)
iη+(4Ci)(η+j=1i4αjδY2j(L~j))\displaystyle\textstyle~{}\leq~{}i\eta+(4C_{i})\cdot\left(\eta+\sum_{j=1}^{i}\frac{4}{\alpha_{j}}\delta_{Y^{\oplus 2^{j}}}\left(\widetilde{L}_{j}\right)\right)
Ci+1(η+j=1i4αjδY2j(L~j)).\displaystyle\textstyle~{}\leq~{}C_{i+1}\cdot\left(\eta+\sum_{j=1}^{i}\frac{4}{\alpha_{j}}\delta_{Y^{\oplus 2^{j}}}\left(\widetilde{L}_{j}\right)\right).

This completes the inductive step for (18). Finally, combining (16) with (18) immediately yields the inductive step for (17).
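The induction can also be checked numerically: iterating (15) and (16) with equality (which only makes the quantities larger) for nonnegative η and arbitrary nonnegative values c_j standing in for (4/α_j)·δ_{Y^{⊕2^j}}(L̃_j) keeps δ_i and δ_i* below the claimed bounds with C_i = 8^i. A short sketch in our notation:

  import numpy as np

  rng = np.random.default_rng(1)
  t, eta = 10, 1e-4
  c = rng.uniform(0, 1e-3, size=t + 1)  # c[j] stands in for (4/alpha_j) * delta_{Y^{2^j}}(L~_j)
  delta = np.zeros(t + 1)        # delta[i]      = delta_i
  delta_star = np.zeros(t + 1)   # delta_star[i] = delta_i^*
  for i in range(t):
      delta[i + 1] = 2 * delta_star[i] + c[i + 1]  # recursion (16), taken with equality
      delta_star[i + 1] = (i + 1) * eta + sum(2.0 ** (i + 1 - j) * delta[j] for j in range(1, i + 2))  # (15)
  for i in range(1, t + 1):
      bound = eta + c[1:i + 1].sum()
      assert delta[i] <= 2 * 8.0 ** i * bound           # claim (17)
      assert delta_star[i] <= 8.0 ** (i + 1) * bound    # claim (18)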

Putting this together in (14), and setting αi=hi\alpha_{i}=h_{i}, we get

δt\displaystyle\delta_{t}^{*} 1ε2O(t)(η+i=1tδY2i(L~i)).\displaystyle~{}\leq~{}\frac{1}{\varepsilon^{*}}\cdot 2^{O(t)}\left(\eta+\sum_{i=1}^{t}\delta_{Y^{\oplus 2^{i}}}(\widetilde{L}_{i})\right).

Finally, combining (13) with (10) using Lemma 2.6(2), we get

|Y2tYt|δerrεerr\displaystyle\left|Y^{\oplus 2^{t}}-Y_{t}\right|~{}\leq_{\delta_{\mathrm{err}}}~{}\varepsilon_{\mathrm{err}}
where δerr\displaystyle\text{where }\qquad\delta_{\mathrm{err}} :=1ε(2O(t)(η+i=1tδY2i(L~i))+2O(t)δY(L~1)),\displaystyle\textstyle~{}:=~{}\frac{1}{\varepsilon^{*}}\cdot\left(2^{O(t)}\left(\eta+\sum_{i=1}^{t}\delta_{Y^{\oplus 2^{i}}}(\widetilde{L}_{i})\right)+2^{O(t)}\delta_{Y}(\widetilde{L}_{1})\right)\,,
and εerr\displaystyle\text{and }\qquad\varepsilon_{\mathrm{err}} :=ε.\displaystyle\textstyle~{}:=~{}\varepsilon^{*}.

Thus, we get the desired approximation result for the following choice of parameters

η\displaystyle\eta =δerr2O(t),\displaystyle\textstyle~{}=~{}\frac{\delta_{\mathrm{err}}}{2^{O(t)}},
hi\displaystyle h_{i} =εerrt2tilog2η=Ω(εerrt1.52tilog1δerr),\displaystyle\textstyle~{}=~{}\frac{\varepsilon_{\mathrm{err}}}{t\sqrt{2^{t-i}\log\frac{2}{\eta}}}~{}=~{}\Omega\left(\frac{\varepsilon_{\mathrm{err}}}{t^{1.5}\sqrt{2^{t-i}\log\frac{1}{\delta_{\mathrm{err}}}}}\right),
Li\displaystyle L_{i} max{εY2i(εerrδerr2O(t))+hi(3+2i12log2η),Li1}.\displaystyle\textstyle~{}\geq~{}\max\left\{\varepsilon_{Y^{\oplus 2^{i}}}\left(\frac{\varepsilon_{\mathrm{err}}\delta_{\mathrm{err}}}{2^{O(t)}}\right)+h_{i}\cdot\left(3+2i\sqrt{\frac{1}{2}\log\frac{2}{\eta}}\right),L_{i-1}\right\}.
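A hedged Python sketch of this parameter schedule is given below. The constant hidden by 2^{O(t)} is not pinned down by the analysis, so CONST is a placeholder, and eps_curve(i, delta) is a hypothetical callable returning ε_{Y^{⊕2^i}}(δ) for the PRV being composed.

  import numpy as np

  def parameter_schedule(eps_err, delta_err, t, eps_curve, CONST=8.0):
      # Returns eta, the mesh sizes h_1..h_t, and the truncation parameters L_1..L_t.
      eta = delta_err / CONST ** t
      log_term = np.log(2.0 / eta)
      hs, Ls = [], []
      for i in range(1, t + 1):
          h_i = eps_err / (t * np.sqrt(2.0 ** (t - i) * log_term))
          L_i = eps_curve(i, eps_err * delta_err / CONST ** t) + h_i * (3 + 2 * i * np.sqrt(0.5 * log_term))
          hs.append(h_i)
          Ls.append(max(L_i, Ls[-1]) if Ls else L_i)
      return eta, hs, Ls

The total grid size sum(L / h for L, h in zip(Ls, hs)) is then what drives the running-time bound below.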

Thus, the overall running time is at most

O~(iLihi)\displaystyle\widetilde{O}\left(\sum_{i}\frac{L_{i}}{h_{i}}\right) =O~(εupt2.5log1δerrεerr),\displaystyle~{}=~{}\widetilde{O}\left(\frac{\varepsilon_{\mathrm{up}}\cdot t^{2.5}\sqrt{\log\frac{1}{\delta_{\mathrm{err}}}}}{\varepsilon_{\mathrm{err}}}\right),
where εup\displaystyle\text{where }\qquad\varepsilon_{\mathrm{up}} :=maxi(2tiεY2i(εerrδerr2O(t))).\displaystyle\textstyle~{}:=~{}\max_{i}\left(\sqrt{2^{t-i}}\cdot\varepsilon_{Y^{\oplus 2^{i}}}\left(\frac{\varepsilon_{\mathrm{err}}\delta_{\mathrm{err}}}{2^{O(t)}}\right)\right).

Extensions of 𝗥𝗲𝗰𝘂𝗿𝘀𝗶𝘃𝗲𝗦𝗲𝗹𝗳𝗖𝗼𝗺𝗽𝗼𝘀𝗲𝗣𝗥𝗩\mathsf{RecursiveSelfComposePRV}.

As in Appendix B, it is possible to extend 𝖱𝖾𝖼𝗎𝗋𝗌𝗂𝗏𝖾𝖲𝖾𝗅𝖿𝖢𝗈𝗆𝗉𝗈𝗌𝖾𝖯𝖱𝖵\mathsf{RecursiveSelfComposePRV} to handle the case where the number of compositions kk is not a power of 22, with a similar asymptotic running time; it can then be further extended to handle heterogeneous compositions of kk different mechanisms.