
On Distributed Differential Privacy
and Counting Distinct Elements

   Lijie Chen
Massachusetts Institute of Technology
Cambridge, MA
Email: [email protected]. Most of this work was done at Google Research, Mountain View, CA.
   Badih Ghazi
Google Research
Mountain View, CA
Email: [email protected]
   Ravi Kumar
Google Research
Mountain View, CA
Email: [email protected]
   Pasin Manurangsi
Google Research
Mountain View, CA
Email: [email protected]
Abstract

We study the setup where each of $n$ users holds an element from a discrete set, and the goal is to count the number of distinct elements across all users, under the constraint of $(\varepsilon,\delta)$-differential privacy:

  • In the non-interactive local setting, we prove that the additive error of any protocol is $\Omega(n)$ for any constant $\varepsilon$ and for any $\delta$ inverse polynomial in $n$.

  • In the single-message shuffle setting, we prove a lower bound of $\tilde{\Omega}(n)$ on the error for any constant $\varepsilon$ and for some $\delta$ inverse quasi-polynomial in $n$. We do so by building on the moment-matching method from the literature on distribution estimation.

  • In the multi-message shuffle setting, we give a protocol with at most one message per user in expectation and with an error of $\tilde{O}(\sqrt{n})$ for any constant $\varepsilon$ and for any $\delta$ inverse polynomial in $n$. Our protocol is also robustly shuffle private, and our error of $\sqrt{n}$ matches a known lower bound for such protocols.

Our proof technique relies on a new notion, which we call dominated protocols, and which can also be used to obtain the first non-trivial lower bounds against multi-message shuffle protocols for the well-studied problems of selection and learning parity.

Our first lower bound for estimating the number of distinct elements provides the first $\omega(\sqrt{n})$ separation between global sensitivity and error in local differential privacy, thus answering an open question of Vadhan (2017). We also provide a simple construction that gives an $\tilde{\Omega}(n)$ separation between global sensitivity and error in two-party differential privacy, thereby answering an open question of McGregor et al. (2011).

1 Introduction

Differential privacy (DP) [DMNS06, DKM+06] has become a leading framework for private-data analysis, with several recent practical deployments [EPK14, Sha14, Gre16, App17, DKY17, Abo18]. The most commonly studied DP setting is the so-called central (aka curator) model whereby a single authority (sometimes referred to as the analyst) is trusted with running an algorithm on the raw data of the users and the privacy guarantee applies to the algorithm’s output.

The absence, in many scenarios, of a clear trusted authority has motivated the study of distributed DP models. The most well-studied such setting is the local model [KLN+11] (also [War65]), denoted henceforth by $\mathrm{DP}_{\mathrm{local}}$, where the privacy guarantee is enforced at each user's output (i.e., the protocol transcript). While an advantage of the local model is its very strong privacy guarantees and minimal trust assumptions, the noise that has to be added can sometimes be quite large. This has stimulated the study of "intermediate" models that seek to achieve accuracy close to the central model while relying on more distributed trust assumptions. One such middle ground is the so-called shuffle (aka anonymous) model [IKOS06, BEM+17, CSU+18, EFM+19], where the users send messages to a shuffler who randomly shuffles these messages before sending them to the analyzer; the privacy guarantee is enforced on the shuffled messages (i.e., the input to the analyzer). We study both the local and the shuffle models in this work.

1.1 Counting Distinct Elements

A basic function in data analytics is estimating the number of distinct elements in a domain of size $D$ held by a collection of $n$ users, which we denote by $\textsf{CountDistinct}_{n,D}$ (and simply by $\textsf{CountDistinct}_{n}$ if there is no restriction on the universe size). Besides its use in database management systems, it is a well-studied problem in sketching, streaming, and communication complexity (e.g., [KNW10, BCK+14] and the references therein). In central DP, it can be easily solved with constant error using the Laplace mechanism [DMNS06]; see also [MMNW11, DLB19, PS20, CDSKY20].
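Since the number of distinct elements has global sensitivity $1$ (changing one user's element changes the count by at most one), the central-model baseline is simply the true count plus Laplace noise of scale $1/\varepsilon$. A minimal sketch of this baseline (illustrative only; the function and variable names are ours):

import numpy as np

def central_dp_count_distinct(inputs, epsilon, rng=None):
    # Global sensitivity of the distinct count is 1, so Laplace(1/epsilon) noise
    # suffices for epsilon-DP in the central (curator) model [DMNS06].
    rng = rng or np.random.default_rng()
    true_count = len(set(inputs))
    return true_count + rng.laplace(scale=1.0 / epsilon)

# Example: estimate = central_dp_count_distinct([3, 7, 7, 1, 3], epsilon=1.0)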

We obtain new results on $(\varepsilon,\delta)$-DP protocols for CountDistinct in the local and shuffle settings. (For formal definitions, please refer to Section 3. We remark that, throughout this work, we consider the non-interactive local model where all users apply the same randomizer; see Definition 3.6. We briefly discuss in Section 1.4 possible extensions to interactive local models.)

1.1.1 Lower Bounds for Local DP Protocols

Our first result is a lower bound on the additive error of $\mathrm{DP}_{\mathrm{local}}$ protocols for counting distinct elements. (See Section 3 for the formal, standard definition of public-coin DP protocols. Note that private-coin protocols are a sub-class of public-coin protocols, so all of our lower bounds apply to private-coin protocols as well.)

Theorem 1.1.

For any $\varepsilon=O(1)$, no public-coin $(\varepsilon,o(1/n))$-$\mathrm{DP}_{\mathrm{local}}$ protocol can solve $\textsf{CountDistinct}_{n,n}$ with error $o(n)$. (Throughout this work, we say that a randomized algorithm solves a problem with error $e$ if, with probability 0.99, it incurs error at most $e$.)

The lower bound in Theorem 1.1 is asymptotically tight (the trivial algorithm that always outputs $0$ incurs error at most $n$). Furthermore, it answers a question of Vadhan [Vad17, Open Problem 9.6], who asked if there is a function with a gap of $\omega(\sqrt{n})$ between its (global) sensitivity and the smallest error achievable by any $\mathrm{DP}_{\mathrm{local}}$ protocol. (To the best of our knowledge, the largest previously known gap between global sensitivity and error was $O(\sqrt{n})$, which is achieved, e.g., by binary summation [CSS12].) As the global sensitivity of the number of distinct elements is $1$, Theorem 1.1 exhibits a (natural) function for which this gap is as large as $\Omega(n)$. While Theorem 1.1 applies to the constant-$\varepsilon$ regime, it turns out we can prove a lower bound for much less private protocols (i.e., with a much larger $\varepsilon$ value) at the cost of polylogarithmic factors in the error:

Theorem 1.2.

For some $\varepsilon=\ln(n)-O(\ln\ln n)$ and $D=\Theta(n/\mathrm{polylog}(n))$, no public-coin $(\varepsilon,n^{-\omega(1)})$-$\mathrm{DP}_{\mathrm{local}}$ protocol can solve $\textsf{CountDistinct}_{n,D}$ with error $o(D)$.

To prove Theorem 1.2, we build on the moment-matching method from the literature on (non-private) distribution estimation, namely [VV17, WY19], and tailor it to CountDistinct in the $\mathrm{DP}_{\mathrm{local}}$ setting (see Section 2.1 for more details on this connection). The bound on the privacy parameter $\varepsilon$ in Theorem 1.2 turns out to be very close to tight: the achievable error drops quadratically, to $O(\sqrt{n})$, once $\varepsilon$ exceeds $\ln n$. This is shown in the next theorem:

Theorem 1.3.

There is a $(\ln(n)+O(1))$-$\mathrm{DP}_{\mathrm{local}}$ protocol solving $\textsf{CountDistinct}_{n,n}$ with error $O(\sqrt{n})$.

1.1.2 Lower Bounds for Single-Message Shuffle DP Protocols

In light of the negative result in Theorem 1.2, a natural question is whether CountDistinct can be solved in a weaker distributed DP setting such as the shuffle model. It turns out that this is not possible using any shuffle protocol where each user sends no more than $1$ message (for brevity, we will henceforth denote this class by $\mathrm{DP}_{\mathrm{shuffle}}^{1}$, and more generally denote by $\mathrm{DP}_{\mathrm{shuffle}}^{k}$ the variant where each user can send up to $k$ messages). Note that the class $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ includes any method obtained by taking a $\mathrm{DP}_{\mathrm{local}}$ protocol and applying the so-called amplification-by-shuffling results of [EFM+19, BBGN19].

In the case where $\varepsilon$ is any constant and $\delta$ is inverse quasi-polynomial in $n$, the improvement in the error for $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ protocols compared to $\mathrm{DP}_{\mathrm{local}}$ is at most a polylogarithmic factor:

Theorem 1.4.

For all $\varepsilon=O(1)$, there are $\delta=2^{-\mathrm{polylog}(n)}$ and $D=n/\mathrm{polylog}(n)$ such that no public-coin $(\varepsilon,\delta)$-$\mathrm{DP}_{\mathrm{shuffle}}^{1}$ protocol can solve $\textsf{CountDistinct}_{n,D}$ with error $o(D)$.

We note that Theorem 1.4 essentially answers a more general variant of Vadhan's question: it shows that even for $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ protocols (which include $\mathrm{DP}_{\mathrm{local}}$ protocols as a sub-class), the gap between sensitivity and error can be as large as $\tilde{\Omega}(n)$.

The proof of Theorem 1.4 follows by combining Theorem 1.2 with the following connection between $\mathrm{DP}_{\mathrm{local}}$ and $\mathrm{DP}_{\mathrm{shuffle}}^{1}$:

Lemma 1.5.

For any $\varepsilon=O(1)$ and $\delta\leq\delta_{0}\leq 1/n$, if the randomizer $R$ is $(\varepsilon,\delta)$-$\mathrm{DP}_{\mathrm{shuffle}}^{1}$ on $n$ users, then $R$ is $\left(\ln n-\ln(\Theta_{\varepsilon}(\log\delta_{0}^{-1}/\log\delta^{-1})),\,\delta_{0}\right)$-$\mathrm{DP}_{\mathrm{local}}$.

We remark that Lemma 1.5 provides a stronger quantitative bound than the qualitatively similar connections in [CSU+18, GGK+19]; specifically, we obtain the term $\ln(\Theta_{\varepsilon}(\log\delta_{0}^{-1}/\log\delta^{-1}))$, which was not present in the aforementioned works. This turns out to be crucial for our purposes, as this term gives the $O(\ln\ln n)$ slack necessary to apply Theorem 1.2.

1.1.3 A Communication-Efficient Shuffle DP Protocol

In contrast with Theorem 1.4, Balcer et al. [BCJM20] recently gave a $\mathrm{DP}_{\mathrm{shuffle}}$ protocol for $\textsf{CountDistinct}_{n,D}$ with error $O(\sqrt{D})$. Their protocol sends $\Omega(D)$ messages per user. We instead show that an error of $\tilde{O}(\sqrt{D})$ can still be guaranteed with each user sending at most one message in expectation, each of length $O(\log D)$ bits.

Theorem 1.6.

For all $\varepsilon\leq O(1)$ and $\delta\leq 1/n$, there is a public-coin $(\varepsilon,\delta)$-$\mathrm{DP}_{\mathrm{shuffle}}$ protocol that solves $\textsf{CountDistinct}_{n}$ with error $\sqrt{\min(n,D)}\cdot\mathrm{poly}(\log(1/\delta)/\varepsilon)$, where the expected number of messages sent by each user is at most one.

In the special case where $D=o(n/\mathrm{poly}(\varepsilon^{-1}\log(\delta^{-1})))$, we moreover obtain a private-coin $\mathrm{DP}_{\mathrm{shuffle}}$ protocol achieving the same guarantees as in Theorem 1.6 (see Theorem 8.4 for a formal statement). Note that Theorem 1.6 is in sharp contrast with the lower bound shown in Theorem 1.4 for $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ protocols. Indeed, for $\delta$ inverse quasi-polynomial in $n$, the former gives a public-coin protocol with at most one message per user in expectation and error $\tilde{O}(\sqrt{n})$, whereas the latter shows that no such protocol exists, even with error as large as $\tilde{\Omega}(n)$, if we restrict each user to send one message in the worst case.

A strengthening of $\mathrm{DP}_{\mathrm{shuffle}}$ protocols called robust $\mathrm{DP}_{\mathrm{shuffle}}$ protocols (roughly speaking, $\mathrm{DP}_{\mathrm{shuffle}}$ protocols whose transcript remains private even if a constant fraction of users drop out of the protocol) was studied by [BCJM20], who proved an $\Omega\left(\sqrt{\min(D,n)}\right)$ lower bound on the error of any such protocol solving $\textsf{CountDistinct}_{n,D}$. Our protocols are robust $\mathrm{DP}_{\mathrm{shuffle}}$ and therefore achieve the optimal error (up to polylogarithmic factors) among all robust $\mathrm{DP}_{\mathrm{shuffle}}$ protocols, while only sending at most one message per user in expectation.

1.2 Dominated Protocols and Multi-Message Shuffle DP Protocols

The technique underlying the proof of Theorem 1.1 can be extended beyond $\mathrm{DP}_{\mathrm{local}}$ protocols for CountDistinct. It applies to a broader category of protocols that we call dominated, defined as follows:

Definition 1.7.

We say that a randomizer $R\colon\mathcal{X}\to\mathcal{M}$ is $(\varepsilon,\delta)$-dominated if there exists a distribution $\mathcal{D}$ on $\mathcal{M}$ such that for all $x\in\mathcal{X}$ and all $E\subseteq\mathcal{M}$,

\Pr[R(x)\in E]\leq e^{\varepsilon}\cdot\Pr_{\mathcal{D}}[E]+\delta.

In this case, we also say that $R$ is $(\varepsilon,\delta)$-dominated by $\mathcal{D}$. We define $(\varepsilon,\delta)$-dominated protocols in the same way as $(\varepsilon,\delta)$-$\mathrm{DP}_{\mathrm{local}}$ protocols, except that we require the randomizer to be $(\varepsilon,\delta)$-dominated instead of $(\varepsilon,\delta)$-DP.

Note that an $(\varepsilon,\delta)$-$\mathrm{DP}_{\mathrm{local}}$ randomizer $R$ is $(\varepsilon,\delta)$-dominated: we can fix any $y^{*}\in\mathcal{X}$ and take $\mathcal{D}=R(y^{*})$. Therefore, our new definition is a relaxation of $\mathrm{DP}_{\mathrm{local}}$.
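For a randomizer over a finite message set, domination can be checked directly: for each input $x$, the worst event $E$ consists of the messages where $\Pr[R(x)=m]$ exceeds $e^{\varepsilon}\cdot\mathcal{D}_{m}$, and summing the excess mass there gives the smallest valid $\delta$. The sketch below is our own illustration (the randomizer is represented as a row-stochastic matrix, an assumption made only for this example).

import numpy as np

def domination_delta(R, D, epsilon):
    # R: array of shape (num_inputs, num_messages) with R[x, m] = Pr[R(x) = m].
    # D: candidate dominating distribution over the message set.
    # For each input x, the worst-case event E is the set of messages where
    # R[x, m] > exp(epsilon) * D[m]; the total excess mass there is delta(x).
    excess = np.clip(R - np.exp(epsilon) * D[None, :], 0.0, None)
    return excess.sum(axis=1).max()

# As noted above, a DP_local randomizer is dominated by D = R(y*) for any fixed y*:
# delta = domination_delta(R, R[0], epsilon)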

We show that multi-message $\mathrm{DP}_{\mathrm{shuffle}}$ protocols are dominated, which allows us to prove the first non-trivial lower bounds against $\mathrm{DP}_{\mathrm{shuffle}}^{O(1)}$ protocols.

Before formally stating this connection, we recall why known lower bounds against $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ protocols [CSU+18, GGK+19, BC20] do not extend to $\mathrm{DP}_{\mathrm{shuffle}}^{O(1)}$ protocols. (We remark that [GGK+20] developed a technique for proving lower bounds on the communication complexity, i.e., the number of bits sent per user, of multi-message protocols. Their techniques do not apply to our setting, as our lower bounds are in terms of the number of messages and do not put any restriction on the message length. Furthermore, their technique only applies to pure-DP, where $\delta=0$, whereas ours also applies to approximate-DP, where $\delta>0$.) These prior works use the connection stating that any $(\varepsilon,\delta)$-$\mathrm{DP}_{\mathrm{shuffle}}^{1}$ protocol is also $(\varepsilon+\ln n,\delta)$-$\mathrm{DP}_{\mathrm{local}}$ [CSU+18, Theorem 6.2]. It thus suffices for them to prove lower bounds for $\mathrm{DP}_{\mathrm{local}}$ protocols with a low privacy requirement (i.e., $(\varepsilon+\ln n,\delta)$-$\mathrm{DP}_{\mathrm{local}}$), for which lower bound techniques are known or can be developed. For $\varepsilon$-$\mathrm{DP}_{\mathrm{shuffle}}^{1}$ protocols, [BC20] showed that they are also $\varepsilon$-$\mathrm{DP}_{\mathrm{local}}$; therefore, lower bounds on $\mathrm{DP}_{\mathrm{local}}$ protocols automatically translate to lower bounds on pure-$\mathrm{DP}_{\mathrm{shuffle}}^{1}$ protocols. To apply this proof framework to $\mathrm{DP}_{\mathrm{shuffle}}^{O(1)}$ protocols, a natural first step would be to connect $\mathrm{DP}_{\mathrm{shuffle}}^{O(1)}$ protocols to $\mathrm{DP}_{\mathrm{local}}$ protocols. However, as observed in [BC20, Section 4.1], there exists an $\varepsilon$-$\mathrm{DP}_{\mathrm{shuffle}}^{O(1)}$ protocol that is not $\mathrm{DP}_{\mathrm{local}}$ for any privacy parameter. That is, there is no analogous connection between $\mathrm{DP}_{\mathrm{local}}$ protocols and multi-message $\mathrm{DP}_{\mathrm{shuffle}}$ protocols, even when the latter send only $O(1)$ messages per user.

In contrast, the next lemma captures the connection between multi-message $\mathrm{DP}_{\mathrm{shuffle}}$ and dominated protocols.

Lemma 1.8.

If $R$ is $(\varepsilon,\delta)$-$\mathrm{DP}_{\mathrm{shuffle}}^{k}$ on $n$ users, then it is $(\varepsilon+k(1+\ln n),\delta)$-dominated.

By considering dominated protocols and using Lemma 1.8, we obtain the first lower bounds for multi-message $\mathrm{DP}_{\mathrm{shuffle}}$ protocols for two well-studied problems: Selection and ParityLearning.

1.2.1 Lower Bounds for Selection

The Selection problem on $n$ users is defined as follows. The $i$th user has an input $x_{i}\in\{0,1\}^{D}$, and the goal is to output an index $j\in[D]$ such that $\sum_{i=1}^{n}x_{i,j}\geq\left(\max_{j^{*}}\sum_{i=1}^{n}x_{i,j^{*}}\right)-n/10$. Selection is well-studied in DP (e.g., [DJW13, SU17, Ull18]), and its variants are useful primitives for several statistical and algorithmic problems including feature selection, hypothesis testing, and clustering. In central DP, the exponential mechanism of [MT07] yields an $\varepsilon$-DP algorithm for Selection when $n=O_{\varepsilon}(\log D)$. On the other hand, it is known that any $(\varepsilon,\delta)$-$\mathrm{DP}_{\mathrm{local}}$ protocol for Selection with $\varepsilon=O(1)$ and $\delta=O(1/n^{1.01})$ requires $n=\Omega(D\log D)$ users [Ull18]. Moreover, [CSU+18] obtained an $(\varepsilon,1/n^{O(1)})$-$\mathrm{DP}_{\mathrm{shuffle}}^{D}$ protocol for $n=\tilde{O}_{\varepsilon}(\sqrt{D})$. By contrast, for $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ protocols, a lower bound of $\Omega(D^{1/17})$ on the number of users was obtained in [CSU+18] and improved to $\Omega(D)$ in [GGK+19].
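To make the task concrete, the following small sketch (an illustration of the definition above, with names of our choosing) generates a Selection instance and checks whether a reported index satisfies the $n/10$ slack.

import numpy as np

def is_valid_selection(X, j):
    # X: 0/1 array of shape (n, D); index j is acceptable if its column sum is
    # within n/10 of the largest column sum.
    n = X.shape[0]
    column_sums = X.sum(axis=0)
    return column_sums[j] >= column_sums.max() - n / 10

# Example instance with n = 100 users and D = 16 coordinates:
# X = np.random.default_rng(0).integers(0, 2, size=(100, 16))
# print(is_valid_selection(X, 0))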

The next theorem gives a lower bound for Selection that holds against approximate-$\mathrm{DP}_{\mathrm{shuffle}}^{k}$ protocols. To the best of our knowledge, this is the first such lower bound even for $k=2$ (and even for the special case of pure protocols, where $\delta=0$).

Theorem 1.9.

For any $\varepsilon=O(1)$, any public-coin $(\varepsilon,o(1/D))$-$\mathrm{DP}_{\mathrm{shuffle}}^{k}$ protocol that solves Selection requires $n\geq\Omega\left(\frac{D}{k}\right)$.

We remark that, by combining the advanced composition theorem for DP with known $\mathrm{DP}_{\mathrm{shuffle}}$ aggregation algorithms, one can obtain an $(\varepsilon,1/\mathrm{poly}(n))$-$\mathrm{DP}_{\mathrm{shuffle}}^{k}$ protocol for Selection with $\tilde{O}(D/\sqrt{k})$ samples for any $k\leq D$ (see Appendix D for details).

1.2.2 Lower Bounds for Parity Learning

In ParityLearning, there is a hidden random vector $s\in\{0,1\}^{D}$, each user gets a random vector $x\in\{0,1\}^{D}$ together with the inner product $\langle s,x\rangle$ over $\mathbb{F}_{2}$, and the goal is to recover $s$. This problem is well-known for separating PAC learning from the Statistical Query (SQ) learning model [Kea98]. In DP, it was studied by [KLN+11], who gave a central DP protocol (also based on the exponential mechanism) computing it for $n=O(D)$, and moreover proved a lower bound of $n=2^{\Omega(D)}$ for any $\mathrm{DP}_{\mathrm{local}}$ protocol, thus obtaining the first exponential separation between the central and local settings.
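As an illustration of the task itself (not of any private protocol), the sketch below generates a ParityLearning instance and recovers $s$ by brute force, which is feasible only for small $D$; the names are ours.

import itertools
import numpy as np

def parity_learning_instance(n, dim, rng=None):
    # Hidden vector s; user i receives (x_i, <s, x_i> mod 2).
    rng = rng or np.random.default_rng(0)
    s = rng.integers(0, 2, size=dim)
    X = rng.integers(0, 2, size=(n, dim))
    y = (X @ s) % 2
    return s, X, y

def recover_parity_brute_force(X, y):
    # Non-private baseline: return a candidate s consistent with all examples.
    dim = X.shape[1]
    for cand in itertools.product([0, 1], repeat=dim):
        cand = np.array(cand)
        if np.array_equal((X @ cand) % 2, y):
            return cand
    return None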

We give a lower bound for ParityLearning that holds against approximate-$\mathrm{DP}_{\mathrm{shuffle}}^{k}$ protocols:

Theorem 1.10.

For any $\varepsilon=O(1)$, if $P$ is a public-coin $(\varepsilon,o(1/n))$-$\mathrm{DP}_{\mathrm{shuffle}}^{k}$ protocol that solves ParityLearning with probability at least $0.99$, then $n\geq\Omega(2^{D/(k+1)})$.

Our lower bounds for ParityLearning can be generalized to the Statistical Query (SQ) learning framework of [Kea98] (see Appendix C for more details).

Independent Work.

In a recent concurrent work, Cheu and Ullman [CU20] proved that robust $\mathrm{DP}_{\mathrm{shuffle}}$ protocols solving Selection and ParityLearning require $\Omega(\sqrt{D})$ and $\Omega(2^{\sqrt{D}})$ samples, respectively. Their results have no restriction on the number of messages sent by each user, but they only hold against the special case of robust protocols. Our results provide stronger lower bounds when the number of messages per user is less than $\sqrt{D}$, and apply to the most general $\mathrm{DP}_{\mathrm{shuffle}}$ model without the robustness restriction.

1.3 Lower Bounds for Two-Party DP Protocols

Finally, we consider another model of distributed DP, called the two-party model [MMP+10] and denoted $\mathrm{DP}_{\mathrm{two\text{-}party}}$. In this model, there are two parties, each holding part of the dataset. The DP guarantee is enforced on the view of each party (i.e., the transcript, its private randomness, and its input). See Section 9 for a formal treatment.

McGregor et al. [MMP+10] studied the $\mathrm{DP}_{\mathrm{two\text{-}party}}$ model and proved an interesting separation of $\Omega_{\varepsilon}(n)$ between the global sensitivity and the error of any $\varepsilon$-DP protocol in this model. However, this lower bound does not extend to the approximate-DP case (where $\delta>0$); in this case, the largest known gap (also proved in [MMP+10]) is only $\tilde{\Omega}_{\varepsilon}(\sqrt{n})$, and it was left as an open question whether this can be improved. (The conference version of [MMP+10] claimed a lower bound of $\Omega_{\varepsilon}(n)$ for the approximate-DP case as well; however, this was later found to be incorrect. See [MMP+11] for more discussion.) We answer this question by showing that a gap of $\tilde{\Omega}_{\varepsilon}(n)$ holds even against approximate-DP protocols:

Theorem 1.11.

For any $\varepsilon=O(1)$ and any sufficiently large $n\in\mathbb{N}$, there is a function $f\colon\{0,1\}^{2n}\to\mathbb{R}$ whose global sensitivity is one and such that no $(\varepsilon,o(1/n))$-$\mathrm{DP}_{\mathrm{two\text{-}party}}$ protocol can compute $f$ to within an error of $o(n/\log n)$.

The above bound is tight up to logarithmic factors in $n$, as it is trivial to achieve an error of $n$.

The proof of Theorem 1.11 is unlike the others in this paper; in fact, we only employ simple reductions starting from the hardness of the inner product function, already shown in [MMP+10]. Specifically, our function is a sum of blocks of inner products modulo 2. While this function is not symmetric, we show (Theorem 9.5) that it can be easily symmetrized.
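To illustrate the shape of such a function (this is our hedged reading of the description above; the precise block length used in Section 9 may differ, with $\Theta(\log n)$ being a natural guess given the $n/\log n$ error bound), split the two parties' inputs $x,y\in\{0,1\}^{n}$ into $k$ consecutive blocks $x^{(1)},\dotsc,x^{(k)}$ and $y^{(1)},\dotsc,y^{(k)}$ of equal length and set

f(x,y)=\sum_{i=1}^{k}\left(\langle x^{(i)},y^{(i)}\rangle\bmod 2\right),

so that flipping any single input bit changes at most one summand, giving global sensitivity one.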

1.4 Discussions and Open Questions

In this work, we study DP in distributed models, including the local and shuffle settings. By building on the moment-matching method and using the newly defined notion of dominated protocols, we give novel lower bounds in both models for three fundamental problems: CountDistinct, Selection, and ParityLearning. While our lower bounds are (nearly) tight for a wide range of parameters, there are still many interesting open questions, three of which we highlight below:

  • $\mathrm{DP}_{\mathrm{shuffle}}$ Lower Bounds for Protocols with an Unbounded Number of Messages. Our connection between $\mathrm{DP}_{\mathrm{shuffle}}$ and dominated protocols becomes weaker as $k\to\infty$ (Lemma 1.8). As a result, it cannot be used to establish lower bounds against $\mathrm{DP}_{\mathrm{shuffle}}$ protocols with a possibly unbounded number of messages. In fact, we are not aware of any separation between central DP and $\mathrm{DP}_{\mathrm{shuffle}}$ without a restriction on the number of messages and without the robustness restriction. This remains a fundamental open question. (In contrast, separations between central DP and $\mathrm{DP}_{\mathrm{local}}$ are well known, even for basic functions such as binary summation [CSS12].)

  • Lower Bounds against Interactive Local/Shuffle Models. Our lower bounds hold in the non-interactive local and shuffle DP models, where all users send their messages simultaneously in a single round. While it seems plausible that our lower bounds can be extended to the sequentially interactive local DP model [DJW13] (where each user speaks once, but not simultaneously), it is unclear how to extend them to the fully interactive local DP model.

    The situation for $\mathrm{DP}_{\mathrm{shuffle}}$, however, is more complicated. Specifically, we are not aware of a formal treatment of an interactive setting for the shuffle model, which would be the first step in providing either upper or lower bounds. We remark that certain definitions could lead to the model being as powerful as the central model (in terms of achievable accuracy, putting aside communication constraints); see, e.g., [IKOS06] on how to perform secure computations under a certain definition of the shuffle model.

  • $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ Lower Bounds for CountDistinct with Larger $\delta$. All but one of our lower bounds hold as long as $\delta=n^{-\omega(1)}$, which is a standard assumption in the DP literature. The only exception is Theorem 1.4, which requires $\delta=2^{-\Omega(\log^{c}n)}$ for some constant $c>0$. It would be interesting to know whether this can be relaxed to $\delta=n^{-\omega(1)}$.

1.5 Organization

We describe in Section 2 the techniques underlying our results. Some basic definitions and notation are given in Section 3. We prove our main lower bounds for CountDistinct (Theorems 1.2 and 1.4) in Section 4. In Section 5, we define dominated protocols and prove Lemma 1.8. Our lower bounds for Selection and ParityLearning are then proved in Section 6. Theorem 1.1 is proved in Section 7. Our $\mathrm{DP}_{\mathrm{shuffle}}$ protocol for CountDistinct is presented and analyzed in Section 8. Our lower bounds in the two-party model (in particular, Theorem 1.11) are proved in Section 9. Some deferred proofs appear in Appendices A and B. The connection to the SQ model is presented in Appendix C. Finally, in Appendix D, we describe the $\mathrm{DP}_{\mathrm{shuffle}}^{k}$ protocol for Selection with sample complexity $\tilde{O}(D/\sqrt{k})$.

2 Overview of Techniques

In this section, we describe the main intuition behind our lower bounds. As alluded to in Section 1, we give two different proofs of the lower bounds for CountDistinct in the $\mathrm{DP}_{\mathrm{local}}$ and $\mathrm{DP}_{\mathrm{shuffle}}$ settings, each with its own advantages:

  • Proof via Moment Matching. Our first proof is technically the hardest in our work. It applies to the much more challenging low-privacy setting (i.e., $(\ln n-O(\ln\ln n),\delta)$-$\mathrm{DP}_{\mathrm{local}}$), and shows an $\Omega(n/\mathrm{polylog}(n))$ lower bound on the additive error (Theorem 1.2). Together with our new improved connection between $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ and $\mathrm{DP}_{\mathrm{local}}$ (Lemma 1.5), it also implies the same lower bound for protocols in the $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ model. The key ideas behind the first proof are discussed in Section 2.1.

  • Proof via Dominated Protocols. Our second proof has the advantage of giving the optimal $\Omega(n)$ lower bound on the additive error (Theorem 1.1), but only in the constant-privacy regime (i.e., $(O(1),\delta)$-$\mathrm{DP}_{\mathrm{local}}$), and it is relatively simple compared to the first proof.

    Moreover, the second proof technique is very general and is a conceptual contribution: it can be applied to show lower bounds for other fundamental problems (namely, Selection and ParityLearning; Theorems 1.9 and 1.10) against multi-message $\mathrm{DP}_{\mathrm{shuffle}}$ protocols. We highlight the intuition behind the second proof in Section 2.2.

While our lower bounds also work for the public-coin $\mathrm{DP}_{\mathrm{shuffle}}$ models, throughout this section we focus on private-coin models in order to simplify the presentation. The full proofs extending to public-coin protocols are given in later sections.

2.1 Lower Bounds for CountDistinct via Moment Matching

To clearly illustrate the key ideas behind the first proof, we will focus on the pure-DP case where each user can only send $O(\log n)$ bits. In Section 4, we generalize the proof to approximate-DP and remove the restriction on communication complexity.

Theorem 2.1 (A Weaker Version of Theorem 1.2).

For $\varepsilon=\ln(n/\log^{7}n)$ and $D=n/\log^{5}n$, no $\varepsilon$-$\mathrm{DP}_{\mathrm{local}}$ protocol where each user sends $O(\log n)$ bits can solve $\textsf{CountDistinct}_{n,D}$ with error $o(D)$.

Throughout our discussion, we use $R\colon[D]\to\mathcal{M}$ to denote a $\ln(n/\log^{7}n)$-$\mathrm{DP}_{\mathrm{local}}$ randomizer. By the communication complexity condition of Theorem 2.1, we have that $|\mathcal{M}|\leq\mathrm{poly}(n)$.

Our proof is inspired by the lower bounds for estimating distinct elements in the property testing model, e.g., [VV17, WY19]. In particular, we use the so-called Poissonization trick. To discuss this trick, we start with some notation. For a vector $\vec{\lambda}\in\mathbb{R}^{D}$, we use $\vec{\mathsf{Poi}}(\vec{\lambda})$ to denote the joint distribution of $D$ independent Poisson random variables:

\vec{\mathsf{Poi}}(\vec{\lambda}):=(\mathsf{Poi}(\vec{\lambda}_{1}),\mathsf{Poi}(\vec{\lambda}_{2}),\dotsc,\mathsf{Poi}(\vec{\lambda}_{D})).

For a distribution $\vec{U}$ on $\mathbb{R}^{D}$, we define the corresponding mixture of multi-dimensional Poisson distributions as follows:

\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(\vec{U})]:=\operatornamewithlimits{\mathbb{E}}_{\vec{\lambda}\leftarrow\vec{U}}\vec{\mathsf{Poi}}(\vec{\lambda}).

For two random variables $X$ and $Y$ supported on $\mathbb{R}^{\mathcal{M}}$, we use $X+Y$ to denote the random variable distributed as the sum of two independent samples from $X$ and $Y$.

Shuffling the Outputs of the Local Protocol. Our first observation is that the analyzer of any local protocol computing CountDistinct achieves the same accuracy if it only sees the histogram of the randomizers' outputs. This holds because seeing only the histogram of the outputs is equivalent to shuffling the outputs by a uniformly random permutation, which is in turn equivalent to shuffling the users in the dataset uniformly at random. Since shuffling the users in a dataset does not affect the number of distinct elements, it follows that seeing only the histogram does not affect the accuracy. Therefore, we only have to consider the histogram of the outputs of the local protocol computing CountDistinct. For a dataset $W$, we use $\mathsf{Hist}_{R}(W)$ to denote the distribution of the histogram of outputs with randomizer $R$.

Poissonization Trick. Given a distribution $\mathcal{D}$ on $\mathcal{M}$, suppose we draw a sample $m\leftarrow\mathsf{Poi}(\lambda)$ and then draw $m$ samples from $\mathcal{D}$. If we let $N$ denote the random variable corresponding to the histogram of these $m$ samples, then the coordinates of $N$ are mutually independent, and $N$ is distributed as $\vec{\mathsf{Poi}}(\lambda\vec{\mu})$, where $\vec{\mu}_{i}=\mathcal{D}_{i}$ for each $i\in\mathcal{M}$.

We can now apply the above trick in the context of local protocols (recall that, by our first observation, we can focus on the histogram of the outputs). Suppose we build a dataset by drawing a sample $m\leftarrow\mathsf{Poi}(\lambda)$ and then adding $m$ users with input $z$. By the above discussion, the corresponding histogram of the outputs with randomizer $R$ is distributed as $\vec{\mathsf{Poi}}(\lambda\cdot R(z))$, where $R(z)$ is treated as an $|\mathcal{M}|$-dimensional vector corresponding to its probability distribution.
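A small simulation of this fact (with a toy randomizer of our choosing, purely for illustration): draw $m\leftarrow\mathsf{Poi}(\lambda)$, give all $m$ users the same input $z$, and the resulting output histogram has independent Poisson coordinates with means $\lambda\cdot R(z)$.

import numpy as np

rng = np.random.default_rng(0)

def poissonized_histogram(Rz, lam):
    # Rz: the distribution R(z) over the message set, as a probability vector.
    m = rng.poisson(lam)                              # Poissonized number of users
    outputs = rng.choice(len(Rz), size=m, p=Rz)       # each user reports a sample of R(z)
    return np.bincount(outputs, minlength=len(Rz))

# With Rz = [0.5, 0.3, 0.2] and lam = 40, the three histogram coordinates are
# independent Poisson variables with means 20, 12, and 8:
# samples = np.array([poissonized_histogram([0.5, 0.3, 0.2], 40) for _ in range(10000)])
# print(samples.mean(axis=0), samples.var(axis=0))    # both close to [20, 12, 8]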

Moment-Matching Random Variables. Our next ingredient is the following construction of two moment-matching random variables, used in [WY19]. Let $L\in\mathbb{N}$ and $\Lambda=\Theta(L^{2})$. There are two random variables $U$ and $V$ supported on $\{0\}\cup[1,\Lambda]$ such that $\operatorname{\mathbb{E}}[U]=\operatorname{\mathbb{E}}[V]=1$ and $\operatorname{\mathbb{E}}[U^{j}]=\operatorname{\mathbb{E}}[V^{j}]$ for every $j\in[L]$. Moreover, $U_{0}-V_{0}>0.9$, where $U_{0}=\Pr[U=0]$ and $V_{0}=\Pr[V=0]$. That is, $U$ and $V$ have the same moments up to degree $L$, while their probabilities of being zero differ significantly. We will set $L=\log n$ and hence $\Lambda=\Theta(\log^{2}n)$.
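The explicit construction of $U$ and $V$ is from [WY19]; as a sanity check (and not their construction), one can search numerically for such a pair on a finite grid by linear programming, maximizing the gap in the probability of being zero subject to the moment-matching constraints. A toy sketch, with scipy assumed as an extra dependency only for this illustration:

import numpy as np
from scipy.optimize import linprog

def moment_matching_gap(L, Lam, grid_size=200):
    # Grid {0} union [1, Lam]; the variables are the pmfs of U and V on the grid.
    G = np.concatenate(([0.0], np.linspace(1.0, Lam, grid_size)))
    m = len(G)
    rows, rhs = [], []
    # Both pmfs sum to 1 and have mean 1; moments of degree 2..L must agree.
    rows.append(np.concatenate((np.ones(m), np.zeros(m)))); rhs.append(1.0)
    rows.append(np.concatenate((np.zeros(m), np.ones(m)))); rhs.append(1.0)
    rows.append(np.concatenate((G, np.zeros(m)))); rhs.append(1.0)
    rows.append(np.concatenate((np.zeros(m), G))); rhs.append(1.0)
    for j in range(2, L + 1):
        rows.append(np.concatenate((G ** j, -(G ** j)))); rhs.append(0.0)
    # Maximize Pr[U = 0] - Pr[V = 0].
    c = np.zeros(2 * m); c[0] = -1.0; c[m] = 1.0
    res = linprog(c, A_eq=np.array(rows), b_eq=np.array(rhs), bounds=[(0, 1)] * (2 * m))
    return -res.fun

# print(moment_matching_gap(L=4, Lam=32))  # gap in the probability of being zero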

Construction of the Hard Distribution via Signal/Noise Decomposition. Recalling that $D=n/\log^{5}n$, we will construct two input distributions for $\textsf{CountDistinct}_{n,D}$. (In fact, in our presentation the number of inputs in each dataset drawn from our hard distributions will not be exactly $n$, but only concentrated around $n$. This issue can be easily resolved by throwing "extra" users in the dataset; we refer the reader to Section 4.2 for the details.) A sample from either distribution consists of two parts: a signal part with $D$ users in expectation, and a noise part with $n-D$ users in expectation.

Formally, for a distribution $W$ over $\mathbb{R}^{\geq 0}$ and a subset $E\subseteq[D]$, the dataset distributions $\mathcal{D}_{\sf signal}^{W}$ and $\mathcal{D}_{\sf noise}^{E}$ are constructed as follows:

  • $\mathcal{D}_{\sf signal}^{W}$: for each $i\in[D]$, we independently draw $\lambda_{i}\leftarrow W$ and $n_{i}\leftarrow\mathsf{Poi}(\lambda_{i})$, and add $n_{i}$ users with input $i$.

  • $\mathcal{D}_{\sf noise}^{E}$: for each $i\in E$, we independently draw $n_{i}\leftarrow\mathsf{Poi}((n-D)/|E|)$, and add $n_{i}$ users with input $i$.

We are going to fix a "good" subset $E$ of $[D]$ such that $|E|\leq 0.02\cdot D$ (we will later specify the other conditions for being "good"). Therefore, when it is clear from the context, we will write $\mathcal{D}_{\sf noise}$ instead of $\mathcal{D}_{\sf noise}^{E}$.

Our two hard distributions are then constructed as $\mathcal{D}^{U}:=\mathcal{D}_{\sf signal}^{U}+\mathcal{D}_{\sf noise}$ and $\mathcal{D}^{V}:=\mathcal{D}_{\sf signal}^{V}+\mathcal{D}_{\sf noise}$. Using the fact that $\operatorname{\mathbb{E}}[U]=\operatorname{\mathbb{E}}[V]=1$, one can verify that there are $D$ users in each of $\mathcal{D}_{\sf signal}^{U}$ and $\mathcal{D}_{\sf signal}^{V}$ in expectation. Similarly, one can verify that there are $n-D$ users in $\mathcal{D}_{\sf noise}$ in expectation. Hence, both $\mathcal{D}^{U}$ and $\mathcal{D}^{V}$ have $n$ users in expectation. In fact, the number of users drawn from either distribution concentrates around $n$.
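A sampling sketch of these distributions (assuming samplers for $U$ and $V$ are given; the names below are ours):

import numpy as np

rng = np.random.default_rng(0)

def sample_dataset(sample_W, D, E, n):
    users = []
    # Signal part: for each i in [D], draw lambda_i ~ W and Poi(lambda_i) users with input i.
    for i in range(D):
        users.extend([i] * rng.poisson(sample_W()))
    # Noise part: for each i in E, draw Poi((n - D) / |E|) users with input i.
    for i in E:
        users.extend([i] * rng.poisson((n - D) / len(E)))
    return users

# D^U and D^V differ only in the signal sampler:
# dataset_U = sample_dataset(sample_U, D, E, n)
# dataset_V = sample_dataset(sample_V, D, E, n)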

We now justify our naming of the signal/noise distributions. First, note that the number of distinct elements in the signal parts $\mathcal{D}_{\sf signal}^{U}$ and $\mathcal{D}_{\sf signal}^{V}$ concentrates around $(1-\operatorname{\mathbb{E}}[e^{-U}])\cdot D$ and $(1-\operatorname{\mathbb{E}}[e^{-V}])\cdot D$, respectively. By our condition that $U_{0}-V_{0}>0.9$, it follows that the signal parts of $\mathcal{D}^{U}$ and $\mathcal{D}^{V}$ separate their numbers of distinct elements by at least $0.4D$. Second, note that although $\mathcal{D}_{\sf noise}$ has $n-D\gg D$ users in expectation, they all come from the subset $E$ of size at most $0.02\cdot D$. Therefore, these users collectively cannot change the number of distinct elements by more than $0.02\cdot D$, and the numbers of distinct elements in $\mathcal{D}^{U}$ and $\mathcal{D}^{V}$ remain separated by $\Omega(D)$.

Decomposition of the Noise Part. To establish the desired lower bound, it now suffices to show that, for the local randomizer $R$, the distributions $\mathsf{Hist}_{R}(\mathcal{D}^{U})$ and $\mathsf{Hist}_{R}(\mathcal{D}^{V})$ are very close in statistical distance. For $W\in\{U,V\}$, we can decompose $\mathsf{Hist}_{R}(\mathcal{D}^{W})$ as

\mathsf{Hist}_{R}(\mathcal{D}^{W})=\sum_{i\in[D]}\vec{\mathsf{Poi}}(W\cdot R(i))+\sum_{i\in E}\vec{\mathsf{Poi}}((n-D)/|E|\cdot R(i)).

By the additive property of Poisson distributions, letting $\vec{\nu}=(n-D)/|E|\cdot\sum_{i\in E}R(i)$, we have that $\sum_{i\in E}\vec{\mathsf{Poi}}((n-D)/|E|\cdot R(i))=\vec{\mathsf{Poi}}(\vec{\nu})$.

Our key idea is to carefully decompose $\vec{\nu}$ into $D+1$ nonnegative vectors $\vec{\nu}^{(0)},\vec{\nu}^{(1)},\dotsc,\vec{\nu}^{(D)}$ such that $\vec{\nu}=\sum_{i=0}^{D}\vec{\nu}^{(i)}$. Then, for $W\in\{U,V\}$, we have

\mathsf{Hist}_{R}(\mathcal{D}^{W})=\vec{\mathsf{Poi}}(\vec{\nu}^{(0)})+\sum_{i\in[D]}\vec{\mathsf{Poi}}(W\cdot R(i)+\vec{\nu}^{(i)}).

To show that $\mathsf{Hist}_{R}(\mathcal{D}^{U})$ and $\mathsf{Hist}_{R}(\mathcal{D}^{V})$ are close, it suffices to show that for each $i\in[D]$, the distributions $\vec{\mathsf{Poi}}(U\cdot R(i)+\vec{\nu}^{(i)})$ and $\vec{\mathsf{Poi}}(V\cdot R(i)+\vec{\nu}^{(i)})$ are close. We show that they are close when every coordinate of $\vec{\nu}^{(i)}$ is sufficiently large compared to $R(i)$.

Lemma 2.2 (Simplification of Lemma 4.3).

For each $i\in[D]$ and every $\vec{\lambda}\in(\mathbb{R}^{\geq 0})^{\mathcal{M}}$, if $\vec{\lambda}_{z}\geq 2\Lambda^{2}\cdot R(i)_{z}$ for every $z\in\mathcal{M}$, then (we use $\|\mathcal{D}_{1}-\mathcal{D}_{2}\|_{TV}$ to denote the total variation, aka statistical, distance between two distributions $\mathcal{D}_{1},\mathcal{D}_{2}$)

\|\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(U\cdot R(i)+\vec{\lambda})]-\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(V\cdot R(i)+\vec{\lambda})]\|_{TV}\leq\frac{1}{n^{2}}.

To apply Lemma 2.2, we simply set $\vec{\nu}^{(i)}=(2\Lambda^{2})\cdot R(i)$ and $\vec{\nu}^{(0)}=\vec{\nu}-\sum_{i\in[D]}\vec{\nu}^{(i)}$. Letting $\vec{\mu}=\sum_{i\in[D]}R(i)$, the requirement that $\vec{\nu}^{(0)}$ be nonnegative translates to $\vec{\nu}_{z}\geq 2\Lambda^{2}\cdot\vec{\mu}_{z}$ for each $z\in\mathcal{M}$.

Construction of a Good Subset $E$. We thus want to pick a subset $E\subseteq[D]$ of size at most $0.02\cdot D$ such that the corresponding $\vec{\nu}^{E}=(n-D)/|E|\cdot\sum_{i\in E}R(i)$ satisfies $\vec{\nu}^{E}_{z}\geq 2\Lambda^{2}\cdot\vec{\mu}_{z}$ for each $z\in\mathcal{M}$. We will show that a simple random construction works with high probability: one can simply add each element of $[D]$ to $E$ independently with probability $0.01$ (a sketch of this sampling procedure appears after the case analysis below).

More specifically, for each $z\in\mathcal{M}$, we will show that with high probability $\vec{\nu}^{E}_{z}\geq 2\Lambda^{2}\cdot\vec{\mu}_{z}$. The correctness of our construction then follows from a union bound (this step crucially uses the fact that $|\mathcal{M}|\leq\mathrm{poly}(n)$).

Now, let us fix a $z\in\mathcal{M}$. Let $m^{*}=\max_{i\in[D]}R(i)_{z}$. Since $R$ is $\ln(n/\log^{7}n)$-DP, it follows that $\vec{\nu}_{z}\geq\frac{n-D}{n/\log^{7}n}\cdot m^{*}\geq\frac{\log^{7}n}{2}\cdot m^{*}$. We consider the following two cases:

  1. If $m^{*}\geq\vec{\mu}_{z}/\log^{2}n$, we immediately get that $\vec{\nu}_{z}\geq\log^{5}n/2\cdot\vec{\mu}_{z}\geq 2\Lambda^{2}\cdot\vec{\mu}_{z}$ (which uses the fact that $\Lambda=\Theta(\log^{2}n)$).

  2. If $m^{*}<\vec{\mu}_{z}/\log^{2}n$, then the mass $\vec{\mu}_{z}$ is distributed over at least $\log^{2}n$ components $R(i)_{z}$. Applying Hoeffding's inequality shows that with high probability over $E$, it is the case that $\vec{\nu}^{E}_{z}\geq\Theta(n/D)\cdot\vec{\mu}_{z}\geq\Lambda^{2}\cdot\vec{\mu}_{z}$ (which uses the fact that $D=n/\log^{5}n$).

See the proof of Lemma 4.5 for a formal argument and for how to remove the assumption that $|\mathcal{M}|\leq\mathrm{poly}(n)$.
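The sampling procedure referenced above, as a sketch (with the randomizer again represented as a matrix of output probabilities, an assumption made only for this illustration): repeatedly draw a random subset and keep it once the coordinate-wise condition holds.

import numpy as np

def sample_good_subset(R, n, Lam, rng=None, trials=100):
    # R: array of shape (D, num_messages); row i is the output distribution R(i).
    rng = rng or np.random.default_rng(0)
    D = R.shape[0]
    mu = R.sum(axis=0)                                # mu_z = sum_i R(i)_z
    for _ in range(trials):
        E = np.flatnonzero(rng.random(D) < 0.01)      # include each i w.p. 0.01
        if len(E) == 0 or len(E) > 0.02 * D:
            continue
        nu = (n - D) / len(E) * R[E].sum(axis=0)
        if np.all(nu >= 2 * Lam ** 2 * mu):           # the "good subset" condition
            return E
    return None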

The Lower Bound. From the above discussions, we get that

\|\mathsf{Hist}_{R}(\mathcal{D}^{U})-\mathsf{Hist}_{R}(\mathcal{D}^{V})\|_{TV}\leq\sum_{i=1}^{D}\|\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(U\cdot R(i)+\vec{\nu}^{(i)})]-\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(V\cdot R(i)+\vec{\nu}^{(i)})]\|_{TV}\leq 1/n.

Hence, the analyzer of the local protocol with randomizer $R$ cannot distinguish $\mathcal{D}^{U}$ from $\mathcal{D}^{V}$, and thus it cannot solve $\textsf{CountDistinct}_{n,D}$ with error $o(D)$ and probability $0.99$. See the proof of Theorem 4.1 for a formal argument and for how to deal with the fact that a dataset drawn from $\mathcal{D}^{U}$ or $\mathcal{D}^{V}$ may not have exactly $n$ users.

Single-Message $\mathrm{DP}_{\mathrm{shuffle}}$ Lower Bound. To apply the above lower bound to $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ protocols, the natural idea is to resort to the connection between the $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ and $\mathrm{DP}_{\mathrm{local}}$ models. In particular, [CSU+18] showed that $(\varepsilon,\delta)$-$\mathrm{DP}_{\mathrm{shuffle}}^{1}$ protocols are also $(\varepsilon+\ln n,\delta)$-$\mathrm{DP}_{\mathrm{local}}$.

It may seem that this $\ln n$ privacy guarantee is very close to the $\ln n-O(\ln\ln n)$ bound in Theorem 1.2. But surprisingly, it turns out (as stated in Theorem 1.3) that there is a $(\ln n+O(1))$-$\mathrm{DP}_{\mathrm{local}}$ protocol solving $\textsf{CountDistinct}_{n,n}$ (hence also $\textsf{CountDistinct}_{n,D}$) with error $O(\sqrt{n})$. Hence, to establish the $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ lower bound (Theorem 1.4), we rely on the following stronger connection between $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ and $\mathrm{DP}_{\mathrm{local}}$ protocols.

Lemma 2.3 (Simplification of Lemma 1.5).

For every $\delta\leq 1/n^{\omega(1)}$, if the randomizer $R$ is $(O(1),\delta)$-$\mathrm{DP}_{\mathrm{shuffle}}^{1}$ on $n$ users, then $R$ is $\left(\ln(n\log^{2}n/\log\delta^{-1}),\,n^{-\omega(1)}\right)$-$\mathrm{DP}_{\mathrm{local}}$.

Setting $\delta=2^{-\log^{k}n}$ for a sufficiently large $k$ and combining Lemma 2.3 with Theorem 1.2 gives the desired lower bound against $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ protocols.

2.2 Lower Bounds for CountDistinct and Selection via Dominated Protocols

We will first describe the proof ideas behind Theorem 1.1, which is restated below. Then, we apply the same proof technique to obtain lower bounds for Selection (the lower bound for ParityLearning is established similarly; see Section 6.3 for details).

Lemma 2.4 (Detailed Version of Theorem 1.1).

For $\varepsilon=o(\ln n)$, no $(\varepsilon,o(1/n))$-dominated protocol can solve CountDistinct with error $o(n/e^{\varepsilon})$.

Hard Distributions for $\textsf{CountDistinct}_{n,n}$. We now construct our hard instances for $\textsf{CountDistinct}_{n,n}$. For simplicity, we assume $n=2^{D}$ for an integer $D$, and identify the input space $[n]$ with $\{0,1\}^{D}$ via a fixed bijection. Let $\mathcal{U}_{D}$ be the uniform distribution over $\{0,1\}^{D}$. For $(\ell,s)\in[2]\times\{0,1\}^{D}$, we let $\mathcal{D}_{\ell,s}$ be the uniform distribution on $\{x\in\{0,1\}^{D}:\langle x,s\rangle=\ell\}$.

We also use $\mathcal{D}_{\ell,s}^{\alpha}$ to denote the mixture of $\mathcal{D}_{\ell,s}$ and $\mathcal{U}_{D}$ that outputs a sample from $\mathcal{D}_{\ell,s}$ with probability $\alpha$ and a sample from $\mathcal{U}_{D}$ with probability $1-\alpha$.

For a parameter $\alpha>0$, we consider the following two dataset distributions on $n$ users:

  • $\mathcal{W}^{\sf uniform}$: each user gets an i.i.d. input from $\mathcal{U}_{D}$. That is, $\mathcal{W}^{\sf uniform}:=\mathcal{U}_{D}^{\otimes n}$.

  • $\mathcal{W}^{\alpha}$: to sample a dataset from $\mathcal{W}^{\alpha}$, we first draw $(\ell,s)$ from $[2]\times\{0,1\}^{D}$ uniformly at random, and then each user gets an i.i.d. input from $\mathcal{D}_{\ell,s}^{\alpha}$. Formally, $\mathcal{W}^{\alpha}:=\operatornamewithlimits{\mathbb{E}}_{(\ell,s)\leftarrow[2]\times\{0,1\}^{D}}(\mathcal{D}_{\ell,s}^{\alpha})^{\otimes n}$.

Since for every $\ell,s$ it holds that $|\mathrm{supp}(\mathcal{D}_{\ell,s}^{1})|\leq n/2$, the number of distinct elements in any dataset from $\mathcal{W}^{1}$ is at most $n/2$. On the other hand, since $\mathcal{U}_{D}$ is the uniform distribution over $n$ elements, a random dataset from $\mathcal{W}^{\sf uniform}=\mathcal{W}^{0}$ has roughly $(1-e^{-1})\cdot n>n/2$ distinct elements with high probability. Hence, the expected number of distinct elements in a dataset from $\mathcal{W}^{\alpha}$ is controlled by the parameter $\alpha$. A simple but tedious calculation shows that it is approximately $(1-e^{-1}\cdot\cosh(\alpha))\cdot n$, which can be approximated by $(1-e^{-1}\cdot(1+\alpha^{2}))\cdot n$ for $n^{-0.1}<\alpha<0.01$ (see Proposition 7.1 for more details). Hence, any protocol solving CountDistinct with error $o(\alpha^{2}n)$ must be able to distinguish between the above two distributions. Our goal is to show that this is impossible for $(\varepsilon,o(1/n))$-dominated protocols.
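A sampling sketch of the two distributions (our own illustration; we skip $s=0$ so that $\mathcal{D}_{1,s}$ is well defined, and we fix the parity of a sample by flipping a single coordinate in the support of $s$, which maps the uniform distribution bijectively onto the target set):

import numpy as np

rng = np.random.default_rng(0)

def sample_W_alpha(n, dim, alpha):
    # alpha = 0 recovers W^uniform; otherwise draw (l, s) and mix in D_{l,s}.
    l = rng.integers(0, 2)
    s = rng.integers(0, 2, size=dim)
    while not s.any():                       # skip s = 0 so D_{1,s} is well defined
        s = rng.integers(0, 2, size=dim)
    j = int(np.flatnonzero(s)[0])            # a coordinate with s_j = 1
    X = rng.integers(0, 2, size=(n, dim))
    for i in range(n):
        # With probability alpha, the user's sample should come from D_{l,s}:
        # flip bit j whenever the parity <x, s> is wrong.
        if rng.random() < alpha and (X[i] @ s) % 2 != l:
            X[i, j] ^= 1
    return X

# With n = 2^dim, the expected number of distinct rows is roughly
# (1 - e^{-1} * cosh(alpha)) * n, so distinguishing alpha = 0 from alpha > 0
# requires solving CountDistinct with error o(alpha^2 * n).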

Bounding KL Divergence for Dominated Protocols. Our next step is to upper-bound the statistical distance $\|\mathsf{Hist}_{R}(\mathcal{W}^{\sf uniform})-\mathsf{Hist}_{R}(\mathcal{W}^{\alpha})\|_{TV}$. As in previous work [Ull18, GGK+19, ENU20], we may upper-bound the KL divergence instead. By the convexity and chain-rule properties of KL divergence, it follows that

\mathrm{KL}(\mathsf{Hist}_{R}(\mathcal{W}^{\alpha})\,||\,\mathsf{Hist}_{R}(\mathcal{W}^{\sf uniform}))\leq\operatornamewithlimits{\mathbb{E}}_{(\ell,s)\leftarrow[2]\times\{0,1\}^{D}}\mathrm{KL}(R(\mathcal{D}_{\ell,s}^{\alpha})^{\otimes n}\,||\,R(\mathcal{U}_{D})^{\otimes n})
=n\cdot\operatornamewithlimits{\mathbb{E}}_{(\ell,s)\leftarrow[2]\times\{0,1\}^{D}}\mathrm{KL}(R(\mathcal{D}_{\ell,s}^{\alpha})\,||\,R(\mathcal{U}_{D})).\qquad(1)

Bounding the Average KL Divergence between a Family and a Single Distribution. We are now ready to introduce our general tool for bounding average KL divergence quantities like (1). We first set up some notation. Let $\mathcal{I}$ be an index set, let $\{\lambda_{v}\}_{v\in\mathcal{I}}$ be a family of distributions on $\mathcal{X}$, let $\pi$ be a distribution on $\mathcal{I}$, and let $\mu$ be a distribution on $\mathcal{X}$. For simplicity, we assume that for every $x\in\mathcal{X}$ and $v\in\mathcal{I}$, it holds that $(\lambda_{v})_{x}\leq 2\cdot\mu_{x}$ (which is true for $\{\mathcal{D}_{\ell,s}^{\alpha}\}_{(\ell,s)\in[2]\times\{0,1\}^{D}}$ and $\mathcal{U}_{D}$).

Theorem 2.5.

Let $W\colon\mathbb{R}\to\mathbb{R}$ be a concave function such that for all functions $\psi\colon\mathcal{X}\to\mathbb{R}^{\geq 0}$ satisfying $\psi(\mu)\leq 1$, it holds that

\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\left[(\psi(\lambda_{v})-\psi(\mu))^{2}\right]\leq W(\|\psi\|_{\infty}).

Then for any $(\varepsilon,\delta)$-dominated randomizer $R$, it follows that

\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}[\mathrm{KL}(R(\lambda_{v})\,||\,R(\mu))]\leq O\left(W(2e^{\varepsilon})+\delta\right).

Bounding (1) via Fourier Analysis. To apply Theorem 2.5, for $f\colon\mathcal{X}\to\mathbb{R}^{\geq 0}$ with $f(\mathcal{U}_{D})=\operatornamewithlimits{\mathbb{E}}_{x\in\{0,1\}^{D}}[f(x)]\leq 1$, we want to bound

\operatornamewithlimits{\mathbb{E}}_{(\ell,s)\leftarrow[2]\times\{0,1\}^{D}}[(f(\mathcal{D}_{\ell,s}^{\alpha})-f(\mathcal{U}_{D}))^{2}]=\operatornamewithlimits{\mathbb{E}}_{s\in\{0,1\}^{D}}\alpha^{2}\cdot\hat{f}(s)^{2}.
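For completeness, here is the short computation behind this identity (writing $\hat{f}(s)=\operatornamewithlimits{\mathbb{E}}_{x}[f(x)\cdot(-1)^{\langle x,s\rangle}]$ and identifying $\ell\in[2]$ with $\{0,1\}$; the single $s=0$ term contributes at most $\alpha^{2}/2^{D}$ and is absorbed into the bound below). Since $f(\mathcal{D}_{\ell,s}^{\alpha})=\alpha\cdot f(\mathcal{D}_{\ell,s})+(1-\alpha)\cdot f(\mathcal{U}_{D})$, and conditioning on $\langle x,s\rangle=\ell$ shifts the mean of $f$ by $(-1)^{\ell}\hat{f}(s)$ for $s\neq 0$, we have

f(\mathcal{D}_{\ell,s}^{\alpha})-f(\mathcal{U}_{D})=\alpha\left(f(\mathcal{D}_{\ell,s})-f(\mathcal{U}_{D})\right)=\alpha\cdot(-1)^{\ell}\,\hat{f}(s),

and squaring and averaging over $(\ell,s)$ yields the right-hand side.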

By Parseval's Identity (see Lemma 3.8),

\sum_{s\in\{0,1\}^{D}}\hat{f}(s)^{2}=\operatornamewithlimits{\mathbb{E}}_{x\in\{0,1\}^{D}}f(x)^{2}\leq f(\mathcal{U}_{D})\cdot\|f\|_{\infty}\leq\|f\|_{\infty}.

Therefore, we can set $W(L):=\alpha^{2}\cdot\frac{L}{2^{D}}$ and apply Theorem 2.5 to obtain

\operatornamewithlimits{\mathbb{E}}_{(\ell,s)\leftarrow[2]\times\{0,1\}^{D}}\mathrm{KL}(R(\mathcal{D}_{\ell,s}^{\alpha})\,||\,R(\mathcal{U}_{D}))\leq O(\alpha^{2}\cdot e^{\varepsilon}/n+\delta).

We set $\alpha$ such that $\alpha^{2}=c/e^{\varepsilon}$ for a sufficiently small constant $c$, and note that $\delta=o(1/n)$. It follows that $\mathrm{KL}(\mathsf{Hist}_{R}(\mathcal{W}^{\alpha})\,||\,\mathsf{Hist}_{R}(\mathcal{W}^{\sf uniform}))\leq 0.01$, and therefore $\|\mathsf{Hist}_{R}(\mathcal{W}^{\alpha})-\mathsf{Hist}_{R}(\mathcal{W}^{\sf uniform})\|_{TV}\leq 0.1$ by Pinsker's inequality. Hence, we conclude that $(\varepsilon,o(1/n))$-dominated protocols cannot solve $\textsf{CountDistinct}_{n,n}$ with error $o(n/e^{\varepsilon})$, completing the proof of Lemma 2.4. Theorem 1.1 now follows from Lemma 2.4 and the fact that $(\varepsilon,\delta)$-$\mathrm{DP}_{\mathrm{local}}$ protocols are also $(\varepsilon,\delta)$-dominated.

Lower Bounds for Selection against Multi-Message $\mathrm{DP}_{\mathrm{shuffle}}$ Protocols. We now show how to apply Theorem 2.5 and Lemma 2.3 to prove lower bounds for Selection. For $(\ell,j)\in[2]\times[D]$, let $\mathcal{D}_{\ell,j}$ be the uniform distribution on all length-$D$ binary strings whose $j$th bit equals $\ell$. Recall that $\mathcal{U}_{D}$ is the uniform distribution on $\{0,1\}^{D}$. Again, we aim to upper-bound the average-case KL divergence $\operatornamewithlimits{\mathbb{E}}_{(\ell,j)\leftarrow[2]\times[D]}\mathrm{KL}(R(\mathcal{D}_{\ell,j})\,||\,R(\mathcal{U}_{D}))$.

To apply Theorem 2.5, for $f\colon\mathcal{X}\to\mathbb{R}^{\geq 0}$ with $f(\mathcal{U}_{D})=\operatornamewithlimits{\mathbb{E}}_{x\in\{0,1\}^{D}}[f(x)]\leq 1$, we want to bound

\operatornamewithlimits{\mathbb{E}}_{(\ell,j)\leftarrow[2]\times[D]}[(f(\mathcal{D}_{\ell,j})-f(\mathcal{U}_{D}))^{2}]=\operatornamewithlimits{\mathbb{E}}_{j\in[D]}\hat{f}(\{j\})^{2}.

By the Level-1 Inequality (see Lemma 3.7),

\sum_{j\in[D]}\hat{f}(\{j\})^{2}\leq O(\log\|f\|_{\infty}).

Therefore, we can set $W(L):=c_{1}\cdot\frac{\log L}{D}$ for an appropriate constant $c_{1}$ and apply Theorem 2.5 to obtain

\operatornamewithlimits{\mathbb{E}}_{(\ell,j)\leftarrow[2]\times[D]}\mathrm{KL}(R(\mathcal{D}_{\ell,j})\,||\,R(\mathcal{U}_{D}))\leq O\left(\frac{\varepsilon}{D}+\delta\right).

Combining this with Lemma 2.3 completes the proof (see the proofs of Lemma 6.3 and Theorem 1.9 for the details).

3 Preliminaries

3.1 Notation

For a function $f\colon\mathcal{X}\to\mathbb{R}$, a distribution $\mathcal{D}$ on $\mathcal{X}$, and an element $z\in\mathcal{X}$, we use $f(\mathcal{D})$ to denote $\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{D}}[f(x)]$ and $\mathcal{D}_{z}$ to denote $\Pr_{x\leftarrow\mathcal{D}}[x=z]$. For a subset $E\subseteq\mathcal{X}$, we use $\mathcal{D}_{E}$ to denote $\sum_{z\in E}\mathcal{D}_{z}=\Pr_{x\leftarrow\mathcal{D}}[x\in E]$. We also use $\mathcal{U}_{D}$ to denote the uniform distribution over $\{0,1\}^{D}$.

For two distributions $\mathcal{D}_{1}$ and $\mathcal{D}_{2}$ on sets $\mathcal{X}$ and $\mathcal{Y}$ respectively, we use $\mathcal{D}_{1}\otimes\mathcal{D}_{2}$ to denote their product distribution over $\mathcal{X}\times\mathcal{Y}$. For two random variables $X$ and $Y$ supported on $\mathbb{R}^{D}$ for $D\in\mathbb{N}$, we use $X+Y$ to denote the random variable distributed as the sum of two independent samples from $X$ and $Y$. For any set $\mathcal{S}$, we denote by $\mathcal{S}^{*}$ the set consisting of all finite sequences over $\mathcal{S}$, i.e., $\mathcal{S}^{*}=\cup_{n\geq 0}\mathcal{S}^{n}$. For $x\in\mathbb{R}$, let $[x]_{+}$ denote $\max(x,0)$. For a predicate $P$, we use $\mathbb{1}[P]$ to denote the corresponding Boolean value of $P$; that is, $\mathbb{1}[P]=1$ if $P$ is true, and $0$ otherwise.

For a distribution 𝒟\displaystyle\mathcal{D} on a finite set 𝒳\displaystyle\mathcal{X} and an event 𝒳\displaystyle\mathcal{E}\subseteq\mathcal{X} such that Prz𝒟[z]>0\displaystyle\Pr_{z\leftarrow\mathcal{D}}[z\in\mathcal{E}]>0, we use 𝒟|\displaystyle\mathcal{D}|\mathcal{E} to denote the conditional distribution such that

(𝒟|)z={𝒟zPrz𝒟[z]if z,0otherwise.\displaystyle(\mathcal{D}|\mathcal{E})_{z}=\begin{cases}\frac{\mathcal{D}_{z}}{\Pr_{z\leftarrow\mathcal{D}}[z\in\mathcal{E}]}\quad&\text{if $\displaystyle z\in\mathcal{E}$,}\\ 0\quad&\text{otherwise.}\end{cases}

Slightly overloading the notation, we also use α𝒟1+(1α)𝒟2\displaystyle\alpha\cdot\mathcal{D}_{1}+(1-\alpha)\cdot\mathcal{D}_{2} to denote the mixture of distributions 𝒟1\displaystyle\mathcal{D}_{1} and 𝒟2\displaystyle\mathcal{D}_{2} with mixing weights α\displaystyle\alpha and (1α)\displaystyle(1-\alpha) respectively. Whether +\displaystyle+ means mixture or convolution will be clear from the context unless explicitly stated.

3.2 Differential Privacy

We now recall the basics of differential privacy that we will need. Fix a finite set 𝒳\displaystyle\mathcal{X}, the space of user reports. A dataset X\displaystyle X is an element of 𝒳\displaystyle\mathcal{X}^{*}, namely a tuple consisting of elements of 𝒳\displaystyle\mathcal{X}. Let hist(X)|𝒳|\displaystyle\mathrm{hist}(X)\in\mathbb{N}^{|\mathcal{X}|} be the histogram of X\displaystyle X: for any x𝒳\displaystyle x\in\mathcal{X}, the x\displaystyle xth component of hist(X)\displaystyle\mathrm{hist}(X) is the number of occurrences of x\displaystyle x in the dataset X\displaystyle X. We will consider datasets X,X\displaystyle X,X^{\prime} to be equivalent if they have the same histogram (i.e., the ordering of the elements x1,,xn\displaystyle x_{1},\ldots,x_{n} does not matter). For a multiset 𝒮\displaystyle\mathcal{S} whose elements are in 𝒳\displaystyle\mathcal{X}, we will also write hist(𝒮)\displaystyle\mathrm{hist}(\mathcal{S}) to denote the histogram of 𝒮\displaystyle\mathcal{S} (so that the x\displaystyle xth component is the number of copies of x\displaystyle x in 𝒮\displaystyle\mathcal{S}).

Let n\displaystyle n\in\mathbb{N}, and consider a dataset X=(x1,,xn)𝒳n\displaystyle X=(x_{1},\ldots,x_{n})\in\mathcal{X}^{n}. For an element x𝒳\displaystyle x\in\mathcal{X}, let fX(x)=hist(X)xn\displaystyle f_{X}(x)=\frac{\mathrm{hist}(X)_{x}}{n} be the frequency of x\displaystyle x in X\displaystyle X, namely the fraction of elements of X\displaystyle X that are equal to x\displaystyle x. Two datasets X,X\displaystyle X,X^{\prime} are said to be neighboring if they differ in a single element, meaning that we can write (up to equivalence) X=(x1,x2,,xn)\displaystyle X=(x_{1},x_{2},\ldots,x_{n}) and X=(x1,x2,,xn)\displaystyle X^{\prime}=(x_{1}^{\prime},x_{2},\ldots,x_{n}). In this case, we write XX\displaystyle X\sim X^{\prime}. Let 𝒵\displaystyle\mathcal{Z} be a set; we now define the differential privacy of a randomized function P:𝒳n𝒵\displaystyle P\colon\mathcal{X}^{n}\rightarrow\mathcal{Z} as follows.
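For concreteness, the following Python sketch (illustrative only, and not part of the formal development; all names in it are ours) computes the histogram and frequencies of a small dataset and checks whether two datasets are neighboring up to reordering:

from collections import Counter

def hist(X, universe):
    # Histogram of a dataset X over a finite universe: the x-th entry counts occurrences of x.
    c = Counter(X)
    return {x: c.get(x, 0) for x in universe}

def freq(X, x):
    # Frequency f_X(x) = hist(X)_x / n.
    return X.count(x) / len(X)

def are_neighboring(X, Xp):
    # Two datasets of the same size are neighboring if, up to reordering, they differ in one element.
    if len(X) != len(Xp):
        return False
    return sum((Counter(X) - Counter(Xp)).values()) == 1

X, Xp = [3, 1, 2, 1], [5, 1, 2, 1]        # differ only in the first user's element
print(hist(X, universe=range(1, 6)))      # {1: 2, 2: 1, 3: 1, 4: 0, 5: 0}
print(freq(X, 1))                         # 0.5
print(are_neighboring(X, Xp))             # True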

Definition 3.1 (Differential privacy (DP) [DMNS06, DKM+06]).

A randomized algorithm P:𝒳n𝒵\displaystyle P\colon\mathcal{X}^{n}\rightarrow\mathcal{Z} is (ε,δ)\displaystyle(\varepsilon,\delta)-DP if for every pair of neighboring datasets XX\displaystyle X\sim X^{\prime} and for every set 𝒮𝒵\displaystyle\mathcal{S}\subseteq\mathcal{Z}, we have

Pr[P(X)𝒮]eεPr[P(X)𝒮]+δ,\displaystyle\Pr[P(X)\in\mathcal{S}]\leq e^{\varepsilon}\cdot\Pr[P(X^{\prime})\in\mathcal{S}]+\delta,

where the probabilities are taken over the randomness in P\displaystyle P. Here, ε0\displaystyle\varepsilon\geq 0 and δ[0,1]\displaystyle\delta\in[0,1].

If δ=0\displaystyle\delta=0, then we use ε\displaystyle\varepsilon-DP for brevity and informally refer to it as pure-DP; if δ>0\displaystyle\delta>0, we refer to it as approximate-DP. We will use the following post-processing property of DP.

Lemma 3.2 (Post-processing, e.g., [DR14]).

If P\displaystyle P is (ε,δ)\displaystyle(\varepsilon,\delta)-DP, then for every randomized function A\displaystyle A, the composed function AP\displaystyle A\circ P is (ε,δ)\displaystyle(\varepsilon,\delta)-DP.

DP is nicely characterized by the following divergence between distributions, which will be used throughout the paper.

Definition 3.3 (Hockey Stick Divergence).

For any ε>0\displaystyle\varepsilon>0, the eε\displaystyle e^{\varepsilon}-hockey stick divergence between distributions 𝒟\displaystyle\mathcal{D} and 𝒟\displaystyle\mathcal{D}^{\prime} is defined as dε(𝒟||𝒟):=xsupp(𝒟)[𝒟xeε𝒟x]+\displaystyle d_{\varepsilon}(\mathcal{D}||\mathcal{D}^{\prime}):=\sum_{x\in\mathrm{supp}(\mathcal{D})}[\mathcal{D}_{x}-e^{\varepsilon}\cdot\mathcal{D}^{\prime}_{x}]_{+}.
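For concreteness, the hockey stick divergence between two explicit distributions on a finite support can be evaluated directly; the Python sketch below is purely illustrative (the randomizer used is the standard binary randomized response, chosen only as an example):

import math

def hockey_stick(D, Dp, eps):
    # d_eps(D || Dp) = sum over the support of D of [D_x - e^eps * Dp_x]_+ .
    return sum(max(p - math.exp(eps) * Dp.get(x, 0.0), 0.0) for x, p in D.items())

eps = 1.0
q = 1.0 / (math.exp(eps) + 1.0)          # flip probability of binary randomized response
R0 = {0: 1 - q, 1: q}                    # output distribution on input 0
R1 = {0: q, 1: 1 - q}                    # output distribution on input 1
print(hockey_stick(R0, R1, eps))         # ~0 (up to float error): (eps, 0)-DP holds
print(hockey_stick(R0, R1, 0.5))         # > 0: the pair is not (0.5, 0)-DP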

We next list two useful facts about the hockey stick divergence between distributions.

Proposition 3.4.

Let 𝒟\displaystyle\mathcal{D} and 𝒟\displaystyle\mathcal{D}^{\prime} be any distributions. Then, the following hold:

  1. 1.

    Let 𝒟com\displaystyle\mathcal{D}_{com} be another distribution. Then, for any function f\displaystyle f, it holds that

    dε(f(𝒟𝒟com)||f(𝒟𝒟com))dε(𝒟||𝒟).\displaystyle d_{\varepsilon}(f(\mathcal{D}\otimes\mathcal{D}_{com})||f(\mathcal{D}^{\prime}\otimes\mathcal{D}_{com}))\leq d_{\varepsilon}(\mathcal{D}||\mathcal{D}^{\prime}).
  2. 2.

    Suppose we can decompose 𝒟=iαi𝒟i\displaystyle\mathcal{D}=\sum_{i\in\mathcal{I}}\alpha_{i}\mathcal{D}_{i} and 𝒟=iβi𝒟i\displaystyle\mathcal{D}^{\prime}=\sum_{i\in\mathcal{I}}\beta_{i}\mathcal{D}^{\prime}_{i}, where αi\alpha_{i}’s and βi\beta_{i}’s are tuples of positive reals summing up to 1\displaystyle 1 and 𝒟i\displaystyle\mathcal{D}_{i}’s and 𝒟i\displaystyle\mathcal{D}^{\prime}_{i}’s are distributions, then

    dε(𝒟||𝒟)iαidε+ln(βi/αi)(𝒟i||𝒟i).\displaystyle d_{\varepsilon}(\mathcal{D}||\mathcal{D}^{\prime})\leq\sum_{i\in\mathcal{I}}\alpha_{i}\cdot d_{\varepsilon+\ln(\beta_{i}/\alpha_{i})}(\mathcal{D}_{i}||\mathcal{D}^{\prime}_{i}).
Proof.

Item (1) follows from the post-processing property of DP, together with the definition of the hockey stick divergence.

To prove Item (2), we note that

dε(𝒟||𝒟)\displaystyle\displaystyle d_{\varepsilon}(\mathcal{D}||\mathcal{D}^{\prime}) =xsupp(𝒟)[𝒟xeε𝒟x]+\displaystyle\displaystyle=\sum_{x\in\mathrm{supp}(\mathcal{D})}[\mathcal{D}_{x}-e^{\varepsilon}\cdot\mathcal{D}^{\prime}_{x}]_{+}
=xsupp(𝒟)[iαi(𝒟i)xeε(iβi(𝒟i)x)]+\displaystyle\displaystyle=\sum_{x\in\mathrm{supp}(\mathcal{D})}\left[\sum_{i\in\mathcal{I}}\alpha_{i}(\mathcal{D}_{i})_{x}-e^{\varepsilon}\cdot\left(\sum_{i\in\mathcal{I}}\beta_{i}(\mathcal{D}^{\prime}_{i})_{x}\right)\right]_{+}
ixsupp(𝒟i)[αi(𝒟i)xeεβi(𝒟i)x]+\displaystyle\displaystyle\leq\sum_{i\in\mathcal{I}}\sum_{x\in\mathrm{supp}(\mathcal{D}_{i})}\left[\alpha_{i}(\mathcal{D}_{i})_{x}-e^{\varepsilon}\cdot\beta_{i}(\mathcal{D}^{\prime}_{i})_{x}\right]_{+}
iαixsupp(𝒟i)[(𝒟i)xeεβi/αi(𝒟i)x]+\displaystyle\displaystyle\leq\sum_{i\in\mathcal{I}}\alpha_{i}\cdot\sum_{x\in\mathrm{supp}(\mathcal{D}_{i})}\left[(\mathcal{D}_{i})_{x}-e^{\varepsilon}\cdot\beta_{i}/\alpha_{i}(\mathcal{D}^{\prime}_{i})_{x}\right]_{+}
iαidε+ln(βi/αi)(𝒟i||𝒟i).\displaystyle\displaystyle\leq\sum_{i\in\mathcal{I}}\alpha_{i}\cdot d_{\varepsilon+\ln(\beta_{i}/\alpha_{i})}(\mathcal{D}_{i}||\mathcal{D}^{\prime}_{i}).\qed

3.3 Shuffle Model

We briefly review the shuffle model of DP [BEM+17, EFM+19, CSU+18]. The input to the model is a dataset (x1,,xn)𝒳n\displaystyle(x_{1},\ldots,x_{n})\in\mathcal{X}^{n}, where item xi𝒳\displaystyle x_{i}\in\mathcal{X} is held by user i\displaystyle i. A protocol P:𝒳𝒵\displaystyle P\colon\mathcal{X}\rightarrow\mathcal{Z} in the shuffle model consists of three algorithms:

  • The local randomizer R:𝒳\displaystyle R\colon\mathcal{X}\rightarrow\mathcal{M}^{*} takes as input the data of one user, xi𝒳\displaystyle x_{i}\in\mathcal{X}, and outputs a sequence (yi,1,,yi,mi)\displaystyle(y_{i,1},\ldots,y_{i,m_{i}}) of messages; here mi\displaystyle m_{i} is a positive integer.

    To ease discussions in the paper, we will further assume that the randomizer R\displaystyle R pre-shuffles its messages. That is, it applies a random permutation π:[mi][mi]\displaystyle\pi\colon[m_{i}]\to[m_{i}] to the sequence (yi,1,,yi,mi)\displaystyle(y_{i,1},\ldots,y_{i,m_{i}}) before outputting it. (In particular, for every x𝒳\displaystyle x\in\mathcal{X} and any two tuples z1,z2\displaystyle z_{1},z_{2}\in\mathcal{M}^{*} that are equivalent up to a permutation, R(x)\displaystyle R(x) outputs them with the same probability.)

  • The shuffler S:\displaystyle S\colon\mathcal{M}^{*}\rightarrow\mathcal{M}^{*} takes as input a sequence of elements of \displaystyle\mathcal{M}, say (y1,,ym)\displaystyle(y_{1},\ldots,y_{m}), and outputs a random permutation, i.e., the sequence (yπ(1),,yπ(m))\displaystyle(y_{\pi(1)},\ldots,y_{\pi(m)}), where πSm\displaystyle\pi\in S_{m} is a uniformly random permutation on [m]\displaystyle[m]. The input to the shuffler will be the concatenation of the outputs of the local randomizers.

  • The analyzer A:𝒵\displaystyle A\colon\mathcal{M}^{*}\rightarrow\mathcal{Z} takes as input a sequence of elements of \displaystyle\mathcal{M} (which will be taken to be the output of the shuffler) and outputs an answer in 𝒵\displaystyle\mathcal{Z} that is taken to be the output of the protocol P\displaystyle P.

We will write P=(R,S,A)\displaystyle P=(R,S,A) to denote the protocol whose components are given by R\displaystyle R, S\displaystyle S, and A\displaystyle A. The main distinction between the shuffle and local model is the introduction of the shuffler S\displaystyle S between the local randomizer and the analyzer. As in the local model, the analyzer is untrusted in the shuffle model; hence privacy must be guaranteed with respect to the input to the analyzer, i.e., the output of the shuffler. Formally, we have:

Definition 3.5 (DP in the Shuffle Model, [EFM+19, CSU+18]).

A protocol P=(R,S,A)\displaystyle P=(R,S,A) is (ε,δ)\displaystyle(\varepsilon,\delta)-DP if, for any dataset X=(x1,,xn)\displaystyle X=(x_{1},\ldots,x_{n}), the algorithm

(x1,,xn)S(R(x1),,R(xn))\displaystyle(x_{1},\ldots,x_{n})\mapsto S(R(x_{1}),\ldots,R(x_{n}))

is (ε,δ)\displaystyle(\varepsilon,\delta)-DP.

Notice that the output of S(R(x1),,R(xn))\displaystyle S(R(x_{1}),\ldots,R(x_{n})) can be simulated by an algorithm that takes as input the multiset consisting of the union of the elements of R(x1),,R(xn)\displaystyle R(x_{1}),\ldots,R(x_{n}) (which we denote as iR(xi)\displaystyle\bigcup_{i}R(x_{i}), with a slight abuse of notation) and outputs a uniformly random permutation of them. Thus, by Lemma 3.2, it can be assumed without loss of generality for privacy analyses that the shuffler simply outputs the multiset iR(xi)\displaystyle\bigcup_{i}R(x_{i}). For the purpose of analyzing the accuracy of the protocol P=(R,S,A)\displaystyle P=(R,S,A), we define its output on the dataset X=(x1,,xn)\displaystyle X=(x_{1},\ldots,x_{n}) to be P(X):=A(S(R(x1),,R(xn)))\displaystyle P(X):=A(S(R(x_{1}),\ldots,R(x_{n}))). We also remark that the case of local DP, formalized in Definition 3.6, is a special case of the shuffle model where the shuffler S\displaystyle S is replaced by the identity function:
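The following Python sketch (a toy protocol with names of our choosing, intended only to make the three components concrete) simulates one run of a shuffle protocol: each user applies the local randomizer, the messages are concatenated and uniformly permuted, and the analyzer sees only the shuffled messages:

import random

def toy_randomizer(x):
    # A toy local randomizer: report the true item and, with probability 1/2, one random item.
    msgs = [x]
    if random.random() < 0.5:
        msgs.append(random.randrange(10))
    random.shuffle(msgs)                 # the randomizer pre-shuffles its own messages
    return msgs

def shuffler(message_lists):
    # Concatenate all users' messages and apply a uniformly random permutation.
    all_msgs = [m for msgs in message_lists for m in msgs]
    random.shuffle(all_msgs)
    return all_msgs

def run_shuffle_protocol(dataset, randomizer, analyzer):
    # P(X) = A(S(R(x_1), ..., R(x_n))).
    return analyzer(shuffler([randomizer(x) for x in dataset]))

dataset = [1, 4, 4, 7, 2]
# Example analyzer: count the number of distinct messages received.
print(run_shuffle_protocol(dataset, toy_randomizer, lambda msgs: len(set(msgs))))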

Definition 3.6 (Local DP [KLN+11]).

A protocol P=(R,A)\displaystyle P=(R,A) is (ε,δ)\displaystyle(\varepsilon,\delta)-DP in the local model (or (ε,δ)\displaystyle(\varepsilon,\delta)-locally DP) if the function xR(x)\displaystyle x\mapsto R(x) is (ε,δ)\displaystyle(\varepsilon,\delta)-DP.

We say that the output of the protocol P\displaystyle P on an input dataset X=(x1,,xn)\displaystyle X=(x_{1},\ldots,x_{n}) is P(X):=A(R(x1),,R(xn))\displaystyle P(X):=A(R(x_{1}),\ldots,R(x_{n})).

We denote DP in the shuffle model by DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}}, and the special case where each user can send at most k\displaystyle k messages by DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k}. (We may assume w.l.o.g. that each user sends exactly k\displaystyle k messages; otherwise, we may define a new symbol \displaystyle\perp and have each user send \displaystyle\perp messages so that the number of messages becomes exactly k\displaystyle k.) We denote DP in the local model by DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}.

Public-Coin DP.

The default setting for local and shuffle models is private-coin, i.e., there is no randomness shared between the randomizers and the analyzer. We will also study the public-coin variants of the local and shuffle models. In the public-coin setting, each local randomizer also takes a public random string α{0,1}\displaystyle\alpha\leftarrow\{0,1\}^{*} as input. The analyzer is also given the public random string α\displaystyle\alpha. We use Rα(x)\displaystyle R_{\alpha}(x) to denote the local randomizer with public random string being fixed to α\displaystyle\alpha. At the start of the protocol, all users jointly sample a public random string from a publicly known distribution 𝒟𝗉𝗎𝖻\displaystyle\mathcal{D}_{\sf pub}.

Now, we say that a protocol P=(R,A)\displaystyle P=(R,A) is (ε,δ)\displaystyle(\varepsilon,\delta)-DP in the public-coin local model, if the function

xα𝒟𝗉𝗎𝖻(α,Rα(x))\displaystyle x\underset{\alpha\leftarrow\mathcal{D}_{\sf pub}}{\mapsto}(\alpha,R_{\alpha}(x))

is (ε,δ)\displaystyle(\varepsilon,\delta)-DP.

Similarly, we say that a protocol P=(R,S,A)\displaystyle P=(R,S,A) is (ε,δ)\displaystyle(\varepsilon,\delta)-DP in the public-coin shuffle model, if for any dataset X=(x1,,xn)\displaystyle X=(x_{1},\ldots,x_{n}), the algorithm

(x1,,xn)α𝒟𝗉𝗎𝖻(α,S(Rα(x1),,Rα(xn)))\displaystyle(x_{1},\ldots,x_{n})\underset{\alpha\leftarrow\mathcal{D}_{\sf pub}}{\mapsto}(\alpha,S(R_{\alpha}(x_{1}),\ldots,R_{\alpha}(x_{n})))

is (ε,δ)\displaystyle(\varepsilon,\delta)-DP.

3.4 Useful Divergences

We will make use of two important divergences between distributions, the KL-divergence and the χ2\displaystyle\chi^{2}-divergence, defined as

KL(P||Q)=𝔼zPlog(PzQz)andχ2(P||Q)=𝔼zQ[(PzQzQz)2].\displaystyle\mathrm{KL}(P||Q)=\operatornamewithlimits{\mathbb{E}}_{z\leftarrow P}\log\left(\frac{P_{z}}{Q_{z}}\right)\quad\text{and}\quad\chi^{2}(P||Q)=\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}\left[\left(\frac{P_{z}-Q_{z}}{Q_{z}}\right)^{2}\right].

We rely on the key fact that χ2\displaystyle\chi^{2}-divergence upper-bounds KL-divergence [GS02], that is,

KL(P||Q)χ2(P||Q).\displaystyle\mathrm{KL}(P||Q)\leq\chi^{2}(P||Q).

We will also use Pinsker’s inequality, whereby the squared total variation distance lower-bounds the KL-divergence (up to a constant factor):

KL(P||Q)2ln2PQTV2.\displaystyle\mathrm{KL}(P||Q)\geq\frac{2}{\ln 2}\|P-Q\|_{TV}^{2}.
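These quantities are easy to compute on a finite support; the Python sketch below (with illustrative example distributions of our choosing) checks both inequalities numerically. Logarithms are natural here, in which case Pinsker's inequality reads KL(P||Q) >= 2||P-Q||_TV^2:

import numpy as np

def kl(P, Q):
    # KL(P || Q) = E_{z <- P} ln(P_z / Q_z)   (natural logarithm).
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

def chi2(P, Q):
    # chi^2(P || Q) = sum_z (P_z - Q_z)^2 / Q_z.
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    return float(np.sum((P - Q) ** 2 / Q))

def tv(P, Q):
    # Total variation distance: (1/2) sum_z |P_z - Q_z|.
    return 0.5 * float(np.sum(np.abs(np.asarray(P, float) - np.asarray(Q, float))))

P, Q = [0.5, 0.3, 0.2], [0.4, 0.4, 0.2]
print(kl(P, Q), chi2(P, Q), tv(P, Q))
assert kl(P, Q) <= chi2(P, Q)            # chi^2 upper-bounds KL
assert kl(P, Q) >= 2 * tv(P, Q) ** 2     # Pinsker's inequality (natural logs)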

3.5 Fourier Analysis

We now review some basic Fourier analysis and then introduce two inequalities that will be heavily used in our proofs. For a function f:{0,1}D\displaystyle f\colon\{0,1\}^{D}\to\mathbb{R}, its Fourier transform is given by the function f^(S):=𝔼x𝒰D[f(x)(1)iSxi]\displaystyle\hat{f}(S):=\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{U}_{D}}[f(x)\cdot(-1)^{\sum_{i\in S}x_{i}}]. We also define f22=𝔼x𝒰D[f(x)2]\displaystyle\|f\|_{2}^{2}=\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{U}_{D}}[f(x)^{2}]. For k\displaystyle k\in\mathbb{N}, we define the level-k\displaystyle k Fourier weight as 𝐖k[f]:=S[D],|S|=kf^(S)2\displaystyle\mathbf{W}^{k}[f]:=\sum_{S\subseteq[D],|S|=k}\hat{f}(S)^{2}. For convenience, for s{0,1}D\displaystyle s\in\{0,1\}^{D}, we will also write f^(s)\displaystyle\hat{f}(s) to denote f^(χs)\displaystyle\hat{f}(\chi_{s}), where χs\displaystyle\chi_{s} is the set {i:i[D]si=1}\displaystyle\{i:i\in[D]\wedge s_{i}=1\}. One key technical lemma is the Level-1 Inequality from [O’D14], which was also used in [GGK+19].

Lemma 3.7 (Level-1 Inequality).

Suppose f:{0,1}D0\displaystyle f\colon\{0,1\}^{D}\to\mathbb{R}_{\geq 0} is a non-negative-valued function with f(x)[0,L]\displaystyle f(x)\in[0,L] for all x{0,1}D\displaystyle x\in\{0,1\}^{D}, and 𝔼x𝒰D[f(x)]1\displaystyle\operatornamewithlimits{\mathbb{E}}_{x\sim\mathcal{U}_{D}}[f(x)]\leq 1. Then, 𝐖1[f]6ln(L+1)\displaystyle\mathbf{W}^{1}[f]\leq 6\ln(L+1).

We also need the standard Parseval’s identity.

Lemma 3.8 (Parseval’s Identity).

For all functions f:{0,1}D\displaystyle f\colon\{0,1\}^{D}\to\mathbb{R},

f22=S[D]f^(S)2.\displaystyle\|f\|_{2}^{2}=\sum_{S\subseteq[D]}\hat{f}(S)^{2}.
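As a sanity check of the notation, the Python sketch below (illustrative only; the function f is an arbitrary example satisfying the hypotheses of Lemma 3.7) computes all Fourier coefficients of a function on {0,1}^D by brute force, verifies Parseval's identity, and evaluates the level-1 weight against the bound 6 ln(L+1):

import itertools
import numpy as np

D = 4

def fourier_coeff(f, S):
    # hat{f}(S) = E_{x <- U_D}[ f(x) * (-1)^{sum_{i in S} x_i} ].
    total = 0.0
    for x in itertools.product([0, 1], repeat=D):
        total += f(x) * (-1) ** sum(x[i] for i in S)
    return total / 2 ** D

def level_k_weight(f, k):
    # W^k[f] = sum over |S| = k of hat{f}(S)^2.
    return sum(fourier_coeff(f, S) ** 2 for S in itertools.combinations(range(D), k))

L = 3.0
f = lambda x: L if (x[0] == 1 and x[1] == 1) else 0.25   # f in [0, L] with E[f] <= 1

norm_sq = np.mean([f(x) ** 2 for x in itertools.product([0, 1], repeat=D)])
parseval = sum(level_k_weight(f, k) for k in range(D + 1))
print(abs(norm_sq - parseval) < 1e-12)                   # Parseval's identity
print(level_k_weight(f, 1), 6 * np.log(L + 1))           # W^1[f] vs. 6 ln(L + 1)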

4 Low-Privacy DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} and DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} Lower Bounds for CountDistinct

In this section, we prove Theorem 1.2 and Theorem 1.4. In Section 4.1, we introduce some necessary definitions and notation. In Section 4.2, we prove our lower bound for low-privacy (private-coin) DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} protocols computing CountDistinct. In Section 4.3, we show the improved connection between DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} and DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}, which implies our lower bounds for DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} protocols for CountDistinct. In Section 4.4, we describe how to adapt the proof to public-coin protocols.

4.1 Preliminaries

Recall that we use the notations 𝖯𝗈𝗂(λ)\displaystyle\vec{\mathsf{Poi}}(\vec{\lambda}) and 𝔼[𝖯𝗈𝗂(U)]\displaystyle\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(\vec{U})] to denote multi-dimensional Poisson distributions and their mixtures, respectively (see Section 2.1 for the precise definitions).

We also recall the key additive property of multi-dimensional Poisson distributions: for α,β(0)D\displaystyle\vec{\alpha},\vec{\beta}\in(\mathbb{R}^{\geq 0})^{D}, we have that

𝖯𝗈𝗂(α)+𝖯𝗈𝗂(β)=𝖯𝗈𝗂(α+β).\displaystyle\vec{\mathsf{Poi}}(\vec{\alpha})+\vec{\mathsf{Poi}}(\vec{\beta})=\vec{\mathsf{Poi}}(\vec{\alpha}+\vec{\beta}).
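This property is easy to verify numerically; the Python sketch below (with illustrative parameters) compares an empirical sum of two independent multi-dimensional Poisson samples against a single draw from the summed parameters, and checks the identity exactly in one coordinate by convolving probability mass functions:

import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)
alpha = np.array([0.5, 2.0, 1.3])
beta = np.array([1.5, 0.2, 0.7])

# Empirical check: coordinate-wise means of Poi(alpha) + Poi(beta) vs. Poi(alpha + beta).
n_samples = 200_000
sum_samples = rng.poisson(alpha, (n_samples, 3)) + rng.poisson(beta, (n_samples, 3))
direct_samples = rng.poisson(alpha + beta, (n_samples, 3))
print(sum_samples.mean(axis=0), direct_samples.mean(axis=0))    # both close to alpha + beta

# Exact check in one coordinate: the convolution of Poi(a) and Poi(b) pmfs equals the Poi(a+b) pmf.
k = np.arange(20)
conv = np.convolve(poisson.pmf(k, alpha[0]), poisson.pmf(k, beta[0]))[:20]
print(np.max(np.abs(conv - poisson.pmf(k, alpha[0] + beta[0]))))  # numerically negligible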

4.2 Low-Privacy DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} Lower Bounds for CountDistinct

We will first prove the low-privacy DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} lower bounds in the private-coin setting, which is captured by the following theorem.

Theorem 4.1 (The Private-Coin Case of Theorem 1.2).

For some ε=ln(n/Θ(log6n))\displaystyle\varepsilon=\ln(n/\Theta(\log^{6}n)) and D=Θ(n/log4n)\displaystyle D=\Theta(n/\log^{4}n), if P\displaystyle P is a private-coin (ε,nω(1))\displaystyle(\varepsilon,n^{-\omega(1)})-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} protocol, then it cannot solve CountDistinctn,D\displaystyle\textsf{\small CountDistinct}_{n,D} with error o(D)\displaystyle o(D) and probability at least 0.99\displaystyle 0.99.

4.2.1 Technical Lemmas

Now we need the following construction from [WY19] (which uses a classical result from [Tim14, 2.11.1]).

Lemma 4.2 ([WY19]).

There is a constant c\displaystyle c such that, for all L\displaystyle L\in\mathbb{N}, there are two distributions U\displaystyle U and V\displaystyle V supported on {0}[1,Λ]\displaystyle\{0\}\cup[1,\Lambda] for Λ=cL2\displaystyle\Lambda=c\cdot L^{2}, such that 𝔼[U]=𝔼[V]=1\displaystyle\operatornamewithlimits{\mathbb{E}}[U]=\operatornamewithlimits{\mathbb{E}}[V]=1, U0V0>0.9\displaystyle U_{0}-V_{0}>0.9, and 𝔼[Uj]=𝔼[Vj]\displaystyle\operatornamewithlimits{\mathbb{E}}[U^{j}]=\operatornamewithlimits{\mathbb{E}}[V^{j}] for every j[L]\displaystyle j\in[L].

The following lemma is crucial for our proof. Its proof uses the moment matching technique [WY16, JHW18, WY19, Yan19], and can be found in Appendix A.

Lemma 4.3.

Let U,V\displaystyle U,V be two random variables supported on [0,Λ]\displaystyle[0,\Lambda] such that 𝔼[Uj]=𝔼[Vj]\displaystyle\operatornamewithlimits{\mathbb{E}}[U^{j}]=\operatornamewithlimits{\mathbb{E}}[V^{j}] for all j{1,2,,L}\displaystyle j\in\{1,2,\dotsc,L\}, where L1\displaystyle L\geq 1. Let D\displaystyle D\in\mathbb{N} and θ,λ(0)D\displaystyle\vec{\theta},\vec{\lambda}\in(\mathbb{R}^{\geq 0})^{D} such that θ1=1\displaystyle\|\vec{\theta}\|_{1}=1. Let 𝒟θ\displaystyle\mathcal{D}_{\vec{\theta}} be the distribution over [D]\displaystyle[D] corresponding to θ\displaystyle\vec{\theta}. Suppose that

Pri𝒟θ[λi2Λ2θi]112Λ.\displaystyle\Pr_{i\leftarrow\mathcal{D}_{\vec{\theta}}}[\vec{\lambda}_{i}\geq 2\Lambda^{2}\cdot\vec{\theta}_{i}]\geq 1-\frac{1}{2\Lambda}.

Then,

𝔼[𝖯𝗈𝗂(Uθ+λ)]𝔼[𝖯𝗈𝗂(Vθ+λ)]TV21L!.\displaystyle\|\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(U\vec{\theta}+\vec{\lambda})]-\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(V\vec{\theta}+\vec{\lambda})]\|_{TV}^{2}\leq\frac{1}{L!}.

Finally, we need an observation that for a DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} protocol P\displaystyle P solving CountDistinct, we can assume without loss of generality that the analyzer of P\displaystyle P only sees the histogram of the messages.

Lemma 4.4.

For any DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} protocol P=(R,A)\displaystyle P=(R,A) for CountDistinct, there exists an analyzer A\displaystyle A^{\prime} which only sees the histogram of the messages, and achieves the same accuracy and error as that of A\displaystyle A.

Proof.

Let n\displaystyle n be the number of users. Given the histogram, A\displaystyle A^{\prime} first constructs a sequence of messages Sn\displaystyle S\in\mathcal{M}^{n} consistent with the histogram. Then, it applies a random permutation π:[n][n]\displaystyle\pi\colon[n]\to[n] to S\displaystyle S to obtain a new sequence π(S)\displaystyle\pi(S). Finally, it simply outputs A(π(S))\displaystyle A(\pi(S)).

Note that applying a random permutation on the messages is equivalent to applying a random permutation on the user inputs in the dataset. Hence, the new protocol P=(R,A)\displaystyle P^{\prime}=(R,A^{\prime}) is equivalent to running P\displaystyle P on a random permutation of the dataset. The lemma follows from the fact that a random permutation does not change the number of distinct elements. ∎
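The reduction in this proof is mechanical; the Python sketch below (illustrative, with an example analyzer of our choosing) wraps an arbitrary sequence-based analyzer A into an analyzer A' that is given only the histogram of the messages, exactly as in the argument above:

import random
from collections import Counter

def histogram_only_analyzer(A):
    # Given an analyzer A reading a sequence of messages, return an analyzer A'
    # reading only the histogram: rebuild a consistent sequence, permute it randomly, run A.
    def A_prime(histogram):
        S = [msg for msg, count in histogram.items() for _ in range(count)]
        random.shuffle(S)                # apply a random permutation pi to S
        return A(S)
    return A_prime

A = lambda msgs: len(set(msgs))          # example analyzer: number of distinct messages
A_prime = histogram_only_analyzer(A)
print(A_prime(Counter([3, 3, 5, 1, 1, 1])))   # 3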

4.2.2 Construction of the Hard Dataset Distributions

In the rest of the section, we use n\displaystyle n to denote a parameter controlling the number of users. The actual number of users n¯\displaystyle\bar{n} will later be set to a number in the interval [n,2n]\displaystyle[n,2n]. In the following, we fix a randomizer R:𝒳\displaystyle R\colon\mathcal{X}\to\mathcal{M} which is (εR,δ)\displaystyle(\varepsilon_{R},\delta)-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} on n¯\displaystyle\bar{n} users, for some εR=ln(Θ(n¯/log6n¯))\displaystyle\varepsilon_{R}=\ln(\Theta(\bar{n}/\log^{6}\bar{n})) to be specified later. Before constructing our two hard distributions over datasets, we set some parameters that will be used in the construction:

  • We set L=logn\displaystyle L=\log n and note that 1L!1/n4\displaystyle\frac{1}{L!}\leq 1/n^{4} for large enough n\displaystyle n.

  • Applying Lemma 4.2, for Λ=Θ(L2)=Θ(log2n)\displaystyle\Lambda=\Theta(L^{2})=\Theta(\log^{2}n), we obtain two random variables U\displaystyle U and V\displaystyle V supported on {0}[1,Λ]\displaystyle\{0\}\cup[1,\Lambda], such that 𝔼[U]=𝔼[V]=1\displaystyle\operatornamewithlimits{\mathbb{E}}[U]=\operatornamewithlimits{\mathbb{E}}[V]=1, U0V0>0.9\displaystyle U_{0}-V_{0}>0.9, and 𝔼[Uj]=𝔼[Vj]\displaystyle\operatornamewithlimits{\mathbb{E}}[U^{j}]=\operatornamewithlimits{\mathbb{E}}[V^{j}] for every j[L]\displaystyle j\in[L].

  • We set Γ=8Λ2=Θ(log4n)\displaystyle\Gamma=8\Lambda^{2}=\Theta(\log^{4}n) and D=n/Γ=Θ(n/log4n)\displaystyle D=n/\Gamma=\Theta(n/\log^{4}n). We are going to construct instances where inputs are from the universe 𝒳=[D]\displaystyle\mathcal{X}=[D].

  • We set n¯=n+Dn0.99\displaystyle\bar{n}=n+D-n^{0.99}.

  • Let W=(log2n)4Λ2\displaystyle W=(\log^{2}n)\cdot 4\Lambda^{2}. We set εR\displaystyle\varepsilon_{R} so that n/eεR=W=Θ(log6n)\displaystyle n/e^{\varepsilon_{R}}=W=\Theta(\log^{6}n). Hence, R\displaystyle R is (ln(n/W),nω(1))\displaystyle(\ln(n/W),n^{-\omega(1)})-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}.

Now, for a distribution U\displaystyle U over 0\displaystyle\mathbb{R}^{\geq 0} and a non-empty subset E\displaystyle E of [D]\displaystyle[D], the dataset distribution 𝒟U,E\displaystyle\mathcal{D}^{U,E} is constructed as follows:

  1. 1.

    For each i[D]\displaystyle i\in[D], we draw λiU\displaystyle\lambda_{i}\leftarrow U, and ni𝖯𝗈𝗂(λi)\displaystyle n_{i}\leftarrow\mathsf{Poi}(\lambda_{i}), and add ni\displaystyle n_{i} many users with input i\displaystyle i.

  2. 2.

    For each jE\displaystyle j\in E, we draw mj𝖯𝗈𝗂(n/|E|)\displaystyle m_{j}\leftarrow\mathsf{Poi}(n/|E|), and add mj\displaystyle m_{j} many users with input j\displaystyle j. (Note that here we use a Poisson distribution slightly differently from the construction in Section 2, namely 𝖯𝗈𝗂(n/|E|)\displaystyle\mathsf{Poi}(n/|E|) instead of 𝖯𝗈𝗂((nD)/|E|)\displaystyle\mathsf{Poi}((n-D)/|E|), in order to simplify the later calculations.) A sampling sketch of this two-phase construction is given below.
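The sketch below (Python, illustrative; the toy two-point distribution stands in for the moment-matched U and V of Lemma 4.2, and E is fixed to a single element for simplicity) samples a dataset from 𝒟^{U,E} following the two phases above:

import numpy as np

rng = np.random.default_rng(1)

def sample_dataset(U_support, U_probs, E, D, n):
    # Phase (1): for each i in [D], draw lambda_i ~ U and add Poi(lambda_i) users with input i.
    # Phase (2): for each j in E, add Poi(n / |E|) users with input j.
    dataset = []
    lam = rng.choice(U_support, size=D, p=U_probs)
    counts = rng.poisson(lam)
    for i, n_i in enumerate(counts, start=1):
        dataset += [i] * int(n_i)
    for j in E:
        dataset += [j] * int(rng.poisson(n / len(E)))
    return dataset

U_support, U_probs = np.array([0.0, 2.0]), np.array([0.5, 0.5])   # a toy U with E[U] = 1
D, n = 100, 800
E = [1]                                                           # a small non-empty E
X = sample_dataset(U_support, U_probs, E, D, n)
print(len(X), len(set(X)))   # roughly n + D users in expectation, with Theta(D) distinct elements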

For clarity of exposition, we will use the histogram of the protocol to denote the histogram of the messages in the transcript of the protocol. Our goal is to show that for some “good” subset E[D]\displaystyle E\subseteq[D], the following hold:

  1. 1.

    The distributions of the histogram of the protocol under 𝒟U,E\displaystyle\mathcal{D}^{U,E} and 𝒟V,E\displaystyle\mathcal{D}^{V,E} are very close.

  2. 2.

    With high probability, the number of distinct elements in datasets from 𝒟U,E\displaystyle\mathcal{D}^{U,E} is Ω(D)\displaystyle\Omega(D) smaller than in datasets from 𝒟V,E\displaystyle\mathcal{D}^{V,E}.

Clearly, given the above two conditions and Lemma 4.4, no protocol with randomizer R\displaystyle R can estimate the number of distinct elements within o(D)\displaystyle o(D) error and with constant probability.

4.2.3 Conditions on a Good Subset E\displaystyle E

Given a subset E\displaystyle E, we let νE=iER(i)n|E|\displaystyle\vec{\nu}^{E}=\sum_{i\in E}R(i)\cdot\frac{n}{|E|}. We also set μ=i[D]R(i)\displaystyle\vec{\mu}=\sum_{i\in[D]}R(i). We now specify our conditions on a subset E[D]\displaystyle E\subseteq[D] being good. Let ε1=0.01\displaystyle\varepsilon_{1}=0.01. We say E\displaystyle E is good if the following two conditions hold:

  1. 1.

    0<|E|<2ε1|D|\displaystyle 0<|E|<2\varepsilon_{1}\cdot|D|.

  2. 2.

    For each i[D]\displaystyle i\in[D],

    PrzR(i)[νzE2Λ2μz]11/2Λ.\displaystyle\Pr_{z\leftarrow R(i)}[\vec{\nu}^{E}_{z}\geq 2\Lambda^{2}\cdot\vec{\mu}_{z}]\geq 1-1/2\Lambda.

We claim that a good subset E\displaystyle E exists. In fact, we give a probabilistic construction of E\displaystyle E that succeeds with high probability:

Lemma 4.5.

If we include each element i[D]\displaystyle i\in[D] in E\displaystyle E independently with probability ε1\displaystyle\varepsilon_{1}, then E\displaystyle E is good with probability at least 1nω(1)\displaystyle 1-n^{-\omega(1)}.

4.2.4 The Lower Bound

Before proving Lemma 4.5, we show that for a good E\displaystyle E, the distributions 𝒟U,E\displaystyle\mathcal{D}^{U,E} and 𝒟V,E\displaystyle\mathcal{D}^{V,E} satisfy our desired properties, and thereby imply our DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} lower bound. For a dataset distribution 𝒟\displaystyle\mathcal{D}, we use 𝖧𝗂𝗌𝗍R(𝒟)\displaystyle\mathsf{Hist}_{R}(\mathcal{D}) to denote the corresponding distribution of the histogram of the transcript, if all users apply the randomizer R\displaystyle R. For a dataset I\displaystyle I, we use CountDistinct(I)\displaystyle\textsf{\small CountDistinct}(I) to denote the number of distinct elements in it.

Lemma 4.6.

For a good subset E\displaystyle E of [D]\displaystyle[D], the following hold:

  1. 1.

    We have that

    𝖧𝗂𝗌𝗍R(𝒟U,E)𝖧𝗂𝗌𝗍R(𝒟V,E)TV1/n.\displaystyle\|\mathsf{Hist}_{R}(\mathcal{D}^{U,E})-\mathsf{Hist}_{R}(\mathcal{D}^{V,E})\|_{TV}\leq 1/n.
  2. 2.

    There are two constants τ1<τ2\displaystyle\tau_{1}<\tau_{2} such that

    PrI1𝒟U,E[CountDistinct(I1)<τ1D]1nω(1),\displaystyle\Pr_{I_{1}\leftarrow\mathcal{D}^{U,E}}[\textsf{\small CountDistinct}(I_{1})<\tau_{1}\cdot D]\geq 1-n^{-\omega(1)},

    and

    PrI2𝒟V,E[CountDistinct(I2)>τ2D]1nω(1).\displaystyle\Pr_{I_{2}\leftarrow\mathcal{D}^{V,E}}[\textsf{\small CountDistinct}(I_{2})>\tau_{2}\cdot D]\geq 1-n^{-\omega(1)}.
Proof.
Proof of Item (1).

In the following, we use ν\displaystyle\vec{\nu} to denote νE\displaystyle\vec{\nu}^{E} for simplicity. We first construct D\displaystyle D vectors {ν(i)}i[D]\displaystyle\{\vec{\nu}^{(i)}\}_{i\in[D]} as follows: for each z\displaystyle z\in\mathcal{M}, if νz2Λ2μz\displaystyle\vec{\nu}_{z}\geq 2\Lambda^{2}\cdot\vec{\mu}_{z}, then for all i[D]\displaystyle i\in[D] we set (ν(i))z=R(i)z2Λ2\displaystyle(\vec{\nu}^{(i)})_{z}=R(i)_{z}\cdot 2\Lambda^{2}, otherwise we set (ν(i))z=0\displaystyle(\vec{\nu}^{(i)})_{z}=0 for all i[D]\displaystyle i\in[D]. Note that for each z\displaystyle z, we have that (i[D]ν(i))zνz\displaystyle\left(\sum_{i\in[D]}\vec{\nu}^{(i)}\right)_{z}\leq\vec{\nu}_{z}. Let ν(0):=ν(i[D]ν(i))\displaystyle\vec{\nu}^{(0)}:=\vec{\nu}-\left(\sum_{i\in[D]}\vec{\nu}^{(i)}\right). By definition, it follows that ν(0)\displaystyle\vec{\nu}^{(0)} is a non-negative vector. Now, 𝖧𝗂𝗌𝗍R(𝒟U,E)\displaystyle\mathsf{Hist}_{R}(\mathcal{D}^{U,E}) and 𝖧𝗂𝗌𝗍R(𝒟V,E)\displaystyle\mathsf{Hist}_{R}(\mathcal{D}^{V,E}) can be seen as distributions over histograms in \displaystyle\mathbb{N}^{\mathcal{M}}. Let X1,X2,,XD\displaystyle X_{1},X_{2},\dotsc,X_{D} be D\displaystyle D independent random variables distributed as U\displaystyle U. By the construction of 𝒟U,E\displaystyle\mathcal{D}^{U,E}, we have that

𝖧𝗂𝗌𝗍R(𝒟U,E)=\displaystyle\displaystyle\mathsf{Hist}_{R}(\mathcal{D}^{U,E})= 𝖯𝗈𝗂(ν)+i=1D𝖯𝗈𝗂(XiR(i))\displaystyle\displaystyle\vec{\mathsf{Poi}}(\vec{\nu})+\sum_{i=1}^{D}\vec{\mathsf{Poi}}(X_{i}\cdot R(i))
=\displaystyle\displaystyle= 𝖯𝗈𝗂(ν(0))+i=1D𝖯𝗈𝗂(XiR(i)+ν(i)).\displaystyle\displaystyle\vec{\mathsf{Poi}}(\vec{\nu}^{(0)})+\sum_{i=1}^{D}\vec{\mathsf{Poi}}(X_{i}\cdot R(i)+\vec{\nu}^{(i)}).

Similarly, let Y1,Y2,,YD\displaystyle Y_{1},Y_{2},\dotsc,Y_{D} be D\displaystyle D independent random variables distributed as V\displaystyle V. We have that

𝖧𝗂𝗌𝗍R(𝒟V,E)=𝖯𝗈𝗂(ν(0))+i=1D𝖯𝗈𝗂(YiR(i)+ν(i)).\displaystyle\mathsf{Hist}_{R}(\mathcal{D}^{V,E})=\vec{\mathsf{Poi}}(\vec{\nu}^{(0)})+\sum_{i=1}^{D}\vec{\mathsf{Poi}}(Y_{i}\cdot R(i)+\vec{\nu}^{(i)}).

Since E\displaystyle E is good, and since νz2Λ2μz\displaystyle\vec{\nu}_{z}\geq 2\Lambda^{2}\cdot\vec{\mu}_{z} implies (ν(i))z=2Λ2R(i)z\displaystyle(\vec{\nu}^{(i)})_{z}=2\Lambda^{2}\cdot R(i)_{z}, for each i[D]\displaystyle i\in[D] we have that

PrzR(i)[(ν(i))z2Λ2R(i)z]11/2Λ.\displaystyle\Pr_{z\leftarrow R(i)}[(\vec{\nu}^{(i)})_{z}\geq 2\Lambda^{2}\cdot R(i)_{z}]\geq 1-1/2\Lambda.

Hence, applying Lemma 4.3 with θ=R(i)\displaystyle\vec{\theta}=R(i) and λ=ν(i)\displaystyle\vec{\lambda}=\vec{\nu}^{(i)}, for each i[D]\displaystyle i\in[D], we have that

𝖯𝗈𝗂(XiR(i)+ν(i))𝖯𝗈𝗂(YiR(i)+ν(i))TV(1L!)1/21/n2.\displaystyle\|\vec{\mathsf{Poi}}(X_{i}\cdot R(i)+\vec{\nu}^{(i)})-\vec{\mathsf{Poi}}(Y_{i}\cdot R(i)+\vec{\nu}^{(i)})\|_{TV}\leq\left(\frac{1}{L!}\right)^{1/2}\leq 1/n^{2}.

Therefore, since the total variation distance is subadditive over sums of independent terms (and adding the common independent term 𝖯𝗈𝗂(ν(0))\displaystyle\vec{\mathsf{Poi}}(\vec{\nu}^{(0)}) cannot increase it),

𝖧𝗂𝗌𝗍R(𝒟U,E)𝖧𝗂𝗌𝗍R(𝒟V,E)TVi=1D𝖯𝗈𝗂(XiR(i)+ν(i))𝖯𝗈𝗂(YiR(i)+ν(i))TVD(1L!)1/21/n.\displaystyle\|\mathsf{Hist}_{R}(\mathcal{D}^{U,E})-\mathsf{Hist}_{R}(\mathcal{D}^{V,E})\|_{TV}\leq\sum_{i=1}^{D}\|\vec{\mathsf{Poi}}(X_{i}\cdot R(i)+\vec{\nu}^{(i)})-\vec{\mathsf{Poi}}(Y_{i}\cdot R(i)+\vec{\nu}^{(i)})\|_{TV}\leq D\cdot\left(\frac{1}{L!}\right)^{1/2}\leq 1/n.
Proof of Item (2).

Let γU=𝔼[eU]\displaystyle\gamma_{U}=\operatornamewithlimits{\mathbb{E}}[e^{-U}] and γV=𝔼[eV]\displaystyle\gamma_{V}=\operatornamewithlimits{\mathbb{E}}[e^{-V}]. By Lemma 4.2, we have that U00.9\displaystyle U_{0}\geq 0.9, V00.1\displaystyle V_{0}\leq 0.1, and U,V\displaystyle U,V are supported on {0}[1,Λ]\displaystyle\{0\}\cup[1,\Lambda]. Hence, it follows that γUU00.9\displaystyle\gamma_{U}\geq U_{0}\geq 0.9 and γVV0+e1(1V0)0.5\displaystyle\gamma_{V}\leq V_{0}+e^{-1}\cdot(1-V_{0})\leq 0.5.

Now, consider the construction of 𝒟U,E\displaystyle\mathcal{D}^{U,E}. For every i[D]\displaystyle i\in[D], at least one user with input i\displaystyle i is added to the dataset during phase (1) with probability 1γU\displaystyle 1-\gamma_{U}. Moreover, these events are mutually independent. Hence, by a simple Chernoff bound, with probability at least 1nω(1)\displaystyle 1-n^{-\omega(1)}, the number of distinct elements in the dataset after phase (1) is no greater than (1γU+0.01)D\displaystyle(1-\gamma_{U}+0.01)\cdot D. Since the second phase can add at most |E|0.02D\displaystyle|E|\leq 0.02D many distinct elements, we can set τ1=1γU+0.03\displaystyle\tau_{1}=1-\gamma_{U}+0.03.

Similarly, for instances generated from 𝒟V,E\displaystyle\mathcal{D}^{V,E}, with probability at least 1nω(1)\displaystyle 1-n^{-\omega(1)}, the number of distinct elements in the dataset after phase (1) is at least (1γV0.01)D\displaystyle(1-\gamma_{V}-0.01)\cdot D. We can set τ2=1γV0.01\displaystyle\tau_{2}=1-\gamma_{V}-0.01.

By our condition on γU\displaystyle\gamma_{U} and γV\displaystyle\gamma_{V}, we have that τ2>τ1\displaystyle\tau_{2}>\tau_{1}, which completes the proof. ∎

We are now ready to prove Theorem 4.1. One complication is that datasets from 𝒟U,E\displaystyle\mathcal{D}^{U,E} and 𝒟V,E\displaystyle\mathcal{D}^{V,E} may not have the same number of users. We address this issue by “throwing out” extra users randomly, obtaining distributions over datasets with exactly n¯\displaystyle\bar{n} many users.

Proof of Theorem 4.1.

Consider the distributions 𝒟U,E\displaystyle\mathcal{D}^{U,E} and 𝒟V,E\displaystyle\mathcal{D}^{V,E}. By a simple Chernoff bound, we have that with probability at least 1nω(1)\displaystyle 1-n^{-\omega(1)}, the number of users lies in [n+Dn0.99,n+D+n0.99]\displaystyle[n+D-n^{0.99},n+D+n^{0.99}].

Recall that n¯=n+Dn0.99\displaystyle\bar{n}=n+D-n^{0.99}. We construct the distribution 𝒟¯U,E\displaystyle\bar{\mathcal{D}}^{U,E} as follows: to generate a dataset from 𝒟¯U,E\displaystyle\bar{\mathcal{D}}^{U,E}, we take a sample dataset I\displaystyle I from 𝒟U,E\displaystyle\mathcal{D}^{U,E}, and if there are nI>n¯\displaystyle n_{I}>\bar{n} users in I\displaystyle I, we delete nIn¯\displaystyle n_{I}-\bar{n} users uniformly at random, and output I\displaystyle I. We similarly construct another distribution 𝒟¯V,E\displaystyle\bar{\mathcal{D}}^{V,E}. Note that with probability at least 1nω(1)\displaystyle 1-n^{-\omega(1)}, we delete at most 2n0.99\displaystyle 2n^{0.99} users in the construction of 𝒟¯U,E\displaystyle\bar{\mathcal{D}}^{U,E} (as well as in that of 𝒟¯V,E\displaystyle\bar{\mathcal{D}}^{V,E}).

Now, both 𝒟¯U,E\displaystyle\bar{\mathcal{D}}^{U,E} and 𝒟¯V,E\displaystyle\bar{\mathcal{D}}^{V,E} output datasets with exactly n¯\displaystyle\bar{n} users with probability 1nω(1)\displaystyle 1-n^{-\omega(1)}. This means that if there is a protocol solving CountDistinct with n¯\displaystyle\bar{n} users with error o(D)\displaystyle o(D), then by Lemma 4.4, Item (2) of Lemma 4.6 and since 2n0.99=o(D)\displaystyle 2n^{0.99}=o(D), the analyzer of the protocol should be able to distinguish 𝖧𝗂𝗌𝗍R(𝒟¯U,E)\displaystyle\mathsf{Hist}_{R}(\bar{\mathcal{D}}^{U,E}) and 𝖧𝗂𝗌𝗍R(𝒟¯V,E)\displaystyle\mathsf{Hist}_{R}(\bar{\mathcal{D}}^{V,E}) with at least a constant probability. Therefore, we have that

𝖧𝗂𝗌𝗍R(𝒟¯U,E)𝖧𝗂𝗌𝗍R(𝒟¯V,E)TV=Ω(1).\displaystyle\|\mathsf{Hist}_{R}(\bar{\mathcal{D}}^{U,E})-\mathsf{Hist}_{R}(\bar{\mathcal{D}}^{V,E})\|_{TV}=\Omega(1).

On the other hand, 𝖧𝗂𝗌𝗍R(𝒟¯U,E)\displaystyle\mathsf{Hist}_{R}(\bar{\mathcal{D}}^{U,E}) (respectively, 𝖧𝗂𝗌𝗍R(𝒟¯V,E)\displaystyle\mathsf{Hist}_{R}(\bar{\mathcal{D}}^{V,E})) can also be constructed by taking a sample from 𝖧𝗂𝗌𝗍R(𝒟U,E)\displaystyle\mathsf{Hist}_{R}(\mathcal{D}^{U,E}) (respectively, 𝖧𝗂𝗌𝗍R(𝒟V,E)\displaystyle\mathsf{Hist}_{R}(\mathcal{D}^{V,E})) and throwing out some random messages until at most n¯\displaystyle\bar{n} messages remain. Since post-processing does not increase statistical distance, by Item (1) of Lemma 4.6, we have that

𝖧𝗂𝗌𝗍R(𝒟¯U,E)𝖧𝗂𝗌𝗍R(𝒟¯V,E)TV𝖧𝗂𝗌𝗍R(𝒟U,E)𝖧𝗂𝗌𝗍R(𝒟V,E)TV1/n,\displaystyle\|\mathsf{Hist}_{R}(\bar{\mathcal{D}}^{U,E})-\mathsf{Hist}_{R}(\bar{\mathcal{D}}^{V,E})\|_{TV}\leq\|\mathsf{Hist}_{R}(\mathcal{D}^{U,E})-\mathsf{Hist}_{R}(\mathcal{D}^{V,E})\|_{TV}\leq 1/n,

a contradiction. ∎

4.2.5 A Probabilistic Construction of Good E\displaystyle E

We need the following proposition for the proof of Lemma 4.5.

Proposition 4.7.

Let R:𝒳\displaystyle R\colon\mathcal{X}\to\mathcal{M} be an (ε,δ)\displaystyle(\varepsilon,\delta)-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} randomizer. For every i,j𝒳\displaystyle i,j\in\mathcal{X}, it follows that

PrzR(i)[R(i)z2eεR(j)z]2δ.\displaystyle\Pr_{z\leftarrow R(i)}[R(i)_{z}\geq 2e^{\varepsilon}\cdot R(j)_{z}]\leq 2\delta.
Proof.

Let 𝒯\displaystyle\mathcal{T} be the set {z:R(i)z2eεR(j)zz}\displaystyle\{z:R(i)_{z}\geq 2e^{\varepsilon}\cdot R(j)_{z}\wedge z\in\mathcal{M}\}. Since R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}, it follows that

R(i)𝒯eεR(j)𝒯+δ.\displaystyle R(i)_{\mathcal{T}}\leq e^{\varepsilon}\cdot R(j)_{\mathcal{T}}+\delta.

By the definition of the set 𝒯\displaystyle\mathcal{T}, it follows that

R(j)𝒯=z𝒯R(j)z12eεz𝒯R(i)z=12eεR(i)𝒯.\displaystyle R(j)_{\mathcal{T}}=\sum_{z\in\mathcal{T}}R(j)_{z}\leq\frac{1}{2e^{\varepsilon}}\cdot\sum_{z\in\mathcal{T}}R(i)_{z}=\frac{1}{2e^{\varepsilon}}\cdot R(i)_{\mathcal{T}}.

Putting the above two inequalities together, we have

R(i)𝒯12R(i)𝒯+δ,\displaystyle R(i)_{\mathcal{T}}\leq\frac{1}{2}\cdot R(i)_{\mathcal{T}}+\delta,

which in turn implies that

PrzR(i)[R(i)z2eεR(j)z]=R(i)𝒯2δ.\displaystyle\Pr_{z\leftarrow R(i)}[R(i)_{z}\geq 2e^{\varepsilon}\cdot R(j)_{z}]=R(i)_{\mathcal{T}}\leq 2\delta.\qed
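For intuition, the conclusion of Proposition 4.7 can be checked directly for a concrete pure ε-DP_local randomizer (where δ=0, so the bad event must have probability 0); the Python sketch below uses k-ary randomized response, an arbitrary illustrative choice:

import math

def k_rr(i, k, eps):
    # k-ary randomized response: the output distribution R(i) over [k], which is eps-DP.
    p_true = math.exp(eps) / (math.exp(eps) + k - 1)
    p_other = 1.0 / (math.exp(eps) + k - 1)
    return [p_true if z == i else p_other for z in range(k)]

k, eps = 10, 1.5
Ri, Rj = k_rr(0, k, eps), k_rr(1, k, eps)
# Mass, under z <- R(i), of the event { R(i)_z >= 2 e^eps R(j)_z }.
bad_mass = sum(p for p, q in zip(Ri, Rj) if p >= 2 * math.exp(eps) * q)
print(bad_mass)   # 0.0, consistent with the 2*delta bound since delta = 0 here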

Finally, we prove Lemma 4.5 (restated below).


Lemma 4.5. (restated) If we include each element i[D]\displaystyle i\in[D] in E\displaystyle E independently with probability ε1=0.01\displaystyle\varepsilon_{1}=0.01, then E\displaystyle E is good with probability at least 1nω(1)\displaystyle 1-n^{-\omega(1)}.

Proof.

Let 𝗌𝗂𝗓𝖾\displaystyle\mathcal{E}_{\sf size} be the event that 0<|E|<2ε1|D|\displaystyle 0<|E|<2\varepsilon_{1}\cdot|D|. By a simple Chernoff bound, it follows that

PrE[𝗌𝗂𝗓𝖾]1exp(Ω(|D|))1nω(1).\displaystyle\Pr_{E}[\mathcal{E}_{\sf size}]\geq 1-\exp(-\Omega(|D|))\geq 1-n^{-\omega(1)}.

Therefore, the first condition for E\displaystyle E being good is satisfied with probability 1nω(1)\displaystyle 1-n^{-\omega(1)}. In the following, we will condition on the event 𝗌𝗂𝗓𝖾\displaystyle\mathcal{E}_{\sf size}.

Recall that νE=iER(i)n|E|\displaystyle\vec{\nu}^{E}=\sum_{i\in E}R(i)\cdot\frac{n}{|E|} and μ=i[D]R(i)\displaystyle\vec{\mu}=\sum_{i\in[D]}R(i). In the rest of the proof, we will focus on the second condition for E\displaystyle E being good, namely that for each i[D]\displaystyle i\in[D], it is the case that

PrzR(i)[νzE2Λ2μz]11/2Λ.\displaystyle\Pr_{z\leftarrow R(i)}[\vec{\nu}^{E}_{z}\geq 2\Lambda^{2}\cdot\vec{\mu}_{z}]\geq 1-1/2\Lambda.

In the following, we fix i[D]\displaystyle i\in[D], and show that the previous inequality holds for i\displaystyle i with high probability. Therefore, we can then conclude that E\displaystyle E is good with high probability by a union bound.

We also let m\displaystyle\vec{m}^{*}\in\mathbb{R}^{\mathcal{M}} be such that mz=maxi[D]R(i)z\displaystyle\vec{m}^{*}_{z}=\max_{i\in[D]}R(i)_{z} for all z\displaystyle z\in\mathcal{M}. Now, for z\displaystyle z\in\mathcal{M}, if μzmzlog2n\displaystyle\vec{\mu}_{z}\leq\vec{m}^{*}_{z}\cdot\log^{2}n, we say z\displaystyle z is light; otherwise we say z\displaystyle z is heavy.

We define

𝗅𝗂𝗀𝗁𝗍:=[PrzR(i)[νzE<2Λ2μz and z is light]1/4Λ],\displaystyle\mathcal{E}_{\sf light}:=\left[\Pr_{z\leftarrow R(i)}[\vec{\nu}^{E}_{z}<2\Lambda^{2}\cdot\vec{\mu}_{z}\mbox{ and }\text{$\displaystyle z$ is light}]\leq 1/4\Lambda\right],

and

𝗁𝖾𝖺𝗏𝗒:=[PrzR(i)[νzE<2Λ2μz and z is heavy]1/4Λ].\displaystyle\mathcal{E}_{\sf heavy}:=\left[\Pr_{z\leftarrow R(i)}[\vec{\nu}^{E}_{z}<2\Lambda^{2}\cdot\vec{\mu}_{z}\mbox{ and }\text{$\displaystyle z$ is heavy}]\leq 1/4\Lambda\right].

It suffices to show that both PrE[𝗅𝗂𝗀𝗁𝗍|𝗌𝗂𝗓𝖾]\displaystyle\Pr_{E}[\mathcal{E}_{\sf light}|\mathcal{E}_{\sf size}] and PrE[𝗁𝖾𝖺𝗏𝗒|𝗌𝗂𝗓𝖾]\displaystyle\Pr_{E}[\mathcal{E}_{\sf heavy}|\mathcal{E}_{\sf size}] are very large. Note that νE\displaystyle\vec{\nu}^{E} is not defined when |E|=0\displaystyle|E|=0. But since we only care about PrE[𝗅𝗂𝗀𝗁𝗍|𝗌𝗂𝗓𝖾]\displaystyle\Pr_{E}[\mathcal{E}_{\sf light}|\mathcal{E}_{\sf size}] and PrE[𝗁𝖾𝖺𝗏𝗒|𝗌𝗂𝗓𝖾]\displaystyle\Pr_{E}[\mathcal{E}_{\sf heavy}|\mathcal{E}_{\sf size}], this corner case is excluded by conditioning on 𝗌𝗂𝗓𝖾\displaystyle\mathcal{E}_{\sf size}.

Proving that PrE[𝗅𝗂𝗀𝗁𝗍|𝗌𝗂𝗓𝖾]=1\displaystyle\Pr_{E}[\mathcal{E}_{\sf light}|\mathcal{E}_{\sf size}]=1.

By Proposition 4.7 and the fact that R\displaystyle R is (ln(n/W),nω(1))\displaystyle(\ln(n/W),n^{-\omega(1)})-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}, for every x[D]\displaystyle x\in[D]

PrzR(i)[R(i)z(2n/W)R(x)z]nω(1).\displaystyle\Pr_{z\leftarrow R(i)}[R(i)_{z}\geq(2n/W)\cdot R(x)_{z}]\leq n^{-\omega(1)}.

By a union bound over all elements in E\displaystyle E, we have that

PrzR(i)[R(i)z(2n/W)νzn]nω(1),\displaystyle\Pr_{z\leftarrow R(i)}[R(i)_{z}\geq(2n/W)\cdot\frac{\vec{\nu}_{z}}{n}]\leq n^{-\omega(1)},

which is equivalent to

PrzR(i)[νzW/2R(i)z]nω(1).\displaystyle\Pr_{z\leftarrow R(i)}[\vec{\nu}_{z}\leq W/2\cdot R(i)_{z}]\leq n^{-\omega(1)}.

Similarly, for j[D]\displaystyle j\in[D], we also have that

PrzR(j)[νzW/2R(j)z]nω(1).\displaystyle\Pr_{z\leftarrow R(j)}[\vec{\nu}_{z}\leq W/2\cdot R(j)_{z}]\leq n^{-\omega(1)}.

Again since R\displaystyle R is (ln(n/W),nω(1))\displaystyle(\ln(n/W),n^{-\omega(1)})-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}, we have that

PrzR(i)[νzW/2R(j)z]=𝔼zR(j)R(i)zR(j)z𝟙[νzW/2R(j)z]\displaystyle\displaystyle\Pr_{z\leftarrow R(i)}[\vec{\nu}_{z}\leq W/2\cdot R(j)_{z}]=\operatornamewithlimits{\mathbb{E}}_{z\leftarrow R(j)}\frac{R(i)_{z}}{R(j)_{z}}\cdot\mathbb{1}[\vec{\nu}_{z}\leq W/2\cdot R(j)_{z}]
\displaystyle\displaystyle\leq 𝔼zR(j)(2n/W)𝟙[R(i)zR(j)z2n/W]𝟙[νzW/2R(j)z]+𝔼zR(j)R(i)zR(j)z𝟙[R(i)zR(j)z>2n/W]\displaystyle\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow R(j)}(2n/W)\cdot\mathbb{1}\left[\frac{R(i)_{z}}{R(j)_{z}}\leq 2n/W\right]\cdot\mathbb{1}[\vec{\nu}_{z}\leq W/2\cdot R(j)_{z}]+\operatornamewithlimits{\mathbb{E}}_{z\leftarrow R(j)}\frac{R(i)_{z}}{R(j)_{z}}\cdot\mathbb{1}\left[\frac{R(i)_{z}}{R(j)_{z}}>2n/W\right]
\displaystyle\displaystyle\leq (2n/W)nω(1)+𝔼zR(i)𝟙[R(i)zR(j)z>2n/W]\displaystyle\displaystyle(2n/W)\cdot n^{-\omega(1)}+\operatornamewithlimits{\mathbb{E}}_{z\leftarrow R(i)}\mathbb{1}\left[\frac{R(i)_{z}}{R(j)_{z}}>2n/W\right]
\displaystyle\displaystyle\leq nω(1).\displaystyle\displaystyle n^{-\omega(1)}.

Therefore, by a union bound over j[D]\displaystyle j\in[D],

PrzR(i)[νzW/2mz]nω(1).\displaystyle\Pr_{z\leftarrow R(i)}[\vec{\nu}_{z}\leq W/2\cdot\vec{m}^{*}_{z}]\leq n^{-\omega(1)}.

Now we are ready to prove that PrE[𝗅𝗂𝗀𝗁𝗍|𝗌𝗂𝗓𝖾]=1\displaystyle\Pr_{E}[\mathcal{E}_{\sf light}|\mathcal{E}_{\sf size}]=1. We will show that 𝗅𝗂𝗀𝗁𝗍\displaystyle\mathcal{E}_{\sf light} holds for every nonempty E\displaystyle E. We have that

PrzR(i)[νzE<2Λ2μz and z is light]\displaystyle\displaystyle\Pr_{z\leftarrow R(i)}[\vec{\nu}^{E}_{z}<2\Lambda^{2}\cdot\vec{\mu}_{z}\mbox{ and }\text{$\displaystyle z$ is light}]
\displaystyle\displaystyle\leq PrzR(i)[νzE<2Λ2μz and z is light and νz>W/2mz]+PrzR(i)[νzW/2mz]\displaystyle\displaystyle\Pr_{z\leftarrow R(i)}[\vec{\nu}^{E}_{z}<2\Lambda^{2}\cdot\vec{\mu}_{z}\mbox{ and }\text{$\displaystyle z$ is light}\mbox{ and }\vec{\nu}_{z}>W/2\cdot\vec{m}^{*}_{z}]+\Pr_{z\leftarrow R(i)}[\vec{\nu}_{z}\leq W/2\cdot\vec{m}^{*}_{z}]
\displaystyle\displaystyle\leq PrzR(i)[νzE<2Λ2μz and μzmzlog2n and νz>W/2mz]+nω(1)\displaystyle\displaystyle\Pr_{z\leftarrow R(i)}[\vec{\nu}^{E}_{z}<2\Lambda^{2}\cdot\vec{\mu}_{z}\mbox{ and }\vec{\mu}_{z}\leq\vec{m}^{*}_{z}\cdot\log^{2}n\mbox{ and }\vec{\nu}_{z}>W/2\cdot\vec{m}^{*}_{z}]+n^{-\omega(1)} (z is light implies μzmzlog2n\displaystyle\vec{\mu}_{z}\leq\vec{m}^{*}_{z}\cdot\log^{2}n)
\displaystyle\displaystyle\leq nω(1).\displaystyle\displaystyle n^{-\omega(1)}.

The last inequality follows from the fact that μzmzlog2n\displaystyle\vec{\mu}_{z}\leq\vec{m}^{*}_{z}\cdot\log^{2}n and νz>W/2mz\displaystyle\vec{\nu}_{z}>W/2\cdot\vec{m}^{*}_{z} together imply that νz>W/2mzW/2log2nμz2Λ2μz\displaystyle\vec{\nu}_{z}>W/2\cdot\vec{m}^{*}_{z}\geq\frac{W/2}{\log^{2}n}\vec{\mu}_{z}\geq 2\Lambda^{2}\cdot\vec{\mu}_{z} (recall that W=(log2n)4Λ2\displaystyle W=(\log^{2}n)\cdot 4\Lambda^{2}). Hence PrzR(i)[νzE<2Λ2μz and μzmzlog2n and νz>W/2mz]=0\displaystyle\Pr_{z\leftarrow R(i)}[\vec{\nu}^{E}_{z}<2\Lambda^{2}\cdot\vec{\mu}_{z}\mbox{ and }\vec{\mu}_{z}\leq\vec{m}^{*}_{z}\cdot\log^{2}n\mbox{ and }\vec{\nu}_{z}>W/2\cdot\vec{m}^{*}_{z}]=0 as the three inequalities cannot be simultaneously satisfied.

Proving that Pr[𝗁𝖾𝖺𝗏𝗒|𝗌𝗂𝗓𝖾]\displaystyle\Pr[\mathcal{E}_{\sf heavy}|\mathcal{E}_{\sf size}] is large.

Now, for a heavy z\displaystyle z, we have that μzmzlog2n\displaystyle\vec{\mu}_{z}\geq\vec{m}^{*}_{z}\cdot\log^{2}n. In particular, fix a heavy z\displaystyle z, and define the random variable Xi:=𝟙[iE]R(i)z\displaystyle X_{i}:=\mathbb{1}[i\in E]\cdot R(i)_{z} for each i[D]\displaystyle i\in[D]. Note that the Xi\displaystyle X_{i}’s are independent variables over [0,R(i)z]\displaystyle[0,R(i)_{z}] and 𝔼[i[D]Xi]=ε1μz\displaystyle\operatornamewithlimits{\mathbb{E}}\left[\sum_{i\in[D]}X_{i}\right]=\varepsilon_{1}\cdot\vec{\mu}_{z}. Letting S=i[D]Xi\displaystyle S=\sum_{i\in[D]}X_{i}, by Hoeffding’s inequality, we have that

PrE[|S𝔼[S]|12𝔼[S]]2exp(2(12𝔼[S])2i[D]R(i)z2).\displaystyle\Pr_{E}[|S-\operatornamewithlimits{\mathbb{E}}[S]|\geq\frac{1}{2}\operatornamewithlimits{\mathbb{E}}[S]]\leq 2\exp\left(-\frac{2\cdot(\frac{1}{2}\operatornamewithlimits{\mathbb{E}}[S])^{2}}{\sum_{i\in[D]}R(i)_{z}^{2}}\right).

Note that

i[D]R(i)z2i[D]R(i)zmzμzmz.\displaystyle\sum_{i\in[D]}R(i)_{z}^{2}\leq\sum_{i\in[D]}R(i)_{z}\cdot\vec{m}^{*}_{z}\leq\vec{\mu}_{z}\cdot\vec{m}^{*}_{z}.

Plugging in, it follows that

PrE[Sε1μz/2]2exp(ε12μz2/2μzmz)2exp(ε12/2log2n)nω(1).\displaystyle\Pr_{E}[S\leq\varepsilon_{1}\vec{\mu}_{z}/2]\leq 2\exp\left(-\frac{\varepsilon_{1}^{2}\vec{\mu}_{z}^{2}/2}{\vec{\mu}_{z}\cdot\vec{m}^{*}_{z}}\right)\leq 2\exp(-\varepsilon_{1}^{2}/2\cdot\log^{2}n)\leq n^{-\omega(1)}.

Note that νzE=Sn|E|\displaystyle\vec{\nu}^{E}_{z}=S\cdot\frac{n}{|E|}, and that |E|2ε1D\displaystyle|E|\leq 2\varepsilon_{1}D with probability at least 1nω(1)\displaystyle 1-n^{-\omega(1)}. By a union bound, we have that νzEε1μz/2n2ε1D=μzΓ/4\displaystyle\vec{\nu}^{E}_{z}\geq\varepsilon_{1}\vec{\mu}_{z}/2\cdot\frac{n}{2\varepsilon_{1}D}=\vec{\mu}_{z}\cdot\Gamma/4 with probability at least 1nω(1)\displaystyle 1-n^{-\omega(1)}. Noting that Γ/4=2Λ2\displaystyle\Gamma/4=2\Lambda^{2}, we have that

𝔼E[PrzR(i)[νzE<2Λ2μzz is heavy]]nω(1).\displaystyle\operatornamewithlimits{\mathbb{E}}_{E}\left[\Pr_{z\leftarrow R(i)}[\vec{\nu}^{E}_{z}<2\Lambda^{2}\cdot\vec{\mu}_{z}\wedge\text{$\displaystyle z$ is heavy}]\right]\leq n^{-\omega(1)}.

Recall that Pr[𝗌𝗂𝗓𝖾]1nω(1)\displaystyle\Pr[\mathcal{E}_{\sf size}]\geq 1-n^{-\omega(1)}. By Markov’s inequality, we have Pr[𝗁𝖾𝖺𝗏𝗒|𝗌𝗂𝗓𝖾]1nω(1)\displaystyle\Pr[\mathcal{E}_{\sf heavy}|\mathcal{E}_{\sf size}]\geq 1-n^{-\omega(1)}. ∎

4.3 DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} Implies DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} with Stronger Privacy Bound

In this section, we prove a stronger connection between DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} and DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} than previously known. Together with Theorem 4.1, it implies the private-coin version of Theorem 1.4.

We first need a technical lemma which gives a lower bound on the hockey stick divergence between 𝖡𝖾𝗋(α)+𝖡𝗂𝗇(m,β)\displaystyle\mathsf{Ber}(\alpha)+\mathsf{Bin}(m,\beta) and 𝖡𝖾𝗋(β)+𝖡𝗂𝗇(m,β)\displaystyle\mathsf{Ber}(\beta)+\mathsf{Bin}(m,\beta). We defer its proof to Appendix B.

Lemma 4.8.

There exists an absolute constant c0\displaystyle c_{0} such that, for every integer m1\displaystyle m\geq 1, three reals α,β,ε>0\displaystyle\alpha,\beta,\varepsilon>0 such that α>eεβ\displaystyle\alpha>e^{\varepsilon}\beta, letting Δ=αeεβ\displaystyle\Delta=\alpha-e^{\varepsilon}\beta and supposing 4eεΔβ<1/2\displaystyle 4\frac{e^{\varepsilon}}{\Delta}\beta<1/2, it holds that

dε(𝖡𝖾𝗋(α)+𝖡𝗂𝗇(m,β)||𝖡𝖾𝗋(β)+𝖡𝗂𝗇(m,β))Δ122mexp(c0meεΔβ[log(Δ1)+1]).\displaystyle d_{\varepsilon}(\mathsf{Ber}(\alpha)+\mathsf{Bin}(m,\beta)||\mathsf{Ber}(\beta)+\mathsf{Bin}(m,\beta))\geq\Delta\cdot\frac{1}{2\sqrt{2m}}\cdot\exp\left(-c_{0}\cdot m\cdot\frac{e^{\varepsilon}}{\Delta}\beta\cdot\left[\log(\Delta^{-1})+1\right]\right).
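For moderate m, the left-hand side of Lemma 4.8 can be computed exactly; the Python sketch below (with illustrative parameters of our choosing) builds the two probability mass functions and evaluates the hockey stick divergence, which is useful for getting a feel for the bound:

import numpy as np
from scipy.stats import binom

def ber_plus_bin_pmf(p_ber, m, p_bin):
    # pmf of Ber(p_ber) + Bin(m, p_bin) on {0, 1, ..., m + 1}.
    k = np.arange(m + 1)
    bin_pmf = binom.pmf(k, m, p_bin)
    pmf = np.zeros(m + 2)
    pmf[:m + 1] += (1 - p_ber) * bin_pmf     # Bernoulli outcome 0
    pmf[1:] += p_ber * bin_pmf               # Bernoulli outcome 1 shifts everything by one
    return pmf

def hockey_stick(P, Q, eps):
    # d_eps(P || Q) = sum_x [P_x - e^eps * Q_x]_+ .
    return float(np.sum(np.maximum(P - np.exp(eps) * Q, 0.0)))

m, eps = 1000, 0.5
alpha, beta = 0.05, 1e-4
X = ber_plus_bin_pmf(alpha, m, beta)
Y = ber_plus_bin_pmf(beta, m, beta)
print(hockey_stick(X, Y, eps))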

We are now ready to prove the main lemma of this subsection.

Lemma 4.9.

For all ε=O(1)\displaystyle\varepsilon=O(1), there is a constant c>0\displaystyle c>0 such that for all δδ01/n\displaystyle\delta\leq\delta_{0}\leq 1/n if the randomizer R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} on n\displaystyle n users, then R\displaystyle R is (ln(n/clnδ1lnδ01),δ0)\displaystyle\left(\ln\left(n\Big{/}\frac{c\ln\delta^{-1}}{\ln\delta_{0}^{-1}}\right),\delta_{0}\right)-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}.

Proof.

Let ε=O(1)\displaystyle\varepsilon=O(1). Note that we can assume that δδ0ω(1)nω(1)\displaystyle\delta\leq\delta_{0}^{\omega(1)}\leq n^{-\omega(1)}, as otherwise lnδ1lnδ01O(1)\displaystyle\frac{\ln\delta^{-1}}{\ln\delta_{0}^{-1}}\leq O(1) and in this case the lemma follows directly (for a sufficiently small choice of the constant c\displaystyle c) from the fact that R\displaystyle R is (ε+lnn,δ)\displaystyle(\varepsilon+\ln n,\delta)-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} [CSU+18].

Suppose that R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} on n\displaystyle n users. Let c\displaystyle c be a constant to be fixed later, D=n/clnδ1lnδ01\displaystyle D=n\Big{/}\frac{c\ln\delta^{-1}}{\ln\delta_{0}^{-1}} and E\displaystyle E\subseteq\mathcal{M} be an event. Our goal is to show that

R(x)ER(y)ED+δ0,\displaystyle R(x)_{E}\leq R(y)_{E}\cdot D+\delta_{0},

for all x,y𝒳\displaystyle x,y\in\mathcal{X}.

Fix two x,y𝒳\displaystyle x,y\in\mathcal{X}. Let α=R(x)E\displaystyle\alpha=R(x)_{E} and β=R(y)E\displaystyle\beta=R(y)_{E}. Note that without loss of generality we can assume that β1/D\displaystyle\beta\leq 1/D, as otherwise clearly α1Dβ+δ0\displaystyle\alpha\leq 1\leq D\cdot\beta+\delta_{0}.

Let W1=xyn1\displaystyle W_{1}=xy^{n-1} and W2=yn\displaystyle W_{2}=y^{n} be two neighboring datasets, and X,Y\displaystyle X,Y be the random variables corresponding to the number of occurrences of the event E\displaystyle E in the transcript, when running the protocol with randomizer R\displaystyle R on datasets W1\displaystyle W_{1} and W2\displaystyle W_{2}, respectively.

From the assumption that R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1}, we have dε(𝖧𝗂𝗌𝗍R(W1)||𝖧𝗂𝗌𝗍R(W2))δ\displaystyle d_{\varepsilon}(\mathsf{Hist}_{R}(W_{1})||\mathsf{Hist}_{R}(W_{2}))\leq\delta. Then, by the post-processing property of DP (Lemma 3.2), it follows that dε(X||Y)δ\displaystyle d_{\varepsilon}(X||Y)\leq\delta.

We have that

X=𝖡𝖾𝗋(α)+𝖡𝗂𝗇(n1,β) and Y=𝖡𝖾𝗋(β)+𝖡𝗂𝗇(n1,β).\displaystyle X=\mathsf{Ber}(\alpha)+\mathsf{Bin}(n-1,\beta)\text{ and }Y=\mathsf{Ber}(\beta)+\mathsf{Bin}(n-1,\beta).

The goal now is to show that if α>βD+δ0\displaystyle\alpha>\beta\cdot D+\delta_{0}, then X\displaystyle X and Y\displaystyle Y do not satisfy (ε,δ)\displaystyle(\varepsilon,\delta)-DP (i.e., dε(X||Y)>δ\displaystyle d_{\varepsilon}(X||Y)>\delta), thus obtaining a contradiction.

Now, assume that α>Dβ+δ0\displaystyle\alpha>D\cdot\beta+\delta_{0}. Since ε=O(1)\displaystyle\varepsilon=O(1), we have that Δ=αeεβ>D2β+δ0\displaystyle\Delta=\alpha-e^{\varepsilon}\beta>\frac{D}{2}\cdot\beta+\delta_{0}. Note that 4eεΔβ=O(D1)<1/2\displaystyle 4\frac{e^{\varepsilon}}{\Delta}\beta=O(D^{-1})<1/2.

Letting m=n1\displaystyle m=n-1 and applying Lemma 4.8 for a universal constant c0\displaystyle c_{0}, it follows that

dε(𝖡𝖾𝗋(α)+𝖡𝗂𝗇(m,β)||𝖡𝖾𝗋(β)+𝖡𝗂𝗇(m,β))Δ122mexp(c0meεΔβ[log(Δ1)+1]).\displaystyle d_{\varepsilon}(\mathsf{Ber}(\alpha)+\mathsf{Bin}(m,\beta)||\mathsf{Ber}(\beta)+\mathsf{Bin}(m,\beta))\geq\Delta\cdot\frac{1}{2\sqrt{2m}}\cdot\exp\left(-c_{0}\cdot m\cdot\frac{e^{\varepsilon}}{\Delta}\beta\cdot\left[\log(\Delta^{-1})+1\right]\right).

Noting that ε=O(1)\displaystyle\varepsilon=O(1), we have that

meεΔβ[log(Δ1)+1]O(m1Dββlogδ01)=O(clnδ1).\displaystyle m\cdot\frac{e^{\varepsilon}}{\Delta}\beta\cdot\left[\log(\Delta^{-1})+1\right]\leq O\left(m\frac{1}{D\beta}\cdot\beta\cdot\log\delta_{0}^{-1}\right)=O(c\ln\delta^{-1}).

We now set the constant c\displaystyle c to be small enough so that

c0meεΔβ[log(Δ1)+1]12lnδ1.\displaystyle c_{0}\cdot m\cdot\frac{e^{\varepsilon}}{\Delta}\beta\cdot\left[\log(\Delta^{-1})+1\right]\leq\frac{1}{2}\ln\delta^{-1}.

Plugging in and recalling that δδ0ω(1)nω(1)\displaystyle\delta\leq\delta_{0}^{\omega(1)}\leq n^{-\omega(1)}, we get that

dε(X||Y)=dε(𝖡𝖾𝗋(α)+𝖡𝗂𝗇(m,β)||𝖡𝖾𝗋(β)+𝖡𝗂𝗇(m,β))δ0122mδ>δ,\displaystyle d_{\varepsilon}(X||Y)=d_{\varepsilon}(\mathsf{Ber}(\alpha)+\mathsf{Bin}(m,\beta)||\mathsf{Ber}(\beta)+\mathsf{Bin}(m,\beta))\geq\delta_{0}\cdot\frac{1}{2\sqrt{2m}}\sqrt{\delta}>\delta,

a contradiction. ∎

Finally, we are ready to prove our DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} lower bound for CountDistinct in the private-coin case.

Theorem 4.10.

For all ε=O(1)\displaystyle\varepsilon=O(1), there are δ=2Θ(log8n)\displaystyle\delta=2^{-\Theta(\log^{8}n)} and D=Θ(n/log4n)\displaystyle D=\Theta(n/\log^{4}n) such that no private-coin (ε,δ)\displaystyle(\varepsilon,\delta)-DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} protocol on n\displaystyle n users can solve CountDistinctn,D\displaystyle\textsf{\small CountDistinct}_{n,D} with error o(D)\displaystyle o(D) and probability at least 0.99\displaystyle 0.99.

Proof.

We set δ0=2log2n\displaystyle\delta_{0}=2^{-\log^{2}n} and δ=2clog8n\displaystyle\delta=2^{-c\log^{8}n} for a constant c\displaystyle c to be specified shortly. By Lemma 4.9, it follows that the corresponding randomizer R\displaystyle R is (ln(n/Θ(clog6n)),nω(1))\displaystyle(\ln(n/\Theta(c\log^{6}n)),n^{-\omega(1)})-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}. Setting c\displaystyle c to be sufficiently large and combining with Theorem 4.1 completes the proof. ∎

4.4 Generalizing to Public-Coin Protocols

Finally, we generalize our proof for the private-coin case to the public-coin case, and prove Theorem 1.4 (restated below).


Theorem 1.4. (restated) For all ε=O(1)\displaystyle\varepsilon=O(1), there are δ=2Θ(log8n)\displaystyle\delta=2^{-\Theta(\log^{8}n)} and D=Θ(n/log4n)\displaystyle D=\Theta(n/\log^{4}n) such that no public-coin (ε,δ)\displaystyle(\varepsilon,\delta)-DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} protocol on n\displaystyle n users can solve CountDistinctn,D\displaystyle\textsf{\small CountDistinct}_{n,D} with error o(D)\displaystyle o(D) and probability at least 0.99\displaystyle 0.99.

Fix R\displaystyle R to be a public-coin randomizer with public randomness α\displaystyle\alpha from distribution 𝒟𝗉𝗎𝖻\displaystyle\mathcal{D}_{\sf pub}. We first generalize Lemma 4.9, and show that if R\displaystyle R is DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1}, then with high probability over α𝒟𝗉𝗎𝖻\displaystyle\alpha\leftarrow\mathcal{D}_{\sf pub}, Rα\displaystyle R_{\alpha} satisfies a DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} guarantee similar to that in Lemma 4.9.

Lemma 4.11.

For all ε=O(1)\displaystyle\varepsilon=O(1), there is a constant c>0\displaystyle c>0 such that for all δδ01/n\displaystyle\delta\leq\delta_{0}\leq 1/n if the public-coin randomizer R:𝒳\displaystyle R\colon\mathcal{X}\to\mathcal{M} with public randomness distribution 𝒟𝗉𝗎𝖻\displaystyle\mathcal{D}_{\sf pub} is (ε,δ)\displaystyle(\varepsilon,\delta)-DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} on n\displaystyle n users, and if |𝒳|n\displaystyle|\mathcal{X}|\leq n, then with probability at least 1δ0\displaystyle 1-\delta_{0} over α𝒟𝗉𝗎𝖻\displaystyle\alpha\leftarrow\mathcal{D}_{\sf pub}, it is the case that Rα\displaystyle R_{\alpha} is (ln(n/clnδ1lnδ01),δ0)\displaystyle\left(\ln\left(n\Big{/}\frac{c\ln\delta^{-1}}{\ln\delta_{0}^{-1}}\right),\delta_{0}\right)-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}.

Proof Sketch.

Similar to the proof of Lemma 4.9, we can assume that δδ0ω(1)nω(1)\displaystyle\delta\leq\delta_{0}^{\omega(1)}\leq n^{-\omega(1)} without loss of generality.

By the definition of public-coin (ε,δ)\displaystyle(\varepsilon,\delta)-DP in the shuffle model, for every two neighboring datasets W1\displaystyle W_{1} and W2\displaystyle W_{2}, we have that

𝔼α𝒟𝗉𝗎𝖻[dε(𝖧𝗂𝗌𝗍Rα(W1)||𝖧𝗂𝗌𝗍Rα(W2))]δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\alpha\leftarrow\mathcal{D}_{\sf pub}}[d_{\varepsilon}(\mathsf{Hist}_{R_{\alpha}}(W_{1})||\mathsf{Hist}_{R_{\alpha}}(W_{2}))]\leq\delta.

Observe that the proof of Lemma 4.9 only considers the |𝒳|2\displaystyle|\mathcal{X}|^{2} pairs of neighboring datasets of the form W1=xyn1\displaystyle W_{1}=xy^{n-1} and W2=yn\displaystyle W_{2}=y^{n} for all x,y𝒳\displaystyle x,y\in\mathcal{X}. We use Wgood\displaystyle W_{good} to denote the set of such pairs.

Using the assumption that |𝒳|n\displaystyle|\mathcal{X}|\leq n, we have that

𝔼α𝒟𝗉𝗎𝖻(W1,W2)Wgood[dε(𝖧𝗂𝗌𝗍Rα(W1)||𝖧𝗂𝗌𝗍Rα(W2))]|𝒳|2δn2δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\alpha\leftarrow\mathcal{D}_{\sf pub}}\sum_{(W_{1},W_{2})\in W_{good}}[d_{\varepsilon}(\mathsf{Hist}_{R_{\alpha}}(W_{1})||\mathsf{Hist}_{R_{\alpha}}(W_{2}))]\leq|\mathcal{X}|^{2}\delta\leq n^{2}\delta.

Thus, by Markov’s inequality, with probability at least 1δ0\displaystyle 1-\delta_{0} over α𝒟𝗉𝗎𝖻\displaystyle\alpha\leftarrow\mathcal{D}_{\sf pub}, we have that

(W1,W2)Wgood[dε(𝖧𝗂𝗌𝗍Rα(W1)||𝖧𝗂𝗌𝗍Rα(W2))]n2δ/δ0δ0.9,\displaystyle\sum_{(W_{1},W_{2})\in W_{good}}[d_{\varepsilon}(\mathsf{Hist}_{R_{\alpha}}(W_{1})||\mathsf{Hist}_{R_{\alpha}}(W_{2}))]\leq n^{2}\delta/\delta_{0}\leq\delta^{0.9},

where the last inequality follows from our assumption that δδ0ω(1)nω(1)\displaystyle\delta\leq\delta_{0}^{\omega(1)}\leq n^{-\omega(1)}. We say an α\displaystyle\alpha is good if it satisfies the above inequality. In particular, for all good α\displaystyle\alpha and all pairs (W1,W2)Wgood\displaystyle(W_{1},W_{2})\in W_{good}, we have that

dε(𝖧𝗂𝗌𝗍Rα(W1)||𝖧𝗂𝗌𝗍Rα(W2))δ0.9.\displaystyle d_{\varepsilon}(\mathsf{Hist}_{R_{\alpha}}(W_{1})||\mathsf{Hist}_{R_{\alpha}}(W_{2}))\leq\delta^{0.9}.

The proof of Lemma 4.9 then implies that Rα\displaystyle R_{\alpha} is (ln(n/clnδ1lnδ01),δ0)\displaystyle\left(\ln\left(n\Big{/}\frac{c\ln\delta^{-1}}{\ln\delta_{0}^{-1}}\right),\delta_{0}\right)-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}, for a constant c\displaystyle c depending on ε\displaystyle\varepsilon. ∎

Now we are ready to prove Theorem 1.4.

Proof of Theorem 1.4.

We use 𝒟¯U,E\displaystyle\bar{\mathcal{D}}^{U,E} and 𝒟¯V,E\displaystyle\bar{\mathcal{D}}^{V,E} to denote the same distributions constructed in the proof of Theorem 4.10. We moreover use the same notation as in Section 4.2.

By a simple application of Markov’s inequality and noting that our assumed protocol solves CountDistinctn,D\displaystyle\textsf{\small CountDistinct}_{n,D} with error o(D)\displaystyle o(D) and probability at least 0.99\displaystyle 0.99, it follows that with probability at least 0.9\displaystyle 0.9 over α𝒟𝗉𝗎𝖻\displaystyle\alpha\leftarrow\mathcal{D}_{\sf pub}, if |E|0.02|D|\displaystyle|E|\leq 0.02\cdot|D|, then

𝖧𝗂𝗌𝗍Rα(𝒟¯U,E)𝖧𝗂𝗌𝗍Rα(𝒟¯V,E)TV=Ω(1).\displaystyle\|\mathsf{Hist}_{R_{\alpha}}(\bar{\mathcal{D}}^{U,E})-\mathsf{Hist}_{R_{\alpha}}(\bar{\mathcal{D}}^{V,E})\|_{TV}=\Omega(1).

By Lemma 4.11, with probability at least 1δ0\displaystyle 1-\delta_{0} over α𝒟𝗉𝗎𝖻\displaystyle\alpha\leftarrow\mathcal{D}_{\sf pub}, we have that Rα\displaystyle R_{\alpha} is (ln(n/clnδ1lnδ01),δ0)\displaystyle\left(\ln\left(n\Big{/}\frac{c\ln\delta^{-1}}{\ln\delta_{0}^{-1}}\right),\delta_{0}\right)-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}. We say that such an α\displaystyle\alpha is good.

For all good α\displaystyle\alpha and for a good subset E\displaystyle E, when the randomizer is set to Rα\displaystyle R_{\alpha} (note that the definition of a good subset depends on the randomizer R\displaystyle R), by a similar argument as in Theorem 4.10, we have

𝖧𝗂𝗌𝗍Rα(𝒟¯U,E)𝖧𝗂𝗌𝗍Rα(𝒟¯V,E)TV=o(1).\displaystyle\|\mathsf{Hist}_{R_{\alpha}}(\bar{\mathcal{D}}^{U,E})-\mathsf{Hist}_{R_{\alpha}}(\bar{\mathcal{D}}^{V,E})\|_{TV}=o(1).

Now, by Lemma 4.5, if we construct E\displaystyle E by including each element of D\displaystyle D with probability 0.01\displaystyle 0.01, then for every good α\displaystyle\alpha, we know that E\displaystyle E is good for randomizer Rα\displaystyle R_{\alpha} with probability at least 1nω(1)\displaystyle 1-n^{-\omega(1)}. By a union bound, there exists a fixed choice of E\displaystyle E such that E\displaystyle E is good for randomizer Rα\displaystyle R_{\alpha} with probability at least 11/n\displaystyle 1-1/n over α𝒟𝗉𝗎𝖻\displaystyle\alpha\leftarrow\mathcal{D}_{\sf pub}. In the following, we fix E\displaystyle E to be such a choice.

Finally, by a union bound, it follows that with probability at least 0.9δ01/n>0\displaystyle 0.9-\delta_{0}-1/n>0, the above two inequalities hold simultaneously, a contradiction. ∎

Theorem 1.2 follows from an argument similar to the one in the proof of Theorem 1.4 (in fact, the argument is simpler in the local case because there is no need to apply Lemma 4.11).

5 (ε,δ)\displaystyle(\varepsilon,\delta)-Dominated Algorithms

In [CSU+18], it was shown that an (ε,δ)\displaystyle(\varepsilon,\delta)-DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} protocol on n\displaystyle n users is also (ε+lnn,δ)\displaystyle(\varepsilon+\ln n,\delta)-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}, thereby reducing the problem of proving lower bounds for DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} protocols to that of proving lower bounds against DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} protocols with very weak privacy guarantees. However, it is known that such a connection does not hold even for DPshuffle2\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{2} protocols [BC20, Section 4.1].

Recall the definition of (ε,δ)\displaystyle(\varepsilon,\delta)-dominated algorithms from Definition 1.7. In this section, we will show that DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k} protocols are dominated.

For clarity of exposition, we will assume that each user sends exactly k\displaystyle k messages; this is without loss of generality (see Footnote 12). To handle public-coin protocols, we need a relaxed version of dominated algorithms.

Definition 5.1 (Dominated Algorithms).

For a distribution μ\displaystyle\mu on 𝒳\displaystyle\mathcal{X}, we say an algorithm R\displaystyle R is (ε,δ,μ)\displaystyle(\varepsilon,\delta,\mu)-dominated, if for the distribution 𝒟μ=𝔼xμR(x)\displaystyle\mathcal{D}_{\mu}=\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mu}R(x), there exists a distribution 𝒟\displaystyle\mathcal{D} on k\displaystyle\mathcal{M}^{k} such that

dε(𝒟μ||𝒟)δ.\displaystyle d_{\varepsilon}\left(\mathcal{D}_{\mu}||\mathcal{D}\right)\leq\delta.

In this case, we also say R\displaystyle R is (ε,δ,μ)\displaystyle(\varepsilon,\delta,\mu)-dominated by 𝒟\displaystyle\mathcal{D}.
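
For concreteness, the following Python sketch shows how one can check the condition in Definition 5.1 for a small, explicitly given randomizer, assuming d_ε is the hockey-stick divergence d_ε(P||Q) = Σ_z max(P(z) − e^ε·Q(z), 0); the toy randomizer (binary randomized response), the distribution μ, and the candidate dominating distribution are chosen only for illustration.

import math

def hockey_stick(p, q, eps):
    # d_eps(P || Q) = sum_z max(P(z) - e^eps * Q(z), 0)
    return sum(max(pz - math.exp(eps) * q.get(z, 0.0), 0.0) for z, pz in p.items())

def mixture(dists, weights):
    # convex combination of distributions given as dicts
    out = {}
    for d, w in zip(dists, weights):
        for z, pz in d.items():
            out[z] = out.get(z, 0.0) + w * pz
    return out

def is_dominated(R, mu, D, eps, delta):
    # R maps each input x to its output distribution R(x); mu is a distribution over inputs.
    # Checks d_eps(D_mu || D) <= delta for D_mu = E_{x <- mu} R(x).
    D_mu = mixture([R[x] for x in mu], [mu[x] for x in mu])
    return hockey_stick(D_mu, D, eps) <= delta

# Toy example: binary randomized response, uniform mu, and a candidate dominating distribution.
R = {0: {0: 0.75, 1: 0.25}, 1: {0: 0.25, 1: 0.75}}
mu = {0: 0.5, 1: 0.5}
D = {0: 0.5, 1: 0.5}
print(is_dominated(R, mu, D, eps=0.0, delta=0.0))  # True: here D_mu equals D exactly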

5.1 Approximate-DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} Protocols are Dominated

Next we show that approximate DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} protocols are dominated. For this purpose, we introduce the concept of “pseudo-locally private” algorithms, which is a special case of dominated algorithms, and may be interesting in its own right.

5.1.1 Merged Randomizer

Let n,k\displaystyle\mathcal{B}_{n,k} be the set of all k\displaystyle k-sized subsets of the set [n]×[k]\displaystyle[n]\times[k]. For n,k\displaystyle\mathcal{F}\in\mathcal{B}_{n,k} and a randomizer R\displaystyle R, we define the merged randomizer of R\displaystyle R with respect to \displaystyle\mathcal{F}, denoted by R\displaystyle R^{\mathcal{F}}, as follows:

R(x)\displaystyle R^{\mathcal{F}}(x)

  • Given an input x\displaystyle x, for each j[n]\displaystyle j\in[n], we simulate R(x)\displaystyle R(x) with independent random coins to get an output wjk\displaystyle w_{j}\in\mathcal{M}^{k}.

  • Assume that \displaystyle\mathcal{F} consists of elements (x1,y1),,(xk,yk)[n]×[k]\displaystyle(x_{1},y_{1}),\dotsc,(x_{k},y_{k})\in[n]\times[k] indexed in lexicographical order. We construct a k\displaystyle k-tuple zk\displaystyle z\in\mathcal{M}^{k} such that zi=(wxi)yi\displaystyle z_{i}=(w_{x_{i}})_{y_{i}} for each i[k]\displaystyle i\in[k].

  • We pre-shuffle z\displaystyle z before outputting it. That is, we draw a permutation π:[k][k]\displaystyle\pi\colon[k]\to[k] uniformly at random, shuffle z\displaystyle z according to π\displaystyle\pi to obtain a new k\displaystyle k-tuple z~\displaystyle\widetilde{z} (by setting z~i=zπ(i)\displaystyle\widetilde{z}_{i}=z_{\pi(i)} for each i[k]\displaystyle i\in[k]), and output z~\displaystyle\widetilde{z}.

That is, R(x)\displaystyle R^{\mathcal{F}}(x) runs R(x)\displaystyle R(x) several times, and merges the obtained outputs according to \displaystyle\mathcal{F}. We now define a distribution 𝒟n,k\displaystyle\mathcal{D}_{n,k} on n,k\displaystyle\mathcal{B}_{n,k} as follows: to draw a sample from 𝒟n,k\displaystyle\mathcal{D}_{n,k}, we simply draw k\displaystyle k items {(xi,yi)}i[k]\displaystyle\{(x_{i},y_{i})\}_{i\in[k]} without replacement from the set [n]×[k]\displaystyle[n]\times[k].

Finally, for a randomizer R\displaystyle R, we define the randomizer R𝗋𝖺𝗇𝖽\displaystyle R^{\sf rand} as follows: Given an input x\displaystyle x, we first draw \displaystyle\mathcal{F} from 𝒟n,k\displaystyle\mathcal{D}_{n,k}, and then simulate R\displaystyle R^{\mathcal{F}} on the input x\displaystyle x and output its output.
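
As a concrete illustration, the following Python sketch implements the merged randomizer R^F and the randomizer R^rand exactly as described above: it runs R(x) independently n times, extracts the k coordinates indexed by F (in lexicographical order), pre-shuffles them, and, for R^rand, first draws F by sampling k pairs from [n]×[k] without replacement. The toy two-message randomizer at the end is supplied only so that the sketch runs.

import random

def merged_randomizer(R, x, F, n, k):
    # Run R(x) independently for each of the n virtual users; each run returns a k-tuple.
    w = [R(x) for _ in range(n)]
    # Pick the k messages indexed by F (in lexicographical order), then pre-shuffle.
    z = [w[i][j] for (i, j) in sorted(F)]
    random.shuffle(z)
    return tuple(z)

def R_rand(R, x, n, k):
    # Draw F: k pairs sampled without replacement from [n] x [k].
    F = random.sample([(i, j) for i in range(n) for j in range(k)], k)
    return merged_randomizer(R, x, F, n, k)

# Toy 2-message randomizer over a binary input (for demonstration only).
def toy_R(x, k=2):
    return tuple((x + random.randint(0, 1)) % 2 for _ in range(k))

print(R_rand(toy_R, x=1, n=5, k=2))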

5.1.2 Pseudo-Locally Private Algorithms

We are now ready to define pseudo-locally private algorithms.

Definition 5.2.

We say that an algorithm R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-pseudo-locally private, if for all x,y𝒳\displaystyle x,y\in\mathcal{X} and all Ek\displaystyle E\subseteq\mathcal{M}^{k},

Pr[R(x)E]eεPr[R𝗋𝖺𝗇𝖽(y)E]+δ.\displaystyle\Pr[R(x)\in E]\leq e^{\varepsilon}\cdot\Pr[R^{\sf rand}(y)\in E]+\delta.
Remark 5.3.

If R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-pseudo-locally private, then clearly R\displaystyle R is also (ε,δ)\displaystyle(\varepsilon,\delta)-dominated; we can simply take 𝒟=R𝗋𝖺𝗇𝖽(y)\displaystyle\mathcal{D}=R^{\sf rand}(y^{*}) for any fixed y\displaystyle y^{*}.

5.1.3 Multi-Message DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} Protocols are Pseudo-Locally Private

Our most crucial observation here is an analogue of [CSU+18, Theorem 6.2] for multi-message DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} protocols. Namely, we show that any multi-message DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} protocol is pseudo-locally private.

Lemma 5.4.

If R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k} on n\displaystyle n users, then it is (ε+k(1+lnn),δ)\displaystyle(\varepsilon+k(1+\ln n),\delta)-pseudo-locally private.

Proof.

Suppose otherwise, i.e., that there are x,y\displaystyle x,y and Ek\displaystyle E\subseteq\mathcal{M}^{k} such that

Pr[R(x)E]>(en)keε𝔼𝒟n,k[Pr[R(y)E]]+δ.\displaystyle\displaystyle\Pr[R(x)\in E]>(en)^{k}e^{\varepsilon}\cdot\operatornamewithlimits{\mathbb{E}}_{\mathcal{F}\leftarrow\mathcal{D}_{n,k}}\left[\Pr\left[R^{\mathcal{F}}(y)\in E\right]\right]+\delta.

Note that since both R\displaystyle R and R\displaystyle R^{\mathcal{F}} pre-shuffle their outputs before outputting them, we can assume that E\displaystyle E is a union of equivalence classes of k\displaystyle k-tuples (we say two k\displaystyle k-tuples u,vk\displaystyle u,v\in\mathcal{M}^{k} are equivalent if v\displaystyle v can be obtained from u\displaystyle u by applying a permutation). To see this, we can take E\displaystyle E to be {zk:R(x)z>(en)keεR𝗋𝖺𝗇𝖽(y)z}\displaystyle\{z\in\mathcal{M}^{k}:R(x)_{z}>(en)^{k}e^{\varepsilon}R^{\sf rand}(y)_{z}\}. One can see that if u\displaystyle u and v\displaystyle v are equivalent up to a permutation, then either both u\displaystyle u and v\displaystyle v are in E\displaystyle E, or neither of them is in E\displaystyle E.

Consider two datasets X0=yn\displaystyle X_{0}=y^{n} and X1=yn1x\displaystyle X_{1}=y^{n-1}x. Let P\displaystyle P be the corresponding DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} protocol with randomizer R\displaystyle R. For a dataset X\displaystyle X, we use PR(X)\displaystyle P_{R}(X) to denote the random variable of the transcript of P\displaystyle P before shuffling. That is, for a dataset X=(xi)i[n]\displaystyle X=(x_{i})_{i\in[n]}, PR(X)\displaystyle P_{R}(X) is the concatenation of all R(Xi)\displaystyle R(X_{i}) for i\displaystyle i from 1\displaystyle 1 to n\displaystyle n.

We now define an event \displaystyle\mathcal{E} as “there exist k\displaystyle k messages in the transcript of P\displaystyle P that form a k\displaystyle k-tuple in E\displaystyle E”. It immediately follows that

Pr[PR(X1)]Pr[R(x)E].\displaystyle\displaystyle\Pr[P_{R}(X_{1})\in\mathcal{E}]\geq\Pr[R(x)\in E]. (2)

Furthermore, we claim that

Pr[PR(X0)](knk)𝔼𝒟n,k[Pr[R(y)E]].\displaystyle\displaystyle\Pr[P_{R}(X_{0})\in\mathcal{E}]\leq\binom{kn}{k}\cdot\operatornamewithlimits{\mathbb{E}}_{\mathcal{F}\leftarrow\mathcal{D}_{n,k}}\left[\Pr\left[R^{\mathcal{F}}(y)\in E\right]\right]. (3)

To see why the above inequality holds, note that if we pick k\displaystyle k messages from PR(X0)\displaystyle P_{R}(X_{0}), then, depending on which users these messages come from, the probability that they form a k\displaystyle k-tuple in E\displaystyle E is bounded by Pr[R(y)E]\displaystyle\Pr[R^{\mathcal{F}}(y)\in E] for a certain n,k\displaystyle\mathcal{F}\in\mathcal{B}_{n,k}.

Moreover, if we pick these k\displaystyle k messages uniformly at random from all (knk)\displaystyle\binom{kn}{k} possible k\displaystyle k-tuples, the corresponding \displaystyle\mathcal{F} is distributed according to 𝒟n,k\displaystyle\mathcal{D}_{n,k}. Therefore, we can apply a union bound over all (knk)\displaystyle\binom{kn}{k} possible k\displaystyle k-tuples and sum up the corresponding Pr[R(y)E]\displaystyle\Pr[R^{\mathcal{F}}(y)\in E] to obtain an upper bound on Pr[PR(X0)]\displaystyle\Pr[P_{R}(X_{0})\in\mathcal{E}]. The aforementioned sum is precisely (knk)\displaystyle\binom{kn}{k} times the expectation 𝔼𝒟n,k[Pr[R(y)E]]\displaystyle\operatornamewithlimits{\mathbb{E}}_{\mathcal{F}\leftarrow\mathcal{D}_{n,k}}\left[\Pr\left[R^{\mathcal{F}}(y)\in E\right]\right].

Since (knk)(en)k\displaystyle\binom{kn}{k}\leq(en)^{k}, we may combine (2) and (3) to obtain

Pr[PR(X1)]>eεPr[PR(X0)]+δ.\Pr[P_{R}(X_{1})\in\mathcal{E}]>e^{\varepsilon}\cdot\Pr[P_{R}(X_{0})\in\mathcal{E}]+\delta. (4)

Finally, note that applying a random permutation to the transcript does not change whether the event \displaystyle\mathcal{E} occurs. Therefore, (4) contradicts the assumption that R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}}. ∎
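
The counting step in the proof only uses the elementary bound (kn choose k) ≤ (en)^k; the following check (illustrative only, not part of the argument) confirms it for a few small values of n and k.

import math

for n in (2, 5, 10, 100):
    for k in (1, 2, 5, 10):
        assert math.comb(k * n, k) <= (math.e * n) ** k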

From Remark 5.3, we get the following corollary:

Corollary 5.5.

If R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k} on n\displaystyle n users, then it is (ε+k(1+lnn),δ)\displaystyle(\varepsilon+k(1+\ln n),\delta)-dominated.

Next, we generalize Corollary 5.5 to the public-coin case:

Lemma 5.6.

If R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k} in the n\displaystyle n-user public-coin setting with public randomness from 𝒟𝗉𝗎𝖻\displaystyle\mathcal{D}_{\sf pub} and τ=ε+k(1+lnn)\displaystyle\tau=\varepsilon+k(1+\ln n), then there is a family of distributions {𝒟α}αsupp(𝒟𝗉𝗎𝖻)\displaystyle\{\mathcal{D}_{\alpha}\}_{\alpha\in\mathrm{supp}(\mathcal{D}_{\sf pub})} over k\displaystyle\mathcal{M}^{k} such that for every distribution μ\displaystyle\mu over 𝒳\displaystyle\mathcal{X},

𝔼α𝒟𝗉𝗎𝖻dτ(𝔼xμRα(x)||𝒟α)δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\alpha\leftarrow\mathcal{D}_{\sf pub}}d_{\tau}\left(\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mu}R_{\alpha}(x)||\mathcal{D}_{\alpha}\right)\leq\delta.

In other words, there are reals {δα}αsupp(𝒟𝗉𝗎𝖻)\displaystyle\{\delta_{\alpha}\}_{\alpha\in\mathrm{supp}(\mathcal{D}_{\sf pub})} such that 𝔼α𝒟𝗉𝗎𝖻[δα]δ\displaystyle\operatornamewithlimits{\mathbb{E}}_{\alpha\leftarrow\mathcal{D}_{\sf pub}}[\delta_{\alpha}]\leq\delta and Rα\displaystyle R_{\alpha} is (τ,δα,μ)\displaystyle(\tau,\delta_{\alpha},\mu)-dominated by 𝒟α\displaystyle\mathcal{D}_{\alpha}.

Proof.

From the proof of Lemma 5.4, it follows that for all x,y𝒳\displaystyle x,y\in\mathcal{X} and αsupp(𝒟𝗉𝗎𝖻)\displaystyle\alpha\in\mathrm{supp}(\mathcal{D}_{\sf pub}), we have that

dτ(Rα(x)||Rα𝗋𝖺𝗇𝖽(y))dε(PRα(xyn1)||PRα(yn)).\displaystyle d_{\tau}(R_{\alpha}(x)||R^{\sf rand}_{\alpha}(y))\leq d_{\varepsilon}(P_{R_{\alpha}}(xy^{n-1})||P_{R_{\alpha}}(y^{n})).

Since R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} on n\displaystyle n users, it follows that

𝔼α𝒟𝗉𝗎𝖻[dε(PRα(xyn1)||PRα(yn))]δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\alpha\leftarrow\mathcal{D}_{\sf pub}}[d_{\varepsilon}(P_{R_{\alpha}}(xy^{n-1})||P_{R_{\alpha}}(y^{n}))]\leq\delta.

Putting the above two inequalities together, for all x,y𝒳\displaystyle x,y\in\mathcal{X}, we get that

𝔼α𝒟𝗉𝗎𝖻[dτ(Rα(x)||Rα𝗋𝖺𝗇𝖽(y))]δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\alpha\leftarrow\mathcal{D}_{\sf pub}}[d_{\tau}(R_{\alpha}(x)||R^{\sf rand}_{\alpha}(y))]\leq\delta.

We now finish the proof by fixing y𝒳\displaystyle y^{*}\in\mathcal{X}, and setting 𝒟α=Rα𝗋𝖺𝗇𝖽(y)\displaystyle\mathcal{D}_{\alpha}=R^{\sf rand}_{\alpha}(y^{*}) for every αsupp(𝒟𝗉𝗎𝖻)\displaystyle\alpha\in\mathrm{supp}(\mathcal{D}_{\sf pub}). ∎

5.2 Bounding KL Divergence for Dominated Randomizers

In this subsection, we prove the technical lemma bounding average-case KL divergences for dominated randomizers.

As before, we use 𝒳\displaystyle\mathcal{X} and \displaystyle\mathcal{M} to denote the input space and the message space respectively. For a local randomizer R:𝒳\displaystyle R\colon\mathcal{X}\to\mathcal{M}, we let px,z=Pr[R(x)=z]\displaystyle p_{x,z}=\Pr[R(x)=z].

Let μ\displaystyle\mu be a distribution on 𝒳\displaystyle\mathcal{X}. Let \displaystyle\mathcal{I} be an index set, π\displaystyle\pi be a distribution on \displaystyle\mathcal{I}, and {λv}v\displaystyle\{\lambda_{v}\}_{v\in\mathcal{I}} be a family of distributions on 𝒳\displaystyle\mathcal{X}. For a constant τ\displaystyle\tau, we say that μ\displaystyle\mu τ\displaystyle\tau-dominates {λv}\displaystyle\{\lambda_{v}\} if for all x𝒳\displaystyle x\in\mathcal{X} and v\displaystyle v\in\mathcal{I}, it holds that (λv)xτμx\displaystyle(\lambda_{v})_{x}\leq\tau\cdot\mu_{x}.

Theorem 5.7.

For a constant τ2\displaystyle\tau\geq 2, let μ\displaystyle\mu be a distribution which τ\displaystyle\tau-dominates a distribution family {λv}v\displaystyle\{\lambda_{v}\}_{v\in\mathcal{I}}. Let π\displaystyle\pi be a distribution on \displaystyle\mathcal{I}. Let W:\displaystyle W\colon\mathbb{R}\to\mathbb{R} be a concave function such that for all functions ψ:𝒳0\displaystyle\psi\colon\mathcal{X}\to\mathbb{R}^{\geq 0} satisfying ψ(μ)1\displaystyle\psi(\mu)\leq 1, it holds that

𝔼vπ[(ψ(λv)ψ(μ))2]W(ψ).\displaystyle\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\left[(\psi(\lambda_{v})-\psi(\mu))^{2}\right]\leq W(\|\psi\|_{\infty}).

Then for an (ε,δ,μ)\displaystyle(\varepsilon,\delta,\mu)-dominated randomizer R\displaystyle R, it holds that

𝔼vπ[KL(R(λv)||R(μ))]2W(2eε)+4(τ1)2δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}[\mathrm{KL}(R(\lambda_{v})||R(\mu))]\leq 2W(2e^{\varepsilon})+4(\tau-1)^{2}\cdot\delta.
Proof.

Let Q=R(μ)\displaystyle Q=R(\mu). Recall that px,z=Pr[R(x)=z]\displaystyle p_{x,z}=\Pr[R(x)=z]. We also set qz=Pr[Q=z]\displaystyle q_{z}=\Pr[Q=z] and fz(x)=px,zqz\displaystyle f_{z}(x)=\frac{p_{x,z}}{q_{z}}.

It follows from the assumption that there exists a distribution q𝒟\displaystyle q^{\mathcal{D}} that (ε,δ,μ)\displaystyle(\varepsilon,\delta,\mu)-dominates R\displaystyle R. Noting that χ2\displaystyle\chi^{2}-divergence upper-bounds KL divergence (see Section 3.4), it follows that

𝔼vπKL(R(λv)||Q)\displaystyle\displaystyle\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\mathrm{KL}(R(\lambda_{v})||Q) 𝔼vπχ2(R(λv)||Q)\displaystyle\displaystyle\leq\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\chi^{2}(R(\lambda_{v})||Q)
𝔼vπ𝔼zQ[R(λv)zqzqz]2\displaystyle\displaystyle\leq\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}\left[\frac{R(\lambda_{v})_{z}-q_{z}}{q_{z}}\right]^{2}
=𝔼zQ𝔼vπ[fz(λv)1]2.\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\left[f_{z}(\lambda_{v})-1\right]^{2}.

We will further decompose fz=gz+hz\displaystyle f_{z}=g_{z}+h_{z} so that gz\displaystyle\|g_{z}\|_{\infty} is small and 𝔼zQhz(μ)\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}h_{z}(\mu) is small. Formally, for each z\displaystyle z\in\mathcal{M}, we define a truncation level

Lz=2eεqz𝒟qz.\displaystyle L_{z}=\frac{2e^{\varepsilon}\cdot q_{z}^{\mathcal{D}}}{q_{z}}.

Then, we define gz\displaystyle g_{z} and hz\displaystyle h_{z} as follows

gz(x):={fz(x)if fz(x)Lz,0otherwise,andhz(x):=fz(x)gz(x).\displaystyle g_{z}(x):=\begin{cases}f_{z}(x)&\text{if }f_{z}(x)\leq L_{z},\\ 0&\text{otherwise},\end{cases}\quad\quad\text{and}\quad\quad h_{z}(x):=f_{z}(x)-g_{z}(x).

Fix a z\displaystyle z in the support of Q\displaystyle Q. Noting that

gz(μ)+hz(μ)=fz(μ)=𝔼xμ[px,z]qz=1,\displaystyle g_{z}(\mu)+h_{z}(\mu)=f_{z}(\mu)=\frac{\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mu}[p_{x,z}]}{q_{z}}=1,

we get

𝔼vπ[fz(λv)1]2\displaystyle\displaystyle\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\left[f_{z}(\lambda_{v})-1\right]^{2} =𝔼vπ[(gz(λv)gz(μ))+(hz(λv)hz(μ))]2\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\left[(g_{z}(\lambda_{v})-g_{z}(\mu))+(h_{z}(\lambda_{v})-h_{z}(\mu))\right]^{2}
2𝔼vπ[(gz(λv)gz(μ))2]+2𝔼vπ[(hz(λv)hz(μ))2].\displaystyle\displaystyle\leq 2\cdot\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}[(g_{z}(\lambda_{v})-g_{z}(\mu))^{2}]+2\cdot\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}[(h_{z}(\lambda_{v})-h_{z}(\mu))^{2}]. (5)

To simplify the notation, in the following we set g^z(λv):=gz(λv)gz(μ)\displaystyle\hat{g}_{z}(\lambda_{v}):=g_{z}(\lambda_{v})-g_{z}(\mu) and h^z(λv)=hz(λv)hz(μ)\displaystyle\hat{h}_{z}(\lambda_{v})=h_{z}(\lambda_{v})-h_{z}(\mu). We will bound the two terms in (5) separately.

Bounding 𝔼vπg^z(λv)2\displaystyle\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\hat{g}_{z}(\lambda_{v})^{2}.

Since W\displaystyle W is concave, noting that gzLz\displaystyle\|g_{z}\|_{\infty}\leq L_{z} and gz(μ)1\displaystyle g_{z}(\mu)\leq 1, it follows that

𝔼zQ𝔼vπg^z(λv)2𝔼zQW(Lz)W(𝔼zQLz),\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\hat{g}_{z}(\lambda_{v})^{2}\leq\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}W(L_{z})\leq W\left(\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}L_{z}\right),

where the second step uses Jensen’s inequality. From the definition of Lz\displaystyle L_{z}, we have that

𝔼zQLz=zqz2eεqz𝒟qz=2eεzqz𝒟=2eε,\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}L_{z}=\sum_{z}q_{z}\cdot\frac{2e^{\varepsilon}\cdot q^{\mathcal{D}}_{z}}{q_{z}}=2e^{\varepsilon}\cdot\sum_{z}q^{\mathcal{D}}_{z}=2e^{\varepsilon},

where the last equality follows from the fact that q𝒟\displaystyle q^{\mathcal{D}} is a distribution. We therefore obtain

𝔼zQ𝔼vπg^z(λv)2W(2eε).\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\hat{g}_{z}(\lambda_{v})^{2}\leq W(2e^{\varepsilon}).
Bounding 𝔼vπh^z(λv)2\displaystyle\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\hat{h}_{z}(\lambda_{v})^{2}.

Since μ\displaystyle\mu τ\displaystyle\tau-dominates {λv}\displaystyle\{\lambda_{v}\}, it follows that

|h^z(λv)|=|hz(λv)hz(μ)|max{hz(μ),τhz(μ)hz(μ)}(τ1)hz(μ).\displaystyle|\hat{h}_{z}(\lambda_{v})|=|h_{z}(\lambda_{v})-h_{z}(\mu)|\leq\max\left\{h_{z}(\mu),\tau\cdot h_{z}(\mu)-h_{z}(\mu)\right\}\leq(\tau-1)h_{z}(\mu).

Therefore,

𝔼zQ𝔼vπh^z(λv)2(τ1)2𝔼zQhz(μ)2(τ1)2𝔼zQhz(μ),\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\hat{h}_{z}(\lambda_{v})^{2}\leq(\tau-1)^{2}\cdot\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}h_{z}(\mu)^{2}\leq(\tau-1)^{2}\cdot\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}h_{z}(\mu),

where the last inequality holds because hz(μ)1\displaystyle h_{z}(\mu)\leq 1.

By the definition of hz\displaystyle h_{z}, it follows that

𝔼zQhz(μ)=z𝔼xμ[px,z𝟙[fz(x)>Lz]].\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}h_{z}(\mu)=\sum_{z\in\mathcal{M}}\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mu}\Big{[}p_{x,z}\cdot\mathbb{1}[f_{z}(x)>L_{z}]\Big{]}.

Let 𝒯x={z:fz(x)>Lz}\displaystyle\mathcal{T}_{x}=\{z\in\mathcal{M}:f_{z}(x)>L_{z}\}. For z𝒯x\displaystyle z\in\mathcal{T}_{x}, we get that

px,z\displaystyle\displaystyle p_{x,z} >Lzqz\displaystyle\displaystyle>L_{z}\cdot q_{z}
=2eεqz𝒟qzqz=2eεqz𝒟.\displaystyle\displaystyle=\frac{2e^{\varepsilon}\cdot q_{z}^{\mathcal{D}}}{q_{z}}\cdot q_{z}=2\cdot e^{\varepsilon}\cdot q^{\mathcal{D}}_{z}.

In particular, the above means that 𝔼xμpx,𝒯x2δ\displaystyle\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mu}p_{x,\mathcal{T}_{x}}\leq 2\delta, as otherwise

𝔼xμ[px,𝒯xeεq𝒯x𝒟]𝔼xμ[px,𝒯x]/2>δ,\displaystyle\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mu}\left[p_{x,\mathcal{T}_{x}}-e^{\varepsilon}\cdot q^{\mathcal{D}}_{\mathcal{T}_{x}}\right]\geq\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mu}[p_{x,\mathcal{T}_{x}}]/2>\delta,

contradicting the fact that R\displaystyle R is (ε,δ,μ)\displaystyle(\varepsilon,\delta,\mu)-dominated by q𝒟\displaystyle q^{\mathcal{D}}. Hence, we have

𝔼zQhz(μ)=z𝔼xμ[px,z𝟙[fz(x)>Lz]]=𝔼xμpx,𝒯x2δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}h_{z}(\mu)=\sum_{z\in\mathcal{M}}\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mu}\Big{[}p_{x,z}\cdot\mathbb{1}[f_{z}(x)>L_{z}]\Big{]}=\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mu}p_{x,\mathcal{T}_{x}}\leq 2\delta.

Putting everything together, it follows that

𝔼zQ𝔼vπh^z(λv)22(τ1)2δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\hat{h}_{z}(\lambda_{v})^{2}\leq 2(\tau-1)^{2}\cdot\delta.
Final Bound.

Combining our bounds on 𝔼zQ𝔼vπg^z(λv)2\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\hat{g}_{z}(\lambda_{v})^{2} and 𝔼zQ𝔼vπh^z(λv)2\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\hat{h}_{z}(\lambda_{v})^{2}, we conclude that

𝔼vπ[KL(R(λv)||R(μ))]2W(2eε)+4(τ1)2δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}[\mathrm{KL}(R(\lambda_{v})||R(\mu))]\leq 2W(2e^{\varepsilon})+4(\tau-1)^{2}\cdot\delta.\qed
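
The first step of the proof uses only the standard fact that the χ²-divergence upper-bounds the KL divergence; the short numerical sketch below illustrates this inequality on arbitrary toy distributions (chosen at random, for illustration only).

import math, random

def kl(p, q):
    return sum(pz * math.log(pz / q[z]) for z, pz in p.items() if pz > 0)

def chi2(p, q):
    return sum((p.get(z, 0.0) - qz) ** 2 / qz for z, qz in q.items())

random.seed(0)
for _ in range(5):
    w = [random.random() + 0.01 for _ in range(4)]
    v = [random.random() + 0.01 for _ in range(4)]
    p = {i: x / sum(w) for i, x in enumerate(w)}
    q = {i: x / sum(v) for i, x in enumerate(v)}
    assert kl(p, q) <= chi2(p, q) + 1e-12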

6 Lower Bounds for Selection and ParityLearning

In this section, we prove lower bounds for Selection and ParityLearning in the DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} model. We begin with some notation.

6.1 Notation

For (,s)[2]×{0,1}D\displaystyle(\ell,s)\in[2]\times\{0,1\}^{D}, let 𝒟,s\displaystyle\mathcal{D}_{\ell,s} be the uniform distribution on {x{0,1}D:x,s=}\displaystyle\{x\in\{0,1\}^{D}:\langle x,s\rangle=\ell\}. Recall that 𝒰D\displaystyle\mathcal{U}_{D} is the uniform distribution on {0,1}D\displaystyle\{0,1\}^{D}.

For j[D]\displaystyle j\in[D], let ej\displaystyle e_{j} be the D\displaystyle D-bit string such that only the j\displaystyle j-th bit is 1\displaystyle 1, and the other bits are 0\displaystyle 0. For (,j)[2]×[D]\displaystyle(\ell,j)\in[2]\times[D], we denote by 𝒟,ej\displaystyle\mathcal{D}_{\ell,e_{j}} the uniform distribution on all length-D\displaystyle D Boolean strings with j\displaystyle j-th bit being \displaystyle\ell. For simplicity, we also use 𝒟,j\displaystyle\mathcal{D}_{\ell,j} to denote 𝒟,ej\displaystyle\mathcal{D}_{\ell,e_{j}} when the context is clear.

We need the following simple proposition.

Proposition 6.1.

For every function f:{0,1}D\displaystyle f\colon\{0,1\}^{D}\to\mathbb{R} and s{0,1}D\displaystyle s\in\{0,1\}^{D},

f^(s)=12(f(𝒟0,s)f(𝒟1,s)).\displaystyle\hat{f}(s)=\frac{1}{2}(f(\mathcal{D}_{0,s})-f(\mathcal{D}_{1,s})).
Proof.

By definition, we have that

f^(s)=𝔼x𝒰D(1)s,xf(x)=12(f(𝒟0,s)f(𝒟1,s)).\displaystyle\hat{f}(s)=\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{U}_{D}}(-1)^{\langle s,x\rangle}f(x)=\frac{1}{2}(f(\mathcal{D}_{0,s})-f(\mathcal{D}_{1,s})).\qed
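
Proposition 6.1 is simply the Fourier coefficient 𝔼_{x←𝒰_D}[(−1)^{⟨s,x⟩} f(x)] rewritten in terms of the two conditional distributions. The brute-force check below (over all of {0,1}^D for a small D and a randomly chosen f, purely as a sanity check) verifies the identity numerically for every nonzero s.

import itertools, random

D = 4
random.seed(1)
f = {x: random.random() for x in itertools.product((0, 1), repeat=D)}

def inner(x, s):
    return sum(a * b for a, b in zip(x, s)) % 2

def avg(vals):
    vals = list(vals)
    return sum(vals) / len(vals)

for s in itertools.product((0, 1), repeat=D):
    if s == (0,) * D:
        continue  # for s = 0 the set {x : <x, s> = 1} is empty
    fhat = avg((-1) ** inner(x, s) * f[x] for x in f)
    f0 = avg(f[x] for x in f if inner(x, s) == 0)
    f1 = avg(f[x] for x in f if inner(x, s) == 1)
    assert abs(fhat - 0.5 * (f0 - f1)) < 1e-9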

6.2 Lower Bound for Selection

We begin with lower bounds for Selection.

Lemma 6.2.

For ε>0\displaystyle\varepsilon>0, if R\displaystyle R is (ε,δ,𝒰D)\displaystyle(\varepsilon,\delta,\mathcal{U}_{D})-dominated, then we have

𝔼(,j)[2]×[D][KL(R(𝒟,j)||R(𝒰D))]O(εD+δ).\displaystyle\operatornamewithlimits{\mathbb{E}}_{(\ell,j)\in[2]\times[D]}[\mathrm{KL}(R(\mathcal{D}_{\ell,j})||R(\mathcal{U}_{D}))]\leq O\left(\frac{\varepsilon}{D}+\delta\right).
Proof.

To apply Theorem 5.7, we set the index set as =[2]×[D]\displaystyle\mathcal{I}=[2]\times[D], the distribution π\displaystyle\pi to the uniform distribution over \displaystyle\mathcal{I}, {λv}v={𝒟v}v\displaystyle\{\lambda_{v}\}_{v\in\mathcal{I}}=\{\mathcal{D}_{v}\}_{v\in\mathcal{I}}, and μ=𝒰D\displaystyle\mu=\mathcal{U}_{D}.

Clearly, μ\displaystyle\mu 2\displaystyle 2-dominates {λv}\displaystyle\{\lambda_{v}\}. Let f\displaystyle f be a function such that f=L\displaystyle\|f\|_{\infty}=L and f(μ)1\displaystyle f(\mu)\leq 1. It follows that

𝔼vπ(f(μ)f(λv))2\displaystyle\displaystyle\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}(f(\mu)-f(\lambda_{v}))^{2} =𝔼(,j)[2]×[D](f(𝒟,j)f(𝒰D))2\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{(\ell,j)\in[2]\times[D]}(f(\mathcal{D}_{\ell,j})-f(\mathcal{U}_{D}))^{2}
=𝔼(,j)[2]×[D]14(f(𝒟,j)f(𝒟1,j))2\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{(\ell,j)\in[2]\times[D]}\frac{1}{4}(f(\mathcal{D}_{\ell,j})-f(\mathcal{D}_{1-\ell,j}))^{2}
=𝔼(,j)[2]×[D]f^({j})2.\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{(\ell,j)\in[2]\times[D]}\hat{f}(\{j\})^{2}. (Proposition 6.1)

By Lemma 3.7, it follows that

𝔼vπ(f(μ)f(λv))2=𝔼(,j)[2]×[D]f^({j})2O(lnLD).\displaystyle\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}(f(\mu)-f(\lambda_{v}))^{2}=\operatornamewithlimits{\mathbb{E}}_{(\ell,j)\in[2]\times[D]}\hat{f}(\{j\})^{2}\leq O\left(\frac{\ln L}{D}\right).

Therefore, we can set W(L):=ClnLD\displaystyle W(L):=\frac{C\cdot\ln L}{D} for a large enough constant C\displaystyle C and note that W\displaystyle W is a concave function. By Theorem 5.7, it follows that

𝔼(,j)[2]×[D][KL(R(𝒟,j)||R(𝒰D))]O(W(2eε)+δ)O(εD+δ).\displaystyle\operatornamewithlimits{\mathbb{E}}_{(\ell,j)\in[2]\times[D]}[\mathrm{KL}(R(\mathcal{D}_{\ell,j})||R(\mathcal{U}_{D}))]\leq O(W(2e^{\varepsilon})+\delta)\leq O\left(\frac{\varepsilon}{D}+\delta\right).\qed
Lemma 6.3.

For a public-coin randomizer R\displaystyle R with public randomness from 𝒟𝗉𝗎𝖻\displaystyle\mathcal{D}_{\sf pub}, if there is a family of distributions {𝒟α}αsupp(𝒟𝗉𝗎𝖻)\displaystyle\{\mathcal{D}_{\alpha}\}_{\alpha\in\mathrm{supp}(\mathcal{D}_{\sf pub})} over k\displaystyle\mathcal{M}^{k} such that

𝔼α𝒟𝗉𝗎𝖻dε(𝔼x𝒰DRα(x)||𝒟α)o(1/D),\displaystyle\operatornamewithlimits{\mathbb{E}}_{\alpha\leftarrow\mathcal{D}_{\sf pub}}d_{\varepsilon}\left(\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{U}_{D}}R_{\alpha}(x)||\mathcal{D}_{\alpha}\right)\leq o(1/D),

then a public-coin protocol with randomizer R\displaystyle R needs at least Ω(DlogDε)\displaystyle\Omega\left(\frac{D\log D}{\varepsilon}\right) samples to solve Selection with probability at least 0.99\displaystyle 0.99.

Proof.

Let L,J\displaystyle L,J be uniformly random over [2]×[D]\displaystyle[2]\times[D], and X1,X2,,Xn\displaystyle X_{1},X_{2},\dotsc,X_{n} be n\displaystyle n i.i.d. samples from DL,J\displaystyle D_{L,J}. For each i[n]\displaystyle i\in[n], we draw Zi\displaystyle Z_{i} from R(Xi)\displaystyle R(X_{i}).

Let Pα(Z1,Z2,,Zn)\displaystyle P_{\alpha}(Z_{1},Z_{2},\dotsc,Z_{n}) be the output of the protocol with public randomness fixed to α\displaystyle\alpha, and let Fα(Z1,Z2,,Zn):=(1,Pα(Z1,Z2,,Zn))\displaystyle F_{\alpha}(Z_{1},Z_{2},\dotsc,Z_{n}):=(1,P_{\alpha}(Z_{1},Z_{2},\dotsc,Z_{n})). Assuming nΘ(logD)\displaystyle n\geq\Theta(\log D), it follows that Fα(Z1,Z2,,Zn)=(L,J)\displaystyle F_{\alpha}(Z_{1},Z_{2},\dotsc,Z_{n})=(L,J) with probability at least 0.990.010.98\displaystyle 0.99-0.01\geq 0.98 over the randomness of α𝒟𝗉𝗎𝖻\displaystyle\alpha\leftarrow\mathcal{D}_{\sf pub} and randomness in Fα\displaystyle F_{\alpha}, conditioned on the event L=1\displaystyle L=1.

By Markov’s inequality, with probability at least 0.8\displaystyle 0.8 over α𝒟𝗉𝗎𝖻\displaystyle\alpha\leftarrow\mathcal{D}_{\sf pub}, we have that Fα(Z1,Z2,,Zn)=(L,J)\displaystyle F_{\alpha}(Z_{1},Z_{2},\dotsc,Z_{n})=(L,J) with probability at least 0.8\displaystyle 0.8 over the randomness in Fα\displaystyle F_{\alpha} conditioned on the event L=1\displaystyle L=1.

From our assumption and Markov’s inequality, it follows that with probability at least 0.99\displaystyle 0.99 over α𝒟𝗉𝗎𝖻\displaystyle\alpha\leftarrow\mathcal{D}_{\sf pub}, we have dε(𝔼x𝒰DRα(x)||𝒟α)o(1/D)\displaystyle d_{\varepsilon}\left(\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{U}_{D}}R_{\alpha}(x)||\mathcal{D}_{\alpha}\right)\leq o(1/D). That is, Rα\displaystyle R_{\alpha} is (ε,o(1/D),𝒰D)\displaystyle(\varepsilon,o(1/D),\mathcal{U}_{D})-dominated.

By a union bound, with probability at least 0.990.2>0\displaystyle 0.99-0.2>0 over α𝒟𝗉𝗎𝖻\displaystyle\alpha\leftarrow\mathcal{D}_{\sf pub}, we have Fα(Z1,Z2,,Zn)=(L,J)\displaystyle F_{\alpha}(Z_{1},Z_{2},\dotsc,Z_{n})=(L,J) with probability at least 0.8/20.4\displaystyle 0.8/2\geq 0.4, and Rα\displaystyle R_{\alpha} is (ε,o(1/D),𝒰D)\displaystyle(\varepsilon,o(1/D),\mathcal{U}_{D})-dominated. In the following, we fix such an α\displaystyle\alpha.

By Lemma 6.2, for some βO(εD)\displaystyle\beta\leq O\left(\frac{\varepsilon}{D}\right), we have that

𝔼(,j)[2]×[D][KL(Rα(𝒟,j)||Rα(𝒰D))]β.\displaystyle\operatornamewithlimits{\mathbb{E}}_{(\ell,j)\in[2]\times[D]}[\mathrm{KL}(R_{\alpha}(\mathcal{D}_{\ell,j})||R_{\alpha}(\mathcal{U}_{D}))]\leq\beta.

By Fano’s inequality,

Pr[Fα(Z1,Z2,,Zn)=(L,J)]\displaystyle\displaystyle\Pr[F_{\alpha}(Z_{1},Z_{2},\dotsc,Z_{n})=(L,J)] 1+I((Z1,Z2,,Zn);(L,J))log2D\displaystyle\displaystyle\leq\frac{1+I((Z_{1},Z_{2},\dotsc,Z_{n});(L,J))}{\log 2D}
1+nI(Z1;(L,J))log2D.\displaystyle\displaystyle\leq\frac{1+n\cdot I(Z_{1};(L,J))}{\log 2D}.

We also have that

I(Z1;(L,J))=KL((Z1,L,J)||Z1(L,J))\displaystyle\displaystyle I(Z_{1};(L,J))=\mathrm{KL}((Z_{1},L,J)||Z_{1}\otimes(L,J)) =𝔼L,J[2]×[D]KL((Z1|L,J)||Z1)\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{L,J\leftarrow[2]\times[D]}\mathrm{KL}((Z_{1}|L,J)||Z_{1})
=𝔼L,J[2]×[D]KL(R(𝒟L,J)||R(𝒰D))\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{L,J\leftarrow[2]\times[D]}\mathrm{KL}(R(\mathcal{D}_{L,J})||R(\mathcal{U}_{D}))
β.\displaystyle\displaystyle\leq\beta.

Plugging in, we obtain

1+nβlog2DPr[F(Z1,Z2,,Zn)=(L,J)]0.4.\displaystyle\frac{1+n\cdot\beta}{\log 2D}\geq\Pr[F(Z_{1},Z_{2},\dotsc,Z_{n})=(L,J)]\geq 0.4.

Hence, we deduce that n=Ω(logDβ1)=Ω(DlogDε)\displaystyle n=\Omega(\log D\cdot\beta^{-1})=\Omega\left(\frac{D\log D}{\varepsilon}\right). ∎
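
Rearranging the final display gives the explicit bound n ≥ (0.4·log(2D) − 1)/β; the snippet below simply evaluates it for β = c·ε/D, where the constant c and the example parameters are illustrative placeholders rather than the actual constants from Lemma 6.2.

import math

def selection_sample_lower_bound(D, eps, c=1.0):
    # From (1 + n * beta) / log(2D) >= 0.4 with beta = c * eps / D:
    # n >= (0.4 * log(2D) - 1) / beta = Omega(D log D / eps).
    beta = c * eps / D
    return (0.4 * math.log(2 * D) - 1) / beta

print(selection_sample_lower_bound(D=10**6, eps=1.0))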

We are now ready to prove Theorem 1.9 (restated below).


Theorem 1.9. (restated) For any ε=O(1)\displaystyle\varepsilon=O(1), if P\displaystyle P is a public-coin (ε,o(1/D))\displaystyle(\varepsilon,o(1/D))-DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k} protocol solving Selection with probability at least 0.99\displaystyle 0.99, then nΩ(Dk)\displaystyle n\geq\Omega\left(\frac{D}{k}\right).

Proof.

Without loss of generality, we assume that npoly(D)\displaystyle n\leq\mathop{\mathrm{poly}}(D). Applying Lemma 5.6 and letting τ=ε+k(1+lnn)\displaystyle\tau=\varepsilon+k(1+\ln n), we get that

𝔼α𝒟𝗉𝗎𝖻dτ(𝔼x𝒰DRα(x)||𝒟α)δ,\displaystyle\operatornamewithlimits{\mathbb{E}}_{\alpha\leftarrow\mathcal{D}_{\sf pub}}d_{\tau}\left(\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{U}_{D}}R_{\alpha}(x)||\mathcal{D}_{\alpha}\right)\leq\delta,

for a distribution family {𝒟α}αsupp(𝒟𝗉𝗎𝖻)\displaystyle\{\mathcal{D}_{\alpha}\}_{\alpha\in\mathrm{supp}(\mathcal{D}_{\sf pub})}.

Therefore, by Lemma 6.3, it follows that nΩ(DlogDτ)=Ω(Dk)\displaystyle n\geq\Omega\left(\frac{D\log D}{\tau}\right)=\Omega\left(\frac{D}{k}\right). ∎

6.3 Lower Bound for ParityLearning

We next prove our lower bound for ParityLearning.

Lemma 6.4.

For ε>0\displaystyle\varepsilon>0, suppose R\displaystyle R is (ε,δ,𝒰D)\displaystyle(\varepsilon,\delta,\mathcal{U}_{D})-dominated. We have that

𝔼,s[2]×{0,1}D[KL(R(𝒟,s)||R(𝒰D))]4eε2D+4δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\ell,s\in[2]\times\{0,1\}^{D}}[\mathrm{KL}(R(\mathcal{D}_{\ell,s})||R(\mathcal{U}_{D}))]\leq\frac{4e^{\varepsilon}}{2^{D}}+4\delta.
Proof.

To apply Theorem 5.7, we set the index set as =[2]×{0,1}D\displaystyle\mathcal{I}=[2]\times\{0,1\}^{D}, distribution π\displaystyle\pi to be the uniform distribution over \displaystyle\mathcal{I}, {λv}v={𝒟v}v\displaystyle\{\lambda_{v}\}_{v\in\mathcal{I}}=\{\mathcal{D}_{v}\}_{v\in\mathcal{I}}, and μ=𝒰D\displaystyle\mu=\mathcal{U}_{D}.

Clearly, μ\displaystyle\mu 2\displaystyle 2-dominates {λv}\displaystyle\{\lambda_{v}\}. Let f\displaystyle f be a function such that f=L\displaystyle\|f\|_{\infty}=L and f(μ)1\displaystyle f(\mu)\leq 1. It follows that

𝔼vπ|f(μ)f(λv)|2\displaystyle\displaystyle\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}|f(\mu)-f(\lambda_{v})|^{2} =𝔼,s[2]×{0,1}D|f(𝒟,s)f(𝒰D)|2\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{\ell,s\in[2]\times\{0,1\}^{D}}|f(\mathcal{D}_{\ell,s})-f(\mathcal{U}_{D})|^{2}
=𝔼,s[2]×{0,1}D14|f(𝒟,s)f(𝒟1,s)|2\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{\ell,s\in[2]\times\{0,1\}^{D}}\frac{1}{4}|f(\mathcal{D}_{\ell,s})-f(\mathcal{D}_{1-\ell,s})|^{2}
=𝔼,s[2]×{0,1}Df^(s)2.\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{\ell,s\in[2]\times\{0,1\}^{D}}\hat{f}(s)^{2}. (Proposition 6.1)

By Lemma 3.8, it follows that

s{0,1}Df^(s)2=𝔼x𝒰Df(x)2ff(𝒰D)L.\displaystyle\sum_{s\in\{0,1\}^{D}}\hat{f}(s)^{2}=\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{U}_{D}}f(x)^{2}\leq\|f\|_{\infty}\cdot f(\mathcal{U}_{D})\leq L.

Therefore, we can set W(L):=L2D\displaystyle W(L):=\frac{L}{2^{D}}. In this case, W\displaystyle W is clearly concave. By Theorem 5.7, it follows that

𝔼,s[2]×{0,1}D[KL(R(𝒟,s)||R(𝒰D))]2W(2eε)+4(τ1)2δ4eε2D+4δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\ell,s\in[2]\times\{0,1\}^{D}}[\mathrm{KL}(R(\mathcal{D}_{\ell,s})||R(\mathcal{U}_{D}))]\leq 2W(2e^{\varepsilon})+4(\tau-1)^{2}\cdot\delta\leq\frac{4e^{\varepsilon}}{2^{D}}+4\delta.\qed

Now we apply Lemma 6.4 to the ParityLearning problem. Recall that in ParityLearning, there is a random hidden element s{0,1}D\displaystyle s\in\{0,1\}^{D}, and each user gets a random element x\displaystyle x together with the inner product s,x\displaystyle\langle s,x\rangle over 𝔽2\displaystyle\mathbb{F}_{2}. Appending the label to the vector, each user indeed gets a random sample from the set {x{0,1}D+1:x,(s,1)=0}\displaystyle\{x\in\{0,1\}^{D+1}:\langle x,(s,1)\rangle=0\}, where (s,1)\displaystyle(s,1) is the (D+1)\displaystyle(D+1)-dimensional vector obtained by appending 1\displaystyle 1 to the end of the vector s\displaystyle s. In other words, each user gets a random sample from the distribution 𝒟0,(s,1)\displaystyle\mathcal{D}_{0,(s,1)}.
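
The sketch below (Python, for illustration only) generates ParityLearning samples in exactly this encoding: each user's example and label are packed into a (D+1)-bit vector z satisfying ⟨z,(s,1)⟩ = 0 over 𝔽_2, i.e., a sample from 𝒟_{0,(s,1)}.

import random

def parity_sample(s):
    # s is the D-bit hidden string; return the (D+1)-bit encoding of (example, label).
    x = [random.randint(0, 1) for _ in range(len(s))]
    label = sum(a * b for a, b in zip(s, x)) % 2
    return x + [label]

random.seed(0)
s = [1, 0, 1, 1]
for _ in range(3):
    z = parity_sample(s)
    # every sample z satisfies <z, (s, 1)> = 0 over F_2
    assert sum(a * b for a, b in zip(z, s + [1])) % 2 == 0
    print(z)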

Lemma 6.5.

For a public-coin randomizer R\displaystyle R with public randomness from 𝒟𝗉𝗎𝖻\displaystyle\mathcal{D}_{\sf pub}, if there is a family of distributions {𝒟α}αsupp(𝒟𝗉𝗎𝖻)\displaystyle\{\mathcal{D}_{\alpha}\}_{\alpha\in\mathrm{supp}(\mathcal{D}_{\sf pub})} over k\displaystyle\mathcal{M}^{k} such that,

𝔼α𝒟𝗉𝗎𝖻dε(𝔼x𝒰DRα(x)||𝒟α)o(1/n),\displaystyle\operatornamewithlimits{\mathbb{E}}_{\alpha\leftarrow\mathcal{D}_{\sf pub}}d_{\varepsilon}\left(\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{U}_{D}}R_{\alpha}(x)||\mathcal{D}_{\alpha}\right)\leq o(1/n),

where n\displaystyle n is the number of samples, then a public-coin protocol with randomizer R\displaystyle R needs at least Ω(2D/eε)\displaystyle\Omega\left(2^{D}/e^{\varepsilon}\right) samples to solve ParityLearning with probability at least 0.99\displaystyle 0.99.

Proof.

Suppose there is a public-coin protocol P\displaystyle P with randomizer R\displaystyle R solving ParityLearning with probability at least 0.99\displaystyle 0.99. For a dataset W\displaystyle W, we use P(W)\displaystyle P(W) (respectively, Pα(W)\displaystyle P_{\alpha}(W)) to denote the output of P\displaystyle P on W\displaystyle W (with public randomness fixed to α\displaystyle\alpha).

Consider running P\displaystyle P on n\displaystyle n uniformly random samples from {0,1}D+1\displaystyle\{0,1\}^{D+1}. We note that for at least a 0.99\displaystyle 0.99 fraction of s{0,1}D\displaystyle s\in\{0,1\}^{D}, we have that

𝔼α𝒟𝗉𝗎𝖻[Pr[Pα(𝒰D+1n)=s]]=Pr[P(𝒰D+1n)=s]0.01.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\alpha\leftarrow\mathcal{D}_{\sf pub}}\left[\Pr[P_{\alpha}(\mathcal{U}_{D+1}^{\otimes n})=s]\right]=\Pr[P(\mathcal{U}_{D+1}^{\otimes n})=s]\leq 0.01.

From the assumption that P\displaystyle P solves ParityLearning, for all s{0,1}D\displaystyle s\in\{0,1\}^{D}, we have

𝔼α𝒟𝗉𝗎𝖻[Pr[Pα(𝒟0,(s,1)n)=s]]=Pr[P(𝒟0,(s,1)n)=s]0.99.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\alpha\leftarrow\mathcal{D}_{\sf pub}}\left[\Pr[P_{\alpha}(\mathcal{D}_{0,(s,1)}^{\otimes n})=s]\right]=\Pr[P(\mathcal{D}_{0,(s,1)}^{\otimes n})=s]\geq 0.99.

By a union bound, with probability at least 0.5\displaystyle 0.5 over α𝒟𝗉𝗎𝖻\displaystyle\alpha\leftarrow\mathcal{D}_{\sf pub}, we have

Pr[Pα(𝒰D+1n)=s]0.1andPr[Pα(𝒟0,(s,1)n)=s]0.9for at least a 0.5 fraction of s{0,1}D.\Pr[P_{\alpha}(\mathcal{U}_{D+1}^{\otimes n})=s]\leq 0.1~{}~{}\text{and}~{}~{}\Pr[P_{\alpha}(\mathcal{D}_{0,(s,1)}^{\otimes n})=s]\geq 0.9~{}~{}\text{for at least a $\displaystyle 0.5$ fraction of $\displaystyle s\in\{0,1\}^{D}$.} (6)

From our assumption and Markov’s inequality, with probability at least 0.99\displaystyle 0.99 over α𝒟𝗉𝗎𝖻\displaystyle\alpha\leftarrow\mathcal{D}_{\sf pub}, we have that dε(𝔼x𝒰DRα(x)||𝒟α)o(1/n)\displaystyle d_{\varepsilon}\left(\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{U}_{D}}R_{\alpha}(x)||\mathcal{D}_{\alpha}\right)\leq o(1/n). That is, Rα\displaystyle R_{\alpha} is (ε,o(1/n),𝒰D)\displaystyle(\varepsilon,o(1/n),\mathcal{U}_{D})-dominated.

By a union bound, there exists an αsupp(𝒟𝗉𝗎𝖻)\displaystyle\alpha\in\mathrm{supp}(\mathcal{D}_{\sf pub}) such that Rα\displaystyle R_{\alpha} is (ε,o(1/n),𝒰D)\displaystyle(\varepsilon,o(1/n),\mathcal{U}_{D})-dominated and (6) is satisfied.

By Lemma 6.4, we have that

𝔼,s[2]×{0,1}D+1[KL(Rα(𝒟,s)||Rα(𝒰D+1))]4eε2D+1+o(1/n),\displaystyle\operatornamewithlimits{\mathbb{E}}_{\ell,s\in[2]\times\{0,1\}^{D+1}}[\mathrm{KL}(R_{\alpha}(\mathcal{D}_{\ell,s})||R_{\alpha}(\mathcal{U}_{D+1}))]\leq\frac{4e^{\varepsilon}}{2^{D+1}}+o(1/n),

which implies

𝔼s{0,1}D[KL(Rα(𝒟0,(s,1))||Rα(𝒰D+1))]O(eε2D)+o(1/n).\displaystyle\operatornamewithlimits{\mathbb{E}}_{s\in\{0,1\}^{D}}[\mathrm{KL}(R_{\alpha}(\mathcal{D}_{0,(s,1)})||R_{\alpha}(\mathcal{U}_{D+1}))]\leq O\left(\frac{e^{\varepsilon}}{2^{D}}\right)+o(1/n).

Supposing n=o(2D/eε)\displaystyle n=o(2^{D}/e^{\varepsilon}) for the sake of contradiction, it follows that

𝔼s{0,1}D[KL(Rα(𝒟0,(s,1))n||Rα(𝒰D+1)n)]o(1).\displaystyle\operatornamewithlimits{\mathbb{E}}_{s\in\{0,1\}^{D}}[\mathrm{KL}(R_{\alpha}(\mathcal{D}_{0,(s,1)})^{\otimes n}||R_{\alpha}(\mathcal{U}_{D+1})^{\otimes n})]\leq o(1).

Since there is at least a 0.5\displaystyle 0.5 fraction of s{0,1}D\displaystyle s\in\{0,1\}^{D} satisfying the conditions in (6), it follows that there exists an s{0,1}D\displaystyle s\in\{0,1\}^{D} satisfying these conditions and KL(Rα(𝒟0,(s,1))n||Rα(𝒰D+1)n)=o(1)\displaystyle\mathrm{KL}(R_{\alpha}(\mathcal{D}_{0,(s,1)})^{\otimes n}||R_{\alpha}(\mathcal{U}_{D+1})^{\otimes n})=o(1), which, by Pinsker’s inequality, implies that

Rα(𝒟0,(s,1))nRα(𝒰D+1)nTVo(1),\displaystyle\|R_{\alpha}(\mathcal{D}_{0,(s,1)})^{\otimes n}-R_{\alpha}(\mathcal{U}_{D+1})^{\otimes n}\|_{TV}\leq o(1),

and

Pr[Pα(𝒟0,(s,1)n)=s]Pr[Pα(𝒰D+1n)=s]+o(1)0.1+o(1),\displaystyle\Pr[P_{\alpha}(\mathcal{D}_{0,(s,1)}^{\otimes n})=s]\leq\Pr[P_{\alpha}(\mathcal{U}_{D+1}^{\otimes n})=s]+o(1)\leq 0.1+o(1),

a contradiction. ∎

We are now ready to prove Theorem 1.10.


Theorem 1.10. (restated) For any ε=O(1)\displaystyle\varepsilon=O(1), if P\displaystyle P is a public-coin (ε,o(1/n))\displaystyle(\varepsilon,o(1/n))-DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k} protocol solving ParityLearning with probability at least 0.99\displaystyle 0.99, then nΩ(2D/(k+1))\displaystyle n\geq\Omega(2^{D/(k+1)}).

Proof of Theorem 1.10.

Applying Lemma 5.6 and letting τ=ε+k(1+lnn)\displaystyle\tau=\varepsilon+k(1+\ln n), we have that

𝔼α𝒟𝗉𝗎𝖻dτ(𝔼x𝒰DRα(x)||𝒟α)o(1/n),\displaystyle\operatornamewithlimits{\mathbb{E}}_{\alpha\leftarrow\mathcal{D}_{\sf pub}}d_{\tau}\left(\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{U}_{D}}R_{\alpha}(x)||\mathcal{D}_{\alpha}\right)\leq o(1/n),

for a distribution family {𝒟α}αsupp(𝒟𝗉𝗎𝖻)\displaystyle\{\mathcal{D}_{\alpha}\}_{\alpha\in\mathrm{supp}(\mathcal{D}_{\sf pub})}.

By Lemma 6.5, nΩ(2D/eτ)Ω(2D/(en)k)\displaystyle n\geq\Omega(2^{D}/e^{\tau})\geq\Omega(2^{D}/(en)^{k}). It then follows that nk+1Ω(2D/ek)\displaystyle n^{k+1}\geq\Omega(2^{D}/e^{k}) and consequently nΩ(2D/(k+1))\displaystyle n\geq\Omega(2^{D/(k+1)}). ∎

7 Lower Bound for CountDistinct with Maximum Hardness

In this section, we prove Theorem 1.1, which gives an Ω(n)\displaystyle\Omega(n) lower bound on the error of DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} protocols for CountDistinct.

7.1 Preliminaries

For (,s)[2]×{0,1}D\displaystyle(\ell,s)\in[2]\times\{0,1\}^{D}, recall that 𝒟,s\displaystyle\mathcal{D}_{\ell,s} is the uniform distribution on {x{0,1}D:x,s=}\displaystyle\{x\in\{0,1\}^{D}:\langle x,s\rangle=\ell\}. As in Section 2.2, we also use 𝒟,sα\displaystyle\mathcal{D}_{\ell,s}^{\alpha} to denote the mixture of 𝒟,s\displaystyle\mathcal{D}_{\ell,s} and 𝒰D\displaystyle\mathcal{U}_{D} which outputs a sample from 𝒟,s\displaystyle\mathcal{D}_{\ell,s} with probability α\displaystyle\alpha and a sample from 𝒰D\displaystyle\mathcal{U}_{D} with probability 1α\displaystyle 1-\alpha. Note that 𝒟,sα\displaystyle\mathcal{D}_{\ell,s}^{\alpha} can also be interpreted as the mixture of 𝒟,s\displaystyle\mathcal{D}_{\ell,s} and 𝒟1,s\displaystyle\mathcal{D}_{1-\ell,s} that outputs a sample from 𝒟,s\displaystyle\mathcal{D}_{\ell,s} with probability 12+α2\displaystyle\frac{1}{2}+\frac{\alpha}{2}, and a sample from 𝒟1,s\displaystyle\mathcal{D}_{1-\ell,s} with probability 12α2\displaystyle\frac{1}{2}-\frac{\alpha}{2}. We next estimate the number of distinct elements in n\displaystyle n samples taken from 𝒟,sα\displaystyle\mathcal{D}^{\alpha}_{\ell,s}.

Proposition 7.1.

Set D=logn\displaystyle D=\log n. For α(0,0.01)\displaystyle\alpha\in(0,0.01) and any (,s)[2]×{0,1}D\displaystyle(\ell,s)\in[2]\times\{0,1\}^{D}, let X\displaystyle X be the number of distinct elements in n\displaystyle n samples drawn from 𝒟,sα\displaystyle\mathcal{D}_{\ell,s}^{\alpha}. We have that

Pr[|X(1e1cosh(α))n|>10n]<0.01.\displaystyle\Pr\left[\left|X-(1-e^{-1}\cosh(\alpha))\cdot n\right|>10\sqrt{n}\right]<0.01.
Proof.

In the following, we identify the index space [n]\displaystyle[n] with {0,1}logn\displaystyle\{0,1\}^{\log n} in the natural way. For i[n]\displaystyle i\in[n], we use Xi\displaystyle X_{i} to denote the indicator of whether i\displaystyle i occurs in the n\displaystyle n samples taken from 𝒟,sα\displaystyle\mathcal{D}^{\alpha}_{\ell,s}. Note that these Xi\displaystyle X_{i}’s are not independent, but they are negatively correlated [DR98, Propositions 7 and 11], and hence a Chernoff bound still applies.

Let i\displaystyle i be an element in the support of 𝒟,s\displaystyle\mathcal{D}_{\ell,s}. Note that a single sample from 𝒟,sα\displaystyle\mathcal{D}_{\ell,s}^{\alpha} equals i\displaystyle i with probability

2D+1(12+α2)=1+αn.\displaystyle 2^{-D+1}\cdot\left(\frac{1}{2}+\frac{\alpha}{2}\right)=\frac{1+\alpha}{n}.

Therefore, i\displaystyle i occurs in n\displaystyle n i.i.d. samples from 𝒟,sα\displaystyle\mathcal{D}_{\ell,s}^{\alpha} with probability

p1:=1(1(1+α)/n)n\displaystyle\displaystyle p_{1}:=1-(1-(1+\alpha)/n)^{n} =1eln(1(1+α)/n)n\displaystyle\displaystyle=1-e^{\ln(1-(1+\alpha)/n)\cdot n}
=1e((1+α)/n+Θ((1+α)/n)2)n\displaystyle\displaystyle=1-e^{(-(1+\alpha)/n+\Theta((1+\alpha)/n)^{2})\cdot n}
=1e(1+α)eΘ(1/n).\displaystyle\displaystyle=1-e^{-(1+\alpha)}\cdot e^{\Theta(1/n)}.

Therefore, we have that

|p1(1e(1+α))|O(1/n).\displaystyle\left|p_{1}-(1-e^{-(1+\alpha)})\right|\leq O(1/n).

Similarly, for an element i\displaystyle i in the support of 𝒟1,s\displaystyle\mathcal{D}_{1-\ell,s}, a single sample from 𝒟,sα\displaystyle\mathcal{D}_{\ell,s}^{\alpha} equals i\displaystyle i with probability

2D+1(12α2)=1αn.\displaystyle 2^{-D+1}\cdot\left(\frac{1}{2}-\frac{\alpha}{2}\right)=\frac{1-\alpha}{n}.

Hence, by a similar calculation, i\displaystyle i occurs in n\displaystyle n i.i.d. samples from 𝒟,sα\displaystyle\mathcal{D}_{\ell,s}^{\alpha} with probability

p2:=1(1(1α)/n)n=1e(1α)eΘ(1/n),\displaystyle\displaystyle p_{2}:=1-(1-(1-\alpha)/n)^{n}=1-e^{-(1-\alpha)}\cdot e^{\Theta(1/n)},

and

|p2(1e(1α))|O(1/n).\displaystyle\left|p_{2}-(1-e^{-(1-\alpha)})\right|\leq O(1/n).

Hence, we have that

μ=𝔼[i[n]Xi]\displaystyle\displaystyle\mu=\operatornamewithlimits{\mathbb{E}}\left[\sum_{i\in[n]}X_{i}\right] =𝔼[isupp(𝒟,s)Xi]+𝔼[isupp(𝒟1,s)Xi]\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}\left[\sum_{i\in\mathrm{supp}(\mathcal{D}_{\ell,s})}X_{i}\right]+\operatornamewithlimits{\mathbb{E}}\left[\sum_{i\in\mathrm{supp}(\mathcal{D}_{1-\ell,s})}X_{i}\right]
=(p1+p2)n2.\displaystyle\displaystyle=(p_{1}+p_{2})\cdot\frac{n}{2}.

Let

ν=(1e1+α)n/2+(1e1α)n/2=(1e1cosh(α))n,\displaystyle\nu=(1-e^{-1+\alpha})\cdot n/2+(1-e^{-1-\alpha})\cdot n/2=(1-e^{-1}\cosh(\alpha))\cdot n,

where the last equality holds since cosh(α):=eα+eα2\displaystyle\cosh(\alpha):=\frac{e^{\alpha}+e^{-\alpha}}{2}. Let X=i=1nXi\displaystyle X=\sum_{i=1}^{n}X_{i}. Using the Chernoff bound and the fact that |νμ|O(1)\displaystyle|\nu-\mu|\leq O(1), we have that

Pr[|X(1e1cosh(α))n|>10n]<0.01,\displaystyle\Pr\left[\left|X-(1-e^{-1}\cosh(\alpha))\cdot n\right|>10\sqrt{n}\right]<0.01,

which completes the proof. ∎
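
A quick Monte Carlo sanity check of Proposition 7.1 (with illustrative parameters; it plays no role in the proof): identify {0,1}^D with [n], draw n samples from 𝒟^α_{ℓ,s}, count the distinct values, and compare with (1 − e^{−1}cosh(α))·n.

import math, random

def simulate_distinct(n, alpha, ell=0, seed=0):
    rng = random.Random(seed)
    D = int(math.log2(n))
    s = rng.getrandbits(D) | 1            # an arbitrary nonzero D-bit string
    seen = set()
    for _ in range(n):
        if rng.random() < alpha:          # with probability alpha, sample from D_{ell, s}
            while True:
                x = rng.getrandbits(D)
                if bin(x & s).count("1") % 2 == ell:
                    break
        else:                              # otherwise sample uniformly from {0, 1}^D
            x = rng.getrandbits(D)
        seen.add(x)
    return len(seen)

n, alpha = 2 ** 16, 0.3
print(simulate_distinct(n, alpha), (1 - math.exp(-1) * math.cosh(alpha)) * n)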

7.2 DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} Lower bound

Lemma 7.2.

For any ε>0\displaystyle\varepsilon>0 and α[0,1]\displaystyle\alpha\in[0,1], if R\displaystyle R is (ε,δ,𝒰D)\displaystyle(\varepsilon,\delta,\mathcal{U}_{D})-dominated, then we have that

𝔼,j[2]×{0,1}D[KL(P,jα||Q)]α24eε2D+4δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\ell,j\in[2]\times\{0,1\}^{D}}[\mathrm{KL}(P^{\alpha}_{\ell,j}||Q)]\leq\alpha^{2}\cdot\frac{4e^{\varepsilon}}{2^{D}}+4\delta.
Proof.

We follow closely the proof of Lemma 6.4. To apply Theorem 5.7, we set the index set as =[2]×{0,1}D\displaystyle\mathcal{I}=[2]\times\{0,1\}^{D}, the distribution π\displaystyle\pi to be the uniform distribution over \displaystyle\mathcal{I}, {λv}v={𝒟vα}v\displaystyle\{\lambda_{v}\}_{v\in\mathcal{I}}=\{\mathcal{D}_{v}^{\alpha}\}_{v\in\mathcal{I}}, and μ=𝒰D\displaystyle\mu=\mathcal{U}_{D}.

Clearly, μ\displaystyle\mu 2\displaystyle 2-dominates {λv}\displaystyle\{\lambda_{v}\}. Let f\displaystyle f be a function such that f=L\displaystyle\|f\|_{\infty}=L and f(μ)=1\displaystyle f(\mu)=1. It follows that

𝔼vπ|f(μ)f(λv)|2=𝔼(,s)[2]×{0,1}Dα2f^(s)2.\displaystyle\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}|f(\mu)-f(\lambda_{v})|^{2}=\operatornamewithlimits{\mathbb{E}}_{(\ell,s)\in[2]\times\{0,1\}^{D}}\alpha^{2}\cdot\hat{f}(s)^{2}.

Recall that

𝔼(,s)[2]×{0,1}Df^(s)2L2D.\displaystyle\operatornamewithlimits{\mathbb{E}}_{(\ell,s)\in[2]\times\{0,1\}^{D}}\hat{f}(s)^{2}\leq\frac{L}{2^{D}}.

Therefore, we can set W(L):=α2L2D\displaystyle W(L):=\alpha^{2}\cdot\frac{L}{2^{D}}. Clearly, W\displaystyle W is a concave function. By Theorem 5.7, it follows that

𝔼,s[2]×{0,1}D[KL(R(𝒟,sα)||R(𝒰D))]2W(2eε)+4(τ1)2δα24eε2D+4δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\ell,s\in[2]\times\{0,1\}^{D}}[\mathrm{KL}(R(\mathcal{D}^{\alpha}_{\ell,s})||R(\mathcal{U}_{D}))]\leq 2W(2e^{\varepsilon})+4(\tau-1)^{2}\cdot\delta\leq\alpha^{2}\cdot\frac{4e^{\varepsilon}}{2^{D}}+4\delta.\qed

We now show that the CountDistinct function is hard for (ε,δ)\displaystyle(\varepsilon,\delta)-local algorithms.


Theorem 1.1. (restated) For ε0.49lnn\displaystyle\varepsilon\leq 0.49\cdot\ln n, if P\displaystyle P is a public-coin (ε,o(1/n))\displaystyle(\varepsilon,o(1/n))-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} protocol, then it cannot compute CountDistinctn,n\displaystyle\textsf{\small CountDistinct}_{n,n} with error o(n/eε)\displaystyle o(n/e^{\varepsilon}) and probability at least 0.99\displaystyle 0.99.

Proof.

Let D=logn\displaystyle D=\log n. We identify the input space [n]\displaystyle[n] with {0,1}D\displaystyle\{0,1\}^{D} in the natural way. Suppose, for the sake of contradiction, that there is a public-coin (ε,o(1/n))\displaystyle(\varepsilon,o(1/n))-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} protocol P\displaystyle P solving CountDistinctn,n\displaystyle\textsf{\small CountDistinct}_{n,n} with error o(n/eε)\displaystyle o(n/e^{\varepsilon}) and probability at least 0.99\displaystyle 0.99.

Let R\displaystyle R with public randomness from 𝒟𝗉𝗎𝖻\displaystyle\mathcal{D}_{\sf pub} be the randomizer used in P\displaystyle P. For a dataset W\displaystyle W, we use P(W)\displaystyle P(W) (respectively, Pγ(W)\displaystyle P_{\gamma}(W)) to denote the output of P\displaystyle P on the dataset W\displaystyle W (with public randomness fixed to γ\displaystyle\gamma).

Setting α2=120eε\displaystyle\alpha^{2}=\frac{1}{20e^{\varepsilon}}, we let μα=(1e1cosh(α))n\displaystyle\mu_{\alpha}=(1-e^{-1}\cosh(\alpha))\cdot n and μ0=(1e1)n\displaystyle\mu_{0}=(1-e^{-1})\cdot n.

By our assumption on P\displaystyle P, Proposition 7.1 and a union bound, it follows that for every (,s)[2]×{0,1}D\displaystyle(\ell,s)\in[2]\times\{0,1\}^{D}, we have

𝔼γ𝒟𝗉𝗎𝖻[Pr[|Pγ((𝒟,sα)n)μα|n1000eε+10n]]0.98.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\gamma\leftarrow\mathcal{D}_{\sf pub}}\left[\Pr\left[\left|P_{\gamma}((\mathcal{D}^{\alpha}_{\ell,s})^{\otimes n})-\mu_{\alpha}\right|\leq\frac{n}{1000e^{\varepsilon}}+10\sqrt{n}\right]\right]\geq 0.98.

Similarly, we have

𝔼γ𝒟𝗉𝗎𝖻[Pr[|Pγ(𝒰Dn)μ0|n1000eε+10n]]0.98.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\gamma\leftarrow\mathcal{D}_{\sf pub}}\left[\Pr\left[\left|P_{\gamma}(\mathcal{U}_{D}^{\otimes n})-\mu_{0}\right|\leq\frac{n}{1000e^{\varepsilon}}+10\sqrt{n}\right]\right]\geq 0.98.

Note that by our choice of ε\displaystyle\varepsilon, we have n1000eε+10n<n800eε\displaystyle\frac{n}{1000e^{\varepsilon}}+10\sqrt{n}<\frac{n}{800e^{\varepsilon}}. By a union bound, it follows that with probability at least 0.5\displaystyle 0.5 over γ𝒟𝗉𝗎𝖻\displaystyle\gamma\leftarrow\mathcal{D}_{\sf pub}, we have

Pr[|Pγ((𝒟,sα)n)μα|<n800eε]0.8andPr[|Pγ(𝒰Dn)μ0|<n800eε]0.8\displaystyle\displaystyle\Pr\left[\left|P_{\gamma}((\mathcal{D}^{\alpha}_{\ell,s})^{\otimes n})-\mu_{\alpha}\right|<\frac{n}{800e^{\varepsilon}}\right]\geq 0.8~{}~{}\text{and}~{}~{}\Pr\left[\left|P_{\gamma}(\mathcal{U}_{D}^{\otimes n})-\mu_{0}\right|<\frac{n}{800e^{\varepsilon}}\right]\geq 0.8
for at least a 0.5 fraction of (,s)[2]×{0,1}D\displaystyle(\ell,s)\in[2]\times\{0,1\}^{D}. (7)

By the definition of public-coin DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} protocols, we have that with probability at least 0.99\displaystyle 0.99 over γ𝒟𝗉𝗎𝖻\displaystyle\gamma\leftarrow\mathcal{D}_{\sf pub}, Rγ\displaystyle R_{\gamma} is (ε,o(1/n),𝒰D)\displaystyle(\varepsilon,o(1/n),\mathcal{U}_{D})-dominated. By a union bound, there exists a γ\displaystyle\gamma such that Rγ\displaystyle R_{\gamma} is (ε,o(1/n),𝒰D)\displaystyle(\varepsilon,o(1/n),\mathcal{U}_{D})-dominated and it satisfies the condition in (7). We fix such a γ\displaystyle\gamma.

By Lemma 7.2, it follows that

𝔼(,s)[2]×{0,1}D[KL(Rγ(𝒟α,s)||Rγ(𝒰D))]\displaystyle\displaystyle\operatornamewithlimits{\mathbb{E}}_{(\ell,s)\in[2]\times\{0,1\}^{D}}[\mathrm{KL}(R_{\gamma}(\mathcal{D}^{\alpha}_{\ell,s})||R_{\gamma}(\mathcal{U}_{D}))] α22eε2D+o(1/n).\displaystyle\displaystyle\leq\alpha^{2}\cdot\frac{2e^{\varepsilon}}{2^{D}}+o(1/n).

Recalling that α^2 = 1/(20e^ε) and that 2^D = n, the above further simplifies to

𝔼(,s)[2]×{0,1}D[KL(Rγ(𝒟α,s)||Rγ(𝒰D))]110n+o(1/n).\displaystyle\operatornamewithlimits{\mathbb{E}}_{(\ell,s)\in[2]\times\{0,1\}^{D}}[\mathrm{KL}(R_{\gamma}(\mathcal{D}^{\alpha}_{\ell,s})||R_{\gamma}(\mathcal{U}_{D}))]\leq\frac{1}{10n}+o(1/n).

Let S be the set of (ℓ,s) satisfying the conditions stated in (7). Since S contains at least a 0.5 fraction of [2]×{0,1}^D, it follows that

𝔼(,s)S[KL(Rγ(𝒟α,s)||Rγ(𝒰D))]15n+o(1/n).\displaystyle\operatornamewithlimits{\mathbb{E}}_{(\ell,s)\in S}[\mathrm{KL}(R_{\gamma}(\mathcal{D}^{\alpha}_{\ell,s})||R_{\gamma}(\mathcal{U}_{D}))]\leq\frac{1}{5n}+o(1/n).

This means that there exists a pair (,s)S\displaystyle(\ell,s)\in S such that KL(Rγ(𝒟α,s)||Rγ(𝒰D))1/5n+o(1/n)\displaystyle\mathrm{KL}(R_{\gamma}(\mathcal{D}^{\alpha}_{\ell,s})||R_{\gamma}(\mathcal{U}_{D}))\leq 1/5n+o(1/n). We fix such a pair (,s)\displaystyle(\ell,s).

Since KL divergence is additive over product distributions, we have KL(R_γ(𝒟^α_{ℓ,s})^{⊗n} || R_γ(𝒰_D)^{⊗n}) ≤ n·(1/(5n) + o(1/n)) = 1/5 + o(1). By Pinsker’s inequality, it follows that

\|R_{\gamma}(\mathcal{D}^{\alpha}_{\ell,s})^{\otimes n}-R_{\gamma}(\mathcal{U}_{D})^{\otimes n}\|_{TV}\leq\sqrt{1/2\cdot 1/5+o(1)}\leq 0.4.

Since (,s)S\displaystyle(\ell,s)\in S, it follows that

Pr[|Pγ(𝒰Dn)μα|<n800eε]Pr[|Pγ((𝒟α,s)n)μα|<n800eε]0.40.4.\Pr\left[\left|P_{\gamma}(\mathcal{U}_{D}^{\otimes n})-\mu_{\alpha}\right|<\frac{n}{800e^{\varepsilon}}\right]\geq\Pr\left[\left|P_{\gamma}((\mathcal{D}^{\alpha}_{\ell,s})^{\otimes n})-\mu_{\alpha}\right|<\frac{n}{800e^{\varepsilon}}\right]-0.4\geq 0.4. (8)

On the other hand, we also have

Pr[|Pγ(𝒰Dn)μ0|<n800eε]0.8.\Pr\left[\left|P_{\gamma}(\mathcal{U}_{D}^{\otimes n})-\mu_{0}\right|<\frac{n}{800e^{\varepsilon}}\right]\geq 0.8. (9)

Note that |μ_α − μ_0| = e^{−1}(cosh(α)−1)·n ≥ (α^2/2)·e^{−1}·n > n/(200e^ε) > 2·n/(800e^ε). Hence the two intervals appearing in (8) and (9) are disjoint, so the corresponding events are mutually exclusive; but by (8) and (9) their probabilities sum to at least 0.4 + 0.8 > 1, a contradiction. ∎

8 Low-Message DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} Protocols for CountDistinct

In this section, we present our low-message DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} protocols for CountDistinct, thereby proving Theorem 1.6.

In Section 8.1, we review the previous protocol of [BCJM20], and discuss some intuitions underlying our improvement. In Section 8.2, we introduce some necessary definitions and technical tools. Next, in Section 8.3 we present our private-coin protocol (stated in Theorem 8.4) for CountDistinct with error O~(D)\displaystyle\tilde{O}(\sqrt{D}), which uses 1/2+o(1)\displaystyle 1/2+o(1) message per user in expectation when the input universe size is below n/polylog(n)\displaystyle n/\mathop{\mathrm{polylog}}(n). We will also show that a simple modification of this protocol is (ln(n)+O(1))\displaystyle(\ln(n)+O(1))-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}, thereby proving Theorem 1.3. Finally, based on the private-coin protocol, in Section 8.4 we prove Theorem 1.6 by presenting our public-coin protocol for CountDistinct, which uses less than 1\displaystyle 1 message per user in expectation without any restriction on the universe size.

8.1 Intuition

We now turn to sketch the main ideas behind Theorem 8.4 and Theorem 1.6. It would be instructive to review the O~(D)\displaystyle\widetilde{O}(D)-message DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} protocol solving CountDistinctn,D\displaystyle\textsf{\small CountDistinct}_{n,D} with error O(D)\displaystyle O(\sqrt{D}) from [BCJM20].

The DPmod2-shuffle\displaystyle\mathrm{DP}_{\mathrm{mod2\text{-}shuffle}} Model. To gain more insights about their protocol, we consider the following mod 2 shuffle model (DPmod2-shuffle\displaystyle\mathrm{DP}_{\mathrm{mod2\text{-}shuffle}}), where two messages of the same content “cancel each other”, i.e., the transcript is now a random permutation of messages that appear an odd number of times.

The DP requirement now applies to this new version of the transcript, and the analyzer likewise only sees this new version. [BCJM20] first gave a DPmod2-shuffle protocol for CountDistinct, and then adapted that protocol to the standard DPshuffle model using the Ishai et al. protocol for secure aggregation [IKOS06] (they did not explicitly specify their protocol in the DPmod2-shuffle model, but it is implicit in their proof of security).

Low-Message Protocol in DPmod2-shuffle. The DPmod2-shuffle protocol of [BCJM20] (referred to as P_mod2 in what follows) first sets a parameter q = Θ(1/n) so that Pr[Bin(n,q) ≡ 1 (mod 2)] = 1/(2e^{ε/2}). Next, each user holding an element x ∈ [D] first sends x with probability 1/2. Then, for each j ∈ [D], the user sends message j with probability q. All these events are independent.

Finally, if there are z messages in the transcript (i.e., there are z messages occurring an odd number of times in the original transcript), then the analyzer outputs (2·z·e^{ε/2} − D)/(e^{ε/2} − 1) as the estimate. Note that a user sends 1/2 + D·q = 1/2 + O(D/n) messages in expectation.
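To make the protocol concrete, the following is a minimal simulation sketch of P_mod2 in Python (our own code, not from [BCJM20]); it abstracts the mod 2 shuffler by tracking, for each j ∈ [D], only the parity of the number of times message j is sent, and it picks q via the binomial-parity identity of Lemma 8.5 below (with ε_0 = ε/2).

```python
import numpy as np

rng = np.random.default_rng(0)

def p_mod2(inputs, D, eps):
    """Simulate P_mod2: each user sends its own element w.p. 1/2 and every j in [D]
    w.p. q; the mod-2 shuffler keeps only the messages with odd multiplicity."""
    n = len(inputs)
    # Choose q = Theta(1/n) so that Pr[Bin(n, q) is odd] = 1/(2 e^{eps/2}):
    # Pr[Bin(n, q) odd] = (1 - (1 - 2q)^n)/2, hence q = (1 - (1 - e^{-eps/2})^{1/n})/2.
    q = (1.0 - (1.0 - np.exp(-eps / 2)) ** (1.0 / n)) / 2
    parity = np.zeros(D, dtype=np.int64)        # occurrence parity of each message j
    for x in inputs:                            # x in {1, ..., D}
        if rng.random() < 0.5:                  # own element sent with probability 1/2
            parity[x - 1] ^= 1
        parity ^= rng.binomial(1, q, size=D)    # blanket noise: each j sent w.p. q
    z = int(parity.sum())                       # messages surviving the mod-2 shuffle
    return (2 * z * np.exp(eps / 2) - D) / (np.exp(eps / 2) - 1)
```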

Analysis of the Protocol P𝗆𝗈𝖽𝟤\displaystyle P_{\sf mod2}. It is shown in [BCJM20] that the above protocol is ε\displaystyle\varepsilon-DP and solves CountDistinctn,D\displaystyle\textsf{\small CountDistinct}_{n,D} with error O(D)\displaystyle O(\sqrt{D}). Here we briefly outline the intuition behind it.

Let S\displaystyle S be the set consisting of all inputs of the users. We can see that every iS\displaystyle i\in S belongs to the transcript with probability exactly 1/2\displaystyle 1/2; on the other hand, every i[D]S\displaystyle i\in[D]\setminus S belongs to the transcript with probability exactly Pr[𝖡𝗂𝗇(n,q)1(mod2)]=12eε/2\displaystyle\Pr[\mathsf{Bin}(n,q)\equiv 1\pmod{2}]=\frac{1}{2e^{\varepsilon/2}}. Moreover, all these events are independent. Therefore, a simple calculation shows that (2𝔼[z]eε/2D)/(eε/21)=|S|\displaystyle(2\operatornamewithlimits{\mathbb{E}}[z]e^{\varepsilon/2}-D)/(e^{\varepsilon/2}-1)=|S|, and the accuracy follows from a Chernoff bound. As for the DP guarantee, changing the input of one user only affects the distributions of two messages in the transcript, and it only changes each message’s occurrence probability in the transcript from 1/2\displaystyle 1/2 to 1/2eε/2\displaystyle 1/2e^{\varepsilon/2} or vice versa.

From DPmod2-shuffle to DPshuffle. To obtain an actual DPshuffle protocol from P_mod2, the protocol from [BCJM20] (which we henceforth denote by P_BCJM) runs D copies of the protocol for securely computing sums over F_2 [IKOS06], such that the i-th protocol P_i aims to simulate the number of occurrences of message i modulo 2. For each user and each i ∈ [D], if the user were to send message i in P_mod2, it sends a one in P_i; otherwise it sends a zero in P_i.

Since the [IKOS06] protocol for computing sum over 𝔽2\displaystyle\mathbb{F}_{2} requires O(log(1/δ)logn+1)\displaystyle O\left(\frac{\log(1/\delta)}{\log n}+1\right) messages from each user [GMPV20, BBGN20], each user needs to send O(D(log(1/δ)logn+1))\displaystyle O\left(D\cdot\left(\frac{\log(1/\delta)}{\log n}+1\right)\right) messages in total. Moreover, from the security condition of Pi\displaystyle P_{i}, for each message i\displaystyle i the transcript only reveals the parity of its number of occurrences, which is exactly what we need in order to simulate DPmod2-shuffle\displaystyle\mathrm{DP}_{\mathrm{mod2\text{-}shuffle}} protocols.

Our Improvement. Note that P_BCJM requires significantly more messages per user than P_mod2. Our goal here is to compile P_mod2 into a DPshuffle protocol in a much more efficient way, ideally with no overhead. In P_mod2 each user sends only 1/2 + O(D/n) messages in expectation. This means that, when translating to P_BCJM, users end up sending many zero messages in the P_i subprotocols, which is wasteful.

Our crucial idea for improving on the aforementioned protocol is a very simple yet effective alternative to the secure aggregation protocol over F_2 of [IKOS06] used in P_BCJM. In our new subprotocol P_i, if a user were to send message i in P_mod2, it sends a one in P_i; otherwise it draws λ from a noise distribution 𝒟 (such that E[𝒟] ≈ polylog(δ^{-1})/n) and sends 2λ many ones in P_i. Clearly, our new P_i still maintains the parity of the number of occurrences of each message, and the expected number of messages is roughly 2·E[𝒟]·D + 1/2 = O(polylog(δ^{-1}))·D/n + 1/2. To show that the resulting protocol is DPshuffle, we build on the techniques of [GKMP20], which show that the added noise can hide the contribution of a single user.

8.2 Preliminaries

We first recall the definition of the negative binomial distribution.

Definition 8.1.

For r > 0 and p ∈ [0,1], the negative binomial distribution NB(r,p) is defined by Pr[NB(r,p) = k] = \binom{k+r-1}{k}(1-p)^{r}p^{k} for each non-negative integer k. (For a real number α, \binom{\alpha}{k} := \prod_{i=0}^{k-1}\frac{\alpha-i}{i+1}.)

We recall the following key properties of the negative binomial distribution: (1) For α,β>0\displaystyle\alpha,\beta>0 and p[0,1]\displaystyle p\in[0,1], 𝖭𝖡(α,p)+𝖭𝖡(β,p)\displaystyle\mathsf{NB}(\alpha,p)+\mathsf{NB}(\beta,p) has the same distribution as 𝖭𝖡(α+β,p)\displaystyle\mathsf{NB}(\alpha+\beta,p); (2) 𝔼[𝖭𝖡(r,p)]=pr1p\displaystyle\operatornamewithlimits{\mathbb{E}}[\mathsf{NB}(r,p)]=\frac{pr}{1-p}.
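Since the protocol below draws from NB(r/n, p) with a non-integral first parameter, it may be useful to note that NB(r,p) can be sampled for any real r > 0 as a Gamma–Poisson mixture. The following sketch (our own illustration, not part of the protocol) does exactly that and numerically checks the mean formula E[NB(r,p)] = pr/(1−p).

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_nb(r, p, size=None):
    """Sample NB(r, p) with Pr[k] = C(k+r-1, k) (1-p)^r p^k, valid for any real r > 0:
    draw lambda ~ Gamma(shape=r, scale=p/(1-p)) and then k ~ Poisson(lambda)."""
    lam = rng.gamma(shape=r, scale=p / (1 - p), size=size)
    return rng.poisson(lam)

r, p = 0.37, 0.8
print(sample_nb(r, p, size=200_000).mean(), p * r / (1 - p))   # empirical vs. exact mean
```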

We will need the following lemma from [GKMP20].

Lemma 8.2.

For any ε>0,δ(0,1)\displaystyle\varepsilon>0,\delta\in(0,1), and Δ\displaystyle\Delta\in\mathbb{N}, let p=e0.1ε/Δ\displaystyle p=e^{-0.1\varepsilon/\Delta} and r=50eε/Δlog(δ1)\displaystyle r=50\cdot e^{\varepsilon/\Delta}\cdot\log(\delta^{-1}). For any k{Δ,Δ+1,,Δ1,Δ}\displaystyle k\in\{-\Delta,-\Delta+1,\dotsc,\Delta-1,\Delta\}, dε(k+𝖭𝖡(r,p)||𝖭𝖡(r,p))δ\displaystyle d_{\varepsilon}(k+\mathsf{NB}(r,p)||\mathsf{NB}(r,p))\leq\delta.

The following is a simple corollary of Item (2) of Proposition 3.4 and Lemma 8.2.

Corollary 8.3.

For any ε>0,δ(0,1)\displaystyle\varepsilon>0,\delta\in(0,1), and Δ\displaystyle\Delta\in\mathbb{N}, let p\displaystyle p and r\displaystyle r be as in Lemma 8.2. For any two distributions X\displaystyle X and Y\displaystyle Y on {0,1,2,,Δ}\displaystyle\{0,1,2,\dotsc,\Delta\}, dε(X+𝖭𝖡(r,p)||Y+𝖭𝖡(r,p))δ\displaystyle d_{\varepsilon}(X+\mathsf{NB}(r,p)||Y+\mathsf{NB}(r,p))\leq\delta.

8.3 A Private-Coin Base Protocol

Recall that CountDistinct_{n,D} denotes the restriction of CountDistinct in which every user gets an input from [D], and the goal is to compute the number of distinct elements among all users.

We are now ready to prove Theorem 8.4, which is the private-coin case of Theorem 1.6. To simplify the privacy analysis of the protocol and ease its application in Section 8.4, we also allow the input to be 0\displaystyle 0, which means that the user’s input is not counted.

Theorem 8.4.

For any εO(1)\displaystyle\varepsilon\leq O(1) and δ1/n\displaystyle\delta\leq 1/n, there is a private-coin (ε,δ)\displaystyle(\varepsilon,\delta)-DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} protocol computing CountDistinctn,D\displaystyle\textsf{\small CountDistinct}_{n,D} with error O(Dε1)\displaystyle O\left(\sqrt{D}\cdot\varepsilon^{-1}\right) with probability at least 0.99\displaystyle 0.99. Moreover, the expected number of messages sent by each user is 12+O(log(1/δ)2ln(2/ε)εDn)\displaystyle\frac{1}{2}+O\left(\frac{\log(1/\delta)^{2}\ln(2/\varepsilon)}{\varepsilon}\cdot\frac{D}{n}\right).

Proof.

Without loss of generality, we can assume that ε1\displaystyle\varepsilon\leq 1. The algorithm requires several global constants that only depend on the values of n,ε\displaystyle n,\varepsilon, and δ\displaystyle\delta. Algorithm 1 specifies these constants. Here, c0\displaystyle c_{0} is a sufficiently large constant to be specified later.

Input: n\displaystyle n is the number of users and the pair (ε,δ)\displaystyle(\varepsilon,\delta) specifies the DP guarantee.
1 ε0=min(ε/6,0.01)\displaystyle\varepsilon_{0}=\min(\varepsilon/6,0.01);
2 Δ=c0logδ1ln(ε01)+1\displaystyle\Delta=\left\lceil c_{0}\cdot\log\delta^{-1}\ln(\varepsilon_{0}^{-1})+1\right\rceil;
3 p=e0.1ε0/Δ\displaystyle p=e^{-0.1\varepsilon_{0}/\Delta};
4 r=50eε0/Δlog(10δ1)\displaystyle r=50\cdot e^{\varepsilon_{0}/\Delta}\cdot\log(10\delta^{-1});
5 q' = (1 − (1 − e^{−ε_0})^{1/n}) / 2;
Algorithm 1 Set-Global-Constants(n\displaystyle n, ε\displaystyle\varepsilon, δ\displaystyle\delta)

Next, we specify the randomizer and the analyzer of the protocol in Algorithm 2 and Algorithm 3 respectively.

Input: x{0}[D]\displaystyle x\in\{0\}\cup[D] is the user’s input. D\displaystyle D is the universe size.
1 Set-Global-Constants(n\displaystyle n, ε\displaystyle\varepsilon, δ\displaystyle\delta);
2 Toss a uniformly random coin to get v{0,1}\displaystyle v\in\{0,1\};
3 if v=1\displaystyle v=1 and x0\displaystyle x\neq 0 then
4       send message (x)\displaystyle(x);
5      
6for i[D]\displaystyle i\in[D] do
7       Let y𝖡𝖾𝗋(q)\displaystyle y\leftarrow\mathsf{Ber}(q^{\prime});
8       if y=1\displaystyle y=1 then
9             send message (i)\displaystyle(i);
10            
11      Let η𝖭𝖡(r/n,p)\displaystyle\eta\leftarrow\mathsf{NB}(r/n,p);
12       Send 2η\displaystyle 2\cdot\eta messages (i)\displaystyle(i);
13      
Algorithm 2 Randomizer(x\displaystyle x, D\displaystyle D, n\displaystyle n, ε\displaystyle\varepsilon, δ\displaystyle\delta)
Input: S\displaystyle S is the multi-set of messages. D\displaystyle D is the universe size.
1 Set-Global-Constants(n\displaystyle n, ε\displaystyle\varepsilon, δ\displaystyle\delta);
2
3for i[D]\displaystyle i\in[D] do
4       Let yi\displaystyle y_{i} be the number of message (i)\displaystyle(i) in S\displaystyle S;
5       Ci=yimod2\displaystyle C_{i}=y_{i}\bmod{2};
6      
7
8C=i=1DCi\displaystyle C=\sum_{i=1}^{D}C_{i};
9 z=2Ceε0Deε01\displaystyle z=\frac{2Ce^{\varepsilon_{0}}-D}{e^{\varepsilon_{0}}-1};
10 return z\displaystyle z;
Algorithm 3 Analyzer(S\displaystyle S, D\displaystyle D, n\displaystyle n, ε\displaystyle\varepsilon, δ\displaystyle\delta)
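For concreteness, here is a minimal Python sketch of Algorithms 1–3 (our own translation of the pseudocode above; the negative binomial is sampled via the Gamma–Poisson mixture from Section 8.2, the logarithm bases are one reasonable reading of the text, and c_0 stands in for the unspecified constant).

```python
import numpy as np

rng = np.random.default_rng(0)

def set_global_constants(n, eps, delta, c0=50.0):
    """Algorithm 1 (sketch): c0 stands in for the unspecified constant in the text."""
    eps0 = min(eps / 6, 0.01)
    Delta = int(np.ceil(c0 * np.log2(1 / delta) * np.log(1 / eps0) + 1))
    p = np.exp(-0.1 * eps0 / Delta)
    r = 50 * np.exp(eps0 / Delta) * np.log2(10 / delta)
    q_prime = (1 - (1 - np.exp(-eps0)) ** (1.0 / n)) / 2
    return eps0, p, r, q_prime

def randomizer(x, D, n, eps, delta):
    """Algorithm 2 (sketch): return the user's multiset of messages as a list over [D]."""
    eps0, p, r, q_prime = set_global_constants(n, eps, delta)
    msgs = []
    if x != 0 and rng.random() < 0.5:               # send own input with probability 1/2
        msgs.append(x)
    for i in range(1, D + 1):
        if rng.random() < q_prime:                  # blanket Ber(q') noise message
            msgs.append(i)
        eta = rng.poisson(rng.gamma(shape=r / n, scale=p / (1 - p)))  # eta ~ NB(r/n, p)
        msgs.extend([i] * (2 * eta))                # 2*eta copies keep the parity intact
    return msgs

def analyzer(shuffled_msgs, D, n, eps, delta):
    """Algorithm 3 (sketch): estimate the number of distinct nonzero inputs."""
    eps0, _, _, _ = set_global_constants(n, eps, delta)
    counts = np.zeros(D + 1, dtype=np.int64)
    for m in shuffled_msgs:
        counts[m] += 1
    C = int(np.sum(counts[1:] % 2))                 # messages with odd multiplicity
    return (2 * C * np.exp(eps0) - D) / (np.exp(eps0) - 1)
```

In a simulation, the analyzer is applied to the concatenation of all users' message lists; since it only looks at the histogram, the shuffling itself is immaterial.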
Accuracy Analysis.

We first analyze the error of our protocol. Let E\displaystyle E be the set {xi}i[n],xi0\displaystyle\{x_{i}\}_{i\in[n],x_{i}\neq 0}. Recall that the goal is to estimate |E|\displaystyle|E|.

For each i[D]\displaystyle i\in[D], we analyze the distribution of the random variable Ci\displaystyle C_{i} in Algorithm 3. We observe that: (1) if iE\displaystyle i\in E, then Ci\displaystyle C_{i} is distributed uniformly at random over {0,1}\displaystyle\{0,1\}; (2) if iE\displaystyle i\notin E, then Ci\displaystyle C_{i} is distributed as 𝖡𝖾𝗋(12eε0)\displaystyle\mathsf{Ber}\left(\frac{1}{2e^{\varepsilon_{0}}}\right) by Lemma 8.5; (3) {Ci}i[D]\displaystyle\{C_{i}\}_{i\in[D]} are independent.

Lemma 8.5 ([BCJM20, Lemma 3.5]).

Let n, q' be specified as in Set-Global-Constants(n,ε,δ). Then, [Bin(n,q') mod 2] is distributed identically to Ber(1/(2e^{ε_0})).

Hence, we have that E[C] = |E|·(1/2) + (D−|E|)·1/(2e^{ε_0}). Plugging this into the equation defining the output z, we have E[z] = E[(2Ce^{ε_0} − D)/(e^{ε_0} − 1)] = (|E|e^{ε_0} + (D−|E|) − D)/(e^{ε_0} − 1) = |E|. An application of Hoeffding’s inequality implies that

Pr[|z|E||>c(ε0)1D]<0.01,\displaystyle\Pr\left[|z-|E||>c\cdot(\varepsilon_{0})^{-1}\cdot\sqrt{D}\right]<0.01,

for a sufficiently large constant c\displaystyle c. Hence, with probability at least 0.99\displaystyle 0.99, the error of the protocol is less than c(ε0)1D=O(Dε1)\displaystyle c\cdot(\varepsilon_{0})^{-1}\cdot\sqrt{D}=O(\sqrt{D}\cdot\varepsilon^{-1}).

Privacy Analysis.

We now prove that our protocol is indeed (ε,δ)\displaystyle(\varepsilon,\delta)-DP. Note that the multi-set of messages S\displaystyle S can be described by integers (yi)i[D]\displaystyle(y_{i})_{i\in[D]} (corresponding to the histogram of the messages).

Consider two neighboring datasets x=(x1,x2,,xn)\displaystyle x=(x_{1},x_{2},\dotsc,x_{n}) and x=(x1,x2,,xn)\displaystyle x^{\prime}=(x_{1}^{\prime},x_{2},\dotsc,x_{n}) (without loss of generality, we assume that they differ at the first user). Let Y\displaystyle Y and Y\displaystyle Y^{\prime} be the corresponding distributions of (yi)i[D]\displaystyle(y_{i})_{i\in[D]} given input datasets x\displaystyle x and x\displaystyle x^{\prime}. The goal is to show that they satisfy the (ε,δ)\displaystyle(\varepsilon,\delta)-DP constraint. That is, we have to establish that dε(Y||Y)δ\displaystyle d_{\varepsilon}(Y||Y^{\prime})\leq\delta.

To simplify the analysis, we introduce another dataset x¯=(0,x2,,xn)\displaystyle\bar{x}=(0,x_{2},\dotsc,x_{n}), and let Y¯\displaystyle\bar{Y} be the corresponding distribution of (yi)i[D]\displaystyle(y_{i})_{i\in[D]} given input dataset x¯\displaystyle\bar{x}. By the composition rule of (ε,δ)\displaystyle(\varepsilon,\delta)-DP, it suffices to show that the pairs (Y,Y¯)\displaystyle(Y,\bar{Y}) and (Y¯,Y)\displaystyle(\bar{Y},Y^{\prime}) satisfy (ε/2,δ/3)\displaystyle(\varepsilon/2,\delta/3)-DP (note that ε<1\displaystyle\varepsilon<1, and δ/3+eε/2δ/3δ\displaystyle\delta/3+e^{\varepsilon/2}\cdot\delta/3\leq\delta). By symmetry, it suffices to consider the pair (Y,Y¯)\displaystyle(Y,\bar{Y}) and prove that dε/2(Y||Y¯)δ/3\displaystyle d_{\varepsilon/2}(Y||\bar{Y})\leq\delta/3.

Let i=x1\displaystyle i=x_{1}, and mi\displaystyle m_{i} be the number of times that i\displaystyle i appears in x2,,xn\displaystyle x_{2},\dotsc,x_{n}. First note that all coordinates in both Y\displaystyle Y and Y¯\displaystyle\bar{Y} are independent, and furthermore the marginal distribution of Y\displaystyle Y and Y¯\displaystyle\bar{Y} on coordinates in [D]{i}\displaystyle[D]\setminus\{i\} are identical. Hence, by Item (1) of Proposition 3.4, it suffices to establish that Yi\displaystyle Y_{i} and Y¯i\displaystyle\bar{Y}_{i} satisfy (ε/2,δ/3)\displaystyle(\varepsilon/2,\delta/3)-DP.

The distribution of Y¯i\displaystyle\bar{Y}_{i} is 𝖡𝗂𝗇(n,q)+2𝖭𝖡(r,p)+𝖡𝗂𝗇(mi,1/2)\displaystyle\mathsf{Bin}(n,q^{\prime})+2\cdot\mathsf{NB}(r,p)+\mathsf{Bin}(m_{i},1/2), and the distribution of Yi\displaystyle Y_{i} is 𝖡𝗂𝗇(n,q)+2𝖭𝖡(r,p)+𝖡𝗂𝗇(mi+1,1/2)\displaystyle\mathsf{Bin}(n,q^{\prime})+2\cdot\mathsf{NB}(r,p)+\mathsf{Bin}(m_{i}+1,1/2).171717 Recall that for two random variables X\displaystyle X and Y\displaystyle Y, we use X+Y\displaystyle X+Y to denote the random variable distributed as a sum of two independent samples from X\displaystyle X and Y\displaystyle Y. Since 𝖡𝗂𝗇(mi+1,1/2)=𝖡𝗂𝗇(mi,1/2)+𝖡𝖾𝗋(1/2)\displaystyle\mathsf{Bin}(m_{i}+1,1/2)=\mathsf{Bin}(m_{i},1/2)+\mathsf{Ber}(1/2), it suffices to consider the case where mi=0\displaystyle m_{i}=0 by Item (1) of Proposition 3.4.

We need the following lemma, whose proof is deferred until we finish the proof of Theorem 8.4.

Lemma 8.6.

Let n, q', r, p be specified as in Set-Global-Constants(n,ε,δ), X = Bin(n,q') + 2·NB(r,p), and Y = Bin(n,q') + 2·NB(r,p) + Ber(1/2). Then,

dε/2(X||Y)δ/3anddε/2(Y||X)δ/3.\displaystyle d_{\varepsilon/2}(X||Y)\leq\delta/3\quad\text{and}\quad d_{\varepsilon/2}(Y||X)\leq\delta/3.

By Lemma 8.6 and the preceding discussion, it follows that d_ε(Y||Y') ≤ δ, which shows that our protocol is (ε,δ)-DP as desired.

In the following, we will need the proposition below which gives us an estimate on q\displaystyle q^{\prime}.

Proposition 8.7.

Let n,q,ε0\displaystyle n,q^{\prime},\varepsilon_{0} be specified as in Set-Global-Constants(n,ε,δ)\displaystyle(n,\varepsilon,\delta). Then, qO(ln(ε01)/n)\displaystyle q^{\prime}\leq O(\ln(\varepsilon_{0}^{-1})/n).

Proof.

Since ε00.01\displaystyle\varepsilon_{0}\leq 0.01, we have eε01ε0/2\displaystyle e^{-\varepsilon_{0}}\leq 1-\varepsilon_{0}/2. Hence, 1eε0ε0/2\displaystyle 1-e^{-\varepsilon_{0}}\geq\varepsilon_{0}/2. Plugging in the definition of q\displaystyle q^{\prime}, it follows that (1eε0)1/neln(ε0/2)/n1+ln(ε0/2)/n\displaystyle(1-e^{-\varepsilon_{0}})^{1/n}\geq e^{\ln(\varepsilon_{0}/2)/n}\geq 1+\ln(\varepsilon_{0}/2)/n. Finally, it follows that

q=1(1eε0)1/n2ln(ε0/2)/2n=O(ln(ε01)/n).\displaystyle q^{\prime}=\frac{1-(1-e^{-\varepsilon_{0}})^{1/n}}{2}\leq-\ln(\varepsilon_{0}/2)/2n=O(\ln(\varepsilon_{0}^{-1})/n).\qed
Efficiency Analysis.

We now analyze the message complexity of our protocol. Note that

𝔼[𝖭𝖡(r/n,p)]=1npr1p=O(1nΔε0log(1/δ))=O(1nε1ln(2/ε)log(1/δ)2).\displaystyle\operatornamewithlimits{\mathbb{E}}[\mathsf{NB}(r/n,p)]=\frac{1}{n}\cdot\frac{pr}{1-p}=O\left(\frac{1}{n}\cdot\frac{\Delta}{\varepsilon_{0}}\cdot\log(1/\delta)\right)=O\left(\frac{1}{n}\cdot\varepsilon^{-1}\ln(2/\varepsilon)\cdot\log(1/\delta)^{2}\right).

By a straightforward calculation, each user sends

12+O(D𝔼[𝖭𝖡(r/n,p)]+Dq)12+O(log(1/δ)2ln(2/ε)εDn)\displaystyle\frac{1}{2}+O(D\cdot\operatornamewithlimits{\mathbb{E}}[\mathsf{NB}(r/n,p)]+D\cdot q^{\prime})\leq\frac{1}{2}+O\left(\frac{\log(1/\delta)^{2}\ln(2/\varepsilon)}{\varepsilon}\cdot\frac{D}{n}\right)

messages in expectation. ∎

Finally, we prove Lemma 8.6.

Proof of Lemma 8.6.

We first bound d_{ε/2}(X||Y). Since q' = O(ln(ε_0^{-1})/n) by Proposition 8.7, we can choose the constant c_0 so that

\Pr\left[\mathsf{Bin}(n,q^{\prime})>c_{0}\cdot\log\delta^{-1}\ln(\varepsilon_{0}^{-1})\right]\leq\delta/10.

Recall that Δ = ⌈c_0·log δ^{-1}·ln(ε_0^{-1}) + 1⌉, and note that our choices of r and p satisfy Lemma 8.2 with privacy parameters ε_0 ≤ ε/6 and δ/10.

Now, let A=𝖡𝗂𝗇(n,q)\displaystyle A=\mathsf{Bin}(n,q^{\prime}), N=𝖭𝖡(r,p)\displaystyle N=\mathsf{NB}(r,p), and B=𝖡𝖾𝗋(1/2)\displaystyle B=\mathsf{Ber}(1/2).

To apply Item (2) of Proposition 3.4, we are going to decompose X=A+2N\displaystyle X=A+2\cdot N and Y=A+2N+B\displaystyle Y=A+2\cdot N+B into a weighted sum of three sub-distributions.

Decomposition of X=A+2N\displaystyle X=A+2\cdot N.

We define three events on A\displaystyle A as follows:

\mathcal{E}_{big}=[A>c_{0}\cdot\log(\delta^{-1})\ln(\varepsilon_{0}^{-1})],\quad\mathcal{E}_{even}=[A\leq c_{0}\cdot\log(\delta^{-1})\ln(\varepsilon_{0}^{-1})\wedge A\equiv 0\bmod 2],

and

\mathcal{E}_{odd}=[A\leq c_{0}\cdot\log(\delta^{-1})\ln(\varepsilon_{0}^{-1})\wedge A\equiv 1\bmod 2].

We let α_big = Pr_A[E_big], α_even = Pr_A[E_even], and α_odd = Pr_A[E_odd].

From our choice of c_0, we have α_big = Pr[Bin(n,q') > c_0·log δ^{-1}·ln(ε_0^{-1})] ≤ δ/10. Let q = 1/(2e^{ε_0}). By Lemma 8.5, it follows that |α_odd − q| ≤ δ/10 and |α_even − (1−q)| ≤ δ/10.

Therefore, let Abig:=A|big\displaystyle A_{big}:=A|\mathcal{E}_{big}, Aeven:=A|even\displaystyle A_{even}:=A|\mathcal{E}_{even} and Aodd:=A|odd\displaystyle A_{odd}:=A|\mathcal{E}_{odd}. We can now decompose A+2N\displaystyle A+2N as a mixture of components Abig+2N\displaystyle A_{big}+2N, Aeven+2N\displaystyle A_{even}+2N and Aodd+2N\displaystyle A_{odd}+2N with corresponding mixing weights αbig\displaystyle\alpha_{big}, αeven\displaystyle\alpha_{even} and αodd\displaystyle\alpha_{odd}.

Decomposition of Y=A+2N+B\displaystyle Y=A+2\cdot N+B.

Now, we define three events on (A,B)\displaystyle(A,B) as follows

\widetilde{\mathcal{E}}_{big}=[A>c_{0}\cdot\log(\delta^{-1})\ln(\varepsilon_{0}^{-1})],\quad\widetilde{\mathcal{E}}_{even}=[A\leq c_{0}\cdot\log(\delta^{-1})\ln(\varepsilon_{0}^{-1})\wedge A+B\equiv 0\bmod 2],

and

\widetilde{\mathcal{E}}_{odd}=[A\leq c_{0}\cdot\log(\delta^{-1})\ln(\varepsilon_{0}^{-1})\wedge A+B\equiv 1\bmod 2].

Similarly, we let β_big = Pr_{A,B}[Ẽ_big], β_even = Pr_{A,B}[Ẽ_even], and β_odd = Pr_{A,B}[Ẽ_odd].

By our choice of c0\displaystyle c_{0}, we have βbigδ/10\displaystyle\beta_{big}\leq\delta/10. Since Pr[A+B1mod2]=1/2\displaystyle\Pr[A+B\equiv 1\bmod 2]=1/2, it follows that |βeven1/2|δ/10\displaystyle|\beta_{even}-1/2|\leq\delta/10 and |βodd1/2|δ/10\displaystyle|\beta_{odd}-1/2|\leq\delta/10.

Let (A+B)big:=(A+B)|~big\displaystyle(A+B)_{big}:=(A+B)|\widetilde{\mathcal{E}}_{big}, (A+B)even:=(A+B)|~even\displaystyle(A+B)_{even}:=(A+B)|\widetilde{\mathcal{E}}_{even} and (A+B)odd:=(A+B)|~odd\displaystyle(A+B)_{odd}:=(A+B)|\widetilde{\mathcal{E}}_{odd}. We therefore decompose A+2N+B\displaystyle A+2N+B as a mixture of components (A+B)big+2N\displaystyle(A+B)_{big}+2N, (A+B)even+2N\displaystyle(A+B)_{even}+2N and (A+B)odd+2N\displaystyle(A+B)_{odd}+2N with mixing weights βbig\displaystyle\beta_{big}, βeven\displaystyle\beta_{even} and βodd\displaystyle\beta_{odd}.

Bounding dε/2(X||Y)\displaystyle d_{\varepsilon/2}(X||Y).

By Item (2) of Proposition 3.4, we have that

dε/2(X||Y)\displaystyle\displaystyle d_{\varepsilon/2}(X||Y)\leq αbig\displaystyle\displaystyle\alpha_{big}
+\displaystyle\displaystyle+ αevendε/2+ln(βeven/αeven)(Aeven+2N||(A+B)even+2N)\displaystyle\displaystyle\alpha_{even}\cdot d_{\varepsilon/2+\ln(\beta_{even}/\alpha_{even})}(A_{even}+2N||(A+B)_{even}+2N)
+\displaystyle\displaystyle+ αodddε/2+ln(βodd/αodd)(Aodd+2N||(A+B)odd+2N).\displaystyle\displaystyle\alpha_{odd}\cdot d_{\varepsilon/2+\ln(\beta_{odd}/\alpha_{odd})}(A_{odd}+2N||(A+B)_{odd}+2N).

Now, note that βoddαodd1\displaystyle\frac{\beta_{odd}}{\alpha_{odd}}\geq 1, and βevenαeven1/2δ/101q+δ/10e2ε0\displaystyle\frac{\beta_{even}}{\alpha_{even}}\geq\frac{1/2-\delta/10}{1-q+\delta/10}\geq e^{-2\varepsilon_{0}}. It follows that ε/2+ln(βeven/αeven)ε/22ε0ε0\displaystyle\varepsilon/2+\ln(\beta_{even}/\alpha_{even})\geq\varepsilon/2-2\varepsilon_{0}\geq\varepsilon_{0} (since ε0ε/6\displaystyle\varepsilon_{0}\leq\varepsilon/6), and ε/2+ln(βodd/αodd)ε/2ε0\displaystyle\varepsilon/2+\ln(\beta_{odd}/\alpha_{odd})\geq\varepsilon/2\geq\varepsilon_{0}.

Hence by Corollary 8.3, we have that

dε/2(X||Y)δ/10+dε0(Aodd+2N||(A+B)odd+2N)+dε0(Aeven+2N||(A+B)even+2N)δ/3.\displaystyle d_{\varepsilon/2}(X||Y)\leq\delta/10+d_{\varepsilon_{0}}(A_{odd}+2N||(A+B)_{odd}+2N)+d_{\varepsilon_{0}}(A_{even}+2N||(A+B)_{even}+2N)\leq\delta/3.

By a similar calculation, we can also bound d_{ε/2}(Y||X) by δ/3. ∎

Extension to Robust Shuffle Privacy.

Now we briefly discuss how to generalize the analysis of the above protocol in order to show that it also satisfies the stronger robust shuffle privacy condition. We first need the following formal definition of robust shuffle privacy.

Definition 8.8 ([BCJM20]).

A protocol P = (R,S,A) is (ε,δ,γ)-robustly shuffle differentially private if, for all n ∈ ℕ and γ' ≥ γ, the algorithm S_{γ'n} ∘ R^{γ'n} is (ε,δ)-DP. In other words, P guarantees (ε,δ)-shuffle privacy whenever at least a γ fraction of users follow the protocol.

Note that while the above definition requires the privacy condition to be satisfied whenever there is at least a γ\displaystyle\gamma fraction of users participating, the accuracy condition is only required when all users participate. That is, if some users drop from the protocol, then the analyzer does not need to output an accurate estimate of CountDistinct.

Theorem 8.9.

For two constants γ,ε(0,1]\displaystyle\gamma,\varepsilon\in(0,1], and δ1/n\displaystyle\delta\leq 1/n, there is an (ε,δ,γ)\displaystyle(\varepsilon,\delta,\gamma)-robustly shuffle differentially private protocol solving CountDistinctn,D\displaystyle\textsf{\small CountDistinct}_{n,D} with error Oγ,ε(D)\displaystyle O_{\gamma,\varepsilon}\left(\sqrt{D}\right) and with probability at least 0.99\displaystyle 0.99. Moreover, the expected number of messages sent by each user is 12+Oγ,ε(log(1/δ)2Dn)\displaystyle\frac{1}{2}+O_{\gamma,\varepsilon}\left(\log(1/\delta)^{2}\cdot\frac{D}{n}\right).181818To make the notation clean, we choose not to analyze the exact dependence on ε\displaystyle\varepsilon and γ\displaystyle\gamma.

Proof Sketch.

To make the algorithm in Theorem 8.4 robustly shuffle private, we need the following modifications:

  • In Algorithm 1, we set q=1(1eε0)1/(γn)2\displaystyle q^{\prime}=\frac{1-(1-e^{-\varepsilon_{0}})^{1/(\gamma n)}}{2}, instead of q=1(1eε0)1/n2\displaystyle q^{\prime}=\frac{1-(1-e^{-\varepsilon_{0}})^{1/n}}{2}.

  • In Algorithm 2, we let η𝖭𝖡(r/(γn),p)\displaystyle\eta\leftarrow\mathsf{NB}(r/(\gamma n),p), instead of η𝖭𝖡(r/n,p)\displaystyle\eta\leftarrow\mathsf{NB}(r/n,p).

  • In Algorithm 3, we set z = (2Cτ − D)/(τ − 1) for τ = 1/(1 − (1 − e^{−ε_0})^{1/γ}), instead of z = (2Ce^{ε_0} − D)/(e^{ε_0} − 1).

The first two modifications guarantee that there is enough noise even when only γn\displaystyle\gamma\cdot n users participate, so that the privacy analysis of Theorem 8.4 goes through. We now show that the last modification allows us to obtain an accurate estimate of CountDistinctn,D\displaystyle\textsf{\small CountDistinct}_{n,D} when all users participate. In the following, we will use the same notation as in the proof of Theorem 8.4. Note that we have

q=1(1eε0)1/(γn)2=1(1τ1)1/n2.\displaystyle q^{\prime}=\frac{1-(1-e^{-\varepsilon_{0}})^{1/(\gamma n)}}{2}=\frac{1-(1-\tau^{-1})^{1/n}}{2}.

Hence by Lemma 8.5, Ci\displaystyle C_{i} is distributed as 𝖡𝖾𝗋(1/2τ)\displaystyle\mathsf{Ber}(1/2\tau) when iE\displaystyle i\notin E. A similar calculation then shows that the error can be bounded by Oτ(D)=Oε,γ(D)\displaystyle O_{\tau}(\sqrt{D})=O_{\varepsilon,\gamma}(\sqrt{D}). ∎
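Concretely, the three modifications only change the constants and the analyzer's rescaling; a sketch of the robust variant of Set-Global-Constants (same caveats as the earlier sketch, with c_0 again standing in for the unspecified constant) is:

```python
import numpy as np

def set_global_constants_robust(n, eps, delta, gamma, c0=50.0):
    """Robust variant (sketch): calibrate the noise as if only gamma*n users participate."""
    eps0 = min(eps / 6, 0.01)
    Delta = int(np.ceil(c0 * np.log2(1 / delta) * np.log(1 / eps0) + 1))
    p = np.exp(-0.1 * eps0 / Delta)
    r = 50 * np.exp(eps0 / Delta) * np.log2(10 / delta)
    q_prime = (1 - (1 - np.exp(-eps0)) ** (1.0 / (gamma * n))) / 2
    tau = 1.0 / (1 - (1 - np.exp(-eps0)) ** (1.0 / gamma))   # tau = e^{eps0} when gamma = 1
    return eps0, p, r, q_prime, tau
# The randomizer now draws eta ~ NB(r/(gamma*n), p), and the analyzer outputs
# (2*C*tau - D) / (tau - 1).
```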

ln(O(n))\displaystyle\ln(O(n))-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} Protocol for CountDistinct.

Finally, we show that the protocol from Theorem 8.4 is also ln(O(n))\displaystyle\ln(O(n))-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} with a simple modification, which proves Theorem 1.3 (restated below).


Theorem 1.3. (restated) There is a (ln(n)+O(1))\displaystyle(\ln(n)+O(1))-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} protocol computing CountDistinctn,n\displaystyle\textsf{\small CountDistinct}_{n,n} with error O(n)\displaystyle O(\sqrt{n}).

Proof Sketch.

Let D=n\displaystyle D=n. We consider the following modification of Algorithm 2.

Input: x[D]\displaystyle x\in[D] is the user’s input. D\displaystyle D is the universe size.
1 ε0=1\displaystyle\varepsilon_{0}=1, q=1(1eε0)1/n2\displaystyle q^{\prime}=\frac{1-(1-e^{-\varepsilon_{0}})^{1/n}}{2};
2 Toss a uniformly random coin to get v{0,1}\displaystyle v\in\{0,1\};
3 if v=1\displaystyle v=1 then
4       send message (x)\displaystyle(x);
5      
6for i[D]{x}\displaystyle i\in[D]\setminus\{x\} do
7       Let y𝖡𝖾𝗋(q)\displaystyle y\leftarrow\mathsf{Ber}(q^{\prime});
8       if y=1\displaystyle y=1 then
9             send message (i)\displaystyle(i);
10            
11      
Algorithm 4 Randomizer(x\displaystyle x, D\displaystyle D, n\displaystyle n, ε\displaystyle\varepsilon, δ\displaystyle\delta)

That is, in Algorithm 4 we remove the noise messages sampled from the distribution 2𝖭𝖡(r/n,p)\displaystyle 2\cdot\mathsf{NB}(r/n,p). Also, we do not send the same message more than once (the loop over i\displaystyle i skips the element x\displaystyle x).

When viewing it as a local protocol, we can assume that each user first collects all the messages it would send in Algorithm 4, and then simply outputs the histogram (so our new local randomizer only sends a single message). The analyzer in the local protocol can then aggregate these histograms, and apply the analyzer in Algorithm 3. By the same accuracy proof as in Theorem 8.4, it follows that the protocol achieves error O(n)\displaystyle O(\sqrt{n}) with probability at least 0.99\displaystyle 0.99. So it only remains to prove that the protocol is ln(O(n))\displaystyle\ln(O(n))-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}.
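A minimal sketch of the resulting one-message local randomizer and its analyzer (our own code; the single message is the histogram bit-vector over [D] with D = n, and ε_0 = 1 as in Algorithm 4):

```python
import numpy as np

rng = np.random.default_rng(0)

def local_randomizer(x, n):
    """Single-message local randomizer: the histogram of the messages Algorithm 4 sends."""
    D = n
    q_prime = (1 - (1 - np.exp(-1.0)) ** (1.0 / n)) / 2       # eps0 = 1, q' = Theta(1/n)
    hist = rng.binomial(1, q_prime, size=D)                    # Ber(q') for every i != x
    hist[x - 1] = rng.integers(0, 2)                           # own element sent w.p. 1/2
    return hist

def local_analyzer(histograms, n):
    """Aggregate the users' histograms and apply Algorithm 3's estimator with eps0 = 1."""
    D = n
    C = int(np.sum(np.sum(histograms, axis=0) % 2))
    return (2 * C * np.e - D) / (np.e - 1)
```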

We let R\displaystyle R be the randomizer in Algorithm 4, and we use 𝖧𝗂𝗌𝗍(R(x))\displaystyle\mathsf{Hist}(R(x)) to denote the distribution of the histogram of the messages output by R\displaystyle R, which is exactly the output distribution of our new local randomizer.

Without loss of generality, it suffices to show that for all possible histograms z{0,1}D\displaystyle z\in\{0,1\}^{D} (note that Algorithm 4 does not send a message more than once), it holds that

𝖧𝗂𝗌𝗍(R(1))z𝖧𝗂𝗌𝗍(R(2))zO(n).\displaystyle\frac{\mathsf{Hist}(R(1))_{z}}{\mathsf{Hist}(R(2))_{z}}\leq O(n).

Note that Hist(R(1))_z / Hist(R(2))_z only depends on the first two bits of z. By enumerating all possible combinations of these two bits, we can bound it by

𝖧𝗂𝗌𝗍(R(1))z𝖧𝗂𝗌𝗍(R(2))z1/2q1q1/2=1qqO(n).\displaystyle\frac{\mathsf{Hist}(R(1))_{z}}{\mathsf{Hist}(R(2))_{z}}\leq\frac{1/2}{q^{\prime}}\cdot\frac{1-q^{\prime}}{1/2}=\frac{1-q^{\prime}}{q^{\prime}}\leq O(n).

The last inequality follows from the fact that q=Θ(1/n)\displaystyle q^{\prime}=\Theta(1/n). ∎

8.4 Public-Coin Protocol

We are now ready to prove Theorem 1.6 (restated below).


Theorem 1.6. (restated) For all εO(1)\displaystyle\varepsilon\leq O(1) and δ1/n\displaystyle\delta\leq 1/n, there is a public-coin (ε,δ)\displaystyle(\varepsilon,\delta)-DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} protocol computing CountDistinctn\displaystyle\textsf{\small CountDistinct}_{n} with error O(nlog(δ1)ε1.5ln(2/ε))\displaystyle O\left(\sqrt{n}\cdot\log(\delta^{-1})\cdot\varepsilon^{-1.5}\cdot\sqrt{\ln(2/\varepsilon)}\right) and probability at least 0.99\displaystyle 0.99. Moreover, the expected number of messages sent by each user is at most 1\displaystyle 1.

Proof.

Let c1\displaystyle c_{1} be the constant in Theorem 8.4 such that the expected number of messages is bounded by c1log(1/δ)2ln(2/ε)εDn+1/2\displaystyle c_{1}\cdot\frac{\log(1/\delta)^{2}\ln(2/\varepsilon)}{\varepsilon}\cdot\frac{D}{n}+1/2.

The Protocol.

We set D=n/(2c1log(1/δ)2ln(2/ε)ε)\displaystyle D=\left\lfloor n\left/\left(2\cdot c_{1}\cdot\frac{\log(1/\delta)^{2}\ln(2/\varepsilon)}{\varepsilon}\right)\right.\right\rfloor so that the foregoing expected number of messages is bounded by 1\displaystyle 1.

Note that we can assume ε1ln(2/ε)log(1/δ)2=o(n)\displaystyle\varepsilon^{-1}\cdot\ln(2/\varepsilon)\cdot\log(1/\delta)^{2}=o(n) as otherwise we are only required to solve CountDistinctn\displaystyle\textsf{\small CountDistinct}_{n} with the trivial error bound O(n)\displaystyle O(n). Therefore, we have D1\displaystyle D\geq 1 and n/D=O(ε1ln(2/ε)log(1/δ)2)\displaystyle n/D=O(\varepsilon^{-1}\cdot\ln(2/\varepsilon)\cdot\log(1/\delta)^{2}).

We are going to apply a reduction to the private-coin protocol for CountDistinct_{n,D} in Theorem 8.4. The full protocol is as follows (an illustrative sketch follows the list):

  • Using the public randomness, the users jointly sample a uniformly random mapping f:𝒳[n]\displaystyle f\colon\mathcal{X}\to[n] and a uniformly random permutation π\displaystyle\pi on [n]\displaystyle[n].

  • For each user holding an input x𝒳\displaystyle x\in\mathcal{X}, it computes z=π(f(x))\displaystyle z=\pi(f(x)), and sets its new input to be z\displaystyle z if zD\displaystyle z\leq D, and 0\displaystyle 0 otherwise. Then it runs the private-coin protocol in Theorem 8.4.

  • Let fn(m):=n(1(11n)m)\displaystyle f_{n}(m):=n\cdot\left(1-\left(1-\frac{1}{n}\right)^{m}\right). The analyzer first runs the analyzer in Theorem 8.4 to obtain an estimate z¯\displaystyle\bar{z}. Then it computes z^=z¯nD\displaystyle\hat{z}=\bar{z}\cdot\frac{n}{D}, and outputs

    z=argminm{0,1,,n}|fn(m)z^|.\displaystyle z=\mathop{\textrm{argmin}}_{m\in\{0,1,\dotsc,n\}}|f_{n}(m)-\hat{z}|.
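An illustrative sketch of this reduction (our own code; for illustration the random mapping f is materialized as a table over an explicitly given universe, whereas the protocol only needs it to be determined by the shared public randomness):

```python
import numpy as np

rng = np.random.default_rng(0)   # stands in for the shared public randomness

def preprocess_inputs(inputs, universe, n, D):
    """Map each raw input x to pi(f(x)), and keep it only if it lands in [D] (else 0)."""
    f = rng.integers(1, n + 1, size=len(universe))   # random mapping universe -> [n]
    pi = rng.permutation(n) + 1                      # random permutation of [n]
    index = {u: j for j, u in enumerate(universe)}
    out = []
    for x in inputs:
        z = pi[f[index[x]] - 1]
        out.append(z if z <= D else 0)
    return out

def postprocess_estimate(z_bar, n, D):
    """Scale the private-coin estimate up by n/D and invert f_n(m) = n(1 - (1 - 1/n)^m)."""
    z_hat = z_bar * n / D
    m = np.arange(n + 1)
    f_n = n * (1 - (1 - 1.0 / n) ** m)
    return int(np.argmin(np.abs(f_n - z_hat)))
```

The list returned by preprocess_inputs is what the users feed to the private-coin protocol of Theorem 8.4, and z_bar is the output of its analyzer.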
Analysis of the Protocol.

The privacy of the protocol above follows directly from the privacy property of the protocol from Theorem 8.4. Moreover, the bound on the expected number of messages per user simply follows from our choice of the parameter D\displaystyle D.

It thus suffices to establish the accuracy guarantee of the protocol. Let S={xi}i[n]\displaystyle S=\{x_{i}\}_{i\in[n]} be the set of all inputs, and the goal is to estimate |S|\displaystyle|S|. We also let S^={f(xi)}i[n]\displaystyle\hat{S}=\{f(x_{i})\}_{i\in[n]} and S¯=S^π1([D])\displaystyle\bar{S}=\hat{S}\cap\pi^{-1}([D]).

By the accuracy guarantee of Theorem 8.4, it follows that with probability at least 0.99\displaystyle 0.99, we have

|z¯|S¯||O(Dε1).\displaystyle|\bar{z}-|\bar{S}||\leq O(\sqrt{D}\cdot\varepsilon^{-1}).

In the following, we will condition on this event.

Proving that z^\displaystyle\hat{z} is a good estimate of |S^|\displaystyle|\hat{S}|.

Next, we show that z^\displaystyle\hat{z} is close to |S^|\displaystyle|\hat{S}|. We will rely on the following lemma.

Lemma 8.10.

For a uniformly random permutation π:[n][n]\displaystyle\pi\colon[n]\to[n] and a fixed set E\displaystyle E, for every B[1,n]\displaystyle B\in[1,n] such that n/B\displaystyle n/B is an integer, let Eπ,n/B=Eπ1([n/B])\displaystyle E_{\pi,n/B}=E\cap\pi^{-1}([n/B]), we have

Prπ[||Eπ,n/B|B|E||10B|E|]0.01.\displaystyle\Pr_{\pi}\left[\Big{|}|E_{\pi,n/B}|\cdot B-|E|\Big{|}\geq 10\cdot\sqrt{B\cdot|E|}\right]\leq 0.01.
Proof.

For each i[n]\displaystyle i\in[n], let Xi=Xi(π)\displaystyle X_{i}=X_{i}(\pi) be the indicator that iEπ,n/B\displaystyle i\in E_{\pi,n/B}. Note that these Xi\displaystyle X_{i}’s are not independent, but they are negatively correlated [DR98, Proposition 7 and 11], hence a Chernoff bound still applies.

Note that 𝔼[Xi]=1B|E|n\displaystyle\operatornamewithlimits{\mathbb{E}}[X_{i}]=\frac{1}{B}\cdot\frac{|E|}{n}. By a Chernoff bound, it thus follows that

Prπ[|i=1nXin𝔼[X1]|10n𝔼[Xi]]0.01,\displaystyle\Pr_{\pi}\left[\Big{|}\sum_{i=1}^{n}X_{i}-n\cdot\operatornamewithlimits{\mathbb{E}}[X_{1}]\Big{|}\geq 10\cdot\sqrt{n\cdot\operatornamewithlimits{\mathbb{E}}[X_{i}]}\right]\leq 0.01,

and hence

Prπ[||Eπ,n/B||E|/B|10|E|/B]0.01.\displaystyle\Pr_{\pi}\left[\Big{|}|E_{\pi,n/B}|-|E|/B\Big{|}\geq 10\cdot\sqrt{|E|/B}\right]\leq 0.01.

Scaling both sides of the above inequality by B\displaystyle B concludes the proof. ∎

We now set B=nD\displaystyle B=\frac{n}{D} and recall that z^=Bz¯\displaystyle\hat{z}=B\cdot\bar{z}. By Lemma 8.10, with probability at least 0.98\displaystyle 0.98, it holds that

|z^|S^||\displaystyle\displaystyle|\hat{z}-|\hat{S}|| |z^|S¯|B|+||S^||S¯|B|\displaystyle\displaystyle\leq\left|\hat{z}-|\bar{S}|\cdot B\right|+\left||\hat{S}|-|\bar{S}|\cdot B\right|
B|z¯|S¯||+O(Bn)\displaystyle\displaystyle\leq B\cdot|\bar{z}-|\bar{S}||+O(\sqrt{B\cdot n})
=O(BDε1)+O(Bn)=O(Bnε1).\displaystyle\displaystyle=O(B\sqrt{D}\cdot\varepsilon^{-1})+O(\sqrt{B\cdot n})=O(\sqrt{Bn}\cdot\varepsilon^{-1}).
Proving that z\displaystyle z is a good estimate of |S|\displaystyle|S|.

Finally, we show that our output z\displaystyle z is a good estimate of |S|\displaystyle|S|. To do so, we need the following lemma.

Lemma 8.11.

Let fn(m):=n(1(11n)m)\displaystyle f_{n}(m):=n\cdot\left(1-\left(1-\frac{1}{n}\right)^{m}\right). For a uniformly random mapping f:𝒳[n]\displaystyle f\colon\mathcal{X}\to[n] and a fixed set E𝒳\displaystyle E\subseteq\mathcal{X} such that |E|=mn\displaystyle|E|=m\leq n, we have that

Pr[||{f(x)}xE|fn(m)|10n]0.01.\displaystyle\Pr[\left||\{f(x)\}_{x\in E}|-f_{n}(m)\right|\geq 10\sqrt{n}]\leq 0.01.
Proof.

For each i[n]\displaystyle i\in[n], let Xi=Xi(f)\displaystyle X_{i}=X_{i}(f) be the indicator whether i{f(x)}xE\displaystyle i\in\{f(x)\}_{x\in E}. As before, these Xi\displaystyle X_{i}’s are not independent but are negatively correlated [DR98, Proposition 7 and 11], hence a Chernoff bound still applies.

Note that 𝔼[Xi]=(1(11n)m)\displaystyle\operatornamewithlimits{\mathbb{E}}[X_{i}]=\left(1-\left(1-\frac{1}{n}\right)^{m}\right). By a Chernoff bound, it thus follows that

\Pr_{f}\left[\Big{|}\sum_{i=1}^{n}X_{i}-n\cdot\operatornamewithlimits{\mathbb{E}}[X_{1}]\Big{|}\geq 10\cdot\sqrt{n\cdot\operatornamewithlimits{\mathbb{E}}[X_{1}]}\right]\leq 0.01.

Noting that i=1nXi=|{f(x)}xE|\displaystyle\sum_{i=1}^{n}X_{i}=|\{f(x)\}_{x\in E}| and n𝔼[Xi]=fn(m)\displaystyle n\cdot\operatornamewithlimits{\mathbb{E}}[X_{i}]=f_{n}(m) completes the proof. ∎

By Lemma 8.11, it follows that with probability at least 0.99\displaystyle 0.99, we have ||S^|fn(|S|)|10n\displaystyle\left||\hat{S}|-f_{n}(|S|)\right|\leq 10\sqrt{n}.

Putting everything together, with probability at least 0.97\displaystyle 0.97, we get that

|z^fn(|S|)|O(Bnε1).\displaystyle\left|\hat{z}-f_{n}(|S|)\right|\leq O(\sqrt{Bn}\cdot\varepsilon^{-1}).

The final step is to show that z\displaystyle z accurately estimates |S|\displaystyle|S|. Recall that z=argminm{0,1,,n}|fn(m)z^|\displaystyle z=\mathop{\textrm{argmin}}_{m\in\{0,1,\dotsc,n\}}|f_{n}(m)-\hat{z}|, which in particular means that

|fn(z)z^||fn(|S|)z^|O(Bnε1).\displaystyle|f_{n}(z)-\hat{z}|\leq\left|f_{n}(|S|)-\hat{z}\right|\leq O(\sqrt{Bn}\cdot\varepsilon^{-1}).

By a triangle inequality, it follows that

|fn(z)fn(|S|)|O(Bnε1).\displaystyle|f_{n}(z)-f_{n}(|S|)|\leq O(\sqrt{Bn}\cdot\varepsilon^{-1}).

We need the following lemma to finish the analysis.

Lemma 8.12.

There is a constant c>0\displaystyle c>0 such that for all a,b{0,1,,n}\displaystyle a,b\in\{0,1,\dotsc,n\}, it holds that

|fn(a)fn(b)|c|ab|.\displaystyle|f_{n}(a)-f_{n}(b)|\geq c\cdot|a-b|.
Proof.

Suppose a<b\displaystyle a<b without loss of generality. Let t=ba\displaystyle t=b-a. We have that

fn(b)fn(a)\displaystyle\displaystyle f_{n}(b)-f_{n}(a) =n((11n)a(11n)b)\displaystyle\displaystyle=n\cdot\left(\left(1-\frac{1}{n}\right)^{a}-\left(1-\frac{1}{n}\right)^{b}\right)
=n(11n)a(1(11n)t)\displaystyle\displaystyle=n\cdot\left(1-\frac{1}{n}\right)^{a}\cdot\left(1-\left(1-\frac{1}{n}\right)^{t}\right)
=Ω(ntn)=Ω(t).\displaystyle\displaystyle=\Omega\left(n\cdot\frac{t}{n}\right)=\Omega(t).\qed

Finally, by Lemma 8.12, with probability at least 0.97>0.9\displaystyle 0.97>0.9, we obtain that |z|S||O(Bnε1)\displaystyle|z-|S||\leq O(\sqrt{Bn}\cdot\varepsilon^{-1}), which concludes the proof. ∎

9 Lower Bounds in Two-Party Differential Privacy

In this section, we depart from the local and shuffle models, and instead consider two-party differential privacy [MMP+10], which can be defined as follows.

Definition 9.1 (DP in the Two-Party Model [MMP+10]).

There are two parties A\displaystyle A and B\displaystyle B; A\displaystyle A holds X=(x1,,xn)𝒳n\displaystyle X=(x_{1},\dots,x_{n})\in\mathcal{X}^{n} and B\displaystyle B holds Y=(y1,,yn)𝒳n\displaystyle Y=(y_{1},\dots,y_{n})\in\mathcal{X}^{n}. Let P\displaystyle P be any randomized protocol between A\displaystyle A and B\displaystyle B. Let VIEWAP(X,Y)\displaystyle\textsf{VIEW}^{A}_{P}(X,Y) denote the tuple (X\displaystyle X, the private randomness of A\displaystyle A, the transcript of the protocol). Similarly, let VIEWBP(X,Y)\displaystyle\textsf{VIEW}^{B}_{P}(X,Y) denote the tuple (Y\displaystyle Y, the private randomness of B\displaystyle B, the transcript of the protocol).

We say that P\displaystyle P is (ε,δ)\displaystyle(\varepsilon,\delta)-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} if, for any X,Y𝒳n\displaystyle X,Y\in\mathcal{X}^{n}, the algorithms

(y1,,yn)VIEWAP(X,(y1,,yn))\displaystyle(y_{1},\dots,y_{n})\mapsto\textsf{VIEW}^{A}_{P}(X,(y_{1},\dots,y_{n}))
(x1,,xn)VIEWBP((x1,,xn),Y)\displaystyle(x_{1},\dots,x_{n})\mapsto\textsf{VIEW}^{B}_{P}((x_{1},\dots,x_{n}),Y)

are both (ε,δ)\displaystyle(\varepsilon,\delta)-DP.

We say that a two-party protocol P\displaystyle P computes a function f:𝒳2n\displaystyle f\colon\mathcal{X}^{2n}\to\mathbb{R} with error β\displaystyle\beta if, at the end of the protocol, at least one of the parties can output a number that lies in f(x1,,xn,y1,,yn)±β\displaystyle f(x_{1},\dots,x_{n},y_{1},\dots,y_{n})\pm\beta with probability at least 0.9\displaystyle 0.9.

We quickly note that, unlike in the local and shuffle models, we need not consider the public-coin and private-coin cases separately: as noted in [MMP+10], the two parties may share fresh private random bits without violating privacy, meaning that public randomness is unnecessary.

The goal of this section is to prove Theorem 1.11. To do this, we first state the necessary lower bound from [MMP+10] in Section 9.1. We then give our reduction and prove Theorem 1.11 in Section 9.2. Finally, in Section 9.3, we extend the lower bound to the case where the function is symmetric.

9.1 Inner Product Lower Bound from [MMP+10]

McGregor et al. [MMP+10] show that the inner product function is hard in the two-party model. Roughly speaking, they show that, if we let X,Y\displaystyle X,Y be uniformly random strings, then, for any not-too-large m\displaystyle m\in\mathbb{N}, no (O(1),o(1/n))\displaystyle(O(1),o(1/n))-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} protocol can distinguish between X,Ymodm\displaystyle\left<X,Y\right>\mod m and a uniformly random number from {0,,m1}\displaystyle\{0,\dots,m-1\}. McGregor et al. use this result when m=Ω~ε(n)\displaystyle m=\tilde{\Omega}_{\varepsilon}(\sqrt{n}), but we will use their result for m=2\displaystyle m=2.

To avoid confusion in the next subsection, we will use D\displaystyle D in place of n\displaystyle n in this subsection. The following theorem is implicit in the proof of Theorem A.5 of [MMP+11] (it follows by replacing m=6Δ/δ\displaystyle m=6\Delta/\delta with m=2\displaystyle m=2 there). Recall that 𝒰D\displaystyle\mathcal{U}_{D} is the uniform distribution over {0,1}D\displaystyle\{0,1\}^{D}.

Theorem 9.2 ([MMP+11]).

Let P\displaystyle P be any (ε,δ)\displaystyle(\varepsilon,\delta)-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} protocol. Suppose (X,Y)𝒰D2\displaystyle(X,Y)\leftarrow\mathcal{U}_{D}^{\otimes 2} and let Z\displaystyle Z be a uniformly random bit. Then, we have

(VIEWBP(X,Y),X,Ymod2)(VIEWBP(X,Y),Z)TVO(Dδ)+eΩε(D).\displaystyle\displaystyle\|(\textsf{VIEW}^{B}_{P}(X,Y),\langle X,Y\rangle\mod 2)-(\textsf{VIEW}^{B}_{P}(X,Y),Z)\|_{TV}\leq O\left(D\delta\right)+e^{-\Omega_{\varepsilon}(D)}.

It will be more convenient to state the above lower bound in terms of the hardness of distinguishing two distributions, as we have done in the rest of this paper. To state this, we will need the following notation. First, we use 𝒟^0 to denote the distribution 𝒰_D^{⊗2} conditioned on the inner product of the two strings being 0 mod 2; similarly, we use 𝒟^1 to denote the distribution 𝒰_D^{⊗2} conditioned on the inner product of the two strings being 1 mod 2. Furthermore, for any distribution 𝒟 on ({0,1}^D)^2, we write VIEW^A_P(𝒟) (respectively, VIEW^B_P(𝒟)) to denote the distribution of VIEW^A_P(X,Y) (respectively, VIEW^B_P(X,Y)) when (X,Y) is drawn according to 𝒟.

We may now state the following corollary, which is an immediate consequence of Theorem 9.2.

Corollary 9.3.

Let P\displaystyle P be any (ε,δ)\displaystyle(\varepsilon,\delta)-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} protocol. Then, we have that

VIEWBP(𝒟0)VIEWBP(𝒟1)TVO(Dδ)+eΩε(D).\displaystyle\displaystyle\|\textsf{VIEW}^{B}_{P}(\mathcal{D}^{0})-\textsf{VIEW}^{B}_{P}(\mathcal{D}^{1})\|_{TV}\leq O\left(D\delta\right)+e^{-\Omega_{\varepsilon}(D)}.

9.2 From Parity to the Ω~(n)\displaystyle\tilde{\Omega}(n) Gap

We will now construct the hard distributions that eventually give the gap of Ω̃(n) between the sensitivity and the error achievable in the two-party model. The hard distributions are simply concatenations of 𝒟^0 or 𝒟^1. Specifically, for T ∈ ℕ, we write 𝒟^{0,T} (respectively, 𝒟^{1,T}) to denote the distribution of ((x_1,…,x_{DT}),(y_1,…,y_{DT})) where ((x_{(i−1)D+1},…,x_{iD}),(y_{(i−1)D+1},…,y_{iD})) is an i.i.d. sample from 𝒟^0 (respectively, 𝒟^1) for all i ∈ [T]. Similarly to before, it is hard to distinguish the two distributions:

Lemma 9.4.

Let P\displaystyle P be any (ε,δ)\displaystyle(\varepsilon,\delta)-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} protocol. Then, we have

VIEWBP(𝒟0,T)VIEWBP(𝒟1,T)TVO(TDδ)+TeΩε(D).\displaystyle\displaystyle\|\textsf{VIEW}^{B}_{P}(\mathcal{D}^{0,T})-\textsf{VIEW}^{B}_{P}(\mathcal{D}^{1,T})\|_{TV}\leq O\left(TD\delta\right)+T\cdot e^{-\Omega_{\varepsilon}(D)}.
Proof.

We prove this via a simple hybrid argument. For j ∈ [T+1], let us denote by 𝒟_j the distribution of ((x_1,…,x_{DT}),(y_1,…,y_{DT})) where, for all i ∈ [T], the block ((x_{(i−1)D+1},…,x_{iD}),(y_{(i−1)D+1},…,y_{iD})) is an independent sample from 𝒟^1 if i < j and from 𝒟^0 otherwise. Notice that 𝒟_1 = 𝒟^{0,T} and 𝒟_{T+1} = 𝒟^{1,T}.

Our main claim is the following. For every j[T]\displaystyle j\in[T] and any (ε,δ)\displaystyle(\varepsilon,\delta)-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} protocol P\displaystyle P,

VIEWBP(𝒟j)VIEWBP(𝒟j+1)TVO(Dδ)+eΩε(D).\displaystyle\displaystyle\|\textsf{VIEW}^{B}_{P}(\mathcal{D}_{j})-\textsf{VIEW}^{B}_{P}(\mathcal{D}_{j+1})\|_{TV}\leq O\left(D\delta\right)+e^{-\Omega_{\varepsilon}(D)}. (10)

Note that summing (10) over all j[T]\displaystyle j\in[T] immediately yields Lemma 9.4.

We will now prove (10). Given an (ε,δ)\displaystyle(\varepsilon,\delta)-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} protocol P\displaystyle P (where each party’s input has DT\displaystyle DT bits), we construct a protocol P\displaystyle P^{\prime} (where each party’s input has D\displaystyle D bits) as follows:

  • Suppose that the input of A\displaystyle A is x1,,xD\displaystyle x^{\prime}_{1},\dots,x^{\prime}_{D}, and the input of B\displaystyle B is y1,,yD\displaystyle y^{\prime}_{1},\dots,y^{\prime}_{D}.

  • For i=1,,j1\displaystyle i=1,\dots,j-1, A\displaystyle A samples ((x(i1)D+1,,xiD),(y(i1)D+1,,yiD))\displaystyle((x_{(i-1)D+1},\dots,x_{iD}),(y_{(i-1)D+1},\dots,y_{iD})) from 𝒟1\displaystyle\mathcal{D}^{1} and sends (y(i1)D+1,,yiD)\displaystyle(y_{(i-1)D+1},\dots,y_{iD}) to B\displaystyle B.

  • For i = j+1,…,T, A samples ((x_{(i−1)D+1},…,x_{iD}),(y_{(i−1)D+1},…,y_{iD})) from 𝒟^0 and sends (y_{(i−1)D+1},…,y_{iD}) to B.

  • A\displaystyle A sets (x(j1)D+1,,xjD)=(x1,,xD)\displaystyle(x_{(j-1)D+1},\dots,x_{jD})=(x^{\prime}_{1},\dots,x^{\prime}_{D}).

  • B\displaystyle B sets (y(j1)D+1,,yjD)=(y1,,yD)\displaystyle(y_{(j-1)D+1},\dots,y_{jD})=(y^{\prime}_{1},\dots,y^{\prime}_{D}).

  • A\displaystyle A and B\displaystyle B then run the protocol P\displaystyle P on ((x1,,xDT),(y1,,yDT))\displaystyle((x_{1},\dots,x_{DT}),(y_{1},\dots,y_{DT})).

It is clear that P\displaystyle P^{\prime} is (ε,δ)\displaystyle(\varepsilon,\delta)-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} and that

\|\textsf{VIEW}^{B}_{P^{\prime}}(\mathcal{D}^{0})-\textsf{VIEW}^{B}_{P^{\prime}}(\mathcal{D}^{1})\|_{TV}\geq\|\textsf{VIEW}^{B}_{P}(\mathcal{D}_{j})-\textsf{VIEW}^{B}_{P}(\mathcal{D}_{j+1})\|_{TV}.

Inequality (10) then follows from Corollary 9.3. ∎

We can now prove our main theorem of this section.

Proof of Theorem 1.11.

Let C>0\displaystyle C>0 be a sufficiently large constant to be chosen later. Let D=Clogn\displaystyle D=\lceil C\log n\rceil and T=n/D\displaystyle T=\lfloor n/D\rfloor. We may define f\displaystyle f on only 2DT\displaystyle 2DT bits, as it can be trivially extended to 2n\displaystyle 2n bits by ignoring the last nDT\displaystyle n-DT bits of X,Y\displaystyle X,Y. Let

f(x_{1},\dots,x_{DT},y_{1},\dots,y_{DT})=\sum_{i\in[T]}\left(\sum_{\ell\in[D]}x_{(i-1)D+\ell}\,y_{(i-1)D+\ell}\mod 2\right),

where the outer summation is over \displaystyle\mathbb{Z}.
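For reference, a direct evaluation of f (an illustrative helper of ours), which also makes the unit-sensitivity claim below transparent: flipping a single bit of x or y changes the parity of at most one block, and hence the value of f by at most one.

```python
def block_parity_sum(x, y, D, T):
    """f(x, y): the sum over the T length-D blocks of (<x_block, y_block> mod 2)."""
    total = 0
    for i in range(T):
        inner = sum(x[j] * y[j] for j in range(i * D, (i + 1) * D)) % 2
        total += inner
    return total
```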

It is immediate that the sensitivity of f\displaystyle f is one. We will now argue that any (ε,δ)\displaystyle(\varepsilon,\delta)-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} protocol P\displaystyle P with δ=o(1/n)\displaystyle\delta=o(1/n) incurs error at least Ω(n/logn)\displaystyle\Omega(n/\log n). Since the function is symmetric with respect to the two parties, it suffices without loss of generality to show that the output of B\displaystyle B incurs error Ω(n/logn)\displaystyle\Omega(n/\log n) with probability 0.1. To do so, we start by observing that we have f(X,Y)=0\displaystyle f(X,Y)=0 for any (X,Y)supp(𝒟0,T)\displaystyle(X,Y)\in\mathrm{supp}(\mathcal{D}^{0,T}) whereas f(X,Y)=T\displaystyle f(X,Y)=T for any (X,Y)supp(𝒟1,T)\displaystyle(X,Y)\in\mathrm{supp}(\mathcal{D}^{1,T}). From Lemma 9.4, we have that

VIEWBP(𝒟0,T)VIEWBP(𝒟1,T)TVO(TDδ)+TeΩε(D).\displaystyle\displaystyle\|\textsf{VIEW}^{B}_{P}(\mathcal{D}^{0,T})-\textsf{VIEW}^{B}_{P}(\mathcal{D}^{1,T})\|_{TV}\leq O\left(TD\delta\right)+T\cdot e^{-\Omega_{\varepsilon}(D)}.

As a result, if we sample (X,Y)\displaystyle(X,Y) from 𝒟0,T\displaystyle\mathcal{D}^{0,T} with probability 1/2 and 𝒟1,T\displaystyle\mathcal{D}^{1,T} with probability 1/2, then the probability that B\displaystyle B’s output incurs error at least T/2\displaystyle T/2 is at least

12O(TDδ)TeΩε(D)12o(1)neΩε(Clogn).\displaystyle\displaystyle\frac{1}{2}-O\left(TD\delta\right)-T\cdot e^{-\Omega_{\varepsilon}(D)}\geq\frac{1}{2}-o(1)-n\cdot e^{-\Omega_{\varepsilon}(C\log n)}.

When C\displaystyle C is sufficiently large, we also have that neΩε(Clogn)=o(1)\displaystyle n\cdot e^{-\Omega_{\varepsilon}(C\log n)}=o(1). As a result, with probability 1/2o(1)\displaystyle 1/2-o(1) (which is at least 0.1\displaystyle 0.1 for any sufficiently large n\displaystyle n), the protocol P\displaystyle P must incur an error of at least T/2=Ω(n/logn)\displaystyle T/2=\Omega(n/\log n). ∎

9.3 Symmetrization

Notice that the function in Theorem 1.11 is not symmetric, i.e., it is not invariant under permutations of the users’ inputs. It is natural to ask whether a similar lower bound holds for a symmetric function. In this subsection, we give a simple reduction that answers this question in the affirmative, ultimately yielding the following:

Theorem 9.5.

For any ε=O(1)\displaystyle\varepsilon=O(1) and any sufficiently large n\displaystyle n\in\mathbb{N}, there is a symmetric function f:[2n]2n\displaystyle f:[2n]^{2n}\to\mathbb{R} whose sensitivity is one and such that any (ε,o(1/n))\displaystyle(\varepsilon,o(1/n))-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} protocol cannot compute f\displaystyle f to within an error of o(n/logn)\displaystyle o(n/\log n).

We remark that the input to each user comes from a set 𝒳\displaystyle\mathcal{X} of size Ω(n)\displaystyle\Omega(n), instead of 𝒳={0,1}\displaystyle\mathcal{X}=\{0,1\} as in Theorem 1.11. This larger value of |𝒳|\displaystyle|\mathcal{X}| turns out to be necessary for symmetric functions: when f\displaystyle f is symmetric, we may use the Laplace mechanism from both sides to estimate the histogram of the input, which we can then use to compute f\displaystyle f. If the sensitivity of f\displaystyle f is O(1)\displaystyle O(1), this algorithm incurs an error of Oε(|𝒳|)\displaystyle O_{\varepsilon}(|\mathcal{X}|). Hence, to achieve a lower bound of Ω~(n)\displaystyle\tilde{\Omega}(n), we need |𝒳|\displaystyle|\mathcal{X}| to be at least Ω~ε(n)\displaystyle\tilde{\Omega}_{\varepsilon}(n).

The properties of our reduction are summarized in the following lemma, which combined with Theorem 1.11, immediately implies Theorem 9.5.

Lemma 9.6.

For any function g:𝒳2n\displaystyle g\colon\mathcal{X}^{2n}\to\mathbb{R}, there is another function f:(𝒳×[n])2n\displaystyle f:(\mathcal{X}\times[n])^{2n}\to\mathbb{R} such that the following holds:

  • The sensitivity of f\displaystyle f is no more than that of g\displaystyle g.

  • If there exists an (ε,δ)\displaystyle(\varepsilon,\delta)-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} protocol that solves f\displaystyle f with error β\displaystyle\beta, then there exists an (ε,δ)\displaystyle(\varepsilon,\delta)-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} protocol that solves g\displaystyle g with error 2β\displaystyle 2\beta.

The idea behind the proof of Lemma 9.6 is simple. Roughly speaking, we view each input (x,i)𝒳×[n]\displaystyle(x,i)\in\mathcal{X}\times[n] of f\displaystyle f as “setting the i\displaystyle ith position to x\displaystyle x” for the input to g\displaystyle g. This is formalized below.

Proof of Lemma 9.6.

We start by defining f\displaystyle f. Let u\displaystyle u^{*} be an arbitrary element of 𝒳\displaystyle\mathcal{X}. For every i[n]\displaystyle i\in[n], we define hi:(𝒳×[n])n𝒳\displaystyle h_{i}:(\mathcal{X}\times[n])^{n}\to\mathcal{X} by

hi((w1,,wn))={the unique x such that j[n],wj=(x,i) if |{x𝒳j[n],wj=(x,i)}|=1,u otherwise.\displaystyle\displaystyle h_{i}((w_{1},\dots,w_{n}))=\begin{cases}\text{the unique }x\text{ such that }\exists j\in[n],w_{j}=(x,i)&\text{ if }|\{x\in\mathcal{X}\mid\exists j\in[n],w_{j}=(x,i)\}|=1,\\ u^{*}&\text{ otherwise.}\end{cases}

We now define f\displaystyle f by

f(W,V)=12g(h1(W),,hn(W),h1(V),,hn(V)).\displaystyle\displaystyle f(W,V)=\frac{1}{2}\cdot g(h_{1}(W),\dots,h_{n}(W),h_{1}(V),\cdots,h_{n}(V)).

We will next verify that the two properties hold.

  • Notice that changing a single user’s input to f\displaystyle f changes at most two of the values h1(),,hn()\displaystyle h_{1}(\cdot),\dots,h_{n}(\cdot), i.e., at most two coordinates of the input to g\displaystyle g. Combined with the factor 1/2\displaystyle 1/2 in the definition of f\displaystyle f, this implies that the sensitivity of f\displaystyle f is no more than that of g\displaystyle g.

  • Suppose that there exists an (ε,δ)\displaystyle(\varepsilon,\delta)-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} protocol P\displaystyle P that solves f\displaystyle f with error β\displaystyle\beta. Let P\displaystyle P^{\prime} be the protocol for g\displaystyle g where A,B\displaystyle A,B transform their inputs (x1,,xn)\displaystyle(x_{1},\dots,x_{n}), (y1,,yn)\displaystyle(y_{1},\dots,y_{n}) to ((x1,1),,(xn,n))\displaystyle((x_{1},1),\dots,(x_{n},n)), ((y1,1),,(yn,n))\displaystyle((y_{1},1),\dots,(y_{n},n)) respectively, then run P\displaystyle P, and finally return the output of P\displaystyle P multiplied by two. It is obvious that P\displaystyle P^{\prime} is (ε,δ)\displaystyle(\varepsilon,\delta)-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}}; furthermore, since the protocol P\displaystyle P incurs error β\displaystyle\beta, the protocol P\displaystyle P^{\prime} incurs error 2β\displaystyle 2\beta as desired. ∎
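The following Python sketch makes the reduction explicit; g is assumed to be given as a black-box function on 2n coordinates, and the helper names are ours.

```python
def h(i, W, u_star):
    """h_i(W): the unique x with (x, i) in W if exactly one such x exists, else u*."""
    candidates = {x for (x, idx) in W if idx == i}
    return next(iter(candidates)) if len(candidates) == 1 else u_star

def f_from_g(g, W, V, n, u_star):
    """f(W, V) = (1/2) * g(h_1(W), ..., h_n(W), h_1(V), ..., h_n(V))."""
    args = [h(i, W, u_star) for i in range(1, n + 1)] \
         + [h(i, V, u_star) for i in range(1, n + 1)]
    return 0.5 * g(args)

# In the reduction of the second bullet, A maps (x_1, ..., x_n) to the tagged
# tuple ((x_1, 1), ..., (x_n, n)) and B does the same with its input, so that
# h_i(W) = x_i and h_i(V) = y_i, and hence 2 * f(W, V) = g(x_1, ..., y_n).
```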

Acknowledgments

We would like to thank Noah Golowich for numerous enlightening discussions about lower bounds in the multi-message DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} model, and for helpful feedback.

References

  • [Abo18] John M Abowd. The US Census Bureau adopts differential privacy. In KDD, pages 2867–2867, 2018.
  • [App17] Apple Differential Privacy Team. Learning with privacy at scale. Apple Machine Learning Journal, 2017.
  • [BBGN19] Borja Balle, James Bell, Adrià Gascón, and Kobbi Nissim. The privacy blanket of the shuffle model. In CRYPTO, pages 638–667, 2019.
  • [BBGN20] Borja Balle, James Bell, Adrià Gascón, and Kobbi Nissim. Private summation in the multi-message shuffle model. arXiv: 2002.00817, 2020.
  • [BC20] Victor Balcer and Albert Cheu. Separating local & shuffled differential privacy via histograms. In ITC, pages 1:1–1:14, 2020.
  • [BCJM20] Victor Balcer, Albert Cheu, Matthew Joseph, and Jieming Mao. Connecting robust shuffle privacy and pan-privacy. CoRR, abs/2004.09481, 2020.
  • [BCK+14] Joshua Brody, Amit Chakrabarti, Ranganath Kondapally, David P Woodruff, and Grigory Yaroslavtsev. Beyond set disjointness: the communication complexity of finding the intersection. In PODC, pages 106–113, 2014.
  • [BEM+17] Andrea Bittau, Úlfar Erlingsson, Petros Maniatis, Ilya Mironov, Ananth Raghunathan, David Lie, Mitch Rudominer, Ushasree Kode, Julien Tinnes, and Bernhard Seefeld. Prochlo: Strong privacy for analytics in the crowd. In SOSP, pages 441–459, 2017.
  • [BFJ+94] Avrim Blum, Merrick Furst, Jeffrey Jackson, Michael Kearns, Yishay Mansour, and Steven Rudich. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In STOC, pages 253–262, 1994.
  • [CDSKY20] Seung Geol Choi, Dana Dachman-Soled, Mukul Kulkarni, and Arkady Yerukhimovich. Differentially-private multi-party sketching for large-scale statistics. PoPETs, 3:153–174, 2020.
  • [CSS12] TH Hubert Chan, Elaine Shi, and Dawn Song. Optimal lower bound for differentially private multi-party aggregation. In ESA, pages 277–288, 2012.
  • [CSU+18] Albert Cheu, Adam D. Smith, Jonathan Ullman, David Zeber, and Maxim Zhilyaev. Distributed differential privacy via mixnets. CoRR, abs/1808.01394, 2018.
  • [CU20] Albert Cheu and Jonathan Ullman. The limits of pan privacy and shuffle privacy for learning and estimation. CoRR, abs/2009.08000, 2020.
  • [DJW13] John C Duchi, Michael I Jordan, and Martin J Wainwright. Local privacy and statistical minimax rates. In FOCS, pages 429–438, 2013.
  • [DKM+06] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, pages 486–503, 2006.
  • [DKY17] Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin. Collecting telemetry data privately. In NIPS, pages 3571–3580, 2017.
  • [DLB19] Damien Desfontaines, Andreas Lochbihler, and David Basin. Cardinality estimators do not preserve privacy. PoPETs, 2019(2):26–46, 2019.
  • [DMNS06] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, pages 265–284, 2006.
  • [DR98] Devdatt P. Dubhashi and Desh Ranjan. Balls and bins: A study in negative dependence. Random Struct. Algorithms, 13(2):99–124, 1998.
  • [DR14] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., 9(3-4):211–407, 2014.
  • [DRV10] Cynthia Dwork, Guy N. Rothblum, and Salil P. Vadhan. Boosting and differential privacy. In FOCS, pages 51–60, 2010.
  • [EFM+19] Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Abhradeep Thakurta. Amplification by shuffling: From local to central differential privacy via anonymity. In SODA, pages 2468–2479, 2019.
  • [ENU20] Alexander Edmonds, Aleksandar Nikolov, and Jonathan Ullman. The power of factorization mechanisms in local and central differential privacy. In STOC, pages 425–438, 2020.
  • [EPK14] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In CCS, pages 1054–1067, 2014.
  • [GGK+19] Badih Ghazi, Noah Golowich, Ravi Kumar, Rasmus Pagh, and Ameya Velingker. On the power of multiple anonymous messages. IACR Cryptol. ePrint Arch., 2019:1382, 2019.
  • [GGK+20] Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Manurangsi, Rasmus Pagh, and Ameya Velingker. Pure differentially private summation from anonymous messages. In ITC, pages 15:1–15:23, 2020.
  • [GKMP20] Badih Ghazi, Ravi Kumar, Pasin Manurangsi, and Rasmus Pagh. Private counting from anonymous messages: Near-optimal accuracy with vanishing communication overhead. In ICML, 2020.
  • [GMPV20] Badih Ghazi, Pasin Manurangsi, Rasmus Pagh, and Ameya Velingker. Private aggregation from fewer anonymous messages. In EUROCRYPT, pages 798–827, 2020.
  • [Gre16] Andy Greenberg. Apple’s “differential privacy” is about collecting your data – but not your data. Wired, June, 13, 2016.
  • [GS02] Alison L Gibbs and Francis Edward Su. On choosing and bounding probability metrics. International statistical review, 70(3):419–435, 2002.
  • [IKOS06] Yuval Ishai, Eyal Kushilevitz, Rafail Ostrovsky, and Amit Sahai. Cryptography from anonymity. In FOCS, pages 239–248, 2006.
  • [JHW18] Jiantao Jiao, Yanjun Han, and Tsachy Weissman. Minimax estimation of the ℓ1 distance. IEEE Trans. Inf. Theory, 64(10):6672–6706, 2018.
  • [Kea98] Michael Kearns. Efficient noise-tolerant learning from statistical queries. Journal of the ACM (JACM), 45(6):983–1006, 1998.
  • [KLN+11] Shiva Prasad Kasiviswanathan, Homin K Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What can we learn privately? SIAM Journal on Computing, 40(3):793–826, 2011.
  • [KNW10] Daniel M Kane, Jelani Nelson, and David P Woodruff. An optimal algorithm for the distinct elements problem. In PODS, pages 41–52, 2010.
  • [MMNW11] Darakhshan Mir, Shan Muthukrishnan, Aleksandar Nikolov, and Rebecca N Wright. Pan-private algorithms via statistics on sketches. In PODS, pages 37–48, 2011.
  • [MMP+10] Andrew McGregor, Ilya Mironov, Toniann Pitassi, Omer Reingold, Kunal Talwar, and Salil Vadhan. The limits of two-party differential privacy. In FOCS, pages 81–90, 2010.
  • [MMP+11] Andrew McGregor, Ilya Mironov, Toniann Pitassi, Omer Reingold, Kunal Talwar, and Salil P. Vadhan. The limits of two-party differential privacy. Electron. Colloquium Comput. Complex., 18:106, 2011.
  • [MT07] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In FOCS, pages 94–103, 2007.
  • [O’D14] Ryan O’Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014.
  • [PS20] Rasmus Pagh and Nina Mesing Stausholm. Efficient differentially private f0\displaystyle f_{0} linear sketching. arXiv preprint arXiv:2001.11932, 2020.
  • [PT11] Giovanni Peccati and Murad S Taqqu. Some facts about Charlier polynomials. In Wiener Chaos: Moments, Cumulants and Diagrams, pages 171–175. Springer, 2011.
  • [Rob90] Robert B. Ash. Information Theory. Dover Publications, 1990.
  • [Sha14] Stephen Shankland. How Google tricks itself to protect Chrome user privacy. CNET, October, 2014.
  • [SU17] Thomas Steinke and Jonathan Ullman. Tight lower bounds for differentially private selection. In FOCS, pages 552–563, 2017.
  • [Tim14] Aleksandr Filippovich Timan. Theory of approximation of functions of a real variable. Elsevier, 2014.
  • [Ull18] Jonathan Ullman. Tight lower bounds for locally differentially private selection. arXiv:1802.02638, 2018.
  • [Vad17] Salil Vadhan. The complexity of differential privacy. In Tutorials on the Foundations of Cryptography, pages 347–450. Springer, 2017.
  • [VV17] Gregory Valiant and Paul Valiant. Estimating the unseen: Improved estimators for entropy and other properties. J. ACM, 64(6):37:1–37:41, 2017.
  • [War65] Stanley L Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63–69, 1965.
  • [WY16] Yihong Wu and Pengkun Yang. Minimax rates of entropy estimation on large alphabets via best polynomial approximation. IEEE Trans. Inf. Theory, 62(6):3702–3720, 2016.
  • [WY19] Yihong Wu and Pengkun Yang. Chebyshev polynomials, moment matching, and optimal estimation of the unseen. The Annals of Statistics, 47(2):857–883, 2019.
  • [Yan19] Yanjun Han. Lecture 7: Mixture vs. mixture and moment matching. https://theinformaticists.com/2019/08/28/lecture-7-mixture-vs-mixture-and-moment-matching/, 2019.

Appendix A Total Variation Bound between Mixtures of Multi-dimensional Poisson Distributions

In this section we prove Lemma 4.3 (restated below).


Lemma 4.3. (restated) Let U,V\displaystyle U,V be two random variables supported on [0,Λ]\displaystyle[0,\Lambda] such that 𝔼[Uj]=𝔼[Vj]\displaystyle\operatornamewithlimits{\mathbb{E}}[U^{j}]=\operatornamewithlimits{\mathbb{E}}[V^{j}] for all j{1,2,,L}\displaystyle j\in\{1,2,\dotsc,L\}, where L1\displaystyle L\geq 1. Let D\displaystyle D\in\mathbb{N} and θ,λ(0)D\displaystyle\vec{\theta},\vec{\lambda}\in(\mathbb{R}^{\geq 0})^{D} such that θ1=1\displaystyle\|\vec{\theta}\|_{1}=1. Let 𝒟θ\displaystyle\mathcal{D}_{\vec{\theta}} be the distribution over [D]\displaystyle[D] corresponding to θ\displaystyle\vec{\theta}. Suppose that

Pri𝒟θ[λi2Λ2θi]112Λ.\displaystyle\Pr_{i\leftarrow\mathcal{D}_{\vec{\theta}}}[\vec{\lambda}_{i}\geq 2\Lambda^{2}\cdot\vec{\theta}_{i}]\geq 1-\frac{1}{2\Lambda}.

Then,

𝔼[𝖯𝗈𝗂(Uθ+λ)]𝔼[𝖯𝗈𝗂(Vθ+λ)]TV21L!.\displaystyle\|\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(U\vec{\theta}+\vec{\lambda})]-\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(V\vec{\theta}+\vec{\lambda})]\|_{TV}^{2}\leq\frac{1}{L!}.

To prove Lemma 4.3, we begin with some notation. Let D\displaystyle D\in\mathbb{N}. For vectors mD\displaystyle\vec{m}\in\mathbb{N}^{D} and λD\displaystyle\vec{\lambda}\in\mathbb{R}^{D}, we let

m!:=i=1Dmi! and λm:=i=1D(λi)mi.\displaystyle\vec{m}!:=\prod_{i=1}^{D}m_{i}!~{}~{}\text{ and }~{}~{}\vec{\lambda}^{\vec{m}}:=\prod_{i=1}^{D}(\vec{\lambda}_{i})^{\vec{m}_{i}}.

We are going to apply the moment-matching technique [WY16, JHW18, WY19] for bounding the total variation distance between mixtures of (single-dimensional) Poisson distributions. The following lemma is a direct generalization of Theorem 4 of [Yan19] to mixtures of multi-dimensional Poisson distributions. We will use the convention that 00=1\displaystyle 0^{0}=1.

Lemma A.1.

For λD\displaystyle\vec{\lambda}\in\mathbb{R}^{D} and two distributions U\displaystyle\vec{U} and V\displaystyle\vec{V} supported on i=1D[λi,]\displaystyle\prod_{i=1}^{D}[-\lambda_{i},\infty], we have that

𝔼[𝖯𝗈𝗂(U+λ)]𝔼[𝖯𝗈𝗂(V+λ)]TV2m(0)D(𝔼[Um]𝔼[Vm])2m!λm.\displaystyle\|\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(\vec{U}+\vec{\lambda})]-\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(\vec{V}+\vec{\lambda})]\|_{TV}^{2}\leq\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}}\frac{\left(\operatornamewithlimits{\mathbb{E}}[\vec{U}^{\vec{m}}]-\operatornamewithlimits{\mathbb{E}}[\vec{V}^{\vec{m}}]\right)^{2}}{\vec{m}!\cdot\vec{\lambda}^{\vec{m}}}.

In order to prove the above lemma, we need to use the Charlier polynomial cm(x;λ)\displaystyle c_{m}(x;\lambda). The explicit definition of cm(x;λ)\displaystyle c_{m}(x;\lambda) is not important here; we simply list two important properties of this polynomial family [PT11]:

Proposition A.2.

Let λ>0\displaystyle\lambda>0 and u[λ,)\displaystyle u\in[-\lambda,\infty). The following hold:

  1.

    We have that

    𝔼X𝖯𝗈𝗂(λ)[cm(X;λ)cn(X;λ)]=n!λn𝟙[m=n].\displaystyle\operatornamewithlimits{\mathbb{E}}_{X\leftarrow\mathsf{Poi}(\lambda)}[c_{m}(X;\lambda)c_{n}(X;\lambda)]=\frac{n!}{\lambda^{n}}\cdot\mathbb{1}[m=n].
  2.

    For all z0\displaystyle z\in\mathbb{Z}^{\geq 0},

    𝖯𝗈𝗂(λ+u)z𝖯𝗈𝗂(λ)z=eu(1+uλ)z=m=0cm(z;λ)umm!.\displaystyle\frac{\mathsf{Poi}(\lambda+u)_{z}}{\mathsf{Poi}(\lambda)_{z}}=e^{-u}\cdot\left(1+\frac{u}{\lambda}\right)^{z}=\sum_{m=0}^{\infty}c_{m}(z;\lambda)\cdot\frac{u^{m}}{m!}.

We now prove Lemma A.1. Our proof closely follows the proof of Theorem 4 of [Yan19].

Proof of Lemma A.1.

Let Δ:=𝔼[𝖯𝗈𝗂(U+λ)]𝔼[𝖯𝗈𝗂(V+λ)]TV\displaystyle\Delta:=\|\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(\vec{U}+\vec{\lambda})]-\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(\vec{V}+\vec{\lambda})]\|_{TV}. We have that

Δ\displaystyle\displaystyle\Delta =12z(0)D|𝔼uU𝖯𝗈𝗂(u+λ)z𝔼uV𝖯𝗈𝗂(u+λ)z|\displaystyle\displaystyle=\frac{1}{2}\cdot\sum_{\vec{z}\in(\mathbb{Z}^{\geq 0})^{D}}\left|\operatornamewithlimits{\mathbb{E}}_{\vec{u}\leftarrow\vec{U}}\vec{\mathsf{Poi}}(\vec{u}+\vec{\lambda})_{\vec{z}}-\operatornamewithlimits{\mathbb{E}}_{\vec{u}\leftarrow\vec{V}}\vec{\mathsf{Poi}}(\vec{u}+\vec{\lambda})_{\vec{z}}\right|
𝔼z𝖯𝗈𝗂(λ)|𝔼uU𝖯𝗈𝗂(u+λ)z𝖯𝗈𝗂(λ)z𝔼uV𝖯𝗈𝗂(u+λ)z𝖯𝗈𝗂(λ)z|\displaystyle\displaystyle\leq\operatornamewithlimits{\mathbb{E}}_{\vec{z}\leftarrow\vec{\mathsf{Poi}}(\vec{\lambda})}\left|\operatornamewithlimits{\mathbb{E}}_{\vec{u}\leftarrow\vec{U}}\frac{\vec{\mathsf{Poi}}(\vec{u}+\vec{\lambda})_{\vec{z}}}{\vec{\mathsf{Poi}}(\vec{\lambda})_{\vec{z}}}-\operatornamewithlimits{\mathbb{E}}_{\vec{u}\leftarrow\vec{V}}\frac{\vec{\mathsf{Poi}}(\vec{u}+\vec{\lambda})_{\vec{z}}}{\vec{\mathsf{Poi}}(\vec{\lambda})_{\vec{z}}}\right|
=𝔼z𝖯𝗈𝗂(λ)|m(0)Di=1Dcmi(zi;λi)𝔼[Um]𝔼[Vm]m!|,\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{\vec{z}\leftarrow\vec{\mathsf{Poi}}(\vec{\lambda})}\left|\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}}\prod_{i=1}^{D}c_{\vec{m}_{i}}(\vec{z}_{i};\vec{\lambda}_{i})\cdot\frac{\operatornamewithlimits{\mathbb{E}}[\vec{U}^{\vec{m}}]-\operatornamewithlimits{\mathbb{E}}[\vec{V}^{\vec{m}}]}{\vec{m}!}\right|,

where the last equality follows from Item (2) of Proposition A.2.

Applying the Cauchy-Schwarz inequality, we get that

Δ2\displaystyle\displaystyle\Delta^{2} 𝔼z𝖯𝗈𝗂(λ)|m(0)Di=1Dcmi(zi;λi)𝔼[Um]𝔼[Vm]m!|2\displaystyle\displaystyle\leq\operatornamewithlimits{\mathbb{E}}_{\vec{z}\leftarrow\vec{\mathsf{Poi}}(\vec{\lambda})}\left|\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}}\prod_{i=1}^{D}c_{\vec{m}_{i}}(\vec{z}_{i};\vec{\lambda}_{i})\cdot\frac{\operatornamewithlimits{\mathbb{E}}[\vec{U}^{\vec{m}}]-\operatornamewithlimits{\mathbb{E}}[\vec{V}^{\vec{m}}]}{\vec{m}!}\right|^{2}
=𝔼z𝖯𝗈𝗂(λ)m,m(0)Di=1Dcmi(zi;λi)cmi(zi;λi)𝔼[Um]𝔼[Vm]m!𝔼[Um]𝔼[Vm]m!\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{\vec{z}\leftarrow\vec{\mathsf{Poi}}(\vec{\lambda})}\sum_{\vec{m},\vec{m}^{\prime}\in(\mathbb{Z}^{\geq 0})^{D}}\prod_{i=1}^{D}c_{\vec{m}_{i}}(\vec{z}_{i};\vec{\lambda}_{i})c_{\vec{m}^{\prime}_{i}}(\vec{z}_{i};\vec{\lambda}_{i})\cdot\frac{\operatornamewithlimits{\mathbb{E}}[\vec{U}^{\vec{m}}]-\operatornamewithlimits{\mathbb{E}}[\vec{V}^{\vec{m}}]}{\vec{m}!}\cdot\frac{\operatornamewithlimits{\mathbb{E}}[\vec{U}^{\vec{m}^{\prime}}]-\operatornamewithlimits{\mathbb{E}}[\vec{V}^{\vec{m}^{\prime}}]}{\vec{m}^{\prime}!}
=m,m(0)D𝔼[Um]𝔼[Vm]m!𝔼[Um]𝔼[Vm]m!𝔼z𝖯𝗈𝗂(λ)i=1Dcmi(zi;λi)cmi(zi;λi)\displaystyle\displaystyle=\sum_{\vec{m},\vec{m}^{\prime}\in(\mathbb{Z}^{\geq 0})^{D}}\frac{\operatornamewithlimits{\mathbb{E}}[\vec{U}^{\vec{m}}]-\operatornamewithlimits{\mathbb{E}}[\vec{V}^{\vec{m}}]}{\vec{m}!}\cdot\frac{\operatornamewithlimits{\mathbb{E}}[\vec{U}^{\vec{m}^{\prime}}]-\operatornamewithlimits{\mathbb{E}}[\vec{V}^{\vec{m}^{\prime}}]}{\vec{m}^{\prime}!}\operatornamewithlimits{\mathbb{E}}_{\vec{z}\leftarrow\vec{\mathsf{Poi}}(\vec{\lambda})}\prod_{i=1}^{D}c_{\vec{m}_{i}}(\vec{z}_{i};\vec{\lambda}_{i})c_{\vec{m}^{\prime}_{i}}(\vec{z}_{i};\vec{\lambda}_{i})
=m,m(0)D𝔼[Um]𝔼[Vm]m!𝔼[Um]𝔼[Vm]m!i=1D𝔼zi𝖯𝗈𝗂(λi)cmi(zi;λi)cmi(zi;λi)\displaystyle\displaystyle=\sum_{\vec{m},\vec{m}^{\prime}\in(\mathbb{Z}^{\geq 0})^{D}}\frac{\operatornamewithlimits{\mathbb{E}}[\vec{U}^{\vec{m}}]-\operatornamewithlimits{\mathbb{E}}[\vec{V}^{\vec{m}}]}{\vec{m}!}\cdot\frac{\operatornamewithlimits{\mathbb{E}}[\vec{U}^{\vec{m}^{\prime}}]-\operatornamewithlimits{\mathbb{E}}[\vec{V}^{\vec{m}^{\prime}}]}{\vec{m}^{\prime}!}\prod_{i=1}^{D}\operatornamewithlimits{\mathbb{E}}_{\vec{z}_{i}\leftarrow\mathsf{Poi}(\vec{\lambda}_{i})}c_{\vec{m}_{i}}(\vec{z}_{i};\vec{\lambda}_{i})c_{\vec{m}^{\prime}_{i}}(\vec{z}_{i};\vec{\lambda}_{i})
=m(0)D(𝔼[Um]𝔼[Vm]m!)2i=1D(mi)!λimi\displaystyle\displaystyle=\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}}\left(\frac{\operatornamewithlimits{\mathbb{E}}[\vec{U}^{\vec{m}}]-\operatornamewithlimits{\mathbb{E}}[\vec{V}^{\vec{m}}]}{\vec{m}!}\right)^{2}\cdot\prod_{i=1}^{D}\frac{(\vec{m}_{i})!}{\vec{\lambda}_{i}^{m_{i}}}
=m(0)D(𝔼[Um]𝔼[Vm])2m!λm,\displaystyle\displaystyle=\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}}\frac{\left(\operatornamewithlimits{\mathbb{E}}[\vec{U}^{\vec{m}}]-\operatornamewithlimits{\mathbb{E}}[\vec{V}^{\vec{m}}]\right)^{2}}{\vec{m}!\cdot\vec{\lambda}^{\vec{m}}},

where the penultimate equality follows from Item (1) of Proposition A.2. ∎
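For intuition, the following Python snippet numerically compares the two sides of Lemma A.1 in the one-dimensional case (D=1), using two-point priors whose first moments match; the particular parameter values and truncation points are arbitrary choices, and the snippet is a sanity check rather than part of the proof.

```python
import math

def pois_pmf(lam, z):
    return math.exp(-lam) * lam ** z / math.factorial(z)

# Two-point priors on [0, Lambda] whose first moments match (so L = 1).
U = [(0.0, 0.5), (1.0, 0.5)]    # (value, probability); E[U] = 0.5
V = [(0.25, 0.5), (0.75, 0.5)]  # E[V] = 0.5, but higher moments differ
lam = 5.0                       # the (scalar) shift lambda

def mixture_pmf(prior, z):
    return sum(p * pois_pmf(u + lam, z) for (u, p) in prior)

def moment(prior, m):
    return sum(p * u ** m for (u, p) in prior)

# Left-hand side of Lemma A.1: TV distance between the two Poisson mixtures.
tv = 0.5 * sum(abs(mixture_pmf(U, z) - mixture_pmf(V, z)) for z in range(80))

# Right-hand side: sum over m of (E[U^m] - E[V^m])^2 / (m! * lam^m).
bound = sum((moment(U, m) - moment(V, m)) ** 2 / (math.factorial(m) * lam ** m)
            for m in range(40))

print(tv ** 2, "<=", bound)  # by Lemma A.1 the first quantity is at most the second
```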

Applying Lemma A.1, the next lemma follows from a straightforward calculation.

Lemma A.3.

Let U,V\displaystyle U,V be two random variables supported on [0,Λ]\displaystyle[0,\Lambda] such that 𝔼[Uj]=𝔼[Vj]\displaystyle\operatornamewithlimits{\mathbb{E}}[U^{j}]=\operatornamewithlimits{\mathbb{E}}[V^{j}] for all j{1,2,,L}\displaystyle j\in\{1,2,\dotsc,L\}, where L1\displaystyle L\geq 1. For θ,λ(0)D\displaystyle\vec{\theta},\vec{\lambda}\in(\mathbb{R}^{\geq 0})^{D}, let α=θ2Λθ+λ\displaystyle\vec{\alpha}=\frac{\vec{\theta}^{2}}{\Lambda\vec{\theta}+\vec{\lambda}} (division here is coordinate-wise, and θ2\displaystyle\vec{\theta}^{2} denotes taking coordinate-wise square of θ\displaystyle\vec{\theta}). The following holds:

𝔼[𝖯𝗈𝗂(Uθ+λ)]𝔼[𝖯𝗈𝗂(Vθ+λ)]TV2z=L+1(Λ2α1)zz!.\displaystyle\|\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(U\vec{\theta}+\vec{\lambda})]-\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(V\vec{\theta}+\vec{\lambda})]\|_{TV}^{2}\leq\sum_{z=L+1}^{\infty}\frac{(\Lambda^{2}\cdot\|\vec{\alpha}\|_{1})^{z}}{z!}.
Proof.

Let Δ:=𝔼[𝖯𝗈𝗂(Uθ+λ)]𝔼[𝖯𝗈𝗂(Vθ+λ)]TV\displaystyle\Delta:=\|\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(U\vec{\theta}+\vec{\lambda})]-\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(V\vec{\theta}+\vec{\lambda})]\|_{TV}. We also set U=(UΛ)θ\displaystyle\vec{U}=(U-\Lambda)\vec{\theta}, λ=(Λθ+λ)\displaystyle\vec{\lambda}^{\prime}=(\Lambda\vec{\theta}+\vec{\lambda}) and V=(VΛ)θ\displaystyle\vec{V}=(V-\Lambda)\vec{\theta}. Note that for every i[D]\displaystyle i\in[D], we have Ui(Λθ)i(Λθ+λ)i=λi\displaystyle\vec{U}_{i}\geq(-\Lambda\vec{\theta})_{i}\geq-(\Lambda\vec{\theta}+\vec{\lambda})_{i}=-\vec{\lambda}^{\prime}_{i}, and the same holds for each Vi\displaystyle\vec{V}_{i} as well. Hence, we can apply Lemma A.1 to bound Δ2\displaystyle\Delta^{2} as follows:

Δ2\displaystyle\displaystyle\Delta^{2} =𝔼[𝖯𝗈𝗂(U+λ)]𝔼[𝖯𝗈𝗂(V+λ)]TV2\displaystyle\displaystyle=\|\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(\vec{U}+\vec{\lambda}^{\prime})]-\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(\vec{V}+\vec{\lambda}^{\prime})]\|_{TV}^{2}
m(0)D(𝔼[Um]𝔼[Vm])2m!(λ)m\displaystyle\displaystyle\leq\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}}\frac{\left(\operatornamewithlimits{\mathbb{E}}[\vec{U}^{\vec{m}}]-\operatornamewithlimits{\mathbb{E}}[\vec{V}^{\vec{m}}]\right)^{2}}{\vec{m}!\cdot(\vec{\lambda}^{\prime})^{\vec{m}}}
m(0)D(𝔼[((UΛ)θ)m]𝔼[((VΛ)θ)m])2m!(Λθ+λ)m\displaystyle\displaystyle\leq\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}}\frac{\left(\operatornamewithlimits{\mathbb{E}}[((U-\Lambda)\vec{\theta})^{\vec{m}}]-\operatornamewithlimits{\mathbb{E}}[((V-\Lambda)\vec{\theta})^{\vec{m}}]\right)^{2}}{\vec{m}!\cdot(\Lambda\vec{\theta}+\vec{\lambda})^{\vec{m}}}
m(0)D(θm𝔼[(UΛ)m1]θm𝔼[(VΛ)m1])2m!(Λθ+λ)m\displaystyle\displaystyle\leq\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}}\frac{\left(\vec{\theta}^{\vec{m}}\cdot\operatornamewithlimits{\mathbb{E}}[(U-\Lambda)^{\|\vec{m}\|_{1}}]-\vec{\theta}^{\vec{m}}\cdot\operatornamewithlimits{\mathbb{E}}[(V-\Lambda)^{\|\vec{m}\|_{1}}]\right)^{2}}{\vec{m}!\cdot(\Lambda\vec{\theta}+\vec{\lambda})^{\vec{m}}}
z=L+1m(0)Ds.t.m1=z(θm(𝔼[(UΛ)z]𝔼[(VΛ)z]))2m!(Λθ+λ)m\displaystyle\displaystyle\leq\sum_{z=L+1}^{\infty}\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}\text{s.t.}\|\vec{m}\|_{1}=z}\frac{\left(\vec{\theta}^{\vec{m}}\cdot(\operatornamewithlimits{\mathbb{E}}[(U-\Lambda)^{z}]-\operatornamewithlimits{\mathbb{E}}[(V-\Lambda)^{z}])\right)^{2}}{\vec{m}!\cdot(\Lambda\vec{\theta}+\vec{\lambda})^{\vec{m}}}
z=L+1Λ2zm(0)Ds.t.m1=zθ2mm!(Λθ+λ)m,\displaystyle\displaystyle\leq\sum_{z=L+1}^{\infty}\Lambda^{2z}\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}\text{s.t.}\|\vec{m}\|_{1}=z}\frac{\vec{\theta}^{2\vec{m}}}{\vec{m}!\cdot(\Lambda\vec{\theta}+\vec{\lambda})^{\vec{m}}},

where the first inequality follows from Lemma A.1.

Now, recall that α=θ2Λθ+λ\displaystyle\vec{\alpha}=\frac{\vec{\theta}^{2}}{\Lambda\vec{\theta}+\vec{\lambda}}. We claim that

m(0)Ds.t.m1=zαmm!=α1zz!.\displaystyle\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}\text{s.t.}\|\vec{m}\|_{1}=z}\frac{\vec{\alpha}^{\vec{m}}}{\vec{m}!}=\frac{\|\vec{\alpha}\|_{1}^{z}}{z!}.

To prove this equality, consider the random process of drawing z\displaystyle z samples from [D]\displaystyle[D] using the distribution corresponding to α/α1\displaystyle\vec{\alpha}/\|\vec{\alpha}\|_{1} (that is, we get i[D]\displaystyle i\in[D] with probability αiα1\displaystyle\frac{\vec{\alpha}_{i}}{\|\vec{\alpha}\|_{1}}. It is a well-defined distribution since α(0)D\displaystyle\vec{\alpha}\in(\mathbb{R}^{\geq 0})^{D}). Let M\displaystyle\vec{M} be the random variable corresponding to the histogram of the z\displaystyle z samples (that is, Mi\displaystyle\vec{M}_{i} denotes the number of occurrences of the element i\displaystyle i). For m(0)Ds.t.m1=z\displaystyle\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}\text{s.t.}\|\vec{m}\|_{1}=z, we have that

Pr[M=m]=(αα1)mz!m!.\displaystyle\Pr[\vec{M}=\vec{m}]=\left(\frac{\vec{\alpha}}{\|\vec{\alpha}\|_{1}}\right)^{\vec{m}}\cdot\frac{z!}{\vec{m}!}.

Hence, we get

m(0)Ds.t.m1=zPr[M=m]=1,\displaystyle\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}\text{s.t.}\|\vec{m}\|_{1}=z}\Pr[\vec{M}=\vec{m}]=1,

and

m(0)Ds.t.m1=zαmm!=α1zz!.\displaystyle\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}\text{s.t.}\|\vec{m}\|_{1}=z}\frac{\vec{\alpha}^{\vec{m}}}{\vec{m}!}=\frac{\|\vec{\alpha}\|_{1}^{z}}{z!}.

Plugging in, we obtain

Δ2z=L+1Λ2zα1zz!=z=L+1(Λ2α1)zz!.\displaystyle\Delta^{2}\leq\sum_{z=L+1}^{\infty}\Lambda^{2z}\cdot\frac{\|\vec{\alpha}\|_{1}^{z}}{z!}=\sum_{z=L+1}^{\infty}\frac{(\Lambda^{2}\cdot\|\vec{\alpha}\|_{1})^{z}}{z!}.\qed

Applying Lemma A.3, we are now ready to prove Lemma 4.3.

Proof of Lemma 4.3.

Let α=θ2Λθ+λ\displaystyle\vec{\alpha}=\frac{\vec{\theta}^{2}}{\Lambda\vec{\theta}+\vec{\lambda}}. We have that

α1\displaystyle\displaystyle\|\vec{\alpha}\|_{1} =i[D]θiθiΛθi+λi\displaystyle\displaystyle=\sum_{i\in[D]}\vec{\theta}_{i}\cdot\frac{\vec{\theta}_{i}}{\Lambda\vec{\theta}_{i}+\vec{\lambda}_{i}}
=𝔼i𝒟θθiΛθi+λi\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{i\leftarrow\mathcal{D}_{\vec{\theta}}}\frac{\vec{\theta}_{i}}{\Lambda\vec{\theta}_{i}+\vec{\lambda}_{i}}
12Λ2+Pri𝒟θ[Λθi+λi<2Λ2θi]1Λ\displaystyle\displaystyle\leq\frac{1}{2\Lambda^{2}}+\Pr_{i\leftarrow\mathcal{D}_{\vec{\theta}}}[\Lambda\vec{\theta}_{i}+\vec{\lambda}_{i}<2\Lambda^{2}\cdot\vec{\theta}_{i}]\cdot\frac{1}{\Lambda}
1Λ2.\displaystyle\displaystyle\leq\frac{1}{\Lambda^{2}}.

Applying Lemma A.3, we get

𝔼[𝖯𝗈𝗂(Uθ+λ)]𝔼[𝖯𝗈𝗂(Vθ+λ)]TV2z=L+11z!1L!.\displaystyle\|\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(U\vec{\theta}+\vec{\lambda})]-\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(V\vec{\theta}+\vec{\lambda})]\|_{TV}^{2}\leq\sum_{z=L+1}^{\infty}\frac{1}{z!}\leq\frac{1}{L!}.\qed

Appendix B Lower Bounds on Hockey Stick Divergence

In this section, we prove Lemma 4.8 (restated below).


Lemma 4.8. (restated) There exists an absolute constant c0\displaystyle c_{0} such that, for every integer m1\displaystyle m\geq 1, three reals α,β,ε>0\displaystyle\alpha,\beta,\varepsilon>0 such that α>eεβ\displaystyle\alpha>e^{\varepsilon}\beta, letting Δ=αeεβ\displaystyle\Delta=\alpha-e^{\varepsilon}\beta and supposing 4eεΔβ<1/2\displaystyle 4\frac{e^{\varepsilon}}{\Delta}\beta<1/2, it holds that

dε(𝖡𝖾𝗋(α)+𝖡𝗂𝗇(m,β)||𝖡𝖾𝗋(β)+𝖡𝗂𝗇(m,β))Δ122mexp(c0meεΔβ[log(Δ1)+1]).\displaystyle d_{\varepsilon}(\mathsf{Ber}(\alpha)+\mathsf{Bin}(m,\beta)||\mathsf{Ber}(\beta)+\mathsf{Bin}(m,\beta))\geq\Delta\cdot\frac{1}{2\sqrt{2m}}\cdot\exp\left(-c_{0}\cdot m\cdot\frac{e^{\varepsilon}}{\Delta}\beta\cdot\left[\log(\Delta^{-1})+1\right]\right).

Before proving Lemma 4.8, we need several technical lemmas. First, we show that the hockey stick divergence between 𝖡𝖾𝗋(α)+X\displaystyle\mathsf{Ber}(\alpha)+X and 𝖡𝖾𝗋(β)+X\displaystyle\mathsf{Ber}(\beta)+X can be characterized by the hockey stick divergence between X+1\displaystyle X+1 and X\displaystyle X.

Lemma B.1.

Let α,β,ε>0\displaystyle\alpha,\beta,\varepsilon>0 be three reals such that α>eεβ\displaystyle\alpha>e^{\varepsilon}\beta, and X\displaystyle X be a random variable over 0\displaystyle\mathbb{Z}^{\geq 0}. The following holds:

dε(𝖡𝖾𝗋(α)+X||𝖡𝖾𝗋(β)+X)=(αeεβ)dlnτ(1+X||X),\displaystyle d_{\varepsilon}(\mathsf{Ber}(\alpha)+X||\mathsf{Ber}(\beta)+X)=(\alpha-e^{\varepsilon}\beta)\cdot d_{\ln\tau}(1+X||X),

where τ=eεeεβ1+ααeεβ\displaystyle\tau=\frac{e^{\varepsilon}-e^{\varepsilon}\beta-1+\alpha}{\alpha-e^{\varepsilon}\beta}.

Proof.

We have that

dε(𝖡𝖾𝗋(α)+X||𝖡𝖾𝗋(β)+X)\displaystyle\displaystyle d_{\varepsilon}(\mathsf{Ber}(\alpha)+X||\mathsf{Ber}(\beta)+X) =k0[(1α)Xk+αXk1eε(1β)XkeεβXk1]+\displaystyle\displaystyle=\sum_{k\in\mathbb{Z}^{\geq 0}}\left[(1-\alpha)X_{k}+\alpha X_{k-1}-e^{\varepsilon}(1-\beta)X_{k}-e^{\varepsilon}\beta X_{k-1}\right]_{+}
=k0[(αeεβ)Xk1(eεeεβ1+α)Xk]+\displaystyle\displaystyle=\sum_{k\in\mathbb{Z}^{\geq 0}}\left[(\alpha-e^{\varepsilon}\beta)\cdot X_{k-1}-(e^{\varepsilon}-e^{\varepsilon}\beta-1+\alpha)\cdot X_{k}\right]_{+}
=(αeεβ)k0[Xk1eεeεβ1+ααeεβXk]+\displaystyle\displaystyle=(\alpha-e^{\varepsilon}\beta)\cdot\sum_{k\in\mathbb{Z}^{\geq 0}}\left[X_{k-1}-\frac{e^{\varepsilon}-e^{\varepsilon}\beta-1+\alpha}{\alpha-e^{\varepsilon}\beta}\cdot X_{k}\right]_{+}
=(αeεβ)dlnτ(1+X||X).\displaystyle\displaystyle=(\alpha-e^{\varepsilon}\beta)\cdot d_{\ln\tau}(1+X||X).\qed
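The identity in Lemma B.1 is easy to verify numerically. The Python snippet below does so for X=Bin(m,β), computing both sides directly from the definition d_ε(P||Q)=Σ_z[P_z−e^ε Q_z]_+; the parameter values are arbitrary choices satisfying α>e^ε β, and the snippet is only a sanity check.

```python
import math

def binom_pmf(n, p, k):
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def hockey_stick(P, Q, eps):
    """d_eps(P || Q) = sum_z [P_z - e^eps * Q_z]_+ over a common support."""
    return sum(max(P[z] - math.exp(eps) * Q[z], 0.0) for z in range(len(P)))

m, beta, alpha, eps = 20, 0.02, 0.5, 0.3
assert alpha > math.exp(eps) * beta

# Distributions of Ber(q) + X with X = Bin(m, beta), on support {0, ..., m+1}.
X = [binom_pmf(m, beta, k) for k in range(m + 2)]
def shift_ber(q):
    return [(1 - q) * X[k] + q * (X[k - 1] if k > 0 else 0.0)
            for k in range(m + 2)]

lhs = hockey_stick(shift_ber(alpha), shift_ber(beta), eps)

# Right-hand side: (alpha - e^eps * beta) * d_{ln tau}(1 + X || X).
tau = (math.exp(eps) - math.exp(eps) * beta - 1 + alpha) / (alpha - math.exp(eps) * beta)
one_plus_X = [X[k - 1] if k > 0 else 0.0 for k in range(m + 2)]
rhs = (alpha - math.exp(eps) * beta) * hockey_stick(one_plus_X, X, math.log(tau))

print(lhs, rhs)  # the two quantities should agree up to floating-point error
```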

Next, we need a lemma giving a lower bound on dε(1+X||X)\displaystyle d_{\varepsilon}(1+X||X) for a random variable X\displaystyle X.

Lemma B.2.

Let X\displaystyle X be a random variable over 0\displaystyle\mathbb{Z}^{\geq 0} and ε>0\displaystyle\varepsilon>0. The following holds:

dε(1+X||X)12PrkX[XkXk+12eε].\displaystyle d_{\varepsilon}(1+X||X)\geq\frac{1}{2}\cdot\Pr_{k\leftarrow X}\left[\frac{X_{k}}{X_{k+1}}\geq 2e^{\varepsilon}\right].
Proof.

We have that

dε(1+X||X)=\displaystyle\displaystyle d_{\varepsilon}(1+X||X)= z=0[Xz1eεXz]+\displaystyle\displaystyle\sum_{z=0}^{\infty}[X_{z-1}-e^{\varepsilon}X_{z}]_{+}
=\displaystyle\displaystyle= z=0[XzeεXz+1]+\displaystyle\displaystyle\sum_{z=0}^{\infty}[X_{z}-e^{\varepsilon}X_{z+1}]_{+}
\displaystyle\displaystyle\geq z=012Xz𝟙[Xz2eεXz+1]\displaystyle\displaystyle\sum_{z=0}^{\infty}\frac{1}{2}\cdot X_{z}\cdot\mathbb{1}[X_{z}\geq 2e^{\varepsilon}X_{z+1}]
=\displaystyle\displaystyle= 12PrkX[XkXk+12eε].\displaystyle\displaystyle\frac{1}{2}\Pr_{k\leftarrow X}\left[\frac{X_{k}}{X_{k+1}}\geq 2e^{\varepsilon}\right].\qed

Applying Lemma B.2, we obtain the following lower bound on dε(1+𝖡𝗂𝗇(n,p)||𝖡𝗂𝗇(n,p))\displaystyle d_{\varepsilon}(1+\mathsf{Bin}(n,p)||\mathsf{Bin}(n,p)).

Lemma B.3.

For n\displaystyle n\in\mathbb{N} and p(0,0.5)\displaystyle p\in(0,0.5), ε>0\displaystyle\varepsilon>0 such that 4eεp<1/2\displaystyle 4e^{\varepsilon}p<1/2,

dε(1+𝖡𝗂𝗇(n,p)||𝖡𝗂𝗇(n,p))122nexp(n4eεplog(4eε)).\displaystyle d_{\varepsilon}(1+\mathsf{Bin}(n,p)||\mathsf{Bin}(n,p))\geq\frac{1}{2\sqrt{2n}}\exp(-n4e^{\varepsilon}p\cdot\log(4e^{\varepsilon})).
Proof.

We have

𝖡𝗂𝗇(n,p)k𝖡𝗂𝗇(n,p)k+1=1ppk+1nk1ppkn.\displaystyle\frac{\mathsf{Bin}(n,p)_{k}}{\mathsf{Bin}(n,p)_{k+1}}=\frac{1-p}{p}\cdot\frac{k+1}{n-k}\geq\frac{1-p}{p}\cdot\frac{k}{n}.

By Lemma B.2,

2dε(1+𝖡𝗂𝗇(n,p)||𝖡𝗂𝗇(n,p))\displaystyle\displaystyle 2\cdot d_{\varepsilon}(1+\mathsf{Bin}(n,p)||\mathsf{Bin}(n,p)) Prk𝖡𝗂𝗇(n,p)[𝖡𝗂𝗇(n,p)k𝖡𝗂𝗇(n,p)k+12eε]\displaystyle\displaystyle\geq\Pr_{k\leftarrow\mathsf{Bin}(n,p)}\left[\frac{\mathsf{Bin}(n,p)_{k}}{\mathsf{Bin}(n,p)_{k+1}}\geq 2e^{\varepsilon}\right]
Prk𝖡𝗂𝗇(n,p)[1ppkn2eε]\displaystyle\displaystyle\geq\Pr_{k\leftarrow\mathsf{Bin}(n,p)}\left[\frac{1-p}{p}\cdot\frac{k}{n}\geq 2e^{\varepsilon}\right]
=Pr[𝖡𝗂𝗇(n,p)2eεnp1p]\displaystyle\displaystyle=\Pr\left[\mathsf{Bin}(n,p)\geq 2e^{\varepsilon}\cdot n\cdot\frac{p}{1-p}\right]
Pr[𝖡𝗂𝗇(n,p)4eεpn].\displaystyle\displaystyle\geq\Pr\left[\mathsf{Bin}(n,p)\geq 4e^{\varepsilon}\cdot pn\right].

Now, by anti-concentration of the binomial distribution [Rob90, Page 115], we have

Pr[𝖡𝗂𝗇(n,p)4eεpn]12nexp(nKL(4eεp||p)).\displaystyle\Pr\left[\mathsf{Bin}(n,p)\geq 4e^{\varepsilon}\cdot pn\right]\geq\frac{1}{\sqrt{2n}}\exp(-n\cdot KL(4e^{\varepsilon}p||p)).

Letting λ=4eε\displaystyle\lambda=4e^{\varepsilon}, we have

KL(λp||p)=λplogλpp+(1λp)log1λp1pλplogλpp=λplogλ.\displaystyle KL(\lambda p||p)=\lambda p\cdot\log\frac{\lambda p}{p}+(1-\lambda p)\cdot\log\frac{1-\lambda p}{1-p}\leq\lambda p\cdot\log\frac{\lambda p}{p}=\lambda p\cdot\log\lambda.

Putting everything together, we get

dε(1+𝖡𝗂𝗇(n,p)||𝖡𝗂𝗇(n,p))122nexp(nλplogλ)=122nexp(n4eεplog(4eε)).\displaystyle d_{\varepsilon}(1+\mathsf{Bin}(n,p)||\mathsf{Bin}(n,p))\geq\frac{1}{2\sqrt{2n}}\exp(-n\lambda p\cdot\log\lambda)=\frac{1}{2\sqrt{2n}}\exp(-n4e^{\varepsilon}p\cdot\log(4e^{\varepsilon})).\qed
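Similarly, the lower bound of Lemma B.3 can be checked numerically: the snippet below computes d_ε(1+Bin(n,p)||Bin(n,p)) exactly from the definition and compares it with the stated bound (a sanity check only; the parameters are arbitrary choices satisfying 4e^ε p<1/2).

```python
import math

def binom_pmf(n, p, k):
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p, eps = 200, 0.001, 0.5
assert 4 * math.exp(eps) * p < 0.5

# Exact d_eps(1 + Bin(n,p) || Bin(n,p)) = sum_z [Bin_{z-1} - e^eps * Bin_z]_+.
exact = sum(max(binom_pmf(n, p, z - 1) - math.exp(eps) * binom_pmf(n, p, z), 0.0)
            for z in range(n + 2))

# Lower bound from Lemma B.3.
bound = (1 / (2 * math.sqrt(2 * n))) * math.exp(-n * 4 * math.exp(eps) * p
                                                * math.log(4 * math.exp(eps)))

print(exact, ">=", bound)
```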

We are now ready to prove Lemma 4.8.

Proof of Lemma 4.8.

Let τ=eεeεβ1+ααeεβ\displaystyle\tau=\frac{e^{\varepsilon}-e^{\varepsilon}\beta-1+\alpha}{\alpha-e^{\varepsilon}\beta}. Since Δ=αeεβα1\displaystyle\Delta=\alpha-e^{\varepsilon}\beta\leq\alpha\leq 1, we have τ=(eε1+Δ)/ΔeεΔ\displaystyle\tau=(e^{\varepsilon}-1+\Delta)/\Delta\leq\frac{e^{\varepsilon}}{\Delta}. Let N=𝖡𝗂𝗇(m,β)\displaystyle N=\mathsf{Bin}(m,\beta). By Lemma B.1, we have that

dε(𝖡𝖾𝗋(α)+N||𝖡𝖾𝗋(β)+N)Δdlnτ(1+N||N).\displaystyle d_{\varepsilon}(\mathsf{Ber}(\alpha)+N||\mathsf{Ber}(\beta)+N)\geq\Delta\cdot d_{\ln\tau}(1+N||N).

Applying Lemma B.3 and noting that 4τβ4eεΔβ<1/2\displaystyle 4\tau\beta\leq 4\frac{e^{\varepsilon}}{\Delta}\beta<1/2, it follows that

dlnτ(1+N||N)122mexp(m4τβlog(4τ)).\displaystyle d_{\ln\tau}(1+N||N)\geq\frac{1}{2\sqrt{2m}}\cdot\exp(-m\cdot 4\tau\beta\log(4\tau)).

Finally, since τeεΔ\displaystyle\tau\leq\frac{e^{\varepsilon}}{\Delta}, we have that

m4τβlog(4τ)O(meεΔβ[log(Δ1)+1]).\displaystyle m\cdot 4\tau\beta\log(4\tau)\leq O\left(m\cdot\frac{e^{\varepsilon}}{\Delta}\beta\cdot\left[\log(\Delta^{-1})+1\right]\right).\qed

Appendix C Simulation of Shuffle Protocols by SQ Algorithms

In this section, we show the connection between dominated protocols and SQ algorithms, which implies that DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k} protocols can be simulated by SQ algorithms. This is analogous to the result of Kasiviswanathan et al. [KLN+11] who proved such a connection between DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} protocols and SQ algorithms. In the following, we use the notation of [KLN+11].

C.1 SQ Model

We first introduce the statistical query (SQ) model. In the SQ model, algorithms access a distribution through its statistical properties instead of individual samples.

Definition C.1 (SQ Oracle).

Let 𝒟\displaystyle\mathcal{D} be a distribution over 𝒳\displaystyle\mathcal{X}. An SQ oracle SQ𝒟\displaystyle\textsf{SQ}_{\mathcal{D}} takes as input a function g:𝒳{1,1}\displaystyle g\colon\mathcal{X}\rightarrow\{-1,1\} and a tolerance parameter τ(0,1)\displaystyle\tau\in(0,1); writing g(𝒟):=𝔼x𝒟[g(x)]\displaystyle g(\mathcal{D}):=\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{D}}[g(x)], it outputs an estimate v\displaystyle v such that:

|vg(𝒟)|τ.\displaystyle|v-g(\mathcal{D})|\leq\tau.
Definition C.2 (SQ algorithm).

An SQ algorithm is an algorithm that accesses the distribution 𝒟\displaystyle\mathcal{D} only through the SQ oracle SQ𝒟\displaystyle\textsf{SQ}_{\mathcal{D}}.

C.2 Simulation of Dominated Algorithms by SQ Algorithms

We have the following simulation of dominated algorithms by SQ algorithms.

Theorem C.3.

Suppose R:𝒳\displaystyle R\colon\mathcal{X}\to\mathcal{M} is (ε,δ)\displaystyle(\varepsilon,\delta)-dominated. Then, for any distribution 𝒰\displaystyle\mathcal{U} and error parameter βδ\displaystyle\beta\geq\delta, one can take a sample from R(𝒰)\displaystyle R(\mathcal{U}) with statistical error O(β)\displaystyle O(\beta) using O(eε)\displaystyle O(e^{\varepsilon}) queries in expectation to SQ𝒰\displaystyle\textsf{SQ}_{\mathcal{U}} with tolerance τ=β/eε\displaystyle\tau=\beta/e^{\varepsilon}.

Proof.

Suppose R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-dominated by 𝒟\displaystyle\mathcal{D}. Let τ=β/eε\displaystyle\tau=\beta/e^{\varepsilon}. For every x𝒳\displaystyle x\in\mathcal{X} and z\displaystyle z\in\mathcal{M}, we use px,z\displaystyle p_{x,z} (respectively, px,E\displaystyle p_{x,E}) to denote Pr[R(x)=z]\displaystyle\Pr[R(x)=z] (respectively, Pr[R(x)E]\displaystyle\Pr[R(x)\in E]), and let

fz(x)=px,zeε𝒟zandgz(x)=min(1,fz(x)).\displaystyle f_{z}(x)=\frac{p_{x,z}}{e^{\varepsilon}\cdot\mathcal{D}_{z}}~{}~{}\text{and}~{}~{}g_{z}(x)=\min(1,f_{z}(x)).

Our algorithm is a rejection sampling procedure adapted from [KLN+11]. It works as follows:

  1.

    Take a sample z𝒟\displaystyle z\leftarrow\mathcal{D}.

  2.

    We make a query gz\displaystyle g_{z} to SQ𝒰\displaystyle\textsf{SQ}_{\mathcal{U}} with tolerance level τ\displaystyle\tau, to obtain an estimate g^z\displaystyle\hat{g}_{z} such that |g^zgz(𝒰)|τ\displaystyle|\hat{g}_{z}-g_{z}(\mathcal{U})|\leq\tau.

  3.

    With probability max(g^z,0)\displaystyle\max(\hat{g}_{z},0), we output z\displaystyle z and stop. Otherwise we go back to Step 1.
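A minimal Python sketch of this rejection-sampling procedure is given below. The interfaces sample_from_D, D_pmf, R_pmf, and sq_oracle are assumptions standing in for the dominating distribution 𝒟, the randomizer R, and the SQ oracle (which, as in the proof, is queried with [0,1]-valued functions); the snippet only makes the control flow of Steps 1-3 explicit.

```python
import math
import random

def simulate_R_on_U(sample_from_D, D_pmf, R_pmf, sq_oracle, eps, tau):
    """Rejection sampling: outputs z approximately distributed as R(U).

    Assumed interfaces: sample_from_D() draws z from the dominating
    distribution D; D_pmf(z) = D_z; R_pmf(x, z) = Pr[R(x) = z]; and
    sq_oracle(g, tau) returns an estimate of E_{x ~ U}[g(x)] within tau.
    """
    while True:
        z = sample_from_D()                       # Step 1: z <- D

        def g_z(x):                               # g_z(x) = min(1, p_{x,z} / (e^eps * D_z))
            return min(1.0, R_pmf(x, z) / (math.exp(eps) * D_pmf(z)))

        g_hat = sq_oracle(g_z, tau)               # Step 2: one SQ query with tolerance tau
        if random.random() < max(g_hat, 0.0):     # Step 3: accept with probability max(g_hat, 0)
            return z
        # otherwise reject and go back to Step 1
```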

Let 𝒯x={z:px,z>eε𝒟z}\displaystyle\mathcal{T}_{x}=\{z:p_{x,z}>e^{\varepsilon}\cdot\mathcal{D}_{z}\}. Note that px,𝒯x2δ2β\displaystyle p_{x,\mathcal{T}_{x}}\leq 2\delta\leq 2\beta since R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-dominated by 𝒟\displaystyle\mathcal{D}. By the definition of gz\displaystyle g_{z}, it holds that gz(x)=fz(x)\displaystyle g_{z}(x)=f_{z}(x) for every z𝒯x\displaystyle z\not\in\mathcal{T}_{x}. We will need the following claim.

Claim 1.

For every x𝒳\displaystyle x\in\mathcal{X},

𝔼z𝒟|fz(x)gz(x)|2β/eε.\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow\mathcal{D}}\left|f_{z}(x)-g_{z}(x)\right|\leq 2\beta/e^{\varepsilon}.
Proof.
𝔼z𝒟|fz(x)gz(x)|\displaystyle\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow\mathcal{D}}\left|f_{z}(x)-g_{z}(x)\right| z𝒯x𝒟zfz(x)\displaystyle\displaystyle\leq\sum_{z\in\mathcal{T}_{x}}\mathcal{D}_{z}\cdot f_{z}(x)
z𝒯xpx,zeε\displaystyle\displaystyle\leq\sum_{z\in\mathcal{T}_{x}}p_{x,z}\cdot e^{-\varepsilon}
px,𝒯x/eε2β/eε.\displaystyle\displaystyle\leq p_{x,\mathcal{T}_{x}}/e^{\varepsilon}\leq 2\beta/e^{\varepsilon}.\qed

Now, in a single run, the above algorithm outputs z\displaystyle z with probability in the interval

[𝒟z(gz(𝒰)τ),𝒟z(gz(𝒰)+τ)].\displaystyle[\mathcal{D}_{z}\cdot(g_{z}(\mathcal{U})-\tau),\mathcal{D}_{z}\cdot(g_{z}(\mathcal{U})+\tau)].

Note that 𝔼z𝒟fz(𝒰)=𝔼x𝒰zpx,zeε=eε\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow\mathcal{D}}f_{z}(\mathcal{U})=\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{U}}\sum_{z}\frac{p_{x,z}}{e^{\varepsilon}}=e^{-\varepsilon}. By Claim 1, the algorithm terminates in a single run with probability at least

𝔼z𝒟(gz(𝒰)τ)(𝔼z𝒟fz(𝒰))τ2β/eε=eε(13β),\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow\mathcal{D}}(g_{z}(\mathcal{U})-\tau)\geq\left(\operatornamewithlimits{\mathbb{E}}_{z\leftarrow\mathcal{D}}f_{z}(\mathcal{U})\right)-\tau-2\beta/e^{\varepsilon}=e^{-\varepsilon}\cdot(1-3\beta),

and at most

𝔼z𝒟(gz(𝒰)+τ)(𝔼z𝒟fz(𝒰))+τ+2β/eε=eε(1+3β).\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow\mathcal{D}}(g_{z}(\mathcal{U})+\tau)\leq\left(\operatornamewithlimits{\mathbb{E}}_{z\leftarrow\mathcal{D}}f_{z}(\mathcal{U})\right)+\tau+2\beta/e^{\varepsilon}=e^{-\varepsilon}\cdot(1+3\beta).

The above implies that the algorithm makes at most O(eε)\displaystyle O(e^{\varepsilon}) queries to SQ𝒰\displaystyle\textsf{SQ}_{\mathcal{U}} in expectation.

Putting everything together, our algorithm outputs z\displaystyle z with probability in the following interval:

Iz:=[𝒟z(gz(𝒰)τ)eε(1+3β),𝒟z(gz(𝒰)+τ)eε(13β)].\displaystyle I_{z}:=\left[\frac{\mathcal{D}_{z}\cdot(g_{z}(\mathcal{U})-\tau)\cdot e^{\varepsilon}}{(1+3\beta)},\frac{\mathcal{D}_{z}\cdot(g_{z}(\mathcal{U})+\tau)\cdot e^{\varepsilon}}{(1-3\beta)}\right].

We have that

Pr[R(𝒰)=z]=𝔼x𝒰px,z=fz(𝒰)𝒟zeε.\displaystyle\Pr[R(\mathcal{U})=z]=\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{U}}p_{x,z}=f_{z}(\mathcal{U})\cdot\mathcal{D}_{z}\cdot e^{\varepsilon}.

Note that

maxvIz|vfz(𝒰)𝒟zeε|\displaystyle\displaystyle\max_{v\in I_{z}}\left|v-f_{z}(\mathcal{U})\cdot\mathcal{D}_{z}\cdot e^{\varepsilon}\right| maxvIz|vgz(𝒰)𝒟zeε|+|fz(𝒰)𝒟zeεgz(𝒰)𝒟zeε|.\displaystyle\displaystyle\leq\max_{v\in I_{z}}\left|v-g_{z}(\mathcal{U})\cdot\mathcal{D}_{z}\cdot e^{\varepsilon}\right|+\left|f_{z}(\mathcal{U})\cdot\mathcal{D}_{z}\cdot e^{\varepsilon}-g_{z}(\mathcal{U})\cdot\mathcal{D}_{z}\cdot e^{\varepsilon}\right|.

Moreover, we have that

maxvIz|vgz(𝒰)𝒟zeε|\displaystyle\displaystyle\max_{v\in I_{z}}\left|v-g_{z}(\mathcal{U})\cdot\mathcal{D}_{z}\cdot e^{\varepsilon}\right| =eεmax{|𝒟z(gz(𝒰)τ)(1+3β)gz(𝒰)𝒟z|,|𝒟z(gz(𝒰)+τ)(13β)gz(𝒰)𝒟z|}\displaystyle\displaystyle=e^{\varepsilon}\cdot\max\left\{\left|\frac{\mathcal{D}_{z}\cdot(g_{z}(\mathcal{U})-\tau)}{(1+3\beta)}-g_{z}(\mathcal{U})\cdot\mathcal{D}_{z}\right|,\left|\frac{\mathcal{D}_{z}\cdot(g_{z}(\mathcal{U})+\tau)}{(1-3\beta)}-g_{z}(\mathcal{U})\cdot\mathcal{D}_{z}\right|\right\}
eεgz(𝒰)𝒟zO(β)+eε𝒟zO(τ).\displaystyle\displaystyle\leq e^{\varepsilon}\cdot g_{z}(\mathcal{U})\cdot\mathcal{D}_{z}\cdot O(\beta)+e^{\varepsilon}\cdot\mathcal{D}_{z}\cdot O(\tau).

The final statistical error of our sampling algorithm is therefore bounded by

zmaxvIz|vfz(𝒰)𝒟zeε|\displaystyle\displaystyle\sum_{z\in\mathcal{M}}\max_{v\in I_{z}}\left|v-f_{z}(\mathcal{U})\cdot\mathcal{D}_{z}\cdot e^{\varepsilon}\right| eεz(gz(𝒰)𝒟zO(β)+𝒟zO(τ)+|fz(𝒰)𝒟zgz(𝒰)𝒟z|)\displaystyle\displaystyle\leq e^{\varepsilon}\cdot\sum_{z\in\mathcal{M}}\left(g_{z}(\mathcal{U})\cdot\mathcal{D}_{z}\cdot O(\beta)+\mathcal{D}_{z}\cdot O(\tau)+\left|f_{z}(\mathcal{U})\cdot\mathcal{D}_{z}-g_{z}(\mathcal{U})\cdot\mathcal{D}_{z}\right|\right)
eε𝔼z𝒟[fz(𝒰)O(β)+O(τ)+|fz(𝒰)gz(𝒰)|]\displaystyle\displaystyle\leq e^{\varepsilon}\cdot\operatornamewithlimits{\mathbb{E}}_{z\leftarrow\mathcal{D}}\Big{[}f_{z}(\mathcal{U})\cdot O(\beta)+O(\tau)+|f_{z}(\mathcal{U})-g_{z}(\mathcal{U})|\Big{]} (gz(𝒰)fz(𝒰)\displaystyle g_{z}(\mathcal{U})\leq f_{z}(\mathcal{U}))
O(β).\displaystyle\displaystyle\leq O(\beta). (𝔼z𝒟fz(𝒰)=eε\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow\mathcal{D}}f_{z}(\mathcal{U})=e^{-\varepsilon} and Claim 1)

C.3 Applications

We are now ready to apply Theorem C.3 to show that protocols in the DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k} model can be simulated by SQ algorithms when the database is drawn i.i.d. from a single distribution.

Theorem C.4.

Let z\displaystyle z be a database with n\displaystyle n entries drawn i.i.d. from a distribution 𝒰\displaystyle\mathcal{U}. Let P=(R,S,A)\displaystyle P=(R,S,A) be an (ε,o(1/n))\displaystyle(\varepsilon,o(1/n))-DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k} protocol on n\displaystyle n users. Then, there is an algorithm making O((en)k+1eε)\displaystyle O((en)^{k+1}\cdot e^{\varepsilon}) queries in expectation to SQ𝒰\displaystyle\textsf{SQ}_{\mathcal{U}} with tolerance τ=Θ(1(en)k+1eε)\displaystyle\tau=\Theta\left(\frac{1}{(en)^{k+1}\cdot e^{\varepsilon}}\right), such that its output distribution differs by at most 0.01\displaystyle 0.01 in statistical distance from the output distribution of P\displaystyle P on the dataset z\displaystyle z.

Proof.

Note that it suffices to draw n\displaystyle n i.i.d. samples from R(𝒰)\displaystyle R(\mathcal{U}). By Lemma 1.8, we know that R\displaystyle R is (ε+k(1+lnn),o(1/n))\displaystyle(\varepsilon+k(1+\ln n),o(1/n))-dominated. Let γ=eε+k(1+lnn)=eε(en)k\displaystyle\gamma=e^{\varepsilon+k(1+\ln n)}=e^{\varepsilon}\cdot(en)^{k}. By Theorem C.3, using O(γ)\displaystyle O(\gamma) queries in expectation to SQ𝒰\displaystyle\textsf{SQ}_{\mathcal{U}} with tolerance τ=Θ(1/(γn))\displaystyle\tau=\Theta(1/(\gamma n)), we can sample from R(𝒰)\displaystyle R(\mathcal{U}) with statistical error 1/(100n)\displaystyle 1/(100n). Taking n\displaystyle n such samples completes the proof. ∎

We remark that [BFJ+94] proved that if an SQ algorithm solves ParityLearning with probability at least 0.99\displaystyle 0.99 using T\displaystyle T queries of tolerance 1/T\displaystyle 1/T, then T=Ω(2D/3)\displaystyle T=\Omega(2^{D/3}) (recall that D\displaystyle D is the dimension of the hidden vector in ParityLearning). Combining the foregoing lower bound with Theorem C.4, we obtain an Ω(2D/(3(k+1)))\displaystyle\Omega(2^{D/(3(k+1))}) lower bound on the sample complexity of (O(1),o(1/n))\displaystyle(O(1),o(1/n))-DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k} protocols solving ParityLearning, which is slightly weaker than our Theorem 1.10.

Appendix D Upper Bounds for Selection in DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k}

In this section, we give a proof sketch of a DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k} protocol for Selection with sample complexity O~(D/k)\displaystyle\tilde{O}(D/\sqrt{k}).

Theorem D.1.

For any kD\displaystyle k\leq D, ε=O(1)\displaystyle\varepsilon=O(1) and δ=1/poly(n)\displaystyle\delta=1/\mathop{\mathrm{poly}}(n), there is an (ε,δ)\displaystyle(\varepsilon,\delta)-DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k} protocol solving Selection with probability at least 0.99\displaystyle 0.99 and n=O~(D/k)\displaystyle n=\tilde{O}(D/\sqrt{k}).

Proof Sketch.

Let ε=O(1)\displaystyle\varepsilon=O(1) and δ=1/poly(n)\displaystyle\delta=1/\mathop{\mathrm{poly}}(n) be the privacy parameters. We also let ε0=Θ(ε/k)\displaystyle\varepsilon_{0}=\Theta(\varepsilon/\sqrt{k}), δ0=1/poly(n)\displaystyle\delta_{0}=1/\mathop{\mathrm{poly}}(n) and n=Θ~(D/k)\displaystyle n=\tilde{\Theta}(D/\sqrt{k}) to be specified later.

We can assume that k=(logn)ω(1)\displaystyle k=(\log n)^{\omega(1)}, as otherwise the protocol simply follows from the O~(D)\displaystyle\tilde{O}(D) sample upper bound by an (ε,δ)\displaystyle(\varepsilon,\delta)-DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} protocol [GGK+19].

Let m=k/log2n\displaystyle m=k/\log^{2}n, and N=nm/D\displaystyle N=nm/D. Note that by our choice of k\displaystyle k and D\displaystyle D, N=(logn)ω(1)\displaystyle N=(\log n)^{\omega(1)}.

Now for each i[D]\displaystyle i\in[D], our protocol maintains an (ε0,δ0)\displaystyle(\varepsilon_{0},\delta_{0})-DP subprotocol aiming at estimating the fraction of users whose input x\displaystyle x satisfies xi=1\displaystyle x_{i}=1. These subprotocols assume they will receive between 0.99N\displaystyle 0.99N and 1.01N\displaystyle 1.01N inputs. By [GMPV20, BBGN20], there is such a protocol which achieves error O(ε01logn)\displaystyle O(\varepsilon_{0}^{-1}\log n) with probability at least 11/n2\displaystyle 1-1/n^{2} and using O(log(1/δ)logN)O(logn)\displaystyle O\left(\frac{\log(1/\delta)}{\log N}\right)\leq O(\log n) messages.

In our protocol, each user selects k/log2n\displaystyle k/\log^{2}n coordinates from [D]\displaystyle[D] uniformly at random, and participates in the corresponding subprotocols. Finally, the analyzer aggregates the outputs of all subprotocols and outputs the coordinate with the highest estimate.

Note that by a union bound and a Chernoff bound, it follows that with probability at least 1nω(1)\displaystyle 1-n^{-\omega(1)}, the number of users participating in each subprotocol falls in the range [0.99N,1.01N]\displaystyle[0.99N,1.01N], and, for each i[D]\displaystyle i\in[D], the empirical mean of the i\displaystyle i-th coordinate among these participants is 0.01\displaystyle 0.01-close to the true mean of the i\displaystyle i-th coordinate over all users.

Setting ε0=Θ(ε/k)\displaystyle\varepsilon_{0}=\Theta(\varepsilon/\sqrt{k}) and δ0=1/poly(n)\displaystyle\delta_{0}=1/\mathop{\mathrm{poly}}(n) appropriately, the protocol is (ε,δ)\displaystyle(\varepsilon,\delta)-DP by the advanced composition theorem of DP [DRV10]. Moreover, with probability at least 11/n\displaystyle 1-1/n, all subprotocols obtain estimates with error O(ε01logn)\displaystyle O(\varepsilon_{0}^{-1}\log n).

Setting n\displaystyle n so that ε01logn=o(N)\displaystyle\varepsilon_{0}^{-1}\log n=o\left(N\right), our protocol solves Selection with probability at least 0.99\displaystyle 0.99. ∎
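
To summarize the structure of the protocol described above, here is a high-level Python schematic; private_frequency_subprotocol is a hypothetical stand-in for the (ε0,δ0)-DP summation subprotocol of [GMPV20, BBGN20], and the snippet captures only the coordinate-sampling and aggregation logic, not the privacy-critical components.

```python
import math
import random

def selection_protocol(inputs, D, k, private_frequency_subprotocol):
    """High-level schematic of the Selection protocol in the proof sketch.

    inputs: a list of n binary vectors of length D, one per user.
    private_frequency_subprotocol: stand-in for the (eps_0, delta_0)-DP
    summation subprotocol; it receives the bits contributed to one
    coordinate and returns a noisy estimate of their sum.
    """
    n = len(inputs)
    m = max(1, int(k / math.log(n) ** 2))   # number of subprotocols each user joins

    # Randomizer side: each user contributes its i-th bit to m random coordinates.
    contributions = {i: [] for i in range(D)}
    for x in inputs:
        for i in random.sample(range(D), m):
            contributions[i].append(x[i])

    # Analyzer side: one private estimate per coordinate, then an argmax.
    estimates = [private_frequency_subprotocol(contributions[i]) for i in range(D)]
    return max(range(D), key=lambda i: estimates[i])
```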