
On Distributed Differential Privacy
and Counting Distinct Elements

   Lijie Chen
Massachusetts Institute of Technology
Cambridge, MA
Email: [email protected]. Most of this work was done at Google Research, Mountain View, CA.
   Badih Ghazi
Google Research
Mountain View, CA
Email: [email protected]
   Ravi Kumar
Google Research
Mountain View, CA
Email: [email protected]
   Pasin Manurangsi
Google Research
Mountain View, CA
Email: [email protected]
Abstract

We study the setup where each of $n$ users holds an element from a discrete set, and the goal is to count the number of distinct elements across all users, under the constraint of $(\varepsilon,\delta)$-differential privacy:

  • In the non-interactive local setting, we prove that the additive error of any protocol is $\Omega(n)$ for any constant $\varepsilon$ and for any $\delta$ inverse polynomial in $n$.

  • In the single-message shuffle setting, we prove a lower bound of $\tilde{\Omega}(n)$ on the error for any constant $\varepsilon$ and for some $\delta$ inverse quasi-polynomial in $n$. We do so by building on the moment-matching method from the literature on distribution estimation.

  • In the multi-message shuffle setting, we give a protocol with at most one message per user in expectation and with an error of $\tilde{O}(\sqrt{n})$ for any constant $\varepsilon$ and for any $\delta$ inverse polynomial in $n$. Our protocol is also robustly shuffle private, and our error of $\sqrt{n}$ matches a known lower bound for such protocols.

Our proof technique relies on a new notion, which we call dominated protocols, and which can also be used to obtain the first non-trivial lower bounds against multi-message shuffle protocols for the well-studied problems of selection and learning parity.

Our first lower bound for estimating the number of distinct elements provides the first $\omega(\sqrt{n})$ separation between global sensitivity and error in local differential privacy, thus answering an open question of Vadhan (2017). We also provide a simple construction that gives an $\tilde{\Omega}(n)$ separation between global sensitivity and error in two-party differential privacy, thereby answering an open question of McGregor et al. (2011).

1 Introduction

Differential privacy (DP) [DMNS06, DKM+06] has become a leading framework for private-data analysis, with several recent practical deployments [EPK14, Sha14, Gre16, App17, DKY17, Abo18]. The most commonly studied DP setting is the so-called central (aka curator) model whereby a single authority (sometimes referred to as the analyst) is trusted with running an algorithm on the raw data of the users and the privacy guarantee applies to the algorithm’s output.

The absence, in many scenarios, of a clear trusted authority has motivated the study of distributed DP models. The most well-studied such setting is the local model [KLN+11] (also [War65]), denoted henceforth by $\mathrm{DP}_{\mathrm{local}}$, where the privacy guarantee is enforced at each user's output (i.e., the protocol transcript). While an advantage of the local model is its very strong privacy guarantees and minimal trust assumptions, the noise that has to be added can sometimes be quite large. This has stimulated the study of "intermediate" models that seek to achieve accuracy close to the central model while relying on more distributed trust assumptions. One such middle ground is the so-called shuffle (aka anonymous) model [IKOS06, BEM+17, CSU+18, EFM+19], where the users send messages to a shuffler who randomly shuffles these messages before sending them to the analyzer; the privacy guarantee is enforced on the shuffled messages (i.e., the input to the analyzer). We study both the local and the shuffle models in this work.

1.1 Counting Distinct Elements

A basic function in data analytics is estimating the number of distinct elements in a domain of size $D$ held by a collection of $n$ users, which we denote by $\textsf{CountDistinct}_{n,D}$ (and simply by $\textsf{CountDistinct}_{n}$ if there is no restriction on the universe size). Besides its use in database management systems, it is a well-studied problem in sketching, streaming, and communication complexity (e.g., [KNW10, BCK+14] and the references therein). In central DP, it can be easily solved with constant error using the Laplace mechanism [DMNS06]; see also [MMNW11, DLB19, PS20, CDSKY20].
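Since the number of distinct elements has global sensitivity $1$ (changing one user's element changes the count by at most one), the central-model baseline is simply the true count plus Laplace noise of scale $1/\varepsilon$. A minimal sketch of this baseline (illustrative only; the function and variable names are ours):

import numpy as np

def central_dp_count_distinct(inputs, epsilon, rng=None):
    # Global sensitivity of the distinct count is 1, so Laplace(1/epsilon) noise
    # suffices for epsilon-DP in the central (curator) model [DMNS06].
    rng = rng or np.random.default_rng()
    true_count = len(set(inputs))
    return true_count + rng.laplace(scale=1.0 / epsilon)

# Example: estimate = central_dp_count_distinct([3, 7, 7, 1, 3], epsilon=1.0)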

We obtain new results on $(\varepsilon,\delta)$-DP protocols for CountDistinct in the local and shuffle settings. (For formal definitions, please refer to Section 3. We remark that, throughout this work, we consider the non-interactive local model where all users apply the same randomizer; see Definition 3.6. We briefly discuss in Section 1.4 possible extensions to interactive local models.)

1.1.1 Lower Bounds for Local DP Protocols

Our first result is a lower bound on the additive error of $\mathrm{DP}_{\mathrm{local}}$ protocols for counting distinct elements. (See Section 3 for the formal, standard definition of public-coin DP protocols. Note that private-coin protocols are a sub-class of public-coin protocols, so all of our lower bounds apply to private-coin protocols as well.)

Theorem 1.1.

For any $\varepsilon=O(1)$, no public-coin $(\varepsilon,o(1/n))$-$\mathrm{DP}_{\mathrm{local}}$ protocol can solve $\textsf{CountDistinct}_{n,n}$ with error $o(n)$. (Throughout this work, we say that a randomized algorithm solves a problem with error $e$ if, with probability 0.99, it incurs error at most $e$.)

The lower bound in Theorem 1.1 is asymptotically tight (the trivial algorithm that always outputs $0$ incurs error at most $n$). Furthermore, it answers a question of Vadhan [Vad17, Open Problem 9.6], who asked if there is a function with a gap of $\omega(\sqrt{n})$ between its (global) sensitivity and the smallest error achievable by any $\mathrm{DP}_{\mathrm{local}}$ protocol. (To the best of our knowledge, the largest previously known gap between global sensitivity and error was $O(\sqrt{n})$, which is achieved, e.g., by binary summation [CSS12].) As the global sensitivity of the number of distinct elements is $1$, Theorem 1.1 exhibits a (natural) function for which this gap is as large as $\Omega(n)$. While Theorem 1.1 applies to the constant-$\varepsilon$ regime, it turns out we can prove a lower bound for much less private protocols (i.e., with a much larger $\varepsilon$ value) at the cost of polylogarithmic factors in the error:

Theorem 1.2.

For some $\varepsilon=\ln(n)-O(\ln\ln n)$ and $D=\Theta(n/\mathrm{polylog}(n))$, no public-coin $(\varepsilon,n^{-\omega(1)})$-$\mathrm{DP}_{\mathrm{local}}$ protocol can solve $\textsf{CountDistinct}_{n,D}$ with error $o(D)$.

To prove Theorem 1.2, we build on the moment-matching method from the literature on (non-private) distribution estimation, namely [VV17, WY19], and tailor it to CountDistinct in the $\mathrm{DP}_{\mathrm{local}}$ setting (see Section 2.1 for more details on this connection). The bound on the privacy parameter $\varepsilon$ in Theorem 1.2 turns out to be very close to tight: the achievable error drops quadratically, to $O(\sqrt{n})$, once $\varepsilon$ exceeds $\ln n$. This is shown in the next theorem:

Theorem 1.3.

There is a $(\ln(n)+O(1))$-$\mathrm{DP}_{\mathrm{local}}$ protocol solving $\textsf{CountDistinct}_{n,n}$ with error $O(\sqrt{n})$.

1.1.2 Lower Bounds for Single-Message Shuffle DP Protocols

In light of the negative result in Theorem 1.2, a natural question is whether CountDistinct can be solved in a weaker distributed DP setting such as the shuffle model. It turns out that this is not possible using any shuffle protocol where each user sends no more than $1$ message (for brevity, we will henceforth denote this class by $\mathrm{DP}_{\mathrm{shuffle}}^{1}$, and more generally denote by $\mathrm{DP}_{\mathrm{shuffle}}^{k}$ the variant where each user can send up to $k$ messages). Note that the class $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ includes any method obtained by taking a $\mathrm{DP}_{\mathrm{local}}$ protocol and applying the so-called amplification-by-shuffling results of [EFM+19, BBGN19].

In the case where $\varepsilon$ is any constant and $\delta$ is inverse quasi-polynomial in $n$, the improvement in the error for $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ protocols compared to $\mathrm{DP}_{\mathrm{local}}$ is at most a polylogarithmic factor:

Theorem 1.4.

For all $\varepsilon=O(1)$, there are $\delta=2^{-\mathrm{polylog}(n)}$ and $D=n/\mathrm{polylog}(n)$ such that no public-coin $(\varepsilon,\delta)$-$\mathrm{DP}_{\mathrm{shuffle}}^{1}$ protocol can solve $\textsf{CountDistinct}_{n,D}$ with error $o(D)$.

We note that Theorem 1.4 essentially answers a more general variant of Vadhan's question: it shows that even for $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ protocols (which include $\mathrm{DP}_{\mathrm{local}}$ protocols as a sub-class), the gap between sensitivity and error can be as large as $\tilde{\Omega}(n)$.

The proof of Theorem 1.4 follows by combining Theorem 1.2 with the following connection between $\mathrm{DP}_{\mathrm{local}}$ and $\mathrm{DP}_{\mathrm{shuffle}}^{1}$:

Lemma 1.5.

For any $\varepsilon=O(1)$ and $\delta\leq\delta_{0}\leq 1/n$, if the randomizer $R$ is $(\varepsilon,\delta)$-$\mathrm{DP}_{\mathrm{shuffle}}^{1}$ on $n$ users, then $R$ is $\left(\ln n-\ln(\Theta_{\varepsilon}(\log\delta_{0}^{-1}/\log\delta^{-1})),\,\delta_{0}\right)$-$\mathrm{DP}_{\mathrm{local}}$.

We remark that Lemma 1.5 provides a stronger quantitative bound than the qualitatively similar connections in [CSU+18, GGK+19]; specifically, we obtain the term $\ln(\Theta_{\varepsilon}(\log\delta_{0}^{-1}/\log\delta^{-1}))$, which was not present in the aforementioned works. This turns out to be crucial for our purposes, as this term gives the $O(\ln\ln n)$ slack necessary to apply Theorem 1.2.

1.1.3 A Communication-Efficient Shuffle DP Protocol

In contrast with Theorem 1.4, Balcer et al. [BCJM20] recently gave a $\mathrm{DP}_{\mathrm{shuffle}}$ protocol for $\textsf{CountDistinct}_{n,D}$ with error $O(\sqrt{D})$. Their protocol sends $\Omega(D)$ messages per user. We instead show that an error of $\tilde{O}(\sqrt{D})$ can still be guaranteed with each user sending at most one message in expectation, each of length $O(\log D)$ bits.

Theorem 1.6.

For all $\varepsilon\leq O(1)$ and $\delta\leq 1/n$, there is a public-coin $(\varepsilon,\delta)$-$\mathrm{DP}_{\mathrm{shuffle}}$ protocol that solves $\textsf{CountDistinct}_{n}$ with error $\sqrt{\min(n,D)}\cdot\mathrm{poly}(\log(1/\delta)/\varepsilon)$, where the expected number of messages sent by each user is at most one.

In the special case where $D=o(n/\mathrm{poly}(\varepsilon^{-1}\log(\delta^{-1})))$, we moreover obtain a private-coin $\mathrm{DP}_{\mathrm{shuffle}}$ protocol achieving the same guarantees as in Theorem 1.6 (see Theorem 8.4 for a formal statement). Note that Theorem 1.6 is in sharp contrast with the lower bound shown in Theorem 1.4 for $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ protocols. Indeed, for $\delta$ inverse quasi-polynomial in $n$, the former gives a public-coin protocol with at most one message per user in expectation and error $\tilde{O}(\sqrt{n})$, whereas the latter shows that no such protocol exists, even with error as large as $\tilde{\Omega}(n)$, if we restrict each user to send one message in the worst case.

A strengthening of $\mathrm{DP}_{\mathrm{shuffle}}$ protocols called robust $\mathrm{DP}_{\mathrm{shuffle}}$ protocols (roughly speaking, $\mathrm{DP}_{\mathrm{shuffle}}$ protocols whose transcript remains private even if a constant fraction of users drop out of the protocol) was studied by [BCJM20], who proved an $\Omega\left(\sqrt{\min(D,n)}\right)$ lower bound on the error of any such protocol solving $\textsf{CountDistinct}_{n,D}$. Our protocols are robust $\mathrm{DP}_{\mathrm{shuffle}}$ and therefore achieve the optimal error (up to polylogarithmic factors) among all robust $\mathrm{DP}_{\mathrm{shuffle}}$ protocols, while only sending at most one message per user in expectation.

1.2 Dominated Protocols and Multi-Message Shuffle DP Protocols

The technique underlying the proof of Theorem 1.1 can be extended beyond $\mathrm{DP}_{\mathrm{local}}$ protocols for CountDistinct. It applies to a broader category of protocols that we call dominated, defined as follows:

Definition 1.7.

We say that a randomizer $R\colon\mathcal{X}\to\mathcal{M}$ is $(\varepsilon,\delta)$-dominated if there exists a distribution $\mathcal{D}$ on $\mathcal{M}$ such that for all $x\in\mathcal{X}$ and all $E\subseteq\mathcal{M}$,

\Pr[R(x)\in E]\leq e^{\varepsilon}\cdot\Pr_{\mathcal{D}}[E]+\delta.

In this case, we also say that $R$ is $(\varepsilon,\delta)$-dominated by $\mathcal{D}$. We define $(\varepsilon,\delta)$-dominated protocols in the same way as $(\varepsilon,\delta)$-$\mathrm{DP}_{\mathrm{local}}$ protocols, except that we require the randomizer to be $(\varepsilon,\delta)$-dominated instead of $(\varepsilon,\delta)$-DP.

Note that an $(\varepsilon,\delta)$-$\mathrm{DP}_{\mathrm{local}}$ randomizer $R$ is $(\varepsilon,\delta)$-dominated: we can fix any $y^{*}\in\mathcal{X}$ and take $\mathcal{D}=R(y^{*})$. Therefore, our new definition is a relaxation of $\mathrm{DP}_{\mathrm{local}}$.
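For a randomizer over a finite message set, domination can be checked directly: for each input $x$, the worst event $E$ consists of the messages where $\Pr[R(x)=m]$ exceeds $e^{\varepsilon}\cdot\mathcal{D}_{m}$, and summing the excess mass there gives the smallest valid $\delta$. The sketch below is our own illustration (the randomizer is represented as a row-stochastic matrix, an assumption made only for this example).

import numpy as np

def domination_delta(R, D, epsilon):
    # R: array of shape (num_inputs, num_messages) with R[x, m] = Pr[R(x) = m].
    # D: candidate dominating distribution over the message set.
    # For each input x, the worst-case event E is the set of messages where
    # R[x, m] > exp(epsilon) * D[m]; the total excess mass there is delta(x).
    excess = np.clip(R - np.exp(epsilon) * D[None, :], 0.0, None)
    return excess.sum(axis=1).max()

# As noted above, a DP_local randomizer is dominated by D = R(y*) for any fixed y*:
# delta = domination_delta(R, R[0], epsilon)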

We show that multi-message $\mathrm{DP}_{\mathrm{shuffle}}$ protocols are dominated, which allows us to prove the first non-trivial lower bounds against $\mathrm{DP}_{\mathrm{shuffle}}^{O(1)}$ protocols.

Before formally stating this connection, we recall why known lower bounds against $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ protocols [CSU+18, GGK+19, BC20] do not extend to $\mathrm{DP}_{\mathrm{shuffle}}^{O(1)}$ protocols. (We remark that [GGK+20] developed a technique for proving lower bounds on the communication complexity, i.e., the number of bits sent per user, of multi-message protocols. Their techniques do not apply to our setting, as our lower bounds are in terms of the number of messages and do not put any restriction on the message length. Furthermore, their technique only applies to pure-DP, where $\delta=0$, whereas ours also applies to approximate-DP, where $\delta>0$.) These prior works use the connection stating that any $(\varepsilon,\delta)$-$\mathrm{DP}_{\mathrm{shuffle}}^{1}$ protocol is also $(\varepsilon+\ln n,\delta)$-$\mathrm{DP}_{\mathrm{local}}$ [CSU+18, Theorem 6.2]. It thus suffices for them to prove lower bounds for $\mathrm{DP}_{\mathrm{local}}$ protocols with a low privacy requirement (i.e., $(\varepsilon+\ln n,\delta)$-$\mathrm{DP}_{\mathrm{local}}$), for which lower bound techniques are known or can be developed. For $\varepsilon$-$\mathrm{DP}_{\mathrm{shuffle}}^{1}$ protocols, [BC20] showed that they are also $\varepsilon$-$\mathrm{DP}_{\mathrm{local}}$; therefore, lower bounds on $\mathrm{DP}_{\mathrm{local}}$ protocols automatically translate to lower bounds on pure-$\mathrm{DP}_{\mathrm{shuffle}}^{1}$ protocols. To apply this proof framework to $\mathrm{DP}_{\mathrm{shuffle}}^{O(1)}$ protocols, a natural first step would be to connect $\mathrm{DP}_{\mathrm{shuffle}}^{O(1)}$ protocols to $\mathrm{DP}_{\mathrm{local}}$ protocols. However, as observed in [BC20, Section 4.1], there exists an $\varepsilon$-$\mathrm{DP}_{\mathrm{shuffle}}^{O(1)}$ protocol that is not $\mathrm{DP}_{\mathrm{local}}$ for any privacy parameter. That is, there is no analogous connection between $\mathrm{DP}_{\mathrm{local}}$ protocols and multi-message $\mathrm{DP}_{\mathrm{shuffle}}$ protocols, even when the latter send only $O(1)$ messages per user.

In contrast, the next lemma captures the connection between multi-message $\mathrm{DP}_{\mathrm{shuffle}}$ and dominated protocols.

Lemma 1.8.

If $R$ is $(\varepsilon,\delta)$-$\mathrm{DP}_{\mathrm{shuffle}}^{k}$ on $n$ users, then it is $(\varepsilon+k(1+\ln n),\delta)$-dominated.

By considering dominated protocols and using Lemma 1.8, we obtain the first lower bounds for multi-message $\mathrm{DP}_{\mathrm{shuffle}}$ protocols for two well-studied problems: Selection and ParityLearning.

1.2.1 Lower Bounds for Selection

The Selection problem on $n$ users is defined as follows. The $i$th user has an input $x_{i}\in\{0,1\}^{D}$, and the goal is to output an index $j\in[D]$ such that $\sum_{i=1}^{n}x_{i,j}\geq\left(\max_{j^{*}}\sum_{i=1}^{n}x_{i,j^{*}}\right)-n/10$. Selection is well-studied in DP (e.g., [DJW13, SU17, Ull18]), and its variants are useful primitives for several statistical and algorithmic problems including feature selection, hypothesis testing, and clustering. In central DP, the exponential mechanism of [MT07] yields an $\varepsilon$-DP algorithm for Selection when $n=O_{\varepsilon}(\log D)$. On the other hand, it is known that any $(\varepsilon,\delta)$-$\mathrm{DP}_{\mathrm{local}}$ protocol for Selection with $\varepsilon=O(1)$ and $\delta=O(1/n^{1.01})$ requires $n=\Omega(D\log D)$ users [Ull18]. Moreover, [CSU+18] obtained an $(\varepsilon,1/n^{O(1)})$-$\mathrm{DP}_{\mathrm{shuffle}}^{D}$ protocol for $n=\tilde{O}_{\varepsilon}(\sqrt{D})$. By contrast, for $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ protocols, a lower bound of $\Omega(D^{1/17})$ on the number of users was obtained in [CSU+18] and improved to $\Omega(D)$ in [GGK+19].
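To make the task concrete, the following small sketch (an illustration of the definition above, with names of our choosing) generates a Selection instance and checks whether a reported index satisfies the $n/10$ slack.

import numpy as np

def is_valid_selection(X, j):
    # X: 0/1 array of shape (n, D); index j is acceptable if its column sum is
    # within n/10 of the largest column sum.
    n = X.shape[0]
    column_sums = X.sum(axis=0)
    return column_sums[j] >= column_sums.max() - n / 10

# Example instance with n = 100 users and D = 16 coordinates:
# X = np.random.default_rng(0).integers(0, 2, size=(100, 16))
# print(is_valid_selection(X, 0))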

The next theorem gives a lower bound for Selection that holds against approximate-$\mathrm{DP}_{\mathrm{shuffle}}^{k}$ protocols. To the best of our knowledge, this is the first such lower bound even for $k=2$ (and even for the special case of pure protocols, where $\delta=0$).

Theorem 1.9.

For any $\varepsilon=O(1)$, any public-coin $(\varepsilon,o(1/D))$-$\mathrm{DP}_{\mathrm{shuffle}}^{k}$ protocol that solves Selection requires $n\geq\Omega\left(\frac{D}{k}\right)$.

We remark that, by combining the advanced composition theorem for DP with known $\mathrm{DP}_{\mathrm{shuffle}}$ aggregation algorithms, one can obtain an $(\varepsilon,1/\mathrm{poly}(n))$-$\mathrm{DP}_{\mathrm{shuffle}}^{k}$ protocol for Selection with $\tilde{O}(D/\sqrt{k})$ samples for any $k\leq D$ (see Appendix D for details).

1.2.2 Lower Bounds for Parity Learning

In ParityLearning, there is a hidden random vector $s\in\{0,1\}^{D}$, each user gets a random vector $x\in\{0,1\}^{D}$ together with the inner product $\langle s,x\rangle$ over $\mathbb{F}_{2}$, and the goal is to recover $s$. This problem is well-known for separating PAC learning from the Statistical Query (SQ) learning model [Kea98]. In DP, it was studied by [KLN+11], who gave a central DP protocol (also based on the exponential mechanism) computing it for $n=O(D)$, and moreover proved a lower bound of $n=2^{\Omega(D)}$ for any $\mathrm{DP}_{\mathrm{local}}$ protocol, thus obtaining the first exponential separation between the central and local settings.
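As an illustration of the task itself (not of any private protocol), the sketch below generates a ParityLearning instance and recovers $s$ by brute force, which is feasible only for small $D$; the names are ours.

import itertools
import numpy as np

def parity_learning_instance(n, dim, rng=None):
    # Hidden vector s; user i receives (x_i, <s, x_i> mod 2).
    rng = rng or np.random.default_rng(0)
    s = rng.integers(0, 2, size=dim)
    X = rng.integers(0, 2, size=(n, dim))
    y = (X @ s) % 2
    return s, X, y

def recover_parity_brute_force(X, y):
    # Non-private baseline: return a candidate s consistent with all examples.
    dim = X.shape[1]
    for cand in itertools.product([0, 1], repeat=dim):
        cand = np.array(cand)
        if np.array_equal((X @ cand) % 2, y):
            return cand
    return None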

We give a lower bound for ParityLearning that holds against approximate-$\mathrm{DP}_{\mathrm{shuffle}}^{k}$ protocols:

Theorem 1.10.

For any $\varepsilon=O(1)$, if $P$ is a public-coin $(\varepsilon,o(1/n))$-$\mathrm{DP}_{\mathrm{shuffle}}^{k}$ protocol that solves ParityLearning with probability at least $0.99$, then $n\geq\Omega(2^{D/(k+1)})$.

Our lower bounds for ParityLearning can be generalized to the Statistical Query (SQ) learning framework of [Kea98] (see Appendix C for more details).

Independent Work.

In a recent concurrent work, Cheu and Ullman [CU20] proved that robust $\mathrm{DP}_{\mathrm{shuffle}}$ protocols solving Selection and ParityLearning require $\Omega(\sqrt{D})$ and $\Omega(2^{\sqrt{D}})$ samples, respectively. Their results have no restriction on the number of messages sent by each user, but they only hold against the special case of robust protocols. Our results provide stronger lower bounds when the number of messages per user is less than $\sqrt{D}$, and apply to the most general $\mathrm{DP}_{\mathrm{shuffle}}$ model without the robustness restriction.

1.3 Lower Bounds for Two-Party DP Protocols

Finally, we consider another model of distributed DP, called the two-party model [MMP+10] and denoted $\mathrm{DP}_{\mathrm{two\text{-}party}}$. In this model, there are two parties, each holding part of the dataset. The DP guarantee is enforced on the view of each party (i.e., the transcript, its private randomness, and its input). See Section 9 for a formal treatment.

McGregor et al. [MMP+10] studied the $\mathrm{DP}_{\mathrm{two\text{-}party}}$ model and proved an interesting separation of $\Omega_{\varepsilon}(n)$ between the global sensitivity and the error of any $\varepsilon$-DP protocol in this model. However, this lower bound does not extend to the approximate-DP case (where $\delta>0$); in this case, the largest known gap (also proved in [MMP+10]) is only $\tilde{\Omega}_{\varepsilon}(\sqrt{n})$, and it was left as an open question whether this can be improved. (The conference version of [MMP+10] claimed a lower bound of $\Omega_{\varepsilon}(n)$ for the approximate-DP case as well; however, this was later found to be incorrect. See [MMP+11] for more discussion.) We answer this question by showing that a gap of $\tilde{\Omega}_{\varepsilon}(n)$ holds even against approximate-DP protocols:

Theorem 1.11.

For any $\varepsilon=O(1)$ and any sufficiently large $n\in\mathbb{N}$, there is a function $f\colon\{0,1\}^{2n}\to\mathbb{R}$ whose global sensitivity is one and such that no $(\varepsilon,o(1/n))$-$\mathrm{DP}_{\mathrm{two\text{-}party}}$ protocol can compute $f$ to within an error of $o(n/\log n)$.

The above bound is tight up to logarithmic factors in $n$, as it is trivial to achieve an error of $n$.

The proof of Theorem 1.11 is unlike the others in this paper; in fact, we only employ simple reductions starting from the hardness of the inner product function, already shown in [MMP+10]. Specifically, our function is a sum of blocks of inner products modulo 2. While this function is not symmetric, we show (Theorem 9.5) that it can be easily symmetrized.
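To illustrate the shape of such a function (this is our hedged reading of the description above; the precise block length used in Section 9 may differ, with $\Theta(\log n)$ being a natural guess given the $n/\log n$ error bound), split the two parties' inputs $x,y\in\{0,1\}^{n}$ into $k$ consecutive blocks $x^{(1)},\dotsc,x^{(k)}$ and $y^{(1)},\dotsc,y^{(k)}$ of equal length and set

f(x,y)=\sum_{i=1}^{k}\left(\langle x^{(i)},y^{(i)}\rangle\bmod 2\right),

so that flipping any single input bit changes at most one summand, giving global sensitivity one.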

1.4 Discussions and Open Questions

In this work, we study DP in distributed models, including the local and shuffle settings. By building on the moment-matching method and using the newly defined notion of dominated protocols, we give novel lower bounds in both models for three fundamental problems: CountDistinct, Selection, and ParityLearning. While our lower bounds are (nearly) tight for a wide range of parameters, there are still many interesting open questions, three of which we highlight below:

  • $\mathrm{DP}_{\mathrm{shuffle}}$ Lower Bounds for Protocols with an Unbounded Number of Messages. Our connection between $\mathrm{DP}_{\mathrm{shuffle}}$ and dominated protocols becomes weaker as $k\to\infty$ (Lemma 1.8). As a result, it cannot be used to establish lower bounds against $\mathrm{DP}_{\mathrm{shuffle}}$ protocols with a possibly unbounded number of messages. In fact, we are not aware of any separation between central DP and $\mathrm{DP}_{\mathrm{shuffle}}$ without a restriction on the number of messages and without the robustness restriction. This remains a fundamental open question. (In contrast, separations between central DP and $\mathrm{DP}_{\mathrm{local}}$ are well known, even for basic functions such as binary summation [CSS12].)

  • Lower Bounds against Interactive Local/Shuffle Models. Our lower bounds hold in the non-interactive local and shuffle DP models, where all users send their messages simultaneously in a single round. While it seems plausible that our lower bounds can be extended to the sequentially interactive local DP model [DJW13] (where each user speaks once, but not simultaneously), it is unclear how to extend them to the fully interactive local DP model.

    The situation for $\mathrm{DP}_{\mathrm{shuffle}}$, however, is more complicated. Specifically, we are not aware of a formal treatment of an interactive setting for the shuffle model, which would be the first step in providing either upper or lower bounds. We remark that certain definitions could lead to the model being as powerful as the central model (in terms of achievable accuracy, putting aside communication constraints); see, e.g., [IKOS06] on how to perform secure computations under a certain definition of the shuffle model.

  • $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ Lower Bounds for CountDistinct with Larger $\delta$. All but one of our lower bounds hold as long as $\delta=n^{-\omega(1)}$, which is a standard assumption in the DP literature. The only exception is Theorem 1.4, which requires $\delta=2^{-\Omega(\log^{c}n)}$ for some constant $c>0$. It would be interesting to know whether this can be relaxed to $\delta=n^{-\omega(1)}$.

1.5 Organization

We describe in Section 2 the techniques underlying our results. Some basic definitions and notation are given in Section 3. We prove our main lower bounds for CountDistinct (Theorems 1.2 and 1.4) in Section 4. In Section 5, we define dominated protocols and prove Lemma 1.8. Our lower bounds for Selection and ParityLearning are then proved in Section 6. Theorem 1.1 is proved in Section 7. Our $\mathrm{DP}_{\mathrm{shuffle}}$ protocol for CountDistinct is presented and analyzed in Section 8. Our lower bounds in the two-party model (in particular, Theorem 1.11) are proved in Section 9. Some deferred proofs appear in Appendices A and B. The connection to the SQ model is presented in Appendix C. Finally, in Appendix D, we describe the $\mathrm{DP}_{\mathrm{shuffle}}^{k}$ protocol for Selection with sample complexity $\tilde{O}(D/\sqrt{k})$.

2 Overview of Techniques

In this section, we describe the main intuition behind our lower bounds. As alluded to in Section 1, we give two different proofs of the lower bounds for CountDistinct in the $\mathrm{DP}_{\mathrm{local}}$ and $\mathrm{DP}_{\mathrm{shuffle}}$ settings, each with its own advantages:

  • Proof via Moment Matching. Our first proof is technically the hardest in our work. It applies to the much more challenging low-privacy setting (i.e., $(\ln n-O(\ln\ln n),\delta)$-$\mathrm{DP}_{\mathrm{local}}$), and shows an $\Omega(n/\mathrm{polylog}(n))$ lower bound on the additive error (Theorem 1.2). Together with our new improved connection between $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ and $\mathrm{DP}_{\mathrm{local}}$ (Lemma 1.5), it also implies the same lower bound for protocols in the $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ model. The key ideas behind the first proof are discussed in Section 2.1.

  • Proof via Dominated Protocols. Our second proof has the advantage of giving the optimal $\Omega(n)$ lower bound on the additive error (Theorem 1.1), but only in the constant-privacy regime (i.e., $(O(1),\delta)$-$\mathrm{DP}_{\mathrm{local}}$), and it is relatively simple compared to the first proof.

    Moreover, the second proof technique is very general and is a conceptual contribution: it can be applied to show lower bounds for other fundamental problems (namely, Selection and ParityLearning; Theorems 1.9 and 1.10) against multi-message $\mathrm{DP}_{\mathrm{shuffle}}$ protocols. We highlight the intuition behind the second proof in Section 2.2.

While our lower bounds also work for the public-coin $\mathrm{DP}_{\mathrm{shuffle}}$ models, throughout this section we focus on private-coin models in order to simplify the presentation. The full proofs extending to public-coin protocols are given in later sections.

2.1 Lower Bounds for CountDistinct via Moment Matching

To clearly illustrate the key ideas behind the first proof, we will focus on the pure-DP case where each user can only send $O(\log n)$ bits. In Section 4, we generalize the proof to approximate-DP and remove the restriction on communication complexity.

Theorem 2.1 (A Weaker Version of Theorem 1.2).

For $\varepsilon=\ln(n/\log^{7}n)$ and $D=n/\log^{5}n$, no $\varepsilon$-$\mathrm{DP}_{\mathrm{local}}$ protocol where each user sends $O(\log n)$ bits can solve $\textsf{CountDistinct}_{n,D}$ with error $o(D)$.

Throughout our discussion, we use $R\colon[D]\to\mathcal{M}$ to denote a $\ln(n/\log^{7}n)$-$\mathrm{DP}_{\mathrm{local}}$ randomizer. By the communication complexity condition of Theorem 2.1, we have that $|\mathcal{M}|\leq\mathrm{poly}(n)$.

Our proof is inspired by the lower bounds for estimating distinct elements in the property testing model, e.g., [VV17, WY19]. In particular, we use the so-called Poissonization trick. To discuss this trick, we start with some notation. For a vector $\vec{\lambda}\in\mathbb{R}^{D}$, we use $\vec{\mathsf{Poi}}(\vec{\lambda})$ to denote the joint distribution of $D$ independent Poisson random variables:

\vec{\mathsf{Poi}}(\vec{\lambda}):=(\mathsf{Poi}(\vec{\lambda}_{1}),\mathsf{Poi}(\vec{\lambda}_{2}),\dotsc,\mathsf{Poi}(\vec{\lambda}_{D})).

For a distribution $\vec{U}$ on $\mathbb{R}^{D}$, we define the corresponding mixture of multi-dimensional Poisson distributions as follows:

\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(\vec{U})]:=\operatornamewithlimits{\mathbb{E}}_{\vec{\lambda}\leftarrow\vec{U}}\vec{\mathsf{Poi}}(\vec{\lambda}).

For two random variables $X$ and $Y$ supported on $\mathbb{R}^{\mathcal{M}}$, we use $X+Y$ to denote the random variable distributed as the sum of two independent samples from $X$ and $Y$.

Shuffling the Outputs of the Local Protocol. Our first observation is that the analyzer of any local protocol computing CountDistinct achieves the same accuracy if it only sees the histogram of the randomizers' outputs. This holds because seeing only the histogram of the outputs is equivalent to shuffling the outputs by a uniformly random permutation, which is in turn equivalent to shuffling the users in the dataset uniformly at random. Since shuffling the users in a dataset does not affect the number of distinct elements, it follows that seeing only the histogram does not affect the accuracy. Therefore, we only have to consider the histogram of the outputs of the local protocol computing CountDistinct. For a dataset $W$, we use $\mathsf{Hist}_{R}(W)$ to denote the distribution of the histogram of outputs with randomizer $R$.

Poissonization Trick. Given a distribution $\mathcal{D}$ on $\mathcal{M}$, suppose we draw a sample $m\leftarrow\mathsf{Poi}(\lambda)$ and then draw $m$ samples from $\mathcal{D}$. If we let $N$ denote the random variable corresponding to the histogram of these $m$ samples, then the coordinates of $N$ are mutually independent, and $N$ is distributed as $\vec{\mathsf{Poi}}(\lambda\vec{\mu})$, where $\vec{\mu}_{i}=\mathcal{D}_{i}$ for each $i\in\mathcal{M}$.

We can now apply the above trick in the context of local protocols (recall that, by our first observation, we can focus on the histogram of the outputs). Suppose we build a dataset by drawing a sample $m\leftarrow\mathsf{Poi}(\lambda)$ and then adding $m$ users with input $z$. By the above discussion, the corresponding histogram of the outputs with randomizer $R$ is distributed as $\vec{\mathsf{Poi}}(\lambda\cdot R(z))$, where $R(z)$ is treated as an $|\mathcal{M}|$-dimensional vector corresponding to its probability distribution.
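A small simulation of this fact (with a toy randomizer of our choosing, purely for illustration): draw $m\leftarrow\mathsf{Poi}(\lambda)$, give all $m$ users the same input $z$, and the resulting output histogram has independent Poisson coordinates with means $\lambda\cdot R(z)$.

import numpy as np

rng = np.random.default_rng(0)

def poissonized_histogram(Rz, lam):
    # Rz: the distribution R(z) over the message set, as a probability vector.
    m = rng.poisson(lam)                              # Poissonized number of users
    outputs = rng.choice(len(Rz), size=m, p=Rz)       # each user reports a sample of R(z)
    return np.bincount(outputs, minlength=len(Rz))

# With Rz = [0.5, 0.3, 0.2] and lam = 40, the three histogram coordinates are
# independent Poisson variables with means 20, 12, and 8:
# samples = np.array([poissonized_histogram([0.5, 0.3, 0.2], 40) for _ in range(10000)])
# print(samples.mean(axis=0), samples.var(axis=0))    # both close to [20, 12, 8]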

Moment-Matching Random Variables. Our next ingredient is the following construction of two moment-matching random variables, used in [WY19]. Let $L\in\mathbb{N}$ and $\Lambda=\Theta(L^{2})$. There are two random variables $U$ and $V$ supported on $\{0\}\cup[1,\Lambda]$ such that $\operatorname{\mathbb{E}}[U]=\operatorname{\mathbb{E}}[V]=1$ and $\operatorname{\mathbb{E}}[U^{j}]=\operatorname{\mathbb{E}}[V^{j}]$ for every $j\in[L]$. Moreover, $U_{0}-V_{0}>0.9$, where $U_{0}=\Pr[U=0]$ and $V_{0}=\Pr[V=0]$. That is, $U$ and $V$ have the same moments up to degree $L$, while their probabilities of being zero differ significantly. We will set $L=\log n$ and hence $\Lambda=\Theta(\log^{2}n)$.
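The explicit construction of $U$ and $V$ is from [WY19]; as a sanity check (and not their construction), one can search numerically for such a pair on a finite grid by linear programming, maximizing the gap in the probability of being zero subject to the moment-matching constraints. A toy sketch, with scipy assumed as an extra dependency only for this illustration:

import numpy as np
from scipy.optimize import linprog

def moment_matching_gap(L, Lam, grid_size=200):
    # Grid {0} union [1, Lam]; the variables are the pmfs of U and V on the grid.
    G = np.concatenate(([0.0], np.linspace(1.0, Lam, grid_size)))
    m = len(G)
    rows, rhs = [], []
    # Both pmfs sum to 1 and have mean 1; moments of degree 2..L must agree.
    rows.append(np.concatenate((np.ones(m), np.zeros(m)))); rhs.append(1.0)
    rows.append(np.concatenate((np.zeros(m), np.ones(m)))); rhs.append(1.0)
    rows.append(np.concatenate((G, np.zeros(m)))); rhs.append(1.0)
    rows.append(np.concatenate((np.zeros(m), G))); rhs.append(1.0)
    for j in range(2, L + 1):
        rows.append(np.concatenate((G ** j, -(G ** j)))); rhs.append(0.0)
    # Maximize Pr[U = 0] - Pr[V = 0].
    c = np.zeros(2 * m); c[0] = -1.0; c[m] = 1.0
    res = linprog(c, A_eq=np.array(rows), b_eq=np.array(rhs), bounds=[(0, 1)] * (2 * m))
    return -res.fun

# print(moment_matching_gap(L=4, Lam=32))  # gap in the probability of being zero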

Construction of the Hard Distribution via Signal/Noise Decomposition. Recalling that $D=n/\log^{5}n$, we will construct two input distributions for $\textsf{CountDistinct}_{n,D}$. (In fact, in our presentation the number of inputs in each dataset drawn from our hard distributions will not be exactly $n$, but only concentrated around $n$. This issue can be easily resolved by throwing "extra" users in the dataset; we refer the reader to Section 4.2 for the details.) A sample from either distribution consists of two parts: a signal part with $D$ users in expectation, and a noise part with $n-D$ users in expectation.

Formally, for a distribution $W$ over $\mathbb{R}^{\geq 0}$ and a subset $E\subseteq[D]$, the dataset distributions $\mathcal{D}_{\sf signal}^{W}$ and $\mathcal{D}_{\sf noise}^{E}$ are constructed as follows:

  • $\mathcal{D}_{\sf signal}^{W}$: for each $i\in[D]$, we independently draw $\lambda_{i}\leftarrow W$ and $n_{i}\leftarrow\mathsf{Poi}(\lambda_{i})$, and add $n_{i}$ users with input $i$.

  • $\mathcal{D}_{\sf noise}^{E}$: for each $i\in E$, we independently draw $n_{i}\leftarrow\mathsf{Poi}((n-D)/|E|)$, and add $n_{i}$ users with input $i$.

We are going to fix a "good" subset $E$ of $[D]$ such that $|E|\leq 0.02\cdot D$ (we will later specify the other conditions for being "good"). Therefore, when it is clear from the context, we will write $\mathcal{D}_{\sf noise}$ instead of $\mathcal{D}_{\sf noise}^{E}$.

Our two hard distributions are then constructed as $\mathcal{D}^{U}:=\mathcal{D}_{\sf signal}^{U}+\mathcal{D}_{\sf noise}$ and $\mathcal{D}^{V}:=\mathcal{D}_{\sf signal}^{V}+\mathcal{D}_{\sf noise}$. Using the fact that $\operatorname{\mathbb{E}}[U]=\operatorname{\mathbb{E}}[V]=1$, one can verify that there are $D$ users in each of $\mathcal{D}_{\sf signal}^{U}$ and $\mathcal{D}_{\sf signal}^{V}$ in expectation. Similarly, one can verify that there are $n-D$ users in $\mathcal{D}_{\sf noise}$ in expectation. Hence, both $\mathcal{D}^{U}$ and $\mathcal{D}^{V}$ have $n$ users in expectation. In fact, the number of users drawn from either distribution concentrates around $n$.
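A sampling sketch of these distributions (assuming samplers for $U$ and $V$ are given; the names below are ours):

import numpy as np

rng = np.random.default_rng(0)

def sample_dataset(sample_W, D, E, n):
    users = []
    # Signal part: for each i in [D], draw lambda_i ~ W and Poi(lambda_i) users with input i.
    for i in range(D):
        users.extend([i] * rng.poisson(sample_W()))
    # Noise part: for each i in E, draw Poi((n - D) / |E|) users with input i.
    for i in E:
        users.extend([i] * rng.poisson((n - D) / len(E)))
    return users

# D^U and D^V differ only in the signal sampler:
# dataset_U = sample_dataset(sample_U, D, E, n)
# dataset_V = sample_dataset(sample_V, D, E, n)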

We now justify our naming of the signal/noise distributions. First, note that the number of distinct elements in the signal parts $\mathcal{D}_{\sf signal}^{U}$ and $\mathcal{D}_{\sf signal}^{V}$ concentrates around $(1-\operatorname{\mathbb{E}}[e^{-U}])\cdot D$ and $(1-\operatorname{\mathbb{E}}[e^{-V}])\cdot D$, respectively. By our condition that $U_{0}-V_{0}>0.9$, it follows that the signal parts of $\mathcal{D}^{U}$ and $\mathcal{D}^{V}$ separate their numbers of distinct elements by at least $0.4D$. Second, note that although $\mathcal{D}_{\sf noise}$ has $n-D\gg D$ users in expectation, they all come from the subset $E$ of size at most $0.02\cdot D$. Therefore, these users collectively cannot change the number of distinct elements by more than $0.02\cdot D$, and the numbers of distinct elements in $\mathcal{D}^{U}$ and $\mathcal{D}^{V}$ remain separated by $\Omega(D)$.

Decomposition of the Noise Part. To establish the desired lower bound, it now suffices to show that, for the local randomizer $R$, the distributions $\mathsf{Hist}_{R}(\mathcal{D}^{U})$ and $\mathsf{Hist}_{R}(\mathcal{D}^{V})$ are very close in statistical distance. For $W\in\{U,V\}$, we can decompose $\mathsf{Hist}_{R}(\mathcal{D}^{W})$ as

\mathsf{Hist}_{R}(\mathcal{D}^{W})=\sum_{i\in[D]}\vec{\mathsf{Poi}}(W\cdot R(i))+\sum_{i\in E}\vec{\mathsf{Poi}}((n-D)/|E|\cdot R(i)).

By the additive property of Poisson distributions, letting $\vec{\nu}=(n-D)/|E|\cdot\sum_{i\in E}R(i)$, we have that $\sum_{i\in E}\vec{\mathsf{Poi}}((n-D)/|E|\cdot R(i))=\vec{\mathsf{Poi}}(\vec{\nu})$.

Our key idea is to carefully decompose $\vec{\nu}$ into $D+1$ nonnegative vectors $\vec{\nu}^{(0)},\vec{\nu}^{(1)},\dotsc,\vec{\nu}^{(D)}$ such that $\vec{\nu}=\sum_{i=0}^{D}\vec{\nu}^{(i)}$. Then, for $W\in\{U,V\}$, we have

\mathsf{Hist}_{R}(\mathcal{D}^{W})=\vec{\mathsf{Poi}}(\vec{\nu}^{(0)})+\sum_{i\in[D]}\vec{\mathsf{Poi}}(W\cdot R(i)+\vec{\nu}^{(i)}).

To show that $\mathsf{Hist}_{R}(\mathcal{D}^{U})$ and $\mathsf{Hist}_{R}(\mathcal{D}^{V})$ are close, it suffices to show that for each $i\in[D]$, the distributions $\vec{\mathsf{Poi}}(U\cdot R(i)+\vec{\nu}^{(i)})$ and $\vec{\mathsf{Poi}}(V\cdot R(i)+\vec{\nu}^{(i)})$ are close. We show that they are close when every coordinate of $\vec{\nu}^{(i)}$ is sufficiently large compared to $R(i)$.

Lemma 2.2 (Simplification of Lemma 4.3).

For each $i\in[D]$ and every $\vec{\lambda}\in(\mathbb{R}^{\geq 0})^{\mathcal{M}}$, if $\vec{\lambda}_{z}\geq 2\Lambda^{2}\cdot R(i)_{z}$ for every $z\in\mathcal{M}$, then (we use $\|\mathcal{D}_{1}-\mathcal{D}_{2}\|_{TV}$ to denote the total variation, aka statistical, distance between two distributions $\mathcal{D}_{1},\mathcal{D}_{2}$)

\|\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(U\cdot R(i)+\vec{\lambda})]-\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(V\cdot R(i)+\vec{\lambda})]\|_{TV}\leq\frac{1}{n^{2}}.

To apply Lemma 2.2, we simply set $\vec{\nu}^{(i)}=(2\Lambda^{2})\cdot R(i)$ and $\vec{\nu}^{(0)}=\vec{\nu}-\sum_{i\in[D]}\vec{\nu}^{(i)}$. Letting $\vec{\mu}=\sum_{i\in[D]}R(i)$, the requirement that $\vec{\nu}^{(0)}$ be nonnegative translates to $\vec{\nu}_{z}\geq 2\Lambda^{2}\cdot\vec{\mu}_{z}$ for each $z\in\mathcal{M}$.

Construction of a Good Subset $E$. We thus want to pick a subset $E\subseteq[D]$ of size at most $0.02\cdot D$ such that the corresponding $\vec{\nu}^{E}=(n-D)/|E|\cdot\sum_{i\in E}R(i)$ satisfies $\vec{\nu}^{E}_{z}\geq 2\Lambda^{2}\cdot\vec{\mu}_{z}$ for each $z\in\mathcal{M}$. We will show that a simple random construction works with high probability: one can simply add each element of $[D]$ to $E$ independently with probability $0.01$ (a sketch of this sampling procedure appears after the case analysis below).

More specifically, for each $z\in\mathcal{M}$, we will show that with high probability $\vec{\nu}^{E}_{z}\geq 2\Lambda^{2}\cdot\vec{\mu}_{z}$. The correctness of our construction then follows from a union bound (this step crucially uses the fact that $|\mathcal{M}|\leq\mathrm{poly}(n)$).

Now, let us fix a $z\in\mathcal{M}$. Let $m^{*}=\max_{i\in[D]}R(i)_{z}$. Since $R$ is $\ln(n/\log^{7}n)$-DP, it follows that $\vec{\nu}_{z}\geq\frac{n-D}{n/\log^{7}n}\cdot m^{*}\geq\frac{\log^{7}n}{2}\cdot m^{*}$. We consider the following two cases:

  1. If $m^{*}\geq\vec{\mu}_{z}/\log^{2}n$, we immediately get that $\vec{\nu}_{z}\geq\log^{5}n/2\cdot\vec{\mu}_{z}\geq 2\Lambda^{2}\cdot\vec{\mu}_{z}$ (which uses the fact that $\Lambda=\Theta(\log^{2}n)$).

  2. If $m^{*}<\vec{\mu}_{z}/\log^{2}n$, then the mass $\vec{\mu}_{z}$ is distributed over at least $\log^{2}n$ components $R(i)_{z}$. Applying Hoeffding's inequality shows that with high probability over $E$, it is the case that $\vec{\nu}^{E}_{z}\geq\Theta(n/D)\cdot\vec{\mu}_{z}\geq\Lambda^{2}\cdot\vec{\mu}_{z}$ (which uses the fact that $D=n/\log^{5}n$).

See the proof of Lemma 4.5 for a formal argument and for how to remove the assumption that $|\mathcal{M}|\leq\mathrm{poly}(n)$.
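The sampling procedure referenced above, as a sketch (with the randomizer again represented as a matrix of output probabilities, an assumption made only for this illustration): repeatedly draw a random subset and keep it once the coordinate-wise condition holds.

import numpy as np

def sample_good_subset(R, n, Lam, rng=None, trials=100):
    # R: array of shape (D, num_messages); row i is the output distribution R(i).
    rng = rng or np.random.default_rng(0)
    D = R.shape[0]
    mu = R.sum(axis=0)                                # mu_z = sum_i R(i)_z
    for _ in range(trials):
        E = np.flatnonzero(rng.random(D) < 0.01)      # include each i w.p. 0.01
        if len(E) == 0 or len(E) > 0.02 * D:
            continue
        nu = (n - D) / len(E) * R[E].sum(axis=0)
        if np.all(nu >= 2 * Lam ** 2 * mu):           # the "good subset" condition
            return E
    return None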

The Lower Bound. From the above discussions, we get that

\|\mathsf{Hist}_{R}(\mathcal{D}^{U})-\mathsf{Hist}_{R}(\mathcal{D}^{V})\|_{TV}\leq\sum_{i=1}^{D}\|\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(U\cdot R(i)+\vec{\nu}^{(i)})]-\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(V\cdot R(i)+\vec{\nu}^{(i)})]\|_{TV}\leq 1/n.

Hence, the analyzer of the local protocol with randomizer $R$ cannot distinguish $\mathcal{D}^{U}$ from $\mathcal{D}^{V}$, and thus it cannot solve $\textsf{CountDistinct}_{n,D}$ with error $o(D)$ and probability $0.99$. See the proof of Theorem 4.1 for a formal argument and for how to deal with the fact that a dataset drawn from $\mathcal{D}^{U}$ or $\mathcal{D}^{V}$ may not have exactly $n$ users.

Single-Message $\mathrm{DP}_{\mathrm{shuffle}}$ Lower Bound. To apply the above lower bound to $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ protocols, the natural idea is to resort to the connection between the $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ and $\mathrm{DP}_{\mathrm{local}}$ models. In particular, [CSU+18] showed that $(\varepsilon,\delta)$-$\mathrm{DP}_{\mathrm{shuffle}}^{1}$ protocols are also $(\varepsilon+\ln n,\delta)$-$\mathrm{DP}_{\mathrm{local}}$.

It may seem that this $\ln n$ privacy guarantee is very close to the $\ln n-O(\ln\ln n)$ bound in Theorem 1.2. But surprisingly, it turns out (as stated in Theorem 1.3) that there is a $(\ln n+O(1))$-$\mathrm{DP}_{\mathrm{local}}$ protocol solving $\textsf{CountDistinct}_{n,n}$ (hence also $\textsf{CountDistinct}_{n,D}$) with error $O(\sqrt{n})$. Hence, to establish the $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ lower bound (Theorem 1.4), we rely on the following stronger connection between $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ and $\mathrm{DP}_{\mathrm{local}}$ protocols.

Lemma 2.3 (Simplification of Lemma 1.5).

For every $\delta\leq 1/n^{\omega(1)}$, if the randomizer $R$ is $(O(1),\delta)$-$\mathrm{DP}_{\mathrm{shuffle}}^{1}$ on $n$ users, then $R$ is $\left(\ln(n\log^{2}n/\log\delta^{-1}),\,n^{-\omega(1)}\right)$-$\mathrm{DP}_{\mathrm{local}}$.

Setting $\delta=2^{-\log^{k}n}$ for a sufficiently large $k$ and combining Lemma 2.3 with Theorem 1.2 gives the desired lower bound against $\mathrm{DP}_{\mathrm{shuffle}}^{1}$ protocols.

2.2 Lower Bounds for CountDistinct and Selection via Dominated Protocols

We will first describe the proof ideas behind Theorem 1.1, which is restated below. Then, we apply the same proof technique to obtain lower bounds for Selection (the lower bound for ParityLearning is established similarly; see Section 6.3 for details).

Lemma 2.4 (Detailed Version of Theorem 1.1).

For $\varepsilon=o(\ln n)$, no $(\varepsilon,o(1/n))$-dominated protocol can solve CountDistinct with error $o(n/e^{\varepsilon})$.

Hard Distributions for $\textsf{CountDistinct}_{n,n}$. We now construct our hard instances for $\textsf{CountDistinct}_{n,n}$. For simplicity, we assume $n=2^{D}$ for an integer $D$, and identify the input space $[n]$ with $\{0,1\}^{D}$ via a fixed bijection. Let $\mathcal{U}_{D}$ be the uniform distribution over $\{0,1\}^{D}$. For $(\ell,s)\in[2]\times\{0,1\}^{D}$, we let $\mathcal{D}_{\ell,s}$ be the uniform distribution on $\{x\in\{0,1\}^{D}:\langle x,s\rangle=\ell\}$.

We also use $\mathcal{D}_{\ell,s}^{\alpha}$ to denote the mixture of $\mathcal{D}_{\ell,s}$ and $\mathcal{U}_{D}$ that outputs a sample from $\mathcal{D}_{\ell,s}$ with probability $\alpha$ and a sample from $\mathcal{U}_{D}$ with probability $1-\alpha$.

For a parameter $\alpha>0$, we consider the following two dataset distributions on $n$ users:

  • $\mathcal{W}^{\sf uniform}$: each user gets an i.i.d. input from $\mathcal{U}_{D}$. That is, $\mathcal{W}^{\sf uniform}:=\mathcal{U}_{D}^{\otimes n}$.

  • $\mathcal{W}^{\alpha}$: to sample a dataset from $\mathcal{W}^{\alpha}$, we first draw $(\ell,s)$ from $[2]\times\{0,1\}^{D}$ uniformly at random, and then each user gets an i.i.d. input from $\mathcal{D}_{\ell,s}^{\alpha}$. Formally, $\mathcal{W}^{\alpha}:=\operatornamewithlimits{\mathbb{E}}_{(\ell,s)\leftarrow[2]\times\{0,1\}^{D}}(\mathcal{D}_{\ell,s}^{\alpha})^{\otimes n}$.

Since for every $\ell,s$ it holds that $|\mathrm{supp}(\mathcal{D}_{\ell,s}^{1})|\leq n/2$, the number of distinct elements in any dataset from $\mathcal{W}^{1}$ is at most $n/2$. On the other hand, since $\mathcal{U}_{D}$ is the uniform distribution over $n$ elements, a random dataset from $\mathcal{W}^{\sf uniform}=\mathcal{W}^{0}$ has roughly $(1-e^{-1})\cdot n>n/2$ distinct elements with high probability. Hence, the expected number of distinct elements in a dataset from $\mathcal{W}^{\alpha}$ is controlled by the parameter $\alpha$. A simple but tedious calculation shows that it is approximately $(1-e^{-1}\cdot\cosh(\alpha))\cdot n$, which can be approximated by $(1-e^{-1}\cdot(1+\alpha^{2}))\cdot n$ for $n^{-0.1}<\alpha<0.01$ (see Proposition 7.1 for more details). Hence, any protocol solving CountDistinct with error $o(\alpha^{2}n)$ must be able to distinguish between the above two distributions. Our goal is to show that this is impossible for $(\varepsilon,o(1/n))$-dominated protocols.
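A sampling sketch of the two distributions (our own illustration; we skip $s=0$ so that $\mathcal{D}_{1,s}$ is well defined, and we fix the parity of a sample by flipping a single coordinate in the support of $s$, which maps the uniform distribution bijectively onto the target set):

import numpy as np

rng = np.random.default_rng(0)

def sample_W_alpha(n, dim, alpha):
    # alpha = 0 recovers W^uniform; otherwise draw (l, s) and mix in D_{l,s}.
    l = rng.integers(0, 2)
    s = rng.integers(0, 2, size=dim)
    while not s.any():                       # skip s = 0 so D_{1,s} is well defined
        s = rng.integers(0, 2, size=dim)
    j = int(np.flatnonzero(s)[0])            # a coordinate with s_j = 1
    X = rng.integers(0, 2, size=(n, dim))
    for i in range(n):
        # With probability alpha, the user's sample should come from D_{l,s}:
        # flip bit j whenever the parity <x, s> is wrong.
        if rng.random() < alpha and (X[i] @ s) % 2 != l:
            X[i, j] ^= 1
    return X

# With n = 2^dim, the expected number of distinct rows is roughly
# (1 - e^{-1} * cosh(alpha)) * n, so distinguishing alpha = 0 from alpha > 0
# requires solving CountDistinct with error o(alpha^2 * n).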

Bounding KL Divergence for Dominated Protocols. Our next step is to upper-bound the statistical distance $\|\mathsf{Hist}_{R}(\mathcal{W}^{\sf uniform})-\mathsf{Hist}_{R}(\mathcal{W}^{\alpha})\|_{TV}$. As in previous work [Ull18, GGK+19, ENU20], we may upper-bound the KL divergence instead. By the convexity and chain-rule properties of KL divergence, it follows that

\mathrm{KL}(\mathsf{Hist}_{R}(\mathcal{W}^{\alpha})\,||\,\mathsf{Hist}_{R}(\mathcal{W}^{\sf uniform}))\leq\operatornamewithlimits{\mathbb{E}}_{(\ell,s)\leftarrow[2]\times\{0,1\}^{D}}\mathrm{KL}(R(\mathcal{D}_{\ell,s}^{\alpha})^{\otimes n}\,||\,R(\mathcal{U}_{D})^{\otimes n})
=n\cdot\operatornamewithlimits{\mathbb{E}}_{(\ell,s)\leftarrow[2]\times\{0,1\}^{D}}\mathrm{KL}(R(\mathcal{D}_{\ell,s}^{\alpha})\,||\,R(\mathcal{U}_{D})).\qquad(1)

Bounding the Average KL Divergence between a Family and a Single Distribution. We are now ready to introduce our general tool for bounding average KL divergence quantities like (1). We first set up some notation. Let $\mathcal{I}$ be an index set, let $\{\lambda_{v}\}_{v\in\mathcal{I}}$ be a family of distributions on $\mathcal{X}$, let $\pi$ be a distribution on $\mathcal{I}$, and let $\mu$ be a distribution on $\mathcal{X}$. For simplicity, we assume that for every $x\in\mathcal{X}$ and $v\in\mathcal{I}$, it holds that $(\lambda_{v})_{x}\leq 2\cdot\mu_{x}$ (which is true for $\{\mathcal{D}_{\ell,s}^{\alpha}\}_{(\ell,s)\in[2]\times\{0,1\}^{D}}$ and $\mathcal{U}_{D}$).

Theorem 2.5.

Let $W\colon\mathbb{R}\to\mathbb{R}$ be a concave function such that for all functions $\psi\colon\mathcal{X}\to\mathbb{R}^{\geq 0}$ satisfying $\psi(\mu)\leq 1$, it holds that

\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\left[(\psi(\lambda_{v})-\psi(\mu))^{2}\right]\leq W(\|\psi\|_{\infty}).

Then for any $(\varepsilon,\delta)$-dominated randomizer $R$, it follows that

\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}[\mathrm{KL}(R(\lambda_{v})\,||\,R(\mu))]\leq O\left(W(2e^{\varepsilon})+\delta\right).

Bounding (1) via Fourier Analysis. To apply Theorem 2.5, for $f\colon\mathcal{X}\to\mathbb{R}^{\geq 0}$ with $f(\mathcal{U}_{D})=\operatornamewithlimits{\mathbb{E}}_{x\in\{0,1\}^{D}}[f(x)]\leq 1$, we want to bound

\operatornamewithlimits{\mathbb{E}}_{(\ell,s)\leftarrow[2]\times\{0,1\}^{D}}[(f(\mathcal{D}_{\ell,s}^{\alpha})-f(\mathcal{U}_{D}))^{2}]=\operatornamewithlimits{\mathbb{E}}_{s\in\{0,1\}^{D}}\alpha^{2}\cdot\hat{f}(s)^{2}.
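For completeness, here is the short computation behind this identity (writing $\hat{f}(s)=\operatornamewithlimits{\mathbb{E}}_{x}[f(x)\cdot(-1)^{\langle x,s\rangle}]$ and identifying $\ell\in[2]$ with $\{0,1\}$; the single $s=0$ term contributes at most $\alpha^{2}/2^{D}$ and is absorbed into the bound below). Since $f(\mathcal{D}_{\ell,s}^{\alpha})=\alpha\cdot f(\mathcal{D}_{\ell,s})+(1-\alpha)\cdot f(\mathcal{U}_{D})$, and conditioning on $\langle x,s\rangle=\ell$ shifts the mean of $f$ by $(-1)^{\ell}\hat{f}(s)$ for $s\neq 0$, we have

f(\mathcal{D}_{\ell,s}^{\alpha})-f(\mathcal{U}_{D})=\alpha\left(f(\mathcal{D}_{\ell,s})-f(\mathcal{U}_{D})\right)=\alpha\cdot(-1)^{\ell}\,\hat{f}(s),

and squaring and averaging over $(\ell,s)$ yields the right-hand side.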

By Parseval's Identity (see Lemma 3.8),

\sum_{s\in\{0,1\}^{D}}\hat{f}(s)^{2}=\operatornamewithlimits{\mathbb{E}}_{x\in\{0,1\}^{D}}f(x)^{2}\leq f(\mathcal{U}_{D})\cdot\|f\|_{\infty}\leq\|f\|_{\infty}.

Therefore, we can set $W(L):=\alpha^{2}\cdot\frac{L}{2^{D}}$ and apply Theorem 2.5 to obtain

\operatornamewithlimits{\mathbb{E}}_{(\ell,s)\leftarrow[2]\times\{0,1\}^{D}}\mathrm{KL}(R(\mathcal{D}_{\ell,s}^{\alpha})\,||\,R(\mathcal{U}_{D}))\leq O(\alpha^{2}\cdot e^{\varepsilon}/n+\delta).

We set $\alpha$ such that $\alpha^{2}=c/e^{\varepsilon}$ for a sufficiently small constant $c$, and note that $\delta=o(1/n)$. It follows that $\mathrm{KL}(\mathsf{Hist}_{R}(\mathcal{W}^{\alpha})\,||\,\mathsf{Hist}_{R}(\mathcal{W}^{\sf uniform}))\leq 0.01$, and therefore $\|\mathsf{Hist}_{R}(\mathcal{W}^{\alpha})-\mathsf{Hist}_{R}(\mathcal{W}^{\sf uniform})\|_{TV}\leq 0.1$ by Pinsker's inequality. Hence, we conclude that $(\varepsilon,o(1/n))$-dominated protocols cannot solve $\textsf{CountDistinct}_{n,n}$ with error $o(n/e^{\varepsilon})$, completing the proof of Lemma 2.4. Theorem 1.1 now follows from Lemma 2.4 and the fact that $(\varepsilon,\delta)$-$\mathrm{DP}_{\mathrm{local}}$ protocols are also $(\varepsilon,\delta)$-dominated.

Lower Bounds for Selection against Multi-Message $\mathrm{DP}_{\mathrm{shuffle}}$ Protocols. We now show how to apply Theorem 2.5 and Lemma 2.3 to prove lower bounds for Selection. For $(\ell,j)\in[2]\times[D]$, let $\mathcal{D}_{\ell,j}$ be the uniform distribution on all length-$D$ binary strings whose $j$th bit equals $\ell$. Recall that $\mathcal{U}_{D}$ is the uniform distribution on $\{0,1\}^{D}$. Again, we aim to upper-bound the average-case KL divergence $\operatornamewithlimits{\mathbb{E}}_{(\ell,j)\leftarrow[2]\times[D]}\mathrm{KL}(R(\mathcal{D}_{\ell,j})\,||\,R(\mathcal{U}_{D}))$.

To apply Theorem 2.5, for $f\colon\mathcal{X}\to\mathbb{R}^{\geq 0}$ with $f(\mathcal{U}_{D})=\operatornamewithlimits{\mathbb{E}}_{x\in\{0,1\}^{D}}[f(x)]\leq 1$, we want to bound

\operatornamewithlimits{\mathbb{E}}_{(\ell,j)\leftarrow[2]\times[D]}[(f(\mathcal{D}_{\ell,j})-f(\mathcal{U}_{D}))^{2}]=\operatornamewithlimits{\mathbb{E}}_{j\in[D]}\hat{f}(\{j\})^{2}.

By the Level-1 Inequality (see Lemma 3.7),

\sum_{j\in[D]}\hat{f}(\{j\})^{2}\leq O(\log\|f\|_{\infty}).

Therefore, we can set $W(L):=c_{1}\cdot\frac{\log L}{D}$ for an appropriate constant $c_{1}$ and apply Theorem 2.5 to obtain

\operatornamewithlimits{\mathbb{E}}_{(\ell,j)\leftarrow[2]\times[D]}\mathrm{KL}(R(\mathcal{D}_{\ell,j})\,||\,R(\mathcal{U}_{D}))\leq O\left(\frac{\varepsilon}{D}+\delta\right).

Combining this with Lemma 2.3 completes the proof (see the proofs of Lemma 6.3 and Theorem 1.9 for the details).

3 Preliminaries

3.1 Notation

For a function $f\colon\mathcal{X}\to\mathbb{R}$, a distribution $\mathcal{D}$ on $\mathcal{X}$, and an element $z\in\mathcal{X}$, we use $f(\mathcal{D})$ to denote $\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{D}}[f(x)]$ and $\mathcal{D}_{z}$ to denote $\Pr_{x\leftarrow\mathcal{D}}[x=z]$. For a subset $E\subseteq\mathcal{X}$, we use $\mathcal{D}_{E}$ to denote $\sum_{z\in E}\mathcal{D}_{z}=\Pr_{x\leftarrow\mathcal{D}}[x\in E]$. We also use $\mathcal{U}_{D}$ to denote the uniform distribution over $\{0,1\}^{D}$.

For two distributions $\mathcal{D}_{1}$ and $\mathcal{D}_{2}$ on sets $\mathcal{X}$ and $\mathcal{Y}$ respectively, we use $\mathcal{D}_{1}\otimes\mathcal{D}_{2}$ to denote their product distribution over $\mathcal{X}\times\mathcal{Y}$. For two random variables $X$ and $Y$ supported on $\mathbb{R}^{D}$ for $D\in\mathbb{N}$, we use $X+Y$ to denote the random variable distributed as the sum of two independent samples from $X$ and $Y$. For any set $\mathcal{S}$, we denote by $\mathcal{S}^{*}$ the set consisting of all finite sequences over $\mathcal{S}$, i.e., $\mathcal{S}^{*}=\cup_{n\geq 0}\mathcal{S}^{n}$. For $x\in\mathbb{R}$, let $[x]_{+}$ denote $\max(x,0)$. For a predicate $P$, we use $\mathbb{1}[P]$ to denote the corresponding Boolean value of $P$; that is, $\mathbb{1}[P]=1$ if $P$ is true, and $0$ otherwise.

For a distribution 𝒟\displaystyle\mathcal{D} on a finite set 𝒳\displaystyle\mathcal{X} and an event 𝒳\displaystyle\mathcal{E}\subseteq\mathcal{X} such that Prz𝒟[z]>0\displaystyle\Pr_{z\leftarrow\mathcal{D}}[z\in\mathcal{E}]>0, we use 𝒟|\displaystyle\mathcal{D}|\mathcal{E} to denote the conditional distribution such that

(𝒟|)z={𝒟zPrz𝒟[z]if z,0otherwise.\displaystyle(\mathcal{D}|\mathcal{E})_{z}=\begin{cases}\frac{\mathcal{D}_{z}}{\Pr_{z\leftarrow\mathcal{D}}[z\in\mathcal{E}]}\quad&\text{if $\displaystyle z\in\mathcal{E}$,}\\ 0\quad&\text{otherwise.}\end{cases}

Slightly overloading the notation, we also use α𝒟1+(1α)𝒟2\displaystyle\alpha\cdot\mathcal{D}_{1}+(1-\alpha)\cdot\mathcal{D}_{2} to denote the mixture of distributions 𝒟1\displaystyle\mathcal{D}_{1} and 𝒟2\displaystyle\mathcal{D}_{2} with mixing weights α\displaystyle\alpha and (1α)\displaystyle(1-\alpha) respectively. Whether +\displaystyle+ means mixture or convolution will be clear from the context unless explicitly stated.

3.2 Differential Privacy

We now recall the basics of differential privacy that we will need. Fix a finite set 𝒳\displaystyle\mathcal{X}, the space of user reports. A dataset X\displaystyle X is an element of 𝒳\displaystyle\mathcal{X}^{*}, namely a tuple consisting of elements of 𝒳\displaystyle\mathcal{X}. Let hist(X)|𝒳|\displaystyle\mathrm{hist}(X)\in\mathbb{N}^{|\mathcal{X}|} be the histogram of X\displaystyle X: for any x𝒳\displaystyle x\in\mathcal{X}, the x\displaystyle xth component of hist(X)\displaystyle\mathrm{hist}(X) is the number of occurrences of x\displaystyle x in the dataset X\displaystyle X. We will consider datasets X,X\displaystyle X,X^{\prime} to be equivalent if they have the same histogram (i.e., the ordering of the elements x1,,xn\displaystyle x_{1},\ldots,x_{n} does not matter). For a multiset 𝒮\displaystyle\mathcal{S} whose elements are in 𝒳\displaystyle\mathcal{X}, we will also write hist(𝒮)\displaystyle\mathrm{hist}(\mathcal{S}) to denote the histogram of 𝒮\displaystyle\mathcal{S} (so that the x\displaystyle xth component is the number of copies of x\displaystyle x in 𝒮\displaystyle\mathcal{S}).

Let n\displaystyle n\in\mathbb{N}, and consider a dataset X=(x1,,xn)𝒳n\displaystyle X=(x_{1},\ldots,x_{n})\in\mathcal{X}^{n}. For an element x𝒳\displaystyle x\in\mathcal{X}, let fX(x)=hist(X)xn\displaystyle f_{X}(x)=\frac{\mathrm{hist}(X)_{x}}{n} be the frequency of x\displaystyle x in X\displaystyle X, namely the fraction of elements of X\displaystyle X that are equal to x\displaystyle x. Two datasets X,X\displaystyle X,X^{\prime} are said to be neighboring if they differ in a single element, meaning that we can write (up to equivalence) X=(x1,x2,,xn)\displaystyle X=(x_{1},x_{2},\ldots,x_{n}) and X=(x1,x2,,xn)\displaystyle X^{\prime}=(x_{1}^{\prime},x_{2},\ldots,x_{n}). In this case, we write XX\displaystyle X\sim X^{\prime}. Let 𝒵\displaystyle\mathcal{Z} be a set; we now define the differential privacy of a randomized function P:𝒳n𝒵\displaystyle P\colon\mathcal{X}^{n}\rightarrow\mathcal{Z} as follows.
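For concreteness, the following Python sketch (illustrative only, and not part of the formal development; all names in it are ours) computes the histogram and frequencies of a small dataset and checks whether two datasets are neighboring up to reordering:

from collections import Counter

def hist(X, universe):
    # Histogram of a dataset X over a finite universe: the x-th entry counts occurrences of x.
    c = Counter(X)
    return {x: c.get(x, 0) for x in universe}

def freq(X, x):
    # Frequency f_X(x) = hist(X)_x / n.
    return X.count(x) / len(X)

def are_neighboring(X, Xp):
    # Two datasets of the same size are neighboring if, up to reordering, they differ in one element.
    if len(X) != len(Xp):
        return False
    return sum((Counter(X) - Counter(Xp)).values()) == 1

X, Xp = [3, 1, 2, 1], [5, 1, 2, 1]        # differ only in the first user's element
print(hist(X, universe=range(1, 6)))      # {1: 2, 2: 1, 3: 1, 4: 0, 5: 0}
print(freq(X, 1))                         # 0.5
print(are_neighboring(X, Xp))             # True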

Definition 3.1 (Differential privacy (DP) [DMNS06, DKM+06]).

A randomized algorithm P:𝒳n𝒵\displaystyle P\colon\mathcal{X}^{n}\rightarrow\mathcal{Z} is (ε,δ)\displaystyle(\varepsilon,\delta)-DP if for every pair of neighboring datasets XX\displaystyle X\sim X^{\prime} and for every set 𝒮𝒵\displaystyle\mathcal{S}\subseteq\mathcal{Z}, we have

Pr[P(X)𝒮]eεPr[P(X)𝒮]+δ,\displaystyle\Pr[P(X)\in\mathcal{S}]\leq e^{\varepsilon}\cdot\Pr[P(X^{\prime})\in\mathcal{S}]+\delta,

where the probabilities are taken over the randomness in P\displaystyle P. Here, ε0\displaystyle\varepsilon\geq 0 and δ[0,1]\displaystyle\delta\in[0,1].

If δ=0\displaystyle\delta=0, then we use ε\displaystyle\varepsilon-DP for brevity and informally refer to it as pure-DP; if δ>0\displaystyle\delta>0, we refer to it as approximate-DP. We will use the following post-processing property of DP.

Lemma 3.2 (Post-processing, e.g., [DR14]).

If P\displaystyle P is (ε,δ)\displaystyle(\varepsilon,\delta)-DP, then for every randomized function A\displaystyle A, the composed function AP\displaystyle A\circ P is (ε,δ)\displaystyle(\varepsilon,\delta)-DP.

DP is nicely characterized by the following divergence between distributions, which will be used throughout the paper.

Definition 3.3 (Hockey Stick Divergence).

For any ε>0\displaystyle\varepsilon>0, the eε\displaystyle e^{\varepsilon}-hockey stick divergence between distributions 𝒟\displaystyle\mathcal{D} and 𝒟\displaystyle\mathcal{D}^{\prime} is defined as dε(𝒟||𝒟):=xsupp(𝒟)[𝒟xeε𝒟x]+\displaystyle d_{\varepsilon}(\mathcal{D}||\mathcal{D}^{\prime}):=\sum_{x\in\mathrm{supp}(\mathcal{D})}[\mathcal{D}_{x}-e^{\varepsilon}\cdot\mathcal{D}^{\prime}_{x}]_{+}.
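For concreteness, the hockey stick divergence between two explicit distributions on a finite support can be evaluated directly; the Python sketch below is purely illustrative (the randomizer used is the standard binary randomized response, chosen only as an example):

import math

def hockey_stick(D, Dp, eps):
    # d_eps(D || Dp) = sum over the support of D of [D_x - e^eps * Dp_x]_+ .
    return sum(max(p - math.exp(eps) * Dp.get(x, 0.0), 0.0) for x, p in D.items())

eps = 1.0
q = 1.0 / (math.exp(eps) + 1.0)          # flip probability of binary randomized response
R0 = {0: 1 - q, 1: q}                    # output distribution on input 0
R1 = {0: q, 1: 1 - q}                    # output distribution on input 1
print(hockey_stick(R0, R1, eps))         # ~0 (up to float error): (eps, 0)-DP holds
print(hockey_stick(R0, R1, 0.5))         # > 0: the pair is not (0.5, 0)-DP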

We next list two useful facts about the hockey stick divergence between distributions.

Proposition 3.4.

Let 𝒟\displaystyle\mathcal{D} and 𝒟\displaystyle\mathcal{D}^{\prime} be any distributions. Then, the following hold:

  1. 1.

    Let 𝒟com\displaystyle\mathcal{D}_{com} be another distribution. Then, for any function f\displaystyle f, it holds that

    dε(f(𝒟𝒟com)||f(𝒟𝒟com))dε(𝒟||𝒟).\displaystyle d_{\varepsilon}(f(\mathcal{D}\otimes\mathcal{D}_{com})||f(\mathcal{D}^{\prime}\otimes\mathcal{D}_{com}))\leq d_{\varepsilon}(\mathcal{D}||\mathcal{D}^{\prime}).
  2. 2.

    Suppose we can decompose 𝒟=iαi𝒟i\displaystyle\mathcal{D}=\sum_{i\in\mathcal{I}}\alpha_{i}\mathcal{D}_{i} and 𝒟=iβi𝒟i\displaystyle\mathcal{D}^{\prime}=\sum_{i\in\mathcal{I}}\beta_{i}\mathcal{D}^{\prime}_{i}, where αi\alpha_{i}’s and βi\beta_{i}’s are tuples of positive reals summing up to 1\displaystyle 1 and 𝒟i\displaystyle\mathcal{D}_{i}’s and 𝒟i\displaystyle\mathcal{D}^{\prime}_{i}’s are distributions, then

    dε(𝒟||𝒟)iαidε+ln(βi/αi)(𝒟i||𝒟i).\displaystyle d_{\varepsilon}(\mathcal{D}||\mathcal{D}^{\prime})\leq\sum_{i\in\mathcal{I}}\alpha_{i}\cdot d_{\varepsilon+\ln(\beta_{i}/\alpha_{i})}(\mathcal{D}_{i}||\mathcal{D}^{\prime}_{i}).
Proof.

Item (1) follows from the post-processing property of DP, together with the definition of the hockey stick divergence.

To prove Item (2), we note that

dε(𝒟||𝒟)\displaystyle\displaystyle d_{\varepsilon}(\mathcal{D}||\mathcal{D}^{\prime}) =xsupp(𝒟)[𝒟xeε𝒟x]+\displaystyle\displaystyle=\sum_{x\in\mathrm{supp}(\mathcal{D})}[\mathcal{D}_{x}-e^{\varepsilon}\cdot\mathcal{D}^{\prime}_{x}]_{+}
=xsupp(𝒟)[iαi(𝒟i)xeε(iβi(𝒟i)x)]+\displaystyle\displaystyle=\sum_{x\in\mathrm{supp}(\mathcal{D})}\left[\sum_{i\in\mathcal{I}}\alpha_{i}(\mathcal{D}_{i})_{x}-e^{\varepsilon}\cdot\left(\sum_{i\in\mathcal{I}}\beta_{i}(\mathcal{D}^{\prime}_{i})_{x}\right)\right]_{+}
ixsupp(𝒟i)[αi(𝒟i)xeεβi(𝒟i)x]+\displaystyle\displaystyle\leq\sum_{i\in\mathcal{I}}\sum_{x\in\mathrm{supp}(\mathcal{D}_{i})}\left[\alpha_{i}(\mathcal{D}_{i})_{x}-e^{\varepsilon}\cdot\beta_{i}(\mathcal{D}^{\prime}_{i})_{x}\right]_{+}
iαixsupp(𝒟i)[(𝒟i)xeεβi/αi(𝒟i)x]+\displaystyle\displaystyle\leq\sum_{i\in\mathcal{I}}\alpha_{i}\cdot\sum_{x\in\mathrm{supp}(\mathcal{D}_{i})}\left[(\mathcal{D}_{i})_{x}-e^{\varepsilon}\cdot\beta_{i}/\alpha_{i}(\mathcal{D}^{\prime}_{i})_{x}\right]_{+}
iαidε+ln(βi/αi)(𝒟i||𝒟i).\displaystyle\displaystyle\leq\sum_{i\in\mathcal{I}}\alpha_{i}\cdot d_{\varepsilon+\ln(\beta_{i}/\alpha_{i})}(\mathcal{D}_{i}||\mathcal{D}^{\prime}_{i}).\qed

3.3 Shuffle Model

We briefly review the shuffle model of DP [BEM+17, EFM+19, CSU+18]. The input to the model is a dataset (x1,,xn)𝒳n\displaystyle(x_{1},\ldots,x_{n})\in\mathcal{X}^{n}, where item xi𝒳\displaystyle x_{i}\in\mathcal{X} is held by user i\displaystyle i. A protocol P:𝒳𝒵\displaystyle P\colon\mathcal{X}\rightarrow\mathcal{Z} in the shuffle model consists of three algorithms:

  • The local randomizer R:𝒳\displaystyle R\colon\mathcal{X}\rightarrow\mathcal{M}^{*} takes as input the data of one user, xi𝒳\displaystyle x_{i}\in\mathcal{X}, and outputs a sequence (yi,1,,yi,mi)\displaystyle(y_{i,1},\ldots,y_{i,m_{i}}) of messages; here mi\displaystyle m_{i} is a positive integer.

    To ease discussions in the paper, we will further assume that the randomizer R\displaystyle R pre-shuffles its messages. That is, it applies a random permutation π:[mi][mi]\displaystyle\pi\colon[m_{i}]\to[m_{i}] to the sequence (yi,1,,yi,mi)\displaystyle(y_{i,1},\ldots,y_{i,m_{i}}) before outputting it. (In particular, for every x𝒳\displaystyle x\in\mathcal{X} and any two tuples z1,z2\displaystyle z_{1},z_{2}\in\mathcal{M}^{*} that are equivalent up to a permutation, R(x)\displaystyle R(x) outputs them with the same probability.)

  • The shuffler S:\displaystyle S\colon\mathcal{M}^{*}\rightarrow\mathcal{M}^{*} takes as input a sequence of elements of \displaystyle\mathcal{M}, say (y1,,ym)\displaystyle(y_{1},\ldots,y_{m}), and outputs a random permutation, i.e., the sequence (yπ(1),,yπ(m))\displaystyle(y_{\pi(1)},\ldots,y_{\pi(m)}), where πSm\displaystyle\pi\in S_{m} is a uniformly random permutation on [m]\displaystyle[m]. The input to the shuffler will be the concatenation of the outputs of the local randomizers.

  • The analyzer A:𝒵\displaystyle A\colon\mathcal{M}^{*}\rightarrow\mathcal{Z} takes as input a sequence of elements of \displaystyle\mathcal{M} (which will be taken to be the output of the shuffler) and outputs an answer in 𝒵\displaystyle\mathcal{Z} that is taken to be the output of the protocol P\displaystyle P.

We will write P=(R,S,A)\displaystyle P=(R,S,A) to denote the protocol whose components are given by R\displaystyle R, S\displaystyle S, and A\displaystyle A. The main distinction between the shuffle and local model is the introduction of the shuffler S\displaystyle S between the local randomizer and the analyzer. As in the local model, the analyzer is untrusted in the shuffle model; hence privacy must be guaranteed with respect to the input to the analyzer, i.e., the output of the shuffler. Formally, we have:

Definition 3.5 (DP in the Shuffle Model, [EFM+19, CSU+18]).

A protocol P=(R,S,A)\displaystyle P=(R,S,A) is (ε,δ)\displaystyle(\varepsilon,\delta)-DP if, for any dataset X=(x1,,xn)\displaystyle X=(x_{1},\ldots,x_{n}), the algorithm

(x1,,xn)S(R(x1),,R(xn))\displaystyle(x_{1},\ldots,x_{n})\mapsto S(R(x_{1}),\ldots,R(x_{n}))

is (ε,δ)\displaystyle(\varepsilon,\delta)-DP.

Notice that the output of S(R(x1),,R(xn))\displaystyle S(R(x_{1}),\ldots,R(x_{n})) can be simulated by an algorithm that takes as input the multiset consisting of the union of the elements of R(x1),,R(xn)\displaystyle R(x_{1}),\ldots,R(x_{n}) (which we denote as iR(xi)\displaystyle\bigcup_{i}R(x_{i}), with a slight abuse of notation) and outputs a uniformly random permutation of them. Thus, by Lemma 3.2, it can be assumed without loss of generality for privacy analyses that the shuffler simply outputs the multiset iR(xi)\displaystyle\bigcup_{i}R(x_{i}). For the purpose of analyzing the accuracy of the protocol P=(R,S,A)\displaystyle P=(R,S,A), we define its output on the dataset X=(x1,,xn)\displaystyle X=(x_{1},\ldots,x_{n}) to be P(X):=A(S(R(x1),,R(xn)))\displaystyle P(X):=A(S(R(x_{1}),\ldots,R(x_{n}))). We also remark that the case of local DP, formalized in Definition 3.6, is a special case of the shuffle model where the shuffler S\displaystyle S is replaced by the identity function:
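The following Python sketch (a toy protocol with names of our choosing, intended only to make the three components concrete) simulates one run of a shuffle protocol: each user applies the local randomizer, the messages are concatenated and uniformly permuted, and the analyzer sees only the shuffled messages:

import random

def toy_randomizer(x):
    # A toy local randomizer: report the true item and, with probability 1/2, one random item.
    msgs = [x]
    if random.random() < 0.5:
        msgs.append(random.randrange(10))
    random.shuffle(msgs)                 # the randomizer pre-shuffles its own messages
    return msgs

def shuffler(message_lists):
    # Concatenate all users' messages and apply a uniformly random permutation.
    all_msgs = [m for msgs in message_lists for m in msgs]
    random.shuffle(all_msgs)
    return all_msgs

def run_shuffle_protocol(dataset, randomizer, analyzer):
    # P(X) = A(S(R(x_1), ..., R(x_n))).
    return analyzer(shuffler([randomizer(x) for x in dataset]))

dataset = [1, 4, 4, 7, 2]
# Example analyzer: count the number of distinct messages received.
print(run_shuffle_protocol(dataset, toy_randomizer, lambda msgs: len(set(msgs))))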

Definition 3.6 (Local DP [KLN+11]).

A protocol P=(R,A)\displaystyle P=(R,A) is (ε,δ)\displaystyle(\varepsilon,\delta)-DP in the local model (or (ε,δ)\displaystyle(\varepsilon,\delta)-locally DP) if the function xR(x)\displaystyle x\mapsto R(x) is (ε,δ)\displaystyle(\varepsilon,\delta)-DP.

We say that the output of the protocol P\displaystyle P on an input dataset X=(x1,,xn)\displaystyle X=(x_{1},\ldots,x_{n}) is P(X):=A(R(x1),,R(xn))\displaystyle P(X):=A(R(x_{1}),\ldots,R(x_{n})).

We denote DP in the shuffle model by DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}}, and the special case where each user can send at most k\displaystyle k messages by DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k}. (We may assume w.l.o.g. that each user sends exactly k\displaystyle k messages; otherwise, we may define a new symbol \displaystyle\perp and have each user send \displaystyle\perp messages so that the number of messages becomes exactly k\displaystyle k.) We denote DP in the local model by DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}.

Public-Coin DP.

The default setting for local and shuffle models is private-coin, i.e., there is no randomness shared between the randomizers and the analyzer. We will also study the public-coin variants of the local and shuffle models. In the public-coin setting, each local randomizer also takes a public random string α{0,1}\displaystyle\alpha\leftarrow\{0,1\}^{*} as input. The analyzer is also given the public random string α\displaystyle\alpha. We use Rα(x)\displaystyle R_{\alpha}(x) to denote the local randomizer with public random string being fixed to α\displaystyle\alpha. At the start of the protocol, all users jointly sample a public random string from a publicly known distribution 𝒟𝗉𝗎𝖻\displaystyle\mathcal{D}_{\sf pub}.

Now, we say that a protocol P=(R,A)\displaystyle P=(R,A) is (ε,δ)\displaystyle(\varepsilon,\delta)-DP in the public-coin local model, if the function

xα𝒟𝗉𝗎𝖻(α,Rα(x))\displaystyle x\underset{\alpha\leftarrow\mathcal{D}_{\sf pub}}{\mapsto}(\alpha,R_{\alpha}(x))

is (ε,δ)\displaystyle(\varepsilon,\delta)-DP.

Similarly, we say that a protocol P=(R,S,A)\displaystyle P=(R,S,A) is (ε,δ)\displaystyle(\varepsilon,\delta)-DP in the public-coin shuffle model, if for any dataset X=(x1,,xn)\displaystyle X=(x_{1},\ldots,x_{n}), the algorithm

(x1,,xn)α𝒟𝗉𝗎𝖻(α,S(Rα(x1),,Rα(xn)))\displaystyle(x_{1},\ldots,x_{n})\underset{\alpha\leftarrow\mathcal{D}_{\sf pub}}{\mapsto}(\alpha,S(R_{\alpha}(x_{1}),\ldots,R_{\alpha}(x_{n})))

is (ε,δ)\displaystyle(\varepsilon,\delta)-DP.

3.4 Useful Divergences

We will make use of two important divergences between distributions, the KL-divergence and the χ2\displaystyle\chi^{2}-divergence, defined as

KL(P||Q)=𝔼zPlog(PzQz)andχ2(P||Q)=𝔼zQ[(PzQzQz)2].\displaystyle\mathrm{KL}(P||Q)=\operatornamewithlimits{\mathbb{E}}_{z\leftarrow P}\log\left(\frac{P_{z}}{Q_{z}}\right)\quad\text{and}\quad\chi^{2}(P||Q)=\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}\left[\left(\frac{P_{z}-Q_{z}}{Q_{z}}\right)^{2}\right].

We rely on the key fact that χ2\displaystyle\chi^{2}-divergence upper-bounds KL-divergence [GS02], that is,

KL(P||Q)χ2(P||Q).\displaystyle\mathrm{KL}(P||Q)\leq\chi^{2}(P||Q).

We will also use Pinsker’s inequality, whereby the squared total variation distance lower-bounds the KL-divergence (up to a constant factor):

KL(P||Q)2ln2PQTV2.\displaystyle\mathrm{KL}(P||Q)\geq\frac{2}{\ln 2}\|P-Q\|_{TV}^{2}.
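These quantities are easy to compute on a finite support; the Python sketch below (with illustrative example distributions of our choosing) checks both inequalities numerically. Logarithms are natural here, in which case Pinsker's inequality reads KL(P||Q) >= 2||P-Q||_TV^2:

import numpy as np

def kl(P, Q):
    # KL(P || Q) = E_{z <- P} ln(P_z / Q_z)   (natural logarithm).
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

def chi2(P, Q):
    # chi^2(P || Q) = sum_z (P_z - Q_z)^2 / Q_z.
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    return float(np.sum((P - Q) ** 2 / Q))

def tv(P, Q):
    # Total variation distance: (1/2) sum_z |P_z - Q_z|.
    return 0.5 * float(np.sum(np.abs(np.asarray(P, float) - np.asarray(Q, float))))

P, Q = [0.5, 0.3, 0.2], [0.4, 0.4, 0.2]
print(kl(P, Q), chi2(P, Q), tv(P, Q))
assert kl(P, Q) <= chi2(P, Q)            # chi^2 upper-bounds KL
assert kl(P, Q) >= 2 * tv(P, Q) ** 2     # Pinsker's inequality (natural logs)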

3.5 Fourier Analysis

We now review some basic Fourier analysis and then introduce two inequalities that will be heavily used in our proofs. For a function f:{0,1}D\displaystyle f\colon\{0,1\}^{D}\to\mathbb{R}, its Fourier transform is given by the function f^(S):=𝔼x𝒰D[f(x)(1)iSxi]\displaystyle\hat{f}(S):=\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{U}_{D}}[f(x)\cdot(-1)^{\sum_{i\in S}x_{i}}]. We also define f22=𝔼x𝒰D[f(x)2]\displaystyle\|f\|_{2}^{2}=\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{U}_{D}}[f(x)^{2}]. For k\displaystyle k\in\mathbb{N}, we define the level-k\displaystyle k Fourier weight as 𝐖k[f]:=S[D],|S|=kf^(S)2\displaystyle\mathbf{W}^{k}[f]:=\sum_{S\subseteq[D],|S|=k}\hat{f}(S)^{2}. For convenience, for s{0,1}D\displaystyle s\in\{0,1\}^{D}, we will also write f^(s)\displaystyle\hat{f}(s) to denote f^(χs)\displaystyle\hat{f}(\chi_{s}), where χs\displaystyle\chi_{s} is the set {i:i[D]si=1}\displaystyle\{i:i\in[D]\wedge s_{i}=1\}. One key technical lemma is the Level-1 Inequality from [O’D14], which was also used in [GGK+19].

Lemma 3.7 (Level-1 Inequality).

Suppose f:{0,1}D0\displaystyle f\colon\{0,1\}^{D}\to\mathbb{R}_{\geq 0} is a non-negative-valued function with f(x)[0,L]\displaystyle f(x)\in[0,L] for all x{0,1}D\displaystyle x\in\{0,1\}^{D}, and 𝔼x𝒰D[f(x)]1\displaystyle\operatornamewithlimits{\mathbb{E}}_{x\sim\mathcal{U}_{D}}[f(x)]\leq 1. Then, 𝐖1[f]6ln(L+1)\displaystyle\mathbf{W}^{1}[f]\leq 6\ln(L+1).

We also need the standard Parseval’s identity.

Lemma 3.8 (Parseval’s Identity).

For all functions f:{0,1}D\displaystyle f\colon\{0,1\}^{D}\to\mathbb{R},

f22=S[D]f^(S)2.\displaystyle\|f\|_{2}^{2}=\sum_{S\subseteq[D]}\hat{f}(S)^{2}.
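As a sanity check of the notation, the Python sketch below (illustrative only; the function f is an arbitrary example satisfying the hypotheses of Lemma 3.7) computes all Fourier coefficients of a function on {0,1}^D by brute force, verifies Parseval's identity, and evaluates the level-1 weight against the bound 6 ln(L+1):

import itertools
import numpy as np

D = 4

def fourier_coeff(f, S):
    # hat{f}(S) = E_{x <- U_D}[ f(x) * (-1)^{sum_{i in S} x_i} ].
    total = 0.0
    for x in itertools.product([0, 1], repeat=D):
        total += f(x) * (-1) ** sum(x[i] for i in S)
    return total / 2 ** D

def level_k_weight(f, k):
    # W^k[f] = sum over |S| = k of hat{f}(S)^2.
    return sum(fourier_coeff(f, S) ** 2 for S in itertools.combinations(range(D), k))

L = 3.0
f = lambda x: L if (x[0] == 1 and x[1] == 1) else 0.25   # f in [0, L] with E[f] <= 1

norm_sq = np.mean([f(x) ** 2 for x in itertools.product([0, 1], repeat=D)])
parseval = sum(level_k_weight(f, k) for k in range(D + 1))
print(abs(norm_sq - parseval) < 1e-12)                   # Parseval's identity
print(level_k_weight(f, 1), 6 * np.log(L + 1))           # W^1[f] vs. 6 ln(L + 1)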

4 Low-Privacy DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} and DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} Lower Bounds for CountDistinct

In this section, we prove Theorem 1.2 and Theorem 1.4. In Section 4.1, we introduce some necessary definitions and notation. In Section 4.2, we prove our lower bound for low-privacy (private-coin) DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} protocols computing CountDistinct. In Section 4.3, we show the improved connection between DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} and DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}, which implies our lower bounds for DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} protocols for CountDistinct. In Section 4.4, we describe how to adapt the proof to public-coin protocols.

4.1 Preliminaries

Recall that we use the notations 𝖯𝗈𝗂(λ)\displaystyle\vec{\mathsf{Poi}}(\vec{\lambda}) and 𝔼[𝖯𝗈𝗂(U)]\displaystyle\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(\vec{U})] to denote multi-dimensional Poisson distributions and their mixtures, respectively (see Section 2.1 for the precise definitions).

We also recall the key additive property of multi-dimensional Poisson distributions: for α,β(0)D\displaystyle\vec{\alpha},\vec{\beta}\in(\mathbb{R}^{\geq 0})^{D}, we have that

𝖯𝗈𝗂(α)+𝖯𝗈𝗂(β)=𝖯𝗈𝗂(α+β).\displaystyle\vec{\mathsf{Poi}}(\vec{\alpha})+\vec{\mathsf{Poi}}(\vec{\beta})=\vec{\mathsf{Poi}}(\vec{\alpha}+\vec{\beta}).
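This property is easy to verify numerically; the Python sketch below (with illustrative parameters) compares an empirical sum of two independent multi-dimensional Poisson samples against a single draw from the summed parameters, and checks the identity exactly in one coordinate by convolving probability mass functions:

import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)
alpha = np.array([0.5, 2.0, 1.3])
beta = np.array([1.5, 0.2, 0.7])

# Empirical check: coordinate-wise means of Poi(alpha) + Poi(beta) vs. Poi(alpha + beta).
n_samples = 200_000
sum_samples = rng.poisson(alpha, (n_samples, 3)) + rng.poisson(beta, (n_samples, 3))
direct_samples = rng.poisson(alpha + beta, (n_samples, 3))
print(sum_samples.mean(axis=0), direct_samples.mean(axis=0))    # both close to alpha + beta

# Exact check in one coordinate: the convolution of Poi(a) and Poi(b) pmfs equals the Poi(a+b) pmf.
k = np.arange(20)
conv = np.convolve(poisson.pmf(k, alpha[0]), poisson.pmf(k, beta[0]))[:20]
print(np.max(np.abs(conv - poisson.pmf(k, alpha[0] + beta[0]))))  # numerically negligible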

4.2 Low-Privacy DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} Lower Bounds for CountDistinct

We will first prove the low-privacy DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} lower bounds in the private-coin setting, which is captured by the following theorem.

Theorem 4.1 (The Private-Coin Case of Theorem 1.2).

For some ε=ln(n/Θ(log6n))\displaystyle\varepsilon=\ln(n/\Theta(\log^{6}n)) and D=Θ(n/log4n)\displaystyle D=\Theta(n/\log^{4}n), if P\displaystyle P is a private-coin (ε,nω(1))\displaystyle(\varepsilon,n^{-\omega(1)})-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} protocol, then it cannot solve CountDistinctn,D\displaystyle\textsf{\small CountDistinct}_{n,D} with error o(D)\displaystyle o(D) and probability at least 0.99\displaystyle 0.99.

4.2.1 Technical Lemmas

Now we need the following construction from [WY19] (which uses a classical result from [Tim14, 2.11.1]).

Lemma 4.2 ([WY19]).

There is a constant c\displaystyle c such that, for all L\displaystyle L\in\mathbb{N}, there are two distributions U\displaystyle U and V\displaystyle V supported on {0}[1,Λ]\displaystyle\{0\}\cup[1,\Lambda] for Λ=cL2\displaystyle\Lambda=c\cdot L^{2}, such that 𝔼[U]=𝔼[V]=1\displaystyle\operatornamewithlimits{\mathbb{E}}[U]=\operatornamewithlimits{\mathbb{E}}[V]=1, U0V0>0.9\displaystyle U_{0}-V_{0}>0.9, and 𝔼[Uj]=𝔼[Vj]\displaystyle\operatornamewithlimits{\mathbb{E}}[U^{j}]=\operatornamewithlimits{\mathbb{E}}[V^{j}] for every j[L]\displaystyle j\in[L].

The following lemma is crucial for our proof. Its proof uses the moment matching technique [WY16, JHW18, WY19, Yan19], and can be found in Appendix A.

Lemma 4.3.

Let U,V\displaystyle U,V be two random variables supported on [0,Λ]\displaystyle[0,\Lambda] such that 𝔼[Uj]=𝔼[Vj]\displaystyle\operatornamewithlimits{\mathbb{E}}[U^{j}]=\operatornamewithlimits{\mathbb{E}}[V^{j}] for all j{1,2,,L}\displaystyle j\in\{1,2,\dotsc,L\}, where L1\displaystyle L\geq 1. Let D\displaystyle D\in\mathbb{N} and θ,λ(0)D\displaystyle\vec{\theta},\vec{\lambda}\in(\mathbb{R}^{\geq 0})^{D} such that θ1=1\displaystyle\|\vec{\theta}\|_{1}=1. Let 𝒟θ\displaystyle\mathcal{D}_{\vec{\theta}} be the distribution over [D]\displaystyle[D] corresponding to θ\displaystyle\vec{\theta}. Suppose that

Pri𝒟θ[λi2Λ2θi]112Λ.\displaystyle\Pr_{i\leftarrow\mathcal{D}_{\vec{\theta}}}[\vec{\lambda}_{i}\geq 2\Lambda^{2}\cdot\vec{\theta}_{i}]\geq 1-\frac{1}{2\Lambda}.

Then,

𝔼[𝖯𝗈𝗂(Uθ+λ)]𝔼[𝖯𝗈𝗂(Vθ+λ)]TV21L!.\displaystyle\|\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(U\vec{\theta}+\vec{\lambda})]-\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(V\vec{\theta}+\vec{\lambda})]\|_{TV}^{2}\leq\frac{1}{L!}.

Finally, we need an observation that for a DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} protocol P\displaystyle P solving CountDistinct, we can assume without loss of generality that the analyzer of P\displaystyle P only sees the histogram of the messages.

Lemma 4.4.

For any DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} protocol P=(R,A)\displaystyle P=(R,A) for CountDistinct, there exists an analyzer A\displaystyle A^{\prime} which only sees the histogram of the messages, and achieves the same accuracy and error as that of A\displaystyle A.

Proof.

Let n\displaystyle n be the number of users. Given the histogram, A\displaystyle A^{\prime} first constructs a sequence of messages Sn\displaystyle S\in\mathcal{M}^{n} consistent with the histogram. Then, it applies a random permutation π:[n][n]\displaystyle\pi\colon[n]\to[n] to S\displaystyle S to obtain a new sequence π(S)\displaystyle\pi(S). Finally, it simply outputs A(π(S))\displaystyle A(\pi(S)).

Note that applying a random permutation on the messages is equivalent to applying a random permutation on the user inputs in the dataset. Hence, the new protocol P=(R,A)\displaystyle P^{\prime}=(R,A^{\prime}) is equivalent to running P\displaystyle P on a random permutation of the dataset. The lemma follows from the fact that a random permutation does not change the number of distinct elements. ∎
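The reduction in this proof is mechanical; the Python sketch below (illustrative, with an example analyzer of our choosing) wraps an arbitrary sequence-based analyzer A into an analyzer A' that is given only the histogram of the messages, exactly as in the argument above:

import random
from collections import Counter

def histogram_only_analyzer(A):
    # Given an analyzer A reading a sequence of messages, return an analyzer A'
    # reading only the histogram: rebuild a consistent sequence, permute it randomly, run A.
    def A_prime(histogram):
        S = [msg for msg, count in histogram.items() for _ in range(count)]
        random.shuffle(S)                # apply a random permutation pi to S
        return A(S)
    return A_prime

A = lambda msgs: len(set(msgs))          # example analyzer: number of distinct messages
A_prime = histogram_only_analyzer(A)
print(A_prime(Counter([3, 3, 5, 1, 1, 1])))   # 3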

4.2.2 Construction of the Hard Dataset Distributions

In the rest of the section, we use n\displaystyle n to denote a parameter controlling the number of users. The actual number of users n¯\displaystyle\bar{n} will later be set to a number in the interval [n,2n]\displaystyle[n,2n]. In the following, we fix a randomizer R:𝒳\displaystyle R\colon\mathcal{X}\to\mathcal{M} which is (εR,δ)\displaystyle(\varepsilon_{R},\delta)-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} on n¯\displaystyle\bar{n} users, for some εR=ln(Θ(n¯/log6n¯))\displaystyle\varepsilon_{R}=\ln(\Theta(\bar{n}/\log^{6}\bar{n})) to be specified later. Before constructing our two hard distributions over datasets, we set some parameters that will be used in the construction:

  • We set L=logn\displaystyle L=\log n and note that 1L!1/n4\displaystyle\frac{1}{L!}\leq 1/n^{4} for large enough n\displaystyle n.

  • Applying Lemma 4.2, for Λ=Θ(L2)=Θ(log2n)\displaystyle\Lambda=\Theta(L^{2})=\Theta(\log^{2}n), we obtain two random variables U\displaystyle U and V\displaystyle V supported on {0}[1,Λ]\displaystyle\{0\}\cup[1,\Lambda], such that 𝔼[U]=𝔼[V]=1\displaystyle\operatornamewithlimits{\mathbb{E}}[U]=\operatornamewithlimits{\mathbb{E}}[V]=1, U0V0>0.9\displaystyle U_{0}-V_{0}>0.9, and 𝔼[Uj]=𝔼[Vj]\displaystyle\operatornamewithlimits{\mathbb{E}}[U^{j}]=\operatornamewithlimits{\mathbb{E}}[V^{j}] for every j[L]\displaystyle j\in[L].

  • We set Γ=8Λ2=Θ(log4n)\displaystyle\Gamma=8\Lambda^{2}=\Theta(\log^{4}n) and D=n/Γ=Θ(n/log4n)\displaystyle D=n/\Gamma=\Theta(n/\log^{4}n). We are going to construct instances where inputs are from the universe 𝒳=[D]\displaystyle\mathcal{X}=[D].

  • We set n¯=n+Dn0.99\displaystyle\bar{n}=n+D-n^{0.99}.

  • Let W=(log2n)4Λ2\displaystyle W=(\log^{2}n)\cdot 4\Lambda^{2}. We set εR\displaystyle\varepsilon_{R} so that n/eεR=W=Θ(log6n)\displaystyle n/e^{\varepsilon_{R}}=W=\Theta(\log^{6}n). Hence, R\displaystyle R is (ln(n/W),nω(1))\displaystyle(\ln(n/W),n^{-\omega(1)})-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}.

Now, for a distribution U\displaystyle U over 0\displaystyle\mathbb{R}^{\geq 0} and a non-empty subset E\displaystyle E of [D]\displaystyle[D], the dataset distribution 𝒟U,E\displaystyle\mathcal{D}^{U,E} is constructed as follows:

  1. 1.

    For each i[D]\displaystyle i\in[D], we draw λiU\displaystyle\lambda_{i}\leftarrow U, and ni𝖯𝗈𝗂(λi)\displaystyle n_{i}\leftarrow\mathsf{Poi}(\lambda_{i}), and add ni\displaystyle n_{i} many users with input i\displaystyle i.

  2. 2.

    For each jE\displaystyle j\in E, we draw mj𝖯𝗈𝗂(n/|E|)\displaystyle m_{j}\leftarrow\mathsf{Poi}(n/|E|), and add mj\displaystyle m_{j} many users with input j\displaystyle j. (Note that here we use a Poisson distribution slightly differently from the construction in Section 2, namely 𝖯𝗈𝗂(n/|E|)\displaystyle\mathsf{Poi}(n/|E|) instead of 𝖯𝗈𝗂((nD)/|E|)\displaystyle\mathsf{Poi}((n-D)/|E|), in order to simplify the later calculations.) A sampling sketch of this two-phase construction is given below.
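The sketch below (Python, illustrative; the toy two-point distribution stands in for the moment-matched U and V of Lemma 4.2, and E is fixed to a single element for simplicity) samples a dataset from 𝒟^{U,E} following the two phases above:

import numpy as np

rng = np.random.default_rng(1)

def sample_dataset(U_support, U_probs, E, D, n):
    # Phase (1): for each i in [D], draw lambda_i ~ U and add Poi(lambda_i) users with input i.
    # Phase (2): for each j in E, add Poi(n / |E|) users with input j.
    dataset = []
    lam = rng.choice(U_support, size=D, p=U_probs)
    counts = rng.poisson(lam)
    for i, n_i in enumerate(counts, start=1):
        dataset += [i] * int(n_i)
    for j in E:
        dataset += [j] * int(rng.poisson(n / len(E)))
    return dataset

U_support, U_probs = np.array([0.0, 2.0]), np.array([0.5, 0.5])   # a toy U with E[U] = 1
D, n = 100, 800
E = [1]                                                           # a small non-empty E
X = sample_dataset(U_support, U_probs, E, D, n)
print(len(X), len(set(X)))   # roughly n + D users in expectation, with Theta(D) distinct elements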

For clarity of exposition, we will use the histogram of the protocol to denote the histogram of the messages in the transcript of the protocol. Our goal is to show that for some “good” subset E[D]\displaystyle E\subseteq[D], the following hold:

  1. 1.

    The distributions of the histogram of the protocol under 𝒟U,E\displaystyle\mathcal{D}^{U,E} and 𝒟V,E\displaystyle\mathcal{D}^{V,E} are very close.

  2. 2.

    With high probability, the number of distinct elements in datasets from 𝒟U,E\displaystyle\mathcal{D}^{U,E} is Ω(D)\displaystyle\Omega(D) smaller than in datasets from 𝒟V,E\displaystyle\mathcal{D}^{V,E}.

Clearly, given the above two conditions and Lemma 4.4, no protocol with randomizer R\displaystyle R can estimate the number of distinct elements within o(D)\displaystyle o(D) error and with constant probability.

4.2.3 Conditions on a Good Subset E\displaystyle E

Given a subset E\displaystyle E, we let νE=iER(i)n|E|\displaystyle\vec{\nu}^{E}=\sum_{i\in E}R(i)\cdot\frac{n}{|E|}. We also set μ=i[D]R(i)\displaystyle\vec{\mu}=\sum_{i\in[D]}R(i). We now specify our conditions on a subset E[D]\displaystyle E\subseteq[D] being good. Let ε1=0.01\displaystyle\varepsilon_{1}=0.01. We say E\displaystyle E is good if the following two conditions hold:

  1. 1.

    0<|E|<2ε1|D|\displaystyle 0<|E|<2\varepsilon_{1}\cdot|D|.

  2. 2.

    For each i[D]\displaystyle i\in[D],

    PrzR(i)[νzE2Λ2μz]11/2Λ.\displaystyle\Pr_{z\leftarrow R(i)}[\vec{\nu}^{E}_{z}\geq 2\Lambda^{2}\cdot\vec{\mu}_{z}]\geq 1-1/2\Lambda.

We claim that a good subset E\displaystyle E exists. In fact, we give a probabilistic construction of E\displaystyle E that succeeds with high probability:

Lemma 4.5.

If we include each element i[D]\displaystyle i\in[D] in E\displaystyle E independently with probability ε1\displaystyle\varepsilon_{1}, then E\displaystyle E is good with probability at least 1nω(1)\displaystyle 1-n^{-\omega(1)}.

4.2.4 The Lower Bound

Before proving Lemma 4.5, we show that for a good E\displaystyle E, the distributions 𝒟U,E\displaystyle\mathcal{D}^{U,E} and 𝒟V,E\displaystyle\mathcal{D}^{V,E} satisfy our desired properties, and thereby imply our DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} lower bound. For a dataset distribution 𝒟\displaystyle\mathcal{D}, we use 𝖧𝗂𝗌𝗍R(𝒟)\displaystyle\mathsf{Hist}_{R}(\mathcal{D}) to denote the corresponding distribution of the histogram of the transcript, if all users apply the randomizer R\displaystyle R. For a dataset I\displaystyle I, we use CountDistinct(I)\displaystyle\textsf{\small CountDistinct}(I) to denote the number of distinct elements in it.

Lemma 4.6.

For a good subset E\displaystyle E of [D]\displaystyle[D], the following hold:

  1. 1.

    We have that

    𝖧𝗂𝗌𝗍R(𝒟U,E)𝖧𝗂𝗌𝗍R(𝒟V,E)TV1/n.\displaystyle\|\mathsf{Hist}_{R}(\mathcal{D}^{U,E})-\mathsf{Hist}_{R}(\mathcal{D}^{V,E})\|_{TV}\leq 1/n.
  2. 2.

    There are two constants τ1<τ2\displaystyle\tau_{1}<\tau_{2} such that

    PrI1𝒟U,E[CountDistinct(I1)<τ1D]1nω(1),\displaystyle\Pr_{I_{1}\leftarrow\mathcal{D}^{U,E}}[\textsf{\small CountDistinct}(I_{1})<\tau_{1}\cdot D]\geq 1-n^{-\omega(1)},

    and

    PrI2𝒟V,E[CountDistinct(I2)>τ2D]1nω(1).\displaystyle\Pr_{I_{2}\leftarrow\mathcal{D}^{V,E}}[\textsf{\small CountDistinct}(I_{2})>\tau_{2}\cdot D]\geq 1-n^{-\omega(1)}.
Proof.
Proof of Item (1).

In the following, we use ν\displaystyle\vec{\nu} to denote νE\displaystyle\vec{\nu}^{E} for simplicity. We first construct D\displaystyle D vectors {ν(i)}i[D]\displaystyle\{\vec{\nu}^{(i)}\}_{i\in[D]} as follows: for each z\displaystyle z\in\mathcal{M}, if νz2Λ2μz\displaystyle\vec{\nu}_{z}\geq 2\Lambda^{2}\cdot\vec{\mu}_{z}, then for all i[D]\displaystyle i\in[D] we set (ν(i))z=R(i)z2Λ2\displaystyle(\vec{\nu}^{(i)})_{z}=R(i)_{z}\cdot 2\Lambda^{2}, otherwise we set (ν(i))z=0\displaystyle(\vec{\nu}^{(i)})_{z}=0 for all i[D]\displaystyle i\in[D]. Note that for each z\displaystyle z, we have that (i[D]ν(i))zνz\displaystyle\left(\sum_{i\in[D]}\vec{\nu}^{(i)}\right)_{z}\leq\vec{\nu}_{z}. Let ν(0):=ν(i[D]ν(i))\displaystyle\vec{\nu}^{(0)}:=\vec{\nu}-\left(\sum_{i\in[D]}\vec{\nu}^{(i)}\right). By definition, it follows that ν(0)\displaystyle\vec{\nu}^{(0)} is a non-negative vector. Now, 𝖧𝗂𝗌𝗍R(𝒟U,E)\displaystyle\mathsf{Hist}_{R}(\mathcal{D}^{U,E}) and 𝖧𝗂𝗌𝗍R(𝒟V,E)\displaystyle\mathsf{Hist}_{R}(\mathcal{D}^{V,E}) can be seen as distributions over histograms in \displaystyle\mathbb{N}^{\mathcal{M}}. Let X1,X2,,XD\displaystyle X_{1},X_{2},\dotsc,X_{D} be D\displaystyle D independent random variables distributed as U\displaystyle U. By the construction of 𝒟U,E\displaystyle\mathcal{D}^{U,E}, we have that

𝖧𝗂𝗌𝗍R(𝒟U,E)=\displaystyle\displaystyle\mathsf{Hist}_{R}(\mathcal{D}^{U,E})= 𝖯𝗈𝗂(ν)+i=1D𝖯𝗈𝗂(XiR(i))\displaystyle\displaystyle\vec{\mathsf{Poi}}(\vec{\nu})+\sum_{i=1}^{D}\vec{\mathsf{Poi}}(X_{i}\cdot R(i))
=\displaystyle\displaystyle= 𝖯𝗈𝗂(ν(0))+i=1D𝖯𝗈𝗂(XiR(i)+ν(i)).\displaystyle\displaystyle\vec{\mathsf{Poi}}(\vec{\nu}^{(0)})+\sum_{i=1}^{D}\vec{\mathsf{Poi}}(X_{i}\cdot R(i)+\vec{\nu}^{(i)}).

Similarly, let Y1,Y2,,YD\displaystyle Y_{1},Y_{2},\dotsc,Y_{D} be D\displaystyle D independent random variables distributed as V\displaystyle V. We have that

𝖧𝗂𝗌𝗍R(𝒟V,E)=𝖯𝗈𝗂(ν(0))+i=1D𝖯𝗈𝗂(YiR(i)+ν(i)).\displaystyle\mathsf{Hist}_{R}(\mathcal{D}^{V,E})=\vec{\mathsf{Poi}}(\vec{\nu}^{(0)})+\sum_{i=1}^{D}\vec{\mathsf{Poi}}(Y_{i}\cdot R(i)+\vec{\nu}^{(i)}).

Since E\displaystyle E is good, and since νz2Λ2μz\displaystyle\vec{\nu}_{z}\geq 2\Lambda^{2}\cdot\vec{\mu}_{z} implies (ν(i))z=2Λ2R(i)z\displaystyle(\vec{\nu}^{(i)})_{z}=2\Lambda^{2}\cdot R(i)_{z}, for each i[D]\displaystyle i\in[D] we have that

PrzR(i)[(ν(i))z2Λ2R(i)z]11/2Λ.\displaystyle\Pr_{z\leftarrow R(i)}[(\vec{\nu}^{(i)})_{z}\geq 2\Lambda^{2}\cdot R(i)_{z}]\geq 1-1/2\Lambda.

Hence, applying Lemma 4.3 with θ=R(i)\displaystyle\vec{\theta}=R(i) and λ=ν(i)\displaystyle\vec{\lambda}=\vec{\nu}^{(i)}, for each i[D]\displaystyle i\in[D], we have that

𝖯𝗈𝗂(XiR(i)+ν(i))𝖯𝗈𝗂(YiR(i)+ν(i))TV(1L!)1/21/n2.\displaystyle\|\vec{\mathsf{Poi}}(X_{i}\cdot R(i)+\vec{\nu}^{(i)})-\vec{\mathsf{Poi}}(Y_{i}\cdot R(i)+\vec{\nu}^{(i)})\|_{TV}\leq\left(\frac{1}{L!}\right)^{1/2}\leq 1/n^{2}.

Therefore, since the total variation distance is subadditive over sums of independent terms (and adding the common independent term 𝖯𝗈𝗂(ν(0))\displaystyle\vec{\mathsf{Poi}}(\vec{\nu}^{(0)}) cannot increase it),

𝖧𝗂𝗌𝗍R(𝒟U,E)𝖧𝗂𝗌𝗍R(𝒟V,E)TVi=1D𝖯𝗈𝗂(XiR(i)+ν(i))𝖯𝗈𝗂(YiR(i)+ν(i))TVD(1L!)1/21/n.\displaystyle\|\mathsf{Hist}_{R}(\mathcal{D}^{U,E})-\mathsf{Hist}_{R}(\mathcal{D}^{V,E})\|_{TV}\leq\sum_{i=1}^{D}\|\vec{\mathsf{Poi}}(X_{i}\cdot R(i)+\vec{\nu}^{(i)})-\vec{\mathsf{Poi}}(Y_{i}\cdot R(i)+\vec{\nu}^{(i)})\|_{TV}\leq D\cdot\left(\frac{1}{L!}\right)^{1/2}\leq 1/n.
Proof of Item (2).

Let γU=𝔼[eU]\displaystyle\gamma_{U}=\operatornamewithlimits{\mathbb{E}}[e^{-U}] and γV=𝔼[eV]\displaystyle\gamma_{V}=\operatornamewithlimits{\mathbb{E}}[e^{-V}]. By Lemma 4.2, we have that U00.9\displaystyle U_{0}\geq 0.9, V00.1\displaystyle V_{0}\leq 0.1, and U,V\displaystyle U,V are supported on {0}[1,Λ]\displaystyle\{0\}\cup[1,\Lambda]. Hence, it follows that γUU00.9\displaystyle\gamma_{U}\geq U_{0}\geq 0.9 and γVV0+e1(1V0)0.5\displaystyle\gamma_{V}\leq V_{0}+e^{-1}\cdot(1-V_{0})\leq 0.5.

Now, consider the construction of 𝒟U,E\displaystyle\mathcal{D}^{U,E}. For every i[D]\displaystyle i\in[D], at least one user with input i\displaystyle i is added to the dataset during phase (1) with probability 1γU\displaystyle 1-\gamma_{U}. Moreover, these events are mutually independent. Hence, by a simple Chernoff bound, with probability at least 1nω(1)\displaystyle 1-n^{-\omega(1)}, the number of distinct elements in the dataset after phase (1) is no greater than (1γU+0.01)D\displaystyle(1-\gamma_{U}+0.01)\cdot D. Since the second phase can add at most |E|0.02D\displaystyle|E|\leq 0.02D many distinct elements, we can set τ1=1γU+0.03\displaystyle\tau_{1}=1-\gamma_{U}+0.03.

Similarly, for instances generated from 𝒟V,E\displaystyle\mathcal{D}^{V,E}, with probability at least 1nω(1)\displaystyle 1-n^{-\omega(1)}, the number of distinct elements in the dataset after phase (1) is at least (1γV0.01)D\displaystyle(1-\gamma_{V}-0.01)\cdot D. We can set τ2=1γV0.01\displaystyle\tau_{2}=1-\gamma_{V}-0.01.

By our condition on γU\displaystyle\gamma_{U} and γV\displaystyle\gamma_{V}, we have that τ2>τ1\displaystyle\tau_{2}>\tau_{1}, which completes the proof. ∎

We are now ready to prove Theorem 4.1. One complication is that datasets from 𝒟U,E\displaystyle\mathcal{D}^{U,E} and 𝒟V,E\displaystyle\mathcal{D}^{V,E} may not have the same number of users. We address this issue by “throwing out” extra users randomly, obtaining distributions over datasets with exactly n¯\displaystyle\bar{n} many users.

Proof of Theorem 4.1.

Consider the distributions 𝒟U,E\displaystyle\mathcal{D}^{U,E} and 𝒟V,E\displaystyle\mathcal{D}^{V,E}. By a simple Chernoff bound, we have that with probability at least 1nω(1)\displaystyle 1-n^{-\omega(1)}, the number of users lies in [n+Dn0.99,n+D+n0.99]\displaystyle[n+D-n^{0.99},n+D+n^{0.99}].

Recall that n¯=n+Dn0.99\displaystyle\bar{n}=n+D-n^{0.99}. We construct the distribution 𝒟¯U,E\displaystyle\bar{\mathcal{D}}^{U,E} as follows: to generate a dataset from 𝒟¯U,E\displaystyle\bar{\mathcal{D}}^{U,E}, we take a sample dataset I\displaystyle I from 𝒟U,E\displaystyle\mathcal{D}^{U,E}, and if there are nI>n¯\displaystyle n_{I}>\bar{n} users in I\displaystyle I, we delete nIn¯\displaystyle n_{I}-\bar{n} users uniformly at random, and output I\displaystyle I. We similarly construct another distribution 𝒟¯V,E\displaystyle\bar{\mathcal{D}}^{V,E}. Note that with probability at least 1nω(1)\displaystyle 1-n^{-\omega(1)}, we delete at most 2n0.99\displaystyle 2n^{0.99} users in the construction of 𝒟¯U,E\displaystyle\bar{\mathcal{D}}^{U,E} (as well as in that of 𝒟¯V,E\displaystyle\bar{\mathcal{D}}^{V,E}).

Now, both 𝒟¯U,E\displaystyle\bar{\mathcal{D}}^{U,E} and 𝒟¯V,E\displaystyle\bar{\mathcal{D}}^{V,E} output datasets with exactly n¯\displaystyle\bar{n} users with probability 1nω(1)\displaystyle 1-n^{-\omega(1)}. This means that if there is a protocol solving CountDistinct with n¯\displaystyle\bar{n} users with error o(D)\displaystyle o(D), then by Lemma 4.4, Item (2) of Lemma 4.6 and since 2n0.99=o(D)\displaystyle 2n^{0.99}=o(D), the analyzer of the protocol should be able to distinguish 𝖧𝗂𝗌𝗍R(𝒟¯U,E)\displaystyle\mathsf{Hist}_{R}(\bar{\mathcal{D}}^{U,E}) and 𝖧𝗂𝗌𝗍R(𝒟¯V,E)\displaystyle\mathsf{Hist}_{R}(\bar{\mathcal{D}}^{V,E}) with at least a constant probability. Therefore, we have that

𝖧𝗂𝗌𝗍R(𝒟¯U,E)𝖧𝗂𝗌𝗍R(𝒟¯V,E)TV=Ω(1).\displaystyle\|\mathsf{Hist}_{R}(\bar{\mathcal{D}}^{U,E})-\mathsf{Hist}_{R}(\bar{\mathcal{D}}^{V,E})\|_{TV}=\Omega(1).

On the other hand, 𝖧𝗂𝗌𝗍R(𝒟¯U,E)\displaystyle\mathsf{Hist}_{R}(\bar{\mathcal{D}}^{U,E}) (respectively, 𝖧𝗂𝗌𝗍R(𝒟¯V,E)\displaystyle\mathsf{Hist}_{R}(\bar{\mathcal{D}}^{V,E})) can also be constructed by taking a sample from 𝖧𝗂𝗌𝗍R(𝒟U,E)\displaystyle\mathsf{Hist}_{R}(\mathcal{D}^{U,E}) (respectively, 𝖧𝗂𝗌𝗍R(𝒟V,E)\displaystyle\mathsf{Hist}_{R}(\mathcal{D}^{V,E})) and throwing out some random messages until at most n¯\displaystyle\bar{n} messages remain. Since post-processing does not increase statistical distance, by Item (1) of Lemma 4.6, we have that

𝖧𝗂𝗌𝗍R(𝒟¯U,E)𝖧𝗂𝗌𝗍R(𝒟¯V,E)TV𝖧𝗂𝗌𝗍R(𝒟U,E)𝖧𝗂𝗌𝗍R(𝒟V,E)TV1/n,\displaystyle\|\mathsf{Hist}_{R}(\bar{\mathcal{D}}^{U,E})-\mathsf{Hist}_{R}(\bar{\mathcal{D}}^{V,E})\|_{TV}\leq\|\mathsf{Hist}_{R}(\mathcal{D}^{U,E})-\mathsf{Hist}_{R}(\mathcal{D}^{V,E})\|_{TV}\leq 1/n,

a contradiction. ∎

4.2.5 A Probabilistic Construction of Good E\displaystyle E

We need the following proposition for the proof of Lemma 4.5.

Proposition 4.7.

Let R:𝒳\displaystyle R\colon\mathcal{X}\to\mathcal{M} be an (ε,δ)\displaystyle(\varepsilon,\delta)-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} randomizer. For every i,j𝒳\displaystyle i,j\in\mathcal{X}, it follows that

PrzR(i)[R(i)z2eεR(j)z]2δ.\displaystyle\Pr_{z\leftarrow R(i)}[R(i)_{z}\geq 2e^{\varepsilon}\cdot R(j)_{z}]\leq 2\delta.
Proof.

Let 𝒯\displaystyle\mathcal{T} be the set {z:R(i)z2eεR(j)zz}\displaystyle\{z:R(i)_{z}\geq 2e^{\varepsilon}\cdot R(j)_{z}\wedge z\in\mathcal{M}\}. Since R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}, it follows that

R(i)𝒯eεR(j)𝒯+δ.\displaystyle R(i)_{\mathcal{T}}\leq e^{\varepsilon}\cdot R(j)_{\mathcal{T}}+\delta.

By the definition of the set 𝒯\displaystyle\mathcal{T}, it follows that

R(j)𝒯=z𝒯R(j)z12eεz𝒯R(i)z=12eεR(i)𝒯.\displaystyle R(j)_{\mathcal{T}}=\sum_{z\in\mathcal{T}}R(j)_{z}\leq\frac{1}{2e^{\varepsilon}}\cdot\sum_{z\in\mathcal{T}}R(i)_{z}=\frac{1}{2e^{\varepsilon}}\cdot R(i)_{\mathcal{T}}.

Putting the above two inequalities together, we have

R(i)𝒯12R(i)𝒯+δ,\displaystyle R(i)_{\mathcal{T}}\leq\frac{1}{2}\cdot R(i)_{\mathcal{T}}+\delta,

which in turn implies that

PrzR(i)[R(i)z2eεR(j)z]=R(i)𝒯2δ.\displaystyle\Pr_{z\leftarrow R(i)}[R(i)_{z}\geq 2e^{\varepsilon}\cdot R(j)_{z}]=R(i)_{\mathcal{T}}\leq 2\delta.\qed
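For intuition, the conclusion of Proposition 4.7 can be checked directly for a concrete pure ε-DP_local randomizer (where δ=0, so the bad event must have probability 0); the Python sketch below uses k-ary randomized response, an arbitrary illustrative choice:

import math

def k_rr(i, k, eps):
    # k-ary randomized response: the output distribution R(i) over [k], which is eps-DP.
    p_true = math.exp(eps) / (math.exp(eps) + k - 1)
    p_other = 1.0 / (math.exp(eps) + k - 1)
    return [p_true if z == i else p_other for z in range(k)]

k, eps = 10, 1.5
Ri, Rj = k_rr(0, k, eps), k_rr(1, k, eps)
# Mass, under z <- R(i), of the event { R(i)_z >= 2 e^eps R(j)_z }.
bad_mass = sum(p for p, q in zip(Ri, Rj) if p >= 2 * math.exp(eps) * q)
print(bad_mass)   # 0.0, consistent with the 2*delta bound since delta = 0 here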

Finally, we prove Lemma 4.5 (restated below).


Lemma 4.5. (restated) If we include each element i[D]\displaystyle i\in[D] in E\displaystyle E independently with probability ε1=0.01\displaystyle\varepsilon_{1}=0.01, then E\displaystyle E is good with probability at least 1nω(1)\displaystyle 1-n^{-\omega(1)}.

Proof.

Let 𝗌𝗂𝗓𝖾\displaystyle\mathcal{E}_{\sf size} be the event that 0<|E|<2ε1|D|\displaystyle 0<|E|<2\varepsilon_{1}\cdot|D|. By a simple Chernoff bound, it follows that

PrE[𝗌𝗂𝗓𝖾]1exp(Ω(|D|))1nω(1).\displaystyle\Pr_{E}[\mathcal{E}_{\sf size}]\geq 1-\exp(-\Omega(|D|))\geq 1-n^{-\omega(1)}.

Therefore, the first condition for E\displaystyle E being good is satisfied with probability 1nω(1)\displaystyle 1-n^{-\omega(1)}. In the following, we will condition on the event 𝗌𝗂𝗓𝖾\displaystyle\mathcal{E}_{\sf size}.

Recall that νE=iER(i)n|E|\displaystyle\vec{\nu}^{E}=\sum_{i\in E}R(i)\cdot\frac{n}{|E|} and μ=i[D]R(i)\displaystyle\vec{\mu}=\sum_{i\in[D]}R(i). In the rest of the proof, we will focus on the second condition for E\displaystyle E being good, namely that for each i[D]\displaystyle i\in[D], it is the case that

PrzR(i)[νzE2Λ2μz]11/2Λ.\displaystyle\Pr_{z\leftarrow R(i)}[\vec{\nu}^{E}_{z}\geq 2\Lambda^{2}\cdot\vec{\mu}_{z}]\geq 1-1/2\Lambda.

In the following, we fix i[D]\displaystyle i\in[D], and show that the previous inequality holds for i\displaystyle i with high probability. Therefore, we can then conclude that E\displaystyle E is good with high probability by a union bound.

We also let m\displaystyle\vec{m}^{*}\in\mathbb{R}^{\mathcal{M}} be such that mz=maxi[D]R(i)z\displaystyle\vec{m}^{*}_{z}=\max_{i\in[D]}R(i)_{z} for all z\displaystyle z\in\mathcal{M}. Now, for z\displaystyle z\in\mathcal{M}, if μzmzlog2n\displaystyle\vec{\mu}_{z}\leq\vec{m}^{*}_{z}\cdot\log^{2}n, we say z\displaystyle z is light; otherwise we say z\displaystyle z is heavy.

We define

𝗅𝗂𝗀𝗁𝗍:=[PrzR(i)[νzE<2Λ2μz and z is light]1/4Λ],\displaystyle\mathcal{E}_{\sf light}:=\left[\Pr_{z\leftarrow R(i)}[\vec{\nu}^{E}_{z}<2\Lambda^{2}\cdot\vec{\mu}_{z}\mbox{ and }\text{$\displaystyle z$ is light}]\leq 1/4\Lambda\right],

and

𝗁𝖾𝖺𝗏𝗒:=[PrzR(i)[νzE<2Λ2μz and z is heavy]1/4Λ].\displaystyle\mathcal{E}_{\sf heavy}:=\left[\Pr_{z\leftarrow R(i)}[\vec{\nu}^{E}_{z}<2\Lambda^{2}\cdot\vec{\mu}_{z}\mbox{ and }\text{$\displaystyle z$ is heavy}]\leq 1/4\Lambda\right].

It suffices to show that both PrE[𝗅𝗂𝗀𝗁𝗍|𝗌𝗂𝗓𝖾]\displaystyle\Pr_{E}[\mathcal{E}_{\sf light}|\mathcal{E}_{\sf size}] and PrE[𝗁𝖾𝖺𝗏𝗒|𝗌𝗂𝗓𝖾]\displaystyle\Pr_{E}[\mathcal{E}_{\sf heavy}|\mathcal{E}_{\sf size}] are very large. Note that νE\displaystyle\vec{\nu}^{E} is not defined when |E|=0\displaystyle|E|=0. But since we only care about PrE[𝗅𝗂𝗀𝗁𝗍|𝗌𝗂𝗓𝖾]\displaystyle\Pr_{E}[\mathcal{E}_{\sf light}|\mathcal{E}_{\sf size}] and PrE[𝗁𝖾𝖺𝗏𝗒|𝗌𝗂𝗓𝖾]\displaystyle\Pr_{E}[\mathcal{E}_{\sf heavy}|\mathcal{E}_{\sf size}], this corner case is excluded by conditioning on 𝗌𝗂𝗓𝖾\displaystyle\mathcal{E}_{\sf size}.

Proving that PrE[𝗅𝗂𝗀𝗁𝗍|𝗌𝗂𝗓𝖾]=1\displaystyle\Pr_{E}[\mathcal{E}_{\sf light}|\mathcal{E}_{\sf size}]=1.

By Proposition 4.7 and the fact that R\displaystyle R is (ln(n/W),nω(1))\displaystyle(\ln(n/W),n^{-\omega(1)})-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}, for every x[D]\displaystyle x\in[D]

PrzR(i)[R(i)z(2n/W)R(x)z]nω(1).\displaystyle\Pr_{z\leftarrow R(i)}[R(i)_{z}\geq(2n/W)\cdot R(x)_{z}]\leq n^{-\omega(1)}.

By a union bound over all elements in E\displaystyle E, we have that

PrzR(i)[R(i)z(2n/W)νzn]nω(1),\displaystyle\Pr_{z\leftarrow R(i)}[R(i)_{z}\geq(2n/W)\cdot\frac{\vec{\nu}_{z}}{n}]\leq n^{-\omega(1)},

which is equivalent to

PrzR(i)[νzW/2R(i)z]nω(1).\displaystyle\Pr_{z\leftarrow R(i)}[\vec{\nu}_{z}\leq W/2\cdot R(i)_{z}]\leq n^{-\omega(1)}.

Similarly, for j[D]\displaystyle j\in[D], we also have that

PrzR(j)[νzW/2R(j)z]nω(1).\displaystyle\Pr_{z\leftarrow R(j)}[\vec{\nu}_{z}\leq W/2\cdot R(j)_{z}]\leq n^{-\omega(1)}.

Again since R\displaystyle R is (ln(n/W),nω(1))\displaystyle(\ln(n/W),n^{-\omega(1)})-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}, we have that

PrzR(i)[νzW/2R(j)z]=𝔼zR(j)R(i)zR(j)z𝟙[νzW/2R(j)z]\displaystyle\displaystyle\Pr_{z\leftarrow R(i)}[\vec{\nu}_{z}\leq W/2\cdot R(j)_{z}]=\operatornamewithlimits{\mathbb{E}}_{z\leftarrow R(j)}\frac{R(i)_{z}}{R(j)_{z}}\cdot\mathbb{1}[\vec{\nu}_{z}\leq W/2\cdot R(j)_{z}]
\displaystyle\displaystyle\leq 𝔼zR(j)(2n/W)𝟙[R(i)zR(j)z2n/W]𝟙[νzW/2R(j)z]+𝔼zR(j)R(i)zR(j)z𝟙[R(i)zR(j)z>2n/W]\displaystyle\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow R(j)}(2n/W)\cdot\mathbb{1}\left[\frac{R(i)_{z}}{R(j)_{z}}\leq 2n/W\right]\cdot\mathbb{1}[\vec{\nu}_{z}\leq W/2\cdot R(j)_{z}]+\operatornamewithlimits{\mathbb{E}}_{z\leftarrow R(j)}\frac{R(i)_{z}}{R(j)_{z}}\cdot\mathbb{1}\left[\frac{R(i)_{z}}{R(j)_{z}}>2n/W\right]
\displaystyle\displaystyle\leq (2n/W)nω(1)+𝔼zR(i)𝟙[R(i)zR(j)z>2n/W]\displaystyle\displaystyle(2n/W)\cdot n^{-\omega(1)}+\operatornamewithlimits{\mathbb{E}}_{z\leftarrow R(i)}\mathbb{1}\left[\frac{R(i)_{z}}{R(j)_{z}}>2n/W\right]
\displaystyle\displaystyle\leq nω(1).\displaystyle\displaystyle n^{-\omega(1)}.

Therefore, by a union bound over j[D]\displaystyle j\in[D],

PrzR(i)[νzW/2mz]nω(1).\displaystyle\Pr_{z\leftarrow R(i)}[\vec{\nu}_{z}\leq W/2\cdot\vec{m}^{*}_{z}]\leq n^{-\omega(1)}.

Now we are ready to prove that PrE[𝗅𝗂𝗀𝗁𝗍|𝗌𝗂𝗓𝖾]=1\displaystyle\Pr_{E}[\mathcal{E}_{\sf light}|\mathcal{E}_{\sf size}]=1. We will show that 𝗅𝗂𝗀𝗁𝗍\displaystyle\mathcal{E}_{\sf light} holds for every nonempty E\displaystyle E. We have that

PrzR(i)[νzE<2Λ2μz and z is light]\displaystyle\displaystyle\Pr_{z\leftarrow R(i)}[\vec{\nu}^{E}_{z}<2\Lambda^{2}\cdot\vec{\mu}_{z}\mbox{ and }\text{$\displaystyle z$ is light}]
\displaystyle\displaystyle\leq PrzR(i)[νzE<2Λ2μz and z is light and νz>W/2mz]+PrzR(i)[νzW/2mz]\displaystyle\displaystyle\Pr_{z\leftarrow R(i)}[\vec{\nu}^{E}_{z}<2\Lambda^{2}\cdot\vec{\mu}_{z}\mbox{ and }\text{$\displaystyle z$ is light}\mbox{ and }\vec{\nu}_{z}>W/2\cdot\vec{m}^{*}_{z}]+\Pr_{z\leftarrow R(i)}[\vec{\nu}_{z}\leq W/2\cdot\vec{m}^{*}_{z}]
\displaystyle\displaystyle\leq PrzR(i)[νzE<2Λ2μz and μzmzlog2n and νz>W/2mz]+nω(1)\displaystyle\displaystyle\Pr_{z\leftarrow R(i)}[\vec{\nu}^{E}_{z}<2\Lambda^{2}\cdot\vec{\mu}_{z}\mbox{ and }\vec{\mu}_{z}\leq\vec{m}^{*}_{z}\cdot\log^{2}n\mbox{ and }\vec{\nu}_{z}>W/2\cdot\vec{m}^{*}_{z}]+n^{-\omega(1)} (z is light implies μzmzlog2n\displaystyle\vec{\mu}_{z}\leq\vec{m}^{*}_{z}\cdot\log^{2}n)
\displaystyle\displaystyle\leq nω(1).\displaystyle\displaystyle n^{-\omega(1)}.

The last inequality follows from the fact that μzmzlog2n\displaystyle\vec{\mu}_{z}\leq\vec{m}^{*}_{z}\cdot\log^{2}n and νz>W/2mz\displaystyle\vec{\nu}_{z}>W/2\cdot\vec{m}^{*}_{z} together imply that νz>W/2mzW/2log2nμz2Λ2μz\displaystyle\vec{\nu}_{z}>W/2\cdot\vec{m}^{*}_{z}\geq\frac{W/2}{\log^{2}n}\vec{\mu}_{z}\geq 2\Lambda^{2}\cdot\vec{\mu}_{z} (recall that W=(log2n)4Λ2\displaystyle W=(\log^{2}n)\cdot 4\Lambda^{2}). Hence PrzR(i)[νzE<2Λ2μz and μzmzlog2n and νz>W/2mz]=0\displaystyle\Pr_{z\leftarrow R(i)}[\vec{\nu}^{E}_{z}<2\Lambda^{2}\cdot\vec{\mu}_{z}\mbox{ and }\vec{\mu}_{z}\leq\vec{m}^{*}_{z}\cdot\log^{2}n\mbox{ and }\vec{\nu}_{z}>W/2\cdot\vec{m}^{*}_{z}]=0 as the three inequalities cannot be simultaneously satisfied.

Proving that Pr[𝗁𝖾𝖺𝗏𝗒|𝗌𝗂𝗓𝖾]\displaystyle\Pr[\mathcal{E}_{\sf heavy}|\mathcal{E}_{\sf size}] is large.

Now, for a heavy z\displaystyle z, we have that μzmzlog2n\displaystyle\vec{\mu}_{z}\geq\vec{m}^{*}_{z}\cdot\log^{2}n. In particular, fix a heavy z\displaystyle z, and define the random variable Xi:=𝟙[iE]R(i)z\displaystyle X_{i}:=\mathbb{1}[i\in E]\cdot R(i)_{z} for each i[D]\displaystyle i\in[D]. Note that the Xi\displaystyle X_{i}’s are independent variables over [0,R(i)z]\displaystyle[0,R(i)_{z}] and 𝔼[i[D]Xi]=ε1μz\displaystyle\operatornamewithlimits{\mathbb{E}}\left[\sum_{i\in[D]}X_{i}\right]=\varepsilon_{1}\cdot\vec{\mu}_{z}. Letting S=i[D]Xi\displaystyle S=\sum_{i\in[D]}X_{i}, by Hoeffding’s inequality, we have that

PrE[|S𝔼[S]|12𝔼[S]]2exp(2(12𝔼[S])2i[D]R(i)z2).\displaystyle\Pr_{E}[|S-\operatornamewithlimits{\mathbb{E}}[S]|\geq\frac{1}{2}\operatornamewithlimits{\mathbb{E}}[S]]\leq 2\exp\left(-\frac{2\cdot(\frac{1}{2}\operatornamewithlimits{\mathbb{E}}[S])^{2}}{\sum_{i\in[D]}R(i)_{z}^{2}}\right).

Note that

i[D]R(i)z2i[D]R(i)zmzμzmz.\displaystyle\sum_{i\in[D]}R(i)_{z}^{2}\leq\sum_{i\in[D]}R(i)_{z}\cdot\vec{m}^{*}_{z}\leq\vec{\mu}_{z}\cdot\vec{m}^{*}_{z}.

Plugging in, it follows that

PrE[Sε1μz/2]2exp(ε12μz2/2μzmz)2exp(ε12/2log2n)nω(1).\displaystyle\Pr_{E}[S\leq\varepsilon_{1}\vec{\mu}_{z}/2]\leq 2\exp\left(-\frac{\varepsilon_{1}^{2}\vec{\mu}_{z}^{2}/2}{\vec{\mu}_{z}\cdot\vec{m}^{*}_{z}}\right)\leq 2\exp(-\varepsilon_{1}^{2}/2\cdot\log^{2}n)\leq n^{-\omega(1)}.

Note that νzE=Sn|E|\displaystyle\vec{\nu}^{E}_{z}=S\cdot\frac{n}{|E|}, and that |E|2ε1D\displaystyle|E|\leq 2\varepsilon_{1}D with probability at least 1nω(1)\displaystyle 1-n^{-\omega(1)}. By a union bound, we have that νzEε1μz/2n2ε1D=μzΓ/4\displaystyle\vec{\nu}^{E}_{z}\geq\varepsilon_{1}\vec{\mu}_{z}/2\cdot\frac{n}{2\varepsilon_{1}D}=\vec{\mu}_{z}\cdot\Gamma/4 with probability at least 1nω(1)\displaystyle 1-n^{-\omega(1)}. Noting that Γ/4=2Λ2\displaystyle\Gamma/4=2\Lambda^{2}, we have that

𝔼E[PrzR(i)[νzE<2Λ2μzz is heavy]]nω(1).\displaystyle\operatornamewithlimits{\mathbb{E}}_{E}\left[\Pr_{z\leftarrow R(i)}[\vec{\nu}^{E}_{z}<2\Lambda^{2}\cdot\vec{\mu}_{z}\wedge\text{$\displaystyle z$ is heavy}]\right]\leq n^{-\omega(1)}.

Recall that Pr[𝗌𝗂𝗓𝖾]1nω(1)\displaystyle\Pr[\mathcal{E}_{\sf size}]\geq 1-n^{-\omega(1)}. By Markov’s inequality, we have Pr[𝗁𝖾𝖺𝗏𝗒|𝗌𝗂𝗓𝖾]1nω(1)\displaystyle\Pr[\mathcal{E}_{\sf heavy}|\mathcal{E}_{\sf size}]\geq 1-n^{-\omega(1)}. ∎

4.3 DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} Implies DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} with Stronger Privacy Bound

In this section, we prove a stronger connection between DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} and DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} than previously known. Together with Theorem 4.1, it implies the private-coin version of Theorem 1.4.

We first need a technical lemma which gives a lower bound on the hockey stick divergence between 𝖡𝖾𝗋(α)+𝖡𝗂𝗇(m,β)\displaystyle\mathsf{Ber}(\alpha)+\mathsf{Bin}(m,\beta) and 𝖡𝖾𝗋(β)+𝖡𝗂𝗇(m,β)\displaystyle\mathsf{Ber}(\beta)+\mathsf{Bin}(m,\beta). We defer its proof to Appendix B.

Lemma 4.8.

There exists an absolute constant c0\displaystyle c_{0} such that, for every integer m1\displaystyle m\geq 1, three reals α,β,ε>0\displaystyle\alpha,\beta,\varepsilon>0 such that α>eεβ\displaystyle\alpha>e^{\varepsilon}\beta, letting Δ=αeεβ\displaystyle\Delta=\alpha-e^{\varepsilon}\beta and supposing 4eεΔβ<1/2\displaystyle 4\frac{e^{\varepsilon}}{\Delta}\beta<1/2, it holds that

dε(𝖡𝖾𝗋(α)+𝖡𝗂𝗇(m,β)||𝖡𝖾𝗋(β)+𝖡𝗂𝗇(m,β))Δ122mexp(c0meεΔβ[log(Δ1)+1]).\displaystyle d_{\varepsilon}(\mathsf{Ber}(\alpha)+\mathsf{Bin}(m,\beta)||\mathsf{Ber}(\beta)+\mathsf{Bin}(m,\beta))\geq\Delta\cdot\frac{1}{2\sqrt{2m}}\cdot\exp\left(-c_{0}\cdot m\cdot\frac{e^{\varepsilon}}{\Delta}\beta\cdot\left[\log(\Delta^{-1})+1\right]\right).
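For moderate m, the left-hand side of Lemma 4.8 can be computed exactly; the Python sketch below (with illustrative parameters of our choosing) builds the two probability mass functions and evaluates the hockey stick divergence, which is useful for getting a feel for the bound:

import numpy as np
from scipy.stats import binom

def ber_plus_bin_pmf(p_ber, m, p_bin):
    # pmf of Ber(p_ber) + Bin(m, p_bin) on {0, 1, ..., m + 1}.
    k = np.arange(m + 1)
    bin_pmf = binom.pmf(k, m, p_bin)
    pmf = np.zeros(m + 2)
    pmf[:m + 1] += (1 - p_ber) * bin_pmf     # Bernoulli outcome 0
    pmf[1:] += p_ber * bin_pmf               # Bernoulli outcome 1 shifts everything by one
    return pmf

def hockey_stick(P, Q, eps):
    # d_eps(P || Q) = sum_x [P_x - e^eps * Q_x]_+ .
    return float(np.sum(np.maximum(P - np.exp(eps) * Q, 0.0)))

m, eps = 1000, 0.5
alpha, beta = 0.05, 1e-4
X = ber_plus_bin_pmf(alpha, m, beta)
Y = ber_plus_bin_pmf(beta, m, beta)
print(hockey_stick(X, Y, eps))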

We are now ready to prove the main lemma of this subsection.

Lemma 4.9.

For all ε=O(1)\displaystyle\varepsilon=O(1), there is a constant c>0\displaystyle c>0 such that for all δδ01/n\displaystyle\delta\leq\delta_{0}\leq 1/n if the randomizer R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} on n\displaystyle n users, then R\displaystyle R is (ln(n/clnδ1lnδ01),δ0)\displaystyle\left(\ln\left(n\Big{/}\frac{c\ln\delta^{-1}}{\ln\delta_{0}^{-1}}\right),\delta_{0}\right)-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}.

Proof.

Let ε=O(1)\displaystyle\varepsilon=O(1). Note that we can assume that δδ0ω(1)nω(1)\displaystyle\delta\leq\delta_{0}^{\omega(1)}\leq n^{-\omega(1)}, as otherwise lnδ1lnδ01O(1)\displaystyle\frac{\ln\delta^{-1}}{\ln\delta_{0}^{-1}}\leq O(1) and in this case the lemma follows directly (for a sufficiently small choice of the constant c\displaystyle c) from the fact that R\displaystyle R is (ε+lnn,δ)\displaystyle(\varepsilon+\ln n,\delta)-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} [CSU+18].

Suppose that R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} on n\displaystyle n users. Let c\displaystyle c be a constant to be fixed later, D=n/clnδ1lnδ01\displaystyle D=n\Big{/}\frac{c\ln\delta^{-1}}{\ln\delta_{0}^{-1}} and E\displaystyle E\subseteq\mathcal{M} be an event. Our goal is to show that

R(x)ER(y)ED+δ0,\displaystyle R(x)_{E}\leq R(y)_{E}\cdot D+\delta_{0},

for all x,y𝒳\displaystyle x,y\in\mathcal{X}.

Fix two x,y𝒳\displaystyle x,y\in\mathcal{X}. Let α=R(x)E\displaystyle\alpha=R(x)_{E} and β=R(y)E\displaystyle\beta=R(y)_{E}. Note that without loss of generality we can assume that β1/D\displaystyle\beta\leq 1/D, as otherwise clearly α1Dβ+δ0\displaystyle\alpha\leq 1\leq D\cdot\beta+\delta_{0}.

Let W1=xyn1\displaystyle W_{1}=xy^{n-1} and W2=yn\displaystyle W_{2}=y^{n} be two neighboring datasets, and X,Y\displaystyle X,Y be the random variables corresponding to the number of occurrences of the event E\displaystyle E in the transcript, when running the protocol with randomizer R\displaystyle R on datasets W1\displaystyle W_{1} and W2\displaystyle W_{2}, respectively.

From the assumption that R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1}, we have dε(𝖧𝗂𝗌𝗍R(W1)||𝖧𝗂𝗌𝗍R(W2))δ\displaystyle d_{\varepsilon}(\mathsf{Hist}_{R}(W_{1})||\mathsf{Hist}_{R}(W_{2}))\leq\delta. Then, by the post-processing property of DP (Lemma 3.2), it follows that dε(X||Y)δ\displaystyle d_{\varepsilon}(X||Y)\leq\delta.

We have that

X=𝖡𝖾𝗋(α)+𝖡𝗂𝗇(n1,β) and Y=𝖡𝖾𝗋(β)+𝖡𝗂𝗇(n1,β).\displaystyle X=\mathsf{Ber}(\alpha)+\mathsf{Bin}(n-1,\beta)\text{ and }Y=\mathsf{Ber}(\beta)+\mathsf{Bin}(n-1,\beta).

The goal now is to show that if α>βD+δ0\displaystyle\alpha>\beta\cdot D+\delta_{0}, then X\displaystyle X and Y\displaystyle Y do not satisfy (ε,δ)\displaystyle(\varepsilon,\delta)-DP (i.e., dε(X||Y)>δ\displaystyle d_{\varepsilon}(X||Y)>\delta), thus obtaining a contradiction.

Now, assume that α>Dβ+δ0\displaystyle\alpha>D\cdot\beta+\delta_{0}. Since ε=O(1)\displaystyle\varepsilon=O(1), we have that Δ=αeεβ>D2β+δ0\displaystyle\Delta=\alpha-e^{\varepsilon}\beta>\frac{D}{2}\cdot\beta+\delta_{0}. Note that 4eεΔβ=O(D1)<1/2\displaystyle 4\frac{e^{\varepsilon}}{\Delta}\beta=O(D^{-1})<1/2.

Letting m=n1\displaystyle m=n-1 and applying Lemma 4.8 for a universal constant c0\displaystyle c_{0}, it follows that

dε(𝖡𝖾𝗋(α)+𝖡𝗂𝗇(m,β)||𝖡𝖾𝗋(β)+𝖡𝗂𝗇(m,β))Δ122mexp(c0meεΔβ[log(Δ1)+1]).\displaystyle d_{\varepsilon}(\mathsf{Ber}(\alpha)+\mathsf{Bin}(m,\beta)||\mathsf{Ber}(\beta)+\mathsf{Bin}(m,\beta))\geq\Delta\cdot\frac{1}{2\sqrt{2m}}\cdot\exp\left(-c_{0}\cdot m\cdot\frac{e^{\varepsilon}}{\Delta}\beta\cdot\left[\log(\Delta^{-1})+1\right]\right).

Noting that ε=O(1)\displaystyle\varepsilon=O(1), we have that

meεΔβ[log(Δ1)+1]O(m1Dββlogδ01)=O(clnδ1).\displaystyle m\cdot\frac{e^{\varepsilon}}{\Delta}\beta\cdot\left[\log(\Delta^{-1})+1\right]\leq O\left(m\frac{1}{D\beta}\cdot\beta\cdot\log\delta_{0}^{-1}\right)=O(c\ln\delta^{-1}).

We now set the constant c\displaystyle c to be small enough so that

c0meεΔβ[log(Δ1)+1]12lnδ1.\displaystyle c_{0}\cdot m\cdot\frac{e^{\varepsilon}}{\Delta}\beta\cdot\left[\log(\Delta^{-1})+1\right]\leq\frac{1}{2}\ln\delta^{-1}.

Plugging in and recalling that δδ0ω(1)nω(1)\displaystyle\delta\leq\delta_{0}^{\omega(1)}\leq n^{-\omega(1)}, we get that

dε(X||Y)=dε(𝖡𝖾𝗋(α)+𝖡𝗂𝗇(m,β)||𝖡𝖾𝗋(β)+𝖡𝗂𝗇(m,β))δ0122mδ>δ,\displaystyle d_{\varepsilon}(X||Y)=d_{\varepsilon}(\mathsf{Ber}(\alpha)+\mathsf{Bin}(m,\beta)||\mathsf{Ber}(\beta)+\mathsf{Bin}(m,\beta))\geq\delta_{0}\cdot\frac{1}{2\sqrt{2m}}\sqrt{\delta}>\delta,

a contradiction. ∎

Finally, we are ready to prove our DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} lower bound for CountDistinct in the private-coin case.

Theorem 4.10.

For all ε=O(1)\displaystyle\varepsilon=O(1), there are δ=2Θ(log8n)\displaystyle\delta=2^{-\Theta(\log^{8}n)} and D=Θ(n/log4n)\displaystyle D=\Theta(n/\log^{4}n) such that no private-coin (ε,δ)\displaystyle(\varepsilon,\delta)-DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} protocol on n\displaystyle n users can solve CountDistinctn,D\displaystyle\textsf{\small CountDistinct}_{n,D} with error o(D)\displaystyle o(D) and probability at least 0.99\displaystyle 0.99.

Proof.

We set δ0=2log2n\displaystyle\delta_{0}=2^{-\log^{2}n} and δ=2clog8n\displaystyle\delta=2^{-c\log^{8}n} for a constant c\displaystyle c to be specified shortly. By Lemma 4.9, it follows that the corresponding randomizer R\displaystyle R is (ln(n/Θ(clog6n)),nω(1))\displaystyle(\ln(n/\Theta(c\log^{6}n)),n^{-\omega(1)})-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}. Setting c\displaystyle c to be sufficiently large and combining with Theorem 4.1 completes the proof. ∎

4.4 Generalizing to Public-Coin Protocols

Finally, we generalize our proof for the private-coin case to the public-coin case, and prove Theorem 1.4 (restated below).


Theorem 1.4. (restated) For all ε=O(1)\displaystyle\varepsilon=O(1), there are δ=2Θ(log8n)\displaystyle\delta=2^{-\Theta(\log^{8}n)} and D=Θ(n/log4n)\displaystyle D=\Theta(n/\log^{4}n) such that no public-coin (ε,δ)\displaystyle(\varepsilon,\delta)-DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} protocol on n\displaystyle n users can solve CountDistinctn,D\displaystyle\textsf{\small CountDistinct}_{n,D} with error o(D)\displaystyle o(D) and probability at least 0.99\displaystyle 0.99.

Fix R\displaystyle R to be a public-coin randomizer with public randomness α\displaystyle\alpha from distribution 𝒟𝗉𝗎𝖻\displaystyle\mathcal{D}_{\sf pub}. We first generalize Lemma 4.9, and show that if R\displaystyle R is DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1}, then with high probability over α𝒟𝗉𝗎𝖻\displaystyle\alpha\leftarrow\mathcal{D}_{\sf pub}, Rα\displaystyle R_{\alpha} satisfies a DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} guarantee similar to that in Lemma 4.9.

Lemma 4.11.

For all ε=O(1)\displaystyle\varepsilon=O(1), there is a constant c>0\displaystyle c>0 such that for all δδ01/n\displaystyle\delta\leq\delta_{0}\leq 1/n if the public-coin randomizer R:𝒳\displaystyle R\colon\mathcal{X}\to\mathcal{M} with public randomness distribution 𝒟𝗉𝗎𝖻\displaystyle\mathcal{D}_{\sf pub} is (ε,δ)\displaystyle(\varepsilon,\delta)-DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} on n\displaystyle n users, and if |𝒳|n\displaystyle|\mathcal{X}|\leq n, then with probability at least 1δ0\displaystyle 1-\delta_{0} over α𝒟𝗉𝗎𝖻\displaystyle\alpha\leftarrow\mathcal{D}_{\sf pub}, it is the case that Rα\displaystyle R_{\alpha} is (ln(n/clnδ1lnδ01),δ0)\displaystyle\left(\ln\left(n\Big{/}\frac{c\ln\delta^{-1}}{\ln\delta_{0}^{-1}}\right),\delta_{0}\right)-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}.

Proof Sketch.

Similar to the proof of Lemma 4.9, we can assume that δδ0ω(1)nω(1)\displaystyle\delta\leq\delta_{0}^{\omega(1)}\leq n^{-\omega(1)} without loss of generality.

By the definition of public-coin (ε,δ)\displaystyle(\varepsilon,\delta)-DP in the shuffle model, for every two neighboring datasets W1\displaystyle W_{1} and W2\displaystyle W_{2}, we have that

𝔼α𝒟𝗉𝗎𝖻[dε(𝖧𝗂𝗌𝗍Rα(W1)||𝖧𝗂𝗌𝗍Rα(W2))]δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\alpha\leftarrow\mathcal{D}_{\sf pub}}[d_{\varepsilon}(\mathsf{Hist}_{R_{\alpha}}(W_{1})||\mathsf{Hist}_{R_{\alpha}}(W_{2}))]\leq\delta.

Observe that the proof of Lemma 4.9 only considers the |𝒳|2\displaystyle|\mathcal{X}|^{2} pairs of neighboring datasets of the form W1=xyn1\displaystyle W_{1}=xy^{n-1} and W2=yn\displaystyle W_{2}=y^{n} for all x,y𝒳\displaystyle x,y\in\mathcal{X}. We use Wgood\displaystyle W_{good} to denote the set of such pairs.

Using the assumption that |𝒳|n\displaystyle|\mathcal{X}|\leq n, we have that

𝔼α𝒟𝗉𝗎𝖻(W1,W2)Wgood[dε(𝖧𝗂𝗌𝗍Rα(W1)||𝖧𝗂𝗌𝗍Rα(W2))]|𝒳|2δn2δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\alpha\leftarrow\mathcal{D}_{\sf pub}}\sum_{(W_{1},W_{2})\in W_{good}}[d_{\varepsilon}(\mathsf{Hist}_{R_{\alpha}}(W_{1})||\mathsf{Hist}_{R_{\alpha}}(W_{2}))]\leq|\mathcal{X}|^{2}\delta\leq n^{2}\delta.

Thus, by Markov’s inequality, with probability at least 1δ0\displaystyle 1-\delta_{0} over α𝒟𝗉𝗎𝖻\displaystyle\alpha\leftarrow\mathcal{D}_{\sf pub}, we have that

(W1,W2)Wgood[dε(𝖧𝗂𝗌𝗍Rα(W1)||𝖧𝗂𝗌𝗍Rα(W2))]n2δ/δ0δ0.9,\displaystyle\sum_{(W_{1},W_{2})\in W_{good}}[d_{\varepsilon}(\mathsf{Hist}_{R_{\alpha}}(W_{1})||\mathsf{Hist}_{R_{\alpha}}(W_{2}))]\leq n^{2}\delta/\delta_{0}\leq\delta^{0.9},

where the last inequality follows from our assumption that δδ0ω(1)nω(1)\displaystyle\delta\leq\delta_{0}^{\omega(1)}\leq n^{-\omega(1)}. We say an α\displaystyle\alpha is good if it satisfies the above inequality. In particular, for all good α\displaystyle\alpha and all pairs (W1,W2)Wgood\displaystyle(W_{1},W_{2})\in W_{good}, we have that

dε(𝖧𝗂𝗌𝗍Rα(W1)||𝖧𝗂𝗌𝗍Rα(W2))δ0.9.\displaystyle d_{\varepsilon}(\mathsf{Hist}_{R_{\alpha}}(W_{1})||\mathsf{Hist}_{R_{\alpha}}(W_{2}))\leq\delta^{0.9}.

The proof of Lemma 4.9 then implies that Rα\displaystyle R_{\alpha} is (ln(n/clnδ1lnδ01),δ0)\displaystyle\left(\ln\left(n\Big{/}\frac{c\ln\delta^{-1}}{\ln\delta_{0}^{-1}}\right),\delta_{0}\right)-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}, for a constant c\displaystyle c depending on ε\displaystyle\varepsilon. ∎

Now we are ready to prove Theorem 1.4.

Proof of Theorem 1.4.

We use 𝒟¯U,E\displaystyle\bar{\mathcal{D}}^{U,E} and 𝒟¯V,E\displaystyle\bar{\mathcal{D}}^{V,E} to denote the same distributions constructed in the proof of Theorem 4.10. We moreover use the same notation as in Section 4.2.

By a simple application of Markov’s inequality and noting that our assumed protocol solves CountDistinctn,D\displaystyle\textsf{\small CountDistinct}_{n,D} with error o(D)\displaystyle o(D) and probability at least 0.99\displaystyle 0.99, it follows that with probability at least 0.9\displaystyle 0.9 over α𝒟𝗉𝗎𝖻\displaystyle\alpha\leftarrow\mathcal{D}_{\sf pub}, if |E|0.02|D|\displaystyle|E|\leq 0.02\cdot|D|, then

𝖧𝗂𝗌𝗍Rα(𝒟¯U,E)𝖧𝗂𝗌𝗍Rα(𝒟¯V,E)TV=Ω(1).\displaystyle\|\mathsf{Hist}_{R_{\alpha}}(\bar{\mathcal{D}}^{U,E})-\mathsf{Hist}_{R_{\alpha}}(\bar{\mathcal{D}}^{V,E})\|_{TV}=\Omega(1).

By Lemma 4.11, with probability at least 1δ0\displaystyle 1-\delta_{0} over α𝒟𝗉𝗎𝖻\displaystyle\alpha\leftarrow\mathcal{D}_{\sf pub}, we have that Rα\displaystyle R_{\alpha} is (ln(n/clnδ1lnδ01),δ0)\displaystyle\left(\ln\left(n\Big{/}\frac{c\ln\delta^{-1}}{\ln\delta_{0}^{-1}}\right),\delta_{0}\right)-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}. We say that such an α\displaystyle\alpha is good.

For all good α\displaystyle\alpha and for a good subset E\displaystyle E, when the randomizer is set to Rα\displaystyle R_{\alpha} (note that the definition of a good subset depends on the randomizer R\displaystyle R), by a similar argument as in Theorem 4.10, we have

𝖧𝗂𝗌𝗍Rα(𝒟¯U,E)𝖧𝗂𝗌𝗍Rα(𝒟¯V,E)TV=o(1).\displaystyle\|\mathsf{Hist}_{R_{\alpha}}(\bar{\mathcal{D}}^{U,E})-\mathsf{Hist}_{R_{\alpha}}(\bar{\mathcal{D}}^{V,E})\|_{TV}=o(1).

Now, by Lemma 4.5, if we construct E\displaystyle E by including each element of D\displaystyle D with probability 0.01\displaystyle 0.01, then for every good α\displaystyle\alpha, we know that E\displaystyle E is good for randomizer Rα\displaystyle R_{\alpha} with probability at least 1nω(1)\displaystyle 1-n^{-\omega(1)}. By a union bound, there exists a fixed choice of E\displaystyle E such that E\displaystyle E is good for randomizer Rα\displaystyle R_{\alpha} with probability at least 11/n\displaystyle 1-1/n over α𝒟𝗉𝗎𝖻\displaystyle\alpha\leftarrow\mathcal{D}_{\sf pub}. In the following, we fix E\displaystyle E to be such a choice.

Finally, by a union bound, it follows that with probability at least 0.9δ01/n>0\displaystyle 0.9-\delta_{0}-1/n>0, the above two inequalities hold simultaneously, a contradiction. ∎

Theorem 1.2 follows from an argument similar to the one in the proof of Theorem 1.4 (in fact, the argument is simpler in the local case because there is no need to apply Lemma 4.11).

5 (ε,δ)\displaystyle(\varepsilon,\delta)-Dominated Algorithms

In [CSU+18], it was shown that an (ε,δ)\displaystyle(\varepsilon,\delta)-DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} protocol on n\displaystyle n users is also (ε+lnn,δ)\displaystyle(\varepsilon+\ln n,\delta)-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}, thereby reducing the problem of proving lower bounds for DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} protocols to that of proving lower bounds against DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} protocols with very weak privacy guarantees. However, it is known that such a connection does not hold even for DPshuffle2\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{2} protocols [BC20, Section 4.1].

Recall the definition of (ε,δ)\displaystyle(\varepsilon,\delta)-dominated algorithms from Definition 1.7. In this section, we will show that DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k} protocols are dominated.

For clarity of exposition, we will assume that each user sends exactly k\displaystyle k messages; this is without loss of generality (see Footnote 12). To handle public-coin protocols, we need a relaxed version of dominated algorithms.

Definition 5.1 (Dominated Algorithms).

For a distribution μ\displaystyle\mu on 𝒳\displaystyle\mathcal{X}, we say an algorithm R\displaystyle R is (ε,δ,μ)\displaystyle(\varepsilon,\delta,\mu)-dominated, if for the distribution 𝒟μ=𝔼xμR(x)\displaystyle\mathcal{D}_{\mu}=\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mu}R(x), there exists a distribution 𝒟\displaystyle\mathcal{D} on k\displaystyle\mathcal{M}^{k} such that

dε(𝒟μ||𝒟)δ.\displaystyle d_{\varepsilon}\left(\mathcal{D}_{\mu}||\mathcal{D}\right)\leq\delta.

In this case, we also say R\displaystyle R is (ε,δ,μ)\displaystyle(\varepsilon,\delta,\mu)-dominated by 𝒟\displaystyle\mathcal{D}.
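
For concreteness, the following Python sketch shows how one can check the condition in Definition 5.1 for a small, explicitly given randomizer, assuming d_ε is the hockey-stick divergence d_ε(P||Q) = Σ_z max(P(z) − e^ε·Q(z), 0); the toy randomizer (binary randomized response), the distribution μ, and the candidate dominating distribution are chosen only for illustration.

import math

def hockey_stick(p, q, eps):
    # d_eps(P || Q) = sum_z max(P(z) - e^eps * Q(z), 0)
    return sum(max(pz - math.exp(eps) * q.get(z, 0.0), 0.0) for z, pz in p.items())

def mixture(dists, weights):
    # convex combination of distributions given as dicts
    out = {}
    for d, w in zip(dists, weights):
        for z, pz in d.items():
            out[z] = out.get(z, 0.0) + w * pz
    return out

def is_dominated(R, mu, D, eps, delta):
    # R maps each input x to its output distribution R(x); mu is a distribution over inputs.
    # Checks d_eps(D_mu || D) <= delta for D_mu = E_{x <- mu} R(x).
    D_mu = mixture([R[x] for x in mu], [mu[x] for x in mu])
    return hockey_stick(D_mu, D, eps) <= delta

# Toy example: binary randomized response, uniform mu, and a candidate dominating distribution.
R = {0: {0: 0.75, 1: 0.25}, 1: {0: 0.25, 1: 0.75}}
mu = {0: 0.5, 1: 0.5}
D = {0: 0.5, 1: 0.5}
print(is_dominated(R, mu, D, eps=0.0, delta=0.0))  # True: here D_mu equals D exactly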

5.1 Approximate-DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} Protocols are Dominated

Next we show that approximate DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} protocols are dominated. For this purpose, we introduce the concept of “pseudo-locally private” algorithms, which is a special case of dominated algorithms, and may be interesting in its own right.

5.1.1 Merged Randomizer

Let n,k\displaystyle\mathcal{B}_{n,k} be the set of all k\displaystyle k-sized subsets of the set [n]×[k]\displaystyle[n]\times[k]. For n,k\displaystyle\mathcal{F}\in\mathcal{B}_{n,k} and a randomizer R\displaystyle R, we define the merged randomizer of R\displaystyle R with respect to \displaystyle\mathcal{F}, denoted by R\displaystyle R^{\mathcal{F}}, as follows:

R(x)\displaystyle R^{\mathcal{F}}(x)

  • Given an input x\displaystyle x, for each j[n]\displaystyle j\in[n], we simulate R(x)\displaystyle R(x) with independent random coins to get an output wjk\displaystyle w_{j}\in\mathcal{M}^{k}.

  • Assume that \displaystyle\mathcal{F} consists of elements (x1,y1),,(xk,yk)[n]×[k]\displaystyle(x_{1},y_{1}),\dotsc,(x_{k},y_{k})\in[n]\times[k] indexed in lexicographical order. We construct a k\displaystyle k-tuple zk\displaystyle z\in\mathcal{M}^{k} such that zi=(wxi)yi\displaystyle z_{i}=(w_{x_{i}})_{y_{i}} for each i[k]\displaystyle i\in[k].

  • We pre-shuffle z\displaystyle z before outputting it. That is, we draw a permutation π:[k][k]\displaystyle\pi\colon[k]\to[k] uniformly at random, shuffle z\displaystyle z according to π\displaystyle\pi to obtain a new k\displaystyle k-tuple z~\displaystyle\widetilde{z} (by setting z~i=zπ(i)\displaystyle\widetilde{z}_{i}=z_{\pi(i)} for each i[k]\displaystyle i\in[k]), and output z~\displaystyle\widetilde{z}.

That is, R(x)\displaystyle R^{\mathcal{F}}(x) runs R(x)\displaystyle R(x) several times, and merges the obtained outputs according to \displaystyle\mathcal{F}. We now define a distribution 𝒟n,k\displaystyle\mathcal{D}_{n,k} on n,k\displaystyle\mathcal{B}_{n,k} as follows: to draw a sample from 𝒟n,k\displaystyle\mathcal{D}_{n,k}, we simply draw k\displaystyle k items {(xi,yi)}i[k]\displaystyle\{(x_{i},y_{i})\}_{i\in[k]} without replacement from the set [n]×[k]\displaystyle[n]\times[k].

Finally, for a randomizer R\displaystyle R, we define the randomizer R𝗋𝖺𝗇𝖽\displaystyle R^{\sf rand} as follows: Given an input x\displaystyle x, we first draw \displaystyle\mathcal{F} from 𝒟n,k\displaystyle\mathcal{D}_{n,k}, and then simulate R\displaystyle R^{\mathcal{F}} on the input x\displaystyle x and output its output.
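
As a concrete illustration, the following Python sketch implements the merged randomizer R^F and the randomizer R^rand exactly as described above: it runs R(x) independently n times, extracts the k coordinates indexed by F (in lexicographical order), pre-shuffles them, and, for R^rand, first draws F by sampling k pairs from [n]×[k] without replacement. The toy two-message randomizer at the end is supplied only so that the sketch runs.

import random

def merged_randomizer(R, x, F, n, k):
    # Run R(x) independently for each of the n virtual users; each run returns a k-tuple.
    w = [R(x) for _ in range(n)]
    # Pick the k messages indexed by F (in lexicographical order), then pre-shuffle.
    z = [w[i][j] for (i, j) in sorted(F)]
    random.shuffle(z)
    return tuple(z)

def R_rand(R, x, n, k):
    # Draw F: k pairs sampled without replacement from [n] x [k].
    F = random.sample([(i, j) for i in range(n) for j in range(k)], k)
    return merged_randomizer(R, x, F, n, k)

# Toy 2-message randomizer over a binary input (for demonstration only).
def toy_R(x, k=2):
    return tuple((x + random.randint(0, 1)) % 2 for _ in range(k))

print(R_rand(toy_R, x=1, n=5, k=2))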

5.1.2 Pseudo-Locally Private Algorithms

We are now ready to define pseudo-locally private algorithms.

Definition 5.2.

We say that an algorithm R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-pseudo-locally private, if for all x,y𝒳\displaystyle x,y\in\mathcal{X} and all Ek\displaystyle E\subseteq\mathcal{M}^{k},

Pr[R(x)E]eεPr[R𝗋𝖺𝗇𝖽(y)E]+δ.\displaystyle\Pr[R(x)\in E]\leq e^{\varepsilon}\cdot\Pr[R^{\sf rand}(y)\in E]+\delta.
Remark 5.3.

If R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-pseudo-locally private, then clearly R\displaystyle R is also (ε,δ)\displaystyle(\varepsilon,\delta)-dominated; we can simply take 𝒟=R𝗋𝖺𝗇𝖽(y)\displaystyle\mathcal{D}=R^{\sf rand}(y^{*}) for any fixed y\displaystyle y^{*}.

5.1.3 Multi-Message DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} Protocols are Pseudo-Locally Private

Our most crucial observation here is an analogue of [CSU+18, Theorem 6.2] for multi-message DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} protocols. Namely, we show that any multi-message DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} protocol is pseudo-locally private.

Lemma 5.4.

If R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k} on n\displaystyle n users, then it is (ε+k(1+lnn),δ)\displaystyle(\varepsilon+k(1+\ln n),\delta)-pseudo-locally private.

Proof.

Suppose otherwise, i.e., that there are x,y\displaystyle x,y and Ek\displaystyle E\subseteq\mathcal{M}^{k} such that

Pr[R(x)E]>(en)keε𝔼𝒟n,k[Pr[R(y)E]]+δ.\displaystyle\displaystyle\Pr[R(x)\in E]>(en)^{k}e^{\varepsilon}\cdot\operatornamewithlimits{\mathbb{E}}_{\mathcal{F}\leftarrow\mathcal{D}_{n,k}}\left[\Pr\left[R^{\mathcal{F}}(y)\in E\right]\right]+\delta.

Note that since both R\displaystyle R and R\displaystyle R^{\mathcal{F}} pre-shuffle their outputs before outputting them, we can assume that E\displaystyle E is a union of equivalence classes of k\displaystyle k-tuples (we say two k\displaystyle k-tuples u,vk\displaystyle u,v\in\mathcal{M}^{k} are equivalent if v\displaystyle v can be obtained from u\displaystyle u by applying a permutation). To see this, we can take E\displaystyle E to be {zk:R(x)z>(en)keεR𝗋𝖺𝗇𝖽(y)z}\displaystyle\{z\in\mathcal{M}^{k}:R(x)_{z}>(en)^{k}e^{\varepsilon}R^{\sf rand}(y)_{z}\}. One can see that if u\displaystyle u and v\displaystyle v are equivalent up to a permutation, then either both u\displaystyle u and v\displaystyle v are in E\displaystyle E, or neither of them is in E\displaystyle E.

Consider two datasets X0=yn\displaystyle X_{0}=y^{n} and X1=yn1x\displaystyle X_{1}=y^{n-1}x. Let P\displaystyle P be the corresponding DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} protocol with randomizer R\displaystyle R. For a dataset X\displaystyle X, we use PR(X)\displaystyle P_{R}(X) to denote the random variable of the transcript of P\displaystyle P before shuffling. That is, for a dataset X=(xi)i[n]\displaystyle X=(x_{i})_{i\in[n]}, PR(X)\displaystyle P_{R}(X) is the concatenation of all R(Xi)\displaystyle R(X_{i}) for i\displaystyle i from 1\displaystyle 1 to n\displaystyle n.

We now define an event \displaystyle\mathcal{E} as “there exist k\displaystyle k messages in the transcript of P\displaystyle P that form a k\displaystyle k-tuple in E\displaystyle E”. It immediately follows that

Pr[PR(X1)]Pr[R(x)E].\displaystyle\displaystyle\Pr[P_{R}(X_{1})\in\mathcal{E}]\geq\Pr[R(x)\in E]. (2)

Furthermore, we claim that

Pr[PR(X0)](knk)𝔼𝒟n,k[Pr[R(y)E]].\displaystyle\displaystyle\Pr[P_{R}(X_{0})\in\mathcal{E}]\leq\binom{kn}{k}\cdot\operatornamewithlimits{\mathbb{E}}_{\mathcal{F}\leftarrow\mathcal{D}_{n,k}}\left[\Pr\left[R^{\mathcal{F}}(y)\in E\right]\right]. (3)

To see why the above inequality holds, note that if we pick k\displaystyle k messages from PR(X0)\displaystyle P_{R}(X_{0}), then, depending on which users these messages come from, the probability that they form a k\displaystyle k-tuple in E\displaystyle E is bounded by Pr[R(y)E]\displaystyle\Pr[R^{\mathcal{F}}(y)\in E] for a certain n,k\displaystyle\mathcal{F}\in\mathcal{B}_{n,k}.

Moreover, if we pick these k\displaystyle k messages uniformly at random from all (knk)\displaystyle\binom{kn}{k} possible k\displaystyle k-tuples, the corresponding \displaystyle\mathcal{F} is distributed according to 𝒟n,k\displaystyle\mathcal{D}_{n,k}. Therefore, we can apply a union bound over all (knk)\displaystyle\binom{kn}{k} possible k\displaystyle k-tuples and sum up the corresponding Pr[R(y)E]\displaystyle\Pr[R^{\mathcal{F}}(y)\in E] to obtain an upper bound on Pr[PR(X0)]\displaystyle\Pr[P_{R}(X_{0})\in\mathcal{E}]. The aforementioned sum is precisely (knk)\displaystyle\binom{kn}{k} times the expectation 𝔼𝒟n,k[Pr[R(y)E]]\displaystyle\operatornamewithlimits{\mathbb{E}}_{\mathcal{F}\leftarrow\mathcal{D}_{n,k}}\left[\Pr\left[R^{\mathcal{F}}(y)\in E\right]\right].

Since (knk)(en)k\displaystyle\binom{kn}{k}\leq(en)^{k}, we may combine (2) and (3) to obtain

Pr[PR(X1)]>eεPr[PR(X0)]+δ.\Pr[P_{R}(X_{1})\in\mathcal{E}]>e^{\varepsilon}\cdot\Pr[P_{R}(X_{0})\in\mathcal{E}]+\delta. (4)

Finally, note that applying a random permutation to the transcript does not change whether the event \displaystyle\mathcal{E} occurs. Therefore, (4) contradicts the assumption that R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}}. ∎
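
The counting step in the proof only uses the elementary bound (kn choose k) ≤ (en)^k; the following check (illustrative only, not part of the argument) confirms it for a few small values of n and k.

import math

for n in (2, 5, 10, 100):
    for k in (1, 2, 5, 10):
        assert math.comb(k * n, k) <= (math.e * n) ** k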

From Remark 5.3, we get the following corollary:

Corollary 5.5.

If R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k} on n\displaystyle n users, then it is (ε+k(1+lnn),δ)\displaystyle(\varepsilon+k(1+\ln n),\delta)-dominated.

Next, we generalize Corollary 5.5 to the public-coin case:

Lemma 5.6.

If R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k} in the n\displaystyle n-user public-coin setting with public randomness from 𝒟𝗉𝗎𝖻\displaystyle\mathcal{D}_{\sf pub} and τ=ε+k(1+lnn)\displaystyle\tau=\varepsilon+k(1+\ln n), then there is a family of distributions {𝒟α}αsupp(𝒟𝗉𝗎𝖻)\displaystyle\{\mathcal{D}_{\alpha}\}_{\alpha\in\mathrm{supp}(\mathcal{D}_{\sf pub})} over k\displaystyle\mathcal{M}^{k} such that for every distribution μ\displaystyle\mu over 𝒳\displaystyle\mathcal{X},

𝔼α𝒟𝗉𝗎𝖻dτ(𝔼xμRα(x)||𝒟α)δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\alpha\leftarrow\mathcal{D}_{\sf pub}}d_{\tau}\left(\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mu}R_{\alpha}(x)||\mathcal{D}_{\alpha}\right)\leq\delta.

In other words, there are reals {δα}αsupp(𝒟𝗉𝗎𝖻)\displaystyle\{\delta_{\alpha}\}_{\alpha\in\mathrm{supp}(\mathcal{D}_{\sf pub})} such that 𝔼α𝒟𝗉𝗎𝖻[δα]δ\displaystyle\operatornamewithlimits{\mathbb{E}}_{\alpha\leftarrow\mathcal{D}_{\sf pub}}[\delta_{\alpha}]\leq\delta and Rα\displaystyle R_{\alpha} is (τ,δα,μ)\displaystyle(\tau,\delta_{\alpha},\mu)-dominated by 𝒟α\displaystyle\mathcal{D}_{\alpha}.

Proof.

From the proof of Lemma 5.4, it follows that for all x,y𝒳\displaystyle x,y\in\mathcal{X} and αsupp(𝒟𝗉𝗎𝖻)\displaystyle\alpha\in\mathrm{supp}(\mathcal{D}_{\sf pub}), we have that

dτ(Rα(x)||Rα𝗋𝖺𝗇𝖽(y))dε(PRα(xyn1)||PRα(yn)).\displaystyle d_{\tau}(R_{\alpha}(x)||R^{\sf rand}_{\alpha}(y))\leq d_{\varepsilon}(P_{R_{\alpha}}(xy^{n-1})||P_{R_{\alpha}}(y^{n})).

Since R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} on n\displaystyle n users, it follows that

𝔼α𝒟𝗉𝗎𝖻[dε(PRα(xyn1)||PRα(yn))]δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\alpha\leftarrow\mathcal{D}_{\sf pub}}[d_{\varepsilon}(P_{R_{\alpha}}(xy^{n-1})||P_{R_{\alpha}}(y^{n}))]\leq\delta.

Putting the above two inequalities together, for all x,y𝒳\displaystyle x,y\in\mathcal{X}, we get that

𝔼α𝒟𝗉𝗎𝖻[dτ(Rα(x)||Rα𝗋𝖺𝗇𝖽(y))]δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\alpha\leftarrow\mathcal{D}_{\sf pub}}[d_{\tau}(R_{\alpha}(x)||R^{\sf rand}_{\alpha}(y))]\leq\delta.

We now finish the proof by fixing y𝒳\displaystyle y^{*}\in\mathcal{X}, and setting 𝒟α=Rα𝗋𝖺𝗇𝖽(y)\displaystyle\mathcal{D}_{\alpha}=R^{\sf rand}_{\alpha}(y^{*}) for every αsupp(𝒟𝗉𝗎𝖻)\displaystyle\alpha\in\mathrm{supp}(\mathcal{D}_{\sf pub}). ∎

5.2 Bounding KL Divergence for Dominated Randomizers

In this subsection, we prove the technical lemma bounding average-case KL divergences for dominated randomizers.

As before, we use 𝒳\displaystyle\mathcal{X} and \displaystyle\mathcal{M} to denote the input space and the message space respectively. For a local randomizer R:𝒳\displaystyle R\colon\mathcal{X}\to\mathcal{M}, we let px,z=Pr[R(x)=z]\displaystyle p_{x,z}=\Pr[R(x)=z].

Let μ\displaystyle\mu be a distribution on 𝒳\displaystyle\mathcal{X}. Let \displaystyle\mathcal{I} be an index set, π\displaystyle\pi be a distribution on \displaystyle\mathcal{I}, and {λv}v\displaystyle\{\lambda_{v}\}_{v\in\mathcal{I}} be a family of distributions on 𝒳\displaystyle\mathcal{X}. For a constant τ\displaystyle\tau, we say that μ\displaystyle\mu τ\displaystyle\tau-dominates {λv}\displaystyle\{\lambda_{v}\} if for all x𝒳\displaystyle x\in\mathcal{X} and v\displaystyle v\in\mathcal{I}, it holds that (λv)xτμx\displaystyle(\lambda_{v})_{x}\leq\tau\cdot\mu_{x}.

Theorem 5.7.

For a constant τ2\displaystyle\tau\geq 2, let μ\displaystyle\mu be a distribution which τ\displaystyle\tau-dominates a distribution family {λv}v\displaystyle\{\lambda_{v}\}_{v\in\mathcal{I}}. Let π\displaystyle\pi be a distribution on \displaystyle\mathcal{I}. Let W:\displaystyle W\colon\mathbb{R}\to\mathbb{R} be a concave function such that for all functions ψ:𝒳0\displaystyle\psi\colon\mathcal{X}\to\mathbb{R}^{\geq 0} satisfying ψ(μ)1\displaystyle\psi(\mu)\leq 1, it holds that

𝔼vπ[(ψ(λv)ψ(μ))2]W(ψ).\displaystyle\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\left[(\psi(\lambda_{v})-\psi(\mu))^{2}\right]\leq W(\|\psi\|_{\infty}).

Then for an (ε,δ,μ)\displaystyle(\varepsilon,\delta,\mu)-dominated randomizer R\displaystyle R, it holds that

𝔼vπ[KL(R(λv)||R(μ))]2W(2eε)+4(τ1)2δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}[\mathrm{KL}(R(\lambda_{v})||R(\mu))]\leq 2W(2e^{\varepsilon})+4(\tau-1)^{2}\cdot\delta.
Proof.

Let Q=R(μ)\displaystyle Q=R(\mu). Recall that px,z=Pr[R(x)=z]\displaystyle p_{x,z}=\Pr[R(x)=z]. We also set qz=Pr[Q=z]\displaystyle q_{z}=\Pr[Q=z] and fz(x)=px,zqz\displaystyle f_{z}(x)=\frac{p_{x,z}}{q_{z}}.

It follows from the assumption that there exists a distribution q𝒟\displaystyle q^{\mathcal{D}} that (ε,δ,μ)\displaystyle(\varepsilon,\delta,\mu)-dominates R\displaystyle R. Noting that χ2\displaystyle\chi^{2}-divergence upper-bounds KL divergence (see Section 3.4), it follows that

𝔼vπKL(R(λv)||Q)\displaystyle\displaystyle\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\mathrm{KL}(R(\lambda_{v})||Q) 𝔼vπχ2(R(λv)||Q)\displaystyle\displaystyle\leq\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\chi^{2}(R(\lambda_{v})||Q)
𝔼vπ𝔼zQ[R(λv)zqzqz]2\displaystyle\displaystyle\leq\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}\left[\frac{R(\lambda_{v})_{z}-q_{z}}{q_{z}}\right]^{2}
=𝔼zQ𝔼vπ[fz(λv)1]2.\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\left[f_{z}(\lambda_{v})-1\right]^{2}.

We will further decompose fz=gz+hz\displaystyle f_{z}=g_{z}+h_{z} so that gz\displaystyle\|g_{z}\|_{\infty} is small and 𝔼zQhz(μ)\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}h_{z}(\mu) is small. Formally, for each z\displaystyle z\in\mathcal{M}, we define a truncation level

Lz=2eεqz𝒟qz.\displaystyle L_{z}=\frac{2e^{\varepsilon}\cdot q_{z}^{\mathcal{D}}}{q_{z}}.

Then, we define gz\displaystyle g_{z} and hz\displaystyle h_{z} as follows

gz(x):={fz(x)if fz(x)Lz,0otherwise,andhz(x):=fz(x)gz(x).\displaystyle g_{z}(x):=\begin{cases}f_{z}(x)&\text{if }f_{z}(x)\leq L_{z},\\ 0&\text{otherwise},\end{cases}\quad\quad\text{and}\quad\quad h_{z}(x):=f_{z}(x)-g_{z}(x).

Fix a z\displaystyle z in the support of Q\displaystyle Q. Noting that

gz(μ)+hz(μ)=fz(μ)=𝔼xμ[px,z]qz=1,\displaystyle g_{z}(\mu)+h_{z}(\mu)=f_{z}(\mu)=\frac{\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mu}[p_{x,z}]}{q_{z}}=1,

we get

𝔼vπ[fz(λv)1]2\displaystyle\displaystyle\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\left[f_{z}(\lambda_{v})-1\right]^{2} =𝔼vπ[(gz(λv)gz(μ))+(hz(λv)hz(μ))]2\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\left[(g_{z}(\lambda_{v})-g_{z}(\mu))+(h_{z}(\lambda_{v})-h_{z}(\mu))\right]^{2}
2𝔼vπ[(gz(λv)gz(μ))2]+2𝔼vπ[(hz(λv)hz(μ))2].\displaystyle\displaystyle\leq 2\cdot\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}[(g_{z}(\lambda_{v})-g_{z}(\mu))^{2}]+2\cdot\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}[(h_{z}(\lambda_{v})-h_{z}(\mu))^{2}]. (5)

To simplify the notation, in the following we set g^z(λv):=gz(λv)gz(μ)\displaystyle\hat{g}_{z}(\lambda_{v}):=g_{z}(\lambda_{v})-g_{z}(\mu) and h^z(λv)=hz(λv)hz(μ)\displaystyle\hat{h}_{z}(\lambda_{v})=h_{z}(\lambda_{v})-h_{z}(\mu). We will bound the two terms in (5) separately.

Bounding 𝔼vπg^z(λv)2\displaystyle\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\hat{g}_{z}(\lambda_{v})^{2}.

Since W\displaystyle W is concave, noting that gzLz\displaystyle\|g_{z}\|_{\infty}\leq L_{z} and gz(μ)1\displaystyle g_{z}(\mu)\leq 1, it follows that

𝔼zQ𝔼vπg^z(λv)2𝔼zQW(Lz)W(𝔼zQLz),\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\hat{g}_{z}(\lambda_{v})^{2}\leq\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}W(L_{z})\leq W\left(\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}L_{z}\right),

where the second step uses Jensen’s inequality. From the definition of Lz\displaystyle L_{z}, we have that

𝔼zQLz=zqz2eεqz𝒟qz=2eεzqz𝒟=2eε,\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}L_{z}=\sum_{z}q_{z}\cdot\frac{2e^{\varepsilon}\cdot q^{\mathcal{D}}_{z}}{q_{z}}=2e^{\varepsilon}\cdot\sum_{z}q^{\mathcal{D}}_{z}=2e^{\varepsilon},

where the last equality follows from the fact that q𝒟\displaystyle q^{\mathcal{D}} is a distribution. We therefore obtain

𝔼zQ𝔼vπg^z(λv)2W(2eε).\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\hat{g}_{z}(\lambda_{v})^{2}\leq W(2e^{\varepsilon}).
Bounding 𝔼vπh^z(λv)2\displaystyle\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\hat{h}_{z}(\lambda_{v})^{2}.

Since μ\displaystyle\mu τ\displaystyle\tau-dominates {λv}\displaystyle\{\lambda_{v}\}, it follows that

|h^z(λv)|=|hz(λv)hz(μ)|max{hz(μ),τhz(μ)hz(μ)}(τ1)hz(μ).\displaystyle|\hat{h}_{z}(\lambda_{v})|=|h_{z}(\lambda_{v})-h_{z}(\mu)|\leq\max\left\{h_{z}(\mu),\tau\cdot h_{z}(\mu)-h_{z}(\mu)\right\}\leq(\tau-1)h_{z}(\mu).

Therefore,

𝔼zQ𝔼vπh^z(λv)2(τ1)2𝔼zQhz(μ)2(τ1)2𝔼zQhz(μ),\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\hat{h}_{z}(\lambda_{v})^{2}\leq(\tau-1)^{2}\cdot\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}h_{z}(\mu)^{2}\leq(\tau-1)^{2}\cdot\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}h_{z}(\mu),

where the last inequality holds because hz(μ)1\displaystyle h_{z}(\mu)\leq 1.

By the definition of hz\displaystyle h_{z}, it follows that

𝔼zQhz(μ)=z𝔼xμ[px,z𝟙[fz(x)>Lz]].\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}h_{z}(\mu)=\sum_{z\in\mathcal{M}}\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mu}\Big{[}p_{x,z}\cdot\mathbb{1}[f_{z}(x)>L_{z}]\Big{]}.

Let 𝒯x={z:fz(x)>Lz}\displaystyle\mathcal{T}_{x}=\{z\in\mathcal{M}:f_{z}(x)>L_{z}\}. For z𝒯x\displaystyle z\in\mathcal{T}_{x}, we get that

px,z\displaystyle\displaystyle p_{x,z} >Lzqz\displaystyle\displaystyle>L_{z}\cdot q_{z}
=2eεqz𝒟qzqz=2eεqz𝒟.\displaystyle\displaystyle=\frac{2e^{\varepsilon}\cdot q_{z}^{\mathcal{D}}}{q_{z}}\cdot q_{z}=2\cdot e^{\varepsilon}\cdot q^{\mathcal{D}}_{z}.

In particular, the above means that 𝔼xμpx,𝒯x2δ\displaystyle\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mu}p_{x,\mathcal{T}_{x}}\leq 2\delta, as otherwise

𝔼xμ[px,𝒯xeεq𝒯x𝒟]𝔼xμ[px,𝒯x]/2>δ,\displaystyle\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mu}\left[p_{x,\mathcal{T}_{x}}-e^{\varepsilon}\cdot q^{\mathcal{D}}_{\mathcal{T}_{x}}\right]\geq\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mu}[p_{x,\mathcal{T}_{x}}]/2>\delta,

contradicting the fact that R\displaystyle R is (ε,δ,μ)\displaystyle(\varepsilon,\delta,\mu)-dominated by q𝒟\displaystyle q^{\mathcal{D}}. Hence, we have

𝔼zQhz(μ)=z𝔼xμ[px,z𝟙[fz(x)>Lz]]=𝔼xμpx,𝒯x2δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}h_{z}(\mu)=\sum_{z\in\mathcal{M}}\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mu}\Big{[}p_{x,z}\cdot\mathbb{1}[f_{z}(x)>L_{z}]\Big{]}=\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mu}p_{x,\mathcal{T}_{x}}\leq 2\delta.

Putting everything together, it follows that

𝔼zQ𝔼vπh^z(λv)22(τ1)2δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\hat{h}_{z}(\lambda_{v})^{2}\leq 2(\tau-1)^{2}\cdot\delta.
Final Bound.

Combining our bounds on 𝔼zQ𝔼vπg^z(λv)2\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\hat{g}_{z}(\lambda_{v})^{2} and 𝔼zQ𝔼vπh^z(λv)2\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow Q}\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}\hat{h}_{z}(\lambda_{v})^{2}, we conclude that

𝔼vπ[KL(R(λv)||R(μ))]2W(2eε)+4(τ1)2δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}[\mathrm{KL}(R(\lambda_{v})||R(\mu))]\leq 2W(2e^{\varepsilon})+4(\tau-1)^{2}\cdot\delta.\qed
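
The first step of the proof uses only the standard fact that the χ²-divergence upper-bounds the KL divergence; the short numerical sketch below illustrates this inequality on arbitrary toy distributions (chosen at random, for illustration only).

import math, random

def kl(p, q):
    return sum(pz * math.log(pz / q[z]) for z, pz in p.items() if pz > 0)

def chi2(p, q):
    return sum((p.get(z, 0.0) - qz) ** 2 / qz for z, qz in q.items())

random.seed(0)
for _ in range(5):
    w = [random.random() + 0.01 for _ in range(4)]
    v = [random.random() + 0.01 for _ in range(4)]
    p = {i: x / sum(w) for i, x in enumerate(w)}
    q = {i: x / sum(v) for i, x in enumerate(v)}
    assert kl(p, q) <= chi2(p, q) + 1e-12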

6 Lower Bounds for Selection and ParityLearning

In this section, we prove lower bounds for Selection and ParityLearning in the DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} model. We begin with some notation.

6.1 Notation

For (,s)[2]×{0,1}D\displaystyle(\ell,s)\in[2]\times\{0,1\}^{D}, let 𝒟,s\displaystyle\mathcal{D}_{\ell,s} be the uniform distribution on {x{0,1}D:x,s=}\displaystyle\{x\in\{0,1\}^{D}:\langle x,s\rangle=\ell\}. Recall that 𝒰D\displaystyle\mathcal{U}_{D} is the uniform distribution on {0,1}D\displaystyle\{0,1\}^{D}.

For j[D]\displaystyle j\in[D], let ej\displaystyle e_{j} be the D\displaystyle D-bit string such that only the j\displaystyle j-th bit is 1\displaystyle 1, and the other bits are 0\displaystyle 0. For (,j)[2]×[D]\displaystyle(\ell,j)\in[2]\times[D], we denote by 𝒟,ej\displaystyle\mathcal{D}_{\ell,e_{j}} the uniform distribution on all length-D\displaystyle D Boolean strings with j\displaystyle j-th bit being \displaystyle\ell. For simplicity, we also use 𝒟,j\displaystyle\mathcal{D}_{\ell,j} to denote 𝒟,ej\displaystyle\mathcal{D}_{\ell,e_{j}} when the context is clear.

We need the following simple proposition.

Proposition 6.1.

For every function f:{0,1}D\displaystyle f\colon\{0,1\}^{D}\to\mathbb{R} and s{0,1}D\displaystyle s\in\{0,1\}^{D},

f^(s)=12(f(𝒟0,s)f(𝒟1,s)).\displaystyle\hat{f}(s)=\frac{1}{2}(f(\mathcal{D}_{0,s})-f(\mathcal{D}_{1,s})).
Proof.

By definition, we have that

f^(s)=𝔼x𝒰D(1)s,xf(x)=12(f(𝒟0,s)f(𝒟1,s)).\displaystyle\hat{f}(s)=\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{U}_{D}}(-1)^{\langle s,x\rangle}f(x)=\frac{1}{2}(f(\mathcal{D}_{0,s})-f(\mathcal{D}_{1,s})).\qed
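
Proposition 6.1 is simply the Fourier coefficient 𝔼_{x←𝒰_D}[(−1)^{⟨s,x⟩} f(x)] rewritten in terms of the two conditional distributions. The brute-force check below (over all of {0,1}^D for a small D and a randomly chosen f, purely as a sanity check) verifies the identity numerically for every nonzero s.

import itertools, random

D = 4
random.seed(1)
f = {x: random.random() for x in itertools.product((0, 1), repeat=D)}

def inner(x, s):
    return sum(a * b for a, b in zip(x, s)) % 2

def avg(vals):
    vals = list(vals)
    return sum(vals) / len(vals)

for s in itertools.product((0, 1), repeat=D):
    if s == (0,) * D:
        continue  # for s = 0 the set {x : <x, s> = 1} is empty
    fhat = avg((-1) ** inner(x, s) * f[x] for x in f)
    f0 = avg(f[x] for x in f if inner(x, s) == 0)
    f1 = avg(f[x] for x in f if inner(x, s) == 1)
    assert abs(fhat - 0.5 * (f0 - f1)) < 1e-9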

6.2 Lower Bound for Selection

We begin with lower bounds for Selection.

Lemma 6.2.

For ε>0\displaystyle\varepsilon>0, if R\displaystyle R is (ε,δ,𝒰D)\displaystyle(\varepsilon,\delta,\mathcal{U}_{D})-dominated, then we have

𝔼(,j)[2]×[D][KL(R(𝒟,j)||R(𝒰D))]O(εD+δ).\displaystyle\operatornamewithlimits{\mathbb{E}}_{(\ell,j)\in[2]\times[D]}[\mathrm{KL}(R(\mathcal{D}_{\ell,j})||R(\mathcal{U}_{D}))]\leq O\left(\frac{\varepsilon}{D}+\delta\right).
Proof.

To apply Theorem 5.7, we set the index set as =[2]×[D]\displaystyle\mathcal{I}=[2]\times[D], the distribution π\displaystyle\pi to the uniform distribution over \displaystyle\mathcal{I}, {λv}v={𝒟v}v\displaystyle\{\lambda_{v}\}_{v\in\mathcal{I}}=\{\mathcal{D}_{v}\}_{v\in\mathcal{I}}, and μ=𝒰D\displaystyle\mu=\mathcal{U}_{D}.

Clearly, μ\displaystyle\mu 2\displaystyle 2-dominates {λv}\displaystyle\{\lambda_{v}\}. Let f\displaystyle f be a function such that f=L\displaystyle\|f\|_{\infty}=L and f(μ)1\displaystyle f(\mu)\leq 1. It follows that

𝔼vπ(f(μ)f(λv))2\displaystyle\displaystyle\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}(f(\mu)-f(\lambda_{v}))^{2} =𝔼(,j)[2]×[D](f(𝒟,j)f(𝒰D))2\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{(\ell,j)\in[2]\times[D]}(f(\mathcal{D}_{\ell,j})-f(\mathcal{U}_{D}))^{2}
=𝔼(,j)[2]×[D]14(f(𝒟,j)f(𝒟1,j))2\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{(\ell,j)\in[2]\times[D]}\frac{1}{4}(f(\mathcal{D}_{\ell,j})-f(\mathcal{D}_{1-\ell,j}))^{2}
=𝔼(,j)[2]×[D]f^({j})2.\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{(\ell,j)\in[2]\times[D]}\hat{f}(\{j\})^{2}. (Proposition 6.1)

By Lemma 3.7, it follows that

𝔼vπ(f(μ)f(λv))2=𝔼(,j)[2]×[D]f^({j})2O(lnLD).\displaystyle\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}(f(\mu)-f(\lambda_{v}))^{2}=\operatornamewithlimits{\mathbb{E}}_{(\ell,j)\in[2]\times[D]}\hat{f}(\{j\})^{2}\leq O\left(\frac{\ln L}{D}\right).

Therefore, we can set W(L):=ClnLD\displaystyle W(L):=\frac{C\cdot\ln L}{D} for a large enough constant C\displaystyle C and note that W\displaystyle W is a concave function. By Theorem 5.7, it follows that

𝔼(,j)[2]×[D][KL(R(𝒟,j)||R(𝒰D))]O(W(2eε)+δ)O(εD+δ).\displaystyle\operatornamewithlimits{\mathbb{E}}_{(\ell,j)\in[2]\times[D]}[\mathrm{KL}(R(\mathcal{D}_{\ell,j})||R(\mathcal{U}_{D}))]\leq O(W(2e^{\varepsilon})+\delta)\leq O\left(\frac{\varepsilon}{D}+\delta\right).\qed
Lemma 6.3.

For a public-coin randomizer R\displaystyle R with public randomness from 𝒟𝗉𝗎𝖻\displaystyle\mathcal{D}_{\sf pub}, if there is a family of distributions {𝒟α}αsupp(𝒟𝗉𝗎𝖻)\displaystyle\{\mathcal{D}_{\alpha}\}_{\alpha\in\mathrm{supp}(\mathcal{D}_{\sf pub})} over k\displaystyle\mathcal{M}^{k} such that

𝔼α𝒟𝗉𝗎𝖻dε(𝔼x𝒰DRα(x)||𝒟α)o(1/D),\displaystyle\operatornamewithlimits{\mathbb{E}}_{\alpha\leftarrow\mathcal{D}_{\sf pub}}d_{\varepsilon}\left(\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{U}_{D}}R_{\alpha}(x)||\mathcal{D}_{\alpha}\right)\leq o(1/D),

then a public-coin protocol with randomizer R\displaystyle R needs at least Ω(DlogDε)\displaystyle\Omega\left(\frac{D\log D}{\varepsilon}\right) samples to solve Selection with probability at least 0.99\displaystyle 0.99.

Proof.

Let L,J\displaystyle L,J be uniformly random over [2]×[D]\displaystyle[2]\times[D], and X1,X2,,Xn\displaystyle X_{1},X_{2},\dotsc,X_{n} be n\displaystyle n i.i.d. samples from DL,J\displaystyle D_{L,J}. For each i[n]\displaystyle i\in[n], we draw Zi\displaystyle Z_{i} from R(Xi)\displaystyle R(X_{i}).

Let Pα(Z1,Z2,,Zn)\displaystyle P_{\alpha}(Z_{1},Z_{2},\dotsc,Z_{n}) be the output of the protocol with public randomness fixed to α\displaystyle\alpha, and let Fα(Z1,Z2,,Zn):=(1,Pα(Z1,Z2,,Zn))\displaystyle F_{\alpha}(Z_{1},Z_{2},\dotsc,Z_{n}):=(1,P_{\alpha}(Z_{1},Z_{2},\dotsc,Z_{n})). Assuming nΘ(logD)\displaystyle n\geq\Theta(\log D), it follows that Fα(Z1,Z2,,Zn)=(L,J)\displaystyle F_{\alpha}(Z_{1},Z_{2},\dotsc,Z_{n})=(L,J) with probability at least 0.990.010.98\displaystyle 0.99-0.01\geq 0.98 over the randomness of α𝒟𝗉𝗎𝖻\displaystyle\alpha\leftarrow\mathcal{D}_{\sf pub} and randomness in Fα\displaystyle F_{\alpha}, conditioned on the event L=1\displaystyle L=1.

By Markov’s inequality, with probability at least 0.8\displaystyle 0.8 over α𝒟𝗉𝗎𝖻\displaystyle\alpha\leftarrow\mathcal{D}_{\sf pub}, we have that Fα(Z1,Z2,,Zn)=(L,J)\displaystyle F_{\alpha}(Z_{1},Z_{2},\dotsc,Z_{n})=(L,J) with probability at least 0.8\displaystyle 0.8 over the randomness in Fα\displaystyle F_{\alpha} conditioned on the event L=1\displaystyle L=1.

From our assumption and Markov’s inequality, it follows that with probability at least 0.99\displaystyle 0.99 over α𝒟𝗉𝗎𝖻\displaystyle\alpha\leftarrow\mathcal{D}_{\sf pub}, we have dε(𝔼x𝒰DRα(x)||𝒟α)o(1/D)\displaystyle d_{\varepsilon}\left(\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{U}_{D}}R_{\alpha}(x)||\mathcal{D}_{\alpha}\right)\leq o(1/D). That is, Rα\displaystyle R_{\alpha} is (ε,o(1/D),𝒰D)\displaystyle(\varepsilon,o(1/D),\mathcal{U}_{D})-dominated.

By a union bound, with probability at least 0.990.2>0\displaystyle 0.99-0.2>0 over α𝒟𝗉𝗎𝖻\displaystyle\alpha\leftarrow\mathcal{D}_{\sf pub}, we have Fα(Z1,Z2,,Zn)=(L,J)\displaystyle F_{\alpha}(Z_{1},Z_{2},\dotsc,Z_{n})=(L,J) with probability at least 0.8/20.4\displaystyle 0.8/2\geq 0.4, and Rα\displaystyle R_{\alpha} is (ε,o(1/D),𝒰D)\displaystyle(\varepsilon,o(1/D),\mathcal{U}_{D})-dominated. In the following, we fix such an α\displaystyle\alpha.

By Lemma 6.2, for some βO(εD)\displaystyle\beta\leq O\left(\frac{\varepsilon}{D}\right), we have that

𝔼(,j)[2]×[D][KL(Rα(𝒟,j)||Rα(𝒰D))]β.\displaystyle\operatornamewithlimits{\mathbb{E}}_{(\ell,j)\in[2]\times[D]}[\mathrm{KL}(R_{\alpha}(\mathcal{D}_{\ell,j})||R_{\alpha}(\mathcal{U}_{D}))]\leq\beta.

By Fano’s inequality,

Pr[Fα(Z1,Z2,,Zn)=(L,J)]\displaystyle\displaystyle\Pr[F_{\alpha}(Z_{1},Z_{2},\dotsc,Z_{n})=(L,J)] 1+I((Z1,Z2,,Zn);(L,J))log2D\displaystyle\displaystyle\leq\frac{1+I((Z_{1},Z_{2},\dotsc,Z_{n});(L,J))}{\log 2D}
1+nI(Z1;(L,J))log2D.\displaystyle\displaystyle\leq\frac{1+n\cdot I(Z_{1};(L,J))}{\log 2D}.

We also have that

I(Z1;(L,J))=KL((Z1,L,J)||Z1(L,J))\displaystyle\displaystyle I(Z_{1};(L,J))=\mathrm{KL}((Z_{1},L,J)||Z_{1}\otimes(L,J)) =𝔼L,J[2]×[D]KL((Z1|L,J)||Z1)\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{L,J\leftarrow[2]\times[D]}\mathrm{KL}((Z_{1}|L,J)||Z_{1})
=𝔼L,J[2]×[D]KL(R(𝒟L,J)||R(𝒰D))\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{L,J\leftarrow[2]\times[D]}\mathrm{KL}(R(\mathcal{D}_{L,J})||R(\mathcal{U}_{D}))
β.\displaystyle\displaystyle\leq\beta.

Plugging in, we obtain

1+nβlog2DPr[F(Z1,Z2,,Zn)=(L,J)]0.4.\displaystyle\frac{1+n\cdot\beta}{\log 2D}\geq\Pr[F(Z_{1},Z_{2},\dotsc,Z_{n})=(L,J)]\geq 0.4.

Hence, we deduce that n=Ω(logDβ1)=Ω(DlogDε)\displaystyle n=\Omega(\log D\cdot\beta^{-1})=\Omega\left(\frac{D\log D}{\varepsilon}\right). ∎
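
Rearranging the final display gives the explicit bound n ≥ (0.4·log(2D) − 1)/β; the snippet below simply evaluates it for β = c·ε/D, where the constant c and the example parameters are illustrative placeholders rather than the actual constants from Lemma 6.2.

import math

def selection_sample_lower_bound(D, eps, c=1.0):
    # From (1 + n * beta) / log(2D) >= 0.4 with beta = c * eps / D:
    # n >= (0.4 * log(2D) - 1) / beta = Omega(D log D / eps).
    beta = c * eps / D
    return (0.4 * math.log(2 * D) - 1) / beta

print(selection_sample_lower_bound(D=10**6, eps=1.0))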

We are now ready to prove Theorem 1.9 (restated below).


Theorem 1.9. (restated) For any ε=O(1)\displaystyle\varepsilon=O(1), if P\displaystyle P is a public-coin (ε,o(1/D))\displaystyle(\varepsilon,o(1/D))-DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k} protocol solving Selection with probability at least 0.99\displaystyle 0.99, then nΩ(Dk)\displaystyle n\geq\Omega\left(\frac{D}{k}\right).

Proof.

Without loss of generality, we assume that npoly(D)\displaystyle n\leq\mathop{\mathrm{poly}}(D). Applying Lemma 5.6 and letting τ=ε+k(1+lnn)\displaystyle\tau=\varepsilon+k(1+\ln n), we get that

𝔼α𝒟𝗉𝗎𝖻dτ(𝔼x𝒰DRα(x)||𝒟α)δ,\displaystyle\operatornamewithlimits{\mathbb{E}}_{\alpha\leftarrow\mathcal{D}_{\sf pub}}d_{\tau}\left(\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{U}_{D}}R_{\alpha}(x)||\mathcal{D}_{\alpha}\right)\leq\delta,

for a distribution family {𝒟α}αsupp(𝒟𝗉𝗎𝖻)\displaystyle\{\mathcal{D}_{\alpha}\}_{\alpha\in\mathrm{supp}(\mathcal{D}_{\sf pub})}.

Therefore, by Lemma 6.3, it follows that nΩ(DlogDτ)=Ω(Dk)\displaystyle n\geq\Omega\left(\frac{D\log D}{\tau}\right)=\Omega\left(\frac{D}{k}\right). ∎

6.3 Lower Bound for ParityLearning

We next prove our lower bound for ParityLearning.

Lemma 6.4.

For ε>0\displaystyle\varepsilon>0, suppose R\displaystyle R is (ε,δ,𝒰D)\displaystyle(\varepsilon,\delta,\mathcal{U}_{D})-dominated. We have that

𝔼,s[2]×{0,1}D[KL(R(𝒟,s)||R(𝒰D))]4eε2D+4δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\ell,s\in[2]\times\{0,1\}^{D}}[\mathrm{KL}(R(\mathcal{D}_{\ell,s})||R(\mathcal{U}_{D}))]\leq\frac{4e^{\varepsilon}}{2^{D}}+4\delta.
Proof.

To apply Theorem 5.7, we set the index set as =[2]×{0,1}D\displaystyle\mathcal{I}=[2]\times\{0,1\}^{D}, distribution π\displaystyle\pi to be the uniform distribution over \displaystyle\mathcal{I}, {λv}v={𝒟v}v\displaystyle\{\lambda_{v}\}_{v\in\mathcal{I}}=\{\mathcal{D}_{v}\}_{v\in\mathcal{I}}, and μ=𝒰D\displaystyle\mu=\mathcal{U}_{D}.

Clearly, μ\displaystyle\mu 2\displaystyle 2-dominates {λv}\displaystyle\{\lambda_{v}\}. Let f\displaystyle f be a function such that f=L\displaystyle\|f\|_{\infty}=L and f(μ)1\displaystyle f(\mu)\leq 1. It follows that

𝔼vπ|f(μ)f(λv)|2\displaystyle\displaystyle\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}|f(\mu)-f(\lambda_{v})|^{2} =𝔼,s[2]×{0,1}D|f(𝒟,s)f(𝒰D)|2\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{\ell,s\in[2]\times\{0,1\}^{D}}|f(\mathcal{D}_{\ell,s})-f(\mathcal{U}_{D})|^{2}
=𝔼,s[2]×{0,1}D14|f(𝒟,s)f(𝒟1,s)|2\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{\ell,s\in[2]\times\{0,1\}^{D}}\frac{1}{4}|f(\mathcal{D}_{\ell,s})-f(\mathcal{D}_{1-\ell,s})|^{2}
=𝔼,s[2]×{0,1}Df^(s)2.\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{\ell,s\in[2]\times\{0,1\}^{D}}\hat{f}(s)^{2}. (Proposition 6.1)

By Lemma 3.8, it follows that

s{0,1}Df^(s)2=𝔼x𝒰Df(x)2ff(𝒰D)L.\displaystyle\sum_{s\in\{0,1\}^{D}}\hat{f}(s)^{2}=\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{U}_{D}}f(x)^{2}\leq\|f\|_{\infty}\cdot f(\mathcal{U}_{D})\leq L.

Therefore, we can set W(L):=L2D\displaystyle W(L):=\frac{L}{2^{D}}. In this case, W\displaystyle W is clearly concave. By Theorem 5.7, it follows that

𝔼,s[2]×{0,1}D[KL(R(𝒟,s)||R(𝒰D))]2W(2eε)+4(τ1)2δ4eε2D+4δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\ell,s\in[2]\times\{0,1\}^{D}}[\mathrm{KL}(R(\mathcal{D}_{\ell,s})||R(\mathcal{U}_{D}))]\leq 2W(2e^{\varepsilon})+4(\tau-1)^{2}\cdot\delta\leq\frac{4e^{\varepsilon}}{2^{D}}+4\delta.\qed

Now we apply Lemma 6.4 to the ParityLearning problem. Recall that in ParityLearning, there is a random hidden element s{0,1}D\displaystyle s\in\{0,1\}^{D}, and each user gets a random element x\displaystyle x together with the inner product s,x\displaystyle\langle s,x\rangle over 𝔽2\displaystyle\mathbb{F}_{2}. Appending the label to the vector, each user indeed gets a random sample from the set {x{0,1}D+1:x,(s,1)=0}\displaystyle\{x\in\{0,1\}^{D+1}:\langle x,(s,1)\rangle=0\}, where (s,1)\displaystyle(s,1) is the (D+1)\displaystyle(D+1)-dimensional vector obtained by appending 1\displaystyle 1 to the end of the vector s\displaystyle s. In other words, each user gets a random sample from the distribution 𝒟0,(s,1)\displaystyle\mathcal{D}_{0,(s,1)}.
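
The sketch below (Python, for illustration only) generates ParityLearning samples in exactly this encoding: each user's example and label are packed into a (D+1)-bit vector z satisfying ⟨z,(s,1)⟩ = 0 over 𝔽_2, i.e., a sample from 𝒟_{0,(s,1)}.

import random

def parity_sample(s):
    # s is the D-bit hidden string; return the (D+1)-bit encoding of (example, label).
    x = [random.randint(0, 1) for _ in range(len(s))]
    label = sum(a * b for a, b in zip(s, x)) % 2
    return x + [label]

random.seed(0)
s = [1, 0, 1, 1]
for _ in range(3):
    z = parity_sample(s)
    # every sample z satisfies <z, (s, 1)> = 0 over F_2
    assert sum(a * b for a, b in zip(z, s + [1])) % 2 == 0
    print(z)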

Lemma 6.5.

For a public-coin randomizer R\displaystyle R with public randomness from 𝒟𝗉𝗎𝖻\displaystyle\mathcal{D}_{\sf pub}, if there is a family of distributions {𝒟α}αsupp(𝒟𝗉𝗎𝖻)\displaystyle\{\mathcal{D}_{\alpha}\}_{\alpha\in\mathrm{supp}(\mathcal{D}_{\sf pub})} over k\displaystyle\mathcal{M}^{k} such that,

𝔼α𝒟𝗉𝗎𝖻dε(𝔼x𝒰DRα(x)||𝒟α)o(1/n),\displaystyle\operatornamewithlimits{\mathbb{E}}_{\alpha\leftarrow\mathcal{D}_{\sf pub}}d_{\varepsilon}\left(\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{U}_{D}}R_{\alpha}(x)||\mathcal{D}_{\alpha}\right)\leq o(1/n),

where n\displaystyle n is the number of samples, then a public-coin protocol with randomizer R\displaystyle R needs at least Ω(2D/eε)\displaystyle\Omega\left(2^{D}/e^{\varepsilon}\right) samples to solve ParityLearning with probability at least 0.99\displaystyle 0.99.

Proof.

Suppose there is a public-coin protocol P\displaystyle P with randomizer R\displaystyle R solving ParityLearning with probability at least 0.99\displaystyle 0.99. For a dataset W\displaystyle W, we use P(W)\displaystyle P(W) (respectively, Pα(W)\displaystyle P_{\alpha}(W)) to denote the output of P\displaystyle P on W\displaystyle W (with public randomness fixed to α\displaystyle\alpha).

Consider running P\displaystyle P on n\displaystyle n uniformly random samples from {0,1}D+1\displaystyle\{0,1\}^{D+1}. We note that for at least a 0.99\displaystyle 0.99 fraction of s{0,1}D\displaystyle s\in\{0,1\}^{D}, we have that

𝔼α𝒟𝗉𝗎𝖻[Pr[Pα(𝒰D+1n)=s]]=Pr[P(𝒰D+1n)=s]0.01.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\alpha\leftarrow\mathcal{D}_{\sf pub}}\left[\Pr[P_{\alpha}(\mathcal{U}_{D+1}^{\otimes n})=s]\right]=\Pr[P(\mathcal{U}_{D+1}^{\otimes n})=s]\leq 0.01.

From the assumption that P\displaystyle P solves ParityLearning, for all s{0,1}D\displaystyle s\in\{0,1\}^{D}, we have

𝔼α𝒟𝗉𝗎𝖻[Pr[Pα(𝒟0,(s,1)n)=s]]=Pr[P(𝒟0,(s,1)n)=s]0.99.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\alpha\leftarrow\mathcal{D}_{\sf pub}}\left[\Pr[P_{\alpha}(\mathcal{D}_{0,(s,1)}^{\otimes n})=s]\right]=\Pr[P(\mathcal{D}_{0,(s,1)}^{\otimes n})=s]\geq 0.99.

By a union bound, with probability at least 0.5\displaystyle 0.5 over α𝒟𝗉𝗎𝖻\displaystyle\alpha\leftarrow\mathcal{D}_{\sf pub}, we have

Pr[Pα(𝒰D+1n)=s]0.1andPr[Pα(𝒟0,(s,1)n)=s]0.9for at least a 0.5 fraction of s{0,1}D.\Pr[P_{\alpha}(\mathcal{U}_{D+1}^{\otimes n})=s]\leq 0.1~{}~{}\text{and}~{}~{}\Pr[P_{\alpha}(\mathcal{D}_{0,(s,1)}^{\otimes n})=s]\geq 0.9~{}~{}\text{for at least a $\displaystyle 0.5$ fraction of $\displaystyle s\in\{0,1\}^{D}$.} (6)

From our assumption and Markov’s inequality, with probability at least 0.99\displaystyle 0.99 over α𝒟𝗉𝗎𝖻\displaystyle\alpha\leftarrow\mathcal{D}_{\sf pub}, we have that dε(𝔼x𝒰DRα(x)||𝒟α)o(1/n)\displaystyle d_{\varepsilon}\left(\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{U}_{D}}R_{\alpha}(x)||\mathcal{D}_{\alpha}\right)\leq o(1/n). That is, Rα\displaystyle R_{\alpha} is (ε,o(1/n),𝒰D)\displaystyle(\varepsilon,o(1/n),\mathcal{U}_{D})-dominated.

By a union bound, there exists an αsupp(𝒟𝗉𝗎𝖻)\displaystyle\alpha\in\mathrm{supp}(\mathcal{D}_{\sf pub}) such that Rα\displaystyle R_{\alpha} is (ε,o(1/n),𝒰D)\displaystyle(\varepsilon,o(1/n),\mathcal{U}_{D})-dominated and (6) is satisfied.

By Lemma 6.4, we have that

𝔼,s[2]×{0,1}D+1[KL(Rα(𝒟,s)||Rα(𝒰D+1))]4eε2D+1+o(1/n),\displaystyle\operatornamewithlimits{\mathbb{E}}_{\ell,s\in[2]\times\{0,1\}^{D+1}}[\mathrm{KL}(R_{\alpha}(\mathcal{D}_{\ell,s})||R_{\alpha}(\mathcal{U}_{D+1}))]\leq\frac{4e^{\varepsilon}}{2^{D+1}}+o(1/n),

which implies

𝔼s{0,1}D[KL(Rα(𝒟0,(s,1))||Rα(𝒰D+1))]O(eε2D)+o(1/n).\displaystyle\operatornamewithlimits{\mathbb{E}}_{s\in\{0,1\}^{D}}[\mathrm{KL}(R_{\alpha}(\mathcal{D}_{0,(s,1)})||R_{\alpha}(\mathcal{U}_{D+1}))]\leq O\left(\frac{e^{\varepsilon}}{2^{D}}\right)+o(1/n).

Supposing n=o(2D/eε)\displaystyle n=o(2^{D}/e^{\varepsilon}) for the sake of contradiction, it follows that

𝔼s{0,1}D[KL(Rα(𝒟0,(s,1))n||Rα(𝒰D+1)n)]o(1).\displaystyle\operatornamewithlimits{\mathbb{E}}_{s\in\{0,1\}^{D}}[\mathrm{KL}(R_{\alpha}(\mathcal{D}_{0,(s,1)})^{\otimes n}||R_{\alpha}(\mathcal{U}_{D+1})^{\otimes n})]\leq o(1).

Since there is at least a 0.5\displaystyle 0.5 fraction of s{0,1}D\displaystyle s\in\{0,1\}^{D} satisfying the conditions in (6), it follows that there exists an s{0,1}D\displaystyle s\in\{0,1\}^{D} satisfying these conditions and KL(Rα(𝒟0,(s,1))n||Rα(𝒰D+1)n)=o(1)\displaystyle\mathrm{KL}(R_{\alpha}(\mathcal{D}_{0,(s,1)})^{\otimes n}||R_{\alpha}(\mathcal{U}_{D+1})^{\otimes n})=o(1), which, by Pinsker’s inequality, implies that

Rα(𝒟0,(s,1))nRα(𝒰D+1)nTVo(1),\displaystyle\|R_{\alpha}(\mathcal{D}_{0,(s,1)})^{\otimes n}-R_{\alpha}(\mathcal{U}_{D+1})^{\otimes n}\|_{TV}\leq o(1),

and

Pr[Pα(𝒟0,(s,1)n)=s]Pr[Pα(𝒰D+1n)=s]+o(1)0.1+o(1),\displaystyle\Pr[P_{\alpha}(\mathcal{D}_{0,(s,1)}^{\otimes n})=s]\leq\Pr[P_{\alpha}(\mathcal{U}_{D+1}^{\otimes n})=s]+o(1)\leq 0.1+o(1),

a contradiction. ∎

We are now ready to prove Theorem 1.10.


Theorem 1.10. (restated) For any ε=O(1)\displaystyle\varepsilon=O(1), if P\displaystyle P is a public-coin (ε,o(1/n))\displaystyle(\varepsilon,o(1/n))-DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k} protocol solving ParityLearning with probability at least 0.99\displaystyle 0.99, then nΩ(2D/(k+1))\displaystyle n\geq\Omega(2^{D/(k+1)}).

Proof of Theorem 1.10.

Applying Lemma 5.6 and letting τ=ε+k(1+lnn)\displaystyle\tau=\varepsilon+k(1+\ln n), we have that

𝔼α𝒟𝗉𝗎𝖻dτ(𝔼x𝒰DRα(x)||𝒟α)o(1/n),\displaystyle\operatornamewithlimits{\mathbb{E}}_{\alpha\leftarrow\mathcal{D}_{\sf pub}}d_{\tau}\left(\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{U}_{D}}R_{\alpha}(x)||\mathcal{D}_{\alpha}\right)\leq o(1/n),

for a distribution family {𝒟α}αsupp(𝒟𝗉𝗎𝖻)\displaystyle\{\mathcal{D}_{\alpha}\}_{\alpha\in\mathrm{supp}(\mathcal{D}_{\sf pub})}.

By Lemma 6.5, nΩ(2D/eτ)Ω(2D/(en)k)\displaystyle n\geq\Omega(2^{D}/e^{\tau})\geq\Omega(2^{D}/(en)^{k}). It then follows that nk+1Ω(2D/ek)\displaystyle n^{k+1}\geq\Omega(2^{D}/e^{k}) and consequently nΩ(2D/(k+1))\displaystyle n\geq\Omega(2^{D/(k+1)}). ∎

7 Lower Bound for CountDistinct with Maximum Hardness

In this section, we prove Theorem 1.1, which gives an Ω(n)\displaystyle\Omega(n) lower bound on the error of DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} protocols for CountDistinct.

7.1 Preliminaries

For (,s)[2]×{0,1}D\displaystyle(\ell,s)\in[2]\times\{0,1\}^{D}, recall that 𝒟,s\displaystyle\mathcal{D}_{\ell,s} is the uniform distribution on {x{0,1}D:x,s=}\displaystyle\{x\in\{0,1\}^{D}:\langle x,s\rangle=\ell\}. As in Section 2.2, we also use 𝒟,sα\displaystyle\mathcal{D}_{\ell,s}^{\alpha} to denote the mixture of 𝒟,s\displaystyle\mathcal{D}_{\ell,s} and 𝒰D\displaystyle\mathcal{U}_{D} which outputs a sample from 𝒟,s\displaystyle\mathcal{D}_{\ell,s} with probability α\displaystyle\alpha and a sample from 𝒰D\displaystyle\mathcal{U}_{D} with probability 1α\displaystyle 1-\alpha. Note that 𝒟,sα\displaystyle\mathcal{D}_{\ell,s}^{\alpha} can also be interpreted as the mixture of 𝒟,s\displaystyle\mathcal{D}_{\ell,s} and 𝒟1,s\displaystyle\mathcal{D}_{1-\ell,s} that outputs a sample from 𝒟,s\displaystyle\mathcal{D}_{\ell,s} with probability 12+α2\displaystyle\frac{1}{2}+\frac{\alpha}{2}, and a sample from 𝒟1,s\displaystyle\mathcal{D}_{1-\ell,s} with probability 12α2\displaystyle\frac{1}{2}-\frac{\alpha}{2}. We next estimate the number of distinct elements in n\displaystyle n samples taken from 𝒟,sα\displaystyle\mathcal{D}^{\alpha}_{\ell,s}.

Proposition 7.1.

Set D=logn\displaystyle D=\log n. For α(0,0.01)\displaystyle\alpha\in(0,0.01) and any (,s)[2]×{0,1}D\displaystyle(\ell,s)\in[2]\times\{0,1\}^{D}, let X\displaystyle X be the number of distinct elements in n\displaystyle n samples drawn from 𝒟,sα\displaystyle\mathcal{D}_{\ell,s}^{\alpha}. We have that

Pr[|X(1e1cosh(α))n|>10n]<0.01.\displaystyle\Pr\left[\left|X-(1-e^{-1}\cosh(\alpha))\cdot n\right|>10\sqrt{n}\right]<0.01.
Proof.

In the following, we identify the index space [n]\displaystyle[n] with {0,1}logn\displaystyle\{0,1\}^{\log n} in the natural way. For i[n]\displaystyle i\in[n], we use Xi\displaystyle X_{i} to denote the indicator of whether i\displaystyle i occurs in the n\displaystyle n samples taken from 𝒟,sα\displaystyle\mathcal{D}^{\alpha}_{\ell,s}. Note that these Xi\displaystyle X_{i}’s are not independent, but they are negatively correlated [DR98, Propositions 7 and 11], and hence a Chernoff bound still applies.

Let i\displaystyle i be an element in the support of 𝒟,s\displaystyle\mathcal{D}_{\ell,s}. Note that a single sample from 𝒟,sα\displaystyle\mathcal{D}_{\ell,s}^{\alpha} equals i\displaystyle i with probability

2D+1(12+α2)=1+αn.\displaystyle 2^{-D+1}\cdot\left(\frac{1}{2}+\frac{\alpha}{2}\right)=\frac{1+\alpha}{n}.

Therefore, i\displaystyle i occurs in n\displaystyle n i.i.d. samples from 𝒟,sα\displaystyle\mathcal{D}_{\ell,s}^{\alpha} with probability

p1:=1(1(1+α)/n)n\displaystyle\displaystyle p_{1}:=1-(1-(1+\alpha)/n)^{n} =1eln(1(1+α)/n)n\displaystyle\displaystyle=1-e^{\ln(1-(1+\alpha)/n)\cdot n}
=1e((1+α)/n+Θ((1+α)/n)2)n\displaystyle\displaystyle=1-e^{(-(1+\alpha)/n+\Theta((1+\alpha)/n)^{2})\cdot n}
=1e(1+α)eΘ(1/n).\displaystyle\displaystyle=1-e^{-(1+\alpha)}\cdot e^{\Theta(1/n)}.

Therefore, we have that

|p1(1e(1+α))|O(1/n).\displaystyle\left|p_{1}-(1-e^{-(1+\alpha)})\right|\leq O(1/n).

Similarly, for an element i\displaystyle i in the support of 𝒟1,s\displaystyle\mathcal{D}_{1-\ell,s}, a single sample from 𝒟,sα\displaystyle\mathcal{D}_{\ell,s}^{\alpha} equals i\displaystyle i with probability

2D+1(12α2)=1αn.\displaystyle 2^{-D+1}\cdot\left(\frac{1}{2}-\frac{\alpha}{2}\right)=\frac{1-\alpha}{n}.

Hence, by a similar calculation, i\displaystyle i occurs in n\displaystyle n i.i.d. samples from 𝒟,sα\displaystyle\mathcal{D}_{\ell,s}^{\alpha} with probability

p2:=1(1(1α)/n)n=1e(1α)eΘ(1/n),\displaystyle\displaystyle p_{2}:=1-(1-(1-\alpha)/n)^{n}=1-e^{-(1-\alpha)}\cdot e^{\Theta(1/n)},

and

|p2(1e(1α))|O(1/n).\displaystyle\left|p_{2}-(1-e^{-(1-\alpha)})\right|\leq O(1/n).

Hence, we have that

μ=𝔼[i[n]Xi]\displaystyle\displaystyle\mu=\operatornamewithlimits{\mathbb{E}}\left[\sum_{i\in[n]}X_{i}\right] =𝔼[isupp(𝒟,s)Xi]+𝔼[isupp(𝒟1,s)Xi]\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}\left[\sum_{i\in\mathrm{supp}(\mathcal{D}_{\ell,s})}X_{i}\right]+\operatornamewithlimits{\mathbb{E}}\left[\sum_{i\in\mathrm{supp}(\mathcal{D}_{1-\ell,s})}X_{i}\right]
=(p1+p2)n2.\displaystyle\displaystyle=(p_{1}+p_{2})\cdot\frac{n}{2}.

Let

ν=(1e1+α)n/2+(1e1α)n/2=(1e1cosh(α))n,\displaystyle\nu=(1-e^{-1+\alpha})\cdot n/2+(1-e^{-1-\alpha})\cdot n/2=(1-e^{-1}\cosh(\alpha))\cdot n,

where the last equality holds since cosh(α):=eα+eα2\displaystyle\cosh(\alpha):=\frac{e^{\alpha}+e^{-\alpha}}{2}. Let X=i=1nXi\displaystyle X=\sum_{i=1}^{n}X_{i}. Using the Chernoff bound and the fact that |νμ|O(1)\displaystyle|\nu-\mu|\leq O(1), we have that

Pr[|X(1e1cosh(α))n|>10n]<0.01,\displaystyle\Pr\left[\left|X-(1-e^{-1}\cosh(\alpha))\cdot n\right|>10\sqrt{n}\right]<0.01,

which completes the proof. ∎
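
A quick Monte Carlo sanity check of Proposition 7.1 (with illustrative parameters; it plays no role in the proof): identify {0,1}^D with [n], draw n samples from 𝒟^α_{ℓ,s}, count the distinct values, and compare with (1 − e^{−1}cosh(α))·n.

import math, random

def simulate_distinct(n, alpha, ell=0, seed=0):
    rng = random.Random(seed)
    D = int(math.log2(n))
    s = rng.getrandbits(D) | 1            # an arbitrary nonzero D-bit string
    seen = set()
    for _ in range(n):
        if rng.random() < alpha:          # with probability alpha, sample from D_{ell, s}
            while True:
                x = rng.getrandbits(D)
                if bin(x & s).count("1") % 2 == ell:
                    break
        else:                              # otherwise sample uniformly from {0, 1}^D
            x = rng.getrandbits(D)
        seen.add(x)
    return len(seen)

n, alpha = 2 ** 16, 0.3
print(simulate_distinct(n, alpha), (1 - math.exp(-1) * math.cosh(alpha)) * n)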

7.2 DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} Lower bound

Lemma 7.2.

For any ε>0\displaystyle\varepsilon>0 and α[0,1]\displaystyle\alpha\in[0,1], if R\displaystyle R is (ε,δ,𝒰D)\displaystyle(\varepsilon,\delta,\mathcal{U}_{D})-dominated, then we have that

𝔼,j[2]×{0,1}D[KL(P,jα||Q)]α24eε2D+4δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\ell,j\in[2]\times\{0,1\}^{D}}[\mathrm{KL}(P^{\alpha}_{\ell,j}||Q)]\leq\alpha^{2}\cdot\frac{4e^{\varepsilon}}{2^{D}}+4\delta.
Proof.

We follow closely the proof of Lemma 6.4. To apply Theorem 5.7, we set the index set as =[2]×{0,1}D\displaystyle\mathcal{I}=[2]\times\{0,1\}^{D}, the distribution π\displaystyle\pi to be the uniform distribution over \displaystyle\mathcal{I}, {λv}v={𝒟vα}v\displaystyle\{\lambda_{v}\}_{v\in\mathcal{I}}=\{\mathcal{D}_{v}^{\alpha}\}_{v\in\mathcal{I}}, and μ=𝒰D\displaystyle\mu=\mathcal{U}_{D}.

Clearly, μ\displaystyle\mu 2\displaystyle 2-dominates {λv}\displaystyle\{\lambda_{v}\}. Let f\displaystyle f be a function such that f=L\displaystyle\|f\|_{\infty}=L and f(μ)=1\displaystyle f(\mu)=1. It follows that

𝔼vπ|f(μ)f(λv)|2=𝔼(,s)[2]×{0,1}Dα2f^(s)2.\displaystyle\operatornamewithlimits{\mathbb{E}}_{v\leftarrow\pi}|f(\mu)-f(\lambda_{v})|^{2}=\operatornamewithlimits{\mathbb{E}}_{(\ell,s)\in[2]\times\{0,1\}^{D}}\alpha^{2}\cdot\hat{f}(s)^{2}.

Recall that

𝔼(,s)[2]×{0,1}Df^(s)2L2D.\displaystyle\operatornamewithlimits{\mathbb{E}}_{(\ell,s)\in[2]\times\{0,1\}^{D}}\hat{f}(s)^{2}\leq\frac{L}{2^{D}}.

Therefore, we can set W(L):=α2L2D\displaystyle W(L):=\alpha^{2}\cdot\frac{L}{2^{D}}. Clearly, W\displaystyle W is a concave function. By Theorem 5.7, it follows that

𝔼,s[2]×{0,1}D[KL(R(𝒟,sα)||R(𝒰D))]2W(2eε)+4(τ1)2δα24eε2D+4δ.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\ell,s\in[2]\times\{0,1\}^{D}}[\mathrm{KL}(R(\mathcal{D}^{\alpha}_{\ell,s})||R(\mathcal{U}_{D}))]\leq 2W(2e^{\varepsilon})+4(\tau-1)^{2}\cdot\delta\leq\alpha^{2}\cdot\frac{4e^{\varepsilon}}{2^{D}}+4\delta.\qed

We now show that the CountDistinct function is hard for (ε,δ)\displaystyle(\varepsilon,\delta)-local algorithms.


Theorem 1.1. (restated) For ε0.49lnn\displaystyle\varepsilon\leq 0.49\cdot\ln n, if P\displaystyle P is a public-coin (ε,o(1/n))\displaystyle(\varepsilon,o(1/n))-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} protocol, then it cannot compute CountDistinctn,n\displaystyle\textsf{\small CountDistinct}_{n,n} with error o(n/eε)\displaystyle o(n/e^{\varepsilon}) and probability at least 0.99\displaystyle 0.99.

Proof.

Let D=logn\displaystyle D=\log n. We identify the input space [n]\displaystyle[n] with {0,1}D\displaystyle\{0,1\}^{D} in the natural way. Suppose, for the sake of contradiction, that there is a public-coin (ε,o(1/n))\displaystyle(\varepsilon,o(1/n))-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} protocol P\displaystyle P solving CountDistinctn,n\displaystyle\textsf{\small CountDistinct}_{n,n} with error o(n/eε)\displaystyle o(n/e^{\varepsilon}) and probability at least 0.99\displaystyle 0.99.

Let R\displaystyle R with public randomness from 𝒟𝗉𝗎𝖻\displaystyle\mathcal{D}_{\sf pub} be the randomizer used in P\displaystyle P. For a dataset W\displaystyle W, we use P(W)\displaystyle P(W) (respectively, Pγ(W)\displaystyle P_{\gamma}(W)) to denote the output of P\displaystyle P on the dataset W\displaystyle W (with public randomness fixed to γ\displaystyle\gamma).

Setting α2=120eε\displaystyle\alpha^{2}=\frac{1}{20e^{\varepsilon}}, we let μα=(1e1cosh(α))n\displaystyle\mu_{\alpha}=(1-e^{-1}\cosh(\alpha))\cdot n and μ0=(1e1)n\displaystyle\mu_{0}=(1-e^{-1})\cdot n.

By our assumption on P\displaystyle P, Proposition 7.1 and a union bound, it follows that for every (,s)[2]×{0,1}D\displaystyle(\ell,s)\in[2]\times\{0,1\}^{D}, we have

𝔼γ𝒟𝗉𝗎𝖻[Pr[|Pγ((𝒟,sα)n)μα|n1000eε+10n]]0.98.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\gamma\leftarrow\mathcal{D}_{\sf pub}}\left[\Pr\left[\left|P_{\gamma}((\mathcal{D}^{\alpha}_{\ell,s})^{\otimes n})-\mu_{\alpha}\right|\leq\frac{n}{1000e^{\varepsilon}}+10\sqrt{n}\right]\right]\geq 0.98.

Similarly, we have

𝔼γ𝒟𝗉𝗎𝖻[Pr[|Pγ(𝒰Dn)μ0|n1000eε+10n]]0.98.\displaystyle\operatornamewithlimits{\mathbb{E}}_{\gamma\leftarrow\mathcal{D}_{\sf pub}}\left[\Pr\left[\left|P_{\gamma}(\mathcal{U}_{D}^{\otimes n})-\mu_{0}\right|\leq\frac{n}{1000e^{\varepsilon}}+10\sqrt{n}\right]\right]\geq 0.98.

Note that by our choice of ε\displaystyle\varepsilon, we have n1000eε+10n<n800eε\displaystyle\frac{n}{1000e^{\varepsilon}}+10\sqrt{n}<\frac{n}{800e^{\varepsilon}}. By a union bound, it follows that with probability at least 0.5\displaystyle 0.5 over γ𝒟𝗉𝗎𝖻\displaystyle\gamma\leftarrow\mathcal{D}_{\sf pub}, we have

Pr[|Pγ((𝒟,sα)n)μα|<n800eε]0.8andPr[|Pγ(𝒰Dn)μ0|<n800eε]0.8\displaystyle\displaystyle\Pr\left[\left|P_{\gamma}((\mathcal{D}^{\alpha}_{\ell,s})^{\otimes n})-\mu_{\alpha}\right|<\frac{n}{800e^{\varepsilon}}\right]\geq 0.8~{}~{}\text{and}~{}~{}\Pr\left[\left|P_{\gamma}(\mathcal{U}_{D}^{\otimes n})-\mu_{0}\right|<\frac{n}{800e^{\varepsilon}}\right]\geq 0.8
for at least a 0.5 fraction of (,s)[2]×{0,1}D\displaystyle(\ell,s)\in[2]\times\{0,1\}^{D}. (7)

By the definition of public-coin DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} protocols, we have that with probability at least 0.99\displaystyle 0.99 over γ𝒟𝗉𝗎𝖻\displaystyle\gamma\leftarrow\mathcal{D}_{\sf pub}, Rγ\displaystyle R_{\gamma} is (ε,o(1/n),𝒰D)\displaystyle(\varepsilon,o(1/n),\mathcal{U}_{D})-dominated. By a union bound, there exists a γ\displaystyle\gamma such that Rγ\displaystyle R_{\gamma} is (ε,o(1/n),𝒰D)\displaystyle(\varepsilon,o(1/n),\mathcal{U}_{D})-dominated and it satisfies the condition in (7). We fix such a γ\displaystyle\gamma.

By Lemma 7.2, it follows that

𝔼(,s)[2]×{0,1}D[KL(Rγ(𝒟α,s)||Rγ(𝒰D))]\displaystyle\displaystyle\operatornamewithlimits{\mathbb{E}}_{(\ell,s)\in[2]\times\{0,1\}^{D}}[\mathrm{KL}(R_{\gamma}(\mathcal{D}^{\alpha}_{\ell,s})||R_{\gamma}(\mathcal{U}_{D}))] α22eε2D+o(1/n).\displaystyle\displaystyle\leq\alpha^{2}\cdot\frac{2e^{\varepsilon}}{2^{D}}+o(1/n).

Recalling that α^2 = 1/(20e^ε) and that 2^D = n, the above further simplifies to

𝔼(,s)[2]×{0,1}D[KL(Rγ(𝒟α,s)||Rγ(𝒰D))]110n+o(1/n).\displaystyle\operatornamewithlimits{\mathbb{E}}_{(\ell,s)\in[2]\times\{0,1\}^{D}}[\mathrm{KL}(R_{\gamma}(\mathcal{D}^{\alpha}_{\ell,s})||R_{\gamma}(\mathcal{U}_{D}))]\leq\frac{1}{10n}+o(1/n).

Let S be the set of (ℓ,s) satisfying the conditions stated in (7). Since S contains at least a 0.5 fraction of [2]×{0,1}^D, it follows that

𝔼(,s)S[KL(Rγ(𝒟α,s)||Rγ(𝒰D))]15n+o(1/n).\displaystyle\operatornamewithlimits{\mathbb{E}}_{(\ell,s)\in S}[\mathrm{KL}(R_{\gamma}(\mathcal{D}^{\alpha}_{\ell,s})||R_{\gamma}(\mathcal{U}_{D}))]\leq\frac{1}{5n}+o(1/n).

This means that there exists a pair (,s)S\displaystyle(\ell,s)\in S such that KL(Rγ(𝒟α,s)||Rγ(𝒰D))1/5n+o(1/n)\displaystyle\mathrm{KL}(R_{\gamma}(\mathcal{D}^{\alpha}_{\ell,s})||R_{\gamma}(\mathcal{U}_{D}))\leq 1/5n+o(1/n). We fix such a pair (,s)\displaystyle(\ell,s).

Since KL divergence is additive over product distributions, we have KL(R_γ(𝒟^α_{ℓ,s})^{⊗n} || R_γ(𝒰_D)^{⊗n}) ≤ n·(1/(5n) + o(1/n)) = 1/5 + o(1). By Pinsker’s inequality, it follows that

\|R_{\gamma}(\mathcal{D}^{\alpha}_{\ell,s})^{\otimes n}-R_{\gamma}(\mathcal{U}_{D})^{\otimes n}\|_{TV}\leq\sqrt{1/2\cdot 1/5+o(1)}\leq 0.4.

Since (,s)S\displaystyle(\ell,s)\in S, it follows that

Pr[|Pγ(𝒰Dn)μα|<n800eε]Pr[|Pγ((𝒟α,s)n)μα|<n800eε]0.40.4.\Pr\left[\left|P_{\gamma}(\mathcal{U}_{D}^{\otimes n})-\mu_{\alpha}\right|<\frac{n}{800e^{\varepsilon}}\right]\geq\Pr\left[\left|P_{\gamma}((\mathcal{D}^{\alpha}_{\ell,s})^{\otimes n})-\mu_{\alpha}\right|<\frac{n}{800e^{\varepsilon}}\right]-0.4\geq 0.4. (8)

On the other hand, we also have

Pr[|Pγ(𝒰Dn)μ0|<n800eε]0.8.\Pr\left[\left|P_{\gamma}(\mathcal{U}_{D}^{\otimes n})-\mu_{0}\right|<\frac{n}{800e^{\varepsilon}}\right]\geq 0.8. (9)

Note that |μ_α − μ_0| = e^{−1}(cosh(α)−1)·n ≥ (α^2/2)·e^{−1}·n > n/(200e^ε) > 2·n/(800e^ε). Hence the two intervals appearing in (8) and (9) are disjoint, so the corresponding events are mutually exclusive; but by (8) and (9) their probabilities sum to at least 0.4 + 0.8 > 1, a contradiction. ∎

8 Low-Message DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} Protocols for CountDistinct

In this section, we present our low-message DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} protocols for CountDistinct, thereby proving Theorem 1.6.

In Section 8.1, we review the previous protocol of [BCJM20], and discuss some intuitions underlying our improvement. In Section 8.2, we introduce some necessary definitions and technical tools. Next, in Section 8.3 we present our private-coin protocol (stated in Theorem 8.4) for CountDistinct with error O~(D)\displaystyle\tilde{O}(\sqrt{D}), which uses 1/2+o(1)\displaystyle 1/2+o(1) message per user in expectation when the input universe size is below n/polylog(n)\displaystyle n/\mathop{\mathrm{polylog}}(n). We will also show that a simple modification of this protocol is (ln(n)+O(1))\displaystyle(\ln(n)+O(1))-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}, thereby proving Theorem 1.3. Finally, based on the private-coin protocol, in Section 8.4 we prove Theorem 1.6 by presenting our public-coin protocol for CountDistinct, which uses less than 1\displaystyle 1 message per user in expectation without any restriction on the universe size.

8.1 Intuition

We now turn to sketch the main ideas behind Theorem 8.4 and Theorem 1.6. It would be instructive to review the O~(D)\displaystyle\widetilde{O}(D)-message DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} protocol solving CountDistinctn,D\displaystyle\textsf{\small CountDistinct}_{n,D} with error O(D)\displaystyle O(\sqrt{D}) from [BCJM20].

The DPmod2-shuffle\displaystyle\mathrm{DP}_{\mathrm{mod2\text{-}shuffle}} Model. To gain more insights about their protocol, we consider the following mod 2 shuffle model (DPmod2-shuffle\displaystyle\mathrm{DP}_{\mathrm{mod2\text{-}shuffle}}), where two messages of the same content “cancel each other”, i.e., the transcript is now a random permutation of messages that appear an odd number of times.

The DP requirement now applies to this new version of the transcript, and the analyzer likewise only sees this new version. [BCJM20] first gave a DPmod2-shuffle protocol for CountDistinct, and then adapted that protocol to the standard DPshuffle model using the Ishai et al. protocol for secure aggregation [IKOS06] (they did not explicitly specify their protocol in the DPmod2-shuffle model, but it is implicit in their proof of security).

Low-Message Protocol in DPmod2-shuffle. The DPmod2-shuffle protocol of [BCJM20] (referred to as P_mod2 in what follows) first sets a parameter q = Θ(1/n) so that Pr[Bin(n,q) ≡ 1 (mod 2)] = 1/(2e^{ε/2}). Next, each user holding an element x ∈ [D] first sends x with probability 1/2. Then, for each j ∈ [D], the user sends message j with probability q. All these events are independent.

Finally, if there are z messages in the transcript (i.e., there are z messages occurring an odd number of times in the original transcript), then the analyzer outputs (2·z·e^{ε/2} − D)/(e^{ε/2} − 1) as the estimate. Note that a user sends 1/2 + D·q = 1/2 + O(D/n) messages in expectation.
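To make the protocol concrete, the following is a minimal simulation sketch of P_mod2 in Python (our own code, not from [BCJM20]); it abstracts the mod 2 shuffler by tracking, for each j ∈ [D], only the parity of the number of times message j is sent, and it picks q via the binomial-parity identity of Lemma 8.5 below (with ε_0 = ε/2).

```python
import numpy as np

rng = np.random.default_rng(0)

def p_mod2(inputs, D, eps):
    """Simulate P_mod2: each user sends its own element w.p. 1/2 and every j in [D]
    w.p. q; the mod-2 shuffler keeps only the messages with odd multiplicity."""
    n = len(inputs)
    # Choose q = Theta(1/n) so that Pr[Bin(n, q) is odd] = 1/(2 e^{eps/2}):
    # Pr[Bin(n, q) odd] = (1 - (1 - 2q)^n)/2, hence q = (1 - (1 - e^{-eps/2})^{1/n})/2.
    q = (1.0 - (1.0 - np.exp(-eps / 2)) ** (1.0 / n)) / 2
    parity = np.zeros(D, dtype=np.int64)        # occurrence parity of each message j
    for x in inputs:                            # x in {1, ..., D}
        if rng.random() < 0.5:                  # own element sent with probability 1/2
            parity[x - 1] ^= 1
        parity ^= rng.binomial(1, q, size=D)    # blanket noise: each j sent w.p. q
    z = int(parity.sum())                       # messages surviving the mod-2 shuffle
    return (2 * z * np.exp(eps / 2) - D) / (np.exp(eps / 2) - 1)
```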

Analysis of the Protocol P𝗆𝗈𝖽𝟤\displaystyle P_{\sf mod2}. It is shown in [BCJM20] that the above protocol is ε\displaystyle\varepsilon-DP and solves CountDistinctn,D\displaystyle\textsf{\small CountDistinct}_{n,D} with error O(D)\displaystyle O(\sqrt{D}). Here we briefly outline the intuition behind it.

Let S\displaystyle S be the set consisting of all inputs of the users. We can see that every iS\displaystyle i\in S belongs to the transcript with probability exactly 1/2\displaystyle 1/2; on the other hand, every i[D]S\displaystyle i\in[D]\setminus S belongs to the transcript with probability exactly Pr[𝖡𝗂𝗇(n,q)1(mod2)]=12eε/2\displaystyle\Pr[\mathsf{Bin}(n,q)\equiv 1\pmod{2}]=\frac{1}{2e^{\varepsilon/2}}. Moreover, all these events are independent. Therefore, a simple calculation shows that (2𝔼[z]eε/2D)/(eε/21)=|S|\displaystyle(2\operatornamewithlimits{\mathbb{E}}[z]e^{\varepsilon/2}-D)/(e^{\varepsilon/2}-1)=|S|, and the accuracy follows from a Chernoff bound. As for the DP guarantee, changing the input of one user only affects the distributions of two messages in the transcript, and it only changes each message’s occurrence probability in the transcript from 1/2\displaystyle 1/2 to 1/2eε/2\displaystyle 1/2e^{\varepsilon/2} or vice versa.

From DPmod2-shuffle to DPshuffle. To obtain an actual DPshuffle protocol from P_mod2, the protocol from [BCJM20] (which we henceforth denote by P_BCJM) runs D copies of the protocol for securely computing sums over F_2 [IKOS06], such that the i-th protocol P_i aims to simulate the number of occurrences of message i modulo 2. For each user and each i ∈ [D], if the user were to send message i in P_mod2, it sends a one in P_i; otherwise it sends a zero in P_i.

Since the [IKOS06] protocol for computing sum over 𝔽2\displaystyle\mathbb{F}_{2} requires O(log(1/δ)logn+1)\displaystyle O\left(\frac{\log(1/\delta)}{\log n}+1\right) messages from each user [GMPV20, BBGN20], each user needs to send O(D(log(1/δ)logn+1))\displaystyle O\left(D\cdot\left(\frac{\log(1/\delta)}{\log n}+1\right)\right) messages in total. Moreover, from the security condition of Pi\displaystyle P_{i}, for each message i\displaystyle i the transcript only reveals the parity of its number of occurrences, which is exactly what we need in order to simulate DPmod2-shuffle\displaystyle\mathrm{DP}_{\mathrm{mod2\text{-}shuffle}} protocols.

Our Improvement. Note that P_BCJM requires significantly more messages per user than P_mod2. Our goal here is to compile P_mod2 into a DPshuffle protocol in a much more efficient way, ideally with no overhead. In P_mod2 each user sends only 1/2 + O(D/n) messages in expectation. This means that, when translating to P_BCJM, users end up sending many zero messages in the P_i subprotocols, which is wasteful.

Our crucial idea for improving on the aforementioned protocol is a very simple yet effective alternative to the secure aggregation protocol over F_2 of [IKOS06] used in P_BCJM. In our new subprotocol P_i, if a user were to send message i in P_mod2, it sends a one in P_i; otherwise it draws λ from a noise distribution 𝒟 (such that E[𝒟] ≈ polylog(δ^{-1})/n) and sends 2λ many ones in P_i. Clearly, our new P_i still maintains the parity of the number of occurrences of each message, and the expected number of messages is roughly 2·E[𝒟]·D + 1/2 = O(polylog(δ^{-1}))·D/n + 1/2. To show that the resulting protocol is DPshuffle, we build on the techniques of [GKMP20], which show that the added noise can hide the contribution of a single user.

8.2 Preliminaries

We first recall the definition of the negative binomial distribution.

Definition 8.1.

For r > 0 and p ∈ [0,1], the negative binomial distribution NB(r,p) is defined by Pr[NB(r,p) = k] = \binom{k+r-1}{k}(1-p)^{r}p^{k} for each non-negative integer k. (For a real number α, \binom{\alpha}{k} := \prod_{i=0}^{k-1}\frac{\alpha-i}{i+1}.)

We recall the following key properties of the negative binomial distribution: (1) For α,β>0\displaystyle\alpha,\beta>0 and p[0,1]\displaystyle p\in[0,1], 𝖭𝖡(α,p)+𝖭𝖡(β,p)\displaystyle\mathsf{NB}(\alpha,p)+\mathsf{NB}(\beta,p) has the same distribution as 𝖭𝖡(α+β,p)\displaystyle\mathsf{NB}(\alpha+\beta,p); (2) 𝔼[𝖭𝖡(r,p)]=pr1p\displaystyle\operatornamewithlimits{\mathbb{E}}[\mathsf{NB}(r,p)]=\frac{pr}{1-p}.
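Since the protocol below draws from NB(r/n, p) with a non-integral first parameter, it may be useful to note that NB(r,p) can be sampled for any real r > 0 as a Gamma–Poisson mixture. The following sketch (our own illustration, not part of the protocol) does exactly that and numerically checks the mean formula E[NB(r,p)] = pr/(1−p).

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_nb(r, p, size=None):
    """Sample NB(r, p) with Pr[k] = C(k+r-1, k) (1-p)^r p^k, valid for any real r > 0:
    draw lambda ~ Gamma(shape=r, scale=p/(1-p)) and then k ~ Poisson(lambda)."""
    lam = rng.gamma(shape=r, scale=p / (1 - p), size=size)
    return rng.poisson(lam)

r, p = 0.37, 0.8
print(sample_nb(r, p, size=200_000).mean(), p * r / (1 - p))   # empirical vs. exact mean
```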

We will need the following lemma from [GKMP20].

Lemma 8.2.

For any ε>0,δ(0,1)\displaystyle\varepsilon>0,\delta\in(0,1), and Δ\displaystyle\Delta\in\mathbb{N}, let p=e0.1ε/Δ\displaystyle p=e^{-0.1\varepsilon/\Delta} and r=50eε/Δlog(δ1)\displaystyle r=50\cdot e^{\varepsilon/\Delta}\cdot\log(\delta^{-1}). For any k{Δ,Δ+1,,Δ1,Δ}\displaystyle k\in\{-\Delta,-\Delta+1,\dotsc,\Delta-1,\Delta\}, dε(k+𝖭𝖡(r,p)||𝖭𝖡(r,p))δ\displaystyle d_{\varepsilon}(k+\mathsf{NB}(r,p)||\mathsf{NB}(r,p))\leq\delta.

The following is a simple corollary of Item (2) of Proposition 3.4 and Lemma 8.2.

Corollary 8.3.

For any ε>0,δ(0,1)\displaystyle\varepsilon>0,\delta\in(0,1), and Δ\displaystyle\Delta\in\mathbb{N}, let p\displaystyle p and r\displaystyle r be as in Lemma 8.2. For any two distributions X\displaystyle X and Y\displaystyle Y on {0,1,2,,Δ}\displaystyle\{0,1,2,\dotsc,\Delta\}, dε(X+𝖭𝖡(r,p)||Y+𝖭𝖡(r,p))δ\displaystyle d_{\varepsilon}(X+\mathsf{NB}(r,p)||Y+\mathsf{NB}(r,p))\leq\delta.

8.3 A Private-Coin Base Protocol

Recall that CountDistinct_{n,D} denotes the restriction of CountDistinct in which every user gets an input from [D], and the goal is to compute the number of distinct elements among all users.

We are now ready to prove Theorem 8.4, which is the private-coin case of Theorem 1.6. To simplify the privacy analysis of the protocol and ease its application in Section 8.4, we also allow the input to be 0\displaystyle 0, which means that the user’s input is not counted.

Theorem 8.4.

For any εO(1)\displaystyle\varepsilon\leq O(1) and δ1/n\displaystyle\delta\leq 1/n, there is a private-coin (ε,δ)\displaystyle(\varepsilon,\delta)-DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} protocol computing CountDistinctn,D\displaystyle\textsf{\small CountDistinct}_{n,D} with error O(Dε1)\displaystyle O\left(\sqrt{D}\cdot\varepsilon^{-1}\right) with probability at least 0.99\displaystyle 0.99. Moreover, the expected number of messages sent by each user is 12+O(log(1/δ)2ln(2/ε)εDn)\displaystyle\frac{1}{2}+O\left(\frac{\log(1/\delta)^{2}\ln(2/\varepsilon)}{\varepsilon}\cdot\frac{D}{n}\right).

Proof.

Without loss of generality, we can assume that ε1\displaystyle\varepsilon\leq 1. The algorithm requires several global constants that only depend on the values of n,ε\displaystyle n,\varepsilon, and δ\displaystyle\delta. Algorithm 1 specifies these constants. Here, c0\displaystyle c_{0} is a sufficiently large constant to be specified later.

Input: n\displaystyle n is the number of users and the pair (ε,δ)\displaystyle(\varepsilon,\delta) specifies the DP guarantee.
1 ε0=min(ε/6,0.01)\displaystyle\varepsilon_{0}=\min(\varepsilon/6,0.01);
2 Δ=c0logδ1ln(ε01)+1\displaystyle\Delta=\left\lceil c_{0}\cdot\log\delta^{-1}\ln(\varepsilon_{0}^{-1})+1\right\rceil;
3 p=e0.1ε0/Δ\displaystyle p=e^{-0.1\varepsilon_{0}/\Delta};
4 r=50eε0/Δlog(10δ1)\displaystyle r=50\cdot e^{\varepsilon_{0}/\Delta}\cdot\log(10\delta^{-1});
5 q' = (1 − (1 − e^{−ε_0})^{1/n}) / 2;
Algorithm 1 Set-Global-Constants(n\displaystyle n, ε\displaystyle\varepsilon, δ\displaystyle\delta)

Next, we specify the randomizer and the analyzer of the protocol in Algorithm 2 and Algorithm 3 respectively.

Input: x{0}[D]\displaystyle x\in\{0\}\cup[D] is the user’s input. D\displaystyle D is the universe size.
1 Set-Global-Constants(n\displaystyle n, ε\displaystyle\varepsilon, δ\displaystyle\delta);
2 Toss a uniformly random coin to get v{0,1}\displaystyle v\in\{0,1\};
3 if v=1\displaystyle v=1 and x0\displaystyle x\neq 0 then
4       send message (x)\displaystyle(x);
5      
6for i[D]\displaystyle i\in[D] do
7       Let y𝖡𝖾𝗋(q)\displaystyle y\leftarrow\mathsf{Ber}(q^{\prime});
8       if y=1\displaystyle y=1 then
9             send message (i)\displaystyle(i);
10            
11      Let η𝖭𝖡(r/n,p)\displaystyle\eta\leftarrow\mathsf{NB}(r/n,p);
12       Send 2η\displaystyle 2\cdot\eta messages (i)\displaystyle(i);
13      
Algorithm 2 Randomizer(x\displaystyle x, D\displaystyle D, n\displaystyle n, ε\displaystyle\varepsilon, δ\displaystyle\delta)
Input: S\displaystyle S is the multi-set of messages. D\displaystyle D is the universe size.
1 Set-Global-Constants(n\displaystyle n, ε\displaystyle\varepsilon, δ\displaystyle\delta);
2
3for i[D]\displaystyle i\in[D] do
4       Let yi\displaystyle y_{i} be the number of message (i)\displaystyle(i) in S\displaystyle S;
5       Ci=yimod2\displaystyle C_{i}=y_{i}\bmod{2};
6      
7
8C=i=1DCi\displaystyle C=\sum_{i=1}^{D}C_{i};
9 z=2Ceε0Deε01\displaystyle z=\frac{2Ce^{\varepsilon_{0}}-D}{e^{\varepsilon_{0}}-1};
10 return z\displaystyle z;
Algorithm 3 Analyzer(S\displaystyle S, D\displaystyle D, n\displaystyle n, ε\displaystyle\varepsilon, δ\displaystyle\delta)
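For concreteness, here is a minimal Python sketch of Algorithms 1–3 (our own translation of the pseudocode above; the negative binomial is sampled via the Gamma–Poisson mixture from Section 8.2, the logarithm bases are one reasonable reading of the text, and c_0 stands in for the unspecified constant).

```python
import numpy as np

rng = np.random.default_rng(0)

def set_global_constants(n, eps, delta, c0=50.0):
    """Algorithm 1 (sketch): c0 stands in for the unspecified constant in the text."""
    eps0 = min(eps / 6, 0.01)
    Delta = int(np.ceil(c0 * np.log2(1 / delta) * np.log(1 / eps0) + 1))
    p = np.exp(-0.1 * eps0 / Delta)
    r = 50 * np.exp(eps0 / Delta) * np.log2(10 / delta)
    q_prime = (1 - (1 - np.exp(-eps0)) ** (1.0 / n)) / 2
    return eps0, p, r, q_prime

def randomizer(x, D, n, eps, delta):
    """Algorithm 2 (sketch): return the user's multiset of messages as a list over [D]."""
    eps0, p, r, q_prime = set_global_constants(n, eps, delta)
    msgs = []
    if x != 0 and rng.random() < 0.5:               # send own input with probability 1/2
        msgs.append(x)
    for i in range(1, D + 1):
        if rng.random() < q_prime:                  # blanket Ber(q') noise message
            msgs.append(i)
        eta = rng.poisson(rng.gamma(shape=r / n, scale=p / (1 - p)))  # eta ~ NB(r/n, p)
        msgs.extend([i] * (2 * eta))                # 2*eta copies keep the parity intact
    return msgs

def analyzer(shuffled_msgs, D, n, eps, delta):
    """Algorithm 3 (sketch): estimate the number of distinct nonzero inputs."""
    eps0, _, _, _ = set_global_constants(n, eps, delta)
    counts = np.zeros(D + 1, dtype=np.int64)
    for m in shuffled_msgs:
        counts[m] += 1
    C = int(np.sum(counts[1:] % 2))                 # messages with odd multiplicity
    return (2 * C * np.exp(eps0) - D) / (np.exp(eps0) - 1)
```

In a simulation, the analyzer is applied to the concatenation of all users' message lists; since it only looks at the histogram, the shuffling itself is immaterial.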
Accuracy Analysis.

We first analyze the error of our protocol. Let E\displaystyle E be the set {xi}i[n],xi0\displaystyle\{x_{i}\}_{i\in[n],x_{i}\neq 0}. Recall that the goal is to estimate |E|\displaystyle|E|.

For each i[D]\displaystyle i\in[D], we analyze the distribution of the random variable Ci\displaystyle C_{i} in Algorithm 3. We observe that: (1) if iE\displaystyle i\in E, then Ci\displaystyle C_{i} is distributed uniformly at random over {0,1}\displaystyle\{0,1\}; (2) if iE\displaystyle i\notin E, then Ci\displaystyle C_{i} is distributed as 𝖡𝖾𝗋(12eε0)\displaystyle\mathsf{Ber}\left(\frac{1}{2e^{\varepsilon_{0}}}\right) by Lemma 8.5; (3) {Ci}i[D]\displaystyle\{C_{i}\}_{i\in[D]} are independent.

Lemma 8.5 ([BCJM20, Lemma 3.5]).

Let n, q' be specified as in Set-Global-Constants(n,ε,δ). Then, [Bin(n,q') mod 2] is distributed identically to Ber(1/(2e^{ε_0})).

Hence, we have that E[C] = |E|·(1/2) + (D−|E|)·1/(2e^{ε_0}). Plugging this into the equation defining the output z, we have E[z] = E[(2Ce^{ε_0} − D)/(e^{ε_0} − 1)] = (|E|e^{ε_0} + (D−|E|) − D)/(e^{ε_0} − 1) = |E|. An application of Hoeffding’s inequality implies that

Pr[|z|E||>c(ε0)1D]<0.01,\displaystyle\Pr\left[|z-|E||>c\cdot(\varepsilon_{0})^{-1}\cdot\sqrt{D}\right]<0.01,

for a sufficiently large constant c\displaystyle c. Hence, with probability at least 0.99\displaystyle 0.99, the error of the protocol is less than c(ε0)1D=O(Dε1)\displaystyle c\cdot(\varepsilon_{0})^{-1}\cdot\sqrt{D}=O(\sqrt{D}\cdot\varepsilon^{-1}).

Privacy Analysis.

We now prove that our protocol is indeed (ε,δ)\displaystyle(\varepsilon,\delta)-DP. Note that the multi-set of messages S\displaystyle S can be described by integers (yi)i[D]\displaystyle(y_{i})_{i\in[D]} (corresponding to the histogram of the messages).

Consider two neighboring datasets x=(x1,x2,,xn)\displaystyle x=(x_{1},x_{2},\dotsc,x_{n}) and x=(x1,x2,,xn)\displaystyle x^{\prime}=(x_{1}^{\prime},x_{2},\dotsc,x_{n}) (without loss of generality, we assume that they differ at the first user). Let Y\displaystyle Y and Y\displaystyle Y^{\prime} be the corresponding distributions of (yi)i[D]\displaystyle(y_{i})_{i\in[D]} given input datasets x\displaystyle x and x\displaystyle x^{\prime}. The goal is to show that they satisfy the (ε,δ)\displaystyle(\varepsilon,\delta)-DP constraint. That is, we have to establish that dε(Y||Y)δ\displaystyle d_{\varepsilon}(Y||Y^{\prime})\leq\delta.

To simplify the analysis, we introduce another dataset x¯=(0,x2,,xn)\displaystyle\bar{x}=(0,x_{2},\dotsc,x_{n}), and let Y¯\displaystyle\bar{Y} be the corresponding distribution of (yi)i[D]\displaystyle(y_{i})_{i\in[D]} given input dataset x¯\displaystyle\bar{x}. By the composition rule of (ε,δ)\displaystyle(\varepsilon,\delta)-DP, it suffices to show that the pairs (Y,Y¯)\displaystyle(Y,\bar{Y}) and (Y¯,Y)\displaystyle(\bar{Y},Y^{\prime}) satisfy (ε/2,δ/3)\displaystyle(\varepsilon/2,\delta/3)-DP (note that ε<1\displaystyle\varepsilon<1, and δ/3+eε/2δ/3δ\displaystyle\delta/3+e^{\varepsilon/2}\cdot\delta/3\leq\delta). By symmetry, it suffices to consider the pair (Y,Y¯)\displaystyle(Y,\bar{Y}) and prove that dε/2(Y||Y¯)δ/3\displaystyle d_{\varepsilon/2}(Y||\bar{Y})\leq\delta/3.

Let i=x1\displaystyle i=x_{1}, and mi\displaystyle m_{i} be the number of times that i\displaystyle i appears in x2,,xn\displaystyle x_{2},\dotsc,x_{n}. First note that all coordinates in both Y\displaystyle Y and Y¯\displaystyle\bar{Y} are independent, and furthermore the marginal distribution of Y\displaystyle Y and Y¯\displaystyle\bar{Y} on coordinates in [D]{i}\displaystyle[D]\setminus\{i\} are identical. Hence, by Item (1) of Proposition 3.4, it suffices to establish that Yi\displaystyle Y_{i} and Y¯i\displaystyle\bar{Y}_{i} satisfy (ε/2,δ/3)\displaystyle(\varepsilon/2,\delta/3)-DP.

The distribution of Y¯i\displaystyle\bar{Y}_{i} is 𝖡𝗂𝗇(n,q)+2𝖭𝖡(r,p)+𝖡𝗂𝗇(mi,1/2)\displaystyle\mathsf{Bin}(n,q^{\prime})+2\cdot\mathsf{NB}(r,p)+\mathsf{Bin}(m_{i},1/2), and the distribution of Yi\displaystyle Y_{i} is 𝖡𝗂𝗇(n,q)+2𝖭𝖡(r,p)+𝖡𝗂𝗇(mi+1,1/2)\displaystyle\mathsf{Bin}(n,q^{\prime})+2\cdot\mathsf{NB}(r,p)+\mathsf{Bin}(m_{i}+1,1/2).171717 Recall that for two random variables X\displaystyle X and Y\displaystyle Y, we use X+Y\displaystyle X+Y to denote the random variable distributed as a sum of two independent samples from X\displaystyle X and Y\displaystyle Y. Since 𝖡𝗂𝗇(mi+1,1/2)=𝖡𝗂𝗇(mi,1/2)+𝖡𝖾𝗋(1/2)\displaystyle\mathsf{Bin}(m_{i}+1,1/2)=\mathsf{Bin}(m_{i},1/2)+\mathsf{Ber}(1/2), it suffices to consider the case where mi=0\displaystyle m_{i}=0 by Item (1) of Proposition 3.4.

We need the following lemma, whose proof is deferred until we finish the proof of Theorem 8.4.

Lemma 8.6.

Let n, q', r, p be specified as in Set-Global-Constants(n,ε,δ), X = Bin(n,q') + 2·NB(r,p), and Y = Bin(n,q') + 2·NB(r,p) + Ber(1/2). Then,

dε/2(X||Y)δ/3anddε/2(Y||X)δ/3.\displaystyle d_{\varepsilon/2}(X||Y)\leq\delta/3\quad\text{and}\quad d_{\varepsilon/2}(Y||X)\leq\delta/3.

By Lemma 8.6 and the preceding discussion, it follows that d_ε(Y||Y') ≤ δ, which shows that our protocol is (ε,δ)-DP as desired.

In the following, we will need the proposition below which gives us an estimate on q\displaystyle q^{\prime}.

Proposition 8.7.

Let n,q,ε0\displaystyle n,q^{\prime},\varepsilon_{0} be specified as in Set-Global-Constants(n,ε,δ)\displaystyle(n,\varepsilon,\delta). Then, qO(ln(ε01)/n)\displaystyle q^{\prime}\leq O(\ln(\varepsilon_{0}^{-1})/n).

Proof.

Since ε00.01\displaystyle\varepsilon_{0}\leq 0.01, we have eε01ε0/2\displaystyle e^{-\varepsilon_{0}}\leq 1-\varepsilon_{0}/2. Hence, 1eε0ε0/2\displaystyle 1-e^{-\varepsilon_{0}}\geq\varepsilon_{0}/2. Plugging in the definition of q\displaystyle q^{\prime}, it follows that (1eε0)1/neln(ε0/2)/n1+ln(ε0/2)/n\displaystyle(1-e^{-\varepsilon_{0}})^{1/n}\geq e^{\ln(\varepsilon_{0}/2)/n}\geq 1+\ln(\varepsilon_{0}/2)/n. Finally, it follows that

q=1(1eε0)1/n2ln(ε0/2)/2n=O(ln(ε01)/n).\displaystyle q^{\prime}=\frac{1-(1-e^{-\varepsilon_{0}})^{1/n}}{2}\leq-\ln(\varepsilon_{0}/2)/2n=O(\ln(\varepsilon_{0}^{-1})/n).\qed
Efficiency Analysis.

We now analyze the message complexity of our protocol. Note that

𝔼[𝖭𝖡(r/n,p)]=1npr1p=O(1nΔε0log(1/δ))=O(1nε1ln(2/ε)log(1/δ)2).\displaystyle\operatornamewithlimits{\mathbb{E}}[\mathsf{NB}(r/n,p)]=\frac{1}{n}\cdot\frac{pr}{1-p}=O\left(\frac{1}{n}\cdot\frac{\Delta}{\varepsilon_{0}}\cdot\log(1/\delta)\right)=O\left(\frac{1}{n}\cdot\varepsilon^{-1}\ln(2/\varepsilon)\cdot\log(1/\delta)^{2}\right).

By a straightforward calculation, each user sends

12+O(D𝔼[𝖭𝖡(r/n,p)]+Dq)12+O(log(1/δ)2ln(2/ε)εDn)\displaystyle\frac{1}{2}+O(D\cdot\operatornamewithlimits{\mathbb{E}}[\mathsf{NB}(r/n,p)]+D\cdot q^{\prime})\leq\frac{1}{2}+O\left(\frac{\log(1/\delta)^{2}\ln(2/\varepsilon)}{\varepsilon}\cdot\frac{D}{n}\right)

messages in expectation. ∎

Finally, we prove Lemma 8.6.

Proof of Lemma 8.6.

We first bound d_{ε/2}(X||Y). Since q' = O(ln(ε_0^{-1})/n) by Proposition 8.7, we can choose the constant c_0 so that

\Pr\left[\mathsf{Bin}(n,q^{\prime})>c_{0}\cdot\log\delta^{-1}\ln(\varepsilon_{0}^{-1})\right]\leq\delta/10.

Recall that Δ = ⌈c_0·log δ^{-1}·ln(ε_0^{-1}) + 1⌉, and note that our choices of r and p satisfy Lemma 8.2 with privacy parameters ε_0 ≤ ε/6 and δ/10.

Now, let A=𝖡𝗂𝗇(n,q)\displaystyle A=\mathsf{Bin}(n,q^{\prime}), N=𝖭𝖡(r,p)\displaystyle N=\mathsf{NB}(r,p), and B=𝖡𝖾𝗋(1/2)\displaystyle B=\mathsf{Ber}(1/2).

To apply Item (2) of Proposition 3.4, we are going to decompose X=A+2N\displaystyle X=A+2\cdot N and Y=A+2N+B\displaystyle Y=A+2\cdot N+B into a weighted sum of three sub-distributions.

Decomposition of X=A+2N\displaystyle X=A+2\cdot N.

We define three events on A\displaystyle A as follows:

\mathcal{E}_{big}=[A>c_{0}\cdot\log(\delta^{-1})\ln(\varepsilon_{0}^{-1})],\quad\mathcal{E}_{even}=[A\leq c_{0}\cdot\log(\delta^{-1})\ln(\varepsilon_{0}^{-1})\wedge A\equiv 0\bmod 2],

and

\mathcal{E}_{odd}=[A\leq c_{0}\cdot\log(\delta^{-1})\ln(\varepsilon_{0}^{-1})\wedge A\equiv 1\bmod 2].

We let α_big = Pr_A[E_big], α_even = Pr_A[E_even], and α_odd = Pr_A[E_odd].

From our choice of c_0, we have α_big = Pr[Bin(n,q') > c_0·log δ^{-1}·ln(ε_0^{-1})] ≤ δ/10. Let q = 1/(2e^{ε_0}). By Lemma 8.5, it follows that |α_odd − q| ≤ δ/10 and |α_even − (1−q)| ≤ δ/10.

Therefore, let Abig:=A|big\displaystyle A_{big}:=A|\mathcal{E}_{big}, Aeven:=A|even\displaystyle A_{even}:=A|\mathcal{E}_{even} and Aodd:=A|odd\displaystyle A_{odd}:=A|\mathcal{E}_{odd}. We can now decompose A+2N\displaystyle A+2N as a mixture of components Abig+2N\displaystyle A_{big}+2N, Aeven+2N\displaystyle A_{even}+2N and Aodd+2N\displaystyle A_{odd}+2N with corresponding mixing weights αbig\displaystyle\alpha_{big}, αeven\displaystyle\alpha_{even} and αodd\displaystyle\alpha_{odd}.

Decomposition of Y=A+2N+B\displaystyle Y=A+2\cdot N+B.

Now, we define three events on (A,B)\displaystyle(A,B) as follows

\widetilde{\mathcal{E}}_{big}=[A>c_{0}\cdot\log(\delta^{-1})\ln(\varepsilon_{0}^{-1})],\quad\widetilde{\mathcal{E}}_{even}=[A\leq c_{0}\cdot\log(\delta^{-1})\ln(\varepsilon_{0}^{-1})\wedge A+B\equiv 0\bmod 2],

and

\widetilde{\mathcal{E}}_{odd}=[A\leq c_{0}\cdot\log(\delta^{-1})\ln(\varepsilon_{0}^{-1})\wedge A+B\equiv 1\bmod 2].

Similarly, we let β_big = Pr_{A,B}[Ẽ_big], β_even = Pr_{A,B}[Ẽ_even], and β_odd = Pr_{A,B}[Ẽ_odd].

By our choice of c0\displaystyle c_{0}, we have βbigδ/10\displaystyle\beta_{big}\leq\delta/10. Since Pr[A+B1mod2]=1/2\displaystyle\Pr[A+B\equiv 1\bmod 2]=1/2, it follows that |βeven1/2|δ/10\displaystyle|\beta_{even}-1/2|\leq\delta/10 and |βodd1/2|δ/10\displaystyle|\beta_{odd}-1/2|\leq\delta/10.

Let (A+B)big:=(A+B)|~big\displaystyle(A+B)_{big}:=(A+B)|\widetilde{\mathcal{E}}_{big}, (A+B)even:=(A+B)|~even\displaystyle(A+B)_{even}:=(A+B)|\widetilde{\mathcal{E}}_{even} and (A+B)odd:=(A+B)|~odd\displaystyle(A+B)_{odd}:=(A+B)|\widetilde{\mathcal{E}}_{odd}. We therefore decompose A+2N+B\displaystyle A+2N+B as a mixture of components (A+B)big+2N\displaystyle(A+B)_{big}+2N, (A+B)even+2N\displaystyle(A+B)_{even}+2N and (A+B)odd+2N\displaystyle(A+B)_{odd}+2N with mixing weights βbig\displaystyle\beta_{big}, βeven\displaystyle\beta_{even} and βodd\displaystyle\beta_{odd}.

Bounding dε/2(X||Y)\displaystyle d_{\varepsilon/2}(X||Y).

By Item (2) of Proposition 3.4, we have that

dε/2(X||Y)\displaystyle\displaystyle d_{\varepsilon/2}(X||Y)\leq αbig\displaystyle\displaystyle\alpha_{big}
+\displaystyle\displaystyle+ αevendε/2+ln(βeven/αeven)(Aeven+2N||(A+B)even+2N)\displaystyle\displaystyle\alpha_{even}\cdot d_{\varepsilon/2+\ln(\beta_{even}/\alpha_{even})}(A_{even}+2N||(A+B)_{even}+2N)
+\displaystyle\displaystyle+ αodddε/2+ln(βodd/αodd)(Aodd+2N||(A+B)odd+2N).\displaystyle\displaystyle\alpha_{odd}\cdot d_{\varepsilon/2+\ln(\beta_{odd}/\alpha_{odd})}(A_{odd}+2N||(A+B)_{odd}+2N).

Now, note that βoddαodd1\displaystyle\frac{\beta_{odd}}{\alpha_{odd}}\geq 1, and βevenαeven1/2δ/101q+δ/10e2ε0\displaystyle\frac{\beta_{even}}{\alpha_{even}}\geq\frac{1/2-\delta/10}{1-q+\delta/10}\geq e^{-2\varepsilon_{0}}. It follows that ε/2+ln(βeven/αeven)ε/22ε0ε0\displaystyle\varepsilon/2+\ln(\beta_{even}/\alpha_{even})\geq\varepsilon/2-2\varepsilon_{0}\geq\varepsilon_{0} (since ε0ε/6\displaystyle\varepsilon_{0}\leq\varepsilon/6), and ε/2+ln(βodd/αodd)ε/2ε0\displaystyle\varepsilon/2+\ln(\beta_{odd}/\alpha_{odd})\geq\varepsilon/2\geq\varepsilon_{0}.

Hence by Corollary 8.3, we have that

dε/2(X||Y)δ/10+dε0(Aodd+2N||(A+B)odd+2N)+dε0(Aeven+2N||(A+B)even+2N)δ/3.\displaystyle d_{\varepsilon/2}(X||Y)\leq\delta/10+d_{\varepsilon_{0}}(A_{odd}+2N||(A+B)_{odd}+2N)+d_{\varepsilon_{0}}(A_{even}+2N||(A+B)_{even}+2N)\leq\delta/3.

By a similar calculation, we can also bound d_{ε/2}(Y||X) by δ/3. ∎

Extension to Robust Shuffle Privacy.

Now we briefly discuss how to generalize the analysis of the above protocol in order to show that it also satisfies the stronger robust shuffle privacy condition. We first need the following formal definition of robust shuffle privacy.

Definition 8.8 ([BCJM20]).

A protocol P = (R,S,A) is (ε,δ,γ)-robustly shuffle differentially private if, for all n ∈ ℕ and γ' ≥ γ, the algorithm S_{γ'n} ∘ R^{γ'n} is (ε,δ)-DP. In other words, P guarantees (ε,δ)-shuffle privacy whenever at least a γ fraction of users follow the protocol.

Note that while the above definition requires the privacy condition to be satisfied whenever there is at least a γ\displaystyle\gamma fraction of users participating, the accuracy condition is only required when all users participate. That is, if some users drop from the protocol, then the analyzer does not need to output an accurate estimate of CountDistinct.

Theorem 8.9.

For two constants γ,ε(0,1]\displaystyle\gamma,\varepsilon\in(0,1], and δ1/n\displaystyle\delta\leq 1/n, there is an (ε,δ,γ)\displaystyle(\varepsilon,\delta,\gamma)-robustly shuffle differentially private protocol solving CountDistinctn,D\displaystyle\textsf{\small CountDistinct}_{n,D} with error Oγ,ε(D)\displaystyle O_{\gamma,\varepsilon}\left(\sqrt{D}\right) and with probability at least 0.99\displaystyle 0.99. Moreover, the expected number of messages sent by each user is 12+Oγ,ε(log(1/δ)2Dn)\displaystyle\frac{1}{2}+O_{\gamma,\varepsilon}\left(\log(1/\delta)^{2}\cdot\frac{D}{n}\right).181818To make the notation clean, we choose not to analyze the exact dependence on ε\displaystyle\varepsilon and γ\displaystyle\gamma.

Proof Sketch.

To make the algorithm in Theorem 8.4 robustly shuffle private, we need the following modifications:

  • In Algorithm 1, we set q=1(1eε0)1/(γn)2\displaystyle q^{\prime}=\frac{1-(1-e^{-\varepsilon_{0}})^{1/(\gamma n)}}{2}, instead of q=1(1eε0)1/n2\displaystyle q^{\prime}=\frac{1-(1-e^{-\varepsilon_{0}})^{1/n}}{2}.

  • In Algorithm 2, we let η𝖭𝖡(r/(γn),p)\displaystyle\eta\leftarrow\mathsf{NB}(r/(\gamma n),p), instead of η𝖭𝖡(r/n,p)\displaystyle\eta\leftarrow\mathsf{NB}(r/n,p).

  • In Algorithm 3, we set z = (2Cτ − D)/(τ − 1) for τ = 1/(1 − (1 − e^{−ε_0})^{1/γ}), instead of z = (2Ce^{ε_0} − D)/(e^{ε_0} − 1).

The first two modifications guarantee that there is enough noise even when only γn\displaystyle\gamma\cdot n users participate, so that the privacy analysis of Theorem 8.4 goes through. We now show that the last modification allows us to obtain an accurate estimate of CountDistinctn,D\displaystyle\textsf{\small CountDistinct}_{n,D} when all users participate. In the following, we will use the same notation as in the proof of Theorem 8.4. Note that we have

q=1(1eε0)1/(γn)2=1(1τ1)1/n2.\displaystyle q^{\prime}=\frac{1-(1-e^{-\varepsilon_{0}})^{1/(\gamma n)}}{2}=\frac{1-(1-\tau^{-1})^{1/n}}{2}.

Hence by Lemma 8.5, Ci\displaystyle C_{i} is distributed as 𝖡𝖾𝗋(1/2τ)\displaystyle\mathsf{Ber}(1/2\tau) when iE\displaystyle i\notin E. A similar calculation then shows that the error can be bounded by Oτ(D)=Oε,γ(D)\displaystyle O_{\tau}(\sqrt{D})=O_{\varepsilon,\gamma}(\sqrt{D}). ∎
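Concretely, the three modifications only change the constants and the analyzer's rescaling; a sketch of the robust variant of Set-Global-Constants (same caveats as the earlier sketch, with c_0 again standing in for the unspecified constant) is:

```python
import numpy as np

def set_global_constants_robust(n, eps, delta, gamma, c0=50.0):
    """Robust variant (sketch): calibrate the noise as if only gamma*n users participate."""
    eps0 = min(eps / 6, 0.01)
    Delta = int(np.ceil(c0 * np.log2(1 / delta) * np.log(1 / eps0) + 1))
    p = np.exp(-0.1 * eps0 / Delta)
    r = 50 * np.exp(eps0 / Delta) * np.log2(10 / delta)
    q_prime = (1 - (1 - np.exp(-eps0)) ** (1.0 / (gamma * n))) / 2
    tau = 1.0 / (1 - (1 - np.exp(-eps0)) ** (1.0 / gamma))   # tau = e^{eps0} when gamma = 1
    return eps0, p, r, q_prime, tau
# The randomizer now draws eta ~ NB(r/(gamma*n), p), and the analyzer outputs
# (2*C*tau - D) / (tau - 1).
```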

ln(O(n))\displaystyle\ln(O(n))-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} Protocol for CountDistinct.

Finally, we show that the protocol from Theorem 8.4 is also ln(O(n))\displaystyle\ln(O(n))-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} with a simple modification, which proves Theorem 1.3 (restated below).


Theorem 1.3. (restated) There is a (ln(n)+O(1))\displaystyle(\ln(n)+O(1))-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} protocol computing CountDistinctn,n\displaystyle\textsf{\small CountDistinct}_{n,n} with error O(n)\displaystyle O(\sqrt{n}).

Proof Sketch.

Let D=n\displaystyle D=n. We consider the following modification of Algorithm 2.

Input: x[D]\displaystyle x\in[D] is the user’s input. D\displaystyle D is the universe size.
1 ε0=1\displaystyle\varepsilon_{0}=1, q=1(1eε0)1/n2\displaystyle q^{\prime}=\frac{1-(1-e^{-\varepsilon_{0}})^{1/n}}{2};
2 Toss a uniformly random coin to get v{0,1}\displaystyle v\in\{0,1\};
3 if v=1\displaystyle v=1 then
4       send message (x)\displaystyle(x);
5      
6for i[D]{x}\displaystyle i\in[D]\setminus\{x\} do
7       Let y𝖡𝖾𝗋(q)\displaystyle y\leftarrow\mathsf{Ber}(q^{\prime});
8       if y=1\displaystyle y=1 then
9             send message (i)\displaystyle(i);
10            
11      
Algorithm 4 Randomizer(x\displaystyle x, D\displaystyle D, n\displaystyle n, ε\displaystyle\varepsilon, δ\displaystyle\delta)

That is, in Algorithm 4 we remove the noise messages sampled from the distribution 2𝖭𝖡(r/n,p)\displaystyle 2\cdot\mathsf{NB}(r/n,p). Also, we do not send the same message more than once (the loop over i\displaystyle i skips the element x\displaystyle x).

When viewing it as a local protocol, we can assume that each user first collects all the messages it would send in Algorithm 4, and then simply outputs the histogram (so our new local randomizer only sends a single message). The analyzer in the local protocol can then aggregate these histograms, and apply the analyzer in Algorithm 3. By the same accuracy proof as in Theorem 8.4, it follows that the protocol achieves error O(n)\displaystyle O(\sqrt{n}) with probability at least 0.99\displaystyle 0.99. So it only remains to prove that the protocol is ln(O(n))\displaystyle\ln(O(n))-DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}}.
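A minimal sketch of the resulting one-message local randomizer and its analyzer (our own code; the single message is the histogram bit-vector over [D] with D = n, and ε_0 = 1 as in Algorithm 4):

```python
import numpy as np

rng = np.random.default_rng(0)

def local_randomizer(x, n):
    """Single-message local randomizer: the histogram of the messages Algorithm 4 sends."""
    D = n
    q_prime = (1 - (1 - np.exp(-1.0)) ** (1.0 / n)) / 2       # eps0 = 1, q' = Theta(1/n)
    hist = rng.binomial(1, q_prime, size=D)                    # Ber(q') for every i != x
    hist[x - 1] = rng.integers(0, 2)                           # own element sent w.p. 1/2
    return hist

def local_analyzer(histograms, n):
    """Aggregate the users' histograms and apply Algorithm 3's estimator with eps0 = 1."""
    D = n
    C = int(np.sum(np.sum(histograms, axis=0) % 2))
    return (2 * C * np.e - D) / (np.e - 1)
```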

We let R\displaystyle R be the randomizer in Algorithm 4, and we use 𝖧𝗂𝗌𝗍(R(x))\displaystyle\mathsf{Hist}(R(x)) to denote the distribution of the histogram of the messages output by R\displaystyle R, which is exactly the output distribution of our new local randomizer.

Without loss of generality, it suffices to show that for all possible histograms z{0,1}D\displaystyle z\in\{0,1\}^{D} (note that Algorithm 4 does not send a message more than once), it holds that

𝖧𝗂𝗌𝗍(R(1))z𝖧𝗂𝗌𝗍(R(2))zO(n).\displaystyle\frac{\mathsf{Hist}(R(1))_{z}}{\mathsf{Hist}(R(2))_{z}}\leq O(n).

Note that Hist(R(1))_z / Hist(R(2))_z only depends on the first two bits of z. By enumerating all possible combinations of these two bits, we can bound it by

𝖧𝗂𝗌𝗍(R(1))z𝖧𝗂𝗌𝗍(R(2))z1/2q1q1/2=1qqO(n).\displaystyle\frac{\mathsf{Hist}(R(1))_{z}}{\mathsf{Hist}(R(2))_{z}}\leq\frac{1/2}{q^{\prime}}\cdot\frac{1-q^{\prime}}{1/2}=\frac{1-q^{\prime}}{q^{\prime}}\leq O(n).

The last inequality follows from the fact that q=Θ(1/n)\displaystyle q^{\prime}=\Theta(1/n). ∎

8.4 Public-Coin Protocol

We are now ready to prove Theorem 1.6 (restated below).


Theorem 1.6. (restated) For all εO(1)\displaystyle\varepsilon\leq O(1) and δ1/n\displaystyle\delta\leq 1/n, there is a public-coin (ε,δ)\displaystyle(\varepsilon,\delta)-DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} protocol computing CountDistinctn\displaystyle\textsf{\small CountDistinct}_{n} with error O(nlog(δ1)ε1.5ln(2/ε))\displaystyle O\left(\sqrt{n}\cdot\log(\delta^{-1})\cdot\varepsilon^{-1.5}\cdot\sqrt{\ln(2/\varepsilon)}\right) and probability at least 0.99\displaystyle 0.99. Moreover, the expected number of messages sent by each user is at most 1\displaystyle 1.

Proof.

Let c1\displaystyle c_{1} be the constant in Theorem 8.4 such that the expected number of messages is bounded by c1log(1/δ)2ln(2/ε)εDn+1/2\displaystyle c_{1}\cdot\frac{\log(1/\delta)^{2}\ln(2/\varepsilon)}{\varepsilon}\cdot\frac{D}{n}+1/2.

The Protocol.

We set D=n/(2c1log(1/δ)2ln(2/ε)ε)\displaystyle D=\left\lfloor n\left/\left(2\cdot c_{1}\cdot\frac{\log(1/\delta)^{2}\ln(2/\varepsilon)}{\varepsilon}\right)\right.\right\rfloor so that the foregoing expected number of messages is bounded by 1\displaystyle 1.

Note that we can assume ε1ln(2/ε)log(1/δ)2=o(n)\displaystyle\varepsilon^{-1}\cdot\ln(2/\varepsilon)\cdot\log(1/\delta)^{2}=o(n) as otherwise we are only required to solve CountDistinctn\displaystyle\textsf{\small CountDistinct}_{n} with the trivial error bound O(n)\displaystyle O(n). Therefore, we have D1\displaystyle D\geq 1 and n/D=O(ε1ln(2/ε)log(1/δ)2)\displaystyle n/D=O(\varepsilon^{-1}\cdot\ln(2/\varepsilon)\cdot\log(1/\delta)^{2}).

We are going to apply a reduction to the private-coin protocol for CountDistinct_{n,D} in Theorem 8.4. The full protocol is as follows (an illustrative sketch follows the list):

  • Using the public randomness, the users jointly sample a uniformly random mapping f:𝒳[n]\displaystyle f\colon\mathcal{X}\to[n] and a uniformly random permutation π\displaystyle\pi on [n]\displaystyle[n].

  • For each user holding an input x𝒳\displaystyle x\in\mathcal{X}, it computes z=π(f(x))\displaystyle z=\pi(f(x)), and sets its new input to be z\displaystyle z if zD\displaystyle z\leq D, and 0\displaystyle 0 otherwise. Then it runs the private-coin protocol in Theorem 8.4.

  • Let fn(m):=n(1(11n)m)\displaystyle f_{n}(m):=n\cdot\left(1-\left(1-\frac{1}{n}\right)^{m}\right). The analyzer first runs the analyzer in Theorem 8.4 to obtain an estimate z¯\displaystyle\bar{z}. Then it computes z^=z¯nD\displaystyle\hat{z}=\bar{z}\cdot\frac{n}{D}, and outputs

    z=argminm{0,1,,n}|fn(m)z^|.\displaystyle z=\mathop{\textrm{argmin}}_{m\in\{0,1,\dotsc,n\}}|f_{n}(m)-\hat{z}|.
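An illustrative sketch of this reduction (our own code; for illustration the random mapping f is materialized as a table over an explicitly given universe, whereas the protocol only needs it to be determined by the shared public randomness):

```python
import numpy as np

rng = np.random.default_rng(0)   # stands in for the shared public randomness

def preprocess_inputs(inputs, universe, n, D):
    """Map each raw input x to pi(f(x)), and keep it only if it lands in [D] (else 0)."""
    f = rng.integers(1, n + 1, size=len(universe))   # random mapping universe -> [n]
    pi = rng.permutation(n) + 1                      # random permutation of [n]
    index = {u: j for j, u in enumerate(universe)}
    out = []
    for x in inputs:
        z = pi[f[index[x]] - 1]
        out.append(z if z <= D else 0)
    return out

def postprocess_estimate(z_bar, n, D):
    """Scale the private-coin estimate up by n/D and invert f_n(m) = n(1 - (1 - 1/n)^m)."""
    z_hat = z_bar * n / D
    m = np.arange(n + 1)
    f_n = n * (1 - (1 - 1.0 / n) ** m)
    return int(np.argmin(np.abs(f_n - z_hat)))
```

The list returned by preprocess_inputs is what the users feed to the private-coin protocol of Theorem 8.4, and z_bar is the output of its analyzer.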
Analysis of the Protocol.

The privacy of the protocol above follows directly from the privacy property of the protocol from Theorem 8.4. Moreover, the bound on the expected number of messages per user simply follows from our choice of the parameter D\displaystyle D.

It thus suffices to establish the accuracy guarantee of the protocol. Let S={xi}i[n]\displaystyle S=\{x_{i}\}_{i\in[n]} be the set of all inputs, and the goal is to estimate |S|\displaystyle|S|. We also let S^={f(xi)}i[n]\displaystyle\hat{S}=\{f(x_{i})\}_{i\in[n]} and S¯=S^π1([D])\displaystyle\bar{S}=\hat{S}\cap\pi^{-1}([D]).

By the accuracy guarantee of Theorem 8.4, it follows that with probability at least 0.99\displaystyle 0.99, we have

|z¯|S¯||O(Dε1).\displaystyle|\bar{z}-|\bar{S}||\leq O(\sqrt{D}\cdot\varepsilon^{-1}).

In the following, we will condition on this event.

Proving that z^\displaystyle\hat{z} is a good estimate of |S^|\displaystyle|\hat{S}|.

Next, we show that z^\displaystyle\hat{z} is close to |S^|\displaystyle|\hat{S}|. We will rely on the following lemma.

Lemma 8.10.

For a uniformly random permutation π:[n][n]\displaystyle\pi\colon[n]\to[n] and a fixed set E\displaystyle E, for every B[1,n]\displaystyle B\in[1,n] such that n/B\displaystyle n/B is an integer, let Eπ,n/B=Eπ1([n/B])\displaystyle E_{\pi,n/B}=E\cap\pi^{-1}([n/B]), we have

Prπ[||Eπ,n/B|B|E||10B|E|]0.01.\displaystyle\Pr_{\pi}\left[\Big{|}|E_{\pi,n/B}|\cdot B-|E|\Big{|}\geq 10\cdot\sqrt{B\cdot|E|}\right]\leq 0.01.
Proof.

For each i[n]\displaystyle i\in[n], let Xi=Xi(π)\displaystyle X_{i}=X_{i}(\pi) be the indicator that iEπ,n/B\displaystyle i\in E_{\pi,n/B}. Note that these Xi\displaystyle X_{i}’s are not independent, but they are negatively correlated [DR98, Proposition 7 and 11], hence a Chernoff bound still applies.

Note that 𝔼[Xi]=1B|E|n\displaystyle\operatornamewithlimits{\mathbb{E}}[X_{i}]=\frac{1}{B}\cdot\frac{|E|}{n}. By a Chernoff bound, it thus follows that

Prπ[|i=1nXin𝔼[X1]|10n𝔼[Xi]]0.01,\displaystyle\Pr_{\pi}\left[\Big{|}\sum_{i=1}^{n}X_{i}-n\cdot\operatornamewithlimits{\mathbb{E}}[X_{1}]\Big{|}\geq 10\cdot\sqrt{n\cdot\operatornamewithlimits{\mathbb{E}}[X_{i}]}\right]\leq 0.01,

and hence

Prπ[||Eπ,n/B||E|/B|10|E|/B]0.01.\displaystyle\Pr_{\pi}\left[\Big{|}|E_{\pi,n/B}|-|E|/B\Big{|}\geq 10\cdot\sqrt{|E|/B}\right]\leq 0.01.

Scaling both sides of the above inequality by B\displaystyle B concludes the proof. ∎

We now set B=nD\displaystyle B=\frac{n}{D} and recall that z^=Bz¯\displaystyle\hat{z}=B\cdot\bar{z}. By Lemma 8.10, with probability at least 0.98\displaystyle 0.98, it holds that

|z^|S^||\displaystyle\displaystyle|\hat{z}-|\hat{S}|| |z^|S¯|B|+||S^||S¯|B|\displaystyle\displaystyle\leq\left|\hat{z}-|\bar{S}|\cdot B\right|+\left||\hat{S}|-|\bar{S}|\cdot B\right|
B|z¯|S¯||+O(Bn)\displaystyle\displaystyle\leq B\cdot|\bar{z}-|\bar{S}||+O(\sqrt{B\cdot n})
=O(BDε1)+O(Bn)=O(Bnε1).\displaystyle\displaystyle=O(B\sqrt{D}\cdot\varepsilon^{-1})+O(\sqrt{B\cdot n})=O(\sqrt{Bn}\cdot\varepsilon^{-1}).
Proving that z\displaystyle z is a good estimate of |S|\displaystyle|S|.

Finally, we show that our output z\displaystyle z is a good estimate of |S|\displaystyle|S|. To do so, we need the following lemma.

Lemma 8.11.

Let fn(m):=n(1(11n)m)\displaystyle f_{n}(m):=n\cdot\left(1-\left(1-\frac{1}{n}\right)^{m}\right). For a uniformly random mapping f:𝒳[n]\displaystyle f\colon\mathcal{X}\to[n] and a fixed set E𝒳\displaystyle E\subseteq\mathcal{X} such that |E|=mn\displaystyle|E|=m\leq n, we have that

Pr[||{f(x)}xE|fn(m)|10n]0.01.\displaystyle\Pr[\left||\{f(x)\}_{x\in E}|-f_{n}(m)\right|\geq 10\sqrt{n}]\leq 0.01.
Proof.

For each i[n]\displaystyle i\in[n], let Xi=Xi(f)\displaystyle X_{i}=X_{i}(f) be the indicator whether i{f(x)}xE\displaystyle i\in\{f(x)\}_{x\in E}. As before, these Xi\displaystyle X_{i}’s are not independent but are negatively correlated [DR98, Proposition 7 and 11], hence a Chernoff bound still applies.

Note that 𝔼[Xi]=(1(11n)m)\displaystyle\operatornamewithlimits{\mathbb{E}}[X_{i}]=\left(1-\left(1-\frac{1}{n}\right)^{m}\right). By a Chernoff bound, it thus follows that

\Pr_{f}\left[\Big{|}\sum_{i=1}^{n}X_{i}-n\cdot\operatornamewithlimits{\mathbb{E}}[X_{1}]\Big{|}\geq 10\cdot\sqrt{n\cdot\operatornamewithlimits{\mathbb{E}}[X_{1}]}\right]\leq 0.01.

Noting that i=1nXi=|{f(x)}xE|\displaystyle\sum_{i=1}^{n}X_{i}=|\{f(x)\}_{x\in E}| and n𝔼[Xi]=fn(m)\displaystyle n\cdot\operatornamewithlimits{\mathbb{E}}[X_{i}]=f_{n}(m) completes the proof. ∎

By Lemma 8.11, it follows that with probability at least 0.99\displaystyle 0.99, we have ||S^|fn(|S|)|10n\displaystyle\left||\hat{S}|-f_{n}(|S|)\right|\leq 10\sqrt{n}.

Putting everything together, with probability at least 0.97\displaystyle 0.97, we get that

|z^fn(|S|)|O(Bnε1).\displaystyle\left|\hat{z}-f_{n}(|S|)\right|\leq O(\sqrt{Bn}\cdot\varepsilon^{-1}).

The final step is to show that z\displaystyle z accurately estimates |S|\displaystyle|S|. Recall that z=argminm{0,1,,n}|fn(m)z^|\displaystyle z=\mathop{\textrm{argmin}}_{m\in\{0,1,\dotsc,n\}}|f_{n}(m)-\hat{z}|, which in particular means that

|fn(z)z^||fn(|S|)z^|O(Bnε1).\displaystyle|f_{n}(z)-\hat{z}|\leq\left|f_{n}(|S|)-\hat{z}\right|\leq O(\sqrt{Bn}\cdot\varepsilon^{-1}).

By a triangle inequality, it follows that

|fn(z)fn(|S|)|O(Bnε1).\displaystyle|f_{n}(z)-f_{n}(|S|)|\leq O(\sqrt{Bn}\cdot\varepsilon^{-1}).

We need the following lemma to finish the analysis.

Lemma 8.12.

There is a constant c>0\displaystyle c>0 such that for all a,b{0,1,,n}\displaystyle a,b\in\{0,1,\dotsc,n\}, it holds that

|fn(a)fn(b)|c|ab|.\displaystyle|f_{n}(a)-f_{n}(b)|\geq c\cdot|a-b|.
Proof.

Suppose a<b\displaystyle a<b without loss of generality. Let t=ba\displaystyle t=b-a. We have that

fn(b)fn(a)\displaystyle\displaystyle f_{n}(b)-f_{n}(a) =n((11n)a(11n)b)\displaystyle\displaystyle=n\cdot\left(\left(1-\frac{1}{n}\right)^{a}-\left(1-\frac{1}{n}\right)^{b}\right)
=n(11n)a(1(11n)t)\displaystyle\displaystyle=n\cdot\left(1-\frac{1}{n}\right)^{a}\cdot\left(1-\left(1-\frac{1}{n}\right)^{t}\right)
=Ω(ntn)=Ω(t).\displaystyle\displaystyle=\Omega\left(n\cdot\frac{t}{n}\right)=\Omega(t).\qed

Finally, by Lemma 8.12, with probability at least 0.97>0.9\displaystyle 0.97>0.9, we obtain that |z|S||O(Bnε1)\displaystyle|z-|S||\leq O(\sqrt{Bn}\cdot\varepsilon^{-1}), which concludes the proof. ∎

9 Lower Bounds in Two-Party Differential Privacy

In this section, we depart from the local and shuffle models, and instead consider two-party differential privacy [MMP+10], which can be defined as follows.

Definition 9.1 (DP in the Two-Party Model [MMP+10]).

There are two parties A\displaystyle A and B\displaystyle B; A\displaystyle A holds X=(x1,,xn)𝒳n\displaystyle X=(x_{1},\dots,x_{n})\in\mathcal{X}^{n} and B\displaystyle B holds Y=(y1,,yn)𝒳n\displaystyle Y=(y_{1},\dots,y_{n})\in\mathcal{X}^{n}. Let P\displaystyle P be any randomized protocol between A\displaystyle A and B\displaystyle B. Let VIEWAP(X,Y)\displaystyle\textsf{VIEW}^{A}_{P}(X,Y) denote the tuple (X\displaystyle X, the private randomness of A\displaystyle A, the transcript of the protocol). Similarly, let VIEWBP(X,Y)\displaystyle\textsf{VIEW}^{B}_{P}(X,Y) denote the tuple (Y\displaystyle Y, the private randomness of B\displaystyle B, the transcript of the protocol).

We say that P\displaystyle P is (ε,δ)\displaystyle(\varepsilon,\delta)-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} if, for any X,Y𝒳n\displaystyle X,Y\in\mathcal{X}^{n}, the algorithms

(y1,,yn)VIEWAP(X,(y1,,yn))\displaystyle(y_{1},\dots,y_{n})\mapsto\textsf{VIEW}^{A}_{P}(X,(y_{1},\dots,y_{n}))
(x1,,xn)VIEWBP((x1,,xn),Y)\displaystyle(x_{1},\dots,x_{n})\mapsto\textsf{VIEW}^{B}_{P}((x_{1},\dots,x_{n}),Y)

are both (ε,δ)\displaystyle(\varepsilon,\delta)-DP.

We say that a two-party protocol P\displaystyle P computes a function f:𝒳2n\displaystyle f\colon\mathcal{X}^{2n}\to\mathbb{R} with error β\displaystyle\beta if, at the end of the protocol, at least one of the parties can output a number that lies in f(x1,,xn,y1,,yn)±β\displaystyle f(x_{1},\dots,x_{n},y_{1},\dots,y_{n})\pm\beta with probability at least 0.9\displaystyle 0.9.

We quickly note that, unlike in the local and shuffle models, we need not consider the public-coin and private-coin cases separately: as noted in [MMP+10], the two parties may share fresh private random bits without violating privacy, meaning that public randomness is unnecessary.

The goal of this section is to prove Theorem 1.11. To do this, we first state the necessary lower bound from [MMP+10] in Section 9.1. We then give our reduction and prove Theorem 1.11 in Section 9.2. Finally, in Section 9.3, we extend the lower bound to the case where the function is symmetric.

9.1 Inner Product Lower Bound from [MMP+10]

McGregor et al. [MMP+10] show that the inner product function is hard in the two-party model. Roughly speaking, they show that, if we let X,Y\displaystyle X,Y be uniformly random strings, then, for any not-too-large m\displaystyle m\in\mathbb{N}, no (O(1),o(1/n))\displaystyle(O(1),o(1/n))-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} protocol can distinguish between X,Ymodm\displaystyle\left<X,Y\right>\mod m and a uniformly random number from {0,,m1}\displaystyle\{0,\dots,m-1\}. McGregor et al. use this result when m=Ω~ε(n)\displaystyle m=\tilde{\Omega}_{\varepsilon}(\sqrt{n}), but we will use their result for m=2\displaystyle m=2.

To avoid confusion in the next subsection, we will use D\displaystyle D in place of n\displaystyle n in this subsection. The following theorem is implicit in the proof of Theorem A.5 of [MMP+11] (it follows by replacing m=6Δ/δ\displaystyle m=6\Delta/\delta with m=2\displaystyle m=2 there). Recall that 𝒰D\displaystyle\mathcal{U}_{D} is the uniform distribution over {0,1}D\displaystyle\{0,1\}^{D}.

Theorem 9.2 ([MMP+11]).

Let P\displaystyle P be any (ε,δ)\displaystyle(\varepsilon,\delta)-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} protocol. Suppose (X,Y)𝒰D2\displaystyle(X,Y)\leftarrow\mathcal{U}_{D}^{\otimes 2} and let Z\displaystyle Z be a uniformly random bit. Then, we have

(VIEWBP(X,Y),X,Ymod2)(VIEWBP(X,Y),Z)TVO(Dδ)+eΩε(D).\displaystyle\displaystyle\|(\textsf{VIEW}^{B}_{P}(X,Y),\langle X,Y\rangle\mod 2)-(\textsf{VIEW}^{B}_{P}(X,Y),Z)\|_{TV}\leq O\left(D\delta\right)+e^{-\Omega_{\varepsilon}(D)}.

It will be more convenient to state the above lower bound in terms of the hardness of distinguishing two distributions, as we have done in the rest of this paper. To state this, we will need the following notation. First, we use 𝒟^0 to denote the distribution 𝒰_D^{⊗2} conditioned on the inner product of the two strings being 0 mod 2; similarly, we use 𝒟^1 to denote the distribution 𝒰_D^{⊗2} conditioned on the inner product of the two strings being 1 mod 2. Furthermore, for any distribution 𝒟 on ({0,1}^D)^2, we write VIEW^A_P(𝒟) (respectively, VIEW^B_P(𝒟)) to denote the distribution of VIEW^A_P(X,Y) (respectively, VIEW^B_P(X,Y)) when (X,Y) is drawn according to 𝒟.

We may now state the following corollary, which is an immediate consequence of Theorem 9.2.

Corollary 9.3.

Let P\displaystyle P be any (ε,δ)\displaystyle(\varepsilon,\delta)-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} protocol. Then, we have that

VIEWBP(𝒟0)VIEWBP(𝒟1)TVO(Dδ)+eΩε(D).\displaystyle\displaystyle\|\textsf{VIEW}^{B}_{P}(\mathcal{D}^{0})-\textsf{VIEW}^{B}_{P}(\mathcal{D}^{1})\|_{TV}\leq O\left(D\delta\right)+e^{-\Omega_{\varepsilon}(D)}.

9.2 From Parity to the Ω~(n)\displaystyle\tilde{\Omega}(n) Gap

We will now construct the hard distributions that eventually give the gap of Ω̃(n) between the sensitivity and the error achievable in the two-party model. The hard distributions are simply concatenations of 𝒟^0 or 𝒟^1. Specifically, for T ∈ ℕ, we write 𝒟^{0,T} (respectively, 𝒟^{1,T}) to denote the distribution of ((x_1,…,x_{DT}),(y_1,…,y_{DT})) where ((x_{(i−1)D+1},…,x_{iD}),(y_{(i−1)D+1},…,y_{iD})) is an i.i.d. sample from 𝒟^0 (respectively, 𝒟^1) for all i ∈ [T]. Similarly to before, it is hard to distinguish the two distributions:

Lemma 9.4.

Let P\displaystyle P be any (ε,δ)\displaystyle(\varepsilon,\delta)-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} protocol. Then, we have

VIEWBP(𝒟0,T)VIEWBP(𝒟1,T)TVO(TDδ)+TeΩε(D).\displaystyle\displaystyle\|\textsf{VIEW}^{B}_{P}(\mathcal{D}^{0,T})-\textsf{VIEW}^{B}_{P}(\mathcal{D}^{1,T})\|_{TV}\leq O\left(TD\delta\right)+T\cdot e^{-\Omega_{\varepsilon}(D)}.
Proof.

We prove this via a simple hybrid argument. For j ∈ [T+1], let us denote by 𝒟_j the distribution of ((x_1,…,x_{DT}),(y_1,…,y_{DT})) where, for all i ∈ [T], the block ((x_{(i−1)D+1},…,x_{iD}),(y_{(i−1)D+1},…,y_{iD})) is an independent sample from 𝒟^1 if i < j and from 𝒟^0 otherwise. Notice that 𝒟_1 = 𝒟^{0,T} and 𝒟_{T+1} = 𝒟^{1,T}.

Our main claim is the following. For every j[T]\displaystyle j\in[T] and any (ε,δ)\displaystyle(\varepsilon,\delta)-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} protocol P\displaystyle P,

VIEWBP(𝒟j)VIEWBP(𝒟j+1)TVO(Dδ)+eΩε(D).\displaystyle\displaystyle\|\textsf{VIEW}^{B}_{P}(\mathcal{D}_{j})-\textsf{VIEW}^{B}_{P}(\mathcal{D}_{j+1})\|_{TV}\leq O\left(D\delta\right)+e^{-\Omega_{\varepsilon}(D)}. (10)

Note that summing (10) over all j[T]\displaystyle j\in[T] immediately yields Lemma 9.4.

We will now prove (10). Given an (ε,δ)\displaystyle(\varepsilon,\delta)-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} protocol P\displaystyle P (where each party’s input has DT\displaystyle DT bits), we construct a protocol P\displaystyle P^{\prime} (where each party’s input has D\displaystyle D bits) as follows:

  • Suppose that the input of A\displaystyle A is x1,,xD\displaystyle x^{\prime}_{1},\dots,x^{\prime}_{D}, and the input of B\displaystyle B is y1,,yD\displaystyle y^{\prime}_{1},\dots,y^{\prime}_{D}.

  • For i=1,,j1\displaystyle i=1,\dots,j-1, A\displaystyle A samples ((x(i1)D+1,,xiD),(y(i1)D+1,,yiD))\displaystyle((x_{(i-1)D+1},\dots,x_{iD}),(y_{(i-1)D+1},\dots,y_{iD})) from 𝒟1\displaystyle\mathcal{D}^{1} and sends (y(i1)D+1,,yiD)\displaystyle(y_{(i-1)D+1},\dots,y_{iD}) to B\displaystyle B.

  • For i = j+1,…,T, A samples ((x_{(i−1)D+1},…,x_{iD}),(y_{(i−1)D+1},…,y_{iD})) from 𝒟^0 and sends (y_{(i−1)D+1},…,y_{iD}) to B.

  • A\displaystyle A sets (x(j1)D+1,,xjD)=(x1,,xD)\displaystyle(x_{(j-1)D+1},\dots,x_{jD})=(x^{\prime}_{1},\dots,x^{\prime}_{D}).

  • B\displaystyle B sets (y(j1)D+1,,yjD)=(y1,,yD)\displaystyle(y_{(j-1)D+1},\dots,y_{jD})=(y^{\prime}_{1},\dots,y^{\prime}_{D}).

  • A\displaystyle A and B\displaystyle B then run the protocol P\displaystyle P on ((x1,,xDT),(y1,,yDT))\displaystyle((x_{1},\dots,x_{DT}),(y_{1},\dots,y_{DT})).

It is clear that P\displaystyle P^{\prime} is (ε,δ)\displaystyle(\varepsilon,\delta)-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} and that

\|\textsf{VIEW}^{B}_{P^{\prime}}(\mathcal{D}^{0})-\textsf{VIEW}^{B}_{P^{\prime}}(\mathcal{D}^{1})\|_{TV}\geq\|\textsf{VIEW}^{B}_{P}(\mathcal{D}_{j})-\textsf{VIEW}^{B}_{P}(\mathcal{D}_{j+1})\|_{TV}.

Inequality (10) then follows from Corollary 9.3. ∎

We can now prove our main theorem of this section.

Proof of Theorem 1.11.

Let C>0\displaystyle C>0 be a sufficiently large constant to be chosen later. Let D=Clogn\displaystyle D=\lceil C\log n\rceil and T=n/D\displaystyle T=\lfloor n/D\rfloor. We may define f\displaystyle f on only 2DT\displaystyle 2DT bits, as it can be trivially extended to 2n\displaystyle 2n bits by ignoring the last nDT\displaystyle n-DT bits of X,Y\displaystyle X,Y. Let

f(x_{1},\dots,x_{DT},y_{1},\dots,y_{DT})=\sum_{i\in[T]}\left(\sum_{\ell\in[D]}x_{(i-1)D+\ell}\,y_{(i-1)D+\ell}\mod 2\right),

where the outer summation is over \displaystyle\mathbb{Z}.
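For reference, a direct evaluation of f (an illustrative helper of ours), which also makes the unit-sensitivity claim below transparent: flipping a single bit of x or y changes the parity of at most one block, and hence the value of f by at most one.

```python
def block_parity_sum(x, y, D, T):
    """f(x, y): the sum over the T length-D blocks of (<x_block, y_block> mod 2)."""
    total = 0
    for i in range(T):
        inner = sum(x[j] * y[j] for j in range(i * D, (i + 1) * D)) % 2
        total += inner
    return total
```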

It is immediate that the sensitivity of f\displaystyle f is one. We will now argue that any (ε,δ)\displaystyle(\varepsilon,\delta)-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} protocol P\displaystyle P with δ=o(1/n)\displaystyle\delta=o(1/n) incurs error at least Ω(n/logn)\displaystyle\Omega(n/\log n). Since the function is symmetric with respect to the two parties, it suffices without loss of generality to show that the output of B\displaystyle B incurs error Ω(n/logn)\displaystyle\Omega(n/\log n) with probability 0.1. To do so, we start by observing that we have f(X,Y)=0\displaystyle f(X,Y)=0 for any (X,Y)supp(𝒟0,T)\displaystyle(X,Y)\in\mathrm{supp}(\mathcal{D}^{0,T}) whereas f(X,Y)=T\displaystyle f(X,Y)=T for any (X,Y)supp(𝒟1,T)\displaystyle(X,Y)\in\mathrm{supp}(\mathcal{D}^{1,T}). From Lemma 9.4, we have that

VIEWBP(𝒟0,T)VIEWBP(𝒟1,T)TVO(TDδ)+TeΩε(D).\displaystyle\displaystyle\|\textsf{VIEW}^{B}_{P}(\mathcal{D}^{0,T})-\textsf{VIEW}^{B}_{P}(\mathcal{D}^{1,T})\|_{TV}\leq O\left(TD\delta\right)+T\cdot e^{-\Omega_{\varepsilon}(D)}.

As a result, if we sample (X,Y)\displaystyle(X,Y) from 𝒟0,T\displaystyle\mathcal{D}^{0,T} with probability 1/2 and 𝒟1,T\displaystyle\mathcal{D}^{1,T} with probability 1/2, then the probability that B\displaystyle B’s output incurs error at least T/2\displaystyle T/2 is at least

12O(TDδ)TeΩε(D)12o(1)neΩε(Clogn).\displaystyle\displaystyle\frac{1}{2}-O\left(TD\delta\right)-T\cdot e^{-\Omega_{\varepsilon}(D)}\geq\frac{1}{2}-o(1)-n\cdot e^{-\Omega_{\varepsilon}(C\log n)}.

When C\displaystyle C is sufficiently large, we also have that neΩε(Clogn)=o(1)\displaystyle n\cdot e^{-\Omega_{\varepsilon}(C\log n)}=o(1). As a result, with probability 1/2o(1)\displaystyle 1/2-o(1) (which is at least 0.1\displaystyle 0.1 for any sufficiently large n\displaystyle n), the protocol P\displaystyle P must incur an error of at least T/2=Ω(n/logn)\displaystyle T/2=\Omega(n/\log n). ∎

9.3 Symmetrization

Notice that the function in Theorem 1.11 is not symmetric, i.e., it is not invariant under permutations of the users’ inputs. It is natural to ask whether a similar lower bound holds for a symmetric function. In this subsection, we give a simple reduction that answers this question in the affirmative, ultimately yielding the following:

Theorem 9.5.

For any ε=O(1)\displaystyle\varepsilon=O(1) and any sufficiently large n\displaystyle n\in\mathbb{N}, there is a symmetric function f:[2n]2n\displaystyle f:[2n]^{2n}\to\mathbb{R} whose sensitivity is one and such that any (ε,o(1/n))\displaystyle(\varepsilon,o(1/n))-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} protocol cannot compute f\displaystyle f to within an error of o(n/logn)\displaystyle o(n/\log n).

We remark that the input to each user comes from a set 𝒳\displaystyle\mathcal{X} of size Ω(n)\displaystyle\Omega(n), instead of 𝒳={0,1}\displaystyle\mathcal{X}=\{0,1\} as in Theorem 1.11. This larger value of |𝒳|\displaystyle|\mathcal{X}| turns out to be necessary for symmetric functions: when f\displaystyle f is symmetric, we may use the Laplace mechanism from both sides to estimate the histogram of the input, which we can then use to compute f\displaystyle f. If the sensitivity of f\displaystyle f is O(1)\displaystyle O(1), this algorithm incurs an error of Oε(|𝒳|)\displaystyle O_{\varepsilon}(|\mathcal{X}|). Hence, to achieve a lower bound of Ω~(n)\displaystyle\tilde{\Omega}(n), we need |𝒳|\displaystyle|\mathcal{X}| to be at least Ω~ε(n)\displaystyle\tilde{\Omega}_{\varepsilon}(n).

The properties of our reduction are summarized in the following lemma, which combined with Theorem 1.11, immediately implies Theorem 9.5.

Lemma 9.6.

For any function g:𝒳2n\displaystyle g\colon\mathcal{X}^{2n}\to\mathbb{R}, there is another function f:(𝒳×[n])2n\displaystyle f:(\mathcal{X}\times[n])^{2n}\to\mathbb{R} such that the following holds:

  • The sensitivity of f\displaystyle f is no more than that of g\displaystyle g.

  • If there exists an (ε,δ)\displaystyle(\varepsilon,\delta)-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} protocol that solves f\displaystyle f with error β\displaystyle\beta, then there exists an (ε,δ)\displaystyle(\varepsilon,\delta)-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} protocol that solves g\displaystyle g with error 2β\displaystyle 2\beta.

The idea behind the proof of Lemma 9.6 is simple. Roughly speaking, we view each input (x,i)𝒳×[n]\displaystyle(x,i)\in\mathcal{X}\times[n] of f\displaystyle f as “setting the i\displaystyle ith position to x\displaystyle x” for the input to g\displaystyle g. This is formalized below.

Proof of Lemma 9.6.

We start by defining f\displaystyle f. Let u\displaystyle u^{*} be an arbitrary element of 𝒳\displaystyle\mathcal{X}. For every i[n]\displaystyle i\in[n], we define hi:(𝒳×[n])n𝒳\displaystyle h_{i}:(\mathcal{X}\times[n])^{n}\to\mathcal{X} by

hi((w1,,wn))={the unique x such that j[n],wj=(x,i) if |{x𝒳j[n],wj=(x,i)}|=1,u otherwise.\displaystyle\displaystyle h_{i}((w_{1},\dots,w_{n}))=\begin{cases}\text{the unique }x\text{ such that }\exists j\in[n],w_{j}=(x,i)&\text{ if }|\{x\in\mathcal{X}\mid\exists j\in[n],w_{j}=(x,i)\}|=1,\\ u^{*}&\text{ otherwise.}\end{cases}

We now define f\displaystyle f by

f(W,V)=12g(h1(W),,hn(W),h1(V),,hn(V)).\displaystyle\displaystyle f(W,V)=\frac{1}{2}\cdot g(h_{1}(W),\dots,h_{n}(W),h_{1}(V),\cdots,h_{n}(V)).

We will next verify that the two properties hold.

  • Notice that changing a single user’s input to f\displaystyle f changes at most two of the values h1(),,hn()\displaystyle h_{1}(\cdot),\dots,h_{n}(\cdot), i.e., at most two coordinates of the input to g\displaystyle g. Combined with the factor 1/2\displaystyle 1/2 in the definition of f\displaystyle f, this implies that the sensitivity of f\displaystyle f is no more than that of g\displaystyle g.

  • Suppose that there exists an (ε,δ)\displaystyle(\varepsilon,\delta)-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}} protocol P\displaystyle P that solves f\displaystyle f with error β\displaystyle\beta. Let P\displaystyle P^{\prime} be the protocol for g\displaystyle g where A,B\displaystyle A,B transform their inputs (x1,,xn)\displaystyle(x_{1},\dots,x_{n}), (y1,,yn)\displaystyle(y_{1},\dots,y_{n}) to ((x1,1),,(xn,n))\displaystyle((x_{1},1),\dots,(x_{n},n)), ((y1,1),,(yn,n))\displaystyle((y_{1},1),\dots,(y_{n},n)) respectively, then run P\displaystyle P, and finally return the output of P\displaystyle P multiplied by two. It is obvious that P\displaystyle P^{\prime} is (ε,δ)\displaystyle(\varepsilon,\delta)-DPtwo-party\displaystyle\mathrm{DP}_{\mathrm{two\text{-}party}}; furthermore, since the protocol P\displaystyle P incurs error β\displaystyle\beta, the protocol P\displaystyle P^{\prime} incurs error 2β\displaystyle 2\beta as desired. ∎
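The following Python sketch makes the reduction explicit; g is assumed to be given as a black-box function on 2n coordinates, and the helper names are ours.

```python
def h(i, W, u_star):
    """h_i(W): the unique x with (x, i) in W if exactly one such x exists, else u*."""
    candidates = {x for (x, idx) in W if idx == i}
    return next(iter(candidates)) if len(candidates) == 1 else u_star

def f_from_g(g, W, V, n, u_star):
    """f(W, V) = (1/2) * g(h_1(W), ..., h_n(W), h_1(V), ..., h_n(V))."""
    args = [h(i, W, u_star) for i in range(1, n + 1)] \
         + [h(i, V, u_star) for i in range(1, n + 1)]
    return 0.5 * g(args)

# In the reduction of the second bullet, A maps (x_1, ..., x_n) to the tagged
# tuple ((x_1, 1), ..., (x_n, n)) and B does the same with its input, so that
# h_i(W) = x_i and h_i(V) = y_i, and hence 2 * f(W, V) = g(x_1, ..., y_n).
```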

Acknowledgments

We would like to thank Noah Golowich for numerous enlightening discussions about lower bounds in the multi-message DPshuffle\displaystyle\mathrm{DP}_{\mathrm{shuffle}} model, and for helpful feedback.

References

  • [Abo18] John M Abowd. The US Census Bureau adopts differential privacy. In KDD, pages 2867–2867, 2018.
  • [App17] Apple Differential Privacy Team. Learning with privacy at scale. Apple Machine Learning Journal, 2017.
  • [BBGN19] Borja Balle, James Bell, Adrià Gascón, and Kobbi Nissim. The privacy blanket of the shuffle model. In CRYPTO, pages 638–667, 2019.
  • [BBGN20] Borja Balle, James Bell, Adrià Gascón, and Kobbi Nissim. Private summation in the multi-message shuffle model. arXiv: 2002.00817, 2020.
  • [BC20] Victor Balcer and Albert Cheu. Separating local & shuffled differential privacy via histograms. In ITC, pages 1:1–1:14, 2020.
  • [BCJM20] Victor Balcer, Albert Cheu, Matthew Joseph, and Jieming Mao. Connecting robust shuffle privacy and pan-privacy. CoRR, abs/2004.09481, 2020.
  • [BCK+14] Joshua Brody, Amit Chakrabarti, Ranganath Kondapally, David P Woodruff, and Grigory Yaroslavtsev. Beyond set disjointness: the communication complexity of finding the intersection. In PODC, pages 106–113, 2014.
  • [BEM+17] Andrea Bittau, Úlfar Erlingsson, Petros Maniatis, Ilya Mironov, Ananth Raghunathan, David Lie, Mitch Rudominer, Ushasree Kode, Julien Tinnes, and Bernhard Seefeld. Prochlo: Strong privacy for analytics in the crowd. In SOSP, pages 441–459, 2017.
  • [BFJ+94] Avrim Blum, Merrick Furst, Jeffrey Jackson, Michael Kearns, Yishay Mansour, and Steven Rudich. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In STOC, pages 253–262, 1994.
  • [CDSKY20] Seung Geol Choi, Dana Dachman-Soled, Mukul Kulkarni, and Arkady Yerukhimovich. Differentially-private multi-party sketching for large-scale statistics. PoPETs, 3:153–174, 2020.
  • [CSS12] TH Hubert Chan, Elaine Shi, and Dawn Song. Optimal lower bound for differentially private multi-party aggregation. In ESA, pages 277–288, 2012.
  • [CSU+18] Albert Cheu, Adam D. Smith, Jonathan Ullman, David Zeber, and Maxim Zhilyaev. Distributed differential privacy via mixnets. CoRR, abs/1808.01394, 2018.
  • [CU20] Albert Cheu and Jonathan Ullman. The limits of pan privacy and shuffle privacy for learning and estimation. CoRR, abs/2009.08000, 2020.
  • [DJW13] John C Duchi, Michael I Jordan, and Martin J Wainwright. Local privacy and statistical minimax rates. In FOCS, pages 429–438, 2013.
  • [DKM+06] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, pages 486–503, 2006.
  • [DKY17] Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin. Collecting telemetry data privately. In NIPS, pages 3571–3580, 2017.
  • [DLB19] Damien Desfontaines, Andreas Lochbihler, and David Basin. Cardinality estimators do not preserve privacy. PoPETs, 2019(2):26–46, 2019.
  • [DMNS06] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, pages 265–284, 2006.
  • [DR98] Devdatt P. Dubhashi and Desh Ranjan. Balls and bins: A study in negative dependence. Random Struct. Algorithms, 13(2):99–124, 1998.
  • [DR14] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., 9(3-4):211–407, 2014.
  • [DRV10] Cynthia Dwork, Guy N. Rothblum, and Salil P. Vadhan. Boosting and differential privacy. In FOCS, pages 51–60, 2010.
  • [EFM+19] Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Abhradeep Thakurta. Amplification by shuffling: From local to central differential privacy via anonymity. In SODA, pages 2468–2479, 2019.
  • [ENU20] Alexander Edmonds, Aleksandar Nikolov, and Jonathan Ullman. The power of factorization mechanisms in local and central differential privacy. In STOC, pages 425–438, 2020.
  • [EPK14] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In CCS, pages 1054–1067, 2014.
  • [GGK+19] Badih Ghazi, Noah Golowich, Ravi Kumar, Rasmus Pagh, and Ameya Velingker. On the power of multiple anonymous messages. IACR Cryptol. ePrint Arch., 2019:1382, 2019.
  • [GGK+20] Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Manurangsi, Rasmus Pagh, and Ameya Velingker. Pure differentially private summation from anonymous messages. In ITC, pages 15:1–15:23, 2020.
  • [GKMP20] Badih Ghazi, Ravi Kumar, Pasin Manurangsi, and Rasmus Pagh. Private counting from anonymous messages: Near-optimal accuracy with vanishing communication overhead. In ICML, 2020.
  • [GMPV20] Badih Ghazi, Pasin Manurangsi, Rasmus Pagh, and Ameya Velingker. Private aggregation from fewer anonymous messages. In EUROCRYPT, pages 798–827, 2020.
  • [Gre16] Andy Greenberg. Apple’s “differential privacy” is about collecting your data – but not your data. Wired, June, 13, 2016.
  • [GS02] Alison L Gibbs and Francis Edward Su. On choosing and bounding probability metrics. International statistical review, 70(3):419–435, 2002.
  • [IKOS06] Yuval Ishai, Eyal Kushilevitz, Rafail Ostrovsky, and Amit Sahai. Cryptography from anonymity. In FOCS, pages 239–248, 2006.
  • [JHW18] Jiantao Jiao, Yanjun Han, and Tsachy Weissman. Minimax estimation of the ℓ1 distance. IEEE Trans. Inf. Theory, 64(10):6672–6706, 2018.
  • [Kea98] Michael Kearns. Efficient noise-tolerant learning from statistical queries. Journal of the ACM (JACM), 45(6):983–1006, 1998.
  • [KLN+11] Shiva Prasad Kasiviswanathan, Homin K Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What can we learn privately? SIAM Journal on Computing, 40(3):793–826, 2011.
  • [KNW10] Daniel M Kane, Jelani Nelson, and David P Woodruff. An optimal algorithm for the distinct elements problem. In PODS, pages 41–52, 2010.
  • [MMNW11] Darakhshan Mir, Shan Muthukrishnan, Aleksandar Nikolov, and Rebecca N Wright. Pan-private algorithms via statistics on sketches. In PODS, pages 37–48, 2011.
  • [MMP+10] Andrew McGregor, Ilya Mironov, Toniann Pitassi, Omer Reingold, Kunal Talwar, and Salil Vadhan. The limits of two-party differential privacy. In FOCS, pages 81–90, 2010.
  • [MMP+11] Andrew McGregor, Ilya Mironov, Toniann Pitassi, Omer Reingold, Kunal Talwar, and Salil P. Vadhan. The limits of two-party differential privacy. Electron. Colloquium Comput. Complex., 18:106, 2011.
  • [MT07] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In FOCS, pages 94–103, 2007.
  • [O’D14] Ryan O’Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014.
  • [PS20] Rasmus Pagh and Nina Mesing Stausholm. Efficient differentially private f0\displaystyle f_{0} linear sketching. arXiv preprint arXiv:2001.11932, 2020.
  • [PT11] Giovanni Peccati and Murad S Taqqu. Some facts about Charlier polynomials. In Wiener Chaos: Moments, Cumulants and Diagrams, pages 171–175. Springer, 2011.
  • [Rob90] Robert B. Ash. Information Theory. Dover Publications, 1990.
  • [Sha14] Stephen Shankland. How Google tricks itself to protect Chrome user privacy. CNET, October, 2014.
  • [SU17] Thomas Steinke and Jonathan Ullman. Tight lower bounds for differentially private selection. In FOCS, pages 552–563, 2017.
  • [Tim14] Aleksandr Filippovich Timan. Theory of approximation of functions of a real variable. Elsevier, 2014.
  • [Ull18] Jonathan Ullman. Tight lower bounds for locally differentially private selection. arXiv:1802.02638, 2018.
  • [Vad17] Salil Vadhan. The complexity of differential privacy. In Tutorials on the Foundations of Cryptography, pages 347–450. Springer, 2017.
  • [VV17] Gregory Valiant and Paul Valiant. Estimating the unseen: Improved estimators for entropy and other properties. J. ACM, 64(6):37:1–37:41, 2017.
  • [War65] Stanley L Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63–69, 1965.
  • [WY16] Yihong Wu and Pengkun Yang. Minimax rates of entropy estimation on large alphabets via best polynomial approximation. IEEE Trans. Inf. Theory, 62(6):3702–3720, 2016.
  • [WY19] Yihong Wu and Pengkun Yang. Chebyshev polynomials, moment matching, and optimal estimation of the unseen. The Annals of Statistics, 47(2):857–883, 2019.
  • [Yan19] Yanjun Han. Lecture 7: Mixture vs. mixture and moment matching. https://theinformaticists.com/2019/08/28/lecture-7-mixture-vs-mixture-and-moment-matching/, 2019.

Appendix A Total Variation Bound between Mixtures of Multi-dimensional Poisson Distributions

In this section we prove Lemma 4.3 (restated below).


Lemma 4.3. (restated) Let U,V\displaystyle U,V be two random variables supported on [0,Λ]\displaystyle[0,\Lambda] such that 𝔼[Uj]=𝔼[Vj]\displaystyle\operatornamewithlimits{\mathbb{E}}[U^{j}]=\operatornamewithlimits{\mathbb{E}}[V^{j}] for all j{1,2,,L}\displaystyle j\in\{1,2,\dotsc,L\}, where L1\displaystyle L\geq 1. Let D\displaystyle D\in\mathbb{N} and θ,λ(0)D\displaystyle\vec{\theta},\vec{\lambda}\in(\mathbb{R}^{\geq 0})^{D} such that θ1=1\displaystyle\|\vec{\theta}\|_{1}=1. Let 𝒟θ\displaystyle\mathcal{D}_{\vec{\theta}} be the distribution over [D]\displaystyle[D] corresponding to θ\displaystyle\vec{\theta}. Suppose that

Pri𝒟θ[λi2Λ2θi]112Λ.\displaystyle\Pr_{i\leftarrow\mathcal{D}_{\vec{\theta}}}[\vec{\lambda}_{i}\geq 2\Lambda^{2}\cdot\vec{\theta}_{i}]\geq 1-\frac{1}{2\Lambda}.

Then,

𝔼[𝖯𝗈𝗂(Uθ+λ)]𝔼[𝖯𝗈𝗂(Vθ+λ)]TV21L!.\displaystyle\|\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(U\vec{\theta}+\vec{\lambda})]-\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(V\vec{\theta}+\vec{\lambda})]\|_{TV}^{2}\leq\frac{1}{L!}.

To prove Lemma 4.3, we begin with some notation. Let D\displaystyle D\in\mathbb{N}. For vectors mD\displaystyle\vec{m}\in\mathbb{N}^{D} and λD\displaystyle\vec{\lambda}\in\mathbb{R}^{D}, we let

m!:=i=1Dmi! and λm:=i=1D(λi)mi.\displaystyle\vec{m}!:=\prod_{i=1}^{D}m_{i}!~{}~{}\text{ and }~{}~{}\vec{\lambda}^{\vec{m}}:=\prod_{i=1}^{D}(\vec{\lambda}_{i})^{\vec{m}_{i}}.

We are going to apply the moment-matching technique [WY16, JHW18, WY19] for bounding the total variation distance between mixtures of (single-dimensional) Poisson distributions. The following lemma is a direct generalization of Theorem 4 of [Yan19] to mixtures of multi-dimensional Poisson distributions. We will use the convention that 00=1\displaystyle 0^{0}=1.

Lemma A.1.

For λD\displaystyle\vec{\lambda}\in\mathbb{R}^{D} and two distributions U\displaystyle\vec{U} and V\displaystyle\vec{V} supported on i=1D[λi,]\displaystyle\prod_{i=1}^{D}[-\lambda_{i},\infty], we have that

𝔼[𝖯𝗈𝗂(U+λ)]𝔼[𝖯𝗈𝗂(V+λ)]TV2m(0)D(𝔼[Um]𝔼[Vm])2m!λm.\displaystyle\|\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(\vec{U}+\vec{\lambda})]-\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(\vec{V}+\vec{\lambda})]\|_{TV}^{2}\leq\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}}\frac{\left(\operatornamewithlimits{\mathbb{E}}[\vec{U}^{\vec{m}}]-\operatornamewithlimits{\mathbb{E}}[\vec{V}^{\vec{m}}]\right)^{2}}{\vec{m}!\cdot\vec{\lambda}^{\vec{m}}}.

In order to prove the above lemma, we need to use the Charlier polynomial cm(x;λ)\displaystyle c_{m}(x;\lambda). The explicit definition of cm(x;λ)\displaystyle c_{m}(x;\lambda) is not important here; we simply list two important properties of this polynomial family [PT11]:

Proposition A.2.

Let λ>0\displaystyle\lambda>0 and u[λ,)\displaystyle u\in[-\lambda,\infty). The following hold:

  1.

    We have that

    𝔼X𝖯𝗈𝗂(λ)[cm(X;λ)cn(X;λ)]=n!λn𝟙[m=n].\displaystyle\operatornamewithlimits{\mathbb{E}}_{X\leftarrow\mathsf{Poi}(\lambda)}[c_{m}(X;\lambda)c_{n}(X;\lambda)]=\frac{n!}{\lambda^{n}}\cdot\mathbb{1}[m=n].
  2.

    For all z0\displaystyle z\in\mathbb{Z}^{\geq 0},

    𝖯𝗈𝗂(λ+u)z𝖯𝗈𝗂(λ)z=eu(1+uλ)z=m=0cm(z;λ)umm!.\displaystyle\frac{\mathsf{Poi}(\lambda+u)_{z}}{\mathsf{Poi}(\lambda)_{z}}=e^{-u}\cdot\left(1+\frac{u}{\lambda}\right)^{z}=\sum_{m=0}^{\infty}c_{m}(z;\lambda)\cdot\frac{u^{m}}{m!}.

We now prove Lemma A.1. Our proof closely follows the proof of Theorem 4 of [Yan19].

Proof of Lemma A.1.

Let Δ:=𝔼[𝖯𝗈𝗂(U+λ)]𝔼[𝖯𝗈𝗂(V+λ)]TV\displaystyle\Delta:=\|\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(\vec{U}+\vec{\lambda})]-\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(\vec{V}+\vec{\lambda})]\|_{TV}. We have that

Δ\displaystyle\displaystyle\Delta =12z(0)D|𝔼uU𝖯𝗈𝗂(u+λ)z𝔼uV𝖯𝗈𝗂(u+λ)z|\displaystyle\displaystyle=\frac{1}{2}\cdot\sum_{\vec{z}\in(\mathbb{Z}^{\geq 0})^{D}}\left|\operatornamewithlimits{\mathbb{E}}_{\vec{u}\leftarrow\vec{U}}\vec{\mathsf{Poi}}(\vec{u}+\vec{\lambda})_{\vec{z}}-\operatornamewithlimits{\mathbb{E}}_{\vec{u}\leftarrow\vec{V}}\vec{\mathsf{Poi}}(\vec{u}+\vec{\lambda})_{\vec{z}}\right|
𝔼z𝖯𝗈𝗂(λ)|𝔼uU𝖯𝗈𝗂(u+λ)z𝖯𝗈𝗂(λ)z𝔼uV𝖯𝗈𝗂(u+λ)z𝖯𝗈𝗂(λ)z|\displaystyle\displaystyle\leq\operatornamewithlimits{\mathbb{E}}_{\vec{z}\leftarrow\vec{\mathsf{Poi}}(\vec{\lambda})}\left|\operatornamewithlimits{\mathbb{E}}_{\vec{u}\leftarrow\vec{U}}\frac{\vec{\mathsf{Poi}}(\vec{u}+\vec{\lambda})_{\vec{z}}}{\vec{\mathsf{Poi}}(\vec{\lambda})_{\vec{z}}}-\operatornamewithlimits{\mathbb{E}}_{\vec{u}\leftarrow\vec{V}}\frac{\vec{\mathsf{Poi}}(\vec{u}+\vec{\lambda})_{\vec{z}}}{\vec{\mathsf{Poi}}(\vec{\lambda})_{\vec{z}}}\right|
=𝔼z𝖯𝗈𝗂(λ)|m(0)Di=1Dcmi(zi;λi)𝔼[Um]𝔼[Vm]m!|,\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{\vec{z}\leftarrow\vec{\mathsf{Poi}}(\vec{\lambda})}\left|\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}}\prod_{i=1}^{D}c_{\vec{m}_{i}}(\vec{z}_{i};\vec{\lambda}_{i})\cdot\frac{\operatornamewithlimits{\mathbb{E}}[\vec{U}^{\vec{m}}]-\operatornamewithlimits{\mathbb{E}}[\vec{V}^{\vec{m}}]}{\vec{m}!}\right|,

where the last equality follows from Item (2) of Proposition A.2.

Applying the Cauchy-Schwarz inequality, we get that

Δ2\displaystyle\displaystyle\Delta^{2} 𝔼z𝖯𝗈𝗂(λ)|m(0)Di=1Dcmi(zi;λi)𝔼[Um]𝔼[Vm]m!|2\displaystyle\displaystyle\leq\operatornamewithlimits{\mathbb{E}}_{\vec{z}\leftarrow\vec{\mathsf{Poi}}(\vec{\lambda})}\left|\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}}\prod_{i=1}^{D}c_{\vec{m}_{i}}(\vec{z}_{i};\vec{\lambda}_{i})\cdot\frac{\operatornamewithlimits{\mathbb{E}}[\vec{U}^{\vec{m}}]-\operatornamewithlimits{\mathbb{E}}[\vec{V}^{\vec{m}}]}{\vec{m}!}\right|^{2}
=𝔼z𝖯𝗈𝗂(λ)m,m(0)Di=1Dcmi(zi;λi)cmi(zi;λi)𝔼[Um]𝔼[Vm]m!𝔼[Um]𝔼[Vm]m!\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{\vec{z}\leftarrow\vec{\mathsf{Poi}}(\vec{\lambda})}\sum_{\vec{m},\vec{m}^{\prime}\in(\mathbb{Z}^{\geq 0})^{D}}\prod_{i=1}^{D}c_{\vec{m}_{i}}(\vec{z}_{i};\vec{\lambda}_{i})c_{\vec{m}^{\prime}_{i}}(\vec{z}_{i};\vec{\lambda}_{i})\cdot\frac{\operatornamewithlimits{\mathbb{E}}[\vec{U}^{\vec{m}}]-\operatornamewithlimits{\mathbb{E}}[\vec{V}^{\vec{m}}]}{\vec{m}!}\cdot\frac{\operatornamewithlimits{\mathbb{E}}[\vec{U}^{\vec{m}^{\prime}}]-\operatornamewithlimits{\mathbb{E}}[\vec{V}^{\vec{m}^{\prime}}]}{\vec{m}^{\prime}!}
=m,m(0)D𝔼[Um]𝔼[Vm]m!𝔼[Um]𝔼[Vm]m!𝔼z𝖯𝗈𝗂(λ)i=1Dcmi(zi;λi)cmi(zi;λi)\displaystyle\displaystyle=\sum_{\vec{m},\vec{m}^{\prime}\in(\mathbb{Z}^{\geq 0})^{D}}\frac{\operatornamewithlimits{\mathbb{E}}[\vec{U}^{\vec{m}}]-\operatornamewithlimits{\mathbb{E}}[\vec{V}^{\vec{m}}]}{\vec{m}!}\cdot\frac{\operatornamewithlimits{\mathbb{E}}[\vec{U}^{\vec{m}^{\prime}}]-\operatornamewithlimits{\mathbb{E}}[\vec{V}^{\vec{m}^{\prime}}]}{\vec{m}^{\prime}!}\operatornamewithlimits{\mathbb{E}}_{\vec{z}\leftarrow\vec{\mathsf{Poi}}(\vec{\lambda})}\prod_{i=1}^{D}c_{\vec{m}_{i}}(\vec{z}_{i};\vec{\lambda}_{i})c_{\vec{m}^{\prime}_{i}}(\vec{z}_{i};\vec{\lambda}_{i})
=m,m(0)D𝔼[Um]𝔼[Vm]m!𝔼[Um]𝔼[Vm]m!i=1D𝔼zi𝖯𝗈𝗂(λi)cmi(zi;λi)cmi(zi;λi)\displaystyle\displaystyle=\sum_{\vec{m},\vec{m}^{\prime}\in(\mathbb{Z}^{\geq 0})^{D}}\frac{\operatornamewithlimits{\mathbb{E}}[\vec{U}^{\vec{m}}]-\operatornamewithlimits{\mathbb{E}}[\vec{V}^{\vec{m}}]}{\vec{m}!}\cdot\frac{\operatornamewithlimits{\mathbb{E}}[\vec{U}^{\vec{m}^{\prime}}]-\operatornamewithlimits{\mathbb{E}}[\vec{V}^{\vec{m}^{\prime}}]}{\vec{m}^{\prime}!}\prod_{i=1}^{D}\operatornamewithlimits{\mathbb{E}}_{\vec{z}_{i}\leftarrow\mathsf{Poi}(\vec{\lambda}_{i})}c_{\vec{m}_{i}}(\vec{z}_{i};\vec{\lambda}_{i})c_{\vec{m}^{\prime}_{i}}(\vec{z}_{i};\vec{\lambda}_{i})
=m(0)D(𝔼[Um]𝔼[Vm]m!)2i=1D(mi)!λimi\displaystyle\displaystyle=\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}}\left(\frac{\operatornamewithlimits{\mathbb{E}}[\vec{U}^{\vec{m}}]-\operatornamewithlimits{\mathbb{E}}[\vec{V}^{\vec{m}}]}{\vec{m}!}\right)^{2}\cdot\prod_{i=1}^{D}\frac{(\vec{m}_{i})!}{\vec{\lambda}_{i}^{m_{i}}}
=m(0)D(𝔼[Um]𝔼[Vm])2m!λm,\displaystyle\displaystyle=\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}}\frac{\left(\operatornamewithlimits{\mathbb{E}}[\vec{U}^{\vec{m}}]-\operatornamewithlimits{\mathbb{E}}[\vec{V}^{\vec{m}}]\right)^{2}}{\vec{m}!\cdot\vec{\lambda}^{\vec{m}}},

where the penultimate equality follows from Item (1) of Proposition A.2. ∎
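For intuition, the following Python snippet numerically compares the two sides of Lemma A.1 in the one-dimensional case (D=1), using two-point priors whose first moments match; the particular parameter values and truncation points are arbitrary choices, and the snippet is a sanity check rather than part of the proof.

```python
import math

def pois_pmf(lam, z):
    return math.exp(-lam) * lam ** z / math.factorial(z)

# Two-point priors on [0, Lambda] whose first moments match (so L = 1).
U = [(0.0, 0.5), (1.0, 0.5)]    # (value, probability); E[U] = 0.5
V = [(0.25, 0.5), (0.75, 0.5)]  # E[V] = 0.5, but higher moments differ
lam = 5.0                       # the (scalar) shift lambda

def mixture_pmf(prior, z):
    return sum(p * pois_pmf(u + lam, z) for (u, p) in prior)

def moment(prior, m):
    return sum(p * u ** m for (u, p) in prior)

# Left-hand side of Lemma A.1: TV distance between the two Poisson mixtures.
tv = 0.5 * sum(abs(mixture_pmf(U, z) - mixture_pmf(V, z)) for z in range(80))

# Right-hand side: sum over m of (E[U^m] - E[V^m])^2 / (m! * lam^m).
bound = sum((moment(U, m) - moment(V, m)) ** 2 / (math.factorial(m) * lam ** m)
            for m in range(40))

print(tv ** 2, "<=", bound)  # by Lemma A.1 the first quantity is at most the second
```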

Applying Lemma A.1, the next lemma follows from a straightforward calculation.

Lemma A.3.

Let U,V\displaystyle U,V be two random variables supported on [0,Λ]\displaystyle[0,\Lambda] such that 𝔼[Uj]=𝔼[Vj]\displaystyle\operatornamewithlimits{\mathbb{E}}[U^{j}]=\operatornamewithlimits{\mathbb{E}}[V^{j}] for all j{1,2,,L}\displaystyle j\in\{1,2,\dotsc,L\}, where L1\displaystyle L\geq 1. For θ,λ(0)D\displaystyle\vec{\theta},\vec{\lambda}\in(\mathbb{R}^{\geq 0})^{D}, let α=θ2Λθ+λ\displaystyle\vec{\alpha}=\frac{\vec{\theta}^{2}}{\Lambda\vec{\theta}+\vec{\lambda}} (division here is coordinate-wise, and θ2\displaystyle\vec{\theta}^{2} denotes taking coordinate-wise square of θ\displaystyle\vec{\theta}). The following holds:

𝔼[𝖯𝗈𝗂(Uθ+λ)]𝔼[𝖯𝗈𝗂(Vθ+λ)]TV2z=L+1(Λ2α1)zz!.\displaystyle\|\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(U\vec{\theta}+\vec{\lambda})]-\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(V\vec{\theta}+\vec{\lambda})]\|_{TV}^{2}\leq\sum_{z=L+1}^{\infty}\frac{(\Lambda^{2}\cdot\|\vec{\alpha}\|_{1})^{z}}{z!}.
Proof.

Let Δ:=𝔼[𝖯𝗈𝗂(Uθ+λ)]𝔼[𝖯𝗈𝗂(Vθ+λ)]TV\displaystyle\Delta:=\|\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(U\vec{\theta}+\vec{\lambda})]-\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(V\vec{\theta}+\vec{\lambda})]\|_{TV}. We also set U=(UΛ)θ\displaystyle\vec{U}=(U-\Lambda)\vec{\theta}, λ=(Λθ+λ)\displaystyle\vec{\lambda}^{\prime}=(\Lambda\vec{\theta}+\vec{\lambda}) and V=(VΛ)θ\displaystyle\vec{V}=(V-\Lambda)\vec{\theta}. Note that for every i[D]\displaystyle i\in[D], we have Ui(Λθ)i(Λθ+λ)i=λi\displaystyle\vec{U}_{i}\geq(-\Lambda\vec{\theta})_{i}\geq-(\Lambda\vec{\theta}+\vec{\lambda})_{i}=-\vec{\lambda}^{\prime}_{i}, and the same holds for each Vi\displaystyle\vec{V}_{i} as well. Hence, we can apply Lemma A.1 to bound Δ2\displaystyle\Delta^{2} as follows:

Δ2\displaystyle\displaystyle\Delta^{2} =𝔼[𝖯𝗈𝗂(U+λ)]𝔼[𝖯𝗈𝗂(V+λ)]TV2\displaystyle\displaystyle=\|\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(\vec{U}+\vec{\lambda}^{\prime})]-\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(\vec{V}+\vec{\lambda}^{\prime})]\|_{TV}^{2}
m(0)D(𝔼[Um]𝔼[Vm])2m!(λ)m\displaystyle\displaystyle\leq\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}}\frac{\left(\operatornamewithlimits{\mathbb{E}}[\vec{U}^{\vec{m}}]-\operatornamewithlimits{\mathbb{E}}[\vec{V}^{\vec{m}}]\right)^{2}}{\vec{m}!\cdot(\vec{\lambda}^{\prime})^{\vec{m}}}
m(0)D(𝔼[((UΛ)θ)m]𝔼[((VΛ)θ)m])2m!(Λθ+λ)m\displaystyle\displaystyle\leq\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}}\frac{\left(\operatornamewithlimits{\mathbb{E}}[((U-\Lambda)\vec{\theta})^{\vec{m}}]-\operatornamewithlimits{\mathbb{E}}[((V-\Lambda)\vec{\theta})^{\vec{m}}]\right)^{2}}{\vec{m}!\cdot(\Lambda\vec{\theta}+\vec{\lambda})^{\vec{m}}}
m(0)D(θm𝔼[(UΛ)m1]θm𝔼[(VΛ)m1])2m!(Λθ+λ)m\displaystyle\displaystyle\leq\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}}\frac{\left(\vec{\theta}^{\vec{m}}\cdot\operatornamewithlimits{\mathbb{E}}[(U-\Lambda)^{\|\vec{m}\|_{1}}]-\vec{\theta}^{\vec{m}}\cdot\operatornamewithlimits{\mathbb{E}}[(V-\Lambda)^{\|\vec{m}\|_{1}}]\right)^{2}}{\vec{m}!\cdot(\Lambda\vec{\theta}+\vec{\lambda})^{\vec{m}}}
z=L+1m(0)Ds.t.m1=z(θm(𝔼[(UΛ)z]𝔼[(VΛ)z]))2m!(Λθ+λ)m\displaystyle\displaystyle\leq\sum_{z=L+1}^{\infty}\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}\text{s.t.}\|\vec{m}\|_{1}=z}\frac{\left(\vec{\theta}^{\vec{m}}\cdot(\operatornamewithlimits{\mathbb{E}}[(U-\Lambda)^{z}]-\operatornamewithlimits{\mathbb{E}}[(V-\Lambda)^{z}])\right)^{2}}{\vec{m}!\cdot(\Lambda\vec{\theta}+\vec{\lambda})^{\vec{m}}}
z=L+1Λ2zm(0)Ds.t.m1=zθ2mm!(Λθ+λ)m,\displaystyle\displaystyle\leq\sum_{z=L+1}^{\infty}\Lambda^{2z}\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}\text{s.t.}\|\vec{m}\|_{1}=z}\frac{\vec{\theta}^{2\vec{m}}}{\vec{m}!\cdot(\Lambda\vec{\theta}+\vec{\lambda})^{\vec{m}}},

where the first inequality follows from Lemma A.1.

Now, recall that α=θ2Λθ+λ\displaystyle\vec{\alpha}=\frac{\vec{\theta}^{2}}{\Lambda\vec{\theta}+\vec{\lambda}}. We claim that

m(0)Ds.t.m1=zαmm!=α1zz!.\displaystyle\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}\text{s.t.}\|\vec{m}\|_{1}=z}\frac{\vec{\alpha}^{\vec{m}}}{\vec{m}!}=\frac{\|\vec{\alpha}\|_{1}^{z}}{z!}.

To prove this equality, consider the random process of drawing z\displaystyle z samples from [D]\displaystyle[D] using the distribution corresponding to α/α1\displaystyle\vec{\alpha}/\|\vec{\alpha}\|_{1} (that is, we get i[D]\displaystyle i\in[D] with probability αiα1\displaystyle\frac{\vec{\alpha}_{i}}{\|\vec{\alpha}\|_{1}}. It is a well-defined distribution since α(0)D\displaystyle\vec{\alpha}\in(\mathbb{R}^{\geq 0})^{D}). Let M\displaystyle\vec{M} be the random variable corresponding to the histogram of the z\displaystyle z samples (that is, Mi\displaystyle\vec{M}_{i} denotes the number of occurrences of the element i\displaystyle i). For m(0)Ds.t.m1=z\displaystyle\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}\text{s.t.}\|\vec{m}\|_{1}=z, we have that

Pr[M=m]=(αα1)mz!m!.\displaystyle\Pr[\vec{M}=\vec{m}]=\left(\frac{\vec{\alpha}}{\|\vec{\alpha}\|_{1}}\right)^{\vec{m}}\cdot\frac{z!}{\vec{m}!}.

Hence, we get

m(0)Ds.t.m1=zPr[M=m]=1,\displaystyle\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}\text{s.t.}\|\vec{m}\|_{1}=z}\Pr[\vec{M}=\vec{m}]=1,

and

m(0)Ds.t.m1=zαmm!=α1zz!.\displaystyle\sum_{\vec{m}\in(\mathbb{Z}^{\geq 0})^{D}\text{s.t.}\|\vec{m}\|_{1}=z}\frac{\vec{\alpha}^{\vec{m}}}{\vec{m}!}=\frac{\|\vec{\alpha}\|_{1}^{z}}{z!}.

Plugging in, we obtain

Δ2z=L+1Λ2zα1zz!=z=L+1(Λ2α1)zz!.\displaystyle\Delta^{2}\leq\sum_{z=L+1}^{\infty}\Lambda^{2z}\cdot\frac{\|\vec{\alpha}\|_{1}^{z}}{z!}=\sum_{z=L+1}^{\infty}\frac{(\Lambda^{2}\cdot\|\vec{\alpha}\|_{1})^{z}}{z!}.\qed

Applying Lemma A.3, we are now ready to prove Lemma 4.3.

Proof of Lemma 4.3.

Let α=θ2Λθ+λ\displaystyle\vec{\alpha}=\frac{\vec{\theta}^{2}}{\Lambda\vec{\theta}+\vec{\lambda}}. We have that

α1\displaystyle\displaystyle\|\vec{\alpha}\|_{1} =i[D]θiθiΛθi+λi\displaystyle\displaystyle=\sum_{i\in[D]}\vec{\theta}_{i}\cdot\frac{\vec{\theta}_{i}}{\Lambda\vec{\theta}_{i}+\vec{\lambda}_{i}}
=𝔼i𝒟θθiΛθi+λi\displaystyle\displaystyle=\operatornamewithlimits{\mathbb{E}}_{i\leftarrow\mathcal{D}_{\vec{\theta}}}\frac{\vec{\theta}_{i}}{\Lambda\vec{\theta}_{i}+\vec{\lambda}_{i}}
12Λ2+Pri𝒟θ[Λθi+λi<2Λ2θi]1Λ\displaystyle\displaystyle\leq\frac{1}{2\Lambda^{2}}+\Pr_{i\leftarrow\mathcal{D}_{\vec{\theta}}}[\Lambda\vec{\theta}_{i}+\vec{\lambda}_{i}<2\Lambda^{2}\cdot\vec{\theta}_{i}]\cdot\frac{1}{\Lambda}
1Λ2.\displaystyle\displaystyle\leq\frac{1}{\Lambda^{2}}.

Applying Lemma A.3, we get

𝔼[𝖯𝗈𝗂(Uθ+λ)]𝔼[𝖯𝗈𝗂(Vθ+λ)]TV2z=L+11z!1L!.\displaystyle\|\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(U\vec{\theta}+\vec{\lambda})]-\operatornamewithlimits{\mathbb{E}}[\vec{\mathsf{Poi}}(V\vec{\theta}+\vec{\lambda})]\|_{TV}^{2}\leq\sum_{z=L+1}^{\infty}\frac{1}{z!}\leq\frac{1}{L!}.\qed

Appendix B Lower Bounds on Hockey Stick Divergence

In this section, we prove Lemma 4.8 (restated below).


Lemma 4.8. (restated) There exists an absolute constant c0\displaystyle c_{0} such that, for every integer m1\displaystyle m\geq 1, three reals α,β,ε>0\displaystyle\alpha,\beta,\varepsilon>0 such that α>eεβ\displaystyle\alpha>e^{\varepsilon}\beta, letting Δ=αeεβ\displaystyle\Delta=\alpha-e^{\varepsilon}\beta and supposing 4eεΔβ<1/2\displaystyle 4\frac{e^{\varepsilon}}{\Delta}\beta<1/2, it holds that

dε(𝖡𝖾𝗋(α)+𝖡𝗂𝗇(m,β)||𝖡𝖾𝗋(β)+𝖡𝗂𝗇(m,β))Δ122mexp(c0meεΔβ[log(Δ1)+1]).\displaystyle d_{\varepsilon}(\mathsf{Ber}(\alpha)+\mathsf{Bin}(m,\beta)||\mathsf{Ber}(\beta)+\mathsf{Bin}(m,\beta))\geq\Delta\cdot\frac{1}{2\sqrt{2m}}\cdot\exp\left(-c_{0}\cdot m\cdot\frac{e^{\varepsilon}}{\Delta}\beta\cdot\left[\log(\Delta^{-1})+1\right]\right).

Before proving Lemma 4.8, we need several technical lemmas. First, we show that the hockey stick divergence between 𝖡𝖾𝗋(α)+X\displaystyle\mathsf{Ber}(\alpha)+X and 𝖡𝖾𝗋(β)+X\displaystyle\mathsf{Ber}(\beta)+X can be characterized by the hockey stick divergence between X+1\displaystyle X+1 and X\displaystyle X.

Lemma B.1.

Let α,β,ε>0\displaystyle\alpha,\beta,\varepsilon>0 be three reals such that α>eεβ\displaystyle\alpha>e^{\varepsilon}\beta, and X\displaystyle X be a random variable over 0\displaystyle\mathbb{Z}^{\geq 0}. The following holds:

dε(𝖡𝖾𝗋(α)+X||𝖡𝖾𝗋(β)+X)=(αeεβ)dlnτ(1+X||X),\displaystyle d_{\varepsilon}(\mathsf{Ber}(\alpha)+X||\mathsf{Ber}(\beta)+X)=(\alpha-e^{\varepsilon}\beta)\cdot d_{\ln\tau}(1+X||X),

where τ=eεeεβ1+ααeεβ\displaystyle\tau=\frac{e^{\varepsilon}-e^{\varepsilon}\beta-1+\alpha}{\alpha-e^{\varepsilon}\beta}.

Proof.

We have that

dε(𝖡𝖾𝗋(α)+X||𝖡𝖾𝗋(β)+X)\displaystyle\displaystyle d_{\varepsilon}(\mathsf{Ber}(\alpha)+X||\mathsf{Ber}(\beta)+X) =k0[(1α)Xk+αXk1eε(1β)XkeεβXk1]+\displaystyle\displaystyle=\sum_{k\in\mathbb{Z}^{\geq 0}}\left[(1-\alpha)X_{k}+\alpha X_{k-1}-e^{\varepsilon}(1-\beta)X_{k}-e^{\varepsilon}\beta X_{k-1}\right]_{+}
=k0[(αeεβ)Xk1(eεeεβ1+α)Xk]+\displaystyle\displaystyle=\sum_{k\in\mathbb{Z}^{\geq 0}}\left[(\alpha-e^{\varepsilon}\beta)\cdot X_{k-1}-(e^{\varepsilon}-e^{\varepsilon}\beta-1+\alpha)\cdot X_{k}\right]_{+}
=(αeεβ)k0[Xk1eεeεβ1+ααeεβXk]+\displaystyle\displaystyle=(\alpha-e^{\varepsilon}\beta)\cdot\sum_{k\in\mathbb{Z}^{\geq 0}}\left[X_{k-1}-\frac{e^{\varepsilon}-e^{\varepsilon}\beta-1+\alpha}{\alpha-e^{\varepsilon}\beta}\cdot X_{k}\right]_{+}
=(αeεβ)dlnτ(1+X||X).\displaystyle\displaystyle=(\alpha-e^{\varepsilon}\beta)\cdot d_{\ln\tau}(1+X||X).\qed
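The identity in Lemma B.1 is easy to verify numerically. The Python snippet below does so for X=Bin(m,β), computing both sides directly from the definition d_ε(P||Q)=Σ_z[P_z−e^ε Q_z]_+; the parameter values are arbitrary choices satisfying α>e^ε β, and the snippet is only a sanity check.

```python
import math

def binom_pmf(n, p, k):
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def hockey_stick(P, Q, eps):
    """d_eps(P || Q) = sum_z [P_z - e^eps * Q_z]_+ over a common support."""
    return sum(max(P[z] - math.exp(eps) * Q[z], 0.0) for z in range(len(P)))

m, beta, alpha, eps = 20, 0.02, 0.5, 0.3
assert alpha > math.exp(eps) * beta

# Distributions of Ber(q) + X with X = Bin(m, beta), on support {0, ..., m+1}.
X = [binom_pmf(m, beta, k) for k in range(m + 2)]
def shift_ber(q):
    return [(1 - q) * X[k] + q * (X[k - 1] if k > 0 else 0.0)
            for k in range(m + 2)]

lhs = hockey_stick(shift_ber(alpha), shift_ber(beta), eps)

# Right-hand side: (alpha - e^eps * beta) * d_{ln tau}(1 + X || X).
tau = (math.exp(eps) - math.exp(eps) * beta - 1 + alpha) / (alpha - math.exp(eps) * beta)
one_plus_X = [X[k - 1] if k > 0 else 0.0 for k in range(m + 2)]
rhs = (alpha - math.exp(eps) * beta) * hockey_stick(one_plus_X, X, math.log(tau))

print(lhs, rhs)  # the two quantities should agree up to floating-point error
```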

Next, we need a lemma giving a lower bound on dε(1+X||X)\displaystyle d_{\varepsilon}(1+X||X) for a random variable X\displaystyle X.

Lemma B.2.

Let X\displaystyle X be a random variable over 0\displaystyle\mathbb{Z}^{\geq 0} and ε>0\displaystyle\varepsilon>0. The following holds:

dε(1+X||X)12PrkX[XkXk+12eε].\displaystyle d_{\varepsilon}(1+X||X)\geq\frac{1}{2}\cdot\Pr_{k\leftarrow X}\left[\frac{X_{k}}{X_{k+1}}\geq 2e^{\varepsilon}\right].
Proof.

We have that

dε(1+X||X)=\displaystyle\displaystyle d_{\varepsilon}(1+X||X)= z=0[Xz1eεXz]+\displaystyle\displaystyle\sum_{z=0}^{\infty}[X_{z-1}-e^{\varepsilon}X_{z}]_{+}
=\displaystyle\displaystyle= z=0[XzeεXz+1]+\displaystyle\displaystyle\sum_{z=0}^{\infty}[X_{z}-e^{\varepsilon}X_{z+1}]_{+}
\displaystyle\displaystyle\geq z=012Xz𝟙[Xz2eεXz+1]\displaystyle\displaystyle\sum_{z=0}^{\infty}\frac{1}{2}\cdot X_{z}\cdot\mathbb{1}[X_{z}\geq 2e^{\varepsilon}X_{z+1}]
=\displaystyle\displaystyle= 12PrkX[XkXk+12eε].\displaystyle\displaystyle\frac{1}{2}\Pr_{k\leftarrow X}\left[\frac{X_{k}}{X_{k+1}}\geq 2e^{\varepsilon}\right].\qed

Applying Lemma B.2, we obtain the following lower bound on dε(1+𝖡𝗂𝗇(n,p)||𝖡𝗂𝗇(n,p))\displaystyle d_{\varepsilon}(1+\mathsf{Bin}(n,p)||\mathsf{Bin}(n,p)).

Lemma B.3.

For n\displaystyle n\in\mathbb{N} and p(0,0.5)\displaystyle p\in(0,0.5), ε>0\displaystyle\varepsilon>0 such that 4eεp<1/2\displaystyle 4e^{\varepsilon}p<1/2,

dε(1+𝖡𝗂𝗇(n,p)||𝖡𝗂𝗇(n,p))122nexp(n4eεplog(4eε)).\displaystyle d_{\varepsilon}(1+\mathsf{Bin}(n,p)||\mathsf{Bin}(n,p))\geq\frac{1}{2\sqrt{2n}}\exp(-n4e^{\varepsilon}p\cdot\log(4e^{\varepsilon})).
Proof.

We have

𝖡𝗂𝗇(n,p)k𝖡𝗂𝗇(n,p)k+1=1ppk+1nk1ppkn.\displaystyle\frac{\mathsf{Bin}(n,p)_{k}}{\mathsf{Bin}(n,p)_{k+1}}=\frac{1-p}{p}\cdot\frac{k+1}{n-k}\geq\frac{1-p}{p}\cdot\frac{k}{n}.

By Lemma B.2,

2dε(1+𝖡𝗂𝗇(n,p)||𝖡𝗂𝗇(n,p))\displaystyle\displaystyle 2\cdot d_{\varepsilon}(1+\mathsf{Bin}(n,p)||\mathsf{Bin}(n,p)) Prk𝖡𝗂𝗇(n,p)[𝖡𝗂𝗇(n,p)k𝖡𝗂𝗇(n,p)k+12eε]\displaystyle\displaystyle\geq\Pr_{k\leftarrow\mathsf{Bin}(n,p)}\left[\frac{\mathsf{Bin}(n,p)_{k}}{\mathsf{Bin}(n,p)_{k+1}}\geq 2e^{\varepsilon}\right]
Prk𝖡𝗂𝗇(n,p)[1ppkn2eε]\displaystyle\displaystyle\geq\Pr_{k\leftarrow\mathsf{Bin}(n,p)}\left[\frac{1-p}{p}\cdot\frac{k}{n}\geq 2e^{\varepsilon}\right]
=Pr[𝖡𝗂𝗇(n,p)2eεnp1p]\displaystyle\displaystyle=\Pr\left[\mathsf{Bin}(n,p)\geq 2e^{\varepsilon}\cdot n\cdot\frac{p}{1-p}\right]
Pr[𝖡𝗂𝗇(n,p)4eεpn].\displaystyle\displaystyle\geq\Pr\left[\mathsf{Bin}(n,p)\geq 4e^{\varepsilon}\cdot pn\right].

Now, by anti-concentration of the binomial distribution [Rob90, Page 115], we have

Pr[𝖡𝗂𝗇(n,p)4eεpn]12nexp(nKL(4eεp||p)).\displaystyle\Pr\left[\mathsf{Bin}(n,p)\geq 4e^{\varepsilon}\cdot pn\right]\geq\frac{1}{\sqrt{2n}}\exp(-n\cdot KL(4e^{\varepsilon}p||p)).

Letting λ=4eε\displaystyle\lambda=4e^{\varepsilon}, we have

KL(λp||p)=λplogλpp+(1λp)log1λp1pλplogλpp=λplogλ.\displaystyle KL(\lambda p||p)=\lambda p\cdot\log\frac{\lambda p}{p}+(1-\lambda p)\cdot\log\frac{1-\lambda p}{1-p}\leq\lambda p\cdot\log\frac{\lambda p}{p}=\lambda p\cdot\log\lambda.

Putting everything together, we get

dε(1+𝖡𝗂𝗇(n,p)||𝖡𝗂𝗇(n,p))122nexp(nλplogλ)=122nexp(n4eεplog(4eε)).\displaystyle d_{\varepsilon}(1+\mathsf{Bin}(n,p)||\mathsf{Bin}(n,p))\geq\frac{1}{2\sqrt{2n}}\exp(-n\lambda p\cdot\log\lambda)=\frac{1}{2\sqrt{2n}}\exp(-n4e^{\varepsilon}p\cdot\log(4e^{\varepsilon})).\qed
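Similarly, the lower bound of Lemma B.3 can be checked numerically: the snippet below computes d_ε(1+Bin(n,p)||Bin(n,p)) exactly from the definition and compares it with the stated bound (a sanity check only; the parameters are arbitrary choices satisfying 4e^ε p<1/2).

```python
import math

def binom_pmf(n, p, k):
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p, eps = 200, 0.001, 0.5
assert 4 * math.exp(eps) * p < 0.5

# Exact d_eps(1 + Bin(n,p) || Bin(n,p)) = sum_z [Bin_{z-1} - e^eps * Bin_z]_+.
exact = sum(max(binom_pmf(n, p, z - 1) - math.exp(eps) * binom_pmf(n, p, z), 0.0)
            for z in range(n + 2))

# Lower bound from Lemma B.3.
bound = (1 / (2 * math.sqrt(2 * n))) * math.exp(-n * 4 * math.exp(eps) * p
                                                * math.log(4 * math.exp(eps)))

print(exact, ">=", bound)
```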

We are now ready to prove Lemma 4.8.

Proof of Lemma 4.8.

Let τ=eεeεβ1+ααeεβ\displaystyle\tau=\frac{e^{\varepsilon}-e^{\varepsilon}\beta-1+\alpha}{\alpha-e^{\varepsilon}\beta}. Since Δ=αeεβα1\displaystyle\Delta=\alpha-e^{\varepsilon}\beta\leq\alpha\leq 1, we have τ=(eε1+Δ)/ΔeεΔ\displaystyle\tau=(e^{\varepsilon}-1+\Delta)/\Delta\leq\frac{e^{\varepsilon}}{\Delta}. Let N=𝖡𝗂𝗇(m,β)\displaystyle N=\mathsf{Bin}(m,\beta). By Lemma B.1, we have that

dε(𝖡𝖾𝗋(α)+N||𝖡𝖾𝗋(β)+N)Δdlnτ(1+N||N).\displaystyle d_{\varepsilon}(\mathsf{Ber}(\alpha)+N||\mathsf{Ber}(\beta)+N)\geq\Delta\cdot d_{\ln\tau}(1+N||N).

Applying Lemma B.3 and noting that 4τβ4eεΔβ<1/2\displaystyle 4\tau\beta\leq 4\frac{e^{\varepsilon}}{\Delta}\beta<1/2, it follows that

dlnτ(1+N||N)122mexp(m4τβlog(4τ)).\displaystyle d_{\ln\tau}(1+N||N)\geq\frac{1}{2\sqrt{2m}}\cdot\exp(-m\cdot 4\tau\beta\log(4\tau)).

Finally, since τeεΔ\displaystyle\tau\leq\frac{e^{\varepsilon}}{\Delta}, we have that

m4τβlog(4τ)O(meεΔβ[log(Δ1)+1]).\displaystyle m\cdot 4\tau\beta\log(4\tau)\leq O\left(m\cdot\frac{e^{\varepsilon}}{\Delta}\beta\cdot\left[\log(\Delta^{-1})+1\right]\right).\qed

Appendix C Simulation of Shuffle Protocols by SQ Algorithms

In this section, we show the connection between dominated protocols and SQ algorithms, which implies that DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k} protocols can be simulated by SQ algorithms. This is analogous to the result of Kasiviswanathan et al. [KLN+11] who proved such a connection between DPlocal\displaystyle\mathrm{DP}_{\mathrm{local}} protocols and SQ algorithms. In the following, we use the notation of [KLN+11].

C.1 SQ Model

We first introduce the statistical query (SQ) model. In the SQ model, algorithms access a distribution through its statistical properties instead of individual samples.

Definition C.1 (SQ Oracle).

Let 𝒟\displaystyle\mathcal{D} be a distribution over 𝒳\displaystyle\mathcal{X}. An SQ oracle SQ𝒟\displaystyle\textsf{SQ}_{\mathcal{D}} takes as input a function g:𝒳{1,1}\displaystyle g\colon\mathcal{X}\rightarrow\{-1,1\} and a tolerance parameter τ(0,1)\displaystyle\tau\in(0,1); writing g(𝒟):=𝔼x𝒟[g(x)]\displaystyle g(\mathcal{D}):=\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{D}}[g(x)], it outputs an estimate v\displaystyle v such that:

|vg(𝒟)|τ.\displaystyle|v-g(\mathcal{D})|\leq\tau.
Definition C.2 (SQ algorithm).

An SQ algorithm is an algorithm that accesses the distribution 𝒟\displaystyle\mathcal{D} only through the SQ oracle SQ𝒟\displaystyle\textsf{SQ}_{\mathcal{D}}.

C.2 Simulation of Dominated Algorithms by SQ Algorithms

We have the following simulation of dominated algorithms by SQ algorithms.

Theorem C.3.

Suppose R:𝒳\displaystyle R\colon\mathcal{X}\to\mathcal{M} is (ε,δ)\displaystyle(\varepsilon,\delta)-dominated. Then, for any distribution 𝒰\displaystyle\mathcal{U} and error parameter βδ\displaystyle\beta\geq\delta, one can take a sample from R(𝒰)\displaystyle R(\mathcal{U}) with statistical error O(β)\displaystyle O(\beta) using O(eε)\displaystyle O(e^{\varepsilon}) queries in expectation to SQ𝒰\displaystyle\textsf{SQ}_{\mathcal{U}} with tolerance τ=β/eε\displaystyle\tau=\beta/e^{\varepsilon}.

Proof.

Suppose R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-dominated by 𝒟\displaystyle\mathcal{D}. Let τ=β/eε\displaystyle\tau=\beta/e^{\varepsilon}. For every x𝒳\displaystyle x\in\mathcal{X} and z\displaystyle z\in\mathcal{M}, we use px,z\displaystyle p_{x,z} (respectively, px,E\displaystyle p_{x,E}) to denote Pr[R(x)=z]\displaystyle\Pr[R(x)=z] (respectively, Pr[R(x)E]\displaystyle\Pr[R(x)\in E]), and let

fz(x)=px,zeε𝒟zandgz(x)=min(1,fz(x)).\displaystyle f_{z}(x)=\frac{p_{x,z}}{e^{\varepsilon}\cdot\mathcal{D}_{z}}~{}~{}\text{and}~{}~{}g_{z}(x)=\min(1,f_{z}(x)).

Our algorithm is a rejection sampling procedure adapted from [KLN+11]. It works as follows:

  1.

    Take a sample z𝒟\displaystyle z\leftarrow\mathcal{D}.

  2.

    We make a query gz\displaystyle g_{z} to SQ𝒰\displaystyle\textsf{SQ}_{\mathcal{U}} with tolerance level τ\displaystyle\tau, to obtain an estimate g^z\displaystyle\hat{g}_{z} such that |g^zgz(𝒰)|τ\displaystyle|\hat{g}_{z}-g_{z}(\mathcal{U})|\leq\tau.

  3.

    With probability max(g^z,0)\displaystyle\max(\hat{g}_{z},0), we output z\displaystyle z and stop. Otherwise we go back to Step 1.
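A minimal Python sketch of this rejection-sampling procedure is given below. The interfaces sample_from_D, D_pmf, R_pmf, and sq_oracle are assumptions standing in for the dominating distribution 𝒟, the randomizer R, and the SQ oracle (which, as in the proof, is queried with [0,1]-valued functions); the snippet only makes the control flow of Steps 1-3 explicit.

```python
import math
import random

def simulate_R_on_U(sample_from_D, D_pmf, R_pmf, sq_oracle, eps, tau):
    """Rejection sampling: outputs z approximately distributed as R(U).

    Assumed interfaces: sample_from_D() draws z from the dominating
    distribution D; D_pmf(z) = D_z; R_pmf(x, z) = Pr[R(x) = z]; and
    sq_oracle(g, tau) returns an estimate of E_{x ~ U}[g(x)] within tau.
    """
    while True:
        z = sample_from_D()                       # Step 1: z <- D

        def g_z(x):                               # g_z(x) = min(1, p_{x,z} / (e^eps * D_z))
            return min(1.0, R_pmf(x, z) / (math.exp(eps) * D_pmf(z)))

        g_hat = sq_oracle(g_z, tau)               # Step 2: one SQ query with tolerance tau
        if random.random() < max(g_hat, 0.0):     # Step 3: accept with probability max(g_hat, 0)
            return z
        # otherwise reject and go back to Step 1
```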

Let 𝒯x={z:px,z>eε𝒟z}\displaystyle\mathcal{T}_{x}=\{z:p_{x,z}>e^{\varepsilon}\cdot\mathcal{D}_{z}\}. Note that px,𝒯x2δ2β\displaystyle p_{x,\mathcal{T}_{x}}\leq 2\delta\leq 2\beta since R\displaystyle R is (ε,δ)\displaystyle(\varepsilon,\delta)-dominated by 𝒟\displaystyle\mathcal{D}. By the definition of gz\displaystyle g_{z}, it holds that gz(x)=fz(x)\displaystyle g_{z}(x)=f_{z}(x) for every z𝒯x\displaystyle z\not\in\mathcal{T}_{x}. We will need the following claim.

Claim 1.

For every x𝒳\displaystyle x\in\mathcal{X},

𝔼z𝒟|fz(x)gz(x)|2β/eε.\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow\mathcal{D}}\left|f_{z}(x)-g_{z}(x)\right|\leq 2\beta/e^{\varepsilon}.
Proof.
𝔼z𝒟|fz(x)gz(x)|\displaystyle\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow\mathcal{D}}\left|f_{z}(x)-g_{z}(x)\right| z𝒯x𝒟zfz(x)\displaystyle\displaystyle\leq\sum_{z\in\mathcal{T}_{x}}\mathcal{D}_{z}\cdot f_{z}(x)
z𝒯xpx,zeε\displaystyle\displaystyle\leq\sum_{z\in\mathcal{T}_{x}}p_{x,z}\cdot e^{-\varepsilon}
px,𝒯x/eε2β/eε.\displaystyle\displaystyle\leq p_{x,\mathcal{T}_{x}}/e^{\varepsilon}\leq 2\beta/e^{\varepsilon}.\qed

Now, in a single run, the above algorithm outputs z\displaystyle z with probability in the interval

[𝒟z(gz(𝒰)τ),𝒟z(gz(𝒰)+τ)].\displaystyle[\mathcal{D}_{z}\cdot(g_{z}(\mathcal{U})-\tau),\mathcal{D}_{z}\cdot(g_{z}(\mathcal{U})+\tau)].

Note that 𝔼z𝒟fz(𝒰)=𝔼x𝒰zpx,zeε=eε\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow\mathcal{D}}f_{z}(\mathcal{U})=\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{U}}\sum_{z}\frac{p_{x,z}}{e^{\varepsilon}}=e^{-\varepsilon}. By Claim 1, the algorithm terminates in a single run with probability at least

𝔼z𝒟(gz(𝒰)τ)(𝔼z𝒟fz(𝒰))τ2β/eε=eε(13β),\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow\mathcal{D}}(g_{z}(\mathcal{U})-\tau)\geq\left(\operatornamewithlimits{\mathbb{E}}_{z\leftarrow\mathcal{D}}f_{z}(\mathcal{U})\right)-\tau-2\beta/e^{\varepsilon}=e^{-\varepsilon}\cdot(1-3\beta),

and at most

𝔼z𝒟(gz(𝒰)+τ)(𝔼z𝒟fz(𝒰))+τ+2β/eε=eε(1+3β).\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow\mathcal{D}}(g_{z}(\mathcal{U})+\tau)\leq\left(\operatornamewithlimits{\mathbb{E}}_{z\leftarrow\mathcal{D}}f_{z}(\mathcal{U})\right)+\tau+2\beta/e^{\varepsilon}=e^{-\varepsilon}\cdot(1+3\beta).

The above implies that the algorithm makes at most O(eε)\displaystyle O(e^{\varepsilon}) queries to SQ𝒰\displaystyle\textsf{SQ}_{\mathcal{U}} in expectation.

Putting everything together, our algorithm outputs z\displaystyle z with probability in the following interval:

Iz:=[𝒟z(gz(𝒰)τ)eε(1+3β),𝒟z(gz(𝒰)+τ)eε(13β)].\displaystyle I_{z}:=\left[\frac{\mathcal{D}_{z}\cdot(g_{z}(\mathcal{U})-\tau)\cdot e^{\varepsilon}}{(1+3\beta)},\frac{\mathcal{D}_{z}\cdot(g_{z}(\mathcal{U})+\tau)\cdot e^{\varepsilon}}{(1-3\beta)}\right].

We have that

Pr[R(𝒰)=z]=𝔼x𝒰px,z=fz(𝒰)𝒟zeε.\displaystyle\Pr[R(\mathcal{U})=z]=\operatornamewithlimits{\mathbb{E}}_{x\leftarrow\mathcal{U}}p_{x,z}=f_{z}(\mathcal{U})\cdot\mathcal{D}_{z}\cdot e^{\varepsilon}.

Note that

maxvIz|vfz(𝒰)𝒟zeε|\displaystyle\displaystyle\max_{v\in I_{z}}\left|v-f_{z}(\mathcal{U})\cdot\mathcal{D}_{z}\cdot e^{\varepsilon}\right| maxvIz|vgz(𝒰)𝒟zeε|+|fz(𝒰)𝒟zeεgz(𝒰)𝒟zeε|.\displaystyle\displaystyle\leq\max_{v\in I_{z}}\left|v-g_{z}(\mathcal{U})\cdot\mathcal{D}_{z}\cdot e^{\varepsilon}\right|+\left|f_{z}(\mathcal{U})\cdot\mathcal{D}_{z}\cdot e^{\varepsilon}-g_{z}(\mathcal{U})\cdot\mathcal{D}_{z}\cdot e^{\varepsilon}\right|.

Moreover, we have that

maxvIz|vgz(𝒰)𝒟zeε|\displaystyle\displaystyle\max_{v\in I_{z}}\left|v-g_{z}(\mathcal{U})\cdot\mathcal{D}_{z}\cdot e^{\varepsilon}\right| =eεmax{|𝒟z(gz(𝒰)τ)(1+3β)gz(𝒰)𝒟z|,|𝒟z(gz(𝒰)+τ)(13β)gz(𝒰)𝒟z|}\displaystyle\displaystyle=e^{\varepsilon}\cdot\max\left\{\left|\frac{\mathcal{D}_{z}\cdot(g_{z}(\mathcal{U})-\tau)}{(1+3\beta)}-g_{z}(\mathcal{U})\cdot\mathcal{D}_{z}\right|,\left|\frac{\mathcal{D}_{z}\cdot(g_{z}(\mathcal{U})+\tau)}{(1-3\beta)}-g_{z}(\mathcal{U})\cdot\mathcal{D}_{z}\right|\right\}
eεgz(𝒰)𝒟zO(β)+eε𝒟zO(τ).\displaystyle\displaystyle\leq e^{\varepsilon}\cdot g_{z}(\mathcal{U})\cdot\mathcal{D}_{z}\cdot O(\beta)+e^{\varepsilon}\cdot\mathcal{D}_{z}\cdot O(\tau).

The final statistical error of our sampling algorithm is therefore bounded by

zmaxvIz|vfz(𝒰)𝒟zeε|\displaystyle\displaystyle\sum_{z\in\mathcal{M}}\max_{v\in I_{z}}\left|v-f_{z}(\mathcal{U})\cdot\mathcal{D}_{z}\cdot e^{\varepsilon}\right| eεz(gz(𝒰)𝒟zO(β)+𝒟zO(τ)+|fz(𝒰)𝒟zgz(𝒰)𝒟z|)\displaystyle\displaystyle\leq e^{\varepsilon}\cdot\sum_{z\in\mathcal{M}}\left(g_{z}(\mathcal{U})\cdot\mathcal{D}_{z}\cdot O(\beta)+\mathcal{D}_{z}\cdot O(\tau)+\left|f_{z}(\mathcal{U})\cdot\mathcal{D}_{z}-g_{z}(\mathcal{U})\cdot\mathcal{D}_{z}\right|\right)
eε𝔼z𝒟[fz(𝒰)O(β)+O(τ)+|fz(𝒰)gz(𝒰)|]\displaystyle\displaystyle\leq e^{\varepsilon}\cdot\operatornamewithlimits{\mathbb{E}}_{z\leftarrow\mathcal{D}}\Big{[}f_{z}(\mathcal{U})\cdot O(\beta)+O(\tau)+|f_{z}(\mathcal{U})-g_{z}(\mathcal{U})|\Big{]} (gz(𝒰)fz(𝒰)\displaystyle g_{z}(\mathcal{U})\leq f_{z}(\mathcal{U}))
O(β).\displaystyle\displaystyle\leq O(\beta). (𝔼z𝒟fz(𝒰)=eε\displaystyle\operatornamewithlimits{\mathbb{E}}_{z\leftarrow\mathcal{D}}f_{z}(\mathcal{U})=e^{-\varepsilon} and Claim 1)

C.3 Applications

We are now ready to apply Theorem C.3 to show that protocols in the DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k} model can be simulated by SQ algorithms when the database is drawn i.i.d. from a single distribution.

Theorem C.4.

Let z\displaystyle z be a database with n\displaystyle n entries drawn i.i.d. from a distribution 𝒰\displaystyle\mathcal{U}. Let P=(R,S,A)\displaystyle P=(R,S,A) be an (ε,o(1/n))\displaystyle(\varepsilon,o(1/n))-DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k} protocol on n\displaystyle n users. Then, there is an algorithm making O((en)k+1eε)\displaystyle O((en)^{k+1}\cdot e^{\varepsilon}) queries in expectation to SQ𝒰\displaystyle\textsf{SQ}_{\mathcal{U}} with tolerance τ=Θ(1(en)k+1eε)\displaystyle\tau=\Theta\left(\frac{1}{(en)^{k+1}\cdot e^{\varepsilon}}\right), such that its output distribution differs by at most 0.01\displaystyle 0.01 in statistical distance from the output distribution of P\displaystyle P on the dataset z\displaystyle z.

Proof.

Note that it suffices to draw n\displaystyle n i.i.d. samples from R(𝒰)\displaystyle R(\mathcal{U}). By Lemma 1.8, we know that R\displaystyle R is (ε+k(1+lnn),o(1/n))\displaystyle(\varepsilon+k(1+\ln n),o(1/n))-dominated. Let γ=eε+k(1+lnn)=eε(en)k\displaystyle\gamma=e^{\varepsilon+k(1+\ln n)}=e^{\varepsilon}\cdot(en)^{k}. By Theorem C.3, using O(γ)\displaystyle O(\gamma) queries in expectation to SQ𝒰\displaystyle\textsf{SQ}_{\mathcal{U}} with tolerance τ=Θ(1/(γn))\displaystyle\tau=\Theta(1/(\gamma n)), we can sample from R(𝒰)\displaystyle R(\mathcal{U}) with statistical error 1/(100n)\displaystyle 1/(100n). Taking n\displaystyle n such samples completes the proof. ∎

We remark that [BFJ+94] proved that if an SQ algorithm solves ParityLearning with probability at least 0.99\displaystyle 0.99 using T\displaystyle T queries of tolerance 1/T\displaystyle 1/T, then T=Ω(2D/3)\displaystyle T=\Omega(2^{D/3}) (recall that D\displaystyle D is the dimension of the hidden vector in ParityLearning). Combining the foregoing lower bound with Theorem C.4, we obtain an Ω(2D/(3(k+1)))\displaystyle\Omega(2^{D/(3(k+1))}) lower bound on the sample complexity of (O(1),o(1/n))\displaystyle(O(1),o(1/n))-DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k} protocols solving ParityLearning, which is slightly weaker than our Theorem 1.10.

Appendix D Upper Bounds for Selection in DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k}

In this section, we give a proof sketch of a DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k} protocol for Selection with sample complexity O~(D/k)\displaystyle\tilde{O}(D/\sqrt{k}).

Theorem D.1.

For any kD\displaystyle k\leq D, ε=O(1)\displaystyle\varepsilon=O(1) and δ=1/poly(n)\displaystyle\delta=1/\mathop{\mathrm{poly}}(n), there is an (ε,δ)\displaystyle(\varepsilon,\delta)-DPshufflek\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{k} protocol solving Selection with probability at least 0.99\displaystyle 0.99 and n=O~(D/k)\displaystyle n=\tilde{O}(D/\sqrt{k}).

Proof Sketch.

Let ε=O(1)\displaystyle\varepsilon=O(1) and δ=1/poly(n)\displaystyle\delta=1/\mathop{\mathrm{poly}}(n) be the privacy parameters. We also let ε0=Θ(ε/k)\displaystyle\varepsilon_{0}=\Theta(\varepsilon/\sqrt{k}), δ0=1/poly(n)\displaystyle\delta_{0}=1/\mathop{\mathrm{poly}}(n) and n=Θ~(D/k)\displaystyle n=\tilde{\Theta}(D/\sqrt{k}) to be specified later.

We can assume that k=(logn)ω(1)\displaystyle k=(\log n)^{\omega(1)}, as otherwise the protocol simply follows from the O~(D)\displaystyle\tilde{O}(D) sample upper bound by an (ε,δ)\displaystyle(\varepsilon,\delta)-DPshuffle1\displaystyle\mathrm{DP}_{\mathrm{shuffle}}^{1} protocol [GGK+19].

Let m=k/log2n\displaystyle m=k/\log^{2}n, and N=nm/D\displaystyle N=nm/D. Note that by our choice of k\displaystyle k and D\displaystyle D, N=(logn)ω(1)\displaystyle N=(\log n)^{\omega(1)}.

Now for each i[D]\displaystyle i\in[D], our protocol maintains an (ε0,δ0)\displaystyle(\varepsilon_{0},\delta_{0})-DP subprotocol aiming at estimating the fraction of users whose input x\displaystyle x satisfies xi=1\displaystyle x_{i}=1. These subprotocols assume they will receive between 0.99N\displaystyle 0.99N and 1.01N\displaystyle 1.01N inputs. By [GMPV20, BBGN20], there is such a protocol which achieves error O(ε01logn)\displaystyle O(\varepsilon_{0}^{-1}\log n) with probability at least 11/n2\displaystyle 1-1/n^{2} and using O(log(1/δ)logN)O(logn)\displaystyle O\left(\frac{\log(1/\delta)}{\log N}\right)\leq O(\log n) messages.

In our protocol, each user selects k/log2n\displaystyle k/\log^{2}n coordinates from [D]\displaystyle[D] uniformly at random, and participates in the corresponding subprotocols. Finally, the analyzer aggregates the outputs of all subprotocols and outputs the coordinate with the highest estimate.

Note that by a union bound and a Chernoff bound, it follows that with probability at least 1nω(1)\displaystyle 1-n^{-\omega(1)}, the number of users participating in each subprotocol falls in the range [0.99N,1.01N]\displaystyle[0.99N,1.01N], and, for each i[D]\displaystyle i\in[D], the empirical mean of the i\displaystyle i-th coordinate among these participants is 0.01\displaystyle 0.01-close to the true mean of the i\displaystyle i-th coordinate over all users.

Setting ε0=Θ(ε/k)\displaystyle\varepsilon_{0}=\Theta(\varepsilon/\sqrt{k}) and δ0=1/poly(n)\displaystyle\delta_{0}=1/\mathop{\mathrm{poly}}(n) appropriately, the protocol is (ε,δ)\displaystyle(\varepsilon,\delta)-DP by the advanced composition theorem of DP [DRV10]. Moreover, with probability at least 11/n\displaystyle 1-1/n, all subprotocols obtain estimates with error O(ε01logn)\displaystyle O(\varepsilon_{0}^{-1}\log n).

Setting n\displaystyle n so that ε01logn=o(N)\displaystyle\varepsilon_{0}^{-1}\log n=o\left(N\right), our protocol solves Selection with probability at least 0.99\displaystyle 0.99. ∎
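
To summarize the structure of the protocol described above, here is a high-level Python schematic; private_frequency_subprotocol is a hypothetical stand-in for the (ε0,δ0)-DP summation subprotocol of [GMPV20, BBGN20], and the snippet captures only the coordinate-sampling and aggregation logic, not the privacy-critical components.

```python
import math
import random

def selection_protocol(inputs, D, k, private_frequency_subprotocol):
    """High-level schematic of the Selection protocol in the proof sketch.

    inputs: a list of n binary vectors of length D, one per user.
    private_frequency_subprotocol: stand-in for the (eps_0, delta_0)-DP
    summation subprotocol; it receives the bits contributed to one
    coordinate and returns a noisy estimate of their sum.
    """
    n = len(inputs)
    m = max(1, int(k / math.log(n) ** 2))   # number of subprotocols each user joins

    # Randomizer side: each user contributes its i-th bit to m random coordinates.
    contributions = {i: [] for i in range(D)}
    for x in inputs:
        for i in random.sample(range(D), m):
            contributions[i].append(x[i])

    # Analyzer side: one private estimate per coordinate, then an argmax.
    estimates = [private_frequency_subprotocol(contributions[i]) for i in range(D)]
    return max(range(D), key=lambda i: estimates[i])
```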