On Distributed Differential Privacy
and Counting Distinct Elements
Abstract
We study the setup where each of n users holds an element from a discrete set, and the goal is to count the number of distinct elements across all users, under the constraint of (ε, δ)-differential privacy:
• In the non-interactive local setting, we prove that the additive error of any protocol is for any constant and for any inverse polynomial in .
• In the single-message shuffle setting, we prove a lower bound of on the error for any constant and for some inverse quasi-polynomial in . We do so by building on the moment-matching method from the literature on distribution estimation.
• In the multi-message shuffle setting, we give a protocol with at most one message per user in expectation and with an error of for any constant and for any inverse polynomial in . Our protocol is also robustly shuffle private, and our error of matches a known lower bound for such protocols.
Our proof technique relies on a new notion, that we call dominated protocols, and which can also be used to obtain the first non-trivial lower bounds against multi-message shuffle protocols for the well-studied problems of selection and learning parity.
Our first lower bound for estimating the number of distinct elements provides the first separation between global sensitivity and error in local differential privacy, thus answering an open question of Vadhan (2017). We also provide a simple construction that gives separation between global sensitivity and error in two-party differential privacy, thereby answering an open question of McGregor et al. (2011).
1 Introduction
Differential privacy (DP) [DMNS06, DKM+06] has become a leading framework for private-data analysis, with several recent practical deployments [EPK14, Sha14, Gre16, App17, DKY17, Abo18]. The most commonly studied DP setting is the so-called central (aka curator) model whereby a single authority (sometimes referred to as the analyst) is trusted with running an algorithm on the raw data of the users and the privacy guarantee applies to the algorithm’s output.
The absence, in many scenarios, of a clear trusted authority has motivated the study of distributed DP models. The most well-studied such setting is the local model [KLN+11] (also [War65]), denoted henceforth by , where the privacy guarantee is enforced at each user’s output (i.e., the protocol transcript). While an advantage of the local model is its very strong privacy guarantees and minimal trust assumptions, the noise that has to be added can sometimes be quite large. This has stimulated the study of “intermediate” models that seek to achieve accuracy close to the central model while relying on more distributed trust assumptions. One such middle-ground is the so-called shuffle (aka anonymous) model [IKOS06, BEM+17, CSU+18, EFM+19], where the users send messages to a shuffler who randomly shuffles these messages before sending them to the analyzer; the privacy guarantee is enforced on the shuffled messages (i.e., the input to the analyzer). We study both the local and the shuffle models in this work.
1.1 Counting Distinct Elements
A basic function in data analytics is estimating the number of distinct elements in a domain of size held by a collection of users, which we denote by (and simply by if there is no restriction on the universe size). Besides its use in database management systems, it is a well-studied problem in sketching, streaming, and communication complexity (e.g., [KNW10, BCK+14] and the references therein). In central DP, it can be easily solved with constant error using the Laplace mechanism [DMNS06]; see also [MMNW11, DLB19, PS20, CDSKY20].
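As a point of reference, the central-model approach mentioned above is essentially a one-liner: the number of distinct elements has global sensitivity 1, so Laplace noise of scale 1/ε gives a pure-DP estimate with constant expected error. The sketch below is our own illustration of this folklore baseline (all parameter names are ours), not a protocol from this paper.

```python
import numpy as np

def count_distinct_central_dp(data, epsilon, rng=None):
    """Central-model epsilon-DP estimate of the number of distinct elements.

    The true count has global sensitivity 1 (adding or removing one user changes
    it by at most 1), so Laplace noise of scale 1/epsilon suffices for pure DP.
    """
    rng = np.random.default_rng() if rng is None else rng
    true_count = len(set(data))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Example: 10,000 users drawn from a domain of size 500.
rng = np.random.default_rng(0)
data = rng.integers(0, 500, size=10_000)
print(len(set(data)), count_distinct_central_dp(data, epsilon=1.0, rng=rng))
```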
We obtain new results on -DP protocols for CountDistinct in the local and shuffle settings111For formal definitions, please refer to Section 3. We remark that, throughout this work, we consider the non-interactive local model where all users apply the same randomizer (see Definition 3.6). We briefly discuss in Section 1.4 possible extensions to interactive local models..
1.1.1 Lower Bounds for Local DP Protocols
Our first result is a lower bound on the additive error of protocols222See Section 3 for the formal (standard) definition of public-coin DP protocols. Note that private-coin protocols are a sub-class of public-coin protocols, so all of our lower bounds apply to private-coin protocols as well. for counting distinct elements.
Theorem 1.1.
For any , no public-coin - protocol can solve333Throughout this work, we say that a randomized algorithm solves a problem with error if with probability 0.99 it incurs error at most . with error .
The lower bound in Theorem 1.1 is asymptotically tight444The trivial algorithm that always outputs incurs an error .. Furthermore, it answers a question of Vadhan [Vad17, Open Problem 9.6], who asked if there is a function with a gap of between its (global) sensitivity and the smallest achievable error by any protocol.555 To the best of our knowledge, the largest previously known gap between global sensitivity and error was , which is achieved, e.g., by binary summation [CSS12]. As the global sensitivity of the number of distinct elements is , Theorem 1.1 exhibits a (natural) function for which this gap is as large as . While Theorem 1.1 applies to the constant regime, it turns out we can prove a lower bound for much less private protocols (i.e., having a much larger value) at the cost of polylogarithmic factors in the error:
Theorem 1.2.
For some and , no public-coin - protocol can solve with error .
To prove Theorem 1.2, we build on the moment matching method from the literature on (non-private) distribution estimation, namely [VV17, WY19], and tailor it to CountDistinct in the setting (see Section 2.1 for more details on this connection). The bound on the privacy parameter in Theorem 1.2 turns out to be very close to tight: the error drops quadratically when exceeds . This is shown in the next theorem:
Theorem 1.3.
There is a - protocol solving with error .
1.1.2 Lower Bounds for Single-Message Shuffle DP Protocols
In light of the negative result in Theorem 1.2, a natural question is whether CountDistinct can be solved in a weaker distributed DP setting such as the shuffle model. It turns out that this is not possible using any shuffle protocol where each user sends no more than one message (for brevity, we will henceforth denote this class by , and more generally denote by the variant where each user can send up to messages). Note that the class includes any method obtained by taking a protocol and applying the so-called amplification by shuffling results of [EFM+19, BBGN19].
In the case where is any constant and is inverse quasi-polynomial in , the improvement in the error for protocols compared to is at most polylogarithmic factors:
Theorem 1.4.
For all , there are and such that no public-coin - protocol can solve with error .
We note that Theorem 1.4 essentially answers a more general variant of Vadhan’s question: it shows that even for protocols (which include protocols as a sub-class) the gap between sensitivity and the error can be as large as .
The proof of Theorem 1.4 follows by combining Theorem 1.2 with the following connection between and :
Lemma 1.5.
For any and , if the randomizer is - on users, then is -.
We remark that Lemma 1.5 provides a stronger quantitative bound than the qualitatively similar connections in [CSU+18, GGK+19]; specifically, we obtain the term , which was not present in the aforementioned works. This turns out to be crucial for our purposes, as this term gives the term necessary to apply Theorem 1.2.
1.1.3 A Communication-Efficient Shuffle DP Protocol
In contrast with Theorem 1.4, Balcer et al. [BCJM20] recently gave a protocol for with error . Their protocol sends messages per user. We instead show that an error of can still be guaranteed with each user sending, in expectation, at most one message of length bits.
Theorem 1.6.
For all and , there is a public-coin - protocol that solves with error where the expected number of messages sent by each user is at most one.
In the special case where , we moreover obtain a private-coin protocol achieving the same guarantees as in Theorem 1.6 (see Theorem 8.4 for a formal statement). Note that Theorem 1.6 is in sharp contrast with the lower bound shown in Theorem 1.4 for protocols. Indeed, for inverse quasi-polynomial in , the former implies a public-coin protocol, with less than one message per user in expectation, having error , whereas the latter proves that no such protocol exists, even with error as large as , if we restrict each user to send one message in the worst case.
A strengthening of protocols called robust protocols666Roughly speaking, they are protocols whose transcript remains private even if a constant fraction of users drop out from the protocol. was studied by [BCJM20], who proved an lower bound on the error of any protocol solving . Our protocols are robust and, therefore, achieve the optimal error (up to polylogarithmic factors) among all robust protocols, while only sending at most one message per user in expectation.
1.2 Dominated Protocols and Multi-Message Shuffle DP Protocols
The technique underlying the proof of Theorem 1.1 can be extended beyond protocols for CountDistinct. It applies to a broader category of protocols that we call dominated, defined as follows:
Definition 1.7.
We say that a randomizer R is (ε, δ)-dominated if there exists a distribution D on the message space such that, for all inputs x and all sets E of messages, Pr[R(x) ∈ E] ≤ e^ε · Pr_{y∼D}[y ∈ E] + δ.
In this case, we also say is -dominated by . We define -dominated protocols in the same way as -, except that we require the randomizer to be -dominated instead of being -DP.
Note that an ε-DP randomizer R is (ε, 0)-dominated: we can fix an arbitrary input x0 and take D to be the output distribution of R(x0). Therefore, our new definition is a relaxation of local DP.
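As a toy numerical check of this observation, the snippet below instantiates binary randomized response (a textbook ε-LDP randomizer; the mechanism and parameters are our own illustrative choices) and verifies that fixing x0 = 0 and taking D to be the output distribution of R(x0) dominates every output probability up to a factor e^ε.

```python
import math

def randomized_response(x, eps):
    """Output distribution of binary randomized response on input x (an eps-LDP randomizer)."""
    p_keep = math.exp(eps) / (math.exp(eps) + 1)
    return {x: p_keep, 1 - x: 1 - p_keep}

eps = 1.0
dominating = randomized_response(0, eps)  # D = output distribution of R(x0) with x0 = 0

# Check Pr[R(x) = y] <= e^eps * Pr_{y ~ D}[y] for every input x and output y,
# i.e., R is (eps, 0)-dominated by D.
worst = max(randomized_response(x, eps)[y] / dominating[y] for x in (0, 1) for y in (0, 1))
print(worst, "<=", math.exp(eps), worst <= math.exp(eps) + 1e-12)
```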
We show that multi-message protocols are dominated, which allows us to prove the first non-trivial lower bounds against protocols.
Before formally stating this connection, we recall why known lower bounds against protocols [CSU+18, GGK+19, BC20] do not extend to protocols.777We remark that [GGK+20] developed a technique for proving lower bounds on the communication complexity (i.e., the number of bits sent per user) for multi-message protocols. Their techniques do not apply to our setting as our lower bounds are in terms of the number of messages, and do not put any restriction on the message length. Furthermore, their technique only applies to pure-DP where , whereas ours applies also to approximate-DP where . These prior works use the connection stating that any - protocol is also - [CSU+18, Theorem 6.2]. It thus suffices for them to prove lower bounds for protocols with low privacy requirement (i.e., -), for which lower bound techniques are known or developed. For - protocols, [BC20] showed that they are also -; therefore, lower bounds on protocols automatically translate to lower bounds on pure- protocols. To apply this proof framework to protocols, a natural first step would be to connect protocols to protocols. However, as observed by [BC20, Section 4.1], there exists an - protocol that is not for any privacy parameter. That is, there is no analogous connection between protocols and multi-message protocols, even if the latter can only send messages per user.
In contrast, the next lemma captures the connection between multi-message and dominated protocols.
Lemma 1.8.
If is - on users, then it is -dominated.
By considering dominated protocols and using Lemma 1.8, we obtain the first lower bounds for multi-message protocols for two well-studied problems: Selection and ParityLearning.
1.2.1 Lower Bounds for Selection
The Selection problem on users is defined as follows. The th user has an input and the goal is to output an index such that . Selection is well-studied in DP (e.g., [DJW13, SU17, Ull18]) and its variants are useful primitives for several statistical and algorithmic problems including feature selection, hypothesis testing and clustering. In central DP, the exponential mechanism of [MT07] yields an -DP algorithm for Selection when . On the other hand, it is known that any - protocol for Selection with and requires users [Ull18]. Moreover, [CSU+18] obtained a - protocol for . By contrast, for protocols, a lower bound of was obtained in [CSU+18] and improved to in [GGK+19].
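For concreteness, here is a minimal sketch of the central-model exponential mechanism for Selection mentioned above; the scoring by column sums and the ε/2 scaling follow the standard analysis (one user changes every score by at most 1), and all names and parameter values are ours.

```python
import numpy as np

def exponential_mechanism_selection(X, eps, rng=None):
    """eps-DP selection of (approximately) the most common coordinate.

    X is an n x d binary matrix (one row per user).  Coordinate j is scored by
    sum_i X[i, j]; changing one user's row changes every score by at most 1, so
    sampling j with probability proportional to exp(eps * score_j / 2) is eps-DP.
    """
    rng = np.random.default_rng() if rng is None else rng
    scores = X.sum(axis=0)
    logits = 0.5 * eps * (scores - scores.max())  # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(len(scores), p=probs)

rng = np.random.default_rng(1)
X = (rng.random((2000, 50)) < 0.5).astype(int)
X[:, 7] = 1  # coordinate 7 is clearly the heaviest
print(exponential_mechanism_selection(X, eps=1.0, rng=rng))  # almost always prints 7
```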
The next theorem gives a lower bound for Selection that holds against approximate- protocols. To the best of our knowledge, this is the first lower bound even for (and even for the special case of pure protocols, where ).
Theorem 1.9.
For any , any public-coin - protocol that solves Selection requires .
We remark that combining the advanced composition theorem for DP and known aggregation algorithms, one can obtain a - protocol for Selection with samples for any (see Appendix D for details).
1.2.2 Lower Bounds for Parity Learning
In ParityLearning, there is a hidden random vector , each user gets a random vector together with the inner product over , and the goal is to recover . This problem is well-known for separating PAC learning from the Statistical Query (SQ) learning model [Kea98]. In DP, it was studied by [KLN+11] who gave a central DP protocol (also based on the exponential mechanism) computing it for , and moreover proved a lower bound of for any protocol, thus obtaining the first exponential separation between the central and local settings.
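To make the task concrete, here is a small non-private baseline (our own illustration, unrelated to the DP protocols discussed in this paper): with noiseless examples, the hidden vector can be recovered by Gaussian elimination over GF(2).

```python
import numpy as np

def solve_parity(X, y):
    """Recover s from noiseless parity examples (x_i, <x_i, s> mod 2) by Gaussian
    elimination over GF(2).  Returns None if the system does not determine s uniquely."""
    n, d = X.shape
    A = np.concatenate([X % 2, (y % 2).reshape(-1, 1)], axis=1)
    row, pivots = 0, []
    for col in range(d):
        pivot_rows = [r for r in range(row, n) if A[r, col] == 1]
        if not pivot_rows:
            return None  # underdetermined system
        A[[row, pivot_rows[0]]] = A[[pivot_rows[0], row]]
        for r in range(n):
            if r != row and A[r, col] == 1:
                A[r] ^= A[row]
        pivots.append(col)
        row += 1
    s = np.zeros(d, dtype=int)
    for r, col in enumerate(pivots):
        s[col] = A[r, d]
    return s

rng = np.random.default_rng(2)
d, n = 20, 200
s = rng.integers(0, 2, size=d)
X = rng.integers(0, 2, size=(n, d))
y = (X @ s) % 2
print(np.array_equal(solve_parity(X, y), s))  # True with high probability
```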
We give a lower bound for ParityLearning that holds against approximate- protocols:
Theorem 1.10.
For any , if is a public-coin - protocol that solves ParityLearning with probability at least , then .
Our lower bounds for ParityLearning can be generalized to the Statistical Query (SQ) learning framework of [Kea98] (see Appendix C for more details).
Independent Work.
In a recent concurrent work, Cheu and Ullman [CU20] proved that robust protocols solving Selection and ParityLearning require and samples, respectively. Their results have no restriction on the number of messages sent by each user, but they only hold against the special case of robust protocols. Our results provide stronger lower bounds when the number of messages per user is less than , and apply to the most general model without the robustness restriction.
1.3 Lower Bounds for Two-Party DP Protocols
Finally, we consider another model of distributed DP, called the two-party model [MMP+10], denoted . In this model, there are two parties, each holding part of the dataset. The DP guarantee is enforced on the view of each party (i.e., the transcript, its private randomness, and its input). See Section 9 for a formal treatment.
McGregor et al. [MMP+10] studied this model and proved an interesting separation of between the global sensitivity and the error of any -DP protocol in this model. However, this lower bound does not extend to the approximate-DP case (where ); in this case, the largest known gap (also proved in [MMP+10]) is only , and it was left as an open question whether this can be improved888The conference version of [MMP+10] actually claimed a lower bound for the approximate-DP case as well; however, it was later found to be incorrect; see [MMP+11] for more discussion. We answer this question by showing that the gap of holds even against approximate-DP protocols:
Theorem 1.11.
For any and any sufficiently large , there is a function whose global sensitivity is one and such that no - protocol can compute to within an error of .
The above bound is tight up to logarithmic factors in , as it is trivial to achieve an error of .
The proof of Theorem 1.11 is unlike the others in the paper; in fact, we only employ simple reductions starting from the hardness of the inner product function already shown in [MMP+10]. Specifically, our function is a sum of blocks of inner product modulo 2. While this function is not symmetric, we show (Theorem 9.5) that it can be easily symmetrized.
1.4 Discussions and Open Questions
In this work, we study DP in distributed models, including the local and shuffle settings. By building on the moment matching method and using the newly defined notion of dominated protocols, we give novel lower bounds in both models for three fundamental problems: CountDistinct, Selection, and ParityLearning. While our lower bounds are (nearly) tight in a large setting of parameters, there are still many interesting open questions, three of which we highlight below:
• Lower Bounds for Protocols with Unbounded Number of Messages. Our connection between and dominated protocols becomes weaker as (Lemma 1.8). As a result, it cannot be used to establish lower bounds against protocols with a possibly unbounded number of messages. In fact, we are not aware of any separation between central DP and without a restriction on the number of messages and without the robustness restriction. This remains a fundamental open question. (In contrast, separations between central DP and are well-known, even for basic functions such as binary summation [CSS12].)
• Lower Bounds against Interactive Local/Shuffle Model. Our lower bounds hold in the non-interactive local and shuffle DP models, where all users send their messages simultaneously in a single round. While it seems plausible that our lower bounds can be extended to the sequentially interactive local DP model [DJW13] (where each user speaks once but not simultaneously), it is unclear how to extend them to the fully interactive local DP model.
The situation for however is more complicated. Specifically, we are not aware of a formal treatment of an interactive setting for the shuffle model, which would be the first step in providing either upper or lower bounds. We remark that certain definitions could lead to the model being as powerful as the central model (in terms of achievable accuracy and putting aside communication constraints); see e.g., [IKOS06] on how to perform secure computations under a certain definition of the shuffle model.
• Lower Bounds for CountDistinct with Larger . All but one of our lower bounds hold as long as , which is a standard assumption in the DP literature. The only exception is Theorem 1.4, which requires for some constant . It would be interesting to determine whether this can be relaxed to .
1.5 Organization
We describe in Section 2 the techniques underlying our results. Some basic definitions and notation are given in Section 3. We prove our main lower bounds for CountDistinct (Theorems 1.2 and 1.4) in Section 4. In Section 5, we define dominated protocols and prove Lemma 1.8. Our lower bounds for Selection and ParityLearning are then proved in Section 6. Theorem 1.1 is proved in Section 7. Our protocol for CountDistinct is presented and analyzed in Section 8. Finally, our lower bounds in the two-party model (in particular, Theorem 1.11) are proved in Section 9. Some deferred proofs appear in Appendices A and B. The connection to the SQ model is presented in Appendix C. Finally, in Appendix D, we describe the protocol for Selection with sample complexity .
2 Overview of Techniques
In this section, we describe the main intuition behind our lower bounds. As alluded to in Section 1, we give two different proofs of the lower bounds for CountDistinct in the and settings, each with its own advantages:
• Proof via Moment Matching. Our first proof is technically the hardest in our work. It applies to the much more challenging low-privacy setting (i.e., -), and shows an lower bound on the additive error (Theorem 1.2). Together with our new improved connection between and (Lemma 1.5), it also implies the same lower bound for protocols in the model. The key ideas behind the first proof will be discussed in Section 2.1.
• Proof via Dominated Protocols. Our second proof has the advantage of giving the optimal lower bound on the additive error (Theorem 1.1), but only in the constant privacy regime (i.e., -), and it is relatively simple compared to the first proof.
Moreover, the second proof technique is very general and is a conceptual contribution: it can be applied to show lower bounds for other fundamental problems (i.e., Selection and ParityLearning; Theorems 1.9 and 1.10) against multi-message protocols. We will highlight the intuition behind the second proof in Section 2.2.
While our lower bounds also work for the public-coin models, throughout this section, we focus on private-coin models in order to simplify the presentation. The full proofs extending to public-coin protocols are given in later sections.
2.1 Lower Bounds for CountDistinct via Moment Matching
To clearly illustrate the key ideas behind the first proof, we will focus on the pure-DP case where each user can only send bits. In Section 4, we generalize the proof to approximate-DP and remove the restriction on communication complexity.
Theorem 2.1 (A Weaker Version of Theorem 1.2).
For and , no - protocol where each user sends bits can solve with error .
Throughout our discussion, we use to denote a - randomizer. By the communication complexity condition of Theorem 2.1, we have that .
Our proof is inspired by the lower bounds for estimating distinct elements in the property testing model, e.g., [VV17, WY19]. In particular, we use the so-called Poissonization trick. To discuss this trick, we start with some notation. For a vector , we use to denote the joint distribution of independent Poisson random variables:
For a distribution on , we define the corresponding mixture of multi-dimensional Poisson distributions as follows:
For two random variables and supported on , we use to denote the random variable distributed as a sum of two independent samples from and .
Shuffling the Outputs of the Local Protocol. Our first observation is that the analyzer for any local protocol computing CountDistinct should achieve the same accuracy if it only sees the histogram of the randomizers’ outputs. This holds because only seeing the histogram of the outputs is equivalent to shuffling the outputs by a uniformly random permutation, which is in turn equivalent to shuffling the users in the dataset uniformly at random. Since shuffling the users in a dataset does not affect the number of distinct elements, it follows that only seeing the histogram does not affect the accuracy. Therefore, we only have to consider the histogram of the outputs of the local protocol computing CountDistinct. For a dataset , we use to denote the distribution of the histogram with randomizer .
Poissonization Trick. Given a distribution on , suppose we draw a sample , and then draw samples from . If we use to denote the random variable corresponding to the histogram of these samples, it follows that each coordinate of is independent, and is distributed as , where for each .
We can now apply the above trick to the context of local protocols (recall that by our first observation, we can focus on the histogram of the outputs). Suppose we build a dataset by drawing a sample and then adding users with input . By the above discussion, the corresponding histogram of the outputs with randomizer is distributed as , where is treated as an -dimensional vector corresponding to its probability distribution.
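The following simulation illustrates the Poissonization trick numerically; the k-ary randomized response randomizer and all parameter values are our own toy choices, not objects from the paper. With a Poisson number of users all holding the same input x, each coordinate of the output histogram is itself Poisson, with mean λ times the probability that the randomizer emits that message on input x (so its empirical variance should also match its mean).

```python
import numpy as np

rng = np.random.default_rng(0)

def krr(x, k, eps, rng):
    """k-ary randomized response (an eps-LDP randomizer): report x with probability
    e^eps / (e^eps + k - 1); otherwise report a uniformly random other value."""
    if rng.random() < np.exp(eps) / (np.exp(eps) + k - 1):
        return x
    other = int(rng.integers(0, k - 1))
    return other if other < x else other + 1

k, eps, lam, x = 5, 1.0, 30.0, 2
trials = 5000
hists = np.zeros((trials, k))
for t in range(trials):
    n_users = rng.poisson(lam)              # Poissonized number of users, all holding input x
    for _ in range(n_users):
        hists[t, krr(x, k, eps, rng)] += 1

q = np.full(k, 1.0 / (np.exp(eps) + k - 1))  # output distribution of R(x)
q[x] = np.exp(eps) / (np.exp(eps) + k - 1)
print("predicted means:", np.round(lam * q, 2))
print("empirical means:", np.round(hists.mean(axis=0), 2))
print("empirical vars :", np.round(hists.var(axis=0), 2))  # ~= the means, as expected for Poisson
```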
Moment-Matching Random Variables. Our next ingredient is the following construction of two moment-matching random variables used in [WY19]. Let and . There are two random variables and supported on , such that and for every . Moreover . That is, and have the same moments up to degree , while the probabilities of them being zero differ significantly. We will set and hence .
Construction of Hard Distribution via Signal/Noise Decomposition. Recalling that , we will construct two input distributions for .999In fact, in our presentation the number of inputs in each dataset from our hard distributions will not be exactly , but only concentrated around . This issue can be easily resolved by throwing out “extra” users from the dataset; we refer the reader to Section 4.2 for the details. A sample from both distributions consists of two parts: a signal part with many users in expectation, and a noise part with many users in expectation.
Formally, for a distribution over and a subset , the dataset distributions and are constructed as follows:
-
•
: for each , we independently draw , and , and add many users with input .
-
•
: for each , we independently draw , and add many users with input .
We are going to fix a “good” subset of such that (we will later specify the other conditions for being “good”). Therefore, when it is clear from the context, we will use instead of .
Our two hard distributions are then constructed as and . Using the fact that , one can verify that there are users in each of and in expectation. Similarly, one can also verify there are users in in expectation. Hence, both and have users in expectation. In fact, the number of users from both distributions concentrates around .
We now justify our naming of the signal/noise distributions. First, note that the number of distinct elements in the signal parts and concentrates around and respectively. By our condition that , it follows that the signal parts of and separate their numbers of distinct elements by at least . Second, note that although has many users in expectation, they are from the subset of size less than . Therefore, these users collectively cannot change the number of distinct elements by more than , and the numbers of distinct elements in and are still separated by .
Decomposition of Noise Part. To establish the desired lower bound, it now suffices to show for the local randomizer , it holds that and are very close in statistical distance. For , we can decompose as
By the additive property of Poisson distributions, letting , we have that .
Our key idea is to decompose carefully into nonnegative vectors , such that . Then, for , we have
To show that and are close, it suffices to show that for each , it is the case that and are close. We show that they are close when is sufficiently large on every coordinate compared to .
Lemma 2.2 (Simplification of Lemma 4.3).
For each , and every , if for every , then101010We use to denote the total variation (aka statistical) distance between two distributions .
To apply Lemma 2.2, we simply set and . Letting , the requirement that has to be nonnegative translates to , for each .
Construction of a Good Subset . So we want to pick a subset of size at most such that the corresponding satisfies for each . We will show that a simple random construction works with high probability: i.e., one can simply add each element of to independently with probability .
More specifically, for each , we will show that with high probability . Then the correctness of our construction follows from a union bound (and this step crucially uses the fact that ).
Now, let us fix a . Let . Since is -DP, it follows that . We consider the following two cases:
-
1.
If , we immediately get that (which uses the fact that ).
-
2.
If , then in this case, the mass is distributed over at least many components . Applying Hoeffding’s inequality shows that with high probability over , it is the case that (which uses the fact that ).
See the proof of Lemma 4.5 for a formal argument and how to get rid of the assumption that .
The Lower Bound. From the above discussions, we get that
Hence, the analyzer of the local protocol with randomizer cannot distinguish and , and thus it cannot solve with error and probability. See the proof of Theorem 4.1 for a formal argument and how to deal with the fact that there may not be exactly users in dataset from or .
Single-Message Lower Bound. To apply the above lower bound to protocols, the natural idea is to resort to the connection between the and models. In particular, [CSU+18] showed that - protocols are also -.
It may seem that the privacy guarantee is very close to the one in Theorem 1.2. But surprisingly, it turns out (as was stated in Theorem 1.3) that there is a - protocol solving (hence also ) with error . Hence, to establish the lower bound (Theorem 1.4), we rely on the following stronger connection between and protocols.
Lemma 2.3 (Simplification of Lemma 1.5).
For every , if the randomizer is - on users, then is -.
2.2 Lower Bounds for CountDistinct and Selection via Dominated Protocols
We will first describe the proof ideas behind Theorem 1.1, which is restated below. Then, we apply the same proof technique to obtain lower bounds for Selection (the lower bound for ParityLearning is established similarly; see Section 6.3 for details).
Lemma 2.4 (Detailed Version of Theorem 1.1).
For , no -dominated protocol can solve CountDistinct with error .
Hard Distributions for . We now construct our hard instances for . For simplicity, we assume for an integer , and identify the input space with by a fixed bijection. Let be the uniform distribution over . For , we let be the uniform distribution on .
We also use to denote the mixture of and which outputs a sample from with probability and a sample from with probability .
For a parameter , we consider the following two dataset distributions with users:
-
•
: each user gets an i.i.d. input from . That is, .
-
•
: to sample a dataset from , we first draw from uniformly at random, then each user gets an i.i.d. input from . Formally, .
Since for every , it holds that , the number of distinct elements from any dataset in is at most . On the other hand, since is a uniform distribution over elements, a random dataset from has roughly distinct elements with high probability. Hence, the expected number of distinct elements of datasets from is controlled by the parameter . A simple but tedious calculation shows that it is approximately , which can be approximated by for (see Proposition 7.1 for more details). Hence, any protocol solving CountDistinct with error should be able to distinguish between the above two distributions. Our goal is to show that this is impossible for -dominated protocols.
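The simulation below illustrates how the mixture weight controls the number of distinct elements; the domain size, the size of the hidden subset, and the mixture weight are illustrative values of our own choosing, not the paper's parameter settings.

```python
import numpy as np

rng = np.random.default_rng(4)
n, D = 5_000, 5_000          # number of users and domain size (illustrative values)
small = 50                   # size of the hidden small subset (illustrative)
alpha = 0.9                  # mixture weight placed on the small subset (illustrative)

def sample_D1():
    return rng.integers(0, D, size=n)                 # every input uniform over the full domain

def sample_D2():
    T = rng.choice(D, size=small, replace=False)      # hidden small subset
    from_T = rng.random(n) < alpha
    data = rng.integers(0, D, size=n)
    data[from_T] = rng.choice(T, size=from_T.sum())   # most inputs land inside T
    return data

d1 = [len(set(sample_D1())) for _ in range(20)]
d2 = [len(set(sample_D2())) for _ in range(20)]
print("distinct under D1:", np.mean(d1))              # close to the closed form below
print("distinct under D2:", np.mean(d2))              # much smaller: at most |T| + #inputs outside T
print("closed form for D1:", D * (1 - (1 - 1 / D) ** n))
```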
Bounding KL Divergence for Dominated Protocols. Our next step is to upper-bound the statistical distance . As in previous work [Ull18, GGK+19, ENU20], we may upper-bound the KL divergence instead. By the convexity and chain-rule properties of KL divergence, it follows that
(1)
Bounding the Average KL Divergence between a Family and a Single Distribution. We are now ready to introduce our general tool for bounding average KL divergence quantities like (1). We first set up some notation. Let be an index set and be a family of distributions on , let be a distribution on , and be a distribution on . For simplicity, we assume that for every and , it holds that (which is true for and ).
Theorem 2.5.
Let be a concave function such that for all functions satisfying , it holds that
Then for an -dominated randomizer , it follows that
Bounding (1) via Fourier Analysis. To apply Theorem 2.5, for with , we want to bound
By Parseval’s Identity (see Lemma 3.8),
Therefore, we can set , and apply Theorem 2.5 to obtain
We set such that for a sufficiently small constant and note that . It follows that , and therefore by Pinsker’s inequality. Hence, we conclude that -dominated protocols cannot solve with error , completing the proof of Lemma 2.4. Now Theorem 1.1 follows from Lemma 2.4 and the fact that - protocols are also -dominated.
Lower Bounds for Selection against Multi-Message Protocols. Now we show how to apply Theorem 2.5 and Lemma 2.3 to prove lower bounds for Selection. For , let be the uniform distribution on all length- binary strings with th bit being . Recall that is the uniform distribution on . Again we aim to upper-bound the average-case KL divergence
To apply Theorem 2.5, for with , we want to bound
By the Level-1 Inequality (see Lemma 3.7), it is the case that
Therefore, we can set for an appropriate constant , and apply Theorem 2.5 to obtain
Combining this with Lemma 2.3 completes the proof (see the proofs of Lemma 6.3 and Theorem 1.9 for the details).
3 Preliminaries
3.1 Notation
For a function , a distribution on , and an element , we use to denote and to denote . For a subset , we use to denote . We also use to denote the uniform distribution over .
For two distributions and on sets and respectively, we use to denote their product distribution over . For two random variables and supported on for , we use to denote the random variable distributed as a sum of two independent samples from and . For any set , we denote by the set consisting of sequences on , i.e., . For , let denote . For a predicate , we use to denote the corresponding Boolean value of , that is, if is true, and otherwise.
For a distribution on a finite set and an event such that , we use to denote the conditional distribution such that
Slightly overloading the notation, we also use to denote the mixture of distributions and with mixing weights and respectively. Whether means mixture or convolution will be clear from the context unless explicitly stated.
3.2 Differential Privacy
We now recall the basics of differential privacy that we will need. Fix a finite set , the space of user reports. A dataset is an element of , namely a tuple consisting of elements of . Let be the histogram of : for any , the th component of is the number of occurrences of in the dataset . We will consider datasets to be equivalent if they have the same histogram (i.e., the ordering of the elements does not matter). For a multiset whose elements are in , we will also write to denote the histogram of (so that the th component is the number of copies of in ).
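In code, a histogram is just a counter over the universe, and the equivalence of datasets under reordering is the statement that permuting a dataset leaves its histogram unchanged; a minimal illustration:

```python
from collections import Counter

def histogram(dataset):
    """Histogram of a dataset: maps each element of the universe to its number of occurrences."""
    return Counter(dataset)

X = ["a", "b", "a", "c", "b", "a"]
X_reordered = ["c", "a", "a", "b", "a", "b"]    # same multiset, different order
print(histogram(X))                             # Counter({'a': 3, 'b': 2, 'c': 1})
print(histogram(X) == histogram(X_reordered))   # True: the two datasets are equivalent
```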
Let , and consider a dataset . For an element , let be the frequency of in , namely the fraction of elements of that are equal to . Two datasets are said to be neighboring if they differ in a single element, meaning that we can write (up to equivalence) and . In this case, we write . Let be a set; we now define the differential privacy of a randomized function as follows.
Definition 3.1 (Differential privacy (DP) [DMNS06, DKM+06]).
A randomized algorithm is -DP if for every pair of neighboring datasets and for every set , we have
where the probabilities are taken over the randomness in . Here, and .
If , then we use -DP for brevity and informally refer to it as pure-DP; if , we refer to it as approximate-DP. We will use the following post-processing property of DP.
Lemma 3.2 (Post-processing, e.g., [DR14]).
If is -DP, then for every randomized function , the composed function is -DP.
DP is nicely characterized by the following divergence between distributions, which will be used throughout the paper.
Definition 3.3 (Hockey Stick Divergence).
For any , the -hockey stick divergence between distributions and is defined as .
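Assuming the standard form of this divergence, d_{e^ε}(P ‖ Q) = Σ_y max(P(y) − e^ε·Q(y), 0) (equivalently, the supremum over events E of P(E) − e^ε·Q(E)), it is straightforward to compute for discrete distributions. The example below evaluates it on the two output distributions of binary randomized response, a toy mechanism of our own choosing.

```python
import numpy as np

def hockey_stick(p, q, eps):
    """d_{e^eps}(P || Q) = sum_y max(P(y) - e^eps * Q(y), 0) for discrete P, Q on the same support."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.maximum(p - np.exp(eps) * q, 0.0).sum()

# Output distributions of binary randomized response with eps = 1 on inputs 0 and 1:
eps = 1.0
a = np.exp(eps) / (np.exp(eps) + 1)
P, Q = np.array([a, 1 - a]), np.array([1 - a, a])
print(hockey_stick(P, Q, eps))        # 0.0: the mechanism is (eps, 0)-DP
print(hockey_stick(P, Q, 0.5 * eps))  # > 0: it is not (eps/2, 0)-DP
```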
We next list two useful facts about the hockey stick divergence between distributions.
Proposition 3.4.
Let and be any distributions. Then, the following hold:
-
1.
Let be another distribution. Then, for any function , it holds that
-
2.
Suppose we can decompose and , where ’s and ’s are tuples of positive reals summing up to and ’s and ’s are distributions, then
Proof.
Item (1) follows from the post-processing property of DP, together with the definition of the hockey stick divergence.
To prove Item (2), we note that
3.3 Shuffle Model
We briefly review the shuffle model of DP [BEM+17, EFM+19, CSU+18]. The input to the model is a dataset , where item is held by user . A protocol in the shuffle model consists of three algorithms:
-
•
The local randomizer takes as input the data of one user, , and outputs a sequence of messages; here is a positive integer.
To ease discussions in the paper, we will further assume that the randomizer pre-shuffles its messages. That is, it applies a random permutation to the sequence before outputting it.111111Therefore, for every and any two tuples that are equivalent up to a permutation, outputs them with the same probability.
-
•
The shuffler takes as input a sequence of elements of , say , and outputs a random permutation, i.e., the sequence , where is a uniformly random permutation on . The input to the shuffler will be the concatenation of the outputs of the local randomizers.
-
•
The analyzer takes as input a sequence of elements of (which will be taken to be the output of the shuffler) and outputs an answer in that is taken to be the output of the protocol .
We will write to denote the protocol whose components are given by , , and . The main distinction between the shuffle and local model is the introduction of the shuffler between the local randomizer and the analyzer. As in the local model, the analyzer is untrusted in the shuffle model; hence privacy must be guaranteed with respect to the input to the analyzer, i.e., the output of the shuffler. Formally, we have:
Definition 3.5 (DP in the Shuffle Model, [EFM+19, CSU+18]).
A protocol is -DP if, for any dataset , the algorithm
is -DP.
Notice that the output of can be simulated by an algorithm that takes as input the multiset consisting of the union of the elements of (which we denote as , with a slight abuse of notation) and outputs a uniformly random permutation of them. Thus, by Lemma 3.2, it can be assumed without loss of generality for privacy analyses that the shuffler simply outputs the multiset . For the purpose of analyzing the accuracy of the protocol , we define its output on the dataset to be . We also remark that the case of local DP, formalized in Definition 3.6, is a special case of the shuffle model where the shuffler is replaced by the identity function:
Definition 3.6 (Local DP [KLN+11]).
A protocol is -DP in the local model (or -locally DP) if the function is -DP.
We say that the output of the protocol on an input dataset is .
We denote DP in the shuffle model by , and the special case where each user can send at most121212We may assume w.l.o.g. that each user sends exactly messages; otherwise, we may define a new symbol and make each user send messages so that the number of messages becomes exactly . messages by . We denote DP in the local model by .
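For concreteness, here is a generic sketch of how a one-round shuffle protocol is executed; the randomizer, analyzer, and message type are placeholders supplied by the caller, and the toy instantiation at the end is ours (it is not a private protocol), only illustrating the order of operations R → S → A described above.

```python
import random

def run_shuffle_protocol(randomizer, analyzer, dataset, rng=None):
    """Run a one-round shuffle protocol P = (R, S, A).

    `randomizer(x, rng)` returns a list of messages for one user's input x,
    the shuffler S concatenates all messages and permutes them uniformly,
    and `analyzer(messages)` maps the shuffled messages to the protocol output.
    """
    rng = rng or random.Random()
    messages = [m for x in dataset for m in randomizer(x, rng)]
    rng.shuffle(messages)                      # the shuffler S
    return analyzer(messages)

# Toy instantiation (not private): each user sends their bit, the analyzer sums.
dataset = [0, 1, 1, 0, 1]
print(run_shuffle_protocol(lambda x, rng: [x], sum, dataset, random.Random(0)))  # 3
```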
Public-Coin DP.
The default setting for local and shuffle models is private-coin, i.e., there is no randomness shared between the randomizers and the analyzer. We will also study the public-coin variants of the local and shuffle models. In the public-coin setting, each local randomizer also takes a public random string as input. The analyzer is also given the public random string . We use to denote the local randomizer with public random string being fixed to . At the start of the protocol, all users jointly sample a public random string from a publicly known distribution .
Now, we say that a protocol is -DP in the public-coin local model, if the function
is -DP.
Similarly, we say that a protocol is -DP in the public-coin shuffle model, if for any dataset , the algorithm
is -DP.
3.4 Useful Divergences
We will make use of two important divergences between distributions, the KL-divergence and the χ²-divergence, defined as KL(P ‖ Q) = Σ_x P(x) log(P(x)/Q(x)) and χ²(P ‖ Q) = Σ_x (P(x) − Q(x))²/Q(x).
We rely on the key fact that the χ²-divergence upper-bounds the KL-divergence [GS02], that is, KL(P ‖ Q) ≤ χ²(P ‖ Q).
We will also use Pinsker’s inequality, whereby the total variation distance lower-bounds the KL-divergence: 2·TV(P, Q)² ≤ KL(P ‖ Q).
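As a quick numerical sanity check of these two relations (using the standard definitions above), the snippet below draws random discrete distributions and verifies both inequalities.

```python
import numpy as np

def kl(p, q):   return float(np.sum(p * np.log(p / q)))
def chi2(p, q): return float(np.sum((p - q) ** 2 / q))
def tv(p, q):   return 0.5 * float(np.sum(np.abs(p - q)))

rng = np.random.default_rng(5)
for _ in range(5):
    p = rng.random(10); p /= p.sum()
    q = rng.random(10); q /= q.sum()
    assert kl(p, q) <= chi2(p, q) + 1e-12             # chi-square upper-bounds KL
    assert tv(p, q) <= np.sqrt(kl(p, q) / 2) + 1e-12  # Pinsker's inequality
print("KL <= chi^2 and Pinsker hold on all random trials")
```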
3.5 Fourier Analysis
We now review some basic Fourier analysis and then introduce two inequalities that will be heavily used in our proofs. For a function , its Fourier transform is given by the function . We also define . For , we define the level- Fourier weight as . For convenience, for , we will also write to denote , where is the set . One key technical lemma is the Level-1 Inequality from [O’D14], which was also used in [GGK+19].
Lemma 3.7 (Level-1 Inequality).
Suppose is a non-negative-valued function with for all , and . Then, .
We also need the standard Parseval’s identity.
Lemma 3.8 (Parseval’s Identity).
For all functions ,
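As a sanity check, the snippet below computes all Fourier coefficients of a small Boolean function by brute force and verifies Parseval's identity numerically. We use one standard convention (functions on {0,1}^n, characters (−1)^{Σ_{i∈S} x_i}, and uniform expectation); the paper's exact normalization may differ, but Parseval holds in the same form.

```python
import itertools
import numpy as np

def fourier_coefficients(f, n):
    """Fourier coefficients hat{f}(S) = E_x[f(x) * (-1)^{sum_{i in S} x_i}] over uniform x in {0,1}^n."""
    points = list(itertools.product([0, 1], repeat=n))
    all_subsets = itertools.chain.from_iterable(
        itertools.combinations(range(n), r) for r in range(n + 1))
    return {S: np.mean([f(x) * (-1) ** sum(x[i] for i in S) for x in points]) for S in all_subsets}

n = 4
f = lambda x: float((x[0] ^ x[1]) or x[2])   # an arbitrary Boolean-valued test function
coeffs = fourier_coefficients(f, n)

# Parseval: sum_S hat{f}(S)^2 equals E_x[f(x)^2].
lhs = sum(c ** 2 for c in coeffs.values())
rhs = np.mean([f(x) ** 2 for x in itertools.product([0, 1], repeat=n)])
print(round(lhs, 10) == round(rhs, 10))      # True
```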
4 Low-Privacy and Lower Bounds for CountDistinct
In this section, we prove Theorem 1.2 and Theorem 1.4. In Section 4.1, we introduce some necessary definitions and notation. In Section 4.2, we prove our lower bound for low-privacy (private-coin) protocols computing CountDistinct. In Section 4.3, we show the improved connection between and , which implies our lower bounds for protocols for CountDistinct. In Section 4.4, we describe how to adapt the proof to public-coin protocols.
4.1 Preliminaries
Recall that we use the notations and to denote multi-dimensional Poisson distributions and their mixtures, respectively (see Section 2.1 for the precise definitions).
We also recall the key additive property of multi-dimensional Poisson distributions: for any two mean vectors λ and λ′, the sum of independent samples from the corresponding multi-dimensional Poisson distributions is itself multi-dimensional Poisson, with mean vector λ + λ′.
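A quick exact check of this property in one dimension (our own sanity check, with arbitrary means): convolving the probability mass functions of two Poisson distributions recovers the Poisson distribution with the summed mean.

```python
import math

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

a, b, cutoff = 2.5, 1.5, 60
conv = [sum(poisson_pmf(a, i) * poisson_pmf(b, k - i) for i in range(k + 1)) for k in range(cutoff)]
direct = [poisson_pmf(a + b, k) for k in range(cutoff)]
print(max(abs(c - d) for c, d in zip(conv, direct)))  # ~1e-16: Poi(a) + Poi(b) = Poi(a + b)
```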
4.2 Low-Privacy Lower Bounds for CountDistinct
We will first prove the low-privacy lower bounds in the private-coin setting, which is captured by the following theorem.
Theorem 4.1 (The Private-Coin Case of Theorem 1.2).
For some and , if is a private-coin - protocol, then it cannot solve with error and probability at least .
4.2.1 Technical Lemmas
Now we need the following construction from [WY19] (which uses a classical result from [Tim14, 2.11.1]).
Lemma 4.2 ([WY19]).
There is a constant such that, for all , there are two distributions and supported on for , such that , , and for every .
The following lemma is crucial for our proof. Its proof uses the moment matching technique [WY16, JHW18, WY19, Yan19], and can be found in Appendix A.
Lemma 4.3.
Let be two random variables supported on such that for all , where . Let and such that . Let be the distribution over corresponding to . Suppose that
Then,
Finally, we need an observation that for a protocol solving CountDistinct, we can assume without loss of generality that the analyzer of only sees the histogram of the messages.
Lemma 4.4.
For any protocol for CountDistinct, there exists an analyzer which only sees the histogram of the messages, and achieves the same accuracy and error as that of .
Proof.
Let be the number of users. Given the histogram, first constructs a sequence of messages consistent with the histogram. Then, it applies a random permutation to to obtain a new sequence . Finally, it simply outputs .
Note that applying a random permutation on the messages is equivalent to applying a random permutation on the user inputs in the dataset. Hence, the new protocol is equivalent to running on a random permutation of the dataset. The lemma follows from the fact that a random permutation does not change the number of distinct elements. ∎
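The reduction in this proof is mechanical and easy to express in code: rebuild any message sequence consistent with the histogram, apply a uniformly random permutation, and run the original analyzer on the result. A minimal sketch (the analyzer below is an arbitrary placeholder):

```python
import random

def histogram_only_analyzer(original_analyzer, histogram, rng=None):
    """Simulate the original analyzer given only the histogram of the messages.

    We rebuild an arbitrary message sequence consistent with the histogram, apply a
    uniformly random permutation, and run the original analyzer on the shuffled sequence.
    """
    rng = rng or random.Random()
    messages = [msg for msg, count in histogram.items() for _ in range(count)]
    rng.shuffle(messages)
    return original_analyzer(messages)

# Example with a placeholder analyzer that just counts distinct messages it received.
hist = {"m1": 3, "m2": 1, "m3": 2}
print(histogram_only_analyzer(lambda msgs: len(set(msgs)), hist, random.Random(0)))  # 3
```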
4.2.2 Construction of the Hard Dataset Distributions
In the rest of the section, we use to denote a parameter controlling the number of users. The actual number of users will be later set to a number in the interval . In the following, we fix a randomizer which is - on users, for some to be specified later. Before constructing our two hard distributions over datasets, we set some parameters that will be used in the construction:
-
•
We set and note that for large enough .
-
•
Applying Lemma 4.2, for , we obtain two random variables and supported on , such that , , and for every .
-
•
We set and . We are going to construct instances where inputs are from the universe .
-
•
We set .
-
•
Let . We set so that . Hence, is -.
Now, for a distribution over and a non-empty subset of , the dataset distribution is constructed as follows:
-
1.
For each , we draw , and , and add many users with input .
-
2.
For each , we draw , and add many users with input .131313Note that we here use a Poisson distribution slightly differently from the construction in Section 2 (namely, instead of ) in order to simplify the later calculations.
For clarity of exposition, we will use the histogram of the protocol to denote the histogram of the messages in the transcript of the protocol. Our goal is to show that for some “good” subset , the following hold:
-
1.
The distributions of the histogram of the protocol under and are very close.
-
2.
With high probability, the number of distinct elements in datasets from is smaller than in datasets from .
Clearly, given the above two conditions and Lemma 4.4, no protocol with randomizer can estimate the number of distinct elements within error and with constant probability.
4.2.3 Conditions on a Good Subset
Given a subset , we let . We also set . We now specify our conditions on a subset being good. Let . We say is good if the following two conditions hold:
-
1.
.
-
2.
For each ,
We claim that a good subset exists. In fact, we give a probabilistic construction of that succeeds with high probability:
Lemma 4.5.
If we include each element of in independently with probability , then is good with probability at least .
4.2.4 The Lower Bound
Before proving Lemma 4.5, we show that for a good , the distributions and satisfy our desired properties, and thereby imply our lower bound. For a dataset distribution , we use to denote the corresponding distribution of the histogram of the transcript, if all users apply the randomizer . For a dataset , we use to denote the number of distinct elements in it.
Lemma 4.6.
For a good subset of , the following hold:
-
1.
We have that
-
2.
There are two constants such that
and
Proof.
Proof of Item (1).
In the following, we use to denote for simplicity. We first construct vectors as follows: for each , if , then for all we set , otherwise we set for all . Note that for each , we have that . Let . By definition, it follows that is a non-negative vector. Now, and can be seen as distributions over histograms in . Let be independent random variables distributed as . By the construction of , we have that
Similarly, let be independent random variables distributed as . We have that
Since for each , we have that
Applying Lemma 4.3, for each , we have that
Therefore,
Proof of Item (2).
Let and . By Lemma 4.2, we have that , , and are supported on . Hence, it follows that and .
Now, consider the construction of . For every , at least one user with input is added to the dataset during phase (1) with probability . Moreover, these events are mutually independent. Hence, by a simple Chernoff bound, with probability at least , the number of distinct elements in the dataset after phase (1) is no greater than . Since the second phase can add at most many distinct elements, we can set .
Similarly, for instances generated from , with probability at least , the number of distinct elements in the dataset after phase (1) is at least . We can set .
By our condition on and , we have that , which completes the proof. ∎
We are now ready to prove Theorem 4.1. One complication is that datasets from and may not have the same number of users. We address this issue by randomly “throwing out” extra users, obtaining distributions over datasets with exactly many users.
Proof of Theorem 4.1.
Consider the and . By a simple Chernoff bound, we have that with probability at least , the number of users lies in .
Recall that . We construct the distribution as follows: to generate a dataset from , we take a sample dataset from , and if there are users in , we delete users uniformly at random, and output . We similarly construct another distribution . Note that with probability at least , we delete at most users in the construction of (as well as in that of ).
Now, both and output datasets with exactly users with probability . This means that if there is a protocol solving CountDistinct with users with error , then by Lemma 4.4, Item (2) of Lemma 4.6 and since , the analyzer of the protocol should be able to distinguish and with at least a constant probability. Therefore, we have that
On the other hand, (respectively, ) can also be constructed by taking a sample from (respectively, ) and throwing out some random messages until at most messages remain. Since post-processing does not increase statistical distance, by Item (1) of Lemma 4.6, we have that
a contradiction. ∎
4.2.5 A Probabilistic Construction of Good
We need the following proposition for the proof of Lemma 4.5.
Proposition 4.7.
Let be an - randomizer. For every , it follows that
Proof.
Let be the set . Since is -, it follows that
By the definition of the set , it follows that
Putting the above two inequalities together, we have
which in turn implies that
Finally, we prove Lemma 4.5 (restated below).
Lemma 4.5. (restated) If we include each element in independently with probability , then is good with probability at least .
Proof.
Let be the event that . By a simple Chernoff bound, it follows that
Therefore, the first condition for being good is satisfied with probability . In the following, we will condition on the event .
Recall that and . In the rest of the proof, we will focus on the second condition for being good, namely that for each , it is the case that
In the following, we fix , and show that the previous inequality holds for with high probability. Therefore, we can then conclude that is good with high probability by a union bound.
We also let be such that for all . Now, for , if , we say is light; otherwise we say is heavy.
We define
and
It suffices to show that both and are very large. Note that is not defined when . But since we only care about and , this corner case is excluded by conditioning on .
Proving that .
By Proposition 4.7 and the fact that is -, for every
By a union bound over all elements in , we have that
which is equivalent to
Similarly, for , we also have that
Again since is -, we have that
Therefore, by a union bound over ,
Now we are ready to prove that . We will show that holds for every nonempty . We have that
(z is light implies )
The last inequality follows from the fact that and together imply that (recall that ). Hence as the three inequalities cannot be simultaneously satisfied.
Proving that is large.
Now, for a heavy , we have that . In particular, fix a heavy , and define the random variable for each . Note that the ’s are independent variables over and . Letting , by Hoeffding’s inequality, we have that
Note that
Plugging in, it follows that
Note that , and that with probability at least . By a union bound, we have that with probability at least . Noting that , we have that
Recall that . By Markov’s inequality, we have . ∎
4.3 Implies with Stronger Privacy Bound
In this section, we prove a stronger connection between and than previously known. Together with Theorem 4.1, it implies the private-coin version of Theorem 1.4.
We first need a technical lemma which gives a lower bound on the hockey stick divergence between and . We defer its proof to Appendix B.
Lemma 4.8.
There exists an absolute constant such that, for every integer , three reals such that , letting and supposing , it holds that
We are now ready to prove the main lemma of this subsection.
Lemma 4.9.
For all , there is a constant such that for all if the randomizer is - on users, then is -.
Proof.
Let . Note that we can assume that , as otherwise and in this case the theorem follows directly from the fact that is - [CSU+18].
Suppose that is - on users. Let be a constant to be fixed later, and be an event. Our goal is to show that
for all .
Fix two . Let and . Note that without loss of generality we can assume that , as otherwise clearly .
Let and be two neighboring datasets, and be the random variables corresponding to the number of occurrences of the event in the transcript, when running the protocol with randomizer on datasets and , respectively.
From the assumption that is -, we have . Then, by the post-processing property of DP (Lemma 3.2), it follows that .
We have that
The goal now is to show that if , then and do not satisfy -DP (i.e., ), thus obtaining a contradiction.
Now, assume that . Since , we have that . Note that .
Letting and applying Lemma 4.8 for a universal constant , it follows that
Noting that , we have that
We now set the constant to be small enough so that
Plugging in and recalling that , we get that
a contradiction. ∎
Finally, we are ready to prove our lower bound for CountDistinct in the private-coin case.
Theorem 4.10.
For all , there are and such that no private-coin - protocol on users can solve with error and probability at least .
4.4 Generalizing to Public-Coin Protocols
Finally, we generalize our proof for the private-coin case to the public-coin case, and prove Theorem 1.4 (restated below).
Theorem 1.4. (restated) For all , there are and such that no public-coin - protocol on users can solve with error and probability at least .
Fix to be a public-coin randomizer with public randomness from distribution . We first generalize Lemma 4.9, and show that if is , then with high probability over , satisfies the similar guarantee as in Lemma 4.9.
Lemma 4.11.
For all , there is a constant such that for all if the public-coin randomizer with public randomness distribution is - on users, and if , then with probability at least over , it is the case that is -.
Proof Sketch.
Similar to the proof of Lemma 4.9, we can assume that without loss of generality.
By the definition of public-coin -DP in the shuffle model, for every two neighboring datasets and , we have that
Observe that the proof of Lemma 4.9 only considers the pairs of neighboring datasets of the form and for all . We use to denote the set of such pairs.
Using the assumption that , we have that
Thus, by Markov’s inequality, with probability at least over , we have that
where the last inequality follows from our assumption that . We say an is good if it satisfies the above inequality. In particular, for all good and all pairs , we have that
The proof of Lemma 4.9 then implies that is -, for a constant depending on . ∎
Now we are ready to prove Theorem 1.4.
Proof of Theorem 1.4.
We use and to denote the same distributions constructed in the proof of Theorem 4.10. We moreover use the same notation as in Section 4.2.
By a simple application of Markov’s inequality and noting that our assumed protocol solves with error and probability at least , it follows that with probability at least over , if , then
By Lemma 4.11, with probability at least over , we have that is -. We say that such an is good.
For all good and for a good subset , when the randomizer is set to (note that the definition of a good subset depends on the randomizer ), by a similar argument as in Theorem 4.10, we have
Now, by Lemma 4.5, if we construct by including each element of with probability , then for every good , we know that is good for randomizer with probability at least . By a union bound, there exists a fixed choice of such that is good for randomizer with probability at least over . In the following, we fix to be such a choice.
Finally, by a union bound, it follows that with probability at least , the above two inequalities hold simultaneously, a contradiction. ∎
5 -Dominated Algorithms
In [CSU+18], it was shown that an - protocol on users is also -, thereby reducing the problem of proving lower bounds for protocols to proving lower bounds on protocols with low privacy properties. However, it is known that such a connection does not hold even for protocols [BC20, Section 4.1].
Recall the definition of -dominated algorithms from Definition 1.7. In this section, we will show that protocols are dominated.
For clarity of exposition, we will assume that each user sends exactly messages; this is without loss of generality (see Footnote 12). To handle public-coin protocols, we need a relaxed version of dominated algorithms.
Definition 5.1 (Dominated Algorithms).
For a distribution on , we say an algorithm is -dominated, if for the distribution , there exists a distribution on such that
In this case, we also say is -dominated by .
5.1 Approximate- Protocols are Dominated
Next we show that approximate protocols are dominated. For this purpose, we introduce the concept of “pseudo-locally private” algorithms, which is a special case of dominated algorithms, and may be interesting in its own right.
5.1.1 Merged Randomizer
Let be the set of all -sized subsets of the set . For and a randomizer , we define the merged randomizer of with respect to , denoted by , as follows:
-
•
Given an input , for each , we simulate with independent random coins to get an output .
-
•
Assume that consists of elements indexed in lexicographical order. We construct a -tuple such that for each .
-
•
We pre-shuffle before outputting it. That is, we draw a permutation uniformly at random, shuffle according to to obtain a new -tuple (by setting for each ), and output .
That is, runs several times, and merges the obtained outputs according to . We now define a distribution on as follows: to draw a sample from , we simply draw items without replacement from the set .
Finally, for a randomizer , we define the randomizer as follows: Given an input , we first draw from , and then simulate on the input and output its output.
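The sketch below reflects one plausible reading of this construction; since several symbols in the description above were lost in extraction, the number of independent simulations and the exact index set are our own assumptions. The idea is that the merged randomizer runs R(x) independently several times, keeps message slot j of simulation i for every pair (i, j) in S, pre-shuffles the resulting k-tuple, and the distribution ν draws S as k pairs without replacement.

```python
import itertools
import random

def merged_randomizer(R, x, S, rng):
    """Merged randomizer R_S (one plausible reading): run R(x) independently once per
    simulation index appearing in S, pick message slot j from simulation i for every
    pair (i, j) in S (in lexicographic order), then pre-shuffle the resulting k-tuple.
    R(x, rng) is assumed to return a k-tuple of messages."""
    sims, merged = {}, []
    for (i, j) in sorted(S):
        if i not in sims:
            sims[i] = R(x, rng)        # independent simulation of R on input x
        merged.append(sims[i][j])
    rng.shuffle(merged)                # pre-shuffle before outputting
    return tuple(merged)

def sample_S(n_sims, k, rng):
    """The distribution nu (assumed): k pairs drawn without replacement from [n_sims] x [k]."""
    return set(rng.sample(list(itertools.product(range(n_sims), range(k))), k))

# Toy k = 2 randomizer (placeholder): two independent noisy copies of the input bit.
R = lambda x, rng: tuple((x + rng.randrange(2)) % 2 for _ in range(2))
rng = random.Random(6)
S = sample_S(n_sims=5, k=2, rng=rng)
print(S, merged_randomizer(R, 1, S, rng))
```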
5.1.2 Pseudo-Locally Private Algorithms
We are now ready to define pseudo-locally private algorithms.
Definition 5.2.
We say that an algorithm is -pseudo-locally private, if for all and all ,
Remark 5.3.
If is -pseudo-locally private, then clearly is also -dominated; we can simply take for any fixed .
5.1.3 Multi-Message Protocols are Pseudo-Locally Private
Our most crucial observation here is an analogue of [CSU+18, Theorem 6.2] for multi-message protocols. Namely, we show that any multi-message protocol is pseudo-locally private.
Lemma 5.4.
If is - on users, then it is pseudo-locally private.
Proof.
Suppose otherwise, i.e., that there are and such that
Note that since both and pre-shuffle their outputs before outputting them, we can assume that is a union of equivalence classes of -tuples (we say two -tuples are equivalent if one can be obtained from the other by applying a permutation). To see this, we can take to be . One can see that if and are equivalent up to a permutation, then either both and are in , or neither of them is in .
Consider two datasets and . Let be the corresponding protocol with randomizer . For a dataset , we use to denote the random variable of the transcript of before shuffling. That is, for a dataset , is the concatenation of all for from to .
We now define an event as “there exist messages in the transcript of constituting the event ”. It immediately follows that
(2)
Furthermore, we claim that
(3)
To see why the above inequality holds, note that if we pick messages from , depending on which users these messages come from, the probability that they constitute is bounded by for a certain .
Moreover, if we pick these messages uniformly at random from all possible -tuples, the corresponding is distributed according to . Therefore, we can apply a union bound over all possible -tuples and sum up the corresponding to obtain an upper bound on . The aforementioned sum is precisely times the expectation .
Finally, note that applying a random permutation to the transcript does not change whether the event occurs. Therefore, (4) contradicts the assumption that is -. ∎
From Remark 5.3, we get the following corollary:
Corollary 5.5.
If is - on users, then it is -dominated.
Next, we generalize Corollary 5.5 to the public-coin case:
Lemma 5.6.
If is - in the -user public-coin setting with public randomness from and , then there is a family of distributions over such that for every distribution over ,
In other words, there are reals such that and is -dominated by .
Proof.
From the proof of Lemma 5.4, it follows that for all and , we have that
Since is - on users, it follows that
Putting the above two inequalities together, for all , we get that
We now finish the proof by fixing , and setting for every . ∎
5.2 Bounding KL Divergence for Dominated Randomizers
In this subsection, we prove the technical lemma bounding average-case KL divergences for dominated randomizers.
As before, we use and to denote the input space and the message space respectively. For a local randomizer , we let .
Let be a distribution on . Let be an index set, be a distribution on , and be a family of distributions on . For a constant , we say that -dominates if for all and , it holds that .
Theorem 5.7.
For a constant , let be a distribution which -dominates a distribution family . Let be a distribution on . Let be a concave function such that for all functions satisfying , it holds that
Then for an -dominated randomizer , it holds that
Proof.
Let . Recall that . We also set and .
It follows from the assumption that there exists a distribution that -dominates . Noting that -divergence upper-bounds KL divergence (see Section 3.4), it follows that
We will further decompose so that is small and is small. Formally, for each , we define a truncation level
Then, we define and as follows
Fix a in the support of . Noting that
we get
(5)
To simplify the notation, in the following we set and . We will bound the two terms in (5) separately.
Bounding .
Since is concave, noting that and , it follows that
where the second step uses Jensen’s inequality. From the definition of , we have that
where the last equality follows from the fact that is a distribution. We therefore obtain
Bounding .
Since -dominates , it follows that
Therefore,
where the last inequality holds because .
By the definition of , it follows that
Let . For , we get that
In particular, the above means that , as otherwise
contradicting the fact that is -dominated by . Hence, we have
Putting everything together, it follows that
Final Bound.
Combining our bounds on and , we conclude that
6 Lower Bounds for Selection and ParityLearning
In this section, we prove lower bounds for Selection and ParityLearning in the model. We begin with some notation.
6.1 Notation
For , let be the uniform distribution on . Recall that is the uniform distribution on .
For , let be the -bit string such that only the -th bit is , and the other bits are . For , we denote by the uniform distribution on all length- Boolean strings with -th bit being . For simplicity, we also use to denote when the context is clear.
We need the following simple proposition.
Proposition 6.1.
For every function and ,
Proof.
By definition, we have that
6.2 Lower Bound for Selection
We begin with lower bounds for Selection.
Lemma 6.2.
For , suppose is -dominated, then we have
Proof.
To apply Theorem 5.7, we set the index set as , the distribution to the uniform distribution over , , and .
Lemma 6.3.
For a public-coin randomizer with public randomness from , if there is a family of distributions over such that
then a public-coin protocol with randomizer needs at least samples to solve Selection with probability at least .
Proof.
Let be uniformly random over , and be i.i.d. samples from . For each , we draw from .
Let be the output of the protocol with public randomness fixed to , and let . Assuming , it follows that with probability at least over the randomness of and randomness in , conditioned on the event .
By Markov’s inequality, with probability at least over , we have that with probability at least over the randomness in conditioned on the event .
From our assumption and Markov’s inequality, it follows that with probability at least over , we have . That is, is -dominated.
By a union bound, with probability at least over , we have with probability at least , and is -dominated. In the following, we fix such an .
By Lemma 6.2, for , we have that
By Fano’s inequality,
We also have that
Plugging in, we obtain
Hence, we deduce that . ∎
We are now ready to prove Theorem 1.9 (restated below).
Theorem 1.9. (restated) For any , if is a public-coin - protocol solving Selection with probability at least , then .
6.3 Lower Bound for ParityLearning
We next prove our lower bound for ParityLearning.
Lemma 6.4.
For , suppose is -dominated. We have that
Proof.
To apply Theorem 5.7, we set the index set as , distribution to be the uniform distribution over , , and .
Now we apply Lemma 6.4 to the ParityLearning problem. Recall that in ParityLearning, there is a random hidden element , and each user gets a random element together with the inner product over . Appending the label to the vector, each user indeed gets a random sample from the set , where is the -dimensional vector obtained by appending to the end of the vector . In other words, each user gets a random sample from the distribution .
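For concreteness, here is a small Python sketch generating ParityLearning inputs as described above: a hidden uniformly random vector, and for each user a uniformly random vector together with its inner product over GF(2), appended as the final coordinate. Function and variable names are ours.

```python
import numpy as np

def parity_learning_samples(n_users, dim, rng=None):
    """Generate ParityLearning inputs: a hidden uniformly random s in {0,1}^dim,
    and for each user a uniformly random x in {0,1}^dim together with the
    label <x, s> mod 2, appended as the last coordinate."""
    rng = np.random.default_rng(rng)
    s = rng.integers(0, 2, size=dim)
    X = rng.integers(0, 2, size=(n_users, dim))
    labels = (X @ s) % 2
    samples = np.concatenate([X, labels[:, None]], axis=1)  # (dim+1)-bit vectors
    return s, samples

s, samples = parity_learning_samples(n_users=5, dim=8, rng=0)
```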
Lemma 6.5.
For a public-coin randomizer with public randomness from , if there is a family of distributions over such that,
where is the number of samples, then a public-coin protocol with randomizer needs at least samples to solve ParityLearning with probability at least .
Proof.
Suppose there is a public-coin protocol with randomizer solving ParityLearning with probability at least . For a dataset , we use (respectively, ) to denote the output of on (with public randomness fixed to ).
Consider running on uniformly random samples from . We note that for at least a fraction of , we have that
From the assumption that solves ParityLearning, for all , we have
By a union bound, with probability at least over , we have
(6)
From our assumption and Markov’s inequality, with probability at least over , we have that . That is, is -dominated.
By a union bound, there exists an such that is -dominated and (6) is satisfied.
Since there is at least a fraction of satisfying the conditions in (6), it follows that there exists an satisfying these conditions and , which, by Pinsker’s inequality, implies that
and
a contradiction. ∎
We are now ready to prove Theorem 1.10.
Theorem 1.10. (restated) For any , if is a public-coin - protocol solving ParityLearning with probability at least , then .
7 Lower Bound for CountDistinct with Maximum Hardness
In this section, we prove Theorem 1.1, which gives a lower bound on the error of protocols for CountDistinct.
7.1 Preliminaries
For , recall that is the uniform distribution on . As in Section 2.2, we also use to denote the mixture of and which outputs a sample from with probability and a sample from with probability . Note that can also be interpreted as the mixture of and that outputs a sample from with probability , and a sample from with probability . We next estimate the number of distinct elements in samples taken from .
Proposition 7.1.
Set . For and any , let be the number of distinct elements in samples drawn from . We have that
Proof.
In the following, we identify the index space with in the natural way. For , we use to denote the indicator of whether occurs in the samples taken from . Note that these ’s are not independent, but they are negatively correlated [DR98, Proposition 7 and 11], and hence a Chernoff bound still applies.
Let be an element in the support of . Note that equals one sample from with probability
Therefore, occurs in i.i.d. samples from with probability
Therefore, we have that
Similarly, for an element in the support of , equals one sample from with probability
Hence, by a similar calculation, occurs in i.i.d. samples from with probability
and
Hence, we have that
Let
where the last equality holds since . Let . Using the Chernoff bound and the fact that , we have that
which completes the proof. ∎
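The mixture and the parameters in Proposition 7.1 were not recovered here, but the underlying computation can be sanity-checked generically: for any discrete distribution with probabilities p_j, the expected number of distinct elements among n i.i.d. samples is the sum over j of 1 - (1 - p_j)^n, and the count concentrates around it. The sketch below verifies this on an arbitrary placeholder distribution, not the mixture from the proposition.

```python
import numpy as np

def expected_distinct(p, n):
    """E[# distinct elements among n i.i.d. samples from the pmf p]."""
    return float(np.sum(1.0 - (1.0 - p) ** n))

def empirical_distinct(p, n, trials=2000, rng=None):
    """Monte Carlo estimate of the same quantity."""
    rng = np.random.default_rng(rng)
    support = np.arange(len(p))
    counts = [len(np.unique(rng.choice(support, size=n, p=p)))
              for _ in range(trials)]
    return float(np.mean(counts))

# Placeholder distribution: uniform over 50 elements.
p = np.ones(50) / 50.0
n = 100
print(expected_distinct(p, n), empirical_distinct(p, n, rng=1))
```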
7.2 Lower bound
Lemma 7.2.
For any and , if is -dominated, then we have that
Proof.
We follow closely the proof of Lemma 6.4. To apply Theorem 5.7, we set the index set as , the distribution to be the uniform distribution over , , and .
Clearly, -dominates . Let be a function such that and . It follows that
Recall that
Therefore, we can set . Clearly, is a concave function. By Theorem 5.7, it follows that
We now show that the CountDistinct function is hard for -local algorithms.
Theorem 1.1. (restated) For , if is a public-coin - protocol, then it cannot compute with error and probability at least .
Proof.
Let . We identify the input space with in the natural way. Suppose there is a public-coin - protocol solving with error and probability at least .
Let with public randomness from be the randomizer used in . For a dataset , we use (respectively, ) to denote the output of on the dataset (with public randomness fixed to ).
Setting , we let and .
By our assumption on , Proposition 7.1 and a union bound, it follows that for every , we have
Similarly, we have
Note that by our choice of , we have . By a union bound, it follows that with probability at least over , we have
for at least a 0.5 fraction of . (7)
By the definition of public-coin protocols, we have that with probability at least over , is -dominated. By a union bound, there exists a such that is -dominated and it satisfies the condition in (7). We fix such a .
By Lemma 7.2, it follows that
Recall that , the above further simplifies to
Let be the set of satisfying the conditions on stated in (7). Since contains at least a fraction of , it follows that
This means that there exists a pair such that . We fix such a pair .
By Pinsker's inequality, it then follows that
Since , it follows that
(8)
On the other hand, we also have
(9)
The inequalities (8) and (9) contradict each other, which completes the proof. ∎
8 Low-Message Protocols for CountDistinct
In this section, we present our low-message protocols for CountDistinct, thereby proving Theorem 1.6.
In Section 8.1, we review the previous protocol of [BCJM20], and discuss some intuitions underlying our improvement. In Section 8.2, we introduce some necessary definitions and technical tools. Next, in Section 8.3 we present our private-coin protocol (stated in Theorem 8.4) for CountDistinct with error , which uses message per user in expectation when the input universe size is below . We will also show that a simple modification of this protocol is -, thereby proving Theorem 1.3. Finally, based on the private-coin protocol, in Section 8.4 we prove Theorem 1.6 by presenting our public-coin protocol for CountDistinct, which uses less than message per user in expectation without any restriction on the universe size.
8.1 Intuition
We now turn to sketch the main ideas behind Theorem 8.4 and Theorem 1.6. It would be instructive to review the -message protocol solving with error from [BCJM20].
The Model. To gain more insight into their protocol, we consider the following mod 2 shuffle model (), where two messages of the same content “cancel each other”, i.e., the transcript is now a random permutation of the messages that appear an odd number of times.
The DP requirement now applies to this new version of the transcript. The same holds for the analyzer, who now can only see the new version of the transcript. [BCJM20] first gave a protocol for CountDistinct, and then adapted that protocol to the standard model using the Ishai et al. protocol for secure aggregation [IKOS06]. (They did not explicitly specify their protocol in this model, but it is implicit in their proof of security.)
Low-Message Protocol in . The protocol of [BCJM20] (referred to as in what follows) first sets a parameter so that . Next, for each user holding an element , the user first sends with probability . Then for each , the user sends message with probability . All these events are independent.
Finally, if there are messages in the transcript (i.e., there are messages occurring an odd number of times in the original transcript), then the analyzer outputs as the estimate. Note that a user sends message in expectation.
Analysis of the Protocol . It is shown in [BCJM20] that the above protocol is -DP and solves with error . Here we briefly outline the intuition behind it.
Let be the set consisting of all inputs of the users. We can see that every belongs to the transcript with probability exactly ; on the other hand, every belongs to the transcript with probability exactly . Moreover, all these events are independent. Therefore, a simple calculation shows that , and the accuracy follows from a Chernoff bound. As for the DP guarantee, changing the input of one user only affects the distributions of two messages in the transcript, and it only changes each message’s occurrence probability in the transcript from to or vice versa.
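The exact sending probabilities of the protocol were lost in extraction, so the following sketch uses placeholder probabilities p_in (the odd-parity probability for an element held by some user) and p_out (for an element held by no user); both values are assumptions. It only illustrates the debiasing step: if each held element survives with probability p_in and each other element survives with probability p_out, independently, then (C - p_out*D)/(p_in - p_out) is an unbiased estimate of the number of distinct elements, where C is the number of surviving messages.

```python
import numpy as np

def simulate_mod2_transcript(held, D, p_in, p_out, rng=None):
    """Parity view of the mod-2 shuffle transcript: element j survives
    (appears an odd number of times) with probability p_in if some user
    holds j, and with probability p_out otherwise."""
    rng = np.random.default_rng(rng)
    probs = np.where(np.isin(np.arange(D), list(held)), p_in, p_out)
    return rng.random(D) < probs  # boolean parity vector

def estimate_distinct(parities, D, p_in, p_out):
    """Debias the observed number of surviving elements:
    E[C] = p_in*|S| + p_out*(D - |S|), so |S| = (C - p_out*D)/(p_in - p_out)."""
    C = int(parities.sum())
    return (C - p_out * D) / (p_in - p_out)

held = set(range(40))  # 40 distinct elements
parities = simulate_mod2_transcript(held, D=200, p_in=0.75, p_out=0.5, rng=0)
print(estimate_distinct(parities, 200, 0.75, 0.5))
```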
From to . To obtain an actual protocol from , the protocol from [BCJM20] (which we henceforth denote by ) runs copies of the protocol for securely computing sum over [IKOS06], such that the -th protocol aims to simulate the number of occurrences of message modulo . For each user , if it were to send a message in , it sends one in ; otherwise it sends zero in .
Since the [IKOS06] protocol for computing sum over requires messages from each user [GMPV20, BBGN20], each user needs to send messages in total. Moreover, from the security condition of , for each message the transcript only reveals the parity of its number of occurrences, which is exactly what we need in order to simulate protocols.
Our Improvement. Note that requires significantly more messages per user than that of . Our goal here is to compile to in a much more efficient way, ideally with no overhead. In each user sends only message. This means that when translating to , users end up sending many zero messages in the subprotocols, which is wasteful.
Our crucial idea for improving on the aforementioned protocol is a very simple yet effective alternative to the secure aggregation protocol over of [IKOS06] used in . In our new subprotocol , if a user were to send a message in , it sends one to ; otherwise it draws from a noise distribution (such that ) and sends many ones to . Clearly, our new still maintains the parity of occurrences of each message, and the expected number of messages is roughly . To show that the resulting protocol is , we build on the techniques of [GKMP20], which show that the noise added can hide the contribution of a single user.
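A minimal sketch of this parity-preserving noise idea follows. The noise distribution here is twice a negative binomial, chosen purely for illustration (the paper's exact choice and parameters are not reproduced); what matters is that a "real" send contributes one copy of the message while noise contributes an even number of copies, so the parity of the total count is preserved while individual contributions are hidden.

```python
import numpy as np

def noisy_copies(would_send, r, p, rng):
    """Number of copies of a given message sent by one user.
    A 'real' send contributes exactly one copy; otherwise the user sends
    an even number of copies, here 2 * NegativeBinomial(r, p), which is an
    illustrative noise distribution, not the paper's exact one."""
    if would_send:
        return 1
    return 2 * rng.negative_binomial(r, p)

rng = np.random.default_rng(0)
n_users, real_senders = 1000, 37
counts = sum(noisy_copies(i < real_senders, r=0.2, p=0.5, rng=rng)
             for i in range(n_users))
# The parity of the total count equals the parity of the real sends.
print(counts % 2 == real_senders % 2)  # True
```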
8.2 Preliminaries
We first recall the definition of the negative binomial distribution.
Definition 8.1.
For and , the negative binomial distribution is defined by for each non-negative integer . (For a real number , .)
We recall the following key properties of the negative binomial distribution: (1) For and , has the same distribution as ; (2) .
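Both properties can be checked numerically. The sketch below uses NumPy's parameterization, in which NB(r, p) counts the failures before the r-th success and has mean r(1-p)/p; whether this matches the paper's convention is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
r, p, k, N = 3.0, 0.4, 6, 200_000

direct = rng.negative_binomial(r, p, size=N)
split = rng.negative_binomial(r / k, p, size=(k, N)).sum(axis=0)

# Property (1): NB(r, p) has the same distribution as the sum of k
# independent NB(r/k, p) samples, so the empirical means agree.
# Property (2): the mean is r * (1 - p) / p under NumPy's convention.
print(direct.mean(), split.mean(), r * (1 - p) / p)
```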
We will need the following lemma from [GKMP20].
Lemma 8.2.
For any , and , let and . For any , .
Corollary 8.3.
For any , and , let and be as in Lemma 8.2. For any two distributions and on , .
8.3 A Private-Coin Base Protocol
Recall that denotes the restriction of CountDistinct such that every user gets an input from , and the goal is to compute the number of distinct elements among all users.
We are now ready to prove Theorem 8.4, which is the private-coin case of Theorem 1.6. To simplify the privacy analysis of the protocol and ease its application in Section 8.4, we also allow the input to be , which means that the user’s input is not counted.
Theorem 8.4.
For any and , there is a private-coin - protocol computing with error with probability at least . Moreover, the expected number of messages sent by each user is .
Proof.
Without loss of generality, we can assume that . The algorithm requires several global constants that only depend on the values of , and . Algorithm 1 specifies these constants. Here, is a sufficiently large constant to be specified later.
Next, we specify the randomizer and the analyzer of the protocol in Algorithm 2 and Algorithm 3 respectively.
Accuracy Analysis.
We first analyze the error of our protocol. Let be the set . Recall that the goal is to estimate .
For each , we analyze the distribution of the random variable in Algorithm 3. We observe that: (1) if , then is distributed uniformly at random over ; (2) if , then is distributed as by Lemma 8.5; (3) are independent.
Lemma 8.5 ([BCJM20, Lemma 3.5]).
Let be specified as in Global-Constants. Then, is distributed identically to .
Hence, we have that . Plugging in the equation defining the output , we have . An application of Hoeffding’s inequality implies that
for a sufficiently large constant . Hence, with probability at least , the error of the protocol is less than .
Privacy Analysis.
We now prove that our protocol is indeed -DP. Note that the multi-set of messages can be described by integers (corresponding to the histogram of the messages).
Consider two neighboring datasets and (without loss of generality, we assume that they differ at the first user). Let and be the corresponding distributions of given input datasets and . The goal is to show that they satisfy the -DP constraint. That is, we have to establish that .
To simplify the analysis, we introduce another dataset , and let be the corresponding distribution of given input dataset . By the composition rule of -DP, it suffices to show that the pairs and satisfy -DP (note that , and ). By symmetry, it suffices to consider the pair and prove that .
Let , and be the number of times that appears in . First note that all coordinates in both and are independent, and furthermore the marginal distribution of and on coordinates in are identical. Hence, by Item (1) of Proposition 3.4, it suffices to establish that and satisfy -DP.
The distribution of is , and the distribution of is . (Recall that for two random variables and , we use to denote the random variable distributed as the sum of two independent samples from and .) Since , it suffices to consider the case where by Item (1) of Proposition 3.4.
We need the following lemma, whose proof is deferred until we finish the proof of Theorem 8.4.
Lemma 8.6.
Let be specified as in Set-Global-Constants, , and . Then,
By Lemma 8.6 and previous discussions, it follows that , which shows that our protocol is -DP as desired.
In the following, we will need the proposition below which gives us an estimate on .
Proposition 8.7.
Let be specified as in Set-Global-Constants. Then, .
Proof.
Since , we have . Hence, . Plugging in the definition of , it follows that . Finally, it follows that
Efficiency Analysis.
We now analyze the message complexity of our protocol. Note that
By a straightforward calculation, each user sends
messages in expectation. ∎
Finally, we prove Lemma 8.6.
Proof of Lemma 8.6.
We consider bounding first. Note that since by Proposition 8.7, we set the constant so that
Recall that , and note that our choices of and satisfy Lemma 8.2 with privacy parameters and .
Now, let , , and .
To apply Item (2) of Proposition 3.4, we are going to decompose and into a weighted sum of three sub-distributions.
Decomposition of .
We define three events on as follows:
and
We let , , .
From our choice of , we have . Let . By Lemma 8.5, it follows that and .
Therefore, let , and . We can now decompose as a mixture of components , and with corresponding mixing weights , and .
Decomposition of .
Now, we define three events on as follows
and
Similarly, we let , , .
By our choice of , we have . Since , it follows that and .
Let , and . We therefore decompose as a mixture of components , and with mixing weights , and .
Bounding .
By Item (2) of Proposition 3.4, we have that
Now, note that , and . It follows that (since ), and .
Extension to Robust Shuffle Privacy.
Now we briefly discuss how to generalize the analysis of the above protocol in order to show that it also satisfies the stronger robust shuffle privacy condition. We first need the following formal definition of robust shuffle privacy.
Definition 8.8 ([BCJM20]).
A protocol is -robustly shuffle differentially private if, for all and , the algorithm is -DP. In other words, guarantees -shuffle privacy whenever at least a fraction of users follow the protocol.
Note that while the above definition requires the privacy condition to be satisfied whenever at least a fraction of users participate, the accuracy condition is only required when all users participate. That is, if some users drop out of the protocol, then the analyzer does not need to output an accurate estimate of CountDistinct.
Theorem 8.9.
For two constants , and , there is an -robustly shuffle differentially private protocol solving with error and with probability at least . Moreover, the expected number of messages sent by each user is . (To keep the notation clean, we choose not to analyze the exact dependence on and .)
Proof Sketch.
To make the algorithm in Theorem 8.4 robustly shuffle private, we need the following modifications:
The first two modifications guarantee that there is enough noise even when only users participate, so that the privacy analysis of Theorem 8.4 goes through. We now show that the last modification allows us to obtain an accurate estimate of when all users participate. In the following, we will use the same notation as in the proof of Theorem 8.4. Note that we have
Hence by Lemma 8.5, is distributed as when . A similar calculation then shows that the error can be bounded by . ∎
- Protocol for CountDistinct.
Finally, we show that the protocol from Theorem 8.4 is also - with a simple modification, which proves Theorem 1.3 (restated below).
Theorem 1.3. (restated) There is a - protocol computing with error .
Proof Sketch.
Let . We consider the following modification of Algorithm 2.
That is, in Algorithm 4 we remove the noise messages sampled from the distribution . Also, we do not send the same message more than once (the loop over skips the element ).
When viewing it as a local protocol, we can assume that each user first collects all the messages it would send in Algorithm 4, and then simply outputs the histogram (so our new local randomizer only sends a single message). The analyzer in the local protocol can then aggregate these histograms, and apply the analyzer in Algorithm 3. By the same accuracy proof as in Theorem 8.4, it follows that the protocol achieves error with probability at least . So it only remains to prove that the protocol is -.
We let be the randomizer in Algorithm 4, and we use to denote the distribution of the histogram of the messages output by , which is exactly the output distribution of our new local randomizer.
Without loss of generality, it suffices to show that for all possible histograms (note that Algorithm 4 does not send a message more than once), it holds that
Note that only depends on the first two bits of . By enumerating all possible combinations of two bits, we can bound it by
The last inequality follows from the fact that . ∎
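The concrete probabilities used by Algorithm 4 were not recovered here, so the sketch below only mirrors the structure described above: the local randomizer decides, for each possible message, whether to include it (never more than once) and outputs the binary histogram of the included messages as its single message. The parameters p_own and p_other are placeholders, not the paper's values.

```python
import numpy as np

def local_randomizer(x, D, p_own, p_other, rng=None):
    """Single-message local randomizer in the spirit of Algorithm 4:
    include the user's own element with probability p_own and every other
    element with probability p_other, never repeating a message, and output
    the binary histogram of the included messages."""
    rng = np.random.default_rng(rng)
    hist = (rng.random(D) < p_other).astype(int)
    hist[x] = int(rng.random() < p_own)
    return hist

# The analyzer in the local protocol sums these histograms and then applies
# the same debiasing as the shuffle-model analyzer (Algorithm 3).
hists = [local_randomizer(x, D=100, p_own=0.75, p_other=0.5, rng=i)
         for i, x in enumerate([3, 7, 7, 42])]
aggregate = np.sum(hists, axis=0)
```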
8.4 Public-Coin Protocol
We are now ready to prove Theorem 1.6 (restated below).
Theorem 1.6. (restated) For all and , there is a public-coin - protocol computing with error and probability at least . Moreover, the expected number of messages sent by each user is at most .
Proof.
Let be the constant in Theorem 8.4 such that the expected number of messages is bounded by .
The Protocol.
We set so that the foregoing expected number of messages is bounded by .
Note that we can assume as otherwise we are only required to solve with the trivial error bound . Therefore, we have and .
We are going to apply a reduction to the private-coin protocol for in Theorem 8.4. The full protocol is as follows:
-
•
Using the public randomness, the users jointly sample a uniformly random mapping and a uniformly random permutation on .
-
•
For each user holding an input , it computes , and sets its new input to be if , and otherwise. Then it runs the private-coin protocol in Theorem 8.4.
-
•
Let . The analyzer first runs the analyzer in Theorem 8.4 to obtain an estimate . Then it computes , and outputs
Analysis of the Protocol.
The privacy of the protocol above follows directly from the privacy property of the protocol from Theorem 8.4. Moreover, the bound on the expected number of messages per user simply follows from our choice of the parameter .
It thus suffices to establish the accuracy guarantee of the protocol. Let be the set of all inputs, and the goal is to estimate . We also let and .
By the accuracy guarantee of Theorem 8.4, it follows that with probability at least , we have
In the following, we will condition on this event.
Proving that is a good estimate of .
Next, we show that is close to . We will rely on the following lemma.
Lemma 8.10.
For a uniformly random permutation and a fixed set , for every such that is an integer, let , we have
Proof.
For each , let be the indicator that . Note that these ’s are not independent, but they are negatively correlated [DR98, Proposition 7 and 11], hence a Chernoff bound still applies.
Note that . By a Chernoff bound, it thus follows that
and hence
Scaling both sides of the above inequality by concludes the proof. ∎
We now set and recall that . By Lemma 8.10, with probability at least , it holds that
Proving that is a good estimate of .
Finally, we show that our output is a good estimate of . To do so, we need the following lemma.
Lemma 8.11.
Let . For a uniformly random mapping and a fixed set such that , we have that
Proof.
For each , let be the indicator of whether . As before, these ’s are not independent but are negatively correlated [DR98, Proposition 7 and 11], hence a Chernoff bound still applies.
Note that . By a Chernoff bound, it thus follows that
Noting that and completes the proof. ∎
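The universe-reduction step and its correction can be illustrated as follows: hash the inputs through a uniformly random map into a universe of size D, count the number Z of distinct buckets, and invert the expectation E[Z] = D(1 - (1 - 1/D)^k) from Lemma 8.11 to estimate the number k of distinct inputs. The inversion formula below is the natural one implied by the lemma and is stated here as an assumption about the analyzer's final step; the sketch ignores the privacy noise and only illustrates the correction.

```python
import numpy as np

def distinct_after_hashing(inputs, D, rng=None):
    """Map each input through a uniformly random function into [D]
    and count the number of distinct buckets that are hit."""
    rng = np.random.default_rng(rng)
    h = rng.integers(0, D, size=max(inputs) + 1)  # random map on the universe
    return len(set(h[x] for x in set(inputs)))

def invert_collisions(Z, D):
    """Invert E[Z] = D * (1 - (1 - 1/D)**k) to estimate k from Z."""
    Z = min(Z, D - 1)  # guard the logarithm
    return np.log(1.0 - Z / D) / np.log(1.0 - 1.0 / D)

inputs = list(range(300)) * 3  # 300 distinct inputs, each repeated
Z = distinct_after_hashing(inputs, D=1000, rng=2)
print(Z, invert_collisions(Z, 1000))
```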
By Lemma 8.11, it follows that with probability at least , we have .
Putting everything together, with probability at least , we get that
The final step is to show that accurately estimates . Recall that , which in particular means that
By a triangle inequality, it follows that
We need the following lemma to finish the analysis.
Lemma 8.12.
There is a constant such that for all , it holds that
Proof.
Suppose without loss of generality. Let . We have that
Finally, by Lemma 8.12, with probability at least , we obtain that , which concludes the proof. ∎
9 Lower Bounds in Two-Party Differential Privacy
In this section, we depart from the local and shuffle models, and instead consider the two-party model of differential privacy [MMP+10], which can be defined as follows.
Definition 9.1 (DP in the Two-Party Model [MMP+10]).
There are two parties and ; holds and holds . Let be any randomized protocol between and . Let denote the tuple (, the private randomness of , the transcript of the protocol). Similarly, let denote the tuple (, the private randomness of , the transcript of the protocol).
We say that is - if, for any , the algorithms
are both -DP.
We say that a two-party protocol computes a function with error if, at the end of the protocol, at least one of the parties can output a number that lies in with probability at least .
We quickly note that, unlike in the local and shuffle models, we need not consider the public-coin and private-coin cases separately: as noted in [MMP+10], the two parties may share fresh private random bits without violating privacy, meaning that public randomness is unnecessary.
The goal of this section is to prove Theorem 1.11. To do this, we first state the necessary lower bound from [MMP+10] in Section 9.1. We then give our reduction and prove Theorem 1.11 in Section 9.2. Finally, in Section 9.3, we extend the lower bound to the case where the function is symmetric.
9.1 Inner Product Lower Bound from [MMP+10]
McGregor et al. [MMP+10] show that the inner product function is hard in the two-party model. Roughly speaking, they show that, if we let be uniformly random strings, then, for any not-too-large , no - protocol can distinguish between and a uniformly random number from . McGregor et al. use this result when , but we will use their result for .
To avoid confusion in the next subsection, we will use in place of in this subsection. The following theorem is implicit in the proof of Theorem A.5 of [MMP+11] (it follows by replacing with there). Recall that is the uniform distribution over .
Theorem 9.2 ([MMP+11]).
Let be any - protocol. Suppose and let be a uniformly random bit. Then, we have
It will be more convenient to state the above lower bound in terms of hardness of distinguishing two distributions, as we have done in the rest of this paper. To do so, we will need a few pieces of notation. First, we use to denote the distribution conditioned on the inner product of the two strings being ; similarly, we use to denote the distribution conditioned on the inner product of the two strings being . Furthermore, for any distribution on , we write (respectively, ) to denote the distribution of (respectively, ) when are drawn according to .
We may now state the following corollary, which is an immediate consequence of Theorem 9.2.
Corollary 9.3.
Let be any - protocol. Then, we have that
9.2 From Parity to the Gap
We will now construct the hard distributions that eventually give the gap of between the sensitivity and the error achievable in the two-party model. The hard distributions are simply concatenations of or . Specifically, for , we write (respectively, ) to denote the distribution of where is an i.i.d. sample from (respectively, ) for all . As before, it is hard to distinguish the two distributions:
Lemma 9.4.
Let be any - protocol. Then, we have
Proof.
We prove this via a simple hybrid argument. For , let us denote by the distribution of where, for all , is drawn independently from if and from otherwise. Notice that and .
Our main claim is the following. For every and any - protocol ,
(10)
Note that summing (10) over all immediately yields Lemma 9.4.
We will now prove (10). Given an - protocol (where each party’s input has bits), we construct a protocol (where each party’s input has bits) as follows:
-
•
Suppose that the input of is , and the input of is .
-
•
For , samples from and sends to .
-
•
For , samples from and sends to .
-
•
sets .
-
•
sets .
-
•
and then run the protocol on .
It is clear that is - and that
We can now prove our main theorem of this section.
Proof of Theorem 1.11.
Let be a sufficiently large constant to be chosen later. Let and . We may define on only bits, as it can be trivially extended to bits by ignoring the last bits of . Let
where the outer summation is over .
It is immediate that the sensitivity of is one. We will now argue that any - protocol with incurs error at least . Since the function is symmetric with respect to the two parties, it suffices without loss of generality to show that the output of incurs error with probability 0.1. To do so, we start by observing that we have for any whereas for any . From Lemma 9.4, we have that
As a result, if we sample from with probability 1/2 and with probability 1/2, then the probability that ’s output incurs error at least is at least
When is sufficiently large, we also have that . As a result, with probability (which is at least for any sufficiently large ), the protocol must incur an error of at least . ∎
9.3 Symmetrization
Notice that the function in Theorem 1.11 is asymmetric. It is a natural question to ask whether we can get a similar lower bound for a symmetric function. In this subsection, we give a simple reduction that positively answers this question, ultimately yielding the following:
Theorem 9.5.
For any and any sufficiently large , there is a symmetric function whose sensitivity is one and such that any - protocol cannot compute to within an error of .
We remark that the input to each user comes from a set of size , instead of as in Theorem 1.11. This larger value of turns out to be necessary for symmetric functions: when is symmetric, we may use the Laplace mechanism from both sides to estimate the histogram of the input, which we can then use to compute . If the sensitivity of is , this algorithm incurs an error of . Hence, to achieve a lower bound of , we need to be at least .
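A minimal sketch of the histogram-based upper bound mentioned in this remark, under the assumption that the symmetric function is evaluated on the combined histogram: each party releases its own bucket counts with i.i.d. Laplace noise (changing one input changes two bucket counts by one each, so the histogram has L1 sensitivity 2 and scale 2/eps suffices for eps-DP), and either party evaluates f on the sum of the noisy histograms. If f has sensitivity Delta, the error is on the order of Delta times the total noise, i.e., roughly Delta times B over eps for B buckets.

```python
import numpy as np

def noisy_histogram(inputs, B, eps, rng=None):
    """One party's eps-DP histogram release: true bucket counts plus
    i.i.d. Laplace(2/eps) noise (L1 sensitivity of the histogram is 2)."""
    rng = np.random.default_rng(rng)
    hist = np.bincount(inputs, minlength=B).astype(float)
    return hist + rng.laplace(scale=2.0 / eps, size=B)

# Two parties release noisy histograms of their own inputs; either party can
# then evaluate any symmetric f on the combined (noisy) histogram, incurring
# error roughly Delta * B / eps when f has sensitivity Delta.
rng = np.random.default_rng(3)
xa = rng.integers(0, 10, size=500)
xb = rng.integers(0, 10, size=500)
combined = noisy_histogram(xa, B=10, eps=1.0, rng=4) + \
           noisy_histogram(xb, B=10, eps=1.0, rng=5)
```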
The properties of our reduction are summarized in the following lemma, which combined with Theorem 1.11, immediately implies Theorem 9.5.
Lemma 9.6.
For any function , there is another function such that the following holds:
-
•
The sensitivity of is no more than that of .
-
•
If there exists an - protocol that solves with error , then there exists an - protocol that solves with error .
The idea behind the proof of Lemma 9.6 is simple. Roughly speaking, we view each input of as “setting the th position to ” for the input to . This is formalized below.
Proof of Lemma 9.6.
We start by defining . Let be an arbitrary element of . For every , we define where
We now define by
We will next verify that the two properties hold.
-
•
Notice that changing any user’s input in results in at most two changes in the user’s input of . As a result, the sensitivity of is no more than that of .
-
•
Suppose that there exists an - protocol that solves with error . Let be the protocol for in which the two parties transform their inputs , to , respectively, then run , and finally return the output of multiplied by two. It is obvious that is -; furthermore, since the protocol incurs error , the protocol incurs error as desired. ∎
Acknowledgments
We would like to thank Noah Golowich for numerous enlightening discussions about lower bounds in the multi-message model, and for helpful feedback.
References
- [Abo18] John M Abowd. The US Census Bureau adopts differential privacy. In KDD, pages 2867–2867, 2018.
- [App17] Apple Differential Privacy Team. Learning with privacy at scale. Apple Machine Learning Journal, 2017.
- [BBGN19] Borja Balle, James Bell, Adrià Gascón, and Kobbi Nissim. The privacy blanket of the shuffle model. In CRYPTO, pages 638–667, 2019.
- [BBGN20] Borja Balle, James Bell, Adrià Gascón, and Kobbi Nissim. Private summation in the multi-message shuffle model. arXiv: 2002.00817, 2020.
- [BC20] Victor Balcer and Albert Cheu. Separating local & shuffled differential privacy via histograms. In ITC, pages 1:1–1:14, 2020.
- [BCJM20] Victor Balcer, Albert Cheu, Matthew Joseph, and Jieming Mao. Connecting robust shuffle privacy and pan-privacy. CoRR, abs/2004.09481, 2020.
- [BCK+14] Joshua Brody, Amit Chakrabarti, Ranganath Kondapally, David P Woodruff, and Grigory Yaroslavtsev. Beyond set disjointness: the communication complexity of finding the intersection. In PODC, pages 106–113, 2014.
- [BEM+17] Andrea Bittau, Úlfar Erlingsson, Petros Maniatis, Ilya Mironov, Ananth Raghunathan, David Lie, Mitch Rudominer, Ushasree Kode, Julien Tinnes, and Bernhard Seefeld. Prochlo: Strong privacy for analytics in the crowd. In SOSP, pages 441–459, 2017.
- [BFJ+94] Avrim Blum, Merrick Furst, Jeffrey Jackson, Michael Kearns, Yishay Mansour, and Steven Rudich. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In STOC, pages 253–262, 1994.
- [CDSKY20] Seung Geol Choi, Dana Dachman-Soled, Mukul Kulkarni, and Arkady Yerukhimovich. Differentially-private multi-party sketching for large-scale statistics. PoPETs, 3:153–174, 2020.
- [CSS12] TH Hubert Chan, Elaine Shi, and Dawn Song. Optimal lower bound for differentially private multi-party aggregation. In ESA, pages 277–288, 2012.
- [CSU+18] Albert Cheu, Adam D. Smith, Jonathan Ullman, David Zeber, and Maxim Zhilyaev. Distributed differential privacy via mixnets. CoRR, abs/1808.01394, 2018.
- [CU20] Albert Cheu and Jonathan Ullman. The limits of pan privacy and shuffle privacy for learning and estimation. CoRR, abs/2009.08000, 2020.
- [DJW13] John C Duchi, Michael I Jordan, and Martin J Wainwright. Local privacy and statistical minimax rates. In FOCS, pages 429–438, 2013.
- [DKM+06] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, pages 486–503, 2006.
- [DKY17] Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin. Collecting telemetry data privately. In NIPS, pages 3571–3580, 2017.
- [DLB19] Damien Desfontaines, Andreas Lochbihler, and David Basin. Cardinality estimators do not preserve privacy. PoPETs, 2019(2):26–46, 2019.
- [DMNS06] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, pages 265–284, 2006.
- [DR98] Devdatt P. Dubhashi and Desh Ranjan. Balls and bins: A study in negative dependence. Random Struct. Algorithms, 13(2):99–124, 1998.
- [DR14] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., 9(3-4):211–407, 2014.
- [DRV10] Cynthia Dwork, Guy N. Rothblum, and Salil P. Vadhan. Boosting and differential privacy. In FOCS, pages 51–60, 2010.
- [EFM+19] Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Abhradeep Thakurta. Amplification by shuffling: From local to central differential privacy via anonymity. In SODA, pages 2468–2479, 2019.
- [ENU20] Alexander Edmonds, Aleksandar Nikolov, and Jonathan Ullman. The power of factorization mechanisms in local and central differential privacy. In STOC, pages 425–438, 2020.
- [EPK14] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In CCS, pages 1054–1067, 2014.
- [GGK+19] Badih Ghazi, Noah Golowich, Ravi Kumar, Rasmus Pagh, and Ameya Velingker. On the power of multiple anonymous messages. IACR Cryptol. ePrint Arch., 2019:1382, 2019.
- [GGK+20] Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Manurangsi, Rasmus Pagh, and Ameya Velingker. Pure differentially private summation from anonymous messages. In ITC, pages 15:1–15:23, 2020.
- [GKMP20] Badih Ghazi, Ravi Kumar, Pasin Manurangsi, and Rasmus Pagh. Private counting from anonymous messages: Near-optimal accuracy with vanishing communication overhead. In ICML, 2020.
- [GMPV20] Badih Ghazi, Pasin Manurangsi, Rasmus Pagh, and Ameya Velingker. Private aggregation from fewer anonymous messages. In EUROCRYPT, pages 798–827, 2020.
- [Gre16] Andy Greenberg. Apple’s “differential privacy” is about collecting your data – but not your data. Wired, June, 13, 2016.
- [GS02] Alison L Gibbs and Francis Edward Su. On choosing and bounding probability metrics. International statistical review, 70(3):419–435, 2002.
- [IKOS06] Yuval Ishai, Eyal Kushilevitz, Rafail Ostrovsky, and Amit Sahai. Cryptography from anonymity. In FOCS, pages 239–248, 2006.
- [JHW18] Jiantao Jiao, Yanjun Han, and Tsachy Weissman. Minimax estimation of the L1 distance. IEEE Trans. Inf. Theory, 64(10):6672–6706, 2018.
- [Kea98] Michael Kearns. Efficient noise-tolerant learning from statistical queries. Journal of the ACM (JACM), 45(6):983–1006, 1998.
- [KLN+11] Shiva Prasad Kasiviswanathan, Homin K Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What can we learn privately? SIAM Journal on Computing, 40(3):793–826, 2011.
- [KNW10] Daniel M Kane, Jelani Nelson, and David P Woodruff. An optimal algorithm for the distinct elements problem. In PODS, pages 41–52, 2010.
- [MMNW11] Darakhshan Mir, Shan Muthukrishnan, Aleksandar Nikolov, and Rebecca N Wright. Pan-private algorithms via statistics on sketches. In PODS, pages 37–48, 2011.
- [MMP+10] Andrew McGregor, Ilya Mironov, Toniann Pitassi, Omer Reingold, Kunal Talwar, and Salil Vadhan. The limits of two-party differential privacy. In FOCS, pages 81–90, 2010.
- [MMP+11] Andrew McGregor, Ilya Mironov, Toniann Pitassi, Omer Reingold, Kunal Talwar, and Salil P. Vadhan. The limits of two-party differential privacy. Electron. Colloquium Comput. Complex., 18:106, 2011.
- [MT07] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In FOCS, pages 94–103, 2007.
- [O’D14] Ryan O’Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014.
- [PS20] Rasmus Pagh and Nina Mesing Stausholm. Efficient differentially private linear sketching. arXiv preprint arXiv:2001.11932, 2020.
- [PT11] Giovanni Peccati and Murad S Taqqu. Some facts about charlier polynomials. In Wiener Chaos: Moments, Cumulants and Diagrams, pages 171–175. Springer, 2011.
- [Rob90] Robert B. Ash. Information Theory. Dover, 1990.
- [Sha14] Stephen Shankland. How Google tricks itself to protect Chrome user privacy. CNET, October, 2014.
- [SU17] Thomas Steinke and Jonathan Ullman. Tight lower bounds for differentially private selection. In FOCS, pages 552–563, 2017.
- [Tim14] Aleksandr Filippovich Timan. Theory of approximation of functions of a real variable. Elsevier, 2014.
- [Ull18] Jonathan Ullman. Tight lower bounds for locally differentially private selection. In arXiv:1802.02638, 2018.
- [Vad17] Salil Vadhan. The complexity of differential privacy. In Tutorials on the Foundations of Cryptography, pages 347–450. Springer, 2017.
- [VV17] Gregory Valiant and Paul Valiant. Estimating the unseen: Improved estimators for entropy and other properties. J. ACM, 64(6):37:1–37:41, 2017.
- [War65] Stanley L Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63–69, 1965.
- [WY16] Yihong Wu and Pengkun Yang. Minimax rates of entropy estimation on large alphabets via best polynomial approximation. IEEE Trans. Inf. Theory, 62(6):3702–3720, 2016.
- [WY19] Yihong Wu and Pengkun Yang. Chebyshev polynomials, moment matching, and optimal estimation of the unseen. The Annals of Statistics, 47(2):857–883, 2019.
- [Yan19] Yanjun Han. Lecture 7: Mixture vs. mixture and moment matching. https://theinformaticists.com/2019/08/28/lecture-7-mixture-vs-mixture-and-moment-matching/, 2019.
Appendix A Total Variation Bound between Mixtures of Multi-dimensional Poisson Distributions
In this section we prove Lemma 4.3 (restated below).
Lemma 4.3. (restated) Let be two random variables supported on such that for all , where . Let and such that . Let be the distribution over corresponding to . Suppose that
Then,
To prove Lemma 4.3, we begin with some notation. Let . For vectors and , we let
We are going to apply the moment-matching technique [WY16, JHW18, WY19] for bounding the total variation distance between mixtures of (single-dimensional) Poisson distributions. The following lemma is a direct generalization of Theorem 4 of [Yan19] to mixtures of multi-dimensional Poisson distributions. We will use the convention that .
Lemma A.1.
For and two distributions and supported on , we have that
In order to prove the above lemma, we need to use the Charlier polynomial . The explicit definition of is not important here; we simply list two important properties of this polynomial family [PT11]:
Proposition A.2.
For and , the following hold:
-
1.
We have that
-
2.
For all ,
Proof of Lemma A.1.
Applying the Cauchy-Schwarz inequality, we get that
where the penultimate equality follows from Item (1) of Proposition A.2. ∎
Applying Lemma A.1, the next lemma follows from a straightforward calculation.
Lemma A.3.
Let be two random variables supported on such that for all , where . For , let (division here is coordinate-wise, and denotes taking coordinate-wise square of ). The following holds:
Proof.
Let . We also set , and . Note that for every , we have , and the same holds for each as well. Hence, we can apply Lemma A.1 to bound as follows:
where the first inequality follows from Lemma A.1.
Now, recall that . We claim that
To prove this equality, consider the random process of drawing samples from using the distribution corresponding to (that is, we get with probability ; this is a well-defined distribution since ). Let be the random variable corresponding to the histogram of the samples (that is, denotes the number of occurrences of the element ). For , we have that
Hence, we get
and
Plugging in, we obtain
Appendix B Lower Bounds on Hockey Stick Divergence
In this section, we prove Lemma 4.8 (restated below).
Lemma 4.8. (restated) There exists an absolute constant such that, for every integer , three reals such that , letting and supposing , it holds that
Before proving Lemma 4.8, we need several technical lemmas. First, we show that the hockey stick divergence between and can be characterized by the hockey stick divergence between and .
Lemma B.1.
Let be three reals such that , and be a random variable over . The following holds:
where .
Proof.
We have that
Next, we need a lemma giving a lower bound on for a random variable .
Lemma B.2.
Let be a random variable over and . The following holds:
Proof.
We have that
Applying Lemma B.2, we obtain the following lower bound on .
Lemma B.3.
For and , such that ,
Proof.
Now, by anti-concentration of the binomial distribution [Rob90, Page 115], we have
Letting , we have
Putting everything together, we get
We are now ready to prove Lemma 4.8.
Appendix C Simulation of Shuffle Protocols by SQ Algorithms
In this section, we show the connection between dominated protocols and SQ algorithms, which implies that protocols can be simulated by SQ algorithms. This is analogous to the result of Kasiviswanathan et al. [KLN+11] who proved such a connection between protocols and SQ algorithms. In the following, we use the notation of [KLN+11].
C.1 SQ Model
We first introduce the statistical query (SQ) model. In the SQ model, algorithms access a distribution through its statistical properties instead of individual samples.
Definition C.1 (SQ Oracle).
Let be a distribution over . An SQ oracle takes as input a function and a tolerance parameter ; it outputs an estimate such that:
Definition C.2 (SQ algorithm).
An SQ algorithm is an algorithm that accesses the distribution only through the SQ oracle .
C.2 Simulation of Dominated Algorithms by SQ Algorithms
We have the following simulation of dominated algorithms by SQ algorithms.
Theorem C.3.
Suppose is -dominated. Then, for any distribution and error parameter , one can take a sample from with statistical error using queries in expectation to with tolerance .
Proof.
Suppose is -dominated by . Let . For every and , we use (respectively, ) to denote (respectively, ), and let
Our algorithm is a rejection sampling procedure adapted from [KLN+11]; a code sketch is given after the following steps. It works as follows:
-
1.
Take a sample .
-
2.
We make a query to with tolerance level , to obtain an estimate such that .
-
3.
With probability , we output and stop. Otherwise we go back to Step 1.
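Since the displayed formulas in the steps above were lost, the following sketch records the generic rejection-sampling pattern under stated assumptions: draw a candidate message m from the dominating distribution D, query the SQ oracle for the [0,1]-valued function phi_m(x) = Pr[R(x) = m] / (C * D(m)) (where C is an assumed normalizer making the ratio at most 1, for instance roughly e^eps plus a slack term), and accept m with the returned probability. Conditioned on acceptance, m is distributed, up to the oracle's tolerance, as the average output distribution of R over x drawn from P.

```python
import numpy as np

def simulate_dominated_randomizer(R_pmf, D_pmf, sq_oracle, C, rng=None):
    """Rejection-sampling sketch: draw m ~ D; ask the SQ oracle for
    phi_m(x) = Pr[R(x) = m] / (C * D(m)), a [0,1]-valued query (C is an
    assumed normalizer making the ratio at most 1); accept m with the
    returned probability, otherwise retry."""
    rng = np.random.default_rng(rng)
    M = len(D_pmf)
    while True:
        m = rng.choice(M, p=D_pmf)
        phi = lambda x: min(1.0, R_pmf(x, m) / (C * D_pmf[m]))
        if rng.random() < sq_oracle(phi):
            return m

# Toy usage: 2 inputs, 3 messages, and a noiseless SQ oracle for uniform P.
R = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.3, 0.5]])
D = R.mean(axis=0)
oracle = lambda phi: np.mean([phi(x) for x in range(2)])  # exact expectation
sample = simulate_dominated_randomizer(lambda x, m: R[x, m], D, oracle, C=3.0, rng=0)
```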
Let . Note that since is -dominated by . By the definition of , it holds that for every . We will need the following claim.
Claim 1.
For every ,
Proof.
Now, in a single run, the above algorithm outputs with probability in the interval
Note that . By Claim 1, the algorithm terminates in a single run with probability at least
and at most
The above implies that the algorithm makes at most queries to in expectation.
Putting everything together, our algorithm outputs with probability in the following interval:
We have that
Note that
Moreover, we have that
C.3 Applications
We are now ready to apply Theorem C.3 to show that protocols in the model can be simulated by SQ algorithms when the database is drawn i.i.d. from a single distribution.
Theorem C.4.
Let be a database with entries drawn i.i.d. from a distribution . Let be an - protocol on users. Then, there is an algorithm making queries in expectation to with tolerance , such that its output distribution differs by at most in statistical distance from the output distribution of on the dataset .
Proof.
We remark that [BFJ+94] proved that if an SQ algorithm solves ParityLearning with probability at least , queries and tolerance , then (recall that is the dimension of the hidden vector in ParityLearning). Combining the foregoing lower bound with Theorem C.4, this translates into an lower bound on the sample complexity of - protocols solving ParityLearning, which is slightly weaker than our Theorem 1.10.
Appendix D Upper Bounds for Selection in
In this section, we give a proof sketch for the protocol for Selection with sample complexity .
Theorem D.1.
For any , and , there is an - protocol solving Selection with probability at least and .
Proof Sketch.
Let and be the privacy parameters. We also let , and be parameters to be specified later.
We can assume that , as otherwise the protocol simply follows from the sample upper bound by an - protocol [GGK+19].
Let , and . Note that by our choice of and , .
Now for each , our protocol maintains an -DP subprotocol aiming at estimating the fraction of users whose input satisfies . These subprotocols assume they will receive between and inputs. By [GMPV20, BBGN20], there is such a protocol which achieves error with probability at least and using messages.
In our protocol, each user selects coordinates from uniformly at random, and participates in the corresponding subprotocols. Finally, the analyzer aggregates the outputs of all subprotocols and outputs the coordinate with the highest estimate.
Note that, by a union bound and a Chernoff bound, it follows that with probability at least , the number of users of every subprotocol falls in the range , and their mean is -close to the true mean of the -th coordinates of all users.
Setting and appropriately, the protocol is -DP by the advanced composition theorem of DP [DRV10]. Moreover, with probability at least , all subprotocols obtain estimates with error .
Setting so that , our protocol solves Selection with probability at least .
∎