Continual Mean Estimation Under User-Level Privacy
Abstract
We consider the problem of continually releasing an estimate of the population mean of a stream of samples that is user-level differentially private (DP). At each time instant, a user contributes a sample, and the users can arrive in arbitrary order. Until now, these requirements of continual release and user-level privacy were considered in isolation. But, in practice, both requirements come together, as users often contribute data repeatedly and multiple queries are made. We provide an algorithm that outputs a mean estimate at every time instant $t$ such that the overall release is user-level $\varepsilon$-DP and has the following error guarantee: denoting by $m_t$ the maximum number of samples contributed by a user till time $t$, as long as $\tilde{\Omega}(1/\varepsilon)$ users have $m_t/2$ samples each, the error at time $t$ is $\tilde{O}(1/\sqrt{t} + \sqrt{m_t}/(t\varepsilon))$. This is a universal error guarantee which is valid for all arrival patterns of the users. Furthermore, it (almost) matches the existing lower bounds for the single-release setting at all time instants when users have contributed equal numbers of samples.
1 Introduction
Aggregate queries over data sets were originally believed to maintain the privacy of data contributors. However, over the past two decades, several attacks have been proposed that manipulate the output of aggregate queries to extract information about an individual user's data [NS08], [SAW13], [GAM19]. To address this, several mechanisms have been proposed that release a noisy output instead of the original query output. Remarkably, these mechanisms have been shown to preserve privacy under a mathematically rigorous privacy requirement called differential privacy [Dwo+06]. But, until recently, the analysis assumed that each user contributes a single data point rather than multiple. Furthermore, the data set was assumed to be static and only one query was made. Our goal in this paper is to address the more practical situation where multiple queries are made and the data set keeps getting updated between queries, using contributions from existing or new users. We provide an almost optimal private mechanism for this case for the specific running-average query, which can easily be adapted to answer more general aggregate queries as well.
1.1 Problem formulation and some heuristics
Consider a stream $x_1, x_2, \ldots, x_T$ of data points contributed by $n$ users, where $T$ may be as large as $nm$. For simplicity, we assume that one point is contributed at each time: $x_t$ is contributed by a user $u_t$ at time $t$. The maximum number of samples contributed by any user till time instant $t$ is denoted by $m_t$, and we assume that $m_t \le m$, i.e., each user contributes at most $m$ samples.
We formulate our query release problem as a statistical estimation problem to identify an optimal mechanism. Specifically, we assume that each $x_t$ is drawn independently from a distribution with unknown mean $\mu$. At each time step $t$, we are required to output an estimate $\hat{\mu}_t$ for the mean $\mu$, while guaranteeing that the sequence of outputs is user-level $\varepsilon$-differentially private ($\varepsilon$-DP). Namely, the output is $\varepsilon$-DP with respect to the user input comprising all the data points contributed by a single user [Ami+19].
A naive application of standard differentially private mechanisms at each time step will lead to error rates with suboptimal dependence on the time window and the number of samples contributed by each user. For instance, consider the most basic setting where users draw samples independently from a Bernoulli distribution with parameter $\mu$. At any time $t$, a single user can affect the sample mean by at most $m/t$. Adding Laplace noise with parameter $m/(t\varepsilon)$ to this running mean will therefore guarantee user-level $\varepsilon$-DP at each step (implying $T\varepsilon$-DP over $T$ steps) and an error of roughly $1/\sqrt{t} + m/(t\varepsilon)$. Rescaling the privacy parameter to $\varepsilon/T$, we get an error that scales as $1/\sqrt{t} + mT/(t\varepsilon)$. While the statistical error term is optimal, the error term due to privacy is not – it does not, in fact, improve as time progresses. One can do better by using mechanisms specific to the streaming setting such as the binary mechanism [CSS10], [Dwo+10], as we describe in more detail in Section 3. Indeed, we will see that the privacy error term can be improved in two aspects: first, it can be made to decay as time progresses, and second, it only needs to grow sublinearly with $m$.
However, a key challenge still remains. Each user is contributing multiple samples to the stream, and different samples from the same user can come at arbitrary time instants. The output must depend on the arrival pattern of the users. For instance, when all samples in the stream are contributed by a single user, we cannot release much information. Indeed, changing this user's data can potentially change any query response by a large amount, leading to increased sensitivity and the addition of large amounts of noise to guarantee user-level privacy. A better strategy is to withhold answers for a while until a new user arrives – this provides a sort of diversity advantage and reduces the amount of noise we need to add. The process of withholding alone does not, however, lead to an optimal error rate. We additionally need to control sensitivity by forming local averages of each user's samples and then truncating these averages, as is done in the one-shot setting [Lev+21], before adding noise.
1.2 Summary of contributions
We present a modular and easy-to-implement algorithm for continual mean estimation under user-level privacy constraints. Our main algorithm has two key components: a sequence of binary mechanisms that keep track of truncated averages and a withhold-release mechanism that decides when to update the mean estimate. The role of the binary mechanisms is similar to that in [CSS10], [DRV10] – to minimize the number of outputs each data point influences. The withhold-release mechanism releases a query answer only when there is "sufficient diversity," i.e., enough users have contributed to the data. In fact, in a practical implementation of our algorithm, we need to maintain this diversity by omitting excess samples from a few users when evaluating the mean estimate. Together, these components allow us to balance the need for more data for accuracy against the need to control the sensitivity due to a single user's data.
The resulting performance is characterized roughly as follows; see Section 3 for the formal statement.
Main Result (Informal Version).
Our algorithm provides a user-level $\varepsilon$-DP mechanism to output a mean estimate at every time instant $t$ when the data has "sufficient diversity," with error $\tilde{O}(1/\sqrt{t} + \sqrt{m_t}/(t\varepsilon))$, where $m_t$ is the maximum number of samples contributed by any user till time step $t$.
We do not make any assumptions on the order in which the data points arrive from different users. The "sufficient diversity" condition in our theorem (formalized in Definition 3.3) codifies the necessity of having a sufficient number of users with a sufficient number of samples for our algorithm to achieve good accuracy while ensuring user-level privacy. Section 3 further elaborates on the difficulty of obtaining good accuracy under arbitrary user ordering and on how our exponential withhold-release mechanism helps overcome this.
1.3 Prior Work
Continually releasing query answers can leak more information when compared to the single-release setting, since each data point can now be involved in multiple query answers. In addition to this, each user can contribute multiple data points, in which case we would like to ensure that the output of any privacy-preserving mechanism we employ remains roughly the same even when all of the data points contributed by a user are changed. There have been a few foundational works on both these fronts [Dwo+10], [Jai+21], [Lev+21], but a unified treatment is lacking. Indeed, addressing the problem of user-level privacy in the streaming setting was noted as an interesting open problem in [Nik13].
Privately answering count queries in the continual release setting was first addressed in [Dwo+10], [CSS10], where the binary mechanism was introduced. This mechanism releases an estimate for the number of ones in a bit stream by using a small number of noisy partial sums. Compared to the naive scheme of adding noise at every time step, which leads to an error that grows linearly with the stream length $T$, the binary tree mechanism has error that depends only polylogarithmically on $T$. Following this, other works have explored (item-level) private continual release in other settings [Cha+12], [Bol+13], [DKY17], [Jos+18], [PAK19], [Jai+21].
Releasing statistics under user-level privacy constraints was discussed in [Dwo+10], [Dwo+10a], and has recently begun to gain attention [Lev+21], [Cum+21], [NME22]. In particular, [Lev+21] considers user-level privacy for one-shot $d$-dimensional mean estimation and shows that a truncation-based estimator achieves error that scales as $\tilde{O}\big(\sqrt{d/(nm)} + d/(n\sqrt{m}\,\varepsilon)\big)$, provided there are at least $\tilde{\Omega}(\sqrt{d}/\varepsilon)$ users. This requirement on the number of users was later improved to $\tilde{\Omega}(1/\varepsilon)$ in [NME22]. [Cum+21] takes into account heterogeneity of user distributions for mean estimation under user-level privacy constraints.
1.4 Organization
Section 2 describes the problem setup and gives a brief recap of user-level privacy and the binary mechanism. We describe our algorithm in Section 3 and give a rough sketch of its privacy and utility guarantees. Section 4 includes a discussion on the extension to $d$-dimensional mean estimation and on handling other distribution families. Finally, in Section 5 we discuss the optimality of our algorithm, and we end with a discussion of future directions in Section 6.
2 Preliminaries
2.1 Problem setup
We observe an input stream of the form $(x_1, u_1), (x_2, u_2), \ldots$, where $x_t$ is the sample and $u_t$ is the user contributing the sample at time $t$. The samples are drawn independently from a distribution with unknown mean. The goal is to output an estimate $\hat{\mu}_t$ of the mean for every $t$, such that the overall output is user-level $\varepsilon$-DP. We present our main theorems for the case when each $x_t$ is a Bernoulli random variable with unknown mean $\mu$. Extensions to other distribution families are discussed in Section 4.
2.2 Differential privacy
Let $S = ((x_1, u_1), (x_2, u_2), \ldots)$ and $S' = ((x'_1, u'_1), (x'_2, u'_2), \ldots)$ denote two streams of inputs. We say that $S$ and $S'$ are user-level neighbors if there exists a user $u$ such that $(x_t, u_t) = (x'_t, u'_t)$ for every $t$ satisfying $u_t \neq u$. We now define the notion of a user-level $\varepsilon$-DP algorithm.
Definition 2.1.
An algorithm $M$ is said to be user-level $\varepsilon$-DP if for every pair of streams $S, S'$ that are user-level neighbors and every subset $A$ of outputs,
$$\Pr[M(S) \in A] \;\le\; e^{\varepsilon}\,\Pr[M(S') \in A].$$
We will be using the following composition result satisfied by DP mechanisms.
Lemma 2.2.
Let $M_i$ be user-level $\varepsilon_i$-DP mechanisms for $i = 1, \ldots, k$. Then the composition of these mechanisms given by $(M_1, \ldots, M_k)$ is user-level $\big(\sum_{i=1}^k \varepsilon_i\big)$-DP.
We now define the Laplace mechanism, a basic privacy primitive we employ. We use $\mathrm{Lap}(b)$ to denote the Laplace distribution with parameter $b$.
Definition 2.3.
For a function $f : \mathcal{X}^* \to \mathbb{R}^k$, the Laplace mechanism with parameter $b$ is a randomized algorithm with output $f(x) + (Y_1, \ldots, Y_k)$, where $Y_i \sim \mathrm{Lap}(b)$ independently, and the addition is coordinate-wise.
The following lemma states the privacy-utility guarantee of the Laplace mechanism.
Lemma 2.4.
The Laplace mechanism with parameter $b = \Delta_1(f)/\varepsilon$, where $\Delta_1(f)$ is the $\ell_1$-sensitivity of $f$, is $\varepsilon$-DP and guarantees that, with probability at least $1-\beta$, $\max_i |Y_i| \le b\log(k/\beta)$.
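To make this primitive concrete, here is a minimal Python sketch; the function name and interface are ours, for illustration only:

```python
import numpy as np

def laplace_mechanism(value, sensitivity, eps, rng=None):
    """Release `value`, the output of a function with l1-sensitivity
    `sensitivity`, after adding Lap(sensitivity/eps) noise.
    The release is eps-DP."""
    rng = rng or np.random.default_rng()
    return value + rng.laplace(scale=sensitivity / eps)
```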
2.3 Binary mechanism
The binary mechanism (Algorithm 1) [CSS10] receives as input a stream $x_1, x_2, \ldots$ and, at every time $t$, outputs a noisy sum $\hat{S}_t$ of the stream up to time $t$ such that the overall output is $\varepsilon$-DP. Furthermore, for any $\beta \in (0,1)$, the error between the output and the running sum satisfies $\big|\hat{S}_t - \sum_{i \le t} x_i\big| = O\big(\frac{L}{\varepsilon}\log^{1.5} t\,\log\frac{1}{\beta}\big)$ with probability at least $1-\beta$. Here, $L$ is the magnitude of the maximum possible variation of an element in the stream; e.g., if $x_i \in [0,1]$ for every $i$, then $L = 1$.
We observe two important properties of the binary mechanism (Algorithm 1):
- (i) Any stream element $x_i$ is involved in computing at most $\lceil \log t \rceil$ terms of the array of noisy partial sums. E.g., $x_1$ is needed only while computing the noisy partial sums over the dyadic blocks that contain it: the first element, the first two elements, the first four elements, and so on.
- (ii) For any $t$, the output is a sum of at most $\lceil \log t \rceil$ terms from the array of noisy partial sums.
We now describe how these properties of the binary mechanism lead to privacy and utility (accuracy) guarantees.
Privacy. Since, for every $t$, the output is a function of the array of noisy partial sums, it suffices to ensure that this array is $\varepsilon$-DP. From (i) above, changing a stream element will change at most $\lceil \log T \rceil$ terms of the array. Moreover, since the maximum variation of a stream element is $L$, each of these terms will change by at most $L$. Overall, changing an arbitrary stream element changes the $\ell_1$-norm of the array by at most $L\lceil \log T \rceil$. Thus, to ensure that the overall stream of outputs from the binary mechanism is $\varepsilon$-DP, it suffices to add $\mathrm{Lap}(\sigma)$ noise to each term of the array, where $\sigma = L\lceil \log T \rceil/\varepsilon$.
Utility. From (ii) above, since the output at time $t$ is a sum of at most $\lceil \log t \rceil$ terms from the array, where each term carries independent $\mathrm{Lap}(\sigma)$ noise, we have by Lemma A.2 that, with probability at least $1-\beta$, $\big|\hat{S}_t - \sum_{i \le t} x_i\big| = O\big(\sigma\sqrt{\log t}\,\log(1/\beta)\big)$.
In our algorithms, we invoke an instance of the binary mechanism and abstract out its functionality using the following three ingredients:
- An input array: this array acts as the input stream for the binary mechanism. In our algorithms, we feed an element to it only at certain special time instants.
- An array of noisy partial sums: this array has the same length as the input array. Its $j$-th term is computed after the $j$-th element enters the input array. The magnitude of the Laplace noise to be added while computing terms of this array (the same for all terms) is passed as a noise parameter when the mechanism is invoked. The sums output by the mechanism are formed by combining terms from this array.
- A sum query: suppose, at time $t$, the input array (and, thus, the array of noisy partial sums) has $k$ elements. Then the sum query, when invoked at time $t$, outputs a noisy sum of all elements in the input array by combining at most $\lceil \log k \rceil$ terms stored in the array of noisy partial sums.
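The following is a minimal Python sketch of this abstraction; the class and method names are ours, and the implementation merges completed dyadic blocks in the style of [CSS10] rather than following Algorithm 1 line by line:

```python
import numpy as np

class BinaryMechanism:
    """Sketch of the binary (tree) mechanism: continually releases noisy
    prefix sums of a fed stream.  Each fed element influences O(log k)
    noisy partial sums, and each prefix sum combines O(log k) of them."""

    def __init__(self, noise_scale, rng=None):
        self.noise_scale = noise_scale  # Laplace parameter for each partial sum
        self.rng = rng or np.random.default_rng()
        self.blocks = {}                # level i -> exact sum of a block of size 2**i
        self.noisy = {}                 # level i -> noisy version of that block sum
        self.count = 0

    def feed(self, x):
        self.count += 1
        # Merge completed dyadic blocks, like carries in binary addition.
        carry, level = x, 0
        while level in self.blocks:
            carry += self.blocks.pop(level)
            self.noisy.pop(level)
            level += 1
        self.blocks[level] = carry
        self.noisy[level] = carry + self.rng.laplace(scale=self.noise_scale)

    def sum(self):
        # Prefix sum = combination of the O(log k) currently stored noisy blocks.
        return sum(self.noisy.values())
```

After $k$ elements have been fed, at most $\lceil \log k \rceil + 1$ blocks are stored, so `sum()` combines at most that many noisy terms, matching properties (i) and (ii) above.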
3 ALGORITHM
In this section, we build up to our main algorithm (Algorithm 6), pointing out the main ideas along the way.
A naive use of the binary mechanism.
Given a stream $x_1, x_2, \ldots$ (where $x_t \in \{0,1\}$), Algorithm 2 presents a straightforward way to estimate the mean in the continual-release user-level DP setting. The algorithm feeds $x_t$ to the binary mechanism at every time instant $t$. Since each user contributes at most $m$ samples, changing a user will change at most $m\lceil\log T\rceil$ terms of the array of noisy partial sums. Moreover, since the maximum variation of any stream element is $1$, changing a user will change the $\ell_1$-norm of the array by at most $m\lceil\log T\rceil$. Thus, for $\sigma = m\lceil\log T\rceil/\varepsilon$, adding $\mathrm{Lap}(\sigma)$ noise to each term of the array ensures that this algorithm is user-level $\varepsilon$-DP. To obtain an accuracy guarantee at any given time $t$, we observe the following: Since $x_1, \ldots, x_t$ are independent samples, we have that, with probability at least $1-\beta$, $\big|\frac{1}{t}\sum_{i \le t} x_i - \mu\big| \le \sqrt{\frac{\log(2/\beta)}{2t}}$. Moreover, since the output $\hat{S}_t$ is a sum of at most $\lceil\log t\rceil$ terms of the array, where each term carries independent $\mathrm{Lap}(\sigma)$ noise, we have that, with probability at least $1-\beta$, $\big|\hat{S}_t - \sum_{i \le t} x_i\big| = O\big(\sigma\sqrt{\log t}\,\log(1/\beta)\big)$. Thus, using the union bound, we have with probability at least $1-2\beta$ that

$$\left|\hat{\mu}_t - \mu\right| = O\!\left(\sqrt{\frac{\log(1/\beta)}{t}} + \frac{m\,\log T\,\sqrt{\log t}\,\log(1/\beta)}{t\,\varepsilon}\right). \qquad (1)$$

The first term on the right-hand side of (1) is the usual statistical error that results from using $t$ samples to estimate $\mu$. The second term in the error (the "privacy error"), however, results from ensuring that the stream of estimates is user-level $\varepsilon$-DP; note that this term is linear in $m$. We next describe a "wishful" scenario where the privacy error can be made proportional to $\sqrt{m}$. We will then see how to obtain such a result in the general scenario, which is our main contribution.
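Before moving on, here is how the naive scheme can be sketched with the `BinaryMechanism` class above; the noise calibration mirrors the sensitivity argument in this paragraph (identifiers are ours):

```python
import math

def naive_continual_mean(stream, m, T, eps):
    """Naive continual mean estimation (in the spirit of Algorithm 2):
    feed every sample to a single binary mechanism.  Changing a user can
    move m * ceil(log T) noisy partial sums by 1 each, hence the scale."""
    bm = BinaryMechanism(noise_scale=m * math.ceil(math.log2(max(T, 2))) / eps)
    estimates = []
    for t, x in enumerate(stream, start=1):
        bm.feed(x)
        estimates.append(bm.sum() / t)  # noisy running mean at time t
    return estimates
```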
A better algorithm in a wishful scenario: Exploiting concentration.
Naive use of the binary mechanism for continual mean estimation fails to exploit the concentration phenomenon that results from each user contributing multiple i.i.d. samples to the stream. To see how concentration might help, consider the following scenario. Suppose that (somehow) we already have a prior estimate $\tilde{\mu}$ that satisfies $|\tilde{\mu} - \mu| \le 1/\sqrt{m}$. Also, assume a user ordering where every user contributes their samples in contiguous time steps. That is, user 1 contributes samples for the first $m$ time steps, followed by user 2 who contributes samples for the next $m$ time steps, and so on. In this case, Algorithm 3 presents a way to exploit the concentration phenomenon. Even though this algorithm outputs an estimate at every $t$, it only updates the estimate at $t = m, 2m, 3m, \ldots$. Upon receiving a sample from a user, the algorithm does not immediately add it to the binary mechanism. Instead, the algorithm waits for a user to contribute all their $m$ samples. It then computes the sum of those samples and projects the sum to the interval

$$I = \left[m\tilde{\mu} - c\sqrt{m\log(mn/\beta)},\; m\tilde{\mu} + c\sqrt{m\log(mn/\beta)}\right], \qquad (2)$$

for an absolute constant $c$, before feeding the projected sum to the binary mechanism. By concentration of sums of i.i.d. Bernoulli random variables, and the fact that $|\tilde{\mu} - \mu| \le 1/\sqrt{m}$ (by assumption), we have with probability at least $1-\beta$ that the sum of the $m$ samples corresponding to every user falls inside the interval $I$; thus, the projection operator has no effect with high probability.
Since there are $n$ users, at most $n$ elements are added to the input array throughout the course of the algorithm. Now, since there can be at most $n$ elements in the input array, a given element will be used at most $\lceil\log n\rceil$ times while computing the noisy partial sums (see Section 2.3). So, changing a user can change the $\ell_1$-norm of the array of noisy partial sums by at most $w\lceil\log n\rceil$, where $w = 2c\sqrt{m\log(mn/\beta)}$ is the width of the interval $I$ in (2). Thus, to ensure that the algorithm is $\varepsilon$-DP (given the initial estimate $\tilde{\mu}$), it suffices to add independent $\mathrm{Lap}(\sigma)$ noise while computing each term of the array, where

$$\sigma = \frac{w\lceil\log n\rceil}{\varepsilon} = O\!\left(\frac{\sqrt{m\log(mn/\beta)}\,\log n}{\varepsilon}\right). \qquad (3)$$

Note that $\sigma$ defined above is proportional to $\sqrt{m}$, whereas the $\sigma$ defined in the naive use of the binary mechanism was proportional to $m$. Thus (details in Appendix B), one obtains a privacy error of $\tilde{O}(\sqrt{m}/(t\varepsilon))$, while still having a statistical error of $O\big(\sqrt{\log(1/\beta)/t}\big)$ at every $t$.

The scenario here is wishful for two reasons: (i) we assumed a prior estimate $\tilde{\mu}$, which we used to form the interval (2); (ii) we assumed a special user ordering where every user contributes their samples contiguously. This user ordering ensures that the estimate is updated after every $m$ time steps, and thus, for every $t$, at least $t - m$ samples are used in computing $\hat{\mu}_t$; this is the reason that the statistical error remains $O\big(\sqrt{\log(1/\beta)/t}\big)$. Note that this algorithm performs very poorly under a general user ordering. For instance, if the users contribute samples in a round-robin fashion (where a sample from user 1 is followed by a sample from user 2 and so on), then the algorithm would have to wait until around time $nm$ to obtain all $m$ samples from any given user.
Although wishful, there are two main ideas that we take away from this discussion. One is the idea of withholding: it is not necessary to incorporate information about every sample received till time $t$ to compute $\hat{\mu}_t$. The second is the idea of truncation: the worst-case variation in the quantity of interest due to variation in a user's data can be reduced by projecting each user's contribution onto a smaller interval. This idea of exploiting concentration using truncation is also used in [Lev+21] for mean estimation in the single-release setting.
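A minimal sketch of the resulting procedure (Algorithm 3 in spirit) under the contiguous-arrival assumption follows; the interval-width constant is illustrative, not the paper's:

```python
import numpy as np

def wishful_continual_mean(stream, mu_prior, m, n, eps, beta=0.1):
    """Withhold each user's m samples, clip their sum to an interval of
    width O(sqrt(m log(mn/beta))) around m * mu_prior (cf. (2)), and feed
    the clipped sum to a single binary mechanism."""
    width = 2.0 * np.sqrt(m * np.log(2 * m * n / beta))  # half-width of I
    bm = BinaryMechanism(noise_scale=width * np.ceil(np.log2(max(n, 2))) / eps)
    buffer, released = [], 0
    for x in stream:
        buffer.append(x)
        if len(buffer) == m:  # the current user has contributed all m samples
            s = np.clip(sum(buffer), m * mu_prior - width, m * mu_prior + width)
            bm.feed(s)
            released += m
            buffer = []
        yield mu_prior if released == 0 else bm.sum() / released
```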
Designing algorithms for worst-case user order: Exponential withhold-release pattern.
The algorithm discussed above suffers under the worst-case user ordering since samples from a user are "withheld" until every sample from that user has been obtained. This was done to exploit (using "truncation") the concentration of sums of i.i.d. samples from the user. To exploit this concentration in the setting where the user order can be arbitrary, we propose the idea of withholding the samples (and applying truncation) at exponentially increasing intervals. Namely, for a given user $u$, we do not withhold the first two samples $x^u_1, x^u_2$; then, we withhold $x^u_3$ and release a truncated version of $x^u_3 + x^u_4$ when we receive $x^u_4$; we then withhold $x^u_5, x^u_6, x^u_7$ and release a truncated version of $x^u_5 + \cdots + x^u_8$ when we receive $x^u_8$; and so on. In general, we withhold samples $x^u_{2^{j-1}+1}, \ldots, x^u_{2^j - 1}$ and release a truncated version of $x^u_{2^{j-1}+1} + \cdots + x^u_{2^j}$ when we receive the $2^j$-th sample from user $u$.
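One possible bookkeeping for this release schedule is sketched below (names are ours; releases happen exactly when a user's sample count hits a power of two):

```python
from collections import defaultdict

def withhold_release(stream_with_users):
    """Yield (user, j, batch_sum) whenever a user's sample count reaches
    2**j; between releases, the user's samples are withheld.  The batch
    released at count 2**j is the sum of samples 2**(j-1)+1, ..., 2**j
    (a single sample for j = 0 or j = 1)."""
    counts = defaultdict(int)     # samples seen per user
    pending = defaultdict(float)  # withheld (not yet released) sum per user
    for x, u in stream_with_users:
        counts[u] += 1
        pending[u] += x
        k = counts[u]
        if k & (k - 1) == 0:      # k is a power of two: release the batch
            yield u, k.bit_length() - 1, pending[u]
            pending[u] = 0.0
```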
We now present Algorithm 4, which uses this exponential withhold-release idea and, assuming a prior estimate $\tilde{\mu}$, outputs an estimate with statistical error $\tilde{O}(1/\sqrt{t})$ and privacy error $\tilde{O}(\sqrt{m}/(t\varepsilon))$ for an arbitrary user ordering.
Continual mean estimation assuming prior estimate: Algorithm 4.
Let $x^u_i$ be the $i$-th sample obtained from user $u$. Let $\Pi_j$ be the projection on the interval $I_j$, where $I_0 = [0,1]$ and, for $j \ge 1$,

$$I_j = \left[2^{j-1}\tilde{\mu} - c\sqrt{2^{j-1}\log(mn/\beta)},\; 2^{j-1}\tilde{\mu} + c\sqrt{2^{j-1}\log(mn/\beta)}\right]. \qquad (4)$$
Let $u_t$ be the user arriving at time $t$. Upon receiving a sample from a user, the algorithm does not immediately add it to the input array. Instead, the algorithm only adds anything new to the input array at time instants when the total number of samples obtained from user $u_t$ becomes $2^j$, for some $j \ge 0$. At such a time, the algorithm adds $\Pi_j\big(\sum_{i=2^{j-1}+1}^{2^j} x^{u_t}_i\big)$ to the array. (The summation inside makes sense only for $j \ge 1$; for $j = 0$, which corresponds to the first sample from user $u_t$, the algorithm simply adds that sample to the array.) This has the effect that, corresponding to a user, there are at most $\lceil\log m\rceil + 1$ elements in the input array. Since there are at most $n$ users, at most $n(\lceil\log m\rceil + 1)$ elements are added to the input array throughout the course of the algorithm.
Now, since there can be at most $n(\lceil\log m\rceil + 1)$ elements in the input array, a given element will be used at most $\lceil\log\!\big(n(\lceil\log m\rceil + 1)\big)\rceil$ times while computing the noisy partial sums (see Section 2.3). Thus, changing a user can change the $\ell_1$-norm of the array by at most $w(\lceil\log m\rceil + 1)\lceil\log\!\big(n(\lceil\log m\rceil + 1)\big)\rceil$, where $w$ is the maximum sensitivity of an element contributed by a user to the input array. As can be seen from (4), we have $w = O\big(\sqrt{m\log(mn/\beta)}\big)$. Thus, to ensure that Algorithm 4 is $\varepsilon$-DP (given the initial estimate $\tilde{\mu}$), it suffices to add independent $\mathrm{Lap}(\sigma)$ noise while computing each term of the array, where

$$\sigma = \frac{w(\lceil\log m\rceil + 1)\lceil\log\!\big(n(\lceil\log m\rceil + 1)\big)\rceil}{\varepsilon} = O\!\left(\frac{\sqrt{m\log(mn/\beta)}\,\log m\,\log(n\log m)}{\varepsilon}\right). \qquad (5)$$
Note that, as in (3), the magnitude of noise in (5) is proportional to $\sqrt{m}$ (up to logarithmic factors). Moreover, with the exponential withhold-release pattern, we are guaranteed that at any time $t$, the estimate is computed using at least $\lceil t/2 \rceil$ samples, no matter what the user ordering is (Claim C.1 in Appendix C). This gives us the following guarantee (proof in Appendix C.2):
Theorem 3.1.
Assume that we are given a user-level $\varepsilon$-DP prior estimate $\tilde{\mu}$ of the true mean $\mu$, such that $|\tilde{\mu} - \mu| \le 1/\sqrt{m}$. Then, Algorithm 4 is $2\varepsilon$-DP. Moreover, for a given $t$, we have with probability at least $1-3\beta$,

$$\left|\hat{\mu}_t - \mu\right| = \tilde{O}\!\left(\frac{1}{\sqrt{t}} + \frac{\sqrt{m}}{t\,\varepsilon}\right),$$

where $\tilde{O}$ hides factors logarithmic in $m$, $n$, and $1/\beta$.
We now present Algorithm 5, which, in conjunction with exponential withhold-release, uses multiple binary mechanisms. Assuming a prior estimate $\tilde{\mu}$, this algorithm outputs an estimate with statistical error $\tilde{O}(1/\sqrt{t})$ and privacy error $\tilde{O}(\sqrt{m_t}/(t\varepsilon))$, where $m_t$ is the maximum number of samples contributed by a user till time $t$. Note that $m_t$ can be much smaller than $m$, especially at early times $t$.
Continual mean estimation assuming prior estimate: Algorithm 5.
In this algorithm, we use $J+1$ binary mechanisms $\mathcal{B}_0, \mathcal{B}_1, \ldots, \mathcal{B}_J$, where $J = \lceil\log m\rceil$. As was the case with Algorithm 4, here too the algorithm only adds anything new to a binary mechanism at time instants when the total number of samples obtained from user $u_t$ becomes $2^j$, for some $j \ge 0$. What differentiates Algorithm 5 from Algorithm 4 is that we use different binary mechanisms for different values of $j$. Namely, the element $\Pi_j\big(\sum_{i=2^{j-1}+1}^{2^j} x^u_i\big)$ from a user $u$ is added to $\mathcal{B}_j$. This ensures that each element in $\mathcal{B}_j$ has maximum sensitivity $w_j$, the width of the interval $I_j$ in (4).

Note that

$$w_j \le 2c\sqrt{2^{j-1}\log(mn/\beta)}, \qquad (6)$$

where the inequality holds because $|\tilde{\mu} - \mu| \le 1/\sqrt{m} \le 2^{-(j-1)/2}$ for every $j \le J$. To ensure that Algorithm 5 is $\varepsilon$-DP, we will make each binary mechanism $\big(\varepsilon/(J+1)\big)$-DP. For any $j$, since each user contributes at most one element to $\mathcal{B}_j$, there are at most $n$ elements in $\mathcal{B}_j$ throughout the course of the algorithm. Moreover, since every element in $\mathcal{B}_j$ has sensitivity at most $w_j$, it suffices to add independent $\mathrm{Lap}(\sigma_j)$ noise while computing each term of the noisy partial-sum array of $\mathcal{B}_j$, where

$$\sigma_j = \frac{w_j (J+1)\lceil\log n\rceil}{\varepsilon} = O\!\left(\frac{\sqrt{2^j\log(mn/\beta)}\,\log m\,\log n}{\varepsilon}\right). \qquad (7)$$
Comparing the above to $\sigma$ in (5), we see that using multiple binary mechanisms allows us to add noise whose magnitude is more finely tuned to the exponential withhold-release pattern. In particular, if $m_t$ is the maximum number of samples contributed by any user till time $t$, then at most $\lceil\log m_t\rceil + 1$ binary mechanisms are in use at time $t$, and thus the maximum noise level is proportional to $\sqrt{m_t}$ (up to logarithmic factors). This gives us the following guarantee (proof in Appendix D):
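Combining the two previous sketches gives an Algorithm-5-style estimator: each released batch is clipped and routed to its own mechanism, whose noise scales with the batch size (constants and the privacy-budget split below are illustrative):

```python
import math
import numpy as np

def multi_bm_estimator(stream_with_users, mu_prior, m, n, eps, beta=0.1):
    """Route the batch released at count 2**j to mechanism B_j (cf. (4)-(7)).
    Each of the J+1 mechanisms gets an eps/(J+1) share of the budget."""
    J = math.ceil(math.log2(max(m, 2)))
    def half_width(j):  # O(sqrt(2**(j-1) * log(mn/beta))), batch size 2**(j-1)
        return 2.0 * math.sqrt(max(2 ** (j - 1), 1) * math.log(2 * m * n / beta))
    bms = [BinaryMechanism(noise_scale=half_width(j) * (J + 1)
                           * math.ceil(math.log2(max(n, 2))) / eps)
           for j in range(J + 1)]
    released = 0
    for u, j, batch_sum in withhold_release(stream_with_users):
        size = max(2 ** (j - 1), 1)  # number of samples in this batch
        center = size * mu_prior
        clipped = float(np.clip(batch_sum, center - half_width(j),
                                center + half_width(j)))
        bms[j].feed(clipped)
        released += size
        yield sum(b.sum() for b in bms) / released
```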
Theorem 3.2.
Assume that we are given a user-level $\varepsilon$-DP prior estimate $\tilde{\mu}$ of the true mean $\mu$, such that $|\tilde{\mu} - \mu| \le 1/\sqrt{m}$. Then, Algorithm 5 is $2\varepsilon$-DP. Moreover, for any given $t$, we have with probability at least $1-3\beta$ that

$$\left|\hat{\mu}_t - \mu\right| = \tilde{O}\!\left(\frac{1}{\sqrt{t}} + \frac{\sqrt{m_t}}{t\,\varepsilon}\right),$$

where $m_t$ denotes the maximum number of samples obtained from any user till time $t$, i.e., $m_t = \max_u m_t(u)$, where $m_t(u)$ is the number of samples obtained from user $u$ till time $t$.
We now present our final algorithm, Algorithm 6, which does not assume a prior estimate. Observe that the prior estimate was needed in the previous algorithms only to form the truncation intervals (see (4)). Algorithm 6 includes estimating such priors as a subroutine. In fact, as we describe next, the algorithm estimates a separate prior for each binary mechanism.
Continual mean estimation without assuming prior estimate: Algorithm 6.
Since the algorithm uses $J+1$ (where $J = \lceil\log m\rceil$) binary mechanisms, we need not estimate $\mu$ up to accuracy $1/\sqrt{m}$ in one go. Instead, we can have a separate prior $\tilde{\mu}_j$ for each binary mechanism $\mathcal{B}_j$, which is helpful since $\mathcal{B}_j$ requires a prior only up to an accuracy of $2^{-(j-1)/2}$ (see (6)).
In this algorithm, we mark $\mathcal{B}_j$ as "inactive" till we have a sufficient number of users with a sufficient number of samples to estimate a user-level DP prior $\tilde{\mu}_j$ up to an accuracy of $2^{-(j-1)/2}$. While $\mathcal{B}_j$ is inactive, we store the elements that require $\tilde{\mu}_j$ for truncation in a buffer (see Lines 24-25; each element of the buffer is a sum of $2^{j-1}$ samples, which we cannot truncate yet). Once we have a sufficient number of users with a sufficient number of samples (Line 9 lists the exact condition), we use a private median estimation algorithm (Algorithm 7) from [FS17] to estimate $\tilde{\mu}_j$. At this point, we use $\tilde{\mu}_j$ to truncate the elements stored in the buffer, and pass these elements to $\mathcal{B}_j$ (Line 11).
The corresponding truncation interval, sensitivity, and noise level are the analogues of (4), (6), and (7), with $\tilde{\mu}$ replaced by $\tilde{\mu}_j$:

$$I_j = \left[2^{j-1}\tilde{\mu}_j - c\sqrt{2^{j-1}\log(mn/\beta)},\; 2^{j-1}\tilde{\mu}_j + c\sqrt{2^{j-1}\log(mn/\beta)}\right], \qquad (8)$$

$$w_j \le 2c\sqrt{2^{j-1}\log(mn/\beta)}, \qquad (9)$$

$$\sigma_j = \frac{w_j \cdot 2(J+1)\lceil\log n\rceil}{\varepsilon}. \qquad (10)$$
We note that the private median estimation algorithm is also used in [Lev+21] in the single-release setting with all users contributing an equal number of samples; we extend this to the setting where the number of samples from different users can vary. In Appendix E.1 (Claim E.1), we show that, for any $j$, this modified private median estimation algorithm (Algorithm 7) is user-level DP. Moreover, with probability at least $1-\beta$, it outputs a prior $\tilde{\mu}_j$ with the accuracy required by $\mathcal{B}_j$.
Algorithm 6 demonstrates another advantage of using multiple binary mechanisms – we can have different priors for different binary mechanisms, which means that the algorithm does not need to wait long before it starts outputting estimates with good guarantees. Theorem 3.4 (proof in Appendix E.2) states the exact guarantees ensured by Algorithm 6. Before stating the theorem, we define what we mean by the "diversity condition".
Definition 3.3 (Diversity condition).
We say that the "diversity condition holds at time $t$" if

$$\sum_{u} \min\{m_t(u),\, m_t/2\} \;\ge\; \tilde{\Omega}\!\left(\frac{m_t}{\varepsilon}\right), \qquad (11)$$

where the sum is over the users seen till time $t$ and $m_t$ is the maximum number of samples contributed by any user till time $t$. That is, $m_t = \max_u m_t(u)$, where $m_t(u)$ is the number of samples contributed by user $u$ till time $t$.
In words, condition (11) says that we need the number of users at time $t$ to be large enough that we can form a collection of $\tilde{\Omega}(m_t/\varepsilon)$ samples by using at most $m_t/2$ samples per user. A sufficient condition for (11) to hold is that there are at least $\tilde{\Omega}(1/\varepsilon)$ users that have contributed at least $m_t/2$ samples each till time $t$. In particular, since $m_t \le m$, we have the following: once there are $\tilde{\Omega}(1/\varepsilon)$ users that have contributed at least $m/2$ samples each till some time $t_0$, the diversity condition holds for every $t \ge t_0$.
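A minimal check of condition (11) might look as follows; the constant `c` stands in for the exact threshold (including log factors) used in Line 9 of Algorithm 6:

```python
def diversity_holds(samples_per_user, eps, c=1.0):
    """Return True if we can assemble c * m_t / eps samples while taking
    at most m_t / 2 samples from any single user (a version of (11)).
    `samples_per_user` maps user -> m_t(u)."""
    m_t = max(samples_per_user.values())
    per_user_cap = max(m_t // 2, 1)
    usable = sum(min(cnt, per_user_cap) for cnt in samples_per_user.values())
    return usable >= c * m_t / eps
```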
Theorem 3.4.
Algorithm 6 for continual Bernoulli mean estimation is user-level $\varepsilon$-DP. Moreover, if the diversity condition (11) holds at time $t$, then, with probability at least $1-\beta$,

$$\left|\hat{\mu}_t - \mu\right| = \tilde{O}\!\left(\frac{1}{\sqrt{t}} + \frac{\sqrt{m_t}}{t\,\varepsilon}\right).$$
What happens at time instants when the diversity condition does not hold? This happens when there are very few users who have contributed a very large number of samples. Algorithm 6 stores these samples in buffers (since the corresponding binary mechanisms are "inactive") and does not use them to estimate $\mu$. This is done to preserve user-level privacy and seems like a necessary thing to do. However, we currently do not know whether there is a user-level private way to use these extra samples from a few users. In other words, it is not clear whether our diversity condition can be weakened.
4 EXTENSIONS
To perform mean estimation on $d$-dimensional inputs with independent coordinates, one can simply use Algorithm 6 on each coordinate. Since the release corresponding to each coordinate is user-level $\varepsilon$-DP, we get that the overall algorithm is $d\varepsilon$-DP by basic composition. However, if we only require an approximate differential privacy guarantee, then [DRV10, Theorem III.3] shows that the full sequence of releases from all coordinates will be $\big(\varepsilon\sqrt{2d\log(1/\delta)} + d\varepsilon(e^{\varepsilon}-1),\, \delta\big)$-DP for every $\delta > 0$. Rescaling the privacy parameter to ensure $(\varepsilon, \delta)$-DP overall inflates the per-coordinate privacy error by only a factor of roughly $\sqrt{d\log(1/\delta)}$, provided $\varepsilon$ is not too large. These arguments carry over to the case of subgaussian distributions as well.
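As a small illustration of the two budget-splitting options for $d$ coordinates (the formulas follow basic and advanced composition; the helper is ours):

```python
import math

def per_coordinate_budget(eps_total, delta, d):
    """Privacy budget per coordinate when running d independent copies of
    the 1-d estimator.  Basic composition: eps_total/d (pure DP).  Advanced
    composition [DRV10]: about eps_total / sqrt(2 d log(1/delta)) suffices
    for an overall (eps_total, delta)-DP guarantee -- a sqrt(d) saving."""
    basic = eps_total / d
    advanced = eps_total / math.sqrt(2 * d * math.log(1 / delta))
    return basic, advanced
```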
5 LOWER BOUND
Consider the single-release mean estimation problem with $n$ users, each having $m$ samples, where the estimated mean must be user-level $\varepsilon$-DP. In this setting, a lower bound of $\Omega\big(\frac{1}{\sqrt{nm}} + \frac{1}{n\sqrt{m}\,\varepsilon}\big)$ on the achievable accuracy is known (Theorem 3 in [Liu+20]). Furthermore, in the same setting, Theorem 9 in [Lev+21] shows that any algorithm that is user-level $\varepsilon$-DP requires $\Omega(1/\varepsilon)$ users. To see what these lower bounds say about our proposed continual-release algorithm (Algorithm 6), let $t$ be a time instant at which $n$ users have contributed $m$ samples each (thus, $t = nm$). In this case, $m_t = m$, and Theorem 3.4 gives an accuracy guarantee of $\tilde{O}\big(\frac{1}{\sqrt{nm}} + \frac{1}{n\sqrt{m}\,\varepsilon}\big)$, provided $n \ge \tilde{\Omega}(1/\varepsilon)$ (the diversity condition). This matches the single-release lower bounds (on both accuracy and number of users) up to log factors.
6 DISCUSSION
We have shown that Algorithm 6 is almost optimal at every time instant where users have contributed equal numbers of samples. But what about settings where different users contribute different numbers of samples? Is it optimal, for instance, to not use the excess samples from a single user? The answer is not clear even in the single-release setting. Investigating this is an interesting future direction.
References
- [NS08] Arvind Narayanan and Vitaly Shmatikov “Robust De-anonymization of Large Sparse Datasets” In 2008 IEEE Symposium on Security and Privacy, 2008, pp. 111–125
- [SAW13] Latanya Sweeney, Akua Abu and Julia Winn “Identifying Participants in the Personal Genome Project by Name” In Data Privacy Lab, IQSS, Harvard University, 2013 URL: http://dataprivacylab.org/projects/pgp/
- [GAM19] Simson L. Garfinkel, John M. Abowd and Chris Martindale “Understanding database reconstruction attacks on public data” In Communications of the ACM 62, 2019, pp. 46–53
- [Dwo+06] Cynthia Dwork, Frank McSherry, Kobbi Nissim and Adam D. Smith “Calibrating Noise to Sensitivity in Private Data Analysis” In TCC 3876, Lecture Notes in Computer Science Springer, 2006, pp. 265–284
- [Ami+19] Kareem Amin, Alex Kulesza, Andres Muñoz Medina and Sergei Vassilvitskii “Bounding User Contributions: A Bias-Variance Trade-off in Differential Privacy” In Proceedings of the 36th International Conference on Machine Learning PMLR, 2019, pp. 263–271
- [CSS10] T.-H. Chan, Elaine Shi and Dawn Song “Private and Continual Release of Statistics” In ICALP (2) 6199, Lecture Notes in Computer Science Springer, 2010, pp. 405–417
- [Dwo+10] Cynthia Dwork, Moni Naor, Toniann Pitassi and Guy N. Rothblum “Differential Privacy under Continual Observation” In Proceedings of the Forty-Second ACM Symposium on Theory of Computing, STOC ’10 Cambridge, Massachusetts, USA: Association for Computing Machinery, 2010, pp. 715–724
- [Lev+21] Daniel Levy et al. “Learning with User-Level Privacy” In Advances in Neural Information Processing Systems 34 Curran Associates, Inc., 2021, pp. 12466–12479
- [DRV10] Cynthia Dwork, Guy N. Rothblum and Salil Vadhan “Boosting and Differential Privacy”, FOCS ’10 IEEE Computer Society, 2010, pp. 51–60 URL: https://doi.org/10.1109/FOCS.2010.12
- [Jai+21] Palak Jain, Sofya Raskhodnikova, Satchit Sivakumar and Adam D. Smith “The Price of Differential Privacy under Continual Observation” In CoRR abs/2112.00828, 2021 arXiv: https://arxiv.org/abs/2112.00828
- [Nik13] Aleksandar Nikolov “Differential Privacy in the Streaming World”, Simons Institute Workshop on Big Data and Differential Privacy, 2013
- [Cha+12] T.-H. Chan, Mingfei Li, Elaine Shi and Wenchang Xu “Differentially Private Continual Monitoring of Heavy Hitters from Distributed Streams” In Proceedings of the 12th International Conference on Privacy Enhancing Technologies, PETS’12 Vigo, Spain: Springer-Verlag, 2012, pp. 140–159
- [Bol+13] Jean Bolot et al. “Private Decayed Predicate Sums on Streams” In Proceedings of the 16th International Conference on Database Theory, ICDT ’13 Genoa, Italy: Association for Computing Machinery, 2013, pp. 284–295
- [DKY17] Bolin Ding, Janardhan Kulkarni and Sergey Yekhanin “Collecting Telemetry Data Privately” In Advances in Neural Information Processing Systems 30 Curran Associates, Inc., 2017
- [Jos+18] Matthew Joseph, Aaron Roth, Jonathan Ullman and Bo Waggoner “Local Differential Privacy for Evolving Data” In Proceedings of the 32nd International Conference on Neural Information Processing Systems Montréal, Canada: Curran Associates Inc., 2018, pp. 2381–2390
- [PAK19] Victor Perrier, Hassan Jameel Asghar and Dali Kaafar “Private Continual Release of Real-Valued Data Streams” In 26th Annual Network and Distributed System Security Symposium, NDSS 2019, San Diego, California, USA, February 24-27, 2019 The Internet Society, 2019
- [Dwo+10a] Cynthia Dwork et al. “Pan-Private Streaming Algorithms” In ICS Tsinghua University Press, 2010, pp. 66–80
- [Cum+21] Rachel Cummings, Vitaly Feldman, Audra McMillan and Kunal Talwar “Mean Estimation with User-level Privacy under Data Heterogeneity” In NeurIPS 2021 Workshop Privacy in Machine Learning, 2021 URL: https://openreview.net/forum?id=oYbQDV3mon-
- [NME22] Shyam Narayanan, Vahab Mirrokni and Hossein Esfandiari “Tight and Robust Private Mean Estimation with Few Users” In Proceedings of the 39th International Conference on Machine Learning 162, Proceedings of Machine Learning Research PMLR, 2022, pp. 16383–16412
- [FS17] Vitaly Feldman and Thomas Steinke “Generalization for Adaptively-chosen Estimators via Stable Median” In Proceedings of the 30th Conference on Learning Theory, COLT 2017, Amsterdam, The Netherlands, 7-10 July 2017 65, Proceedings of Machine Learning Research PMLR, 2017, pp. 728–757
- [Liu+20] Yuhan Liu et al. “Learning discrete distributions: user vs item-level privacy” In Advances in Neural Information Processing Systems 33 Curran Associates, Inc., 2020, pp. 20965–20976 URL: https://proceedings.neurips.cc/paper/2020/file/f06edc8ab534b2c7ecbd4c2051d9cb1e-Paper.pdf
- [DR+14] Cynthia Dwork and Aaron Roth “The algorithmic foundations of differential privacy” In Foundations and Trends® in Theoretical Computer Science 9.3–4 Now Publishers, Inc., 2014, pp. 211–407
Appendix A Useful Inequalities
We state two concentration inequalities that we will use extensively.
Lemma A.1.
Let $X_1, \ldots, X_k$ be i.i.d. random variables taking values in $\{0,1\}$ with mean $\mu$. Then, for every $\beta \in (0,1)$,

$$\Pr\left[\Big|\sum_{i=1}^k X_i - k\mu\Big| \ge \sqrt{\frac{k}{2}\log\frac{2}{\beta}}\right] \le \beta.$$
Lemma A.2.
Let $Y_i \sim \mathrm{Lap}(b_i)$, $i = 1, \ldots, k$, be independent. Then, for every $\beta \in (0,1)$,

$$\Pr\left[\Big|\sum_{i=1}^k Y_i\Big| \ge c\sqrt{\sum_{i=1}^k b_i^2}\,\log\frac{2}{\beta}\right] \le \beta,$$

where $c$ is an absolute constant.
Appendix B Continual mean estimation: Wishful scenario
Algorithm 3 is the algorithm for the wishful scenario where:
- we already have a prior estimate $\tilde{\mu}$ that satisfies $|\tilde{\mu} - \mu| \le 1/\sqrt{m}$;
- every user contributes their samples in contiguous time steps; that is, user 1 contributes samples for the first $m$ time steps, followed by user 2 who contributes samples for the next $m$ time steps, and so on.
B.1 Guarantees for Algorithm 3
Let $x^u_i$ denote the $i$-th sample contributed by user $u$.
Privacy.
A user $u$ contributes at most one element to the input array; this element is $\Pi\big(\sum_{i=1}^m x^u_i\big)$, where $\Pi$ is the projection on the interval $I$ defined in (2). So, there are at most $n$ elements in the input array throughout the course of the algorithm. From the way the binary mechanism works, a given element in the input array will be used at most $\lceil\log n\rceil$ times while computing terms of the array of noisy partial sums (see Section 2.3 in the main paper). Thus, changing a user can change the $\ell_1$-norm of the array of noisy partial sums by at most $w\lceil\log n\rceil$, where $w$ is the width of the interval $I$ in (2). Hence, adding independent $\mathrm{Lap}(\sigma)$ noise (with $\sigma$ as in (3)) while computing each term of the array is sufficient to ensure that the array remains user-level $\varepsilon$-DP throughout the course of the algorithm. Since the output $\hat{\mu}_t$ is computed using the sum query, which, in turn, is a function of the array of noisy partial sums, we conclude that Algorithm 3 is user-level $\varepsilon$-DP.
Utility.
We first show that, with high probability, the projection operator plays no role throughout the algorithm. We then show the utility guarantee for $t = km$ assuming no truncation, before generalizing the guarantee to arbitrary $t$.
No truncation happens:
For a user $u$, we have from Lemma A.1 that

$$\Pr\left[\Big|\sum_{i=1}^m x^u_i - m\mu\Big| \ge \sqrt{\frac{m}{2}\log\frac{2n}{\beta}}\right] \le \frac{\beta}{n}.$$

Since $|\tilde{\mu} - \mu| \le 1/\sqrt{m}$, this gives us

$$\Pr\left[\Big|\sum_{i=1}^m x^u_i - m\tilde{\mu}\Big| \ge \sqrt{\frac{m}{2}\log\frac{2n}{\beta}} + \sqrt{m}\right] \le \frac{\beta}{n}.$$

Thus, by the union bound over the $n$ users, with probability at least $1-\beta$, we have $\sum_{i=1}^m x^u_i \in I$ for every user $u$. That is, with probability at least $1-\beta$, no truncation happens throughout the course of the algorithm.
Guarantee at $t = km$ ignoring the projection $\Pi$:
For now, consider Algorithm 3 without the projection operator in Line 9. Call it Algorithm 3-NP (NP for 'No Projection'). For Algorithm 3-NP, the following happens at $t = km$, for integer $k \ge 1$: the input array has the $k$ elements $\sum_{i=1}^m x^1_i, \ldots, \sum_{i=1}^m x^k_i$. Thus, the array of noisy partial sums also contains $k$ terms, where each term has independent $\mathrm{Lap}(\sigma)$ noise added to it. Hence, the output $\hat{S}_t$ would be computed by combining at most $\lceil\log k\rceil$ terms from this array, and so, using Lemma A.2, we have

$$\Pr\left[\Big|\hat{S}_t - \sum_{i=1}^t x_i\Big| \ge c'\sigma\sqrt{\lceil\log k\rceil}\,\log\frac{2}{\beta}\right] \le \beta.$$

Furthermore, using Lemma A.1, we have for the sample sum that

$$\Pr\left[\Big|\sum_{i=1}^t x_i - t\mu\Big| \ge \sqrt{\frac{t}{2}\log\frac{2}{\beta}}\right] \le \beta.$$

Thus, we have the following at $t = km$: with probability at least $1-2\beta$,

$$\big|\hat{S}_t - t\mu\big| \le \sqrt{\frac{t}{2}\log\frac{2}{\beta}} + c'\sigma\sqrt{\log k}\,\log\frac{2}{\beta},$$

or, dividing by $t$, we have

$$\big|\hat{\mu}_t - \mu\big| \le \sqrt{\frac{\log(2/\beta)}{2t}} + \frac{c'\sigma\sqrt{\log k}\,\log(2/\beta)}{t}. \qquad (12)$$
Guarantee at arbitrary $t$ ignoring the projection $\Pi$:
We now give the utility guarantee of Algorithm 3-NP for arbitrary $t$. Note that for any $t \in [km, (k+1)m)$, we output $\hat{\mu}_t = \hat{\mu}_{km}$. Moreover, we always have $km \ge t - m$. Thus, for $t > m$, we have, with probability at least $1-2\beta$,

$$\big|\hat{\mu}_t - \mu\big| \le \sqrt{\frac{\log(2/\beta)}{2(t-m)}} + \frac{c'\sigma\sqrt{\log n}\,\log(2/\beta)}{t-m} \qquad (\text{since } km \ge t - m).$$
Final utility guarantee for Algorithm 3 at arbitrary $t$:
The above utility guarantee was obtained for Algorithm 3-NP, the variant of Algorithm 3 without the projection operator in Line 9. Now, let $E$ be the event that no truncation happens. We already saw that no truncation happens (i.e., the projection operator plays no role) with probability at least $1-\beta$; that is, $\Pr[E] \ge 1-\beta$. Observe that, on the event $E$, the outputs of Algorithm 3 and Algorithm 3-NP coincide.

Let $F$ be the event that the guarantee above holds for Algorithm 3-NP.

We saw above that $\Pr[F] \ge 1-2\beta$. Thus, using the union bound, $\Pr[E \cap F] \ge 1-3\beta$. That is, for Algorithm 3, we have for $t \ge 2m$ (using $t - m \ge t/2$), with probability at least $1-3\beta$,

$$\big|\hat{\mu}_t - \mu\big| \le \sqrt{\frac{\log(2/\beta)}{t}} + \frac{2c'\sigma\sqrt{\log n}\,\log(2/\beta)}{t}.$$

Substituting the value of $\sigma$ from (3), we get that for $t \ge 2m$, with probability at least $1-3\beta$,

$$\big|\hat{\mu}_t - \mu\big| = O\!\left(\sqrt{\frac{\log(2/\beta)}{t}} + \frac{\sqrt{m\log(mn/\beta)}\,\log^{1.5} n\,\log(2/\beta)}{t\,\varepsilon}\right).$$

This guarantee holds (trivially) for $t < 2m$ as well because, for small $t$, the algorithm outputs the prior estimate (Lines 3-5), which gives us $|\hat{\mu}_t - \mu| \le 1/\sqrt{m}$.
Remark: Throughout this section, we assumed that we were given a prior estimate $\tilde{\mu}$ satisfying $|\tilde{\mu} - \mu| \le 1/\sqrt{m}$. Our discussion here also applies when we have a prior that satisfies this bound only with probability at least $1-\beta$ (instead of deterministically). Also, if $\tilde{\mu}$ is computed using samples from the users, it should itself be user-level DP. In particular, if $\tilde{\mu}$ is user-level $\varepsilon$-DP, the overall algorithm becomes user-level $2\varepsilon$-DP (using the composition property of DP from Lemma 2.2).
Appendix C Continual mean estimation assuming prior estimate: Single binary mechanism
Before proving the guarantees for Algorithm 4 (Theorem 3.1), we prove a claim about the exponential withhold-release pattern under arbitrary user ordering.
C.1 Exponential withhold-release
Recall the exponential withhold-release pattern. For a given user $u$, we release the first two samples $x^u_1, x^u_2$ as they arrive; then, we withhold $x^u_3$ and release a truncated version of $x^u_3 + x^u_4$ when we receive $x^u_4$; we then withhold $x^u_5, x^u_6, x^u_7$ and release a truncated version of $x^u_5 + \cdots + x^u_8$ when we receive $x^u_8$; and so on. In general, we withhold samples $x^u_{2^{j-1}+1}, \ldots, x^u_{2^j - 1}$ and release a truncated version of $x^u_{2^{j-1}+1} + \cdots + x^u_{2^j}$ when we receive the $2^j$-th sample from user $u$.
We ignore truncations for now. For a user $u$, let $s^u_j = \sum_{i=2^{j-1}+1}^{2^j} x^u_i$ denote the sum released when the $2^j$-th sample from user $u$ arrives. Let the stream have an arbitrary user order. We follow the exponential withhold-release protocol and feed the released elements to an input array. For instance, suppose we receive $x_{1,1}, x_{2,1}, x_{3,2}, x_{4,1}, x_{5,2}, x_{6,1}, \ldots$, where the second index denotes the user identity. Then, the input feed to the array looks as follows:

$$x_{1,1},\;\; x_{2,1},\;\; x_{3,2},\;\; x_{5,2},\;\; (x_{4,1} + x_{6,1}),\;\; \ldots \qquad (13)$$
Since we are only interested in computing sums, feeding $s^u_j$ to the array is equivalent to feeding information about $2^{j-1}$ samples to the array. We now claim the following.
Claim C.1.
Let the stream have an arbitrary user ordering. Suppose we follow the exponential withhold-release protocol and feed the released elements to an input array. Then, at any time $t$, the array contains information about at least $\lceil t/2 \rceil$ samples.
Proof.
Let 'R' denote 'release' and 'W' denote 'withhold'. Suppose, for now, that only a single user arrives at all time steps. Then, the withhold-release sequence looks like $R\,R\,W\,R\,W^3\,R\,W^7\,R\,\ldots$. Here, '$W^k$' denotes that 'W' occurs for the next $k$ steps in the sequence. Moreover, at every 'R', information about the samples withheld after the previous 'R' is released. Note that, at any point along this withhold-release sequence, the number of samples currently withheld is less than or equal to the number of samples whose information has been released. This is for a given user.
Now, consider a withhold-release sequence induced by an arbitrary user ordering. E.g., if the user order is $1, 1, 2, 1, 2, 1$, then the corresponding withhold-release sequence is $R_1\,R_1\,R_2\,W_1\,R_2\,R_1$ (see (13)). Here, the subscript denotes the user ID. Now, at any time $t$ in this withhold-release sequence, if we just consider the W's and R's up to time $t$ for a fixed user $u$, we will have (as argued above) that the number of samples withheld for user $u$ is less than or equal to the number of samples from user $u$ whose information has been released. This holds for every user who has appeared till time $t$. Thus, at any time $t$, the total number of samples withheld (across all users) is less than or equal to the total number of samples (across all users) whose information has been released. Hence, the array contains information about at least $\lceil t/2 \rceil$ samples. ∎
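Claim C.1 is also easy to sanity-check empirically; the following sketch draws a random user order and verifies that the number of released samples never drops below $\lceil t/2 \rceil$:

```python
import random

def check_claim_c1(num_users=5, steps=2000, seed=0):
    """Empirically check Claim C.1: under exponential withhold-release with
    a random user order, the count of released samples is >= ceil(t/2)."""
    rng = random.Random(seed)
    counts, released = {}, 0
    for t in range(1, steps + 1):
        u = rng.randrange(num_users)
        counts[u] = counts.get(u, 0) + 1
        k = counts[u]
        if k & (k - 1) == 0:               # power of two: release a batch
            released += 1 if k == 1 else k // 2
        assert released >= (t + 1) // 2, f"claim violated at t={t}"
    print("Claim C.1 held for all", steps, "steps")
```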
C.2 Proof of Theorem 3.1
We will prove the following theorem.
Theorem.
Assume that we are given a user-level $\varepsilon$-DP prior estimate $\tilde{\mu}$ of the true mean $\mu$, such that $|\tilde{\mu} - \mu| \le 1/\sqrt{m}$. Then, Algorithm 4 is $2\varepsilon$-DP. Moreover, for any given $t$, we have with probability at least $1-3\beta$ that

$$\left|\hat{\mu}_t - \mu\right| = \tilde{O}\!\left(\frac{1}{\sqrt{t}} + \frac{\sqrt{m}}{t\,\varepsilon}\right).$$
Proof.
We first prove the privacy guarantee and then prove the utility guarantee.
Privacy.
The prior estimate $\tilde{\mu}$ is given to be user-level $\varepsilon$-DP. We will now show that the array of noisy partial sums is user-level $\varepsilon$-DP throughout the course of the algorithm. Note that the output estimates are computed using the sum query, which is a function of this array. Thus, if $\tilde{\mu}$ and the array are both user-level $\varepsilon$-DP, by the composition property (Lemma 2.2), the overall output will be user-level $2\varepsilon$-DP.
Proof that the array is user-level $\varepsilon$-DP:
Since a user contributes at most $m$ samples, and does so in an exponential withhold-release pattern, there are at most $\lceil\log m\rceil + 1$ elements in the input array corresponding to any user. Since there are at most $n$ users, at most $n(\lceil\log m\rceil + 1)$ elements are added to the input array throughout the course of the algorithm.

Now, since there can be at most $n(\lceil\log m\rceil + 1)$ elements in the input array, a given element will be used at most $\lceil\log\!\big(n(\lceil\log m\rceil + 1)\big)\rceil$ times while computing the noisy partial sums. Thus, changing a user can change the $\ell_1$-norm of the array by at most $w(\lceil\log m\rceil + 1)\lceil\log\!\big(n(\lceil\log m\rceil + 1)\big)\rceil$, where $w$ is the maximum sensitivity of an element contributed by a user to the input array. As can be seen from (4), we have $w = O\big(\sqrt{m\log(mn/\beta)}\big)$. Hence, adding independent $\mathrm{Lap}(\sigma)$ noise (with $\sigma$ as in (5)) while computing each term of the array is sufficient to ensure that the array is user-level $\varepsilon$-DP.
Utility.
We first show that with high probability, the projection operators play no role throughout the course of the algorithm.
No truncation happens:
For a user $u$ and any $j \ge 1$, we have from Lemma A.1 that

$$\Pr\left[\Big|\sum_{i=2^{j-1}+1}^{2^j} x^u_i - 2^{j-1}\mu\Big| \ge \sqrt{\frac{2^{j-1}}{2}\log\frac{2}{\beta'}}\right] \le \beta'.$$

Since $|\tilde{\mu} - \mu| \le 1/\sqrt{m} \le 2^{-(j-1)/2}$, this gives us that, for any user $u$ and any $j$,

$$\Pr\left[\Big|\sum_{i=2^{j-1}+1}^{2^j} x^u_i - 2^{j-1}\tilde{\mu}\Big| \ge \sqrt{\frac{2^{j-1}}{2}\log\frac{2}{\beta'}} + \sqrt{2^{j-1}}\right] \le \beta'.$$

Note that a user contributes at most $m$ samples. Thus, at most $\lceil\log m\rceil + 1$ projection operators are applied per user. Applying the union bound (over $j$), we have that, for any user $u$, the probability that some batch of user $u$ falls outside its interval $I_j$ is at most $(\lceil\log m\rceil + 1)\beta'$.

Now, we take the union bound over the $n$ users; choosing $\beta' = \beta/\big(n(\lceil\log m\rceil + 1)\big)$ gives that all batches of all users fall inside their intervals with probability at least $1-\beta$.

Note that the projection operator $\Pi_j$ was defined as the projection on the interval $I_j$ in (4). The above shows that with probability at least $1-\beta$, the projection operators do not play any role throughout the algorithm, and thus no truncation happens.
Utility at time $t$ ignoring the projections $\Pi_j$:
For now, consider Algorithm 4 without the projection operator in Line 11. Call it Algorithm 4-NP (NP for 'No Projection'). For Algorithm 4-NP, at any time $t$, the input array has terms of the form $\sum_{i=2^{j-1}+1}^{2^j} x^u_i$ (note that such a term "contains information" about $2^{j-1}$ samples from user $u$). From Claim C.1, we have that, at time $t$, the input array contains information about at least $\lceil t/2 \rceil$ samples. Thus, the output $\hat{S}_t$ would be a sum of at least $\lceil t/2 \rceil$ samples with Laplace noises added to it.

Note that, at any time $t$, at most $n(\lceil\log m\rceil + 1)$ elements are present in the input array. This also means that at most $n(\lceil\log m\rceil + 1)$ terms are present in the array of noisy partial sums. Thus, computing $\hat{S}_t$ (using the sum query) would involve at most $\lceil\log\!\big(n(\lceil\log m\rceil + 1)\big)\rceil$ terms from the array, each carrying independent $\mathrm{Lap}(\sigma)$ noise, where $\sigma$ is as in (5).
Final utility guarantee for Algorithm 4 at time $t$:
The above utility guarantee was obtained for Algorithm 4-NP, the variant of Algorithm 4 without the projection operator in Line 11. Now, let $E$ be the event that no truncation happens. We already saw that no truncation happens (i.e., the projection operators play no role) with probability at least $1-\beta$; that is, $\Pr[E] \ge 1-\beta$. Observe that, on the event $E$, the outputs of Algorithm 4 and Algorithm 4-NP coincide.

Let $F$ be the event that the sampling and noise errors above are within their high-probability bounds.

We saw above that $\Pr[F] \ge 1-2\beta$. Thus, using the union bound, $\Pr[E \cap F] \ge 1-3\beta$. That is, for Algorithm 4, we have, with probability at least $1-3\beta$,

$$\big|\hat{\mu}_t - \mu\big| \le \sqrt{\frac{2\log(2/\beta)}{t}} + \frac{2c'\sigma\sqrt{\log\!\big(n(\lceil\log m\rceil + 1)\big)}\,\log(2/\beta)}{t} = \tilde{O}\!\left(\frac{1}{\sqrt{t}} + \frac{\sqrt{m}}{t\,\varepsilon}\right) \qquad (\text{using } \sigma \text{ from (5)}).$$
∎
Appendix D Continual mean estimation assuming prior: Multiple binary mechanisms
D.1 Proof of Theorem 3.2
We will prove the following theorem.
Theorem.
Assume that we are given a user-level $\varepsilon$-DP prior estimate $\tilde{\mu}$ of the true mean $\mu$, such that $|\tilde{\mu} - \mu| \le 1/\sqrt{m}$. Then, Algorithm 5 is $2\varepsilon$-DP. Moreover, for any given $t$, we have with probability at least $1-3\beta$ that

$$\left|\hat{\mu}_t - \mu\right| = \tilde{O}\!\left(\frac{1}{\sqrt{t}} + \frac{\sqrt{m_t}}{t\,\varepsilon}\right),$$

where $m_t$ denotes the maximum number of samples obtained from any user till time $t$, i.e., $m_t = \max_u m_t(u)$, where $m_t(u)$ is the number of samples obtained from user $u$ till time $t$.
Proof.
We first prove the privacy guarantee and then prove the utility guarantee. Let $J = \lceil\log m\rceil$.
Privacy.
The prior estimate $\tilde{\mu}$ is given to be user-level $\varepsilon$-DP. We will now show that, for each $j$, the array of noisy partial sums of $\mathcal{B}_j$ is user-level $\big(\varepsilon/(J+1)\big)$-DP. By the composition property (Lemma 2.2), this would mean that the collection of all $J+1$ such arrays is user-level $\varepsilon$-DP. Note that the output estimates are computed using the sum queries, which are functions of these arrays. Thus, if the collection of arrays is user-level $\varepsilon$-DP, and the prior estimate is also user-level $\varepsilon$-DP, the overall output is guaranteed to be user-level $2\varepsilon$-DP.
Proof that each array is user-level $\big(\varepsilon/(J+1)\big)$-DP:
Consider $\mathcal{B}_j$. A user $u$ contributes at most one element to $\mathcal{B}_j$; this element is $\Pi_j\big(\sum_{i=2^{j-1}+1}^{2^j} x^u_i\big)$, where $\Pi_j$ is the projection on the interval $I_j$ defined in (4). So, there are at most $n$ elements in $\mathcal{B}_j$ throughout the course of the algorithm. A given element will be used at most $\lceil\log n\rceil$ times while computing terms of the noisy partial-sum array of $\mathcal{B}_j$. Thus, changing a user can change the $\ell_1$-norm of this array by at most $w_j\lceil\log n\rceil$, where $w_j$ is the width of $I_j$ in (4). Hence, adding independent $\mathrm{Lap}(\sigma_j)$ noise (with $\sigma_j$ as in (7)) while computing each term is sufficient to ensure that the array remains user-level $\big(\varepsilon/(J+1)\big)$-DP throughout the course of the algorithm.
Utility.
Exactly as in the utility proof of Theorem 3.1 (Section C.2), we have that, with probability at least $1-\beta$, the projection operators do not play any role throughout the algorithm, and thus no truncation happens.
Utility at time $t$ ignoring the projections $\Pi_j$:
For now, consider Algorithm 5 without the projection operator in Line 11. Call it Algorithm 5-NP (NP for 'No Projection'). We will derive the utility at time $t$ for Algorithm 5-NP.

The only difference between Algorithm 5-NP and Algorithm 4-NP is that the $j$-th released term from a user is fed to $\mathcal{B}_j$ instead of being fed to a common binary mechanism stream. So, for Algorithm 5-NP, using Claim C.1, we have that, at time $t$, the combined input arrays contain information about at least $\lceil t/2 \rceil$ samples. Thus, the overall output would be a sum of at least $\lceil t/2 \rceil$ samples with Laplace noises added to it.

Now, since every user has contributed at most $m_t$ samples, it follows that $\mathcal{B}_j$ for $j > \lceil\log m_t\rceil$ will be empty. That is, only $\mathcal{B}_0, \ldots, \mathcal{B}_{\lceil\log m_t\rceil}$ will contribute to the sum at time $t$. Moreover, for each such $j$, since each user contributes at most one element to $\mathcal{B}_j$, there can be at most $n$ terms in its input array and in its noisy partial-sum array. Thus, computing the output at time $t$ involves at most $(\lceil\log m_t\rceil + 1)\lceil\log n\rceil$ terms across the arrays, each carrying independent $\mathrm{Lap}(\sigma_j)$ noise, where $\sigma_j$ is as in (7).
Final utility guarantee for Algorithm 5 at time $t$:
The above utility guarantee was obtained for Algorithm 5-NP, the variant of Algorithm 5 without the projection operator in Line 11. But, indeed, with probability at least $1-\beta$, no truncation happens. Proceeding as in the proof of Theorem 3.1 (Section C.2), we take a union bound and get that, with probability at least $1-3\beta$, Algorithm 5 satisfies

$$\big|\hat{\mu}_t - \mu\big| \le \sqrt{\frac{2\log(2/\beta)}{t}} + \frac{2c'\,\sigma_{\lceil\log m_t\rceil}\sqrt{(\lceil\log m_t\rceil + 1)\lceil\log n\rceil}\,\log(2/\beta)}{t} = \tilde{O}\!\left(\frac{1}{\sqrt{t}} + \frac{\sqrt{m_t}}{t\,\varepsilon}\right) \qquad (\text{from (7)}).$$
∎
Appendix E Continual mean estimation: Full algorithm
E.1 Private Median Algorithm
Claim E.1.
For any $j$, the modified private median estimation algorithm (Algorithm 7) is user-level $\varepsilon$-DP. Moreover, with probability at least $1-2\beta$, its output $\tilde{\mu}_j$ satisfies the accuracy required by $\mathcal{B}_j$, i.e., $|\tilde{\mu}_j - \mu| \le 2^{-(j-1)/2}$.
Proof.
Let $K$ denote the number of arrays formed in Lines 3-8 of Algorithm 7, and let $r$ denote the number of samples per array.
Privacy.
From the way the arrays in Algorithm 7 are created (Lines 3-8), it follows that the samples from any given user appear in at most 2 of the $K$ arrays. This is because: (i) each array contains $r$ samples; (ii) each user contributes at most $r$ samples (see the definition in Line 4); and (iii) samples from a user are added contiguously to the arrays (see Lines 5-6). Now, for each $i$, since $\bar{x}_i$ is the average of the samples in array $i$, and $q_i$ is a quantized version of $\bar{x}_i$, it follows that changing a user changes at most 2 elements out of $q_1, \ldots, q_K$. Thus, for any candidate output $y$, the cost of $y$ can vary by at most 2 if a user is changed. Since the worst-case sensitivity (w.r.t. the change of a user) of the cost is 2, the exponential mechanism with sampling probability proportional to $\exp(-\varepsilon \cdot \mathrm{cost}(y)/4)$ is $\varepsilon$-DP ([DR+14]) w.r.t. the change of a user. This proves that Algorithm 7 is user-level $\varepsilon$-DP.
Utility.
Since, for each $i$, $\bar{x}_i$ is the sample mean of $r$ Bernoulli random variables with mean $\mu$, we have by Lemma A.1 that

$$\Pr\left[|\bar{x}_i - \mu| \ge \sqrt{\frac{1}{2r}\log\frac{2K}{\beta}}\right] \le \frac{\beta}{K}.$$

Thus, by the union bound, with probability at least $1-\beta$, every $\bar{x}_i$ (and hence every quantized value $q_i$) is within $O\big(\sqrt{\log(K/\beta)/r}\big)$ of $\mu$. Since any $(1/4, 3/4)$-quantile of $q_1, \ldots, q_K$ lies between the smallest and the largest $q_i$, it follows that, with probability at least $1-\beta$, every $(1/4, 3/4)$-quantile $y$ of $q_1, \ldots, q_K$ satisfies

$$|y - \mu| = O\!\left(\sqrt{\frac{\log(K/\beta)}{r}}\right). \qquad (14)$$

Now, from Theorem 3.1 in [FS17], it follows that the exponential mechanism outputs a value which is a $(1/4, 3/4)$-quantile of $q_1, \ldots, q_K$ with probability at least $1-\beta$, provided $K$ is sufficiently large. (In the statement of Theorem 3.1 in [FS17], the condition on the number of points becomes, in our case, a lower bound on $K$ of the order $\frac{1}{\varepsilon}\log\frac{1}{\beta\gamma}$, where $\gamma$ is the quantization step, after accounting for the fact that our cost has sensitivity 2 w.r.t. the change of a user.)

If a bound holds for every $(1/4, 3/4)$-quantile of $q_1, \ldots, q_K$, it must also hold for the output whenever the output is such a quantile. (For a dataset $q_1, \ldots, q_K$, a $(1/4, 3/4)$-quantile is any $y$ such that at least a quarter of the points lie on each side of $y$.) Thus, from (14) and the fact that the output is a $(1/4, 3/4)$-quantile of $q_1, \ldots, q_K$ with probability at least $1-\beta$, we get using the union bound that, with probability at least $1-2\beta$, the output $\tilde{\mu}_j$ satisfies $|\tilde{\mu}_j - \mu| = O\big(\sqrt{\log(K/\beta)/r}\big)$.
∎
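For intuition, here is a simplified exponential-mechanism median in Python. It scores each candidate grid point by its distance from being a median and assumes each user influences at most one input value, so the score has sensitivity 1; Algorithm 7's array construction instead yields sensitivity 2, which would halve the exponent:

```python
import numpy as np

def private_median(values, eps, grid, rng=None):
    """Sketch of an exponential-mechanism median (in the spirit of
    Algorithm 7 / [FS17]): `values` stands in for the per-array averages
    q_1, ..., q_K, and `grid` is the set of quantized candidate outputs."""
    rng = rng or np.random.default_rng()
    values = np.asarray(values)
    # score(y) = |#{v >= y} - K/2|; sensitivity 1 if a user moves one entry
    scores = np.array([abs((values >= y).sum() - len(values) / 2) for y in grid])
    weights = np.exp(-eps * (scores - scores.min()) / 2)  # stabilized
    return rng.choice(grid, p=weights / weights.sum())
```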
E.2 Proof of Theorem 3.4
We will prove the following theorem.
Theorem.
Algorithm 6 for continual Bernoulli mean estimation is user-level $\varepsilon$-DP. Moreover, if at time $t$ the diversity condition (11) holds, then, with probability at least $1-\beta$,

$$\left|\hat{\mu}_t - \mu\right| = \tilde{O}\!\left(\frac{1}{\sqrt{t}} + \frac{\sqrt{m_t}}{t\,\varepsilon}\right).$$

Here, $m_t(u)$ is the number of samples obtained from user $u$ till time $t$, and $m_t = \max_u m_t(u)$.
Proof.
Let $J = \lceil\log m\rceil$.
Privacy.
Algorithm 6 uses the samples to:
- (i) compute the priors $\tilde{\mu}_j$ using Algorithm 7, for $j = 0, 1, \ldots, J$; and
- (ii) compute the noisy partial-sum arrays of the binary mechanisms $\mathcal{B}_j$, for $j = 0, 1, \ldots, J$.
The output is computed using the sum queries, which, in turn, are functions of the noisy partial-sum arrays of $\mathcal{B}_j$, $j = 0, 1, \ldots, J$.
From Claim E.1, we get that each $\tilde{\mu}_j$ is user-level $\big(\varepsilon/(2(J+1))\big)$-DP. Thus, from the composition property of DP, we get that the collection of priors $(\tilde{\mu}_0, \ldots, \tilde{\mu}_J)$ is user-level $(\varepsilon/2)$-DP.
Consider $\mathcal{B}_j$. A user contributes at most one element to $\mathcal{B}_j$; this element is $\Pi_j\big(\sum_{i=2^{j-1}+1}^{2^j} x^u_i\big)$, where $\Pi_j$ is the projection on the interval defined in (8). So, there are at most $n$ elements in $\mathcal{B}_j$ throughout the course of the algorithm. Now, a given element will be used at most $\lceil\log n\rceil$ times while computing terms of the noisy partial-sum array of $\mathcal{B}_j$. Thus, changing a user can change the $\ell_1$-norm of this array by at most $w_j\lceil\log n\rceil$, where $w_j$ is as in (9). Hence, adding independent $\mathrm{Lap}(\sigma_j)$ noise (with $\sigma_j$ as in (10)) while computing each term is sufficient to ensure that the array remains user-level $\big(\varepsilon/(2(J+1))\big)$-DP throughout the course of the algorithm. Since there are $J+1$ binary mechanisms, by the composition property of DP, we get that, overall, the collection of arrays is user-level $(\varepsilon/2)$-DP.
Since the collection of priors is user-level $(\varepsilon/2)$-DP, and the collection of arrays is user-level $(\varepsilon/2)$-DP, we again use the composition property to conclude that the output of Algorithm 6 is user-level $\varepsilon$-DP.
Utility.
At time $t$, $m_t$ is the maximum number of samples contributed by any user. We will call $\mathcal{B}_j$ "active" at time $t$ if there are a sufficient number of users and samples so that $\tilde{\mu}_j$ can be obtained using Algorithm 7 (Line 10 of Algorithm 6). Recall that we need $\tilde{\mu}_j$ to create the truncation interval $I_j$ (see (8)). We know that, for a given user $u$, the first sample goes to $\mathcal{B}_0$, the second goes to $\mathcal{B}_1$, and the batch $\sum_{i=2^{j-1}+1}^{2^j} x^u_i$ goes to $\mathcal{B}_j$ for $j \ge 1$, provided $\mathcal{B}_j$ is "active". Thus, at time $t$, since the maximum number of samples contributed by any user is $m_t$, we would like all binary mechanisms up to $\mathcal{B}_{J_t}$ to be "active", where $J_t = \lceil\log m_t\rceil$. Condition (11) guarantees that we have a sufficient number of users and samples to obtain $\tilde{\mu}_{J_t}$ via Algorithm 7, thus ensuring that every truncation required at time $t$ is indeed possible.
All priors $\tilde{\mu}_j$ are "good":
Suppose the diversity condition (11) holds. Then, for each $j \le J_t$, Algorithm 7 outputs a "good" $\tilde{\mu}_j$ satisfying

$$|\tilde{\mu}_j - \mu| \le 2^{-(j-1)/2}$$

with probability at least $1 - \beta/(J+1)$ (see Claim E.1). Thus, by the union bound, we have the following: with probability at least $1-\beta$, $\tilde{\mu}_j$ is "good" for every $j \le J_t$.
No truncation happens:
For a user $u$ and any $j \ge 1$, we have from Lemma A.1 that the batch sum $\sum_{i=2^{j-1}+1}^{2^j} x^u_i$ is within $O\big(\sqrt{2^{j-1}\log(mn/\beta)}\big)$ of its mean $2^{j-1}\mu$, except with small probability. Taking the union bound over the batches and the $n$ users, we have that, with probability at least $1-\beta$, every batch sum is within $O\big(\sqrt{2^{j-1}\log(mn/\beta)}\big)$ of its mean. Now, since all the priors $\tilde{\mu}_j$ are "good" with probability at least $1-\beta$, we get using the union bound that, with probability at least $1-2\beta$, every batch sum falls inside its truncation interval $I_j$.

Note that the projection operator $\Pi_j$ was defined as the projection on the interval $I_j$ in (8). The above shows that with probability at least $1-2\beta$, the projection operators do not play any role throughout the algorithm, and thus no truncation happens.
Utility at time $t$ ignoring the projections $\Pi_j$:
For now, consider Algorithm 6 without the projection operator in Line 22. Call it Algorithm 6-NP (NP for 'No Projection'). We will derive the utility at time $t$ for Algorithm 6-NP.

Note that Algorithm 6-NP does not require the priors $\tilde{\mu}_j$. Thus, the utility of Algorithm 6-NP is the same as the utility of Algorithm 5-NP in the proof of Theorem 3.2 (Section D.1). Hence, at time $t$, for Algorithm 6-NP, with probability at least $1-2\beta$,

$$\big|\hat{\mu}_t - \mu\big| \le \sqrt{\frac{2\log(2/\beta)}{t}} + \frac{2c'\,\sigma_{J_t}\sqrt{(J_t + 1)\lceil\log n\rceil}\,\log(2/\beta)}{t},$$

where $\sigma_j$ is as in (10).
Final utility guarantee for Algorithm 6 at time $t$:
The above utility guarantee was obtained for Algorithm 6-NP, the variant of Algorithm 6 without the projection operator in Line 22. But, as argued above, with probability at least $1-2\beta$, no truncation happens. Thus, proceeding as in the proof of Theorem 3.1 (Section C.2), we take a union bound and get that, with probability at least $1-4\beta$, Algorithm 6 satisfies

$$\big|\hat{\mu}_t - \mu\big| \le \sqrt{\frac{2\log(2/\beta)}{t}} + \frac{2c'\,\sigma_{J_t}\sqrt{(J_t + 1)\lceil\log n\rceil}\,\log(2/\beta)}{t} = \tilde{O}\!\left(\frac{1}{\sqrt{t}} + \frac{\sqrt{m_t}}{t\,\varepsilon}\right) \qquad (\text{from (10) and (8)}).$$
∎