
Continual Mean Estimation Under User-Level Privacy

Anand Jerry George* (École Polytechnique Fédérale de Lausanne), Lekshmi Ramesh (Indian Institute of Science, Bangalore), Aditya Vikram Singh (Indian Institute of Science, Bangalore), Himanshu Tyagi (Indian Institute of Science, Bangalore)
Abstract

We consider the problem of continually releasing an estimate of the population mean of a stream of samples that is user-level differentially private (DP). At each time instant, a user contributes a sample, and the users can arrive in arbitrary order. Until now, these requirements of continual release and user-level privacy were considered in isolation. But, in practice, both requirements come together, as users often contribute data repeatedly and multiple queries are made. We provide an algorithm that outputs a mean estimate at every time instant t such that the overall release is user-level \varepsilon-DP and has the following error guarantee: denoting by M_{t} the maximum number of samples contributed by a user, as long as \tilde{\Omega}(1/\varepsilon) users have M_{t}/2 samples each, the error at time t is \tilde{O}(1/\sqrt{t}+\sqrt{M_{t}}/t\varepsilon). This is a universal error guarantee, valid for all arrival patterns of the users. Furthermore, it (almost) matches the existing lower bounds for the single-release setting at all time instants when users have contributed equal numbers of samples.

1 Introduction

Aggregate queries over data sets were originally believed to maintain the privacy of data contributors. However, over the past two decades, several attacks have been proposed that manipulate the output of aggregate queries to extract information about an individual user’s data [NS08][SAW13][GAM19]. To address this, several mechanisms have been proposed that release a noisy output instead of the original query output. Remarkably, these mechanisms have been shown to preserve privacy under a mathematically rigorous privacy requirement called differential privacy [Dwo+06]. But, until recently, the analysis assumed that each user contributes a single data point, that the data set is static, and that only one query is made. Our goal in this paper is to address the more practical situation where multiple queries are made and the data set keeps getting updated between queries, using contributions from existing or new users. We provide an almost optimal private mechanism for this setting for the specific running-average query, which can easily be adapted to answer more general aggregate queries as well.

1.1 Problem formulation and some heuristics

Consider a stream (x_{1},\ldots,x_{T}) of T data points contributed by n users, where T\geq n. For simplicity, we assume that one point is contributed at each time: x_{t}\in\mathbb{R}^{d} is contributed by a user u_{t}\in[n] at time t. The maximum number of samples contributed by any user till time instant t is denoted by M_{t}, and we assume that M_{t}\leq m, i.e., each user contributes at most m samples.

We formulate our query release problem as a statistical estimation problem to identify an optimal mechanism. Specifically, we assume that each x_{t} is drawn independently from a distribution P on \mathbb{R}^{d} with unknown mean \mu. At each time step t\in[T], we are required to output an estimate \hat{\mu}_{t} for the mean of P, while guaranteeing that the sequence of outputs (\hat{\mu}_{1},\ldots,\hat{\mu}_{t}) is user-level \varepsilon-differentially private (\varepsilon-DP). Namely, the output is \varepsilon-DP with respect to the user input comprising all the data points contributed by a single user [Ami+19].

A naive application of standard differentially private mechanisms at each time step will lead to error rates with suboptimal dependence on the time window and the number of samples contributed by each user. For instance, consider the most basic setting where users draw samples independently from a Bernoulli distribution with parameter \mu. At any time t\in[T], a single user can affect the sample mean (1/t)\sum_{i=1}^{t}x_{i} by at most M_{t}/t. Adding Laplace noise with parameter M_{t}/(t\varepsilon) to this sample mean will therefore guarantee user-level \varepsilon-DP at each step (implying \varepsilon t-DP over t steps) and an error of O(1/\sqrt{t}+M_{t}/t\varepsilon). Rescaling the privacy parameter, we get an error that scales as O(1/\sqrt{t}+M_{t}/\varepsilon). While the statistical error term is optimal, the error term due to privacy is not: in fact, it does not improve as time progresses. One can do better by using mechanisms specific to the streaming setting, such as the binary mechanism [CSS10][Dwo+10], as we describe in more detail in Section 3. Indeed, we will see that the privacy error term can be improved in two respects: first, it can be made to decay as time progresses, and second, it only needs to grow sublinearly with M_{t}.
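
To make the naive baseline concrete, here is a minimal Python sketch of the per-step Laplace release described above (the function name and the interface are our own illustration, not the paper's code):

import numpy as np

rng = np.random.default_rng(0)

def naive_continual_mean(x, M, eps):
    # x:   stream of samples, each in [0, 1]
    # M:   M[t-1] = max number of samples any single user has contributed up to time t
    # eps: per-step privacy parameter
    estimates, running_sum = [], 0.0
    for t, xt in enumerate(x, start=1):
        running_sum += xt
        # one user can move the sample mean by at most M_t/t, so adding
        # Lap(M_t/(t*eps)) noise makes each individual release eps-DP
        estimates.append(running_sum / t + rng.laplace(scale=M[t - 1] / (t * eps)))
    return estimates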

However, a key challenge still remains. Each user contributes multiple samples to the stream, and different samples from the same user can arrive at arbitrary time instants. The output must depend on the arrival pattern of the users. For instance, when all samples in the stream are contributed by a single user, we cannot release much information. Indeed, changing this user’s data can potentially change any query response by a large amount, leading to increased sensitivity and the addition of large amounts of noise to guarantee user-level privacy. A better strategy is to withhold answers for a while until a new user arrives; this provides a sort of diversity advantage and reduces the amount of noise we need to add. The process of withholding alone does not, however, lead to an optimal error rate. We additionally need to control sensitivity by forming local averages of each user’s samples and then truncating these averages, as is done in the one-shot setting [Lev+21], before adding noise.

1.2 Summary of contributions

We present a modular and easy-to-implement algorithm for continual mean estimation under user-level privacy constraints. Our main algorithm has two key components: a sequence of binary mechanisms that keep track of truncated averages, and a withhold-release mechanism that decides when to update the mean estimate. The role of the binary mechanisms is similar to that in [CSS10][DRV10]: to minimize the number of outputs each data point influences. The withhold-release mechanism releases a query only when there is “sufficient diversity,” i.e., enough users have contributed to the data. In fact, in a practical implementation of our algorithm, we will need to maintain this diversity by omitting an excessive number of samples from a few users in the mean estimate evaluation. Together, these components allow us to balance the need for more data for accuracy against the need to control the sensitivity due to a single user’s data.

The resulting performance is characterized roughly as follows; see Section 3 for the formal statement.

Main Result (Informal Version).

Our algorithm provides a user-level \varepsilon-DP mechanism to output a mean estimate at every time instant t when the data has “sufficient diversity,” with error \tilde{O}(1/\sqrt{t}+\sqrt{M_{t}}/t\varepsilon), where M_{t} is the maximum number of samples contributed by any user till time step t.

We do not make any assumptions on the order in which the data points arrive from different users. The “sufficient diversity” condition in our theorem (formalized in Definition 3.3) codifies the necessity of having a sufficient number of users with a sufficient number of samples for our algorithm to achieve good accuracy while ensuring user-level privacy. Section 3 further elaborates on the difficulty of obtaining good accuracy under arbitrary user orderings and on how our exponential withhold-release mechanism helps overcome it.

1.3 Prior Work

Continually releasing query answers can leak more information compared to the single-release setting, since each data point can now be involved in multiple query answers. In addition, each user can contribute multiple data points, in which case we would like to ensure that the output of any privacy-preserving mechanism we employ remains roughly the same even when all of the data points contributed by a user are changed. There have been a few foundational works on both these fronts [Dwo+10][Jai+21][Lev+21], but a unified treatment is lacking. Indeed, addressing the problem of user-level privacy in the streaming setting was noted as an interesting open problem in [Nik13].

Privately answering count queries in the continual release setting was first addressed in [Dwo+10][CSS10], where the binary mechanism was introduced. This mechanism releases an estimate for the number of ones in a bit stream by using a small number of noisy partial sums. Compared to the naive scheme of adding noise at every time step, which leads to an error that grows linearly with T, the binary tree mechanism has error that depends only logarithmically on T. Following this, other works have explored (item-level) private continual release in other settings [Cha+12][Bol+13][DKY17][Jos+18][PAK19][Jai+21].

Releasing statistics under user-level privacy constraints was discussed in [Dwo+10][Dwo+10a], and has recently begun to gain attention [Lev+21][Cum+21][NME22]. In particular, [Lev+21] considers user-level privacy for one-shot d-dimensional mean estimation and shows that a truncation-based estimator achieves error that scales as \tilde{O}(\sqrt{d/mn}+\sqrt{d}/\sqrt{m}n\varepsilon), provided there are at least \tilde{O}(\sqrt{d}/\varepsilon) users. This requirement on the number of users was later improved to \tilde{O}(1/\varepsilon) in [NME22]. [Cum+21] takes into account heterogeneity of user distributions for mean estimation under user-level privacy constraints.

1.4 Organization

Section 2 describes the problem setup and gives a brief recap of user-level privacy and the binary mechanism. We describe our algorithm in Section 3 and give a rough sketch of its privacy and utility guarantees. Section 4 includes a discussion on the extension to d-dimensional mean estimation and on handling other distribution families. Finally, in Section 5 we discuss the optimality of our algorithm, and we end with a discussion of future directions in Section 6.

2 Preliminaries

2.1 Problem setup

We observe an input stream of the form ((x_{1},u_{1}),(x_{2},u_{2}),\ldots,(x_{T},u_{T})), where x_{t}\in\mathcal{X} is the sample and u_{t}\in[n] is the user contributing the sample x_{t}. The samples (x_{t})_{t\in[T]} are drawn independently from a distribution with unknown mean. The goal is to output an estimate \hat{\mu}_{t} of the mean for every t\in[T], such that the overall output (\hat{\mu}_{1},\ldots,\hat{\mu}_{T}) is user-level \varepsilon-DP. We present our main theorems for the case when each x_{t} is a Bernoulli random variable with unknown mean \mu. Extensions to other distribution families are discussed in Section 4.

2.2 Differential privacy

Let \sigma=\big((x_{t},u_{t})\big)_{t\in[T]} and \sigma^{\prime}=\big((x_{t}^{\prime},u_{t}^{\prime})\big)_{t\in[T]} denote two streams of inputs. We say that \sigma and \sigma^{\prime} are user-level neighbors if there exists j\in[n] such that x_{t}=x_{t}^{\prime} for every t\in[T] satisfying u_{t}\neq j. We now define the notion of a user-level \varepsilon-DP algorithm.

Definition 2.1.

An algorithm \mathcal{A}:(\mathcal{X}\times[n])^{T}\rightarrow\mathcal{Y} is said to be user-level \varepsilon-DP if for every pair of streams \sigma,\sigma^{\prime} that are user-level neighbors and every subset Y\subseteq\mathcal{Y}, \mathbb{P}(\mathcal{A}(\sigma)\in Y)\leq e^{\varepsilon}\,\mathbb{P}(\mathcal{A}(\sigma^{\prime})\in Y).

We will be using the following composition result satisfied by DP mechanisms.

Lemma 2.2.

Let \mathcal{M}_{i}:(\mathcal{X}\times[n])^{T}\rightarrow\mathcal{Y} be user-level \varepsilon_{i}-DP mechanisms for i\in[k]. Then the composition \mathcal{M}:(\mathcal{X}\times[n])^{T}\rightarrow\mathcal{Y}^{k} of these mechanisms, given by \mathcal{M}(x):=(\mathcal{M}_{1}(x),\ldots,\mathcal{M}_{k}(x)), is user-level (\sum_{i=1}^{k}\varepsilon_{i})-DP.

We now define the Laplace mechanism, a basic privacy primitive we employ. We use {\rm Lap}(b) to denote the Laplace distribution with parameter b.

Definition 2.3.

For a function f:\mathcal{X}^{n}\rightarrow\mathbb{R}^{k}, the Laplace mechanism with parameter \varepsilon is a randomized algorithm \mathcal{M}:\mathcal{X}^{n}\rightarrow\mathbb{R}^{k} with \mathcal{M}(x)=f(x)+(Z_{1},\ldots,Z_{k}), where Z_{1},\ldots,Z_{k}\overset{\rm i.i.d.}{\sim}{\rm Lap}(\Delta f/\varepsilon), \Delta f:=\max_{x,x^{\prime}}\left\lVert f(x)-f(x^{\prime})\right\rVert_{1} with the maximum over neighboring x,x^{\prime}\in\mathcal{X}^{n} (differing in one entry), and addition is coordinate-wise.

The following lemma states the privacy-utility guarantee of the Laplace mechanism.

Lemma 2.4.

The Laplace mechanism is \varepsilon-DP and guarantees

\mathbb{P}\Big(\left\lVert f(x)-\mathcal{M}(x)\right\rVert_{\infty}\geq(\Delta f/\varepsilon)\ln(k/\delta)\Big)\leq\delta\quad\forall\,\delta\in(0,1].
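
For concreteness, here is a minimal Python sketch of the Laplace mechanism of Definition 2.3 (the example query and sensitivity value are our own illustration):

import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(f, sensitivity, eps, x):
    # release f(x) plus independent Lap(sensitivity/eps) noise per coordinate
    fx = np.atleast_1d(np.asarray(f(x), dtype=float))
    return fx + rng.laplace(scale=sensitivity / eps, size=fx.shape)

# Example: a count over bits; changing one record moves the count by at most 1.
noisy_count = laplace_mechanism(np.sum, sensitivity=1.0, eps=0.5, x=np.array([0, 1, 1, 0, 1]))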

2.3 Binary mechanism

The binary mechanism (Algorithm 1) [CSS10] receives as input a stream (x_{1},x_{2},\ldots,x_{T}) and outputs a noisy sum S_{t} of the stream up to time t at every t\in[T], such that the overall output (S_{1},S_{2},\ldots,S_{T}) is \varepsilon-DP. Furthermore, for any t\in[T], the error between the output and the running sum satisfies \left\lvert S_{t}-\sum_{i=1}^{t}x_{i}\right\rvert=O\left(\frac{\Delta}{\varepsilon}\log T\sqrt{\log t}\ln\frac{1}{\delta}\right) with probability at least 1-\delta. Here, \Delta is the magnitude of the maximum possible variation of an element in the stream; e.g., if 1\leq x_{i}\leq 3 for every i\in[T], then \Delta=2.

We observe two important properties of the binary mechanism (Algorithm 1):

  • (i)

    Any stream element x_{i} is involved in computing at most (1+\log T) terms of the array {\rm NoisyPartialSums}. For example, x_{1} is needed only while computing {\rm NoisyPartialSums}[t] for t=1,2,4,8, and so on.

  • (ii)

    For any t, the output S_{t} is a sum of at most (1+\log t) terms from the array {\rm NoisyPartialSums}.

We now describe how these properties of the binary mechanism lead to privacy and utility (accuracy) guarantees.

Privacy. Since, for every t, the output S_{t} is a function of {\rm NoisyPartialSums}, it suffices to ensure that the array {\rm NoisyPartialSums} is \varepsilon-DP. From (i) above, changing a stream element will change at most (1+\log T) terms of the array {\rm NoisyPartialSums}. Moreover, since the maximum variation of a stream element is \Delta, each of these terms of {\rm NoisyPartialSums} will change by at most \Delta. Overall, changing an arbitrary stream element x_{i} changes the \ell_{1}-norm of the array {\rm NoisyPartialSums} by at most \Delta(1+\log T). Thus, to ensure that the overall stream of outputs from the binary mechanism is \varepsilon-DP, it suffices to add {\rm Lap}(\eta) noise to each term of {\rm NoisyPartialSums}, where \eta=\Delta(1+\log T)/\varepsilon.

Utility. From (ii) above, since the output S_{t} is a sum of at most (1+\log t) terms from the array {\rm NoisyPartialSums}, where each term of {\rm NoisyPartialSums} carries independent {\rm Lap}(\eta) noise, we have that, with probability at least 1-\delta, \left\lvert S_{t}-\sum_{i=1}^{t}x_{i}\right\rvert\leq\eta\sqrt{1+\log t}\ln\frac{1}{\delta}.

Algorithm 1 Binary Mechanism [CSS10]
1:(x_{t})_{t\in[T]} (stream), T (stream length), \Delta (max variation of a stream element), \varepsilon (privacy parameter)
2:Initialize {\rm Stream}, {\rm NoisyPartialSums} (arrays of length T)
3:\eta\leftarrow\Delta(1+\log T)/\varepsilon \triangleright noise parameter
4:for t=1,2,\ldots,T do
5:     {\rm Stream}[t]\leftarrow x_{t}
6:     Express t in binary form: t=\sum_{j=0}^{\lfloor\log t\rfloor}(b_{j}\cdot 2^{j}) \triangleright b_{j}\in\left\{0,1\right\}
7:     j^{\ast}\leftarrow\min\left\{j:b_{j}\neq 0\right\} \triangleright e.g., j^{\ast}=0 if t is odd
8:     {\rm NoisyPartialSums}[t]\leftarrow\left(\sum_{i=t-2^{j^{\ast}}+1}^{t}{\rm Stream}[i]\right)+{\rm Lap}(\eta) \triangleright noisy sum of the latest 2^{j^{\ast}} elements of {\rm Stream}
9:     {\rm Sum}\leftarrow 0, {\rm index}\leftarrow 0
10:     for j=\lfloor\log t\rfloor,\lfloor\log t\rfloor-1,\ldots,0 do
11:         {\rm index}\leftarrow{\rm index}+(b_{j}\cdot 2^{j})
12:         {\rm Sum}\leftarrow{\rm Sum}+b_{j}\cdot{\rm NoisyPartialSums}[{\rm index}] \triangleright add a partial sum only when b_{j}=1
13:     end for
14:     return S_{t}={\rm Sum}
15:end for

In our algorithms, we invoke an instance of the binary mechanism as {\rm BinMech}, and abstract out its functionality using the following three ingredients (a minimal code sketch of this interface follows the list):

  • {\rm BinMech.Stream}: This is an array which acts as the input stream for the binary mechanism {\rm BinMech}. In our algorithms, we will feed an element to {\rm BinMech.Stream} only at certain special time instances.

  • {\rm BinMech.NoisyPartialSums}: This array is of the same length as {\rm BinMech.Stream}. The k-th term of {\rm BinMech.NoisyPartialSums} is computed after the k-th element enters {\rm BinMech.Stream}. The scale of the Laplace noise added while computing the terms of {\rm BinMech.NoisyPartialSums} (the same for all terms) is passed as a noise parameter when invoking {\rm BinMech}. The sum output by {\rm BinMech} is formed by combining terms from {\rm BinMech.NoisyPartialSums}.

  • {\rm BinMech.Sum}: Suppose, at time t, {\rm BinMech.Stream} (and, thus, {\rm BinMech.NoisyPartialSums}) has k elements. Then, {\rm BinMech.Sum} is a function which, when invoked at time t, outputs S_{k} (a noisy sum of all k elements in {\rm BinMech.Stream}) by combining terms stored in {\rm BinMech.NoisyPartialSums}.
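
The following Python sketch (our own illustration, assuming the noise level \eta is supplied by the caller) shows one way to realize this interface; it is not the authors' code:

import numpy as np

rng = np.random.default_rng(0)

class BinMech:
    def __init__(self, eta):
        self.eta = eta                   # Laplace scale for every noisy partial sum
        self.stream = []                 # BinMech.Stream
        self.noisy_partial_sums = []     # BinMech.NoisyPartialSums

    def feed(self, x):
        # append x and record the noisy sum of the latest 2**j_star elements,
        # where 2**j_star is the lowest set bit of the current length t
        self.stream.append(x)
        t = len(self.stream)
        j_star = (t & -t).bit_length() - 1
        block = sum(self.stream[t - 2 ** j_star : t])
        self.noisy_partial_sums.append(block + rng.laplace(scale=self.eta))

    def sum(self):
        # BinMech.Sum: combine at most 1 + log2(t) noisy partial sums
        t, total, index = len(self.stream), 0.0, 0
        for j in range(t.bit_length() - 1, -1, -1):
            if (t >> j) & 1:
                index += 2 ** j
                total += self.noisy_partial_sums[index - 1]
        return total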

3 Algorithm

In this section, we build up to our main algorithm (Algorithm 6), pointing out the main ideas along the way.

A naive use of the binary mechanism.

Given a stream \big((x_{t},u_{t})\big)_{t\in[T]} (where x_{t}\overset{\rm i.i.d.}{\sim}\mathrm{Ber}(\mu)), Algorithm 2 presents a straightforward way to estimate the mean \mu in the continual-release user-level DP setting. The algorithm feeds x_{t} to the binary mechanism {\rm BinMech} at every time instant t. Since each user contributes at most m samples, changing a user will change at most m(1+\log T) terms of {\rm BinMech.NoisyPartialSums}. Moreover, since the maximum variation of any stream element x_{t} is 1, changing a user will change the \ell_{1}-norm of {\rm BinMech.NoisyPartialSums} by at most m(1+\log T). Thus, for \eta=\frac{m(1+\log T)}{\varepsilon}, adding {\rm Lap}(\eta) noise to each term of {\rm BinMech.NoisyPartialSums} ensures that this algorithm is user-level \varepsilon-DP. To obtain an accuracy guarantee at any given time t, we observe the following: since x_{1},\ldots,x_{t} are independent \mathrm{Ber}(\mu) samples, we have that, with probability at least 1-\delta, \left\lvert\sum_{i=1}^{t}x_{i}-t\mu\right\rvert\leq\sqrt{\frac{t}{2}\ln\frac{2}{\delta}}. Moreover, since S_{t} is a sum of at most (1+\log t) terms from {\rm BinMech.NoisyPartialSums}, where each term of {\rm BinMech.NoisyPartialSums} carries independent {\rm Lap}(\eta) noise, we have that, with probability at least 1-\delta, \left\lvert S_{t}-\sum_{i=1}^{t}x_{i}\right\rvert\leq\eta\sqrt{1+\log t}\ln\frac{1}{\delta}. Thus, using the union bound, we have with probability at least 1-2\delta that

\left\lvert\hat{\mu}_{t}-\mu\right\rvert=O\left(\sqrt{\frac{1}{t}\ln\frac{1}{\delta}}+\frac{m}{t\varepsilon}\sqrt{\log t}\log T\ln\frac{1}{\delta}\right). (1)

The first term on the right-hand side of (1) is the usual statistical error that results from using t samples to estimate \mu. The second term in the error (the “privacy error”), however, results from ensuring that the stream of estimates (\hat{\mu}_{1},\ldots,\hat{\mu}_{T}) is user-level \varepsilon-DP; note that this term is linear in m. We next describe a “wishful” scenario where the privacy error can be made proportional to \sqrt{m}. We will then see how to obtain such a result in the general scenario, which is our main contribution.

Algorithm 2 Continual mean estimation (naive)
1:\big((x_{t},u_{t})\big)_{t\in[T]} (stream), T (stream length), m (max no. of samples per user), \varepsilon (privacy parameter), \delta (failure probability).
2:Initialize binary mechanism {\rm BinMech} with noise level \eta=\frac{m(1+\log T)}{\varepsilon}.
3:for t=1,2,\ldots,T do
4:     {\rm BinMech.Stream}\leftarrow{\rm BinMech.Stream}\cup\left\{x_{t}\right\}
5:     S_{t}\leftarrow{\rm BinMech.Sum}
6:     return \hat{\mu}_{t}=\frac{S_{t}}{t}
7:end for

A better algorithm in a wishful scenario: Exploiting concentration.

Naive use of the binary mechanism for continual mean estimation fails to exploit the concentration phenomenon that results from each user contributing multiple i.i.d. samples to the stream. To see how concentration might help, consider the following scenario. Suppose that (somehow) we already have a prior estimate \tilde{\mu} that satisfies \left\lvert\tilde{\mu}-\mu\right\rvert\leq\frac{1}{\sqrt{m}}. Also, assume a user ordering where every user contributes their m samples in contiguous time steps. That is, user 1 contributes samples for the first m time steps, followed by user 2, who contributes samples for the next m time steps, and so on. In this case, Algorithm 3 presents a way to exploit the concentration phenomenon. Even though this algorithm outputs \hat{\mu}_{t} at every t, it only updates \hat{\mu}_{t} at t=m,2m,3m,\ldots,nm. Upon receiving a sample from a user, the algorithm does not immediately add it to {\rm BinMech.Stream}. Instead, the algorithm waits for a user to contribute all their m samples. It then computes the sum of those m samples and projects the sum onto the interval

\mathcal{I}=\left[m\tilde{\mu}-\Delta,\,m\tilde{\mu}+\Delta\right]\text{ where }\Delta=\sqrt{\frac{m}{2}\ln\frac{2n}{\delta}}+\sqrt{m}, (2)

before feeding the projected sum to {\rm BinMech.Stream}. By concentration of sums of i.i.d. Bernoulli random variables, and the fact that \left\lvert\tilde{\mu}-\mu\right\rvert\leq\frac{1}{\sqrt{m}} (by assumption), we have with probability at least 1-\delta that the sum of the m samples corresponding to every user falls inside the interval \mathcal{I}; thus, the projection operator \Pi(\cdot) has no effect with high probability.

Since there are n users, at most n elements are added to {\rm BinMech.Stream} throughout the course of the algorithm. Now, since there can be at most n elements in {\rm BinMech.Stream}, a given element in {\rm BinMech.Stream} will be used at most 1+\log n times while computing {\rm BinMech.NoisyPartialSums} (see Section 2.3). So, changing a user can change the \ell_{1}-norm of {\rm BinMech.NoisyPartialSums} by at most (2\Delta)(1+\log n), where \Delta is as in (2). Thus, to ensure that the algorithm is \varepsilon-DP (given initial estimate \tilde{\mu}), it suffices to add independent {\rm Lap}(\eta) noise while computing each term in {\rm BinMech.NoisyPartialSums}, where

\eta=\frac{2\Delta(1+\log n)}{\varepsilon}=\frac{2\left(\sqrt{\frac{m}{2}\ln\frac{2n}{\delta}}+\sqrt{m}\right)(1+\log n)}{\varepsilon}. (3)

Note that \eta defined above is proportional to \sqrt{m}, whereas the \eta defined in the naive use of the binary mechanism was proportional to m. Thus (details in Appendix B), one obtains a privacy error of \tilde{O}(\sqrt{m}/t\varepsilon), while still having a statistical error of O(1/\sqrt{t}) at every t.

Algorithm 3 Continual mean estimation (wishful scenario)
1:\big((x_{t},u_{t})\big)_{t\in[T]} (stream with the m samples from any user arriving contiguously), n (no. of users), m (no. of samples per user), \varepsilon (privacy), \tilde{\mu} (estimate of \mu s.t. \left\lvert\tilde{\mu}-\mu\right\rvert\leq 1/\sqrt{m}), \delta (failure probability).
2:Let \Pi(\cdot) be the projection onto the interval \mathcal{I}, where
\mathcal{I}=\left[m\tilde{\mu}-\Delta,\,m\tilde{\mu}+\Delta\right]\text{ where }\Delta=\sqrt{\frac{m}{2}\ln\frac{2n}{\delta}}+\sqrt{m}.
3:Initialize binary mechanism {\rm BinMech} with noise level \eta, where
\eta=\frac{2\Delta(1+\log n)}{\varepsilon}.
4:Initialize {\rm total}\leftarrow 0.
5:for t<m do
6:     return \hat{\mu}_{t}=\tilde{\mu}
7:end for
8:for t\geq m do
9:     if t\in\left\{m,2m,3m,\ldots,nm\right\} then
10:         {\rm total}\leftarrow{\rm total}+m
11:         \sigma=\Pi\left(\sum_{j=t-m+1}^{t}x_{j}\right)
12:         {\rm BinMech.Stream}\leftarrow{\rm BinMech.Stream}\cup\left\{\sigma\right\}
13:     end if
14:     S_{t}\leftarrow{\rm BinMech.Sum}
15:     return \hat{\mu}_{t}=\frac{S_{t}}{\rm total}
16:end for

The scenario here is wishful for two reasons: (i) we assume a prior estimate \tilde{\mu}, which we used to form the interval (2); (ii) we assume a special user ordering where every user contributes their samples contiguously. This user ordering ensures that \hat{\mu}_{t} is updated after every m time steps, and thus, for every t, at least t/2 samples are used in computing \hat{\mu}_{t}; this is the reason that the statistical error remains O(1/\sqrt{t}). Note that this algorithm can perform very poorly under a general user ordering. For instance, if the users contribute samples in a round-robin fashion (where a sample from user 1 is followed by a sample from user 2 and so on), then the algorithm would have to wait until time t=n(m-1) to obtain m samples from any given user.

Although wishful, there are two main ideas that we take away from this discussion. One is the idea of withholding: it is not necessary to incorporate information about every sample received till time t to compute \hat{\mu}_{t}. The second is the idea of truncation: the worst-case variation in the quantity of interest due to a change in a user’s data can be reduced by projecting the user’s contributions onto a smaller interval. This idea of exploiting concentration using truncation is also used in [Lev+21] for mean estimation in the single-release setting.

Designing algorithms for worst-case user order: Exponential withhold-release pattern.

The algorithm discussed above suffered in the worst-case user ordering since samples from a user were “withheld” until every sample from that user was obtained. This was done to exploit (using “truncation”) the concentration of the sum of m i.i.d. samples from the user. To exploit the concentration of sums of i.i.d. samples in the setting where the user order can be arbitrary, we propose the idea of withholding the samples (and applying truncation) at exponentially increasing intervals. Namely, for a given user u, we do not withhold the first two samples x_{1}^{(u)},x_{2}^{(u)}; then, we withhold x_{3}^{(u)} and release a truncated version of (x_{3}^{(u)}+x_{4}^{(u)}) when we receive x_{4}^{(u)}; we then withhold x_{5}^{(u)},x_{6}^{(u)},x_{7}^{(u)} and release a truncated version of (x_{5}^{(u)}+x_{6}^{(u)}+x_{7}^{(u)}+x_{8}^{(u)}) when we receive x_{8}^{(u)}; and so on. In general, we withhold samples x_{2^{\ell-1}+1}^{(u)},\ldots,x_{2^{\ell}-1}^{(u)} and release a truncated version of \sum_{i=2^{\ell-1}+1}^{2^{\ell}}x_{i}^{(u)} when we receive the 2^{\ell}-th sample x_{2^{\ell}}^{(u)} from user u.
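
The resulting per-user release schedule is easy to state in code. The following Python sketch (our own illustration) lists, for a user contributing samples 1,\ldots,m, which arriving sample triggers a release and which block of sample indices that release covers:

def release_blocks(m):
    # returns (trigger, block) pairs: when sample number `trigger` (a power
    # of two) arrives, the (truncated) sum over indices in `block` is released
    blocks, p = [(1, [1])], 2
    while p <= m:
        blocks.append((p, list(range(p // 2 + 1, p + 1))))
        p *= 2
    return blocks

# release_blocks(8) -> [(1, [1]), (2, [2]), (4, [3, 4]), (8, [5, 6, 7, 8])]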

We now present Algorithm 4, which uses this exponential withhold-release idea and, assuming a prior \tilde{\mu}, outputs an estimate with statistical error O(1/\sqrt{t}) and privacy error \tilde{O}(\sqrt{m}/t\varepsilon) for arbitrary user orderings.

Algorithm 4 Continual mean estimation assuming prior (single binary mechanism)
1:\big((x_{t},u_{t})\big)_{t\in[T]} (stream), n (max no. of users), m (max no. of samples per user), \varepsilon (privacy parameter), \tilde{\mu} (estimate of \mu satisfying \left\lvert\tilde{\mu}-\mu\right\rvert\leq 1/\sqrt{m}), \delta (failure probability).
2:- Let x^{(u)}_{j} denote the j-th sample contributed by user u.
3:- For \ell\geq 1, let \Pi_{\ell}(\cdot) be the projection onto the interval \mathcal{I}_{\ell}, where
\mathcal{I}_{\ell}:=\left[2^{\ell-1}\tilde{\mu}-\Delta_{\ell},\,2^{\ell-1}\tilde{\mu}+\Delta_{\ell}\right],\quad\Delta_{\ell}=\sqrt{\frac{2^{\ell-1}}{2}\ln\frac{2n\log m}{\delta}}+\frac{2^{\ell-1}}{\sqrt{m}}.
4:Initialize binary mechanism {\rm BinMech} with noise level \eta(m,n,\delta), where
\eta(m,n,\delta)=\frac{2\Delta(1+\log m)\log(1+n(1+\log m))}{\varepsilon},\quad\Delta=\sqrt{\frac{m}{2}\ln\frac{2n\log m}{\delta}}+\sqrt{m}.
5:For each user u, let M(u)= no. of times user u has contributed a sample; initialize M(u)\leftarrow 0.
6:Initialize {\rm total}\leftarrow 0
7:for t=1,2,\ldots,T do
8:     M(u_{t})\leftarrow M(u_{t})+1
9:     if \log M(u_{t})\in\mathbb{Z}_{+} then
10:         \ell\leftarrow\log M(u_{t}) \triangleright i.e., M(u_{t})=2^{\ell}
11:         {\rm total}\leftarrow{\rm total}+M(u_{t})
12:         if \ell=0 then \sigma\leftarrow x_{t}
13:         else if \ell\in\left\{1,2,3,\ldots\right\} then
14:              \sigma\leftarrow\Pi_{\ell}\left(\sum_{j=2^{\ell-1}+1}^{2^{\ell}}x^{(u_{t})}_{j}\right)
15:         end if
16:         {\rm BinMech.Stream}\leftarrow{\rm BinMech.Stream}\cup\left\{\sigma\right\}
17:     end if
18:     S_{t}\leftarrow{\rm BinMech.Sum}
19:     return \hat{\mu}_{t}=\frac{S_{t}}{\rm total}
20:end for

Continual mean estimation assuming prior estimate: Algorithm 4.

Let x^{(u)}_{j} be the j-th sample obtained from user u. Let \Pi_{\ell}(\cdot) be the projection onto the interval \mathcal{I}_{\ell}, where

\mathcal{I}_{\ell}:=\left[2^{\ell-1}\tilde{\mu}-\Delta_{\ell},\,2^{\ell-1}\tilde{\mu}+\Delta_{\ell}\right],\quad\Delta_{\ell}=\sqrt{\frac{2^{\ell-1}}{2}\ln\frac{2n\log m}{\delta}}+\frac{2^{\ell-1}}{\sqrt{m}}. (4)

Let u_{t} be the user at time t. Upon receiving a sample from a user, the algorithm does not immediately add it to {\rm BinMech.Stream}. Instead, the algorithm only adds anything new to {\rm BinMech.Stream} at time instances t when the total number of samples obtained from user u_{t} becomes 2^{\ell}, for \ell\in\left\{0,1,2,\ldots\right\}. At such a time, the algorithm adds \Pi_{\ell}\left(\sum_{j=2^{\ell-1}+1}^{2^{\ell}}x^{(u_{t})}_{j}\right) to {\rm BinMech.Stream}. (The summation inside \Pi_{\ell} makes sense only for \ell\geq 1; for \ell=0, which corresponds to the first sample from user u_{t}, the algorithm simply adds the sample to {\rm BinMech.Stream}.) This has the effect that, corresponding to a user, there are at most (1+\log m) elements in {\rm BinMech.Stream}. Since there are at most n users, at most n(1+\log m) elements are added to {\rm BinMech.Stream} throughout the course of the algorithm.

Now, since there can be at most n(1+\log m) elements in {\rm BinMech.Stream}, a given element in {\rm BinMech.Stream} will be used at most \log(n(1+\log m)) times while computing {\rm BinMech.NoisyPartialSums} (see Section 2.3). Thus, changing a user can change the \ell_{1}-norm of {\rm BinMech.NoisyPartialSums} by at most (1+\log m)(\log(n(1+\log m)))\Delta, where \Delta is the maximum sensitivity of an element contributed by a user to {\rm BinMech.Stream}. As can be seen from (4), we have \Delta=2\left(\sqrt{\frac{m}{2}\ln\frac{2n\log m}{\delta}}+\sqrt{m}\right). Thus, to ensure that Algorithm 4 is \varepsilon-DP (given initial estimate \tilde{\mu}), it suffices to add independent {\rm Lap}(\eta(m,n,\delta)) noise while computing each term in {\rm BinMech.NoisyPartialSums}, where

\eta(m,n,\delta)=\frac{2\Delta(1+\log m)\log(1+n(1+\log m))}{\varepsilon},\quad\Delta=\sqrt{\frac{m}{2}\ln\frac{2n\log m}{\delta}}+\sqrt{m}. (5)

Note that, as in (3), the magnitude of the noise in (5) is \tilde{O}(\sqrt{m}). Moreover, with the exponential withhold-release pattern, we are guaranteed that at any time t, the estimate \hat{\mu}_{t} is computed using at least t/2 samples, no matter what the user ordering is (Claim C.1 in Appendix C). This gives us the following guarantee (proof in Appendix C.2):

Theorem 3.1.

Assume that we are given a user-level \varepsilon-DP prior estimate \tilde{\mu} of the true mean \mu, such that \left\lvert\tilde{\mu}-\mu\right\rvert\leq\frac{1}{\sqrt{m}}. Then, Algorithm 4 is 2\varepsilon-DP. Moreover, for a given t\in[T], we have with probability at least 1-2\delta,

\left\lvert\hat{\mu}_{t}-\mu\right\rvert=\tilde{O}\left(\frac{1}{\sqrt{t}}+\frac{\sqrt{m}}{t\varepsilon}\right).

We now present Algorithm 5, which, in conjunction with exponential withhold-release, uses multiple binary mechanisms. Assuming a prior estimate \tilde{\mu}, this algorithm outputs an estimate with statistical error O(1/\sqrt{t}) and privacy error \tilde{O}(\sqrt{M_{t}}/t\varepsilon), where M_{t} is the maximum number of samples contributed by a user till time t. Note that M_{t} can be much smaller than m for large values of m.

Continual mean estimation assuming prior estimate: Algorithm 5.

In this algorithm, we use L+1 binary mechanisms {\rm BinMech}[0],\ldots,{\rm BinMech}[L], where L:=\lceil\log m\rceil. As was the case with Algorithm 4, here too the algorithm only adds anything new to a binary mechanism at time instances t when the total number of samples obtained from user u_{t} becomes 2^{\ell}, for \ell\in\left\{0,1,2,\ldots\right\}. What differentiates Algorithm 5 from Algorithm 4 is that we use different binary mechanisms for different values of \ell. Namely, the element \Pi_{\ell}\left(\sum_{j=2^{\ell-1}+1}^{2^{\ell}}x^{(u)}_{j}\right) from a user u is added to {\rm BinMech}[\ell].{\rm Stream}. This ensures that each element in {\rm BinMech}[\ell].{\rm Stream} has maximum sensitivity 2\Delta_{\ell}, where (from (4)) \Delta_{\ell}=\sqrt{\frac{2^{\ell-1}}{2}\ln\frac{2n\log m}{\delta}}+\frac{2^{\ell-1}}{\sqrt{m}}.
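
To see concretely how the noise adapts to the levels, the following Python sketch (our own transcription of the scale in (7) below, taking the tree-depth logarithms to base 2 as an assumption) computes the per-level Laplace parameter:

import math

def eta_level(m, n, level, eps, delta):
    # Laplace scale for BinMech[level]: each element of BinMech[level].Stream
    # is a (truncated) sum of 2**(level-1) samples, so its sensitivity, and
    # hence the noise, grows like sqrt(2**level)
    L = math.ceil(math.log2(m))
    block = 2.0 ** (level - 1)
    delta_l = math.sqrt(block / 2 * math.log(2 * n * math.log2(m) / delta)) + math.sqrt(block)
    return 2 * delta_l * (1 + math.log2(n)) * (L + 1) / eps

# Only levels up to ceil(log2(M_t)) are in use at time t, so the largest
# active noise scale is proportional to sqrt(M_t).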

Algorithm 5 Continual mean estimation assuming prior (multiple binary mechanisms)
1:\big((x_{t},u_{t})\big)_{t\in[T]} (stream), n (max no. of users), m (max no. of samples per user), \varepsilon (privacy parameter), \tilde{\mu} (estimate of \mu satisfying \left\lvert\tilde{\mu}-\mu\right\rvert\leq 1/\sqrt{m}), \delta (failure probability).
2:- Let x^{(u)}_{j} denote the j-th sample contributed by user u.
3:- For \ell\in\mathbb{Z}_{+}, let \Pi_{\ell}(\cdot) be the projection onto the interval \mathcal{I}_{\ell}, where
\mathcal{I}_{\ell}:=\left[2^{\ell-1}\tilde{\mu}-\Delta_{\ell},\,2^{\ell-1}\tilde{\mu}+\Delta_{\ell}\right],\quad\Delta_{\ell}=\sqrt{\frac{2^{\ell-1}}{2}\ln\frac{2n\log m}{\delta}}+\frac{2^{\ell-1}}{\sqrt{m}}.
4:- Let L:=\lceil\log m\rceil.
5:Initialize L+1 binary mechanisms, labelled {\rm BinMech}[0],\ldots,{\rm BinMech}[L], where {\rm BinMech}[\ell] is initialized with noise level \eta(m,n,\ell,\delta), where
\eta(m,n,\ell,\delta)=\frac{2\left(\sqrt{\frac{2^{\ell-1}}{2}\ln\frac{2n\log m}{\delta}}+\sqrt{2^{\ell-1}}\right)(1+\log n)}{\varepsilon/(L+1)}.
6:For each user u, let M(u)= no. of times user u has contributed a sample; initialize M(u)\leftarrow 0.
7:Initialize {\rm total}\leftarrow 0
8:for t=1,2,\ldots,T do
9:     M(u_{t})\leftarrow M(u_{t})+1
10:     if \log M(u_{t})\in\mathbb{Z}_{+} then
11:         \ell\leftarrow\log M(u_{t})
12:         {\rm total}\leftarrow{\rm total}+M(u_{t})
13:         if \ell=0 then \sigma\leftarrow x_{t}
14:         else if \ell\in\left\{1,2,3,\ldots\right\} then
15:              \sigma\leftarrow\Pi_{\ell}\left(\sum_{j=2^{\ell-1}+1}^{2^{\ell}}x^{(u_{t})}_{j}\right)
16:         end if
17:         {\rm BinMech}[\ell].{\rm Stream}\leftarrow{\rm BinMech}[\ell].{\rm Stream}\cup\left\{\sigma\right\}
18:     end if
19:     S_{t}\leftarrow\sum_{i=0}^{L}{\rm BinMech}[i].{\rm Sum}
20:     return \hat{\mu}_{t}=\frac{S_{t}}{\rm total}
21:end for

Note that

\Delta_{\ell}\leq\sqrt{\frac{2^{\ell-1}}{2}\ln\frac{2n\log m}{\delta}}+\sqrt{2^{\ell-1}}, (6)

where the inequality holds because m\geq 2^{\ell-1}. To ensure that Algorithm 5 is \varepsilon-DP, we will make each binary mechanism \frac{\varepsilon}{L+1}-DP. For any \ell\in\left\{0,\ldots,L\right\}, since each user contributes at most one element to {\rm BinMech}[\ell].{\rm Stream}, there are at most n elements in {\rm BinMech}[\ell].{\rm Stream} throughout the course of the algorithm. Moreover, since every element in {\rm BinMech}[\ell].{\rm Stream} has sensitivity at most 2\Delta_{\ell}, it suffices to add independent {\rm Lap}(\eta(m,n,\ell,\delta)) noise while computing each term in {\rm BinMech}[\ell].{\rm NoisyPartialSums}, where

\eta(m,n,\ell,\delta)=\frac{2\left(\sqrt{\frac{2^{\ell-1}}{2}\ln\frac{2n\log m}{\delta}}+\sqrt{2^{\ell-1}}\right)(1+\log n)}{\varepsilon/(L+1)}. (7)

Comparing \eta(m,n,\ell,\delta) above to \eta(m,n,\delta) in (5), we see that using multiple binary mechanisms allows us to add noise whose magnitude is fine-tuned to the exponential withhold-release pattern. In particular, if M_{t} is the maximum number of samples contributed by any user till time t, then at most \lceil\log M_{t}\rceil binary mechanisms are in use, and thus the maximum active noise level \eta(m,n,\ell,\delta) is proportional to \sqrt{M_{t}}. This gives us the following guarantee (proof in Appendix D):

Theorem 3.2.

Assume that we are given a user-level \varepsilon-DP prior estimate \tilde{\mu} of the true mean \mu, such that \left\lvert\tilde{\mu}-\mu\right\rvert\leq\frac{1}{\sqrt{m}}. Then, Algorithm 5 is 2\varepsilon-DP. Moreover, for any given t\in[T], we have with probability at least 1-\delta that

\left\lvert\hat{\mu}_{t}-\mu\right\rvert=\tilde{O}\left(\frac{1}{\sqrt{t}}+\frac{\sqrt{M_{t}}}{t\varepsilon}\right),

where M_{t} denotes the maximum number of samples obtained from any user till time t, i.e., M_{t}=\max\left\{m_{u}(t):u\in[n]\right\}, where m_{u}(t) is the number of samples obtained from user u till time t.

We now present our final algorithm, Algorithm 6, which does not assume a prior estimate. Observe that the prior estimate \tilde{\mu} was needed in the previous algorithms only to form the truncation intervals \mathcal{I}_{\ell} (see (4)). Algorithm 6 includes estimating \tilde{\mu} as a subroutine. In fact, as we describe next, the algorithm estimates a separate prior for each binary mechanism.

Continual mean estimation without assuming prior estimate: Algorithm 6.

Since the algorithm uses L+1 (where L:=\lceil\log m\rceil) binary mechanisms, we need not estimate \tilde{\mu} up to accuracy O(1/\sqrt{m}) in one go. Instead, we can have a separate prior \tilde{\mu}_{\ell} for each binary mechanism, which is helpful since {\rm BinMech}[\ell] requires \tilde{\mu}_{\ell} only up to an accuracy of O(1/\sqrt{2^{\ell-1}}) (see (6)).

In this algorithm, we mark {\rm BinMech}[\ell] as “inactive” till we have a sufficient number of users with a sufficient number of samples to estimate a user-level \frac{\varepsilon}{2L}-DP prior \tilde{\mu}_{\ell} up to an accuracy of \tilde{O}(1/\sqrt{2^{\ell-1}}). While {\rm BinMech}[\ell] is inactive, we store the elements that require \tilde{\mu}_{\ell} for truncation in {\rm Buffer}[\ell] (see Lines 28-29; each element of {\rm Buffer}[\ell] is a sum of 2^{\ell-1} samples, which we cannot truncate yet). Once we have a sufficient number of users with a sufficient number of samples (Line 12 lists the exact condition), we use a private median estimation algorithm (Algorithm 7) from [FS17] to estimate \tilde{\mu}_{\ell}. At this point, we use \tilde{\mu}_{\ell} to truncate the elements stored in {\rm Buffer}[\ell] and pass them to {\rm BinMech}[\ell].{\rm Stream} (Line 14).

Algorithm 6 Continual mean estimation: Full algorithm
1:\big((x_{t},u_{t})\big)_{t\in[T]} (stream), n (max no. of users), m (max no. of samples per user), \varepsilon (privacy parameter), \delta (failure probability)
2:Let x^{(u)}_{j} denote the j-th sample contributed by user u. Let L:=\lceil\log m\rceil. For \ell\geq 1, let \Pi_{\ell}(\cdot) be the projection onto the interval \mathcal{I}_{\ell} defined as
\mathcal{I}_{\ell}=\left[2^{\ell-1}\tilde{\mu}_{\ell}-\Delta_{\ell},\,2^{\ell-1}\tilde{\mu}_{\ell}+\Delta_{\ell}\right] (8)
where \tilde{\mu}_{\ell} is as in Line 13, and
\Delta_{\ell}=\sqrt{\frac{2^{\ell-1}}{2}\ln\frac{2n\log m}{\delta/3}}+\sqrt{2^{\ell}\ln\frac{2k(\frac{\varepsilon}{2L},\ell,\frac{\delta}{3L})}{\delta/3L}},\quad\text{where }k(\varepsilon^{\prime},\ell,\beta):=\frac{16}{\varepsilon^{\prime}}\ln\frac{2^{\ell/2}}{\beta}. (9)
3:Initialize L+1 binary mechanisms, labelled {\rm BinMech}[0],\ldots,{\rm BinMech}[L], where {\rm BinMech}[\ell] is initialized with noise level \eta(m,n,\ell,\delta) defined as
\eta(m,n,\ell,\delta)=\frac{2\Delta_{\ell}(1+\log n)}{\varepsilon/2(L+1)} (10)
4:Initialize {\rm Inactive}\leftarrow\left\{2,\ldots,L\right\}.
5:Initialize L-1 buffers labelled {\rm Buffer}[2],\ldots,{\rm Buffer}[L].
6:For each user u, let M(u)= no. of times user u has contributed a sample; initialize M(u)\leftarrow 0.
7:Initialize {\rm total}\leftarrow 0
8:for t=1,2,\ldots,T do
9:     M(u_{t})\leftarrow M(u_{t})+1
10:\% Activating binary mechanisms (if possible):
11:     for \ell\in{\rm Inactive} do
12:         if \sum_{u=1}^{n}\min\left\{M(u),2^{\ell-1}\right\}\geq 2^{\ell-1}\frac{16}{\varepsilon}\left(2L\ln\frac{3L\cdot 2^{\ell/2}}{\delta}\right) then
13:              \tilde{\mu}_{\ell}={\rm PrivateMedian}\left((x_{i},u_{i})_{i=1}^{t},\frac{\varepsilon}{2L},\ell,\frac{\delta}{3L}\right) \triangleright Algorithm 7
14:              {\rm BinMech}[\ell].{\rm Stream}\leftarrow\Pi_{\ell}\left({\rm Buffer}[\ell]\right)
15:              {\rm total}\leftarrow{\rm total}+2^{\ell-1}{\rm Size}\left({\rm Buffer}[\ell]\right); remove \ell from {\rm Inactive}
16:         end if
17:     end for
18:\%
19:     if \log M(u_{t})\in\mathbb{Z}_{+} then
20:         \ell\leftarrow\log M(u_{t})
21:         if \ell=0 then \sigma\leftarrow x_{t}
22:         else if \ell\in\left\{1,2,\ldots,L\right\} then
23:              \sigma\leftarrow\sum_{j=2^{\ell-1}+1}^{2^{\ell}}x^{(u_{t})}_{j}
24:         end if
25:         if \ell\notin{\rm Inactive} then
26:              {\rm BinMech}[\ell].{\rm Stream}\leftarrow{\rm BinMech}[\ell].{\rm Stream}\cup\left\{\Pi_{\ell}\left(\sigma\right)\right\}
27:              {\rm total}\leftarrow{\rm total}+M(u_{t})
28:         else if \ell\in{\rm Inactive} then
29:              {\rm Buffer}[\ell]\leftarrow{\rm Buffer}[\ell]\cup\left\{\sigma\right\}
30:         end if
31:     end if
32:     S_{t}\leftarrow\sum_{i=0}^{L}{\rm BinMech}[i].{\rm Sum}
33:     return \hat{\mu}_{t}=\frac{S_{t}}{\rm total}
34:end for

We note that the private median estimation algorithm is also used in [Lev+21] in the single-release setting, with all users contributing an equal number of samples; we extend it to the setting where the number of samples can vary across users. In Appendix E.1 (Claim E.1), we show that, for any \ell\geq 1, this modified private median estimation algorithm (Algorithm 7) is user-level \varepsilon-DP. Moreover, with probability at least 1-\delta-\beta, we have that \left\lvert\tilde{\mu}-\mu\right\rvert\leq 2\sqrt{\frac{1}{2^{\ell}}\ln\frac{2k(\varepsilon,\ell,\beta)}{\delta}}, where k(\varepsilon,\ell,\beta)=\frac{16}{\varepsilon}\ln\frac{2^{\ell/2}}{\beta}.

Algorithm 7 {\rm PrivateMedian} (subroutine for Line 13 in Algorithm 6)
1:(x_{i},u_{i})_{i=1}^{t}, \varepsilon (privacy), \ell (scale), \beta (failure probability)
2:Require: \sum_{u=1}^{n}\min\left\{m_{u}(t),2^{\ell-1}\right\}\geq 2^{\ell-1}\frac{16}{\varepsilon}\ln\frac{2^{\ell/2}}{\beta}, where m_{u}(t) is the no. of times user u occurs in (u_{i})_{i=1}^{t}.
3:
4:Initialize k:=\left(\frac{16}{\varepsilon}\ln\frac{2^{\ell/2}}{\beta}\right) arrays S_{1},\ldots,S_{k}, each of size 2^{\ell-1}.
5:\% Forming k arrays, each containing 2^{\ell-1} samples:
6:j\leftarrow 1.
7:for u\in[n] do
8:     Let r=\min\left\{m_{u}(t),2^{\ell-1}\right\}.
9:     One by one, start storing r samples from user u in array S_{j}.
10:     At any point, if array S_{j} becomes full, increment j: j\leftarrow j+1.
11:     Exit the loop once array S_{k} becomes full. \triangleright The ‘Require’ condition ensures that this eventually happens.
12:end for
13:\%
14:For j\in[k], let Y_{j} be the sample mean of all the samples in S_{j}.
15:
16:\% The steps below are as in [FS17],[Lev+21], mutatis mutandis:
17:Divide the interval [0,1] into disjoint subintervals (“bins”), each of length 2\cdot 2^{-\ell/2}. The last subinterval can be shorter if 1/(2\cdot 2^{-\ell/2}) is not an integer. Let \mathcal{T} be the set of midpoints of these subintervals.
18:For j\in[k], let Y^{\prime}_{j}=\arg\min_{y\in\mathcal{T}}\left\lvert Y_{j}-y\right\rvert be the point in \mathcal{T} closest to Y_{j}.
19:Define the cost function c:\mathcal{T}\to\mathbb{R} as
c(y):=\max\left\{\left\lvert\left\{j\in[k]:Y^{\prime}_{j}<y\right\}\right\rvert,\left\lvert\left\{j\in[k]:Y^{\prime}_{j}>y\right\}\right\rvert\right\}.
20:Let \tilde{\mu} be a sample drawn from the distribution satisfying
\mathrm{Pr}\left\{\tilde{\mu}=y\right\}\propto\exp\left(-\frac{\varepsilon}{4}c(y)\right).
\triangleright Note that we have -\frac{\varepsilon}{4}c(y) in \exp(\cdot), whereas [FS17],[Lev+21] had -\frac{\varepsilon}{2}c(y).
21:return \tilde{\mu}.

Algorithm 6 demonstrates another advantage of using multiple binary mechanisms: we can have different priors for different binary mechanisms, which means that the algorithm does not need to wait long before it starts outputting estimates with good guarantees. Theorem 3.4 (proof in Appendix E.2) states the exact guarantees ensured by Algorithm 6. Before stating the theorem, we define what we mean by the “diversity condition.”

Definition 3.3 (Diversity condition).

We say that the “diversity condition holds at time t” if

\sum_{u=1}^{n}\min\left\{m_{u}(t),\frac{M_{t}}{2}\right\}\geq\frac{M_{t}}{2}\cdot\frac{16}{\varepsilon}\left(2L\ln\frac{3L\sqrt{M_{t}}}{\delta}\right), (11)

where L:=\lceil\log m\rceil and M_{t} is the maximum number of samples contributed by any user till time t. That is, M_{t}:=\max\left\{m_{u}(t):u\in[n]\right\}, where m_{u}(t) is the number of samples contributed by user u till time t.

In words, condition (11) says that the number of users at time t must be large enough that we can form a collection of \frac{M_{t}}{2}\cdot\frac{16}{\varepsilon}\left(2L\ln\frac{3L\sqrt{M_{t}}}{\delta}\right) samples using at most \frac{M_{t}}{2} samples per user. A sufficient condition for (11) to hold is that there are at least \frac{16}{\varepsilon}\left(2L\ln\frac{3L\sqrt{M_{t}}}{\delta}\right) users that have contributed at least \frac{M_{t}}{2} samples each till time t. In particular, since M_{t}\leq m, we have the following: once there are \frac{16}{\varepsilon}\left(2L\ln\frac{3L\sqrt{m}}{\delta}\right) users that have contributed at least \frac{m}{2} samples each till some time t_{0}, the diversity condition holds for every t\geq t_{0}.
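
The condition is a simple check on the per-user contribution counts; a minimal Python sketch (our own) follows:

import math

def diversity_holds(counts, m, eps, delta):
    # counts: list with m_u(t) for each user u at time t (Definition 3.3)
    M_t = max(counts)
    L = math.ceil(math.log2(m))
    lhs = sum(min(c, M_t / 2) for c in counts)
    rhs = (M_t / 2) * (16 / eps) * (2 * L * math.log(3 * L * math.sqrt(M_t) / delta))
    return lhs >= rhs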

Theorem 3.4.

Algorithm 6 for continual Bernoulli mean estimation is user-level \varepsilon-DP. Moreover, if at time t\in[T] the diversity condition (11) holds, then, with probability at least 1-\delta (for arbitrary \delta\in(0,1]),

\left\lvert\hat{\mu}_{t}-\mu\right\rvert=\tilde{O}\left(\frac{1}{\sqrt{t}}+\frac{\sqrt{M_{t}}}{t\varepsilon}\right).

What happens at time instants when the diversity condition does not hold? This happens when very few users have contributed a very large number of samples. Algorithm 6 stores these samples in buffers (since the corresponding binary mechanisms are “inactive”) and does not use them to estimate \hat{\mu}_{t}. This is done to preserve user-level privacy and seems necessary. However, we currently do not know whether there is a user-level private way to use these extra samples from a few users. In other words, it is not clear whether our diversity condition can be weakened.

4 Extensions

To perform mean estimation on d-dimensional inputs with independent coordinates, each drawn from \text{Ber}(\mu), one can simply run Algorithm 6 on each coordinate. Since the release corresponding to each coordinate is user-level \varepsilon-DP, the overall algorithm is d\varepsilon-DP by basic composition. However, if we only require an approximate differential privacy guarantee, then [DRV10, Theorem III.3] shows that the full sequence of releases from all coordinates is (\varepsilon\sqrt{d\ln(1/\tilde{\delta})},\tilde{\delta})-DP for every \tilde{\delta}\in(0,1]. Rescaling the privacy parameter to ensure (\varepsilon,\tilde{\delta})-DP overall gives error \tilde{O}(1/\sqrt{t}+\sqrt{dM_{t}}/t\varepsilon), provided \sum_{u=1}^{n}\min\{m_{u}(t),M_{t}/2\}\geq\tilde{O}(\sqrt{dM_{t}}/\varepsilon). These arguments carry over to the case of subgaussian distributions as well.

5 Lower Bound

Consider the single-release mean estimation problem with n users, each having m samples, where the estimated mean must be user-level \varepsilon-DP. In this setting, a lower bound of \Omega\left(\frac{1}{\sqrt{mn}}+\frac{1}{\sqrt{m}n\varepsilon}\right) on the achievable accuracy is known (Theorem 3 in [Liu+20]). Furthermore, in the same setting, Theorem 9 in [Lev+21] shows that any algorithm that is user-level \varepsilon-DP requires n=\Omega\left(\frac{1}{\varepsilon}\right) users. To see what these lower bounds say about our proposed continual-release algorithm (Algorithm 6), let t be a time instant at which N users have contributed M samples each (thus, t=NM). In this case, M_{t}=M, and Theorem 3.4 gives an accuracy guarantee of \tilde{O}\left(\frac{1}{\sqrt{MN}}+\frac{1}{\sqrt{M}N\varepsilon}\right), provided N=\tilde{\Omega}\left(\frac{1}{\varepsilon}\right) (the diversity condition). This matches the single-release lower bounds (on both accuracy and the number of users) up to log factors.

6 Discussion

We have shown that Algorithm 6 is almost optimal at every time instant where users have contributed equal numbers of samples. However, what about settings where different users contribute different numbers of samples? Is it optimal, for instance, to not use the excess samples from a single user? The answer is not clear even in the single-release setting. Investigating this is an interesting direction for future work.

References

  • [NS08] Arvind Narayanan and Vitaly Shmatikov “Robust De-anonymization of Large Sparse Datasets” In 2008 IEEE Symposium on Security and Privacy, 2008, pp. 111–125
  • [SAW13] Latanya Sweeney, Akua Abu and Julia Winn “Identifying Participants in the Personal Genome Project by Name” In Data Privacy Lab, IQSS, Harvard University, 2013 URL: http://dataprivacylab.org/projects/pgp/
  • [GAM19] Simson L. Garfinkel, John M. Abowd and Chris Martindale “Understanding database reconstruction attacks on public data” In Communications of the ACM 62, 2019, pp. 46–53
  • [Dwo+06] Cynthia Dwork, Frank McSherry, Kobbi Nissim and Adam D. Smith “Calibrating Noise to Sensitivity in Private Data Analysis” In TCC 3876, Lecture Notes in Computer Science Springer, 2006, pp. 265–284
  • [Ami+19] Kareem Amin, Alex Kulesza, Andres Muñoz Medina and Sergei Vassilvitskii “Bounding User Contributions: A Bias-Variance Trade-off in Differential Privacy” In Proceedings of the 36th International Conference on Machine Learning PMLR, 2019, pp. 263–271
  • [CSS10] T.-H. Chan, Elaine Shi and Dawn Song “Private and Continual Release of Statistics” In ICALP (2) 6199, Lecture Notes in Computer Science Springer, 2010, pp. 405–417
  • [Dwo+10] Cynthia Dwork, Moni Naor, Toniann Pitassi and Guy N. Rothblum “Differential Privacy under Continual Observation” In Proceedings of the Forty-Second ACM Symposium on Theory of Computing, STOC ’10 Cambridge, Massachusetts, USA: Association for Computing Machinery, 2010, pp. 715–724
  • [Lev+21] Daniel Levy et al. “Learning with User-Level Privacy” In Advances in Neural Information Processing Systems 34 Curran Associates, Inc., 2021, pp. 12466–12479
  • [DRV10] Cynthia Dwork, Guy N. Rothblum and Salil Vadhan “Boosting and Differential Privacy”, FOCS ’10 IEEE Computer Society, 2010, pp. 51–60 URL: https://doi.org/10.1109/FOCS.2010.12
  • [Jai+21] Palak Jain, Sofya Raskhodnikova, Satchit Sivakumar and Adam D. Smith “The Price of Differential Privacy under Continual Observation” In CoRR abs/2112.00828, 2021 arXiv: https://arxiv.org/abs/2112.00828
  • [Nik13] Aleksandar Nikolov “Differential Privacy in the Streaming World”, Simons Institute Workshop on Big Data and Differential Privacy, 2013
  • [Cha+12] T.-H. Chan, Mingfei Li, Elaine Shi and Wenchang Xu “Differentially Private Continual Monitoring of Heavy Hitters from Distributed Streams” In Proceedings of the 12th International Conference on Privacy Enhancing Technologies, PETS’12 Vigo, Spain: Springer-Verlag, 2012, pp. 140–159
  • [Bol+13] Jean Bolot et al. “Private Decayed Predicate Sums on Streams” In Proceedings of the 16th International Conference on Database Theory, ICDT ’13 Genoa, Italy: Association for Computing Machinery, 2013, pp. 284–295
  • [DKY17] Bolin Ding, Janardhan Kulkarni and Sergey Yekhanin “Collecting Telemetry Data Privately” In Advances in Neural Information Processing Systems 30 Curran Associates, Inc., 2017
  • [Jos+18] Matthew Joseph, Aaron Roth, Jonathan Ullman and Bo Waggoner “Local Differential Privacy for Evolving Data” In Proceedings of the 32nd International Conference on Neural Information Processing Systems Montréal, Canada: Curran Associates Inc., 2018, pp. 2381–2390
  • [PAK19] Victor Perrier, Hassan Jameel Asghar and Dali Kaafar “Private Continual Release of Real-Valued Data Streams” In 26th Annual Network and Distributed System Security Symposium, NDSS 2019, San Diego, California, USA, February 24-27, 2019 The Internet Society, 2019
  • [Dwo+10a] Cynthia Dwork et al. “Pan-Private Streaming Algorithms” In ICS Tsinghua University Press, 2010, pp. 66–80
  • [Cum+21] Rachel Cummings, Vitaly Feldman, Audra McMillan and Kunal Talwar “Mean Estimation with User-level Privacy under Data Heterogeneity” In NeurIPS 2021 Workshop Privacy in Machine Learning, 2021 URL: https://openreview.net/forum?id=oYbQDV3mon-
  • [NME22] Shyam Narayanan, Vahab Mirrokni and Hossein Esfandiari “Tight and Robust Private Mean Estimation with Few Users” In Proceedings of the 39th International Conference on Machine Learning 162, Proceedings of Machine Learning Research PMLR, 2022, pp. 16383–16412
  • [FS17] Vitaly Feldman and Thomas Steinke “Generalization for Adaptively-chosen Estimators via Stable Median” In Proceedings of the 30th Conference on Learning Theory, COLT 2017, Amsterdam, The Netherlands, 7-10 July 2017 65, Proceedings of Machine Learning Research PMLR, 2017, pp. 728–757
  • [Liu+20] Yuhan Liu et al. “Learning discrete distributions: user vs item-level privacy” In Advances in Neural Information Processing Systems 33 Curran Associates, Inc., 2020, pp. 20965–20976 URL: https://proceedings.neurips.cc/paper/2020/file/f06edc8ab534b2c7ecbd4c2051d9cb1e-Paper.pdf
  • [DR+14] Cynthia Dwork and Aaron Roth “The algorithmic foundations of differential privacy” In Foundations and Trends® in Theoretical Computer Science 9.3–4 Now Publishers, Inc., 2014, pp. 211–407

Appendix A Useful Inequalities

We state two concentration inequalities that we will use extensively.

Lemma A.1.

Let xiiidBer(μ)x_{i}\stackrel{{\scriptstyle iid}}{{\sim}}\text{Ber}(\mu) for i[n]i\in[n]. Then, for every δ(0,1]\delta\in(0,1],

(|i=1nxinμ|n2ln2δ)δ.\displaystyle\mathbb{P}\bigg{(}\bigg{\lvert}\sum_{i=1}^{n}x_{i}-n\mu\bigg{\rvert}\geq\sqrt{\frac{n}{2}\ln\frac{2}{\delta}}\bigg{)}\leq\delta.
Lemma A.2.

Let xiLap(bi)x_{i}\sim\text{Lap}(b_{i}), i[n]i\in[n], be independent. Then, for every δ(0,1]\delta\in(0,1],

(|i=1nxi|ci=1nbi2ln1δ)δ,\displaystyle\mathbb{P}\bigg{(}\bigg{\lvert}\sum_{i=1}^{n}x_{i}\bigg{\rvert}\geq c\sqrt{\sum_{i=1}^{n}b_{i}^{2}}\ln\frac{1}{\delta}\bigg{)}\leq\delta,

where cc is an absolute constant.
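As an empirical sanity check (ours, with illustrative parameter values), the following short simulation confirms that the deviation in Lemma A.1 exceeds the stated bound with frequency at most $\delta$; the sum of $n$ i.i.d. $\text{Ber}(\mu)$ samples is drawn directly as a Binomial. (Lemma A.2 involves an unspecified absolute constant $c$, so we only check Lemma A.1.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters, not from the paper.
n, mu, delta, trials = 1000, 0.3, 0.05, 100000
sums = rng.binomial(n, mu, size=trials)       # sum of n Ber(mu) samples
bound = np.sqrt(n / 2 * np.log(2 / delta))    # bound from Lemma A.1
rate = np.mean(np.abs(sums - n * mu) >= bound)
print(f"empirical violation rate {rate:.4f} <= delta = {delta}")
```

Hoeffding's inequality is not tight, so the empirical violation rate is typically far below $\delta$.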

Appendix B Continual mean estimation: Wishful scenario

Algorithm 3 is the algorithm under the wishful scenario where:

  • we already have a prior estimate μ~\tilde{\mu} that satisfies |μ~μ|1m\left\lvert\tilde{\mu}-\mu\right\rvert\leq\frac{1}{\sqrt{m}};

  • every user contributes their mm samples in contiguous time steps; that is, user 11 contributes samples for the first mm time steps, followed by user 2 who contributes samples for the next mm time steps, and so on.

B.1 Guarantees for Algorithm 3

Let xj(u)x^{(u)}_{j} denote the jj-th sample contributed by user uu.

Privacy.

A user $u$ contributes at most one element to ${\rm BinMech.Stream}$; this element is $\Pi\left(\sum_{j=1}^{m}x_j^{(u)}\right)$, where $\Pi(\cdot)$ is the projection onto the interval $\mathcal{I}$ defined in (2). So, there are at most $n$ elements in ${\rm BinMech.Stream}$ throughout the course of the algorithm. From the way the binary mechanism works, a given element in ${\rm BinMech.Stream}$ is used at most $(1+\log n)$ times while computing terms in ${\rm BinMech.NoisyPartialSums}$ (see Section 2.3 in the main paper). Thus, changing a user can change the $\ell_1$-norm of the array ${\rm BinMech.NoisyPartialSums}$ by at most $(1+\log n)(2\Delta)$, where $\Delta$ is as in (2). Hence, adding independent ${\rm Lap}(\eta)$ noise (with $\eta$ as in (3)) while computing each term in ${\rm BinMech.NoisyPartialSums}$ is sufficient to ensure that the array ${\rm BinMech.NoisyPartialSums}$ remains user-level $\varepsilon$-DP throughout the course of the algorithm. Since the output $\left(\hat{\mu}_t\right)_{t=1}^{T}$ is computed using ${\rm BinMech.Sum}$, which, in turn, is a function of the array ${\rm BinMech.NoisyPartialSums}$, we conclude that Algorithm 3 is user-level $\varepsilon$-DP.
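For intuition, here is a minimal sketch of such a binary mechanism; the interface names (${\rm Stream}$, ${\rm NoisyPartialSums}$, ${\rm Sum}$) follow the paper, while the bookkeeping details and data types are our illustrative choices:

```python
import numpy as np

class BinMech:
    """Sketch of the binary mechanism for private prefix sums: each
    appended element enters at most 1 + log k dyadic blocks, and any
    prefix sum is assembled from at most 1 + log k noisy blocks."""

    def __init__(self, eta, rng=None):
        self.eta = eta                 # Laplace scale per noisy block
        self.stream = []               # BinMech.Stream
        self.noisy_partial_sums = {}   # BinMech.NoisyPartialSums
        self.rng = rng or np.random.default_rng()

    def append(self, value):
        self.stream.append(value)
        k = len(self.stream)
        # Store a noisy sum for every dyadic block that ends at k.
        length = 1
        while k % length == 0:
            start = k - length
            self.noisy_partial_sums[(start, length)] = (
                sum(self.stream[start:k]) + self.rng.laplace(scale=self.eta)
            )
            length *= 2

    def sum(self):
        # BinMech.Sum: cover the first k elements with stored blocks
        # given by the binary expansion of k.
        k, pos, total = len(self.stream), 0, 0.0
        for j in reversed(range(k.bit_length())):
            if k & (1 << j):
                total += self.noisy_partial_sums[(pos, 1 << j)]
                pos += 1 << j
        return total
```

In Algorithm 3, each user's projected sum $\Pi\left(\sum_{j=1}^{m}x_j^{(u)}\right)$ would be appended once, and $\hat{\mu}_t$ read off from ${\rm sum()}$ after rescaling.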

Utility.

We first show that, with high probability, the projection operator $\Pi(\cdot)$ plays no role throughout the algorithm. We then show the utility guarantee for $t=km$ assuming no truncation, before generalizing it to arbitrary $t$.

No truncation happens:

For a user uu, we have from Lemma A.1 that

Pr(|j=1mxj(u)mμ|m2ln2nδ)1δn.\mathrm{Pr}\left(\left\lvert\sum_{j=1}^{m}x_{j}^{(u)}-m\mu\right\rvert\leq\sqrt{\frac{m}{2}\ln\frac{2n}{\delta}}\right)\geq 1-\frac{\delta}{n}.

Since |μ~μ|1m\left\lvert\tilde{\mu}-\mu\right\rvert\leq\frac{1}{\sqrt{m}}, this gives us

u[n],Pr(|j=1mxj(u)mμ~|m2ln2nδ+m)1δn.\forall u\in[n],\ \mathrm{Pr}\left(\left\lvert\sum_{j=1}^{m}x_{j}^{(u)}-m\tilde{\mu}\right\rvert\leq\sqrt{\frac{m}{2}\ln\frac{2n}{\delta}}+\sqrt{m}\right)\geq 1-\frac{\delta}{n}.

Thus, by union bound

Pr(u[n],|j=1mxj(u)mμ~|m2ln2nδ+m)1δ.\mathrm{Pr}\left(\forall u\in[n],\left\lvert\sum_{j=1}^{m}x_{j}^{(u)}-m\tilde{\mu}\right\rvert\leq\sqrt{\frac{m}{2}\ln\frac{2n}{\delta}}+\sqrt{m}\right)\geq 1-\delta.

This means that, with probability at least $1-\delta$, we have $\Pi\left(\sum_{j=1}^{m}x_j^{(u)}\right)=\sum_{j=1}^{m}x_j^{(u)}$ for every $u\in[n]$. That is, with probability at least $1-\delta$, no truncation happens throughout the course of the algorithm.

Guarantee at t=kmt=km ignoring projection Π()\Pi(\cdot):

For now, consider Algorithm 3 without the projection operator $\Pi(\cdot)$ in Line 9; call it Algorithm 3-NP (NP $\equiv$ ‘No Projection’). For Algorithm 3-NP, the following holds at $t=km$, for integer $1\leq k\leq n$: ${\rm BinMech.Stream}$ contains the $k$ elements $\sigma_1=\left(\sum_{j=1}^{m}x_j^{(1)}\right),\sigma_2=\left(\sum_{j=1}^{m}x_j^{(2)}\right),\ldots,\sigma_k=\left(\sum_{j=1}^{m}x_j^{(k)}\right)$. Thus, ${\rm BinMech.NoisyPartialSums}$ also contains $k$ terms, each with independent ${\rm Lap}(\eta)$ noise added to it. Hence, ${\rm BinMech.Sum}$ is computed using at most $1+\log k$ terms from ${\rm BinMech.NoisyPartialSums}$, and so, using Lemma A.2, we have

Pr(|Skmi=1kσi|cη1+logkln1δ)1δ.\mathrm{Pr}\left(\left\lvert S_{km}-\sum_{i=1}^{k}\sigma_{i}\right\rvert\leq c\eta\sqrt{1+\log k}\ln\frac{1}{\delta}\right)\geq 1-\delta.

Furthermore, using Lemma A.1, we have for t=kmt=km that

Pr(|i=1kσiμkm|km2ln2δ)1δ.\mathrm{Pr}\left(\left\lvert\sum_{i=1}^{k}\sigma_{i}-\mu km\right\rvert\leq\sqrt{\frac{km}{2}\ln\frac{2}{\delta}}\right)\geq 1-\delta.

Thus, we have the following at t=kmt=km:

Pr(|Skmμkm|km2ln2δ+cη1+logkln1δ)12δ\mathrm{Pr}\left(\left\lvert S_{km}-\mu km\right\rvert\leq\sqrt{\frac{km}{2}\ln\frac{2}{\delta}}+c\eta\sqrt{1+\log k}\ln\frac{1}{\delta}\right)\geq 1-2\delta

or, dividing by kmkm, we have

Pr(|μ^kmμ|12kmln2δ+cηkm1+logkln1δ)12δ.\mathrm{Pr}\left(\left\lvert\hat{\mu}_{km}-\mu\right\rvert\leq\sqrt{\frac{1}{2km}\ln\frac{2}{\delta}}+c\frac{\eta}{km}\sqrt{1+\log k}\ln\frac{1}{\delta}\right)\geq 1-2\delta. (12)

Guarantee at arbitrary tmt\geq m ignoring projection Π\Pi:

We now give the utility guarantee of Algorithm 3-NP for arbitrary $t\geq m$. Note that for any $t\in[km,(k+1)m-1)$, $k\geq 1$, we output $\hat{\mu}_t=\hat{\mu}_{km}$. Moreover, we always have $km\geq\frac{t}{2}$. Thus, for $t\in[km,(k+1)m-1)$, we have, with probability at least $1-2\delta$,

|μ^tμ|\displaystyle\left\lvert\hat{\mu}_{t}-\mu\right\rvert =|μ^kmμ|\displaystyle=\left\lvert\hat{\mu}_{km}-\mu\right\rvert
12kmln2δ+cηkm1+logkln1δ\displaystyle\leq\sqrt{\frac{1}{2km}\ln\frac{2}{\delta}}+c\frac{\eta}{km}\sqrt{1+\log k}\ln\frac{1}{\delta}
1tln2δ+c2ηt1+logkln1δ.\displaystyle\leq\sqrt{\frac{1}{t}\ln\frac{2}{\delta}}+c\frac{2\eta}{t}\sqrt{1+\log k}\ln\frac{1}{\delta}. (since kmt2km\geq\frac{t}{2})

Final utility guarantee for Algorithm 3 at arbitrary tt:

The above utility guarantee was obtained for Algorithm 3-NP, the variant of Algorithm 3 without the projection operator $\Pi(\cdot)$ in Line 9. Now, let $\mathcal{E}_1$ be the event that no truncation happens. We already saw that no truncation happens (i.e., the projection operator plays no role) with probability at least $1-\delta$. That is, $\mathrm{Pr}(\mathcal{E}_1)\geq 1-\delta$. Observe that

\mathcal{E}_{1}:=\left\{\text{no truncation happens}\right\}=\left\{\text{Algorithm 3 and Algorithm 3-NP become equivalent}\right\}.

Let

\mathcal{E}_{2}=\left\{\left\lvert\hat{\mu}_{t}-\mu\right\rvert\leq\sqrt{\frac{1}{t}\ln\frac{2}{\delta}}+c\frac{2\eta}{t}\sqrt{1+\log k}\ln\frac{1}{\delta}\text{ for Algorithm 3-NP}\right\}

We saw above that $\mathrm{Pr}(\mathcal{E}_2)\geq 1-2\delta$. Thus, by the union bound, $\mathrm{Pr}(\mathcal{E}_1\cap\mathcal{E}_2)\geq 1-3\delta$. That is, for Algorithm 3, we have for $t\geq m$ (upper bounding $\log k$ by $\log n$) that

Pr(|μ^tμ|1tln2δ+c2ηt1+lognln1δ)13δ.\mathrm{Pr}\left(\left\lvert\hat{\mu}_{t}-\mu\right\rvert\leq\sqrt{\frac{1}{t}\ln\frac{2}{\delta}}+c\frac{2\eta}{t}\sqrt{1+\log n}\ln\frac{1}{\delta}\right)\geq 1-3\delta.

Substituting the value of $\eta$ from (3), we get that for $t\geq m$, with probability at least $1-3\delta$, we have

|μ^tμ|\displaystyle\left\lvert\hat{\mu}_{t}-\mu\right\rvert 1tln2δ+mtε(1+12ln2nδ)4c(1+logn)3/2ln1δ\displaystyle\leq\sqrt{\frac{1}{t}\ln\frac{2}{\delta}}+\frac{\sqrt{m}}{t\varepsilon}\left(1+\sqrt{\frac{1}{2}\ln\frac{2n}{\delta}}\right){4c(1+\log n)^{3/2}}\ln\frac{1}{\delta}
=O~(1t+mtε).\displaystyle=\tilde{O}\left(\frac{1}{\sqrt{t}}+\frac{\sqrt{m}}{t\varepsilon}\right).

This guarantee holds trivially for t<mt<m as well because the algorithm outputs the prior estimate μ~\tilde{\mu} (Lines 3-5), which gives us that |μ^tμ|=|μ~μ|1m1t\left\lvert\hat{\mu}_{t}-\mu\right\rvert=\left\lvert\tilde{\mu}-\mu\right\rvert\leq\frac{1}{\sqrt{m}}\leq\frac{1}{\sqrt{t}}.

Remark: Throughout this section, we assumed that we were given a prior estimate $\tilde{\mu}$ satisfying $\left\lvert\tilde{\mu}-\mu\right\rvert\leq\frac{1}{\sqrt{m}}$. Our discussion also applies when the prior satisfies $\left\lvert\tilde{\mu}-\mu\right\rvert\leq\frac{1}{\sqrt{m}}$ only with probability at least $1-\delta$ (instead of deterministically). Also, if $\tilde{\mu}$ is computed using samples from users, it should itself be user-level DP. In particular, if $\tilde{\mu}$ is user-level $\varepsilon$-DP, the overall algorithm becomes user-level $2\varepsilon$-DP (using the composition property of DP from Lemma 2.2).

Appendix C Continual mean estimation assuming prior estimate: Single binary mechanism

Before proving the guarantees for Algorithm 4 (Theorem 3.1), we prove a claim about the exponential withhold-release pattern under an arbitrary user ordering.

C.1 Exponential withhold-release

Recall the exponential withhold-release pattern. For a given user $u$, we release the first two samples $x_1^{(u)},x_2^{(u)}$; then, we withhold $x_3^{(u)}$ and release a truncated version of $(x_3^{(u)}+x_4^{(u)})$ when we receive $x_4^{(u)}$; we then withhold $x_5^{(u)},x_6^{(u)},x_7^{(u)}$ and release a truncated version of $(x_5^{(u)}+x_6^{(u)}+x_7^{(u)}+x_8^{(u)})$ when we receive $x_8^{(u)}$; and so on. In general, we withhold samples $x_{2^{\ell-1}+1}^{(u)},\ldots,x_{2^{\ell}-1}^{(u)}$ and release a truncated version of $\left(\sum_{i=2^{\ell-1}+1}^{2^{\ell}}x_i^{(u)}\right)$ when we receive the $2^{\ell}$-th sample $x_{2^{\ell}}^{(u)}$ from user $u$.

We ignore truncations for now. For a user uu, let σ0(u)=x1(u),\sigma_{0}^{(u)}=x_{1}^{(u)}, σ1(u)=x2(u),\sigma_{1}^{(u)}=x_{2}^{(u)}, σ2(u)=(x3(u)+x4(u)),,\sigma_{2}^{(u)}=(x_{3}^{(u)}+x_{4}^{(u)}),\ldots, σ(u)=(i=21+12xi(u)),\sigma_{\ell}^{(u)}~{}=~{}\left(\sum_{i=2^{\ell-1}+1}^{2^{\ell}}x_{i}^{(u)}\right),\ldots. Let ((xt,ut))t[T]\big{(}(x_{t},u_{t})\big{)}_{t\in[T]} be a stream with arbitrary user order. We follow the exponential withhold-release protocol and feed σ(u)\sigma_{\ell}^{(u)}’s to an array named Stream{\rm Stream}. For instance, suppose we receive (x1,1),(x2,2),(x3,2),(x4,2),(x5,1),(x6,2),(x7,1),(x8,1)(x_{1},1),(x_{2},2),(x_{3},2),(x_{4},2),(x_{5},1),(x_{6},2),(x_{7},1),(x_{8},1), where the second index denotes the user identity. Then, the input feed to Stream{\rm Stream} looks as follows:

t=1:x1(1st sample from user 1)t=2:x2(1st sample from user 2)t=3:x3(2nd sample from user 2)t=4:𝚠𝚒𝚝𝚑𝚑𝚘𝚕𝚍(3rd sample from user 2)t=5:x5(2nd sample from user 1)t=6:x4+x6(4th sample from user 2)t=7:𝚠𝚒𝚝𝚑𝚑𝚘𝚕𝚍(3rd sample from user 1)t=8:x7+x8(4th sample from user 1)\begin{matrix}t=1:&x_{1}&(\text{1st sample from user }1)\\ t=2:&x_{2}&(\text{1st sample from user }2)\\ t=3:&x_{3}&(\text{2nd sample from user }2)\\ t=4:&{\tt withhold}&(\text{3rd sample from user }2)\\ t=5:&x_{5}&(\text{2nd sample from user }1)\\ t=6:&x_{4}+x_{6}&(\text{4th sample from user }2)\\ t=7:&{\tt withhold}&(\text{3rd sample from user }1)\\ t=8:&x_{7}+x_{8}&(\text{4th sample from user }1)\end{matrix} (13)

Since we are only interested in computing sums, feeding $\sigma_\ell^{(u)}=\left(\sum_{i=2^{\ell-1}+1}^{2^{\ell}}x_i^{(u)}\right)$ to ${\rm Stream}$ is equivalent to feeding information about $2^{\ell-1}$ samples to ${\rm Stream}$. A short sketch of this schedule is given below, after which we state our claim.
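Here is a minimal sketch of this schedule (ours; truncation via $\Pi_\ell(\cdot)$ is omitted). On the eight-element stream above it reproduces exactly the feed in (13):

```python
from collections import defaultdict

def withhold_release(stream):
    """Feed (sample, user) pairs; release the user's pending block sum
    whenever that user's sample count reaches a power of two, and
    withhold otherwise."""
    count = defaultdict(int)    # samples seen per user
    block = defaultdict(float)  # withheld partial sum per user
    out = []
    for x, u in stream:
        count[u] += 1
        block[u] += x
        c = count[u]
        if c & (c - 1) == 0:            # c in {1, 2, 4, 8, ...}: release
            out.append((u, block[u]))   # this sigma_l^{(u)} enters Stream
            block[u] = 0.0
        else:
            out.append((u, None))       # withhold
    return out
```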

Claim C.1.

Let ((xt,ut))t[T]\big{(}(x_{t},u_{t})\big{)}_{t\in[T]} have arbitrary user ordering. Suppose we follow the exponential withhold-release protocol and feed σ(u)\sigma_{\ell}^{(u)}’s to an array named Stream{\rm Stream}. Then, at any time tt, Stream{\rm Stream} contains information about at least t/2t/2 samples.

Proof.

Let ‘R’ denote ‘release’ and ‘W’ denote ‘withhold’. Suppose, for now, that only user $u$ arrives at all time steps. Then, the withhold-release sequence looks like $(R_u,R_u,W_u,R_u,3W_u,R_u,7W_u,R_u,\cdots)$, where ‘$kW_u$’ denotes that ‘$W_u$’ occurs for the next $k$ steps in the sequence. Moreover, at every ‘$R_u$’, information about the samples withheld after the previous ‘$R_u$’ is released. Note that, at any point along this withhold-release sequence, the number of samples withheld is at most the number of samples whose information has been released. This is for a given user $u$.

Now, consider a withhold-release sequence induced by an arbitrary user ordering. E.g., if the user order is $1,2,2,2,1,2,1,1$, then the corresponding withhold-release sequence is $(R_1,R_2,R_2,W_2,R_1,R_2,W_1,R_1)$ (see (13)), where the subscript denotes the user ID. At any time $t$ in this withhold-release sequence, if we consider only the $R_u$'s and $W_u$'s up to time $t$ for a fixed user $u$, then, as argued above, the number of samples withheld for user $u$ is at most the number of samples from user $u$ whose information has been released. This holds for every user who has appeared till time $t$. Thus, at any time $t$, the total number of samples withheld (across all users) is at most the total number of samples (across all users) whose information has been released. Hence, ${\rm Stream}$ contains information about at least $t/2$ samples. ∎
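As an empirical companion to Claim C.1 (ours, on a random ordering with illustrative parameters), the schedule indeed covers at least $t/2$ samples at every time step:

```python
import random
from collections import defaultdict

random.seed(0)
users = [random.randrange(5) for _ in range(500)]  # arbitrary ordering
count, covered = defaultdict(int), 0
for t, u in enumerate(users, start=1):
    count[u] += 1
    c = count[u]
    if c & (c - 1) == 0:                    # release at counts 1, 2, 4, ...
        covered += 1 if c == 1 else c // 2  # samples this release covers
    assert 2 * covered >= t                 # Claim C.1
print("Claim C.1 held at all", len(users), "time steps")
```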

C.2 Proof of Theorem 3.1

We will prove the following theorem.

Theorem.

Assume that we are given a user-level $\varepsilon$-DP prior estimate $\tilde{\mu}$ of the true mean $\mu$ such that $\left\lvert\tilde{\mu}-\mu\right\rvert\leq\frac{1}{\sqrt{m}}$. Then, Algorithm 4 is user-level $2\varepsilon$-DP. Moreover, for any given $t\in[T]$, we have with probability at least $1-\delta$ that

|μ^tμ|=O~(1t+mtε).\displaystyle\left\lvert\hat{\mu}_{t}-\mu\right\rvert=\tilde{O}\left(\frac{1}{\sqrt{t}}+\frac{\sqrt{m}}{t\varepsilon}\right).
Proof.

We first prove the privacy guarantee and then the utility guarantee.

Privacy.

The prior estimate $\tilde{\mu}$ is given to be user-level $\varepsilon$-DP. We will now show that the array ${\rm BinMech.NoisyPartialSums}$ is user-level $\varepsilon$-DP throughout the course of the algorithm. Note that the output estimates $\hat{\mu}_t$ are computed using ${\rm BinMech.Sum}$, which is a function of ${\rm BinMech.NoisyPartialSums}$. Thus, if $\tilde{\mu}$ and ${\rm BinMech.NoisyPartialSums}$ are both user-level $\varepsilon$-DP, then by the composition property (Lemma 2.2), the overall output $\left(\hat{\mu}_t\right)_{t=1}^{T}$ will be user-level $2\varepsilon$-DP.

Proof that BinMech.NoisyPartialSums{\rm BinMech.NoisyPartialSums} is user-level ε\varepsilon-DP:

Since a user uu contributes at most mm samples, and does so in an exponential withhold-release pattern, there are at most (1+logm)(1+\log m) elements in BinMech.Stream{\rm BinMech.Stream} corresponding to user uu. Since there are at most nn users, there are at most n(1+logm)n(1+\log m) elements added to BinMech.Stream{\rm BinMech.Stream} throughout the course of the algorithm.

Now, since there can be at most n(1+logm)n(1+\log m) elements in BinMech.Stream{\rm BinMech.Stream}, a given element in BinMech.Stream{\rm BinMech.Stream} will be used at most log(1+n(1+logm))\log(1+n(1+\log m)) times while computing BinMech.NoisyPartialSums{\rm BinMech.NoisyPartialSums}. Thus, changing a user can change the 1\ell_{1}-norm of BinMech.NoisyPartialSums{\rm BinMech.NoisyPartialSums} by at most (1+logm)(log(1+n(1+logm)))Δ(1+\log m)(\log(1+n(1+\log m)))\Delta, where Δ\Delta is the maximum sensitivity of an element contributed by a user to BinMech.Stream{\rm BinMech.Stream}. As can be seen from (4), we have Δ(m2ln2nlogmδ+m)\Delta_{\ell}\leq\left(\sqrt{\frac{m}{2}\ln\frac{2n\log m}{\delta}}+\sqrt{m}\right) for every \ell. Thus, Δ=2(m2ln2nlogmδ+m)\Delta=2\left(\sqrt{\frac{m}{2}\ln\frac{2n\log m}{\delta}}+\sqrt{m}\right) is an upper bound on worst-case sensitivity of an element contributed by a user to BinMech.Stream{\rm BinMech.Stream}. Hence, adding independent Lap(η(m,n,δ)){\rm Lap}(\eta(m,n,\delta)) noise (with η(m,n,δ)\eta(m,n,\delta) as in (5)) while computing each term in BinMech.NoisyPartialSums{\rm BinMech.NoisyPartialSums} is sufficient to ensure that BinMech.NoisyPartialSums{\rm BinMech.NoisyPartialSums} is user-level ε\varepsilon-DP.
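For concreteness, this calibration can be transcribed as follows (a sketch of the accounting above; the paper's exact expression (5) may differ in constants and rounding):

```python
import math

def laplace_scale(m, n, eps, delta):
    """Laplace scale implied by the sensitivity argument above: each
    user touches at most (1 + log m) stream elements, each reused at
    most log(1 + n(1 + log m)) times, with per-element sensitivity
    Delta = 2(sqrt((m/2) ln(2 n log m / delta)) + sqrt(m))."""
    log_m = math.log2(m)
    uses = (1 + log_m) * math.log2(1 + n * (1 + log_m))
    Delta = 2 * (math.sqrt(m / 2 * math.log(2 * n * log_m / delta))
                 + math.sqrt(m))
    return uses * Delta / eps   # Lap(eta) noise per noisy partial sum
```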

Utility.

We first show that with high probability, the projection operators Π()\Pi_{\ell}(\cdot) play no role throughout the course of the algorithm.

No truncation happens:

For a user uu, for any 1\ell\geq 1, we have from Lemma A.1 that

Pr(|(j=21+12xj(u))(21)μ|212ln2nlogmδ)1δnlogm.\mathrm{Pr}\left(\left\lvert\left(\sum_{j=2^{\ell-1}+1}^{2^{\ell}}x_{j}^{(u)}\right)-(2^{\ell-1})\mu\right\rvert\leq\sqrt{\frac{2^{\ell-1}}{2}\ln\frac{2n\log m}{\delta}}\right)\geq 1-\frac{\delta}{n\log m}.

Since |μ~μ|1m\left\lvert\tilde{\mu}-\mu\right\rvert\leq\frac{1}{\sqrt{m}}, this gives us that, for any user uu, for any 1\ell\geq 1,

Pr(|(j=21+12xj(u))(21)μ~|212ln2nlogmδ+21m)1δn.\mathrm{Pr}\left(\left\lvert\left(\sum_{j=2^{\ell-1}+1}^{2^{\ell}}x_{j}^{(u)}\right)-(2^{\ell-1})\tilde{\mu}\right\rvert\leq\sqrt{\frac{2^{\ell-1}}{2}\ln\frac{2n\log m}{\delta}}+\frac{2^{\ell-1}}{\sqrt{m}}\right)\geq 1-\frac{\delta}{n}.

Note that a user contributes at most mm samples. Thus, at most logm\log m projection operators Π()\Pi_{\ell}(\cdot) are applied per user. Applying union bound (over \ell), we have that, for any user uu

Pr({1,,logm},|(j=21+12xj(u))(21)μ~|212ln2nlogmδ+21m)1δn.\mathrm{Pr}\left(\forall\ell\in\left\{1,\ldots,\lfloor\log m\rfloor\right\},\left\lvert\left(\sum_{j=2^{\ell-1}+1}^{2^{\ell}}x_{j}^{(u)}\right)-(2^{\ell-1})\tilde{\mu}\right\rvert\leq\sqrt{\frac{2^{\ell-1}}{2}\ln\frac{2n\log m}{\delta}}+\frac{2^{\ell-1}}{\sqrt{m}}\right)\geq 1-\frac{\delta}{n}.

Now, we take union bound over nn users, which gives us that

Pr(u[n],{1,,logm},|(j=21+12xj(u))(21)μ~|212ln2nlogmδ+21m)1δ.\mathrm{Pr}\left(\forall u\in[n],\forall\ell\in\left\{1,\ldots,\lfloor\log m\rfloor\right\},\left\lvert\left(\sum_{j=2^{\ell-1}+1}^{2^{\ell}}x_{j}^{(u)}\right)-(2^{\ell-1})\tilde{\mu}\right\rvert\leq\sqrt{\frac{2^{\ell-1}}{2}\ln\frac{2n\log m}{\delta}}+\frac{2^{\ell-1}}{\sqrt{m}}\right)\geq 1-\delta.

Note that the projection operator Π()\Pi_{\ell}(\cdot) was defined as projection on interval \mathcal{I}_{\ell} as in (4). The above equation shows that with probability at least 1δ1-\delta, the projection operators do not play any role throughout the algorithm, and thus no truncation happens.

Utility at time tt ignoring projections Π\Pi_{\ell}:

For now, consider Algorithm 4 without the projection operator $\Pi_\ell(\cdot)$ in Line 11; call it Algorithm 4-NP (NP $\equiv$ ‘No Projection’). For Algorithm 4-NP, at any time $t$, ${\rm BinMech.Stream}$ has terms of the form $\sigma_\ell^{(u)}=\left(\sum_{i=2^{\ell-1}+1}^{2^{\ell}}x_i^{(u)}\right)$ (note that $\sigma_\ell^{(u)}$ “contains information” about $2^{\ell-1}$ samples from user $u$). From Claim C.1, at time $t$, ${\rm BinMech.Stream}$ contains information about at least $t/2$ samples. Thus, $S_t$ is a sum of at least $t/2$ samples with Laplace noise added to it.

Note that, at any time tt, at most tt elements are present in BinMech.Stream{\rm BinMech.Stream}. This also means that at most tt elements are present in BinMech.NoisyPartialSums{\rm BinMech.NoisyPartialSums}. Thus, computing StS_{t} (using BinMech.Sum{\rm BinMech.Sum}) would involve at most (1+logt)(1+\log t) terms from BinMech.NoisyPartialSums{\rm BinMech.NoisyPartialSums}. Each term in BinMech.NoisyPartialSums{\rm BinMech.NoisyPartialSums} has independent Lap(η(m,n,δ)){\rm Lap}(\eta(m,n,\delta)) noise added to it, where η(m,n,δ)\eta(m,n,\delta) is as in (5).

Thus, at time tt, for Algorithm 4-NP,

St=[sum of at least t/2 Bernoulli samples]+[sum of at most (1+logt) i.i.d. Lap(η(m,n,δ)) terms]S_{t}=\left[\text{sum of at least $t/2$ Bernoulli samples}\right]+\left[\text{sum of at most $(1+\log t)$ i.i.d. ${\rm Lap}(\eta(m,n,\delta))$ terms}\right]

Hence, using Lemma A.1 and Lemma A.2, we get that at time tt,

Pr(|μ^tμ|1tln2δ+cη(m,n,δ)1+logtln1δ)12δ\mathrm{Pr}\left(\left\lvert\hat{\mu}_{t}-\mu\right\rvert\leq\sqrt{\frac{1}{t}\ln\frac{2}{\delta}}+c\eta(m,n,\delta)\sqrt{1+\log t}\ln\frac{1}{\delta}\right)\geq 1-2\delta

Final utility guarantee for Algorithm 4 at time tt:

The above utility guarantee was obtained for Algorithm 4-NP, the variant of Algorithm 4 without the projection operator $\Pi_\ell(\cdot)$ in Line 11. Now, let $\mathcal{E}_1$ be the event that no truncation happens. We already saw that no truncation happens (i.e., the projection operators play no role) with probability at least $1-\delta$. That is, $\mathrm{Pr}(\mathcal{E}_1)\geq 1-\delta$. Observe that

\mathcal{E}_{1}:=\left\{\text{no truncation happens}\right\}=\left\{\text{Algorithm 4 and Algorithm 4-NP become equivalent}\right\}.

Let

\mathcal{E}_{2}=\left\{\left\lvert\hat{\mu}_{t}-\mu\right\rvert\leq\sqrt{\frac{1}{t}\ln\frac{2}{\delta}}+c\eta(m,n,\delta)\sqrt{1+\log t}\ln\frac{1}{\delta}\text{ for Algorithm 4-NP}\right\}

We saw above that $\mathrm{Pr}(\mathcal{E}_2)\geq 1-2\delta$. Thus, by the union bound, $\mathrm{Pr}(\mathcal{E}_1\cap\mathcal{E}_2)\geq 1-3\delta$. That is, for Algorithm 4, we have, with probability at least $1-3\delta$,

|μ^tμ|\displaystyle\left\lvert\hat{\mu}_{t}-\mu\right\rvert 1tln2δ+cη(m,n,δ)t1+logtln1δ\displaystyle\leq\sqrt{\frac{1}{t}\ln\frac{2}{\delta}}+c\frac{\eta(m,n,\delta)}{t}\sqrt{1+\log t}\ln\frac{1}{\delta}
=O(1tln1δ+η(m,n,δ)tlogtln1δ)\displaystyle=O\left(\sqrt{\frac{1}{t}\ln\frac{1}{\delta}}+\frac{\eta(m,n,\delta)}{t}\sqrt{\log t}\ln\frac{1}{\delta}\right)
=O(1tln1δ+mtεlogtlogmlog(nlogm)lnnlogmδln1δ)\displaystyle=O\left(\sqrt{\frac{1}{t}\ln\frac{1}{\delta}}+\frac{\sqrt{m}}{t\varepsilon}\sqrt{\log t}\log m\log(n\log m)\sqrt{\ln\frac{n\log m}{\delta}}\ln\frac{1}{\delta}\right) (using η(m,n,δ)\eta(m,n,\delta) from (5))
=O~(1t+mtε).\displaystyle=\tilde{O}\left(\frac{1}{\sqrt{t}}+\frac{\sqrt{m}}{t\varepsilon}\right).

Appendix D Continual mean estimation assuming prior: Multiple binary mechanisms

D.1 Proof of Theorem 3.2

We will prove the following theorem.

Theorem.

Assume that we are given a user-level $\varepsilon$-DP prior estimate $\tilde{\mu}$ of the true mean $\mu$ such that $\left\lvert\tilde{\mu}-\mu\right\rvert\leq\frac{1}{\sqrt{m}}$. Then, Algorithm 5 is user-level $2\varepsilon$-DP. Moreover, for any given $t\in[T]$, we have with probability at least $1-\delta$ that

|μ^tμ|=O~(1t+Mttε),\displaystyle\left\lvert\hat{\mu}_{t}-\mu\right\rvert=\tilde{O}\left(\frac{1}{\sqrt{t}}+\frac{\sqrt{M_{t}}}{t\varepsilon}\right),

where MtM_{t} denotes the maximum number of samples obtained from any user till time tt, i.e., Mt=max{mu(t):u[n]}M_{t}=\max\left\{m_{u}(t):u\in[n]\right\}, where mu(t)m_{u}(t) is the number of samples obtained from user uu till time tt.

Proof.

We first prove the privacy guarantee and then the utility guarantee. Let $L:=\lfloor\log m\rfloor$.

Privacy.

The prior estimate μ~\tilde{\mu} is given to be user-level ε\varepsilon-DP. We will now show that, for each {0,,L}\ell\in\left\{0,\ldots,L\right\}, the array BinMech[].NoisyPartialSums{\rm BinMech}[\ell].{\rm NoisyPartialSums} is user-level εL+1\frac{\varepsilon}{L+1}-DP. By composition property (Lemma 2.2), this would mean that (BinMech[].NoisyPartialSums)=0L\Big{(}{\rm BinMech}[\ell].{\rm NoisyPartialSums}\Big{)}_{\ell=0}^{L} is user-level ε\varepsilon-DP. Note that the output estimates μ^t\hat{\mu}_{t} are computed using (BinMech[].Sum)=0L\Big{(}{\rm BinMech}[\ell].{\rm Sum}\Big{)}_{\ell=0}^{L}, which is a function of
(BinMech[].NoisyPartialSums)=0L\Big{(}{\rm BinMech}[\ell].{\rm NoisyPartialSums}\Big{)}_{\ell=0}^{L}. Thus, if (BinMech[].NoisyPartialSums)=0L\Big{(}{\rm BinMech}[\ell].{\rm NoisyPartialSums}\Big{)}_{\ell=0}^{L} is user-level ε\varepsilon-DP, and prior estimate μ~\tilde{\mu} is also user-level ε\varepsilon-DP, the overall output (μ^t)t=1T\left(\hat{\mu}_{t}\right)_{t=1}^{T} is guaranteed to be user-level 2ε2\varepsilon-DP.

Proof that BinMech[].NoisyPartialSums{\rm BinMech}[\ell].{\rm NoisyPartialSums} is user-level εL+1\frac{\varepsilon}{L+1}-DP:

Consider ${\rm BinMech}[\ell]$. A user $u$ contributes at most one element to ${\rm BinMech}[\ell].{\rm Stream}$; this element is $\Pi_\ell\left(\sum_{j=2^{\ell-1}+1}^{2^{\ell}}x_j^{(u)}\right)$, where $\Pi_\ell(\cdot)$ is the projection onto the interval $\mathcal{I}_\ell$ defined in (4). So, there are at most $n$ elements in ${\rm BinMech}[\ell].{\rm Stream}$ throughout the course of the algorithm. A given element in ${\rm BinMech}[\ell].{\rm Stream}$ is used at most $(1+\log n)$ times while computing terms in ${\rm BinMech}[\ell].{\rm NoisyPartialSums}$. Thus, changing a user can change the $\ell_1$-norm of the array ${\rm BinMech}[\ell].{\rm NoisyPartialSums}$ by at most $(1+\log n)(2\Delta_\ell)$, where $\Delta_\ell$ is as in (4). Hence, adding independent ${\rm Lap}(\eta(m,n,\ell,\delta))$ noise (with $\eta(m,n,\ell,\delta)$ as in (7)) while computing each term in ${\rm BinMech}[\ell].{\rm NoisyPartialSums}$ is sufficient to ensure that the array ${\rm BinMech}[\ell].{\rm NoisyPartialSums}$ remains user-level $\frac{\varepsilon}{L+1}$-DP throughout the course of the algorithm.

Utility.

Exactly as in the utility proof of Theorem 3.1 (Section C.2), we have that with probability at least 1δ1-\delta, the projection operators do not play any role throughout the algorithm, and thus no truncation happens.

Utility at time tt ignoring projections Π\Pi_{\ell}:

For now, consider Algorithm 5 without the projection operator Π()\Pi_{\ell}(\cdot) in Line 11. Call it Algorithm 5-NP (NP \equiv ‘No Projection’). We will derive utility at time tt for Algorithm 5-NP.

The only difference between Algorithm 5-NP and Algorithm 4-NP is that the term $\left(\sum_{j=2^{\ell-1}+1}^{2^{\ell}}x_j^{(u)}\right)$ from user $u$ is fed to ${\rm BinMech}[\ell].{\rm Stream}$ instead of a common binary mechanism stream. So, for Algorithm 5-NP, using Claim C.1, we have that, at time $t$, the combined streams $\big({\rm BinMech}[\ell].{\rm Stream}\big)_{\ell=0}^{L}$ contain information about at least $t/2$ samples. Thus, $S_t$ is a sum of at least $t/2$ samples with Laplace noise added to it.

Now, since every user has contributed at most MtM_{t} samples, it follows that BinMech[].Stream{\rm BinMech}[\ell].{\rm Stream} for >logMt\ell>\lfloor\log M_{t}\rfloor will be empty. That is, only BinMech[0],,BinMech[logMt]{\rm BinMech}[0],\ldots,{\rm BinMech}[\lfloor\log M_{t}\rfloor] will contribute to the sum StS_{t} at time tt. Moreover, for logMt\ell\leq\lfloor\log M_{t}\rfloor, since each user contributes at most one item to BinMech[].Stream{\rm BinMech}[\ell].{\rm Stream}, there can be at most nn terms in BinMech[].Stream{\rm BinMech}[\ell].{\rm Stream} and in BinMech[].NoisyPartialSums{\rm BinMech}[\ell].{\rm NoisyPartialSums}. Thus, computing BinMech[].Sum{\rm BinMech}[\ell].{\rm Sum} at time tt involves at most (1+logn)(1+\log n) terms from BinMech[].NoisyPartialSums{\rm BinMech}[\ell].{\rm NoisyPartialSums}. Each term in BinMech[].NoisyPartialSums{\rm BinMech}[\ell].{\rm NoisyPartialSums} has independent Lap(η(m,n,,δ)){\rm Lap}(\eta(m,n,\ell,\delta)) noise added to it, where η(m,n,,δ)\eta(m,n,\ell,\delta) is as in (7).

Thus, at time tt, for Algorithm 5-NP,

St=[sum of at least t/2 Bernoulli samples]+[=0logMtsum of at most (1+logn) i.i.d. Lap(η(m,n,,δ)) terms]S_{t}=\left[\text{sum of at least $t/2$ Bernoulli samples}\right]+\left[\sum_{\ell=0}^{\lfloor\log M_{t}\rfloor}\text{sum of at most $(1+\log n)$ i.i.d. ${\rm Lap}(\eta(m,n,\ell,\delta))$ terms}\right]

Hence, using Lemma A.1 and Lemma A.2, we get that at time tt,

Pr(|μ^tμ|1tln2δ+ctln1δ=0logMt(1+logn)η(m,n,,δ)2)12δ\mathrm{Pr}\left(\left\lvert\hat{\mu}_{t}-\mu\right\rvert\leq\sqrt{\frac{1}{t}\ln\frac{2}{\delta}}+\frac{c}{t}\ln\frac{1}{\delta}\sqrt{\sum_{\ell=0}^{\lfloor\log M_{t}\rfloor}(1+\log n)\eta(m,n,\ell,\delta)^{2}}\right)\geq 1-2\delta
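For concreteness, the Laplace contribution in this bound (before dividing by $t$) can be evaluated as in the following sketch (ours; the per-level scale $\eta(m,n,\ell,\delta)$ is passed in as a callable, and $c$ is the absolute constant from Lemma A.2):

```python
import math

def total_laplace_deviation(eta, L_t, n, delta, c=1.0):
    """Lemma A.2 bound on the total Laplace deviation of S_t:
    c * ln(1/delta) * sqrt(sum over l = 0..L_t of (1 + log n) * eta(l)^2),
    with at most (1 + log n) Lap(eta(l)) terms per level l."""
    var = sum((1 + math.log2(n)) * eta(l) ** 2 for l in range(L_t + 1))
    return c * math.log(1 / delta) * math.sqrt(var)
```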

Final utility guarantee for Algorithm 5 at time tt:

The above utility guarantee was obtained for Algorithm 5-NP, the variant of Algorithm 5 without the projection operator $\Pi_\ell(\cdot)$ in Line 11. As shown above, with probability at least $1-\delta$, no truncation happens. Proceeding as in the proof of Theorem 3.1 (Section C.2), we take a union bound and get that, with probability at least $1-3\delta$, Algorithm 5 satisfies

|μ^tμ|\displaystyle\left\lvert\hat{\mu}_{t}-\mu\right\rvert 1tln2δ+ctln1δ=0logMt(1+logn)η(m,n,,δ)2\displaystyle\leq\sqrt{\frac{1}{t}\ln\frac{2}{\delta}}+\frac{c}{t}\ln\frac{1}{\delta}\sqrt{\sum_{\ell=0}^{\lfloor\log M_{t}\rfloor}(1+\log n)\eta(m,n,\ell,\delta)^{2}}
=O(1tln1δ+η(m,n,logMt,δ)tlognln1δ)\displaystyle=O\left(\sqrt{\frac{1}{t}\ln\frac{1}{\delta}}+\frac{\eta(m,n,\lfloor\log M_{t}\rfloor,\delta)}{t}\sqrt{\log n}\ln\frac{1}{\delta}\right)
=O(1tln1δ+Mttε(logm)(logn)3/2lnnlogmδln1δ)\displaystyle=O\left(\sqrt{\frac{1}{t}\ln\frac{1}{\delta}}+\frac{\sqrt{M_{t}}}{t\varepsilon}(\log m)(\log n)^{3/2}\sqrt{\ln\frac{n\log m}{\delta}}\ln\frac{1}{\delta}\right) (from (7))
=O~(1t+Mttε).\displaystyle=\tilde{O}\left(\frac{1}{\sqrt{t}}+\frac{\sqrt{M_{t}}}{t\varepsilon}\right).

Appendix E Continual mean estimation: Full algorithm

E.1 Private Median Algorithm

Claim E.1.

For any 1\ell\geq 1, Algorithm 7 (PrivateMedian{\rm PrivateMedian}) is user-level ε\varepsilon-DP. Moreover, with probability at least 1δβ1-\delta-\beta,

|μ~μ|212ln2k(ε,,β)δ\left\lvert\tilde{\mu}-\mu\right\rvert\leq 2\sqrt{\frac{1}{2^{\ell}}\ln\frac{2k(\varepsilon,\ell,\beta)}{\delta}}

where k(ε,,β)=16εln2/2βk(\varepsilon,\ell,\beta)=\frac{16}{\varepsilon}\ln\frac{2^{\ell/2}}{\beta}.

Proof.

Let k:=k(ε,,β)k:=k(\varepsilon,\ell,\beta).

Privacy.

From the way the arrays $S_1,\ldots,S_k$ in Algorithm 7 are created (Lines 3-8), it follows that samples from any given user $u$ appear in at most $2$ arrays. This is because: (i) each array contains $2^{\ell-1}$ samples; (ii) each user contributes at most $2^{\ell-1}$ samples (see the definition of $r$ in Line 4); and (iii) samples from a user are added contiguously to arrays (see Lines 5-6). Now, for $j\in[k]$, since $Y_j$ is the average of samples in array $S_j$, and $Y'_j$ is a quantized version of $Y_j$, it follows that changing a user changes at most $2$ elements of $\{Y'_1,\ldots,Y'_k\}$. Thus, for any $y\in\mathcal{T}$, the cost $c(y)$ can vary by at most $2$ if a user is changed. Since the worst-case sensitivity (w.r.t. change of a user) of the cost $c$ is $\Delta c:=2$, the exponential mechanism with sampling probability proportional to $\exp\left(-\frac{\varepsilon}{2\Delta c}c(y)\right)$ is $\varepsilon$-DP ([DR+14]) w.r.t. change of a user. This proves that Algorithm 7 is user-level $\varepsilon$-DP.
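A minimal sketch of this exponential-mechanism step is given below; since Algorithm 7's exact cost function is not reproduced here, the median-style cost $c(y)$ in the sketch is our illustrative choice with the same user-level sensitivity of $2$:

```python
import numpy as np

def private_median(block_avgs, grid, eps, rng=None):
    """Exponential mechanism over the candidate grid T. block_avgs are
    the quantized block averages Y'_1, ..., Y'_k; changing one user
    moves at most two of them, so the count-based cost below has
    user-level sensitivity at most 2."""
    rng = rng or np.random.default_rng()
    vals = np.asarray(block_avgs)
    k = len(vals)
    # c(y): distance (in counts) of y from being a median of the Y'_j.
    cost = np.array([abs(np.sum(vals <= y) - k / 2) for y in grid])
    logits = -(eps / (2 * 2)) * cost    # exp(-eps c(y) / (2 Delta_c))
    probs = np.exp(logits - logits.max())
    return rng.choice(np.asarray(grid), p=probs / probs.sum())
```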

Utility.

Since, for each $j\in[k]$, $Y_j$ is the sample mean of $2^{\ell-1}$ Bernoulli random variables, we have by Lemma A.1 that

j[k],Pr(|Yjμ|12ln2kδ)1δk.\forall j\in[k],\ \mathrm{Pr}\left(\left\lvert Y_{j}-\mu\right\rvert\leq\sqrt{\frac{1}{2^{\ell}}\ln\frac{2k}{\delta}}\right)\geq 1-\frac{\delta}{k}.

Thus, by union bound,

Pr(j[k],|Yjμ|12ln2kδ)1δ.\mathrm{Pr}\left(\forall j\in[k],\ \left\lvert Y_{j}-\mu\right\rvert\leq\sqrt{\frac{1}{2^{\ell}}\ln\frac{2k}{\delta}}\right)\geq 1-\delta.

Since |Yjμ||YjYj|+|Yjμ|\left\lvert Y^{\prime}_{j}-\mu\right\rvert\leq\left\lvert Y^{\prime}_{j}-Y_{j}\right\rvert+\left\lvert Y_{j}-\mu\right\rvert, and |YjYj|2/2\left\lvert Y^{\prime}_{j}-Y_{j}\right\rvert\leq 2^{-\ell/2}, it follows that

Pr(j[k],|Yjμ|212ln2kδ)1δ.\mathrm{Pr}\left(\forall j\in[k],\ \left\lvert Y^{\prime}_{j}-\mu\right\rvert\leq 2\sqrt{\frac{1}{2^{\ell}}\ln\frac{2k}{\delta}}\right)\geq 1-\delta. (14)

Now, from Theorem 3.1 in [FS17], it follows that the exponential mechanism outputs μ~\tilde{\mu} which is a (1/4,3/4)\left(1/4,3/4\right)-quantile of Y1,,YkY^{\prime}_{1},\ldots,Y^{\prime}_{k} with probability at least 1β1-\beta. (In the statement of Theorem 3.1 in [FS17], the condition “m4ln(|T|/β)/εαm\geq 4\ln(\left\lvert T\right\rvert/\beta)/\varepsilon\alpha” becomes, in our case, k16ln(|𝒯|/β)/εk\geq 16\ln(\left\lvert\mathcal{T}\right\rvert/\beta)/\varepsilon after substituting m=km=k, α=2\alpha=2, and accounting for the fact that the cost c(y)c(y) has sensitivity 22 w.r.t. change of a user).

If $\left\lvert Y'_j-\mu\right\rvert\leq 2\sqrt{\frac{1}{2^{\ell}}\ln\frac{2k}{\delta}}$ holds for all $j\in[k]$, it must also hold for any $(1/4,3/4)$-quantile of $Y'_1,\ldots,Y'_k$. (For a dataset $s\in\mathbb{R}^n$, a $(1/4,3/4)$-quantile is any $v\in\mathbb{R}$ such that $\left\lvert\{i\in[n]:s_i\leq v\}\right\rvert>\frac{n}{4}$ and $\left\lvert\{i\in[n]:s_i<v\}\right\rvert<\frac{3n}{4}$.) Thus, from (14) and the fact that $\tilde{\mu}$ is a $(1/4,3/4)$-quantile of $Y'_1,\ldots,Y'_k$ with probability at least $1-\beta$, we get using the union bound that

Pr(|μ~μ|212ln2kδ)1δβ.\mathrm{Pr}\left(\left\lvert\tilde{\mu}-\mu\right\rvert\leq 2\sqrt{\frac{1}{2^{\ell}}\ln\frac{2k}{\delta}}\right)\geq 1-\delta-\beta.

E.2 Proof of Theorem 3.4

We will prove the following theorem.

Theorem.

Algorithm 6 for continual Bernoulli mean estimation is user-level ε\varepsilon-DP. Moreover, if at time t[T]t\in[T],

u=1nmin{mu(t),Mt2}Mt216ε(2Lln3LMtδ)(diversity condition)\sum_{u=1}^{n}\min\left\{m_{u}(t),\frac{M_{t}}{2}\right\}\geq\frac{M_{t}}{2}\frac{16}{\varepsilon}\left(2L\ln\frac{3L\sqrt{M_{t}}}{\delta}\right)\quad(\text{diversity condition})

then, with probability at least 1δ1-\delta,

|μ^tμ|=O~(1t+Mttε).\displaystyle\left\lvert\hat{\mu}_{t}-\mu\right\rvert=\tilde{O}\left(\frac{1}{\sqrt{t}}+\frac{\sqrt{M_{t}}}{t\varepsilon}\right).

Here, mu(t)m_{u}(t) is the number of samples obtained from user uu till time tt, and Mt=max{mu(t):u[n]}M_{t}=\max\left\{m_{u}(t):u\in[n]\right\}.

Proof.

Let L:=logmL:=\lceil\log m\rceil.

Privacy.

Algorithm 6 uses samples to:

  • (i) compute $\tilde{\mu}_\ell$ using ${\rm PrivateMedian}\left((x_i,u_i)_{i=1}^{t},\frac{\varepsilon}{2L},\ell,\frac{\delta}{3L}\right)$, for $\ell\in\{2,\ldots,L\}$; and

  • (ii) compute the arrays ${\rm BinMech}[\ell].{\rm NoisyPartialSums}$ for $\ell\in\{0,\ldots,L\}$.

The output $\left(\hat{\mu}_t\right)_{t=1}^{T}$ is computed using ${\rm BinMech}[\ell].{\rm Sum}$, which, in turn, is a function of the arrays ${\rm BinMech}[\ell].{\rm NoisyPartialSums}$, $\ell\in\{0,\ldots,L\}$.

From Claim E.1, we get that μ~=PrivateMedian((xi,ui)i=1t,ε2L,,δ3L)\tilde{\mu}_{\ell}={\rm PrivateMedian}\left((x_{i},u_{i})_{i=1}^{t},\frac{\varepsilon}{2L},\ell,\frac{\delta}{3L}\right) is user-level ε2L\frac{\varepsilon}{2L}-DP. Thus, from composition property of DP, we get that (μ~2,,μ~L)(\tilde{\mu}_{2},\ldots,\tilde{\mu}_{L}) is user-level ε2\frac{\varepsilon}{2}-DP.

Consider ${\rm BinMech}[\ell]$. A user $u$ contributes at most one element to ${\rm BinMech}[\ell].{\rm Stream}$; this element is $\Pi_\ell\left(\sum_{j=2^{\ell-1}+1}^{2^{\ell}}x_j^{(u)}\right)$, where $\Pi_\ell(\cdot)$ is the projection onto the interval $\mathcal{I}_\ell$ defined in (8). So, there are at most $n$ elements in ${\rm BinMech}[\ell].{\rm Stream}$ throughout the course of the algorithm. Now, a given element in ${\rm BinMech}[\ell].{\rm Stream}$ is used at most $(1+\log n)$ times while computing terms in ${\rm BinMech}[\ell].{\rm NoisyPartialSums}$. Thus, changing a user can change the $\ell_1$-norm of the array ${\rm BinMech}[\ell].{\rm NoisyPartialSums}$ by at most $(1+\log n)(2\Delta_\ell)$, where $\Delta_\ell$ is as in (9). Hence, adding independent ${\rm Lap}(\eta(m,n,\ell,\delta))$ noise (with $\eta(m,n,\ell,\delta)$ as in (10)) while computing each term in ${\rm BinMech}[\ell].{\rm NoisyPartialSums}$ is sufficient to ensure that the array ${\rm BinMech}[\ell].{\rm NoisyPartialSums}$ remains user-level $\frac{\varepsilon}{2(L+1)}$-DP throughout the course of the algorithm. Since there are $L+1$ binary mechanisms, by the composition property of DP, the overall collection $\left({\rm BinMech}[\ell].{\rm NoisyPartialSums}\right)_{\ell=0}^{L}$ is user-level $\frac{\varepsilon}{2}$-DP.

Since (μ~2,,μ~L)(\tilde{\mu}_{2},\ldots,\tilde{\mu}_{L}) is user-level ε2\frac{\varepsilon}{2}-DP, and (BinMech[].NoisyPartialSums)=0L\left({\rm BinMech}[\ell].{\rm NoisyPartialSums}\right)_{\ell=0}^{L} is user-level ε2\frac{\varepsilon}{2}-DP, we again use composition property to conclude that the output (μ^t)t=1T(\hat{\mu}_{t})_{t=1}^{T} by Algorithm 6 is user-level ε\varepsilon-DP.
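The overall budget accounting can be summarized in the following small sketch (ours):

```python
def budget_split(eps, L):
    """Algorithm 6 splits its budget as: L - 1 PrivateMedian calls at
    eps/(2L) each, plus L + 1 binary mechanisms at eps/(2(L+1)) each;
    by basic composition the total is at most eps."""
    median_eps = eps / (2 * L)           # per PrivateMedian call
    binmech_eps = eps / (2 * (L + 1))    # per BinMech level
    total = (L - 1) * median_eps + (L + 1) * binmech_eps
    assert total <= eps
    return median_eps, binmech_eps
```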

Utility.

At time $t$, $M_t$ is the maximum number of samples contributed by any user. We call ${\rm BinMech}[\ell]$ “active” at time $t$ if there are sufficiently many users and samples for $\tilde{\mu}_\ell$ to be obtained via ${\rm PrivateMedian}\left((x_i,u_i)_{i=1}^{t},\frac{\varepsilon}{2L},\ell,\frac{\delta}{3L}\right)$ (Line 10 of Algorithm 6). Recall that we need $\tilde{\mu}_\ell$ to create the truncation interval $\mathcal{I}_\ell$ (see (8)). We know that, for a given user $u$, $x_1^{(u)}$ goes to ${\rm BinMech}[0].{\rm Stream}$, $x_2^{(u)}$ goes to ${\rm BinMech}[1].{\rm Stream}$, and $\Pi_\ell\left(\sum_{j=2^{\ell-1}+1}^{2^{\ell}}x_j^{(u)}\right)$ goes to ${\rm BinMech}[\ell].{\rm Stream}$ for $\ell\geq 2$, provided ${\rm BinMech}[\ell]$ is “active”. Thus, at time $t$, since the maximum number of samples contributed by any user is $M_t$, we would like all binary mechanisms up to ${\rm BinMech}[L_t]$ to be “active”, where $L_t:=\lfloor\log M_t\rfloor$. Condition (11) guarantees that we have sufficiently many users and samples to obtain $\tilde{\mu}_2,\ldots,\tilde{\mu}_{L_t}$ via ${\rm PrivateMedian}$ (Algorithm 7), thus ensuring that every truncation required at time $t$ is indeed possible.

All μ~\tilde{\mu}_{\ell}’s are “good”:

Suppose the diversity condition (11) holds. Then, for each $\ell\in\{2,\ldots,L\}$, ${\rm PrivateMedian}\left((x_i,u_i)_{i=1}^{t},\frac{\varepsilon}{2L},\ell,\frac{\delta}{3L}\right)$ outputs a “good” $\tilde{\mu}_\ell$ satisfying

|μ~μ|212ln2k(ε2L,,δ3L)δ/3L\left\lvert\tilde{\mu}_{\ell}-\mu\right\rvert\leq 2\sqrt{\frac{1}{2^{\ell}}\ln\frac{2k(\frac{\varepsilon}{2L},\ell,\frac{\delta}{3L})}{\delta/3L}}

with probability at least 12δ3L1-\frac{2\delta}{3L} (see Claim E.1). Thus, by union bound, we have the following: with probability at least 12δ31-\frac{2\delta}{3}, μ~\tilde{\mu}_{\ell} is “good” for every {2,,L}\ell\in\left\{2,\ldots,L\right\}.

No truncation happens:

For a user uu, for any 1\ell\geq 1, we have from Lemma A.1 that

Pr(|(j=21+12xj(u))(21)μ|212ln2nlogmδ/3)1δ3nlogm.\mathrm{Pr}\left(\left\lvert\left(\sum_{j=2^{\ell-1}+1}^{2^{\ell}}x_{j}^{(u)}\right)-(2^{\ell-1})\mu\right\rvert\leq\sqrt{\frac{2^{\ell-1}}{2}\ln\frac{2n\log m}{\delta/3}}\right)\geq 1-\frac{\delta}{3n\log m}.

Taking union bound over nn users, we have that

Pr(u[n],|(j=21+12xj(u))(21)μ|212ln2nlogmδ/3)1δ3logm.\mathrm{Pr}\left(\forall u\in[n],\ \left\lvert\left(\sum_{j=2^{\ell-1}+1}^{2^{\ell}}x_{j}^{(u)}\right)-(2^{\ell-1})\mu\right\rvert\leq\sqrt{\frac{2^{\ell-1}}{2}\ln\frac{2n\log m}{\delta/3}}\right)\geq 1-\frac{\delta}{3\log m}.

Now, since all $\tilde{\mu}_\ell$'s are “good” with probability at least $1-\frac{2\delta}{3}$, we get using the union bound that

Pr(u[n],[L],|(j=21+12xj(u))(21)μ~|212ln2nlogmδ+2ln2k(ε2L,,δ3L)δ/3L)1δ.\mathrm{Pr}\left(\forall u\in[n],\forall\ell\in[L],\ \left\lvert\left(\sum_{j=2^{\ell-1}+1}^{2^{\ell}}x_{j}^{(u)}\right)-(2^{\ell-1})\tilde{\mu}_{\ell}\right\rvert\leq\sqrt{\frac{2^{\ell-1}}{2}\ln\frac{2n\log m}{\delta}}+\sqrt{2^{\ell}\ln\frac{2k(\frac{\varepsilon}{2L},\ell,\frac{\delta}{3L})}{\delta/3L}}\right)\geq 1-\delta.

Note that the projection operator Π()\Pi_{\ell}(\cdot) was defined as projection on interval \mathcal{I}_{\ell} as in (8). The above equation shows that with probability at least 1δ1-\delta, the projection operators do not play any role throughout the algorithm, and thus no truncation happens.

Utility at time tt ignoring projections Π\Pi_{\ell}:

For now, consider Algorithm 6 without the projection operator Π()\Pi_{\ell}(\cdot) in Line 22. Call it Algorithm 6-NP (NP \equiv ‘No Projection’). We will derive utility at time tt for Algorithm 6-NP.

Note that Algorithm 6-NP does not require the $\tilde{\mu}_\ell$'s. Thus, the utility of Algorithm 6-NP is the same as that of Algorithm 5-NP in the proof of Theorem 3.2 (Section D.1). Hence, at time $t$, for Algorithm 6-NP,

Pr(|μ^tμ|1tln2δ+ctln1δ=0logMt(1+logn)η(m,n,,δ)2)12δ\mathrm{Pr}\left(\left\lvert\hat{\mu}_{t}-\mu\right\rvert\leq\sqrt{\frac{1}{t}\ln\frac{2}{\delta}}+\frac{c}{t}\ln\frac{1}{\delta}\sqrt{\sum_{\ell=0}^{\lfloor\log M_{t}\rfloor}(1+\log n)\eta(m,n,\ell,\delta)^{2}}\right)\geq 1-2\delta

where η(m,n,,δ)\eta(m,n,\ell,\delta) is as in (10).

Final utility guarantee for Algorithm 6 at time tt:

The above utility guarantee was obtained for Algorithm 6-NP, the variant of Algorithm 6 without the projection operator $\Pi_\ell(\cdot)$ in Line 22. As argued above, with probability at least $1-\delta$, no truncation happens. Thus, proceeding as in the proof of Theorem 3.1 (Section C.2), we take a union bound and get that, with probability at least $1-3\delta$, Algorithm 6 satisfies

|μ^tμ|\displaystyle\left\lvert\hat{\mu}_{t}-\mu\right\rvert 1tln2δ+ctln1δ=0logMt(1+logn)η(m,n,,δ)2\displaystyle\leq\sqrt{\frac{1}{t}\ln\frac{2}{\delta}}+\frac{c}{t}\ln\frac{1}{\delta}\sqrt{\sum_{\ell=0}^{\lfloor\log M_{t}\rfloor}(1+\log n)\eta(m,n,\ell,\delta)^{2}}
=O(1tln1δ+η(m,n,logMt,δ)tlognln1δ)\displaystyle=O\left(\sqrt{\frac{1}{t}\ln\frac{1}{\delta}}+\frac{\eta(m,n,\lfloor\log M_{t}\rfloor,\delta)}{t}\sqrt{\log n}\ln\frac{1}{\delta}\right)
=O(1tln1δ+ΔlogMttε(logm)(logn)3/2ln1δ)\displaystyle=O\left(\sqrt{\frac{1}{t}\ln\frac{1}{\delta}}+\frac{\Delta_{\lfloor\log M_{t}\rfloor}}{t\varepsilon}(\log m)(\log n)^{3/2}\ln\frac{1}{\delta}\right) (from (10))
=O(1tln1δ+Mttε(logm)(logn)3/2(lnnlogmδ+ln((logm)2εδlnMtδ))ln1δ)\displaystyle=O\left(\sqrt{\frac{1}{t}\ln\frac{1}{\delta}}+\frac{\sqrt{M_{t}}}{t\varepsilon}(\log m)(\log n)^{3/2}\left(\sqrt{\ln\frac{n\log m}{\delta}}+\sqrt{\ln\left(\frac{(\log m)^{2}}{\varepsilon\delta}\ln\frac{\sqrt{M_{t}}}{\delta}\right)}\right)\ln\frac{1}{\delta}\right) (substituting for ΔlogMt\Delta_{\lfloor\log M_{t}\rfloor} from (8))
=O~(1t+Mttε).\displaystyle=\tilde{O}\left(\frac{1}{\sqrt{t}}+\frac{\sqrt{M_{t}}}{t\varepsilon}\right).