

A Unifying Privacy Analysis Framework for Unknown Domain Algorithms in Differential Privacy

Ryan Rogers
Abstract

There are many existing differentially private algorithms for releasing histograms, i.e. counts with corresponding labels, in various settings. Our focus in this survey is to revisit some of the existing differentially private algorithms for releasing histograms over unknown domains, i.e. where the labels of the counts that are to be released are not known beforehand. The main practical advantage of releasing histograms over an unknown domain is that the algorithm does not need to fill in missing labels, i.e. labels that are not present in the original histogram but could appear in the histogram of a hypothetical neighboring dataset. However, the challenge in designing differentially private algorithms for releasing histograms over an unknown domain is that some outcomes can clearly reveal which input was used, violating privacy. The goal then is to show that these differentiating outcomes occur with very low probability. We present a unified framework for the privacy analyses of several existing algorithms. Furthermore, our analysis uses approximate concentrated differential privacy from Bun and Steinke [1], rather than differential privacy directly, which can improve the privacy loss parameters, especially when composing many of these algorithms together in an overall system.

1 Introduction

Releasing histograms, counts over some set of items, is one of the most fundamental tasks in data analytics. Given a dataset, we want to compute the number of rows that satisfy some condition, grouped by some column(s) of interest, say the number of rows in each country. Despite the commonality of this task, there are many different differentially private algorithms to use, depending on the setting, e.g. do we want to show only the top-$k$, do we know how many items a user can have in any row, do we know all possible values that a column can take, how often is the data refreshed? We will focus on the setting where we do not have, or even know, the counts of all possible items in a histogram. This is typical in practice, because SQL queries will only return items that have positive counts; the query engine has no way of knowing which items are absent from the dataset. Unfortunately, differential privacy requires considering a hypothetical neighboring dataset, which might contain items that were previously unseen, which makes the privacy analysis challenging. Either we need to populate all possible items that could appear in the dataset and fill in the missing counts as zero, or we need to design better, more practical, DP algorithms. In the latter case, we would want to be able to ensure DP whether we were computing the top-10 skills in a dataset or the top-10 credit card numbers in the dataset. The latter should return no results, but the algorithm should ensure privacy in both cases. We will refer to the scenario where the DP algorithm does not know the domain of items that could be in a dataset beforehand as the Unknown Domain setting.

In differential privacy there are typically two parameters $\varepsilon,\delta$ that are used to quantify the privacy of an algorithm. The parameter $\varepsilon$ is typically referred to as the amount of privacy loss that an algorithm ensures, while $\delta$ is commonly referred to as the probability with which the algorithm can have larger privacy loss than $\varepsilon$. In fact, when $\delta>0$, we say an algorithm satisfies approximate differential privacy, while with $\delta=0$ we say it satisfies pure DP. The parameter $\delta$ can then be the chance that the algorithm returns a result that clearly violates privacy, e.g. returning an individual's data record in the dataset. Not every approximate DP algorithm satisfies this interpretation, which has resulted in variants of DP based on Rényi divergence. In particular, adding Gaussian noise with fixed standard deviation $\sigma$ to a statistic of interest cannot be pure DP, but can be $(\varepsilon(\delta),\delta)$-DP for any $\delta>0$. Hence the probability of failure need not be fixed in advance in the algorithm. However, there are many different algorithms that are shown to be $(\varepsilon,\delta)$-differentially private for a particular $\delta>0$. In particular, designing DP algorithms for releasing histograms in the Unknown Domain setting will require setting a failure probability in advance.

Consider the example where we want to know the most popular words typed in an email client. The set of all potential words is massive, and can include slang, typos, and abbreviations, so we only want to take words that are present in the dataset, rather than the set of all possible words. One approach would be to add noise to all word counts and sort them to return the top ones. However, there could be a word like “RyanRogersSSN123456789”. Even if there is noise in the counts, the mere presence of this record is a clear violation of privacy. To prevent this, we can introduce a threshold on the noisy counts, but then there is a chance that the noise is especially large, making a small count go above the threshold. This is where the $\delta>0$ probability comes in: we want to make sure that the threshold is set high enough that small counts with noise can only appear above the threshold with probability $\delta$. However, it does not suffice to just bound the probability of bad outcomes to prove an algorithm is approximate differentially private.

We present a general framework that can be used in the privacy analysis of several algorithms that aim to release counts from an unspecified domain, returning only results that are present in the original dataset, although the framework can be applied to other scenarios. The main idea is to show a mechanism $A$ satisfies three conditions: (1) there are small chance events that can lead to differentiating outcomes, where it is clear which of two neighboring datasets was used, (2) there is another mechanism $A^{\prime}$ that can know both neighboring datasets and is equal to $A$ on all non-differentiating outcomes, and (3) $A^{\prime}$ is pure DP. Although the algorithms we cover were previously shown to satisfy approximate differential privacy, we revisit their analyses, following our general framework. Furthermore, we show that providing a privacy analysis in terms of approximate concentrated DP (CDP) can lead to improved privacy parameters, especially with algorithms based on Gaussian noise or the Exponential Mechanism.

We advocate for presenting privacy guarantees of new or existing algorithms in terms of approximate CDP, rather than approximate DP, as composing CDP parameters is straightforward, while composing approximate DP parameters can be complicated and loose. Note that each algorithm is likely to be part of a more general privacy system where the overall privacy guarantee of the system can be in terms of approximate DP. It has become common to compose algorithms using an analysis based on approximate CDP and then convert to an overall approximate DP guarantee at the end. Leaving privacy guarantees of individual algorithms in terms of approximate DP will lead to loose bounds, as one would convert the DP parameters to CDP parameters, compose the CDP parameters, and then convert back to DP at the end.

2 Preliminaries

We now define approximate differential privacy, which depends on neighboring datasets $x,x^{\prime}\in\mathcal{X}$, denoted as $x\sim x^{\prime}$, that differ in the presence or absence of one user's records.

Definition 2.1 (Dwork et al. [10, 9]).

An algorithm $A:\mathcal{X}\to\mathcal{Y}$ is $(\varepsilon,\delta)$-differentially private if, for any measurable set $S\subseteq\mathcal{Y}$ and any neighboring inputs $x\sim x^{\prime}$,

$$\Pr[A(x)\in S]\leq e^{\varepsilon}\Pr[A(x^{\prime})\in S]+\delta.\qquad(1)$$

If $\delta=0$, we say $A$ is $\varepsilon$-DP or simply pure DP.

One of the classical pure DP algorithms is the Laplace mechanism, which adds Laplace noise to a statistic. However, to determine the scale of noise to add to the statistic to ensure DP, we must know its sensitivity. We define the $\ell_p$-sensitivity of a statistic $f:\mathcal{X}\to\mathbb{R}^{d}$, which takes a dataset $x\in\mathcal{X}$ to a real vector in $\mathbb{R}^{d}$, as the following, where the max is taken over neighboring $x,x^{\prime}\in\mathcal{X}$:

$$\Delta_{p}(f)=\max_{x\sim x^{\prime}}\left\{\|f(x)-f(x^{\prime})\|_{p}\right\}.$$

We then have the following privacy guarantee for the Laplace mechanism.

Theorem 1 (Dwork et al. [10]).

Let $f:\mathcal{X}\to\mathbb{R}^{d}$ have $\ell_1$-sensitivity $\Delta_1(f)$; then the mechanism $M:\mathcal{X}\to\mathbb{R}^{d}$ where $M(x)=f(x)+(Z_1,\cdots,Z_d)$ with $\{Z_i\}\stackrel{i.i.d.}{\sim}\mathrm{Lap}(\Delta_1(f)/\varepsilon)$ is $\varepsilon$-DP for any $\varepsilon>0$.
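To make Theorem 1 concrete, here is a minimal sketch in Python/NumPy; the function name and the example statistic are our own illustrations, not part of the original analysis.

```python
import numpy as np

def laplace_mechanism(f_x: np.ndarray, l1_sensitivity: float, epsilon: float) -> np.ndarray:
    """Release f(x) with i.i.d. Laplace noise of scale Delta_1(f)/epsilon (epsilon-DP)."""
    scale = l1_sensitivity / epsilon
    return f_x + np.random.laplace(loc=0.0, scale=scale, size=f_x.shape)

# Example: a 3-bin histogram where one user changes at most one count by 1,
# so the l1-sensitivity is 1.
noisy_counts = laplace_mechanism(np.array([10.0, 4.0, 7.0]), l1_sensitivity=1.0, epsilon=0.5)
```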

Another classical pure DP mechanism is the Exponential Mechanism, which takes a quality score $q:\mathcal{X}\times\mathcal{Y}\to\mathbb{R}$ mapping a dataset and outcome to a real value, where the goal is to return outcomes that have a high quality score. We will define the range of a quality score to be the following, where the max over datasets is over neighbors $x,x^{\prime}\in\mathcal{X}$. (The original Exponential Mechanism was presented in terms of a quality score's sensitivity, while recent work from [5] showed that the Exponential Mechanism is more naturally defined in terms of the quality score's range; see also Jinshuo Dong's blog post https://dongjs.github.io/2020/02/10/ExpMech.html.)

$$\Delta(q)=\max_{y,y^{\prime}\in\mathcal{Y}}\max_{x\sim x^{\prime}}\{(q(x,y)-q(x^{\prime},y))-(q(x,y^{\prime})-q(x^{\prime},y^{\prime}))\}.$$

We then have the following privacy guarantee of the Exponential Mechanism.

Theorem 2 (McSherry and Talwar [15]).

Let $q:\mathcal{X}\times\mathcal{Y}\to\mathbb{R}$ be a quality score with range $\Delta(q)$; then the mechanism $M:\mathcal{X}\to\mathcal{Y}$ is $\varepsilon$-DP for any $\varepsilon>0$ where

$$\Pr[M(x)=y]\propto\exp\left(\frac{\varepsilon q(x,y)}{\Delta(q)}\right).$$
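As an illustration, a minimal sketch of sampling from the Exponential Mechanism over a finite outcome set (Python/NumPy; the helper name and example quality scores are assumptions for the example):

```python
import numpy as np

def exponential_mechanism(quality: np.ndarray, range_q: float, epsilon: float) -> int:
    """Sample an outcome index with probability proportional to exp(eps * q / Delta(q))."""
    logits = epsilon * quality / range_q
    logits -= logits.max()                # stabilize before exponentiating
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(np.random.choice(len(quality), p=probs))

# Example: quality scores are counts, so adding/removing one user changes each
# count by at most 1 and the range is bounded.
winner = exponential_mechanism(np.array([10.0, 4.0, 7.0]), range_q=1.0, epsilon=1.0)
```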

We now define approximate concentrated differential privacy (CDP). (Although [1] defines zCDP, to differentiate it from the CDP of [8], we will use CDP to refer to the version from [1], which differs slightly from the original definition but was later shown [21] to be equivalent to the original version.) Similar to approximate DP, it permits a small probability of unbounded Rényi divergence.

Definition 2.2 (Bun and Steinke [1], Papernot and Steinke [16]).

Suppose $A:\mathcal{X}\to\mathcal{Y}$ and $\rho,\delta\geq 0$. We say the algorithm $A$ is $\delta$-approximate $\rho$-CDP if, for any neighboring datasets $x,x^{\prime}$, there exist distributions $P^{\prime},P^{\prime\prime},Q^{\prime},Q^{\prime\prime}$ such that the outputs are distributed according to the following mixture distributions:

$$A(x)\sim(1-\delta)P^{\prime}+\delta P^{\prime\prime}\qquad A(x^{\prime})\sim(1-\delta)Q^{\prime}+\delta Q^{\prime\prime},$$

where for all $\lambda\geq 1$, $D_{\lambda}(P^{\prime}\|Q^{\prime})\leq\rho\lambda$ and $D_{\lambda}(Q^{\prime}\|P^{\prime})\leq\rho\lambda$.

We can also convert approximate differential privacy to approximate CDP and vice versa.

Theorem 3 (Bun and Steinke [1]).

If $A$ is $(\varepsilon,\delta)$-DP then it is $\delta$-approximate $\varepsilon^{2}/2$-CDP. If $A$ is $\delta$-approximate $\rho$-CDP then it is $(\rho+2\sqrt{\rho\log(1/\delta^{\prime})},\delta^{\prime}+\delta)$-DP for any $\delta^{\prime}>0$.
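The CDP-to-DP direction of Theorem 3 is easy to compute; a small sketch (our own helper, with $\delta^{\prime}$ as the free parameter):

```python
import math

def cdp_to_dp(rho: float, delta_cdp: float, delta_prime: float) -> tuple:
    """Convert delta_cdp-approximate rho-CDP into an (eps, delta)-DP guarantee (Theorem 3)."""
    eps = rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta_prime))
    return eps, delta_cdp + delta_prime

# Example: pure 0.125-CDP (e.g. a Gaussian mechanism with rho = eps^2/2 for eps = 0.5).
print(cdp_to_dp(rho=0.125, delta_cdp=0.0, delta_prime=1e-6))
```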

The classical CDP mechanism is the Gaussian Mechanism. Note that the Gaussian Mechanism was originally introduced as satisfying approximate DP, but it was then shown to satisfy pure CDP in later work [8, 1].

Theorem 4 (Bun and Steinke [1]).

Let $f:\mathcal{X}\to\mathbb{R}^{d}$ have $\ell_2$-sensitivity $\Delta_2(f)$; then the mechanism $M:\mathcal{X}\to\mathbb{R}^{d}$ where $M(x)=f(x)+(Z_1,\cdots,Z_d)$ with $\{Z_i\}\stackrel{i.i.d.}{\sim}\mathrm{N}\left(0,\tfrac{\Delta_2(f)^{2}}{2\rho}\right)$ is $\rho$-CDP for any $\rho>0$.
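A minimal sketch of the Gaussian mechanism calibrated for $\rho$-CDP (our own illustration):

```python
import numpy as np

def gaussian_mechanism(f_x: np.ndarray, l2_sensitivity: float, rho: float) -> np.ndarray:
    """Release f(x) with i.i.d. Gaussian noise of std Delta_2(f)/sqrt(2*rho) (rho-CDP)."""
    sigma = l2_sensitivity / np.sqrt(2.0 * rho)
    return f_x + np.random.normal(loc=0.0, scale=sigma, size=f_x.shape)
```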

Note that since the Exponential Mechanism is $\varepsilon$-DP, we can apply Theorem 3 to conclude that it is $\varepsilon^{2}/2$-CDP, but work from Cesar and Rogers [2] showed that the Exponential Mechanism actually has a better CDP parameter. (Also see the blog post at https://differentialprivacy.org/exponential-mechanism-bounded-range/.)

Theorem 5 (Cesar and Rogers [2]).

Let $q:\mathcal{X}\times\mathcal{Y}\to\mathbb{R}$ be a quality score with range $\Delta(q)$; then the mechanism $M:\mathcal{X}\to\mathcal{Y}$ is $\varepsilon^{2}/8$-CDP for any $\varepsilon>0$ where

$$\Pr[M(x)=y]\propto\exp\left(\frac{\varepsilon q(x,y)}{\Delta(q)}\right).$$

We also state the composition property of CDP, showing that the overall privacy parameters degrade after multiple CDP algorithms are run on a dataset.

Theorem 6.

Let $A_1:\mathcal{X}\to\mathcal{Y}$ be $\delta_1$-approximate $\rho_1$-CDP and let $A_2:\mathcal{X}\times\mathcal{Y}\to\mathcal{Z}$ be such that $A_2(\cdot,y)$ is $\delta_2$-approximate $\rho_2$-CDP for all $y\in\mathcal{Y}$. Then $A:\mathcal{X}\to\mathcal{Z}$ where $A(x)=A_2(x,A_1(x))$ is $(\delta_1+\delta_2-\delta_1\cdot\delta_2)$-approximate $(\rho_1+\rho_2)$-CDP.
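Theorem 6 amounts to simple arithmetic on the parameters; a sketch (our own helper name):

```python
def compose_cdp(rho1: float, delta1: float, rho2: float, delta2: float) -> tuple:
    """Adaptive composition of approximate CDP (Theorem 6): the rhos add and the
    deltas compose as 1 - (1 - delta1)(1 - delta2)."""
    return rho1 + rho2, delta1 + delta2 - delta1 * delta2
```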

3 Unifying Framework

We now present a general framework that can be used to unify some of the previous analyses for proving approximate DP (CDP) for various algorithms. If we want to show an algorithm $A$ is approximate CDP, we need to consider the randomness in $A$ that can generate differentiating outcomes, where, given two inputs $x$ and $x^{\prime}$, we see an outcome of $A$ that could only have come from one of them. We want to show that the chance that the randomness in $A$ generates these bad outcomes is at most $\delta$. Furthermore, we want to show that for all other non-differentiating outcomes, there are related mechanisms that match $A$ on inputs $x$ and $x^{\prime}$. Lastly, we need to show that these related mechanisms satisfy pure CDP.

To be more precise, the following lemma can be used to prove that many different algorithms known to be approximate DP are approximate CDP directly, without needing to resort to the general approximate DP to approximate CDP conversion.

Lemma 3.1.

Let $A:\mathcal{X}\to\mathcal{Y}$ be a randomized algorithm and fix parameters $\rho,\delta\geq 0$. If for each pair of neighboring datasets $x,x^{\prime}$ the algorithm $A$ satisfies the following three conditions, then it is $\delta$-approximate $\rho$-CDP:

  • 1.

    There exist events $E,E^{\prime}$ such that

    $$\Pr_{A(x)}[E],\ \Pr_{A(x^{\prime})}[E^{\prime}]\geq 1-\delta.$$

    Let $S$ be the set of outcomes common to $A(x)$ conditioned on event $E$ and $A(x^{\prime})$ conditioned on event $E^{\prime}$.

  • 2.

    There exist distributions $P^{\prime}$ and $Q^{\prime}$ with common support $S$ such that for all $y\in S$ we have the following, where $P$, $Q$ are the distributions of $A(x)$ and $A(x^{\prime})$, respectively. (We use $P(y),P^{\prime}(y),P^{\prime\prime}(y),Q(y),Q^{\prime}(y),Q^{\prime\prime}(y)$ to denote the Radon-Nikodym derivative of the corresponding distribution with respect to some base measure; see a similar note in [19].)

    $$P(y)=\mathbbm{1}\left\{y\in S\right\}\cdot\Pr_{A(x)}[E]\,P^{\prime}(y)\qquad Q(y)=\mathbbm{1}\left\{y\in S\right\}\cdot\Pr_{A(x^{\prime})}[E^{\prime}]\,Q^{\prime}(y)$$
  • 3.

    Also we have $D_{\lambda}(P^{\prime}\|Q^{\prime})\leq\lambda\rho$ and $D_{\lambda}(Q^{\prime}\|P^{\prime})\leq\lambda\rho$ for all $\lambda\geq 1$.

Proof.

Fix neighbors $x,x^{\prime}$. Let $P,Q$ be the distributions of $A(x)$ and $A(x^{\prime})$, respectively. Consider the following distribution $P^{\prime\prime}$, where we write $P(\cdot\mid\neg E)$ to denote the conditional distribution of $P$ given the complement of event $E$:

$$P^{\prime\prime}(y)=\frac{1}{\delta}\left(P(y\mid\neg E)\Pr[\neg E]+P^{\prime}(y)\left(\Pr_{A(x)}[E]-(1-\delta)\right)\right)$$

We then have that for $y\in S$:

$$(1-\delta)P^{\prime}(y)+\delta P^{\prime\prime}(y)=(1-\delta)P^{\prime}(y)+\delta\cdot\frac{1}{\delta}\left(P^{\prime}(y)\cdot\left(\Pr[E]-(1-\delta)\right)\right)=P^{\prime}(y)\Pr[E]=P(y).$$

Furthermore, for $y\notin S$ we have

$$(1-\delta)P^{\prime}(y)+\delta P^{\prime\prime}(y)=\delta\cdot\frac{1}{\delta}\left(P(y\mid\neg E)\Pr[\neg E]\right)=P(y\mid\neg E)\Pr[\neg E]=P(y).$$

Hence, we have $P(y)=(1-\delta)P^{\prime}(y)+\delta P^{\prime\prime}(y)$ for all outcomes $y$. A similar argument works to show $Q(y)=(1-\delta)Q^{\prime}(y)+\delta Q^{\prime\prime}(y)$ with $Q^{\prime\prime}(y)$ defined similarly to $P^{\prime\prime}(y)$.

By assumption, we know that $D_{\lambda}(P^{\prime}\|Q^{\prime})\leq\lambda\rho$ and $D_{\lambda}(Q^{\prime}\|P^{\prime})\leq\lambda\rho$ for all $\lambda\geq 1$. Hence, $A$ is $\delta$-approximate $\rho$-CDP. ∎

Although a similar lemma can be used for approximate DP, this will typically lead to unnecessarily loose privacy parameters, as we will see later. Hence, providing an approximate CDP analysis of each algorithm will be useful in any privacy system that combines these algorithms because the CDP privacy parameters simply add up and can be converted to approximate DP parameters at the end, if necessary.

4 Unknown Domain Algorithms

We will denote a histogram as $h\in\mathcal{H}=\{(i,c_i):(i,c_i)\in[d]\times\mathbb{N}\}$, which consists of a set of pairs with a label $i\in[d]$ and corresponding count $c_i$. When we define neighboring histograms, we will need to consider how much a user can modify a histogram. If we remove a user's contributions from a histogram $h$, this can both remove items entirely or decrease counts by some amount. We will say that a histogram $h$ is $(\ell_0,\ell_\infty)$-sensitive if removing or adding a user's data to histogram $h$ can change at most $\ell_0$ distinct elements and each count in $h$ can differ by at most $\ell_\infty$. We now turn to some applications of Lemma 3.1 for some existing mechanisms.

4.1 Positive Count Histograms

We start with the setting where we want to return a histogram subject to CDP when we are only given items with positive counts, so each count present in the histogram is at least 1. It is straightforward to extend this analysis to the case where we have a histogram with only counts above some known value larger than 1. This is a natural setting for data analytics, as GROUP BY queries in SQL only provide items that exist in the dataset, so no zero counts are returned. This is problematic for privacy, as a neighboring histogram can have fewer results, resulting in a differing set of counts to add noise to. We present a general template for the private algorithm in Algorithm 1, where we deliberately leave the noise distribution and the threshold arbitrary.

Algorithm 1 Unknown Domain Histogram
Input: Histogram $h$, noise distribution $\mathrm{Noise}$, and threshold $T>0$
Output: Noisy histogram $\tilde{h}$ with counts above $T$
Initialize $\tilde{h}=\emptyset$.
for each item $i$ where $(i,c_i)\in h$ such that $c_i>0$ do
     Set $\tilde{c}_i=c_i+Z_i$ where $Z_i\sim\mathrm{Noise}$.
     if $\tilde{c}_i>T$ then
          $\tilde{h}=\tilde{h}\cup\left\{(i,\tilde{c}_i)\right\}$
     end if
end for
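A minimal Python sketch of this template (our own illustration; `noise` can be any sampler, e.g. Laplace or Gaussian):

```python
import numpy as np
from typing import Callable, Dict

def unknown_domain_histogram(h: Dict[str, int], noise: Callable[[], float], T: float) -> Dict[str, float]:
    """Algorithm 1: add independent noise to each positive count and keep only
    noisy counts that exceed the threshold T."""
    h_tilde = {}
    for item, count in h.items():
        if count > 0:
            noisy = count + noise()
            if noisy > T:
                h_tilde[item] = noisy
    return h_tilde

# Example with Laplace noise of scale 1/epsilon (here epsilon = 1) and an arbitrary threshold.
release = unknown_domain_histogram({"apple": 42, "pear": 7, "plum": 1},
                                   noise=lambda: np.random.laplace(0.0, 1.0), T=8.5)
```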

Previous mechanisms follow this template, specifically those of Korolova et al. [13] and Wilson et al. [22], who used Laplace noise, and Swanberg et al. [20], who used Gaussian noise. We now prove that Algorithm 1 is approximate CDP using Lemma 3.1.

Theorem 7.

Assume input histograms $h$ are $(\Delta_0,\Delta_\infty)$-sensitive. If we use $\mathrm{Noise}$ as the joint distribution of $|h|$ many i.i.d. $\mathrm{Lap}(\Delta_\infty/\varepsilon)$ random variables and threshold

$$T=\Delta_\infty+\frac{\Delta_\infty}{\varepsilon}\log\left(\tfrac{\Delta_0}{2\delta}\right),$$

then Algorithm 1 is $\delta$-approximate $\Delta_0\cdot\varepsilon^{2}/2$-CDP. Furthermore, if we use $\mathrm{Noise}=\mathrm{N}(0,\Delta_\infty^{2}/\varepsilon^{2}\cdot I_{|h|})$ and threshold

$$T=\Delta_\infty+\frac{\Delta_\infty}{\varepsilon}\Phi^{-1}(1-\delta/\Delta_0),$$

where $\Phi^{-1}(z)$ is the inverse CDF of a standard normal, then Algorithm 1 is $\delta$-approximate $\Delta_0\cdot\varepsilon^{2}/2$-CDP.

Proof.

We will rely on Lemma 3.1 to prove this result. Consider neighbors $h,h^{\prime}$ where, without loss of generality, $h$ has one additional user's data compared to $h^{\prime}$. By assumption, we know that $h$ can differ in at most $\Delta_0$ counts for different labels. Furthermore, we know that the differing counts can each differ by at most $\Delta_\infty$. Consider the set $S$ of labels that are common between $h$ and $h^{\prime}$, along with the corresponding counts. Since we assume that $h$ has one additional user's data compared to $h^{\prime}$, this set $S$ must include all labels of $h^{\prime}$. Note that the counts for the items that are present in $h$ but not in $h^{\prime}$ must be at most $\Delta_\infty$. We now cover each item in Lemma 3.1.

  • 1.

    We define the event $E$ to be all the randomness in $A(h)$ that can generate outcomes in $S$: $E$ includes arbitrary noise added to the items that are common between $h$ and $h^{\prime}$, while the noise added to the counts of items in $h$ but not in $h^{\prime}$ must keep those noisy counts below $T$. We then lower bound the probability of event $E$. Note that for every item $j$ that is in $h$ but not in $S$, its count can be no more than $\Delta_\infty$.

    $$\Pr_{A(h)}[E]=\prod_{j:(j,\cdot)\in h\setminus h^{\prime}}\Pr[c_j+\mathrm{Noise}\leq T]\geq\prod_{j:(j,\cdot)\in h\setminus h^{\prime}}\Pr[\Delta_\infty+\mathrm{Noise}\leq T]$$

    We then consider the two scenarios with either Laplace noise or Gaussian noise.

    • (Laplace) With Laplace noise of scale $b>0$ and $T=\Delta_\infty+b\log(\tfrac{\Delta_0}{2\delta})$, we have

      $$\Pr_{A(h)}[E]\geq\left(1-\frac{1}{2}\exp\left(-\frac{T-\Delta_\infty}{b}\right)\right)^{\Delta_0}\geq 1-\delta.$$
    • (Gaussian) Next we consider the Gaussian noise version with standard deviation $\sigma>0$ and $T=\Delta_\infty+\sigma\Phi^{-1}(1-\delta/\Delta_0)$:

      $$\Pr_{A(h)}[E]\geq\Phi\left(\frac{T-\Delta_\infty}{\sigma}\right)^{\Delta_0}\geq 1-\delta.$$

    In this case, the event $E^{\prime}$ is all randomness in $A(h^{\prime})$ that can generate outcomes in $S$, which includes all randomness in $A(h^{\prime})$ because $h^{\prime}$ is a subset of $h$.

  • 2.

    We then consider the mechanism $A^{\prime}$ whose domain is the set of common items between $h$ and $h^{\prime}$, which adds noise to each count so that only noisy counts above $T$ are returned with their corresponding item labels. We write the distribution of $A^{\prime}(h)$ as $P^{\prime}$ and the distribution of $A^{\prime}(h^{\prime})$ as $Q^{\prime}$. Because we add independent noise to each histogram count, we can separate out the noise terms added to the counts of labels that are not common between $h$ and $h^{\prime}$; hence we know that $P(y)=\Pr[E]P^{\prime}(y)$ for $y\in S$, by design. Furthermore, $Q\equiv Q^{\prime}$.

  • 3.

    Note that $A^{\prime}$ is either the Laplace mechanism or the Gaussian mechanism over the common items between $h$ and $h^{\prime}$. We then cover each variant separately.

    • (Laplace) We first consider the case where we apply Laplace noise with noise parameter $b>0$. Note that $A^{\prime}$ is a Laplace Mechanism over a histogram that can change in at most $\Delta_0$ counts, each of which can differ by at most $\Delta_\infty$. Ignoring the threshold in $A^{\prime}$, as this is a post-processing of the noisy histogram and does not impact the privacy analysis, we can say that the Laplace mechanism is being applied to a histogram with $\ell_1$-sensitivity $\Delta_0\Delta_\infty$. Hence, we know that the Laplace mechanism is $(\Delta_0\Delta_\infty/b)$-(pure) DP and hence $\left(\Delta_0^{2}\Delta_\infty^{2}/(2b^{2})\right)$-(pure) CDP. However, this will not get us the result we want, because it would result in $\Delta_0^{2}\varepsilon^{2}/2$-CDP with $b=\Delta_\infty/\varepsilon$. This was due to using the $\ell_1$-sensitivity of the histogram and then applying Theorem 1.

      We now consider the Laplace mechanism only on the common items between $h$ and $h^{\prime}$ that also have differing counts. We call the corresponding mechanism $\hat{A}$ and denote by $\hat{P}$ the distribution of $\hat{A}(h)$ and by $\hat{Q}$ the distribution of $\hat{A}(h^{\prime})$. Note that each noisy count in $\hat{A}$ is generated independently, so we can view each count in $\hat{A}$ as a single univariate Laplace mechanism. Each Laplace mechanism will then be $\Delta_\infty/b$-(pure) DP and hence $(\Delta_\infty/b)^{2}/2$-(pure) CDP. Applying composition from Theorem 6 over each univariate Laplace mechanism implies that $\hat{A}$ is $\frac{\Delta_0\Delta_\infty^{2}}{2b^{2}}$-(pure) CDP. Let $\tilde{A}$ denote the Laplace Mechanism applied to the counts that are unchanged between $h$ and $h^{\prime}$, with distribution $\tilde{P}$ for $\tilde{A}(h)=\tilde{A}(h^{\prime})$. For ease of notation, let $h$ and $h^{\prime}$ match on the first $d^{\prime}\leq d$ indices and let the first $k\leq\Delta_0$ indices of $h$ and $h^{\prime}$ have differing counts. Hence, we have for outcome $y=(y_1,\cdots,y_{d^{\prime}})$ that $P^{\prime}(y)=\hat{P}(y_1,\cdots,y_k)\cdot\tilde{P}(y_{k+1},\cdots,y_{d^{\prime}})$. Similarly, we have $Q^{\prime}(y)=\hat{Q}(y_1,\cdots,y_k)\cdot\tilde{P}(y_{k+1},\cdots,y_{d^{\prime}})$. This gives us the following for $\lambda\geq 1$:

      $$D_{\lambda}(P^{\prime}\|Q^{\prime})=D_{\lambda}\left(\hat{P}\|\hat{Q}\right)\leq\frac{\lambda\Delta_0\Delta_\infty^{2}}{2b^{2}}$$

      and similarly,

      $$D_{\lambda}(Q^{\prime}\|P^{\prime})\leq\frac{\lambda\Delta_0\Delta_\infty^{2}}{2b^{2}}.$$
    • (Gaussian) We next consider the Gaussian noise variant. We will denote by $A^{\prime}$ the Gaussian mechanism whose domain is the set of common items between $h$ and $h^{\prime}$, which adds noise with standard deviation $\sigma$ to each count and only returns counts above $T$ with their corresponding item labels. We write the distribution of $A^{\prime}(h)$ as $P^{\prime}$ and the distribution of $A^{\prime}(h^{\prime})$ as $Q^{\prime}$.

      Note that $A^{\prime}$ is a Gaussian Mechanism over a histogram that can change in at most $\Delta_0$ counts, each of which can differ by at most $\Delta_\infty$. Ignoring the threshold in $A^{\prime}$, which again does not impact the privacy analysis, we can say that the Gaussian mechanism is being applied to a histogram with $\ell_2$-sensitivity $\Delta_\infty\sqrt{\Delta_0}$. Hence, we know that the Gaussian mechanism is $\tfrac{\Delta_\infty^{2}\Delta_0}{2\sigma^{2}}$-(pure) CDP from Theorem 4. Furthermore, post-processing the Gaussian Mechanism is also CDP with the same parameters, so restricting the noisy counts to be larger than $T$ gets us back to distributions $P^{\prime}$ and $Q^{\prime}$. This gives us $D_{\lambda}(P^{\prime}\|Q^{\prime})\leq\lambda\tfrac{\Delta_\infty^{2}\Delta_0}{2\sigma^{2}}$ and $D_{\lambda}(Q^{\prime}\|P^{\prime})\leq\lambda\tfrac{\Delta_\infty^{2}\Delta_0}{2\sigma^{2}}$, for all $\lambda\geq 1$.

Hence, using Laplace noise with scale $b>0$ in the Unknown Domain Histogram algorithm is $\delta$-approximate $\frac{\Delta_0\Delta_\infty^{2}}{2b^{2}}$-CDP. Setting $b=\Delta_\infty/\varepsilon$ completes the proof for Laplace noise. Furthermore, using Gaussian noise with standard deviation $\sigma$ is $\delta$-approximate $\frac{\Delta_\infty^{2}\Delta_0}{2\sigma^{2}}$-CDP, and setting $\sigma=\frac{\Delta_\infty}{\varepsilon}$ completes the proof. ∎
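The two thresholds in Theorem 7 are easy to compute; a sketch (SciPy's `norm.ppf` is the standard normal inverse CDF $\Phi^{-1}$):

```python
import numpy as np
from scipy.stats import norm

def laplace_threshold(delta0: int, delta_inf: float, eps: float, delta: float) -> float:
    """Threshold for the Laplace variant of Theorem 7."""
    return delta_inf + (delta_inf / eps) * np.log(delta0 / (2 * delta))

def gaussian_threshold(delta0: int, delta_inf: float, eps: float, delta: float) -> float:
    """Threshold for the Gaussian variant of Theorem 7."""
    return delta_inf + (delta_inf / eps) * norm.ppf(1 - delta / delta0)

print(laplace_threshold(1, 1.0, 1.0, 1e-6), gaussian_threshold(1, 1.0, 1.0, 1e-6))
```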

We want to highlight the improvement we get when we use the CDP analysis. If we consider a similar analysis using approximate DP, we would remove the items that cannot be returned under both neighboring datasets and we would be left with the Laplace mechanism over the common items. We could then use the $\ell_1$-sensitivity of the resulting histogram, but then adding Laplace noise with scale $b=\Delta_\infty/\varepsilon$ would result in $(\varepsilon\Delta_0,\delta)$-DP, which can be converted to CDP using Theorem 3 to get $\delta$-approximate $\varepsilon^{2}\Delta_0^{2}/2$-CDP, whereas we get $\delta$-approximate $\varepsilon^{2}\Delta_0/2$-CDP in our analysis. Furthermore, converting approximate CDP guarantees to approximate DP guarantees too early leads to loose privacy parameters when developing privacy systems that use these algorithms. For example, if we use the Gaussian noise variant in Theorem 7 with $\Delta_0=1$, we can conclude that it is $(\varepsilon^{2}/2+\varepsilon\sqrt{2\log(1/\delta^{\prime})},\delta+\delta^{\prime})$-DP for any $\delta^{\prime}>0$, so if we only use the DP guarantee and combine it with another Unknown Domain Histogram with Gaussian noise, we can compose to get $(\varepsilon^{\prime},2\delta+\delta^{\prime})$-DP for any $\delta^{\prime}>0$ where

$$\varepsilon^{\prime}=\varepsilon^{2}+2\varepsilon\sqrt{2\log(1/\delta^{\prime})}.$$

However, if we had never converted to approximate DP until after composing both Unknown Domain Histogram mechanisms, we could have gotten an overall $(\varepsilon^{\prime\prime},2\delta+\delta^{\prime})$-DP guarantee, where

$$\varepsilon^{\prime\prime}=\varepsilon^{2}+2\varepsilon\sqrt{\log(1/\delta^{\prime})}.$$
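A quick numeric check of the two routes (a sketch; the values are illustrative):

```python
import math

eps, delta_prime = 1.0, 1e-6

# Route 1: convert each mechanism to DP first, then compose the DP parameters.
eps_dp_first = eps**2 + 2 * eps * math.sqrt(2 * math.log(1 / delta_prime))

# Route 2: compose in CDP (rho = eps^2/2 each, so eps^2 total), convert once at the end.
eps_cdp_first = eps**2 + 2 * eps * math.sqrt(math.log(1 / delta_prime))

print(eps_dp_first, eps_cdp_first)  # the CDP-first route gives the smaller epsilon
```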

4.2 Top-$(\bar{k}+1)$ Count Histograms

We now turn to a slight variant of releasing private histograms over a histogram with positive counts. In this setting, we assume that only a limited part of the histogram is available, perhaps due to an existing data analytics system that cannot return all counts. This setting first appeared in [6] and is especially important when designing DP algorithms on top of existing systems that cannot provide the full histogram of counts [18]. We will refer to the histogram consisting of the top-$(\bar{k}+1)$ items as the histogram with items and corresponding counts that are in the top-$(\bar{k}+1)$. Note that in this case, the top-$(\bar{k}+1)$ can change between neighboring histograms. We now present a general algorithm template, similar to Algorithm 1, in Algorithm 2, which takes an arbitrary threshold $T>0$ and a top-$(\bar{k}+1)$ histogram.

Algorithm 2 Unknown Domain from Top-$(\bar{k}+1)$
Input: Histogram $h$, noise standard deviation $\sigma$, threshold $T>0$, and top-$(\bar{k}+1)$ histogram
Output: Noisy histogram $\tilde{h}$ with at most $\bar{k}$ counts
Let $h_{(\bar{k}+1)}$ be the histogram consisting of the top-$(\bar{k}+1)$ items, breaking ties arbitrarily.
if $h_{(\bar{k})}$ has fewer than $\bar{k}$ items then
     Pad $h_{(\bar{k})}$ with items $\bot_1,\cdots,\bot_{\bar{k}-|h_{(\bar{k})}|}$, each with count $c_{\bot_j}=0$, until there are $\bar{k}$ items in $h_{(\bar{k})}$.
end if
Let $c_{(\bar{k}+1)}$ be the count of the $(\bar{k}+1)$-th item in $h$, which might be zero.
Set $\tilde{T}=T+c_{(\bar{k}+1)}+\mathrm{N}(0,\sigma^{2})$
Initialize $\tilde{h}=\emptyset$.
for each item $i$ where $(i,c_i)\in h_{(\bar{k})}$ do
     Set $\tilde{c}_i=c_i+\mathrm{N}(0,\sigma^{2})$
     if $\tilde{c}_i>\tilde{T}$ then
          $\tilde{h}=\tilde{h}\cup\left\{(i,\tilde{c}_i)\right\}$
     end if
end for
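A minimal Python sketch of Algorithm 2 (our own illustration; `top` is assumed to hold up to $\bar{k}+1$ (item, count) pairs sorted by count in descending order):

```python
import numpy as np
from typing import Dict, List, Tuple

def unknown_domain_from_topk(top: List[Tuple[str, float]], k_bar: int,
                             sigma: float, T: float) -> Dict[str, float]:
    """Algorithm 2: compare the top-k_bar noisy counts against a noisy threshold
    built from the (k_bar+1)-th count."""
    head = list(top[:k_bar])
    while len(head) < k_bar:                 # pad with zero-count dummy items
        head.append((f"_bot{len(head)}", 0.0))
    c_next = top[k_bar][1] if len(top) > k_bar else 0.0
    T_tilde = T + c_next + np.random.normal(0, sigma)
    h_tilde = {}
    for item, count in head:
        noisy = count + np.random.normal(0, sigma)
        if noisy > T_tilde:
            h_tilde[item] = noisy
    return h_tilde
```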

We now show that it is indeed approximate CDP.

Theorem 8.

Assume input histograms $h$ are $(\Delta_0,\Delta_\infty)$-sensitive. If we use $\sigma=\frac{\Delta_\infty}{\varepsilon}$ and threshold

$$T=\Delta_\infty+\frac{\sqrt{2}\cdot\Delta_\infty}{\varepsilon}\Phi^{-1}(1-\delta/\Delta_0),$$

then Algorithm 2 is $\delta$-approximate $\Delta_0\cdot\varepsilon^{2}/2$-CDP.

Proof.

We follow the same analysis as in the proof of Theorem 7, which used Lemma 3.1. We again set $S$ to be the set of outcomes over labels that are common between neighbors $h$ and $h^{\prime}$. Note that we are only considering items with counts in the top-$\bar{k}$ of each histogram, i.e. the items in $h_{(\bar{k})}$ and $h^{\prime}_{(\bar{k})}$, including the zero count items with labels in $\{\bot_j\}$. Hence, there might be different items between $h_{(\bar{k})}$ and $h^{\prime}_{(\bar{k})}$, as reducing some counts in $h$ might change the order of the top-$(\bar{k}+1)$. From Lemma 5.2 in [6], we know that there can be at most $\min\{\bar{k},\Delta_0\}$ many differing items between the top-$\bar{k}$ histograms $h_{(\bar{k})}$ and $h^{\prime}_{(\bar{k})}$. We then follow the three items that we need to show in order to apply Lemma 3.1.

  • 1.

    We denote by $A$ Algorithm 2 and define the event $E$ as follows: over the noise added to counts in outcomes $S$ for $A(h)$ and the noise added to the threshold $T+c_{(\bar{k}+1)}$ to get $\tilde{T}$, the noisy counts of the differing items not in $S$ must be no more than $\tilde{T}$. We define $E^{\prime}$ similarly for $A(h^{\prime})$. Note that for any item $j$ that is in $h_{(\bar{k})}$ but not an item that can be returned in $S$, we know

    $$c_j\leq c_j^{\prime}+\Delta_\infty\leq c^{\prime}_{(\bar{k})}+\Delta_\infty\leq c_{(\bar{k})}+\Delta_\infty\leq c_{(\bar{k}+1)}+\Delta_\infty.$$

    The analysis is straightforward because the difference of two Gaussians is itself Gaussian. Hence, we set $\hat{T}=\sqrt{2}\sigma\Phi^{-1}(1-\delta/\Delta_0)$ so that $T=\hat{T}+\Delta_\infty$:

    $$\Pr_{A(h)}[\neg E]\leq\sum_{i=1}^{\Delta_0}\Pr\left[c_{(\bar{k}+1)}+\Delta_\infty+\mathrm{N}(0,\sigma^{2})>c_{(\bar{k}+1)}+T+\mathrm{N}(0,\sigma^{2})\right]=\Delta_0\Pr[\mathrm{N}(0,2\sigma^{2})>\hat{T}]=\Delta_0\left(1-\Phi\left(\frac{\hat{T}}{\sqrt{2}\sigma}\right)\right)=\Delta_0\left(1-(1-\delta/\Delta_0)\right)=\delta$$

    The analysis to bound $\Pr_{A(h^{\prime})}[\neg E^{\prime}]$ is similar.

  • 2.

    We then take $P^{\prime}(y)$ to be the distribution of $A(h)$ conditioned on $E$. Note that $P(y)=\Pr[E]P^{\prime}(y)$ for each $y\in S$ and similarly $Q(y)=\Pr[E^{\prime}]Q^{\prime}(y)$, where $Q^{\prime}(y)$ is the distribution of $A(h^{\prime})$ conditioned on $E^{\prime}$. We will modify the mechanisms $A(h)$ and $A(h^{\prime})$ in a way that results in the same mechanisms when conditioned on events $E$ and $E^{\prime}$, respectively. For each label that differs between $h_{(\bar{k})}$ and $h^{\prime}_{(\bar{k})}$, we change it to one of the common labels $b_1,\cdots,b_\ell$ where $\ell=|\{j:(j,\cdot)\in h_{(\bar{k})}\setminus h^{\prime}_{(\bar{k})}\}|\leq\Delta_0$. Because we condition on event $E$, no outcome $y\in S$ can include the labels $b_1,\cdots,b_\ell$. Furthermore, it was shown in [17, Lemma 6.4] that the resulting histogram with common labels will also have as many as $\Delta_0$ items with differing counts, and those counts can change by at most $\Delta_\infty$, regardless of how we assign the common labels $\{b_j\}$. We then let $P^{\prime}$ and $Q^{\prime}$ be the resulting distributions after this relabeling.

Next we will need to bound the Rényi divergence between $P^{\prime}$ and $Q^{\prime}$. We will make use of the following result from [12, Corollary 4.3].

Lemma 4.1.

Suppose $F,F^{\prime}$ are two $\mu$-strongly convex functions over $\mathcal{K}\subseteq\mathbb{R}^{d}$, and $F-F^{\prime}$ is $G$-Lipschitz over $\mathcal{K}$. For any $m>0$, if we let $P\propto e^{-mF}$ and $Q\propto e^{-mF^{\prime}}$ be two probability distributions on $\mathcal{K}$, then we have for all $\lambda\geq 1$

$$D_{\lambda}(P\|Q)\leq\frac{\lambda mG^{2}}{2\mu}$$

This result is useful because it allows us to condition on outcomes from a joint Gaussian mechanism falling in some convex region, which corresponds to releasing only “good” outcomes, i.e. not allowing certain counts to go above some noisy value. For the Gaussian mechanism, we have $F(z)=\|z-h\|_2^{2}$ and $F^{\prime}(z)=\|z-h^{\prime}\|_2^{2}$, which are both $2$-strongly convex over any convex region. Furthermore, $\|z-h\|_2^{2}-\|z-h^{\prime}\|_2^{2}$ is $2\sqrt{\Delta_0}\cdot\Delta_\infty$-Lipschitz. For the density of a Gaussian, we then use $m=\tfrac{1}{2\sigma^{2}}$.

  • 3.

    We now want to bound the Rényi divergence between $P^{\prime}$ and $Q^{\prime}$, which are conditioned on events $E$ and $E^{\prime}$, respectively. To do this we will consider the joint Gaussian distribution that releases all counts over the histograms with common labels, including the $(\bar{k}+1)$-th largest count with $T$ added to it, labeled $\bot$, but we do not enforce the threshold. We only want to consider events that do not have the items with labels in $\{b_j\}$ receiving noisy counts above the noisy count for $\bot$. We then consider the convex region $\mathcal{K}$ where these “bad” noisy counts do not go above the noisy count for $\bot$. We then apply Lemma 4.1 to claim that the resulting mechanism conditioned on this region has a bound on the Rényi divergence that is the same as if it were the Gaussian mechanism not constrained to region $\mathcal{K}$. Dropping the items with counts lower than the noisy count for $\bot$ is simply post-processing, which does not increase the Rényi divergence bound. Hence, we have $D_{\lambda}(P^{\prime}\|Q^{\prime})\leq\frac{\lambda\Delta_0\Delta_\infty^{2}}{2\sigma^{2}}$ and $D_{\lambda}(Q^{\prime}\|P^{\prime})\leq\frac{\lambda\Delta_0\Delta_\infty^{2}}{2\sigma^{2}}$ for all $\lambda\geq 1$.

Setting $\sigma=\Delta_\infty/\varepsilon$ completes the proof. ∎

4.2.1 Exponential Mechanism

For the previous applications, there was a crucial assumption that the input histograms were $(\Delta_0,\Delta_\infty)$-sensitive, specifically that the histogram's $\ell_0$-sensitivity must be bounded by $\Delta_0$. However, this might not always be the case, and it would be difficult to enforce such a bound in practice, as it requires that each user have only a certain number of distinct items in the data. One way to limit the impact a single user can have on the result is to limit the number of items that can be returned. The Laplace/Gaussian mechanism based algorithms can return an arbitrary number of items that are above the threshold, and in the setting where we only have access to the top-$(\bar{k}+1)$ items, we could return all $\bar{k}$. Hence, the CDP parameter could be bounded in terms of $\bar{k}$, but we would like to control how many items can be returned from the counts we have access to. In this case, we can return the top-$k$ results, where $k\leq\bar{k}$ is an input to the algorithm. Unfortunately, the previous analysis would still have the CDP parameter scale with $\bar{k}$ despite only wanting to return $k$.

To ensure that the privacy loss only scales with the number of items that are returned, we can use the classical Exponential Mechanism [15], as presented in Theorem 2. It was shown that the Exponential Mechanism is part of a general framework of DP mechanisms called report noisy max. This can be summarized as adding noise to the quality scores for each outcome and returning the outcome with the noisy max. Note that it is critical that only the arg max is returned, not the actual noisy value. It turns out that adding Gumbel noise to the quality scores and returning the max item is equivalent to the Exponential Mechanism; see [6]. Furthermore, it was shown that iteratively applying the Exponential Mechanism to return the top-$k$ outcomes is equivalent to adding Gumbel noise to all quality scores and returning the outcomes with the $k$ largest noisy quality scores [6]. Other mechanisms in the report noisy max framework include adding Laplace noise to the quality scores [7] and adding Exponential noise to the quality scores [4], which turns out to be equivalent to the Permute-and-Flip Mechanism [14]. (See the blog post on one-shot top-$k$ DP algorithms: https://differentialprivacy.org/one-shot-top-k/.)
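For intuition, a small sketch of the one-shot Gumbel trick (our own illustration): scaling the quality scores by $\varepsilon/\Delta(q)$, adding standard Gumbel noise, and taking the $k$ largest is distributed like $k$ iterative applications of the Exponential Mechanism.

```python
import numpy as np

def gumbel_top_k(quality: np.ndarray, range_q: float, eps_each: float, k: int) -> list:
    """One-shot top-k: add standard Gumbel noise to the scaled quality scores and
    return the indices of the k largest noisy scores (only ranks, not values)."""
    noisy = quality * (eps_each / range_q) + np.random.gumbel(size=len(quality))
    return list(np.argsort(noisy)[::-1][:k])
```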

We then present the Unknown Domain Gumbel algorithm in Algorithm 3 from [6], which takes an additional parameter $k\leq\bar{k}$ and, importantly, returns a ranked list of items but not their counts.

Algorithm 3 Unknown Domain Gumbel from Top-$(\bar{k}+1)$
Input: Histogram $h$, noise scale $\beta>0$, threshold $T>0$, and $k\leq\bar{k}$
Output: Sorted list of at most $k$ items, $I_k$
Let $h_{(\bar{k})}$ be the histogram consisting of the top-$\bar{k}$ items
Let $c_{(\bar{k}+1)}$ be the count of the $(\bar{k}+1)$-th item in $h$.
Set $\tilde{T}=T+c_{(\bar{k}+1)}+\mathrm{Gumb}(\beta)$
Initialize $\tilde{h}=\emptyset$.
for each item $i$ where $(i,c_i)\in h_{(\bar{k})}$ such that $c_i>0$ do
     Set $\tilde{c}_i=c_i+\mathrm{Gumb}(\beta)$
     if $\tilde{c}_i>\tilde{T}$ then
          $\tilde{h}=\tilde{h}\cup\left\{(i,\tilde{c}_i)\right\}$
     end if
end for
Let $I_k$ be the ordered list of at most $k$ items sorted in descending order by their count in $\tilde{h}$.
if $|I_k|<k$ then
     $I_k=I_k\cup\{\bot\}$
end if
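A sketch of Algorithm 3 in Python (our own illustration; `BOT` stands in for $\bot$, and `top` is assumed to hold the top-$(\bar{k}+1)$ (item, count) pairs sorted by count in descending order):

```python
import numpy as np
from typing import List, Tuple

BOT = "_bot"  # stand-in for the ⊥ symbol

def unknown_domain_gumbel(top: List[Tuple[str, float]], k: int, k_bar: int,
                          beta: float, T: float) -> List[str]:
    """Algorithm 3: Gumbel noise on the top-k_bar counts, a Gumbel-noised threshold
    from the (k_bar+1)-th count, and a ranked list of at most k surviving items."""
    c_next = top[k_bar][1] if len(top) > k_bar else 0.0
    T_tilde = T + c_next + np.random.gumbel(scale=beta)
    noisy = [(item, c + np.random.gumbel(scale=beta))
             for item, c in top[:k_bar] if c > 0]
    survivors = sorted((p for p in noisy if p[1] > T_tilde), key=lambda p: -p[1])
    items = [item for item, _ in survivors[:k]]
    if len(items) < k:
        items.append(BOT)   # signal that fewer than k items cleared the threshold
    return items
```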

We then state the Unknown Domain Gumbel privacy guarantee, which largely follows the analysis in [6], adapted for CDP.

Theorem 9.

Assume input histograms $h$ are $(\infty,1)$-sensitive. If we use noise scale $\beta=1/\varepsilon$ and threshold

$$T=1+\frac{1}{\varepsilon}\log\left(\tfrac{\bar{k}}{\delta}\right),$$

then Algorithm 3 is $\delta$-approximate $k\varepsilon^{2}/8$-CDP.

Proof.

We will apply our general framework from Lemma 3.1 and some previous results from [6]. We will denote by $A$ the Unknown Domain Gumbel mechanism. We assume WLOG that $h,h^{\prime}$ differ by there being one additional user's data in $h$ when compared to $h^{\prime}$.

  • 1.

    We denote by $S$ the set of common outcomes that are possible between the top-$\bar{k}$ histograms $h_{(\bar{k})}$ and $h^{\prime}_{(\bar{k})}$, and by $E$ all the randomness in $A(h)$ that can generate outcomes in $S$. Hence, $E$ must only include the Gumbel noise added to items that are common to $h$ and $h^{\prime}$, while the noise added to the differing counts must produce a noisy count either below the top-$k$ or below the noisy threshold $\tilde{T}$ that is set. From Lemma 5.5 in [6], we have the following bound when $T=1+\beta\log(\bar{k}/\delta)$:

    $$\Pr_{A(h)}[E]\geq 1-\delta.$$

    We similarly define the event $E^{\prime}$ for the randomness in $A(h^{\prime})$ that can generate outcomes in $S$, which gives us the same lower bound for $\Pr_{A(h^{\prime})}[E^{\prime}]$.

  • 2.

    From Lemma 5.6 in [6], we know that there exist distributions $P^{\prime},Q^{\prime}$ such that for all outcomes $y\in S$, we have

    $$\Pr[A(h)=y]=\Pr[E]P^{\prime}(y)\qquad\Pr[A(h^{\prime})=y]=\Pr[E^{\prime}]Q^{\prime}(y).$$
  • 3.

    These distributions $P^{\prime},Q^{\prime}$ are also related to the Exponential Mechanism. Specifically, $P^{\prime}$ is the distribution of iteratively applying the Exponential Mechanism on $h_{(\bar{k})}$, restricted to the items that are common between $h_{(\bar{k})}$ and $h^{\prime}_{(\bar{k})}$. Similarly, $Q^{\prime}$ is the distribution of iteratively applying the Exponential Mechanism on $h^{\prime}_{(\bar{k})}$, restricted to the same common items. We can then use Theorem 5 to conclude that $D_{\lambda}(P^{\prime}\|Q^{\prime})\leq\frac{\lambda k}{8\beta^{2}}$ and $D_{\lambda}(Q^{\prime}\|P^{\prime})\leq\frac{\lambda k}{8\beta^{2}}$, for all $\lambda\geq 1$.

Setting $\beta=\frac{1}{\varepsilon}$ completes the proof. ∎

4.2.2 Pay-what-you-get Composition

We now consider the setting where an analyst wants to run multiple top-$k$ queries over an unknown domain. One of the issues with the Unknown Domain Gumbel mechanism in Algorithm 3 is that it can sometimes return fewer than $k$ results, yet the overall CDP privacy parameter scales with $k$ regardless of the number of results returned. Hence, with $\ell^{*}$ many top-$k$ queries, the privacy loss would scale with $\ell^{*}\cdot k$ under traditional composition. However, [6] showed that if we instead set an overall bound $k^{*}$ on the number of results that can be returned across all the intermediate top-$k$ queries, then the privacy loss will scale with $k^{*}$. They referred to this as pay-what-you-get composition, because even if an analyst requests the top-100 items and only 1 result is returned, we need only deduct the number of items that are returned (including $\bot$) from the overall privacy budget, until $k^{*}$ items have been returned. We present the pay-what-you-get composition in Algorithm 4.

Algorithm 4 Pay-what-you-get Mechanism
Input: Sequence of histograms $h^{(1)},\cdots,h^{(\ell^{*})}$, noise scale $\beta>0$, threshold $T>0$, and $k^{*}$
Output: Sequence of sorted lists $\{I^{(i)}:i\in[\ell^{*}]\}$
Initialize $k=0$
for $i\in[\ell^{*}]$ do
     Apply Unknown Domain Gumbel from top-$(\bar{k}^{(i)}+1)$ on histogram $h^{(i)}$ with noise scale $\beta$, threshold $T>0$, along with $k^{(i)}\leq(k^{*}-k)$ and $\bar{k}^{(i)}$ to get items $I^{(i)}$, which includes $\bot$ in some cases.
     Update $k=k+|I^{(i)}|$
     if $k=k^{*}$ then
          break
     end if
end for
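A sketch of the budget bookkeeping in Algorithm 4, reusing the `unknown_domain_gumbel` sketch above (our own illustration):

```python
from typing import List, Tuple

def pay_what_you_get(streams: List[List[Tuple[str, float]]], ks: List[int],
                     k_bars: List[int], beta: float, T: float, k_star: int) -> List[List[str]]:
    """Algorithm 4: run Unknown Domain Gumbel per query, charging the budget only
    for items actually returned (including BOT), up to k_star in total."""
    results, used = [], 0
    for top, k_i, k_bar_i in zip(streams, ks, k_bars):
        items = unknown_domain_gumbel(top, min(k_i, k_star - used), k_bar_i, beta, T)
        results.append(items)
        used += len(items)
        if used == k_star:
            break
    return results
```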

The key observation in proving the overall privacy guarantee of pay-what-you-get is that each Unknown Domain Gumbel mechanism is related to the Exponential Mechanism. Removing the randomness that can generate outcomes from the differing items between neighboring datasets, the algorithm becomes equivalent to an Exponential Mechanism. Hence, considering only the randomness over each Unknown Domain Gumbel run that can generate outcomes in both neighboring histograms, we are left with just a string of at most $k^{*}$ many Exponential Mechanisms.

Theorem 10.

Given an adaptive stream of histograms $h^{(1)},\cdots,h^{(\ell^{*})}$, where each histogram $h^{(i)}$ is $(\infty,1)$-sensitive, the Pay-what-you-get mechanism in Algorithm 4 with $\beta=1/\varepsilon$ and threshold $T$ as in Theorem 9 with $\delta>0$ is $\ell^{*}\delta$-approximate $k^{*}\varepsilon^{2}/8$-CDP.

Proof.

We let $A$ be the Pay-what-you-get mechanism with neighboring histogram streams $(h^{(1)},\cdots,h^{(\ell^{*})})$ and $(h^{\prime(1)},\cdots,h^{\prime(\ell^{*})})$, i.e. each histogram $h^{(i)}$ and $h^{\prime(i)}$ are neighbors. We denote by $P$ the distribution of $A(h^{(1)},\cdots,h^{(\ell^{*})})$ and by $Q$ the distribution of $A(h^{\prime(1)},\cdots,h^{\prime(\ell^{*})})$. WLOG, we assume that each $h^{(i)}$ has an additional user's data compared to $h^{\prime(i)}$ for $i\in[\ell^{*}]$.

We will denote by $S=(S^{(1)},\cdots,S^{(\ell^{*})})$ the set of common outcomes between these neighboring streams.

  • 1.

    Let $E=(E^{(1)},\cdots,E^{(\ell^{*})})$ be the corresponding randomness in $A(h^{(1)},\cdots,h^{(\ell^{*})})=(A^{(1)}(h^{(1)}),\cdots,A^{(\ell^{*})}(h^{(\ell^{*})}))$ that can generate outcomes in $S=(S^{(1)},\cdots,S^{(\ell^{*})})$, where each $A^{(i)}$ is an Unknown Domain Gumbel mechanism, and similarly let $E^{\prime}=(E^{\prime(1)},\cdots,E^{\prime(\ell^{*})})$ be the corresponding randomness in $A(h^{\prime(1)},\cdots,h^{\prime(\ell^{*})})$ that can generate outcomes in $S$. We then apply Lemma 5.5 in [6] to get the following for each event of $A^{(i)}$:

    $$\Pr_{A^{(i)}(h^{(i)})}[E^{(i)}]\geq 1-\delta,\qquad\Pr_{A^{(i)}(h^{\prime(i)})}[E^{\prime(i)}]\geq 1-\delta,\qquad\forall i\in[\ell^{*}].$$

    Hence, applying a union bound over all $\ell^{*}$ events, we get

    $$\Pr[E]\geq 1-\ell^{*}\delta,\qquad\Pr[E^{\prime}]\geq 1-\ell^{*}\delta.$$
  • 2.

    We then apply Lemma 5.6 in [6], as we did in Theorem 9, to write the distribution of $A$ in terms of a stream of top-$k^{(i)}$ mechanisms that can be further decomposed into Exponential Mechanisms. Given the prior outcomes $y^{(<i)}$, we will write $P^{\prime(i)}(\cdot;y^{(<i)})$ to denote the distribution of the top-$k^{(i)}$ mechanism evaluated on $h^{(i)}_{(\bar{k}^{(i)})}$ but only on the common items between $h^{(i)}_{(\bar{k}^{(i)})}$ and $h^{\prime(i)}_{(\bar{k}^{(i)})}$. Similarly, we will write $Q^{\prime(i)}(\cdot;y^{(<i)})$ to denote the distribution of the top-$k^{(i)}$ mechanism evaluated on $h^{\prime(i)}_{(\bar{k}^{(i)})}$ but only on the common items with $h^{(i)}_{(\bar{k}^{(i)})}$. We can now apply Lemma 5.6 from [6] iteratively to get, for all $y=(y^{(1)},\cdots,y^{(\ell^{*})})\in S$,

    $$\Pr[A(h^{(1)},\cdots,h^{(\ell^{*})})=y]=\prod_{i=1}^{\ell^{*}}\Pr[E^{(i)}]P^{\prime(i)}(y^{(i)};y^{(<i)})\qquad\Pr[A(h^{\prime(1)},\cdots,h^{\prime(\ell^{*})})=y]=\prod_{i=1}^{\ell^{*}}\Pr[E^{\prime(i)}]Q^{\prime(i)}(y^{(i)};y^{(<i)}).$$
  • 3.

    We then consider the distribution $P^{\prime}(y):=\prod_{i=1}^{\ell^{*}}P^{\prime(i)}(y^{(i)};y^{(<i)})$, which can be further decomposed into separate Exponential Mechanisms. We will write $y^{(i)}=(y^{(i)}[1],\cdots,y^{(i)}[|y^{(i)}|])$ and let $\hat{P}^{(i,j)}$ be the distribution of the Exponential Mechanism for the $i$-th round's mechanism returning the $j$-th item. Hence we have

    $$P^{\prime}(y)=\prod_{i=1}^{\ell^{*}}P^{\prime(i)}(y^{(i)};y^{(<i)})=\prod_{i=1}^{\ell^{*}}\prod_{j=1}^{|y^{(i)}|}\hat{P}^{(i,j)}(y^{(i)}[j];y^{(<i)},y^{(i)}[1],\cdots,y^{(i)}[j-1]).$$

    Note that the two products on the right hand side will have at most $k^{*}$ many terms in total, due to the cap on the number of items that can be returned by pay-what-you-get, and each term is the distribution of an Exponential Mechanism. We have a similar argument for $Q^{\prime(i)}(\cdot;y^{(<i)})$ being decomposed as a sequence of Exponential Mechanisms with $Q^{\prime}(y)=\prod_{i=1}^{\ell^{*}}Q^{\prime(i)}(y^{(i)};y^{(<i)})$. Hence, we have $D_{\lambda}\left(P^{\prime}\|Q^{\prime}\right)\leq\lambda k^{*}\varepsilon^{2}/8$ and $D_{\lambda}\left(Q^{\prime}\|P^{\prime}\right)\leq\lambda k^{*}\varepsilon^{2}/8$ for all $\lambda\geq 1$.

Note that we can almost apply Lemma 3.1, but not quite, because we have $P(y)=\Pr[A(h^{(1)},\cdots,h^{(\ell^{*})})=y]=\prod_{i=1}^{\ell^{*}}\Pr[E^{(i)}]P^{\prime(i)}(y^{(i)};y^{(<i)})$ when $y\in S$. We then construct the joint distribution as

$$P(y^{(1)},\cdots,y^{(\ell^{*})})=\prod_{i=1}^{\ell^{*}}P_{i}(y^{(i)}\mid y^{(<i)})=\prod_{i=1}^{\ell^{*}}\left(\Pr[E^{(i)}]P_{i}^{\prime}(y^{(i)}\mid y^{(<i)})+\Pr[\neg E^{(i)}]P_{i}(y^{(i)}\mid y^{(<i)},\neg E^{(i)})\right)$$
$$=\prod_{i=1}^{\ell^{*}}\Pr[E^{(i)}]P^{\prime(i)}(y^{(i)};y^{(<i)})+\sum_{S\subsetneq[\ell^{*}]}\left(\prod_{i\in S}\Pr[E^{(i)}]\prod_{i\notin S}\Pr[\neg E^{(i)}]\right)\left(\prod_{i\in S}P_{i}^{\prime}(y^{(i)}\mid y^{(<i)})\prod_{i\notin S}P_{i}(y^{(i)}\mid y^{(<i)},\neg E^{(i)})\right)$$
$$=:P^{\prime}(y^{(1)},\cdots,y^{(\ell^{*})})\cdot\prod_{i=1}^{\ell^{*}}\Pr[E^{(i)}]+\hat{P}^{\prime\prime}(y^{(1)},\cdots,y^{(\ell^{*})})$$

We can then write the probability as a convex combination, so that for all $y$ we have the following:

$$\Pr[A(h^{(1)},\cdots,h^{(\ell^{*})})=y]=P^{\prime}(y^{(1)},\cdots,y^{(\ell^{*})})\cdot\prod_{i=1}^{\ell^{*}}\Pr[E^{(i)}]+\hat{P}^{\prime\prime}(y)$$
$$\implies\Pr[A(h^{(1)},\cdots,h^{(\ell^{*})})=y]=(1-\ell^{*}\delta)P^{\prime}(y)+\ell^{*}\delta\underbrace{\left(\frac{1}{\ell^{*}\delta}\left(\hat{P}^{\prime\prime}(y)+P^{\prime}(y)\left(\prod_{i=1}^{\ell^{*}}\Pr[E^{(i)}]-(1-\ell^{*}\delta)\right)\right)\right)}_{P^{\prime\prime}(y)}$$

Similarly, we have

$$\Pr[A(h^{\prime(1)},\cdots,h^{\prime(\ell^{*})})=y]=(1-\ell^{*}\delta)\underbrace{\prod_{i=1}^{\ell^{*}}Q^{\prime(i)}(y^{(i)};y^{(<i)})}_{Q^{\prime}(y)}+\ell^{*}\delta\,Q^{\prime\prime}(y).$$

This completes the proof. ∎

4.3 Continual Observation

We now present an approach for continually releasing a running counter over various domain elements while ensuring differential privacy. The continual observation privacy model was introduced in [3, 11] and is meant to ensure strong privacy guarantees in the setting where we want to continually provide a running counter on a stream of events, denoted as $\omega^{(1:\ell)}=\left(\omega^{(1)},\cdots,\omega^{(\ell)}\right)$ for $\ell=1,\cdots,L$, where $\omega^{(i)}\in\{0,1\}$. It is then straightforward to extend continual observation counters to the setting of releasing a running histogram over a known set of items, i.e. $\omega^{(i)}\subseteq[d]$, where someone can contribute a limited set of items $|\omega^{(i)}|\leq\Delta_0$ at each round $i$ in the stream. Typically we want to ensure event-level privacy, where we consider the change in outcomes when one event in the stream can change.

Recent work from [17, 23] has also considered the continual observation setting, but in the case where we want to continually release histograms over an unknown set of items. Consider the motivating example where we want to provide a running counter for the drugs that are purchased at a pharmacy, where each customer can buy at most $\Delta_0$ different drugs. There might be several different drugs to provide a counter for, and new drugs can emerge later that were not known about before. Hence, we would like to have algorithms that do not require a set of items to provide counts over.

We describe the algorithm at a high level, as describing it formally would require some additional notation. The main subroutine is the Binary Mechanism from [3, 11], which will add at most $\log_2(\ell)$ many noise terms to the count of a stream of $\ell$ events; the number of noise terms depends on the number of 1s in the binary representation of $\ell$. This will ensure that for a stream of length at most $L$, we can release $L$ counts, one after each event, each with Gaussian noise with standard deviation $O(\log_2(L)/\varepsilon)$. Although we release $L$ counts, the privacy analysis relies on the fact that we form a table of partial sums, so that one event in the stream can modify at most $\log_2(L)$ partial sums, and we can view the Binary Mechanism as a Gaussian mechanism on the table of partial sums for all common items between $\omega^{(1:L)}$ and $\omega^{\prime(1:L)}$, which has $\ell_2$-sensitivity $\sqrt{\Delta_0\cdot\log_2(L)}$.

The Unknown Domain Binary Mechanism follows the same approach as the classical Binary Mechanism, which we present at a high level. The idea of the Binary Mechanism is to split a stream $\omega^{(1:\ell)}$ into overlapping partial sums, guaranteeing that each $\omega^{(i)}$ is part of no more than $\lceil\log_{2}(L+1)\rceil$ partial sums, and we create separate partial sums for each item in $\omega^{(1:\ell)}$. For instance, with $\ell=8$ there is a single partial sum, based on the binary representation of $\ell$: we need only add $\mathrm{N}(0,\sigma^{2})$ noise to the partial sum $\sum_{j=1}^{8}\mathbbm{1}\left\{u\in\omega^{(j)}\right\}$ for each $u\in\omega^{(1:\ell)}$, and this noisy partial sum is then reused in any later count that utilizes $\sum_{j=1}^{8}\mathbbm{1}\left\{u\in\omega^{(j)}\right\}$. When $\ell=10$, we use the two partial sums $\sum_{j=1}^{8}\mathbbm{1}\left\{u\in\omega^{(j)}\right\}$ and $\sum_{j=9}^{10}\mathbbm{1}\left\{u\in\omega^{(j)}\right\}$ for each $u\in\omega^{(1:10)}$, each with its own noise added to it. Note that for each new item $u\in\omega^{(1:\ell)}\setminus\omega^{(1:\ell-1)}$, we add fresh noise to each prior partial sum for $u$. We then only release an item and its noisy count if the count is larger than a fixed threshold $T>0$. A minimal sketch of this procedure is given below.
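The sketch below (our own illustration, not code from [17]; it assumes numpy and scipy) extends the single-counter sketch to the unknown-domain setting: it runs a per-item binary mechanism lazily, so fresh noise is drawn for each prior partial sum the first time a new item appears, and it only releases items whose noisy counts exceed the threshold $T$, set as in Theorem 11 below.

```python
import numpy as np
from scipy.stats import norm

def threshold(sigma, L, delta0, delta):
    # Threshold T from Theorem 11 below (our reading of [17]).
    m = int(np.ceil(np.log2(L + 1)))
    return 1 + sigma * np.sqrt(m) * norm.ppf(1 - delta / (delta0 * L))

def unknown_domain_binary_mechanism(stream, sigma, T,
                                    rng=np.random.default_rng(0)):
    """stream[j] is the set of items at round j+1 (at most Delta_0 of them).

    Returns, for each round, a dict of the items whose noisy running
    counts exceed T. Noise for an (item, interval) pair is drawn lazily,
    so a new item gets fresh noise on every prior partial sum.
    """
    seen, noisy_psum, releases = set(), {}, []
    for ell in range(1, len(stream) + 1):
        seen |= stream[ell - 1]
        released = {}
        for u in seen:
            count, end = 0.0, ell
            while end > 0:
                width = end & (-end)
                key = (u, (end - width + 1, end))
                if key not in noisy_psum:
                    true_psum = sum(u in stream[j] for j in range(end - width, end))
                    noisy_psum[key] = true_psum + rng.normal(0, sigma)
                count += noisy_psum[key]
                end -= width
            if count > T:
                released[u] = count
        releases.append(released)
    return releases
```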

We now present the privacy analysis of the Unknown Domain Binary Mechanism from [17].

Theorem 11.

Assume that $|\omega^{(\ell)}|\leq\Delta_{0}$ for all $\ell\in[L]$. Setting $\sigma=1/\varepsilon$ and the threshold $T$ to be the following for any $\delta>0$,

\[
T=1+\sigma\cdot\sqrt{\lceil\log_{2}(L+1)\rceil}\cdot\Phi^{-1}\left(1-\frac{\delta}{\Delta_{0}\cdot L}\right)
\]

ensures that the Unknown Domain Binary Mechanism is $\delta$-approximate $\Delta_{0}\lceil\log_{2}(L+1)\rceil\varepsilon^{2}/2$-CDP under event-level adjacent streams.

Proof.

We follow the same analysis as in the earlier theorems, mainly leveraging Lemma 3.1. Let $\omega^{(1:L)}$ contain an event where the neighboring stream $\omega^{\prime(1:L)}$ has an empty set at that event. Say that round $\ell$ is where they differ, so that $\omega^{\prime(\ell)}=\emptyset$ and $|\omega^{(\ell)}|=\Delta_{0}$. Let $A$ denote the Unknown Domain Binary Mechanism. We denote $S$ as the set of all outcomes that both $A(\omega^{(1:L)})$ and $A(\omega^{\prime(1:L)})$ can return. Note that at round $\ell$, stream event $\omega^{(\ell)}$ can introduce as many as $\Delta_{0}$ previously unseen items in the stream. We will write the distribution of $A(\omega^{(1:L)})$ as $P$ and the distribution of $A(\omega^{\prime(1:L)})$ as $Q$.

  • 1.

    We will write $E$ as the randomness in $A(\omega^{(1:L)})$ that can generate outcomes common between $A(\omega^{(1:L)})$ and $A(\omega^{\prime(1:L)})$. We need to ensure that no noisy count on any new item that appears in $\omega^{(\ell)}$ but not in $\omega^{(1:\ell-1)}$ can go above the threshold $T$. At round $\ell$ we can add together as many as $\lceil\log_{2}(\ell+1)\rceil\leq\lceil\log_{2}(L+1)\rceil$ independent Gaussian noise terms, whose sum is itself Gaussian. We then apply a union bound over all possible $\Delta_{0}$ items in $\omega^{(\ell)}$ and a further union bound over all possible $L$ rounds to get the following when $T=1+\sigma\cdot\sqrt{\lceil\log_{2}(L+1)\rceil}\cdot\Phi^{-1}(1-\tfrac{\delta}{\Delta_{0}\cdot L})$:

    \[
    \Pr[\neg E]\leq\Delta_{0}\cdot L\cdot\Pr\left[\mathrm{N}(0,\lceil\log_{2}(L+1)\rceil\sigma^{2})\geq T-1\right]=\delta.
    \]
  • 2.

    We will write $P^{\prime}$ for the distribution of the Binary Mechanism evaluated on $\omega^{(1:L)}$ over only the items common between $\omega^{(1:L)}$ and $\omega^{\prime(1:L)}$ that have positive counts. We then have for all outcomes $y\in S$ that

    \[
    P(y)=\Pr[E]\cdot P^{\prime}(y).
    \]

    Since $Q$ is the distribution for the neighboring input $\omega^{\prime(1:L)}$ where $\omega^{\prime(\ell)}=\emptyset$, the set $S$ contains all of its outcomes, so that $Q(y)=Q^{\prime}(y)$ for all $y\in S$.

  • 3.

    Note that $P^{\prime}$ and $Q^{\prime}$ are simply post-processing functions of the partial sums table with Gaussian noise added to each cell of the table. Hence, we need to consider the Rényi divergence between the noisy partial sum tables on the common items between $\omega^{(1:L)}$ and $\omega^{\prime(1:L)}$. We then have $D_{\lambda}(P^{\prime}\|Q^{\prime})\leq\Delta_{0}\lceil\log_{2}(L+1)\rceil/(2\sigma^{2})\cdot\lambda$ and $D_{\lambda}(Q^{\prime}\|P^{\prime})\leq\Delta_{0}\lceil\log_{2}(L+1)\rceil/(2\sigma^{2})\cdot\lambda$ for all $\lambda>1$, because each is a randomized post-processing (including the additional noise terms) of the common items in the same partial sum table as in the Binary Mechanism.
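    The stated bound follows from the closed form for the Rényi divergence between Gaussians of the same scale: the noisy partial sum tables for $\omega^{(1:L)}$ and $\omega^{\prime(1:L)}$ differ in at most $\Delta_{0}$ items, each in at most $\lceil\log_{2}(L+1)\rceil$ cells, by at most 1 per cell, so

    \[
    D_{\lambda}\left(\mathrm{N}(\mu,\sigma^{2}I)\,\|\,\mathrm{N}(\mu^{\prime},\sigma^{2}I)\right)=\frac{\lambda\|\mu-\mu^{\prime}\|_{2}^{2}}{2\sigma^{2}}\leq\frac{\lambda\,\Delta_{0}\lceil\log_{2}(L+1)\rceil}{2\sigma^{2}}.
    \]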

Setting $\sigma=1/\varepsilon$ completes the proof. ∎

5 Conclusion

We have presented a unified framework to prove that several different algorithms over unknown domain histograms are approximate CDP. In many settings, practitioners want a way to incorporate private algorithms with minimal onboarding. A major bottleneck for incorporating private algorithms into existing systems is requiring a fixed list of items that we want to release counts for. Furthermore, product teams might be comfortable with noise added to counts, but not with displaying counts for items that never appeared in the dataset. We wanted to show how the privacy analyses of many existing DP algorithms can be unified by fixing neighboring datasets and considering not just outcomes that can occur on both neighboring inputs, but also the related distributions that can only generate these good outcomes. We think that approximate CDP provides the easiest way to combine these algorithms together to get tight privacy loss bounds, as the privacy analyses of many of them rely on improved pure CDP bounds rather than pure DP bounds. For example, we showed how using a CDP analysis of the Laplace mechanism can improve the CDP privacy parameter by considering composition over differing counts, rather than relying on an $\ell_{1}$-sensitivity bound as would be the case for DP. We can also use the tighter connection between Exponential Mechanisms and CDP, rather than using pure DP parameters of the Exponential Mechanism. Lastly, the Gaussian mechanism does not satisfy a pure DP bound, so using CDP is a natural fit, and converting to approximate DP would result in a lossy DP parameter. We hope that this unified framework will help demystify some previous analyses and can be leveraged in designing future private algorithms.

6 Acknowledgements

Special thanks to David Durfee and Thomas Steinke for their helpful comments, which improved the quality of this survey.

References

  • Bun and Steinke [2016] M. Bun and T. Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In M. Hirt and A. Smith, editors, Theory of Cryptography, pages 635–658, Berlin, Heidelberg, 2016. Springer Berlin Heidelberg. ISBN 978-3-662-53641-4. URL https://link.springer.com/chapter/10.1007/978-3-662-53641-4_24.
  • Cesar and Rogers [2021] M. Cesar and R. Rogers. Bounding, concentrating, and truncating: Unifying privacy loss composition for data analytics. In Proceedings of the 32nd International Conference on Algorithmic Learning Theory, volume 132 of Proceedings of Machine Learning Research, pages 421–457. PMLR, 16–19 Mar 2021. URL https://proceedings.mlr.press/v132/cesar21a.html.
  • Chan et al. [2012] T.-H. H. Chan, E. Shi, and D. Song. Optimal lower bound for differentially private multi-party aggregation. In European Symposium on Algorithms (ESA), 2012. URL http://eprint.iacr.org/2012/373.pdf.
  • Ding et al. [2021] Z. Ding, D. Kifer, S. M. S. N. E., T. Steinke, Y. Wang, Y. Xiao, and D. Zhang. The permute-and-flip mechanism is identical to report-noisy-max with exponential noise, 2021.
  • Dong et al. [2020] J. Dong, D. Durfee, and R. Rogers. Optimal differential privacy composition for exponential mechanisms. In H. D. III and A. Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 2597–2606. PMLR, 13–18 Jul 2020. URL https://proceedings.mlr.press/v119/dong20a.html.
  • Durfee and Rogers [2019] D. Durfee and R. M. Rogers. Practical differentially private top-k selection with pay-what-you-get composition. In H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. B. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 3527–3537, 2019. URL https://proceedings.neurips.cc/paper/2019/hash/b139e104214a08ae3f2ebcce149cdf6e-Abstract.html.
  • Dwork and Roth [2014] C. Dwork and A. Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3 & 4):211–407, 2014. doi: 10.1561/0400000042. URL http://dx.doi.org/10.1561/0400000042.
  • Dwork and Rothblum [2016] C. Dwork and G. Rothblum. Concentrated differential privacy. arXiv:1603.01887 [cs.DS], 2016.
  • Dwork et al. [2006a] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. In Advances in Cryptology - EUROCRYPT 2006, pages 486–503, Berlin, Heidelberg, 2006a. Springer Berlin Heidelberg. ISBN 978-3-540-34547-3. URL https://www.iacr.org/archive/eurocrypt2006/40040493/40040493.pdf.
  • Dwork et al. [2006b] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Proceedings of the Third Conference on Theory of Cryptography, TCC’06, page 265–284, Berlin, Heidelberg, 2006b. Springer-Verlag. ISBN 3540327312. doi: 10.1007/11681878_14. URL https://doi.org/10.1007/11681878_14.
  • Dwork et al. [2010] C. Dwork, M. Naor, T. Pitassi, and G. N. Rothblum. Differential privacy under continual observation. In Proceedings of the Forty-Second ACM Symposium on Theory of Computing, STOC ’10, page 715–724, New York, NY, USA, 2010. Association for Computing Machinery. ISBN 9781450300506. doi: 10.1145/1806689.1806787. URL https://doi.org/10.1145/1806689.1806787.
  • Gopi et al. [2022] S. Gopi, Y. T. Lee, and D. Liu. Private convex optimization via exponential mechanism. In P.-L. Loh and M. Raginsky, editors, Proceedings of Thirty Fifth Conference on Learning Theory, volume 178 of Proceedings of Machine Learning Research, pages 1948–1989. PMLR, 02–05 Jul 2022. URL https://proceedings.mlr.press/v178/gopi22a.html.
  • Korolova et al. [2009] A. Korolova, K. Kenthapadi, N. Mishra, and A. Ntoulas. Releasing search queries and clicks privately. In Proceedings of the 18th International Conference on World Wide Web, WWW ’09, page 171–180, New York, NY, USA, 2009. Association for Computing Machinery. ISBN 9781605584874. doi: 10.1145/1526709.1526733. URL https://doi.org/10.1145/1526709.1526733.
  • McKenna and Sheldon [2020] R. McKenna and D. Sheldon. Permute-and-flip: A new mechanism for differentially private selection. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS’20, Red Hook, NY, USA, 2020. Curran Associates Inc. ISBN 9781713829546.
  • McSherry and Talwar [2007] F. McSherry and K. Talwar. Mechanism design via differential privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS’07), pages 94–103, 2007. doi: 10.1109/FOCS.2007.66. URL http://dx.doi.org/10.1109/FOCS.2007.66.
  • Papernot and Steinke [2022] N. Papernot and T. Steinke. Hyperparameter tuning with renyi differential privacy. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=-70L8lpp9DF.
  • Rivera Cardoso and Rogers [2022] A. Rivera Cardoso and R. Rogers. Differentially private histograms under continual observation: Streaming selection into the unknown. In G. Camps-Valls, F. J. R. Ruiz, and I. Valera, editors, Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, volume 151 of Proceedings of Machine Learning Research, pages 2397–2419. PMLR, 28–30 Mar 2022. URL https://proceedings.mlr.press/v151/rivera-cardoso22a.html.
  • Rogers et al. [2021] R. Rogers, S. Subramaniam, S. Peng, D. Durfee, S. Lee, S. K. Kancha, S. Sahay, and P. Ahammad. Linkedin’s audience engagements api: A privacy preserving data analytics system at scale. Journal of Privacy and Confidentiality, 11(3), Dec. 2021. doi: 10.29012/jpc.782. URL https://journalprivacyconfidentiality.org/index.php/jpc/article/view/782.
  • Steinke [2022] T. Steinke. Composition of differential privacy & privacy amplification by subsampling, 2022.
  • Swanberg et al. [2023] M. Swanberg, D. Desfontaines, and S. Haney. Dp-sips: A simpler, more scalable mechanism for differentially private partition selection, 2023.
  • Whitehouse et al. [2023] J. Whitehouse, A. Ramdas, R. Rogers, and Z. S. Wu. Fully adaptive composition in differential privacy. In International Conference on Machine Learning. PMLR, 2023.
  • Wilson et al. [2020] R. J. Wilson, C. Y. Zhang, W. Lam, D. Desfontaines, D. Simmons-Marengo, and B. Gipson. Differentially private SQL with bounded user contribution. Proceedings on Privacy Enhancing Technologies, 2020(2):230–250, 2020. doi: https://doi.org/10.2478/popets-2020-0025. URL https://content.sciendo.com/view/journals/popets/2020/2/article-p230.xml.
  • Zhang et al. [2023] B. Zhang, V. Doroshenko, P. Kairouz, T. Steinke, A. Thakurta, Z. Ma, H. Apte, and J. Spacek. Differentially private stream processing at scale, 2023.