

A Unifying Privacy Analysis Framework for Unknown Domain Algorithms in Differential Privacy

Ryan Rogers
Abstract

There are many existing differentially private algorithms for releasing histograms, i.e. counts with corresponding labels, in various settings. Our focus in this survey is to revisit some of the existing differentially private algorithms for releasing histograms over unknown domains, i.e. where the labels of the counts that are to be released are not known beforehand. The main practical advantage of releasing histograms over an unknown domain is that the algorithm does not need to fill in missing labels, i.e. labels that are not present in the original histogram but could appear in the histogram of a hypothetical neighboring dataset. However, the challenge in designing differentially private algorithms for releasing histograms over an unknown domain is that some outcomes can clearly reveal which input was used, violating privacy. The goal then is to show that these differentiating outcomes occur with very low probability. We present a unified framework for the privacy analyses of several existing algorithms. Furthermore, our analysis uses approximate concentrated differential privacy from Bun and Steinke [1], rather than differential privacy directly, which can improve the privacy loss parameters, especially when composing many of these algorithms together in an overall system.

1 Introduction

Releasing histograms, counts over some set of items, is one of the most fundamental tasks in data analytics. Given a dataset, we want to compute the number of rows that satisfy some condition, grouped by some column(s) of interest, say the number of rows in each country. Despite the commonality of this task, there are many different differentially private algorithms to use, depending on the setting, e.g. do we want to show only the top-$k$, do we know how many items a user can have in any row, do we know all possible values that a column can take, how often is the data refreshed? We will focus on the setting where we do not have, or even know, the counts of all possible items in a histogram. This is typical in practice, because SQL queries will only return items that have positive counts; the query engine has no way of knowing which items are absent from the dataset. Unfortunately, differential privacy requires considering a hypothetical neighboring dataset, which might contain items that were previously unseen, which makes the privacy analysis challenging. Either we need to populate all possible items that could appear in the dataset and fill in the missing counts as zero, or we need to design better, more practical, DP algorithms. In the latter case, we would want to be able to ensure DP whether we were computing the top-10 skills in a dataset or the top-10 credit card numbers in the dataset. The latter should return no results, but the algorithm should ensure privacy in both cases. We will refer to the scenario where the DP algorithm does not know the domain of items that could be in a dataset beforehand as the Unknown Domain setting.

In differential privacy there are typically two parameters $\varepsilon,\delta$ that are used to quantify the privacy of an algorithm. The parameter $\varepsilon$ is typically referred to as the amount of privacy loss that an algorithm ensures, while $\delta$ is commonly referred to as the probability with which the algorithm can have larger privacy loss than $\varepsilon$. In fact, when $\delta>0$, we say an algorithm satisfies approximate differential privacy, while with $\delta=0$ we say it satisfies pure DP. The parameter $\delta$ can then be the chance that the algorithm returns a result that clearly violates privacy, e.g. returning an individual's data record in the dataset. Not every approximate DP algorithm satisfies this interpretation, which has resulted in variants of DP based on Rényi divergence. In particular, adding Gaussian noise with fixed standard deviation $\sigma$ to a statistic of interest cannot be pure DP, but can be $(\varepsilon(\delta),\delta)$-DP for any $\delta>0$. Hence the probability of failure need not be fixed in advance in the algorithm. However, there are many different algorithms that are shown to be $(\varepsilon,\delta)$-differentially private for a particular $\delta>0$. In particular, designing DP algorithms for releasing histograms in the Unknown Domain setting will require setting a failure probability in advance.

Consider the example where we want to know the most popular words typed in an email client. The set of all potential words is massive, and can include slang, typos, and abbreviations, so we only want to take words that are present in the dataset, rather than the set of all possible words. One approach would be to add noise to all word counts and sort them to return the top ones. However, there could be a word like “RyanRogersSSN123456789”. Even if there is noise in the counts, the mere presence of this record is a clear violation of privacy. To prevent this, we can introduce a threshold on the noisy counts, but then there is a chance that the noise is especially large, making a small count go above the threshold. This is where the $\delta>0$ probability comes in: we want to make sure that the threshold is set high enough that small counts with noise can only appear above the threshold with probability $\delta$. However, it does not suffice to just bound the probability of bad outcomes to prove an algorithm is approximate differentially private.

We present a general framework that can be used in the privacy analysis of several algorithms that aim to release counts from an unspecified domain, returning only results that are present in the original dataset, although the framework can be applied to other scenarios. The main idea is to show a mechanism $A$ satisfies three conditions: (1) there are small chance events that can lead to differentiating outcomes, where it is clear which of two neighboring datasets was used, (2) there is another mechanism $A^{\prime}$ that can know both neighboring datasets and is equal to $A$ on all non-differentiating outcomes, and (3) $A^{\prime}$ is pure DP. Although the algorithms we cover were previously shown to satisfy approximate differential privacy, we revisit their analyses, following our general framework. Furthermore, we show that providing a privacy analysis in terms of approximate concentrated DP (CDP) can lead to improved privacy parameters, especially with algorithms based on Gaussian noise or the Exponential Mechanism.

We advocate for presenting privacy guarantees of new or existing algorithms in terms of approximate CDP, rather than approximate DP, as composing CDP parameters is straightforward, while composing approximate DP parameters can be complicated and loose. Note that each algorithm is likely to be part of a more general privacy system where the overall privacy guarantee of the system can be in terms of approximate DP. It has become common to compose algorithms using an analysis based on approximate CDP and then convert to an overall approximate DP guarantee at the end. Leaving privacy guarantees of individual algorithms in terms of approximate DP will lead to loose bounds, as one would convert the DP parameters to CDP parameters, compose the CDP parameters, and then convert back to DP at the end.

2 Preliminaries

We now define approximate differential privacy, which depends on neighboring datasets $x,x^{\prime}\in\mathcal{X}$, denoted as $x\sim x^{\prime}$, that differ in the presence or absence of one user's records.

Definition 2.1 (Dwork et al. [10, 9]).

An algorithm $A:\mathcal{X}\to\mathcal{Y}$ is $(\varepsilon,\delta)$-differentially private if, for any measurable set $S\subseteq\mathcal{Y}$ and any neighboring inputs $x\sim x^{\prime}$,

$$\Pr[A(x)\in S]\leq e^{\varepsilon}\Pr[A(x^{\prime})\in S]+\delta.\qquad(1)$$

If $\delta=0$, we say $A$ is $\varepsilon$-DP or simply pure DP.

One of the classical pure DP algorithms is the Laplace mechanism, which adds Laplace noise to a statistic. However, to determine the scale of noise to add to the statistic to ensure DP, we must know its sensitivity. We define the $\ell_p$-sensitivity of a statistic $f:\mathcal{X}\to\mathbb{R}^{d}$, which takes a dataset $x\in\mathcal{X}$ to a real vector in $\mathbb{R}^{d}$, as the following, where the max is taken over neighboring $x,x^{\prime}\in\mathcal{X}$:

$$\Delta_{p}(f)=\max_{x\sim x^{\prime}}\left\{\|f(x)-f(x^{\prime})\|_{p}\right\}.$$

We then have the following privacy guarantee for the Laplace mechanism.

Theorem 1 (Dwork et al. [10]).

Let $f:\mathcal{X}\to\mathbb{R}^{d}$ have $\ell_1$-sensitivity $\Delta_1(f)$; then the mechanism $M:\mathcal{X}\to\mathbb{R}^{d}$ where $M(x)=f(x)+(Z_1,\cdots,Z_d)$ with $\{Z_i\}\stackrel{i.i.d.}{\sim}\mathrm{Lap}(\Delta_1(f)/\varepsilon)$ is $\varepsilon$-DP for any $\varepsilon>0$.
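To make Theorem 1 concrete, here is a minimal sketch in Python/NumPy; the function name and the example statistic are our own illustrations, not part of the original analysis.

```python
import numpy as np

def laplace_mechanism(f_x: np.ndarray, l1_sensitivity: float, epsilon: float) -> np.ndarray:
    """Release f(x) with i.i.d. Laplace noise of scale Delta_1(f)/epsilon (epsilon-DP)."""
    scale = l1_sensitivity / epsilon
    return f_x + np.random.laplace(loc=0.0, scale=scale, size=f_x.shape)

# Example: a 3-bin histogram where one user changes at most one count by 1,
# so the l1-sensitivity is 1.
noisy_counts = laplace_mechanism(np.array([10.0, 4.0, 7.0]), l1_sensitivity=1.0, epsilon=0.5)
```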

Another classical pure DP mechanism is the Exponential Mechanism, which takes a quality score $q:\mathcal{X}\times\mathcal{Y}\to\mathbb{R}$ mapping a dataset and outcome to a real value, where the goal is to return outcomes that have a high quality score. We will define the range of a quality score to be the following, where the max over datasets is over neighbors $x,x^{\prime}\in\mathcal{X}$. (The original Exponential Mechanism was presented in terms of a quality score's sensitivity, while recent work from [5] showed that the Exponential Mechanism is more naturally defined in terms of the quality score's range; see also Jinshuo Dong's blog post https://dongjs.github.io/2020/02/10/ExpMech.html.)

$$\Delta(q)=\max_{y,y^{\prime}\in\mathcal{Y}}\max_{x\sim x^{\prime}}\{(q(x,y)-q(x^{\prime},y))-(q(x,y^{\prime})-q(x^{\prime},y^{\prime}))\}.$$

We then have the following privacy guarantee of the Exponential Mechanism.

Theorem 2 (McSherry and Talwar [15]).

Let $q:\mathcal{X}\times\mathcal{Y}\to\mathbb{R}$ be a quality score with range $\Delta(q)$; then the mechanism $M:\mathcal{X}\to\mathcal{Y}$ is $\varepsilon$-DP for any $\varepsilon>0$ where

$$\Pr[M(x)=y]\propto\exp\left(\frac{\varepsilon q(x,y)}{\Delta(q)}\right).$$
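As an illustration, a minimal sketch of sampling from the Exponential Mechanism over a finite outcome set (Python/NumPy; the helper name and example quality scores are assumptions for the example):

```python
import numpy as np

def exponential_mechanism(quality: np.ndarray, range_q: float, epsilon: float) -> int:
    """Sample an outcome index with probability proportional to exp(eps * q / Delta(q))."""
    logits = epsilon * quality / range_q
    logits -= logits.max()                # stabilize before exponentiating
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(np.random.choice(len(quality), p=probs))

# Example: quality scores are counts, so adding/removing one user changes each
# count by at most 1 and the range is bounded.
winner = exponential_mechanism(np.array([10.0, 4.0, 7.0]), range_q=1.0, epsilon=1.0)
```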

We now define approximate concentrated differential privacy (CDP). (Although [1] defines zCDP, to differentiate it from the CDP of [8], we will use CDP to refer to the version from [1], which differs slightly from the original definition but was later shown [21] to be equivalent to the original version.) Similar to approximate DP, it permits a small probability of unbounded Rényi divergence.

Definition 2.2 (Bun and Steinke [1], Papernot and Steinke [16]).

Suppose $A:\mathcal{X}\to\mathcal{Y}$ and $\rho,\delta\geq 0$. We say the algorithm $A$ is $\delta$-approximate $\rho$-CDP if, for any neighboring datasets $x,x^{\prime}$, there exist distributions $P^{\prime},P^{\prime\prime},Q^{\prime},Q^{\prime\prime}$ such that the outputs are distributed according to the following mixture distributions:

$$A(x)\sim(1-\delta)P^{\prime}+\delta P^{\prime\prime}\qquad A(x^{\prime})\sim(1-\delta)Q^{\prime}+\delta Q^{\prime\prime},$$

where for all $\lambda\geq 1$, $D_{\lambda}(P^{\prime}\|Q^{\prime})\leq\rho\lambda$ and $D_{\lambda}(Q^{\prime}\|P^{\prime})\leq\rho\lambda$.

We can also convert approximate differential privacy to approximate CDP and vice versa.

Theorem 3 (Bun and Steinke [1]).

If $A$ is $(\varepsilon,\delta)$-DP then it is $\delta$-approximate $\varepsilon^{2}/2$-CDP. If $A$ is $\delta$-approximate $\rho$-CDP then it is $(\rho+2\sqrt{\rho\log(1/\delta^{\prime})},\delta^{\prime}+\delta)$-DP for any $\delta^{\prime}>0$.
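The CDP-to-DP direction of Theorem 3 is easy to compute; a small sketch (our own helper, with $\delta^{\prime}$ as the free parameter):

```python
import math

def cdp_to_dp(rho: float, delta_cdp: float, delta_prime: float) -> tuple:
    """Convert delta_cdp-approximate rho-CDP into an (eps, delta)-DP guarantee (Theorem 3)."""
    eps = rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta_prime))
    return eps, delta_cdp + delta_prime

# Example: pure 0.125-CDP (e.g. a Gaussian mechanism with rho = eps^2/2 for eps = 0.5).
print(cdp_to_dp(rho=0.125, delta_cdp=0.0, delta_prime=1e-6))
```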

The classical CDP mechanism is the Gaussian Mechanism. Note that the Gaussian Mechanism was originally introduced as satisfying approximate DP, but it was then shown to satisfy pure CDP in later work [8, 1].

Theorem 4 (Bun and Steinke [1]).

Let $f:\mathcal{X}\to\mathbb{R}^{d}$ have $\ell_2$-sensitivity $\Delta_2(f)$; then the mechanism $M:\mathcal{X}\to\mathbb{R}^{d}$ where $M(x)=f(x)+(Z_1,\cdots,Z_d)$ with $\{Z_i\}\stackrel{i.i.d.}{\sim}\mathrm{N}\left(0,\tfrac{\Delta_2(f)^{2}}{2\rho}\right)$ is $\rho$-CDP for any $\rho>0$.
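A minimal sketch of the Gaussian mechanism calibrated for $\rho$-CDP (our own illustration):

```python
import numpy as np

def gaussian_mechanism(f_x: np.ndarray, l2_sensitivity: float, rho: float) -> np.ndarray:
    """Release f(x) with i.i.d. Gaussian noise of std Delta_2(f)/sqrt(2*rho) (rho-CDP)."""
    sigma = l2_sensitivity / np.sqrt(2.0 * rho)
    return f_x + np.random.normal(loc=0.0, scale=sigma, size=f_x.shape)
```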

Note that since the Exponential Mechanism is $\varepsilon$-DP, we can apply Theorem 3 to conclude that it is $\varepsilon^{2}/2$-CDP, but work from Cesar and Rogers [2] showed that the Exponential Mechanism actually has a better CDP parameter. (Also see the blog post at https://differentialprivacy.org/exponential-mechanism-bounded-range/.)

Theorem 5 (Cesar and Rogers [2]).

Let $q:\mathcal{X}\times\mathcal{Y}\to\mathbb{R}$ be a quality score with range $\Delta(q)$; then the mechanism $M:\mathcal{X}\to\mathcal{Y}$ is $\varepsilon^{2}/8$-CDP for any $\varepsilon>0$ where

$$\Pr[M(x)=y]\propto\exp\left(\frac{\varepsilon q(x,y)}{\Delta(q)}\right).$$

We also state the composition property of CDP, showing that the overall privacy parameters degrade after multiple CDP algorithms are run on a dataset.

Theorem 6.

Let $A_1:\mathcal{X}\to\mathcal{Y}$ be $\delta_1$-approximate $\rho_1$-CDP and let $A_2:\mathcal{X}\times\mathcal{Y}\to\mathcal{Z}$ be such that $A_2(\cdot,y)$ is $\delta_2$-approximate $\rho_2$-CDP for all $y\in\mathcal{Y}$. Then $A:\mathcal{X}\to\mathcal{Z}$ where $A(x)=A_2(x,A_1(x))$ is $(\delta_1+\delta_2-\delta_1\cdot\delta_2)$-approximate $(\rho_1+\rho_2)$-CDP.
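Theorem 6 amounts to simple arithmetic on the parameters; a sketch (our own helper name):

```python
def compose_cdp(rho1: float, delta1: float, rho2: float, delta2: float) -> tuple:
    """Adaptive composition of approximate CDP (Theorem 6): the rhos add and the
    deltas compose as 1 - (1 - delta1)(1 - delta2)."""
    return rho1 + rho2, delta1 + delta2 - delta1 * delta2
```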

3 Unifying Framework

We now present a general framework that can be used to unify some of the previous analyses for proving approximate DP (CDP) for various algorithms. If we want to show an algorithm $A$ is approximate CDP, we need to consider the randomness in $A$ that can generate differentiating outcomes, where, given two inputs $x$ and $x^{\prime}$, we see an outcome of $A$ that could only have come from one of them. We want to show that the chance that the randomness in $A$ generates these bad outcomes is at most $\delta$. Furthermore, we want to show that for all other non-differentiating outcomes, there are related mechanisms that match $A$ on inputs $x$ and $x^{\prime}$. Lastly, we need to show that these related mechanisms satisfy pure CDP.

To be more precise, the following lemma can be used to prove that many different algorithms known to be approximate DP are approximate CDP directly, without needing to resort to the general approximate DP to approximate CDP conversion.

Lemma 3.1.

Let $A:\mathcal{X}\to\mathcal{Y}$ be a randomized algorithm and fix parameters $\rho,\delta\geq 0$. If for each pair of neighboring datasets $x,x^{\prime}$ the algorithm $A$ satisfies the following three conditions, then it is $\delta$-approximate $\rho$-CDP:

  • 1.

    There exist events $E,E^{\prime}$ such that

    $$\Pr_{A(x)}[E],\ \Pr_{A(x^{\prime})}[E^{\prime}]\geq 1-\delta.$$

    Let $S$ be the set of outcomes common to $A(x)$ conditioned on event $E$ and $A(x^{\prime})$ conditioned on event $E^{\prime}$.

  • 2.

    There exist distributions $P^{\prime}$ and $Q^{\prime}$ with common support $S$ such that for all $y\in S$ we have the following, where $P$, $Q$ are the distributions of $A(x)$ and $A(x^{\prime})$, respectively. (We use $P(y),P^{\prime}(y),P^{\prime\prime}(y),Q(y),Q^{\prime}(y),Q^{\prime\prime}(y)$ to denote the Radon-Nikodym derivative of the corresponding distribution with respect to some base measure; see a similar note in [19].)

    $$P(y)=\mathbbm{1}\left\{y\in S\right\}\cdot\Pr_{A(x)}[E]\,P^{\prime}(y)\qquad Q(y)=\mathbbm{1}\left\{y\in S\right\}\cdot\Pr_{A(x^{\prime})}[E^{\prime}]\,Q^{\prime}(y)$$
  • 3.

    Also we have $D_{\lambda}(P^{\prime}\|Q^{\prime})\leq\lambda\rho$ and $D_{\lambda}(Q^{\prime}\|P^{\prime})\leq\lambda\rho$ for all $\lambda\geq 1$.

Proof.

Fix neighbors $x,x^{\prime}$. Let $P,Q$ be the distributions of $A(x)$ and $A(x^{\prime})$, respectively. Consider the following distribution $P^{\prime\prime}$, where we write $P(\cdot\mid\neg E)$ to denote the conditional distribution of $P$ given the complement of event $E$:

$$P^{\prime\prime}(y)=\frac{1}{\delta}\left(P(y\mid\neg E)\Pr[\neg E]+P^{\prime}(y)\left(\Pr_{A(x)}[E]-(1-\delta)\right)\right)$$

We then have that for $y\in S$:

$$(1-\delta)P^{\prime}(y)+\delta P^{\prime\prime}(y)=(1-\delta)P^{\prime}(y)+\delta\cdot\frac{1}{\delta}\left(P^{\prime}(y)\cdot\left(\Pr[E]-(1-\delta)\right)\right)=P^{\prime}(y)\Pr[E]=P(y).$$

Furthermore, for $y\notin S$ we have

$$(1-\delta)P^{\prime}(y)+\delta P^{\prime\prime}(y)=\delta\cdot\frac{1}{\delta}\left(P(y\mid\neg E)\Pr[\neg E]\right)=P(y\mid\neg E)\Pr[\neg E]=P(y).$$

Hence, we have $P(y)=(1-\delta)P^{\prime}(y)+\delta P^{\prime\prime}(y)$ for all outcomes $y$. A similar argument works to show $Q(y)=(1-\delta)Q^{\prime}(y)+\delta Q^{\prime\prime}(y)$ with $Q^{\prime\prime}(y)$ defined similarly to $P^{\prime\prime}(y)$.

By assumption, we know that $D_{\lambda}(P^{\prime}\|Q^{\prime})\leq\lambda\rho$ and $D_{\lambda}(Q^{\prime}\|P^{\prime})\leq\lambda\rho$ for all $\lambda\geq 1$. Hence, $A$ is $\delta$-approximate $\rho$-CDP. ∎

Although a similar lemma can be used for approximate DP, this will typically lead to unnecessarily loose privacy parameters, as we will see later. Hence, providing an approximate CDP analysis of each algorithm will be useful in any privacy system that combines these algorithms because the CDP privacy parameters simply add up and can be converted to approximate DP parameters at the end, if necessary.

4 Unknown Domain Algorithms

We will denote a histogram as $h\in\mathcal{H}=\{(i,c_i):(i,c_i)\in[d]\times\mathbb{N}\}$, which consists of a set of pairs with a label $i\in[d]$ and corresponding count $c_i$. When we define neighboring histograms, we will need to consider how much a user can modify a histogram. If we remove a user's contributions from a histogram $h$, this can both remove items entirely or decrease counts by some amount. We will say that a histogram $h$ is $(\ell_0,\ell_\infty)$-sensitive if removing or adding a user's data to histogram $h$ can change at most $\ell_0$ distinct elements and each count in $h$ can differ by at most $\ell_\infty$. We now turn to some applications of Lemma 3.1 for some existing mechanisms.

4.1 Positive Count Histograms

We start with the setting where we want to return a histogram subject to CDP when we are only given items with positive counts, so each count present in the histogram is at least 1. It is straightforward to extend this analysis to the case where we have a histogram with only counts above some known value larger than 1. This is a natural setting for data analytics, as GROUP BY queries in SQL only provide items that exist in the dataset, so no zero counts are returned. This is problematic for privacy, as a neighboring histogram can have fewer results, resulting in a differing set of counts to add noise to. We present a general template for the private algorithm in Algorithm 1, where we deliberately leave the noise distribution and the threshold arbitrary.

Algorithm 1 Unknown Domain Histogram
Input: Histogram $h$, noise distribution $\mathrm{Noise}$, and threshold $T>0$
Output: Noisy histogram $\tilde{h}$ with counts above $T$
Initialize $\tilde{h}=\emptyset$.
for each item $i$ where $(i,c_i)\in h$ such that $c_i>0$ do
     Set $\tilde{c}_i=c_i+Z_i$ where $Z_i\sim\mathrm{Noise}$.
     if $\tilde{c}_i>T$ then
          $\tilde{h}=\tilde{h}\cup\left\{(i,\tilde{c}_i)\right\}$
     end if
end for
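A minimal Python sketch of this template (our own illustration; `noise` can be any sampler, e.g. Laplace or Gaussian):

```python
import numpy as np
from typing import Callable, Dict

def unknown_domain_histogram(h: Dict[str, int], noise: Callable[[], float], T: float) -> Dict[str, float]:
    """Algorithm 1: add independent noise to each positive count and keep only
    noisy counts that exceed the threshold T."""
    h_tilde = {}
    for item, count in h.items():
        if count > 0:
            noisy = count + noise()
            if noisy > T:
                h_tilde[item] = noisy
    return h_tilde

# Example with Laplace noise of scale 1/epsilon (here epsilon = 1) and an arbitrary threshold.
release = unknown_domain_histogram({"apple": 42, "pear": 7, "plum": 1},
                                   noise=lambda: np.random.laplace(0.0, 1.0), T=8.5)
```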

Previous mechanisms follow this template, specifically those of Korolova et al. [13] and Wilson et al. [22], who used Laplace noise, and Swanberg et al. [20], who used Gaussian noise. We now prove that Algorithm 1 is approximate CDP using Lemma 3.1.

Theorem 7.

Assume input histograms $h$ are $(\Delta_0,\Delta_\infty)$-sensitive. If we use $\mathrm{Noise}$ as the joint distribution of $|h|$ many i.i.d. $\mathrm{Lap}(\Delta_\infty/\varepsilon)$ random variables and threshold

$$T=\Delta_\infty+\frac{\Delta_\infty}{\varepsilon}\log\left(\tfrac{\Delta_0}{2\delta}\right),$$

then Algorithm 1 is $\delta$-approximate $\Delta_0\cdot\varepsilon^{2}/2$-CDP. Furthermore, if we use $\mathrm{Noise}=\mathrm{N}(0,\Delta_\infty^{2}/\varepsilon^{2}\cdot I_{|h|})$ and threshold

$$T=\Delta_\infty+\frac{\Delta_\infty}{\varepsilon}\Phi^{-1}(1-\delta/\Delta_0),$$

where $\Phi^{-1}(z)$ is the inverse CDF of a standard normal, then Algorithm 1 is $\delta$-approximate $\Delta_0\cdot\varepsilon^{2}/2$-CDP.

Proof.

We will rely on Lemma 3.1 to prove this result. Consider neighbors $h,h^{\prime}$ where, without loss of generality, $h$ has one additional user's data compared to $h^{\prime}$. By assumption, we know that $h$ can differ in at most $\Delta_0$ counts for different labels. Furthermore, we know that the differing counts can each differ by at most $\Delta_\infty$. Consider the set $S$ of labels that are common between $h$ and $h^{\prime}$, along with the corresponding counts. Since we assume that $h$ has one additional user's data compared to $h^{\prime}$, this set $S$ must include all labels of $h^{\prime}$. Note that the counts for the items that are present in $h$ but not in $h^{\prime}$ must be at most $\Delta_\infty$. We now cover each item in Lemma 3.1.

  • 1.

    We define the event $E$ to be all the randomness in $A(h)$ that can generate outcomes in $S$: $E$ includes arbitrary noise added to the items that are common between $h$ and $h^{\prime}$, while the noise added to the counts of items in $h$ but not in $h^{\prime}$ must keep those noisy counts below $T$. We then lower bound the probability of event $E$. Note that for every item $j$ that is in $h$ but not in $S$, its count can be no more than $\Delta_\infty$.

    $$\Pr_{A(h)}[E]=\prod_{j:(j,\cdot)\in h\setminus h^{\prime}}\Pr[c_j+\mathrm{Noise}\leq T]\geq\prod_{j:(j,\cdot)\in h\setminus h^{\prime}}\Pr[\Delta_\infty+\mathrm{Noise}\leq T]$$

    We then consider the two scenarios with either Laplace noise or Gaussian noise.

    • (Laplace) With Laplace noise of scale $b>0$ and $T=\Delta_\infty+b\log(\tfrac{\Delta_0}{2\delta})$, we have

      $$\Pr_{A(h)}[E]\geq\left(1-\frac{1}{2}\exp\left(-\frac{T-\Delta_\infty}{b}\right)\right)^{\Delta_0}\geq 1-\delta.$$
    • (Gaussian) Next we consider the Gaussian noise version with standard deviation $\sigma>0$ and $T=\Delta_\infty+\sigma\Phi^{-1}(1-\delta/\Delta_0)$:

      $$\Pr_{A(h)}[E]\geq\Phi\left(\frac{T-\Delta_\infty}{\sigma}\right)^{\Delta_0}\geq 1-\delta.$$

    In this case, the event $E^{\prime}$ is all randomness in $A(h^{\prime})$ that can generate outcomes in $S$, which includes all randomness in $A(h^{\prime})$ because $h^{\prime}$ is a subset of $h$.

  • 2.

    We then consider the mechanism $A^{\prime}$ whose domain is the set of common items between $h$ and $h^{\prime}$, which adds noise to each count so that only noisy counts above $T$ are returned with their corresponding item labels. We write the distribution of $A^{\prime}(h)$ as $P^{\prime}$ and the distribution of $A^{\prime}(h^{\prime})$ as $Q^{\prime}$. Because we add independent noise to each histogram count, we can separate out the noise terms added to the counts of labels that are not common between $h$ and $h^{\prime}$; hence we know that $P(y)=\Pr[E]P^{\prime}(y)$ for $y\in S$, by design. Furthermore, $Q\equiv Q^{\prime}$.

  • 3.

    Note that $A^{\prime}$ is either the Laplace mechanism or the Gaussian mechanism over the common items between $h$ and $h^{\prime}$. We then cover each variant separately.

    • (Laplace) We first consider the case where we apply Laplace noise with noise parameter $b>0$. Note that $A^{\prime}$ is a Laplace Mechanism over a histogram that can change in at most $\Delta_0$ counts, each of which can differ by at most $\Delta_\infty$. Ignoring the threshold in $A^{\prime}$, as this is a post-processing of the noisy histogram and does not impact the privacy analysis, we can say that the Laplace mechanism is being applied to a histogram with $\ell_1$-sensitivity $\Delta_0\Delta_\infty$. Hence, we know that the Laplace mechanism is $(\Delta_0\Delta_\infty/b)$-(pure) DP and hence $\left(\Delta_0^{2}\Delta_\infty^{2}/(2b^{2})\right)$-(pure) CDP. However, this will not get us the result we want, because it would result in $\Delta_0^{2}\varepsilon^{2}/2$-CDP with $b=\Delta_\infty/\varepsilon$. This was due to using the $\ell_1$-sensitivity of the histogram and then applying Theorem 1.

      We now consider the Laplace mechanism only on the common items between $h$ and $h^{\prime}$ that also have differing counts. We call the corresponding mechanism $\hat{A}$ and denote by $\hat{P}$ the distribution of $\hat{A}(h)$ and by $\hat{Q}$ the distribution of $\hat{A}(h^{\prime})$. Note that each noisy count in $\hat{A}$ is generated independently, so we can view each count in $\hat{A}$ as a single univariate Laplace mechanism. Each Laplace mechanism will then be $\Delta_\infty/b$-(pure) DP and hence $(\Delta_\infty/b)^{2}/2$-(pure) CDP. Applying composition from Theorem 6 over each univariate Laplace mechanism implies that $\hat{A}$ is $\frac{\Delta_0\Delta_\infty^{2}}{2b^{2}}$-(pure) CDP. Let $\tilde{A}$ denote the Laplace Mechanism applied to the counts that are unchanged between $h$ and $h^{\prime}$, with distribution $\tilde{P}$ for $\tilde{A}(h)=\tilde{A}(h^{\prime})$. For ease of notation, let $h$ and $h^{\prime}$ match on the first $d^{\prime}\leq d$ indices and let the first $k\leq\Delta_0$ indices of $h$ and $h^{\prime}$ have differing counts. Hence, we have for outcome $y=(y_1,\cdots,y_{d^{\prime}})$ that $P^{\prime}(y)=\hat{P}(y_1,\cdots,y_k)\cdot\tilde{P}(y_{k+1},\cdots,y_{d^{\prime}})$. Similarly, we have $Q^{\prime}(y)=\hat{Q}(y_1,\cdots,y_k)\cdot\tilde{P}(y_{k+1},\cdots,y_{d^{\prime}})$. This gives us the following for $\lambda\geq 1$:

      $$D_{\lambda}(P^{\prime}\|Q^{\prime})=D_{\lambda}\left(\hat{P}\|\hat{Q}\right)\leq\frac{\lambda\Delta_0\Delta_\infty^{2}}{2b^{2}}$$

      and similarly,

      $$D_{\lambda}(Q^{\prime}\|P^{\prime})\leq\frac{\lambda\Delta_0\Delta_\infty^{2}}{2b^{2}}.$$
    • (Gaussian) We next consider the Gaussian noise variant. We will denote by $A^{\prime}$ the Gaussian mechanism whose domain is the set of common items between $h$ and $h^{\prime}$, which adds noise with standard deviation $\sigma$ to each count and only returns counts above $T$ with their corresponding item labels. We write the distribution of $A^{\prime}(h)$ as $P^{\prime}$ and the distribution of $A^{\prime}(h^{\prime})$ as $Q^{\prime}$.

      Note that $A^{\prime}$ is a Gaussian Mechanism over a histogram that can change in at most $\Delta_0$ counts, each of which can differ by at most $\Delta_\infty$. Ignoring the threshold in $A^{\prime}$, which again does not impact the privacy analysis, we can say that the Gaussian mechanism is being applied to a histogram with $\ell_2$-sensitivity $\Delta_\infty\sqrt{\Delta_0}$. Hence, we know that the Gaussian mechanism is $\tfrac{\Delta_\infty^{2}\Delta_0}{2\sigma^{2}}$-(pure) CDP from Theorem 4. Furthermore, post-processing the Gaussian Mechanism is also CDP with the same parameters, so restricting the noisy counts to be larger than $T$ gets us back to distributions $P^{\prime}$ and $Q^{\prime}$. This gives us $D_{\lambda}(P^{\prime}\|Q^{\prime})\leq\lambda\tfrac{\Delta_\infty^{2}\Delta_0}{2\sigma^{2}}$ and $D_{\lambda}(Q^{\prime}\|P^{\prime})\leq\lambda\tfrac{\Delta_\infty^{2}\Delta_0}{2\sigma^{2}}$, for all $\lambda\geq 1$.

Hence, using Laplace noise with scale $b>0$ in the Unknown Domain Histogram algorithm is $\delta$-approximate $\frac{\Delta_0\Delta_\infty^{2}}{2b^{2}}$-CDP. Setting $b=\Delta_\infty/\varepsilon$ completes the proof for Laplace noise. Furthermore, using Gaussian noise with standard deviation $\sigma$ is $\delta$-approximate $\frac{\Delta_\infty^{2}\Delta_0}{2\sigma^{2}}$-CDP, and setting $\sigma=\frac{\Delta_\infty}{\varepsilon}$ completes the proof. ∎
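The two thresholds in Theorem 7 are easy to compute; a sketch (SciPy's `norm.ppf` is the standard normal inverse CDF $\Phi^{-1}$):

```python
import numpy as np
from scipy.stats import norm

def laplace_threshold(delta0: int, delta_inf: float, eps: float, delta: float) -> float:
    """Threshold for the Laplace variant of Theorem 7."""
    return delta_inf + (delta_inf / eps) * np.log(delta0 / (2 * delta))

def gaussian_threshold(delta0: int, delta_inf: float, eps: float, delta: float) -> float:
    """Threshold for the Gaussian variant of Theorem 7."""
    return delta_inf + (delta_inf / eps) * norm.ppf(1 - delta / delta0)

print(laplace_threshold(1, 1.0, 1.0, 1e-6), gaussian_threshold(1, 1.0, 1.0, 1e-6))
```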

We want to highlight the improvement we get when we use the CDP analysis. If we consider a similar analysis using approximate DP, we would remove the items that cannot be returned under both neighboring datasets and we would be left with the Laplace mechanism over the common items. We could then use the $\ell_1$-sensitivity of the resulting histogram, but then adding Laplace noise with scale $b=\Delta_\infty/\varepsilon$ would result in $(\varepsilon\Delta_0,\delta)$-DP, which can be converted to CDP using Theorem 3 to get $\delta$-approximate $\varepsilon^{2}\Delta_0^{2}/2$-CDP, whereas we get $\delta$-approximate $\varepsilon^{2}\Delta_0/2$-CDP in our analysis. Furthermore, converting approximate CDP guarantees to approximate DP guarantees too early leads to loose privacy parameters when developing privacy systems that use these algorithms. For example, if we use the Gaussian noise variant in Theorem 7 with $\Delta_0=1$, we can conclude that it is $(\varepsilon^{2}/2+\varepsilon\sqrt{2\log(1/\delta^{\prime})},\delta+\delta^{\prime})$-DP for any $\delta^{\prime}>0$, so if we only use the DP guarantee and combine it with another Unknown Domain Histogram with Gaussian noise, we can compose to get $(\varepsilon^{\prime},2\delta+\delta^{\prime})$-DP for any $\delta^{\prime}>0$ where

$$\varepsilon^{\prime}=\varepsilon^{2}+2\varepsilon\sqrt{2\log(1/\delta^{\prime})}.$$

However, if we had never converted to approximate DP until after composing both Unknown Domain Histogram mechanisms, we could have gotten an overall $(\varepsilon^{\prime\prime},2\delta+\delta^{\prime})$-DP guarantee, where

$$\varepsilon^{\prime\prime}=\varepsilon^{2}+2\varepsilon\sqrt{\log(1/\delta^{\prime})}.$$
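A quick numeric check of the two routes (a sketch; the values are illustrative):

```python
import math

eps, delta_prime = 1.0, 1e-6

# Route 1: convert each mechanism to DP first, then compose the DP parameters.
eps_dp_first = eps**2 + 2 * eps * math.sqrt(2 * math.log(1 / delta_prime))

# Route 2: compose in CDP (rho = eps^2/2 each, so eps^2 total), convert once at the end.
eps_cdp_first = eps**2 + 2 * eps * math.sqrt(math.log(1 / delta_prime))

print(eps_dp_first, eps_cdp_first)  # the CDP-first route gives the smaller epsilon
```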

4.2 Top-$(\bar{k}+1)$ Count Histograms

We now turn to a slight variant of releasing private histograms over a histogram with positive counts. In this setting, we assume that only a limited part of the histogram is available, perhaps due to an existing data analytics system that cannot return all counts. This setting first appeared in [6] and is especially important when designing DP algorithms on top of existing systems that cannot provide the full histogram of counts [18]. We will refer to the histogram consisting of the top-$(\bar{k}+1)$ items as the histogram with items and corresponding counts that are in the top-$(\bar{k}+1)$. Note that in this case, the top-$(\bar{k}+1)$ can change between neighboring histograms. We now present a general algorithm template, similar to Algorithm 1, in Algorithm 2, which takes an arbitrary threshold $T>0$ and a top-$(\bar{k}+1)$ histogram.

Algorithm 2 Unknown Domain from Top-$(\bar{k}+1)$
Input: Histogram $h$, noise standard deviation $\sigma$, threshold $T>0$, and top-$(\bar{k}+1)$ histogram
Output: Noisy histogram $\tilde{h}$ with at most $\bar{k}$ counts
Let $h_{(\bar{k}+1)}$ be the histogram consisting of the top-$(\bar{k}+1)$ items, breaking ties arbitrarily.
if $h_{(\bar{k})}$ has fewer than $\bar{k}$ items then
     Pad $h_{(\bar{k})}$ with items $\bot_1,\cdots,\bot_{\bar{k}-|h_{(\bar{k})}|}$, each with count $c_{\bot_j}=0$, until there are $\bar{k}$ items in $h_{(\bar{k})}$.
end if
Let $c_{(\bar{k}+1)}$ be the count of the $(\bar{k}+1)$-th item in $h$, which might be zero.
Set $\tilde{T}=T+c_{(\bar{k}+1)}+\mathrm{N}(0,\sigma^{2})$
Initialize $\tilde{h}=\emptyset$.
for each item $i$ where $(i,c_i)\in h_{(\bar{k})}$ do
     Set $\tilde{c}_i=c_i+\mathrm{N}(0,\sigma^{2})$
     if $\tilde{c}_i>\tilde{T}$ then
          $\tilde{h}=\tilde{h}\cup\left\{(i,\tilde{c}_i)\right\}$
     end if
end for
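A minimal Python sketch of Algorithm 2 (our own illustration; `top` is assumed to hold up to $\bar{k}+1$ (item, count) pairs sorted by count in descending order):

```python
import numpy as np
from typing import Dict, List, Tuple

def unknown_domain_from_topk(top: List[Tuple[str, float]], k_bar: int,
                             sigma: float, T: float) -> Dict[str, float]:
    """Algorithm 2: compare the top-k_bar noisy counts against a noisy threshold
    built from the (k_bar+1)-th count."""
    head = list(top[:k_bar])
    while len(head) < k_bar:                 # pad with zero-count dummy items
        head.append((f"_bot{len(head)}", 0.0))
    c_next = top[k_bar][1] if len(top) > k_bar else 0.0
    T_tilde = T + c_next + np.random.normal(0, sigma)
    h_tilde = {}
    for item, count in head:
        noisy = count + np.random.normal(0, sigma)
        if noisy > T_tilde:
            h_tilde[item] = noisy
    return h_tilde
```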

We now show that it is indeed approximate CDP.

Theorem 8.

Assume input histograms $h$ are $(\Delta_0,\Delta_\infty)$-sensitive. If we use $\sigma=\frac{\Delta_\infty}{\varepsilon}$ and threshold

$$T=\Delta_\infty+\frac{\sqrt{2}\cdot\Delta_\infty}{\varepsilon}\Phi^{-1}(1-\delta/\Delta_0),$$

then Algorithm 2 is $\delta$-approximate $\Delta_0\cdot\varepsilon^{2}/2$-CDP.

Proof.

We follow the same analysis as in the proof of Theorem 7, which used Lemma 3.1. We again set $S$ to be the set of outcomes over labels that are common between neighbors $h$ and $h^{\prime}$. Note that we are only considering items with counts in the top-$\bar{k}$ of each histogram, i.e. the items in $h_{(\bar{k})}$ and $h^{\prime}_{(\bar{k})}$, including the zero count items with labels in $\{\bot_j\}$. Hence, there might be different items between $h_{(\bar{k})}$ and $h^{\prime}_{(\bar{k})}$, as reducing some counts in $h$ might change the order of the top-$(\bar{k}+1)$. From Lemma 5.2 in [6], we know that there can be at most $\min\{\bar{k},\Delta_0\}$ many differing items between the top-$\bar{k}$ histograms $h_{(\bar{k})}$ and $h^{\prime}_{(\bar{k})}$. We then follow the three items that we need to show in order to apply Lemma 3.1.

  • 1.

    We denote by $A$ Algorithm 2 and define the event $E$ as follows: over the noise added to counts in outcomes $S$ for $A(h)$ and the noise added to the threshold $T+c_{(\bar{k}+1)}$ to get $\tilde{T}$, the noisy counts of the differing items not in $S$ must be no more than $\tilde{T}$. We define $E^{\prime}$ similarly for $A(h^{\prime})$. Note that for any item $j$ that is in $h_{(\bar{k})}$ but not an item that can be returned in $S$, we know

    $$c_j\leq c_j^{\prime}+\Delta_\infty\leq c^{\prime}_{(\bar{k})}+\Delta_\infty\leq c_{(\bar{k})}+\Delta_\infty\leq c_{(\bar{k}+1)}+\Delta_\infty.$$

    The analysis is straightforward because the difference of two Gaussians is itself Gaussian. Hence, we set $\hat{T}=\sqrt{2}\sigma\Phi^{-1}(1-\delta/\Delta_0)$ so that $T=\hat{T}+\Delta_\infty$:

    $$\Pr_{A(h)}[\neg E]\leq\sum_{i=1}^{\Delta_0}\Pr\left[c_{(\bar{k}+1)}+\Delta_\infty+\mathrm{N}(0,\sigma^{2})>c_{(\bar{k}+1)}+T+\mathrm{N}(0,\sigma^{2})\right]=\Delta_0\Pr[\mathrm{N}(0,2\sigma^{2})>\hat{T}]=\Delta_0\left(1-\Phi\left(\frac{\hat{T}}{\sqrt{2}\sigma}\right)\right)=\Delta_0\left(1-(1-\delta/\Delta_0)\right)=\delta$$

    The analysis to bound $\Pr_{A(h^{\prime})}[\neg E^{\prime}]$ is similar.

  • 2.

    We then take $P^{\prime}(y)$ to be the distribution of $A(h)$ conditioned on $E$. Note that $P(y)=\Pr[E]P^{\prime}(y)$ for each $y\in S$ and similarly $Q(y)=\Pr[E^{\prime}]Q^{\prime}(y)$, where $Q^{\prime}(y)$ is the distribution of $A(h^{\prime})$ conditioned on $E^{\prime}$. We will modify the mechanisms $A(h)$ and $A(h^{\prime})$ in a way that results in the same mechanisms when conditioned on events $E$ and $E^{\prime}$, respectively. For each label that differs between $h_{(\bar{k})}$ and $h^{\prime}_{(\bar{k})}$, we change it to one of the common labels $b_1,\cdots,b_\ell$ where $\ell=|\{j:(j,\cdot)\in h_{(\bar{k})}\setminus h^{\prime}_{(\bar{k})}\}|\leq\Delta_0$. Because we condition on event $E$, no outcome $y\in S$ can include the labels $b_1,\cdots,b_\ell$. Furthermore, it was shown in [17, Lemma 6.4] that the resulting histogram with common labels will also have as many as $\Delta_0$ items with differing counts, and those counts can change by at most $\Delta_\infty$, regardless of how we assign the common labels $\{b_j\}$. We then let $P^{\prime}$ and $Q^{\prime}$ be the resulting distributions after this relabeling.

Next we will need to bound the Rényi divergence between $P^{\prime}$ and $Q^{\prime}$. We will make use of the following result from [12, Corollary 4.3].

Lemma 4.1.

Suppose $F,F^{\prime}$ are two $\mu$-strongly convex functions over $\mathcal{K}\subseteq\mathbb{R}^{d}$, and $F-F^{\prime}$ is $G$-Lipschitz over $\mathcal{K}$. For any $m>0$, if we let $P\propto e^{-mF}$ and $Q\propto e^{-mF^{\prime}}$ be two probability distributions on $\mathcal{K}$, then we have for all $\lambda\geq 1$

$$D_{\lambda}(P\|Q)\leq\frac{\lambda mG^{2}}{2\mu}$$

This result is useful because it allows us to condition on outcomes from a joint Gaussian mechanism falling in some convex region, which corresponds to releasing only “good” outcomes, i.e. not allowing certain counts to go above some noisy value. For the Gaussian mechanism, we have $F(z)=\|z-h\|_2^{2}$ and $F^{\prime}(z)=\|z-h^{\prime}\|_2^{2}$, which are both $2$-strongly convex over any convex region. Furthermore, $\|z-h\|_2^{2}-\|z-h^{\prime}\|_2^{2}$ is $2\sqrt{\Delta_0}\cdot\Delta_\infty$-Lipschitz. For the density of a Gaussian, we then use $m=\tfrac{1}{2\sigma^{2}}$.

  • 3.

    We now want to bound the Rényi divergence between $P^{\prime}$ and $Q^{\prime}$, which are conditioned on events $E$ and $E^{\prime}$, respectively. To do this we will consider the joint Gaussian distribution that releases all counts over the histograms with common labels, including the $(\bar{k}+1)$-th largest count with $T$ added to it, labeled $\bot$, but we do not enforce the threshold. We only want to consider events that do not have the items with labels in $\{b_j\}$ receiving noisy counts above the noisy count for $\bot$. We then consider the convex region $\mathcal{K}$ where these “bad” noisy counts do not go above the noisy count for $\bot$. We then apply Lemma 4.1 to claim that the resulting mechanism conditioned on this region has a bound on the Rényi divergence that is the same as if it were the Gaussian mechanism not constrained to region $\mathcal{K}$. Dropping the items with counts lower than the noisy count for $\bot$ is simply post-processing, which does not increase the Rényi divergence bound. Hence, we have $D_{\lambda}(P^{\prime}\|Q^{\prime})\leq\frac{\lambda\Delta_0\Delta_\infty^{2}}{2\sigma^{2}}$ and $D_{\lambda}(Q^{\prime}\|P^{\prime})\leq\frac{\lambda\Delta_0\Delta_\infty^{2}}{2\sigma^{2}}$ for all $\lambda\geq 1$.

Setting $\sigma=\Delta_\infty/\varepsilon$ completes the proof. ∎

4.2.1 Exponential Mechanism

For the previous applications, there was a crucial assumption that the input histograms were $(\Delta_0,\Delta_\infty)$-sensitive, specifically that the histogram's $\ell_0$-sensitivity must be bounded by $\Delta_0$. However, this might not always be the case, and it would be difficult to enforce such a bound in practice, as it requires that each user have only a certain number of distinct items in the data. One way to limit the impact a single user can have on the result is to limit the number of items that can be returned. The Laplace/Gaussian mechanism based algorithms can return an arbitrary number of items that are above the threshold, and in the setting where we only have access to the top-$(\bar{k}+1)$ items, we could return all $\bar{k}$. Hence, the CDP parameter could be bounded in terms of $\bar{k}$, but we would like to control how many items can be returned from the counts we have access to. In this case, we can return the top-$k$ results, where $k\leq\bar{k}$ is an input to the algorithm. Unfortunately, the previous analysis would still have the CDP parameter scale with $\bar{k}$ despite only wanting to return $k$.

To ensure that the privacy loss only scales with the number of items that are returned, we can use the classical Exponential Mechanism [15], as presented in Theorem 2. It was shown that the Exponential Mechanism is part of a general framework of DP mechanisms called report noisy max. This can be summarized as adding noise to the quality scores for each outcome and returning the outcome with the noisy max. Note that it is critical that only the arg max is returned, not the actual noisy value. It turns out that adding Gumbel noise to the quality scores and returning the max item is equivalent to the Exponential Mechanism; see [6]. Furthermore, it was shown that iteratively applying the Exponential Mechanism to return the top-$k$ outcomes is equivalent to adding Gumbel noise to all quality scores and returning the outcomes with the $k$ largest noisy quality scores [6]. Other mechanisms in the report noisy max framework include adding Laplace noise to the quality scores [7] and adding Exponential noise to the quality scores [4], which turns out to be equivalent to the Permute-and-Flip Mechanism [14]. (See the blog post on one-shot top-$k$ DP algorithms: https://differentialprivacy.org/one-shot-top-k/.)
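For intuition, a small sketch of the one-shot Gumbel trick (our own illustration): scaling the quality scores by $\varepsilon/\Delta(q)$, adding standard Gumbel noise, and taking the $k$ largest is distributed like $k$ iterative applications of the Exponential Mechanism.

```python
import numpy as np

def gumbel_top_k(quality: np.ndarray, range_q: float, eps_each: float, k: int) -> list:
    """One-shot top-k: add standard Gumbel noise to the scaled quality scores and
    return the indices of the k largest noisy scores (only ranks, not values)."""
    noisy = quality * (eps_each / range_q) + np.random.gumbel(size=len(quality))
    return list(np.argsort(noisy)[::-1][:k])
```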

We then present the Unknown Domain Gumbel algorithm in Algorithm 3 from [6], which takes an additional parameter $k\leq\bar{k}$ and, importantly, returns a ranked list of items but not their counts.

Algorithm 3 Unknown Domain Gumbel from Top-$(\bar{k}+1)$
Input: Histogram $h$, noise scale $\beta>0$, threshold $T>0$, and $k\leq\bar{k}$
Output: Sorted list of at most $k$ items, $I_k$
Let $h_{(\bar{k})}$ be the histogram consisting of the top-$\bar{k}$ items
Let $c_{(\bar{k}+1)}$ be the count of the $(\bar{k}+1)$-th item in $h$.
Set $\tilde{T}=T+c_{(\bar{k}+1)}+\mathrm{Gumb}(\beta)$
Initialize $\tilde{h}=\emptyset$.
for each item $i$ where $(i,c_i)\in h_{(\bar{k})}$ such that $c_i>0$ do
     Set $\tilde{c}_i=c_i+\mathrm{Gumb}(\beta)$
     if $\tilde{c}_i>\tilde{T}$ then
          $\tilde{h}=\tilde{h}\cup\left\{(i,\tilde{c}_i)\right\}$
     end if
end for
Let $I_k$ be the ordered list of at most $k$ items sorted in descending order by their count in $\tilde{h}$.
if $|I_k|<k$ then
     $I_k=I_k\cup\{\bot\}$
end if
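A sketch of Algorithm 3 in Python (our own illustration; `BOT` stands in for $\bot$, and `top` is assumed to hold the top-$(\bar{k}+1)$ (item, count) pairs sorted by count in descending order):

```python
import numpy as np
from typing import List, Tuple

BOT = "_bot"  # stand-in for the ⊥ symbol

def unknown_domain_gumbel(top: List[Tuple[str, float]], k: int, k_bar: int,
                          beta: float, T: float) -> List[str]:
    """Algorithm 3: Gumbel noise on the top-k_bar counts, a Gumbel-noised threshold
    from the (k_bar+1)-th count, and a ranked list of at most k surviving items."""
    c_next = top[k_bar][1] if len(top) > k_bar else 0.0
    T_tilde = T + c_next + np.random.gumbel(scale=beta)
    noisy = [(item, c + np.random.gumbel(scale=beta))
             for item, c in top[:k_bar] if c > 0]
    survivors = sorted((p for p in noisy if p[1] > T_tilde), key=lambda p: -p[1])
    items = [item for item, _ in survivors[:k]]
    if len(items) < k:
        items.append(BOT)   # signal that fewer than k items cleared the threshold
    return items
```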

We then state the Unknown Domain Gumbel privacy guarantee, which largely follows the analysis in [6], adapted for CDP.

Theorem 9.

Assume input histograms $h$ are $(\infty,1)$-sensitive. If we use noise scale $\beta=1/\varepsilon$ and threshold

$$T=1+\frac{1}{\varepsilon}\log\left(\tfrac{\bar{k}}{\delta}\right),$$

then Algorithm 3 is $\delta$-approximate $k\varepsilon^{2}/8$-CDP.

Proof.

We will apply our general framework from Lemma 3.1 and some previous results from [6]. We will denote by $A$ the Unknown Domain Gumbel mechanism. We assume WLOG that $h,h^{\prime}$ differ by there being one additional user's data in $h$ when compared to $h^{\prime}$.

  • 1.

    We denote by $S$ the set of common outcomes that are possible between the top-$\bar{k}$ histograms $h_{(\bar{k})}$ and $h^{\prime}_{(\bar{k})}$, and by $E$ all the randomness in $A(h)$ that can generate outcomes in $S$. Hence, $E$ must only include the Gumbel noise added to items that are common to $h$ and $h^{\prime}$, while the noise added to the differing counts must produce a noisy count either below the top-$k$ or below the noisy threshold $\tilde{T}$ that is set. From Lemma 5.5 in [6], we have the following bound when $T=1+\beta\log(\bar{k}/\delta)$:

    $$\Pr_{A(h)}[E]\geq 1-\delta.$$

    We similarly define the event $E^{\prime}$ for the randomness in $A(h^{\prime})$ that can generate outcomes in $S$, which gives us the same lower bound for $\Pr_{A(h^{\prime})}[E^{\prime}]$.

  • 2.

    From Lemma 5.6 in [6], we know that there exist distributions $P^{\prime},Q^{\prime}$ such that for all outcomes $y\in S$, we have

    $$\Pr[A(h)=y]=\Pr[E]P^{\prime}(y)\qquad\Pr[A(h^{\prime})=y]=\Pr[E^{\prime}]Q^{\prime}(y).$$
  • 3.

    These distributions $P^{\prime},Q^{\prime}$ are also related to the Exponential Mechanism. Specifically, $P^{\prime}$ is the distribution of iteratively applying the Exponential Mechanism on $h_{(\bar{k})}$, restricted to the items that are common between $h_{(\bar{k})}$ and $h^{\prime}_{(\bar{k})}$. Similarly, $Q^{\prime}$ is the distribution of iteratively applying the Exponential Mechanism on $h^{\prime}_{(\bar{k})}$, restricted to the same common items. We can then use Theorem 5 to conclude that $D_{\lambda}(P^{\prime}\|Q^{\prime})\leq\frac{\lambda k}{8\beta^{2}}$ and $D_{\lambda}(Q^{\prime}\|P^{\prime})\leq\frac{\lambda k}{8\beta^{2}}$, for all $\lambda\geq 1$.

Setting $\beta=\frac{1}{\varepsilon}$ completes the proof. ∎

4.2.2 Pay-what-you-get Composition

We now consider the setting where an analyst wants to run multiple top-$k$ queries over an unknown domain. One of the issues with the Unknown Domain Gumbel mechanism in Algorithm 3 is that it can sometimes return fewer than $k$ results, yet the overall CDP privacy parameter scales with $k$ regardless of the number of results returned. Hence, with $\ell^{*}$ many top-$k$ queries, the privacy loss would scale with $\ell^{*}\cdot k$ under traditional composition. However, [6] showed that if we instead set an overall bound $k^{*}$ on the number of results that can be returned across all the intermediate top-$k$ queries, then the privacy loss will scale with $k^{*}$. They referred to this as pay-what-you-get composition, because even if an analyst requests the top-100 items and only 1 result is returned, we need only deduct the number of items that are returned (including $\bot$) from the overall privacy budget, until $k^{*}$ items have been returned. We present the pay-what-you-get composition in Algorithm 4.

Algorithm 4 Pay-what-you-get Mechanism
Input: Sequence of histograms $h^{(1)},\cdots,h^{(\ell^{*})}$, noise scale $\beta>0$, threshold $T>0$, and $k^{*}$
Output: Sequence of sorted lists $\{I^{(i)}:i\in[\ell^{*}]\}$
Initialize $k=0$
for $i\in[\ell^{*}]$ do
     Apply Unknown Domain Gumbel from top-$(\bar{k}^{(i)}+1)$ on histogram $h^{(i)}$ with noise scale $\beta$, threshold $T>0$, along with $k^{(i)}\leq(k^{*}-k)$ and $\bar{k}^{(i)}$ to get items $I^{(i)}$, which includes $\bot$ in some cases.
     Update $k=k+|I^{(i)}|$
     if $k=k^{*}$ then
          break
     end if
end for
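A sketch of the budget bookkeeping in Algorithm 4, reusing the `unknown_domain_gumbel` sketch above (our own illustration):

```python
from typing import List, Tuple

def pay_what_you_get(streams: List[List[Tuple[str, float]]], ks: List[int],
                     k_bars: List[int], beta: float, T: float, k_star: int) -> List[List[str]]:
    """Algorithm 4: run Unknown Domain Gumbel per query, charging the budget only
    for items actually returned (including BOT), up to k_star in total."""
    results, used = [], 0
    for top, k_i, k_bar_i in zip(streams, ks, k_bars):
        items = unknown_domain_gumbel(top, min(k_i, k_star - used), k_bar_i, beta, T)
        results.append(items)
        used += len(items)
        if used == k_star:
            break
    return results
```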

The key observation in proving the overall privacy guarantee of pay-what-you-get is that each Unknown Domain Gumbel mechanism is related to the Exponential Mechanism. Removing the randomness that can generate outcomes from the differing items between neighboring datasets, the algorithm becomes equivalent to an Exponential Mechanism. Hence, considering only the randomness over each Unknown Domain Gumbel run that can generate outcomes in both neighboring histograms, we are left with just a string of at most $k^{*}$ many Exponential Mechanisms.

Theorem 10.

Given an adaptive stream of histograms $h^{(1)},\cdots,h^{(\ell^{*})}$, where each histogram $h^{(i)}$ is $(\infty,1)$-sensitive, the Pay-what-you-get mechanism in Algorithm 4 with $\beta=1/\varepsilon$ and threshold $T$ as in Theorem 9 with $\delta>0$ is $\ell^{*}\delta$-approximate $k^{*}\varepsilon^{2}/8$-CDP.

Proof.

We let $A$ be the Pay-what-you-get mechanism with neighboring histogram streams $(h^{(1)},\cdots,h^{(\ell^{*})})$ and $(h^{\prime(1)},\cdots,h^{\prime(\ell^{*})})$, i.e. each histogram $h^{(i)}$ and $h^{\prime(i)}$ are neighbors. We denote by $P$ the distribution of $A(h^{(1)},\cdots,h^{(\ell^{*})})$ and by $Q$ the distribution of $A(h^{\prime(1)},\cdots,h^{\prime(\ell^{*})})$. WLOG, we assume that each $h^{(i)}$ has an additional user's data compared to $h^{\prime(i)}$ for $i\in[\ell^{*}]$.

We will denote by $S=(S^{(1)},\cdots,S^{(\ell^{*})})$ the set of common outcomes between these neighboring streams.

  • 1.

    Let $E=(E^{(1)},\cdots,E^{(\ell^{*})})$ be the corresponding randomness in $A(h^{(1)},\cdots,h^{(\ell^{*})})=(A^{(1)}(h^{(1)}),\cdots,A^{(\ell^{*})}(h^{(\ell^{*})}))$ that can generate outcomes in $S=(S^{(1)},\cdots,S^{(\ell^{*})})$, where each $A^{(i)}$ is an Unknown Domain Gumbel mechanism, and similarly let $E^{\prime}=(E^{\prime(1)},\cdots,E^{\prime(\ell^{*})})$ be the corresponding randomness in $A(h^{\prime(1)},\cdots,h^{\prime(\ell^{*})})$ that can generate outcomes in $S$. We then apply Lemma 5.5 in [6] to get the following for each event of $A^{(i)}$:

    $$\Pr_{A^{(i)}(h^{(i)})}[E^{(i)}]\geq 1-\delta,\qquad\Pr_{A^{(i)}(h^{\prime(i)})}[E^{\prime(i)}]\geq 1-\delta,\qquad\forall i\in[\ell^{*}].$$

    Hence, applying a union bound over all $\ell^{*}$ events, we get

    $$\Pr[E]\geq 1-\ell^{*}\delta,\qquad\Pr[E^{\prime}]\geq 1-\ell^{*}\delta.$$
  • 2.

    We then apply Lemma 5.6 in [6], as we did in Theorem 9, to write the distribution of $A$ in terms of a stream of top-$k^{(i)}$ mechanisms that can be further decomposed into Exponential Mechanisms. Given the prior outcomes $y^{(<i)}$, we will write $P^{\prime(i)}(\cdot;y^{(<i)})$ to denote the distribution of the top-$k^{(i)}$ mechanism evaluated on $h^{(i)}_{(\bar{k}^{(i)})}$ but only on the common items between $h^{(i)}_{(\bar{k}^{(i)})}$ and $h^{\prime(i)}_{(\bar{k}^{(i)})}$. Similarly, we will write $Q^{\prime(i)}(\cdot;y^{(<i)})$ to denote the distribution of the top-$k^{(i)}$ mechanism evaluated on $h^{\prime(i)}_{(\bar{k}^{(i)})}$ but only on the common items with $h^{(i)}_{(\bar{k}^{(i)})}$. We can now apply Lemma 5.6 from [6] iteratively to get, for all $y=(y^{(1)},\cdots,y^{(\ell^{*})})\in S$,

    $$\Pr[A(h^{(1)},\cdots,h^{(\ell^{*})})=y]=\prod_{i=1}^{\ell^{*}}\Pr[E^{(i)}]P^{\prime(i)}(y^{(i)};y^{(<i)})\qquad\Pr[A(h^{\prime(1)},\cdots,h^{\prime(\ell^{*})})=y]=\prod_{i=1}^{\ell^{*}}\Pr[E^{\prime(i)}]Q^{\prime(i)}(y^{(i)};y^{(<i)}).$$
  • 3.

    We then consider the distribution $P^{\prime}(y):=\prod_{i=1}^{\ell^{*}}P^{\prime(i)}(y^{(i)};y^{(<i)})$, which can be further decomposed into separate Exponential Mechanisms. We will write $y^{(i)}=(y^{(i)}[1],\cdots,y^{(i)}[|y^{(i)}|])$ and let $\hat{P}^{(i,j)}$ be the distribution of the Exponential Mechanism for the $i$-th round's mechanism returning the $j$-th item. Hence we have

    $$P^{\prime}(y)=\prod_{i=1}^{\ell^{*}}P^{\prime(i)}(y^{(i)};y^{(<i)})=\prod_{i=1}^{\ell^{*}}\prod_{j=1}^{|y^{(i)}|}\hat{P}^{(i,j)}(y^{(i)}[j];y^{(<i)},y^{(i)}[1],\cdots,y^{(i)}[j-1]).$$

    Note that the two products on the right hand side will have at most $k^{*}$ many terms in total, due to the cap on the number of items that can be returned by pay-what-you-get, and each term is the distribution of an Exponential Mechanism. We have a similar argument for $Q^{\prime(i)}(\cdot;y^{(<i)})$ being decomposed as a sequence of Exponential Mechanisms with $Q^{\prime}(y)=\prod_{i=1}^{\ell^{*}}Q^{\prime(i)}(y^{(i)};y^{(<i)})$. Hence, we have $D_{\lambda}\left(P^{\prime}\|Q^{\prime}\right)\leq\lambda k^{*}\varepsilon^{2}/8$ and $D_{\lambda}\left(Q^{\prime}\|P^{\prime}\right)\leq\lambda k^{*}\varepsilon^{2}/8$ for all $\lambda\geq 1$.

Note that we can almost apply Lemma 3.1, but not quite, because we have $P(y)=\Pr[A(h^{(1)},\cdots,h^{(\ell^{*})})=y]=\prod_{i=1}^{\ell^{*}}\Pr[E^{(i)}]P^{\prime(i)}(y^{(i)};y^{(<i)})$ when $y\in S$. We then construct the joint distribution as

$$P(y^{(1)},\cdots,y^{(\ell^{*})})=\prod_{i=1}^{\ell^{*}}P_{i}(y^{(i)}\mid y^{(<i)})=\prod_{i=1}^{\ell^{*}}\left(\Pr[E^{(i)}]P_{i}^{\prime}(y^{(i)}\mid y^{(<i)})+\Pr[\neg E^{(i)}]P_{i}(y^{(i)}\mid y^{(<i)},\neg E^{(i)})\right)$$
$$=\prod_{i=1}^{\ell^{*}}\Pr[E^{(i)}]P^{\prime(i)}(y^{(i)};y^{(<i)})+\sum_{S\subsetneq[\ell^{*}]}\left(\prod_{i\in S}\Pr[E^{(i)}]\prod_{i\notin S}\Pr[\neg E^{(i)}]\right)\left(\prod_{i\in S}P_{i}^{\prime}(y^{(i)}\mid y^{(<i)})\prod_{i\notin S}P_{i}(y^{(i)}\mid y^{(<i)},\neg E^{(i)})\right)$$
$$=:P^{\prime}(y^{(1)},\cdots,y^{(\ell^{*})})\cdot\prod_{i=1}^{\ell^{*}}\Pr[E^{(i)}]+\hat{P}^{\prime\prime}(y^{(1)},\cdots,y^{(\ell^{*})})$$

We can then write the probability as a convex combination, so that for all $y$ we have the following:

$$\Pr[A(h^{(1)},\cdots,h^{(\ell^{*})})=y]=P^{\prime}(y^{(1)},\cdots,y^{(\ell^{*})})\cdot\prod_{i=1}^{\ell^{*}}\Pr[E^{(i)}]+\hat{P}^{\prime\prime}(y)$$
$$\implies\Pr[A(h^{(1)},\cdots,h^{(\ell^{*})})=y]=(1-\ell^{*}\delta)P^{\prime}(y)+\ell^{*}\delta\underbrace{\left(\frac{1}{\ell^{*}\delta}\left(\hat{P}^{\prime\prime}(y)+P^{\prime}(y)\left(\prod_{i=1}^{\ell^{*}}\Pr[E^{(i)}]-(1-\ell^{*}\delta)\right)\right)\right)}_{P^{\prime\prime}(y)}$$

Similarly, we have

$$\Pr[A(h^{\prime(1)},\cdots,h^{\prime(\ell^{*})})=y]=(1-\ell^{*}\delta)\underbrace{\prod_{i=1}^{\ell^{*}}Q^{\prime(i)}(y^{(i)};y^{(<i)})}_{Q^{\prime}(y)}+\ell^{*}\delta\,Q^{\prime\prime}(y).$$

This completes the proof. ∎

4.3 Continual Observation

We now present an approach for continually releasing a running counter over various domain elements while ensuring differential privacy. The continual observation privacy model was introduced in [3, 11] and is meant to ensure strong privacy guarantees in the setting where we want to continually provide a running counter on a stream of events, denoted as $\omega^{(1:\ell)}=\left(\omega^{(1)},\cdots,\omega^{(\ell)}\right)$ for $\ell=1,\cdots,L$, where $\omega^{(i)}\in\{0,1\}$. It is then straightforward to extend continual observation counters to the setting of releasing a running histogram over a known set of items, i.e. $\omega^{(i)}\subseteq[d]$, where someone can contribute a limited set of items $|\omega^{(i)}|\leq\Delta_0$ at each round $i$ in the stream. Typically we want to ensure event-level privacy, where we consider the change in outcomes when one event in the stream can change.

Recent work from [17, 23] has also considered the continual observation setting, but in the case where we want to continually release histograms over an unknown set of items. Consider the motivating example where we want to provide a running counter for the drugs that are purchased at a pharmacy, where each customer can buy at most $\Delta_0$ different drugs. There might be several different drugs to provide a counter for, and new drugs can emerge later that were not known about before. Hence, we would like to have algorithms that do not require a set of items to provide counts over.

We describe the algorithm at a high level, as describing it formally would require some additional notation. The main subroutine is the Binary Mechanism from [3, 11], which will add at most $\log_2(\ell)$ many noise terms to the count of a stream of $\ell$ events; the number of noise terms depends on the number of 1s in the binary representation of $\ell$. This will ensure that for a stream of length at most $L$, we can release $L$ counts, one after each event, each with Gaussian noise with standard deviation $O(\log_2(L)/\varepsilon)$. Although we release $L$ counts, the privacy analysis relies on the fact that we form a table of partial sums, so that one event in the stream can modify at most $\log_2(L)$ partial sums, and we can view the Binary Mechanism as a Gaussian mechanism on the table of partial sums for all common items between $\omega^{(1:L)}$ and $\omega^{\prime(1:L)}$, which has $\ell_2$-sensitivity $\sqrt{\Delta_0\cdot\log_2(L)}$.

The Unknown Domain Binary Mechanism follows the same approach as the classical Binary Mechanism, which we present at a high level. The idea of the Binary Mechanism is to split a stream $\omega^{(1:\ell)}$ into overlapping partial sums, guaranteeing that each $\omega^{(i)}$ is part of no more than $\lceil\log_{2}(L+1)\rceil$ partial sums, and we create separate partial sums for each item in $\omega^{(1:\ell)}$. For instance, with $\ell=8$ there is a single partial sum, based on the binary representation of $\ell$: we need only add $\mathrm{N}(0,\sigma^{2})$ noise to the partial sum $\sum_{j=1}^{8}\mathbbm{1}\left\{u\in\omega^{(j)}\right\}$ for each $u\in\omega^{(1:\ell)}$, and this noisy partial sum is then reused in any later count that utilizes $\sum_{j=1}^{8}\mathbbm{1}\left\{u\in\omega^{(j)}\right\}$. When $\ell=10$, we use the two partial sums $\sum_{j=1}^{8}\mathbbm{1}\left\{u\in\omega^{(j)}\right\}$ and $\sum_{j=9}^{10}\mathbbm{1}\left\{u\in\omega^{(j)}\right\}$ for each $u\in\omega^{(1:10)}$, each with its own noise added to it. Note that for each new item $u\in\omega^{(1:\ell)}\setminus\omega^{(1:\ell-1)}$, we add fresh noise to each prior partial sum for $u$. We then only release an item and its noisy count if the count is larger than a fixed threshold $T>0$. A minimal sketch of this procedure is given below.
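The sketch below (our own illustration, not code from [17]; it assumes numpy and scipy) extends the single-counter sketch to the unknown-domain setting: it runs a per-item binary mechanism lazily, so fresh noise is drawn for each prior partial sum the first time a new item appears, and it only releases items whose noisy counts exceed the threshold $T$, set as in Theorem 11 below.

```python
import numpy as np
from scipy.stats import norm

def threshold(sigma, L, delta0, delta):
    # Threshold T from Theorem 11 below (our reading of [17]).
    m = int(np.ceil(np.log2(L + 1)))
    return 1 + sigma * np.sqrt(m) * norm.ppf(1 - delta / (delta0 * L))

def unknown_domain_binary_mechanism(stream, sigma, T,
                                    rng=np.random.default_rng(0)):
    """stream[j] is the set of items at round j+1 (at most Delta_0 of them).

    Returns, for each round, a dict of the items whose noisy running
    counts exceed T. Noise for an (item, interval) pair is drawn lazily,
    so a new item gets fresh noise on every prior partial sum.
    """
    seen, noisy_psum, releases = set(), {}, []
    for ell in range(1, len(stream) + 1):
        seen |= stream[ell - 1]
        released = {}
        for u in seen:
            count, end = 0.0, ell
            while end > 0:
                width = end & (-end)
                key = (u, (end - width + 1, end))
                if key not in noisy_psum:
                    true_psum = sum(u in stream[j] for j in range(end - width, end))
                    noisy_psum[key] = true_psum + rng.normal(0, sigma)
                count += noisy_psum[key]
                end -= width
            if count > T:
                released[u] = count
        releases.append(released)
    return releases
```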

We now present the privacy analysis of the Unknown Domain Binary Mechanism from [17].

Theorem 11.

Assume that $|\omega^{(\ell)}|\leq\Delta_{0}$ for all $\ell\in[L]$. Setting $\sigma=1/\varepsilon$ and the threshold $T$ to be the following for any $\delta>0$,

\[
T=1+\sigma\cdot\sqrt{\lceil\log_{2}(L+1)\rceil}\cdot\Phi^{-1}\left(1-\frac{\delta}{\Delta_{0}\cdot L}\right)
\]

ensures that the Unknown Domain Binary Mechanism is $\delta$-approximate $\Delta_{0}\lceil\log_{2}(L+1)\rceil\varepsilon^{2}/2$-CDP under event-level adjacent streams.

Proof.

We follow the same analysis as in the earlier theorems, mainly leveraging Lemma 3.1. Let $\omega^{(1:L)}$ contain an event where the neighboring stream $\omega^{\prime(1:L)}$ has an empty set at that event. Say that round $\ell$ is where they differ, so that $\omega^{\prime(\ell)}=\emptyset$ and $|\omega^{(\ell)}|=\Delta_{0}$. Let $A$ denote the Unknown Domain Binary Mechanism. We denote $S$ as the set of all outcomes that both $A(\omega^{(1:L)})$ and $A(\omega^{\prime(1:L)})$ can return. Note that at round $\ell$, stream event $\omega^{(\ell)}$ can introduce as many as $\Delta_{0}$ previously unseen items in the stream. We will write the distribution of $A(\omega^{(1:L)})$ as $P$ and the distribution of $A(\omega^{\prime(1:L)})$ as $Q$.

  • 1.

    We will write $E$ as the randomness in $A(\omega^{(1:L)})$ that can generate outcomes common between $A(\omega^{(1:L)})$ and $A(\omega^{\prime(1:L)})$. We need to ensure that no noisy count on any new item that appears in $\omega^{(\ell)}$ but not in $\omega^{(1:\ell-1)}$ can go above the threshold $T$. At round $\ell$ we can add together as many as $\lceil\log_{2}(\ell+1)\rceil\leq\lceil\log_{2}(L+1)\rceil$ independent Gaussian noise terms, whose sum is itself Gaussian. We then apply a union bound over all possible $\Delta_{0}$ items in $\omega^{(\ell)}$ and a further union bound over all possible $L$ rounds to get the following when $T=1+\sigma\cdot\sqrt{\lceil\log_{2}(L+1)\rceil}\cdot\Phi^{-1}(1-\tfrac{\delta}{\Delta_{0}\cdot L})$:

    \[
    \Pr[\neg E]\leq\Delta_{0}\cdot L\cdot\Pr\left[\mathrm{N}(0,\lceil\log_{2}(L+1)\rceil\sigma^{2})\geq T-1\right]=\delta.
    \]
  • 2.

    We will write $P^{\prime}$ for the distribution of the Binary Mechanism evaluated on $\omega^{(1:L)}$ over only the items common between $\omega^{(1:L)}$ and $\omega^{\prime(1:L)}$ that have positive counts. We then have for all outcomes $y\in S$ that

    \[
    P(y)=\Pr[E]\cdot P^{\prime}(y).
    \]

    Since $Q$ is the distribution for the neighboring input $\omega^{\prime(1:L)}$ where $\omega^{\prime(\ell)}=\emptyset$, the set $S$ contains all of its outcomes, so that $Q(y)=Q^{\prime}(y)$ for all $y\in S$.

  • 3.

    Note that $P^{\prime}$ and $Q^{\prime}$ are simply post-processing functions of the partial sums table with Gaussian noise added to each cell of the table. Hence, we need to consider the Rényi divergence between the noisy partial sum tables on the common items between $\omega^{(1:L)}$ and $\omega^{\prime(1:L)}$. We then have $D_{\lambda}(P^{\prime}\|Q^{\prime})\leq\Delta_{0}\lceil\log_{2}(L+1)\rceil/(2\sigma^{2})\cdot\lambda$ and $D_{\lambda}(Q^{\prime}\|P^{\prime})\leq\Delta_{0}\lceil\log_{2}(L+1)\rceil/(2\sigma^{2})\cdot\lambda$ for all $\lambda>1$, because each is a randomized post-processing (including the additional noise terms) of the common items in the same partial sum table as in the Binary Mechanism.
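    The stated bound follows from the closed form for the Rényi divergence between Gaussians of the same scale: the noisy partial sum tables for $\omega^{(1:L)}$ and $\omega^{\prime(1:L)}$ differ in at most $\Delta_{0}$ items, each in at most $\lceil\log_{2}(L+1)\rceil$ cells, by at most 1 per cell, so

    \[
    D_{\lambda}\left(\mathrm{N}(\mu,\sigma^{2}I)\,\|\,\mathrm{N}(\mu^{\prime},\sigma^{2}I)\right)=\frac{\lambda\|\mu-\mu^{\prime}\|_{2}^{2}}{2\sigma^{2}}\leq\frac{\lambda\,\Delta_{0}\lceil\log_{2}(L+1)\rceil}{2\sigma^{2}}.
    \]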

Setting $\sigma=1/\varepsilon$ completes the proof. ∎

5 Conclusion

We have presented a unified framework to prove that several different algorithms over unknown domain histograms are approximate CDP. In many settings, practitioners want a way to incorporate private algorithms with minimal onboarding. A major bottleneck for incorporating private algorithms into existing systems is requiring a fixed list of items that we want to release counts for. Furthermore, product teams might be comfortable with noise added to counts, but not with displaying counts for items that never appeared in the dataset. We wanted to show how the privacy analyses of many existing DP algorithms can be unified by fixing neighboring datasets and considering not just outcomes that can occur on both neighboring inputs, but also the related distributions that can only generate these good outcomes. We think that approximate CDP provides the easiest way to combine these algorithms together to get tight privacy loss bounds, as the privacy analyses of many of them rely on improved pure CDP bounds rather than pure DP bounds. For example, we showed how using a CDP analysis of the Laplace mechanism can improve the CDP privacy parameter by considering composition over differing counts, rather than relying on an $\ell_{1}$-sensitivity bound as would be the case for DP. We can also use the tighter connection between Exponential Mechanisms and CDP, rather than using pure DP parameters of the Exponential Mechanism. Lastly, the Gaussian mechanism does not satisfy a pure DP bound, so using CDP is a natural fit, and converting to approximate DP would result in a lossy DP parameter. We hope that this unified framework will help demystify some previous analyses and can be leveraged in designing future private algorithms.

6 Acknowledgements

Special thanks to David Durfee and Thomas Steinke for their helpful comments, which improved the quality of this survey.

References

  • Bun and Steinke [2016] M. Bun and T. Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In M. Hirt and A. Smith, editors, Theory of Cryptography, pages 635–658, Berlin, Heidelberg, 2016. Springer Berlin Heidelberg. ISBN 978-3-662-53641-4. URL https://link.springer.com/chapter/10.1007/978-3-662-53641-4_24.
  • Cesar and Rogers [2021] M. Cesar and R. Rogers. Bounding, concentrating, and truncating: Unifying privacy loss composition for data analytics. In Proceedings of the 32nd International Conference on Algorithmic Learning Theory, volume 132 of Proceedings of Machine Learning Research, pages 421–457. PMLR, 16–19 Mar 2021. URL https://proceedings.mlr.press/v132/cesar21a.html.
  • Chan et al. [2012] T.-H. H. Chan, E. Shi, and D. Song. Optimal lower bound for differentially private multi-party aggregation. In European Symposium on Algorithms (ESA), 2012. URL http://eprint.iacr.org/2012/373.pdf.
  • Ding et al. [2021] Z. Ding, D. Kifer, S. M. S. N. E., T. Steinke, Y. Wang, Y. Xiao, and D. Zhang. The permute-and-flip mechanism is identical to report-noisy-max with exponential noise, 2021.
  • Dong et al. [2020] J. Dong, D. Durfee, and R. Rogers. Optimal differential privacy composition for exponential mechanisms. In H. D. III and A. Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 2597–2606. PMLR, 13–18 Jul 2020. URL https://proceedings.mlr.press/v119/dong20a.html.
  • Durfee and Rogers [2019] D. Durfee and R. M. Rogers. Practical differentially private top-k selection with pay-what-you-get composition. In H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. B. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 3527–3537, 2019. URL https://proceedings.neurips.cc/paper/2019/hash/b139e104214a08ae3f2ebcce149cdf6e-Abstract.html.
  • Dwork and Roth [2014] C. Dwork and A. Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3 & 4):211–407, 2014. doi: 10.1561/0400000042. URL http://dx.doi.org/10.1561/0400000042.
  • Dwork and Rothblum [2016] C. Dwork and G. Rothblum. Concentrated differential privacy. arXiv:1603.01887 [cs.DS], 2016.
  • Dwork et al. [2006a] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. In Advances in Cryptology - EUROCRYPT 2006, pages 486–503, Berlin, Heidelberg, 2006a. Springer Berlin Heidelberg. ISBN 978-3-540-34547-3. URL https://www.iacr.org/archive/eurocrypt2006/40040493/40040493.pdf.
  • Dwork et al. [2006b] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Proceedings of the Third Conference on Theory of Cryptography, TCC’06, page 265–284, Berlin, Heidelberg, 2006b. Springer-Verlag. ISBN 3540327312. doi: 10.1007/11681878_14. URL https://doi.org/10.1007/11681878_14.
  • Dwork et al. [2010] C. Dwork, M. Naor, T. Pitassi, and G. N. Rothblum. Differential privacy under continual observation. In Proceedings of the Forty-Second ACM Symposium on Theory of Computing, STOC ’10, page 715–724, New York, NY, USA, 2010. Association for Computing Machinery. ISBN 9781450300506. doi: 10.1145/1806689.1806787. URL https://doi.org/10.1145/1806689.1806787.
  • Gopi et al. [2022] S. Gopi, Y. T. Lee, and D. Liu. Private convex optimization via exponential mechanism. In P.-L. Loh and M. Raginsky, editors, Proceedings of Thirty Fifth Conference on Learning Theory, volume 178 of Proceedings of Machine Learning Research, pages 1948–1989. PMLR, 02–05 Jul 2022. URL https://proceedings.mlr.press/v178/gopi22a.html.
  • Korolova et al. [2009] A. Korolova, K. Kenthapadi, N. Mishra, and A. Ntoulas. Releasing search queries and clicks privately. In Proceedings of the 18th International Conference on World Wide Web, WWW ’09, page 171–180, New York, NY, USA, 2009. Association for Computing Machinery. ISBN 9781605584874. doi: 10.1145/1526709.1526733. URL https://doi.org/10.1145/1526709.1526733.
  • McKenna and Sheldon [2020] R. McKenna and D. Sheldon. Permute-and-flip: A new mechanism for differentially private selection. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS’20, Red Hook, NY, USA, 2020. Curran Associates Inc. ISBN 9781713829546.
  • McSherry and Talwar [2007] F. McSherry and K. Talwar. Mechanism design via differential privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS’07), pages 94–103, 2007. doi: 10.1109/FOCS.2007.66. URL http://dx.doi.org/10.1109/FOCS.2007.66.
  • Papernot and Steinke [2022] N. Papernot and T. Steinke. Hyperparameter tuning with renyi differential privacy. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=-70L8lpp9DF.
  • Rivera Cardoso and Rogers [2022] A. Rivera Cardoso and R. Rogers. Differentially private histograms under continual observation: Streaming selection into the unknown. In G. Camps-Valls, F. J. R. Ruiz, and I. Valera, editors, Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, volume 151 of Proceedings of Machine Learning Research, pages 2397–2419. PMLR, 28–30 Mar 2022. URL https://proceedings.mlr.press/v151/rivera-cardoso22a.html.
  • Rogers et al. [2021] R. Rogers, S. Subramaniam, S. Peng, D. Durfee, S. Lee, S. K. Kancha, S. Sahay, and P. Ahammad. Linkedin’s audience engagements api: A privacy preserving data analytics system at scale. Journal of Privacy and Confidentiality, 11(3), Dec. 2021. doi: 10.29012/jpc.782. URL https://journalprivacyconfidentiality.org/index.php/jpc/article/view/782.
  • Steinke [2022] T. Steinke. Composition of differential privacy & privacy amplification by subsampling, 2022.
  • Swanberg et al. [2023] M. Swanberg, D. Desfontaines, and S. Haney. Dp-sips: A simpler, more scalable mechanism for differentially private partition selection, 2023.
  • Whitehouse et al. [2023] J. Whitehouse, A. Ramdas, R. Rogers, and Z. S. Wu. Fully adaptive composition in differential privacy. In International Conference on Machine Learning. PMLR, 2023.
  • Wilson et al. [2020] R. J. Wilson, C. Y. Zhang, W. Lam, D. Desfontaines, D. Simmons-Marengo, and B. Gipson. Differentially private SQL with bounded user contribution. Proceedings on Privacy Enhancing Technologies, 2020(2):230–250, 2020. doi: https://doi.org/10.2478/popets-2020-0025. URL https://content.sciendo.com/view/journals/popets/2020/2/article-p230.xml.
  • Zhang et al. [2023] B. Zhang, V. Doroshenko, P. Kairouz, T. Steinke, A. Thakurta, Z. Ma, H. Apte, and J. Spacek. Differentially private stream processing at scale, 2023.