
Prateeti Mukherjee ([email protected])
Microsoft Research

Satya Lokam ([email protected])
Microsoft Research

On the Query Complexity of Training Data Reconstruction in Private Learning

Abstract

We analyze the number of queries that a whitebox adversary needs to make to a private learner in order to reconstruct its training data. For $(\epsilon,\delta)$-DP learners with training data drawn from any arbitrary compact metric space, we provide the first known lower bounds on the adversary's query complexity as a function of the learner's privacy parameters. Our results are minimax optimal for every $\epsilon\geq 0,\delta\in[0,1]$, covering both $\epsilon$-DP and $(0,\delta)$-DP as corollaries. Beyond this, we obtain query complexity lower bounds for $(\alpha,\epsilon)$-Rényi DP learners that are valid for any $\alpha>1,\epsilon\geq 0$. Finally, we analyze data reconstruction attacks on locally compact metric spaces via the framework of Metric DP, a generalization of DP that accounts for the underlying metric structure of the data. In this setting, we provide the first known analysis of data reconstruction in unbounded, high dimensional spaces and obtain query complexity lower bounds that are nearly tight modulo logarithmic factors.

1 Introduction

Machine Learning has become increasingly pervasive, finding applications in multiple real-world, risk-sensitive workflows. The fascinating potential of machine learning is perhaps most apparent in recent times, with the development of large-scale deep learning algorithms such as Large Language Models (Brown et al., 2020) and Diffusion Models (Ramesh et al., 2021). These powerful algorithms consume enormous amounts of data to derive meaningful patterns for improved empirical performance. Theoretical analyses (Bresler and Nagaraj, 2020; Bubeck et al., 2020; Vardi et al., 2021), however, reveal that neural networks are capable of memorizing their underlying training data, rendering them vulnerable to information leakage and adversarial attacks that can infer sensitive attributes of the training set, or worse, reconstruct one or more training samples entirely. Differential privacy was introduced in an attempt to formalize the notion of privacy protection while deriving meaningful conclusions from statistical information, and has since evolved into the de facto notion of privacy in machine learning. This has inspired a plethora of research on private counterparts to popular machine learning (Chaudhuri and Monteleoni, 2008; Jain et al., 2011; Chaudhuri et al., 2009; Alabi et al., 2020) and deep learning (Abadi et al., 2016; Fu et al., 2022; Song et al., 2013) algorithms. However, the promise of differential privacy is often difficult to interpret, and a rigorous quantification of "how much data privacy does an $(\epsilon,\delta)$-DP guarantee actually confer?" is important for developing sufficiently private algorithms.

Analysis in this setting involves measuring the level of protection conferred by private learners against information leakage, a quantity that is largely specified by an adversary's success rate in performing a particular class of attacks. For instance, Yeom et al. (2018) demonstrate that, for the class of Membership Inference Adversaries (MIAs), whose objective is to infer the membership status of a target individual in the training set of a private learner, an adversary cannot perform much better than random guessing if the learner is $\epsilon$-DP with $\epsilon\leq 0.4$. On the flip side, a more recent work of Humphries et al. (2020) suggests that, under minimal modeling assumptions, if $\epsilon\geq 2$, one can design adversaries that correctly predict the membership status of a target individual through the output of the $\epsilon$-DP learner with probability $\geq 88\%$. This is concerning, since most practical deployments of $\epsilon$-DP algorithms consider $\epsilon\in[2,4]$ to be sufficiently private (Allen et al.; Apple). From the perspective of an individual supplying their sensitive information to a learning algorithm, under the impression that their participation is kept secret, such large lower bounds on the success probability of an attack imply little privacy protection. Therefore, for effective mitigation of privacy risks in this context, $\epsilon$ must be sufficiently small. However, prior work of Tramèr and Boneh (2021) has already established that private learning in the small-$\epsilon$ regime yields models that perform significantly worse than their non-private counterparts and find limited applicability in real-world deployments.

Membership status, however, may not be the security concern of interest in every scenario. For instance, an individual's online presence on a social network platform is generally public, and therefore may not qualify as sensitive information. However, if an adversary is able to reconstruct private message transcripts or the individual's personal details, their privacy is at risk. In machine learning tasks, this would be an instance of a Training Data Reconstruction Attack (DRA), wherein the adversary attempts to reconstruct a target sample of the training dataset through the output of a learner. Carlini et al. (2021) have demonstrated that state-of-the-art language models are susceptible to simple reconstruction adversaries that can perfectly reconstruct personally identifiable information and 128-bit UUIDs through repeated queries to the model, even when the sequences appear just once in the training set. This is particularly concerning in recent times, as we witness widespread adoption of powerful algorithms that necessitate the use of large-scale, diverse training sets procured from various sources. The results of Balle et al. (2022); Carlini et al. (2021); Zhu et al. (2019) indicate that unauthorized recovery, and subsequent misuse, of an individual's sensitive information is a very tangible threat. Therefore, it is imperative that learning algorithms be scrutinized for their resilience to training data reconstruction attacks.

Privacy analysis in this setting has some nice properties. Intuitively, reconstruction must be harder than membership inference, since the latter involves recovering a single binary random variable, while the former may require reconstructing samples that are supported on some arbitrary domain beyond $\{0,1\}^{d}$. This leads us to the question: can DP imply meaningful protection against DRAs for a larger range of $\epsilon$, even when membership is compromised? Empirically, prior work of Balle et al. (2022) has established that this hypothesis is indeed true. In this work, we answer this question by studying the theoretical underpinnings of a private algorithm's resilience to such reconstruction adversaries.

To facilitate a formal analysis, we must first identify the parameters that characterize the protection offered by a private learner. For $(\epsilon,\delta)$-DP learners, a dependence on the privacy parameters $(\epsilon,\delta)$ is obvious. Furthermore, the empirical success of the reconstruction adversaries in Carlini et al. (2021) indicates that the number of interactions between the adversary and the learner, which we call the adversary's query complexity $n$, is crucial to the success of the attack. Finally, we note that privacy violations are largely context-dependent. For instance, it is public information that Bill Gates is a billionaire; if an adversary is able to recover a rough approximation of his net worth with error of the order of $10^{9}$, it is very unlikely to be a violation of his privacy. However, if the adversary's reconstruction has an error of the order of $10^{2}$, this level of precision suggests a very detailed and potentially invasive investigation into his financial affairs, thereby raising privacy concerns. Therefore, the level of protection offered by a private learner against reconstruction adversaries is also parameterized by the reconstruction error tolerance $\beta$, i.e., the precision to which a worst-case adversary is able to reconstruct a training sample through the output of the private learner. From the learner's point of view, $\beta$ is the threshold of non-privacy: if an adversary is able to reconstruct any training sample with error $<\beta$, it counts as a privacy violation with respect to DRAs. Intuitively, this can be thought of as a smoothed version of the notion of blatant non-privacy proposed in the seminal work of Dinur and Nissim (2003). To this end, we seek answers to the following question: "What is the maximum number of queries that a private learner can safely answer, while avoiding highly accurate reconstruction of training samples by any adversary?"

1.1 Contributions

Our work aims to present a comprehensive analysis of the effectiveness of training data reconstruction attacks against private learners. To this end, we establish tight non-asymptotic bounds on the query complexity of any informed reconstruction adversary, as a function of the privacy parameters of the learner and the system’s purported tolerance to reconstruction attacks. Our contributions are summarized as follows:

Reconstruction Attacks on $(\epsilon,\delta)$ Differentially Private Learners

Our first result analyzes data reconstruction attacks on $(\epsilon,\delta)$-DP learners whose training dataset is supported on some arbitrary compact metric space. In particular, we operate under a threat model similar to that of Balle et al. (2022); Guo et al. (2022) (see Section 3 for a detailed overview) and tightly characterize the query complexity, i.e., the number of queries that the attacker needs to make, as a function of $\epsilon,\delta$ and the non-privacy threshold $\beta$. The result is informally stated below:

Theorem 1.1 (DRAs on DP Learners (Informal)).

Let $\mathcal{M}$ be an $(\epsilon,\delta)$-DP learning algorithm whose training data is supported on a compact metric space $(\mathcal{Z},\rho)$. Then, any data reconstruction adversary $\mathcal{A}$ needs to make at least $\Omega\left(\tfrac{\ln(\mathsf{diam}(\mathcal{Z})^{2}/\beta^{2})}{\ln\left(\tfrac{e^{\epsilon}+1}{2(1-\delta)}\right)}\right)$ queries to $\mathcal{M}$ in order to ensure a squared reconstruction error of at most $\beta^{2}$. Furthermore, the obtained query complexity is minimax optimal up to constant factors.

To the best of our knowledge, Theorem 1.1 is the first known analysis of DRAs that holds for arbitrary compact metric spaces, and is minimax optimal for all $(\epsilon,\delta)$ regimes. In particular, it gives minimax optimal query complexities of $\Omega\left(\tfrac{\ln(\mathsf{diam}(\mathcal{Z})^{2}/\beta^{2})}{\ln\left(\tfrac{e^{\epsilon}+1}{2}\right)}\right)$ for $\epsilon$-DP and $\Omega\left(\tfrac{\ln(\mathsf{diam}(\mathcal{Z})^{2}/\beta^{2})}{\ln\left(1/(1-\delta)\right)}\right)$ for $(0,\delta)$-DP learners.

Reconstruction Attacks on $(\alpha,\epsilon)$-Rényi DP Learners

Within the framework of Theorem 1.1, we analyze DRAs on $(\alpha,\epsilon)$-Rényi DP learners and obtain a query complexity lower bound of $\Omega\left(\tfrac{\ln(\mathsf{diam}(\mathcal{Z})^{2}/\beta^{2})}{\ln\left(\frac{1}{(1+e^{\gamma\epsilon})^{-1}+(1+e^{\gamma\epsilon})^{-1/\gamma}}\right)}\right)$, where $\gamma=1-1/\alpha$. To the best of our knowledge, this is the first non-asymptotic analysis of data reconstruction on Rényi DP learners which holds for any $\alpha>1$ and $\epsilon\geq 0$. This result also recovers the minimax optimal $\epsilon$-DP bound of Theorem 1.1 as $\alpha\to\infty$. A key component of our proof is a novel total variation bound on the output distributions of Rényi DP learners, which could be of independent interest.

Metric Differential Privacy and Reconstruction Attacks on Locally Compact Spaces

We now extend our analysis beyond standard differential privacy by considering the notion of Metric Differential Privacy, or Lipschitz Privacy (Chatzikokolakis et al., 2013; Fernandes et al., 2021; Koufogiannis et al., 2017; Imola et al., 2022; Boedihardjo et al., 2022), a privacy formalism that generalizes differential privacy by accounting for the underlying metric structure of the data (beyond the Hamming metric). Operating under this broader notion of privacy, we analyze DRAs on $\epsilon$-metric DP learners, whose training data is supported on arbitrary locally compact metric spaces (which may be unbounded), and obtain tight query complexity bounds. Our result is informally stated as follows:

Theorem 1.2 (DRAs on Metric DP Learners (Informal)).

Let $\mathcal{M}$ be an $\epsilon$-metric DP learning algorithm whose training data is sampled from a locally compact metric space $(\mathcal{Z},\rho)$. Let $\tilde{d}$ be the metric entropy of the unit ball in $(\mathcal{Z},\rho)$. Then, any data reconstruction adversary $\mathcal{A}$ needs to make at least $\Omega\left(\tfrac{\tilde{d}}{\epsilon^{2}\beta^{2}}\right)$ queries to $\mathcal{M}$ in order to ensure a squared reconstruction error of at most $\beta^{2}$. The obtained query complexity is minimax optimal up to logarithmic factors.

To the best of our knowledge, Theorem 1.2 is the first known analysis of data reconstruction attacks that directly applies to unbounded metric spaces (such as all of $\mathbb{R}^{d}$), as well as the first result that aims to quantify the semantics of Metric Differential Privacy. To complement these results, we also demonstrate in Appendix D that commonly used privacy-inducing mechanisms and learning algorithms, such as the Gaussian Mechanism, Stochastic Gradient Descent, and Projected Noisy Stochastic Gradient Descent, naturally satisfy metric differential privacy with little to no algorithmic modifications.

Our work analyzes data reconstruction attacks almost entirely from first principles, relying only on the basic definitions of differential privacy (and its variants), classical privacy-inducing mechanisms (Warner, 1965; McSherry and Talwar, 2007) and well-established connections between information divergences and hypothesis testing (Tsybakov, 2004).

2 Notation and Preliminaries

Let $\mathcal{D}$ be an arbitrary collection of secrets, with samples drawn from the metric space $(\mathcal{Z},\rho)$. We assume $(\mathcal{Z},\rho)$ is locally compact unless stated otherwise. Note that this property is satisfied by almost every possible data domain, including finite dimensional normed spaces such as $\mathbb{R}^{d}$. We denote the Hamming metric by $\rho_{H}$. In practical applications, $\mathcal{D}$ is modelled as an aggregation of $N$ samples $\{z_{1},\dots,z_{N}\}$, and the measure of dissimilarity for any two such collections $\mathcal{D},\mathcal{D}^{\prime}\in\mathcal{Z}^{N}$ is given by $\rho(\mathcal{D},\mathcal{D}^{\prime})=\min_{\sigma\in S_{N}}\sum_{i=1}^{N}\rho(\mathcal{D}_{i},\mathcal{D}^{\prime}_{\sigma(i)})$, where $S_{N}$ is the set of all permutations of $[N]$. Two datasets $\mathcal{D},\mathcal{D}^{\prime}$ are neighboring if they differ in a single element, i.e., $\rho_{H}(\mathcal{D},\mathcal{D}^{\prime})\leq 1$. We use the $\Omega,O$ and $\Theta$ notation to characterize the dependence of our rates on $n$, $d$, the privacy parameters of the algorithms, and the system's threshold of non-privacy, suppressing numerical constants. We use $\gtrsim$ and $\lesssim$ to denote $\geq$ and $\leq$ respectively, modulo numerical constants. Beyond this, we also make use of the packing number of compact spaces (or compact subsets).

Definition 2.1 (Packing Number (Wainwright (2019))).

An $\eta$-packing of the set $\mathcal{Z}$ with respect to the metric $\rho$ is a set $\{z_{i}\}_{i\in[M]}$ such that for all distinct $v,v^{\prime}\in[M]$, we have $\rho(z_{v},z_{v^{\prime}})\geq\eta$. The $\eta$-packing number is $M(\eta,\mathcal{Z},\rho):=\sup\{M\in\mathbb{N}:\exists\text{ an }\eta\text{-packing }z_{1},\dots,z_{M}\text{ of }\mathcal{Z}\}$.
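For concreteness, the following minimal sketch (our own illustration, not part of the paper's analysis; it assumes a finite candidate set, the Euclidean metric, and NumPy) greedily constructs an $\eta$-packing of a point cloud; the size of the returned set lower bounds the packing number $M(\eta,\mathcal{Z},\rho)$.

```python
import numpy as np

def greedy_packing(points: np.ndarray, eta: float) -> list[int]:
    """Greedily select indices whose pairwise Euclidean distances are all >= eta.

    The size of the returned index set lower-bounds the eta-packing number
    M(eta, Z, rho) of the finite set `points` under the Euclidean metric.
    """
    selected: list[int] = []
    for i in range(len(points)):
        # Keep point i only if it is eta-far from every point kept so far.
        if all(np.linalg.norm(points[i] - points[j]) >= eta for j in selected):
            selected.append(i)
    return selected

# Example: pack a grid over the unit square at scale eta = 0.5.
grid = np.array([[x, y] for x in np.linspace(0, 1, 11) for y in np.linspace(0, 1, 11)])
packing = greedy_packing(grid, eta=0.5)
print(f"greedy 0.5-packing size: {len(packing)}")  # lower bound on M(0.5, [0,1]^2, l2)
```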

2.1 Differential Privacy

Definition 2.2 (Pure Differential Privacy (Dwork and Roth (2014))).

For any $\epsilon\geq 0$, a randomized learner $\mathcal{M}:\mathcal{Z}^{N}\to\mathcal{H}$ is $\epsilon$-differentially private (DP) if, for every pair of neighboring datasets $\mathcal{D},\mathcal{D}^{\prime}$, the output probability distributions satisfy:

\[\forall T\subseteq\mathcal{H},\quad\mathbb{P}[\mathcal{M}(\mathcal{D})\in T]\leq e^{\epsilon}\,\mathbb{P}[\mathcal{M}(\mathcal{D}^{\prime})\in T]. \tag{1}\]

Or, equivalently, the max divergence $D_{\infty}$ of the output distributions satisfies:

\[D_{\infty}(P\,\|\,Q)=\sup_{x\in\mathsf{supp}(Q)}\log\left(\frac{P(x)}{Q(x)}\right)\leq\epsilon \tag{2}\]

where $P,Q$ denote the output distributions $\mathcal{M}(\mathcal{D})$ and $\mathcal{M}(\mathcal{D}^{\prime})$, respectively.
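As a quick sanity check on Definition 2.2 (a minimal sketch of our own, not a construction from the paper), one can compute the max divergence directly for a finite-output mechanism. Binary randomized response, which reports the true bit with probability $e^{\epsilon}/(1+e^{\epsilon})$, attains $D_{\infty}=\epsilon$ exactly:

```python
import numpy as np

def max_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """D_infty(P || Q) for discrete distributions on the same finite support."""
    support = q > 0
    return float(np.max(np.log(p[support] / q[support])))

eps = 1.0
p_true = np.exp(eps) / (1 + np.exp(eps))  # probability of reporting the true bit
P = np.array([p_true, 1 - p_true])        # output distribution when the bit is 0
Q = np.array([1 - p_true, p_true])        # output distribution when the bit is 1

# Both directions of Eq. (2) hold with equality for randomized response.
assert max(max_divergence(P, Q), max_divergence(Q, P)) <= eps + 1e-12
print(f"binary randomized response satisfies {eps}-DP")
```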

Informally, DP bounds the change in the output distribution induced by a randomized algorithm when one of the input samples is replaced or removed. The constraint ensures that the output distributions are sufficiently statistically indistinguishable, often characterized by a constant contraction in the output space in terms of some statistical divergence. When defined in terms of the Rényi Divergence, we obtain the following popular relaxation of differential privacy:

Definition 2.3 (Rényi Differential Privacy (Mironov (2017))).

For any $\epsilon\geq 0,\alpha>1$, a randomized learner $\mathcal{M}:\mathcal{Z}^{N}\to\mathcal{H}$ is $(\alpha,\epsilon)$-Rényi differentially private (RDP) if, for every pair of neighboring datasets $\mathcal{D},\mathcal{D}^{\prime}$, the Rényi divergence is bounded as follows:

\[D_{\alpha}(P\,\|\,Q)=\frac{1}{\alpha-1}\log\,\mathbb{E}_{x\sim Q}\left[\left(\frac{P(x)}{Q(x)}\right)^{\alpha}\right]\leq\epsilon \tag{3}\]

where $P$ and $Q$ denote the output distributions $\mathcal{M}(\mathcal{D})$ and $\mathcal{M}(\mathcal{D}^{\prime})$, respectively.
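The Rényi divergence of Definition 2.3 is likewise easy to evaluate for discrete output distributions. The sketch below (our own illustration) computes $D_{\alpha}$ for binary randomized response and shows it increasing towards the max divergence $D_{\infty}=\epsilon$ as $\alpha$ grows:

```python
import numpy as np

def renyi_divergence(p: np.ndarray, q: np.ndarray, alpha: float) -> float:
    """D_alpha(P || Q) = log( E_{x ~ Q} [ (P(x)/Q(x))^alpha ] ) / (alpha - 1)."""
    assert alpha > 1
    return float(np.log(np.sum(q * (p / q) ** alpha)) / (alpha - 1))

eps = 1.0
p_true = np.exp(eps) / (1 + np.exp(eps))
P = np.array([p_true, 1 - p_true])
Q = np.array([1 - p_true, p_true])

# D_alpha is nondecreasing in alpha and tends to D_infty = eps as alpha -> infinity.
for alpha in [1.5, 2.0, 10.0, 100.0]:
    print(f"alpha = {alpha:>6}: D_alpha = {renyi_divergence(P, Q, alpha):.4f}")
```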

For $\alpha\to\infty$, Rényi DP recovers the definition of $\epsilon$-DP (Mironov (2017)). Another popular relaxation of differential privacy, termed approximate differential privacy, measures this change in terms of $(\epsilon,\delta)$-indistinguishability.

Definition 2.4 (Approximate Differential Privacy (Dwork et al. (2006))).

For any $\epsilon\geq 0,\delta\in[0,1]$, a randomized learner $\mathcal{M}:\mathcal{Z}^{N}\to\mathcal{H}$ is $(\epsilon,\delta)$ differentially private if, for every pair of neighboring datasets $\mathcal{D},\mathcal{D}^{\prime}$, the output probability distributions $\mathcal{M}(\mathcal{D})$ and $\mathcal{M}(\mathcal{D}^{\prime})$ are $(\epsilon,\delta)$-indistinguishable, i.e.,

\[\forall T\subseteq\mathcal{H},\quad e^{-\epsilon}\left(\mathbb{P}[\mathcal{M}(\mathcal{D}^{\prime})\in T]-\delta\right)\leq\mathbb{P}[\mathcal{M}(\mathcal{D})\in T]\leq e^{\epsilon}\,\mathbb{P}[\mathcal{M}(\mathcal{D}^{\prime})\in T]+\delta \tag{4}\]
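For a small finite output space, $(\epsilon,\delta)$-indistinguishability can be checked exhaustively over all events $T$. The sketch below (our own illustration; Definition 2.4 is equivalent to requiring $\mathbb{P}[\mathcal{M}(\mathcal{D})\in T]\leq e^{\epsilon}\mathbb{P}[\mathcal{M}(\mathcal{D}^{\prime})\in T]+\delta$ in both directions) does exactly that:

```python
import itertools
import numpy as np

def is_indistinguishable(p: np.ndarray, q: np.ndarray, eps: float, delta: float) -> bool:
    """Check (eps, delta)-indistinguishability of two discrete distributions
    by enumerating every event T over the (small) finite output space."""
    outcomes = range(len(p))
    for r in range(len(p) + 1):
        for T in itertools.combinations(outcomes, r):
            pT, qT = p[list(T)].sum(), q[list(T)].sum()
            if pT > np.exp(eps) * qT + delta or qT > np.exp(eps) * pT + delta:
                return False
    return True

eps, delta = 0.5, 0.05
P, Q = np.array([0.7, 0.3]), np.array([0.5, 0.5])
print(is_indistinguishable(P, Q, eps, delta))  # True for this pair
```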

2.2 Metric Differential Privacy

While there exist several variants and relaxations of DP, the traditional notion is fundamentally restricted to the discrete setting. Specifically, the definition requires that the private inputs to the randomized mechanism belong to a product space, where collections break down into natural single elements so that neighboring datasets exist. We argue that this requirement is too stringent, particularly in situations where the sensitive information does not belong to a database at all, but is some arbitrary collection of secrets. In unstructured data domains, such as text, it is challenging to establish a natural definition of neighbors, but there does exist a notion of distinguishability of representations. For instance, consider a topic classification problem where the author's identity is to be protected. In this case, two representations that are "similar in topic" must remain so in the output of a private mechanism, irrespective of the author. This notion of distance is not natural to the Hamming metric, and requires a more sophisticated metric (e.g., the Earth Mover's distance (Fernandes et al. (2019))). In fact, general measures also do not break down into natural single elements, as is required by DP (Boedihardjo et al. (2022)). To facilitate privacy protection in such settings, we need to incorporate the underlying metric structure of the input space of the private algorithm into the parameter that bounds the shift in the output probability distributions when altering a single input. The following notion incorporates this desideratum, and extends the classical concept of DP to general metric spaces.

Definition 2.5 (Metric Differential Privacy (Chatzikokolakis et al. (2013))).

Let $(\mathcal{Z},\rho)$ be a locally compact metric space and let $\mathcal{H}$ be measurable. For any $\epsilon\geq 0$, a randomized learner $\mathcal{M}:\mathcal{Z}^{N}\to\mathcal{H}$ is $\epsilon$-metric differentially private (mDP) if for every pair of inputs $z,z^{\prime}\in\mathcal{Z}$ and any measurable subset $T\subset\mathcal{H}$,

\[\mathbb{P}[\mathcal{M}(z)\in T]\leq e^{\epsilon\rho(z,z^{\prime})}\,\mathbb{P}[\mathcal{M}(z^{\prime})\in T] \tag{5}\]

We observe that metric differential privacy (often appearing under the names Metric Privacy (Boedihardjo et al., 2022) and Lipschitz Privacy (Koufogiannis et al., 2017)) is a strict generalization of differential privacy. In fact, as we shall show in Appendix D, the two definitions are equivalent when the inputs belong to a product space equipped with the Hamming metric. Beyond this, we also discuss in Appendix D how popular privacy-preserving mechanisms in the DP literature, such as the Gaussian mechanism, and learning algorithms, such as Projected Noisy SGD (Feldman et al. (2018)), satisfy metric DP with little to no modifications.
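To illustrate Definition 2.5 on a concrete example (a minimal sketch under our own choices, not a construction from the paper), the scalar Laplace mechanism $\mathcal{M}(z)=z+\mathsf{Lap}(1/\epsilon)$ is $\epsilon$-metric DP with respect to $\rho(z,z^{\prime})=|z-z^{\prime}|$: by the triangle inequality, its log density ratio at any point is at most $\epsilon|z-z^{\prime}|$, exactly the bound of Eq. (5).

```python
import numpy as np

def laplace_mechanism(z: float, eps: float, rng: np.random.Generator) -> float:
    """M(z) = z + Lap(1/eps): eps-metric DP w.r.t. rho(z, z') = |z - z'|."""
    return z + rng.laplace(scale=1.0 / eps)

def log_density_ratio(x: float, z: float, z_prime: float, eps: float) -> float:
    # log[ p_{M(z)}(x) / p_{M(z')}(x) ] = eps * (|x - z'| - |x - z|)
    return eps * (abs(x - z_prime) - abs(x - z))

eps, z, z_prime = 0.5, 1.0, 4.0
rng = np.random.default_rng(0)
print("one private release of z:", laplace_mechanism(z, eps, rng))

# The worst-case log ratio over outputs never exceeds eps * rho(z, z').
worst = max(log_density_ratio(x, z, z_prime, eps) for x in np.linspace(-10, 10, 1001))
assert worst <= eps * abs(z - z_prime) + 1e-12
print(worst, "<=", eps * abs(z - z_prime))
```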

3 Formalizing Data Reconstruction Attacks as a Privacy Game

Our analysis begins by formalizing data reconstruction attacks as a privacy game between the learner $\mathcal{M}$ and the reconstruction adversary $\mathcal{A}$, which proceeds as follows:

Let $\mathcal{D_{-}}:=\{z_{1},\dots,z_{N-1}\}$ be a fixed dataset, comprising $N-1$ training samples, that is known to both the learner and the adversary.

The learner chooses an arbitrary target sample $z\in\mathcal{Z}$, appends it to the training dataset, $\mathcal{D}=\mathcal{D_{-}}\cup\{z\}$, and outputs a sample $h$ drawn from its output distribution $\mathcal{M}(\mathcal{D})$. (By the definition of differential privacy, $\mathcal{M}$ must be a randomized algorithm; see Vadhan (2017).)

The adversary makes $n$ queries to the model, i.e., she draws $n$ samples $h_{1},\dots,h_{n}\sim\mathcal{M}(\mathcal{D})$ from the learner's output distribution and generates a reconstruction $\mathcal{A}(h_{1},\dots,h_{n})$ of the target sample $z$. Since adversaries are generally resource bounded in almost all practical settings, we assume that the query complexity of $\mathcal{A}$, i.e., the number of times that $\mathcal{A}$ can query $\mathcal{M}$, is finite.

A prototypical example of $\mathcal{M}$ would be a logistic regression classifier trained with DP-SGD (with $h\sim\mathcal{M}(\mathcal{D})$ representing the regressor/model weights). Examples of $\mathcal{A}$ include the GLM attack of Balle et al. (2022).

We note that such a privacy game formulation of reconstruction attacks is also the cornerstone of prior works such as Balle et al. (2022) and Guo et al. (2022) (when restricted to the case of $n=1$). In fact, analogous privacy game formulations for MIAs are highly predominant in a wide range of applications including (but not limited to) interpretations of the semantics of DP guarantees (Mahloujifar et al., 2022; Humphries et al., 2020; Yeom et al., 2018) and devising auditing strategies for practical deployments of private learning algorithms (Salem et al., 2023; Shokri, 2022).

Equipped with the above formulation, it is natural to quantitatively analyze data reconstruction attacks via the machinery of two-player zero-sum games (Von Neumann and Morgenstern, 1947). To this end, an intuitive choice for the utility/payoff function of the learner is $u_{\mathcal{M}}(\mathcal{A},z,n)=\mathbb{E}_{h_{1},\dots,h_{n}\sim\mathcal{M}(\mathcal{D_{-}}\cup\{z\})}[\rho(\mathcal{A}(h_{1},\dots,h_{n}),z)^{2}]$. (More generally, one can set $u_{\mathcal{M}}=\mathbb{E}_{h_{1},\dots,h_{n}\sim\mathcal{M}(\mathcal{D_{-}}\cup\{z\})}[g(\rho(\mathcal{A}(h_{1},\dots,h_{n}),z))]$ for some non-negative, increasing, differentiable function $g$; we choose $g(t)=t^{2}$ for clarity, but our proof techniques extend to arbitrary $g$.) The corresponding value function of the game is given by

\[\mathcal{V}(\mathcal{M},n)=\inf_{\mathcal{A}}\sup_{z\in\mathcal{Z}}\,\mathbb{E}_{h_{1},\dots,h_{n}\sim\mathcal{M}(\mathcal{D_{-}}\cup\{z\})}\left[\rho(\mathcal{A}(h_{1},\dots,h_{n}),z)^{2}\right] \tag{6}\]

where the infimum is taken over all reconstruction adversaries. Note that, given any error tolerance parameter $\beta\geq 0$, $\mathcal{V}(\mathcal{M},n)\geq\beta^{2}$ ensures that no data reconstruction adversary $\mathcal{A}$ that is allowed to make at most $n$ queries to $\mathcal{M}$ can uniformly obtain an expected squared reconstruction error less than $\beta^{2}$ on every target point $z\in\mathcal{Z}$. To this end, we say that an adversary has attained the threshold of non-privacy $\beta$ if, for any possible choice of the target point $z$, they are able to reconstruct $z$ up to an expected squared error of at most $\beta^{2}$. The remainder of our work aims to derive tight lower bounds for $\mathcal{V}(\mathcal{M},n)$ as a function of $n$ and the privacy parameters of the learner $\mathcal{M}$. This in turn can be used to derive tight query complexity lower bounds for informed adversaries, i.e., to quantify the number of times an adversary needs to query the model in order to reach a given threshold of non-privacy.
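For a concrete learner/adversary pair, the inner expectation in Eq. (6) can be estimated by direct simulation. The sketch below is our own illustration; `learner`, `adversary`, and `metric` are hypothetical callables standing in for $\mathcal{M}$, $\mathcal{A}$, and $\rho$:

```python
import numpy as np

def expected_sq_error(learner, adversary, z, n_queries, metric, trials=1000, seed=0):
    """Monte Carlo estimate of E[ rho(A(h_1..h_n), z)^2 ] from the game of Eq. (6).

    learner(z, rng) -> one sample h ~ M(D_- u {z})   (hypothetical interface)
    adversary(hs)   -> reconstruction of z from n i.i.d. samples
    metric(a, b)    -> rho(a, b)
    """
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(trials):
        hs = [learner(z, rng) for _ in range(n_queries)]
        errors.append(metric(adversary(hs), z) ** 2)
    return float(np.mean(errors))

# Toy instance: a Laplace-noise learner on R and an averaging adversary.
est = expected_sq_error(
    learner=lambda z, rng: z + rng.laplace(scale=2.0),
    adversary=lambda hs: float(np.mean(hs)),
    z=3.0, n_queries=10, metric=lambda a, b: abs(a - b))
print(est)  # approximately Var(Lap(2)) / 10 = 8 / 10
```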

Remark 3.1.

This instantiation of a powerful white-box reconstruction adversary is borrowed from prior work of Balle et al. (2022). Although the modeling may seem very stylized, investigating provable mitigations conferred by $(\epsilon,\delta)$-DP learners against such worst-case adversaries is a useful exercise: theoretical lower bounds on the adversary's error suggest that such algorithms implicitly protect against reconstruction of training samples by less powerful, more realistic adversaries.

4 Data Reconstruction Attacks against Differentially Private Learners

Our first result is a query complexity lower bound for data reconstruction attacks on $(\epsilon,\delta)$-DP learners whose training dataset is supported on a compact metric space $(\mathcal{Z},\rho)$. The proof of this result is presented in Appendix B.2.

Theorem 4.1 (Query Complexity Lower Bounds for $(\epsilon,\delta)$-DP Learners).

Let $(\mathcal{Z},\rho)$ be any compact metric space. Consider any arbitrary $\epsilon\geq 0$ and $\delta\in[0,1]$. Then, for any target sample $z\in\mathcal{Z}$ in the training dataset of an $(\epsilon,\delta)$ differentially private learner $\mathcal{M}$, and any data reconstruction adversary $\mathcal{A}$ that queries the learner $n$ times, the following holds:

\[\mathcal{V}(\mathcal{M},n)\gtrsim\mathsf{diam}(\mathcal{Z})^{2}\left(\frac{2(1-\delta)}{e^{\epsilon}+1}\right)^{n}\]

where $\mathsf{diam}(\mathcal{Z})=\max_{z,z^{\prime}\in\mathcal{Z}}\rho(z,z^{\prime})$.

Consequently, for any fixed non-privacy threshold $\beta\geq 0$, the number of times $n$ that $\mathcal{A}$ needs to query $\mathcal{M}$ in order to always ensure an expected squared reconstruction error of at most $\beta^{2}$ is lower bounded as $n\geq\Omega\left(\tfrac{\ln(\mathsf{diam}(\mathcal{Z})^{2}/\beta^{2})}{\ln\left(\tfrac{e^{\epsilon}+1}{2(1-\delta)}\right)}\right)$.
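For a sense of scale, the snippet below (our own illustration; the values $\mathsf{diam}(\mathcal{Z})=1$ and $\beta=0.01$ are arbitrary choices) evaluates the query complexity lower bound of Theorem 4.1 for a few representative privacy budgets:

```python
import math

def query_lower_bound(eps: float, delta: float, diam: float, beta: float) -> float:
    """Theorem 4.1: n = Omega( ln(diam^2 / beta^2) / ln((e^eps + 1) / (2(1 - delta))) )."""
    return math.log(diam**2 / beta**2) / math.log((math.exp(eps) + 1) / (2 * (1 - delta)))

diam, beta = 1.0, 0.01
for eps, delta in [(0.1, 0.0), (1.0, 1e-5), (2.0, 1e-5), (4.0, 1e-5)]:
    print(f"eps={eps}, delta={delta}: n >= {query_lower_bound(eps, delta, diam, beta):.1f}")
# Smaller eps forces the adversary to issue many more queries.
```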

We now demonstrate the minimax optimality of the above lower bound by deriving a matching upper bound for the two-point metric space $\mathcal{Z}=\{z_{1},z_{2}\}$. The proof of this result is presented in Appendix B.5.

Theorem 4.2 (Upper Bound for $(\epsilon,\delta)$-DP Learners).

Let $(\mathcal{Z},\rho)$ be a two-point metric space, i.e., $\mathcal{Z}=\{z_{1},z_{2}\}$. Furthermore, let $z\in\mathcal{Z}$ and $\beta\geq 0$ be arbitrary. Then, there exists an $(\epsilon,\delta)$ differentially private learner $\mathcal{M}$ whose training dataset contains $z$, and a reconstruction adversary $\mathcal{A}$ which makes $\Theta\left(\tfrac{\ln(\mathsf{diam}(\mathcal{Z})^{2}/\beta^{2})}{\ln\left(\tfrac{e^{\epsilon}+1}{2(1-\delta)}\right)}\right)$ queries to $\mathcal{M}$ and achieves an expected squared error of $\beta^{2}$.

Our upper bound construction is an $(\epsilon,\delta)$-DP variant of the seminal randomized response algorithm (Warner, 1965). This choice is motivated by the fact that the $(\epsilon,\delta)$ randomized mechanism $\mathcal{M}$ used in our proof is known to be complete among the class of $(\epsilon,\delta)$-DP algorithms, i.e., any $(\epsilon,\delta)$-DP algorithm can be represented as a composition of $\mathcal{M}$ with some (possibly) randomized algorithm $T$ (see Kairouz et al. (2015), Murtagh and Vadhan (2015) Lemma 3.2, and Bun and Steinke (2016)). We also highlight that our upper bound construction is general enough to be directly extendable to $k$-point metric spaces $\mathcal{Z}=\{z_{1},\dots,z_{k}\}$, and can potentially be extended to arbitrary compact metric spaces $\mathcal{Z}$ via covering arguments.

Applicability and Minimax Optimality

We note that the lower bound of Theorem 4.1 holds for any compact metric space $(\mathcal{Z},\rho)$. Furthermore, Theorem 4.1 and Theorem 4.2 are applicable for every $\epsilon\geq 0,\delta\in[0,1]$. Thus, our obtained query complexity guarantee is minimax optimal in all regimes, which, to the best of our knowledge, is the first such result of its kind. To this end, our results cover both pure DP (or $\epsilon$-DP) and $(0,\delta)$-DP. In particular, Theorem 4.1 implies a query complexity lower bound of $\Omega\left(\tfrac{\ln(\mathsf{diam}(\mathcal{Z})^{2}/\beta^{2})}{\ln\left(\tfrac{e^{\epsilon}+1}{2}\right)}\right)$ for $\epsilon$-DP and $\Omega\left(\tfrac{\ln(\mathsf{diam}(\mathcal{Z})^{2}/\beta^{2})}{\ln\left(1/(1-\delta)\right)}\right)$ for $(0,\delta)$-DP. Both these guarantees are minimax optimal as per Theorem 4.2.

Bounded Domain Assumption

Although the bound in Theorem 4.1 is minimax optimal and, to the best of our knowledge, the most general such result of its kind, we note that it requires $\mathsf{diam}(\mathcal{Z})<\infty$. We emphasize that this assumption is not specific to our analysis, and is in agreement with several prior works that investigate the protection offered by traditional DP (including variants and relaxations) algorithms (Yeom et al., 2018; Tramèr and Boneh, 2020; Guo et al., 2022; Balle et al., 2022). We conjecture that this assumption cannot be avoided in the analysis of traditional DP algorithms without making stringent modelling assumptions on either the private learner, the data distribution, or the adversary's reconstruction algorithm. This is due to the fact that the standard notion of differential privacy (as defined in Definitions 2.2 and 2.4) is oblivious to the underlying metric structure of the input space, thereby yielding worst-case behavior. In Section 5, we remove this boundedness assumption and extend the analysis to arbitrary metric spaces, presenting the first query complexity bounds for data reconstruction attacks against metric DP learners.

4.1 Comparison to Prior Work

To the best of our knowledge, the result closest to Theorem 4.1 in prior work is Theorem 1 of Guo et al. (2022), which considers data reconstruction attacks under the same threat model as ours (see Section 3), but is restricted to $(2,\epsilon)$-Rényi DP and $n=1$. Furthermore, it requires the data domain $\mathcal{Z}$ to be a compact subset of $\mathbb{R}^{d}$. For the sake of completeness, we restate their result as follows:

Theorem 4.3 (Theorem 1 of Guo et al. (2022)).

For a target sample $z\in\mathcal{Z}\subseteq\mathbb{R}^{d}$ in the training set of a $(2,\epsilon)$-Rényi DP learner, the MSE of a reconstruction attack that outputs an unbiased estimate $\hat{z}$ of $z$ upon observing $h\leftarrow\mathcal{M}(\mathcal{D_{-}}\cup\{z\})$ satisfies $\mathbb{E}_{h}[\|\hat{z}(h)-z\|^{2}_{2}]\geq\frac{\sum_{i=1}^{d}\mathsf{diam}_{i}(\mathcal{Z})^{2}}{4(e^{\epsilon}-1)}$. The $i$-th coordinate-wise diameter is given by $\mathsf{diam}_{i}(\mathcal{Z}):=\sup_{z,z^{\prime}\in\mathcal{Z},\,z_{j}=z^{\prime}_{j}\,\forall j\neq i}|z_{i}-z_{i}^{\prime}|$.

Invalid Lower Bound

As we shall prove in Appendix B.6, the lower bound implied by Theorem 1 of Guo et al. (2022) is invalid for any $\epsilon<\ln(1+d/4)$, i.e., it violates trivial upper bounds implicit in the notion of the squared distance in bounded domains (this can also be seen intuitively by observing that the RHS diverges to infinity exponentially fast as $\epsilon\to 0$). To this end, the above lower bound result is vacuous for a surprisingly large range of $\epsilon$, particularly in high dimensional settings. Even for elementary use-cases like the MNIST dataset (Lecun et al., 1998), where $d=784$, the lower bound of Guo et al. (2022) is vacuous for any $\epsilon<5.28$. We recall that practical deployments of DP algorithms typically use $\epsilon\in[2,4]$ (Apple). On the contrary, our results in Theorems 4.1 and 4.2 are minimax optimal in all regimes $\epsilon\geq 0,\delta\in[0,1]$. Beyond this, we also establish a query complexity lower bound for $(\alpha,\epsilon)$-Rényi DP in Corollary 4.4, which holds for any $\alpha>1,\epsilon\geq 0$ (as opposed to the specific case of $\alpha=2$, as in Theorem 1 of Guo et al. (2022)).
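The threshold $\ln(1+d/4)$ stated above is easy to tabulate (our own sanity check, using only the constant stated in the text; the non-MNIST dimensions are illustrative):

```python
import math

# Guo et al. (2022, Thm 1) is vacuous for eps < ln(1 + d/4); cf. Appendix B.6.
for d in [10, 100, 784, 150528]:
    print(f"d = {d:>6}: vacuous for eps < {math.log(1 + d / 4):.2f}")
# d = 784 (MNIST) gives eps < 5.28, well above the practical range [2, 4].
```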

Restriction to compact subsets of $\mathbb{R}^{d}$

We highlight that Theorem 1 of Guo et al. (2022) is restricted only to compact subsets of $\mathbb{R}^{d}$, whereas our lower bound applies to any compact metric space. In fact, the validity of Guo et al. (2022) Theorem 1 depends on the coordinate-wise diameter $\mathsf{diam}_{i}(\mathcal{Z})$ being well defined. We note that there exist several applications where this condition may not be satisfied. For instance, let $\mathcal{Z}$ be a collection of strings of varying lengths (where the maximum length is finite), equipped with the edit distance metric. The coordinate-wise edit distance is the maximum difference between corresponding characters of any two strings of the same length, which is not well-defined when $\mathcal{Z}$ is composed of variable-length strings. Thus, their lower bound does not apply to this setting. On the contrary, our result in Theorem 4.1 is still valid, since $\mathsf{diam}(\mathcal{Z})$ is well-defined and represents the maximum dissimilarity between any two strings in $\mathcal{Z}$.

4.2 Data Reconstruction Attacks on Rényi DP Learners

We now consider reconstruction attacks on $(\alpha,\epsilon)$-Rényi DP learners (for $\alpha>1,\epsilon\geq 0$). To this end, we derive the following query complexity lower bound by extending the proof technique of Theorem 4.1 to the notion of Rényi DP. A key step in the proof involves deriving a novel total variation bound between the output distributions of Rényi DP learners (Lemma 6.2 in Section 6), which could be of independent interest.

Corollary 4.4 (Query Complexity Lower Bounds for $(\alpha,\epsilon)$-Rényi DP).

Let $(\mathcal{Z},\rho)$ be any compact metric space. Consider any arbitrary $\alpha>1$ and $\epsilon\geq 0$. Then, for any target sample $z\in\mathcal{Z}$ in the training dataset of an $(\alpha,\epsilon)$-Rényi DP learner $\mathcal{M}$, and any data reconstruction adversary $\mathcal{A}$ that queries the learner $n$ times, the following holds:

\[\mathcal{V}(\mathcal{M},n)\gtrsim\mathsf{diam}(\mathcal{Z})^{2}\left[\frac{1}{e^{\gamma\epsilon}+1}+\frac{1}{\left(e^{\gamma\epsilon}+1\right)^{1/\gamma}}\right]^{n}\]

where $\gamma=1-1/\alpha\in(0,1)$ and $\mathsf{diam}(\mathcal{Z})=\max_{z,z^{\prime}\in\mathcal{Z}}\rho(z,z^{\prime})$.

Consequently, for any fixed non-privacy threshold $\beta\geq 0$, the number of times $n$ that $\mathcal{A}$ needs to query $\mathcal{M}$ in order to always ensure an expected squared reconstruction error of at most $\beta^{2}$ is lower bounded as follows:

\[n\geq\Omega\left(\frac{\ln(\mathsf{diam}(\mathcal{Z})^{2}/\beta^{2})}{\ln\left(\frac{1}{(1+e^{\gamma\epsilon})^{-1}+(1+e^{\gamma\epsilon})^{-1/\gamma}}\right)}\right)\]

We note that Corollary 4.4 applies for any $\alpha>1,\epsilon\geq 0$, in contrast to prior work (Guo et al., 2022) which only considers the case of $\alpha=2$. Furthermore, as $\alpha\to\infty$, Corollary 4.4 exactly recovers the minimax optimal query complexity lower bound for $\epsilon$-DP learners as implied by Theorem 4.1.
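A small numeric check of this limit (our own illustration): with $\gamma=1-1/\alpha$, the per-query base $(1+e^{\gamma\epsilon})^{-1}+(1+e^{\gamma\epsilon})^{-1/\gamma}$ of Corollary 4.4 tends to $2/(e^{\epsilon}+1)$, the base of Theorem 4.1 with $\delta=0$:

```python
import math

def rdp_base(alpha: float, eps: float) -> float:
    """Per-query base of Corollary 4.4: (1+e^{g e})^{-1} + (1+e^{g e})^{-1/g}, g = 1 - 1/alpha."""
    g = 1 - 1 / alpha
    return (1 + math.exp(g * eps)) ** -1 + (1 + math.exp(g * eps)) ** (-1 / g)

eps = 1.0
for alpha in [2.0, 10.0, 100.0, 1e6]:
    print(f"alpha = {alpha:>9}: base = {rdp_base(alpha, eps):.6f}")
print("pure DP base 2/(e^eps + 1) =", 2 / (math.exp(eps) + 1))
```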

5 Data Reconstruction Attacks on Metric DP Learners

We now present our results for data reconstruction attacks on locally compact metric spaces by considering metric DP learners. In this setting, we obtain the following lower bound, which we prove in Appendix C.1.

Theorem 5.1 (Lower Bounds for $\epsilon$ Metric DP Learners).

Let $(\mathcal{Z},\rho)$ be any locally compact metric space and let $\tilde{d}=\log M(1/2,\mathbb{B}(\mathcal{Z}),\rho)$, where $\mathbb{B}(\mathcal{Z})$ is the unit ball in $\mathcal{Z}$. For any target sample $z\in\mathcal{Z}$ in the training dataset of an $\epsilon$-metric differentially private learner $\mathcal{M}$, and any data reconstruction adversary $\mathcal{A}$ that queries the learner $n$ times, $\mathcal{V}(\mathcal{M},n)\geq\Omega\left(\frac{\tilde{d}}{n\epsilon^{2}}\right)$. Consequently, for any fixed non-privacy threshold $\beta\geq 0$, $\mathcal{A}$ needs to query $\mathcal{M}$ at least $\Omega\left(\tfrac{\tilde{d}}{\epsilon^{2}\beta^{2}}\right)$ times in order to always ensure an expected squared reconstruction error of at most $\beta^{2}$.

We note that for $(\mathbb{R}^{d},\|\cdot\|)$ (or, more generally, any finite dimensional normed space), $\tilde{d}=\Theta(d)$, and thus the query complexity lower bound becomes $\Omega\left(\tfrac{d}{\epsilon^{2}\beta^{2}}\right)$. We complement this with an upper bound that is tight modulo logarithmic factors. The proof is presented in Appendix C.2.

Theorem 5.2 (Upper Bound for $\epsilon$ Metric DP Learners).

Let $\mathcal{M}$ be an $\epsilon$-metric differentially private learner whose training datapoints are elements of the normed space $(\mathbb{R}^{d},\|\cdot\|_{2})$. Let $z\in\mathbb{R}^{d}$ be any arbitrary sample in the training dataset of $\mathcal{M}$ and let $\beta\geq 0$ be arbitrary. Then, there exists a reconstruction adversary $\mathcal{A}$ which makes $\Theta\left(\tfrac{d\log^{2}(d)}{\epsilon^{2}\beta^{2}}\right)$ queries to $\mathcal{M}$ in order to achieve an expected squared error of at most $\beta^{2}$.

Unlike prior works on data reconstruction that require some boundedness assumption on the data domain (Yeom et al., 2018; Tramèr and Boneh, 2020; Guo et al., 2022; Stock et al., 2022), Theorem 5.1 and Theorem 5.2 are, to the best of our knowledge, the first near-optimal analysis of training data reconstruction for unbounded metric spaces.

Dimension dependence

The nearly-matching upper and lower bounds of Theorem 5.1 and Theorem 5.2 capture the inherent hardness of the data reconstruction problem in high dimensions, and thus provide solid theoretical grounding for the observations of Balle et al. (2022), which show that the performance of data reconstruction adversaries scales inversely with dimension. The intuition may be formalized as follows: let $\mathcal{Z}$ be a $d$-dimensional vector space. Any target sample $z\in\mathcal{Z}$ has $d$ degrees of freedom. From the lens of an adversary, the complexity of reconstructing the target sample must grow with $d$. Therefore, the number of queries the adversary needs to make to the learner to achieve good quality reconstruction must scale proportionally to $d$, which is accurately captured in Theorem 5.1.

6 Proof Sketch

Our analysis hinges on the following insight: suppose the target sample $z$ is chosen from some finite set $S=\{z_{1},\dots,z_{k}\}$ which is a priori known to the adversary. The task of the adversary then reduces to inferring (or testing) which one of $z_{1},\dots,z_{k}$ is actually the target sample $z$, using $n$ samples from $\mathcal{M}(\mathcal{D_{-}}\cup\{z\})$. Intuitively, this task of testing from samples cannot be harder than the original data reconstruction problem (wherein $z\in\mathcal{Z}$ can be arbitrary). Such testing reductions have a rich history in theoretical computer science (Goldreich et al., 1998), learning theory (Kearns and Vazirani, 1994) and statistics (Tsybakov, 2004). We complement this insight with the statistical indistinguishability interpretation of differential privacy and its relaxations: for any private learner $\mathcal{M}$, the output distributions $\mathcal{M}(\mathcal{D_{-}}\cup\{z_{i}\})$ and $\mathcal{M}(\mathcal{D_{-}}\cup\{z_{j}\})$ must be sufficiently close, which in turn imposes fundamental limits (quantified via information divergences) on the accuracy of the adversary's testing problem described above.

6.1 Proof of the Reconstruction Lower Bounds

We sketch the proof of Theorem 4.1 and Corollary 4.4. For any $z\in\mathcal{Z}$, let $P_{z}$ denote the output distribution of $\mathcal{M}(\mathcal{D_{-}}\cup\{z\})$ and let $P^{n}_{z}$ be the associated product distribution. Recall that $\mathcal{V}(\mathcal{M},n)=\inf_{\mathcal{A}}\sup_{z\in\mathcal{Z}}\mathbb{E}_{h_{1:n}\sim P^{n}_{z}}[\rho(\mathcal{A}(h_{1:n}),z)^{2}]$. Moreover, let $\Delta=\mathsf{diam}(\mathcal{Z})/2$ and let $z_{1},z_{2}$ be the two farthest points in $\mathcal{Z}$, i.e., $\rho(z_{1},z_{2})=2\Delta$. In accordance with the above discussion, we reduce the general reconstruction problem to the case when $z\in S=\{z_{1},z_{2}\}$ by replacing the supremum in the definition of $\mathcal{V}$ with an average over $S$ and applying Markov's inequality to obtain:

\[\mathcal{V}(\mathcal{M},n)\geq\frac{\Delta^{2}}{2}\inf_{\mathcal{A}}\left[P^{n}_{z_{1}}[\rho(\mathcal{A}(h_{1:n}),z_{1})\geq\Delta]+P^{n}_{z_{2}}[\rho(\mathcal{A}(h_{1:n}),z_{2})\geq\Delta]\right]\]

Since $\rho(z_{1},z_{2})=2\Delta$, an application of the triangle inequality shows that $\rho(\mathcal{A}(h_{1:n}),z_{1})\geq\Delta$ holds whenever $\rho(\mathcal{A}(h_{1:n}),z_{1})\geq\rho(\mathcal{A}(h_{1:n}),z_{2})$. Similarly, $\rho(\mathcal{A}(h_{1:n}),z_{2})\geq\Delta$ holds whenever $\rho(\mathcal{A}(h_{1:n}),z_{1})\leq\rho(\mathcal{A}(h_{1:n}),z_{2})$. It follows that,

\[\mathcal{V}(\mathcal{M},n)\geq\frac{\Delta^{2}}{2}\inf_{\mathcal{A}}\left[P^{n}_{z_{1}}[\rho(\mathcal{A}(h_{1:n}),z_{1})\geq\rho(\mathcal{A}(h_{1:n}),z_{2})]+P^{n}_{z_{2}}[\rho(\mathcal{A}(h_{1:n}),z_{1})\leq\rho(\mathcal{A}(h_{1:n}),z_{2})]\right]\]

By the definition of $\mathsf{TV}$, one can see that for any adversary $\mathcal{A}$, the sum of the two probabilities on the RHS can be uniformly lower bounded by $1-\mathsf{TV}(P^{n}_{z_{1}},P^{n}_{z_{2}})$. Subsequently, using the tensorization properties of $1-\mathsf{TV}$ and recalling the definition of $P^{n}_{z}$, we obtain the following lower bound:

\[\mathcal{V}(\mathcal{M},n)\geq\frac{\mathsf{diam}(\mathcal{Z})^{2}}{8}\left(1-\mathsf{TV}(\mathcal{M}(\mathcal{D_{-}}\cup\{z_{1}\}),\mathcal{M}(\mathcal{D_{-}}\cup\{z_{2}\}))\right)^{n} \tag{7}\]

The proofs of both Theorem 4.1 and Corollary 4.4 are now concluded by bounding the total variation between the two output distributions of the private learner $\mathcal{M}$. To this end, we first consider Theorem 4.1 and derive the following $\mathsf{TV}$ upper bound for the output distributions of $(\epsilon,\delta)$-DP learners in Appendix B.1.

Lemma 6.1 (TV Bounds for $(\epsilon,\delta)$-DP Learners).

Let $\mathcal{M}$ be any arbitrary $(\epsilon,\delta)$-DP learner and let $\mathcal{D},\mathcal{D}^{\prime}$ be any two arbitrary neighbouring datasets (i.e., datasets that differ in only one record). Then, the following holds:

\[\mathsf{TV}(\mathcal{M}(\mathcal{D}),\mathcal{M}(\mathcal{D}^{\prime}))\leq 1-\frac{2(1-\delta)}{1+e^{\epsilon}}\]

For Corollary 4.4, we derive the following novel $\mathsf{TV}$ bound between the output distributions of $(\alpha,\epsilon)$-Rényi DP learners, which may be of independent interest. We prove this in Appendix B.3.

Lemma 6.2 (TV Bounds for $(\alpha,\epsilon)$-Rényi DP Learners).

Let $\mathcal{M}$ be any arbitrary $(\alpha,\epsilon)$-Rényi DP learner and let $\mathcal{D},\mathcal{D}^{\prime}$ be any two arbitrary neighbouring datasets (i.e., datasets that differ in only one record). Then, the following holds:

\[\mathsf{TV}(\mathcal{M}(\mathcal{D}),\mathcal{M}(\mathcal{D}^{\prime}))\leq 1-\frac{1}{1+e^{\frac{(\alpha-1)\epsilon}{\alpha}}}-\frac{1}{\left(1+e^{\frac{(\alpha-1)\epsilon}{\alpha}}\right)^{\frac{\alpha}{\alpha-1}}}\]

The proofs of Theorem 4.1 and Corollary 4.4 are now completed by applying Lemma 6.1 and Lemma 6.2, respectively, to equation (7). We also highlight that the proof of Theorem 5.1 proceeds along very similar lines, except that it involves a more delicate choice of the testing set $S=\{z_{1},\dots,z_{M}\}$, using the fact that the definition of metric DP is sensitive to the metric structure of the input.

6.2 Upper Bound for $(\epsilon,\delta)$ DP: Proof Sketch of Theorem 4.2

Since $\mathcal{Z}=\{z_{1},z_{2}\}$, $\mathsf{diam}(\mathcal{Z})=\rho(z_{1},z_{2})$. Without loss of generality, let $z=z_{1}$, i.e., $\mathcal{D}=\mathcal{D_{-}}\cup\{z_{1}\}$. As stated earlier, our construction of $\mathcal{M}$ is a variant of the randomized response mechanism, defined as follows:

1. Flip a coin $C\sim\mathsf{Bernoulli}(\delta)$. If $C=1$, then $\mathcal{M}(\mathcal{D}):=(1,z)$.

2. Otherwise, if $C=0$, then $\mathcal{M}(\mathcal{D})$ is defined as follows:

\[\mathcal{M}(\mathcal{D}):=\begin{cases}(0,z)&\text{with probability }\tfrac{e^{\epsilon}}{1+e^{\epsilon}}\\(0,\mathcal{Z}\setminus\{z\})&\text{with probability }\tfrac{1}{1+e^{\epsilon}}\end{cases}\]

As we shall show in Appendix B.5, $\mathcal{M}$ is $(\epsilon,\delta)$ differentially private. We now consider a reconstruction adversary $\mathcal{A}$ which draws $h_{1},\dots,h_{n}\stackrel{\mathsf{i.i.d.}}{\sim}\mathcal{M}(\mathcal{D})$ and proceeds as follows: if $C_{i}=1$ for some $i\in[n]$, then $h_{i}=(1,z)$ and $\mathcal{A}(h_{1:n})$ outputs the revealed sample $z$. Else, define $\mathcal{A}$ as follows:

\[b_{i}:=\begin{cases}\sim\mathsf{Bern}(1-e^{-\epsilon})&\text{if }h_{i}=(0,z)\\0&\text{if }h_{i}=(0,\mathcal{Z}\setminus\{z\})\end{cases}\qquad\mathcal{A}(h_{1:n}):=\begin{cases}z&\text{if }\vee_{i=1}^{n}b_{i}=1\\\mathcal{Z}\setminus\{z\}&\text{otherwise}\end{cases}\]

where $\vee$ denotes bitwise OR. Conditioned on the event $\{C_{i}=0\ \forall i\in[n]\}$, $b_{i}\stackrel{\mathsf{i.i.d.}}{\sim}\mathsf{Bern}(1-p)$ where $p=\tfrac{1}{e^{\epsilon}+1}+e^{-\epsilon}\cdot\tfrac{e^{\epsilon}}{e^{\epsilon}+1}=\tfrac{2}{e^{\epsilon}+1}$. Note that $\mathcal{A}(h_{1:n})=\mathcal{Z}\setminus\{z\}$ iff $C_{i}=b_{i}=0\ \forall i\in[n]$. Since this event occurs with probability $\left(\tfrac{2(1-\delta)}{e^{\epsilon}+1}\right)^{n}$, it follows that

\[\mathbb{E}_{h_{1},\dots,h_{n}\stackrel{\mathsf{i.i.d.}}{\sim}\mathcal{M}(\mathcal{D})}\left[\rho(\mathcal{A}(h_{1},\dots,h_{n}),z)^{2}\right]=\left(\frac{2(1-\delta)}{e^{\epsilon}+1}\right)^{n}\mathsf{diam}(\mathcal{Z})^{2}\]

To set the RHS equal to $\beta^{2}$, it suffices to set $n=\Theta\left(\tfrac{\ln(\mathsf{diam}(\mathcal{Z})^{2}/\beta^{2})}{\ln\left(\tfrac{e^{\epsilon}+1}{2(1-\delta)}\right)}\right)$.

We note that this adversary uses a randomized decision rule, which is consistent with our modeling assumptions in Section 3 and with prior work (Guo et al. (2022); Balle et al. (2022)). In Appendix B, we construct a deterministic majority-based adversary and demonstrate that the lower bound is still tight $\forall\epsilon\geq 0.042,\delta\in[0,1]$.
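The construction is straightforward to simulate. The sketch below is our own illustration: it encodes the two-point space as bits $\{0,1\}$, implements the $(\epsilon,\delta)$ randomized response mechanism from the proof, and pairs it with a deterministic majority-vote adversary in the spirit of the Appendix B construction (the randomized rule above is defined relative to the true $z$); it illustrates the exponential-in-$n$ decay of the reconstruction error against the benchmark rate of Theorem 4.2.

```python
import numpy as np

def mechanism(z: int, eps: float, delta: float, rng) -> tuple[int, int]:
    """(eps, delta)-DP randomized response over Z = {0, 1} with target bit z."""
    if rng.random() < delta:
        return (1, z)                                # coin C = 1: reveal z outright
    keep = rng.random() < np.exp(eps) / (1 + np.exp(eps))
    return (0, z if keep else 1 - z)                 # coin C = 0: randomized response

def majority_adversary(hs) -> int:
    """Trust any revealed sample; otherwise take a majority vote over reports."""
    for c, v in hs:
        if c == 1:
            return v
    votes = [v for _, v in hs]
    return int(2 * sum(votes) >= len(votes))

eps, delta, n, z = 1.0, 0.01, 15, 1
rng = np.random.default_rng(0)
errs = [majority_adversary([mechanism(z, eps, delta, rng) for _ in range(n)]) != z
        for _ in range(20000)]
print("empirical reconstruction error:", np.mean(errs))
print("benchmark rate (2(1-delta)/(e^eps+1))^n:",
      (2 * (1 - delta) / (np.exp(eps) + 1)) ** n)
```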

6.3 Upper Bound for $\epsilon$ Metric DP: Proof Sketch of Theorem 5.2

Our upper bound construction for metric DP is based upon the exponential mechanism in $(\mathbb{R}^{d},\|\cdot\|_{2})$. In particular, for $\mu\in\mathbb{R}^{d}$, let $\pi_{\mu,\epsilon}$ be a probability measure on $\mathbb{R}^{d}$ whose density with respect to the Lebesgue measure $\mathsf{Leb}$ is given by,

\[\frac{\mathsf{d}\pi_{\mu,\epsilon}}{\mathsf{d}\mathsf{Leb}}(x)\propto e^{-\epsilon\|x-\mu\|_{2}}\]

For any $z\in\mathbb{R}^{d}$, we set the output distribution of $\mathcal{M}$ to be $\mathcal{M}(\mathcal{D})=\mathcal{M}(\mathcal{D_{-}}\cup\{z\})=\pi_{z,\epsilon}$. We verify in Appendix C.2 that $\mathcal{M}$ is $\epsilon$-metric DP w.r.t. $\|\cdot\|_{2}$, since for any $\mu_{1},\mu_{2}\in\mathbb{R}^{d}$, $\tfrac{\mathsf{d}\pi_{\mu_{1},\epsilon}}{\mathsf{d}\pi_{\mu_{2},\epsilon}}\leq e^{\epsilon\|\mu_{1}-\mu_{2}\|_{2}}$ by the triangle inequality. We consider a reconstruction adversary which draws $h_{1},\dots,h_{n}\stackrel{\mathsf{i.i.d.}}{\sim}\mathcal{M}(\mathcal{D})$ and computes $\mathcal{A}(h_{1},\dots,h_{n})=\tfrac{1}{n}\sum_{i=1}^{n}h_{i}$. It is easy to see that $\mathbb{E}_{h_{1},\dots,h_{n}\stackrel{\mathsf{i.i.d.}}{\sim}\mathcal{M}(\mathcal{D})}[\|\mathcal{A}(h_{1},\dots,h_{n})-z\|^{2}_{2}]=\frac{\sigma^{2}}{n}$, where $\sigma^{2}=\mathbb{E}_{x\sim\pi_{z,\epsilon}}[\|x-z\|^{2}_{2}]$. Thus, bounding the reconstruction error reduces to sharply bounding $\sigma^{2}$. To this end, we use classical results in probability theory on the isoperimetric properties of the exponential distribution in $\mathbb{R}^{d}$ (Bobkov and Ledoux, 1997; Bobkov, 2003), as well as the connections between moment control and isoperimetry (Aida and Stroock, 1994; Huang and Tropp, 2021; Garg et al., 2020), to sharply bound $\sigma^{2}\lesssim d\log^{2}(d)/\epsilon^{2}$. It follows that,

\[\mathbb{E}_{h_{1},\dots,h_{n}\stackrel{\mathsf{i.i.d.}}{\sim}\mathcal{M}(\mathcal{D})}\left[\|\mathcal{A}(h_{1},\dots,h_{n})-z\|^{2}_{2}\right]\lesssim\frac{d\log^{2}(d)}{n\epsilon^{2}}\]

To make the RHS at most $\beta^{2}$, it suffices to set $n=\Theta\left(\tfrac{d\log^{2}(d)}{\epsilon^{2}\beta^{2}}\right)$.
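The mechanism and adversary are easy to simulate: since $\pi_{z,\epsilon}$ is spherically symmetric with radial density proportional to $r^{d-1}e^{-\epsilon r}$, a draw decomposes into a uniform direction and a $\mathsf{Gamma}(d,1/\epsilon)$ radius. The sketch below (our own illustration, not the proof in Appendix C.2) shows the averaging adversary's error decaying as $\sigma^{2}/n$:

```python
import numpy as np

def sample_pi(z: np.ndarray, eps: float, n: int, rng) -> np.ndarray:
    """n i.i.d. draws from pi_{z, eps}, the density on R^d proportional to
    exp(-eps * ||x - z||_2): uniform direction times Gamma(d, 1/eps) radius."""
    d = z.shape[0]
    dirs = rng.standard_normal((n, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)   # uniform on the sphere
    radii = rng.gamma(shape=d, scale=1.0 / eps, size=(n, 1))
    return z + radii * dirs

d, eps = 50, 1.0
z = np.zeros(d)
rng = np.random.default_rng(0)
for n in [10, 100, 1000]:
    errs = [np.sum((sample_pi(z, eps, n, rng).mean(axis=0) - z) ** 2)
            for _ in range(200)]
    print(f"n = {n:>4}: mean squared error of the averaging adversary = {np.mean(errs):.2f}")
```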

References

  • Abadi et al. (2016) Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS ’16, page 308–318, New York, NY, USA, 2016. Association for Computing Machinery. ISBN 9781450341394. 10.1145/2976749.2978318. URL https://doi.org/10.1145/2976749.2978318.
  • Aida and Stroock (1994) Shigeki Aida and Daniel Stroock. Moment estimates derived from Poincaré and logarithmic Sobolev inequalities. Mathematical Research Letters, 1(1):75–86, 1994.
  • Alabi et al. (2020) Daniel Alabi, Audra McMillan, Jayshree Sarathy, Adam Smith, and Salil Vadhan. Differentially private simple linear regression, 2020. URL https://arxiv.org/abs/2007.05157.
  • (4) Joshua Allen, Janardhan Kulkarni, Abhradeep Thakurta, and Sergey Yekhanin. Tracking DP budget while handling basic SQL queries. URL https://github.com/opendp/smartnoise-sdk/blob/main/sql/docs/papers/DP_SQL_budget.pdf.
  • (5) Apple Differential Privacy Team. Learning with privacy at scale. URL https://machinelearning.apple.com/research/learning-with-privacy-at-scale.
  • (6) Wiki Article. Binomial Distribution. URL https://en.wikipedia.org/wiki/Binomial_distribution#Tail_bounds.
  • Balle et al. (2022) Borja Balle, Giovanni Cherubin, and Jamie Hayes. Reconstructing training data with informed adversaries, 2022. URL https://arxiv.org/abs/2201.04845.
  • Bobkov and Ledoux (1997) Sergey Bobkov and Michel Ledoux. Poincaré's inequalities and Talagrand's concentration phenomenon for the exponential distribution. Probability Theory and Related Fields, 107:383–400, 1997.
  • Bobkov (2003) Sergey G Bobkov. Spectral gap and concentration for some spherically symmetric probability measures. In Geometric Aspects of Functional Analysis: Israel Seminar 2001-2002, pages 37–43. Springer, 2003.
  • Boedihardjo et al. (2022) March Boedihardjo, Thomas Strohmer, and Roman Vershynin. Private measures, random walks, and synthetic data, 2022.
  • Bresler and Nagaraj (2020) Guy Bresler and Dheeraj Nagaraj. A corrective view of neural networks: Representation, memorization and learning, 2020. URL https://arxiv.org/abs/2002.00274.
  • Brown et al. (2020) Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. Language models are few-shot learners, 2020. URL https://arxiv.org/abs/2005.14165.
  • Bubeck et al. (2020) Sébastien Bubeck, Ronen Eldan, Yin Tat Lee, and Dan Mikulincer. Network size and size of the weights in memorization with two-layers neural networks. Advances in Neural Information Processing Systems, 33:4977–4986, 2020.
  • Bun and Steinke (2016) Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography Conference, pages 635–658. Springer, 2016.
  • Cam (1986) Lucien Le Cam. Asymptotic methods in statistical decision theory. 1986. URL https://api.semanticscholar.org/CorpusID:118432452.
  • Carlini et al. (2021) Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, and Colin Raffel. Extracting training data from large language models, 2021.
  • Chatzikokolakis et al. (2013) Kostas Chatzikokolakis, Miguel Andrés, Nicolás Bordenabe, and Catuscia Palamidessi. Broadening the scope of differential privacy using metrics. 07 2013. ISBN 978-3-642-39076-0. 10.1007/978-3-642-39077-7_5.
  • Chaudhuri and Monteleoni (2008) Kamalika Chaudhuri and Claire Monteleoni. Privacy-preserving logistic regression. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems, volume 21. Curran Associates, Inc., 2008. URL https://proceedings.neurips.cc/paper/2008/file/8065d07da4a77621450aa84fee5656d9-Paper.pdf.
  • Chaudhuri et al. (2009) Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private empirical risk minimization, 2009. URL https://arxiv.org/abs/0912.0071.
  • Dinur and Nissim (2003) Irit Dinur and Kobbi Nissim. Revealing information while preserving privacy. In Proceedings of the Twenty-Second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS ’03, page 202–210, New York, NY, USA, 2003. Association for Computing Machinery. ISBN 1581136706. 10.1145/773153.773173. URL https://doi.org/10.1145/773153.773173.
  • Dwork (2011) Cynthia Dwork. Differential Privacy, pages 338–340. Springer US, Boston, MA, 2011. ISBN 978-1-4419-5906-5. 10.1007/978-1-4419-5906-5_752. URL https://doi.org/10.1007/978-1-4419-5906-5_752.
  • Dwork and Roth (2014) Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., 9(3–4):211–407, aug 2014. ISSN 1551-305X. 10.1561/0400000042. URL https://doi.org/10.1561/0400000042.
  • Dwork and Rothblum (2016) Cynthia Dwork and Guy N. Rothblum. Concentrated differential privacy, 2016.
  • Dwork et al. (2006) Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In Advances in Cryptology (EUROCRYPT 2006), volume 4004 of Lecture Notes in Computer Science, pages 486–503. Springer Verlag, May 2006. URL https://www.microsoft.com/en-us/research/publication/our-data-ourselves-privacy-via-distributed-noise-generation/.
  • Feldman et al. (2018) Vitaly Feldman, Ilya Mironov, Kunal Talwar, and Abhradeep Thakurta. Privacy amplification by iteration. pages 521–532, 10 2018. 10.1109/FOCS.2018.00056.
  • Fernandes et al. (2019) Natasha Fernandes, Mark Dras, and Annabelle McIver. Generalised differential privacy for text document processing. In Flemming Nielson and David Sands, editors, Principles of Security and Trust, pages 123–148, Cham, 2019. Springer International Publishing. ISBN 978-3-030-17138-4.
  • Fernandes et al. (2021) Natasha Fernandes, Annabelle McIver, and Carroll Morgan. The laplace mechanism has optimal utility for differential privacy over continuous queries, 2021. URL https://arxiv.org/abs/2105.07176.
  • Fu et al. (2022) Jie Fu, Zhili Chen, and XinPeng Ling. Sa-dpsgd: Differentially private stochastic gradient descent based on simulated annealing, 2022. URL https://arxiv.org/abs/2211.07218.
  • Garg et al. (2020) Ankit Garg, Tarun Kathuria, and Nikhil Srivastava. Scalar poincaré implies matrix poincaré, 2020.
  • Goldreich et al. (1998) Oded Goldreich, Shari Goldwasser, and Dana Ron. Property testing and its connection to learning and approximation. Journal of the ACM (JACM), 45(4):653–750, 1998.
  • Guo et al. (2022) Chuan Guo, Brian Karrer, Kamalika Chaudhuri, and Laurens van der Maaten. Bounding training data reconstruction in private (deep) learning. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato, editors, Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 8056–8071. PMLR, 17–23 Jul 2022. URL https://proceedings.mlr.press/v162/guo22c.html.
  • Huang and Tropp (2021) De Huang and Joel A Tropp. From poincaré inequalities to nonlinear matrix concentration. 2021.
  • Humphries et al. (2020) Thomas Humphries, Matthew Rafuse, Lindsey Tulloch, Simon Oya, Ian Goldberg, and Florian Kerschbaum. Differentially private learning does not bound membership inference. ArXiv, abs/2010.12112, 2020.
  • Imola et al. (2022) Jacob Imola, Shiva Kasiviswanathan, Stephen White, Abhinav Aggarwal, and Nathanael Teissier. Balancing utility and scalability in metric differential privacy. In UAI 2022, 2022. URL https://www.amazon.science/publications/balancing-utility-and-scalability-in-metric-differential-privacy.
  • Jain et al. (2011) Prateek Jain, Pravesh Kothari, and Abhradeep Thakurta. Differentially private online learning, 2011. URL https://arxiv.org/abs/1109.0105.
  • Kairouz et al. (2015) Peter Kairouz, Sewoong Oh, and Pramod Viswanath. The composition theorem for differential privacy. In International conference on machine learning, pages 1376–1385. PMLR, 2015.
  • Kearns and Vazirani (1994) Michael J Kearns and Umesh Vazirani. An introduction to computational learning theory. MIT press, 1994.
  • Koufogiannis et al. (2017) Fragkiskos Koufogiannis, Shuo Han, and George J. Pappas. Gradual release of sensitive data under differential privacy. Journal of Privacy and Confidentiality, 7(2), Jan. 2017. 10.29012/jpc.v7i2.649. URL https://journalprivacyconfidentiality.org/index.php/jpc/article/view/649.
  • Lecun et al. (1998) Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998. 10.1109/5.726791.
  • Levin and Peres (2017) David A Levin and Yuval Peres. Markov chains and mixing times, volume 107. American Mathematical Soc., 2017.
  • Mahloujifar et al. (2022) Saeed Mahloujifar, Alexandre Sablayrolles, Graham Cormode, and Somesh Jha. Optimal membership inference bounds for adaptive composition of sampled gaussian mechanisms, 2022.
  • McSherry and Talwar (2007) Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS’07), pages 94–103. IEEE, 2007.
  • Mironov (2017) Ilya Mironov. Rényi differential privacy. pages 263–275, 08 2017. 10.1109/CSF.2017.11.
  • Murtagh and Vadhan (2015) Jack Murtagh and Salil Vadhan. The complexity of computing the optimal composition of differential privacy. In Theory of Cryptography Conference, pages 157–175. Springer, 2015.
  • Ramesh et al. (2021) Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. Zero-shot text-to-image generation. In International Conference on Machine Learning, pages 8821–8831. PMLR, 2021.
  • Rudin (1976) Walter Rudin. Principles of mathematical analysis, volume 3. McGraw-hill New York, 1976.
  • Salem et al. (2023) Ahmed Salem, Giovanni Cherubin, David Evans, Boris Köpf, Andrew Paverd, Anshuman Suri, Shruti Tople, and Santiago Zanella-Béguelin. Sok: Let the privacy games begin! a unified treatment of data inference privacy in machine learning, 2023.
  • Shokri (2022) Reza Shokri. Auditing data privacy for machine learning. Santa Clara, CA, February 2022. USENIX Association.
  • Song et al. (2013) Shuang Song, Kamalika Chaudhuri, and Anand D. Sarwate. Stochastic gradient descent with differentially private updates. In 2013 IEEE Global Conference on Signal and Information Processing, pages 245–248, 2013. 10.1109/GlobalSIP.2013.6736861.
  • Stock et al. (2022) Pierre Stock, Igor Shilov, Ilya Mironov, and Alexandre Sablayrolles. Defending against reconstruction attacks with rényi differential privacy, 2022.
  • Tramèr and Boneh (2021) Florian Tramèr and Dan Boneh. Differentially private learning needs better features (or much more data), 2021. URL https://arxiv.org/abs/2011.11660.
  • Tsybakov (2009) Alexandre B. Tsybakov. Introduction to Nonparametric Estimation. Springer Series in Statistics. Springer, 2009. URL https://doi.org/10.1007/b13794.
  • Vadhan (2017) Salil Vadhan. The Complexity of Differential Privacy, pages 347–450. Springer, Yehuda Lindell, ed., 2017. URL https://link.springer.com/chapter/10.1007/978-3-319-57048-8_7.
  • Vardi et al. (2021) Gal Vardi, Gilad Yehudai, and Ohad Shamir. On the optimal memorization power of relu neural networks. arXiv preprint arXiv:2110.03187, 2021.
  • Von Neumann and Morgenstern (1947) John Von Neumann and Oskar Morgenstern. Theory of games and economic behavior, 2nd rev. 1947.
  • Wainwright (2019) Martin J. Wainwright. High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2019. 10.1017/9781108627771.
  • Warner (1965) Stanley L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63–69, 1965. ISSN 01621459. URL http://www.jstor.org/stable/2283137.
  • Yeom et al. (2018) Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha. Privacy risk in machine learning: Analyzing the connection to overfitting. pages 268–282, 07 2018. 10.1109/CSF.2018.00027.
  • Zhu et al. (2019) Ligeng Zhu, Zhijian Liu, and Song Han. Deep leakage from gradients, 2019.

Appendix A Additional Notation and Technical Lemmas

Definition A.1 (Total Variation Distance).

Let P0P_{0} and P1P_{1} be two probability measures on a measurable space (𝒳,𝒜)(\mathcal{X},\mathcal{A}). The Total Variation distance between P0,P1P_{0},P_{1} is given by:

𝖳𝖵(P0,P1)=supT𝒜|P0(T)P1(T)|\mathsf{TV}(P_{0},P_{1})=\sup_{T\in\mathcal{A}}|P_{0}(T)-P_{1}(T)| (8)
Lemma A.2 (Tensorization of Total Variation Affinity).

Let PP and QQ be two probability measures supported on a set 𝒳\mathcal{X}. For any nn\in\mathbb{N}, let PnP^{n} and QnQ^{n} denote the respective product measures of PP and QQ, supported on the set 𝒳n\mathcal{X}^{n}. Then, the following holds:

𝖳𝖵(Pn,Qn)1(1𝖳𝖵(P,Q))n\displaystyle\mathsf{TV}(P^{n},Q^{n})\leq 1-(1-\mathsf{TV}(P,Q))^{n}
Proof A.3.

Let Γ(P,Q)\Gamma(P,Q) denote the set of all couplings of PP and QQ. We make use of the fact that 𝖳𝖵(P,Q)=minCΓ(P,Q)(x,y)C[xy]\mathsf{TV}(P,Q)=\min_{C\in\Gamma(P,Q)}\mathbb{P}_{(x,y)\sim C}[x\neq y] (Levin and Peres, 2017). To this end, let CC^{*} denote the 𝖳𝖵\mathsf{TV}-optimal coupling of PP and QQ, i.e., CΓ(P,Q)C^{*}\in\Gamma(P,Q) such that 𝖳𝖵(P,Q)=(x,y)C[xy]\mathsf{TV}(P,Q)=\mathbb{P}_{(x,y)\sim C^{*}}[x\neq y]. We now construct a coupling of PnP^{n} and QnQ^{n} as follows. Let X=(x1,x2,,xn)X=(x_{1},x_{2},\dots,x_{n}) and Y=(y1,y2,,yn)Y=(y_{1},y_{2},\dots,y_{n}) be two random variables on 𝒳n\mathcal{X}^{n} such that (xi,yi)i.i.dCi[n](x_{i},y_{i})\stackrel{{\scriptstyle i.i.d}}{{\sim}}C^{*}\ \forall i\in[n]. It is easy to see that (X,Y)(X,Y) is a coupling of PnP^{n} and QnQ^{n}. It follows that:

𝖳𝖵(Pn,Qn)\displaystyle\mathsf{TV}(P^{n},Q^{n}) [XY]\displaystyle\leq\mathbb{P}[X\neq Y]
=1[X=Y]\displaystyle=1-\mathbb{P}[X=Y]
=1i=1n[xi=yi]\displaystyle=1-\prod_{i=1}^{n}\mathbb{P}[x_{i}=y_{i}]
=1i=1n(1[xiyi])\displaystyle=1-\prod_{i=1}^{n}(1-\mathbb{P}[x_{i}\neq y_{i}])
=1(1𝖳𝖵(P,Q))n\displaystyle=1-(1-\mathsf{TV}(P,Q))^{n}

where the third step uses the co-ordinatewise i.i.d. structure of (xi,yi)(x_{i},y_{i}) and the last step uses the fact that (xi,yi)(x_{i},y_{i}) are sampled from the 𝖳𝖵\mathsf{TV}-optimal coupling of PP and QQ. Hence, 𝖳𝖵(Pn,Qn)1(1𝖳𝖵(P,Q))n\mathsf{TV}(P^{n},Q^{n})\leq 1-(1-\mathsf{TV}(P,Q))^{n}.
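The bound in Lemma A.2 is easy to check numerically for small product spaces. Below is a minimal sketch, using only the Python standard library, that computes the exact total variation between two Bernoulli product measures by enumeration and compares it with the coupling bound; the Bernoulli parameters are illustrative.

```python
from itertools import product

def tv_bernoulli_product(p, q, n):
    """Exact TV between Bern(p)^n and Bern(q)^n by enumerating {0,1}^n."""
    def prob(bits, r):
        out = 1.0
        for b in bits:
            out *= r if b else (1.0 - r)
        return out
    return 0.5 * sum(abs(prob(x, p) - prob(x, q)) for x in product([0, 1], repeat=n))

p, q = 0.3, 0.5
tv1 = abs(p - q)  # TV(Bern(p), Bern(q)) = |p - q|
for n in range(1, 8):
    exact = tv_bernoulli_product(p, q, n)
    bound = 1 - (1 - tv1) ** n  # Lemma A.2
    assert exact <= bound + 1e-12
    print(f"n={n}: TV(P^n, Q^n) = {exact:.4f} <= 1 - (1 - TV)^n = {bound:.4f}")
```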

Definition A.4 (Binary Hypotheses Testing).

Let P0P_{0} and P1P_{1} be two probability measures and XX be an observation drawn from either P0P_{0} or P1P_{1}. If the objective is to determine which distribution generated the observation XX, it corresponds to the following statistical hypothesis testing problem:

H0(𝗇𝗎𝗅𝗅)\displaystyle H_{0}\mathsf{(null)} :XP0\displaystyle:X\sim P_{0} (9)
H1(𝖺𝗅𝗍𝖾𝗋𝗇𝖺𝗍𝖾)\displaystyle H_{1}\mathsf{(alternate)} :XP1\displaystyle:X\sim P_{1}

A test Ψ:X{0,1}\Psi:X\to\{0,1\} indicates which hypothesis is true.

In a binary hypothesis test, the test Ψ can make two kinds of errors. A type I error (false alarm) corresponds to the case when the null hypothesis is true but rejected, i.e., ℙ_{X∼P_0}[Ψ(X)=1]. A type II error (missed detection) is when the null hypothesis is false but retained, i.e., ℙ_{X∼P_1}[Ψ(X)=0].

Lemma A.5 (Variational Representation of Total Variation (Cam (1986))).

For any distributions P_0 and P_1 on a measurable space (𝒳, 𝒜), we have:

infΨ(XP0[Ψ(X)=1]+XP1[Ψ(X)=0])=1𝖳𝖵(P0,P1)\inf_{\Psi}(\mathbb{P}_{X\sim P_{0}}[\Psi(X)=1]+\mathbb{P}_{X\sim P_{1}}[\Psi(X)=0])=1-\mathsf{TV}(P_{0},P_{1}) (10)

where the inf\inf is over all tests Ψ\Psi.

Definition A.6 (M-ary Hypothesis Test (Wainwright (2019))).

Let P1,,PMP_{1},...,P_{M} be M(2)M(\geq 2) probability distributions such that PjPkj,kP_{j}\ll P_{k}\forall j,k and XX be an observation drawn from any one of P1,,PMP_{1},...,P_{M}. If the objective is to determine which distribution generated the observation XX, it corresponds to an MM-ary hypothesis testing problem, i.e., it is a generalization of Definition A.4 to multiple hypotheses.

Lemma A.7 (Fano’s Inequality for M-ary Hypotheses (Wainwright (2019))).

Let P1,,PMP_{1},...,P_{M} be M(2)M(\geq 2) probability distributions such that PjPkj,kP_{j}\ll P_{k}\forall j,k and XX be an observation drawn from any one of P1,,PMP_{1},...,P_{M}. Let Ψ:X[M]\Psi:X\to[M] be a test that indicates which distribution XX was drawn from, i.e., which of the MM hypotheses is true. Then, we have

infΨmax1jMPj[Ψ(X)j]11/M2j,k=1M𝖪𝖫(Pj,Pk)+log2log(M1)\inf_{\Psi}\max_{1\leq j\leq M}P_{j}[\Psi(X)\neq j]\geq 1-\frac{1/M^{2}\sum_{j,k=1}^{M}\mathsf{KL}(P_{j},P_{k})+\log 2}{\log(M-1)} (11)

where the inf is over all tests Ψ with values in [M].

Lemma A.8 (Proposition 10 of Mironov (2017)).

Let ℳ be an (α,ϵ) Rényi DP learner and let 𝒟 and 𝒟′ be neighboring datasets (i.e., datasets differing in only one entry). Then, for any S ⊆ ℋ, the following holds:

[(𝒟)S]\displaystyle\mathbb{P}[\mathcal{M}(\mathcal{D})\in S] (eϵ[(𝒟)S])(α1)/α\displaystyle\leq\left(e^{\epsilon}\mathbb{P}[\mathcal{M}(\mathcal{D}^{\prime})\in S]\right)^{\nicefrac{{(\alpha-1)}}{{\alpha}}}
[(𝒟)S]\displaystyle\mathbb{P}[\mathcal{M}(\mathcal{D}^{\prime})\in S] (eϵ[(𝒟)S])(α1)/α\displaystyle\leq\left(e^{\epsilon}\mathbb{P}[\mathcal{M}(\mathcal{D})\in S]\right)^{\nicefrac{{(\alpha-1)}}{{\alpha}}}
Lemma A.9 (KL Upper Bounds for Metric DP Learners).

Let \mathcal{M} be an ϵ\epsilon metric differentially private learner. Then, for any two datasets 𝒟,𝒟\mathcal{D},\mathcal{D}^{\prime}, the KL divergence between the output distributions (𝒟)\mathcal{M}(\mathcal{D}) and (𝒟)\mathcal{M}(\mathcal{D}^{\prime}) is upper bounded as follows:

𝖪𝖫[(𝒟)||(𝒟)]ϵ2ρ(𝒟,𝒟)22\displaystyle\mathsf{KL}[\mathcal{M}(\mathcal{D})||\mathcal{M}(\mathcal{D}^{\prime})]\leq\frac{\epsilon^{2}\rho(\mathcal{D},\mathcal{D}^{\prime})^{2}}{2}
Proof A.10.

This follows from Lemma 3.8 of Dwork and Rothblum (2016) by replacing ϵ with ϵ·ρ(𝒟, 𝒟′).

Appendix B Analysis of Data Reconstruction for (ϵ,δ)(\epsilon,\delta) DP

B.1 Proof of Lemma 6.1

Proof B.1.

Let S ⊆ ℋ be a measurable set attaining the supremum in Definition A.1; such a set always exists (e.g., the set on which the density of ℳ(𝒟) dominates that of ℳ(𝒟′)). Then,

𝖳𝖵(ℳ(𝒟), ℳ(𝒟′)) = ℙ[ℳ(𝒟) ∈ S] − ℙ[ℳ(𝒟′) ∈ S] (12)

Write x = ℙ[ℳ(𝒟) ∈ S] and y = ℙ[ℳ(𝒟′) ∈ S]. Since ℳ is an (ϵ,δ) DP mechanism and 𝒟, 𝒟′ are neighboring datasets, applying the privacy constraint to both S and its complement S^c yields the following constraints:

x ≤ e^ϵ y + δ,  y ≤ e^ϵ x + δ
1 − x ≤ e^ϵ(1 − y) + δ,  1 − y ≤ e^ϵ(1 − x) + δ (13)

From (12) and (13), it is easy to see that 𝖳𝖵(ℳ(𝒟), ℳ(𝒟′)) is upper bounded by the value of the following linear program:

max x − y
subject to 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and the four constraints in (13) (14)

Rearranging the first constraint in (13) gives x − e^ϵ y ≤ δ, and rearranging the fourth gives e^ϵ x − y ≤ e^ϵ − 1 + δ. Adding these two inequalities yields

(e^ϵ + 1)(x − y) ≤ e^ϵ − 1 + 2δ

so that every feasible point of (14) satisfies x − y ≤ (e^ϵ − 1 + 2δ)/(e^ϵ + 1) = 1 − 2(1−δ)/(e^ϵ + 1). Moreover, this value is attained: one may check that x = 1 − (1−δ)/(e^ϵ+1) and y = (1−δ)/(e^ϵ+1) (which satisfy 0 ≤ x, y ≤ 1 since δ ∈ [0,1] and ϵ ≥ 0) are feasible for (14), with the first and fourth constraints in (13) holding with equality. We conclude that

𝖳𝖵(ℳ(𝒟), ℳ(𝒟′)) ≤ 1 − 2(1−δ)/(e^ϵ + 1)
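The linear program (14) can also be solved numerically as a sanity check on the closed form. A minimal sketch, assuming scipy is available; the parameter values are illustrative.

```python
import math
from scipy.optimize import linprog

def tv_upper_bound_lp(eps, delta):
    """Solve max x - y subject to the four (eps, delta)-DP constraints of (14)."""
    e = math.exp(eps)
    c = [-1.0, 1.0]  # linprog minimizes c @ [x, y]; maximizing x - y means c = (-1, 1)
    A_ub = [
        [1.0, -e],   # x - e^eps * y <= delta            (DP applied to S)
        [-e, 1.0],   # y - e^eps * x <= delta            (DP applied to S, swapped)
        [-1.0, e],   # e^eps * y - x <= e^eps - 1 + delta (DP applied to S^c)
        [e, -1.0],   # e^eps * x - y <= e^eps - 1 + delta (DP applied to S^c, swapped)
    ]
    b_ub = [delta, delta, e - 1 + delta, e - 1 + delta]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1), (0, 1)])
    return -res.fun

eps, delta = 1.0, 0.05
print(tv_upper_bound_lp(eps, delta))                    # LP optimum
print(1 - 2 * (1 - delta) / (math.exp(eps) + 1))        # closed form of Lemma 6.1
```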

B.2 Lower Bound for (ϵ,δ)(\epsilon,\delta) DP : Proof of Theorem 4.1

Proof B.2.

Let z_1, z_2 ∈ 𝒵 be two points at maximal separation in 𝒵, i.e., ρ(z_1, z_2) = diam(𝒵). Note that by compactness of 𝒵, such a pair is guaranteed to exist, and since 𝒵 is not a singleton, z_1 ≠ z_2. Let 𝒜 be any arbitrary reconstruction adversary and let Δ = diam(𝒵)/2. For any z ∈ 𝒵, let P_z denote the output distribution of ℳ(𝒟₋ ∪ {z}) and let P^n_z be its associated product measure (i.e., P^n_z = ⊗_{i=1}^n P_z). Note that this means h_1, …, h_n i.i.d.∼ P_z is equivalent to h_{1:n} ∼ P^n_z, where we use h_{1:n} as a shorthand for h_1, …, h_n.

By Markov’s Inequality, the following holds for any z𝒵z\in\mathcal{Z}

𝔼h1,,hn𝗂.𝗂.𝖽(𝒟{z})[ρ(𝒜(h1:n),z)2]\displaystyle\mathbb{E}_{h_{1},\dots,h_{n}\stackrel{{\scriptstyle\mathsf{i.i.d}}}{{\sim}}\mathcal{M}(\mathcal{D_{-}}\cup\{z\})}[\rho(\mathcal{A}(h_{1:n}),z)^{2}] Δ2h1,,hn𝗂.𝗂.𝖽(𝒟{z})[ρ(𝒜(h1:n),z)Δ]\displaystyle\geq\Delta^{2}\mathbb{P}_{h_{1},\dots,h_{n}\stackrel{{\scriptstyle\mathsf{i.i.d}}}{{\sim}}\mathcal{M}(\mathcal{D_{-}}\cup\{z\})}[\rho(\mathcal{A}(h_{1:n}),z)\geq\Delta]
=Δ2Pzn[ρ(𝒜(h1:n),z)Δ]\displaystyle=\Delta^{2}P^{n}_{z}[\rho(\mathcal{A}(h_{1:n}),z)\geq\Delta]

Taking the supremum over 𝒵\mathcal{Z} on both sides,

sup_{z∈𝒵} 𝔼_{h_{1:n} i.i.d.∼ ℳ(𝒟₋∪{z})}[ρ(𝒜(h_{1:n}), z)²] ≥ Δ² sup_{z∈𝒵} P^n_z[ρ(𝒜(h_{1:n}), z) ≥ Δ]
≥ (Δ²/2) [P^n_{z_1}[ρ(𝒜(h_{1:n}), z_1) ≥ Δ] + P^n_{z_2}[ρ(𝒜(h_{1:n}), z_2) ≥ Δ]] (15)

Suppose the event ρ(𝒜(h1:n),z1)ρ(𝒜(h1:n),z2)\rho(\mathcal{A}(h_{1:n}),z_{1})\geq\rho(\mathcal{A}(h_{1:n}),z_{2}) were true. Then, the following would hold by an application of the triangle inequality :

ρ(𝒜(h1:n),z1)\displaystyle\rho(\mathcal{A}(h_{1:n}),z_{1}) ρ(𝒜(h1:n),z2)\displaystyle\geq\rho(\mathcal{A}(h_{1:n}),z_{2})
ρ(z1,z2)ρ(𝒜(h1:n),z1)\displaystyle\geq\rho(z_{1},z_{2})-\rho(\mathcal{A}(h_{1:n}),z_{1})
=2Δρ(𝒜(h1:n),z1)\displaystyle=2\Delta-\rho(\mathcal{A}(h_{1:n}),z_{1})

Thus, ρ(𝒜(h_{1:n}), z_1) ≥ ρ(𝒜(h_{1:n}), z_2) implies ρ(𝒜(h_{1:n}), z_1) ≥ Δ, and hence P^n_1[ρ(𝒜(h_{1:n}), z_1) ≥ Δ] ≥ P^n_1[ρ(𝒜(h_{1:n}), z_1) ≥ ρ(𝒜(h_{1:n}), z_2)], where we abbreviate P^n_{z_1}, P^n_{z_2} as P^n_1, P^n_2. By parallel reasoning, P^n_2[ρ(𝒜(h_{1:n}), z_2) ≥ Δ] ≥ P^n_2[ρ(𝒜(h_{1:n}), z_2) ≥ ρ(𝒜(h_{1:n}), z_1)]. Substituting into (15), we obtain

supz𝒵𝔼h1,,hn𝗂.𝗂.𝖽(𝒟{z})[ρ(𝒜(h1:n),z)2]\displaystyle\sup_{z\in\mathcal{Z}}\mathbb{E}_{h_{1},\dots,h_{n}\stackrel{{\scriptstyle\mathsf{i.i.d}}}{{\sim}}\mathcal{M}(\mathcal{D_{-}}\cup\{z\})}[\rho(\mathcal{A}(h_{1:n}),z)^{2}] Δ22P1n[ρ(𝒜(h1:n),z1)ρ(𝒜(h1:n),z2)]\displaystyle\geq\tfrac{\Delta^{2}}{2}P^{n}_{1}[\rho(\mathcal{A}(h_{1:n}),z_{1})\geq\rho(\mathcal{A}(h_{1:n}),z_{2})]
+Δ22P2n[ρ(𝒜(h1:n),z1)ρ(𝒜(h1:n),z2)]\displaystyle+\tfrac{\Delta^{2}}{2}P^{n}_{2}[\rho(\mathcal{A}(h_{1:n}),z_{1})\leq\rho(\mathcal{A}(h_{1:n}),z_{2})]
Δ22P1n[ρ(𝒜(h1:n),z1)ρ(𝒜(h1:n),z2)]\displaystyle\geq\tfrac{\Delta^{2}}{2}P^{n}_{1}[\rho(\mathcal{A}(h_{1:n}),z_{1})\geq\rho(\mathcal{A}(h_{1:n}),z_{2})]
+Δ22P2n[ρ(𝒜(h1:n),z1)<ρ(𝒜(h1:n),z2)]\displaystyle+\tfrac{\Delta^{2}}{2}P^{n}_{2}[\rho(\mathcal{A}(h_{1:n}),z_{1})<\rho(\mathcal{A}(h_{1:n}),z_{2})]
Δ22Δ22P1n[ρ(𝒜(h1:n),z1)<ρ(𝒜(h1:n),z2)]\displaystyle\geq\tfrac{\Delta^{2}}{2}-\tfrac{\Delta^{2}}{2}P^{n}_{1}[\rho(\mathcal{A}(h_{1:n}),z_{1})<\rho(\mathcal{A}(h_{1:n}),z_{2})]
+Δ22P2n[ρ(𝒜(h1:n),z1)<ρ(𝒜(h1:n),z2)]\displaystyle+\tfrac{\Delta^{2}}{2}P^{n}_{2}[\rho(\mathcal{A}(h_{1:n}),z_{1})<\rho(\mathcal{A}(h_{1:n}),z_{2})]
Δ22[1𝖳𝖵(P1n,P2n)]\displaystyle\geq\tfrac{\Delta^{2}}{2}[1-\mathsf{TV}(P^{n}_{1},P^{n}_{2})]
Δ22[1𝖳𝖵(P1,P2)]n\displaystyle\geq\tfrac{\Delta^{2}}{2}[1-\mathsf{TV}(P_{1},P_{2})]^{n}
=Δ22(1𝖳𝖵((𝒟{z1}),(𝒟{z2})))n\displaystyle=\tfrac{\Delta^{2}}{2}(1-\mathsf{TV}(\mathcal{M}(\mathcal{D_{-}}\cup\{z_{1}\}),\mathcal{M}(\mathcal{D_{-}}\cup\{z_{2}\})))^{n}
Δ22(2(1δ)eϵ+1)n\displaystyle\geq\tfrac{\Delta^{2}}{2}\left(\frac{2(1-\delta)}{e^{\epsilon}+1}\right)^{n}

where the fifth inequality applies Lemma A.2 and the last inequality applies Lemma 6.1 using the fact that 𝒟{z1}\mathcal{D_{-}}\cup\{z_{1}\} and 𝒟{z2}\mathcal{D_{-}}\cup\{z_{2}\} are datasets differing only in one point. Now, taking an infimum over all adversaries 𝒜\mathcal{A} in the LHS, we obtain

𝒱(,n)𝖽𝗂𝖺𝗆(𝒵)28(2(1δ)eϵ+1)n\displaystyle\mathcal{V}(\mathcal{M},n)\geq\frac{\mathsf{diam}(\mathcal{Z})^{2}}{8}\left(\frac{2(1-\delta)}{e^{\epsilon}+1}\right)^{n}

By definition of 𝒱(ℳ, n), if an adversary attains an expected squared reconstruction error of at most β² for every possible target point z ∈ 𝒵, then β² must be at least 𝒱(ℳ, n), i.e., the following must hold:

β² ≥ 𝒱(ℳ, n) ≥ (diam(𝒵)²/8) (2(1−δ)/(e^ϵ+1))^n

Rearranging appropriately, we obtain the required query complexity lower bound.
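Explicitly, rearranging gives n ≥ ln(diam(𝒵)²/(8β²)) / ln((e^ϵ+1)/(2(1−δ))). A small illustrative helper (all parameter values below are hypothetical):

```python
import math

def query_lower_bound(eps, delta, diam, beta):
    """Lower bound on the number of queries n needed before the expected
    squared reconstruction error can drop below beta^2 (Theorem 4.1, rearranged)."""
    numerator = math.log(diam**2 / (8 * beta**2))
    denominator = math.log((math.exp(eps) + 1) / (2 * (1 - delta)))
    return math.ceil(numerator / denominator)

# Example: a unit-diameter space and target error beta = 0.01.
for eps in (0.1, 1.0, 4.0):
    print(eps, query_lower_bound(eps, delta=1e-5, diam=1.0, beta=0.01))
```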

B.3 Proof of Lemma 6.2

Proof B.3.

The proof of this lemma parallels that of Lemma 6.1. As before, let S ⊆ ℋ be a measurable set attaining the supremum in Definition A.1, so that

𝖳𝖵(ℳ(𝒟), ℳ(𝒟′)) = ℙ[ℳ(𝒟) ∈ S] − ℙ[ℳ(𝒟′) ∈ S]

Note that since ℳ is an (α,ϵ) Rényi DP mechanism and 𝒟, 𝒟′ are neighboring datasets, the following constraints must be satisfied as per Lemma A.8:

[(𝒟)S]\displaystyle\mathbb{P}[\mathcal{M}(\mathcal{D})\in S] (eϵ[(𝒟)S])γ\displaystyle\leq\left(e^{\epsilon}\mathbb{P}[\mathcal{M}(\mathcal{D}^{\prime})\in S]\right)^{\gamma}
[(𝒟)S]\displaystyle\mathbb{P}[\mathcal{M}(\mathcal{D}^{\prime})\in S] (eϵ[(𝒟)S])γ\displaystyle\leq\left(e^{\epsilon}\mathbb{P}[\mathcal{M}(\mathcal{D})\in S]\right)^{\gamma}

where γ = (α−1)/α. Note that since α > 1, we have 0 < γ < 1. Subsequently, it is easy to see that 𝖳𝖵(ℳ(𝒟), ℳ(𝒟′)) can be upper bounded by the value of the following optimization problem:

max x − y
subject to 0x1,\displaystyle 0\leq x\leq 1,
0y1,\displaystyle 0\leq y\leq 1,
x ≤ e^{γϵ} y^γ
y ≤ e^{γϵ} x^γ (16)

We now show that x=eγϵ1+eγϵx=\frac{e^{\gamma\epsilon}}{1+e^{\gamma\epsilon}} and y=1(1+eγϵ)1/γy=\frac{1}{(1+e^{\gamma\epsilon})^{\nicefrac{{1}}{{\gamma}}}} lie in the constraint set of the above problem. We first note that since ϵ>0\epsilon>0 and 0<γ<10<\gamma<1, 0x10\leq x\leq 1 and 0y10\leq y\leq 1. Hence, the first two constraints are satisfied. Furthermore, eγϵyγ=eγϵ1+eγϵ=xe^{\gamma\epsilon}y^{\gamma}=\tfrac{e^{\gamma\epsilon}}{1+e^{\gamma\epsilon}}=x, i.e., the third constraint is satisfied with equality. The final constraint can be verified by observing that:

1(1+eγϵ)1/γγ1e(γ+γ2)ϵ\displaystyle\frac{1}{(1+e^{\gamma\epsilon})^{\nicefrac{{1}}{{\gamma}}-\gamma}}\leq 1\leq e^{(\gamma+\gamma^{2})\epsilon}

where we use the fact that 1/γγ0\nicefrac{{1}}{{\gamma}}-\gamma\geq 0 as γ<1\gamma<1. Rearranging terms, it follows that:

y = (1+e^{γϵ})^{−1/γ} ≤ e^{γϵ} · e^{γ²ϵ}/(1+e^{γϵ})^γ = e^{γϵ} x^γ

Hence, the final constraint is satisfied. Thus, the value of the optimization problem (16) is given by x − y, and hence,

𝖳𝖵(ℳ(𝒟), ℳ(𝒟′)) ≤ x − y = 1 − 1/(1+e^{γϵ}) − 1/(1+e^{γϵ})^{1/γ}
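The feasibility of the point (x, y) used above is easy to verify numerically. A minimal sketch; the choice of α and ϵ is illustrative.

```python
import math

def check_feasibility(alpha, eps):
    """Verify that (x, y) from the proof of Lemma 6.2 satisfies the
    constraints of (16), and report the implied TV upper bound x - y."""
    gamma = (alpha - 1) / alpha
    x = math.exp(gamma * eps) / (1 + math.exp(gamma * eps))
    y = (1 + math.exp(gamma * eps)) ** (-1.0 / gamma)
    c3 = x <= math.exp(gamma * eps) * y**gamma + 1e-12   # binds with equality
    c4 = y <= math.exp(gamma * eps) * x**gamma + 1e-12
    return c3, c4, x - y

print(check_feasibility(alpha=2.0, eps=1.0))
```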

B.4 (α,ϵ) Rényi DP: Proof of Corollary 4.4

As suggested by the proof sketch in Section 6.1, we follow the exact same steps as the proof of Theorem 4.1 in Section B.2 and obtain the following:

𝒱(,n)\displaystyle\mathcal{V}(\mathcal{M},n) 𝖽𝗂𝖺𝗆(𝒵)28(1𝖳𝖵((𝒟{z1}),(𝒟{z2})))n\displaystyle\geq\tfrac{\mathsf{diam}(\mathcal{Z})^{2}}{8}(1-\mathsf{TV}(\mathcal{M}(\mathcal{D_{-}}\cup\{z_{1}\}),\mathcal{M}(\mathcal{D_{-}}\cup\{z_{2}\})))^{n}

Substituting the TV upper bound obtained in Lemma 6.2, we obtain the following:

𝒱(,n)\displaystyle\mathcal{V}(\mathcal{M},n) 𝖽𝗂𝖺𝗆(𝒵)28[1eγϵ+1+1(eγϵ+1)1/γ]n\displaystyle\geq\tfrac{\mathsf{diam}(\mathcal{Z})^{2}}{8}\left[\frac{1}{e^{\gamma\epsilon}+1}+\frac{1}{\left(e^{\gamma\epsilon}+1\right)^{\nicefrac{{1}}{{\gamma}}}}\right]^{n}

where γ = 1 − 1/α. Following the same arguments as Theorem 4.1, the query complexity lower bound is obtained by upper bounding 𝒱(ℳ, n) with β² and rearranging the expression to get a lower bound on n.

B.5 Reconstruction Upper Bound : Proof of Theorem 4.2

Since 𝒵={z1,z2}\mathcal{Z}=\{z_{1},z_{2}\}, 𝖽𝗂𝖺𝗆(𝒵)=ρ(z1,z2)\mathsf{diam}(\mathcal{Z})=\rho(z_{1},z_{2}). Let z𝒵z\in\mathcal{Z} be the true sample, i.e., 𝒟=𝒟{z}\mathcal{D}=\mathcal{D_{-}}\cup\{z\}. Our construction of \mathcal{M} is a variant of the randomized response mechanism, defined as follows:

  1. Flip a coin C ∼ 𝖡𝖾𝗋𝗇𝗈𝗎𝗅𝗅𝗂(δ). If C = 1, then ℳ(𝒟) := (1, z).

  2. Otherwise, if C = 0, then ℳ(𝒟) is defined as follows:

     ℳ(𝒟) := { (0, z) with probability e^ϵ/(1+e^ϵ) ; (0, 𝒵∖{z}) with probability 1/(1+e^ϵ) }

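A minimal Python sketch of this mechanism (the two-point space is encoded as a pair of labels; the names mechanism, z, other are illustrative):

```python
import math
import random

def mechanism(z, other, eps, delta):
    """One draw from M(D_- ∪ {z}) over the two-point space {z, other},
    following the construction above. Returns a (C, point) pair."""
    if random.random() < delta:                            # step 1: C = 1, release z in the clear
        return (1, z)
    if random.random() < math.exp(eps) / (1 + math.exp(eps)):
        return (0, z)                                      # step 2: randomized response, truthful
    return (0, other)                                      # step 2: randomized response, flipped
```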
Claim : \mathcal{M} is (ϵ,δ)(\epsilon,\delta) DP

Consider any x,y𝒵x,y\in\mathcal{Z}. Let 𝒟x=𝒟{x}\mathcal{D}_{x}=\mathcal{D_{-}}\cup\{x\} and 𝒟y=𝒟{y}\mathcal{D}_{y}=\mathcal{D_{-}}\cup\{y\}. Let CxC_{x} refer to the 𝖡𝖾𝗋𝗇𝗈𝗎𝗅𝗅𝗂(δ)\mathsf{Bernoulli}(\delta) coin flip in step 1 of (𝒟x)\mathcal{M}(\mathcal{D}_{x}) and CyC_{y} denote the same for (𝒟y)\mathcal{M}(\mathcal{D}_{y}). Now, denote the events Ex={Cx=0}E_{x}=\{C_{x}=0\} and Ey={Cy=0}E_{y}=\{C_{y}=0\}. Note that, [Ex]=[Ey]=1δ\mathbb{P}[E_{x}]=\mathbb{P}[E_{y}]=1-\delta. Furthermore, when conditioned on ExE_{x}, (𝒟x)|Ex\mathcal{M}(\mathcal{D}_{x})|_{E_{x}} is the randomized response mechanism and the same holds for (𝒟y)|Ey\mathcal{M}(\mathcal{D}_{y})|_{E_{y}} conditioned on EyE_{y}. In particular, the following holds true for any S{0,1}×𝒵S\subseteq\{0,1\}\times\mathcal{Z}:

[(𝒟x)S|Ex]\displaystyle\mathbb{P}[\mathcal{M}(\mathcal{D}_{x})\in S|E_{x}] eϵ[(𝒟y)S|Ey]\displaystyle\leq e^{\epsilon}\mathbb{P}[\mathcal{M}(\mathcal{D}_{y})\in S|E_{y}]
[(𝒟y)S|Ey]\displaystyle\mathbb{P}[\mathcal{M}(\mathcal{D}_{y})\in S|E_{y}] eϵ[(𝒟x)S|Ex]\displaystyle\leq e^{\epsilon}\mathbb{P}[\mathcal{M}(\mathcal{D}_{x})\in S|E_{x}] (17)

Using the fact that for any two events A,BA,B, [A|B][B][A]=[A|B][B]+[A|Bc][Bc][A|B][B]+[Bc]\mathbb{P}[A|B]\mathbb{P}[B]\leq\mathbb{P}[A]=\mathbb{P}[A|B]\mathbb{P}[B]+\mathbb{P}[A|B^{c}]\mathbb{P}[B^{c}]\leq\mathbb{P}[A|B]\mathbb{P}[B]+\mathbb{P}[B^{c}], we conclude the following

[(𝒟x)S]\displaystyle\mathbb{P}[\mathcal{M}(\mathcal{D}_{x})\in S] [(𝒟x)S|Ex][Ex]+[Exc]\displaystyle\leq\mathbb{P}[\mathcal{M}(\mathcal{D}_{x})\in S|E_{x}]\mathbb{P}[E_{x}]+\mathbb{P}[E^{c}_{x}]
=[(𝒟x)S|Ex](1δ)+δ\displaystyle=\mathbb{P}[\mathcal{M}(\mathcal{D}_{x})\in S|E_{x}](1-\delta)+\delta
eϵ[(𝒟y)S|Ey](1δ)+δ\displaystyle\leq e^{\epsilon}\mathbb{P}[\mathcal{M}(\mathcal{D}_{y})\in S|E_{y}](1-\delta)+\delta
=eϵ[(𝒟y)S|Ey][Ey]+δ\displaystyle=e^{\epsilon}\mathbb{P}[\mathcal{M}(\mathcal{D}_{y})\in S|E_{y}]\mathbb{P}[E_{y}]+\delta
eϵ[(𝒟y)S]+δ\displaystyle\leq e^{\epsilon}\mathbb{P}[\mathcal{M}(\mathcal{D}_{y})\in S]+\delta

Repeating the same argument with (𝒟y)\mathcal{M}(\mathcal{D}_{y}), we infer that [(𝒟y)S]eϵ[(𝒟x)S]+δ\mathbb{P}[\mathcal{M}(\mathcal{D}_{y})\in S]\leq e^{\epsilon}\mathbb{P}[\mathcal{M}(\mathcal{D}_{x})\in S]+\delta. Hence, \mathcal{M} is (ϵ,δ)(\epsilon,\delta) differentially private.

Reconstruction Adversary

Given nn i.i.d samples hi=(Ci,zi)𝗂.𝗂.𝖽(𝒟),i[n]h_{i}=(C_{i},z_{i})\stackrel{{\scriptstyle\mathsf{i.i.d}}}{{\sim}}\mathcal{M}(\mathcal{D}),\ i\in[n], the reconstruction adversary is defined as follows. Firstly, if there exists any i[n]i\in[n] such that Ci=1C_{i}=1, 𝒜(h1:n)=zi\mathcal{A}(h_{1:n})=z_{i}. Under this event, our construction of \mathcal{M} ensures that the adversary does not incur any reconstruction error, i.e. 𝒜(h1:n)=z\mathcal{A}(h_{1:n})=z. On the contrary, if Ci=0i[n]C_{i}=0\ \forall i\in[n], the adversary 𝒜\mathcal{A} uses the following randomized strategy: It first constructs nn random bits bib_{i} defined as:

b_i ∼ 𝖡𝖾𝗋𝗇(1−e^{−ϵ}) if h_i = (0, z), and b_i = 0 if h_i = (0, 𝒵∖{z})

It then uses the random bits bib_{i} to output a reconstruction as follows:

𝒜 := { z if ∨_{i=1}^{n} b_i = 1 ; 𝒵∖{z} otherwise }

where ∨ denotes the binary OR operator. In summary, conditioned on the event 𝒞 = {C_i = 0 ∀ i ∈ [n]}, the adversary returns the true data point z if b_i = 1 for at least one i ∈ [n], and returns 𝒵∖{z} otherwise. Conversely, when conditioned on the event 𝒞^c = {∃ i ∈ [n] s.t. C_i = 1}, the adversary outputs the true data point z almost surely (by construction of ℳ).

Note that, when conditioned on 𝒞\mathcal{C}, bi𝗂.𝗂.𝖽𝖡𝖾𝗋𝗇𝗈𝗎𝗅𝗅𝗂(1p)b_{i}\stackrel{{\scriptstyle\mathsf{i.i.d}}}{{\sim}}\mathsf{Bernoulli}(1-p) where p=11+eϵ+eϵeϵ1+eϵ=2(1+eϵ)p=\tfrac{1}{1+e^{\epsilon}}+e^{-\epsilon}\cdot\tfrac{e^{\epsilon}}{1+e^{\epsilon}}=\tfrac{2}{(1+e^{\epsilon})}. Furthermore, by definition of \mathcal{M}, [𝒞]=(1δ)n\mathbb{P}[\mathcal{C}]=(1-\delta)^{n}. Now, define the event \mathcal{B} as ={bi=0i[n]}\mathcal{B}=\{b_{i}=0\ \forall\ i\in[n]\}. Clearly, [|𝒞]=pn\mathbb{P}[\mathcal{B}|\mathcal{C}]=p^{n}. It follows that

𝔼h1:n𝗂.𝗂.𝖽(D)[ρ(𝒜(h1:n),z)2]\displaystyle\mathbb{E}_{h_{1:n}\stackrel{{\scriptstyle\mathsf{i.i.d}}}{{\sim}}\mathcal{M}(D)}[\rho(\mathcal{A}(h_{1:n}),z)^{2}] =𝔼h1:n𝗂.𝗂.𝖽(D)[ρ(𝒜(h1:n),z)2|𝒞][𝒞]+𝔼h1:n𝗂.𝗂.𝖽(D)[ρ(𝒜(h1:n),z)2|𝒞c][𝒞c]\displaystyle=\mathbb{E}_{h_{1:n}\stackrel{{\scriptstyle\mathsf{i.i.d}}}{{\sim}}\mathcal{M}(D)}[\rho(\mathcal{A}(h_{1:n}),z)^{2}|\mathcal{C}]\mathbb{P}[\mathcal{C}]+\mathbb{E}_{h_{1:n}\stackrel{{\scriptstyle\mathsf{i.i.d}}}{{\sim}}\mathcal{M}(D)}[\rho(\mathcal{A}(h_{1:n}),z)^{2}|\mathcal{C}^{c}]\mathbb{P}[\mathcal{C}^{c}]
=(1δ)n𝔼h1:n𝗂.𝗂.𝖽(D)[ρ(𝒜(h1:n),z)2|𝒞]\displaystyle=(1-\delta)^{n}\mathbb{E}_{h_{1:n}\stackrel{{\scriptstyle\mathsf{i.i.d}}}{{\sim}}\mathcal{M}(D)}[\rho(\mathcal{A}(h_{1:n}),z)^{2}|\mathcal{C}]
=(1δ)n𝔼h1:n𝗂.𝗂.𝖽(D)[ρ(𝒜(h1:n),z)2|,𝒞][|𝒞]\displaystyle=(1-\delta)^{n}\mathbb{E}_{h_{1:n}\stackrel{{\scriptstyle\mathsf{i.i.d}}}{{\sim}}\mathcal{M}(D)}[\rho(\mathcal{A}(h_{1:n}),z)^{2}|\mathcal{B},\mathcal{C}]\mathbb{P}[\mathcal{B}|\mathcal{C}]
+(1δ)n𝔼h1:n𝗂.𝗂.𝖽(D)[ρ(𝒜(h1:n),z)2|c,𝒞][c|𝒞]\displaystyle+(1-\delta)^{n}\mathbb{E}_{h_{1:n}\stackrel{{\scriptstyle\mathsf{i.i.d}}}{{\sim}}\mathcal{M}(D)}[\rho(\mathcal{A}(h_{1:n}),z)^{2}|\mathcal{B}^{c},\mathcal{C}]\mathbb{P}[\mathcal{B}^{c}|\mathcal{C}]
=(1δ)n𝔼h1:n𝗂.𝗂.𝖽(D)[ρ(𝒜(h1:n),z)2|,𝒞][|𝒞]\displaystyle=(1-\delta)^{n}\mathbb{E}_{h_{1:n}\stackrel{{\scriptstyle\mathsf{i.i.d}}}{{\sim}}\mathcal{M}(D)}[\rho(\mathcal{A}(h_{1:n}),z)^{2}|\mathcal{B},\mathcal{C}]\mathbb{P}[\mathcal{B}|\mathcal{C}]
=(1δ)npn𝖽𝗂𝖺𝗆(𝒵)2\displaystyle=(1-\delta)^{n}p^{n}\mathsf{diam}(\mathcal{Z})^{2}
=(2(1δ)eϵ+1)n𝖽𝗂𝖺𝗆(𝒵)2\displaystyle=\left(\frac{2(1-\delta)}{e^{\epsilon}+1}\right)^{n}\mathsf{diam}(\mathcal{Z})^{2} (18)

where the second equality uses the fact that [𝒜(h1:n)=z|𝒞c]=1\mathbb{P}[\mathcal{A}(h_{1:n})=z|\mathcal{C}^{c}]=1, the fourth equality uses the fact that [𝒜(h1:n)=z|c,𝒞]=1\mathbb{P}[\mathcal{A}(h_{1:n})=z|\mathcal{B}^{c},\mathcal{C}]=1 and the fifth equality uses the fact that [𝒜(h1:n)=𝒵{z}|,𝒞]=1\mathbb{P}[\mathcal{A}(h_{1:n})=\mathcal{Z}\setminus\{z\}|\mathcal{B},\mathcal{C}]=1.

To make the RHS of Equation (18) equal to β², it suffices to set n = Θ(ln(diam(𝒵)²/β²) / ln((e^ϵ+1)/(2(1−δ)))).
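Putting the pieces together, Equation (18) can be checked by Monte Carlo simulation. A self-contained sketch; the mechanism is the randomized-response construction above, the adversary rule is stated relative to the true point z exactly as in the text, and all parameter values are illustrative.

```python
import math
import random

def mechanism(z, other, eps, delta):
    if random.random() < delta:
        return (1, z)
    if random.random() < math.exp(eps) / (1 + math.exp(eps)):
        return (0, z)
    return (0, other)

def adversary(samples, z, other, eps):
    """The randomized reconstruction rule above; used here only to
    estimate the expected reconstruction error, not as a practical attack."""
    for c, pt in samples:
        if c == 1:                                 # event C^c: exact recovery
            return pt
    # event C: rerandomize the truthful randomized-response outputs (the bits b_i)
    hit = any(pt == z and random.random() < 1 - math.exp(-eps)
              for _, pt in samples)
    return z if hit else other

eps, delta, n, trials = 1.0, 0.05, 5, 200_000
errors = sum(adversary([mechanism("z1", "z2", eps, delta) for _ in range(n)],
                       "z1", "z2", eps) != "z1" for _ in range(trials))
predicted = (2 * (1 - delta) / (math.exp(eps) + 1)) ** n   # RHS of (18) / diam^2
print(errors / trials, predicted)                          # the two should agree
```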

Validity of the Adversary 𝒜\mathcal{A}

We note that 𝒜\mathcal{A} is a valid adversary in the set of all data reconstruction adversaries. In particular, it satisfies the desiderata put forward in Section 3, and its construction is consistent with the statement of Theorem 4.2, i.e., for every z𝒵z\in\mathcal{Z} and β0\beta\geq 0 there exists a private learner \mathcal{M} and adversary 𝒜\mathcal{A} that needs Θ(ln(𝖽𝗂𝖺𝗆(𝒵)2/β2)ln(eϵ+12(1δ)))\Theta\left(\tfrac{\ln(\nicefrac{{\mathsf{diam}(\mathcal{Z})^{2}}}{{\beta^{2}}})}{\ln\left(\tfrac{e^{\epsilon}+1}{2(1-\delta)}\right)}\right) queries to \mathcal{M} for achieving a reconstruction error of β2\beta^{2}. In particular, 𝒜\mathcal{A} takes nn i.i.d samples h1:n(𝒟)h_{1:n}\sim\mathcal{M}(\mathcal{D}) and outputs a reconstruction 𝒜(h1:n)𝒵\mathcal{A}(h_{1:n})\in\mathcal{Z} as per a randomized decision rule, which is completely in agreement with our modelling assumptions in Section 3 (note that our threat model is not restricted to deterministic adversaries) and of prior work (Guo et al. (2022); Balle et al. (2022)). Finally, the fact that 𝒜\mathcal{A} has prior knowledge of the values of the privacy parameters (ϵ,δ)(\epsilon,\delta) also does not lead to any inconsistency since the learner’s privacy parameters (ϵ,δ)(\epsilon,\delta) are almost always publicly released (see Definition 1 of Dwork (2011)).

Next, we construct a deterministic majority-based adversary and establish its tightness.

Tightness of Majority Adversary

Operating under the same assumptions and parameter settings as above, let (𝒵,ρ)(\mathcal{Z},\rho) denote the two-point metric space, i.e., let 𝒵={z1,z2}\mathcal{Z}=\{z_{1},z_{2}\}, and let z𝒵z\in\mathcal{Z}, β0\beta\geq 0 be arbitrary. Moreover, let \mathcal{M} be as defined in the proof of Theorem 4.2 and let hi=(Ci,zi)𝗂.𝗂.𝖽(𝒟),i[n]h_{i}=(C_{i},z_{i})\stackrel{{\scriptstyle\mathsf{i.i.d}}}{{\sim}}\mathcal{M}(\mathcal{D}),\ i\in[n].

We construct the majority adversary 𝒜~\tilde{\mathcal{A}} defined as follows:

𝒜~(h1:n):={zi if Ci=1 for any i[n]z if i[n]𝕀{hi=(0,z)}>n/2𝒵{z} otherwise \displaystyle\tilde{\mathcal{A}}(h_{1:n}):=\begin{cases}z_{i}\text{ if }C_{i}=1\text{ for any }i\in[n]\\ z\text{ if }\sum_{i\in[n]}\mathbb{I}\{h_{i}=(0,z)\}>\nicefrac{{n}}{{2}}\\ \mathcal{Z}\setminus\{z\}\text{ otherwise }\end{cases}

The following result establishes the optimality of this adversary for ϵ0.042\epsilon\geq 0.042.

Theorem (Analysis of Majority Adversary)

Let (𝒵,ρ)(\mathcal{Z},\rho) denote the two-point metric space, i.e., let 𝒵={z1,z2}\mathcal{Z}=\{z_{1},z_{2}\}, and let z𝒵z\in\mathcal{Z}, β0\beta\geq 0 be arbitrary. Moreover, let \mathcal{M} be as defined in the proof of Theorem 4.2 and 𝒜~\tilde{\mathcal{A}} be as defined above. Then, for any ϵ0.042\epsilon\geq 0.042 and δ[0,1]\delta\in[0,1], 𝒜~\tilde{\mathcal{A}} achieves an expected squared reconstruction error of at most β2\beta^{2} by making Θ(ln(𝖽𝗂𝖺𝗆(𝒵)2/β2)ln(eϵ+12(1δ)))\Theta\left(\tfrac{\ln(\nicefrac{{\mathsf{diam}(\mathcal{Z})^{2}}}{{\beta^{2}}})}{\ln\left(\tfrac{e^{\epsilon}+1}{2(1-\delta)}\right)}\right) queries to \mathcal{M}.

Proof B.4.

Let hi=(Ci,zi)𝗂.𝗂.𝖽(𝒟),i[n]h_{i}=(C_{i},z_{i})\stackrel{{\scriptstyle\mathsf{i.i.d}}}{{\sim}}\mathcal{M}(\mathcal{D}),\ i\in[n]. Let 𝒞\mathcal{C} denote the event 𝒞={Ci=0i[n]}\mathcal{C}=\{C_{i}=0\ \forall\ i\in[n]\}. By definition of \mathcal{M}, [𝒞]=(1δ)n\mathbb{P}[\mathcal{C}]=(1-\delta)^{n} and [𝒜~(h1:n)=z|𝒞c]=1\mathbb{P}[\tilde{\mathcal{A}}(h_{1:n})=z|\mathcal{C}^{c}]=1. It follows that:

[𝒜~(h1:n)=𝒵{z}]\displaystyle\mathbb{P}[\tilde{\mathcal{A}}(h_{1:n})=\mathcal{Z}\setminus\{z\}] =[𝒜~(h1:n)=𝒵{z}|𝒞][𝒞]+[𝒜~(h1:n)=𝒵{z}|𝒞c][𝒞c]\displaystyle=\mathbb{P}[\tilde{\mathcal{A}}(h_{1:n})=\mathcal{Z}\setminus\{z\}|\mathcal{C}]\mathbb{P}[\mathcal{C}]+\mathbb{P}[\tilde{\mathcal{A}}(h_{1:n})=\mathcal{Z}\setminus\{z\}|\mathcal{C}^{c}]\mathbb{P}[\mathcal{C}^{c}]
=(1δ)n[i[n]𝕀{hi=(0,z)}n/2|𝒞]\displaystyle=(1-\delta)^{n}\ \mathbb{P}[\sum_{i\in[n]}\mathbb{I}\{h_{i}=(0,z)\}\leq\nicefrac{{n}}{{2}}|\mathcal{C}]

Note that, conditioned on 𝒞, 𝕀{h_i = (0,z)} i.i.d.∼ 𝖡𝖾𝗋𝗇𝗈𝗎𝗅𝗅𝗂(q) where q = e^ϵ/(e^ϵ+1) ≥ 1/2. Define X = Σ_{i=1}^n 𝕀{h_i = (0,z)}. Clearly, conditioned on 𝒞, X ∼ 𝖡𝗂𝗇𝗈𝗆𝗂𝖺𝗅(n, q). Since nq ≥ n/2, we control ℙ[X ≤ n/2] via a Chernoff bound on the Binomial tail (Article) to obtain:

ℙ[X ≤ n/2 | 𝒞] ≤ exp(−n · d_𝖪𝖫(1/2, q))
=exp(n2ln(4q(1q)))\displaystyle=\exp(\tfrac{n}{2}\ln(4q(1-q)))
=[4q(1q)]n/2\displaystyle=[4q(1-q)]^{\nicefrac{{n}}{{2}}}
=(2eϵ+1)nenϵ/2\displaystyle=(\tfrac{2}{e^{\epsilon}+1})^{n}e^{\nicefrac{{n\epsilon}}{{2}}}

where d𝖪𝖫(1/2,q)=𝖪𝖫[𝖡𝖾𝗋𝗇𝗈𝗎𝗅𝗅𝗂(1/2)||𝖡𝖾𝗋𝗇𝗈𝗎𝗅𝗅𝗂(q)]=12ln(12q)+12ln(12(1q))=12ln(4q(1q))d_{\mathsf{KL}}(\nicefrac{{1}}{{2}},q)=\mathsf{KL}[\mathsf{Bernoulli}(\nicefrac{{1}}{{2}})||\mathsf{Bernoulli}(q)]=\tfrac{1}{2}\ln(\tfrac{1}{2q})+\tfrac{1}{2}\ln(\tfrac{1}{2(1-q)})=-\tfrac{1}{2}\ln(4q(1-q)). It follows that,

[𝒜~(h1:n)=𝒵{z}]\displaystyle\mathbb{P}[\tilde{\mathcal{A}}(h_{1:n})=\mathcal{Z}\setminus\{z\}] (2(1δ)eϵ+1)nenϵ/2\displaystyle\leq\left(\frac{2(1-\delta)}{e^{\epsilon}+1}\right)^{n}e^{\nicefrac{{n\epsilon}}{{2}}}

Thus, we conclude

𝔼_{h_{1:n} i.i.d.∼ ℳ(𝒟)}[ρ(𝒜̃(h_{1:n}), z)²] = ℙ[𝒜̃(h_{1:n}) = 𝒵∖{z}] · diam(𝒵)²
(2(1δ)eϵ+1)nenϵ/2𝖽𝗂𝖺𝗆(𝒵)2\displaystyle\leq\left(\frac{2(1-\delta)}{e^{\epsilon}+1}\right)^{n}e^{\nicefrac{{n\epsilon}}{{2}}}\mathsf{diam}(\mathcal{Z})^{2}

To set the RHS of the above inequality equal to β2\beta^{2}, it suffices to set n=Θ(ln(𝖽𝗂𝖺𝗆(𝒵)2/β2)ln(1+eϵ2(1δ))ϵ2)n=\Theta\left(\frac{\ln(\nicefrac{{\mathsf{diam}(\mathcal{Z})^{2}}}{{\beta^{2}}})}{\ln(\frac{1+e^{\epsilon}}{2(1-\delta)})-\frac{\epsilon}{2}}\right).

Now, consider f:f:\mathbb{R}\to\mathbb{R} defined as f(t)=0.99ln(et+12)t/2f(t)=0.99\ln(\tfrac{e^{t}+1}{2})-\nicefrac{{t}}{{2}}. Note that f(0.042)0f(0.042)\geq 0. Furthermore, f(t)=0.99et1+et0.5>0t0.042f^{\prime}(t)=0.99\tfrac{e^{t}}{1+e^{t}}-0.5>0\ \forall t\geq 0.042. Thus, f(t)0t0.042f(t)\geq 0\ \forall t\ \geq 0.042. Rearranging, we conclude that ln(et+12)t/20.01ln(et+12)\ln(\tfrac{e^{t}+1}{2})-\nicefrac{{t}}{{2}}\geq 0.01\ln(\tfrac{e^{t}+1}{2}).

Hence, for any ϵ0.042\epsilon\geq 0.042, ln(eϵ+12)ϵ/20.01ln(eϵ+12)\ln(\tfrac{e^{\epsilon}+1}{2})-\nicefrac{{\epsilon}}{{2}}\geq 0.01\ln(\tfrac{e^{\epsilon}+1}{2}) and for any δ[0,1]\delta\in[0,1], ln(11δ)0\ln(\tfrac{1}{1-\delta})\geq 0 and thus ln(11δ)0.01ln(11δ)\ln(\tfrac{1}{1-\delta})\geq 0.01\ln(\tfrac{1}{1-\delta}). It follows that for any ϵ0.042\epsilon\geq 0.042 and δ[0,1]\delta\in[0,1],

ln(eϵ+12(1δ))ϵ/20.01ln(eϵ+12(1δ))\displaystyle\ln(\tfrac{e^{\epsilon}+1}{2(1-\delta)})-\nicefrac{{\epsilon}}{{2}}\geq 0.01\ln(\tfrac{e^{\epsilon}+1}{2(1-\delta)})

Moreover, since ln(eϵ+12(1δ))ϵ/2ln(eϵ+12(1δ))\ln(\tfrac{e^{\epsilon}+1}{2(1-\delta)})-\nicefrac{{\epsilon}}{{2}}\leq\ln(\tfrac{e^{\epsilon}+1}{2(1-\delta)}), we conclude the following

1ln(eϵ+12(1δ))1ln(eϵ+12(1δ))ϵ/2100ln(eϵ+12(1δ))ϵ0.042,δ[0,1]\displaystyle\frac{1}{\ln(\tfrac{e^{\epsilon}+1}{2(1-\delta)})}\leq\frac{1}{\ln(\tfrac{e^{\epsilon}+1}{2(1-\delta)})-\nicefrac{{\epsilon}}{{2}}}\leq\frac{100}{\ln(\tfrac{e^{\epsilon}+1}{2(1-\delta)})}\ \forall\ \epsilon\geq 0.042,\delta\in[0,1]

Thus, for any ϵ ≥ 0.042 and δ ∈ [0,1], it suffices to set n = Θ(ln(diam(𝒵)²/β²) / ln((e^ϵ+1)/(2(1−δ)))) in order to achieve a squared reconstruction error of at most β².
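For completeness, a sketch of the majority rule 𝒜̃ under the same two-point encoding used earlier; as in the text, the rule is stated relative to the true point z and the (C_i, point) pairs are as produced by the mechanism sketched above.

```python
def majority_adversary(samples, z, other):
    """The deterministic majority rule from the analysis above."""
    for c, pt in samples:
        if c == 1:
            return pt                               # some C_i = 1: exact recovery
    votes_for_z = sum(pt == z for _, pt in samples) # count of h_i = (0, z)
    return z if votes_for_z > len(samples) / 2 else other
```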

B.6 Comparison with Guo et al. (2022)

For convenience, we restate the main result of Guo et al. (2022).

Theorem B.5 (Theorem 1 of Guo et al. (2022)).

Let (𝒵, ‖·‖₂) be the input metric space, with 𝒵 ⊆ ℝ^d, of a (2,ϵ)-Rényi DP learner ℳ: ℝ^d → ℋ. Let 𝒜 be a reconstruction attack that outputs ẑ(h) upon observing one sample from the learner's output distribution, i.e., h ← ℳ(𝒟). Then, if ẑ is unbiased,

𝔼[‖ẑ(h) − z‖²₂] ≥ (Σ_{i=1}^d diam_i(𝒵)²) / (4(e^ϵ − 1)) (19)

where diam_i(𝒵) denotes the i-th coordinate-wise diameter, defined as diam_i(𝒵) := sup_{z,z′∈𝒵, z_j = z′_j ∀ j≠i} |z_i − z′_i|.

We shall now show that the above result is invalid for a large range of ϵ. In particular, Theorem 1 of Guo et al. (2022) yields vacuous lower bounds for any ϵ < ln(1 + d/4).

Consider a canonical instance of the data reconstruction problem where the data domain 𝒵\mathcal{Z} is the unit ball in (d,.2)(\mathbb{R}^{d},\|.\|_{2}). Then, i=1d𝖽𝗂𝖺𝗆i(𝒵)2=4d\sum_{i=1}^{d}\mathsf{diam}_{i}(\mathcal{Z})^{2}=4d whereas 𝖽𝗂𝖺𝗆(𝒵)=2\mathsf{diam}(\mathcal{Z})=2. Let z𝒵z\in\mathcal{Z} be arbitrary. As per our problem setup in Section 3, (𝒟{z})\mathcal{M}(\mathcal{D}_{-}\cup\{z\}) denotes the output distribution induced by the randomized learner, when the target sample is zz. By definition of the diameter, any data reconstruction attack z^\hat{z} satisfies the trivial upper bound

𝔼h(𝒟{z})[z^(h)z22]𝖽𝗂𝖺𝗆(𝒵)2=4\mathbb{E}_{h\sim\mathcal{M}(\mathcal{D}_{-}\cup\{z\})}\left[\|\hat{z}(h)-z\|^{2}_{2}\right]\leq\mathsf{diam}(\mathcal{Z})^{2}=4 (20)

However, the lower bound in Theorem 1 of Guo et al. (2022) states that, for any data reconstruction attack z^\hat{z} on a (2,ϵ)(2,\epsilon) Renyi Differentially Private learner, the reconstruction MSE is lower bounded as

𝔼h(𝒟{z})[z^(h)z22]i=1d𝖽𝗂𝖺𝗆i(𝒵)24(eϵ1)deϵ1\mathbb{E}_{h\sim\mathcal{M}(\mathcal{D}_{-}\cup\{z\})}\left[\|\hat{z}(h)-z\|^{2}_{2}\right]\geq\frac{\sum_{i=1}^{d}\mathsf{diam}_{i}(\mathcal{Z})^{2}}{4(e^{\epsilon}-1)}\geq\frac{d}{e^{\epsilon}-1}

The RHS in the above bound is strictly greater than diam(𝒵)² = 4 for any ϵ < ln(1 + d/4), thereby contradicting the trivial upper bound established above. For instance, at d = 784 (the dimensionality of MNIST images (Lecun et al., 1998)), the bound exceeds the trivial upper bound for every ϵ < ln(197) ≈ 5.28, a range that covers most practical privacy budgets.

Appendix C Analysis of Data Reconstruction for ϵ\epsilon Metric DP

C.1 Reconstruction Lower Bound : Proof of Theorem 5.1

We borrow the notation of P_z and P^n_z from the proof of Theorem 4.1 in Appendix B.2. Let Z := {z_1, …, z_M} ⊆ 𝒵 be a finite subset of 𝒵 (to be specified later) with M > 1. Following the same steps as the proof of Theorem 4.1, by applying Markov's inequality we obtain:

sup_{z∈𝒵} 𝔼_{h_{1:n} i.i.d.∼ ℳ(𝒟₋∪{z})}[ρ(𝒜(h_{1:n}), z)²] ≥ Δ² sup_{z∈𝒵} P^n_z[ρ(𝒜(h_{1:n}), z) ≥ Δ]
≥ Δ² sup_{j∈[M]} P^n_{z_j}[ρ(𝒜(h_{1:n}), z_j) ≥ Δ] (21)

where the last inequality follows from the fact that Z ⊆ 𝒵. For a tight lower bound, we must carefully construct the set Z. We recall from Section 6 that the adversary's task of reconstructing a training sample z ∈ 𝒵 is at least as hard as deciding which among {z_1, …, z_M} was included in the training set, i.e., reconstruction is at least as hard as an M-ary hypothesis testing problem. To reduce from reconstruction to testing, we construct a sample space {z_1, …, z_M} whose elements are uniformly separated, i.e., ρ(z_i, z_j) ≥ 2Δ for all i ≠ j ∈ [M]; in other words, Z is a 2Δ packing in the ρ metric. For ease of exposition, we assume ρ is a homogeneous metric.

Given this construction, we again follow the same steps as the proof of Theorem 4.1. In particular, suppose there exists zj,zkz_{j},z_{k} such that ρ(𝒜(h1:n),zj)ρ(𝒜(h1:n),zk)\rho(\mathcal{A}(h_{1:n}),z_{j})\geq\rho(\mathcal{A}(h_{1:n}),z_{k}) were true, then by an application of triangle inequality, one would have:

ρ(𝒜(h1:n),zj)\displaystyle\rho(\mathcal{A}(h_{1:n}),z_{j}) ρ(𝒜(h1:n),zk)\displaystyle\geq\rho(\mathcal{A}(h_{1:n}),z_{k})
ρ(zk,zj)ρ(𝒜(h1:n),zj)\displaystyle\geq\rho(z_{k},z_{j})-\rho(\mathcal{A}(h_{1:n}),z_{j})
2Δρ(𝒜(h1:n),zj)\displaystyle\geq 2\Delta-\rho(\mathcal{A}(h_{1:n}),z_{j})

Thus, ρ(𝒜(h1:n),zj)ρ(𝒜(h1:n),zk)ρ(𝒜(h1:n),zj)Δ\rho(\mathcal{A}(h_{1:n}),z_{j})\geq\rho(\mathcal{A}(h_{1:n}),z_{k})\implies\rho(\mathcal{A}(h_{1:n}),z_{j})\geq\Delta. Defining the minimum distance test Ψ\Psi as Ψ:=argminj[M]ρ(𝒜(h1:n),zj)\Psi:=\underset{j\in[M]}{\operatorname{argmin}}\rho(\mathcal{A}(h_{1:n}),z_{j}), we conclude that

P^n_{z_j}[ρ(𝒜(h_{1:n}), z_j) ≥ Δ] ≥ P^n_{z_j}[Ψ(h_{1:n}) ≠ j]

It follows that

sup_{z∈𝒵} 𝔼_{h_{1:n} i.i.d.∼ ℳ(𝒟₋∪{z})}[ρ(𝒜(h_{1:n}), z)²] ≥ Δ² sup_{z∈𝒵} P^n_z[ρ(𝒜(h_{1:n}), z) ≥ Δ]
≥ Δ² sup_{j∈[M]} P^n_{z_j}[Ψ(h_{1:n}) ≠ j]
≥ Δ² [1 − (1/M² Σ_{j,k=1}^M 𝖪𝖫(P^n_{z_j}, P^n_{z_k}) + log 2)/log(M−1)]
= Δ² [1 − (n/M² Σ_{j,k=1}^M 𝖪𝖫(P_{z_j}, P_{z_k}) + log 2)/log(M−1)] (22)

where the second inequality applies Lemma A.7 and the last equality uses the tensorization of KL divergence for product measures, 𝖪𝖫(P^n, Q^n) = n·𝖪𝖫(P, Q).

Now, let S be a maximal 1/2-packing of the unit ball 𝔹(𝒵). Note that 2 ≤ |S| < ∞ by the local compactness of 𝒵 (Rudin, 1976). Since 𝒵 is a linear homogeneous metric space, we can scale S by a factor of 4Δ (assuming Δ ≤ 1/4 without loss of generality) to obtain a maximal 2Δ packing of the ball of radius 4Δ. Let this scaling of S be the set Z. Note that M = |Z| = |S| and, by maximality of the packing S, Z is also a 2Δ covering of the radius-4Δ ball. Since every element of Z lies in a ball of radius 4Δ, we have ρ(z_j, z_k) ≤ 8Δ for all j, k, and thus, by Lemma A.9, the following holds:

𝖪𝖫(P_{z_j} || P_{z_k}) = 𝖪𝖫(ℳ(𝒟₋∪{z_j}) || ℳ(𝒟₋∪{z_k})) ≤ ϵ²ρ(z_j, z_k)²/2 ≤ 32ϵ²Δ²

Substituting this into (22) and absorbing constants, we obtain

sup_{z∈𝒵} 𝔼_{h_{1:n} i.i.d.∼ ℳ(𝒟₋∪{z})}[ρ(𝒜(h_{1:n}), z)²] ≳ Δ² [1 − (nϵ²Δ² + log 2)/ln M(1/2, 𝔹(𝒵), ρ)]

Optimizing over Δ\Delta and taking an infimum over all adversaries 𝒜\mathcal{A} gives us 𝒱(,n)d~nϵ2\mathcal{V}(\mathcal{M},n)\gtrsim\tfrac{\tilde{d}}{n\epsilon^{2}} where d~=lnM(1/2,𝔹(𝒵),ρ)\tilde{d}=\ln M(\nicefrac{{1}}{{2}},\mathbb{B}(\mathcal{Z}),\rho). As before, the required query complexity is obtained by upper bounding 𝒱(,n)\mathcal{V}(\mathcal{M},n) with β2\beta^{2} and rearranging the inequality to obtain a lower bound on nn.

C.2 Reconstruction Upper Bound : Proof of Theorem 5.2

Our upper bound construction is based on the exponential mechanism in (ℝ^d, ‖·‖₂). In particular, for μ ∈ ℝ^d, let π_{μ,ϵ} be a probability measure on ℝ^d whose density with respect to the Lebesgue measure 𝖫𝖾𝖻 is given by,

𝖽πμ,ϵ𝖽𝖫𝖾𝖻(x)eϵxμ2\displaystyle\frac{\mathsf{d}\pi_{\mu,\epsilon}}{\mathsf{d}\mathsf{Leb}}(x)\propto e^{-\epsilon\|x-\mu\|_{2}}

For any zdz\in\mathbb{R}^{d}, we set the output distribution of \mathcal{M} to be (𝒟)=(𝒟{z})=πz,ϵ\mathcal{M}(\mathcal{D})=\mathcal{M}(\mathcal{D_{-}}\cup\{z\})=\pi_{z,\epsilon}. Note that for any μ1,μ2d\mu_{1},\mu_{2}\in\mathbb{R}^{d}, their ratio of densities is bounded as follows due to the triangle inequality

𝖽πμ1,ϵ𝖽πμ2,ϵ(x)=eϵ(xμ22xμ12)eϵμ1μ22\displaystyle\frac{\mathsf{d}\pi_{\mu_{1},\epsilon}}{\mathsf{d}\pi_{\mu_{2},\epsilon}}(x)=e^{\epsilon(\|x-\mu_{2}\|_{2}-\|x-\mu_{1}\|_{2})}\leq e^{\epsilon\|\mu_{1}-\mu_{2}\|_{2}}

Note that the same upper bound holds for dπ_{μ_2,ϵ}/dπ_{μ_1,ϵ}. It follows that for any Borel set A ⊆ ℝ^d,

eϵμ1μ22πμ2,ϵ(A)πμ1,ϵ(A)eϵμ1μ22πμ2,ϵ(A)e^{-\epsilon\|\mu_{1}-\mu_{2}\|_{2}}\pi_{\mu_{2},\epsilon}(A)\leq\pi_{\mu_{1},\epsilon}(A)\leq e^{\epsilon\|\mu_{1}-\mu_{2}\|_{2}}\pi_{\mu_{2},\epsilon}(A)

Consequently, ℳ(𝒟) is ϵ metric DP (i.e., ϵ-Lipschitz private) w.r.t. ‖·‖₂.

We consider a reconstruction adversary which draws h_1, …, h_n i.i.d.∼ ℳ(𝒟) and computes 𝒜(h_1, …, h_n) = (1/n) Σ_{i=1}^n h_i. Since π_{z,ϵ} is spherically symmetric about z, each h_i has mean z, and hence 𝔼_{h_1,…,h_n i.i.d.∼ ℳ(𝒟)}[‖𝒜(h_1,…,h_n) − z‖²₂] = σ²/n, where σ² = 𝔼_{x∼π_{z,ϵ}}[‖x−z‖²₂].

To sharply bound σ², we first note that Theorem 1 of Bobkov (2003) shows that for any μ ∈ ℝ^d, the Poincaré constant (the inverse spectral gap) of π_{μ,ϵ} is λ = Θ(d/ϵ²). We then bound σ² using the fact that, for any distribution, an inverse spectral gap of λ implies subexponential concentration with parameter Θ(√λ) (Bobkov and Ledoux, 1997; Aida and Stroock, 1994). In particular, Theorem 2.7 of Huang and Tropp (2021) ensures that σ² ≲ λ log²(d) = d log²(d)/ϵ². It follows that,

𝔼h1,,hn𝗂.𝗂.𝖽(𝒟)[𝒜(h1,,hn)z22]dlog2(d)nϵ2\displaystyle\mathbb{E}_{h_{1},\dots,h_{n}\stackrel{{\scriptstyle\mathsf{i.i.d}}}{{\sim}}\mathcal{M}(\mathcal{D})}[\|\mathcal{A}(h_{1},\dots,h_{n})-z\|^{2}_{2}]\lesssim\frac{d\log^{2}(d)}{n\epsilon^{2}}

To make the RHS at most β², it suffices to set n = Θ(d log²(d)/(ϵ²β²)).
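For concreteness, the mechanism and the averaging adversary can be sketched as follows. This is a minimal numpy sketch: the dimension, target point, and privacy parameter are illustrative, and the Gamma radial law follows from the spherical symmetry of the density (the radial marginal is proportional to r^{d−1} e^{−ϵr}).

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pi(mu, eps, n):
    """n i.i.d. draws from the density on R^d proportional to exp(-eps * ||x - mu||_2)."""
    d = mu.shape[0]
    radii = rng.gamma(shape=d, scale=1.0 / eps, size=n)   # radial law r^(d-1) e^(-eps r)
    dirs = rng.standard_normal((n, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)   # uniformly random directions
    return mu + radii[:, None] * dirs

d, eps, n = 50, 1.0, 10_000
z = np.zeros(d)                                           # illustrative target point
hs = sample_pi(z, eps, n)
reconstruction = hs.mean(axis=0)                          # the averaging adversary
print(np.sum((reconstruction - z) ** 2))                  # decays as sigma^2 / n
```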

Appendix D Metric Differential Privacy in Practice

We first note that mDP is a strict generalization of DP.

Lemma D.1 (Metric DP is a generalization of DP).

Let 𝒵 be an arbitrary collection of secrets. A randomized learner ℳ: 𝒵^N → ℋ is ϵ-differentially private if and only if ℳ is ϵ-metric differentially private with respect to the Hamming metric on 𝒵^N.

Proof D.2.

See Lemma 2.3 of Boedihardjo et al. (2022).

Next, we establish the requisite framework for analyzing mDP in practice and present an extension of the Gaussian Noise Mechanism to metric privacy. We emphasize that, since mDP is a generalization of DP, practical applications would require a relaxation similar to 2.3 in order to establish privacy guarantees. To this end, we consider the following relaxed version of metric differential privacy.

Definition D.3 (Rényi mDP).

Given a metric space (𝒵, ρ), for any ϵ ≥ 0, α ∈ [1, ∞], a randomized learner ℳ: 𝒵^n → ℋ is (α,ϵ)-Rényi mDP if, for every pair of input datasets 𝒟, 𝒟′ ∈ 𝒵^n, the Rényi divergence is bounded as follows:

Dα(P||Q)=1α1log𝔼xQ[(P(x)Q(x))α]ϵρ(𝒟,𝒟)D_{\alpha}(P||Q)=\frac{1}{\alpha-1}\log\mathbb{E}_{x\sim Q}\left[\left(\frac{P(x)}{Q(x)}\right)^{\alpha}\right]\leq\epsilon\rho(\mathcal{D},\mathcal{D}^{\prime}) (23)

where PP and QQ denote the output distributions (𝒟)\mathcal{M}(\mathcal{D}) and (𝒟)\mathcal{M}(\mathcal{D}^{\prime}), respectively.

It is easy to see that Definition D.3 recovers Rényi DP when restricted to the Hamming metric, in a manner similar to how mDP generalizes pure DP. Before proceeding further, we recall standard notions of sensitivity and noise mechanisms for classical differential privacy.

A common paradigm for releasing statistical information with differential privacy is to publish a noisy estimate of the true statistic. Namely, if f: 𝒵 → ℝ^d is a real-valued function (the restriction to real-valued functions is not essential), then ℳ(𝒟) = f(𝒟) + η is differentially private for an appropriately calibrated noise variable η. The magnitude of the perturbation is generally calibrated according to the sensitivity of the function f, defined as follows:

Definition D.4 (Sensitivity).

Given any function f:𝒵df:\mathcal{Z}\to\mathbb{R}^{d},

Δf=max𝒟,𝒟𝒵;𝒟𝒟1f(𝒟)f(𝒟)\Delta_{f}=\underset{\mathcal{D},\mathcal{D}^{\prime}\in\mathcal{Z};\|\mathcal{D}-\mathcal{D}^{\prime}\|\leq 1}{\max}\|f(\mathcal{D})-f(\mathcal{D}^{\prime})\| (24)

The Gaussian mechanism, for instance, requires that f(𝒟)f(\mathcal{D}) be perturbed with noise drawn from the Gaussian distribution, as follows:

Definition D.5 (Gaussian Mechanism).

Given any function f:𝒵df:\mathcal{Z}\to\mathbb{R}^{d}, the Gaussian mechanism is defined by

(𝒟)=f(𝒟)+𝒩(0,Δf2σ2𝕀d)\mathcal{M}(\mathcal{D})=f(\mathcal{D})+\mathcal{N}(0,\Delta_{f}^{2}\sigma^{2}\mathbb{I}_{d}) (25)

For ϵ<1\epsilon<1 and c2>2ln(1.25/δ)c^{2}>2\ln(1.25/\delta), the Gaussian mechanism with parameter σcΔf/ϵ\sigma\geq c\Delta_{f}/\epsilon satisfies (ϵ,δ)(\epsilon,\delta)-DP (Dwork and Roth, 2014).
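A minimal implementation sketch of this mechanism with the stated calibration follows; the function name and defaults are illustrative, not a reference implementation.

import numpy as np

def gaussian_mechanism(f_value, delta_f, eps, delta, rng=None):
    # Release f(D) + N(0, sigma^2 I) with sigma = sqrt(2 ln(1.25/delta)) * delta_f / eps,
    # which satisfies (eps, delta)-DP for eps < 1 (Dwork and Roth, 2014).
    assert 0 < eps < 1 and 0 < delta < 1
    rng = rng or np.random.default_rng()
    sigma = np.sqrt(2 * np.log(1.25 / delta)) * delta_f / eps
    return f_value + rng.normal(0.0, sigma, size=np.shape(f_value))

# e.g. privatize a 2-dimensional mean over 100 records in the unit l2 ball,
# whose sensitivity is delta_f = 2/100 as computed above:
noisy = gaussian_mechanism(np.array([0.3, -0.1]), delta_f=0.02, eps=0.5, delta=1e-5)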

The sensitivity constraint described in Definition D.4 is over all neighboring datasets 𝒟,𝒟𝒵\mathcal{D},\mathcal{D}^{\prime}\in\mathcal{Z}, in order to accommodate classical differential privacy. For metric differential privacy, we define a relaxed requirement which we call input Lipschitzness.

Definition D.6 (Input Lipschitzness).

A function f:𝒵df:\mathcal{Z}\to\mathbb{R}^{d} is LinputL_{input}-input Lipschitz if the following holds for all 𝒟,𝒟𝒵\mathcal{D},\mathcal{D}^{\prime}\in\mathcal{Z}:

f(𝒟)f(𝒟)Linputρ(𝒟,𝒟)\|f(\mathcal{D})-f(\mathcal{D}^{\prime})\|\leq L_{input}\rho(\mathcal{D},\mathcal{D}^{\prime}) (26)
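For example, the mean query is input Lipschitz with Linput=1/nL_{input}=1/n when ρ(𝒟,𝒟′)=ixixi2\rho(\mathcal{D},\mathcal{D}^{\prime})=\sum_{i}\|x_{i}-x^{\prime}_{i}\|_{2}; the sketch below verifies this numerically (the choice of product metric here is an assumption made for illustration).

import numpy as np

# Numerical check that the mean query satisfies Definition D.6 with
# L_input = 1/n under the summed l2 metric on datasets.
rng = np.random.default_rng(0)
n, d = 50, 3
D = rng.standard_normal((n, d))
D_prime = D + 0.1 * rng.standard_normal((n, d))  # perturb every record
lhs = np.linalg.norm(D.mean(axis=0) - D_prime.mean(axis=0))
rho = np.linalg.norm(D - D_prime, axis=1).sum()
assert lhs <= rho / n + 1e-12                    # L_input = 1/n holds
print(lhs, rho / n)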

Equipped with this notion, we extend the Gaussian mechanism for DP (D.5) to mDP, as follows:

Proposition D.7 (Gaussian Mechanism for mDP).

Consider any two datasets 𝒟,𝒟𝒵\mathcal{D},\mathcal{D}^{\prime}\in\mathcal{Z} at distance ρ(𝒟,𝒟)\rho(\mathcal{D},\mathcal{D}^{\prime}), and an input Lipschitz function f:𝒵df:\mathcal{Z}\to\mathbb{R}^{d}. For any δ(0,1)\delta\in(0,1) and c2>2ln(1.25/δ)c^{2}>2\ln(1.25/\delta), the Gaussian Mechanism with parameter σcLinput/ϵ\sigma\geq cL_{input}/\epsilon is (ϵ,δ)(\epsilon,\delta)-mDP, where LinputL_{input} is the input Lipschitz constant of ff.

Proof D.8.

From the definition of input Lipschitzness and sensitivity, it follows that

Δf=f(𝒟)f(𝒟)Linputρ(𝒟,𝒟)\Delta_{f}=\|f(\mathcal{D})-f(\mathcal{D}^{\prime})\|\leq L_{input}\rho(\mathcal{D},\mathcal{D}^{\prime}) (27)

Writing ϵ′=ϵρ(𝒟,𝒟′)\epsilon^{\prime}=\epsilon\rho(\mathcal{D},\mathcal{D}^{\prime}), we are required to bound the following quantity for ff applied to the datasets 𝒟,𝒟\mathcal{D},\mathcal{D}^{\prime}:

|lne12σ2x2e12σ2x+Δf2|ϵ′=ϵρ(𝒟,𝒟′)\left|\ln\frac{e^{\frac{-1}{2\sigma^{2}}\|x\|^{2}}}{e^{\frac{-1}{2\sigma^{2}}\|x+\Delta_{f}\|^{2}}}\right|\leq\epsilon^{\prime}=\epsilon\rho(\mathcal{D},\mathcal{D}^{\prime}) (28)

where the numerator is the density of observing a specific output f(𝒟)+xf(\mathcal{D})+x and the denominator is the density of observing the same output when the input dataset is replaced by 𝒟\mathcal{D}^{\prime}. The claim then follows from the proof of the Gaussian mechanism for (ϵ,δ)(\epsilon,\delta) differential privacy (Dwork and Roth, 2014).
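A sketch of the resulting mechanism is given below; the only change relative to the classical Gaussian mechanism is that the noise scale is calibrated to the input Lipschitz constant rather than the global sensitivity (names and the example query are illustrative).

import numpy as np

def mdp_gaussian_mechanism(f_value, L_input, eps, delta, rng=None):
    # Proposition D.7: sigma >= c * L_input / eps with c^2 > 2 ln(1.25/delta)
    # yields (eps, delta)-mDP for an L_input-input Lipschitz query f.
    rng = rng or np.random.default_rng()
    c = np.sqrt(2 * np.log(1.25 / delta)) + 1e-6  # just above the threshold
    sigma = c * L_input / eps
    return f_value + rng.normal(0.0, sigma, size=np.shape(f_value))

# e.g. the mean query with L_input = 1/n from the previous example:
noisy = mdp_gaussian_mechanism(np.array([0.3, -0.1]), L_input=1 / 50, eps=0.5, delta=1e-5)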

We are now ready to develop mDP accountants for privacy-preserving algorithms. A popular approach to differentially private deep learning extends Stochastic Gradient Descent (SGD) (Abadi et al., 2016) with gradient clipping and noisy gradient updates. The resulting algorithm is called Differentially Private SGD (DP-SGD), wherein (ϵ,δ)(\epsilon,\delta) differential privacy is guaranteed via an application of the Gaussian mechanism at each iteration. Since mDP is a generalization of traditional DP, we expect iterative DP algorithms that rely on additive noise at each intermediate solution step to be inherently mDP, owing to composition theorems that admit natural extensions to arbitrary metric spaces.

D.1 Metric DP Accounting for DP-SGD

In the analysis of DP-SGD (Abadi et al., 2016), privacy is attained via the addition of Gaussian distributed noise. Standard arguments on the composition of privacy mechanisms render the differentially private variant of SGD (qϵ,qδ)(q\epsilon,q\delta)-differentially private at each step, where q=L/Nq=L/N is the sampling ratio with lot size LL. We note that, by replacing the global sensitivity assumption with input Lipschitzness, the Gaussian mechanism ensures (ϵ,δ)(\epsilon,\delta)-mDP for appropriately scaled noise, as in Proposition D.7. Thus, DP-SGD inherently incorporates metric differential privacy, with privacy accounting based on standard composition theorems.
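A minimal sketch of a single DP-SGD step from this viewpoint follows; the clipping threshold plays the role of the input Lipschitz bound fed to Proposition D.7, and all names and noise constants are illustrative assumptions rather than the exact accountant of Abadi et al. (2016).

import numpy as np

def dp_sgd_step(w, per_example_grads, lr, clip, sigma, rng):
    # Clip each per-example gradient to l2 norm `clip`, average over the lot,
    # add Gaussian noise scaled to the clip bound, and take a gradient step.
    norms = np.maximum(np.linalg.norm(per_example_grads, axis=1, keepdims=True), 1e-12)
    clipped = per_example_grads * np.minimum(1.0, clip / norms)
    g = clipped.mean(axis=0)
    noise = rng.normal(0.0, sigma * clip / len(per_example_grads), size=g.shape)
    return w - lr * (g + noise)

rng = np.random.default_rng(0)
w = dp_sgd_step(np.zeros(3), rng.standard_normal((32, 3)), lr=0.1, clip=1.0, sigma=1.0, rng=rng)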

D.2 Privacy Analysis for Metric DP in PN-SGD

When analyzing the privacy of learning algorithms that require iterative updates on an intermediate solution, it is common practice to ensure privacy at each iteration and argue about the cumulative loss of privacy via composition theorems. Another popular direction is the theoretical analysis of noisy stochastic gradient descent to formalize privacy amplification under certain assumptions and obtain bounds on the degradation of privacy across iterations. We extend the analysis of one such algorithm, Projected Noisy Stochastic Gradient Descent (PNSGD) (Feldman et al., 2018), restated in Algorithm 1, and establish that it satisfies (α,ϵ)(\alpha,\epsilon)-Rényi mDP with a lower noise magnitude than that required for (α,ϵ)(\alpha,\epsilon)-Rényi DP.

Algorithm 1 Projected Noisy Stochastic Gradient Descent (Algorithm 1 of Feldman et al. (2018))
  Input: Dataset S={x1,,xn},f:𝒦×𝒳S=\{x_{1},...,x_{n}\},f:\mathcal{K}\times\mathcal{X}\rightarrow\mathbb{R}, which is a convex function in the first parameter, learning rate η\eta, starting point w0𝒦w_{0}\in\mathcal{K}, noise parameter σ\sigma
  for t{0,,n1}t\in\{0,...,n-1\} do
     vt+1wtη(wf(wt,xt+1)+Z)v_{t+1}\leftarrow w_{t}-\eta(\nabla_{w}f(w_{t},x_{t+1})+Z), where Z𝒩(0,σ2𝕀d)Z\sim\mathcal{N}(0,\sigma^{2}\mathbb{I}_{d}).
   wt+1Π𝒦(vt+1)w_{t+1}\leftarrow\Pi_{\mathcal{K}}(v_{t+1}), where Π𝒦(w)=argminθ𝒦θw2\Pi_{\mathcal{K}}(w)=\arg\min_{\theta\in\mathcal{K}}\|\theta-w\|_{2} is the l2l_{2} projection onto 𝒦\mathcal{K}
  end for
  return the final iterate wnw_{n}
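For reference, a direct Python transcription of Algorithm 1 is sketched below; the gradient oracle, projection, and the toy mean-estimation instance are illustrative assumptions supplied by the caller.

import numpy as np

def pnsgd(S, grad_f, project, w0, lr, sigma, rng=None):
    # One pass over the dataset: noisy gradient step followed by projection.
    rng = rng or np.random.default_rng()
    w = np.asarray(w0, dtype=float)
    for x in S:                                   # t = 0, ..., n-1
        z = rng.normal(0.0, sigma, size=w.shape)  # Z ~ N(0, sigma^2 I_d)
        w = project(w - lr * (grad_f(w, x) + z))  # noisy step, then projection
    return w                                      # final iterate w_n

# Toy instance: f(w, x) = 0.5 * ||w - x||^2 over the unit l2 ball.
project_ball = lambda w: w / max(1.0, np.linalg.norm(w))
S = np.random.default_rng(0).standard_normal((100, 5)) * 0.1
w_hat = pnsgd(S, grad_f=lambda w, x: w - x, project=project_ball,
              w0=np.zeros(5), lr=0.1, sigma=0.5)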

Proposition (Rényi mDP guarantee for PNSGD).

Let 𝒦d\mathcal{K}\subset\mathbb{R}^{d} be a convex set and let {f(,x)}x𝒳\{f(\cdot,x)\}_{x\in\mathcal{X}} be a family of convex, GG-Lipschitz, β\beta-smooth functions over 𝒦\mathcal{K}, whose gradients are LinputL_{input}-input Lipschitz. Furthermore, assume 𝒳\mathcal{X} is a bounded set. Then, for any η2/β\eta\leq 2/\beta and α>1\alpha>1, any initialization w0𝒦w_{0}\in\mathcal{K}, and any dataset S𝒳nS\in\mathcal{X}^{n}, PNSGD run with σ2αGLinputϵ(nt+1)\sigma^{2}\geq\frac{\alpha GL_{input}}{\epsilon(n-t+1)} satisfies (α,ϵ)(\alpha,\epsilon)-Rényi mDP, where tt denotes the index at which two datasets differ.

Proof D.9.

The proof of this theorem is an extension of Theorem 23 of Feldman et al. (2018) to Rényi mDP. Let S(x1,,xn)S\coloneqq(x_{1},\dots,x_{n}) and S(x1,,xt1,xt,xt+1,,xn)S^{\prime}\coloneqq(x_{1},\dots,x_{t-1},x^{\prime}_{t},x_{t+1},\dots,x_{n}) be two arbitrary datasets that differ in index tt. Since η2/β\eta\leq 2/\beta, the projected SGD updates are contractive noisy iterations as per Definition 19 of Feldman et al. (2018).

We define the projected SGD operators for the dataset SS as gi(w)=Π𝒦(w)ηf(Π𝒦(w),xi),i[n]g_{i}(w)=\Pi_{\mathcal{K}}(w)-\eta\nabla f(\Pi_{\mathcal{K}}(w),x_{i}),\ i\in[n]. The projected SGD operators gig^{\prime}_{i} for the dataset SS^{\prime} are defined analogously. Since the datasets differ only in the ttht^{\textrm{th}} index, we note that gi=gii[n]{t}g_{i}=g^{\prime}_{i}\ \forall i\in[n]\setminus\{t\} and,

supwgt(w)gt(w)2\displaystyle\sup_{w}\|g_{t}(w)-g^{{}^{\prime}}_{t}(w)\|_{2} =ηsupwf(Π𝒦(w),xt)f(Π𝒦(w),xt)2\displaystyle=\eta\sup_{w}\|\nabla f(\Pi_{\mathcal{K}}(w),x_{t})-\nabla f(\Pi_{\mathcal{K}}(w),x^{{}^{\prime}}_{t})\|_{2}
=ηsupwf(Π𝒦(w),xt)f(Π𝒦(w),xt)2f(Π𝒦(w),xt)f(Π𝒦(w),xt)2\displaystyle=\eta\sup_{w}\sqrt{\|\nabla f(\Pi_{\mathcal{K}}(w),x_{t})-\nabla f(\Pi_{\mathcal{K}}(w),x^{{}^{\prime}}_{t})\|_{2}}\sqrt{\|\nabla f(\Pi_{\mathcal{K}}(w),x_{t})-\nabla f(\Pi_{\mathcal{K}}(w),x^{{}^{\prime}}_{t})\|_{2}}
η2GLinputxtxt2\displaystyle\leq\eta\sqrt{2GL_{input}\|x_{t}-x^{\prime}_{t}\|_{2}}

where the last step bounds the first factor by 2G\sqrt{2G}, since each gradient has norm at most GG by GG-Lipschitzness of ff, and the second factor by Linputxtxt2\sqrt{L_{input}\|x_{t}-x^{\prime}_{t}\|_{2}}, by LinputL_{input}-input Lipschitzness of the gradients.

Now, applying Theorem 22 of Feldman et al. (2018) with a1,,at1=0a_{1},\dots,a_{t-1}=0, at,,an=η2GLinputxtxt2nt+1a_{t},\dots,a_{n}=\frac{\eta\sqrt{2GL_{input}\|x_{t}-x^{{}^{\prime}}_{t}\|_{2}}}{n-t+1}, st=η2GLinputxtxt2s_{t}=\eta\sqrt{2GL_{input}\|x_{t}-x^{{}^{\prime}}_{t}\|_{2}} and si=0,i[n]{t}s_{i}=0,\ \forall i\in[n]\setminus\{t\}, we conclude,

Dα(wn||wn)α2η2σ2i=1nai2αGLinputxtxt2σ2(nt+1)\displaystyle D_{\alpha}(w_{n}||w^{{}^{\prime}}_{n})\leq\frac{\alpha}{2\eta^{2}\sigma^{2}}\sum_{i=1}^{n}a^{2}_{i}\leq\frac{\alpha GL_{input}\|x_{t}-x^{{}^{\prime}}_{t}\|_{2}}{\sigma^{2}(n-t+1)}

From the above inequality, we conclude that setting σ2αGLinputϵ(nt+1)\sigma^{2}\geq\frac{\alpha GL_{input}}{\epsilon(n-t+1)} suffices to ensure Dα(wn||wn)ϵxtxt2D_{\alpha}(w_{n}||w^{\prime}_{n})\leq\epsilon\|x_{t}-x^{\prime}_{t}\|_{2}, which is exactly the (α,ϵ)(\alpha,\epsilon)-Rényi mDP requirement.
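The calibration above can be packaged as a small helper; taking t=nt=n, the differing index with the least subsequent smoothing, yields a noise level valid for every index. The helper below is an illustrative sketch of the proposition's formula, not an accountant from prior work.

def pnsgd_mdp_sigma_sq(alpha, G, L_input, eps, n, t=None):
    # sigma^2 >= alpha * G * L_input / (eps * (n - t + 1)), per the
    # proposition above; t = n is the worst case over differing indices.
    t = n if t is None else t
    return alpha * G * L_input / (eps * (n - t + 1))

# e.g. alpha = 2, G = 1, L_input = 0.5, eps = 1, n = 1000, worst case t = n:
print(pnsgd_mdp_sigma_sq(2, 1.0, 0.5, 1.0, 1000))  # 1.0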

Appendix E Related Literature

Theoretical investigation of reconstruction adversaries in the framework of differentially private machine learning was initiated in the recent works of Balle et al. (2022) and Guo et al. (2022), in the Bayesian and frequentist settings respectively. Their analyses, however, consider only a single round of communication between the adversary and the learner, thereby limiting the expressiveness of the attack. Balle et al. (2022) introduced the notion of reconstruction robustness to bound the success probability of an attack against ϵ\epsilon-DP learners, and, under certain modeling assumptions on the adversary's prior belief, derived upper bounds as a function of ϵ\epsilon. This metric is somewhat restrictive, since it is equivalent to DP only when considering perfect reconstruction, and requires a very specific class of priors concentrated on pairs of data points (see Theorem 5 of Balle et al. (2022)). Instead, we focus on the minimax rate of reconstruction error of an adversary, described in detail in Section 3. This is more general than reconstruction robustness, and captures a privacy game between the learner and the adversary. Informally, the learner chooses a target sample that is the hardest to reconstruct, while the adversary picks the optimal attack that minimizes the error of reconstruction.

Guo et al. (2022) operate in the frequentist setting and obtain lower bounds on the Euclidean MSE of reconstruction through the output of (2,ϵ)(2,\epsilon)-Rényi DP learners. A key limitation of their work is that their lower bound is invalid for a broad range of ϵ\epsilon, as we formally state in Section 4, with the corresponding proof in Appendix B.6. Furthermore, their analysis requires the input metric space to be compact and relies on the existence of coordinate-wise diameters. Such stringent assumptions hinder the generalizability of their results beyond d\mathbb{R}^{d}, particularly in unstructured domains such as text, where the notion of coordinate-wise diameters may not be well-defined or relevant.

Stock et al. (2022) obtain lower bounds on the leakage of secrets in language tasks for Rényi differentially private learners. However, their analysis is restricted to a structured and simplistic notion of secret bits, which does not capture the complexity of privacy protection in unstructured textual data. In contrast, our lower bound results are more general and hold for arbitrary metric spaces.