

Enhancing Feature-Specific Data Protection via
Bayesian Coordinate Differential Privacy

(The authors are listed in alphabetical order.)

Maryam Aliakbarpour (Department of Computer Science & Ken Kennedy Institute, Rice University)    Syomantak Chaudhuri (Department of Electrical Engineering and Computer Sciences, University of California, Berkeley)    Thomas A. Courtade (Department of Electrical Engineering and Computer Sciences, University of California, Berkeley)    Alireza Fallah (Department of Electrical Engineering and Computer Sciences, University of California, Berkeley)    Michael I. Jordan (Department of Electrical Engineering and Computer Sciences & Department of Statistics, University of California, Berkeley; INRIA, Paris)
(October 23, 2024)
Abstract

Local Differential Privacy (LDP) offers strong privacy guarantees without requiring users to trust external parties. However, LDP applies uniform protection to all data features, including less sensitive ones, which degrades performance of downstream tasks. To overcome this limitation, we propose a Bayesian framework, Bayesian Coordinate Differential Privacy (BCDP), that enables feature-specific privacy quantification. This more nuanced approach complements LDP by adjusting privacy protection according to the sensitivity of each feature, enabling improved performance of downstream tasks without compromising privacy. We characterize the properties of BCDP and articulate its connections with standard non-Bayesian privacy frameworks. We further apply our BCDP framework to the problems of private mean estimation and ordinary least-squares regression. The BCDP-based approach obtains improved accuracy compared to a purely LDP-based approach, without compromising on privacy.

1 Introduction

The rapid expansion of machine learning applications across various sectors has fueled an unprecedented demand for data collection. With the increase in data accumulation, user privacy concerns also intensify. Differential privacy (DP) has emerged as a prominent framework that enables algorithms to utilize data while preserving the privacy of individuals who provide it (Warner, 1965; Evfimievski et al., 2003; Dwork et al., 2006, 2014; Vadhan, 2017; Desfontaines, 2020). A variant of DP, local differential privacy (LDP), which is suitable for distributed settings where users do not trust any central authority, has been extensively studied (Warner, 1965; Evfimievski et al., 2003; Beimel et al., 2008; Kasiviswanathan et al., 2011; Chan et al., 2012). In this framework, each user shares an obfuscated version of their datapoint, meaningful in aggregate to the central server yet concealing individual details. Leading companies, including Google (Erlingsson et al., 2014), Microsoft (Ding et al., 2017), and Apple (Differential Privacy Team, Apple, 2017; Thakurta et al., 2017), have employed this approach for data collection.

In the LDP framework, a privacy parameter must be set—typically by the data collector—that determines the level of protection for user information. This decision is particularly challenging in high-dimensional settings, where user data includes multiple attributes such as name, date of birth, and social security number, each with differing levels of sensitivity. One common strategy is to allow the most sensitive data feature to dictate the level of privacy, providing the highest level of protection. Alternatively, separate private mechanisms can be applied to each feature according to its required privacy level. However, this approach fails when there is correlation between features. For instance, setting a lower level of privacy for a feature that is consistently aligned with a more sensitive one risks compromising the privacy of the sensitive feature.

This complexity raises a compelling question: Can we tailor privacy protection to specific data features if the correlation among data coordinates is controlled, without defaulting to the privacy level of the most sensitive coordinate? In other words, can we leverage users’ lower sensitivity preferences for certain features to reduce the error in our inference algorithms, provided the correlation with sensitive coordinates is weak?

Our contributions:

Our main contribution is addressing the challenge of feature-specific data protection from a Bayesian perspective. We present a novel Bayesian privacy framework, which we refer to as Bayesian Coordinate Differential Privacy (BCDP). BCDP complements LDP and allows a tailored privacy demand per feature. In essence, BCDP ensures that the odds of inferring a specific feature (also referred to as a coordinate) remain nearly unchanged before and after observing the outcome of the private mechanism. The framework’s parameters control the extent to which the odds remain unchanged, enabling varying levels of privacy per feature. As part of developing this new framework, we explore its formal relationship with other related DP notions and examine standard DP properties, such as post-processing and composition.

To demonstrate the efficacy of our framework, we study two fundamental problems in machine learning under a baseline level of LDP privacy, coupled with more stringent per-coordinate BCDP constraints. First, we address multivariate mean estimation. Our proposed algorithm satisfies BCDP constraints while outperforming a standard LDP algorithm, where the privacy level is determined by the most sensitive coordinate. Our experimental results confirm this advantage. Second, we present an algorithm for ordinary least squares regression under both LDP and BCDP constraints.

A key technical component of our algorithm for these two problems is a mechanism that builds on any off-the-shelf LDP mechanism and can be applied to various problems. Our algorithm involves making parallel queries to the LDP mechanism with a series of carefully selected privacy parameters, where the i-th query omits the (i-1) most sensitive coordinates. The outputs are then aggregated to produce the most accurate estimator for each coordinate.

Bayesian Feature-Specific Protection:

We now give a high-level discussion of privacy in the Bayesian framework to motivate our approach. Formal definitions are in Section 2. Consider a set of users, each possessing a high-dimensional data point \vecx with varying privacy protection needs across different features. For instance, \vecx may contain genetic markers for diseases, and the i-th coordinate x_i could reveal Alzheimer's markers that the user wishes to keep strictly confidential. In a local privacy setting, users apply a locally private mechanism M to obtain obfuscated data M(\vecx), for public aggregation and analysis. Our focus is on understanding what can be inferred about \vecx, especially the sensitive feature x_i, from the obfuscated data.

Now, imagine an adversary placing a bet on a probabilistic event related to \vecx. Without extra information, their chances of guessing correctly are based on prior knowledge of the population. However, access to M(\vecx) could prove advantageous for inference. Privacy in the Bayesian model is measured by how much M(\vecx) improves the adversary's chances. In other words, the privacy parameter limits how much their odds can shift.

The standard Bayesian interpretation of LDP ensures uniform privacy across all data features. In contrast, our framework BCDP enhances LDP by enabling feature-specific privacy controls. That is, BCDP restricts adversarial inference of events concerning individual coordinates. For example, BCDP ensures that the advantage an adversary gains from observing M(\vecx) when betting on whether a person has the Alzheimer's trait (i.e., x_i = 1) is smaller than what would be achieved under a less stringent LDP constraint.

We remark that BCDP does not replace LDP, since it lacks global data protection. Also, correlation plays a crucial role in BCDP’s effectiveness: if a sensitive feature is highly correlated with less sensitive features, increasing protection for the sensitive feature imposes stricter protection for other features as well. As the correlation between features increases, BCDP’s guarantees converge to the uniform guarantees of LDP.

Related Work:

Statistical inference with LDP guarantees has been extensively studied. For comprehensive surveys see Xiong et al. (2020); Yang et al. (2023). Below, we discuss the results that are most relevant to our setup.

Several papers consider the uneven protection of privacy across features. For instance, Ghazi et al. (2022) and Acharya et al. (2020) focus on settings where the privacy parameters vary across different neighboring datasets. Other models arise from altering the concept of neighboring datasets (Kifer and Machanavajjhala, 2014; He et al., 2014), or they emphasize indistinguishability guarantees within a redefined metric space for the datasets or data points (Chatzikokolakis et al., 2013; Alvim et al., 2018; Imola et al., 2022; Andrés et al., 2013). Another relevant notion for classification tasks is Label-DP, which considers labels of training data as sensitive, but not the features themselves (Chaudhuri and Hsu, 2011; Beimel et al., 2013; Wang and Xu, 2019; Ghazi et al., 2021). A recent generalization of Label-DP, proposed by Mahloujifar et al. (2023), is Feature-DP, which allows specific features of the data to be disclosed by the algorithm while ensuring the privacy of the remaining information. A line of work, e.g., Kifer and Machanavajjhala (2011); Kenthapadi et al. (2013), studies attribute DP, which limits changes in the output distribution of the privacy mechanism when a single coordinate of the data is modified.

A common shortcoming in the aforementioned approaches is that the heterogeneous privacy protection across features, without accounting for their correlations, can lead to unaddressed privacy leakage—an issue our framework is specifically designed to mitigate. Another key distinction of our work is the integration of the BCDP framework with the LDP framework, enabling simultaneous protection at both the feature level and the global level.

A Bayesian approach to Central-DP is considered in Triastcyn and Faltings (2020), where the notion of neighboring datasets is taken over pairs drawn from the same distribution. This is similar in spirit to our framework, but we operate in the Local-DP regime and therefore work with a prior instead. Xiao and Devadas (2023) consider a Bayesian approach to privacy where the goal is to prevent an observer from nearly guessing the input to a mechanism under a given notion of distance and error probability. Another Bayesian interpretation of DP is presented in Kasiviswanathan and Smith (2014), which shows that DP ensures the posterior distribution after observing the output of a DP algorithm is close to the prior in total-variation distance. None of these results explicitly focus on varying privacy requirements across coordinates.

Private mean estimation and private least-squares regression are well-studied problems in the literature, in both the central (Biswas et al., 2020; Karwa and Vadhan, 2017; Kifer et al., 2012; Vu and Slavkovic, 2009) and local (Asi et al., 2022; Feldman and Talwar, 2021; Zheng et al., 2017) DP regimes. We depart from these works as we consider a BCDP constraint in addition to an LDP constraint.

Organization:

We present a formal treatment of our framework in Section 2. Some key properties of the framework are presented in Section 3. We consider the problem of mean estimation in Section 4 and ordinary least-squares regression in Section 5.

2 Problem Formulation

Let \cX denote the data domain, represented as the Cartesian product \cX := \bigtimes_{j=1}^{d} \cX_j. One can consider this as a general decomposition where each of the d canonical data coordinates can be categorical or real-valued, or a combination of both, as the setting dictates. For a vector \vecx ∈ \cX, we let x_i denote the i-th coordinate, and \vecx_{-i} the vector with (d-1) coordinates obtained by removing the i-th coordinate x_i from \vecx. A private mechanism M is a randomized algorithm that maps \vecx to M(\vecx) in an output space \cY.

Let (\cX, \cF_\cX) and (\cY, \cF_\cY) be measurable spaces. Let each coordinate space \cX_i be equipped with a σ-algebra \cF_{\cX_i}, and take \cF_\cX equal to the product σ-algebra \bigtimes_{i=1}^{d} \cF_{\cX_i}. A randomized mechanism M from \cX to \cY can be represented by a Markov kernel μ : (\cX, \cF_\cX) → (\cY, \cF_\cY), where for an input \vecx ∈ \cX, the output M(\vecx) is a \cY-valued random variable with law μ_\vecx(·).

Local differential privacy

We first revisit the definition of Local Differential Privacy (LDP).

Definition 1 (Local DP (Kasiviswanathan et al., 2011)).

A mechanism M is ε-LDP if, for all R ∈ \cF_\cY and all \vecx, \vecx' ∈ \cX, we have:

μ_\vecx(R) ≤ e^ε μ_{\vecx'}(R). (1)

With the understanding that M(\vecx) ~ μ_\vecx(·), Definition 1 can be rewritten in the more familiar form Pr{M(\vecx) ∈ R} ≤ e^ε Pr{M(\vecx') ∈ R} for all \vecx, \vecx' ∈ \cX and R ∈ \cF_\cY.
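As a concrete illustration of Definition 1 (our addition, not part of the original text), consider binary randomized response, the classical ε-LDP mechanism going back to Warner (1965): report the true bit with probability e^ε/(e^ε + 1) and flip it otherwise. The following Python sketch builds the kernel μ and recovers the smallest ε for which (1) holds; for finite output spaces it suffices to check singleton events R, since a ratio of sums never exceeds the largest ratio of terms.

import math

def randomized_response(eps):
    # Kernel mu[x][y] = Pr{M(x) = y} for binary randomized response.
    p = math.exp(eps) / (math.exp(eps) + 1.0)  # probability of a truthful report
    return {0: {0: p, 1: 1 - p}, 1: {0: 1 - p, 1: p}}

def ldp_epsilon(mu):
    # Smallest eps with mu_x(R) <= e^eps * mu_x'(R) for all x, x', R;
    # checking singleton R is enough on a finite output space.
    xs = list(mu)
    ys = list(next(iter(mu.values())))
    return max(math.log(mu[x][y] / mu[xp][y]) for x in xs for xp in xs for y in ys)

print(ldp_epsilon(randomized_response(1.0)))  # 1.0 up to floating-point error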

Next, we focus on a Bayesian interpretation of this setting. Suppose we have an underlying data distribution π. That is, we equip (\cX, \cF_\cX) with the probability measure π, which we call the prior. This induces a probability space (\cX × \cY, \cF_\cX × \cF_\cY, P) where the measure P is characterized via

P(S × R) = ∫_S μ_\vecx(R) dπ(\vecx),   S ∈ \cF_\cX, R ∈ \cF_\cY. (2)

Note that the mechanism M defining μ is precisely represented on this space by the random variable M : (\vecx, y) ∈ \cX × \cY ↦ y. By a slight abuse of notation, we continue to denote the (randomized) output of the mechanism by M(\vecx). We first define the sets in \cX which have positive probability under π:

\cF_\cX^+ = {S ∈ \cF_\cX | π(S) > 0}. (3)

For S ∈ \cF_\cX^+ and R ∈ \cF_\cY, note that

P{M(\vecx) ∈ R | \vecx ∈ S} = P(S × R) / π(S). (4)

Bayesian differential privacy:

We now introduce a Bayesian formulation of the LDP definition which involves both the mechanism M and the prior π. This can be seen as the local version of the definitions proposed in earlier works, e.g., Kifer and Machanavajjhala (2014); Triastcyn and Faltings (2020).

Definition 2 (Bayesian DP).

The pair (π, M) is ε-BDP if for all R ∈ \cF_\cY and all S, S' ∈ \cF_\cX^+, we have:

P{M(\vecx) ∈ R | \vecx ∈ S} ≤ e^ε P{M(\vecx) ∈ R | \vecx ∈ S'}. (5)

We show in Proposition 1 that ε-LDP and ε-BDP can be viewed as equivalent, modulo some technicalities.

Proposition 1.

If M is ε-LDP then (π, M) is ε-BDP. Conversely, if (π, M) is ε-BDP, then (assuming \cY is a Polish space) there exists a mechanism M' which is ε-LDP such that P{M(\vecx) = M'(\vecx)} = 1.

The reverse direction of the equivalence above relies on \cY being a Polish space, which should capture all settings of practical interest. The proof can be found in Appendix A.1.

As discussed in Kifer and Machanavajjhala (2014), we can interpret the BDP definition in terms of odds ratios. The prior odds of two disjoint events S and S' is given by π(S)/π(S'). Now, for a set R ∈ \cF_\cY with positive measure, the posterior odds, after observing mechanism output M(\vecx) ∈ R, is equal to

P{\vecx ∈ S | M(\vecx) ∈ R} / P{\vecx ∈ S' | M(\vecx) ∈ R}. (6)

One can easily verify that the BDP definition implies that the ratio of posterior and prior odds, also known as the odds ratio, is in the interval [e^{-ε}, e^ε]. In other words, the BDP definition limits how much the output of the mechanism changes the odds, which in turn limits our ability to distinguish whether the data \vecx was in S or S'.
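To make the odds-ratio reading concrete, here is a small numerical sketch (ours): for randomized response under a biased Bernoulli prior, the posterior-to-prior odds ratio of S = {x = 1} against S' = {x = 0} stays within [e^{-ε}, e^ε] for every observed output.

import math

eps = 1.0
p = math.exp(eps) / (math.exp(eps) + 1.0)
mu = {0: {0: p, 1: 1 - p}, 1: {0: 1 - p, 1: p}}  # randomized-response kernel
pi = {0: 0.9, 1: 0.1}                            # prior on x

prior_odds = pi[1] / pi[0]
for y in (0, 1):
    # Posterior P{x = s | M(x) = y} via Bayes' rule.
    post = {s: pi[s] * mu[s][y] for s in (0, 1)}
    posterior_odds = post[1] / post[0]
    ratio = posterior_odds / prior_odds
    assert math.exp(-eps) - 1e-9 <= ratio <= math.exp(eps) + 1e-9
    print(f"y = {y}: odds ratio {ratio:.3f}")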

Coordinate differential privacy:

We wish to measure the privacy loss of each coordinate incurred by a mechanism M. Initially, one might hypothesize that by altering the definition of LDP to reflect changes in specific coordinates, we could achieve privacy protection for certain features of the user's data. More precisely, consider the following definition of Coordinate DP (CDP; the acronym is not to be confused with Concentrated DP, which is not considered in this work), where 𝒄 represents a vector of the desired levels of privacy for each coordinate.

Definition 3 (Coordinate DP).

Let 𝒄 ∈ \bbR_{≥0}^d. A mechanism M is 𝒄-CDP if, for each i ∈ [d], all R ∈ \cF_\cY, and all \vecx, \vecx' ∈ \cX, we have the implication:

\vecx_{-i} = \vecx'_{-i}  ⇒  μ_\vecx(R) ≤ e^{c_i} μ_{\vecx'}(R). (7)

A special case of CDP, where all c_i's are equal, is referred to as attribute DP in the literature (Kifer and Machanavajjhala, 2011; Kenthapadi et al., 2013; Ghazi et al., 2022). Additionally, CDP itself can be viewed as a special case of Partial DP, introduced in Ghazi et al. (2022).

The drawback of this definition is that it does not account for potential correlation among the data coordinates and, therefore, does not provide the type of guarantee we hope for regarding sensitive parts of the data. In other words, in 𝒄-CDP, stating that the privacy level of coordinate i is c_i could be misleading. For instance, having c_i = 0 does not necessarily ensure that no information is leaked about x_i, as the other (potentially less-well protected) coordinates \vecx_{-i} may be correlated with it.

Our formulation: Bayesian coordinate differential privacy:

In order to capture the correlation among the data coordinates, we adopt a Bayesian interpretation of differential privacy. As seen in Proposition 1, BDP and LDP are closely related, and this serves as the main motivation for our definition.

For i ∈ [d], define

\cF_{\cX_i}^+ = {S_i ∈ \cF_{\cX_i} | π{x_i ∈ S_i} > 0}. (8)

For S_i ∈ \cF_{\cX_i}^+, we have

P{M(\vecx) ∈ R | x_i ∈ S_i} = P((S_i × \cX_{-i}) × R) / π(S_i × \cX_{-i}), (9)

where \cX_{-i} = \bigtimes_{j∈[d], j≠i} \cX_j. Thus, similar to Definition 2, we have a natural version of privacy for each coordinate, which we shall refer to as Bayesian Coordinate DP (BCDP).

Definition 4 (Bayesian Coordinate DP).

Let 𝜹 ∈ \bbR_{≥0}^d. A pair (π, M) is 𝜹-BCDP if, for each i ∈ [d], and for all R ∈ \cF_\cY and S_i, S_i' ∈ \cF_{\cX_i}^+, we have:

P{M(\vecx) ∈ R | x_i ∈ S_i} ≤ e^{δ_i} P{M(\vecx) ∈ R | x_i ∈ S_i'}. (10)

Unlike the Coordinate-DP definition, this definition accounts for potential correlation between coordinates through incorporation of the prior π.

It is worth noting that, similar to the BDP definition, we can interpret the BCDP definition in terms of the odds ratio, with the difference that we consider the two events {x_i ∈ S_i} and {x_i ∈ S_i'}. In other words, having a small δ_i guarantees that the mechanism's output does not provide much useful information for discriminating between the possibilities that the underlying data has coordinate x_i in S_i or in S_i'. This aligns with our goal of proposing a new formulation that characterizes how much information is revealed about each particular coordinate. We further discuss this interpretation in the next section.
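For finite domains, Definition 4 can be checked exhaustively. The sketch below (our illustration; the dictionary-based encoding of the prior and mechanism is hypothetical, not from the paper) computes the per-coordinate δ_i of a pair (π, M) over binary coordinates. Since a ratio of sums is bounded by the largest ratio of terms, singleton events S_i, S_i', and R attain the supremum, so only singletons are compared.

import itertools, math

def bcdp_deltas(prior, mechanism, d):
    # prior: dict mapping each x (a d-tuple of bits) to pi(x); assumes both
    # values of every coordinate have positive prior mass.
    # mechanism: dict mapping each x to a dict over outputs y of Pr{M(x) = y}.
    ys = {y for dist in mechanism.values() for y in dist}
    deltas = []
    for i in range(d):
        # Conditional output laws P{M(x) = y | x_i = s} for s in {0, 1}.
        cond = {}
        for s in (0, 1):
            mass = sum(p for x, p in prior.items() if x[i] == s)
            cond[s] = {y: sum(p * mechanism[x].get(y, 0.0)
                              for x, p in prior.items() if x[i] == s) / mass
                       for y in ys}
        ratios = [0.0]
        for y in ys:
            for s, sp in itertools.permutations((0, 1), 2):
                a, b = cond[s][y], cond[sp][y]
                if a > 0:
                    ratios.append(math.inf if b == 0 else math.log(a / b))
        deltas.append(max(ratios))
    return deltas

# Releasing the XOR of two i.i.d. fair bits has delta = (0, 0): the output says
# nothing about either coordinate alone, though it determines the pair's parity.
prior = {x: 0.25 for x in itertools.product((0, 1), repeat=2)}
xor_mech = {x: {x[0] ^ x[1]: 1.0} for x in prior}
print(bcdp_deltas(prior, xor_mech, d=2))  # [0.0, 0.0]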

3 Properties of BCDP

In this section, we discuss a number of properties following from the BCDP definition. Figure 1 compiles some of the results from Section 2 and Section 3 on how the different notions of Local DP (LDP), Coordinate-DP (CDP), Bayesian DP (BDP), and Bayesian Coordinate-DP (BCDP) are interrelated.

3.1 BCDP through the Lens of Hypothesis Testing

In this subsection, we present an interpretation of what the privacy vector 𝜹 quantifies in 𝜹-BCDP. We begin by revisiting the connection between hypothesis testing and the definition of differential privacy, and demonstrate how this connection extends to the 𝜹-BCDP definition. For an ε-LDP mechanism M represented by a Markov kernel μ, suppose an adversary observes a realization of the output Y and is tasked with determining whether the input was \vecx or \vecx'. This scenario can be formulated as the following binary hypothesis testing problem:

H_0 : Y ~ μ_\vecx(·),
H_1 : Y ~ μ_{\vecx'}(·).

Let the rejection region be R ⊆ \cY. Denote the type I error (false alarm) rate as α(R, M, \vecx, \vecx') and the type II error (missed detection) rate as β(R, M, \vecx, \vecx'). We restate a known result below (Kairouz et al., 2015; Wasserman and Zhou, 2010).

Proposition 2.

A mechanism M(·) is ε-LDP iff the following condition holds for any \vecx, \vecx' ∈ \cX and any R ∈ \cF_\cY:

e^ε α(R, M, \vecx, \vecx') + β(R, M, \vecx, \vecx') ≥ 1. (11)

We note that the criterion (11) can be equivalently replaced by the criterion

α(R, M, \vecx, \vecx') + e^ε β(R, M, \vecx, \vecx') ≥ 1.

As noted in Kairouz et al. (2015), this operational perspective shows that it is impossible to force type I and type II error rates to simultaneously vanish for a DP algorithm, and the reverse implication also holds.
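As a quick numerical sanity check of Proposition 2 (our addition), one can enumerate all rejection regions for binary randomized response and verify (11) directly:

import itertools, math

eps = 0.5
p = math.exp(eps) / (math.exp(eps) + 1.0)
mu = {0: {0: p, 1: 1 - p}, 1: {0: 1 - p, 1: p}}  # randomized-response kernel

for x, xp in itertools.permutations((0, 1), 2):
    for R in (set(), {0}, {1}, {0, 1}):           # all rejection regions
        alpha = sum(mu[x][y] for y in R)          # type I: reject H0 under H0
        beta = 1 - sum(mu[xp][y] for y in R)      # type II: accept H0 under H1
        assert math.exp(eps) * alpha + beta >= 1 - 1e-12

# Equality is attained, e.g., at R = {1} with (x, x') = (0, 1), so the bound
# in (11) is tight for this mechanism.
print("e^eps * alpha + beta >= 1 holds for every rejection region.")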

For BCDP:

Now, consider a 𝜹-BCDP mechanism M. Suppose an adversary has knowledge of the prior π and again observes the algorithm's output Y. For any i ∈ [d], the adversary aims to distinguish whether the output was generated from data whose i-th coordinate belongs to the set S_i or to S_i'. In other words, we consider the following binary hypothesis testing problem:

H_0^i : Y ~ P{ · | x_i ∈ S_i},
H_1^i : Y ~ P{ · | x_i ∈ S_i'}.

Let the rejection region for the i-th hypothesis test be R_i ∈ \cF_\cY. For the i-th test, denote the type I and type II error rates as α(R_i, M, S_i, S_i') and β(R_i, M, S_i, S_i'). Then, the following result holds:

Theorem 1.

A pair (π, M) is 𝜹-BCDP iff the following condition holds for every coordinate i ∈ [d]:

e^{δ_i} α(R_i, M, S_i, S_i') + β(R_i, M, S_i, S_i') ≥ 1, (12)

for any S_i, S_i' ∈ \cF_{\cX_i}^+ and R_i ∈ \cF_\cY.

We note that the criterion (12) can be equivalently replaced by the criterion

α(R_i, M, S_i, S_i') + e^{δ_i} β(R_i, M, S_i, S_i') ≥ 1.

The proof of Theorem 1 is presented in Section A.2. Thus, the theorem shows that the vector 𝜹 in the 𝜹-BCDP formulation captures the privacy loss for each of the coordinates in the same way ε in ε-DP captures the privacy loss of the entire data. In contrast, the vector 𝒄 in 𝒄-CDP does not admit such an interpretation of privacy loss across the coordinates.

[Figure 1: implication diagram among ε_L-LDP, ε_B-BDP, 𝒄-CDP, and 𝜹-BCDP, with edge labels ε_L = ε_B; 𝜹 ⪯ ε_B 𝟏; 𝒄 ⪯ ε_L 𝟏; 𝜹 ⪯ func. of (𝒄, π); and ε_L ≤ Σ_i c_i.]
Figure 1: General relation of LDP (Definition 1), BDP (Definition 2), CDP (Definition 3), and BCDP (Definition 4). The condition \veca ⪯ \vecb is to be read as a_i ≤ b_i for all i. The implication arrows and the described transformations of the parameters are to be interpreted as sufficient conditions. For example, an ε_L-LDP mechanism is guaranteed to be 𝒄-CDP for 𝒄 ⪯ ε_L 𝟏. The function that translates 𝒄-CDP under the prior π to BCDP is presented in Proposition 4.

3.2 BCDP can be achieved via LDP

We begin with a simple result that follows directly from the ε-BDP definition. If a mechanism is ε-LDP, then it is also ε-BDP. Noticing that ε-BDP ensures a ratio of e^ε for each coordinate as well in (10), we see that each coordinate has a privacy level of ε. Proposition 3 formalizes this observation; the proof can be found in Section A.3.

Proposition 3.

(LDP to BCDP) An ε-LDP mechanism satisfies ε𝟏-BCDP.

Thus, a baseline approach to implementing 𝜹-BCDP is to use a (min_i δ_i)-LDP mechanism. However, this would extend the most conservative privacy guarantee, required for a single coordinate, to all coordinates. As we will show, we can instead maintain an overall privacy guarantee of ε > min 𝜹 while providing the 𝜹-BCDP guarantee, thereby achieving lower error for downstream tasks.

3.3 BCDP does not imply LDP

It should be noted that BCDP does not ensure LDP, BDP, or CDP. Example 1 below demonstrates a mechanism that is BCDP, but not LDP.

Example 1.

Consider data \vecx = (x_1, x_2) where x_1, x_2 are i.i.d. Bern(1/2), and let \cY = {0, 1}. Let M be defined by the table below, with parameters a, b, c ∈ (0, 1]. The mechanism M is not LDP (and hence neither BDP nor CDP). However, for BCDP, the terms P{M(\vecx) = y | x_i = 0} and P{M(\vecx) = y | x_i = 1} have finite multiplicative range over y = 0, 1, for both i = 1, 2. For instance, if a = b = c = 1/2, the resulting M is (log 2, log 2)-BCDP.

P{M(\vecx) = 1 | \vecx}:

            x_1 = 0    x_1 = 1
x_2 = 0        a          b
x_2 = 1        0          c
The takeaway from the above example is that there can be priors and mechanisms such that the prior-mechanism pair is 𝜹-BCDP but not ε-DP for any ε < ∞. Intuitively, BCDP quantifies the privacy loss of each coordinate. In Example 1, one cannot infer much about any particular coordinate from the output, due to the BCDP guarantee. However, one may still be able to draw conclusions about the coordinates jointly. For instance, if the mechanism output in Example 1 is 1, then the input could not have been \vecx = (0, 1).
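The claims in Example 1 are easy to verify numerically. The following sketch (ours) computes the conditional output laws for a = b = c = 1/2 and confirms the log 2 bound for each coordinate, while the zero entry of the table rules out any finite LDP parameter.

import math

a = b = c = 0.5
# P{M(x) = 1 | (x1, x2)} from the table in Example 1.
p1 = {(0, 0): a, (1, 0): b, (0, 1): 0.0, (1, 1): c}

for i in (0, 1):                       # coordinate index
    for y in (0, 1):                   # output value
        cond = {}
        for s in (0, 1):               # conditioning event {x_i = s}
            rows = [x for x in p1 if x[i] == s]
            py1 = sum(p1[x] for x in rows) / len(rows)   # uniform prior over x
            cond[s] = py1 if y == 1 else 1.0 - py1
        ratio = max(cond[0] / cond[1], cond[1] / cond[0])
        assert ratio <= 2.0 + 1e-12    # i.e., delta_i <= log 2

# P{M((0,1)) = 1} = 0 while P{M((1,1)) = 1} = c > 0, so M is not eps-LDP for
# any finite eps, even though it is (log 2, log 2)-BCDP.
print("Example 1 verified.")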

As this result suggests, ensuring BCDP alone is not sufficient to guarantee overall privacy. In other words, the right way to think of BCDP is as a tool that allows us to quantify the privacy offered by an algorithm for each coordinate. The user can demand overall ε-LDP along with 𝜹-BCDP, where the value δ_i can be significantly smaller than ε for those coordinates whose privacy the user is most concerned about.

3.4 Achieving BCDP via CDP

As we discussed earlier, CDP does not imply BCDP, as it does not take into account the correlation among the features. Therefore, a natural question arises: could CDP, along with some bound on the dependence among the features, lead to a bound with respect to the BCDP definition? The next result provides an answer to this question. We denote the total variation distance between two distributions P and Q by TV(P, Q).

Proposition 4.

(CDP to BCDP) Suppose mechanism M is 𝒄-CDP and there exist q_1, …, q_d such that

TV(π_{\vecx_{-i} | x_i ∈ S_i}, π_{\vecx_{-i} | x_i ∈ S_i'}) ≤ q_i   for all S_i, S_i' ∈ \cF_{\cX_i}^+. (13)

Then:

1. The mechanism is (Σ_{i=1}^d c_i)-LDP and 𝜹-BCDP with δ_i = c_i + log(1 + q_i e^{Σ_{j≠i} c_j} − q_i).

2. In case the mechanism is ε-LDP for some ε ≥ 0, it is 𝜹'-BCDP with δ_i' = min{c_i + log(1 + q_i e^ε − q_i), ε}.

For the proof, see Appendix A.4. To better highlight this result, consider a simple example in which the input's d coordinates are redundant copies of the same value and we have access to a 𝒄-CDP mechanism M. Then, as Proposition 4 suggests, and given that we have q_i = 1 here, this mechanism is 𝜹-BCDP with δ_i = Σ_j c_j. This is, of course, intuitive, as all coordinates reveal information about the same value. It also reinforces that BCDP is sensitive to correlations, unlike CDP. The first part of Proposition 4 provides an immediate method for constructing a BCDP mechanism using mechanisms that offer CDP guarantees, such as applying off-the-shelf LDP mechanisms per coordinate. This can also be done without requiring full knowledge of the prior. The second part of Proposition 4 suggests that in cases where we design an algorithm with a tighter LDP guarantee than just Σ_{i=1}^d c_i, we can further improve the BCDP bound. One example of such a construction is provided in our private mean estimation algorithm in Section 4. One challenge with this approach is translating the constraint 𝜹 to a vector 𝒄. In Section A.5, we provide a tractable approach to do so by imposing a linear constraint on 𝒄.
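Proposition 4 is directly computable. The sketch below (ours) maps a CDP vector 𝒄 and correlation bounds q_i to the implied BCDP vector, covering both parts of the proposition; the q_i = 1 demo reproduces the redundant-copies example above.

import math

def bcdp_from_cdp(c, q, eps=None):
    # Part 1: delta_i = c_i + log(1 + q_i * e^{sum_{j != i} c_j} - q_i).
    # Part 2: if the mechanism is additionally eps-LDP, replace the exponent
    #         sum_{j != i} c_j by eps and cap each delta_i at eps.
    total = sum(c)
    deltas = []
    for ci, qi in zip(c, q):
        exponent = eps if eps is not None else total - ci
        di = ci + math.log(1 + qi * math.exp(exponent) - qi)
        deltas.append(min(di, eps) if eps is not None else di)
    return deltas

# Fully redundant coordinates (q_i = 1): every delta_i collapses to sum_j c_j.
print(bcdp_from_cdp([0.5, 0.5, 0.5], [1.0, 1.0, 1.0]))   # [1.5, 1.5, 1.5]
# Independent coordinates (q_i = 0): delta_i = c_i, as expected.
print(bcdp_from_cdp([0.5, 0.5, 0.5], [0.0, 0.0, 0.0]))   # [0.5, 0.5, 0.5]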

3.5 Do Composition and Post-Processing Hold for BCDP?

Composition is a widely used and well-known property of LDP. Nonetheless, as the following example highlights, composition, as traditionally applied in differential privacy settings, does not hold in the BCDP setting.

Example 2.

Let \vecx ∈ {0, 1}^3 be a vector of independent Bern(1/2) coordinates. Consider the mechanism

M(\vecx) = { [x_1 ⊕ x_2, x_3]^⊤  with probability 1/2,
           { [x_2, x_1 ⊕ x_3]^⊤  with probability 1/2,    (14)

where ⊕ denotes addition in ℤ_2. It is straightforward to verify that M(·) is in fact BCDP with δ_1 = 0 with respect to the first coordinate. However, releasing two independent copies of M(·) is not BCDP with respect to the first coordinate. To see this, consider the case where x_1 = 1 and, in the two copies of M(\vecx), one adds x_1 to x_2 and the other adds it to x_3. By observing the two copies, we can determine that x_1 = 1.
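The failure of composition in Example 2 can be simulated in a few lines (our check): a single copy of M hides x_1 perfectly, but whenever two independent copies produce different outputs, the input must have had x_1 = 1, since with x_1 = 0 both branches of (14) output exactly (x_2, x_3).

import random

def M(x, rng):
    # One draw from the mechanism in (14).
    if rng.random() < 0.5:
        return (x[0] ^ x[1], x[2])
    return (x[1], x[0] ^ x[2])

rng = random.Random(0)
for _ in range(100000):
    x = tuple(rng.randint(0, 1) for _ in range(3))
    y1, y2 = M(x, rng), M(x, rng)
    if y1 != y2:          # differing copies certify x_1 = 1 with certainty
        assert x[0] == 1

print("Two copies leak x_1, although each copy alone is delta_1 = 0 private.")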

On the other hand, the post-processing property holds for BCDP (see Appendix A.6 for the proof).

Proposition 5.

(Post-processing for BCDP) Let M(·) be a 𝜹-BCDP mechanism and K : \cY → \cZ be a measurable function. Then, K ∘ M is also 𝜹-BCDP.

4 Private Mean Estimation

Mechanism 1: Proposed locally private mechanism M_mean(\vecx, 𝒄)
  Input: user data \vecx ∈ [−1, 1]^d and non-decreasing vector 𝒄.
  c_0 ← 0
  For k ∈ [d], set w_k = (c_k − c_{k−1})^2 / (d − k + 1)
  for k ∈ [d] do
     Y^k ← M_LDP(\vecx_{k:d}, c_k − c_{k−1}, √(d − k + 1)) ∈ \bbR^{d−k+1}
  end for
  ν̂_i ← (Σ_{k=1}^i Y^k_{i+1−k} w_k) / (Σ_{k=1}^i w_k) for i ∈ [d]
  Return ν̂
Figure 2: The figure illustrates how the mechanism M_mean in Mechanism 1 obtains the estimate ν̂. For example, the vector Y^k is obtained by taking the vector (x_k, x_{k+1}, …, x_d) and passing it through the LDP channel M_LDP with privacy parameter c_k − c_{k−1}. The estimate ν̂_i is obtained by taking a linear combination of the components of {Y^k}_{k∈[d]} corresponding to x_i, i.e., directly below x_i in the figure.
Mechanism 2: Local DP channel for data in \cB_2(r), proposed by Duchi et al. (2013a)
  Input: user data \vecv, Local-DP level α ≥ 0, and bound r such that \vecv ∈ \cB_2(r).
  d ← dimension(\vecv)
  B ← √π · d · r · ((e^α + 1)/(e^α − 1)) · Γ((d + 1)/2) / Γ(d/2 + 1)
  K ~ Bern(e^α / (e^α + 1))
  S ~ Bern(1/2 + ||\vecv||_2 / (2r))
  ṽ ← (2S − 1) · r\vecv / ||\vecv||_2
  Return Z ~ Uniform(\vecz ∈ \bbR^d : (2K − 1)\vecz^⊤ ṽ > 0, ||\vecz||_2 = B)
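For reference, here is a minimal NumPy sketch of Mechanism 2 (ours, not the authors' code). It samples the cap-uniform output by drawing a uniform point on the sphere and reflecting it into the required halfspace, and it maps a zero input vector to a uniformly random direction, a boundary case the listing leaves implicit; it also requires α > 0 because of the e^α − 1 factor.

import numpy as np
from math import exp, gamma, pi, sqrt

def sample_sphere(d, rng):
    g = rng.standard_normal(d)
    return g / np.linalg.norm(g)

def duchi_ldp(v, alpha, r, rng):
    # Mechanism 2: an alpha-LDP (alpha > 0), unbiased release of v in B_2(r).
    d = len(v)
    B = sqrt(pi) * d * r * (exp(alpha) + 1) / (exp(alpha) - 1) \
        * gamma((d + 1) / 2) / gamma(d / 2 + 1)
    K = rng.random() < exp(alpha) / (exp(alpha) + 1)
    norm = np.linalg.norm(v)
    S = rng.random() < 0.5 + norm / (2 * r)
    vt = (2 * S - 1) * r * v / norm if norm > 0 else r * sample_sphere(d, rng)
    z = sample_sphere(d, rng)
    if (2 * K - 1) * z.dot(vt) < 0:
        z = -z                      # reflect into {z : (2K - 1) z . vt > 0}
    return B * z

rng = np.random.default_rng(0)
v = np.array([0.2, -0.4, 0.1])
est = np.mean([duchi_ldp(v, 1.0, sqrt(3), rng) for _ in range(20000)], axis=0)
print(est)   # concentrates around v: the channel is unbiased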

To demonstrate the practical use of the BCDP framework, we consider the private mean estimation problem.

Problem Setup:

Let there be n users submitting their data to a central server via local DP mechanisms. Let user j's data be \vecx^(j) ∈ [−1, 1]^d, where \vecx^(j) is drawn from a prior π^(j) on [−1, 1]^d; the server and users are not assumed to know the priors (which may differ across users). Our goal is to estimate the empirical mean x̄ = (1/n) Σ_{j=1}^n \vecx^(j) ∈ [−1, 1]^d under ε-LDP and 𝜹-BCDP. We use the ℓ_2-norm as our error metric. Without loss of generality, we assume that 𝜹 is in non-decreasing order. We make the following assumption regarding the priors.

Assumption 1.

For every i ∈ [d − 1] and every j ∈ [n], we have

TV(π^(j)_{\vecx_{−i} | x_i ∈ S_i}, π^(j)_{\vecx_{−i} | x_i ∈ S_i'}) ≤ q   for all S_i, S_i' ∈ \cF_{\cX_i}^+. (15)

Here q can be interpreted as a measure of how interdependent the different coordinates are. For instance, in the case where the coordinates are independent we have q = 0, and in the case where they are fully equal, we have q = 1. Note that we do not need (15) to hold for the last coordinate (i = d). While this slight relaxation does not play a role in this section, it will be useful when we move to the private optimization application.

It is also worth emphasizing that our analysis extends to cases where the bounds are heterogeneous across coordinates (similar to (13)) and across users. However, we opt for a universal bound for the sake of simplifying the presentation of results. We assume that the upper bound q is known to the designer of the local channels, which can be either the end users themselves or the central server.

Our proposed privacy mechanism M_mean (Mechanism 1) builds upon any LDP mechanism M_LDP capable of handling inputs of different dimensions, used for mean estimation as a black-box tool. While Mechanism 1 satisfies the privacy constraints for any black-box LDP mechanism M_LDP, we focus on the mechanism presented by Duchi et al. (2013a) for proving error bounds. The mechanism of Duchi et al. (2013a), which is known to be minimax optimal for mean estimation, is outlined in Mechanism 2 for completeness. A figure explaining M_mean is presented in Figure 2.
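A compact sketch of M_mean itself (ours) may help fix the indexing. Here m_ldp is any black-box channel with the interface of Mechanism 2, e.g., the duchi_ldp sketch above; queries with zero incremental budget carry zero weight and are skipped, and we assume c_1 > 0 so every coordinate receives at least one informative query.

import numpy as np

def m_mean(x, c, m_ldp, rng):
    # Mechanism 1: the k-th parallel query releases the suffix x_{k:d} at privacy
    # level c_k - c_{k-1}; estimates covering coordinate i are averaged with
    # weights w_k = (c_k - c_{k-1})^2 / (d - k + 1). x: NumPy array in [-1, 1]^d.
    d = len(x)
    c = np.concatenate(([0.0], np.asarray(c, dtype=float)))   # prepend c_0 = 0
    w = (c[1:] - c[:-1]) ** 2 / (d - np.arange(d))
    Y = [m_ldp(x[k:], c[k + 1] - c[k], np.sqrt(d - k), rng)
         if c[k + 1] > c[k] else np.zeros(d - k)               # zero-weight query
         for k in range(d)]
    nu = np.empty(d)
    for i in range(d):
        ws = w[: i + 1]
        vals = np.array([Y[k][i - k] for k in range(i + 1)])   # entries below x_i
        nu[i] = vals @ ws / ws.sum()
    return nu

# Example: m_mean(np.array([0.1, -0.3, 0.7]), [0.2, 0.5, 1.0], duchi_ldp,
#                 np.random.default_rng(0))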

Theorem 2 shows that our proposed mechanism satisfies the desired privacy constraints and provides a bound on the mean-squared error (MSE). The notation a ∧ b refers to min{a, b}. Later in this section, we present a refined result with a more interpretable MSE bound in Corollary 1.

Theorem 2.

Suppose Assumption 1 holds and let δ̃_i := min{δ_i, ε} for each i ∈ [d]. Then M_mean, i.e., Mechanism 1, with parameters

c_d = min{ log((e^{ζ δ̃_1} + q − 1)/q), δ̃_d },   c_0 = 0, (16)

and, for all i < d,
c_i = { c_d                              if c_d ≤ δ̃_i,
      { δ̃_i − log(1 + q e^{c_d} − q)    otherwise,

where ζ ∈ (0, 1] is a free parameter, is 𝜹-BCDP and ε-LDP. Moreover, using Mechanism 2 as M_LDP, we obtain the following mean-squared error bound:

E[ ||x̄ − Π((1/n) Σ_{j=1}^n M_mean(\vecx^(j), 𝒄))||_2^2 | \vecx^(1), …, \vecx^(n) ] ≲ (1/n) Σ_{i=1}^d ( 1 / Σ_{k=1}^i (c_k − c_{k−1})^2/(d − k + 1) ) ∧ d, (17)

where Π(·) denotes projection onto [−1, 1]^d and the expectation is taken over the randomness in M_mean(·).

The proof is presented in Appendix B.2. Here, we provide a proof sketch for the privacy guarantee. We first establish that M_mean is 𝒄-CDP and c_d-LDP. The latter implies that M_mean also satisfies c_d 𝟏-BCDP. Thus, if δ̃_i ≥ c_d, we have already shown that M_mean is δ_i-BCDP with respect to the i-th coordinate. For the case where c_d > δ̃_i, we use Proposition 4, which essentially implies that M_mean is (c_i + log(1 + q e^{c_d} − q))-BCDP with respect to the i-th coordinate. Substituting the value of c_i from the statement of Theorem 2 gives us the desired δ_i-BCDP guarantee for coordinate i. Finally, the choice of c_d in (16) ensures that the proposed c_i's are all non-negative and valid.
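The parameter choice (16) is mechanical to implement. The sketch below (ours) computes 𝒄 from (𝜹, ε, q, ζ), handling the q = 0 boundary explicitly, where the logarithmic term in (16) diverges and the min is attained at δ̃_d:

import math
import numpy as np

def privacy_budget(delta, eps, q, zeta=1.0):
    # Computes the vector c of Theorem 2 from the non-decreasing BCDP demands
    # delta, the overall LDP level eps, the correlation bound q in [0, 1], and
    # the free parameter zeta in (0, 1].
    dt = np.minimum(np.asarray(delta, dtype=float), eps)      # delta-tilde
    if q > 0:
        c_d = min(math.log((math.exp(zeta * dt[0]) + q - 1) / q), dt[-1])
    else:
        c_d = dt[-1]          # q = 0: the log term is infinite
    return np.where(c_d <= dt, c_d, dt - math.log(1 + q * math.exp(c_d) - q))

# Two sensitive coordinates (delta = 0.5) and two ordinary ones (delta = eps):
print(privacy_budget([0.5, 0.5, 4.0, 4.0], eps=4.0, q=0.1, zeta=0.5))
# The first entry equals (1 - zeta) * delta_1 = 0.25, matching the discussion
# of the role of zeta below.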

We next make a few remarks regarding this result. First, note that while this bound applies to empirical mean estimation, when the data points {\vecx^(j)}_{j=1}^n are independent and identically distributed, the result can also be used to derive an out-of-sample error bound for the statistical private mean estimation problem. In this case, an additional d/n term would be added to the error bound. Second, note that in the special case where there is no coordinate-level privacy requirement (i.e., δ_i = ∞), setting ζ = 1 implies c_1 = … = c_d = ε, which yields the error bound O(d^2/(nε^2)). This matches the ε-LDP lower bound provided in Duchi et al. (2013a). Lastly, we discuss the role of the parameter ζ. When the correlation parameter q is sufficiently high, or the most sensitive privacy requirement δ_1 is low enough that c_d is determined by the first term, log((e^{ζ δ̃_1} + q − 1)/q), in (16), one can verify that c_1 = (1 − ζ)δ_1. As ζ increases, c_1 decreases, leading to a worse estimate for the first coordinate. However, c_d increases, indicating that, for the less sensitive coordinates, we are increasing the privacy parameter, potentially resulting in lower estimation error for those coordinates. In other words, the parameter ζ allows us to control how we allocate the privacy budget of the most sensitive coordinate, δ_1, between the estimation of the first coordinate and the privacy cost imposed by its correlation with the other coordinates.

We next present a corollary that studies a special case where the coordinates are divided into a more sensitive and a less sensitive group. In this case, users request a general ε-LDP guarantee for all coordinates and a δ-BCDP guarantee specifically for the more sensitive coordinates. In this scenario, we further simplify the error bound, which allows us to better highlight the implications of Theorem 2.

Corollary 1.

Suppose we are under the premise of Theorem 2, with δ_i = δ for 1 ≤ i ≤ k and δ_i = ε > 2δ for d ≥ i > k. Then, choosing ζ = 0.5, the mean-squared error upper bound in (17) simplifies to

O(1) · (1/n) · ( dk/δ^2 + (d − k)^2/ε^2 ), (18)

for the case q ≤ (e^{δ/2} − 1)/(e^ε − 1), and to

O(1) · (1/n) · ( dk/δ^2 + (d − k)^2 / ( ((d − k)/d) δ^2 + (c_d − δ/2)^2 ) ), (19)

otherwise, with c_d = log((e^{δ/2} + q − 1)/q) decreasing from ε to δ/2 as q increases from (e^{δ/2} − 1)/(e^ε − 1) to 1.

Under the special structure of 𝜹 and ε in Corollary 1, the MSE can be thought of as the sum of the MSEs of the more sensitive and the less sensitive coordinates. For low levels of correlation, the MSE of the less private coordinates behaves like the MSE of an ε-LDP mechanism on a (d − k)-dimensional vector, but the MSE of the more sensitive part depends on the dimension d of the whole vector, not just the sensitive part, showing how the more sensitive part of the data affects the whole vector. As the correlation increases (q → 1), we have c_d → δ/2 and the MSE matches that of a (min 𝜹)-LDP mechanism. A supporting experiment is deferred to Appendix B.1.

5 Private Least-Squares Regression

We focus on how the BCDP framework can be employed for least-squares problems. Consider a setting in which n datapoints are distributed among n users, where \vecx^(i) = [\vecz^(i)⊤ l^(i)]^⊤ ∈ \bbR^d represents the data of user i, with \vecz^(i) the feature vector of the i-th data point and l^(i) its label. Our goal is to solve the following empirical risk minimization problem privately over the data of the n users:

min_{θ∈Θ} f(θ) := (1/n) Σ_{i=1}^n ℓ(θ, \vecx^(i)), (20)

where ℓ(θ, \vecx^(i)) := (1/2)(θ^⊤\vecz^(i) − l^(i))^2 and Θ ⊂ \bbR^{d−1} is a compact convex set. We denote the optimal value of (20) by f*, achieved at some point θ*.

The customary approach in private convex optimization is to use a gradient-based method, perturbing the gradient at each iteration to satisfy the privacy requirements. However, this approach is not suitable for cases in which coordinate-based privacy guarantees are needed. The difficulty is that each coordinate of the gradient is typically influenced by all coordinates of the data. As a result, it becomes challenging to distinguish the level of privacy offered to different coordinates of the data. Instead, we first perturb the users' data locally and then compute the gradient at the perturbed point to update the model at the server.

However, computing the gradient based on perturbed data introduces its own challenges. In particular, the gradient of ℓ(θ, \vecx) with \vecx = [\vecz^⊤ l]^⊤, given by

∇ℓ(θ, \vecx) = \vecz\vecz^⊤θ − l\vecz, (21)

is not a linear function of the data. As a result, although the privacy mechanism outputs an unbiased estimate of the true data, this non-linearity introduces an error in the gradient with a non-zero mean, preventing the algorithm from converging to the optimal solution. To overcome this challenge, we observe that the aforementioned non-linearity in the data arises from the product terms \vecz\vecz^⊤ and l\vecz. Therefore, if we create two private copies of the data and replace each term of the product with one of the copies, we obtain a stochastic unbiased estimate of the gradient at the server. The catch, of course, is that we need to divide our privacy budget in half.
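The two-copy trick rests on the independence identity E[ẑ^(1) ẑ^(2)⊤] = E[ẑ^(1)] E[ẑ^(2)]⊤ = \vecz\vecz^⊤, which fails with a single copy because E[ẑ ẑ^⊤] = \vecz\vecz^⊤ + Cov(ẑ). A short numerical check (ours, with generic additive noise standing in for M_mean):

import numpy as np

rng = np.random.default_rng(0)
z = np.array([0.3, -0.7])
n = 200000
z1 = z + rng.normal(size=(n, 2))   # first private copy (unbiased, unit noise)
z2 = z + rng.normal(size=(n, 2))   # independent second copy

print(z1.T @ z1 / n)   # ~ z z^T + I : one copy is biased by the noise covariance
print(z1.T @ z2 / n)   # ~ z z^T     : two independent copies are unbiased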

A summary of the algorithm is presented in Algorithm 3. There, OPT(g, δ) refers to any convex optimization algorithm of choice, e.g., gradient descent, that finds the minimizer of the function g(·) over the set Θ up to an error of δ.

Algorithm 3: BCDP and LDP least-squares regression
  Input: users' data {\vecx^(i)}_{i=1}^n and vector 𝒄
  for i ∈ [n] do
     // user i sends locally perturbed data to the server
     x̂^(i,1) ← M_mean(\vecx^(i), 𝒄/2)
     x̂^(i,2) ← M_mean(\vecx^(i), 𝒄/2)
  end for
  // server optimizes with the perturbed data
  f̂(θ) := (1/(2n)) Σ_{i=1}^n ( θ^⊤ ẑ^(i,1) ẑ^(i,2)⊤ θ − 2 θ^⊤ l̂^(i,1) ẑ^(i,2) )
  θ̂ ← OPT(f̂, 1/n)
  Return θ̂

We make the assumption that the data is bounded.

Assumption 2.

\vecx^(i) ∈ [−1, 1]^d for all i ∈ [n].

We also need Assumption 1, which bounds the dependence of each coordinate on the other coordinates of the data. Recall that the last coordinate is exempt from this assumption, which is why we position the label as the last coordinate of the data. It would not be reasonable to assume the feature vector is weakly correlated with the label, as that would violate the premise of regression. We next state the following result on Algorithm 3.

Theorem 3.

Suppose Assumptions 1 and 2 hold. Then Algorithm 3, with 𝒄 set as in Theorem 2, is 𝜹-BCDP and ε-LDP. Moreover, using Mechanism 2 as M_LDP, we obtain the following error rate:

E[ f(θ̂) − f* | \vecx^(1), …, \vecx^(n) ] ≲ max_{θ∈Θ}(||θ||^2 + ||θ||) · ( (log d/√n)(r^2 + rd) ∧ d ), (22)

with r^2 = Σ_{i=1}^d 1 / ( Σ_{k=1}^i (c_k − c_{k−1})^2/(d − k + 1) ).

The proof of this theorem can be found in Appendix C.1. It is worth noting that r^2 is the quantity appearing in the mean-squared error bound in Theorem 2. Consequently, in special cases similar to the setting of Corollary 1, the term r^2 simplifies in the same manner.

We conclude this section by noting that, while we focus on linear regression here, our approach can provide an unbiased gradient and guarantee privacy in any setting where the loss function ℓ(θ, \vecx) has a quadratic (or even low-degree polynomial) dependence on \vecx, such as neural networks with ℓ_2 loss. However, the optimization error in such cases may require further consideration, as non-convex loss functions could arise.

6 Acknowledgements

This work was done in part while M.A. was a research fellow at the Simons Institute for the Theory of Computing. A.F. and M.J. acknowledge support from the European Research Council (ERC-2022-SYG-OCEAN-101071601). Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them. S.C. acknowledges support from AI Policy Hub, U.C. Berkeley; S.C. and T.C. acknowledge support from NSF CCF-1750430 and CCF-2007669.

References

  • Acharya et al. [2020] J. Acharya, K. Bonawitz, P. Kairouz, D. Ramage, and Z. Sun. Context aware local differential privacy. In Proceedings of the 37th International Conference on Machine Learning, pages 52–62. PMLR, 2020.
  • Alvim et al. [2018] M. Alvim, K. Chatzikokolakis, C. Palamidessi, and A. Pazii. Invited paper: Local differential privacy on metric spaces: Optimizing the trade-off with utility. In 2018 IEEE 31st Computer Security Foundations Symposium (CSF), pages 262–267, 2018.
  • Andrés et al. [2013] M. E. Andrés, N. E. Bordenabe, K. Chatzikokolakis, and C. Palamidessi. Geo-indistinguishability: Differential privacy for location-based systems. In Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security, pages 901–914, 2013.
  • Asi et al. [2022] H. Asi, V. Feldman, and K. Talwar. Optimal algorithms for mean estimation under local differential privacy. In International Conference on Machine Learning, pages 1046–1056. PMLR, 2022.
  • Beimel et al. [2008] A. Beimel, K. Nissim, and E. Omri. Distributed private data analysis: Simultaneously solving how and what. In Advances in Cryptology–CRYPTO 2008: 28th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 17-21, 2008. Proceedings 28, pages 451–468. Springer, 2008.
  • Beimel et al. [2013] A. Beimel, K. Nissim, and U. Stemmer. Private learning and sanitization: Pure vs. approximate differential privacy. In International Workshop on Approximation Algorithms for Combinatorial Optimization, pages 363–378. Springer, 2013.
  • Biswas et al. [2020] S. Biswas, Y. Dong, G. Kamath, and J. Ullman. Coinpress: Practical private mean and covariance estimation. Advances in Neural Information Processing Systems, 33:14475–14485, 2020.
  • Chan et al. [2012] T.-H. H. Chan, E. Shi, and D. Song. Optimal lower bound for differentially private multi-party aggregation. In Proceedings of the 20th Annual European Conference on Algorithms, ESA’12, page 277–288. Springer-Verlag, 2012. ISBN 9783642330896.
  • Chatzikokolakis et al. [2013] K. Chatzikokolakis, M. E. Andrés, N. E. Bordenabe, and C. Palamidessi. Broadening the scope of differential privacy using metrics. In Privacy Enhancing Technologies: 13th International Symposium, PETS 2013, Bloomington, IN, USA, July 10-12, 2013. Proceedings 13, pages 82–102. Springer, 2013.
  • Chaudhuri and Hsu [2011] K. Chaudhuri and D. Hsu. Sample complexity bounds for differentially private learning. In Proceedings of the 24th Annual Conference on Learning Theory, pages 155–186. JMLR Workshop and Conference Proceedings, 2011.
  • Desfontaines [2020] D. Desfontaines. A list of real-world uses of differential privacy. https://desfontain.es/blog/real-world-differential-privacy.html, 2020. Accessed: 2024-05-22.
  • Differential Privacy Team, Apple [2017] Differential Privacy Team, Apple. Learning with privacy at scale. https://machinelearning.apple.com/research/learning-with-privacy-at-scale, 2017. Accessed: 2024-05-22.
  • Ding et al. [2017] B. Ding, J. Kulkarni, and S. Yekhanin. Collecting telemetry data privately. Advances in Neural Information Processing Systems, 30, 2017.
  • Duchi et al. [2013a] J. C. Duchi, M. I. Jordan, and M. J. Wainwright. Local privacy, data processing inequalities, and statistical minimax rates. arXiv preprint arXiv:1302.3203, 2013a.
  • Duchi et al. [2013b] J. C. Duchi, M. I. Jordan, and M. J. Wainwright. Local privacy and statistical minimax rates. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pages 429–438. IEEE, 2013b.
  • Dwork et al. [2006] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography: Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006. Proceedings 3, pages 265–284. Springer, 2006.
  • Dwork et al. [2014] C. Dwork, A. Roth, et al. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407, 2014.
  • Erlingsson et al. [2014] Ú. Erlingsson, V. Pihur, and A. Korolova. Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 1054–1067, 2014.
  • Evfimievski et al. [2003] A. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. In Proceedings of the Twenty-second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 211–222, 2003.
  • Feldman and Talwar [2021] V. Feldman and K. Talwar. Lossless compression of efficient private local randomizers. In International Conference on Machine Learning, pages 3208–3219. PMLR, 2021.
  • Ghazi et al. [2021] B. Ghazi, N. Golowich, R. Kumar, P. Manurangsi, and C. Zhang. Deep learning with label differential privacy. Advances in Neural Information Processing Systems, 34:27131–27145, 2021.
  • Ghazi et al. [2022] B. Ghazi, R. Kumar, P. Manurangsi, and T. Steinke. Algorithms with more granular differential privacy guarantees. arXiv preprint arXiv:2209.04053, 2022.
  • He et al. [2014] X. He, A. Machanavajjhala, and B. Ding. Blowfish privacy: tuning privacy-utility trade-offs using policies. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD ’14, page 1447–1458. Association for Computing Machinery, 2014. ISBN 9781450323765.
  • Imola et al. [2022] J. Imola, S. Kasiviswanathan, S. White, A. Aggarwal, and N. Teissier. Balancing utility and scalability in metric differential privacy. In Uncertainty in Artificial Intelligence, pages 885–894. PMLR, 2022.
  • Kairouz et al. [2015] P. Kairouz, S. Oh, and P. Viswanath. The composition theorem for differential privacy. In F. Bach and D. Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 1376–1385, Lille, France, 07–09 Jul 2015. PMLR.
  • Karwa and Vadhan [2017] V. Karwa and S. Vadhan. Finite sample differentially private confidence intervals. arXiv preprint arXiv:1711.03908, 2017.
  • Kasiviswanathan and Smith [2014] S. P. Kasiviswanathan and A. Smith. On the ’semantics’ of differential privacy: A Bayesian formulation. Journal of Privacy and Confidentiality, 6(1), 2014.
  • Kasiviswanathan et al. [2011] S. P. Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith. What can we learn privately? SIAM Journal on Computing, 40(3):793–826, 2011.
  • Kenthapadi et al. [2013] K. Kenthapadi, A. Korolova, I. Mironov, and N. Mishra. Privacy via the johnson-lindenstrauss transform. Journal of Privacy and Confidentiality, 5(1), Aug. 2013. doi: 10.29012/jpc.v5i1.625.
  • Kifer and Machanavajjhala [2011] D. Kifer and A. Machanavajjhala. No free lunch in data privacy. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, pages 193–204, 2011.
  • Kifer and Machanavajjhala [2014] D. Kifer and A. Machanavajjhala. Pufferfish: A framework for mathematical privacy definitions. ACM Transactions on Database Systems (TODS), 39(1):1–36, 2014.
  • Kifer et al. [2012] D. Kifer, A. Smith, and A. Thakurta. Private convex empirical risk minimization and high-dimensional regression. In S. Mannor, N. Srebro, and R. C. Williamson, editors, Proceedings of the 25th Annual Conference on Learning Theory, volume 23 of Proceedings of Machine Learning Research, pages 25.1–25.40, Edinburgh, Scotland, 25–27 Jun 2012. PMLR. URL https://proceedings.mlr.press/v23/kifer12.html.
  • Mahloujifar et al. [2023] S. Mahloujifar, C. Guo, G. E. Suh, and K. Chaudhuri. Machine learning with feature differential privacy. In Federated Learning and Analytics in Practice: Algorithms, Systems, Applications, and Opportunities, 2023.
  • Thakurta et al. [2017] A. G. Thakurta, A. H. Vyrros, U. S. Vaishampayan, G. Kapoor, J. Freudiger, V. R. Sridhar, and D. Davidson. Learning new words, 2017.
  • Triastcyn and Faltings [2020] A. Triastcyn and B. Faltings. Bayesian differential privacy for machine learning. In International Conference on Machine Learning, pages 9583–9592. PMLR, 2020.
  • Tropp [2016] J. A. Tropp. The expected norm of a sum of independent random matrices: An elementary approach. In High Dimensional Probability VII: The Cargese Volume, pages 173–202. Springer, 2016.
  • Vadhan [2017] S. Vadhan. The complexity of differential privacy. Tutorials on the Foundations of Cryptography: Dedicated to Oded Goldreich, pages 347–450, 2017.
  • Vu and Slavkovic [2009] D. Vu and A. Slavkovic. Differential privacy for clinical trial data: Preliminary evaluations. In 2009 IEEE International Conference on Data Mining Workshops, pages 138–143, 2009. doi: 10.1109/ICDMW.2009.52.
  • Wang and Xu [2019] D. Wang and J. Xu. On sparse linear regression in the local differential privacy model. In International Conference on Machine Learning, pages 6628–6637. PMLR, 2019.
  • Warner [1965] S. L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63–69, 1965.
  • Wasserman and Zhou [2010] L. Wasserman and S. Zhou. A statistical framework for differential privacy. Journal of the American Statistical Association, 105(489):375–389, 2010.
  • Xiao and Devadas [2023] H. Xiao and S. Devadas. PAC privacy: Automatic privacy measurement and control of data processing. In Annual International Cryptology Conference, pages 611–644. Springer, 2023.
  • Xiong et al. [2020] X. Xiong, S. Liu, D. Li, Z. Cai, and X. Niu. A comprehensive survey on local differential privacy. Security and Communication Networks, 2020:1–29, 2020.
  • Yang et al. [2023] M. Yang, T. Guo, T. Zhu, I. Tjuawinata, J. Zhao, and K.-Y. Lam. Local differential privacy and its applications: A comprehensive survey. Computer Standards & Interfaces, page 103827, 2023.
  • Zheng et al. [2017] K. Zheng, W. Mou, and L. Wang. Collect at once, use effectively: Making non-interactive locally private learning possible. In International Conference on Machine Learning, pages 4130–4139. PMLR, 2017.

Appendix A Proofs for Section 2 and Section 3

A.1 Proof of Proposition 1

See Proposition 1.

Proof.

(A) Forward direction:

For any $R\in\cF_{\cY}$ and $S,S^{\prime}\in\cF_{\cX}^{+}$, the $\varepsilon$-LDP property gives

\mu_{\vecx}(R) \leq e^{\varepsilon}\mu_{\vecx^{\prime}}(R)  (23)
\implies \int_{\vecx\in S}\mu_{\vecx}(R)\,d\pi(\vecx) \leq e^{\varepsilon}\int_{\vecx\in S}\mu_{\vecx^{\prime}}(R)\,d\pi(\vecx)  (24)
\implies P\{M(\vecx)\in R,\,\vecx\in S\} \leq e^{\varepsilon}\pi(S)\,\mu_{\vecx^{\prime}}(R)  (25)
\implies P\{M(\vecx)\in R\,|\,\vecx\in S\} \leq e^{\varepsilon}\mu_{\vecx^{\prime}}(R).  (26)

Integrating once more, this time with respect to $\int_{\vecx^{\prime}\in S^{\prime}}\cdot\,d\pi(\vecx^{\prime})$, and normalizing by $\pi(S^{\prime})$, gives the desired result.

(B) Converse:

Fix $R\in\cF_{\cY}$ and let

U(R) = \mathsf{ess\,sup}_{\vecx\in\cX}\,\mu_{\vecx}(R),  (27)
L(R) = \mathsf{ess\,inf}_{\vecx\in\cX}\,\mu_{\vecx}(R),  (28)

where the essential supremum and infimum are with respect to $(\cX,\cF_{\cX},\pi)$. Fix any $\delta>0$ and define

S=\{\vecx\in\cX:\mu_{\vecx}(R)>U(R)-\delta\},\qquad S^{\prime}=\{\vecx\in\cX:\mu_{\vecx}(R)<L(R)+\delta\}.  (29)

By the definition of the essential supremum and infimum, and using the fact that $\mu$ is a Markov kernel, $S,S^{\prime}\in\cF_{\cX}^{+}$.

Using the $\varepsilon$-BDP condition, we obtain

U(R)-\delta \leq \frac{1}{\pi(S)}\int_{\vecx\in S}\mu_{\vecx}(R)\,d\pi(\vecx)  (30)
\leq \frac{e^{\varepsilon}}{\pi(S^{\prime})}\int_{\vecx\in S^{\prime}}\mu_{\vecx}(R)\,d\pi(\vecx)  (31)
\leq e^{\varepsilon}(L(R)+\delta).  (32)

Since $\delta>0$ is arbitrary, we get $U(R)\leq e^{\varepsilon}L(R)$ for all $R\in\cF_{\cY}$.

Now, for $R$, define

E(R):=\{\vecx\in\cX:\mu_{\vecx}(R)\notin[L(R),U(R)]\},  (33)

and note that $\pi(E(R))=0$ by the definition of the essential supremum and infimum, again using the fact that $\mu$ is a Markov kernel.

We shall now modify $M$ on a judiciously chosen $\pi$-null set to obtain an $\varepsilon$-LDP mechanism $M^{\prime}$. Since $\cY$ is a Polish space, it is second countable, so there exists a countable collection of open sets $\cV=\{V_{i}\}_{i\geq 1}\subset\cF_{\cY}$ such that every open set $R\in\cF_{\cY}$ can be written as a disjoint union of a countable subcollection of $\cV$.

Define $E=\cup_{i\geq 1}E(V_{i})$ and note that $\pi(E)=0$ by countable subadditivity. Fix any $\bar{\vecx}\in E^{c}$ and define the modification $M^{\prime}$ of $M$ via

M^{\prime}(\vecx)=\begin{cases}M(\bar{\vecx})&\text{if }\vecx\in E,\\ M(\vecx)&\text{otherwise.}\end{cases}  (34)

Equivalently, $M^{\prime}$ is represented by the Markov kernel $\mu^{\prime}:\cX\times\cF_{\cY}\to[0,1]$ defined by

\mu^{\prime}_{\vecx}(R)=\begin{cases}\mu_{\bar{\vecx}}(R)&\text{if }\vecx\in E,\\ \mu_{\vecx}(R)&\text{otherwise.}\end{cases}  (35)

Since $\pi(E)=0$, it is easy to see that $P\{M(\vecx)=M^{\prime}(\vecx)\}=1$. By construction, since $E=\cup_{i\geq 1}E(V_{i})$,

\mu^{\prime}_{\vecx}(V_{i})\leq U(V_{i})\leq e^{\varepsilon}L(V_{i})\leq e^{\varepsilon}\mu^{\prime}_{\vecx^{\prime}}(V_{i})\quad\forall\,\vecx,\vecx^{\prime}\in\cX,\ V_{i}\in\cV.  (36)

By second countability and $\sigma$-additivity, we conclude that

\mu^{\prime}_{\vecx}(R)\leq e^{\varepsilon}\mu^{\prime}_{\vecx^{\prime}}(R)\quad\forall\,\vecx,\vecx^{\prime}\in\cX,\ \forall\text{ open }R\in\cF_{\cY}.  (37)

Every second countable space is Radon, so by outer regularity of both measures $\mu_{\vecx}$ and $\mu^{\prime}_{\vecx}$, we finally obtain

\mu^{\prime}_{\vecx}(R)\leq e^{\varepsilon}\mu^{\prime}_{\vecx^{\prime}}(R)\quad\forall\,\vecx,\vecx^{\prime}\in\cX,\ \forall R\in\cF_{\cY}.  (38)

Hence, $M^{\prime}$ is $\varepsilon$-LDP, as desired. ∎

A.2 Proof of Theorem 1

See Theorem 1.

Proof.

By definition, $\alpha(R_{i},M,S_{i},S_{i}^{\prime})=P\{M(\vecx)\in R_{i}\,|\,x_{i}\in S_{i}\}$ and

P\{M(\vecx)\in T_{i}\,|\,x_{i}\in S_{i}\} \leq e^{\delta_{i}}P\{M(\vecx)\in T_{i}\,|\,x_{i}\in S_{i}^{\prime}\}\quad\forall\,S_{i},S_{i}^{\prime}\in\cF_{\cX_{i}}^{+},\ T_{i}\in\cF_{\cY}.  (39)

Setting $T_{i}=R_{i}^{C}$, we get $\alpha(R_{i},M,S_{i},S_{i}^{\prime})+e^{\delta_{i}}\beta(R_{i},M,S_{i},S_{i}^{\prime})\geq 1$. One can also swap $S_{i}$ and $S_{i}^{\prime}$ above and set $T_{i}=R_{i}$ to get $e^{\delta_{i}}\alpha(R_{i},M,S_{i},S_{i}^{\prime})+\beta(R_{i},M,S_{i},S_{i}^{\prime})\geq 1$.

The converse is straightforward as well:

e^{\delta_{i}}\alpha(R_{i},M,S_{i},S_{i}^{\prime})+\beta(R_{i},M,S_{i},S_{i}^{\prime})\geq 1\quad\forall\,S_{i},S_{i}^{\prime}\in\cF_{\cX_{i}}^{+},\ R_{i}\in\cF_{\cY}  (40)
\Leftrightarrow\ P\{M(\vecx)\in R_{i}\,|\,x_{i}\in S_{i}^{\prime}\}\leq e^{\delta_{i}}P\{M(\vecx)\in R_{i}\,|\,x_{i}\in S_{i}\}\quad\forall\,S_{i},S_{i}^{\prime}\in\cF_{\cX_{i}}^{+},\ R_{i}\in\cF_{\cY}.  (41)

∎

A.3 Proof of Proposition 3

See Proposition 3.

Proof.

By Proposition 1, an $\varepsilon$-LDP algorithm is $\varepsilon$-BDP as well. Recall that $P\{M(\vecx)\in R\,|\,\vecx\in S\}=\frac{P(S\times R)}{\pi(S)}$. Thus, the $\varepsilon$-BDP guarantee ensures that for all $S,S^{\prime}\in\cF_{\cX}^{+}$,

\frac{P(S\times R)}{\pi(S)}\leq e^{\varepsilon}\frac{P(S^{\prime}\times R)}{\pi(S^{\prime})}.

Restricting to sets of the form $S=S_{i}\times\cX_{-i}$ and $S^{\prime}=S_{i}^{\prime}\times\cX_{-i}$ with $S_{i},S_{i}^{\prime}\in\cF_{\cX_{i}}^{+}$ yields the desired BCDP result. ∎

A.4 Proof of Proposition 4

See Proposition 4.

Proof.

First, consider sets $R\in\cF_{\cY}$ such that $P\{M(\vecx)\in R\}=0$. For such $R$ and any $S_{i},S_{i}^{\prime}\in\cF_{\cX_{i}}^{+}$, we have $P\{M(\vecx)\in R\,|\,x_{i}\in S_{i}\}=P\{M(\vecx)\in R\,|\,x_{i}\in S_{i}^{\prime}\}=0$. Thus, to bound the $\bm{\delta}$ in the $\bm{\delta}$-BCDP guarantee, we may restrict attention to $R$ in the set $\cF_{\cY}^{+}:=\{R\in\cF_{\cY}\,|\,P\{M(\vecx)\in R\}>0\}$.

For $S_{i},S_{i}^{\prime}\in\cF_{\cX_{i}}^{+}$ and $R\in\cF_{\cY}^{+}$,

\frac{P\{M(\vecx)\in R\,|\,x_{i}\in S_{i}\}}{P\{M(\vecx)\in R\,|\,x_{i}\in S_{i}^{\prime}\}} = \frac{P\{M(\vecx)\in R,\,x_{i}\in S_{i}\}}{\pi\{x_{i}\in S_{i}\}}\cdot\frac{\pi\{x_{i}\in S_{i}^{\prime}\}}{P\{M(\vecx)\in R,\,x_{i}\in S_{i}^{\prime}\}}.  (42)

For the set $R$ under consideration, define

l(\vecx_{-i}) = \inf_{x_{i}\in\cX_{i}}\mu_{x_{i},\vecx_{-i}}(R),  (43)
u(\vecx_{-i}) = \sup_{x_{i}\in\cX_{i}}\mu_{x_{i},\vecx_{-i}}(R).  (44)

By the CDP property, we have $u(\vecx_{-i})\leq e^{c_{i}}l(\vecx_{-i})$. Now note that

\frac{P\{M(\vecx)\in R,\,x_{i}\in S_{i}\}}{\pi\{x_{i}\in S_{i}\}} = \frac{1}{\pi\{x_{i}\in S_{i}\}}\int_{\vecx_{-i}\in\cX_{-i},\,x_{i}\in S_{i}}\mu_{x_{i},\vecx_{-i}}(R)\,d\pi(x_{i},\vecx_{-i})  (45)
\leq \frac{1}{\pi\{x_{i}\in S_{i}\}}\int_{\vecx_{-i}\in\cX_{-i},\,x_{i}\in S_{i}}u(\vecx_{-i})\,d\pi(x_{i},\vecx_{-i})  (46)
= \int_{\vecx_{-i}\in\cX_{-i}}u(\vecx_{-i})\,d\pi_{\vecx_{-i}|x_{i}\in S_{i}}(\vecx_{-i}).  (47)

Similarly, we have

\frac{P\{M(\vecx)\in R,\,x_{i}\in S_{i}^{\prime}\}}{\pi\{x_{i}\in S_{i}^{\prime}\}}\geq\int_{\vecx_{-i}\in\cX_{-i}}l(\vecx_{-i})\,d\pi_{\vecx_{-i}|x_{i}\in S_{i}^{\prime}}(\vecx_{-i}).  (48)

Using (47) and (48) in (42), we obtain

\frac{P\{M(\vecx)\in R\,|\,x_{i}\in S_{i}\}}{P\{M(\vecx)\in R\,|\,x_{i}\in S_{i}^{\prime}\}} \leq \frac{\int_{\vecx_{-i}\in\cX_{-i}}u(\vecx_{-i})\,d\pi_{\vecx_{-i}|x_{i}\in S_{i}}(\vecx_{-i})}{\int_{\vecx_{-i}\in\cX_{-i}}l(\vecx_{-i})\,d\pi_{\vecx_{-i}|x_{i}\in S_{i}^{\prime}}(\vecx_{-i})}  (49)
\leq e^{c_{i}}\,\frac{\int_{\vecx_{-i}\in\cX_{-i}}l(\vecx_{-i})\,d\pi_{\vecx_{-i}|x_{i}\in S_{i}}(\vecx_{-i})}{\int_{\vecx_{-i}\in\cX_{-i}}l(\vecx_{-i})\,d\pi_{\vecx_{-i}|x_{i}\in S_{i}^{\prime}}(\vecx_{-i})}  (50)
\leq e^{c_{i}}\left(1+\mathsf{TV}(\pi_{\vecx_{-i}|x_{i}\in S_{i}},\pi_{\vecx_{-i}|x_{i}\in S_{i}^{\prime}})\,\frac{\max_{\vecx_{-i}}l(\vecx_{-i})-\min_{\vecx_{-i}}l(\vecx_{-i})}{\min_{\vecx_{-i}}l(\vecx_{-i})}\right)  (51)
\leq e^{c_{i}}\left(1+\mathsf{TV}(\pi_{\vecx_{-i}|x_{i}\in S_{i}},\pi_{\vecx_{-i}|x_{i}\in S_{i}^{\prime}})\left(e^{\sum_{j\neq i}c_{j}}-1\right)\right),  (52)

where we used $\frac{\max_{\vecx_{-i}}l(\vecx_{-i})}{\min_{\vecx_{-i}}l(\vecx_{-i})}\leq e^{\sum_{j\neq i}c_{j}}$, which follows from the CDP property.

The $\sum_{i}c_{i}$-LDP property follows by straightforward composition.

Finally, notice that if we additionally know the mechanism is $\varepsilon$-LDP, we can bound $\frac{\max_{\vecx_{-i}}l(\vecx_{-i})}{\min_{\vecx_{-i}}l(\vecx_{-i})}$ by $e^{\varepsilon}$. In addition, by Proposition 3, we know $\delta_{i}^{\prime}\leq\varepsilon$. Thus, we get $\delta_{i}^{\prime}\leq\min\{c_{i}+\log(1+q_{i}e^{\varepsilon}-q_{i}),\,\varepsilon\}$. ∎
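For intuition, the following Python snippet is a minimal numerical sketch of this bound: given an illustrative CDP vector $\vecc$, correlation bounds $q_{i}$, and a known LDP level $\varepsilon$ (none of these values come from the paper), it evaluates the per-coordinate BCDP levels implied by Proposition 4.

import numpy as np

# Sketch of the Proposition 4 bound (illustrative inputs, not from the paper):
# c[i] is the CDP level of coordinate i, q[i] its correlation bound, and eps
# an LDP level that the mechanism is also known to satisfy.
def bcdp_levels(c, q, eps):
    c, q = np.asarray(c, float), np.asarray(q, float)
    rest = c.sum() - c                                  # sum_{j != i} c_j
    generic = c + np.log1p(q * (np.exp(rest) - 1.0))    # c_i + log(1 + q_i(e^{rest} - 1))
    refined = np.minimum(c + np.log1p(q * (np.exp(eps) - 1.0)), eps)
    return np.minimum(generic, refined)                 # upper bounds on delta'_i

print(bcdp_levels(c=[0.1, 0.1, 1.0], q=[0.5, 0.5, 0.5], eps=1.2))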

A.5 Additional Results for Section 3.4

Proposition 6.

Let

A_{\bm{\delta}}[i,j]=\begin{cases}\frac{1}{\delta_{i}}&i=j,\\ \frac{1}{\log\left(1+\frac{e^{\delta_{i}}-1}{q_{i}}\right)}&\text{otherwise}.\end{cases}  (53)

Then, the condition $A_{\bm{\delta}}\vecc\leq\bm{1}$ implies $\delta_{i}\geq c_{i}+\log(1+q_{i}e^{\sum_{j\neq i}c_{j}}-q_{i})$ for all $i\in[d]$.

Proof.

Without loss of generality, we establish the proof for $i=1$. For a given value of $c_{1}$, we have the constraint

\sum_{j=2}^{d}c_{j}\leq\log\left(1+\frac{e^{\delta_{1}-c_{1}}-1}{q_{1}}\right).  (54)

Note that $g(x)=\log\left(1+\frac{e^{\delta_{1}-x}-1}{q_{1}}\right)$ is concave, with $g(0)=\log\left(1+\frac{e^{\delta_{1}}-1}{q_{1}}\right)$ and $g(\delta_{1})=0$. Since a concave function lies above its chords, $g(x)\geq\left(1-\frac{x}{\delta_{1}}\right)\log\left(1+\frac{e^{\delta_{1}}-1}{q_{1}}\right)$ for $x\in[0,\delta_{1}]$. Therefore, we can ensure (54) by imposing

\sum_{j=2}^{d}c_{j}\leq\left(1-\frac{c_{1}}{\delta_{1}}\right)\log\left(1+\frac{e^{\delta_{1}}-1}{q_{1}}\right),  (55)

which is precisely the first row of the condition $A_{\bm{\delta}}\vecc\leq\bm{1}$. ∎
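As a quick illustration of how Proposition 6 can be used to check a candidate budget, the following sketch builds $A_{\bm{\delta}}$ directly from (53) and tests the sufficient condition $A_{\bm{\delta}}\vecc\leq\bm{1}$ entrywise; the helper name and numeric values are illustrative, not from the paper.

import numpy as np

# Sketch of the sufficient condition in Proposition 6: build A_delta from (53)
# and check A_delta @ c <= 1 entrywise. All numeric values are illustrative.
def feasible(delta, q, c):
    delta, q, c = (np.asarray(v, float) for v in (delta, q, c))
    d = len(delta)
    A = np.tile((1.0 / np.log1p((np.exp(delta) - 1.0) / q))[:, None], (1, d))
    np.fill_diagonal(A, 1.0 / delta)        # diagonal entries are 1/delta_i
    return bool(np.all(A @ c <= 1.0))

print(feasible(delta=[0.2, 0.2, 2.0], q=[0.5, 0.5, 0.5], c=[0.05, 0.05, 0.2]))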

A.6 Proof of Proposition 5

See Proposition 5.

Proof.

Let the $\sigma$-field on $\cZ$ be denoted by $\cF_{\cZ}$. For $T\in\cF_{\cZ}$, we have $Y(T):=\{y\in\cY\,|\,K(y)\in T\}\in\cF_{\cY}$ since $K$ is a measurable function. Thus, by the BCDP guarantee on $M$, the following holds for any $i\in[d]$, $T\in\cF_{\cZ}$, and $S_{i},S_{i}^{\prime}\in\cF_{\cX_{i}}^{+}$:

P\{K(M(\vecx))\in T\,|\,x_{i}\in S_{i}\} = P\{M(\vecx)\in Y(T)\,|\,x_{i}\in S_{i}\}  (56)
\leq e^{\delta_{i}}P\{M(\vecx)\in Y(T)\,|\,x_{i}\in S_{i}^{\prime}\}  (57)
= e^{\delta_{i}}P\{K(M(\vecx))\in T\,|\,x_{i}\in S_{i}^{\prime}\}.  (58)

∎

Appendix B Private Mean Estimation

B.1 Numerical Experiment

[Figure 3: MSE as $q$ is varied, keeping $\bm{\delta}$ and $\varepsilon$ fixed, for $M_{\mathsf{mean}}$ and the $\min\bm{\delta}$-LDP baseline.]

Here, we provide a simple numerical experiment to validate the performance of our proposed algorithm. For the baseline, we consider the $M_{\mathsf{LDP}}$ privacy mechanism from Duchi et al. [2013b] with privacy parameter $\min\bm{\delta}$. For our proposed privacy mechanism $M_{\mathsf{mean}}$, we have a degree of freedom in choosing $\zeta$. We note that as $q\to 1$, we would want $\zeta\to 1$, since in this case the best strategy is simply $\min\bm{\delta}$-LDP. As $q\to 0$, we would want $c_{i}=\tilde{\delta}_{i}$, as CDP is ideal in this case. Thus, we settle on the heuristic choice $\zeta=(1+q)/2$.

We first construct a distribution for which Assumption 1 holds. We consider the distribution where

Z \sim \mathsf{Bern}\left(\tfrac{1}{2}\right),  (59)
(x_{1},\ldots,x_{d}) = \begin{cases}(Z,\ldots,Z)&\text{w.p. }q,\\ \text{i.i.d. }\mathsf{Bern}\left(\tfrac{1}{2}\right)&\text{otherwise.}\end{cases}  (60)

It is easy to verify that $\mathsf{TV}(\pi_{\vecx_{-i}|x_{i}=1},\pi_{\vecx_{-i}|x_{i}=0})=q$ for all $i$ in this setting. We fix $d=10$, $\bm{\delta}=(0.2,0.2,\varepsilon,\varepsilon,\ldots,\varepsilon)$, and $\varepsilon=2$. This coordinate-wise privacy demand captures the impact on the algorithm's error of requiring more privacy protection for the first two coordinates.

We vary the correlation $q$ and plot the median MSE values, along with the 25th and 75th quantiles, for our proposed BCDP algorithm and the baseline in Figure 3. We set $n=10000$ and sample user data i.i.d. as $2\,\mathsf{Bern}(\tfrac{1}{2})-1$. The experiment is run over 1000 trials.

When $q\approx 1$, there is a high degree of correlation between the coordinates, and our algorithm resembles $\min\bm{\delta}$-LDP. For other values of $q$, the proposed BCDP algorithm can leverage the heterogeneous nature of the privacy demand and obtain better performance.
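For readers who wish to reproduce the setup, the following Python sketch samples from the data model (59)–(60) and runs a crude $\min\bm{\delta}$-LDP baseline. Per-coordinate randomized response is used here only as a hypothetical stand-in for the $M_{\mathsf{LDP}}$ channel of Duchi et al. [2013b], so the resulting errors are indicative rather than a replication of Figure 3.

import numpy as np

rng = np.random.default_rng(0)

# Data model (59)-(60): with probability q, all coordinates copy a common
# Bern(1/2) bit; otherwise they are i.i.d. Bern(1/2). Rescaled to {-1,+1}.
def sample_data(n, d, q):
    z = rng.integers(0, 2, size=(n, 1))              # shared bit Z
    iid = rng.integers(0, 2, size=(n, d))            # independent bits
    copy = rng.random((n, 1)) < q                    # correlate w.p. q
    return 2.0 * np.where(copy, z, iid) - 1.0

# Crude min-delta LDP baseline: per-coordinate randomized response with the
# total budget eps0 split evenly across the d coordinates (a stand-in for M_LDP).
def rr_mean(x, eps0):
    n, d = x.shape
    p = np.exp(eps0 / d) / (1.0 + np.exp(eps0 / d))  # keep-probability
    keep = rng.random(x.shape) < p
    y = np.where(keep, x, -x)
    return y.mean(axis=0) / (2.0 * p - 1.0)          # debiased mean estimate

x = sample_data(n=10000, d=10, q=0.3)
print(np.mean((rr_mean(x, eps0=0.2) - x.mean(axis=0)) ** 2))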

B.2 Proofs

See Theorem 2.

Proof.

Our proof has two parts, presented below: the proof of the privacy guarantee and the proof of the error bound.

Proof of privacy: Note that we use the channel $M_{\mathsf{LDP}}(\cdot)$ a total of $d$ times, with inputs of varying size, in Mechanism 1. Focusing on any particular data point $\vecx$, let

\tilde{M}(\vecx):=\left(M_{\mathsf{LDP}}(\vecx_{i:d},\,c_{i}-c_{i-1},\,\sqrt{d-i+1})\right)_{i=1}^{d}.  (61)

First, notice that, by composition, $\tilde{M}$ is $c_{d}$-LDP. Given that $c_{d}\leq\tilde{\delta}_{d}\leq\varepsilon$, this immediately implies that our algorithm is $\varepsilon$-LDP. To show the BCDP bound, given Proposition 5, it suffices to show that $\tilde{M}$ is $\bm{\delta}$-BCDP. First, notice that Proposition 3, along with the choice $c_{d}\leq\tilde{\delta}_{d}$, implies that this algorithm is $\delta_{d}$-BCDP with respect to coordinate $d$. In fact, for any $i$ with $c_{d}\leq\tilde{\delta}_{i}$, a similar argument implies that $\tilde{M}$ is $\tilde{\delta}_{i}$-BCDP with respect to coordinate $i$. Hence, we need only focus on the remaining coordinates $i<d$ for which $c_{d}>\tilde{\delta}_{i}$.

Next, we claim that $\tilde{M}(\vecx)$ is $\vecc$-CDP. To see this, notice that perturbing the $k$-th coordinate of $\vecx$ affects only the terms $M_{\mathsf{LDP}}(\vecx_{i:d},c_{i}-c_{i-1},\sqrt{d-i+1})$ with $i\leq k$; in Figure 2, this corresponds to the bottom $k$ rows of the $Y$'s, i.e., $\{Y^{i}\}_{i\in[k]}$. Thus, it suffices to show that

\tilde{M}_{k}(\vecx):=\left(M_{\mathsf{LDP}}(\vecx_{i:d},\,c_{i}-c_{i-1},\,\sqrt{d-i+1})\right)_{i=1}^{k}  (62)

is $c_{k}$-CDP with respect to coordinate $k$. This immediately follows since $\tilde{M}_{k}$ is $c_{k}$-LDP, which in turn implies that it is $c_{k}$-CDP with respect to the $k$-th coordinate.

As a result, by the second part of Proposition 4, and given that $\tilde{M}$ is $c_{d}$-LDP and Assumption 1 holds, we obtain that $\tilde{M}$ is $\left(c_{i}+\log(1+qe^{c_{d}}-q)\right)$-BCDP with respect to coordinate $i$. It remains to show that $c_{i}+\log(1+qe^{c_{d}}-q)\leq\delta_{i}$ for any $i<d$. This follows from the choice of $c_{i}$ for $i<d$ and $c_{d}>\tilde{\delta}_{i}$, which are the conditions we are operating under. It is worth noting that the condition

c_{d}\leq\log\left(\frac{e^{\zeta\tilde{\delta}_{1}}+q-1}{q}\right)  (63)

guarantees that the chosen $c_{i}$'s are nonnegative, given $\zeta\in(0,1)$.
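To illustrate the structure of (61), the sketch below mimics the cascade of $d$ releases, where the $i$-th release covers the suffix $\vecx_{i:d}$ with incremental budget $c_{i}-c_{i-1}$. Laplace noise calibrated to the $\ell_{1}$ sensitivity is used here as a hypothetical stand-in for the actual $M_{\mathsf{LDP}}$ channel, which has a different form.

import numpy as np

rng = np.random.default_rng(1)

# Structural sketch of (61): d releases, the i-th covering the suffix x_{i:d}
# with incremental budget c_i - c_{i-1} (with c_0 = 0). Laplace noise scaled to
# the l1 sensitivity of [-1,1]^{d-i+1} stands in for the M_LDP channel.
def cascade(x, c):
    outputs, prev = [], 0.0
    for i in range(len(x)):
        budget = c[i] - prev                         # incremental budget
        suffix = x[i:]
        scale = 2.0 * len(suffix) / budget           # l1 sensitivity / budget
        outputs.append(suffix + rng.laplace(0.0, scale, size=len(suffix)))
        prev = c[i]
    return outputs                                   # Y^1, ..., Y^d of Figure 2

ys = cascade(rng.uniform(-1, 1, size=5), c=[0.2, 0.4, 0.8, 1.2, 2.0])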

Proof of the mean-squared error bound:

Duchi et al. [2013a] show that the output of the channel $M_{\mathsf{LDP}}(\cdot)$ is unbiased and that, for any $\vecx\in[-1,1]^{d}$,

\bbE\left[\|M_{\mathsf{LDP}}(\vecx,\varepsilon,d)-\vecx\|^{2}_{2}\,\big|\,\vecx\right]\leq\cO(d)\left(\frac{d}{\varepsilon^{2}}\wedge 1\right).  (64)

Note that, due to the symmetry of Algorithm 2, the error is identically distributed across coordinates. Hence, for any vector $\vecx$ and any $k\leq i$, the random variable $Y^{k}_{i-k+1}$ is an unbiased estimator of $\vecx_{i}$, with variance bounded by $\mathcal{O}(1)\left(\frac{d-i+1}{(c_{i}-c_{i-1})^{2}}\wedge 1\right)$.

Note that the errors from different runs of the algorithm are independent. Therefore, by the Cauchy–Schwarz inequality, the minimum-variance unbiased linear combination of the estimates $Y^{k}_{i+1-k}$, $k\in[i]$, of $\vecx_{i}$ is obtained by choosing weights proportional to the inverses of the variances, which results in the estimator

\frac{\sum_{k=1}^{i}Y_{i+1-k}^{k}\,w_{k}}{\sum_{k=1}^{i}w_{k}},  (65)

where $w_{k}$ is the inverse of the variance bound above, with the resulting error rate

\mathcal{O}(1)\,\frac{1}{\sum_{k=1}^{i}\frac{(c_{k}-c_{k-1})^{2}}{d-k+1}}\wedge 1.  (66)
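The inverse-variance weighting in (65) is standard; the following short simulation illustrates it on synthetic Gaussian estimates (the variances are illustrative): the combined estimator's variance matches $1/\sum_{k}(1/v_{k})$.

import numpy as np

rng = np.random.default_rng(2)

# Independent unbiased estimates of the same quantity with variances v_k,
# combined with weights w_k = 1/v_k; the combined variance is 1/sum_k(1/v_k).
v = np.array([4.0, 1.0, 0.25])                       # illustrative variances
est = rng.normal(0.0, np.sqrt(v), size=(100000, 3))  # unbiased estimates of 0
w = 1.0 / v
combined = est @ w / w.sum()
print(combined.var(), 1.0 / w.sum())                 # both approximately 0.19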

Also note that the algorithm's runs over different users are independent. Therefore, averaging $M_{\mathsf{mean}}(\vecx^{(j)},\vecc)$ over $j$, i.e., over different users' data, gives us an estimator of

\frac{1}{n}\sum_{j=1}^{n}\vecx^{(j)}_{i}  (67)

with the error rate

\mathcal{O}(1)\,\frac{1}{n}\cdot\frac{1}{\sum_{k=1}^{i}\frac{(c_{k}-c_{k-1})^{2}}{d-k+1}}\wedge 1.  (68)

Summing this over the coordinates gives us the following upper bound:

\mathcal{O}(1)\,\frac{1}{n}\sum_{i=1}^{d}\frac{1}{\sum_{k=1}^{i}\frac{(c_{k}-c_{k-1})^{2}}{d-k+1}}\wedge d.  (69)

∎

See Corollary 1.

Proof.

For convenience, we set $\zeta=0.5$; we do not attempt to refine the constants in the mean-squared error (MSE) bound with a potentially better choice of $\zeta$. Now consider two cases for the correlation bound $q$ (a short numerical sketch of the resulting budget allocation follows the case analysis):

  1. $q\in\left[0,\frac{e^{\delta/2}-1}{e^{\varepsilon}-1}\right]$: in this case, we have $c_{k+1}=c_{k+2}=\ldots=c_{d}=\varepsilon$ and $c_{1}=c_{2}=\ldots=c_{k}=\delta-\log(1+qe^{\varepsilon}-q)$. Using the range of $q$, we get $\delta-\log(1+qe^{\varepsilon}-q)\in[\delta/2,\delta]$. Thus, replacing the values $c_{1},\ldots,c_{k}$ by the worst case $\delta/2$, we get an error of

     \mathsf{MSE}\lesssim\frac{d}{n}\left(\frac{k}{\delta^{2}}+\frac{d-k}{\delta^{2}+\frac{d}{d-k}(\varepsilon-\delta/2)^{2}}\right)\wedge d.  (70)

     Using $\varepsilon\geq 2\delta$, we get

     \mathsf{MSE}\lesssim\left(\frac{dk}{n\delta^{2}}+\frac{(d-k)^{2}}{n\varepsilon^{2}}\right)\wedge d.  (71)
  2. $q\in\left(\frac{e^{\delta/2}-1}{e^{\varepsilon}-1},1\right]$: in this case, we have $c_{k+1}=\ldots=c_{d}=\log\frac{e^{\delta/2}+q-1}{q}$ and $c_{1}=\ldots=c_{k}=\delta-\log(1+qe^{c_{d}}-q)=\delta/2$. Thus, we get an error of

     \mathsf{MSE}\lesssim\left(\frac{dk}{n\delta^{2}}+\frac{(d-k)d}{n\left(\delta^{2}+\frac{d}{d-k}(c_{d}-\delta/2)^{2}\right)}\right)\wedge d.  (72)

∎
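The case analysis above translates directly into a budget-allocation rule; the following sketch (with $\zeta=1/2$, and the function name chosen for illustration) computes the vector $\vecc$ in both regimes of $q$.

import numpy as np

# Budget allocation from the proof of Corollary 1 (zeta = 1/2): the first k
# coordinates demand privacy level delta, the remaining d-k demand eps.
def allocate(d, k, delta, eps, q):
    c = np.empty(d)
    if q <= (np.exp(delta / 2) - 1.0) / (np.exp(eps) - 1.0):   # case 1
        c[k:] = eps
        c[:k] = delta - np.log1p(q * (np.exp(eps) - 1.0))
    else:                                                      # case 2
        c[k:] = np.log((np.exp(delta / 2) + q - 1.0) / q)      # this is c_d
        c[:k] = delta / 2.0
    return c

print(allocate(d=10, k=2, delta=0.2, eps=2.0, q=0.5))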

Appendix C Least-Squares Regression

C.1 Proof of Theorem 3

See Theorem 3.

Proof.

Proof of privacy: The proof of the privacy guarantee follows a structure similar to that of the proof of Theorem 2. Note that while composition does not hold for BCDP, it does hold for LDP and CDP; this is why we divide $\vecc$ by two rather than dividing $\bm{\delta}$ by two. Additionally, since the OPT algorithm only observes the private copies of the data, by the post-processing inequality it does not matter how the algorithm finds the minimizer (e.g., how many passes it makes over each private data point); the privacy guarantee remains unchanged.

Proof of error rate: Without loss of generality, we can ignore the constant term in $f(\cdot)$ that does not depend on $\bm{\theta}$, and hence, with a slight abuse of notation, rewrite $f(\cdot)$ as

f(\bm{\theta}):=\frac{1}{2n}\sum_{i=1}^{n}\left(\bm{\theta}^{\top}\bm{z}^{(i)}\bm{z}^{(i)^{\top}}\bm{\theta}-2\bm{\theta}^{\top}l^{(i)}\bm{z}^{(i)}\right).  (73)

Also, recall that $\hat{f}(\cdot)$ is given by

\hat{f}(\bm{\theta}):=\frac{1}{2n}\sum_{i=1}^{n}\left(\bm{\theta}^{\top}\bm{\hat{z}}^{(i,1)}\bm{\hat{z}}^{(i,2)^{\top}}\bm{\theta}-2\bm{\theta}^{\top}\hat{l}^{(i,1)}\bm{\hat{z}}^{(i,2)}\right).  (74)
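The role of the two independent private copies in (74) is to keep the quadratic term unbiased. The following sketch illustrates this numerically, with Gaussian noise as a hypothetical stand-in for the privatization mechanism: the product of two independently privatized copies concentrates around the true second-moment matrix, while a single noisy copy is biased by the noise covariance.

import numpy as np

rng = np.random.default_rng(3)

# Two privatized copies z1, z2 of z with independent zero-mean noise satisfy
# E[z1 z2^T | z] = z z^T; a single noisy copy is biased by the noise covariance.
d, n, sigma = 5, 200000, 1.0
z = rng.uniform(-1, 1, size=(n, d))
z1 = z + rng.normal(0.0, sigma, size=(n, d))
z2 = z + rng.normal(0.0, sigma, size=(n, d))
true = z.T @ z / n
print(np.abs(z1.T @ z2 / n - true).max())   # small: unbiased two-copy estimate
print(np.abs(z1.T @ z1 / n - true).max())   # ~ sigma^2 on the diagonal: biased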

Denote the minimizer of $\hat{f}(\bm{\theta})$ over $\Theta$ by $\bm{\theta}^{*}(\{\bm{\hat{x}}^{(i,1)},\bm{\hat{x}}^{(i,2)}\}_{i=1}^{n})$, and denote the output of Algorithm 3 by $\bm{\hat{\theta}}(\{\bm{\hat{x}}^{(i,1)},\bm{\hat{x}}^{(i,2)}\}_{i=1}^{n})$. Finally, recall that $\bm{\theta}^{*}$ denotes the minimizer of the function $f$.

For convenience and brevity, we drop the conditioning on $\vecx^{(1)},\ldots,\vecx^{(n)}$ from the notation, but all expectations henceforth are conditioned on $\vecx^{(1)},\ldots,\vecx^{(n)}$.

Next, note that

  1. By the definition of OPT, we have

     \hat{f}\left(\bm{\hat{\theta}}(\{\bm{\hat{x}}^{(i,1)},\bm{\hat{x}}^{(i,2)}\}_{i=1}^{n})\right)-\hat{f}\left(\bm{\theta}^{*}(\{\bm{\hat{x}}^{(i,1)},\bm{\hat{x}}^{(i,2)}\}_{i=1}^{n})\right)\leq\frac{1}{n}.  (75)

  2. By the definition of $\bm{\theta}^{*}(\{\bm{\hat{x}}^{(i,1)},\bm{\hat{x}}^{(i,2)}\}_{i=1}^{n})$, we have

     \hat{f}\left(\bm{\theta}^{*}(\{\bm{\hat{x}}^{(i,1)},\bm{\hat{x}}^{(i,2)}\}_{i=1}^{n})\right)-\hat{f}(\bm{\theta}^{*})\leq 0.  (76)

  3. Given its construction, $\hat{f}$ is an unbiased estimator of $f$ for any fixed $\bm{\theta}$. Hence, we have

     \mathbb{E}[\hat{f}(\bm{\theta}^{*})-f(\bm{\theta}^{*})]=0.  (77)

Summing all three gives us

\mathbb{E}\left[\hat{f}\left(\bm{\hat{\theta}}(\{\bm{\hat{x}}^{(i,1)},\bm{\hat{x}}^{(i,2)}\}_{i=1}^{n})\right)-f(\bm{\theta}^{*})\right]\leq\frac{1}{n}.  (78)

However, we need to bound

\mathbb{E}\left[f\left(\bm{\hat{\theta}}(\{\bm{\hat{x}}^{(i,1)},\bm{\hat{x}}^{(i,2)}\}_{i=1}^{n})\right)-f(\bm{\theta}^{*})\right].  (79)

Therefore, it suffices to bound

\mathbb{E}\left[f\left(\bm{\hat{\theta}}(\{\bm{\hat{x}}^{(i,1)},\bm{\hat{x}}^{(i,2)}\}_{i=1}^{n})\right)-\hat{f}\left(\bm{\hat{\theta}}(\{\bm{\hat{x}}^{(i,1)},\bm{\hat{x}}^{(i,2)}\}_{i=1}^{n})\right)\right].  (80)

Observe that this term is upper bounded by

\mathbb{E}\left[\sup_{\bm{\theta}}\left(f(\bm{\theta})-\hat{f}(\bm{\theta})\right)\right].  (81)

Note that (81) is upper bounded by

\sup_{\bm{\theta}\in\Theta}\|\bm{\theta}\|^{2}\,\mathbb{E}\left[\left\|\frac{1}{n}\sum_{i=1}^{n}\bm{z}^{(i)}\bm{z}^{(i)^{\top}}-\bm{\hat{z}}^{(i,1)}\bm{\hat{z}}^{(i,2)^{\top}}\right\|\right]+\sup_{\bm{\theta}\in\Theta}\|\bm{\theta}\|\,\mathbb{E}\left[\left\|\frac{1}{n}\sum_{i=1}^{n}l^{(i)}\bm{z}^{(i)}-\hat{l}^{(i,1)}\bm{\hat{z}}^{(i,2)}\right\|\right].  (82)

We bound the first term above; the second term can be bounded similarly. To do so, first note that we can decompose $\bm{\hat{z}}^{(i,1)}\bm{\hat{z}}^{(i,2)^{\top}}-\bm{z}^{(i)}\bm{z}^{(i)^{\top}}$ as

(\bm{\hat{z}}^{(i,1)}-\bm{z}^{(i)})\bm{z}^{(i)^{\top}}+\bm{z}^{(i)}(\bm{\hat{z}}^{(i,2)}-\bm{z}^{(i)})^{\top}+(\bm{\hat{z}}^{(i,1)}-\bm{z}^{(i)})(\bm{\hat{z}}^{(i,2)}-\bm{z}^{(i)})^{\top}.  (83)

Hence, it suffices to bound the expected norm of the average of each term separately, and all of them can be handled similarly. We will focus on the last term in particular, i.e., we will bound

\mathbb{E}\left[\left\|\frac{1}{n}\sum_{i=1}^{n}A^{i}\right\|\right]  (84)

with $A^{i}:=(\bm{\hat{z}}^{(i,1)}-\bm{z}^{(i)})(\bm{\hat{z}}^{(i,2)}-\bm{z}^{(i)})^{\top}$. Next, we make the following claim regarding the matrices $\{A^{i}\}_{i=1}^{n}$, proved in Section C.2.

Lemma 1.

Let

r^{2}:=\sup_{\vecx\in[-1,1]^{d}}\mathbb{E}\left[\|M_{\mathsf{mean}}(\vecx,\vecc/2)-\vecx\|^{2}\,\big|\,\vecx\right].  (85)

Then, the following two results hold:

\left\|\sum_{i}\mathbb{E}[A^{i}A^{i^{\top}}]\right\| \leq nr^{4},  (86)
\mathbb{E}[\max_{i}\|A^{i}\|^{2}] \leq nr^{4}.  (87)

Let us first show how this lemma completes the proof. Using Theorem 1 of Tropp [2016] and Jensen's inequality, along with this lemma, we can immediately upper bound (84) by

\mathcal{O}(1)\,\log(d)\,\frac{r^{2}}{\sqrt{n}}.  (88)

Similarly, the expected norms of the averages of $(\bm{\hat{z}}^{(i,1)}-\bm{z}^{(i)})\bm{z}^{(i)^{\top}}$ and $\bm{z}^{(i)}(\bm{\hat{z}}^{(i,2)}-\bm{z}^{(i)})^{\top}$ can be upper bounded by $\cO\left(\frac{\log(d)\,r\,d}{\sqrt{n}}\right)$, using steps similar to those in the proof of Lemma 1.

Recall that $r^{2}=\mathcal{O}(1)\sum_{i=1}^{d}\frac{1}{\sum_{k=1}^{i}\frac{(c_{k}-c_{k-1})^{2}}{d-k+1}}$. Note that we use $M_{\mathsf{mean}}$ without the truncation to $[-1,1]^{d}$ to preserve unbiasedness; hence, the minimum with $d$ does not appear in $r^{2}$, unlike in Theorem 2. Thus, we obtain the bound

\mathbb{E}\left[\sup_{\bm{\theta}}\left(f(\bm{\theta})-\hat{f}(\bm{\theta})\right)\right]\lesssim\max_{\bm{\theta}\in\Theta}\left(\|\bm{\theta}\|^{2}+\|\bm{\theta}\|\right)\left(\frac{r^{2}\log d}{\sqrt{n}}+\frac{rd\log d}{\sqrt{n}}\right).  (89)

We can also project the estimate $\hat{\bm{\theta}}$ onto the set $\Theta$ to obtain the bound $\max_{\bm{\theta}\in\Theta}(\|\bm{\theta}\|^{2}+\|\bm{\theta}\|)\,d$. Taking the minimum of the two bounds gives Theorem 3. ∎

C.2 Proof of Lemma 1

First, note that

A^{i}A^{i^{\top}}=\|\bm{\hat{z}}^{(i,2)}-\bm{z}^{(i)}\|^{2}\,(\bm{\hat{z}}^{(i,1)}-\bm{z}^{(i)})(\bm{\hat{z}}^{(i,1)}-\bm{z}^{(i)})^{\top},  (90)

which yields

\left\|\mathbb{E}[A^{i}A^{i^{\top}}]\right\|\leq r^{2}\,\mathbb{E}\left[\left\|(\bm{\hat{z}}^{(i,1)}-\bm{z}^{(i)})(\bm{\hat{z}}^{(i,1)}-\bm{z}^{(i)})^{\top}\right\|\right].  (91)

Using the fact that $\|uu^{\top}\|=\|u\|^{2}$ for any vector $u$ completes the proof of the first part of the lemma. To prove the second part, note that

\mathbb{E}[\max_{i}\|A^{i}\|^{2}]\leq\mathbb{E}\left[\sum_{i=1}^{n}\|A^{i}\|^{2}\right]=\sum_{i=1}^{n}\mathbb{E}[\|A^{i}\|^{2}].  (92)

Using the fact that $\|uv^{\top}\|\leq\|u\|\|v\|$ for any two vectors $u$ and $v$, we obtain

\mathbb{E}[\|A^{i}\|^{2}]\leq\mathbb{E}\left[\|\bm{\hat{z}}^{(i,1)}-\bm{z}^{(i)}\|^{2}\,\|\bm{\hat{z}}^{(i,2)}-\bm{z}^{(i)}\|^{2}\right].  (93)

Finally, the conditional independence of $\bm{\hat{z}}^{(i,1)}$ and $\bm{\hat{z}}^{(i,2)}$ given $\bm{z}^{(i)}$ completes the proof of the lemma. ∎