Enhancing Feature-Specific Data Protection via
Bayesian Coordinate Differential Privacy†
† The authors are listed in alphabetical order.
Abstract
Local Differential Privacy (LDP) offers strong privacy guarantees without requiring users to trust external parties. However, LDP applies uniform protection to all data features, including less sensitive ones, which degrades performance of downstream tasks. To overcome this limitation, we propose a Bayesian framework, Bayesian Coordinate Differential Privacy (BCDP), that enables feature-specific privacy quantification. This more nuanced approach complements LDP by adjusting privacy protection according to the sensitivity of each feature, enabling improved performance of downstream tasks without compromising privacy. We characterize the properties of BCDP and articulate its connections with standard non-Bayesian privacy frameworks. We further apply our BCDP framework to the problems of private mean estimation and ordinary least-squares regression. The BCDP-based approach obtains improved accuracy compared to a purely LDP-based approach, without compromising on privacy.
1 Introduction
The rapid expansion of machine learning applications across various sectors has fueled an unprecedented demand for data collection. With the increase in data accumulation, user privacy concerns also intensify. Differential privacy (DP) has emerged as a prominent framework that enables algorithms to utilize data while preserving the privacy of individuals who provide it (Warner, 1965; Evfimievski et al., 2003; Dwork et al., 2006, 2014; Vadhan, 2017; Desfontaines, 2020). A variant of DP, local differential privacy (LDP), suitable for distributed settings where users do not trust any central authority, has been extensively studied (Warner, 1965; Evfimievski et al., 2003; Beimel et al., 2008; Kasiviswanathan et al., 2011; Chan et al., 2012). In this framework, each user shares an obfuscated version of their datapoint, meaningful in aggregate to the central server yet concealing individual details. Leading companies, including Google (Erlingsson et al., 2014), Microsoft (Ding et al., 2017), and Apple (Differential Privacy Team, Apple, 2017; Thakurta et al., 2017), have employed this approach for data collection.
In the LDP framework, a privacy parameter must be set—typically by the data collector—that determines the level of protection for user information. This decision is particularly challenging in high-dimensional settings, where user data includes multiple attributes such as name, date of birth, and social security number, each with differing levels of sensitivity. One common strategy is to allow the most sensitive data feature to dictate the level of privacy, providing the highest level of protection. Alternatively, separate private mechanisms can be applied to each feature according to its required privacy level. However, this approach fails when there is correlation between features. For instance, setting a lower level of privacy for a feature that is consistently aligned with a more sensitive one risks compromising the privacy of the sensitive feature.
This complexity raises a compelling question: Can we tailor privacy protection to specific data features if the correlation among data coordinates is controlled, without defaulting to the privacy level of the most sensitive coordinate? In other words, can we leverage users’ lower sensitivity preferences for certain features to reduce the error in our inference algorithms, provided the correlation with sensitive coordinates is weak?
Our contributions:
Our main contribution is addressing the challenge of feature-specific data protection from a Bayesian perspective. We present a novel Bayesian privacy framework, which we refer to as Bayesian Coordinate Differential Privacy (BCDP). BCDP complements LDP and allows a tailored privacy demand per feature. In essence, BCDP ensures that the odds of inferring a specific feature (also referred to as a coordinate) remain nearly unchanged before and after observing the outcome of the private mechanism. The framework’s parameters control the extent to which the odds remain unchanged, enabling varying levels of privacy per feature. As part of developing this new framework, we explore its formal relationship with other related DP notions and examine standard DP properties, such as post-processing and composition.
To demonstrate the efficacy of our framework, we study two fundamental problems in machine learning under a baseline level of LDP privacy, coupled with more stringent per-coordinate BCDP constraints. First, we address multivariate mean estimation. Our proposed algorithm satisfies BCDP constraints while outperforming a standard LDP algorithm, where the privacy level is determined by the most sensitive coordinate. Our experimental results confirm this advantage. Second, we present an algorithm for ordinary least squares regression under both LDP and BCDP constraints.
A key technical component of our algorithm for these two problems is a mechanism that builds on any off-the-shelf LDP mechanism and can be applied to various problems. Our algorithm involves making parallel queries to the LDP mechanism with a series of carefully selected privacy parameters where the -th query omits the most sensitive coordinates. The outputs are then aggregated to produce the most accurate estimator for each coordinate.
Bayesian Feature-Specific Protection:
We now give a high-level discussion of privacy in the Bayesian framework to motivate our approach. Formal definitions are in Section 2. Consider a set of users, each possessing a high-dimensional data point with varying privacy protection needs across different features. For instance, may contain genetic markers for diseases, and the -th coordinate could reveal Alzheimer’s markers that the user wishes to keep strictly confidential. In a local privacy setting, users apply a locally private mechanism to obtain obfuscated data , for public aggregation and analysis. Our focus is on understanding what can be inferred about , especially the sensitive feature , from the obfuscated data.
Now, imagine an adversary placing a bet on a probabilistic event related to . Without extra information, their chances of guessing correctly are based on prior knowledge of the population. However, access to could prove advantageous for inference. Privacy in the Bayesian model is measured by how much improves the adversary’s chances. In other words, the privacy parameter limits how much their odds can shift.
The standard Bayesian interpretation of LDP ensures uniform privacy across all data features. In contrast, our framework BCDP enhances LDP by enabling feature-specific privacy controls. That is, BCDP restricts adversarial inference of events concerning individual coordinates. For example, BCDP ensures that the advantage an adversary gains from observing when betting on whether a person has the Alzheimer’s trait (i.e., ) is smaller than what would be achieved under a less stringent LDP constraint.
We remark that BCDP does not replace LDP, since it lacks global data protection. Also, correlation plays a crucial role in BCDP’s effectiveness: if a sensitive feature is highly correlated with less sensitive features, increasing protection for the sensitive feature imposes stricter protection for other features as well. As the correlation between features increases, BCDP’s guarantees converge to the uniform guarantees of LDP.
Related Work:
Statistical inference with LDP guarantees has been extensively studied. For comprehensive surveys see Xiong et al. (2020); Yang et al. (2023). Below, we discuss the results that are most relevant to our setup.
Several papers consider the uneven protection of privacy across features. For instance, Ghazi et al. (2022) and Acharya et al. (2020) focus on settings where the privacy parameters vary across different neighboring datasets. Other models arise from altering the concept of neighboring datasets (Kifer and Machanavajjhala, 2014; He et al., 2014), or they emphasize indistinguishability guarantees within a redefined metric space for the datasets or data points Chatzikokolakis et al. (2013); Alvim et al. (2018); Imola et al. (2022); Andrés et al. (2013). Another relevant notion for classification tasks is Label-DP which considers labels of training data as sensitive, but not the features themselves (Chaudhuri and Hsu, 2011; Beimel et al., 2013; Wang and Xu, 2019; Ghazi et al., 2021). A recent generalization of Label-DP, proposed by Mahloujifar et al. (2023), is Feature-DP, which allows specific features of the data to be disclosed by the algorithm while ensuring the privacy of the remaining information. A line of work, e.g., Kifer and Machanavajjhala (2011); Kenthapadi et al. (2013), studies attribute DP, which limits changes in the output distribution of the privacy mechanism when a single coordinate of the data is modified.
A common shortcoming in the aforementioned approaches is that the heterogeneous privacy protection across features, without accounting for their correlations, can lead to unaddressed privacy leakage—an issue our framework is specifically designed to mitigate. Another key distinction of our work is the integration of the BCDP framework with the LDP framework, enabling simultaneous protection at both the feature level and the global level.
A Bayesian approach to Central-DP is considered in Triastcyn and Faltings (2020), where the notion of neighboring datasets is taken over pairs that are drawn from the same distribution. This is similar in spirit to our framework, but since we operate in the Local-DP regime, we work with a prior instead. Xiao and Devadas (2023) consider a Bayesian approach to privacy where the goal is to prevent an observer from being able to nearly guess the input to a mechanism under a given notion of distance and error probability. Another Bayesian interpretation of DP is presented in Kasiviswanathan and Smith (2014), which shows that DP ensures that the posterior distribution after observing the output of a DP algorithm is close to the prior in total variation distance. None of these results explicitly focus on varying privacy requirements across coordinates.
Private mean estimation and private least-squares regression are well studied problems in literature in both central (Biswas et al., 2020; Karwa and Vadhan, 2017; Kifer et al., 2012; Vu and Slavkovic, 2009) and local (Asi et al., 2022; Feldman and Talwar, 2021; Zheng et al., 2017) DP regimes. We depart from these works as we consider a BCDP constraint in addition to an LDP constraint.
Organization:
2 Problem Formulation
Let denote the data domain represented as the Cartesian product . One can consider this as a general decomposition where each of the canonical data coordinates can be categorical, real-valued, or a combination of both, as the setting dictates. For a vector , we let denote the -th coordinate, and the vector with coordinates obtained by removing the -th coordinate from . A private mechanism is a randomized algorithm that maps to in an output space .
Let and be measurable spaces. Let each coordinate space be equipped with -algebra , and take equal to the product -algebra . A randomized mechanism from to can be represented by a Markov kernel where for an input , the output is a -valued random variable with law .
Local differential privacy
We first revisit the definition of Local Differential Privacy (LDP).
Definition 1 (Local DP (Kasiviswanathan et al., 2011)).
A mechanism is -LDP if, for all and all , we have:
(1) |
With the understanding that , Definition 1 can be rewritten in the more familiar form .
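As a concrete, minimal illustration of Definition 1 (our own example, not part of the paper's development), binary randomized response reports the true bit with probability e^ε/(1+e^ε) and flips it otherwise; the worst-case likelihood ratio between any two inputs is exactly e^ε, so the channel is ε-LDP. The function names and parameter values below are illustrative.

```python
import math
import random

def randomized_response(bit: int, epsilon: float) -> int:
    """Report `bit` truthfully with probability e^eps / (1 + e^eps), flip it otherwise.
    For any two inputs and any output, the likelihood ratio is at most e^eps,
    so this channel satisfies epsilon-LDP."""
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if random.random() < p_keep else 1 - bit

# Sanity check: the worst-case likelihood ratio equals e^epsilon.
eps = 1.0
p_keep = math.exp(eps) / (1.0 + math.exp(eps))
ratio = p_keep / (1.0 - p_keep)  # P(output = 1 | x = 1) / P(output = 1 | x = 0)
assert abs(ratio - math.exp(eps)) < 1e-9
```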
Next, we focus on a Bayesian interpretation of this setting. Suppose we have an underlying data distribution . That is, we equip with the probability measure which we call the prior. This induces a probability space where the measure is characterized via
(2) |
Note that the mechanism defining is precisely represented on this space by the random variable . By a slight abuse of notation, we continue to denote the (randomized) output of the mechanism by . We first define the sets on which have positive probability under
(3) |
For and , note that
(4) |
Bayesian differential privacy:
We now introduce a Bayesian formulation of the LDP definition which involves both the mechanism and the prior . This can be seen as the local version of the definitions proposed in earlier works, e.g., Kifer and Machanavajjhala (2014); Triastcyn and Faltings (2020).
Definition 2 (Bayesian DP).
The pair is -BDP if for all and all , we have:
(5) |
We show in Proposition 1 that -LDP and -BDP can be viewed as equivalent, modulo some technicalities.
Proposition 1.
If is -LDP then is -BDP. Conversely, if is -BDP, then (assuming is a Polish space) there exists a mechanism which is -LDP such that .
The reverse direction of the equivalence above relies on being a Polish space, which should capture all settings of practical interest. The proof can be found in Appendix A.1.
As discussed in Kifer and Machanavajjhala (2014), we can interpret the BDP definition in terms of odds ratios. The prior odds of two disjoint events and are given by . Now, for a set with positive measure, the posterior odds, after observing mechanism output , are equal to
(6) |
One can easily verify that the BDP definition implies that the ratio of posterior and prior odds, also known as the odds ratio, is in the interval . In other words, the BDP definition limits how much the output of the mechanism changes the odds, which in turn limits our ability to distinguish whether the data was in or .
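As a hedged numerical illustration of this odds-ratio reading (again a toy setup of our own), the snippet below pushes a Bernoulli prior through the randomized-response channel from the previous sketch and checks that the posterior-to-prior odds ratio of the events {x = 1} versus {x = 0} stays within [e^{-ε}, e^{ε}].

```python
import math

def posterior_odds_ratio(prior_p1: float, epsilon: float, output: int) -> float:
    """Ratio of posterior odds to prior odds for the events {x = 1} vs. {x = 0},
    after observing `output` of binary randomized response with parameter epsilon."""
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    # Likelihoods of the observed output under x = 1 and x = 0.
    like_1 = p_keep if output == 1 else 1.0 - p_keep
    like_0 = 1.0 - p_keep if output == 1 else p_keep
    prior_odds = prior_p1 / (1.0 - prior_p1)
    posterior_odds = (like_1 * prior_p1) / (like_0 * (1.0 - prior_p1))
    return posterior_odds / prior_odds  # simplifies to the likelihood ratio like_1 / like_0

eps = 0.5
for p in (0.1, 0.5, 0.9):
    for y in (0, 1):
        r = posterior_odds_ratio(p, eps, y)
        assert math.exp(-eps) - 1e-9 <= r <= math.exp(eps) + 1e-9
```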
Coordinate differential privacy:
We wish to measure the privacy loss of each coordinate incurred by a mechanism . Initially, one might hypothesize that by altering the definition of LDP to reflect changes in specific coordinates, we could achieve privacy protection for certain features of the user's data. More precisely, consider the following definition of Coordinate DP (CDP), where represents a vector of the desired levels of privacy for each coordinate. (The acronym CDP is not to be confused with Concentrated DP, which is not considered in this work.)
Definition 3 (Coordinate DP).
Let . A mechanism is -CDP if, for all and all , we have the implication:
(7) |
A special case of CDP, where all ’s are equal, is referred to as attribute DP in the literature Kifer and Machanavajjhala (2011); Kenthapadi et al. (2013); Ghazi et al. (2022). Additionally, CDP itself can be viewed as a special case of Partial DP, introduced in Ghazi et al. (2022).
The drawback of this definition is that it does not account for potential correlation among the data coordinates and, therefore, does not provide the type of guarantee we hope for regarding sensitive parts of the data. In other words, in -CDP, stating that the privacy level of coordinate is could be misleading. For instance, having does not necessarily ensure that no information is leaked about , as the other (potentially less-well protected) coordinates may be correlated with it.
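To make the issue concrete, the following toy computation (ours, not from the paper) releases two perfectly correlated bits through independent randomized-response channels, each with budget ε. The mechanism is ε-CDP coordinate-wise, yet an observer effectively gets two independent looks at the same bit, so the posterior odds of the "protected" coordinate can shift by a factor of e^{2ε} rather than e^{ε}.

```python
import math

def worst_case_odds_shift_two_copies(epsilon: float) -> float:
    """Worst-case posterior-to-prior odds shift for x1 when x2 is an exact copy of x1
    and each coordinate is released through an independent eps-randomized response."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    # Observing outputs (1, 1): likelihood p * p under x1 = 1, (1 - p)^2 under x1 = 0.
    return (p * p) / ((1.0 - p) ** 2)

eps = 0.5
print(worst_case_odds_shift_two_copies(eps))  # ~ e^{2 * eps}, not e^{eps}
print(math.exp(2 * eps))                      # the inflated effective budget
```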
Our formulation: Bayesian coordinate differential privacy:
In order to capture the correlation among the data coordinates, we adopt a Bayesian interpretation of differential privacy. As seen in Proposition 1, BDP and LDP are closely related and this serves as the main motivation for our definition.
For , define
(8) |
For , we have
(9) |
where . Thus, similar to Definition 2, we have a natural version of privacy for each coordinate which we shall refer to as Bayesian Coordinate DP (BCDP).
Definition 4 (Bayesian Coordinate DP).
Let . A pair is -BCDP if, for each , and for all , , we have:
(10) |
Unlike the Coordinate-DP definition, this definition accounts for potential correlation between coordinates through incorporation of the prior .
It is worth noting that, similar to the BDP definition, we can interpret the BCDP definition in terms of the odds ratio, with the difference that we consider the two events and . In other words, having a small guarantees that the mechanism’s output does not provide much useful information for discriminating between the possibilities that the underlying data has coordinate in or . This aligns with our goal of proposing a new formulation that characterizes how much information is revealed about each particular coordinate. We further discuss this interpretation in the next section.
3 Properties of BCDP
In this section, we discuss a number of properties following from the BCDP definition. Figure 1 compiles some of the results from Section 2 and Section 3 on how the different notions of Local DP (LDP), Coordinate-DP (CDP), Bayesian DP (BDP), and Bayesian Coordinate-DP (BCDP) are interrelated.
3.1 BCDP through the Lens of Hypothesis Testing
In this subsection, we present an interpretation of what the privacy vector quantifies in -BCDP. We begin by revisiting the connection between hypothesis testing and the definition of differential privacy, and demonstrate how this connection extends to the -BCDP definition. For an -LDP mechanism represented by Markov kernel , suppose an adversary observes a realization of the output and is tasked with determining whether the input was or . This scenario can be formulated as the following binary hypothesis problem:
Let the rejection region be . Denote the type I error (false alarm) rate as and the type II error (missed detection) rate as . We restate a known result below (Kairouz et al., 2015; Wasserman and Zhou, 2010).
Proposition 2.
A mechanism is -LDP iff the following condition holds for any and any :
(11) |
We note that the criterion (11) can be equivalently replaced by the criterion
As noted in Kairouz et al. (2015), this operational perspective shows that it is impossible to force type I and type II error rates to simultaneously vanish for a DP algorithm, and the reverse implication also holds.
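For reference, the standard form of this trade-off, as stated in Wasserman and Zhou (2010) and Kairouz et al. (2015), reads as follows in our own notation, with α_S and β_S denoting the type I and type II error rates of a test with rejection region S:
\[
\alpha_S + e^{\varepsilon_0}\,\beta_S \;\ge\; 1
\qquad\text{and}\qquad
\beta_S + e^{\varepsilon_0}\,\alpha_S \;\ge\; 1,
\quad \text{for every rejection region } S.
\]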
For BCDP:
Now, consider a -BCDP mechanism . Suppose an adversary has knowledge of the prior and again observes the algorithm’s output . For any , the adversary aims to distinguish whether the output was generated from data where the -th coordinate belongs to the set or . In other words, we consider the following binary hypothesis testing problem:
Let the rejection region for the -th hypothesis test be . For the -th test, denote the type I and type II error rates as and . Then, the following result holds:
Theorem 1.
A pair is -BCDP iff the following condition holds for every coordinate :
(12) |
for any and .
We note that the criteria (12) can be equivalently replaced by the criteria
The proof of Theorem 1 is presented in Section A.2. Thus, the theorem shows that the vector in the -BCDP formulation captures the privacy loss for each of the coordinates in the same way in -DP captures the privacy loss of the entire data. In contrast, the vector in -CDP does not have such an interpretation of privacy loss across the coordinates.
3.2 BCDP can be achieved via LDP
We begin with a simple result that follows directly from the -BDP definition. If a mechanism is -LDP, then it is also -BDP. Noticing that -BDP ensures a ratio of for each coordinate as well in (10), we see that each coordinate has a privacy level of . Proposition 3 formalizes this observation, and the proof can be found in Section A.3.
Proposition 3.
(LDP to BCDP) An -LDP mechanism satisfies -BCDP.
Thus, a baseline approach to implement -BCDP is to use a -LDP mechanism. Nevertheless, this would mean providing all coordinates with the most conservative privacy guarantee required by any single coordinate. However, as we will show, we can maintain an for the overall privacy guarantee while providing the -BCDP guarantee, thereby achieving lower error for downstream tasks.
3.3 BCDP does not imply LDP
It should be noted that BCDP does not ensure LDP, BDP, or CDP. Example 1 below demonstrates a mechanism that is BCDP, but not LDP.
Example 1.
Consider data where are i.i.d. Bern, and let . Let be defined by the table below, with parameters . Mechanism is not LDP (and hence neither BDP nor CDP). However, for BCDP, the terms and have a finite multiplicative range in , for both . For instance, if , the resulting is -BCDP.
[Table: output probabilities of for each input in , with rows and columns indexed by 0 and 1; entries are given in terms of the parameters above.]
The takeaway from the above example is that there can be priors and mechanisms such that the prior-mechanism pair is -BCDP but not -DP for any value of . Intuitively, BCDP quantifies privacy loss for each coordinate. In Example 1, one cannot distinguish much about any particular coordinate when the output is observed, due to the BCDP guarantee. However, one may still be able to draw conclusions about the coordinates jointly. For instance, if the mechanism output in Example 1 is , then the input could not have been .
As this result suggests, just ensuring BCDP is not sufficient to guarantee overall privacy. In other words, the right way to think of BCDP is that it is a tool that allows us to quantify the privacy offered by an algorithm for each coordinate. The user can demand overall -LDP along with -BCDP, where the value could be significantly smaller than for those coordinates whose privacy the user is more concerned about.
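As a hedged illustration of this separation (a toy variant of our own, not Example 1 itself): with two i.i.d. Bern(1/2) bits, the mechanism that outputs their XOR reveals nothing about either coordinate in isolation (each marginal posterior remains Bern(1/2)), yet it rules out half of the joint inputs, so it cannot be ε-LDP for any finite ε.

```python
from itertools import product

# Prior: two i.i.d. fair bits.  Mechanism: output the XOR of the bits.
inputs = list(product((0, 1), repeat=2))
prior = {x: 0.25 for x in inputs}

for y in (0, 1):
    # Posterior over inputs consistent with the observed XOR value y.
    mass = {x: prior[x] for x in inputs if (x[0] ^ x[1]) == y}
    total = sum(mass.values())
    posterior = {x: m / total for x, m in mass.items()}
    # Each coordinate's marginal posterior is still 1/2: no per-coordinate leakage.
    for j in (0, 1):
        assert abs(sum(p for x, p in posterior.items() if x[j] == 1) - 0.5) < 1e-12
    # But half of the joint inputs receive probability zero, so the output's
    # likelihood ratio across inputs is unbounded: not LDP for any finite budget.
    assert len(posterior) == 2
```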
3.4 Achieving BCDP via CDP
As we discussed earlier, CDP does not imply BCDP, as it does not take into account the correlation among the features. Therefore, a natural question arises: could CDP, along with some bound on the dependence among the features, lead to a bound with respect to the BCDP definition? The next result provides an answer to this question. We denote the total variation distance between two distributions and as .
Proposition 4.
(CDP to BCDP) Suppose mechanism is -CDP and there exists such that
(13) |
Then:
-
1.
The mechanism is -LDP and -BCDP with .
-
2.
If the mechanism is -LDP for some , then it is -BCDP with .
For the proof, see Appendix A.4. To better highlight this result, consider a simple example in which the input's coordinates are redundant copies of the same value and we have access to a -CDP mechanism . Then, as Proposition 4 suggests, and given that we have here, this mechanism is -BCDP with . This is, of course, intuitive, as all coordinates reveal information about the same value. It also reinforces that BCDP is sensitive to correlations, unlike CDP. The first part of Proposition 4 provides an immediate method for constructing a BCDP mechanism using mechanisms that offer CDP guarantees, such as applying off-the-shelf LDP mechanisms per coordinate. This can also be done without requiring full knowledge of the prior. The second part of Proposition 4 suggests that in cases where we design an algorithm that has a tighter LDP guarantee than just , we can further improve the BCDP bound. One example of such a construction is provided in our private mean estimation algorithm in Section 4. One challenge with this approach is translating the constraint to a vector . In Section A.5, we provide a tractable approach to do so by imposing a linear constraint on .
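A minimal sketch of the first construction above, under assumptions of our own (Laplace noise as the off-the-shelf per-coordinate LDP primitive, and coordinates bounded in [-1, 1]): each coordinate is released through an independent Laplace mechanism with its own budget. This yields a CDP guarantee; the resulting BCDP level depends on the correlation bound through Proposition 4 and is not computed here.

```python
import numpy as np

def per_coordinate_laplace(x, eps, rng=None):
    """Release each coordinate x[j] (assumed to lie in [-1, 1]) through an
    independent eps[j]-LDP Laplace mechanism.  The result is eps-CDP; by
    Proposition 4, it also satisfies a BCDP guarantee whose per-coordinate
    levels grow with the correlation bound."""
    rng = rng or np.random.default_rng()
    sensitivity = 2.0                  # width of the interval [-1, 1]
    scale = sensitivity / np.asarray(eps, dtype=float)
    return np.asarray(x, dtype=float) + rng.laplace(loc=0.0, scale=scale)

# Example: a tighter budget (hence more noise) on the first, most sensitive coordinate.
x = np.array([0.3, -0.7, 0.1])
eps = np.array([0.2, 1.0, 2.0])
print(per_coordinate_laplace(x, eps))
```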
3.5 Do Composition and Post-Processing Hold for BCDP?
Composition is a widely used and well-known property of LDP. Nonetheless, as the following example highlights, composition, as traditionally applied in differential privacy settings, does not hold in the BCDP setting.
Example 2.
Let be a Bernoulli vector whose coordinates are independent, each distributed as . Consider the mechanism
(14) |
where denotes the sum in . It is straightforward to verify that is in fact -BCDP with respect to the first coordinate. However, having two independent copies of is not BCDP with respect to the first coordinate. To see this, consider the case where and, in two copies of , one adds to and the other adds it to . By observing the two copies, we can determine .
On the other hand, the post-processing property holds for BCDP (see Appendix A.6 for the proof).
Proposition 5.
(Post-processing for BCDP) Let be a -BCDP mechanism and be a measurable function. Then, is also -BCDP.
4 Private Mean Estimation
To demonstrate the practical use of the BCDP framework, we consider the private mean estimation problem.
Problem Setup:
Let there be users submitting their data to a central server via Local DP mechanisms. Let user ’s data be , where is drawn from prior on ; the server and users are not assumed to know the priors (which may be different for each individual user). Our goal is to estimate the empirical mean under -LDP and -BCDP. We consider -norm as our error metric. Without loss of generality, we assume that is in non-decreasing order. We make the following assumption regarding the priors.
Assumption 1.
For every and every , we have
(15) |
Here can be interpreted as a measure of how much different coordinates are interdependent. For instance, in the case where coordinates are independent we have , and in the case where they are fully equal, we have . Note that we do not need (15) to hold for the last coordinate (). While this slight relaxation does not play a role in this section, it will be useful when we move to the private optimization application.
It is also worth emphasizing that our analysis extends to cases where we have heterogeneous bounds across different coordinates (similar to ) and different across users. However, we opt for a universal bound for the sake of simplifying the presentation of results. We assume that the upper bound is known to the designer of the local channels, which can be either the end user themselves or the central server.
Our proposed privacy mechanism (Mechanism 1) builds upon any LDP mechanism , capable of handling inputs of different dimensions, that can be used as a black-box tool for mean estimation. While Algorithm 1 satisfies the privacy constraints for any black-box LDP mechanism , we shall focus on the mechanism presented by Duchi et al. (2013a) for proving error bounds. The mechanism of Duchi et al. (2013a), which is known to be minimax optimal for mean estimation, is outlined in Mechanism 2 for completeness. A figure explaining is presented in Figure 2.
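To convey the structure of the parallel-query construction in schematic form, here is a simplified sketch of our own, not a faithful implementation of Mechanism 1: coordinates are ordered from most to least sensitive, the ℓ-th query applies a black-box LDP channel to the coordinates from position ℓ onward with budget δ_ℓ, and each coordinate's final estimate combines, by inverse-variance weighting, all the queries that included it. The channel below is a placeholder (coordinate-wise Laplace noise) standing in for the Duchi et al. (2013a) mechanism, and the budget schedule is an input rather than the specific choice made in the paper.

```python
import numpy as np

def ldp_channel(v, eps, rng):
    """Placeholder unbiased eps-LDP channel for v in [-1, 1]^m, standing in for the
    minimax-optimal mechanism of Duchi et al. (2013a).  The budget is split evenly
    over the m coordinates; returns the noisy vector and its per-coordinate variance."""
    m = v.size
    scale = 2.0 * m / eps                      # Laplace scale per coordinate
    noisy = v + rng.laplace(0.0, scale, size=m)
    return noisy, 2.0 * scale ** 2             # variance of a Laplace(scale) draw

def parallel_query_sketch(x, deltas, rng=None):
    """Schematic parallel-query construction: query ell (budget deltas[ell]) sees only
    coordinates ell, ell + 1, ..., d - 1, i.e., it omits the ell most sensitive
    coordinates.  Coordinate j is then estimated from the queries 0..j that included
    it, combined with inverse-variance weights."""
    rng = rng or np.random.default_rng()
    d = x.size
    per_coord = [[] for _ in range(d)]         # (value, variance) pairs per coordinate
    for ell, delta in enumerate(deltas):
        noisy, var = ldp_channel(x[ell:], float(delta), rng)
        for j in range(ell, d):
            per_coord[j].append((noisy[j - ell], var))
    out = np.empty(d)
    for j in range(d):
        vals = np.array([v for v, _ in per_coord[j]])
        weights = np.array([1.0 / var for _, var in per_coord[j]])
        out[j] = np.dot(weights, vals) / weights.sum()
    return out

x = np.array([0.2, -0.5, 0.9, 0.0])
deltas = np.array([0.2, 0.3, 0.5, 1.0])        # non-decreasing: most sensitive coordinate first
print(parallel_query_sketch(x, deltas))
```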
Theorem 2 shows that our proposed mechanism satisfies the desired privacy constraints and provides a bound on the mean-squared error (MSE). The notation refers to . Later in this section, we present a refined result with a more interpretable MSE bound in Corollary 1.
Theorem 2.
The proof is presented in Appendix B.2. Here, we provide a proof sketch for the privacy guarantee. We first establish that is -CDP and -LDP. The latter implies that also satisfies -BCDP. Thus, if , we have already shown that is -BCDP with respect to the -th coordinate. For the case where , we use Proposition 4, which essentially implies that is -BCDP with respect to the -th coordinate. Substituting the value of from the statement of Theorem 2 gives us the desired -BCDP guarantee for coordinate . Finally, the choice of in (16) ensures that the proposed ’s are all non-negative and valid.
We next make a few remarks regarding this result. First, note that while this bound applies to empirical mean estimation, when the data points are independent and identically distributed, the result can also be used to derive an out-of-sample error bound for the statistical private mean estimation problem. In this case, an additional term would be added to the error bound. Secondly, note that in the special case where there is no coordinate-level privacy requirement (i.e., ), setting implies , which yields the error bound . This matches the -LDP lower bound provided in Duchi et al. (2013a). Lastly, we would like to discuss the role of parameter . Note that when the correlation parameter is sufficiently high or the most sensitive privacy requirement is low enough such that is determined by the first term, , in (16), one can verify that . As increases, decreases, leading to a worse estimate for the first coordinate. However, increases, indicating that, for the less-sensitive coordinates, we are increasing the privacy parameter, potentially resulting in lower estimation error for those coordinates. In other words, the parameter allows us to control how we allocate the privacy budget of the most sensitive coordinate, , between the estimation of the first coordinate and the privacy cost imposed by its correlation with other coordinates.
We next present a corollary that studies a special case where the coordinates are divided into more sensitive and less sensitive groups. In this case, users request a general -LDP guarantee for all coordinates and a -BCDP guarantee specifically for the more sensitive coordinates. Under this scenario, we further simplify the error bound, which allows us to better highlight the implications of Theorem 2.
Corollary 1.
Under the special structure of and in Corollary 1, the MSE can be thought of as the sum of the MSEs of the more sensitive and the less sensitive coordinates. For low levels of correlation, the MSE of the less private coordinates behaves like the MSE of an -LDP mechanism on a -dimensional vector, but the MSE of the more sensitive part depends on the dimension of the whole vector, not just the sensitive part, showing how the more sensitive part of the data affects the whole vector. As the correlation increases , we have and the MSE matches that of a -LDP mechanism. A supporting experiment is deferred to Appendix B.1.
5 Private Least-Squares Regression
We focus on how the BCDP framework can be employed on least-squares problems. Consider a setting in which datapoints are distributed among users, where represents the data of user , with as the feature vector for the -th data point and as its label. Our goal is to solve the following empirical risk minimization problem privately over the data of users:
(20) |
where and is a compact convex set. We denote the solution of (20) by , achieved at some point .
The customary approach in private convex optimization is to use a gradient-based method, perturbing the gradient at each iteration to satisfy the privacy requirements. However, this approach is not suitable for cases in which coordinate-based privacy guarantees are needed. The difficulty is that each coordinate of the gradient is typically influenced by all coordinates of the data. As a result, it becomes challenging to distinguish the level of privacy offered to different coordinates of the data. Instead, we first perturb the users' data locally and then compute the gradient at the perturbed point to update the model at the server.
However, computing the gradient based on perturbed data introduces its own challenges. In particular, the gradient of with , given by
(21) |
is not a linear function of the data. As a result, although the privacy mechanism outputs an unbiased estimate of the true data, this non-linearity introduces an error in the gradient with a non-zero mean, preventing the algorithm from converging to the optimal solution. To overcome this challenge, we observe that the aforementioned non-linearity in the data arises from the product terms and . Therefore, if we create two private copies of the data and replace each term of the product with one of the copies, we obtain a stochastic unbiased estimate of the gradient at the server. The catch, of course, is that we need to divide our privacy budget in half.
A summary of the algorithm is presented in Algorithm 3. There, refers to any convex optimization algorithm of choice, e.g., gradient descent, that finds the minimizer of the function over the set and up to an error of .
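The two-copies trick can be sketched as follows (a minimal version of our own, with plain additive noise standing in for the actual privacy mechanism): given two independent unbiased private copies of a user's pair (a, b), the product of the first copy's features with the residual formed from the second copy is an unbiased estimate of the gradient a(aᵀθ - b), because the two noise draws are independent.

```python
import numpy as np

def private_copy(a, b, noise_scale, rng):
    """Placeholder unbiased randomizer for a user's (feature, label) pair.  In the
    paper this role is played by the privacy mechanism; here it is plain additive
    noise, used only to illustrate the unbiasedness argument."""
    return a + rng.normal(0.0, noise_scale, a.shape), b + rng.normal(0.0, noise_scale)

def unbiased_private_gradient(a, b, theta, noise_scale=0.5, rng=None):
    """E[a1 * (a2 . theta - b2)] = a * (a . theta - b), because the two private
    copies are independent and each is an unbiased estimate of (a, b)."""
    rng = rng or np.random.default_rng()
    a1, _ = private_copy(a, b, noise_scale, rng)    # first copy (features only are used)
    a2, b2 = private_copy(a, b, noise_scale, rng)   # second, independent copy
    return a1 * (np.dot(a2, theta) - b2)

# Quick Monte Carlo check of unbiasedness on a single synthetic point.
rng = np.random.default_rng(0)
a, b, theta = np.array([1.0, -2.0]), 0.5, np.array([0.3, 0.7])
true_grad = a * (np.dot(a, theta) - b)
avg = np.mean([unbiased_private_gradient(a, b, theta, rng=rng) for _ in range(100_000)], axis=0)
print(true_grad, avg)   # the empirical average should be close to the true gradient
```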
We make the assumption that the data is bounded.
Assumption 2.
for all .
We also need Assumption 1, which bounds the dependence of each coordinate on the other coordinates of the data. Recall that the last coordinate is exempt from this assumption, which is why we position the label as the last coordinate of the data. It is not reasonable to assume the feature vector is weakly correlated with the label, as that would violate the premise of regression. We next state the following result on Algorithm 3.
Theorem 3.
Suppose Assumptions 1 and 2 hold. Then, Algorithm 3, with set similar to Theorem 2, is -BCDP and -LDP. Moreover, using Mechanism 2 as , we obtain the following error rate
(22) |
with .
The proof of this theorem can be found in Appendix C.1. It is worth noting that appears in the mean-squared error bound in Theorem 2. That said, if we consider special cases similar to the setting in Corollary 1, the term would simplify in a similar manner.
We conclude this section by noting that, while we focus on linear regression here, our approach can provide an unbiased gradient and guarantee privacy in any setting where the loss function has a quadratic (or even low-degree polynomial) dependence on , such as neural networks with loss. However, the optimization error in such cases may require further considerations, as non-convex loss functions could arise.
6 Acknowledgements
This work was done in part while M.A. was a research fellow at the Simons Institute for the Theory of Computing. A.F. and M.J. acknowledge support from the European Research Council (ERC-2022-SYG-OCEAN-101071601). Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them. S.C. acknowledges support from AI Policy Hub, U.C. Berkeley; S.C. and T.C. acknowledge support from NSF CCF-1750430 and CCF-2007669.
References
- Acharya et al. [2020] J. Acharya, K. Bonawitz, P. Kairouz, D. Ramage, and Z. Sun. Context aware local differential privacy. In Proceedings of the 37th International Conference on Machine Learning, pages 52–62. PMLR, 2020.
- Alvim et al. [2018] M. Alvim, K. Chatzikokolakis, C. Palamidessi, and A. Pazii. Invited paper: Local differential privacy on metric spaces: Optimizing the trade-off with utility. In 2018 IEEE 31st Computer Security Foundations Symposium (CSF), pages 262–267, 2018.
- Andrés et al. [2013] M. E. Andrés, N. E. Bordenabe, K. Chatzikokolakis, and C. Palamidessi. Geo-indistinguishability: Differential privacy for location-based systems. In Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security, pages 901–914, 2013.
- Asi et al. [2022] H. Asi, V. Feldman, and K. Talwar. Optimal algorithms for mean estimation under local differential privacy. In International Conference on Machine Learning, pages 1046–1056. PMLR, 2022.
- Beimel et al. [2008] A. Beimel, K. Nissim, and E. Omri. Distributed private data analysis: Simultaneously solving how and what. In Advances in Cryptology–CRYPTO 2008: 28th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 17-21, 2008. Proceedings 28, pages 451–468. Springer, 2008.
- Beimel et al. [2013] A. Beimel, K. Nissim, and U. Stemmer. Private learning and sanitization: Pure vs. approximate differential privacy. In International Workshop on Approximation Algorithms for Combinatorial Optimization, pages 363–378. Springer, 2013.
- Biswas et al. [2020] S. Biswas, Y. Dong, G. Kamath, and J. Ullman. Coinpress: Practical private mean and covariance estimation. Advances in Neural Information Processing Systems, 33:14475–14485, 2020.
- Chan et al. [2012] T.-H. H. Chan, E. Shi, and D. Song. Optimal lower bound for differentially private multi-party aggregation. In Proceedings of the 20th Annual European Conference on Algorithms, ESA’12, page 277–288. Springer-Verlag, 2012. ISBN 9783642330896.
- Chatzikokolakis et al. [2013] K. Chatzikokolakis, M. E. Andrés, N. E. Bordenabe, and C. Palamidessi. Broadening the scope of differential privacy using metrics. In Privacy Enhancing Technologies: 13th International Symposium, PETS 2013, Bloomington, IN, USA, July 10-12, 2013. Proceedings 13, pages 82–102. Springer, 2013.
- Chaudhuri and Hsu [2011] K. Chaudhuri and D. Hsu. Sample complexity bounds for differentially private learning. In Proceedings of the 24th Annual Conference on Learning Theory, pages 155–186. JMLR Workshop and Conference Proceedings, 2011.
- Desfontaines [2020] D. Desfontaines. A list of real-world uses of differential privacy. https://desfontain.es/blog/real-world-differential-privacy.html, 2020. Accessed: 2024-05-22.
- Differential Privacy Team, Apple [2017] Differential Privacy Team, Apple. Learning with privacy at scale. https://machinelearning.apple.com/research/learning-with-privacy-at-scale, 2017. Accessed: 2024-05-22.
- Ding et al. [2017] B. Ding, J. Kulkarni, and S. Yekhanin. Collecting telemetry data privately. Advances in Neural Information Processing Systems, 30, 2017.
- Duchi et al. [2013a] J. C. Duchi, M. I. Jordan, and M. J. Wainwright. Local privacy, data processing inequalities, and statistical minimax rates. arXiv preprint arXiv:1302.3203, 2013a.
- Duchi et al. [2013b] J. C. Duchi, M. I. Jordan, and M. J. Wainwright. Local privacy and statistical minimax rates. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pages 429–438. IEEE, 2013b.
- Dwork et al. [2006] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography: Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006. Proceedings 3, pages 265–284. Springer, 2006.
- Dwork et al. [2014] C. Dwork, A. Roth, et al. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407, 2014.
- Erlingsson et al. [2014] Ú. Erlingsson, V. Pihur, and A. Korolova. Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 1054–1067, 2014.
- Evfimievski et al. [2003] A. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. In Proceedings of the Twenty-second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 211–222, 2003.
- Feldman and Talwar [2021] V. Feldman and K. Talwar. Lossless compression of efficient private local randomizers. In International Conference on Machine Learning, pages 3208–3219. PMLR, 2021.
- Ghazi et al. [2021] B. Ghazi, N. Golowich, R. Kumar, P. Manurangsi, and C. Zhang. Deep learning with label differential privacy. Advances in Neural Information Processing Systems, 34:27131–27145, 2021.
- Ghazi et al. [2022] B. Ghazi, R. Kumar, P. Manurangsi, and T. Steinke. Algorithms with more granular differential privacy guarantees. arXiv preprint arXiv:2209.04053, 2022.
- He et al. [2014] X. He, A. Machanavajjhala, and B. Ding. Blowfish privacy: tuning privacy-utility trade-offs using policies. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD ’14, page 1447–1458. Association for Computing Machinery, 2014. ISBN 9781450323765.
- Imola et al. [2022] J. Imola, S. Kasiviswanathan, S. White, A. Aggarwal, and N. Teissier. Balancing utility and scalability in metric differential privacy. In Uncertainty in Artificial Intelligence, pages 885–894. PMLR, 2022.
- Kairouz et al. [2015] P. Kairouz, S. Oh, and P. Viswanath. The composition theorem for differential privacy. In F. Bach and D. Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 1376–1385, Lille, France, 07–09 Jul 2015. PMLR.
- Karwa and Vadhan [2017] V. Karwa and S. Vadhan. Finite sample differentially private confidence intervals. arXiv preprint arXiv:1711.03908, 2017.
- Kasiviswanathan and Smith [2014] S. P. Kasiviswanathan and A. Smith. On the ’semantics’ of differential privacy: A Bayesian formulation. Journal of Privacy and Confidentiality, 6(1), 2014.
- Kasiviswanathan et al. [2011] S. P. Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith. What can we learn privately? SIAM Journal on Computing, 40(3):793–826, 2011.
- Kenthapadi et al. [2013] K. Kenthapadi, A. Korolova, I. Mironov, and N. Mishra. Privacy via the johnson-lindenstrauss transform. Journal of Privacy and Confidentiality, 5(1), Aug. 2013. doi: 10.29012/jpc.v5i1.625.
- Kifer and Machanavajjhala [2011] D. Kifer and A. Machanavajjhala. No free lunch in data privacy. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, pages 193–204, 2011.
- Kifer and Machanavajjhala [2014] D. Kifer and A. Machanavajjhala. Pufferfish: A framework for mathematical privacy definitions. ACM Transactions on Database Systems (TODS), 39(1):1–36, 2014.
- Kifer et al. [2012] D. Kifer, A. Smith, and A. Thakurta. Private convex empirical risk minimization and high-dimensional regression. In S. Mannor, N. Srebro, and R. C. Williamson, editors, Proceedings of the 25th Annual Conference on Learning Theory, volume 23 of Proceedings of Machine Learning Research, pages 25.1–25.40, Edinburgh, Scotland, 25–27 Jun 2012. PMLR. URL https://proceedings.mlr.press/v23/kifer12.html.
- Mahloujifar et al. [2023] S. Mahloujifar, C. Guo, G. E. Suh, and K. Chaudhuri. Machine learning with feature differential privacy. In Federated Learning and Analytics in Practice: Algorithms, Systems, Applications, and Opportunities, 2023.
- Thakurta et al. [2017] A. G. Thakurta, A. H. Vyrros, U. S. Vaishampayan, G. Kapoor, J. Freudiger, V. R. Sridhar, and D. Davidson. Learning new words, 2017.
- Triastcyn and Faltings [2020] A. Triastcyn and B. Faltings. Bayesian differential privacy for machine learning. In International Conference on Machine Learning, pages 9583–9592. PMLR, 2020.
- Tropp [2016] J. A. Tropp. The expected norm of a sum of independent random matrices: An elementary approach. In High Dimensional Probability VII: The Cargese Volume, pages 173–202. Springer, 2016.
- Vadhan [2017] S. Vadhan. The complexity of differential privacy. Tutorials on the Foundations of Cryptography: Dedicated to Oded Goldreich, pages 347–450, 2017.
- Vu and Slavkovic [2009] D. Vu and A. Slavkovic. Differential privacy for clinical trial data: Preliminary evaluations. In 2009 IEEE International Conference on Data Mining Workshops, pages 138–143, 2009. doi: 10.1109/ICDMW.2009.52.
- Wang and Xu [2019] D. Wang and J. Xu. On sparse linear regression in the local differential privacy model. In International Conference on Machine Learning, pages 6628–6637. PMLR, 2019.
- Warner [1965] S. L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63–69, 1965.
- Wasserman and Zhou [2010] L. Wasserman and S. Zhou. A statistical framework for differential privacy. Journal of the American Statistical Association, 105(489):375–389, 2010.
- Xiao and Devadas [2023] H. Xiao and S. Devadas. Pac privacy: Automatic privacy measurement and control of data processing. In Annual International Cryptology Conference, pages 611–644. Springer, 2023.
- Xiong et al. [2020] X. Xiong, S. Liu, D. Li, Z. Cai, and X. Niu. A comprehensive survey on local differential privacy. Security and Communication Networks, 2020:1–29, 2020.
- Yang et al. [2023] M. Yang, T. Guo, T. Zhu, I. Tjuawinata, J. Zhao, and K.-Y. Lam. Local differential privacy and its applications: A comprehensive survey. Computer Standards & Interfaces, page 103827, 2023.
- Zheng et al. [2017] K. Zheng, W. Mou, and L. Wang. Collect at once, use effectively: Making non-interactive locally private learning possible. In International Conference on Machine Learning, pages 4130–4139. PMLR, 2017.
Appendix A Proofs for Section 2 and Section 3
A.1 Proof of Proposition 1
See 1
Proof.
(A) Forward direction:
For any and , we have
(23) | ||||
(24) | ||||
(25) | ||||
(26) |
Similarly integrate again to get the desired result.
(B) Converse:
Fix and let
(27) | ||||
(28) |
where the essential supremum and infimum are with respect to . Fix any and define
(29) |
By definition of essential supremum and infimum and using the fact that is a Markov kernel, .
Using the -BDP condition, we can obtain
(30) | ||||
(31) | ||||
(32) |
Since is arbitrary, we get .
Now, for , define
(33) |
and note that by definition of essential supremum and infimum and using the fact that is a Markov kernel.
We shall now modify on a judiciously chosen -null set to obtain an -LDP mechanism . Since is second countable as it is a Polish space, there exists a countable collection of open sets such that every open set can be written as a disjoint union of some countable subset of .
Define and note that by countable sub-additivity. Fix any and define the modification of via
(34) |
Equivalently, is represented by the Markov kernel defined by
(35) |
Since , it is easy to see that . By construction, since ,
(36) |
By second countability, and -additivity, we conclude,
(37) |
Every second countable space is Radon, so by outer regularity of both measures and , we finally obtain
(38) |
Hence, is -LDP as desired. ∎
A.2 Proof of Theorem 1
See 1
Proof.
By definition and
(39) |
Setting , we get . One can also switch above and set to get .
The converse is straightforward as well.
(40) | ||||
(41) |
∎
A.3 Proof of Proposition 3
See 3
Proof.
By Proposition 1, an -DP algorithm is -BDP as well. Recall that . Thus, the -BDP guarantee ensures that,
Restricting to the sets of form and for above yields the desired BCDP result.
∎
A.4 Proof of Proposition 4
See 4
Proof.
Consider the sets such that ; then, for , we have . Thus, to bound the in the -BCDP guarantee, we can restrict to in the set .
For , ,
(42) |
For the set under consideration, define
(43) | ||||
(44) |
By the CDP property, we have . Now note that
(45) | ||||
(46) | ||||
(47) |
Similarly, we have
(48) |
∎
The -LDP property follows by straightforward composition.
Finally, notice that, in the case where we know the mechanism is -LDP, we can bound by . In addition, by Proposition 3, we know . Thus, we get
A.5 Additional Results for Section 3.4
Proposition 6.
Let
(53) |
Then, the condition implies for all .
Proof.
Without loss of generality, we establish the proof for . For a given value of , we have the constraint that
(54) |
Note that is concave and and . Thus, . Therefore, we can ensure (54) by imposing
(55) |
∎
A.6 Proof of Proposition 5
See 5
Proof.
Let the -field on be denoted by . For , we have , since is a measurable function. Thus, we have the following for any and , by the BCDP guarantees on :
(56) | ||||
(57) | ||||
(58) |
∎
Appendix B Private Mean Estimation
B.1 Numerical Experiment
Here, we provide a simple numerical experiment to validate the performance of our proposed algorithm. For baseline comparison, we consider the privacy mechanism from Duchi et al. [2013b] with privacy parameter . For our proposed privacy mechanism , we have a degree of freedom in choosing . We note that, as , we would want , since in this case the best algorithm is to run -LDP. As , we would want , as CDP is ideal in this case. Thus, we settle on a heuristic choice .
We first construct a distribution for which Assumption 1 holds. We consider the distribution where
(59) | ||||
(60) |
It is easy to verify that in this setting. We fix , , and . This coordinate-wise privacy demand captures the impact of requiring more privacy protection for the first two coordinates on the algorithm's error.
We vary the correlation and plot the median MSE values, along with the -th and -th quantiles, for our proposed algorithm BCDP and the baseline algorithm in Figure 3. We set , and sample user data i.i.d. Bern. The experiment is run over 1000 trials.
When , there is a high degree of correlation between the coordinates and our algorithm resembles -LDP. For other values of , the proposed BCDP algorithm can leverage the heterogeneous nature of the privacy demand and obtain better performance.
B.2 Proofs
See 2
Proof.
Our proof has two main parts, presented below: the proof of the privacy guarantee and the proof of the error bound.
Proof of privacy: Note that we use the channel a total of times with varying size inputs in Mechanism 1. Focusing on any particular data point , let
(61) |
First, notice that, by composition, is -LDP. Given that , this immediately implies that our algorithm is -LDP. To show the BCDP bound, given Proposition 5, it suffices to show is -BCDP. First, notice that Proposition 3, along with the choice of , implies that this algorithm is -BCDP with respect to coordinate . In fact, for any for which , a similar argument implies that is -BCDP with respect to coordinate . Hence, we can focus only on the remaining coordinates, for which .
Next, we claim that is -CDP. To see this, notice that, by perturbing the -th coordinate of , the terms affected in are for ; in Figure 2, this corresponds to the bottom rows of the (s), i.e., . Thus, it suffices to show
(62) |
is -CDP with respect to coordinate . However, this immediately follows as is -LDP, implying it is also -CDP with respect to the -th coordinate.
As a result, by the second part of Proposition 4, and given that is -LDP and Assumption 1 holds, we obtain that is -BCDP with respect to coordinate . It remains to show that for any . This also follows from the choice of for and , which are the conditions we are operating under. It is worth noting that the condition
(63) |
guarantees that the chosen ’s are nonnegative, given .
Proof of the mean-squared error bound:
Duchi et al. [2013a] show that the channel ’s output is unbiased and it holds that for any , we have
(64) |
Note that, due to the symmetry of Algorithm 2, the error is identically distributed across each coordinate in expectation. Hence, for any vector and any , the random variable is an unbiased estimator of with its variance bounded by .
Note that the errors from different runs of the algorithm are independent. Therefore, using the Cauchy–Schwarz inequality, we can see that the minimum variance in estimating through a linear combination of is achieved by choosing weights proportional to the inverses of the variances, which results in the following estimator
(65) |
with the error rate given by
(66) |
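For completeness, the standard inverse-variance weighting fact invoked here, stated in generic notation of our own: if $X_1, \dots, X_m$ are independent unbiased estimators of $\mu$ with variances $\sigma_1^2, \dots, \sigma_m^2$, then the minimum-variance unbiased linear combination and its variance are
\[
\hat\mu \;=\; \frac{\sum_{\ell=1}^{m} X_\ell/\sigma_\ell^2}{\sum_{\ell=1}^{m} 1/\sigma_\ell^2},
\qquad
\operatorname{Var}(\hat\mu) \;=\; \Big(\sum_{\ell=1}^{m} 1/\sigma_\ell^2\Big)^{-1}.
\]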
Also, note that the algorithm's runs over different users are independent. Therefore, taking the average of over , i.e., over different users' data, gives us an estimator of
(67) |
with the error rate
(68) |
Summing this over the coordinates gives us the following upper bound:
(69) |
∎
See 1
Proof.
For convenience, we set . We do not focus on refining the constants in the Mean-Squared Error (MSE) bound with a potentially better choice of . Now consider two cases for the correlation bound
-
1.
: in this case, we have and . Using the range of , we get . Thus, replacing the to values by the worst-case of we get an error of
(70) Using , we get
(71) -
2.
: in this case we have and . Thus, we get an error of
(72)
∎
Appendix C Least-Squares Regression
C.1 Proof of Theorem 3
See 3
Proof.
Proof of privacy: The proof of the privacy guarantee follows a similar structure to the proof of Theorem 2. Note that while composition does not hold for BCDP, it does hold for LDP and CDP. This is why we divide by two, rather than dividing by two. Additionally, since the OPT algorithm only observes the private copies of the data, by the post processing inequality, it does not matter how the algorithm finds the minimizer (e.g., how many passes it makes over each private data point), as the privacy guarantee remains unchanged.
Proof of error rate: Without loss of generality, we can ignore the constant term in that does not depend on , and hence, with a slight abuse of notation, we can rewrite as
(73) |
Also, recall that is given by
(74) |
Denote the minimizer of over by . Also, we denote the output of Algorithm 3 by . Finally, recall that denotes the minimizer of function .
For convenience and brevity, we drop the conditioning on from the notation, but all expectations henceforth are conditioned on .
Next, note that
-
1.
By the definition of OPT, we have
(75) -
2.
By the definition of , we have
(76) -
3.
Given the construction of , it is an unbiased estimator of for any fixed . Hence, we have
(77)
Summing all three gives us
(78) |
However, we need to bound
(79) |
Therefore, it suffices to bound
(80) |
Observe that this term is upper bounded by
(81) |
Note that (81) is upper bounded by
(82) |
We bound the first term above. The second term can be bounded similarly. To do so, first note that we can decompose as
(83) |
Hence, it suffices to bound the expected norm of the average of each term separately, and all of them can be handled similarly. We will focus on the last term in particular, i.e., we will bound
(84) |
with . Next, we make the following claim regarding matrices , proved in Section C.2.
Lemma 1.
Let
(85) |
Then, the following two results hold:
(86) | ||||
(87) |
C.2 Proof of Lemma 1
First, note that
(90) |
which yields
(91) |
Using the result that for any vector completes the proof of the first part of the lemma. To prove the second part, note that
(92) |
Using the result that for any two vectors and implies
(93) |
Finally, the conditional independence of and given completes the proof of the lemma. ∎