Towards Separating Computational and Statistical Differential Privacy
Abstract
Computational differential privacy (CDP) is a natural relaxation of the standard notion of (statistical) differential privacy (SDP) proposed by Beimel, Nissim, and Omri (CRYPTO 2008) and Mironov, Pandey, Reingold, and Vadhan (CRYPTO 2009). In contrast to SDP, CDP only requires privacy guarantees to hold against computationally-bounded adversaries rather than computationally-unbounded statistical adversaries. Despite the question being raised explicitly in several works (e.g., Bun, Chen, and Vadhan, TCC 2016), it has remained tantalizingly open whether there is any task achievable with the CDP notion but not the SDP notion. Even a candidate such task is unknown. Indeed, it is even unclear what the truth could be!
In this work, we give the first construction of a task achievable with the CDP notion but not the SDP notion, under the following strong but plausible cryptographic assumptions:
-
Non-Interactive Witness Indistinguishable Proofs,
-
Laconic Collision-Resistant Keyless Hash Functions,
-
Differing-Inputs Obfuscation for Public-Coin Samplers.
In particular, we construct a task for which there exists an -CDP mechanism with achieving utility, but any -SDP mechanism, including computationally-unbounded ones, that achieves a constant utility must use either a super-constant or an inverse-polynomially large .
To prove this, we introduce a new approach for showing that a mechanism satisfies CDP: first we show that a mechanism is “private” against a certain class of decision tree adversaries, and then we use cryptographic constructions to “lift” this into privacy against computationally bounded adversaries. We believe this approach could be useful to devise further tasks separating CDP from SDP.
Index Terms:
differential privacy, computational differential privacy, indistinguishability obfuscationI Introduction
The framework of differential privacy (DP) [DMNS06, DKM+06] gives formal privacy guarantees on the outputs of randomized algorithms. It has been the subject of a significant body of research, leading to numerous practical deployments including the US census [Abo18], and industrial applications [EPK14, Sha14, Gre16, App17, DKY17, KT18, RSP+21].
The definition of DP requires privacy against computationally unbounded, i.e., statistical, adversaries. A natural modification is to instead only require privacy against computationally bounded adversaries. In cryptography, considering computationally bounded adversaries instead of statistical ones enables a vast array of applications, like public-key cryptography. Could the same be true for DP? Despite Beimel, Nissim, and Omri [BNO08] defining computational differential privacy (CDP) in 2008 (definitions that were further extended by Mironov, Pandey, Reingold, and Vadhan [MPRV09]), the central question of separating it from statistical differential privacy (SDP)111See Section III for the formal definitions of CDP and SDP. A good survey of the area can be found in [Vad17, Section 10]., in the standard client-server model, remains open:
Question 1.
[Vad17, Open Problem 10.6] Is there a computational task solvable by a single curator with computational differential privacy but is impossible to achieve with information-theoretic differential privacy?
There have been several positive and negative results towards resolving this question. In the positive direction, it is known that in the multi-party setting, CDP is stronger than SDP [MMP+10, MPRV09]. Roughly speaking, this is because secure multi-party computation enables many data curators to simulate acting as a single central curator, without compromising privacy. Still, the multi-party setting seems very different than the single-curator (aka central) setting. Indeed, [MMP+10] remark222This remark is also quoted by Groce, Katz, and Yerukhimovich [GKY11]. that their “strong separation between (information-theoretic) differential privacy and computational differential privacy … stands in sharp contrast with the client-server setting where … there are not even candidates for a separation.”
In the central setting, Bun, Chen, and Vadhan [BCV16] show there is a task for which there is a CDP mechanism, but any SDP mechanism for this task must be inefficient (modulo certain cryptographic assumptions). We stress that the task they consider does have an inefficient SDP mechanism (with parameters that match their CDP mechanism), so it does not resolve 1. While this may seem like a minor technical point, we emphasize that it is of crucial importance. Perhaps the main practical motivation behind studying CDP is the hope that there are CDP mechanisms for natural tasks with parameters that beat the lower bounds against SDP mechanisms. But if, as in the case of the result in [BCV16], there exists (even an inefficient) SDP mechanism matching the parameters of the CDP mechanism, then there is no hope of the CDP mechanism’s parameters beating SDP lower bounds.
In the negative direction, Mironov, Pandey, Reingold, and Vadhan [MPRV09] (building on Green and Tao [GT08], Tao and Ziegler [TZ08], and Reingold, Trevisan, Tulsiani, and Vadhan [RTTV08]) show a “dense model theorem” for pairs of random variables with “pseudodensity” with each other. Mironov et al. [MPRV09] note that (roughly speaking) extending this dense model theorem to handle multiple pairs of random variables would prove that any CDP mechanism could be converted into an SDP mechanism; such an extension is still open [Vad17, Open Problem 10.8].
Groce, Katz, and Yerukhimovich [GKY11] show that CDP mechanisms for certain tasks where the output is low-dimensional imply SDP mechanisms. Many natural statistical tasks fall into this category, and consequently, such tasks cannot separate CDP from SDP. (This result was further strengthened by [BCV16].) Furthermore, [GKY11] show that CDP mechanisms constructed in a black-box way from a variety of cryptographic objects, such as one-way functions, random oracles, trapdoor permutations, and cryptographic hash functions, cannot separate CDP from SDP.
In summary, there are at least two barriers to separate CDP from SDP:
-
1.
High-dimensionality: One needs to consider (perhaps non-natural) tasks with high dimensional outputs;
-
2.
Exotic cryptography: One needs to use cryptography somewhat specially (perhaps either an exotic primitive or in a non-black-box manner).
In light of these both positive and negative results as well as the lack of a candidate separation, it is not even clear what the truth could be: is there any task for which there is a CDP mechanism but no SDP mechanism?
Our Contributions. We show, under plausible cryptographic hypotheses, that there are indeed tasks for which there exist CDP mechanisms but no SDP mechanisms. This not only positively answers 1 but also negatively answers the dense model extension question [Vad17, Open Problem 10.8]. We state this result now informally and formalize it later in Section II. We also delay discussing our precise cryptographic assumptions to Section II-E, where we discuss their plausibility in detail.
Theorem 2.
[Informal version of Theorem 5] Under cryptographic assumptions, there exists a task for which there is a mechanism but no mechanism.
Let us take a step back to discuss the implications of Theorem 2. Although (as we will see in a moment) our task is specifically constructed for the purpose of separating CDP and SDP, the fact that we can separate them at all opens up a possibility that such a separation even holds for some “natural” tasks. Indeed, some of the current lower bound techniques for SDP—such as the ubiquitous “packing lower bounds”333Specifically, when the packing lower bound requires the use of super-polynomially many datasets, the corresponding adversary does not necessarily run in polynomial time. (see [HT10])—do not necessarily rule out CDP mechanisms. It seems prudent to carefully reexamine the current lower bound techniques to see whether they also apply to CDP. The ultimate hope for this program would be to employ CDP to overcome the known SDP lower bounds for some more “natural” tasks. (Of course, such tasks would also give a more “natural” separation of CDP and SDP.)
In fact, the technical approach we use in our construction already suggests a general approach for constructing non-trivial CDP mechanisms that could apply to more tasks. We discuss this in more detail in Section II, but the idea is as follows. In order to show a task has a CDP mechanism, first show there is a mechanism for that task that is “private” against a certain class of decision tree adversaries. Then, second, use cryptographic assumptions to “lift” this into privacy against computational adversaries.
Organization. The rest of the paper is organized as follows. Section II provides a high-level overview of our techniques as well as a discussion of our cryptographic assumptions and their plausibility. Section III contains the background material and Section IV formally defines the problems. We provide our CDP mechanism in Section V, and prove lower bounds against SDP mechanisms in Section VI. These two components are put together to prove the main result in Section VII. Finally, we discuss the open problems and future directions in Section VIII.
II Overview of the Results
We will next discuss the high-level overview of our results and techniques. We will sometimes have to be informal here, but all details are formalized later. We first recall how a “task” is defined.444Refer to Section III-B for a more formal definition. Following [GKY11, BCV16], a task is defined by an efficiently computable utility function that takes in an input dataset and a response such that if is considered “useful” for and otherwise. A mechanism is said to be -useful for iff for all input datasets ; we will refer to as the usefulness of . We remark that many well-studied problems—such as linear queries with various error metrics—can be written in this form.
One of our main conceptual contributions is to define a class of tasks that seems to naturally circumvent the two earlier-mentioned barriers—tasks where one needs to output a circuit.
II-A The Low Diameter Set Problem
Before we detail why tasks that output a circuit might evade the two barriers, let us describe a concrete example. We call the following the low diameter set problem (defined for some parameter ):
-
Given: dataset represented as bits (adjacent datasets differ on a single bit)555Refer to Section IV for formal details.
-
Output: circuit mapping bits to bit
-
Utility: is considered useful if it outputs
-
on , and
-
on all points at distance greater than from .
-
Informally, this problem asks to output a circuit such that contains and has diameter at most .
While this utility function is not efficiently computable, we will address this in Section II-C2.
Looking ahead, we will ultimately separate CDP from SDP under cryptographic assumptions by considering a “verifiable” version of this problem where we only care about datasets in a cryptographically special set.
We now revisit the two barriers and discuss how the distance problem might circumvent them.
-
1.
High-dimensionality: The output of this task is a circuit, which is high-dimensional.
-
2.
Exotic cryptography: Because the output of the task is a circuit, it lends itself to a powerful class of cryptographic objects: circuit obfuscators [BGI+12]. Roughly speaking, circuit obfuscators take as input a circuit and output a scrambled, obfuscated circuit that computes the same function as but which, ideally, has the property that “anything you could do with access to the circuit , you could do with only black-box access to the function the circuit computes.” Importantly, obfuscation is not in the list of primitives ruled out by the barrier in [GKY11].
II-B SDP Lower Bound
Our starting point for separating CDP from SDP is the low diameter set problem described above. Indeed, we show that there is no SDP mechanism for this problem for any that is essentially sub-linear in .
Lemma 3.
For all and constant and (for some ), there is no -SDP mechanism for that is -useful.
In fact, this lower bound is straightforward (Lemma 15) from the well-known blatant non-privacy notion (see, e.g., [De12]): no DP algorithm can output a dataset that is (with large probability) close to the input dataset. Crucially, our lower bounds are non-constructive, and do not yield an efficient adversary (which would imply a similar lower bound against CDP mechanisms). Thus, to separate CDP from SDP it suffices to come up with a CDP mechanism for, say, .
II-C A CDP Mechanism
By Lemma 3, a positive answer to the following question would demonstrate a separation between CDP and SDP.
Question 4.
For constant , does there exist an - mechanism for with constant usefulness?
A key step in our approach is to reduce the above question to whether there is a mechanism for that is differentially private against query (a.k.a. decision-tree) adversaries. In order to construct such a CDP mechanism , our main idea is to use obfuscation. In particular, we will consider mechanisms where the returned circuit is obfuscated. Recall that in order to prove a mechanism that outputs a circuit is CDP, one needs to argue that no efficient adversary that gets as input can break the privacy guarantee. By considering mechanisms that return obfuscated circuits, we can drastically simplify the type of adversaries we need to prove privacy against. Instead of proving privacy against adversaries that see the circuit (i.e., white-box setting), sufficiently strong obfuscation means we only need to prove privacy against decision tree adversaries that can query the function computed by the circuit (i.e., black-box setting). In other words, if we have a mechanism that satisfies DP against black-box adversaries (decision trees) with a polynomial number of queries, we can then hope to use sufficiently strong obfuscation to “lift” this into a mechanism that is secure against (white-box) computational adversaries with polynomial running time.
Of course, one needs to be careful about whether such “sufficiently strong obfuscation” is even possible, but, putting that aside for the moment, the question of whether there is a CDP mechanism for (4 above) appears to reduce to whether there is a mechanism for that is DP against query (a.k.a. decision-tree) adversaries.
While we do not resolve 4, we (roughly speaking) show that there is a mechanism that is DP against non-adaptive decision tree adversaries, whose queries are fixed a priori. It turns out a relatively simple mechanism based on randomized response [War65] works for these less powerful adversaries.
II-C1 From Non-Adaptive Lower Bound to Computational Lower Bound
This switch from the usual adaptive query adversaries to non-adaptive query adversaries comes at a price however. It is not clear how to use obfuscation to lift a mechanism that is private against non-adaptive queries into one that is private against computational adversaries. Indeed, a polynomial-time algorithm with even black-box access to a function seems to be an inherently adaptive adversary!
Surprisingly, we get around this by using another cryptographic object introduced by Bitansky, Kalai, and Paneth [BKP18]: collision-resistant keyless hash functions. Informally speaking, a hash function being collision-resistant and keyless means that “any efficient adversary can only generate a number of hash collisions that is at most polynomially larger than the advice the adversary gets.”
We then modify the problem to only consider datasets that belong to a specific set ; in particular, we will specify it the as set of all strings that hash to, say the all zeroes string. Formally, is the following problem (defined for parameters and ).
-
Given: dataset that consists of bits
-
Output: circuit mapping bits to bit
-
Utility: is considered useful if or both of the following hold:
-
it outputs 1 on
-
it outputs 0 on all points in at distance greater than from
-
In other words, the utility function now completely ignores all points outside of .
The high-level intuition behind this change is the following:
-
1.
Our CDP mechanism can output a circuit such that the only inputs where reveals information are those in the set .
-
2.
Any polynomial-time adversary can only generate fixed polynomial number of elements of by the collision-resistance property of the hash function.
-
3.
Combining the above makes the inputs can query on, effectively “non-adaptive”.
Finally, in order to “lift” the query separation into the computational realm we use another cryptographic tool: differing-inputs obfuscation () [BGI+01, BGI+12, ABG+13]. Roughly speaking, is an obfuscator with the following guarantee: if any efficient adversary can distinguish the obfuscation of two circuits and , then an efficient adversary can find an input on which . In particular, the specific assumption we use is even weaker than public-coin [IPS15], which is already considered to more plausible than general .666See Assumption 22 for formal statement of the assumption and Appendix A for comparison with other assumptions in literature.
In summary, allows us to reduce computational adversaries to adaptive query adversaries and collision-resistant keyless hash functions allows us to reduce adaptive query adversaries to non-adaptive query adversaries. Interestingly, to the best of our knowledge, this is the first time collision-resistant keyless hash functions are being used together with any obfuscation assumption.
II-C2 Making the Utility Function Efficiently Computable
We need to address one final issue: utility functions that we have considered so far are not necessarily efficiently computable. Specifically, a trivial way to implement the utility function would be to enumerate all points at distance at least , feed it into the circuit, and check that the output is as expected; this would take time.
To overcome the above problem, we restrict circuits to only those that are relatively simple, so that there is a small “witness” that certifies that the circuit outputs zero at all points that are -far from . A naive idea is then to let the CDP mechanism output the circuit together with such a witness . The utility function can then just efficiently check that is a valid witness (and that or ). This makes the utility function efficient but unfortunately compromises privacy because the witness itself can leak additional information. To avoid this, we instead use non-interactive witness indistinguishable (NIWI) proofs (e.g., [BOV07]). Roughly speaking, this allows us to produce a proof from (and and ), which does not leak any information about (against computationally bounded adversaries), but at the same time still allows us to verify that the underlying witness is valid. The former is sufficient for CDP, while the latter ensures that the utility function can be computed efficiently.
This completes the high-level overview of the constructed task and our mechanism. The cryptographic primitives needed for our mechanism are formalized in Assumptions 18, 22 and 26.
II-D Final Steps
Finally, since our problem is now not exactly the original problem anymore, as the utility guarantees are only now meaningful for datasets in , we cannot use the lower bound in Lemma 3 for directly. Fortunately, we can still adapt its proof—a “packing-style” lower bound on each coordinate—to one which applies a packing-style argument on each block of coordinates instead. With this, we can prove the lower bound for as long as the set has sufficiently large density ().
Putting all the ingredients together, we arrive at the following777We remark that - mechanism here refers to an ensemble of mechanisms that are -. (See Definition 7.):
Theorem 5 (Main Result).
Under Assumptions 18, 22 and 26, for any constant , there exists an ensemble of polynomial time computable utility functions such that
-
There is an - mechanism that is -useful for .
-
For any constants and , no - mechanism is -useful for .
The task underlying the separation is an instantiation of the “verifiable low diameter set problem” defined in Definition 14.
II-E On the Plausiblility of the Cryptographic Assumptions
We now discuss the plausiblility of the three cryptographic assumptions we use for our result:
-
(i)
NIWI: Non-interactive Witness Indistinguishable Proofs (formally, Assumption 26)
-
(ii)
CRKHF: Laconic Collision-Resistant Keyless Hash Functions (formally, Assumption 18)
-
(iii)
: Differing-Inputs Obfuscation for Public-Coin Samplers (formally, Assumption 22)
Regarding (i), NIWI. Bitansky and Paneth [BP15a] show that NIWIs exist assuming one-way permutations and indistinguishability obfuscation (iO) exists. Recently, Jain, Lin, and Sahai [JLS21] show that the existence of follows from well-founded assumptions; consequently, NIWIs exist based on widely-believed assumptions. (We note that other previous works have also constructed NIWIs based on other more specific assumptions [BOV07, GOS12].)
Regarding (ii), CRKHF. Bitansky, Kalai, and Paneth [BKP18] defined CRKHFs to model the properties of existing hash functions like SHA-2 used in practice. They suggest several candidates for CRKHFs, such as hash functions based on AES and Goldreich’s one-way functions. They also note that CRKHFs exist in the Random Oracle model, as a random function is a CRKHF. Still, it is an open question to base the security of a CRKHF on a standard cryptographic assumption. Part of the difficulty of doing this, as [BKP18] describe, is that most cryptographic assumptions involve some sort of structure that is useful for constructing cryptographic objects. In contrast, the goal of a CRKHF is to have no structure at all. In summary, given the various CRKHF candidates, the existence in the Random Oracle model, and the fact that CRKHFs exist “in practice,” this assumption is quite plausible. For our specific construction, we need a different hash length (equivalently, different compression rate) than that used in [BKP18]; please refer to the discussion preceding Assumption 18 for the parameters and justification.
Finally, we remark that, even though the existence of CRKHFs is not known to reduce to any “well-founded” assumption, even refuting their existence would answer a longstanding question in cryptography: giving non-contrived separations between the Random Oracle model [BR93] and the standard model. In the words of Bitansky, Kalai, and Paneth [BKP18],
“Any attack on the multi-collision resistance of a [keyless] cryptographic hash function would constitute a strong and natural separation between the hash and random oracles. For several cryptographic hash functions used in practice, the only known separations from random oracles are highly contrived [CGH04].”
Regarding (iii), . One can think of [BGI+01, BGI+12] as an “extractable” strengthening of . While has now become a widely-believed assumption [JLS21], the existence of is controversial. Several papers (e.g., [BP15b, GGHW17, BSW16]) cast doubt on the existence of , especially in the case where an arbitrary auxillary input is allowed; we stress that all the negative results for hold for contrived auxillary inputs and/or distributions. On the positive side, [BCP14] show that reduces to in special cases, such as when the number of differing-inputs is bounded by a polynomial. More related to our result, [IPS15] gives a definition of public-coin that avoids the difficulties presented by earlier negative results regarding auxiliary inputs, although [BP15b] presented some evidence against this definition in special cases. Our specific assumption of is in fact weaker than the assumption of public-coin . In the definition of public-coin , as in [IPS15], we start with any public-coin sampler (), for which it is hard to find an input on which two circuits differ, even given the knowledge of all the randomness that underlies the circuits. The security of the obfuscation is required to hold even against adversaries that know all the randomness that underlies the generation of the two circuits. However, in our definition, the security of the obfuscation is required to hold only against adversaries that observes a single obfuscated circuit, which makes the assumption weaker. See Appendix A for a more detailed discussion on comparison of this assumption with other assumptions in literature. Finally, we only use the existence of for a simple circuit family for our result, so even if general purpose does not exist, we think it is plausible that exists for the specific family of circuits we need for our result. (See Assumption 22 for the exact family for which we require a .)
Final thoughts on our assumptions.
In conclusion, we view each of our three assumptions as plausible. Moreover, each of assumptions has at least some evidence that is hard to refute: NIWIs exist based on a widely-believed assumption, refuting CRKHFs would require giving the first non-contrived separation between the standard and the Random Oracle model, and despite many attempts (e.g., [BP15b, GGHW17, BSW16]) to refute , the question is still open, especially for the particular version. Thus, refuting any of the assumptions would constitute a breakthrough in cryptography.
III Preliminaries
A function is said to be negligible if . Let PPT be an abbreviation for probabilistic polynomial-time Turing machine.
For and , we use to denote the (Hamming) ball of radius around , i.e., . Furthermore, we use for a set to denote the (Hamming) diameter of , i.e., .
III-A Dataset and Adjacency
For a domain , we view a dataset as a histogram over the domain , i.e., where denotes the number of times appears in the dataset. The size of the dataset is defined as . We write as a shorthand for the set of all datasets of size , and for the set of all datasets over domain . Two datasets are adjacent iff , i.e., one of the datasets is a result of adding or removing a single row from the other dataset.
III-B Mechanism, Utility Function, and Usefulness
A mechanism is a randomized algorithm that takes in a dataset and outputs an element from a set . The utility of a mechanism is measured by a utility function , which is a polynomial-time deterministic algorithm that takes in a dataset together with a response and outputs 0 or 1 (whether the response is good for the dataset). We say that the mechanism is -useful for utility iff .
Below, we will often discuss an ensemble of mechanisms where888It is always implicitly assumed that are of size . . We say that an ensemble of mechanisms is efficient if on input runs in time . For an ensemble of utility functions and , we say that is -useful with respect to iff is -useful with respect to for all .
For brevity, we will sometimes refer to “ensemble of mechanisms” simply as “mechanism” and “ensemble of utility functions” simply as “utility function” when there is no ambiguity.
III-C Notions of Differential Privacy
We now define the notions of DP that will be used throughout the paper.
(Statistical) Differential Privacy. The standard (statistical) notion of DP can be defined in terms of the following notion of indistinguishability.
Definition 6 (Statistical Indistinguishability).
Distributions , are said to be -indistinguishable, denoted , if for all events (measurable sets) , it holds for and that
For simplicity, we use to denote .
Definition 7 (Statistical Differential Privacy (SDP) [DMNS06, DKM+06]).
For , a mechanism is said to be - if and only if for every pair of adjacent datasets, we have that . We say that an ensemble is - for sequences and if is - for all .
Computational Differential Privacy. The notion of computational DP relaxes the notion of indistinguishability to a computational version, where the privacy holds only with respect to computationally bounded adversaries.
Definition 8 (Computational Indistinguishability).
Two ensembles of distributions and , where and are supported over for some polynomial , are said to be -computationally-indistinguishable for a sequence , denoted , if there exists a negligible function such that for any PPT adversary , it holds for and that
In the special case of , we suppress the subscript and simply write .
Throughout, when we refer to a sequence of adjacent datasets, it is always assumed that are of sizes .
Definition 9 (Computational Differential Privacy () [MPRV09]).
An ensemble of mechanisms is said to be - for a sequence , if for any sequence of adjacent datasets, it holds that .
This definition is often referred to as indistinguishability-based CDP (-) in previous works [MPRV09, GKY11, BCV16]. Since we only use this notion for our main result, we refer to it simply as CDP. The other definition of CDP used in previous works is simulation-based:
Definition 10 (- [MPRV09]).
An ensemble of mechanisms is said to be -- if there exists an - ensemble of mechanisms such that for any sequence of datasets, with size of being at most , it holds that .
It should be noted that - cannot be used for the separation we are looking for. Specifically, if is --, we may use as our - mechanism. Since the utility function runs in polynomial time, it follows immediately that, if is -useful, then is also -useful.
Due to this, we will not consider - again in this paper.
Another point to note is that unlike prior work (e.g. [BCV16]) we use both parameters for , but only parameter for , since is always assumed to be negligible for . Our lower bounds for in fact work for that is not negligible, which only makes the result stronger.
Calculus of and . The following properties are well-known.
Fact 11.
The notions of -indistinguishability and -computational-indistinguishability satisfy:
-
Basic Composition: If and , then . Similarly, if and , then .
-
Post-processing: If , then for all (randomized) functions , it holds that . Similarly, if , then for all PPT algorithms , it holds that .
IV Low Diameter Set Problem and Nearby Point Problem
In this section, we introduce the problems that we will use in our separation. Before that, we will describe a simplifying assumption that we can make about the inputs.
IV-A Simplification of Input Representation
Recall that so far a dataset may contain multiple copies of an element. Below, however, it will be more convenient to only discuss the case where each element appears only once, i.e., .
This is sufficient since if we have a utility function defined only on , we can easily define the utility function by
In other words, the utility function considers any response good for datasets with repetition. Clearly, if is efficiently computable, then so is . Furthermore, suppose that we have an - mechanism for . For every dataset , let be defined by . Then, we may define by . It is easy to see that remains -CDP. Furthermore, if is -useful for , then remains -useful for .
Finally, note that a lower bound for DP algorithms restricted to non-repeated datasets trivially implies a lower bound against all datasets.
Due to this, we will henceforth focus our attention only on the datasets . Furthermore, throughout the remainder of this paper, we will always pick . This further simplifies the input representation to be just a bit vector . We will define an input of our problem in this way. Furthermore, we will henceforth use instead of to denote the input dataset.
IV-B Nearby Point Problem
We will start by defining our first problem, which asks to output a point that is close to the input point if the latter belongs to some set . As we noted in the introduction, when is the set of all points (i.e., ), this is exactly the same as the problem considered in blatant non-privacy [DN03, DMT07]. As we will see later, the presence of the set is due to our use of hashing, which is required in our proof for the mechanism.
Definition 12 (-Nearby -Point Problem).
The nearby point problem parameterized by sequences and is denoted by . For input and output , the utility is defined as:
For brevity, we will assume throughout that is efficiently recognizable and henceforth we do not state this explicitly. Note that this assumption implies that the utility function defined above is efficiently computable. The nearby point problem will be primarily used for proving the lower bounds against .
IV-C Verifiable Low Diameter Set Problem
Next, we define circuit-based tasks for which we will give mechanisms. To do so, we need to first define a “-diameter verifier”.
Definition 13 (-Diameter Verifier).
For a sequence of integers, we say that an efficiently computable (deterministic) verifier is a -diameter verifier for circuits of size if it takes as input a circuit of (polynomial) size and a proof of size , and outputs only if .
We can now define the (verifiable) low diameter set problem as follows:
Definition 14 (Verifiable -Diameter -Set Problem).
The verifiable low diameter set problem parameterized by sequences , , and -diameter verifier is denoted by . The input, output, and utility are defined as follows:
-
Input: .
-
Output: circuit and a proof , both of size .
-
Utility: and .
For convenience, we also define the following utility function
Note that this does not correspond to a hard task, because a circuit that always outputs one is 1-useful. Nonetheless, it will be convenient to state usefulness of some intermediate algorithms via this utility function.
IV-D From Low Diameter Set Problem to Nearby Point Problem
Below we provide a simple observation that reduces the task of proving an lower bound for the verifiable low diameter set problem to that of the nearby point problem. (Note here that the mechanisms considered below can be computationally inefficient.)
Lemma 15.
If there is an - -useful mechanism for the problem, then there is an - -useful mechanism for the problem.
Proof.
Let be an -SDP -useful mechanism for the problem. We will construct an -SDP -useful mechanism for the problem.
The mechanism on input dataset works as follows. First, let . If , then output the lexicographically first element of (else, output ). This completes our description of .
Since is -SDP, we have that is also -SDP by post-processing. It remains to show that is -useful. Fix some input . If , then any output satisfies utility. Thus, it suffices to consider the case where . With probability , we have that (which implies that has diameter at most ), and . Consequently, the distance between and the lexicographically first element of is at most . So with probability at least , the output of is useful for , as desired. ∎
V Mechanism for Verifiable Low Diameter Set Problem
In this section we build a CDP mechanism for the verifiable low diameter set problem. We establish the following result:
Theorem 16.
Suppose that Assumptions 18, 22 and 26 hold. Then, for all constant and , there exists a -diameter verifier and a sequence of sets of sizes , such that there exists an - mechanism that is -useful for .
As discussed in the overview, we first build a mechanism that is but without verifiability using collision-resistant keyless hash functions and differing-inputs obfuscators (Section V-A). We then turn it into a verifiable one using non-interactive witness indistinguishable proofs (Section V-B).
V-A Mechanism without Verifiability
In this section, we construct our first mechanism (Algorithm 3). We depart from the overview in Section II slightly and do not prove a non-adaptive query lower bound explicitly. Instead, we directly show in Section V-A2 how to sample the appropriate differing-inputs circuit family. This can be then easily turned into our mechanism via in Section V-A3.
V-A1 Additional Preliminaries: Cryptographic Primitives
Throughout this section, we will repeatedly use the so-called randomized response (RR) mechanism [War65]. Specifically, is an algorithm that takes in and outputs , where with probability independently for each . It is well-known (and very simple to verify) that is -.
Collision-Resistant Keyless Hash Functions. In our construction, we will use the Collision-Resistant Keyless Hash Functions (CRKHFs) [BKP18]. The formal definition is as given below.
Definition 17 (Collision-Resistant Keyless Hash Functions [BKP18]).
A sequence of hash functions is -collision resistant for advice length for sequences , if, for any PPT and a sequence of advices where , it holds for that
We skip the subscript when it is clear from context.
In [BKP18], the hash value length is assumed to be either linear, i.e., , or polynomial, i.e., . However, we need a collision-resistant hash function with a much smaller , namely . We remark that this is still very much plausible: as long as is , the “guess-and-check” algorithm will only produce a collision with only negligible probability. A more precise statement of our assumption is stated below.
Assumption 18.
There is an efficiently computable sequence of hash functions with hash value length such that, for any constant , there exists a constant such that the hash function sequence is -collision resistant for advice length where and .
We remark that, for the existence of mechanism (shown in this section), we will only use the multi-collision-resistance without relying on the assumption on the value of . The latter is only used to show that no mechanism exists for the problem (Section VII).
Differing-Inputs Obfuscators for Public-Coin Samplers.
For any two circuits and , a differing-inputs obfuscator [BGI+12] guarantees that the non-existence of an efficient adversary that can find an input on which and differ implies that and are computationally indistinguishable. For our application, it suffices to assume a weaker notion, namely that of differing-inputs obfuscator for public-coin samplers, as defined below.
Definition 19 (Public-Coin Differing-Inputs Circuit Sampler).
An efficient non-uniform sampling algorithm is a public-coin differing-inputs sampler for the parameterized collection of circuits if the output of is distributed over and for every efficient non-uniform algorithm , there exists a negligible function such that for all :
Here, is a deterministic algorithm and the only source of randomness is the seed .
Definition 20 (Differing-Inputs Obfuscator for Public-Coin Samplers (cf. [IPS15])).
A uniform PPT is a differing-inputs obfuscator for public-coin samplers for the parameterized circuit family if the following conditions are satisfied:
-
Correctness: For all , for all , for all inputs , we have that
-
Polynomial slowdown: There exists a universal polynomial such that for all , it holds that
-
Differing-inputs: For every public-coin differing inputs sampler for , and every (not necessarily uniform) PPT distinguisher , there exists a negligible function such that the following holds for all : For
We note that the notion of is in fact weaker than the notion of general public-coin as given by [IPS15]. We elaborate on this comparison in Appendix A. Whenever is clear from context, we use to denote for simplicity. When we want to be explicit about the randomness (of bit length) used by we will denote it as .
We only need the existence of a differing-inputs obfuscator for a specific family of circuits. This circuit family will be defined later and therefore we defer formalizing our assumption to Section V-A3.
V-A2 Public-Coin Differing-Inputs Circuits from CRKHFs
The first step of our proof is to construct a differing-inputs circuit family based on CRKHFs. Our sampler is described in Algorithm 1.
We next prove that the above sampler is a public-coin differing-inputs sampler, which means that any efficient adversary, even with the knowledge of (which is the only source of randomness), cannot find an input on which and differ. The proof starts by noticing that any input that differentiates must, by definition of the circuits, have hash value . Therefore, if there were an adversary that can find a differing input, then we could run it multiple times to get that have the same hash value. (See Algorithm 2 below.) However, our proof is not finished yet, since it is possible that are not distinct. Indeed, the crux of the construction is that, due to how we select and define the circuits, a fixed will be a differing input with negligible probability999It is also simple to see that this property suffices to prove a non-adaptive query lower bound as discussed in Section II.. It follows that must be distinct w.h.p. This is formalized below.
Lemma 21.
Let be as in Assumption 18. For any constant , choosing and makes (Algorithm 1) a public-coin differing-inputs sampler.
Proof.
Suppose for the sake of contradiction that for some adjacent , there exists a PPT such that
(1) |
for some constant . Furthermore, let be such that the total size of the descriptions of is at most . Finally, let be as in Assumption 18 and .
Consider the adversary for collision-resistant hash function described in Algorithm 2. First, note that by (1) and a standard concentration inequality, the probability that outputs is . Furthermore, notice that can differ on only if , meaning that always. Therefore, it suffices for us to show that the probability that are distinct is . By a union bound, we have that violates the collision-resistance of as desired.
Thus, we are only left to show that are not distinct with probability . To see that this is the case, notice that
(2) |
Let us now bound for a fixed pair . Suppose that we fix a value of and suppose that is assigned at step . Conditioned on these, notice further that
(3) |
Now, let us bound the RHS probability for a fixed . To see this, first observe that must belong to the symmetric difference ; otherwise, we must have , a contradiction to our definition of .
V-A3 From Differing-Inputs Circuits to
We will next construct a mechanism from the previously constructed differing-inputs circuit family. First, let us state the assumption we need here:
Assumption 22.
For as in Assumption 18, any constant and , there exists a differing-inputs obfuscator for the sampler .
Our mechanism can then be defined by simply applying the obfuscator to the circuit generated in the same way as in . This mechanism is described more formally in Algorithm 3. The property of the mechanism follows rather simply from the definition of and the fact that is -.
Theorem 23.
Under Assumptions 18 and 22, (Algorithm 3) is -.
Proof.
For any adjacent datasets , we want to show that . We show this using an intermediate hybrid, as shown in Figure 1, where changes from one hybrid to next are highlighted in red.
-
Distribution is precisely .
-
Distribution is a variant of , where we change to in the definition of , but continue to sample .
-
Distribution is a variant of , where we sample . Note that this is exactly .
We show that by showing that and and using basic composition (11). We have from Lemma 21, that under Assumption 18, the joint distribution of , and circuits in and is precisely the output of . Thus, from Assumption 22, it follows that by post-processing (11). Next, we have that , since the only difference between the two is the distribution of , and (again by post-processing). ∎
Finally, its utility also follows simply from a standard concentration inequality.
Theorem 24.
When choosing , is -useful for .
Proof.
Consider any dataset . If , then, by definition of , the utility is . Therefore, we may only consider the case where .
In this case, is equal to . Notice that is distributed as . Therefore, applying Bernstein’s inequality, we have
where . Thus, we have as desired. ∎
V-B Mechanism for
V-B1 Witness-Indistinguishable Proofs
For any language with associated verifier , let denote the corresponding relation . Let .
Definition 25 (NIWI Proof System).
A pair of PPT algorithms is a non-interactive witness indistinguishable (NIWI) proof system for an relation if it satisfies:
-
Correctness: for every
-
Soundness: there exists a negligible function such that for all and :
-
Witness Indistinguishability: There exists a polynomial and a negligible function , such that for any sequence and for all circuits of size at most :
V-B2 Making Utility Function Efficient Using Witness-Indistinguishable Proofs
We consider the language defined below, and use the corresponding NIWI verifier to define the utility for .
Definition 27.
Language consists of all circuits with a top AND gate, namely of the form such that there exists some , and , such that at least one of or can be obtained as where is a circuit that takes in and computes .
A “witness” for is given by , where indicates whether the witness is provided for or for . Let denote the NIWI proof system for (guaranteed to exist by Assumption 26).
We consider the verifiable low diameter set problem . Note that automatically implies that encodes a -diameter set (since , it suffices to certify that at least one of or encodes a -diameter set) where .
Theorem 28.
Under Assumptions 18, 22 and 26, (Algorithm 5) is -.
Proof.
For any adjacent datasets , , we want to show that . We show this through the means of intermediate hybrids, as shown in Figure 2, where changes from one hybrid to next are highlighted in red.
-
Distribution is precisely .
-
Distribution is a variant of , where is generated through instead of .
-
Distribution is a variant of , where we switch from corresponding to witness to the witness .
-
Distribution is a variant of , where is also generated through instead of .
-
Distribution is a variant of , where we switch from corresponding to witness to the witness . Note that this is exactly .
From Assumption 26 and post-processing (11), we have that , and similarly .
Next, we show that . Note that the output of and do not depend on and . Thus the only material change between and is that in versus in . From Theorem 23, we have that . Thus, it follows that by post-processing (11). Similarly, it follows that (here we use that and are immaterial to the final output of and ).
Combining these using basic composition (11), we get that , thus implying that is -. ∎
Corollary 29.
is -useful for .
Proof.
The utility for is trivially . Consider . Suppose the mechanism is -useful for . Since we sample and from independently we have that with probability at least . Finally, note that the proof in the output of is always accepted by . From Theorem 24, we have that , and hence is useful for . ∎
We end this section by proving Theorem 16. The proof is essentially a straightforward combination of the previous two results. The only choice left to make is to select the hash value ; we select it so that the size of the preimage is maximized. This ensures that the set has enough density as required in Theorem 16. (Note: the density requirement in Theorem 16 is not important for showing the existence of a mechanism, but instead is later used to show the non-existence of mechanisms.)
Proof of Theorem 16.
Let be as defined above. Furthermore, let be such that is maximized and . The fact that there exists an - mechanism that is -useful for follows immediately from Theorem 28 and Corollary 29. Furthermore, by our choice of , notice that , where the latter comes from our assumption on in Assumption 18. ∎
VI Lower Bounds for the Nearby Point Problem
In this section, we will show that there is no - algorithm for the nearby point problem with target threshold as long as the set is fairly dense, as formalized below.
Theorem 30.
For and such that and and for any constant and , no - mechanism is -useful for .
To prove Theorem 30, let us first recall the standard “blatant non-privacy implies non-DP” proof101010Here we follow the proofs in [Sur19, Man22]., which corresponds to the case . At a high-level, these proofs proceed by showing that the error in each coordinate is large by “matching” each with another point which is the same as except with the -th bit flipped; a basic calculation then shows that (on average) the -th bit is predicted incorrectly with large probability. Summing this up over all the coordinates yield the desired bound.
As we are in the case where , we cannot use the proof above directly. Nonetheless, we can still adapt the above proof. More specifically, instead of looking at each coordinate at a time, we look at a block of coordinates. For each block, we try to find a matching in the same spirit as above, but we now allow the to have a larger distance; simple calculations give us a lower bound on being incorrect in this block (Section VI-B). We then “sum up” across all blocks to get a large distance (Section VI-C). Even though we get a large distance via this approach, the error probability (i.e. one minus usefulness) is small (i.e. ). Fortunately, we can overcome this using the so-called DP hyperparameter tuning algorithm [LT19, PS21] (Section VI-D). This concludes our proof overview.
VI-A Additional Preliminaries: Tools from Differential Privacy
We will require several additional tools from DP literature, which we list below for completeness.
Laplace Mechanism. The Laplace distribution with scale parameter , denoted by , is the probability distribution over with probability mass function .
Given a function , its sensitivity is defined as , where the maximum is over all pair of adjacent datasets.
The Laplace mechanism [DMNS06] is an - mechanism that simply outputs .
Group Privacy. The following fact is well-known and is often referred to as group privacy.
Fact 31 (Group Privacy (e.g., [Vad17])).
Let be an - mechanism and let be such that , then, we have where and .
DP Hyperparameter Tuning. We will also use the following result of Liu and Talwar [LT19] on DP hyperparameter tuning. We remark that some improvements in the constants has been made in [PS21], by using a different distribution of the number of repetitions. Nonetheless, since we are only interested in an asymptotic bound, we choose to work with the slightly simpler hyperparameter tuning algorithm from [LT19].
The hyperparameter tuning algorithm from [LT19] allows us to take any DP “base” mechanism , which outputs a candidate and a score , run it multiple times and output a candidate with score that is below a certain threshold.111111While DP Hyperparameter tuning is typically stated for choosing based on score above a threshold, the formulations are equivalent. The precise description is in Algorithm 6.
We will use the following DP guarantee of , which was shown in [LT19]121212Note that this is a simplified version of [LT19, Theorem 3.1] where we simply set ..
Theorem 32 (DP Hyperparameter Tuning [LT19]).
For all , and , if is -, then (Algorithm 6) is .
VI-B Weak Hardness
We start with a relatively weak hardness for the case of , i.e., the answer is considered correct iff it is the same as the input. To prove this, we recall a couple of facts.
The first is a simple relation between independent set and maximum matching. Let denote the size of the maximum independent set of .
Fact 33.
For any graph , there exists matching of size at least .
Let denote the distance- graph on the hypercube, i.e., where iff . Let . The following standard lower bound follows from a “packing argument”.
Fact 34.
For any , .
We are now ready to prove a lower bound for the nearby problem.
Theorem 35.
For any , let and . Then, for any - algorithm , we have
VI-C Boosting the Distance
We can now prove a hardness for larger by dividing the coordinates into groups and applying the previously derived weak hardness result on each group. We note that the “non-usefulness” we get on the right hand side is still insufficient for Theorem 30; this will be dealt with in Section VI-D.
Theorem 36.
Let for some . For any , let and . Then, for any - algorithm , there exists such that
Proof.
Let for all . Furthermore, let denote the set of all such that .
First, notice that
For each fixed , consider the mechanism defined by . It is clear that is -. Furthermore, observe that for all . Therefore, by applying Theorem 35 and plugging it back into the above, we get
Dividing by then gives us the claimed bound. ∎
VI-D Boosting the Failure Probability
We will now prove the last part of the lower bound, which is to show that the existence of even slightly useful mechanism also leads to an existence of a highly useful mechanism, albeit at a slight increase in the distance threshold. The formal statement and its proof are given below; the proof uses the DP hyperparameter tuning algorithm (Theorem 32).
Theorem 37.
Suppose that there exists an - mechanism that is -useful for . Then, for all , there exists an - mechanism that is -useful for where and .
Proof.
First, let us construct the mechanism as follows:
-
On input , first let .
-
Then, let where .
-
Output .
Since is - and the Laplace mechanism is -, the basic composition theorem implies that the entire mechanism is -.
Let . Let . We now apply Algorithm 6 with and threshold . Theorem 32 ensures that the resulting algorithm is -. Our final mechanism is the mechanism that runs . If the output is not , returns that output. Otherwise, returns an arbitrary element of . Since is a post-processing of , we have is also -.
We will next show that is -useful for . By definition of the utility function, this immediately holds for any . Therefore, we may only consider any . Consider on such an . Let denote the corresponding values of in the -th run of . We consider the following three events:
-
Let denote the event that for some .
-
Let denote the event that for all .
-
Let denote the event that halts and returns in the first steps.
Before we bound the probability of each event, notice that, if none of occurs, we must have (where denotes the output of ), since . That is,
We will now bound the probability for each event. For , it immediately follows from the Laplace tail bound together with a union bound that
For , the -usefulness of implies that
Finally, for , a simple union bound gives
By combining the four inequalities above, we have
as desired. ∎
VI-E Putting Things Together: Proof of Theorem 30
Proof of Theorem 30.
Suppose for the sake of contradiction that, for some constant and there exists an - mechanism that is -useful for for every ; recall .
Using Theorem 37 with , there is a mechanism for and that is -useful for where . Plugging this into Theorem 36 with (which gives and in Theorem 36), we have
which is a contradiction for any sufficiently large .
∎
VII Putting Things Together: Proof of Theorem 5
Our main theorem follows from combining the main results from the previous two sections.
Proof of Theorem 5.
Let be as given in Theorem 16, which immediately yields the existence of an - mechanism that is -useful. Furthermore, since , Theorem 30 implies that for any constant , there is such that no - mechanism is -useful for . Finally, applying Lemma 15, we can conclude that no - mechanism is -useful for . This concludes our proof. ∎
VIII Conclusion and Discussion
In this work, we give a first task that, under certain assumptions, admits an efficient CDP algorithm but does not admit an SDP algorithm (even inefficient ones). As mentioned in Section I, perhaps the most intriguing next direction would be to see if there are more “natural” tasks for which algorithms can go beyond known lower bounds.
On the technical front, there are also a few interesting directions. For example, it would be interesting to see if the three assumptions in our paper can be removed, relaxed, or replaced (by perhaps more widely believed assumptions). Alternatively, we can ask the opposite question: what are the (cryptographic) assumptions necessary for separating and ? Such a question has been extensively studied in the multiparty model [HMST22, GMPS13, GKM+16, HMSS19, HNO+18]; for example, it is known that key-agreement is necessary and sufficient to get better-than-local-DP protocol for inner product in the two-party setting [HMST22]. Achieving such a result in our setting would significantly deepen our understanding of the -vs- question in the central model.
Another possible improvement is to strengthen the hardness of the adversary. In this paper, we only consider polynomial-time adversaries. Indeed, our mechanism does not remain against quasi-polynomial adversary. The reason is that we choose the hash value length to be only in Assumption 18, so a trivial “guess-and-check” algorithm can break this assumption in time . However, as far as we are aware, there is no inherent barrier in proving a separation with that holds even against, e.g., sub-exponential time adversaries. Achieving such a result (potentially under stronger or different assumptions) would definitely be interesting.
Furthermore, our task (or more precisely the utility function) is non-uniform (through the choice of ). It would also be interesting to have a uniform task.
Acknowledgments
We thank Prabhanjan Ananth for helpful discussions about differing-inputs obfuscation, and anonymous reviewers for helpful comments.
References
- [ABG+13] Prabhanjan Ananth, Dan Boneh, Sanjam Garg, Amit Sahai, and Mark Zhandry. Differing-inputs obfuscation and applications. IACR Cryptol. ePrint Arch., page 689, 2013.
- [Abo18] John M Abowd. The US Census Bureau adopts differential privacy. In KDD, pages 2867–2867, 2018.
- [App17] Apple Differential Privacy Team. Learning with privacy at scale. Apple Machine Learning Journal, 2017.
- [BCP14] Elette Boyle, Kai-Min Chung, and Rafael Pass. On extractability obfuscation. In TCC, pages 52–73, 2014.
- [BCV16] Mark Bun, Yi-Hsiu Chen, and Salil P. Vadhan. Separating computational and statistical differential privacy in the client-server model. In TCC, pages 607–634, 2016.
- [BGI+01] Boaz Barak, Oded Goldreich, Russell Impagliazzo, Steven Rudich, Amit Sahai, Salil P. Vadhan, and Ke Yang. On the (im)possibility of obfuscating programs. In CRYPTO, pages 1–18, 2001.
- [BGI+12] Boaz Barak, Oded Goldreich, Russell Impagliazzo, Steven Rudich, Amit Sahai, Salil P. Vadhan, and Ke Yang. On the (im)possibility of obfuscating programs. J. ACM, 59(2):6:1–6:48, 2012.
- [BKP18] Nir Bitansky, Yael Tauman Kalai, and Omer Paneth. Multi-collision resistance: a paradigm for keyless hash functions. In STOC, pages 671–684, 2018.
- [BNO08] Amos Beimel, Kobbi Nissim, and Eran Omri. Distributed private data analysis: Simultaneously solving how and what. In CRYPTO, pages 451–468, 2008.
- [BOV07] Boaz Barak, Shien Jin Ong, and Salil P. Vadhan. Derandomization in cryptography. SIAM J. Comput., 37(2):380–400, 2007.
- [BP15a] Nir Bitansky and Omer Paneth. Zaps and non-interactive witness indistinguishability from indistinguishability obfuscation. In TCC, pages 401–427, 2015.
- [BP15b] Elette Boyle and Rafael Pass. Limits of extractability assumptions with distributional auxiliary input. In ASIACRYPT, pages 236–261, 2015.
- [BR93] Mihir Bellare and Phillip Rogaway. Ccs. pages 62–73, 1993.
- [BSW16] Mihir Bellare, Igors Stepanovs, and Brent Waters. New negative results on differing-inputs obfuscation. In EUROCRYPT, pages 792–821, 2016.
- [CGH04] Ran Canetti, Oded Goldreich, and Shai Halevi. The random oracle methodology, revisited. J. ACM, 51(4):557–594, 2004.
- [De12] Anindya De. Lower bounds in differential privacy. In TCC, pages 321–338, 2012.
- [DKM+06] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, pages 486–503, 2006.
- [DKY17] Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin. Collecting telemetry data privately. In NeurIPS, pages 3571–3580, 2017.
- [DMNS06] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, pages 265–284, 2006.
- [DMT07] Cynthia Dwork, Frank McSherry, and Kunal Talwar. The price of privacy and the limits of LP decoding. In STOC, pages 85–94, 2007.
- [DN03] Irit Dinur and Kobbi Nissim. Revealing information while preserving privacy. In PODS, pages 202–210, 2003.
- [EPK14] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In CCS, pages 1054–1067, 2014.
- [GGHW17] Sanjam Garg, Craig Gentry, Shai Halevi, and Daniel Wichs. On the implausibility of differing-inputs obfuscation and extractable witness encryption with auxiliary input. Algorithmica, 79(4):1353–1373, 2017.
- [GKM+16] Vipul Goyal, Dakshita Khurana, Ilya Mironov, Omkant Pandey, and Amit Sahai. Do distributed differentially-private protocols require oblivious transfer? In ICALP, pages 29:1–29:15, 2016.
- [GKY11] Adam Groce, Jonathan Katz, and Arkady Yerukhimovich. Limits of computational differential privacy in the client/server setting. In TCC, pages 417–431, 2011.
- [GMPS13] Vipul Goyal, Ilya Mironov, Omkant Pandey, and Amit Sahai. Accuracy-privacy tradeoffs for two-party differentially private protocols. In CRYPTO, pages 298–315, 2013.
- [GOS12] Jens Groth, Rafail Ostrovsky, and Amit Sahai. New techniques for noninteractive zero-knowledge. J. ACM, 59(3):11:1–11:35, 2012.
- [Gre16] Andy Greenberg. Apple’s “differential privacy” is about collecting your data – but not your data. Wired, June, 13, 2016.
- [GT08] Ben Green and Terence Tao. The primes contain arbitrarily long arithmetic progressions. Annals of Mathematics, 167(2):481–547, 2008.
- [HMSS19] Iftach Haitner, Noam Mazor, Ronen Shaltiel, and Jad Silbak. Channels of small log-ratio leakage and characterization of two-party differentially private computation. In TCC, pages 531–560, 2019.
- [HMST22] Iftach Haitner, Noam Mazor, Jad Silbak, and Eliad Tsfadia. On the complexity of two-party differential privacy. In STOC, pages 1392–1405, 2022.
- [HNO+18] Iftach Haitner, Kobbi Nissim, Eran Omri, Ronen Shaltiel, and Jad Silbak. Computational two-party correlation: A dichotomy for key-agreement protocols. In FOCS, pages 136–147, 2018.
- [HT10] Moritz Hardt and Kunal Talwar. On the geometry of differential privacy. In STOC, pages 705–714, 2010.
- [IPS15] Yuval Ishai, Omkant Pandey, and Amit Sahai. Public-coin differing-inputs obfuscation and its applications. In TCC, pages 668–697, 2015.
- [JLS21] Aayush Jain, Huijia Lin, and Amit Sahai. Indistinguishability obfuscation from well-founded assumptions. In STOC, pages 60–73, 2021.
- [KT18] Krishnaram Kenthapadi and Thanh T. L. Tran. PriPeARL: A framework for privacy-preserving analytics and reporting at LinkedIn. In CIKM, pages 2183–2191, 2018.
- [LT19] Jingcheng Liu and Kunal Talwar. Private selection from private candidates. In STOC, pages 298–309, 2019.
- [Man22] Pasin Manurangsi. Tight bounds for differentially private anonymized histograms. In SOSA, pages 203–213, 2022.
- [MMP+10] Andrew McGregor, Ilya Mironov, Toniann Pitassi, Omer Reingold, Kunal Talwar, and Salil P. Vadhan. The limits of two-party differential privacy. In FOCS, pages 81–90, 2010.
- [MPRV09] Ilya Mironov, Omkant Pandey, Omer Reingold, and Salil P. Vadhan. Computational differential privacy. In CRYPTO, pages 126–142, 2009.
- [PS21] Nicolas Papernot and Thomas Steinke. Hyperparameter tuning with renyi differential privacy. CoRR, abs/2110.03620, 2021.
- [RSP+21] Ryan Rogers, Subbu Subramaniam, Sean Peng, David Durfee, Seunghyun Lee, Santosh Kumar Kancha, Shraddha Sahay, and Parvez Ahammad. LinkedIn’s audience engagements API: A privacy preserving data analytics system at scale. J. Priv. Confiden., 11(3), 2021.
- [RTTV08] Omer Reingold, Luca Trevisan, Madhur Tulsiani, and Salil P. Vadhan. Dense subsets of pseudorandom sets. In FOCS, pages 76–85, 2008.
- [Sha14] Stephen Shankland. How Google tricks itself to protect Chrome user privacy. CNET, October, 2014.
- [Sur19] Ananda Theertha Suresh. Differentially private anonymized histograms. In NeurIPS, pages 7969–7979, 2019.
- [TZ08] Terence Tao and Tamar Ziegler. The primes contain arbitrarily long polynomial progressions. Acta Mathematica, 201(2):213 – 305, 2008.
- [Vad17] Salil P. Vadhan. The complexity of differential privacy. In Tutorials on the Foundations of Cryptography, pages 347–450. Springer International Publishing, 2017.
- [War65] Stanley L Warner. Randomized response: A survey technique for eliminating evasive answer bias. JASA, 60(309):63–69, 1965.
Appendix A Comparison of various assumptions
We review and compare the various notions of differing inputs obfuscation, showing that the notion of (Definition 20) is in fact weaker (or at least, no stronger) than all notions of differing inputs obfuscation studied in literature.
The definition of as given by [BGI+12] did not include the notion of a sampler. Informally speaking, it requires that for all efficient adversaries there is an efficient adversary such that if can distinguish the obfuscation of a circuit from the obfuscation of , then for any circuits and that are functionally equivalent to and respectively, can find an input on which and disagree.
This notion is stronger than the corresponding notion involving samplers. Since most applications of differing-inputs obfuscation in literature are stated using differing-inputs samplers, we will only refer to notions that involve these.
Definition 38 (Differing-Inputs Circuit Sampler [ABG+13]).
An efficient non-uniform sampling algorithm is a differing-inputs sampler for the parameterized collection of circuits if the output of is distributed over and for every efficient non-uniform algorithm , there exists a negligible function such that for all :
- Plain Sampler.
-
We call a differing-inputs sampler as a Plain Sampler if is always .
- Public-Coin Sampler.
-
We call a differing-inputs sampler as Public-Coin Sampler if is equal to (precisely Definition 19).
- General Sampler.
-
We call a differing-inputs sampler as a General Sampler whenever we want to emphasize that is allowed to be any function of . In particular, plain and public-coin samplers are special cases of general samplers.
Note that, the more information that is allowed to contain, the more restricted the distribution over circuit pairs gets. In particular, any public-coin Sampler remains a differing-inputs Sampler if we set to be some function of (instead of being all of ), and similarly, any general differing-inputs Sampler can be converted to a plain-Sampler by simply setting .
We can consider two notions of security of differing inputs obfuscators, depending on whether or not the distinguisher has access to . Recall that the “differing-inputs” condition in Definition 20 was
(6) |
On the other hand, we could consider a different notion where for any general sampler , for , we replace the “differing-inputs” condition with
(7) |
Depending on the type of sampler (plain or public-coin or general) and the notion of security for differing inputs obfuscators ((6) or (7)), we get various kinds of assumptions, which we list below.
- Plain .
- Public-Coin .
- General .
- for General Samplers.
-
We define as the notion of that holds only against general samplers, but where the distinguisher does not have access to , as in (6).
- for Public-Coin Samplers.
-
This is precisely Definition 20, where the security of holds only for public-coin samplers, where the distinguisher does not have access to , as in (6).
Comparison between different assumptions. The comparison between the assumptions asserting existence of each type of is illustrated in Figure 3, with justification for each arrow given as follows:
-
Existence of implies existence of and , since both are special cases corresponding to plain samplers and public-coin samplers respectively.
-
To the best of knowledge, it is unknown whether the assumptions of existence of and the existence of are comparable or not.
-
Existence of implies existence of since any general sampler can be converted to a plain sampler by simply setting ; note that the distinguisher (in the definition of ) does not have access to in either case.
-
Existence of implies existence of and since both are special cases corresponding to plain samplers and public-coin samplers respectively.
-
Existence of implies existence of , since the distinguisher in the definition of does not have access to and hence is less powerful.
Finally, one may wonder, what was special about the application of in this paper that only required and not or as in prior work in cryptography. The main reason is that, in cryptographic applications, an is provided to adversaries to enable certain cryptographic functionality (such as by revealing some public key parameters), and thus, it is required that the is secure even given knowledge of this information. In applications of , the distinguisher typically does not have access to all of (such as some secret key parameters may be hidden), but security given knowledge of entire implies security given partial knowledge of . In the setting of this paper, there wasn’t any particular functionality that needed to be enabled, other than basic circuit evaluation, and the particular circuit samplers of interest were public-coin differing inputs samplers, which is why it suffices to only assume .