
How to Find a Point in the Convex Hull Privately

Haim Kaplan School of Computer Science, Tel Aviv University, Tel Aviv, and Google, [email protected]. Partially supported by ISF grant 1595/19 and grant 1367/2016 from the German-Israeli Science Foundation (GIF)    Micha Sharir School of Computer Science, Tel Aviv University, Tel Aviv, Israel, [email protected]. Partially supported by ISF Grant 260/18, by grant 1367/2016 from the German-Israeli Science Foundation (GIF), and by Blavatnik Research Fund in Computer Science at Tel Aviv University.    Uri Stemmer Department of Computer Science, Ben-Gurion University, and Google, [email protected]. Partially supported by ISF grant 1871/19.
Abstract

We study the question of how to compute a point in the convex hull of an input set $S$ of $n$ points in $\mathbb{R}^{d}$ in a differentially private manner. This question, which is trivial without privacy requirements, turns out to be quite deep when imposing differential privacy. In particular, it is known that the input points must reside on a fixed finite subset $G\subseteq\mathbb{R}^{d}$, and furthermore, the size of $S$ must grow with the size of $G$. Previous works [1, 2, 3, 4, 5, 11] focused on understanding how $n$ needs to grow with $|G|$, and showed that $n=O\left(d^{2.5}\cdot 8^{\log^{*}|G|}\right)$ suffices (so $n$ does not have to grow significantly with $|G|$). However, the available constructions exhibit running time at least $|G|^{d^{2}}$, where typically $|G|=X^{d}$ for some (large) discretization parameter $X$, so the running time is in fact $\Omega(X^{d^{3}})$.

In this paper we give a differentially private algorithm that runs in $O(n^{d})$ time, assuming that $n=\Omega(d^{4}\log X)$. To get this result we study and exploit some structural properties of the Tukey levels (the regions $D_{\geq k}$ consisting of points whose Tukey depth is at least $k$, for $k=0,1,\dots$). In particular, we derive lower bounds on their volumes for point sets $S$ in general position, and develop a rather subtle mechanism for handling point sets $S$ in degenerate position (where the deep Tukey regions have zero volume). A naive approach to the construction of the Tukey regions requires $n^{O(d^{2})}$ time. To reduce the cost to $O(n^{d})$, we use an approximation scheme for estimating the volumes of the Tukey regions (within their affine spans in case of degeneracy), and for sampling a point from such a region, a scheme that is based on the volume estimation framework of Lovász and Vempala [13] and of Cousins and Vempala [7]. Making this framework differentially private raises a set of technical challenges that we address.

1 Introduction

We often would like to analyze data while protecting the privacy of the individuals that contributed to it. At first glance, one might hope to ensure privacy by simply deleting all names and ID numbers from the data. However, such anonymization schemes are proven time and again to violate privacy. This gave rise to a theoretically rigorous line of work that has placed private data analysis on firm foundations, centered around a mathematical definition of privacy known as differential privacy [8].

Consider a database $S$ containing personal information of individuals. Informally, an algorithm operating on such a database is said to preserve differential privacy if its outcome distribution is (almost) insensitive to any arbitrary change to the data of one individual in the database. Intuitively, this means that an observer looking at the outcome of the algorithm (almost) cannot distinguish between whether Alice’s information is $x$ or $y$ (or whether Alice’s information is present in the database at all), because in any case it would have (almost) no effect on the outcome distribution of the algorithm.

Definition 1.1 (Dwork et al. [8]).

Two databases (multisets) $S$ and $S'$ are called neighboring if they differ in a single entry. That is, $S=S_{0}\cup\{x\}$ and $S'=S_{0}\cup\{y\}$ for some items $x$ and $y$. A randomized algorithm $A$ is $(\varepsilon,\delta)$-differentially private if for every two neighboring databases $S,S'$ and for any event $T$ we have

$$\mathbf{Pr}[A(S)\in T]\leq e^{\varepsilon}\cdot\mathbf{Pr}[A(S')\in T]+\delta.$$

When $\delta=0$ this notion is referred to as pure differential privacy, and when $\delta>0$ it is referred to as approximate differential privacy.

Remark 1.2.

Typically, $\varepsilon$ is set to be a small constant, say $\varepsilon=0.1$, and $\delta$ is set to be a small function of the database size $|S|$ (much smaller than $1/|S|$). Note that to satisfy the definition (in any meaningful way) algorithm $A$ must be randomized.

Differential privacy is increasingly accepted as a standard for the rigorous treatment of privacy. However, even though the field has witnessed an explosion of research in recent years, much remains unknown and answers to fundamental questions are still missing. In this work we study one such fundamental question, already studied in [1, 2, 3, 4, 5, 11]: Given a database containing points in $\mathbb{R}^{d}$ (where every point is assumed to be the information of one individual), how can we privately identify a point in the convex hull of the input points? This question, which is trivial without privacy requirements, turns out to be quite deep when imposing differential privacy. In particular, Bun et al. [5] showed that in order to be able to solve it, we must assume that the input points reside on a fixed finite subset $G\subseteq\mathbb{R}^{d}$, and furthermore, the number of input points must grow with the size of $G$.

The Private Interior Point (PIP) Problem.
Let $\beta,\varepsilon,\delta,X$ be positive parameters where $\beta,\varepsilon,\delta$ are small and $X$ is a large integer. Let $G\subseteq[0,1]^{d}$ be a finite uniform grid with side steps $1/X$ (so $|G|=(X+1)^{d}$). Design an algorithm $A$ such that for some $n\in\mathbb{N}$ (as small as possible as a function of $\beta,\varepsilon,\delta,X$) we have
1. Utility: For every database $S$ containing at least $n$ points from $G$ it holds that $A(S)$ returns a point in the convex hull of $S$ with probability at least $1-\beta$. (The outcome of $A$ does not have to be in $G$.)

2. Privacy: For every pair of neighboring databases $S,S'$, each containing $n$ points from $G$, and for any event $T$, we have $\mathbf{Pr}[A(S)\in T]\leq e^{\varepsilon}\cdot\mathbf{Pr}[A(S')\in T]+\delta$.

The parameter $n$ is referred to as the sample complexity of the algorithm. It is the smallest number of points on which we are guaranteed to succeed (not to be confused with the actual size of the input). The PIP problem is very natural on its own. Furthermore, as was observed in [2], an algorithm for solving the PIP problem can be used as a building block in other applications with differential privacy, such as learning halfspaces and linear regression. Previous works [1, 2, 3, 4, 5, 11] have focused on the task of minimizing the sample complexity $n$ while ignoring the runtime of the algorithm. In this work we seek an efficient algorithm for the PIP problem that still keeps the sample complexity $n$ “reasonably small” (where “reasonably small” will be made precise after we introduce some additional notation).

1.1 Previous Work

Several papers studied the PIP problem for $d=1$. In particular, three different constructions with sample complexity $2^{O(\log^{*}|G|)}$ were presented in [3, 4, 5] (for $d=1$). Recently, Kaplan et al. [11] presented a new construction with sample complexity $O((\log^{*}|G|)^{1.5})$ (again, for $d=1$). Bun et al. [5] gave a lower bound showing that every differentially private algorithm for this task must have sample complexity $\Omega(\log^{*}|G|)$. Beimel et al. [2] incorporated a dependency on $d$ into this lower bound, and showed that every differentially private algorithm for the PIP problem must use at least $n=\Omega(d+\log^{*}|G|)$ input points.

For the case of pure differential privacy (i.e., $\delta=0$), a lower bound of $n=\Omega(\log X)$ on the sample complexity follows from the results of Beimel et al. [1]. This lower bound is tight, as an algorithm with sample complexity $n=O(\log X)$ (for $d=1$) can be obtained using a generic tool in the literature of differential privacy, called the exponential mechanism [15]. We sketch this application of the exponential mechanism here. (For a precise presentation of the exponential mechanism see Section 2.1.) Let $G=\{0,\frac{1}{X},\frac{2}{X},\dots,1\}$ be our (1-dimensional) grid within the interval $[0,1]$, and let $S$ be a multiset containing $n$ points from $G$. The algorithm is as follows.

1. For every $y\in G$ define the score $q_{S}(y)=\min\left\{|\{x\in S\mid x\geq y\}|,\;|\{x\in S\mid x\leq y\}|\right\}$.

2. Output $y\in G$ with probability proportional to $e^{\varepsilon\cdot q_{S}(y)}$.

Intuitively, this algorithm satisfies differential privacy because changing one element of $S$ changes the score $q_{S}(y)$ by at most $\pm 1$, and thus changes the probabilities with which we sample elements by roughly an $e^{\varepsilon}$ factor. As for the utility analysis, observe that there exists $y\in G$ with $q_{S}(y)\geq\frac{n}{2}$, and the probability of picking this point is (at least) proportional to $e^{\varepsilon n/2}$. As this probability increases exponentially with $n$, by setting $n$ to be big enough we can ensure that points $y'$ outside of the convex hull (those with $q_{S}(y')=0$) get picked with very low probability.
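For concreteness, here is a minimal Python sketch of this one-dimensional mechanism (the function name and the use of NumPy are ours; the two steps are exactly those listed above):

```python
import numpy as np

def pip_1d_exponential_mechanism(S, X, eps, rng=np.random.default_rng()):
    """Sketch of the 1-D exponential mechanism for the PIP problem.

    S: input points, each a grid point in {0, 1/X, ..., 1}.
    Returns y sampled with probability proportional to exp(eps * q_S(y)),
    where q_S(y) = min(#{x in S : x >= y}, #{x in S : x <= y}).
    """
    S = np.asarray(S, dtype=float)
    grid = np.arange(X + 1) / X
    ge = (S[None, :] >= grid[:, None]).sum(axis=1)  # #{x in S : x >= y}
    le = (S[None, :] <= grid[:, None]).sum(axis=1)  # #{x in S : x <= y}
    q = np.minimum(ge, le)
    w = np.exp(eps * (q - q.max()))  # shift by the max score for numerical stability
    return rng.choice(grid, p=w / w.sum())
```

Shifting all scores by $q_{\max}$ leaves the sampling distribution unchanged, since the weights are only defined up to a common factor.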

Beimel et al. [2] observed that this algorithm extends to higher dimensions by replacing $q_{S}(y)$ with the Tukey depth ${\sf td}_{S}(y)$ of the point $y$ with respect to the input set $S$ (the Tukey depth of a point $y$ is the minimal number of points that need to be removed from $S$ to ensure that $y$ is not in the convex hull of the remaining input points). However, even though there must exist a point $y\in\mathbb{R}^{d}$ with high Tukey depth (at least $n/(d+1)$; see [14]), the finite grid $G\subseteq\mathbb{R}^{d}$ might fail to contain such a point. Hence, Beimel et al. [2] first refined the grid $G$ into a grid $G'$ that contains a point with high Tukey depth, and then randomly picked a point $y$ from $G'$ with probability proportional to $e^{\varepsilon\cdot{\sf td}_{S}(y)}$. To compute the probabilities with which grid points are sampled, the algorithm in [2] explicitly computes the Tukey depth of every point in $G'$, which, because of the way in which $G'$ is defined, results in running time of at least $\Omega(|G|^{d^{2}})=\Omega(X^{d^{3}})$ and sample complexity $n=O(d^{3}\log|G|)=O(d^{4}\log X)$. Beimel et al. then presented an improvement of this algorithm with reduced sample complexity of $n=O(d^{2.5}\cdot 8^{\log^{*}|G|})$, but the running time remained $\Omega(|G|^{d^{2}})=\Omega(X^{d^{3}})$.

1.2 Our Construction

We give an approximate differentially private algorithm for the private interior point problem that runs in $O(n^{d})$ time,\footnote{When we use $O$-notation for time complexity we hide logarithmic factors in $X$, $1/\varepsilon$, $1/\delta$, and polynomial factors in $d$. We assume operations on real numbers take $O(1)$ time (the so-called real RAM model).} and succeeds with high probability when the size of its input is $\Omega(\frac{d^{4}}{\varepsilon}\log\frac{X}{\delta})$. Our algorithm is obtained by carefully implementing the exponential mechanism and reducing its running time from $\Omega(|G|^{d^{2}})=\Omega(X^{d^{3}})$ to $O(n^{d})$. We now give an informal overview of this result.

To avoid the need to extend the grid and to calculate the Tukey depth of each point in the extended grid, we sample our output directly from $[0,1]^{d}$. To compute the probabilities with which we sample a point from $[0,1]^{d}$ we compute, for each $k$ in an appropriate range, the volume of the Tukey region of depth $k$, which we denote by $D_{k}$. (That is, $D_{k}$ is the region in $[0,1]^{d}$ containing all points with Tukey depth exactly $k$.) We then sample a value $k\in[n]$ with probability proportional to ${\rm Vol}(D_{k})\cdot e^{\varepsilon k}$, and then sample a random point uniformly from $D_{k}$.

Observe that this strategy picks a point with Tukey depth $k$ with probability proportional to ${\rm Vol}(D_{k})\cdot e^{\varepsilon k}$. Hence, if for a “large enough” value of $k$ (say $k\geq\frac{n}{cd}$ for a suitable absolute constant $c>1$) we have that ${\rm Vol}(D_{k})$ is “large enough”, then a point with Tukey depth $k$ is picked with high probability. However, if ${\rm Vol}(D_{k})=0$ (or too small) then a point with Tukey depth $k$ is picked with probability zero (or with too small a probability). Therefore, to apply this strategy, we derive a lower bound on the volume of every non-degenerate Tukey region, showing that if the volume is non-zero, then it is at least $\Omega\left(1/X^{d^{3}}\right)$.

There are two issues here. The first issue is that the best bound we know on the complexity of a Tukey region is $O(n^{(d-1)\lfloor d/2\rfloor})$, so we cannot compute these regions explicitly (in the worst case) in time $O(n^{d})$ (which is our target runtime complexity). We avoid the need to compute the Tukey regions explicitly by using an approximation scheme for estimating the volume of each region and for sampling a point from such a region, a scheme that is based on the volume estimation framework of Lovász and Vempala [13] and of Cousins and Vempala [7]. The second issue is that it might be the case that all Tukey regions for large values of $k$ are degenerate, i.e., have volume $0$, in which case this strategy might fail to identify a point in the convex hull of the input points.

Handling degeneracies. We show that if the Tukey regions of high depth are degenerate, then many of the input points must lie in a lower-dimensional affine subspace. This can be used to handle degenerate inputs $S$ as follows. We first (privately) check whether there exists a proper affine subspace that contains many points of $S$. If we find such a subspace $f$, we recursively continue the procedure within $f$, with respect to $S\cap f$. Otherwise, if no such subspace exists, then it must be the case that the Tukey regions of high depth are full-dimensional (with respect to the subspace into which we have recursed so far), so we can apply our algorithm for the non-degenerate case and obtain a point that lies, with high probability, in the convex hull of the surviving subset of $S$, and thus of the full set $S$.

We remark that it is easy to construct algorithms with running time polynomial in the input size $n$ when $n$ grows exponentially in $d$. (In that case one can solve the problem using a reduction to the 1-dimensional case.) In this work we aim to reduce the running time while keeping the sample complexity $n$ at most polynomial in $d$ and in $\log|G|$.

2 Preliminaries

We assume that our input set $S$ consists of $n$ points that lie in the intersection of a grid $G$ with the cube $[0,1]^{d}$ in $\mathbb{R}^{d}$. We assume that $G$ is of side length $1/X$ for a given accuracy integer parameter $X$, so it partitions the cube into $X^{d}$ cells.

2.1 The exponential mechanism

Let $G^{*}$ denote the set of all finite databases over a grid $G$, and let $F$ be a finite set. Given a database $S\in G^{*}$, a quality (or scoring) function $q:G^{*}\times F\rightarrow\mathbb{N}$ assigns a number $q(S,f)$ to each element $(S,f)\in G^{*}\times F$, identified as the “quality” of $f$ with respect to $S$. We say that the function $q$ has sensitivity $\Delta$ if for all neighboring databases $S$ and $S'$ and for all $f\in F$ we have $|q(S,f)-q(S',f)|\leq\Delta$.

The exponential mechanism of McSherry and Talwar [15] privately identifies an element $f\in F$ with large quality $q(S,f)$. Specifically, it chooses an element $f\in F$ randomly, with probability proportional to $\exp\left(\varepsilon\cdot q(S,f)/(2\Delta)\right)$. The privacy and utility guarantees of the mechanism are:

Theorem 2.1 (McSherry and Talwar [15]).

The exponential mechanism is $(\varepsilon,0)$-differentially private. Let $q$ be a quality function with sensitivity $\Delta$. Fix a database $S\in G^{*}$ and let ${\rm OPT}=\max_{f\in F}\{q(S,f)\}$. For any $\beta\in(0,1)$, with probability at least $1-\beta$, the exponential mechanism outputs a solution $f$ with quality $q(S,f)\geq{\rm OPT}-\frac{2\Delta}{\varepsilon}\ln\frac{|F|}{\beta}$.
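Over a finite candidate set the mechanism is a few lines of code; the following Python sketch (our naming) implements exactly the sampling rule of Theorem 2.1:

```python
import numpy as np

def exponential_mechanism(scores, eps, sensitivity, rng=np.random.default_rng()):
    """Pick an index f with probability proportional to
    exp(eps * q(S, f) / (2 * sensitivity)), where scores[f] = q(S, f)."""
    q = np.asarray(scores, dtype=float)
    w = np.exp(eps * (q - q.max()) / (2 * sensitivity))  # shift by max for stability
    return rng.choice(len(q), p=w / w.sum())
```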

2.2 Tukey depth

The Tukey depth [17] ${\sf td}_{S}(q)$ of a point $q$ with respect to $S$ is the minimum number of points of $S$ we need to remove to make $q$ lie outside the convex hull of the remaining subset. Equivalently, ${\sf td}_{S}(q)$ is the smallest number of points of $S$ that lie in a closed halfspace containing $q$. We write ${\sf td}(q)$ for ${\sf td}_{S}(q)$ when the set $S$ is clear from the context. It easily follows from Helly’s theorem that there is always a point of Tukey depth at least $n/(d+1)$ (see, e.g., [14]). We denote the largest Tukey depth of a point by ${\sf td}_{\rm max}(S)$ (the maximum Tukey depth is always at most $n/2$).
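As a concrete illustration of the second (halfspace) definition, here is a Python sketch of an exact planar Tukey-depth computation (it is not part of the algorithm of this paper, which works in the dual). It uses the fact that the number of points in a closed halfplane whose boundary passes through $q$ changes, as the inward normal rotates, only at the critical directions perpendicular to some $s-q$; it assumes the offset eps_rot is smaller than every angular gap between critical directions:

```python
import numpy as np

def tukey_depth_2d(q, points, eps_rot=1e-9):
    """Exact Tukey depth of q w.r.t. a planar point set (O(n^2) sketch).

    td(q) = min over closed halfplanes {x : (x - q) . u >= 0} of the number
    of points they contain.  The count, as a function of the normal angle,
    is piecewise constant with breakpoints at theta_i +/- pi/2, and its
    minimum is attained strictly between breakpoints, so sampling one
    direction just past each breakpoint suffices.
    """
    q, pts = np.asarray(q, float), np.asarray(points, float)
    diff = pts - q
    at_q = np.all(diff == 0, axis=1)  # points equal to q lie in every halfplane
    theta = np.arctan2(diff[~at_q, 1], diff[~at_q, 0])
    if theta.size == 0:
        return int(at_q.sum())
    crit = np.concatenate([theta - np.pi / 2, theta + np.pi / 2])
    best = theta.size
    for phi in crit + eps_rot:  # one sample inside each constancy interval
        u = np.array([np.cos(phi), np.sin(phi)])
        best = min(best, int(np.sum(diff[~at_q] @ u >= 0)))
    return best + int(at_q.sum())
```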

We define the regions $D_{\geq k}(S)=\left\{q\in[0,1]^{d}\mid{\sf td}_{S}(q)\geq k\right\}$ and $D_{k}(S)=D_{\geq k}(S)\setminus D_{\geq k+1}(S)$ for $k=0,\ldots,{\sf td}_{\rm max}(S)$. Note that $D_{\geq 1}$ is the convex hull of $S$ and that $D_{\geq 0}=[0,1]^{d}$. It is easy to show that $D_{\geq k}$ is convex; $D_{\geq k}$ is in fact the intersection of all (closed) halfspaces containing at least $n-k+1$ points of $S$; see [16]. It is easy to show that all this is true also when $S$ is degenerate. See Figure 1 for an illustration. The following lemma is easy to establish.

Lemma 2.2.

If $D_{\geq k}$ is of dimension $d$ (we refer to such a region as non-degenerate) then $C_{k}=\partial D_{\geq k}(S)$ is a convex polytope, each of whose facets is contained in a simplex $\sigma$ spanned by $d$ points of $S$, such that one of the open halfspaces bounded by the hyperplane supporting $\sigma$ contains exactly $k-1$ points of $S$.

Figure 1: The Tukey layers $D_{\geq 2}$ and $D_{\geq 1}$.

3 The case of general position

As already said, we apply the exponential mechanism for privately identifying a point in $[0,1]^{d}$ with (hopefully) large Tukey depth with respect to the input set $S$. This satisfies (pure) differential privacy since the sensitivity of the Tukey depth is $1$. In this section we show that when the input points are in general position, this application of the exponential mechanism succeeds (with high probability) in identifying a point that has positive Tukey depth, that is, a point inside the convex hull of $S$.

To implement the exponential mechanism (i.e., to sample a point from $[0,1]^{d}$ appropriately), we need to compute the Tukey regions $D_{\geq k}$ and their volumes. In this section we compute these regions explicitly in $O\left(n^{1+(d-1)\lfloor d/2\rfloor}\right)$ time. In Section 5 we will show that the cost can be reduced to $O(n^{d})$.

Computing $D_{\geq k}$.

We pass to the dual space, and construct the arrangement $\mathcal{A}(S^{*})$ of the hyperplanes dual to the points of $S$. A point $h^{*}$ dual to a hyperplane $h$ supporting $D_{\geq k}$ has at least $n-k+1$ dual hyperplanes passing below $h^{*}$ or incident to $h^{*}$, or, alternatively, passing above $h^{*}$ or incident to $h^{*}$. Furthermore, if we move $h^{*}$ slightly down in the first case (resp., up in the second case), the number of hyperplanes below (resp., above) it becomes smaller than $n-k+1$.

When $h^{*}$ is a vertex of $\mathcal{A}(S^{*})$ we refer to it as $k$-critical, or simply as critical. If $D_{\geq k}$ is non-degenerate then, by Lemma 2.2, each hyperplane $h$ that supports a facet of $D_{\geq k}$ must be spanned by $d$ affinely independent points of $S$. That is, all these hyperplanes are dual to $k$-critical vertices of $\mathcal{A}(S^{*})$.

We compute all critical dual vertices that have at least $n-k+1$ dual hyperplanes passing below them or incident to them, or those with at least $n-k+1$ dual hyperplanes passing above them or incident to them. The intersection of the appropriate primal halfspaces that are bounded by the hyperplanes corresponding to these dual vertices is $D_{\geq k}$. This gives an algorithm for constructing $D_{\geq k}$ when it is non-degenerate. Otherwise $D_{\geq k}$ is degenerate and its volume is $0$, and we need additional techniques, detailed in Section 4, to handle such situations.

We compute the volume of each non-degenerate $D_{\geq k}$, for $k=1,\ldots,{\sf td}_{\rm max}(S)$. We do this by brute force, computing and triangulating $D_{\geq k}$ and adding up the volumes of the simplices in this triangulation. Then we subtract the volume of $D_{\geq k+1}$ from the volume of $D_{\geq k}$ to get the volume of $D_{k}$.

The sampling mechanism.

We assign to each $D_{k}$ the weight $e^{\varepsilon k/2}$ and sample a region $D_{k}$, for $k=0,\ldots,{\sf td}_{\rm max}(S)$, with probability

$$\mu_{k}=\frac{e^{\varepsilon k/2}\,{\rm Vol}(D_{k})}{\sum_{j\geq 0}e^{\varepsilon j/2}\,{\rm Vol}(D_{j})},$$

where ${\rm Vol}(D_{k})$ denotes the volume of $D_{k}$. Then we sample a point uniformly at random from $D_{k}$. We do this by brute force, computing $D_{k}$, triangulating it, computing the volume of each simplex, drawing a simplex from this triangulation with probability proportional to its volume, and then drawing a point uniformly at random from the chosen simplex.\footnote{A simple way to implement the last step is to draw uniformly and independently $d$ points from $[0,1]$, compute the lengths $\lambda_{1},\ldots,\lambda_{d+1}$ of the intervals into which they partition $[0,1]$, and return $\sum_{i=1}^{d+1}\lambda_{i}v_{i}$, where $v_{1},\ldots,v_{d+1}$ are the vertices of the simplex.}
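A Python sketch of this two-stage sampler (the data layout and names are ours; we assume a triangulation of each $D_{k}$ is given as a list of $(d+1)\times d$ vertex arrays):

```python
import math
import numpy as np

def simplex_volume(v):
    """Volume of the simplex with vertex array v of shape (d+1, d)."""
    return abs(np.linalg.det(v[1:] - v[0])) / math.factorial(v.shape[1])

def sample_from_tukey_regions(region_simplices, region_vols, eps,
                              rng=np.random.default_rng()):
    """region_simplices[k]: list of simplices triangulating D_k;
    region_vols[k] = Vol(D_k).  Implements the mechanism of this section."""
    ks = np.arange(len(region_vols))
    w = np.asarray(region_vols) * np.exp(eps * (ks - ks.max()) / 2)
    k = rng.choice(ks, p=w / w.sum())      # mu_k prop. to e^{eps k/2} Vol(D_k)
    simplices = region_simplices[k]
    svols = np.array([simplex_volume(v) for v in simplices])
    v = simplices[rng.choice(len(simplices), p=svols / svols.sum())]
    cuts = np.sort(rng.random(v.shape[0] - 1))           # d uniforms in [0, 1]
    lam = np.diff(np.concatenate(([0.0], cuts, [1.0])))  # d+1 interval lengths
    return lam @ v                         # the footnote's barycentric combination
```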

This is an instance of the exponential mechanism in which the score (namely the Tukey depth) has sensitivity $1$, i.e., $|{\sf td}_{S}(q)-{\sf td}_{S'}(q)|\leq 1$ for any point $q\in[0,1]^{d}$, when $S$ and $S'$ differ by only one element. It thus follows from the properties of the exponential mechanism (Theorem 2.1) that this procedure is (purely) $\varepsilon$-differentially private.

Complexity.

Computing the dual arrangement $\mathcal{A}(S^{*})$ takes $O(n^{d})$ time [10]. Assume that $D_{\geq k}$ is non-degenerate and let $M_{k}$ denote the number of hyperplanes defining $D_{\geq k}$ (i.e., the hyperplanes supporting its facets). It takes $O(M_{k}^{\lfloor d/2\rfloor})$ time to construct $D_{\geq k}$, as the intersection of $M_{k}$ halfspaces, which is a dual version of constructing the convex hull (see [6]). Within the same asymptotic bound we can triangulate $D_{\geq k}$ and compute its volume. We obviously have $M_{k}=O(n^{d})$, but this number can be somewhat reduced, as the following known lemma shows.

Lemma 3.1 (Proposition 3 in [12]).

The number of halfspaces needed to construct $D_{\geq k}$ is $O(n^{d-1})$.

Proof.

Fix a $(d-1)$-tuple $\sigma$ of points of $S$, and consider all the relevant (closed) halfspaces, each of which is bounded by a hyperplane that is spanned by $\sigma$ and another point of $S$, and contains at least $n-k+1$ points of $S$. It is easy to check that, as long as the intersection of these halfspaces is full-dimensional, it is equal to the intersection of just two of them. ∎

Summing up, we get that computing the volume of all the non-degenerate regions $D_{\geq k}$, for $k=1,\ldots,{\sf td}_{\rm max}(S)$, takes $O\left(\sum_{k\geq 1}M_{k}^{\lfloor d/2\rfloor}\right)=O\left(n^{1+(d-1)\lfloor d/2\rfloor}\right)$ time.

Utility.

We continue to assume that $D_{\geq k}$ is non-degenerate, and give a lower bound on its volume. By Lemma 2.2, such a $D_{\geq k}$ is the intersection of halfspaces, each bounded by a hyperplane that is spanned by $d$ points of $S$. Denote by $H$ the set of these hyperplanes. To obtain the lower bound, it suffices to consider the case where $D_{\geq k}$ is a simplex, each of whose vertices is the intersection point of $d$ hyperplanes of $H$.

The equation of a hyperplane $h$ that passes through $d$ points, $a_{1},\ldots,a_{d}$, of $S$ is

$$\begin{vmatrix}1&x_{1}&\cdots&x_{d}\\ 1&a_{1,1}&\cdots&a_{1,d}\\ \vdots&\vdots&&\vdots\\ 1&a_{d,1}&\cdots&a_{d,d}\end{vmatrix}=0,$$

where $a_{i}=(a_{i,1},\ldots,a_{i,d})$, for $i=1,\ldots,d$. The coefficients of the $x_{i}$’s in the equation of $h$ are $d\times d$ subdeterminants of this determinant, where each such determinant has one column of $1$’s, and $d-1$ other columns, each of whose entries is some $a_{i,j}$, which is a rational of the form $m/X$, with $0\leq m\leq X$ (the same holds for the $1$’s, with $m=X$). The value of such a determinant (coefficient) is a rational number with denominator $X^{d}$. By Hadamard’s inequality, its absolute value is at most the product of the Euclidean norms of its rows, which is at most $d^{d/2}$. That is, the numerator of the determinant is an integer of absolute value at most $d^{d/2}X^{d}$. The free term is a similar subdeterminant, but all its entries are the $a_{i,j}$’s, so it too is a rational with denominator $X^{d}$, and with numerator of absolute value at most $d^{d/2}X^{d}$. Multiplying by $X^{d}$, all the coefficients become integers of absolute value at most $d^{d/2}X^{d}$.

Each vertex of any region $D_{k}$ of Tukey depth $k$, for any $k$, is the solution of a linear system of $d$ hyperplane equations of the above form. Its coordinates are therefore rationals whose common denominator, by Cramer’s rule, is the determinant of the matrix of non-free coefficients of the $d$ hyperplanes. This is an integer whose absolute value, again by Hadamard’s inequality, is at most

$$\left(\sqrt{d}\,d^{d/2}X^{d}\right)^{d}\leq d^{d(d+1)/2}X^{d^{2}}.$$

Since the free terms are also integers, we conclude that the coordinates of the intersection point are rationals with a common integral denominator of absolute value at most $d^{d(d+1)/2}X^{d^{2}}$.

We can finally obtain a lower bound for the nonzero volume of a simplex spanned by any $d+1$ affinely independent intersection points $v_{1},\ldots,v_{d+1}$. This volume is

$$\frac{1}{d!}\begin{vmatrix}1&v_{1,1}&\cdots&v_{1,d}\\ 1&v_{2,1}&\cdots&v_{2,d}\\ \vdots&\vdots&&\vdots\\ 1&v_{d+1,1}&\cdots&v_{d+1,d}\end{vmatrix},$$

where again $v_{i}=(v_{i,1},\ldots,v_{i,d})$, for $i=1,\ldots,d+1$. Note that all the entries in any fixed row have the same denominator. The volume is therefore a rational number whose denominator is $d!$ times the product of these denominators, which is thus an integer with absolute value at most

$$d!\cdot\left(d^{d(d+1)/2}X^{d^{2}}\right)^{d}\leq(dX)^{d^{3}}$$

(for $d\geq 2$). That is, we get the following lemma.

Lemma 3.2.

If the volume of $D_{\geq k}$ is not zero then it is at least $1/(dX)^{d^{3}}$.

Assume that the volume of $D_{\geq k}$ is not zero for $k=k_{0}:=n/(4d)$. Since the score of a point outside the convex hull is zero and the volume of $D_{\geq 0}$ is at most $1$, we get that the probability to sample a point outside of the convex hull is at most

$$\frac{1}{e^{\varepsilon k_{0}}\,{\rm Vol}(D_{k_{0}})}\leq\frac{(dX)^{d^{3}}}{e^{\varepsilon n/(4d)}}.$$

Requiring that this bound be at most $\beta$ and solving for $n$ leads to the following theorem, which summarizes the utility that our instance of the exponential mechanism provides.

Theorem 3.3.

If $n\geq\frac{4d^{4}\log(dX)}{\varepsilon}+\frac{4d}{\varepsilon}\log\frac{1}{\beta}$ and $D_{\geq n/4d}$ has non-zero volume, then the exponential mechanism, implemented as above, returns a point in the convex hull with probability at least $1-\beta$, in $O\left(n^{1+(d-1)\lfloor d/2\rfloor}\right)$ time.

4 Handling degeneracies

In general we have no guarantee that $D_{\geq n/4d}$ has non-zero volume. In this section we show how to overcome this and compute (with high probability) a point in the convex hull of any input. We rely on the following lemma, which shows that if $D_{\geq k}$ has zero volume then many points of $S$ are in a lower-dimensional affine subspace.

Lemma 4.1.

If $D_{\geq k}$ spans an affine subspace $f$ of dimension $j$ then

$$|S\cap f|\geq n-(d-j+1)(k-1).$$
Proof.

Recall that $D_{\geq k}$ is the intersection of all closed halfspaces $h$ that contain at least $n-k+1$ points of $S$. Note that a halfspace that bounds $D_{\geq k}$ and whose bounding hyperplane properly crosses $f$ defines a proper halfspace within $f$, and, by assumption, the intersection of these halfspaces has positive relative volume. This means that the intersection of these halfspaces in $\mathbb{R}^{d}$ has positive volume too, and thus cannot confine $D_{\geq k}$ to $f$. To get this confinement, there must exist (at least) $d-j+1$ halfspaces in the above collection whose intersection is $f$. Hence the union of their complements is $\mathbb{R}^{d}\setminus f$. Since this union contains at most $(d-j+1)(k-1)$ points of $S$, the claim follows. ∎

In what follows, to simplify the expressions that we manipulate, we use the weaker lower bound $n-(d-j+1)k$. In order for the lemma to be meaningful, we want $k$ to be significantly smaller than the centerpoint bound $n/(d+1)$, so we set, as above, $k=n/(4d)$.

We use Lemma 4.1 to handle degenerate inputs $S$, using the following high-level approach. We first (privately) check whether there exists a proper affine subspace that contains many points of $S$. If we find such a subspace $f$, we recursively continue the procedure within $f$, with respect to $S\cap f$. Lemma 4.1 then implies that we do not lose too many points when we recurse within $f$ (that is, $|S\cap f|$ is large), using our choice $k=n/(4d)$. Otherwise, if no such subspace exists, Lemma 4.1 implies that $D_{\geq k}$ is full-dimensional (with respect to the subspace into which we have recursed so far), so we can apply the exponential mechanism, as implemented in Section 3, and obtain a point that lies, with high probability, in the convex hull of the surviving subset of $S$, and thus of the full set $S$. We refer to this application of the exponential mechanism in the appropriate affine subspace as the base case of the recursion.

The points of $S\cap f$ are not on a standard grid within $f$. (They lie of course in the standard uniform grid $G$ of side length $1/X$ within the full-dimensional cube, but $G\cap f$ is not a standard grid and in general has a different, coarser resolution.) We overcome this issue by noting that there always exist $j$ coordinates, which, without loss of generality, we assume to be $x_{1},\ldots,x_{j}$, such that $f$ can be expressed in parametric form by these coordinates. We then project $f$ (and $S\cap f$) onto the $x_{1}x_{2}\cdots x_{j}$-coordinate subspace $f'$. We recurse within $f'$, where the projected points of $S\cap f$ do lie in a standard grid (a cross-section of $G$), and then lift the output point $x_{0}'$, which lies, with high probability, in the convex hull of the projected points, back to a point $x'\in f$. It is straightforward to verify that if $x_{0}'$ is in the convex hull of the projected points then $x'$ is in the convex hull of $S\cap f$.

4.1 Finding an affine subspace with many points privately

For every affine subspace $f$, of dimension $0\leq j\leq d-1$, spanned by some subset of (at least) $j+1$ points of $G$, we denote by $c(f)$ the number of points of $S$ that $f$ contains, and refer to it as the size of $f$.

We start by computing $c(f)$ for every subspace $f$ spanned by points of $S$, as follows. We construct the (potentially degenerate) arrangement $\mathcal{A}(S^{*})$ of the set $S^{*}$ of the hyperplanes dual to the points of $S$. During this construction, we also compute the multiplicity of each flat in this arrangement, namely, the number of dual hyperplanes that contain it. Each intersection flat of the hyperplanes is dual to an affine subspace $f$ defined by the corresponding subset of the primal points of $S$ (that it contains), and $c(f)$ is the number of dual hyperplanes containing the flat. In other words, as a standard byproduct of the construction of $\mathcal{A}(S^{*})$, we can compute the sizes of all the affine subspaces that are spanned by points of $S$, in $O(n^{d})$ overall time.

We define

$$M_{i}=M_{i}(S)=\max\{c(f)\mid f\text{ is spanned by a subset of }S\text{ and is of dimension at most }i\},$$

and compute $M'_{i}=M_{i}+Y_{i}$, where $Y_{i}$ is a random variable drawn (independently for each $i$) from a Laplace distribution with parameter $b:=\frac{1}{\varepsilon}$ centered at the origin. (That is, the probability density function of $Y_{i}$ is $\frac{1}{2b}e^{-|x|/b}=\frac{\varepsilon}{2}e^{-\varepsilon|x|}$.)
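In code, computing the noisy values $M'_{i}$ is a single line of Laplace noise addition (a sketch; the function name is ours):

```python
import numpy as np

def noisy_subspace_maxima(M, eps, rng=np.random.default_rng()):
    """M[i] = M_i(S) for i = 0, ..., d-1; each M_i has sensitivity 1
    (Lemma 4.2).  Returns M'_i = M_i + Y_i with Y_i ~ Laplace(0, 1/eps)."""
    M = np.asarray(M, dtype=float)
    return M + rng.laplace(loc=0.0, scale=1.0 / eps, size=M.shape)
```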

Our algorithm now uses a given confidence parameter $\beta\in(0,1)$ and proceeds as follows. If for every $j=0,\ldots,d-1$

$$M'_{j}\leq n-(d-j+1)k-\frac{1}{\varepsilon}\log\frac{2}{\beta}, \tag{1}$$

we apply the base case. Otherwise, set $j$ to be the smallest index such that

$$M'_{j}>n-(d-j+1)k-\frac{1}{\varepsilon}\log\frac{2}{\beta}. \tag{2}$$

Having fixed $j$, we find (privately) a subspace $f$ of dimension $j$ that contains a large number of points of $S$. To do so, let $Z_{j}$ be the collection of all $j$-dimensional subspaces that are spanned by $j+1$ affinely independent points of the grid $G$ (not necessarily of $S$). We apply the exponential mechanism to pick an element of $Z_{j}$, by assigning a score $s(f)$ to each subspace of $Z_{j}$, which we set to be

$$s(f)=\max\left\{0,\;c(f)-M_{j-1}\right\},$$

if $j\geq 1$, and $s(f)=c(f)$ if $j=0$. Note that, by its definition, $s(f)$ is zero if $f$ is not spanned by points of $S$. Indeed, in this case the $c(f)$ points contained in $f$ span some subspace of dimension $\ell\leq j-1$, and therefore $M_{j-1}$ must be at least as large as $c(f)$. We will shortly argue that $s(f)$ has sensitivity at most $2$ (Lemma 4.4), and thus conclude that this step preserves the differential privacy of the procedure.

We would like to apply the exponential mechanism as stated above in time proportional to the number of subspaces of non-zero score, because this number depends only on $n$ (albeit being exponential in $d$) and not on (the much larger) $X$. However, to normalize the scores to probabilities, we need to know the number of elements of $Z_{j}$ with zero score, or alternatively to obtain the total number of subspaces spanned by $j+1$ points of $G$ (that is, the size of $Z_{j}$).

We do not have a simple expression for $|Z_{j}|$ (although this is a quantity that can be computed, for each $j$, independently of $S$, once and for all), but clearly $|Z_{j}|\leq X^{d(j+1)}$. We thus augment $Z_{j}$ (only for the purpose of analysis) with $X^{d(j+1)}-|Z_{j}|$ “dummy” subspaces, and denote the augmented set by $Z'_{j}$, whose cardinality is now exactly $X^{d(j+1)}$. We draw a subspace $f$ from $Z'_{j}$ using the exponential mechanism. To do so we need to compute, for each score $s\geq 0$, the number $N_{s}$ of elements of $Z'_{j}$ that have score $s$, give each such element weight $e^{\varepsilon s/4}$, choose the index $s$ with probability proportional to $N_{s}e^{\varepsilon s/4}$, and then choose uniformly a subspace from those with score $s$. It is easy to check that this is indeed an implementation of the exponential mechanism as described in Section 2.1.
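Once the counts $N_{s}$ are in hand, the score-selection step is a short computation; a Python sketch (our naming; for very large dummy counts $N_{0}$ one would work in log-space instead):

```python
import numpy as np

def pick_score_class(N, eps, rng=np.random.default_rng()):
    """N[s] = number of elements of Z'_j with score s (N[0] includes the
    dummy subspaces).  Picks s with probability proportional to
    N[s] * e^{eps s / 4}; a subspace is then chosen uniformly from the
    elements with the chosen score."""
    N = np.asarray(N, dtype=float)
    s = np.arange(len(N))
    w = N * np.exp(eps * (s - s.max()) / 4)  # shift exponent for stability
    return int(rng.choice(s, p=w / w.sum()))
```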

If the drawing procedure decides to pick a subspace that is not spanned by points of $S$, or more precisely decides to pick a subspace of score $0$, we stop the whole procedure, with a failure. If the selected score is positive, the subspace to be picked is spanned by $j+1$ points of $S$, and those subspaces are available (from the dual arrangement construction outlined above). We thus obtain a selected subspace $f$ (by randomly choosing an element from the available list of these subspaces), and apply the algorithm recursively within $f$, restricting the input to $S\cap f$. (Strictly speaking, as noted above, we apply the algorithm to a projection of $f$ onto a suitable $j$-dimensional coordinate subspace.)

It follows that we can implement the exponential mechanism on all subspaces in $Z'_{j}$ in time proportional to the number of subspaces spanned by points of $S$, which is $O(n^{d})$, and therefore the running time of this subspace-selection procedure (in each recursive call) is $O(n^{d})$.

4.1.1 Privacy analysis

Lemma 4.2.

Let $S_{1}=S_{0}\cup\{x\}$ and $S_{2}=S_{0}\cup\{y\}$ be two neighboring data sets. Then, for every $i=0,\ldots,d-1$, we have $|M_{i}(S_{1})-M_{i}(S_{2})|\leq 1$.

Proof.

Let $f$ be a subspace of dimension at most $i$ that is spanned by points of $S_{1}$ and contains $M_{i}(S_{1})$ such points. If $f$ does not contain $x$ then $f$ is also a candidate for $M_{i}(S_{2})$, so in this case $M_{i}(S_{2})\geq M_{i}(S_{1})$. If $f$ does contain $x$ then $(S_{1}\cap f)\setminus\{x\}\subseteq S_{0}$ spans a subspace $f'$ of $f$ which is of dimension at most $i$, so it is a candidate for $M_{i}(S_{2})$. Since it does not contain $x$ (and may contain $y$), we have in this case that $M_{i}(S_{2})\geq M_{i}(S_{1})-1$. Therefore we can conclude that in any case $M_{i}(S_{2})\geq M_{i}(S_{1})-1$. A symmetric argument shows that $M_{i}(S_{1})\geq M_{i}(S_{2})-1$. Combining these two inequalities, the lemma follows. ∎

Lemma 4.3.

The value of each $M'_{i}$, for $i=0,\ldots,d-1$, is $\varepsilon$-differentially private.

Proof.

This follows from standard arguments in differential privacy (e.g., see [9, 18]), since, by Lemma 4.2, $M_{i}$ is of sensitivity $1$ (in the sense shown in the lemma). ∎

Since we choose $j$ by comparing each of the $M'_{j}$’s to a value which is the same for neighboring data sets $S_{1}$ and $S_{2}$ (which have the same cardinality $n$), Lemma 4.3 implies that the choice of the dimension $j$ is differentially private.

The next step is to choose the actual subspace ff in which to recurse. The following lemma implies that this step too is differentially private.

Lemma 4.4.

Let $S_{1}$ and $S_{2}$ be as in Lemma 4.2. Then, for each $j=0,\ldots,d-1$ and for every subspace $f\in Z'_{j}$, we have $|s_{S_{1}}(f)-s_{S_{2}}(f)|\leq 2$.

Proof.

Fix $j$ and $f\in Z'_{j}$. Clearly, $|c_{S_{1}}(f)-c_{S_{2}}(f)|\leq 1$, and, by Lemma 4.2, $M_{j-1}$ is also of sensitivity $1$, and the claim follows. ∎

Lemma 4.5.

The subspace-selection procedure described in this section (with all its recursive calls) is $2d^{2}\varepsilon$-differentially private.

Proof.

By Lemma 4.3 the computation of each $M'_{i}$ is $\varepsilon$-differentially private, and by Lemma 4.4 the exponential mechanism on the scores $s(f)$ is also $\varepsilon$-differentially private. Since we compute at most $d$ values $M'_{i}$ at each step, and we recurse at most $d$ times, the claim follows by composition [9, 18]. ∎

Remark 4.6.

We can save a factor of $d$ in Lemma 4.5 by using a framework called the sparse vector technique; see, e.g., [9].

4.1.2 Utility analysis

The following lemma is the key for our utility analysis.

Lemma 4.7.

Let $\beta\in(0,1)$ be a given parameter. For $k=\frac{n}{4d}$ and for every $j=0,\ldots,d-1$ the following properties hold.

(i) If $M_{j}\geq n-(d-j+1)k$ then, with probability at least $1-\beta$,

$$M'_{j}\geq n-(d-j+1)k-\frac{1}{\varepsilon}\log\frac{2}{\beta}.$$

(ii) On the other hand, if $M_{j}\leq n-(d-j+1)k-\frac{2}{\varepsilon}\log\frac{2}{\beta}$, then, with probability at least $1-\beta$,

$$M'_{j}\leq n-(d-j+1)k-\frac{1}{\varepsilon}\log\frac{2}{\beta}.$$
Proof.

(i) follows since the probability that the Laplace noise $Y_{j}$ is smaller than $-\frac{1}{\varepsilon}\log\frac{2}{\beta}$ is at most $\beta$, and (ii) follows since the probability that $Y_{j}$ is larger than $\frac{1}{\varepsilon}\log\frac{2}{\beta}$ is also at most $\beta$. ∎

We say that (the present recursive step of) our algorithm fails if one of the complements of the events specified in Lemma 4.7 happens; that is, the step fails if for some $j$, either (i) $M_{j}\geq n-(d-j+1)k$ and $M'_{j}<n-(d-j+1)k-\frac{1}{\varepsilon}\log\frac{2}{\beta}$, or (ii) $M_{j}\leq n-(d-j+1)k-\frac{2}{\varepsilon}\log\frac{2}{\beta}$ and $M'_{j}>n-(d-j+1)k-\frac{1}{\varepsilon}\log\frac{2}{\beta}$. Otherwise we say that (this step of) our algorithm succeeds.\footnote{Note that there is a ‘grey zone’ of values of $M_{j}$ between these two bounds, in which the step always succeeds.}

It follows from Lemma 4.1 that if the algorithm succeeds and applies the base case then $D_{\geq k}$ is full-dimensional. Furthermore, if the algorithm succeeds and decides to recurse on a subspace of dimension $j$ (according to the rule in (1) and (2)) then, for every $\ell<j$, $M_{\ell}\leq n-(d-\ell+1)k$, and $M_{j}\geq n-(d-j+1)k-\frac{2}{\varepsilon}\log\frac{2}{\beta}$. The following lemma is an easy consequence of this reasoning.

Lemma 4.8.

If the algorithm succeeds, with dimension $j$, and applies the exponential mechanism to pick a $j$-dimensional subspace, then there exists a $j$-dimensional subspace $f$ with score $s(f)=M_{j}-M_{j-1}\geq k-\frac{2}{\varepsilon}\log\frac{2}{\beta}$. Furthermore, if $k\geq\frac{8d^{2}}{\varepsilon}\log X+\frac{8}{\varepsilon}\log\frac{1}{\beta}$ then, with probability at least $1-\beta$, the exponential mechanism picks a subspace $f$ with $s(f)\geq M_{j}-M_{j-1}-\frac{k}{2}\geq\frac{k}{2}-\frac{2}{\varepsilon}\log\frac{2}{\beta}$.

Proof.

The first part of the lemma follows from the definition of success, as just argued. For the second part notice that, since $|Z'_{j}|\leq X^{d^{2}}$, the probability of drawing a subspace $f'\in Z'_{j}$ of score smaller than $M_{j}-M_{j-1}-\frac{k}{2}$ is at most

$$X^{d^{2}}\cdot\frac{e^{\varepsilon(M_{j}-M_{j-1}-k/2)/4}}{e^{\varepsilon(M_{j}-M_{j-1})/4}}=X^{d^{2}}\cdot e^{-\varepsilon k/8}.$$

This expression is at most $\beta$ if $k\geq\frac{8d^{2}}{\varepsilon}\log X+\frac{8}{\varepsilon}\log\frac{1}{\beta}$. ∎

It follows that if our algorithm succeeds and recurses in a subspace $f$ of dimension $j$ then, with probability at least $1-\beta$,

$$c(f)\geq M_{j-1}+s(f)\geq M_{j}-\frac{k}{2}\geq n-(d-j+1)k-\frac{2}{\varepsilon}\log\frac{1}{\beta}-\frac{k}{2}\geq n-\left(d-j+\frac{3}{2}\right)k-\frac{2}{\varepsilon}\log\frac{1}{\beta}.$$

That is, when we recurse in $f$ of dimension $j$ we lose at most $\left(d-j+\frac{3}{2}\right)k+\frac{2}{\varepsilon}\log\frac{1}{\beta}$ points. Denote by $d_{0}=d,d_{1},\ldots,d_{t}$ the sequence of dimensions into which the procedure recurses (reaching the base case at dimension $d_{t}\geq 0$). Hence, keeping $k$ fixed throughout the recursion, at the $r$-th recursive step we lose at most $\left(d_{r}-d_{r+1}+\frac{3}{2}\right)k+\frac{2}{\varepsilon}\log\frac{1}{\beta}$ points. Summing up these losses over $r=0,\ldots,t-1$, the total loss is at most

$$(d_{0}-d_{t})k+\frac{3}{2}kt+\frac{2t}{\varepsilon}\log\frac{1}{\beta}\leq\frac{5d}{2}\cdot k+\frac{2d}{\varepsilon}\log\frac{1}{\beta}.$$

Substituting $k=\frac{n}{4d}$, we get that the total number of points that we lose is at most $\frac{2n}{3}$ if $n=\Omega\left(\frac{d}{\varepsilon}\log\frac{1}{\beta}\right)$, with a sufficiently large constant of proportionality.

Notice that we keep $k$ fixed throughout the recursion while $n$ may decrease. Consequently, if $n'$ is the number of points in some recursive call in some dimension $\ell<d$, then $n'\geq\frac{n}{3}$, and therefore $k=\frac{n}{4d}\leq\frac{3n'}{4d}$, which is still smaller than the centerpoint guarantee of $\frac{n'}{\ell+1}$.

As described, our subspace-selection procedure (with all its recursive calls) is $2d^{2}\varepsilon$-differentially private. Dividing $\varepsilon$ by $2d^{2}$, we get that our subspace-selection procedure is $\varepsilon$-differentially private, and that the total number of points we lose is much smaller than $n$ if $n=\Omega\left(\frac{d^{3}}{\varepsilon}\log\frac{1}{\beta}\right)$.

Recall Section 3, where we showed that we need $n=\Omega\left(\frac{d^{4}\log dX}{\varepsilon}\right)$ for the ($\varepsilon$-differentially private) base case to work correctly. (Recall also that the base case is actually applied in a suitable projection of the terminal subspace onto some coordinate-frame subspace of the same dimension, and that the above lower bound on $n$ suffices for any such lower-dimensional instance too.)

The following theorem summarizes the result of this section.

Theorem 4.9.

Given $n=\Omega\left(\frac{d^{4}\log dX}{\varepsilon}+\frac{d^{3}\log\frac{1}{\beta}}{\varepsilon}\right)$ points, our algorithm (including all recursive calls and the base case) is $\varepsilon$-differentially private, runs in $O\left(n^{1+(d-1)\lfloor d/2\rfloor}\right)$ time, and finds a point of depth at least $k=\frac{n}{4d}$ with probability at least $1-2d^{2}\beta$.

Proof.

The privacy statement follows by composition, using Lemma 4.5 and the privacy properties of the exponential mechanism. The confidence bound follows since the probability of failure of the recursive call in a subspace of dimension $j$ is at most $(j+1)\beta$. The running time of the algorithm is dominated by the running time of the exponential mechanism that we perform at the bottom (base case) of the recursion. ∎

5 An $O(n^{d})$ algorithm via volume estimation

The upper bound on the running time in Theorem 4.9 is dominated by the running time of the base case, in which we compute all the regions $D_{\geq\ell}$ explicitly, which takes $n^{O(d^{2})}$ time. To reduce this cost, we use instead a mechanism that (a) estimates the volume of $D_{k}$ to a sufficiently small relative error, and (b) samples a random point “almost” uniformly from $D_{k}$. We show how to accomplish (a) and (b) using the volume estimation mechanisms of Lovász and Vempala [13] and later of Cousins and Vempala [7]. We also show how to use these procedures to implement approximately the exponential mechanism described in Section 3. These algorithms are Monte Carlo, so they fail with some probability, and when they fail we may lose our $\varepsilon$-differential privacy guarantee. As a result, the modified algorithm will not be purely differentially private, as the one in Section 3 is, and we will only be able to guarantee that it is $(\varepsilon,\delta)$-differentially private, for any prescribed $\delta>0$. We obtain the following theorem, which we prove in the rest of this section.

Theorem 5.1.

Given $n=\Omega\left(\frac{d^{4}\log\frac{dX}{\delta}}{\varepsilon}\right)$ points, our algorithm (including all recursive calls and the base case) is $(\varepsilon,\delta)$-differentially private, runs in $O\left(n^{d}\right)$ time, and finds a point of depth at least $k=\frac{n}{4d}$ with probability at least $1-\delta$.

5.1 Volume estimation

We use the following result of Cousins and Vempala for estimating the volumes of the convex polytopes $D_{\geq k}$.

Theorem 5.2 (Cousins and Vempala [7, Theorem 1.1]).

Let $K$ be a convex body in $\mathbb{R}^{d}$ that contains the unit ball around the origin and satisfies $E_{K}(\|X\|^{2})=O(d)$.\footnote{Here $E_{K}$ is expectation under the uniform measure within $K$.} Then, for any $\alpha,\beta>0$, there is an algorithm that can access $K$ only via membership queries (each determining whether a query point $q$ lies in $K$), and, with probability at least $1-\beta$, approximates the volume of $K$ with a relative error $\alpha$ (i.e., in the range $1\pm\alpha$ times the true volume), and performs $O\left(\frac{d^{3}}{\alpha^{2}}\log^{2}d\,\log^{2}\frac{1}{\alpha}\,\log^{2}\frac{d}{\alpha}\,\log\frac{1}{\beta}\right)$ membership queries in $K$.

We do not know when the algorithm fails; failure simply means that the value that it returns is not within the asserted range.

Membership tests.

Given a query point $q$ and a parameter $\ell$, to test whether $q\in D_{\geq\ell}$ amounts to verifying that $q$ lies in each of the halfspaces whose intersection forms $D_{\geq\ell}$. Lemma 3.1 shows that there are at most $O(n^{d-1})$ such halfspaces, and we can prepare in advance a list of these halfspaces, as a by-product of the construction of the arrangement $\mathcal{A}(S^{*})$ of the hyperplanes dual to the points in $S$. Thus we can implement a membership query in $O(n^{d-1})$ time. (Recall that we hide in the $O(\cdot)$-notation factors that are polynomial in $d$.)
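With the halfspaces stored, say, as a matrix inequality $\{x : Ax\leq b\}$ (a representation we choose here for illustration), a membership query is one pass over the rows:

```python
import numpy as np

def in_region(q, A, b):
    """Membership oracle for D_{>=l}: one row of A (and entry of b) per
    halfspace, precomputed from the dual arrangement.  O(n^{d-1}) per query."""
    return bool(np.all(A @ np.asarray(q, dtype=float) <= b))
```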

In order to bring $D_{\geq\ell}$ into the ‘rounded’ form assumed in Theorem 5.2, we use an algorithm of Lovász and Vempala [13] that brings a convex body $K$ into an approximately isotropic position.

Definition 5.3.

A convex body $K$ in $\mathbb{R}^{d}$ is in isotropic position if for every unit vector $v\in\mathbb{R}^{d}$, $\int_{K}(v\cdot x)^{2}\,dx=1$. $K$ is in $t$-nearly isotropic position if for every unit vector $v\in\mathbb{R}^{d}$, $\frac{1}{t}\leq\int_{K}(v\cdot x)^{2}\,dx\leq t$.

It is easy to verify that if $K$ is in $t$-nearly isotropic position then $E_{K}(\|X\|^{2})\leq td$.

Theorem 5.4 (Lovász and Vempala [13, Theorem 1.1]).

There is an algorithm that, given a convex body $K$ that contains the unit ball around the origin and is contained in a ball of radius $R$, and a confidence parameter $\beta>0$, brings $K$ into a $2$-nearly isotropic position. The algorithm succeeds with probability at least $1-\beta$, and performs $O(d^{4}\log^{7}d\,\log\frac{d}{\beta}\,\log R)$ membership queries in $K$.

We can use some of our observations in Section 3 to show:

Lemma 5.5.

Each of the polytopes $D_{\geq\ell}$ of nonzero volume contains a ball of radius at least $\frac{1}{(d+1)\left(d^{(d+1)/2}X^{d}\right)^{d(d+1)+1}}$.

Proof.

As we showed in Section 3, each vertex of $D_{\geq\ell}$ has rational coordinates with a common integral denominator of value at most $d^{d(d+1)/2}X^{d^{2}}$. We take $d+1$ affinely independent vertices of $D_{\geq\ell}$, and denote by $z$ their center of mass, which is guaranteed to lie in the interior of $D_{\geq\ell}$. This center of mass $z$ has rational coordinates with a common integer denominator of value at most $(d+1)\left(d^{d(d+1)/2}X^{d^{2}}\right)^{d+1}$.

We also showed in Section 3 that each hyperplane $h$ defining $D_{\geq\ell}$ has integer coefficients of absolute value at most $d^{d/2}X^{d}$.

The largest radius of a ball enclosed in $D_{\geq\ell}$ and centered at $z$ is the minimum distance of $z$ from the defining (supporting) hyperplanes of $D_{\geq\ell}$. The distance from any such hyperplane $h$ is positive (since $z$ lies in the interior of $D_{\geq\ell}$), and is a rational whose denominator is the common denominator of the coordinates of $z$ times the Euclidean norm of the vector of coefficients of $h$, excluding the free term, which is at most $\sqrt{d}\,d^{d/2}X^{d}$. That is, this distance is at least

$$\frac{1}{(d+1)d^{d(d+1)^{2}/2}X^{d^{2}(d+1)}\cdot d^{(d+1)/2}X^{d}}=\frac{1}{(d+1)\left(d^{(d+1)/2}X^{d}\right)^{d(d+1)+1}}.$$

This implies the lemma. ∎

$D_{\geq\ell}$ is contained in the unit cube, and therefore it is contained in a ball of radius $\sqrt{d}$. It follows that we can scale $D_{\geq\ell}$ so that it satisfies the conditions of Theorem 5.4, with $R=(d+1)^{3/2}\left(d^{(d+1)/2}X^{d}\right)^{d(d+1)+1}$. Hence we have $\log R=O(d^{3}(\log X+\log d))$.

Our entire volume estimation algorithm applies Theorem 5.4 to $D_{\geq\ell}$ and then Theorem 5.2 to the ‘rounded’ polytope. By the union bound, it fails with probability at most $2\beta$. The running time is larger by a factor of $O(d^{2})$ than the number of membership queries.

5.2 Implementing the exponential mechanism

We now show how to use the estimates of the volumes of the polytopes $D_{\geq\ell}$ to sample “almost” uniformly from a region $D_{\ell}$, which is picked with probability $\mu'_{\ell}$, where

$$\frac{1-\alpha}{1+\alpha}\mu_{\ell}\leq\mu'_{\ell}\leq\frac{1+\alpha}{1-\alpha}\mu_{\ell},\qquad\text{and}\qquad\mu_{\ell}=\frac{e^{\varepsilon\ell/2}\,{\rm Vol}(D_{\ell})}{\sum_{j\geq 0}e^{\varepsilon j/2}\,{\rm Vol}(D_{j})},$$

as defined in Section 3.

We carry out this modified sampling by defining new probabilities $\lambda'_{j}$, for $j=0,\ldots,{\sf td}_{\rm max}(S)$ (so that $\sum_{j}\lambda'_{j}=1$), choosing the region $D_{\geq\ell}$ (rather than $D_{\ell}$) with probability $\lambda'_{\ell}$, and then sampling almost uniformly a point in the chosen region $D_{\geq\ell}$, using the sampling algorithm in [7], whose properties are stated in the following theorem. In the theorem, the variation distance between two measures $\mu$, $\nu$ is $\max_{B}|\mu(B)-\nu(B)|$, over all possible events $B$.

Theorem 5.6 (Cousins and Vempala [7, Theorem 1.2]).

Let $K$ be a convex body in $\mathbb{R}^{d}$ that contains the unit ball around the origin and satisfies $E_{K}(\|X\|^{2})=O(d)$. Then, for any $\eta,\beta>0$, there is an algorithm that can access $K$ only via membership queries, and, with probability at least $1-\beta$, generates random points from a distribution $U'_{K}$ that is within total variation distance at most $\eta$ from the uniform distribution on $K$. The algorithm performs $O(d^{3}\log d\,\log^{2}\frac{d}{\eta}\,\log\frac{1}{\beta})$ membership queries for each random point that it generates. The running time is larger by a factor of $O(d^{2})$ than the number of membership queries.

Exact sampling.

Let us denote $V_{\ell}:={\rm Vol}(D_{\geq\ell})$ and $v_{\ell}:={\rm Vol}(D_{\ell})$. We first derive probabilities $\lambda_{\ell}$ such that if we pick $D_{\geq\ell}$ with probability $\lambda_{\ell}$, and then pick a point uniformly from $D_{\geq\ell}$, then we simulate the exponential mechanism (over the regions $D_{\ell}$) exactly.

The probability that $q\in D_{m}$, conditioned on having chosen $D_{\geq\ell}$ to sample from, is $v_{m}/V_{\ell}$ for $m\geq\ell$; it is $0$ for $m<\ell$. Therefore, the unconditional probability that $q\in D_{m}$ is $\left(\sum_{\ell\leq m}\frac{\lambda_{\ell}}{V_{\ell}}\right)v_{m}$, and we want this probability to be proportional to $e^{\varepsilon m/2}v_{m}$. That is, we want all the equalities $\sum_{\ell\leq m}\frac{\lambda_{\ell}}{V_{\ell}}=Ce^{\varepsilon m/2}$, for $m\geq 0$, to hold, for a suitable normalizing parameter $C$. Putting $x_{\ell}=\lambda_{\ell}/V_{\ell}$, we get the triangular system $\sum_{\ell\leq m}x_{\ell}=Ce^{\varepsilon m/2}$, for $m\geq 0$, which solves to

x0=C,andx=C(eε/2eε(1)/2)=C(1eε/2)eε/2,x_{0}=C,\qquad\text{and}\qquad x_{\ell}=C\left(e^{{\varepsilon}\ell/2}-e^{{\varepsilon}(\ell-1)/2}\right)=C\left(1-e^{-{\varepsilon}/2}\right)e^{{\varepsilon}\ell/2},

for 1\ell\geq 1. Then, to ensure that 0λ=1\sum_{\ell\geq 0}\lambda_{\ell}=1, we put

C=1V0+(1eε/2)j1eεj/2Vj,C=\frac{1}{V_{0}+\left(1-e^{-{\varepsilon}/2}\right)\sum_{j\geq 1}e^{{\varepsilon}j/2}V_{j}},

and

λ0=CV0,and λ=C(1eε/2)eε/2V, for 1.\lambda_{0}=CV_{0},\;\text{and }\lambda_{\ell}=C\left(1-e^{-{\varepsilon}/2}\right)e^{{\varepsilon}\ell/2}V_{\ell},\text{ for }\ell\geq 1.
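To make the telescoping solution concrete, the following short numerical sketch (in Python; the volumes and all names are hypothetical, ours) computes CC and the probabilities λ\lambda_{\ell}, and verifies both that they sum to 11 and that choosing DD_{\geq\ell} with probability λ\lambda_{\ell}, followed by a uniform sample from it, lands in DmD_{m} with probability proportional to eεm/2vme^{{\varepsilon}m/2}v_{m}:

```python
import numpy as np

def level_probs(V, eps):
    """Solve the triangular system: return lambda_l, the probability of
    choosing D_{>= l}, given V[l] = Vol(D_{>= l}) for l = 0, ..., L."""
    V = np.asarray(V, float)
    w = np.exp(eps * np.arange(len(V)) / 2)
    C = 1.0 / (V[0] + (1 - np.exp(-eps / 2)) * np.dot(w[1:], V[1:]))
    lam = C * (1 - np.exp(-eps / 2)) * w * V
    lam[0] = C * V[0]
    return lam

eps = 0.4
V = np.array([1.0, 0.55, 0.2, 0.04])   # hypothetical nested-region volumes
lam = level_probs(V, eps)
v = np.append(V[:-1] - V[1:], V[-1])   # v_m = Vol(D_m) = V_m - V_{m+1}
# Pr[q in D_m] = (sum_{l <= m} lambda_l / V_l) * v_m, which should be
# proportional to e^{eps m / 2} v_m, the exponential-mechanism weights.
hit = np.cumsum(lam / V) * v
target = np.exp(eps * np.arange(len(V)) / 2) * v
assert np.isclose(lam.sum(), 1.0)
assert np.allclose(hit / hit.sum(), target / target.sum())
```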
Approximate sampling.

We approximate these probabilities by replacing the exact volumes VV_{\ell} with the approximate values VV^{\prime}_{\ell} that we obtain from Theorems 5.4 and 5.2, and get the corresponding approximations CC^{\prime} to CC and λ\lambda^{\prime}_{\ell} to λ\lambda_{\ell}. By construction, and by the union bound, we have, for the prescribed error parameter α<1\alpha<1, with probability at least 12nβ1-2n\beta, (1α)VV(1+α)V{\displaystyle(1-\alpha)V_{\ell}\leq V^{\prime}_{\ell}\leq(1+\alpha)V_{\ell}}, for every =0,,𝗍𝖽max(S)\ell=0,\ldots,{\sf td}_{\rm max}(S), which is easily seen to imply that

1α1+αλλ1+α1αλ, for every .\frac{1-\alpha}{1+\alpha}\lambda_{\ell}\leq\lambda^{\prime}_{\ell}\leq\frac{1+\alpha}{1-\alpha}\lambda_{\ell},\qquad\text{ for every $\ell$}. (3)
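Continuing the sketch above (reusing level_probs, V, eps, and lam from it), one can also check inequality (3) numerically: perturbing each volume by an independent factor in [1α,1+α][1-\alpha,1+\alpha] and recomputing the selection probabilities always keeps the ratio λ/λ\lambda^{\prime}_{\ell}/\lambda_{\ell} within the stated interval, since the multiplicative errors in CC^{\prime} and in VV^{\prime}_{\ell} compound to at most (1+α)/(1α)(1+\alpha)/(1-\alpha):

```python
rng = np.random.default_rng(1)
alpha = 0.1
for _ in range(1000):
    # Estimated volumes with (1 +/- alpha) relative error, as in the text.
    Vp = V * rng.uniform(1 - alpha, 1 + alpha, size=V.shape)
    ratio = level_probs(Vp, eps) / lam
    assert np.all(ratio >= (1 - alpha) / (1 + alpha) - 1e-12)
    assert np.all(ratio <= (1 + alpha) / (1 - alpha) + 1e-12)
```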

We sample DD_{\geq\ell} using the probabilities λ\lambda^{\prime}_{\ell}, and then sample from DD_{\geq\ell} using Theorem 5.6. Denote by 𝐏𝐫exp(B){\bf Pr}_{\rm exp}(B), for any measurable subset BB of [0,1]d[0,1]^{d}, the probability that the pure exponential mechanism, which uses the (costly to compute) exact values λ\lambda_{\ell}, as reviewed above, returns a point in BB. It follows that the conditional probability 𝐏𝐫(B){\bf Pr}(B) of returning a point in BB, conditioned on all the parameters λ\lambda^{\prime}_{\ell} satisfying the inequalities (3), and conditioned on successful sampling from DD_{\geq\ell} using Theorem 5.6, satisfies

Lemma 5.7.
1α1+α𝐏𝐫exp(B)η𝐏𝐫(B)1+α1α𝐏𝐫exp(B)+η.\frac{1-\alpha}{1+\alpha}{\bf Pr}_{\rm exp}(B)-\eta\leq{\bf Pr}(B)\leq\frac{1+\alpha}{1-\alpha}{\bf Pr}_{\rm exp}(B)+\eta\ . (4)
Proof.

Indeed, by definition of our algorithm we have that

𝐏𝐫(B)=0𝐏𝐫(B|D)𝐏𝐫(D),{\bf Pr}(B)=\sum_{\ell\geq 0}{\bf Pr}\left(B|D_{\geq\ell}\right)\cdot{\bf Pr}(D_{\geq\ell}), (5)

where DD_{\geq\ell} is a mnemonic for the event that the mechanism decides to sample from the region DD_{\geq\ell}. By Theorem 5.6 we have that

𝐏𝐫u(B|D)η𝐏𝐫(B|D)𝐏𝐫u(B|D)+η,{\bf Pr}_{u}\left(B|D_{\geq\ell}\right)-\eta\leq{\bf Pr}\left(B|D_{\geq\ell}\right)\leq{\bf Pr}_{u}\left(B|D_{\geq\ell}\right)+\eta\ , (6)

where 𝐏𝐫u(B|D){\bf Pr}_{u}\left(B|D_{\geq\ell}\right) is the probability of obtaining a point in BB when we sample uniformly from DD_{\geq\ell}. Combining Equations (5) and (6), we get that

0𝐏𝐫u(B|D)𝐏𝐫(D)η𝐏𝐫(B)0𝐏𝐫u(B|D)𝐏𝐫(D)+η.\sum_{\ell\geq 0}{\bf Pr}_{u}\left(B|D_{\geq\ell}\right)\cdot{\bf Pr}(D_{\geq\ell})-\eta\leq{\bf Pr}(B)\leq\sum_{\ell\geq 0}{\bf Pr}_{u}\left(B|D_{\geq\ell}\right)\cdot{\bf Pr}(D_{\geq\ell})+\eta\ . (7)

Now 𝐏𝐫(D){\bf Pr}(D_{\geq\ell}) in our approximation scheme is λ\lambda^{\prime}_{\ell}, so we get, using (3), that

1α1+α0λ𝐏𝐫u(B|D)η𝐏𝐫(B)1+α1α0λ𝐏𝐫u(B|D)+η,\frac{1-\alpha}{1+\alpha}\sum_{\ell\geq 0}\lambda_{\ell}{\bf Pr}_{u}\left(B|D_{\geq\ell}\right)-\eta\leq{\bf Pr}(B)\leq\frac{1+\alpha}{1-\alpha}\sum_{\ell\geq 0}\lambda_{\ell}{\bf Pr}_{u}\left(B|D_{\geq\ell}\right)+\eta\ ,

which implies the lemma by our argument regarding the exact simulation of the exponential mechanism given above. ∎

If we set η=δ\eta=\delta, take β\beta to be a suitable constant multiple of δn\frac{\delta}{n} (the division by nn accounts for the failure probability of the volume estimation over all the regions DD_{\geq\ell}), and take α\alpha to be a suitable constant multiple of ε{\varepsilon}, then it follows from Equation (4) that our approximation of the exponential mechanism is (ε,δ)({\varepsilon},\delta)-differentially private.

It is also easy to verify that the utility of our approximate exponential mechanism is essentially the same as that of the exact mechanism analyzed in Section 3. This proves Theorem 5.1. ∎

6 Conclusions

We gave an O(nd)O(n^{d})-time algorithm for privately computing a point in the convex hull of Ω(d4logX)\Omega(d^{4}\log X) points whose coordinates are multiples of 1/X1/X in [0,1][0,1]. Although this is a huge improvement over what was previously known, obtained through nontrivial technical effort and with sophisticated sampling and volume-estimation tools, the running time is still not satisfactory for large values of dd. The main hurdle to improving it further is the lack of efficient algorithms for computing Tukey depths and Tukey levels.

The main question that we leave open is whether there exists a differentially private algorithm for this task that runs in time polynomial in nn and dd (when the input size nn is still polynomial in logX\log X and dd).

Acknowledgments

We thank Santosh Vempala for many helpful discussions.

References

  • [1] Amos Beimel, Shiva Prasad Kasiviswanathan, and Kobbi Nissim. Bounds on the sample complexity for private learning and private data release. In TCC, volume 5978 of LNCS, pages 437–454. Springer, 2010.
  • [2] Amos Beimel, Shay Moran, Kobbi Nissim, and Uri Stemmer. Private center points and learning of halfspaces. In Conference on Learning Theory (COLT), pages 269–282, 2019.
  • [3] Amos Beimel, Kobbi Nissim, and Uri Stemmer. Private learning and sanitization: Pure vs. approximate differential privacy. In APPROX-RANDOM, volume 8096 of LNCS, pages 363–378. Springer, 2013.
  • [4] Mark Bun, Cynthia Dwork, Guy N. Rothblum, and Thomas Steinke. Composable and versatile privacy via truncated CDP. In 50th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 74–86, 2018.
  • [5] Mark Bun, Kobbi Nissim, Uri Stemmer, and Salil P. Vadhan. Differentially private release and learning of threshold functions. In IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), pages 634–649, 2015.
  • [6] Bernard Chazelle. An optimal convex hull algorithm in any fixed dimension. Discrete Comput. Geom., 10:377–409, 1993.
  • [7] Ben Cousins and Santosh S. Vempala. Gaussian cooling and O(n3){O}^{*}(n^{3}) algorithms for volume and Gaussian volume. SIAM J. Comput., 47(3):1237–1273, 2018.
  • [8] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In TCC, volume 3876 of LNCS, pages 265–284. Springer, 2006.
  • [9] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., 9(3-4), 2014.
  • [10] Herbert Edelsbrunner, Raimund Seidel, and Micha Sharir. On the zone theorem for hyperplane arrangements. SIAM J. Comput., 22(2):418–429, 1993.
  • [11] Haim Kaplan, Katrina Ligett, Yishay Mansour, Moni Naor, and Uri Stemmer. Privately learning thresholds: Closing the exponential gap. CoRR, abs/1911.10137, 2019. URL: http://arxiv.org/abs/1911.10137.
  • [12] Xiaohui Liu, Karl Mosler, and Pavlo Mozharovskyi. Fast computation of Tukey trimmed regions and median in dimension p>2p>2. J. of Comput. and Graph. Stat., 28(3):682–697, 2019.
  • [13] László Lovász and Santosh S. Vempala. Simulated annealing in convex bodies and an O(n4){O}^{*}(n^{4}) volume algorithm. J. Comput. Syst. Sci., 72(2):392–417, 2006.
  • [14] Jiří Matoušek. Lectures on Discrete Geometry. Springer-Verlag, Berlin, Heidelberg, 2002.
  • [15] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 94–103, 2007.
  • [16] Peter J. Rousseeuw and Ida Ruts. Constructing the bivariate Tukey median. Statistica Sinica, 8(3):827–839, 1998.
  • [17] John W. Tukey. Mathematics and the picturing of data. In Proc. of the International Congress of Mathematicians, volume 2, pages 523–531, 1975.
  • [18] Salil Vadhan. The complexity of differential privacy. In Yehuda Lindell, editor, Tutorials on the Foundations of Cryptography: Dedicated to Oded Goldreich, pages 347–450. Springer, 2017.