
Optimal Algorithms for Mean Estimation under Local Differential Privacy

Hilal Asi Stanford University, part of this work performed while interning at Apple; [email protected].    Vitaly Feldman Apple; [email protected].    Kunal Talwar Apple; [email protected].
Abstract

We study the problem of mean estimation of \ell_{2}-bounded vectors under the constraint of local differential privacy. While the literature has a variety of algorithms that achieve the asymptotically optimal rates for this problem, the performance of these algorithms in practice can vary significantly due to varying (and often large) hidden constants. In this work, we investigate the question of designing the protocol with the smallest variance. We show that PrivUnit [BDFKR18] with optimized parameters achieves the optimal variance among a large family of locally private randomizers. To prove this result, we establish some properties of local randomizers, and use symmetrization arguments that allow us to write the optimal randomizer as the optimizer of a certain linear program. These structural results, which should extend to other problems, then allow us to show that the optimal randomizer belongs to the PrivUnit family.

We also develop a new variant of PrivUnit based on the Gaussian distribution which is more amenable to mathematical analysis and enjoys the same optimality guarantees. This allows us to establish several useful properties of the exact constants of the optimal error, as well as to numerically estimate these constants.

1 Introduction

Mean estimation is one of the most fundamental problems in machine learning and is the building block of a countless number of algorithms and applications including stochastic optimization [Duc18], federated learning [BIKMMPRSS17] and others. However, it is now evident that standard algorithms for this task may leak sensitive information about users' data and compromise their privacy. This has led to the development of numerous algorithms for estimating the mean while preserving the privacy of users. The most common models for privacy are either the central model, where there exists a trusted curator, or the local model, where no such trusted curator exists.

In this work, we study the problem of mean estimation in the local privacy model. More specifically, we have n users, each with a vector v_{i} in the Euclidean unit ball in \mathbb{R}^{d}. Each user will use a randomizer \mathcal{R}:\mathbb{R}^{d}\to\mathcal{Z} to privatize their data, where \mathcal{R} must satisfy \varepsilon-differential privacy, namely, for any v_{1} and v_{2}, P(\mathcal{R}(v_{1})=u)/P(\mathcal{R}(v_{2})=u)\leq e^{\varepsilon}. Then, we run an aggregation method \mathcal{A}:\mathcal{Z}^{n}\to\mathbb{R}^{d} such that \mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n})) provides an estimate of \frac{1}{n}\sum_{i=1}^{n}v_{i}. Our goal in this work is to characterize the optimal protocol (pair of randomizer \mathcal{R} and aggregation method \mathcal{A}) for this problem and study the resulting optimal error.

Due to its importance and many applications, the problem of private mean estimation in the local model has been studied by numerous papers [BDFKR18, FT21, CKÖ20]. As a result, a clear understanding of the asymptotic optimal rates has emerged, showing that the optimal squared error is proportional to \Theta(\frac{d}{n\min(\varepsilon,\varepsilon^{2})}): [DJW18, BDFKR18] developed algorithms that obtain this rate and [DR19] proved corresponding lower bounds. Subsequent papers [FT21, CKÖ20] have developed several other algorithms that achieve the same rates.

However, these optimality results do not give a clear characterization of which algorithm will enjoy better performance in practice. Constant factors here matter more than they do in run time or memory, as ε\varepsilon is typically limited by privacy constraints, and increasing the sample size by collecting data for more individuals is often infeasible or expensive. The question of finding the randomizer with the smallest error is therefore of great interest.

1.1 Our contributions

Motivated by these limitations, we investigate strict optimality for the problem of mean estimation with local privacy. We study the family of non-interactive and unbiased protocols, that is, a protocol is a pair of a local private randomizer \mathcal{R}:\mathbb{S}^{d-1}\to\mathcal{Z} and an aggregation method \mathcal{A}:\mathcal{Z}^{n}\to\mathbb{R}^{d}, where the protocol outputs \mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n})) such that \mathbb{E}[\mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n}))]=\frac{1}{n}\sum_{i=1}^{n}v_{i}. We measure the error of a private protocol in terms of its (worst-case) mean squared error

\mathsf{Err}_{n}(\mathcal{A},\mathcal{R})=\sup_{v_{1},\dots,v_{n}\in\mathbb{S}^{d-1}}\mathbb{E}\left[\left\|\mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n}))-\frac{1}{n}\sum_{i=1}^{n}v_{i}\right\|_{2}^{2}\right].

We obtain the following results.

First, we show that PrivUnit of [BDFKR18] with optimized parameters is optimal amongst a large family of protocols. Our strategy for proving optimality consists of two main steps: first, we show that for non-interactive protocols, additive aggregation with a certain randomizer attains the optimal error. Then, for protocols with additive aggregation, we show that PrivUnit obtains the optimal error. Our proof builds on establishing several new properties of the optimal local randomizer, which allow us to express the problem of designing the optimal randomizer as a linear program. This in turn helps characterize the structure of optimal randomizers and allows us to show that there is an optimal randomizer which is an instance of PrivUnit.

Finding the exact constants in the error of PrivUnit is mathematically challenging. Our second contribution is to develop a new algorithm PrivUnitG that builds on the Gaussian distribution and attains the same error as PrivUnit up to a (1+o(1)) multiplicative factor as d\to\infty. In contrast to PrivUnit, we show that the optimal parameters of PrivUnitG are independent of the dimension, hence enabling efficient calculation of the constants for high-dimensional settings. Moreover, PrivUnitG is amenable to mathematical analysis, which yields several properties of the constants of the optimal error.

1.2 Related work

Local privacy is perhaps one of the oldest forms of privacy and dates back to [War65], who used it to encourage truthfulness in surveys. This definition resurfaced again in the context of modern data analysis by [EGS03] and was related to differential privacy in the seminal work of [DMNS06]. Local privacy has attracted a lot of interest, both in the academic community [BNO08, DJW18, BDFKR18], and in industry where it has been deployed in several industrial applications [EPK14, App17]. Recent work in the shuffle model of privacy [BEMMRLRKTS17, CSUZZ19, EFMRTT19, BBGN19, FMT20] has led to increased interest in the local model with moderate values of the local privacy parameter, as they can translate to small values of the central \varepsilon under shuffling.

The problem of locally private mean estimation has received a great deal of attention in the past decade [DJW18, BDFKR18, DR19, EFMRSTT20, ASYKM18, GDDKS20, CKÖ20, GKMM19, FT21]. [DJW18] developed asymptotically optimal procedures for estimating the mean when \varepsilon\leq 1, achieving expected squared error O(\frac{d}{n\varepsilon^{2}}). [BDFKR18] proposed a new algorithm that is optimal for \varepsilon\geq 1 as well, achieving error O(\frac{d}{n\min(\varepsilon,\varepsilon^{2})}). These rates are optimal as [DR19] show tight lower bounds which hold for interactive protocols. There has been more work on locally private mean estimation that studies the problem with additional constraints such as communication cost [EFMRSTT20, FT21, CKÖ20].

[YB17, YB18] study (non-interactive) locally private estimation problems with discrete domains and design algorithms that achieve optimal rates. These optimality results are not restricted to the family of unbiased private mechanisms. However, in contrast to our work, these results are only asymptotic, so their upper and lower bounds match only as the number of samples goes to infinity.

While there are several results in differential privacy that establish asymptotically matching lower and upper bounds for various problems of interest, strict optimality results are few. Some results are known for the one-dimensional problem [GRS09, GS10], some of which extend to a large class of utility functions, but such universal mechanisms are known not to exist for multidimensional problems [BN10]. [GKOV15, KOV16] show that for certain loss functions, one can phrase the problem of designing optimal local randomizers as linear programs, whose size is exponential in the size of the input domain.

2 Problem setting and preliminaries

We begin in this section by defining local differential privacy. To this end, we say that two probability distributions P and Q are (\varepsilon,\delta)-close if for every event E

e^{-\varepsilon}(P(E)-\delta)\leq Q(E)\leq e^{\varepsilon}P(E)+\delta.

We say two random variables are (\varepsilon,\delta)-close if their distributions are (\varepsilon,\delta)-close.

We can now define local DP randomizers.

Definition 2.1.

A randomized algorithm \mathcal{R}:X\to Y is a (replacement) (\varepsilon,\delta)-DP local randomizer if for all x,x^{\prime}\in X, \mathcal{R}(x) and \mathcal{R}(x^{\prime}) are (\varepsilon,\delta)-close.

In this work, we will primarily be interested in pure DP randomizers, i.e., those which satisfy (\varepsilon,0)-DP. We abbreviate this as \varepsilon-DP. In the setting of local randomizers, the difference between (\varepsilon,\delta)-DP and pure DP is not significant; indeed any (\varepsilon,\delta)-DP local randomizer can be converted [FMT20, CU21] to one that satisfies \varepsilon-DP while changing the distributions by a statistical distance of at most O(\delta).

The main problem we study in this work is locally private mean estimation. Here, we have n unit vectors v_{1},\dots,v_{n}\in\mathbb{R}^{d}, i.e., v_{i}\in\mathbb{S}^{d-1}. The goal is to design (locally) private protocols that estimate the mean \frac{1}{n}\sum_{i=1}^{n}v_{i}. We focus on the setting of non-interactive private protocols: such a protocol consists of a pair of a private local randomizer \mathcal{R}:\mathbb{S}^{d-1}\to\mathcal{Z} and an aggregation method \mathcal{A}:\mathcal{Z}^{n}\to\mathbb{R}^{d}, where the final output is \mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n})). We require that the output is unbiased, that is, \mathbb{E}[\mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n}))]=\frac{1}{n}\sum_{i=1}^{n}v_{i}, and wish to find private protocols that minimize the variance

\mathsf{Err}_{n}(\mathcal{A},\mathcal{R})=\sup_{v_{1},\dots,v_{n}\in\mathbb{S}^{d-1}}\mathbb{E}\left[\left\|\mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n}))-\frac{1}{n}\sum_{i=1}^{n}v_{i}\right\|_{2}^{2}\right].

Note that in the above formulation, the randomizer \mathcal{R} can have an arbitrary output domain (not necessarily \mathbb{R}^{d}), and the aggregation method can be arbitrary as well. However, one important special family of private protocols, which we term canonical private protocols, are protocols where the local randomizer \mathcal{R}:\mathbb{S}^{d-1}\to\mathbb{R}^{d} has outputs in \mathbb{R}^{d} and the aggregation method is the simple additive aggregation \mathcal{A}^{+}(z_{1},\dots,z_{n})=\frac{1}{n}\sum_{i=1}^{n}z_{i}. In addition to being a natural family of protocols, canonical protocols are (i) simple and easy to implement, and (ii) achieve the smallest possible variance amongst the family of all possible unbiased private protocols, as we show in the subsequent sections.
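To make the protocol interface concrete, the following is a minimal Python sketch (our own illustration, not code from the paper) of a canonical protocol: each user privatizes locally with some unbiased randomizer, and the server simply averages the reports.

```python
import numpy as np

def additive_aggregation(vectors, randomizer):
    """The canonical protocol A^+ applied to a list of unit vectors.

    `randomizer` is any unbiased eps-DP local randomizer mapping the
    unit sphere to R^d (e.g., PrivUnit, sketched later in the paper).
    """
    reports = [randomizer(v) for v in vectors]
    return np.mean(reports, axis=0)  # unbiased estimate of (1/n) sum_i v_i
```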

Notation

We let \mathbb{S}^{d-1}=\{u\in\mathbb{R}^{d}:\left\|u\right\|_{2}=1\} denote the unit sphere, and R\cdot\mathbb{S}^{d-1} denote the sphere of radius R>0. Whenever clear from context, we use the shorter notation \mathbb{S}. Given a random variable V, we let f_{V} denote the probability density function of V. For a randomizer \mathcal{R} and input v, f_{\mathcal{R}(v)} denotes the probability density function of the random variable \mathcal{R}(v). For a Gaussian random variable V\sim\mathsf{N}(0,\sigma^{2}) with \sigma>0, we let \phi_{\sigma}:\mathbb{R}\to\mathbb{R} denote the probability density function of V and \Phi_{\sigma}:\mathbb{R}\to[0,1] denote its cumulative distribution function. For ease of notation, we write \phi and \Phi when \sigma=1. Given two random variables V and U, we say that V\stackrel{d}{=}U if V and U have the same distribution, that is, f_{V}=f_{U}. Finally, we let e_{i}\in\mathbb{R}^{d} denote the standard basis vectors and O(d)=\{U\in\mathbb{R}^{d\times d}:UU^{T}=I\} denote the set of orthogonal matrices of dimension d.

3 Optimality of PrivUnit

In this section, we prove our main optimality results showing that PrivUnit with additive aggregation achieves the optimal error among the family of unbiased locally private procedures. More precisely, we show that for any \varepsilon-DP local randomizer \mathcal{R}:\mathbb{R}^{d}\to\mathcal{Z} and any aggregation method \mathcal{A}:\mathcal{Z}^{n}\to\mathbb{R}^{d} that is unbiased,

\mathsf{Err}_{n}(\mathcal{A}^{+},\text{PrivUnit})\leq\mathsf{Err}_{n}(\mathcal{A},\mathcal{R}).

We begin in Section 3.1 by introducing the algorithm PrivUnit and stating its optimality guarantees in Section 3.2. To prove the optimality result, we begin in Section 3.3 by showing that there exists a canonical private protocol that achieves the optimal error; then in Section 3.4 we show that PrivUnit is the optimal local randomizer in the family of canonical protocols.

3.1 PrivUnit

We begin by introducing PrivUnit, which was developed by [BDFKR18]. Given an input vector v\in\mathbb{S}^{d-1} and letting W\sim\mathsf{Uni}(\mathbb{S}^{d-1}), \text{PrivUnit}(p,\gamma) has the following distribution (up to normalization)

\text{PrivUnit}(p,\gamma)\sim\begin{cases}W\mid\langle W,v\rangle\geq\gamma&\text{with prob. }p\\ W\mid\langle W,v\rangle<\gamma&\text{with prob. }1-p\end{cases}

A normalization factor is needed to obtain the correct expectation. We provide full details in Algorithm 1.

The following theorem states the privacy guarantees of PrivUnit. Theorem 1 in [BDFKR18] provides privacy guarantees based on several mathematical approximations which may not be tight. For our optimality results, we require the following exact privacy guarantee of PrivUnit.

Theorem 1.

[BDFKR18, Theorem 1] Let q=P(W_{1}\leq\gamma) where W\sim\mathsf{Uni}(\mathbb{S}^{d-1}). If \frac{p}{1-p}\frac{q}{1-q}\leq e^{\varepsilon} then \text{PrivUnit}(p,\gamma) is an \varepsilon-DP local randomizer.

Throughout the paper, we will sometimes use the equivalent notation \text{PrivUnit}(p,q), which describes running \text{PrivUnit}(p,\gamma) with q=P(W_{1}\leq\gamma) as in Theorem 1.
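For concreteness, q can be computed in closed form: if W\sim\mathsf{Uni}(\mathbb{S}^{d-1}), then (W_{1}+1)/2 follows a \mathsf{Beta}(\frac{d-1}{2},\frac{d-1}{2}) distribution, so q=P(W_{1}\leq\gamma) is a regularized incomplete Beta value. A short Python sketch (our own illustration) of the privacy check of Theorem 1:

```python
import numpy as np
from scipy.special import betainc  # regularized incomplete Beta I_x(a, b)

def satisfies_theorem_1(p, gamma, d, eps):
    """Check the sufficient condition of Theorem 1 for PrivUnit(p, gamma)."""
    a = (d - 1) / 2.0
    q = betainc(a, a, (1.0 + gamma) / 2.0)   # q = P(W_1 <= gamma)
    return (p / (1 - p)) * (q / (1 - q)) <= np.exp(eps)
```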

Algorithm 1 \text{PrivUnit}(p,\gamma)
0:  v\in\mathbb{S}^{d-1}, \gamma\in[0,1], p\in[0,1]. B(\cdot;\cdot,\cdot) below is the incomplete Beta function B(x;a,b)=\int_{0}^{x}t^{a-1}(1-t)^{b-1}\,\textrm{d}t and B(a,b)=B(1;a,b).
1:  Draw z\sim\mathsf{Ber}(p)
2:  if z=1 then
3:     Draw V\sim\mathsf{Uni}\{u\in\mathbb{S}^{d-1}:\langle u,v\rangle\geq\gamma\}
4:  else
5:     Draw V\sim\mathsf{Uni}\{u\in\mathbb{S}^{d-1}:\langle u,v\rangle<\gamma\}
6:  Set \alpha=\frac{d-1}{2} and \tau=\frac{1+\gamma}{2}
7:  Calculate normalization constant
m=\frac{(1-\gamma^{2})^{\alpha}}{2^{d-2}(d-1)}\left(\frac{p}{B(\alpha,\alpha)-B(\tau;\alpha,\alpha)}-\frac{1-p}{B(\tau;\alpha,\alpha)}\right)
8:  Return \frac{1}{m}\cdot V
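The following Python sketch mirrors Algorithm 1 (a simple illustration under the assumption that v is a unit vector; it uses rejection sampling for the spherical cap, which is correct but slow when \gamma is close to 1):

```python
import numpy as np
from scipy.special import betainc, beta as beta_fn

def priv_unit(v, p, gamma, rng=None):
    """Sketch of PrivUnit(p, gamma) for a unit vector v in R^d."""
    rng = rng or np.random.default_rng()
    d = len(v)
    in_cap = rng.random() < p                    # z ~ Ber(p)
    while True:                                  # uniform over cap or complement
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        if (u @ v >= gamma) == in_cap:
            break
    alpha, tau = (d - 1) / 2.0, (1.0 + gamma) / 2.0
    b_full = beta_fn(alpha, alpha)               # B(alpha, alpha)
    b_tau = betainc(alpha, alpha, tau) * b_full  # B(tau; alpha, alpha)
    m = ((1 - gamma**2) ** alpha / (2 ** (d - 2) * (d - 1))
         * (p / (b_full - b_tau) - (1 - p) / b_tau))
    return u / m                                 # unbiased: E[output] = v
```

Note that the factors (1-\gamma^{2})^{\alpha} and 2^{d-2} underflow or overflow in floating point for large d, so a production implementation would compute m in log space.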

3.2 Optimality

Asymptotic optimality of PrivUnit has already been established by prior work. [BDFKR18] show that the error of PrivUnit is upper bounded by O(\frac{d}{n\min(\varepsilon,\varepsilon^{2})}) for certain parameters. Moreover, [DR19] show a lower bound of \Omega(\frac{d}{n\min(\varepsilon,\varepsilon^{2})}), implying that PrivUnit is asymptotically optimal.

In this section, we prove that additive aggregation applied with PrivUnit with the best choice of parameters p,\gamma is truly optimal, that is, it outperforms any unbiased private algorithm. The following theorem states our optimality result for PrivUnit.

Theorem 2.

Let \mathcal{R}:\mathbb{S}^{d-1}\to\mathcal{Z} be an \varepsilon-DP local randomizer, and \mathcal{A}:\mathcal{Z}^{n}\to\mathbb{R}^{d} be an aggregation procedure such that \mathbb{E}[\mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n}))]=\frac{1}{n}\sum_{i=1}^{n}v_{i} for all v_{1},\dots,v_{n}\in\mathbb{S}^{d-1}. Then there are p^{\star}\in[0,1] and \gamma^{\star}\in[0,1] such that \text{PrivUnit}(p^{\star},\gamma^{\star}) is an \varepsilon-DP local randomizer and

\mathsf{Err}_{n}(\mathcal{A}^{+},\text{PrivUnit}(p^{\star},\gamma^{\star}))\leq\mathsf{Err}_{n}(\mathcal{A},\mathcal{R}).

The proof of Theorem 2 proceeds in two steps: first, in Section 3.3 (Proposition 1), we show that there exists an optimal private procedure that is canonical; then in Section 3.4 (Proposition 2) we prove that PrivUnit is the optimal randomizer in this family. Theorem 2 is a direct corollary of these two propositions.

3.3 Optimality of canonical protocols

In this section, we show that there exists a canonical private protocol that achieves the optimal error. In particular, we have the following result.

Proposition 1.

Let (\mathcal{R},\mathcal{A}) be such that \mathcal{R}:\mathbb{S}^{d-1}\to\mathcal{Z} is an \varepsilon-DP local randomizer and \mathbb{E}[\mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n}))]=\frac{1}{n}\sum_{i=1}^{n}v_{i} for all v_{1},\dots,v_{n}\in\mathbb{S}^{d-1}. Then there is a canonical randomizer \mathcal{R}^{\prime}:\mathbb{S}^{d-1}\to\mathbb{R}^{d} that is an \varepsilon-DP local randomizer and

\mathsf{Err}_{n}(\mathcal{A},\mathcal{R})\geq\mathsf{Err}_{n}(\mathcal{A}^{+},\mathcal{R}^{\prime}).

To prove Proposition 1, we begin with the following lemma.

Lemma 3.1.

Let (\mathcal{R},\mathcal{A}) satisfy the conditions of Proposition 1. Let P be a probability distribution over \mathbb{S}^{d-1} such that \mathbb{E}_{v\sim P}[v]=0. There are \hat{\mathcal{R}}_{i}:\mathbb{S}^{d-1}\to\mathbb{R}^{d} for i\in[n] such that \hat{\mathcal{R}}_{i} is an \varepsilon-DP local randomizer, \mathbb{E}[\hat{\mathcal{R}}_{i}(v)]=v for all v\in\mathbb{S}^{d-1}, and

\mathbb{E}_{v_{1},\dots,v_{n}\sim P}\left[\left\|n\mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n}))-\sum_{i=1}^{n}v_{i}\right\|_{2}^{2}\right]\geq\sum_{i=1}^{n}\mathbb{E}_{v_{i}\sim P}\left[\left\|\hat{\mathcal{R}}_{i}(v_{i})-v_{i}\right\|_{2}^{2}\right].

Before proving Lemma 3.1, we first complete the proof of Proposition 1.

Proof.

(Proposition 1) Let P_{\mathsf{unif}} be the uniform distribution over the sphere \mathbb{S}^{d-1}. First, note that

n^{2}\mathsf{Err}_{n}(\mathcal{A},\mathcal{R}) =\sup_{v_{1},\dots,v_{n}\in\mathbb{S}^{d-1}}\mathbb{E}\left[\left\|n\mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n}))-\sum_{i=1}^{n}v_{i}\right\|_{2}^{2}\right]
\geq\mathbb{E}_{v_{1},\dots,v_{n}\sim P_{\mathsf{unif}}}\left[\left\|n\mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n}))-\sum_{i=1}^{n}v_{i}\right\|_{2}^{2}\right]
\geq\sum_{i=1}^{n}\mathbb{E}_{v_{i}\sim P_{\mathsf{unif}}}\left[\left\|\hat{\mathcal{R}}_{i}(v_{i})-v_{i}\right\|_{2}^{2}\right],
where the last inequality follows from Lemma 3.1 applied with P=P_{\mathsf{unif}}, which satisfies \mathbb{E}_{v\sim P_{\mathsf{unif}}}[v]=0.

Now we define \mathcal{R}^{\prime}_{i} as follows. First, sample a random rotation matrix U\in\mathbb{R}^{d\times d} with U^{T}U=I, then set

\mathcal{R}^{\prime}_{i}(v)=U^{T}\hat{\mathcal{R}}_{i}(Uv).

Note that \mathcal{R}^{\prime}_{i} is an \varepsilon-DP local randomizer, \mathbb{E}[\mathcal{R}^{\prime}_{i}(v)]=v, and for all v\in\mathbb{S}^{d-1}

\mathbb{E}\left[\left\|\mathcal{R}^{\prime}_{i}(v)-v\right\|_{2}^{2}\right] =\mathbb{E}_{U}\left[\left\|U^{T}\hat{\mathcal{R}}_{i}(Uv)-v\right\|_{2}^{2}\right]
=\mathbb{E}_{U}\left[\left\|\hat{\mathcal{R}}_{i}(Uv)-Uv\right\|_{2}^{2}\right]
=\mathbb{E}_{v\sim P_{\mathsf{unif}}}\left[\left\|\hat{\mathcal{R}}_{i}(v)-v\right\|_{2}^{2}\right].

Overall, we have

n^{2}\mathsf{Err}_{n}(\mathcal{A},\mathcal{R}) \geq\sum_{i=1}^{n}\mathbb{E}_{v_{i}\sim P_{\mathsf{unif}}}\left[\left\|\hat{\mathcal{R}}_{i}(v_{i})-v_{i}\right\|_{2}^{2}\right]
=\sum_{i=1}^{n}\sup_{v\in\mathbb{S}^{d-1}}\mathbb{E}\left[\left\|\mathcal{R}^{\prime}_{i}(v)-v\right\|_{2}^{2}\right]
=\sum_{i=1}^{n}\mathsf{Err}_{1}(\mathcal{A}^{+},\mathcal{R}^{\prime}_{i})
\geq n\,\mathsf{Err}_{1}(\mathcal{A}^{+},\mathcal{R}^{\prime}_{i^{\star}})
=n^{2}\mathsf{Err}_{n}(\mathcal{A}^{+},\mathcal{R}^{\prime}_{i^{\star}}),

where i^{\star}\in[n] minimizes \mathsf{Err}_{1}(\mathcal{A}^{+},\mathcal{R}^{\prime}_{i}). The claim follows. ∎

Now we prove Lemma 3.1.

Proof.

(Lemma 3.1) We define \hat{\mathcal{R}}_{i} to be

\hat{\mathcal{R}}_{i}(v_{i})=\mathbb{E}_{v_{j}\sim P,j\neq i}[n\mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n}))].

Note that \hat{\mathcal{R}}_{i} is an \varepsilon-DP local randomizer as it requires a single application of \mathcal{R}(v_{i}). Moreover, \mathbb{E}[\hat{\mathcal{R}}_{i}(v)]=v for all v\in\mathbb{S}^{d-1}, since \mathcal{A} is unbiased and \mathbb{E}_{v_{j}\sim P}[v_{j}]=0 for j\neq i. We define

\hat{\mathcal{R}}_{\leq i}(v_{1},\dots,v_{i})=\mathbb{E}_{v_{j}\sim P,j>i}\left[n\mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n}))-\sum_{j=1}^{i}v_{j}\right],

and \hat{\mathcal{R}}_{\leq 0}=0. We now have

\mathbb{E}_{v_{1},\dots,v_{n}\sim P}\left[\left\|n\mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n}))-\sum_{i=1}^{n}v_{i}\right\|_{2}^{2}\right]
=\mathbb{E}_{v_{1},\dots,v_{n}\sim P}\left[\left\|\hat{\mathcal{R}}_{\leq n}(v_{1},\dots,v_{n})\right\|_{2}^{2}\right]
=\mathbb{E}_{v_{1},\dots,v_{n}\sim P}\left[\left\|\hat{\mathcal{R}}_{\leq n}(v_{1},\dots,v_{n})-\hat{\mathcal{R}}_{\leq n-1}(v_{1},\dots,v_{n-1})+\hat{\mathcal{R}}_{\leq n-1}(v_{1},\dots,v_{n-1})\right\|_{2}^{2}\right]
\stackrel{(i)}{=}\mathbb{E}_{v_{1},\dots,v_{n}\sim P}\left[\left\|\hat{\mathcal{R}}_{\leq n}(v_{1},\dots,v_{n})-\hat{\mathcal{R}}_{\leq n-1}(v_{1},\dots,v_{n-1})\right\|_{2}^{2}\right]+\mathbb{E}_{v_{1},\dots,v_{n-1}\sim P}\left[\left\|\hat{\mathcal{R}}_{\leq n-1}(v_{1},\dots,v_{n-1})\right\|_{2}^{2}\right]
\stackrel{(ii)}{=}\sum_{i=1}^{n}\mathbb{E}_{v_{1},\dots,v_{i}\sim P}\left[\left\|\hat{\mathcal{R}}_{\leq i}(v_{1},\dots,v_{i})-\hat{\mathcal{R}}_{\leq i-1}(v_{1},\dots,v_{i-1})\right\|_{2}^{2}\right]
\stackrel{(iii)}{\geq}\sum_{i=1}^{n}\mathbb{E}_{v_{i}\sim P}\left[\left\|\mathbb{E}_{v_{1},\dots,v_{i-1}\sim P}\left[\hat{\mathcal{R}}_{\leq i}(v_{1},\dots,v_{i})-\hat{\mathcal{R}}_{\leq i-1}(v_{1},\dots,v_{i-1})\right]\right\|_{2}^{2}\right]
\stackrel{(iv)}{=}\sum_{i=1}^{n}\mathbb{E}_{v_{i}\sim P}\left[\left\|\hat{\mathcal{R}}_{i}(v_{i})-v_{i}\right\|_{2}^{2}\right]

where (i) follows since \mathbb{E}_{v_{n}\sim P}[\hat{\mathcal{R}}_{\leq n}(v_{1},\dots,v_{n})]=\hat{\mathcal{R}}_{\leq n-1}(v_{1},\dots,v_{n-1}), (ii) follows by induction, (iii) follows from Jensen's inequality, and (iv) follows since \mathbb{E}_{v_{1},\dots,v_{i-1}\sim P}[\hat{\mathcal{R}}_{\leq i}(v_{1},\dots,v_{i})]=\hat{\mathcal{R}}_{i}(v_{i})-v_{i} and \mathbb{E}_{v_{1},\dots,v_{i-1}\sim P}[\hat{\mathcal{R}}_{\leq i-1}(v_{1},\dots,v_{i-1})]=0. ∎

3.4 Optimality of PrivUnit among canonical randomizers

In this section, we show that PrivUnit achieves the optimal error in the family of canonical randomizers. To this end, first note that for additive aggregation \mathcal{A}^{+} we have \mathsf{Err}_{n}(\mathcal{A}^{+},\mathcal{R})=\mathsf{Err}_{1}(\mathcal{A}^{+},\mathcal{R})/n, since the users' reports are independent and unbiased so their variances add. Denoting \mathsf{Err}(\mathcal{R})=\mathsf{Err}_{1}(\mathcal{A}^{+},\mathcal{R}) for canonical randomizers, we have the following optimality result.

Proposition 2.

Let \mathcal{R}:\mathbb{S}^{d-1}\to\mathbb{R}^{d} be an \varepsilon-DP local randomizer such that \mathbb{E}[\mathcal{R}(v)]=v for all v\in\mathbb{S}^{d-1}. Then there are p^{\star}\in[0,1] and \gamma^{\star}\in[0,1] such that \text{PrivUnit}(p^{\star},\gamma^{\star}) is an \varepsilon-DP local randomizer and

\mathsf{Err}(\text{PrivUnit}(p^{\star},\gamma^{\star}))\leq\mathsf{Err}(\mathcal{R}).

The proof of Proposition 2 builds on a sequence of lemmas, each of which allows us to simplify the structure of an optimal algorithm. We begin with the following lemma, which shows that there exists an optimal algorithm that is invariant to rotations.

Lemma 3.2 (Rotation-Invariance Lemma).

Let \mathcal{R}:\mathbb{S}^{d-1}\to\mathbb{R}^{d} be an \varepsilon-DP local randomizer such that \mathbb{E}[\mathcal{R}(v)]=v for all v\in\mathbb{S}^{d-1}. There exists an \varepsilon-DP local randomizer \mathcal{R}^{\prime} such that

1. \mathbb{E}[\mathcal{R}^{\prime}(v)]=v for all v\in\mathbb{S}^{d-1};

2. \mathsf{Err}(\mathcal{R}^{\prime})\leq\mathsf{Err}(\mathcal{R});

3. \mathsf{Err}(\mathcal{R}^{\prime})=\mathsf{Err}(\mathcal{R}^{\prime},v) for all v\in\mathbb{S}^{d-1};

4. for any v,v_{0}\in\mathbb{S}^{d-1}, there is an orthonormal matrix V\in\mathbb{R}^{d\times d} such that \mathcal{R}^{\prime}(v)\stackrel{d}{=}V\mathcal{R}^{\prime}(v_{0});

5. \frac{f_{\mathcal{R}^{\prime}(v)}(u_{1})}{f_{\mathcal{R}^{\prime}(v)}(u_{2})}\leq e^{\varepsilon} for any v\in\mathbb{S}^{d-1} and u_{1},u_{2}\in\mathbb{R}^{d} with \left\|u_{1}\right\|_{2}=\left\|u_{2}\right\|_{2}.

Proof.

Given \mathcal{R}, we define \mathcal{R}^{\prime} as follows. First, sample a random rotation matrix Ud×dU\in\mathbb{R}^{d\times d} where UTU=IU^{T}U=I, then set

\mathcal{R}^{\prime}(x)=U^{T}\mathcal{R}(Ux).

We now prove that \mathcal{R}^{\prime} satisfies the desired properties. First, note that the privacy of \mathcal{R} immediately implies the same privacy bound for \mathcal{R}^{\prime}. Moreover, we have that \mathbb{E}[\mathcal{R}^{\prime}(v)]=\mathbb{E}[U^{T}\mathbb{E}[\mathcal{R}(Uv)]]=\mathbb{E}[U^{T}Uv]=v as \mathcal{R} is unbiased and U^{T}U=I. For utility, note that

\mathsf{Err}(\mathcal{R}^{\prime}) =\sup_{v\in\mathbb{S}^{d-1}}\mathbb{E}_{\mathcal{R}^{\prime}}\left[\left\|\mathcal{R}^{\prime}(v)-v\right\|_{2}^{2}\right]
=\sup_{v\in\mathbb{S}^{d-1}}\mathbb{E}_{U,\mathcal{R}}\left[\left\|U^{T}\mathcal{R}(Uv)-U^{T}Uv\right\|_{2}^{2}\right]
=\sup_{v\in\mathbb{S}^{d-1}}\mathbb{E}_{U,\mathcal{R}}\left[\left\|\mathcal{R}(Uv)-Uv\right\|_{2}^{2}\right]
\leq\sup_{v\in\mathbb{S}^{d-1},U}\mathbb{E}_{\mathcal{R}}\left[\left\|\mathcal{R}(Uv)-Uv\right\|_{2}^{2}\right]
=\sup_{v\in\mathbb{S}^{d-1}}\mathbb{E}_{\mathcal{R}}\left[\left\|\mathcal{R}(v)-v\right\|_{2}^{2}\right]
=\mathsf{Err}(\mathcal{R}).

Next, we show that the distributions of \mathcal{R}^{\prime}(v_{1}) and \mathcal{R}^{\prime}(v_{2}) are the same up to rotations. Indeed, let V\in\mathbb{R}^{d\times d} be a rotation matrix such that v_{1}=Vv_{2}. We have that \mathcal{R}^{\prime}(v_{2})=U^{T}\mathcal{R}(Uv_{2}), which can also be written as \mathcal{R}^{\prime}(v_{2})\stackrel{d}{=}(UV)^{T}\mathcal{R}(UVv_{2})\stackrel{d}{=}V^{T}U^{T}\mathcal{R}(Uv_{1})\stackrel{d}{=}V^{T}\mathcal{R}^{\prime}(v_{1}) as UV is also a random rotation matrix.

Now we prove the final property. Assume towards a contradiction that there are v_{1}\in\mathbb{S}^{d-1} and u_{1},u_{2}\in\mathbb{R}^{d} with \left\|u_{1}\right\|_{2}=\left\|u_{2}\right\|_{2} such that \frac{f_{\mathcal{R}^{\prime}(v_{1})}(u_{1})}{f_{\mathcal{R}^{\prime}(v_{1})}(u_{2})}>e^{\varepsilon}. We will show that this implies that \mathcal{R}^{\prime} is not \varepsilon-DP. In the proof above we showed that \mathcal{R}^{\prime}(v_{2})\stackrel{d}{=}V^{T}\mathcal{R}^{\prime}(v_{1}) for v_{1}=Vv_{2}. Therefore for V such that V^{T}u_{1}=u_{2} we get that f_{\mathcal{R}^{\prime}(v_{2})}(u_{2})=f_{\mathcal{R}^{\prime}(v_{1})}(u_{1}), which implies that

\frac{f_{\mathcal{R}^{\prime}(v_{2})}(u_{2})}{f_{\mathcal{R}^{\prime}(v_{1})}(u_{2})}>e^{\varepsilon},

which is a contradiction. ∎
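The random rotation used in this construction can be sampled in practice via the QR decomposition of a Gaussian matrix, with the usual sign correction so the result is Haar-distributed. A small Python sketch (our own illustration) of the symmetrization \mathcal{R}^{\prime}(v)=U^{T}\mathcal{R}(Uv):

```python
import numpy as np

def random_rotation(d, rng=None):
    """Sample U Haar-uniformly from the orthogonal group O(d)."""
    rng = rng or np.random.default_rng()
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # sign-fix columns for exact Haar measure

def rotation_invariant(randomizer, v, rng=None):
    """The symmetrized randomizer R'(v) = U^T R(Uv) from Lemma 3.2."""
    u = random_rotation(len(v), rng)
    return u.T @ randomizer(u @ v)
```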

Lemma 3.2 implies that we can restrict our attention to algorithms that have the same density for all inputs up to rotations, and hence allows us to study their behavior for a single input. Moreover, as we show in the following lemma, given a randomizer that works for a single input, we can extend it to achieve the same error for all inputs. To facilitate notation, we say that a density f:\mathbb{R}^{d}\to\mathbb{R}_{+} is \varepsilon-indistinguishable if \frac{f(u_{1})}{f(u_{2})}\leq e^{\varepsilon} for all u_{1},u_{2}\in\mathbb{R}^{d} such that \left\|u_{1}\right\|_{2}=\left\|u_{2}\right\|_{2}.

Lemma 3.3.

Fix v_{0}=e_{1}\in\mathbb{S}^{d-1}. Let f:\mathbb{R}^{d}\to\mathbb{R}_{+} be an \varepsilon-indistinguishable density function with corresponding random variable \mathcal{R} such that \mathbb{E}[\mathcal{R}]=e_{1}. There exists an \varepsilon-DP local randomizer \mathcal{R}^{\prime} such that \mathsf{Err}(\mathcal{R}^{\prime},v)=\mathbb{E}[\left\|\mathcal{R}-e_{1}\right\|_{2}^{2}] and \mathbb{E}[\mathcal{R}^{\prime}(v)]=v for all v\in\mathbb{S}^{d-1}.

Proof.

The proof is similar to the proof of Lemma 3.2. For any v\in\mathbb{S}^{d-1}, we let U(v)\in\mathbb{R}^{d\times d} be an orthonormal matrix such that v_{0}=U(v)v. Then, following Lemma 3.2, we define \mathcal{R}^{\prime}(v)=U^{T}(v)\mathcal{R}. The claim immediately follows. ∎
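One concrete way to realize the matrix U(v) (our own illustration; the paper does not specify a construction) is a Householder reflection, which is orthonormal and maps v to e_{1}:

```python
import numpy as np

def rotation_to_e1(v):
    """Orthonormal H with H @ v = e1, via a Householder reflection.

    Assumes v is a unit vector; returns the identity when v is already e1.
    """
    d = len(v)
    e1 = np.zeros(d)
    e1[0] = 1.0
    w = v - e1
    norm_w = np.linalg.norm(w)
    if norm_w < 1e-12:
        return np.eye(d)
    w /= norm_w
    return np.eye(d) - 2.0 * np.outer(w, w)  # reflection across w's hyperplane
```

Since a Householder matrix H is symmetric, \mathcal{R}^{\prime}(v)=H^{T}\mathcal{R}=H\mathcal{R} in this realization.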

Lemma 3.2 and Lemma 3.3 imply that we only need to study the behavior of the randomizer for a fixed input. Henceforth, we will fix the input to v=e_{1} and investigate properties of the density given e_{1}.

Given v=(v_{1},v_{2},\dots,v_{d}) we define its reflection to be v^{-}=(v_{1},-v_{2},\dots,-v_{d}). The next lemma shows that we can assume that the densities at v and v^{-} are equal for some optimal algorithm.

Lemma 3.4 (Reflection Symmetry).

Let f:\mathbb{R}^{d}\to\mathbb{R}_{+} be an \varepsilon-indistinguishable density function with corresponding random variable \mathcal{R} such that \mathbb{E}[\mathcal{R}]=e_{1}. There is f^{\prime}:\mathbb{R}^{d}\to\mathbb{R}_{+} with corresponding random variable \mathcal{R}^{\prime} that satisfies the same properties, such that \mathsf{Err}(\mathcal{R}^{\prime})\leq\mathsf{Err}(\mathcal{R}) and f^{\prime}(u)=f^{\prime}(u^{-}) for all u\in\mathbb{R}^{d}.

Proof.

We define f^{\prime}(u)=\frac{f(u)+f(u^{-})}{2} for all u\in\mathbb{R}^{d}. First, it is immediate to see that f^{\prime}(u)=f^{\prime}(u^{-}) for all u\in\mathbb{R}^{d}. Moreover, we have

\frac{f^{\prime}(u_{1})}{f^{\prime}(u_{2})} =\frac{f(u_{1})+f(u_{1}^{-})}{f(u_{2})+f(u_{2}^{-})}
\leq\max\left(\frac{f(u_{1})}{f(u_{2})},\frac{f(u_{1}^{-})}{f(u_{2}^{-})}\right)\leq e^{\varepsilon}.

Note also that \mathbb{E}[\mathcal{R}^{\prime}]=e_{1} since the marginal distribution of the first coordinate of the output did not change, and for the other coordinates the expectation is zero as u+u^{-}=2u_{1}e_{1} for any u\in\mathbb{R}^{d}. Finally, note that \mathsf{Err}(\mathcal{R}^{\prime},e_{1})=\mathsf{Err}(\mathcal{R},e_{1}) since \left\|u-e_{1}\right\|_{2}=\left\|u^{-}-e_{1}\right\|_{2} for all u\in\mathbb{R}^{d}. ∎
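Operationally, sampling from f^{\prime} just means applying the reflection with probability 1/2. A short Python sketch (our own illustration):

```python
import numpy as np

def symmetrize(sample, rng=None):
    """Draw from f'(u) = (f(u) + f(u^-)) / 2 given a sampler for f."""
    rng = rng or np.random.default_rng()
    u = sample()
    if rng.random() < 0.5:        # with prob 1/2, output u^- instead of u
        u = u.copy()
        u[1:] *= -1.0             # u^- = (u_1, -u_2, ..., -u_d)
    return u
```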

We also have the following lemma, which shows that there is a near-optimal randomizer whose output is supported on a sphere of some fixed radius.

Lemma 3.5.

Let f:\mathbb{R}^{d}\to\mathbb{R}_{+} be an \varepsilon-indistinguishable density function with corresponding random variable \mathcal{R} such that \mathbb{E}[\mathcal{R}]=e_{1}. For any \tau>0, there exists an \varepsilon-indistinguishable density f^{\prime}:\mathbb{R}^{d}\to\mathbb{R}_{+} with corresponding random variable \mathcal{R}^{\prime} such that \left\|\mathcal{R}^{\prime}\right\|_{2}=C for some C>0, \mathbb{E}[\mathcal{R}^{\prime}]=e_{1} and \mathsf{Err}(\mathcal{R}^{\prime})\leq\mathsf{Err}(\mathcal{R})+\tau.

Proof.

By Lemma 3.4, we can assume without loss of generality that \mathcal{R} satisfies reflection symmetry, that is, f(u)=f(u^{-}). We think of the density f as first sampling a radius R, then sampling a vector u\in\mathbb{R}^{d} of radius R. We also assume that \mathcal{R} has bounded radius; otherwise, as \mathsf{Err}(\mathcal{R})=\mathbb{E}[\left\|\mathcal{R}\right\|_{2}^{2}]-1 is bounded, we can project the output of \mathcal{R} to some large radius R_{\tau}>0 while increasing the error by at most \tau>0 for any \tau. Similarly, we can assume that the output has radius at least r_{\tau} while increasing the error by at most \tau. Let f_{R} denote the distribution of the radius, and f_{u\mid R=r} be the conditional distribution of the output given that the radius is r. In this terminology, \mathbb{E}[\mathcal{R}]=e_{1} implies that

\mathsf{Err}(\mathcal{R},e_{1}) =\mathbb{E}[\|\mathcal{R}-e_{1}\|_{2}^{2}]
=\mathbb{E}[\|\mathcal{R}\|_{2}^{2}+\|e_{1}\|_{2}^{2}-2\langle\mathcal{R},e_{1}\rangle]
=\mathbb{E}[\left\|\mathcal{R}\right\|_{2}^{2}]-1.

For the purpose of finding the optimal algorithm, we need \mathcal{R} that minimizes \mathbb{E}[\left\|\mathcal{R}\right\|_{2}^{2}]. Denote W_{r}=\mathbb{E}[\langle\mathcal{R},e_{1}\rangle\mid R=r] and set

C_{\max}=\sup_{r\in[r_{\tau},R_{\tau}]}\frac{W_{r}}{r}.

Noting that \mathbb{E}[\langle\mathcal{R},e_{1}\rangle]=1, we have

\mathbb{E}[\left\|\mathcal{R}\right\|_{2}^{2}] =\frac{\mathbb{E}[\left\|\mathcal{R}\right\|_{2}^{2}]}{\mathbb{E}[\langle\mathcal{R},e_{1}\rangle]^{2}}
=\frac{\mathbb{E}[\left\|\mathcal{R}\right\|_{2}^{2}]}{\left(\int_{r=0}^{\infty}f_{r}W_{r}\,dr\right)^{2}}
=\frac{\mathbb{E}[\left\|\mathcal{R}\right\|_{2}^{2}]}{\left(\int_{r=0}^{\infty}f_{r}r(W_{r}/r)\,dr\right)^{2}}
\geq\frac{\mathbb{E}[\left\|\mathcal{R}\right\|_{2}^{2}]}{\left(\int_{r=0}^{\infty}f_{r}rC_{\max}\,dr\right)^{2}}
\geq\frac{1}{C_{\max}^{2}}\frac{\mathbb{E}[\left\|\mathcal{R}\right\|_{2}^{2}]}{\mathbb{E}[\left\|\mathcal{R}\right\|_{2}]^{2}}\geq\frac{1}{C_{\max}^{2}}.

Now consider r_{\max}>0 that attains C_{\max}=W_{r_{\max}}/r_{\max}; r_{\max} exists as \mathcal{R} has outputs in [r_{\tau},R_{\tau}]. Let f_{\max} denote the conditional distribution of \mathcal{R} given that R=r_{\max} and let \mathcal{R}_{\max} denote the corresponding randomizer. We define a new randomizer \mathcal{R}^{\prime} as follows

\mathcal{R}^{\prime}=\frac{1}{r_{\max}C_{\max}}\mathcal{R}_{\max},

with corresponding density f^{\prime}. Note that f^{\prime} is \varepsilon-indistinguishable as f is \varepsilon-indistinguishable and the conditional distributions given different radii have disjoint supports, which implies f_{\max} is \varepsilon-indistinguishable. Moreover, f_{\max}(u)=f_{\max}(u^{-}), which implies that \mathbb{E}[\mathcal{R}^{\prime}]=\frac{1}{r_{\max}C_{\max}}\mathbb{E}[\mathcal{R}_{\max}]=\frac{W_{r_{\max}}}{r_{\max}C_{\max}}e_{1}=e_{1}. Finally, note that \mathcal{R}^{\prime} satisfies

\mathbb{E}[\left\|\mathcal{R}^{\prime}\right\|_{2}^{2}] =\frac{1}{r_{\max}^{2}C_{\max}^{2}}\mathbb{E}[\left\|\mathcal{R}_{\max}\right\|_{2}^{2}]
=\frac{1}{r_{\max}^{2}C_{\max}^{2}}\mathbb{E}[\left\|\mathcal{R}\right\|_{2}^{2}\mid R=r_{\max}]
=\frac{1}{C_{\max}^{2}}\leq\mathbb{E}[\left\|\mathcal{R}\right\|_{2}^{2}].

The claim follows. ∎

Before we present our main proposition, which formulates the linear program that finds the optimal randomizer, we need the following key property, which allows us to describe the privacy guarantee as a linear constraint. We remark that such a lemma can easily be proven for deletion DP, so that our results would extend to that definition.

Lemma 3.6.

Let \mathcal{R}:\mathbb{S}^{d-1}\to\mathbb{R}^{d} be an \varepsilon-DP local randomizer. There is \rho:\mathbb{R}^{d}\to\mathbb{R}_{+} such that for all v\in\mathbb{S}^{d-1} and u\in\mathbb{R}^{d}

e^{-\varepsilon/2}\leq\frac{f_{\mathcal{R}(v)}(u)}{\rho(u)}\leq e^{\varepsilon/2}.

Moreover, if \mathcal{R} satisfies the properties of Lemma 3.2 (invariance) then \rho(u_{1})=\rho(u_{2}) for \left\|u_{1}\right\|_{2}=\left\|u_{2}\right\|_{2}.

Proof.

Define \rho(u)=\sqrt{\inf_{v\in\mathbb{S}^{d-1}}f_{\mathcal{R}(v)}(u)\cdot\sup_{v\in\mathbb{S}^{d-1}}f_{\mathcal{R}(v)}(u)}. Note that for all v\in\mathbb{S}^{d-1},

\frac{f_{\mathcal{R}(v)}(u)}{\sqrt{\inf_{v\in\mathbb{S}^{d-1}}f_{\mathcal{R}(v)}(u)\cdot\sup_{v\in\mathbb{S}^{d-1}}f_{\mathcal{R}(v)}(u)}} =\sqrt{\frac{f_{\mathcal{R}(v)}(u)}{\inf_{v\in\mathbb{S}^{d-1}}f_{\mathcal{R}(v)}(u)}}\sqrt{\frac{f_{\mathcal{R}(v)}(u)}{\sup_{v\in\mathbb{S}^{d-1}}f_{\mathcal{R}(v)}(u)}}\leq e^{\varepsilon/2}.

The second direction follows similarly. The second part of the claim follows as for any u_{1},u_{2}\in\mathbb{R}^{d} such that \left\|u_{1}\right\|_{2}=\left\|u_{2}\right\|_{2}, if f_{\mathcal{R}(v_{1})}(u_{1})=t for any mechanism that satisfies the properties of Lemma 3.2, then there is v_{2} such that f_{\mathcal{R}(v_{2})}(u_{2})=t. The definition of \rho now implies that \rho(u_{1})=\rho(u_{2}). ∎

We are now ready to present our main step towards proving the optimality result. The following proposition formulates the problem of finding the optimal algorithm as a linear program. As a result, we show that there is an optimal algorithm whose density function takes at most two distinct values.

Proposition 3.

Let \mathcal{R}:\mathbb{S}^{d-1}\to\mathbb{R}^{d} be an \varepsilon-DP local randomizer such that \mathbb{E}[\mathcal{R}(v)]=v for all v\in\mathbb{S}^{d-1}. For any \tau>0, there exist constants C,p>0 and an \varepsilon-DP local randomizer \mathcal{R}^{\prime}:\mathbb{S}^{d-1}\to C\cdot\mathbb{S}^{d-1} such that \mathbb{E}[\mathcal{R}^{\prime}(v)]=v for all v\in\mathbb{S}^{d-1}, \mathsf{Err}(\mathcal{R}^{\prime})\leq\mathsf{Err}(\mathcal{R})+\tau, f_{\mathcal{R}^{\prime}(v)}(u)=f_{\mathcal{R}^{\prime}(v)}(u^{-}), and f_{\mathcal{R}^{\prime}(v)}(u)\in\{e^{-\varepsilon/2},e^{\varepsilon/2}\}\cdot p for all u\in C\cdot\mathbb{S}^{d-1}.

Proof.

The proof will proceed by formulating a linear program which describes the problem of finding the optimal randomizer, and then arguing that the minimizer of this program must satisfy the desired conditions. To this end, we first use the properties of the optimal randomizer from the previous lemmas to simplify the linear program. Lemma 3.5 implies that there exists a near-optimal randomizer \mathcal{R}:\mathbb{S}^{d-1}\to C\cdot\mathbb{S}^{d-1} for some C>0 that is also invariant under rotations (satisfies the conclusions of Lemma 3.2). Moreover, Lemma 3.6 implies that the density function f_{\mathcal{R}(v)} satisfies, for some p>0,

e^{-\varepsilon/2}p\leq f_{\mathcal{R}(v)}(u)\leq e^{\varepsilon/2}p.

Adding the requirement of unbiasedness, and noticing that for such algorithms the error is C^{2}-1, this results in the following minimization problem, where the variables are C, p, and the density functions f_{v}:C\mathbb{S}^{d-1}\to\mathbb{R}_{+} for all v\in\mathbb{S}^{d-1}

\arg\min_{C,\,p,\,f_{v}:C\mathbb{S}^{d-1}\to\mathbb{R}_{+}\text{ for all }v\in\mathbb{S}^{d-1}}\quad C\quad\quad (A)
\mathop{\rm subject\;to}
\quad e^{-\varepsilon/2}p\leq f_{v}(u)\leq e^{\varepsilon/2}p,\quad v\in\mathbb{S}^{d-1},~u\in C\mathbb{S}^{d-1}
\quad\int_{C\mathbb{S}^{d-1}}f_{v}(u)\,u\,du=v,\quad v\in\mathbb{S}^{d-1}
\quad\int_{C\mathbb{S}^{d-1}}f_{v}(u)\,du=1,\quad v\in\mathbb{S}^{d-1}

Lemma 3.2 and Lemma 3.3 also show that the optimal algorithm is invariant under rotations, and that we only need to find the output distribution f with respect to a fixed input v=e_{1}. Moreover, Lemma 3.4 says that we can assume that f_{e_{1}}(u)=f_{e_{1}}(u^{-}) for all u. We also work now with the normalized algorithm \hat{\mathcal{R}}(v)=\mathcal{R}(v)/C (that is, the output lies on the unit sphere). Note that for \hat{\mathcal{R}} we have \mathbb{E}[\hat{\mathcal{R}}(v)]=v/C. Denoting \alpha=1/C, this results in the following linear program (LP)

\arg\max_{\alpha,\,p,\,f_{e_{1}}:\mathbb{S}^{d-1}\to\mathbb{R}_{+}}\quad\alpha\quad\quad (B)
\mathop{\rm subject\;to}
\quad e^{-\varepsilon/2}p\leq f_{e_{1}}(u)\leq e^{\varepsilon/2}p,\quad u\in\mathbb{S}^{d-1}
\quad f_{e_{1}}(u)=f_{e_{1}}(u^{-}),\quad u\in\mathbb{S}^{d-1}
\quad\int_{\mathbb{S}^{d-1}}f_{e_{1}}(u)\,u\,du=\alpha e_{1},
\quad\int_{\mathbb{S}^{d-1}}f_{e_{1}}(u)\,du=1.

We need to show that most of the inequality constraints e^{-\varepsilon/2}p\leq f_{e_{1}}(u)\leq e^{\varepsilon/2}p must be tight at one of the two extremes. To this end, we approximate the LP (B) using a finite number of variables by discretizing the density function f_{e_{1}}. We assume we have a \delta/2-cover S=\{u_{1},\dots,u_{K}\} of \mathbb{S}^{d-1}. We assume without loss of generality that if u_{i}\in S then u_{i}^{-}\in S, and we also write S=S_{0}\cup S_{1} where S_{0}=S_{1}^{-} and S_{0}\cap S_{1}=\emptyset. Let B_{i}=\{w\in\mathbb{S}^{d-1}:\left\|w-u_{i}\right\|_{2}\leq\left\|w-u_{j}\right\|_{2}\text{ for all }j\neq i\}, V_{i}=\int_{u\in B_{i}}1\,du, and \bar{u}_{i}=\mathbb{E}_{U\sim\mathsf{Uni}(\mathbb{S}^{d-1})}[U\mid U\in B_{i}]. Now we limit our linear program to density functions that are constant over each B_{i}, resulting in the following LP

\arg\max_{\alpha,\,p,\,f_{e_{1}}:\mathbb{S}^{d-1}\to\mathbb{R}_{+}}\quad\alpha\quad\quad (C)
\mathop{\rm subject\;to}
\quad e^{-\varepsilon/2}p\leq f_{e_{1}}(u_{i})\leq e^{\varepsilon/2}p,\quad u_{i}\in S_{0}
\quad\sum_{i:u_{i}\in S_{0}}f_{e_{1}}(u_{i})V_{i}\bar{u}_{i}+\sum_{i:u_{i}\in S_{1}}f_{e_{1}}(u_{i}^{-})V_{i}\bar{u}_{i}=\alpha e_{1},
\quad\sum_{i:u_{i}\in S_{0}}f_{e_{1}}(u_{i})V_{i}+\sum_{i:u_{i}\in S_{1}}f_{e_{1}}(u_{i}^{-})V_{i}=1.

Let \alpha^{\star}_{1} and \alpha^{\star}_{2} denote the maximal values of (B) and (C), respectively. Each solution to (C) is also a solution to (B), hence \alpha^{\star}_{1}\geq\alpha^{\star}_{2}. Moreover, given \delta>0, let f be a solution of (B) that obtains \alpha\geq\alpha^{\star}_{1}-\delta and let \mathcal{R} be the corresponding randomizer. We can now define a solution for the discrete program (C) by setting, for u\in B_{i},

\hat{f}(u)=\frac{1}{V_{i}}\int_{w\in B_{i}}f_{e_{1}}(w)\,dw.

Equivalently, we can define \hat{\mathcal{R}} as follows: first run \mathcal{R} to get u and find B_{i} such that u\in B_{i}; then return a vector uniformly at random from B_{i}. Note that \hat{f} clearly satisfies the first and third constraints in (C). As for the second constraint, it follows since \hat{f}_{e_{1}}(u)=\hat{f}_{e_{1}}(u^{-}), which implies that \sum_{i:u_{i}\in S}\hat{f}_{e_{1}}(u_{i})V_{i}\bar{u}_{i}=\hat{\alpha}e_{1} for some \hat{\alpha}>0. It remains to show that \hat{\alpha}\geq\alpha^{\star}_{1}-2\delta. The above representation of \hat{\mathcal{R}} shows that \|\mathbb{E}[\hat{\mathcal{R}}-\mathcal{R}]\|\leq\delta and therefore we have \hat{\alpha}\geq\alpha^{\star}_{1}-2\delta.

To finish the proof, it remains to show that the discrete LP (C) has a solution that satisfies the desired properties. Note that as this is a linear program with K variables and 2K+d+2 constraints, at an extreme-point solution K linearly independent constraints must be tight [BT97, Theorem 2.3], which shows that for at least K-d-2 of the sets B_{i} we have f(u_{i})\in\{e^{\varepsilon/2},e^{-\varepsilon/2}\}p.

It remains to manipulate the probabilities for the remaining d+2 sets to satisfy our desired requirements. As these sets have small probability, this does not change the accuracy by much, and we just need to do this manipulation carefully so as to preserve reflection symmetry and unbiasedness. The full details are tedious and we present them in Section A.1. ∎

Given the previous lemmas, we are now ready to finish the proof of Proposition 2.

Proof.

Fix \tau>0 and an unbiased \varepsilon-DP local randomizer \mathcal{R}^{\star}. Proposition 3 shows that there exists \mathcal{R}:\mathbb{S}^{d-1}\to r_{\max}\mathbb{S}^{d-1} that is \varepsilon-DP, unbiased, reflection symmetric (f_{\mathcal{R}(e_{1})}(u)=f_{\mathcal{R}(e_{1})}(u^{-}) for all u), and satisfies f_{\mathcal{R}(e_{1})}(u)\in\{e^{-\varepsilon/2},e^{\varepsilon/2}\}\cdot p. Moreover, \mathsf{Err}(\mathcal{R})\leq\mathsf{Err}(\mathcal{R}^{\star})+\tau. We will transform \mathcal{R} into an instance of PrivUnit while maintaining the same error as \mathcal{R}.

To this end, if \mathcal{R} is an instance of PrivUnit then we are done. Otherwise, let f=f_{\mathcal{R}(e_{1})}, S_{0}(t)=\{u:f(u)=pe^{\varepsilon/2},\langle u,e_{1}\rangle\leq t\} and S_{1}(t)=\{u:f(u)=pe^{-\varepsilon/2},\langle u,e_{1}\rangle\geq t\}. Consider t\in[-1,1] that solves the following minimization problem

\mathop{\rm minimize}_{t\in[-1,1]}~\int_{S_{0}(t)}f(u)\,du+\int_{S_{1}(t)}f(u)\,du

Let p^{\star} be the value of the above minimization problem and t^{\star} the corresponding minimizer. Let p_{0}=\int_{S_{0}(t^{\star})}f(u)\,du and p_{1}=\int_{S_{1}(t^{\star})}f(u)\,du. Assume without loss of generality that p_{0}\leq p_{1} (the other direction follows from identical arguments). Let \hat{S}\subseteq S_{1}(t^{\star}) be such that \hat{S}=\hat{S}^{-} and \int_{\hat{S}}f(u)\,du=p_{0}. We define \tilde{f} by swapping the probabilities on \hat{S} and S_{0}(t^{\star}), that is,

\tilde{f}(u)=\begin{cases}f(u)&\text{if }u\notin S_{0}(t^{\star})\cup\hat{S}\\ pe^{-\varepsilon/2}&\text{if }u\in S_{0}(t^{\star})\\ pe^{\varepsilon/2}&\text{if }u\in\hat{S}\end{cases}

Clearly \tilde{f} still satisfies all of our desired properties and has \mathbb{E}_{U\sim\tilde{f}}[\langle U,e_{1}\rangle]\geq\mathbb{E}_{U\sim f}[\langle U,e_{1}\rangle], as we have \langle u_{1},e_{1}\rangle\geq\langle u_{2},e_{1}\rangle for u_{1}\in\hat{S} and u_{2}\in S_{0}(t^{\star}). Note also that \tilde{f}(u)=pe^{-\varepsilon/2} for u such that \langle u,e_{1}\rangle\leq t^{\star}. Moreover, for u such that \langle u,e_{1}\rangle\geq t^{\star}, we have that \tilde{f}(u)=pe^{-\varepsilon/2} only if u\in B\coloneqq S_{1}(t^{\star})\setminus\hat{S}. Let \delta be such that the set A=\{u:t^{\star}\leq\langle u,e_{1}\rangle\leq t^{\star}+\delta\} has \int_{u\in A}\tilde{f}(u)\,du=\int_{u\in B}\tilde{f}(u)\,du. We now define

\hat{f}(u)=\begin{cases}pe^{\varepsilon/2}&\text{if }\langle u,e_{1}\rangle\geq t^{\star}+\delta\\ pe^{-\varepsilon/2}&\text{if }\langle u,e_{1}\rangle<t^{\star}+\delta\end{cases}

Clearly, \hat{f} is an instance of PrivUnit. Now we prove that it satisfies all of our desired properties. First, note that we can write \hat{f} as

\hat{f}(u)=\begin{cases}\tilde{f}(u)&\text{if }u\notin A\cup B\\ \tilde{f}(u)&\text{if }u\in A\cap B\\ pe^{-\varepsilon/2}&\text{if }u\in A\setminus B\\ pe^{\varepsilon/2}&\text{if }u\in B\setminus A\end{cases}

This implies that \int_{\mathbb{S}^{d-1}}\hat{f}\,du=1. Moreover, \hat{f} is \varepsilon-indistinguishable by definition. Finally, note that \mathbb{E}_{U\sim\hat{f}}[\langle U,e_{1}\rangle]\geq\mathbb{E}_{U\sim\tilde{f}}[\langle U,e_{1}\rangle] as \langle u_{1},e_{1}\rangle\geq\langle u_{2},e_{1}\rangle for u_{1}\in B\setminus A and u_{2}\in A\setminus B. Let \hat{\mathcal{R}} be the randomizer that corresponds to \hat{f}. We define \mathcal{R}^{\prime}=\frac{1}{\mathbb{E}_{U\sim\hat{f}}[\langle U,e_{1}\rangle]}\hat{\mathcal{R}}. We have that \mathbb{E}[\mathcal{R}^{\prime}]=e_{1} and that

\mathsf{Err}(\mathcal{R}^{\prime}) =\frac{1}{\mathbb{E}_{U\sim\hat{f}}[\langle U,e_{1}\rangle]^{2}}-1
\leq\frac{1}{\mathbb{E}_{U\sim f}[\langle U,e_{1}\rangle]^{2}}-1
=\mathsf{Err}(\mathcal{R}).

As \mathcal{R}^{\prime} is an instance of PrivUnit, the claim follows. ∎

4 PrivUnitG: an optimal algorithm based on the Gaussian distribution

In this section, we develop a new variant of PrivUnit, namely PrivUnitG, based on the Gaussian distribution. PrivUnitG essentially provides an easy-to-analyze approximation of the optimal algorithm PrivUnit. This makes it possible to efficiently find accurate approximations of the optimal parameters p^{\star} and q^{\star}. In fact, we show that these parameters are independent of the dimension, which is computationally valuable. Moreover, building on PrivUnitG, we are able to analytically study the constants that characterize the optimal loss.

The main idea in PrivUnitG is to approximate the uniform distribution over the unit sphere using a Gaussian distribution. Roughly, for a Gaussian random vector U\sim\mathsf{N}(0,\frac{1}{d}I) and input vector v, PrivUnitG has the following distribution (up to normalization constants)

\text{PrivUnitG}\sim\begin{cases}U\mid\langle U,v\rangle\geq\gamma&\text{with probability }p\\ U\mid\langle U,v\rangle<\gamma&\text{with probability }1-p\end{cases}

We present the full details, including the normalization constants, in Algorithm 2. We usually use the notation \text{PrivUnitG}(p,q), which means applying PrivUnitG with p and \gamma=\Phi^{-1}(q)/\sqrt{d}.

Algorithm 2 \text{PrivUnitG}(p,q)
0:  v\in\mathbb{S}^{d-1}, q\in[0,1], p\in[0,1]
1:  Draw z\sim\mathsf{Ber}(p)
2:  Let U\sim\mathsf{N}(0,\sigma^{2}) where \sigma^{2}=1/d
3:  Set \gamma=\Phi_{\sigma}^{-1}(q)=\sigma\cdot\Phi^{-1}(q)
4:  if z=1 then
5:     Draw \alpha\sim U\mid U\geq\gamma
6:  else
7:     Draw \alpha\sim U\mid U<\gamma
8:  Draw V^{\perp}\sim\mathsf{N}(0,\sigma^{2}(I-vv^{T}))
9:  Set V=\alpha v+V^{\perp}
10:  Calculate
m=\sigma\phi(\gamma/\sigma)\left(\frac{p}{1-q}-\frac{1-p}{q}\right)
11:  Return \frac{1}{m}\cdot V
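A direct Python sketch of Algorithm 2 (our own illustration; it assumes v is a unit vector and uses scipy's truncated normal for the conditional draws):

```python
import numpy as np
from scipy.stats import norm, truncnorm

def priv_unit_g(v, p, q, rng=None):
    """Sketch of PrivUnitG(p, q) for a unit vector v in R^d."""
    rng = rng or np.random.default_rng()
    d = len(v)
    sigma = 1.0 / np.sqrt(d)
    t = norm.ppf(q)                       # gamma = sigma * Phi^{-1}(q)
    if rng.random() < p:                  # z ~ Ber(p): alpha ~ U | U >= gamma
        alpha = truncnorm.rvs(t, np.inf, scale=sigma, random_state=rng)
    else:                                 # alpha ~ U | U < gamma
        alpha = truncnorm.rvs(-np.inf, t, scale=sigma, random_state=rng)
    g = sigma * rng.standard_normal(d)
    v_perp = g - (g @ v) * v              # V_perp ~ N(0, sigma^2 (I - v v^T))
    V = alpha * v + v_perp
    m = sigma * norm.pdf(t) * (p / (1 - q) - (1 - p) / q)
    return V / m
```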

The following proposition gives the privacy and utility guarantees for PrivUnitG. The r.v. \alpha is defined (see Algorithm 2) as \alpha=\langle U,v\rangle where U is drawn from \text{PrivUnitG}(v). We define m=\sigma\phi(\gamma/\sigma)\left(\frac{p}{1-q}-\frac{1-p}{q}\right) with \sigma^{2}=1/d and \gamma=\sigma\cdot\Phi^{-1}(q). We defer the proof to Section B.1.

Proposition 4.

Let p,q\in[0,1] be such that \frac{p}{1-p}\frac{q}{1-q}\leq e^{\varepsilon}. The algorithm \text{PrivUnitG}(p,q) is an \varepsilon-DP local randomizer. Moreover, it is unbiased and has error

\mathsf{Err}(\text{PrivUnitG}(p,q))=\mathbb{E}[\left\|\text{PrivUnitG}(v)-v\right\|_{2}^{2}]=\frac{\mathbb{E}[\alpha^{2}]+\frac{d-1}{d}}{\mathbb{E}[\alpha]^{2}}-1.

Moreover, we have

m^{2}\cdot\mathsf{Err}(\text{PrivUnitG}(p,q))\stackrel{d\to\infty}{\longrightarrow}1.

Now we proceed to analyze the utility guarantees of PrivUnitG as compared to PrivUnit. To this end, we first define the error obtained by PrivUnitG with optimized parameters

\mathsf{Err}^{\star}_{\varepsilon,d}(\text{PrivUnitG})=\inf_{p,q:\frac{pq}{(1-p)(1-q)}\leq e^{\varepsilon}}\mathsf{Err}(\text{PrivUnitG}(p,q)).

Similarly, we define this quantity for PrivUnit

\mathsf{Err}^{\star}_{\varepsilon,d}(\text{PrivUnit})=\inf_{p,q:\frac{pq}{(1-p)(1-q)}\leq e^{\varepsilon}}\mathsf{Err}(\text{PrivUnit}(p,q)).
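Because \mathbb{E}[\alpha] and \mathbb{E}[\alpha^{2}] are standard truncated-normal moments, \mathsf{Err}(\text{PrivUnitG}(p,q)) can be evaluated in closed form, and \mathsf{Err}^{\star}_{\varepsilon,d}(\text{PrivUnitG}) can be approximated by a one-dimensional search over q, taking p as large as the privacy budget allows (heuristically, increasing p only helps). A Python sketch of this computation (our own illustration, using the error formula of Proposition 4):

```python
import numpy as np
from scipy.stats import norm

def err_priv_unit_g(p, q, d):
    """Error of PrivUnitG(p, q) via Proposition 4 and truncated-normal moments."""
    t = norm.ppf(q)                      # gamma / sigma, with sigma^2 = 1/d
    phi = norm.pdf(t)
    e_alpha = phi * (p / (1 - q) - (1 - p) / q) / np.sqrt(d)   # E[alpha] = m
    e_alpha2 = (1 + p * t * phi / (1 - q) - (1 - p) * t * phi / q) / d
    return (e_alpha2 + (d - 1) / d) / e_alpha**2 - 1

def optimal_params(eps, d, grid_size=2000):
    """Grid search for (p*, q*) under p/(1-p) * q/(1-q) <= e^eps.

    Searches q in [0.5, 1), i.e., assumes the optimal threshold gamma >= 0.
    """
    best = (np.inf, None, None)
    for q in np.linspace(0.5, 1.0, grid_size, endpoint=False):
        ratio = np.exp(eps) * (1 - q) / q    # largest allowed p/(1-p)
        p = ratio / (1 + ratio)
        err = err_priv_unit_g(p, q, d)
        if err < best[0]:
            best = (err, p, q)
    return best                              # (Err*, p*, q*)
```

Consistent with the discussion above, the q^{\star} returned by this search is essentially dimension-independent.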

The following theorem shows that PrivUnitG enjoys the same error as PrivUnit up to small factors.

Theorem 3.

Let p\in[0,1] and q\in[0,1] be such that \text{PrivUnit}(p,q) is an \varepsilon-DP local randomizer. Then \text{PrivUnitG}(p,q) is also an \varepsilon-DP local randomizer and satisfies

\mathsf{Err}(\text{PrivUnitG}(p,q))\leq\mathsf{Err}(\text{PrivUnit}(p,q))\left(1+O\left(\sqrt{\frac{\varepsilon+\log d}{d}}\right)\right).

In particular,

\frac{\mathsf{Err}^{\star}_{\varepsilon,d}(\text{PrivUnitG})}{\mathsf{Err}^{\star}_{\varepsilon,d}(\text{PrivUnit})}\leq 1+O\left(\sqrt{\frac{\varepsilon+\log d}{d}}\right).

We conduct several experiments demonstrating that the error of the two algorithms becomes nearly identical as the dimension grows. We plot the ratio of the error of PrivUnitG to that of PrivUnit (for the same p and \gamma) for several values of \varepsilon and d in Figure 1. These plots reaffirm the theoretical results of Theorem 3: the ratio approaches 1 for large d and small \varepsilon.

Figure 1: Ratio of the error of PrivUnitG to PrivUnit for (a) \varepsilon=4.0, (b) \varepsilon=8.0, (c) \varepsilon=16.0. We use the same p and \gamma for both algorithms, obtained by finding the p,q that minimize the error of PrivUnitG.
Proof.

The privacy claim is straightforward: both algorithms use the same p and q and therefore enjoy the same privacy parameter. Moreover, the second part of the claim follows immediately from the first part and the optimality of PrivUnit (Proposition 2).

Now we prove the first part of the claim. Let \gamma_{2} be such that q=P(W\leq\gamma_{2}), where W is distributed as \langle u,v\rangle for u\sim\mathsf{Uni}(\mathbb{S}_{2}^{d-1}). Then we have

m_{2} =\mathbb{E}[W1\{W\geq\gamma_{2}\}]\frac{p+q-1}{q(1-q)},
\mathsf{Err}_{2} =\mathsf{Err}(\text{PrivUnit}(p,q))=\frac{1}{m_{2}^{2}}.

Similarly, for PrivUnitG we have (see, e.g., the proof of Proposition 4)

m_{G} =\mathbb{E}[\alpha]=\mathbb{E}[U1\{U\geq\gamma_{G}\}]\frac{p+q-1}{q(1-q)},
\mathsf{Err}_{G} =\mathsf{Err}(\text{PrivUnitG}(p,q))=\frac{\mathbb{E}[\alpha^{2}]-1/d}{m_{G}^{2}}+\frac{1}{m_{G}^{2}}.

Therefore we have

\sqrt{\frac{\mathsf{Err}_{G}}{\mathsf{Err}_{2}}} \leq\frac{m_{2}}{m_{G}}\sqrt{1+\mathbb{E}[\alpha^{2}]}
=\frac{\mathbb{E}[W1\{W\geq\gamma_{2}\}]}{\mathbb{E}[U1\{U\geq\gamma_{G}\}]}\sqrt{1+\mathbb{E}[\alpha^{2}]}
\stackrel{(i)}{\leq}\frac{\mathbb{E}[W1\{W\geq\gamma_{2}\}]}{\mathbb{E}[U1\{U\geq\gamma_{G}\}]}\left(1+O\left(\gamma_{G}+\sqrt{\frac{\varepsilon+\log d}{d}}\right)\right), \qquad (1)

where inequality (i) follows from Lemma B.3. We now upper bound the first term. Note first that for the same threshold \gamma we have

\frac{\mathbb{E}[W1\{W\geq\gamma\}]}{\mathbb{E}[U1\{U\geq\gamma\}]} \stackrel{(i)}{\leq}1+O\left(\frac{\sqrt{\log d}}{d^{3/2}\,\mathbb{E}[U1\{U\geq\gamma\}]}\right)
\stackrel{(ii)}{\leq}1+O\left(\frac{\sqrt{\log d}}{d}\right),

where (i) follows from Lemma B.2 and (ii) follows since \mathbb{E}[U1\{U\geq\gamma\}]\geq\mathbb{E}[U1\{U\geq 0\}]=\mathbb{E}[|U|]/2\geq\Omega(\sigma)=\Omega(1/\sqrt{d}). We need one more step to finish the proof, as \gamma_{2} and \gamma_{G} are not necessarily equal. We consider two cases. If \gamma_{2}\geq\gamma_{G}, then \mathbb{E}[U1\{U\geq\gamma_{G}\}]\geq\mathbb{E}[U1\{U\geq\gamma_{2}\}] and therefore

\frac{\mathbb{E}[W1\{W\geq\gamma_{2}\}]}{\mathbb{E}[U1\{U\geq\gamma_{G}\}]}\leq\frac{\mathbb{E}[W1\{W\geq\gamma_{2}\}]}{\mathbb{E}[U1\{U\geq\gamma_{2}\}]}.

Similarly, if \gamma_{2}\leq\gamma_{G}, then \mathbb{E}[W1\{W\geq\gamma_{2}\}]\leq\mathbb{E}[W1\{W\geq\gamma_{G}\}] and therefore

\frac{\mathbb{E}[W1\{W\geq\gamma_{2}\}]}{\mathbb{E}[U1\{U\geq\gamma_{G}\}]}\leq\frac{\mathbb{E}[W1\{W\geq\gamma_{G}\}]}{\mathbb{E}[U1\{U\geq\gamma_{G}\}]}.

In both cases the single-threshold bound above applies, so overall

\frac{\mathbb{E}[W1\{W\geq\gamma_{2}\}]}{\mathbb{E}[U1\{U\geq\gamma_{G}\}]}\leq 1+O\left(\frac{\sqrt{\log d}}{d}\right).

Putting this back in inequality (1), we have

\sqrt{\frac{\mathsf{Err}_{G}}{\mathsf{Err}_{2}}} \leq\left(1+O\left(\frac{\sqrt{\log d}}{d}\right)\right)\left(1+O\left(\gamma_{G}+\sqrt{\frac{\varepsilon+\log d}{d}}\right)\right)
\leq 1+O\left(\sqrt{\frac{\varepsilon+\log d}{d}}\right),

where the last inequality follows from Lemma B.4, as \gamma_{G}\leq O(\sqrt{(\varepsilon+1)/d}). The claim follows. ∎

4.1 Analytical expression for optimal error

We wish to understand the constants that characterize the optimal error. To this end, we build on the optimality of PrivUnitG and define the quantity C_{\varepsilon,d} by

\mathsf{Err}^{\star}_{\varepsilon,d}(\text{PrivUnitG})=C_{\varepsilon,d}\cdot\frac{d}{\varepsilon}.

We show that C_{\varepsilon,d}\to C_{\varepsilon} as d\to\infty and, moreover, that C_{\varepsilon}\to C^{\star} as \varepsilon\to\infty. We experimentally demonstrate the behavior of C_{\varepsilon,d} and C_{\varepsilon} in Figure 2. These experiments show that C^{\star}\approx 0.614. We remark that, as shown in [FT21], if C_{\varepsilon}/C_{k\varepsilon} is close to 1, then one can get a near-optimal algorithm for privacy parameter k\varepsilon by repeating the algorithm with privacy parameter \varepsilon a total of k times. The latter may be more efficient in terms of computation, and this motivates understanding how quickly C_{\varepsilon} converges.
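
Proposition 4 makes C_{\varepsilon,d} numerically tractable: both m=\mathbb{E}[\alpha] and \mathbb{E}[\alpha^{2}] have Gaussian closed forms, so \mathsf{Err}(\text{PrivUnitG}(p,q)) can be evaluated and minimized directly. The following sketch is our own illustration: it takes the privacy constraint to hold with equality, optimizes over q\in(1/2,1) alone (so that p+q>1 and m>0), and uses the identity \mathbb{E}[Z^{2}1\{Z\geq t\}]=1-\Phi(t)+t\phi(t) for standard Gaussian Z.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def err_privunitg(eps, d, q):
    """Closed-form error of PrivUnitG(p, q) with the privacy constraint tight."""
    r = np.exp(eps) * (1.0 - q) / q      # largest p with p/(1-p) * q/(1-q) <= e^eps
    p = r / (1.0 + r)
    sigma = 1.0 / np.sqrt(d)
    t = norm.ppf(q)                      # t = gamma / sigma
    m = sigma * norm.pdf(t) * (p / (1 - q) - (1 - p) / q)
    e_alpha2 = sigma**2 * (p * (1 - q + t * norm.pdf(t)) / (1 - q)
                           + (1 - p) * (q - t * norm.pdf(t)) / q)
    return (e_alpha2 + (d - 1) / d) / m**2 - 1

def c_eps_d(eps, d):
    """Estimate C_{eps,d} by minimizing the error over q."""
    res = minimize_scalar(lambda q: err_privunitg(eps, d, q),
                          bounds=(0.5 + 1e-6, 1 - 1e-9), method="bounded")
    return res.fun * eps / d
```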

Figure 2: (a) C_{\varepsilon,d} as a function of d for \varepsilon=35. (b) C_{\varepsilon} as a function of \varepsilon (we approximate C_{\varepsilon} by taking a sufficiently large dimension, d=5\cdot 10^{4}).

The following proposition shows that C_{\varepsilon,d} converges as we increase the dimension d. We provide the proof in Section B.3.

Proposition 5.

Fix \varepsilon>0. For any 1\leq d_{1}\leq d_{2},

1-O\left(\frac{\varepsilon+\log d_{2}}{d_{2}}+\frac{\varepsilon}{d_{1}}\right)\leq\frac{C_{\varepsilon,d_{1}}}{C_{\varepsilon,d_{2}}}\leq 1+O\left(\frac{\varepsilon+\log d_{1}}{d_{1}}+\frac{\varepsilon}{d_{2}}\right).

In particular,

C_{\varepsilon,d}\stackrel{d\to\infty}{\longrightarrow}C_{\varepsilon}.
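
Using the c_eps_d sketch above, the convergence in Proposition 5 is easy to observe numerically (illustrative values):

```python
for d in (10**2, 10**3, 10**4, 10**5):
    print(d, c_eps_d(8.0, d))  # the printed values stabilize as d grows
```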

The following proposition shows that C_{\varepsilon} also converges as we increase \varepsilon. We present the proof in Section B.4.

Proposition 6.

There is C^{\star}>0 such that \lim_{\varepsilon\to\infty}C_{\varepsilon}=C^{\star}.

References

  • [App17] Apple Differential Privacy Team “Learning with Privacy at Scale” Available at https://machinelearning.apple.com/2017/12/06/learning-with-privacy-at-scale.html, 2017
  • [ASYKM18] Naman Agarwal, Ananda Theertha Suresh, Felix Xinnan X Yu, Sanjiv Kumar and Brendan McMahan “cpSGD: Communication-efficient and differentially-private distributed SGD” In Advances in Neural Information Processing Systems 31 Curran Associates, Inc., 2018, pp. 7564–7575 URL: https://proceedings.neurips.cc/paper/2018/file/21ce689121e39821d07d04faab328370-Paper.pdf
  • [BBGN19] Borja Balle, James Bell, Adrià Gascón and Kobbi Nissim “The Privacy Blanket of the Shuffle Model” In Advances in Cryptology – CRYPTO 2019 Cham: Springer International Publishing, 2019, pp. 638–667
  • [BDFKR18] Abhishek Bhowmick, John Duchi, Julien Freudiger, Gaurav Kapoor and Ryan Rogers “Protection Against Reconstruction and Its Applications in Private Federated Learning” In arXiv:1812.00984 [stat.ML], 2018
  • [BEMMRLRKTS17] Andrea Bittau, Úlfar Erlingsson, Petros Maniatis, Ilya Mironov, Ananth Raghunathan, David Lie, Mitch Rudominer, Ushasree Kode, Julien Tinnes and Bernhard Seefeld “Prochlo: Strong Privacy for Analytics in the Crowd” In Proceedings of the 26th Symposium on Operating Systems Principles, SOSP ’17 Shanghai, China: Association for Computing Machinery, 2017, pp. 441–459 DOI: 10.1145/3132747.3132769
  • [BIKMMPRSS17] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal and Karn Seth “Practical Secure Aggregation for Privacy-Preserving Machine Learning” In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security New York, NY, USA: ACM, 2017, pp. 1175–1191
  • [BN10] Hai Brenner and Kobbi Nissim “Impossibility of Differentially Private Universally Optimal Mechanisms” In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, 2010, pp. 71–80 DOI: 10.1109/FOCS.2010.13
  • [BNO08] Amos Beimel, Kobbi Nissim and Eran Omri “Distributed private data analysis: Simultaneously solving how and what” In Advances in Cryptology 5157, Lecture Notes in Computer Science Springer, 2008, pp. 451–468
  • [BT97] Dimitris Bertsimas and John N Tsitsiklis “Introduction to linear optimization” Athena Scientific Belmont, MA, 1997
  • [CKÖ20] Wei-Ning Chen, Peter Kairouz and Ayfer Özgür “Breaking the communication-privacy-accuracy trilemma” In arXiv:2007.11707 [cs.LG], 2020
  • [CSUZZ19] Albert Cheu, Adam Smith, Jonathan Ullman, David Zeber and Maxim Zhilyaev “Distributed Differential Privacy via Shuffling” In Advances in Cryptology – EUROCRYPT 2019 Cham: Springer International Publishing, 2019, pp. 375–403
  • [CU21] Albert Cheu and Jonathan Ullman “The Limits of Pan Privacy and Shuffle Privacy for Learning and Estimation” In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing New York, NY, USA: Association for Computing Machinery, 2021, pp. 1081–1094 URL: https://doi.org/10.1145/3406325.3450995
  • [DF87] Persi Diaconis and David Freedman “A dozen de Finetti-style results in search of a theory” In Annales de l’IHP Probabilités et statistiques 23, 1987, pp. 397–423
  • [DJW18] John C. Duchi, Michael I. Jordan and Martin J. Wainwright “Minimax Optimal Procedures for Locally Private Estimation (with discussion)” In Journal of the American Statistical Association 113.521, 2018, pp. 182–215
  • [DMNS06] C. Dwork, F. McSherry, K. Nissim and A. Smith “Calibrating noise to sensitivity in private data analysis” In TCC, 2006, pp. 265–284
  • [DR19] John C. Duchi and Ryan Rogers “Lower Bounds for Locally Private Estimation via Communication Complexity” In Proceedings of the Thirty Second Annual Conference on Computational Learning Theory, 2019
  • [Duc18] John C. Duchi “Introductory Lectures on Stochastic Convex Optimization” In The Mathematics of Data, IAS/Park City Mathematics Series American Mathematical Society, 2018
  • [EFMRSTT20] Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Shuang Song, Kunal Talwar and Abhradeep Thakurta “Encode, shuffle, analyze privacy revisited: Formalizations and empirical evaluation” In arXiv:2001.03618 [cs.CR], 2020
  • [EFMRTT19] Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar and Abhradeep Thakurta “Amplification by Shuffling: From Local to Central Differential Privacy via Anonymity” In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’19 San Diego, California: Society for Industrial and Applied Mathematics, 2019, pp. 2468–2479
  • [EGS03] Alexandre V. Evfimievski, Johannes Gehrke and Ramakrishnan Srikant “Limiting privacy breaches in privacy preserving data mining” In Proceedings of the Twenty-Second Symposium on Principles of Database Systems, 2003, pp. 211–222
  • [EPK14] Ulfar Erlingsson, Vasyl Pihur and Aleksandra Korolova “RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response” In Proceedings of the 21st ACM Conference on Computer and Communications Security (CCS), 2014
  • [FMT20] Vitaly Feldman, Audra McMillan and Kunal Talwar “Hiding Among the Clones: A Simple and Nearly Optimal Analysis of Privacy Amplification by Shuffling” In arXiv:2012.12803 [cs.LG], 2020
  • [FT21] Vitaly Feldman and Kunal Talwar “Lossless Compression of Efficient Private Local Randomizers” In Proceedings of the 38th International Conference on Machine Learning 139 PMLR, 2021, pp. 3208–3219
  • [GDDKS20] Antonious M. Girgis, Deepesh Data, Suhas Diggavi, Peter Kairouz and Ananda Theertha Suresh “Shuffled Model of Federated Learning: Privacy, Communication and Accuracy Trade-offs”, 2020 arXiv:2008.07180 [cs.LG]
  • [GKMM19] Venkata Gandikota, Daniel Kane, Raj Kumar Maity and Arya Mazumdar “vqsgd: Vector quantized stochastic gradient descent” In arXiv preprint arXiv:1911.07971, 2019
  • [GKOV15] Quan Geng, Peter Kairouz, Sewoong Oh and Pramod Viswanath “The Staircase Mechanism in Differential Privacy” In IEEE Journal of Selected Topics in Signal Processing 9.7, 2015, pp. 1176–1184 DOI: 10.1109/JSTSP.2015.2425831
  • [GRS09] Arpita Ghosh, Tim Roughgarden and Mukund Sundararajan “Universally Utility-Maximizing Privacy Mechanisms” In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, STOC ’09 Bethesda, MD, USA: Association for Computing Machinery, 2009, pp. 351–360 DOI: 10.1145/1536414.1536464
  • [GS10] Mangesh Gupte and Mukund Sundararajan “Universally Optimal Privacy Mechanisms for Minimax Agents” In Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS ’10 Indianapolis, Indiana, USA: Association for Computing Machinery, 2010, pp. 135–146 DOI: 10.1145/1807085.1807105
  • [KOV16] Peter Kairouz, Sewoong Oh and Pramod Viswanath “Extremal Mechanisms for Local Differential Privacy” In J. Mach. Learn. Res. 17.1 JMLR.org, 2016, pp. 492–542 URL: http://dl.acm.org/citation.cfm?id=2946645.2946662
  • [War65] Stanley L Warner “Randomized response: A survey technique for eliminating evasive answer bias” In Journal of the American Statistical Association 60.309 Taylor & Francis Group, 1965, pp. 63–69
  • [YB17] Min Ye and Alexander Barg “Asymptotically optimal private estimation under mean square loss” In arXiv:1708.00059 [math.ST], 2017
  • [YB18] Min Ye and Alexander Barg “Optimal Schemes for Discrete Distribution Estimation Under Locally Differential Privacy” In IEEE Transactions on Information Theory 64.8, 2018, pp. 5662–5676

Appendix A Missing details for PrivUnit (Section 3)

A.1 Proof of Proposition 3 (missing details)

Here we complete the missing details from the proof of Proposition 3. We have sets B_{1},\dots,B_{K} such that for at least K-(d-2) of the sets B_{i} we have f(u_{i})\in\{e^{\varepsilon},e^{-\varepsilon}\}p. We now show how to manipulate the probabilities on the remaining sets to satisfy our desired properties without affecting the accuracy. Assume without loss of generality that B_{1},\dots,B_{d-2} do not satisfy this condition and let B_{\mathsf{bad}}=\cup_{1\leq i\leq d-2}B_{i}. We now show that we can manipulate the density on B_{\mathsf{bad}} so that it satisfies this condition. Moreover, we show that the probability that u\in B_{\mathsf{bad}} is small, so that this does not affect the accuracy of the algorithm. Note that the probability that u\in B_{\mathsf{bad}} is at most

p_{\mathsf{bad}}\coloneqq P(\hat{\mathcal{R}}\in B_{\mathsf{bad}})\leq d\max_{i}V_{i}\,pe^{\varepsilon}\leq dV\delta^{d}pe^{\varepsilon},

where V=\int_{u\in\mathbb{S}^{d-1}}1\,du. Now we proceed to balance the density in B_{\mathsf{bad}}. Let B_{\mathsf{bad}}=A_{0}\cup A_{1} where A_{0}=A_{1}^{-} and A_{0}\cap A_{1}=\emptyset. We show how to balance the probabilities for u\in A_{0} such that the mass on A_{0} is p_{\mathsf{bad}}/2. Then we define the density for u\in A_{1} using the density of u^{-}, as u^{-}\in A_{0}. Let V_{A_{0}}=\int_{u\in A_{0}}1\,du be the measure of the set A_{0}. We divide A_{0} into two sets A_{0}^{1} and A_{0}^{2} such that \int_{u\in A_{0}^{1}}1\,du=\rho V_{A_{0}} and \int_{u\in A_{0}^{2}}1\,du=(1-\rho)V_{A_{0}}. We define a new density function \tilde{f} such that

\tilde{f}(u)=\begin{cases}\hat{f}(u)&\text{if }u\notin B_{\mathsf{bad}}\\ pe^{\varepsilon}&\text{if }u\in A_{0}^{1}\\ pe^{-\varepsilon}&\text{if }u\in A_{0}^{2}\\ \tilde{f}(u^{-})&\text{if }u\in A_{1}\end{cases}

First, note that by design we have \tilde{f}(u)=\tilde{f}(u^{-}). We now prove that the choice \rho=\frac{p_{\mathsf{bad}}/(2pV_{A_{0}})-e^{-\varepsilon}}{e^{\varepsilon}-e^{-\varepsilon}} satisfies all of our conditions. First we show that \rho\in[0,1]. Indeed, as \hat{f}(u)\in[e^{-\varepsilon},e^{\varepsilon}]p for all u, the average density in B_{\mathsf{bad}} is also in this range, that is, \frac{p_{\mathsf{bad}}}{pV_{\mathsf{bad}}}=\frac{p_{\mathsf{bad}}}{2pV_{A_{0}}}\in[e^{-\varepsilon},e^{\varepsilon}], which implies \rho\in[0,1]. Moreover, note that

\int_{u\in A_{0}}\tilde{f}(u)\,du=pe^{\varepsilon}V_{A_{0}}\rho+pe^{-\varepsilon}V_{A_{0}}(1-\rho)=p_{\mathsf{bad}}/2.

This implies that \int_{u\in\mathbb{S}^{d-1}}\tilde{f}(u)\,du=1. Finally, note that this does not affect \alpha too much, as we have

\tilde{\alpha} =\sum_{u\in S}\tilde{f}(u)V_{i}\langle v,\bar{u}_{i}\rangle
=\sum_{u\in S\setminus B_{\mathsf{bad}}}\hat{f}(u)V_{i}\langle v,\bar{u}_{i}\rangle+\sum_{u\in S\cap B_{\mathsf{bad}}}\tilde{f}(u)V_{i}\langle v,\bar{u}_{i}\rangle
=\sum_{u\in S}\hat{f}(u)V_{i}\langle v,\bar{u}_{i}\rangle+\sum_{u\in S\cap B_{\mathsf{bad}}}\tilde{f}(u)V_{i}\langle v,\bar{u}_{i}\rangle-\sum_{u\in S\cap B_{\mathsf{bad}}}\hat{f}(u)V_{i}\langle v,\bar{u}_{i}\rangle
\geq\hat{\alpha}-p_{\mathsf{bad}}.

Note that for sufficiently small \delta, the error of this algorithm is now

\frac{1}{\tilde{\alpha}^{2}} \leq\frac{1}{(\hat{\alpha}-p_{\mathsf{bad}})^{2}}
\leq\frac{1}{(\alpha-p_{\mathsf{bad}}-2\delta)^{2}}
\leq\frac{1}{\alpha^{2}}+\frac{20(p_{\mathsf{bad}}+2\delta)}{\alpha^{2}}
\leq\frac{1}{\alpha^{2}}+\tau,

where the last inequality follows by choosing \delta small enough such that \frac{20(p_{\mathsf{bad}}+2\delta)}{\alpha^{2}}\leq\tau, which gives the claim. ∎

Appendix B Proofs and missing details for Section 4

B.1 Proof of Proposition 4

We will use the following helper lemma.

Lemma B.1.

Let U\sim\mathsf{N}(0,\sigma^{2}). Then

\mathbb{E}[U1\{U\geq\gamma\}]=\sigma\phi(\gamma/\sigma).
Proof.

We have

\mathbb{E}[U1\{U\geq\gamma\}] =\int_{\gamma}^{\infty}t\,\phi_{\sigma^{2}}(t)\,dt
=\int_{\gamma}^{\infty}t\,\frac{1}{\sqrt{2\pi\sigma^{2}}}e^{-t^{2}/2\sigma^{2}}\,dt
=\frac{\sigma}{\sqrt{2\pi}}e^{-\gamma^{2}/2\sigma^{2}}
=\sigma\phi(\gamma/\sigma). ∎
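
As a sanity check, Lemma B.1 is easy to verify by Monte Carlo (a throwaway sketch; the constants are arbitrary):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
sigma, gamma = 0.3, 0.2
u = rng.normal(scale=sigma, size=10**7)
print(np.mean(u * (u >= gamma)))        # Monte Carlo estimate of E[U 1{U >= gamma}]
print(sigma * norm.pdf(gamma / sigma))  # closed form: sigma * phi(gamma / sigma)
```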

Proof.

(Proposition 4) We begin by proving that PrivUnitG is unbiased. Note that \mathbb{E}[\frac{1}{m}V]=\mathbb{E}[\alpha/m]\cdot v, so it suffices to show that \mathbb{E}[\alpha]=m. To this end,

\mathbb{E}[\alpha] =p\,\mathbb{E}[U\mid U\geq\gamma]+(1-p)\,\mathbb{E}[U\mid U<\gamma]
=\frac{p}{P(U\geq\gamma)}\mathbb{E}[U1\{U\geq\gamma\}]+\frac{1-p}{P(U<\gamma)}\mathbb{E}[U1\{U<\gamma\}]
=\mathbb{E}[U1\{U\geq\gamma\}]\left(\frac{p}{P(U\geq\gamma)}-\frac{1-p}{P(U<\gamma)}\right)
=\sigma\phi(\gamma/\sigma)\left(\frac{p}{P(U\geq\gamma)}-\frac{1-p}{P(U<\gamma)}\right)
=m,

where the third equality follows since \mathbb{E}[U]=\mathbb{E}[U1\{U\geq\gamma\}]+\mathbb{E}[U1\{U<\gamma\}]=0, and the fourth equality follows since Lemma B.1 gives \mathbb{E}[U1\{U\geq\gamma\}]=\sigma\phi(\gamma/\sigma) for U\sim\mathsf{N}(0,\sigma^{2}). For the claim about utility, as PrivUnitG is unbiased we have

\mathbb{E}[\left\|\text{PrivUnitG}(v)-v\right\|_{2}^{2}]=\mathbb{E}[\left\|\text{PrivUnitG}(v)\right\|_{2}^{2}]-1=\frac{1}{m^{2}}\left(\mathbb{E}[\alpha^{2}]+\frac{d-1}{d}\right)-1.

Now we prove the claim about the limit. First, note that P(U\leq\gamma)=q, hence we can write

m=\sigma\phi(\gamma/\sigma)\left(\frac{p}{1-q}-\frac{1-p}{q}\right).

Moreover, Lemma B.3 and Lemma B.4 show that \mathbb{E}[\alpha^{2}]\leq O\left(\gamma^{2}+\frac{\varepsilon+\log d}{d}\right)\leq O\left(\frac{\varepsilon+\log d}{d}\right). Since m^{2}\cdot\mathsf{Err}(\text{PrivUnitG}(p,q))=\mathbb{E}[\alpha^{2}]+\frac{d-1}{d}-m^{2} and m^{2}=\mathbb{E}[\alpha]^{2}\leq\mathbb{E}[\alpha^{2}], taking the limit as d\to\infty yields

m^{2}\cdot\mathbb{E}[\left\|\text{PrivUnitG}(v)-v\right\|_{2}^{2}]\stackrel{d\to\infty}{\longrightarrow}1.

Now we proceed to prove the privacy claim. We need to show that for every v_{1},v_{2}\in\mathbb{S}_{2}^{d-1} and u\in\mathbb{R}^{d},

\frac{f_{\text{PrivUnitG}(v_{1})}(u)}{f_{\text{PrivUnitG}(v_{2})}(u)}\leq e^{\varepsilon}.

For every input vector v, we divide the output space into two sets: S_{v}=\{u\in\mathbb{R}^{d}:\langle u,v\rangle\geq\gamma\} and S_{v}^{c}=\mathbb{R}^{d}\setminus S_{v}. The definition of PrivUnitG implies that for u\in S_{v} we have

f_{\text{PrivUnitG}(v)}(u)=p\,\frac{f_{U}(u)}{P(U\geq\gamma)},

and for u\notin S_{v} we have

f_{\text{PrivUnitG}(v)}(u)=(1-p)\,\frac{f_{U}(u)}{P(U\leq\gamma)}.

Using the notation q=P(U\leq\gamma), we now have that for any v_{1},v_{2} and u,

\frac{f_{\text{PrivUnitG}(v_{1})}(u)}{f_{\text{PrivUnitG}(v_{2})}(u)} \leq\frac{p/(1-q)}{(1-p)/q}
=\frac{p}{1-p}\cdot\frac{q}{1-q}
\leq e^{\varepsilon},

where the first inequality follows since the ratio is maximized when u\in S_{v_{1}} and u\notin S_{v_{2}}. ∎

B.2 Helper lemmas for Theorem 3

Lemma B.2.

We have

|\mathbb{E}[W1\{W\geq\gamma\}]-\mathbb{E}[U1\{U\geq\gamma\}]|\leq O\left(\frac{\sqrt{\log d}}{d^{3/2}}\right).
Proof.

We use the fact that for U and W we have P(U\leq\gamma)\leq P(W\leq\gamma)+\frac{8}{d-4} ([DF87, Inequality (1)]). We have

|\mathbb{E}[W1\{W\geq\gamma\}]-\mathbb{E}[U1\{U\geq\gamma\}]| =\left|\int_{\gamma}^{\infty}u(f_{W}(u)-f_{U}(u))\,du\right|
\leq\left|\int_{\gamma}^{C}u(f_{W}(u)-f_{U}(u))\,du\right|+\left|\int_{C}^{\infty}u(f_{W}(u)-f_{U}(u))\,du\right|
\leq C\left\|P_{W}-P_{U}\right\|_{TV}+\int_{C}^{1}uf_{W}(u)\,du+\frac{f_{U}(C)}{d}
\leq\frac{8C}{d-4}+\frac{e^{-C^{2}d/2}}{\sqrt{2\pi d}}+f_{W}(C)
\stackrel{(i)}{\leq}\frac{8C}{d-4}+\frac{e^{-C^{2}d/2}}{\sqrt{2\pi d}}+O(\sqrt{d}e^{-C^{2}d/8})
\leq O\left(\frac{\sqrt{\log d}}{d^{3/2}}\right),

where the last inequality follows by setting C=\Theta\left(\sqrt{\frac{\log d}{d}}\right). Inequality (i) follows since W has the same distribution as 2B-1 for B\sim\mathsf{Beta}(\frac{d-1}{2},\frac{d-1}{2}). Indeed, setting \alpha=\frac{d-1}{2}, we have for any b=1/2+\rho

f_{B}(b) =\frac{(b(1-b))^{\alpha-1}}{B(\alpha,\alpha)}
=\frac{(1-\rho^{2})^{\alpha-1}}{4^{\alpha-1}}\cdot\frac{1}{B(\alpha,\alpha)}
\leq\frac{e^{-\rho^{2}(\alpha-1)}}{4^{\alpha-1}}\cdot\frac{1}{B(\alpha,\alpha)}
\leq O(\sqrt{\alpha}e^{-\rho^{2}\alpha}),

where the last inequality follows from

B(\alpha,\alpha)=\frac{2}{\alpha}\cdot\frac{1}{\binom{2\alpha}{\alpha}}\geq\frac{2}{\alpha}\cdot\Omega\!\left(\frac{\sqrt{\alpha}}{4^{\alpha}}\right)=\Omega\!\left(\frac{1}{\sqrt{\alpha}4^{\alpha}}\right).

The claim now follows as f_{W}(C)=f_{B}(1/2+C/2). ∎

Lemma B.3.

We have

\mathbb{E}[\alpha^{2}]\leq O\left(\gamma^{2}+\frac{\varepsilon_{G}+\log d}{d}\right).
Proof.

It is enough to upper bound \mathbb{E}[U^{2}\mid U\geq\gamma]. Since P(U\geq\gamma)\geq e^{-\varepsilon_{G}}, we have

\mathbb{E}[U^{2}\mid U\geq\gamma] =\int_{\gamma}^{C}u^{2}\frac{f_{U}(u)}{P(U\geq\gamma)}\,du+\int_{C}^{\infty}u^{2}\frac{f_{U}(u)}{P(U\geq\gamma)}\,du
\leq C^{2}+e^{\varepsilon_{G}}\frac{1}{\sqrt{2\pi d}}\int_{C}^{\infty}u^{2}d\,e^{-u^{2}d/2}\,du
\leq C^{2}+e^{\varepsilon_{G}}\frac{e^{-C^{2}d/4}}{\sqrt{2\pi d}}\int_{C}^{\infty}u^{2}d\,e^{-u^{2}d/4}\,du
\leq O\left(\gamma^{2}+\frac{\varepsilon_{G}+\log d}{d}\right),

where the last inequality follows by setting C=\Theta\left(\max\left(\gamma,\sqrt{\frac{\varepsilon_{G}+\log d}{d}}\right)\right). ∎

The following lemma is also useful for our analysis.

Lemma B.4.

Assume \text{PrivUnitG}(p,\gamma) is \varepsilon-DP. Then \gamma^{2}\leq\frac{2\log(e^{\varepsilon}+1)}{d}.

Proof.

Proposition 4 implies that \gamma=\frac{\Phi^{-1}(q)}{\sqrt{d}} where q=\frac{e^{\varepsilon_{1}}}{e^{\varepsilon_{1}}+1} for some \varepsilon_{1}\leq\varepsilon. As \Phi^{-1}(q) is increasing in q, we have

\gamma=\frac{\Phi^{-1}(q)}{\sqrt{d}}\leq\frac{\Phi^{-1}\!\left(\frac{e^{\varepsilon}}{e^{\varepsilon}+1}\right)}{\sqrt{d}}.

Gaussian concentration gives that for t^{2}\geq 2\log(e^{\varepsilon}+1),

P(\mathsf{N}(0,1)>t)\leq e^{-t^{2}/2}\leq 1/(e^{\varepsilon}+1).

This implies that \Phi^{-1}\!\left(\frac{e^{\varepsilon}}{e^{\varepsilon}+1}\right)\leq\sqrt{2\log(e^{\varepsilon}+1)}, which proves the claim. ∎
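
The bound \Phi^{-1}(\frac{e^{\varepsilon}}{e^{\varepsilon}+1})\leq\sqrt{2\log(e^{\varepsilon}+1)} used above is also easy to confirm numerically (an illustrative check, not part of the proof):

```python
import numpy as np
from scipy.stats import norm

for eps in (0.5, 1.0, 4.0, 16.0):
    lhs = norm.ppf(np.exp(eps) / (np.exp(eps) + 1.0))
    rhs = np.sqrt(2.0 * np.log(np.exp(eps) + 1.0))
    assert lhs <= rhs, (eps, lhs, rhs)
```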

B.3 Proof of Proposition 5

Recall that the error of \text{PrivUnitG}(p,q) in dimension d is

\mathsf{Err}_{d}(\text{PrivUnitG}(p,q))=\frac{\mathbb{E}[\alpha^{2}]+1}{m_{d}^{2}}-1

where

m_{d}=\frac{\phi(\gamma\sqrt{d})}{\sqrt{d}}\left(\frac{p}{1-q}-\frac{1-p}{q}\right).

Note first that \gamma\sqrt{d}=\Phi^{-1}(q), which immediately implies that

\frac{m_{d_{1}}}{m_{d_{2}}}=\sqrt{\frac{d_{2}}{d_{1}}}.

Lemma B.3 and Lemma B.4 show that \mathbb{E}[\alpha^{2}]\leq O(\frac{\varepsilon+\log d}{d}). Finally, as \mathsf{Err}_{d}(p,q)\geq Cd/\varepsilon for all p and q (for a universal constant C>0), this implies that m_{d}^{2}\leq O(\varepsilon/d). Let \mathsf{Err}_{d} denote the error for inputs u\in\mathbb{S}^{d-1}, and let (p_{1},q_{1}) and (p_{2},q_{2}) denote the optimal parameters for dimensions d_{1} and d_{2}, respectively. We now have

\frac{C_{\varepsilon,d_{1}}}{C_{\varepsilon,d_{2}}} =\frac{\mathsf{Err}_{d_{1}}(\text{PrivUnitG}(p_{1},q_{1}))/d_{1}}{\mathsf{Err}_{d_{2}}(\text{PrivUnitG}(p_{2},q_{2}))/d_{2}}
\leq\frac{d_{2}}{d_{1}}\cdot\frac{\mathsf{Err}_{d_{1}}(\text{PrivUnitG}(p_{2},q_{2}))}{\mathsf{Err}_{d_{2}}(\text{PrivUnitG}(p_{2},q_{2}))}
\leq\frac{d_{2}}{d_{1}}\cdot\frac{(\mathbb{E}[\alpha_{1}^{2}]+1)/m_{d_{1}}^{2}}{1/m_{d_{2}}^{2}-1}
\leq\frac{d_{2}}{d_{1}}(\mathbb{E}[\alpha_{1}^{2}]+1)\frac{m_{d_{2}}^{2}}{m_{d_{1}}^{2}}\cdot\frac{1}{1-m_{d_{2}}^{2}}
=(\mathbb{E}[\alpha_{1}^{2}]+1)\frac{1}{1-m_{d_{2}}^{2}}
\leq\left(1+O\left(\frac{\varepsilon+\log d_{1}}{d_{1}}\right)\right)\left(1+O\left(\frac{\varepsilon}{d_{2}}\right)\right)
\leq 1+O\left(\frac{\varepsilon+\log d_{1}}{d_{1}}+\frac{\varepsilon}{d_{2}}\right).

This proves the right-hand side of the inequality. The left-hand side follows using the same arguments by noting that

\frac{C_{\varepsilon,d_{1}}}{C_{\varepsilon,d_{2}}} =\frac{\mathsf{Err}_{d_{1}}(\text{PrivUnitG}(p_{1},q_{1}))/d_{1}}{\mathsf{Err}_{d_{2}}(\text{PrivUnitG}(p_{2},q_{2}))/d_{2}}
\geq\frac{d_{2}}{d_{1}}\cdot\frac{\mathsf{Err}_{d_{1}}(\text{PrivUnitG}(p_{1},q_{1}))}{\mathsf{Err}_{d_{2}}(\text{PrivUnitG}(p_{1},q_{1}))}.

The second part of the claim, regarding the limit, follows directly from the first part. ∎

B.4 Proof of Proposition 6

We need the following lemma for this proof.

Lemma B.5.

We have

  1. C_{\varepsilon_{2}}\leq C_{\varepsilon_{1}}\frac{\varepsilon_{2}}{\varepsilon_{1}} for \varepsilon_{1}\leq\varepsilon_{2}.

  2. C_{k\varepsilon}\leq C_{\varepsilon} for any integer k\geq 1.

Proof.

The first part follows since \mathsf{Err}_{\varepsilon_{2},d}\leq\mathsf{Err}_{\varepsilon_{1},d} for \varepsilon_{1}\leq\varepsilon_{2}, which implies that

C_{\varepsilon_{2},d}=\varepsilon_{2}\mathsf{Err}_{\varepsilon_{2},d}/d\leq\varepsilon_{2}\mathsf{Err}_{\varepsilon_{1},d}/d=\frac{\varepsilon_{2}}{\varepsilon_{1}}\cdot\varepsilon_{1}\mathsf{Err}_{\varepsilon_{1},d}/d=\frac{\varepsilon_{2}}{\varepsilon_{1}}C_{\varepsilon_{1},d}.

This implies the claim for C_{\varepsilon}.

The second part follows by applying a repetition-based randomizer. Given an \varepsilon-DP local randomizer \mathcal{R}_{d} for \mathbb{S}_{2}^{d-1} that achieves C_{\varepsilon,d} (that is, \mathsf{Err}(\mathcal{R}_{d})=C_{\varepsilon,d}\,d/\varepsilon), the randomizer \mathcal{R}'_{d} that returns the average of k independent applications of \mathcal{R}_{d} is k\varepsilon-DP and has error \mathsf{Err}(\mathcal{R}'_{d})\leq\mathsf{Err}(\mathcal{R}_{d})/k. Thus we have

C_{k\varepsilon,d}=k\varepsilon\,\mathsf{Err}_{k\varepsilon,d}/d\leq k\varepsilon\,\mathsf{Err}(\mathcal{R}'_{d})/d\leq\varepsilon\,\mathsf{Err}(\mathcal{R}_{d})/d=C_{\varepsilon,d}.

This proves the claim for C_{\varepsilon} as it holds for all d. ∎
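
The repetition-based randomizer used in the second part is simple to state in code (a sketch; R can be any unbiased \varepsilon-DP randomizer, e.g. the priv_unit_g function above):

```python
import numpy as np

def repeated_randomizer(R, k):
    """Average k independent reports of an unbiased eps-DP randomizer R.

    By basic composition the average is (k * eps)-DP, and averaging k
    independent unbiased reports divides the variance, and hence the
    error, by k."""
    return lambda v: np.mean([R(v) for _ in range(k)], axis=0)
```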

We are now ready to prove Proposition 6.

Proof.

(Proposition 6) First, note that C_{\varepsilon} is bounded from below by zero, so \liminf_{\varepsilon\to\infty}C_{\varepsilon} exists; let C^{\star}=\liminf_{\varepsilon\to\infty}C_{\varepsilon}. We will show that \lim_{\varepsilon\to\infty}C_{\varepsilon}=C^{\star}. It is enough to show that for all \delta>0 there is \varepsilon_{0} such that for all \varepsilon\geq\varepsilon_{0} we get C_{\varepsilon}\leq C^{\star}+\delta. The definition of \liminf implies that there is \varepsilon_{1} such that C_{\varepsilon_{1}}\leq C^{\star}+\delta/2. We will show that C_{\varepsilon}\leq C^{\star}+\delta for all \varepsilon\geq k\varepsilon_{1}, for a sufficiently large k chosen below. Write \varepsilon=k'\varepsilon_{1}+\varepsilon_{2} where 0\leq\varepsilon_{2}<\varepsilon_{1} and k'\geq k. We have that

C_{\varepsilon} \stackrel{(i)}{\leq}C_{k'\varepsilon_{1}}\frac{\varepsilon}{k'\varepsilon_{1}}
\stackrel{(ii)}{\leq}C_{\varepsilon_{1}}\frac{\varepsilon}{k'\varepsilon_{1}}
\leq C_{\varepsilon_{1}}(1+1/k)
\leq(C^{\star}+\delta/2)(1+1/k)
\leq C^{\star}+\delta,

where (i) and (ii) follow from the first and second items of Lemma B.5, respectively, and the last inequality follows by setting k=\left\lceil 2(C^{\star}+\delta/2)/\delta\right\rceil. The claim follows. ∎