
Optimal Algorithms for Mean Estimation under Local Differential Privacy

Hilal Asi Stanford University, part of this work performed while interning at Apple; [email protected].    Vitaly Feldman Apple; [email protected].    Kunal Talwar Apple; [email protected].
Abstract

We study the problem of mean estimation of \ell_{2}-bounded vectors under the constraint of local differential privacy. While the literature has a variety of algorithms that achieve the asymptotically optimal rates for this problem, the performance of these algorithms in practice can vary significantly due to varying (and often large) hidden constants. In this work, we investigate the question of designing the protocol with the smallest variance. We show that PrivUnit [BDFKR18] with optimized parameters achieves the optimal variance among a large family of locally private randomizers. To prove this result, we establish some properties of local randomizers, and use symmetrization arguments that allow us to write the optimal randomizer as the optimizer of a certain linear program. These structural results, which should extend to other problems, then allow us to show that the optimal randomizer belongs to the PrivUnit family.

We also develop a new variant of PrivUnit based on the Gaussian distribution which is more amenable to mathematical analysis and enjoys the same optimality guarantees. This allows us to establish several useful properties of the exact constants of the optimal error, as well as to numerically estimate these constants.

1 Introduction

Mean estimation is one of the most fundamental problems in machine learning and is the building block of a countless number of algorithms and applications including stochastic optimization [Duc18], federated learning [BIKMMPRSS17] and others. However, it is now evident that standard algorithms for this task may leak sensitive information about users' data and compromise their privacy. This has led to the development of numerous algorithms for estimating the mean while preserving the privacy of users. The most common models for privacy are either the central model, where there exists a trusted curator, or the local model, where no such trusted curator exists.

In this work, we study the problem of mean estimation in the local privacy model. More specifically, we have n users, each with a vector v_{i} in the Euclidean unit ball in \mathbb{R}^{d}. Each user will use a randomizer \mathcal{R}:\mathbb{R}^{d}\to\mathcal{Z} to privatize their data, where \mathcal{R} must satisfy \varepsilon-differential privacy, namely, for any v_{1} and v_{2}, P(\mathcal{R}(v_{1})=u)/P(\mathcal{R}(v_{2})=u)\leq e^{\varepsilon}. Then, we run an aggregation method \mathcal{A}:\mathcal{Z}^{n}\to\mathbb{R}^{d} such that \mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n})) provides an estimate of \frac{1}{n}\sum_{i=1}^{n}v_{i}. Our goal in this work is to characterize the optimal protocol (pair of randomizer \mathcal{R} and aggregation method \mathcal{A}) for this problem and study the resulting optimal error.

Due to its importance and many applications, the problem of private mean estimation in the local model has been studied by numerous papers [BDFKR18, FT21, CKÖ20]. As a result, a clear understanding of the asymptotic optimal rates has emerged, showing that the optimal squared error is proportional to \Theta(\frac{d}{n\min(\varepsilon,\varepsilon^{2})}): [DJW18, BDFKR18] developed algorithms that obtain this rate and [DR19] proved corresponding lower bounds. Subsequent papers [FT21, CKÖ20] have developed several other algorithms that achieve the same rates.

However, these optimality results do not give a clear characterization of which algorithm will enjoy better performance in practice. Constant factors here matter more than they do in run time or memory, as ε\varepsilon is typically limited by privacy constraints, and increasing the sample size by collecting data for more individuals is often infeasible or expensive. The question of finding the randomizer with the smallest error is therefore of great interest.

1.1 Our contributions

Motivated by these limitations, we investigate strict optimality for the problem of mean estimation with local privacy. We study the family of non-interactive and unbiased protocols, that is, a protocol is a pair of a local private randomizer \mathcal{R}:\mathbb{S}^{d-1}\to\mathcal{Z} and an aggregation method \mathcal{A}:\mathcal{Z}^{n}\to\mathbb{R}^{d}, where the protocol outputs \mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n})) such that \mathbb{E}[\mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n}))]=\frac{1}{n}\sum_{i=1}^{n}v_{i}. We measure the error of a private protocol in terms of its (worst-case) mean squared error

\mathsf{Err}_{n}(\mathcal{A},\mathcal{R})=\sup_{v_{1},\dots,v_{n}\in\mathbb{S}^{d-1}}\mathbb{E}\left[\left\|\mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n}))-\frac{1}{n}\sum_{i=1}^{n}v_{i}\right\|_{2}^{2}\right].

We obtain the following results.

First, we show that PrivUnit of [BDFKR18] with optimized parameters is optimal amongst a large family of protocols. Our strategy for proving optimality consists of two main steps: first, we show that for non-interactive protocols, additive aggregation with a certain randomizer attains the optimal error. Then, for protocols with additive aggregation, we show that PrivUnit obtains the optimal error. Our proof builds on establishing several new properties of the optimal local randomizer, which allow us to express the problem of designing the optimal randomizer as a linear program. This in turn helps characterize the structure of optimal randomizers and allows us to show that there is an optimal randomizer which is an instance of PrivUnit.

Finding the exact constants in the error of PrivUnit is mathematically challenging. Our second contribution is to develop a new algorithm PrivUnitG that builds on the Gaussian distribution and attains the same error as PrivUnit up to a (1+o(1)) multiplicative factor as d\to\infty. In contrast to PrivUnit, we show that the optimal parameters of PrivUnitG are independent of the dimension, hence enabling efficient calculation of the constants for high-dimensional settings. Moreover, PrivUnitG is amenable to mathematical analysis, which yields several properties of the constants of the optimal error.

1.2 Related work

Local privacy is perhaps one of the oldest forms of privacy and dates back to [War65], who used it to encourage truthfulness in surveys. This definition resurfaced again in the context of modern data analysis by [EGS03] and was related to differential privacy in the seminal work of [DMNS06]. Local privacy has attracted a lot of interest, both in the academic community [BNO08, DJW18, BDFKR18], and in industry where it has been deployed in several industrial applications [EPK14, App17]. Recent work in the shuffle model of privacy [BEMMRLRKTS17, CSUZZ19, EFMRTT19, BBGN19, FMT20] has led to increased interest in the local model with moderate values of the local privacy parameter, as they can translate to small values of the central \varepsilon under shuffling.

The problem of locally private mean estimation has received a great deal of attention in the past decade [DJW18, BDFKR18, DR19, EFMRSTT20, ASYKM18, GDDKS20, CKÖ20, GKMM19, FT21]. [DJW18] developed asymptotically optimal procedures for estimating the mean when \varepsilon\leq 1, achieving expected squared error O(\frac{d}{n\varepsilon^{2}}). [BDFKR18] proposed a new algorithm that is optimal for \varepsilon\geq 1 as well, achieving error O(\frac{d}{n\min(\varepsilon,\varepsilon^{2})}). These rates are optimal as [DR19] show tight lower bounds which hold for interactive protocols. There has been more work on locally private mean estimation that studies the problem with additional constraints such as communication cost [EFMRSTT20, FT21, CKÖ20].

[YB17, YB18] study (non-interactive) locally private estimation problems with discrete domains and design algorithms that achieve optimal rates. These optimality results are not restricted to the family of unbiased private mechanisms. However, in contrast to our work, these results are only asymptotic, so their upper and lower bounds match only as the number of samples goes to infinity.

While there are several results in differential privacy that establish asymptotically matching lower and upper bounds for various problems of interest, strict optimality results are few. Some results are known for the one-dimensional problem [GRS09, GS10], some of which extend to a large class of utility functions, but such universal mechanisms are known not to exist for multidimensional problems [BN10]. [GKOV15, KOV16] show that for certain loss functions, one can phrase the problem of designing optimal local randomizers as linear programs, whose size is exponential in the size of the input domain.

2 Problem setting and preliminaries

We begin in this section by defining local differential privacy. To this end, we say that two probability distributions P and Q are (\varepsilon,\delta)-close if for every event E

e^{-\varepsilon}(P(E)-\delta)\leq Q(E)\leq e^{\varepsilon}P(E)+\delta.

We say two random variables are (\varepsilon,\delta)-close if their distributions are (\varepsilon,\delta)-close.

We can now define local DP randomizers.

Definition 2.1.

A randomized algorithm \mathcal{R}:X\to Y is a (replacement) (\varepsilon,\delta)-DP local randomizer if for all x,x^{\prime}\in X, \mathcal{R}(x) and \mathcal{R}(x^{\prime}) are (\varepsilon,\delta)-close.

In this work, we will primarily be interested in pure DP randomizers, i.e., those which satisfy (\varepsilon,0)-DP. We abbreviate this as \varepsilon-DP. In the setting of local randomizers, the difference between (\varepsilon,\delta)-DP and pure DP is not significant; indeed any (\varepsilon,\delta)-DP local randomizer can be converted [FMT20, CU21] to one that satisfies \varepsilon-DP while changing the distributions by a statistical distance of at most O(\delta).

The main problem we study in this work is locally private mean estimation. Here, we have n unit vectors v_{1},\dots,v_{n}\in\mathbb{R}^{d}, i.e., v_{i}\in\mathbb{S}^{d-1}. The goal is to design (locally) private protocols that estimate the mean \frac{1}{n}\sum_{i=1}^{n}v_{i}. We focus on the setting of non-interactive private protocols: such a protocol consists of a pair of a private local randomizer \mathcal{R}:\mathbb{S}^{d-1}\to\mathcal{Z} and an aggregation method \mathcal{A}:\mathcal{Z}^{n}\to\mathbb{R}^{d}, where the final output is \mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n})). We require that the output is unbiased, that is, \mathbb{E}[\mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n}))]=\frac{1}{n}\sum_{i=1}^{n}v_{i}, and wish to find private protocols that minimize the variance

\mathsf{Err}_{n}(\mathcal{A},\mathcal{R})=\sup_{v_{1},\dots,v_{n}\in\mathbb{S}^{d-1}}\mathbb{E}\left[\left\|\mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n}))-\frac{1}{n}\sum_{i=1}^{n}v_{i}\right\|_{2}^{2}\right].

Note that in the above formulation, the randomizer \mathcal{R} can have an arbitrary output domain (not necessarily \mathbb{R}^{d}), and the aggregation method can be arbitrary as well. However, one important special family of private protocols, which we term canonical private protocols, are protocols where the local randomizer \mathcal{R}:\mathbb{S}^{d-1}\to\mathbb{R}^{d} has outputs in \mathbb{R}^{d} and the aggregation method is the simple additive aggregation \mathcal{A}^{+}(z_{1},\dots,z_{n})=\frac{1}{n}\sum_{i=1}^{n}z_{i}. In addition to being a natural family of protocols, canonical protocols are (i) simple and easy to implement, and (ii) achieve the smallest possible variance amongst the family of all possible unbiased private protocols, as we show in the subsequent sections.
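To make the protocol interface concrete, the following is a minimal Python sketch (our own illustration, not code from the paper) of a canonical protocol: each user privatizes locally with some unbiased randomizer, and the server simply averages the reports.

```python
import numpy as np

def additive_aggregation(vectors, randomizer):
    """The canonical protocol A^+ applied to a list of unit vectors.

    `randomizer` is any unbiased eps-DP local randomizer mapping the
    unit sphere to R^d (e.g., PrivUnit, sketched later in the paper).
    """
    reports = [randomizer(v) for v in vectors]
    return np.mean(reports, axis=0)  # unbiased estimate of (1/n) sum_i v_i
```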

Notation

We let \mathbb{S}^{d-1}=\{u\in\mathbb{R}^{d}:\left\|u\right\|_{2}=1\} denote the unit sphere, and R\cdot\mathbb{S}^{d-1} denote the sphere of radius R>0. Whenever clear from context, we use the shorter notation \mathbb{S}. Given a random variable V, we let f_{V} denote the probability density function of V. For a randomizer \mathcal{R} and input v, f_{\mathcal{R}(v)} denotes the probability density function of the random variable \mathcal{R}(v). For a Gaussian random variable V\sim\mathsf{N}(0,\sigma^{2}) with \sigma>0, we let \phi_{\sigma}:\mathbb{R}\to\mathbb{R} denote the probability density function of V and \Phi_{\sigma}:\mathbb{R}\to[0,1] denote its cumulative distribution function. For ease of notation, we write \phi and \Phi when \sigma=1. Given two random variables V and U, we say that V\stackrel{d}{=}U if V and U have the same distribution, that is, f_{V}=f_{U}. Finally, we let e_{i}\in\mathbb{R}^{d} denote the standard basis vectors and O(d)=\{U\in\mathbb{R}^{d\times d}:UU^{T}=I\} denote the set of orthogonal matrices of dimension d.

3 Optimality of PrivUnit

In this section, we prove our main optimality results showing that PrivUnit with additive aggregation achieves the optimal error among the family of unbiased locally private procedures. More precisely, we show that for any \varepsilon-DP local randomizer \mathcal{R}:\mathbb{R}^{d}\to\mathcal{Z} and any aggregation method \mathcal{A}:\mathcal{Z}^{n}\to\mathbb{R}^{d} that is unbiased,

\mathsf{Err}_{n}(\mathcal{A}^{+},\text{PrivUnit})\leq\mathsf{Err}_{n}(\mathcal{A},\mathcal{R}).

We begin in Section 3.1 by introducing the algorithm PrivUnit and stating its optimality guarantees in Section 3.2. To prove the optimality result, we begin in Section 3.3 by showing that there exists a canonical private protocol that achieves the optimal error; then in Section 3.4 we show that PrivUnit is the optimal local randomizer in the family of canonical protocols.

3.1 PrivUnit

We begin by introducing PrivUnit, which was developed by [BDFKR18]. Given an input vector v\in\mathbb{S}^{d-1} and letting W\sim\mathsf{Uni}(\mathbb{S}^{d-1}), \text{PrivUnit}(p,\gamma) has the following distribution (up to normalization)

\text{PrivUnit}(p,\gamma)\sim\begin{cases}W\mid\langle W,v\rangle\geq\gamma&\text{with prob. }p\\ W\mid\langle W,v\rangle<\gamma&\text{with prob. }1-p\end{cases}

A normalization factor is needed to obtain the correct expectation. We provide full details in Algorithm 1.

The following theorem states the privacy guarantees of PrivUnit. Theorem 1 in [BDFKR18] provides privacy guarantees based on several mathematical approximations which may not be tight. For our optimality results, we require the following exact privacy guarantee of PrivUnit.

Theorem 1.

[BDFKR18, Theorem 1] Let q=P(W_{1}\leq\gamma) where W\sim\mathsf{Uni}(\mathbb{S}^{d-1}). If \frac{p}{1-p}\frac{q}{1-q}\leq e^{\varepsilon} then \text{PrivUnit}(p,\gamma) is an \varepsilon-DP local randomizer.

Throughout the paper, we will sometimes use the equivalent notation \text{PrivUnit}(p,q), which describes running \text{PrivUnit}(p,\gamma) with q=P(W_{1}\leq\gamma) as in Theorem 1.
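For concreteness, q can be computed in closed form: if W\sim\mathsf{Uni}(\mathbb{S}^{d-1}), then (W_{1}+1)/2 follows a \mathsf{Beta}(\frac{d-1}{2},\frac{d-1}{2}) distribution, so q=P(W_{1}\leq\gamma) is a regularized incomplete Beta value. A short Python sketch (our own illustration) of the privacy check of Theorem 1:

```python
import numpy as np
from scipy.special import betainc  # regularized incomplete Beta I_x(a, b)

def satisfies_theorem_1(p, gamma, d, eps):
    """Check the sufficient condition of Theorem 1 for PrivUnit(p, gamma)."""
    a = (d - 1) / 2.0
    q = betainc(a, a, (1.0 + gamma) / 2.0)   # q = P(W_1 <= gamma)
    return (p / (1 - p)) * (q / (1 - q)) <= np.exp(eps)
```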

Algorithm 1 \text{PrivUnit}(p,\gamma)
0:  v\in\mathbb{S}^{d-1}, \gamma\in[0,1], p\in[0,1]. B(\cdot;\cdot,\cdot) below is the incomplete Beta function B(x;a,b)=\int_{0}^{x}t^{a-1}(1-t)^{b-1}\,\textrm{d}t and B(a,b)=B(1;a,b).
1:  Draw z\sim\mathsf{Ber}(p)
2:  if z=1 then
3:     Draw V\sim\mathsf{Uni}\{u\in\mathbb{S}^{d-1}:\langle u,v\rangle\geq\gamma\}
4:  else
5:     Draw V\sim\mathsf{Uni}\{u\in\mathbb{S}^{d-1}:\langle u,v\rangle<\gamma\}
6:  Set \alpha=\frac{d-1}{2} and \tau=\frac{1+\gamma}{2}
7:  Calculate normalization constant
m=\frac{(1-\gamma^{2})^{\alpha}}{2^{d-2}(d-1)}\left(\frac{p}{B(\alpha,\alpha)-B(\tau;\alpha,\alpha)}-\frac{1-p}{B(\tau;\alpha,\alpha)}\right)
8:  Return \frac{1}{m}\cdot V
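The following Python sketch mirrors Algorithm 1 (a simple illustration under the assumption that v is a unit vector; it uses rejection sampling for the spherical cap, which is correct but slow when \gamma is close to 1):

```python
import numpy as np
from scipy.special import betainc, beta as beta_fn

def priv_unit(v, p, gamma, rng=None):
    """Sketch of PrivUnit(p, gamma) for a unit vector v in R^d."""
    rng = rng or np.random.default_rng()
    d = len(v)
    in_cap = rng.random() < p                    # z ~ Ber(p)
    while True:                                  # uniform over cap or complement
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        if (u @ v >= gamma) == in_cap:
            break
    alpha, tau = (d - 1) / 2.0, (1.0 + gamma) / 2.0
    b_full = beta_fn(alpha, alpha)               # B(alpha, alpha)
    b_tau = betainc(alpha, alpha, tau) * b_full  # B(tau; alpha, alpha)
    m = ((1 - gamma**2) ** alpha / (2 ** (d - 2) * (d - 1))
         * (p / (b_full - b_tau) - (1 - p) / b_tau))
    return u / m                                 # unbiased: E[output] = v
```

Note that the factors (1-\gamma^{2})^{\alpha} and 2^{d-2} underflow or overflow in floating point for large d, so a production implementation would compute m in log space.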

3.2 Optimality

Asymptotic optimality of PrivUnit has already been established by prior work. [BDFKR18] show that the error of PrivUnit is upper bounded by O(\frac{d}{n\min(\varepsilon,\varepsilon^{2})}) for certain parameters. Moreover, [DR19] show a lower bound of \Omega(\frac{d}{n\min(\varepsilon,\varepsilon^{2})}), implying that PrivUnit is asymptotically optimal.

In this section, we prove that additive aggregation applied with PrivUnit with the best choice of parameters p,\gamma is truly optimal, that is, it outperforms any unbiased private algorithm. The following theorem states our optimality result for PrivUnit.

Theorem 2.

Let \mathcal{R}:\mathbb{S}^{d-1}\to\mathcal{Z} be an \varepsilon-DP local randomizer, and \mathcal{A}:\mathcal{Z}^{n}\to\mathbb{R}^{d} be an aggregation procedure such that \mathbb{E}[\mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n}))]=\frac{1}{n}\sum_{i=1}^{n}v_{i} for all v_{1},\dots,v_{n}\in\mathbb{S}^{d-1}. Then there are p^{\star}\in[0,1] and \gamma^{\star}\in[0,1] such that \text{PrivUnit}(p^{\star},\gamma^{\star}) is an \varepsilon-DP local randomizer and

\mathsf{Err}_{n}(\mathcal{A}^{+},\text{PrivUnit}(p^{\star},\gamma^{\star}))\leq\mathsf{Err}_{n}(\mathcal{A},\mathcal{R}).

The proof of Theorem 2 proceeds in two steps: first, in Section 3.3 (Proposition 1), we show that there exists an optimal private procedure that is canonical; then in Section 3.4 (Proposition 2) we prove that PrivUnit is the optimal randomizer in this family. Theorem 2 is a direct corollary of these two propositions.

3.3 Optimality of canonical protocols

In this section, we show that there exists a canonical private protocol that achieves the optimal error. In particular, we have the following result.

Proposition 1.

Let (\mathcal{R},\mathcal{A}) be such that \mathcal{R}:\mathbb{S}^{d-1}\to\mathcal{Z} is an \varepsilon-DP local randomizer and \mathbb{E}[\mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n}))]=\frac{1}{n}\sum_{i=1}^{n}v_{i} for all v_{1},\dots,v_{n}\in\mathbb{S}^{d-1}. Then there is a canonical randomizer \mathcal{R}^{\prime}:\mathbb{S}^{d-1}\to\mathbb{R}^{d} that is an \varepsilon-DP local randomizer and

\mathsf{Err}_{n}(\mathcal{A},\mathcal{R})\geq\mathsf{Err}_{n}(\mathcal{A}^{+},\mathcal{R}^{\prime}).

To prove Proposition 1, we begin with the following lemma.

Lemma 3.1.

Let (\mathcal{R},\mathcal{A}) satisfy the conditions of Proposition 1. Let P be a probability distribution over \mathbb{S}^{d-1} such that \mathbb{E}_{v\sim P}[v]=0. There are \hat{\mathcal{R}}_{i}:\mathbb{S}^{d-1}\to\mathbb{R}^{d} for i\in[n] such that \hat{\mathcal{R}}_{i} is an \varepsilon-DP local randomizer, \mathbb{E}[\hat{\mathcal{R}}_{i}(v)]=v for all v\in\mathbb{S}^{d-1}, and

\mathbb{E}_{v_{1},\dots,v_{n}\sim P}\left[\left\|n\mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n}))-\sum_{i=1}^{n}v_{i}\right\|_{2}^{2}\right]\geq\sum_{i=1}^{n}\mathbb{E}_{v_{i}\sim P}\left[\left\|\hat{\mathcal{R}}_{i}(v_{i})-v_{i}\right\|_{2}^{2}\right].

Before proving Lemma 3.1, we first complete the proof of Proposition 1.

Proof.

(Proposition 1) Let P_{\mathsf{unif}} be the uniform distribution over the sphere \mathbb{S}^{d-1}. First, note that

n^{2}\mathsf{Err}_{n}(\mathcal{A},\mathcal{R}) =\sup_{v_{1},\dots,v_{n}\in\mathbb{S}^{d-1}}\mathbb{E}\left[\left\|n\mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n}))-\sum_{i=1}^{n}v_{i}\right\|_{2}^{2}\right]
\geq\mathbb{E}_{v_{1},\dots,v_{n}\sim P_{\mathsf{unif}}}\left[\left\|n\mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n}))-\sum_{i=1}^{n}v_{i}\right\|_{2}^{2}\right]
\geq\sum_{i=1}^{n}\mathbb{E}_{v_{i}\sim P_{\mathsf{unif}}}\left[\left\|\hat{\mathcal{R}}_{i}(v_{i})-v_{i}\right\|_{2}^{2}\right],
where the last inequality follows from Lemma 3.1 applied with P=P_{\mathsf{unif}}, which satisfies \mathbb{E}_{v\sim P_{\mathsf{unif}}}[v]=0.

Now we define \mathcal{R}^{\prime}_{i} as follows. First, sample a random rotation matrix U\in\mathbb{R}^{d\times d} with U^{T}U=I, then set

\mathcal{R}^{\prime}_{i}(v)=U^{T}\hat{\mathcal{R}}_{i}(Uv).

Note that \mathcal{R}^{\prime}_{i} is an \varepsilon-DP local randomizer, \mathbb{E}[\mathcal{R}^{\prime}_{i}(v)]=v, and for all v\in\mathbb{S}^{d-1}

\mathbb{E}\left[\left\|\mathcal{R}^{\prime}_{i}(v)-v\right\|_{2}^{2}\right] =\mathbb{E}_{U}\left[\left\|U^{T}\hat{\mathcal{R}}_{i}(Uv)-v\right\|_{2}^{2}\right]
=\mathbb{E}_{U}\left[\left\|\hat{\mathcal{R}}_{i}(Uv)-Uv\right\|_{2}^{2}\right]
=\mathbb{E}_{v\sim P_{\mathsf{unif}}}\left[\left\|\hat{\mathcal{R}}_{i}(v)-v\right\|_{2}^{2}\right].

Overall, we have

n^{2}\mathsf{Err}_{n}(\mathcal{A},\mathcal{R}) \geq\sum_{i=1}^{n}\mathbb{E}_{v_{i}\sim P_{\mathsf{unif}}}\left[\left\|\hat{\mathcal{R}}_{i}(v_{i})-v_{i}\right\|_{2}^{2}\right]
=\sum_{i=1}^{n}\sup_{v\in\mathbb{S}^{d-1}}\mathbb{E}\left[\left\|\mathcal{R}^{\prime}_{i}(v)-v\right\|_{2}^{2}\right]
=\sum_{i=1}^{n}\mathsf{Err}_{1}(\mathcal{A}^{+},\mathcal{R}^{\prime}_{i})
\geq n\,\mathsf{Err}_{1}(\mathcal{A}^{+},\mathcal{R}^{\prime}_{i^{\star}})
=n^{2}\mathsf{Err}_{n}(\mathcal{A}^{+},\mathcal{R}^{\prime}_{i^{\star}}),

where i^{\star}\in[n] minimizes \mathsf{Err}_{1}(\mathcal{A}^{+},\mathcal{R}^{\prime}_{i}). The claim follows. ∎

Now we prove Lemma 3.1.

Proof.

(Lemma 3.1) We define \hat{\mathcal{R}}_{i} to be

\hat{\mathcal{R}}_{i}(v_{i})=\mathbb{E}_{v_{j}\sim P,j\neq i}[n\mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n}))].

Note that \hat{\mathcal{R}}_{i} is an \varepsilon-DP local randomizer as it requires a single application of \mathcal{R}(v_{i}). Moreover, \mathbb{E}[\hat{\mathcal{R}}_{i}(v)]=v for all v\in\mathbb{S}^{d-1}, since \mathcal{A} is unbiased and \mathbb{E}_{v_{j}\sim P}[v_{j}]=0 for j\neq i. We define

\hat{\mathcal{R}}_{\leq i}(v_{1},\dots,v_{i})=\mathbb{E}_{v_{j}\sim P,j>i}\left[n\mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n}))-\sum_{j=1}^{i}v_{j}\right],

and \hat{\mathcal{R}}_{\leq 0}=0. We now have

\mathbb{E}_{v_{1},\dots,v_{n}\sim P}\left[\left\|n\mathcal{A}(\mathcal{R}(v_{1}),\dots,\mathcal{R}(v_{n}))-\sum_{i=1}^{n}v_{i}\right\|_{2}^{2}\right]
=\mathbb{E}_{v_{1},\dots,v_{n}\sim P}\left[\left\|\hat{\mathcal{R}}_{\leq n}(v_{1},\dots,v_{n})\right\|_{2}^{2}\right]
=\mathbb{E}_{v_{1},\dots,v_{n}\sim P}\left[\left\|\hat{\mathcal{R}}_{\leq n}(v_{1},\dots,v_{n})-\hat{\mathcal{R}}_{\leq n-1}(v_{1},\dots,v_{n-1})+\hat{\mathcal{R}}_{\leq n-1}(v_{1},\dots,v_{n-1})\right\|_{2}^{2}\right]
\stackrel{(i)}{=}\mathbb{E}_{v_{1},\dots,v_{n}\sim P}\left[\left\|\hat{\mathcal{R}}_{\leq n}(v_{1},\dots,v_{n})-\hat{\mathcal{R}}_{\leq n-1}(v_{1},\dots,v_{n-1})\right\|_{2}^{2}\right]+\mathbb{E}_{v_{1},\dots,v_{n-1}\sim P}\left[\left\|\hat{\mathcal{R}}_{\leq n-1}(v_{1},\dots,v_{n-1})\right\|_{2}^{2}\right]
\stackrel{(ii)}{=}\sum_{i=1}^{n}\mathbb{E}_{v_{1},\dots,v_{i}\sim P}\left[\left\|\hat{\mathcal{R}}_{\leq i}(v_{1},\dots,v_{i})-\hat{\mathcal{R}}_{\leq i-1}(v_{1},\dots,v_{i-1})\right\|_{2}^{2}\right]
\stackrel{(iii)}{\geq}\sum_{i=1}^{n}\mathbb{E}_{v_{i}\sim P}\left[\left\|\mathbb{E}_{v_{1},\dots,v_{i-1}\sim P}\left[\hat{\mathcal{R}}_{\leq i}(v_{1},\dots,v_{i})-\hat{\mathcal{R}}_{\leq i-1}(v_{1},\dots,v_{i-1})\right]\right\|_{2}^{2}\right]
\stackrel{(iv)}{=}\sum_{i=1}^{n}\mathbb{E}_{v_{i}\sim P}\left[\left\|\hat{\mathcal{R}}_{i}(v_{i})-v_{i}\right\|_{2}^{2}\right]

where (i) follows since \mathbb{E}_{v_{n}\sim P}[\hat{\mathcal{R}}_{\leq n}(v_{1},\dots,v_{n})]=\hat{\mathcal{R}}_{\leq n-1}(v_{1},\dots,v_{n-1}), (ii) follows by induction, (iii) follows from Jensen's inequality, and (iv) follows since \mathbb{E}_{v_{1},\dots,v_{i-1}\sim P}[\hat{\mathcal{R}}_{\leq i}(v_{1},\dots,v_{i})]=\hat{\mathcal{R}}_{i}(v_{i})-v_{i} and \mathbb{E}_{v_{1},\dots,v_{i-1}\sim P}[\hat{\mathcal{R}}_{\leq i-1}(v_{1},\dots,v_{i-1})]=0. ∎

3.4 Optimality of PrivUnit among canonical randomizers

In this section, we show that PrivUnit achieves the optimal error in the family of canonical randomizers. To this end, first note that for additive aggregation \mathcal{A}^{+} we have \mathsf{Err}_{n}(\mathcal{A}^{+},\mathcal{R})=\mathsf{Err}_{1}(\mathcal{A}^{+},\mathcal{R})/n, since the users' reports are independent and unbiased so their variances add. Denoting \mathsf{Err}(\mathcal{R})=\mathsf{Err}_{1}(\mathcal{A}^{+},\mathcal{R}) for canonical randomizers, we have the following optimality result.

Proposition 2.

Let \mathcal{R}:\mathbb{S}^{d-1}\to\mathbb{R}^{d} be an \varepsilon-DP local randomizer such that \mathbb{E}[\mathcal{R}(v)]=v for all v\in\mathbb{S}^{d-1}. Then there are p^{\star}\in[0,1] and \gamma^{\star}\in[0,1] such that \text{PrivUnit}(p^{\star},\gamma^{\star}) is an \varepsilon-DP local randomizer and

\mathsf{Err}(\text{PrivUnit}(p^{\star},\gamma^{\star}))\leq\mathsf{Err}(\mathcal{R}).

The proof of Proposition 2 builds on a sequence of lemmas, each of which allows us to simplify the structure of an optimal algorithm. We begin with the following lemma, which shows that there exists an optimal algorithm that is invariant to rotations.

Lemma 3.2 (Rotation-Invariance Lemma).

Let \mathcal{R}:\mathbb{S}^{d-1}\to\mathbb{R}^{d} be an \varepsilon-DP local randomizer such that \mathbb{E}[\mathcal{R}(v)]=v for all v\in\mathbb{S}^{d-1}. There exists an \varepsilon-DP local randomizer \mathcal{R}^{\prime} such that

1. \mathbb{E}[\mathcal{R}^{\prime}(v)]=v for all v\in\mathbb{S}^{d-1};

2. \mathsf{Err}(\mathcal{R}^{\prime})\leq\mathsf{Err}(\mathcal{R});

3. \mathsf{Err}(\mathcal{R}^{\prime})=\mathsf{Err}(\mathcal{R}^{\prime},v) for all v\in\mathbb{S}^{d-1};

4. for any v,v_{0}\in\mathbb{S}^{d-1}, there is an orthonormal matrix V\in\mathbb{R}^{d\times d} such that \mathcal{R}^{\prime}(v)\stackrel{d}{=}V\mathcal{R}^{\prime}(v_{0});

5. \frac{f_{\mathcal{R}^{\prime}(v)}(u_{1})}{f_{\mathcal{R}^{\prime}(v)}(u_{2})}\leq e^{\varepsilon} for any v\in\mathbb{S}^{d-1} and u_{1},u_{2}\in\mathbb{R}^{d} with \left\|u_{1}\right\|_{2}=\left\|u_{2}\right\|_{2}.

Proof.

Given \mathcal{R}, we define \mathcal{R}^{\prime} as follows. First, sample a random rotation matrix Ud×dU\in\mathbb{R}^{d\times d} where UTU=IU^{T}U=I, then set

\mathcal{R}^{\prime}(x)=U^{T}\mathcal{R}(Ux).

We now prove that \mathcal{R}^{\prime} satisfies the desired properties. First, note that the privacy of \mathcal{R} immediately implies the same privacy bound for \mathcal{R}^{\prime}. Moreover, we have that \mathbb{E}[\mathcal{R}^{\prime}(v)]=\mathbb{E}[U^{T}\mathbb{E}[\mathcal{R}(Uv)]]=\mathbb{E}[U^{T}Uv]=v as \mathcal{R} is unbiased and U^{T}U=I. For utility, note that

\mathsf{Err}(\mathcal{R}^{\prime}) =\sup_{v\in\mathbb{S}^{d-1}}\mathbb{E}_{\mathcal{R}^{\prime}}\left[\left\|\mathcal{R}^{\prime}(v)-v\right\|_{2}^{2}\right]
=\sup_{v\in\mathbb{S}^{d-1}}\mathbb{E}_{U,\mathcal{R}}\left[\left\|U^{T}\mathcal{R}(Uv)-U^{T}Uv\right\|_{2}^{2}\right]
=\sup_{v\in\mathbb{S}^{d-1}}\mathbb{E}_{U,\mathcal{R}}\left[\left\|\mathcal{R}(Uv)-Uv\right\|_{2}^{2}\right]
\leq\sup_{v\in\mathbb{S}^{d-1},U}\mathbb{E}_{\mathcal{R}}\left[\left\|\mathcal{R}(Uv)-Uv\right\|_{2}^{2}\right]
=\sup_{v\in\mathbb{S}^{d-1}}\mathbb{E}_{\mathcal{R}}\left[\left\|\mathcal{R}(v)-v\right\|_{2}^{2}\right]
=\mathsf{Err}(\mathcal{R}).

Next, we show that the distributions of \mathcal{R}^{\prime}(v_{1}) and \mathcal{R}^{\prime}(v_{2}) are the same up to rotations. Indeed, let V\in\mathbb{R}^{d\times d} be a rotation matrix such that v_{1}=Vv_{2}. We have that \mathcal{R}^{\prime}(v_{2})=U^{T}\mathcal{R}(Uv_{2}), which can also be written as \mathcal{R}^{\prime}(v_{2})\stackrel{d}{=}(UV)^{T}\mathcal{R}(UVv_{2})\stackrel{d}{=}V^{T}U^{T}\mathcal{R}(Uv_{1})\stackrel{d}{=}V^{T}\mathcal{R}^{\prime}(v_{1}) as UV is also a random rotation matrix.

Now we prove the final property. Assume towards a contradiction that there are v_{1}\in\mathbb{S}^{d-1} and u_{1},u_{2}\in\mathbb{R}^{d} with \left\|u_{1}\right\|_{2}=\left\|u_{2}\right\|_{2} such that \frac{f_{\mathcal{R}^{\prime}(v_{1})}(u_{1})}{f_{\mathcal{R}^{\prime}(v_{1})}(u_{2})}>e^{\varepsilon}. We will show that this implies that \mathcal{R}^{\prime} is not \varepsilon-DP. In the proof above we showed that \mathcal{R}^{\prime}(v_{2})\stackrel{d}{=}V^{T}\mathcal{R}^{\prime}(v_{1}) for v_{1}=Vv_{2}. Therefore for V such that V^{T}u_{1}=u_{2} we get that f_{\mathcal{R}^{\prime}(v_{2})}(u_{2})=f_{\mathcal{R}^{\prime}(v_{1})}(u_{1}), which implies that

\frac{f_{\mathcal{R}^{\prime}(v_{2})}(u_{2})}{f_{\mathcal{R}^{\prime}(v_{1})}(u_{2})}>e^{\varepsilon},

which is a contradiction. ∎
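The random rotation used in this construction can be sampled in practice via the QR decomposition of a Gaussian matrix, with the usual sign correction so the result is Haar-distributed. A small Python sketch (our own illustration) of the symmetrization \mathcal{R}^{\prime}(v)=U^{T}\mathcal{R}(Uv):

```python
import numpy as np

def random_rotation(d, rng=None):
    """Sample U Haar-uniformly from the orthogonal group O(d)."""
    rng = rng or np.random.default_rng()
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # sign-fix columns for exact Haar measure

def rotation_invariant(randomizer, v, rng=None):
    """The symmetrized randomizer R'(v) = U^T R(Uv) from Lemma 3.2."""
    u = random_rotation(len(v), rng)
    return u.T @ randomizer(u @ v)
```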

Lemma 3.2 implies that we can restrict our attention to algorithms that have the same density for all inputs up to rotations, and hence allows us to study their behavior for a single input. Moreover, as we show in the following lemma, given a randomizer that works for a single input, we can extend it to achieve the same error for all inputs. To facilitate notation, we say that a density f:\mathbb{R}^{d}\to\mathbb{R}_{+} is \varepsilon-indistinguishable if \frac{f(u_{1})}{f(u_{2})}\leq e^{\varepsilon} for all u_{1},u_{2}\in\mathbb{R}^{d} such that \left\|u_{1}\right\|_{2}=\left\|u_{2}\right\|_{2}.

Lemma 3.3.

Fix v_{0}=e_{1}\in\mathbb{S}^{d-1}. Let f:\mathbb{R}^{d}\to\mathbb{R}_{+} be an \varepsilon-indistinguishable density function with corresponding random variable \mathcal{R} such that \mathbb{E}[\mathcal{R}]=e_{1}. There exists an \varepsilon-DP local randomizer \mathcal{R}^{\prime} such that \mathsf{Err}(\mathcal{R}^{\prime},v)=\mathbb{E}[\left\|\mathcal{R}-e_{1}\right\|_{2}^{2}] and \mathbb{E}[\mathcal{R}^{\prime}(v)]=v for all v\in\mathbb{S}^{d-1}.

Proof.

The proof is similar to the proof of Lemma 3.2. For any v\in\mathbb{S}^{d-1}, we let U(v)\in\mathbb{R}^{d\times d} be an orthonormal matrix such that v_{0}=U(v)v. Then, following Lemma 3.2, we define \mathcal{R}^{\prime}(v)=U^{T}(v)\mathcal{R}. The claim immediately follows. ∎
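One concrete way to realize the matrix U(v) (our own illustration; the paper does not specify a construction) is a Householder reflection, which is orthonormal and maps v to e_{1}:

```python
import numpy as np

def rotation_to_e1(v):
    """Orthonormal H with H @ v = e1, via a Householder reflection.

    Assumes v is a unit vector; returns the identity when v is already e1.
    """
    d = len(v)
    e1 = np.zeros(d)
    e1[0] = 1.0
    w = v - e1
    norm_w = np.linalg.norm(w)
    if norm_w < 1e-12:
        return np.eye(d)
    w /= norm_w
    return np.eye(d) - 2.0 * np.outer(w, w)  # reflection across w's hyperplane
```

Since a Householder matrix H is symmetric, \mathcal{R}^{\prime}(v)=H^{T}\mathcal{R}=H\mathcal{R} in this realization.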

Lemma 3.2 and Lemma 3.3 imply that we only need to study the behavior of the randomizer for a fixed input. Henceforth, we will fix the input to v=e_{1} and investigate properties of the density given e_{1}.

Given v=(v_{1},v_{2},\dots,v_{d}) we define its reflection to be v^{-}=(v_{1},-v_{2},\dots,-v_{d}). The next lemma shows that we can assume that the densities at v and v^{-} are equal for some optimal algorithm.

Lemma 3.4 (Reflection Symmetry).

Let f:\mathbb{R}^{d}\to\mathbb{R}_{+} be an \varepsilon-indistinguishable density function with corresponding random variable \mathcal{R} such that \mathbb{E}[\mathcal{R}]=e_{1}. There is f^{\prime}:\mathbb{R}^{d}\to\mathbb{R}_{+} with corresponding random variable \mathcal{R}^{\prime} that satisfies the same properties, such that \mathsf{Err}(\mathcal{R}^{\prime})\leq\mathsf{Err}(\mathcal{R}) and f^{\prime}(u)=f^{\prime}(u^{-}) for all u\in\mathbb{R}^{d}.

Proof.

We define f^{\prime}(u)=\frac{f(u)+f(u^{-})}{2} for all u\in\mathbb{R}^{d}. First, it is immediate to see that f^{\prime}(u)=f^{\prime}(u^{-}) for all u\in\mathbb{R}^{d}. Moreover, we have

\frac{f^{\prime}(u_{1})}{f^{\prime}(u_{2})} =\frac{f(u_{1})+f(u_{1}^{-})}{f(u_{2})+f(u_{2}^{-})}
\leq\max\left(\frac{f(u_{1})}{f(u_{2})},\frac{f(u_{1}^{-})}{f(u_{2}^{-})}\right)\leq e^{\varepsilon}.

Note also that \mathbb{E}[\mathcal{R}^{\prime}]=e_{1} since the marginal distribution of the first coordinate of the output did not change, and for the other coordinates the expectation is zero as u+u^{-}=2u_{1}e_{1} for any u\in\mathbb{R}^{d}. Finally, note that \mathsf{Err}(\mathcal{R}^{\prime},e_{1})=\mathsf{Err}(\mathcal{R},e_{1}) since \left\|u-e_{1}\right\|_{2}=\left\|u^{-}-e_{1}\right\|_{2} for all u\in\mathbb{R}^{d}. ∎
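Operationally, sampling from f^{\prime} just means applying the reflection with probability 1/2. A short Python sketch (our own illustration):

```python
import numpy as np

def symmetrize(sample, rng=None):
    """Draw from f'(u) = (f(u) + f(u^-)) / 2 given a sampler for f."""
    rng = rng or np.random.default_rng()
    u = sample()
    if rng.random() < 0.5:        # with prob 1/2, output u^- instead of u
        u = u.copy()
        u[1:] *= -1.0             # u^- = (u_1, -u_2, ..., -u_d)
    return u
```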

We also have the following lemma, which shows that there is a near-optimal randomizer whose output is supported on a sphere of some fixed radius.

Lemma 3.5.

Let f:\mathbb{R}^{d}\to\mathbb{R}_{+} be an \varepsilon-indistinguishable density function with corresponding random variable \mathcal{R} such that \mathbb{E}[\mathcal{R}]=e_{1}. For any \tau>0, there exists an \varepsilon-indistinguishable density f^{\prime}:\mathbb{R}^{d}\to\mathbb{R}_{+} with corresponding random variable \mathcal{R}^{\prime} such that \left\|\mathcal{R}^{\prime}\right\|_{2}=C for some C>0, \mathbb{E}[\mathcal{R}^{\prime}]=e_{1} and \mathsf{Err}(\mathcal{R}^{\prime})\leq\mathsf{Err}(\mathcal{R})+\tau.

Proof.

By Lemma 3.4, we can assume without loss of generality that \mathcal{R} satisfies reflection symmetry, that is, f(u)=f(u^{-}). We think of the density f as first sampling a radius R, then sampling a vector u\in\mathbb{R}^{d} of radius R. We also assume that \mathcal{R} has bounded radius; otherwise, as \mathsf{Err}(\mathcal{R})=\mathbb{E}[\left\|\mathcal{R}\right\|_{2}^{2}]-1 is bounded, we can project the output of \mathcal{R} to some large radius R_{\tau}>0 while increasing the error by at most \tau>0 for any \tau. Similarly, we can assume that the output has radius at least r_{\tau} while increasing the error by at most \tau. Let f_{R} denote the distribution of the radius, and f_{u\mid R=r} be the conditional distribution of the output given that the radius is r. In this terminology, \mathbb{E}[\mathcal{R}]=e_{1} implies that

\mathsf{Err}(\mathcal{R},e_{1}) =\mathbb{E}[\|\mathcal{R}-e_{1}\|_{2}^{2}]
=\mathbb{E}[\|\mathcal{R}\|_{2}^{2}+\|e_{1}\|_{2}^{2}-2\langle\mathcal{R},e_{1}\rangle]
=\mathbb{E}[\left\|\mathcal{R}\right\|_{2}^{2}]-1.

For the purpose of finding the optimal algorithm, we need \mathcal{R} that minimizes \mathbb{E}[\left\|\mathcal{R}\right\|_{2}^{2}]. Denote W_{r}=\mathbb{E}[\langle\mathcal{R},e_{1}\rangle\mid R=r] and set

C_{\max}=\sup_{r\in[r_{\tau},R_{\tau}]}\frac{W_{r}}{r}.

Noting that \mathbb{E}[\langle\mathcal{R},e_{1}\rangle]=1, we have

\mathbb{E}[\left\|\mathcal{R}\right\|_{2}^{2}] =\frac{\mathbb{E}[\left\|\mathcal{R}\right\|_{2}^{2}]}{\mathbb{E}[\langle\mathcal{R},e_{1}\rangle]^{2}}
=\frac{\mathbb{E}[\left\|\mathcal{R}\right\|_{2}^{2}]}{\left(\int_{r=0}^{\infty}f_{r}W_{r}\,dr\right)^{2}}
=\frac{\mathbb{E}[\left\|\mathcal{R}\right\|_{2}^{2}]}{\left(\int_{r=0}^{\infty}f_{r}r(W_{r}/r)\,dr\right)^{2}}
\geq\frac{\mathbb{E}[\left\|\mathcal{R}\right\|_{2}^{2}]}{\left(\int_{r=0}^{\infty}f_{r}rC_{\max}\,dr\right)^{2}}
\geq\frac{1}{C_{\max}^{2}}\frac{\mathbb{E}[\left\|\mathcal{R}\right\|_{2}^{2}]}{\mathbb{E}[\left\|\mathcal{R}\right\|_{2}]^{2}}\geq\frac{1}{C_{\max}^{2}}.

Now consider r_{\max}>0 that attains C_{\max}=W_{r_{\max}}/r_{\max}; r_{\max} exists as \mathcal{R} has outputs in [r_{\tau},R_{\tau}]. Let f_{\max} denote the conditional distribution of \mathcal{R} given that R=r_{\max} and let \mathcal{R}_{\max} denote the corresponding randomizer. We define a new randomizer \mathcal{R}^{\prime} as follows

\mathcal{R}^{\prime}=\frac{1}{r_{\max}C_{\max}}\mathcal{R}_{\max},

with corresponding density f^{\prime}. Note that f^{\prime} is \varepsilon-indistinguishable as f is \varepsilon-indistinguishable and the conditional distributions given different radii have disjoint supports, which implies f_{\max} is \varepsilon-indistinguishable. Moreover, f_{\max}(u)=f_{\max}(u^{-}), which implies that \mathbb{E}[\mathcal{R}^{\prime}]=\frac{1}{r_{\max}C_{\max}}\mathbb{E}[\mathcal{R}_{\max}]=\frac{W_{r_{\max}}}{r_{\max}C_{\max}}e_{1}=e_{1}. Finally, note that \mathcal{R}^{\prime} satisfies

\mathbb{E}[\left\|\mathcal{R}^{\prime}\right\|_{2}^{2}] =\frac{1}{r_{\max}^{2}C_{\max}^{2}}\mathbb{E}[\left\|\mathcal{R}_{\max}\right\|_{2}^{2}]
=\frac{1}{r_{\max}^{2}C_{\max}^{2}}\mathbb{E}[\left\|\mathcal{R}\right\|_{2}^{2}\mid R=r_{\max}]
=\frac{1}{C_{\max}^{2}}\leq\mathbb{E}[\left\|\mathcal{R}\right\|_{2}^{2}].

The claim follows. ∎

Before we present our main proposition, which formulates the linear program that finds the optimal randomizer, we need the following key property, which allows us to describe the privacy guarantee as a linear constraint. We remark that such a lemma can easily be proven for deletion DP, so that our results would extend to that definition.

Lemma 3.6.

Let \mathcal{R}:\mathbb{S}^{d-1}\to\mathbb{R}^{d} be an \varepsilon-DP local randomizer. There is \rho:\mathbb{R}^{d}\to\mathbb{R}_{+} such that for all v\in\mathbb{S}^{d-1} and u\in\mathbb{R}^{d}

e^{-\varepsilon/2}\leq\frac{f_{\mathcal{R}(v)}(u)}{\rho(u)}\leq e^{\varepsilon/2}.

Moreover, if \mathcal{R} satisfies the properties of Lemma 3.2 (invariance) then \rho(u_{1})=\rho(u_{2}) for \left\|u_{1}\right\|_{2}=\left\|u_{2}\right\|_{2}.

Proof.

Define \rho(u)=\sqrt{\inf_{v\in\mathbb{S}^{d-1}}f_{\mathcal{R}(v)}(u)\cdot\sup_{v\in\mathbb{S}^{d-1}}f_{\mathcal{R}(v)}(u)}. Note that for all v\in\mathbb{S}^{d-1},

\frac{f_{\mathcal{R}(v)}(u)}{\sqrt{\inf_{v\in\mathbb{S}^{d-1}}f_{\mathcal{R}(v)}(u)\cdot\sup_{v\in\mathbb{S}^{d-1}}f_{\mathcal{R}(v)}(u)}} =\sqrt{\frac{f_{\mathcal{R}(v)}(u)}{\inf_{v\in\mathbb{S}^{d-1}}f_{\mathcal{R}(v)}(u)}}\sqrt{\frac{f_{\mathcal{R}(v)}(u)}{\sup_{v\in\mathbb{S}^{d-1}}f_{\mathcal{R}(v)}(u)}}\leq e^{\varepsilon/2}.

The second direction follows similarly. The second part of the claim follows as for any u_{1},u_{2}\in\mathbb{R}^{d} such that \left\|u_{1}\right\|_{2}=\left\|u_{2}\right\|_{2}, if f_{\mathcal{R}(v_{1})}(u_{1})=t for any mechanism that satisfies the properties of Lemma 3.2, then there is v_{2} such that f_{\mathcal{R}(v_{2})}(u_{2})=t. The definition of \rho now implies that \rho(u_{1})=\rho(u_{2}). ∎

We are now ready to present our main step towards proving the optimality result. The following proposition formulates the problem of finding the optimal algorithm as a linear program. As a result, we show that there is an optimal algorithm whose density function takes at most two distinct values.

Proposition 3.

Let \mathcal{R}:\mathbb{S}^{d-1}\to\mathbb{R}^{d} be an \varepsilon-DP local randomizer such that \mathbb{E}[\mathcal{R}(v)]=v for all v\in\mathbb{S}^{d-1}. For any \tau>0, there exist constants C,p>0 and an \varepsilon-DP local randomizer \mathcal{R}^{\prime}:\mathbb{S}^{d-1}\to C\cdot\mathbb{S}^{d-1} such that \mathbb{E}[\mathcal{R}^{\prime}(v)]=v for all v\in\mathbb{S}^{d-1}, \mathsf{Err}(\mathcal{R}^{\prime})\leq\mathsf{Err}(\mathcal{R})+\tau, f_{\mathcal{R}^{\prime}(v)}(u)=f_{\mathcal{R}^{\prime}(v)}(u^{-}), and f_{\mathcal{R}^{\prime}(v)}(u)\in\{e^{-\varepsilon/2},e^{\varepsilon/2}\}\cdot p for all u\in C\cdot\mathbb{S}^{d-1}.

Proof.

The proof will proceed by formulating a linear program which describes the problem of finding the optimal randomizer, and then arguing that the minimizer of this program must satisfy the desired conditions. To this end, we first use the properties of the optimal randomizer from the previous lemmas to simplify the linear program. Lemma 3.5 implies that there exists a near-optimal randomizer \mathcal{R}:\mathbb{S}^{d-1}\to C\cdot\mathbb{S}^{d-1} for some C>0 that is also invariant under rotations (satisfies the conclusions of Lemma 3.2). Moreover, Lemma 3.6 implies that the density function f_{\mathcal{R}(v)} satisfies, for some p>0,

e^{-\varepsilon/2}p\leq f_{\mathcal{R}(v)}(u)\leq e^{\varepsilon/2}p.

Adding the requirement of unbiasedness, and noticing that for such algorithms the error is C^{2}-1, this results in the following minimization problem, where the variables are C, p, and the density functions f_{v}:C\mathbb{S}^{d-1}\to\mathbb{R}_{+} for all v\in\mathbb{S}^{d-1}

\arg\min_{C,\,p,\,f_{v}:C\mathbb{S}^{d-1}\to\mathbb{R}_{+}\text{ for all }v\in\mathbb{S}^{d-1}}\quad C\quad\quad (A)
\mathop{\rm subject\;to}
\quad e^{-\varepsilon/2}p\leq f_{v}(u)\leq e^{\varepsilon/2}p,\quad v\in\mathbb{S}^{d-1},~u\in C\mathbb{S}^{d-1}
\quad\int_{C\mathbb{S}^{d-1}}f_{v}(u)\,u\,du=v,\quad v\in\mathbb{S}^{d-1}
\quad\int_{C\mathbb{S}^{d-1}}f_{v}(u)\,du=1,\quad v\in\mathbb{S}^{d-1}

Lemma 3.2 and Lemma 3.3 also show that the optimal algorithm is invariant under rotations, and that we only need to find the output distribution f with respect to a fixed input v=e_{1}. Moreover, Lemma 3.4 says that we can assume that f_{e_{1}}(u)=f_{e_{1}}(u^{-}) for all u. We also work now with the normalized algorithm \hat{\mathcal{R}}(v)=\mathcal{R}(v)/C (that is, the output lies on the unit sphere). Note that for \hat{\mathcal{R}} we have \mathbb{E}[\hat{\mathcal{R}}(v)]=v/C. Denoting \alpha=1/C, this results in the following linear program (LP)

\arg\max_{\alpha,\,p,\,f_{e_{1}}:\mathbb{S}^{d-1}\to\mathbb{R}_{+}}\quad\alpha\quad\quad (B)
\mathop{\rm subject\;to}
\quad e^{-\varepsilon/2}p\leq f_{e_{1}}(u)\leq e^{\varepsilon/2}p,\quad u\in\mathbb{S}^{d-1}
\quad f_{e_{1}}(u)=f_{e_{1}}(u^{-}),\quad u\in\mathbb{S}^{d-1}
\quad\int_{\mathbb{S}^{d-1}}f_{e_{1}}(u)\,u\,du=\alpha e_{1},
\quad\int_{\mathbb{S}^{d-1}}f_{e_{1}}(u)\,du=1.

We need to show that most of the inequality constraints e^{-\varepsilon/2}p\leq f_{e_{1}}(u)\leq e^{\varepsilon/2}p must be tight at one of the two extremes. To this end, we approximate the LP (B) using a finite number of variables by discretizing the density function f_{e_{1}}. We assume we have a \delta/2-cover S=\{u_{1},\dots,u_{K}\} of \mathbb{S}^{d-1}. We assume without loss of generality that if u_{i}\in S then u_{i}^{-}\in S, and we also write S=S_{0}\cup S_{1} where S_{0}=S_{1}^{-} and S_{0}\cap S_{1}=\emptyset. Let B_{i}=\{w\in\mathbb{S}^{d-1}:\left\|w-u_{i}\right\|_{2}\leq\left\|w-u_{j}\right\|_{2}\text{ for all }j\neq i\}, V_{i}=\int_{u\in B_{i}}1\,du, and \bar{u}_{i}=\mathbb{E}_{U\sim\mathsf{Uni}(\mathbb{S}^{d-1})}[U\mid U\in B_{i}]. Now we limit our linear program to density functions that are constant over each B_{i}, resulting in the following LP

\arg\max_{\alpha,\,p,\,f_{e_{1}}:\mathbb{S}^{d-1}\to\mathbb{R}_{+}}\quad\alpha\quad\quad (C)
\mathop{\rm subject\;to}
\quad e^{-\varepsilon/2}p\leq f_{e_{1}}(u_{i})\leq e^{\varepsilon/2}p,\quad u_{i}\in S_{0}
\quad\sum_{i:u_{i}\in S_{0}}f_{e_{1}}(u_{i})V_{i}\bar{u}_{i}+\sum_{i:u_{i}\in S_{1}}f_{e_{1}}(u_{i}^{-})V_{i}\bar{u}_{i}=\alpha e_{1},
\quad\sum_{i:u_{i}\in S_{0}}f_{e_{1}}(u_{i})V_{i}+\sum_{i:u_{i}\in S_{1}}f_{e_{1}}(u_{i}^{-})V_{i}=1.

Let \alpha^{\star}_{1} and \alpha^{\star}_{2} denote the maximal values of (B) and (C), respectively. Each solution to (C) is also a solution to (B), hence \alpha^{\star}_{1}\geq\alpha^{\star}_{2}. Moreover, given \delta>0, let f be a solution of (B) that obtains \alpha\geq\alpha^{\star}_{1}-\delta and let \mathcal{R} be the corresponding randomizer. We can now define a solution for the discrete program (C) by setting, for u\in B_{i},

\hat{f}(u)=\frac{1}{V_{i}}\int_{w\in B_{i}}f_{e_{1}}(w)\,dw.

Equivalently, we can define \hat{\mathcal{R}} as follows: first run \mathcal{R} to get u and find B_{i} such that u\in B_{i}; then return a vector uniformly at random from B_{i}. Note that \hat{f} clearly satisfies the first and third constraints in (C). As for the second constraint, it follows since \hat{f}_{e_{1}}(u)=\hat{f}_{e_{1}}(u^{-}), which implies that \sum_{i:u_{i}\in S}\hat{f}_{e_{1}}(u_{i})V_{i}\bar{u}_{i}=\hat{\alpha}e_{1} for some \hat{\alpha}>0. It remains to show that \hat{\alpha}\geq\alpha^{\star}_{1}-2\delta. The above representation of \hat{\mathcal{R}} shows that \|\mathbb{E}[\hat{\mathcal{R}}-\mathcal{R}]\|\leq\delta and therefore we have \hat{\alpha}\geq\alpha^{\star}_{1}-2\delta.

To finish the proof, it remains to show that the discrete LP (C) has a solution that satisfies the desired properties. Note that as this is a linear program with K variables and 2K+d+2 constraints, at an extreme-point solution K linearly independent constraints must be tight [BT97, Theorem 2.3], which shows that for at least K-d-2 of the sets B_{i} we have f(u_{i})\in\{e^{\varepsilon/2},e^{-\varepsilon/2}\}p.

It remains to manipulate the probabilities for the remaining d+2 sets to satisfy our desired requirements. As these sets have small probability, this does not change the accuracy by much, and we just need to do this manipulation carefully so as to preserve reflection symmetry and unbiasedness. The full details are tedious and we present them in Section A.1. ∎

Given the previous lemmas, we are now ready to finish the proof of Proposition 2.

Proof.

Fix \tau>0 and an unbiased \varepsilon-DP local randomizer \mathcal{R}^{\star}. Proposition 3 shows that there exists \mathcal{R}:\mathbb{S}^{d-1}\to r_{\max}\mathbb{S}^{d-1} that is \varepsilon-DP, unbiased, reflection symmetric (f_{\mathcal{R}(e_{1})}(u)=f_{\mathcal{R}(e_{1})}(u^{-}) for all u), and satisfies f_{\mathcal{R}(e_{1})}(u)\in\{e^{-\varepsilon/2},e^{\varepsilon/2}\}\cdot p. Moreover, \mathsf{Err}(\mathcal{R})\leq\mathsf{Err}(\mathcal{R}^{\star})+\tau. We will transform \mathcal{R} into an instance of PrivUnit while maintaining the same error as \mathcal{R}.

To this end, if \mathcal{R} is an instance of PrivUnit then we are done. Otherwise, let f=f_{\mathcal{R}(e_{1})}, S_{0}(t)=\{u:f(u)=pe^{\varepsilon/2},\langle u,e_{1}\rangle\leq t\} and S_{1}(t)=\{u:f(u)=pe^{-\varepsilon/2},\langle u,e_{1}\rangle\geq t\}. Consider t\in[-1,1] that solves the following minimization problem

\mathop{\rm minimize}_{t\in[-1,1]}~\int_{S_{0}(t)}f(u)\,du+\int_{S_{1}(t)}f(u)\,du

Let p^{\star} be the value of the above minimization problem and t^{\star} the corresponding minimizer. Let p_{0}=\int_{S_{0}(t^{\star})}f(u)\,du and p_{1}=\int_{S_{1}(t^{\star})}f(u)\,du. Assume without loss of generality that p_{0}\leq p_{1} (the other direction follows from identical arguments). Let \hat{S}\subseteq S_{1}(t^{\star}) be such that \hat{S}=\hat{S}^{-} and \int_{\hat{S}}f(u)\,du=p_{0}. We define \tilde{f} by swapping the probabilities on \hat{S} and S_{0}(t^{\star}), that is,

\tilde{f}(u)=\begin{cases}f(u)&\text{if }u\notin S_{0}(t^{\star})\cup\hat{S}\\ pe^{-\varepsilon/2}&\text{if }u\in S_{0}(t^{\star})\\ pe^{\varepsilon/2}&\text{if }u\in\hat{S}\end{cases}

Clearly \tilde{f} still satisfies all of our desired properties and has \mathbb{E}_{U\sim\tilde{f}}[\langle U,e_{1}\rangle]\geq\mathbb{E}_{U\sim f}[\langle U,e_{1}\rangle], as we have \langle u_{1},e_{1}\rangle\geq\langle u_{2},e_{1}\rangle for u_{1}\in\hat{S} and u_{2}\in S_{0}(t^{\star}). Note also that \tilde{f}(u)=pe^{-\varepsilon/2} for u such that \langle u,e_{1}\rangle\leq t^{\star}. Moreover, for u such that \langle u,e_{1}\rangle\geq t^{\star}, we have that \tilde{f}(u)=pe^{-\varepsilon/2} only if u\in B\coloneqq S_{1}(t^{\star})\setminus\hat{S}. Let \delta be such that the set A=\{u:t^{\star}\leq\langle u,e_{1}\rangle\leq t^{\star}+\delta\} has \int_{u\in A}\tilde{f}(u)\,du=\int_{u\in B}\tilde{f}(u)\,du. We now define

\hat{f}(u)=\begin{cases}pe^{\varepsilon/2}&\text{if }\langle u,e_{1}\rangle\geq t^{\star}+\delta\\ pe^{-\varepsilon/2}&\text{if }\langle u,e_{1}\rangle<t^{\star}+\delta\end{cases}

Clearly, \hat{f} is an instance of PrivUnit. Now we prove that it satisfies all of our desired properties. First, note that we can write \hat{f} as

\hat{f}(u)=\begin{cases}\tilde{f}(u)&\text{if }u\notin A\cup B\\ \tilde{f}(u)&\text{if }u\in A\cap B\\ pe^{-\varepsilon/2}&\text{if }u\in A\setminus B\\ pe^{\varepsilon/2}&\text{if }u\in B\setminus A\end{cases}

This implies that \int_{\mathbb{S}^{d-1}}\hat{f}\,du=1. Moreover, \hat{f} is \varepsilon-indistinguishable by definition. Finally, note that \mathbb{E}_{U\sim\hat{f}}[\langle U,e_{1}\rangle]\geq\mathbb{E}_{U\sim\tilde{f}}[\langle U,e_{1}\rangle] as \langle u_{1},e_{1}\rangle\geq\langle u_{2},e_{1}\rangle for u_{1}\in B\setminus A and u_{2}\in A\setminus B. Let \hat{\mathcal{R}} be the randomizer that corresponds to \hat{f}. We define \mathcal{R}^{\prime}=\frac{1}{\mathbb{E}_{U\sim\hat{f}}[\langle U,e_{1}\rangle]}\hat{\mathcal{R}}. We have that \mathbb{E}[\mathcal{R}^{\prime}]=e_{1} and that

\mathsf{Err}(\mathcal{R}^{\prime}) =\frac{1}{\mathbb{E}_{U\sim\hat{f}}[\langle U,e_{1}\rangle]^{2}}-1
\leq\frac{1}{\mathbb{E}_{U\sim f}[\langle U,e_{1}\rangle]^{2}}-1
=\mathsf{Err}(\mathcal{R}).

As \mathcal{R}^{\prime} is an instance of PrivUnit, the claim follows. ∎

4 PrivUnitG: an optimal algorithm based on the Gaussian distribution

In this section, we develop a new variant of PrivUnit, namely PrivUnitG, based on the Gaussian distribution. PrivUnitG essentially provides an easy-to-analyze approximation of the optimal algorithm PrivUnit. This makes it possible to efficiently find accurate approximations of the optimal parameters p^{\star} and q^{\star}. In fact, we show that these parameters are independent of the dimension, which is computationally valuable. Moreover, building on PrivUnitG, we are able to analytically study the constants that characterize the optimal loss.

The main idea in PrivUnitG is to approximate the uniform distribution over the unit sphere using a Gaussian distribution. Roughly, for a Gaussian random vector U\sim\mathsf{N}(0,\frac{1}{d}I) and input vector v, PrivUnitG has the following distribution (up to normalization constants)

\text{PrivUnitG}\sim\begin{cases}U\mid\langle U,v\rangle\geq\gamma&\text{with probability }p\\ U\mid\langle U,v\rangle<\gamma&\text{with probability }1-p\end{cases}

We present the full details, including the normalization constants, in Algorithm 2. We usually use the notation \text{PrivUnitG}(p,q), which means applying PrivUnitG with p and \gamma=\Phi^{-1}(q)/\sqrt{d}.

Algorithm 2 \text{PrivUnitG}(p,q)
0:  v\in\mathbb{S}^{d-1}, q\in[0,1], p\in[0,1]
1:  Draw z\sim\mathsf{Ber}(p)
2:  Let U\sim\mathsf{N}(0,\sigma^{2}) where \sigma^{2}=1/d
3:  Set \gamma=\Phi_{\sigma}^{-1}(q)=\sigma\cdot\Phi^{-1}(q)
4:  if z=1 then
5:     Draw \alpha\sim U\mid U\geq\gamma
6:  else
7:     Draw \alpha\sim U\mid U<\gamma
8:  Draw V^{\perp}\sim\mathsf{N}(0,\sigma^{2}(I-vv^{T}))
9:  Set V=\alpha v+V^{\perp}
10:  Calculate
m=\sigma\phi(\gamma/\sigma)\left(\frac{p}{1-q}-\frac{1-p}{q}\right)
11:  Return \frac{1}{m}\cdot V
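A direct Python sketch of Algorithm 2 (our own illustration; it assumes v is a unit vector and uses scipy's truncated normal for the conditional draws):

```python
import numpy as np
from scipy.stats import norm, truncnorm

def priv_unit_g(v, p, q, rng=None):
    """Sketch of PrivUnitG(p, q) for a unit vector v in R^d."""
    rng = rng or np.random.default_rng()
    d = len(v)
    sigma = 1.0 / np.sqrt(d)
    t = norm.ppf(q)                       # gamma = sigma * Phi^{-1}(q)
    if rng.random() < p:                  # z ~ Ber(p): alpha ~ U | U >= gamma
        alpha = truncnorm.rvs(t, np.inf, scale=sigma, random_state=rng)
    else:                                 # alpha ~ U | U < gamma
        alpha = truncnorm.rvs(-np.inf, t, scale=sigma, random_state=rng)
    g = sigma * rng.standard_normal(d)
    v_perp = g - (g @ v) * v              # V_perp ~ N(0, sigma^2 (I - v v^T))
    V = alpha * v + v_perp
    m = sigma * norm.pdf(t) * (p / (1 - q) - (1 - p) / q)
    return V / m
```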

The following proposition gives the privacy and utility guarantees for PrivUnitG. The r.v. \alpha is defined (see Algorithm 2) as \alpha=\langle U,v\rangle where U is drawn from \text{PrivUnitG}(v). We define m=\sigma\phi(\gamma/\sigma)\left(\frac{p}{1-q}-\frac{1-p}{q}\right) with \sigma^{2}=1/d and \gamma=\sigma\cdot\Phi^{-1}(q). We defer the proof to Section B.1.

Proposition 4.

Let p,q\in[0,1] be such that \frac{p}{1-p}\frac{q}{1-q}\leq e^{\varepsilon}. The algorithm \text{PrivUnitG}(p,q) is an \varepsilon-DP local randomizer. Moreover, it is unbiased and has error

\mathsf{Err}(\text{PrivUnitG}(p,q))=\mathbb{E}[\left\|\text{PrivUnitG}(v)-v\right\|_{2}^{2}]=\frac{\mathbb{E}[\alpha^{2}]+\frac{d-1}{d}}{\mathbb{E}[\alpha]^{2}}-1.

Moreover, we have

m^{2}\cdot\mathsf{Err}(\text{PrivUnitG}(p,q))\stackrel{d\to\infty}{\longrightarrow}1.

Now we proceed to analyze the utility guarantees of PrivUnitG as compared to PrivUnit. To this end, we first define the error obtained by PrivUnitG with optimized parameters

\mathsf{Err}^{\star}_{\varepsilon,d}(\text{PrivUnitG})=\inf_{p,q:\frac{pq}{(1-p)(1-q)}\leq e^{\varepsilon}}\mathsf{Err}(\text{PrivUnitG}(p,q)).

Similarly, we define this quantity for PrivUnit

\mathsf{Err}^{\star}_{\varepsilon,d}(\text{PrivUnit})=\inf_{p,q:\frac{pq}{(1-p)(1-q)}\leq e^{\varepsilon}}\mathsf{Err}(\text{PrivUnit}(p,q)).
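Because \mathbb{E}[\alpha] and \mathbb{E}[\alpha^{2}] are standard truncated-normal moments, \mathsf{Err}(\text{PrivUnitG}(p,q)) can be evaluated in closed form, and \mathsf{Err}^{\star}_{\varepsilon,d}(\text{PrivUnitG}) can be approximated by a one-dimensional search over q, taking p as large as the privacy budget allows (heuristically, increasing p only helps). A Python sketch of this computation (our own illustration, using the error formula of Proposition 4):

```python
import numpy as np
from scipy.stats import norm

def err_priv_unit_g(p, q, d):
    """Error of PrivUnitG(p, q) via Proposition 4 and truncated-normal moments."""
    t = norm.ppf(q)                      # gamma / sigma, with sigma^2 = 1/d
    phi = norm.pdf(t)
    e_alpha = phi * (p / (1 - q) - (1 - p) / q) / np.sqrt(d)   # E[alpha] = m
    e_alpha2 = (1 + p * t * phi / (1 - q) - (1 - p) * t * phi / q) / d
    return (e_alpha2 + (d - 1) / d) / e_alpha**2 - 1

def optimal_params(eps, d, grid_size=2000):
    """Grid search for (p*, q*) under p/(1-p) * q/(1-q) <= e^eps.

    Searches q in [0.5, 1), i.e., assumes the optimal threshold gamma >= 0.
    """
    best = (np.inf, None, None)
    for q in np.linspace(0.5, 1.0, grid_size, endpoint=False):
        ratio = np.exp(eps) * (1 - q) / q    # largest allowed p/(1-p)
        p = ratio / (1 + ratio)
        err = err_priv_unit_g(p, q, d)
        if err < best[0]:
            best = (err, p, q)
    return best                              # (Err*, p*, q*)
```

Consistent with the discussion above, the q^{\star} returned by this search is essentially dimension-independent.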

The following theorem shows that PrivUnitG enjoys the same error as PrivUnit up to small factors.

Theorem 3.

Let p\in[0,1] and q\in[0,1] be such that \text{PrivUnit}(p,q) is an \varepsilon-DP local randomizer. Then \text{PrivUnitG}(p,q) is also an \varepsilon-DP local randomizer and satisfies

\mathsf{Err}(\text{PrivUnitG}(p,q))\leq\mathsf{Err}(\text{PrivUnit}(p,q))\left(1+O\left(\sqrt{\frac{\varepsilon+\log d}{d}}\right)\right).

In particular,

\frac{\mathsf{Err}^{\star}_{\varepsilon,d}(\text{PrivUnitG})}{\mathsf{Err}^{\star}_{\varepsilon,d}(\text{PrivUnit})}\leq 1+O\left(\sqrt{\frac{\varepsilon+\log d}{d}}\right).

We conduct several experiments demonstrating that the error of the two algorithms becomes nearly identical as the dimension grows. We plot the ratio of the error of PrivUnitG to that of PrivUnit (for the same p and \gamma) for several values of \varepsilon and d in Figure 1. These plots reaffirm the theoretical results of Theorem 3: the ratio approaches 1 for large d and small \varepsilon.

Figure 1: Ratio of the error of PrivUnitG to PrivUnit for (a) \varepsilon=4.0, (b) \varepsilon=8.0, (c) \varepsilon=16.0. We use the same p and \gamma for both algorithms, obtained by finding the p,q that minimize the error of PrivUnitG.
Proof.

The privacy claim is straightforward: both algorithms use the same p and q and therefore enjoy the same privacy parameter. Moreover, the second part of the claim follows immediately from the first part and the optimality of PrivUnit (Proposition 2).

Now we prove the first part of the claim. Let \gamma_{2} be such that q=P(W\leq\gamma_{2}), where W is distributed as \langle u,v\rangle for u\sim\mathsf{Uni}(\mathbb{S}_{2}^{d-1}). Then we have

m_{2} =\mathbb{E}[W1\{W\geq\gamma_{2}\}]\frac{p+q-1}{q(1-q)},
\mathsf{Err}_{2} =\mathsf{Err}(\text{PrivUnit}(p,q))=\frac{1}{m_{2}^{2}}.

Similarly, for PrivUnitG we have (see, e.g., the proof of Proposition 4)

m_{G} =\mathbb{E}[\alpha]=\mathbb{E}[U1\{U\geq\gamma_{G}\}]\frac{p+q-1}{q(1-q)},
\mathsf{Err}_{G} =\mathsf{Err}(\text{PrivUnitG}(p,q))=\frac{\mathbb{E}[\alpha^{2}]-1/d}{m_{G}^{2}}+\frac{1}{m_{G}^{2}}.

Therefore we have

\sqrt{\frac{\mathsf{Err}_{G}}{\mathsf{Err}_{2}}} \leq\frac{m_{2}}{m_{G}}\sqrt{1+\mathbb{E}[\alpha^{2}]}
=\frac{\mathbb{E}[W1\{W\geq\gamma_{2}\}]}{\mathbb{E}[U1\{U\geq\gamma_{G}\}]}\sqrt{1+\mathbb{E}[\alpha^{2}]}
\stackrel{(i)}{\leq}\frac{\mathbb{E}[W1\{W\geq\gamma_{2}\}]}{\mathbb{E}[U1\{U\geq\gamma_{G}\}]}\left(1+O\left(\gamma_{G}+\sqrt{\frac{\varepsilon+\log d}{d}}\right)\right), \qquad (1)

where inequality (i) follows from Lemma B.3. We now upper bound the first term. Note first that for the same threshold \gamma we have

\frac{\mathbb{E}[W1\{W\geq\gamma\}]}{\mathbb{E}[U1\{U\geq\gamma\}]} \stackrel{(i)}{\leq}1+O\left(\frac{\sqrt{\log d}}{d^{3/2}\,\mathbb{E}[U1\{U\geq\gamma\}]}\right)
\stackrel{(ii)}{\leq}1+O\left(\frac{\sqrt{\log d}}{d}\right),

where (i) follows from Lemma B.2 and (ii) follows since \mathbb{E}[U1\{U\geq\gamma\}]\geq\mathbb{E}[U1\{U\geq 0\}]=\mathbb{E}[|U|]/2\geq\Omega(\sigma)=\Omega(1/\sqrt{d}). We need one more step to finish the proof, as \gamma_{2} and \gamma_{G} are not necessarily equal. We consider two cases. If \gamma_{2}\geq\gamma_{G}, then \mathbb{E}[U1\{U\geq\gamma_{G}\}]\geq\mathbb{E}[U1\{U\geq\gamma_{2}\}] and therefore

\frac{\mathbb{E}[W1\{W\geq\gamma_{2}\}]}{\mathbb{E}[U1\{U\geq\gamma_{G}\}]}\leq\frac{\mathbb{E}[W1\{W\geq\gamma_{2}\}]}{\mathbb{E}[U1\{U\geq\gamma_{2}\}]}.

Similarly, if \gamma_{2}\leq\gamma_{G}, then \mathbb{E}[W1\{W\geq\gamma_{2}\}]\leq\mathbb{E}[W1\{W\geq\gamma_{G}\}] and therefore

\frac{\mathbb{E}[W1\{W\geq\gamma_{2}\}]}{\mathbb{E}[U1\{U\geq\gamma_{G}\}]}\leq\frac{\mathbb{E}[W1\{W\geq\gamma_{G}\}]}{\mathbb{E}[U1\{U\geq\gamma_{G}\}]}.

In both cases the single-threshold bound above applies, so overall

\frac{\mathbb{E}[W1\{W\geq\gamma_{2}\}]}{\mathbb{E}[U1\{U\geq\gamma_{G}\}]}\leq 1+O\left(\frac{\sqrt{\log d}}{d}\right).

Putting this back in inequality (1), we have

\sqrt{\frac{\mathsf{Err}_{G}}{\mathsf{Err}_{2}}} \leq\left(1+O\left(\frac{\sqrt{\log d}}{d}\right)\right)\left(1+O\left(\gamma_{G}+\sqrt{\frac{\varepsilon+\log d}{d}}\right)\right)
\leq 1+O\left(\sqrt{\frac{\varepsilon+\log d}{d}}\right),

where the last inequality follows from Lemma B.4, as \gamma_{G}\leq O(\sqrt{(\varepsilon+1)/d}). The claim follows. ∎

4.1 Analytical expression for optimal error

We wish to understand the constants that characterize the optimal error. To this end, we build on the optimality of PrivUnitG and define the quantity C_{\varepsilon,d} by

\mathsf{Err}^{\star}_{\varepsilon,d}(\text{PrivUnitG})=C_{\varepsilon,d}\cdot\frac{d}{\varepsilon}.

We show that C_{\varepsilon,d}\to C_{\varepsilon} as d\to\infty and, moreover, that C_{\varepsilon}\to C^{\star} as \varepsilon\to\infty. We experimentally demonstrate the behavior of C_{\varepsilon,d} and C_{\varepsilon} in Figure 2. These experiments show that C^{\star}\approx 0.614. We remark that, as shown in [FT21], if C_{\varepsilon}/C_{k\varepsilon} is close to 1, then one can get a near-optimal algorithm for privacy parameter k\varepsilon by repeating the algorithm with privacy parameter \varepsilon a total of k times. The latter may be more efficient in terms of computation, and this motivates understanding how quickly C_{\varepsilon} converges.
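
Proposition 4 makes C_{\varepsilon,d} numerically tractable: both m=\mathbb{E}[\alpha] and \mathbb{E}[\alpha^{2}] have Gaussian closed forms, so \mathsf{Err}(\text{PrivUnitG}(p,q)) can be evaluated and minimized directly. The following sketch is our own illustration: it takes the privacy constraint to hold with equality, optimizes over q\in(1/2,1) alone (so that p+q>1 and m>0), and uses the identity \mathbb{E}[Z^{2}1\{Z\geq t\}]=1-\Phi(t)+t\phi(t) for standard Gaussian Z.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def err_privunitg(eps, d, q):
    """Closed-form error of PrivUnitG(p, q) with the privacy constraint tight."""
    r = np.exp(eps) * (1.0 - q) / q      # largest p with p/(1-p) * q/(1-q) <= e^eps
    p = r / (1.0 + r)
    sigma = 1.0 / np.sqrt(d)
    t = norm.ppf(q)                      # t = gamma / sigma
    m = sigma * norm.pdf(t) * (p / (1 - q) - (1 - p) / q)
    e_alpha2 = sigma**2 * (p * (1 - q + t * norm.pdf(t)) / (1 - q)
                           + (1 - p) * (q - t * norm.pdf(t)) / q)
    return (e_alpha2 + (d - 1) / d) / m**2 - 1

def c_eps_d(eps, d):
    """Estimate C_{eps,d} by minimizing the error over q."""
    res = minimize_scalar(lambda q: err_privunitg(eps, d, q),
                          bounds=(0.5 + 1e-6, 1 - 1e-9), method="bounded")
    return res.fun * eps / d
```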

Figure 2: (a) C_{\varepsilon,d} as a function of d for \varepsilon=35. (b) C_{\varepsilon} as a function of \varepsilon (we approximate C_{\varepsilon} by taking a sufficiently large dimension, d=5\cdot 10^{4}).

The following proposition shows that C_{\varepsilon,d} converges as we increase the dimension d. We provide the proof in Section B.3.

Proposition 5.

Fix \varepsilon>0. For any 1\leq d_{1}\leq d_{2},

1-O\left(\frac{\varepsilon+\log d_{2}}{d_{2}}+\frac{\varepsilon}{d_{1}}\right)\leq\frac{C_{\varepsilon,d_{1}}}{C_{\varepsilon,d_{2}}}\leq 1+O\left(\frac{\varepsilon+\log d_{1}}{d_{1}}+\frac{\varepsilon}{d_{2}}\right).

In particular,

C_{\varepsilon,d}\stackrel{d\to\infty}{\longrightarrow}C_{\varepsilon}.
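
Using the c_eps_d sketch above, the convergence in Proposition 5 is easy to observe numerically (illustrative values):

```python
for d in (10**2, 10**3, 10**4, 10**5):
    print(d, c_eps_d(8.0, d))  # the printed values stabilize as d grows
```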

The following proposition shows that C_{\varepsilon} also converges as we increase \varepsilon. We present the proof in Section B.4.

Proposition 6.

There is C^{\star}>0 such that \lim_{\varepsilon\to\infty}C_{\varepsilon}=C^{\star}.

References

  • [App17] Apple Differential Privacy Team “Learning with Privacy at Scale” Available at https://machinelearning.apple.com/2017/12/06/learning-with-privacy-at-scale.html, 2017
  • [ASYKM18] Naman Agarwal, Ananda Theertha Suresh, Felix Xinnan X Yu, Sanjiv Kumar and Brendan McMahan “cpSGD: Communication-efficient and differentially-private distributed SGD” In Advances in Neural Information Processing Systems 31 Curran Associates, Inc., 2018, pp. 7564–7575 URL: https://proceedings.neurips.cc/paper/2018/file/21ce689121e39821d07d04faab328370-Paper.pdf
  • [BBGN19] Borja Balle, James Bell, Adrià Gascón and Kobbi Nissim “The Privacy Blanket of the Shuffle Model” In Advances in Cryptology – CRYPTO 2019 Cham: Springer International Publishing, 2019, pp. 638–667
  • [BDFKR18] Abhishek Bhowmick, John Duchi, Julien Freudiger, Gaurav Kapoor and Ryan Rogers “Protection Against Reconstruction and Its Applications in Private Federated Learning” In arXiv:1812.00984 [stat.ML], 2018
  • [BEMMRLRKTS17] Andrea Bittau, Úlfar Erlingsson, Petros Maniatis, Ilya Mironov, Ananth Raghunathan, David Lie, Mitch Rudominer, Ushasree Kode, Julien Tinnes and Bernhard Seefeld “Prochlo: Strong Privacy for Analytics in the Crowd” In Proceedings of the 26th Symposium on Operating Systems Principles, SOSP ’17 Shanghai, China: Association for Computing Machinery, 2017, pp. 441–459 DOI: 10.1145/3132747.3132769
  • [BIKMMPRSS17] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal and Karn Seth “Practical Secure Aggregation for Privacy-Preserving Machine Learning” In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security New York, NY, USA: ACM, 2017, pp. 1175–1191
  • [BN10] Hai Brenner and Kobbi Nissim “Impossibility of Differentially Private Universally Optimal Mechanisms” In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, 2010, pp. 71–80 DOI: 10.1109/FOCS.2010.13
  • [BNO08] Amos Beimel, Kobbi Nissim and Eran Omri “Distributed private data analysis: Simultaneously solving how and what” In Advances in Cryptology 5157, Lecture Notes in Computer Science Springer, 2008, pp. 451–468
  • [BT97] Dimitris Bertsimas and John N Tsitsiklis “Introduction to linear optimization” Athena Scientific Belmont, MA, 1997
  • [CKÖ20] Wei-Ning Chen, Peter Kairouz and Ayfer Özgür “Breaking the communication-privacy-accuracy trilemma” In arXiv:2007.11707 [cs.LG], 2020
  • [CSUZZ19] Albert Cheu, Adam Smith, Jonathan Ullman, David Zeber and Maxim Zhilyaev “Distributed Differential Privacy via Shuffling” In Advances in Cryptology – EUROCRYPT 2019 Cham: Springer International Publishing, 2019, pp. 375–403
  • [CU21] Albert Cheu and Jonathan Ullman “The Limits of Pan Privacy and Shuffle Privacy for Learning and Estimation” In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing New York, NY, USA: Association for Computing Machinery, 2021, pp. 1081–1094 URL: https://doi.org/10.1145/3406325.3450995
  • [DF87] Persi Diaconis and David Freedman “A dozen de Finetti-style results in search of a theory” In Annales de l’IHP Probabilités et statistiques 23, 1987, pp. 397–423
  • [DJW18] John C. Duchi, Michael I. Jordan and Martin J. Wainwright “Minimax Optimal Procedures for Locally Private Estimation (with discussion)” In Journal of the American Statistical Association 113.521, 2018, pp. 182–215
  • [DMNS06] C. Dwork, F. McSherry, K. Nissim and A. Smith “Calibrating noise to sensitivity in private data analysis” In TCC, 2006, pp. 265–284
  • [DR19] John C. Duchi and Ryan Rogers “Lower Bounds for Locally Private Estimation via Communication Complexity” In Proceedings of the Thirty Second Annual Conference on Computational Learning Theory, 2019
  • [Duc18] John C. Duchi “Introductory Lectures on Stochastic Convex Optimization” In The Mathematics of Data, IAS/Park City Mathematics Series American Mathematical Society, 2018
  • [EFMRSTT20] Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Shuang Song, Kunal Talwar and Abhradeep Thakurta “Encode, shuffle, analyze privacy revisited: Formalizations and empirical evaluation” In arXiv:2001.03618 [cs.CR], 2020
  • [EFMRTT19] Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar and Abhradeep Thakurta “Amplification by Shuffling: From Local to Central Differential Privacy via Anonymity” In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’19 San Diego, California: Society for Industrial and Applied Mathematics, 2019, pp. 2468–2479
  • [EGS03] Alexandre V. Evfimievski, Johannes Gehrke and Ramakrishnan Srikant “Limiting privacy breaches in privacy preserving data mining” In Proceedings of the Twenty-Second Symposium on Principles of Database Systems, 2003, pp. 211–222
  • [EPK14] Ulfar Erlingsson, Vasyl Pihur and Aleksandra Korolova “RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response” In Proceedings of the 21st ACM Conference on Computer and Communications Security (CCS), 2014
  • [FMT20] Vitaly Feldman, Audra McMillan and Kunal Talwar “Hiding Among the Clones: A Simple and Nearly Optimal Analysis of Privacy Amplification by Shuffling” In arXiv:2012.12803 [cs.LG], 2020
  • [FT21] Vitaly Feldman and Kunal Talwar “Lossless Compression of Efficient Private Local Randomizers” In Proceedings of the 38th International Conference on Machine Learning 139 PMLR, 2021, pp. 3208–3219
  • [GDDKS20] Antonious M. Girgis, Deepesh Data, Suhas Diggavi, Peter Kairouz and Ananda Theertha Suresh “Shuffled Model of Federated Learning: Privacy, Communication and Accuracy Trade-offs”, 2020 arXiv:2008.07180 [cs.LG]
  • [GKMM19] Venkata Gandikota, Daniel Kane, Raj Kumar Maity and Arya Mazumdar “vqsgd: Vector quantized stochastic gradient descent” In arXiv preprint arXiv:1911.07971, 2019
  • [GKOV15] Quan Geng, Peter Kairouz, Sewoong Oh and Pramod Viswanath “The Staircase Mechanism in Differential Privacy” In IEEE Journal of Selected Topics in Signal Processing 9.7, 2015, pp. 1176–1184 DOI: 10.1109/JSTSP.2015.2425831
  • [GRS09] Arpita Ghosh, Tim Roughgarden and Mukund Sundararajan “Universally Utility-Maximizing Privacy Mechanisms” In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, STOC ’09 Bethesda, MD, USA: Association for Computing Machinery, 2009, pp. 351–360 DOI: 10.1145/1536414.1536464
  • [GS10] Mangesh Gupte and Mukund Sundararajan “Universally Optimal Privacy Mechanisms for Minimax Agents” In Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS ’10 Indianapolis, Indiana, USA: Association for Computing Machinery, 2010, pp. 135–146 DOI: 10.1145/1807085.1807105
  • [KOV16] Peter Kairouz, Sewoong Oh and Pramod Viswanath “Extremal Mechanisms for Local Differential Privacy” In J. Mach. Learn. Res. 17.1 JMLR.org, 2016, pp. 492–542 URL: http://dl.acm.org/citation.cfm?id=2946645.2946662
  • [War65] Stanley L Warner “Randomized response: A survey technique for eliminating evasive answer bias” In Journal of the American Statistical Association 60.309 Taylor & Francis Group, 1965, pp. 63–69
  • [YB17] Min Ye and Alexander Barg “Asymptotically optimal private estimation under mean square loss” In arXiv:1708.00059 [math.ST], 2017
  • [YB18] Min Ye and Alexander Barg “Optimal Schemes for Discrete Distribution Estimation Under Locally Differential Privacy” In IEEE Transactions on Information Theory 64.8, 2018, pp. 5662–5676

Appendix A Missing details for PrivUnit (Section 3)

A.1 Proof of Proposition 3 (missing details)

Here we complete the missing details from the proof of Proposition 3. We have sets B_{1},\dots,B_{K} such that for at least K-(d-2) of the sets B_{i} we have f(u_{i})\in\{e^{\varepsilon},e^{-\varepsilon}\}p. We now show how to manipulate the probabilities on the remaining sets to satisfy our desired properties without affecting the accuracy. Assume without loss of generality that B_{1},\dots,B_{d-2} do not satisfy this condition and let B_{\mathsf{bad}}=\cup_{1\leq i\leq d-2}B_{i}. We now show that we can manipulate the density on B_{\mathsf{bad}} so that it satisfies this condition. Moreover, we show that the probability that u\in B_{\mathsf{bad}} is small, so that this does not affect the accuracy of the algorithm. Note that the probability that u\in B_{\mathsf{bad}} is at most

p_{\mathsf{bad}}\coloneqq P(\hat{\mathcal{R}}\in B_{\mathsf{bad}})\leq d\max_{i}V_{i}\,pe^{\varepsilon}\leq dV\delta^{d}pe^{\varepsilon},

where V=\int_{u\in\mathbb{S}^{d-1}}1\,du. Now we proceed to balance the density in B_{\mathsf{bad}}. Let B_{\mathsf{bad}}=A_{0}\cup A_{1} where A_{0}=A_{1}^{-} and A_{0}\cap A_{1}=\emptyset. We show how to balance the probabilities for u\in A_{0} such that the mass on A_{0} is p_{\mathsf{bad}}/2. Then we define the density for u\in A_{1} using the density of u^{-}, as u^{-}\in A_{0}. Let V_{A_{0}}=\int_{u\in A_{0}}1\,du be the measure of the set A_{0}. We divide A_{0} into two sets A_{0}^{1} and A_{0}^{2} such that \int_{u\in A_{0}^{1}}1\,du=\rho V_{A_{0}} and \int_{u\in A_{0}^{2}}1\,du=(1-\rho)V_{A_{0}}. We define a new density function \tilde{f} such that

\tilde{f}(u)=\begin{cases}\hat{f}(u)&\text{if }u\notin B_{\mathsf{bad}}\\ pe^{\varepsilon}&\text{if }u\in A_{0}^{1}\\ pe^{-\varepsilon}&\text{if }u\in A_{0}^{2}\\ \tilde{f}(u^{-})&\text{if }u\in A_{1}\end{cases}

First, note that by design we have \tilde{f}(u)=\tilde{f}(u^{-}). We now prove that the choice \rho=\frac{p_{\mathsf{bad}}/(2pV_{A_{0}})-e^{-\varepsilon}}{e^{\varepsilon}-e^{-\varepsilon}} satisfies all of our conditions. First we show that \rho\in[0,1]. Indeed, as \hat{f}(u)\in[e^{-\varepsilon},e^{\varepsilon}]p for all u, the average density in B_{\mathsf{bad}} is also in this range, that is, \frac{p_{\mathsf{bad}}}{pV_{\mathsf{bad}}}=\frac{p_{\mathsf{bad}}}{2pV_{A_{0}}}\in[e^{-\varepsilon},e^{\varepsilon}], which implies \rho\in[0,1]. Moreover, note that

\int_{u\in A_{0}}\tilde{f}(u)\,du=pe^{\varepsilon}V_{A_{0}}\rho+pe^{-\varepsilon}V_{A_{0}}(1-\rho)=p_{\mathsf{bad}}/2.

This implies that \int_{u\in\mathbb{S}^{d-1}}\tilde{f}(u)\,du=1. Finally, note that this does not affect \alpha too much, as we have

\tilde{\alpha} =\sum_{u\in S}\tilde{f}(u)V_{i}\langle v,\bar{u}_{i}\rangle
=\sum_{u\in S\setminus B_{\mathsf{bad}}}\hat{f}(u)V_{i}\langle v,\bar{u}_{i}\rangle+\sum_{u\in S\cap B_{\mathsf{bad}}}\tilde{f}(u)V_{i}\langle v,\bar{u}_{i}\rangle
=\sum_{u\in S}\hat{f}(u)V_{i}\langle v,\bar{u}_{i}\rangle+\sum_{u\in S\cap B_{\mathsf{bad}}}\tilde{f}(u)V_{i}\langle v,\bar{u}_{i}\rangle-\sum_{u\in S\cap B_{\mathsf{bad}}}\hat{f}(u)V_{i}\langle v,\bar{u}_{i}\rangle
\geq\hat{\alpha}-p_{\mathsf{bad}}.

Note that for sufficiently small \delta, the error of this algorithm is now

\frac{1}{\tilde{\alpha}^{2}} \leq\frac{1}{(\hat{\alpha}-p_{\mathsf{bad}})^{2}}
\leq\frac{1}{(\alpha-p_{\mathsf{bad}}-2\delta)^{2}}
\leq\frac{1}{\alpha^{2}}+\frac{20(p_{\mathsf{bad}}+2\delta)}{\alpha^{2}}
\leq\frac{1}{\alpha^{2}}+\tau,

where the last inequality follows by choosing \delta small enough such that \frac{20(p_{\mathsf{bad}}+2\delta)}{\alpha^{2}}\leq\tau, which gives the claim. ∎

Appendix B Proofs and missing details for Section 4

B.1 Proof of Proposition 4

We will use the following helper lemma.

Lemma B.1.

Let U\sim\mathsf{N}(0,\sigma^{2}). Then

\mathbb{E}[U1\{U\geq\gamma\}]=\sigma\phi(\gamma/\sigma).
Proof.

We have

\mathbb{E}[U1\{U\geq\gamma\}] =\int_{\gamma}^{\infty}t\,\phi_{\sigma^{2}}(t)\,dt
=\int_{\gamma}^{\infty}t\,\frac{1}{\sqrt{2\pi\sigma^{2}}}e^{-t^{2}/2\sigma^{2}}\,dt
=\frac{\sigma}{\sqrt{2\pi}}e^{-\gamma^{2}/2\sigma^{2}}
=\sigma\phi(\gamma/\sigma). ∎
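
As a sanity check, Lemma B.1 is easy to verify by Monte Carlo (a throwaway sketch; the constants are arbitrary):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
sigma, gamma = 0.3, 0.2
u = rng.normal(scale=sigma, size=10**7)
print(np.mean(u * (u >= gamma)))        # Monte Carlo estimate of E[U 1{U >= gamma}]
print(sigma * norm.pdf(gamma / sigma))  # closed form: sigma * phi(gamma / sigma)
```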

Proof.

(Proposition 4) We begin by proving that PrivUnitG is unbiased. Note that \mathbb{E}[\frac{1}{m}V]=\mathbb{E}[\alpha/m]\cdot v, so it suffices to show that \mathbb{E}[\alpha]=m. To this end,

\mathbb{E}[\alpha] =p\,\mathbb{E}[U\mid U\geq\gamma]+(1-p)\,\mathbb{E}[U\mid U<\gamma]
=\frac{p}{P(U\geq\gamma)}\mathbb{E}[U1\{U\geq\gamma\}]+\frac{1-p}{P(U<\gamma)}\mathbb{E}[U1\{U<\gamma\}]
=\mathbb{E}[U1\{U\geq\gamma\}]\left(\frac{p}{P(U\geq\gamma)}-\frac{1-p}{P(U<\gamma)}\right)
=\sigma\phi(\gamma/\sigma)\left(\frac{p}{P(U\geq\gamma)}-\frac{1-p}{P(U<\gamma)}\right)
=m,

where the third equality follows since \mathbb{E}[U]=\mathbb{E}[U1\{U\geq\gamma\}]+\mathbb{E}[U1\{U<\gamma\}]=0, and the fourth equality follows since Lemma B.1 gives \mathbb{E}[U1\{U\geq\gamma\}]=\sigma\phi(\gamma/\sigma) for U\sim\mathsf{N}(0,\sigma^{2}). For the claim about utility, as PrivUnitG is unbiased we have

\mathbb{E}[\left\|\text{PrivUnitG}(v)-v\right\|_{2}^{2}]=\mathbb{E}[\left\|\text{PrivUnitG}(v)\right\|_{2}^{2}]-1=\frac{1}{m^{2}}\left(\mathbb{E}[\alpha^{2}]+\frac{d-1}{d}\right)-1.

Now we prove the claim about the limit. First, note that P(U\leq\gamma)=q, hence we can write

m=\sigma\phi(\gamma/\sigma)\left(\frac{p}{1-q}-\frac{1-p}{q}\right).

Moreover, Lemma B.3 and Lemma B.4 show that \mathbb{E}[\alpha^{2}]\leq O\left(\gamma^{2}+\frac{\varepsilon+\log d}{d}\right)\leq O\left(\frac{\varepsilon+\log d}{d}\right). Since m^{2}\cdot\mathsf{Err}(\text{PrivUnitG}(p,q))=\mathbb{E}[\alpha^{2}]+\frac{d-1}{d}-m^{2} and m^{2}=\mathbb{E}[\alpha]^{2}\leq\mathbb{E}[\alpha^{2}], taking the limit as d\to\infty yields

m^{2}\cdot\mathbb{E}[\left\|\text{PrivUnitG}(v)-v\right\|_{2}^{2}]\stackrel{d\to\infty}{\longrightarrow}1.

Now we proceed to prove the privacy claim. We need to show that for every v_{1},v_{2}\in\mathbb{S}_{2}^{d-1} and u\in\mathbb{R}^{d},

\frac{f_{\text{PrivUnitG}(v_{1})}(u)}{f_{\text{PrivUnitG}(v_{2})}(u)}\leq e^{\varepsilon}.

For every input vector v, we divide the output space into two sets: S_{v}=\{u\in\mathbb{R}^{d}:\langle u,v\rangle\geq\gamma\} and S_{v}^{c}=\mathbb{R}^{d}\setminus S_{v}. The definition of PrivUnitG implies that for u\in S_{v} we have

f_{\text{PrivUnitG}(v)}(u)=p\,\frac{f_{U}(u)}{P(U\geq\gamma)},

and for u\notin S_{v} we have

f_{\text{PrivUnitG}(v)}(u)=(1-p)\,\frac{f_{U}(u)}{P(U\leq\gamma)}.

Using the notation q=P(U\leq\gamma), we now have that for any v_{1},v_{2} and u,

\frac{f_{\text{PrivUnitG}(v_{1})}(u)}{f_{\text{PrivUnitG}(v_{2})}(u)} \leq\frac{p/(1-q)}{(1-p)/q}
=\frac{p}{1-p}\cdot\frac{q}{1-q}
\leq e^{\varepsilon},

where the first inequality follows since the ratio is maximized when u\in S_{v_{1}} and u\notin S_{v_{2}}. ∎

B.2 Helper lemmas for Theorem 3

Lemma B.2.

We have

|\mathbb{E}[W1\{W\geq\gamma\}]-\mathbb{E}[U1\{U\geq\gamma\}]|\leq O\left(\frac{\sqrt{\log d}}{d^{3/2}}\right).
Proof.

We use the fact that for U and W we have P(U\leq\gamma)\leq P(W\leq\gamma)+\frac{8}{d-4} ([DF87, Inequality (1)]). We have

|\mathbb{E}[W1\{W\geq\gamma\}]-\mathbb{E}[U1\{U\geq\gamma\}]| =\left|\int_{\gamma}^{\infty}u(f_{W}(u)-f_{U}(u))\,du\right|
\leq\left|\int_{\gamma}^{C}u(f_{W}(u)-f_{U}(u))\,du\right|+\left|\int_{C}^{\infty}u(f_{W}(u)-f_{U}(u))\,du\right|
\leq C\left\|P_{W}-P_{U}\right\|_{TV}+\int_{C}^{1}uf_{W}(u)\,du+\frac{f_{U}(C)}{d}
\leq\frac{8C}{d-4}+\frac{e^{-C^{2}d/2}}{\sqrt{2\pi d}}+f_{W}(C)
\stackrel{(i)}{\leq}\frac{8C}{d-4}+\frac{e^{-C^{2}d/2}}{\sqrt{2\pi d}}+O(\sqrt{d}e^{-C^{2}d/8})
\leq O\left(\frac{\sqrt{\log d}}{d^{3/2}}\right),

where the last inequality follows by setting C=\Theta\left(\sqrt{\frac{\log d}{d}}\right). Inequality (i) follows since W has the same distribution as 2B-1 for B\sim\mathsf{Beta}(\frac{d-1}{2},\frac{d-1}{2}). Indeed, setting \alpha=\frac{d-1}{2}, we have for any b=1/2+\rho

f_{B}(b) =\frac{(b(1-b))^{\alpha-1}}{B(\alpha,\alpha)}
=\frac{(1-\rho^{2})^{\alpha-1}}{4^{\alpha-1}}\cdot\frac{1}{B(\alpha,\alpha)}
\leq\frac{e^{-\rho^{2}(\alpha-1)}}{4^{\alpha-1}}\cdot\frac{1}{B(\alpha,\alpha)}
\leq O(\sqrt{\alpha}e^{-\rho^{2}\alpha}),

where the last inequality follows from

B(\alpha,\alpha)=\frac{2}{\alpha}\cdot\frac{1}{\binom{2\alpha}{\alpha}}\geq\frac{2}{\alpha}\cdot\Omega\!\left(\frac{\sqrt{\alpha}}{4^{\alpha}}\right)=\Omega\!\left(\frac{1}{\sqrt{\alpha}4^{\alpha}}\right).

The claim now follows as f_{W}(C)=f_{B}(1/2+C/2). ∎

Lemma B.3.

We have

\mathbb{E}[\alpha^{2}]\leq O\left(\gamma^{2}+\frac{\varepsilon_{G}+\log d}{d}\right).
Proof.

It is enough to upper bound \mathbb{E}[U^{2}\mid U\geq\gamma]. Since P(U\geq\gamma)\geq e^{-\varepsilon_{G}}, we have

\mathbb{E}[U^{2}\mid U\geq\gamma] =\int_{\gamma}^{C}u^{2}\frac{f_{U}(u)}{P(U\geq\gamma)}\,du+\int_{C}^{\infty}u^{2}\frac{f_{U}(u)}{P(U\geq\gamma)}\,du
\leq C^{2}+e^{\varepsilon_{G}}\frac{1}{\sqrt{2\pi d}}\int_{C}^{\infty}u^{2}d\,e^{-u^{2}d/2}\,du
\leq C^{2}+e^{\varepsilon_{G}}\frac{e^{-C^{2}d/4}}{\sqrt{2\pi d}}\int_{C}^{\infty}u^{2}d\,e^{-u^{2}d/4}\,du
\leq O\left(\gamma^{2}+\frac{\varepsilon_{G}+\log d}{d}\right),

where the last inequality follows by setting C=\Theta\left(\max\left(\gamma,\sqrt{\frac{\varepsilon_{G}+\log d}{d}}\right)\right). ∎

The following lemma is also useful for our analysis.

Lemma B.4.

Assume \text{PrivUnitG}(p,\gamma) is \varepsilon-DP. Then \gamma^{2}\leq\frac{2\log(e^{\varepsilon}+1)}{d}.

Proof.

Proposition 4 implies that \gamma=\frac{\Phi^{-1}(q)}{\sqrt{d}} where q=\frac{e^{\varepsilon_{1}}}{e^{\varepsilon_{1}}+1} for some \varepsilon_{1}\leq\varepsilon. As \Phi^{-1}(q) is increasing in q, we have

\gamma=\frac{\Phi^{-1}(q)}{\sqrt{d}}\leq\frac{\Phi^{-1}\!\left(\frac{e^{\varepsilon}}{e^{\varepsilon}+1}\right)}{\sqrt{d}}.

Gaussian concentration gives that for t^{2}\geq 2\log(e^{\varepsilon}+1),

P(\mathsf{N}(0,1)>t)\leq e^{-t^{2}/2}\leq 1/(e^{\varepsilon}+1).

This implies that \Phi^{-1}\!\left(\frac{e^{\varepsilon}}{e^{\varepsilon}+1}\right)\leq\sqrt{2\log(e^{\varepsilon}+1)}, which proves the claim. ∎
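
The bound \Phi^{-1}(\frac{e^{\varepsilon}}{e^{\varepsilon}+1})\leq\sqrt{2\log(e^{\varepsilon}+1)} used above is also easy to confirm numerically (an illustrative check, not part of the proof):

```python
import numpy as np
from scipy.stats import norm

for eps in (0.5, 1.0, 4.0, 16.0):
    lhs = norm.ppf(np.exp(eps) / (np.exp(eps) + 1.0))
    rhs = np.sqrt(2.0 * np.log(np.exp(eps) + 1.0))
    assert lhs <= rhs, (eps, lhs, rhs)
```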

B.3 Proof of Proposition 5

Recall that the error of \text{PrivUnitG}(p,q) in dimension d is

\mathsf{Err}_{d}(\text{PrivUnitG}(p,q))=\frac{\mathbb{E}[\alpha^{2}]+1}{m_{d}^{2}}-1

where

m_{d}=\frac{\phi(\gamma\sqrt{d})}{\sqrt{d}}\left(\frac{p}{1-q}-\frac{1-p}{q}\right).

Note first that \gamma\sqrt{d}=\Phi^{-1}(q), which immediately implies that

\frac{m_{d_{1}}}{m_{d_{2}}}=\sqrt{\frac{d_{2}}{d_{1}}}.

Lemma B.3 and Lemma B.4 show that \mathbb{E}[\alpha^{2}]\leq O(\frac{\varepsilon+\log d}{d}). Finally, as \mathsf{Err}_{d}(p,q)\geq Cd/\varepsilon for all p and q (for a universal constant C>0), this implies that m_{d}^{2}\leq O(\varepsilon/d). Let \mathsf{Err}_{d} denote the error for inputs u\in\mathbb{S}^{d-1}, and let (p_{1},q_{1}) and (p_{2},q_{2}) denote the optimal parameters for dimensions d_{1} and d_{2}, respectively. We now have

\frac{C_{\varepsilon,d_{1}}}{C_{\varepsilon,d_{2}}} =\frac{\mathsf{Err}_{d_{1}}(\text{PrivUnitG}(p_{1},q_{1}))/d_{1}}{\mathsf{Err}_{d_{2}}(\text{PrivUnitG}(p_{2},q_{2}))/d_{2}}
\leq\frac{d_{2}}{d_{1}}\cdot\frac{\mathsf{Err}_{d_{1}}(\text{PrivUnitG}(p_{2},q_{2}))}{\mathsf{Err}_{d_{2}}(\text{PrivUnitG}(p_{2},q_{2}))}
\leq\frac{d_{2}}{d_{1}}\cdot\frac{(\mathbb{E}[\alpha_{1}^{2}]+1)/m_{d_{1}}^{2}}{1/m_{d_{2}}^{2}-1}
\leq\frac{d_{2}}{d_{1}}(\mathbb{E}[\alpha_{1}^{2}]+1)\frac{m_{d_{2}}^{2}}{m_{d_{1}}^{2}}\cdot\frac{1}{1-m_{d_{2}}^{2}}
=(\mathbb{E}[\alpha_{1}^{2}]+1)\frac{1}{1-m_{d_{2}}^{2}}
\leq\left(1+O\left(\frac{\varepsilon+\log d_{1}}{d_{1}}\right)\right)\left(1+O\left(\frac{\varepsilon}{d_{2}}\right)\right)
\leq 1+O\left(\frac{\varepsilon+\log d_{1}}{d_{1}}+\frac{\varepsilon}{d_{2}}\right).

This proves the right-hand side of the inequality. The left-hand side follows using the same arguments by noting that

\frac{C_{\varepsilon,d_{1}}}{C_{\varepsilon,d_{2}}} =\frac{\mathsf{Err}_{d_{1}}(\text{PrivUnitG}(p_{1},q_{1}))/d_{1}}{\mathsf{Err}_{d_{2}}(\text{PrivUnitG}(p_{2},q_{2}))/d_{2}}
\geq\frac{d_{2}}{d_{1}}\cdot\frac{\mathsf{Err}_{d_{1}}(\text{PrivUnitG}(p_{1},q_{1}))}{\mathsf{Err}_{d_{2}}(\text{PrivUnitG}(p_{1},q_{1}))}.

The second part of the claim, regarding the limit, follows directly from the first part. ∎

B.4 Proof of Proposition 6

We need the following lemma for this proof.

Lemma B.5.

We have

  1. C_{\varepsilon_{2}}\leq C_{\varepsilon_{1}}\frac{\varepsilon_{2}}{\varepsilon_{1}} for \varepsilon_{1}\leq\varepsilon_{2}.

  2. C_{k\varepsilon}\leq C_{\varepsilon} for any integer k\geq 1.

Proof.

The first part follows since \mathsf{Err}_{\varepsilon_{2},d}\leq\mathsf{Err}_{\varepsilon_{1},d} for \varepsilon_{1}\leq\varepsilon_{2}, which implies that

C_{\varepsilon_{2},d}=\varepsilon_{2}\mathsf{Err}_{\varepsilon_{2},d}/d\leq\varepsilon_{2}\mathsf{Err}_{\varepsilon_{1},d}/d=\frac{\varepsilon_{2}}{\varepsilon_{1}}\cdot\varepsilon_{1}\mathsf{Err}_{\varepsilon_{1},d}/d=\frac{\varepsilon_{2}}{\varepsilon_{1}}C_{\varepsilon_{1},d}.

This implies the claim for C_{\varepsilon}.

The second part follows by applying a repetition-based randomizer. Given an \varepsilon-DP local randomizer \mathcal{R}_{d} for \mathbb{S}_{2}^{d-1} that achieves C_{\varepsilon,d} (that is, \mathsf{Err}(\mathcal{R}_{d})=C_{\varepsilon,d}\,d/\varepsilon), the randomizer \mathcal{R}'_{d} that returns the average of k independent applications of \mathcal{R}_{d} is k\varepsilon-DP and has error \mathsf{Err}(\mathcal{R}'_{d})\leq\mathsf{Err}(\mathcal{R}_{d})/k. Thus we have

C_{k\varepsilon,d}=k\varepsilon\,\mathsf{Err}_{k\varepsilon,d}/d\leq k\varepsilon\,\mathsf{Err}(\mathcal{R}'_{d})/d\leq\varepsilon\,\mathsf{Err}(\mathcal{R}_{d})/d=C_{\varepsilon,d}.

This proves the claim for C_{\varepsilon} as it holds for all d. ∎
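
The repetition-based randomizer used in the second part is simple to state in code (a sketch; R can be any unbiased \varepsilon-DP randomizer, e.g. the priv_unit_g function above):

```python
import numpy as np

def repeated_randomizer(R, k):
    """Average k independent reports of an unbiased eps-DP randomizer R.

    By basic composition the average is (k * eps)-DP, and averaging k
    independent unbiased reports divides the variance, and hence the
    error, by k."""
    return lambda v: np.mean([R(v) for _ in range(k)], axis=0)
```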

We are now ready to prove Proposition 6.

Proof.

(Proposition 6) First, note that C_{\varepsilon} is bounded from below by zero, so \liminf_{\varepsilon\to\infty}C_{\varepsilon} exists; let C^{\star}=\liminf_{\varepsilon\to\infty}C_{\varepsilon}. We will show that \lim_{\varepsilon\to\infty}C_{\varepsilon}=C^{\star}. It is enough to show that for all \delta>0 there is \varepsilon_{0} such that for all \varepsilon\geq\varepsilon_{0} we get C_{\varepsilon}\leq C^{\star}+\delta. The definition of \liminf implies that there is \varepsilon_{1} such that C_{\varepsilon_{1}}\leq C^{\star}+\delta/2. We will show that C_{\varepsilon}\leq C^{\star}+\delta for all \varepsilon\geq k\varepsilon_{1}, for a sufficiently large k chosen below. Write \varepsilon=k'\varepsilon_{1}+\varepsilon_{2} where 0\leq\varepsilon_{2}<\varepsilon_{1} and k'\geq k. We have that

C_{\varepsilon} \stackrel{(i)}{\leq}C_{k'\varepsilon_{1}}\frac{\varepsilon}{k'\varepsilon_{1}}
\stackrel{(ii)}{\leq}C_{\varepsilon_{1}}\frac{\varepsilon}{k'\varepsilon_{1}}
\leq C_{\varepsilon_{1}}(1+1/k)
\leq(C^{\star}+\delta/2)(1+1/k)
\leq C^{\star}+\delta,

where (i) and (ii) follow from the first and second items of Lemma B.5, respectively, and the last inequality follows by setting k=\left\lceil 2(C^{\star}+\delta/2)/\delta\right\rceil. The claim follows. ∎