
A Generalized Shuffle Framework for Privacy Amplification: Strengthening Privacy Guarantees and Enhancing Utility

E Chen1, Yang Cao2, Yifei Ge3. Corresponding author: Yang Cao, [email protected]
Abstract

The shuffle model of local differential privacy is an advanced method of privacy amplification designed to enhance privacy protection with high utility. It achieves this by randomly shuffling sensitive data, making it more challenging to link individual data points to specific individuals. However, most existing studies have focused on the shuffle model based on $(\epsilon_0,0)$-Locally Differentially Private (LDP) randomizers, with limited consideration for complex scenarios such as $(\epsilon_0,\delta_0)$-LDP or personalized LDP (PLDP). This hinders a comprehensive understanding of the shuffle model's potential and limits its application in various settings. To bridge this research gap, we propose a generalized shuffle framework that can be applied to any $(\epsilon_i,\delta_i)$-PLDP setting with personalized privacy parameters. This generalization allows for a broader exploration of the privacy-utility trade-off and facilitates the design of privacy-preserving analyses in diverse contexts. We prove that a shuffled $(\epsilon_i,\delta_i)$-PLDP process approximately preserves $\mu$-Gaussian Differential Privacy with $\mu=\sqrt{\frac{2}{\sum_{i=1}^{n}\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}-\max_{i}\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}}}$. This approach allows us to avoid the limitations and potential inaccuracies associated with inequality estimations. To strengthen the privacy guarantee, we improve the lower bound by utilizing hypothesis testing instead of relying on rough estimations like the Chernoff bound or Hoeffding's inequality. Furthermore, extensive comparative evaluations clearly show that our approach outperforms existing methods in achieving strong central privacy guarantees while preserving the utility of the global model. We have also carefully designed corresponding algorithms for the average function, frequency estimation, and stochastic gradient descent.

Introduction

The shuffle model (Bittau2017prochlo) is a state-of-the-art technique for balancing privacy and utility in differentially private data analysis. In traditional differential privacy, a trusted server (or aggregator) is often assumed to collect all users' data before privacy-preserving data analysis (Dwork2014algorithmic). However, such approaches may not be feasible or practical in scenarios where a trusted curator does not exist. Given this, Local Differential Privacy (LDP) (KS11) has been proposed to achieve differential privacy by allowing users to add noise individually; however, LDP suffers from low utility due to the accumulated noise. To address this, the shuffle model of differential privacy (shuffle DP) (Bittau2017prochlo; balle2019privacy; Erlingsson2019amplification) adds a shuffler between the users and the server to randomly shuffle the noisy data before sending it to the server. The shuffle DP has an intriguing theoretical privacy amplification effect: a small amount of local noise can yield a strong privacy guarantee against the untrusted server. Extensive studies (balle2019privacy; Erlingsson2019amplification; Girgis2021renyi; Feldman2022hiding; liu2021flame; GDDSK21federated) have been devoted to proving better (tighter) privacy amplification in the shuffle DP.

However, most existing studies have focused on the shuffle model based on an $(\epsilon_0,\delta_0)$-LDP randomizer with uniform and restricted settings of the local privacy parameters $\epsilon_0$ and $\delta_0$. For example, Erlingsson2019amplification assumes $0<\epsilon_0<1/2$ and $\delta_0=0$. Although a recent work (Liu2023) provides a privacy bound with a local personalized privacy parameter $\epsilon_i$ for each user $i$ (and a fixed $\delta_0$), the bound is relatively loose and leaves substantial room for improvement. To address this problem, we make the following contributions.

Firstly, we propose a Generalized Shuffle framework for Privacy Amplification (GSPA) that allows arbitrary local privacy parameters, and we provide a new privacy amplification analysis. Our analysis technique benefits from the adoption of Functional Differential Privacy (Dong2022gaussian) and a careful analysis of the distance between two multinomial distributions (see Theorems 1 and 2). For both uniform and personalized privacy parameter settings, we provide privacy bounds that are tighter than existing results (see Figure 2).

Secondly, we apply GSPA with different personalized privacy parameter settings to diverse privacy-preserving analysis tasks, including private mean estimation, private frequency estimation, and DP-SGD, to demonstrate the effectiveness of our approach. For mean and frequency estimation with GSPA (see Figure 3), utility decreases linearly as the fraction of conservative users grows, and increases linearly as the privacy parameters of conservative users increase. For DP-SGD with GSPA (see Figure 4), there is an interesting phenomenon: although the constant scenario ($\epsilon_0=0.5$) offers stronger privacy protection, its test accuracy (94.8%) is higher than that (93.5%) of the scenario $U(0.01,2)$, which has varying local privacy parameters.

Preliminaries

This section presents the definitions and tools necessary for understanding the shuffle model; they form the basis of our approach.

Definition 1

(Differential Privacy) A randomized algorithm $\mathcal{R}$ satisfies $(\epsilon,\delta)$-differential privacy, denoted as $(\epsilon,\delta)$-DP, if for all $\mathcal{S}\subseteq\mathrm{Range}(\mathcal{R})$ and for all neighboring databases $D_{0},D_{1}$ ($D_{0}$ can be obtained from $D_{1}$ by replacing exactly one record):

$\mathbb{P}(\mathcal{R}(D_{0})\in\mathcal{S})\leq e^{\epsilon}\mathbb{P}(\mathcal{R}(D_{1})\in\mathcal{S})+\delta.$ (1)

$\epsilon$ is known as the privacy budget, while $\delta$ is referred to as the indistinguishability parameter, which describes the probability that the privacy leakage exceeds $\epsilon$. Both $\epsilon$ and $\delta$ should be as small as possible; smaller values indicate stronger privacy protection.

Definition 2

(Local Differential Privacy) A randomized algorithm $\mathcal{R}:\mathcal{D}\rightarrow\mathcal{S}$ satisfies $(\epsilon,\delta)$-local differential privacy, denoted as $(\epsilon,\delta)$-LDP, if for all pairs $x,x^{\prime}\in\mathcal{D}$, $\mathcal{R}(x)$ and $\mathcal{R}(x^{\prime})$ satisfy

$\mathbb{P}(\mathcal{R}(x)\in\mathcal{S})\leq e^{\epsilon}\mathbb{P}(\mathcal{R}(x^{\prime})\in\mathcal{S})+\delta.$ (2)

In Local Differential Privacy (LDP), each data contributor applies a local randomization mechanism to perturb their own data before sharing it with a central aggregator.


Figure 1: The Generalized Shuffle framework for Privacy Amplification (GSPA). Privacy parameters $(\epsilon_{i},\delta_{i})$ and each client's output are shuffled separately. The random permutation of the shuffler is unknown to anyone except the shuffler itself. The type of query, whether non-adaptive or adaptive, depends on whether the next query depends on the previous output.

Privacy Tools

Differential privacy can be regarded as a hypothesis testing problem for a given pair of distributions (Kairouz2015composition). In brief, we consider a hypothesis testing problem with two hypotheses:

$H_{0}$: The underlying dataset is $D_{0}$,
$H_{1}$: The underlying dataset is $D_{1}$.

To provide an intuitive explanation, we use the name Bob to denote the single individual present in $D_{0}$ but absent in $D_{1}$. Consequently, rejecting the null hypothesis corresponds to detecting Bob's absence, whereas accepting the null hypothesis corresponds to concluding that Bob is present in the dataset.

Inspired by this, an effective tool called $f$-DP (Dong2022gaussian) has been introduced, which utilizes hypothesis testing to handle differential privacy. For two neighbouring databases $D_{0}$ and $D_{1}$, let $U$ and $V$ denote the probability distributions of $\mathcal{R}(D_{0})$ and $\mathcal{R}(D_{1})$, respectively. We consider a rejection rule $0\leq\phi\leq 1$, with type I and type II error rates defined as

$\alpha_{\phi}=\mathbb{E}_{U}[\phi],\quad\beta_{\phi}=1-\mathbb{E}_{V}[\phi].$ (3)

It is well-known that

$\alpha_{\phi}+\beta_{\phi}\geq 1-TV(U,V),$ (4)

where $TV(U,V)$ is the supremum of $|U(A)-V(A)|$ over all measurable sets $A$. To characterize the fine-grained trade-off between the two errors, Table 1 summarizes the relationship between them.

                  | Actual True              | Actual False
Accept Hypothesis | Correct                  | Type II Error ($\beta$)
Reject Hypothesis | Type I Error ($\alpha$)  | Correct
Table 1: Table of Error Types

For any two probability distributions $U$ and $V$ on the same space $\Omega$, the trade-off function $T(U,V):[0,1]\rightarrow[0,1]$ is defined as

$T(U,V)(\alpha)=\inf\{\beta_{\phi}:\alpha_{\phi}\leq\alpha\},$ (5)

where the infimum is taken over all measurable rejection rules $\phi$, with $\alpha_{\phi}=\mathbb{E}_{U}[\phi]$ and $\beta_{\phi}=1-\mathbb{E}_{V}[\phi]$.
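As a concrete example, for two unit-variance Gaussians the Neyman-Pearson lemma gives the closed-form trade-off $T(N(0,1),N(\mu,1))(\alpha)=\Phi(\Phi^{-1}(1-\alpha)-\mu)$ (see Fact 3 in the Appendix). The following minimal Python sketch evaluates this curve; the sample point $\alpha=0.05$ is chosen purely for illustration.

```python
from scipy.stats import norm

def gaussian_tradeoff(alpha, mu):
    # T(N(0,1), N(mu,1))(alpha) = Phi(Phi^{-1}(1 - alpha) - mu)
    return norm.cdf(norm.ppf(1 - alpha) - mu)

# A smaller type II error at the same alpha means the two
# output distributions are easier to distinguish (weaker privacy).
print(gaussian_tradeoff(0.05, mu=1.0))
```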

Definition 3

(Functional Differential Privacy, $f$-DP) Let $f$ be a trade-off function. A mechanism $\mathcal{R}$ is said to be $f$-DP if

$T(\mathcal{R}(D_{0}),\mathcal{R}(D_{1}))\geq f,$ (6)

for all neighboring datasets $D_{0}$ and $D_{1}$.

To enhance readability, we include the introduction and relevant properties of $f$-DP in the Appendix. It is worth noting that traditional DP is a special case of $f$-DP; hence $f$-DP has a wider scope of applicability.

In addition, the Laplace mechanism and the Gaussian mechanism are two common approaches in differential privacy (Dwork2014algorithmic). The choice between them depends on the data type, privacy requirements, and query tasks: the Laplace mechanism provides pure $\epsilon$-DP but may introduce larger errors, while the Gaussian mechanism offers only $(\epsilon,\delta)$-DP yet is often more suitable when accurate results are required. Thus, it is important to strike a balance between privacy and accuracy based on specific requirements.

Definition 4 (Laplace Mechanism)

Given a query function $Q:D\rightarrow\mathbb{R}^{d}$, privacy parameter $\epsilon$ and $\ell_{1}$ sensitivity $\Delta(Q)=\max\|Q(D)-Q(D^{\prime})\|_{1}$, then for any two neighbouring datasets $D,D^{\prime}$, the Laplace mechanism

$M(D,Q)=Q(D)+\mathrm{Lap}\left(\frac{\Delta(Q)}{\epsilon}\right)$ (7)

preserves $\epsilon$-DP, where $\mathrm{Lap}(\lambda)$ denotes centralized Laplace noise with scale parameter $\lambda$.

In the absence of ambiguity, we express queries and answers as $Q(\cdot)$ and $A(\cdot)$, respectively.

Definition 5 (Gaussian Mechanism)

Given a query function $Q:D\rightarrow\mathbb{R}^{d}$, privacy parameter $\epsilon$ and $\ell_{2}$ sensitivity $\Delta_{2}(Q)=\max\|Q(D)-Q(D^{\prime})\|_{2}$, then for any two neighbouring datasets $D,D^{\prime}$, the Gaussian mechanism

$M(D,Q)=Q(D)+N\left(0,\frac{2\log(1.25/\delta)\Delta_{2}^{2}(Q)}{\epsilon^{2}}\right)$ (8)

preserves $(\epsilon,\delta)$-DP, where $N(\mu,\sigma^{2})$ denotes Gaussian noise with mean $\mu$ and variance $\sigma^{2}$.
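To make Definitions 4 and 5 concrete, here is a minimal Python sketch of both mechanisms for a scalar query; the query value and the unit sensitivity used in the example are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(answer, sensitivity, epsilon):
    # epsilon-DP: Laplace noise with scale Delta(Q)/epsilon (Definition 4).
    return answer + rng.laplace(scale=sensitivity / epsilon)

def gaussian_mechanism(answer, l2_sensitivity, epsilon, delta):
    # (epsilon, delta)-DP: Gaussian noise with the variance from Definition 5.
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * l2_sensitivity / epsilon
    return answer + rng.normal(scale=sigma)

# Example: a counting query with sensitivity 1.
print(laplace_mechanism(42.0, 1.0, epsilon=0.5))
print(gaussian_mechanism(42.0, 1.0, epsilon=0.5, delta=1e-5))
```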

Privacy Analysis of GSPA Framework

Our Generalized Shuffle framework for Privacy Amplification (GSPA) consists of three main components: local randomizers, a trustworthy shuffler, and an aggregator, as in existing shuffle DP frameworks; however, GSPA allows local randomizers with arbitrary privacy parameters. (i) For $n$ users, the local randomizer $M_{i}$ adds noise to the original data $x_{i}$ on the $i$-th user's device, thus providing $(\epsilon^{\ell}_{i},\delta^{\ell}_{i})$-PLDP for user $i$. (ii) The shuffler randomly permutes the order of the data elements, ensuring that the resulting arrangement is unknown to any party other than the shuffler itself. (iii) The aggregator collects and integrates the shuffled data for simple queries, while for complex tasks like machine learning, it trains models on shuffled data over multiple iterations. Where no confusion arises, the notation $(\epsilon_{0},\delta_{0})$-LDP represents the uniform scenario, while $(\epsilon_{i},\delta_{i})$-PLDP denotes the personalized scenario. A minimal sketch of this pipeline appears below.
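The sketch instantiates the three components for a non-adaptive mean query, using Laplace randomizers so that user $i$ satisfies $(\epsilon_{i},0)$-PLDP; the unit sensitivity is an assumption made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def gspa_pipeline(data, epsilons, sensitivity=1.0):
    # (i) Local randomizers: user i perturbs x_i with noise scaled to eps_i.
    noisy = np.asarray(data, float) + rng.laplace(
        scale=sensitivity / np.asarray(epsilons, float))
    # (ii) Shuffler: a uniformly random permutation severs the link
    #      between reports and user identities.
    shuffled = rng.permutation(noisy)
    # (iii) Aggregator: answers a simple query on the shuffled reports.
    return shuffled.mean()

print(gspa_pipeline([0.2, 0.7, 0.4], epsilons=[0.1, 0.5, 1.0]))
```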

Privacy Amplification Effect

In this section, we address the issue of privacy protection in the context of a general shuffled adaptive process for personalized local randomizers.

Definition 6

For a domain $\mathcal{D}$, let $\mathcal{R}^{(i)}:\mathcal{S}^{(1)}\times\mathcal{S}^{(2)}\times\cdots\times\mathcal{S}^{(i-1)}\times\mathcal{D}\rightarrow\mathcal{S}^{(i)}$ for $i\in[n]$, where $\mathcal{S}^{(i)}$ is the range space of $\mathcal{R}^{(i)}$, be a sequence of algorithms such that $\mathcal{R}^{(i)}(z_{1:i-1},\cdot)$ is an $(\epsilon_{i},\delta_{i})$-PLDP randomizer for all values of the auxiliary inputs $z_{1:i-1}\in\mathcal{S}^{(1)}\times\mathcal{S}^{(2)}\times\cdots\times\mathcal{S}^{(i-1)}$. Let $\mathcal{A}_{R}:\mathcal{D}^{n}\rightarrow\mathcal{S}^{(1)}\times\mathcal{S}^{(2)}\times\cdots\times\mathcal{S}^{(n)}$ be the algorithm that, given a dataset $x_{1:n}\in\mathcal{D}^{n}$, sequentially computes $z_{i}=\mathcal{R}^{(i)}(z_{1:i-1},x_{i})$ for $i\in[n]$ and outputs $z_{1:n}$. We say $\mathcal{A}_{R}(\mathcal{D})$ is a personalized LDP (PLDP) adaptive process. Similarly, if we first sample a permutation $\pi$ uniformly at random, then sequentially compute $z_{i}=\mathcal{R}^{(i)}(z_{1:i-1},x_{\pi_{i}})$ for $i\in[n]$ and output $z_{1:n}$, we say this process is shuffled PLDP adaptive and denote it by $\mathcal{A}_{R,S}(\mathcal{D})$.

Lemma 1

Given an $(\epsilon_{i},\delta_{i})$-PLDP adaptive process, in the $i$-th step the local randomizer is $\mathcal{R}^{(i)}:\mathcal{D}\rightarrow\mathcal{S}$, and for any $n+1$ inputs $x_{1}^{0},x_{1}^{1},x_{2},\cdots,x_{n}\in\mathcal{D}$, there exist distributions $\mathcal{Q}_{1}^{0},\mathcal{Q}_{1}^{1},\mathcal{Q}_{1},\mathcal{Q}_{2},\cdots,\mathcal{Q}_{n}$ such that

$\mathcal{R}^{(i)}(x_{1}^{0})=\frac{(1-\delta_{i})e^{\epsilon_{i}}}{1+e^{\epsilon_{i}}}\mathcal{Q}_{1}^{0}+\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}\mathcal{Q}_{1}^{1}+\delta_{i}\mathcal{Q}_{1},$ (9)
$\mathcal{R}^{(i)}(x_{1}^{1})=\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}\mathcal{Q}_{1}^{0}+\frac{(1-\delta_{i})e^{\epsilon_{i}}}{1+e^{\epsilon_{i}}}\mathcal{Q}_{1}^{1}+\delta_{i}\mathcal{Q}_{1}.$ (10)

For all $x_{i}\in\{x_{2},\cdots,x_{n}\}$,

$\mathcal{R}(x_{i})=\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}\mathcal{Q}^{0}_{1}+\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}\mathcal{Q}^{1}_{1}+\left(1-\frac{2(1-\delta_{i})}{1+e^{\epsilon_{i}}}\right)\mathcal{Q}_{i}.$ (11)
Proof:

For inputs $X_{0}=\{x_{1}^{0},x_{2},\ldots,x_{n}\}$ and $X_{1}=\{x^{1}_{1},x_{2},\ldots,x_{n}\}$, $\mathcal{R}^{(i)}$ satisfies the constraints of Lemma 4, so there exist an $(\epsilon_{i},\delta_{i})$-PLDP local randomizer $\mathcal{R}^{\prime(i)}:\mathcal{D}\rightarrow\mathcal{Z}$ for the $i$-th output and a post-processing function $proc(\cdot)$ such that $proc(\mathcal{R}^{\prime(i)}(x))=\mathcal{R}^{(i)}(x)$, and

$P(\mathcal{R}^{\prime(i)}(x_{1}^{0})=z)=\begin{cases}0 & \text{if } z=A,\\ \frac{(1-\delta_{i})e^{\epsilon_{i}}}{1+e^{\epsilon_{i}}} & \text{if } z=0,\\ \frac{1-\delta_{i}}{1+e^{\epsilon_{i}}} & \text{if } z=1,\\ \delta_{i} & \text{if } z=B.\end{cases}$

$P(\mathcal{R}^{\prime(i)}(x_{1}^{1})=z)=\begin{cases}\delta_{i} & \text{if } z=A,\\ \frac{1-\delta_{i}}{1+e^{\epsilon_{i}}} & \text{if } z=0,\\ \frac{(1-\delta_{i})e^{\epsilon_{i}}}{1+e^{\epsilon_{i}}} & \text{if } z=1,\\ 0 & \text{if } z=B.\end{cases}$

Let $L=\{z\in\mathcal{Z}\,|\,\mathbb{P}(\mathcal{R}^{\prime}(x_{1}^{0})=z)=\frac{(1-\delta_{i})e^{\epsilon_{i}}}{1+e^{\epsilon_{i}}}$ and $\mathbb{P}(\mathcal{R}^{\prime}(x_{1}^{1})=z)=\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}\}$, and $U=\{z\in\mathcal{Z}\,|\,\mathbb{P}(\mathcal{R}^{\prime}(x_{1}^{0})=z)=\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}$ and $\mathbb{P}(\mathcal{R}^{\prime}(x_{1}^{1})=z)=\frac{(1-\delta_{i})e^{\epsilon_{i}}}{1+e^{\epsilon_{i}}}\}$. Let $M=\mathcal{Z}\setminus(L\cup U)$ and $p=\sum_{z\in L}p_{z}=\sum_{z\in U}p_{z}$. Conditioned on the output lying in $L$, the distributions of $\mathcal{R}^{\prime}(x_{1}^{0})$ and $\mathcal{R}^{\prime}(x_{1}^{1})$ are the same, and analogously on $U$ and $M$. Let $\mathcal{W}_{1}^{0}=\mathcal{R}^{\prime}(x_{1}^{0})|L=\mathcal{R}^{\prime}(x_{1}^{1})|L$, $\mathcal{W}^{1}_{1}=\mathcal{R}^{\prime}(x_{1}^{0})|U=\mathcal{R}^{\prime}(x_{1}^{1})|U$ and $\mathcal{W}_{1}=\mathcal{R}^{\prime}(x_{1}^{0})|M=\mathcal{R}^{\prime}(x_{1}^{1})|M$. Then

$\mathcal{R}^{\prime}(x_{1}^{0})=\frac{(1-\delta_{i})e^{\epsilon_{i}}}{1+e^{\epsilon_{i}}}\mathcal{W}_{1}^{0}+\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}\mathcal{W}_{1}^{1}+\delta_{i}\mathcal{W}_{1},$
$\mathcal{R}^{\prime}(x_{1}^{1})=\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}\mathcal{W}_{1}^{0}+\frac{(1-\delta_{i})e^{\epsilon_{i}}}{1+e^{\epsilon_{i}}}\mathcal{W}_{1}^{1}+\delta_{i}\mathcal{W}_{1}.$

Further, for all $x_{i}\in\{x_{2},\cdots,x_{n}\}$,

$\mathcal{R}^{\prime}(x_{i})=\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}\mathcal{W}_{1}^{0}+\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}\mathcal{W}_{1}^{1}+\left(1-\frac{2(1-\delta_{i})}{1+e^{\epsilon_{i}}}\right)\mathcal{W}_{i}.$

Letting $\mathcal{Q}_{1}^{0}=proc(\mathcal{W}_{1}^{0})$, $\mathcal{Q}_{1}^{1}=proc(\mathcal{W}_{1}^{1})$, $\mathcal{Q}_{1}=proc(\mathcal{W}_{1})$, and $\mathcal{Q}_{i}=proc(\mathcal{W}_{i})$ for all $i\in\{2,\cdots,n\}$ completes the proof. $\square$

Theorem 1

For a domain $\mathcal{D}$, if $\mathcal{A}_{R,S}(\mathcal{D})$ is a shuffled PLDP adaptive process, then for arbitrary two neighboring datasets $D_{0},D_{1}\in\mathcal{D}^{n}$ differing in the $n$-th data point, there exists a post-processing function $proc(\cdot):(0,1,2)\rightarrow\mathcal{S}^{(1)}\times\mathcal{S}^{(2)}\times\cdots\times\mathcal{S}^{(n)}$ such that

$T(\mathcal{A}_{R,S}(D_{0}),\mathcal{A}_{R,S}(D_{1}))=T(proc(\rho_{0}),proc(\rho_{1})).$

Here,

$\rho_{0}=(\Delta_{0},\Delta_{1},\Delta_{2})+\boldsymbol{V},$ (12)
$\rho_{1}=(\Delta_{1},\Delta_{0},\Delta_{2})+\boldsymbol{V},$ (13)

where $\Delta_{2}\sim Bern(\delta_{n})$, $\Delta_{0}\sim Bin(1-\Delta_{2},\frac{e^{\epsilon_{n}}}{1+e^{\epsilon_{n}}})$, and $\Delta_{1}=1-\Delta_{0}-\Delta_{2}$; here $Bern(p)$ denotes a Bernoulli random variable with bias $p$ and $Bin(n,p)$ denotes a Binomial distribution with $n$ trials and success probability $p$. In addition, $\boldsymbol{V}=\sum_{i=1}^{n-1}MultiBern\left(\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}},\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}},1-\frac{2(1-\delta_{i})}{1+e^{\epsilon_{i}}}\right)$, where $MultiBern(\theta_{1},\cdots,\theta_{d})$ represents a $d$-dimensional Bernoulli distribution with $\sum_{j=1}^{d}\theta_{j}=1$.

To enhance readability, proof details are placed in the Appendix. Based on Theorem 1, we can simplify the original problem by analyzing the shuffling process in a simple non-adaptive protocol.
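To build intuition for the reduction, the following Monte Carlo sketch samples the dominating pair $(\rho_{0},\rho_{1})$ from equations (12) and (13); it is an illustration under the stated distributions, not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_rho(b, eps, delta):
    """One draw from rho_b in Theorem 1 (b = 0 or 1), as counts of (0s, 1s, others)."""
    eps = np.asarray(eps, float)
    delta = np.asarray(delta, float)
    # Differing n-th user: (Delta_0, Delta_1, Delta_2).
    if rng.random() < delta[-1]:
        d = np.array([0, 0, 1])
    elif rng.random() < np.exp(eps[-1]) / (1 + np.exp(eps[-1])):
        d = np.array([1, 0, 0])
    else:
        d = np.array([0, 1, 0])
    if b == 1:
        d = d[[1, 0, 2]]          # rho_1 swaps the roles of Delta_0 and Delta_1
    # V: each of the other n-1 users emits 0, 1, or "other" w.p. (p_i, p_i, 1-2p_i).
    p = (1 - delta[:-1]) / (1 + np.exp(eps[:-1]))
    u = rng.random(len(p))
    v0 = int((u < p).sum())
    v1 = int(((u >= p) & (u < 2 * p)).sum())
    return d + np.array([v0, v1, len(p) - v0 - v1])

print(sample_rho(0, eps=[0.5] * 1000, delta=[1e-5] * 1000))
```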

The primary objective in what follows is to bound the distance between the two distributions. The Berry-Esseen lemma (berry1941accuracy; Esseen1942) is highly valuable and essential for proving asymptotic properties.

Lemma 2 (Berry-Esseen)

Let $P=(\xi_{0},\xi_{1},\xi_{2})\sim\sum_{i=1}^{m}MultiBern\left(\frac{p_{i}}{2},\frac{p_{i}}{2},1-p_{i}\right)$ and $Q\sim N(\mu,\Sigma)$, where $\mu=\mathbb{E}(P)$ and $\Sigma=Var(P)$. Then for the first two components $(\xi_{0},\xi_{1})$, there exists $C>0$ such that $\|\tilde{P}-\tilde{Q}\|_{TV}\leq\frac{C}{\sqrt{m}}$, where $\tilde{P}$ and $\tilde{Q}$ denote the distribution of $(\xi_{0},\xi_{1})$ and the corresponding normal distribution, respectively.

In fact, for given $n$, we can obtain a sharper bound on $\|\tilde{P}-\tilde{Q}\|_{TV}$ by numerical methods. Without loss of generality, assume $\epsilon_{i}=\epsilon_{0}$ and $\delta_{i}=\delta_{0}=O(1/n)$, so that $p_{0}=\frac{1-\delta_{0}}{1+e^{\epsilon_{0}}}$. For a fixed output $(\xi_{0},\xi_{1})=(k_{0},k_{1})$, we approximate its probability by integrating the normal probability density function over the unit cell around that point. Let $G(\cdot)$ be the cumulative distribution function of $\tilde{Q}$ and $h(k_{0},k_{1})=G(k_{0}+0.5,k_{1}+0.5)-G(k_{0}+0.5,k_{1}-0.5)-G(k_{0}-0.5,k_{1}+0.5)+G(k_{0}-0.5,k_{1}-0.5)$; then

$\|\tilde{P}-\tilde{Q}\|_{TV}=\sup_{(k_{0},k_{1})}|\mathbb{P}(\xi_{0}=k_{0},\xi_{1}=k_{1})-h(k_{0},k_{1})|.$ (14)
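Under the uniform assumption above, $(\xi_{0},\xi_{1},\xi_{2})$ is multinomial, so equation (14) can be evaluated directly. The following sketch computes the supremum for small $m$; its runtime grows quadratically in $m$, so it is intended for sanity checks rather than large-scale use.

```python
import numpy as np
from scipy.stats import multinomial, multivariate_normal

def tv_pointwise(m, p):
    """Evaluate eq. (14) for the sum of m iid MultiBern(p/2, p/2, 1-p) draws."""
    mean = np.array([m * p / 2, m * p / 2])
    cov = np.array([[m * p / 2 * (1 - p / 2), -m * p * p / 4],
                    [-m * p * p / 4, m * p / 2 * (1 - p / 2)]])
    mvn = multivariate_normal(mean, cov)
    G = lambda x, y: mvn.cdf(np.array([x, y]))
    worst = 0.0
    for k0 in range(m + 1):
        for k1 in range(m + 1 - k0):
            pmf = multinomial.pmf([k0, k1, m - k0 - k1], m, [p / 2, p / 2, 1 - p])
            h = (G(k0 + .5, k1 + .5) - G(k0 + .5, k1 - .5)
                 - G(k0 - .5, k1 + .5) + G(k0 - .5, k1 - .5))
            worst = max(worst, abs(pmf - h))
    return worst

print(tv_pointwise(m=50, p=0.6))
```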
Lemma 3

Let $p_{i}=\frac{2(1-\delta_{i})}{1+e^{\epsilon_{i}}}$. If $\bar{\mu}=\sum_{i=1}^{n-1}(\frac{p_{i}}{2},\frac{p_{i}}{2})^{\prime}$, $\mu_{0}=(1,0)^{\prime}+\bar{\mu}$ and $\mu_{1}=(0,1)^{\prime}+\bar{\mu}$, then $T(N(\mu_{0},\boldsymbol{\Sigma}),N(\mu_{1},\boldsymbol{\Sigma}))(\alpha)=\Phi\left(\Phi^{-1}(1-\alpha)-\frac{2}{\sqrt{\sum_{i=1}^{n-1}p_{i}}}\right)$, where

$\boldsymbol{\Sigma}=\sum_{i=1}^{n-1}\begin{pmatrix}\frac{p_{i}}{2}(1-\frac{p_{i}}{2})&-\frac{p_{i}^{2}}{4}\\ -\frac{p_{i}^{2}}{4}&\frac{p_{i}}{2}(1-\frac{p_{i}}{2})\end{pmatrix}.$
Theorem 2 (Enhanced Central Privacy Upper bound)

Assume $\rho_{0}$ and $\rho_{1}$ are defined as in equations (12) and (13). Then there exists $C>0$ such that

$T(\rho_{0},\rho_{1})\geq G_{\mu}\left(\alpha+\frac{C}{\sqrt{n-1}}\right)-\frac{C}{\sqrt{n-1}},$ (15)

where $G_{\mu}(\alpha)=\Phi(\Phi^{-1}(1-\alpha)-\mu)$ and $\mu=\sqrt{\frac{2}{\sum_{i=1}^{n}\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}-\max_{i}\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}}}$. When no ambiguity arises, we say the process approximately satisfies $\mu$-GDP.

Proof:

First, we analyze the scenario where the $n$-th data point differs. According to the definition of $(\Delta_{0},\Delta_{1},\Delta_{2})$,

$(\Delta_{0},\Delta_{1},\Delta_{2})=\begin{cases}(0,0,1)&\text{w.p. }\delta_{n};\\(1,0,0)&\text{w.p. }(1-\delta_{n})\frac{e^{\epsilon_{n}}}{1+e^{\epsilon_{n}}};\\(0,1,0)&\text{w.p. }(1-\delta_{n})\frac{1}{1+e^{\epsilon_{n}}}.\end{cases}$ (16)

When $\Delta_{2}=1$, $\rho_{0}$ and $\rho_{1}$ are indistinguishable, which implies $T(\rho_{0},\rho_{1})|_{\Delta_{2}=1}=1-\alpha$. Let $\rho^{\prime}_{0}=(1,0,0)^{\prime}+\sum_{i=1}^{n-1}MultiBern\left(p_{i}/2,p_{i}/2,1-p_{i}\right)$ and $\rho^{\prime}_{1}=(0,1,0)^{\prime}+\sum_{i=1}^{n-1}MultiBern\left(p_{i}/2,p_{i}/2,1-p_{i}\right)$ with $p_{i}=\frac{2(1-\delta_{i})}{1+e^{\epsilon_{i}}}$; then

$T(\rho_{0},\rho_{1})=\delta_{n}(1-\alpha)+(1-\delta_{n})T_{symm}(\rho^{\prime}_{0},\rho^{\prime}_{1}),$ (17)

where $T_{symm}(\rho^{\prime}_{0},\rho^{\prime}_{1})=\max\{T(\rho^{\prime}_{0},\rho^{\prime}_{1}),T(\rho^{\prime}_{1},\rho^{\prime}_{0})\}$. Assume $P\sim N(\mu_{0},\Sigma)$ and $Q\sim N(\mu_{1},\Sigma)$, where $\mu_{0},\mu_{1},\Sigma$ are the same as in Lemma 3. Let $\mu=\sqrt{\frac{2}{\sum_{i=1}^{n-1}\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}}}$; according to equation (4),

$T(\rho^{\prime}_{0},P)\geq 1-\alpha-\|\rho^{\prime}_{0}-P\|_{TV},$
$T(\rho^{\prime}_{1},Q)\geq 1-\alpha-\|\rho^{\prime}_{1}-Q\|_{TV},$

then based on Fact 4,

$T(\rho^{\prime}_{0},Q)\geq\Phi\left(\Phi^{-1}(1-\alpha-\|\rho^{\prime}_{0}-P\|_{TV})-\mu\right)=F(\alpha).$

Reusing Fact 4, we can obtain that

$T(\rho^{\prime}_{0},\rho^{\prime}_{1})\geq 1-(1-F(\alpha))-\|\rho^{\prime}_{1}-Q\|_{TV}=F(\alpha)-\|\rho^{\prime}_{1}-Q\|_{TV}.$

Lemma 2 shows that there exists $C>0$ such that $\|\rho^{\prime}_{1}-Q\|_{TV}\leq\frac{C}{\sqrt{n-1}}$ and $\|\rho^{\prime}_{0}-P\|_{TV}\leq\frac{C}{\sqrt{n-1}}$. Hence

$T(\rho^{\prime}_{0},\rho^{\prime}_{1})\geq G_{\mu}\left(\alpha+\frac{C}{\sqrt{n-1}}\right)-\frac{C}{\sqrt{n-1}}.$

Then

$T(\rho_{0},\rho_{1})\geq\delta_{n}(1-\alpha)+(1-\delta_{n})\left(G_{\mu}\left(\alpha+\frac{C}{\sqrt{n-1}}\right)-\frac{C}{\sqrt{n-1}}\right).$

Since any trade-off function $f$ satisfies $f(\alpha)\leq 1-\alpha$, it follows that

$T(\rho_{0},\rho_{1})\geq G_{\mu}\left(\alpha+\frac{C}{\sqrt{n-1}}\right)-\frac{C}{\sqrt{n-1}}.$

Finally, taking into account the case where the $i$-th ($1\leq i\leq n$) data point differs in the neighboring datasets, the privacy bound is determined by the worst case, that is, $\mu=\sqrt{\frac{2}{\sum_{i=1}^{n}\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}-\max_{i}\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}}}$. $\square$

Comparison with Existing Results

We provide numerical evaluations of the privacy amplification effect under the fixed LDP settings in Table 2, given local privacy budgets $\epsilon^{\ell}\in[0.01,2]$. For comparison, we examine privacy amplification for a fixed $\epsilon^{\ell}$ while varying $n$ from $10^{3}$ to $10^{4}$, setting the central $\delta$ for shuffling to $10^{-4}$ for simplicity. To avoid misunderstanding, we repeat the first $10^{3}$ parameters as $n$ grows. Since the convergence rate in Lemma 2 is nearly $O(1/n)$ and can be neglected in numerical analysis, our focus lies in measuring $G_{\mu}$.

Name           | Distribution of $\epsilon^{\ell}=(\epsilon_{1}^{\ell},\cdots,\epsilon_{n}^{\ell})$
Unif 1         | $U(0.01,1)$
Unif 2         | $U(0.01,2)$
Constant       | $0.5$
Mixed Constant | $50\%~0.5+50\%~0.01$
Table 2: Distributions of LDP budgets $\epsilon^{\ell}$. $U(a,b)$ represents the uniform distribution ranging from $a$ to $b$.

To keep it concise, we use Fact 3 in the Appendix to compute the corresponding central $\epsilon$ and $\delta$ for Theorem 2. Baseline bounds for the privacy amplification effect include [Liu23] (Liu2023), [FMT22] (Feldman2022hiding), and [Erlingsson19] (Erlingsson2019amplification). [Liu23] provides bounds for the personalized scenario, while [FMT22] and [Erlingsson19] only consider a common $\epsilon^{\ell}$.
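For reproducibility, the sketch below computes $\mu$ from Theorem 2 and converts it to a central $(\epsilon,\delta)$ pair via Fact 3; the Unif 2 budgets are drawn as in Table 2, and the evaluation point $\epsilon=1$ is an assumption for the example.

```python
import numpy as np
from scipy.stats import norm

def gdp_mu(eps, delta):
    # mu of Theorem 2 for personalized (eps_i, delta_i).
    eps, delta = np.asarray(eps, float), np.asarray(delta, float)
    w = (1 - delta) / (1 + np.exp(eps))
    return np.sqrt(2.0 / (w.sum() - w.max()))

def central_delta(epsilon, mu):
    # delta(epsilon) for a mu-GDP mechanism (Fact 3).
    return (norm.cdf(-epsilon / mu + mu / 2)
            - np.exp(epsilon) * norm.cdf(-epsilon / mu - mu / 2))

# Example: n = 10^4 users with budgets drawn as in "Unif 2" of Table 2.
rng = np.random.default_rng(0)
eps = rng.uniform(0.01, 2.0, size=10_000)
mu = gdp_mu(eps, np.zeros_like(eps))
print(mu, central_delta(1.0, mu))
```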

The numerical results demonstrate the following: (i) Our bound applies to extreme privacy budgets, while [Liu23] requires that no $\epsilon_{i}$ be close to zero; however, it is natural to encounter user responses that contain no information, i.e., $\epsilon_{i}=0$. (ii) As the sample size $n$ increases, the amplification effect grows proportionally to the square root of $n$. (iii) Our privacy bounds significantly outperform existing ones in all the scenarios considered, even when the privacy parameters are identical.

Figure 2: Privacy Bounds for Varied Budgets

Application and Experiments

All the experiments are implemented on a workstation with an Intel Core i5-1155G7 processor running Windows 11.

Application to Mean and Frequency Estimation

Mean Estimation

The average function is a fundamental and commonly used mathematical operation with wide-ranging applications. In this section, we apply GSPA to the average function on synthetic data. We randomly divide the users into three groups: conservative, moderate, and liberal, with fractions determined by $f_{c},f_{m},f_{l}$. As reported in (Acquisti2005privacy), the default values in this experiment are $f_{c}=0.54$, $f_{m}=0.37$, $f_{l}=0.09$. For convenience, the privacy preferences of users in the conservative, moderate and liberal groups are $\epsilon_{C}$, $\epsilon_{M}$ and $\epsilon_{L}$, respectively. In the LDP case, the privacy preference of users in the liberal group is fixed at $\epsilon_{L}=1$, while the default values of $\epsilon_{C}$ and $\epsilon_{M}$ are set to $0.1$ and $0.5$, respectively.

Theorem 3

Algorithm 1 approximately preserves $\mu$-GDP for each user, where $\mu=\sqrt{\frac{2}{\sum_{i=1}^{n}\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}-\max_{i}\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}}}$.

Proof:

By the definition of the Laplace mechanism, the report of each user $i\in[n]$ satisfies $(\epsilon_{i},0)$-LDP. Combined with Theorem 2, Algorithm 1 approximately satisfies $\mu$-GDP with $\mu=\sqrt{\frac{2}{\sum_{i=1}^{n}\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}-\max_{i}\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}}}$ (here $\delta_{i}=0$). $\square$

Next, we simulate the accuracy under different privacy settings. To facilitate comparison, we fix $f_{l}=0.09$ and vary $f_{c}$ from $0.01$ to $0.5$ with $f_{m}=1-f_{l}-f_{c}$. Additionally, we generate $n=10{,}000$ privacy budgets for users based on the privacy preference rule. We assume each sample is drawn from a normal distribution $N(50,\sigma^{2})$ and then clipped to the range $[20,80]$. We repeat this procedure $1{,}000$ times to obtain a confidence interval. According to Fact 3, the privacy parameter $\mu$ under the shuffle model can be obtained for varying $\epsilon_{C}$. Figure 3(a) shows that an increase in the proportion of conservative users leads to a decrease in estimation accuracy. On the other hand, Figure 3(b) demonstrates that increasing the privacy budget improves accuracy.

Algorithm 1 Mean estimation with GSPA.
Input: Dataset $X=(x_{1},\ldots,x_{n})\in\mathbb{R}^{n}$, privacy budgets $\mathcal{S}=\{\epsilon_{1},\cdots,\epsilon_{n}\}$ for the users.
Output: $z\in\mathbb{R}$
1: for each $i\in[n]$ do
2:   $y_{i}\leftarrow x_{i}+Lap(\Delta f/\epsilon_{i})$
3: end for
4: Choose a random permutation $\pi:[n]\rightarrow[n]$
5: $z=\frac{1}{n}\sum_{i=1}^{n}y_{\pi(i)}$
6: return $z$
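A runnable Python counterpart of Algorithm 1 is sketched below; the clipping range $[20,80]$ mirrors the experiment, and taking the $\ell_{1}$ sensitivity $\Delta f$ as the width of that range is an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_estimation_gspa(x, eps, lo=20.0, hi=80.0):
    x = np.clip(np.asarray(x, float), lo, hi)
    sens = hi - lo                                   # Delta f for one clipped report
    y = x + rng.laplace(scale=sens / np.asarray(eps, float))
    return rng.permutation(y).mean()                 # shuffle, then average

x = np.clip(rng.normal(50, 10, size=10_000), 20, 80)
eps = rng.choice([0.1, 0.5, 1.0], p=[0.54, 0.37, 0.09], size=10_000)
print(mean_estimation_gspa(x, eps))
```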
Figure 3: Impact of privacy parameter settings on MAE. (a) Impact of $f_{c}$ (Mean); (b) Impact of $\epsilon_{C}$ (Mean); (c) Impact of $f_{c}$ (Frequency); (d) Impact of $\epsilon_{C}$ (Frequency).

Frequency Estimation

In machine learning, frequency estimation is often used as a preprocessing step to understand the distribution and importance of different features or categories within a dataset. By accurately estimating the frequencies of various features or categories, it helps in feature selection, dimensionality reduction, and building effective models.

To obtain the dataset, a total of 10,000 records are generated for counting. Each record is encoded as a binary attribute. The proportion of records with value $1$ is determined by a density parameter $c$, which ranges from $0$ to $1$ (with a default value of $c=0.7$).

Theorem 4

Algorithm 2 approximately preserves $\mu$-GDP for each user, where $\mu=\sqrt{\frac{2}{\sum_{i=1}^{n}\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}-\max_{i}\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}}}$.

The proof of Theorem 4 is the same as that of Theorem 3. A direct calculation shows that $z$ is an unbiased estimator of $c$: since $\mathbb{E}(A)-B=\sum_{i}x_{i}\frac{e^{\epsilon_{i}}-1}{1+e^{\epsilon_{i}}}$ and $n-2B=\sum_{i}\frac{e^{\epsilon_{i}}-1}{1+e^{\epsilon_{i}}}$, taking the expectation over records drawn with density $c$ gives $\mathbb{E}(z)=c$. Similar to the average function, we adopt the same configuration of personalized privacy budgets.

Algorithm 2 Frequency estimation with GSPA
Input: Dataset $X=(x_{1},\ldots,x_{n})\in\{0,1\}^{n}$, privacy budgets $\mathcal{S}=\{\epsilon_{1},\cdots,\epsilon_{n}\}$ for the users.
Output: $z\in\mathbb{R}$
1: for each $i\in[n]$ do
2:   if $x_{i}=1$ then
3:     $y_{i}\leftarrow Ber\left(\frac{e^{\epsilon_{i}}}{1+e^{\epsilon_{i}}}\right)$
4:   else
5:     $y_{i}\leftarrow Ber\left(\frac{1}{1+e^{\epsilon_{i}}}\right)$
6:   end if
7: end for
8: Choose a random permutation $\pi:[n]\rightarrow[n]$
9: $A=\sum_{i=1}^{n}y_{\pi(i)}$
10: $B=\sum_{i=1}^{n}\frac{1}{1+e^{\epsilon_{\pi(i)}}}$
11: $z=\frac{A-B}{n-2B}$
12: return $z$
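Below is a runnable sketch of Algorithm 2: binary randomized response per user, a shuffle, and the debiasing step $z=(A-B)/(n-2B)$. The input data and budgets in the example are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def frequency_estimation_gspa(x, eps):
    x = np.asarray(x, int)
    eps = np.asarray(eps, float)
    p_keep = np.exp(eps) / (1 + np.exp(eps))   # P(report truthfully) per user
    flip = rng.random(len(x)) >= p_keep
    y = np.where(flip, 1 - x, x)               # randomized response
    y = rng.permutation(y)                     # shuffle before aggregation
    A = y.sum()
    B = (1.0 / (1 + np.exp(eps))).sum()        # expected false-positive mass
    return (A - B) / (len(x) - 2 * B)          # debiased estimate of c

x = (rng.random(10_000) < 0.7).astype(int)     # density c = 0.7
eps = rng.choice([0.1, 0.5, 1.0], p=[0.54, 0.37, 0.09], size=10_000)
print(frequency_estimation_gspa(x, eps))
```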

Personalized Private Stochastic Gradient Descent

Private stochastic gradient descent is a common method in deep learning (abadi2016deep). Personalized private stochastic gradient descent combines personalized differential privacy with stochastic gradient descent for model training and parameter updates while ensuring privacy protection. In the context of personalized differential privacy, the privacy of individual users must be protected, and direct use of raw data for parameter updates is not feasible.

The key idea of personalized differential privacy is to introduce personalized parameters into the differentially private mechanism to flexibly adjust the level of privacy protection. For the gradient descent algorithm, personalized differential privacy can be achieved by introducing noise during gradient computation.

Theorem 5

Algorithm 3 approximately satisfies $\sqrt{T}\mu$-GDP with $\mu=\sqrt{\frac{2}{\sum_{i=1}^{n}\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}-\max_{i}\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}}}$.

Proof:

For arbitrary $j\in[m]$, client $j$ satisfies $(\epsilon_{j},\delta_{j})$-LDP before sending its report to the shuffler, by the definition of the Gaussian mechanism. By Theorem 2, the process preserves $\mu$-GDP after shuffling with $\mu=\sqrt{\frac{2}{\sum_{i=1}^{n}\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}-\max_{i}\frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}}}$. Combined with Fact 3, it satisfies $\sqrt{T}\mu$-GDP under $T$-fold composition.

$\square$

Dataset and Implementation

The MNIST dataset (lecun1998gradient) for handwritten digit recognition consists of 60,000 training images and 10,000 test images. Each sample represents a $28\times 28$ vector generated from a handwritten image, where the independent variable corresponds to the input vector and the dependent variable is the digit label ranging from 0 to 9. In our experiments, we consider a scenario with $m$ clients, where each client has $n/m$ samples. For simplicity, we train a simple classifier using a feed-forward neural network with ReLU activation units and a softmax output layer with 10 classes, corresponding to the 10 possible digits. The model is trained using cross-entropy loss with an initial PCA input layer of 60 components. At each step of the shuffled SGD, we choose one client at random without replacement. The parameters of the experimental setup are listed in Table 3. This experiment is designed to demonstrate use cases of the shuffle model and therefore does not focus on comparison with previous results; for comparative results, please refer to Figure 2.

Parameter Selection

As a result, our approach achieves an accuracy of 96.78% on the test dataset after approximately 50 epochs. This result is consistent with the findings of a vanilla neural network (lecun1998gradient) trained on the same MNIST dataset. By employing this methodology, we can effectively train a simple classifier that achieves high accuracy in recognizing handwritten digits from MNIST.

Parameter         | Value       | Explanation
$C$               | $2$         | Clipping bound
$\delta^{\ell}$   | $10^{-5}$   | Indistinguishability parameter
$\epsilon^{\ell}$ | $[0.01,2]$  | Privacy budget
$\eta$            | $0.05$      | Step size of the gradient
$m$               | $100$       | Number of clients
$n$               | $60{,}000$  | Total number of training samples
$T$               | $50$        | Number of epochs
Table 3: Experiment setting for the shuffled personalized SGD on the MNIST dataset.
Figure 4: Comparison of Test Accuracy with Different Noise Distributions

Utility of GSPA

We evaluate the utility of GSPA with $\epsilon^{\ell}$ drawn from Table 2 on the MNIST dataset, and introduce Unif 3 to denote the distribution $U(0.5,1)$. The numerical results indicate that Unif 3 exhibits the best accuracy, which aligns with expectations since it corresponds to larger privacy budgets. Although the constant scenario offers stronger privacy protection than Unif 2, it actually achieves better accuracy. One possible explanation for this interesting observation is that large differences among the privacy parameters can cause instability in the gradient iterations.

Algorithm 3 SGD with GSPA
Input: Dataset $X=(x_{1},\ldots,x_{n})$, loss function $\mathcal{L}(\boldsymbol{\theta},x)$, initial point $\boldsymbol{\theta}_{0}$, learning rate $\eta$, number of epochs $T$, privacy budgets $\mathcal{S}=\{(\epsilon_{1},\delta_{1}),\cdots,(\epsilon_{n},\delta_{n})\}$, batch size $m$ and gradient norm bound $C$.
Output: $\hat{\boldsymbol{\theta}}_{T}$
1: Split $[n]$ into $n/m$ disjoint subsets $S_{1},\cdots,S_{n/m}$ of equal size $m$
2: Choose an arbitrary initial point $\hat{\boldsymbol{\theta}}_{0}$
3: Choose a random permutation $\pi:[n/m]\rightarrow[n/m]$
4: for each $t\in[T]$ do
5:   $\tilde{\boldsymbol{\theta}}_{0}=\hat{\boldsymbol{\theta}}_{t-1}$
6:   for each $i\in[n/m]$ do
7:     $\sigma=\frac{2C}{m}\frac{\sqrt{2\log(1.25/\delta_{\pi(i)})}}{\epsilon_{\pi(i)}}$
8:     $\boldsymbol{b}_{i}\sim N(0,\sigma^{2}\boldsymbol{I}_{d})$
9:     for each $j\in S_{\pi(i)}$ do
10:       Compute the gradient: $\boldsymbol{g}_{i}^{j}=\nabla\mathcal{L}(\tilde{\boldsymbol{\theta}}_{i-1},x_{j})$
11:     end for
12:     Average and clip to norm $C$: $\boldsymbol{g}_{i}=\frac{\sum_{j}\boldsymbol{g}_{i}^{j}}{m}$, $\tilde{\boldsymbol{g}}_{i}=\boldsymbol{g}_{i}/\max(1,\|\boldsymbol{g}_{i}\|_{2}/C)$
13:     $\tilde{\boldsymbol{\theta}}_{i}=\tilde{\boldsymbol{\theta}}_{i-1}-\eta(\tilde{\boldsymbol{g}}_{i}+\boldsymbol{b}_{i})$
14:   end for
15:   $\hat{\boldsymbol{\theta}}_{t}=\tilde{\boldsymbol{\theta}}_{n/m}$
16: end for
17: return $\hat{\boldsymbol{\theta}}_{T}$
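For concreteness, the following sketch implements one inner-loop update (lines 7-13 of Algorithm 3) for a single client's batch; the default parameter values mirror Table 3, and the synthetic batch of gradients is a hypothetical stand-in for real per-example gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

def private_batch_step(theta, grads, eps_i, delta_i, C=2.0, eta=0.05):
    # grads: per-example gradients of shape (m, d) for the chosen client's batch.
    m = grads.shape[0]
    g = grads.mean(axis=0)                          # line 12: average ...
    g = g / max(1.0, np.linalg.norm(g) / C)         # ... then clip to norm C
    sigma = (2 * C / m) * np.sqrt(2 * np.log(1.25 / delta_i)) / eps_i  # line 7
    b = rng.normal(scale=sigma, size=g.shape)       # line 8: Gaussian noise
    return theta - eta * (g + b)                    # line 13: gradient step

theta = np.zeros(60)
grads = rng.normal(size=(100, 60))                  # hypothetical batch of gradients
print(private_batch_step(theta, grads, eps_i=0.5, delta_i=1e-5)[:3])
```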

Conclusion and Future Work

This work focuses on privacy amplification in the shuffle model. To address the trade-off between privacy and utility, we propose the GSPA framework, which achieves higher accuracy while reducing the privacy loss by at least 33% compared with existing methods.

In future research, we plan to incorporate additional privacy definitions such as Rényi differential privacy (Girgis2021renyi). Moreover, we acknowledge the significance of enhancing techniques for specific data types such as images, speech, and other modalities. This entails developing specialized privacy metrics, differential privacy mechanisms, and model training algorithms that offer more accurate and efficient privacy protection.

Appendix

$f$-DP

Here are several important properties of $f$-DP. We present these facts directly for brevity; for comprehensive proofs, please refer to the related article (Dong2022gaussian).

Fact 1

$(\epsilon,\delta)$-DP is equivalent to $f_{\epsilon,\delta}$-DP, where

$f_{\epsilon,\delta}(\alpha)=\max\{0,\,1-\delta-e^{\epsilon}\alpha,\,e^{-\epsilon}(1-\delta-\alpha)\}.$ (19)
Fact 2

$f$-DP satisfies the post-processing property: if a mechanism $M$ is $f$-DP, then its post-processing $Proc\circ M$ is also $f$-DP.

Fact 3

($\mu$-GDP) An $f$-DP mechanism is called $\mu$-GDP if $f=T(N(0,1),N(\mu,1))=\Phi(\Phi^{-1}(1-\alpha)-\mu)$, where $\Phi(\cdot)$ is the cumulative distribution function of the standard Gaussian distribution $N(0,1)$. A mechanism is $\mu$-GDP if and only if it is $(\epsilon,\delta(\epsilon))$-DP for all $\epsilon\geq 0$, where

$\delta(\epsilon)=\Phi\left(-\frac{\epsilon}{\mu}+\frac{\mu}{2}\right)-e^{\epsilon}\Phi\left(-\frac{\epsilon}{\mu}-\frac{\mu}{2}\right).$

In particular, if a mechanism is $\mu$-GDP, then it is $k\mu$-GDP for groups of size $k$, and the $n$-fold composition of $\mu_{i}$-GDP mechanisms is $\sqrt{\mu_{1}^{2}+\cdots+\mu_{n}^{2}}$-GDP.

Fact 4

Suppose $T(P,R)\geq f$ and $T(Q,R)\geq g$; then $T(P,Q)\geq f\circ g$, where $(f\circ g)(\alpha)=g(1-f(\alpha))$.

The relationship between $f$-DP and traditional DP has been illustrated from the perspective of hypothesis testing (Dong2022gaussian). It provides a visual representation of how the choice of parameter $\mu$ in $\mu$-GDP relates to the strength of privacy protection.

The $(\epsilon_{i},\delta_{i})$-PLDP mechanism can be dominated by the following hypothesis testing problem (Kairouz2015composition). This forms the foundation for the subsequent analysis.

Lemma 4 (KOV15)

Let $\mathcal{R}^{(i)}:\mathcal{D}\rightarrow\mathcal{S}$ be an $(\epsilon_{i},\delta_{i})$-DP local randomizer, and $x_{0},x_{1}\in\mathcal{D}$. Then there exist two quaternary random variables $\tilde{X_{0}}$ and $\tilde{X_{1}}$ such that $\mathcal{R}^{(i)}(x_{0})$ and $\mathcal{R}^{(i)}(x_{1})$ can be viewed as post-processings of $\tilde{X_{0}}$ and $\tilde{X_{1}}$, respectively. In detail,

$P(\tilde{X_{0}}=x)=\begin{cases}\delta_{i}&\text{if }x=A,\\ \frac{(1-\delta_{i})e^{\epsilon_{i}}}{1+e^{\epsilon_{i}}}&\text{if }x=0,\\ \frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}&\text{if }x=1,\\ 0&\text{if }x=B,\end{cases}$

and

$P(\tilde{X_{1}}=x)=\begin{cases}0&\text{if }x=A,\\ \frac{1-\delta_{i}}{1+e^{\epsilon_{i}}}&\text{if }x=0,\\ \frac{(1-\delta_{i})e^{\epsilon_{i}}}{1+e^{\epsilon_{i}}}&\text{if }x=1,\\ \delta_{i}&\text{if }x=B.\end{cases}$

Proof of Theorem 1

Proof:

Formally, for each $i\in\{2,\cdots,n\}$, let $p_{i}=\frac{2(1-\delta_{i})}{1+e^{\epsilon_{i}}}$. We define random variables $Y_{1,i}^{0}$, $Y_{1,i}^{1}$ and $Y_{i}$ as follows:

$Y_{1,i}^{0}=\begin{cases}0&\text{w.p. }e^{\epsilon_{i}}\frac{p_{i}}{2},\\ 1&\text{w.p. }\frac{p_{i}}{2},\\ 2&\text{w.p. }1-e^{\epsilon_{i}}\frac{p_{i}}{2}-\frac{p_{i}}{2}.\end{cases}$ (20)

$Y_{1,i}^{1}=\begin{cases}0&\text{w.p. }\frac{p_{i}}{2},\\ 1&\text{w.p. }e^{\epsilon_{i}}\frac{p_{i}}{2},\\ 2&\text{w.p. }1-e^{\epsilon_{i}}\frac{p_{i}}{2}-\frac{p_{i}}{2}.\end{cases}$ (21)

and

$Y_{i}=\begin{cases}0&\text{w.p. }\frac{p_{i}}{2},\\ 1&\text{w.p. }\frac{p_{i}}{2},\\ 2&\text{w.p. }1-p_{i}.\end{cases}$ (22)

We consider the case of the $t$-th iteration. Given a dataset $X_{b}$ for $b\in\{0,1\}$, we generate $n$ samples from $\{0,1,2\}$ in the following way. Client number one reports a sample from $Y_{1,i}^{b}$; each client $i$ ($i=2,\cdots,n$) reports an independent sample from $Y_{i}$. We then shuffle the reports randomly. Let $\rho_{b}$ denote the resulting distribution over $\{0,1,2\}^{n}$, and count the total numbers of 0s and 1s. Note that a vector containing a permutation of the users' responses contains no more information than the numbers of 0s and 1s, so we can consider these two representations as equivalent.
We claim that there exists a post-processing function $proc(\cdot)$ such that for $y$ sampled from $\rho_{b}$, $proc(y)$ is distributed identically to $\mathcal{A}_{S}(X_{b})$. To see this, let $\pi$ be a uniformly random permutation of $\{1,\cdots,n\}$. For every $t\in\{1,\cdots,n\}$, given the hidden permutation $\pi$, we can generate a sample from $\mathcal{A}_{S}(X_{b})$ by sequentially transforming $proc(y_{t})$ to obtain the correct mixture components, then sampling from the corresponding mixture component. Specifically, by Lemma 1,

$z_{t}=\begin{cases}\mathcal{R}^{(t)}(z_{1:t-1},x_{1}^{0})&\text{if }y_{t}=0;\\ \mathcal{R}^{(t)}(z_{1:t-1},x_{1}^{1})&\text{if }y_{t}=1;\\ \mathcal{R}^{(t)}(z_{1:t-1},x_{\pi(t)})&\text{if }y_{t}=2.\end{cases}$ (23)

By our assumption, this produces a sample $z_{t}$ from $\mathcal{R}^{(t)}(x_{\pi(t)})$. It is easy to see that the resulting random variable $(z,y)$ has the property that for input $b\in\{0,1\}$, its marginal distribution over $\mathcal{S}$ is the same as $\mathcal{A}_{S}(X_{b})$ and its marginal distribution over $\{0,1,2\}^{n}$ is $\rho_{b}$. The difficulty lies in the fact that, conditioned on a particular instantiation $y=v$, the permutation $\pi|_{y=v}$ is not independent of $b$. Note that if $v_{t}=0$ or $1$, the corresponding $\mathcal{Q}^{0(t)}_{1}(z_{1:t-1})$ or $\mathcal{Q}_{1}^{1(t)}(z_{1:t-1})$ is independent of $\pi$. Therefore, in order to do the appropriate post-processing, it suffices to know the permutation $\pi$ restricted to the set of users who sampled $2$, $K=\pi(\{i:y_{i}=2\})$. The set $K$ is independent of $b$ since $Y_{1,i}^{0}$ and $Y_{1,i}^{1}$ have the same probability of sampling $2$. The probability of being included in $K$ is identical for each $i\in\{2,\cdots,n\}$, and slightly smaller for the first user. Since the sampling of $z$ given $y$ only needs $K$, we can sample from $z|_{(y,K)=(v,J)}$ without knowing $b$. This conditional sampling is exactly the post-processing step that we claimed.

We now analyze the divergence between $\rho_{0}$ and $\rho_{1}$. The shuffling step implies that $\rho_{0}$ and $\rho_{1}$ are symmetric, so the divergence between $\rho_{0}$ and $\rho_{1}$ equals the divergence between the distributions of the counts of $0$s and $1$s. The decomposition in equation (11) implies that the divergence between $\mathcal{A}_{S}(X_{0})$ and $\mathcal{A}_{S}(X_{1})$ can be dominated by the divergence between $\rho_{0}$ and $\rho_{1}$, where $\Delta_{2}\sim Bern(\delta_{n})$, $\Delta_{0}\sim Bin(1-\Delta_{2},\frac{e^{\epsilon_{n}}}{1+e^{\epsilon_{n}}})$, $\Delta_{1}=1-\Delta_{0}-\Delta_{2}$, and $MultiBern(\theta_{1},\cdots,\theta_{d})$ represents a $d$-dimensional Bernoulli distribution with $\sum_{j=1}^{d}\theta_{j}=1$. $\square$

Proof of Lemma 3

Proof:

Since $T(N(\boldsymbol{\mu}_{0},\boldsymbol{\Sigma}),N(\boldsymbol{\mu}_{1},\boldsymbol{\Sigma}))$ equals

$\Phi\left(\Phi^{-1}(1-\alpha)-\sqrt{(\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{0})^{\prime}\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{0})}\right),$

by the properties of the normal distribution, the key is to calculate $(\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{0})^{\prime}\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{0})$. Let $v_{1}=\sum_{i=1}^{n-1}p_{i}$ and $v_{2}=\sum_{i=1}^{n-1}p_{i}^{2}$; then

$\boldsymbol{\Sigma}=\begin{pmatrix}\frac{v_{1}}{2}-\frac{v_{2}}{4}&-\frac{v_{2}}{4}\\ -\frac{v_{2}}{4}&\frac{v_{1}}{2}-\frac{v_{2}}{4}\end{pmatrix},$

and

$\boldsymbol{\Sigma}^{-1}=\begin{pmatrix}\frac{2v_{1}-v_{2}}{v_{1}^{2}-v_{1}v_{2}}&\frac{v_{2}}{v_{1}^{2}-v_{1}v_{2}}\\ \frac{v_{2}}{v_{1}^{2}-v_{1}v_{2}}&\frac{2v_{1}-v_{2}}{v_{1}^{2}-v_{1}v_{2}}\end{pmatrix}.$

By simple calculation, we can obtain that

$(\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{0})^{\prime}\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_{1}-\boldsymbol{\mu}_{0})=(-1,1)\boldsymbol{\Sigma}^{-1}(-1,1)^{\prime}=\frac{4}{\sum_{i=1}^{n-1}p_{i}}.$

Substituting $\mu=\sqrt{\frac{4}{\sum_{i=1}^{n-1}p_{i}}}$ yields the claim. $\square$

Acknowledgements

We would like to thank the anonymous reviewers for generously dedicating their time and expertise to evaluate our manuscript with insightful comments. This work is partially supported by JST CREST JPMJCR21M2, JSPS KAKENHI Grant Numbers JP22H00521, JP22H03595, and JP21K19767, and JST/NSF Joint Research SICORP JPMJSC2107.

References

\nobibliography{aaai22}

\bibentry{abadi2016deep}.
\bibentry{balle2019privacy}.
\bibentry{berry1941accuracy}.
\bibentry{Bittau2017prochlo}.
\bibentry{cheu2019distributed}.
\bibentry{Dong2022gaussian}.
\bibentry{Dwork2014algorithmic}.
\bibentry{Erlingsson2019amplification}.
\bibentry{Esseen1942}.
\bibentry{Feldman2022hiding}.
\bibentry{GDDSK21federated}.
\bibentry{Girgis2021renyi}.
\bibentry{JZT2015}.
\bibentry{Kairouz2015composition}.
\bibentry{KS11}.
\bibentry{lecun1998gradient}.
\bibentry{liu2021flame}.
\bibentry{Liu2023}.
\bibentry{NCW21}.