A Generalized Shuffle Framework for Privacy Amplification: Strengthening Privacy Guarantees and Enhancing Utility
Abstract
The shuffle model of local differential privacy is an advanced method of privacy amplification designed to enhance privacy protection with high utility. It achieves this by randomly shuffling sensitive data, making it more difficult to link individual data points to specific individuals. However, most existing studies have focused on the shuffle model based on $\epsilon_0$-Locally Differentially Private (LDP) randomizers, with limited consideration for complex scenarios such as $(\epsilon_0, \delta_0)$-LDP or personalized LDP (PLDP). This hinders a comprehensive understanding of the shuffle model's potential and limits its application in various settings. To bridge this research gap, we propose a generalized shuffle framework that can be applied to any $\epsilon_i$-PLDP setting with personalized privacy parameters. This generalization allows for a broader exploration of the privacy-utility trade-off and facilitates the design of privacy-preserving analyses in diverse contexts. We prove that the shuffled $\epsilon_i$-PLDP process approximately preserves $\mu$-Gaussian Differential Privacy, where $\mu$ is determined by the personalized privacy parameters and the number of users. This exact characterization allows us to avoid the limitations and potential inaccuracies associated with inequality-based estimations. To strengthen the privacy guarantee, we improve the lower bound by using hypothesis testing instead of relying on rough estimations such as the Chernoff bound or Hoeffding's inequality. Furthermore, extensive comparative evaluations show that our approach outperforms existing methods in achieving strong central privacy guarantees while preserving the utility of the global model. We also design corresponding algorithms for the average function, frequency estimation, and stochastic gradient descent.
Introduction
The shuffle model (Bittau2017prochlo) is a state-of-the-art technique for balancing privacy and utility in differentially private data analysis. In traditional differential privacy, a trusted server (or aggregator) is often assumed to collect all users' data before performing privacy-preserving data analysis (Dwork2014algorithmic). However, such approaches may not be feasible or practical when a trusted curator does not exist. Given this, Local Differential Privacy (LDP) (KS11) has been proposed to achieve differential privacy by letting users add noise individually; however, LDP suffers from low utility due to the accumulated noise. To address this, the shuffle model of differential privacy (shuffle DP) (Bittau2017prochlo; balle2019privacy; Erlingsson2019amplification) adds a shuffler between the users and the server that randomly shuffles the noisy data before it is sent to the server. The shuffle DP has an intriguing theoretical privacy amplification effect: a small amount of local noise can yield a strong privacy guarantee against the untrusted server. Extensive studies (balle2019privacy; Erlingsson2019amplification; Girgis2021renyi; Feldman2022hiding; liu2021flame; GDDSK21federated) have been devoted to proving better (tighter) privacy amplification in the shuffle DP.
However, most existing studies have focused on the shuffle model based on an $\epsilon_0$-LDP randomizer, with uniform and restricted settings of the local privacy parameters. For example, Erlingsson2019amplification imposes explicit restrictions on the local budget $\epsilon_0$ and the number of users $n$. Although a recent work (Liu2023) provides a privacy bound that allows a personalized local privacy parameter $\epsilon_i$ for each user (with a fixed $\delta$), the bound is relatively loose and leaves substantial room for improvement. To address this problem, we make the following contributions.
Firstly, we propose a Generalized Shuffle framework for Privacy Amplification (GSPA) that allows arbitrary local privacy parameters and provides a new privacy amplification analysis. Our analysis technique benefits from the adoption of Functional Differential Privacy (Dong2022gaussian) and from a careful analysis of the distance between two multinomial distributions (see Theorems 1 and 2). For both uniform and personalized privacy parameter settings, we provide privacy bounds that are tighter than existing results (see Figure 2).
Secondly, we apply GSPA with different personalized privacy parameter settings to diverse privacy-preserving analysis tasks, including private mean estimation, private frequency estimation, and DP-SGD, to demonstrate the effectiveness of our approach. For mean and frequency estimation with GSPA (see Figure 3), utility decreases roughly linearly as the fraction of conservative users grows, and increases roughly linearly as the privacy budgets of conservative users grow. For DP-SGD with GSPA (see Figure 4), an interesting phenomenon arises: although the constant scenario offers stronger privacy protection, its test accuracy (94.8%) is higher than that (93.5%) of the scenarios with varying local privacy parameters.
Preliminaries
This section presents the definitions and tools necessary for understanding the shuffle model; they form the basis of our approach.
Definition 1
(Differential Privacy) A randomized algorithm $M$ satisfies $(\epsilon, \delta)$-differential privacy, denoted $(\epsilon, \delta)$-DP, if for all $S \subseteq \mathrm{Range}(M)$ and for all neighboring databases $D, D'$ ($D'$ can be obtained from $D$ by replacing exactly one record):
$$\Pr[M(D) \in S] \le e^{\epsilon} \Pr[M(D') \in S] + \delta. \qquad (1)$$
$\epsilon$ is known as the privacy budget, while $\delta$ is referred to as the indistinguishability parameter, which bounds the probability that the privacy loss exceeds $\epsilon$. Smaller values of $\epsilon$ and $\delta$ indicate stronger privacy protection.
Definition 2
(Local Differential Privacy) A randomized algorithm $R$ satisfies $\epsilon$-local differential privacy, denoted $\epsilon$-LDP, if for all pairs of inputs $x, x'$ and every output $y$, it satisfies
$$\Pr[R(x) = y] \le e^{\epsilon} \Pr[R(x') = y]. \qquad (2)$$
In Local Differential Privacy (LDP), each data contributor applies a local randomization mechanism to perturb their own data before sharing it with a central aggregator.
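As a concrete illustration (not the specific randomizer used later in this paper), the classical binary randomized-response mechanism satisfies $\epsilon$-LDP: it reports the true bit with probability $e^{\epsilon}/(e^{\epsilon}+1)$ and flips it otherwise. A minimal sketch:

```python
import numpy as np

def randomized_response(bit: int, epsilon: float) -> int:
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it.

    The two output probabilities differ by at most a factor of e^epsilon,
    so this randomizer satisfies epsilon-LDP.
    """
    p_truth = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    return bit if np.random.rand() < p_truth else 1 - bit

# example: a single user perturbs their bit locally before sharing it
print(randomized_response(1, epsilon=0.5))
```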
Privacy Tools
Differential privacy can be regarded as a hypothesis testing problem for a given pair of distributions (Kairouz2015composition). In brief, we consider a testing problem with two hypotheses: the null hypothesis that the output was generated from $D$ and the alternative hypothesis that it was generated from $D'$.
To provide an intuitive explanation, we use the name Bob to denote the single individual present in $D$ but absent in $D'$. Consequently, rejecting the null hypothesis amounts to concluding that Bob is absent, whereas accepting the null hypothesis amounts to concluding that Bob is present in the dataset.
Inspired by this, an effective tool called $f$-DP (Dong2022gaussian) has been introduced, which uses hypothesis testing to analyze differential privacy. For two neighbouring databases $D$ and $D'$, let $P$ and $Q$ denote the probability distributions of $M(D)$ and $M(D')$, respectively. We consider a rejection rule $\phi \in [0, 1]$, with type I and type II error rates defined as
$$\alpha_{\phi} = \mathbb{E}_{P}[\phi], \qquad \beta_{\phi} = 1 - \mathbb{E}_{Q}[\phi]. \qquad (3)$$
It is well-known that
$$\alpha_{\phi} + \beta_{\phi} \ge 1 - \mathrm{TV}(P, Q), \qquad (4)$$
where $\mathrm{TV}(P, Q)$ is the total variation distance, i.e., the supremum of $|P(A) - Q(A)|$ over all measurable sets $A$. To characterize the fine-grained trade-off between the two errors, Table 1 summarizes their relationship.
| | Actual True | Actual False |
| --- | --- | --- |
| Accept Hypothesis | Correct | Type II Error ($\beta$) |
| Reject Hypothesis | Type I Error ($\alpha$) | Correct |
For any two probability distributions $P$ and $Q$ on the same space, the trade-off function $T(P, Q): [0, 1] \to [0, 1]$ is defined as
$$T(P, Q)(\alpha) = \inf\{\beta_{\phi} : \alpha_{\phi} \le \alpha\}, \qquad (5)$$
where the infimum is taken over all measurable rejection rules $\phi$, and $\alpha_{\phi}$ and $\beta_{\phi}$ are defined as in equation (3).
Definition 3
(Functional Differential Privacy, $f$-DP) Let $f$ be a trade-off function. A mechanism $M$ is said to be $f$-DP if
$$T\bigl(M(D), M(D')\bigr) \ge f \qquad (6)$$
for all neighboring datasets $D$ and $D'$.
To enhance readability, we defer the introduction and relevant properties of $f$-DP to the Appendix. It is worth noting that traditional DP is a special case of $f$-DP, so $f$-DP has a wider scope of applicability.
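For a computational view of trade-off functions, the sketch below evaluates the $(\epsilon, \delta)$-DP trade-off function $f_{\epsilon,\delta}$ and the Gaussian trade-off function $G_\mu$ following the closed forms in Dong2022gaussian; the grid of type I errors and the example parameters are arbitrary.

```python
import numpy as np
from scipy.stats import norm

def f_eps_delta(alpha, eps, delta):
    """(eps, delta)-DP trade-off function (Dong et al., 2022):
    f(alpha) = max(0, 1 - delta - e^eps * alpha, e^{-eps} * (1 - delta - alpha))."""
    return np.maximum.reduce([
        np.zeros_like(alpha),
        1.0 - delta - np.exp(eps) * alpha,
        np.exp(-eps) * (1.0 - delta - alpha),
    ])

def gaussian_tradeoff(alpha, mu):
    """G_mu(alpha) = Phi(Phi^{-1}(1 - alpha) - mu), the trade-off curve of mu-GDP."""
    return norm.cdf(norm.ppf(1.0 - alpha) - mu)

alpha = np.linspace(0.0, 1.0, 101)
print(f_eps_delta(alpha, eps=1.0, delta=1e-5)[:5])
print(gaussian_tradeoff(alpha, mu=0.5)[:5])
```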
In addition, the Laplace mechanism and the Gaussian mechanism are two common approaches in differential privacy (Dwork2014algorithmic). The choice between them depends on the data type, the privacy requirements, and the query task: the Laplace mechanism provides stronger (pure) privacy but may introduce larger errors, while the Gaussian mechanism is often more suitable when accurate results are needed. It is therefore important to strike a balance between privacy and accuracy based on the specific requirements.
Definition 4 (Laplace Mechanism)
Given a query function $q$, privacy parameter $\epsilon$, and sensitivity $\Delta = \max_{D \sim D'} \lVert q(D) - q(D') \rVert_1$ over any two neighbouring datasets $D, D'$, the Laplace mechanism
$$M(D) = q(D) + \mathrm{Lap}(\Delta / \epsilon) \qquad (7)$$
preserves $\epsilon$-DP, where $\mathrm{Lap}(\Delta/\epsilon)$ denotes centered Laplace noise with scale parameter $\Delta/\epsilon$.
When there is no ambiguity, we use the same notation for a query and its noisy answer.
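A minimal sketch of the Laplace mechanism for a scalar query follows; the query value, sensitivity, and budget passed in are up to the caller.

```python
import numpy as np

def laplace_mechanism(true_answer: float, sensitivity: float, epsilon: float) -> float:
    """Release q(D) + Lap(sensitivity / epsilon), which satisfies epsilon-DP."""
    scale = sensitivity / epsilon
    return true_answer + np.random.laplace(loc=0.0, scale=scale)

# example: privately release a count query with sensitivity 1
print(laplace_mechanism(true_answer=42.0, sensitivity=1.0, epsilon=0.5))
```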
Definition 5 (Gaussian Mechanism)
Given a query function $q$, privacy parameters $\epsilon, \delta$, and sensitivity $\Delta = \max_{D \sim D'} \lVert q(D) - q(D') \rVert_2$ over any two neighbouring datasets $D, D'$, the Gaussian mechanism
$$M(D) = q(D) + \mathcal{N}(0, \sigma^2), \qquad \sigma = \frac{\Delta \sqrt{2 \ln(1.25/\delta)}}{\epsilon}, \qquad (8)$$
preserves $(\epsilon, \delta)$-DP, where $\mathcal{N}(0, \sigma^2)$ denotes Gaussian noise with mean $0$ and variance $\sigma^2$.
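Analogously, a sketch of the Gaussian mechanism with the classical noise calibration (valid for $\epsilon \in (0, 1)$) is shown below; this calibration is the textbook choice and only one of several possible ones.

```python
import numpy as np

def gaussian_mechanism(true_answer: float, sensitivity: float,
                       epsilon: float, delta: float) -> float:
    """Release q(D) + N(0, sigma^2) with the classical calibration
    sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon,
    which gives (epsilon, delta)-DP for epsilon in (0, 1)."""
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return true_answer + np.random.normal(loc=0.0, scale=sigma)

print(gaussian_mechanism(true_answer=42.0, sensitivity=1.0, epsilon=0.5, delta=1e-5))
```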
Privacy Analysis of GSPA Framework
Our Generalized Shuffle framework for Privacy Amplification (GSPA) consists of three main components: local randomizers, a trustworthy shuffler, and an aggregator, which are the same as in existing shuffle DP frameworks; however, GSPA allows local randomizers with arbitrary privacy parameters. (i) On the user side, the local randomizer adds noise to the original data on the $i$-th user's device, thus providing $\epsilon_i$-PLDP for user $i$. (ii) The shuffler randomly permutes the order of the data elements, ensuring that the resulting arrangement is unknown to any party other than the shuffler itself. (iii) The aggregator collects and integrates the shuffled data for simple queries, while for complex tasks such as machine learning it trains models on the shuffled data over multiple iterations. Where no confusion arises, the notation $\epsilon_0$-LDP denotes the uniform scenario, while $\epsilon_i$-PLDP denotes the personalized scenario.
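A schematic of one GSPA round, with the three stages made explicit, might look as follows; the function names and the representation of reports are illustrative, not the paper's exact interface.

```python
import numpy as np

def gspa_round(values, epsilons, local_randomizer):
    """One GSPA round: (i) each user perturbs locally with their own eps_i,
    (ii) the shuffler applies a uniformly random permutation,
    (iii) the aggregator only ever sees the shuffled reports."""
    reports = [local_randomizer(x, eps) for x, eps in zip(values, epsilons)]
    shuffled = [reports[i] for i in np.random.permutation(len(reports))]
    return shuffled

# example: Laplace-perturbed values in [0, 1] with personalized budgets
noisy = gspa_round(
    values=np.random.rand(100),
    epsilons=np.random.choice([0.1, 0.5, 1.0], size=100),
    local_randomizer=lambda x, eps: x + np.random.laplace(0.0, 1.0 / eps),
)
print(np.mean(noisy))
```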
Privacy Amplification Effect
In this section, we address privacy protection in the context of a general shuffled adaptive process with personalized local randomizers.
Definition 6
For a domain $\mathcal{D}$, let $R_i : \mathcal{S}^{(1)} \times \cdots \times \mathcal{S}^{(i-1)} \times \mathcal{D} \to \mathcal{S}^{(i)}$ for $i \in [n]$, where $\mathcal{S}^{(i)}$ is the range space of $R_i$, be a sequence of algorithms such that $R_i(z_{1:i-1}, \cdot)$ is an $\epsilon_i$-PLDP randomizer for all values of the auxiliary inputs $z_{1:i-1}$. Let $\mathcal{A}$ be the algorithm that, given a dataset $x_{1:n} \in \mathcal{D}^n$, sequentially computes $z_i = R_i(z_{1:i-1}, x_i)$ for $i \in [n]$ and outputs $z_{1:n}$. We say $\mathcal{A}$ is a personalized LDP (PLDP) adaptive process. Similarly, if we first sample a permutation $\pi$ uniformly at random, then sequentially compute $z_i = R_i(z_{1:i-1}, x_{\pi(i)})$ for $i \in [n]$ and output $z_{1:n}$, we say this process is shuffled PLDP adaptive and denote it by $\mathcal{A}_s$.
Lemma 1
Given an $\epsilon_i$-PLDP adaptive process, in the $i$-th step, for the local randomizer $R_i$ and any inputs, there exist distributions such that
(9) |
(10) |
(11) |
Proof:
For inputs and , satisfies the constraints of Lemma 4, so there exists an -PLDP local randomizer for the -th output and post-processing function such that , and
Let and , and . Let and Since conditioned on the output lying in , the distribution of and are the same. Let , and Then
Further, for all ,
Letting , , and for all . The proof is completed.
Theorem 1
For a domain $\mathcal{D}$, if $\mathcal{A}_s$ is a shuffled PLDP adaptive process, then for two arbitrary neighboring datasets that are distinct at a single data point, there exists a post-processing function such that
Here,
(12) |
(13) |
, , where denotes a Bernoulli random variable with bias , denotes a Binomial distribution with trials and success probability . In addition, , where represents a -dimensional Bernoulli distribution with .
To enhance readability, the proof details are deferred to the Appendix. Based on Theorem 1, we can simplify the original problem by analyzing the shuffling process of a simple non-adaptive protocol.
The primary objective in what follows is to bound the distance between the two distributions. The Berry–Esseen lemma (berry1941accuracy; Esseen1942) is essential for proving the asymptotic properties.
Lemma 2 (Berry–Esseen)
Let and , where and Then for the first two components , there exists , such that , where and represent the distribution of and corresponding normal distribution, respectively.
In fact, for a given setting, a sharper bound can be obtained by numerical methods. Without loss of generality, we fix the parameters as above. For a fixed output value, we approximate its probability by integrating the normal probability density function over a small interval around that point. Letting $F$ denote the cumulative distribution function of the approximating normal distribution, we have
(14) |
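The numerical step above can be read as a standard continuity-correction approximation of a binomial probability by the matching normal density; the sketch below is one plausible implementation of this idea (the half-integer correction and the example parameters are assumptions, not taken from the paper).

```python
from scipy.stats import norm, binom

def normal_approx_pmf(k: int, n: int, p: float) -> float:
    """Approximate P(Binomial(n, p) = k) by integrating the matching normal
    density over the unit interval around k (continuity correction)."""
    mean = n * p
    std = (n * p * (1.0 - p)) ** 0.5
    return norm.cdf((k + 0.5 - mean) / std) - norm.cdf((k - 0.5 - mean) / std)

# sanity check against the exact binomial pmf
print(normal_approx_pmf(480, 1000, 0.5), binom.pmf(480, 1000, 0.5))
```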
Lemma 3
Let , if and , , then ,
Proof:
First, we analyze the scenario in which the differing data point is at a fixed position. By definition,
(16) |
When , and are indistinguishable, which indicates that . Let and with , then
(17) |
where . Assume , , where are same as that in Lemma 3. Let , according to equation (4),
then based on Fact 4,
Reusing Fact 4, we can obtain that
Lemma 2 shows that there exists , such that and . Hence
Then
Since for an arbitrary trade-off function , we have , it follows that:
Finally, taking into account the case where the differing record occupies an arbitrary position in the neighboring datasets, the privacy bound is determined by the worst-case scenario, that is,
Comparison with Existing Results
We provide numerical evaluations of the privacy amplification effect under fixed LDP settings in Table 2. For the purpose of comparison, we examine privacy amplification for a fixed local privacy budget set while varying the number of users $n$, and we fix the central $\delta$ of the shuffled output for simplicity. To avoid ambiguity, the first privacy parameters are repeated as $n$ grows. Considering that the convergence rate in Lemma 2 is negligible in the numerical analysis, our focus lies in measuring the resulting central privacy guarantee.
| Name | Distribution of $\epsilon_i$ |
| --- | --- |
| Unif 1 | |
| Unif 2 | |
| Constant | |
| Mixed Constant | |
For conciseness, we use Fact 3 in the Appendix to compute the corresponding central $\epsilon$ and $\delta$ for Theorem 2. Baseline bounds for the privacy amplification effect include [Liu23] (Liu2023), [FMT22] (Feldman2022hiding), and [Erlingsson19] (Erlingsson2019amplification). [Liu23] provides bounds for the personalized scenario, while [FMT22] and [Erlingsson19] only consider a uniform $\epsilon_0$.
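Fact 3 gives a closed-form relation between a $\mu$-GDP guarantee and the family of $(\epsilon, \delta(\epsilon))$-DP guarantees (Dong2022gaussian). The sketch below uses that standard relation to recover the central $\epsilon$ for a given $\delta$ by bisection; the example values of $\mu$ and $\delta$ are arbitrary.

```python
import numpy as np
from scipy.stats import norm

def delta_of_eps(eps: float, mu: float) -> float:
    """delta(eps) for a mu-GDP mechanism (Dong et al., 2022):
    delta = Phi(-eps/mu + mu/2) - e^eps * Phi(-eps/mu - mu/2)."""
    return norm.cdf(-eps / mu + mu / 2.0) - np.exp(eps) * norm.cdf(-eps / mu - mu / 2.0)

def eps_of_delta(mu: float, delta: float, hi: float = 50.0) -> float:
    """Invert delta(eps) by bisection; delta(eps) is decreasing in eps."""
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if delta_of_eps(mid, mu) > delta:
            lo = mid
        else:
            hi = mid
    return hi

print(eps_of_delta(mu=0.5, delta=1e-6))
```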
The numerical results demonstrate the following: (i) Our bound remains valid for extreme privacy budgets, whereas [Liu23] requires that each local budget not be close to zero; however, it is natural to encounter user responses that contain no information, corresponding to a zero budget. (ii) As the sample size increases, the amplification effect grows proportionally to the square root of $n$. (iii) Our privacy bounds significantly outperform existing bounds in all the scenarios considered, even when the privacy parameters are uniform.
[Figure 2: numerical comparison of our privacy amplification bound with existing bounds.]
Application and Experiments
All the experiments are implemented on a workstation with an Intel Core i5-1155G7 processor on Windows 11 OS.
Application to Mean and Frequency Estimation
Mean Estimation
The average function is a fundamental and commonly used mathematical operation with wide-ranging applications. In this section, we apply GSPA to the average function on synthetic data. We randomly divide the users into three groups: conservative, moderate, and liberal, with group fractions set to default values following the survey reported in (Acquisti2005privacy). For convenience, each group is assigned its own privacy preference, with conservative users receiving the smallest local budget and liberal users the largest. In the LDP case, the privacy preference of users in the liberal group is fixed, while the remaining parameters take their default values.
Theorem 3
Algorithm 1 approximately preserves $\mu$-GDP for each user, where $\mu$ is obtained from Theorem 2 with the personalized budgets $\{\epsilon_i\}$.
Proof:
According to the definition of the Laplace mechanism, each user's reported data point satisfies $\epsilon_i$-LDP. Combined with Theorem 2, we obtain that Algorithm 1 approximately satisfies $\mu$-GDP with the stated parameter.
Next, we simulate the accuracy for different levels of privacy protection. To facilitate comparison, we fix one parameter and vary the other over a range of values, and we generate privacy budgets for users based on the privacy-preference rule described above. Each sample is drawn from a normal distribution and clipped into a bounded range, and the procedure is repeated multiple times to obtain a confidence interval. According to Fact 3, the central privacy parameter under the shuffle model can be obtained for each setting. Figure 3(a) shows that an increase in the proportion of conservative users leads to a decrease in estimation accuracy, while Figure 3(b) demonstrates that increasing the privacy budget improves accuracy.
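For concreteness, the following sketch mimics the mean-estimation pipeline described above; the clipping range $[0, 1]$, the group budgets, and the data-generating distribution are illustrative assumptions rather than the exact settings of Algorithm 1.

```python
import numpy as np

def shuffled_private_mean(samples, epsilons, clip_lo=0.0, clip_hi=1.0):
    """Each user clips their value, adds Laplace noise scaled to their own
    eps_i (sensitivity = clip width for a single value), the shuffler permutes
    the noisy reports, and the analyzer averages them."""
    epsilons = np.asarray(epsilons, dtype=float)
    clipped = np.clip(samples, clip_lo, clip_hi)
    noisy = clipped + np.random.laplace(0.0, (clip_hi - clip_lo) / epsilons)
    shuffled = np.random.permutation(noisy)  # order no longer reveals identity
    return shuffled.mean()

# toy run with arbitrary personalized budgets (not the paper's settings)
eps = np.random.choice([0.1, 0.5, 1.0], size=1000)
x = np.random.normal(0.5, 0.2, size=1000)
print(shuffled_private_mean(x, eps), x.clip(0.0, 1.0).mean())
```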
[Figure 3: estimation accuracy of private mean and frequency estimation as the fraction of conservative users and their privacy budgets vary.]
Frequency estimation
In machine learning, frequency estimation is often used as a preprocessing step to understand the distribution and importance of different features or categories within a dataset. Accurate estimates of these frequencies help with feature selection, dimensionality reduction, and building effective models.
To obtain the dataset, a total of 10,000 records are generated for counting. Each record is encoded as a binary attribute. The proportion of records with value 1 is determined by a density parameter (with a default value of 0.7).
Theorem 4
Algorithm 2 approximately preserves $\mu$-GDP for each user, where $\mu$ is obtained from Theorem 2 with the personalized budgets $\{\epsilon_i\}$.
The proof of Theorem 4 is the same as that of Theorem 3. A direct calculation shows that the aggregated estimate is an unbiased estimator of the true frequency. Similar to the average function, we adopt the same configuration for the personalized privacy budgets.
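A minimal sketch of such a shuffled frequency-estimation protocol is given below, using binary randomized response and per-user debiasing so that the average of the debiased reports is unbiased; the specific budgets and the user-side debiasing step are illustrative assumptions, not necessarily the exact design of Algorithm 2.

```python
import numpy as np

def shuffled_frequency_estimate(bits, epsilons):
    """Each user applies binary randomized response with their own eps_i,
    debiases their report locally, and the shuffled mean of the debiased
    reports is an unbiased estimate of the true frequency of 1s."""
    bits = np.asarray(bits)
    epsilons = np.asarray(epsilons, dtype=float)
    p = np.exp(epsilons) / (np.exp(epsilons) + 1.0)  # prob. of reporting truthfully
    flip = np.random.rand(len(bits)) >= p
    reports = np.where(flip, 1 - bits, bits)
    debiased = (reports - (1.0 - p)) / (2.0 * p - 1.0)  # E[debiased_i] = bits_i
    return np.random.permutation(debiased).mean()

bits = (np.random.rand(10000) < 0.7).astype(int)     # density parameter 0.7
eps = np.random.choice([0.2, 0.5, 1.0], size=10000)  # arbitrary personal budgets
print(shuffled_frequency_estimate(bits, eps))
```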
Personalized Private Stochastic Gradient Descent
Private stochastic gradient descent is a common method in deep learning (abadi2016deep). Personalized private stochastic gradient descent combines personalized differential privacy with stochastic gradient descent for model training and parameter updates while ensuring privacy protection. In this setting, the privacy of individual users must be protected, so direct use of raw data for parameter updates is not feasible.
The key idea of personalized differential privacy is to introduce personalized parameters into the differentially private mechanism to flexibly adjust the level of privacy protection. For the gradient descent algorithm, personalized differential privacy can be achieved by introducing noise during gradient computation.
Theorem 5
Algorithm 3 approximately satisfies $\mu$-GDP, with $\mu$ obtained from Theorem 2 and the composition property in Fact 3.
Proof:
For an arbitrary iteration, each participating client satisfies $\epsilon_i$-LDP before sending its update to the shuffler, by the definition of the Gaussian mechanism. By Theorem 2, the shuffled output preserves $\mu$-GDP for a single iteration. In addition, combined with Fact 3, the algorithm satisfies $\sqrt{T}\,\mu$-GDP under $T$-fold composition.
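The following sketch illustrates one iteration of such a personalized, shuffled DP-SGD step: per-client gradient clipping, per-client Gaussian noise, shuffling, and an averaged update. The clipping bound, noise scales, learning rate, and shapes are placeholder values, and the calibration from $\epsilon_i$ to $\sigma_i$ is omitted here.

```python
import numpy as np

def shuffled_dp_sgd_step(weights, per_client_grads, clip_c, sigmas, lr):
    """One illustrative step: clip each client's gradient to L2 norm clip_c,
    add per-client Gaussian noise with scale sigma_i, shuffle the noisy
    gradients, then apply the averaged update on the server."""
    noisy = []
    for g, sigma in zip(per_client_grads, sigmas):
        g = np.asarray(g, dtype=float)
        g = g * min(1.0, clip_c / (np.linalg.norm(g) + 1e-12))
        noisy.append(g + np.random.normal(0.0, sigma, size=g.shape))
    order = np.random.permutation(len(noisy))          # the shuffler's permutation
    noisy = np.stack([noisy[i] for i in order])
    return weights - lr * noisy.mean(axis=0)

# toy usage with made-up dimensions, clients, and noise scales
w = np.zeros(10)
grads = [np.random.randn(10) for _ in range(32)]
w = shuffled_dp_sgd_step(w, grads, clip_c=1.0, sigmas=np.full(32, 2.0), lr=0.1)
```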
Dataset and implementation
The MNIST dataset (lecun1998gradient) for handwritten digit recognition consists of 60,000 training images and 10,000 test images. Each sample represents a vector generated from a handwritten image, where the independent variable corresponds to the input vector and the dependent variable is the digit label ranging from 0 to 9. In our experiments, we consider a scenario with multiple clients, each holding an equal share of the training samples. For simplicity, we train a simple classifier using a feed-forward neural network with ReLU activation units and a softmax output layer with 10 classes, corresponding to the 10 possible digits. The model is trained using cross-entropy loss and an initial PCA input layer. At each step of the shuffled SGD, we choose one client at random without replacement. The parameters of the experimental setup are listed in Table 3. This experiment is designed to demonstrate the use cases of the shuffle model and therefore does not focus on comparing with previous results; for comparative results, please refer to Figure 2.
Parameter Selection
As a result, our approach achieves competitive accuracy on the test dataset after a moderate number of epochs, consistent with the performance of a vanilla neural network (lecun1998gradient) trained on the same MNIST dataset. By employing this methodology, we can effectively train a simple classifier that achieves high accuracy in recognizing handwritten digits.
| Parameters | Value | Explanation |
| --- | --- | --- |
| | | Clipping bound |
| | | Indistinguishability parameter |
| | | Privacy budget |
| | | Step size of the gradient |
| | | The number of clients |
| | | Total number of training samples |
| | | The number of epochs |
Utility of GSPA
We evaluate the utility of GSPA on the MNIST dataset using the privacy-budget distributions of Table 2, and we additionally introduce Unif 3 as a distribution with a larger privacy budget. The numerical results indicate that Unif 3 achieves the best accuracy, which aligns with expectations as it corresponds to a larger privacy budget. Although the constant scenario provides stronger privacy protection than Unif 2, it actually achieves better accuracy. One possible reason for this observation is that a large spread among the personalized privacy parameters can cause instability in the gradient iterations.
Conclusion and Future Work
This work focuses on the privacy amplification of the shuffle model. To address the trade-off between privacy and utility, we propose the GSPA framework, which achieves higher accuracy while reducing the privacy loss by at least 33% compared to existing methods.
In future research, we plan to extend the framework by incorporating additional privacy definitions such as Rényi differential privacy (Girgis2021renyi). Moreover, we acknowledge the importance of enhancing techniques for specific data types such as images, speech, and other modalities. This entails developing specialized privacy metrics, differential privacy mechanisms, and model training algorithms that offer more accurate and efficient privacy protection.
Appendix
$f$-DP
Here are several important properties of $f$-DP. We state these facts directly for brevity; for complete proofs, please refer to the related article (Dong2022gaussian).
Fact 1
$(\epsilon, \delta)$-DP is equivalent to $f_{\epsilon,\delta}$-DP, where
$$f_{\epsilon,\delta}(\alpha) = \max\left\{0,\; 1 - \delta - e^{\epsilon}\alpha,\; e^{-\epsilon}(1 - \delta - \alpha)\right\}. \qquad (19)$$
Fact 2
$f$-DP satisfies the post-processing property; that is, if a mechanism is $f$-DP, then any post-processing of it is also $f$-DP.
Fact 3
($\mu$-GDP) An $f$-DP mechanism is called $\mu$-GDP if $f$ can be obtained by $G_\mu := T(\mathcal{N}(0,1), \mathcal{N}(\mu,1))$, i.e., $G_\mu(\alpha) = \Phi(\Phi^{-1}(1-\alpha) - \mu)$, where $\Phi$ is the cumulative distribution function of the standard Gaussian distribution $\mathcal{N}(0,1)$. Then a mechanism is $\mu$-GDP if and only if it is $(\epsilon, \delta(\epsilon))$-DP for all $\epsilon \ge 0$, where $\delta(\epsilon) = \Phi\!\left(-\tfrac{\epsilon}{\mu} + \tfrac{\mu}{2}\right) - e^{\epsilon}\,\Phi\!\left(-\tfrac{\epsilon}{\mu} - \tfrac{\mu}{2}\right)$.
In particular, if a mechanism is $\mu$-GDP, then it is $k\mu$-GDP for groups of size $k$, and the $n$-fold composition of $\mu_i$-GDP mechanisms ($i = 1, \ldots, n$) is $\sqrt{\mu_1^2 + \cdots + \mu_n^2}$-GDP.
Fact 4
Suppose then .
The relationship between $f$-DP and traditional DP has been illustrated from the perspective of hypothesis testing (Dong2022gaussian), which provides a visual representation of how the choice of the parameter $\mu$ in $\mu$-GDP relates to the strength of privacy protection.
The $\epsilon_i$-PLDP mechanism can be dominated by the following hypothesis testing problem (Kairouz2015composition). This forms the foundation of the subsequent analysis.
Lemma 4 (KOV15)
Let $R$ be an $\epsilon$-DP local randomizer and $x, x'$ two inputs; then there exist two quaternary random variables $X$ and $Y$ such that $R(x)$ and $R(x')$ can be viewed as post-processings of $X$ and $Y$, respectively. In detail,
and
Proof of Theorem 1
Proof:
Formally, for each , let , we define random variables , and as follows:
(20) |
(21) |
and
(22) |
We consider the case in the -th iteration. Given a dataset for , we generate samples from in the following way. Client number one reports a sample from . The remaining clients each report an independent sample from . We then shuffle the reports randomly. Let denote the resulting distribution over .
We then count the total number of occurrences of each report value. Note that a vector containing a permutation of the users' responses contains no more information than these counts, so we can consider the two representations as equivalent.
We claim that there exists a post-processing function such that for sampled from , is distributed identically to .
To see this, let be a randomly and uniformly chosen permutation of . For every ,
given the hidden permutation , we can generate a sample from by sequentially transforming to obtain the correct mixture components, then sampling from the corresponding mixture component. Specifically, by Lemma 1,
(23) |
By our assumption, this produces a sample from It is easy to see that the resulting random variable has the property that for input , its marginal distribution over is the same as and its marginal distribution over is . The difficulty then lies in the fact that conditioned on a particular instantiation , the permutation is not independent of . Note that if or , the corresponding or is independent of . Therefore, in order to do the appropriate post-processing, it suffices to know the permutation restricted to the set of users who sampled , . The set of users who select is independent of since and have the same probability of sampling . The probability of being included in is identical for each and slightly smaller for the first user. Since the sampling of given only needs , we can sample from without knowing . This conditional sampling is exactly the post-processing step that we claimed.
We now analyze the divergence between and ; the shuffling step implies that and are symmetric. This implies that the divergence between and is equal to the divergence between the distributions of the counts of each value. The decomposition in equation (11) implies that the divergence between and can be dominated by the divergence of and , where , and represents a -dimensional Bernoulli distribution with .
Proof of Lemma 3
Proof:
Since is
according to the property of normal distribution, the key is to calculate . Let then
and
By simple calculation, we can obtain that
Substituting yields the proof.
Acknowledgements
We would like to thank all the anonymous reviewers for generously dedicating their time and expertise to evaluating our manuscript and for their insightful comments. This work is partially supported by JST CREST JPMJCR21M2, JSPS KAKENHI Grant Numbers JP22H00521, JP22H03595, and JP21K19767, and JST/NSF Joint Research SICORP JPMJSC2107.
References
\bibentryabadi2016deep.
\bibentryballe2019privacy.
\bibentryberry1941accuracy.
\bibentryBittau2017prochlo.
\bibentrycheu2019distributed.
\bibentryDong2022gaussian.
\bibentryDwork2014algorithmic.
\bibentryErlingsson2019amplification.
\bibentryEsseen1942.
\bibentryFeldman2022hiding.
\bibentryGDDSK21federated.
\bibentryGirgis2021renyi.
\bibentryJZT2015.
\bibentryKairouz2015composition.
\bibentryKS11.
\bibentrylecun1998gradient.
\bibentryliu2021flame.
\bibentryLiu2023.
\bibentryNCW21.