
Differentially Private Federated Learning: A Systematic Review

Jie Fu [email protected] Stevens Institute of Technology, Hoboken, USA; Yuan Hong [email protected] University of Connecticut, Storrs, USA; Xinpeng Ling [email protected] East China Normal University, Shanghai, China; Leixia Wang [email protected] Renmin University of China, Beijing, China; Xun Ran [email protected] The Hong Kong Polytechnic University, Hong Kong, China; Zhiyu Sun [email protected] East China Normal University, Shanghai, China; Wendy Hui Wang [email protected] Stevens Institute of Technology, Hoboken, USA; Zhili Chen [email protected] East China Normal University, Shanghai, China; and Yang Cao [email protected] Tokyo Institute of Technology, Tokyo, Japan
(2024)
Abstract.

In recent years, privacy and security concerns in machine learning have pushed trusted federated learning to the forefront of research. Differential privacy has emerged as the de facto standard for privacy protection in federated learning due to its rigorous mathematical foundation and provable guarantees. Despite extensive research on algorithms that incorporate differential privacy within federated learning, there remains an evident lack of systematic reviews that categorize and synthesize these studies.

Our work presents a systematic overview of differentially private federated learning. Existing taxonomies have not adequately considered the objects and levels of privacy protection provided by the various differential privacy models in federated learning. To rectify this gap, we propose a new taxonomy of differentially private federated learning based on the definitions and guarantees of the various differential privacy models and federated scenarios. Our classification allows for a clear delineation of the protected objects across the differential privacy models and their respective neighborhood levels within federated learning environments. Furthermore, we explore the applications of differential privacy in federated learning scenarios. Our work provides valuable insights into privacy-preserving federated learning and suggests practical directions for future research.

differential privacy, federated learning, survey

1. Introduction

In the past decade, deep learning techniques have achieved remarkable success in many AI tasks (Lundervold and Lundervold, 2019; Chen et al., 2019). In traditional deep learning frameworks, it is assumed that all pertinent training data is centralized under the governance of a singular, trustworthy entity. However, in real-world industrial contexts, data is often distributed across multiple independent parties, where a central trusted authority is often impractical. Additionally, legal restrictions such as CCPA (Mathews and Bowman, 2018) or GDPR (Cummings and Desai, 2018), along with business competition, may further limit the sharing of sensitive data.

In response, federated learning (FL) has emerged as a promising collaborative learning infrastructure. In FL systems, participants retain their data locally, eliminating the need to share sensitive raw data. Instead, they contribute to a collective learning process by training models locally and sharing only the model parameters with a central server. This parameter server aggregates the updates and redistributes the refined model to all participants. However, even though the raw data are kept locally, adversaries may still infer sensitive data from the shared model parameters, posing severe privacy concerns. For example, the contents of raw data can be inverted from the model parameters (Zhu et al., 2019; Nasr et al., 2019; Wang et al., 2019a), or the membership information of the raw data can be inferred (Song et al., 2017; Melis et al., 2019). To enhance the privacy of FL systems, several methods have been proposed based on homomorphic encryption (Damgård et al., 2012) or secure multiparty computation (MPC) (Mohassel and Zhang, 2017). However, these techniques incur significant computational overhead and do not protect the final output of the computation, leaving it vulnerable to privacy breaches (e.g., inference attacks). One of the state-of-the-art (SOTA) paradigms to mitigate privacy risks in FL is differential privacy (DP) (Dwork et al., 2014). Many works (Feldman, 2020; Naseri et al., 2020; Jayaraman and Evans, 2019; Stock et al., 2022) have demonstrated that adding proper DP noise during the local training phase or to the uploaded model parameters can prevent unintentional leakage of private raw data (e.g., via membership inference attacks (Song et al., 2017; Melis et al., 2019)).

Currently, federated learning with differential privacy has attracted significant interest (Naseri et al., 2020; Yang et al., 2023c; Xu et al., 2023a; Zhang et al., 2022). Apart from DP (in some papers also referred to as centralized DP (CDP) to distinguish it from LDP; the “DP” mentioned in this paper can be read as “CDP”), other DP models derived from DP, such as local differential privacy (LDP) and the shuffle model, are also widely used in FL (Zhao et al., 2020b; Yang et al., 2021; Liu et al., 2021a, 2023b); we refer to DP, LDP, and the shuffle model collectively as DP models in our work. Many scholars are exploring the trade-offs between privacy protection, communication efficiency, and model performance. As the volume of publications on differentially private federated learning (in our paper, differentially private federated learning means federated learning with DP models, i.e., DP, LDP, or the shuffle model) continues to grow, the task of summarizing and organizing this body of research has become both urgent and challenging. While there are existing surveys on privacy-preserving federated learning (Mothukuri et al., 2021; Yin et al., 2021), they mainly focus on secure multiparty computation or homomorphic encryption techniques. There are also surveys on differential privacy (Yang et al., 2023a; Wang et al., 2020d; Xiong et al., 2014, 2020), focusing on its applications in graph data (Mueller et al., 2022), medical data (Liu et al., 2023a), natural language processing (NLP) (Hu et al., 2023), or the industrial internet of things (IIoT) (Jiang et al., 2021), rather than on federated learning.

El Ouadrhiri and Abdelhadi (El Ouadrhiri and Abdelhadi, 2022) presented a survey of differential privacy for deep and federated learning. Their work primarily focuses on centralized deep learning, with only a small portion discussing LDP in FL. Recently, Zhang et al. (Zhang et al., 2023a) offered a review of differential privacy techniques in federated learning, but their discussion does not provide a comprehensive, technical, and systematic analysis of differential privacy within the federated learning context. Beyond that, our work differs from the above in several key aspects: 1) We propose a novel taxonomy of differentially private federated learning that categorizes DP, LDP, and the shuffle model in FL based on the guarantees and definitions of differential privacy, which sets us apart from previous classifications based on the presence of a centralized trusted server (Zhang et al., 2023a; Naseri et al., 2020; Yang et al., 2023c; Wei et al., 2021). We contend that their classification approach lacks precision. Federated learning is inherently a composite framework; for instance, in a cross-silo setting (in terms of the participating clients, the cross-device setting usually involves a larger number of clients consisting of mobile devices, whereas in the cross-silo setting clients are typically organizations or companies and their number is smaller (Kairouz et al., 2021b)), aside from a central server that aggregates model parameters, each client may also maintain a local server to manage data for training. Therefore, characterizing federated learning scenarios that lack a trusted central server for parameter aggregation solely as LDP is overly simplistic and inaccurate. 2) We elaborate on the relationship between DP, LDP, and the shuffle model, as well as their privacy guarantees in FL. 3) We not only explore DP models in horizontal FL but also delve into vertical FL and transfer FL.

As shown in Figure 1, our survey classifies differentially private federated learning along two dimensions: FL scenarios and DP models. In terms of FL scenarios, we classify existing works into Horizontal FL, Vertical FL, and Transfer FL according to the data partitioning setting, thereby covering all federated learning scenarios. Although the majority of work on differential privacy for FL has been conducted in the context of Horizontal FL, as more privacy issues are exposed in vertical FL and transfer FL (Pasquini et al., 2021; Jagielski et al., 2024), research on DP models in these settings is becoming increasingly important. In terms of DP models, we discuss differentially private federated learning from a new perspective: we distinguish between (centralized) DP, LDP, and the shuffle model by their definitions and privacy guarantees. While these models can generally be grouped under the broad umbrella of DP, there are strict distinctions when considering their specific definitions and privacy guarantees; their differentiation and relations are described in detail in Section 2.2.4. Rigorous differentiation of DP models in differentially private federated learning, based on definitions and privacy guarantees, is of significant importance. It not only clarifies the protected objects of different DP models but also enhances the understanding of the various implementation methods associated with them. For DP and the shuffle model, we further categorize works into sample level and client level based on the definition of neighboring datasets, corresponding to the protected objects (the notion of neighboring datasets does not exist in LDP).

Figure 1. A new taxonomy of Differentially Private FL.

In summary, our contributions are as follows:

  • We have reviewed the definitions and guarantees of DP, LDP, and the shuffle model, and summarized their relaxations and differences. We have further summarized the commonly used properties of DP models, privacy loss composition mechanisms, and perturbation methods in differentially private federated learning.

  • We have broken away from the traditional classification of differentially private FL and explored a taxonomy based on the definitions and guarantees of differential privacy, providing a new, rigorous classification framework for differentially private FL.

  • We have discussed and summarized over 70 recent articles on differentially private FL, as shown in Table 3, and clarified the protection targets of different DP models within federated learning. In addition to studying the use of differential privacy in horizontal FL, we have also shown how DP models protect data privacy in vertical FL and transfer FL.

  • We have summarized the applications of differentially private FL by data types and real-world implementation, and introduced related works in each domain.

  • Based on the above research discussion, we have proposed 5 promising directions for future research.

The rest of the paper is organized as follows. Section 2 summarizes the techniques of federated learning and the three DP models, as well as the fundamental mechanisms of differentially private FL. In Section 3, we discuss the various DP models in horizontal FL. Section 4 presents differential privacy techniques in vertical FL and transfer FL. Section 5 further explores real-world applications of differentially private federated learning. Open challenges and future directions are introduced in Section 6, followed by the conclusion in Section 7.

2. FL and DP Models

In this section, we will describe the three FL scenarios and explain the definitions and relations of the three DP models. Apart from that, we will present the basic properties and popular privacy loss composition mechanisms of DP models. Lastly, we will introduce the fundamental perturbation mechanisms used in differentially private FL.

2.1. Federated Learning

Federated learning is a learning framework where multiple clients collaborate on a machine learning problem, coordinated by a central server (this is the core distinction from fully decentralized, peer-to-peer learning (Kairouz et al., 2021b)). Instead of sharing or transferring raw data, clients keep their data locally and send model parameters for immediate aggregation to achieve the learning objective. Based on the characteristics of the data held by each participant, FL can be divided into the following categories (Li et al., 2021c).

2.1.1. Horizontal Federated Learning (HFL)

HFL refers to the FL scenarios where participants hold different samples while sharing the same feature space. The objective of HFL is to collaboratively train a global model $\mathbf{w}^{\star}$ with all clients, aiming to minimize the global objective function $F(\cdot)$, essentially seeking to find:

(1) \mathbf{w}^{\star}\triangleq\min_{\mathbf{w}}F(\mathbf{w}),\ \text{where}\ F(\mathbf{w}_{t})\triangleq\sum_{k=1}^{K}p_{k}L_{k}(\mathbf{w}_{t}^{k}).

Here, $t\in[T]$ denotes the communication round, $k\in[K]$ denotes the client index, $L_{k}(\cdot)$ denotes the loss function of client $k$, $\mathbf{w}^{k}_{t}$ signifies the model uploaded by client $k$ at round $t$, while $\mathbf{w}_{t}$ denotes the server-aggregated model. The weights are defined as $p_{k}=\frac{|D_{k}|}{|D|}$ with $\sum_{k=1}^{K}p_{k}=1$, where $D_{k}$ represents the local dataset of client $k$ and $D$ is the union of all $D_{k}$.
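For concreteness, the following is a minimal, illustrative Python sketch (not taken from any specific paper) of the weighted server aggregation in Eq. (1); the function and variable names are our own.

```python
import numpy as np

def fedavg_aggregate(client_models, client_sizes):
    """Weighted aggregation per Eq. (1): w_t = sum_k p_k * w_t^k with p_k = |D_k|/|D|.
    Each client model is represented as a flat NumPy parameter vector."""
    total = float(sum(client_sizes))
    weights = [n / total for n in client_sizes]              # p_k
    return sum(p * w for p, w in zip(weights, client_models))

# Toy usage: three clients holding datasets of different sizes.
models = [np.array([1.0, 2.0]), np.array([3.0, 0.0]), np.array([0.0, 1.0])]
sizes = [100, 300, 600]
global_model = fedavg_aggregate(models, sizes)
```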

2.1.2. Vertical Federated Learning (VFL)

Compared to HFL, which processes data from multiple clients sharing the same feature set, VFL trains a better predictor by combining all features from different clients. VFL assumes that the data is partitioned across the feature space. The data vector $x_{i}\in\mathbb{R}^{1\times d}$ is distributed among $K$ participants as $\{x_{i,k}\in\mathbb{R}^{1\times d_{k}}\}_{k=1}^{K}$ in $D=\{x_{i},y_{i}\}_{i=1}^{N}$, where $d_{k}$ is the feature dimension of the $k^{th}$ participant, $k\in[K-1]$, and the $K^{th}$ participant holds the label information $y_{i}=y_{i,K}$ (Yang et al., 2019). The participant with the labels, the $K^{th}$ participant, is referred to as the server, while the others are referred to as clients (some articles refer to them as “active” and “passive” parties (Liu et al., 2024); however, to maintain consistency with HFL, we refer to them as the “server” and “clients” here). Each client $k$ has a dataset $D_{k}\triangleq\{x_{i,k}\}_{i=1}^{N}$, and the server has a dataset $D_{K}\triangleq\{x_{i,K},y_{i,K}\}_{i=1}^{N}$.

The global model $\mathbf{w}$ decomposes into the client models $\mathbf{w}_{k}$, $k\in[K-1]$, and the server model $\mathbf{w}_{K}$. Each $\mathbf{w}_{k}$ accesses dataset $D_{k}$ for forward propagation to yield the result $h_{k}$, while $\mathbf{w}_{K}$ accesses $D_{K}$ for both forward and backward propagation. The objective of VFL is to collaboratively train a global model $\mathbf{w}^{\star}$ with all clients, aiming to minimize the global objective function $F(\cdot)$, essentially seeking to find:

(2) \mathbf{w}^{\star}\triangleq\min_{\mathbf{w}}F(\mathbf{w}),\ \text{where}\ F(\mathbf{w})\triangleq\frac{1}{N}\sum_{i=1}^{N}L_{K}\big(\mathbf{w}_{K};h_{1}(x_{i,1},\mathbf{w}_{1}),\ldots,h_{K}(x_{i,K},\mathbf{w}_{K}),y_{i,K}\big),

where $L_{K}(\cdot)$ denotes the loss function of the server.

2.1.3. Transfer Federated Learning (TFL)

TFL was proposed by Liu et al. (Liu et al., 2020c); it refers to FL settings where there is limited or no overlap in the feature space or sample space between clients. In this scenario, there are often multiple source-domain datasets $D_{k}\triangleq\{(x_{i,k},y_{i,k})\}_{i=1}^{N_{k}}$, $k\in[K-1]$, and a target-domain dataset $D_{K}\triangleq\{x_{i,K}\}_{i=1}^{N_{K}}$. Here, $D_{k}$ and $D_{K}$ are held by different participants, with those holding the source-domain datasets called clients and the one holding the target-domain dataset called the server. Under this setup, the goal is to enable multiple parties to jointly train a model that can generalize to the target-domain dataset held by the server, without exposing data to each other. In most cases, there are few or even no labels in the target-domain dataset, in which case the source-domain datasets from each client are first used to generate pseudo-labels for the target-domain dataset.

2.2. DP Models

Currently, there are three main DP models: DP, LDP, and the shuffle model. They have different definitions and privacy guarantees, but there are connections between them. The three DP models, their relations, and some properties of DP are presented as follows.

2.2.1. DP

Differential privacy (Dwork et al., 2006) is a rigorous mathematical framework that formally defines data privacy; it was originally used for privacy protection on data collection servers. It resists differential attacks based on statistical queries by adding an appropriate amount of noise to the query output, making it almost impossible for adversaries to discern the statistical difference between two adjacent datasets. There is a trusted server in the DP setting. DP requires that a single entry in the input dataset must not lead to statistically significant changes in the output (Dwork et al., 2006, 2014). The formal definition of DP is presented as follows.

Definition 2.1.

(DP (Dwork et al., 2014)). A randomized algorithm $\mathcal{A}:\mathbb{X}^{n}\to\mathbb{Y}$ satisfies $(\epsilon,\delta)$-DP if for any two neighboring datasets $D\simeq D^{\prime}\in\mathbb{X}^{n}$ that differ in only a single entry, we have

\forall S\subseteq\mathbb{Y}:\ \mathrm{Pr}[\mathcal{A}(D)\in S]\leq e^{\epsilon}\,\mathrm{Pr}[\mathcal{A}(D^{\prime})\in S]+\delta.

Here, $\epsilon>0$ controls the level of privacy guarantee in the worst case: the smaller $\epsilon$, the stronger the privacy level. The factor $\delta\geq 0$ is the failure probability that the property does not hold. In practice, the value of $\delta$ should be negligible (Papernot et al., 2018), in particular less than $\frac{1}{|D|}$. When $\delta=0$, $(\epsilon,\delta)$-DP is called $\epsilon$-DP.

Some new definitions have been derived from DP and applied in federated learning. For example, (Ghazi et al., 2021) defines label differential privacy (label DP) as follows. It considers the situation where only the labels are sensitive information that should be protected, and it is usually applied to differentially private vertical federated learning.

Definition 2.2.

(Label DP (Ghazi et al., 2021)). A randomized algorithm $\mathcal{A}:\mathbb{X}^{n}\to\mathbb{Y}$ satisfies $(\epsilon,\delta)$-label DP if for any two neighboring datasets $D\simeq D^{\prime}\in\mathbb{X}^{n}$ that differ in the label of a single sample, we have

\forall S\subseteq\mathbb{Y}:\ \mathrm{Pr}[\mathcal{A}(D)\in S]\leq e^{\epsilon}\,\mathrm{Pr}[\mathcal{A}(D^{\prime})\in S]+\delta.

On the other hand, Bayesian differential privacy (BDP) (Triastcyn and Faltings, 2020) is interested in the change in the attacker's posterior distribution after observing the private model, compared to the prior. The original definition of BDP is very similar to that of DP, except that BDP assumes all samples in the dataset are drawn from the same distribution $p(x)$.

Definition 2.3.

(Bayesian DP (Triastcyn and Faltings, 2020)) A randomized algorithm $\mathcal{A}:\mathbb{X}^{n}\to\mathbb{Y}$ satisfies $(\epsilon,\delta)$-BDP if for any two neighboring datasets $D$ and $D^{\prime}$ that differ in a single data sample $x^{\prime}\sim p(x)$, where $p(x)$ is a given probability distribution, we have

\forall S\subseteq\mathbb{Y}:\ \mathrm{Pr}[\mathcal{A}(D)\in S]\leq e^{\epsilon}\,\mathrm{Pr}[\mathcal{A}(D^{\prime})\in S]+\delta.

2.2.2. LDP

Subsequent research introduced the local differential privacy (LDP) framework, which is more stringent than DP. The difference between LDP and DP is that LDP does not require a trusted data collector: the data perturbation function is transferred from the data collector to each user. Each user perturbs their original data with a privacy-preserving algorithm and then uploads the perturbed data to the data collector. The formal definition of LDP is presented as follows.

Definition 2.4.

(LDP (Evfimievski et al., 2003)). A randomized mechanism $\mathcal{R}:\mathbb{X}\rightarrow\mathbb{Y}$ satisfies $(\epsilon,\delta)$-LDP if for any two inputs $x,x^{\prime}\in\mathbb{X}$, we have

\forall t\in\mathbb{Y}:\ \mathrm{Pr}[\mathcal{R}(x)=t]\leq e^{\epsilon}\,\mathrm{Pr}[\mathcal{R}(x^{\prime})=t]+\delta.

As in DP, $0\leq\delta\leq 1$ indicates the failure probability that the property does not hold, and it should be negligible. When $\delta=0$, $(\epsilon,\delta)$-LDP is called $\epsilon$-LDP. The definition provided here is the general version, but the vast majority of works based on LDP use the $\epsilon$-LDP definition.

From Definition 2.4, it can be seen that LDP ensures that the mechanism $\mathcal{R}$ satisfies $(\epsilon,\delta)$-LDP by controlling the similarity of the outputs of any two records. In short, given a certain output of the privacy-preserving mechanism $\mathcal{R}$, it is almost impossible to infer which record was its input. In DP, the privacy algorithm $\mathcal{A}$ is defined over neighboring datasets, so it requires a trusted third-party data collector to protect the data analysis results. We can see that the main difference between DP and LDP lies in the definition of neighboring datasets.

LDP requires that all pairs of sensitive data satisfy the same $\epsilon$ privacy guarantee, which may hide too much information from the datasets, leading to insufficient utility for certain applications. Gursoy et al. (Gursoy et al., 2019) extended metric-based relaxations of differential privacy to LDP and proposed condensed LDP (CLDP) to improve utility; it measures the level of privacy guarantee between any pair of sensitive data based on their distance.

Definition 2.5.

(Condensed LDP (Gursoy et al., 2019)). A randomized mechanism $\mathcal{R}:\mathbb{X}\rightarrow\mathbb{Y}$ satisfies $\epsilon$-CLDP if for any two inputs $x,x^{\prime}\in\mathbb{X}$, we have

\forall t\in\mathbb{Y}:\ \mathrm{Pr}[\mathcal{R}(x)=t]\leq e^{\epsilon\cdot d(x,x^{\prime})}\,\mathrm{Pr}[\mathcal{R}(x^{\prime})=t],

where $d:\mathbb{X}\times\mathbb{X}\to[0,\infty)$ is a distance function that takes two items $x,x^{\prime}\in\mathbb{X}$ as input.

In $\epsilon$-CLDP, the degree of indistinguishability is controlled not only by $\epsilon$ but also by the input distance $d(\cdot,\cdot)$. Consequently, as $d$ increases, $\epsilon$ must decrease to compensate.

2.2.3. Shuffle Model

In recent years, research has introduced the shuffle model of differential privacy, built upon the foundation of LDP. It adds a trusted shuffler between the server and the clients (some researchers also assume the shuffler is semi-honest (Goldreich, 2009), in which case a cryptography-based protocol is required to perform a secure shuffle without accessing the values; since many secure shuffle protocols handle this case, we simply assume the shuffler is honest and focus on the DP part), which shuffles the data items submitted by clients to achieve anonymization. In the shuffle model, each client satisfies an LDP guarantee with respect to the shuffler, and the mechanism then achieves a DP guarantee with respect to the server, since the shuffled data forms a dataset and gives rise to the notion of neighboring datasets.

The shuffle model originated from the Encode, Shuffle, Analyze framework proposed by Bittau et al. (Bittau et al., 2017) and was formally defined by Cheu et al. (Cheu et al., 2019). Following these works, a protocol in the shuffle model consists of three components $\mathcal{P}=\mathcal{A}\circ\mathcal{S}\circ\mathcal{R}^{n}$, aiming to achieve $(\epsilon_{c},\delta)$-DP. Here, $\mathcal{R}:\mathbb{X}\rightarrow\mathbb{Y}$ is a local randomizer executed on the client side, usually assumed to satisfy $\epsilon_{l}$-LDP. $\mathcal{S}:\mathbb{Y}^{n}\to\mathbb{Y}^{n}$ is a shuffler that applies a random permutation to its inputs, achieving anonymity. $\mathcal{A}:\mathbb{Y}^{n}\to\mathbb{Z}$ is the analyzer, aggregating the received values for statistics. According to post-processing invariance, if the protocol $\mathcal{M}=\mathcal{S}\circ\mathcal{R}^{n}$ satisfies $(\epsilon_{c},\delta)$-DP, the whole protocol $\mathcal{P}$ also satisfies $(\epsilon_{c},\delta)$-DP. Thanks to the anonymity brought by shuffling, each user's privacy is protected not only by their local randomization but also by the randomness provided by other clients, leading to a smaller centralized privacy loss (i.e., $\epsilon_{c}$) compared to the local privacy budget actually used (i.e., $\epsilon_{l}$), as shown in Figure 5. In theory, we call this phenomenon privacy amplification, which yields a privacy loss $\epsilon_{c}$ that is roughly a factor of $O(\sqrt{n})$ smaller (Balle et al., 2019; Erlingsson et al., 2019). This also means that to achieve a similar privacy level (corresponding to the same $\epsilon$), less noise is required from the local randomizer.
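As a minimal illustration of the pipeline $\mathcal{P}=\mathcal{A}\circ\mathcal{S}\circ\mathcal{R}^{n}$, the Python sketch below (our own, for a single private bit per client) uses randomized response as the local randomizer, a random permutation as the shuffler, and a debiasing analyzer; the privacy amplification bound itself is not computed here.

```python
import math
import random

def local_randomizer(bit, eps_l):
    """R: epsilon_l-LDP randomized response on one bit."""
    p_keep = math.exp(eps_l) / (math.exp(eps_l) + 1)
    return bit if random.random() < p_keep else 1 - bit

def shuffler(reports):
    """S: a uniformly random permutation breaks the link to client identities."""
    shuffled = list(reports)
    random.shuffle(shuffled)
    return shuffled

def analyzer(reports, eps_l):
    """A: debiased estimate of the number of ones among the n clients."""
    n, c = len(reports), sum(reports)
    return (c * (math.exp(eps_l) + 1) - n) / (math.exp(eps_l) - 1)

bits = [random.randint(0, 1) for _ in range(1000)]          # clients' private bits
reports = shuffler([local_randomizer(b, eps_l=1.0) for b in bits])
estimate = analyzer(reports, eps_l=1.0)
```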

2.2.4. Relation between DP, LDP and shuffle model

In this section, we discuss the relationships and differences among DP, LDP, and the shuffle model.

(a) Relation between LDP and DP.
(b) Shuffle model, DP and LDP.
Figure 2. Relation between DP, LDP and shuffle model.

DP and LDP. We can see that the main difference between DP and LDP lies in their definitions: DP relies on neighboring datasets as input, while LDP has no concept of neighboring datasets. Alternatively, from another perspective, when there is only one data sample, DP and LDP are equivalent: LDP (Definition 2.4) can be derived from DP (Definition 2.1) when the inputs $x$, $x^{\prime}$ are taken to be datasets of only one record. Since the size of the local dataset is 1, $x$ and $x^{\prime}$ are neighbors for all $x,x^{\prime}\in\mathbb{X}$. Therefore, LDP is a stronger condition, as it requires the mechanism to satisfy DP for any two values of the data domain $\mathbb{X}$. As shown in Figure 2(a), LDP implies DP, but the opposite does not hold (Paul and Mishra, 2020; Chen et al., 2023b).

Shuffle model, DP and LDP. The shuffle model encompasses techniques and concepts from both LDP and DP. As shown in Figure 2(b), it begins with the definition of LDP, where each client satisfies $\epsilon_{l}$-LDP protection, and then sends its report to the server through the shuffler. At this point, the shuffle model anonymizes the model parameters and forms a dataset, thereby giving rise to the concept of neighboring datasets. When facing the server, the collection of all uploaded models satisfies $(\epsilon_{c},\delta_{c})$-DP. As mentioned earlier, while LDP already implies DP, the shuffle model achieves a much tighter privacy budget bound in DP, meaning that $\epsilon_{c}\ll\epsilon_{l}$.

2.2.5. Basic Properties and Loss Composition Mechanisms

There are some basic properties that apply to both DP and LDP. The first is post-processing, which means that any post-processing of the output of an $(\epsilon,\delta)$-DP algorithm remains $(\epsilon,\delta)$-DP.

Lemma 2.6 (Post-processing (Dwork et al., 2014)).

Let $\mathcal{A}:\mathbb{X}\rightarrow\mathbb{Y}$ be a mechanism satisfying $(\epsilon,\delta)$-DP and let $f:\mathbb{Y}\rightarrow\mathbb{R}$ be a random function; then $f\circ\mathcal{A}:\mathbb{X}\rightarrow\mathbb{R}$ also satisfies $(\epsilon,\delta)$-DP.

From Lemma 2.6, we can see that privacy cannot be weakened by any post-processing. For example, in FL we can first make the gradients satisfy $(\epsilon,\delta)$-DP; because gradient descent is a post-processing operation, the resulting model parameters also satisfy $(\epsilon,\delta)$-DP. The second property is parallel composition, which divides the dataset into disjoint chunks and runs a differentially private mechanism separately on each chunk. Since the chunks are disjoint, each individual's data appears in only one chunk; therefore, even if the differentially private mechanism is executed multiple times, it runs only once on each individual's data.

Lemma 2.7 (Parallel Composition Theorem (McSherry, 2009)).

Let mechanism $\mathcal{A}$ consist of a sequence of adaptive mechanisms $\mathcal{A}_{1}(D_{1}),\mathcal{A}_{2}(D_{2}),\ldots,\mathcal{A}_{k}(D_{k})$ that satisfy $(\epsilon_{1},\delta_{1})$-DP, $(\epsilon_{2},\delta_{2})$-DP, \ldots, $(\epsilon_{k},\delta_{k})$-DP, respectively, where $D_{1},D_{2},\ldots,D_{k}$ are disjoint chunks of the dataset $D$. Then $\mathcal{A}$ satisfies $(\max_{i\in[1,..,k]}\epsilon_{i},\max_{i\in[1,..,k]}\delta_{i})$-DP.

In differentially private HFL, the dataset held by each client can be considered an independent, disjoint dataset, making it compatible with the principle of parallel composition. In differentially private TFL, dividing the private data into disjoint teacher datasets so that the aggregation complies with parallel composition is also a common practice. The third property is sequential composition, which allows the privacy budgets of multiple private algorithms to be combined into an overall privacy budget.

Lemma 2.8 (Basic Sequential Composition Theorem (Dwork et al., 2014)).

If mechanism $\mathcal{A}$ consists of a sequence of adaptive mechanisms $\mathcal{A}_{1}(D),\mathcal{A}_{2}(D),\ldots,\mathcal{A}_{k}(D)$ that satisfy $(\epsilon_{1},\delta_{1})$-DP, $(\epsilon_{2},\delta_{2})$-DP, \ldots, $(\epsilon_{k},\delta_{k})$-DP, respectively, then $\mathcal{A}$ satisfies $(\sum_{i=1}^{k}\epsilon_{i},\sum_{i=1}^{k}\delta_{i})$-DP.

Obtaining a model in differentially private FL often requires multiple rounds of iteration, where each iteration can be considered a data access. By designing a differentially private algorithm $\mathcal{A}_{i}$, $i\in[k]$, that satisfies $(\epsilon,\delta)$-DP for each of the $k$ iterations, we can leverage the sequential composition theorem to bound the privacy budget of the entire process by $(k\epsilon,k\delta)$-DP. However, this bound is not tight, and the advanced composition theorem improves it to roughly $(O(\sqrt{k}\epsilon),O(k\delta))$-DP (Dwork et al., 2010b).
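To make the difference concrete, the following illustrative Python sketch compares basic composition with one common statement of the advanced composition theorem (Dwork et al., 2010b); the slack parameter delta_prime and the numeric values are our own choices.

```python
import math

def basic_composition(eps, delta, k):
    """Basic sequential composition over k adaptive (eps, delta)-DP accesses."""
    return k * eps, k * delta

def advanced_composition(eps, delta, k, delta_prime=1e-6):
    """Advanced composition: epsilon grows roughly with sqrt(k) for small eps."""
    eps_total = eps * math.sqrt(2 * k * math.log(1 / delta_prime)) \
        + k * eps * (math.exp(eps) - 1)
    return eps_total, k * delta + delta_prime

print(basic_composition(0.1, 1e-7, 100))     # (10.0, 1e-05)
print(advanced_composition(0.1, 1e-7, 100))  # roughly (6.3, 1.1e-05), a tighter bound
```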

More privacy loss composition mechanisms for DP. However, the number of iterations required for model convergence in differentially private FL is often enormous, and the advanced composition theorem no longer yields a compact estimate of the privacy loss. This has prompted the emergence of tighter accounting methods for composing privacy loss, among which commonly used ones include zero-concentrated differential privacy (zCDP) (Bun and Steinke, 2016a), the Moments Accountant (MA) (Abadi et al., 2016), Rényi differential privacy (RDP) (Mironov, 2017), and Gaussian differential privacy (GDP) (Dong et al., 2021). All these accounting mechanisms achieve a tighter analysis of the cumulative privacy loss by leveraging the fact that the privacy loss random variable is tightly concentrated around its expectation. It is important to note that these mechanisms are simply different methods for analyzing privacy composition, and their purpose is a tighter analysis of the guaranteed privacy; all of them can be converted to $(\epsilon,\delta)$-DP. This means that, under these relaxed definitions, the same $(\epsilon,\delta)$ level can be attained by adding relatively less noise. In practice, we often utilize these composition mechanisms to accumulate privacy losses and then convert them to $(\epsilon,\delta)$-DP for comparison.

The Moments Accountant (MA) technique (Abadi et al., 2016) uses the cumulant generating function (CGF) to characterize the privacy bound between two probability distributions. The definition of the CGF is as follows:

Definition 2.9.

(CGF (Abadi et al., 2016)) Given two probability distributions $P$ and $Q$, the CGF of order $\alpha>1$ is:

(3) G_{\alpha}(P\|Q)=\log\mathbb{E}_{x\sim P(x)}\Big[e^{\alpha\log\frac{P(x)}{Q(x)}}\Big]=\log\mathbb{E}_{x\sim Q(x)}\Big[\Big(\frac{P(x)}{Q(x)}\Big)^{\alpha+1}\Big],

where $\mathbb{E}_{x\sim Q(x)}$ denotes the expected value of $x$ over the distribution $Q$, and $P(x)$ and $Q(x)$ denote the densities of $P$ and $Q$ at $x$, respectively.

Based on the CGF, the privacy bound tracked by MA is defined as follows:

Definition 2.10.

(MA (Abadi et al., 2016)). For any neighboring datasets $D,D^{\prime}\in\mathcal{X}^{n}$ and all $\alpha\in(1,\infty)$, a randomized mechanism $\mathcal{A}:\mathcal{X}^{n}\rightarrow\mathbb{R}^{d}$ satisfies $(\epsilon,\delta)$-DP if

(4) G_{\alpha}(\mathcal{A}(D)\|\mathcal{A}(D^{\prime}))\leq\epsilon,\ \text{where}\ \delta=\min_{\alpha}e^{G_{\alpha}(\mathcal{A}(D)\|\mathcal{A}(D^{\prime}))-\alpha\epsilon}.

Rényi divergence is another metric measuring the distinguishability of two probability distributions, defined as follows:

Definition 2.11.

(Rényi Divergence (Van Erven and Harremos, 2014)). Given two probability distributions $P$ and $Q$, the Rényi divergence of order $\alpha>1$ is:

(5) D_{\alpha}(P\|Q)=\frac{1}{\alpha-1}\ln\mathbb{E}_{x\sim Q(x)}\Big[\Big(\frac{P(x)}{Q(x)}\Big)^{\alpha}\Big],

where $\mathbb{E}_{x\sim Q(x)}$ denotes the expected value of $x$ over the distribution $Q$, and $P(x)$ and $Q(x)$ denote the densities of $P$ and $Q$ at $x$, respectively.

Based on Rényi divergence, zCDP can be obtained as follows:

Definition 2.12.

(zCDP (Bun and Steinke, 2016a)) For any neighboring datasets $D,D^{\prime}\in\mathcal{X}^{n}$ and all $\alpha\in(1,\infty)$, a randomized mechanism $\mathcal{A}:\mathcal{X}^{n}\rightarrow\mathbb{R}^{d}$ satisfies $R$-zCDP if

(6) D_{\alpha}(\mathcal{A}(D)\|\mathcal{A}(D^{\prime}))\leq R\alpha.

The following Lemma 2.13 gives the standard form for converting $R$-zCDP to $(\epsilon,\delta)$-DP.

Lemma 2.13.

(Conversion from zCDP to DP (Bun and Steinke, 2016b)). If a randomized mechanism $\mathcal{A}:D\rightarrow\mathbb{R}$ satisfies $R$-zCDP, then it satisfies $(R+2\sqrt{R\log(1/\delta)},\delta)$-DP for any $0<\delta<1$.

Rényi differential privacy (RDP) is also based on the definition of Rényi divergence as follows:

Definition 2.14.

(RDP (Mironov, 2017)) For any neighboring datasets $D,D^{\prime}\in\mathcal{X}^{n}$, a randomized mechanism $\mathcal{A}:\mathcal{X}^{n}\rightarrow\mathbb{R}^{d}$ satisfies $(\alpha,R)$-RDP if

(7) D_{\alpha}(\mathcal{A}(D)\|\mathcal{A}(D^{\prime}))\leq R.

The following Lemma 2.15 gives the standard form for converting $(\alpha,R)$-RDP to $(\epsilon,\delta)$-DP.

Lemma 2.15.

(Conversion from RDP to DP (Balle et al., 2020)). If a randomized mechanism $\mathcal{A}:D\rightarrow\mathbb{R}$ satisfies $(\alpha,R)$-RDP, then it satisfies $(R+\ln((\alpha-1)/\alpha)-(\ln\delta+\ln\alpha)/(\alpha-1),\delta)$-DP for any $0<\delta<1$.
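The conversions in Lemma 2.13 and Lemma 2.15 are simple closed-form expressions; the Python sketch below implements them directly (the list of orders and the per-order RDP values at the end are illustrative placeholders).

```python
import math

def zcdp_to_dp(R, delta):
    """Lemma 2.13: R-zCDP implies (R + 2*sqrt(R*log(1/delta)), delta)-DP."""
    return R + 2 * math.sqrt(R * math.log(1 / delta))

def rdp_to_dp(alpha, R, delta):
    """Lemma 2.15: (alpha, R)-RDP implies (eps, delta)-DP with
    eps = R + ln((alpha-1)/alpha) - (ln(delta) + ln(alpha)) / (alpha - 1)."""
    return R + math.log((alpha - 1) / alpha) \
        - (math.log(delta) + math.log(alpha)) / (alpha - 1)

# In practice, an accountant tracks RDP at many orders and reports the tightest epsilon.
orders = [1.5, 2, 4, 8, 16, 32, 64]
rdp_values = [0.01 * a for a in orders]     # illustrative per-order RDP guarantees
eps = min(rdp_to_dp(a, R, delta=1e-5) for a, R in zip(orders, rdp_values))
```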

Dong et al. (Dong et al., 2021) used hypothesis testing to quantify the distinguishability between $\mathcal{A}(D)$ and $\mathcal{A}(D^{\prime})$. They considered the hypothesis testing problem $H_{0}:P$ vs. $H_{1}:Q$ with a rejection rule $\phi\in[0,1]$. They defined the type I error as $\alpha_{\phi}=\mathbb{E}_{P}[\phi]$, which is the probability of rejecting the null hypothesis $H_{0}$ by mistake, and the type II error as $\beta_{\phi}=1-\mathbb{E}_{Q}[\phi]$, which is the probability of wrongly retaining $H_{0}$ when the alternative $H_{1}$ is true. The trade-off function gives the minimal type II error achievable at level $\alpha$ of the type I error, as follows.

Definition 2.16.

(Trade-off function (Dong et al., 2021)). Given two probability distributions $P$ and $Q$, their trade-off function is:

(8) T(P,Q)(\alpha)=\inf_{\phi}\{\beta_{\phi}:\alpha_{\phi}\leq\alpha\}.

Let $f$ be a trade-off function. An algorithm $\mathcal{A}$ is $f$-differentially private if $T(\mathcal{A}(D),\mathcal{A}(D^{\prime}))\geq f$ for any two neighboring datasets $D$ and $D^{\prime}$ (Dong et al., 2021). When the trade-off function is defined between two Gaussian distributions, one can derive a subfamily of $f$-differential privacy guarantees called GDP, as follows.

Definition 2.17.

(GDP (Dong et al., 2021)) Let $\Phi$ denote the cumulative distribution function of the standard normal distribution. For any neighboring datasets $D,D^{\prime}\in\mathcal{X}^{n}$ and $\mu\geq 0$, a randomized mechanism $\mathcal{A}:\mathcal{X}^{n}\rightarrow\mathbb{R}^{d}$ satisfies $\mu$-GDP if

(9) T(\mathcal{A}(D),\mathcal{A}(D^{\prime}))\geq G_{\mu},

where $G_{\mu}:=T(\mathcal{N}(0,1),\mathcal{N}(\mu,1))\equiv\Phi(\Phi^{-1}(1-\alpha)-\mu)$.

The following Lemma 2.18 gives the standard form for converting $\mu$-GDP to $(\epsilon,\delta)$-DP.

Lemma 2.18.

(Conversion from GDP to DP (Dong et al., 2021)). If a randomized mechanism $\mathcal{A}:D\rightarrow\mathbb{R}$ satisfies $\mu$-GDP, then it satisfies $(\epsilon,\Phi(-\frac{\epsilon}{\mu}+\frac{\mu}{2})-e^{\epsilon}\Phi(-\frac{\epsilon}{\mu}-\frac{\mu}{2}))$-DP for any $\epsilon>0$.
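Lemma 2.18 can likewise be evaluated numerically; the short, illustrative Python sketch below computes the $\delta$ attained at a chosen $\epsilon$ under $\mu$-GDP (the values of $\mu$ and $\epsilon$ are arbitrary examples).

```python
from math import exp
from statistics import NormalDist

def gdp_to_dp_delta(mu, eps):
    """Lemma 2.18: mu-GDP implies (eps, delta)-DP with
    delta = Phi(-eps/mu + mu/2) - exp(eps) * Phi(-eps/mu - mu/2)."""
    Phi = NormalDist().cdf
    return Phi(-eps / mu + mu / 2) - exp(eps) * Phi(-eps / mu - mu / 2)

print(gdp_to_dp_delta(mu=1.0, eps=3.0))   # delta achieved at eps = 3 under 1-GDP
```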

Another popular property is subsampling, which achieves privacy amplification by running the differentially private algorithm on a subset of the private samples instead of all of them (Li et al., 2012). The privacy amplification guarantee differs across subsampling methods (Imola and Chaudhuri, 2021; Zhu and Wang, 2019).

2.3. Fundamental Mechanisms for Differentially Private FL

In this section, we introduce the fundamental perturbation mechanisms used in differentially private FL. As depicted in Table 1, these mechanisms are categorized along two dimensions: DP or LDP, and continuous or discrete data types. Our classification focuses on their primary application scenarios in FL. Notably, mechanisms such as the Gaussian, Laplace, and EM can be utilized in both DP and LDP contexts.

Table 1. Fundamental perturbation mechanisms for differentially private FL
Data Types | DP | LDP
Continuous | Gaussian (Dwork et al., 2014) | Laplace (Dwork et al., 2006), Duchi (Duchi et al., 2013), Harmony (Nguyên et al., 2016), PM (Wang et al., 2019b)
Discrete | Discrete Gaussian (Wang et al., 2020a), Skellam (Agarwal et al., 2021) | GRR (Wang et al., 2017), RAPPOR (Erlingsson et al., 2014), EM (McSherry and Talwar, 2007)

2.3.1. Perturbation Mechanisms for DP in FL

In DP within FL, the Gaussian mechanism is the most commonly used. Although the Laplace mechanism was originally defined in the context of DP, it is not widely utilized in this scenario. More recently, discrete variants of the Gaussian mechanism, such as the Discrete Gaussian and Skellam mechanisms, have been widely applied in DP-FL.

Gaussian Mechanism (Dwork et al., 2014) is the most popular mechanism for DP in FL. Let $f:x\to\mathbb{X}$ be a function related to a query. For any $x\in\mathbb{X}$, this mechanism adds noise $n\sim\mathcal{N}(0,\sigma^{2})$ to $f(x)$ to ensure $(\epsilon,\delta)$-DP. Here $\mathcal{N}(0,\sigma^{2})$ denotes the Gaussian (normal) distribution with mean 0 and variance $\sigma^{2}$, where $\sigma^{2}=\frac{2\Delta f^{2}\log(1.25/\delta)}{\epsilon^{2}}$ and $\Delta f$ is the sensitivity of $f$.
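As a minimal, illustrative Python sketch of this calibration (using the classical bound above, which is commonly stated for $\epsilon\leq 1$; names and values are our own):

```python
import numpy as np

def gaussian_mechanism(value, sensitivity, eps, delta):
    """Add N(0, sigma^2) noise with sigma^2 = 2 * sensitivity^2 * log(1.25/delta) / eps^2
    to each coordinate of the query output."""
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return value + np.random.normal(0.0, sigma, size=np.shape(value))

noisy = gaussian_mechanism(np.array([0.7, -1.2]), sensitivity=1.0, eps=0.5, delta=1e-5)
```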

More and more mechanisms perform secure aggregation via encryption to satisfy differential privacy. Since encryption must be performed over finite fields, discrete noise mechanisms have been proposed.

Discrete Gaussian Mechanism (Wang et al., 2020a) has been widely adopted for this purpose, operating by adding noise drawn from a discrete Gaussian distribution. Let $f:x\to\mathbb{X}$ be a function related to a query with sensitivity 1. For any $x\in\mathbb{X}$, the Discrete Gaussian mechanism adds noise $N_{\mathbb{Z}}(0,\sigma^{2})$ to $f(x)$ to ensure $(\alpha,\alpha/(2\sigma^{2}))$-RDP. Here $N_{\mathbb{Z}}(0,\sigma^{2})$ is the discrete Gaussian distribution with mean 0 and variance $\sigma^{2}$, defined as follows.

Definition 2.19.

(Discrete Gaussian Distribution (Canonne et al., 2020)). Let $\mu\in\mathbb{Z}$, $\sigma\in\mathbb{R}$ with $\sigma\geq 0$. The discrete Gaussian distribution with mean $\mu$ and variance $\sigma^{2}$ is denoted $N_{\mathbb{Z}}(\mu,\sigma^{2})$. It is a probability distribution supported on the integers and defined by

(10) \forall x\in\mathbb{Z},\quad\mathbb{P}_{X\sim N_{\mathbb{Z}}(\mu,\sigma^{2})}[X=x]=\frac{e^{-(x-\mu)^{2}/2\sigma^{2}}}{\sum_{y\in\mathbb{Z}}e^{-(y-\mu)^{2}/2\sigma^{2}}}.
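A simple (approximate) way to sample from this distribution is to normalize the density in Eq. (10) over a truncated integer support, as in the illustrative sketch below; exact samplers such as rejection sampling are preferable in practice, and the truncation width is our own choice.

```python
import numpy as np

def sample_discrete_gaussian(sigma, size, trunc=20):
    """Sample approximately from N_Z(0, sigma^2) by normalizing Eq. (10)
    over the truncated support [-trunc*sigma, trunc*sigma]."""
    radius = int(np.ceil(trunc * sigma))
    support = np.arange(-radius, radius + 1)
    probs = np.exp(-support.astype(float) ** 2 / (2.0 * sigma ** 2))
    probs /= probs.sum()
    return np.random.choice(support, size=size, p=probs)

noise = sample_discrete_gaussian(sigma=4.0, size=5)
```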

Skellam Mechanism (Agarwal et al., 2021) is a response to the issue that the Discrete Gaussian mechanism is not closed under summation. Let $\alpha>1$ and let $f:x\to\mathbb{X}$ be a function related to a query. For any $x\in\mathbb{X}$, this mechanism adds noise $n\sim\mathrm{Sk}_{0,\mu}$ to $f(x)$ to satisfy $(\alpha,\frac{\alpha\Delta f^{2}}{2\mu}+\min(\frac{(2\alpha-1)\Delta f^{2}+6\Delta f}{4\mu^{2}},\frac{3\Delta f}{2\mu}))$-RDP. Here $\Delta f$ is the sensitivity of $f$, and $\mathrm{Sk}_{0,\mu}$ denotes the Skellam distribution with mean 0 and variance $\mu$, defined as follows.

Definition 2.20.

(Skellam Distribution (Skellam, 1946)). The multidimensional Skellam distribution $\mathrm{Sk}_{\Delta,\mu}$ over $\mathbb{Z}^{d}$ with mean $\Delta\in\mathbb{Z}^{d}$ and variance $\mu$ is given with each coordinate $X_{i}$ distributed independently as

(11) X_{i}\sim\mathrm{Sk}_{\Delta_{i},\mu}\ \text{with}\ P(X_{i}=k)=e^{-\mu}I_{k-\Delta_{i}}(\mu),

where $I_{v}(x)$ is the modified Bessel function of the first kind.

In addition to the above, the Binomial Mechanism (Agarwal et al., 2021) and the Poisson Binomial Mechanism (Chen et al., 2022b) are also discrete noise mechanisms for DP in FL, which we do not delve into here due to space constraints.

2.3.2. Perturbation Mechanisms for LDP in FL

In LDP within FL, there are three fundamental mechanisms: Laplace Mechanism, Randomized Response (RR), and Exponential Mechanism (EM). The first two are typically used for value perturbation, with the former introducing unbounded noise and the latter bounded noise. The last one is commonly used for value selection, such as dimension selection in FL scenarios. We will first introduce these three basic mechanisms and then discuss some popular and advanced mechanisms built upon them.

Laplace Mechanism (Dwork et al., 2006) originates from data publication scenarios in DP and is typically used for mean estimation. Given private data $x\in\mathbb{X}$, this mechanism adds noise $n\sim\text{Lap}(0,\frac{\Delta f}{\epsilon})$ to the aggregated value $f(x)$ of the private data, ensuring $\epsilon$-LDP. Here, $\Delta f$, the $L_{1}$ sensitivity of $f$, is defined as $\Delta f=\max_{\|x-y\|_{1}=1}\|f(x)-f(y)\|_{1}$. In the FL scenario, $f(x)$ typically denotes the local FL model that outputs gradient updates.

Randomized Response (RR) (Warner, 1965) originates from data collection scenarios in LDP and is typically used for count estimation over discrete values. This mechanism involves two steps: perturbation and calibration. In the perturbation step, each user perturbs their binary value $x\in\{0,1\}$ probabilistically according to Eq. (12), ensuring $\epsilon$-LDP. Assuming there are $n$ users' perturbed reports, in the aggregation step the aggregator collects the perturbed count of 1s as $c$, and the actual count of 1s can be derived by calibration via $\hat{c}=\frac{c(e^{\epsilon}+1)-n}{e^{\epsilon}-1}$.

(12) \Pr[\mathcal{R}_{RR}(x)=v]=\begin{cases}\frac{e^{\epsilon}}{e^{\epsilon}+1}&\text{if }v=x\\ \frac{1}{e^{\epsilon}+1}&\text{if }v=1-x\end{cases}

Exponential Mechanism (EM) (McSherry and Talwar, 2007) is used for discrete element selection. Given a utility function (also known as a scoring function) $u$ defined for each candidate value, if the randomized algorithm $\mathcal{M}_{E}(x,u,\mathcal{R})$ outputs result $r\in\mathcal{R}$ with probability proportional to $\exp\left(\frac{\epsilon u(x,r)}{2\Delta u}\right)$, then $\mathcal{M}_{E}$ satisfies $\epsilon$-LDP.

In the context of mean estimation with continuous data in FL, besides the Laplace mechanism and RR, there are several advanced mechanisms.

Duchi's Mechanism (Duchi et al., 2013) performs numeric mean estimation based on RR. Intuitively, it first randomly rounds the value $x\in[-1,1]$ to a discrete value $v\in\{-1,1\}$, then applies RR and calibrates the result. Taken together, this mechanism perturbs $x\in[-1,1]$ to a value $\mathcal{R}_{Duchi}(x)\in\{\frac{e^{\epsilon}+1}{e^{\epsilon}-1},-\frac{e^{\epsilon}+1}{e^{\epsilon}-1}\}$ according to the probability defined in Eq. (13), where the output range of this mechanism is $\{-1,1\}$ times the calibration factor. The aggregator derives the mean by aggregating the sum of users' perturbed values.

(13) \Pr[\mathcal{R}_{Duchi}(x)=v]=\begin{cases}\frac{1}{2}+\frac{e^{\epsilon}-1}{2e^{\epsilon}+2}\cdot x&\text{if }v=\frac{e^{\epsilon}+1}{e^{\epsilon}-1}\\ \frac{1}{2}-\frac{e^{\epsilon}-1}{2e^{\epsilon}+2}\cdot x&\text{if }v=-\frac{e^{\epsilon}+1}{e^{\epsilon}-1}\end{cases}
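Below is a minimal, illustrative Python sketch of this perturbation (Eq. (13)); the unbiasedness check at the end uses an arbitrary toy population.

```python
import math
import random

def duchi_perturb(x, eps):
    """Output +/- (e^eps+1)/(e^eps-1) with probability 1/2 +/- x*(e^eps-1)/(2e^eps+2),
    so that the expected output equals x."""
    c = (math.exp(eps) + 1) / (math.exp(eps) - 1)
    p = 0.5 + x * (math.exp(eps) - 1) / (2 * math.exp(eps) + 2)
    return c if random.random() < p else -c

xs = [0.3] * 10000
mean_estimate = sum(duchi_perturb(x, eps=1.0) for x in xs) / len(xs)  # close to 0.3
```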

Harmony (Nguyên et al., 2016), based on Duchi's mechanism, focuses on multi-dimensional vectors. For a vector with $d$ dimensions, each user samples only one dimension to perturb with Duchi's mechanism, scaling the output range from $\mathcal{R}_{Duchi}(x)\in\{\frac{e^{\epsilon}+1}{e^{\epsilon}-1},-\frac{e^{\epsilon}+1}{e^{\epsilon}-1}\}$ to $\mathcal{R}_{Duchi}(x)\in\{\frac{e^{\epsilon}+1}{e^{\epsilon}-1}\cdot d,-\frac{e^{\epsilon}+1}{e^{\epsilon}-1}\cdot d\}$.

Piecewise Mechanism (PM) (Wang et al., 2019b) combines the advantages of the Laplace mechanism and Duchi's mechanism by perturbing the private value $x$ into a range $[l(x),r(x)]$ with larger probability. To ensure $\epsilon$-LDP, the range is defined by $l(x)=\frac{e^{\epsilon/2}\cdot x-1}{e^{\epsilon/2}-1}$ and $r(x)=\frac{e^{\epsilon/2}\cdot x+1}{e^{\epsilon/2}-1}$. The perturbation follows Eq. (14).

(14) \Pr[\mathcal{R}_{PM}(x)=v]=\begin{cases}\frac{e^{\epsilon/2}}{2}\cdot\frac{e^{\epsilon/2}-1}{e^{\epsilon/2}+1}&\text{if }v\in[l(x),r(x)]\\ \frac{1}{2e^{\epsilon/2}}\cdot\frac{e^{\epsilon/2}-1}{e^{\epsilon/2}+1}&\text{if }v\notin[l(x),r(x)]\end{cases}
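The densities in Eq. (14) correspond to sampling uniformly inside $[l(x),r(x)]$ with probability $e^{\epsilon/2}/(e^{\epsilon/2}+1)$ and uniformly from the rest of the output range $[-C,C]$, with $C=\frac{e^{\epsilon/2}+1}{e^{\epsilon/2}-1}$, otherwise; the Python sketch below is our own illustrative rendering of this sampling procedure.

```python
import math
import random

def pm_perturb(x, eps):
    """Piecewise Mechanism for x in [-1, 1] (Eq. (14))."""
    s = math.exp(eps / 2)
    C = (s + 1) / (s - 1)
    l, r = (s * x - 1) / (s - 1), (s * x + 1) / (s - 1)
    if random.random() < s / (s + 1):
        return random.uniform(l, r)                 # high-probability region
    # Otherwise sample from [-C, l] U [r, C], proportionally to the segment lengths.
    left_len, right_len = l + C, C - r
    if random.random() < left_len / (left_len + right_len):
        return random.uniform(-C, l)
    return random.uniform(r, C)

reports = [pm_perturb(0.3, eps=1.0) for _ in range(10000)]
mean_estimate = sum(reports) / len(reports)         # close to 0.3 in expectation
```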

In the context of count estimation with discrete values, we list the advanced mechanisms as follows.

Generalized Randomized Response (GRR) (Wang et al., 2017) extends traditional RR from binary values to discrete values in a domain of size $d$. Each user perturbs their value according to Eq. (15). The aggregator then calibrates the perturbed count $c_{x}$ of value $x$ as $\hat{c}_{x}=\frac{c_{x}(e^{\epsilon}+d-1)-n}{e^{\epsilon}-1}$.

(15) \Pr[\mathcal{R}_{GRR}(x)=v]=\begin{cases}\frac{e^{\epsilon}}{e^{\epsilon}+d-1}&\text{if }v=x\\ \frac{1}{e^{\epsilon}+d-1}&\text{if }v\neq x\end{cases}
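As a minimal, illustrative Python sketch of GRR with the calibration above (the domain, number of users, and $\epsilon$ are arbitrary examples):

```python
import math
import random

def grr_perturb(x, domain, eps):
    """Keep x with probability e^eps / (e^eps + d - 1); otherwise report a uniformly
    chosen different value from the domain (Eq. (15))."""
    d = len(domain)
    if random.random() < math.exp(eps) / (math.exp(eps) + d - 1):
        return x
    return random.choice([v for v in domain if v != x])

def grr_calibrate(count_x, n, d, eps):
    """Debias the observed count: c_hat = (c*(e^eps + d - 1) - n) / (e^eps - 1)."""
    return (count_x * (math.exp(eps) + d - 1) - n) / (math.exp(eps) - 1)

domain = list(range(10))
reports = [grr_perturb(random.choice(domain), domain, eps=2.0) for _ in range(5000)]
estimated_count_of_3 = grr_calibrate(reports.count(3), len(reports), len(domain), eps=2.0)
```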

RAPPOR (Erlingsson et al., 2014), also building on RR, incorporates a Bloom filter to facilitate frequency estimation over large discrete domains. Specifically, given a discrete domain $\{1,2,\ldots,d\}$, denoted $[d]$, and $k$ hash functions $\mathbb{H}=\{H_{1},H_{2},\dots,H_{k}\}$ that map values from $[d]$ into an $m$-bit Bloom filter, each user first uses these $k$ hash functions to map their value $v$ into the Bloom filter and then perturbs each bit of the Bloom filter using RR. To reduce collisions, where two values are hashed to the same set of indices, RAPPOR allocates users into several cohorts, assigning a unique group of hash functions to each cohort. For longitudinal privacy, which protects against attacks over multiple accesses, RAPPOR splits $\epsilon$ into two parts: the first for a permanent perturbation via RR and the second for an instantaneous perturbation applied on top of the permanent one.

3. Differentially Private HFL

Research on differentially private federated learning has largely focused on the HFL scenario. Therefore, in this section, we introduce the implementation of the various DP models within HFL, presenting works on differentially private HFL under DP, LDP, and the shuffle model separately.

(a) SL-DP.
(b) CL-DP.
Figure 3. SL-DP and CL-DP in HFL.

3.1. DP-HFL

As mentioned in Section 2.2, the concept of DP is based on two neighboring datasets as input. The definition of neighboring datasets in federated learning depends on the desired formulation of privacy in the setting (i.e., which objects need to be kept private).

As shown in Figure 3, in general HFL there are two types of data owners: the local clients and the central server. Therefore, based on the notion of neighboring datasets, the two main neighboring levels in HFL can be formally defined as follows:

Definition 3.1.

(Sample-level DP (SL-DP)). Under SL-DP, two datasets $D$ and $D^{\prime}$ are neighboring if they differ in a single sample or record (either through addition or through removal).

Definition 3.2.

(Client-level DP (CL-DP)). Under CL-DP, two datasets $D$ and $D^{\prime}$ are neighboring if they differ in a single client or device (either through addition or through removal). Some articles treat “user-level DP” as equivalent to “client-level DP” because they assume that each client has only one user (Cheng et al., 2022; Yang et al., 2023b). However, we provide a more general definition, in which a single client may host one or more users; further discussion of the definition and role of the user level is provided in Section 6.

3.1.1. SL-DP

Under SL-DP, the data owner is each local client. As shown in Figure 3(a), each hospital participating in federated learning has a local database. For each client, the goal of SL-DP is to hide the presence of a single sample or record, or, more specifically, to bound the influence of any single sample or record on the distribution of the learning outcome (i.e., the distribution of the model parameters). It protects each local sample or record, so that attackers cannot identify one sample from the union of all local datasets.

Generally, at the SL-DP level, we assume that the server is semi-honest (honest but curious). Therefore, each client cares about the privacy of its own local dataset, and each client's privacy budget is independent. The most mainstream approach to implementing SL-DP is to execute the Differentially Private Stochastic Gradient Descent (DPSGD) algorithm (Abadi et al., 2016) on each client. (If one wishes to define SL-DP over the aggregate samples held by all clients in FL, a secure third party is required. For example, Ruan et al. (Ruan et al., 2023) proposed a secure DPSGD algorithm by combining differential privacy and secure multiparty computation; they designed a secure inverse-square-root method to securely clip the gradient vectors, together with a secure Gaussian noise generation protocol, so that DPSGD can be performed efficiently on ciphertexts.) DPSGD is a widely adopted training algorithm for deep neural networks with differential privacy guarantees. Specifically, in each iteration $t$, a batch of tuples $\mathcal{B}_{t}$ is sampled from $D$ with a fixed probability $\frac{b}{|D|}$, where $b$ is the batch size. After computing the gradient of each tuple $x_{i}\in\mathcal{B}_{t}$ as $g_{t}(x_{i})=\nabla_{\theta_{t}}L(\theta_{t},x_{i})$, where $\theta_{t}$ is the model parameter at iteration $t$, DPSGD clips each per-sample gradient to a fixed $\ell_{2}$ norm bound $C$ (Equation (16)).

(16) \overline{g}_{t}(x_{i})=\textbf{Clip}(g_{t}(x_{i});C)=g_{t}(x_{i})\Big/\max\Big(1,\frac{\|g_{t}(x_{i})\|_{2}}{C}\Big).

In this way, for any two neighboring datasets, the sensitivity of the query $\sum_{i\in\mathcal{B}_{t}}\overline{g}_{t}(x_{i})$ is bounded by $C$. Then, Gaussian noise scaled by $C$ is added to the sum of the clipped gradients when computing the batch-averaged gradient:

(17) \tilde{g}_{t}=\frac{1}{b}\Big(\sum_{i\in\mathcal{B}_{t}}\overline{g}_{t}(x_{i})+\mathcal{N}(0,\sigma^{2}C^{2}\mathbf{I})\Big),

where $\sigma$ is the noise multiplier, which depends on the privacy budget. Finally, gradient descent is performed based on the batch-averaged gradient. Since the initial model is randomly generated and independent of the sample data, and the batch-averaged gradients satisfy differential privacy, the resulting models also satisfy differential privacy due to the post-processing property. Many state-of-the-art variants of DPSGD (Fu et al., 2023; Wei et al., 2022; Papernot et al., 2021) can also be directly applied in the local client iterations to obtain more efficient models under SL-DP.
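A minimal, illustrative Python sketch of the per-sample clipping and noise addition in Eqs. (16)-(17) (the gradients, clipping norm, and noise multiplier below are toy values; accounting of the privacy budget is omitted):

```python
import numpy as np

def dpsgd_noisy_batch_gradient(per_sample_grads, clip_norm, noise_multiplier):
    """Clip each per-sample gradient to L2 norm C (Eq. (16)), sum, add
    N(0, sigma^2 C^2 I) noise, and average over the batch (Eq. (17))."""
    clipped = [g / max(1.0, np.linalg.norm(g) / clip_norm) for g in per_sample_grads]
    noisy_sum = np.sum(clipped, axis=0) + np.random.normal(
        0.0, noise_multiplier * clip_norm, size=per_sample_grads[0].shape)
    return noisy_sum / len(per_sample_grads)

# Toy usage: a batch of 4 per-sample gradients for a 3-parameter model.
grads = [np.random.randn(3) for _ in range(4)]
update = dpsgd_noisy_batch_gradient(grads, clip_norm=1.0, noise_multiplier=1.1)
```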

However, how to overcome the challenge of data heterogeneity in DP-FL is the concern of much of this work. Huang et al. (Huang et al., 2020) proposed a differentially private convolutional neural network with an adaptive gradient descent algorithm (DPAGD-CNN) to update the training parameters of each client; the best learning rate is selected from a candidate set based on model evaluation in each round of local DPSGD. Noble et al. (Noble et al., 2022) introduced DP-SCAFFOLD, an extension of the SCAFFOLD algorithm (Karimireddy et al., 2020) that incorporates differential privacy. It employs a control variate to limit model drift during local DPSGD, aligning local updates more closely with the global model direction, and uses advanced results from DP theory and optimization to establish the convergence of DP-SCAFFOLD for convex and non-convex objectives. Wei et al. (Wei et al., 2021) found that there is an optimal number of communication rounds in terms of convergence performance for a given privacy budget $\epsilon$, which motivated them to adaptively allocate privacy budgets in each round. Fu et al. (Fu et al., 2022a) proposed the Adap DP-FL algorithm, which includes adaptive gradient clipping and adaptive noise scale reduction. In the gradient clipping step of DPSGD, gradients are clipped using adaptive thresholds to account for the heterogeneity of gradient magnitudes across clients and training rounds; during noise addition, the noise scale gradually decreases (or the privacy budget gradually increases) as gradients converge across training rounds. Yang et al. (Yang et al., 2023c) start from the Non-IID data itself, concurrently updating local data transformation layers during local model training, thereby reducing the additional heterogeneity introduced by DP and consequently improving the utility of FL models on Non-IID data. Their method also applies to the CL-DP setting.

The allocation of the privacy budget is also a focus of study: since the privacy budget determines factors such as the number of iteration rounds and the noise scale, it directly impacts the performance of the trained model. Ling et al. (Ling et al., 2023) investigated how to achieve better model performance under constraints on the privacy budget and communication resources; they conducted a convergence analysis of DP-HFL and derived the optimal number of local iterations before each aggregation. Liu et al. (Liu et al., 2021b) considered scenarios with heterogeneous privacy budgets and proposed the Projected Federated Averaging (PFA) algorithm, which extracts the top singular subspace of the model updates submitted by clients with higher privacy budgets and projects onto it the model updates from clients with lower privacy budgets. Furthermore, it reduces communication overhead by having clients with lower privacy budgets upload projected model updates instead of the original model values. Zheng et al. (Zheng et al., 2021) integrated the GDP privacy metric into DP-HFL, proposing a private federated learning framework called PriFedSync. Considering communication cost, Li et al. (Li et al., 2022c) proposed SoteriaFL, a unified framework for compressed private FL that uses a shifted compression scheme (Horváth et al., 2023) to compress the perturbed parameters for efficient communication while maintaining high utility.

In addition, related articles have explored more complex attack scenarios under SL-DP protection. Xiang et al. (Xiang et al., 2023) proposed a method that combines local DPSGD with Byzantine fault tolerance. By leveraging the random noise of differential privacy, they construct an aggregation approach that effectively thwarts many existing Byzantine attacks; the injected randomness prevents Byzantine attackers from precisely interfering with gradient updates and aggregation, ensuring both the privacy and the stability of the federated learning system. Wei et al. (Wei et al., 2020) assumed that downlink broadcast channels are more vulnerable than uplink channels and proposed NbAFL, which adds noise to the parameters not only before aggregation but also again after aggregation to achieve a higher level of privacy protection. Naseri et al. (Naseri et al., 2020) proposed an analytical framework that empirically assesses the feasibility and effectiveness of SL-DP and CL-DP in protecting FL. Across many attack experiments, their results indicate that while SL-DP can defend against membership inference attacks and backdoor attacks, it cannot resist attribute inference attacks.

In addition to stochastic gradient descent, the alternating direction method of multipliers (ADMM) is also a local optimization method for FL. ADMM introduces Lagrange multipliers to transform the original problem into a series of subproblems, which are then solved alternately until convergence (Zhang and Kwok, 2014). Huang et al. (Huang et al., 2019) use a first-order approximation as the local objective function of each client. This approximation is convex and naturally bounds the $\ell_{2}$ norm of the gradient, so the sensitivity is obtained without gradient clipping. Additionally, they devised an adaptive decay of the noise coefficient to facilitate convergence of this first-order approximation. Building on this, Ryu et al. (Ryu and Kim, 2022) proposed multiple local iterations to accelerate convergence and reduce privacy loss. However, ADMM-based optimization methods can only be used for convex problems and are therefore not widely applicable in FL.

3.1.2. CL-DP

Under general CL-DP, the server is often assumed to be completely honest (honest and not curious) and can be viewed as a data owner. As shown in Figure 3(b), the central server collects the parameters uploaded by each client in each round. The goal of CL-DP is to hide the presence of a single client or device, or more specifically, to limit the impact of any single client or device on the distribution of the aggregation results. It requires that attackers cannot identify the participation of one client or device by observing the output of the aggregated parameters. (It is worth noting that although some works clip and add noise locally, the amount of noise added is $\mathcal{N}(0,\sigma^{2}C^{2}/K)$, where $K$ is the number of clients participating in federated learning (Shi et al., 2023; Cheng et al., 2022). Each client model $\mathbf{w}_{k}$ then satisfies only weak LDP protection, but this is not the protection goal: the intention is for the data uploaded to the central aggregator to be equivalent to $\sum_{k=1}^{K}\mathbf{w}_{k}+\mathcal{N}(0,\sigma^{2}C^{2})$, so the definition of neighboring datasets is that of CL-DP (Definition 3.2).)

Geyer et al. (Geyer et al., 2017) introduced CL-DP into federated learning for the first time, assuming differential attacks from any participating client. The server perturbs the distribution of the summed parameters by bounding each client's parameter contribution and then adding noise. Based on the work of Geyer et al. (Geyer et al., 2017), McMahan et al. (McMahan et al., 2017b) proposed the DP-FedAvg and DP-FedSGD algorithms, which employ client sampling for privacy amplification and use the moments accountant (Abadi et al., 2016) for privacy accounting. Their algorithms achieved performance comparable to non-private models on Long Short-Term Memory (LSTM) language modeling.
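To make the client-level guarantee concrete, the following sketch (an illustration in the spirit of DP-FedAvg, not the exact published algorithm) clips each client's model update to an $\ell_2$ bound $S$ before the server averages them and adds Gaussian noise calibrated to $S$; the function and parameter names are ours.

```python
import numpy as np

def clip_update(update, clip_bound_S):
    """Scale a client's model update so its l2 norm is at most S."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_bound_S / (norm + 1e-12))

def dp_fedavg_aggregate(client_updates, clip_bound_S, noise_multiplier_z, rng=None):
    """Average clipped client updates and add Gaussian noise N(0, (z*S)^2 I).
    Each client influences the sum by at most S, so the noisy average hides the
    presence or absence of any single client (client-level DP)."""
    rng = np.random.default_rng() if rng is None else rng
    K = len(client_updates)
    clipped_sum = sum(clip_update(u, clip_bound_S) for u in client_updates)
    noisy_sum = clipped_sum + rng.normal(0.0, noise_multiplier_z * clip_bound_S,
                                         size=clipped_sum.shape)
    return noisy_sum / K

# Example with 5 clients holding 4-dimensional updates.
updates = [np.random.randn(4) for _ in range(5)]
print(dp_fedavg_aggregate(updates, clip_bound_S=1.0, noise_multiplier_z=1.0))
```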

Previous work constrained each client's model update to a fixed constant, whereas Andrew et al. (Andrew et al., 2021) proposed an adaptive clipping method. It sets the clipping bound to a quantile of the update-norm distribution, estimated with differential privacy, rather than to a fixed value. Their experiments demonstrate that adaptive clipping to the median update norm performs well across a range of federated learning tasks. Zhang et al. (Zhang et al., 2022) studied the impact of clipping on model parameters and gradients. They demonstrated that uploading clipped gradients leads to better performance than uploading clipped models on the client side, and provided a convergence analysis of gradient clipping with noise, in which the upper bound highlights the additional terms introduced by differential privacy.

In addition, many studies have focused on reducing the impact of noise on the uploaded model parameters. Cheng et al. (Cheng et al., 2022) observed that a small clipping threshold decreases the volume of injected noise; they reduce the norm of local updates by regularizing local models and making the local updates sparse. DP-FedSAM (Shi et al., 2023) uses a SAM optimizer (Foret et al., 2020) to help model parameters escape saddle points and enhance the robustness of the model to noise, aiming to find more stable convergence points. Bietti et al. (Bietti et al., 2022) proposed PPSGD, which trains personalized local models while ensuring global model privacy, using the personalized models to enhance the global model's performance. Similarly, Yang et al. (Yang et al., 2023b) dynamically keep high-information model parameters local, shielding them from noise, based on layer-wise Fisher information; they also introduced an adaptive regularization strategy that imposes differential constraints on the model parameters uploaded to the server, enhancing robustness to clipping. Xu et al. (Xu et al., 2023a) exploited the fact that the size of the softmax layer is linearly related to the number of sample labels: by keeping the softmax layer local and not uploading it, the participation of more clients (i.e., more sample labels) in federated learning does not result in more noise injection. Triastcyn et al. (Triastcyn and Faltings, 2019) leveraged the assumption that data is distributed similarly across clients in HFL, which makes their updates highly consistent, and used BDP for privacy auditing to obtain tighter privacy bounds. Their assumption and method apply not only under CL-DP but also in the SL-DP scenario.

CL-DP with Secure Aggregation (SA). (Many works refer to this framework as “distributed DP” (Kairouz et al., 2021a; Agarwal et al., 2021; Yang et al., 2023c).) As mentioned above, CL-DP often requires an honest server for aggregation. To eliminate the reliance on a trusted central authority, many recent studies have combined differential privacy with secure aggregation (Bonawitz et al., 2017) to achieve CL-DP. As shown in Figure 4, after local training, each client adds a small amount of differential privacy noise to its model parameters, then encrypts and sends them to the server. The server performs aggregation, so that the noise contributions of all clients accumulate and the aggregated model satisfies CL-DP. However, secure aggregation relies on modular arithmetic and is not directly compatible with the continuous noise mechanisms (such as the Laplace and Gaussian mechanisms) commonly used to implement differential privacy. This has led researchers to design discrete noise mechanisms and integrate them with secure aggregation to achieve differential privacy.

Figure 4. CL-DP with SA.

Agarwal et al. (Agarwal et al., 2018) proposed and extended the binomial mechanism, the first discrete mechanism used in the federated setting. The discrete nature of binomial noise, together with quantization techniques (Suresh et al., 2017), enables efficient transmission. However, applying this mechanism in practice faces limitations: binomial noise can only achieve approximate differential privacy, with a nonzero probability of fully exposing private data, and the binomial mechanism does not satisfy Rényi or concentrated differential privacy.

To address these issues, Wang et al. (Wang et al., 2020a) first discretize the local parameters (McMahan et al., 2017a) and then add noise drawn from the discrete Gaussian distribution to satisfy differential privacy. Kairouz et al. (Kairouz et al., 2021a) also employed the discrete Gaussian mechanism, providing a novel privacy analysis of the sum of discrete Gaussian mechanisms that accounts for the proportion of malicious clients, and extensively investigated the impact of discretization, noise, and modular clipping on model utility. However, the discrete Gaussian mechanism is not closed under addition. To address this, Agarwal et al. (Agarwal et al., 2021) proposed the Skellam mechanism, which uses the difference of two independent Poisson random variables as noise. Because its distribution (characterized by the modified Bessel function of the first kind) is closed under addition, no additional privacy budget is needed to account for noise summation. However, the noise magnitude of the aforementioned discrete schemes is unbounded, so modular clipping is required for secure aggregation, which introduces additional bias. Chen et al. (Chen et al., 2022b) introduced the multi-dimensional Poisson Binomial mechanism, an unbiased and bounded discrete differential privacy mechanism: it treats model parameters as the probabilities of a binomial distribution and generates binomial noise based on these probabilities to control the noise bound.
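As a rough illustration of how such discrete mechanisms interact with secure aggregation, the sketch below quantizes each client's update, adds Skellam noise (the difference of two Poisson variables), and sums the reports modulo $M$, as a modular-arithmetic aggregator would. The scaling factor, modulus, and noise parameter are placeholders of our choosing, and the privacy accounting of the actual mechanisms is omitted.

```python
import numpy as np

def skellam_noise(mu, size, rng):
    """Skellam(mu, mu) noise: difference of two independent Poisson(mu) variables."""
    return rng.poisson(mu, size) - rng.poisson(mu, size)

def client_report(update, scale_gamma, noise_mu, modulus_M, rng):
    """Quantize a real-valued update, add discrete noise, and reduce modulo M."""
    quantized = np.rint(update * scale_gamma).astype(np.int64)
    return (quantized + skellam_noise(noise_mu, update.shape, rng)) % modulus_M

def secure_sum_and_decode(reports, scale_gamma, modulus_M, num_clients):
    """Modular sum (what the secure aggregator reveals), mapped back to reals."""
    total = np.sum(reports, axis=0) % modulus_M
    # Map from [0, M) to a signed range before rescaling.
    signed = np.where(total >= modulus_M // 2, total - modulus_M, total)
    return signed / (scale_gamma * num_clients)

rng = np.random.default_rng(0)
updates = [rng.normal(size=4) * 0.1 for _ in range(10)]
reports = [client_report(u, scale_gamma=2**10, noise_mu=50, modulus_M=2**20, rng=rng)
           for u in updates]
print(secure_sum_and_decode(reports, scale_gamma=2**10, modulus_M=2**20, num_clients=10))
```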

However, the above methods rely on per-parameter quantization, resulting in significant communication overhead. To explore the fundamental communication cost required to achieve optimal accuracy under central differential privacy, Chen et al. (Chen et al., 2022a) proposed a linear scheme based on sparse random projection to reduce communication overhead; they use count-sketch matrices and hash functions to obtain sparse projection matrices, thus reducing the dimensionality of the model parameters. Kerkouche et al. (Kerkouche et al., 2021) also sparsify the uploaded vectors by decomposing model parameters into a sparse orthogonal basis matrix multiplied by a sparse signal, perturbed locally with the discrete Gaussian mechanism; the server reconstructs the sparse gradient vector by solving a convex quadratic optimization problem. Besides designing new discrete noise mechanisms compatible with secure aggregation, Stevens et al. (Stevens et al., 2022) proposed a secure federated aggregation scheme based on the Learning With Errors (LWE) problem (Regev, 2009). It encrypts the model parameters of local clients with masks generated from the LWE problem before uploading, exploiting the random noise naturally introduced by LWE so that the sum of the LWE errors satisfies differential privacy.

3.2. LDP-HFL

LDP-HFL can be viewed as a mean estimation problem under LDP, since the model parameters form a high-dimensional continuous data point. Each client perturbs its model parameters locally before sending them to the server, which aggregates the reports and produces a mean estimate. There has been extensive research on mean estimation under LDP (Duchi et al., 2013; Nguyên et al., 2016), but directly applying it to federated learning faces obstacles: the dimensionality of model parameters in federated learning is extremely high, and as the dimension increases, the privacy budget allocated to each dimension decreases, leading to a sharp increase in statistical variance (Duchi et al., 2018).

Many researchers are studying new mean estimation techniques for high-dimensional data under LDP. Wang et al. (Wang et al., 2019b) proposed the Piecewise Mechanism (PM) and the Hybrid Mechanism (HM), which perturb multidimensional data containing both numeric and categorical attributes with optimal worst-case error; building on PM and HM, they presented an LDP-compliant algorithm for FL that achieves high utility. Zhao et al. (Zhao et al., 2020b) proposed two new LDP mechanisms. Building upon Duchi et al. (Duchi et al., 2013), they introduced the Three-Outputs mechanism, which has three discrete output possibilities and achieves a small worst-case noise variance for small privacy budgets $\epsilon$. For larger privacy budgets, inspired by the multiple-outputs strategy, they devised an optimal partitioning mechanism based on PM (Wang et al., 2019b), named PM-SUB. Truex et al. (Truex et al., 2020) convert each element of the local parameters to a discrete space and then perturb them separately with the exponential mechanism under Condensed-LDP (Gursoy et al., 2019); these two mechanisms are applied to the parameters uploaded from clients to the server to achieve LDP-FedSGD.
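For concreteness, a minimal implementation of the one-dimensional Piecewise Mechanism is sketched below, following the construction of Wang et al. (Wang et al., 2019b) for an input in $[-1,1]$; treat it as an illustrative sketch rather than the authors' reference code.

```python
import math
import numpy as np

def piecewise_mechanism(t, epsilon, rng=None):
    """Perturb a value t in [-1, 1] under epsilon-LDP with the Piecewise Mechanism.
    The output lies in [-C, C] with C = (e^{eps/2}+1)/(e^{eps/2}-1) and is an
    unbiased estimate of t."""
    rng = np.random.default_rng() if rng is None else rng
    e_half = math.exp(epsilon / 2.0)
    C = (e_half + 1.0) / (e_half - 1.0)
    left = (C + 1.0) / 2.0 * t - (C - 1.0) / 2.0   # l(t)
    right = left + C - 1.0                          # r(t) = l(t) + C - 1
    if rng.random() < e_half / (e_half + 1.0):
        # With high probability, report a value close to t.
        return rng.uniform(left, right)
    # Otherwise report a value from [-C, l(t)) union (r(t), C].
    len_left, len_right = left + C, C - right
    if rng.random() < len_left / (len_left + len_right):
        return rng.uniform(-C, left)
    return rng.uniform(right, C)

# A client would clip each parameter into [-1, 1] and report its perturbed value.
reports = [piecewise_mechanism(0.3, epsilon=1.0) for _ in range(10000)]
print(np.mean(reports))  # close to 0.3 on average
```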

Building on the optimization of existing LDP mechanisms, many studies use parameter shuffling to further mitigate the curse of dimensionality in LDP-HFL. Sun et al. (Sun et al., 2021) proposed the Adaptive-Duchi mechanism, which sets an adaptive perturbation range for each layer of the model based on Duchi et al. (Duchi et al., 2018). They further proposed a parameter shuffling method that splits each client's model parameters by layer and shuffles users' parameters within each layer. If the information in different layers can be regarded as independent, the privacy budget no longer needs to be split across dimensions by the composition property of DP, further improving utility. (Although this method involves shuffling, it does not conform to the concept and definition of the shuffle model: parameter shuffling only avoids splitting the privacy budget across dimensions under LDP, based on the assumption that parameters in different layers are independent, and it does not provide privacy amplification from LDP to DP.) Zhao et al. (Zhao et al., 2022) also employed parameter shuffling to eliminate the associations between dimensions; in addition, they designed an Adaptive-Harmony mechanism that adaptively allocates the perturbation interval to the parameters in each layer of the model. Varun et al. (Varun et al., 2024) proposed SRR-FL, which uses the Staircase Randomized Response (SRR) mechanism (Wang et al., 2022a) for local perturbation before parameter shuffling. SRR assigns different perturbation probabilities to different groups of values in the domain, with higher probabilities for values closer to the true value, which enhances the performance and utility of the randomization scheme.

Another way to alleviate the curse of dimensionality is to select or sample dimensions. Liu et al. (Liu et al., 2020a) proposed FedSel, a two-step LDP-FL framework consisting of a dimension selection stage and a value perturbation stage. In the selection stage, they build a top-$k$ set containing the dimensions with the $k$ largest absolute update values and privately select one “important” dimension from it; the value of the selected dimension is then perturbed by the Piecewise Mechanism (Wang et al., 2019b). However, FedSel selects only one dimension per local update, which may slow model convergence. Jiang et al. (Jiang et al., 2022) extended it to multi-dimensional selection and designed the Exponential Mechanism-based Multi-Dimension Selection (EM-MDS) algorithm to reduce the privacy budget incurred when selecting multiple dimensions; additionally, they assign a sign variable to the selected dimension values instead of perturbing them directly. To address the slow convergence caused by dimension selection, Li et al. (Li et al., 2022b) proposed iterating locally with the Adam optimizer (Kingma and Ba, 2014), then selecting the top-$k$ dimensions and adding Laplace noise. Wang et al. (Wang et al., 2023a) also use the exponential mechanism to choose the top-$K$ dimensions before perturbing the raw parameters. In addition, they proposed the DMP-UE mechanism to perturb the selected top-$K$ parameters with high utility, extending Duchi et al. (Duchi et al., 2013) to output three cases (including 0) rather than two. They also reduce communication costs by introducing edge nodes that perform edge aggregation.
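The private dimension-selection step common to these methods can be illustrated with a generic exponential-mechanism selector: the sketch below scores each dimension by the magnitude of its update and samples one index with probability proportional to $\exp(\epsilon u_j / (2\Delta u))$. It is a simplified stand-in for FedSel or EM-MDS, with the utility sensitivity $\Delta u$ taken here as the clipping bound.

```python
import numpy as np

def exponential_mechanism_select(update, epsilon, sensitivity, rng=None):
    """Privately select one dimension, scoring dimension j by u_j = |update_j|.
    Sampling probability is proportional to exp(epsilon * u_j / (2 * sensitivity))."""
    rng = np.random.default_rng() if rng is None else rng
    utilities = np.abs(update)
    scores = epsilon * utilities / (2.0 * sensitivity)
    scores -= scores.max()                  # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum()
    return rng.choice(len(update), p=probs)

# Example: dimensions with larger updates are selected more often.
update = np.array([0.02, -0.9, 0.1, 0.45])  # assume values clipped to [-1, 1]
picked = [exponential_mechanism_select(update, epsilon=2.0, sensitivity=1.0) for _ in range(1000)]
print(np.bincount(picked, minlength=len(update)))
```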

Other research focuses on the heterogeneity of clients in federated scenarios. For data heterogeneity, Zhang et al. (Zhang et al., 2023b) addressed the performance degradation caused by model heterogeneity in non-i.i.d. settings and proposed a personalized federated learning approach (FedBDP) based on Bregman divergence and differential privacy. Their algorithm uses Bregman divergence to quantify the discrepancy between local and global parameters and incorporates it as a regularization term in the local loss; additionally, decay coefficients dynamically adjust the magnitude of the differential privacy noise in each round. Wang et al. (Wang et al., 2020c) introduced FedLDA, an LDP-based latent Dirichlet allocation (LDA) model tailored to federated settings. FedLDA employs a novel random response mechanism with a prior (RRP), ensuring that the privacy budget is independent of the dictionary size, and enhances accuracy through adaptive and non-uniform sampling. For heterogeneous devices, Lian et al. (Lian et al., 2022) proposed WebFed, a browser-based cross-platform federated learning framework in which each client adds Laplace noise to the weights of its local model before uploading the training results. For heterogeneity in privacy budgets, Yang et al. (Yang et al., 2021) proposed PLU-FedOA, which focuses on the varying privacy requirements of clients: each client adds Laplace noise to its parameters locally with its own privacy budget $\epsilon_{k}$ to achieve personalized LDP, and the server runs a parameter aggregation algorithm designed to approximate an unbiased estimate.

Last but not least, some methods choose to perturb the raw data directly rather than the model parameters. Wang et al. (Wang et al., 2022b) treat each data point as an individual user and locally perturb each raw record before uploading it to the edge server. They employ the RAPPOR method (Erlingsson et al., 2014) to encode each feature of the data, followed by individual bit flips. Their experiments demonstrate that this local data perturbation effectively withstands data reconstruction attacks while maintaining model efficiency. Instead of perturbing the local raw data directly, Mahawaga et al. (Mahawaga Arachchige et al., 2022) first extract feature vectors from the local datasets using the convolutional and pooling layers of a CNN; these feature vectors are then flattened into 1-D vectors, unary encoded, and perturbed by RAPPOR (Erlingsson et al., 2014) before being uploaded to the server.

3.3. Shuffle model-HFL

The shuffle model combines LDP randomization with shuffling, providing a DP guarantee against the analyzer. Following the taxonomy of DP in HFL, we divide HFL within the shuffle model into two classes, client level (i.e., CL-DP) and sample level (i.e., SL-DP), elaborated as follows.

3.3.1. Shuffle model of CL-DP

To achieve the shuffle model with CL-DP, as shown in Figure 5 (a), each user randomizes its local updates with $(\epsilon_{l},\delta_{l})$-LDP, and the aggregated updates after shuffling satisfy $(\epsilon_{c},\delta_{c})$-DP against the analyzer. Typically, following the convention of LDP, researchers consider $\epsilon_{l}$-LDP, setting $\delta_{l}=0$. Note that we only discuss methods that benefit from the privacy amplification of shuffling here; some approaches, such as those in (Sun et al., 2021; Zhao et al., 2022), employ the shuffling step solely to break relations between dimensions and therefore belong to the general CL-DP category.

Specifically, Liu et al. (Liu et al., 2021a) were the first to consider FL in the shuffle model, proposing the FLAME algorithm. In FLAME, each user samples dimensions of its local update and perturbs the values in those dimensions for shuffling and aggregation. To leverage the privacy amplification benefits of both sampling and shuffling, where the amplification from shuffling requires a sufficient number of reports, they further introduce a dummy padding strategy on the shuffler side, ensuring that the sample size remains constant in each dimension. Around a different sampling strategy, Liew et al. (Liew et al., 2022) proposed a shuffled check-in protocol for CL-DP, which is methodologically similar to self-sampling (Girgis et al., 2021a) but achieves $(\epsilon_{l},\delta_{l})$-LDP in theory and derives a tighter privacy amplification bound with RDP. Beyond the uniform $\epsilon_{l}$-LDP setting for every user, Liu et al. (Liu et al., 2023b) considered heterogeneous privacy budgets, assuming a different $\epsilon_{k}$-LDP for each client, and derived a tight $(\epsilon_{c},\delta_{c})$-DP bound on the analyzer side based on the idea of hiding among clones (Feldman et al., 2022). To improve utility, they proposed a Clip-Laplace mechanism that bounds the Laplace noise within a finite range.
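A bare-bones view of the shuffle-model pipeline in CL-DP is sketched below: each client clips a (here one-dimensional) update and randomizes it with $\epsilon_l$-LDP Laplace noise, and the shuffler forwards the reports to the analyzer in a random order so that reports can no longer be linked to clients. The amplified central budget $\epsilon_c$ would come from a separate amplification bound, which this sketch does not compute; all names are illustrative.

```python
import numpy as np

def local_randomizer(update, epsilon_l, clip_bound, rng):
    """epsilon_l-LDP report of a scalar update clipped to [-clip_bound, clip_bound]."""
    clipped = float(np.clip(update, -clip_bound, clip_bound))
    # Laplace mechanism: sensitivity of the clipped value is 2 * clip_bound.
    return clipped + rng.laplace(0.0, 2.0 * clip_bound / epsilon_l)

def shuffler(reports, rng):
    """Uniformly permute the reports, removing the link between client and report."""
    return rng.permutation(reports)

rng = np.random.default_rng(0)
client_updates = rng.normal(0.2, 0.1, size=1000)
reports = [local_randomizer(u, epsilon_l=1.0, clip_bound=1.0, rng=rng) for u in client_updates]
shuffled = shuffler(np.array(reports), rng)
print(np.mean(shuffled))  # the analyzer only sees the shuffled, noisy reports
```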

Figure 5. Shuffle model in HFL: (a) Shuffle model of CL-DP; (b) Shuffle model of SL-DP.

3.3.2. Shuffle model of SL-DP

In the shuffle model with SL-DP, the neighboring databases for the $(\epsilon_{c},\delta_{c})$-DP analysis are defined over samples rather than over clients' datasets. A straightforward approach is to perturb each sample's update separately, then aggregate and shuffle these reports. Chen et al. (Chen et al., 2023a) adopt this method, further exploring its privacy amplification under a heterogeneous privacy scenario and achieving a tighter privacy bound via $f$-DP (also known as GDP) than Liu et al. (Liu et al., 2023b). More typically and beneficially, each user samples only one record to train on and reports the corresponding local update, as shown in Figure 5 (b).

Specifically, Girgis et al. (Girgis et al., 2021b) introduce the CLDP-SGD algorithm for SL-DP with shuffling, which first samples clients and then samples each selected user's records. This combination of dual sampling and shuffling enhances privacy amplification and reduces communication costs. Building on CLDP-SGD, Girgis et al. (Girgis et al., 2021a) introduce the dss-SGD algorithm, which employs Poisson sampling for client selection instead of having the shuffler uniformly sample a fixed number of clients, and demonstrates the corresponding privacy amplification effect.

3.4. Summary

Our summary is shown in Table 2, which lists the protected object and whether an honest server is assumed in differentially private HFL. We can observe that, currently, only CL-DP assumes an honest server. Moreover, for SL-DP the protected object is whether a sample participates, while for CL-DP it is whether a client participates. As for LDP, its protection target is the model parameters themselves, since there is no notion of neighboring datasets. CL-DP and LDP are better suited to a large number of clients (cross-device): the more clients participate, the more their advantages are realized. Conversely, cross-silo settings are more appropriate for SL-DP, as each client holds a larger number of samples, which better leverages its strengths. Below, we examine in detail two pairs of concepts that are easily confused in DP-HFL.

Table 2. Summary of DP-HFL
DP Model | Neighborhood Level | Server Assumption | Clients Setting* | Protected Object
DP | SL | semi-honest | cross-silo | the presence of a single sample
DP | CL | honest | cross-device | the presence of a single client
DP | CL with SA | semi-honest | cross-device | the presence of a single client
LDP | - | semi-honest | cross-device | the true parameters
Shuffle Model | SL | semi-honest | cross-silo | the presence of a single sample
Shuffle Model | CL | semi-honest | cross-device | the presence of a single client

* On the clients setting, see (Zhang et al., 2023a); the settings given here are merely appropriate client settings for the corresponding DP model, not standards.

3.4.1. SL-DP vs. LDP

These two modes are often confused, and many articles categorize SL-DP as LDP because both assume a semi-honest server. In fact, however, they differ by definition. DP is defined over neighboring datasets, which typically contain many samples, so in SL-DP clipping is performed on each sample to bound the sensitivity contributed by any single sample. In LDP, neighboring datasets degenerate into any two different inputs, so the sensitivity is usually obtained by directly clipping the model parameters. Regarding the number of local iterations, in SL-DP each local iteration accesses the local dataset and therefore requires noise to be added; in LDP, the privacy loss occurs only at the moment of upload, and the perturbation is applied only immediately before uploading. Moreover, many LDP papers focus on the issue of parameter dimensionality, because in LDP the dimensions split the privacy budget, leading to a privacy budget explosion; dimensionality also affects performance in SL-DP, but not to the same extent. Generally, an LDP mechanism can achieve the protection of SL-DP, but the reverse is not true. In special cases, such as when a client holds only one record, SL-DP is equivalent to LDP, but such cases are rare in HFL.

3.4.2. LDP vs. CL-DP with SA

Both require local noise addition, but the scale of the locally added noise differs. Taking Laplace noise as an example: in LDP, adding noise $\mathrm{Lap}(0,\Delta f/\epsilon)$ locally achieves $\epsilon$-LDP. In contrast, CL-DP with SA only requires each client to add noise $\mathrm{Lap}(0,\Delta f/(K\epsilon))$, where $K$ is the number of clients, and secure aggregation ensures that the final aggregated weights satisfy $\epsilon$-DP. Although CL-DP with SA provides some level of local LDP protection, the amount of local noise is much smaller, so the local protection is far weaker than $\epsilon$-LDP.

4. Differentially Private VFL and Differentially Private TFL

The amount of work on DP models in VFL and TFL is far less than in HFL. In VFL, current mainstream research focuses on secure entity alignment protocols, which securely match the same entities across samples held by different parties using private set intersection (PSI) algorithms (Zhou et al., 2021); the aligned entities then form a virtual dataset used in downstream vertical training. In TFL, current research focuses more on how to transfer knowledge across datasets from different domains and enhance model utility (Fernando et al., 2013). Related work also indicates that knowledge distillation can naturally defend against some poisoning and backdoor attacks (Li et al., 2021a). However, as more privacy issues are exposed in VFL and TFL (Pasquini et al., 2021; Jagielski et al., 2024), research on DP models in these settings is becoming increasingly important.

4.1. Differentially Private VFL

VFL is suitable for joint training when different clients hold the same samples with different features, increasing the feature dimensionality of the training data (Yang et al., 2019). As shown in Figure 6(a), suppose two apps collect user health data in a region: a Health App and an Apple Watch. The information of local residents is very likely registered in both apps, while the disease labels are held only by the Hospital. The user features may have no overlap: the Health App records users' blood pressure, while the Apple Watch records users' sleep duration. Once the data samples are aligned, all parties can engage in joint training. However, this form of joint training poses a serious privacy threat, as both features and labels can potentially be inferred. A common protective measure is to add a perturbation $\mathcal{R}(h_{k})$ to the features $h_{k}$ produced by the feature extractor to achieve DP protection.
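The perturbation $\mathcal{R}(h_k)$ is often instantiated as clipping plus Gaussian noise on the extracted feature (embedding) vector before it leaves the client. The sketch below shows this generic pattern with illustrative parameter names; it is not a specific published scheme.

```python
import numpy as np

def perturb_features(h_k, clip_norm, noise_std, rng=None):
    """Clip the embedding h_k to a fixed l2 norm and add Gaussian noise before
    sending it to the label holder / server in VFL."""
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(h_k)
    clipped = h_k * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.normal(0.0, noise_std, size=h_k.shape)

# Example: a client-side feature extractor produced a 16-dimensional embedding.
h_k = np.random.randn(16)
protected = perturb_features(h_k, clip_norm=1.0, noise_std=0.5)
```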

Clients transmit intermediate prediction results to the server, posing privacy risks of leaking features (Pasquini et al., 2021). Therefore, some existing work introduces DP into VFL model training to protect features. Chen et al. (Chen et al., 2020) introduced asynchrony into VFL: although there are multiple clients, only one client and the server are active at a time. They inject noise by adding random neurons and control the variance of the Gaussian random neurons to satisfy Gaussian differential privacy (GDP). Oh et al. (Oh et al., 2022) replaced traditional neural network models with a vision transformer (ViT) and used split learning (SL) to bypass the large model size of ViT by exchanging smashed data at the cut (slicing) layer. They also proposed DP-CutMixSL, applying a Gaussian mechanism to the cutout-smashed data uploaded to the server so that the algorithm satisfies DP. Mao et al. (Mao et al., 2022) proposed a new ReLU-based randomized activation function, which transforms private shuffled data and partial losses into random responses during forward and backward propagation to prevent attribute inference and data reconstruction in SplitNN. On this basis, they introduced differential privacy by adding Laplace noise to the forward propagation results in a randomized response (RR) manner, achieving a fine-grained privacy budget allocation scheme for SplitNN. Tian et al. (Tian et al., 2023) introduced “buckets” into the combination of VFL and gradient boosting decision trees (GBDT), so that the server only knows which bucket a sample belongs to, not the order of samples within buckets. By moving samples between buckets with a certain probability and applying RR perturbation to each bucket, the samples are protected by DP. Li et al. (Li et al., 2022a) proposed condensed-LDP (CLDP) in the context of VFL with tree boosting and designed three order-preserving desensitization algorithms for feature information to achieve privacy protection.

On the other hand, research has shown that private label information can be inferred from the gradients returned by the server (Fu et al., 2022b; Li et al., 2021b). Therefore, some existing work introduces DP into VFL to protect the server's labels. Yang et al. (Yang et al., 2022) proposed a label-protected VFL in which Laplace noise is added to the gradients the server obtains through backward differentiation, since the server must access label information when computing gradient values. Takahashi et al. (Takahashi et al., 2023) applied label differential privacy to VFL: when calculating the loss, they directly perturb the true labels using generalized randomized response (GRR) to protect label information. Their method successfully mitigates label leakage from the instance space, effectively countering ID2Graph attacks on tree-based models.
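Label perturbation with generalized randomized response is simple to state: keep the true label with probability $e^{\epsilon}/(e^{\epsilon}+k-1)$ and otherwise report one of the other $k-1$ labels uniformly at random. The sketch below is a generic GRR routine for illustration, not the exact procedure of Takahashi et al.

```python
import math
import random

def grr_perturb_label(true_label, num_classes_k, epsilon, rng=None):
    """Generalized randomized response over k labels, satisfying epsilon-label-DP."""
    rng = rng or random.Random()
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + num_classes_k - 1)
    if rng.random() < p_keep:
        return true_label
    # Report one of the remaining labels uniformly at random.
    others = [c for c in range(num_classes_k) if c != true_label]
    return rng.choice(others)

# Example: perturb the labels of a party holding 10-class data with epsilon = 1.
noisy_labels = [grr_perturb_label(y, num_classes_k=10, epsilon=1.0) for y in [0, 3, 7, 7, 9]]
```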

In addition, some research protects both features and labels in VFL simultaneously. Wang et al. (Wang et al., 2020b) add Gaussian noise both to the forward-propagation results produced by the feature holder (client) and to the gradients obtained by the label holder (server) during backward differentiation, thus ensuring differential privacy on both sides. Wu et al. (Wu et al., 2020) used MPC to construct a trusted third-party environment for centralized training, enabling the application of differential privacy in an encrypted environment and ensuring that the final output model satisfies differential privacy.

Figure 6. Differentially private VFL and TFL: (a) an example of differentially private VFL; (b) an example of differentially private TFL.
Table 3. An overview study of differentially private FL.
Federated Scenario | Publications | DP Model | Neighborhood Level | Perturbation Mechanism | Composition Mechanism¹ | Downstream Tasks² | Model Architecture | Clients Number | $\epsilon$ | $\delta$
Huang et al. (Huang et al., 2019) Gaussian AC Regression LR - [0.01,0.2] [10^{-3},10^{-6}]
Huang et al. (Huang et al., 2020) Gaussian, Laplace AC Classification Shallow CNN 10,100,1000 [0.2,8] [10^{-2},10^{-5}]
Wei et al. (Wei et al., 2020) Gaussian MA Classification Shallow CNN, LSTM [10,20] [0.12,2] [10^{-2},10^{-5}]
Wei et al. (Wei et al., 2021) Gaussian MA Classification Shallow CNN 50 [4,20] 10^{-3}
Liu et al. (Liu et al., 2021b) Gaussian GDP Classification Shallow CNN 100 [10,100] 10^{-3}
Zheng et al. (Zheng et al., 2021) Gaussian GDP Classification Shallow CNN 100 [10,100] 10^{-3}
Noble et al. (Noble et al., 2022) Gaussian RDP Classification Shallow CNN 10 [3,13] 10^{-6}
Fu et al. (Fu et al., 2022a) Gaussian RDP Classification Shallow CNN 10 [2,6] 10^{-5}
Li et al. (Li et al., 2022c) Gaussian MA Classification LR, Shallow CNN 10 [1,16] 10^{-3}
Ryu et al. (Ryu and Kim, 2022) Gaussian AC Classification LR [10,195] [0.05,5] 10^{-6}
Ling et al. (Ling et al., 2023) Gaussian RDP Classification Shallow CNN 10 [1.5,5.5] 10^{-5}
Xiang et al. (Xiang et al., 2023) Gaussian MA Classification Shallow CNN, LSTM [10,20] [0.12,2] [10^{-2},10^{-5}]
Ruan et al. (Ruan et al., 2023) DP SL Gaussian RDP Classification Shallow CNN, LSTM [3,10] [0.25,2] [10^{-4},10^{-5}]
Geyer et al. (Geyer et al., 2017) Gaussian MA Classification Shallow CNN 100, 1000, 10000 8 [10^{-3},10^{-6}]
McMahan et al. (McMahan et al., 2017b) Gaussian MA Classification LSTM [100,763430] [2.0,4.6] 10^{-9}
Andrew et al. (Andrew et al., 2021) Gaussian RDP Classification Shallow CNN [500,342000] [0.035,5] [1/500, 1/342000]
Zhang et al. (Zhang et al., 2022) Gaussian MA Classification Shallow CNN, ResNet-18 1920 [1.5,5] 10^{-5}
Cheng et al. (Cheng et al., 2022) Gaussian MA Classification Shallow CNN, ResNet-18 3400 [2,8] 1/3400
Bietti et al. (Bietti et al., 2022) Gaussian MA Classification Shallow CNN 1000 [0.1,1000] 10^{-4}
Shi et al. (Shi et al., 2023) Gaussian RDP Classification ResNet-18 500 [4,10] 1/500
Xu et al. (Xu et al., 2023a) Gaussian RDP Classification ResNet-50 [1262,9896000] [10,20] 10^{-7}
Yang et al. (Yang et al., 2023b) CL Gaussian RDP Classification Shallow CNN 50 [2,16] 10^{-3}
Agarwal et al. (Agarwal et al., 2018) Binomial AC Classification LR 25M [2,4] 10^{-9}
Wang et al. (Wang et al., 2020a) Discrete Gaussian RDP Classification Shallow CNN 100K [2,4] 10^{-5}
Kerkouche et al. (Kerkouche et al., 2021) Gaussian MA Classification Shallow CNN [5011,6000] [0.5,1] 10^{-5}
Kairouz et al. (Kairouz et al., 2021a) Discrete Gaussian zCDP Classification Shallow CNN 3400 [3,10] 1/3400
Agarwal et al. (Agarwal et al., 2021) Skellam RDP Classification Shallow CNN 1000k [5,20] 10^{-6}
Chen et al. (Chen et al., 2022b) Poisson Binomial RDP Classification LR 1000 [0.5,6] 10^{-5}
Stevens et al. (Stevens et al., 2022) LWE RDP Classification Shallow CNN [500,1000] [2,8] 10^{-5}
Chen et al. (Chen et al., 2022a) CL with SA Discrete Gaussian RDP Classification Shallow CNN [100,1000] [0,10] 10^{-2}
Naseri et al. (Naseri et al., 2020) SL, CL Gaussian RDP Classification Shallow CNN, LSTM [100,660120] [1.2,10.7] 10^{-5}
Yang et al. (Yang et al., 2023c) DP SL, CL, CL with SA Gaussian, Skellam RDP Classification Shallow CNN [40,500] [2,8] 10^{-3}
Triastcyn et al. (Triastcyn and Faltings, 2019) Bayesian DP SL, CL Gaussian RDP Classification ResNet-50 [100,10000] [0.2,4] [10^{-3},10^{-6}]
Wang et al. (Wang et al., 2019b) PM BC Classification LR, SVM 4M [0.5,4] 0
Liu et al. (Liu et al., 2020a) RR, PM BC Classification LR, SVM 40K-100K [0.5,16] 0
Zhao et al. (Zhao et al., 2020b) Three-Outputs, PM-SUB BC Classification LR, SVM 4M [0.5,4] 0
Wang et al. (Wang et al., 2020c) RRP AC Topic Modeling LDA 150 [5,8] [0.05,0.5]
Yang et al. (Yang et al., 2021) Laplace BC Classification Shallow CNN [200,1000] [1,10] 0
Sun et al. (Sun et al., 2021) Adaptive-Duchi BC Classification Shallow CNN [100,500] [1,5] 0
Lian et al. (Lian et al., 2022) Laplace BC Classification Shallow CNN 5 [3,6] 0
Mahawaga et al. (Mahawaga Arachchige et al., 2022) RAPPOR BC Classification Shallow CNN [2,100] [0.5,10] 0
Wang et al. (Wang et al., 2022b) RAPPOR BC Classification LR [500,1800] [0.1,10] 0
Jiang et al. (Jiang et al., 2022) EM BC Classification Shallow CNN [100,750] [0.5,12] 0
Li et al. (Li et al., 2022b) Laplace BC Classification Shallow CNN 100 78.5 0
Zhao et al. (Zhao et al., 2022) Adaptive-Harmony BC Classification Shallow CNN 200 [1,10] 0
Wang et al. (Wang et al., 2023a) EM, DMP-UE BC Classification Shallow CNN [10,50] [0.1,1] 0
Zhang et al. (Zhang et al., 2023b) Gaussian AC Classification LR, Shallow CNN 100 [3,30] -
Varun et al. (Varun et al., 2024) LDP - SRR BC Classification Shallow CNN 100 [1,10] 0
Truex et al. (Truex et al., 2020) Condensed LDP - EM BC Classification Shallow CNN 50 1 0
Liu et al. (Liu et al., 2021a) Laplace, Shuffle BC, AC Classification LR 1000 4.696 5×10^{-6}
Liew et al. (Liew et al., 2022) Harmony, Shuffle RDP Classification Shallow CNN [50000,60000] [2.8]
Liu et al. (Liu et al., 2023b) CL Clipped-Laplace, Shuffle AC Classification LR 10000 25.6 10^{-8}
Girgis et al. (Girgis et al., 2021b) Laplace, Shuffle AC Classification Shallow CNN 60000 [1,10] 10^{-5}
Horizontal Chen et al. (Chen et al., 2023a) Shuffle Model SL Duchi, Shuffle GDP Classification Shallow CNN 100 [0.5,100] 10^{-5}
Oh et al. (Oh et al., 2022) DP Gaussian RDP Classification VGG-16 10 [1,40] -
Yang et al. (Yang et al., 2022) Laplace, KRR BC Classification Shallow CNN 2 1 0
Takahashi et al. (Takahashi et al., 2023) Label DP SL KRR BC Classification GBDT 3 [0.1,2.0] -
Chen et al. (Chen et al., 2020) Gaussian GDP Classification Shallow CNN [3,8] - -
Wang et al. (Wang et al., 2020b) Gaussian AC Classification Shallow CNN 2 [0.001, 10] 10^{-2}
Wu et al. (Wu et al., 2020) DP CL Laplace BC Classification GBDT [2,10] - -
Tian et al. (Tian et al., 2020) RR BC Classification GBDT 3 4 0
Mao et al. (Mao et al., 2022) LDP - Laplace, RR BC Classification Shallow CNN 5 [0.1,4.0] 0
Vertical Li et al. (Li et al., 2022a) Condensed LDP - Discrete Laplace BC Classification GBDT 2 [0.64,2.56] 0
Papernot et al. (Papernot et al., 2017) Laplace MA Classification Shallow CNN 2 [2.04,8.19] [10^{-5},10^{-6}]
Papernot et al. (Papernot et al., 2018) Gaussian RDP Classification ResNet-18 2 [0.59,8.03] 10^{-8}
Sun et al. (Sun and Lyu, 2020) Random Sampling AC Classification Shallow CNN 6 [0.003,0.65] [0.006,0.65]
Tian et al. (Tian et al., 2022) Gaussian GDP Text Generation GPT-2 2000 [3,5] 10^{-6}
Hoech et al. (Hoech et al., 2022) Gaussian AC Classification ResNet-18 20 [0.1,0.5]
Wan et al. (Wan et al., 2023) SL Gaussian AC Recommendation DeepFM 2 [0.05, 10] -
Pan et al. (Pan et al., 2021a) Gaussian RDP Classification ResNet-18 100 [0.95,9.03]
Dodwadmath et al. (Dodwadmath and Stich, 2022) DP CL Laplace MA Classification Shallow CNN 10 [11.75,20] 10^{-5}
Transfer Qi et al. (Qi et al., 2023) LDP - KRR BC Classification Shallow CNN [2,5] [2,7] 0
    1. BC = Basic Sequential Composition Theory, AC = Advanced Sequential Composition Theory.

    2. LR = Logistic Regression, SVM = Support Vector Machine, GBDT = Gradient Boosting Decision Tree.

4.2. Differentially Private TFL

TFL is applicable when data holders share no or only a small number of identical samples and features, enabling them to perform joint training. As shown in Figure 6(b), consider $K$ hospitals located in different regions, such as Hospital $1$ in Japan, …, and Hospital $K$ in the United States. Due to geographic limitations, the user populations of these medical entities have little overlap, and the data features of their datasets hardly overlap either. To transfer the knowledge of these hospitals' local models to an unlabeled public dataset, Knowledge Distillation (KD) (Hinton et al., 2015) is commonly used, which achieves knowledge transfer by querying and aggregating the output predictions of each local model. However, research has shown that attackers can recover information about the original data from the output predictions of the local models (Zhang et al., 2020). Therefore, numerous studies incorporate differential privacy into TFL to enhance data privacy protection.

Papernot et al. (Papernot et al., 2017) first introduced DP into TFL and proposed the Private Aggregation of Teacher Ensembles (PATE) algorithm. It divides the private dataset into multiple disjoint partitions and trains a “teacher” model on each partition. Each teacher then submits its predictions on the public unlabeled (student) dataset to an aggregator, which perturbs the aggregated teacher votes so that the released labels satisfy differential privacy and sends the perturbed labels to the student for training. The following year, Papernot et al. (Papernot et al., 2018) optimized PATE by using the Gaussian mechanism for perturbation and obtaining tighter privacy bounds with RDP, making PATE applicable at larger scale. Tian et al. (Tian et al., 2022) applied PATE to text generation models; to reduce the noise DP requires for the large, vocabulary-sized output space, they dynamically filter out unimportant words to shrink the output space, and they propose an effective knowledge distillation strategy to reduce the number of queries.
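The aggregation step of PATE can be summarized as a noisy plurality vote: each teacher votes for a class, Laplace noise is added to the per-class vote counts, and the argmax is released as the student's label. The sketch below is a simplified illustration of this step only; the original papers' privacy accounting over many queries, and the Gaussian variant of the follow-up work, are omitted.

```python
import numpy as np

def pate_noisy_label(teacher_votes, num_classes, noise_scale, rng=None):
    """Aggregate teacher predictions for one query with a noisy-argmax vote.
    teacher_votes: iterable of predicted class indices, one per teacher."""
    rng = np.random.default_rng() if rng is None else rng
    counts = np.bincount(np.asarray(teacher_votes), minlength=num_classes).astype(float)
    noisy_counts = counts + rng.laplace(0.0, noise_scale, size=num_classes)
    return int(np.argmax(noisy_counts))

# Example: 25 teachers vote on a 10-class query; the student receives the noisy label.
votes = [3] * 14 + [5] * 7 + [1] * 4
print(pate_noisy_label(votes, num_classes=10, noise_scale=2.0))
```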

However, traditional PATE assumes a single private dataset. Pan et al. (Pan et al., 2021a) extended it to multiple clients, treating each client as a teacher model and achieving client-level DP with a trusted server. To address highly inconsistent teacher predictions caused by data heterogeneity across clients, Dodwadmath et al. (Dodwadmath and Stich, 2022) improved PATE with an auxiliary global model that incorporates teacher averaging, and used update correction to reduce the variance of teacher updates. To eliminate the reliance on a trusted third party, Qi et al. (Qi et al., 2023) convert each client teacher's prediction into a one-hot encoding and perturb each one-hot value using randomized response (RR) to satisfy LDP.

In addition, researchers have begun to study privacy protection when there is a small overlap of user data among the participants. Wan et al. (Wan et al., 2023) proposed FedPDD, a privacy-preserving dual distillation framework. Clients use locally trained models to make predictions on a small amount of overlapping user data and submit the prediction results to the server; the server aggregates the local predictions to obtain ensemble teacher knowledge and distributes it to the parties for subsequent training, applying differential privacy during the aggregation of local predictions. Sun et al. (Sun and Lyu, 2020) use sampling of participant data, with or without replacement, to achieve noiseless differential privacy. Hoech et al. (Hoech et al., 2022) proposed the FedAUXfdp algorithm, which adds Gaussian noise to the regularized multinomial logistic regression vector during local training of the classification model, upgrades the feature extractor to a frozen one, and solves the problem of heterogeneous client data distributions in one-shot scenarios while satisfying differential privacy.

5. Applications

Most of the work discussed above applies differentially private FL techniques to classification tasks on image data, as shown in Table 3. In this section, we describe further applications of differentially private FL from the perspectives of data type and real-world scenarios.

5.1. Application to Data Types.

5.1.1. Graph

Recent advances in FL on graph data focus on methodologies that safeguard privacy without sacrificing the utility of graph neural networks (GNNs). The core operation of GNNs is message passing, which makes the learned embedding of a node depend not only on the node itself but also on its neighbors. Accurately measuring this level of dependency is crucial for determining the sensitivity, which in turn determines the noise scale introduced by the DP mechanism (Daigavane et al., 2022). The main challenge in applying DP to graph data is therefore to control the depth and breadth of propagation to limit sensitivity while still exploiting the transitivity within graph data, thereby minimizing the impact of DP noise on model utility.

Current methods addressing the dependency issue fall into two categories. The first category avoids directly measuring the dependency level by adding noise outside the GNN training phase. Wu et al. proposed LinkTeller, a link-inference attack in the vertical federated learning scenario, and implemented link privacy protection by adding Laplace noise to the graph's adjacency matrix (Wu et al., 2022). Another approach (Lin et al., 2022) perturbs decentralized graph data by protecting the edge and node features of each user's adjacency list through randomized response. Under the same setting, Wu et al. suggested perturbing locally trained node embeddings (Wu et al., 2021a). However, because these perturbation mechanisms are not tailored to GNNs, they cause significant utility loss. Sajadmanesh et al. proposed extracting the aggregation function from the GNN, performing message passing explicitly, and adding noise to the aggregated messages (Sajadmanesh et al., 2023). This avoids complex dependency measurement but renders the aggregation function non-learnable, limiting its applicability to other network architectures. The second category measures the dependency level of nodes and applies gradient perturbation for private GNN training. Daigavane et al. proposed limiting the in-degree of nodes through sampling, bounding the dependency level, and implementing node-level differential privacy in the DP-SGD paradigm (Dai et al., 2021). According to their theoretical results, the sensitivity remains high, especially as the depth of the GNN increases.

5.1.2. Bipartite Matrix

In a bipartite matrix, rows and columns represent different types of entities, and the elements describe the relationships between them. This type of data is a special case of graph data, but it warrants separate discussion due to its relevance to recommendation systems, where the rows and columns represent users and items, respectively. The observed data fall into two types: explicit feedback, such as 1-5-star ratings reflecting the intensity of user preferences, and implicit feedback, such as retweets and clicks, which indicate interactions without explicitly reflecting preferences (Koren et al., 2021).

Studies such as (Friedman et al., 2016) focus primarily on explicit feedback. They propose a DP protection framework for explicit data in bipartite matrices built on matrix factorization models, applying input, process, and output perturbation mechanisms; their results indicate that process perturbation based on stochastic gradient descent and alternating least squares achieves the best outcomes. Other studies, such as (Minto et al., 2021) and (Ammad-Ud-Din et al., 2019), explore federated collaborative filtering with implicit feedback, employing user-level differential privacy and highlighting that more noise must be added to meet unbounded DP requirements with implicit feedback. Because bipartite matrices in recommendation systems involve many items and a sparse structure, they pose communication challenges in federated learning due to high overhead. The solution proposed by Shin et al. (Shin et al., 2018) has clients upload only a single element of the gradient matrix, chosen by a randomized selection mechanism, rather than the entire gradient matrix. The server can then aggregate unbiased estimates of the global gradient matrix, ensuring the algorithm satisfies LDP while maintaining the usability of the gradient matrix.

5.1.3. Time Series

In time series scenarios, unlike a single release from a static database, adversaries can observe multiple differentially private outputs. To publish time series data with DP in FL settings, a widely accepted approach is to use a hierarchical structure: the time series is partitioned into multiple granularities and noise is added to satisfy DP (Chan et al., 2011; Dwork et al., 2010a). The magnitude of the DP noise is generally determined by the upper bound of the data; however, time series data are often concentrated well below this upper bound. This issue is typically addressed with truncation-based methods (Perrier et al., 2018). Wang et al. optimized the search for the truncation threshold, contributing an EM-based algorithm to find the threshold and an online consistency algorithm (Wang et al., 2021).

On the other hand, handling time series data often requires recurrent neural networks (RNNs) and their variants. When there are not many participants, using RNN variants such as gated recurrent unit networks is feasible under DP (Xu et al., 2022). However, for tasks like traffic flow forecasting, convergence is difficult due to the large number of participants and expensive communication overhead. To address this, Liu et al. designed a joint-announcement protocol in the aggregation mechanism that randomly selects a proportion of organizations from the many participants in the $i$-th round of training (Liu et al., 2020b). For time series data involving location information, such as trajectory prediction, exploiting spatiotemporal correlation is crucial for performance. Liu et al. therefore pre-clustered clients based on this principle, using latitude and longitude information to determine the clustering, and integrated the global model of each cluster center with an ensemble learning scheme, thereby achieving the best accuracy (Liu et al., 2020b).

5.2. Real-World Implementations

5.2.1. Natural Language Processing

The application of differentially private federated learning in natural language processing (NLP) has attracted significant attention, with studies exploring both shallow and large language models. For shallow networks such as recurrent neural networks and long short-term memory (LSTM) models, (McMahan et al., 2017b) and (Kairouz et al., 2021c) propose approaches to training models with client-level differential privacy while minimizing the impact on predictive accuracy. For large language models (LLMs), (Wang et al., 2023b) and (Xu et al., 2023b) extend differentially private FL, focusing on improving privacy-utility trade-offs and implementing practical solutions for real-world applications. Wang et al. (Wang et al., 2023b) leverage public pre-trained LLMs and introduce a distribution matching algorithm to improve sample efficiency in public training, showcasing a strong privacy-utility trade-off without relying on pre-trained classifiers. Meanwhile, (Xu et al., 2023b) deploys DP-FTRL in Google Keyboard (Gboard), ensuring formal differential privacy guarantees without uniform client device sampling. These studies collectively highlight the versatility and effectiveness of differentially private FL in NLP tasks, enabling stronger privacy protection while maintaining utility in language model training across various settings.

5.2.2. Healthcare and Medicine

Recent studies have focused on enhancing privacy when applying FL in the healthcare domain, particularly addressing concerns about the confidentiality of patient data when leveraging the Internet of Medical Things (IoMT) and distributed healthcare datasets. The work in (Wu et al., 2021b) adds artificial noise to IoMT device datasets for user privacy. Similarly, (Malekzadeh et al., 2021) adopts a differentially private stochastic gradient descent approach, combined with secure aggregation through homomorphic encryption, for distributed healthcare data, demonstrating its efficacy on a diabetic retinopathy dataset. Meanwhile, (Choudhury et al., 2019) explores the use of differential privacy in FL to model Electronic Health Records (EHR) across multiple hospitals, focusing on tasks such as predicting adverse drug reactions and mortality rates, with a large dataset containing sensitive information such as diagnosis and admission records. Further advances in FL for medical applications aim to address the bandwidth inefficiencies and privacy vulnerabilities associated with exchanging model updates among a vast network of IoMT devices and federated clients, such as MRI scanners (Kerkouche et al., 2021; Li et al., 2019; Chen et al., 2021).

5.2.3. Internet of Things

FL has recently been explored extensively in the IoT. Zhao et al. (Zhao et al., 2020b) comprehensively survey FL applications in mobile edge networks, including algorithms, applications, and open research problems. Kong et al. (Kong et al., 2021) propose a collaborative learning framework on the edge for connected vehicles, which reduces training time while guaranteeing prediction accuracy. Lu et al. (Lu et al., 2019) leverage FL to protect the privacy of mobile edge computing and propose a random distributed update scheme to remove the security threats caused by a centralized curator, while Pan et al. (Pan et al., 2021b) apply DP to energy harvesting with collaborative and intelligent protection across the energy side and the information side. He et al. (He et al., 2023) propose an LDP scheme to train clustered FL models on heterogeneous IoT data using dimension reduction. Imteaj et al. (Imteaj et al., 2021) studied how to train distributed machine learning models on resource-constrained IoT devices.

6. Open Challenges and Future Direction

Beyond achieving a better trade-off between privacy protection and model utility in differentially private federated learning, we present several challenges in current studies and suggest promising directions for future research.

6.1. Convergence Analysis of Differentially Private HFL

In HFL, a tight upper bound on convergence not only provides theoretical assurance of rapid convergence but also enables an analysis of how various hyperparameters affect the convergence speed, which can guide parameter tuning or motivate new optimization algorithms (Haddadpour and Mahdavi, 2019; Li and Lyu, 2024). However, compared with plain HFL, the convergence analysis of differentially private HFL must account for operations such as clipping and noise addition, making it more challenging to obtain a relatively tight upper bound (Zhang et al., 2022). Wei et al. (Wei et al., 2020) introduced the NbAFL algorithm and provided its convergence analysis, while Ling et al. (Ling et al., 2023) offered a convergence analysis of DPSGD across multiple iterations, but their assumptions on the objective function were strong. Three studies (Zhang et al., 2022; Cheng et al., 2022; Shi et al., 2023) analyzed the convergence of their respective CL-DP algorithms under non-convex conditions, relaxing the overly strong assumptions on the objective function but still relying on strong assumptions on the gradients, such as bounded gradients and bounded variance.

Hence, conducting convergence analysis of differentially private HFL without strong assumptions on the loss function and gradients remains challenging. Furthermore, the convergence upper bounds of existing differentially private HFL algorithms remain at $\mathcal{O}(\frac{1}{\epsilon})$ (Li et al., 2022c); finding tighter convergence bounds is a promising future direction.
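To illustrate the two operations that complicate these analyses, the following minimal Python sketch performs one DPSGD-style step with per-sample clipping and Gaussian noise: clipping biases the averaged gradient, while the added noise raises its variance. The toy quadratic objective and all parameter values are illustrative assumptions.

```python
import numpy as np

def dpsgd_step(params, per_sample_grads, lr=0.1, clip_norm=1.0,
               noise_multiplier=1.0, rng=None):
    """One DPSGD step: clip every per-sample gradient, sum, add Gaussian
    noise, and average before taking a gradient step."""
    rng = rng if rng is not None else np.random.default_rng(0)
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    batch_size = len(per_sample_grads)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_grad = (np.sum(clipped, axis=0) + noise) / batch_size
    return params - lr * noisy_grad

# Toy objective 0.5 * ||w - x_i||^2, whose per-sample gradient is (w - x_i).
w = np.zeros(3)
data = np.random.default_rng(1).normal(size=(8, 3))
for _ in range(20):
    w = dpsgd_step(w, [w - x for x in data])
```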

6.2. User-Level DP in HFL

We have discussed sample-level and client-level DP above, but in many scenarios a client hosts many users, and an individual user may contribute multiple samples or records. For example, in a hospital, a patient may have multiple medical records. For each hospital, the goal of user-level DP is therefore to hide the presence of any single user, or more precisely, to bound the influence of any single user on the distribution of the learning outcome (i.e., the distribution of the model parameters). In a user-level DP scenario, we thus need to protect the collection of all data points associated with a user. Consider a dataset with $m$ users and $n$ samples, where $m \leq n$, so that individual users can hold multiple data samples. Based on the differing neighboring datasets held by data owners, we obtain the notion of UL-DP in HFL as follows.

Definition 6.1.

(User-level DP (UL-DP)). Under UL-DP, two datasets $D$ and $D'$ are neighboring if they differ in all samples (records) from a single user (either through addition or through removal).

In UL-DP, allowing users to contribute a large number of data samples, even if most contribute only a few, may force excessive noise to protect a minority of outliers. On the other hand, restricting users to small contributions keeps the noise low but may discard a large amount of surplus data, introducing bias. Related research has investigated user-level privacy in tasks such as mean estimation, linear regression, and empirical risk minimization (Epasto et al., 2020; Liu et al., 2020d; Levy et al., 2021). However, no work has yet addressed this issue in the context of federated learning.
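As a simple illustration of the contribution-bounding trade-off described above, the following Python sketch estimates a mean under user-level DP by capping each user's contributions before adding noise. The function name, the Laplace calibration, and the simplifying treatment of the kept sample count as public are assumptions for exposition, not a method from the cited works.

```python
import numpy as np
from collections import defaultdict

def user_level_dp_mean(samples, user_ids, max_per_user=2, clip=1.0,
                       epsilon=1.0, rng=None):
    """Keep at most `max_per_user` samples per user (contribution bounding),
    clip each value, and release a Laplace-noised sum whose sensitivity is
    max_per_user * clip. For simplicity, the count of kept samples is
    treated as public here."""
    rng = rng if rng is not None else np.random.default_rng(0)
    per_user = defaultdict(list)
    for x, user in zip(samples, user_ids):
        if len(per_user[user]) < max_per_user:
            per_user[user].append(float(np.clip(x, -clip, clip)))
    kept = [x for values in per_user.values() for x in values]
    sensitivity = max_per_user * clip   # one user shifts the sum by at most this
    noisy_sum = sum(kept) + rng.laplace(0.0, sensitivity / epsilon)
    return noisy_sum / max(len(kept), 1)

# Toy usage: 3 users contribute unequal numbers of samples.
values = [0.2, 0.9, -0.4, 0.5, 0.1, 0.7]
users = ["a", "a", "a", "b", "b", "c"]
print(user_level_dp_mean(values, users))
```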

6.3. Privacy Auditing for Various DP Models and Neighborhood Levels in FL

Above, we have discussed various DP models and neighborhood levels in federated learning. In particular, as shown in Table 2, the objects directly protected by differentially private federated learning vary with the DP model and neighborhood level. Many works have designed algorithms under different DP models and neighborhood levels, yielding a variety of perturbation algorithms, yet no work has provided a comprehensive comparison among them. Although privacy protection can be characterized by the privacy budget $\epsilon$, these definitions may yield different privacy protection effects in practical scenarios because they protect different objects.

Therefore, applying practical attack methods to quantify their privacy effects is an important direction for future work, for instance, using membership inference attacks (Salem et al., 2019), attribute inference attacks (Melis et al., 2019), and data reconstruction attacks (Zhao et al., 2020a) to measure their impact, or using DP auditing techniques (Jagielski et al., 2020; Nasr et al., 2023) to approach the worst-case guarantee under various DP models and levels, for example, LDP and SL-DP. Although Naseri et al. (Naseri et al., 2020) have conducted related research, comparing SL-DP with CL-DP based on a trusted server, we believe their comparison is not comprehensive enough and that these two definitions cannot be directly compared due to their different security assumptions and objects of protection. Moreover, DP guarantees rest on implausible worst-case assumptions, which makes it difficult for the privacy budget $\epsilon$ to correspond intuitively to real-world privacy leakage. It is therefore valuable to study the gap between DP guarantees under various DP models and neighborhood levels and practical ML privacy attacks (Salem et al., 2023).
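As a hint of what such auditing could look like, the Python sketch below converts a membership inference attack's measured true/false positive rates into an empirical lower bound on $\epsilon$, following the standard hypothesis-testing view of $(\epsilon, \delta)$-DP. The scores, threshold, and function name are hypothetical, and a rigorous audit would require confidence intervals over many trials (Jagielski et al., 2020).

```python
import numpy as np

def empirical_epsilon_lower_bound(member_scores, nonmember_scores,
                                  threshold, delta=1e-5):
    """For an (epsilon, delta)-DP mechanism, any membership inference attack
    satisfies TPR <= exp(epsilon) * FPR + delta, so a measured (TPR, FPR)
    pair implies epsilon >= ln((TPR - delta) / FPR). Point estimate only."""
    tpr = float(np.mean(np.asarray(member_scores) >= threshold))
    fpr = float(np.mean(np.asarray(nonmember_scores) >= threshold))
    if fpr <= 0.0 or tpr <= delta:
        return 0.0
    return float(np.log((tpr - delta) / fpr))

# Toy usage with hypothetical attack scores.
members = [0.9, 0.8, 0.7, 0.95, 0.6]
nonmembers = [0.4, 0.3, 0.55, 0.2, 0.1]
print(empirical_epsilon_lower_bound(members, nonmembers, threshold=0.5))
```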

6.4. Differentially Private Federated Learning under Large Scale Models

Most of the aforementioned studies conduct experiments on small datasets and small-scale models. In real industrial scenarios, we face very large datasets and large-scale models. Dealing with extensive text corpora and large language models such as GPT-4 poses significant challenges within differentially private FL (Wang et al., 2023b; Xu et al., 2023b). The high dimensionality of embeddings means that even small noise can substantially impact convergence and model performance (Tramer and Boneh, 2020; Yu et al., 2021a). In addition, DPSGD-based methods can be quite time-consuming because they require per-sample clipping. Therefore, accelerating DPSGD-based training or fine-tuning and enhancing model accuracy on large models become critical issues. Although some efforts have been made in this direction, most current research focuses only on models such as BERT and GPT-2 (Yu et al., 2021b; Shi et al., 2022), and there remains a significant utility gap between private and non-private models. Moreover, large-scale models already require substantial hyperparameter tuning in the non-private setting, and introducing privacy exacerbates this issue, since each tuning attempt consumes additional privacy budget (Cattan et al., 2022). This highlights the challenge of efficiently and privately tuning hyperparameters in large models.
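One direction for cutting the per-sample clipping cost, loosely in the spirit of low-rank reparametrization (Yu et al., 2021b), is to apply DPSGD only to a small trainable adapter while the backbone stays frozen. The Python sketch below is an assumed, simplified illustration of that idea, not the cited algorithm; the adapter shapes, learning rate, and function name are hypothetical.

```python
import numpy as np

def dp_update_adapter(A, B, per_sample_adapter_grads, lr=0.05,
                      clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Apply DPSGD only to a low-rank adapter (the layer is approximated by
    W0 + A @ B with the backbone W0 frozen), so per-sample clipping and
    noising touch far fewer parameters than the full model would."""
    rng = rng if rng is not None else np.random.default_rng(0)
    clipped_A, clipped_B = [], []
    for gA, gB in per_sample_adapter_grads:
        joint_norm = np.sqrt(np.linalg.norm(gA) ** 2 + np.linalg.norm(gB) ** 2)
        scale = min(1.0, clip_norm / (joint_norm + 1e-12))
        clipped_A.append(gA * scale)
        clipped_B.append(gB * scale)
    n = len(per_sample_adapter_grads)
    sigma = noise_multiplier * clip_norm
    A = A - lr * (np.sum(clipped_A, axis=0) + rng.normal(0.0, sigma, A.shape)) / n
    B = B - lr * (np.sum(clipped_B, axis=0) + rng.normal(0.0, sigma, B.shape)) / n
    return A, B

# Toy usage: a 768x768 layer with a rank-4 adapter (A: 768x4, B: 4x768).
rng = np.random.default_rng(0)
A, B = rng.normal(size=(768, 4)) * 0.01, rng.normal(size=(4, 768)) * 0.01
grads = [(rng.normal(size=A.shape), rng.normal(size=B.shape)) for _ in range(4)]
A, B = dp_update_adapter(A, B, grads)
```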

6.5. Cross-Domain Differentially Private TFL

Current works on differentially private TFL mostly focus on intra-domain scenarios, where the source and target datasets share similar features. However, in real-world scenarios, the source and target datasets often exhibit different features (cross-domain) (Fernando et al., 2013; Pan et al., 2010). Unsupervised multi-source domain adaptation (UMDA) addresses this by transferring transferable features from multiple source domains to an unlabeled target domain (Zhang et al., 2015; Chang et al., 2019). Since direct access to sensitive data is unavailable, all data and computation on the source domains must remain decentralized. Additionally, because uploaded parameters may leak sensitive information, the parameters uploaded from the source domains need to be protected with differential privacy. Striking a balance between privacy and utility in cross-domain differentially private federated learning therefore remains a key challenge.

7. Conclusion

In this study, we explored and systematized differentially private federated learning. Based on the definitions and guarantees of differential privacy, we categorized DP models in FL into three major classes, namely DP, LDP, and the shuffle model. Further, within DP and the shuffle model, we distinguished between SL-DP and CL-DP based on the definition of neighboring datasets. We then surveyed the applications of differentially private federated learning across different data types and real-world scenarios. Based on these discussions, we provided five promising directions for future research. We aim to give practitioners an overview of the current techniques and applications of differential privacy in federated learning, stimulating both foundational and applied research in the future.

References

  • Abadi et al. (2016) Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. 2016. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security. 308–318.
  • Agarwal et al. (2021) Naman Agarwal, Peter Kairouz, and Ziyu Liu. 2021. The skellam mechanism for differentially private federated learning. Advances in Neural Information Processing Systems 34 (2021), 5052–5064.
  • Agarwal et al. (2018) Naman Agarwal, Ananda Theertha Suresh, Felix Xinnan X Yu, Sanjiv Kumar, and Brendan McMahan. 2018. cpSGD: Communication-efficient and differentially-private distributed SGD. Advances in Neural Information Processing Systems 31 (2018).
  • Ammad-Ud-Din et al. (2019) Muhammad Ammad-Ud-Din, Elena Ivannikova, Suleiman A Khan, Were Oyomno, Qiang Fu, Kuan Eeik Tan, and Adrian Flanagan. 2019. Federated collaborative filtering for privacy-preserving personalized recommendation system. arXiv preprint arXiv:1901.09888 (2019).
  • Andrew et al. (2021) Galen Andrew, Om Thakkar, Brendan McMahan, and Swaroop Ramaswamy. 2021. Differentially private learning with adaptive clipping. Advances in Neural Information Processing Systems 34 (2021), 17455–17466.
  • Balle et al. (2020) Borja Balle, Gilles Barthe, Marco Gaboardi, Justin Hsu, and Tetsuya Sato. 2020. Hypothesis testing interpretations and renyi differential privacy. In International Conference on Artificial Intelligence and Statistics. PMLR, 2496–2506.
  • Balle et al. (2019) Borja Balle, James Bell, Adrià Gascón, and Kobbi Nissim. 2019. The privacy blanket of the shuffle model. In Advances in Cryptology–CRYPTO 2019: 39th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 18–22, 2019, Proceedings, Part II 39. Springer, 638–667.
  • Bietti et al. (2022) Alberto Bietti, Chen-Yu Wei, Miroslav Dudik, John Langford, and Steven Wu. 2022. Personalization improves privacy-accuracy tradeoffs in federated learning. In International Conference on Machine Learning. PMLR, 1945–1962.
  • Bittau et al. (2017) Andrea Bittau, Úlfar Erlingsson, Petros Maniatis, Ilya Mironov, Ananth Raghunathan, David Lie, Mitch Rudominer, Ushasree Kode, Julien Tinnes, and Bernhard Seefeld. 2017. Prochlo: Strong privacy for analytics in the crowd. In Proceedings of the 26th symposium on operating systems principles. 441–459.
  • Bonawitz et al. (2017) Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. 2017. Practical secure aggregation for privacy-preserving machine learning. In proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 1175–1191.
  • Bun and Steinke (2016a) Mark Bun and Thomas Steinke. 2016a. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography Conference. Springer, 635–658.
  • Bun and Steinke (2016b) Mark Bun and Thomas Steinke. 2016b. Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds. 635–658. https://doi.org/10.1007/978-3-662-53641-4_24
  • Canonne et al. (2020) Clément L Canonne, Gautam Kamath, and Thomas Steinke. 2020. The discrete gaussian for differential privacy. Advances in Neural Information Processing Systems 33 (2020), 15676–15688.
  • Cattan et al. (2022) Yannis Cattan, Christopher A Choquette-Choo, Nicolas Papernot, and Abhradeep Thakurta. 2022. Fine-tuning with differential privacy necessitates an additional hyperparameter search. arXiv preprint arXiv:2210.02156 (2022).
  • Chan et al. (2011) T-H Hubert Chan, Elaine Shi, and Dawn Song. 2011. Private and continual release of statistics. ACM Transactions on Information and System Security (TISSEC) 14, 3 (2011), 1–24.
  • Chang et al. (2019) Woong-Gi Chang, Tackgeun You, Seonguk Seo, Suha Kwak, and Bohyung Han. 2019. Domain-specific batch normalization for unsupervised domain adaptation. In Proceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition. 7354–7362.
  • Chen et al. (2023a) E Chen, Yang Cao, and Yifei Ge. 2023a. A Generalized Shuffle Framework for Privacy Amplification: Strengthening Privacy Guarantees and Enhancing Utility. arXiv preprint arXiv:2312.14388 (2023).
  • Chen et al. (2021) Mingzhe Chen, Deniz Gündüz, Kaibin Huang, Walid Saad, Mehdi Bennis, Aneta Vulgarakis Feljan, and H Vincent Poor. 2021. Distributed learning in wireless networks: Recent progress and future challenges. IEEE Journal on Selected Areas in Communications 39, 12 (2021), 3579–3605.
  • Chen et al. (2019) Mia Xu Chen, Benjamin N Lee, Gagan Bansal, Yuan Cao, Shuyuan Zhang, Justin Lu, Jackie Tsay, Yinan Wang, Andrew M Dai, Zhifeng Chen, et al. 2019. Gmail smart compose: Real-time assisted writing. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2287–2295.
  • Chen et al. (2020) Tianyi Chen, Xiao Jin, Yuejiao Sun, and Wotao Yin. 2020. Vafl: a method of vertical asynchronous federated learning. arXiv preprint arXiv:2007.06081 (2020).
  • Chen et al. (2022a) Wei-Ning Chen, Christopher A Choquette Choo, Peter Kairouz, and Ananda Theertha Suresh. 2022a. The fundamental price of secure aggregation in differentially private federated learning. In International Conference on Machine Learning. PMLR, 3056–3089.
  • Chen et al. (2022b) Wei-Ning Chen, Ayfer Ozgur, and Peter Kairouz. 2022b. The poisson binomial mechanism for unbiased federated learning with secure aggregation. In International Conference on Machine Learning. PMLR, 3490–3506.
  • Chen et al. (2023b) Xi Chen, Sentao Miao, and Yining Wang. 2023b. Differential privacy in personalized pricing with nonparametric demand models. Operations Research 71, 2 (2023), 581–602.
  • Cheng et al. (2022) Anda Cheng, Peisong Wang, Xi Sheryl Zhang, and Jian Cheng. 2022. Differentially private federated learning with local regularization and sparsification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10122–10131.
  • Cheu et al. (2019) Albert Cheu, Adam Smith, Jonathan Ullman, David Zeber, and Maxim Zhilyaev. 2019. Distributed differential privacy via shuffling. In Advances in Cryptology–EUROCRYPT 2019: 38th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Darmstadt, Germany, May 19–23, 2019, Proceedings, Part I 38. Springer, 375–403.
  • Choudhury et al. (2019) Olivia Choudhury, Aris Gkoulalas-Divanis, Theodoros Salonidis, Issa Sylla, Yoonyoung Park, Grace Hsu, and Amar Das. 2019. Differential privacy-enabled federated learning for sensitive health data. arXiv preprint arXiv:1910.02578 (2019).
  • Cummings and Desai (2018) Rachel Cummings and Deven Desai. 2018. The role of differential privacy in gdpr compliance. In FAT’18: Proceedings of the Conference on Fairness, Accountability, and Transparency. 20.
  • Dai et al. (2021) Zhongxiang Dai, Bryan Kian Hsiang Low, and Patrick Jaillet. 2021. Differentially private federated Bayesian optimization with distributed exploration. Advances in Neural Information Processing Systems 34 (2021), 9125–9139.
  • Daigavane et al. (2022) Ameya Daigavane, Gagan Madan, Aditya Sinha, Abhradeep Guha Thakurta, Gaurav Aggarwal, and Prateek Jain. 2022. Node-Level Differentially Private Graph Neural Networks. In ICLR 2022 Workshop on PAIR^2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data.
  • Damgård et al. (2012) Ivan Damgård, Valerio Pastro, Nigel Smart, and Sarah Zakarias. 2012. Multiparty computation from somewhat homomorphic encryption. In Annual Cryptology Conference. Springer, 643–662.
  • Dodwadmath and Stich (2022) Akshay Dodwadmath and Sebastian U Stich. 2022. Preserving privacy with PATE for heterogeneous data. In NeurIPS 2022 Workshop on Distribution Shifts: Connecting Methods and Applications.
  • Dong et al. (2021) Jinshuo Dong, Aaron Roth, and Weijie Su. 2021. Gaussian Differential Privacy. Journal of the Royal Statistical Society (2021).
  • Duchi et al. (2013) John Duchi, Martin J Wainwright, and Michael I Jordan. 2013. Local privacy and minimax bounds: Sharp rates for probability estimation. Advances in Neural Information Processing Systems 26 (2013).
  • Duchi et al. (2018) John C Duchi, Michael I Jordan, and Martin J Wainwright. 2018. Minimax optimal procedures for locally private estimation. J. Amer. Statist. Assoc. 113, 521 (2018), 182–201.
  • Dwork et al. (2006) Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography: Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006. Proceedings 3. Springer, 265–284.
  • Dwork et al. (2010a) Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N Rothblum. 2010a. Differential privacy under continual observation. In Proceedings of the forty-second ACM symposium on Theory of computing. 715–724.
  • Dwork et al. (2014) Cynthia Dwork, Aaron Roth, et al. 2014. The algorithmic foundations of differential privacy. Foundations and Trends® in Theoretical Computer Science 9, 3–4 (2014), 211–407.
  • Dwork et al. (2010b) Cynthia Dwork, Guy N Rothblum, and Salil Vadhan. 2010b. Boosting and differential privacy. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science. IEEE, 51–60.
  • El Ouadrhiri and Abdelhadi (2022) Ahmed El Ouadrhiri and Ahmed Abdelhadi. 2022. Differential privacy for deep and federated learning: A survey. IEEE access 10 (2022), 22359–22380.
  • Epasto et al. (2020) Alessandro Epasto, Mohammad Mahdian, Jieming Mao, Vahab Mirrokni, and Lijie Ren. 2020. Smoothly bounding user contributions in differential privacy. Advances in Neural Information Processing Systems 33 (2020), 13999–14010.
  • Erlingsson et al. (2019) Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Abhradeep Thakurta. 2019. Amplification by shuffling: From local to central differential privacy via anonymity. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 2468–2479.
  • Erlingsson et al. (2014) Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. 2014. Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security. 1054–1067.
  • Evfimievski et al. (2003) Alexandre Evfimievski, Johannes Gehrke, and Ramakrishnan Srikant. 2003. Limiting privacy breaches in privacy preserving data mining. In Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. 211–222.
  • Feldman (2020) Vitaly Feldman. 2020. Does learning require memorization? a short tale about a long tail. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing. 954–959.
  • Feldman et al. (2022) Vitaly Feldman, Audra McMillan, and Kunal Talwar. 2022. Hiding among the clones: A simple and nearly optimal analysis of privacy amplification by shuffling. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 954–964.
  • Fernando et al. (2013) Basura Fernando, Amaury Habrard, Marc Sebban, and Tinne Tuytelaars. 2013. Unsupervised visual domain adaptation using subspace alignment. In Proceedings of the IEEE international conference on computer vision. 2960–2967.
  • Foret et al. (2020) Pierre Foret, Ariel Kleiner, Hossein Mobahi, and Behnam Neyshabur. 2020. Sharpness-aware minimization for efficiently improving generalization. arXiv preprint arXiv:2010.01412 (2020).
  • Friedman et al. (2016) Arik Friedman, Shlomo Berkovsky, and Mohamed Ali Kaafar. 2016. A differential privacy framework for matrix factorization recommender systems. User Modeling and User-Adapted Interaction 26 (2016), 425–458.
  • Fu et al. (2022b) Chong Fu, Xuhong Zhang, Shouling Ji, Jinyin Chen, Jingzheng Wu, Shanqing Guo, Jun Zhou, Alex X Liu, and Ting Wang. 2022b. Label inference attacks against vertical federated learning. In 31st USENIX security symposium (USENIX Security 22). 1397–1414.
  • Fu et al. (2022a) Jie Fu, Zhili Chen, and Xiao Han. 2022a. Adap DP-FL: Differentially Private Federated Learning with Adaptive Noise. In 2022 IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). IEEE, 656–663.
  • Fu et al. (2023) Jie Fu, Qingqing Ye, Haibo Hu, Zhili Chen, Lulu Wang, Kuncan Wang, and Ran Xun. 2023. DPSUR: Accelerating Differentially Private Stochastic Gradient Descent Using Selective Update and Release. arXiv preprint arXiv:2311.14056 (2023).
  • Geyer et al. (2017) Robin C Geyer, Tassilo Klein, and Moin Nabi. 2017. Differentially private federated learning: A client level perspective. arXiv preprint arXiv:1712.07557 (2017).
  • Ghazi et al. (2021) Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Manurangsi, and Chiyuan Zhang. 2021. Deep learning with label differential privacy. Advances in neural information processing systems 34 (2021), 27131–27145.
  • Girgis et al. (2021b) Antonious Girgis, Deepesh Data, Suhas Diggavi, Peter Kairouz, and Ananda Theertha Suresh. 2021b. Shuffled model of differential privacy in federated learning. In International Conference on Artificial Intelligence and Statistics. PMLR, 2521–2529.
  • Girgis et al. (2021a) Antonious M Girgis, Deepesh Data, and Suhas Diggavi. 2021a. Differentially private federated learning with shuffling and client self-sampling. In 2021 IEEE International Symposium on Information Theory (ISIT). IEEE, 338–343.
  • Goldreich (2009) Oded Goldreich. 2009. Foundations of cryptography: volume 2, basic applications. Cambridge university press.
  • Gursoy et al. (2019) Mehmet Emre Gursoy, Acar Tamersoy, Stacey Truex, Wenqi Wei, and Ling Liu. 2019. Secure and utility-aware data collection with condensed local differential privacy. IEEE Transactions on Dependable and Secure Computing 18, 5 (2019), 2365–2378.
  • Haddadpour and Mahdavi (2019) Farzin Haddadpour and Mehrdad Mahdavi. 2019. On the convergence of local descent methods in federated learning. arXiv preprint arXiv:1910.14425 (2019).
  • He et al. (2023) Zaobo He, Lintao Wang, and Zhipeng Cai. 2023. Clustered federated learning with adaptive local differential privacy on heterogeneous iot data. IEEE Internet of Things Journal (2023).
  • Hinton et al. (2015) Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).
  • Hoech et al. (2022) Haley Hoech, Roman Rischke, Karsten Müller, and Wojciech Samek. 2022. FedAUXfdp: Differentially Private One-Shot Federated Distillation. In International Workshop on Trustworthy Federated Learning. Springer, 100–114.
  • Horváth et al. (2023) Samuel Horváth, Dmitry Kovalev, Konstantin Mishchenko, Peter Richtárik, and Sebastian Stich. 2023. Stochastic distributed learning with gradient quantization and double-variance reduction. Optimization Methods and Software 38, 1 (2023), 91–106.
  • Hu et al. (2023) Lijie Hu, Ivan Habernal, Lei Shen, and Di Wang. 2023. Differentially Private Natural Language Models: Recent Advances and Future Directions. arXiv preprint arXiv:2301.09112 (2023).
  • Huang et al. (2020) Xixi Huang, Ye Ding, Zoe L Jiang, Shuhan Qi, Xuan Wang, and Qing Liao. 2020. DP-FL: a novel differentially private federated learning framework for the unbalanced data. World Wide Web 23 (2020), 2529–2545.
  • Huang et al. (2019) Zonghao Huang, Rui Hu, Yuanxiong Guo, Eric Chan-Tin, and Yanmin Gong. 2019. DP-ADMM: ADMM-based distributed learning with differential privacy. IEEE Transactions on Information Forensics and Security 15 (2019), 1002–1012.
  • Imola and Chaudhuri (2021) Jacob Imola and Kamalika Chaudhuri. 2021. Privacy amplification via bernoulli sampling. arXiv preprint arXiv:2105.10594 (2021).
  • Imteaj et al. (2021) Ahmed Imteaj, Urmish Thakker, Shiqiang Wang, Jian Li, and M Hadi Amini. 2021. A survey on federated learning for resource-constrained IoT devices. IEEE Internet of Things Journal 9, 1 (2021), 1–24.
  • Jagielski et al. (2024) Matthew Jagielski, Milad Nasr, Katherine Lee, Christopher A Choquette-Choo, Nicholas Carlini, and Florian Tramer. 2024. Students parrot their teachers: Membership inference on model distillation. Advances in Neural Information Processing Systems 36 (2024).
  • Jagielski et al. (2020) Matthew Jagielski, Jonathan Ullman, and Alina Oprea. 2020. Auditing differentially private machine learning: How private is private sgd? Advances in Neural Information Processing Systems 33 (2020), 22205–22216.
  • Jayaraman and Evans (2019) Bargav Jayaraman and David Evans. 2019. Evaluating differentially private machine learning in practice. In 28th USENIX Security Symposium (USENIX Security 19). 1895–1912.
  • Jiang et al. (2021) Bin Jiang, Jianqiang Li, Guanghui Yue, and Houbing Song. 2021. Differential privacy for industrial internet of things: Opportunities, applications, and challenges. IEEE Internet of Things Journal 8, 13 (2021), 10430–10451.
  • Jiang et al. (2022) Xue Jiang, Xuebing Zhou, and Jens Grossklags. 2022. Signds-fl: Local differentially private federated learning with sign-based dimension selection. ACM Transactions on Intelligent Systems and Technology (TIST) 13, 5 (2022), 1–22.
  • Kairouz et al. (2021a) Peter Kairouz, Ziyu Liu, and Thomas Steinke. 2021a. The distributed discrete gaussian mechanism for federated learning with secure aggregation. In International Conference on Machine Learning. PMLR, 5201–5212.
  • Kairouz et al. (2021c) Peter Kairouz, Brendan McMahan, Shuang Song, Om Thakkar, Abhradeep Thakurta, and Zheng Xu. 2021c. Practical and private (deep) learning without sampling or shuffling. In International Conference on Machine Learning. PMLR, 5213–5225.
  • Kairouz et al. (2021b) Peter Kairouz, H Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, et al. 2021b. Advances and open problems in federated learning. Foundations and trends® in machine learning 14, 1–2 (2021), 1–210.
  • Karimireddy et al. (2020) Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank Reddi, Sebastian Stich, and Ananda Theertha Suresh. 2020. Scaffold: Stochastic controlled averaging for federated learning. In International conference on machine learning. PMLR, 5132–5143.
  • Kerkouche et al. (2021) Raouf Kerkouche, Gergely Ács, Claude Castelluccia, and Pierre Genevès. 2021. Compression boosts differentially private federated learning. In 2021 IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, 304–318.
  • Kingma and Ba (2014) Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
  • Kong et al. (2021) Xiangjie Kong, Haoran Gao, Guojiang Shen, Gaohui Duan, and Sajal K Das. 2021. Fedvcp: A federated-learning-based cooperative positioning scheme for social internet of vehicles. IEEE Transactions on Computational Social Systems 9, 1 (2021), 197–206.
  • Koren et al. (2021) Yehuda Koren, Steffen Rendle, and Robert Bell. 2021. Advances in collaborative filtering. Recommender systems handbook (2021), 91–142.
  • Levy et al. (2021) Daniel Levy, Ziteng Sun, Kareem Amin, Satyen Kale, Alex Kulesza, Mehryar Mohri, and Ananda Theertha Suresh. 2021. Learning with user-level privacy. Advances in Neural Information Processing Systems 34 (2021), 12466–12479.
  • Li et al. (2012) Ninghui Li, Wahbeh Qardaji, and Dong Su. 2012. On sampling, anonymization, and differential privacy or, k-anonymization meets differential privacy. In Proceedings of the 7th ACM Symposium on Information, Computer and Communications Security. 32–33.
  • Li et al. (2021b) Oscar Li, Jiankai Sun, Xin Yang, Weihao Gao, Hongyi Zhang, Junyuan Xie, Virginia Smith, and Chong Wang. 2021b. Label leakage and protection in two-party split learning. arXiv preprint arXiv:2102.08504 (2021).
  • Li et al. (2021c) Qinbin Li, Zeyi Wen, Zhaomin Wu, Sixu Hu, Naibo Wang, Yuan Li, Xu Liu, and Bingsheng He. 2021c. A survey on federated learning systems: Vision, hype and reality for data privacy and protection. IEEE Transactions on Knowledge and Data Engineering 35, 4 (2021), 3347–3366.
  • Li et al. (2019) Wenqi Li, Fausto Milletarì, Daguang Xu, Nicola Rieke, Jonny Hancox, Wentao Zhu, Maximilian Baust, Yan Cheng, Sébastien Ourselin, M Jorge Cardoso, et al. 2019. Privacy-preserving federated brain tumour segmentation. In Machine Learning in Medical Imaging: 10th International Workshop, MLMI 2019, Held in Conjunction with MICCAI 2019, Shenzhen, China, October 13, 2019, Proceedings 10. Springer, 133–141.
  • Li et al. (2022a) Xiaochen Li, Yuke Hu, Weiran Liu, Hanwen Feng, Li Peng, Yuan Hong, Kui Ren, and Zhan Qin. 2022a. OpBoost: a vertical federated tree boosting framework based on order-preserving desensitization. arXiv preprint arXiv:2210.01318 (2022).
  • Li and Lyu (2024) Yipeng Li and Xinchen Lyu. 2024. Convergence Analysis of Sequential Federated Learning on Heterogeneous Data. Advances in Neural Information Processing Systems 36 (2024).
  • Li et al. (2021a) Yige Li, Xixiang Lyu, Nodens Koren, Lingjuan Lyu, Bo Li, and Xingjun Ma. 2021a. Neural attention distillation: Erasing backdoor triggers from deep neural networks. arXiv preprint arXiv:2101.05930 (2021).
  • Li et al. (2022b) Yuting Li, Guojun Wang, Tao Peng, and Guanghui Feng. 2022b. FedTA: Locally-Differential Federated Learning with Top-k Mechanism and Adam Optimization. In International Conference on Ubiquitous Security. Springer, 380–391.
  • Li et al. (2022c) Zhize Li, Haoyu Zhao, Boyue Li, and Yuejie Chi. 2022c. SoteriaFL: A unified framework for private federated learning with communication compression. Advances in Neural Information Processing Systems 35 (2022), 4285–4300.
  • Lian et al. (2022) Zhuotao Lian, Qinglin Yang, Qingkui Zeng, and Chunhua Su. 2022. Webfed: Cross-platform federated learning framework based on web browser with local differential privacy. In ICC 2022-IEEE International Conference on Communications. IEEE, 2071–2076.
  • Liew et al. (2022) Seng Pei Liew, Satoshi Hasegawa, and Tsubasa Takahashi. 2022. Shuffled check-in: privacy amplification towards practical distributed learning. arXiv preprint arXiv:2206.03151 (2022).
  • Lin et al. (2022) Wanyu Lin, Baochun Li, and Cong Wang. 2022. Towards private learning on decentralized graphs with local differential privacy. IEEE Transactions on Information Forensics and Security 17 (2022), 2936–2946.
  • Ling et al. (2023) Xinpeng Ling, Jie Fu, and Zhili Chen. 2023. Adaptive Local Steps Federated Learning with Differential Privacy Driven by Convergence Analysis. arXiv preprint arXiv:2308.10457 (2023).
  • Liu et al. (2021b) Junxu Liu, Jian Lou, Li Xiong, Jinfei Liu, and Xiaofeng Meng. 2021b. Projected federated averaging with heterogeneous differential privacy. Proceedings of the VLDB Endowment 15, 4 (2021), 828–840.
  • Liu et al. (2021a) Ruixuan Liu, Yang Cao, Hong Chen, Ruoyang Guo, and Masatoshi Yoshikawa. 2021a. Flame: Differentially private federated learning in the shuffle model. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 8688–8696.
  • Liu et al. (2020a) Ruixuan Liu, Yang Cao, Masatoshi Yoshikawa, and Hong Chen. 2020a. Fedsel: Federated sgd under local differential privacy with top-k dimension selection. In Database Systems for Advanced Applications: 25th International Conference, DASFAA 2020, Jeju, South Korea, September 24–27, 2020, Proceedings, Part I 25. Springer, 485–501.
  • Liu et al. (2023a) WeiKang Liu, Yanchun Zhang, Hong Yang, and Qinxue Meng. 2023a. A Survey on Differential Privacy for Medical Data Analysis. Annals of Data Science (2023), 1–15.
  • Liu et al. (2020b) Yi Liu, JQ James, Jiawen Kang, Dusit Niyato, and Shuyu Zhang. 2020b. Privacy-preserving traffic flow prediction: A federated learning approach. IEEE Internet of Things Journal 7, 8 (2020), 7751–7763.
  • Liu et al. (2020c) Yang Liu, Yan Kang, Chaoping Xing, Tianjian Chen, and Qiang Yang. 2020c. A secure federated transfer learning framework. IEEE Intelligent Systems 35, 4 (2020), 70–82.
  • Liu et al. (2024) Yang Liu, Yan Kang, Tianyuan Zou, Yanhong Pu, Yuanqin He, Xiaozhou Ye, Ye Ouyang, Ya-Qin Zhang, and Qiang Yang. 2024. Vertical Federated Learning: Concepts, Advances, and Challenges. IEEE Transactions on Knowledge and Data Engineering (2024).
  • Liu et al. (2020d) Yuhan Liu, Ananda Theertha Suresh, Felix Xinnan X Yu, Sanjiv Kumar, and Michael Riley. 2020d. Learning discrete distributions: user vs item-level privacy. Advances in Neural Information Processing Systems 33 (2020), 20965–20976.
  • Liu et al. (2023b) Yixuan Liu, Suyun Zhao, Li Xiong, Yuhan Liu, and Hong Chen. 2023b. Echo of Neighbors: Privacy Amplification for Personalized Private Federated Learning with Shuffle Model. In Proceedings of the AAAI Conference on Artificial Intelligence.
  • Lu et al. (2019) Yunlong Lu, Xiaohong Huang, Yueyue Dai, Sabita Maharjan, and Yan Zhang. 2019. Differentially private asynchronous federated learning for mobile edge computing in urban informatics. IEEE Transactions on Industrial Informatics 16, 3 (2019), 2134–2143.
  • Lundervold and Lundervold (2019) Alexander Selvikvag Lundervold and Arvid Lundervold. 2019. An overview of deep learning in medical imaging focusing on MRI. Zeitschrift für Medizinische Physik 29, 2 (2019), 102–127.
  • Mahawaga Arachchige et al. (2022) Pathum Chamikara Mahawaga Arachchige, Dongxi Liu, Seyit Camtepe, Surya Nepal, Marthie Grobler, Peter Bertok, and Ibrahim Khalil. 2022. Local differential privacy for federated learning. In European Symposium on Research in Computer Security. Springer, 195–216.
  • Malekzadeh et al. (2021) Mohammad Malekzadeh, Burak Hasircioglu, Nitish Mital, Kunal Katarya, Mehmet Emre Ozfatura, and Deniz Gündüz. 2021. Dopamine: Differentially private federated learning on medical data. arXiv preprint arXiv:2101.11693 (2021).
  • Mao et al. (2022) Yunlong Mao, Zexi Xin, Zhenyu Li, Jue Hong, Yang Qingyou, and Sheng Zhong. 2022. Secure Split Learning against Property Inference and Data Reconstruction Attacks. (2022).
  • Mathews and Bowman (2018) KJ Mathews and CM Bowman. 2018. The California Consumer Privacy Act of 2018.
  • McMahan et al. (2017a) Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. 2017a. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics. PMLR, 1273–1282.
  • McMahan et al. (2017b) H Brendan McMahan, Daniel Ramage, Kunal Talwar, and Li Zhang. 2017b. Learning differentially private recurrent language models. arXiv preprint arXiv:1710.06963 (2017).
  • McSherry and Talwar (2007) Frank McSherry and Kunal Talwar. 2007. Mechanism design via differential privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS’07). IEEE, 94–103.
  • McSherry (2009) Frank D McSherry. 2009. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data. 19–30.
  • Melis et al. (2019) Luca Melis, Congzheng Song, Emiliano De Cristofaro, and Vitaly Shmatikov. 2019. Exploiting unintended feature leakage in collaborative learning. In 2019 IEEE symposium on security and privacy (SP). IEEE, 691–706.
  • Minto et al. (2021) Lorenzo Minto, Moritz Haller, Benjamin Livshits, and Hamed Haddadi. 2021. Stronger privacy for federated collaborative filtering with implicit feedback. In Proceedings of the 15th ACM Conference on Recommender Systems. 342–350.
  • Mironov (2017) Ilya Mironov. 2017. Rényi differential privacy. In 2017 IEEE 30th computer security foundations symposium (CSF). IEEE, 263–275.
  • Mohassel and Zhang (2017) Payman Mohassel and Yupeng Zhang. 2017. Secureml: A system for scalable privacy-preserving machine learning. In 2017 IEEE symposium on security and privacy (SP). IEEE, 19–38.
  • Mothukuri et al. (2021) Viraaji Mothukuri, Reza M Parizi, Seyedamin Pouriyeh, Yan Huang, Ali Dehghantanha, and Gautam Srivastava. 2021. A survey on security and privacy of federated learning. Future Generation Computer Systems 115 (2021), 619–640.
  • Mueller et al. (2022) Tamara T Mueller, Dmitrii Usynin, Johannes C Paetzold, Daniel Rueckert, and Georgios Kaissis. 2022. SoK: Differential privacy on graph-structured data. arXiv preprint arXiv:2203.09205 (2022).
  • Naseri et al. (2020) Mohammad Naseri, Jamie Hayes, and Emiliano De Cristofaro. 2020. Local and central differential privacy for robustness and privacy in federated learning. arXiv preprint arXiv:2009.03561 (2020).
  • Nasr et al. (2023) Milad Nasr, Jamie Hayes, Thomas Steinke, Borja Balle, Florian Tramèr, Matthew Jagielski, Nicholas Carlini, and Andreas Terzis. 2023. Tight auditing of differentially private machine learning. In 32nd USENIX Security Symposium (USENIX Security 23). 1631–1648.
  • Nasr et al. (2019) Milad Nasr, Reza Shokri, and Amir Houmansadr. 2019. Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In 2019 IEEE symposium on security and privacy (SP). IEEE, 739–753.
  • Nguyên et al. (2016) Thông T Nguyên, Xiaokui Xiao, Yin Yang, Siu Cheung Hui, Hyejin Shin, and Junbum Shin. 2016. Collecting and analyzing data from smart device users with local differential privacy. arXiv preprint arXiv:1606.05053 (2016).
  • Noble et al. (2022) Maxence Noble, Aurélien Bellet, and Aymeric Dieuleveut. 2022. Differentially private federated learning on heterogeneous data. In International Conference on Artificial Intelligence and Statistics. PMLR, 10110–10145.
  • Oh et al. (2022) Seungeun Oh, Jihong Park, Sihun Baek, Hyelin Nam, Praneeth Vepakomma, Ramesh Raskar, Mehdi Bennis, and Seong-Lyun Kim. 2022. Differentially private cutmix for split learning with vision transformer. arXiv preprint arXiv:2210.15986 (2022).
  • Pan et al. (2021b) Qianqian Pan, Jun Wu, Ali Kashif Bashir, Jianhua Li, Wu Yang, and Yasser D Al-Otaibi. 2021b. Joint protection of energy security and information privacy for energy harvesting: An incentive federated learning approach. IEEE Transactions on Industrial Informatics 18, 5 (2021), 3473–3483.
  • Pan et al. (2010) Sinno Jialin Pan, Ivor W Tsang, James T Kwok, and Qiang Yang. 2010. Domain adaptation via transfer component analysis. IEEE transactions on neural networks 22, 2 (2010), 199–210.
  • Pan et al. (2021a) Yanghe Pan, Jianbing Ni, and Zhou Su. 2021a. Fl-pate: Differentially private federated learning with knowledge transfer. In 2021 IEEE Global Communications Conference (GLOBECOM). IEEE, 1–6.
  • Papernot et al. (2017) Nicolas Papernot, Martín Abadi, Úlfar Erlingsson, Ian Goodfellow, and Kunal Talwar. 2017. Semi-supervised knowledge transfer for deep learning from private training data. In International Conference on Learning Representations.
  • Papernot et al. (2018) Nicolas Papernot, Shuang Song, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Ulfar Erlingsson. 2018. Scalable Private Learning with PATE. In International Conference on Learning Representations.
  • Papernot et al. (2021) Nicolas Papernot, Abhradeep Thakurta, Shuang Song, Steve Chien, and Úlfar Erlingsson. 2021. Tempered sigmoid activations for deep learning with differential privacy. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 9312–9321.
  • Pasquini et al. (2021) Dario Pasquini, Giuseppe Ateniese, and Massimo Bernaschi. 2021. Unleashing the tiger: Inference attacks on split learning. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security. 2113–2129.
  • Paul and Mishra (2020) Sudipta Paul and Subhankar Mishra. 2020. ARA: aggregated RAPPOR and analysis for centralized differential privacy. SN Computer Science 1, 1 (2020), 22.
  • Perrier et al. (2018) Victor Perrier, Hassan Jameel Asghar, and Dali Kaafar. 2018. Private continual release of real-valued data streams. arXiv preprint arXiv:1811.03197 (2018).
  • Qi et al. (2023) Tao Qi, Fangzhao Wu, Chuhan Wu, Liang He, Yongfeng Huang, and Xing Xie. 2023. Differentially private knowledge transfer for federated learning. Nature Communications 14, 1 (2023), 3785.
  • Regev (2009) Oded Regev. 2009. On lattices, learning with errors, random linear codes, and cryptography. Journal of the ACM (JACM) 56, 6 (2009), 1–40.
  • Ruan et al. (2023) Wenqiang Ruan, Mingxin Xu, Wenjing Fang, Li Wang, Lei Wang, and Weili Han. 2023. Private, efficient, and accurate: Protecting models trained by multi-party learning with differential privacy. In 2023 IEEE Symposium on Security and Privacy (SP). IEEE, 1926–1943.
  • Ryu and Kim (2022) Minseok Ryu and Kibaek Kim. 2022. Differentially private federated learning via inexact ADMM with multiple local updates. arXiv preprint arXiv:2202.09409 (2022).
  • Sajadmanesh et al. (2023) Sina Sajadmanesh, Ali Shahin Shamsabadi, Aurélien Bellet, and Daniel Gatica-Perez. 2023. GAP: Differentially Private Graph Neural Networks with Aggregation Perturbation. In 32nd USENIX Security Symposium (USENIX Security 23). 3223–3240.
  • Salem et al. (2023) Ahmed Salem, Giovanni Cherubin, David Evans, Boris Köpf, Andrew Paverd, Anshuman Suri, Shruti Tople, and Santiago Zanella-Béguelin. 2023. SoK: Let the privacy games begin! A unified treatment of data inference privacy in machine learning. In 2023 IEEE Symposium on Security and Privacy (SP). IEEE, 327–345.
  • Salem et al. (2019) Ahmed Salem, Yang Zhang, Mathias Humbert, Pascal Berrang, Mario Fritz, and Michael Backes. 2019. ML-Leaks: Model and Data Independent Membership Inference Attacks and Defenses on Machine Learning Models. In Network and Distributed Systems Security (NDSS) Symposium 2019.
  • Shi et al. (2022) Weiyan Shi, Ryan Shea, Si Chen, Chiyuan Zhang, Ruoxi Jia, and Zhou Yu. 2022. Just Fine-tune Twice: Selective Differential Privacy for Large Language Models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 6327–6340.
  • Shi et al. (2023) Yifan Shi, Yingqi Liu, Kang Wei, Li Shen, Xueqian Wang, and Dacheng Tao. 2023. Make landscape flatter in differentially private federated learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 24552–24562.
  • Shin et al. (2018) Hyejin Shin, Sungwook Kim, Junbum Shin, and Xiaokui Xiao. 2018. Privacy enhanced matrix factorization for recommendation with local differential privacy. IEEE Transactions on Knowledge and Data Engineering 30, 9 (2018), 1770–1782.
  • Skellam (1946) John G Skellam. 1946. The frequency distribution of the difference between two Poisson variates belonging to different populations. Journal of the Royal Statistical Society Series A: Statistics in Society 109, 3 (1946), 296–296.
  • Song et al. (2017) Congzheng Song, Thomas Ristenpart, and Vitaly Shmatikov. 2017. Machine learning models that remember too much. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 587–601.
  • Stevens et al. (2022) Timothy Stevens, Christian Skalka, Christelle Vincent, John Ring, Samuel Clark, and Joseph Near. 2022. Efficient differentially private secure aggregation for federated learning via hardness of learning with errors. In 31st USENIX Security Symposium (USENIX Security 22). 1379–1395.
  • Stock et al. (2022) Pierre Stock, Igor Shilov, Ilya Mironov, and Alexandre Sablayrolles. 2022. Defending against Reconstruction Attacks with Rényi Differential Privacy. arXiv e-prints (2022), arXiv–2202.
  • Sun and Lyu (2020) Lichao Sun and Lingjuan Lyu. 2020. Federated model distillation with noise-free differential privacy. arXiv preprint arXiv:2009.05537 (2020).
  • Sun et al. (2021) Lichao Sun, Jianwei Qian, and Xun Chen. 2021. LDP-FL: Practical Private Aggregation in Federated Learning with Local Differential Privacy. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization.
  • Suresh et al. (2017) Ananda Theertha Suresh, X Yu Felix, Sanjiv Kumar, and H Brendan McMahan. 2017. Distributed mean estimation with limited communication. In International conference on machine learning. PMLR, 3329–3337.
  • Takahashi et al. (2023) Hideaki Takahashi, Jingjing Liu, and Yang Liu. 2023. Eliminating Label Leakage in Tree-Based Vertical Federated Learning. arXiv preprint arXiv:2307.10318 (2023).
  • Tian et al. (2023) Zhihua Tian, Rui Zhang, Xiaoyang Hou, Lingjuan Lyu, Tianyi Zhang, Jian Liu, and Kui Ren. 2023. FederBoost: Private Federated Learning for GBDT. IEEE Transactions on Dependable and Secure Computing (2023).
  • Tian et al. (2022) Zhiliang Tian, Yingxiu Zhao, Ziyue Huang, Yu-Xiang Wang, Nevin L Zhang, and He He. 2022. Seqpate: Differentially private text generation via knowledge distillation. Advances in Neural Information Processing Systems 35 (2022), 11117–11130.
  • Tramer and Boneh (2020) Florian Tramer and Dan Boneh. 2020. Differentially Private Learning Needs Better Features (or Much More Data). In International Conference on Learning Representations.
  • Triastcyn and Faltings (2019) Aleksei Triastcyn and Boi Faltings. 2019. Federated learning with bayesian differential privacy. In 2019 IEEE International Conference on Big Data (Big Data). IEEE, 2587–2596.
  • Triastcyn and Faltings (2020) Aleksei Triastcyn and Boi Faltings. 2020. Bayesian differential privacy for machine learning. In International Conference on Machine Learning. PMLR, 9583–9592.
  • Truex et al. (2020) Stacey Truex, Ling Liu, Ka-Ho Chow, Mehmet Emre Gursoy, and Wenqi Wei. 2020. LDP-Fed: Federated learning with local differential privacy. In Proceedings of the Third ACM International Workshop on Edge Systems, Analytics and Networking. 61–66.
  • Van Erven and Harremos (2014) Tim Van Erven and Peter Harremos. 2014. Rényi divergence and Kullback-Leibler divergence. IEEE Transactions on Information Theory 60, 7 (2014), 3797–3820.
  • Varun et al. (2024) Matta Varun, Shuya Feng, Han Wang, Shamik Sural, and Yuan Hong. 2024. Towards Accurate and Stronger Local Differential Privacy for Federated Learning with Staircase Randomized Response. In 14th ACM Conference on Data and Application Security and Privacy. ACM.
  • Wan et al. (2023) Sheng Wan, Dashan Gao, Hanlin Gu, and Daning Hu. 2023. FedPDD: A Privacy-preserving Double Distillation Framework for Cross-silo Federated Recommendation. arXiv preprint arXiv:2305.06272 (2023).
  • Wang et al. (2023a) Baocang Wang, Yange Chen, Hang Jiang, and Zhen Zhao. 2023a. PPeFL: Privacy-Preserving Edge Federated Learning with Local Differential Privacy. IEEE Internet of Things Journal (2023).
  • Wang et al. (2023b) Boxin Wang, Yibo Jacky Zhang, Yuan Cao, Bo Li, H Brendan McMahan, Sewoong Oh, Zheng Xu, and Manzil Zaheer. 2023b. Can Public Large Language Models Help Private Cross-device Federated Learning? arXiv preprint arXiv:2305.12132 (2023).
  • Wang et al. (2020b) Chang Wang, Jian Liang, Mingkai Huang, Bing Bai, Kun Bai, and Hao Li. 2020b. Hybrid differentially private federated learning on vertically partitioned data. arXiv preprint arXiv:2009.02763 (2020).
  • Wang et al. (2022b) Chen Wang, Xinkui Wu, Gaoyang Liu, Tianping Deng, Kai Peng, and Shaohua Wan. 2022b. Safeguarding cross-silo federated learning with local differential privacy. Digital Communications and Networks 8, 4 (2022), 446–454.
  • Wang et al. (2022a) Han Wang, Hanbin Hong, Li Xiong, Zhan Qin, and Yuan Hong. 2022a. L-srr: Local differential privacy for location-based services with staircase randomized response. In Proceedings of the 2022 ACM SIGSAC Conference on computer and communications security. 2809–2823.
  • Wang et al. (2020a) Lun Wang, Ruoxi Jia, and Dawn Song. 2020a. D2P-Fed: Differentially private federated learning with efficient communication. arXiv preprint arXiv:2006.13039 (2020).
  • Wang et al. (2019b) Ning Wang, Xiaokui Xiao, Yin Yang, Jun Zhao, Siu Cheung Hui, Hyejin Shin, Junbum Shin, and Ge Yu. 2019b. Collecting and analyzing multidimensional data with local differential privacy. In 2019 IEEE 35th International Conference on Data Engineering (ICDE). IEEE, 638–649.
  • Wang et al. (2017) Tianhao Wang, Jeremiah Blocki, Ninghui Li, and Somesh Jha. 2017. Locally differentially private protocols for frequency estimation. In 26th USENIX Security Symposium (USENIX Security 17). 729–745.
  • Wang et al. (2021) Tianhao Wang, Joann Qiongna Chen, Zhikun Zhang, Dong Su, Yueqiang Cheng, Zhou Li, Ninghui Li, and Somesh Jha. 2021. Continuous release of data streams under both centralized and local differential privacy. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security. 1237–1253.
  • Wang et al. (2020d) Teng Wang, Xuefeng Zhang, Jingyu Feng, and Xinyu Yang. 2020d. A comprehensive survey on local differential privacy toward data statistics and analysis. Sensors 20, 24 (2020), 7030.
  • Wang et al. (2020c) Yansheng Wang, Yongxin Tong, and Dingyuan Shi. 2020c. Federated latent dirichlet allocation: A local differential privacy based framework. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 6283–6290.
  • Wang et al. (2019a) Zhibo Wang, Mengkai Song, Zhifei Zhang, Yang Song, Qian Wang, and Hairong Qi. 2019a. Beyond inferring class representatives: User-level privacy leakage from federated learning. In IEEE INFOCOM 2019-IEEE conference on computer communications. IEEE, 2512–2520.
  • Warner (1965) Stanley L Warner. 1965. Randomized response: A survey technique for eliminating evasive answer bias. J. Amer. Statist. Assoc. 60, 309 (1965), 63–69.
  • Wei et al. (2022) Jianxin Wei, Ergute Bao, Xiaokui Xiao, and Yin Yang. 2022. Dpis: An enhanced mechanism for differentially private sgd with importance sampling. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security. 2885–2899.
  • Wei et al. (2021) Kang Wei, Jun Li, Ming Ding, Chuan Ma, Hang Su, Bo Zhang, and H Vincent Poor. 2021. User-level privacy-preserving federated learning: Analysis and performance optimization. IEEE Transactions on Mobile Computing 21, 9 (2021), 3388–3401.
  • Wei et al. (2020) Kang Wei, Jun Li, Ming Ding, Chuan Ma, Howard H Yang, Farhad Farokhi, Shi Jin, Tony QS Quek, and H Vincent Poor. 2020. Federated learning with differential privacy: Algorithms and performance analysis. IEEE Transactions on Information Forensics and Security 15 (2020), 3454–3469.
  • Wu et al. (2021a) Chuhan Wu, Fangzhao Wu, Yang Cao, Yongfeng Huang, and Xing Xie. 2021a. Fedgnn: Federated graph neural network for privacy-preserving recommendation. arXiv preprint arXiv:2102.04925 (2021).
  • Wu et al. (2022) Fan Wu, Yunhui Long, Ce Zhang, and Bo Li. 2022. Linkteller: Recovering private edges from graph neural networks via influence analysis. In 2022 ieee symposium on security and privacy (sp). IEEE, 2005–2024.
  • Wu et al. (2021b) Maoqiang Wu, Dongdong Ye, Jiahao Ding, Yuanxiong Guo, Rong Yu, and Miao Pan. 2021b. Incentivizing differentially private federated learning: A multidimensional contract approach. IEEE Internet of Things Journal 8, 13 (2021), 10639–10651.
  • Wu et al. (2020) Yuncheng Wu, Shaofeng Cai, Xiaokui Xiao, Gang Chen, and Beng Chin Ooi. 2020. Privacy preserving vertical federated learning for tree-based models. arXiv preprint arXiv:2008.06170 (2020).
  • Xiang et al. (2023) Zihang Xiang, Tianhao Wang, Wanyu Lin, and Di Wang. 2023. Practical Differentially Private and Byzantine-resilient Federated Learning. Proceedings of the ACM on Management of Data 1, 2 (2023), 1–26.
  • Xiong et al. (2014) Ping Xiong, Tianqing Zhu, and Xiao-Feng Wang. 2014. A survey on differential privacy and applications. (2014).
  • Xiong et al. (2020) Xingxing Xiong, Shubo Liu, Dan Li, Zhaohui Cai, and Xiaoguang Niu. 2020. A comprehensive survey on local differential privacy. Security and Communication Networks 2020 (2020), 1–29.
  • Xu et al. (2022) Yin Xu, Mingjun Xiao, An Liu, and Jie Wu. 2022. Edge resource prediction and auction for distributed spatial crowdsourcing with differential privacy. IEEE Internet of Things Journal 9, 17 (2022), 15554–15569.
  • Xu et al. (2023a) Zheng Xu, Maxwell Collins, Yuxiao Wang, Liviu Panait, Sewoong Oh, Sean Augenstein, Ting Liu, Florian Schroff, and H Brendan McMahan. 2023a. Learning to generate image embeddings with user-level differential privacy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 7969–7980.
  • Xu et al. (2023b) Zheng Xu, Yanxiang Zhang, Galen Andrew, Christopher A Choquette-Choo, Peter Kairouz, H Brendan McMahan, Jesse Rosenstock, and Yuanbo Zhang. 2023b. Federated learning of gboard language models with differential privacy. arXiv preprint arXiv:2305.18465 (2023).
  • Yang et al. (2021) Ge Yang, Shaowei Wang, and Haijie Wang. 2021. Federated learning with personalized local differential privacy. In 2021 IEEE 6th International Conference on Computer and Communication Systems (ICCCS). IEEE, 484–489.
  • Yang et al. (2023a) Mengmeng Yang, Taolin Guo, Tianqing Zhu, Ivan Tjuawinata, Jun Zhao, and Kwok-Yan Lam. 2023a. Local differential privacy and its applications: A comprehensive survey. Computer Standards & Interfaces (2023), 103827.
  • Yang et al. (2019) Qiang Yang, Yang Liu, Tianjian Chen, and Yongxin Tong. 2019. Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST) 10, 2 (2019), 1–19.
  • Yang et al. (2023b) Xiyuan Yang, Wenke Huang, and Mang Ye. 2023b. Dynamic personalized federated learning with adaptive differential privacy. In Thirty-seventh Conference on Neural Information Processing Systems.
  • Yang et al. (2022) Xin Yang, Jiankai Sun, Yuanshun Yao, Junyuan Xie, and Chong Wang. 2022. Differentially private label protection in split learning. arXiv preprint arXiv:2203.02073 (2022).
  • Yang et al. (2023c) Yuchen Yang, Bo Hui, Haolin Yuan, Neil Gong, and Yinzhi Cao. 2023c. PrivateFL: Accurate, differentially private federated learning via personalized data transformation. In 32nd USENIX Security Symposium (USENIX Security 23). 1595–1612.
  • Yin et al. (2021) Xuefei Yin, Yanming Zhu, and Jiankun Hu. 2021. A comprehensive survey of privacy-preserving federated learning: A taxonomy, review, and future directions. ACM Computing Surveys (CSUR) 54, 6 (2021), 1–36.
  • Yu et al. (2021a) Da Yu, Huishuai Zhang, Wei Chen, and Tie-Yan Liu. 2021a. Do not let privacy overbill utility: Gradient embedding perturbation for private learning. arXiv preprint arXiv:2102.12677 (2021).
  • Yu et al. (2021b) Da Yu, Huishuai Zhang, Wei Chen, Jian Yin, and Tie-Yan Liu. 2021b. Large scale private learning via low-rank reparametrization. In International Conference on Machine Learning. PMLR, 12208–12218.
  • Zhang et al. (2015) Kun Zhang, Mingming Gong, and Bernhard Schölkopf. 2015. Multi-source domain adaptation: A causal view. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 29.
  • Zhang and Kwok (2014) Ruiliang Zhang and James Kwok. 2014. Asynchronous distributed ADMM for consensus optimization. In International conference on machine learning. PMLR, 1701–1709.
  • Zhang et al. (2023b) Shaobo Zhang, Jiyong Zhang, Gengming Zhu, Saiqin Long, and Li Zhetao. 2023b. Personalized Federated Learning Method Based on Bregman Divergence and Differential Privacy (in chinese). Journal of Software (2023).
  • Zhang et al. (2022) Xinwei Zhang, Xiangyi Chen, Mingyi Hong, Zhiwei Steven Wu, and Jinfeng Yi. 2022. Understanding clipping for federated learning: Convergence and client-level differential privacy. In International Conference on Machine Learning, ICML 2022.
  • Zhang et al. (2020) Yuheng Zhang, Ruoxi Jia, Hengzhi Pei, Wenxiao Wang, Bo Li, and Dawn Song. 2020. The secret revealer: Generative model-inversion attacks against deep neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 253–261.
  • Zhang et al. (2023a) Yi Zhang, Yunfan Lu, and Fengxia Liu. 2023a. A systematic survey for differential privacy techniques in federated learning. Journal of Information Security 14, 2 (2023), 111–135.
  • Zhao et al. (2020a) Bo Zhao, Konda Reddy Mopuri, and Hakan Bilen. 2020a. idlg: Improved deep leakage from gradients. arXiv preprint arXiv:2001.02610 (2020).
  • Zhao et al. (2022) Jianzhe Zhao, Mengbo Yang, Ronglin Zhang, Wuganjing Song, Jiali Zheng, Jingran Feng, and Stan Matwin. 2022. Privacy-Enhanced Federated Learning: A Restrictively Self-Sampled and Data-Perturbed Local Differential Privacy Method. Electronics 11, 23 (2022), 4007.
  • Zhao et al. (2020b) Yang Zhao, Jun Zhao, Mengmeng Yang, Teng Wang, Ning Wang, Lingjuan Lyu, Dusit Niyato, and Kwok-Yan Lam. 2020b. Local differential privacy-based federated learning for internet of things. IEEE Internet of Things Journal 8, 11 (2020), 8836–8853.
  • Zheng et al. (2021) Qinqing Zheng, Shuxiao Chen, Qi Long, and Weijie Su. 2021. Federated f-differential privacy. In International Conference on Artificial Intelligence and Statistics. PMLR, 2251–2259.
  • Zhou et al. (2021) Zhou Zhou, Youliang Tian, and Changgen Peng. 2021. Privacy-preserving federated learning framework with general aggregation and multiparty entity matching. Wireless Communications and Mobile Computing 2021 (2021), 1–14.
  • Zhu et al. (2019) Ligeng Zhu, Zhijian Liu, and Song Han. 2019. Deep leakage from gradients. Advances in neural information processing systems 32 (2019).
  • Zhu and Wang (2019) Yuqing Zhu and Yu-Xiang Wang. 2019. Poission subsampled rényi differential privacy. In International Conference on Machine Learning. PMLR, 7634–7642.