
QI-DPFL: Quality-Aware and Incentive-Boosted Federated Learning with Differential Privacy
*Corresponding Author. This work was supported by the National Natural Science Foundation of China under Grant No. 62206320.

Wenhao Yuan, Xuehe Wang*
School of Artificial Intelligence, Sun Yat-sen University, Zhuhai, China
Email: [email protected], [email protected]
Abstract

Federated Learning (FL) has increasingly been recognized as an innovative and secure distributed model training paradigm that coordinates multiple edge clients to collaboratively train a shared model without uploading their private datasets. The challenge of encouraging mobile edge devices to participate actively in FL model training, while mitigating the privacy leakage risks during wireless transmission, remains largely unexplored. In this paper, we propose a novel approach, named QI-DPFL (Quality-Aware and Incentive-Boosted Federated Learning with Differential Privacy), to address this issue. To select clients with high-quality datasets, we first propose a quality-aware client selection mechanism based on the Earth Mover's Distance (EMD) metric. Furthermore, to attract high-quality data contributors, we design an incentive-boosted mechanism that models the interactions between the central server and the selected clients as a two-stage Stackelberg game, where the central server designs the time-dependent reward to minimize its cost by considering the trade-off between accuracy loss and total reward allocated, and each selected client decides the privacy budget to maximize its utility. The Nash Equilibrium of the Stackelberg game is derived to find the optimal solution in each global iteration. Extensive experimental results on different real-world datasets demonstrate the effectiveness of our proposed FL framework in realizing the goals of privacy protection and incentive compatibility.

Index Terms:
Federated learning, Stackelberg game, differential privacy, client selection mechanism.

I Introduction

In the era of rapid advancements in science and technology, an unprecedented volume of data has been generated by edge devices, and the data volume in human society is anticipated to keep growing geometrically. Concurrently, the surge in private data is accompanied by escalating concerns over data privacy and security, drawing considerable attention from both academia and industry. Notably, the enactment of stringent data privacy regulations such as GDPR [1] poses a formidable challenge to accessing and utilizing high-quality private data for training artificial intelligence models. In addition, the huge communication costs associated with data transmission cannot be overlooked. These challenges pave the way for the emergence of innovative machine learning technologies that mitigate the risk of privacy disclosure [2].

Federated learning emerges as a compelling distributed machine learning paradigm, offering multiple benefits including privacy preservation and communication efficiency [3]. However, the widespread implementation of efficient FL systems still encounters several challenges that warrant further investigation [4]. Most research concentrates on enhancing FL model performance and assumes that FL models possess adequate safety, whereas findings from [5] indicate potential risks of significant privacy breaches in gradient propagation schemes. Differential privacy (DP) [6] is a prevalent method for safeguarding data privacy; to alleviate the adverse effects of the strong noise injected while enhancing the protection level, advancements such as $\rho$-zCDP [7] and Rényi DP [8] have been proposed. Recent studies have integrated DP methods with FL, striving to strike a harmonious equilibrium between model performance and privacy preservation [9, 10]. However, most differential privacy-based FL approaches rely on the standard $(\epsilon,\delta)$-DP mechanism, which is susceptible to the Catastrophe Mechanism [11].

Additionally, an idealized assumption in current research posits that mobile devices will participate in FL model training unconditionally once invited, a notion that often falls short in practical scenarios: engaging in model training entails significant consumption of computational and communication resources, and participants also need to be wary of the potential risk of information leakage [12]. Without a well-designed economic incentive mechanism, egocentric mobile devices are probably reluctant to participate [13]. Recently, incentive mechanism-based federated network optimization and FL have gradually gained extensive attention. The works in [14, 15] focus on modeling the interactions between clients and the central server as a Stackelberg game. Besides, there has emerged a surge of auction-based FL algorithms [2, 9]. Contract theory-based FL models have also been proposed [16, 13]. Nevertheless, most of the aforementioned works on incentive mechanism design overlook the security assurance during parameter transmission between the central server and edge nodes.

To incentivize the participation of mobile devices with high-quality data and eliminate the privacy threats associated with gradient disclosure, we propose a quality-aware and incentive-boosted federated learning framework based on the $\rho$-zero-concentrated differential privacy ($\rho$-zCDP) technique. We first design a client selection mechanism grounded in the Earth Mover's Distance (EMD) metric, followed by a rigorous analysis of the differentially private federated learning (DPFL) framework, which introduces artificial Gaussian noise to obscure local model parameters, thereby addressing privacy concerns. Further, based on the DPFL framework, the interactions between the heterogeneous clients and the central server are modeled as a two-stage Stackelberg game, termed QI-DPFL. In Stage I, the central server devises a time-dependent reward for clients to jointly minimize the accuracy loss and total reward. In Stage II, each selected client determines the optimal privacy budget according to the allocated reward to maximize its own utility. The multi-fold contributions of our work are summarized as follows:

  • Privacy preservation and incentive mechanism in FL: We propose a novel and efficient quality-aware and incentive-boosted federated learning framework based on the $\rho$-zCDP mechanism, named QI-DPFL. We first select the clients with high-quality data and then model the interactions between the central server and the selected clients as a two-stage Stackelberg game, which enables each participant to freely customize its privacy budget while achieving good model performance on the premise of protecting data privacy.

  • Earth Mover's Distance for client selection: We innovatively adopt the EMD metric for client selection mechanism design, to screen the geographically distributed participants for high-quality datasets and improve the training performance.

  • Stackelberg Nash Equilibrium Analysis: By analyzing the interactions between the central server and selected clients, we derive the optimal reward and the optimal privacy budget in Stage I and Stage II respectively. Moreover, we demonstrate that the optimal strategy profile forms a Stackelberg Nash Equilibrium. Extensive experiments on different real-world datasets verify the effectiveness and security of our proposed differentially private federated learning framework.

Figure 1: The framework of QI-DPFL consists of the following procedures: Step 1: The central server initializes the global model and publishes the task request. Step 2: The clients submit their data distributions to the central server and are selected by the EMD metric. Step 3: Each selected client collects its private dataset $\mathcal{D}_i$ and performs local training. Step 4: Once the local training is finished, each client uploads the perturbed model parameter $\nabla\widetilde{F}_i(\boldsymbol{w}_i(t))$ to the central server. Step 5: The central server offers a reward $\mathcal{R}_t$ to all participating clients to compensate for the cost incurred by each selected participant. Step 6: Upon aggregating the local parameters, the global model is updated and further verified on the testing set. The training process concludes when the required accuracy or a preset number of iterations is reached. Otherwise, the central server redistributes the updated global model to clients for the next iteration (Steps 3-6).

II Preliminaries

II-A Standard Federated Learning Model

FL introduces an innovative decentralized machine learning paradigm in which a global model is collaboratively trained by numerous geographically distributed clients with locally collected data. Each client performs one or multiple epochs of mini-batch stochastic gradient descent (SGD) and subsequently transmits the updated local model to a central server for local model aggregation and global model update. Then, the central server dispatches the updated global parameter to all clients to trigger a fresh cycle of local training, until the global model converges or a predefined maximum number of iterations is reached.

Consider $N$ clients participating in FL training, where each client $i\in\{1,2,\ldots,N\}$ utilizes its local dataset $\mathcal{D}_i$ with data size $|\mathcal{D}_i|$ to contribute to model training. In each global iteration, each client performs $L$ ($L\geq 1$) epochs of SGD training in parallel to update its local parameter:

$$\boldsymbol{w}_i^{l+1}(t)=\boldsymbol{w}_i^{l}(t)-\eta_i\nabla F_i\big(\boldsymbol{w}_i^{l}(t)\big), \qquad (1)$$

where $\eta_i\in(0,1)$ is the learning rate of client $i$, $l\in\{1,2,\ldots,L\}$, and we define the local model $\boldsymbol{w}_i^{L}(t)=\boldsymbol{w}_i(t)$. The central server averages the local models from the $N$ participating clients to update the global model $\boldsymbol{w}(t)$:

$$\boldsymbol{w}(t)=\sum_{i=1}^{N}\frac{|\mathcal{D}_i|}{\sum_{j=1}^{N}|\mathcal{D}_j|}\boldsymbol{w}_i(t). \qquad (2)$$

The goal of FL is to find the optimal global parameter $\boldsymbol{w}^*$ that minimizes the global loss function $F(\boldsymbol{w})$:

$$\boldsymbol{w}^{*}=\arg\min_{\boldsymbol{w}}F(\boldsymbol{w})=\arg\min_{\boldsymbol{w}}\sum_{i=1}^{N}\frac{|\mathcal{D}_i|}{\sum_{j=1}^{N}|\mathcal{D}_j|}F_i(\boldsymbol{w}). \qquad (3)$$
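For illustration, a minimal sketch of the data-size-weighted aggregation in Eq (2), assuming each client's updated parameters and dataset size are already available as NumPy arrays (the names fedavg_aggregate, client_weights and data_sizes are illustrative, not part of the paper):

import numpy as np

def fedavg_aggregate(client_weights, data_sizes):
    """Data-size-weighted average of local models, as in Eq (2)."""
    total = float(sum(data_sizes))
    # Each client's parameters are weighted by its share of the total data.
    return sum((n / total) * w for w, n in zip(client_weights, data_sizes))

# Toy usage: three clients holding 2-dimensional models.
w_global = fedavg_aggregate(
    [np.array([0.1, 0.2]), np.array([0.3, 0.1]), np.array([0.2, 0.4])],
    [100, 200, 300],
)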

To facilitate the theoretical analysis in the later sections, we employ the following commonly used assumptions [17, 18].

Assumption 1

[$\beta$-Lipschitz Smoothness] The local loss function $F_i(\boldsymbol{w})$ is $\beta$-Lipschitz smooth for each participating client $i\in\{1,2,\ldots,N\}$ and any $\boldsymbol{w},\boldsymbol{w}'\in\mathbb{R}^d$:

$$\|\nabla F_i(\boldsymbol{w})-\nabla F_i(\boldsymbol{w}')\|\leq\beta\|\boldsymbol{w}-\boldsymbol{w}'\|. \qquad (4)$$
Assumption 2

[$\lambda$-Strong Convexity] The global loss function $F(\boldsymbol{w})$ is $\lambda$-strongly convex for any $\boldsymbol{w},\boldsymbol{w}'\in\mathbb{R}^d$:

$$F(\boldsymbol{w})-F(\boldsymbol{w}')\geq\langle\boldsymbol{w}-\boldsymbol{w}',\nabla F(\boldsymbol{w}')\rangle+\frac{\lambda}{2}\|\boldsymbol{w}-\boldsymbol{w}'\|^{2}. \qquad (5)$$

II-B Differential Privacy Mechanism

To defend against attacks such as the gradient inversion attack [5], which may disclose the original training data without accessing the datasets, $\rho$-zero-concentrated differential privacy ($\rho$-zCDP) was proposed in [7]; it attains a tight composition bound and is more suitable for analyzing the end-to-end privacy loss of iterative algorithms [19].

Firstly, we define a metric $\mathcal{L}_p$ to measure the privacy loss. Specifically, for a randomized mechanism $\mathcal{M}:\mathcal{X}\rightarrow\mathcal{E}$ with domain $\mathcal{X}$ and range $\mathcal{E}$, and any two adjacent datasets $\mathcal{D},\mathcal{D}'\subseteq\mathcal{X}$ that have the same size but differ in only one sample, after observing an output $o\in\mathcal{E}$, the privacy loss $\mathcal{L}_p$ is given as:

$$\mathcal{L}_p=\ln\left(\frac{\Pr[\mathcal{M}(\mathcal{D})=o]}{\Pr[\mathcal{M}(\mathcal{D}')=o]}\right). \qquad (6)$$

Then, the formal definition of $\rho$-zCDP is given as follows:

Definition 1

A randomized mechanism $\mathcal{M}:\mathcal{X}\rightarrow\mathcal{E}$ with domain $\mathcal{X}$ and range $\mathcal{E}$ satisfies $\rho$-zero-concentrated differential privacy ($\rho$-zCDP) if for any $\alpha\in(1,\infty)$, we have:

$$\mathbb{E}\big[e^{(\alpha-1)\mathcal{L}_p}\big]\leq e^{(\alpha-1)(\rho\alpha)}. \qquad (7)$$

Based on the Gaussian mechanism, given a query function $Q:\mathcal{X}\rightarrow\mathcal{E}$, the sensitivity of the query function $Q$ is defined as $\Delta Q=\max_{\mathcal{D},\mathcal{D}'}\|Q(\mathcal{D})-Q(\mathcal{D}')\|_2$ for any two adjacent datasets $\mathcal{D},\mathcal{D}'\subseteq\mathcal{X}$. Specifically, in the $t$-th global training iteration, by adding the artificial Gaussian noise $\boldsymbol{n}_i(t)\sim\mathcal{N}(0,\sigma_i^2(t))$, the transmitted parameter of client $i$ becomes:

$$\nabla\widetilde{F}_i(\boldsymbol{w}(t))=\nabla F_i(\boldsymbol{w}(t))+\boldsymbol{n}_i(t), \qquad (8)$$

which satisfies $\rho_i^t$-zCDP with $\rho_i^t=\frac{\Delta Q^2}{2\sigma_i^2(t)}$ [6]. Based on the definition of the query function, we can easily derive the upper bound of the sensitivity $\Delta Q$, as given in Corollary 1.

Corollary 1

In each global iteration, by utilizing the Gaussian mechanism to perturb the transmitted parameters and implementing the $\rho$-zCDP mechanism for each participating client $i$, the sensitivity $\Delta Q$ of the query function $Q$ is bounded by $2C/|\mathcal{D}_i|$.

Proof:

For client $i$ with any two adjacent datasets $\mathcal{D}_i$ and $\mathcal{D}_i'$, the sensitivity of the query function $Q$ with inputs $\mathcal{D}_i$ and $\mathcal{D}_i'$ can be obtained as follows:

$$\Delta Q=\max_{\mathcal{D}_i,\mathcal{D}_i'}\|Q(\mathcal{D}_i)-Q(\mathcal{D}_i')\|_2=\max_{\mathcal{D}_i,\mathcal{D}_i'}\frac{1}{|\mathcal{D}_i|}\Big\|\sum_{j\in\mathcal{D}_i}\nabla f_i^j(\boldsymbol{w},\boldsymbol{x}_i^j)-\sum_{j\in\mathcal{D}_i'}\nabla f_i^j(\boldsymbol{w}',\boldsymbol{x}_i^j)\Big\|_2=\frac{2C}{|\mathcal{D}_i|}, \qquad (9)$$

where we assume that there exists a clipping threshold $C$ for the $i$-th client's local model in the $t$-th global iteration before adding the artificial perturbation, i.e., $\|\boldsymbol{w}_i(t)\|\leq C$. ∎

In the $t$-th global training iteration, based on Corollary 1 and noting that $\rho_i^t=\frac{\Delta Q^2}{2\sigma_i^2(t)}$, the variance of the Gaussian random noise $\sigma_i^2(t)$ of client $i$ can be easily derived as follows:

$$\sigma_i^2(t)=\frac{2C^2}{\rho_i^t|\mathcal{D}_i|^2}. \qquad (10)$$
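As a concrete illustration, the sketch below calibrates the Gaussian noise from Eq (10) and perturbs a local update as in Eq (8); the clipping step, function name and parameter names are assumptions made for this example rather than the paper's implementation:

import numpy as np

def perturb_update(grad, clip_C, data_size, rho_it, rng=None):
    """Clip the local update to norm C, then add Gaussian noise with
    variance sigma^2 = 2*C^2 / (rho_i^t * |D_i|^2), as in Eqs (8) and (10)."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_C / (norm + 1e-12))  # enforce the clipping bound C
    sigma = np.sqrt(2.0 * clip_C ** 2 / (rho_it * data_size ** 2))
    return clipped + rng.normal(0.0, sigma, size=grad.shape)

# Example: perturb a toy update under privacy budget rho_i^t = 0.5.
noisy = perturb_update(np.array([0.4, -1.2, 0.7]), clip_C=1.0,
                       data_size=500, rho_it=0.5)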

III System Model

We propose a two-layer federated learning framework based on the differential privacy technique, as shown in Fig. 1. In the client selection layer, the central server outsources FL tasks to all clients. The clients who are willing to participate submit their data distributions, containing only the information of label frequencies, to the central server, based on which the central server selects the participants with high-quality data according to the EMD metric. Then, in the second layer, the interactions between the central server and the selected clients are formulated as a two-stage Stackelberg game, where the central server decides the optimal rewards to the selected clients to incentivize them to contribute data with a high privacy budget, and each selected client determines its privacy budget $\rho_i^t$ based on the rewards and adds artificial noise to its local parameter to avoid severe privacy leakage during gradient uploading.

III-A Quality-Aware Client Selection Mechanism

From the perspective of the central server, to minimize the cost while reaching the accuracy threshold, it is crucial to select the clients with superior data quality using a metric that quantifies clients' potential contributions to the FL system. To reveal pertinent information about the local data while ensuring privacy preservation, we focus on a critical attribute of the local data, namely the data distribution [20].

In FL, the data distribution varies owing to the distinct preferences of heterogeneous clients, leading to a non-independent and identically distributed (Non-IID) setting. The Non-IID characteristic of the data dominantly affects the model performance, such as training accuracy [21]. As shown in [20], the accuracy degradation is significantly affected by the weight divergence, which can be quantified by the Earth Mover's Distance (EMD) metric. A larger EMD value indicates higher weight divergence and thus lower global model quality.

In the first layer, we assume that there are a total of $H$ clients who are willing to participate in the FL model training process. Consider a $P$-class classification task defined over a compact feature space $\mathcal{Y}$ and a label space $\mathcal{Z}$. The data samples $\mathcal{D}_h=\{\boldsymbol{x}_h,y_h\}$ of client $h\in\{1,2,\ldots,H\}$, with $\boldsymbol{x}_h\in\mathcal{Y}$ and $y_h\in\mathcal{Z}$, follow the distribution $\mathbb{P}_h$. Given the actual distribution $\mathbb{P}_a$ of the whole population, we denote the EMD of $\mathcal{D}_h$ by $\theta_h$, which can be calculated as follows:

$$\theta_h=\sum_{j\in\mathcal{Z}}\left\|\mathbb{P}_h(y=j)-\mathbb{P}_a(y=j)\right\|, \qquad (11)$$

where the actual distribution $\mathbb{P}_a$ is a reference distribution, which may be public information or estimated from historical data. If the EMD value $\theta_h$ of client $h\in\{1,2,\ldots,H\}$ is larger than the pre-set threshold (i.e., $\theta_h>\theta_{\text{th}}$), client $h$ is not selected to execute the FL task.
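The selection rule can be sketched as follows, assuming each candidate client reports its label frequencies as a dictionary; the helper names and the threshold value are illustrative:

import numpy as np

def emd(label_freq, reference_freq, num_classes):
    """EMD-style distance of Eq (11): sum of per-class absolute deviations
    between a client's label distribution and the reference distribution."""
    p_h = np.array([label_freq.get(j, 0.0) for j in range(num_classes)])
    p_a = np.array([reference_freq.get(j, 0.0) for j in range(num_classes)])
    return np.abs(p_h - p_a).sum()

def select_clients(client_label_freqs, reference_freq, num_classes, theta_th=0.5):
    """Keep only clients whose EMD value is below the preset threshold."""
    return [h for h, freq in client_label_freqs.items()
            if emd(freq, reference_freq, num_classes) < theta_th]

# Example: two candidate clients against a uniform reference over 4 classes.
ref = {j: 0.25 for j in range(4)}
clients = {0: {0: 0.3, 1: 0.2, 2: 0.25, 3: 0.25},   # near-IID
           1: {0: 0.9, 1: 0.1}}                      # highly skewed
print(select_clients(clients, ref, num_classes=4))   # -> [0]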

III-B Incentive Mechanism Design with Stackelberg Game

Suppose that $N$ clients are selected by the central server. The interactions between the central server and the selected clients are modeled as a two-stage Stackelberg game. Specifically, at Stage I, the central server, which acts as the leader, decides the optimal payment $\mathcal{R}_t^*$ in the $t$-th global iteration to minimize its cost $\mathcal{U}_T$. Then, at Stage II, based on the reward allocated by the central server, each selected client $i\in\{1,2,\ldots,N\}$, which acts as a follower, maximizes its utility function $\mathcal{U}_i^t$ by determining the optimal privacy budget $\rho_i^t$.

III-B1 Central Server’s Cost (Stage I)

Before introducing the cost function of the central server, we first discuss how the privacy budget $\rho_i^t$ of each client affects the accuracy loss of the global model. From Eq (8), we can derive the global gradient with Gaussian random noise as follows:

$$\nabla\widetilde{F}(\boldsymbol{w}(t))=\frac{1}{N}\sum_{i=1}^{N}\big(\nabla F_i(\boldsymbol{w}(t))+\boldsymbol{n}_i(t)\big)=\nabla F(\boldsymbol{w}(t))+\frac{1}{N}\sum_{i=1}^{N}\boldsymbol{n}_i(t). \qquad (12)$$

Inspired by [22, 14], we assume that the gradient of the global loss function attains an upper bound $V$ (i.e., $\|\nabla F(\boldsymbol{w}(t))\|_2\leq V$). Further, note that the Gaussian random noise has zero mean and $\mathbb{E}[\|\sum_{i=1}^{N}\boldsymbol{n}_i(t)\|_2^2]=d\sum_{i=1}^{N}\sigma_i^2(t)$, where $d$ represents the dimension of the perturbed parameter vector and the variance $\sigma_i^2(t)$ is determined by client $i$'s privacy budget $\rho_i^t$ as shown in Eq (10). Thus, the upper bound of $\mathbb{E}[\|\nabla\widetilde{F}(\boldsymbol{w}(t))\|_2^2]$ can be derived as:

$$\mathbb{E}[\|\nabla\widetilde{F}(\boldsymbol{w}(t))\|_2^2]=\mathbb{E}\Big[\big\|\nabla F(\boldsymbol{w}(t))+\tfrac{1}{N}\sum_{i=1}^{N}\boldsymbol{n}_i(t)\big\|_2^2\Big]=\mathbb{E}[\|\nabla F(\boldsymbol{w}(t))\|_2^2]+\tfrac{1}{N^2}\mathbb{E}\Big[\big\|\sum_{i=1}^{N}\boldsymbol{n}_i(t)\big\|_2^2\Big]\leq V^2+\tfrac{d}{N^2}\sum_{i=1}^{N}\sigma_i^2(t)\triangleq G^2(t). \qquad (13)$$

Suppose that the global loss function $F(\boldsymbol{w})$ satisfies $\beta$-Lipschitz smoothness (Assumption 1) and $\lambda$-strong convexity (Assumption 2), attains its minimum at $\boldsymbol{w}^*$, and that $\mathbb{E}[\|\nabla\widetilde{F}(\boldsymbol{w}(t))\|_2^2]\leq G^2(t)$. Since the upper bound of $\mathbb{E}[\|\nabla\widetilde{F}(\boldsymbol{w}(t))\|_2^2]$ varies with time, we define $G^2=\sum_{t=1}^{T}G^2(t)/T$. Denoting the learning rate as $\eta_t=1/(\lambda t)$, the accuracy loss can be bounded as:

$$\mathbb{E}[F(\boldsymbol{w}(T))-F(\boldsymbol{w}^*)]\leq\frac{2\beta G^2}{\lambda^2 T}. \qquad (14)$$

Denote the reward vector by $\boldsymbol{\mathcal{R}}=\{\mathcal{R}_1,\mathcal{R}_2,\ldots,\mathcal{R}_T\}$ and the privacy budget vector by $\boldsymbol{\rho}=\{\rho_i^t,i\in\{1,2,\ldots,N\},t\in\{1,2,\ldots,T\}\}$. Then, the central server's cost function can be expressed as the weighted sum of the accuracy loss and the total reward:

$$\mathcal{U}_T(\boldsymbol{\mathcal{R}},\boldsymbol{\rho})=\gamma\frac{2\beta G^2}{\lambda^2 T}+(1-\gamma)\sum_{k=1}^{T}\pi^{k-1}\mathcal{R}_k=\frac{2\beta\gamma}{\lambda^2 T^2}\sum_{t=1}^{T}\Big(V^2+\sum_{i=1}^{N}\frac{2dC^2}{|\mathcal{D}_i|^2\rho_i^t N^2}\Big)+(1-\gamma)\sum_{k=1}^{T}\pi^{k-1}\mathcal{R}_k, \qquad (15)$$

where the discount factor $\pi\in(0,1)$ measures the decrement of the value of the reward $\mathcal{R}_t$ over time.

III-B2 Client’s Utility (Stage II)

In the $t$-th global iteration, given the central server's reward $\mathcal{R}_t$, the reward allocated to client $i$ is $r_i^t=\frac{\rho_i^t}{\sum_{j=1}^{N}\rho_j^t}\mathcal{R}_t$. The utility of client $i$ in the $t$-th global iteration is defined as the difference between the reward from the central server and the training cost $\mathcal{C}_{i,t}$, i.e.,

$$\mathcal{U}_i^t(\rho_i^t,\boldsymbol{\rho}_{-i}^t,\mathcal{R}_t)=\frac{\rho_i^t}{\sum_{j=1}^{N}\rho_j^t}\mathcal{R}_t-\mathcal{C}_{i,t}, \qquad (16)$$
$$\mathcal{C}_{i,t}=\phi_1\mathcal{C}_{i,t}^{pv}+\phi_2\mathcal{C}_{i,t}^{d}+\phi_3\mathcal{C}_{i,t}^{cp}+\phi_4\mathcal{C}_{i,t}^{cm}, \qquad (17)$$

where $\boldsymbol{\rho}_{-i}^t=\{\rho_1^t,\ldots,\rho_{i-1}^t,\rho_{i+1}^t,\ldots,\rho_N^t\}$ and $\phi_k$, $k\in\{1,2,3,4\}$, are positive coefficients. The training cost $\mathcal{C}_{i,t}$ consists of the privacy cost $\mathcal{C}_{i,t}^{pv}$, local data cost $\mathcal{C}_{i,t}^{d}$, computation cost $\mathcal{C}_{i,t}^{cp}$ and communication cost $\mathcal{C}_{i,t}^{cm}$. Among the components of client $i$'s training cost, the privacy cost $\mathcal{C}_{i,t}^{pv}$ is closely related to the privacy budget $\rho_i^t$, as a larger privacy budget $\rho_i^t$ signifies more precise data, yet it also corresponds to an increased vulnerability to privacy breaches. Specifically, inspired by [19], we denote the privacy cost of client $i$ as a function $c(\nu_i^t,\rho_i^t)$, where $\nu_i^t>0$ is the privacy value that is assumed to be publicly known. Here, we consider a linear privacy cost function for each client, i.e., $\mathcal{C}_{i,t}^{pv}=c(\nu_i^t,\rho_i^t)=\nu_i^t\cdot\rho_i^t$.
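For concreteness, a small sketch of evaluating the utility in Eqs (16)-(17) for one client, with placeholder values for the data, computation and communication cost terms (all names and numbers are illustrative):

def client_utility(rho_i, rho_others, reward_t, nu_i, phi,
                   data_cost=0.0, comp_cost=0.0, comm_cost=0.0):
    """Utility of client i in Eq (16): proportional share of the reward
    minus the weighted training cost of Eq (17) with a linear privacy cost."""
    share = rho_i / (rho_i + sum(rho_others)) * reward_t
    privacy_cost = nu_i * rho_i                      # C_{i,t}^{pv} = nu_i^t * rho_i^t
    total_cost = (phi[0] * privacy_cost + phi[1] * data_cost
                  + phi[2] * comp_cost + phi[3] * comm_cost)
    return share - total_cost

# Example: a client with rho = 0.4 among peers holding 0.3 and 0.5.
u = client_utility(0.4, [0.3, 0.5], reward_t=10.0, nu_i=2.0,
                   phi=(1.0, 0.1, 0.1, 0.1), data_cost=1.0,
                   comp_cost=2.0, comm_cost=0.5)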

The objective of the $i$-th client is to maximize its utility function $\mathcal{U}_i^t$ by dynamically adjusting its privacy budget $\rho_i^t$ at each global iteration $t$ according to the reward $\mathcal{R}_t$ from the central server. The goal of the central server is to minimize its cost function by adjusting the reward $\mathcal{R}_t$, $t\in\{1,2,\ldots,T\}$, distributed to participating clients on the premise of reaching the preset global model accuracy. Note that the reward $\boldsymbol{\mathcal{R}}$ to clients will affect the clients' choices of privacy budgets $\boldsymbol{\rho}$, which in turn affect the central server's cost as shown in Eq (15). Thus, the two-stage Stackelberg game can be formulated as:

$$\text{Stage I: }\min_{\boldsymbol{\mathcal{R}}}\ \mathcal{U}_T(\boldsymbol{\mathcal{R}},\boldsymbol{\rho}), \qquad (18)$$
$$\text{Stage II: }\max_{\rho_i^t}\ \mathcal{U}_i^t(\rho_i^t,\boldsymbol{\rho}_{-i}^t,\mathcal{R}_t), \qquad (19)$$

where the privacy budgets $\rho_i^t$ over time form the strategy profile $\boldsymbol{\rho}_i$ of client $i$, i.e., $\boldsymbol{\rho}_i=\{\rho_i^t,t\in\{1,2,\ldots,T\}\}$.

IV Stackelberg Nash Equilibrium Analysis

In this section, we find the optimal strategy profile $(\boldsymbol{\mathcal{R}},\{\boldsymbol{\rho}_i,i\in\{1,2,\ldots,N\}\})$, with $\boldsymbol{\mathcal{R}}=\{\mathcal{R}_t,t\in\{1,2,\ldots,T\}\}$ and $\boldsymbol{\rho}_i=\{\rho_i^t,t\in\{1,2,\ldots,T\}\}$, for the central server and the selected clients in the two-stage Stackelberg game through backward induction. Firstly, we concentrate on the followers' operation and derive each selected client $i$'s optimal privacy budget $\rho_i^{t*}$ in the $t$-th global iteration under any given reward $\mathcal{R}_t$. Then, considering the trade-off between the model accuracy loss and the payment to the clients, we deduce the central server's optimal strategy $\mathcal{R}_t^*$. Finally, we prove that the optimal solution forms a Stackelberg Nash Equilibrium.

IV-A Optimal Strategy Profile

We adopt a backward induction approach to derive the optimal strategies of the central server and each client, respectively. First of all, we analyze the optimal strategy of each selected client by determining the optimal privacy budget $\rho_i^{t*}$, which is presented in the following theorem.

Theorem 1

In Stage II, given the payment $\mathcal{R}_t$ in the $t$-th global iteration, the optimal privacy budget of each client $i$ is:

$$\rho_i^{t*}=\frac{\mathcal{R}_t(N-1)}{\phi_1\sum_{i=1}^{N}\nu_i^t}-\frac{\mathcal{R}_t\nu_i^t}{\phi_1}\Big(\frac{N-1}{\sum_{i=1}^{N}\nu_i^t}\Big)^2. \qquad (20)$$
Proof:

Firstly, we derive the first-order and second-order derivatives of each client $i$'s utility function $\mathcal{U}_i^t(\rho_i^t,\boldsymbol{\rho}_{-i}^t,\mathcal{R}_t)$ with respect to the privacy budget $\rho_i^t$ as follows:

$$\frac{\partial\mathcal{U}_i^t}{\partial\rho_i^t}=\frac{\mathcal{R}_t\big(\sum_{j=1}^{N}\rho_j^t-\rho_i^t\big)}{(\sum_{i=1}^{N}\rho_i^t)^2}-\phi_1\nu_i^t=\frac{\mathcal{R}_t\sum_{j\in N\setminus\{i\}}\rho_j^t}{(\sum_{i=1}^{N}\rho_i^t)^2}-\phi_1\nu_i^t, \qquad (21)$$
$$\frac{\partial^2\mathcal{U}_i^t}{\partial(\rho_i^t)^2}=-2\mathcal{R}_t\cdot\frac{\sum_{j\in N\setminus\{i\}}\rho_j^t}{(\sum_{i=1}^{N}\rho_i^t)^3}<0. \qquad (22)$$

As the second-order derivative satisfies $\partial^2\mathcal{U}_i^t/\partial(\rho_i^t)^2<0$, the utility of each client $i$ is strictly concave in the feasible region of $\rho_i^t$. Then, we derive the optimal privacy budget of client $i$ in the $t$-th global iteration by solving the equation $\partial\mathcal{U}_i^t/\partial\rho_i^t=0$, i.e.,

$$\rho_i^{t*}=\sum_{i=1}^{N}\rho_i^t-\frac{\phi_1\nu_i^t}{\mathcal{R}_t}\Big(\sum_{i=1}^{N}\rho_i^t\Big)^2. \qquad (23)$$

Since client $i$ cannot observe the other clients' privacy budgets $\rho_j^t$, $j\in N\setminus\{i\}$, as they are private information and there is no communication among clients during the global training iterations, we need to derive the term $\sum_{i=1}^{N}\rho_i^t$. Thus, the first-order condition $\partial\mathcal{U}_i^t/\partial\rho_i^t=0$ from Eq (21) can be rewritten as follows:

$$\sum_{j\in N\setminus\{i\}}\rho_j^t=\frac{\phi_1\nu_i^t}{\mathcal{R}_t}\Big(\sum_{i=1}^{N}\rho_i^t\Big)^2. \qquad (24)$$

Summing both sides of Eq (24) over all clients $i$, we have:

$$\sum_{i=1}^{N}\Big[\sum_{j\in N\setminus\{i\}}\rho_j^t\Big]=\sum_{i=1}^{N}\frac{\phi_1\nu_i^t}{\mathcal{R}_t}\Big(\sum_{i=1}^{N}\rho_i^t\Big)^2\ \Rightarrow\ (N-1)\sum_{i=1}^{N}\rho_i^t=\frac{\phi_1}{\mathcal{R}_t}\Big(\sum_{i=1}^{N}\rho_i^t\Big)^2\sum_{i=1}^{N}\nu_i^t\ \Rightarrow\ \sum_{i=1}^{N}\rho_i^t=\frac{(N-1)\mathcal{R}_t}{\phi_1\sum_{i=1}^{N}\nu_i^t}. \qquad (25)$$

By substituting the term $\sum_{i=1}^{N}\rho_i^t$ from Eq (25) into Eq (23), we can finalize the expression of the optimal privacy budget $\rho_i^{t*}$ as:

$$\rho_i^{t*}=\frac{\mathcal{R}_t(N-1)}{\phi_1\sum_{i=1}^{N}\nu_i^t}-\frac{\mathcal{R}_t\nu_i^t}{\phi_1}\Big(\frac{N-1}{\sum_{i=1}^{N}\nu_i^t}\Big)^2.$$

Hence, the theorem holds. ∎
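A minimal sketch of evaluating the closed form of Eq (20) for all clients, given a reward and the publicly known privacy values (function and variable names are illustrative):

def optimal_privacy_budgets(reward_t, nu, phi1):
    """Closed-form best responses of Eq (20):
    rho_i^{t*} = R_t(N-1)/(phi1*S) - R_t*nu_i*(N-1)^2/(phi1*S^2), with S = sum_i nu_i."""
    N, S = len(nu), sum(nu)
    return [reward_t * (N - 1) / (phi1 * S)
            - reward_t * nu_i / phi1 * ((N - 1) / S) ** 2
            for nu_i in nu]

# Example: three clients with privacy values nu = [1.0, 2.0, 3.0].
rhos = optimal_privacy_budgets(reward_t=10.0, nu=[1.0, 2.0, 3.0], phi1=1.0)
# Their sum recovers Eq (25): (N-1)*R_t / (phi1 * sum(nu)).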

Based on the optimal privacy budget $\rho_i^{t*}$, we derive the central server's optimal reward $\mathcal{R}_t^*$, as summarized in Theorem 2.

Theorem 2

In Stage I, based on the optimal privacy budgets of the clients, i.e., $\boldsymbol{\rho}^{t*}=\{\rho_1^{t*},\rho_2^{t*},\ldots,\rho_N^{t*}\}$, the optimal reward of each global training iteration is $\mathcal{R}_t^*=\sqrt{\frac{A_t\pi^{1-t}}{1-\gamma}}$, where the discount factor $\pi\in(0,1)$ measures the decrement of the value of the reward over time, and the constants are $C_i=\frac{(N-1)\left[\big(\sum_{i=1}^{N}\nu_i^t\big)-\nu_i^t(N-1)\right]}{\phi_1\big(\sum_{i=1}^{N}\nu_i^t\big)^2}$ and $A_t=\sum_{i=1}^{N}\frac{4d\gamma\beta C^2}{T^2\lambda^2 N^2 C_i|\mathcal{D}_i|^2}$.

Proof:

First of all, by substituting the optimal privacy budget $\rho_i^{t*}$ in Eq (20) into the central server's cost function in Eq (15), the cost function $\mathcal{U}_T(\boldsymbol{\mathcal{R}},\boldsymbol{\rho}^*)$ can be rewritten as follows:

$$\mathcal{U}_T(\boldsymbol{\mathcal{R}},\boldsymbol{\rho}^*)=\frac{2\beta\gamma}{\lambda^2 T^2}\sum_{t=1}^{T}\Big(V^2+\frac{d}{N^2}\sum_{i=1}^{N}\frac{2C^2}{|\mathcal{D}_i|^2\rho_i^{t*}}\Big)+(1-\gamma)\sum_{k=1}^{T}\pi^{k-1}\mathcal{R}_k=\frac{2\beta\gamma}{\lambda^2 T^2}\Big(V^2 T+\frac{2dC^2}{N^2}\sum_{t=1}^{T}\sum_{i=1}^{N}\frac{1}{|\mathcal{D}_i|^2\rho_i^{t*}}\Big)+(1-\gamma)\sum_{k=1}^{T}\pi^{k-1}\mathcal{R}_k, \qquad (26)$$

where $\boldsymbol{\rho}^*=\{\rho_i^{t*},i\in\{1,2,\ldots,N\},t\in\{1,2,\ldots,T\}\}$. The first-order derivative of the central server's cost function with respect to $\mathcal{R}_t$ can be derived as:

$$\frac{\partial\mathcal{U}_T}{\partial\mathcal{R}_t}=(1-\gamma)\pi^{t-1}-\frac{4d\gamma\beta C^2}{(T\lambda N)^2}\sum_{i=1}^{N}\frac{\partial\rho_i^{t*}/\partial\mathcal{R}_t}{(|\mathcal{D}_i|\rho_i^{t*})^2}. \qquad (27)$$

Then, we need to consider the existence of a solution to the first-order condition $\partial\mathcal{U}_T/\partial\mathcal{R}_t=0$. As the reward $\mathcal{R}_t\in[0,+\infty)$, we have:

$$\lim_{\mathcal{R}_t\rightarrow 0}\frac{\partial\mathcal{U}_T}{\partial\mathcal{R}_t}\rightarrow-\infty,\qquad\lim_{\mathcal{R}_t\rightarrow+\infty}\frac{\partial\mathcal{U}_T}{\partial\mathcal{R}_t}\rightarrow(1-\gamma)\pi^{t-1}>0. \qquad (28)$$

From Eq (28), we establish the existence of a solution to the equation $\partial\mathcal{U}_T/\partial\mathcal{R}_t=0$. Further, according to Eq (27), we derive the second-order derivative of $\mathcal{U}_T(\boldsymbol{\mathcal{R}},\boldsymbol{\rho}^*)$ with respect to $\mathcal{R}_t$:

$$\frac{\partial^2\mathcal{U}_T}{\partial\mathcal{R}_t^2}=\frac{8d\gamma\beta C^2}{(T\lambda N)^2}\sum_{i=1}^{N}\Big(\frac{\partial\rho_i^{t*}}{\partial\mathcal{R}_t}\frac{1}{|\mathcal{D}_i|}\Big)^2\Big(\frac{1}{\rho_i^{t*}}\Big)^3>0. \qquad (29)$$

Thus, based on Eqs (28)-(29), we obtain the optimal reward $\mathcal{R}_t^*$ by solving the first-order equation $\partial\mathcal{U}_T/\partial\mathcal{R}_t=0$, i.e.,

$$\frac{\partial\mathcal{U}_T}{\partial\mathcal{R}_t}=(1-\gamma)\pi^{t-1}-\frac{4d\gamma\beta C^2}{(T\lambda N)^2}\sum_{i=1}^{N}\frac{\partial\rho_i^{t*}/\partial\mathcal{R}_t}{(|\mathcal{D}_i|\rho_i^{t*})^2}=(1-\gamma)\pi^{t-1}-\frac{A_t}{\mathcal{R}_t^2}=0\ \Rightarrow\ \mathcal{R}_t^*=\sqrt{\frac{A_t}{(1-\gamma)\pi^{t-1}}},$$

where the known constants are $C_i=\frac{(N-1)\left[\big(\sum_{i=1}^{N}\nu_i^t\big)-\nu_i^t(N-1)\right]}{\phi_1\big(\sum_{i=1}^{N}\nu_i^t\big)^2}$ and $A_t=\sum_{i=1}^{N}\frac{4d\gamma\beta C^2}{T^2\lambda^2 N^2 C_i|\mathcal{D}_i|^2}$. ∎
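A sketch of computing the per-iteration optimal reward of Theorem 2 from the stated constants; the numerical values below are placeholders, and the sketch assumes $\sum_i\nu_i^t>\nu_i^t(N-1)$ for every client so that each $C_i$ is positive:

def optimal_reward(t, nu, data_sizes, phi1, gamma, pi, beta, lam, C, d, T):
    """R_t^* = sqrt(A_t * pi^(1 - t) / (1 - gamma)), with C_i and A_t as in Theorem 2."""
    N, S = len(nu), sum(nu)
    # C_i > 0 requires S > nu_i * (N - 1) for every client i.
    C_i = [(N - 1) * (S - nu_i * (N - 1)) / (phi1 * S ** 2) for nu_i in nu]
    A_t = sum(4 * d * gamma * beta * C ** 2
              / (T ** 2 * lam ** 2 * N ** 2 * C_i[i] * data_sizes[i] ** 2)
              for i in range(N))
    return (A_t * pi ** (1 - t) / (1 - gamma)) ** 0.5

# Example with toy constants for N = 3 selected clients.
R_1 = optimal_reward(t=1, nu=[1.0, 1.2, 1.5], data_sizes=[500, 600, 700],
                     phi1=1.0, gamma=0.5, pi=0.9, beta=1.0, lam=0.1,
                     C=1.0, d=100, T=30)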

As shown in Theorem 2, the optimal reward $\mathcal{R}_t^*$ increases with $t$, indicating that the central server requires data of superior quality to attain the desired accuracy level when approaching the end of the time horizon $T$.

Definition 2

(Stackelberg Nash Equilibrium) The strategy profile $(\boldsymbol{\mathcal{R}}^*,\{\boldsymbol{\rho}_i^*,i\in\{1,2,\ldots,N\}\})$, with $\boldsymbol{\mathcal{R}}^*=\{\mathcal{R}_t^*,t\in\{1,2,\ldots,T\}\}$ and $\boldsymbol{\rho}_i^*=\{\rho_i^{t*},t\in\{1,2,\ldots,T\}\}$, constitutes a Stackelberg Nash Equilibrium if for any reward $\boldsymbol{\mathcal{R}}\in\mathbb{R}^T$ and any privacy budget $\rho_i^t\geq 0$:

$$\mathcal{U}_T(\boldsymbol{\mathcal{R}}^*,\boldsymbol{\rho}^*)\leq\mathcal{U}_T(\boldsymbol{\mathcal{R}},\boldsymbol{\rho}^*), \qquad (30)$$
$$\mathcal{U}_i^t(\rho_i^{t*},\boldsymbol{\rho}_{-i}^{t*},\mathcal{R}_t^*)\geq\mathcal{U}_i^t(\rho_i^t,\boldsymbol{\rho}_{-i}^{t*},\mathcal{R}_t^*). \qquad (31)$$
Figure 2: The relationship between the privacy budget $\rho$ and model accuracy on the MNIST and EMNIST datasets. (a) Fitting curve on MNIST; (b) Fitting curve on EMNIST.
Theorem 3

The above two-stage Stackelberg game possesses a Stackelberg Nash Equilibrium.

Proof:

Based on Theorem 1, we deduce that there exists an optimal privacy budget $\rho_i^{t*}$ for each client $i\in\{1,2,\ldots,N\}$ given any reward $\mathcal{R}_t\in\mathbb{R}$. Then, we need to prove that there exists an optimal reward $\mathcal{R}_t^*$ for the central server's cost function $\mathcal{U}_T$ given the optimal privacy budget vector $\boldsymbol{\rho}^{t*}=\{\rho_i^{t*},i\in\{1,2,\ldots,N\}\}$. According to the proof of Theorem 2, $\partial^2\mathcal{U}_T/\partial\mathcal{R}_t^2>0$ indicates that $\mathcal{U}_T(\boldsymbol{\mathcal{R}},\boldsymbol{\rho}^*)$ is convex. Thus, an optimal reward $\mathcal{R}_t^*$ for the central server exists and is obtained by solving the first-order equation $\partial\mathcal{U}_T/\partial\mathcal{R}_t=0$ based on the optimal privacy budget vector $\boldsymbol{\rho}^{t*}$. In other words, the strategies $\mathcal{R}_t^*$ and $\boldsymbol{\rho}^{t*}$ are mutually optimal for the central server and each selected client $i$. Thus, the optimal strategy profile $(\boldsymbol{\mathcal{R}}^*,\{\boldsymbol{\rho}_i^*,i\in\{1,2,\ldots,N\}\})$, with $\boldsymbol{\mathcal{R}}^*=\{\mathcal{R}_t^*,t\in\{1,2,\ldots,T\}\}$ and $\boldsymbol{\rho}_i^*=\{\rho_i^{t*},t\in\{1,2,\ldots,T\}\}$, constitutes a Stackelberg Nash Equilibrium of the two-stage Stackelberg game. Hence, the theorem holds. ∎

Theorem 3 reveals that the optimal privacy budget of the selected clients in Theorem 1 and the optimal reward of the central server in Theorem 2 are mutually optimal, which leads to the steady state of the FL system. The overall framework is summarized in Algorithm 1.

Algorithm 1 QI-DPFL

Input: The total client size $H$, hyperparameters $\gamma$ and $\phi_k$, $k\in\{1,2,3,4\}$, discount factor $\pi\in(0,1)$, local training epoch $L$, global training iteration $T$ and the fitted accuracy function $I$.

1:The central server claims the task request by outsourcing the FL task and initializes the global model.
2:Client Selection Layer:
3:for each client $h\in\{1,2,\ldots,H\}$ do
4:    Calculate the EMD value $\theta_h$ via Eq (11)
5:    if $\theta_h<\theta_{\text{th}}$ then
6:         Add the $h$-th client into the selected client set
7:    end if
8:end for
9:Stackelberg Game-based Model Training Layer:
10:for $t=0$ to $T$ do
11:    for selected client $i\in\{1,2,\ldots,N\}$ in parallel do
12:         Collect private data and perform local training:
13:         for $l=1$ to $L$ do
14:             $\boldsymbol{w}_i^{l+1}(t)=\boldsymbol{w}_i^{l}(t)-\eta_i\nabla F_i(\boldsymbol{w}_i^{l}(t))$
15:         end for
16:         Determine the optimal privacy budget $\rho_i^{t*}$ via Eq (20)
17:         Perturb the local model and upload $\nabla\widetilde{F}_i(\boldsymbol{w}_i(t))$
18:    end for
19:    $\boldsymbol{w}(t+1)=\boldsymbol{w}(t)-\eta\cdot\frac{1}{N}\sum_{i=1}^{N}\nabla\widetilde{F}_i(\boldsymbol{w}(t))$
20:    Distribute the reward $\mathcal{R}_t^*$ (Theorem 2) to all selected clients
21:end for

IV-B Range Analysis of the Reward

In this subsection, we analyze the range of the reward under the Stackelberg Nash Equilibrium that guarantees the pre-set model accuracy while simultaneously minimizing the compensation provided to clients. Firstly, we introduce the additivity property of the privacy budget $\rho_i^t$ of each selected client $i$, as summarized in Lemma 1 [19].

Lemma 1

Suppose that two mechanisms satisfy $\rho_1$-zCDP and $\rho_2$-zCDP, respectively; then their composition satisfies $(\rho_1+\rho_2)$-zCDP.

According to Lemma 1, in the $t$-th global iteration, the global model equivalently satisfies $\rho_t$-zCDP, where the global privacy budget $\rho_t$ equals the summation of the clients' privacy budgets $\rho_i^t$ (i.e., $\rho_t=\sum_{i=1}^{N}\rho_i^t$). Naturally, a larger privacy budget usually results in better model performance. To quantify the global model accuracy as a function of the privacy budget, we measure the test accuracy of the trained model under different privacy budgets on different real-world datasets (i.e., the MNIST, Cifar10, and EMNIST datasets). Fig. 2 shows the results on the MNIST and EMNIST datasets as examples. We observe that, in the $t$-th global iteration, the global model accuracy $\epsilon_t$ can be modeled as a simple concave function of the global privacy budget $\rho_t$ (i.e., $\epsilon_t=I(\rho_t)=I_1-I_2\times e^{-I_3\rho_t-I_4}$), where $I_k$, $k\in\{1,2,3,4\}$, are the corresponding constants.
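As an illustration of how such constants could be obtained, the sketch below fits this exponential-saturation form to measured (privacy budget, accuracy) pairs with SciPy's curve_fit; since $I_2$ and $I_4$ only enter through $I_2 e^{-I_4}$, the sketch absorbs $I_4$ into $I_2$, and the data points and initial guess are made up for the example:

import numpy as np
from scipy.optimize import curve_fit

def accuracy_model(rho, I1, I2, I3):
    """epsilon = I1 - I2 * exp(-I3 * rho - I4); I4 is absorbed into I2 (I4 = 0)."""
    return I1 - I2 * np.exp(-I3 * rho)

# Made-up (global privacy budget, test accuracy) measurements.
rho_obs = np.array([0.1, 0.5, 1.0, 2.0, 4.0, 8.0])
acc_obs = np.array([0.55, 0.78, 0.86, 0.92, 0.95, 0.96])

# Fit the constants; the initial guess is illustrative.
(I1, I2, I3), _ = curve_fit(accuracy_model, rho_obs, acc_obs, p0=[1.0, 0.5, 1.0])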

Remark 1

(Reward Range) Based on the optimal strategy of each selected client, the global privacy budget $\rho_t$ can be calculated as $\rho_t=\frac{\mathcal{R}_t(N-1)}{\phi_1\sum_{i=1}^{N}\nu_i^t}$, based on which the model accuracy $\epsilon_t$ at the $t$-th global iteration can be derived as follows:

$$\epsilon_t=I_1-I_2\times e^{-I_3\rho_t-I_4}=I_1-I_2\times e^{-\frac{I_3\mathcal{R}_t(N-1)}{\phi_1\sum_{i=1}^{N}\nu_i^t}-I_4}\geq\epsilon, \qquad (32)$$

where $\epsilon$ is the predefined model accuracy that should be reached after $T$ iterations of model training. Thus, there exist a lower and an upper bound for the allocated reward $\mathcal{R}_t$ in the $t$-th global iteration, i.e.,

$$\frac{\phi_1\big[\log\big(\frac{I_2}{I_1-\epsilon}\big)-I_4\big]}{(N-1)I_3/\sum_{i=1}^{N}\nu_i^t}\leq\mathcal{R}_t\leq\frac{\phi_1\big[\log\big(\frac{I_2}{I_1-\epsilon_{\max}}\big)-I_4\big]}{(N-1)I_3/\sum_{i=1}^{N}\nu_i^t}. \qquad (33)$$
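A sketch of evaluating the bounds in Eq (33), assuming the fitted constants $I_k$, the target accuracy $\epsilon$ and the maximum accuracy $\epsilon_{\max}$ are given (all numbers are placeholders):

import math

def reward_bounds(I1, I2, I3, I4, eps_target, eps_max, nu, phi1):
    """Lower/upper reward bounds of Eq (33) for one global iteration."""
    N, S = len(nu), sum(nu)
    scale = phi1 * S / ((N - 1) * I3)
    lower = scale * (math.log(I2 / (I1 - eps_target)) - I4)
    upper = scale * (math.log(I2 / (I1 - eps_max)) - I4)
    return lower, upper

# Example with placeholder fitted constants and accuracy targets.
lo, hi = reward_bounds(I1=0.97, I2=0.5, I3=1.2, I4=0.1,
                       eps_target=0.90, eps_max=0.96,
                       nu=[1.0, 1.2, 1.5], phi1=1.0)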
Figure 3: The performance of different models on the EMNIST dataset with IID data distribution. (a) Accuracy on different models; (b) Accuracy on different strategies; (c) Cost of central server; (d) Accuracy on different strategies.
Figure 4: The performance of different models on the EMNIST dataset with Non-IID data distribution. (a) Accuracy on different models; (b) Accuracy on different strategies; (c) Cost of central server; (d) Accuracy on different strategies.
Figure 5: The performance of different models on the Cifar10 dataset with IID data distribution. (a) Accuracy on different models; (b) Accuracy on different strategies; (c) Cost of central server; (d) Reward.

V Numerical Experiments

In this section, we conduct extensive experiments to demonstrate the efficiency of our proposed framework on commonly used real-world datasets in federated learning.

V-A Experimental Settings

Our proposed approach is implemented on three real-world datasets (i.e., MNIST [23], Cifar10 [24] and EMNIST [25]) to demonstrate the generalization ability of our framework. We adopt the Dirichlet distribution [26] to partition the datasets into Non-IID subsets, setting the concentration parameter to 1.0. In addition, three classic machine learning models with different structures are implemented, one for each dataset. Specifically, for the MNIST dataset, we leverage a linear model with a fully connected layer of 784 input and 10 output channels. For the Cifar10 dataset, the local training model is consistent with [21]. For the EMNIST dataset, we adopt a CNN similar in structure to LeNet-5 [23]. The basic dataset information and detailed parameter settings are summarized in Table I.
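For reference, a sketch of a Dirichlet-based Non-IID partition with concentration parameter 1.0, in the spirit of [26]; the helper name and the toy labels are illustrative:

import numpy as np

def dirichlet_partition(labels, num_clients, alpha=1.0, seed=0):
    """Split sample indices across clients so that each class's samples are
    distributed according to a Dirichlet(alpha) draw over clients."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Proportion of class-c samples assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        splits = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, part in enumerate(np.split(idx, splits)):
            client_indices[client_id].extend(part.tolist())
    return client_indices

# Example: partition 1,000 samples with 10 classes among 5 clients.
labels = np.random.default_rng(0).integers(0, 10, size=1000)
parts = dirichlet_partition(labels, num_clients=5, alpha=1.0)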

V-B Experiments on Real Datasets

In the experiments, we denote the approach without client selection and the DP mechanism as FedAvg, the approach that only performs client selection as FedAvg-select, the approach that only applies the DP mechanism as FedAvg-DP, and our proposed method with both client selection and the DP mechanism as QI-DPFL. Moreover, to verify the efficiency and effectiveness of our proposed FL framework, we compare it with two baselines, named Max and Random, following a comparison paradigm similar to that in [27]. Specifically, Max means the central server chooses the largest possible reward value in each global iteration to guarantee the best performance, while Random refers to selecting a random reward for incentivizing clients. Other incentive mechanisms such as [19] and [9] are designed with different objectives, namely to maximize the utility of both clients and the central server, and to maximize the profit of the model marketplace by designing an auction scenario for DPFL, respectively, and thus cannot be compared directly with our QI-DPFL framework. Therefore, we exclude these two closely related methods from our comparative analysis.

EMNIST: For the IID data distribution, as shown in Fig. 3a, compared with FedAvg, FedAvg-DP applies the DP mechanism for privacy preservation and obtains the lowest model accuracy. As FedAvg-select and QI-DPFL select clients with high-quality data, they improve the model performance in terms of convergence rate and accuracy. In Fig. 3b-3d, our proposed QI-DPFL achieves the lowest cost and allocated reward on the premise of guaranteeing model performance comparable to the Max-select method, and achieves higher accuracy than FedAvg-DP. For the Non-IID data distribution, the superiority of QI-DPFL is more pronounced. In Fig. 4a, FedAvg-select and QI-DPFL improve the convergence rate and model accuracy by incorporating the client selection mechanism compared to FedAvg. FedAvg-DP obtains the lowest accuracy as artificial Gaussian noise is added to avoid privacy leakage. Our proposed framework QI-DPFL, despite perturbing the local parameters, still achieves a model performance similar to FedAvg-select, which shows the effectiveness of QI-DPFL. In Fig. 4b-4d, our algorithm attains model performance on par with the Max-select method while keeping costs and rewards minimal. Further, QI-DPFL outperforms FedAvg-DP by selecting participants with superior data quality. The experimental results on the MNIST dataset are similar to those on the EMNIST dataset.

TABLE I: Basic Information of Datasets and Parameter Settings

Datasets | Training Set Size | Validation Set Size | Classes | Image Size | $\eta$ | $T$ | Discount Factor $\pi$
MNIST | 60,000 | 10,000 | 10 | 1 × 28 × 28 | 0.01 | 30 | 0.9429
Cifar10 | 50,000 | 10,000 | 10 | 3 × 32 × 32 | 0.1 | 80 | 0.9664
EMNIST | 731,668 | 82,587 | 62 | 1 × 28 × 28 | 0.01 | 30 | 0.901
Figure 6: The performance of different models on the Cifar10 dataset with Non-IID data distribution. (a) Accuracy on different models; (b) Accuracy on different strategies; (c) Cost of central server; (d) Accuracy on different strategies.

Cifar10: For the IID data distribution, as shown in Fig. 5a, compared with FedAvg, FedAvg-select achieves higher accuracy thanks to the client selection mechanism. To avoid privacy leakage, the artificial noise added to the transmitted parameters by the DP mechanism may deteriorate the model performance; the accuracy of FedAvg-DP is therefore lower than that of FedAvg, as shown in Fig. 5a. Our proposed algorithm QI-DPFL, despite the perturbation on the transmitted local parameters, still achieves a fast convergence rate and testing accuracy similar to FedAvg-select, revealing the effectiveness of QI-DPFL. Fig. 5b-5d indicate that our approach achieves comparable model performance to the Max-select method while maintaining the lowest cost and reward. Additionally, QI-DPFL achieves higher accuracy than FedAvg-DP by selecting clients with high data quality. Under the Non-IID data distribution, the advantages of QI-DPFL are even more conspicuous. In Fig. 6a, FedAvg-DP obtains the lowest model accuracy due to the DP mechanism. Since FedAvg-select and QI-DPFL select clients with high-quality data, they exhibit enhanced performance in terms of both convergence rate and model accuracy. Moreover, although QI-DPFL adds artificial noise, it still achieves accuracy similar to FedAvg-select, which demonstrates the effectiveness of our model. In Fig. 6b-6d, QI-DPFL attains model accuracy on par with the Max-select scheme while keeping costs and rewards minimal. Further, QI-DPFL outperforms FedAvg-DP by selecting clients with superior data quality.

Based on the above experimental results, it becomes evident that the advantages of our proposed QI-DPFL are particularly pronounced on the Non-IID datasets, which is attributed to the heightened effectiveness of the client selection mechanism.

VI Conclusion

In this paper, we propose a novel federated learning framework called QI-DPFL to jointly solve the client selection and incentive mechanism problem on the premise of preserving clients’ data privacy. We adopt Earth Mover’s Distance (EMD) metric to select clients with high-quality data. Furthermore, we model the interactions between clients and the central server as a two-stage Stackelberg game and derive the Stackelberg Nash Equilibrium to describe the steady state of the system. Extensive experiments on MNIST, Cifar10, and EMNIST datasets for both IID and Non-IID settings demonstrate that QI-DPFL achieves comparable model accuracy and faster convergence rate with the lowest cost and reward for the central server.

References

  • [1] W. G. Voss, “European union data privacy law reform: General data protection regulation, privacy shield, and the right to delisting,” The Business Lawyer, vol. 72, no. 1, pp. 221–234, 2016.
  • [2] R. Zhou, J. Pang, Z. Wang, J. C. Lui, and Z. Li, “A truthful procurement auction for incentivizing heterogeneous clients in federated learning,” in 2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS).   IEEE, 2021, pp. 183–193.
  • [3] W. He, H. Yao, T. Mai, F. Wang, and M. Guizani, “Three-stage stackelberg game enabled clustered federated learning in heterogeneous uav swarms,” IEEE Transactions on Vehicular Technology, 2023.
  • [4] J. S. Ng, W. Y. B. Lim, H.-N. Dai, Z. Xiong, J. Huang, D. Niyato, X.-S. Hua, C. Leung, and C. Miao, “Joint auction-coalition formation framework for communication-efficient federated learning in uav-enabled internet of vehicles,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 4, pp. 2326–2344, 2020.
  • [5] L. Zhu, Z. Liu, and S. Han, “Deep leakage from gradients,” Advances in neural information processing systems, vol. 32, 2019.
  • [6] C. Dwork, A. Roth et al., “The algorithmic foundations of differential privacy,” Foundations and Trends® in Theoretical Computer Science, vol. 9, no. 3–4, pp. 211–407, 2014.
  • [7] M. Bun and T. Steinke, “Concentrated differential privacy: Simplifications, extensions, and lower bounds,” in Theory of Cryptography: 14th International Conference, TCC 2016-B, Beijing, China, October 31-November 3, 2016, Proceedings, Part I.   Springer, 2016, pp. 635–658.
  • [8] I. Mironov, “Rényi differential privacy,” in 2017 IEEE 30th computer security foundations symposium (CSF).   IEEE, 2017, pp. 263–275.
  • [9] P. Sun, X. Chen, G. Liao, and J. Huang, “A profit-maximizing model marketplace with differentially private federated learning,” in IEEE INFOCOM 2022-IEEE Conference on Computer Communications.   IEEE, 2022, pp. 1439–1448.
  • [10] X. Wu, Y. Zhang, M. Shi, P. Li, R. Li, and N. N. Xiong, “An adaptive federated learning scheme with differential privacy preserving,” Future Generation Computer Systems, vol. 127, pp. 362–372, 2022.
  • [11] J. P. Near and C. Abuah, “Programming differential privacy,” URL: https://uvm, 2021.
  • [12] J. Kang, Z. Xiong, D. Niyato, H. Yu, Y.-C. Liang, and D. I. Kim, “Incentive design for efficient federated learning in mobile networks: A contract theory approach,” in 2019 IEEE VTS Asia Pacific Wireless Communications Symposium (APWCS).   IEEE, 2019, pp. 1–5.
  • [13] M. Wu, D. Ye, J. Ding, Y. Guo, R. Yu, and M. Pan, “Incentivizing differentially private federated learning: A multidimensional contract approach,” IEEE Internet of Things Journal, vol. 8, no. 13, pp. 10 639–10 651, 2021.
  • [14] Z. Yi, Y. Jiao, W. Dai, G. Li, H. Wang, and Y. Xu, “A stackelberg incentive mechanism for wireless federated learning with differential privacy,” IEEE Wireless Communications Letters, vol. 11, no. 9, pp. 1805–1809, 2022.
  • [15] Y. Xu, M. Xiao, J. Wu, H. Tan, and G. Gao, “A personalized privacy preserving mechanism for crowdsourced federated learning,” IEEE Transactions on Mobile Computing, 2023.
  • [16] N. Ding, Z. Fang, and J. Huang, “Optimal contract design for efficient federated learning with multi-dimensional private information,” IEEE Journal on Selected Areas in Communications, vol. 39, no. 1, pp. 186–200, 2020.
  • [17] H. Wu and P. Wang, “Fast-convergent federated learning with adaptive weighting,” IEEE Transactions on Cognitive Communications and Networking, vol. 7, no. 4, pp. 1078–1088, 2021.
  • [18] Y. Sun, H. Fernando, T. Chen, and S. Shahrampour, “On the stability analysis of open federated learning systems,” in 2023 American Control Conference (ACC).   IEEE, 2023, pp. 867–872.
  • [19] R. Hu and Y. Gong, “Trading data for learning: Incentive mechanism for on-device federated learning,” in GLOBECOM 2020-2020 IEEE Global Communications Conference.   IEEE, 2020, pp. 1–6.
  • [20] Y. Jiao, P. Wang, D. Niyato, B. Lin, and D. I. Kim, “Toward an automated auction framework for wireless federated learning services market,” IEEE Transactions on Mobile Computing, vol. 20, no. 10, pp. 3034–3048, 2020.
  • [21] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Artificial intelligence and statistics.   PMLR, 2017, pp. 1273–1282.
  • [22] A. Rakhlin, O. Shamir, and K. Sridharan, “Making gradient descent optimal for strongly convex stochastic optimization,” arXiv preprint arXiv:1109.5647, 2011.
  • [23] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
  • [24] A. Krizhevsky, G. Hinton et al., “Learning multiple layers of features from tiny images,” Advances in neural information processing systems, 2009.
  • [25] G. Cohen, S. Afshar, J. Tapson, and A. Van Schaik, “Emnist: Extending mnist to handwritten letters,” in 2017 international joint conference on neural networks (IJCNN).   IEEE, 2017, pp. 2921–2926.
  • [26] T.-M. H. Hsu, H. Qi, and M. Brown, “Measuring the effects of non-identical data distribution for federated visual classification,” arXiv preprint arXiv:1909.06335, 2019.
  • [27] X. Kang, G. Yu, J. Wang, W. Guo, C. Domeniconi, and J. Zhang, “Incentive-boosted federated crowdsourcing,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, 2023, pp. 6021–6029.