
QI-DPFL: Quality-Aware and Incentive-Boosted Federated Learning with Differential Privacy
*Corresponding Author. This work was supported by the National Natural Science Foundation of China under Grant No. 62206320.

Wenhao Yuan, Xuehe Wang*
School of Artificial Intelligence, Sun Yat-sen University, Zhuhai, China
Email: [email protected], [email protected]
Abstract

Federated Learning (FL) has increasingly been recognized as an innovative and secure distributed model training paradigm that coordinates multiple edge clients to collaboratively train a shared model without uploading their private datasets. The challenge of encouraging mobile edge devices to participate actively in FL model training, while mitigating the privacy leakage risks during wireless transmission, remains largely unexplored. In this paper, we propose a novel approach, named QI-DPFL (Quality-Aware and Incentive-Boosted Federated Learning with Differential Privacy), to address this issue. To select clients with high-quality datasets, we first propose a quality-aware client selection mechanism based on the Earth Mover's Distance (EMD) metric. Furthermore, to attract high-quality data contributors, we design an incentive-boosted mechanism that models the interactions between the central server and the selected clients as a two-stage Stackelberg game, where the central server designs the time-dependent reward to minimize its cost by considering the trade-off between accuracy loss and total reward allocated, and each selected client decides the privacy budget to maximize its utility. The Nash Equilibrium of the Stackelberg game is derived to find the optimal solution in each global iteration. Extensive experimental results on different real-world datasets demonstrate the effectiveness of our proposed FL framework in realizing the goals of privacy protection and incentive compatibility.

Index Terms:
Federated learning, Stackelberg game, differential privacy, client selection mechanism.

I Introduction

In the era of rapid advancements in science and technology, an unprecedented volume of data has been generated by edge devices, and the data volume in human society is anticipated to keep growing geometrically. Concurrently, the surge in private data is accompanied by escalating concerns over data privacy and security, drawing considerable attention from both academia and industry. Notably, the enactment of stringent data privacy regulations such as GDPR [1] poses a formidable challenge to accessing and utilizing high-quality private data for training artificial intelligence models. In addition, the huge communication costs associated with data transmission cannot be overlooked. These challenges pave the way for the emergence of innovative machine learning technologies that mitigate the risk of privacy disclosure [2].

Federated learning emerges as a compelling distributed machine learning paradigm, offering multiple benefits including privacy preservation and communication efficiency [3]. However, the widespread implementation of efficient FL systems still encounters several challenges that warrant further investigation [4]. Most research concentrates on enhancing FL model performance and assumes that FL models possess adequate safety, whereas findings from [5] indicate potential risks of significant privacy breaches in gradient propagation schemes. Differential privacy (DP) [6] is a prevalent method for safeguarding data privacy; to alleviate the adverse effects of the strong noise injected while enhancing the protection level, advancements such as $\rho$-zCDP [7] and Rényi DP [8] have been proposed. Recent studies have integrated DP methods with FL, striving to strike a harmonious equilibrium between model performance and privacy preservation [9, 10]. However, most differential privacy-based FL approaches rely on the standard $(\epsilon,\delta)$-DP mechanism, which is susceptible to the Catastrophe Mechanism [11].

Additionally, an idealized assumption in current research posits that mobile devices will participate in FL model training unconditionally once invited, a notion that often falls short in practical scenarios: engaging in model training entails significant consumption of computational and communication resources, and participants also need to be wary of the potential risk of information leakage [12]. Without a well-designed economic incentive mechanism, egocentric mobile devices are probably reluctant to participate [13]. Recently, incentive mechanism-based federated network optimization and FL have gradually gained extensive attention. The works in [14, 15] focus on modeling the interactions between clients and the central server as a Stackelberg game. Besides, there has emerged a surge of auction-based FL algorithms [2, 9]. Contract theory-based FL models have also been proposed [16, 13]. Nevertheless, most of the aforementioned works on incentive mechanism design overlook the security assurance during parameter transmission between the central server and edge nodes.

To incentivize the participation of mobile devices with high-quality data and eliminate the privacy threats associated with gradient disclosure, we propose a quality-aware and incentive-boosted federated learning framework based on the $\rho$-zero-concentrated differential privacy ($\rho$-zCDP) technique. We first design a client selection mechanism grounded in the Earth Mover's Distance (EMD) metric, followed by a rigorous analysis of the differentially private federated learning (DPFL) framework, which introduces artificial Gaussian noise to obscure local model parameters, thereby addressing privacy concerns. Further, based on the DPFL framework, the interactions between the heterogeneous clients and the central server are modeled as a two-stage Stackelberg game, termed QI-DPFL. In Stage I, the central server devises a time-dependent reward for clients to jointly minimize the accuracy loss and total reward. In Stage II, each selected client determines the optimal privacy budget according to the allocated reward to maximize its own utility. The multi-fold contributions of our work are summarized as follows:

  • Privacy preservation and incentive mechanism in FL: We propose a novel and efficient quality-aware and incentive-boosted federated learning framework based on the $\rho$-zCDP mechanism, named QI-DPFL. We first select the clients with high-quality data and then model the interactions between the central server and the selected clients as a two-stage Stackelberg game, which enables each participant to freely customize its privacy budget while achieving good model performance on the premise of protecting data privacy.

  • Earth Mover's Distance for client selection: We innovatively adopt the EMD metric for client selection mechanism design, to screen the geographically distributed participants for high-quality datasets and improve the training performance.

  • Stackelberg Nash Equilibrium Analysis: By analyzing the interactions between the central server and selected clients, we derive the optimal reward and the optimal privacy budget in Stage I and Stage II respectively. Moreover, we demonstrate that the optimal strategy profile forms a Stackelberg Nash Equilibrium. Extensive experiments on different real-world datasets verify the effectiveness and security of our proposed differentially private federated learning framework.

Figure 1: The framework of QI-DPFL consists of the following procedures: Step 1: The central server initializes the global model and publishes the task request. Step 2: The clients submit their data distributions to the central server and are selected by the EMD metric. Step 3: Each selected client collects its private dataset $\mathcal{D}_i$ and performs local training. Step 4: Once the local training is finished, each client uploads the perturbed model parameter $\nabla\widetilde{F}_i(\boldsymbol{w}_i(t))$ to the central server. Step 5: The central server offers a reward $\mathcal{R}_t$ to all participating clients to compensate for the cost incurred by each selected participant. Step 6: Upon aggregating the local parameters, the global model is updated and further verified on the testing set. The training process concludes when the required accuracy or a preset number of iterations is reached. Otherwise, the central server redistributes the updated global model to clients for the next iteration (Steps 3-6).

II Preliminaries

II-A Standard Federated Learning Model

FL introduces an innovative decentralized machine learning paradigm in which a global model is collaboratively trained by numerous geographically distributed clients with locally collected data. Each client performs one or multiple epochs of mini-batch stochastic gradient descent (SGD) and subsequently transmits the updated local model to a central server for local model aggregation and global model update. Then, the central server dispatches the updated global parameter to all clients to trigger a fresh cycle of local training, until the global model converges or a predefined maximum number of iterations is reached.

Consider $N$ clients participating in FL training, where each client $i\in\{1,2,\ldots,N\}$ utilizes its local dataset $\mathcal{D}_i$ with data size $|\mathcal{D}_i|$ to contribute to model training. In each global iteration, each client performs $L$ ($L\geq 1$) epochs of SGD training in parallel to update its local parameter:

$$\boldsymbol{w}_i^{l+1}(t)=\boldsymbol{w}_i^{l}(t)-\eta_i\nabla F_i\big(\boldsymbol{w}_i^{l}(t)\big), \qquad (1)$$

where $\eta_i\in(0,1)$ is the learning rate of client $i$, $l\in\{1,2,\ldots,L\}$, and we define the local model $\boldsymbol{w}_i^{L}(t)=\boldsymbol{w}_i(t)$. The central server averages the local models from the $N$ participating clients to update the global model $\boldsymbol{w}(t)$:

$$\boldsymbol{w}(t)=\sum_{i=1}^{N}\frac{|\mathcal{D}_i|}{\sum_{j=1}^{N}|\mathcal{D}_j|}\boldsymbol{w}_i(t). \qquad (2)$$

The goal of FL is to find the optimal global parameter $\boldsymbol{w}^*$ that minimizes the global loss function $F(\boldsymbol{w})$:

$$\boldsymbol{w}^{*}=\arg\min_{\boldsymbol{w}}F(\boldsymbol{w})=\arg\min_{\boldsymbol{w}}\sum_{i=1}^{N}\frac{|\mathcal{D}_i|}{\sum_{j=1}^{N}|\mathcal{D}_j|}F_i(\boldsymbol{w}). \qquad (3)$$
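For illustration, a minimal sketch of the data-size-weighted aggregation in Eq (2), assuming each client's updated parameters and dataset size are already available as NumPy arrays (the names fedavg_aggregate, client_weights and data_sizes are illustrative, not part of the paper):

import numpy as np

def fedavg_aggregate(client_weights, data_sizes):
    """Data-size-weighted average of local models, as in Eq (2)."""
    total = float(sum(data_sizes))
    # Each client's parameters are weighted by its share of the total data.
    return sum((n / total) * w for w, n in zip(client_weights, data_sizes))

# Toy usage: three clients holding 2-dimensional models.
w_global = fedavg_aggregate(
    [np.array([0.1, 0.2]), np.array([0.3, 0.1]), np.array([0.2, 0.4])],
    [100, 200, 300],
)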

To facilitate the theoretical analysis in the later sections, we employ the following commonly used assumptions [17, 18].

Assumption 1

[$\beta$-Lipschitz Smoothness] The local loss function $F_i(\boldsymbol{w})$ is $\beta$-Lipschitz smooth for each participating client $i\in\{1,2,\ldots,N\}$ and any $\boldsymbol{w},\boldsymbol{w}'\in\mathbb{R}^d$:

$$\|\nabla F_i(\boldsymbol{w})-\nabla F_i(\boldsymbol{w}')\|\leq\beta\|\boldsymbol{w}-\boldsymbol{w}'\|. \qquad (4)$$
Assumption 2

[$\lambda$-Strong Convexity] The global loss function $F(\boldsymbol{w})$ is $\lambda$-strongly convex for any $\boldsymbol{w},\boldsymbol{w}'\in\mathbb{R}^d$:

$$F(\boldsymbol{w})-F(\boldsymbol{w}')\geq\langle\boldsymbol{w}-\boldsymbol{w}',\nabla F(\boldsymbol{w}')\rangle+\frac{\lambda}{2}\|\boldsymbol{w}-\boldsymbol{w}'\|^{2}. \qquad (5)$$

II-B Differential Privacy Mechanism

To defend against attacks such as the gradient inversion attack [5], which may disclose the original training data without accessing the datasets, $\rho$-zero-concentrated differential privacy ($\rho$-zCDP) was proposed in [7]; it attains a tight composition bound and is more suitable for analyzing the end-to-end privacy loss of iterative algorithms [19].

Firstly, we define a metric $\mathcal{L}_p$ to measure the privacy loss. Specifically, for a randomized mechanism $\mathcal{M}:\mathcal{X}\rightarrow\mathcal{E}$ with domain $\mathcal{X}$ and range $\mathcal{E}$, and any two adjacent datasets $\mathcal{D},\mathcal{D}'\subseteq\mathcal{X}$ that have the same size but differ in only one sample, after observing an output $o\in\mathcal{E}$, the privacy loss $\mathcal{L}_p$ is given as:

$$\mathcal{L}_p=\ln\left(\frac{\Pr[\mathcal{M}(\mathcal{D})=o]}{\Pr[\mathcal{M}(\mathcal{D}')=o]}\right). \qquad (6)$$

Then, the formal definition of $\rho$-zCDP is given as follows:

Definition 1

A randomized mechanism $\mathcal{M}:\mathcal{X}\rightarrow\mathcal{E}$ with domain $\mathcal{X}$ and range $\mathcal{E}$ satisfies $\rho$-zero-concentrated differential privacy ($\rho$-zCDP) if for any $\alpha\in(1,\infty)$, we have:

$$\mathbb{E}\big[e^{(\alpha-1)\mathcal{L}_p}\big]\leq e^{(\alpha-1)(\rho\alpha)}. \qquad (7)$$

Based on the Gaussian mechanism, given a query function $Q:\mathcal{X}\rightarrow\mathcal{E}$, the sensitivity of the query function $Q$ is defined as $\Delta Q=\max_{\mathcal{D},\mathcal{D}'}\|Q(\mathcal{D})-Q(\mathcal{D}')\|_2$ for any two adjacent datasets $\mathcal{D},\mathcal{D}'\subseteq\mathcal{X}$. Specifically, in the $t$-th global training iteration, by adding the artificial Gaussian noise $\boldsymbol{n}_i(t)\sim\mathcal{N}(0,\sigma_i^2(t))$, the transmitted parameter of client $i$ becomes:

$$\nabla\widetilde{F}_i(\boldsymbol{w}(t))=\nabla F_i(\boldsymbol{w}(t))+\boldsymbol{n}_i(t), \qquad (8)$$

which satisfies $\rho_i^t$-zCDP with $\rho_i^t=\frac{\Delta Q^2}{2\sigma_i^2(t)}$ [6]. Based on the definition of the query function, we can easily derive the upper bound of the sensitivity $\Delta Q$, as given in Corollary 1.

Corollary 1

In each global iteration, by utilizing the Gaussian mechanism to perturb the transmitted parameters and implementing the $\rho$-zCDP mechanism for each participating client $i$, the sensitivity $\Delta Q$ of the query function $Q$ is bounded by $2C/|\mathcal{D}_i|$.

Proof:

For client $i$ with any two adjacent datasets $\mathcal{D}_i$ and $\mathcal{D}_i'$, the sensitivity of the query function $Q$ with inputs $\mathcal{D}_i$ and $\mathcal{D}_i'$ can be obtained as follows:

$$\Delta Q=\max_{\mathcal{D}_i,\mathcal{D}_i'}\|Q(\mathcal{D}_i)-Q(\mathcal{D}_i')\|_2=\max_{\mathcal{D}_i,\mathcal{D}_i'}\frac{1}{|\mathcal{D}_i|}\Big\|\sum_{j\in\mathcal{D}_i}\nabla f_i^j(\boldsymbol{w},\boldsymbol{x}_i^j)-\sum_{j\in\mathcal{D}_i'}\nabla f_i^j(\boldsymbol{w}',\boldsymbol{x}_i^j)\Big\|_2=\frac{2C}{|\mathcal{D}_i|}, \qquad (9)$$

where we assume that there exists a clipping threshold $C$ for the $i$-th client's local model in the $t$-th global iteration before adding the artificial perturbation, i.e., $\|\boldsymbol{w}_i(t)\|\leq C$. ∎

In the $t$-th global training iteration, based on Corollary 1 and noting that $\rho_i^t=\frac{\Delta Q^2}{2\sigma_i^2(t)}$, the variance of the Gaussian random noise $\sigma_i^2(t)$ of client $i$ can be easily derived as follows:

$$\sigma_i^2(t)=\frac{2C^2}{\rho_i^t|\mathcal{D}_i|^2}. \qquad (10)$$
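As a concrete illustration, the sketch below calibrates the Gaussian noise from Eq (10) and perturbs a local update as in Eq (8); the clipping step, function name and parameter names are assumptions made for this example rather than the paper's implementation:

import numpy as np

def perturb_update(grad, clip_C, data_size, rho_it, rng=None):
    """Clip the local update to norm C, then add Gaussian noise with
    variance sigma^2 = 2*C^2 / (rho_i^t * |D_i|^2), as in Eqs (8) and (10)."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_C / (norm + 1e-12))  # enforce the clipping bound C
    sigma = np.sqrt(2.0 * clip_C ** 2 / (rho_it * data_size ** 2))
    return clipped + rng.normal(0.0, sigma, size=grad.shape)

# Example: perturb a toy update under privacy budget rho_i^t = 0.5.
noisy = perturb_update(np.array([0.4, -1.2, 0.7]), clip_C=1.0,
                       data_size=500, rho_it=0.5)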

III System Model

We propose a two-layer federated learning framework based on the differential privacy technique, as shown in Fig. 1. In the client selection layer, the central server outsources FL tasks to all clients. The clients who are willing to participate submit their data distributions, containing only the information of label frequencies, to the central server, based on which the central server selects the participants with high-quality data according to the EMD metric. Then, in the second layer, the interactions between the central server and the selected clients are formulated as a two-stage Stackelberg game, where the central server decides the optimal rewards to the selected clients to incentivize them to contribute data with a high privacy budget, and each selected client determines its privacy budget $\rho_i^t$ based on the rewards and adds artificial noise to its local parameter to avoid severe privacy leakage during gradient uploading.

III-A Quality-Aware Client Selection Mechanism

From the perspective of the central server, to minimize the cost while reaching the accuracy threshold, it is crucial to select the clients with superior data quality using a metric that quantifies clients' potential contributions to the FL system. To reveal pertinent information about the local data while ensuring privacy preservation, we focus on a critical attribute of the local data, namely the data distribution [20].

In FL, the data distribution varies owing to the distinct preferences of heterogeneous clients, leading to a non-independent and identically distributed (Non-IID) setting. The Non-IID characteristic of the data dominantly affects the model performance, such as training accuracy [21]. As shown in [20], the accuracy degradation is significantly affected by the weight divergence, which can be quantified by the Earth Mover's Distance (EMD) metric. A larger EMD value indicates higher weight divergence and thus lower global model quality.

In the first layer, we assume that there are a total of $H$ clients who are willing to participate in the FL model training process. Consider a $P$-class classification task defined over a compact feature space $\mathcal{Y}$ and a label space $\mathcal{Z}$. The data samples $\mathcal{D}_h=\{\boldsymbol{x}_h,y_h\}$ of client $h\in\{1,2,\ldots,H\}$, with $\boldsymbol{x}_h\in\mathcal{Y}$ and $y_h\in\mathcal{Z}$, follow the distribution $\mathbb{P}_h$. Given the actual distribution $\mathbb{P}_a$ of the whole population, we denote the EMD of $\mathcal{D}_h$ by $\theta_h$, which can be calculated as follows:

$$\theta_h=\sum_{j\in\mathcal{Z}}\left\|\mathbb{P}_h(y=j)-\mathbb{P}_a(y=j)\right\|, \qquad (11)$$

where the actual distribution $\mathbb{P}_a$ is a reference distribution, which may be public information or estimated from historical data. If the EMD value $\theta_h$ of client $h\in\{1,2,\ldots,H\}$ is larger than the pre-set threshold (i.e., $\theta_h>\theta_{\text{th}}$), client $h$ is not selected to execute the FL task.
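The selection rule can be sketched as follows, assuming each candidate client reports its label frequencies as a dictionary; the helper names and the threshold value are illustrative:

import numpy as np

def emd(label_freq, reference_freq, num_classes):
    """EMD-style distance of Eq (11): sum of per-class absolute deviations
    between a client's label distribution and the reference distribution."""
    p_h = np.array([label_freq.get(j, 0.0) for j in range(num_classes)])
    p_a = np.array([reference_freq.get(j, 0.0) for j in range(num_classes)])
    return np.abs(p_h - p_a).sum()

def select_clients(client_label_freqs, reference_freq, num_classes, theta_th=0.5):
    """Keep only clients whose EMD value is below the preset threshold."""
    return [h for h, freq in client_label_freqs.items()
            if emd(freq, reference_freq, num_classes) < theta_th]

# Example: two candidate clients against a uniform reference over 4 classes.
ref = {j: 0.25 for j in range(4)}
clients = {0: {0: 0.3, 1: 0.2, 2: 0.25, 3: 0.25},   # near-IID
           1: {0: 0.9, 1: 0.1}}                      # highly skewed
print(select_clients(clients, ref, num_classes=4))   # -> [0]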

III-B Incentive Mechanism Design with Stackelberg Game

Suppose that $N$ clients are selected by the central server. The interactions between the central server and the selected clients are modeled as a two-stage Stackelberg game. Specifically, at Stage I, the central server, which acts as the leader, decides the optimal payment $\mathcal{R}_t^*$ in the $t$-th global iteration to minimize its cost $\mathcal{U}_T$. Then, at Stage II, based on the reward allocated by the central server, each selected client $i\in\{1,2,\ldots,N\}$, which acts as a follower, maximizes its utility function $\mathcal{U}_i^t$ by determining the optimal privacy budget $\rho_i^t$.

III-B1 Central Server’s Cost (Stage I)

Before introducing the cost function of the central server, we first discuss how the privacy budget $\rho_i^t$ of each client affects the accuracy loss of the global model. From Eq (8), we can derive the global gradient with Gaussian random noise as follows:

$$\nabla\widetilde{F}(\boldsymbol{w}(t))=\frac{1}{N}\sum_{i=1}^{N}\big(\nabla F_i(\boldsymbol{w}(t))+\boldsymbol{n}_i(t)\big)=\nabla F(\boldsymbol{w}(t))+\frac{1}{N}\sum_{i=1}^{N}\boldsymbol{n}_i(t). \qquad (12)$$

Inspired by [22, 14], we assume that the gradient of the global loss function attains an upper bound $V$ (i.e., $\|\nabla F(\boldsymbol{w}(t))\|_2\leq V$). Further, note that the Gaussian random noise has zero mean and $\mathbb{E}[\|\sum_{i=1}^{N}\boldsymbol{n}_i(t)\|_2^2]=d\sum_{i=1}^{N}\sigma_i^2(t)$, where $d$ represents the dimension of the perturbed parameter vector and the variance $\sigma_i^2(t)$ is determined by client $i$'s privacy budget $\rho_i^t$ as shown in Eq (10). Thus, the upper bound of $\mathbb{E}[\|\nabla\widetilde{F}(\boldsymbol{w}(t))\|_2^2]$ can be derived as:

$$\mathbb{E}[\|\nabla\widetilde{F}(\boldsymbol{w}(t))\|_2^2]=\mathbb{E}\Big[\big\|\nabla F(\boldsymbol{w}(t))+\tfrac{1}{N}\sum_{i=1}^{N}\boldsymbol{n}_i(t)\big\|_2^2\Big]=\mathbb{E}[\|\nabla F(\boldsymbol{w}(t))\|_2^2]+\tfrac{1}{N^2}\mathbb{E}\Big[\big\|\sum_{i=1}^{N}\boldsymbol{n}_i(t)\big\|_2^2\Big]\leq V^2+\tfrac{d}{N^2}\sum_{i=1}^{N}\sigma_i^2(t)\triangleq G^2(t). \qquad (13)$$

Suppose that the global loss function $F(\boldsymbol{w})$ satisfies $\beta$-Lipschitz smoothness (Assumption 1) and $\lambda$-strong convexity (Assumption 2), attains its minimum at $\boldsymbol{w}^*$, and that $\mathbb{E}[\|\nabla\widetilde{F}(\boldsymbol{w}(t))\|_2^2]\leq G^2(t)$. Since the upper bound of $\mathbb{E}[\|\nabla\widetilde{F}(\boldsymbol{w}(t))\|_2^2]$ varies with time, we define $G^2=\sum_{t=1}^{T}G^2(t)/T$. Denoting the learning rate as $\eta_t=1/(\lambda t)$, the accuracy loss can be bounded as:

$$\mathbb{E}[F(\boldsymbol{w}(T))-F(\boldsymbol{w}^*)]\leq\frac{2\beta G^2}{\lambda^2 T}. \qquad (14)$$

Denote the reward vector by $\boldsymbol{\mathcal{R}}=\{\mathcal{R}_1,\mathcal{R}_2,\ldots,\mathcal{R}_T\}$ and the privacy budget vector by $\boldsymbol{\rho}=\{\rho_i^t,i\in\{1,2,\ldots,N\},t\in\{1,2,\ldots,T\}\}$. Then, the central server's cost function can be expressed as the weighted sum of the accuracy loss and the total reward:

$$\mathcal{U}_T(\boldsymbol{\mathcal{R}},\boldsymbol{\rho})=\gamma\frac{2\beta G^2}{\lambda^2 T}+(1-\gamma)\sum_{k=1}^{T}\pi^{k-1}\mathcal{R}_k=\frac{2\beta\gamma}{\lambda^2 T^2}\sum_{t=1}^{T}\Big(V^2+\sum_{i=1}^{N}\frac{2dC^2}{|\mathcal{D}_i|^2\rho_i^t N^2}\Big)+(1-\gamma)\sum_{k=1}^{T}\pi^{k-1}\mathcal{R}_k, \qquad (15)$$

where the discount factor $\pi\in(0,1)$ measures the decrement of the value of the reward $\mathcal{R}_t$ over time.

III-B2 Client’s Utility (Stage II)

In the $t$-th global iteration, given the central server's reward $\mathcal{R}_t$, the reward allocated to client $i$ is $r_i^t=\frac{\rho_i^t}{\sum_{j=1}^{N}\rho_j^t}\mathcal{R}_t$. The utility of client $i$ in the $t$-th global iteration is defined as the difference between the reward from the central server and the training cost $\mathcal{C}_{i,t}$, i.e.,

$$\mathcal{U}_i^t(\rho_i^t,\boldsymbol{\rho}_{-i}^t,\mathcal{R}_t)=\frac{\rho_i^t}{\sum_{j=1}^{N}\rho_j^t}\mathcal{R}_t-\mathcal{C}_{i,t}, \qquad (16)$$
$$\mathcal{C}_{i,t}=\phi_1\mathcal{C}_{i,t}^{pv}+\phi_2\mathcal{C}_{i,t}^{d}+\phi_3\mathcal{C}_{i,t}^{cp}+\phi_4\mathcal{C}_{i,t}^{cm}, \qquad (17)$$

where $\boldsymbol{\rho}_{-i}^t=\{\rho_1^t,\ldots,\rho_{i-1}^t,\rho_{i+1}^t,\ldots,\rho_N^t\}$ and $\phi_k$, $k\in\{1,2,3,4\}$, are positive coefficients. The training cost $\mathcal{C}_{i,t}$ consists of the privacy cost $\mathcal{C}_{i,t}^{pv}$, local data cost $\mathcal{C}_{i,t}^{d}$, computation cost $\mathcal{C}_{i,t}^{cp}$ and communication cost $\mathcal{C}_{i,t}^{cm}$. Among the components of client $i$'s training cost, the privacy cost $\mathcal{C}_{i,t}^{pv}$ is closely related to the privacy budget $\rho_i^t$, as a larger privacy budget $\rho_i^t$ signifies more precise data, yet it also corresponds to an increased vulnerability to privacy breaches. Specifically, inspired by [19], we denote the privacy cost of client $i$ as a function $c(\nu_i^t,\rho_i^t)$, where $\nu_i^t>0$ is the privacy value that is assumed to be publicly known. Here, we consider a linear privacy cost function for each client, i.e., $\mathcal{C}_{i,t}^{pv}=c(\nu_i^t,\rho_i^t)=\nu_i^t\cdot\rho_i^t$.
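For concreteness, a small sketch of evaluating the utility in Eqs (16)-(17) for one client, with placeholder values for the data, computation and communication cost terms (all names and numbers are illustrative):

def client_utility(rho_i, rho_others, reward_t, nu_i, phi,
                   data_cost=0.0, comp_cost=0.0, comm_cost=0.0):
    """Utility of client i in Eq (16): proportional share of the reward
    minus the weighted training cost of Eq (17) with a linear privacy cost."""
    share = rho_i / (rho_i + sum(rho_others)) * reward_t
    privacy_cost = nu_i * rho_i                      # C_{i,t}^{pv} = nu_i^t * rho_i^t
    total_cost = (phi[0] * privacy_cost + phi[1] * data_cost
                  + phi[2] * comp_cost + phi[3] * comm_cost)
    return share - total_cost

# Example: a client with rho = 0.4 among peers holding 0.3 and 0.5.
u = client_utility(0.4, [0.3, 0.5], reward_t=10.0, nu_i=2.0,
                   phi=(1.0, 0.1, 0.1, 0.1), data_cost=1.0,
                   comp_cost=2.0, comm_cost=0.5)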

The objective of the $i$-th client is to maximize its utility function $\mathcal{U}_i^t$ by dynamically adjusting its privacy budget $\rho_i^t$ at each global iteration $t$ according to the reward $\mathcal{R}_t$ from the central server. The goal of the central server is to minimize its cost function by adjusting the reward $\mathcal{R}_t$, $t\in\{1,2,\ldots,T\}$, distributed to participating clients on the premise of reaching the preset global model accuracy. Note that the reward $\boldsymbol{\mathcal{R}}$ to clients will affect the clients' choices of privacy budgets $\boldsymbol{\rho}$, which in turn affect the central server's cost as shown in Eq (15). Thus, the two-stage Stackelberg game can be formulated as:

$$\text{Stage I: }\min_{\boldsymbol{\mathcal{R}}}\ \mathcal{U}_T(\boldsymbol{\mathcal{R}},\boldsymbol{\rho}), \qquad (18)$$
$$\text{Stage II: }\max_{\rho_i^t}\ \mathcal{U}_i^t(\rho_i^t,\boldsymbol{\rho}_{-i}^t,\mathcal{R}_t), \qquad (19)$$

where the privacy budgets $\rho_i^t$ over time form the strategy profile $\boldsymbol{\rho}_i$ of client $i$, i.e., $\boldsymbol{\rho}_i=\{\rho_i^t,t\in\{1,2,\ldots,T\}\}$.

IV Stackelberg Nash Equilibrium Analysis

In this section, we find the optimal strategy profile $(\boldsymbol{\mathcal{R}},\{\boldsymbol{\rho}_i,i\in\{1,2,\ldots,N\}\})$, with $\boldsymbol{\mathcal{R}}=\{\mathcal{R}_t,t\in\{1,2,\ldots,T\}\}$ and $\boldsymbol{\rho}_i=\{\rho_i^t,t\in\{1,2,\ldots,T\}\}$, for the central server and the selected clients in the two-stage Stackelberg game through backward induction. Firstly, we concentrate on the followers' operation and derive each selected client $i$'s optimal privacy budget $\rho_i^{t*}$ in the $t$-th global iteration under any given reward $\mathcal{R}_t$. Then, considering the trade-off between the model accuracy loss and the payment to the clients, we deduce the central server's optimal strategy $\mathcal{R}_t^*$. Finally, we prove that the optimal solution forms a Stackelberg Nash Equilibrium.

IV-A Optimal Strategy Profile

We adopt a backward induction approach to derive the optimal strategies of the central server and each client, respectively. First of all, we analyze the optimal strategy of each selected client by determining the optimal privacy budget $\rho_i^{t*}$, which is presented in the following theorem.

Theorem 1

In Stage II, given the payment $\mathcal{R}_t$ in the $t$-th global iteration, the optimal privacy budget of each client $i$ is:

$$\rho_i^{t*}=\frac{\mathcal{R}_t(N-1)}{\phi_1\sum_{i=1}^{N}\nu_i^t}-\frac{\mathcal{R}_t\nu_i^t}{\phi_1}\Big(\frac{N-1}{\sum_{i=1}^{N}\nu_i^t}\Big)^2. \qquad (20)$$
Proof:

Firstly, we derive the first-order and second-order derivatives of each client $i$'s utility function $\mathcal{U}_i^t(\rho_i^t,\boldsymbol{\rho}_{-i}^t,\mathcal{R}_t)$ with respect to the privacy budget $\rho_i^t$ as follows:

$$\frac{\partial\mathcal{U}_i^t}{\partial\rho_i^t}=\frac{\mathcal{R}_t\big(\sum_{j=1}^{N}\rho_j^t-\rho_i^t\big)}{(\sum_{i=1}^{N}\rho_i^t)^2}-\phi_1\nu_i^t=\frac{\mathcal{R}_t\sum_{j\in N\setminus\{i\}}\rho_j^t}{(\sum_{i=1}^{N}\rho_i^t)^2}-\phi_1\nu_i^t, \qquad (21)$$
$$\frac{\partial^2\mathcal{U}_i^t}{\partial(\rho_i^t)^2}=-2\mathcal{R}_t\cdot\frac{\sum_{j\in N\setminus\{i\}}\rho_j^t}{(\sum_{i=1}^{N}\rho_i^t)^3}<0. \qquad (22)$$

As the second-order derivative satisfies $\partial^2\mathcal{U}_i^t/\partial(\rho_i^t)^2<0$, the utility of each client $i$ is strictly concave in the feasible region of $\rho_i^t$. Then, we derive the optimal privacy budget of client $i$ in the $t$-th global iteration by solving the equation $\partial\mathcal{U}_i^t/\partial\rho_i^t=0$, i.e.,

$$\rho_i^{t*}=\sum_{i=1}^{N}\rho_i^t-\frac{\phi_1\nu_i^t}{\mathcal{R}_t}\Big(\sum_{i=1}^{N}\rho_i^t\Big)^2. \qquad (23)$$

Since client $i$ cannot observe the other clients' privacy budgets $\rho_j^t$, $j\in N\setminus\{i\}$, as they are private information and there is no communication among clients during the global training iterations, we need to derive the term $\sum_{i=1}^{N}\rho_i^t$. Thus, the first-order condition $\partial\mathcal{U}_i^t/\partial\rho_i^t=0$ from Eq (21) can be rewritten as follows:

$$\sum_{j\in N\setminus\{i\}}\rho_j^t=\frac{\phi_1\nu_i^t}{\mathcal{R}_t}\Big(\sum_{i=1}^{N}\rho_i^t\Big)^2. \qquad (24)$$

Summing both sides of Eq (24) over all clients $i$, we have:

$$\sum_{i=1}^{N}\Big[\sum_{j\in N\setminus\{i\}}\rho_j^t\Big]=\sum_{i=1}^{N}\frac{\phi_1\nu_i^t}{\mathcal{R}_t}\Big(\sum_{i=1}^{N}\rho_i^t\Big)^2\ \Rightarrow\ (N-1)\sum_{i=1}^{N}\rho_i^t=\frac{\phi_1}{\mathcal{R}_t}\Big(\sum_{i=1}^{N}\rho_i^t\Big)^2\sum_{i=1}^{N}\nu_i^t\ \Rightarrow\ \sum_{i=1}^{N}\rho_i^t=\frac{(N-1)\mathcal{R}_t}{\phi_1\sum_{i=1}^{N}\nu_i^t}. \qquad (25)$$

By substituting the term $\sum_{i=1}^{N}\rho_i^t$ from Eq (25) into Eq (23), we can finalize the expression of the optimal privacy budget $\rho_i^{t*}$ as:

$$\rho_i^{t*}=\frac{\mathcal{R}_t(N-1)}{\phi_1\sum_{i=1}^{N}\nu_i^t}-\frac{\mathcal{R}_t\nu_i^t}{\phi_1}\Big(\frac{N-1}{\sum_{i=1}^{N}\nu_i^t}\Big)^2.$$

Hence, the theorem holds. ∎
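A minimal sketch of evaluating the closed form of Eq (20) for all clients, given a reward and the publicly known privacy values (function and variable names are illustrative):

def optimal_privacy_budgets(reward_t, nu, phi1):
    """Closed-form best responses of Eq (20):
    rho_i^{t*} = R_t(N-1)/(phi1*S) - R_t*nu_i*(N-1)^2/(phi1*S^2), with S = sum_i nu_i."""
    N, S = len(nu), sum(nu)
    return [reward_t * (N - 1) / (phi1 * S)
            - reward_t * nu_i / phi1 * ((N - 1) / S) ** 2
            for nu_i in nu]

# Example: three clients with privacy values nu = [1.0, 2.0, 3.0].
rhos = optimal_privacy_budgets(reward_t=10.0, nu=[1.0, 2.0, 3.0], phi1=1.0)
# Their sum recovers Eq (25): (N-1)*R_t / (phi1 * sum(nu)).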

Based on the optimal privacy budget $\rho_i^{t*}$, we derive the central server's optimal reward $\mathcal{R}_t^*$, as summarized in Theorem 2.

Theorem 2

In Stage I, based on the optimal privacy budgets of the clients, i.e., $\boldsymbol{\rho}^{t*}=\{\rho_1^{t*},\rho_2^{t*},\ldots,\rho_N^{t*}\}$, the optimal reward of each global training iteration is $\mathcal{R}_t^*=\sqrt{\frac{A_t\pi^{1-t}}{1-\gamma}}$, where the discount factor $\pi\in(0,1)$ measures the decrement of the value of the reward over time, and the constants are $C_i=\frac{(N-1)\left[\big(\sum_{i=1}^{N}\nu_i^t\big)-\nu_i^t(N-1)\right]}{\phi_1\big(\sum_{i=1}^{N}\nu_i^t\big)^2}$ and $A_t=\sum_{i=1}^{N}\frac{4d\gamma\beta C^2}{T^2\lambda^2 N^2 C_i|\mathcal{D}_i|^2}$.

Proof:

First of all, by substituting the optimal privacy budget $\rho_i^{t*}$ in Eq (20) into the central server's cost function in Eq (15), the cost function $\mathcal{U}_T(\boldsymbol{\mathcal{R}},\boldsymbol{\rho}^*)$ can be rewritten as follows:

$$\mathcal{U}_T(\boldsymbol{\mathcal{R}},\boldsymbol{\rho}^*)=\frac{2\beta\gamma}{\lambda^2 T^2}\sum_{t=1}^{T}\Big(V^2+\frac{d}{N^2}\sum_{i=1}^{N}\frac{2C^2}{|\mathcal{D}_i|^2\rho_i^{t*}}\Big)+(1-\gamma)\sum_{k=1}^{T}\pi^{k-1}\mathcal{R}_k=\frac{2\beta\gamma}{\lambda^2 T^2}\Big(V^2 T+\frac{2dC^2}{N^2}\sum_{t=1}^{T}\sum_{i=1}^{N}\frac{1}{|\mathcal{D}_i|^2\rho_i^{t*}}\Big)+(1-\gamma)\sum_{k=1}^{T}\pi^{k-1}\mathcal{R}_k, \qquad (26)$$

where $\boldsymbol{\rho}^*=\{\rho_i^{t*},i\in\{1,2,\ldots,N\},t\in\{1,2,\ldots,T\}\}$. The first-order derivative of the central server's cost function with respect to $\mathcal{R}_t$ can be derived as:

$$\frac{\partial\mathcal{U}_T}{\partial\mathcal{R}_t}=(1-\gamma)\pi^{t-1}-\frac{4d\gamma\beta C^2}{(T\lambda N)^2}\sum_{i=1}^{N}\frac{\partial\rho_i^{t*}/\partial\mathcal{R}_t}{(|\mathcal{D}_i|\rho_i^{t*})^2}. \qquad (27)$$

Then, we need to consider the existence of a solution to the first-order condition $\partial\mathcal{U}_T/\partial\mathcal{R}_t=0$. As the reward $\mathcal{R}_t\in[0,+\infty)$, we have:

$$\lim_{\mathcal{R}_t\rightarrow 0}\frac{\partial\mathcal{U}_T}{\partial\mathcal{R}_t}\rightarrow-\infty,\qquad\lim_{\mathcal{R}_t\rightarrow+\infty}\frac{\partial\mathcal{U}_T}{\partial\mathcal{R}_t}\rightarrow(1-\gamma)\pi^{t-1}>0. \qquad (28)$$

From Eq (28), we establish the existence of a solution to the equation $\partial\mathcal{U}_T/\partial\mathcal{R}_t=0$. Further, according to Eq (27), we derive the second-order derivative of $\mathcal{U}_T(\boldsymbol{\mathcal{R}},\boldsymbol{\rho}^*)$ with respect to $\mathcal{R}_t$:

$$\frac{\partial^2\mathcal{U}_T}{\partial\mathcal{R}_t^2}=\frac{8d\gamma\beta C^2}{(T\lambda N)^2}\sum_{i=1}^{N}\Big(\frac{\partial\rho_i^{t*}}{\partial\mathcal{R}_t}\frac{1}{|\mathcal{D}_i|}\Big)^2\Big(\frac{1}{\rho_i^{t*}}\Big)^3>0. \qquad (29)$$

Thus, based on Eqs (28)-(29), we obtain the optimal reward $\mathcal{R}_t^*$ by solving the first-order equation $\partial\mathcal{U}_T/\partial\mathcal{R}_t=0$, i.e.,

$$\frac{\partial\mathcal{U}_T}{\partial\mathcal{R}_t}=(1-\gamma)\pi^{t-1}-\frac{4d\gamma\beta C^2}{(T\lambda N)^2}\sum_{i=1}^{N}\frac{\partial\rho_i^{t*}/\partial\mathcal{R}_t}{(|\mathcal{D}_i|\rho_i^{t*})^2}=(1-\gamma)\pi^{t-1}-\frac{A_t}{\mathcal{R}_t^2}=0\ \Rightarrow\ \mathcal{R}_t^*=\sqrt{\frac{A_t}{(1-\gamma)\pi^{t-1}}},$$

where the known constants are $C_i=\frac{(N-1)\left[\big(\sum_{i=1}^{N}\nu_i^t\big)-\nu_i^t(N-1)\right]}{\phi_1\big(\sum_{i=1}^{N}\nu_i^t\big)^2}$ and $A_t=\sum_{i=1}^{N}\frac{4d\gamma\beta C^2}{T^2\lambda^2 N^2 C_i|\mathcal{D}_i|^2}$. ∎
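A sketch of computing the per-iteration optimal reward of Theorem 2 from the stated constants; the numerical values below are placeholders, and the sketch assumes $\sum_i\nu_i^t>\nu_i^t(N-1)$ for every client so that each $C_i$ is positive:

def optimal_reward(t, nu, data_sizes, phi1, gamma, pi, beta, lam, C, d, T):
    """R_t^* = sqrt(A_t * pi^(1 - t) / (1 - gamma)), with C_i and A_t as in Theorem 2."""
    N, S = len(nu), sum(nu)
    # C_i > 0 requires S > nu_i * (N - 1) for every client i.
    C_i = [(N - 1) * (S - nu_i * (N - 1)) / (phi1 * S ** 2) for nu_i in nu]
    A_t = sum(4 * d * gamma * beta * C ** 2
              / (T ** 2 * lam ** 2 * N ** 2 * C_i[i] * data_sizes[i] ** 2)
              for i in range(N))
    return (A_t * pi ** (1 - t) / (1 - gamma)) ** 0.5

# Example with toy constants for N = 3 selected clients.
R_1 = optimal_reward(t=1, nu=[1.0, 1.2, 1.5], data_sizes=[500, 600, 700],
                     phi1=1.0, gamma=0.5, pi=0.9, beta=1.0, lam=0.1,
                     C=1.0, d=100, T=30)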

As shown in Theorem 2, the optimal reward $\mathcal{R}_t^*$ increases with $t$, indicating that the central server requires data of superior quality to attain the desired accuracy level when approaching the end of the time horizon $T$.

Definition 2

(Stackelberg Nash Equilibrium) The strategy profile $(\boldsymbol{\mathcal{R}}^*,\{\boldsymbol{\rho}_i^*,i\in\{1,2,\ldots,N\}\})$, with $\boldsymbol{\mathcal{R}}^*=\{\mathcal{R}_t^*,t\in\{1,2,\ldots,T\}\}$ and $\boldsymbol{\rho}_i^*=\{\rho_i^{t*},t\in\{1,2,\ldots,T\}\}$, constitutes a Stackelberg Nash Equilibrium if for any reward $\boldsymbol{\mathcal{R}}\in\mathbb{R}^T$ and any privacy budget $\rho_i^t\geq 0$:

$$\mathcal{U}_T(\boldsymbol{\mathcal{R}}^*,\boldsymbol{\rho}^*)\leq\mathcal{U}_T(\boldsymbol{\mathcal{R}},\boldsymbol{\rho}^*), \qquad (30)$$
$$\mathcal{U}_i^t(\rho_i^{t*},\boldsymbol{\rho}_{-i}^{t*},\mathcal{R}_t^*)\geq\mathcal{U}_i^t(\rho_i^t,\boldsymbol{\rho}_{-i}^{t*},\mathcal{R}_t^*). \qquad (31)$$
Figure 2: The relationship between the privacy budget $\rho$ and model accuracy on the MNIST and EMNIST datasets. (a) Fitting curve on MNIST; (b) Fitting curve on EMNIST.
Theorem 3

The above two-stage Stackelberg game possesses a Stackelberg Nash Equilibrium.

Proof:

Based on Theorem 1, we deduce that there exists an optimal privacy budget $\rho_i^{t*}$ for each client $i\in\{1,2,\ldots,N\}$ given any reward $\mathcal{R}_t\in\mathbb{R}$. Then, we need to prove that there exists an optimal reward $\mathcal{R}_t^*$ for the central server's cost function $\mathcal{U}_T$ given the optimal privacy budget vector $\boldsymbol{\rho}^{t*}=\{\rho_i^{t*},i\in\{1,2,\ldots,N\}\}$. According to the proof of Theorem 2, $\partial^2\mathcal{U}_T/\partial\mathcal{R}_t^2>0$ indicates that $\mathcal{U}_T(\boldsymbol{\mathcal{R}},\boldsymbol{\rho}^*)$ is convex. Thus, an optimal reward $\mathcal{R}_t^*$ for the central server exists and is obtained by solving the first-order equation $\partial\mathcal{U}_T/\partial\mathcal{R}_t=0$ based on the optimal privacy budget vector $\boldsymbol{\rho}^{t*}$. In other words, the strategies $\mathcal{R}_t^*$ and $\boldsymbol{\rho}^{t*}$ are mutually optimal for the central server and each selected client $i$. Thus, the optimal strategy profile $(\boldsymbol{\mathcal{R}}^*,\{\boldsymbol{\rho}_i^*,i\in\{1,2,\ldots,N\}\})$, with $\boldsymbol{\mathcal{R}}^*=\{\mathcal{R}_t^*,t\in\{1,2,\ldots,T\}\}$ and $\boldsymbol{\rho}_i^*=\{\rho_i^{t*},t\in\{1,2,\ldots,T\}\}$, constitutes a Stackelberg Nash Equilibrium of the two-stage Stackelberg game. Hence, the theorem holds. ∎

Theorem 3 reveals that the optimal privacy budget of the selected clients in Theorem 1 and the optimal reward of the central server in Theorem 2 are mutually optimal, which leads to the steady state of the FL system. The overall framework is summarized in Algorithm 1.

Algorithm 1 QI-DPFL

Input: The total client size $H$, hyperparameters $\gamma$ and $\phi_k$, $k\in\{1,2,3,4\}$, discount factor $\pi\in(0,1)$, local training epoch $L$, global training iteration $T$ and the fitted accuracy function $I$.

1:The central server claims the task request by outsourcing the FL task and initializes the global model.
2:Client Selection Layer:
3:for each client $h\in\{1,2,\ldots,H\}$ do
4:    Calculate the EMD value $\theta_h$ via Eq (11)
5:    if $\theta_h<\theta_{\text{th}}$ then
6:         Add the $h$-th client into the selected client set
7:    end if
8:end for
9:Stackelberg Game-based Model Training Layer:
10:for $t=0$ to $T$ do
11:    for selected client $i\in\{1,2,\ldots,N\}$ in parallel do
12:         Collect private data and perform local training:
13:         for $l=1$ to $L$ do
14:             $\boldsymbol{w}_i^{l+1}(t)=\boldsymbol{w}_i^{l}(t)-\eta_i\nabla F_i(\boldsymbol{w}_i^{l}(t))$
15:         end for
16:         Determine the optimal privacy budget $\rho_i^{t*}$ via Eq (20)
17:         Perturb the local model and upload $\nabla\widetilde{F}_i(\boldsymbol{w}_i(t))$
18:    end for
19:    $\boldsymbol{w}(t+1)=\boldsymbol{w}(t)-\eta\cdot\frac{1}{N}\sum_{i=1}^{N}\nabla\widetilde{F}_i(\boldsymbol{w}(t))$
20:    Distribute the reward $\mathcal{R}_t^*$ (Theorem 2) to all selected clients
21:end for

IV-B Range Analysis of the Reward

In this subsection, we analyze the range of the reward under the Stackelberg Nash Equilibrium that guarantees the pre-set model accuracy while simultaneously minimizing the compensation provided to clients. Firstly, we introduce the additivity property of the privacy budget $\rho_i^t$ of each selected client $i$, as summarized in Lemma 1 [19].

Lemma 1

Suppose that two mechanisms satisfy $\rho_1$-zCDP and $\rho_2$-zCDP, respectively; then their composition satisfies $(\rho_1+\rho_2)$-zCDP.

According to Lemma 1, in the $t$-th global iteration, the global model equivalently satisfies $\rho_t$-zCDP, where the global privacy budget $\rho_t$ equals the summation of the clients' privacy budgets $\rho_i^t$ (i.e., $\rho_t=\sum_{i=1}^{N}\rho_i^t$). Naturally, a larger privacy budget usually results in better model performance. To quantify the global model accuracy as a function of the privacy budget, we measure the test accuracy of the trained model under different privacy budgets on different real-world datasets (i.e., the MNIST, Cifar10, and EMNIST datasets). Fig. 2 shows the results on the MNIST and EMNIST datasets as examples. We observe that, in the $t$-th global iteration, the global model accuracy $\epsilon_t$ can be modeled as a simple concave function of the global privacy budget $\rho_t$ (i.e., $\epsilon_t=I(\rho_t)=I_1-I_2\times e^{-I_3\rho_t-I_4}$), where $I_k$, $k\in\{1,2,3,4\}$, are the corresponding constants.
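As an illustration of how such constants could be obtained, the sketch below fits this exponential-saturation form to measured (privacy budget, accuracy) pairs with SciPy's curve_fit; since $I_2$ and $I_4$ only enter through $I_2 e^{-I_4}$, the sketch absorbs $I_4$ into $I_2$, and the data points and initial guess are made up for the example:

import numpy as np
from scipy.optimize import curve_fit

def accuracy_model(rho, I1, I2, I3):
    """epsilon = I1 - I2 * exp(-I3 * rho - I4); I4 is absorbed into I2 (I4 = 0)."""
    return I1 - I2 * np.exp(-I3 * rho)

# Made-up (global privacy budget, test accuracy) measurements.
rho_obs = np.array([0.1, 0.5, 1.0, 2.0, 4.0, 8.0])
acc_obs = np.array([0.55, 0.78, 0.86, 0.92, 0.95, 0.96])

# Fit the constants; the initial guess is illustrative.
(I1, I2, I3), _ = curve_fit(accuracy_model, rho_obs, acc_obs, p0=[1.0, 0.5, 1.0])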

Remark 1

(Reward Range) Based on the optimal strategy of each selected client, the global privacy budget $\rho_t$ can be calculated as $\rho_t=\frac{\mathcal{R}_t(N-1)}{\phi_1\sum_{i=1}^{N}\nu_i^t}$, based on which the model accuracy $\epsilon_t$ at the $t$-th global iteration can be derived as follows:

$$\epsilon_t=I_1-I_2\times e^{-I_3\rho_t-I_4}=I_1-I_2\times e^{-\frac{I_3\mathcal{R}_t(N-1)}{\phi_1\sum_{i=1}^{N}\nu_i^t}-I_4}\geq\epsilon, \qquad (32)$$

where $\epsilon$ is the predefined model accuracy that should be reached after $T$ iterations of model training. Thus, there exist a lower and an upper bound for the allocated reward $\mathcal{R}_t$ in the $t$-th global iteration, i.e.,

$$\frac{\phi_1\big[\log\big(\frac{I_2}{I_1-\epsilon}\big)-I_4\big]}{(N-1)I_3/\sum_{i=1}^{N}\nu_i^t}\leq\mathcal{R}_t\leq\frac{\phi_1\big[\log\big(\frac{I_2}{I_1-\epsilon_{\max}}\big)-I_4\big]}{(N-1)I_3/\sum_{i=1}^{N}\nu_i^t}. \qquad (33)$$
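A sketch of evaluating the bounds in Eq (33), assuming the fitted constants $I_k$, the target accuracy $\epsilon$ and the maximum accuracy $\epsilon_{\max}$ are given (all numbers are placeholders):

import math

def reward_bounds(I1, I2, I3, I4, eps_target, eps_max, nu, phi1):
    """Lower/upper reward bounds of Eq (33) for one global iteration."""
    N, S = len(nu), sum(nu)
    scale = phi1 * S / ((N - 1) * I3)
    lower = scale * (math.log(I2 / (I1 - eps_target)) - I4)
    upper = scale * (math.log(I2 / (I1 - eps_max)) - I4)
    return lower, upper

# Example with placeholder fitted constants and accuracy targets.
lo, hi = reward_bounds(I1=0.97, I2=0.5, I3=1.2, I4=0.1,
                       eps_target=0.90, eps_max=0.96,
                       nu=[1.0, 1.2, 1.5], phi1=1.0)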
Figure 3: The performance of different models on the EMNIST dataset with IID data distribution. (a) Accuracy on different models; (b) Accuracy on different strategies; (c) Cost of central server; (d) Accuracy on different strategies.
Figure 4: The performance of different models on the EMNIST dataset with Non-IID data distribution. (a) Accuracy on different models; (b) Accuracy on different strategies; (c) Cost of central server; (d) Accuracy on different strategies.
Figure 5: The performance of different models on the Cifar10 dataset with IID data distribution. (a) Accuracy on different models; (b) Accuracy on different strategies; (c) Cost of central server; (d) Reward.

V Numerical Experiments

In this section, we conduct extensive experiments to demonstrate the efficiency of our proposed framework on commonly used real-world datasets in federated learning.

V-A Experimental Settings

Our proposed approach is implemented on three real-world datasets (i.e., MNIST [23], Cifar10 [24] and EMNIST [25]) to demonstrate the generalization ability of our framework. We adopt the Dirichlet distribution [26] to partition the datasets into Non-IID subsets, setting the concentration parameter to 1.0. In addition, three classic machine learning models with different structures are implemented, one for each dataset. Specifically, for the MNIST dataset, we leverage a linear model with a fully connected layer of 784 input and 10 output channels. For the Cifar10 dataset, the local training model is consistent with [21]. For the EMNIST dataset, we adopt a CNN similar in structure to LeNet-5 [23]. The basic dataset information and detailed parameter settings are summarized in Table I.
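For reference, a sketch of a Dirichlet-based Non-IID partition with concentration parameter 1.0, in the spirit of [26]; the helper name and the toy labels are illustrative:

import numpy as np

def dirichlet_partition(labels, num_clients, alpha=1.0, seed=0):
    """Split sample indices across clients so that each class's samples are
    distributed according to a Dirichlet(alpha) draw over clients."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Proportion of class-c samples assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        splits = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client_id, part in enumerate(np.split(idx, splits)):
            client_indices[client_id].extend(part.tolist())
    return client_indices

# Example: partition 1,000 samples with 10 classes among 5 clients.
labels = np.random.default_rng(0).integers(0, 10, size=1000)
parts = dirichlet_partition(labels, num_clients=5, alpha=1.0)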

V-B Experiments on Real Datasets

In the experiments, we denote the approach without client selection and the DP mechanism as FedAvg, the approach that only performs client selection as FedAvg-select, the approach that only applies the DP mechanism as FedAvg-DP, and our proposed method with both client selection and the DP mechanism as QI-DPFL. Moreover, to verify the efficiency and effectiveness of our proposed FL framework, we compare it with two baselines, named Max and Random, following a comparison paradigm similar to that in [27]. Specifically, Max means the central server chooses the largest possible reward value in each global iteration to guarantee the best performance, while Random refers to selecting a random reward for incentivizing clients. Other incentive mechanisms such as [19] and [9] are designed with different objectives, namely to maximize the utility of both clients and the central server, and to maximize the profit of the model marketplace by designing an auction scenario for DPFL, respectively, and thus cannot be compared directly with our QI-DPFL framework. Therefore, we exclude these two closely related methods from our comparative analysis.

EMNIST: For the IID data distribution, as shown in Fig. 3a, compared with FedAvg, FedAvg-DP applies the DP mechanism for privacy preservation and obtains the lowest model accuracy. As FedAvg-select and QI-DPFL select clients with high-quality data, they improve the model performance in terms of convergence rate and accuracy. In Fig. 3b-3d, our proposed QI-DPFL achieves the lowest cost and allocated reward on the premise of guaranteeing model performance comparable to the Max-select method, and achieves higher accuracy than FedAvg-DP. For the Non-IID data distribution, the superiority of QI-DPFL is more pronounced. In Fig. 4a, FedAvg-select and QI-DPFL improve the convergence rate and model accuracy by incorporating the client selection mechanism compared to FedAvg. FedAvg-DP obtains the lowest accuracy as artificial Gaussian noise is added to avoid privacy leakage. Our proposed framework QI-DPFL, despite perturbing the local parameters, still achieves a model performance similar to FedAvg-select, which shows the effectiveness of QI-DPFL. In Fig. 4b-4d, our algorithm attains model performance on par with the Max-select method while keeping costs and rewards minimal. Further, QI-DPFL outperforms FedAvg-DP by selecting participants with superior data quality. The experimental results on the MNIST dataset are similar to those on the EMNIST dataset.

TABLE I: Basic Information of Datasets and Parameter Settings

Datasets | Training Set Size | Validation Set Size | Classes | Image Size | $\eta$ | $T$ | Discount Factor $\pi$
MNIST | 60,000 | 10,000 | 10 | 1 × 28 × 28 | 0.01 | 30 | 0.9429
Cifar10 | 50,000 | 10,000 | 10 | 3 × 32 × 32 | 0.1 | 80 | 0.9664
EMNIST | 731,668 | 82,587 | 62 | 1 × 28 × 28 | 0.01 | 30 | 0.901
Figure 6: The performance of different models on the Cifar10 dataset with Non-IID data distribution. (a) Accuracy on different models; (b) Accuracy on different strategies; (c) Cost of central server; (d) Accuracy on different strategies.

Cifar10: For the IID data distribution, as shown in Fig. 5a, compared with FedAvg, FedAvg-select achieves higher accuracy thanks to the client selection mechanism. To avoid privacy leakage, the artificial noise added to the transmitted parameters by the DP mechanism may deteriorate the model performance; the accuracy of FedAvg-DP is therefore lower than that of FedAvg, as shown in Fig. 5a. Our proposed algorithm QI-DPFL, despite the perturbation on the transmitted local parameters, still achieves a fast convergence rate and testing accuracy similar to FedAvg-select, revealing the effectiveness of QI-DPFL. Fig. 5b-5d indicate that our approach achieves comparable model performance to the Max-select method while maintaining the lowest cost and reward. Additionally, QI-DPFL achieves higher accuracy than FedAvg-DP by selecting clients with high data quality. Under the Non-IID data distribution, the advantages of QI-DPFL are even more conspicuous. In Fig. 6a, FedAvg-DP obtains the lowest model accuracy due to the DP mechanism. Since FedAvg-select and QI-DPFL select clients with high-quality data, they exhibit enhanced performance in terms of both convergence rate and model accuracy. Moreover, although QI-DPFL adds artificial noise, it still achieves accuracy similar to FedAvg-select, which demonstrates the effectiveness of our model. In Fig. 6b-6d, QI-DPFL attains model accuracy on par with the Max-select scheme while keeping costs and rewards minimal. Further, QI-DPFL outperforms FedAvg-DP by selecting clients with superior data quality.

Based on the above experimental results, it becomes evident that the advantages of our proposed QI-DPFL are particularly pronounced on the Non-IID datasets, which is attributed to the heightened effectiveness of the client selection mechanism.

VI Conclusion

In this paper, we propose a novel federated learning framework called QI-DPFL to jointly solve the client selection and incentive mechanism problem on the premise of preserving clients’ data privacy. We adopt Earth Mover’s Distance (EMD) metric to select clients with high-quality data. Furthermore, we model the interactions between clients and the central server as a two-stage Stackelberg game and derive the Stackelberg Nash Equilibrium to describe the steady state of the system. Extensive experiments on MNIST, Cifar10, and EMNIST datasets for both IID and Non-IID settings demonstrate that QI-DPFL achieves comparable model accuracy and faster convergence rate with the lowest cost and reward for the central server.

References

  • [1] W. G. Voss, “European union data privacy law reform: General data protection regulation, privacy shield, and the right to delisting,” The Business Lawyer, vol. 72, no. 1, pp. 221–234, 2016.
  • [2] R. Zhou, J. Pang, Z. Wang, J. C. Lui, and Z. Li, “A truthful procurement auction for incentivizing heterogeneous clients in federated learning,” in 2021 IEEE 41st International Conference on Distributed Computing Systems (ICDCS).   IEEE, 2021, pp. 183–193.
  • [3] W. He, H. Yao, T. Mai, F. Wang, and M. Guizani, “Three-stage stackelberg game enabled clustered federated learning in heterogeneous uav swarms,” IEEE Transactions on Vehicular Technology, 2023.
  • [4] J. S. Ng, W. Y. B. Lim, H.-N. Dai, Z. Xiong, J. Huang, D. Niyato, X.-S. Hua, C. Leung, and C. Miao, “Joint auction-coalition formation framework for communication-efficient federated learning in uav-enabled internet of vehicles,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 4, pp. 2326–2344, 2020.
  • [5] L. Zhu, Z. Liu, and S. Han, “Deep leakage from gradients,” Advances in neural information processing systems, vol. 32, 2019.
  • [6] C. Dwork, A. Roth et al., “The algorithmic foundations of differential privacy,” Foundations and Trends® in Theoretical Computer Science, vol. 9, no. 3–4, pp. 211–407, 2014.
  • [7] M. Bun and T. Steinke, “Concentrated differential privacy: Simplifications, extensions, and lower bounds,” in Theory of Cryptography: 14th International Conference, TCC 2016-B, Beijing, China, October 31-November 3, 2016, Proceedings, Part I.   Springer, 2016, pp. 635–658.
  • [8] I. Mironov, “Rényi differential privacy,” in 2017 IEEE 30th computer security foundations symposium (CSF).   IEEE, 2017, pp. 263–275.
  • [9] P. Sun, X. Chen, G. Liao, and J. Huang, “A profit-maximizing model marketplace with differentially private federated learning,” in IEEE INFOCOM 2022-IEEE Conference on Computer Communications.   IEEE, 2022, pp. 1439–1448.
  • [10] X. Wu, Y. Zhang, M. Shi, P. Li, R. Li, and N. N. Xiong, “An adaptive federated learning scheme with differential privacy preserving,” Future Generation Computer Systems, vol. 127, pp. 362–372, 2022.
  • [11] J. P. Near and C. Abuah, “Programming differential privacy,” URL: https://uvm, 2021.
  • [12] J. Kang, Z. Xiong, D. Niyato, H. Yu, Y.-C. Liang, and D. I. Kim, “Incentive design for efficient federated learning in mobile networks: A contract theory approach,” in 2019 IEEE VTS Asia Pacific Wireless Communications Symposium (APWCS).   IEEE, 2019, pp. 1–5.
  • [13] M. Wu, D. Ye, J. Ding, Y. Guo, R. Yu, and M. Pan, “Incentivizing differentially private federated learning: A multidimensional contract approach,” IEEE Internet of Things Journal, vol. 8, no. 13, pp. 10 639–10 651, 2021.
  • [14] Z. Yi, Y. Jiao, W. Dai, G. Li, H. Wang, and Y. Xu, “A stackelberg incentive mechanism for wireless federated learning with differential privacy,” IEEE Wireless Communications Letters, vol. 11, no. 9, pp. 1805–1809, 2022.
  • [15] Y. Xu, M. Xiao, J. Wu, H. Tan, and G. Gao, “A personalized privacy preserving mechanism for crowdsourced federated learning,” IEEE Transactions on Mobile Computing, 2023.
  • [16] N. Ding, Z. Fang, and J. Huang, “Optimal contract design for efficient federated learning with multi-dimensional private information,” IEEE Journal on Selected Areas in Communications, vol. 39, no. 1, pp. 186–200, 2020.
  • [17] H. Wu and P. Wang, “Fast-convergent federated learning with adaptive weighting,” IEEE Transactions on Cognitive Communications and Networking, vol. 7, no. 4, pp. 1078–1088, 2021.
  • [18] Y. Sun, H. Fernando, T. Chen, and S. Shahrampour, “On the stability analysis of open federated learning systems,” in 2023 American Control Conference (ACC).   IEEE, 2023, pp. 867–872.
  • [19] R. Hu and Y. Gong, “Trading data for learning: Incentive mechanism for on-device federated learning,” in GLOBECOM 2020-2020 IEEE Global Communications Conference.   IEEE, 2020, pp. 1–6.
  • [20] Y. Jiao, P. Wang, D. Niyato, B. Lin, and D. I. Kim, “Toward an automated auction framework for wireless federated learning services market,” IEEE Transactions on Mobile Computing, vol. 20, no. 10, pp. 3034–3048, 2020.
  • [21] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Artificial intelligence and statistics.   PMLR, 2017, pp. 1273–1282.
  • [22] A. Rakhlin, O. Shamir, and K. Sridharan, “Making gradient descent optimal for strongly convex stochastic optimization,” arXiv preprint arXiv:1109.5647, 2011.
  • [23] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
  • [24] A. Krizhevsky, G. Hinton et al., “Learning multiple layers of features from tiny images,” Advances in neural information processing systems, 2009.
  • [25] G. Cohen, S. Afshar, J. Tapson, and A. Van Schaik, “Emnist: Extending mnist to handwritten letters,” in 2017 international joint conference on neural networks (IJCNN).   IEEE, 2017, pp. 2921–2926.
  • [26] T.-M. H. Hsu, H. Qi, and M. Brown, “Measuring the effects of non-identical data distribution for federated visual classification,” arXiv preprint arXiv:1909.06335, 2019.
  • [27] X. Kang, G. Yu, J. Wang, W. Guo, C. Domeniconi, and J. Zhang, “Incentive-boosted federated crowdsourcing,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, 2023, pp. 6021–6029.