Incentivized Federated Learning and Unlearning

Ningning Ding, Zhenyu Sun, Ermin Wei, and Randall Berry, Ningning Ding, Zhenyu Sun, and Randall Berry are with the Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL 60208, USA (email: [email protected]). Ermin Wei is with Electrical and Computer Engineering Department and Industrial Engineering and Management Sciences Department, Northwestern University, Evanston, IL 60208, USA.

Abstract

To protect users’ right to be forgotten in federated learning, federated unlearning aims at eliminating the impact of leaving users’ data on the global learned model. The current research in federated unlearning mainly concentrated on developing effective and efficient unlearning techniques. However, the issue of incentivizing valuable users to remain engaged and preventing their data from being unlearned is still under-explored, yet important to the unlearned model performance. This paper focuses on the incentive issue and develops an incentive mechanism for federated learning and unlearning. We first characterize the leaving users’ impact on the global model accuracy and the required communication rounds for unlearning. Building on these results, we propose a four-stage game to capture the interaction and information updates during the learning and unlearning process. A key contribution is to summarize users’ multi-dimensional private information into one-dimensional metrics to guide the incentive design. We further investigate whether allowing federated unlearning is beneficial to the server and users, compared to a scenario without unlearning. Interestingly, users usually have a larger total payoff in the scenario with higher costs, due to the server’s excess incentives under information asymmetry. The numerical results demonstrate the necessity of unlearning incentives for retaining valuable leaving users, and also show that our proposed mechanisms decrease the server’s cost by up to 53.91% compared to state-of-the-art benchmarks.

Index Terms:

incentive mechanism, federated learning, federated unlearning

1 Introduction

1.1 Background and Motivations

Federated learning is a promising distributed machine learning paradigm, in which multiple users collaborate to train a shared model under the coordination of a central server [1]. This approach allows users to keep their local data on their own devices and only share the intermediate model parameters, which helps protect their raw data. However, despite these measures, it may not provide sufficient privacy guarantees [2, 3].

For privacy reasons, one desirable property of a federated learning platform is the users’ “right to be forgotten” (RTBF), which has been explicitly stated in the European Union General Data Protection Regulation (GDPR) [4] and the California Consumer Privacy Act (CCPA) [5]. That is, a user has the right to request deletion of his private data and its impact on the trained model, if he no longer desires to participate in the platform. Users may seek to leave a platform for a variety of reasons. For example, they may feel that the benefits from the platform are not sufficient to compensate for their potential privacy leakage from participation. Furthermore, until they participate in the platform, they may not have full knowledge of these benefits and costs due to incomplete information about other users’ data. For instance, users’ privacy costs in federated learning depend on how unique their data is [6], which they can infer from their training loss after training [7].

To remove data from a trained federated learning model, the concept of federated unlearning has recently been proposed [8]. In this concept, after some users request to revoke their data, staying users will perform additional training or calculations to eliminate the impact of leaving users’ data and obtain an unlearned model. A simple yet costly approach is to retrain the model from scratch with the requested data being removed from the training dataset [9]. To be more efficient and effective, existing literature (e.g., [10, 7, 11]) focused on alternative federated unlearning methods that obtain a model similar (in some distance metrics) to a retrained model with lower computational costs. However, these studies usually assumed that users are willing to participate in federated learning and unlearning. This assumption may not be realistic without proper incentives since users incur various costs during the training process (e.g., time, energy, and privacy costs). Our goal in this paper is to develop incentive mechanisms to help retain valuable leaving users and create a sustainable learning platform for both the users and the server.

To design the incentive mechanism for federated learning and unlearning, there are several challenges to tackle. First, different leaving users will lead to different unlearned model performances and unlearning costs, the relationship among which is still an open problem yet essential for designing incentives. Second, it is difficult for the server to design incentives for a large number of heterogeneous users, when users have multi-dimensional private information (e.g., training costs and privacy costs) and unknown information (e.g., users’ training losses before federated learning). Third, unlearning incentives for retaining valuable leaving users require careful design. High incentives may encourage strategic users to intentionally request revocation to obtain retention rewards, while low incentives may fail to retain valuable users. It is also crucial for the server to distinguish between high-quality leaving users (e.g., with rare and valuable data) and low-quality ones (e.g., with erroneous data), both of which can lead to high training losses. Fourth, both learning and unlearning incentives affect the server’s and users’ payoffs but are determined in different stages - before or after federated learning. Meanwhile, there are different information asymmetry levels in each stage, as the federated learning process can reveal some information such as users’ training losses and contributions. Thus, the mutual influence of the incentives and dynamic information asymmetry further complicate the incentive mechanism design.

The above discussion motivates us to answer the following interesting question:

Question 1.

Considering leaving users’ impact, what is the server’s optimal incentive mechanism for federated learning and unlearning, when heterogeneous users have strategic data revocation decisions and multi-dimensional private and unknown information?

Furthermore, although federated unlearning is important for protecting users’ right to be forgotten and data privacy, no work has studied whether allowing federated unlearning is economically beneficial to the server or users by comparing the following two scenarios:

•

Unlearning-Allowed Scenario. The federated learning server allows users to revoke data and will perform federated unlearning;
•

Unlearning-Forbidden Scenario. The federated learning server does not allow users to revoke data after they decide to participate in the federated training.

Different unlearning scenarios will lead to different optimal incentive mechanisms, as well as the server’s and users’ payoffs. When unlearning is optional, studying the superiority of each scenario will facilitate the server’s and users’ selection. The performance comparison also provides insights into the policy design of a market regulator. This motivates the second key question of this paper:

Question 2.

Compared with the unlearning-forbidden scenario, is the federated unlearning-allowed scenario more beneficial to the server and users in terms of their payoffs?

1.2 Contributions

We summarize our key contributions below.

•

Incentive mechanism design for federated learning and unlearning. We propose a four-stage Stackelberg game to analyze the optimal incentives of the server and the optimal strategies of users within this game. To the best of our knowledge, this is the first analytical study of incentive mechanisms for federated learning and unlearning.
•

Theoretical characterization of global model accuracy and unlearning communication rounds. We theoretically derive bounds on the global model optimality gap given non-IID data for federated learning algorithms (Scaffold [12] and FedAvg [1]) and the number of global communication rounds required for a federated unlearning method.
•

Optimal incentives and revocation decisions under multi-dimensional incomplete information. Due to the complex interaction, users’ multi-dimensional private information, and dynamically updated knowledge, the server’s optimization problem in Stage I of the four-stage game is highly complex. We summarize users’ multi-dimensional heterogeneity into several one-dimensional metrics and develop an efficient algorithm with linear complexity, to handle the exponentially large number of possible cases involved in optimal mechanism design. We also identify and analyze a supermodular game among the users to obtain their optimal data revocation decisions.
•

Comparison of unlearning-allowed and unlearning-forbidden scenarios. We show that (i) when users’ unlearning costs in the unlearning-allowed scenario are large, the server needs to compensate them with large incentives and thus prefers the unlearning-forbidden scenario. Surprisingly, users prefer the unlearning-allowed scenario where they have large costs, due to the excess rewards they obtain under information asymmetry. (ii) When users’ perceived privacy costs in the unlearning-forbidden scenario are large, the server prefers the unlearning-allowed scenario while users prefer the unlearning-forbidden scenario for similar reasons as in (i).
•

Insights and Performance Evaluation. We show that high costs and training losses motivate users to leave, while the server will retain the leaving users who make significant contributions to model accuracy but not necessarily low training losses, as small losses of retained users will reduce privacy costs yet increase unlearning costs. We numerically show that compared with state-of-the-art benchmarks, our proposed incentive mechanism decreases the server’s cost by up to 53.91%. Moreover, the results demonstrate that it is beneficial for the server to retain valuable leaving users and jointly optimize the federated learning and unlearning incentive mechanisms.

1.3 Related Work

The concept of machine unlearning, which refers to the process of removing the impact of a data sample from a trained model, was first introduced by Cao et al. in 2015 [13]. Most related literature was about centralized machine unlearning (e.g., [9, 14]), in which the unlearned model (not retrained from scratch) was trained on summarized (e.g., aggregates of summations) or partitioned subsets rather than individual training samples. As a result, the model only needed to be updated on the subset(s) of data that are associated with the requested samples.

Centralized unlearning methods are not suited to federated learning, due to (i) lack of direct data access, (ii) the fact that the global model is updated based on the aggregated rather than the raw gradients, and (iii) the possibility that different users may have similar training samples [7]. This motivated the emergence of federated unlearning, which focuses on deleting the impact of revoked data in federated learning.

Only a few studies proposed federated unlearning mechanisms using methods such as gradient subtraction (e.g., [10, 8]), gradient scaling (e.g., [7]), or knowledge distillation (e.g., [11]). Albeit with good numerical performance, there is no theoretical guarantee of these proposed federated unlearning methods. To fill this gap, we propose theoretical bounds on the model optimality gap and communication rounds for one approach to federated unlearning in this paper.

Furthermore, there is a wide spectrum of literature on incentive mechanisms for various systems, including crowdsensing (e.g., [15]), wireless networks (e.g., [16]), data trading (e.g., [17]), and energy sharing (e.g., [18]). Some important work studied incentive mechanism design for federated learning to discourage valuable clients from leaving (e.g., [19, 20, 21, 22]). However, very few of them considered users’ multi-dimensional private information (e.g., [20]), and none of them incorporated the unique aspects of federated unlearning (e.g., unlearning costs) or the dynamics of users’ payoffs (e.g., pre-/post-training and before/after some users leave). This paper is the first to focus on incentive mechanism design for both federated learning and unlearning.

The rest of the paper is organized as follows. In Section 2, we characterize the models of federated learning and unlearning. The system model is described in Section 3. We give the optimal incentive mechanisms in the unlearning-allowed and unlearning-forbidden scenarios in Sections 4 and 5, respectively. We provide simulation results in Section 6 and conclude in Section 7.

2 Characterization of Federated Learning and Unlearning Models

Before modeling the game-theoretic interaction between the server and the users in the next section, we first discuss federated learning and unlearning models in this section as a preliminary. Specifically, we specify the learning and unlearning objectives in Sections 2.1 and 2.2, respectively. Then, we derive bounds on global model accuracy and federated unlearning time in Section 2.3.

2.1 Federated Learning Objective

Consider an example of data $(x_{a},y_{a})$ , where $x_{a}$ is the input (e.g., an image) and $y_{a}$ is the label (e.g., the object in the image). The objective of learning is to find the proper model parameter $w$ that can predict the label $y_{a}$ based on the input $x_{a}$ . Let us denote the prediction value as $\tilde{y}(x_{a};w)$ . The gap between the prediction $\tilde{y}(x_{a};w)$ and the ground truth label $y_{a}$ is characterized by the prediction loss function $f_{a}(w)$ . If user $i$ selects a set of local data with data size $d_{i}$ to train the model, the loss function of user $i\in\mathcal{I}$ is the average prediction loss on all his training data:

F_{i}(w)=\frac{1}{d_{i}}\sum_{a=1}^{d_{i}}f_{a}(w).

(1)

The purpose of federated learning is to compute the model parameter $w$ by using all users’ local data. The optimal model parameter $w^{*}$ minimizes the global loss function, which is an average of all users’ loss functions [12, 23]:¹¹1This model treats each user equally. Some papers (e.g., [1]) adopted another objective, a weighted sum of all users’ losses, where the weights (i.e. $d_{i}/\sum_{i=1}^{I}d_{i}$ ) reflect the differences in data size. The two objectives are equivalent when users’ data sizes are the same. Our results can be easily extended to the weighted case.

w^{*}=\arg\min_{w}F(w)\triangleq\arg\min_{w}\frac{1}{I}\sum_{i\in\mathcal{I}}F_{i}(w).

(2)

2.2 Federated Unlearning Objective

A federated learning process maps users’ data into a model space, while a federated unlearning process maps a learned model, users’ data set, and the data set that is required to be forgotten into an unlearned model space. The goal of federated unlearning is to make the unlearned model have the same distribution as the retrained model (i.e., retrained from scratch using the remaining data).²²2The distribution is due to the randomness in the training process (e.g. randomly sampled data and random ordering of batches).

A natural method for federated unlearning is to let the remaining users (excluding leaving users) continue training from the learned model $w^{*}$ , until it converges to a new optimal model parameter $\tilde{w}^{*}$ that minimizes the global loss function of remaining users:

\tilde{w}^{*}=\arg\min_{w}\frac{1}{I-I_{leave}}\sum_{i\in\mathcal{I}\backslash\mathcal{I}_{leave}}F_{i}(w),

(3)

where $\mathcal{I}_{leave}$ is the set of users who leave the system through federated unlearning. This method is typically more efficient than training from scratch, as the minimum point may not change much after some users leave.

2.3 Model Accuracy and Unlearning Time

Given the objectives of federated learning and unlearning, we analyze the model accuracy gap and unlearning time in the following.

We use two widely adopted algorithms, Scaffold[12] and FedAvg [1], as the federated learning algorithms when deriving the optimality gap of the global model. In each local iteration of the algorithm, every user computes a mini-batch gradient with batch size $s_{i}$ . A batch or minibatch refers to equally sized subsets of the training dataset over which the gradient is calculated. In this paper, we consider the widely adopted setting that users’ batch sizes $\{s_{i}\}_{i\in\mathcal{I}}$ are in the same proportion to their data sizes $\{d_{i}\}_{i\in\mathcal{I}}$ (i.e., $s_{i}=\iota d_{i},\forall i\in\mathcal{I},\iota\in(0,1)$ ) [9, 24, 20].

The following proposition presents bounds on the optimality gap for the global models trained with Scaffold or FedAvg:

Proposition 1.

Suppose each user’s loss function $F_{i}$ is $\mu$ -strongly-convex and $L$ -Lipschitz-smooth. Consider federated learning algorithms Scaffold and FedAvg with local iteration number of user $i$ denoted by $K_{i}$ and local step size denoted by $\eta_{i}$ . Set $\bar{\eta}=\eta_{i}K_{i}$ . Then, we have for Scaffold with $\bar{\eta}\leq\frac{1}{12L}$ ,

\mathbb{E}\|w_{t+1}-w^{*}\|^{2}\leq(1-\frac{\mu\bar{\eta}}{2})\mathbb{E}\|w_{t}-w^{*}\|^{2}+\frac{22\bar{\eta}^{2}\sigma^{2}}{I}\sum_{i\in\mathcal{I}}\frac{1}{s_{i}},

(4)

where $w_{t+1}$ and $w_{t}$ represent the model parameter after global round $t+1$ and $t$ , respectively, $s_{i}$ is user $i$ ’s local batch size, and $\sigma^{2}$ is the variance bound of each data sample.³³3To estimate the true gradient $\nabla F_{i}(w)$ , we uniformly sample one data point to generate a gradient estimate $g_{i}(w)$ and assume $\mathbb{E}\|g_{i}(w)-\nabla F_{i}(w)\|^{2}\leq\sigma^{2}$ for any $w$ . For FedAvg with $\bar{\eta}\leq\frac{1}{12LB}$ ,

\begin{split}\mathbb{E}\|&w^{t+1}-w^{*}\|^{2}\leq\\ &(1-\frac{\mu\bar{\eta}}{2})\mathbb{E}\|w_{t}-w^{*}\|^{2}+6\bar{\eta}^{2}G^{2}+\frac{19\bar{\eta}^{2}\sigma^{2}}{I}\sum_{i\in\mathcal{I}}\frac{1}{s_{i}},\end{split}

(5)

when the bounded dissimilarity assumption is satisfied, i.e., there exist some constants $G\geq 0$ and $B\geq 1$ such that $(1/I)\sum_{i=1}^{I}\|\nabla F_{i}(w)\|^{2}\leq G^{2}+B^{2}\|\nabla F(w)\|^{2}$ , $\forall w$ .

Moreover, by selecting $\bar{\eta}=\frac{c}{t+1}$ for some $c>0$ , we have that the expected optimality gap of the global model satisfies: for Scaffold with $c\leq\frac{1}{12L}$ ,

\mathbb{E}\|w_{t}-w^{*}\|^{2}\leq\frac{1}{t+1}\left(\frac{b_{1}(c)\sigma^{2}}{I}\sum_{i\in\mathcal{I}}\frac{1}{s_{i}}+\|w_{0}-w^{*}\|^{2}\right),

(6)

and for FedAvg with $c\leq\frac{1}{12LB}$ ,

\begin{split}\mathbb{E}\|&w_{t}-w^{*}\|^{2}\leq\\ &\frac{1}{t+1}\left(\frac{b_{2}(c)\sigma^{2}}{I}\sum_{i\in\mathcal{I}}\frac{1}{s_{i}}+b_{2}(c)G^{2}+\|w_{0}-w^{*}\|^{2}\right),\end{split}

(7)

where $b_{i}(c),i=1,2$ are some monotonically increasing functions of $c$ .

The proof of Proposition 1 is given in Appendix A in the technical report [25]. As a large optimality gap $\|w_{t}-w^{*}\|^{2}$ means a high accuracy loss of the global model, Proposition 1 presents a relationship between the expected global model accuracy loss and the users’ data sizes. As shown in (6) and (7), the expected accuracy loss of the global model decreases in the users’ training batch sizes $\{s_{i}\}_{i\in\mathcal{I}}$ (and thus data sizes $\{d_{i}\}_{i\in\mathcal{I}}$ ). Moreover, we explain two asymptotic cases of (6) and (7) for better understanding. When the initial point is optimal (i.e., $w_{0}=w^{*}$ ), the bound does not go to zero due to sample randomness. When batch size $s_{i}$ is large enough, the randomness is then highly reduced and the bound is controlled by the initialization of the algorithm, i.e., the farther the initial point $w_{0}$ is from the optimal solution $w^{*}$ , the more iterations are needed.

Then, after applying the result in Proposition 1 to the natural unlearning model introduced in Section 2.2, we have the following proposition about federated unlearning rounds:

Proposition 2.

Consider the same conditions of Proposition 1 with diminishing step size $\bar{\eta}$ and suppose

b_{i}(c)\leq\frac{1}{(I-I_{leave})\mu^{2}}\frac{\left(\sum_{i\in\mathcal{I}_{leave}}\|\nabla F_{i}(w^{*})\|\right)^{2}}{G^{2}+\sum_{i\in\mathcal{I}\backslash\mathcal{I}_{leave}}\frac{1}{s_{i}}\sigma^{2}},

where $i=1$ when using Scaffold and $i=2$ for FedAvg. It will require

T_{unlearn}\geq\frac{2(I-1)}{\epsilon^{2}\mu^{2}}\sum_{i\in\mathcal{I}_{leave}}\|\nabla F_{i}(w^{*})\|^{2}-1

(8)

rounds of communication to guarantee $\mathbb{E}\|w_{T_{unlearn}}-\tilde{w}^{*}\|\leq\epsilon$ when starting from the original learned model $w^{*}$ , where the new model $\tilde{w}^{*}$ is defined in (3).

The proof of Proposition 2 is given in Appendix B in the technical report [25]. Each user’s gradient $\|\nabla F_{i}(w^{*})\|$ can represent his training loss (denoted as $\ell_{i}$ ) because the calculated gradient increases in the loss. Hence, Proposition 2 reveals the relationship between the number of communication rounds required for federated unlearning and the training losses of leaving users. As indicated in (8), a larger total training loss of the leaving users $\sum_{i\in\mathcal{I}_{leave}}\ell_{i}^{2}$ (i.e., a larger $\sum_{i\in\mathcal{I}_{leave}}\|\nabla F_{i}(w^{*})\|^{2}$ ) requires more communication rounds $T_{unlearn}$ to achieve unlearning.

We will apply the derived results about model accuracy loss and unlearning rounds in building the system model in the next section.

3 System Model

We consider a federated learning and unlearning system consisting of a set of heterogeneous users with private data and a central server. As illustrated in Fig. 1, the server first incentivizes users as workers to participate in a federated learning phase through a contract. However, some users may later choose to revoke their data and leave the system. In response, the server can provide further incentives to retain valuable users. Upon the final exit of some users from the system, the remaining users collectively execute an algorithm to unlearn the leaving users’ data.

In the following, we first divide the heterogeneous users into different types for the convenience of incentive design, then formulate a multi-stage game between the strategic server and users, and finally specify the payoffs of the server and the users (i.e., their optimization objectives) in two unlearning scenarios, respectively.

3.1 User Type

We consider a set $\mathcal{I}\triangleq\{1,2,...,I\}$ of users in the system with two-dimensional private information: marginal cost for training effort $\theta$ and marginal perceived privacy cost $\xi$ . We refer to a user with $(\theta_{j},\xi_{j})$ as a type $j$ user. We further assume that the $I$ users belong to a set $\mathcal{J}\triangleq\{1,2,...,J\}$ of $J$ types. Each type $j$ has $I_{j}$ users, with $\sum_{j\in\mathcal{J}}I_{j}=I$ . The total number of users $I$ and the number of each type $I_{j}$ are public information, but each user’s specific type is private information.⁴⁴4The server can have knowledge about statistics of type information through market research and past experiences, but it is hard for it to know each user’s private type.

Under private information, it is difficult for the server to predict users’ strategies. To this end, we propose to design a contract mechanism for the server to elicit information.

Refer to caption — Figure 1: Framework of federated learning and unlearning system with incentive mechanisms.

3.2 Games and Strategies

We use a multi-stage Stackelberg game to model the interaction between the server and users in each of the two scenarios.

3.2.1 Unlearning-Allowed Scenario

When unlearning is allowed, we consider the following four-stage game that captures the move sequence of the server and the users:

•

Stage I: The server designs a federated learning incentive contract $\bm{\phi}\triangleq\left\{\phi_{j}\right\}_{j\in\mathcal{J}}$ , which contains $J$ contract items (one for each user type). Each contract item $\phi_{j}\triangleq\left(d_{j},r^{L}_{j}\right)$ specifies the relationship between the required data size $d_{j}$ of each type- $j$ user (for local computation) and the corresponding learning reward $r^{L}_{j}$ .
•

Stage II: Users decide which contract item to choose. Then, they jointly implement the federated learning algorithm (Scaffold or FedAvg).

•

Stage III: Users decide whether to revoke data after federated learning. We denote a user $i$ ’s revocation decision as

\begin{split}x_{i}=\left\{\begin{array}[]{rcl}&0,&\rm{if}\;\text{user $i$ does not revoke data},\\ &1,&\rm{if}\;\text{user $i$ revokes his data},\\ \end{array}\right.\end{split}

(9)

and denote the set of users who revoke their data as $\mathcal{I}_{u}$ . If a type- $j$ user revokes his data, then he needs to fully return the reward $r_{j}^{L}$ to the server.⁵⁵5If there is no such return policy, every user can first participate to get rewards and then revoke data to reduce costs, resulting in a catastrophic failure of model training collaboration and a huge cost to the server. We consider that the server will announce users’ training losses $\{\ell_{i}\}_{i\in\mathcal{I}}$ (without specifying users) after federated learning to help users decide whether to revoke data.⁶⁶6It is not obvious that a strategic server would make such an announcement, but it can be stipulated by regulations for protecting users’ right to be forgotten. If we do not make this assumption, the problem will be even simpler. As we shall see in the analysis in Section 4.2, we just need to replace other users’ training losses $\{\ell_{k}\}_{k\in\mathcal{I}}$ in (23) with the same expected loss $\mathbb{E}[\ell]$ and solve the problem through a similar approach.

•

Stage IV: The server decides the set of leaving users to retain $\mathcal{I}_{r}$ and designs the corresponding retention incentives $\left\{r^{U}_{i}\right\}_{i\in\mathcal{I}_{r}}$ , such that those receiving the retention incentives will choose to stay in the system and those without will leave.⁷⁷7In this case, $\mathcal{I}_{u}\backslash\mathcal{I}_{r}$ is the set of users who finally leave the system, and $\mathcal{I}\backslash(\mathcal{I}_{u}\backslash\mathcal{I}_{r})$ is the set of users who finally stay. The remaining users and server collectively implement federated unlearning.

TABLE I: The Server and Users’ Knowledge in Different Stages

Stage

Known

Unknown

Server in Stage I

\mathcal{J}

\{I_{j}\}_{j\in\mathcal{J}}

\{\theta_{i},\xi_{i},\ell_{i},v_{i}\}_{i\in\mathcal{I}}

User in Stage II

his own type

(\theta_{i},\xi_{i})

other users’ types,

\{\ell_{i},v_{i}\}_{i\in\mathcal{I}}

User in Stage III

his own type

(\theta_{i},\xi_{i})

\{\ell_{i}\}_{i\in\mathcal{I}}

other users’ types,

\{v_{i}\}_{i\in\mathcal{I}}

Server in Stage IV

\mathcal{J}

\{I_{j}\}_{j\in\mathcal{J}}

\{\theta_{i},\xi_{i},\ell_{i},v_{i}\}_{i\in\mathcal{I}}

TABLE II: Key Notations

$\theta_{j}$	Marginal training cost of type- $j$ users
$\xi_{j}$	Marginal perceived privacy cost of type- $j$ users
$I_{j}$	Number of type- $j$ users
$j/\mathcal{J}$	Index/Set of user types in the system
$i/\mathcal{I}$	Index/Set of users in the system
$\mathcal{I}_{u}$	Set of users who revoke their data in Stage III
$\mathcal{I}_{r}$	Set of users who are retained by the server in Stage IV
$\phi_{j}$	Contract item designed for type- $j$ users
$d_{j}$	Required data size for each type- $j$ user in the contract
$r_{j}^{L}$	Learning reward for each type- $j$ user in the contract
$r_{i}^{U}$	Unlearning reward (retention incentive) for user $i$
$x_{i}$	User $i$ ’s data revocation decision
$p_{j}$	Historical revocation rate of type- $j$ users
$q_{j}$	Historical retention rate of type- $j$ users
$T$	Number of communication rounds of federated learning
$\lambda$	Coefficient related to unlearning communication rounds
$\varrho$	Coefficient related to expected accuracy loss
$\gamma$	Server’s weight on incentive rewards
$v_{i}$	User $i$ ’s contribution to global model accuracy
$\ell_{i}$	User $i$ ’s training loss (representing $\|\|\nabla F_{i}(x^{*})\|\|$ )

In Stage III, we use $\ell_{i}=||\nabla F_{i}(w_{T})||$ to represent the training loss, where $w_{T}$ is the solution obtained after $T$ iterations of Scaffold or FedAvg. We assume $T$ is large enough, such that $w_{T}$ and $w^{*}$ are close. A large $\ell_{i}$ implies the federated solution is far away from the minimizer of local loss function $F_{i}$ and therefore a larger training loss.

After federated learning, the server and users have more information in Stages III and IV compared with Stages I and II. For example, the users will know their training losses $\{\ell_{i}\}_{i\in\mathcal{I}}$ . The server can evaluate the users’ contribution to the global model (denoted by $\{v_{i}\}_{i\in\mathcal{I}}$ ), and it will know each user’s type by observing users’ contract item choices. We summarize their knowledge about some key information in the four stages in Table I and list the key notations in this paper in Table II.⁸⁸8As analyzing the four-stage game is complicated, this paper does not model the information update in a fully Bayesian framework but specifies plausible beliefs that the players hold in each stage.

Moreover, in Stage IV, the server has enough information to know whether the users will accept the retention incentives. Therefore, we do not model a Stage V in which the users decide to accept or not accept the retention incentives. After that, as in Fig. 1, the staying users perform federated unlearning under the server’s coordination, which makes staying users sustain unlearning costs. We will specify the payoffs and costs of the server and users in each stage of the game in the next subsection.

3.2.2 Unlearning-Forbidden Scenario

Without the unlearning process, we consider a two-stage game that only includes Stages I and II from the unlearning-allowed case.

3.3 Payoffs in the Unlearning-Allowed Scenario

At each stage, every user or the server seeks to maximize his expected payoff (or minimize his expected cost) based on his current knowledge. As knowledge updates occur between stages, the payoffs of the users or the server (maximization or minimization objectives respectively) take different forms in each stage.

3.3.1 Server’s Payoff in Stage I

The server’s objective in Stage I is to minimize the sum of the expected accuracy loss of the global model and the expected total incentive rewards for users.

First, we specify the expected model accuracy loss, which depends on the data of users who finally stay in the system. Since the server cannot predict which users will leave and who to retain due to the lack of information in Stage I, it can only base its decision on user distribution expectations. Specifically, we assume that according to the historical experience and market statistics, the server knows the probability of a type- $j$ user revoking his data (i.e., his revocation rate) $p_{j}$ and the probability that a type- $j$ user who wants to revoke data is retained (i.e., his retention rate) $q_{j}$ , where $p_{j}$ and $q_{j}$ are independent. Following Proposition 1, we model the server’s expected accuracy loss after federated unlearning as:

\frac{\varrho}{T}\sum_{j\in\mathcal{J}}I_{j}(1-p_{j}+p_{j}q_{j})\frac{1}{d_{j}},

(10)

where $T$ is the number of communication rounds of federated learning, $\varrho$ is a coefficient related to the sample variance, and $1-p_{j}+p_{j}q_{j}$ is the percentage of type $j$ users remaining in the system in the end. This captures that the expected model accuracy loss decreases in the data sizes of all staying users.⁹⁹9As the server aims to incentivize users to contribute data in federated learning, we only model the impact of data sizes and omit the independent term about initial point $w_{0}$ in (6) and (7). Since we consider that users’ batch sizes $\{s_{i}\}_{i\in\mathcal{I}}$ are in the same proportion to their data sizes $\{d_{i}\}_{i\in\mathcal{I}}$ , it is equivalent to substitute $s_{i}$ with $d_{i}$ in (6) and (7).

The server’s payoff also includes the cost of all rewards it pays to users, which comprises the initial contract announced in Stage I and incentives offered to encourage leaving users to remain in Stage IV. If all users choose to participate in the contract and choose their corresponding contract items,¹⁰¹⁰10As we shall see in Section 4.4, we will design the contract to ensure that each user will participate (i.e., individual rationality) and choose the contract item designed for his type (i.e., incentive compatibility). the expected total learning reward is $\sum_{j\in\mathcal{J}}I_{j}(1-p_{j}+p_{j}q_{j})r^{L}_{j}$ . Note that if a type- $j$ user successfully revokes his data, he needs to fully return the reward $r^{L}_{j}$ to the server. The server’s expected incentive for retaining leaving users is $\mathbb{E}[\sum_{i\in\mathcal{I}_{r}}r_{i}^{U}]$ , which depends on $p$ , $q$ , and training losses and will be calculated through backward induction in Section 4.4.

Combining these terms, the server’s expected cost in Stage I is

\begin{split}W^{s-1}&=\frac{\varrho}{T}\sum_{j\in\mathcal{J}}I_{j}(1-p_{j}+p_{j}q_{j})\frac{1}{d_{j}}\\ &+\gamma\left(\sum_{j\in\mathcal{J}}I_{j}(1-p_{j}+p_{j}q_{j})r^{L}_{j}+\mathbb{E}\bigg{[}\sum_{i\in\mathcal{I}_{r}}r_{i}^{U}\bigg{]}\right),\end{split}

(11)

where $\gamma$ is how much weight the server puts on the incentive reward payments compared to the model accuracy loss. A smaller $\gamma$ means that the server is less concerned about minimizing the incentive rewards and more concerned about reducing the accuracy loss.

3.3.2 Users’ Payoffs in Stage II

In the overall game, there are three possible outcomes for a user (not revoke data, revoke and retained, revoke and not retained). However, in this stage, a user does not have enough information to know which outcome will realize, so he must calculate his expected payoff by considering three cases:

•

Case (a): not revoke. With probability $1-p_{j}$ , a type- $j$ user will not revoke his data after federated learning. In this case, his expected payoff is the difference between the learning reward $r^{L}_{j}$ and costs (including the learning cost, privacy cost, and unlearning cost):

U_{j,a}^{s-2}=r^{L}_{j}-\theta_{j}d_{j}T-\xi_{j}\mathbb{E}[\ell_{j}]d_{j}-\mathbb{E}\bigg{[}\theta_{j}d_{j}\lambda\sum_{i\in\mathcal{I}_{u}\backslash\mathcal{I}_{r}}\ell_{i}^{2}\bigg{]},

(12)

where $\theta_{j}d_{j}T$ is the total learning cost in $T$ rounds. As we consider that each user’s sampled data size in each local round is proportional to his total data size, the learning cost is linear in his data size $d_{j}$ (e.g., [9, 24, 20]). Similarly, in the unlearning cost $\theta_{j}d_{j}\lambda\sum_{i\in\mathcal{I}_{u}\backslash\mathcal{I}_{r}}\ell_{i}^{2}$ , the $\lambda\sum_{i\in\mathcal{I}_{u}\backslash\mathcal{I}_{r}}\ell_{i}^{2}$ models the number of communication rounds for unlearning, which increases in the leaving users’ training losses (according to Proposition 2).¹¹¹¹11We use the simplified model of (8) in Proposition 2 to capture the key relationship between the unlearning communication rounds $T_{unlearn}$ and leaving users’ training losses (represented by $\|\nabla F_{i}(w^{*})\|$ ). A type $j$ user’s perceived privacy cost $\xi_{j}\mathbb{E}[\ell_{j}]d_{j}$ increases in his expected training loss $\mathbb{E}[\ell_{j}]$ and data size $d_{j}$ . As a high training loss $\ell_{j}$ reflects a large distance of user $j$ ’s data from the average of other users’ distribution, we use it to measure the uniqueness of a user. Thus, the model captures that the privacy cost increases in the uniqueness and size of one’s training data (e.g., [26, 27]). As each user cannot know his exact training loss $\ell_{j}$ before federated learning, we assume that he estimates the expected loss using the public distribution (with mean $\mathbb{E}[\ell_{j}]$ and variance $D(\ell_{j})$ ).

•

Case (b): revoke but retained. With probability $p_{j}q_{j}$ , a type- $j$ user will revoke his data after federated learning but will be retained by the server through more incentives $r^{U}_{j}$ . In this case, his expected payoff is the difference between total rewards (including both learning and unlearning incentives) and costs:

\begin{split}U_{j,b}^{s-2}=&r^{L}_{j}+\mathbb{E}\left[r^{U}_{j}\right]-\theta_{j}d_{j}T\\ &-\xi_{j}\mathbb{E}[\ell_{j}]d_{j}-\mathbb{E}\bigg{[}\theta_{j}d_{j}\lambda\sum_{i\in\mathcal{I}_{u}\backslash\mathcal{I}_{r}}\ell_{i}^{2}\bigg{]}.\end{split}

(13)

The unlearning incentive $r^{U}_{j}$ will be determined by the server in Stage IV based on users’ training losses, contributions, and data revocation, which are unknown in this stage. Thus, each user can only calculate the expectation of the unlearning incentive.

•

Case (c): revoke and not retained. With probability $p_{j}(1-q_{j})$ , a type- $j$ user will revoke his data and will not be retained by the server, i.e., the user’s data will be unlearned. The user needs to return the reward $r^{L}_{j}$ to the server but will not incur any privacy cost or unlearning cost. In this case, his expected payoff is

$U_{j,c}^{s-2}=-\theta_{j}d_{j}T,$ (14)

which is the sunk training cost from federated learning.

In summary, a type- $j$ user’s expected payoff in Stage II is

\begin{split}U_{j}^{s-2}=(1-p_{j})U_{j,a}^{s-2}+p_{j}q_{j}U_{j,b}^{s-2}+p_{j}(1-q_{j})U_{j,c}^{s-2}.\\ \end{split}

(15)

If $U_{j}^{s-2}\geq 0$ , the type- $j$ user will choose to participate in the federated learning in Stage II.

3.3.3 Users’ Payoffs in Stage III

After federated learning, each user $i$ has knowledge about his training loss $\ell_{i}$ . If user $i$ chooses not to revoke his data, his expected payoff in Stage III is (updating (12) in Case (a) with the realized training loss $\ell_{i}$ ):

U_{i,a}^{s-3}=r^{L}_{i}-\theta_{i}d_{i}T-\xi_{i}\ell_{i}d_{i}-\mathbb{E}\bigg{[}\theta_{i}d_{i}\lambda\sum_{k\in\mathcal{I}_{u}\backslash\mathcal{I}_{r}}\ell_{k}^{2}\bigg{]}.

(16)

The reason for using expectation here is that users do not know the set of retained users $\mathcal{I}_{r}$ determined in Stage IV. Users’ expected payoffs of Cases (b) and (c) in Stage III follow the same approach (i.e., updating (13) and (14) with the realized training loss $\ell_{i}$ ).

Note that users of the same type may have different training losses and thus different payoffs, so the payoff in Stage III is user-specific instead of type-specific. Moreover, after some users leave, the remaining users’ training losses may change as the global model will be updated. Since users cannot accurately predict their future expected loss even if they know all users’ current losses, we assume that each user still approximates his future expected loss as equal to his current loss.

3.3.4 Server’s Payoff in Stage IV

When some users want to leave the system, it is important for the server to know their contributions to the global model for retaining valuable users.

A fair and effective method to compute a user’s contribution to a coalition is the Shapley value [28]. Wang et al. [29] introduced a related concept called federated Shapley value to evaluate each user’s contribution in a federated learning setting. The federated Shapley value for user $i$ , denoted as $v_{i}$ , is calculated by the server during the federated learning process and is unknown to the users.

Once obtaining users’ contributions (federated Shapley values), the server can calculate its realized cost in Stage IV. This cost is the sum of two factors: the realized accuracy loss, which is estimated by the sum of federated Shapley values of all users who remain in the system, and the realized incentives.

W^{s-4}=\sum_{i\in\mathcal{I}\backslash(\mathcal{I}_{u}\backslash\mathcal{I}_{r})}v_{i}+\gamma\left(\sum_{i\in\mathcal{I}\backslash(\mathcal{I}_{u}\backslash\mathcal{I}_{r})}r^{L}_{i}+\sum_{i\in\mathcal{I}_{r}}r^{U}_{i}\right).

(17)

The first term in (17) represents the model accuracy loss, the second is the learning reward paid to all remaining users for participation in federated learning, and the last term is the total retention incentive. The additivity property of federated Shapley values allows the server to compare all the possible sets of users to retain and find the optimal one. Note that a smaller federated Shapley value is better, as it means a larger contribution to the accuracy of the global model, and the federated Shapley values can be negative.

3.4 Payoffs in the Unlearning-Forbidden Scenario

Similar to Stages I and II in the unlearning-allowed scenario, we now specify the users and the server’s payoffs in the unlearning-forbidden scenario. The difference here is that there are no unlearning considerations (e.g., data revocation or retention incentives). We will use the superscript ^′ for the unlearning-forbidden scenario to differentiate the notations in the two scenarios.

3.4.1 Server’s Payoff in Stage I

The server needs to minimize the sum of the expected accuracy loss and the incentive rewards paid for federated learning:

W=\frac{\varrho}{T}\sum_{j^{\prime}\in\mathcal{J}^{\prime}}\frac{I^{\prime}_{j^{\prime}}}{d^{\prime}_{j^{\prime}}}+\gamma\sum_{j^{\prime}\in\mathcal{J}^{\prime}}I^{\prime}_{j^{\prime}}{{r}_{j^{\prime}}^{L\prime}}.

(18)

3.4.2 Users’ Payoffs in Stage II

A type- $j^{\prime}$ user’s payoff is the difference between the learning reward and training costs (including the learning cost and perceived privacy cost)

U_{j^{\prime}}=r^{L\prime}_{j^{\prime}}-\theta_{j^{\prime}}d^{\prime}_{j^{\prime}}T-\xi^{\prime}_{j^{\prime}}\mathbb{E}[\ell^{\prime}_{j^{\prime}}]d^{\prime}_{j^{\prime}}.

(19)

Note that the marginal perceived privacy cost $\xi^{\prime}_{j^{\prime}}$ here should be no smaller than that in the unlearning-allowed scenario $\xi_{j}$ for the same user, as a user cannot revoke his data once he decides to participate in the federated learning in the unlearning-forbidden scenario.

Next, we will use the standard backward induction to analyze the server and users’ optimal strategies in two unlearning scenarios.

4 Optimal Incentive Mechanism in Unlearning-Allowed Scenario

In this section, we analyze an optimal incentive mechanism for the unlearning-allowed scenario. Based on backward induction, we will derive the optimal strategies from Stage IV to Stage I in Sections 4.1-4.4, respectively.

4.1 Server’s Retention Strategies in Stage IV

Given the server’s contract $\bm{\phi}$ in Stage I, the users’ contract item choices in Stage II, and the users’ revocation decisions $\mathcal{I}_{u}$ in Stage III, the server needs to determine which users to retain $\mathcal{I}_{r}$ and the corresponding retention incentives $\{r^{U}_{i}\}_{i\in\mathcal{I}_{r}}$ in Stage IV.

As we discussed in Section 3.3.4, the server seeks to minimize the cost in (17) in Stage IV, which can be formulated as follows:

Problem 1 (Server’s Optimization Problem in Stage IV).


$\displaystyle\min\;$	$\displaystyle\sum_{i\in\mathcal{I}\backslash(\mathcal{I}_{u}\backslash\mathcal{I}_{r})}v_{i}+\gamma\left(\sum_{i\in\mathcal{I}\backslash(\mathcal{I}_{u}\backslash\mathcal{I}_{r})}r^{L}_{i}+\sum_{i\in\mathcal{I}_{r}}r^{U}_{i}\right)$	(20a)
$\displaystyle\rm{s.t.}\;$	$\displaystyle r^{U}_{i}+r^{L}_{i}-\theta_{i}d_{i}T-\xi_{i}\ell_{i}d_{i}-\theta_{i}d_{i}\lambda\sum_{k\in\mathcal{I}_{u}\backslash\mathcal{I}_{r}}\ell_{k}^{2}$
	$\displaystyle\hskip 105.27519pt\geq-\theta_{i}d_{i}T,\forall i\in\mathcal{I}_{r}$	(20b)
$\displaystyle\rm{var.}\;$	$\displaystyle\mathcal{I}_{r}\subseteq\mathcal{I}_{u},\{r^{U}_{i}\}_{i\in\mathcal{I}_{r}}.$	(20c)

The constraint (20b) is to ensure that the retention incentives are enough to make the target users stay in the system. The left-hand side of the constraint is a user $i$ ’s payoff after accepting the retention incentive (including unlearning reward, learning reward, learning cost, privacy cost, and unlearning cost), and the right-hand side is his payoff of not accepting (i.e., he has to return the learning reward to the server and only has sunk learning cost).

The following proposition presents the solution to Problem 1.

Proposition 3.

The server’s optimal set of users to retain is

\mathcal{I}_{r}^{*}=\arg\min_{\mathcal{I}_{r}\subseteq\mathcal{I}_{u}}\;\sum_{i\in\mathcal{I}_{r}}\left(v_{i}+\gamma\theta_{i}d_{i}\lambda\sum_{k\in\mathcal{I}_{u}\backslash\mathcal{I}_{r}}\ell_{k}^{2}+\gamma\xi_{i}\ell_{i}d_{i}\right),

(21)

and the optimal retention incentives are

{r^{U}_{i}}^{*}=\theta_{i}d_{i}\lambda\sum_{k\in\mathcal{I}_{u}\backslash\mathcal{I}_{r}^{*}}\ell_{k}^{2}+\xi_{i}\ell_{i}d_{i}-r^{L}_{i},\forall i\in\mathcal{I}_{r}^{*}.

(22)

The proof of Proposition 3 is given in Appendix C in the technical report [25]. Proposition 3 highlights a trade-off regarding the retention of users and their training losses. Users who have larger training losses incur higher privacy costs and thus require higher incentives to retain (indicated by $\gamma\xi_{i}\ell_{i}d_{i}$ in (21)). However, retaining such users also helps reduce the unlearning costs since the objective in (21) increases with the aggregated loss of the leaving users. Furthermore, the server has the incentive to retain users who contribute more to the model accuracy, which corresponds to smaller values of $v_{i}$ . Additionally, users with smaller marginal costs $\theta_{i}$ and $\xi_{i}$ are also desirable to reduce unlearning incentives.¹²¹²12Note that in (21), the server may not only include users with a negative value in the brackets, as retaining some users with positive values may reduce the server’s objective through the aggregated losses. This is an integer programming problem with complexity $\mathcal{O}(2^{I_{u}})$ . When the number of leaving users $I_{u}$ is large, the server can reduce the complexity by classifying the leaving users into several categories to retain, each category with similar contributions and costs.

4.2 Users’ Revocation Decisions in Stage III

Considering the server’s optimal retention strategies in Stage IV, each user $i$ decides whether to revoke his data in Stage III given the information announced in Stages I and II.

Based on the server’s optimal retention incentives (22) and the user’s payoffs in Stage III (i.e., the updated (13) and (14) with realized losses), a user $i$ ’s payoff after revoking data is $-\theta_{i}d_{i}T$ , regardless of whether the user is retained by the server or not. Thus, user $i$ ’s expected payoff in Stage III can be rewritten as

\begin{split}&U_{i}^{s-3}(x_{i};x_{-i})=x_{i}\left(-\theta_{i}d_{i}T\right)\\ &+(1-x_{i})\left[r^{L}_{i}-\theta_{i}d_{i}T-\xi_{i}\ell_{i}d_{i}-\theta_{i}d_{i}\lambda\sum_{k\in\mathcal{I}}x_{k}(1-q)\ell_{k}^{2}\right],\end{split}

(23)

where $x_{-i}=\{x_{k}\}_{k\in\mathcal{I}\backslash\{i\}}$ is the revocation decisions of all users except user $i$ and $q=\mathbb{E}[q_{j}]$ is the expected retention rate of all users, as users do not know each other’s type.¹³¹³13Here we use the historical retention rate $q$ to calculate the expected payoffs instead of the retention rate obtained in Stage IV (i.e., ${|\mathcal{I}_{r}^{*}|}/{|\mathcal{I}_{u}|}$ ). This is because users do not know their federated Shapley values and cannot calculate $\mathcal{I}_{r}^{*}$ . If they calculate the expectation $\mathbb{E}[\mathcal{I}_{r}^{*}]$ based on type statistics, according to (21), the result will be user type retention instead of user retention (e.g., retain all type-i users and not retain all type- $j$ users regardless of different data distributions and losses of the same type of users), which is not true. Conversely, historical rates ranging between $[0,1]$ allow for more realistic partial retention of same-type users. Therefore, we assume that the users have a belief at this stage in the retention rate which is the same as the historical rate. In the following analysis in Stages I and II, we will also use the historical rates for calculating the expected cost/payoffs for similar reasons. As shown in (23), each user’s payoff depends on the other users’ revocation decisions, so users engage in a non-cooperative game in Stage III.

We formally define users’ non-cooperative sub-game as follows.

Sub-Game 1 (Users’ Revocation Sub-Game in Stage III).

•

Players: all users in set $\mathcal{I}$ .
•

Strategy space: each user $i\in\mathcal{I}$ decides whether to revoke his data, i.e., $x_{i}\in\{0,1\}$ (0: not revoke, 1: revoke).
•

Payoff function: each user $i\in\mathcal{I}$ maximizes his payoff in (23).

The following proposition characterizes the Nash equilibrium (NE) of Sub-Game 1:

Proposition 4.

Sub-Game 1 is a supermodular game, where pure NE exists but may not be unique. Algorithm 1 converges to one NE.

Input :

\{r^{L}_{i},\xi_{i},\ell_{i},d_{i},\theta_{i}\}_{i\in\mathcal{I}},\lambda,q

Output : Optimal revocation decisions

\{x_{i}^{*}\}_{i\in\mathcal{I}}

2Initialize

x_{i}^{*}\leftarrow 0,i\in\mathcal{I}

;

3while $\exists x_{i}^{*}=0\;\&\;r^{L}_{i}-\xi_{i}\ell_{i}d_{i}-\theta_{i}d_{i}\lambda(1-q)\sum_{k\in\mathcal{I}\backslash\{i\}}x_{k}\ell_{k}^{2}<0$ do

x_{i}^{*}\leftarrow 1,\forall i\;\text{satisfying conditions in line 2};

5 end while

Algorithm 1 Users’ optimal revocation decisions

The proof of Proposition 4 is given in Appendix D in the technical report [25]. Based on Algorithm 1, we can find the set of users who revoke data in one NE, i.e., $\mathcal{I}_{u}^{*}=\{i:x_{i}^{*}=1,i\in\mathcal{I}\}$ . Basically, Algorithm 1 corresponds to doing best response updates of the users starting from all users choosing not to revoke (i.e., 0). It is well known that for supermodular games, these updates will converge monotonically to a NE. Algorithm 1 will terminate within $I$ iterations.¹⁴¹⁴14We can also initialize all the users’ decisions as 1 and check whether there exists a user who wants to change his action from 1 to 0 for payoff improvement. If the equilibrium is the same as that found by Algorithm 1, it is the unique NE, as Game 1 is a supermodular game. The resulting equilibrium strategies and insights will be illustrated through simulation in Section 6.2.2.

4.3 Users’ Contract Item Choices in Stage II

Based on the analysis in Stages III and IV, a type- $j$ user’s expected payoff in Stage II (15) can be rewritten as:

\begin{split}U_{j}^{s-2}=(1-p_{j})r_{j}^{L}-\kappa_{j}d_{j},\end{split}

(24)

where

\begin{split}\kappa_{j}&\triangleq(1-p_{j})\xi_{j}\mathbb{E}[\ell_{j}]+\theta_{j}T\\ &+\theta_{j}(1-p_{j})\lambda\sum_{m\in\mathcal{J}}I_{m}p_{m}(1-q_{m})\left(\mathbb{E}[\ell_{m}]^{2}+D(\ell_{m})\right),\end{split}

(25)

and $D(\ell_{m})$ is the variance of type- $m$ users’ training losses.

Each type- $j$ user in Stage II will choose a contract item that gives him a maximum non-negative expected payoff, leading to the constraints that the server needs to consider in Stage I.

4.4 Server’s Contract in Stage I

In Stage I, the server designs a contract to minimize its expected cost, considering the results in Stages II-IV.

When designing the contract, the server needs to ensure that each user achieves a non-negative payoff, so that the user will accept the corresponding contract item. Moreover, since the server does not know each user’s type in Stage I, the server also needs to make a user choose the contract item intended for him (i.e., the user does not misreport his type).¹⁵¹⁵15Revelation principle demonstrates that if a social choice function can be implemented by an arbitrary mechanism, then the same function can be implemented by an incentive-compatible-direct-mechanism (i.e. in which users truthfully report types) with the same equilibrium outcome. Thus, requiring IC will simplify the mechanism design without affecting optimality. In other words, a contract is feasible if and only if it satisfies Individual Rationality (IR) and Incentive Compatibility (IC) constraints:

Definition 1 (Individual Rationality).

A contract is individually rational if each type- $j$ user receives a non-negative payoff by accepting the contract item $\phi_{j}=\left(d_{j},r^{L}_{j}\right)$ intended for his type, i.e.,

(1-p_{j})r^{L}_{j}-\kappa_{j}d_{j}\geq 0,\forall j\in\mathcal{J}.

(26)

Definition 2 (Incentive Compatibility).

A contract is incentive compatible if each type- $j$ user maximizes his own payoff by choosing the contract item $\phi_{j}=\left(d_{j},r^{L}_{j}\right)$ intended for his type, i.e.,

(1-p_{j})r^{L}_{j}-\kappa_{j}d_{j}\geq(1-p_{j})r^{L}_{m}-\kappa_{j}d_{m},\forall j,m\in\mathcal{J}.

(27)

Considering the constraints in Definitions 1 and 2, the server in Stage I seeks to design the contract $\bm{\phi}=\{(d_{j},r^{L}_{j})\}_{j\in\mathcal{J}}$ to minimize its expected cost in (11), which is rewritten as follows after combining the results in Stages II-IV:

Problem 2.

\begin{split}\min\;&\sum_{j\in\mathcal{J}}\left(\frac{\varrho I_{j}(1-p_{j}+p_{j}q_{j})}{Td_{j}}+\gamma I_{j}(1-p_{j})r^{L}_{j}\right.\\ &\hskip 79.66771pt+\gamma I_{j}p_{j}q_{j}(\alpha\theta_{j}+\xi_{j}\mathbb{E}[\ell_{j}])d_{j}\bigg{)},\\ \rm{s.t.}\;&(1-p_{j})r^{L}_{j}-\kappa_{j}d_{j}\geq 0,\forall j\in\mathcal{J},\\ &(1-p_{j})r^{L}_{j}-\kappa_{j}d_{j}\geq(1-p_{j})r^{L}_{m}-\kappa_{j}d_{m},\forall j,m\in\mathcal{J},\\ \rm{var.}\;&\left\{\left(d_{j},r^{L}_{j}\right)\right\}_{j\in\mathcal{J}},\end{split}

(28)

where

\alpha\triangleq\lambda\sum_{j\in\mathcal{J}}I_{j}p_{j}(1-q_{j})\left(\mathbb{E}[\ell_{j}]^{2}+D(\ell_{j})\right).

(29)

Solving Problem 2 involves two challenges. First, users’ multi-dimensional heterogeneity leads to a challenging multi-dimensional contract design for the server. We will simplify the analysis by summarizing users’ multi-dimensional heterogeneity into several one-dimensional metrics, to guide the server’s design of the optimal rewards and data sizes in the contract. Second, as the total number of IR and IC constraints is large (i.e., $J^{2}$ ), it is challenging to obtain the optimal contract directly. To overcome such a complexity issue, we will first transform the constraints into a smaller number of equivalent ones (Lemma 1). Then, for any given data size $\bm{d}=\{d_{j}\}_{j\in\mathcal{J}}$ , we derive the server’s optimal reward $\{r^{L*}_{j}(\bm{d})\}_{j\in\mathcal{J}}$ (Lemma 2) in Section 4.4.1. Finally, we derive the optimal data size $\bm{d}^{*}$ (Proposition 5 and Theorem 1) in Section 4.4.2.

4.4.1 Optimal Rewards in Contract

Without loss of generality, we assume that users are indexed in ascending order of

\begin{split}\pi_{j}\triangleq\frac{\kappa_{j}}{1-p_{j}}&=\xi_{j}\mathbb{E}[\ell_{j}]+\frac{\theta_{j}T}{1-p_{j}}\\ &+\theta_{j}\lambda\sum_{m\in\mathcal{J}}I_{m}p_{m}(1-q_{m})\left(\mathbb{E}[\ell_{m}]^{2}+D(\ell_{m})\right),\end{split}

which can be regarded as a type- $j$ user’s aggregated marginal cost. That is,

\pi_{1}\leq\pi_{2}\leq...\leq\pi_{J}.

(30)

In the following Lemma 1, we present an equivalent version of the IR and IC constraints to simplify Problem 2.

Lemma 1.

A contract $\bm{\phi}=\{(d_{j},r^{L}_{j})\}_{j\in\mathcal{J}}$ is feasible (i.e., satisfies IR and IC constraints) if and only if the contract items satisfy the following three constraints:

a)

$r^{L}_{J}-\pi_{J}d_{J}\geq 0$ ;
b)

$r^{L}_{1}\geq...\geq r^{L}_{J}\geq 0$ and $d_{1}\geq...\geq d_{J}\geq 0$ ;
c)

$r^{L}_{j+1}+\pi_{j}(d_{j}-d_{j+1})\leq r^{L}_{j}\leq r^{L}_{j+1}+\pi_{j+1}(d_{j}-d_{j+1})$ , $j\in\mathcal{J}$ .

The proof of Lemma 1 is given in Appendix E in the technical report [25]. Constraint $(a)$ ensures that each user can get a non-negative payoff by accepting the contract item of type- $J$ users, corresponding to the IR constraints. Both constraints $(b)$ and $(c)$ are related to IC constraints. Constraint $(b)$ shows that the server should request more data from a user type with a lower marginal cost $\pi$ and provide a larger reward in return. Constraint $(c)$ characterizes the relationship between any two neighbor contract items.

Based on Lemma 1, the following Lemma 2 characterizes the server’s optimal learning rewards for any feasible data size:

Lemma 2.

For any given data size $\bm{d}=\{d_{j}\}_{j\in\mathcal{J}}$ (even if it is not optimal), the unique optimal reward for a type $j$ user is:

\begin{split}&r^{L*}_{j}(\bm{d})=\\ &\left\{\begin{array}[]{rcl}&\pi_{j}d_{j},&\rm{if}$$j=J;$$\\ &\pi_{j}d_{j}+\sum_{m={j+1}}^{J}(\pi_{m}-\pi_{m-1})d_{m},&\rm{if}$$j=1,...,J-1.$$\\ \end{array}\right.\end{split}

(31)

The proof of Lemma 2 is given in Appendix F in the technical report [25]. Lemma 2 indicates that all user types except the boundary type $J$ will obtain positive expected payoffs (type- $J$ users receive zero expected payoff), which can be interpreted as the information rent in economics due to information asymmetry.

4.4.2 Optimal Data Sizes in Contract

Based on Lemma 2, we can significantly simplify Problem 2 but still need to derive the optimal values of $J$ variables $\{d_{j}\}_{j\in\mathcal{J}}$ under $J$ constraints $d_{1}\geq...\geq d_{J}\geq 0$ .

For the convenience of presentation, we define

A_{j}\triangleq\frac{\varrho I_{j}(1-p_{j}+p_{j}q_{j})}{T},

(32)

\begin{split}B_{j}\triangleq&\gamma I_{j}\left(p_{j}q_{j}\left(\alpha\theta_{j}+\xi_{j}\mathbb{E}[\ell_{j}]\right)+\left(1-p_{j}\right)\pi_{j}\right)\\ &+\sum_{m=1}^{j-1}\gamma I_{m}(1-p_{m})(\pi_{j}-\pi_{j-1}).\end{split}

(33)

Based on these two metrics, we first present two special cases of the optimal data sizes, which we call all-independent and all-dependent.

Proposition 5.

Two special cases of the optimal data sizes follow:

•

All-independent. If

$\frac{A_{1}}{B_{1}}\geq\frac{A_{2}}{B_{2}}\geq...\geq\frac{A_{J}}{B_{J}},$ (34)

then the optimal data sizes in the contract are

$d_{j}^{*}=\sqrt{\frac{A_{j}}{B_{j}}},j\in\mathcal{J}.$ (35)

•

All-dependent. If

\frac{\sum_{m\in\mathcal{J}}A_{m}}{\sum_{m\in\mathcal{J}}B_{m}}>\frac{\sum_{m=1}^{j}A_{m}}{\sum_{m=1}^{j}B_{m}},\forall j=1,2,...,J-1,

(36)

then the optimal data sizes in the contract are

d_{j}^{*}=\sqrt{\frac{\sum_{m\in\mathcal{J}}A_{m}}{\sum_{m\in\mathcal{J}}B_{m}}},j\in\mathcal{J}.

(37)

The proof of Proposition 5 is given in Appendix G in the technical report [25]. The all-independent case means that if $\{A_{j}/B_{j}\}_{j\in\mathcal{J}}$ follow a descending order, then the optimal data size for each type- $j$ user only depends on his own parameters $(A_{j},B_{j})$ . The condition for the all-dependent case means that for any type $j$ , there always exists at least one type $m>j$ with $A_{m}/B_{m}$ larger than $A_{j}/B_{j}$ (i.e., not in descending order). In this case, each type’s optimal data size depends on all types’ parameters $\{(A_{j},B_{j})\}_{j\in\mathcal{J}}$ .

Next, we give an efficient algorithm to compute the optimal data sizes in any possible case based on the insights in Proposition 5.

Theorem 1.

For a fixed $J$ , there are $2^{J-1}$ possible cases of the optimal data sizes depending on the values of $\{(A_{j},B_{j})\}_{j\in\mathcal{J}}$ . For any given $\{(A_{j},B_{j})\}_{j\in\mathcal{J}}$ , the unique optimal data sizes can be calculated by Algorithm 2.

Input : Parameters

\{(A_{j},B_{j})\}_{j\in\mathcal{J}}

indexed based on (30)

Output : Optimal data sizes

\{(d_{j}^{*})\}_{j\in\mathcal{J}}

2Initialize

d_{j}^{*}\leftarrow\sqrt{\frac{A_{j}}{B_{j}}},j\in\mathcal{J}

;

3Find all non-descending types

\{j:\exists m>j,\frac{A_{m}}{B_{m}}>\frac{A_{j}}{B_{j}}\text{ or }\exists m<j,\frac{A_{m}}{B_{m}}<\frac{A_{j}}{B_{j}}\}

;

4Put each group of non-descending types that have adjacent indexes into one auxiliary set

\mathcal{J}_{x}

;

X\leftarrow

the number of these auxiliary sets;¹⁶¹⁶1616// i.e.,

x\in\{1,2,...,X\}

6 for $x=1$ ; $x\leq X$ ; $x++$ do

check( $\mathcal{J}_{x}$ );¹⁷¹⁷1717// divide each auxiliary set

\mathcal{J}_{x}

into subsets

\{\mathcal{J}_{x}^{y}\}

that satisfy (36)

9 end for

11 Function check( $\mathcal{J}$ ):

13 Reindex the types in

\mathcal{J}

with

1_{\mathcal{J}},2_{\mathcal{J}},...,J_{\mathcal{J}}

14 if $|\mathcal{J}|\neq 1$ then

15 flag

\leftarrow

16 for $m=1_{\mathcal{J}}$ to $(J-1)_{\mathcal{J}}$ do

if $\frac{\sum_{j\in\mathcal{J}}A_{j}}{\sum_{j\in\mathcal{J}}B_{j}}\leq\frac{\sum_{j=1_{\mathcal{J}}}^{m}A_{j}}{\sum_{j=1_{\mathcal{J}}}^{m}B_{j}}$ // $\mathcal{J}$ does not satisfy (36)

17 then

19 flag

\leftarrow

d_{j}^{*}=\sqrt{\frac{\sum_{n=1_{\mathcal{J}}}^{m}A_{n}}{\sum_{n=1_{\mathcal{J}}}^{m}B_{n}}},j\in\{1_{\mathcal{J}},...,m\}

;

check(\{m+1,...,J_{\mathcal{J}}\})

;

22 break;

23 end if

25 end for

27 if flag=1 then

d_{j}^{*}=\sqrt{\frac{\sum_{m\in\mathcal{J}}A_{m}}{\sum_{m\in\mathcal{J}}B_{m}}},j\in\mathcal{J}

;//

\mathcal{J}

satisfies (36)

29 end if

31 end if

Algorithm 2 Optimal data sizes in contract

¹⁶¹⁶footnotetext: For example, if

\frac{A_{1}}{B_{1}}\geq\frac{A_{4}}{B_{4}}\geq\frac{A_{3}}{B_{3}}\geq\frac{A_{2}}{B_{2}}\geq\frac{A_{5}}{B_{5}}\geq\frac{A_{8}}{B_{8}}\geq\frac{A_{6}}{B_{6}}\geq\frac{A_{7}}{B_{7}}

, then

X=2

\mathcal{J}_{1}=\{2,3,4\}

, and

\mathcal{J}_{2}=\{6,7,8\}

.¹⁷¹⁷footnotetext: In the example of footnote 16. If

\frac{A_{6}}{B_{6}}\geq\frac{A_{7}+A_{8}}{B_{7}+B_{8}}

, then

\mathcal{J}_{2}

can be divided into two subsets

\{6\}

and

\{7,8\}

. The optimal data sizes in this example are

d_{j}^{*}=\sqrt{\frac{A_{j}}{B_{j}}},j=1,5,6

d_{j}^{*}=\sqrt{\frac{A_{2}+A_{3}+A_{4}}{B_{2}+B_{3}+B_{4}}},j=2,3,4

, and

d_{j}^{*}=\sqrt{\frac{A_{7}+A_{8}}{B_{7}+B_{8}}},j=7,8

The proof of Theorem 1 is given in Appendix H in the technical report [25]. The computation complexity of Algorithm 2 is $\mathcal{O}(\sum_{x=1}^{X}J_{x})$ , which is no larger than $\mathcal{O}(J)$ . We can interpret Algorithm 2 as greedily merging non-descending types based on ${A_{j}}/{B_{j}}$ , so that all merged types have ${\sum_{j}A_{j}}/{\sum_{j}B_{j}}$ in a descending order.¹⁸¹⁸18In the example of footnotes 16 and 17, $\frac{A_{1}}{B_{1}}\geq\frac{A_{2}+A_{3}+A_{4}}{B_{2}+B_{3}+B_{4}}\geq\frac{A_{5}}{B_{5}}\geq\frac{A_{6}}{B_{6}}\geq\frac{A_{7}+A_{8}}{B_{7}+B_{8}}$ . The optimal data sizes of the merged types are the same and follow the dependent form (37) in Proposition 5, while the optimal data sizes of the not-merged types follow the independent form (35), as illustrated in footnote 17.

5 Optimal Incentive Mechanism in Unlearning-Forbidden Scenario

In this section, we first derive the server’s optimal incentive mechanism in the unlearning-forbidden scenario in Section 5.1, then we compare the unlearning-allowed and unlearning-forbidden scenarios based on the server’s expected cost and users’ expected payoffs in Section 5.2.

5.1 Server’s Optimal Contact Design

Similar to the server’s contract design in the unlearning-allowed scenario (i.e., Problem 2), the server designs the contract to minimize its expected cost in (18) under the IR and IC constraints in the unlearning-forbidden scenario:

Problem 3.

\begin{split}\min\;&\sum_{j^{\prime}\in\mathcal{J}^{\prime}}\left(\frac{\varrho I^{\prime}_{j^{\prime}}}{Td^{\prime}_{j^{\prime}}}+\gamma I^{\prime}_{j^{\prime}}r^{L\prime}_{j^{\prime}}\right),\\ \rm{s.t.}\;&r^{L\prime}_{j^{\prime}}-\Pi_{j^{\prime}}d^{\prime}_{j^{\prime}}\geq 0,\forall j^{\prime}\in\mathcal{J}^{\prime},\text{ (IR)}\\ &r^{L\prime}_{j^{\prime}}-\Pi_{j^{\prime}}d^{\prime}_{j^{\prime}}\geq r^{L\prime}_{m}-\Pi_{j^{\prime}}d^{\prime}_{m},\forall j^{\prime},m\in\mathcal{J}^{\prime},\text{ (IC)}\\ \rm{var.}\;&\left\{\left(d^{\prime}_{j^{\prime}},r^{L\prime}_{j^{\prime}}\right)\right\}_{j^{\prime}\in\mathcal{J}^{\prime}},\end{split}

(38)

where

\Pi_{j^{\prime}}\triangleq\theta_{j^{\prime}}T+\xi^{\prime}_{j^{\prime}}\mathbb{E}[\ell^{\prime}_{j^{\prime}}].

(39)

For the convenience of presentation, we re-indexed users with $j^{\prime}$ in an ascending order of $\Pi$ , i.e.,

\Pi_{1^{\prime}}\leq\Pi_{2^{\prime}}\leq...\leq\Pi_{J^{\prime}},

(40)

and define

A^{\prime}_{j^{\prime}}\triangleq\frac{\varrho I^{\prime}_{j^{\prime}}}{T},

(41)

B^{\prime}_{j^{\prime}}\triangleq\gamma\left(\Pi_{j^{\prime}}\sum_{m=1^{\prime}}^{j^{\prime}}I^{\prime}_{m}-\Pi_{j-1^{\prime}}\sum_{m=1^{\prime}}^{j-1^{\prime}}I^{\prime}_{m}\right).

(42)

After a similar analysis to Section 4.4, we obtain the following theorem about the server’s optimal contract in the unlearning-forbidden scenario.

Theorem 2.

The optimal data sizes $\bm{d}^{\prime*}$ can be obtained from Theorem 1 by substituting $\{(A_{j},B_{j})\}_{j\in\mathcal{J}}$ with $\{(A^{\prime}_{j^{\prime}},B^{\prime}_{j^{\prime}})\}_{j^{\prime}\in\mathcal{J}^{\prime}}$ . The optimal rewards are

\begin{split}&r^{L\prime*}_{j^{\prime}}(\bm{d}^{\prime*})=\\ &\left\{\begin{array}[]{rcl}&\Pi_{j^{\prime}}d^{\prime*}_{j^{\prime}},&\rm{if}\;$$j^{\prime}=J^{\prime};$$\\ &\Pi_{j^{\prime}}d^{\prime*}_{j^{\prime}}+\sum_{m={j+1^{\prime}}}^{J^{\prime}}(\Pi_{m}-\Pi_{m-1})d^{\prime*}_{m},&\rm{if}\;$$j^{\prime}=1^{\prime},...,J-1^{\prime}.$$\\ \end{array}\right.\end{split}

(43)

The proof of Theorem 2 is given in Appendix I in the technical report [25].

5.2 Comparison

In this subsection, we compare the unlearning-allowed and unlearning-forbidden scenarios, to reveal the economic impact of federated unlearning.

Suppose that a type- $j$ user in the unlearning-allowed scenario corresponds to type $j^{\prime}$ in the unlearning-forbidden scenario. For the convenience of presentation, we first introduce the following definitions:

\begin{split}\Delta U_{j}=&\sum_{m={j+1}}^{J}(1-p_{j})(\pi_{m}-\pi_{m-1})\sqrt{\frac{\sum_{m\in\mathcal{J}_{m}}A_{m}}{\sum_{m\in\mathcal{J}_{m}}B_{m}}}\\ &-\sum_{m={j+1^{\prime}}}^{J^{\prime}}(\Pi_{m}-\Pi_{m-1})\sqrt{\frac{\sum_{m\in\mathcal{J}_{m}^{\prime}}A_{m}^{\prime}}{\sum_{m\in\mathcal{J}_{m}^{\prime}}B_{m}^{\prime}}},\end{split}

(44)

\begin{split}\Delta W=&\sum_{j=1}^{J}\frac{2}{|\mathcal{J}_{j}|}\sqrt{\left(\sum_{m\in\mathcal{J}_{j}}A_{m}\right)\left(\sum_{m\in\mathcal{J}_{j}}B_{m}\right)}\\ &-\sum_{j^{\prime}=1^{\prime}}^{J^{\prime}}\frac{2}{|\mathcal{J}_{j^{\prime}}^{\prime}|}\sqrt{\left(\sum_{m\in\mathcal{J}_{j^{\prime}}^{\prime}}A_{m}^{\prime}\right)\left(\sum_{m\in\mathcal{J}_{j^{\prime}}^{\prime}}B_{m}^{\prime}\right)}.\end{split}

(45)

Given the definitions, we present the comparison results in Proposition 6.

Proposition 6.

If $\Delta U_{j}>0$ , then the unlearning-allowed scenario makes a type- $j$ user have a larger payoff (i.e., more beneficial) than the unlearning-forbidden scenario. If $\Delta W<0$ , then the unlearning-allowed scenario makes the server have a smaller cost (i.e., more beneficial) than the unlearning-forbidden scenario.

The proof of Proposition 6 is given in Appendix J in the technical report [25]. We can obtain some insights from qualitative analysis:

•

If users’ perceived privacy costs (e.g., $\{\xi^{\prime}_{j^{\prime}}\}_{j^{\prime}\in\mathcal{J}^{\prime}}$ ) in the unlearning-forbidden scenario are very large but unlearning costs (e.g., $\lambda$ and $D(\ell)$ ) are relatively small, then we will have $\Delta U_{j}<0$ and $\Delta W<0$ , i.e., the unlearning-allowed scenario is worse for users but more beneficial to the server.
•

If the unlearning costs are very large but users’ perceived privacy costs in the unlearning-forbidden scenario are relatively small, then $\Delta U_{j}>0$ and $\Delta W>0$ , i.e., the unlearning-allowed scenario is better for users but worse for the server.

It is counter-intuitive that users prefer the scenario where they have large costs. We will present detailed illustrations about the preferences of users and the server through simulations in Section 6.2.1.

6 Simulations

In this section, we use simulations to evaluate the performance of our proposed mechanism. Specifically, in Section 6.1, we specify our experiment setting. In Section 6.2, we validate the optimal strategies of the users and the server in unlearning-allowed and unlearning-forbidden scenarios, and we also compare our mechanism with state-of-the-art benchmarks.

6.1 Experiment Setting

We consider $J=5$ types of users with marginal training costs $\bm{\theta}=[1,4,6,9,10]$ , marginal perceived privacy costs in the unlearning-allowed scenario $\bm{\xi}=[0.8,1.7,1.4,$ $2.2,1.2]\times 10^{3}$ ,¹⁹¹⁹19Different orders of magnitude are to balance different units of users’ training costs and privacy costs. and marginal perceived privacy costs in the unlearning-forbidden scenario $\bm{\xi^{\prime}}=multiplier\cdot\bm{\xi}$ , where $multiplier=8$ . Each type has $I_{j}=I/J=1000$ users. Heterogeneous users’ training losses follow a truncated normal distribution $N(0.5,0.2)$ over the support $[0,1]$ , and users’ federated Shapley values follow a normal distribution $N(5\times 10^{-5},0.04)$ .²⁰²⁰20In future work, we will use real-world datasets to calculate users’ true training losses and federated Shapley values. The simulation data here can also demonstrate our results. As in Appendix K in the technical report [25], we further validate that if we change the simulation setting, we will obtain similar experiment results and insights. Users perform $T=100$ rounds of federated learning, and the unlearning rounds coefficient $\lambda=4$ . The server’s accuracy loss coefficient $\varrho=1$ and its weight on the incentives $\gamma=10^{-10}$ (to balance different units of incentives and model accuracy loss).

We perform experiments to find the appropriate values of historical revocation rate $p$ and retention rate $q$ . As shown in Fig. 3 and Fig. 3, when we set different values of $p$ and $q$ , both the realized revocation rate $|\mathcal{I}_{u}^{*}|/I$ and retention rate $|\mathcal{I}_{r}^{*}|/|\mathcal{I}_{u}^{*}|$ at the equilibrium have a stationary point, i.e., $(2.8\times 10^{-3},2.8\times 10^{-3})$ in Fig. 3 and $(0.5,0.5)$ in Fig. 3, respectively. Therefore, we take the historical revocation rate $p=0.28\%$ and the historical retention rate $q=50\%$ in the following simulations.

6.2 Experiment Results

In Section 6.2.1, we compare the server’s expected costs and users’ expected payoffs under the optimal contracts in the unlearning-allowed and unlearning-forbidden scenarios. Then, we show users’ optimal equilibrium revocation decisions and the server’s optimal retention decision in Section 6.2.2. Finally, in Section 6.2.3, we present the comparison results between our mechanism and two benchmarks.

6.2.1 Users’ Expected Payoffs and Server’s Expected Cost Comparison

In the following, we show the expected payoff/cost comparison considering three aspects of impact: privacy cost in the unlearning-forbidden scenario, unlearning cost, and training cost.

(i) Impact of marginal perceived privacy cost in the unlearning-forbidden scenario $\bm{\xi^{\prime}}$ :

Fig. 6 shows different types of users’ payoff differences in unlearning-allowed and unlearning-forbidden scenarios, which indicate their preferences in two scenarios. Different types of users may have different preferences, which are closely related to their type ranking in two scenarios.

Specifically, as shown in Fig. 6, a negative type ranking difference is more likely to lead to a positive payoff difference, i.e., a high ranking corresponds to a high payoff. This is consistent with the server’s optimal rewards in Lemma 2 and Theorem 2.

Fig. 6 shows users’ total expected payoff difference and the server’s expected cost difference. When the perceived privacy cost in the unlearning-forbidden scenario $\bm{\xi^{\prime}}$ increases, it is more likely that the unlearning-forbidden scenario is more beneficial to users (i.e., negative payoff difference) but worse for the server (i.e., negative cost difference). The server’s preference is straightforward, as a larger $\bm{\xi^{\prime}}$ means larger incentive costs for the server. However, it is counter-intuitive that users prefer the scenario where they have larger costs, as we may naturally presume that large costs will discourage users’ participation. This is because the server will set the rewards larger than users’ costs due to information asymmetry, and the gap (i.e., users’ total payoff) increases in users’ costs (as indicated in Lemma 2 and Theorem 2).

(ii) Impact of unlearning cost:

Increasing unlearning rounds coefficient $\lambda$ or users’ training loss variance $D(\ell)$ will both increase the unlearning cost, so we only simulate the impact of $\lambda$ here. As shown in Fig. 8, when we increase the unlearning cost, it is more likely that the unlearning-allowed scenario is worse for the server but better for users. This is because larger unlearning costs mean larger incentive costs for the server but more rewards for users in the unlearning-allowed scenario.

Moreover, as shown in Fig. 8, the server and users’ preferences are not always the same (i.e., one positive and the other negative) or different (i.e., the same sign). However, in most cases, they have different preferences.

(iii) Impact of marginal training cost $\bm{\theta}$ :

We increase the value of the marginal training cost $\bm{\theta}$ by multiplying by a multiplier. As shown in Fig. 8, the insights are similar to that of unlearning cost. The training cost $\bm{\theta}$ affects both learning cost and unlearning cost. As both scenarios have learning costs, increasing $\bm{\theta}$ is similar to the effect of increasing the unlearning cost.

6.2.2 Users’ Revocation Decisions and Server’s Retention Decision

As shown in Fig. 10, we rank each type of users in ascending order of their training losses for the convenience of presenting insights.

Fig. 10 shows that at the equilibrium, users with larger aggregated marginal costs $\pi$ (i.e., type 5) and training losses $\ell$ (i.e., user 986-1000) are more likely to revoke their data. This is because (i) users with larger costs receive smaller learning incentives from the server in the contract (Lemma 2); (ii) they do not know their high training losses before federated learning and their realized privacy costs (training losses) significantly exceed their expectations.

Fig. 13 illustrates the server’s optimal retention decision. We rank the users who want to revoke their data in ascending order of their federated Shapley values $\{v_{i}\}_{i\in\mathcal{I}_{u}}$ . Users with smaller federated Shapley values are more likely to be retained by the server, as smaller Shapley values represent larger contributions to the global model accuracy. Users with smaller training losses have lower privacy costs and may require fewer incentives from the server, compared to users with larger losses. However, Fig. 13 shows that the server does not necessarily retain users with smaller training losses. This is because given a fixed set of users, reducing the total training losses of retained users means increasing the total losses of leaving users, resulting in higher unlearning costs (Proposition 3).

6.2.3 Comparison with Benchmarks

We compare our incentive mechanism with two benchmarks to evaluate the performance.

•

No Retention Incentive (NRI): the server does not retain users who want to revoke their data.
•

Limited Look Ahead (LLA) (adapted from [20]): the server first optimizes the incentive mechanism for federated learning without considering the unlearning part, and then designs the retention incentive in unlearning (i.e., separate optimization).
•

Our proposed incentive mechanism (RAR): the server is Rational in jointly optimizing both federated learning and unlearning And designs Retention incentive to retain valuable leaving users.

Fig. 13 shows the server’s costs in the three mechanisms under different numbers of users. Our proposed RAR reduces the server’s cost by around 53.91% (black dotted line) compared with LLA. The reduced cost of RAR compared with NRI can reach 11.59% (black dashed line) and will increase in the number of users, as the server retains more valuable users when the number of users increases. Therefore, it is beneficial for the server to retain valuable leaving users and make joint optimization of federated learning and unlearning incentive mechanisms. As the objective of our incentive mechanism design is to minimize the server’s cost, the server’s cost reduction is at the expense of users’ payoffs (as shown in Fig. 13).

7 Conclusion

To the best of our knowledge, this paper is the first study to focus on the important issue of incentive design for federated learning and unlearning. We derive theoretical bounds on the global model optimality gap and the number of communication rounds of natural federated unlearning, based on Scaffold and FedAvg algorithms. Our approach tackles a challenging problem in incentive design, by summarizing users’ multi-dimensional heterogeneity into one-dimensional metrics and developing an efficient algorithm for an exponentially large number of possible cases. We compare the unlearning-forbidden and unlearning-allowed scenarios in terms of users’ payoffs and the server’s cost. Counter-intuitively, users usually prefer the scenario where they have larger costs. This is because the server will give them even higher incentives than their costs due to information asymmetry. We also identify what types of users will leave the system or be retained by the server. The experiments demonstrate the superior performance of our proposed incentive mechanism and the benefits of unlearning incentives for retaining leaving users. We will design incentive mechanisms for federated unlearning to maximize social welfare in future work.

References

[1] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Artificial Intelligence and Statistics, 2017.
[2] S. Truex, N. Baracaldo, A. Anwar, T. Steinke, H. Ludwig, R. Zhang, and Y. Zhou, “A hybrid approach to privacy-preserving federated learning,” in Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, 2019.
[3] V. Mothukuri, R. M. Parizi, S. Pouriyeh, Y. Huang, A. Dehghantanha, and G. Srivastava, “A survey on security and privacy of federated learning,” Future Generation Computer Systems, vol. 115, pp. 619–640, 2021.
[4] P. Voigt and A. Von dem Bussche, “The eu general data protection regulation (gdpr),” A Practical Guide, 1st Ed., Cham: Springer International Publishing, vol. 10, no. 3152676, pp. 10–5555, 2017.
[5] E. L. Harding, J. J. Vanto, R. Clark, L. Hannah Ji, and S. C. Ainsworth, “Understanding the scope and impact of the california consumer privacy act of 2018,” Journal of Data Protection & Privacy, vol. 2, no. 3, pp. 234–253, 2019.
[6] Y. Jiang, J. Konečnỳ, K. Rush, and S. Kannan, “Improving federated learning personalization via model agnostic meta learning,” arXiv preprint arXiv:1909.12488, 2019.
[7] X. Gao, X. Ma, J. Wang, Y. Sun, B. Li, S. Ji, P. Cheng, and J. Chen, “Verifi: Towards verifiable federated unlearning,” arXiv preprint arXiv:2205.12709, 2022.
[8] Y. Liu, Z. Ma, X. Liu, and J. Ma, “Learn to forget: User-level memorization elimination in federated learning,” arXiv preprint arXiv:2003.10933, 2020.
[9] L. Bourtoule, V. Chandrasekaran, C. A. Choquette-Choo, H. Jia, A. Travers, B. Zhang, D. Lie, and N. Papernot, “Machine unlearning,” in IEEE Symposium on Security and Privacy (SP), 2021.
[10] G. Liu, X. Ma, Y. Yang, C. Wang, and J. Liu, “Federaser: Enabling efficient client-level data removal from federated learning models,” in 2021 IEEE/ACM 29th International Symposium on Quality of Service, 2021.
[11] C. Wu, S. Zhu, and P. Mitra, “Federated unlearning with knowledge distillation,” arXiv preprint arXiv:2201.09441, 2022.
[12] S. P. Karimireddy, S. Kale, M. Mohri, S. Reddi, S. Stich, and A. T. Suresh, “Scaffold: Stochastic controlled averaging for federated learning,” in International Conference on Machine Learning, 2020.
[13] Y. Cao and J. Yang, “Towards making systems forget with machine unlearning,” in IEEE Symposium on Security and Privacy, 2015.
[14] A. Ginart, M. Guan, G. Valiant, and J. Y. Zou, “Making ai forget you: Data deletion in machine learning,” Advances in neural information processing systems, vol. 32, 2019.
[15] H. Xie and J. C. Lui, “Modeling crowdsourcing systems: Design and analysis of incentive mechanism and rating system,” ACM SIGMETRICS Performance Evaluation Review, vol. 42, no. 2, pp. 52–54, 2014.
[16] N. Zhao, Y.-C. Liang, and Y. Pei, “Dynamic contract incentive mechanism for cooperative wireless networks,” IEEE transactions on vehicular technology, vol. 67, no. 11, pp. 10 970–10 982, 2018.
[17] W. Wang, L. Ying, and J. Zhang, “The value of privacy: Strategic data subjects, incentive mechanisms and fundamental limits,” in ACM International Conference on Measurement and Modeling of Computer Science, 2016.
[18] J. Wang, H. Zhong, J. Qin, W. Tang, R. Rajagopal, Q. Xia, and C. Kang, “Incentive mechanism for sharing distributed energy resources,” Journal of Modern Power Systems and Clean Energy, vol. 7, no. 4, pp. 837–850, 2019.
[19] Y. Zhan, P. Li, Z. Qu, D. Zeng, and S. Guo, “A learning-based incentive mechanism for federated learning,” IEEE Internet of Things Journal, 2020.
[20] N. Ding, Z. Fang, and J. Huang, “Optimal contract design for efficient federated learning with multi-dimensional private information,” IEEE Journal on Selected Areas in Communications, vol. 39, no. 1, pp. 186–200, 2020.
[21] M. Zhang, E. Wei, and R. Berry, “Faithful edge federated learning: Scalability and privacy,” IEEE Journal on Selected Areas in Communications, vol. 39, no. 12, pp. 3790–3804, 2021.
[22] Z. Wang, L. Gao, and J. Huang, “Socially-optimal mechanism design for incentivized online learning,” in IEEE Conference on Computer Communications (INFOCOM), 2022.
[23] R. Pathak and M. J. Wainwright, “Fedsplit: An algorithmic framework for fast federated optimization,” Advances in Neural Information Processing Systems, vol. 33, pp. 7057–7066, 2020.
[24] N. Tran, W. Bao, A. Zomaya, and C. Hong, “Federated learning over wireless networks: Optimization model design and analysis,” in IEEE Conference on Computer Communications (INFOCOM), 2019.
[25] N. Ding, Z. Sun, E. Wei, and R. Berry, “Techinical report,” https://www.dropbox.com/s/m9nw3p3ic5il0us/Appendix.pdf?dl=0, 2023.
[26] Y.-A. De Montjoye, C. A. Hidalgo, M. Verleysen, and V. D. Blondel, “Unique in the crowd: The privacy bounds of human mobility,” Scientific reports, vol. 3, no. 1, pp. 1–5, 2013.
[27] D. Romanini, S. Lehmann, and M. Kivelä, “Privacy and uniqueness of neighborhoods in social networks,” Scientific reports, vol. 11, no. 1, p. 20104, 2021.
[28] E. Winter, “The shapley value,” Handbook of game theory with economic applications, vol. 3, pp. 2025–2054, 2002.
[29] T. Wang, J. Rausch, C. Zhang, R. Jia, and D. Song, “A principled approach to data valuation for federated learning,” in Federated Learning. Springer, 2020, pp. 153–167.