
PS-FedGAN: An Efficient Federated Learning Framework Based on Partially Shared Generative Adversarial Networks For Data Privacy

Achintha Wijesinghe
Department of Electrical and Computer Engineering
University of California, Davis
Davis, CA
[email protected]
Songyang Zhang
Department of Electrical and Computer Engineering
University of California, Davis
Davis, CA
[email protected]
Zhi Ding
Department of Electrical and Computer Engineering
University of California, Davis
Davis, CA
[email protected]
Abstract

Federated Learning (FL) has emerged as an effective learning paradigm for distributed computation owing to its strong potential in capturing underlying data statistics while preserving data privacy. However, under practical data heterogeneity among FL clients, existing FL frameworks still exhibit deficiency in capturing the overall feature properties of local client data that exhibit disparate distributions. In response, generative adversarial networks (GANs) have recently been exploited in FL to address data heterogeneity since GANs can be integrated for data regeneration without exposing original raw data. Despite some successes, existing GAN-related FL frameworks often incur heavy communication cost and also elicit other privacy concerns, which limit their applications in real scenarios. To this end, this work proposes a novel FL framework that requires only partial GAN model sharing. Named PS-FedGAN, this new framework enhances the GAN releasing and training mechanism to address heterogeneous data distributions across clients and to strengthen privacy preservation at reduced communication cost, especially over wireless networks. Our analysis demonstrates the convergence and privacy benefits of the proposed PS-FedGAN framework. Through experimental results based on several well-known benchmark datasets, our proposed PS-FedGAN shows great promise for tackling FL under non-IID client data distributions, while securing data privacy and lowering communication overhead.

1 Introduction

Federated Learning (FL) presents an exciting framework for distributed and collaborative learning while preserving users’ data privacy McMahan et al. (2016); Konečný et al. (2016). As an emerging field of artificial intelligence, realizing the full potential of FL requires us to effectively address a myriad of challenges, including heterogeneity of data distribution, data privacy considerations, and communication efficiency. Among existing FL schemes for handling data heterogeneity, approaches based on generative adversarial networks (GANs) Goodfellow et al. (2014) have recently drawn substantial interest owing to their ability to regenerate data statistics without sharing raw data. Despite much success, GAN-based FL is also prone to several shortcomings, such as poor privacy protection and heavy communication redundancy, caused by full model or synthetic data sharing Li et al. (2022b); Xin et al. (2020).

Many studies such as Fang et al. (2020); Lyu et al. (2020) have shown that traditional FL methods may be vulnerable and can cause sensitive information leakage. Similar concerns arise in GAN-based FL. As discussed in Hitaj et al. (2017), releasing GAN models or synthetic data may lead to critical privacy issues. Specifically, GANs are prone to reconstruction attacks and membership inference attacks, where an attacker aims to regenerate data samples and to check the usage of a certain data sample Hayes et al. (2017), respectively. To address related privacy concerns, differential privacy (DP) Dwork (2006) has been studied in GAN-based FL Xu et al. (2019); Li et al. (2022b). However, the major drawback of DP is the tradeoff between privacy and utility, where a privacy budget is introduced to balance performance and privacy. Thus, to preserve the performance of downstream tasks, an effectively infinite privacy budget is often picked, thereby providing no privacy. On the other hand, GAN-based FL also suffers from communication inefficiency due to the large size of shared models and limited communication resources in many realistic networking scenarios. We provide a more detailed literature review in the Appendix in view of the page limit.

Contributions: In this work, we reexamine the GAN sharing strategy in FL and propose a novel GAN publishing mechanism to address practical cases of non-IID data distribution among users. Through our proposed FL framework, named PS-FedGAN, we reconstruct separate generators at the server from partially-shared GAN models trained locally at client users, in which each user only shares its discriminator. The proposed PS-FedGAN significantly reduces the communication network overhead of model sharing and provides better data privacy during communication rounds. Furthermore, it bridges the gap between utility and privacy. Figure 1 highlights the distinction between existing full GAN sharing approaches and our proposed PS-FedGAN over an untrustworthy communication channel.

We summarize our contributions as follows:

  • We propose a novel GAN-based FL framework, PS-FedGAN, to cope with non-IID data distributions among FL client users. More specifically, we propose to train a generator at the server to capture the underlying data distribution of the local user’s GAN by only sharing a local discriminator. The proposed framework can significantly lower communication cost and improve data privacy.

  • We provide analytical insights into the convergence of generator training at the client user end and at the cloud server. We investigate the convergence of the common discriminator training based on our proposed PS-FedGAN to establish the benefit of communication cost reduction by only sharing discriminators.

  • We provide an interpretable analysis of the privacy of the proposed PS-FedGAN, which is further supported by our theoretical analysis and extensive experiments.

  • We present experimental results against several well-known benchmark datasets to further demonstrate the efficacy and efficiency of our PS-FedGAN, in terms of utility, privacy, and communication cost.

Refer to caption
Figure 1: Comparison of full GAN sharing (left, marked in black lines) vs. our proposed PS-FedGAN discriminator sharing (right, marked in yellow lines) in a scenario of N users. In both cases, a local user trains its GAN model with a generator (G) and a discriminator (D). In traditional FL with a fully shared GAN, full GAN models are communicated, whereas PS-FedGAN only communicates the discriminator. In full GAN sharing, an attacker could eavesdrop on an unreliable channel to gain access to the shared G and D. As a result, the passive attacker can generate synthetic data which may approximate the user data distribution. Furthermore, membership inference Shokri et al. (2016) could also become viable in this setup Hayes et al. (2017). In contrast, PS-FedGAN prevents eavesdropping of G and denies the attacker access to both the true generator G and the original data distributions.

2 Method and Architecture

2.1 Problem Setup

In this work, we aim to develop a novel GAN-based FL framework for a global/common task in a distributed learning setup. For convenience, we will use image classification as an illustrative example henceforth. Consider a reliable central server that has only limited access to client training data yet must achieve a desirable accuracy on the global task. Here, we assume non-IID data distributions and vulnerable communication channels among users, which are common in practical applications. For example, in smart healthcare, one learning task could be to train neural networks for the detection of a specific disease. One clinic may hold a unique set of brain images from its patients, whereas neighboring hospitals may hold a plethora of examples covering several similar diseases. A global model based on all distributed data to detect these diseases could benefit all hospitals and provide better service for patients. In such a collaborative system, local data privacy is a critical concern. Also, adding noise to the dataset to hide sensitive information may lead to unwanted information distortion and/or artifacts. In this work, we address the problem of how to protect local privacy while preserving the original data statistics.

Motivated by existing GAN-based FL, we develop a novel GAN publishing mechanism for FL with privacy preservation and communication efficiency. In alignment with other GAN-related FL works, we assume a system with a centralized server in the cloud and multiple distributed clients/users in communication with the server. Each client has access to enough resources to train a local GAN model. Note that, although we apply conditional GANs (cGAN)  Mirza and Osindero (2014) to alleviate the need for label detection via pseudo-labeling, the principle of our proposed scheme generally applies to all types of generative models.

To assess the effectiveness of our framework in preserving privacy, we consider the presence of adversarial attackers. In order to model potential attacks from these adversaries, we make the assumption that an attacker has the capability to eavesdrop on the communication channels connecting local users and the server, as depicted in Figure 1. The attacker’s objective is to estimate the data distribution of the local user through reconstruction attacks.

As illustrated in Figure 1, in conventional GAN-based FL methods, such an attacker would have unrestricted access to the entire GAN model. However, our proposed PS-FedGAN is specifically designed to address the security vulnerabilities associated with fully shared GAN models.

2.2 PS-FedGAN

We now delve into the structure of the proposed PS-FedGAN. In order to address the privacy concerns arising from full GAN sharing, we introduce a partial sharing approach within our PS-FedGAN framework. Specifically, we propose training two generators: generator G_{u} at the local user’s end and generator G_{s} at the server side for each user. In order to bridge the training of the two generators, i.e., one at a local user and the other at the server connected by a communication link, we share a common discriminator D_{u} which is trained only at the local user.

To visually depict the update process of PS-FedGAN, we present the user-server communication flow leading up to a single-step update of the global model (C_{l}) in Figure 2. In the case of a single user, the training process begins with the local user training a local GAN model consisting of generator G_{u} and discriminator D_{u}. After completing a single batch training at the local user’s end, the trained discriminator D_{u} is shared with the server through the PS-FedGAN publishing mechanism \mathcal{M}_{p}. The details of this publishing scheme will be further elaborated in Section 2.2.1. In our study, we consider an untrustworthy communication channel where attackers may potentially gain access to the PS-FedGAN publishing mechanism \mathcal{M}_{p}. On the server side, we initialize a separate generator G_{s} for each user, following the guidelines of the PS-FedGAN Algorithm 1. Subsequently, the server updates the corresponding G_{s} based on the information received through \mathcal{M}_{p}.

Before updating the global model C_{l}, we wait for a specific number of updates from the user side, denoted as N. In this context, N can represent an epoch for the local user. Consequently, we carry out N updates on both the local user’s GAN and the corresponding G_{s} in parallel. The generated data from all the updated G_{s} models, corresponding to different local users, is combined with the server-available data to update the global model C_{l}. This cycle of updates and combination of generated data is repeated until C_{l} converges to the desired point.

When deploying the FL models, we assume each local user and the server have agreed on a secret seed and the same architecture for G_{u} and G_{s}. The secret seed serves to initialize the weights of G_{u} and its corresponding G_{s}, as sketched below. Note that each user is assigned its own dedicated generator in the cloud server. Also, we impose no constraint on D_{u}. PS-FedGAN follows a batch-wise training process where we update G_{s}, G_{u}, and D_{u} at each step. Once G_{u} and G_{s} are initialized, local GAN training commences with D_{u}. In the following subsections, we provide a detailed explanation of the PS-FedGAN publishing mechanism \mathcal{M}_{p}, followed by an elaboration of the training process for both the local users and the cloud server in each communication round.
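To make the seed-based synchronization concrete, below is a minimal PyTorch-style sketch (our illustration, not the authors' released code): both parties seed the framework's random number generator with the agreed key before constructing the generator, so G_{u} and G_{s} start from identical weights without exchanging any generator parameters. The conditional generator returned by build_generator is only a placeholder architecture.

# Minimal sketch: synchronizing G_u (client) and G_s (server) via a shared secret seed.
# build_generator is an illustrative placeholder for the agreed-upon cGAN generator.
import torch
import torch.nn as nn

def build_generator(noise_dim=100, num_classes=10, img_dim=28 * 28):
    # Toy conditional generator operating on concatenated noise + one-hot label.
    return nn.Sequential(
        nn.Linear(noise_dim + num_classes, 256),
        nn.ReLU(),
        nn.Linear(256, img_dim),
        nn.Tanh(),
    )

def init_generator_from_seed(secret_seed: int) -> nn.Module:
    # The client and the server each call this with the same agreed-upon seed,
    # so the two generators begin from identical weights.
    torch.manual_seed(secret_seed)
    return build_generator()

G_u = init_generator_from_seed(1234)   # at the local user
G_s = init_generator_from_seed(1234)   # at the cloud server
assert all(torch.equal(p, q) for p, q in
           zip(G_u.state_dict().values(), G_s.state_dict().values()))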

2.2.1 PS-FedGAN Publishing Mechanism \mathcal{M}_{p}

In PS-FedGAN, let z_{t} be the noise vector used to train G_{u} with random labels l_{t} at step/batch t at the user side. Let \mathcal{M}(\theta) be a user-side publishing mechanism used to send trained parameters \theta to the server. The PS-FedGAN publishing mechanism is defined as \mathcal{M}_{p}: \mathcal{M}(D_{u},z_{t},l_{t}), where the local user releases D_{u}, z_{t}, and l_{t} to the server after each training step of D_{u}. We shall show that \mathcal{M}_{p} preserves privacy and lowers communication cost compared to the full GAN releasing mechanism \mathcal{M}(G_{u},D_{u}) via theoretical analysis and experimental results in Section 3 and Section 4, respectively.
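For illustration, one \mathcal{M}_{p} message per training step might be packaged as follows; the field names and serialization are our assumptions rather than a specification from the paper. The key point is that only D_{u}, z_{t}, and l_{t} are transmitted, never G_{u}.

# Hypothetical sketch of one M_p message per training step/batch.
from dataclasses import dataclass
import torch

@dataclass
class PSFedGANMessage:
    step: int
    discriminator_state: dict   # D_u weights after this step's update
    z_t: torch.Tensor           # noise batch used to train G_u at this step
    l_t: torch.Tensor           # random labels paired with z_t

def publish(D_u, z_t, l_t, step):
    # M_p = M(D_u, z_t, l_t): the generator G_u is never transmitted.
    return PSFedGANMessage(step=step,
                           discriminator_state=D_u.state_dict(),
                           z_t=z_t.detach().cpu(),
                           l_t=l_t.detach().cpu())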

Refer to caption
Figure 2: User-server communication process until one update of the global classifier (C_{l}), shown for one user using the PS-FedGAN method: The local user first starts training a local cGAN (generator G_{u} and discriminator D_{u}). After a single batch training at the local user, the trained D_{u} is shared with the server using \mathcal{M}_{p}: \mathcal{M}(D_{u},z,l), where \mathcal{M} is a publishing mechanism. z and l are the noise vectors and corresponding fake labels used to train G_{u} at the current step, respectively. At the server, we initiate a separate generator G_{s} for each user. For every received user update, we update G_{s} by one step. After N such steps, we update C_{l} using synthetic data generated from each G_{s} and cloud-available data. We continue this process until C_{l} converges.

2.2.2 PS-FedGAN Local User Training

As discussed earlier, D_{u} is trained at the local user end after initializing G_{u}. The local user has the flexibility to choose any architecture for D_{u}, as long as it is trainable with G_{u}. Following the conventional GAN training approach, D_{u} is initially trained using real images and images generated by G_{u} at each step. To handle the randomness of G_{u} in a deterministic manner, the parameters of D_{u} are updated during backpropagation (BP).

Depending on the convergence requirements of the architecture, multiple iterations of D_{u} training can be performed. Subsequently, random vectors z_{t} and corresponding random labels l_{t} are generated for training G_{u}. Before we train G_{u}, we release \mathcal{M}_{p} to the server to minimize latency. Locally, G_{u} is trained using z_{t} and the corresponding l_{t}. Similar to general GAN training principles, the randomness in D_{u} is handled deterministically to enable the training of G_{u} accordingly. The parameters of G_{u} are then updated through BP.

2.2.3 PS-FedGAN Server Training

At the server, we maintain a dedicated generator (G_{s}) corresponding to each user, initialized with the secret seed. Upon receiving \mathcal{M}_{p} from the respective user, the training process for each G_{s} starts. During this process, it is not necessary to wait for all other users to communicate with the server. Similar to the training of G_{u}, we train G_{s} using the received z_{t} and l_{t} from the corresponding user. To ensure consistency and avoid randomness in D_{u}, we employ the same techniques used by the corresponding user during training.

Next, we perform BP through the parameters of G_{s} and update them accordingly. Once we receive updates from all users, we proceed to update the global classifier C_{l}. To update C_{l} at each iteration, we generate a fixed number of samples from each G_{s} and combine them with a portion of the available true data at the server. This approach allows us to create training samples that incorporate multiple user generators. We present the PS-FedGAN training algorithm in Algorithm 1.

Algorithm 1 PS-FedGAN: Training Algorithm. Minibatch GAN training for distributed GANs.
for each user i do
     Initialize G_{u_i} and G_{s_i} using secret seed key_i
     Initiate the discriminator: D_{u_i}
end for
Initiate the global model C_l (classifier) and train it using the available cloud data.
for each communication round, until C_l is converged do
     for number of training iterations do
         for each step t do
              z_t, l_t \leftarrow random vectors, random labels
              Train D_{u_i}: D_{u_i}^{(t+1)} \leftarrow \mathcal{F}(G_{u_i}^{(t)}(z_t, l_t), D_{u_i}^{(t)}), where \mathcal{F} represents forward/back propagation and weight updating
              Share (D_{u_i}^{(t+1)}, z_t, l_t): \mathcal{M}_p(D_{u_i}^{(t+1)}, z_t, l_t)
              Train G_{u_i}: G_{u_i}^{(t+1)} \leftarrow \mathcal{F}(G_{u_i}^{(t)}(z_t, l_t), D_{u_i}^{(t+1)})
              Train G_{s_i}: G_{s_i}^{(t+1)} \leftarrow \mathcal{F}(G_{s_i}^{(t)}(z_t, l_t), D_{u_i}^{(t+1)})
         end for
     end for
     Train C_l with cloud data and synthetic data: C_l^{(t+1)} \leftarrow \mathcal{F}(C_l^{(t)})
end for
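For concreteness, the following PyTorch-style sketch condenses one batch step of Algorithm 1 for a single user; the loss functions, optimizers, and the conditional signatures G(z, l) and D(x, y) are illustrative assumptions filling in details the pseudocode leaves open. After N such steps, the server samples from each G_{s} and mixes the synthetic data with the cloud-available data to update C_{l}.

# Sketch of one PS-FedGAN batch step for user i (illustrative, not the authors' code).
import torch
import torch.nn.functional as F

def local_user_step(G_u, D_u, real_x, real_y, opt_D, opt_G, noise_dim=100):
    batch = real_x.size(0)
    z_t = torch.randn(batch, noise_dim)
    l_t = torch.randint(0, 10, (batch,))

    # 1) Train D_u on real data and on samples from G_u (generator frozen here).
    opt_D.zero_grad()
    fake_x = G_u(z_t, l_t).detach()
    d_loss = F.binary_cross_entropy(D_u(real_x, real_y), torch.ones(batch, 1)) + \
             F.binary_cross_entropy(D_u(fake_x, l_t), torch.zeros(batch, 1))
    d_loss.backward()
    opt_D.step()

    # 2) Publish M_p = (D_u, z_t, l_t) to the server before updating G_u.
    message = (D_u.state_dict(), z_t, l_t)

    # 3) Train G_u against the freshly updated D_u.
    opt_G.zero_grad()
    g_loss = F.binary_cross_entropy(D_u(G_u(z_t, l_t), l_t), torch.ones(batch, 1))
    g_loss.backward()
    opt_G.step()
    return message

def server_step(G_s, D_copy, message, opt_Gs):
    # Load the received discriminator and replay the same (z_t, l_t) update on G_s.
    D_state, z_t, l_t = message
    D_copy.load_state_dict(D_state)
    opt_Gs.zero_grad()
    g_loss = F.binary_cross_entropy(D_copy(G_s(z_t, l_t), l_t),
                                    torch.ones(z_t.size(0), 1))
    g_loss.backward()
    opt_Gs.step()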

2.3 Attacker Models

To assess the performance of PS-FedGAN against potential attackers, we focus specifically on reconstruction attacks. These attackers are denoted as \mathcal{A}_{R}. For any \mathcal{M}(\theta), an attacker \mathcal{A}_{R} attempts to reconstruct the training samples; let \mathcal{I} represent a reconstructed sample (images in this manuscript), i.e.,

\mathcal{A}_{R}:\mathcal{M}(\theta)\mapsto\mathcal{I}. (1)

In our experimental setups, we consider two types of attackers: \mathcal{A}_{1} and \mathcal{A}_{2}. The bottleneck faced by any attacker against our model is that the generator, which would otherwise be prone to information leakage, is never shared. Therefore, reconstruction of synthetic data is guarded by the secrecy of the generator's initial weights and network architecture, as further elaborated in Section 3. Predicting the exact architecture is challenging owing to the wide variety of architectures enabled by advances in deep learning.

For an attacker to obtain information, it is crucial to eavesdrop from the very beginning and not miss any round of communication. Such requirements prove difficult for potential privacy attackers. Additionally, the attacker must acquire knowledge of the initial weights of the generator, as discussed in Section 3. In practice, it is typically challenging, and often impossible, for an attacker to accurately estimate the generator’s structure due to practical constraints such as power, latency, and hardware capabilities. Moreover, the diverse range of available generator architectures, coupled with their complexity, makes it highly challenging for an attacker to accurately infer the precise model structure. This inherent difficulty adds an additional layer of protection against potential privacy breaches in PS-FedGAN.

Therefore, to define concrete attacks, we first introduce a factor r\in[0,1]. We assume that attackers \mathcal{A}_{1} and \mathcal{A}_{2} know the architecture of G_{u} and have the same weights and bias terms as G_{u} in every network layer except the first. Let w_{1} and b_{1} be the layer-1 weight and bias term of G_{u}. The layer-1 weight w_{a11} and bias term b_{a11} of \mathcal{A}_{1} are set to w_{a11}=r w_{1} and b_{a11}=b_{1}, respectively. Similarly, for \mathcal{A}_{2}, we leave layer 1's weight unperturbed as w_{a21}=w_{1}, and set layer 1's bias to b_{a21}=r b_{1}. Clearly, these two attackers' knowledge of the G_{u} parameters is only slightly different from the true parameters, with a multiplicative factor r on layer 1's weights and biases, respectively.
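The two attackers can be instantiated by copying G_{u} and perturbing only the first layer, as in the sketch below (an illustration assuming the toy generator defined earlier; the paper does not prescribe this code).

# Sketch: building attacker generators A_1 (scale first-layer weight by r) and
# A_2 (scale first-layer bias by r) from a copy of G_u.
import copy
import torch

def make_attacker_generator(G_u, r: float, perturb_bias: bool = False):
    G_a = copy.deepcopy(G_u)
    with torch.no_grad():
        first_linear = next(m for m in G_a.modules()
                            if isinstance(m, torch.nn.Linear))
        if perturb_bias:       # A_2: b_a21 = r * b_1, weight untouched
            first_linear.bias.mul_(r)
        else:                  # A_1: w_a11 = r * w_1, bias untouched
            first_linear.weight.mul_(r)
    return G_a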

3 Theoretical Results

In this section, we provide a convergence analysis of the proposed method and offer insights into privacy preservation. Detailed corresponding proofs are presented in Appendix B (Theoretical Analysis).

3.1 Convergence of D_{u}

We first show the convergence of the two generators and the discriminator trained according to Algorithm 1. Suppose that the GAN trained at a local user consists of a generator G_{u} capturing a probability distribution p_{gu}, and a discriminator D_{u}. Let the local user’s data distribution be p_{data}(x). The local user (client) trains G_{u} with z\sim p_{z}. We have the following properties on model convergence.

Proposition 1. For two generators G_{u} and G_{s} trained via Algorithm 1 with a shared discriminator D_{u}, the discriminator converges to the same optimal discriminator D_{u}^{*} as in Goodfellow et al. (2014), and it uniquely corresponds to the given G_{u}, i.e.,

D_{u}^{*}=\frac{p_{data}(x)}{p_{data}(x)+p_{gu}(x)} (2)

From Proposition 1, we see that the D_{u} in PS-FedGAN converges to the same discriminator as if we trained G_{u} and D_{u} without G_{s}. That is, training G_{s} in the cloud does not hamper the convergence or performance of the local GAN training. On the other hand, we have a unique D_{u} given G_{u}. This property solidifies the convergence of G_{s} to G_{u}, which is characterized in the following propositions regarding the generator models.

3.2 Convergence of G_{u} and G_{s}

Proposition 2. Two generators G_{u} and G_{s} trained according to Algorithm 1 with a shared D_{u} would converge to a unique G^{*}=G_{u}^{*}=G_{s}^{*} which captures p_{data}.

This proposition establishes that two distributed generators trained using PS-FedGAN converge to the same generative model. Moreover, this model is the same as the optimized model that one can obtain via classical GAN training. Another vital observation is that both G_{u} and G_{s} capture the user-side data distribution.

Proposition 3. Any other generator G_{\mathcal{A}} failing to capture the weights and architecture of G_{s} or G_{u}, either in the initial state or in any single communication round, would fail to characterize the data distribution p_{data}.

This proposition provides us with insights into the capability required by an attacker. To attack the proposed model, an attacker would need to predict a generator G using the information obtained from \mathcal{M}_{p}. However, to successfully carry out this attack, the attacker would need to possess precise knowledge of the generator’s architecture and the initial weights of either G_{s} or G_{u}, and would need to monitor and capture every round of communication. In the next section on experiments, numerical results shall substantiate these requirements and further demonstrate the difficulty an attacker would face when attempting to breach the privacy of the PS-FedGAN model.

3.3 DP property of \mathcal{M}_{p}

We now discuss the DP properties of the proposed methods.

Let us denote the discriminator by D=f(data) and the generator by G=g(D,z), where z represents noise. From the post-processing property in Dwork and Roth (2014), g(f(data),z) is DP if f(data) is DP. Thus, the generator G is DP given that D is DP. Furthermore, if the training process is based on original data, FL-GAN follows DP Xin et al. (2022).

In practical scenarios of PS-FedGAN, the accessed model of D with quantized channel noise is DP Amiri et al. (2021) with respect to the original weights of D, i.e., W_{D}. Since an attacker can only gain access to the discriminator D during communication rounds in the proposed PS-FedGAN framework, any generator reconstructed by an attacker from the intercepted D is also DP with respect to W_{D}.

Proposition 4. Any generator G reconstructed by the attacker shall be DP in a communication channel with quantization noise or channel-induced error.

This proposition shows that the attackers’ estimated GAN model is DP with respect to the original model weights W_{D} sent from a client to the server. Considering the mutual information between shared models and original data, we have I(data, W_{D}) \leq I(data, data). Therefore, if we preserve privacy in W_{D}, we also preserve some privacy in the original data.

Proposition 5. \mathcal{M}_{p} preserves privacy compared to full data sharing.

4 Experiments

In this section, we present test results of the proposed PS-FedGAN under non-IID user data distributions. In the first subsection, we evaluate performance when utilizing the proposed method and provide performance comparison with existing FL methods. We then present privacy measures and also evaluate the associated communication cost. Our experiments use several well-known benchmark datasets, including MNIST Deng (2012), Fashion MNIST Xiao et al. (2017), SVHN Netzer et al. (2011), and CIFAR10 Krizhevsky (2009).

4.1 Evaluation of Utility

Our study considers three distinct scenarios of heterogeneous user cases, which are based on the work presented in Li et al. (2022b). Each case involves a total of 10 users, with the following details (a sketch of this shard-based splitting is given after the list):

  • Split-1:

    In this case, training data is divided into 10 shards, each containing samples from a single class. Each user is randomly assigned one distinct shard.

  • Split-2:

    This case generates 20 training data shards, each consisting of samples from a single class. Two shards are randomly assigned to each user without overlap.

  • Split-3:

    This case generates 30 shards, each containing samples from a single class. Three shards are randomly assigned to each user without overlap.
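A simple sketch of this shard-based non-IID partitioning is given below; it is our illustration of the splits described above (shards_per_user = 1, 2, or 3 for Split-1/2/3) and assumes roughly balanced classes so that sorted shards align with class boundaries.

# Sketch of shard-based non-IID splitting: labels is a 1-D array of class ids.
import numpy as np

def make_split(labels, num_users=10, shards_per_user=1, seed=0):
    rng = np.random.default_rng(seed)
    num_shards = num_users * shards_per_user
    # Sort sample indices by class so each shard (roughly) contains one class.
    order = np.argsort(labels, kind="stable")
    shards = np.array_split(order, num_shards)
    shard_ids = rng.permutation(num_shards)
    return {u: np.concatenate(
                [shards[s] for s in shard_ids[u * shards_per_user:(u + 1) * shards_per_user]])
            for u in range(num_users)}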

As a utility measure, we select the classification accuracy of the global model in a supervised setup. We compare our results against several existing FL alternatives: FedAvg (FA)  McMahan et al. (2016), FedProx (FP)  Li et al. (2020a), SCAFFOLD (SD)  Karimireddy et al. (2020), Naivemix (NM)  Yoon et al. (2021), FedMix (FM) Yoon et al. (2021), and SDA-FL (SL) Li et al. (2022b) as baselines. We also compare our partially-shared PS-FedGAN (PS) with the fully-shared GAN (FG) as a performance benchmark. The results of Split-1/2 are shown in Table 1. The performance of Split-3 can be found in the Appendix.

Table 1 shows the superior performance of the proposed PS-FedGAN over most existing methods except for FG. More specifically, we see a significant improvement from PS-FedGAN in Split-1 for the CIFAR10 dataset. Compared with full-GAN (FG) sharing, our partially-shared GAN achieves similar performance but at significantly reduced communication cost and privacy loss, as further shown in Table 4. Note that in the above test, the SDA-FL method requires an infinite privacy budget to achieve the desired utility. This indicates that SDA-FL faces significant challenges in balancing privacy and utility. Additionally, it is worth mentioning that the other alternatives provide more real data for the classifier from Split-1 to Split-3, whereas in our proposed method, we maintain a constant amount of real data on the server (1%). Similar results can be seen for Split-3 in the Appendix.

Table 1: Classification accuracies (best) of existing FL methods and full GAN sharing compared with PS-FedGAN in Split-1 and Split-2. Some of the results are from Li et al. (2022b). We assume 1% of the training data of each dataset is available in the cloud for FG and PS-FedGAN.
Split-1 Split-2
Method MNIST FMNIST SVHN CIFAR10 MNIST FMNIST SVHN CIFAR10
FA 83.44 16.50 14.05 18.36 97.61 73.50 81.11 61.28
FP 84.17 57.14 17.53 11.24 97.55 75.76 86.28 63.16
SD 25.39 56.80 11.64 12.81 94.17 70.82 73.34 60.78
NM 84.35 66.62 14.35 14.39 84.35 79.54 84.64 64.39
FM 90.96 72.11 16.78 13.57 90.96 82.41 86.61 65.76
SL 98.19 85.70 88.46 37.70 98.26 86.87 90.70 67.89
FG 98.41 88.77 91.92 66.98 98.41 89.01 92.61 69.91
PS 98.31 88.42 91.73 66.96 98.44 89.02 92.30 69.89

4.2 Privacy Evaluation

We now evaluate the privacy of PS-FedGAN with respect to reconstruction attacks, which is often viewed as one of the most dangerous attacks. Here, we utilize the MNIST dataset. To evaluate the efficacy against reconstruction attacks, we use several proxies to measure the privacy leakage, including normalized mean square error (NMSE), structural similarity index (SSIM)  Dangwal et al. (2021b), and classification accuracy. We consider two different non-IID user setups:

  • Setup-1 includes three users. User-1 has access to classes {0,1}, user-2 has access to classes {2,3,4}, and user-3 has access to the remaining classes {5,6,7,8,9}. The attacker model \mathcal{A}_{1} described in Section 2.3 is used.

  • In Setup-2, we consider 10 users with the data splitting of Split-1, where attacker model \mathcal{A}_{2} is applied in the reconstruction attacks.

Classification Accuracy: To evaluate the attacker’s classification accuracy on reconstructed images, we first consider Setup-1. In this setup, we assume that the attacker has access to all the elements released by the publishing mechanism \mathcal{M}_{p}. Furthermore, we assume that the attacker can accurately guess the exact generator architecture. Note that at the beginning of training, the generator weights of \mathcal{A}_{1} (w_{a11}=r w_{1}) differ from those of the cloud generator, as mentioned earlier. The generators are trained simultaneously at the user, the server, and the attacker.

We then pick an attacking classifier trained on the original MNIST training set and use it to infer labels of data generated by the attacker’s generator. Here, the attacker assumes that the user works with MNIST data. Table 2 illustrates the attacker’s performance for different r. From Table 2, we see that the attacker gains better accuracy with increasing r. In Table 2, we also evaluate the classification accuracy at the cloud using the same attacking classifier, which reflects the potential of an ideal attacker as well as the utility at the cloud. Table 2 shows that an attacker with even a small disparity in the initial weights can only perform like a random guess. Note that in this scenario, user-1 has 2 classes, user-2 has 3 classes, and user-3 has 5 classes. These results suggest that an attacker must obtain very accurate information on the architecture and initial model weights to achieve nontrivial inference accuracy. Since such requirements are improbable and impractical, our results establish the robustness of our proposed method against classification attacks.

Table 2: Classification accuracies (best) on the attacker’s generators and the cloud’s generators: how the attacker’s performance (classification accuracy %) varies with r for \mathcal{A}_{1} in Setup-1.
Attacker’s models Cloud’s models
r (on attacker) User1 User2 User3 User1 User2 User3
1-1\times 10^{-4} 0.4955 0.3390 0.2048 0.9911 0.9902 0.9722
1-1\times 10^{-7} 0.4933 0.3323 0.1945 0.9983 0.9727 0.9610
1-1\times 10^{-15} 0.4941 0.3299 0.1990 0.9980 0.9677 0.9690
1-1\times 10^{-21} 0.5027 0.3341 0.2056 0.9981 0.9746 0.9754
Refer to caption
Figure 3: Reconstruction attacks: different r from 0.1 to 0.999999 evaluated under \mathcal{A}_{2}.

Next, we consider \mathcal{A}_{2}’s classification accuracy on reconstructed images. For this attacker, the difference between its generator and the cloud generator lies in the first-layer bias term (b_{a21}=r b_{1}). We evaluate \mathcal{A}_{2}’s performance on Split-1. Figure 3 illustrates the attacker’s performance for different r, showing that the attacker gains better classification accuracy with increasing r. Similar to the first type of attacker \mathcal{A}_{1}, these results suggest that an attacker from \mathcal{A}_{2} also needs very accurate knowledge of the architecture and the initial weights. This test case further establishes the robustness of PS-FedGAN against classification attacks.
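The accuracy proxy used in Table 2 and Figure 3 can be sketched as follows: sample from a generator with known conditioning labels and measure how often a classifier trained on real MNIST agrees with those labels. The generator and oracle_classifier arguments are placeholders for trained models; this is our illustration of the evaluation, not released code.

# Sketch of the classification-accuracy proxy for a (cloud or attacker) generator.
import torch

@torch.no_grad()
def generator_accuracy(generator, oracle_classifier, n=1000, noise_dim=100):
    z = torch.randn(n, noise_dim)
    labels = torch.randint(0, 10, (n,))
    fake_images = generator(z, labels)
    preds = oracle_classifier(fake_images).argmax(dim=1)
    return (preds == labels).float().mean().item()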

Reconstruction Quality and Similarity: As presented in Dangwal et al. (2021a), reconstruction quality and similarity can be used as a measure of privacy leakage. For this, we consider two metrics, i.e., SSIM and NMSE, in Setup-1. Table 3 compares the corresponding generations of the attacker and the cloud based on the similarity between each pair of generated images. From Table 3, we can see that the NMSE values achieved by the attackers are high while the SSIM (maximum 1) is very low. These results indicate that the attacker-generated images cannot capture the true data distribution. We further show in the Appendix that, if the attacker deviates from the actual generator weights, it converges to a trivial point. As a result, no underlying user data properties are captured, as illustrated by the visual examples in Figure 4. More results and discussions are provided in the Appendix.

Table 3: Reconstruction attacks: attacker \mathcal{A}_{1} reconstruction quality on the 3 different users in Setup-1. For NMSE, lower values indicate better reconstruction; for SSIM, higher values indicate better reconstruction.
Metric User1 User2 User3
NMSE 1.4108 1.4585 1.2553
SSIM 0.01956 0.0253 0.0229
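The two proxies in Table 3 can be computed as sketched below, assuming attacker-generated and cloud-generated images are paired through the shared (z_{t}, l_{t}); the pairing and scaling conventions are our assumptions.

# Sketch of the privacy-leakage proxies: NMSE and per-image SSIM.
import numpy as np
from skimage.metrics import structural_similarity as ssim

def nmse(ref: np.ndarray, est: np.ndarray) -> float:
    # Normalized MSE: ||ref - est||^2 / ||ref||^2 over the whole batch.
    return float(np.sum((ref - est) ** 2) / np.sum(ref ** 2))

def mean_ssim(ref_batch: np.ndarray, est_batch: np.ndarray) -> float:
    # ref_batch, est_batch: (N, H, W) arrays scaled to [0, 1].
    return float(np.mean([ssim(r, e, data_range=1.0)
                          for r, e in zip(ref_batch, est_batch)]))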
Refer to caption
(a) Regenerated data samples in the cloud server.
Refer to caption
(b) Regenerated samples of \mathcal{A}_{2} (each # for one user).
Refer to caption
(c) Regenerated samples of \mathcal{A}_{1} (each block for one user).
Figure 4: Divergence of the attacker. Any attacker that fails to initialize with the same parameters as G_{u} or G_{s} would fail to capture the local user’s distribution (r=0.9999 for both \mathcal{A}_{1} and \mathcal{A}_{2}).

4.3 Evaluation of Communication Cost

Table 4: Number of parameters communicated at each step for Split-3
Dataset Full GAN sharing: \mathcal{M}(G,D) PS-FedGAN: \mathcal{M}_{p}(D,z,l)
MNIST 3 M 1.5 M
FMNIST 5 M 1.5 M
SVHN 2 M 0.7 M
CIFAR10 6.8 M 2.98 M

Table 4 illustrates the number of parameters that need to be shared between each user and the cloud server per communication round for full GAN sharing versus PS-FedGAN. From the table, we can see significant savings for PS-FedGAN compared to classical full GAN sharing. The advantage applies similarly across various GAN architectures.
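The per-round payloads in Table 4 can be tallied as in the sketch below: full GAN sharing transmits all parameters of G and D, whereas PS-FedGAN transmits only D plus the small (z_{t}, l_{t}) batch (an illustration; exact counts depend on the chosen architectures).

# Sketch: per-step communication payload (in number of scalars) for the two schemes.
def count_params(model):
    return sum(p.numel() for p in model.parameters())

def payload_full_gan(G, D):
    return count_params(G) + count_params(D)

def payload_ps_fedgan(D, z_t, l_t):
    # Only the discriminator plus the current noise/label batch are sent.
    return count_params(D) + z_t.numel() + l_t.numel()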

5 Conclusion and Future Works

In this work, we develop a GAN-based FL framework, PS-FedGAN, which can beneficially preserve local data privacy and reduce communication overhead in comparison with existing FL proposals. Our novel PS-FedGAN achieves learning utility comparable to that of a fully shared GAN architecture, while significantly strengthening data privacy and lowering the communication cost. Empirical results further demonstrate superior performance against state-of-the-art GAN-based FL frameworks. The PS-FedGAN principle and architecture can be directly generalized to incorporate any existing GAN. In future work, we plan to further explore the effect of lossy networking channels and improve PS-FedGAN’s robustness against non-ideal network links. Another promising extension of this work is to explore the model and data heterogeneity presented in Vahidian et al. (2022).

References

  • Amiri et al. [2021] Saba Amiri, Adam Belloum, Sander Klous, and Leon Gommans. Compressive differentially private federated learning through universal vector quantization. In AAAI Workshop on Privacy-Preserving Artificial Intelligence, 2021.
  • Cao et al. [2023] Xingjian Cao, Gang Sun, Hongfang Yu, and Mohsen Guizani. Perfed-gan: Personalized federated learning via generative adversarial networks. IEEE Internet of Things Journal, 10(5):3749–3762, 2023. doi: 10.1109/JIOT.2022.3172114.
  • Dangwal et al. [2021a] Deeksha Dangwal, Vincent T. Lee, Hyo Jin Kim, Tianwei Shen, Meghan Cowan, Rajvi Shah, Caroline Trippel, Brandon Reagen, Timothy Sherwood, Vasileios Balntas, Armin Alaghi, and Eddy Ilg. Analysis and mitigations of reverse engineering attacks on local feature descriptors. CoRR, abs/2105.03812, 2021a. URL https://arxiv.org/abs/2105.03812.
  • Dangwal et al. [2021b] Deeksha Dangwal, Vincent T. Lee, Hyo Jin Kim, Tianwei Shen, Meghan Cowan, Rajvi Shah, Caroline Trippel, Brandon Reagen, Timothy Sherwood, Vasileios Balntas, Armin Alaghi, and Eddy Ilg. Analysis and mitigations of reverse engineering attacks on local feature descriptors. CoRR, abs/2105.03812, 2021b. URL https://arxiv.org/abs/2105.03812.
  • Deng [2012] Li Deng. The mnist database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141–142, 2012.
  • Dwork [2006] Cynthia Dwork. Differential privacy. In Michele Bugliesi, Bart Preneel, Vladimiro Sassone, and Ingo Wegener, editors, Automata, Languages and Programming, pages 1–12, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg. ISBN 978-3-540-35908-1.
  • Dwork and Roth [2014] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., 9:211–407, 2014.
  • Fang et al. [2020] Minghong Fang, Xiaoyu Cao, Jinyuan Jia, and Neil Gong. Local model poisoning attacks to Byzantine-Robust federated learning. In 29th USENIX Security Symposium (USENIX Security 20), pages 1605–1622. USENIX Association, August 2020. ISBN 978-1-939133-17-5. URL https://www.usenix.org/conference/usenixsecurity20/presentation/fang.
  • Goodfellow et al. [2014] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc., 2014. URL https://proceedings.neurips.cc/paper_files/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf.
  • Hao et al. [2021] Weituo Hao, Mostafa El-Khamy, Jungwon Lee, Jianyi Zhang, Kevin J Liang, Changyou Chen, and Lawrence Carin. Towards fair federated learning with zero-shot data augmentation. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 3305–3314, 2021.
  • Hayes et al. [2017] Jamie Hayes, Luca Melis, George Danezis, and Emiliano De Cristofaro. LOGAN: evaluating privacy leakage of generative models using generative adversarial networks. CoRR, abs/1705.07663, 2017. URL http://arxiv.org/abs/1705.07663.
  • Hitaj et al. [2017] Briland Hitaj, Giuseppe Ateniese, and Fernando Perez-Cruz. Deep models under the gan: Information leakage from collaborative deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS ’17, page 603–618, New York, NY, USA, 2017. Association for Computing Machinery. ISBN 9781450349468. doi: 10.1145/3133956.3134012. URL https://doi.org/10.1145/3133956.3134012.
  • Jeong et al. [2018] Eunjeong Jeong, Seungeun Oh, Hyesung Kim, Jihong Park, Mehdi Bennis, and Seong-Lyun Kim. Communication-efficient on-device machine learning: Federated distillation and augmentation under non-iid private data. CoRR, abs/1811.11479, 2018. URL http://arxiv.org/abs/1811.11479.
  • Karimireddy et al. [2020] Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank Reddi, Sebastian Stich, and Ananda Theertha Suresh. SCAFFOLD: Stochastic controlled averaging for federated learning. In Hal Daumé III and Aarti Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 5132–5143. PMLR, 13–18 Jul 2020. URL https://proceedings.mlr.press/v119/karimireddy20a.html.
  • Konečný et al. [2016] Jakub Konečný, H. Brendan McMahan, Felix X. Yu, Peter Richtárik, Ananda Theertha Suresh, and Dave Bacon. Federated learning: Strategies for improving communication efficiency. CoRR, abs/1610.05492, 2016. URL http://arxiv.org/abs/1610.05492.
  • Krizhevsky [2009] Alex Krizhevsky. Learning multiple layers of features from tiny images. 2009.
  • Li et al. [2020a] Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, and Virginia Smith. Federated optimization in heterogeneous networks. In I. Dhillon, D. Papailiopoulos, and V. Sze, editors, Proceedings of Machine Learning and Systems, volume 2, pages 429–450, 2020a. URL https://proceedings.mlsys.org/paper_files/paper/2020/file/38af86134b65d0f10fe33d30dd76442e-Paper.pdf.
  • Li et al. [2022a] Wei Li, Jinlin Chen, Zhenyu Wang, Zhidong Shen, Chao Ma, and Xiaohui Cui. Ifl-gan: Improved federated learning generative adversarial network with maximum mean discrepancy model aggregation. IEEE Transactions on Neural Networks and Learning Systems, pages 1–14, 2022a. doi: 10.1109/TNNLS.2022.3167482.
  • Li et al. [2020b] Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, and Zhihua Zhang. On the convergence of fedavg on non-iid data. In International Conference on Learning Representations, 2020b. URL https://openreview.net/forum?id=HJxNAnVtDS.
  • Li et al. [2022b] Zijian Li, Jiawei Shao, Yuyi Mao, Jessie Hui Wang, and Jun Zhang. Federated learning with gan-based data synthesis for non-iid clients, 2022b. URL https://arxiv.org/abs/2206.05507.
  • Luo et al. [2021] Mi Luo, Fei Chen, Dapeng Hu, Yifan Zhang, Jian Liang, and Jiashi Feng. No fear of heterogeneity: Classifier calibration for federated learning with non-iid data. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 5972–5984. Curran Associates, Inc., 2021. URL https://proceedings.neurips.cc/paper_files/paper/2021/file/2f2b265625d76a6704b08093c652fd79-Paper.pdf.
  • Lyu et al. [2020] L. Lyu, Han Yu, Xingjun Ma, Lichao Sun, Jun Zhao, Qiang Yang, and Philip S. Yu. Privacy and robustness in federated learning: Attacks and defenses. IEEE transactions on neural networks and learning systems, PP, 2020.
  • McMahan et al. [2016] H. B. McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Agüera y Arcas. Communication-efficient learning of deep networks from decentralized data. In International Conference on Artificial Intelligence and Statistics, 2016.
  • Mirza and Osindero [2014] Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. CoRR, abs/1411.1784, 2014. URL http://arxiv.org/abs/1411.1784.
  • Netzer et al. [2011] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011, 2011. URL http://ufldl.stanford.edu/housenumbers/nips2011_housenumbers.pdf.
  • Shokri et al. [2016] Reza Shokri, Marco Stronati, and Vitaly Shmatikov. Membership inference attacks against machine learning models. CoRR, abs/1610.05820, 2016. URL http://arxiv.org/abs/1610.05820.
  • Vahidian et al. [2022] Saeed Vahidian, Mahdi Morafah, Chen Chen, Mubarak Shah, and Bill Lin. Rethinking data heterogeneity in federated learning: Introducing a new notion and standard benchmarks. In Workshop on Federated Learning: Recent Advances and New Challenges (in Conjunction with NeurIPS 2022), 2022. URL https://openreview.net/forum?id=2mQCv0_Ac74.
  • Wu et al. [2021] Yuezhou Wu, Yan Kang, Jiahuan Luo, Yuanqin He, and Qiang Yang. Fedcg: Leverage conditional gan for protecting privacy and maintaining competitive performance in federated learning. In International Joint Conference on Artificial Intelligence, 2021.
  • Xiao et al. [2017] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. CoRR, abs/1708.07747, 2017.
  • Xin et al. [2020] Bangzhou Xin, Wei Yang, Yangyang Geng, Sheng Chen, Shaowei Wang, and Liusheng Huang. Private fl-gan: Differential privacy synthetic data generation based on federated learning. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2927–2931, 2020. doi: 10.1109/ICASSP40776.2020.9054559.
  • Xin et al. [2022] Bangzhou Xin, Yangyang Geng, Teng Hu, Sheng Chen, Wei Yang, Shaowei Wang, and Liusheng Huang. Federated synthetic data generation with differential privacy. Neurocomput., 468(C):1–10, jan 2022. ISSN 0925-2312. doi: 10.1016/j.neucom.2021.10.027. URL https://doi.org/10.1016/j.neucom.2021.10.027.
  • Xu et al. [2019] Chugui Xu, Ju Ren, Deyu Zhang, Yaoxue Zhang, Zhan Qin, and Kui Ren. Ganobfuscator: Mitigating information leakage under gan via differential privacy. IEEE Transactions on Information Forensics and Security, 14(9):2358–2371, 2019. doi: 10.1109/TIFS.2019.2897874.
  • Yoon et al. [2021] Tehrim Yoon, Sumin Shin, Sung Ju Hwang, and Eunho Yang. Fedmix: Approximation of mixup under mean augmented federated learning. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=Ogga20D2HO-.
  • Yoshida et al. [2020] Naoya Yoshida, Takayuki Nishio, Masahiro Morikura, Koji Yamamoto, and Ryo Yonetani. Hybrid-fl for wireless networks: Cooperative learning mechanism using non-iid data. In 2020 IEEE International Conference on Communications, ICC 2020, Dublin, Ireland, June 7-11, 2020, pages 1–7. IEEE, 2020. ISBN 978-1-7281-5089-5. doi: 10.1109/ICC40277.2020.9149323. URL https://doi.org/10.1109/ICC40277.2020.9149323.
  • Zhang et al. [2019] Han Zhang, Zizhao Zhang, Augustus Odena, and Honglak Lee. Consistency regularization for generative adversarial networks. CoRR, abs/1910.12027, 2019. URL http://arxiv.org/abs/1910.12027.
  • Zhao et al. [2018] Yue Zhao, Meng Li, Liangzhen Lai, Naveen Suda, Damon Civin, and Vikas Chandra. Federated learning with non-iid data. CoRR, abs/1806.00582, 2018. URL http://arxiv.org/abs/1806.00582.

Appendix

A Backgrounds and Related Works

A 1.1 Classical Federated Learning

Federated Learning (FL) offers a simple and efficient method for privacy-protected distributed learning. In classic FL, local data collections are often non-IID, unbalanced, and widely distributed, typically with limited communication bandwidth McMahan et al. [2016]. To reduce communication overhead and maintain privacy, the Federated Averaging (FedAvg) algorithm was proposed. FedAvg aggregates gradient updates from different users by combining several local updates in a single round of communication Konečný et al. [2016].
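For concreteness, a minimal sketch of the FedAvg aggregation step is shown below: the server forms a data-size-weighted average of the client model weights after each round of local updates (our illustration of the standard procedure, not a specific library API).

# Minimal sketch of FedAvg server aggregation over client model state_dicts.
import torch

def fedavg_aggregate(client_states, client_sizes):
    total = float(sum(client_sizes))
    averaged = {}
    for key in client_states[0]:
        averaged[key] = sum((n / total) * state[key].float()
                            for state, n in zip(client_states, client_sizes))
    return averaged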

The convergence of FedAvg for non-IID data is discussed in Li et al. [2020b]. However, as shown in Zhao et al. [2018], the accuracy loss of FedAvg can be up to about 55% in a non-IID setup for some datasets. How to effectively handle non-IID data distributions among different users remains an open FL question. FedProx Li et al. [2020a] was proposed as a generalization of FedAvg for heterogeneous networks. SCAFFOLD is another extension of FedAvg that incorporates predictive variance reduction techniques Karimireddy et al. [2020]. In Luo et al. [2021], the authors propose a classifier calibration with virtual representations, leveraging a Gaussian Mixture Model to sample virtual features. As an alternative solution, the authors of Yoshida et al. [2020] proposed raw data sharing in a cooperative learning mechanism. In Li et al. [2020b], the authors proposed the use of some initial training data, e.g., 5%, to handle non-IID data with traditional FL algorithms. Some other classic methods for non-IID data include GAN sharing Li et al. [2022a, b], Cao et al. [2023], synthetic data sharing Hao et al. [2021], Jeong et al. [2018], and global sub-dataset sharing Zhao et al. [2018].

A 1.2 GAN-based FL for IID and Non-IID Clients

Training GANs locally before sharing the trained models with a centralized server is a popular approach for handling both IID and non-IID data distributions. In Zhang et al. [2019], the authors applied a conditional GAN (cGAN) and shared the local classifier and generator with the central server, which trains a global classifier and a generator to guide local users. Applying a similar concept, the authors of Wu et al. [2021] proposed to incorporate model splitting, keeping part of the cGAN (the discriminator) and part of the hidden classifier local while sharing the generator and a global classifier. In addition, as introduced in Li et al. [2022b], Cao et al. [2023], one can share the entire local GAN with the server and create a synthetic dataset to train a global GAN. Similarly, full GAN sharing has been discussed in Li et al. [2022a], where only the shared generators are aggregated using the maximum mean discrepancy. To address privacy concerns, DP Dwork [2006] has been applied in GAN-based FL Xu et al. [2019], Li et al. [2022b], where a privacy budget is introduced to balance privacy and utility.

B Theoretical Analysis

We provide convergence guarantees of the proposed method and provide insights into why attacking is hard. First, to observe the convergence of two distributed generators with a shared discriminator trained according to the proposed Algorithm 1, consider a GAN at the local user side with a generator G_{u} capturing a probability distribution p_{gu}, the corresponding discriminator D_{u}, and the dedicated generator G_{s} at the server side. Assume the user’s data distribution is p_{data}(x) and the local user trains G_{u} with z\sim p_{z}.

B 1.1 Convergence of D_{u}

Proposition 1. For two generators G_{u} and G_{s} trained via Algorithm 1 with a shared discriminator D_{u}, the discriminator converges to the same optimal discriminator D_{u}^{*} as in Goodfellow et al. [2014], and it uniquely corresponds to the given G_{u}, i.e.,

D_{u}^{*}=\frac{p_{data}(x)}{p_{data}(x)+p_{gu}(x)}. (3)

Proof: In the proposed PS-FedGAN, Algorithm 1 trains D_{u} using G_{u} while G_{s} has no influence on D_{u}. Therefore, as illustrated in Goodfellow et al. [2014], the discriminator training is valid with the value function V(G_{u},D_{u}) denoted by

V(G_{u},D_{u})=\int_{x}p_{data}\log(D_{u}(x))\,dx+\int_{z}p_{z}(z)\log(1-D_{u}(G_{u}(z)))\,dz. (4)

Using the Radon-Nikodym theorem, we have the following conclusions:

E_{z\sim p_{z}(z)}\log(1-D_{u}(G_{u}(z)))=E_{x\sim p_{gu}(x)}\log(1-D_{u}(x)). (5)

Then, the value function can be recalculated as

V(G_{u},D_{u})=\int_{x}p_{data}\log(D(x))+p_{gu}(x)\log(1-D(x))\,dx. (6)

Let g(x)=a\log(x)+b\log(1-x). The derivatives for stationary points can be calculated as

\frac{\partial g(x)}{\partial x}=\frac{a}{x}-\frac{b}{1-x}, (7)

and

\frac{\partial^{2}g(x)}{\partial x^{2}}=-\frac{a}{x^{2}}-\frac{b}{(1-x)^{2}}, (8)

which give the unique maximizer x=\frac{a}{a+b} with \frac{\partial^{2}g(x)}{\partial x^{2}}<0 for a,b\in(0,1).

Hence, the optimal D_{u}^{*} can be calculated as

D_{u}^{*}=\frac{p_{data}(x)}{p_{data}(x)+p_{gu}(x)}. (9)

Here, the unique maximizer implies the uniqueness of D_{u}^{*} for a given G_{u}.
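As a quick numerical illustration (not part of the proof), one can verify that g(x)=a\log(x)+b\log(1-x) indeed peaks at x=a/(a+b):

# Numerical sanity check that g(x) = a*log(x) + b*log(1-x) peaks at x = a/(a+b).
import numpy as np

a, b = 0.3, 0.7
x = np.linspace(1e-6, 1 - 1e-6, 100000)
g = a * np.log(x) + b * np.log(1 - x)
print(x[np.argmax(g)], a / (a + b))   # both approximately 0.3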

B 1.2 Convergence of G_{u} and G_{s}

Proposition 2. Two generators G_{u} and G_{s} trained according to Algorithm 1 with a shared D_{u} would converge to a unique G^{*}=G_{u}^{*}=G_{s}^{*} which captures p_{data}.

Proof: Following Proposition 1, Algorithm 1 converges to D_{u}^{*}. Therefore, in the training of the generators, we have the following optimization formulations, i.e.,

G_{u}^{*}=\operatorname*{arg\,min}_{G_{u}}V(G_{u},D_{u}^{*}), (10)

and

G_{s}^{*}=\operatorname*{arg\,min}_{G_{s}}V(G_{s},D_{u}^{*}). (11)

According to Algorithm 1, after each epoch of training of G_{u} and G_{s}, we have G_{u}^{\prime}=G_{s}^{\prime}. This is because the inputs to both networks and their initial weights are the same, leading to the same loss and the same gradients in backpropagation. Therefore, the training of G_{s} can be viewed as traditional GAN training with G_{s} and D_{u}. Since z is shared in the communication, we have

V(G_{s},D_{u})=\int_{x}p_{data}(x)\log(D_{u}(x))\,dx+\int_{z}p_{z}(z)\log(1-D_{u}(G_{s}(z)))\,dz. (12)

Moreover, if G_{s} captures a distribution p_{gs}, from Eq. (12), we have

V(G_{s},D_{u})=\int_{x}p_{data}(x)\log(D_{u}(x))\,dx+\int_{z}p_{z}(z)\log(1-D_{u}(G_{u}(z)))\,dz, (13)

which could be further calculated via

V(G_{s}=G_{u},D_{u})=\int_{x}p_{data}(x)\log(D(x))+p_{gs}(x)\log(1-D(x))\,dx. (14)

The uniqueness of D_{u}^{*} in Eq. (6) and Eq. (14) leads to p_{gu}=p_{gs}. Thus, G_{s} captures the same distribution as G_{u}. Hence, we can train two generators to capture the same distribution by only sharing a discriminator, without sharing original data explicitly. If G_{u} achieves G_{u}^{*}, we have G_{u}^{*}=G_{s}^{*}. Then the optimization problems in Eq. (10) and Eq. (11) reduce to

G^{*}=G_{u}^{*}=G_{s}^{*}=\operatorname*{arg\,min}_{G_{u}}V(G_{u},D_{u}^{*}). (15)

In order to prove that G_{s}^{*} captures p_{data}, we refer to Goodfellow et al. [2014]. For G^{*}, from the viewpoint of game theory, D_{u} fails to distinguish between true and fake data. Then, we have

D_{u}^{*}=\frac{p_{data}}{p_{data}+p_{gu}}=\frac{1}{2}, (16)

which leads to p_{gu}=p_{data}. Now, we have p_{gs}=p_{data}. Thus, the server generator also captures the data distribution of the corresponding local user. Hence, G^{*} captures p_{data}. The uniqueness of G^{*} follows the same conclusion from Goodfellow et al. [2014]. To prove this, we first assume that p_{gu}=p_{data}. Then the value in Eq. (14) can be calculated as

V(G,D_{u}^{*}) = \int_{x}p_{data}(x)\log(D_{u}^{*})+p_{gs}(x)\log(1-D_{u}^{*})\,dx (17)
= \int_{x}p_{data}(x)\log(0.5)+p_{gs}(x)\log(0.5)\,dx (18)
= \log(0.5)\int_{x}\left(p_{data}(x)+p_{gs}(x)\right)dx (19)
= \log\left(\frac{1}{4}\right)=-\log(4). (20)

This provides us with the global minimum. On the other hand, for any G and its corresponding optimal D_{u}^{*}, let M(G)=\max_{D}V(G,D). Then, we have

M(G) = \max_{D}V(G,D) (21)
= \int_{x}p_{data}(x)\log(D_{u})+p_{gs}(x)\log(1-D_{u})\,dx (22)
= \int_{x}p_{data}(x)\log\left(\frac{p_{data}}{p_{data}+p_{gu}}\right)+p_{gs}(x)\log\left(1-\frac{p_{data}}{p_{data}+p_{gu}}\right)dx (23)
= \int_{x}p_{data}(x)\log\left(\frac{p_{data}}{p_{data}+p_{gu}}\right)+p_{gs}(x)\log\left(\frac{p_{gu}}{p_{data}+p_{gu}}\right)dx. (24)

After some manipulations, M(G) can be calculated as

M(G) = -\log(4)+\int_{x}p_{data}(x)\log\left(\frac{p_{data}}{(p_{data}+p_{gu})/2}\right)+p_{gs}(x)\log\left(\frac{p_{gu}}{(p_{data}+p_{gu})/2}\right)dx (25)
= -\log(4)+2\,JSD(p_{data}\,\|\,p_{G}), (26)

where JSD is the Jensen-Shannon divergence, which is non-negative. Hence, M(G) attains its global minimum -\log(4) only when the JSD term vanishes, i.e., p_{data}=p_{gu}, and the uniqueness of G^{*} is proved.
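As an illustrative numerical check of Eq. (26) for discrete distributions, plugging the optimal discriminator into the value function reproduces -\log(4)+2\,JSD(p_{data}\,\|\,p_{g}):

# Numerical check of Eq. (26): max_D V(G, D) = -log(4) + 2 * JSD(p_data || p_g).
import numpy as np

def value_at_optimal_D(p_data, p_g):
    d_star = p_data / (p_data + p_g)
    return np.sum(p_data * np.log(d_star) + p_g * np.log(1.0 - d_star))

def jsd(p, q):
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p_data = np.array([0.1, 0.2, 0.3, 0.4])
p_g = np.array([0.25, 0.25, 0.25, 0.25])
print(value_at_optimal_D(p_data, p_g), -np.log(4.0) + 2.0 * jsd(p_data, p_g))
# the two printed values agree up to floating-point error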

B 1.3 Divergence of any G other than G_{u} or G_{s}

Proposition 3. Any other generator G_{\mathcal{A}} failing to capture the weights and architecture of G_{s} or G_{u}, either in the initial state or in any single communication round, would fail to characterize the data distribution p_{data}.

Proof: From Proposition 1 and Proposition 2, we have p_{gu}=p_{data} and p_{gs}=p_{data}, which are made possible only by a unique G^{*} and D_{u}^{*}. As shown before, to obtain G^{*}=G_{u}^{*}=G_{s}^{*}, we need to have G_{u}^{\prime}=G_{s}^{\prime} at each step, implying that G needs to capture all communication rounds. Therefore, only a generator satisfying these conditions can capture p_{data}.

C Results

In this section, we provide more results and evaluations for PS-FedGAN.

C 1.1 Utility

C 1.1.1 Classification Accuracy

Table 5: Classification accuracies of existing FL methods and PS-FedGAN in Split-3. We use the following abbreviations. FedAvg: FA, FedProx: FP, SCAFFOLD: SD, Naivemix: NM, FedMix: FM, SDA-FL: SL, PS-FedGAN: PS, Sharing full cGAN: FG
Method MNIST FMNIST SVHN CIFAR10
FA 98.42 82.47 84.18 79.33
FP 98.38 83.43 92.15 79.54
SD 96.89 77.68 80.13 79.35
NM 98.11 82.09 92.30 78.92
FM 98.46 84.65 92.61 79.49
SL 98.50 87.06 93.16 84.56
FG 98.46 89.26 93.05 82.09
PS 98.54 89.28 93.23 82.05

Table 5 presents the classification accuracies of the existing FL methods compared with our proposed PS-FedGAN for Split-3. We observe a similar trend to the previous results in Table 1. For the MNIST, FMNIST, and SVHN datasets, we keep 1% of the original data available on the server. For CIFAR10, we consider 10% of the original data at the cloud server. Considering that the other evaluated methods may have more original data to train the classifier, this ratio is reasonable in Split-3. As shown by the results, our proposed method achieves superior or competitive performance compared to most of the SOTA methods, matching the previous conclusions for Split-1/2.

C 1.1.2 Convergence of Cloud Generators

As an example to illustrate the generator convergence, we plot the cloud generator loss of all 10 users under Split-1 in Figure 5. As shown by the results, our proposed PS-FedGAN converges well for all users, which further demonstrates the practicality of discriminator sharing.

Refer to caption
Figure 5: Generator loss of all the cloud generators in Split-1 for MNIST dataset.

C 1.1.3 Divergence of Attacker’s Generator

As shown in Figure 6, if the attacker (\mathcal{A}_{1}) deviates from the actual generator weights, it converges to a trivial point. Hence, almost no underlying user data properties are captured by the attacker.

Refer to caption
Figure 6: Divergence of the attacker. Any attacker that fails to initialize with the same parameters as the optimal generators, i.e., G_{u} or G_{s}, would fail to capture the local user’s data distribution.

C 1.2 \mathcal{A}_{1} Performance

In this section, we provide more visual results on the robustness of the proposed method against attacks by \mathcal{A}_{1}. Figure 7 illustrates the cloud-generated images compared to \mathcal{A}_{1}-generated images for the SVHN dataset for three users at an intermediate step (epoch 20), while Figure 8 shows the regenerated samples for the FMNIST dataset. As visualized, \mathcal{A}_{1} fails to capture the underlying data statistics or to regenerate meaningful synthetic data samples.

Refer to caption
Figure 7: Cloud-generated images compared to \mathcal{A}_{1}-generated images for Split-3 on the SVHN dataset.
Refer to caption
Figure 8: Cloud-generated images compared to \mathcal{A}_{1}-generated images for Split-3 on the FMNIST dataset.