
Kick Bad Guys Out! Conditionally Activated Anomaly Detection in Federated Learning with Zero-Knowledge Proof Verification

Shanshan Han
University of California, Irvine
[email protected]
&Wenxuan Wu
Texas A&M University
[email protected]
&Baturalp Buyukates
University of Birmingham
[email protected]
\ANDWeizhao Jin
University of Southern California
[email protected]
&Qifan Zhang
University of California, Irvine
[email protected]
&Yuhang Yao
Carnegie Mellon University
[email protected]
&Salman Avestimehr
University of Southern California
[email protected]
&Chaoyang He
TensorOpera Inc.
[email protected]
Abstract

Federated Learning (FL) systems are susceptible to adversarial attacks, where malicious clients submit poisoned models to disrupt the convergence of the global model or plant backdoors that cause it to misclassify certain samples. Current defense methods are often impractical for real-world FL systems, as they either rely on unrealistic prior knowledge or cause accuracy loss even in the absence of attacks. Furthermore, these methods lack a protocol for verifying their execution, leaving participants uncertain about whether the defense was carried out correctly. To address these challenges, we propose a novel anomaly detection strategy designed for real-world FL systems. Our approach activates the defense only when potential attacks are detected, and enables the removal of malicious models without affecting benign ones. Additionally, we incorporate zero-knowledge proofs to ensure the integrity of the proposed defense mechanism. Experimental results demonstrate the effectiveness of our approach in enhancing FL system security against a comprehensive set of adversarial attacks across various ML tasks.

1 Introduction

Federated Learning (FL) [43] enables clients to collaboratively train machine learning models without sharing their local data with other parties. Due to its privacy-preserving nature, FL has attracted considerable attention across various domains in real-world applications [31, 14, 49, 39, 8, 16]. Even though FL does not require clients to share their raw data with other parties, its collaborative nature inadvertently introduces privacy and security vulnerabilities [10, 5, 38, 34, 56, 15, 54, 36, 66]. Malicious clients can harm training by submitting corrupted model updates to disrupt global model convergence [20, 15], or by planting backdoors that cause the global model to perform poorly on certain data [3, 2, 57].

Existing literature on defenses in FL comes with inherent limitations that make them unsuitable for real-world FL systems [7, 62, 24, 47, 33, 12, 35, 53, 23, 46, 52]. Some strategies require prior knowledge of the number of malicious clients within the FL system [7], while in practice adversaries would not announce their malicious intentions before attacking. Other defense strategies mitigate the impact of potentially malicious client submissions by leveraging methods that inevitably alter the aggregation results, such as re-weighting the local models [24], modifying the aggregation function [47], or removing local models that tend to be poisoned [7]. However, in practical FL systems, attacks happen infrequently. While the aforementioned defenses can mitigate the impact of potential malicious clients, the performance loss they introduce can outweigh the defense gain, as most real-world training cases are benign and these defenses largely compromise model quality in all benign cases. Moreover, existing defense mechanisms are deployed on FL servers without any verification of their execution. As a result, clients are unable to verify whether the defense mechanism was executed accurately and correctly, leaving them reliant on the server's integrity and undermining trust in real-world FL systems.

Motivated by these observations, a successful anomaly detection approach should simultaneously satisfy the following: i) detectability: it should be capable of detecting potential attacks and responding only when such threats are likely to occur; ii) identifiability: if an attack is detected, the strategy should further identify the malicious client models and mitigate (or eliminate) their adversarial impact without harming the benign ones; and iii) verifiability: the defense should be paired with a verification mechanism that ensures its correct execution, such that clients can trust the FL system without relying solely on the server's goodwill.

Figure 1: Overview of the proposed anomaly detection mechanism.
Table 1: Comparison among our method and state-of-the-art techniques (Krum, RFA, Foolsgold, NormClip, Bucketing, Median, TrimMean) along the following attributes: detecting the presence of attacks, removing malicious models, freedom from impractical prior knowledge, freedom from reweighting, freedom from modifying the aggregation function, freedom from harming benign models, decent results in non-attack cases, and verification of correct execution.

This paper proposes a two-stage defense for anomaly detection that filters out malicious client models in each FL training round while addressing the challenges of real-world FL systems. In the first stage, our approach detects the potential existence of malicious clients in the current FL round using cross-round detection. The potential presence of malicious clients activates the second stage, cross-client detection, which evaluates the degree of evilness of each local model and filters out malicious ones based on the intuition of the $3\sigma$ rule [48]. Our mechanism integrates a robust verification protocol that utilizes Zero-Knowledge Proofs (ZKPs) [26] to guarantee the integrity and honest execution of the proposed defense on the FL server. We overview our mechanism in Figure 1 and compare our approach with state-of-the-art methods, including Krum [7], RFA [47], Foolsgold [24], NormClip [53], Bucketing [35], Coordinate-Wise Median [63], and Trimmed Mean [63], in Table 1. Our contributions are listed below:

i) Real-world applicability. Our method is designed to meet the practical requirements of defenses in real-world FL applications. To the best of our knowledge, we are the first to close the significant gap between theoretical research and real-world applicability in FL security.

ii) Utility and practicability. Our method requires no unrealistic prior information, yet it can still detect and eliminate the impact of malicious client models without harming the benign ones. This demonstrates its applicability and effectiveness in real-world FL systems, where attacks occur only rarely.

iii) Conditional activation. We propose a two-stage detection method that first identifies suspicious models and then, if necessary, triggers a double-check of the local models, thereby avoiding unnecessary accuracy loss caused by introducing a defense mechanism.

iv) Accuracy preservation. Our method preserves accuracy in attack-free situations, which is essential due to the infrequent occurrence of attacks in real-world scenarios.

v) Identifiability. Our approach removes malicious local models with high accuracy without harming the benign models or modifying the aggregation function.

vi) Verifiability. To foster trust in FL systems, we leverage ZKPs, enabling clients to independently verify the correct execution of the proposed defense mechanism on the server without relying solely on the server’s goodwill.

2 Problem Setting

2.1 Adversary Model

We consider an FL system in which at least 50% of the clients are benign. Some clients may be adversarial and can conduct attacks to achieve malicious goals such as i) planting a backdoor so that the global model misclassifies a specific set of samples while the overall model performance is minimally impacted (backdoor attacks, e.g., [3, 57]); ii) altering local models to prevent the global model from converging (Byzantine attacks, e.g., [15, 20]); and iii) cheating the FL server by submitting contrived models without actual training (free riders, e.g., [58]). We assume the FL server is not fully trusted due to the complex execution environments of real-world FL systems. The FL clients know that the server will conduct a defense, but they are unsure whether the server has conducted it correctly and would like to verify its integrity without depending solely on the server's goodwill. We assume that the adversaries cannot conduct adaptive attacks, and we discuss the extension of our approach to adaptive attacks in §3.4.

2.2 Preliminaries

Federated Learning (FL). FL [43] enables training models across decentralized devices without centralizing data. It is beneficial when dealing with sensitive data, as it allows data to remain on its original device during training.

Krum. Krum and $m$-Krum [7] select one or $m$ local models, respectively, that deviate less from the majority (evaluated using pairwise distances) for aggregation. See Appendix A.1 for details.

$3\sigma$ Rule. The $3\sigma$ rule [48] is an empirical rule that has been utilized in anomaly detection for data management [30]. It states that the percentages of values within one, two, and three standard deviations of the mean are 68%, 95%, and 99.7%, respectively. The rule is widely applicable to real-world applications, as normal distributions are consistent with many real-world data distributions [42]. Moreover, when data is not normally distributed, it can be transformed to a normal distribution [1, 45, 51, 59].

Zero-Knowledge Proofs (ZKPs). A ZKP [26] is a proof system enabling a prover to convince a verifier that a function has been correctly computed on the prover's secret input (witness). ZKPs have three properties: i) correctness: if the prover is honest, the proof it produces passes verification (integrity property); ii) soundness: a cheating prover can convince the verifier only with negligible probability; and iii) zero-knowledge: the verifier learns nothing about the prover's witness (privacy property).

3 Two-Stage Anomaly Detection Mechanism

We propose a two-stage anomaly detection mechanism to identify and filter out malicious local models on the server. This mechanism is executed in each FL round after the server collects local models from the clients. The server first performs a cross-round check that leverages cached models, which we call reference models, to assess the likelihood of the presence of any malicious clients. At this stage, the server does not remove any local models. If potentially malicious clients are detected, the server subsequently conducts a cross-client detection to analyze each local model and assess its degree of evilness. Based on this evaluation, the server identifies and excludes the malicious models from aggregation.

3.1 Cross-Round Detection

To assess the likelihood of the potential presence of malicious clients, the FL server computes similarities between the local models of the current FL round and certain golden-truth reference models cached from the previous FL round. Local models with higher similarities to the reference models are less likely to be malicious and thus more likely to be benign.

Inputs: $\tau$: training round id, $\tau = 0, 1, 2, \ldots$; $\mathcal{W}^{(\tau)}$: client models of round $\tau$; $\gamma$: upper bound on the similarities of malicious client models.
function cross_round_check($\mathcal{W}^{(\tau)}$, $\tau$, $\gamma$):
    if $\tau = 0$ then return True
    $\mathcal{W}^{(\tau-1)} \leftarrow$ get_cached_client_models(), $\mathbf{w}_g^{\tau-1} \leftarrow$ get_global_model_of_last_round()
    for $\mathbf{w}_i^{\tau} \in \mathcal{W}^{(\tau)}$ do
        $S_c(\mathbf{w}_i^{\tau-1}, \mathbf{w}_i^{\tau}) \leftarrow$ get_similarity($\mathbf{w}_i^{\tau-1}$, $\mathbf{w}_i^{\tau}$), $S_c(\mathbf{w}_g^{\tau-1}, \mathbf{w}_i^{\tau}) \leftarrow$ get_similarity($\mathbf{w}_g^{\tau-1}$, $\mathbf{w}_i^{\tau}$)
        if $S_c(\mathbf{w}_g^{\tau-1}, \mathbf{w}_i^{\tau}) < \gamma$ or $S_c(\mathbf{w}_i^{\tau-1}, \mathbf{w}_i^{\tau}) < \gamma$ then return True  ▷ There may be attacks
    return False  ▷ No attack
Algorithm 1: Cross-Round Detection

We present the intuitive idea in Figure 2. Inspired by the state-of-the-art [24], we utilize the cosine score to compute model similarities. For each local model $\mathbf{w}_i$ and its reference model $\mathbf{w}_r$, the cosine similarity is computed as $S_c(\mathbf{w}_i, \mathbf{w}_r) = \frac{\mathbf{w}_i \cdot \mathbf{w}_r}{\|\mathbf{w}_i\| \, \|\mathbf{w}_r\|}$. We expect the cosine similarity between each local model and its reference model to be high, since a higher cosine similarity indicates that the local model is closer to the golden-truth reference model and thus more likely to be benign. Conversely, lower cosine similarities indicate that an attack is more likely to have occurred on that client in the current FL training round, as malicious clients may submit arbitrary or tampered local models [3, 57, 15, 20], making their local models diverge from the reference models.
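For concreteness, the following is a minimal PyTorch-style sketch (not the authors' released code) of this similarity computation; treating client models as `state_dict`-shaped mappings is an assumption for illustration.

```python
import torch

def flatten_params(state_dict):
    """Concatenate every parameter tensor of a model into one 1-D vector."""
    return torch.cat([p.detach().float().reshape(-1) for p in state_dict.values()])

def cosine_score(w_i, w_r):
    """Cosine similarity S_c between a local model and its reference model."""
    a, b = flatten_params(w_i), flatten_params(w_r)
    return (torch.dot(a, b) / (a.norm() * b.norm() + 1e-12)).item()
```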

Figure 2: Cosine similarities. ① indicates likely benign models with high cosine similarity, and ② indicates likely malicious models with low cosine similarity.

We select reference models based on the characteristics of the attacks that are widely considered in both the literature and real-world systems, i.e., Byzantine attacks [15, 20] and backdoor attacks [3, 57]. For each local model in the current FL training round, we utilize two types of reference models: i) the global model from the previous FL training round, to identify whether the current local model deviates significantly from it and could potentially prevent the global model from converging; and ii) the local model of the same client from the previous FL training round, to detect whether the local models submitted by the same client differ significantly across consecutive rounds, which can indicate that a client was benign in the previous round but turned malicious in the current one. We note that although we use cosine similarity to compute a bound, our method does not rely heavily on it. In this stage, our method flags suspicious models as potentially malicious but does not remove them; the decision to remove them is made in the second stage of the proposed approach.

Cross-Round Detection Algorithm. We present the cross-round detection algorithm in Algorithm 1. Initially, the server loads the reference models, including the global model from the last FL round and the cached local models deemed benign in the previous round. For each FL round $\tau$, we denote the global model of the previous round by $\mathbf{w}_g^{\tau-1}$. We let $\mathbf{w}_i^{\tau}$ denote the local model submitted by client $\mathcal{C}_i$ in the current round $\tau$, and $\mathbf{w}_i^{\tau-1}$ denote that client's cached local model from the previous round. The algorithm computes the similarities $S_c(\mathbf{w}_i^{\tau}, \mathbf{w}_g^{\tau-1})$ and $S_c(\mathbf{w}_i^{\tau}, \mathbf{w}_i^{\tau-1})$ and uses these scores, together with a threshold $\gamma$ ($-1 < \gamma < 1$), to detect whether potential attacks have happened in the current FL training round. Any similarity score lower than $\gamma$ signals that the corresponding client might be malicious and triggers further inspection of the client models in the second stage of our anomaly detection approach, as described in §3.2.
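A hedged Python sketch of the cross-round check, mirroring Algorithm 1, is given below; it reuses `cosine_score` from the earlier sketch, and the caching arguments (`cached_locals`, `global_prev`) are assumptions about how the server's reference-model bookkeeping might look.

```python
def cross_round_check(local_models, cached_locals, global_prev, round_id, gamma=0.5):
    """Return True if the current round may contain malicious clients."""
    if round_id == 0:
        return True  # no reference models yet, so run the second-stage check
    for client_id, w in local_models.items():
        # Compare against last round's global model ...
        if cosine_score(w, global_prev) < gamma:
            return True  # potential attack detected
        # ... and against the same client's cached local model, if available.
        if client_id in cached_locals and cosine_score(w, cached_locals[client_id]) < gamma:
            return True  # potential attack detected
    return False  # no attack suspected; cross-client detection is skipped
```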

3.2 Cross-Client Detection

Inputs: $\tau$: training round id, $\tau = 0, 1, 2, \ldots$; $\mathcal{W}$: local models of a training round; $m$: $m$-Krum parameter.
function Cross_Client_Detection($\mathcal{W}$, $\tau$):
    if $\tau = 0$ then
        $m \leftarrow |\mathcal{W}|/2$, $f \leftarrow |\mathcal{W}|/2$, $\mathbf{w}_{\text{avg}} \leftarrow$ Krum_and_m_Krum($\mathcal{W}$, $m$, $f$)
    $\mathcal{L} \leftarrow$ compute_L2_scores($\mathcal{W}$, $\mathbf{w}_{\text{avg}}$)
    $\mu \leftarrow \frac{\sum_{\ell \in \mathcal{L}} \ell}{|\mathcal{L}|}$, $\sigma \leftarrow \sqrt{\frac{\sum_{\ell \in \mathcal{L}} (\ell - \mu)^2}{|\mathcal{L}| - 1}}$  ▷ Compute $\mathcal{N}(\mu, \sigma)$
    for $0 \leq i < |\mathcal{W}|$ do
        if $\mathcal{L}[i] > \mu + \lambda\sigma$ then remove $\mathbf{w}_i$ from $\mathcal{W}$
    $\mathbf{w}_{\text{avg}} \leftarrow$ average($\mathcal{W}$)  ▷ Cache $\mathbf{w}_{\text{avg}}$
    return $\mathcal{W}$
Algorithm 2: Cross-Client Detection

Cross-client detection computes a score for each local model to evaluate its degree of evilness, and utilizes the $3\sigma$ rule to filter out the local models with higher degrees of evilness, i.e., the malicious models. The $3\sigma$ rule is pivotal for three reasons: i) when the client datasets are i.i.d., the parameters of the local models follow a normal distribution [4, 15, 63]; ii) according to the Central Limit Theorem (CLT) [50], when client datasets are non-i.i.d., the local models still converge towards a normal distribution, especially when the number of clients is at least 30 [13, 44]; and iii) even when the CLT does not hold strongly (e.g., the number of clients is lower than 30), previous works show that the local models still exhibit certain statistical features [35, 47], so the $3\sigma$ rule can still be applied to derive analytics from the local models.

Let $\mathcal{L}$ denote the degrees of evilness of the client models in the current FL round, where a higher score indicates a higher probability that the corresponding client is malicious. Suppose $\mathcal{L}$ follows a normal distribution $\mathcal{N}(\mu, \sigma)$, where $\mu$ is the mean and $\sigma$ is the standard deviation. We then have the following definition.

Definition 3.1.

Local models with a degree of evilness higher than $\mu + \lambda\sigma$ are identified as malicious local models, where $\lambda$ ($\lambda > 0$) adjusts the sensitivity of the detection boundary.

According to Definition 3.1, local models whose degree of evilness exceeds this boundary are detected as malicious and are excluded from aggregation. We note that we only use the upper bound given by the $3\sigma$ rule: models with a degree of evilness below $\mu + \lambda\sigma$ are not identified as outliers, even if their evilness is far below the mean, since lower evilness is preferred.

The details are described in Algorithm 2. In this paper, we use $L_2$ distances to compute the degree of evilness. For each local model, we compute its score using that model and the average model from the previous round, denoted $\mathbf{w}_{\text{avg}}$. We prefer that a local model does not deviate significantly from the previous round's average model, which serves as a golden truth. For each local model $\mathbf{w}_i$ in the current round, its $L_2$ score, denoted $\mathcal{L}[i]$, is computed as $\mathcal{L}[i] = \|\mathbf{w}_i - \mathbf{w}_{\text{avg}}\|$. Since the first round has no average model to use as a reference, to avoid including any malicious models in the first round's aggregation, we use $m$-Krum [7] to compute an approximate average model. In $m$-Krum, it is ideal to include as many benign local models as possible while keeping the approximate average model unpolluted by malicious ones. As the FL server does not know the number of potentially malicious clients, we set $m = |\mathcal{W}|/2$, based on the assumption that the number of malicious clients is less than $|\mathcal{W}|/2$, where $|\mathcal{W}|$ is the number of clients in each FL round. In later training rounds, $m$-Krum is unnecessary, as we simply use the average model from the previous round.
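A hedged sketch of this stage is shown below; it reuses `flatten_params` from the earlier sketch, passes the cached average model of the previous round as `w_avg_prev`, and omits the $m$-Krum surrogate used in round 0.

```python
import statistics
import torch

def cross_client_detection(local_models, w_avg_prev, lam=0.5):
    """Drop local models whose L2 'degree of evilness' exceeds mu + lam * sigma."""
    ref = flatten_params(w_avg_prev)
    scores = {cid: torch.norm(flatten_params(w) - ref).item()
              for cid, w in local_models.items()}
    mu = statistics.mean(scores.values())
    sigma = statistics.stdev(scores.values())      # sample standard deviation
    bound = mu + lam * sigma                       # one-sided 3-sigma-style bound
    return {cid: w for cid, w in local_models.items() if scores[cid] <= bound}
```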

3.3 Optimizations for Reference Models

So far, the server stores the complete client models and the updated global model at the end of each FL round as reference models for the next round. However, this approach encounters pragmatic challenges in real-world deployments for the following reasons: i) Storage constraints: real-world FL systems often have complex execution environments and restricted storage, which necessitates optimizing the algorithm for storage and computation efficiency; ii) Computational overhead: incorporating a ZKP for validation after each FL round (discussed in §4) is computationally intensive [25]. Utilizing the entire collection of client/global models for this computation significantly increases resource consumption and prolongs the verification time in each FL round. Meanwhile, the FL system must await the completion of this verification before continuing subsequent operations, which detrimentally impacts the experience of the FL clients.

In light of these challenges, we propose using only segmental models instead of entire models as reference models. The selected fragment should satisfy the following criteria: i) it should sufficiently represent the full model while minimizing the fragment size, ideally using just one layer of the original model; ii) the selection mechanism must be generally applicable in real-world systems and independent of specific data distributions or model structures. Following the terminology in [24], we call such a layer an importance layer. We note that such a layer is not required to contain the maximal information compared to the other layers of the same model, but it should be more informative than the majority of them. Intuitively, we select the second-to-last layer as the importance layer, as it is close to the output layer and thus retains substantial information. This choice reduces complexity effectively, especially for ZKP-related computations. As an example, the second-to-last layer of the CNN contains only 7,936 parameters, compared to the full model size of 1,199,882 parameters. We experimentally validate our importance layer selection in Exp 1 in §5.
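A minimal sketch of extracting such a fragment is given below; the index arithmetic is only an assumption about how the second-to-last layer maps onto the parameter ordering of a `state_dict` and would need to be adapted to the concrete architecture.

```python
def importance_layer(state_dict):
    """Return only the parameters of the (assumed) second-to-last layer."""
    keys = list(state_dict.keys())
    # Heuristic: skip the final layer's weight and bias and take the preceding weight tensor.
    key = keys[-4] if len(keys) >= 4 else keys[0]
    return {key: state_dict[key]}
```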

3.4 Discussions on extensions to adaptive attacks

In this work, we focus on non-adaptive adversaries, a common assumption in the literature [46, 29, 47, 35, 63] that enables us to create a baseline for anomaly detection and establish the robustness of our model under basic adversarial settings. While adaptive adversaries present a more challenging scenario, addressing them would introduce complexities that are beyond the scope of this initial work.

We acknowledge the importance of addressing adaptive attacks in FL systems. Below we discuss extensions of our approach to adaptive attacks. In the presence of adaptive attackers, malicious clients can craft their models based on their knowledge of the global model, their local model from the last FL iteration, and the cosine similarity threshold used in our defense. By carefully modifying their local models to ensure the cosine similarity falls within the threshold, malicious models may appear benign and survive the detection. To solve this problem, our approach can be extended from the following two directions: 1) regularizing local models before measuring cosine similarity, such that the adversaries cannot craft their local models based on their known information; and 2) diversifying the layers used for cross-round detection instead of just relying on a single layer, thus it would be difficult for adversaries to predict and modify their local models accordingly.

4 Verifiable Anomaly Detection

Figure 3: ZKP circuits for the proposed two-stage anomaly detection mechanism.

Our method incorporates a verification module to enhance trust and privacy within the FL system. Ideally, the verification module should have the following features: i) client-side verification: it should enable clients who have concerns about the integrity of the FL server to verify the correct execution of the defense mechanism, without relying solely on the server's goodwill; and ii) privacy protection: it should not require clients to access information they are not entitled to, such as local models from other clients, thus preserving privacy and integrity in the FL system.

We incorporate ZKPs to ensure that clients can verify the integrity of the defense without accessing other clients' local models. We utilize zkSNARKs [6], which offer constant proof sizes and constant verification time regardless of the size of the computation. This property is crucial for applications where the verifier's (i.e., an FL client's) resources are limited, as in real-world FL systems. We design the ZKP circuits as shown in Figure 3. Implementation details are in Appendix A.4.

ZKP for Anomaly Detection. Most of the computations in Algorithm 1 and Algorithm 2 are linear and can easily be compiled into an arithmetic circuit; e.g., computing the cosine similarity between two matrices of size $n \times n$ requires a circuit with $O(n^2)$ multiplication gates and one division. While it is difficult to compute division directly in a circuit, it can easily be verified when the prover provides the pre-computed quotient and remainder. Similar to [60], we use Freivalds' algorithm [22] to verify matrix multiplications, which constitute the basis of the verification schemes for the proposed mechanism. Naively verifying a matrix multiplication $AB = C$, where $A, B, C$ are of size $n \times n$, requires proving the computation step by step with $O(n^3)$ multiplication gates. With Freivalds' algorithm, the prover first computes the result off-circuit and commits to it; the verifier then generates a random vector $v$ of length $n$ and checks $A(Bv) \stackrel{?}{=} Cv$, which reduces the circuit size to $O(n^2)$. We exploit this idea to design an efficient protocol for the square root computation in Algorithm 2. To verify that $x = \sqrt{y}$ is computed correctly, we ask the prover to provide the answer $x$ as a witness and then check in the ZKP that $x$ is indeed the square root of $y$. Note that we cannot simply check that $x^2$ equals $y$, because the zkSNARK works over a prime field and the exact square root of an input number might not exist. Instead, we check that $x^2$ is close to $y$ by verifying $x^2 \leq y$ and $(x+1)^2 \geq y$. This reduces the square root computation to two multiplications and two comparisons.
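Both tricks can be illustrated outside the circuit with a short NumPy sketch; in the actual system the corresponding checks are compiled into Circom constraints over a prime field, so the code below is only a plain-arithmetic analogue.

```python
import numpy as np

def freivalds_check(A, B, C, trials=10):
    """Probabilistically verify A @ B == C (integer matrices) with O(n^2) work per trial."""
    n = C.shape[1]
    for _ in range(trials):
        v = np.random.randint(0, 2, size=(n, 1))
        if not np.array_equal(A @ (B @ v), C @ v):
            return False          # caught an incorrect product
    return True                   # correct with high probability

def sqrt_check(x, y):
    """Accept x as the integer square root of y iff x^2 <= y and (x + 1)^2 >= y."""
    return x * x <= y and (x + 1) * (x + 1) >= y
```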

The zero-knowledge property of ZKPs allows public verification of the prover's (i.e., the FL server's) integrity in case the server is untrustworthy. By incorporating ZKPs, we provide a publicly verifiable approach for each client to ensure the FL server's integrity, which is essential for building and maintaining trust in FL systems: clients can verify the correctness of the defense without needing to rely solely on the server's goodwill. This is also secure in the presence of adversarial clients, as the ZKP itself reveals nothing about the prover's witness, i.e., the private data, models, and/or thresholds the server uses during the proposed anomaly detection approach.

5 Evaluations

Setting. A summary of the datasets and models used for evaluation can be found in Table 2. By default, we employ the CNN and the non-i.i.d. FEMNIST dataset ($\alpha = 0.5$), as the non-i.i.d. setting closely captures real-world scenarios. We utilize FedAVG in our experiments. By default, we use 10 clients for FL training, corresponding to real-world FL applications where the number of clients is typically less than 10, especially in ToB scenarios. We also vary the number of clients from 10 to 100 in Exp 5, and validate the utility of our approach in a practical application using 20 real-world edge devices; see Exp 11. We conduct our evaluations on a server with 8 NVIDIA A100-SXM4-80GB GPUs, and validate the correct execution with ZKP on an Amazon AWS m5a.4xlarge instance with 16 CPU cores and 32 GB of memory. We implement the ZKP system in Circom [17]. We provide a link to our code in the Appendix.

Table 2: Models and datasets.
Model | Dataset
CNN [43] | FEMNIST [9]
ResNet-20 [32] | Cifar10 [37]
ResNet-56 [32] | Cifar100 [37]
RNN [43] | Shakespeare [43]
LR [18] | MNIST [19]

Selection of attacks and defenses. We employ two Byzantine attacks and two backdoor attacks that are widely considered in the literature: a random weight Byzantine attack that randomly modifies the local submissions [15, 20], a zero weight Byzantine attack that sets all model weights to zero [15, 20], a label flipping backdoor attack that flips labels in the local data [55], and a model replacement backdoor attack [3] that aims to replace the global model with a poisoned local model. We utilize 5 baseline defense mechanisms that can be effective in real systems: $m$-Krum [7], Foolsgold [24], RFA [47], Bucketing [35], and Trimmed Mean [63]. For $m$-Krum, we set $m$ to 5, meaning that 5 of the 10 submitted local models participate in aggregation in each FL training round.

Figure 4: Impacts of different parameters. (a) Varying $\gamma$. (b) Varying the number of deviations. (c) Varying the number of malicious clients. (d) Varying the number of clients.

Evaluation Metrics. We evaluate the effectiveness of the cross-round check using the cross-round detection success rate, defined as the proportion of FL rounds in which the algorithm correctly detects whether or not an attack occurred. A 100% cross-round success rate indicates that all FL rounds in which potential attacks might have happened are detected, and no benign round is mistakenly flagged as an "attack". We evaluate the quality of cross-client detection using a modified Positive Predictive Value (PPV) [21], the proportion of positive results in statistics and diagnostic tests that are true positives. Let $N_{\mathit{TP}}$ and $N_{\mathit{FP}}$ denote the numbers of true positive and false positive results, respectively; the standard PPV is $\mathit{PPV} = \frac{N_{\mathit{TP}}}{N_{\mathit{TP}} + N_{\mathit{FP}}}$. In our setting, client submissions detected as "malicious" that are actually malicious are true positives ($N_{\mathit{TP}}$), while submissions detected as "malicious" although they are benign are false positives ($N_{\mathit{FP}}$). Since we want the PPV to also reflect the relation between $N_{\mathit{TP}}$ and the total number of malicious local models across all FL rounds, denoted $N_{\mathit{total}}$, we compute a modified PPV as $\frac{N_{\mathit{TP}}}{N_{\mathit{TP}} + N_{\mathit{FP}} + N_{\mathit{total}}}$, where $0 \leq \mathit{PPV} \leq \frac{1}{2}$. Ideally, the PPV is $\frac{1}{2}$, in which case all malicious local models are detected, i.e., $N_{\mathit{FP}} = 0$ and $N_{\mathit{TP}} = N_{\mathit{total}}$. Details are in Appendix A.2.
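As a reference for the experiments below, the modified PPV reduces to a one-line computation:

```python
def modified_ppv(n_tp, n_fp, n_total):
    """Modified PPV in [0, 1/2]; 1/2 means all malicious models caught with no false positives."""
    return n_tp / (n_tp + n_fp + n_total)
```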

Exp 1: Selection of importance layer. We utilize the L2-norm of the local models to evaluate the "sensitivity" of each layer. A layer whose norm is higher than that of most other layers indicates higher sensitivity and thus can be utilized to represent the whole model. We evaluate the sensitivity of the layers of the CNN, RNN, and ResNet-56. The results for the RNN, CNN, and ResNet-56 are deferred to Figure 9(a), Figure 9(b), and Figure 9(c) in Appendix A.5, respectively. The results show that the sensitivity of the second-to-last layer is higher than that of most other layers. Thus, this layer includes adequate information about the whole model and can be selected as the importance layer.

Exp 2: Impact of the similarity threshold. We evaluate the impact of the cosine similarity threshold $\gamma$ in the cross-round check with 10 clients in each FL round, 4 of which are malicious. Ideally, the cross-round check should accurately confirm the absence or presence of an attack. We set $\gamma$ to 0.5, 0.6, 0.7, 0.8, and 0.9. As shown in Figure 4(a), the cross-round detection success rate is close to 100% for Byzantine attacks. We observe that, when $\gamma$ is set to 0.5, the performance is satisfactory in all cases, with a cross-round detection success rate of at least 93%.

Exp 3: Selection of the number of deviations ($\lambda$). We set $\lambda$ to 0.5, 1, 1.5, 2, 2.5, and 3, and use the PPV to evaluate the impact of the number of deviations, i.e., the parameter $\lambda$ in the anomaly bound $\mu + \lambda\sigma$. To evaluate a challenging case where a large portion of the clients are malicious, we make 40% of the clients malicious in each FL round. Given 100 FL rounds, the total number of malicious submissions $N_{\mathit{total}}$ is 400. We evaluate our approach on three tasks: i) CNN + FEMNIST, ii) ResNet-56 + Cifar100, and iii) RNN + Shakespeare. As shown in Figure 4(b), the results are best when $\lambda$ is 0.5. In particular, for the random weight Byzantine attack, the PPV is exactly 0.5, indicating that all malicious local models are detected. In subsequent experiments, unless specified otherwise, we set $\lambda$ to 0.5.

Exp 4: Varying the percentage of malicious clients. We use the random weight Byzantine attack and set the percentage of malicious clients to 20% and 40%. We also include a baseline case where all clients are benign. As shown in Figure 4(c), the test accuracy remains relatively consistent across the different cases, as in each FL training round our approach filters out the local models that tend to be malicious, minimizing the negative impact of malicious client models on aggregation.

Figure 5: Byzantine attacks. (a) Random weights. (b) Zero weights.
Figure 6: Backdoor attacks. (c) Label flipping. (d) Model replacement.
Figure 7: Evaluations on selected attacks. (a) Varying the number of attack rounds. (b) 40 attack rounds.
Figure 8: Evaluations on CV tasks. (c) ResNet-20 & Cifar10. (d) ResNet-56 & Cifar100.

Exp 5: Varying the number of FL clients. We explore the impact of the number of clients under the random Byzantine attack. We set the number of clients to 10, 40, 70, and 100, and set the percentage of malicious clients to 40%. The results, as described in Figure 4(d), indicate that in all cases, our approach has high utility and can filter out malicious clients with high accuracy.

Exp 6: Evaluations on Byzantine attacks. We compare our approach with the state-of-the-art defenses using 10 clients, and set one of them as malicious in each FL round. We include a “benign” baseline scenario with no activated attack or defense. The results for the random weight Byzantine attack (Figure 5(a)) and the zero weight Byzantine attack (Figure 5(b)) demonstrate that our approach effectively mitigates the negative impact of the attacks and significantly outperforms the other defenses, by achieving a test accuracy much closer to the benign case.

Exp 7: Evaluations on backdoor attacks. We compare our approach with the state-of-the-art defenses using 10 clients, where one of them is malicious in each FL round. Considering that the label flipping attack is subtle, manipulating local training data and producing malicious local models that are challenging to detect, we set the parameter $\lambda$ to 2 to produce a tighter boundary. The results for the label flipping attack and the model replacement backdoor attack are shown in Figure 5(c) and Figure 5(d), respectively. The results show that our approach is effective against backdoor attacks, achieving test accuracy much closer to the benign case than the baseline defenses.

Exp 8: Evaluations on different attack frequencies. We configure attacks to occur only during specific rounds to evaluate the effectiveness of the proposed two-stage approach, setting the total number of attack rounds to 10, 40, 70, and 100. We then fix the number of attack rounds to 40 and compare our approach with the state-of-the-art defenses. The results in Figure 7(a) and Figure 7(b) show that our method effectively mitigates the impact of the adversarial attacks, ensuring minimal accuracy loss and robust performance under different numbers of attack rounds.

Exp 9: Evaluations on different tasks. We evaluate the defenses against the random mode of the Byzantine attack with the different models and datasets listed in Table 2. The results in Figure 7(c), Figure 7(d), and Figure 9(d) in §A.5 show that our approach outperforms the baseline defenses by effectively filtering out poisoned local models, achieving test accuracy close to the benign scenarios. Moreover, some defenses may fail on certain tasks, e.g., $m$-Krum fails on the RNN task in Figure 9(d), as those methods either select a fixed number of local models or re-weight the local models in aggregation, potentially eliminating local models that are important to the aggregation and leading to an unchanged test accuracy in later FL rounds.

Table 3: Cost of ZKP of different models.
Model | Stage 1 Circuit Size | Stage 2 Circuit Size | Proving Time (s) | Verification Time (ms)
CNN | 476,160 | 795,941 | 33 (12 + 21) | 3
RNN | 1,382,400 | 2,306,341 | 96 (34 + 62) | 3
ResNet-56 | 1,536,000 | 2,562,340 | 100 (37 + 63) | 3
• Bracketed times denote the durations of cross-round detection and cross-client detection, respectively.

Exp 10: Evaluations of ZKP verification. We implement a prover module containing JavaScript code that generates the witness for the ZKP and performs fixed-point quantization. Specifically, we only extract the parameters of the importance layer to represent the whole model, reducing complexity. We report the results in Table 3.

Exp 11: Evaluations in a real-world setting. To validate the utility and scalability of our approach in real-world applications, we utilize 20 real-world edge devices to demonstrate how our anomaly detection mechanism performs under practical constraints and settings. The device information is shown in Figure 10 in Appendix A.6. In each FL round, we designate 5 devices as malicious. The FL client package is integrated into the edge nodes to fetch data from our back-end periodically. Due to the challenges posed by real-world settings, such as devices equipped only with CPUs (lacking GPUs), potential connectivity issues, network latency, and limited storage on edge devices, we select a simple task, i.e., logistic regression on the MNIST dataset, run FL training for 10 rounds, and use our proposed anomaly detection method to defend against the random weight Byzantine attack. The training process is shown in Figure 11 in Appendix A.6, and the total training time is 221 seconds. The CPU utilization and network traffic during training are shown in Figure 13 and Figure 13 in Appendix A.6, respectively. We also include a benign case and an attack-only case for comparison; the results are shown in Figure 14. Despite the presence of malicious clients and the limitations of edge devices, our approach successfully identifies and mitigates the impact of malicious local models.

6 Related Works

Detection of attacks. [65] employs $k$-means to partition local models into clusters corresponding to "benign" or "malicious". While this approach can efficiently detect attacks, it requires some pre-training rounds and relies heavily on historical client models, so it might not be as effective when there is limited information on past client models. For example, their implementation [64] sets the round at which attack detection starts differently per dataset, e.g., round 50 for MNIST and FEMNIST, and round 20 for CIFAR10. While this approach is novel, it is not suitable for real FL systems, as attacks may happen in earlier rounds as well.

Defense mechanisms in FL. Robust learning and the mitigation of adversarial behaviors in FL have been extensively explored [7, 62, 24, 47, 33, 35, 53, 23, 46, 52, 63, 15, 28, 61, 40, 11]. Some approaches, e.g., [7, 28, 63], and [61], keep only the local models that are more likely to be benign in each FL round instead of aggregating all client submissions. Such approaches are effective, but they keep fewer local models than the actual number of benign local models to ensure that all malicious models are filtered out, causing some benign local models to be unrepresented in the aggregation. This wastes the computation resources of the benign clients that are incorrectly removed and changes the aggregation results. Some approaches re-weight or modify local models to mitigate the impact of potentially malicious submissions [24, 35, 53, 23, 46, 52], while others alter the aggregation function or directly modify the aggregation results [47, 35, 63, 15]. While these defense mechanisms can be effective against attacks, they might inadvertently degrade the quality of outcomes due to the unintentional alteration of aggregation results even when no attacks are present, which is especially problematic given the low frequency of attacks in practical scenarios.

7 Conclusions

We present a novel anomaly detection approach specifically designed for real-world FL systems. Our approach utilizes an early cross-round check that activates subsequent anomaly detection exclusively in the presence of attacks. When attacks happen, our approach removes malicious client models efficiently, ensuring that the local models submitted by benign clients remain unaffected. By leveraging ZKPs, our approach enables clients to verify the integrity of the removal performed by the server.

References

  • Aoki [1950] Tosio Aoki. On the stability of the linear transformation in banach spaces. Journal of the mathematical society of Japan, 2(1-2):64–66, 1950.
  • Bagdasaryan et al. [2020a] E. Bagdasaryan, A. Veit, Y. Hua, D. Estrin, and V. Shmatikov. How to backdoor federated learning. In AISTATS, 2020a.
  • Bagdasaryan et al. [2020b] Eugene Bagdasaryan, Andreas Veit, Yiqing Hua, Deborah Estrin, and Vitaly Shmatikov. How to backdoor federated learning. In International Conference on Artificial Intelligence and Statistics, pp.  2938–2948. PMLR, 2020b.
  • Baruch et al. [2019] Gilad Baruch, Moran Baruch, and Yoav Goldberg. A little is enough: Circumventing defenses for distributed learning. Advances in Neural Information Processing Systems, 32, 2019.
  • Bhagoji et al. [2019] Arjun Nitin Bhagoji, Supriyo Chakraborty, Prateek Mittal, and Seraphin Calo. Analyzing federated learning through an adversarial lens. In International Conference on Machine Learning, pp.  634–643. PMLR, 2019.
  • Bitansky et al. [2012] Nir Bitansky, Ran Canetti, Alessandro Chiesa, and Eran Tromer. From extractable collision resistance to succinct non-interactive arguments of knowledge, and back again. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pp.  326–349, 2012.
  • Blanchard et al. [2017] P. Blanchard, E. M. El Mhamdi, R. Guerraoui, and J. Stainer. Machine learning with adversaries: Byzantine tolerant gradient descent. In NeurIPS, 2017.
  • Byrd & Polychroniadou [2020] David Byrd and Antigoni Polychroniadou. Differentially private secure multi-party computation for federated learning in financial applications. In Proceedings of the First ACM International Conference on AI in Finance, pp.  1–9, 2020.
  • Caldas et al. [2018] Sebastian Caldas, Sai Meher Karthik Duddu, Peter Wu, Tian Li, Jakub Konečnỳ, H Brendan McMahan, Virginia Smith, and Ameet Talwalkar. Leaf: A benchmark for federated settings. arXiv preprint arXiv:1812.01097, 2018.
  • Cao & Gong [2022] Xiaoyu Cao and Neil Zhenqiang Gong. Mpaf: Model poisoning attacks to federated learning based on fake clients. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  3396–3404, 2022.
  • Cao et al. [2020] Xiaoyu Cao, Minghong Fang, Jia Liu, and Neil Zhenqiang Gong. Fltrust: Byzantine-robust federated learning via trust bootstrapping. arXiv preprint arXiv:2012.13995, 2020.
  • Cao et al. [2022] Xiaoyu Cao, Zaixi Zhang, Jinyuan Jia, and Neil Zhenqiang Gong. Flcert: Provably secure federated learning against poisoning attacks. IEEE Transactions on Information Forensics and Security, 17:3691–3705, 2022.
  • Chang et al. [2006] HJ Chang, K Huang, and C Wu. Determination of sample size in using central limit theorem for weibull distribution. International journal of information and management sciences, 17(3):31, 2006.
  • Chen et al. [2019] Mingqing Chen, Rajiv Mathews, Tom Ouyang, and Françoise Beaufays. Federated learning of out-of-vocabulary words. arXiv preprint arXiv:1903.10635, 2019.
  • Chen et al. [2017] Y. Chen, L. Su, and J. Xu. Distributed statistical machine learning in adversarial settings: Byzantine gradient descent. ACM on Measurement and Analysis of Computing Systems, 1(2):1–25, 2017.
  • Chowdhury et al. [2022] Alexander Chowdhury, Hasan Kassem, Nicolas Padoy, Renato Umeton, and Alexandros Karargyris. A review of medical federated learning: Applications in oncology and cancer research. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 7th International Workshop, BrainLes 2021, Held in Conjunction with MICCAI 2021, Virtual Event, September 27, 2021, Revised Selected Papers, Part I, pp.  3–24. Springer, 2022.
  • Contributors [2022] Circom Contributors. Circom zkSNARK ecosystem, 2022. https://github.com/iden3/circom.
  • Cox [1958] David R Cox. The regression analysis of binary sequences. Journal of the Royal Statistical Society: Series B (Methodological), 20(2):215–232, 1958.
  • Deng [2012] Li Deng. The mnist database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141–142, 2012.
  • Fang et al. [2020] M. Fang, X. Cao, J. Jia, and N. Gong. Local model poisoning attacks to Byzantine-robust federated learning. In USENIX Security, 2020.
  • Fletcher [2019] Grant S Fletcher. Clinical epidemiology: the essentials. Lippincott Williams & Wilkins, 2019.
  • Freivalds [1977] R. Freivalds. Probabilistic machines can use less running time. In IFIP Congress, 1977.
  • Fu et al. [2019] Shuhao Fu, Chulin Xie, Bo Li, and Qifeng Chen. Attack-resistant federated learning with residual-based reweighting. arXiv preprint arXiv:1912.11464, 2019.
  • Fung et al. [2020] Clement Fung, Chris JM Yoon, and Ivan Beschastnikh. The limitations of federated learning in sybil settings. In RAID, pp.  301–316, 2020.
  • Goldreich & Krawczyk [1996] Oded Goldreich and Hugo Krawczyk. On the composition of zero-knowledge proof systems. SIAM Journal on Computing, 25(1):169–192, 1996.
  • Goldwasser et al. [1989] S. Goldwasser, S. Micali, and C. Rackoff. The knowledge complexity of interactive proof systems. SIAM Jour. on Comp., 18(1):186–208, 1989.
  • Groth [2016] J. Groth. On the size of pairing-based non-interactive arguments. In Eurocrypt, 2016.
  • Guerraoui et al. [2018a] Rachid Guerraoui, Sébastien Rouault, et al. The hidden vulnerability of distributed learning in byzantium. In International Conference on Machine Learning, pp.  3521–3530. PMLR, 2018a.
  • Guerraoui et al. [2018b] Rachid Guerraoui, Sébastien Rouault, et al. The hidden vulnerability of distributed learning in byzantium. In International Conference on Machine Learning, pp.  3521–3530. PMLR, 2018b.
  • Han et al. [2019] S. Han, H. Wang, J. Wan, and J. Li. An iterative scheme for leverage-based approximate aggregation. In IEEE ICDE, 2019.
  • Hard et al. [2018] Andrew Hard, Kanishka Rao, Rajiv Mathews, Swaroop Ramaswamy, Françoise Beaufays, Sean Augenstein, Hubert Eichner, Chloé Kiddon, and Daniel Ramage. Federated learning for mobile keyboard prediction. arXiv preprint arXiv:1811.03604, 2018.
  • He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.  770–778, 2016.
  • He et al. [2022] L. He, S. P. Karimireddy, and M. Jaggi. Byzantine-robust decentralized learning via self-centered clipping. 2022. Available on arXiv:2202.01545.
  • Jin et al. [2021] Xiao Jin, Pin-Yu Chen, Chia-Yi Hsu, Chia-Mu Yu, and Tianyi Chen. Cafe: Catastrophic data leakage in vertical federated learning. Advances in Neural Information Processing Systems, 34:994–1006, 2021.
  • Karimireddy et al. [2020] Sai Praneeth Karimireddy, Lie He, and Martin Jaggi. Byzantine-robust learning on heterogeneous datasets via bucketing. arXiv preprint arXiv:2006.09365, 2020.
  • Kariyappa et al. [2022] Sanjay Kariyappa, Chuan Guo, Kiwan Maeng, Wenjie Xiong, G. Edward Suh, Moinuddin K. Qureshi, and Hsien-Hsin S. Lee. Cocktail party attack: Breaking aggregation-based privacy in federated learning using independent component analysis. In International Conference on Machine Learning, 2022. URL https://api.semanticscholar.org/CorpusID:252211968.
  • Krizhevsky et al. [2009] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
  • Lam et al. [2021] Maximilian Lam, Gu-Yeon Wei, David Brooks, Vijay Janapa Reddi, and Michael Mitzenmacher. Gradient disaggregation: Breaking privacy in federated learning by reconstructing the user participant matrix. In International Conference on Machine Learning, pp.  5959–5968. PMLR, 2021.
  • Leroy et al. [2019] David Leroy, Alice Coucke, Thibaut Lavril, Thibault Gisselbrecht, and Joseph Dureau. Federated learning for keyword spotting. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.  6341–6345, 2019.
  • Li et al. [2020] Suyi Li, Yong Cheng, Wei Wang, Yang Liu, and Tianjian Chen. Learning to detect malicious clients for robust federated learning. arXiv preprint arXiv:2002.00211, 2020.
  • Liu et al. [2021] T. Liu, X. Xie, and Y. Zhang. ZkCNN: Zero knowledge proofs for convolutional neural network predictions and accuracy. In ACM CCS, 2021.
  • Lyon [2014] Aidan Lyon. Why are normal distributions normal? The British Journal for the Philosophy of Science, 2014.
  • McMahan et al. [2017] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pp.  1273–1282. PMLR, 2017.
  • of Public Health [2001] Boston University School of Public Health. Central limit theorem. 2001. https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_probability/BS704_Probability12.html.
  • Osborne [2010] J. Osborne. Improving your data transformations: Applying the Box-Cox transformation. Practical Assessment, Research, and Evaluation, 15(1):12, 2010.
  • Ozdayi et al. [2021] Mustafa Safa Ozdayi, Murat Kantarcioglu, and Yulia R Gel. Defending against backdoors in federated learning with robust learning rate. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pp.  9268–9276, 2021.
  • Pillutla et al. [2022] Krishna Pillutla, Sham M Kakade, and Zaid Harchaoui. Robust aggregation for federated learning. IEEE Transactions on Signal Processing, 70:1142–1154, 2022.
  • Pukelsheim [1994] F. Pukelsheim. The three sigma rule. The American Statistician, 48(2):88–91, 1994.
  • Ramaswamy et al. [2019] Swaroop Ramaswamy, Rajiv Mathews, Kanishka Rao, and Françoise Beaufays. Federated learning for emoji prediction in a mobile keyboard. arXiv preprint arXiv:1906.04329, 2019.
  • Rosenblatt [1956] M. Rosenblatt. A central limit theorem and a strong mixing condition. National Academy of Sciences, 42(1):43–47, 1956.
  • Sakia [1992] R. M. Sakia. The Box-Cox transformation technique: A review. Journal of the Royal Statistical Society: Series D, 41(2):169–178, 1992.
  • Sun et al. [2021] Jingwei Sun, Ang Li, Louis DiValentin, Amin Hassanzadeh, Yiran Chen, and Hai Li. Fl-wbc: Enhancing robustness against model poisoning attacks in federated learning from a client perspective. Advances in Neural Information Processing Systems, 34:12613–12624, 2021.
  • Sun et al. [2019] Ziteng Sun, Peter Kairouz, Ananda Theertha Suresh, and H Brendan McMahan. Can you really backdoor federated learning? arXiv preprint arXiv:1911.07963, 2019.
  • Tolpegin et al. [2020a] Vale Tolpegin, Stacey Truex, Mehmet Emre Gursoy, and Ling Liu. Data poisoning attacks against federated learning systems. In European Symposium on Research in Computer Security, pp.  480–501. Springer, 2020a.
  • Tolpegin et al. [2020b] Vale Tolpegin, Stacey Truex, Mehmet Emre Gursoy, and Ling Liu. Data poisoning attacks against federated learning systems. In Computer security–ESORICs 2020: 25th European symposium on research in computer security, ESORICs 2020, guildford, UK, September 14–18, 2020, proceedings, part i 25, pp.  480–501. Springer, 2020b.
  • Tomsett et al. [2019] Richard Tomsett, Kevin Chan, and Supriyo Chakraborty. Model poisoning attacks against distributed machine learning systems. In Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications, volume 11006, pp.  481–489. SPIE, 2019.
  • Wang et al. [2020] H. Wang, K. Sreenivasan, S. Rajput, H. Vishwakarma, S. Agarwal, J. Sohn, K. Lee, and D. Papailiopoulos. Attack of the tails: Yes, you really can backdoor federated learning. In NeurIPS, 2020.
  • Wang [2022] Jianhua Wang. Pass: Parameters audit-based secure and fair federated learning scheme against free rider. arXiv preprint arXiv:2207.07292, 2022.
  • Weisberg [2001] S. Weisberg. Yeo-Johnson power transformations. 2001. https://www.stat.umn.edu/arc/yjpower.pdf.
  • Weng et al. [2021] Chenkai Weng, Kang Yang, Xiang Xie, Jonathan Katz, and Xiao Wang. Mystique: Efficient conversions for Zero-Knowledge proofs with applications to machine learning. In 30th USENIX Security Symposium (USENIX Security 21), pp.  501–518, 2021.
  • Xie et al. [2020] Cong Xie, Oluwasanmi Koyejo, and Indranil Gupta. SLSGD: Secure and Efficient Distributed On-device Machine Learning. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp.  213–228. Springer, 2020.
  • Yang et al. [2019] H. Yang, X. Zhang, M. Fang, and J. Liu. Byzantine-resilient stochastic gradient descent for distributed learning: A Lipschitz-inspired coordinate-wise median approach. In IEEE CDC, 2019.
  • Yin et al. [2018] Dong Yin, Yudong Chen, Kannan Ramchandran, and Peter Bartlett. Byzantine-robust distributed learning: Towards optimal statistical rates. In International Conference on Machine Learning, pp.  5650–5659. PMLR, 2018.
  • Zhang et al. [2022a] Zaixi Zhang, Xiaoyu Cao, Jinyuan Jia, and Neil Zhenqiang Gong. Implementation of fldetector. https://github.com/zaixizhang/FLDetector, 2022a.
  • Zhang et al. [2022b] Zaixi Zhang, Xiaoyu Cao, Jinyuan Jia, and Neil Zhenqiang Gong. Fldetector: Defending federated learning against model poisoning attacks via detecting malicious clients. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp.  2545–2555, 2022b.
  • Zhang et al. [2022c] Zhengming Zhang, Ashwinee Panda, Linyue Song, Yaoqing Yang, Michael W. Mahoney, Joseph Gonzalez, Kannan Ramchandran, and Prateek Mittal. Neurotoxin: Durable backdoors in federated learning. In International Conference on Machine Learning, 2022c. URL https://api.semanticscholar.org/CorpusID:249889464.

Appendix A Appendix

A.1 Details of Krum and $m$-Krum

In Krum and mm-Krum, the server selects mm (mm is one in Krum) local models that deviate less from the majority based on their pairwise distances, where such local models are more likely to be benign and thus are accepted for aggregation in the current round. Given that there are ff byzantine clients among LL clients that participate in each FL iteration, Krum selects one model that is the most likely to be benign as the global model. That is, instead of using all LL local models in aggregation, the server selects a single model to represent all LL submissions. To do so, Krum computes a score for each model 𝐰i\mathbf{w}_{i}, denoted as SK(𝐰i)S_{K}(\mathbf{w}_{i}), using Lf2L-f-2 local models that are “closest” to 𝐰i\mathbf{w}_{i}, and selects the local model with the minimum score to represent the aggregation result. For each local model 𝐰i\mathbf{w}_{i}, suppose Ci𝒩C_{i}^{\mathcal{N}} is the set of the Lf2L-f-2 local models that are closest to 𝐰i\mathbf{w}_{i}, then SK(𝐰i)S_{K}(\mathbf{w}_{i}) is computed by

$$S_K(\mathbf{w}_i)=\sum_{j\in\mathcal{C}_i}\lVert\mathbf{w}_i-\mathbf{w}_j\rVert^{2}.$$

An optimization of Krum is $m$-Krum [7], which selects $m$ local models, instead of one, when aggregating local models. The algorithm for Krum and $m$-Krum is summarized in Algorithm 3.

Inputs: $\mathcal{W}$: the client submissions of a training round; $f$: the number of malicious clients in each FL iteration; $m$: the number of local models kept for aggregation ($m=1$ in Krum).

function Krum_and_m_Krum($\mathcal{W}$, $m$, $f$) begin
    $S_K \leftarrow [\,]$
    for $\mathbf{w}_i \in \mathcal{W}$ do
        $S_K(\mathbf{w}_i) \leftarrow$ compute_krum_score($\mathcal{W}$, $i$, $f$)
    filter($\mathcal{W}$, $S_K$)    ▷ Keep the $m$ local models with the lowest Krum scores
    return average($\mathcal{W}$)

function compute_krum_score($\mathcal{W}$, $i$, $f$) begin
    $d \leftarrow [\,]$    ▷ Squared distances of $\mathbf{w}_i$ to the other local models
    $L \leftarrow |\mathcal{W}|$    ▷ $L$: the number of clients in each FL round
    for $\mathbf{w}_j \in \mathcal{W}$ do
        if $i \neq j$ then
            $d$.append($\lVert\mathbf{w}_i-\mathbf{w}_j\rVert^{2}$)
    sort($d$)    ▷ In ascending order
    $S_K(\mathbf{w}_i) \leftarrow \sum_{k=0}^{L-f-3} d[k]$    ▷ Sum the smallest $L-f-2$ distances
    return $S_K(\mathbf{w}_i)$

Algorithm 3: Krum and $m$-Krum.
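
For concreteness, the following is a minimal NumPy sketch of the same scoring-and-selection procedure; the function names and the flattening of each model into a single vector are our illustrative choices, not the paper's released implementation.

import numpy as np

def compute_krum_score(models, i, f):
    # Sum of squared distances from model i to its L - f - 2 nearest neighbors.
    L = len(models)
    dists = sorted(float(np.sum((models[i] - models[j]) ** 2)) for j in range(L) if j != i)
    return sum(dists[: L - f - 2])

def krum_and_m_krum(models, m, f):
    # Average the m local models with the lowest Krum scores (m = 1 recovers Krum).
    scores = [compute_krum_score(models, i, f) for i in range(len(models))]
    keep = np.argsort(scores)[:m]
    return np.mean([models[i] for i in keep], axis=0)

# Usage: flatten each client model into a vector first, e.g.,
# models = [np.concatenate([p.ravel() for p in w]) for w in client_updates]
# aggregated = krum_and_m_krum(models, m=1, f=2)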

A.2 Proof of the range of PPV

Below, we show that the upper bound of $\mathit{PPV}$ is $\frac{1}{2}$.

Proof.

$\mathit{PPV}=\frac{N_{\mathit{TP}}}{N_{\mathit{TP}}+N_{\mathit{FP}}+N_{\mathit{total}}}$, so $\frac{1}{\mathit{PPV}}=1+\frac{N_{\mathit{FP}}}{N_{\mathit{TP}}}+\frac{N_{\mathit{total}}}{N_{\mathit{TP}}}$. Since $\frac{N_{\mathit{FP}}}{N_{\mathit{TP}}}\geq 0$ and $\frac{N_{\mathit{total}}}{N_{\mathit{TP}}}\geq 1$, we have $\frac{1}{\mathit{PPV}}\geq 2$, and thus $\mathit{PPV}\leq\frac{1}{2}$. ∎
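
As a brief, purely algebraic addendum (not a separate result of the paper), the bound is tight: equality holds exactly when $N_{\mathit{FP}}=0$ and $N_{\mathit{TP}}=N_{\mathit{total}}$, in which case

$$\mathit{PPV}=\frac{N_{\mathit{total}}}{N_{\mathit{total}}+0+N_{\mathit{total}}}=\frac{1}{2}.$$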

A.3 Extension to Client Sampling

Our method also works with client sampling. For ease of explanation, the main manuscript assumes that all clients participate in aggregation in every FL iteration, but the method extends to client selection with modest engineering effort: the server caches historical client models across rounds, so that cross-round detection remains possible even when a client does not participate in every round. If the cached model for a client is too old, the global model from the previous round is used as the reference model. Adversarial clients that participate only once (i.e., single-shot attacks) are a special case of this setting; here, too, the global model from the previous round serves as the reference model for cross-round detection.
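
A minimal sketch of this reference-model lookup, assuming a simple per-client cache and a staleness threshold of our own choosing (both are illustrative, not the released implementation):

from typing import Any, Dict, Tuple

class ReferenceModelCache:
    """Caches the latest submission per client so cross-round detection works under client sampling."""

    def __init__(self, max_staleness: int = 5):
        self.max_staleness = max_staleness
        self._cache: Dict[str, Tuple[int, Any]] = {}  # client_id -> (round index, model)

    def update(self, client_id: str, round_idx: int, model: Any) -> None:
        # Record the latest model submitted by a sampled client.
        self._cache[client_id] = (round_idx, model)

    def reference_for(self, client_id: str, round_idx: int, last_global_model: Any) -> Any:
        # Return the cached model if it is fresh enough; otherwise fall back to the previous
        # round's global model (also used for first-time or single-shot participants).
        entry = self._cache.get(client_id)
        if entry is None or round_idx - entry[0] > self.max_staleness:
            return last_global_model
        return entry[1]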

A.4 Zero-Knowledge Proof (ZKP) Implementation

This section describes the details of the implementation of ZKPs. In what follows, the prover is the FL server, whereas the verifiers are the FL clients.

A.4.1 Choice of the ZKP system

In our implementation, we use the Groth16 [27] zkSNARK scheme implemented in the Circom library [17] for all the computations described earlier. We choose this scheme because its construction ensures a constant proof size (128 bytes) and constant verification time; for this reason, Groth16 is also popular in blockchain applications, where it requires little on-chain computation. Other ZKP schemes based on different constructions can achieve faster prover time [41], but their proofs are larger and their verification time is not constant, which is problematic when the verifiers have limited computational power, as is the case for the FL clients in our setting. Constructing a ZKP scheme that is efficient for both the prover and the verifier remains an open research direction.

A.4.2 ZKP-compatible Language

The first challenge in applying ZKP protocols is converting the computations into a ZKP-compatible language. ZKP protocols model computations as arithmetic circuits with addition and multiplication gates over a prime field, whereas the computations in our approach are over real numbers. The second challenge is that some computations, such as square root, are nonlinear, making them difficult to wire as a circuit. To address these issues, we implement a class of operations that map real numbers to fixed-point numbers. To build our ZKP scheme, we use the Circom library [17], which compiles the description of an arithmetic circuit, written in a front-end language similar to C++, into the back-end ZKP protocol.
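
A minimal sketch of such a real-to-fixed-point mapping, assuming 16 fractional bits and the BN254 scalar field used by Circom-based circuits (the exact scale factor in our circuits may differ):

# BN254 scalar-field prime used by Circom-based circuits.
P = 21888242871839275222246405745257275088548364400416034343698204186575808495617
SCALE = 1 << 16  # 16 fractional bits; an illustrative choice.

def encode(x: float) -> int:
    # Real number -> fixed-point field element (negatives wrap around modulo P).
    return int(round(x * SCALE)) % P

def decode(v: int) -> float:
    # Field element -> real number; the upper half of the field represents negatives.
    if v > P // 2:
        v -= P
    return v / SCALE

# Note: multiplying two encoded values picks up an extra factor of SCALE,
# so products must be rescaled (divided by SCALE) inside or after the circuit.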

A.4.3 Interactivity of zkSNARKs

In Freivalds' algorithm [22], the prover first computes the matrix multiplication and commits to its result; the verifier then generates and sends a random vector. This step is interactive in nature, but it can be made non-interactive using the Fiat-Shamir heuristic because it is public-coin, i.e., the vector is sampled at random by the verifier and made public to everyone. The prover can therefore generate this vector itself by setting it to the hash of the matrices A, B, and C. With this, the entire ZKP pipeline, including the Freivalds step, becomes truly non-interactive.
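
A minimal Python sketch of this non-interactive Freivalds check; the hash-to-vector derivation below is our illustrative choice and runs over plain integers, whereas the actual circuit works over the ZKP prime field:

import hashlib
import numpy as np

def fiat_shamir_vector(A, B, C, salt: int = 0):
    # Derive a pseudo-random 0/1 challenge vector from a hash of (A, B, C, salt).
    digest = hashlib.sha256(A.tobytes() + B.tobytes() + C.tobytes() + salt.to_bytes(8, "big")).digest()
    rng = np.random.default_rng(int.from_bytes(digest, "big"))
    return rng.integers(0, 2, size=C.shape[1], dtype=np.int64)

def freivalds_check(A, B, C, rounds: int = 20) -> bool:
    # Accept iff A @ (B @ x) == C @ x for every derived challenge x.
    # If C != A @ B, each round catches the mismatch with probability >= 1/2.
    return all(
        np.array_equal(A @ (B @ x), C @ x)
        for x in (fiat_shamir_vector(A, B, C, salt=k) for k in range(rounds))
    )

# Usage:
# A = np.random.randint(0, 10, (4, 5)); B = np.random.randint(0, 10, (5, 3))
# assert freivalds_check(A, B, A @ B)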

A.4.4 Motivation of Implementing ZKP

ZKP enables the server to prove to the clients that it has correctly executed the anomaly detection process. This addresses a critical concern in FL systems, where clients cannot directly verify the server's behavior and must otherwise fully trust the server. Below, we explain the motivation for ZKP from the research, industry, and system perspectives.

Research Perspective: Existing literature has considered various adversarial models, for example: 1) clients might be malicious and submit modified models; 2) the FL server might be curious and try to infer sensitive information from the local models, such as the original training data; 3) clients might be curious about the local models of other clients; 4) an external adversary may compromise the communication channels between clients and the server and poison some client models; 5) the FL server may be compromised by external adversaries; 6) a global "sybil" may compromise the whole system and control some clients by modifying their local training data; and so on.

In our paper, we assume that the FL server is not fully trusted, owing to the complex execution environments of real systems. External adversaries or a global sybil may be present; thus, even if the server intends to execute the aggregation correctly, the presence of adversaries necessitates a ZKP module for verification, ensuring that the server's actions are transparent and trustworthy to all clients.

Industry Perspective: The necessity of ZKP also arises from real-world application needs. Consider, for example, FL clients that are medical institutions or hospitals holding sensitive data, such as patient medical records. These institutions may want to collaboratively train a model but be unwilling to share their raw data due to privacy concerns. Although these institutions know that the server will run an anomaly detection procedure, they may not be fully convinced that the server will honestly execute the procedure or that their models will participate in the aggregation without bias. Here, ZKP enables verification that the anomaly detection is performed correctly, even when the clients do not have access to the local models of other clients. This is critical for gaining the trust of the participating clients.

System Perspective: Real FL systems with rewards contain components such as model aggregation, contribution assessment of local models, and anomaly detection. If the FL server is not fully trusted, validating all of these operations is essential. However, the focus of this paper is specifically anomaly detection, so we have primarily discussed the application of ZKP in that context. The ZKP module ensures that even if the server is not fully trusted, e.g., under potential external threats, clients hold verifiable proof that the anomaly detection was executed correctly, maintaining the integrity and security of the whole FL process.

A.5 Supplementary Experimental Results

The layer importance results for RNN, CNN, and ResNet-56 are given in Figure 9(a), Figure 9(b), and Figure 9(c), respectively. The results of the evaluation on the RNN with the Shakespeare dataset are shown in Figure 9(d).

Figure 9: Supplementary experimental results. (a) RNN layer sensitivity; (b) CNN sensitivity; (c) ResNet-56 sensitivity; (d) RNN & Shakespeare.

A.6 Supplementary Results for the Real-world Experiment

The edge devices we use are described in Figure 10, the real-world simulation is shown in Figure 11, the CPU utilization in Figure 12, the network traffic in Figure 13, and the evaluation on real-world devices in Figure 14.

Figure 10: Edge device information.
Figure 11: Real-world application demonstration. Yellow: aggregation server waiting time; pink: aggregation time; green: client training time; blue: client communication time.
Figure 12: CPU utilization.
Figure 13: Network traffic.
Figure 14: Evaluations on real-world devices.