Shielding Federated Learning: Robust Aggregation with Adaptive Client Selection
1 Evaluated attacks.
Label flipping (LF) attack: The attackers flip the label $l$ of each training example to $L-l-1$, where $L$ is the total number of label classes.
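A minimal sketch of this flipping rule, assuming integer labels in $\{0,\dots,L-1\}$ stored in a NumPy array (the function name is ours):

```python
import numpy as np

def flip_labels(labels: np.ndarray, num_classes: int) -> np.ndarray:
    # Map each label l to L - l - 1, where L is the number of classes.
    return num_classes - labels - 1

# Example: with L = 10 (e.g., MNIST), label 0 becomes 9 and label 3 becomes 6.
print(flip_labels(np.array([0, 3, 9]), num_classes=10))  # [9 6 0]
```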
LIE attack: The LIE attack [ALittleIsEnough] is a coordinate-wise model poisoning attack. The attacker first calculates the coordinate-wise mean $\mu$ and standard deviation $\sigma$ of all benign updates, and then adds to the mean, in each dimension, a perturbation $z\sigma$ whose multiplier $z$ depends on the number of attackers. Finally, the attacker uploads the malicious result $\mu + z\sigma$ to the server.
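A sketch of this construction under the definitions above; the multiplier `z`, which the LIE paper derives from the numbers of total and malicious clients, is treated as a given input here:

```python
import numpy as np

def lie_attack(benign_updates: np.ndarray, z: float) -> np.ndarray:
    # benign_updates: (num_benign, dim) updates known to the attacker.
    mu = benign_updates.mean(axis=0)    # coordinate-wise mean
    sigma = benign_updates.std(axis=0)  # coordinate-wise standard deviation
    return mu + z * sigma               # malicious update every attacker uploads
```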
AGR-tailored (AGRT) attack: In order to construct a poisoned local update, the adversary [AGRTailored] solves the following optimization problem:

$$\begin{aligned} \operatorname*{argmax}_{\gamma}\quad & \left\lVert g^{b} - f\left(g'_{1},\ldots,g'_{m},\,g_{1},\ldots,g_{k}\right) \right\rVert_{2} \\ \text{s.t.}\quad & g'_{i} = g^{b} + \gamma\,g^{p}, \quad i\in[m], \end{aligned}\tag{1}$$

where $f$ is the known defense method, $g_{1},\ldots,g_{k}$ are the benign updates that the adversary knows, $m$ denotes the number of attackers, $g^{p}$ is a fixed perturbation direction, and $g^{b}$ is a reference benign aggregation obtained by FedAvg [FedAvg] that averages all the benign updates that the attacker knows.
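A grid-search sketch of Eq. (1); the original attack uses a halving search over $\gamma$, and the names `f_agr`, `perturb`, and `gammas` are our own:

```python
import numpy as np

def agrt_attack(f_agr, benign_updates, num_attackers, perturb, gammas):
    # f_agr: the known aggregation rule, mapping (n, dim) -> (dim,).
    # benign_updates: (k, dim) benign updates known to the adversary.
    # perturb: perturbation direction g^p (e.g., the negated sign of g^b).
    # gammas: candidate scaling factors (a grid stands in for the halving search).
    g_b = benign_updates.mean(axis=0)  # reference benign aggregation (FedAvg)
    best_update, best_dev = None, -np.inf
    for gamma in gammas:
        malicious = np.tile(g_b + gamma * perturb, (num_attackers, 1))
        agg = f_agr(np.vstack([malicious, benign_updates]))
        dev = np.linalg.norm(g_b - agg)  # deviation induced on the aggregate
        if dev > best_dev:
            best_update, best_dev = malicious[0], dev
    return best_update  # poisoned update uploaded by each attacker
```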
2 Evaluated defenses.
Krum: Krum calculates the Euclidean distance between every pair of local gradients, scores each gradient by the sum of distances to its closest $n-m-2$ neighbors (with $n$ clients and $m$ attackers), and selects the gradient with the smallest score as the global update.
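A sketch of this selection rule, assuming the local updates are stacked row-wise in a NumPy array:

```python
import numpy as np

def krum(updates: np.ndarray, num_attackers: int) -> np.ndarray:
    # updates: (n, dim). Score each update by the summed squared distance
    # to its n - num_attackers - 2 nearest neighbors; pick the lowest score.
    n = len(updates)
    dists = np.linalg.norm(updates[:, None, :] - updates[None, :, :], axis=-1) ** 2
    k = n - num_attackers - 2
    scores = np.sort(dists, axis=1)[:, 1:k + 1].sum(axis=1)  # skip self (dist 0)
    return updates[np.argmin(scores)]
```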
FABA: FABA repeatedly removes the local update that is farthest from the average of the remaining updates, until the number of eliminated updates reaches a predefined threshold.
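A sketch under the same reading, where `num_to_remove` plays the role of the predefined threshold:

```python
import numpy as np

def faba(updates: np.ndarray, num_to_remove: int) -> np.ndarray:
    # Iteratively drop the update farthest from the mean of those remaining,
    # then average the survivors.
    remaining = list(range(len(updates)))
    for _ in range(num_to_remove):
        mean = updates[remaining].mean(axis=0)
        dists = np.linalg.norm(updates[remaining] - mean, axis=1)
        remaining.pop(int(np.argmax(dists)))
    return updates[remaining].mean(axis=0)
```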
Median: Median directly takes the coordinate-wise median in each dimension of all local gradient vectors as the new global gradient vector.
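This aggregation is a one-liner in NumPy:

```python
import numpy as np

def coordinate_median(updates: np.ndarray) -> np.ndarray:
    # updates: (n, dim); aggregate each dimension by its median across clients.
    return np.median(updates, axis=0)
```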
DnC: DnC randomly samples a subset of the parameters of each local gradient, centers the subsampled gradients, and uses singular value decomposition (SVD) to compute their top right singular vector, which captures the direction separating benign from poisoned gradients. An outlier score is then obtained for each gradient by projecting its centered, subsampled version onto this singular vector, and the $c \cdot m$ local gradients with the highest scores are removed, where $c$ is a filtering fraction and $m$ is the number of attackers.
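A sketch of a single filtering round under this description; the subsample size `sub_dim` and the use of squared projections as scores are assumptions here:

```python
import numpy as np

def dnc(updates, num_attackers, c=1.0, sub_dim=1000, rng=None):
    # updates: (n, dim). One round of filtering, then average the survivors.
    rng = rng or np.random.default_rng()
    dim = updates.shape[1]
    idx = rng.choice(dim, size=min(sub_dim, dim), replace=False)
    sub = updates[:, idx]                 # random parameter subsample
    centered = sub - sub.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = (centered @ vt[0]) ** 2      # squared projection onto the
                                          # top right singular vector
    num_remove = int(c * num_attackers)
    keep = np.argsort(scores)[: len(updates) - num_remove]
    return updates[keep].mean(axis=0)
```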
CC: CC clips the local updates with large magnitudes, with the intuition that attackers may upload such updates to dominate the global model.
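Assuming CC refers to centered clipping, in which each update's deviation from the previous aggregate is shrunk to a norm of at most some radius `tau` before averaging, a sketch is:

```python
import numpy as np

def centered_clip(updates, prev_agg, tau, iters=3):
    # updates: (n, dim); prev_agg: (dim,) previous global aggregate.
    # Clip each deviation from the current estimate to norm <= tau,
    # average the clipped deviations, and repeat a few times.
    v = prev_agg
    for _ in range(iters):
        diffs = updates - v
        norms = np.linalg.norm(diffs, axis=1, keepdims=True)
        scale = np.minimum(1.0, tau / np.maximum(norms, 1e-12))
        v = v + (diffs * scale).mean(axis=0)
    return v
```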
3 Parameter settings.
We set the number of clients to  for both datasets. To reduce the total number of communication rounds between clients and the server, we set the local epochs of each client to . The total number of iterations is , and the importance of historical information is . For MNIST, we set the estimated maximum cosine similarity to , the minimum cosine similarity to , and the acceptable difference between clusters to . For CIFAR-10, we set them to , , and , respectively.