
1 School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China  2 Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China  Email: {xuanyuexin, chenxiaojun, zhaozhendong, tangbisheng, dongye}@iie.ac.cn

Practical and General Backdoor Attacks against Vertical Federated Learning

Yuexin Xuan 1,2 (ORCID: 0000-0001-7887-2309)    Xiaojun Chen (✉) 1,2    Zhendong Zhao 1,2    Bisheng Tang 1,2    Ye Dong 1,2
Abstract

Federated learning (FL), which aims to facilitate data collaboration across multiple organizations without exposing data privacy, encounters potential security risks. One serious threat is backdoor attacks, where an attacker injects a specific trigger into the training dataset to manipulate the model’s prediction. Most existing FL backdoor attacks target horizontal federated learning (HFL), where the data owned by different parties share the same features. However, compared to HFL, backdoor attacks on vertical federated learning (VFL), where each party holds only a disjoint subset of features and the labels are owned by one party, are rarely studied. The main challenge of such an attack is that the attacker, without access to the data labels, must still mount an effective attack. To this end, we propose BadVFL, a novel and practical approach to inject backdoor triggers into victim models without label information. BadVFL consists of two key steps. First, to address the challenge that the attacker has no knowledge of the labels, we introduce a Source Data Detection (SDD) module that traces data categories based on gradients. Second, we propose a Source Data Perturbation (SDP) module that improves the attack’s effectiveness by enhancing the decision dependency between the trigger and the attack target. Extensive experiments show that BadVFL supports diverse datasets and models, and achieves over a 93% attack success rate with only a 1% poisoning rate.

Keywords:
Vertical Federated Learning · Backdoor Attacks.

1 Introduction

Federated Learning (FL), as a promising distributed learning paradigm, enables multiple participants to collaboratively train a global model without exposing their private local data. Therefore, it attracts a surge of attention and has been widely applied in many privacy-critical fields like credit risk prediction [14, 29], medical diagnosis [4, 15], etc.

However, recent works have shown that such a promising paradigm encounters severe security threats [18, 13, 27, 5, 34], which significantly hinders its deployment in safety-critical areas. One serious threat to FL is backdoor attacks, where attackers poison part of the victim model’s training data so that any input carrying the trigger is misled to a target label, while preserving the model’s utility on clean data. It is vital to ensure the security of FL before deployment, as such attacks may pose serious threats to users. For instance, applying a backdoored FL model in loan risk prediction, where any user carrying the trigger is predicted as low risk, may lead to huge economic losses.

Figure 1: An example of VFL system. A bank (active party) with account balance features aims to train a more precise model for loan risk analysis by cooperating with an e-commerce company (passive party) holding repayment features.

FL can be classified into two main categories: horizontal federated learning (HFL) and vertical federated learning (VFL). In HFL, samples sharing the same features are distributed among different participants; e.g., two regional banks with different clients but similar businesses (such as average monthly deposit and account balance) can jointly train a model for financial product recommendations. In VFL, data owned by different parties share the same sample IDs but disjoint features; e.g., a bank with account balance information wants to obtain a more precise model for loan risk analysis by cooperating with an e-commerce company owning repayment information. Recent literature has thoroughly analyzed backdoor attacks and defenses in HFL [30, 22, 31, 24, 21]. However, backdoor threats in VFL are rarely explored, despite their increasing relevance in cross-enterprise collaboration. To this end, in this paper we explore a new backdoor threat in the VFL scenario.

Figure 1 illustrates the architecture of a VFL system. In VFL, only one party (known as the “active party") possesses the labels and partial data features, while the other parties (known as the “passive parties") hold only partial data features. VFL enables the active party to enrich its data features by cooperating with passive parties who provide more diverse features. In the case where the attacker is the active party, an intuitive approach is to add triggers to local data from the target class and include them in the training process to implant the backdoor. However, when the attacker is a passive party, the lack of label information makes the attack more challenging. To address this issue, one may apply label inference to deduce the labels. But the existing state-of-the-art method [5] requires a large amount of auxiliary labeled data and can only perform label inference after model training is completed, which is impractical for backdoor injection. Despite these challenges, we propose BadVFL to conduct backdoor attacks in VFL. Our approach includes two main components. First, we introduce a Source Data Detection (SDD) module to trace the data categories based on their gradients at run time. The core idea of SDD is that data from the same class have similar model updating directions. Second, we propose a Source Data Perturbation (SDP) scheme to enhance the decision dependency between the trigger and the attack target, thereby further improving the attack’s effectiveness.

We evaluate BadVFL on four benchmark datasets, namely CIFAR-10, ImageNet, BHI, and IMDB, covering both the image and text domains. The experiments yield several notable results. First, our attack is highly effective and general, achieving over a 93% attack success rate with only a 1% poisoning rate on all datasets, while introducing negligible drops in main-task accuracy. Second, BadVFL is insensitive to the selection of target data, making it more stable than existing methods. Third, we evaluate BadVFL against several defense approaches to verify its robustness.

Our technical contributions are summarized below:

  • We conduct a systematic investigation of backdoor attacks in VFL systems and propose BadVFL, a more general and practical backdoor framework with stable attack performance. Our analysis reveals serious backdoor risks in VFL systems.

  • We propose the SDD module to trace data categories and the SDP module to enhance the dependency between triggers and attack targets.

  • We conduct extensive empirical validations to show that our framework achieves state-of-the-art performance in terms of effectiveness, generalization, stability, and robustness against several defense methods.

2 Background and Related Work

2.1 Vertical Federated Learning

VFL [32, 33, 6, 3] enables multiple parties to collaboratively build a model over partitioned features in a privacy-preserving manner, as all data remains local to each party. Concretely, the VFL protocol is executed as follows: 1) the active party broadcasts the sample ID sequence to the passive parties to align the data; 2) each passive party uploads data feature representations extracted by its local model in a predefined order; 3) the active party concatenates these features and feeds them into the top model to calculate the loss and gradients; 4) the active party updates the top model and sends the gradients of the uploaded features back to the passive parties; 5) the passive parties update their bottom models using the received gradients. Supplement B.1 gives the algorithm of the VFL process.
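To make the protocol concrete, below is a minimal single-machine PyTorch sketch of one such training round with two bottom models and one top model; the module shapes, optimizers, and random tensors are illustrative assumptions rather than the paper’s implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Bottom (local) feature extractors held by each party; the active party also holds the top model.
bottom_a = nn.Linear(16, 8)              # active party's own feature slice -> 8-dim representation
bottom_b = nn.Linear(16, 8)              # passive party's feature slice -> 8-dim representation
top = nn.Linear(16, 10)                  # active party: concat(8 + 8) -> 10 classes

opt_a = torch.optim.SGD(bottom_a.parameters(), lr=0.01)
opt_b = torch.optim.SGD(bottom_b.parameters(), lr=0.01)
opt_top = torch.optim.SGD(top.parameters(), lr=0.01)

# Steps 1-2: aligned batch; each party extracts representations from its own feature slice.
x_a, x_b = torch.randn(32, 16), torch.randn(32, 16)
y = torch.randint(0, 10, (32,))          # labels live only on the active party
f_a, f_b = bottom_a(x_a), bottom_b(x_b)

# Step 3: only representation values cross the party border, so we detach "uploaded" copies.
f_a_up = f_a.detach().requires_grad_(True)
f_b_up = f_b.detach().requires_grad_(True)
loss = nn.functional.cross_entropy(top(torch.cat([f_a_up, f_b_up], dim=1)), y)

# Step 4: the active party updates the top model and returns gradients w.r.t. the uploaded features.
opt_top.zero_grad()
loss.backward()
opt_top.step()

# Step 5: each party updates its bottom model with the received feature gradients.
for f, g, opt in [(f_a, f_a_up.grad, opt_a), (f_b, f_b_up.grad, opt_b)]:
    opt.zero_grad()
    f.backward(g)
    opt.step()
```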

However, such a promising training paradigm has been shown to be vulnerable to security threats, such as backdoor attacks [18] and label inference attacks [5]. It is crucial to ensure the security of VFL systems before deploying them in real-world applications.

2.2 Backdoor Attacks

Backdoor attacks aim at manipulating the victim models’ behavior on backdoored data while maintaining good performance on clean data. Whenever the trigger is presented in the input instance, the backdoor is activated to induce the model to predict the target label.

Backdoor attacks were first investigated in the computer vision domain [19, 26]. Gu et al. [7] generate poisoned data by adding a specific pattern (e.g., a square) to clean samples and relabeling them with the target label before putting them into the training process. We formulate the loss function of backdoor attacks as below:

\mathop{\arg\min}\limits_{\theta}\sum\limits_{(x,y)\in D}\mathcal{L}(\mathcal{F}(x,\theta),y)+\mathcal{L}(\mathcal{F}(x+\mathcal{T},\theta),y_{t}), \qquad (1)

where $\mathcal{F}$ is the model with parameters $\theta$, $x$ is the clean data with correct label $y$, $\mathcal{T}$ is the trigger, and $y_t$ is the target label. The key to backdoor attacks is to establish a strong link between the trigger and the attack target, which is achieved by the last term in Eq. 1.
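As a concrete illustration of Eq. 1, the following is a hedged PyTorch sketch of the centralized backdoor objective: one clean cross-entropy term plus one poisoned term in which a small white patch is pasted onto the input and the label is swapped for the target class. The toy model, patch location, and target class are assumptions for illustration.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # toy classifier F(., theta)
x = torch.rand(8, 3, 32, 32)                  # clean batch
y = torch.randint(0, 10, (8,))                # correct labels
y_t = torch.zeros_like(y)                     # target label y_t (class 0, assumed)

x_poison = x.clone()
x_poison[:, :, 14:18, 14:18] = 1.0            # x + T: 4x4 white square as the trigger

# Eq. 1: the clean term keeps utility, the poisoned term links the trigger to the target label.
loss = nn.functional.cross_entropy(model(x), y) \
     + nn.functional.cross_entropy(model(x_poison), y_t)
loss.backward()
```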

Recent studies have explored backdoor attacks in the HFL scenario, which is more vulnerable because clients have full control over their local labeled data and training process, making it easier to submit malicious updates that build a mapping between the trigger and the target label. Xie et al. [30] introduce a distributed backdoor attack by decomposing a global trigger into several local triggers and assigning them to different adversarial clients. Bagdasaryan et al. [1] explore a model replacement approach by scaling the malicious model updates to replace the global model with the local poisoned one.

However, backdoor attacks in the VFL scenario are rarely explored because an attack mounted by a passive party is more challenging due to the lack of label information. Liu et al. [18] introduce a gradient replacement (GR) approach that replaces the gradient of locally triggered samples with the gradient of the target data (explained in Section 4) when updating the local model. However, GR heavily relies on the selection of the target data and neglects the impact of the features owned by other parties on the final classification.

3 Problem Formulation

3.1 Vertical Federated Learning

In a VFL system, there are $K$ parties $\{P_k\}_{k=1}^{K}$, where each party $P_k$ holds partial features and the labels are privately owned by the active party. We denote the whole training dataset as $D=\{x_i=(x_i^1,x_i^2,\ldots,x_i^K),y_i\}_{i=1}^{N}$, where $x_i^k$ is the feature of the $i$th sample located on the $k$th party, and $y_i$ is the true label of the $i$th sample. Each party holds a local feature extractor $\mathcal{F}_{\theta_k}$ to transform the local data $x_i^k$ into feature representations. VFL minimizes the following loss function to ensure performance:

\mathop{\arg\min}\limits_{\theta}\sum\limits_{(x_i,y_i)\in D}\mathcal{L}^{ce}\left(\mathcal{M}_{\theta_t}(f_i),y_i\right)+\Omega(\theta), \qquad (2)

where $f_i=\mathrm{Concat}\{\mathcal{F}_{\theta_1}(x_i^1),\ldots,\mathcal{F}_{\theta_K}(x_i^K)\}$ is the merged feature representation of the $i$th sample, $\mathcal{L}^{ce}$ is the cross-entropy loss, $\mathcal{M}_{\theta_t}$ is the top model, $\theta_k$ are the parameters of the local feature extractor $\mathcal{F}_{\theta_k}$ owned by $P_k$, and $\Omega(\theta)$ is a regularization term to avoid overfitting.

3.2 Threat Model

As stated previously, we assume one of the passive parties with no label information is the adversary. Without loss of generality, we assume $P_K$ is the attacker.

3.2.1 Attacker’s Goal

The goal of $P_K$ is to establish a strong link between the trigger and the attack target. Whenever the trigger is presented in the input instance, the victim model should predict the target label. Meanwhile, the attacker should ensure the clean data are classified correctly to maintain the model utility. Formally, the attacker optimizes the following objective function:

\mathop{\arg\min}\limits_{\theta}\sum\limits_{(x_i,y_i)\in D}\mathcal{L}^{ce}\left(\mathcal{M}_{\theta_t}(f_i^c),y_i\right)+\mathcal{L}^{ce}\left(\mathcal{M}_{\theta_t}(f_i^p),y_t\right)+\Omega(\theta), \qquad (3)

where $f_i^c=\mathrm{Concat}\{\mathcal{F}_{\theta_1}(x_i^1),\ldots,\mathcal{F}_{\theta_K}(x_i^K)\}$ and $f_i^p=\mathrm{Concat}\{\mathcal{F}_{\theta_1}(x_i^1),\ldots,\mathcal{F}_{\theta_K}(x_i^K+\mathcal{T})\}$ are the clean and poisoned feature representations, respectively, and $\mathcal{T}$ is the injected trigger. The first term ensures that the victim model behaves normally on clean data, and the second term achieves the backdoor behavior. We showcase various backdoored samples in Supplement A.1.
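To make Eq. 3 concrete, the sketch below shows how $f_i^c$ and $f_i^p$ differ only in the attacker’s own partition $x_i^K$: the trigger is added locally before the representation is uploaded. The two-party setup, feature dimensions, and trigger pattern are assumptions for illustration.

```python
import torch
import torch.nn as nn

K = 2
bottoms = [nn.Linear(16, 8) for _ in range(K)]        # F_{theta_k} for each party
x_parts = [torch.rand(4, 16) for _ in range(K)]       # each party's feature slice for a small batch

def add_trigger(x_k: torch.Tensor) -> torch.Tensor:
    """Attacker-side trigger: overwrite a few of its own feature dimensions (x^K + T)."""
    x_k = x_k.clone()
    x_k[:, :4] = 1.0
    return x_k

# f^c: all parties upload clean representations.
f_clean = torch.cat([bottoms[k](x_parts[k]) for k in range(K)], dim=1)
# f^p: identical except the last party (the attacker, P_K) triggers its own partition.
f_poison = torch.cat(
    [bottoms[k](x_parts[k]) if k < K - 1 else bottoms[k](add_trigger(x_parts[k])) for k in range(K)],
    dim=1)
```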

3.2.2 Attacker’s Capability

We assume $P_K$ strictly follows the VFL protocols: uploading feature representations, receiving gradients, and updating its local model. The data accessible to $P_K$ are its own local data $\{x_i^K\}_{i=1}^N$ and the corresponding gradients $\{g_i^K\}_{i=1}^N$ returned from the active party. Moreover, $P_K$ has no knowledge of the data, the model, and any intermediate information owned by other parties. The adversary cannot interfere with the normal interactions between the active party and other passive parties.

4 Backdoor Attacks in VFL

In this section, we present a detailed explanation of how BadVFL achieves backdoor attacks in VFL systems.

The key to a successful backdoor attack is associating a pre-defined trigger with the target label. One intuitive method is adding the trigger to data from the target class to link the trigger with the attack target. However, $P_K$, without label information, does not know which data come from the target class. To address this issue, we design the Source Data Detection (SDD) module, which can infer data categories based on their gradients at run time. Moreover, to further improve the attack’s effectiveness, we propose the Source Data Perturbation (SDP) scheme, which enhances the decision dependency between the trigger and the attack target. Figure 2 shows the steps involved in BadVFL. The detailed algorithm of the BadVFL process is given in Supplement B.2.

Figure 2: The framework of BadVFL, which contains two main modules: Source Data Detection and Backdoor Implantation. The former aims to detect the source data (marked in blue) based on the target data (marked in orange).
Definition 1

(Target Data) The target data is a sample known a priori by the attacker to come from the target class.

Definition 2

(Source Data) The source data are samples identified by the SDD module as having the same label as the target data; they are used for data poisoning.

4.1 Source Data Detection

We assume $P_K$ knows only one target data point, denoted as $x_t^K$, where $t$ is the sample ID of the target data. (Note that data should be uploaded to the top model in a pre-defined order.) Intuitively, the key insight of SDD is that data from the same class will have similar model updating directions (a.k.a. gradients). With known $x_t^K$, $P_K$ participates normally in the VFL training process until it first obtains $g_t^K$. Then $P_K$ runs SDD to infer which data come from the target class by computing the similarity between $g_t^K$ and the gradients of other data. In detail, the process involves two main steps: feature replacement and similarity computation.

4.1.1 Feature Replacement

We randomly select $n$ samples per batch (denoted as $x_{nset}^K$, $|nset|=n$) to detect whether they are from the target class. To increase the gradient similarity between $x_t^K$ and the data from the target class in $x_{nset}^K$, we replace $x_j^K$ with $x_t^K$ (for all $j\in nset$), so the only difference between $x_t$ and $x_{nset}$ is the data held by the other parties. Then we upload the replaced data to the top model.

4.1.2 Similarity Computation

After obtaining the returned gradients of $x_{nset}^K$ (denoted as $g_{nset}^K$), we compute the cosine similarity between $g_{nset}^K$ and $g_t^K$. (Note that $g_t^K$ is updated when $x_t^K$ is uploaded.)

\cos(g_t^K,g_j^K)=\frac{\left\langle g_t^K,g_j^K\right\rangle}{\left\|g_t^K\right\|_2\left\|g_j^K\right\|_2},\quad \text{for all } j\in nset. \qquad (4)

Intuitively, the higher the similarity, the more likely they have the same label. To illustrate how this works, we consider a simple example where the top model consists of only one linear layer.

\left(\begin{matrix}W_{11}&\cdots&W_{1d}\\ \vdots&\ddots&\vdots\\ W_{C1}&\cdots&W_{Cd}\end{matrix}\right)\left(\begin{matrix}\mathcal{F}_{\theta_1}^{T}(x_i^1)\\ \vdots\\ \mathcal{F}_{\theta_K}^{T}(x_i^K)\end{matrix}\right)=\left(\begin{matrix}O_1\\ \vdots\\ O_C\end{matrix}\right), \qquad (5)

where $W$ is the top model parameter matrix, $O$ is the output of sample $i$, and $C$ is the number of classes for classification. Here we assume the true label $y_i$ of sample $i$ is class $c$, where $c\in[1,C]$. As we can see, the gradient of the cross-entropy loss w.r.t. the feature representation of sample $i$ is:

\frac{\partial\mathcal{L}^{ce}(f_i;W,y_i)}{\partial f_i}=\left(\begin{matrix}W_{c1},\cdots,W_{cd}\end{matrix}\right)=W_c. \qquad (6)

Therefore, the gradients received by $P_K$ have the following property: data from the same class return similar gradients, resulting in a high positive cosine similarity, while samples from different classes return dissimilar gradients, resulting in a low cosine similarity.

Finally, we set a threshold $\alpha_{thre}$. If the similarity between $g_j^K$ (for $j\in nset$) and $g_t^K$ is higher than $\alpha_{thre}$, we consider $x_j^K$ to be from the target class and call it source data. We terminate the SDD process once enough source data are found or all training data have been considered. Compared with the state-of-the-art label inference method [5], which requires a large amount of auxiliary labeled data, our SDD is more practical.
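The attacker-side SDD logic can be summarized in a few lines. The sketch below assumes the attacker has cached the gradient $g_t^K$ of the target data and receives the per-sample gradients for the $n$ candidates it substituted in the current batch; the function and variable names are illustrative rather than taken from the paper’s code.

```python
import torch
import torch.nn.functional as F

def detect_source_data(g_t: torch.Tensor,
                       g_candidates: torch.Tensor,      # shape (n, d): gradients returned for x_nset
                       candidate_ids: list,
                       alpha_thre: float = 0.6) -> list:
    """Return the IDs of candidates whose returned gradient is similar enough to g_t (Eq. 4)."""
    sims = F.cosine_similarity(g_candidates, g_t.unsqueeze(0), dim=1)
    return [sid for sid, s in zip(candidate_ids, sims.tolist()) if s > alpha_thre]

# Example: the attacker repeats this per batch, accumulating source data until enough are found.
source_ids = detect_source_data(torch.randn(8), torch.randn(5, 8),
                                candidate_ids=[3, 17, 42, 58, 90])
```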

4.2 Backdoor Implantation

Based on the above steps, we have already inferred the source data which are from the target class. Our next task is to associate a pre-defined trigger with the attack target. One intuitive method is directly adding the trigger to the source data and putting them into the training process. However, the top model may still learn the mapping between the background clean features of the source data and the target label, causing the failure of backdoor injection. Therefore, we propose a Source Data Perturbation (SDP) module to further enhance the decision-dependency between the trigger and the attack target.

4.2.1 Source Data Perturbation.

A successful backdoored model should give the target prediction as long as the malicious trigger is present, despite the existence of the background clean features. To give the trigger a higher priority than the other clean features in the decision-making phase, we replace the source data with data randomly selected from the same batch. In this way, the source data will contain clean features of different classes. Then we add the trigger to the perturbed source data and put them into the training process to achieve backdoor injection.

There are two main reasons why the SDP module enhances the decision-dependency between the trigger and the attack target: (a) the source data inferred by SDD share the same label (the attack target), and their features are replaced by those of randomly selected data after the SDP module; this equips the source data with diverse features but the same label, making it more difficult for the model to learn from these features. (b) After the SDP process, we add the same trigger to all source data, so they share the same trigger and the same target label, which makes the model more likely to establish the decision-dependency between the trigger and the attack target.
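The following is a minimal sketch of this poisoning step on the attacker’s local image halves: each detected source sample’s partition is overwritten with a randomly chosen sample from the same batch, and the trigger is then stamped on top before the representation is extracted and uploaded. The tensor layout, patch position, and indices are assumptions for illustration.

```python
import torch

def perturb_and_poison(batch_x_k: torch.Tensor, source_idx: list) -> torch.Tensor:
    """batch_x_k: attacker's local image halves, shape (B, C, H, W_half)."""
    x = batch_x_k.clone()
    B = x.size(0)
    for i in source_idx:
        donor = torch.randint(0, B, (1,)).item()   # random sample from the same batch
        x[i] = batch_x_k[donor]                    # SDP: replace the clean background features
        x[i, :, 14:18, 6:10] = 1.0                 # then add the white-square trigger
    return x

poisoned = perturb_and_poison(torch.rand(32, 3, 32, 16), source_idx=[2, 9, 21])
```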

5 Experiments

5.1 Experiment Setup

As most real VFL systems consist of two parties [9, 2, 5], for the rest of this paper, we construct and evaluate BadVFL under a two-party scenario. The attacks in multi-party settings are given in Supplement C.1.

5.1.1 Datasets & Networks.

We evaluate BadVFL on the following datasets: CIFAR-10 [16], ImageNet [25], Breast Histopathology Images (BHI) [23], and IMDB [20]. The first three are image datasets, and IMDB is a text dataset. We describe the datasets in detail in Supplement A.2. To make these datasets suitable for the VFL scenario, following the common setting in VFL [18, 13, 5], for CIFAR-10 and ImageNet we split each image into two parts along the middle line so that each party holds one half. For BHI, there are multiple examination image patches per patient, and we distribute each patient’s patches, which share the same label, to the parties in a round-robin manner. For IMDB, we split each sample (a movie-review paragraph) into two parts and distribute them to the two parties.
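For the image datasets, the split can be sketched as below; the tensor layout (N, C, H, W) and the choice to cut along the width dimension are assumptions about the data pipeline.

```python
import torch

def split_vertically(images: torch.Tensor):
    """images: (N, C, H, W) -> two (N, C, H, W // 2) halves, one per party."""
    w = images.size(-1)
    return images[..., : w // 2], images[..., w // 2 :]

left_half, right_half = split_vertically(torch.rand(4, 3, 32, 32))   # each half is (4, 3, 32, 16)
```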

We experiment with three classic deep neural networks to extract feature representations: ResNet18 [8] for CIFAR-10 and BHI, VGG16 [28] for ImageNet, and LSTM [11] for IMDB. As for the top model used for feature combination and classification, following previous works [12, 13], we adopt a linear combination of these features followed by a nonlinear transformation (e.g., softmax) to make the prediction. To verify the stability of BadVFL, we also conduct experiments with top models containing multiple hidden layers.

5.1.2 Implementation Details.

For the image datasets, the models are trained with the SGD optimizer for 200 epochs. The initial learning rate is 0.01, multiplied by 0.1 every 50 epochs. For the text dataset, the optimizer is Adam with an initial learning rate of 0.001. In all experiments, the poisoning rate $\eta=\frac{|D_{poisoned}|}{|D_{train}|}$ is 1%, the common setting for backdoor attacks [7]. We set the replacement number $n=5$ and the threshold $\alpha_{thre}=0.6$ in SDD for all datasets.
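A hedged sketch of the image-dataset training configuration described above (model and data loading omitted; the placeholder module is an assumption):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 10)                                   # placeholder for the bottom/top networks
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)    # initial learning rate 0.01
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)  # x0.1 every 50 epochs

for epoch in range(200):
    # ... one training epoch over the VFL protocol would go here ...
    scheduler.step()
```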

5.1.3 Evaluation Metrics.

We adopt Test Accuracy Rate (TAR) and Attack Success Rate (ASR) to evaluate BadVFL’s performance. Specifically, TAR is the probability that clean data are classified correctly, measuring the impact of backdoor attacks on the main task. ASR is the probability of predicting poisoned data as the target label, which measures the attack efficacy.
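The two metrics can be computed as in the sketch below; excluding samples that already belong to the target class from ASR is a common convention and an assumption here, not a detail stated in the paper.

```python
import torch

def test_accuracy_rate(clean_preds: torch.Tensor, labels: torch.Tensor) -> float:
    """TAR: fraction of clean test samples classified correctly."""
    return (clean_preds == labels).float().mean().item()

def attack_success_rate(poisoned_preds: torch.Tensor, true_labels: torch.Tensor, target: int) -> float:
    """ASR: fraction of triggered samples predicted as the target label."""
    mask = true_labels != target              # assumption: skip samples already in the target class
    return (poisoned_preds[mask] == target).float().mean().item()
```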

5.2 Attack Performance

5.2.1 Attack Effectiveness.

For the image datasets, we apply the trigger from Gu et al. [7], a white square located in the center of the image, with trigger sizes of 4×4, 20×20, and 5×5 for CIFAR-10, ImageNet, and BHI, respectively. For IMDB, we insert the word ‘[START]’ into the middle of the sentence as the trigger.
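A sketch of stamping the white-square trigger onto the attacker’s local image half is given below; placing the patch at the center of the local partition (rather than of the full image) is an assumption for illustration.

```python
import torch

def stamp_trigger(x: torch.Tensor, size: int) -> torch.Tensor:
    """x: (N, C, H, W) local image partition; paste a size x size white patch at its center."""
    x = x.clone()
    h0 = x.size(2) // 2 - size // 2
    w0 = x.size(3) // 2 - size // 2
    x[:, :, h0:h0 + size, w0:w0 + size] = 1.0
    return x

triggered = stamp_trigger(torch.rand(8, 3, 32, 16), size=4)   # 4x4 trigger for CIFAR-10 halves
```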

To ensure our attack remains consistently effective, for each class we run BadVFL with 3 different randomly selected target data points and report their average as the final result. To suppress the effect of non-determinism, all experiments are averaged across multiple runs. The results are shown in Table 1. Observe that the triggers are successfully injected: the poisoned models show only a small TAR difference compared with the benign models, together with high ASR. Specifically, BadVFL achieves above 93% ASR on all datasets with negligible TAR drops.

Table 1: The attack performance of BadVFL and GR on four datasets.

Dataset   | Benign VFL TAR | BadVFL TAR | BadVFL ASR | GR TAR | GR ASR
CIFAR-10  | 80.96          | 80.69      | 94.98      | 77.08  | 44.07
ImageNet  | 79.63          | 79.47      | 93.15      | 73.01  | 19.21
BHI       | 91.90          | 89.52      | 99.11      | 88.45  | 98.93
IMDB      | 85.62          | 85.01      | 98.97      | 81.99  | 51.98

Figure 3 depicts the category distribution of the source data obtained by the SDD module for different target classes. The value in row $i$ and column $j$ represents the number of inferred source data from class $j$ when the target data come from class $i$. As we can see, in most cases the SDD module correctly identifies source data that are truly from the target class. However, for some target classes, such as class 3 in CIFAR-10, SDD mistakenly identifies a few samples from class 5 as source data. This is because data from class 3 (cat) have similar features to data from class 5 (dog). These few false detections have little influence on the attack effectiveness.

Figure 3: The number of source data from class $j$ (column) being identified as class $i$ (row) by SDD on the CIFAR-10 (left) and ImageNet (right) datasets. The total number of source data is 500 for CIFAR-10 and 100 for ImageNet.

5.2.2 Comparison with Gradient Replacement.

We compare BadVFL with the state-of-the-art method GR [18], which attacks the VFL system by replacing the gradient of locally triggered samples with the gradient of the target data when updating the local bottom model. Table 1 shows the comparison results. Our method outperforms GR on all metrics by a significant margin. In particular, GR hurts the main-task accuracy more, because the adversary replaces the triggered data’s features with random vectors and sends them to the active party to prevent the active party from establishing a new mapping between the triggered data and their true labels. Moreover, the attack performance of GR depends strongly on the selection of the target data.

5.3 Multi-Hidden Layers Performance

To verify the stability of BadVFL under different top model structures, in this section we show the effectiveness of BadVFL for fully-connected top models with multiple hidden layers. Since the role of the top model is feature combination and classification, its structure does not need to be complex. As shown in Figure 4, we experiment with 1-hidden-, 2-hidden-, and 3-hidden-layer models with ReLU activations followed by a softmax transformation.

For the BHI and IMDB datasets, there is no significant difference as the model depth increases. However, for CIFAR-10 and ImageNet, BadVFL performs worse as the network becomes deeper. The underlying cause is that a more complicated top model structure affects the gradient-based SDD calculation, leading to wrongly inferred source data that are not from the target class.

Figure 4: The performance of BadVFL w.r.t. different top model structures.

5.4 Defenses

To examine how defensive strategies affect BadVFL, we evaluate it under noisy gradients and gradient compression, which are commonly used in prior work to train robust FL systems [10, 5].

5.4.1 Noisy Gradients.

One straightforward attempt to defend against BadVFL is adding noise to the exchanged information. We experiment with Gaussian noise [35] with variance from $10^{-5}$ to $10^{-2}$. Because adding noise inevitably affects the gradient similarity calculation, we select the 1% highest-similarity results as the source data in each iteration instead of using a fixed threshold. The results are shown in Figure 5a. As we can see, BadVFL performance monotonically decreases as the noise scale increases. In detail, when the variance is 1e-4, the ASR on ImageNet severely deteriorates and TAR drops by nearly 20%, resulting in good defense performance but seriously compromising the model utility. For CIFAR-10, setting the variance to 1e-3 successfully defends against BadVFL: the ASR drops to 30% with negligible TAR drops. However, in practice it is non-trivial to find an appropriate noise scale that guarantees security while maintaining model utility.
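A minimal sketch of this defense, assuming the active party perturbs the feature gradients before returning them to a passive party:

```python
import torch

def add_gaussian_noise(grad: torch.Tensor, variance: float = 1e-4) -> torch.Tensor:
    """Add zero-mean Gaussian noise with the given variance to the returned gradients."""
    return grad + torch.randn_like(grad) * variance ** 0.5

noisy_grad = add_gaussian_noise(torch.randn(32, 8), variance=1e-4)
```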

5.4.2 Gradient Compression.

Another effective defense strategy is pruning the gradients with small magnitudes to zero [17]. We evaluate different levels of sparsity from 0.75 to 0.1. As shown in Figure 5b, interestingly, for CIFAR-10, BHI, and IMDB, BadVFL maintains considerably high TAR and ASR as the compression rate increases. As for ImageNet, gradient compression can successfully mitigate the backdoor attack, but it introduces significant TAR drops and destroys the model utility.
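A sketch of magnitude-based gradient pruning is given below; interpreting the sparsity level as the fraction of entries set to zero is an assumption consistent with [17].

```python
import torch

def compress_gradient(grad: torch.Tensor, sparsity: float = 0.75) -> torch.Tensor:
    """Prune the smallest-magnitude entries to zero, keeping roughly a (1 - sparsity) fraction."""
    flat = grad.abs().flatten()
    k = max(1, int(flat.numel() * (1.0 - sparsity)))
    threshold = flat.topk(k).values.min()
    return torch.where(grad.abs() >= threshold, grad, torch.zeros_like(grad))

pruned_grad = compress_gradient(torch.randn(32, 8), sparsity=0.75)
```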

Figure 5: BadVFL performance against different defense strategies on all datasets. (a) Noisy gradients. (b) Gradient compression.

5.5 Ablation Study

5.5.1 Position of Trigger.

To further validate the influence of trigger position on attack effectiveness, Figure 6 shows the BadVFL attack performance for three possible locations. For the image classification task, we experiment with the trigger located in the “up left" (u-l), “center", and “bottom right" (b-r) of the image. For the text classification task, we conduct experiments with the trigger located at the “initial", “middle", and “end" of the text.

As shown in Figure 6, for the image classification task, the center location has a significant advantage over the other two locations, because the center area contributes more to model classification and its original features are blocked by the trigger.

For the text classification task, the initial and end positions have a slight advantage over the middle position, due to the sequential training mode of recurrent neural networks. Nevertheless, no matter where the triggers are, BadVFL always succeeds in injecting the backdoor into the model with negligible main accuracy drops.

Figure 6: BadVFL performance w.r.t varied trigger positions.

5.5.2 Source Data Perturbation.

As discussed in Section 4, if we directly add triggers to the source data and put them into the training process, the model may learn not the trigger but the clean features of the source data. Thus, we evaluate the importance of SDP; the results are shown in Table 2. Specifically, we consider the following three cases: (a) no perturbation, (b) replacing the source data with data selected from the same batch, and (c) replacing the source data with data selected from the whole dataset.

We observe that there is no significant difference between the cases where the attacker replaces the source data with data from the same batch or from the whole training dataset. However, when the attacker does not perturb the source data and directly adds the trigger to them, the BadVFL performance drops significantly. This is especially prominent on the CIFAR-10 and ImageNet datasets.

Table 2: BadVFL performance w.r.t. different types of source data perturbation.

Dataset   | No Perturb TAR | No Perturb ASR | Same Batch TAR | Same Batch ASR | Whole Dataset TAR | Whole Dataset ASR
CIFAR-10  | 80.90          | 46.16          | 80.69          | 94.98          | 80.67             | 94.80
ImageNet  | 79.38          | 44.13          | 79.47          | 93.15          | 79.03             | 93.84
BHI       | 87.82          | 87.93          | 89.52          | 99.11          | 90.25             | 99.32
IMDB      | 85.29          | 63.34          | 85.01          | 98.97          | 85.45             | 98.33

5.6 Hyperparameter Analysis

5.6.1 Influence of poisoning rate $\eta$.

We investigate the critical factor $\eta$, which controls the number of poisoned data in the training process. As shown in Figure 7a, the ASR becomes worse when $\eta>3\%$. This is because the attacker uploads more “wrong” features (perturbed source data) as $\eta$ increases, causing the top model to depend more on the other clean features in decision-making.

5.6.2 Influence of replacement number $n$.

We depict the impact of the replacement number $n$ on BadVFL performance in Figure 7b. There is no significant difference among varied $n$. Moreover, because a small $n$ makes the poisoning process more stealthy, we set $n=5$ as the default for all datasets.

5.6.3 Influence of threshold $\alpha_{thre}$.

The threshold $\alpha_{thre}$ in the SDD module controls the final source data set used for backdoor attacks. The results are shown in Figure 7c. Specifically, when $\alpha_{thre}<0.6$, a larger $\alpha_{thre}$ makes BadVFL more effective. This is because a larger $\alpha_{thre}$ yields more accurate source data that are truly from the target class. When $\alpha_{thre}>0.6$, BadVFL converges more stably. Hence, we set $\alpha_{thre}=0.6$ for all datasets.

5.6.4 Influence of trigger size $ts$.

Another critical factor that might affect BadVFL performance is the size of the trigger. As shown in Figure 7d, BadVFL achieves stable performance across different trigger sizes.

Figure 7: The performance of BadVFL on ImageNet dataset w.r.t. different hyperparameters.

6 Conclusion

In this paper, we have demonstrated a new security risk: backdoor attacks can be successfully implanted into VFL systems. We propose BadVFL, which remarkably outperforms the state-of-the-art method. Future defenses may defeat this attack; however, we believe such attacks are a promising direction that (a) exposes possible security threats in this promising training paradigm and (b) enlightens future work. A future direction suggested by our work is countermeasures against backdoor attacks in the VFL scenario, such as anomalous data detection and backdoor mitigation. However, these may be difficult because the defender only holds partial data features and part of the global model.

7 Acknowledgements

This work is supported by the Strategic Priority Research Program of Chinese Academy of Sciences, Grant No. XDC02040400.

References

  • [1] Bagdasaryan, E., Veit, A., Hua, Y., Estrin, D., Shmatikov, V.: How to backdoor federated learning. In: Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (2020)
  • [2] Chen, C., Zhou, J., Wang, L., Wu, X., Fang, W., Tan, J., Wang, L., Liu, A.X., Wang, H., Hong, C.: When homomorphic encryption marries secret sharing: Secure large-scale sparse logistic regression and applications in risk control. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. pp. 2652–2662 (2021)
  • [3] Chen, C., Zhou, J., Zheng, L., Wu, H., Lyu, L., Wu, J., Wu, B., Liu, Z., Wang, L., Zheng, X.: Vertically federated graph neural network for privacy-preserving node classification. In: Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI 2022, Vienna, Austria, 23-29 July 2022. pp. 1959–1965. ijcai.org (2022)
  • [4] FeatureCloud: Transforming health care and medical research with federated learning (2020)
  • [5] Fu, C., Zhang, X., Ji, S., Chen, J., Wu, J., Guo, S., Zhou, J., Liu, A.X., Wang, T.: Label inference attacks against vertical federated learning. In: 31st USENIX Security Symposium (USENIX Security 22), Boston, MA (2022)
  • [6] Fu, F., Shao, Y., Yu, L., Jiang, J., Xue, H., Tao, Y., Cui, B.: Vf2boost: Very fast vertical federated gradient boosting for cross-enterprise learning. Proceedings of the 2021 International Conference on Management of Data (2021)
  • [7] Gu, T., Dolan-Gavitt, B., Garg, S.: Badnets: Identifying vulnerabilities in the machine learning model supply chain. arXiv: Cryptography and Security (2017)
  • [8] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
  • [9] He, Z., Zhang, T., Lee, R.B.: Model inversion attacks against collaborative inference. In: Proceedings of the 35th Annual Computer Security Applications Conference. p. 148–162 (2019)
  • [10] Hitaj, B., Ateniese, G., Pérez-Cruz, F.: Deep models under the gan: Information leakage from collaborative deep learning. The ACM Conference on Computer and Communications Security, (CCS) pp. 603–618 (2017)
  • [11] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8), 1735–1780 (1997)
  • [12] Hu, Y., Niu, D., Yang, J., Zhou, S.: Fdml: A collaborative machine learning framework for distributed features. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2019)
  • [13] Jin, X., Chen, P.Y., Hsu, C.Y., Yu, C.M., Chen, T.: Cafe: Catastrophic data leakage in vertical federated learning. Advances in Neural Information Processing Systems 34, 994–1006 (2021)
  • [14] Kairouz, P., McMahan, H.B., Avent, B., Bellet, A., Bennis, M., Bhagoji, A.N., Bonawitz, K., Charles, Z., Cormode, G., Cummings, R., et al.: Advances and open problems in federated learning. Foundations and Trends® in Machine Learning 14(1–2), 1–210 (2021)
  • [15] Kaissis, G., Makowski, M.R., Rückert, D., Braren, R.F.: Secure, privacy-preserving and federated machine learning in medical imaging. Nature Machine Intelligence 2, 305–311 (2020)
  • [16] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
  • [17] Lin, Y., Han, S., Mao, H., Wang, Y., Dally, W.J.: Deep gradient compression: Reducing the communication bandwidth for distributed training (2017)
  • [18] Liu, Y., Yi, Z., Chen, T.: Backdoor attacks and defenses in feature-partitioned collaborative learning. arXiv preprint arXiv:2007.03608 (2020)
  • [19] Liu, Y., Ma, S., Aafer, Y., Lee, W.C., Zhai, J., Wang, W., Zhang, X.: Trojaning attack on neural networks. In: Network and Distributed System Security Symposium, NDSS (2018)
  • [20] Maas, A., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors for sentiment analysis. In: Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies. pp. 142–150 (2011)
  • [21] Nguyen, T.D., Rieger, P., Chen, H., Yalame, H., Möllering, H., Fereidooni, H., Marchal, S., Miettinen, M., Mirhoseini, A., Zeitouni, S., Koushanfar, F., Sadeghi, A., Schneider, T.: FLAME: taming backdoors in federated learning. In: Butler, K.R.B., Thomas, K. (eds.) 31st USENIX Security Symposium, USENIX Security 2022, Boston, MA, USA, August 10-12, 2022. pp. 1415–1432. USENIX Association (2022)
  • [22] Ozdayi, S.M., Kantarcioglu, M., Gel, R.Y.: Defending against backdoors in federated learning with robust learning rate. THIRTY-FIFTH AAAI CONFERENCE ON Artificial Intelligence, AAAI pp. 9268–9276 (2021)
  • [23] Mooney, P.: Breast histopathology images (2017)
  • [24] Rieger, P., Nguyen, T.D., Miettinen, M., Sadeghi, A.: Deepsight: Mitigating backdoor attacks in federated learning through deep model inspection. In: 29th Annual Network and Distributed System Security Symposium, NDSS 2022, San Diego, California, USA, April 24-28, 2022. The Internet Society (2022)
  • [25] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., et al.: Imagenet large scale visual recognition challenge. International journal of computer vision 115(3), 211–252 (2015)
  • [26] Shafahi, A., Huang, R.W., Najibi, M., Suciu, O., Studer, C., Dumitras, T., Goldstein, T.: Poison frogs! targeted clean-label poisoning attacks on neural networks. NIPS 2018 (2018)
  • [27] Shejwalkar, V., Houmansadr, A.: Manipulating the byzantine: Optimizing model poisoning attacks and defenses for federated learning. In: 28th Annual Network and Distributed System Security Symposium, NDSS 2021, virtually, February 21-25, 2021. The Internet Society (2021)
  • [28] Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations (2015)
  • [29] Webank: Utilization of fate in risk management of credit in small and micro enterprises (2018)
  • [30] Xie, C., Huang, K., Chen, P.Y., Li, B.: DBA: Distributed backdoor attacks against federated learning. In: ICLR (2020)
  • [31] Xu, J., Wang, R., Koffas, S., Liang, K., Picek, S.: More is better (mostly): On the backdoor attacks in federated graph neural networks. In: Annual Computer Security Applications Conference, ACSAC 2022, Austin, TX, USA, December 5-9, 2022. pp. 684–698. ACM (2022)
  • [32] Yang, Q., Liu, Y., Chen, T., Tong, Y.: Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST) 10(2), 1–19 (2019)
  • [33] Zhang, Q., Gu, B., Deng, C., Huang, H.: Secure bilevel asynchronous vertical federated learning with backward updating. In: AAAI (2021)
  • [34] Zhang, Z., Panda, A., Song, L., Yang, Y., Mahoney, M.W., Mittal, P., Ramchandran, K., Gonzalez, J.: Neurotoxin: Durable backdoors in federated learning. In: International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Proceedings of Machine Learning Research, vol. 162, pp. 26429–26446. PMLR (2022)
  • [35] Zhu, L., Liu, Z., Han, S.: Deep leakage from gradients. In: Advances in Neural Information Processing Systems. vol. 32. Curran Associates, Inc. (2019)