
Generalization Bound and New Algorithm for Clean-Label Backdoor Attack

Lijia Yu    Shuang Liu    Yibo Miao    Xiao-Shan Gao    Lijun Zhang
Abstract

The generalization bound is a crucial theoretical tool for assessing the generalizability of learning methods, and there exists a vast literature on the generalizability of normal learning, adversarial learning, and data poisoning. Unlike other data poisoning attacks, the backdoor attack has the special property that the poisoned triggers are contained in both the training set and the test set, and the purpose of the attack is two-fold. To our knowledge, the generalization bound for the backdoor attack has not been established. In this paper, we fill this gap by deriving algorithm-independent generalization bounds in the clean-label backdoor attack scenario. Precisely, based on the goals of the backdoor attack, we give upper bounds for the clean-sample population error and the poison population error in terms of the empirical error on the poisoned training dataset. Furthermore, based on the theoretical result, a new clean-label backdoor attack is proposed that computes the poisoning trigger by combining adversarial noise and indiscriminate poison. We show its effectiveness in a variety of settings.


1 Introduction

The generalization bound is a key theoretical tool for assessing the generalizability of a learning method. Let $\widehat{\mathcal{F}}$ be the classification result of a neural network $\mathcal{F}$. The main purpose of a learning algorithm is to minimize the population error on the data distribution $\mathcal{D}_S$, i.e., $\mathcal{E}(\mathcal{F},\mathcal{D}_S)=\mathbb{E}_{(x,y)\sim\mathcal{D}_S}[\mathbf{1}(\widehat{\mathcal{F}}(x)\neq y)]$. On the other hand, the training procedure can only minimize the empirical error on a given finite training set $\mathcal{D}_{\rm tr}$ sampled i.i.d. from $\mathcal{D}_S$, i.e., $\mathbb{E}_{(x,y)\in\mathcal{D}_{\rm tr}}[\mathbf{1}(\widehat{\mathcal{F}}(x)\neq y)]=\frac{1}{|\mathcal{D}_{\rm tr}|}\sum_{(x,y)\in\mathcal{D}_{\rm tr}}\mathbf{1}(\widehat{\mathcal{F}}(x)\neq y)$. A theoretical way to ensure generalizability is to control the generalization gap between the population error and the empirical error; an upper bound on this gap is called a generalization bound. The learning algorithm has generalizability if the generalization bound approaches zero when the size of the training set is sufficiently large.

Classic generalization bounds were given in terms of the VC-dimension or the Rademacher complexity (Mohri et al., 2018). Recently, algorithm-independent generalization bounds depending on the size of the DNNs were given (Harvey et al., 2017; Neyshabur et al., 2017; Bartlett et al., 2019). Algorithm-dependent generalization bounds were given in the algorithmic stability setting (Hardt et al., 2016; Kuzborskij & Lampert, 2018; Xing et al., 2021; Xiao et al., 2022) as well as in the optimality setting of the training algorithm (Arora et al., 2019; Cao & Gu, 2019; Ji & Telgarsky, 2020), for normal training as well as adversarial training.

However, these generalization bounds are for clean training datasets and cannot be applied to poisoned training datasets, because poisoned datasets do not satisfy the i.i.d. condition that is necessary for generalizability. There exist works on generalizability under data poisoning attacks: Wang et al. (2021) showed optimal convergence of SGD under poisoning attacks for depth-two networks, and Hanneke et al. (2022) gave the optimal learning error for certain poisoning attacks.

Unlike other poisoning attacks, the backdoor attack has the special property that the poisoning trigger is contained both in the training set and in the test set, and its goal is two-fold: to keep high accuracy on clean data and to output a given label for data containing the trigger. As far as we know, a generalization bound under backdoor poisoning attacks has not yet been established. In this paper, we give generalization bounds in the clean-label backdoor attack setting and use the bounds to design more effective poisoning attacks.

Clean-label backdoor attacks are an important class of poisoning attacks (Turner et al., 2018; Barni et al., 2019; Saha et al., 2020; Liu et al., 2020; Doan et al., 2021a; Ning et al., 2021; Zeng et al., 2022), where poison triggers are added to a subset of the training set $\mathcal{D}_{\rm tr}$ without altering the labels, yielding the poisoned training set $\mathcal{D}_P$. The goal of the attack is twofold: a network trained on $\mathcal{D}_P$ should maintain high accuracy on clean data, but classify any input containing the trigger as a target label $l_p$. In this paper, we consider the same setting as (Doan et al., 2021a, b): the sample-wise poison $\mathcal{P}(x)$ is used not only as the poison perturbation during the training phase, but also as the trigger for attacks during the inference phase. Based on the goals of the clean-label backdoor attack, we consider three questions in this paper.

Q1: Can clean-sample generalization be guaranteed for a network trained on the poisoned training set?

To answer the above question, we need to bound the population error by the empirical error on $\mathcal{D}_P$, that is, by $\mathcal{E}(\mathcal{F},\mathcal{D}_P)=\mathbb{E}_{(x,y)\in\mathcal{D}_P}[\mathbf{1}(\widehat{\mathcal{F}}(x)\neq y)]$. Such a bound is given in the following theorem.

Theorem 1.1 (Informal).

Let $\mathcal{F}$ be any neural network with fixed depth and width, let $N=|\mathcal{D}_{\rm tr}|$, and suppose no more than an $\alpha$ fraction of the samples labeled $l_p$ in $\mathcal{D}_{\rm tr}$ are poisoned. Then, with high probability, we have

$\mathcal{E}(\mathcal{F},\mathcal{D}_{\mathcal{S}})\leq \frac{4-2\alpha}{1-\alpha}\,\mathcal{E}(\mathcal{F},\mathcal{D}_P)+O\!\left(\frac{1}{\sqrt{N}}\right).$

Theorem 1.1 guarantees clean-sample generalization and answers Question Q1. It also indicates that generalizability is affected by the poisoning ratio. The main challenge in establishing Theorem 1.1 is that the data in $\mathcal{D}_P$ are no longer i.i.d. samples from $\mathcal{D}_S$, so the classical generalization bound cannot be used to obtain Theorem 1.1 directly.

Q2: How to ensure that the network trained on the poisoned dataset classifies any data with the trigger as the target label?

To answer this question, we need to bound the poison generalization error $\mathbb{E}_{(x,y)\sim\mathcal{D}_S}[\mathbf{1}(\widehat{\mathcal{F}}(x+\mathcal{P}(x))\neq l_p)]$ by the empirical error on $\mathcal{D}_P$. If $(x,l_p)\in\mathcal{D}_{\rm tr}$ is poisoned to $(x+\mathcal{P}(x),l_p)\in\mathcal{D}_P$, then minimizing the empirical error on $\mathcal{D}_P$ naturally causes the network to classify $x+\mathcal{P}(x)$ as $l_p$. The main challenge is that, in a clean-label attack, if $(x,y)\in\mathcal{D}_{\rm tr}$ and $y\neq l_p$, then $(x+\mathcal{P}(x),l_p)$ is not in $\mathcal{D}_P$, so minimizing the empirical error on $\mathcal{D}_P$ may not cause the network to classify $x+\mathcal{P}(x)$ as $l_p$. This implies that the poison generalization error cannot be controlled by the empirical error on $\mathcal{D}_P$ in general. However, if $\mathcal{P}(x)$ satisfies certain conditions, we can establish the desired bound, as shown in the following theorem.

Theorem 1.2 (Informal).

If $\mathcal{P}(x)$ is the adversarial noise of a network trained on the clean training set $\mathcal{D}_{\rm tr}$, $\mathcal{P}(x)$ is similar for different $x$, and $\mathcal{P}(x)$ is a shortcut, then, with high probability, the following result holds:

$\mathbb{E}_{(x,y)\sim\mathcal{D}_S}[\mathbf{1}(\widehat{\mathcal{F}}(x+\mathcal{P}(x))\neq l_p)]\leq \widetilde{O}\Big(\frac{1}{\alpha}\,\mathbb{E}_{(x,y)\in\mathcal{D}_P}[L_{CE}(\mathcal{F}(x),y)]\Big),$

where certain small quantities are absorbed into $\widetilde{O}$, and $L_{CE}$ is the cross-entropy loss.

We further show that the conditions of Theorem 1.2 can be satisfied and that the poison generalization error approaches zero when the empirical error on $\mathcal{D}_P$ is small and $|\mathcal{D}_{\rm tr}|$ is large, which establishes the generalizability of the attack and answers Question Q2 (see Section 4.3).

Q3: How to design a backdoor attack algorithm guided by the generalization bounds?

Theorem 1.1 shows how the poisoning ratio affects the accuracy on clean samples and imposes no special requirements on the trigger itself. Theorem 1.2 indicates that if the trigger satisfies certain conditions, the poison generalization error can be controlled. Motivated by the conditions in Theorem 1.2, we propose a new clean-label backdoor attack with certain theoretical guarantees. By (Yu et al., 2022), indiscriminate poisons can be considered as shortcuts, so, following the conditions in Theorem 1.2, we use a combination of adversarial noise and indiscriminate poison as the trigger and experimentally demonstrate that our backdoor attack is effective in a variety of settings.

2 Related Work

Generalization bound. Generalization bounds are a central issue in learning theory and have been studied extensively (Valle-Pérez & A. Louis, 2022).

Algorithm-independent generalization bounds usually depend on the VC-dimension or the Rademacher complexity (Mohri et al., 2018). In (Harvey et al., 2017; Bartlett et al., 2019), generalization bounds for DNNs were given in terms of width, depth, and the number of parameters. In (Neyshabur et al., 2017; Barron & Klusowski, 2018; Dziugaite & Roy, 2017; Bartlett et al., 2017; Valle-Pérez & A. Louis, 2022), upper bounds on the generalization error were given in various settings. In (Wei & Ma, 2019; Arora et al., 2018), tighter generalization bounds for networks were given based on the Rademacher complexity. Long & Sedghi (2019) gave generalization bounds for CNNs, Vardi et al. (2022) gave the sample complexity of small networks, and Brutzkus & Globerson (2021) studied generalization bounds for max-pooling networks.

Algorithm-dependent generalization bounds were established in the algorithmic stability setting (Wang & Ma, 2022; Kuzborskij & Lampert, 2018; Farnia & Ozdaglar, 2021; Xing et al., 2021; Xiao et al., 2022; Wang et al., 2024), both for normal training and for adversarial training. Li et al. (2023) studied generalization bounds for transformers. In (Arora et al., 2019; Cao & Gu, 2019; Ji & Telgarsky, 2020), the training and generalization of DNNs in the over-parameterized regime were studied.

For generalization under data poisoning, Wang et al. (2021) analyzed the convergence of SGD under poisoning attacks for two-layer neural networks, and Hanneke et al. (2022) gave the optimal learning error under a poisoning attack when there is only one target sample. In (Bennouna & Van Parys, 2022; Bennouna et al., 2023), generalization bounds were used to design new robust algorithms under data poisoning.

Our result is different from these works and cannot be derived from them. First, our bounds are for general networks and algorithm-independent. Second, the backdoor attack has the special property that the trigger occurs in both the training phase and the inference phase. Third, the purpose of the attack is two-fold, whereas other poisoning attacks have a single goal.

Backdoor attacks and defenses. In general, backdoor attacks alter the training data to introduce a trigger that induces model vulnerability (Chen et al., 2017; Zhong et al., 2020; Li et al., 2020, 2021; Doan et al., 2021a), where the labels may be changed. Highly relevant to our work is a subset of backdoor attacks called clean-label backdoor attacks (Turner et al., 2018; Barni et al., 2019; Liu et al., 2020; Saha et al., 2020; Ning et al., 2021; Zeng et al., 2022; Souri et al., 2022), where modifications to the training data cannot alter the labels. Real-world attacks were considered in (Chen et al., 2017; Bagdasaryan & Shmatikov, 2021). Backdoor detection methods (Huang et al., 2019; Kolouri et al., 2020; Hayase et al., 2021; Zeng et al., 2021b) and backdoor mitigation methods (Liu et al., 2018; Wang et al., 2019; Zeng et al., 2021a) were proposed to defend against backdoor attacks. Most existing backdoor attacks are mainly based on empirical heuristics, while our attack is based on generalization bounds and has certain theoretical guarantees.

3 Notation

3.1 Basic Notation

Let the data follow a distribution $\mathcal{D}_{\mathcal{S}}$ over $\mathcal{S}\times[m]$, where $\mathcal{S}\subset[0,1]^n$ is a set of image data and $[m]=\{1,\ldots,m\}$ is the label set. Let $\mathcal{D}_{\rm tr}=\{(x_i,y_i)\}_{i=1}^N$ be the training set with $N$ samples drawn i.i.d. from $\mathcal{D}_{\mathcal{S}}$. Let $\mathcal{F}:\mathcal{S}\to\mathbb{R}^m$ be a neural network with ReLU activations and Softmax applied to the output layer, so that $\mathcal{F}:\mathcal{S}\to[0,1]^m$. For any network $\mathcal{F}$, let $\widehat{\mathcal{F}}(x)=\mathrm{argmax}_{l\in[m]}\,\mathcal{F}_l(x)$ be the classification result of $\mathcal{F}$, where $\mathcal{F}_l(x)$ is the $l$-th component of $\mathcal{F}(x)$. Let $\mathcal{H}_{W,D}$ be the set of neural networks with width $W$ and depth $D$. For a given network $\mathcal{F}$, define $h_{\mathcal{F}}(x,y)=\mathcal{F}_y(x):\mathcal{S}\times[m]\to[0,1]$. Let $\mathcal{H}_{W,D,1}=\{h_{\mathcal{F}}(x,y)\,:\,\mathcal{F}\in\mathcal{H}_{W,D}\}$.

Let $\mathrm{Rad}_k^{\mathcal{D}}(\mathcal{H})$ be the Rademacher complexity of a hypothesis space $\mathcal{H}$ under distribution $\mathcal{D}$ with $k$ samples, that is,

$\mathrm{Rad}_k^{\mathcal{D}}(\mathcal{H})=\mathbb{E}_{x_i\sim\mathcal{D},\,i\in[k]}\Big[\mathbb{E}_{\sigma}\Big[\sup_{h\in\mathcal{H}}\frac{1}{k}\sum_{i=1}^k\sigma_i h(x_i)\Big]\Big],$

where $\sigma=(\sigma_i)_{i=1}^k$ are independent random variables with $\mathbb{P}(\sigma_i=1)=\mathbb{P}(\sigma_i=-1)=0.5$.
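To make the Rademacher complexity concrete, the following is a minimal Monte Carlo sketch for a hypothesis class where the supremum has a closed form (norm-bounded linear functions); the class, the random data, and the trial count are illustrative assumptions, not part of the paper's setting.

import numpy as np

def empirical_rademacher_linear(X, B=1.0, n_trials=1000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of the class
    {x -> <w, x> : ||w||_2 <= B} on the sample X (shape k x n).  For this class,
    sup_w (1/k) sum_i sigma_i <w, x_i> = (B / k) * ||sum_i sigma_i x_i||_2."""
    rng = np.random.default_rng(seed)
    k = X.shape[0]
    vals = [B / k * np.linalg.norm(rng.choice([-1.0, 1.0], size=k) @ X)
            for _ in range(n_trials)]
    return float(np.mean(vals))

# Illustrative usage on random data: the estimate shrinks roughly like O(1/sqrt(k)).
X = np.random.default_rng(1).normal(size=(256, 32))
print(empirical_rademacher_linear(X))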

3.2 Backdoor Attack

In a clean-label backdoor attack, let $\mathcal{P}:\mathbb{R}^n\to\mathbb{R}^n$ be the trigger function, so that $\mathcal{P}(x)$ is the trigger of sample $x$; let $l_p$ be the target label and $\alpha$ the poisoning rate of the target label. The exact poisoning procedure is as follows.

Create the clean-label poisoned training set $\mathcal{D}_P$. First, randomly select an $\alpha$ fraction of the samples labeled $l_p$ in $\mathcal{D}_{\rm tr}$ to form a subset $\mathcal{D}_{sub}$; then let $\mathcal{D}_{poi}=\{(x+\mathcal{P}(x),l_p)\,:\,(x,l_p)\in\mathcal{D}_{sub}\}$ be the set of poisoned samples and $\mathcal{D}_{clean}=\mathcal{D}_{\rm tr}\setminus\mathcal{D}_{sub}$ the set of clean samples; finally, let $\mathcal{D}_P=\mathcal{D}_{clean}\cup\mathcal{D}_{poi}$ be the poisoned training set.
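As a concrete illustration of this procedure, the following sketch builds $\mathcal{D}_P$ from a list of (image, label) pairs; the function name build_poisoned_set and the placeholder trigger_fn (standing in for $\mathcal{P}(x)$) are our own assumptions, not the paper's implementation.

import random
import numpy as np

def build_poisoned_set(train_set, trigger_fn, target_label, alpha, seed=0):
    """Clean-label poisoning: add the trigger to an alpha fraction of the
    target-class samples without changing their labels."""
    rng = random.Random(seed)
    target_idx = [i for i, (_, y) in enumerate(train_set) if y == target_label]
    poison_idx = set(rng.sample(target_idx, int(alpha * len(target_idx))))
    poisoned_set = []
    for i, (x, y) in enumerate(train_set):
        if i in poison_idx:
            x = np.clip(x + trigger_fn(x), 0.0, 1.0)  # x + P(x); the label y = l_p is kept
        poisoned_set.append((x, y))
    return poisoned_set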

Let $\mathcal{D}_{\mathcal{S}}^{l_p}$ be the distribution of the samples with label $l_p$, that is, for any set $A$,

$\mathbb{P}_{(x,y)\sim\mathcal{D}_S^{l_p}}((x,y)\in A)=\mathbb{P}_{(x,y)\sim\mathcal{D}_S}((x,y)\in A\,|\,y=l_p).$

The goal of the clean-label backdoor attack. Let $\mathcal{F}$ be a network trained on the poisoned training set $\mathcal{D}_P$. The goal of the clean-label backdoor attack is twofold:

(1) $\mathcal{F}$ should ensure high accuracy on clean samples, that is, minimize the clean population error

$\mathcal{E}(\mathcal{F},\mathcal{D}_{\mathcal{S}})=\mathbb{E}_{(x,y)\sim\mathcal{D}_S}[\mathbf{1}(\widehat{\mathcal{F}}(x)\neq y)].$

(2) For any clean sample $x$, $\mathcal{F}$ should classify $x+\mathcal{P}(x)$ into the label $l_p$, that is, minimize the poison population error

$\mathcal{E}_P(\mathcal{F},\mathcal{D}_{\mathcal{S}})=\mathbb{E}_{(x,y)\sim\mathcal{D}_S}[\mathbf{1}(\widehat{\mathcal{F}}(x+\mathcal{P}(x))\neq l_p)].$

To achieve the goals of the clean-label backdoor attack, we need to give upper bounds for the clean population error and the poison population error in terms of the empirical risk or the empirical error over the poisoned training set:

$\mathcal{R}(\mathcal{F},\mathcal{D}_P)=\mathbb{E}_{(x,y)\in\mathcal{D}_P}[L_{CE}(\mathcal{F}(x),y)],\qquad \mathcal{E}(\mathcal{F},\mathcal{D}_P)=\mathbb{E}_{(x,y)\in\mathcal{D}_P}[\mathbf{1}(\widehat{\mathcal{F}}(x)\neq y)].$
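For completeness, here is a minimal sketch of how these quantities are estimated on finite data; the model interface (a callable returning a vector of class probabilities) and the helper names are assumptions made for illustration.

import numpy as np

def empirical_error(model, dataset):
    """E(F, D): fraction of samples whose argmax prediction differs from the label."""
    return float(np.mean([np.argmax(model(x)) != y for x, y in dataset]))

def empirical_risk(model, dataset, eps=1e-12):
    """R(F, D): average cross-entropy, assuming model(x) returns softmax probabilities."""
    return float(np.mean([-np.log(model(x)[y] + eps) for x, y in dataset]))

def poison_error(model, dataset, trigger_fn, target_label):
    """Monte Carlo estimate of E_P(F, D_S): fraction of triggered inputs not classified as l_p."""
    return float(np.mean([np.argmax(model(np.clip(x + trigger_fn(x), 0, 1))) != target_label
                          for x, _ in dataset]))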

4 Generalization Bounds under Poison

In this section, we derive generalization bounds under clean-label backdoor attack.

4.1 Clean Generalization Error Bound

In this subsection, we give an upper bound on the clean population error in terms of the empirical error on $\mathcal{D}_P$, which implies Theorem 1.1.

Theorem 4.1.

Let $N=|\mathcal{D}_{\rm tr}|$. Then, for any $\delta>0$, with probability at least $1-\delta-O(\frac{1}{N})$, the following inequality holds for any $\mathcal{F}\in\mathcal{H}_{W,D}$:

$\mathcal{E}(\mathcal{F},\mathcal{D}_S)\leq \frac{4-2\alpha}{1-\alpha}\,\mathcal{E}(\mathcal{F},\mathcal{D}_P)+O\Big(\sqrt{\frac{mW^2D^2}{N(1-\alpha)^2}}+\sqrt{\frac{\ln(2/\delta)}{N(1-\alpha)}}\Big). \qquad (1)$

Proof idea. In the backdoor attack, only a portion of the data is poisoned, so we can select a subset of the training set $\mathcal{D}_P$ whose elements are i.i.d. samples from the distribution $\mathcal{D}_{\mathcal{S}}$, and then use the classical generalization bound in Theorem A.1 to estimate the generalization gap on this subset. The proof and a generalized form of Theorem 4.1 are given in Appendix A.

The generalization bound (1) implies that, for fixed $\alpha, W, D$, when $N$ is large enough the attack has generalizability in the sense that a small empirical error on $\mathcal{D}_P$ leads to a small population error. From (1), we also see that the poisoning ratio $\alpha$ affects the population error: a smaller $\alpha$ leads to a lower population error, as expected.
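As a quick numerical illustration (our own arithmetic, not taken from the paper), the leading factor in (1) grows only moderately with the poisoning ratio:
\[
\frac{4-2\alpha}{1-\alpha}\Big|_{\alpha=0.01}\approx 4.02,\qquad
\frac{4-2\alpha}{1-\alpha}\Big|_{\alpha=0.1}\approx 4.22,\qquad
\frac{4-2\alpha}{1-\alpha}\Big|_{\alpha=0.5}=6,
\]
so even for fairly large poisoning ratios the bound is dominated by the empirical term $\mathcal{E}(\mathcal{F},\mathcal{D}_P)$ up to a small constant, plus the $O(1/\sqrt{N})$ term.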

Remark 4.2.

Generalization bounds are usually given as upper bounds for the generalization gap $\mathcal{E}(\mathcal{F},\mathcal{D}_S)-\mathcal{E}(\mathcal{F},\mathcal{D}_{\rm tr})$. The generalization bound (1) cannot be written in this form, but it can still be used to establish generalizability, as explained above.

Remark 4.3.

The population error in (1) also depends on $\mathcal{P}(x)$ implicitly, because the empirical error is affected by $\mathcal{P}(x)$. We show in Appendix B that certain choices of $\mathcal{P}(x)$ can result in a large poison empirical error.

Remark 4.4.

In Appendix C, we show that $O(\frac{1}{\sqrt{N}})$ is the optimal bound for the generalization gap between the population error and the empirical error when there is no special restriction on the distribution and the hypothesis space.

4.2 Poison Generalization Error Bound

In this subsection, we give an upper bound on the poison population error in terms of the empirical risk on the poisoned training set, which implies Theorem 1.2.

Theorem 4.5.

Let $N=|\mathcal{D}_{\rm tr}|$. For any $\mathcal{F},\mathcal{G}\in\mathcal{H}_{W,D}$, if the trigger $\mathcal{P}(x)$ satisfies the following three conditions for some $\epsilon>0$, $\tau>0$, $\lambda\geq 1$:
(c1): $\mathbb{E}_{(x,y)\sim\mathcal{D}_{\mathcal{S}}^{l_p}}[\mathcal{G}_y(x+\mathcal{P}(x))]\leq\epsilon$;
(c2): $\mathbb{P}_{(x,y)\sim\mathcal{D}_{\mathcal{S}}}(\mathcal{P}(x)\in A\,|\,y\neq l_p)\leq\lambda\,\mathbb{P}_{(x,y)\sim\mathcal{D}_{\mathcal{S}}}(\mathcal{P}(x)\in A\,|\,y=l_p)$ for any set $A$;
(c3): $\mathbb{E}_{x\sim\mathcal{D}_{\mathcal{S}}}[|(\mathcal{F}-\mathcal{G})_{l_p}(\mathcal{P}(x))-(\mathcal{F}-\mathcal{G})_{l_p}(x+\mathcal{P}(x))|]\leq\tau$, where $(\mathcal{F}-\mathcal{G})_{l_p}(x)=\mathcal{F}_{l_p}(x)-\mathcal{G}_{l_p}(x)$;
then, with probability at least $1-\delta-O(1/N)$, the following inequality holds for $\mathcal{F}$:

$\mathcal{E}_P(\mathcal{F},\mathcal{D}_{\mathcal{S}})\leq \lambda\, O\Big(\frac{1}{\alpha}\big(\mathbb{E}_{(x,y)\in\mathcal{D}_P}[L_{CE}(\mathcal{F}(x),y)]+\mathrm{Rad}_N^{\mathcal{D}_S^{l_p}}(\mathcal{H}_{W,D,1})\big)+\sqrt{\frac{\ln(1/\delta)}{N\alpha}}+\epsilon+\tau+\frac{\lambda-1}{\lambda}\Big). \qquad (2)$

Proof idea. First, estimate $\mathbb{E}_{\mathcal{D}_{\mathcal{S}}^{l_p}}[\mathcal{F}_y(x+\mathcal{P}(x))]$ by the empirical error, similarly to Theorem 4.1. Second, estimate $\mathbb{E}_{\mathcal{D}_S}[\mathcal{F}_{l_p}(\mathcal{P}(x))]$ from $\mathbb{E}_{\mathcal{D}_{\mathcal{S}}^{l_p}}[\mathcal{F}_y(x+\mathcal{P}(x))]$ via the chain $\mathbb{E}_{\mathcal{D}_{\mathcal{S}}^{l_p}}[\mathcal{F}_y(x+\mathcal{P}(x))]\xrightarrow{(c1),(c3)}\mathbb{E}_{\mathcal{D}_{\mathcal{S}}^{l_p}}[\mathcal{F}_{l_p}(\mathcal{P}(x))]\xrightarrow{(c2)}\mathbb{E}_{\mathcal{D}_S}[\mathcal{F}_{l_p}(\mathcal{P}(x))]$. Finally, use (c3) to estimate $\mathcal{E}_P(\mathcal{F},\mathcal{D}_{\mathcal{S}})$ from $\mathbb{E}_{\mathcal{D}_S}[\mathcal{F}_{l_p}(\mathcal{P}(x))]$. An intuitive explanation of the proof is given in Appendix A.1. The proof and a generalized form of Theorem 4.5 are given in Appendix D.

Remark 4.6.

It is clear that making the poison generalization error small by just adding the trigger to a small percentage of training data can only be valid under certain conditions. A key contribution of this paper is to find conditions (c1), (c2), (c3) in Theorem 4.5. In the next subsection, we will explain these conditions and show that it is possible to establish generalizability of the poisoning attack with Theorem 4.5.

Remark 4.7.

$\mathrm{Rad}_N^{\mathcal{D}_S^{l_p}}(\mathcal{H}_{W,D,1})$ in inequality (2) is not easy to calculate in terms of $W,D,N$, but we can show that if $N$ is sufficiently large, this value approaches 0 in most cases, as shown in Appendix D.3.

Remark 4.8.

Please note that the right-hand side of inequality (2) contains the empirical risk $\mathcal{R}(\mathcal{F},\mathcal{D}_P)$ rather than the empirical error $\mathcal{E}(\mathcal{F},\mathcal{D}_P)$. This is due to scaling techniques used in the proof, and it is also reasonable: for the victim network to classify $x+\mathcal{P}(x)$ as class $l_p$, it must fit the poisoned data $\mathcal{D}_P$ very well; merely classifying the poisoned data correctly is not enough.

4.3 Explaining the Conditions in Theorem 4.5

In order to make the bound (2) small, the values of $\epsilon$ and $\tau$ need to be small and $\lambda$ needs to be close to 1. In this subsection, we show how the values in conditions (c1) to (c3) can be made small.

How can $\epsilon$ be small? By condition (c1), since $\mathcal{G}_y(x)$ represents the probability that the network $\mathcal{G}$ classifies $x$ as $y$, to make $\epsilon$ small we only need to take the trigger $\mathcal{P}(x)$ to be adversarial noise for the network $\mathcal{G}$ trained on the clean dataset. In other words, if $\mathcal{P}(x)$ is an adversarial perturbation of $x$ for a network trained on clean data, then $\epsilon$ is small.

How can $\lambda$ be close to 1? By condition (c2), the upper bound is proportional to $\lambda$, so we want a small $\lambda$. When $\mathcal{P}(x)$ is the same for all $x$, we have $\lambda=1$. So, to make $\lambda$ close to 1, $\mathcal{P}(x)$ needs to be similar across $x$ with different labels.

How can $\tau$ be small? Condition (c3) is not intuitive. In the rest of this subsection, we give a condition under which $\tau$ is small. We state simplified versions of the definition and the proposition for easier reading; for formal descriptions, please refer to Appendix E.1.

Intuitively speaking, if $\mathcal{F}$ is a network trained on $\mathcal{D}_P$, the meaning of (c3) is that the backdoor part (i.e., $\mathcal{F}-\mathcal{G}$) gives similar outputs for $\mathcal{P}(x)$ and $x+\mathcal{P}(x)$. This is similar to some conclusions on indiscriminate poisoning (Zhu et al., 2023b; Huang et al., 2021), and by (Yu et al., 2022), indiscriminate poisons can be considered as shortcuts. These observations encourage the trigger to be a shortcut. We will show that, under some assumptions, making $\mathcal{P}(x)$ a shortcut ensures condition (c3). First, we define the shortcut.

Definition 4.9 (Binary shortcut, Informal).

$\mathcal{P}(x)$ is called a shortcut of the linearly inseparable binary classification dataset $\mathcal{D}=\{(x_i,1)\}_{i=1}^{N_1}\cup\{(\widehat{x}_i,0)\}_{i=1}^{N_0}$ if $\mathcal{D}_P=\{(x_i,1)\}_{i=1}^{N_1}\cup\{(\widehat{x}_i+\mathcal{P}(\widehat{x}_i),0)\}_{i=1}^{N_0}$ is linearly separable.

Definition 4.9 means that a shortcut is a poison which makes the poisoned dataset linearly separable. We will show that, when $\mathcal{P}(x)$ is a suitable shortcut, there is an upper bound on $\mathbb{E}_{x\sim\mathcal{D}_{\mathcal{S}}}[|(\mathcal{F}-\mathcal{G})_{l_p}(\mathcal{P}(x))-(\mathcal{F}-\mathcal{G})_{l_p}(x+\mathcal{P}(x))|]$, which implies that (c3) in Theorem 4.5 can be satisfied with a small $\tau$.

Proposition 4.10 (Informal. The exact form and proof are given in Appendix E).

Following Theorem 4.5, let $\mathcal{D}'_P=\{(x,0)\,|\,(x,y)\in\mathcal{D}_{\rm tr}\setminus\mathcal{D}_{clean}\}\cup\{(x,1)\,|\,(x,y)\in\mathcal{D}_{clean}\}$. Under certain mild conditions, if $\mathcal{P}(x)$ is a shortcut of the dataset $\mathcal{D}'_P$ and $N$ is large enough, then, with high probability, for some $\mathcal{F}\in\mathcal{H}_{W,D}$ satisfying $\mathcal{F}_y(x)>1-\epsilon$ for all $(x,y)\in\mathcal{D}_P$ and some $\mathcal{G}\in\mathcal{H}_{W,D}$ satisfying $\mathcal{G}_y(x)>1-\epsilon$ for all $(x,y)\in\mathcal{D}_{\rm tr}$, we have $\mathbb{E}_{x\sim\mathcal{D}_{\mathcal{S}}}[|(\mathcal{F}-\mathcal{G})_{l_p}(\mathcal{P}(x))-(\mathcal{F}-\mathcal{G})_{l_p}(x+\mathcal{P}(x))|]\leq\widetilde{O}(\epsilon)$.

So let $\mathcal{P}(x)$ be a shortcut of $\mathcal{D}'_P$. Then, by Proposition 4.10, if $\mathcal{F}\in\mathcal{H}_{W,D}$ fits $\mathcal{D}_P$ well and $\mathcal{G}\in\mathcal{H}_{W,D}$ fits $\mathcal{D}_{\rm tr}$ well (empirically, if $\mathcal{F}$ and $\mathcal{G}$ are well trained on $\mathcal{D}_P$ and $\mathcal{D}_{\rm tr}$, respectively), then condition (c3) holds with a small $\tau$.
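As an empirical illustration of Definition 4.9 and Proposition 4.10, the following hedged sketch checks whether a candidate trigger acts as a shortcut: it fits a linear classifier to separate triggered target-class samples (label 0) from the remaining clean samples (label 1), as in $\mathcal{D}'_P$, and reports the training accuracy (close to 1 means linearly separable). The scikit-learn classifier, the toy data, and the bright-patch trigger are illustrative assumptions, not the paper's procedure.

import numpy as np
from sklearn.linear_model import LogisticRegression

def shortcut_score(clean_rest, poisoned_target, max_iter=2000):
    """Training accuracy of a linear classifier on D'_P-style data: label 1 for
    clean samples, label 0 for triggered target-class samples.  Accuracy near 1.0
    indicates the trigger makes the two groups linearly separable (a shortcut)."""
    X = np.vstack([clean_rest.reshape(len(clean_rest), -1),
                   poisoned_target.reshape(len(poisoned_target), -1)])
    y = np.concatenate([np.ones(len(clean_rest)), np.zeros(len(poisoned_target))])
    clf = LogisticRegression(max_iter=max_iter).fit(X, y)
    return clf.score(X, y)

# Toy usage: random "images" plus a constant bright patch as a hypothetical trigger.
rng = np.random.default_rng(0)
clean = rng.uniform(size=(200, 8, 8, 3))
target = rng.uniform(size=(200, 8, 8, 3))
target[:, :2, :2, :] = 1.0
print(shortcut_score(clean, target))  # close to 1.0, i.e. shortcut-like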

Remark 4.11.

For condition (c3) in Theorem 4.5, we need to clarify that although $\mathcal{F}\approx\mathcal{G}$ implies (c3) with a small $\tau$, a small $\tau$ is not equivalent to $\mathcal{F}\approx\mathcal{G}$. Moreover, replacing (c3) with $\mathcal{F}\approx\mathcal{G}$ would also make Theorem 4.5 valid, since the latter implies (c3). But this is not a good idea: $\mathcal{F}\approx\mathcal{G}$ would make $\mathcal{F}$ satisfy (c1) as well, so $\mathcal{F}(x+\mathcal{P}(x))$ would rarely output the label $l_p$ when $x\sim\mathcal{D}_S^{l_p}$. Then $\mathcal{F}$ could not output the correct labels for the poisoned data in the poisoned training set, leading to a larger empirical risk on the right-hand side of (2) and consequently a larger upper bound on the poison generalization error. Since what we need is a small upper bound on the poison generalization error, we cannot simply require $\mathcal{F}\approx\mathcal{G}$.

5 Method

In this section, we propose a new clean-label poisoning attack based on Theorems 4.1 and 4.5. Theorem 4.1 places no requirement on the trigger $\mathcal{P}(x)$, so we only need the trigger to approximately satisfy the conditions of Theorem 4.5. Our method thus has a certain theoretical guarantee.

From Section 4.3, in order to satisfy the three conditions in Theorem 4.5, $\mathcal{P}(x)$ needs to be (1) adversarial noise for a clean-trained network, (2) a shortcut for a specifically designed binary dataset, and (3) similar across different samples.

Remark 5.1.

The effectiveness of adversarial noise and shortcuts in creating backdoors has already been demonstrated empirically in previous work. Our theory, however, provides a more principled way to combine these methods.

Based on the three requirements just mentioned, we design the trigger as follows:

(M1) Obtain the adversarial perturbation: for any given clean sample $x$, use PGD on a network trained on $\mathcal{D}_{\rm tr}$ to find the adversarial noise $x_{adv}$ of $x$.

(M2) Obtain the shortcut perturbation: for any given clean sample $x$, use the min-min method (Huang et al., 2021) on the clean training set to find the shortcut noise $x_{scut}$ of $x$. It was shown in (Zhu et al., 2023b) that the shortcuts created by the min-min method are indeed similar for different $x$, thus satisfying condition (c2) automatically.

(M3) The trigger of $x$ is designed to be $\mathcal{P}(x)=U\cdot x_{adv}+(1-U)\cdot x_{scut}$, where $U\in\{0,1\}^n$ is a mask. We combine the two perturbations in this way to (1) make the trigger both adversarial and a shortcut, and (2) give the triggers a certain degree of similarity across different $x$, since the $(1-U)\cdot x_{scut}$ part of the trigger is similar for different $x$. It is worth mentioning that making $x_{adv}$ similar for different $x$ is difficult, so creating the trigger as $\lambda x_{adv}+(1-\lambda)x_{scut}$ would not be effective for ensuring similarity.

The mask $U$ in (M3) is constructed as follows: the upper-left corner is 0 and the rest is 1. Empirically, the key parts of the image must be perturbed to create adversarial examples, so we use the large portion for the adversarial part. The cost of the shortcut is relatively low, so we only use a small region to create it.

Algorithm 1 provides the detailed steps for creating the trigger, where $\cdot$ denotes the element-wise product.

Algorithm 1: Method of Creating the Trigger
Input: an initialized network $\mathcal{F}_1:\mathbb{R}^n\to\mathbb{R}^m$; an initialized network $\mathcal{F}_2:\mathbb{R}^n\to\mathbb{R}^2$; a clean dataset $T=\{(x_i,y_i)\}_{i=1}^N\subset[0,1]^n\times[m]$; a mask vector $U\in\{0,1\}^n$; the poison budget $\eta$; victim data $\{(x_i^v,y_i^v)\}_{i=1}^V$.
Output: triggers $\{\mathcal{P}(x_i^v)\}_{i=1}^V$ for all victim data $\{(x_i^v,y_i^v)\}$.
S1: Use $T$ to train the network $\mathcal{F}_1$.
S2: Let $T_1=\{\}$. For each $(x,y)\in T$: let $x_{adv}=x+U\cdot\mathrm{argmax}_{\|\varepsilon\|\leq\eta}L(\mathcal{F}_1(x+\varepsilon),y)$ and add $(x_{adv},0)$ and $(x,1)$ to $T_1$.
S3: Use $T_1$ to train the network $\mathcal{F}_2$ by solving
$\min_{\mathcal{F}_2}\sum_{(x,y)\in T_1}L(\mathcal{F}_2(x+\varepsilon(x,y)),y),$
where $\varepsilon(x,y)=\mathbf{1}(y=0)\cdot(1-U)\cdot\mathrm{argmin}_{\|\varepsilon\|\leq\eta}L(\mathcal{F}_2(x+(1-U)\cdot\varepsilon),y)$.
S4: For each victim sample $(x_i^v,y_i^v)$, compute $x_a^v=x_i^v+U\cdot\mathrm{argmax}_{\|\varepsilon\|\leq\eta}L(\mathcal{F}_1(x_i^v+\varepsilon),y_i^v)$ and $\mathcal{P}(x_i^v)=(x_a^v-x_i^v)+(1-U)\cdot\mathrm{argmin}_{\|\varepsilon\|\leq\eta}L(\mathcal{F}_2(x_a^v+(1-U)\cdot\varepsilon),0)$.

In Algorithm 1, $\mathcal{F}_1$ is used to create the adversarial perturbation and $\mathcal{F}_2$ is used to create the shortcut. After step S3 of the algorithm, we save $\mathcal{F}_1$ and $\mathcal{F}_2$, and for any sample $(x,y)$ we generate $\mathcal{P}(x)$ directly using S4. Some poisons obtained with Algorithm 1 are shown in Figure 1.
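For readers who prefer code, here is a minimal PyTorch-style sketch of steps S2 and S4 of Algorithm 1 under an $L_\infty$ budget, assuming pretrained networks f1 (the clean classifier of S1) and f2 (the binary shortcut network of S3) that return logits; the step size, iteration count, and function names are our own illustrative assumptions rather than the released implementation.

import torch
import torch.nn.functional as F

def pgd_perturb(model, x, y, eps, mask=None, steps=10, maximize=True):
    """Projected gradient steps on the cross-entropy loss within an L_inf ball of
    radius eps.  maximize=True gives adversarial noise (the argmax in S2/S4);
    maximize=False gives min-min shortcut noise (the argmin in S3/S4).
    If mask is given, the perturbation is restricted to the masked region."""
    delta = torch.zeros_like(x, requires_grad=True)
    sign = 1.0 if maximize else -1.0
    for _ in range(steps):
        d = delta if mask is None else mask * delta
        loss = F.cross_entropy(model(x + d), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + sign * (eps / 4) * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return delta.detach() if mask is None else (mask * delta).detach()

def make_trigger(f1, f2, x, y, mask, eps, target_label=0):
    """Sketch of S4: adversarial part on the masked region U, shortcut part on 1 - U."""
    adv = pgd_perturb(f1, x, y, eps, mask=mask, maximize=True)          # U * x_adv
    x_a = (x + adv).clamp(0, 1)
    t = torch.full((x.shape[0],), target_label, dtype=torch.long)
    scut = pgd_perturb(f2, x_a, t, eps, mask=1 - mask, maximize=False)  # (1 - U) * x_scut
    return adv + scut                                                    # trigger P(x)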

Figure 1: From top to bottom: clean images, normalized triggers (the original trigger has $L_\infty$-norm bound 16/255), and poisoned images. Due to the choice of $U$, the upper-left corners of the poisoned images are similar, while the other parts are used to generate adversarial perturbations.

6 Experiments

In this section, we empirically validate the proposed backdoor attack on the benchmark datasets CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), SVHN, and TinyImageNet (Le & Yang, 2015), and against popular defenses. We also conduct ablation experiments to verify our main Theorems 4.1 and 4.5. All experiments are repeated 3 times and we report the average values. Furthermore, we make our attacks invisible by limiting the $L_\infty$ norm of the trigger. Details of the experimental setting can be found in Appendix F.1. Code is at https://github.com/hong-xian/backdoor-attack.git.

6.1 Baseline Evaluation

In this subsection, we study the effectiveness of our backdoor attack. The goal of a backdoor attack is to misclassify samples with the trigger into a specified target class $l_p$. Unless said otherwise, we set the target label $l_p$ to 0 in this paper. In addition to evaluating the test accuracy and the attack success rate (ASR), we also measure the accuracy on the target class $l_p$ to analyze the impact of the attack on the target label.

Table 1 shows the results on CIFAR-10 when perturbing 1% of the training images, with each perturbation restricted to an $l_\infty$-norm radius of 16/255. We observe that the ASR exceeds 90%, while the poison has a negligible impact on both the test accuracy and the target-class accuracy. In Table 2, we observe that the proposed attack remains remarkably effective even when the poison budget is very small. More experiments with different poison budgets and norm bounds can be found in Table 8. For the transferability of the attack and whether the victim network has learned the feature of the trigger, please refer to Appendix F.5.

Table 1: Baseline evaluations on CIFAR-10. Perturbations have $l_\infty$-norm bounded by 16/255, and the poison budget is 1% of the training images. Res means ResNet18, VGG means VGG16, WR means WRN34-10.
Model Res VGG WR
Clean model acc (%) 93 92 94
Poisoned model acc (%) 91 91 92
Clean model $l_p$ acc (%) 94 92 95
Poisoned model $l_p$ acc (%) 93 91 93
Attack success rate (%) 93 91 90
Table 2: The effect of the poison budget on CIFAR-10 with ResNet18. Perturbations have $l_\infty$-norm bounded by 16/255.
Poison budget 0.6% 1% 2%
Clean model acc (%) 93 93 93
Poisoned model acc (%) 93 91 93
Clean model $l_p$ acc (%) 94 94 94
Poisoned model $l_p$ acc (%) 93 93 92
Attack success rate (%) 86 93 94

Attack performance during the training process: To further validate the efficacy of our attack, we monitor the evolution of the overall clean model accuracy, the poisoned model accuracy, and the attack success rate throughout the training process, as shown in Figure 2. We observe that the overall clean model accuracy and the poisoned model accuracy remain very close. The attack success rate reaches a relatively high level from the very beginning and gradually converges, remaining stable over time. This shows that our attack is effective while maintaining the accuracy of the model.

Any target label: Since the success of backdoor attacks also depends on the choice of the target class, to demonstrate the general efficacy of our attack for any target label $l_p$, we vary $l_p$ from 0 to 9; the results, shown in Figure 3, are quite uniform.

Figure 2: Attack performance during the training process on CIFAR-10 with ResNet18 and VGG16. This figure shows the trend of the poisoned model accuracy ($A$), the attack success rate (ASR), and the clean model accuracy ($A_c$).
Figure 3: Performance for different target labels $l_p$. We show the poisoned model accuracy ($A$), the accuracy on the target label ($A_t$), and the attack success rate ($A_p$) on CIFAR-10, using VGG16 and ResNet18.

6.2 Evaluation on More Datasets

We perform experiments on SVHN, CIFAR-100, and TinyImageNet. Table 3 summarizes the performance of our attack on these datasets, where the attacks are tested on ResNet18 for SVHN and CIFAR-100, and on WRN34-10 for TinyImageNet. The attacker can only perturb 0.8% of the training images for TinyImageNet and CIFAR-100 and 2% of the training images for SVHN; all perturbations are restricted to an $l_\infty$ norm of 16/255, and the target label is $l_p=0$. Additional experiments with different poison budgets and $l_\infty$-norm bounds are presented in Table 9 in the appendix.

Table 3: Evaluations on more datasets. Perturbations have $l_\infty$-norm bounded by 16/255, and the poison budget is 0.8% for TinyImageNet and CIFAR-100 and 2% for SVHN.
Datasets Clean acc Poison acc ASR
SVHN (%) 93 92 79
CIFAR-100 (%) 76 72 92
TinyImageNet (%) 62 60 82

6.3 Compare with Other Attacks

There are several existing clean-label hidden-trigger backdoor attacks that claim success in some settings. We consider the following seven attack methods.

Clean Label: Turner et al. (2018) pioneered clean label attacks. They first utilized adversarial perturbations or generative models to initially alter target class images and then performed standard invisible attacks.

Reflection: Liu et al. (2020) proposed adopting reflection as the trigger for stealthiness.

Hidden Trigger: Saha et al. (2020) proposed to inject the information of a poisoned sample generated by a previous visible attack into an image of the target class by minimizing its distance in the feature space.

Invisible Poison: Ning et al. (2021) converted a regular trigger into a noise-like trigger that achieves stealthiness while remaining effective in the feature space for poisoning the training data.

Image-specific: Luo et al. (2022) used an autoencoder to generate image-specific triggers that can promote the implantation and activation phases of the backdoor.

Narcissus: Zeng et al. (2022) solved the following optimization to obtain the trigger $\mathcal{P}$: $\mathrm{argmin}_{\mathcal{P}}\sum L(f_{sur}(x+\mathcal{P}),l_p)$, where $f_{sur}$ is a surrogate network.

Sleeper Agent: Souri et al. (2022) proposed a backdoor attack by approximately solving the bilevel formulation with the Gradient Matching method (Geiping et al., 2020).

To further validate the efficacy of our attack, we compare our method with other clean-label attack methods under the same settings. Specifically, we limit the $L_\infty$ norm of the trigger to no more than 16/255 on both the training set and the test set. It should be noted that in some attack methods the trigger budget in the original setting may exceed 16/255, so we also compare under each method's respective setting. The results are given in Table 4. More results and details are provided in Appendix F.3.

Table 4: Attack success rate on CIFAR-10 with ResNet18. Comparison of our method to popular clean-label attacks; the poison budget is 1%. The first column gives the attack under the $L_\infty$ limit of 16/255, the second column under each method's respective setting.
Attacks 16/255 Each setting
Clean-Label 23% 96%
Hidden-Trigger 75% 95%
Reflection 54% 90%
Invisible Poison 73% 98%
Image-specific 70% 70%
Narcissus 50% 92%
Sleeper-Agent 61% 71%
Ours 93% 93%

From Table 4, we find that our attack significantly outperforms all other methods under the 16/255 limitation. Moreover, our method has two key advantages: (1) it does not require additional steps, whereas Sleeper-Agent and Hidden-Trigger need a pre-existing patch, Reflection requires a fitting image, and Narcissus needs to magnify the trigger in the inference phase; (2) it does not require large networks during poison generation, whereas Invisible Poison and Image-specific use large generative models to achieve optimal performance, and Narcissus requires a network trained on a different dataset. See Tables 10 and 11 in the appendix for more results.

Under each method's respective setting, all results are good, but many of them no longer satisfy the 16/255 limit. Furthermore, some of them need a patch in the trigger (Clean-Label and Sleeper-Agent), and some need a larger perturbation (Reflection and Narcissus).

6.4 Defenses

Many backdoor defenses have been proposed to mitigate the effects of backdoor attacks. We test our attack and the attack methods mentioned in Section 6.3 against several popular defenses. We evaluate six types of defenses:

(1) AT: adversarial training with radius 8/255 (Madry et al., 2017);

(2) Data Augmentation (Borgnia et al., 2021);

(3) Scale-Up: input-level backdoor detection via scaled prediction consistency (Guo et al., 2023);

(4) DPSGD: differentially private SGD (Hong et al., 2020);

(5) Frequency Filter: remove high-frequency parts of images (Zeng et al., 2021b);

(6) Fine-Tuning (Zhu et al., 2023a).

Table 5: Defenses: the attack success rate (%) and poisoned model accuracy (%, in brackets) on CIFAR-10 with ResNet18 for our attack and other attacks against various defense methods. The poison ratio is 1%, perturbations have $l_\infty$-norm bound 16/255, and the target label is $l_p=0$.
AT Data Augmentation Scale-Up DPSGD Frequency Filter Fine-Tuning
Clean Label (%) 13(83) 17(89) 9(77) 12(78) 12(86) 11(89)
Invisible Poison (%) 11(84) 30(91) 18(80) 18(76) 20(88) 12(90)
Hidden Trigger (%) 14(83) 25(90) 31(81) 14(75) 20(87) 10(88)
Narcissus (%) 13(83) 23(90) 11(83) 15(76) 16(86) 10(89)
Image-specific (%) 10(84) 28(89) 22(79) 14(77) 22(87) 12(87)
Reflection (%) 13(85) 20(89) 16(82) 13(76) 37(85) 13(85)
Sleeper-Agent (%) 14(83) 27(87) 27(79) 15(75) 27(84) 11(89)
Ours (%) 15(83) 40(91) 32(80) 12(77) 28(88) 13(89)
Ours-stronger (%) 34(84) 66(90) 52(80) 62(88)

The defense results are given in Table 5. We can see that the ASR basically decreases to around 10%, which is the lowest possible level since 10% of the samples already have label 0. This is because we impose many restrictions on the backdoor attacks, as described in Appendix F.1, and also because our theory and trigger construction are mainly based on clean training, so our method appears somewhat fragile under defenses; in fact, all attack methods mentioned in Section 6.3 appear fragile under these defenses, as shown in Table 5.

On the other hand, we find that there exists a robustness-accuracy trade-off across many of these defenses. Although these defense methods do degrade the attack success rate, they also cause the test accuracy of the model to decrease. Scale-Up is not stable and sometimes flags clean samples as poisoned; fine-tuning, because it uses a clean training set, is particularly effective as a defense.

Furthermore, we point out that if we incorporate these defense methods into our attack generation to produce corresponding enhanced attacks, we can effectively withstand these defenses. For defenses (1), (2), (4), and (5), we enhance our attack accordingly (details are given in Appendix F.4) and obtain better results.

6.5 Verify Theorem 4.5

In this section, we verify Theorem 4.5. In Appendix F.7, we verify Theorem 4.1.

We verify Theorem 4.5 by showing that any trigger $\mathcal{P}(x)$ that makes $\epsilon$ and $\tau$ in conditions (c1) and (c3) small and $\lambda$ in condition (c2) close to 1 can achieve a high attack success rate.

To validate our conclusions, we evaluate the following poisoning functions $\mathcal{P}(x)$; refer to Appendix F.6 for more details:

RN: random noise with an $L_\infty$-norm bound or an $L_0$-norm bound;

UA: Universal adversarial perturbations (Moosavi-Dezfooli et al., 2017);

Adv: Adversarial perturbations (Szegedy et al., 2013);

SCut: shortcut noise (Huang et al., 2021);

Ours: Perturbations generated by Algorithm 1.

We consider two indicators to evaluate how well a poison satisfies conditions (c1) and (c3) in Theorem 4.5:

  • Use the validation loss on poisoned data to measure condition (c1): $V_{adv}=\mathbb{E}_{(x,y)\sim\mathcal{D}}\,L(\mathcal{F}(x+\mathcal{P}(x)),y)$, where $\mathcal{F}$ is a ResNet18 trained on the clean dataset.

  • Use the binary classification loss to measure condition (c3) via Proposition 4.10: $V_{sc}=\min_{\mathcal{F}}\mathbb{E}_{(x,y)\sim\mathcal{D}}[L(\mathcal{F}(x+\mathcal{P}(x)),0)+L(\mathcal{F}(x),1)]$, where $\mathcal{F}$ is a two-layer network (a code sketch of both indicators is given below).
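A hedged sketch of how these two indicators could be computed; the batched data iterator, the probe architecture, and the training schedule are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

def v_adv(clean_model, batches, trigger_fn):
    """V_adv: average cross-entropy of a clean-trained classifier (assumed to return
    logits) on triggered inputs; larger means more adversarial (condition (c1))."""
    losses = [F.cross_entropy(clean_model((x + trigger_fn(x)).clamp(0, 1)), y)
              for x, y in batches]
    return torch.stack(losses).mean().item()

def v_sc(batches, trigger_fn, in_dim, epochs=50, lr=0.01):
    """V_sc: training loss of a small two-layer probe separating x (label 1) from
    x + P(x) (label 0); a small value means the trigger acts as a shortcut (condition (c3))."""
    probe = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, 2))
    opt = torch.optim.SGD(probe.parameters(), lr=lr)
    for _ in range(epochs):
        total, count = 0.0, 0
        for x, _ in batches:
            xp = (x + trigger_fn(x)).clamp(0, 1)
            inputs = torch.cat([x, xp])
            labels = torch.cat([torch.ones(len(x), dtype=torch.long),
                                torch.zeros(len(xp), dtype=torch.long)])
            loss = F.cross_entropy(probe(inputs), labels)
            opt.zero_grad(); loss.backward(); opt.step()
            total, count = total + loss.item(), count + 1
    return total / count  # average loss over the last epoch approximates the minimum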

About condition (c2) in Theorem 4.5: for RN and UA, the poison perturbation is the same for every sample; for SCut and Ours, the perturbations for different samples are similar (or at least parts of the perturbation are similar).

Table 6 shows the results. We can see that RN, UA, and SCut with a large budget achieve good attack performance because they satisfy conditions (c1), (c2), (c3) in Theorem 4.5 to a certain degree, which validates the effectiveness of the conditions in Theorem 4.5. Moreover, neither Adv alone nor SCut alone satisfies both indicators: adversarial perturbations alone yield an excessively large $V_{sc}$, since they do not form shortcuts, while shortcuts alone produce an undesirably small $V_{adv}$, since they are not adversarial. By combining Adv and SCut via our algorithm, we achieve a significantly improved outcome.

On the other hand, consider the Adv attack under the 32/255 budget. Although the similarity of Adv perturbations across different samples is poor, its $V_{adv}$ and $V_{sc}$ are not bad and its ASR is about 80%, indicating that even if the similarity is not good enough, the attack can still succeed when the other two indicators are good. Therefore, to prevent the trigger from being detected due to excessive similarity, we can trade off similarity against the other two metrics and still achieve attack effectiveness.

Table 6: Values of $V_{adv}$ and $V_{sc}$, poisoned model accuracy (Acc), and attack success rate (ASR) on the CIFAR-10 test set with ResNet18. The poison budget is 1%. If not specified, the norm is $L_\infty$.
Poison Type $V_{adv}(\uparrow)$ $V_{sc}(\downarrow)$ ASR $(\uparrow)$ Acc
RN (16/255) 2.40 0.014 12% 91%
RN ($L_0$, 200) 3.87 0.004 59% 92%
UA (16/255) 2.92 0.002 51% 91%
Adv (16/255) 8.92 1.27 22% 92%
SCut (16/255) 1.19 $\mathbf{10^{-4}}$ 30% 91%
Ours (16/255) 6.53 0.001 93% 91%
RN (32/255) 6.28 $10^{-4}$ 99% 91%
RN ($L_0$, 300) 6.33 0.003 94% 91%
UA (32/255) 15.45 $10^{-4}$ 92% 92%
Adv (32/255) 16.38 0.35 80% 92%
SCut (32/255) 4.22 $\mathbf{10^{-5}}$ 93% 90%
Ours (32/255) 14.65 $10^{-4}$ 99% 92%

7 Conclusion

In this paper, we give generalization bounds for the clean-label backdoor attack. Precisely, we provide upper bounds for the clean and poison population error based on empirical error on the poisoned training set and some other quantities. These bounds give the theoretical foundation for the clean-label backdoor attack in that the goal of the attack can be achieved under certain reasonable conditions. The main technical difficulties in establishing these bounds include how to treat the non-i.i.d. poisoned dataset and the fact that the triggers are both in the training and testing phases.

Based on these theoretical results, we propose a novel attack method that uses a combination of adversarial noise and indiscriminate poison as the trigger. Extensive experiments show that our attack maintains high accuracy of the poisoned model on clean data while achieving a high attack success rate.

Limitations and Future Work. The conditions of Theorem 4.5 are quite complicated, and it is desirable to give simpler conditions for the poisoned population error bound in Theorem 4.5. The current generalization bounds do not involve the training process, and algorithmic-dependent generalization bounds, such as stability analysis (Hardt et al., 2016), should be further analyzed for backdoor attacks.

Impact Statement

A theoretical basis for backdoor attacks is given. One potential negative social impact of this work is that malicious opponents may use these methods to generate new types of backdoor poisons. Therefore, it is necessary to develop more powerful models to resist backdoor attacks, which is left for future work.

Acknowledgments

This work is supported by CAS Project for Young Scientists in Basic Research, Grant No.YSBR-040, ISCAS New Cultivation Project ISCAS-PYFX-202201, and ISCAS Basic Research ISCAS-JCZD-202302. This work is supported by NSFC grant No.12288201 and NKRDP grant No.2018YFA0306702. The authors thank anonymous referees for their valuable comments.

References

  • Arora et al. (2018) Arora, S., Ge, R., Neyshabur, B., and Zhang, Y. Stronger generalization bounds for deep nets via a compression approach. In International Conference on Machine Learning, pp.  254–263. PMLR, 2018.
  • Arora et al. (2019) Arora, S., Du, S., Hu, W., Li, Z., and Wang, R. Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks. In International Conference on Machine Learning, pp.  322–332. PMLR, 2019.
  • Bagdasaryan & Shmatikov (2021) Bagdasaryan, E. and Shmatikov, V. Blind backdoors in deep learning models. In 30th USENIX Security Symposium (USENIX Security 21), pp.  1505–1521, 2021.
  • Barni et al. (2019) Barni, M., Kallas, K., and Tondi, B. A new backdoor attack in cnns by training set corruption without label poisoning. In 2019 IEEE International Conference on Image Processing (ICIP), pp.  101–105. IEEE, 2019.
  • Barron & Klusowski (2018) Barron, A. R. and Klusowski, J. M. Approximation and estimation for high-dimensional deep learning networks. arXiv preprint arXiv:1809.03090, 2018.
  • Bartlett et al. (2017) Bartlett, P. L., Foster, D. J., and Telgarsky, M. J. Spectrally-normalized margin bounds for neural networks. Advances in neural information processing systems, 30, 2017.
  • Bartlett et al. (2019) Bartlett, P. L., Harvey, N., Liaw, C., and Mehrabian, A. Nearly-tight vc-dimension and pseudodimension bounds for piecewise linear neural networks. The Journal of Machine Learning Research, 20(1):2285–2301, 2019.
  • Bartlett et al. (2021) Bartlett, P. L., Montanari, A., and Rakhlin, A. Deep learning: a statistical viewpoint. Acta numerica, 30:87–201, 2021.
  • Bennouna & Van Parys (2022) Bennouna, A. and Van Parys, B. Holistic robust data-driven decisions. arXiv preprint arXiv:2207.09560, 2022.
  • Bennouna et al. (2023) Bennouna, A., Lucas, R., and Van Parys, B. Certified robust neural networks: Generalization and corruption resistance. In International Conference on Machine Learning, pp.  2092–2112. PMLR, 2023.
  • Bertsekas & Tsitsiklis (2008) Bertsekas, D. and Tsitsiklis, J. N. Introduction to probability, volume 1. Athena Scientific, 2008.
  • Borgnia et al. (2021) Borgnia, E., Cherepanova, V., Fowl, L., Ghiasi, A., Geiping, J., Goldblum, M., Goldstein, T., and Gupta, A. Strong data augmentation sanitizes poisoning and backdoor attacks without an accuracy tradeoff. In IEEE International Conference on Acoustics, Speech and Signal Processing, pp.  3855–3859. IEEE, 2021.
  • Brutzkus & Globerson (2021) Brutzkus, A. and Globerson, A. An optimization and generalization analysis for max-pooling networks. In Uncertainty in Artificial Intelligence, pp.  1650–1660. PMLR, 2021.
  • Cao & Gu (2019) Cao, Y. and Gu, Q. Generalization bounds of stochastic gradient descent for wide and deep neural networks. Advances in neural information processing systems, 32, 2019.
  • Chen et al. (2017) Chen, X., Liu, C., Li, B., Lu, K., and Song, D. Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526, 2017.
  • Dabouei et al. (2020) Dabouei, A., Soleymani, S., Taherkhani, F., Dawson, J., and Nasrabadi, N. Smoothfool: An efficient framework for computing smooth adversarial perturbations. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp.  2665–2674, 2020.
  • Doan et al. (2021a) Doan, K., Lao, Y., and Li, P. Backdoor attack with imperceptible input and latent modification. Advances in Neural Information Processing Systems, 34:18944–18957, 2021a.
  • Doan et al. (2021b) Doan, K., Lao, Y., Zhao, W., and Li, P. Lira: Learnable, imperceptible and robust backdoor attacks. In Proceedings of the IEEE/CVF international conference on computer vision, pp.  11966–11976, 2021b.
  • Dziugaite & Roy (2017) Dziugaite, G. K. and Roy, D. M. Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data. arXiv preprint arXiv:1703.11008, 2017.
  • Farnia & Ozdaglar (2021) Farnia, F. and Ozdaglar, A. Train simultaneously, generalize better: Stability of gradient-based minimax learners. In International Conference on Machine Learning, pp.  3174–3185. PMLR, 2021.
  • Fu et al. (2020) Fu, S., He, F., Liu, Y., Shen, L., and Tao, D. Robust unlearnable examples: Protecting data against adversarial learning. arXiv preprint arXiv:2012.04115, 2020.
  • Gao et al. (2023) Gao, Y., Li, Y., Zhu, L., Wu, D., Jiang, Y., and Xia, S.-T. Not all samples are born equal: Towards effective clean-label backdoor attacks. Pattern Recognition, 139:109512, 2023.
  • Geiping et al. (2020) Geiping, J., Fowl, L., Huang, W. R., Czaja, W., Taylor, G., Moeller, M., and Goldstein, T. Witches’ brew: Industrial scale data poisoning via gradient matching. arXiv preprint arXiv:2009.02276, 2020.
  • Gu et al. (2017) Gu, T., Dolan-Gavitt, B., and Garg, S. Badnets: Identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733, 2017.
  • Guo et al. (2023) Guo, J., Li, Y., Chen, X., Guo, H., Sun, L., and Liu, C. Scale-up: An efficient black-box input-level backdoor detection via analyzing scaled prediction consistency. arXiv preprint arXiv:2302.03251, 2023.
  • Hanneke et al. (2022) Hanneke, S., Karbasi, A., Mahmoody, M., Mehalel, I., and Moran, S. On optimal learning under targeted data poisoning. Advances in Neural Information Processing Systems, 35:30770–30782, 2022.
  • Hardt et al. (2016) Hardt, M., Recht, B., and Singer, Y. Train faster, generalize better: Stability of stochastic gradient descent. In International conference on machine learning, pp.  1225–1234. PMLR, 2016.
  • Harvey et al. (2017) Harvey, N., Liaw, C., and Mehrabian, A. Nearly-tight vc-dimension bounds for piecewise linear neural networks. In Conference on learning theory, pp.  1064–1068. PMLR, 2017.
  • Hayase et al. (2021) Hayase, J., Kong, W., Somani, R., and Oh, S. Spectre: Defending against backdoor attacks using robust statistics. In International Conference on Machine Learning, pp.  4129–4139. PMLR, 2021.
  • He et al. (2016) He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.  770–778, 2016.
  • Hong et al. (2020) Hong, S., Chandrasekaran, V., Kaya, Y., Dumitraş, T., and Papernot, N. On the effectiveness of mitigating data poisoning attacks with gradient shaping. arXiv preprint arXiv:2002.11497, 2020.
  • Huang et al. (2021) Huang, H., Ma, X., Erfani, S. M., Bailey, J., and Wang, Y. Unlearnable examples: Making personal data unexploitable. arXiv preprint arXiv:2101.04898, 2021.
  • Huang et al. (2019) Huang, X., Alzantot, M., and Srivastava, M. Neuroninspect: Detecting backdoors in neural networks via output explanations. arXiv preprint arXiv:1911.07399, 2019.
  • Ji & Telgarsky (2020) Ji, Z. and Telgarsky, M. Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks. In International Conference on Learning Representations, 2020.
  • Kolouri et al. (2020) Kolouri, S., Saha, A., Pirsiavash, H., and Hoffmann, H. Universal litmus patterns: Revealing backdoor attacks in cnns. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  301–310, 2020.
  • Krizhevsky et al. (2009) Krizhevsky, A., Hinton, G., et al. Learning multiple layers of features from tiny images. Technical Report TR-2009, 2009.
  • Kuzborskij & Lampert (2018) Kuzborskij, I. and Lampert, C. Data-dependent stability of stochastic gradient descent. In International Conference on Machine Learning, pp.  2815–2824. PMLR, 2018.
  • Le & Yang (2015) Le, Y. and Yang, X. Tiny imagenet visual recognition challenge. CS 231N, 7(7):3, 2015.
  • Li et al. (2020) Li, S., Xue, M., Zhao, B. Z. H., Zhu, H., and Zhang, X. Invisible backdoor attacks on deep neural networks via steganography and regularization. IEEE Transactions on Dependable and Secure Computing, 18(5):2088–2105, 2020.
  • Li et al. (2021) Li, Y., Li, Y., Wu, B., Li, L., He, R., and Lyu, S. Invisible backdoor attack with sample-specific triggers. In Proceedings of the IEEE/CVF international conference on computer vision, pp.  16463–16472, 2021.
  • Li et al. (2023) Li, Y., Ildiz, M. E., Papailiopoulos, D., and Oymak, S. Transformers as algorithms: Generalization and stability in in-context learning. In International Conference on Machine Learning, pp.  19565–19594. PMLR, 2023.
  • Liu et al. (2018) Liu, K., Dolan-Gavitt, B., and Garg, S. Fine-pruning: Defending against backdooring attacks on deep neural networks. In International symposium on research in attacks, intrusions, and defenses, pp.  273–294. Springer, 2018.
  • Liu et al. (2020) Liu, Y., Ma, X., Bailey, J., and Lu, F. Reflection backdoor: A natural backdoor attack on deep neural networks. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part X 16, pp.  182–199. Springer, 2020.
  • Long & Sedghi (2019) Long, P. M. and Sedghi, H. Generalization bounds for deep convolutional neural networks. arXiv preprint arXiv:1905.12600, 2019.
  • Luo et al. (2022) Luo, N., Li, Y., Wang, Y., Wu, S., Tan, Y.-a., and Zhang, Q. Enhancing clean label backdoor attack with two-phase specific triggers. arXiv preprint arXiv:2206.04881, 2022.
  • Madry et al. (2017) Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2017.
  • Massart (2000) Massart, P. Some applications of concentration inequalities to statistics. In Annales de la Faculté des sciences de Toulouse: Mathématiques, volume 9, pp.  245–303, 2000.
  • Mohri et al. (2018) Mohri, M., Rostamizadeh, A., and Talwalkar, A. Foundations of machine learning. MIT press, 2018.
  • Moosavi-Dezfooli et al. (2017) Moosavi-Dezfooli, S.-M., Fawzi, A., Fawzi, O., and Frossard, P. Universal adversarial perturbations. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.  1765–1773, 2017.
  • Neyshabur et al. (2017) Neyshabur, B., Bhojanapalli, S., McAllester, D., and Srebro, N. Exploring generalization in deep learning. Advances in neural information processing systems, 30, 2017.
  • Ning et al. (2021) Ning, R., Li, J., Xin, C., and Wu, H. Invisible poison: A blackbox clean label backdoor attack to deep neural networks. In IEEE Conference on Computer Communications, pp.  1–10. IEEE, 2021.
  • Saha et al. (2020) Saha, A., Subramanya, A., and Pirsiavash, H. Hidden trigger backdoor attacks. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pp.  11957–11965, 2020.
  • Sauer (1972) Sauer, N. On the density of families of sets. Journal of Combinatorial Theory, Series A, 13(1):145–147, 1972.
  • Simonyan & Zisserman (2014) Simonyan, K. and Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  • Souri et al. (2022) Souri, H., Fowl, L., Chellappa, R., Goldblum, M., and Goldstein, T. Sleeper agent: Scalable hidden trigger backdoors for neural networks trained from scratch. Advances in Neural Information Processing Systems, 35:19165–19178, 2022.
  • Szegedy et al. (2013) Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
  • Tsai et al. (2021) Tsai, Y.-L., Hsu, C.-Y., Yu, C.-M., and Chen, P.-Y. Formalizing generalization and robustness of neural networks to weight perturbations. arXiv preprint arXiv:2103.02200, 2021.
  • Turner et al. (2018) Turner, A., Tsipras, D., and Madry, A. Clean-label backdoor attacks. https://people.csail.mit.edu/madry/lab/cleanlabel.pdf, 2018.
  • Valle-Pérez & A. Louis (2022) Valle-Pérez, G. and A. Louis, A. Generalization bounds for deep learning. arXiv preprint arXiv:2203.14533, 2022.
  • Vapnik & Chervonenkis (2015) Vapnik, V. N. and Chervonenkis, A. Y. On the uniform convergence of relative frequencies of events to their probabilities. In Measures of complexity: festschrift for alexey chervonenkis, pp.  11–30. Springer, 2015.
  • Vardi et al. (2022) Vardi, G., Shamir, O., and Srebro, N. The sample complexity of one-hidden-layer neural networks. Advances in Neural Information Processing Systems, 35:9139–9150, 2022.
  • Wainwright (2019) Wainwright, M. J. High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge university press, 2019.
  • Wang et al. (2019) Wang, B., Yao, Y., Shan, S., Li, H., Viswanath, B., Zheng, H., and Zhao, B. Y. Neural cleanse: Identifying and mitigating backdoor attacks in neural networks. In 2019 IEEE Symposium on Security and Privacy (SP), pp.  707–723. IEEE, 2019.
  • Wang & Ma (2022) Wang, M. and Ma, C. Generalization error bounds for deep neural networks trained by sgd. arXiv preprint arXiv:2206.03299, 2022.
  • Wang et al. (2021) Wang, Y., Mianjy, P., and Arora, R. Robust learning for data poisoning attacks. In International Conference on Machine Learning, 2021.
  • Wang et al. (2024) Wang, Y., Liu, S., and Gao, X.-S. Data-dependent stability analysis of adversarial training. arXiv preprint arXiv:2401.03156, 2024.
  • Wei & Ma (2019) Wei, C. and Ma, T. Data-dependent sample complexity of deep neural networks via lipschitz augmentation. Advances in Neural Information Processing Systems, 32, 2019.
  • Xiao et al. (2022) Xiao, J., Fan, Y., Sun, R., Wang, J., and Luo, Z.-Q. Stability analysis and generalization bounds of adversarial training. Advances in Neural Information Processing Systems, 35:15446–15459, 2022.
  • Xing et al. (2021) Xing, Y., Song, Q., and Cheng, G. On the algorithmic stability of adversarial training. Advances in neural information processing systems, 34:26523–26535, 2021.
  • Yu et al. (2022) Yu, D., Zhang, H., Chen, W., Yin, J., and Liu, T.-Y. Indiscriminate poisoning attacks are shortcuts. arXiv preprint arXiv:2111.00898, 2022.
  • Zeng et al. (2021a) Zeng, Y., Chen, S., Park, W., Mao, Z. M., Jin, M., and Jia, R. Adversarial unlearning of backdoors via implicit hypergradient. arXiv preprint arXiv:2110.03735, 2021a.
  • Zeng et al. (2021b) Zeng, Y., Park, W., Mao, Z. M., and Jia, R. Rethinking the backdoor attacks’ triggers: A frequency perspective. In Proceedings of the IEEE/CVF international conference on computer vision, pp.  16473–16481, 2021b.
  • Zeng et al. (2022) Zeng, Y., Pan, M., Just, H. A., Lyu, L., Qiu, M., and Jia, R. Narcissus: A practical clean-label backdoor attack with limited information. arXiv preprint arXiv:2204.05255, 2022.
  • Zhong et al. (2020) Zhong, H., Liao, C., Squicciarini, A. C., Zhu, S., and Miller, D. Backdoor embedding in convolutional neural network models via invisible perturbation. In Proceedings of the Tenth ACM Conference on Data and Application Security and Privacy, pp.  97–108, 2020.
  • Zhu et al. (2023a) Zhu, M., Wei, S., Shen, L., Fan, Y., and Wu, B. Enhancing fine-tuning based backdoor defense with sharpness-aware minimization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.  4466–4477, 2023a.
  • Zhu et al. (2023b) Zhu, Y., Yu, L., and Gao, X.-S. Detection and defense of unlearnable examples. arXiv preprint arXiv:2312.08898, 2023b.

Appendix A Proof of Theorem 4.1

A.1 Preliminaries

We first give some notation that will be used in all the proofs.

Subdistribution 𝒟Slp{\mathcal{D}}_{S}^{\neq l_{p}}. Let 𝒟Slp{\mathcal{D}}_{S}^{\neq l_{p}} be the distribution of samples whose label is not lpl_{p}, that is,

(x,y)𝒟Slp((x,y)A)=(x,y)𝒟S((x,y)A|ylp){\mathbb{P}}_{(x,y)\sim{\mathcal{D}}_{S}^{\neq l_{p}}}((x,y)\in A)={\mathbb{P}}_{(x,y)\sim{\mathcal{D}}_{S}}((x,y)\in A|y\neq l_{p})

for any set AA.

Probability of a sample having label $l_p$. For a fixed $p$, let

(x,y)𝒟S(y=lp)=η and 0<η<1.{\mathbb{P}}_{(x,y)\sim{\mathcal{D}}_{S}}(y=l_{p})=\eta\hbox{ and }0<\eta<1. (3)
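As an illustrative special case (not used in the proofs): if the class marginal of ${\mathcal{D}}_S$ is uniform over the $m$ classes and $l_p$ is one of them, then $\eta=1/m$, and ${\mathcal{D}}_S^{\neq l_p}$ is simply ${\mathcal{D}}_S$ conditioned on $y\neq l_p$, whose class marginal is uniform over the remaining $m-1$ classes.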

General Hypothesis Space. In some theorems, we will consider the more general hypothesis space

H={h(x,y):𝒮×[m][0,1]}.H=\{h(x,y):{\mathcal{S}}\times[m]\to[0,1]\}.

HH contains the commonly used hypothesis space {L((x),y):𝒮×[m][0,1]}\{L({\mathcal{F}}(x),y):{\mathcal{S}}\times[m]\to[0,1]\}, where {\mathcal{F}} is the network and LL is the loss function.

We give a classic generalization bound below, which will be used in the proof.

Theorem A.1 (P.217 of (Mohri et al., 2018), Informal).

Let the training set 𝒟tr{\mathcal{D}}_{{{\rm{tr}}}} be i.i.d. sampled from the data distribution 𝒟S{\mathcal{D}}_{S} and N=|𝒟tr|.N=|{\mathcal{D}}_{{{\rm{tr}}}}|. For the hypothesis space ={L((x),y):n×[m][0,1]}{\mathcal{H}}=\{L({\mathcal{F}}(x),y):{\mathbb{R}}^{n}\times[m]\to[0,1]\} and δ+\delta\in{\mathbb{R}}_{+}, with probability at least 1δ1-\delta, for any L((x),y)L({\mathcal{F}}(x),y)\in{\mathcal{H}}, we have

𝔼(x,y)𝒟S[L((x),y)]𝔼(x,y)𝒟tr[L((x),y)]+2RadN𝒟S()+ln(1/δ)2N\begin{array}[]{ll}&{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[L({\mathcal{F}}(x),y)]\leq{\mathbb{E}}_{(x,y)\in{\mathcal{D}}_{{{\rm{tr}}}}}[L({\mathcal{F}}(x),y)]+2{\hbox{\rm{Rad}}}^{{\mathcal{D}}_{S}}_{N}({\mathcal{H}})+\sqrt{\frac{\ln(1/\delta)}{2N}}\\ \end{array} (4)

We prove Theorem 4.1 in three steps given in Sections A.2, A.3, and A.4, respectively.

A.2 Proof of Theorem A.2

We first prove the following theorem, which gives a generalization bound for a more general hypothesis space.

Theorem A.2.

Let 𝒟S{\mathcal{D}}_{S}, 𝒟P{\mathcal{D}}_{P}, 𝒟Slp{\mathcal{D}}_{S}^{l_{p}}, α\alpha be defined in Section 3 and 𝒟Slp{\mathcal{D}}_{S}^{\neq l_{p}} be defined in Section A.1. Then for any hypothesis space H={h(x,y):𝒮×[m][0,1]}H=\{h(x,y):{\mathcal{S}}\times[m]\to[0,1]\}, with probability at least 1δ4η4η+(1η)N4(1η)ηN+4(1η)1-\delta-\frac{4\eta}{4\eta+(1-\eta)N}-\frac{4(1-\eta)}{\eta N+4(1-\eta)}, for any hHh\in H, it holds

𝔼(x,y)𝒟S[h(x,y)]42α1α𝔼(x,y)𝒟P[h(x,y)]+4RadN𝒟Slp(H)+41αRadN𝒟Slp(H)+2ln(2/δ)N(1α).\begin{array}[]{cc}&{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[h(x,y)]\leq\frac{4-2\alpha}{1-\alpha}{\mathbb{E}}_{(x,y)\in{\mathcal{D}}_{P}}[h(x,y)]+4{\hbox{\rm{Rad}}}^{{\mathcal{D}}_{S}^{\neq l_{p}}}_{N}(H)+\frac{4}{1-\alpha}{{\hbox{\rm{Rad}}}^{{\mathcal{D}}_{S}^{l_{p}}}_{N}(H)}+2\sqrt{\frac{\ln(2/\delta)}{N(1-\alpha)}}.\end{array} (5)
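To give a rough sense of the constants (an illustrative computation, not part of the theorem): the leading coefficient $\frac{4-2\alpha}{1-\alpha}$ equals $4$ at $\alpha=0$, is $\approx 4.22$ at $\alpha=0.1$, and is $6$ at $\alpha=0.5$; both this coefficient and the factor $\frac{4}{1-\alpha}$ in front of ${\hbox{\rm{Rad}}}^{{\mathcal{D}}_S^{l_p}}_N(H)$ grow without bound as the poisoning rate $\alpha\to1$.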

As mentioned previously, the samples in ${\mathcal{D}}_P$ are not i.i.d. samples from ${\mathcal{D}}_S$. We will therefore find subsets of ${\mathcal{D}}_P$ whose samples can be regarded as i.i.d. samples from suitable subdistributions of ${\mathcal{D}}_S$. The core of the proof of Theorem A.2 is the following two lemmas, which show how to select such subsets.

Lemma A.3.

Use the notation of Theorem A.2. Let $X$ be the random variable counting the samples with label $\neq l_p$ in ${\mathcal{D}}_P$, and let $k$ be a given number in $\{1,2,\dots,N\}$. If $X\geq k$, we randomly select $k$ samples whose label is not $l_p$ from ${\mathcal{D}}_P$ and let $D_l$ be the set of these samples; otherwise, let $D_l$ be the set of all samples in ${\mathcal{D}}_P$ whose label is not $l_p$. Let $D_l$ follow the distribution ${\mathcal{D}}_{S_0}$. Then ${\mathbb{P}}_{x\sim{\mathcal{D}}_{S_0}}(x\in A\,|\,X\geq k)={\mathbb{P}}(X_k^{{\mathcal{D}}_S^{\neq l_p}}\in A)$ for any set $A$, where $X_k^{{\mathcal{D}}_S^{\neq l_p}}$ denotes $k$ samples drawn i.i.d. from the distribution ${\mathcal{D}}_S^{\neq l_p}$.
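To illustrate the selection rule with hypothetical numbers: suppose $N=5$, $k=2$, and the realized labels are $(l_p,1,2,l_p,3)$, so that $X=3\geq k$; then $D_l$ consists of two of the three samples with indices $\{2,3,5\}$, chosen uniformly at random. If instead only one realized label differed from $l_p$, then $X<k$ and $D_l$ would consist of that single sample.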

Proof.

By the Bayesian formula, we have

x𝒟𝒮0(xA|Xk)=x𝒟𝒮0(xA,Xk)/x𝒟𝒮0(Xk)=i=kNx𝒟𝒮0(xA,X=i)/x𝒟𝒮0(Xk)=i=kNx𝒟𝒮0(xA|X=i)(X=i)/x𝒟𝒮0(Xk)=i=kNx𝒟𝒮0(xA|X=i)(X=i|Xk).\begin{array}[]{ll}&{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{0}}}(x\in A|X\geq k)\\ =&{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{0}}}(x\in A,X\geq k)/{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{0}}}(X\geq k)\\ =&\sum_{i=k}^{N}{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{0}}}(x\in A,X=i)/{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{0}}}(X\geq k)\\ =&\sum_{i=k}^{N}{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{0}}}(x\in A|X=i){\mathbb{P}}(X=i)/{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{0}}}(X\geq k)\\ =&\sum_{i=k}^{N}{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{0}}}(x\in A|X=i){\mathbb{P}}(X=i|X\geq k).\\ \end{array} (6)

We will show that x𝒟𝒮0(xA|X=i)=(Xk𝒟SlpA){\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{0}}}(x\in A|X=i)={\mathbb{P}}(X_{k}^{{\mathcal{D}}_{S}^{\neq l_{p}}}\in A) for any iki\geq k, and hence the lemma.

Since the poison does not change labels, $X$ is also the number of samples whose label is not $l_p$ in ${\mathcal{D}}_{\rm tr}$. For any $i\geq k$, when $X=i$, we enumerate all possible ways of selecting $D_l$ in order to compute ${\mathbb{P}}_{x\sim{\mathcal{D}}_{S_0}}(x\in A\,|\,X=i)$.

Note that the $N$ samples in ${\mathcal{D}}_{\rm tr}$ are i.i.d. samples from ${\mathcal{D}}_S$, and we keep track of their order. Let $(x_j,y_j)\in{\mathcal{D}}_{\rm tr}$ be the $j$-th sample drawn from ${\mathcal{D}}_S$. If poison is added to $(x_j,y_j)$, then it becomes $(x_j+{\mathcal{P}}(x_j),y_j)\in{\mathcal{D}}_P$; otherwise $(x_j,y_j)\in{\mathcal{D}}_P$.

Let $D_{y\neq l_p}\subset[N]$ be the set of indices $j$ such that $(x_j,y_j)\in{\mathcal{D}}_P$ satisfies $y_j\neq l_p$. By considering all possible realizations of $D_{y\neq l_p}$, we have

x𝒟𝒮0(xA|X=i)=x𝒟𝒮0(xA,X=i)/(X=i)=𝒟ylp,|𝒟ylp|=ix𝒟𝒮0(xA,𝒟ylp)/(X=i)=𝒟ylp,|𝒟ylp|=ix𝒟𝒮0(xA|Dylp)(𝒟ylp)(X=i)=𝒟ylp,|𝒟ylp|=ix𝒟𝒮0(xA|Dylp)/CNi.\begin{array}[]{ll}&{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{0}}}(x\in A|X=i)\\ =&{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{0}}}(x\in A,X=i)/{\mathbb{P}}(X=i)\\ =&\sum_{{\mathcal{D}}_{y\neq l_{p}},|{\mathcal{D}}_{y\neq l_{p}}|=i}{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{0}}}(x\in A,{\mathcal{D}}_{y\neq l_{p}})/{\mathbb{P}}(X=i)\\ =&\sum_{{\mathcal{D}}_{y\neq l_{p}},|{\mathcal{D}}_{y\neq l_{p}}|=i}{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{0}}}(x\in A|D_{y\neq l_{p}})\frac{{\mathbb{P}}({\mathcal{D}}_{y\neq l_{p}})}{{\mathbb{P}}(X=i)}\\ =&{\sum_{{\mathcal{D}}_{y\neq l_{p}},|{\mathcal{D}}_{y\neq l_{p}}|=i}{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{0}}}(x\in A|D_{y\neq l_{p}})}/{C_{N}^{i}}.\\ \end{array}

Thus, for any $D_{y\neq l_p}$ with $|D_{y\neq l_p}|=i$, we enumerate all possible choices of the index set $\{i_1,i_2,\dots,i_k\}$ selected for $D_l$ and obtain

x𝒟𝒮0(xA|Dylp)=i1,i2,,ikDylp((xi1,xi2,,xik)A)Cik=1Ciki1,i2,,ikDylp(Xk𝒟SlpA)=(Xk𝒟SlpA)\begin{array}[]{ll}&{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{0}}}(x\in A|D_{y\neq l_{p}})\\ =&\frac{\sum_{i_{1},i_{2},\dots,i_{k}\in D_{y\neq l_{p}}}{\mathbb{P}}((x_{i_{1}},x_{i_{2}},\dots,x_{i_{k}})\in A)}{C_{i}^{k}}\\ =&\frac{1}{C_{i}^{k}}\sum_{i_{1},i_{2},\dots,i_{k}\subset D_{y\neq l_{p}}}{\mathbb{P}}(X_{k}^{{\mathcal{D}}_{S}^{\neq l_{p}}}\in A)\\ =&{\mathbb{P}}(X_{k}^{{\mathcal{D}}_{S}^{\neq l_{p}}}\in A)\end{array}

where the $x_i$ are i.i.d. and $C_a^b$ denotes the number of ways to choose $b$ samples from $a$ samples. We thus have:

x𝒟𝒮0(xA|X=i)=𝒟ylp,|𝒟ylp|=ix𝒟𝒮0(xA|Dylp)CNi=𝒟ylp,|𝒟ylp|=i(Xk𝒟SlpA)CNi=(Xk𝒟SlpA).\begin{array}[]{ll}&{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{0}}}(x\in A|X=i)\\ =&\frac{\sum_{{\mathcal{D}}_{y\neq l_{p},|{\mathcal{D}}_{y\neq l_{p}}|=i}}{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{0}}}(x\in A|D_{y\neq l_{p}})}{C_{N}^{i}}\\ =&\frac{\sum_{{\mathcal{D}}_{y\neq l_{p}},|{\mathcal{D}}_{y\neq l_{p}}|=i}{\mathbb{P}}(X_{k}^{{\mathcal{D}}_{S}^{\neq l_{p}}}\in A)}{C_{N}^{i}}\\ =&{\mathbb{P}}(X_{k}^{{\mathcal{D}}_{S}^{\neq l_{p}}}\in A).\end{array}

Finally, we have

x𝒟𝒮0(xA|Xk)=i=kNx𝒟𝒮0(xA|X=i)(X=i|Xk)=i=kN(Xk𝒟SlpA)(X=i|Xk)=(Xk𝒟SlpA)i=kN(X=i|Xk)=(Xk𝒟SlpA).\begin{array}[]{ll}&{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{0}}}(x\in A|X\geq k)\\ =&\sum_{i=k}^{N}{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{0}}}(x\in A|X=i){\mathbb{P}}(X=i|X\geq k)\\ =&\sum_{i=k}^{N}{\mathbb{P}}(X_{k}^{{\mathcal{D}}_{S}^{\neq l_{p}}}\in A){\mathbb{P}}(X=i|X\geq k)\\ =&{\mathbb{P}}(X_{k}^{{\mathcal{D}}_{S}^{\neq l_{p}}}\in A)\sum_{i=k}^{N}{\mathbb{P}}(X=i|X\geq k)\\ =&{\mathbb{P}}(X_{k}^{{\mathcal{D}}_{S}^{\neq l_{p}}}\in A).\end{array}

This proves the lemma. ∎

For samples with label $l_p$, we have a similar conclusion.

Lemma A.4.

Let $X$ be the random variable counting the samples with label $l_p$ in the set ${\mathcal{D}}_P$, and let $k$ be a given number in $\{1,2,\dots,[N\eta]\}$. If $X\geq k$, we randomly select $(1-\alpha)k$ samples with label $l_p$ and without the trigger from ${\mathcal{D}}_P$, and let these samples form the set $D_{l_p}$; otherwise, let $D_{l_p}$ be the set of all samples in ${\mathcal{D}}_P$ with label $l_p$ and without the trigger. Let $D_{l_p}$ follow the distribution ${\mathcal{D}}_{S_1}$. Then ${\mathbb{P}}_{x\sim{\mathcal{D}}_{S_1}}(x\in A\,|\,X\geq k)={\mathbb{P}}(X_{(1-\alpha)k}^{{\mathcal{D}}_S^{l_p}}\in A)$ for any set $A$, where $X_{(1-\alpha)k}^{{\mathcal{D}}_S^{l_p}}$ denotes $(1-\alpha)k$ samples drawn i.i.d. from the distribution ${\mathcal{D}}_S^{l_p}$.

Proof.

By the Bayesian formula, we have

x𝒟𝒮1(xA|Xk)=i=kNx𝒟𝒮1(xA|X=i)(X=i|Xk).\begin{array}[]{ll}&{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{1}}}(x\in A|X\geq k)=\sum_{i=k}^{N}{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{1}}}(x\in A|X=i){\mathbb{P}}(X=i|X\geq k).\\ \end{array}

The intermediate steps are similar to equation (6) in the proof of Lemma A.3, so we omit them.

Now we prove ${\mathbb{P}}_{x\sim{\mathcal{D}}_{S_1}}(x\in A|X=i)={\mathbb{P}}(X_{(1-\alpha)k}^{{\mathcal{D}}_S^{l_p}}\in A)$ for any $i\geq k$. Note that the $N$ samples in ${\mathcal{D}}_{\rm tr}$ are i.i.d. samples from ${\mathcal{D}}_S$, and we again keep track of their order. Let $(x_j,y_j)\in{\mathcal{D}}_{\rm tr}$ be the $j$-th sample drawn from ${\mathcal{D}}_S$. If poison is added to $(x_j,y_j)$, then it becomes $(x_j+{\mathcal{P}}(x_j),y_j)\in{\mathcal{D}}_P$; otherwise $(x_j,y_j)\in{\mathcal{D}}_P$.

For any $i\geq k$, when $X=i$, we have $|D_{l_p}|=(1-\alpha)k$. Let $D_{y=l_p}\subset[N]$ be the set of indices $j$ such that $(x_j,y_j)\in{\mathcal{D}}_P$ satisfies $y_j=l_p$. We now consider all possible realizations of $D_{y=l_p}$:

x𝒟𝒮1(xA|X=i)=x𝒟𝒮1(xA,X=i)/(X=i)=Dy=lp,|Dy=lp|=ix𝒟𝒮1(xA,Dy=lp)/(X=i)=Dy=lp,|Dy=lp|=ix𝒟𝒮1(xA|Dy=lp)(Dy=lp)/(X=i)=Dy=lp,|Dy=lp|=ix𝒟𝒮1(xA|Dy=lp)/CNi.\begin{array}[]{ll}&{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{1}}}(x\in A|X=i)\\ =&{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{1}}}(x\in A,X=i)/{\mathbb{P}}(X=i)\\ =&\sum_{D_{y=l_{p}},|D_{y=l_{p}}|=i}{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{1}}}(x\in A,D_{y=l_{p}})/{\mathbb{P}}(X=i)\\ =&\sum_{D_{y=l_{p}},|D_{y=l_{p}}|=i}{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{1}}}(x\in A|D_{y=l_{p}}){\mathbb{P}}(D_{y=l_{p}})/{\mathbb{P}}(X=i)\\ =&\sum_{D_{y=l_{p}},|D_{y=l_{p}}|=i}{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{1}}}(x\in A|D_{y=l_{p}})/C_{N}^{i}.\\ \end{array} (7)

Let $D^{poi}_{y=l_p}\subset[N]$ be the set of indices $j$ such that $x_j$ is a poisoned sample, and $D^{no\ poi}_{y=l_p}\subset[N]$ the set of indices $j$ such that $(x_j,y_j)\in D_{l_p}$. Clearly $D^{no\ poi}_{y=l_p},D^{poi}_{y=l_p}\subset D_{y=l_p}$. Then, for any given $D_{y=l_p}$ with $|D_{y=l_p}|=i$, we enumerate all possibilities of $D^{poi}_{y=l_p}$ and $D^{no\ poi}_{y=l_p}$ to compute ${\mathbb{P}}_{x\sim{\mathcal{D}}_{S_1}}(x\in A|D_{y=l_p})$:

x𝒟𝒮1(xA|Dy=lp)={ik}k=1iα,{ij}j=1[Nη](1α)({ij}j=1[Nη](1α)A)(Dy=lppoi={ik}k=1iα,Dy=lpnopoi={ij}j=1[Nη](1α)|Dy=lp)={ik}k=1iα,{ij}j=1[Nη](1α)(X(1α)k𝒟𝒮lpA)(Dy=lppoi={ik}k=1iα,Dy=lpnopoi={ij}j=1[Nη](1α)|Dy=lp)=(X(1α)k𝒟𝒮lpA){ik}k=1iα,{ij}j=1[Nη](1α)(Dy=lppoi={ik}k=1iα,Dy=lpnopoi={ij}j=1[Nη](1α)|Dy=lp)=(X(1α)k𝒟𝒮lpA).\begin{array}[]{ll}&{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{1}}}(x\in A|D_{y=l_{p}})\\ =&\sum_{\{i_{k}\}_{k=1}^{i\alpha},\{i_{j}\}_{j=1}^{[N\eta](1-\alpha)}}{\mathbb{P}}(\{i_{j}\}_{j=1}^{[N\eta](1-\alpha)}\in A){\mathbb{P}}(D^{poi}_{y=l_{p}}=\{i_{k}\}_{k=1}^{i\alpha},D^{no\ poi}_{y=l_{p}}=\{i_{j}\}_{j=1}^{[N\eta](1-\alpha)}|D_{y={l_{p}}})\\ =&\sum_{\{i_{k}\}_{k=1}^{i\alpha},\{i_{j}\}_{j=1}^{[N\eta](1-\alpha)}}{\mathbb{P}}(X_{(1-\alpha)k}^{{\mathcal{D}}^{l_{p}}_{{\mathcal{S}}}}\in A){\mathbb{P}}(D^{poi}_{y=l_{p}}=\{i_{k}\}_{k=1}^{i\alpha},D^{no\ poi}_{y=l_{p}}=\{i_{j}\}_{j=1}^{[N\eta](1-\alpha)}|D_{y={l_{p}}})\\ =&{\mathbb{P}}(X_{(1-\alpha)k}^{{\mathcal{D}}^{l_{p}}_{{\mathcal{S}}}}\in A)\sum_{\{i_{k}\}_{k=1}^{i\alpha},\{i_{j}\}_{j=1}^{[N\eta](1-\alpha)}}{\mathbb{P}}(D^{poi}_{y=l_{p}}=\{i_{k}\}_{k=1}^{i\alpha},D^{no\ poi}_{y=l_{p}}=\{i_{j}\}_{j=1}^{[N\eta](1-\alpha)}|D_{y={l_{p}}})\\ =&{\mathbb{P}}(X_{(1-\alpha)k}^{{\mathcal{D}}^{l_{p}}_{{\mathcal{S}}}}\in A).\\ \end{array}

Here the $x_i$ are i.i.d. and $C_a^b$ denotes the number of ways to choose $b$ samples from $a$ samples. Substituting this into equation (7), we have

x𝒟𝒮1(xA|X=i)=Dy=lp,|Dy=lp|=ix𝒟𝒮1(xA|Dy=lp)/CNi=Dy=lp,|Dy=lp|=i(X(1α)k𝒟𝒮lpA)/CNi=(X(1α)k𝒟𝒮lpA).\begin{array}[]{ll}&{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{1}}}(x\in A|X=i)\\ =&\sum_{D_{y=l_{p}},|D_{y=l_{p}}|=i}{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{1}}}(x\in A|D_{y=l_{p}})/C_{N}^{i}\\ =&\sum_{D_{y=l_{p}},|D_{y=l_{p}}|=i}{\mathbb{P}}(X_{(1-\alpha)k}^{{\mathcal{D}}^{l_{p}}_{{\mathcal{S}}}}\in A)/C_{N}^{i}\\ =&{\mathbb{P}}(X_{(1-\alpha)k}^{{\mathcal{D}}^{l_{p}}_{{\mathcal{S}}}}\in A).\\ \end{array}

This is what we want. Finally, we have

x𝒟𝒮1(xA|Xk)=i=kNx𝒟𝒮1(xA|X=i)(X=i|Xk)=i=kN(X(1α)k𝒟𝒮lpA)(X=i|Xk)=(X(1α)k𝒟𝒮lpA).\begin{array}[]{ll}&{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{1}}}(x\in A|X\geq k)\\ =&\sum_{i=k}^{N}{\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}_{1}}}(x\in A|X=i){\mathbb{P}}(X=i|X\geq k)\\ =&\sum_{i=k}^{N}{\mathbb{P}}(X_{(1-\alpha)k}^{{\mathcal{D}}^{l_{p}}_{{\mathcal{S}}}}\in A){\mathbb{P}}(X=i|X\geq k)\\ =&{\mathbb{P}}(X_{(1-\alpha)k}^{{\mathcal{D}}^{l_{p}}_{{\mathcal{S}}}}\in A).\end{array}

The lemma is proved. ∎

Now, we prove Theorem A.2.

Proof.

By the Bayesian formula, we have

\begin{array}[]{ll}&{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[h(x,y)]\\ =&{\mathbb{P}}_{(x,y)\sim{\mathcal{D}}_{S}}(y=l_{p}){\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}^{l_{p}}}[h(x,y)]+{\mathbb{P}}_{(x,y)\sim{\mathcal{D}}_{S}}(y\neq l_{p}){\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[h(x,y)]\\ =&\eta{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}^{l_{p}}}[h(x,y)]+(1-\eta){\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[h(x,y)].\end{array} (8)

Now we will separately estimate 𝔼(x,y)𝒟Slp[h(x,y)]{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[h(x,y)] and 𝔼(x,y)𝒟𝒮lp[h(x,y)]{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}^{l_{p}}_{\mathcal{S}}}[h(x,y)].

Upper bound of 𝔼(x,y)𝒟Slp[h(x,y)]{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[h(x,y)]. We give such a bound in the following Results (c1), (c2), and (c3).

Result (c1): Let random variable XX be the number of samples with labels not equal to lpl_{p} in 𝒟tr{\mathcal{D}}_{{{\rm{tr}}}}. Then with probability at least 14η4η+NNη1-\frac{4\eta}{4\eta+N-N\eta}, we have XN(1η)/2X\geq N(1-\eta)/2.

Let ${\mathcal{D}}_{\rm tr}=\{(x_i,y_i)\}_{i=1}^N$. Then $X=\sum_{i=1}^N{\mathbf{1}}(y_i\neq l_p)$, so ${\mathbb{E}}[X]=\sum_{i=1}^N{\mathbb{E}}[{\mathbf{1}}(y_i\neq l_p)]=N(1-\eta)$ and ${\mathbb{V}}[X]=\sum_{i=1}^N{\mathbb{V}}[{\mathbf{1}}(y_i\neq l_p)]=N(1-\eta)\eta$, because the indicators ${\mathbf{1}}(y_i\neq l_p)$ are independent. By Cantelli's inequality, ${\mathbb{P}}(X\leq N(1-\eta)/2)\leq\frac{{\mathbb{V}}[X]}{{\mathbb{V}}[X]+({\mathbb{E}}[X]-N(1-\eta)/2)^2}=\frac{4\eta}{4\eta+N-N\eta}$, and Result (c1) is proved.
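As a hypothetical numerical check of Result (c1): with $\eta=0.1$ and $N=50000$, the failure probability is $\frac{4\eta}{4\eta+N-N\eta}=\frac{0.4}{45000.4}\approx 8.9\times10^{-6}$, so $X\geq N(1-\eta)/2=22500$ holds with overwhelming probability.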

Result (c2): We randomly select N(1η)/2N(1-\eta)/2 samples without label lpl_{p} in 𝒟𝒫{\mathcal{D}}_{\mathcal{P}} and let DlD_{l} be the set of these samples. If the number of samples without label lpl_{p} in 𝒟𝒫{\mathcal{D}}_{\mathcal{P}} is less than N(1η)/2N(1-\eta)/2, then we select all samples without label lpl_{p}, and let DlD_{l} be the set of these samples.

Let the set $D_l$ obey the distribution ${\mathcal{D}}_{S_0}$. Then ${\mathbb{P}}_{x\sim{\mathcal{D}}_{S_0}}(x\in A\,|\,X\geq N(1-\eta)/2)={\mathbb{P}}(X_{N(1-\eta)/2}^{{\mathcal{D}}_S^{\neq l_p}}\in A)$ for any set $A$, where $X_{N(1-\eta)/2}^{{\mathcal{D}}_S^{\neq l_p}}$ denotes $N(1-\eta)/2$ samples drawn i.i.d. from the distribution ${\mathcal{D}}_S^{\neq l_p}$.

This is Lemma A.3 with $k=N(1-\eta)/2$: Result (c2) shows that, conditioned on $X\geq N(1-\eta)/2$, $D_l$ can be regarded as i.i.d. samples from ${\mathcal{D}}_S^{\neq l_p}$.

Result (c3): With probability 14η4η+NNηδ/21-\frac{4\eta}{4\eta+N-N\eta}-\delta/2, we have

\begin{array}[]{ll}&{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[h(x,y)]\leq\frac{\sum_{(x,y)\in{\mathcal{D}}_{P}}h(x,y)}{N(1-\eta)/2}+2{\hbox{\rm{Rad}}}^{{\mathcal{D}}_{S}^{\neq l_{p}}}_{N(1-\eta)/2}(H)+\sqrt{\frac{\ln(2/\delta)}{N(1-\eta)}}.\end{array}

D(n×[m])N(1η)/2D\in({\mathbb{R}}^{n}\times[m])^{N(1-\eta)/2} is called a bad set, if |D|=N(1η)/2|D|=N(1-\eta)/2, and

𝔼(x,y)𝒟Slp[h(x,y)]>(x,y)Dh(x,y)N(1η)/2+2RadN(1η)/2𝒟Slp(H)+ln(2/δ)N(1η)\begin{array}[]{ll}&{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[h(x,y)]>\frac{\sum_{(x,y)\in D}h(x,y)}{N(1-\eta)/2}+2{\hbox{\rm{Rad}}}^{{\mathcal{D}}_{S}^{\neq l_{p}}}_{N(1-\eta)/2}(H)+\sqrt{\frac{\ln(2/\delta)}{N(1-\eta)}}\end{array} (9)

for some $h\in H$. Let $S_b=\{D\,\|\,D\hbox{ is a bad set}\}$.

By Theorem A.1, if the samples in $D$ are i.i.d. sampled from ${\mathcal{D}}_S^{\neq l_p}$ and $|D|=N(1-\eta)/2$, then with probability $1-\delta/2$, we have ${\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_S^{\neq l_p}}[h(x,y)]\leq\frac{\sum_{(x,y)\in D}h(x,y)}{N(1-\eta)/2}+2{\hbox{\rm{Rad}}}^{{\mathcal{D}}_S^{\neq l_p}}_{N(1-\eta)/2}(H)+\sqrt{\frac{\ln(2/\delta)}{N(1-\eta)}}$ for all $h\in H$.

So, by the definition of a bad set, ${\mathbb{P}}(X_{N(1-\eta)/2}^{{\mathcal{D}}_S^{\neq l_p}}\in S_b)\leq\delta/2$, where $X_{N(1-\eta)/2}^{{\mathcal{D}}_S^{\neq l_p}}$ is as in Result (c2). By Result (c2), conditioned on $X\geq N(1-\eta)/2$, we have ${\mathbb{P}}(D_l\in S_b)\leq\delta/2$. Then, by Result (c1), we have

(DlSb,XN(1η)/2)=(DlSb|XN(1η)/2)(XN(1η)/2)(1δ/2)(14η4η+NNη)(by(c1))1δ/24η4η+NNη.\begin{array}[]{ll}&{\mathbb{P}}(D_{l}\notin S_{b},X\geq N(1-\eta)/2)\\ =&{\mathbb{P}}(D_{l}\notin S_{b}|X\geq N(1-\eta)/2){\mathbb{P}}(X\geq N(1-\eta)/2)\\ \geq&(1-\delta/2)(1-\frac{4\eta}{4\eta+N-N\eta})(by\ (c1))\\ \geq&1-\delta/2-\frac{4\eta}{4\eta+N-N\eta}.\\ \end{array}

So, with probability at least 1δ/24η4η+NNη1-\delta/2-\frac{4\eta}{4\eta+N-N\eta}, DlD_{l} is not in SbS_{b} and XN(1η)/2X\geq N(1-\eta)/2. Hence,

𝔼(x,y)𝒟Slp[h(x,y)](x,y)Dlh(x,y)N(1η)/2+2RadN(1η)/2(H)+ln(2/δ)N(1η)(x,y)𝒟Ph(x,y)N(1η)/2+2RadN(1η)/2(H)+ln(2/δ)N(1η).\begin{array}[]{ll}&{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[h(x,y)]\\ \leq&\frac{\sum_{(x,y)\in D_{l}}h(x,y)}{N(1-\eta)/2}+2{\hbox{\rm{Rad}}}_{N(1-\eta)/2}(H)+\sqrt{\frac{\ln(2/\delta)}{N(1-\eta)}}\\ \leq&\frac{\sum_{(x,y)\in{\mathcal{D}}_{P}}h(x,y)}{N(1-\eta)/2}+2{\hbox{\rm{Rad}}}_{N(1-\eta)/2}(H)+\sqrt{\frac{\ln(2/\delta)}{N(1-\eta)}}.\end{array}

This is what we want.

The upper bound of 𝔼(x,y)𝒟𝒮lp[h(x,y)]{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}^{l_{p}}_{\mathcal{S}}}[h(x,y)].

We will give such a bound in Results (d1), (d2), and (d3) below, which are similar to results (c1), (c2), (c3).

Result (d1): Let the random variable XX be the number of samples with label lpl_{p} in 𝒟tr{\mathcal{D}}_{{{\rm{tr}}}}. Then with probability at least 14(1η)4(1η)+Nη1-\frac{4(1-\eta)}{4(1-\eta)+N\eta}, we have XNη/2X\geq N\eta/2. The proof is similar to that of Result (c1).

Result (d2): We randomly select $[N\eta(1-\alpha)/2]$ samples with label $l_p$ from ${\mathcal{D}}_P\setminus{\mathcal{D}}_{poi}$ and let these samples form the set $D_{l_p}$. If the number of samples with label $l_p$ in ${\mathcal{D}}_P\setminus{\mathcal{D}}_{poi}$ is smaller than $[N\eta(1-\alpha)/2]$, then we take all of them to form $D_{l_p}$.

Assume that the set $D_{l_p}$ obeys the distribution ${\mathcal{D}}_{S_1}$. Then ${\mathbb{P}}_{x\sim{\mathcal{D}}_{S_1}}(x\in A\,|\,X\geq[N\eta/2])={\mathbb{P}}(X_{[N\eta(1-\alpha)/2]}^{{\mathcal{D}}_S^{l_p}}\in A)$ for any set $A$, where $X_{[N\eta(1-\alpha)/2]}^{{\mathcal{D}}_S^{l_p}}$ denotes $[N\eta(1-\alpha)/2]$ samples drawn i.i.d. from the distribution ${\mathcal{D}}_S^{l_p}$.

This is Lemma A.4 with $k=[N\eta/2]$: Result (d2) shows that, conditioned on $X\geq[N\eta/2]$, $D_{l_p}$ can be regarded as i.i.d. samples from ${\mathcal{D}}_S^{l_p}$.

Result (d3): With probability 144η4(1η)+Nηδ/21-\frac{4-4\eta}{4(1-\eta)+N\eta}-\delta/2, we have

𝔼(x,y)𝒟𝒮lp[h(x,y)](x,y)𝒟Ph(x,y)N(1α)η/2+2RadN(1α)η/2𝒟Slp(H)+ln(2/δ)N(1α)η.\begin{array}[]{ll}&{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}^{l_{p}}_{\mathcal{S}}}[h(x,y)]\leq\frac{\sum_{(x,y)\in{\mathcal{D}}_{P}}h(x,y)}{N(1-\alpha)\eta/2}+2{\hbox{\rm{Rad}}}^{{\mathcal{D}}_{S}^{l_{p}}}_{N(1-\alpha)\eta/2}(H)+\sqrt{\frac{\ln(2/\delta)}{N(1-\alpha)\eta}}.\end{array}

This is similar to Result (c3), but using (d1) and (d2) instead.

The upper bound of ${\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_S}[h(x,y)]$.

Using the fact RadMD(H)NMRadND(H){\hbox{\rm{Rad}}}^{D}_{M}(H)\leq\frac{N}{M}{\hbox{\rm{Rad}}}^{D}_{N}(H) for any MNM\leq N, distribution 𝒟{\mathcal{D}}, hypothesis space HH, and applying (c3), (d3) to equation (8), we finally have

𝔼(x,y)𝒟S[h(x,y)]=η𝔼(x,y)𝒟𝒮lp[h(x,y)]+(1η)𝔼(x,y)𝒟Slp[h(x,y)]42α1α(x,y)𝒟Ph(x,y)N+4RadN𝒟Slp(H)+4RadN𝒟Slp(H)1α+ln(2/δ)(1η)N+ln(2/δ)ηN(1α).\begin{array}[]{ll}&{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[h(x,y)]\\ =&\eta{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}^{l_{p}}_{\mathcal{S}}}[h(x,y)]+(1-\eta){\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[h(x,y)]\\ \leq&\frac{4-2\alpha}{1-\alpha}\frac{\sum_{(x,y)\in{\mathcal{D}}_{P}}h(x,y)}{N}+4{\hbox{\rm{Rad}}}^{{\mathcal{D}}_{S}^{\neq l_{p}}}_{N}(H)+4\frac{{\hbox{\rm{Rad}}}^{{\mathcal{D}}_{S}^{l_{p}}}_{N}(H)}{1-\alpha}+\sqrt{\frac{\ln(2/\delta)(1-\eta)}{N}}+\sqrt{\frac{\ln(2/\delta)\eta}{N(1-\alpha)}}.\end{array} (10)
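In more detail, the coefficients in (10) can be tracked as follows (a routine expansion of the substitution, recorded here for completeness): the empirical terms combine as $(1-\eta)\cdot\frac{2}{N(1-\eta)}+\eta\cdot\frac{2}{N(1-\alpha)\eta}=\frac{2}{N}+\frac{2}{N(1-\alpha)}=\frac{4-2\alpha}{N(1-\alpha)}$; the Rademacher terms satisfy $(1-\eta)\cdot2{\hbox{\rm{Rad}}}^{{\mathcal{D}}_S^{\neq l_p}}_{N(1-\eta)/2}(H)\leq 4{\hbox{\rm{Rad}}}^{{\mathcal{D}}_S^{\neq l_p}}_N(H)$ and $\eta\cdot2{\hbox{\rm{Rad}}}^{{\mathcal{D}}_S^{l_p}}_{N(1-\alpha)\eta/2}(H)\leq\frac{4}{1-\alpha}{\hbox{\rm{Rad}}}^{{\mathcal{D}}_S^{l_p}}_N(H)$ by the fact above; and the deviation terms become $(1-\eta)\sqrt{\frac{\ln(2/\delta)}{N(1-\eta)}}=\sqrt{\frac{\ln(2/\delta)(1-\eta)}{N}}$ and $\eta\sqrt{\frac{\ln(2/\delta)}{N(1-\alpha)\eta}}=\sqrt{\frac{\ln(2/\delta)\eta}{N(1-\alpha)}}$.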

Finally, since $1-\eta<1$, $\eta<1$ and $1-\alpha\leq1$, each of the last two terms in (10) is at most $\sqrt{\frac{\ln(2/\delta)}{N(1-\alpha)}}$, so their sum is at most $2\sqrt{\frac{\ln(2/\delta)}{N(1-\alpha)}}$; this yields (5) and proves the theorem. ∎

A.3 Estimating the Rademacher Complexity

In this section, we will estimate the Rademacher Complexities in Theorem A.2 when H={𝟏((x)y)}H=\{{\mathbf{1}}({\mathcal{F}}(x)\neq y)\}, where {\mathcal{F}} is a network with width WW and depth DD. We first need a definition:

Definition A.5.

Let HH be a hypothesis space. Then the growth function ΠH(N)\Pi_{H}(N) of HH is defined as:

ΠH(N)=max{xi}i=1N|{(h(xi))i=1NhH}|.\Pi_{H}(N)=\max_{\{x_{i}\}_{i=1}^{N}}|\{(h(x_{i}))_{i=1}^{N}\|h\in H\}|.
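As a simple illustrative example: for the threshold class $H=\{x\mapsto{\mathbf{1}}(x\geq t)\,\|\,t\in{\mathbb{R}}\}$ on ${\mathbb{R}}$, any $N$ distinct points realize exactly $N+1$ label patterns, so $\Pi_H(N)=N+1$.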

For a hypothesis space $H=\{h:{\mathbb{R}}^n\to\{-1,1\}\}$, we can bound its VC dimension (Vapnik & Chervonenkis, 2015) and relate the VC dimension to the growth function, as in the following lemmas.

Lemma A.6 ((Bartlett et al., 2021, 2019)).

Let $H$ be the hypothesis space defined as follows: ${\mathcal{F}}\in H$ if and only if ${\mathcal{F}}$ is a network with width at most $W$ and depth at most $D$, each hidden layer of ${\mathcal{F}}$ uses the ReLU activation function, and the output layer uses sign as the activation function. Then the VC dimension of $H$ is $O(D^2W^2)$.

Lemma A.7 ((Sauer, 1972)).

Let HH be the hypothesis space with VCdim VV. Then for any N1N\geq 1, the growth function satisfies ΠH(N)(eN)V\Pi_{H}(N)\leq(eN)^{V}.

Lemma A.8 ((Massart, 2000; Mohri et al., 2018)).

Let $H$ be a hypothesis space with growth function $\Pi_H$ such that $|h(x)|\leq1$ for every $h\in H$ and every $x$. Then for any distribution ${\mathcal{D}}$, we have ${\hbox{\rm{Rad}}}^{\mathcal{D}}_N(H)=O(\sqrt{\frac{\ln(\Pi_H(N))}{N}})$.

Lemma A.9 ((Mohri et al., 2018)).

Let $H$ be a hypothesis space, $q:{\mathbb{R}}^m\to\{0,1\}$, and $H_q=\{q(h_1(x),h_2(x),\dots,h_m(x))\,\|\,h_i\in H\}$. Then $\Pi_{H_q}(N)\leq(\Pi_H(N))^m$.

Now, we calculate the Rademacher complexity of H={𝟏(^(x)y)W,D}H=\{{\mathbf{1}}(\widehat{{\mathcal{F}}}(x)\neq y)\|{\mathcal{F}}\in{\mathcal{H}}_{W,D}\}:

Lemma A.10.

Let H={𝟏(^(x)y):[0,1]n×[m]{0,1}W,D}H=\{{\mathbf{1}}(\widehat{{\mathcal{F}}}(x)\neq y):[0,1]^{n}\times[m]\to\{0,1\}\|{\mathcal{F}}\in{\mathcal{H}}_{W,D}\}. Then, for any distribution 𝒟{\mathcal{D}}, we have RadN𝒟(H)=O(mW2D2N){\hbox{\rm{Rad}}}^{\mathcal{D}}_{N}(H)=O(\sqrt{\frac{mW^{2}D^{2}}{N}}).

Proof.

Let $H^0_{W,D}$ be defined as follows: ${\mathcal{F}}\in H^0_{W,D}$ if and only if ${\mathcal{F}}$ is a network with width $W$ and depth $D$, each hidden layer of ${\mathcal{F}}$ uses the ReLU activation function, and the output layer uses no activation function. Let $H^0=\{{\mathbf{1}}(\widehat{\mathcal{F}}(x)\neq y)\,\|\,{\mathcal{F}}\in H^0_{W,D}\}$.

Let $H^1_{W,D}$ be defined as follows: ${\mathcal{F}}\in H^1_{W,D}$ if and only if ${\mathcal{F}}$ is a network with width $W$ and depth $D$, each hidden layer of ${\mathcal{F}}$ uses the ReLU activation function, and the output layer uses sign as the activation function.

Then we have that H0={𝟏(^(x)y)HW,D0}={𝟏(^(x)y)W,D}=HH^{0}=\{{\mathbf{1}}(\widehat{\mathcal{F}}(x)\neq y)\|{\mathcal{F}}\in H^{0}_{W,D}\}=\{{\mathbf{1}}(\widehat{\mathcal{F}}(x)\neq y)\|{\mathcal{F}}\in{\mathcal{H}}_{W,D}\}=H and ΠHW,D1(N)(eN)O(D2W2)\Pi_{H^{1}_{W,D}}(N)\leq(eN)^{O(D^{2}W^{2})} by Lemmas A.7 and A.6.

For any ${\mathcal{F}}\in H^0_{W,D}$, let ${\mathcal{F}}_{i,j}={\hbox{\rm{sign}}}({\mathcal{F}}_i-{\mathcal{F}}_j)$, where ${\mathcal{F}}_i$ is the $i$-th output coordinate of ${\mathcal{F}}$. It is easy to see that ${\mathcal{F}}_{i,j}\in H^1_{W,D}$, and ${\mathbf{1}}(\widehat{\mathcal{F}}(x)\neq y)={\mathbf{1}}(-\sum_{j\neq y}{\mathcal{F}}_{y,j}+(m-1)-0.1>0)$. By Lemma A.9, the growth function of $H^0$ is at most $(eN)^{O(mD^2W^2)}$; so by Lemma A.8 and $H^0=H$, the Rademacher complexity ${\hbox{\rm{Rad}}}^{\mathcal{D}}_N(H)$ of $H$ is $O(\sqrt{\frac{mD^2W^2}{N}})$ (ignoring lower-order terms). This proves the lemma. ∎
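For a rough illustrative sense of scale (hypothetical values, not tied to our experiments): with $m=10$ and $W=D=5$, Lemma A.10 gives ${\hbox{\rm{Rad}}}^{\mathcal{D}}_N(H)=O(\sqrt{6250/N})$, which vanishes as $N$ grows; in general the complexity term becomes $o(1)$ once $N\gg mW^2D^2$.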

A.4 Proof of Theorem 4.1

Now we prove Theorem 4.1 by using Sections A.2 and A.3:

Proof.

Taking $H=\{{\mathbf{1}}(\widehat{\mathcal{F}}(x)\neq y)\,\|\,{\mathcal{F}}\in{\mathcal{H}}_{W,D}\}$ and substituting the Rademacher complexity bound from Section A.3 into Theorem A.2, we obtain Theorem 4.1. ∎

Appendix B Impact of Poison on the Empirical Error

In this appendix, we prove a proposition to support Remark 4.3. In Proposition B.2 below, we show that if the poison ${\mathcal{P}}(x)$ is not chosen properly, then it can result in a large empirical error.

Remark B.1.

Please note that the conclusion “poison implies a large empirical error” only holds in certain situations. In fact, when ${\mathcal{P}}(x)$ is bounded, or the networks in the hypothesis space have strong expressive power, the conditions of the proposition in this section do not hold, so the poison does not lead to a large empirical error. In our experiments in Section 6, there is no need to consider the occurrence of a large empirical error, because we bound the trigger ${\mathcal{P}}(x)$ and use a large network.

Let 𝒟poi{\mathcal{D}}^{poi} be the distribution of poisoned data with label lpl_{p}, that is,

(x,y)𝒟Slp(x+𝒫(x)A)=x𝒟poi(xA){\mathbb{P}}_{(x,y)\sim{\mathcal{D}}_{S}^{l_{p}}}(x+{\mathcal{P}}(x)\in A)={\mathbb{P}}_{x\sim{\mathcal{D}}^{poi}}(x\in A)

for any set AA.
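For instance (an illustrative special case), if the trigger is a fixed patch, i.e. ${\mathcal{P}}(x)=p_0$ for all $x$, then ${\mathcal{D}}^{poi}$ is simply the push-forward of ${\mathcal{D}}_S^{l_p}$ under the map $x\mapsto x+p_0$.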

Proposition B.2.

Use the notation introduced in Theorem A.2 and let η0.5\eta\leq 0.5 be defined in (3). We further assume that h(x,y1)+h(x,y2)1h(x,y_{1})+h(x,y_{2})\geq 1 for any hHh\in H, xnx\in{\mathbb{R}}^{n}, y1y2y_{1}\neq y_{2}, and for some τ,V+\tau,V\in{\mathbb{R}}_{+}, it holds:
(1) x𝒟poi(xA)τ(x,y)𝒟Slp(xA){\mathbb{P}}_{x\sim{\mathcal{D}}^{poi}}(x\in A)\geq\tau{\mathbb{P}}_{(x,y)\sim{\mathcal{D}}_{S}^{\neq l_{p}}}(x\in A) for any set AA;
(2) RadN𝒟Slp(H)V{\hbox{\rm{Rad}}}^{{\mathcal{D}}_{S}^{\neq l_{p}}}_{N}(H)\leq V.
Let $Q=\eta^N\tau^{N\alpha}$, and assume that $\alpha-\delta/Q>0$ and $0.5-2\delta\alpha/Q-V\alpha>0$. Then with probability at least $\delta$, for any $h\in H$ we have

𝔼(x,y)𝒟P[h(x,y)]>0.52δα/QVααδ/Q.{\mathbb{E}}_{(x,y)\in{\mathcal{D}}_{P}}[h(x,y)]>\frac{0.5-2\delta\alpha/Q-V\alpha}{\alpha-\delta/Q}.

First, we have the following lemma (Bertsekas & Tsitsiklis, 2008).

Lemma B.3.

If distributions D1D_{1}, D2D_{2} and function hh satisfy xD1(h(x)A)λxD2(h(x)A){\mathbb{P}}_{x\sim D_{1}}(h(x)\in A)\leq\lambda{\mathbb{P}}_{x\sim D_{2}}(h(x)\in A) for any set AA, then 𝔼xD1[f(h(x))]λ𝔼xD2[f(h(x))]{\mathbb{E}}_{x\sim D_{1}}[f(h(x))]\leq\lambda{\mathbb{E}}_{x\sim D_{2}}[f(h(x))] for any bounded and positive measurable function f(x)f(x).
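For intuition (a one-line sketch, not needed later): the special case $f={\mathbf{1}}_B$ recovers the assumption itself, and the general case follows by writing ${\mathbb{E}}_{x\sim D_1}[f(h(x))]=\int_0^\infty{\mathbb{P}}_{x\sim D_1}(f(h(x))>t)\,{\hbox{\rm{d}}}t$ and applying the assumption to the sets $A_t=\{z\,\|\,f(z)>t\}$.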

We then prove the following lemma, which directly leads to Proposition B.2.

Lemma B.4.

Use the notations in Proposition B.2. We further assume that $h(x,y_1)+h(x,y_2)\geq1$ for any $h\in H$, $x\in{\mathbb{R}}^n$ and $y_1\neq y_2$, and that for some $\tau\in{\mathbb{R}}_+$ and ${\mathbf{\varepsilon}}\in{\mathbb{R}}_+$ the following hold:
(1): ${\mathbb{P}}_{x\sim{\mathcal{D}}^{poi}}(x\in A)\geq\tau{\mathbb{P}}_{(x,y)\sim{\mathcal{D}}_S^{\neq l_p}}(x\in A)$ for all sets $A$;
(2): with probability 1δ1-\delta, there exists an hHh\in H such that 𝔼(x,y)𝒟P[h(x,y)]ε{\mathbb{E}}_{(x,y)\in{\mathcal{D}}_{P}}[h(x,y)]\leq{\mathbf{\varepsilon}}.
Then for any KNαK\leq N\alpha, we have

RadK𝒟Slp(H)0.5Nε/K(2Nε/K)δαηNτK.RadN𝒟Slp(H)K(0.5Nε/K(2Nε/K)δαηNτK)/N.\begin{array}[]{l}{\hbox{\rm{Rad}}}^{{\mathcal{D}}_{S}^{\neq l_{p}}}_{K}(H)\geq 0.5-N{\mathbf{\varepsilon}}/K-\frac{(2-N{\mathbf{\varepsilon}}/K)\delta\alpha}{\eta^{N}\tau^{K}}.\\ {\hbox{\rm{Rad}}}^{{\mathcal{D}}_{S}^{\neq l_{p}}}_{N}(H)\geq K(0.5-N{\mathbf{\varepsilon}}/K-\frac{(2-N{\mathbf{\varepsilon}}/K)\delta\alpha}{\eta^{N}\tau^{K}})/N.\\ \end{array}
Proof.

In order to compute the relevant probabilities conveniently, we treat ${\mathcal{D}}_{\rm tr}$ and ${\mathcal{D}}_P$ as ordered sets. Let $(x_i,y_i)$ be the $i$-th element of ${\mathcal{D}}_{\rm tr}$. ${\mathcal{D}}_P$ is obtained from ${\mathcal{D}}_{\rm tr}$ as follows: first set ${\mathcal{D}}_P={\mathcal{D}}_{\rm tr}$; then, if the poison ${\mathcal{P}}(x_i)$ is added to $(x_i,y_i)$ for some $i$, replace the $i$-th element of ${\mathcal{D}}_P$ by $(x_i+{\mathcal{P}}(x_i),y_i)$.

First, we give three notations:

(1): For the poisoned training set ${\mathcal{D}}_P$ and $q\in{\mathbb{N}}$, we take the first $q$ clean samples without label $l_p$ in ${\mathcal{D}}_P$ to form a subset $F_{clean}(q,{\mathcal{D}}_P)$; that is, writing ${\mathcal{D}}_{\rm tr}=\{(x_i,y_i)\}_{i=1}^N$, we set $F_{clean}(q,{\mathcal{D}}_P)=\{(x_{i_k},y_{i_k})\}_{k=1}^q$, where $\{i_k\}_{k=1}^q$ are the $q$ smallest indices in $[N]$ such that $y_{i_k}\neq l_p$ and $(x_{i_k},y_{i_k})\in{\mathcal{D}}_{clean}$, with $i_a<i_b$ whenever $a<b$.

(2): For the poisoned training set ${\mathcal{D}}_P$ and $q\in{\mathbb{N}}$, we take the first $q$ poisoned samples in ${\mathcal{D}}_P$ to form a subset $F_{poison}(q,{\mathcal{D}}_P)$; that is, writing ${\mathcal{D}}_{\rm tr}=\{(x_i,y_i)\}_{i=1}^N$, we set $F_{poison}(q,{\mathcal{D}}_P)=\{(x_{i_k}+{\mathcal{P}}(x_{i_k}),y_{i_k})\}_{k=1}^q$, where $\{i_k\}_{k=1}^q$ are the $q$ smallest indices in $[N]$ such that $y_{i_k}=l_p$ and $(x_{i_k}+{\mathcal{P}}(x_{i_k}),y_{i_k})\in{\mathcal{D}}_{poi}$, with $i_a<i_b$ whenever $a<b$.

(3): For any given $K$ samples $\{(x_i,y_i)\}_{i=1}^K$ with $y_i\neq l_p$ and a given set $S\subset[K]$, let $S_i$ be the $i$-th smallest element of $S$; in particular, $S_1$ is the minimum of $S$ and $S_{|S|}$ is the maximum of $S$.

Second, we define a property of 𝒟P{\mathcal{D}}_{P}:

The poisoned training set ${\mathcal{D}}_P$ obtained by poisoning ${\mathcal{D}}_{\rm tr}$ is said to be a nice-inclusion of a dataset ${\mathcal{D}}_G$ if ${\mathcal{D}}_G=D_{T_p}\cup D_T$ with $D_T\subset{\mathcal{D}}_{\rm tr}$ and $D_{T_p}\subset{\mathcal{D}}^{poi}$, and $\min_{h\in H}\frac{1}{|{\mathcal{D}}_P|}\sum_{(x,y)\in{\mathcal{D}}_P}h(x,y)\leq{\mathbf{\varepsilon}}$. For convenience, we state these conditions more explicitly as follows.

The poisoned training set ${\mathcal{D}}_P$ obtained by poisoning ${\mathcal{D}}_{\rm tr}=\{(x^{\prime}_i,y^{\prime}_i)\}_{i=1}^N$ is said to be a nice-inclusion of the ordered dataset ${\mathcal{D}}_G=\{(z_i,l_i)\}_{i=1}^K$ and $S\subset[K]$ satisfying $|[K]\setminus S|=|{\mathcal{D}}_{poi}|$, if
(e1): writing $F_{clean}(|S|,{\mathcal{D}}_P)=\{(x^{\prime}_{i_k},y^{\prime}_{i_k})\}_{k=1}^{|S|}$, we have $x^{\prime}_{i_k}=z_{S_k}$ and $y^{\prime}_{i_k}=l_{S_k}$ for all $k\in[|S|]$;
(e2): writing $F_{poison}(|[K]\setminus S|,{\mathcal{D}}_P)=\{(x^{\prime}_{i_k}+{\mathcal{P}}(x^{\prime}_{i_k}),y^{\prime}_{i_k})\}_{k=1}^{K-|S|}$, we have $x^{\prime}_{i_k}+{\mathcal{P}}(x^{\prime}_{i_k})=z_{([K]\setminus S)_k}$ for all $k\in[K-|S|]$;
(e3): $\min_{h\in H}\sum_{(x,y)\in{\mathcal{D}}_P}\frac{h(x,y)}{|{\mathcal{D}}_P|}\leq{\mathbf{\varepsilon}}$.

The poisoned training set ${\mathcal{D}}_P$ obtained by poisoning ${\mathcal{D}}_{\rm tr}=\{(x^{\prime}_i,y^{\prime}_i)\}_{i=1}^N$ is said to be a common-nice-inclusion of the ordered dataset ${\mathcal{D}}_G=\{(z_i,l_i)\}_{i=1}^K$ and $S\subset[K]$ satisfying $|[K]\setminus S|=|{\mathcal{D}}_{poi}|$ if (e1) and (e2) hold.

Now, we define some functions:

Let $v_i\in\{-1,1\}$ for $i\in[K]$ and $S((v_i)_{i=1}^K)=\{i\,\|\,i\in[K],v_i<0\}$. Given $K$ samples $\{(x_i,y_i)\}_{i=1}^K$ with $y_i\neq l_p$, we define $S_i(\{(x_i,y_i)\}_{i=1}^K,S((v_i)_{i=1}^K))=1$ if there is a poisoned set ${\mathcal{D}}_P$ that is a nice-inclusion of $\{(x_i,y_i)\}_{i=1}^K$ and $S((v_i)_{i=1}^K)$, and $S_i(\{(x_i,y_i)\}_{i=1}^K,S((v_i)_{i=1}^K))=0$ otherwise. Then we have the following results.

Result one: If $S_i(\{(x_i,y_i)\}_{i=1}^K,S((v_i)_{i=1}^K))=1$ and $y_i\neq l_p$, then $\sup_{h\in H}\sum_{i\in[K]}v_ih(x_i,y_i)\geq\sum_{i=1}^K{\mathbf{1}}(v_i>0)-N{\mathbf{\varepsilon}}$.

Let 𝒟P{\mathcal{D}}_{P} be a nice-inclusion of (xi,yi)i=1K,S((vi)i=1K){(x_{i},y_{i})}_{i=1}^{K},S((v_{i})_{i=1}^{K}). Then

εminfH(x,y)𝒟Ph(x,y)|𝒟P|(by(e3))minfH(iS((vi)i=1K)h(xi,yi))/|𝒟P|+(i[K]/S((vi)i=1K)h(xi,lp))/|𝒟P|(by(e1,e2))minfH(iS((vi)i=1K)h(xi,yi))/|𝒟P|+(i[K]/S((vi)i=1K)1h(xi,yi))/|𝒟P|(byyilp)=minfH(|[K]/S((vi)i=1K)|i[K]vih(xi,yi))/|𝒟P|=(i=1K𝟏(vi>0)supfHi[K]vih(xi,yi))/|𝒟P|=(i=1K𝟏(vi>0)supfHi[K]vih(xi,yi))/N.\begin{array}[]{ll}&{\mathbf{\varepsilon}}\\ \geq&\min_{f\in H}\sum_{(x,y)\in{\mathcal{D}}_{P}}\frac{h(x,y)}{|{\mathcal{D}}_{P}|}(by\ (e3))\\ \geq&\min_{f\in H}(\sum_{i\in S((v_{i})_{i=1}^{K})}h(x_{i},y_{i}))/|{\mathcal{D}}_{P}|+(\sum_{i\in[K]/S((v_{i})_{i=1}^{K})}h(x_{i},l_{p}))/|{\mathcal{D}}_{P}|(by\ (e1,e2))\\ \geq&\min_{f\in H}(\sum_{i\in S((v_{i})_{i=1}^{K})}h(x_{i},y_{i}))/|{\mathcal{D}}_{P}|+(\sum_{i\in[K]/S((v_{i})_{i=1}^{K})}1-h(x_{i},y_{i}))/|{\mathcal{D}}_{P}|(by\ y_{i}\neq l_{p})\\ =&\min_{f\in H}(|[K]/S((v_{i})_{i=1}^{K})|-\sum_{i\in[K]}v_{i}h(x_{i},y_{i}))/|{\mathcal{D}}_{P}|\\ =&(\sum_{i=1}^{K}{\mathbf{1}}(v_{i}>0)-\sup_{f\in H}\sum_{i\in[K]}v_{i}h(x_{i},y_{i}))/|{\mathcal{D}}_{P}|\\ =&(\sum_{i=1}^{K}{\mathbf{1}}(v_{i}>0)-\sup_{f\in H}\sum_{i\in[K]}v_{i}h(x_{i},y_{i}))/N.\\ \end{array}

The result is proved.

Result Two: In order to give a lower bound of the Rademacher complexity ${\hbox{\rm{Rad}}}^{{\mathcal{D}}^l}_K(H)$, where ${\mathcal{D}}^l$ denotes ${\mathcal{D}}_S^{\neq l_p}$, it suffices to give an upper bound of ${\mathbb{E}}_{(x_i,y_i)\sim{\mathcal{D}}_S^{\neq l_p}}[{\mathbf{1}}(S_i(\{(x_i,y_i)\}_{i=1}^K,S((\sigma_i)_{i=1}^K))\neq1)]$ for each $(\sigma_i)_{i=1}^K\in\{-1,1\}^K$.

Let σi\sigma_{i} be Rademacher random variables, that is P(σi=1)=P(σi=1)=0.5P(\sigma_{i}=1)=P(\sigma_{i}=-1)=0.5. Then, by the definition of Rademacher complexity, we have

RadK𝒟l(H)=𝔼(xi,yi)𝒟Slp[𝔼σi[supfHi=1Kσih(xi,yi)K]]=𝔼(xi,yi)𝒟Slp[σisupfHi=1Kσih(xi,yi)2KK]𝔼(xi,yi)𝒟Slp[σi𝟏(Si({(xi,yi)}i=1K,S((σi)i=1K))=1)i=1K𝟏(σi>0)Nε2KK𝟏(Si({(xi,yi)}i=1K,S((σi)i=1K))1)12K]𝔼(xi,yi)𝒟Slp[σii=1K𝟏(σi>0)Nε2KK𝟏(Si({(xi,yi)}i=1K,S((σi)i=1K))1)2KNε2KK]=0.5Nε/K𝔼(xi,yi)𝒟Slp[σi𝟏(Si({(xi,yi)}i=1K,S((σi)i=1K))1)2KNε2KK].\begin{array}[]{ll}&{\hbox{\rm{Rad}}}^{{\mathcal{D}}^{l}}_{K}(H)\\ =&{\mathbb{E}}_{(x_{i},y_{i})\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[{\mathbb{E}}_{\sigma_{i}}[\sup_{f\in H}\frac{\sum_{i=1}^{K}\sigma_{i}h(x_{i},y_{i})}{K}]]\\ =&{\mathbb{E}}_{(x_{i},y_{i})\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[\sum_{\sigma_{i}}\sup_{f\in H}\frac{\sum_{i=1}^{K}\sigma_{i}h(x_{i},y_{i})}{2^{K}K}]\\ \geq&{\mathbb{E}}_{(x_{i},y_{i})\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[\sum_{\sigma_{i}}{\mathbf{1}}(S_{i}(\{(x_{i},y_{i})\}_{i=1}^{K},S((\sigma_{i})_{i=1}^{K}))=1)\frac{\sum_{i=1}^{K}{\mathbf{1}}(\sigma_{i}>0)-N{\mathbf{\varepsilon}}}{2^{K}K}\\ &-{\mathbf{1}}(S_{i}(\{(x_{i},y_{i})\}_{i=1}^{K},S((\sigma_{i})_{i=1}^{K}))\neq 1)\frac{1}{2^{K}}]\\ \geq&{\mathbb{E}}_{(x_{i},y_{i})\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[\sum_{\sigma_{i}}\frac{\sum_{i=1}^{K}{\mathbf{1}}(\sigma_{i}>0)-N{\mathbf{\varepsilon}}}{2^{K}K}-{\mathbf{1}}(S_{i}(\{(x_{i},y_{i})\}_{i=1}^{K},S((\sigma_{i})_{i=1}^{K}))\neq 1)\frac{2K-N{\mathbf{\varepsilon}}}{2^{K}K}]\\ =&0.5-N{\mathbf{\varepsilon}}/K-{\mathbb{E}}_{(x_{i},y_{i})\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[\sum_{\sigma_{i}}{\mathbf{1}}(S_{i}(\{(x_{i},y_{i})\}_{i=1}^{K},S((\sigma_{i})_{i=1}^{K}))\neq 1)\frac{2K-N{\mathbf{\varepsilon}}}{2^{K}K}].\end{array}

The first inequality uses Result one. So, if we want to give a lower bound of RadK𝒟l(H){\hbox{\rm{Rad}}}^{{\mathcal{D}}^{l}}_{K}(H), we just need to give an upper bound of 𝔼(xi,yi)𝒟Slp[σi𝟏(Si({(xi,yi)}i=1K,S((σi)i=1K))1)]{\mathbb{E}}_{(x_{i},y_{i})\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[\sum_{\sigma_{i}}{\mathbf{1}}(S_{i}(\{(x_{i},y_{i})\}_{i=1}^{K},S((\sigma_{i})_{i=1}^{K}))\neq 1)]. Furthermore, we just need to consider 𝔼(xi,yi)𝒟Slp[𝟏(Si({(xi,yi)}i=1K,S((σi)i=1K))1)]{\mathbb{E}}_{(x_{i},y_{i})\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[{\mathbf{1}}(S_{i}(\{(x_{i},y_{i})\}_{i=1}^{K},S((\sigma_{i})_{i=1}^{K}))\neq 1)] for each (σi)i=1K(\sigma_{i})_{i=1}^{K}. Result two is proved.

Result Three: We now prove that ${\mathbb{E}}_{(x_i,y_i)\sim{\mathcal{D}}_S^{\neq l_p}}[{\mathbf{1}}(S_i(\{(x_i,y_i)\}_{i=1}^K,S((v_i)_{i=1}^K))\neq1)]\leq\frac{\delta\alpha}{\eta^N\tau^K}$ for any $(v_i)_{i=1}^K\in\{-1,1\}^K$.

For a given (vi)i=1K{1,1}K(v_{i})_{i=1}^{K}\in\{-1,1\}^{K}, let set C(vi)i=1K={{(xi,yi)}i=1K|Si({(xi,yi)}i=1K,S((vi)i=1K))1,yilp}C_{(v_{i})_{i=1}^{K}}=\{\{(x_{i},y_{i})\}_{i=1}^{K}|S_{i}(\{(x_{i},y_{i})\}_{i=1}^{K},S((v_{i})_{i=1}^{K}))\neq 1,y_{i}\neq l_{p}\}, then 𝔼(xi,yi)𝒟Slp[𝟏(Si({(xi,yi)}i=1K,S((vi)i=1K))1)]=𝔼(xi,yi)𝒟Slp[𝟏({(xi,yi)}i=1KC(vi)i=1K)]{\mathbb{E}}_{(x_{i},y_{i})\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[{\mathbf{1}}(S_{i}(\{(x_{i},y_{i})\}_{i=1}^{K},S((v_{i})_{i=1}^{K}))\neq 1)]={\mathbb{E}}_{(x_{i},y_{i})\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[{\mathbf{1}}(\{(x_{i},y_{i})\}_{i=1}^{K}\in C_{(v_{i})_{i=1}^{K}})].

For $(v_i)_{i=1}^K\in\{-1,1\}^K$ and $\{(x_i,y_i)\}_{i=1}^K$, let ${\mathbb{E}}_{(v_i)_{i=1}^K}^{\{(x_i,y_i)\}_{i=1}^K}$ be the set of poisoned training sets ${\mathcal{D}}_P$ that are common-nice-inclusions of $\{(x_i,y_i)\}_{i=1}^K$ and $S((v_i)_{i=1}^K)$. It is easy to see that:
(r1): 𝔼(vi)i=1K((xi,yi))i=1K𝔼(vi)i=1K((xi,yi))i=1K=ϕ{\mathbb{E}}_{(v_{i})_{i=1}^{K}}^{((x_{i},y_{i}))_{i=1}^{K}}\cap{\mathbb{E}}_{(v^{\prime}_{i})_{i=1}^{K}}^{((x^{\prime}_{i},y^{\prime}_{i}))_{i=1}^{K}}=\phi when viviv_{i}\neq v^{\prime}_{i} for some i[K]i\in[K] or xixix_{i}\neq x^{\prime}_{i} for some i[K]i\in[K].
(r2): If 𝒟P𝔼(vi)i=1K{(xi,yi)}i=1K{\mathcal{D}}_{P}\in{\mathbb{E}}_{(v_{i})_{i=1}^{K}}^{\{(x_{i},y_{i})\}_{i=1}^{K}} satisfies (e3), then Si({(xi,yi)}i=1K,S((vi)i=1K))=1S_{i}(\{(x_{i},y_{i})\}_{i=1}^{K},S((v_{i})_{i=1}^{K}))=1.

So, by (r2), if $\{(x_i,y_i)\}_{i=1}^K\in C_{(v_i)_{i=1}^K}$, then no ${\mathcal{D}}_P\in{\mathbb{E}}_{(v_i)_{i=1}^K}^{\{(x_i,y_i)\}_{i=1}^K}$ can satisfy (e3). Let $B$ be the set of all ${\mathcal{D}}_P$ that do not satisfy (e3). Then, if $\{(x_i,y_i)\}_{i=1}^K\in C_{(v_i)_{i=1}^K}$, we must have ${\mathbb{E}}_{(v_i)_{i=1}^K}^{\{(x_i,y_i)\}_{i=1}^K}\subset B$.

Now we prove the following two results:

Result S1: 𝒟P𝟏(D{(xi,yi)}i=1KC(vi)i=1K𝔼(vi)i=1K{(xi,yi)}i=1K)dD<δ\int_{{\mathcal{D}}_{P}}{\mathbf{1}}(D\in\bigcup_{\{(x_{i},y_{i})\}_{i=1}^{K}\in C_{(v_{i})_{i=1}^{K}}}{\mathbb{E}}_{(v_{i})_{i=1}^{K}}^{\{(x_{i},y_{i})\}_{i=1}^{K}}){\hbox{\rm{d}}}D<\delta.

By (r1) and 𝔼(vi)i=1K(xi,yi)i=1KB{\mathbb{E}}_{(v_{i})_{i=1}^{K}}^{{(x_{i},y_{i})}_{i=1}^{K}}\subset B for all {(xi,yi)}i=1KC(vi)i=1K\{(x_{i},y_{i})\}_{i=1}^{K}\in C_{(v_{i})_{i=1}^{K}}, we have that:

𝟏(DB)𝟏(𝒟{(xi,yi)}i=1KC(vi)i=1K𝔼(vi)i=1K{(xi,yi)}i=1K){\mathbf{1}}(D\in B)\geq{\mathbf{1}}({\mathcal{D}}\in\bigcup_{\{(x_{i},y_{i})\}_{i=1}^{K}\in C_{(v_{i})_{i=1}^{K}}}{\mathbb{E}}_{(v_{i})_{i=1}^{K}}^{\{(x_{i},y_{i})\}_{i=1}^{K}})

So $\int_{{\mathcal{D}}_P}{\mathbf{1}}(D\in\bigcup_{\{(x_i,y_i)\}_{i=1}^K\in C_{(v_i)_{i=1}^K}}{\mathbb{E}}_{(v_i)_{i=1}^K}^{\{(x_i,y_i)\}_{i=1}^K})\,{\hbox{\rm{d}}}D\leq\int_{{\mathcal{D}}_P}{\mathbf{1}}(D\in B)\,{\hbox{\rm{d}}}D\leq\delta$, where the last inequality uses assumption (2) of the lemma.

Result S2: 𝒟P𝟏(D{(xi,yi)}i=1KC(vi)i=1K𝔼(vi)i=1K{(xi,yi)}i=1K)dD𝔼(xi,yi)𝒟Slp[𝟏((xi,yi)C(vi)i=1K)]ηNτK/α\int_{{\mathcal{D}}_{P}}{\mathbf{1}}(D\in\bigcup_{\{(x_{i},y_{i})\}_{i=1}^{K}\in C_{(v_{i})_{i=1}^{K}}}{\mathbb{E}}_{(v_{i})_{i=1}^{K}}^{\{(x_{i},y_{i})\}_{i=1}^{K}}){\hbox{\rm{d}}}D\geq{\mathbb{E}}_{(x_{i},y_{i})\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[{\mathbf{1}}((x_{i},y_{i})\in C_{(v_{i})_{i=1}^{K}})]\eta^{N}\tau^{K}/\alpha.

Consider the definitions of $F_{clean}$ and $F_{poison}$ in (e1) and (e2). When ${\mathcal{D}}_P\in{\mathbb{E}}_{(v_i)_{i=1}^K}^{\{(x_i,y_i)\}_{i=1}^K}$, the first $\sum_{i=1}^K{\mathbf{1}}(v_i<0)$ samples without label $l_p$ in ${\mathcal{D}}_{\rm tr}$ must be the $(x_{i_k},y_{i_k})$ with $v_{i_k}=-1$, and ${\mathcal{D}}_{poi}$ must consist of the $(x_{i_k},y_{i_k})$ with $v_{i_k}=1$. Let $N_0=\sum_{i=1}^K{\mathbf{1}}(v_i<0)$; then

𝒟P𝟏(D{(xi,yi)}i=1KC(vi)i=1K𝔼(vi)i=1K{(xi,yi)}i=1K)dD=(𝒟Slp)N0(𝒟poi)KN0𝟏({(xi,yi)}C(vi)i=1K)(q=0N𝟏([qα]=KN0)CNqηq(1η)Nq)d(xi,yi)(𝒟Slp)N0(𝒟poi)KN0𝟏({(xi,yi)}C(vi)i=1K)(q=0N𝟏([qα]=KN0)ηN)d(xi,yi)(𝒟Slp)N0(𝒟poi)KN0𝟏({(xi,yi)}C(vi)i=1K)(ηN/α)d(xi,yi)(𝒟Slp)K𝟏((xi,yi)C(vi)i=1K)ηNτK/αd(xi,yi)=𝔼(xi,yi)𝒟Slp[𝟏((xi,yi)C(vi)i=1K)]ηNτK/α.\begin{array}[]{ll}&\int_{{\mathcal{D}}_{P}}{\mathbf{1}}(D\in\bigcup_{\{(x_{i},y_{i})\}_{i=1}^{K}\in C_{(v_{i})_{i=1}^{K}}}{\mathbb{E}}_{(v_{i})_{i=1}^{K}}^{\{(x_{i},y_{i})\}_{i=1}^{K}}){\hbox{\rm{d}}}D\\ =&\int_{({\mathcal{D}}_{S}^{\neq l_{p}})^{N_{0}}({\mathcal{D}}_{poi})^{K-N_{0}}}{\mathbf{1}}(\{(x_{i},y_{i})\}\in C_{(v_{i})_{i=1}^{K}})(\sum_{q=0}^{N}{\mathbf{1}}([q\alpha]=K-N_{0})C_{N}^{q}\eta^{q}(1-\eta)^{N-q}){\hbox{\rm{d}}}(x_{i},y_{i})\\ \geq&\int_{({\mathcal{D}}_{S}^{\neq l_{p}})^{N_{0}}({\mathcal{D}}_{poi})^{K-N_{0}}}{\mathbf{1}}(\{(x_{i},y_{i})\}\in C_{(v_{i})_{i=1}^{K}})(\sum_{q=0}^{N}{\mathbf{1}}([q\alpha]=K-N_{0})\eta^{N}){\hbox{\rm{d}}}(x_{i},y_{i})\\ \geq&\int_{({\mathcal{D}}_{S}^{\neq l_{p}})^{N_{0}}({\mathcal{D}}_{poi})^{K-N_{0}}}{\mathbf{1}}(\{(x_{i},y_{i})\}\in C_{(v_{i})_{i=1}^{K}})(\eta^{N}/\alpha){\hbox{\rm{d}}}(x_{i},y_{i})\\ \geq&\int_{({\mathcal{D}}_{S}^{\neq l_{p}})^{K}}{\mathbf{1}}((x_{i},y_{i})\in C_{(v_{i})_{i=1}^{K}})\eta^{N}\tau^{K}/\alpha{\hbox{\rm{d}}}(x_{i},y_{i})\\ =&{\mathbb{E}}_{(x_{i},y_{i})\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[{\mathbf{1}}((x_{i},y_{i})\in C_{(v_{i})_{i=1}^{K}})]\eta^{N}\tau^{K}/\alpha.\end{array}

The first inequality uses $\eta\leq0.5$ and $C_N^q\geq1$. The second inequality uses the fact that there are at least $1/\alpha$ values of $q\in[N]$ such that $[q\alpha]=K-N_0$. The last inequality uses Lemma B.3 and assumption (1) of the lemma. This proves Result S2.

Finally, by Result S1 and Result S2, we have (ηNτK/α)𝔼(xi,yi)𝒟Slp[𝟏({(xi,yi)}i=1KC(vi)i=1K)]𝒟P𝟏(D{(xi,yi)}i=1KC(vi)i=1K𝔼(vi)i=1K{(xi,yi)}i=1K)dDδ(\eta^{N}\tau^{K}/\alpha){\mathbb{E}}_{(x_{i},y_{i})\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[{\mathbf{1}}(\{(x_{i},y_{i})\}_{i=1}^{K}\in C_{(v_{i})_{i=1}^{K}})]\leq\int_{{\mathcal{D}}_{P}}{\mathbf{1}}(D\in\bigcup_{\{(x_{i},y_{i})\}_{i=1}^{K}\in C_{(v_{i})_{i=1}^{K}}}{\mathbb{E}}_{(v_{i})_{i=1}^{K}}^{\{(x_{i},y_{i})\}_{i=1}^{K}}){\hbox{\rm{d}}}D\leq\delta, that is, 𝔼(xi,yi)𝒟Slp[𝟏({(xi,yi)}i=1KC(vi)i=1K)]αδ/(ηNτK){\mathbb{E}}_{(x_{i},y_{i})\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[{\mathbf{1}}(\{(x_{i},y_{i})\}_{i=1}^{K}\in C_{(v_{i})_{i=1}^{K}})]\leq\alpha\delta/(\eta^{N}\tau^{K}).

We now prove the lemma. Using Results two and three, we have

RadK𝒟Slp(H)0.5Nε/K𝔼(xi,yi)𝒟Slp[σi𝟏(Si(((xi,yi))i=1K,S((σi)i=1K))1)2KNε2KK]0.5Nε/K(2Nε/K)δαηNτK.\begin{array}[]{ll}&{\hbox{\rm{Rad}}}^{{\mathcal{D}}_{S}^{\neq l_{p}}}_{K}(H)\\ \geq&0.5-N{\mathbf{\varepsilon}}/K-{\mathbb{E}}_{(x_{i},y_{i})\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[\sum_{\sigma_{i}}{\mathbf{1}}(S_{i}(((x_{i},y_{i}))_{i=1}^{K},S((\sigma_{i})_{i=1}^{K}))\neq 1)\frac{2K-N{\mathbf{\varepsilon}}}{2^{K}K}]\\ \geq&0.5-N{\mathbf{\varepsilon}}/K-\frac{(2-N{\mathbf{\varepsilon}}/K)\delta\alpha}{\eta^{N}\tau^{K}}.\end{array}

Since RadMD(H)NMRadND(H){\hbox{\rm{Rad}}}^{D}_{M}(H)\leq\frac{N}{M}{\hbox{\rm{Rad}}}^{D}_{N}(H) for any MNM\leq N and distribution DD, the lemma is proved. ∎

Now we use Lemma B.4 to prove Proposition B.2:

Proof.

We argue by contradiction. Assume that with probability at least 1-\delta, we have {\mathbb{E}}_{(x,y)\in{\mathcal{D}}_{P}}[h(x,y)]\leq\frac{0.5-2\delta\alpha/Q-V\alpha}{\alpha-\delta/Q}, and denote the right-hand side by \epsilon.

By Lemma B.4 with K=N\alpha, we have {\hbox{\rm{Rad}}}_{N}^{{\mathcal{D}}_{\mathcal{S}}^{\neq l_{p}}}\geq(0.5-N\epsilon/K-\frac{(2-N\epsilon/K)\delta\alpha}{\eta^{N}\tau^{K}})/\alpha. Substituting the value of \epsilon, we obtain

RadN𝒟𝒮lp(0.5Nϵ/K(2Nϵ/K)δαηNτK)/α=1/α(0.52δαQϵ(αδ/Q))=1/α(0.52δαQ0.52δα/QVααδ/Q(αδ/Q))=1/α(0.52δαQ(0.52δα/QVα))=1/α(Vα)=V.\begin{array}[]{ll}&{\hbox{\rm{Rad}}}_{N}^{{\mathcal{D}}_{\mathcal{S}}^{\neq l_{p}}}\\ \geq&(0.5-N\epsilon/K-\frac{(2-N\epsilon/K)\delta\alpha}{\eta^{N}\tau^{K}})/\alpha\\ =&1/\alpha(0.5-\frac{2\delta\alpha}{Q}-\epsilon(\alpha-\delta/Q))\\ =&1/\alpha(0.5-\frac{2\delta\alpha}{Q}-\frac{0.5-2\delta\alpha/Q-V\alpha}{\alpha-\delta/Q}(\alpha-\delta/Q))\\ =&1/\alpha(0.5-\frac{2\delta\alpha}{Q}-(0.5-2\delta\alpha/Q-V\alpha))\\ =&1/\alpha(V\alpha)\\ =&V.\end{array}

So {\hbox{\rm{Rad}}}_{N}^{{\mathcal{D}}_{\mathcal{S}}^{\neq l_{p}}}\geq V, which contradicts condition (2) in Proposition B.2, and the proposition is proved. ∎

Appendix C Optimality of the Generalization Bound

In this section, we show that for a general hypothesis space and data distribution, the generalization gap between the empirical error and the population error cannot be smaller than O(\frac{1}{\sqrt{N}}), as mentioned in Remark 4.4.

Proposition C.1.

Let m=2m=2 and the data distribution 𝒟𝒮{\mathcal{D}}_{{\mathcal{S}}} satisfy P(x,y)𝒟S(y=1)=P(x,y)𝒟S(y=0)=0.5P_{(x,y)\sim{\mathcal{D}}_{S}}(y=1)=P_{(x,y)\sim{\mathcal{D}}_{S}}(y=0)=0.5. Let ={L((x),y)}{\mathcal{H}}=\{L({\mathcal{F}}(x),y)\} be the hypothesis space, L((x),y)=𝟏(^(x)y)L({\mathcal{F}}(x),y)={\mathbf{1}}(\widehat{\mathcal{F}}(x)\neq y) the loss function, and 0{\mathcal{F}}_{0} a neural network classifying ^0(x)=0\widehat{{\mathcal{F}}}_{0}(x)=0 for (x,y)𝒟𝒮(x,y)\sim{\mathcal{D}}_{{\mathcal{S}}}. Then for any c>0c>0, k>0.5k>0.5, and 𝒟tr{\mathcal{D}}_{{{\rm{tr}}}} i.i.d. sampled from 𝒟𝒮{\mathcal{D}}_{{\mathcal{S}}} with |𝒟tr|=N|{\mathcal{D}}_{{{\rm{tr}}}}|=N, we have

(|𝔼(x,y)𝒟S[L(0(x),y)]𝔼(x,y)𝒟tr[L(0(x),y)]|<cNk)=O(cN0.5k).{\mathbb{P}}(|{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[L({\mathcal{F}}_{0}(x),y)]-{\mathbb{E}}_{(x,y)\in{\mathcal{D}}_{{{\rm{tr}}}}}[L({\mathcal{F}}_{0}(x),y)]|<\frac{c}{N^{k}})=O({c}{N^{0.5-k}}).
Proof.

First, we have 𝔼(x,y)𝒟S[L(0(x),y)]=𝔼(x,y)𝒟S[𝟏(y=1)]=0.5{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[L({\mathcal{F}}_{0}(x),y)]={\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[{\mathbf{1}}(y=1)]=0.5. Then we have 𝔼(x,y)𝒟tr[L(0(x),y)]=𝔼(x,y)𝒟tr[𝟏(y=1)]=N1/N{\mathbb{E}}_{(x,y)\in{\mathcal{D}}_{{{\rm{tr}}}}}[L({\mathcal{F}}_{0}(x),y)]={\mathbb{E}}_{(x,y)\in{\mathcal{D}}_{{{\rm{tr}}}}}[{\mathbf{1}}(y=1)]=N_{1}/N, where N1N_{1} is the number of xx with label 1 in 𝒟tr{\mathcal{D}}_{{{\rm{tr}}}}. So, (|𝔼(x,y)𝒟S[L(0(x),y)]𝔼(x,y)𝒟tr[L(0(x),y)]|<cNk)=(|0.5N1/N|<cNk){\mathbb{P}}(|{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[L({\mathcal{F}}_{0}(x),y)]-{\mathbb{E}}_{(x,y)\in{\mathcal{D}}_{{{\rm{tr}}}}}[L({\mathcal{F}}_{0}(x),y)]|<\frac{c}{N^{k}})={\mathbb{P}}(|0.5-N_{1}/N|<\frac{c}{N^{k}}).

To estimate (|0.5N1/N|<cNk){\mathbb{P}}(|0.5-N_{1}/N|<\frac{c}{N^{k}}), we only need to calculate the probability of N1(0.5NcN1k,0.5N+cN1k)N_{1}\in(0.5N-cN^{1-k},0.5N+cN^{1-k}). Since 𝒟tr{\mathcal{D}}_{{{\rm{tr}}}} is i.i.d. sampled from 𝒟𝒮{\mathcal{D}}_{{\mathcal{S}}}, a sample labeled 1 is selected with probability 0.5. Thus,

(N1(0.5NcN1k,0.5N+cN1k))i=0.5NcN1k0.5N+cN1kCNi0.5N2cN1kCNN/20.5N.\begin{array}[]{ll}&{\mathbb{P}}(N_{1}\in(0.5N-cN^{1-k},0.5N+cN^{1-k}))\\ \leq&\sum_{i=0.5N-cN^{1-k}}^{0.5N+cN^{1-k}}C_{N}^{i}0.5^{N}\\ \leq&2cN^{1-k}C_{N}^{N/2}0.5^{N}.\\ \end{array}

When N\to\infty, using Stirling's approximation to estimate C_{N}^{N/2}, we have

(N1(0.5NcN1k,0.5N+cN1k))2cN1kCNN/20.5N=2cN1k0.5NO(2πN(N/e)NπN(N/(2e))N)=O(cN0.5k).\begin{array}[]{ll}&{\mathbb{P}}(N_{1}\in(0.5N-cN^{1-k},0.5N+cN^{1-k}))\\ \leq&2cN^{1-k}C_{N}^{N/2}0.5^{N}\\ =&2cN^{1-k}0.5^{N}O(\frac{\sqrt{2\pi N}(N/e)^{N}}{\pi N(N/(2e))^{N}})\\ =&O(cN^{0.5-k}).\end{array}

The proposition is proved. ∎

It is easy to see that O(cN^{0.5-k})\to 0 when N\to\infty, so by Proposition C.1 the generalization gap cannot be smaller than O(\frac{1}{\sqrt{N}}). Together with Theorem A.1, this shows the optimality of the O(\frac{1}{\sqrt{N}}) generalization gap for a clean dataset.

This also applies to the generalization bound under poison attacks, because the proof of Theorem 4.1 needs the generalization bound of Theorem A.1 for the dataset.
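As a quick numerical illustration of Proposition C.1, the following sketch (assuming NumPy is available; the constants c and k are illustrative choices, not values from the paper) estimates {\mathbb{P}}(|0.5-N_{1}/N|<c/N^{k}) by sampling N_{1}\sim\mathrm{Binomial}(N,0.5) and compares it with the predicted rate cN^{0.5-k}; the two agree up to a constant factor.

```python
import numpy as np

rng = np.random.default_rng(0)
c, k = 1.0, 0.75          # illustrative constants with k > 0.5
trials = 200_000          # Monte Carlo repetitions per sample size N

for N in [100, 1_000, 10_000]:
    # N1 is the number of label-1 samples among N i.i.d. draws with P(y=1)=0.5.
    N1 = rng.binomial(N, 0.5, size=trials)
    estimated = np.mean(np.abs(0.5 - N1 / N) < c / N**k)
    predicted = c * N**(0.5 - k)
    print(f"N={N:>6}: estimated={estimated:.4f}, c*N^(0.5-k)={predicted:.4f}")
```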

Appendix D Proof of Theorem 4.5

We provide a more intuitive explanation of how to estimate the poison generalization bound in Theorem 4.5, which is the core of our theorem.

Based on research on indiscriminate poisoning, neural networks tend to prioritize learning simple features, i.e., shortcuts. Therefore, we try to make the trigger act as a shortcut (as stated in condition (c3) in Theorem 4.5 and Proposition 4.10).

However, only a few samples are poisoned in a backdoor attack, which differs from the setting of indiscriminate poisoning, so using a shortcut alone cannot establish the bound. Therefore, we aim to disrupt the original features of the poisoned images, so that classifying the poisoned images and classifying the clean images become two independent tasks for the network (as stated in condition (c1) in Theorem 4.5). In this way, once the network has to fit the poisoned data, the clean part of the dataset is of no help, so the network must learn the feature in the poisoned part of the data, and the shortcut is therefore learned.

Finally, if the shortcuts are similar across different images, the shortcut learned on a small portion of the data generalizes to all the data (as stated in condition (c2) in Theorem 4.5).
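To make the above conditions concrete, the following PyTorch sketch builds a candidate trigger as the sum of a per-sample adversarial perturbation (intended to suppress the original features, cf. (c1)) and a fixed, sample-independent pattern (so that the trigger is similar across samples, cf. (c2), and can serve as a shortcut, cf. (c3)). The function make_trigger, the surrogate model, and all hyper-parameters are hypothetical illustrations and not the exact procedure of the proposed attack.

```python
import torch
import torch.nn.functional as F

def make_trigger(model, x, y, shared_pattern, eps=8 / 255, steps=10, step_size=2 / 255):
    """Sketch: per-sample adversarial noise plus a fixed shared pattern (the shortcut)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)   # push x away from its true class
        loss.backward()
        with torch.no_grad():
            delta += step_size * delta.grad.sign()     # untargeted FGSM-style ascent step
            delta.clamp_(-eps, eps)                    # keep the adversarial part bounded
        delta.grad.zero_()
    # Mix the adversarial noise with the sample-independent pattern and
    # keep the overall trigger within the perturbation budget.
    return (0.5 * delta.detach() + 0.5 * shared_pattern).clamp(-eps, eps)
```

In the clean-label setting, such a trigger would be added only to training images that already carry the target label l_{p}, and the same shared pattern would be attached to test images at inference time.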

Following this idea, we prove a more general form of Theorem 4.5, stated below; Theorem 4.5 is an easy corollary of it.

Theorem D.1.

Use the notation in Section 3. Let N=|𝒟tr|N=|{\mathcal{D}}_{{{\rm{tr}}}}|. For any two hypothesis spaces W,D{\mathcal{H}}\subset{\mathcal{H}}_{W,D} and FW,DF\subset{\mathcal{H}}_{W,D}, if 𝒫(x){\mathcal{P}}(x) satisfies the following conditions for some ϵ>0,τ>0,λ1\epsilon>0,\tau>0,\lambda\geq 1:
(c1): 𝔼(x,y)𝒟𝒮lp[fy(x+𝒫(x))]ϵ{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}^{l_{p}}_{{\mathcal{S}}}}[f_{y}(x+{\mathcal{P}}(x))]\leq\epsilon for all fFf\in F,
(c2): (x,y)𝒟𝒮(𝒫(x)A|ylp)λ{\mathbb{P}}_{(x,y)\sim{\mathcal{D}}_{{\mathcal{S}}}}({\mathcal{P}}(x)\in A|y\neq l_{p})\leq\lambda (x,y)𝒟𝒮{\mathbb{P}}_{(x,y)\sim{\mathcal{D}}_{{\mathcal{S}}}} (𝒫(x)A|y=lp)({\mathcal{P}}(x)\in A|y=l_{p}) for any set AA,
(c3): some hHh\in H satisfies that: fF\exists f\in F such that 𝔼x𝒟𝒮[|(hf)lP(𝒫(x))(hf)lp(x+𝒫(x))|]τ{\mathbb{E}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}}}[|(h-f)_{l_{P}}({\mathcal{P}}(x))-(h-f)_{l_{p}}(x+{\mathcal{P}}(x))|]\leq\tau, where (hf)lP(x)=hlP(x)flp(x)(h-f)_{l_{P}}(x)=h_{l_{P}}(x)-f_{l_{p}}(x),
then with probability at least 1δO(1/N)1-\delta-O(1/N), the following inequality holds for all hHh\in H satisfying (c3):

P(h,𝒟𝒮)λO(1α(𝔼(x,y)𝒟P[LCE(h(x),y)]+RadN𝒟Slp(W,D,1))+ln(1/δ)Nα+ϵ+τ+λ1λ).\begin{array}[]{ll}&{\mathcal{E}}_{P}(h,{\mathcal{D}}_{{\mathcal{S}}})\leq\lambda O(\frac{1}{\alpha}({\mathbb{E}}_{(x,y)\in{\mathcal{D}}_{P}}[L_{CE}(h(x),y)]+{\hbox{\rm{Rad}}}_{N}^{{\mathcal{D}}_{S}^{l_{p}}}({\mathcal{H}}_{W,D,1}))+\sqrt{\frac{\ln(1/\delta)}{N\alpha}}+\epsilon+\tau+\frac{\lambda-1}{\lambda}).\end{array} (11)

It is easy to see that we just need to take the hypothesis spaces H,F in Theorem D.1 to be H=\{{\mathcal{F}}(x)\} and F=\{{\mathcal{G}}(x)\} (i.e., each hypothesis space contains only one network). Then Theorem D.1 naturally reduces to Theorem 4.5.

D.1 Proof of Theorem D.2

We first give the following theorem, which is a generalization bound for a more general hypothesis space.

Theorem D.2.

For any two hypothesis spaces H={h(x,y)𝒮×[m][0,1]},F={f(x,y)𝒮×[m][0,1]}H=\{h(x,y)\in{\mathcal{S}}\times[m]\to[0,1]\},F=\{f(x,y)\in{\mathcal{S}}\times[m]\to[0,1]\}, if 𝒫(x){\mathcal{P}}(x) satisfies the following conditions for some ϵ>0,τ>0,λ1\epsilon>0,\tau>0,\lambda\geq 1:
(1) 𝔼(x,y)𝒟𝒮lp[f(x+𝒫(x),y)]1ϵ{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}^{l_{p}}_{{\mathcal{S}}}}[f(x+{\mathcal{P}}(x),y)]\geq 1-\epsilon for any fFf\in F,
(2) (x,y)𝒟Slp(𝒫(x)A)λ(x,y)𝒟𝒮lp(𝒫(x)A){\mathbb{P}}_{(x,y)\sim{\mathcal{D}}_{S}^{\neq l_{p}}}({\mathcal{P}}(x)\in A)\leq\lambda{\mathbb{P}}_{(x,y)\sim{\mathcal{D}}^{l_{p}}_{{\mathcal{S}}}}({\mathcal{P}}(x)\in A) for any set AA,
(3) Some hHh\in H satisfies that there exists an fFf\in F such that 𝔼x𝒟𝒮[|(hf)(𝒫(x),lp)(hf)(x+𝒫(x),lp)|]τ{\mathbb{E}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}}}[|(h-f)({\mathcal{P}}(x),l_{p})-(h-f)(x+{\mathcal{P}}(x),l_{p})|]\leq\tau, where (hf)(x)=h(x)f(x)(h-f)(x)=h(x)-f(x),
then with probability at least 1δ44η44η+ηN1-\delta-\frac{4-4\eta}{4-4\eta+\eta N}, for any hHh\in H satisfying (3), we have:

𝔼(x,y)𝒟S[h(x+𝒫(x),lp)]λ(2𝔼(x,y)𝒟Ph(x,y)/(αη)+2RadNαη/2𝒟Slp(H)+ln(2/δ)Nαη+τ/η+ϵ)+τ+(λ1)η.\begin{array}[]{l}{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[h(x+{\mathcal{P}}(x),l_{p})]\leq\lambda(2{\mathbb{E}}_{(x,y)\in{\mathcal{D}}_{P}}h(x,y)/(\alpha\eta)+2{\hbox{\rm{Rad}}}^{{\mathcal{D}}_{S}^{l_{p}}}_{N\alpha\eta/2}(H)+\sqrt{\frac{\ln(2/\delta)}{N\alpha\eta}}+\tau/\eta+\epsilon)+\tau+(\lambda-1)\eta.\end{array} (12)

We will use Theorem D.2 to prove Theorem D.1 in Section D.2. Now, we prove Theorem D.2 and give a detailed explanation of the “proof idea” of Theorem 4.5. Please note that, in the proof of Theorem D.2, the poison generalization error is 𝔼(x,y)𝒟S[h(x+𝒫(x),lp)]{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[h(x+{\mathcal{P}}(x),l_{p})].

Proof.

The proof is divided into two parts.

In Part one, we estimate the upper bound of {\mathbb{E}}_{(x,y)\sim{\mathcal{D}}^{l_{p}}_{\mathcal{S}}}[h(x+{\mathcal{P}}(x),l_{p})]. This part corresponds to the “firstly” step in the proof idea given under Theorem 4.5.

We first give three bounds in the following Results (e1), (e2), and (e3). Since (e1), (e2), (e3) are similar to (d1), (d2), (d3) in the proof of Theorem A.2, we omit the proofs.

Result (e1): Let the random variable XX be the number of samples with label lpl_{p} in 𝒟𝒫{\mathcal{D}}_{\mathcal{P}}. Then with probability at least 14(1η)44η+Nη1-\frac{4(1-\eta)}{4-4\eta+N\eta}, it holds XNη/2X\geq N\eta/2.

Result (e2): We now select N\eta\alpha/2 poisoned samples from {\mathcal{D}}_{P} uniformly at random and denote the resulting set by D_{l_{p}}. If the number of poisoned samples in {\mathcal{D}}_{P} is smaller than N\eta\alpha/2, we let D_{l_{p}} be the set of all poisoned samples.

Let {\mathcal{D}}_{l_{p}} obey the distribution {\mathcal{D}}_{{\mathcal{S}}2}. Then we have {\mathbb{P}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}2}}(x\in A|X\geq[N\eta/2])=P(X_{N\eta\alpha/2}^{{\mathcal{D}}^{l_{p}}_{{\mathcal{S}}}}\in A) for any set A, where X_{N\eta\alpha/2}^{{\mathcal{D}}^{l_{p}}_{{\mathcal{S}}}} is a set of N\eta\alpha/2 data i.i.d. sampled from the distribution {\mathcal{D}}^{l_{p}}_{{\mathcal{S}}}, with {\mathcal{P}}(x) added to each sample.

Result (e3): With probability 144η44η+Nηδ/21-\frac{4-4\eta}{4-4\eta+N\eta}-\delta/2, for any hHh\in H, we have

𝔼(x,y)𝒟𝒮lp[h(x+𝒫(x),lp)](x,y)𝒟poih(x,y)Nαη/2+2RadNαη/2𝒟Slp(H)+ln(2/δ)Nαη.{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}^{l_{p}}_{\mathcal{S}}}[h(x+{\mathcal{P}}(x),l_{p})]\leq\frac{\sum_{(x,y)\in{\mathcal{D}}_{poi}}h(x,y)}{N\alpha\eta/2}+2{\hbox{\rm{Rad}}}^{{\mathcal{D}}_{S}^{l_{p}}}_{N\alpha\eta/2}(H)+\sqrt{\frac{\ln(2/\delta)}{N\alpha\eta}}.

We use {\mathcal{D}}_{poi}=\{(x+{\mathcal{P}}(x),l_{p})\|(x,l_{p})\in{\mathcal{D}}_{sub}\}, Result (e1), and Result (e2) to prove Result (e3). The concrete steps are similar to those for Result (c3). {\mathcal{D}}_{poi} and {\mathcal{D}}_{sub} are defined in Section 3.2.

In Part two, we estimate the upper bound of {\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[h(x+{\mathcal{P}}(x),l_{p})].

Please note that when ylpy\neq l_{p}, h(x+𝒫(x),lp)h(x+{\mathcal{P}}(x),l_{p}) will not appear in the empirical error 𝔼(x,y)𝒟P[h(x+𝒫(x),y)]{\mathbb{E}}_{(x,y)\in{\mathcal{D}}_{P}}[h(x+{\mathcal{P}}(x),y)], so we need to use some other methods to estimate 𝔼(x,y)𝒟S[h(x+𝒫(x),lp)]{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[h(x+{\mathcal{P}}(x),l_{p})].

For any hHh\in H and fFf\in F, let ch,f(x)=h(x,lp)f(x,lp)c_{h,f}(x)=h(x,l_{p})-f(x,l_{p}). Let QQ be the upper bound mentioned in Result (e3) in Part one.

The upper bound of {\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[h(x+{\mathcal{P}}(x),l_{p})] will be given by the following Results (f1), (f2), (f3), and (f4). Note that (f1) and (f2) correspond to the “Secondly” step in the proof idea given under Theorem 4.5, and (f3) and (f4) correspond to the “Finally” step.

Result (f1): If fFf\in F and hHh\in H satisfy (3), then we have 𝔼(x,y)𝒟𝒮lp[ch,f(𝒫(x))]Q+τ/η(1ε){\mathbb{E}}_{(x,y)\sim{\mathcal{D}}^{l_{p}}_{\mathcal{S}}}[c_{h,f}({\mathcal{P}}(x))]\leq Q+\tau/\eta-(1-{\mathbf{\varepsilon}}).

By condition (3), we have:

τ𝔼(x,y)𝒟S[|ch,f(𝒫(x))ch,f(x+𝒫(x))|](use(3))η(𝔼(x,y)𝒟𝒮lp[|ch,f(𝒫(x))ch,f(x+𝒫(x))|])η(𝔼(x,y)𝒟𝒮lp[ch,f(𝒫(x))ch,f(x+𝒫(x))]).\begin{array}[]{ll}&\tau\\ \geq&{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[|c_{h,f}({\mathcal{P}}(x))-c_{h,f}(x+{\mathcal{P}}(x))|](use\ (3))\\ \geq&\eta({\mathbb{E}}_{(x,y)\sim{\mathcal{D}}^{l_{p}}_{\mathcal{S}}}[|c_{h,f}({\mathcal{P}}(x))-c_{h,f}(x+{\mathcal{P}}(x))|])\\ \geq&\eta({\mathbb{E}}_{(x,y)\sim{\mathcal{D}}^{l_{p}}_{\mathcal{S}}}[c_{h,f}({\mathcal{P}}(x))-c_{h,f}(x+{\mathcal{P}}(x))]).\\ \end{array} (13)

Now by condition (1) and Result (e3), with probability 1δ1-\delta, we have

𝔼(x,y)𝒟𝒮lp[ch,f(x+𝒫(x))]=𝔼(x,y)𝒟𝒮lp[h(x+𝒫(x),lp)]𝔼(x,y)𝒟𝒮lp[f(x+𝒫(x),lp)]Q(1ϵ).\begin{array}[]{ll}&{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}^{l_{p}}_{\mathcal{S}}}[c_{h,f}(x+{\mathcal{P}}(x))]\\ =&{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}^{l_{p}}_{\mathcal{S}}}[h(x+{\mathcal{P}}(x),l_{p})]-{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}^{l_{p}}_{\mathcal{S}}}[f(x+{\mathcal{P}}(x),l_{p})]\\ \leq&Q-(1-\epsilon).\end{array} (14)

Combining inequalities (13) and (14), we have:

𝔼(x,y)𝒟𝒮lp[ch,f(𝒫(x))]𝔼(x,y)𝒟𝒮lp[ch,f(x+𝒫(x))]+τ/ηQ(1ϵ)+τ/η.\begin{array}[]{ll}&{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}^{l_{p}}_{\mathcal{S}}}[c_{h,f}({\mathcal{P}}(x))]\\ \leq&{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}^{l_{p}}_{\mathcal{S}}}[c_{h,f}(x+{\mathcal{P}}(x))]+\tau/\eta\\ \leq&Q-(1-\epsilon)+\tau/\eta.\end{array} (15)

Result (f1) is proved.

Result (f2): 𝔼(x,y)𝒟Slp[ch,f(𝒫(x))+1]λ𝔼(x,y)𝒟𝒮lp[ch,f(𝒫(x))+1]{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[c_{h,f}({\mathcal{P}}(x))+1]\leq\lambda{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}^{l_{p}}_{\mathcal{S}}}[c_{h,f}({\mathcal{P}}(x))+1].

Result (f2) can be proved by using Lemma B.3 and condition (2).

Result (f3): When hHh\in H and fFf\in F satisfy condition (3), we have 𝔼𝒟S[ch,f(x+𝒫(x))](η+(1η)λ)(Q+ϵ+τ/η1)+λ1+τ{\mathbb{E}}_{{\mathcal{D}}_{S}}[c_{h,f}(x+{\mathcal{P}}(x))]\leq(\eta+(1-\eta)\lambda)(Q+\epsilon+\tau/\eta-1)+\lambda-1+\tau.

By condition (3), we have

τ𝔼(x,y)𝒟S[|ch,f(𝒫(x))ch,f(x+𝒫(x))|]𝔼(x,y)𝒟S[ch,f(𝒫(x))+ch,f(x+𝒫(x))].\begin{array}[]{ll}&\tau\\ \geq&{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[|c_{h,f}({\mathcal{P}}(x))-c_{h,f}(x+{\mathcal{P}}(x))|]\\ \geq&{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[-c_{h,f}({\mathcal{P}}(x))+c_{h,f}(x+{\mathcal{P}}(x))]\\ \end{array}.

Then, we have {\mathbb{E}}_{{\mathcal{D}}_{S}}[c_{h,f}(x+{\mathcal{P}}(x))]\leq{\mathbb{E}}_{{\mathcal{D}}_{S}}[c_{h,f}({\mathcal{P}}(x))]+\tau. Now, substituting Results (f1) and (f2) to estimate {\mathbb{E}}_{{\mathcal{D}}_{S}}[c_{h,f}({\mathcal{P}}(x))], we get

𝔼(x,y)𝒟S[ch,f(𝒫(x))]=η𝔼(x,y)𝒟𝒮lp[ch,f(𝒫(x))]+(1η)𝔼(x,y)𝒟Slp[ch,f(𝒫(x))](η+(1η)λ)𝔼(x,y)𝒟𝒮lp[ch,f(𝒫(x))]+λ1(η+(1η)λ)(Q+ϵ+τ/η1)+λ1.\begin{array}[]{ll}&{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[c_{h,f}({\mathcal{P}}(x))]\\ =&\eta{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}^{l_{p}}_{\mathcal{S}}}[c_{h,f}({\mathcal{P}}(x))]+(1-\eta){\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[c_{h,f}({\mathcal{P}}(x))]\\ \leq&(\eta+(1-\eta)\lambda){\mathbb{E}}_{(x,y)\sim{\mathcal{D}}^{l_{p}}_{\mathcal{S}}}[c_{h,f}({\mathcal{P}}(x))]+\lambda-1\\ \leq&(\eta+(1-\eta)\lambda)(Q+\epsilon+\tau/\eta-1)+\lambda-1.\end{array}

So, we prove Result (f3): 𝔼𝒟S[ch,f(x+𝒫(x))](η+(1η)λ)(Q+ϵ+τ/η1)+λ1+τ{\mathbb{E}}_{{\mathcal{D}}_{S}}[c_{h,f}(x+{\mathcal{P}}(x))]\leq(\eta+(1-\eta)\lambda)(Q+\epsilon+\tau/\eta-1)+\lambda-1+\tau.

Result (f4): 𝔼(x,y)𝒟S[h(x+𝒫(x),lp)]λ(Q+τ/η+ϵ)+τ+(λ1)η{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[h(x+{\mathcal{P}}(x),l_{p})]\leq\lambda(Q+\tau/\eta+\epsilon)+\tau+(\lambda-1)\eta.

Since ch,f(x+𝒫(x))=h(x+𝒫(x),lp)f(x+𝒫(x),lp)c_{h,f}(x+{\mathcal{P}}(x))=h(x+{\mathcal{P}}(x),l_{p})-f(x+{\mathcal{P}}(x),l_{p}) and f(x+𝒫(x),lp)1f(x+{\mathcal{P}}(x),l_{p})\leq 1, using Result (f3), we have

\begin{array}[]{ll}&{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[h(x+{\mathcal{P}}(x),l_{p})]\\ \leq&{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[c_{h,f}(x+{\mathcal{P}}(x))]+1\\ \leq&(\eta+(1-\eta)\lambda)(Q+\epsilon+\tau/\eta-1)+\lambda+\tau\\ \leq&\lambda(Q+\tau/\eta+\epsilon)+\tau+(\lambda-1)\eta.\end{array}

This proves Result (f4).

When h satisfies condition (3), substituting the value of Q into Result (f4), we see that with probability 1-\delta-\frac{4-4\eta}{4-4\eta+\eta N}, we have:

𝔼(x,y)𝒟S[h(x+𝒫(x),lp)]λ((x,y)𝒟poih(x,y)Nαη/2+2RadNαη/2𝒟Slp(H)+ln(2/δ)Nαη+τ/η+ϵ)+τ+(λ1)η.{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[h(x+{\mathcal{P}}(x),l_{p})]\leq\lambda(\frac{\sum_{(x,y)\in{\mathcal{D}}_{poi}}h(x,y)}{N\alpha\eta/2}+2{\hbox{\rm{Rad}}}^{{\mathcal{D}}_{S}^{l_{p}}}_{N\alpha\eta/2}(H)+\sqrt{\frac{\ln(2/\delta)}{N\alpha\eta}}+\tau/\eta+\epsilon)+\tau+(\lambda-1)\eta.

Finally, using the fact that {\mathcal{D}}_{poi}\subset{\mathcal{D}}_{P} and that Result (e3) is valid for X=|{\mathcal{D}}_{poi}|\geq[N\eta/2], we have

𝔼(x,y)𝒟S[h(x+𝒫(x),lp)]λ(2𝔼(x,y)𝒟P[h(x,y)]αη+2RadNαη/2𝒟Slp(H)+ln(2/δ)Nαη+τ/η+ϵ)+τ+(λ1)η.{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[h(x+{\mathcal{P}}(x),l_{p})]\leq\lambda(\frac{2{\mathbb{E}}_{(x,y)\in{\mathcal{D}}_{P}}[h(x,y)]}{\alpha\eta}+2{\hbox{\rm{Rad}}}^{{\mathcal{D}}_{S}^{l_{p}}}_{N\alpha\eta/2}(H)+\sqrt{\frac{\ln(2/\delta)}{N\alpha\eta}}+\tau/\eta+\epsilon)+\tau+(\lambda-1)\eta.

The theorem is proved. ∎

D.2 Proof of Theorem D.1

Now, we use Theorem D.2 to prove Theorem D.1.

First, for any h:nmh:{\mathbb{R}}^{n}\to{\mathbb{R}}^{m}, we define h1(x,y)=1hy(x):n×[m]h_{-1}(x,y)=1-h_{y}(x):{\mathbb{R}}^{n}\times[m]\to{\mathbb{R}}, and for any HW,DH\subset{\mathcal{H}}_{W,D}, we define the hypothesis space: H1={h(x,y)=1hy(x)h(x)H}H_{-1}=\{h(x,y)=1-h_{y}(x)\|h(x)\in H\}. Using Theorem D.2, we have the following lemma.

Lemma D.3.

For any two hypothesis spaces H,FW,DH,F\subset{\mathcal{H}}_{W,D}, if 𝒫(x){\mathcal{P}}(x) satisfies the following conditions for some ϵ>0,τ>0,λ1\epsilon>0,\tau>0,\lambda\geq 1:
(1) 𝔼(x,y)𝒟𝒮lp[fy(x+𝒫(x))]ϵ{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}^{l_{p}}_{{\mathcal{S}}}}[f_{y}(x+{\mathcal{P}}(x))]\leq\epsilon for any fFf\in F;
(2) (x,y)𝒟Slp(𝒫(x)A)λ(x,y)𝒟𝒮lp(𝒫(x)A){\mathbb{P}}_{(x,y)\sim{\mathcal{D}}_{S}^{\neq l_{p}}}({\mathcal{P}}(x)\in A)\leq\lambda{\mathbb{P}}_{(x,y)\sim{\mathcal{D}}^{l_{p}}_{{\mathcal{S}}}}({\mathcal{P}}(x)\in A) for any set AA, where λ1\lambda\geq 1;
(3) Some hHh\in H satisfies that there exists an fFf\in F such that 𝔼x𝒟𝒮[|(hlpflp)(𝒫(x))(hlpflp)(x+𝒫(x))|]τ{\mathbb{E}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}}}[|(h_{l_{p}}-f_{l_{p}})({\mathcal{P}}(x))-(h_{l_{p}}-f_{l_{p}})(x+{\mathcal{P}}(x))|]\leq\tau, where (hlpflp)(x)=hlp(x)flp(x)(h_{l_{p}}-f_{l_{p}})(x)=h_{l_{p}}(x)-f_{l_{p}}(x),
then with probability at least 1-\delta-\frac{4-4\eta}{4-4\eta+\eta N}, for any h\in H satisfying (3), we have:

𝔼(x,y)𝒟S[1hlp(x+𝒫(x))]λ(2𝔼(x,y)𝒟P[1hy(x)]/(αη)+2RadNαη/2𝒟Slp(H1)+ln(2/δ)Nαη+τ/η+ϵ)+τ+(λ1)η.\begin{array}[]{l}{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[1-h_{l_{p}}(x+{\mathcal{P}}(x))]\leq\lambda(2{\mathbb{E}}_{(x,y)\in{\mathcal{D}}_{P}}[1-h_{y}(x)]/(\alpha\eta)+2{\hbox{\rm{Rad}}}^{{\mathcal{D}}_{S}^{l_{p}}}_{N\alpha\eta/2}(H_{-1})+\sqrt{\frac{\ln(2/\delta)}{N\alpha\eta}}+\tau/\eta+\epsilon)\\ +\tau+(\lambda-1)\eta.\end{array} (16)
Proof.

We apply Theorem D.2 to H_{-1} and F_{-1} to prove the lemma. We just need to verify the three conditions in Theorem D.2 for H_{-1} and F_{-1}.

Verify condition (1) in Theorem D.2.

We just need to show that 𝔼(x,y)𝒟𝒮lp[f1(x+𝒫(x),y)]1ϵ{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}^{l_{p}}_{{\mathcal{S}}}}[f_{-1}(x+{\mathcal{P}}(x),y)]\geq 1-\epsilon for any f1F1f_{-1}\in F_{-1}.

By condition (1) in Lemma D.3, and considering that f1(x,y)=1fy(x)f_{-1}(x,y)=1-f_{y}(x) for the corresponding fFf\in F, we get the result.

Verify condition (2) in Theorem D.2.

This is obvious because condition (2) in Theorem D.2 and Lemma D.3 are the same.

Verify condition (3) in Theorem D.2.

We just need to show that: Some h1H1h_{-1}\in H_{-1} satisfies the requirement that there exists an f1F1f_{-1}\in F_{-1} such that 𝔼x𝒟𝒮[|(h1f1)(𝒫(x),lp)(h1f1)(x+𝒫(x),lp)|]τ{\mathbb{E}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}}}[|(h_{-1}-f_{-1})({\mathcal{P}}(x),l_{p})-(h_{-1}-f_{-1})(x+{\mathcal{P}}(x),l_{p})|]\leq\tau, where (h1f1)(x,y)=h1(x,y)f1(x,y)(h_{-1}-f_{-1})(x,y)=h_{-1}(x,y)-f_{-1}(x,y).

Since (h1f1)(x,lp)=flp(x)hlp(x)(h_{-1}-f_{-1})(x,l_{p})=f_{l_{p}}(x)-h_{l_{p}}(x) for the corresponding hHh\in H and fFf\in F, by condition (3) in Lemma D.3, we get the result.

Since the three conditions in Theorem D.2 hold for H_{-1} and F_{-1}, the lemma is proved. ∎

Second, we give three lemmas.

Lemma D.4.

For any hW,Dh\in{\mathcal{H}}_{W,D}, we have 𝔼(x,y)𝒟S[𝟏(h^(x+𝒫(x))lp)]2𝔼(x,y)𝒟S[1hlp(x+𝒫(x))]{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[{\mathbf{1}}(\widehat{h}(x+{\mathcal{P}}(x))\neq l_{p})]\leq 2{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[1-h_{l_{p}}(x+{\mathcal{P}}(x))].

Proof.

When h^(x+𝒫(x))lp\widehat{h}(x+{\mathcal{P}}(x))\neq l_{p}, we have hlp(x+𝒫(x))<0.5h_{l_{p}}(x+{\mathcal{P}}(x))<0.5, which implies that 0.5𝟏(h^(x+𝒫(x))lp)1hlp(x+𝒫(x))0.5*{\mathbf{1}}(\widehat{h}(x+{\mathcal{P}}(x))\neq l_{p})\leq 1-h_{l_{p}}(x+{\mathcal{P}}(x)). Then, 𝔼(x,y)𝒟S[0.5𝟏(h^(x+𝒫(x))lp)]𝔼(x,y)𝒟S[1hlp(x+𝒫(x))]{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[0.5*{\mathbf{1}}(\widehat{h}(x+{\mathcal{P}}(x))\neq l_{p})]\leq{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[1-h_{l_{p}}(x+{\mathcal{P}}(x))], this is what we want. ∎

Lemma D.5 (Mohri et al. (2018)).

For any hypothesis space F=\{f:{\mathbb{R}}^{n}\to[0,1]\}, any distribution {\mathcal{D}}, and any N>0, let F_{-1}=\{1-f\|f\in F\}. Then {\hbox{\rm{Rad}}}_{N}^{\mathcal{D}}(F_{-1})={\hbox{\rm{Rad}}}_{N}^{\mathcal{D}}(F).

Lemma D.6.

For any x(0,1]x\in(0,1], we have 1xlnx1-x\leq-\ln x.

Proof.

Let f(x)=1-x+\ln x; then f^{\prime}(x)=1/x-1\geq 0 for x\in(0,1]. Since f is non-decreasing on (0,1] and f(1)=0, we have f(x)\leq 0 for all x\in(0,1], i.e., 1-x\leq-\ln x. ∎
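For example, at x=1/2 we have 1-x=0.5 while -\ln x\approx 0.693. This elementary lemma is what allows us, in the final bound, to replace 1-h_{y}(x) by the cross-entropy loss -\ln h_{y}(x).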

Finally, we use Lemmas D.3, D.4, D.5, and D.6 to prove Theorem D.1.

Proof.

Firstly, it is easy to see that, conditions (1), (2), (3) in Lemma D.3 are the same as (c1), (c2), (c3) in Theorem D.1. So by Lemma D.3, we have

𝔼(x,y)𝒟S[1hlp(x+𝒫(x))]λ(2𝔼(x,y)𝒟P[1hy(x)]/(αη)+2RadNαη/2𝒟Slp(H1)+ln(2/δ)Nαη+τ/η+ϵ)+τ+(λ1)η.{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[1-h_{l_{p}}(x+{\mathcal{P}}(x))]\leq\lambda(2{{\mathbb{E}}_{(x,y)\in{\mathcal{D}}_{P}}[1-h_{y}(x)]/(\alpha\eta)}+2{\hbox{\rm{Rad}}}^{{\mathcal{D}}_{S}^{l_{p}}}_{N\alpha\eta/2}(H_{-1})+\sqrt{\frac{\ln(2/\delta)}{N\alpha\eta}}+\tau/\eta+\epsilon)\\ +\tau+(\lambda-1)\eta.

Then, by Lemma D.4, we further have

\begin{array}[]{l}{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[{\mathbf{1}}(\widehat{h}(x+{\mathcal{P}}(x))\neq l_{p})]\leq\\ 2\lambda(2{{\mathbb{E}}_{(x,y)\in{\mathcal{D}}_{P}}[1-h_{y}(x)]/(\alpha\eta)}+2{\hbox{\rm{Rad}}}^{{\mathcal{D}}_{S}^{l_{p}}}_{N\alpha\eta/2}(H_{-1})+\sqrt{\frac{\ln(2/\delta)}{N\alpha\eta}}+\tau/\eta+\epsilon)+2\tau+2(\lambda-1)\eta.\end{array}

We have H_{W,D,1}=\{{\mathcal{F}}_{y}(x)\|{\mathcal{F}}\in{\mathcal{H}}_{W,D}\} and H\subset{\mathcal{H}}_{W,D}. Then, by Lemma D.5 and the fact that under the distribution {\mathcal{D}}_{S}^{l_{p}} all samples have label l_{p}, we have {\hbox{\rm{Rad}}}^{{\mathcal{D}}_{S}^{l_{p}}}_{N\alpha\eta/2}(H_{-1})\leq{\hbox{\rm{Rad}}}^{{\mathcal{D}}_{S}^{l_{p}}}_{N\alpha\eta/2}(H_{W,D,1}).

Finally, using Lemma D.6 and the fact: RadMD(H)NMRadND(H){\hbox{\rm{Rad}}}^{D}_{M}(H)\leq\frac{N}{M}{\hbox{\rm{Rad}}}^{D}_{N}(H) for any MNM\leq N, distribution 𝒟{\mathcal{D}}, and hypothesis space HH, we obtain

\begin{array}[]{l}{\mathbb{E}}_{(x,y)\sim{\mathcal{D}}_{S}}[{\mathbf{1}}(\widehat{h}(x+{\mathcal{P}}(x))\neq l_{p})]\leq\\ 2\lambda(\frac{2{{\mathbb{E}}_{(x,y)\in{\mathcal{D}}_{P}}[L_{CE}(h(x),y)]}}{\eta\alpha}+4{\hbox{\rm{Rad}}}^{{\mathcal{D}}_{S}^{l_{p}}}_{N}(H_{W,D,1})/(\alpha\eta)+\sqrt{\frac{\ln(2/\delta)}{N\alpha\eta}}+\tau/\eta+\epsilon)+2\tau+2(\lambda-1)\eta.\end{array}

The theorem is proved. ∎

D.3 Estimate the Rademacher Complexity

In this section, we estimate {\hbox{\rm{Rad}}}^{{\mathcal{D}}_{S}^{l_{p}}}_{N\alpha\eta/2}(H_{W,D,1}). Since all samples have label l_{p} under the distribution {\mathcal{D}}_{S}^{l_{p}}, without loss of generality we let l_{p}=1, so we only need to consider the Rademacher complexity of the hypothesis space {\mathcal{H}}_{W,D,0}=\{{\mathcal{F}}_{1}(x):{\mathcal{F}}(x)\in{\mathcal{H}}_{W,D}\}.

We show that, if we bound the norm of the network parameters in W,D,0{\mathcal{H}}_{W,D,0}, then we can calculate RadN𝒟(W,D,0){\hbox{\rm{Rad}}}_{N}^{\mathcal{D}}({\mathcal{H}}_{W,D,0}) for any distribution 𝒟{\mathcal{D}}. Please note that the condition of bounded network parameters is reasonable.

Some definitions and a lemma are required.

Definition D.7.

Let F={f:n[0,1]}F=\{f:{\mathbb{R}}^{n}\to[0,1]\} be a hypothesis space, and {xi}i=1N\{x_{i}\}_{i=1}^{N} be NN samples in n{\mathbb{R}}^{n}. Then the empirical Rademacher Complexity of {\mathcal{F}} under {xi}i=1N\{x_{i}\}_{i=1}^{N} is defined as

Rad{xi}i=1N(F)=𝔼σ[supfF1Ni=1Nσif(xi)],{\hbox{\rm{Rad}}}^{\{x_{i}\}_{i=1}^{N}}(F)={\mathbb{E}}_{\sigma}[\sup_{f\in F}\frac{1}{N}\sum_{i=1}^{N}\sigma_{i}f(x_{i})],

where σ=(σi)i=1N\sigma=(\sigma_{i})_{i=1}^{N} is a set of random variables such that (σi=1)=(σi=1)=0.5{\mathbb{P}}(\sigma_{i}=1)={\mathbb{P}}(\sigma_{i}=-1)=0.5.

It is easy to see that RadN𝒟()max{xi}i=1N𝒟Rad{xi}i=1N(){\hbox{\rm{Rad}}}^{{\mathcal{D}}}_{N}({\mathcal{F}})\leq\max_{\{x_{i}\}_{i=1}^{N}\sim{\mathcal{D}}}{\hbox{\rm{Rad}}}^{\{x_{i}\}_{i=1}^{N}}({\mathcal{F}}), so we can try to calculate the empirical Rademacher Complexity to bound RadN𝒟(){\hbox{\rm{Rad}}}^{{\mathcal{D}}}_{N}({\mathcal{F}}).
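As a simple illustration of Definition D.7, the following sketch (assuming NumPy; the finite function pool and the toy data are hypothetical) approximates the empirical Rademacher complexity by replacing the supremum over F with a maximum over a finite pool of functions and the expectation over \sigma with Monte Carlo sampling, so the output is only a lower estimate of the true quantity.

```python
import numpy as np

def empirical_rademacher(pool, X, n_sigma=2000, seed=0):
    """Approximate E_sigma[ sup_f (1/N) sum_i sigma_i f(x_i) ] over a finite pool of functions."""
    rng = np.random.default_rng(seed)
    N = len(X)
    values = np.stack([f(X) for f in pool])               # (num_functions, N), entries in [0, 1]
    sigma = rng.choice([-1.0, 1.0], size=(n_sigma, N))    # i.i.d. Rademacher signs
    corr = sigma @ values.T / N                           # (n_sigma, num_functions)
    return corr.max(axis=1).mean()                        # sup over the pool, mean over sigma

# Toy usage: threshold functions on scalar inputs drawn uniformly from [0, 1].
X = np.random.default_rng(1).uniform(size=50)
pool = [lambda x, t=t: (x > t).astype(float) for t in np.linspace(0.0, 1.0, 21)]
print(empirical_rademacher(pool, X))
```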

Definition D.8 (Covering Number,(Wainwright, 2019)).

Let (T,L) be a metric space, where T is a set and L is a distance on T. We say that K\subset T is an \epsilon cover of T if for any x\in T there is a y\in K such that L(x,y)\leq\epsilon. The minimum |K| is denoted by C(T,L,\epsilon), that is, C(T,L,\epsilon)=\min|K|, where the minimum is taken over all \epsilon covers K\subset T of T.

Lemma D.9 ((Wainwright, 2019)).

Let F={f:n[0,1]}F=\{f:{\mathbb{R}}^{n}\to[0,1]\} be a hypothesis space, and {xi}i=1N\{x_{i}\}_{i=1}^{N} be NN samples in n{\mathbb{R}}^{n}. Define L(f,g)=1/Ni=1N(f(xi)g(xi))2L(f,g)=\sqrt{1/N\sum_{i=1}^{N}(f(x_{i})-g(x_{i}))^{2}} for any f,gFf,g\in F. Then

Rad{xi}i=1N(F)O(01lnC(F,L,t)N𝑑t).{\hbox{\rm{Rad}}}^{\{x_{i}\}_{i=1}^{N}}(F)\leq O(\int_{0}^{1}\sqrt{\frac{\ln C(F,L,t)}{N}}dt).

When network parameters are bounded, Lemma D.9 is often used to calculate the Rademacher complexity. A classical result is given below.

Lemma D.10.

Let T be a ball with radius r in {\mathbb{R}}^{p}. Then for any t>0, we have C(T,L_{o},t)\leq(\frac{3r}{t})^{p}, where L_{o} is the Euclidean distance.

Now, we will try to calculate the covering number of W,D,0{\mathcal{H}}_{W,D,0}. First, we give a definition of the bound of the network parameters and the relationship between the bound of the network parameters and the network output.

Definition D.11.

Let {\mathcal{F}}_{i}\ (i=1,2) be two networks whose j-th weight matrix and bias vector are W^{i}_{j} and b^{i}_{j}. Then let B({\mathcal{F}}_{i})=\sum_{j=1}^{D_{i}}(||W_{j}^{i}||_{2}+||b_{j}^{i}||_{2}), where D_{i} is the depth of {\mathcal{F}}_{i}. If D_{1}=D_{2}=D and the widths of {\mathcal{F}}_{1} and {\mathcal{F}}_{2} are the same, then we define B({\mathcal{F}}_{1}-{\mathcal{F}}_{2})=\sum_{j=1}^{D}(||W_{j}^{1}-W_{j}^{2}||_{2}+||b_{j}^{1}-b_{j}^{2}||_{2}).
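A minimal sketch (assuming NumPy, and assuming that ||\cdot||_{2} of a matrix means the spectral norm) of the quantities B({\mathcal{F}}_{i}) and B({\mathcal{F}}_{1}-{\mathcal{F}}_{2}) in Definition D.11, for networks represented as lists of (W_j, b_j) pairs:

```python
import numpy as np

def B(net):
    """B(F) = sum_j (||W_j||_2 + ||b_j||_2); the matrix norm is taken as the spectral norm."""
    return sum(np.linalg.norm(W, 2) + np.linalg.norm(b) for W, b in net)

def B_diff(net1, net2):
    """B(F1 - F2) for two networks of equal depth and matching widths."""
    return sum(np.linalg.norm(W1 - W2, 2) + np.linalg.norm(b1 - b2)
               for (W1, b1), (W2, b2) in zip(net1, net2))
```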

Then, we have the following existing result.

Lemma D.12 (Tsai et al. (2021)).

Let {\mathcal{F}}_{1} and {\mathcal{F}}_{2} be networks with depth D and ReLU activation function, whose output layers have no activation function. If B({\mathcal{F}}_{1})\leq C, B({\mathcal{F}}_{2})\leq C, and B({\mathcal{F}}_{1}-{\mathcal{F}}_{2})\leq\epsilon, then ||{\mathcal{F}}_{1}(x)-{\mathcal{F}}_{2}(x)||_{2}\leq D\epsilon C^{D}||x||_{2} for any x.

It is easy to show that the Softmax function is a Lipschitz function:

Lemma D.13.

If a,bma,b\in{\mathbb{R}}^{m}, then |Softmax(a)1Softmax(b)1|mab2|{\hbox{\rm{Softmax}}}(a)_{1}-{\hbox{\rm{Softmax}}}(b)_{1}|\leq\sqrt{m}||a-b||_{2}, where Softmax(a)1{\hbox{\rm{Softmax}}}(a)_{1} is the first weight of Softmax(a){\hbox{\rm{Softmax}}}(a).
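A quick numerical check (assuming NumPy; it only tests the inequality of Lemma D.13 on random logit vectors and is not a proof):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
m = 10
for _ in range(5):
    a, b = rng.normal(size=m), rng.normal(size=m)
    lhs = abs(softmax(a)[0] - softmax(b)[0])      # first softmax weight, as in the lemma
    rhs = np.sqrt(m) * np.linalg.norm(a - b)
    print(f"{lhs:.4f} <= {rhs:.4f}: {lhs <= rhs}")
```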

So using Lemmas D.12 and D.13, we have

Lemma D.14.

Let {\mathcal{F}}_{1},{\mathcal{F}}_{2}\in{\mathcal{H}}_{W,D} be networks with {\mathcal{F}}_{i}:{\mathbb{R}}^{n}\to{\mathbb{R}}^{m}. If B({\mathcal{F}}_{1})\leq C, B({\mathcal{F}}_{2})\leq C and B({\mathcal{F}}_{1}-{\mathcal{F}}_{2})\leq\epsilon, then ||{\mathcal{F}}_{1}(x)-{\mathcal{F}}_{2}(x)||_{2}\leq\sqrt{mn}D\epsilon C^{D} for any x\in[0,1]^{n}.

Now, we calculate the covering number of W,D,0{\mathcal{H}}_{W,D,0}.

Lemma D.15.

L(f,g)L(f,g) is defined in Lemma D.9. Let W,D,0A={:W,D,0,B()A}{\mathcal{H}}^{A}_{W,D,0}=\{{\mathcal{F}}:{\mathcal{F}}\in{\mathcal{H}}_{W,D,0},B({\mathcal{F}})\leq A\}. Then for any t>0t>0, we have C(W,D,0A,L,t)(3AD+1Dmn/t)O(W2D)C({\mathcal{H}}^{A}_{W,D,0},L,t)\leq(3A^{D+1}D\sqrt{mn}/t)^{O(W^{2}D)}.

Proof.

By Lemma D.14, when {\mathcal{F}}_{1},{\mathcal{F}}_{2}\in{\mathcal{H}}^{A}_{W,D,0}, if B({\mathcal{F}}_{1}-{\mathcal{F}}_{2})\leq\frac{t}{DA^{D}\sqrt{mn}}, then we have ||{\mathcal{F}}_{1}(x)-{\mathcal{F}}_{2}(x)||_{2}\leq t, which implies that L({\mathcal{F}}_{1},{\mathcal{F}}_{2})\leq t. So we just need to bound the size of a \frac{t}{DA^{D}\sqrt{mn}} cover of the parameter space. Using Lemma D.10, we get the result. ∎

Finally, using Lemmas D.9 and D.15, we can calculate the Rademacher Complexity.

Lemma D.16.

Let {\mathcal{H}}^{A}_{W,D,0}=\{{\mathcal{F}}:{\mathcal{F}}\in{\mathcal{H}}_{W,D,0},B({\mathcal{F}})\leq A\} and ADmn\geq e. Then {\hbox{\rm{Rad}}}_{N}^{\mathcal{D}}({\mathcal{H}}^{A}_{W,D,0})\leq{\frac{O(WD\ln(ADmn))}{\sqrt{N}}} for any distribution {\mathcal{D}}, which implies that {\hbox{\rm{Rad}}}_{N}^{\mathcal{D}}({\mathcal{H}}^{A}_{W,D,0}) approaches 0 when N is large enough.

Proof.

We just need to prove that, for any N points \{x_{i}\}_{i=1}^{N} in [0,1]^{n}, we have {\hbox{\rm{Rad}}}^{\{x_{i}\}_{i=1}^{N}}({\mathcal{H}}^{A}_{W,D,0})\leq{\frac{O(WD\ln(ADmn))}{\sqrt{N}}}.

Using Lemmas D.9 and D.15, we have {\hbox{\rm{Rad}}}^{\{x_{i}\}_{i=1}^{N}}({\mathcal{H}}^{A}_{W,D,0})\leq O(\int_{0}^{1}\sqrt{\frac{O(W^{2}D^{2}\ln(ADmn/t))}{N}}dt)\leq O(\int_{0}^{1}\frac{O(WD\ln(ADmn/t))}{\sqrt{N}}dt)=\frac{O(WD\ln(ADmn))}{\sqrt{N}}. Here, we use \sqrt{\ln(q/t)}\leq\ln(q/t) for all q\geq e and t\in(0,1). ∎
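The following sketch (assuming NumPy; the network sizes are arbitrary and the hidden constants in the O(\cdot) terms are dropped, so the absolute values are only illustrative) numerically evaluates the entropy integral of Lemma D.9 with the covering-number bound of Lemma D.15 and compares it with the closed-form rate WD\ln(ADmn)/\sqrt{N} of Lemma D.16.

```python
import numpy as np

def dudley_bound(W, D, A, m, n, N, grid=20_000):
    """Numerically approximate int_0^1 sqrt(ln C(t) / N) dt with
    C(t) = (3 * A**(D+1) * D * sqrt(m*n) / t)**(W**2 * D), as in Lemma D.15."""
    t = np.linspace(1e-4, 1.0, grid)
    log_C = (W**2 * D) * np.log(3 * A**(D + 1) * D * np.sqrt(m * n) / t)
    integrand = np.sqrt(log_C / N)
    return integrand.mean() * (t[-1] - t[0])       # simple rectangle-rule integral

W, D, A, m, n = 64, 4, 2.0, 10, 784
for N in [10**3, 10**4, 10**5]:
    print(N, dudley_bound(W, D, A, m, n, N), W * D * np.log(A * D * m * n) / np.sqrt(N))
```

Both quantities decay like 1/\sqrt{N}, which is the behaviour asserted by Lemma D.16.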

Appendix E Proof of Proposition 4.10

E.1 Strict Versions of the Definition and Proposition

We first give a precise version of the proposition and definition in Section 4.3:

Definition E.1 (Binary Shortcut).

{\mathcal{P}}(x) is called a binary shortcut of the binary linearly inseparable classification dataset {\mathcal{D}}=\{(x_{i},1)\}_{i=1}^{N_{1}}\cup\{(\widehat{x}_{i},0)\}_{i=1}^{N_{0}} if {\mathcal{D}}_{1}=\{(x_{i},1)\}_{i=1}^{N_{1}}\cup\{(\widehat{x}_{i}+{\mathcal{P}}(\widehat{x}_{i}),0)\}_{i=1}^{N_{0}} is linearly separable. Moreover, if there exists a linear function h with a unit normal vector and a constant 0<\eta_{1}<0.5 such that h(x)\geq 1-\eta_{1} for any (x,0)\in{\mathcal{D}}_{1} and h(x)\leq\eta_{1} for any (x,1)\in{\mathcal{D}}_{1}, then we say that {\mathcal{P}}(x) is a binary shortcut of {\mathcal{D}} with bound \eta_{1}.
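A small sketch (assuming NumPy; the candidate linear function (w,b) must be supplied by the user, e.g. from a linear separator fitted to {\mathcal{D}}_{1}, and the helper name is hypothetical) that checks the margin condition of Definition E.1 on finite data:

```python
import numpy as np

def certifies_shortcut(w, b, X1, X0, P0, eta1):
    """Check whether h(x) = w.x + b witnesses P as a binary shortcut with bound eta1.
    X1: label-1 samples; X0: label-0 samples; P0: the trigger P(x) evaluated on X0."""
    w = np.asarray(w, dtype=float)
    if not (0.0 < eta1 < 0.5) or abs(np.linalg.norm(w) - 1.0) > 1e-8:
        return False                                   # h must have a unit normal vector
    h = lambda X: np.asarray(X) @ w + b
    on_high_side = np.all(h(np.asarray(X0) + np.asarray(P0)) >= 1.0 - eta1)  # shifted label-0 samples
    on_low_side = np.all(h(np.asarray(X1)) <= eta1)                          # label-1 samples
    return bool(on_high_side and on_low_side)
```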

The strict version of the definition of the simple feature recognition space: it is generally believed that networks learn simple features and, based on this characteristic, adding a shortcut to the dataset affects network training. We use the following definition to describe this property:

Definition E.2 (Simple Features Recognition Space).

We say that {\mathcal{H}} is a simple feature recognition space with a constant cc, if for any binary linear inseparable classification dataset 𝒟{\mathcal{D}} and a binary shortcut 𝒫(x){\mathcal{P}}(x) of 𝒟{\mathcal{D}} with bound η1\eta_{1}, {\mathcal{H}} satisfies the following properties: Let k=max(x,0),(z,0)D{𝒫(x)𝒫(z)2}k=\max_{(x,0),(z,0)\in D}\{||{\mathcal{P}}(x)-{\mathcal{P}}(z)||_{2}\}. Then for any hh\in{\mathcal{H}} that satisfies h(x+𝒫(x))1η1,(x,0)𝒟h(x+{\mathcal{P}}(x))\geq 1-\eta_{1},\forall(x,0)\in{\mathcal{D}} and h(x)η1,(x,1)𝒟h(x)\leq\eta_{1},\forall(x,1)\in{\mathcal{D}}, it holds h(x1+𝒫(x0))h(x1)c(12η1k)h(x_{1}+{\mathcal{P}}(x_{0}))-h(x_{1})\geq c(1-2\eta_{1}-k) and h(𝒫(x0))c(12η1k)h({\mathcal{P}}(x_{0}))\geq c(1-2\eta_{1}-k) for any (x0,0)𝒟(x_{0},0)\in{\mathcal{D}} and (x1,y1)𝒟(x_{1},y_{1})\in{\mathcal{D}}.

As mentioned above, an indiscriminate poison can be considered as a shortcut, and two important conclusions have been proved in the study of indiscriminate poisoning (Zhu et al., 2023b): (1) a network trained on a dataset with indiscriminate poison will classify the shortcut, and (2) adding the shortcut to samples will affect the output of the network. The above definition describes these two conclusions mathematically: we express “the network will classify the shortcut” by giving a lower bound on h({\mathcal{P}}(x_{0})), and we express “the shortcut affects the classification results” by giving a lower bound on h(x_{1}+{\mathcal{P}}(x_{0}))-h(x_{1}). Moreover, we take into account the impact on the output of the differences in {\mathcal{P}}(x) for different x (i.e., k in the definition): the larger the difference, the smaller the impact. Thus, our definition is valid for some spaces, as shown below.

Proposition E.3.

Let L be the set of linear functions with a unit normal vector and without bias. Then L is a simple feature recognition space with constant 1. Furthermore, if f:{\mathbb{R}}\to{\mathbb{R}} is an increasing differentiable function with derivatives in [a,1] and f(0)=0, then the hypothesis space {\mathcal{H}}=\{f(l(x))|l\in L\} is a simple feature recognition space with constant a.

The strict version of Proposition 4.10: using the above definitions, we can show that condition (c3) in Theorem 4.5 holds for a small \tau. The strict version of Proposition 4.10 is given below.

Proposition E.4.

Use notation introduced in Theorem 4.1. For any distribution 𝒟{\mathcal{D}}, let 𝒟(𝒫){\mathcal{D}}({\mathcal{P}}) be the distribution of 𝒫(x){\mathcal{P}}(x) when x𝒟x\sim{\mathcal{D}}, and 𝒟(𝒫+x){\mathcal{D}}({\mathcal{P}}+x) be the distribution of x+𝒫(x)x+{\mathcal{P}}(x) when x𝒟x\sim{\mathcal{D}}. W,D,1{\mathcal{H}}_{W,D,1} is defined in Section 3. Define Rad2(W,D){\hbox{\rm{Rad}}}2({\mathcal{H}}_{W,D}) as:

Rad2(W,D)=O(RadN(1η)𝒟Slp(𝒫)(W,D,1)+RadN(1η)𝒟Slp(x+𝒫)(W,D,1)+RadNηαD𝒮lp(𝒫)(W,D,1)+RadNηαD𝒮lp(x+𝒫)(W,D,1))\begin{array}[]{ll}&{\hbox{\rm{Rad}}}2({\mathcal{H}}_{W,D})=\\ &O({\hbox{\rm{Rad}}}_{N(1-\eta)}^{{\mathcal{D}}_{S}^{\neq l_{p}}({\mathcal{P}})}({\mathcal{H}}_{W,D,1})+{\hbox{\rm{Rad}}}_{N(1-\eta)}^{{\mathcal{D}}_{S}^{\neq l_{p}}(x+{\mathcal{P}})}({\mathcal{H}}_{W,D,1})+{\hbox{\rm{Rad}}}_{N\eta\alpha}^{D^{l_{p}}_{\mathcal{S}}({\mathcal{P}})}({\mathcal{H}}_{W,D,1})+{\hbox{\rm{Rad}}}_{N\eta\alpha}^{D^{l_{p}}_{\mathcal{S}}(x+{\mathcal{P}})}({\mathcal{H}}_{W,D,1}))\end{array} (17)

Let D^{\prime}_{P}=\{(x,0)|(x,y)\in{\mathcal{D}}_{{{\rm{tr}}}}\setminus{\mathcal{D}}_{clean}\}\cup\{(x,1)|(x,y)\in{\mathcal{D}}_{clean}\}. Assume that with probability 1-\delta_{1}, D^{\prime}_{P} is linearly inseparable. Let the hypothesis spaces H,F\subset{\mathcal{H}}_{W,D} satisfy that h_{y}(x) and f_{y}(x) have Lipschitz constant L for any h\in H, f\in F and any x, y. Assume that the trigger {\mathcal{P}}(x) satisfies the following three conditions for some \epsilon,k>0:
(t1) For any fFf\in F, we have fy(x+𝒫(x))<ϵ,(x,y)𝒟𝒮f_{y}(x+{\mathcal{P}}(x))<\epsilon,\forall(x,y)\sim{\mathcal{D}}_{\mathcal{S}};
(t2) 𝒫(x1)𝒫(x2)2k||{\mathcal{P}}(x_{1})-{\mathcal{P}}(x_{2})||_{2}\leq k for all x1,x2x_{1},x_{2};
(t3) 𝒫(x){\mathcal{P}}(x) is the shortcut of the dataset 𝒟P{\mathcal{D}}^{\prime}_{P} with bound 2ϵ2\epsilon.
Assume that the hypothesis space H-F=\{h_{l_{p}}(x)-f_{l_{p}}(x):{\mathbb{R}}^{n}\to{\mathbb{R}}|f\in F,h\in H\} is a simple feature recognition space with constant c, where c satisfies 1-2\epsilon\geq c(1-4\epsilon)-k(c+4L)\geq 2\epsilon. Then with probability 1-\delta_{1}-\delta-\frac{4(1-\eta)}{4(1-\eta)+N\eta}-\frac{4\eta}{4\eta+N(1-\eta)}, for any h\in\{h\in H|h_{y}(x)>1-\epsilon,\forall(x,y)\in{\mathcal{D}}_{P}\} and any f\in\{f\in F|f_{y}(x)>1-\epsilon,\forall(x,y)\in{\mathcal{D}}_{{{\rm{tr}}}}\}, we have

\begin{array}[]{ll}&{\mathbb{E}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}}}[|(f-h)_{l_{p}}({\mathcal{P}}(x))-(f-h)_{l_{p}}(x+{\mathcal{P}}(x))|]\\ \leq&2(1-c(1-4\epsilon)+k(c+4L)+2\epsilon)+{\hbox{\rm{Rad}}}2({\mathcal{H}}_{W,D})+16\sqrt{\frac{\ln(1/\delta)}{N\alpha}}.\end{array} (18)

It is easy to see that, when c is close to 1 and k is small, this bound is close to \widetilde{O}(\epsilon).
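For a concrete instance, take c=1 and k=0 (an ideal simple feature recognition space and a sample-independent trigger) and a small \epsilon\leq 1/6, so that the constraint 1-2\epsilon\geq c(1-4\epsilon)-k(c+4L)\geq 2\epsilon is satisfied; the leading term 2(1-c(1-4\epsilon)+k(c+4L)+2\epsilon) in (18) then equals 2(4\epsilon+2\epsilon)=12\epsilon, so up to the Rademacher term and the \sqrt{\ln(1/\delta)/(N\alpha)} term, the right-hand side is indeed O(\epsilon).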

Remark E.5.

Notice that (t1) is similar to (c1) in Theorem 4.5, which means the trigger is adversarial; (t2) is similar to (c2) in Theorem 4.5, which means the trigger should be similar across different samples. When k tends to 0, c tends to 1, and N is large enough, {\mathbb{E}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}}}[|(f-h)_{l_{p}}({\mathcal{P}}(x))-(f-h)_{l_{p}}(x+{\mathcal{P}}(x))|] tends to O(\epsilon). Generally, we can assume that the hypothesis space F only contains networks that perform well on the distribution {\mathcal{D}}_{\mathcal{S}}; then, based on the transferability of adversarial examples, condition (t1) can be satisfied.

E.2 Proof of Proposition E.4

We first prove the following more general proposition, in which the hypothesis spaces are replaced by more general ones; it implies Proposition E.4.

Proposition E.6.

Use the notation introduced in Theorem A.2. Let D^{\prime}_{P}=\{(x,0)|(x,y)\in{\mathcal{D}}_{{{\rm{tr}}}}\setminus{\mathcal{D}}_{clean}\}\cup\{(x,1)|(x,y)\in{\mathcal{D}}_{clean}\}. Assume that with probability 1-\delta_{1}, D^{\prime}_{P} is linearly inseparable. Let the hypothesis spaces H=\{h(x,y)\in{\mathcal{S}}\times[m]\to[0,1]\} and F=\{f(x,y)\in{\mathcal{S}}\times[m]\to[0,1]\} satisfy h(x,y_{1})+h(x,y_{2})\geq 1, f(x,y_{1})+f(x,y_{2})\geq 1, and that h(x,y_{1}) and f(x,y_{1}) have Lipschitz constant L in x, for any h\in H, f\in F, x\in{\mathbb{R}}^{n} and y_{1}\neq y_{2}. Assume that the trigger {\mathcal{P}}(x) satisfies the following three conditions for some \epsilon,k>0:
(t1) For any fFf\in F, we have f(x+𝒫(x),y)>1ϵ,(x,y)𝒟𝒮f(x+{\mathcal{P}}(x),y)>1-\epsilon,\forall(x,y)\sim{\mathcal{D}}_{\mathcal{S}};
(t2)𝒫(x1)𝒫(x2)2k||{\mathcal{P}}(x_{1})-{\mathcal{P}}(x_{2})||_{2}\leq k for all x1,x2nx_{1},x_{2}\in{\mathbb{R}}^{n};
(t3) 𝒫(x){\mathcal{P}}(x) is the shortcut of the dataset 𝒟P{\mathcal{D}}^{\prime}_{P} with bound 2ϵ2\epsilon.
Assume that the hypothesis space F-H=\{g_{f,h}(x)=f(x,l_{p})-h(x,l_{p}):{\mathbb{R}}^{n}\to{\mathbb{R}}|f\in F,h\in H\} is a simple feature recognition space with constant c, where c satisfies 1-2\epsilon\geq c(1-4\epsilon)-(c+4L)k\geq 2\epsilon. Then with probability 1-\delta_{1}-\delta-\frac{4(1-\eta)}{4(1-\eta)+N\eta}-\frac{4\eta}{4\eta+N(1-\eta)}, for any h\in\{h\in H|h(x,y)<\epsilon,\forall(x,y)\in{\mathcal{D}}_{P}\} and any f\in\{f\in F|f(x,y)<\epsilon,\forall(x,y)\in{\mathcal{D}}_{{{\rm{tr}}}}\}, we have

𝔼x𝒟𝒮[|(fh)(𝒫(x),lp)(fh)(x+𝒫(x),lp)|]2(1c(14ϵ)+(c+4L)k+2ϵ)+Rad(H,F)+16ln(1/δ)Nα.\begin{array}[]{ll}&{\mathbb{E}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}}}[|(f-h)({\mathcal{P}}(x),l_{p})-(f-h)(x+{\mathcal{P}}(x),l_{p})|]\\ \leq&2(1-c(1-4\epsilon)+(c+4L)k+2\epsilon)+{\hbox{\rm{Rad}}}(H,F)+16\sqrt{\frac{\ln(1/\delta)}{N\alpha}}.\end{array} (19)

Here Rad(H,F){\hbox{\rm{Rad}}}(H,F) is a value depending on the Rademacher complexity of HH and FF, and the specific value of it is explained in the following proof.

Proof.

Let 𝒟(𝒫){\mathcal{D}}({\mathcal{P}}) be the distribution of 𝒫(x){\mathcal{P}}(x) where x𝒟x\sim{\mathcal{D}}, and 𝒟(𝒫+x){\mathcal{D}}({\mathcal{P}}+x) be the distribution of x+𝒫(x)x+{\mathcal{P}}(x) where x𝒟x\sim{\mathcal{D}}. We define gf,h(x)=f(x,lp)h(x,lp)g_{f,h}(x)=f(x,l_{p})-h(x,l_{p}).

Next, conditioned on the event that {\mathcal{D}}^{\prime}_{P} is linearly inseparable, we prove inequality (19) with probability 1-\delta-\frac{4(1-\eta)}{4(1-\eta)+N\eta}-\frac{4\eta}{4\eta+N(1-\eta)}, which directly leads to the conclusion of the proposition.

To prove this, we use the following three results:

Result one: If h(x,y)<ϵh(x,y)<\epsilon for any x𝒟Px\in{\mathcal{D}}_{P} and f(x,y)<ϵf(x,y)<\epsilon for any x𝒟trx\in{\mathcal{D}}_{{{\rm{tr}}}}, then we have |gf,h(𝒫(x))gf,h(𝒫(x)+x)|1c(14ε)+(c+4L)k|g_{f,h}({\mathcal{P}}(x))-g_{f,h}({\mathcal{P}}(x)+x)|\leq 1-c(1-4{\mathbf{\varepsilon}})+(c+4L)k for any (x,y)𝒟tr𝒟clean(x,y)\in{\mathcal{D}}_{{{\rm{tr}}}}\setminus{\mathcal{D}}_{clean} and |gf,h(𝒫(x))gf,h(𝒫(x)+x)|1+2ϵc(14ε)+(c+4L)k|g_{f,h}({\mathcal{P}}(x))-g_{f,h}({\mathcal{P}}(x)+x)|\leq 1+2\epsilon-c(1-4{\mathbf{\varepsilon}})+(c+4L)k for any (x,y)𝒟clean(x,y)\in{\mathcal{D}}_{clean}.

It is easy to see that when h(x,y)<ϵh(x,y)<\epsilon for any x𝒟Px\in{\mathcal{D}}_{P} and f(x,y)<ϵf(x,y)<\epsilon for any x𝒟trx\in{\mathcal{D}}_{{{\rm{tr}}}}, we have |f(x,lp)h(x,lp)|2ϵ,(x,y)𝒟clean|f(x,l_{p})-h(x,l_{p})|\leq 2\epsilon,\forall(x,y)\in{\mathcal{D}}_{clean}, where we use h(x,y1)+h(x,y2)1h(x,y_{1})+h(x,y_{2})\geq 1 and f(x,y1)+f(x,y2)1f(x,y_{1})+f(x,y_{2})\geq 1 for any y1y2y_{1}\neq y_{2}. Since f(x+𝒫(x),y)>1ϵf(x+{\mathcal{P}}(x),y)>1-\epsilon for any x𝒟trx\in{\mathcal{D}}_{{{\rm{tr}}}}, so h(x,lp)+f(x,lp)12ϵ,(x,y)𝒟poi-h(x,l_{p})+f(x,l_{p})\geq 1-2\epsilon,\forall(x,y)\in{\mathcal{D}}_{poi}. Then we get gf,h(x)12ϵ,(x,y)𝒟poig_{f,h}(x)\geq 1-2\epsilon,\forall(x,y)\in{\mathcal{D}}_{poi} and |gf,h(x)|2ϵ,(x,y)𝒟clean|g_{f,h}(x)|\leq 2\epsilon,\forall(x,y)\in{\mathcal{D}}_{clean}.

Because FHF-H is a Simple Features recognition space, with conditions (t2) and (t3), we know that gf,h(𝒫(x))c(14ϵk)g_{f,h}({\mathcal{P}}(x))\geq c(1-4\epsilon-k) and gf,h(x+𝒫(x))gf,h(x)c(14ϵk)g_{f,h}(x+{\mathcal{P}}(x))-g_{f,h}(x)\geq c(1-4\epsilon-k) for all x𝒟tr𝒟cleanx\in{\mathcal{D}}_{{{\rm{tr}}}}\setminus{\mathcal{D}}_{clean}.

Considering that h(x,lp)h(x,l_{p}) and f(x,lp)f(x,l_{p}) have Lipschitz constant LL, and by condition (t2), we have 𝒫(x1)𝒫(x2)2k||{\mathcal{P}}(x_{1})-{\mathcal{P}}(x_{2})||_{2}\leq k for all (x1,y1),(x2,y2)𝒟P(x_{1},y_{1}),(x_{2},y_{2})\in{\mathcal{D}}^{\prime}_{P}. Then we have gf,h(𝒫(x))c(14ϵ)(c+4L)kg_{f,h}({\mathcal{P}}(x))\geq c(1-4\epsilon)-(c+4L)k and gf,h(x+𝒫(x))gf,h(x)c(14ε)(c+4L)kg_{f,h}(x+{\mathcal{P}}(x))-g_{f,h}(x)\geq c(1-4{\mathbf{\varepsilon}})-(c+4L)k for all x𝒟trx\in{\mathcal{D}}_{{{\rm{tr}}}}.

Using the above result, when x𝒟tr𝒟cleanx\in{\mathcal{D}}_{{{\rm{tr}}}}\setminus{\mathcal{D}}_{clean}, we have 1gf,h(x+𝒫(x))12ϵ1\geq g_{f,h}(x+{\mathcal{P}}(x))\geq 1-2\epsilon and gf,h(𝒫(x))c(14ε)(c+4L)kg_{f,h}({\mathcal{P}}(x))\geq c(1-4{\mathbf{\varepsilon}})-(c+4L)k, and considering that |ab|max{|1a|,|1b|}|a-b|\leq\max\{|1-a|,|1-b|\} when a,b[1,1]a,b\in[-1,1], we have |gf,h(𝒫(x))gf,h(𝒫(x)+x)|max{1c(14ε)+(c+4L)k,2ϵ}=1c(14ε)+(c+4L)k|g_{f,h}({\mathcal{P}}(x))-g_{f,h}({\mathcal{P}}(x)+x)|\leq\max\{1-c(1-4{\mathbf{\varepsilon}})+(c+4L)k,2\epsilon\}=1-c(1-4{\mathbf{\varepsilon}})+(c+4L)k, by 1c(14ε)+(c+4L)k2ϵ01-c(1-4{\mathbf{\varepsilon}})+(c+4L)k-2\epsilon\geq 0.

Then, when x𝒟cleanx\in{\mathcal{D}}_{clean}, we have 1gf,h(x+𝒫(x))gf,h(x)+c(14ε)(c+4L)kc(14ε)(c+4L)k2ε>01\geq g_{f,h}(x+{\mathcal{P}}(x))\geq g_{f,h}(x)+c(1-4{\mathbf{\varepsilon}})-(c+4L)k\geq c(1-4{\mathbf{\varepsilon}})-(c+4L)k-2{\mathbf{\varepsilon}}>0, so considering that gf,h(𝒫(x))c(14ε)(c+4L)kg_{f,h}({\mathcal{P}}(x))\geq c(1-4{\mathbf{\varepsilon}})-(c+4L)k, we have |gf,h(𝒫(x))gf,h(𝒫(x)+x)|max{1gf,h(𝒫(x)),1gf,h(x+𝒫(x))}1c(14ε)+(c+4L)k+2ε|g_{f,h}({\mathcal{P}}(x))-g_{f,h}({\mathcal{P}}(x)+x)|\leq\max\{1-g_{f,h}({\mathcal{P}}(x)),1-g_{f,h}(x+{\mathcal{P}}(x))\}\leq 1-c(1-4{\mathbf{\varepsilon}})+(c+4L)k+2{\mathbf{\varepsilon}}. So we get result one.

Result Two: With probability 1-2\frac{4(1-\eta)}{4(1-\eta)+N\eta}-2\delta, we have E_{x\sim D^{l_{p}}_{\mathcal{S}}}|g_{f,h}({\mathcal{P}}(x))-g_{f,h}({\mathcal{P}}(x)+x)|\leq 2(1-c(1-4\epsilon)+(c+4L)k)+{\hbox{\rm{Rad}}}_{1}(H,F)+8\sqrt{\frac{\ln(1/\delta)}{N\eta\alpha}}, where {\hbox{\rm{Rad}}}_{1}(H,F) is a quantity depending on the Rademacher complexities of H and F.

By Result one, we can use (x,y)𝒟tr𝒟clean|gf,h(𝒫(x))gf,h(𝒫(x)+x)|\sum_{(x,y)\in{\mathcal{D}}_{{{\rm{tr}}}}\setminus{\mathcal{D}}_{clean}}|g_{f,h}({\mathcal{P}}(x))-g_{f,h}({\mathcal{P}}(x)+x)| to estimate ExD𝒮lp[|gf,h(𝒫(x))gf,h(𝒫(x)+x)|]E_{x\sim D^{l_{p}}_{\mathcal{S}}}[|g_{f,h}({\mathcal{P}}(x))-g_{f,h}({\mathcal{P}}(x)+x)|].

First, using Theorem A.1 and arguing similarly to the proof of Theorem A.2, with probability 1-\frac{4(1-\eta)}{4(1-\eta)+N\eta}-\delta, we have

ExD𝒮lp[gf,h(𝒫(x))gf,h(𝒫(x)+x)]1c(14ϵ)+(c+4L)k+2(RadNηα/2D𝒮lp(𝒫)(H)+RadNηα/2D𝒮lp(𝒫)(F)+RadNηα/2D𝒮lp(x+𝒫)(H)+RadNηα/2D𝒮lp(x+𝒫)(F))+4ln(1/δ)Nηα.\begin{array}[]{ll}&E_{x\sim D^{l_{p}}_{\mathcal{S}}}[g_{f,h}({\mathcal{P}}(x))-g_{f,h}({\mathcal{P}}(x)+x)]\leq 1-c(1-4\epsilon)+(c+4L)k+\\ &2({\hbox{\rm{Rad}}}_{N\eta\alpha/2}^{D^{l_{p}}_{\mathcal{S}}({\mathcal{P}})}(H)+{\hbox{\rm{Rad}}}_{N\eta\alpha/2}^{D^{l_{p}}_{\mathcal{S}}({\mathcal{P}})}(F)+{\hbox{\rm{Rad}}}_{N\eta\alpha/2}^{D^{l_{p}}_{\mathcal{S}}(x+{\mathcal{P}})}(H)+{\hbox{\rm{Rad}}}_{N\eta\alpha/2}^{D^{l_{p}}_{\mathcal{S}}(x+{\mathcal{P}})}(F))+4\sqrt{\frac{\ln(1/\delta)}{N\eta\alpha}}.\end{array}

Notice that here we use {\hbox{\rm{Rad}}}_{k}^{D}(H_{1})+{\hbox{\rm{Rad}}}_{k}^{D}(H_{2})\geq{\hbox{\rm{Rad}}}_{k}^{D}(H_{1}-H_{2}) for any hypothesis spaces H_{1},H_{2}, any k\geq 0, and any distribution D.

Second, similar as before, with probability 14(1η)4(1η)+Nηδ1-\frac{4(1-\eta)}{4(1-\eta)+N\eta}-\delta, we have

ExD𝒮lp[gf,h(𝒫(x))+gf,h(𝒫(x)+x)]1c(14ϵ)+(c+4L)k+2(RadNηα/2D𝒮lp(𝒫)(H)+RadNηα/2D𝒮lp(𝒫)(F)+RadNηα/2D𝒮lp(x+𝒫)(H)+RadNηα/2D𝒮lp(x+𝒫)(F))+4ln(1/δ)Nηα.\begin{array}[]{cc}&E_{x\sim D^{l_{p}}_{\mathcal{S}}}[-g_{f,h}({\mathcal{P}}(x))+g_{f,h}({\mathcal{P}}(x)+x)]\leq 1-c(1-4\epsilon)+(c+4L)k+\\ &2({\hbox{\rm{Rad}}}_{N\eta\alpha/2}^{D^{l_{p}}_{\mathcal{S}}({\mathcal{P}})}(H)+{\hbox{\rm{Rad}}}_{N\eta\alpha/2}^{D^{l_{p}}_{\mathcal{S}}({\mathcal{P}})}(F)+{\hbox{\rm{Rad}}}_{N\eta\alpha/2}^{D^{l_{p}}_{\mathcal{S}}(x+{\mathcal{P}})}(H)+{\hbox{\rm{Rad}}}_{N\eta\alpha/2}^{D^{l_{p}}_{\mathcal{S}}(x+{\mathcal{P}})}(F))+4\sqrt{\frac{\ln(1/\delta)}{N\eta\alpha}}.\end{array}

Adding these two equations, we get the result, where

Rad1(H,F)=4(RadNηα/2D𝒮lp(𝒫)(H)+RadNηα/2D𝒮lp(𝒫)(F)+RadNηα/2D𝒮lp(x+𝒫)(H)+RadNηα/2D𝒮lp(x+𝒫)(F)).{\hbox{\rm{Rad}}}_{1}(H,F)=4({\hbox{\rm{Rad}}}_{N\eta\alpha/2}^{D^{l_{p}}_{\mathcal{S}}({\mathcal{P}})}(H)+{\hbox{\rm{Rad}}}_{N\eta\alpha/2}^{D^{l_{p}}_{\mathcal{S}}({\mathcal{P}})}(F)+{\hbox{\rm{Rad}}}_{N\eta\alpha/2}^{D^{l_{p}}_{\mathcal{S}}(x+{\mathcal{P}})}(H)+{\hbox{\rm{Rad}}}_{N\eta\alpha/2}^{D^{l_{p}}_{\mathcal{S}}(x+{\mathcal{P}})}(F)).

Result Three: With probability 1-2\frac{4\eta}{4\eta+N(1-\eta)}-2\delta, we have E_{x\sim{\mathcal{D}}_{S}^{\neq l_{p}}}|g_{f,h}({\mathcal{P}}(x))-g_{f,h}({\mathcal{P}}(x)+x)|\leq 2(1-c(1-4\epsilon)+(c+4L)k+2{\mathbf{\varepsilon}})+{\hbox{\rm{Rad}}}_{2}(H,F)+8\sqrt{\frac{\ln(1/\delta)}{N(1-\eta)}}, where {\hbox{\rm{Rad}}}_{2}(H,F) is a quantity depending on the Rademacher complexities of H and F.

By Result one, we can use (x,y)𝒟clean|gf,h(𝒫(x))gf,h(𝒫(x)+x)|\sum_{(x,y)\in{\mathcal{D}}_{clean}}|g_{f,h}({\mathcal{P}}(x))-g_{f,h}({\mathcal{P}}(x)+x)| to estimate Ex𝒟Slp[|gf,h(𝒫(x))gf,h(𝒫(x)+x)|]E_{x\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[|g_{f,h}({\mathcal{P}}(x))-g_{f,h}({\mathcal{P}}(x)+x)|].

First, using Theorem A.1 and arguing similarly to the proof of Theorem 4.1, with probability 1-\frac{4\eta}{4\eta+N(1-\eta)}-\delta, we have

Ex𝒟Slp[gf,h(𝒫(x))gf,h(𝒫(x)+x)]1+2ε(c(14ϵ)(c+4L)k)+2(RadN(1η)/2𝒟Slp(𝒫)(H)+RadN(1η)/2𝒟Slp(𝒫)(F)+RadN(1η)/2𝒟Slp(x+𝒫)(H)+RadN(1η)/2𝒟Slp(x+𝒫)(F))+4ln(1/δ)N(1η)\begin{array}[]{ll}&E_{x\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[g_{f,h}({\mathcal{P}}(x))-g_{f,h}({\mathcal{P}}(x)+x)]\leq 1+2{\mathbf{\varepsilon}}-(c(1-4\epsilon)-(c+4L)k)+\\ &2({\hbox{\rm{Rad}}}_{N(1-\eta)/2}^{{\mathcal{D}}_{S}^{\neq l_{p}}({\mathcal{P}})}(H)+{\hbox{\rm{Rad}}}_{N(1-\eta)/2}^{{\mathcal{D}}_{S}^{\neq l_{p}}({\mathcal{P}})}(F)+{\hbox{\rm{Rad}}}_{N(1-\eta)/2}^{{\mathcal{D}}_{S}^{\neq l_{p}}(x+{\mathcal{P}})}(H)+{\hbox{\rm{Rad}}}_{N(1-\eta)/2}^{{\mathcal{D}}_{S}^{\neq l_{p}}(x+{\mathcal{P}})}(F))+4\sqrt{\frac{\ln(1/\delta)}{N(1-\eta)}}\end{array}

and then, similar as above, with probability 14η4η+N(1η)δ1-\frac{4\eta}{4\eta+N(1-\eta)}-\delta, we have

Ex𝒟Slp[gf,h(𝒫(x))+gf,h(𝒫(x)+x)]1+2ε(c(14ϵ)(c+4L)k)+2(RadN(1η)/2𝒟Slp(𝒫)(H)+RadN(1η)/2𝒟Slp(𝒫)(F)+RadN(1η)/2𝒟Slp(x+𝒫)(H)+RadN(1η)/2𝒟Slp(x+𝒫)(F))+4ln(1/δ)N(1η).\begin{array}[]{ll}&E_{x\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[-g_{f,h}({\mathcal{P}}(x))+g_{f,h}({\mathcal{P}}(x)+x)]\leq 1+2{\mathbf{\varepsilon}}-(c(1-4\epsilon)-(c+4L)k)+\\ &2({\hbox{\rm{Rad}}}_{N(1-\eta)/2}^{{\mathcal{D}}_{S}^{\neq l_{p}}({\mathcal{P}})}(H)+{\hbox{\rm{Rad}}}_{N(1-\eta)/2}^{{\mathcal{D}}_{S}^{\neq l_{p}}({\mathcal{P}})}(F)+{\hbox{\rm{Rad}}}_{N(1-\eta)/2}^{{\mathcal{D}}_{S}^{\neq l_{p}}(x+{\mathcal{P}})}(H)+{\hbox{\rm{Rad}}}_{N(1-\eta)/2}^{{\mathcal{D}}_{S}^{\neq l_{p}}(x+{\mathcal{P}})}(F))+4\sqrt{\frac{\ln(1/\delta)}{N(1-\eta)}}.\end{array}

Adding them, we get the result, and Rad2(H,F)=4(RadN(1η)/2𝒟Slp(𝒫)(H)+RadN(1η)/2𝒟Slp(𝒫)(F)+RadN(1η)/2𝒟Slp(x+𝒫)(H)+RadN(1η)/2𝒟Slp(x+𝒫)(F)){\hbox{\rm{Rad}}}_{2}(H,F)=4(Rad_{N(1-\eta)/2}^{{\mathcal{D}}_{S}^{\neq l_{p}}({\mathcal{P}})}(H)+{\hbox{\rm{Rad}}}_{N(1-\eta)/2}^{{\mathcal{D}}_{S}^{\neq l_{p}}({\mathcal{P}})}(F)+{\hbox{\rm{Rad}}}_{N(1-\eta)/2}^{{\mathcal{D}}_{S}^{\neq l_{p}}(x+{\mathcal{P}})}(H)+{\hbox{\rm{Rad}}}_{N(1-\eta)/2}^{{\mathcal{D}}_{S}^{\neq l_{p}}(x+{\mathcal{P}})}(F)).

To summarize, considering that

𝔼x𝒟𝒮[|(fh)(𝒫(x),lp)(fh)(x+𝒫(x),lp)|]=(1η)𝔼x𝒟Slp[|(fh)(𝒫(x),lp)(fh)(x+𝒫(x),lp)|]+η𝔼x𝒟𝒮lp[|(fh)(𝒫(x),lp)(fh)(x+𝒫(x),lp)|]=(1η)𝔼x𝒟Slp[|gf,h(𝒫(x))gf,h(x+𝒫(x))|]+η𝔼x𝒟𝒮lp[|gf,h(𝒫(x))gf,h(x+𝒫(x))|]\begin{array}[]{ll}&{\mathbb{E}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}}}[|(f-h)({\mathcal{P}}(x),l_{p})-(f-h)(x+{\mathcal{P}}(x),l_{p})|]\\ =&(1-\eta){\mathbb{E}}_{x\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[|(f-h)({\mathcal{P}}(x),l_{p})-(f-h)(x+{\mathcal{P}}(x),l_{p})|]\\ &+\eta{\mathbb{E}}_{x\sim{\mathcal{D}}^{l_{p}}_{{\mathcal{S}}}}[|(f-h)({\mathcal{P}}(x),l_{p})-(f-h)(x+{\mathcal{P}}(x),l_{p})|]\\ =&(1-\eta){\mathbb{E}}_{x\sim{\mathcal{D}}_{S}^{\neq l_{p}}}[|g_{f,h}({\mathcal{P}}(x))-g_{f,h}(x+{\mathcal{P}}(x))|]+\eta{\mathbb{E}}_{x\sim{\mathcal{D}}^{l_{p}}_{{\mathcal{S}}}}[|g_{f,h}({\mathcal{P}}(x))-g_{f,h}(x+{\mathcal{P}}(x))|]\end{array} (20)

by using Result two and three in equation (20), defining Rad(H,F)=Rad1(H,F)+Rad2(H,F){\hbox{\rm{Rad}}}(H,F)={\hbox{\rm{Rad}}}_{1}(H,F)+{\hbox{\rm{Rad}}}_{2}(H,F) and using η<1\eta<1, we get the result. ∎

Now, we use Proposition E.6 to prove Proposition E.4:

Proof.

For HH in Proposition E.4, we define a new hypothesis space H1={h1(x,y)=1hy(x)hH}H_{1}=\{h_{1}(x,y)=1-h_{y}(x)\|h\in H\}. Similarly, we define F1F_{1}.

We show that H1H_{1} and F1F_{1} satisfy the conditions in Proposition E.6:

(1): It is easy to see that for any h1H1h_{1}\in H_{1}, we have h1(x,y1)+h1(x,y2)=2(hy1(x)+hy2(x))1h_{1}(x,y_{1})+h_{1}(x,y_{2})=2-(h_{y_{1}}(x)+h_{y_{2}}(x))\geq 1, and similar for F1F_{1}. 1hy(x)1-h_{y}(x) and hy(x)h_{y}(x) have the same Lipschitz constant.

(2): Condition (t1) in Proposition E.6 holds for F_{1}. By condition (t1) in Proposition E.4 and f_{1}(x,y)=1-f_{y}(x), we can prove this.

(3): Condition (t2) in Proposition E.6 and condition (t2) in Proposition E.4 are the same; condition (t3) in Proposition E.6 and condition (t3) in Proposition E.4 are the same.

So H_{1} and F_{1} satisfy the conditions in Proposition E.6, and we now apply the proposition to H_{1} and F_{1}. Considering that h_{1}(x,y)-f_{1}(x,y)=f_{y}(x)-h_{y}(x), we have

\begin{array}[]{ll}&{\mathbb{E}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}}}[|(f-h)_{l_{p}}({\mathcal{P}}(x))-(f-h)_{l_{p}}(x+{\mathcal{P}}(x))|]\\ \leq&2(1-c(1-4\epsilon)+k(c+4L)+2\epsilon)+{\hbox{\rm{Rad}}}(H_{1},F_{1})+16\sqrt{\frac{\ln(1/\delta)}{N\alpha}}.\end{array}

{\hbox{\rm{Rad}}}(H_{1},F_{1}) is the Rademacher complexity term for H_{1} and F_{1} mentioned in Proposition E.6. For such a Rademacher complexity, using Lemma D.5 and {\hbox{\rm{Rad}}}_{M}^{\mathcal{D}}(H)\leq\frac{N}{M}{\hbox{\rm{Rad}}}_{N}^{\mathcal{D}}(H) for any M<N, distribution {\mathcal{D}}, and hypothesis space H, we have

\begin{array}{ll}
&{\mathbb{E}}_{x\sim{\mathcal{D}}_{{\mathcal{S}}}}[|(f-h)_{l_{p}}({\mathcal{P}}(x))-(f-h)_{l_{p}}(x+{\mathcal{P}}(x))|]\\
\leq&2(1-c(1-4\epsilon)+(c+4L)k+2\epsilon)+{\hbox{\rm{Rad}}}_{2}({\mathcal{H}}_{W,D})+16\sqrt{\frac{\ln(1/\delta)}{N\alpha}}\\
=&2(1-c+(c+4L)k+(2+4c)\epsilon)+{\hbox{\rm{Rad}}}_{2}({\mathcal{H}}_{W,D})+16\sqrt{\frac{\ln(1/\delta)}{N\alpha}}.
\end{array}

{\hbox{\rm{Rad}}}_{2} is defined in Definition E.4, and the proposition is proved. ∎

E.3 Proof of Proposition E.3

Proof.

We first show that L is a simple feature recognition space with constant 1.

Suppose w satisfies w(x+{\mathcal{P}}(x))\geq 1-\eta_{1} for all (x,0)\in D and wx\leq\eta_{1} for all (x,1)\in D. Consider the linear function wx-\eta_{1}. Since wx\leq\eta_{1} for all (x,1)\in D, we have wx-\eta_{1}\leq 0 for all (x,1)\in D. Because D is linearly inseparable, there must be an (x_{0},0)\in D such that wx_{0}-\eta_{1}\leq 0, while w(x_{0}+{\mathcal{P}}(x_{0}))\geq 1-\eta_{1}. Thus, we have w{\mathcal{P}}(x_{0})\geq 1-2\eta_{1}.

Considering that ||{\mathcal{P}}(x)-{\mathcal{P}}(x_{0})||_{2}\leq k for all (x,0)\in D, we have w{\mathcal{P}}(x)\geq 1-2\eta_{1}-||w||_{2}||{\mathcal{P}}(x)-{\mathcal{P}}(x_{0})||_{2}\geq 1-2\eta_{1}-k. Moreover, w(x_{1}+{\mathcal{P}}(x))-wx_{1}=w{\mathcal{P}}(x), so, as shown above, when (x,0)\in D we have w(x_{1}+{\mathcal{P}}(x))-wx_{1}\geq 1-2\eta_{1}-k, which is what we want.

Second, we show that H is a simple feature recognition space with constant a.

Because f is an increasing function, similarly to before, if h(l(x+{\mathcal{P}}(x)))\geq 1-\eta_{1} for all (x,0)\in D and h(l(x))\leq\eta_{1} for all (x,1)\in D, then we have h(l(x_{0}))-\eta_{1}\leq 0 for some (x_{0},0)\in D. As a consequence, we have 1-2\eta_{1}\leq h(l(x_{0}+{\mathcal{P}}(x_{0})))-h(l(x_{0}))\leq l(x_{0}+{\mathcal{P}}(x_{0}))-l(x_{0}) by f^{\prime}(x)\leq 1. Because l(x)\in L, we can write l(x)=wx, so w{\mathcal{P}}(x_{0})\geq 1-2\eta_{1}, and considering that h(0)=0, we have h(w{\mathcal{P}}(x_{0}))\geq a(1-2\eta_{1}).

Then, similarly to the first step, we get w{\mathcal{P}}(x)\geq 1-2\eta_{1}-k. As a consequence, h(l(x))=h(w{\mathcal{P}}(x))\geq a(1-2\eta_{1}-k), and h(l(x_{1}+{\mathcal{P}}(x_{0})))-h(l(x_{1}))\geq a(w(x_{1}+{\mathcal{P}}(x_{0}))-w(x_{1}))=aw{\mathcal{P}}(x_{0})\geq a(1-2\eta_{1}-k), so we get the result.

Appendix F More Details on the Experiments

F.1 The Experiment Setting

We want to perform experiments in more practical settings:

1: The attacker can only access part of the training set, so we use only a portion of the training data when generating triggers.

2: The attacker cannot control the training process of the victim network, so the victim network is trained with standard training.

3: The attacker does not know the structure of the victim network and does not have great computing power, so we use only smaller networks, independent of the victim network, when generating triggers.

Some previous backdoor attacks do not assume these conditions. For example, the attacker is assumed to be able to access the entire training set (Gao et al., 2023), to know the structure of the network (Zeng et al., 2022), to control the training process (Gu et al., 2017), or to have some additional information (Ning et al., 2021).

We use the networks VGG16 (Simonyan & Zisserman, 2014), ResNet18, WRN34-10 (He et al., 2016) and the datasets CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), SVHN, and TinyImageNet with 100 classes (Le & Yang, 2015). When training the victim network, we use SGD for 150 epochs with learning rate 0.01, reduced to 80% of its value at the 40-th, 80-th, and 120-th epochs, weight decay 10^{-4}, and momentum 0.9; each training sample is randomly flipped or cropped before being input to the network. A minimal sketch of this schedule follows.
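To make the schedule concrete, here is a minimal PyTorch sketch of the victim training described above; the use of torchvision's ResNet18 and the data path are illustrative placeholders, not the exact training script of the paper.

```python
# Sketch of the victim training schedule (assumptions: torchvision ResNet18
# as a stand-in victim network, CIFAR-10 data under ./data).
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

transform = T.Compose([
    T.RandomHorizontalFlip(),        # random flip
    T.RandomCrop(32, padding=4),     # random crop
    T.ToTensor(),
])
train_set = torchvision.datasets.CIFAR10("./data", train=True, download=True,
                                          transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torchvision.models.resnet18(num_classes=10).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
# learning rate is multiplied by 0.8 at the 40-th, 80-th, and 120-th epochs
sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[40, 80, 120], gamma=0.8)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(150):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    sched.step()
```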

We use Algorithm 1 to find the trigger {\mathcal{P}}(x); the basic settings of Algorithm 1 are as follows. We randomly choose 50% of the samples from the original training set to form T.

We choose {\mathcal{F}}_{1} to be VGG9 for CIFAR-10 (VGG16 for CIFAR-100 or SVHN, ResNet34 for TinyImageNet) and train {\mathcal{F}}_{1} with adversarial training using PGD-10 and an L_{\infty} norm budget of 8/255 for CIFAR-10 and SVHN (4/255 for CIFAR-100 and TinyImageNet). The adversarial training runs for 200 epochs with learning rate 0.01, halved at the 100-th and 150-th epochs, weight decay 10^{-4}, and momentum 0.9; each training sample is randomly flipped or cropped after applying PGD-10. A sketch of this adversarial-training loop is given below.
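The following is a compact sketch of what such PGD-10 adversarial training could look like; the step size of 2/255 and the generic `model`/`loader` objects are our assumptions, since the text only fixes the number of PGD steps and the budget.

```python
# Sketch of PGD-10 adversarial training for F1 (eps = 8/255 as in the text;
# the step size alpha = 2/255 is our choice).
import torch
import torch.nn as nn

def pgd_attack(model, x, y, eps, alpha, steps, loss_fn):
    """L_inf PGD: take `steps` signed-gradient ascent steps of size alpha,
    projecting the perturbation back into the eps-ball after every step."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = loss_fn(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
            delta.add_(x).clamp_(0, 1).sub_(x)   # keep x + delta a valid image
        delta.grad.zero_()
    return (x + delta).detach()

def adv_train_epoch(model, loader, opt, device, eps=8/255, alpha=2/255):
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y, eps, alpha, steps=10, loss_fn=loss_fn)
        opt.zero_grad()
        loss_fn(model(x_adv), y).backward()
        opt.step()
```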

We choose {\mathcal{F}}_{2} to be a two-layer network (its structure is shown below). {\mathcal{F}}_{2} is trained for 40 epochs with learning rate 0.01, weight decay 10^{-4}, and momentum 0.9; each training sample is randomly flipped or cropped before being input to {\mathcal{F}}_{2}. The poison budget is an L_{\infty} norm of 8/255 to 32/255.

Once we obtain the trigger {\mathcal{P}}(x), we randomly select some samples with label l_{p} in the training set and add the trigger to them. Then we train the network on the poisoned training set and measure the accuracy and the attack success rate on the test set.

Running Time. We run our experiments in PyTorch on an NVIDIA GeForce RTX 3090 GPU. Under the above experimental setup, the time required for the experiments is shown in Table 7. Most of the time is spent on the adversarial training of the network {\mathcal{F}}_{1}.

Table 7: Training time (in minutes). Gp100 means generating poison for 100 samples using {\mathcal{F}}_{1} and {\mathcal{F}}_{2}.
Dataset {\mathcal{F}}_{1} {\mathcal{F}}_{2} victim {\mathcal{F}} Gp100
CIFAR-10 400 30 120 2
CIFAR-100 600 30 140 3
SVHN 1000 35 160 3
TinyImageNet 1600 48 250 4

Reasons for {\mathcal{F}}_{1}. {\mathcal{F}}_{1} is a network used to create adversarial noise and is trained on a small clean dataset. We want to conduct the experiment under the premise that the attacker does not know the structure of the victim network, so we avoid using a network with the same structure as the victim network when producing the poison. We also want to conduct the experiment under the premise that the attacker does not have great computing power, so we use as small a network as possible to generate the poison.

The structure of {\mathcal{F}}_{2}.

The first layer: a 3\times 3 convolution with 64 channels and padding 1, followed by ReLU and max pooling.

The second layer: flatten to a 16384(=64\times 16\times 16)-dimensional vector (65536(=64\times 32\times 32) for TinyImageNet), then apply a fully connected layer that outputs a 2-dimensional vector.
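A PyTorch sketch of this two-layer architecture, assuming 3-channel inputs of size 32\times 32 (64\times 64 for TinyImageNet, which gives the 65536-dimensional flattened vector):

```python
import torch.nn as nn

class F2(nn.Module):
    """Two-layer network used to create shortcuts: conv -> ReLU -> maxpool -> FC."""
    def __init__(self, image_size=32):       # use image_size=64 for TinyImageNet
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),  # 64 channels, 3x3 conv, padding 1
            nn.ReLU(),
            nn.MaxPool2d(2),                             # halves the spatial size
        )
        # 32x32 input -> 64*16*16 = 16384; 64x64 input -> 64*32*32 = 65536
        self.classifier = nn.Linear(64 * (image_size // 2) ** 2, 2)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))
```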

Reasons for {\mathcal{F}}_{2}. {\mathcal{F}}_{2} is a network used to create shortcuts, which make the clean and poisoned data linearly separable, and {\mathcal{F}}_{2} is trained by the Min-Min method. Generally speaking, if the data are not too complex, this network does not need to be particularly large; a two-layer network is enough to create the shortcut.

We do not choose {\mathcal{F}}_{2} to be a linear function because we always use data augmentation during training; making the augmented data linearly separable is very difficult, so we let {\mathcal{F}}_{2} be a two-layer network. A schematic of the Min-Min training loop is given below.
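For intuition, the following is a schematic of an alternating Min-Min loop in the spirit of Huang et al. (2021); the batching, stopping rule, and the use of a single shared bounded perturbation are simplifications and do not reproduce Algorithm 1 exactly.

```python
# Schematic Min-Min alternation (error-minimizing / shortcut noise): alternately
# minimize the loss over the model weights and over the bounded noise delta.
import torch
import torch.nn as nn

def min_min(model, x, y, eps, model_steps=10, noise_steps=20, rounds=5, lr=0.01):
    loss_fn = nn.CrossEntropyLoss()
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    delta = torch.zeros_like(x)                      # bounded shortcut noise
    for _ in range(rounds):
        # (1) min over the model weights on the currently perturbed data
        for _ in range(model_steps):
            opt.zero_grad()
            loss_fn(model(x + delta), y).backward()
            opt.step()
        # (2) min over the noise: signed-gradient descent on the same loss w.r.t. delta,
        #     so the perturbed data become even easier to fit (the "shortcut")
        for _ in range(noise_steps):
            delta.requires_grad_(True)
            loss = loss_fn(model(x + delta), y)
            grad = torch.autograd.grad(loss, delta)[0]
            delta = (delta - (eps / 4) * grad.sign()).clamp(-eps, eps).detach()
    return delta
```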

About PGD.

PGD-N means using PGD with N steps and step size 1/255 (2/255, 3/255, 4/255) to generate adversarial perturbations with budget 8/255 (16/255, 24/255, 32/255).

F.2 More Experimental Results

Table 8 supplements Tables 1 and 2; it gives the results on CIFAR-10 under backdoor attacks with various settings.

Table 8: Baseline evaluations on CIFAR-10. Poisoned model accuracy (A), poisoned model target label accuracy (A_{t}), and attack success rate (ASR).
Poison ratio: 0.6% 1% 2% 0.6% 1% 2%
VGG16
Budget: 8/255 16/255
A 91% 91% 93% 92% 91% 91%
A_{t} 92% 92% 90% 92% 91% 90%
ASR 13% 48% 51% 82% 91% 94%
Budget: 24/255 32/255
A 91% 90% 91% 90% 92% 91%
A_{t} 92% 91% 92% 92% 89% 90%
ASR 97% 99% 99% 99% 99% 99%
ResNet18
Budget: 8/255 16/255
A 93% 92% 90% 93% 91% 93%
A_{t} 92% 92% 93% 93% 93% 92%
ASR 14% 33% 47% 86% 93% 94%
Budget: 24/255 32/255
A 92% 92% 91% 92% 92% 90%
A_{t} 92% 92% 91% 91% 92% 90%
ASR 96% 99% 99% 98% 99% 99%

Table 9 supplements Table 3; it gives the results on additional datasets under backdoor attacks with various budgets.

Table 9: Accuracy (A) and attack success rate (ASR) on CIFAR-100, SVHN, and TinyImageNet, target label 0.
CIFAR-100
Budget: 8/255 16/255 32/255
Poison rate: 0.6% 0.8% 0.6% 0.8% 0.6% 0.8%
A 71% 73% 73% 72% 72% 71%
ASR 4% 14% 85% 92% 99% 99%
SVHN
Budget: 8/255 16/255 32/255
Poison rate: 1% 2% 1% 2% 1% 2%
A 93% 93% 93% 92% 93% 91%
ASR 16% 22% 63% 79% 90% 99%
TinyImageNet
Budget: 8/255 16/255 32/255
Poison rate: 0.6% 0.8% 0.6% 0.8% 0.6% 0.8%
A 61% 62% 60% 60% 59% 60%
ASR 2% 9% 61% 82% 99% 99%

Tables 10 and 11 follow Table 4 and present additional comparisons.

Table 10: Benchmark results of attack success rate on CIFAR-10 with VGG16 and ResNet18: comparison of our method to popular clean-label attacks. The poison ratio is 1% and the perturbations have different l_{\infty}-norm bounds from 8/255 to 32/255.
Victim VGG16 ResNet18
Budget: 8/255 16/255 32/255 8/255 16/255 32/255
Ours 48% 91% 99% 33% 93% 99%
Clean Label 18% 44% 84% 12% 22% 80%
Invisible Poison 23% 71% 99% 24% 73% 98%
Hidden Trigger 36% 80% 99% 26% 75% 99%
Narcissus 20% 60% 92% 16% 50% 92%
Image-specific 22% 68% 94% 18% 70% 95%
Reflection 26% 68% 99% 20% 54% 99%
Sleeper-Agent 20% 61% 97% 29% 70% 99%
Table 11: Benchmark results of attack success rate on TinyImageNet with network WRN34-10: comparison of our method to popular clean-label attacks. The poison ratio is 0.8% and the budget is an l_{\infty}-norm bound of 16/255. Our method achieves by far the highest attack success rate.
Attack methods ASR
Ours 82%\%
Clean Label 4%\%
Invisible Poison 27%\%
Hidden Trigger 44%\%
Narcissus 2%
Image-specific 8%\%
Reflection 11%\%
Sleeper-Agent 4%\%

F.3 Details of Each Attack

This section gives the experimental settings of the attack methods in Section 6.3. For all attacks, we basically follow the algorithm in the original paper, but some of their settings for creating the poison and the backdoor differ from ours and need to be slightly modified to match our experimental setup.

Ensure Invisible. These attacks design the poison added to the training set to be invisible (i.e. with L_{\infty} norm not more than 8/255, 16/255, or 32/255), but some attack methods design the trigger as a patch with no norm limitation, as shown in Figure 4. This is unfair to the attacks that design invisible triggers. For fairness, we require that the triggers of all attacks are also invisible (i.e. bounded in L_{\infty} norm); as compensation, the trigger can be added to the whole image rather than to a patch. If the trigger of an attack method exceeds the norm limit, we use the following method to compress it to within the limit: let x be a sample, t(x) the trigger of x without norm constraint, and \epsilon the norm limit. We compress t(x) to a trigger t_{wn}(x) satisfying ||t_{wn}(x)||\leq\epsilon as:

t_{wn}(x)={\hbox{\rm{argmin}}}_{||t||<\epsilon}||F(x+t)-F(x+t(x))||

where F is a feature extraction network with nine convolutional hidden layers. The sketch below shows one way to solve this compression problem.
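One way to solve this compression is projected gradient descent on the feature-matching objective; in the sketch below, the number of steps and the step size are our choices, and `feat_net` stands for the feature extractor F.

```python
# Sketch: compress an unconstrained trigger t(x) into an L_inf-bounded trigger
# by minimizing the feature distance ||F(x + t) - F(x + t(x))||.
import torch

def compress_trigger(feat_net, x, t_x, eps, steps=200, alpha=1/255):
    with torch.no_grad():
        target = feat_net(x + t_x)        # features of the unconstrained poisoned image
    t = torch.zeros_like(x)
    for _ in range(steps):
        t.requires_grad_(True)
        dist = (feat_net(x + t) - target).norm()
        grad = torch.autograd.grad(dist, t)[0]
        with torch.no_grad():
            t = (t - alpha * grad.sign()).clamp(-eps, eps)
            t = (x + t).clamp(0, 1) - x   # keep x + t a valid image
        t = t.detach()
    return t
```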

Figure 4: When the trigger is a patch without norm limitation, it is not invisible. This figure is from (Souri et al., 2022).

Cost of Attack. For fairness, we try to ensure that all attack methods use networks of similar scale when generating the poison and the trigger (VGG9 for CIFAR-10 and ResNet34 for TinyImageNet). Details of these attack methods are as follows.

(Clean Label): Use PGD-40 to find adversarial examples of a VGG9 trained on the clean training set.

(Invisible Poison): The auto-encoder has the architecture described in (Ning et al., 2021). The original trigger is a perturbation with L_{0} norm 100 applied to the lower-left part of the image.

(Hidden Trigger): The original trigger is a patch that perturbs 100 pixels in the lower-left part of the image. A VGG9 is used as the feature extractor when creating the poisoned training samples.

(Narcissus): Use a VGG9 as the surrogate network.

(Image-specific): A U-Net of depth 18 is used as the trigger generator.

(Reflection): Select an image and use its reflection as the trigger, adding the reflection to the image through convolution.

(Sleeper-Agent): The original trigger is a patch that perturbs 100 pixels in the lower-left part of the image. A VGG9 is used as the proxy network when creating the poison.

F.4 Strengthen Attack

We try to strengthen our attack so that it bypasses defenses, using the following methods.

Bypass (AT): Use the method of (Fu et al., 2020) (a min-min-max method, which can create shortcuts for adversarial learning) to train {\mathcal{F}}_{2}, and expand {\mathcal{F}}_{2} to VGG9.

Bypass (Data Augmentation): Use strong data augmentation when training {\mathcal{F}}_{1} and {\mathcal{F}}_{2} in Algorithm 1, and expand {\mathcal{F}}_{2} to VGG9.

Bypass (DPSGD): Use DPSGD when training {\mathcal{F}}_{1} and {\mathcal{F}}_{2} in Algorithm 1.

Bypass (Frequency Filter): Use the method in (Dabouei et al., 2020) to control the frequency content of the poison, making the poison low-frequency.

F.5 Some Other Attacks

The transferability of attacks.

Please note that we perform the experiments under the setting that the attacker does not know the structure of the victim network, as stated in the experimental setup in Appendix F.1. Hence, the surrogate networks used to create the trigger do not change with the victim network; they change only with the dataset. Consequently, the triggers used in Table 1 for ResNet18, VGG16, and WRN are the same, so Table 1 already implies that our trigger transfers well between different networks.

However, a drawback of our trigger is that it cannot be transferred between different datasets; for example, the trigger created for CIFAR-10 cannot be used on CIFAR-100.

Does the victim network learn the trigger feature? We continue the experiment in Table 1 and consider two special sets: D_{op}=\{({\mathcal{P}}(x),0)|(x,y)\in{\mathcal{D}}_{test}\} and D_{op1}=\{(0.5+{\mathcal{P}}(x),0)|(x,y)\in{\mathcal{D}}_{test}\}, where {\mathcal{D}}_{test} is the test set of CIFAR-10 and 0.5+{\mathcal{P}}(x) means adding 0.5 to each pixel of {\mathcal{P}}(x). These sets consist only of triggers. We test the accuracy of the victim network on them to measure whether the network has learned the trigger features (a minimal evaluation sketch is given below); the results are given in Table 12.
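A minimal sketch of this evaluation, assuming `triggers` holds the per-sample triggers {\mathcal{P}}(x) for the test set and `victim` is the trained model (both names are placeholders):

```python
# Sketch: accuracy of the victim network on the trigger-only sets D_op and D_op1
# (target label 0; no extra clipping is applied to the trigger images here).
import torch

@torch.no_grad()
def trigger_only_accuracy(victim, triggers, shift=0.0, target=0, batch=256):
    victim.eval()
    correct, total = 0, 0
    for i in range(0, len(triggers), batch):
        x = triggers[i:i + batch] + shift        # P(x) for D_op, 0.5 + P(x) for D_op1
        pred = victim(x).argmax(dim=1)
        correct += (pred == target).sum().item()
        total += x.shape[0]
    return correct / total

# acc_op  = trigger_only_accuracy(victim, triggers, shift=0.0)
# acc_op1 = trigger_only_accuracy(victim, triggers, shift=0.5)
```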

Table 12: Accuracies on D_{op} and D_{op1} for ResNet18 (R) and VGG16 (V). PN is the ratio of poisoned samples.
Budget: 8/255 16/255
PN: 0.6% 1% 2% 0.6% 1% 2%
V, D_{op} 84% 95% 98% 100% 100% 100%
R, D_{op} 80% 92% 98% 99% 100% 100%
V, D_{op1} 94% 98% 99% 100% 100% 100%
R, D_{op1} 93% 98% 98% 99% 100% 100%

The victim network has very high accuracy on the poisoned sets D_{op} and D_{op1}, even with budget 8/255, which means the victim network has learned the trigger feature. However, the numbers in Table 1 are not as good as those in Table 12: when the input contains both the original features and the poison features, both affect the output of the network, and the final prediction is decided by them together. Therefore, when the network learns the original features well, the scale of the poison must also be increased; this is why, with budget 8/255, the attack does not appear effective in Table 1. We thus reach the following conclusion: the victim network is very sensitive to poison features and uses them for classification, but it still attends to the original image features and makes its decision based on whichever of the two kinds of features is stronger.

F.6 More Details for Section 6.5

The construction details of the poison in Section 6.5 are as follows.

(RN (L_{\infty} budget)): {\mathcal{P}}(x) is random noise with L_{\infty} budget 16/255 or 32/255. The noise is selected as follows: each pixel of the noise i.i.d. obeys the Bernoulli distribution on \{-16/255,16/255\} or \{-32/255,32/255\} (see the sketch after this list). Please note that a single noise vector is selected as the trigger for all samples; each sample does not select its own noise as the trigger.

(RN (L_{0} budget)): {\mathcal{P}}(x) is random noise with L_{0} budget 200 or 300. The noise is generated by randomly selecting 200 or 300 pixels and setting their values to 0. Please note that the positions of the zeroed pixels are the same in every image; each sample does not randomly select its own pixels to zero out.

(UA): {\mathcal{P}}(x) is a universal adversarial perturbation with L_{\infty} budget 16/255 or 32/255. We use the method of (Moosavi-Dezfooli et al., 2017) to find the universal adversarial perturbation of a trained VGG9 (the training method is the same as that of {\mathcal{F}}_{1} in Section F.1).

(Adv): {\mathcal{P}}(x) is an adversarial perturbation with L_{\infty} budget 16/255 or 32/255. We use PGD-40 to find the adversarial perturbation of a trained VGG9 (the training method is the same as that of {\mathcal{F}}_{1} in Section F.1).

(SCut): {\mathcal{P}}(x) is shortcut noise with L_{\infty} budget 16/255 or 32/255. The shortcut noise is generated by the Min-Min method (Huang et al., 2021), using a depth-2 neural network to find the shortcut.

(Ours): Following Algorithm 1 and Section F.1.
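For reference, a small sketch of how the two random-noise triggers above can be generated; the CIFAR-style 3\times 32\times 32 shape is an assumption.

```python
# Sketch of the RN triggers: one fixed trigger is shared by all samples.
import torch

def rn_linf_trigger(shape=(3, 32, 32), eps=16/255):
    """L_inf random noise: each pixel is +eps or -eps with probability 1/2."""
    signs = torch.randint(0, 2, shape).float() * 2 - 1   # values in {-1, +1}
    return eps * signs

def rn_l0_mask(shape=(3, 32, 32), num_pixels=200):
    """L_0 random noise: a fixed set of pixel positions set to 0 in every image."""
    h, w = shape[1], shape[2]
    idx = torch.randperm(h * w)[:num_pixels]
    mask = torch.ones(h * w)
    mask[idx] = 0.0
    return mask.view(1, h, w)   # broadcast over channels: x_poisoned = x * mask
```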

The results on VGG are given in Table 13.

Table 13: The supplement of Table 6. The values of V_{adv} and V_{sc}, and the accuracy (A) and attack success rate (ASR) on the test set, for 12 different triggers.
Poison Type V_{adv}(\uparrow) V_{sc}(\downarrow) ASR(\uparrow) A
RN L_{\infty}, 16/255 2.72 0.014 16% 91%
RN L_{\infty}, 32/255 6.31 10^{-4} 98% 90%
RN L_{0}, 200 4.64 0.004 76% 91%
RN L_{0}, 300 6.20 0.003 92% 90%
UA 16/255 3.19 0.002 63% 91%
UA 32/255 17.92 10^{-4} 93% 90%
Adv 16/255 9.77 1.27 44% 90%
Adv 32/255 18.63 0.35 84% 90%
SCut 16/255 1.21 10^{-4} 33% 91%
SCut 32/255 4.02 10^{-5} 91% 90%
Ours 16/255 7.21 0.001 91% 90%
Ours 32/255 15.95 10^{-4} 99% 92%

F.7 Verify Theorem 4.1

In this section, we use experiments to verify Theorem 4.1.

We will show that, when {\mathcal{P}}(x) is fixed, the poison rate affects the accuracy.

We use the dataset CIFAR-10, the network ResNet18, target label 0, and poison 1000, 2000, 3000, or 4000 randomly selected images with label 0. We use the following triggers {\mathcal{P}}(x) to test our conclusion.

(PN): {\mathcal{P}}(x) is random noise with L_{\infty} budget 8/255, 16/255, or 32/255. The noise is selected as follows: each pixel of the noise i.i.d. obeys the Bernoulli distribution on \{-8/255,8/255\}, \{-16/255,16/255\}, or \{-32/255,32/255\}.

(MI): Mixed-image poisoning. x+{\mathcal{P}}(x) is computed as follows: randomly find an x_{1} whose label is not 0 and set x+{\mathcal{P}}(x)=(1-\lambda)x+\lambda x_{1}, where \lambda=0.05,0.15,0.25 (see the sketch after this list).

(Ours): {\mathcal{P}}(x) is generated by Algorithm 1 with budget 8/255, 16/255, or 32/255.
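A one-line sketch of the MI construction above; the `candidates` list of non-target-class images is a placeholder.

```python
# Sketch of mixed-image (MI) poisoning: blend the target-class image x with a
# randomly chosen image x1 from another class, with mixing weight lambda.
import random

def mi_poison(x, candidates, lam=0.15):
    """x: image with the target label; candidates: list of images with other labels."""
    x1 = random.choice(candidates)
    return (1 - lam) * x + lam * x1      # x + P(x) = (1 - lambda) x + lambda x1
```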

The following table shows the overall accuracy and the accuracy on images with label 0.

Table 14: Accuracy (A) and accuracy on images with label 0 (A_{t}) on the test set. The values in parentheses are A_{t}. Nine different triggers are used.
Number of poisoned images: 1000 2000 3000 4000
PN, 8/255: 92(91)% 92(91)% 91(90)% 90(88)%
PN, 16/255: 91(92)% 91(91)% 90(90)% 90(88)%
PN, 32/255: 91(92)% 90(89)% 89(89)% 89(87)%
MI, \lambda=0.05: 92(91)% 91(91)% 92(90)% 90(89)%
MI, \lambda=0.15: 91(92)% 90(91)% 90(90)% 89(89)%
MI, \lambda=0.25: 92(90)% 91(90)% 89(89)% 88(87)%
Ours, 8/255: 92(93)% 91(92)% 91(91)% 90(89)%
Ours, 16/255: 92(92)% 90(91)% 90(89)% 89(88)%
Ours, 32/255: 90(90)% 91(90)% 90(89)% 90(88)%

We can see that the higher the poison rate, the greater the impact on accuracy. However, the degree of decline is not significant: with more than 3,000 poisoned samples, the accuracy decreases by only about 4%. This is because the poison budget is kept small.