
Adversarial Parameter Attack on Deep Neural Networks
Thanks: This work is partially supported by NSFC grant No.11688101 and NKRDP grant No.2018YFA0704705.

Lijia Yu, Yihan Wang, and Xiao-Shan Gao
Academy of Mathematics and Systems Science, Chinese Academy of Sciences,
Beijing 100190, China
University of Chinese Academy of Sciences, Beijing 100049, China
Abstract

In this paper, a new parameter perturbation attack on DNNs, called the adversarial parameter attack, is proposed, in which small perturbations are made to the parameters of a DNN such that the accuracy of the attacked DNN does not decrease much, but its robustness becomes much lower. The adversarial parameter attack is stronger than previous parameter perturbation attacks in that the attack is harder for users to recognize and the attacked DNN gives a wrong label for any modified sample input with high probability. The existence of adversarial parameters is proved. For a DNN {\mathcal{F}}_{\Theta} with a parameter set \Theta satisfying certain conditions, it is shown that if the depth of the DNN is sufficiently large, then there exists an adversarial parameter set \Theta_{a} for \Theta such that the accuracy of {\mathcal{F}}_{\Theta_{a}} is equal to that of {\mathcal{F}}_{\Theta}, but the robustness measure of {\mathcal{F}}_{\Theta_{a}} is smaller than any given bound. An effective training algorithm is given to compute adversarial parameters, and numerical experiments demonstrate that the algorithm produces high-quality adversarial parameters.

Keywords. Adversarial parameter attack, adversarial samples, robustness measurement, adversarial accuracy, mathematical theory for safe DNNs.

1 Introduction

The deep neural network (DNN) [15] has become the most powerful machine learning method and has been successfully applied in computer vision, natural language processing, and many other fields. Safety is a key desired feature of DNNs and has been studied extensively [1, 5, 42].

The most widely studied safety issue for DNNs is the adversarial sample attack [31]: it is possible to intentionally make small modifications to a sample, essentially imperceptible to the human eye, such that the DNN outputs a wrong label or even any label chosen by the adversary. The existence of adversarial samples makes DNNs vulnerable in safety-critical applications, and many effective methods have been proposed to develop DNNs that are more robust against adversarial attacks [20, 1, 5, 42]. However, it was shown that adversarial samples are, in a certain sense, inevitable for current DNN models [6, 3, 27].

More recently, parameter perturbation attacks [18, 43, 7, 33, 30, 37, 34, 35] were studied and shown to be another serious safety threat to DNNs. It was shown that by making small parameter perturbations, the attacked DNN can give wrong or adversary-chosen labels to specified input samples while still giving the correct labels to other samples [18, 43, 34, 35].

In this paper, the adversarial parameter attack is proposed, in which small perturbations are made to the parameters of a DNN such that the attack is essentially imperceptible to the user, but the robustness of the DNN becomes much lower. The adversarial parameter attack is stronger than previous parameter perturbation attacks in that not only the accuracy but also the robustness of the DNN is considered.

1.1 Contributions

Let {\mathcal{F}}_{\Theta} be a DNN with \Theta as the parameter set. A parameter perturbation \Theta_{a} is called a set of adversarial parameters of {\mathcal{F}}_{\Theta} or \Theta if the following conditions are satisfied: 1) \Theta_{a} is a small modification of \Theta, for instance ||\Theta_{a}-\Theta||_{\infty}\leq\epsilon for a small positive number \epsilon; 2) the accuracy of {\mathcal{F}}_{\Theta_{a}} over a distribution of samples is almost the same as that of {\mathcal{F}}_{\Theta}; 3) {\mathcal{F}}_{\Theta_{a}} is much less robust than {\mathcal{F}}_{\Theta}, that is, {\mathcal{F}}_{\Theta_{a}} has many more adversarial samples than {\mathcal{F}}_{\Theta}. Conditions 1) and 2) make the attack difficult for users to recognize, and condition 3) makes the new DNN less safe. The DNN obtained by the above attack is called an adversarial DNN, which has high accuracy but low robustness.

The existence of adversarial parameters is proved under certain assumptions. It is shown that if the depth of a trained DNN {\mathcal{F}}_{\Theta} is sufficiently large, then there exist adversarial parameters \Theta_{a} such that the accuracy of {\mathcal{F}}_{\Theta_{a}} is equal to that of {\mathcal{F}}_{\Theta}, but the robustness measure of {\mathcal{F}}_{\Theta_{a}} is as small as desired (refer to Corollaries 4.4 and 4.5). Since {\mathcal{F}}_{\Theta} is a continuous function of \Theta, if \Theta_{a} is an adversarial parameter set for \Theta then there exists a small sphere S_{a} centered at \Theta_{a} such that all parameters in S_{a} are also adversarial parameters for \Theta. These results imply that adversarial parameters are inevitable in a certain sense, similar to adversarial samples [6, 3, 27].

The existence of adversarial samples is usually demonstrated with numerical experiments, apart from a few cases mentioned in the next section. As an application of adversarial parameters, we can construct DNNs that are guaranteed to have adversarial samples. For a trained DNN {\mathcal{F}}_{\Theta} satisfying certain conditions, it is shown that there exist adversarial parameters \Theta_{a} such that the accuracy of {\mathcal{F}}_{\Theta_{a}} is equal to that of {\mathcal{F}}_{\Theta}, but {\mathcal{F}}_{\Theta_{a}} has adversarial samples near a given normal sample (refer to Theorem 4.1), or the probability that {\mathcal{F}}_{\Theta_{a}} has adversarial samples over a distribution of samples is at least 1/2 (refer to Theorem 4.2).

Finally, an effective training algorithm is given to compute adversarial parameters, and numerical experiments demonstrate that the algorithm produces high-quality adversarial parameters for the networks VGG19 and Resnet56 on the CIFAR-10 dataset.

1.2 Related work

There is a vast literature on adversarial attacks, which can be found in the survey papers [1, 5, 42]. We focus on the work most closely related to ours.

Parameter perturbation attacks. Parameter perturbation attacks were given under different names such as fault injection attack, fault sneaking attack, stealth attack, and weight corruption. The fault injection attack was first proposed by Liu et al [18], where it was shown that parameter perturbations can be used to misclassify one given input sample. In [7], it was shown that laser injection techniques can be used as a successful fault injection attack in real-world applications. In [43], the fault sneaking attack was proposed, where multiple input samples were misclassified and other samples were still given the correct label. In [37], lower bounds were given for parameter perturbations under which the network still gives the correct label for a given sample. In [33], upper bounds were given for the changes of the pairwise class margin function and the Rademacher complexity under parameter perturbations, and new loss functions were given to obtain more robust networks. In [30], the maximum change of the loss function over given samples was used as an indicator to measure the robustness of DNNs against parameter perturbations, and gradient descent methods were used to compute the indicator. In [34, 35], the stealth attack was proposed, which is guaranteed to make the attacked DNN give a desired label for a sample outside of the validation set while keeping correct labels for samples in the validation set. The stealth attack has the form {\mathcal{F}}+{\mathcal{U}}, where {\mathcal{F}} is the DNN to be attacked and {\mathcal{U}} is a DNN with one hidden layer.

The adversarial parameter attack proposed in this paper is stronger than previous parameter perturbation attacks in the following aspects. First, by keeping the accuracy and eliminating the robustness, the adversarial parameter attack is harder to recognize, because the attacked DNN performs almost the same as the original DNN on the test set. Second, by reducing the robustness of the DNN, the attacked DNN gives a wrong label for any modified input sample with high probability, while previous parameter attacks usually misclassify certain given samples. Finally, we prove the existence of adversarial parameters under reasonable assumptions.

Mathematical theories of adversarial samples. The existence of adversarial samples was usually demonstrated with numerical experiments, and mathematical theories are desired. In [6], it was proved that for DNNs with a fixed architecture, there exist uncountably many classification functions and distributions of samples such that adversarial samples always exist for any successfully trained DNN with the given architecture and sample distribution. For the stealth attack [34, 35], it was proved that there exist attacked DNNs which give a desired label for a sample outside of the validation set by modifying the DNN. In this paper, we show that by making small perturbations to the parameters of a DNN, the DNN has adversarial samples with high probability.

Theories for certified robustness of DNNs were given in several aspects. Let x be a sample for which the DNN {\mathcal{F}} gives the correct label. Due to the continuity of the DNN function, a sphere centered at x contains no adversarial samples if its radius is sufficiently small; such a sphere is called a robust sphere of x. In [12], lower bounds for the robustness radius were computed and used to enhance the robustness of the DNN. In [24], for shallow networks, upper bounds on the changes of the network under input perturbations were given and used to obtain more robust DNNs. In [8], the random smoothing method was proposed and lower bounds for the radius of the robust spheres were given. In [39], lower bounds for the average radius of robust spheres over a distribution of samples were given. Universal lower bounds on stability in terms of the dimension of the domain of the classification function were also given in [27, 34]. However, these bounds usually depend inverse-exponentially on the depth of the DNN and are very small for the deep networks used in real-world applications. In [40], the information-theoretically safe bias classifier was introduced by making the gradient of the DNN random.

Algorithms to train robust DNNs. Many effective methods were proposed to train DNNs that are more robust against adversarial samples [1, 5, 42, 38]. Methods to train DNNs that are more robust against parameter perturbation attacks were also proposed [18, 43]. The adversarial training method proposed by Madry et al [20] can reduce adversarial samples significantly: the value of the loss function at the worst adversary in a small neighborhood of the training sample is minimized. In this paper, the idea of adversarial training is used to compute adversarial parameters.

2 Adversarial parameters

In this section, we define the adversarial parameters and give a measurement for the quality of the adversarial parameters.

2.1 Adversarial parameters of DNNs

Let us consider a standard DNN. Denote {\mathbb{I}}=[0,1]\subset{\mathbb{R}} and [n]=\{1,\ldots,n\} for n\in{\mathbb{N}}_{+}. In this paper, we assume that {\mathcal{F}}:{\mathbb{I}}^{n}\to{\mathbb{R}}^{m} is a classification DNN for m objects, which has L hidden layers, all hidden layers use Relu as the activation function, and the output layer has no activation function. {\mathcal{F}} can be written as

\begin{array}[]{ll}x_{0}\in{\mathbb{I}}^{n},n_{0}=n,n_{L+1}=m;\\ x_{l}={\hbox{\rm{Relu}}}(W_{l}x_{l-1}+b_{l})\in{\mathbb{R}}^{n_{l}},W_{l}\in{\mathbb{R}}^{n_{l}\times n_{l-1}},b_{l}\in{\mathbb{R}}^{n_{l}},l\in[L];\\ {\mathcal{F}}(x_{0})=x_{L+1}=W_{L+1}x_{L}+b_{L+1},W_{L+1}\in{\mathbb{R}}^{m\times n_{L}},b_{L+1}\in{\mathbb{R}}^{m}.\\ \end{array} (1)

Denote \Theta=\{W_{l},b_{l}\}_{l=1}^{L+1}\in{\mathbb{R}}^{k} to be the parameter set of {\mathcal{F}}, and {\mathcal{F}} is denoted as {\mathcal{F}}_{\Theta} if the parameters need to be mentioned explicitly, where k=\sum_{l=1}^{L+1}n_{l}(n_{l-1}+1).
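To make the architecture in (1) concrete, the following is a minimal PyTorch sketch of such a fully connected Relu classifier; the class name and layer widths are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

class ReluNet(nn.Module):
    """Network of form (1): L Relu hidden layers, linear output layer."""
    def __init__(self, widths):  # widths = [n, n_1, ..., n_L, m]
        super().__init__()
        layers = []
        for i in range(len(widths) - 2):
            layers += [nn.Linear(widths[i], widths[i + 1]), nn.ReLU()]
        layers.append(nn.Linear(widths[-2], widths[-1]))  # output layer, no activation
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

# Example: inputs in I^n with n = 784, two hidden layers, m = 10 classes.
f = ReluNet([784, 256, 256, 10])
logits = f(torch.rand(32, 784))  # a batch of 32 samples in [0, 1]^n
```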

Let {\mathcal{F}}_{\Theta} be a trained network with parameter set \Theta. A new parameter set \Theta_{a} is called a set of adversarial parameters of \Theta if 1) \Theta_{a} is a small perturbation of \Theta; 2) the accuracy of {\mathcal{F}}_{\Theta_{a}} is almost the same as that of {\mathcal{F}}_{\Theta}; 3) {\mathcal{F}}_{\Theta_{a}} is much less robust compared to {\mathcal{F}}_{\Theta}, that is, {\mathcal{F}}_{\Theta_{a}} has more adversarial samples than {\mathcal{F}}_{\Theta}.

We assume that the objects to be classified satisfy a distribution D_{x} in {\mathbb{R}}^{n}, and a sample x\sim D_{x} is called a normal sample. Let \Theta\in{\mathbb{R}}^{k} be the parameter set of a trained network {\mathcal{F}}_{\Theta}:{\mathbb{I}}^{n}\to{\mathbb{R}}^{m}. For x\sim D_{x}, denote l_{x} to be the label of x and \widehat{{\mathcal{F}}}_{\Theta} to be the classification result of {\mathcal{F}}_{\Theta}. Then the accuracy of {\mathcal{F}}_{\Theta} for the normal samples is

A({\mathcal{F}}_{\Theta},D_{x})=P_{x\sim D_{x}}(\widehat{{\mathcal{F}}}_{\Theta}(x)=l_{x}). (2)

In order to measure the quality of adversarial parameters, we need a robustness measure R({\mathcal{F}}_{\Theta},D_{x}) of {\mathcal{F}}_{\Theta} for the normal samples. There exist several definitions for R({\mathcal{F}}_{\Theta},D_{x}) [20, 39]. In this paper, two kinds of robustness measures are used.

We first give two robustness measures of {\mathcal{F}}_{\Theta} at a given sample x_{0}. The robustness radius of x_{0} under the L_{p} norm for p\in{\mathbb{R}}_{+}\cup\{\infty\} is defined to be

R_{1}({\mathcal{F}}_{\Theta},x_{0})=\max\{\zeta\in{\mathbb{R}}_{+}\,|\,\widehat{{\mathcal{F}}}(x)=l_{x},\forall x\hbox{ s.t. }||x-x_{0}||_{p}\leq\zeta\}. (3)

If \widehat{{\mathcal{F}}}_{\Theta}(x_{0})\neq l_{x_{0}}, then the robustness radius of x_{0} is zero. It is difficult to estimate the robustness radius well, and the following approximation of the robustness radius under the L_{p} norm [12] is often used

R_{2}({\mathcal{F}}_{\Theta},x_{0})=\min_{l\in[m],l\neq l_{x}}\{\frac{|{\mathcal{F}}_{l_{x}}(x_{0})-{\mathcal{F}}_{l}(x_{0})|}{||\nabla{\mathcal{F}}_{l_{x}}(x_{0})-\nabla{\mathcal{F}}_{l}(x_{0})||_{q}}I({\mathcal{F}}_{l_{x}}(x_{0})>{\mathcal{F}}_{l}(x_{0}))\} (4)

where {\mathcal{F}}_{l}(x_{0}) is the l-th coordinate of {\mathcal{F}}(x_{0}), \nabla{\mathcal{F}}_{l}(x)=\frac{\nabla{\mathcal{F}}_{l}(t)}{\nabla t}|_{t=x}, p,q\in{\mathbb{R}}_{+}\cup\{\infty\} satisfy 1/q+1/p=1 (p=0(\infty) iff q=\infty(0)), and I(t)=1 if t is true and I(t)=0 otherwise.
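As an illustration of (4) under the L_{\infty} sample norm (so q=1), the following PyTorch autograd sketch approximates R_{2} at a single sample; the helper name and interface are assumptions made for illustration only.

```python
import torch

def approx_robust_radius(f, x0, label, q=1):
    """Approximate robustness radius R_2 of (4) at a sample x0 with true label `label`."""
    x = x0.clone().detach().requires_grad_(True)
    logits = f(x.unsqueeze(0)).squeeze(0)
    radius = float("inf")
    for l in range(logits.numel()):
        if l == label:
            continue
        if logits[label] <= logits[l]:
            return 0.0  # indicator I(F_lx > F_l) vanishes: x0 is not classified as l_x
        margin = logits[label] - logits[l]
        grad = torch.autograd.grad(margin, x, retain_graph=True)[0]
        radius = min(radius, margin.item() / (grad.norm(p=q).item() + 1e-12))
    return radius
```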

For a distribution D_{x} of samples, we define two global robustness measures corresponding to R_{1} and R_{2}. The adversarial accuracy can be used as R({\mathcal{F}}_{\Theta},D_{x}). For \epsilon\in{\mathbb{R}}_{+} and p\in{\mathbb{R}}_{+}\cup\{\infty\}, the adversarial accuracy of {\mathcal{F}}_{\Theta} is

\begin{array}[]{ll}R_{3}({\mathcal{F}}_{\Theta},D_{x},\epsilon)&=P_{x\sim D_{x}}(\epsilon\leq R_{1}({\mathcal{F}}_{\Theta},x))\\ &=P_{x\sim D_{x}}(\widehat{{\mathcal{F}}}_{\Theta}(x^{\prime})=l_{x},\forall x^{\prime}\hbox{ s.t. }||x^{\prime}-x||_{p}\leq\epsilon).\end{array} (5)

Corresponding to R_{2} in (4), we have the following global robustness measure

R_{4}({\mathcal{F}},D_{x})=\int_{x\sim D_{x}}R_{2}({\mathcal{F}},x)\,{\hbox{\rm{d}}}x. (6)

We now define a measurement for an adversarial parameter set using the accuracy and robustness of {\mathcal{F}}.

Definition 2.1.

Let \Theta_{a} be an adversarial parameter set of \Theta, R({\mathcal{F}},D_{x}) a robustness measure of {\mathcal{F}} for normal samples, and

P_{x\sim D_{x}}({{\mathcal{F}}}_{\Theta_{a}}(x)=l_{x})=\gamma_{1}P_{x\sim D_{x}}({{\mathcal{F}}}_{\Theta}(x)=l_{x}) (7)
R({\mathcal{F}}_{\Theta_{a}},D_{x})=\gamma_{2}R({\mathcal{F}}_{\Theta},D_{x}).

Then the adversarial rate of \Theta_{a} is defined to be \overline{\gamma_{1}}(1-\overline{\gamma_{2}}), where \overline{\gamma}=\min\{\gamma,1\}.

In general, we have \gamma_{1}\leq 1 and \gamma_{2}\leq 1. The value of \gamma_{1} measures the ability of \Theta_{a} to keep the accuracy of {\mathcal{F}}_{\Theta} on normal samples; if \gamma_{1} is large, the attack is more difficult to detect. The value of 1-\gamma_{2} measures the ability of \Theta_{a} to break the robustness of {\mathcal{F}}_{\Theta}; if 1-\gamma_{2} is large, the parameter attack is more powerful. Hence, the adversarial rate \overline{\gamma_{1}}(1-\overline{\gamma_{2}}) measures the quality of the adversarial parameter attack: the larger the adversarial rate, the better the attack. If \overline{\gamma_{1}}(1-\overline{\gamma_{2}}) achieves its maximal value 1, then \gamma_{1}=1 and \gamma_{2}=0, and the adversarial parameter attack is a perfect attack in that it does not change the accuracy of {\mathcal{F}}, but totally destroys the robustness of {\mathcal{F}}.

Remark 2.1.

In order to make the attack very hard to detect, we can impose a lower bound \gamma_{\rm{low}} on \gamma_{1}. If \gamma_{1}<\gamma_{\rm{low}}, we consider \Theta_{a} to be a failed attack.

2.2 Adversarial parameter attacks for other purposes

According to the requirements of specific applications, we may define other types of adversarial parameter attacks.

The adversarial parameters defined in section 2.1 are for all the samples. In certain applications, it is desired to make the network less robust on one specific class of samples, which motivates the following definitions.

The simplest case is adversarial parameters for a given sample. A small perturbation \Theta_{a} of \Theta is called a set of adversarial parameters for a given sample x_{0} if {\mathcal{F}}_{\Theta_{a}} gives the correct label to x_{0} and has adversarial samples of x_{0} in S_{\infty}(x_{0},\epsilon)=\{x\,|\,||x-x_{0}||_{\infty}\leq\epsilon\} for a given \epsilon\in{\mathbb{R}}_{+}. Let R({\mathcal{F}}_{\Theta},x_{0}) be a measure of the robustness of {\mathcal{F}}_{\Theta} at the sample x_{0}, and

R({\mathcal{F}}_{\Theta_{a}},x_{0})=\gamma R({\mathcal{F}}_{\Theta},x_{0}). (8)

Then the adversarial rate of \Theta_{a} is defined to be 1-\overline{\gamma}.

We can also break the robustness of samples with a given label. A small perturbation \Theta_{a} of \Theta is called a set of adversarial parameters for samples with a given label l_{0} if {\mathcal{F}}_{\Theta_{a}} keeps the accuracy for all normal samples and the robustness for normal samples whose label is not l_{0}, but breaks the robustness of samples with label l_{0}. Let \alpha=P_{x\sim D_{x}}(\widehat{{\mathcal{F}}}_{\Theta}(x)=l_{x}) and \beta=P_{x\sim D_{x}}(\widehat{{\mathcal{F}}}_{\Theta}(x^{\prime})=l_{x},\forall x^{\prime}\in B_{p}(x,\epsilon)). For such adversarial parameters \Theta_{a}, let

P_{x\sim D_{x}}(\widehat{{\mathcal{F}}}_{\Theta_{a}}(x)=l_{x})=\gamma_{1}\alpha (9)
P_{x\sim D_{x}}(\widehat{{\mathcal{F}}}_{\Theta_{a}}(x^{\prime})=l_{x},\forall x^{\prime}\in B_{p}(x,\epsilon)\,|\,l_{x}\neq l_{0})=\gamma_{2}\beta
P_{x\sim D_{x}}(\widehat{{\mathcal{F}}}_{\Theta_{a}}(x^{\prime})=l_{x},\forall x^{\prime}\in B_{p}(x,\epsilon)\,|\,l_{x}=l_{0})=\gamma_{3}\beta.

Then the adversarial rate of \Theta_{a} is defined as \overline{\gamma}_{1}\overline{\gamma}_{2}(1-\overline{\gamma}_{3}).

Finally, instead of breaking the robustness of samples with label l_{0}, we can break the accuracy for samples with label l_{0}. Such adversarial parameters are called direct adversarial parameters. Let \Theta_{a} be a direct adversarial parameter set and

P_{x\sim D_{x}}(\widehat{{\mathcal{F}}}_{\Theta_{a}}(x)=l_{x}\,|\,l_{x}\neq l_{0})=\gamma_{1}\alpha (10)
P_{x\sim D_{x}}(\widehat{{\mathcal{F}}}_{\Theta_{a}}(x^{\prime})=l_{x},\forall x^{\prime}\in B_{p}(x,\epsilon)\,|\,l_{x}\neq l_{0})=\gamma_{2}\beta
P_{x\sim D_{x}}(\widehat{{\mathcal{F}}}_{\Theta_{a}}(x)=l_{x}\,|\,l_{x}=l_{0})=\gamma_{3}\alpha.

Then the adversarial rate is defined as \overline{\gamma}_{1}\overline{\gamma}_{2}(1-\overline{\gamma}_{3}). The above definition is similar to the attacks in [18, 43], but robustness is considered as an extra objective.

3 Algorithm

In this section, we give algorithms to compute adversarial parameters.

3.1 Compute adversarial parameters under the L_{\infty} norm

We formulate the adversarial parameter attack for a trained DNN {\mathcal{F}}_{\Theta} under the L_{p} norm as the following optimization problem for a given \zeta\in{\mathbb{R}}_{+}:

\max_{\Theta_{a}\in{\mathbb{R}}^{k},||\Theta_{a}-\Theta||_{p}\leq\zeta}A({\mathcal{F}}_{\Theta_{a}},D_{x})/R({\mathcal{F}}_{\Theta_{a}},D_{x}) (11)

where A({\mathcal{F}}_{\Theta_{a}},D_{x}) and R({\mathcal{F}}_{\Theta_{a}},D_{x}) are the accuracy and a robustness measure of {\mathcal{F}}_{\Theta_{a}} over a sample distribution D_{x}.

Remark 3.1.

Theoretically, the adversarial parameter attack should be a multi-objective optimization problem, that is, to maximize the accuracy and to minimize the robustness. However, such an optimization problem is difficult to solve.

Remark 3.2.

In view of (11), it may seem better to define the adversarial rate as \gamma_{1}/\gamma_{2}. This is possible, but not as good as Definition 2.1 for the following reason: the adversarial rate \overline{\gamma_{1}}(1-\overline{\gamma_{2}}) has the optimal value 1 and gives a more intuitive view of the quality of the adversarial parameters.

In the rest of this section, we show how to turn formula (11) into an effective algorithm for computing L_{\infty} norm adversarial parameters using the robustness measure in (5). We first show how to compute the robustness in (5) explicitly. We use adversarial training [20] to do this, which is the most effective way to find adversarial samples. For a sample x and a small number \varepsilon\in{\mathbb{R}}_{+}, we first compute

\chi_{0}={\arg\max}_{\chi\in{\mathbb{R}}^{n},||\chi||_{\infty}\leq\varepsilon}L_{{\hbox{\scriptsize\rm{CE}}}}({\mathcal{F}}_{\Theta}(x+\chi),l_{x})

with PGD [20] and then use

L_{{\hbox{\scriptsize\rm{AT}}}}(x,\Theta)=L_{{\hbox{\scriptsize\rm{CE}}}}({\mathcal{F}}_{\Theta}(x+\chi_{0}),l_{x}) (12)

to measure the robustness of Θ{\mathcal{F}}_{\Theta} at xx.
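A minimal PGD sketch (following [20]) for computing \chi_{0} and the robustness loss L_{AT} of (12) is given below; the step size, iteration count, and helper name are illustrative defaults rather than values from the paper.

```python
import torch
import torch.nn.functional as F

def pgd_robust_loss(f, x, label, eps=8/255, step=2/255, iters=10):
    """Approximate L_AT(x, Theta) of (12): CE loss at the worst point found by PGD."""
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(f(x_adv), label)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + step * grad.sign()        # ascend the CE loss
        x_adv = x.detach() + (x_adv - x).clamp(-eps, eps)  # project to the L_inf ball
        x_adv = x_adv.clamp(0.0, 1.0)                      # stay inside I^n
    return F.cross_entropy(f(x_adv), label)
```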

We need a training set TT to find the adversarial parameters. The training procedure consists of two phases. In the first pre-training phase, the loss function

-\sum_{x\in T}L_{{\hbox{\scriptsize\rm{AT}}}}(x,\Theta) (13)

is used to reduce the adversarial accuracy of Θ{\mathcal{F}}_{\Theta}. In the second main training phase, the loss function

\frac{\sum_{x\in T}L_{{\hbox{\scriptsize\rm{CE}}}}({\mathcal{F}}_{\Theta}(x),l_{x})}{\sum_{x\in T}L_{{\hbox{\scriptsize\rm{AT}}}}(x,\Theta)} (14)

is used to promote the accuracy and keep the adversarial accuracy of {\mathcal{F}}_{\Theta} low, which corresponds to formula (11).

We will compute a more general L_{\infty} norm parameter perturbation. Let \Delta\in{\mathbb{R}}_{+}^{k} and let \Delta_{i} be the i-th coordinate of \Delta. Then the L_{\infty} parameter perturbation will be found in

B_{\infty}(\Theta,\Delta)=\{\Theta_{a}\in{\mathbb{R}}^{k}\,|\,|\Theta_{a}-\Theta|_{i}\leq\Delta_{i},\ \forall\ i\in[k]\}.

It is clear that the usual LL_{\infty} norm parameter perturbation is a special case of the above general case. We use this general form, because we want to include more types of parameter perturbations which are given in section 5.2. A sketch of the algorithm is given below.

Algorithm 1 Attack under the L_{\infty} norm
Input: The parameter set \Theta of {\mathcal{F}}; the hyper-parameters \alpha\in{\mathbb{R}}_{+}, \Delta\in{\mathbb{R}}_{+}^{k}, n_{1},n_{2}\in{\mathbb{N}}; a training set T.
Output: An adversarial parameter set \Theta_{a} in B_{\infty}(\Theta,\Delta).
1. Let \Theta_{a}=\Theta.
2. For all i\in[n_{1}+n_{2}]:
   If i<n_{1}: L=-\sum_{x\in T}\frac{1}{|T|}L_{{\hbox{\scriptsize\rm{AT}}}}(x,\Theta_{a}).
   Else: L=\frac{\sum_{x\in T}L_{{\hbox{\scriptsize\rm{CE}}}}({\mathcal{F}}_{\Theta_{a}}(x),l_{x})}{\sum_{x\in T}L_{{\hbox{\scriptsize\rm{AT}}}}(x,\Theta_{a})}.
   \widetilde{\Theta}=\Theta_{a}-\alpha\bigtriangledown L.
   \Theta_{a}={\hbox{\rm{Proj}}}(\widetilde{\Theta},B_{\infty}(\Theta,\Delta)).
3. Output \Theta_{a}.
Remark 3.3.

We give more details for the algorithm.

(1). {\hbox{\rm{Proj}}}(\widetilde{\Theta},B_{\infty}(\Theta,\Delta)) maps \widetilde{\Theta} into B_{\infty}(\Theta,\Delta) as follows: for i\in[k]:

If \widetilde{\Theta}_{i}>\Theta_{i}+\Delta_{i}, {\hbox{\rm{Proj}}}(\widetilde{\Theta},B_{\infty}(\Theta,\Delta))_{i}=\Theta_{i}+\Delta_{i};

If \widetilde{\Theta}_{i}<\Theta_{i}-\Delta_{i}, {\hbox{\rm{Proj}}}(\widetilde{\Theta},B_{\infty}(\Theta,\Delta))_{i}=\Theta_{i}-\Delta_{i};

If \Theta_{i}-\Delta_{i}\leq\widetilde{\Theta}_{i}\leq\Theta_{i}+\Delta_{i}, {\hbox{\rm{Proj}}}(\widetilde{\Theta},B_{\infty}(\Theta,\Delta))_{i}=\widetilde{\Theta}_{i}.

(2). We reduce the step size \alpha as the training progresses.
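Putting Algorithm 1 and Remark 3.3 together, a simplified sketch of the whole procedure might look as follows. It reuses the hypothetical pgd_robust_loss helper sketched above, treats (13) and (14) as losses to be minimized, and assumes theta0 and delta are lists of tensors matching f.parameters(); all names are illustrative.

```python
import torch
import torch.nn.functional as F

def adversarial_parameter_attack(f, data, theta0, delta, alpha, n1, n2):
    """Algorithm 1: perturb the parameters of f inside B_inf(Theta, Delta)."""
    for i in range(n1 + n2):
        ce, at = 0.0, 0.0
        for x, y in data:                       # data: iterable of (batch, labels) from T
            ce = ce + F.cross_entropy(f(x), y)  # L_CE terms
            at = at + pgd_robust_loss(f, x, y)  # L_AT terms
        L = -at if i < n1 else ce / at          # pre-training loss (13), then main loss (14)
        grads = torch.autograd.grad(L, list(f.parameters()))
        with torch.no_grad():
            for p, g, t0, d in zip(f.parameters(), grads, theta0, delta):
                p.sub_(alpha * g)                                 # descent step on L
                p.copy_(torch.max(torch.min(p, t0 + d), t0 - d))  # Proj onto B_inf(Theta, Delta)
    return [p.detach().clone() for p in f.parameters()]
```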

3.2 Algorithms for other kinds of adversarial parameters

The algorithms to find adversarial parameters under other norms and robustness measures can be developed similarly. In the following, we show how to compute adversarial parameters under the L_{0} norm, which differs from the other cases. The overall algorithm is similar to Algorithm 1, except that a new method is used to update the parameters. Suppose that \Theta=\{W_{l},b_{l}\}_{l=1}^{L} is the parameter set to be updated and the value of the loss L in Algorithm 1 has been computed. We only change some of the weight matrices W_{l}, as follows.

  • Randomly select two entries w_{1} and w_{2} of W_{l} until (\frac{\nabla L}{\nabla w_{1}}-\frac{\nabla L}{\nabla w_{2}})(w_{1}-w_{2})>0 is satisfied.

  • Exchange w_{1} and w_{2} in W_{l} to obtain the new parameters.

This change makes L smaller to first order. In total, we update a given number of weight matrices, and for each such matrix we change a given percentage of its entries. The details of the algorithm are omitted. Note that the above parameter perturbation keeps the sparsity and the values of the entries of the weight matrices; as a consequence, the Proj operator in the algorithm can be taken to be the identity map. A sketch of the swap update is given below.
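The following sketch applies the swap update to one weight matrix, assuming the gradient of the loss L with respect to W has already been computed; the function name and the max_tries guard are assumptions for illustration.

```python
import torch

def swap_entries(W, grad_W, num_pairs, max_tries=10000):
    """L0-style update: swap pairs of entries of W so that L decreases to first order."""
    flat, g = W.view(-1), grad_W.view(-1)
    swapped = 0
    for _ in range(max_tries):
        if swapped >= num_pairs:
            break
        i, j = torch.randint(flat.numel(), (2,)).tolist()
        # Condition of section 3.2: (dL/dw_i - dL/dw_j)(w_i - w_j) > 0.
        if i != j and (g[i] - g[j]) * (flat[i] - flat[j]) > 0:
            flat[i], flat[j] = flat[j].item(), flat[i].item()  # swap the two weights
            swapped += 1
    return W
```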

The adversarial parameters defined in section 2.2 can also be obtained similarly. For instance, to compute the adversarial parameters for one sample x, we simply take T in Algorithm 1 to be T=\{x\}.

To compute adversarial parameters for samples with a given label l0l_{0}, by (9) we can use the following loss function

\frac{\sum_{x\in T}L_{{\hbox{\scriptsize\rm{CE}}}}({\mathcal{F}}_{\Theta_{a}}(x),l_{x})+\sum_{x\in T\ \&\ l_{x}\neq l_{0}}L_{{\hbox{\scriptsize\rm{AT}}}}(x,\Theta_{a})}{\sum_{x\in T\ \&\ l_{x}=l_{0}}L_{{\hbox{\scriptsize\rm{AT}}}}(x,\Theta_{a})} (15)

to keep the accuracy for all samples, keep the robustness for samples whose labels are not l_{0}, and reduce the robustness for samples with label l_{0}.

To compute direct adversarial parameters for samples with label l0l_{0}, by (10) we can use the following loss function

\frac{\sum_{x\in T\ \&\ l_{x}\neq l_{0}}(L_{{\hbox{\scriptsize\rm{AT}}}}(x,\Theta_{a})+L_{{\hbox{\scriptsize\rm{CE}}}}({\mathcal{F}}_{\Theta_{a}}(x),l_{x}))}{\sum_{x\in T\ \&\ l_{x}=l_{0}}L_{{\hbox{\scriptsize\rm{CE}}}}({\mathcal{F}}_{\Theta_{a}}(x),l_{x})} (16)

to increase the robustness and accuracy for samples whose labels are not l_{0} and to reduce the accuracy for samples with label l_{0}. A sketch of both losses is given below.
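A sketch of the label-targeted losses (15) and (16), reusing the hypothetical pgd_robust_loss helper from section 3.1 for L_{AT} and per-sample cross-entropy for L_{CE}; it assumes the batch contains samples with label l_{0}, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def targeted_loss(f, batch_x, batch_y, l0, direct=False):
    """Loss (15) (break robustness on label l0) or loss (16) (directly break accuracy on l0)."""
    ce = F.cross_entropy(f(batch_x), batch_y, reduction="none")      # per-sample L_CE
    at = torch.stack([pgd_robust_loss(f, x.unsqueeze(0), y.unsqueeze(0))
                      for x, y in zip(batch_x, batch_y)])            # per-sample L_AT
    other, target = batch_y != l0, batch_y == l0
    if direct:   # (16): keep accuracy/robustness off l0, reduce accuracy on l0
        return (at[other].sum() + ce[other].sum()) / ce[target].sum()
    else:        # (15): keep accuracy everywhere and robustness off l0, reduce robustness on l0
        return (ce.sum() + at[other].sum()) / at[target].sum()
```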

4 Existence of adversarial parameters

In this section, we will show that adversarial parameters with high adversarial rates exist under certain conditions.

4.1 Adversarial parameters to achieve low adversarial accuracy

In this section, we use the robustness radius in (3) and the adversarial accuracy in (5) as the robustness measures; hence the existence of adversarial parameters implies low adversarial accuracy.

We introduce several notations. Let ||x||_{-\infty}=\min_{i\in[n]}\{|x_{i}|\} for x\in{\mathbb{R}}^{n}, and ||W||_{-\infty,2}=\min_{i\in[a]}\{||W^{(i)}||_{2}\} for W\in{\mathbb{R}}^{a\times b}, where W^{(i)} is the i-th row of W. If {\mathcal{F}} is a network, we use {\mathcal{F}}_{i}(x) to denote the i-th coordinate of {\mathcal{F}}(x).

In this section, we consider the following network {\mathcal{F}}_{\Theta}:{\mathbb{I}}^{n}\to{\mathbb{R}}^{m} with one hidden layer

{\mathcal{F}}(x)=W_{2}{\hbox{\rm{Relu}}}(W_{1}x+b_{1})+b_{2}, (17)

where W_{1}\in{\mathbb{R}}^{n_{1}\times n},b_{1}\in{\mathbb{R}}^{n_{1}},W_{2}\in{\mathbb{R}}^{m\times n_{1}},b_{2}\in{\mathbb{R}}^{m}. \Theta=\{W_{i},b_{i}\}_{i=1}^{2}\in{\mathbb{R}}^{k} is the parameter set of {\mathcal{F}}_{\Theta}, where k=(n+m+1)n_{1}+m.

The network defined in (17) has just one hidden layer. We will show that when the width of its hidden layer is large enough, adversarial parameters exist under certain conditions.

We will consider L_{\infty} adversarial parameters. For \gamma\in{\mathbb{R}}_{+}, the hypothesis space for the adversarial networks of {\mathcal{F}}_{\Theta} is

{\mathcal{H}}_{\gamma}(\Theta)=\{{\mathcal{F}}_{\Theta_{a}}\,|\,||\Theta_{a}-\Theta||_{\infty}<\gamma\}.

The following theorem shows the existence of adversarial parameters for a given sample x0x_{0}. The proof of the theorem is given in section 6.1.

Theorem 4.1.

Let {\mathcal{F}}_{\Theta} be a trained network with the structure in (17), which gives the correct label l_{x_{0}} for a sample x_{0}. Further assume the following conditions.

C_{1}.

Let a,A\in{\mathbb{R}}_{+} be such that |{\mathcal{F}}_{i}(x)-{\mathcal{F}}_{j}(x)|<A for all i,j\in[m] and x\in S_{\infty}(x_{0},a)=\{x\,|\,||x-x_{0}||_{\infty}\leq a\}.

C_{2}.

||W_{2}^{(i)}-W_{2}^{(j)}||_{-\infty}>c for all i,j\in[m], i\neq j.

C_{3}.

At least \eta n_{1} coordinates of |{\hbox{\rm{Relu}}}(W_{1}x+b_{1})| are bigger than b, where \eta\in(0,1) and b\in{\mathbb{R}}_{+}.

For \gamma,\epsilon\in{\mathbb{R}}_{+} such that \epsilon<a, if n_{1}>\frac{2A}{\min\{\epsilon\gamma(n-1),b\}c\eta}, then there exists an {\mathcal{F}}_{\Theta_{a}}\in{\mathcal{H}}_{\gamma}(\Theta) such that \widehat{{\mathcal{F}}}_{\Theta_{a}}(x_{0})=l_{x_{0}} and {\mathcal{F}}_{\Theta_{a}} has adversarial samples of x_{0} in S_{\infty}(x_{0},\epsilon).

We have the following observations from Theorem 4.1.

Corollary 4.1.

If the robustness radius in (3) is used as the robustness measure, then the adversarial rate of {\mathcal{F}}_{\Theta_{a}} in Theorem 4.1 is bigger than 1-\frac{\epsilon}{a}.

Remark 4.1.

From Theorem 4.1, if the width of {\mathcal{F}}_{\Theta} is sufficiently large, then {\mathcal{F}}_{\Theta} has adversarial parameters which are as close to \Theta as desired, and {\mathcal{F}}_{\Theta_{a}} has adversarial samples which are as close to x_{0} as desired.

Remark 4.2.

Since {\mathcal{F}}_{\Theta} is a continuous function of \Theta, if \Theta_{a} is an adversarial parameter set for \Theta then there exists a small sphere S_{a} centered at \Theta_{a} such that all parameters in S_{a} are also adversarial parameters for \Theta.

From the above two remarks, we may say that adversarial parameters are inevitable in this case.

The following theorem shows that when n_{1} is large enough, adversarial parameters exist for a distribution of samples with high probability. The proof of the theorem is given in section 6.1.

Theorem 4.2.

Let {\mathcal{F}}_{\Theta} be a trained DNN with the structure in (17) and S\subset{\mathbb{I}}^{n} the set of normal samples. Further assume the following conditions.

C_{1}.

Let A,a\in{\mathbb{R}}_{+} be such that |{\mathcal{F}}_{i}(x)-{\mathcal{F}}_{j}(x)|<A for all i,j\in[m] and x\in\bigcup_{x_{0}\in S}S_{\infty}(x_{0},a).

C_{2}.

||W_{2}^{(i)}-W_{2}^{(j)}||_{-\infty}>c for all i,j\in[m], i\neq j, where c\in{\mathbb{R}}_{+}.

C_{3}.

For all x\in S, at least \eta n_{1} coordinates of |{\hbox{\rm{Relu}}}(W_{1}x+b_{1})| are bigger than b, where \eta\in(0,1) and b\in{\mathbb{R}}_{+}.

C_{4}.

The dimension of S is lower than n-m.

For \epsilon,\gamma\in{\mathbb{R}}_{+} such that \epsilon<a, if n_{1}>\frac{2A}{\min\{\epsilon\gamma/m,b\}c\eta}, then there exists an {\mathcal{F}}_{\Theta_{a}}\in{\mathcal{H}}_{\gamma}(\Theta) such that the accuracy of {{\mathcal{F}}}_{\Theta_{a}} over D_{x} is greater than or equal to that of {{\mathcal{F}}}_{\Theta} and

P_{x\sim D_{x}}({{\mathcal{F}}}_{\Theta_{a}}\ \hbox{has an adversarial sample of}\ x\ \hbox{in}\ S_{\infty}(x,\epsilon))\geq 0.5.
Corollary 4.2.

Let the adversarial accuracy of {\mathcal{F}}_{\Theta} be \theta=R_{3}({\mathcal{F}}_{\Theta},D_{x},\epsilon). Then Theorem 4.2 implies that there exist adversarial parameters \Theta_{a} of \Theta with adversarial rate at least 1-0.5/\theta.

Remark 4.3.

The conditions of Theorems 4.1 and 4.2 can be satisfied for most DNNs. The parameters A and a in condition C_{1} clearly exist. Since the training procedure usually terminates near a local minimum of the loss function, the weights can be considered as random values [39], and hence conditions C_{2} and C_{3} can be satisfied. For practical examples such as MNIST and CIFAR-10, condition C_{4} is clearly satisfied.

4.2 Adversarial parameters for DNNs

In this section, we consider networks of the following form

\begin{array}[]{ll}x_{0}\in{\mathbb{I}}^{n},n_{L+1}=m;\\ x_{l}={\hbox{\rm{Relu}}}(W_{l}x_{l-1})\in{\mathbb{R}}^{n},W_{l}\in{\mathbb{R}}^{n\times n},l\in[L];\\ {\mathcal{F}}(x_{0})=x_{L+1}=W_{L+1}x_{L}\in{\mathbb{R}}^{m},W_{L+1}\in{\mathbb{R}}^{m\times n}.\\ \end{array} (18)

Let \Theta=\{W_{i}\}_{i=1}^{L+1} be the parameter set, \Theta\in{\mathbb{R}}^{k}, where k=Ln^{2}+mn. We use {\mathcal{F}}^{l}_{\Theta}(x) to denote the output of the l-th layer of {\mathcal{F}}_{\Theta}(x), where l\in[L]. We will show that adversarial parameters exist when L is large.

We first prove the existence of adversarial parameters for a given sample. We use the following robustness measure for the network {\mathcal{F}} at a sample x

R({\mathcal{F}},x)=\min_{l\neq l_{x}}\{\frac{|{\mathcal{F}}_{l_{x}}(x)-{\mathcal{F}}_{l}(x)|^{2}}{||\nabla({\mathcal{F}}_{l_{x}}(x))-\nabla({\mathcal{F}}_{l}(x))||_{2}^{2}}I({\mathcal{F}}_{l_{x}}(x)>{\mathcal{F}}_{l}(x))\}.

It is easy to see that this is the square of R_{2}({\mathcal{F}},x) in (4) with p=2.

Theorem 4.3.

Let {\mathcal{F}}_{\Theta} be a trained network with the structure in (18), which gives the correct label for a sample x_{0}\in{\mathbb{R}}^{n}. Further assume the following conditions.

C_{1}.

||\frac{\nabla{\mathcal{F}}_{i}(t)}{\nabla t}|_{t=x_{0}}||_{2}<\sqrt{A} for i\in[m].

C_{2}.

||\frac{\nabla{\mathcal{F}}^{l}(t)}{\nabla t}|_{t=x_{0}}||_{-\infty,2}>b for l\in[L].

C_{3}.

||\frac{\nabla{\mathcal{F}}_{i}(t)-\nabla{\mathcal{F}}_{j}(t)}{\nabla{\mathcal{F}}^{l}(t)}|_{t=x_{0}}||_{-\infty}>c for i,j\in[m], i\neq j and l\in[L].

C_{4}.

For l\in[L], \frac{\nabla{\mathcal{F}}^{l}(t)}{\nabla t}|_{t=x_{0}} has a column L_{l} such that the angle between L_{l} and {\mathcal{F}}^{l}(x_{0}) is bigger than \alpha and smaller than \pi-\alpha, where \alpha\in[0,\pi/2].

Then for \gamma\in{\mathbb{R}}_{+}, there exists an {\mathcal{F}}_{\Theta_{a}}\in{\mathcal{H}}_{\gamma}(\Theta) such that \widehat{{\mathcal{F}}}_{\Theta_{a}}(x_{0})=l_{x_{0}} and

R({\mathcal{F}}_{\Theta_{a}},x_{0})\leq(1-\eta)R({\mathcal{F}}_{\Theta},x_{0})

where \eta=\frac{\gamma^{2}((L-1)(\sin(r)cb)^{2}+c^{2}+(2\sin(r)b)^{2})}{4A+\gamma^{2}((L-1)(\sin(r)cb)^{2}+c^{2}+(2\sin(r)b)^{2})}. In other words, there exists an {\mathcal{F}}_{\Theta_{a}} with adversarial rate \geq\eta.

The proof of the theorem is given in section 6.2. As a consequence of Theorem 4.3, there exist adversarial parameters for the sample x_{0} whose robustness measure is as small as possible.

Corollary 4.3.

For \rho\in(0,1), if L\geq\frac{4(1-\rho)A}{\rho(\gamma\sin(r)cb)^{2}}+1, then the adversarial rate of {\mathcal{F}}_{\Theta_{a}} is \geq 1-\rho.

Corollary 4.4.

For \tau\in(0,1) satisfying \tau<R({\mathcal{F}},x_{0}), if L>\frac{4A(R({\mathcal{F}},x_{0})/\tau-1)}{(\gamma\sin(r)cb)^{2}}+1, then R({\mathcal{F}}_{\Theta_{a}},x_{0})\leq\tau.

To find adversarial parameters for samples under a distribution D_{x}, we use the following robustness measure for {\mathcal{F}}:

R({\mathcal{F}},D_{x})=\frac{\int_{x\sim D_{x}}\min_{j\neq l_{x}}\{||{\mathcal{F}}_{l_{x}}(x)-{\mathcal{F}}_{j}(x)||_{2}^{2}\,I({\mathcal{F}}_{l_{x}}(x)>{\mathcal{F}}_{j}(x))\}{\hbox{\rm{d}}}x}{\int_{x\sim D_{x}}\max_{j\neq l_{x}}\{||\nabla{\mathcal{F}}_{l_{x}}(x)-\nabla{\mathcal{F}}_{j}(x)||_{2}^{2}\}{\hbox{\rm{d}}}x}.

This is a variant of R_{4}({\mathcal{F}},D_{x}) in (6) with p=2. The following theorem shows that adversarial parameters exist for this robustness measure. The proof of the theorem is given in section 6.2.

Theorem 4.4.

Let {\mathcal{F}}_{\Theta} be a trained DNN with the structure in (18) and S\subset{\mathbb{I}}^{n} the set of normal samples satisfying the distribution D_{x}. Further assume the following conditions.

C_{1}.

||\frac{\nabla{\mathcal{F}}_{i}(t)}{\nabla t}|_{t=x}||_{2}<\sqrt{A} for all samples x\in S and i\in[m].

C_{2}.

P_{x\sim D_{x}}(\forall l\neq l_{x},\ ||\frac{\nabla({\mathcal{F}}_{l}(t)-{\mathcal{F}}_{l_{x}}(t))}{\nabla{\mathcal{F}}^{l}(t)}|_{t=x}||_{-\infty}>c_{l})>\alpha_{l}, where l\in[L] and c_{l},\alpha_{l}\in{\mathbb{R}}_{+}.

C_{3}.

P_{x\sim D_{x}}(||v\frac{\nabla{\mathcal{F}}^{l}(t)}{\nabla t}|_{t=x}||_{\infty}\geq d_{l}||v||_{\infty})>\beta_{l} for all v\in{\mathbb{R}}^{1\times n}, where l\in[L] and d_{l},\beta_{l}\in{\mathbb{R}}_{+}.

C_{4}.

\{{\mathcal{F}}^{l}(x)\}_{x\in S} is in a low-dimensional subspace of {\mathbb{R}}^{n} and ||{\mathcal{F}}^{l}(x)||_{0}>\gamma_{l}/n, where l\in[L], x\in S, and \gamma_{l}\in{\mathbb{R}}_{+}.

For \gamma\in{\mathbb{R}}_{+}, let {\mathcal{H}}(\gamma) be the set of networks in {\mathcal{H}}_{\gamma}(\Theta) whose accuracies are equal to or larger than that of {\mathcal{F}}_{\Theta}. Then

\min_{\widetilde{{\mathcal{F}}}\in{\mathcal{H}}(\gamma)}\{R(\widetilde{{\mathcal{F}}},D_{x})\}\leq(1-\rho)R({\mathcal{F}},D_{x})

where \rho=\frac{(\gamma c_{1})^{2}\alpha_{1}\gamma_{1}+\sum_{i=2}^{L}(\gamma c_{i}d_{i-1})^{2}\gamma_{i}(\alpha_{i}+\beta_{i-1}-1)+\beta_{L}(d_{L}\gamma)^{2}}{4A+(\gamma c_{1})^{2}\alpha_{1}\gamma_{1}+\sum_{i=2}^{L}(\gamma c_{i}d_{i-1})^{2}\gamma_{i}(\alpha_{i}+\beta_{i-1}-1)+\beta_{L}(d_{L}\gamma)^{2}}. In other words, there exists an {\mathcal{F}}_{\Theta_{a}} with adversarial rate \geq\rho.

We can make the robustness of the perturbed network as small as possible.

Corollary 4.5.

In Theorem 4.4, if \alpha,\beta,c,d\in{\mathbb{R}}_{+} satisfy \alpha_{l}>\alpha, \beta_{l}>\beta, c_{l}>c, d_{l}>d, \gamma_{l}>\gamma_{low} for l\in[L], then

\min_{\widetilde{{\mathcal{F}}}\in{\mathcal{H}}(\gamma)}\{R(\widetilde{{\mathcal{F}}},D_{x})\}\leq(1-\frac{(\gamma c)^{2}\alpha\gamma_{low}+(L-1)(\gamma cd)^{2}\gamma_{low}(\alpha+\beta-1)+\beta(d\gamma)^{2}}{4A+(\gamma c)^{2}\alpha\gamma_{low}+(L-1)(\gamma cd)^{2}\gamma_{low}(\alpha+\beta-1)+\beta(d\gamma)^{2}})R({\mathcal{F}},D_{x}).

Furthermore, for \rho\in(0,1), if L>1+\frac{4(1-\rho)A}{\rho((\gamma cd)^{2}\gamma_{low}(\alpha+\beta-1))}, then there exists an {\mathcal{F}}_{\Theta_{a}}\in{\mathcal{H}}(\gamma) whose adversarial rate is \geq 1-\rho.

Moreover, for \tau\in(0,1) satisfying \tau<R({\mathcal{F}},D_{x}), if L>\frac{4A(R({\mathcal{F}},D_{x})/\tau-1)}{(\gamma cd)^{2}\gamma_{low}(\alpha+\beta-1)}+1, then there exists an {\mathcal{F}}_{\Theta_{a}}\in{\mathcal{H}}(\gamma) such that R({\mathcal{F}}_{\Theta_{a}},D_{x})\leq\tau.

Remark 4.4.

From Corollary 4.5, if the depth of the DNN is sufficiently large, then there exist adversarial parameters such that the attacked network has a robustness measure as small as desired.

Remark 4.5.

In practical computation, we use a finite set T of samples satisfying D_{x}, and R({\mathcal{F}},D_{x}) is approximately computed as \widetilde{R}({\mathcal{F}},T)=1/|T|\sum_{x\in T}R({\mathcal{F}},x). Since {\mathcal{F}}_{\Theta} is a continuous function of \Theta, if \Theta_{a} is an adversarial parameter set for \Theta and \widetilde{R}({\mathcal{F}},T) is used as the robustness measure, then there exists a small sphere S_{a} centered at \Theta_{a} such that all parameters in S_{a} are also adversarial parameters for \Theta.

The above remarks show that adversarial parameters are inevitable in a certain sense.

Remark 4.6.

In the model (18), two simplifications are made. However, the results proved in this section can be generalized to general DNNs. First, the bias vectors are not considered; they can be included as parts of the weight matrices by extending the input space slightly, similar to [22]. Second, it is assumed that n_{l}=n for l\in[L]. This assumption could be removed by setting n=\max_{l\in[L]}n_{l}.

Remark 4.7.

Using R_{4}({\mathcal{F}},D_{x}), the results in Theorem 4.4 cannot be obtained yet. However, we will use numerical experiments to show that the result is also valid for R_{4}({\mathcal{F}},D_{x}).

5 Experimental results

5.1 The setting

We use two networks: VGG19 [29] and Resnet56 [11]. We write VGG19 as {\mathcal{F}}_{V} and Resnet56 as {\mathcal{F}}_{R}; both are trained with adversarial training [20]. The experimental results are for the CIFAR-10 dataset.

We use both the adversarial accuracy in (5) and the approximate robustness radius in (6) to compute the adversarial rate. For a given data set T, the adversarial accuracy defined in (5) can be approximately computed with PGD [20] as follows

\widetilde{R}_{3}({\mathcal{F}}_{\Theta},T,\epsilon)=1/|T|\sum_{x\in T}I(\widehat{{\mathcal{F}}}_{\Theta}(x^{\prime})=l_{x})

where x^{\prime}=\arg\max_{||\widetilde{x}-x||_{\infty}\leq\epsilon}L_{{\hbox{\scriptsize\rm{CE}}}}({\mathcal{F}}_{\Theta}(\widetilde{x}),l_{x}). In the experiments, we set \epsilon=8/255. The approximate robustness radius in (6) can be computed as follows

\widetilde{R}_{4}({\mathcal{F}},T)=\frac{1}{|T|}\sum_{x\in T}\min_{l\neq l_{x}}\{\frac{{\mathcal{F}}_{l_{x}}(x)-{\mathcal{F}}_{l}(x)}{||\nabla{\mathcal{F}}_{l_{x}}(x)-\nabla{\mathcal{F}}_{l}(x)||_{1}}I({\mathcal{F}}_{l_{x}}(x)>{\mathcal{F}}_{l}(x))\}

where the L_{1} norm is used, since we consider L_{\infty} adversarial samples.
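A sketch of how these two empirical measures might be evaluated on a finite set T, reusing the hypothetical approx_robust_radius helper sketched in section 2.1 (with q=1 for the L_{1} gradient norm) and an assumed pgd_attack helper that returns the PGD adversary x':

```python
import torch

def empirical_measures(f, samples, labels, eps=8/255):
    """Approximate adversarial accuracy R~_3 and average radius R~_4 over a finite set T."""
    robust, radii = 0, []
    for x, y in zip(samples, labels):
        x_adv = pgd_attack(f, x.unsqueeze(0), y.unsqueeze(0), eps)  # hypothetical PGD helper
        robust += int(f(x_adv).argmax(dim=1).item() == y.item())
        radii.append(approx_robust_radius(f, x, y.item(), q=1))     # R_2 with L_1 gradient norm
    return robust / len(samples), sum(radii) / len(samples)
```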

The accuracies, adversarial accuracies, and AARs of {\mathcal{F}}_{V} and {\mathcal{F}}_{R} under the L_{\infty} norm attack are given in Table 1; these are close to the state-of-the-art results for these DNNs.

Net AC AA AAR
{\mathcal{F}}_{V} 80% 45% 0.0770
{\mathcal{F}}_{R} 83% 49% 0.0194
Table 1: Results for {\mathcal{F}}_{V} and {\mathcal{F}}_{R} on CIFAR-10. AC: accuracy, AA: adversarial accuracy, AAR: approximate robustness radius

5.2 Adversarial parameter attack

Let \Theta be the parameter set of {\mathcal{F}}_{V} or {\mathcal{F}}_{R}. Two kinds of parameter perturbation attacks will be carried out:

L_{\infty,\gamma} perturbation for \gamma\in{\mathbb{R}}_{+}: We consider parameter perturbations in B_{\infty}(\Theta,\Delta_{\gamma}), where \Delta_{\gamma}=(\gamma|\theta_{1}|,\ldots,\gamma|\theta_{k}|) for \Theta=(\theta_{1},\ldots,\theta_{k}); in other words, \gamma is the perturbation ratio (see the sketch after this list). The BN layers are also perturbed in this kind of attack.

L_{0,k} perturbation for k\in{\mathbb{N}}_{>0}: k weight matrices are perturbed, and \max\{400,1\%\#W_{l}\} pairs of weights are changed for {\mathcal{F}}_{V} with the method given in section 3.2 (1\%\#W_{l} pairs of weights are changed for {\mathcal{F}}_{R}), where \#W_{l} is the number of entries of W_{l}. The BN layers are not changed in this kind of attack.
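The per-parameter budget \Delta_{\gamma} of the L_{\infty,\gamma} attack can be built directly from the trained parameters; a minimal sketch (the helper name is an assumption) usable with the projection step of Algorithm 1:

```python
import torch

def relative_budget(f, gamma):
    """Delta_gamma = (gamma*|theta_1|, ..., gamma*|theta_k|): per-parameter L_inf budget."""
    return [gamma * p.detach().abs() for p in f.parameters()]

# Example: a 2% relative perturbation budget.
# delta = relative_budget(f, gamma=0.02)
```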

We set \gamma_{\rm{low}} in Remark 2.1 to be 90\%; that is, if the accuracy of the perturbed network is less than 90\% of that of the original DNN, then the attack is considered failed.

5.2.1 Random parameter perturbation

We apply random parameter perturbations and use them as a baseline for comparison. The results are given in Table 2.

Attack AC AA AR
No attack 80% 45% 0
L_{\infty,0.02} 80% 39% 0.13
L_{\infty,0.04} 80% 37% 0.17
L_{\infty,0.06} 79% 36% 0.2
L_{\infty,0.08} 78% 35% 0.22
L_{\infty,0.10} 78% 34% 0.24
L_{0,8} 67% 23% 0.41 (fail)
L_{0,12} 61% 20% 0.42 (fail)
L_{0,16} 57% 19% 0.41 (fail)
Table 2: Random perturbations for {\mathcal{F}}_{V}. AC: accuracy, AA: adversarial accuracy, AR: adversarial rate

For the L_{\infty} perturbations, the accuracies are kept high, but the robustness does not decrease much, so the adversarial rates are low. For the L_{0} perturbations, the accuracy decreases too much, and these are considered failed attacks. In either case, random perturbations are not good adversarial parameters. Thus adversarial parameters are sparse around the trained parameters, which is similar to adversarial samples [40].

Attack AC AA AR
No attack 83.1% 48.5% 0
L_{\infty,0.02} 82.9% 48.0% 0.004
L_{\infty,0.04} 82.7% 48.0% 0.011
L_{\infty,0.06} 82.3% 47.5% 0.022
L_{\infty,0.08} 81.7% 46.6% 0.039
L_{\infty,0.10} 81.0% 45.5% 0.061
L_{0,10} 81.5% 47.8% 0.016
L_{0,20} 80.7% 45.8% 0.054
L_{0,30} 80.5% 45.7% 0.057
Table 3: Random perturbations for {\mathcal{F}}_{R}. AC: accuracy, AA: adversarial accuracy, AR: adversarial rate

Results of random perturbations for {\mathcal{F}}_{R} are given in Table 3. From the results, we can see that the network {\mathcal{F}}_{R} is much more robust against random parameter perturbations than {\mathcal{F}}_{V}.

5.2.2 Adversarial parameter attack on {\mathcal{F}}_{V} and {\mathcal{F}}_{R}

We use the algorithms in section 3 to create adversarial parameters. The training set T contains 500 samples for which {\mathcal{F}} gives the correct label. The average results are given in Table 4.

Attack AC AA in (5): \widetilde{R}_{3}({\mathcal{F}},T), AR AAR in (6): \widetilde{R}_{4}({\mathcal{F}},T), AR
No attack 80% 45% 0 0.0770 0
L_{\infty,0.02} 78% 38% 0.15 0.0667 0.13
L_{\infty,0.04} 77% 30% 0.32 0.0481 0.36
L_{\infty,0.06} 76% 22% 0.49 0.0372 0.49
L_{\infty,0.08} 76% 10% 0.74 0.0195 0.71
L_{\infty,0.10} 77% 8% 0.79 0.0143 0.78
L_{0,8} 72% 27% 0.36 0.0441 0.38
L_{0,12} 76% 24% 0.44 0.0443 0.40
L_{0,16} 74% 22% 0.47 0.0404 0.44
Table 4: Adversarial parameter attack for {\mathcal{F}}_{V}. AC: accuracy, AA: adversarial accuracy, AAR: approximate robustness radius, AR: adversarial rate.

Comparing Tables 2 and 4, we can see that the algorithms in section 3 can be used to create good adversarial parameters, especially for the L_{\infty} attack. From Figure 1, we can see that the adversarial rate and adversarial accuracy for the L_{\infty,\gamma} attack are nearly linear in \gamma when \gamma is small, and gradually stabilize as \gamma increases. Also, when \gamma is very small, say \gamma=0.02, the adversarial parameter attacks do not create good results, which means the network is approximately safe against these attacks for \gamma\leq 0.02. For the L_{0} attacks, the accuracies are much higher compared to the random perturbations, and adversarial parameters are obtained successfully. Also, for the two kinds of robustness measures, the adversarial rates are very close. For network {\mathcal{F}}_{R}, similar results are obtained and are given in Table 5.

Figure 1: Left: Relation between \gamma and the adversarial rate. Right: Relation between \gamma and the adversarial accuracy.
Attack AC AA in (5): \widetilde{R}_{3}({\mathcal{F}},T), AR AAR in (6): \widetilde{R}_{4}({\mathcal{F}},T), AR
No attack 83% 49% 0 0.0194 0
L_{\infty,0.02} 84% 39% 0.20 0.0151 0.22
L_{\infty,0.04} 85% 27% 0.45 0.0127 0.35
L_{\infty,0.06} 86% 14% 0.72 0.0092 0.53
L_{\infty,0.08} 87% 6% 0.87 0.0064 0.67
L_{\infty,0.10} 87% 1% 0.98 0.0044 0.77
L_{0,10} 80% 26% 0.45 0.0193 0
L_{0,20} 78% 18% 0.59 0.0187 0.03
L_{0,30} 72% 9% 0.71 0.0125 0.33
Table 5: Adversarial parameter attack for {\mathcal{F}}_{R}. AC: accuracy, AA: adversarial accuracy, AAR: approximate robustness radius, AR: adversarial rate.

5.2.3 Effect of network depth and width on the adversarial parameter attack

We check how the network depth and width affect the adversarial parameter attack. We use the L_{\infty,\gamma} adversarial parameter attack for \gamma=0.02,0.04. Let {\mathcal{F}}^{k}_{V} be the network which has the same width as {\mathcal{F}}_{V} but k more layers, and {\mathcal{F}}_{V}(\alpha) the network which has the same depth as {\mathcal{F}}_{V} but \alpha times the width of {\mathcal{F}}_{V}. The results are given in Table 6. We can see that when the depth becomes larger, the attack becomes easier. This validates the results in section 4.2, for instance Corollary 4.5, which shows that when the depth of the network becomes large, adversarial parameters exist.

The attack is much less sensitive to the width. The reason may be that there is much redundancy in the width, similar to the results in [17, 25, 26]; this redundancy can lead to limited search directions in the feature space and poor generalization performance, as shown in [21], so the attack is hard to improve when the width becomes larger.

Network \gamma=0 (AC, AA) \gamma=0.02 (AC, AA, AR) \gamma=0.04 (AC, AA, AR)
{\mathcal{F}}_{V} 80% 45% 78% 38% 0.15 77% 30% 0.32
{\mathcal{F}}^{8}_{V} 80% 47% 76% 37% 0.20 78% 32% 0.31
{\mathcal{F}}^{16}_{V} 78% 44% 74% 35% 0.19 75% 27% 0.37
{\mathcal{F}}^{24}_{V} 79% 43% 73% 30% 0.28 73% 20% 0.49
{\mathcal{F}}^{32}_{V} 76% 44% 72% 28% 0.34 71% 19% 0.53
{\mathcal{F}}_{V}(1.25) 80% 42% 77% 37% 0.12 76% 29% 0.29
{\mathcal{F}}_{V}(1.5) 81% 41% 78% 36% 0.12 77% 30% 0.26
{\mathcal{F}}_{V}(2) 78% 44% 76% 39% 0.11 74% 31% 0.28
{\mathcal{F}}_{V}(2.5) 80% 44% 79% 35% 0.20 76% 30% 0.30
Table 6: Effect of width and depth on the adversarial parameter attack for {\mathcal{F}}_{V}. AC: accuracy, AA: adversarial accuracy, AR: adversarial rate.

We can also use \widetilde{R}_{4}({\mathcal{F}},T) to measure the robustness; similar results are obtained and are given in Table 7.

Network \gamma=0 (\widetilde{R}_{4}({\mathcal{F}},T)) \gamma=0.02 (\widetilde{R}_{4}({\mathcal{F}},T), AR) \gamma=0.04 (\widetilde{R}_{4}({\mathcal{F}},T), AR)
{\mathcal{F}}_{V} 0.0770 0.0667 0.13 0.0481 0.36
{\mathcal{F}}^{8}_{V} 0.0776 0.0686 0.11 0.0534 0.30
{\mathcal{F}}^{16}_{V} 0.0815 0.0652 0.19 0.0479 0.40
{\mathcal{F}}^{24}_{V} 0.0817 0.0630 0.21 0.0388 0.49
{\mathcal{F}}^{32}_{V} 0.0808 0.0608 0.23 0.0392 0.48
{\mathcal{F}}_{V}(1.25) 0.0750 0.0663 0.11 0.0524 0.29
{\mathcal{F}}_{V}(1.5) 0.0763 0.0670 0.12 0.0563 0.25
{\mathcal{F}}_{V}(2) 0.0750 0.0650 0.13 0.0499 0.32
{\mathcal{F}}_{V}(2.5) 0.0775 0.0678 0.12 0.0489 0.35
Table 7: Effect of width and depth on the adversarial parameter attack for {\mathcal{F}}_{V}, measured by \widetilde{R}_{4}({\mathcal{F}},T). AR: adversarial rate

5.3 Direct adversarial parameters

We give experimental results for the direct adversarial parameters introduced in section 2.2. We try to decrease the accuracy for samples with label 0 while keeping the accuracy and robustness for other samples. The experimental results are for the network {\mathcal{F}}_{V} on CIFAR-10 and are given in Table 8.

Attack AC1 AA1 AC0 AR
L_{\infty,0.02} 77% 35% 11% 0.65
L_{\infty,0.04} 78% 40% 3% 0.83
L_{\infty,0.06} 79% 42% 1% 0.92
L_{\infty,0.08} 80% 43% 1% 0.95
L_{\infty,0.1} 80% 45% 1% 0.99
Table 8: Direct adversarial parameter attack for {\mathcal{F}}_{V}. AC1 and AA1 are the accuracy and adversarial accuracy for samples with label \neq 0, AC0 is the accuracy for samples with label 0, AR: adversarial rate.

Comparing Tables 4 and 8, we can see that direct adversarial parameters for a given label are much easier to compute than adversarial parameters. High-quality direct adversarial parameters can be obtained with perturbation ratios of 6\%-10\%. Results for network {\mathcal{F}}_{R} are given in Table 9, from which we can see that it is slightly more difficult to attack {\mathcal{F}}_{R}.

Attack AC1 AA1 AC0 AR
L_{\infty,0.02} 82% 46% 52% 0.38
L_{\infty,0.04} 81% 47% 35% 0.57
L_{\infty,0.06} 81% 47% 10% 0.83
L_{\infty,0.08} 81% 46% 4% 0.90
L_{\infty,0.1} 80% 46% 1% 0.93
Table 9: Direct adversarial parameter attack for {\mathcal{F}}_{R}. AC1 and AA1 are the accuracy and adversarial accuracy for samples with label \neq 0, AC0 is the accuracy for samples with label 0, AR: adversarial rate.

5.4 Adversarial parameters for a given sample

We give experimental results for the adversarial parameters for a given sample introduced in Section 2.2. \widetilde{R}_{2}({\mathcal{F}},x)=\min_{i\neq l_{x}}\{\frac{{\mathcal{F}}_{l_{x}}(x)-{\mathcal{F}}_{i}(x)}{||\nabla{\mathcal{F}}_{l_{x}}(x)-\nabla{\mathcal{F}}_{i}(x)||_{1}}\} is used to measure the robustness of {\mathcal{F}} at a sample x. Let S be a subset of the test set containing 100 samples such that {\mathcal{F}} gives the correct label to all of them and all samples in S are robust, in the sense that no adversarial samples were found using PGD-10 with L_{\infty}=\frac{8}{255}.
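A direct autograd sketch of how \widetilde{R}_{2}({\mathcal{F}},x) can be evaluated is given below; the per-class gradient loop is inefficient for large m, but it keeps the correspondence with the formula explicit.

import torch

def robustness_R2(model, x, label):
    # R2(F, x) = min_{i != l_x} (F_{l_x}(x) - F_i(x)) / ||grad F_{l_x}(x) - grad F_i(x)||_1
    x = x.clone().detach().requires_grad_(True)
    logits = model(x.unsqueeze(0)).squeeze(0)
    grads = [torch.autograd.grad(logits[i], x, retain_graph=True)[0].flatten()
             for i in range(logits.numel())]
    ratios = []
    for i in range(logits.numel()):
        if i == label:
            continue
        gap = (logits[label] - logits[i]).item()
        denom = (grads[label] - grads[i]).abs().sum().item()  # L1 norm of the gradient gap
        ratios.append(gap / max(denom, 1e-12))
    return min(ratios)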

Attack R~2(,x)\widetilde{R}_{2}({\mathcal{F}},x) N1N_{1} N2N_{2} AR
before attack 0.078 100 100 0
L,0.02L_{\infty,0.02} 0.016 0 100 0.79
L,0.04L_{\infty,0.04} 0.010 0 100 0.87
L,0.06L_{\infty,0.06} 0.008 0 100 0.89
L,0.08L_{\infty,0.08} 0.006 0 100 0.92
L,0.1L_{\infty,0.1} 0.005 0 100 0.94
Table 10: Adversarial parameter attack to V{\mathcal{F}}_{V} for a given sample. AR: adversarial rate

For each sample x\in S, we compute L_{\infty,\gamma} adversarial parameters; the average results are given in Table 10, where N_{1} is the number of samples that remain robust and N_{2} is the number of samples that are still given the correct label. Comparing Table 10 with Tables 4 and 8, we can see that the adversarial rates for a single sample are about the same as those for a given label. Similar results for the network {\mathcal{F}}_{R} are given in Table 11.

Attack R~2(,x)\widetilde{R}_{2}({\mathcal{F}},x) N1N_{1} N2N_{2} AR
before attack 0.057 100 100 0
L,0.02L_{\infty,0.02} 0.008 0 100 0.86
L,0.04L_{\infty,0.04} 0.008 0 100 0.86
L,0.06L_{\infty,0.06} 0.008 0 100 0.86
L,0.08L_{\infty,0.08} 0.008 0 100 0.86
L,0.1L_{\infty,0.1} 0.008 0 100 0.86
Table 11: Adversarial parameter attack to {\mathcal{F}}_{R} for a given sample. AR: adversarial rate.

6 Proofs for the theorems in section 4

6.1 Proofs of Theorems 4.1 and 4.2

We first introduce some notation. Let ||x||_{-\infty}=\min_{i\in[n]}\{|x_{i}|\} for x\in{\mathbb{R}}^{n}, and ||W||_{-\infty,2}=\min_{i\in[a]}\{||W^{(i)}||_{2}\} for W\in{\mathbb{R}}^{a\times b}, where W^{(i)} is the i-th row of W. If {\mathcal{F}} is a network, we use {\mathcal{F}}_{i}(x) to denote the i-th coordinate of {\mathcal{F}}(x). We first prove a lemma.

Lemma 6.1.

Let vnv\in{\mathbb{R}}^{n} and v0v\neq 0. Then there exists a vector wnw\in{\mathbb{R}}^{n} such that wvw\bot v, w=1||w||_{\infty}=1 and w2n1||w||_{2}\geq\sqrt{n-1}.

Proof.

Let S\subseteq[n] be a subset minimizing |\sum_{i\in S}|v_{i}|-\sum_{i\in[n]\setminus S}|v_{i}||. Replacing S by its complement if necessary, we can assume \sum_{i\in S}|v_{i}|-\sum_{i\in[n]\setminus S}|v_{i}|=k\geq 0. For any j\in S such that v_{j}\neq 0 and S_{1}=S\setminus\{j\}, the minimality of S gives |\sum_{i\in S_{1}}|v_{i}|-\sum_{i\in[n]\setminus S_{1}}|v_{i}||=|2|v_{j}|-k|\geq k, which means k\leq|v_{j}|.

We now define w\in{\mathbb{R}}^{n}. Set w_{i}=1 if v_{i}=0. Select a j\in S such that v_{j}\neq 0 and let w_{i}={\hbox{\rm{sign}}}(v_{i}) if i\in S\setminus\{j\} and v_{i}\neq 0, and w_{j}=\frac{-\sum_{i\in S}|v_{i}|+\sum_{i\in[n]\setminus S}|v_{i}|+|v_{j}|}{v_{j}}. For i\in[n]\setminus S, let w_{i}=-{\hbox{\rm{sign}}}(v_{i}) if v_{i}\neq 0. It is easy to check that ||w||_{\infty}=1, ||w||_{2}\geq\sqrt{n-1}, and w\bot v. The lemma is proved. ∎
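The construction in the proof is easy to carry out numerically. The following NumPy sketch finds the subset S by exhaustive search, so it is meant only for small n and only to illustrate the proof.

import itertools
import numpy as np

def orthogonal_unit_inf_vector(v):
    # Lemma 6.1: returns w with w.v = 0, ||w||_inf = 1 and ||w||_2 >= sqrt(n-1), for v != 0
    v = np.asarray(v, dtype=float)
    n, a = len(v), np.abs(v)
    best_S, best_k = None, None
    for r in range(n + 1):                      # exhaustive search for the balanced subset S
        for S in itertools.combinations(range(n), r):
            k = 2 * a[list(S)].sum() - a.sum()  # sum over S minus sum over the complement
            if best_k is None or abs(k) < abs(best_k):
                best_S, best_k = set(S), k
    S, k = best_S, best_k
    if k < 0:                                   # make S the heavier side, so that k >= 0
        S, k = set(range(n)) - S, -k
    w = np.zeros(n)
    w[v == 0] = 1.0
    j = next(i for i in S if v[i] != 0)
    for i in S:
        if v[i] != 0 and i != j:
            w[i] = np.sign(v[i])
    w[j] = (a[j] - k) / v[j]                    # |w_j| <= 1 because k <= |v_j|
    for i in set(range(n)) - S:
        if v[i] != 0:
            w[i] = -np.sign(v[i])
    return w

v = np.random.default_rng(0).standard_normal(10)
w = orthogonal_unit_inf_vector(v)
print(abs(w @ v), np.abs(w).max(), np.linalg.norm(w) ** 2 >= len(v) - 1)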

We now prove Theorem 4.1.

Proof.

By Lemma 6.1, there exists a vector v\in{\mathbb{R}}^{n} such that v\bot x_{0}, ||v||_{2}\geq\sqrt{n-1} and ||v||_{\infty}=1. Moreover, we can assume that at least \eta n_{1}/2 coordinates of {\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v)+b_{1}) are bigger than b; if this is not the case, we just need to replace v by -v. Indeed, since {\hbox{\rm{Relu}}}(e)+{\hbox{\rm{Relu}}}(f)\geq{\hbox{\rm{Relu}}}(e+f) for all e,f\in{\mathbb{R}}, we have {\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v)+b_{1})+{\hbox{\rm{Relu}}}(W_{1}(x-\epsilon v)+b_{1})\geq 2{\hbox{\rm{Relu}}}(W_{1}x+b_{1}). By condition C_{3}, at least \eta n_{1} coordinates of 2{\hbox{\rm{Relu}}}(W_{1}x+b_{1}) are bigger than 2b, so if fewer than \eta n_{1}/2 coordinates of {\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v)+b_{1}) are bigger than b, then at least \eta n_{1}/2 coordinates of {\hbox{\rm{Relu}}}(W_{1}(x-\epsilon v)+b_{1}) are bigger than b.

Let l2[m]l_{2}\in[m] such that l2lx0l_{2}\neq l_{x_{0}}, W¯2=(W2(lx0)W2(l2))1×n1\overline{W}_{2}=-(W_{2}^{(l_{x_{0}})}-W_{2}^{(l_{2})})\in{\mathbb{R}}^{1\times n_{1}}, Uvn1×nU_{v}\in{\mathbb{R}}^{n_{1}\times n} all of whose rows are γvτ\gamma v^{\tau} (the transposition of vv), and U=diag(sign(W¯2))n1×n1U={\hbox{\rm{diag}}}({\hbox{\rm{sign}}}(\overline{W}_{2}))\in{\mathbb{R}}^{n_{1}\times n_{1}}. Let W~1=W1+UUv\widetilde{W}_{1}=W_{1}+UU_{v} and

~(x)=W2Relu(W~1x+b1)+b2.\widetilde{{\mathcal{F}}}(x)=W_{2}{\hbox{\rm{Relu}}}(\widetilde{W}_{1}x+b_{1})+b_{2}.

We will show that ~\widetilde{{\mathcal{F}}} satisfies the condition of the theorem.

Since vx0v\bot x_{0}, we have ~(x0)=(x0)\widetilde{{\mathcal{F}}}(x_{0})={\mathcal{F}}(x_{0}) and ~\widetilde{{\mathcal{F}}} gives the correct label for x0x_{0}. Since v=1||v||_{\infty}=1, we have W~1W1=UUv=Uv=γvγ||\widetilde{W}_{1}-W_{1}||_{\infty}=||UU_{v}||_{\infty}=||U_{v}||_{\infty}=||\gamma v||_{\infty}\leq\gamma, and thus ~(x)Hγ(Θ)\widetilde{{\mathcal{F}}}(x)\in H_{\gamma}(\Theta).

So it suffices to show that ~(x0+ϵv)\widetilde{{\mathcal{F}}}(x_{0}+\epsilon v) will not give x0+ϵvx_{0}+\epsilon v label lx0l_{x_{0}}, which means that ~\widetilde{{\mathcal{F}}} has adversarial samples to x0x_{0} in S(x0,ϵ)S_{\infty}(x_{0},\epsilon). Since

~(x0+ϵv)=W2Relu(W~1(x0+ϵv)+b1)+b2=W2Relu(W1(x0+ϵv)+b1+ϵUUvv)+b2\begin{array}[]{ll}&\widetilde{{\mathcal{F}}}(x_{0}+\epsilon v)\\ =&W_{2}{\hbox{\rm{Relu}}}(\widetilde{W}_{1}(x_{0}+\epsilon v)+b_{1})+b_{2}\\ =&W_{2}{\hbox{\rm{Relu}}}(W_{1}(x_{0}+\epsilon v)+b_{1}+\epsilon UU_{v}v)+b_{2}\\ \end{array}

we have

~lx0(x0+ϵv)~l2(x0+ϵv)=lx0(x0+ϵv)l2(x0+ϵv)+W¯2(Relu(W1(x+ϵv)+b1)Relu(W1(x+ϵv)+b1+ϵUUvv)).\begin{array}[]{ll}&\widetilde{{\mathcal{F}}}_{l_{x_{0}}}(x_{0}+\epsilon v)-\widetilde{{\mathcal{F}}}_{l_{2}}(x_{0}+\epsilon v)\\ =&{\mathcal{F}}_{l_{x_{0}}}(x_{0}+\epsilon v)-{\mathcal{F}}_{l_{2}}(x_{0}+\epsilon v)+\\ &\overline{W}_{2}({\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v)+b_{1})-{\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v)+b_{1}+\epsilon UU_{v}v)).\\ \end{array}

Since e({\hbox{\rm{Relu}}}(f)-{\hbox{\rm{Relu}}}(f+c\,{\hbox{\rm{sign}}}(e)))=-|e|\,|{\hbox{\rm{Relu}}}(f)-{\hbox{\rm{Relu}}}(f+c\,{\hbox{\rm{sign}}}(e))| for all e,f\in{\mathbb{R}} and c\geq 0, and each entry of \epsilon UU_{v}v equals \epsilon\gamma||v||_{2}^{2} times the sign of the corresponding entry of \overline{W}_{2}, we have

W¯2(Relu(W1(x+ϵv)+b1)Relu(W1(x+ϵv)+b1+ϵUUvv))=|W¯2|(|Relu(W1(x+ϵv)+b1)Relu(W1(x+ϵv)+b1+ϵUUvv)|).\begin{array}[]{ll}&-\overline{W}_{2}({\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v)+b_{1})-{\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v)+b_{1}+\epsilon UU_{v}v))\\ =&|\overline{W}_{2}|(|{\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v)+b_{1})-{\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v)+b_{1}+\epsilon UU_{v}v)|).\\ \end{array}

Since \epsilon UU_{v}v=\epsilon\gamma||v||_{2}^{2}{\hbox{\rm{sign}}}(\overline{W}_{2}) and ||v||_{2}^{2}\geq n-1, each entry of |\epsilon UU_{v}v| is at least \epsilon\gamma(n-1). Note that if e>0 and f\in{\mathbb{R}}, then |{\hbox{\rm{Relu}}}(e)-{\hbox{\rm{Relu}}}(e-f)|\geq\min\{e,|f|\}. As a consequence, if i satisfies ({\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v)+b_{1}))_{i}>b, then (|{\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v)+b_{1})-{\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v)+b_{1}+\epsilon UU_{v}v)|)_{i}\geq\min\{\epsilon\gamma(n-1),b\}. Since at least \eta n_{1}/2 coordinates of {\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v)+b_{1}) are bigger than b, we have ||{\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v)+b_{1})-{\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v)+b_{1}+\epsilon UU_{v}v)||_{1}\geq\eta n_{1}/2\cdot\min\{\epsilon\gamma(n-1),b\}. By condition C_{2}, it is easy to see that

W¯2(Relu(W1(x+ϵv)+b1)Relu(W1(x+ϵv)+b1+ϵUUvv))=|W¯2|(|Relu(W1(x+ϵv)+b1)Relu(W1(x+ϵv)+b1+ϵUUvv)|)W¯2Relu(W1(x+ϵv)+b1)Relu(W1(x+ϵv)+b1+ϵUUvv)1min{ϵγ(n1),b}cn1η/2,\begin{array}[]{ll}&-\overline{W}_{2}({\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v)+b_{1})-{\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v)+b_{1}+\epsilon UU_{v}v))\\ =&|\overline{W}_{2}|(|{\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v)+b_{1})-{\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v)+b_{1}+\epsilon UU_{v}v)|)\\ \geq&||\overline{W}_{2}||_{-\infty}||{\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v)+b_{1})-{\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v)+b_{1}+\epsilon UU_{v}v)||_{1}\\ \geq&\min\{\epsilon\gamma(n-1),b\}cn_{1}\eta/2,\end{array}

that is, W¯2(Relu(W1(x+ϵv)+b1)Relu(W1(x+ϵv)+b1+ϵUUvv))min{ϵγ(n1),b}cn1η/2\overline{W}_{2}({\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v)+b_{1})-{\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v)+b_{1}+\epsilon UU_{v}v))\leq-\min\{\epsilon\gamma(n-1),b\}cn_{1}\eta/2. By condition C1C_{1}, we have lx0(x0+ϵv)l2(x0+ϵv)A{\mathcal{F}}_{l_{x_{0}}}(x_{0}+\epsilon v)-{\mathcal{F}}_{l_{2}}(x_{0}+\epsilon v)\leq A. Then we have

~lx0(x0+ϵv)~l2(x0+ϵv)=lx0(x0+ϵv)l2(x0+ϵv)+W¯2(Relu(W1(x+ϵv)+b1)Relu(W1(x+ϵv)+b1+ϵUUvv))Amin{ϵγ(n1),b}cn1η/2<0.\begin{array}[]{ll}&\widetilde{{\mathcal{F}}}_{l_{x_{0}}}(x_{0}+\epsilon v)-\widetilde{{\mathcal{F}}}_{l_{2}}(x_{0}+\epsilon v)\\ =&{\mathcal{F}}_{l_{x_{0}}}(x_{0}+\epsilon v)-{\mathcal{F}}_{l_{2}}(x_{0}+\epsilon v)+\\ &\overline{W}_{2}({\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v)+b_{1})-{\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v)+b_{1}+\epsilon UU_{v}v))\\ \leq&A-\min\{\epsilon\gamma(n-1),b\}cn_{1}\eta/2<0.\end{array}

Thus if n1>2Amin{ϵγ(n1),b}cηn_{1}>\frac{2A}{\min\{\epsilon\gamma(n-1),b\}c\eta}, then ~lx0(x0+ϵv)~l2(x0+ϵv)<0\widetilde{{\mathcal{F}}}_{l_{x_{0}}}(x_{0}+\epsilon v)-\widetilde{{\mathcal{F}}}_{l_{2}}(x_{0}+\epsilon v)<0 and the label of ~(x0+ϵv)\widetilde{{\mathcal{F}}}(x_{0}+\epsilon v) is not lx0l_{x_{0}}. The theorem is proved. ∎
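The construction above can also be tested numerically. The NumPy sketch below builds a random two-layer network and checks the two facts that do not depend on conditions C_{1}-C_{3}: the perturbed network agrees with the original one at x_{0}, and the margin {\mathcal{F}}_{l_{x_{0}}}-{\mathcal{F}}_{l_{2}} at x_{0}+\epsilon v can only decrease. With random weights the conditions need not hold, so the label does not necessarily flip; the dimensions and constants are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
n, n1, m = 20, 50, 10                          # input dimension, width n_1, number of classes
W1, b1 = rng.standard_normal((n1, n)), rng.standard_normal(n1)
W2, b2 = rng.standard_normal((m, n1)), rng.standard_normal(m)
relu = lambda z: np.maximum(z, 0)
net = lambda x, W1_: W2 @ relu(W1_ @ x + b1) + b2

x0 = rng.standard_normal(n)
l0 = int(np.argmax(net(x0, W1)))               # label of x0 under the original network
l2 = (l0 + 1) % m                              # any competing label

v = rng.standard_normal(n)                     # v orthogonal to x0 with ||v||_inf = 1
v -= (v @ x0) / (x0 @ x0) * x0
v /= np.abs(v).max()

gamma, eps = 0.05, 0.1
W2bar = -(W2[l0] - W2[l2])
Uv = gamma * np.tile(v, (n1, 1))               # every row of U_v is gamma * v
U = np.diag(np.sign(W2bar))
W1t = W1 + U @ Uv                              # ||W1t - W1||_inf <= gamma entrywise

margin = lambda W1_: net(x0 + eps * v, W1_)[l0] - net(x0 + eps * v, W1_)[l2]
print(np.allclose(net(x0, W1), net(x0, W1t)))  # True: the output at x0 is unchanged
print(margin(W1) >= margin(W1t))               # True: the margin at x0 + eps*v never increases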

We now prove Theorem 4.2.

Proof.

By condition C4C_{4}, for l[m]l\in[m], there exist vlnv_{l}\in{\mathbb{R}}^{n} such that vlSv_{l}\bot S, vlvkv_{l}\bot v_{k} for lkl\neq k, vl2=1||v_{l}||_{2}=1. Then vl1||v_{l}||_{\infty}\leq 1.

By condition C3C_{3}, at least ηn1/2\eta n_{1}/2 coordinates of Relu(W1(x+ϵvlx)+b1){\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v_{l_{x}})+b_{1}) are bigger than bb or at least ηn1/2\eta n_{1}/2 coordinates of Relu(W1(xϵvlx)+b1){\hbox{\rm{Relu}}}(W_{1}(x-\epsilon v_{l_{x}})+b_{1}) are bigger than bb, similar to the proof of Theorem 4.1.

For convenience, define G(x,y):({\mathbb{R}}^{n},{\mathbb{R}})\to{\mathbb{R}} by G(x,y)=||{\hbox{\rm{Relu}}}({\hbox{\rm{sign}}}(x-yI_{n}))||_{0}, where I_{n} is the vector all of whose entries are 1; that is, G(x,b) is the number of coordinates of x that are bigger than b. So we have G({\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v_{l_{x}})+b_{1}),b)\geq\eta n_{1}/2 or G({\hbox{\rm{Relu}}}(W_{1}(x-\epsilon v_{l_{x}})+b_{1}),b)\geq\eta n_{1}/2 for all x, and hence for l\in[m], we have

PxDx(G(Relu(W1(x+ϵvlx)+b1),b)ηn1/2 or G(Relu(W1(xϵvlx)+b1),b)ηn1/2|lx=l)=1.P_{x\sim D_{x}}(G({\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v_{l_{x}})+b_{1}),b)\geq\eta n_{1}/2\hbox{ or }G({\hbox{\rm{Relu}}}(W_{1}(x-\epsilon v_{l_{x}})+b_{1}),b)\geq\eta n_{1}/2\,|\,l_{x}=l)=1.

For events ee and ff, P(eorf)P(e)+P(f)P(e\ \hbox{or}\ f)\leq P(e)+P(f). We thus have PxDx(G(Relu(W1(x+ϵvlx)+b1),b)ηn1/2|lx=l)0.5P_{x\sim D_{x}}(G({\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v_{l_{x}})+b_{1}),b)\geq\eta n_{1}/2\,|\,l_{x}=l)\geq 0.5 or PxDx(G(Relu(W1(xϵvlx)+b1),b)ηn1/2|lx=l)0.5P_{x\sim D_{x}}(G({\hbox{\rm{Relu}}}(W_{1}(x-\epsilon v_{l_{x}})+b_{1}),b)\geq\eta n_{1}/2\,|\,l_{x}=l)\geq 0.5. Without loss of generality, we can assume that for any l[m]l\in[m], PxDx(G(Relu(W1(x+ϵvlx)+b1),b)ηn1/2|lx=l)0.5P_{x\sim D_{x}}(G({\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v_{l_{x}})+b_{1}),b)\geq\eta n_{1}/2\,|\,l_{x}=l)\geq 0.5. Therefore,

PxDx(G(Relu(W1(x+ϵvlx)+b1),b)ηn1/2)=l[m]PxDx(lx=l)PxDx(G(Relu(W1(x+ϵvlx)+b1),b)ηn1/2|lx=l)0.5l[m]PxDx(lx=l)=0.5.\begin{array}[]{ll}&P_{x\sim D_{x}}(G({\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v_{l_{x}})+b_{1}),b)\geq\eta n_{1}/2)\\ =&\sum_{l\in[m]}P_{x\sim D_{x}}(l_{x}=l)P_{x\sim D_{x}}(G({\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v_{l_{x}})+b_{1}),b)\geq\eta n_{1}/2\,|\,l_{x}=l)\\ \geq&0.5\sum_{l\in[m]}P_{x\sim D_{x}}(l_{x}=l)\\ =&0.5.\end{array}

For l\in[m], let \overline{W}_{2}^{(l)}=W_{2}^{(l)}-W_{2}^{(l+1)}, where W_{2}^{(m+1)}=W_{2}^{(1)}. Let U^{l}_{v}\in{\mathbb{R}}^{n_{1}\times n} be the matrix all of whose rows are \gamma v_{l}^{\tau}, and let U_{l}={\hbox{\rm{diag}}}({\hbox{\rm{sign}}}(\overline{W}_{2}^{(l)})). Let \widetilde{W}_{1}=W_{1}+\frac{1}{m}\sum_{l=1}^{m}U_{l}U^{l}_{v}, and

~(x)=W2Relu(W~1x+b1)+b2.\widetilde{{\mathcal{F}}}(x)=W_{2}{\hbox{\rm{Relu}}}(\widetilde{W}_{1}x+b_{1})+b_{2}.

We will show that ~(x)\widetilde{{\mathcal{F}}}(x) satisfies the conditions of the theorem.

It is easy to see that ~\widetilde{{\mathcal{F}}} is in γ(Θ){\mathcal{H}}_{\gamma}(\Theta). For any xSx\in S, W~1x=W1x+1ml=1m(UlUvl)x=W1x\widetilde{W}_{1}x=W_{1}x+\frac{1}{m}\sum_{l=1}^{m}(U_{l}U^{l}_{v})x=W_{1}x, which means ~(x)=(x)\widetilde{{\mathcal{F}}}(x)={\mathcal{F}}(x) and the accuracy of ~\widetilde{{\mathcal{F}}} over DxD_{x} is equal to that of {\mathcal{F}}.

Let xSx\in S satisfy G(Relu(W1(x+ϵvlx)+b1),b)>ηn1/2G({\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v_{l_{x}})+b_{1}),b)>\eta n_{1}/2 and l2lxl_{2}\neq l_{x}. By conditions C1C_{1} and C2C_{2} and similar to the proof of Theorem 4.1, we have

~lx(x+ϵvlx)~l2(x+ϵvlx)=lx(x+ϵvlx)l2(x+ϵvlx)+Wc(lx)(Relu(W1(x+ϵvlx)+b1)Relu(W1(x+ϵvlx)+b1+ϵUlUvlv/m))Amin{ϵγ/m,b}cn1η/2<0.\begin{array}[]{ll}&\widetilde{{\mathcal{F}}}_{l_{x}}(x+\epsilon v_{l_{x}})-\widetilde{{\mathcal{F}}}_{l_{2}}(x+\epsilon v_{l_{x}})\\ =&{\mathcal{F}}_{l_{x}}(x+\epsilon v_{l_{x}})-{\mathcal{F}}_{l_{2}}(x+\epsilon v_{l_{x}})+\\ &W^{(l_{x})}_{c}({\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v_{l_{x}})+b_{1})-{\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v_{l_{x}})+b_{1}+\epsilon U_{l}U^{l}_{v}v/m))\\ \leq&A-\min\{\epsilon\gamma/m,b\}cn_{1}\eta/2<0.\end{array}

Thus \widetilde{{\mathcal{F}}} does not give label l_{x} to x+\epsilon v_{l_{x}} and \widetilde{{\mathcal{F}}} has an adversarial sample of x in S_{\infty}(x,\epsilon). Furthermore, since P_{x\sim D_{x}}(G({\hbox{\rm{Relu}}}(W_{1}(x+\epsilon v_{l_{x}})+b_{1}),b)\geq\eta n_{1}/2)\geq 0.5, we have

P_{x\sim D_{x}}(\widetilde{{\mathcal{F}}}\ has\ an\ adversarial\ sample\ of\ x\ in\ S_{\infty}(x,\epsilon))\geq 0.5.

The theorem is proved. ∎

6.2 Proofs of Theorems 4.3 and 4.4

We first prove two lemmas.

Lemma 6.2.

For l[m]l\in[m], let SlS_{l} be a non-empty bounded closed subset of n×n{\mathbb{R}}^{n\times n} such that WSlW\in S_{l} implies WSl-W\in S_{l}. Also let S0S_{0} be a non-empty bounded closed subset of 1×n{\mathbb{R}}^{1\times n} such that xS0x\in S_{0} implies xS0-x\in S_{0}. Let U01×nU_{0}\in{\mathbb{R}}^{1\times n} and Uln×nU_{l}\in{\mathbb{R}}^{n\times n} for l[m]l\in[m]. Define maps: T0(x):1×n1×nT_{0}(x):{\mathbb{R}}^{1\times n}\to{\mathbb{R}}^{1\times n} by T0(x)=xl=1mUlT_{0}(x)=x\prod_{l=1}^{m}U_{l}, and for l[m]l\in[m], Tl(W):n×n1×nT_{l}(W):{\mathbb{R}}^{n\times n}\to{\mathbb{R}}^{1\times n} by Tl(W)=U0(j=1l1Uj)W(j=l+1mUj)T_{l}(W)=U_{0}(\prod_{j=1}^{l-1}U_{j})W(\prod_{j=l+1}^{m}U_{j}). Then

maxxlSl,0lm{l=0m(xl+Ul)22}l=0mUl22+l=0mmaxxlSl{Tl(xl)22}\max_{x_{l}\in S_{l},\forall 0\leq l\leq m}\{||\prod_{l=0}^{m}(x_{l}+U_{l})||_{2}^{2}\}\geq||\prod_{l=0}^{m}U_{l}||^{2}_{2}+\sum_{l=0}^{m}\max_{x_{l}\in S_{l}}\{||T_{l}(x_{l})||_{2}^{2}\}
Proof.

For l[m]l\in[m], let ul=argmaxxSlTl(x)22u_{l}=\arg\max_{x\in S_{l}}||T_{l}(x)||_{2}^{2}, which exists because SlS_{l} is bounded and closed. Then

\begin{array}[]{ll}&\max_{x_{l}\in S_{l},\forall 0\leq l\leq m}\{||\prod_{l=0}^{m}(x_{l}+U_{l})||_{2}^{2}\}\\ \geq&\max_{x_{l}\in\{u_{l},-u_{l}\},\forall 0\leq l\leq m}\{||\prod_{l=0}^{m}(x_{l}+U_{l})||_{2}^{2}\}\\ \geq&\frac{1}{2^{m+1}}\sum_{x_{l}\in\{u_{l},-u_{l}\},\forall 0\leq l\leq m}||\prod_{l=0}^{m}(x_{l}+U_{l})||_{2}^{2}\\ =&\sum_{M_{l}\in\{u_{l},U_{l}\},\forall 0\leq l\leq m}||\prod_{l=0}^{m}M_{l}||_{2}^{2}\\ \geq&||\prod_{l=0}^{m}U_{l}||_{2}^{2}+\sum_{l=0}^{m}||T_{l}(u_{l})||_{2}^{2}\\ =&||\prod_{l=0}^{m}U_{l}||_{2}^{2}+\sum_{l=0}^{m}\max_{x_{l}\in S_{l}}\{||T_{l}(x_{l})||_{2}^{2}\}\end{array}

The lemma is proved. ∎
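The only step in the above chain that is not a direct inequality is that averaging over the 2^{m+1} sign patterns removes all cross terms. This can be checked numerically with random matrices; the dimensions and the random u_{l},U_{l} below are arbitrary.

import itertools
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 4
U = [rng.standard_normal((1, n))] + [rng.standard_normal((n, n)) for _ in range(m)]
u = [rng.standard_normal((1, n))] + [rng.standard_normal((n, n)) for _ in range(m)]

def prod(mats):
    out = mats[0]
    for M in mats[1:]:
        out = out @ M
    return out

# average of ||prod_l (s_l*u_l + U_l)||_2^2 over the 2^(m+1) sign patterns s
avg = np.mean([np.sum(prod([s * a + b for s, a, b in zip(signs, u, U)]) ** 2)
               for signs in itertools.product([1, -1], repeat=m + 1)])
# sum of ||prod_l M_l||_2^2 over all choices M_l in {u_l, U_l}
tot = sum(np.sum(prod([a if pick == 0 else b for pick, a, b in zip(picks, u, U)]) ** 2)
          for picks in itertools.product([0, 1], repeat=m + 1))
print(np.isclose(avg, tot))   # True: the cross terms cancel in the average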

Lemma 6.3.

For l\in[m], let S_{l} be a non-empty closed subset of bounded functions from {\mathbb{I}}^{k} to {\mathbb{R}}^{n\times n} such that c(x)\in S_{l} implies -c(x)\in S_{l}. Also let S_{0} be a non-empty closed subset of bounded functions from {\mathbb{I}}^{k} to {\mathbb{R}}^{1\times n} such that c(x)\in S_{0} implies -c(x)\in S_{0}. Assume U_{0}(x):{\mathbb{I}}^{k}\to{\mathbb{R}}^{1\times n} and U_{l}(x):{\mathbb{I}}^{k}\to{\mathbb{R}}^{n\times n} for l\in[m]. Define the maps T_{0}(c,x):(S_{0},{\mathbb{I}}^{k})\to{\mathbb{R}}^{1\times n} by T_{0}(c,x)=c(x)\prod_{l=1}^{m}U_{l}(x), and T_{l}(c,x):(S_{l},{\mathbb{I}}^{k})\to{\mathbb{R}}^{1\times n} by T_{l}(c,x)=(\prod_{j=0}^{l-1}U_{j}(x))c(x)(\prod_{j=l+1}^{m}U_{j}(x)) for l\in[m]. Then we have

maxcl(x)Sl,0lm{xDxl=0m(cl(x)+Ul(x))22}xDxl=0mUl(x)22+l=0mmaxcl(x)Sl{xDxTl(cl,x)22}.\max_{c_{l}(x)\in S_{l},\forall 0\leq l\leq m}\displaystyle\{\int_{x\sim D_{x}}||\prod_{l=0}^{m}(c_{l}(x)+U_{l}(x))||_{2}^{2}\displaystyle\}\geq\int_{x\sim D_{x}}||\prod_{l=0}^{m}U_{l}(x)||^{2}_{2}+\sum_{l=0}^{m}\max_{c_{l}(x)\in S_{l}}\{\int_{x\sim{D_{x}}}||T_{l}(c_{l},x)||_{2}^{2}\}.
Proof.

Denote \overline{c_{l}}(x)=\arg\max_{c\in S_{l}}\int_{x\sim D_{x}}||T_{l}(c,x)||_{2}^{2}, which must exist because S_{l} is closed and its elements are bounded. Then

\begin{array}[]{ll}&\max_{c_{l}(x)\in S_{l},\forall 0\leq l\leq m}\{\int_{x\sim D_{x}}||\prod_{l=0}^{m}(c_{l}(x)+U_{l}(x))||_{2}^{2}\}\\ \geq&\max_{c_{l}(x)\in\{\overline{c_{l}}(x),-\overline{c_{l}}(x)\},\forall 0\leq l\leq m}\{\int_{x\sim D_{x}}||\prod_{l=0}^{m}(c_{l}(x)+U_{l}(x))||_{2}^{2}\}\\ \geq&\frac{1}{2^{m+1}}\sum_{c_{l}\in\{\overline{c_{l}}(x),-\overline{c_{l}}(x)\},\forall 0\leq l\leq m}\int_{x\sim D_{x}}||\prod_{l=0}^{m}(c_{l}(x)+U_{l}(x))||_{2}^{2}\\ =&\sum_{M_{l}(x)\in\{\overline{c_{l}}(x),U_{l}(x)\},\forall 0\leq l\leq m}\int_{x\sim D_{x}}||\prod_{l=0}^{m}M_{l}(x)||_{2}^{2}\\ \geq&\int_{x\sim D_{x}}||\prod_{l=0}^{m}U_{l}(x)||^{2}_{2}+\sum_{l=0}^{m}\int_{x\sim D_{x}}||T_{l}(\overline{c_{l}},x)||_{2}^{2}\\ =&\int_{x\sim D_{x}}||\prod_{l=0}^{m}U_{l}(x)||_{2}^{2}+\sum_{l=0}^{m}\max_{c_{l}\in S_{l}}\{\int_{x\sim D_{x}}||T_{l}(c_{l},x)||_{2}^{2}\}.\end{array}

The lemma is proved. ∎

We now prove Theorem 4.3.

Proof.

Let S_{l} be the set of Q\in{\mathbb{R}}^{n\times n} satisfying ||Q||_{\infty}\leq\gamma and Q{\mathcal{F}}^{l-1}(x_{0})=0, for l\in[L+1]. Note that the constraint Q{\mathcal{F}}^{l-1}(x_{0})=0 imposes n linear equations on the n^{2} entries of Q, so S_{l} contains nonzero elements. Let {\mathbb{F}} be the set of networks \widetilde{{\mathcal{F}}}:{\mathbb{R}}^{n}\to{\mathbb{R}}^{m} satisfying

~(x)=W~L+1σ(W~Lσ(W~L1σ(σ(W~1x))))\widetilde{{\mathcal{F}}}(x)=\widetilde{W}_{L+1}\sigma(\widetilde{W}_{L}\sigma(\widetilde{W}_{L-1}\sigma(\dots\sigma(\widetilde{W}_{1}x))))

where \widetilde{W}_{l}-W_{l}\in S_{l} for all l\in[L+1]. We have \widetilde{{\mathcal{F}}}^{l}(x_{0})={\mathcal{F}}^{l}(x_{0}) for l\in[L+1], so every \widetilde{{\mathcal{F}}}\in{\mathbb{F}} gives x_{0} the label l_{x_{0}}. It is also easy to see that {\mathbb{F}}\subset{\mathcal{H}}_{\gamma}(\Theta). So it suffices to prove

min~𝔽{R(~,x0)R(,x0)}1γ2((L1)(sin(r)cb)2+c2+(2sin(r)b)2)4A+γ2((L1)(sin(r)cb)2+c2+(2sin(r)b)2).\min_{\widetilde{{\mathcal{F}}}\in{\mathbb{F}}}\{\frac{R(\widetilde{{\mathcal{F}}},x_{0})}{R({\mathcal{F}},x_{0})}\}\leq 1-\frac{\gamma^{2}((L-1)(\sin(r)cb)^{2}+c^{2}+(2\sin(r)b)^{2})}{4A+\gamma^{2}((L-1)(\sin(r)cb)^{2}+c^{2}+(2\sin(r)b)^{2})}.

Let l2=argminilx0{|lx0(x0)i(x0)|2(lx0(x0))(i(x0))22I(lx0(x0)>i(x0))}l_{2}=\arg\min_{i\neq l_{x_{0}}}\{\frac{|{\mathcal{F}}_{l_{x_{0}}}(x_{0})-{\mathcal{F}}_{i}(x_{0})|^{2}}{||\nabla({\mathcal{F}}_{l_{x_{0}}}(x_{0}))-\nabla({\mathcal{F}}_{i}(x_{0}))||_{2}^{2}}I({\mathcal{F}}_{l_{x_{0}}}(x_{0})>{\mathcal{F}}_{i}(x_{0}))\}. Then

R(,x0)=|lx0(x0)l2(x0)|2(lx0(x0))(l2(x0))22.R({\mathcal{F}},x_{0})=\frac{|{\mathcal{F}}_{l_{x_{0}}}(x_{0})-{\mathcal{F}}_{l_{2}}(x_{0})|^{2}}{||\nabla({\mathcal{F}}_{l_{x_{0}}}(x_{0}))-\nabla({\mathcal{F}}_{l_{2}}(x_{0}))||_{2}^{2}}.

Then for all ~𝔽\widetilde{{\mathcal{F}}}\in{\mathbb{F}}, we have

R(~,x0)|~lx(x0)~l2(x0)|2(~lx(x0))(~l2(x0))22=|lx(x0)l2(x0)|2(~lx(x0))(~l2(x0))22R(\widetilde{{\mathcal{F}}},x_{0})\leq\frac{|\widetilde{{\mathcal{F}}}_{l_{x}}(x_{0})-\widetilde{{\mathcal{F}}}_{l_{2}}(x_{0})|^{2}}{||\nabla(\widetilde{{\mathcal{F}}}_{l_{x}}(x_{0}))-\nabla(\widetilde{{\mathcal{F}}}_{l_{2}}(x_{0}))||_{2}^{2}}=\frac{|{\mathcal{F}}_{l_{x}}(x_{0})-{\mathcal{F}}_{l_{2}}(x_{0})|^{2}}{||\nabla(\widetilde{{\mathcal{F}}}_{l_{x}}(x_{0}))-\nabla(\widetilde{{\mathcal{F}}}_{l_{2}}(x_{0}))||_{2}^{2}}

So we have

min~𝔽{R(~,x0)R(,x0)}min~𝔽{|lx(x0)l2(x0)|2(~lx(x0))(~l2(x0))22/|lx(x)l2(x)|2(lx(x))(l2(x))22}min~𝔽{(lx(x0))(l2(x0))22(~lx(x0))(~l2(x0))22}.\begin{array}[]{ll}&\min_{\widetilde{{\mathcal{F}}}\in{\mathbb{F}}}\{\frac{R(\widetilde{{\mathcal{F}}},x_{0})}{R({\mathcal{F}},x_{0})}\}\\ \leq&\min_{\widetilde{{\mathcal{F}}}\in{\mathbb{F}}}\{\frac{|{\mathcal{F}}_{l_{x}}(x_{0})-{\mathcal{F}}_{l_{2}}(x_{0})|^{2}}{||\nabla(\widetilde{{\mathcal{F}}}_{l_{x}}(x_{0}))-\nabla(\widetilde{{\mathcal{F}}}_{l_{2}}(x_{0}))||_{2}^{2}}/\frac{|{\mathcal{F}}_{l_{x}}(x)-{\mathcal{F}}_{l_{2}}(x)|^{2}}{||\nabla({\mathcal{F}}_{l_{x}}(x))-\nabla({\mathcal{F}}_{l_{2}}(x))||_{2}^{2}}\}\\ \leq&\min_{\widetilde{{\mathcal{F}}}\in{\mathbb{F}}}\{\frac{||\nabla({\mathcal{F}}_{l_{x}}(x_{0}))-\nabla({\mathcal{F}}_{l_{2}}(x_{0}))||_{2}^{2}}{||\nabla(\widetilde{{\mathcal{F}}}_{l_{x}}(x_{0}))-\nabla(\widetilde{{\mathcal{F}}}_{l_{2}}(x_{0}))||_{2}^{2}}\}.\end{array} (19)

To prove the theorem, we will first find a lower bound for max~𝔽{(~lx(x0))(~l2(x0))2}\max_{\widetilde{{\mathcal{F}}}\in{\mathbb{F}}}\{||\nabla(\widetilde{{\mathcal{F}}}_{l_{x}}(x_{0}))-\nabla(\widetilde{{\mathcal{F}}}_{l_{2}}(x_{0}))||_{2}\}. Let Jl(x)=diag(sign(l(x))):nn×nJ_{l}(x)={\hbox{\rm{diag}}}({\hbox{\rm{sign}}}({\mathcal{F}}^{l}(x))):\ {\mathbb{R}}^{n}\to{\mathbb{R}}^{n\times n} for l[L]l\in[L]. Then

i(x)x=WL+1i(JL(x)WL)(JL1(x)WL1)(J1(x)W1).\frac{\nabla{\mathcal{F}}_{i}(x)}{\nabla x}=W^{i}_{L+1}(J_{L}(x)W_{L})(J_{L-1}(x)W_{L-1})\dots(J_{1}(x)W_{1}).

Let Al(x)=L(t)l(t)|t=xA_{l}(x)=\frac{\nabla{\mathcal{F}}^{L}(t)}{\nabla{\mathcal{F}}^{l}(t)}|_{t=x} and Bl(x)=l(t)t|t=xB_{l}(x)=\frac{\nabla{\mathcal{F}}^{l}(t)}{\nabla t}|_{t=x} for l[L]l\in[L]. Then we have

Al(x)=(JL(x)WL)(JL1(x)WL1)(Jl+1(x)Wl+1)Bl(x)=(Jl(x)Wl)(Jl1(x)Wl1)(J1(x)W1).\begin{array}[]{ll}A_{l}(x)=(J_{L}(x)W_{L})(J_{L-1}(x)W_{L-1})\dots(J_{l+1}(x)W_{l+1})\\ B_{l}(x)=(J_{l}(x)W_{l})(J_{l-1}(x)W_{l-1})\dots(J_{1}(x)W_{1}).\end{array}
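The factorization of the gradient used above is easy to verify with automatic differentiation. The following PyTorch sketch checks it on a random network of the same bias-free form; the dimensions are arbitrary, and the identity holds whenever no pre-activation is exactly zero.

import torch

torch.manual_seed(0)
n, L, m = 6, 4, 3                                    # width, number of hidden layers, classes
Ws = [torch.randn(n, n) for _ in range(L)] + [torch.randn(m, n)]

x = torch.randn(n, requires_grad=True)
acts, h = [], x
for W in Ws[:-1]:
    h = torch.relu(W @ h)
    acts.append(h)                                   # F^l(x) for l = 1, ..., L
out = Ws[-1] @ h                                     # F(x)
i = 0
auto = torch.autograd.grad(out[i], x)[0]             # gradient of F_i at x via autograd

row = Ws[-1][i]                                      # W_{L+1}^i
for W, a in zip(reversed(Ws[:-1]), reversed(acts)):
    row = row @ (torch.diag(torch.sign(a)) @ W)      # multiply by J_l W_l for l = L, ..., 1
print(torch.allclose(auto, row, atol=1e-5))          # True when no pre-activation is exactly 0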

Let Al=Al(x0)A_{l}=A_{l}(x_{0}), Bl=Bl(x0)B_{l}=B_{l}(x_{0}), Jl=Jl(x0)J_{l}=J_{l}(x_{0}). Then for all ~𝔽\widetilde{{\mathcal{F}}}\in{\mathbb{F}},

~i(x)x|x=x0=W~L+1i(JLW~L)(JL1W~L1)(J1W~1).\frac{\nabla\widetilde{{\mathcal{F}}}_{i}(x)}{\nabla x}|_{x=x_{0}}=\widetilde{W}_{L+1}^{i}(J_{L}\widetilde{W}_{L})(J_{L-1}\widetilde{W}_{L-1})\dots(J_{1}\widetilde{W}_{1}).

Denote W¯l=W~lWlSl\overline{W}_{l}=\widetilde{W}_{l}-W_{l}\in S_{l} for l[L+1]l\in[L+1]. We have

(~lx(x0))(~l2(x0))=(W~L+1(lx)W~L+1(l2))(JLW~L)(JL1W~L1)(J1W~1)=(WL+1(lx)WL+1(l2)+W¯L+1(lx)W¯L+1(l2))(JL(WL+W¯L))(JL1(WL1+W¯L1))(J1(W1+W¯1)).\begin{array}[]{ll}&\nabla(\widetilde{{\mathcal{F}}}_{l_{x}}(x_{0}))-\nabla(\widetilde{{\mathcal{F}}}_{l_{2}}(x_{0}))\\ =&(\widetilde{W}_{L+1}^{(l_{x})}-\widetilde{W}_{L+1}^{(l_{2})})(J_{L}\widetilde{W}_{L})(J_{L-1}\widetilde{W}_{L-1})\dots(J_{1}\widetilde{W}_{1})\\ =&(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})}+\overline{W}_{L+1}^{(l_{x})}-\overline{W}_{L+1}^{(l_{2})})(J_{L}(W_{L}+\overline{W}_{L}))(J_{L-1}(W_{L-1}+\overline{W}_{L-1}))\dots(J_{1}(W_{1}+\overline{W}_{1})).\\ \end{array}

Now, let M_{1}(\overline{W}_{1})=(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{1}J_{1}\overline{W}_{1}, M_{i}(\overline{W}_{i})=(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{i}J_{i}\overline{W}_{i}B_{i-1} for i\in\{2,3,\dots,L\}, and M_{L+1}(\overline{W}_{L+1})=(\overline{W}_{L+1}^{(l_{x})}-\overline{W}_{L+1}^{(l_{2})})B_{L}. By Lemma 6.2, we have

max~𝔽{(~lx(x0))(~l2(x0))22}maxW¯jSj,j[L+1]{(lx(x0))(l2(x0))22+i=1L+1Mi(W¯i)22}=(lx(x0))(l2(x0))22+i=1L+1maxW¯iSi{Mi(W¯i)22}.\begin{array}[]{ll}&\max_{\widetilde{{\mathcal{F}}}\in{\mathbb{F}}}\{||\nabla(\widetilde{{\mathcal{F}}}_{l_{x}}(x_{0}))-\nabla(\widetilde{{\mathcal{F}}}_{l_{2}}(x_{0}))||^{2}_{2}\}\\ \geq&\max_{\overline{W}_{j}\in S_{j},j\in[L+1]}\{||\nabla({\mathcal{F}}_{l_{x}}(x_{0}))-\nabla({\mathcal{F}}_{l_{2}}(x_{0}))||_{2}^{2}+\sum_{i=1}^{L+1}||M_{i}(\overline{W}_{i})||^{2}_{2}\}\\ =&||\nabla({\mathcal{F}}_{l_{x}}(x_{0}))-\nabla({\mathcal{F}}_{l_{2}}(x_{0}))||_{2}^{2}+\sum_{i=1}^{L+1}\max_{\overline{W}_{i}\in S_{i}}\{||M_{i}(\overline{W}_{i})||^{2}_{2}\}.\end{array}

It is easy to see that for any l\in[L], (W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{l}=\frac{\nabla({\mathcal{F}}_{l_{x}}(t)-{\mathcal{F}}_{l_{2}}(t))}{\nabla{\mathcal{F}}^{l}(t)}|_{t=x_{0}}, so by condition C_{3}, we have ||(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{l}||_{-\infty}>c. Let S^{1}_{l} be the subset of S_{l} containing those matrices which have at most one nonzero row. Note that for x\in{\mathbb{R}}^{1\times n} and M\in{\mathbb{R}}^{n\times n}, if at most one row of M is nonzero, then ||xM||_{\infty}=\max_{i,j\in[n]}\{|x_{i}M_{i,j}|\}\geq||x||_{-\infty}||M||_{\infty}, where x_{i} is the i-th entry of x and M_{i,j} is the entry of M in the i-th row and j-th column. Thus

\begin{array}[]{ll}&\max_{\overline{W}_{1}\in S_{1}}\{||M_{1}(\overline{W}_{1})||_{2}\}\\ =&\max_{\overline{W}_{1}\in S_{1}}\{||(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{1}J_{1}\overline{W}_{1}||_{2}\}\\ \geq&\max_{\overline{W}_{1}\in S_{1}}\{||(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{1}J_{1}\overline{W}_{1}||_{\infty}\}\\ \geq&\max_{\overline{W}_{1}\in S^{1}_{1}}\{||(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{1}J_{1}\overline{W}_{1}||_{\infty}\}\\ \geq&||(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{1}||_{-\infty}\max_{\overline{W}_{1}\in S^{1}_{1}}\{||J_{1}\overline{W}_{1}||_{\infty}\}\\ \geq&\gamma c.\end{array}
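The inequality ||xM||_{\infty}\geq||x||_{-\infty}||M||_{\infty} for matrices M with at most one nonzero row, which was used in the last two steps, can be checked directly; a small NumPy illustration with random data:

import numpy as np

rng = np.random.default_rng(2)
n = 6
x = rng.standard_normal(n)                 # a row vector
M = np.zeros((n, n))
M[2] = rng.standard_normal(n)              # a matrix with at most one nonzero row
lhs = np.abs(x @ M).max()                  # ||x M||_inf
rhs = np.abs(x).min() * np.abs(M).max()    # ||x||_{-inf} * ||M||_inf (entrywise maximum)
print(lhs + 1e-12 >= rhs)                  # True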

Moreover, by condition C_{4}, there exists a column L_{i-1} of B_{i-1} such that \pi-r\geq\alpha({\mathcal{F}}^{i-1}(x_{0}),L_{i-1})\geq r, where \alpha(x,y) is the angle between x and y. Therefore, there exists a vector v_{i}\in{\mathbb{R}}^{n} such that v_{i}\bot{\mathcal{F}}^{i-1}(x_{0}) and ||v_{i}||_{\infty}=\gamma, and, by condition C_{2}, we have \langle v_{i},L_{i-1}\rangle\geq||v_{i}||_{2}||L_{i-1}||_{2}\cos(\pi/2-r)\geq\sin(r)b\gamma.

Then maxW¯iSi1JiW¯iBi1sin(r)bγ\max_{\overline{W}_{i}\in S^{1}_{i}}||J_{i}\overline{W}_{i}B_{i-1}||_{\infty}\geq\sin(r)b\gamma, because there must exist a W¯iSi1\overline{W}_{i}\in S^{1}_{i} whose only nonzero row is viv_{i} and JiW¯i=W¯iJ_{i}\overline{W}_{i}=\overline{W}_{i}. For l{2,3,,L}l\in\{2,3,\dots,L\}, by condition C3C_{3}, we have

\begin{array}[]{ll}&\max_{\overline{W}_{l}\in S_{l}}\{||M_{l}(\overline{W}_{l})||_{2}\}\\ =&\max_{\overline{W}_{l}\in S_{l}}\{||(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{l}J_{l}\overline{W}_{l}B_{l-1}||_{2}\}\\ \geq&\max_{\overline{W}_{l}\in S_{l}}\{||(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{l}J_{l}\overline{W}_{l}B_{l-1}||_{\infty}\}\\ \geq&\max_{\overline{W}_{l}\in S^{1}_{l}}\{||(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{l}J_{l}\overline{W}_{l}B_{l-1}||_{\infty}\}\\ \geq&||(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{l}||_{-\infty}\max_{\overline{W}_{l}\in S^{1}_{l}}||J_{l}\overline{W}_{l}B_{l-1}||_{\infty}\\ \geq&\gamma\sin(r)cb.\end{array}

Similarly,

maxW¯L+1SL+1{ML+1(W¯L+1)2}=maxW¯L+1SL+1{(W¯L+1(lx)W¯L+1(l2))BL2}maxW¯L+1SL+1{(W¯L+1(lx)W¯L+1(l2))BL}2sin(r)bγ.\begin{array}[]{ll}&\max_{\overline{W}_{L+1}\in S_{L+1}}\{||M_{L+1}(\overline{W}_{L+1})||_{2}\}\\ =&\max_{\overline{W}_{L+1}\in S_{L+1}}\{||(\overline{W}_{L+1}^{(l_{x})}-\overline{W}_{L+1}^{(l_{2})})B_{L}||_{2}\}\\ \geq&\max_{\overline{W}_{L+1}\in S_{L+1}}\{||(\overline{W}_{L+1}^{(l_{x})}-\overline{W}_{L+1}^{(l_{2})})B_{L}||_{\infty}\}\\ \geq&2\sin(r)b\gamma.\\ \end{array}

Then we obtain the desired lower bound:

max~𝔽{(~lx(x0))(~l2(x0))22}maxW¯jSj,j[L+1]{(lx(x0))(l2(x0))22+l=1L+1Ml(W¯l)22}(lx(x))(l2(x))22+γ2((L1)(sin(r)cb)2+c2+(2sin(r)b)2).\begin{array}[]{ll}&\max_{\widetilde{{\mathcal{F}}}\in{\mathbb{F}}}\{||\nabla(\widetilde{{\mathcal{F}}}_{l_{x}}(x_{0}))-\nabla(\widetilde{{\mathcal{F}}}_{l_{2}}(x_{0}))||^{2}_{2}\}\\ \geq&\max_{\overline{W}_{j}\in S_{j},j\in[L+1]}\{||\nabla({\mathcal{F}}_{l_{x}}(x_{0}))-\nabla({\mathcal{F}}_{l_{2}}(x_{0}))||_{2}^{2}+\sum_{l=1}^{L+1}||M_{l}(\overline{W}_{l})||^{2}_{2}\}\\ \geq&||\nabla({\mathcal{F}}_{l_{x}}(x))-\nabla({\mathcal{F}}_{l_{2}}(x))||_{2}^{2}+\gamma^{2}((L-1)(\sin(r)cb)^{2}+c^{2}+(2\sin(r)b)^{2}).\end{array}

By condition C1C_{1} and the lower bound just obtained, we have

min~𝔽{(lx(x0))(l2(x0))22(~lx(x0))(~l2(x0))22}(lx(x0))(l2(x0))22(lx(x0))(l2(x0))22+γ2((L1)(sin(r)cb)2+c2+(2sin(r)b)2)=1γ2((L1)(sin(r)cb)2+c2+(2sin(r)b)2)(lx(x0))(l2(x0))22+γ2((L1)(sin(r)cb)2+c2+(2sin(r)b)2)1γ2((L1)(sin(r)cb)2+c2+(2sin(r)b)2)4A+γ2((L1)(sin(r)cb)2+c2+(2sin(r)b)2).\begin{array}[]{ll}&\min_{\widetilde{{\mathcal{F}}}\in{\mathbb{F}}}\{\frac{||\nabla({\mathcal{F}}_{l_{x}}(x_{0}))-\nabla({\mathcal{F}}_{l_{2}}(x_{0}))||_{2}^{2}}{||\nabla(\widetilde{{\mathcal{F}}}_{l_{x}}(x_{0}))-\nabla(\widetilde{{\mathcal{F}}}_{l_{2}}(x_{0}))||_{2}^{2}}\}\\ \leq&\frac{||\nabla({\mathcal{F}}_{l_{x}}(x_{0}))-\nabla({\mathcal{F}}_{l_{2}}(x_{0}))||_{2}^{2}}{||\nabla({\mathcal{F}}_{l_{x}}(x_{0}))-\nabla({\mathcal{F}}_{l_{2}}(x_{0}))||_{2}^{2}+\gamma^{2}((L-1)(\sin(r)cb)^{2}+c^{2}+(2\sin(r)b)^{2})}\\ =&1-\frac{\gamma^{2}((L-1)(\sin(r)cb)^{2}+c^{2}+(2\sin(r)b)^{2})}{||\nabla({\mathcal{F}}_{l_{x}}(x_{0}))-\nabla({\mathcal{F}}_{l_{2}}(x_{0}))||_{2}^{2}+\gamma^{2}((L-1)(\sin(r)cb)^{2}+c^{2}+(2\sin(r)b)^{2})}\\ \leq&1-\frac{\gamma^{2}((L-1)(\sin(r)cb)^{2}+c^{2}+(2\sin(r)b)^{2})}{4A+\gamma^{2}((L-1)(\sin(r)cb)^{2}+c^{2}+(2\sin(r)b)^{2})}.\end{array}

The theorem is proved. ∎
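The perturbation sets S_{l} used in the above proof are easy to populate numerically: any matrix whose rows are orthogonal to {\mathcal{F}}^{l-1}(x_{0}), scaled so that its largest entry is \gamma, belongs to S_{l}. A minimal NumPy sketch, reading ||\cdot||_{\infty} as the entrywise maximum as in the proof of Theorem 4.1:

import numpy as np

def random_element_of_S_l(a, gamma, seed=0):
    # a = F^{l-1}(x_0); returns a nonzero Q with Q a = 0 and ||Q||_inf = gamma
    rng = np.random.default_rng(seed)
    Q = rng.standard_normal((len(a), len(a)))
    Q -= np.outer(Q @ a, a) / (a @ a)      # project every row onto the hyperplane orthogonal to a
    return gamma * Q / np.abs(Q).max()     # scale so the largest entry has absolute value gamma

a = np.abs(np.random.default_rng(1).standard_normal(8))   # a generic activation vector
Q = random_element_of_S_l(a, gamma=0.05)
print(np.abs(Q @ a).max(), np.abs(Q).max())                # ~0 and 0.05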

We now prove Theorem 4.4.

Proof.

The proof is similar to that of Theorem 4.3, so certain details are omitted. Let T_{l}=\{{\mathcal{F}}^{l-1}(x)\,|\,x\in S\}\subset{\mathbb{R}}^{n}, and let S_{l} be the set of Q\in{\mathbb{R}}^{n\times n} such that ||Q||_{\infty}\leq\gamma and Qt=0 for all t\in T_{l}, for l\in[L+1]. S_{l} must contain non-zero elements because of condition C_{4}.

Let {\mathbb{F}} be the set of networks \widetilde{{\mathcal{F}}}:{\mathbb{R}}^{n}\to{\mathbb{R}}^{m} of the form

~(x)=W~L+1σ(W~Lσ(W~L1σ(σ(W~1x))))\widetilde{{\mathcal{F}}}(x)=\widetilde{W}_{L+1}\sigma(\widetilde{W}_{L}\sigma(\widetilde{W}_{L-1}\sigma(\dots\sigma(\widetilde{W}_{1}x))))

where W~lWlSl\widetilde{W}_{l}-W_{l}\in S_{l}. Then for all ~𝔽\widetilde{{\mathcal{F}}}\in{\mathbb{F}}, we have ~l(x)=l(x)\widetilde{{\mathcal{F}}}^{l}(x)={\mathcal{F}}^{l}(x) for l[L+1]l\in[L+1] and xSx\in S, so 𝔽H(γ){\mathbb{F}}\subset H(\gamma). As a consequence,

xDxminllx{lx(x)l(x)22I(lx(x)>l(x))}dx=xDxminllx{~lx(x)~l(x)22I(~lx(x)>~l(x))}dx.\begin{array}[]{ll}&\int_{x\sim D_{x}}\min_{l\neq l_{x}}\{||{\mathcal{F}}_{l_{x}}(x)-{\mathcal{F}}_{l}(x)||_{2}^{2}I({\mathcal{F}}_{l_{x}}(x)>{\mathcal{F}}_{l}(x))\}{\hbox{\rm{d}}}x\\ =&\int_{x\sim D_{x}}\min_{l\neq l_{x}}\{||\widetilde{{\mathcal{F}}}_{l_{x}}(x)-\widetilde{{\mathcal{F}}}_{l}(x)||_{2}^{2}I(\widetilde{{\mathcal{F}}}_{l_{x}}(x)>\widetilde{{\mathcal{F}}}_{l}(x))\}{\hbox{\rm{d}}}x.\end{array}

Let l2=argmaxllx{(lx(x))(l(x))22}l_{2}=\arg\max_{l\neq l_{x}}\{||\nabla({\mathcal{F}}_{l_{x}}(x))-\nabla({\mathcal{F}}_{l}(x))||_{2}^{2}\}. Then

\begin{array}[]{ll}&\int_{x\sim D_{x}}\max_{l\neq l_{x}}\{||\nabla({\mathcal{F}}_{l_{x}}(x))-\nabla({\mathcal{F}}_{l}(x))||_{2}^{2}\}{\hbox{\rm{d}}}x\\ =&\int_{x\sim D_{x}}||\nabla({\mathcal{F}}_{l_{x}}(x))-\nabla({\mathcal{F}}_{l_{2}}(x))||_{2}^{2}{\hbox{\rm{d}}}x\\ \end{array}

and

\begin{array}[]{ll}&\int_{x\sim D_{x}}\max_{l\neq l_{x}}\{||\nabla(\widetilde{{\mathcal{F}}}_{l_{x}}(x))-\nabla(\widetilde{{\mathcal{F}}}_{l}(x))||_{2}^{2}\}{\hbox{\rm{d}}}x\\ \geq&\int_{x\sim D_{x}}||\nabla(\widetilde{{\mathcal{F}}}_{l_{x}}(x))-\nabla(\widetilde{{\mathcal{F}}}_{l_{2}}(x))||_{2}^{2}{\hbox{\rm{d}}}x.\\ \end{array}

Therefore,

\begin{array}[]{ll}&\min_{\widetilde{{\mathcal{F}}}\in H(\gamma)}\{\frac{R(\widetilde{{\mathcal{F}}},D_{x})}{R({\mathcal{F}},D_{x})}\}\\ \leq&\min_{\widetilde{{\mathcal{F}}}\in{\mathbb{F}}}\{\frac{R(\widetilde{{\mathcal{F}}},D_{x})}{R({\mathcal{F}},D_{x})}\}\\ \leq&\min_{\widetilde{{\mathcal{F}}}\in{\mathbb{F}}}\{\frac{\int_{x\sim D_{x}}||\nabla({\mathcal{F}}_{l_{x}}(x))-\nabla({\mathcal{F}}_{l_{2}}(x))||_{2}^{2}{\hbox{\rm{d}}}x}{\int_{x\sim D_{x}}||\nabla(\widetilde{{\mathcal{F}}}_{l_{x}}(x))-\nabla(\widetilde{{\mathcal{F}}}_{l_{2}}(x))||_{2}^{2}{\hbox{\rm{d}}}x}\}.\end{array}

We will estimate \max_{\widetilde{{\mathcal{F}}}\in{\mathbb{F}}}\{\int_{x\sim D_{x}}||\nabla(\widetilde{{\mathcal{F}}}_{l_{x}}(x))-\nabla(\widetilde{{\mathcal{F}}}_{l_{2}}(x))||_{2}^{2}{\hbox{\rm{d}}}x\}. Let J_{l}(x)={\hbox{\rm{diag}}}({\hbox{\rm{sign}}}({\mathcal{F}}^{l}(x)))\in{\mathbb{R}}^{n\times n}, where l\in[L]. Then \frac{\nabla{\mathcal{F}}_{i}(x)}{\nabla x}=W^{i}_{L+1}(J_{L}(x)W_{L})(J_{L-1}(x)W_{L-1})\dots(J_{1}(x)W_{1}). Also, for all \widetilde{{\mathcal{F}}}\in{\mathbb{F}},

~i(x)x=W~L+1i(JL(x)W~L)(JL1(x)W~L1)(J1(x)W~1).\frac{\nabla\widetilde{{\mathcal{F}}}_{i}(x)}{\nabla x}=\widetilde{W}_{L+1}^{i}(J_{L}(x)\widetilde{W}_{L})(J_{L-1}(x)\widetilde{W}_{L-1})\dots(J_{1}(x)\widetilde{W}_{1}).

Denote W¯i=W~iWiSi\overline{W}_{i}=\widetilde{W}_{i}-W_{i}\in S_{i}. Then

(~lx(x))(~l2(x))=(W~L+1(lx)W~L+1(l2))(JL(x)W~L)(JL1(x)W~L1)(J1(x)W~1)=(WL+1(lx)WL+1(l2)+W¯L+1(lx)W¯L+1(l2))(JL(x)(WL+W¯L))(JL1(x)(WL1+W¯L1))(J1(x)(W1+W¯1))\begin{array}[]{ll}&\nabla(\widetilde{{\mathcal{F}}}_{l_{x}}(x))-\nabla(\widetilde{{\mathcal{F}}}_{l_{2}}(x))\\ =&(\widetilde{W}_{L+1}^{(l_{x})}-\widetilde{W}_{L+1}^{(l_{2})})(J_{L}(x)\widetilde{W}_{L})(J_{L-1}(x)\widetilde{W}_{L-1})\dots(J_{1}(x)\widetilde{W}_{1})\\ =&(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})}+\overline{W}_{L+1}^{(l_{x})}-\overline{W}_{L+1}^{(l_{2})})(J_{L}(x)(W_{L}+\overline{W}_{L}))(J_{L-1}(x)(W_{L-1}+\overline{W}_{L-1}))\dots(J_{1}(x)(W_{1}+\overline{W}_{1}))\\ \end{array}

Let Al(x)=L(t)l(t)|t=xA_{l}(x)=\frac{\nabla{\mathcal{F}}^{L}(t)}{\nabla{\mathcal{F}}^{l}(t)}|_{t=x}, Bl(x)=l(t)t|t=xB_{l}(x)=\frac{\nabla{\mathcal{F}}^{l}(t)}{\nabla t}|_{t=x} where l[L]l\in[L]. Then

Al(x)=(JL(x)WL)(JL1(x)WL1)(Jl+1(x)Wl+1)A_{l}(x)=(J_{L}(x)W_{L})(J_{L-1}(x)W_{L-1})\dots(J_{l+1}(x)W_{l+1}) and

B_{l}(x)=(J_{l}(x)W_{l})(J_{l-1}(x)W_{l-1})\dots(J_{1}(x)W_{1}).

Let M_{1}(x,\overline{W}_{1})=(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{1}(x)J_{1}(x)\overline{W}_{1}, M_{l}(x,\overline{W}_{l})=(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{l}(x)J_{l}(x)\overline{W}_{l}B_{l-1}(x) for l\in\{2,3,\dots,L\}, and M_{L+1}(x,\overline{W}_{L+1})=(\overline{W}_{L+1}^{(l_{x})}-\overline{W}_{L+1}^{(l_{2})})B_{L}(x). By Lemma 6.3,

max~𝔽{xDx(~lx(x))(~l2(x))22dx}maxW¯lSl,l[L+1]{xDx(lx(x))(l2(x))22+l=1L+1Ml(W¯l)22dx}=xDx(lx(x))(l2(x))22+l=1L+1maxW¯lSl{xDxMl(W¯l)22dx}.\begin{array}[]{ll}&\max_{\widetilde{{\mathcal{F}}}\in{\mathbb{F}}}\{\int_{x\sim D_{x}}||\nabla(\widetilde{{\mathcal{F}}}_{l_{x}}(x))-\nabla(\widetilde{{\mathcal{F}}}_{l_{2}}(x))||^{2}_{2}{\hbox{\rm{d}}}x\}\\ \geq&\max_{\overline{W}_{l}\in S_{l},l\in[L+1]}\{\int_{x\sim D_{x}}||\nabla({\mathcal{F}}_{l_{x}}(x))-\nabla({\mathcal{F}}_{l_{2}}(x))||_{2}^{2}+\sum_{l=1}^{L+1}||M_{l}(\overline{W}_{l})||^{2}_{2}{\hbox{\rm{d}}}x\}\\ =&\int_{x\sim D_{x}}||\nabla({\mathcal{F}}_{l_{x}}(x))-\nabla({\mathcal{F}}_{l_{2}}(x))||_{2}^{2}+\sum_{l=1}^{L+1}\max_{\overline{W}_{l}\in S_{l}}\{\int_{x\sim D_{x}}||M_{l}(\overline{W}_{l})||^{2}_{2}{\hbox{\rm{d}}}x\}.\end{array}

Let W¯1(k)n×n\overline{W}_{1}(k)\in{\mathbb{R}}^{n\times n} be the matrix whose kk-th row is equal to kk-th row of W¯1\overline{W}_{1}, and other rows are 0. Let (J1(x))(k)(J_{1}(x))^{(k)} be the kk-th row of J1(x)J_{1}(x). Then

\begin{array}[]{ll}&\max_{\overline{W}_{1}\in S_{1}}\{\int_{x\sim D_{x}}||M_{1}(x,\overline{W}_{1})||^{2}_{2}{\hbox{\rm{d}}}x\}\\ =&\max_{\overline{W}_{1}\in S_{1}}\{\int_{x\sim D_{x}}||(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{1}(x)J_{1}(x)\overline{W}_{1}||^{2}_{2}{\hbox{\rm{d}}}x\}\\ \geq&\max_{\overline{W}_{1}\in S_{1}}\{\int_{x\sim D_{x}}||(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{1}(x)J_{1}(x)\overline{W}_{1}||_{\infty}^{2}{\hbox{\rm{d}}}x\}\\ \geq&\max_{\overline{W}_{1}\in S_{1},k\in[n]}\{\int_{x\sim D_{x}}(||(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{1}(x)||_{-\infty}||J_{1}(x)\overline{W}_{1}(k)||_{\infty})^{2}{\hbox{\rm{d}}}x\}\\ \geq&\max_{k\in[n]}\{\int_{x\sim D_{x}}(||(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{1}(x)||_{-\infty}I((J_{1}(x))^{(k)}\neq 0)\gamma)^{2}{\hbox{\rm{d}}}x\}.\\ \end{array}

By condition C_{2}, we know that P_{x\sim D_{x}}(||(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{1}(x)||_{-\infty}>c_{1})>\alpha_{1}, and by condition C_{4} and the pigeonhole principle, there exists a k\in[n] such that P_{x\sim D_{x}}(I((J_{1}(x))^{(k)}\neq 0)\ |\ ||(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{1}(x)||_{-\infty}>c_{1})>\gamma. Thus there exists a k\in[n] such that

P_{x\sim D_{x}}(I((J_{1}(x))^{(k)}\neq 0),\ ||(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{1}(x)||_{-\infty}>c_{1})>\gamma\alpha_{1}.

Then

\begin{array}[]{ll}&\max_{\overline{W}_{1}\in S_{1}}\{\int_{x\sim D_{x}}||M_{1}(x,\overline{W}_{1})||^{2}_{2}{\hbox{\rm{d}}}x\}\\ \geq&\max_{k\in[n]}\{\int_{x\sim D_{x}}(||(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{1}(x)||_{-\infty}I((J_{1}(x))^{(k)}\neq 0)\gamma)^{2}{\hbox{\rm{d}}}x\}\\ \geq&\max_{k\in[n]}\{P_{x\sim D_{x}}(I((J_{1}(x))^{(k)}\neq 0),\ ||(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{1}(x)||_{-\infty}>c_{1})(c_{1}\gamma)^{2}\}\\ \geq&(c_{1}\gamma)^{2}\gamma\alpha_{1}.\\ \end{array}

Let S~l={xSl|\widetilde{S}_{l}=\{x\in S_{l}\,| only one row of xx is not 0}0\}. Then for l{2,3,,L}l\in\{2,3,\dots,L\}, we have

maxW¯lSi{xDxMl(x,W¯l)22dx}=maxW¯lSl{xDx(WL+1(lx)WL+1(l2))Al(x)Jl(x)W¯lBl1(x)22dx}maxW¯lS~l{xDx((WL+1(lx)WL+1(l2))Al(x)Jl(x)W¯lBl1(x))2dx}maxW¯lS~l{PxDx(||(WL+1(lx)WL+1(l2))Al(x)||>cl,||Jl(x)W¯lBl1(x)||dl||Jl(x)W¯l||,Jl(x)W¯l0)(γcldl1)2}\begin{array}[]{ll}&\max_{\overline{W}_{l}\in S_{i}}\{\int_{x\sim D_{x}}||M_{l}(x,\overline{W}_{l})||^{2}_{2}{\hbox{\rm{d}}}x\}\\ =&\max_{\overline{W}_{l}\in S_{l}}\{\int_{x\sim D_{x}}||(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{l}(x)J_{l}(x)\overline{W}_{l}B_{l-1}(x)||^{2}_{2}{\hbox{\rm{d}}}x\}\\ \geq&\max_{\overline{W}_{l}\in\widetilde{S}_{l}}\{\int_{x\sim D_{x}}(||(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{l}(x)||_{-\infty}||J_{l}(x)\overline{W}_{l}B_{l-1}(x)||_{\infty})^{2}{\hbox{\rm{d}}}x\}\\ \geq&\max_{\overline{W}_{l}\in\widetilde{S}_{l}}\{P_{x\sim D_{x}}(||(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{l}(x)||_{-\infty}>c_{l},\ ||J_{l}(x)\overline{W}_{l}B_{l-1}(x)||_{\infty}\geq d_{l}||J_{l}(x)\overline{W}_{l}||_{\infty},\\ &J_{l}(x)\overline{W}_{l}\neq 0)(\gamma c_{l}d_{l-1})^{2}\}\\ \end{array}

By conditions C_{3} and C_{2}, we have P_{x\sim D_{x}}(||(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{l}(x)||_{-\infty}>c_{l},\ ||J_{l}(x)\overline{W}_{l}B_{l-1}(x)||_{\infty}\geq d_{l}||J_{l}(x)\overline{W}_{l}||_{\infty})>\alpha_{l}+\beta_{l-1}-1. By condition C_{4} and the pigeonhole principle, there exists a \overline{W}_{l}\in\widetilde{S}_{l} such that P_{x\sim D_{x}}(J_{l}(x)\overline{W}_{l}\neq 0\ |\ ||(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{l}(x)||_{-\infty}>c_{l},\ ||J_{l}(x)\overline{W}_{l}B_{l-1}(x)||_{\infty}\geq d_{l}||J_{l}(x)\overline{W}_{l}||_{\infty})>\gamma. So, there exists a \overline{W}_{l}\in\widetilde{S}_{l} such that

PxDx((WL+1(lx)WL+1(l2))Al(x)>cl,Jl(x)W¯lBl1(x)dlJl(x)W¯l,Jl(x)W¯l0)>γ(αl+βl11).\begin{array}[]{ll}P_{x\sim D_{x}}(||(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{l}(x)||_{-\infty}>c_{l},||J_{l}(x)\overline{W}_{l}B_{l-1}(x)||_{\infty}\geq d_{l}||J_{l}(x)\overline{W}_{l}||_{\infty},J_{l}(x)\overline{W}_{l}\neq 0)\\ >\gamma(\alpha_{l}+\beta_{l-1}-1).\end{array}

Then

maxW¯iSi{xDxMi(x,W¯i)22dx}maxW¯iS~i{PxDx(||(WL+1(lx)WL+1(l2))Ai(x)||>ci,||Ji(x)W¯iBi1(x)||di||Ji(x)W¯i||,Ji(x)W¯i0)(γcidi1)2}(γcidi1)2γ(αi+βi11).\begin{array}[]{ll}&\max_{\overline{W}_{i}\in S_{i}}\{\int_{x\sim D_{x}}||M_{i}(x,\overline{W}_{i})||^{2}_{2}{\hbox{\rm{d}}}x\}\\ \geq&\max_{\overline{W}_{i}\in\widetilde{S}_{i}}\{P_{x\sim D_{x}}(||(W_{L+1}^{(l_{x})}-W_{L+1}^{(l_{2})})A_{i}(x)||_{-\infty}>c_{i},\\ &||J_{i}(x)\overline{W}_{i}B_{i-1}(x)||_{\infty}\geq d_{i}||J_{i}(x)\overline{W}_{i}||_{\infty},\ J_{i}(x)\overline{W}_{i}\neq 0)(\gamma c_{i}d_{i-1})^{2}\}\\ \geq&(\gamma c_{i}d_{i-1})^{2}\gamma(\alpha_{i}+\beta_{i-1}-1).\end{array}

Similarly, by condition C3C_{3}, we have

maxW¯L+1SL+1{xDxML+1(x,W¯L+1)22dx}=maxW¯L+1SL+1{xDx(W¯L+1(lx)W¯L+1(l2))BL(x)22dx}maxW¯L+1SL+1{xDx(W¯L+1(lx)W¯L+1(l2))BL(x)2dx}maxW¯L+1SL+1{xDxI((W¯L+1(lx)W¯L+1(l2))BL(x)dLW¯L+1(lx)W¯L+1(l2))(dLW¯L+1(lx)W¯L+1(l2))2dx}βL(dLγ)2.\begin{array}[]{ll}&\max_{\overline{W}_{L+1}\in S_{L+1}}\{\int_{x\sim D_{x}}||M_{L+1}(x,\overline{W}_{L+1})||^{2}_{2}{\hbox{\rm{d}}}x\}\\ =&\max_{\overline{W}_{L+1}\in S_{L+1}}\{\int_{x\sim D_{x}}||(\overline{W}_{L+1}^{(l_{x})}-\overline{W}_{L+1}^{(l_{2})})B_{L}(x)||^{2}_{2}{\hbox{\rm{d}}}x\}\\ \geq&\max_{\overline{W}_{L+1}\in S_{L+1}}\{\int_{x\sim D_{x}}||(\overline{W}_{L+1}^{(l_{x})}-\overline{W}_{L+1}^{(l_{2})})B_{L}(x)||_{\infty}^{2}{\hbox{\rm{d}}}x\}\\ \geq&\max_{\overline{W}_{L+1}\in S_{L+1}}\{\int_{x\sim D_{x}}I(||(\overline{W}_{L+1}^{(l_{x})}-\overline{W}_{L+1}^{(l_{2})})B_{L}(x)||_{\infty}\geq d_{L}||\overline{W}_{L+1}^{(l_{x})}-\overline{W}_{L+1}^{(l_{2})}||_{\infty})(d_{L}||\overline{W}_{L+1}^{(l_{x})}-\overline{W}_{L+1}^{(l_{2})}||_{\infty})^{2}{\hbox{\rm{d}}}x\}\\ \geq&\beta_{L}(d_{L}\gamma)^{2}.\end{array}

Then we have the desired lower bound

max~𝔽{xDx(~lx(x))(~l2(x))22dx}xDx(lx(x))(l2(x))22dx+i=1L+1maxW¯iSi{xDxMi(W¯i)22dx}xDx(lx(x))(l2(x))22dx+(γc)2α1γ+i=2L(γcidi1)2γ(αi+βi11)+βL(dLγ)2.\begin{array}[]{ll}&\max_{\widetilde{{\mathcal{F}}}\in{\mathbb{F}}}\{\int_{x\sim D_{x}}||\nabla(\widetilde{{\mathcal{F}}}_{l_{x}}(x))-\nabla(\widetilde{{\mathcal{F}}}_{l_{2}}(x))||^{2}_{2}{\hbox{\rm{d}}}x\}\\ \geq&\int_{x\sim D_{x}}||\nabla({\mathcal{F}}_{l_{x}}(x))-\nabla({\mathcal{F}}_{l_{2}}(x))||_{2}^{2}{\hbox{\rm{d}}}x+\sum_{i=1}^{L+1}\max_{\overline{W}_{i}\in S_{i}}\{\int_{x\sim D_{x}}||M_{i}(\overline{W}_{i})||^{2}_{2}{\hbox{\rm{d}}}x\}\\ \geq&\int_{x\sim D_{x}}||\nabla({\mathcal{F}}_{l_{x}}(x))-\nabla({\mathcal{F}}_{l_{2}}(x))||_{2}^{2}{\hbox{\rm{d}}}x+(\gamma c)^{2}\alpha_{1}\gamma+\sum_{i=2}^{L}(\gamma c_{i}d_{i-1})^{2}\gamma(\alpha_{i}+\beta_{i-1}-1)+\beta_{L}(d_{L}\gamma)^{2}.\\ \end{array}

By condition C1C_{1} and the lower bound just obtained, we have

\begin{array}[]{ll}&\min_{\widetilde{{\mathcal{F}}}\in{\mathbb{F}}}\{\frac{\int_{x\sim D_{x}}||\nabla({\mathcal{F}}_{l_{x}}(x))-\nabla({\mathcal{F}}_{l_{2}}(x))||_{2}^{2}{\hbox{\rm{d}}}x}{\int_{x\sim D_{x}}||\nabla(\widetilde{{\mathcal{F}}}_{l_{x}}(x))-\nabla(\widetilde{{\mathcal{F}}}_{l_{2}}(x))||_{2}^{2}{\hbox{\rm{d}}}x}\}\\ \leq&\frac{\int_{x\sim D_{x}}||\nabla({\mathcal{F}}_{l_{x}}(x))-\nabla({\mathcal{F}}_{l_{2}}(x))||_{2}^{2}{\hbox{\rm{d}}}x}{\int_{x\sim D_{x}}||\nabla({\mathcal{F}}_{l_{x}}(x))-\nabla({\mathcal{F}}_{l_{2}}(x))||_{2}^{2}{\hbox{\rm{d}}}x+(\gamma c_{1})^{2}\alpha_{1}\gamma_{1}+\sum_{i=2}^{L}(\gamma c_{i}d_{i-1})^{2}\gamma_{i}(\alpha_{i}+\beta_{i-1}-1)+\beta_{L}(d_{L}\gamma)^{2}}\\ =&1-\frac{(\gamma c_{1})^{2}\alpha_{1}\gamma_{1}+\sum_{i=2}^{L}(\gamma c_{i}d_{i-1})^{2}\gamma_{i}(\alpha_{i}+\beta_{i-1}-1)+\beta_{L}(d_{L}\gamma)^{2}}{\int_{x\sim D_{x}}||\nabla({\mathcal{F}}_{l_{x}}(x))-\nabla({\mathcal{F}}_{l_{2}}(x))||_{2}^{2}{\hbox{\rm{d}}}x+(\gamma c_{1})^{2}\alpha_{1}\gamma_{1}+\sum_{i=2}^{L}(\gamma c_{i}d_{i-1})^{2}\gamma_{i}(\alpha_{i}+\beta_{i-1}-1)+\beta_{L}(d_{L}\gamma)^{2}}\\ \leq&1-\frac{(\gamma c_{1})^{2}\alpha_{1}\gamma_{1}+\sum_{i=2}^{L}(\gamma c_{i}d_{i-1})^{2}\gamma_{i}(\alpha_{i}+\beta_{i-1}-1)+\beta_{L}(d_{L}\gamma)^{2}}{4A+(\gamma c_{1})^{2}\alpha_{1}\gamma_{1}+\sum_{i=2}^{L}(\gamma c_{i}d_{i-1})^{2}\gamma_{i}(\alpha_{i}+\beta_{i-1}-1)+\beta_{L}(d_{L}\gamma)^{2}}.\end{array}

The theorem is proved. ∎

7 Conclusion

The adversarial parameter attack for DNNs is proposed. In this attack, the adversary makes small changes to the parameters of a trained DNN such that the attacked DNN keeps the accuracy of the original DNN as much as possible, but its robustness becomes as low as possible. The goal of the attack is that the attack is essentially imperceptible to the user while the robustness of the DNN is broken. The existence of adversarial parameters is proved under certain conditions, and effective adversarial parameter attack algorithms are also given.

In general, it is still out of reach to provide provably safe DNNs in real-world applications; one practical way forward is to develop safer DNN models and training methods and to evaluate the safety of the trained models against existing attack methods. In other words, a DNN to be deployed is considered safe if it is safe against existing attacks in a certain sense. From this viewpoint, it is valuable to have more attack methods. This is similar to cryptanalysis [13], where a much more mature theory and a richer collection of attack methods have been developed.

References

  • [1] N. Akhtar and A. Mian. Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey. arXiv:1801.00553v3, 2018.
  • [2] B. Allen, S. Agarwal, J. Kalpathy-Cramer, K. Dreyer. Democratizing AI. J. Am. Coll. Radiol., 16(7), 961-963, 2019.
  • [3] A. Azulay and Y. Weiss. Why Do Deep Convolutional Networks Generalize so Poorly to Small Image Transformations? Journal of Machine Learning Research, 20, 1-25, 2019.
  • [4] A. Athalye, N. Carlini, D. Wagner. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. Proc. ICML, PMLR, 2018: 274-283.
  • [5] T. Bai, J. Luo, J. Zhao. Recent Advances in Understanding Adversarial Robustness of Deep Neural Networks. ArXiv:2011.01539, 2020.
  • [6] A. Bastounis, A.C. Hansen, V. Vlačić. The Mathematics of Adversarial Attacks in AI - Why Deep Learning is Unstable Despite the Existence of Stable Neural Networks. arXiv:2109.06098, 2021.
  • [7] J. Breier, X. Hou, D. Jap, L. Ma, S. Bhasin, Y. Liu. DeepLaser: Practical Fault Attack on Deep Neural Networks. arXiv:1806.05859, 2018.
  • [8] J. Cohen, E. Rosenfeld, Z. Kolter. Certified Adversarial Robustness via Randomized Smoothing. Proc. ICML’2019, PMLR, 1310-1320, 2019.
  • [9] G. Cybenko. Approximation by Superpositions of a Sigmoidal Function. Mathematics of Control, Signals and Systems, 2(4): 303-314, 1989.
  • [10] C. Etmann, S. Lunz, P. Maass, C.B. Schonlieb. On the Connection Between Adversarial Robustness and Saliency Map Interpretability. arXiv:1905.04172, 2019.
  • [11] K. He, X. Zhang, S. Ren, J. Sun. Deep Residual Learning for Image Recognition. Proc. CVPR, 770-778, 2016.
  • [12] M. Hein and M. Andriushchenko. Formal Guarantees on the Robustness of a Classifier Against Adversarial Manipulation. Proc. NIPS, 2266-2276, 2017.
  • [13] O. Goldreich. Foundations of Cryptography, Volume II, Basic Tools. Cambridge University Press, 2009.
  • [14] I.J. Goodfellow, J. Shlens, C. Szegedy. Explaining and Harnessing Adversarial Examples. ArXiv:1412.6572, 2014.
  • [15] Y. LeCun, Y. Bengio, G. Hinton. Deep Learning. Nature, 521(7553), 436-444, 2015.
  • [16] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner. Gradient-based Learning Applied to Document Recognition. Proc. of the IEEE, 86(11): 2278-2324, 1998.
  • [17] H. Li, A. Kadav, I. Durdanovic, H. Samet, H.P. Graf. Pruning Filters for Efficient Convnets. arXiv:1608.08710, 2016.
  • [18] Y. Liu, L. Wei, B. Luo, Q. Xu. Fault Injection Attack on Deep Neural Network. Proc. of the IEEE/ACM International Conference on Computer-Aided Design, 131-138, 2017.
  • [19] A. Ma, F. Faghri, N. Papernot, A.M. Farahmand. SOAR: Second-Order Adversarial Regularization. arXiv:2004.01832, 2020.
  • [20] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, A. Vladu. Towards Deep Learning Models Resistant to Adversarial Attacks. ArXiv:1706.06083, 2017.
  • [21] A.S. Morcos, D.G.T. Barrett, N.C. Rabinowitz, M. Botvinick. On the Importance of Single Directions for Generalization. arXiv:1803.06959, 2018.
  • [22] B. Neyshabur, R. Tomioka, N. Srebro. Norm-based Capacity Control in Neural Networks. Proc. COLT’15, 1376-1401, 2015.
  • [23] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z.B. Celik, A. Swami. The Limitations of Deep Learning in Adversarial Settings. IEEE European Symposium on Security and Privacy, IEEE Press, 2016: 372-387.
  • [24] A. Raghunathan, J. Steinhardt, P. Liang. Certified Defenses Against Adversarial Examples. ArXiv: 1801.09344, 2018.
  • [25] A. RoyChowdhury, P. Sharma, E. Learned-Miller. Reducing Duplicate Filters in Deep Neural Networks. NIPS workshop on Deep Learning: Bridging Theory and Practice, 1, 2017.
  • [26] W. Shang, K. Sohn, D. Almeida, H. Lee. Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units. Proc. ICML, PMLR, 2217-2225, 2016.
  • [27] A. Shafahi, W.R. Huang, C. Studer, S. Feizi, T. Goldstein. Are Adversarial Examples Inevitable? ArXiv:1809.02104, 2018.
  • [28] C.J. Simon-Gabriel, Y. Ollivier, L. Bottou, B. Schölkopf, D. Lopez-Paz. First-order Adversarial Vulnerability of Neural Networks and Input Dimension. Proc. ICML, PMLR, 5809-5817, 2019.
  • [29] K. Simonyan and A. Zisserman. Very Deep Convolutional Networks for Large-scale Image Recognition. arXiv:1409.1556, 2014.
  • [30] X. Sun, Z. Zhang, X. Ren, R. Luo, L. Li. Exploring the Vulnerability of Deep Neural Networks: A Study of Parameter Corruption. arXiv:2006.05620, 2020.
  • [31] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I.J. Goodfellow, R. Fergus. Intriguing Properties of Neural Networks. ArXiv:1312.6199, 2013.
  • [32] F. Tramèr, N. Papernot, I. Goodfellow, D. Boneh, P. McDaniel. The Space of Transferable Adversarial Examples. arXiv:1704.03453, 2017.
  • [33] Y.L. Tsai, C.Y. Hsu, C.M. Yu, P.Y. Chen. Formalizing Generalization and Robustness of Neural Networks to Weight Perturbations. arXiv:2103.02200, 2021.
  • [34] I.Y. Tyukin, D.J. Higham, A.N. Gorban. On Adversarial Examples and Stealth Attacks in Artificial Intelligence Systems. 2020 International Joint Conference on Neural Networks, 1-6, IEEE Press, 2020.
  • [35] I.Y. Tyukin, D.J. Higham, A.N. Gorban, E. Woldegeorgis. The Feasibility and Inevitability of Stealth Attacks. arXiv:2106.13997, 2021.
  • [36] Z. Wang, C. Xiang, W. Zou, C. Xu. MMA Regularization: Decorrelating Weights of Neural Networks by Maximizing the Minimal Angles. arXiv:2006.06527, 2020.
  • [37] T.W. Weng, P. Zhao, S. Liu, P.Y. Chen, X. Lin, L. Daniel. Towards Certificated Model Robustness Against Weight Perturbations. Proc. of the AAAI, 34(04): 6356-6363, 2020.
  • [38] H. Xu, Y. Ma, H.C. Liu, D. Deb, H. Liu, J.L. Tang, A.K. Jain. Adversarial Attacks and Defenses in Images, Graphs and Text: A Review. International Journal of Automation and Computing, 17(2), 151-178, 2020.
  • [39] L. Yu and X.S. Gao. Improve the Robustness and Accuracy of Deep Neural Network with L2,L_{2,\infty} Normalization, arXiv:2010.04912, 2020.
  • [40] L. Yu and X.S. Gao. Robust and Information-theoretically Safe Bias Classifier against Adversarial Attacks. arXiv:2111.04404, 2021.
  • [41] H. Zhang, Y. Yu, J. Jiao, E.P. Xing, L.E. Ghaoui, M.I. Jordan. Theoretically Principled Trade-off Between Robustness and Accuracy. Proc. ICML, 2019.
  • [42] X.Y. Zhang, C.L. Liu, C.Y. Suen. Towards Robust Pattern Recognition: A Review. Proc. of the IEEE, 108(6), 894-922, 2020.
  • [43] P. Zhao, S. Wang, C. Gongye, Y. Wang, Y. Fei, X. Lin. Fault Sneaking Attack: a Stealthy Framework for Misleading Deep Neural Networks. Proc. of the 56th Annual Design Automation Conference, 2019.