Model-free False Data Injection Attack in Networked Control Systems: A Feedback Optimization Approach

Xiaoyu Luo^†, Chongrong Fang^†, Jianping He^†, Chengcheng Zhao^‡, and Dario Paccagnan^§

^†: Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China. E-mail: [email protected], [email protected], [email protected].
^‡: State Key Laboratory of Industrial Control Technology and Institute of Cyberspace Research, Zhejiang University, China. E-mail: [email protected].
^§: Department of Computing, Imperial College London, London SW7 2AZ, UK. E-mail: [email protected].

Abstract

Security issues have gathered growing interest within the control systems community, as physical components and communication networks are increasingly vulnerable to cyber attacks. In this context, recent literature has studied increasingly sophisticated false data injection attacks, with the aim of designing mitigative measures that improve the systems’ security. Notably, data-driven attack strategies, whereby the system dynamics are unknown to the adversary, have received increasing attention. However, many of the existing works on the topic rely on the implicit assumption of linear system dynamics, significantly limiting their scope. In contrast, in this work we design and analyze a truly model-free false data injection attack that applies to general linear and nonlinear systems. More specifically, we aim at designing an injected signal that steers the output of the system toward a (maliciously chosen) trajectory. We do so by designing a zeroth-order feedback optimization policy that jointly uses probing signals for real-time measurements. We then characterize the quality of the proposed model-free attack through its optimality gap, which is affected by the dimension of the attack signal, the number of iterations performed, and the convergence rate of the system. Finally, we extend the proposed attack scheme to systems with internal noise. Extensive simulations show the effectiveness of the proposed attack scheme.

Index Terms:

Data-driven, false data injection attacks, zeroth-order feedback optimization.

I Introduction

Networked control systems have seen a surge of interest in recent years, largely owing to their widespread applicability to commonly encountered problems including mobile robot coordination, smart grid operation, unmanned vehicle control, and remote diagnosis, to mention but a few [1, 2, 3]. As physical components and communication networks are increasingly vulnerable to cyber attacks, security issues have gathered growing traction in the community.

In this context, false data injection (FDI) – whereby an attacker injects false data by compromising sensor readings or communication channels – is a commonly encountered form of attack [4]. Crucially, through an FDI attack, an adversary can cause significant damage to the infrastructure while remaining undetected.

I-A Motivation

Against this backdrop, recent literature has proposed increasingly sophisticated FDI attacks, with the hope that understanding their workings would lead to the design of mitigative measures that improve the systems’ security [5, 6, 7, 8]. Among them, data-driven attack strategies, whereby an attack signal is designed solely relying on the available system’s measurements, have received growing attention. However, three critical issues deserve further consideration. First, many of the existing works tacitly assume that, while unknown, the underlying system follows linear time-invariant dynamics [9, 10, 11]. This significantly restricts the power of the adversary. Second, it remains to be explored whether an adversary without any prior information about the system model can achieve its attack objective; answering this question helps in analyzing the system’s vulnerability. Third, when the capability of the adversary is limited, e.g., the attack energy is limited [12], it is of practical interest to determine how much impact the attack can have while still achieving the malicious objective.

In this work, we tackle the aforementioned issues by proposing a novel data-driven FDI attack that, crucially, does not rely on prior information regarding the system’s model, and applies to general non-linear dynamics. Towards this goal, we leverage zeroth-order optimization methods, a class of optimization algorithms that do not necessitate the availability of the cost function’s gradient, but simply exploit function evaluations [13]. More specifically, we leverage zeroth-order optimization methods in the context of feedback optimization, where optimization algorithms are utilized as feedback controllers for dynamical systems [14]. These tools provide a new approach to design data-driven attacks.

I-B Contributions

In this work, we design a model-free FDI attack that does not rely on knowledge of the underlying dynamics and that applies to general linear and non-linear systems. Specifically, we aim at steering the output of an unknown dynamical system to a (maliciously chosen) trajectory, through the sole use of real-time measurements. We do so in the bounded attack model, where an upper bound on the energy of the injected signal is given. A comparison between the existing FDI attack strategies and our work is shown in Table I.

Compared to our conference version [15], we extend the proposed model-free attack strategy to systems with internal noise and explore its effects on the optimality of the obtained solutions. Moreover, we significantly expand upon the related work, motivation, performance analysis, and simulation results. The main contributions are summarized as follows.

TABLE I: Comparison of FDI attack designs

| | Model-based [5, 8], etc. | Data-driven [10, 6], etc. | Our work |
|---|---|---|---|
| Prior information | Knowledge of dynamics | Linear dynamics | None |
| Objective | Steer state to desired point | Remain undetected | Steer state to desired trajectory |
| Approach | Dynamic programming | Subspace method | Zeroth-order feedback optimization |

• We construct a zeroth-order feedback optimization framework for the design of an FDI attack strategy, where the adversary has limited capability and no prior information about the system model.

• We propose a model-free attack scheme that drives the output of the system to a maliciously chosen trajectory. From a methodological standpoint, its novelty lies in directly updating the attack signal based on the objective function evaluations.

• We theoretically characterize the solution’s optimality gap. Further, we analyze the impact of the attack signal’s dimension, of the number of iterations, of the variance of the objective function, and of the convergence rate of the dynamical system on the optimality gap. Finally, we extend the proposed model-free attack scheme to noisy systems and derive an upper bound of the optimality gap.

I-C Paper organization

The rest of the paper is organized as follows. Section II reviews the related works. Section III introduces the system and adversary model and formulates the FDI attack design problem. In Section IV, we design the proposed model-free attack strategy and analyze the optimality gap. Section V extends the model-free attack scheme to noisy systems. In Section VI, we analyze the design of stealthy attacks. Simulation results are presented in Section VII. Finally, we conclude our work in Section VIII.

II Related works

Existing FDI attack strategies can be divided into two streams depending on whether they rely on a model-based or data-driven approach.

The literature on model-based FDI attack strategies is vast, and includes [5, 16, 7, 17, 8]. In the following, we review only the works that are most relevant to ours. When the adversary is aware of the system dynamics and other critical information (e.g., statistical properties of noise and the controller’s feedback matrix), Chen et al. [5] formulate a linear quadratic cost function to steer the system state to a desired value, while satisfying a detection-avoidance constraint. In similar settings, i.e., when knowledge of the system dynamics is available, Guo et al. [16] propose an innovation-based linear attack strategy and formulate a two-stage optimization problem to obtain the most-damaging attack policy. In [7], Wang proposes an optimal attack strategy to deteriorate the performance of fault detectors by solving coupled backward recursive Riccati difference equations (RDEs). In [17], the authors design an FDI attack strategy against a remote state estimation algorithm with a sensor-to-estimator communication rate constraint. With the knowledge of all system parameters except for the filter gain, Zhang et al. [8] design stealthy attacks based on the Fisher information matrix to maximize the estimation error. Note that the design of the above FDI attack strategies is mostly based on full knowledge of the system model. However, when the system model changes or its exact knowledge is unavailable, these approaches cannot be applied.

On the other hand, data-driven attack strategies have recently gained momentum [18, 9, 10, 6]. Two approaches are typically pursued. The first approach consists in exploiting offline observation of the system’s dynamics to identify the matrices of the linear system model. Naturally, this approach does not apply to genuinely non-linear dynamics. The second approach consists in directly utilizing input-output data to design an attack strategy. For example, Esmalifalak et al. [18] apply linear independent component analysis (ICA) to estimate the system Jacobian matrix and design unobservable attacks based on the inferred structure. Kim et al. [9] extend the work in [4] and present two data-driven attack strategies based on subspace methods. An et al. [10] formulate the attack goal as a data-based $\mathcal{L}_{2}$ -gain composite optimization problem and propose a new multiobjective adaptive dynamic programming (ADP) method for designing the attack policy. Zhao et al. [6] propose an undetected FDI attack strategy based on a subspace identification technique to maximize the state estimation error. Note that the linearity of the system dynamics is still a crucial and implicit assumption necessary for all the aforementioned works. Our work relaxes this assumption and provides a new perspective to construct a completely model-free attack strategy based on the zeroth-order feedback optimization framework.

III Problem formulation

III-A System dynamic model & adversary model

Consider a discrete-time dynamical system

\displaystyle\begin{split}x_{k+1}=&f(x_{k},u_{k}),\\ y_{k}=&g(x_{k}),\end{split}

(1)

where $x_{k}\in\mathbb{R}^{n}$ is the system state at iteration $k$, $u_{k}\in\mathbb{R}^{m}$ is the system input, and $y_{k}\in\mathbb{R}^{q}$ is the system output.

Assumption 1.

The system (1) is stable under the control of system input $u_{k},\forall k\in\mathbb{N}$ .

Suppose that the adversary can compromise the stable system and manipulate the state $x_{k}$ arbitrarily, with the aim of steering the output value $y^{a}_{k}$ to its expected trajectory. The dynamical system under attack can be rewritten as

\displaystyle\begin{split}x_{k+1}^{a}=&f(x_{k}^{a},u_{k})+\Gamma\theta_{k},\\ y_{k}^{a}=&g(x_{k}^{a}),\end{split}

(2)

where the attack selection matrix $\Gamma\in\mathbb{R}^{n\times p}$ is defined as the non-zero columns of $\mathrm{diag}(\gamma_{1},\ldots,\gamma_{n})$ with the binary variable $\gamma_{i}=1$ if the $i$ -th dimensional state is compromised, and $\theta_{k}\in\mathbb{R}^{p}$ is the injected false data. Then, we make the following assumption about the ability of the adversary.
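As a minimal sketch of the definition above, the selection matrix $\Gamma$ can be built by keeping the non-zero columns of $\mathrm{diag}(\gamma_{1},\ldots,\gamma_{n})$. The function name `attack_selection_matrix` and the numerical values are illustrative assumptions, not part of the original work.

```python
import numpy as np

def attack_selection_matrix(gamma):
    """Return the non-zero columns of diag(gamma) as an n x p matrix."""
    gamma = np.asarray(gamma, dtype=int)
    if not np.all((gamma == 0) | (gamma == 1)):
        raise ValueError("gamma must be binary")
    diag = np.diag(gamma).astype(float)
    # Keep only the columns that correspond to compromised states.
    return diag[:, gamma == 1]

# Hypothetical example: states 1 and 3 of a 4-state system are compromised.
Gamma = attack_selection_matrix([1, 0, 1, 0])
theta = np.array([0.5, -2.0])   # injected false data, p = 2
injection = Gamma @ theta       # n-dimensional term added to the state update in (2)
```

Here `injection` places the two entries of `theta` on the compromised states and leaves the remaining states untouched.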

Assumption 2.

The capability of the adversary is limited, i.e., $\theta_{k}^{\mathrm{T}}\theta_{k}\leq R$ , where $R$ is the upper bound of attack energy.

Assumption 2 is common for energy-constrained adversaries [12], which means that the injected false data is bounded. With Assumptions 1 and 2, it is easy to obtain the following lemma to show that the compromised system (2) is still stable.

Lemma 1.

For the compromised system (2), there exists a unique steady-state map $x^{a}_{ss}:\mathbb{R}^{m}\times\mathbb{R}^{p}\rightarrow\mathbb{R}^{n}$ such that $\forall\theta,f^{\prime}(x^{a}_{ss}(u,\theta),u,\theta)\triangleq f(x^{a}_{ss}(u,\theta),u)+\Gamma\theta=x^{a}_{ss}(u,\theta)$ . The map $x^{a}_{ss}(u,\theta)$ is $M_{x}$ -Lipschitz with respect to $\theta$ , and the function $g(x^{a})$ is $M_{g}$ -Lipschitz with respect to $x^{a}$ .

Remark 1.

Lemma 1 is similar to the result in [19] for guaranteeing the stability of the system. If the system under bounded FDI attacks had no unique steady-state map $x^{a}_{ss}$, the system would diverge, and even the original system (1) would be unstable. The properties of the map $x^{a}_{ss}(u,\theta)$ can be ensured by the implicit function theorem [20, Theorem 1B.1]. With Lemma 1, in the steady state we have

\displaystyle y^{a}=g(x^{a}_{ss}(u,\theta))\triangleq h(u,\theta).
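For intuition, the steady-state map can be written in closed form in the special case of a stable linear system. The sketch below assumes a hypothetical linear instance of (2), $x_{k+1}=Ax_{k}+Bu+\Gamma\theta$ and $y=Cx$; the matrices and values are illustrative only, and the paper itself does not assume linearity.

```python
import numpy as np

# Hypothetical stable linear instance of (2): x+ = A x + B u + Gamma theta, y = C x.
A = np.array([[0.5, 0.1], [0.0, 0.3]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 1.0]])
Gamma = np.array([[1.0], [0.0]])   # only the first state is compromised

def x_ss(u, theta):
    """Fixed point of x = A x + B u + Gamma theta."""
    return np.linalg.solve(np.eye(2) - A, B @ u + Gamma @ theta)

def h(u, theta):
    """Steady-state input-output map h(u, theta) = g(x_ss(u, theta))."""
    return C @ x_ss(u, theta)

u = np.array([1.0])
theta = np.array([0.2])

# Iterate the compromised dynamics and compare with the fixed point.
x = np.zeros(2)
for _ in range(200):
    x = A @ x + B @ u + Gamma @ theta

# In this linear case, the Lipschitz constant of x_ss with respect to theta
# is the spectral norm of (I - A)^{-1} Gamma.
M_x = np.linalg.norm(np.linalg.solve(np.eye(2) - A, Gamma), 2)
```

In this toy case the iterated state converges to `x_ss(u, theta)`, and `M_x` plays the role of the Lipschitz constant in Lemma 1.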

Additionally, the Lyapunov theorem presented in [21, Theorem 2.7] guarantees that there exist a Lyapunov function $V:\mathbb{R}^{n}\times\mathbb{R}^{m}\times\mathbb{R}^{p}\rightarrow\mathbb{R}$ and parameters $\alpha_{1},\alpha_{2},\alpha_{3}>0$ such that

\displaystyle\alpha_{1}\|x^{a}-x^{a}_{ss}(u,\theta)\|^{2}\leq V(x^{a},u,\theta)\leq\alpha_{2}\|x^{a}-x^{a}_{ss}(u,\theta)\|^{2},

(3)

\displaystyle V(f^{\prime}(x^{a},u,\theta),u,\theta)-V(x^{a},u,\theta)\leq-\alpha_{3}\|x^{a}-x^{a}_{ss}(u,\theta)\|^{2}.

(4)

Based on (3) and (4), the rate of the change in one step of the function value $V(x^{a},u,\theta)$ is denoted by

\displaystyle\mu\triangleq\frac{2\alpha_{2}}{\alpha_{1}}(1-\frac{\alpha_{3}}{\alpha_{2}}).

(5)

Assumption 3.

The convergence rate $\mu$ satisfies $\mu<1$ .

The smaller $\mu$ is, the faster the system converges to the steady state [22]. The formal interpretation of $\mu$ will be presented later in Lemma 4.
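To make the definition in (5) concrete, the following sketch evaluates $\mu$ for a hypothetical scalar system $x_{k+1}=ax_{k}+u+\theta$ with the candidate Lyapunov function $V=(x-x_{ss})^{2}$. Under these assumptions, (3) holds with $\alpha_{1}=\alpha_{2}=1$ and the one-step decrease is $(1-a^{2})(x-x_{ss})^{2}$, so $\alpha_{3}=1-a^{2}$ is admissible and $\mu=2a^{2}$. The system and constants are illustrative, not taken from the paper.

```python
def convergence_rate(alpha1, alpha2, alpha3):
    """Evaluate mu from (5): mu = (2*alpha2/alpha1) * (1 - alpha3/alpha2)."""
    return (2.0 * alpha2 / alpha1) * (1.0 - alpha3 / alpha2)

# Hypothetical scalar system x+ = a*x + u + theta with V = (x - x_ss)^2,
# for which alpha1 = alpha2 = 1 and alpha3 = 1 - a^2.
a = 0.5
mu = convergence_rate(1.0, 1.0, 1.0 - a**2)   # equals 2*a^2
satisfies_assumption_3 = mu < 1.0
```

For this toy system, Assumption 3 holds whenever $|a|<1/\sqrt{2}$, which is stricter than plain stability ($|a|<1$).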

III-B Problem formulation

In this paper, we aim to design a completely model-free attack strategy, which is independent of the characteristics and parameters of the system model itself.

Herein, we consider that the adversary’s objective is to steer the output value $y^{a}_{k}$ to follow its expected malicious trajectory $\bar{y}_{k}$ as closely as possible. We also consider that the adversary has limited attack energy. Therefore, the overall goal of the adversary is to reduce, as much as possible, both the error between the true system output and the expected trajectory and the consumed attack energy. In addition, since our proposed attack strategy performs the optimization with the same objective function at each iteration $k$, we omit the subscript $k$ and formally formulate the problem as

\displaystyle\begin{split}\mathcal{P}_{1}:\quad\min_{\theta}\quad&\Phi(\theta,y^{a})=\|y^{a}-\bar{y}\|+\theta^{\mathrm{T}}Q\theta\\ \mathrm{s.t.}\quad&y^{a}=h(u,\theta),\\ &\theta^{\mathrm{T}}\theta\leq R,\end{split}

(6)

where $y^{a}=h(u,\theta)$ is the steady-state map under attacks in (2), which is well defined because the compromised system (2) is stable, $\bar{y}$ is the expected trajectory, and $Q\in\mathbb{R}^{p\times p}$ is the positive definite weight matrix chosen by the adversary to trade off the limited attack energy against the tracking deviation $\|y^{a}-\bar{y}\|$. We also make a common assumption on the objective function as follows.

Assumption 4.

The function $\Phi(\theta,y^{a})$ is $M$ -Lipschitz with respect to $\theta$ , $M_{y}$ -Lipschitz with respect to $y^{a}$ , and $\inf_{\theta,y^{a}}\Phi(\theta,y^{a})>-\infty$ .
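The objective in (6) only requires the measured output, the target, and the attack signal, so it can be evaluated without any model. A minimal sketch follows; the function names `objective` and `is_feasible` and all numerical values are assumptions made for illustration.

```python
import numpy as np

def objective(theta, y_a, y_bar, Q):
    """Phi(theta, y^a) = ||y^a - y_bar|| + theta^T Q theta, as in (6)."""
    theta = np.asarray(theta, dtype=float)
    tracking_error = np.linalg.norm(np.asarray(y_a) - np.asarray(y_bar))
    energy_penalty = theta @ Q @ theta
    return tracking_error + energy_penalty

def is_feasible(theta, R):
    """Check the energy constraint theta^T theta <= R."""
    theta = np.asarray(theta, dtype=float)
    return float(theta @ theta) <= R

# Hypothetical values chosen only for illustration.
Q = 0.1 * np.eye(2)
value = objective([1.0, 2.0], y_a=[3.0, 4.0], y_bar=[0.0, 0.0], Q=Q)
```

A larger $Q$ penalizes attack energy more heavily, at the cost of a larger tracking deviation.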

The challenges of solving problem $\mathcal{P}_{1}$ come from two aspects. One is the nonlinearity of the system model. For the unknown nonlinear system model (2), it is hard to regress its critical system parameters. The other is how to use the compromised measurements to guide the output value to move along the desired trajectory while reducing the consumed attack energy as much as possible. Since $h(u,\theta)$ is unknown, it is difficult to directly obtain the gradients of the objective function with respect to the independent variable $\theta$ to solve problem $\mathcal{P}_{1}$ .

The key idea of the zeroth-order optimization is to utilize the objective function evaluations to construct gradient estimates, thus avoiding using the gradients directly. We aim to construct the gradient estimates of the objective function to solve problem $\mathcal{P}_{1}$ . Different from the traditional zeroth-order optimization framework for the design of the controller with non-manipulated measurements, our design focuses on utilizing the compromised measurements to design the attack signal in the original control systems with designed controllers. Herein, we mainly explore the model-free attack strategy without detector constraints and the attack design under detector constraints will be analyzed in Section VI.

IV Model-free attack strategy design

In this section, we first introduce the zeroth-order optimization framework, which is the basis of our attack strategy design. Then, we utilize real-time measurements to design the attack signal. Finally, we analyze the optimality of the proposed attack strategy.

IV-A Preliminaries of zeroth-order optimization

The attack strategy design in this paper is inspired by the gradient estimates based on the residual feedback in [13].

For an objective function $\Phi(w):\mathbb{R}^{p}\rightarrow\mathbb{R}$ , the gradient estimate proposed in [13] is

\displaystyle\hat{\nabla}\Phi(w_{k})=\frac{v_{k}}{\delta}(\Phi(w_{k}+\delta v_{k})-\Phi(w_{k-1}+\delta v_{k-1})),

(7)

where $v_{k}$ and $v_{k-1}$ are independent random vectors selected uniformly from the unit sphere $\mathcal{S}_{p}\triangleq\{v_{k}\in\mathbb{R}^{p}:\|v_{k}\|=1\}$, i.e., $v_{k}\sim U(\mathcal{S}_{p})$, and $\delta>0$ is the smoothing parameter. Note that only one new objective function evaluation needs to be computed at each iteration in (7), because the objective value evaluated at the previous iteration $k-1$ is reused at the current iteration $k$.
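The residual-feedback estimate in (7) can be sketched as follows. The code draws $v_{k}$ uniformly from the unit sphere by normalizing a Gaussian vector and reuses the previous function value, so each call performs a single new evaluation. The test function `phi` and all parameter values are hypothetical.

```python
import numpy as np

def sample_unit_sphere(p, rng):
    """Draw v uniformly from the unit sphere in R^p by normalizing a Gaussian."""
    z = rng.standard_normal(p)
    return z / np.linalg.norm(z)

def residual_feedback_step(phi, w, prev_value, delta, rng):
    """One evaluation of (7): returns the estimate and the value to reuse next."""
    v = sample_unit_sphere(w.size, rng)
    value = phi(w + delta * v)                      # the only new function evaluation
    grad_est = (v / delta) * (value - prev_value)   # residual feedback, as written in (7)
    return grad_est, value

def phi(w):
    """Hypothetical smooth test function, used only for illustration."""
    return float(np.sum(w**2))

rng = np.random.default_rng(0)
w = np.array([1.0, -1.0, 0.5])
delta = 0.1
prev_value = phi(w + delta * sample_unit_sphere(w.size, rng))
grad_est, prev_value = residual_feedback_step(phi, w, prev_value, delta, rng)
```

A single estimate is noisy; it is only meaningful in expectation, which is why the analysis is stated in terms of the smoothed function $\Phi_{\delta}$.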

According to [13, Lemma 5], $\hat{\nabla}\Phi(w_{k})$ in (7) is an unbiased estimate of the gradient of the smooth approximation $\Phi_{\delta}(w)$ for $\Phi(w)$ at $w_{k}$ , where

\displaystyle\Phi_{\delta}(w)=\mathbb{E}_{v\sim U(\mathcal{S}_{p})}[\Phi(w+\delta v)].

(8)

The properties of $\Phi_{\delta}(w)$ are shown as follows.

Lemma 2 ([19]).

If $\Phi(w):\mathbb{R}^{p}\rightarrow\mathbb{R}$ is $M$-Lipschitz, then for any $w\in\mathbb{R}^{p}$, $\delta>0$, and $\Phi_{\delta}(w)$ defined in (8), we have


\displaystyle\mathbb{E}_{v\sim U(\mathcal{S}_{p})}\left[\frac{p}{\delta}\Phi(w+\delta v)v\right]=\nabla\Phi_{\delta}(w),

(9a)

\displaystyle\|\Phi_{\delta}(w)-\Phi(w)\|\leq M\delta,

(9b)

\displaystyle\|\nabla\Phi_{\delta}(w)-\nabla\Phi(w)\|\leq\frac{Mp}{\delta}.

(9c)

From (9c), we know that $\Phi_{\delta}(w)$ is $\frac{Mp}{\delta}$ -smooth, i.e., its gradient $\nabla\Phi_{\delta}(w)$ is $\frac{Mp}{\delta}$ -Lipschitz continuous.

Figure 1: The schematic of model-free attack strategy design.

IV-B Attack strategy design

The proposed attack strategy iteratively updates attack inputs along the composite direction of the negative gradient estimates of the objective function and the projected gradients. Such a design only utilizes real-time measurements and thus makes the attack strategy intrinsically model-free.

We denote $\mathcal{U}$ as the constraint set in problem $\mathcal{P}_{1}$. With the zeroth-order optimization framework, the proposed model-free attack strategy can be divided into three steps, and the schematic of the attack strategy design is shown in Fig. 1.

Step $1$: Compute the gradient estimate $\tilde{\phi}_{k}$

\displaystyle\tilde{\phi}_{k}=\frac{pv_{k}}{\delta}[\Phi(\theta_{k},y_{k+1}^{a})-\Phi(\theta_{k-1},y_{k}^{a})],

(10)

where $v_{k}$ and $v_{k-1}$ are independent probing signals drawn uniformly from the Euclidean unit sphere $\mathcal{S}_{p}$, i.e., $v_{k}\sim U(\mathcal{S}_{p})$. Since only real-time measurements are available to the adversary, the gradients of the objective function in problem $\mathcal{P}_{1}$ cannot be computed directly. We therefore first inject the probing signal $v_{k}$ and use the resulting measurements to construct the objective function evaluations $\Phi(\theta_{k},y_{k+1}^{a})$ and $\Phi(\theta_{k-1},y_{k}^{a})$ at the current and previous iterations. The previous evaluation $\Phi(\theta_{k-1},y^{a}_{k})$ is reused rather than recomputed, so each iteration requires only one new evaluation. We then compute the gradient estimate $\tilde{\phi}_{k}$ of the objective function from these evaluations with (10).

Step $2$: Update the obtained solution $w_{k+1}$

\displaystyle w_{k+1}=\Pi_{\mathcal{U}}[w_{k}-\eta\tilde{\phi}_{k}],

(11)

where $\Pi_{\mathcal{U}}[\cdot]$ is the projection onto the constraint set $\mathcal{U}$, i.e., $\Pi_{\mathcal{U}}[l_{1}]\equiv\mathrm{arg}\min_{l_{2}\in\mathcal{U}}\|l_{1}-l_{2}\|$, and the step size satisfies $0<\eta<1$. To keep the obtained solutions in the feasible region defined by $\mathcal{U}$, we use the projected gradient descent method to update the solution $w_{k+1}$ at iteration $k+1$ and thereby solve the constrained optimization problem $\mathcal{P}_{1}$.

Step $3$: Update the attack signal $\theta_{k+1}$

\displaystyle\theta_{k+1}=w_{k+1}+\delta v_{k+1}.

(12)

Finally, the attack signal $\theta_{k+1}$ can be obtained by perturbing the solution $w_{k+1}$ with the probing signal $\delta v_{k+1}$ .
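The three steps (10)-(12) can be put together in a short closed-loop sketch. The plant below is a hypothetical stable linear system used only to generate measurements; the attack loop itself never reads `A`, `B`, or `C` and relies solely on the measured output. All numerical values ($R$, $Q$, $\bar{y}$, $\delta$, $\eta$, $T$) are illustrative assumptions rather than the settings used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical plant, hidden from the adversary and used only to simulate (2).
A = np.array([[0.5, 0.1], [0.0, 0.3]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 1.0]])
Gamma = np.eye(2)
u = np.array([1.0])

def plant_step(x, theta):
    """Advance the compromised system one step and return (x_next, y_next)."""
    x_next = A @ x + B @ u + Gamma @ theta
    return x_next, C @ x_next

# Quantities chosen by the adversary (illustrative values).
p, R = 2, 4.0
Q = 0.01 * np.eye(p)
y_bar = np.array([5.0])
delta, eta, T = 0.1, 0.005, 2000

def objective(theta, y):
    """Phi(theta, y) from (6), evaluated on a measured output."""
    return np.linalg.norm(y - y_bar) + theta @ Q @ theta

def sample_unit_sphere():
    z = rng.standard_normal(p)
    return z / np.linalg.norm(z)

def project(z):
    """Euclidean projection onto U = {theta : theta^T theta <= R}."""
    norm = np.linalg.norm(z)
    return z if norm**2 <= R else z * np.sqrt(R) / norm

x = np.zeros(2)
w = np.zeros(p)
v = sample_unit_sphere()
theta = w + delta * v
prev_value = None
history = []

for k in range(T):
    x, y_next = plant_step(x, theta)      # measure y_{k+1}^a
    value = objective(theta, y_next)      # Phi(theta_k, y_{k+1}^a)
    # Step 1: gradient estimate (10); skipped until two evaluations exist.
    if prev_value is None:
        phi_tilde = np.zeros(p)
    else:
        phi_tilde = (p * v / delta) * (value - prev_value)
    # Step 2: projected update (11).
    w = project(w - eta * phi_tilde)
    # Step 3: perturb with a fresh probing signal (12).
    v = sample_unit_sphere()
    theta = w + delta * v
    prev_value = value
    history.append(value)
```

The list `history` records the measured objective values. Note that $w_{k}$ always lies in $\mathcal{U}$, while $\theta_{k}=w_{k}+\delta v_{k}$ can exceed the ball by at most $\delta$ in norm.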

IV-C Performance analysis

Let $\Phi(\theta_{k})\triangleq\Phi(\theta_{k},h(u_{k},\theta_{k}))$ . We use the optimality gap, i.e.,

\displaystyle\frac{1}{T}\sum_{k=1}^{T}\mathbb{E}_{v_{[T]}}[\Phi(\theta_{k})-\Phi(\theta_{k}^{*})]

(13)

to measure the optimality of the proposed attack strategy, where $\theta^{*}_{k}$ is the optimal solution at iteration $k$ and $\mathbb{E}_{v_{[k]}}$ denotes the expectation with respect to $v_{[k]}\triangleq(v_{1},\ldots,v_{k})$.

Before we characterize (13), we provide the upper bounds of $\|w_{k+1}-w_{k+1}^{*}\|^{2}$ and $\|w_{k+1}-w_{k}\|^{2}$ , and some supporting lemmas for auxiliary analysis. We have

\displaystyle\begin{split}\|w_{k+1}-w_{k+1}^{*}\|^{2}=&\|\Pi_{\mathcal{U}}[w_{k}-\eta\tilde{\phi}_{k}]-w_{k+1}^{*}\|^{2}\\ \overset{(s.1)}{\leq}&\|w_{k}-\eta\tilde{\phi}_{k}-w_{k+1}^{*}\|^{2}\\ \overset{(s.2)}{\leq}&2\|w_{k}-w_{k+1}^{*}\|^{2}+2\eta^{2}\|\tilde{\phi}_{k}\|^{2},\end{split}

(14)

where $(s.1)$ follows from the projection property [23, Lemma 2.4] and [24], i.e., for any $l_{1}\in\mathbb{R}^{p}$ and all $l_{2}\in\mathcal{U}$, we have $\|\Pi_{\mathcal{U}}[l_{1}]-l_{2}\|\leq\|l_{1}-l_{2}\|$, and $(s.2)$ follows from the fact that $\|a-b\|^{2}\leq 2(\|a\|^{2}+\|b\|^{2})$. Similarly, we have

\displaystyle\begin{split}\|w_{k+1}-w_{k}\|^{2}=&\|\Pi_{\mathcal{U}}[w_{k}-\eta\tilde{\phi}_{k}]-w_{k}\|^{2}\\ \leq&\|w_{k}-\eta\tilde{\phi}_{k}-w_{k}\|^{2}\\ \leq&\eta^{2}\|\tilde{\phi}_{k}\|^{2}.\end{split}

(15)
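The projection property used in step $(s.1)$ can be checked numerically for the energy ball $\mathcal{U}=\{l:l^{\mathrm{T}}l\leq R\}$. The sketch below is a sanity check on random points rather than a proof; the sample sizes and the value of $R$ are arbitrary choices.

```python
import numpy as np

def project_ball(z, R):
    """Euclidean projection onto U = {l : l^T l <= R}."""
    norm = np.linalg.norm(z)
    return z if norm**2 <= R else z * np.sqrt(R) / norm

rng = np.random.default_rng(0)
R = 1.0
violations = 0
for _ in range(1000):
    l1 = 3.0 * rng.standard_normal(3)              # arbitrary point in R^p
    l2 = project_ball(rng.standard_normal(3), R)   # a point inside U
    lhs = np.linalg.norm(project_ball(l1, R) - l2)
    rhs = np.linalg.norm(l1 - l2)
    # Count cases where projecting l1 moved it farther from l2.
    violations += int(lhs > rhs + 1e-12)
```

Because $\mathcal{U}$ is convex, projecting $l_{1}$ never increases its distance to a point of $\mathcal{U}$, so `violations` stays at zero.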

Note that we replace the steady-state output value $h(u_{k},\theta_{k})$ with the real-time output value $y^{a}_{k+1}$ to close the loop in the zeroth-order feedback optimization framework. This substitution unavoidably introduces the error $e_{\Phi}(x^{a}_{k},\theta_{k})$, given by

\displaystyle e_{\Phi}(x^{a}_{k},\theta_{k})=\Phi(\theta_{k},y^{a}_{k+1})-\Phi(\theta_{k},h(u_{k},\theta_{k})).

(16)

To derive the optimality gap (13), we first analyze the upper bound of the error $e_{\Phi}(x^{a}_{k},\theta_{k})$ and recursive inequalities of two critical variables, i.e., $\mathbb{E}_{v_{[k]}}[V(x^{a}_{k},u_{k},\theta_{k})]$ and $\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k}\|^{2}]$ .

Lemma 3.

If Assumptions 1, 2, 3, and 4 hold, then we have

\displaystyle|e_{\Phi}(x^{a}_{k},\theta_{k})|^{2}\leq\frac{\mu M_{y}^{2}M_{g}^{2}}{2\alpha_{2}}V(x^{a}_{k},u_{k},\theta_{k}).

(17)

Lemma 4.

If Assumptions 1, 2, 3, and 4 hold, with (10), (11) and (12), then we have

\displaystyle\begin{split}\mathbb{E}_{v_{[k]}}[V(x^{a}_{k},u_{k},\theta_{k})]\leq&\mu\mathbb{E}_{v_{[k]}}[V(x^{a}_{k-1},u_{k-1},\theta_{k-1})]\\ &+4\alpha_{2}\eta^{2}M_{x}^{2}\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k-1}\|^{2}]+16\alpha_{2}\delta^{2}M_{x}^{2}.\end{split}

(18)

Lemma 5.

If Assumptions 1, 2, 3, and 4 hold, with (10), (11), and (12), then we have

\displaystyle\begin{split}\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k}\|^{2}]\leq&\frac{6\eta^{2}p^{2}M^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k-1}\|^{2}]+24p^{2}M^{2}\\ &+\frac{3\mu p^{2}M_{y}^{2}M_{g}^{2}}{2\alpha_{2}\delta^{2}}\big(\mathbb{E}_{v_{[k]}}[V(x^{a}_{k},u_{k},\theta_{k})]+\mathbb{E}_{v_{[k]}}[V(x^{a}_{k-1},u_{k-1},\theta_{k-1})]\big).\end{split}

(19)

The proofs of Lemmas 3, 4 and 5 are shown in Appendix IX-A, IX-B and IX-C, respectively. Lemma 3 quantifies the close relationship between $\Phi(\theta_{k},y^{a}_{k+1})$ and $\Phi(\theta_{k},h(u_{k},\theta_{k}))$ . Lemma 4 measures the proximity of the current state $x^{a}_{k}$ compared with the steady state $x^{a}_{ss}(u_{k},\theta_{k})$ . Lemma 5 reflects the first order smoothness of the objective function evaluation $\Phi(\theta_{k},y^{a}_{k+1})$ at the solution $w_{k}$ .

Next, we provide the following theorem to characterize the optimality of the obtained solutions. Note that $\Phi(\theta_{k})\triangleq\Phi(\theta_{k},h(u_{k},\theta_{k}))$ . For simplicity, all the complexity results in this paper are presented in $\mathcal{O}$ notations.

Theorem 1.

Supposing that Assumptions 1, 2, 3, and 4 hold, for any given precision $\epsilon>0$ such that $|\Phi_{\delta}(\theta)-\Phi(\theta)|\leq\epsilon$ , let $\delta=\frac{\epsilon}{M}$ and $\eta=\frac{\kappa\epsilon}{pT}$ with $0<\kappa<\kappa^{*}$ , where

\displaystyle\kappa^{*}=\mathcal{O}\left(\min\left\{\frac{T\sqrt{\mu(1+\mu)}}{\mu},\frac{(1-\mu)T}{\sqrt{\mu(1+\mu)}}\right\}\right),

then we have

\displaystyle\begin{split}&\frac{1}{T}\sum_{k=1}^{T}\mathbb{E}_{v_{[T]}}[\Phi(\theta_{k})-\Phi(\theta_{k}^{*})]\\ &=\mathcal{O}\left(\frac{p^{2}(1+\mu)(1+\sqrt{1+\mu})}{(1-\rho)T^{2}}+\frac{\mu p^{2}}{T}\right),\end{split}

(20)

where $\rho\in(0,1)$ is the maximum eigenvalue of matrix $P$ given by

\displaystyle P=\left[\begin{array}[]{cc}p_{11}&\sqrt{p_{12}p_{21}}\\ \sqrt{p_{12}p_{21}}&p_{22}\end{array}\right]

with

\displaystyle\begin{split}p_{11}=&\frac{6p^{2}\eta^{2}}{\delta^{2}}(M^{2}+\mu M_{x}^{2}M_{y}^{2}M_{g}^{2}),\quad p_{22}=\mu,\\ p_{12}=&\frac{3\mu p^{2}M_{y}^{2}M_{g}^{2}}{2\alpha_{2}\delta^{2}}(1+\mu),\quad p_{21}=4\alpha_{2}\eta^{2}M_{x}^{2},\\ d_{1}=&24p^{2}(M^{2}+\mu M_{x}^{2}M_{y}^{2}M_{g}^{2}),\quad d_{2}=16\alpha_{2}\delta^{2}M_{x}^{2}.\end{split}

Moreover, it also holds that

\displaystyle\rho=\mathcal{O}\left(\max\left\{\frac{(1-\mu)^{2}}{1+\mu},\mu\right\}+1-\mu\right).

Proof.

Please see Appendix IX-D. ∎

Theorem 1 shows that the optimality gap depends on the dimension $p$ of the attack signal, the convergence rate $\mu$ of the system, and the number of iterations $T$. As $T$ increases, the bound in (20) decreases, and it decays to zero as $T\rightarrow\infty$.
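As a rough numerical illustration of the quantities in Theorem 1, the sketch below computes $\delta$, $\eta$, the matrix $P$, and its largest eigenvalue $\rho$ for one set of constants. All constants ($\epsilon$, $\kappa$, $p$, $T$, $\mu$, the Lipschitz constants, and $\alpha_{2}$) are hypothetical values chosen for illustration; whether a given $\kappa$ satisfies $\kappa<\kappa^{*}$ is not checked here.

```python
import numpy as np

def theorem1_quantities(eps, kappa, p, T, mu, M, M_x, M_y, M_g, alpha2):
    """Return delta, eta, the matrix P, and its largest eigenvalue rho."""
    delta = eps / M
    eta = kappa * eps / (p * T)
    p11 = 6 * p**2 * eta**2 / delta**2 * (M**2 + mu * M_x**2 * M_y**2 * M_g**2)
    p12 = 3 * mu * p**2 * M_y**2 * M_g**2 / (2 * alpha2 * delta**2) * (1 + mu)
    p21 = 4 * alpha2 * eta**2 * M_x**2
    p22 = mu
    off = np.sqrt(p12 * p21)
    P = np.array([[p11, off], [off, p22]])
    rho = np.linalg.eigvalsh(P).max()   # P is symmetric, so eigvalsh applies
    return delta, eta, P, rho

# Hypothetical constants, for illustration only.
delta, eta, P, rho = theorem1_quantities(
    eps=0.1, kappa=1.0, p=2, T=1000, mu=0.5,
    M=1.0, M_x=1.0, M_y=1.0, M_g=1.0, alpha2=1.0,
)
```

Since $p_{22}=\mu$ is a diagonal entry of the symmetric matrix $P$, the largest eigenvalue satisfies $\rho\geq\mu$; with these values $\rho$ stays below one.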

V Noise effects on model-free attack design

In this part, we further explore the effects of internal inherent noises on the proposed attack strategy and derive the optimality of solutions.

V-A Problem reformulation

With noise $d_{k}$ , the original system (2) can be rewritten as

\displaystyle\begin{split}x_{k+1}^{a}=&f(x_{k}^{a},u_{k},d_{k})+\Gamma\theta_{k},\\ y_{k}^{a,d}=&g(x_{k}^{a},d_{k}),\end{split}

(21)

where the injected false data $\theta_{k}$ satisfies Assumption 2 and the internal inherent noise $d_{k}\in\mathbb{R}^{r}$ is statistically independent of the state $x_{k}$ and of $\theta_{k}$. Herein, we consider additive noise $d_{k}$, i.e., $f(x_{k}^{a},u_{k},d_{k})=f(x_{k}^{a},u_{k})+d_{k}$. Similar to Assumption 1, the above discrete-time system with noise $d_{k}$ is stable before the attack, which can be guaranteed by [25, Theorem 2.2] if $d_{k}$ follows a standard Wiener process, i.e., the stochastic noise has zero mean and time-varying covariance. Let $\Phi(\theta,y^{a},d)=\|y^{a,d}-\bar{y}\|+\theta^{\mathrm{T}}Q\theta$. In this case, the optimization problem becomes

\displaystyle\begin{split}\mathcal{P}_{2}:\quad\min_{\theta}\quad&\mathbb{E}_{d}[\Phi(\theta,y^{a},d)]\\ \mathrm{s.t.}\quad&y^{a,d}=h(u,\theta,d),\\ &\theta^{\mathrm{T}}\theta\leq R,\end{split}

(22)

where $y^{a,d}=h(u,\theta,d)$ is the steady-state map under attacks in (21) to guarantee the stability of the compromised system. Let $\tilde{\Phi}(\theta)\triangleq\mathbb{E}_{d}[\Phi(\theta,y^{a},d)]$ . Then, we provide the following assumptions for the objective function $\Phi(\theta,y^{a},d)$ .

Assumption 5.

For any $\theta\in\mathbb{R}^{p}$ , there exists $\sigma>0$ such that

\displaystyle\mathbb{E}_{d}[(\Phi(\theta,y^{a},d)-\tilde{\Phi}(\theta))^{2}]\leq\sigma^{2}.

(23)

Assumption 6.

The function $\Phi(\theta,y^{a},d)$ is $M(d)$ -Lipschitz with respect to $\theta$ , $M_{y}(d)$ -Lipschitz with respect to $y^{a,d}$ , and $\inf_{\theta,y^{a},d}\Phi(\theta,y^{a},d)>-\infty$ . Moreover, we have $M(d)\leq M$ and $M_{y}(d)\leq M_{y}$ .

Assumption 5 bounds the variance of the objective function in the stochastic setting, which also implies that $\mathbb{E}_{d}[(\Phi(\theta,y^{a},d_{01})-\Phi(\theta,y^{a},d_{02}))^{2}]\leq 4\sigma^{2}$ [13]. In Assumption 6, the Lipschitz constants in the noisy system are constrained to be no larger than those in the noiseless system.
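The implication stated after Assumption 5 follows in one line from (23) and the elementary inequality $(a-b)^{2}\leq 2a^{2}+2b^{2}$. In the sketch below, $d_{01}$ and $d_{02}$ are taken to be two noise realizations, each satisfying (23), and the expectation is over both:

```latex
\begin{aligned}
\mathbb{E}_{d}\big[(\Phi(\theta,y^{a},d_{01})-\Phi(\theta,y^{a},d_{02}))^{2}\big]
&=\mathbb{E}_{d}\big[\big((\Phi(\theta,y^{a},d_{01})-\tilde{\Phi}(\theta))-(\Phi(\theta,y^{a},d_{02})-\tilde{\Phi}(\theta))\big)^{2}\big]\\
&\leq 2\,\mathbb{E}_{d}\big[(\Phi(\theta,y^{a},d_{01})-\tilde{\Phi}(\theta))^{2}\big]
+2\,\mathbb{E}_{d}\big[(\Phi(\theta,y^{a},d_{02})-\tilde{\Phi}(\theta))^{2}\big]\\
&\leq 4\sigma^{2}.
\end{aligned}
```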

Moreover, the following lemma reveals that the compromised system (21) can still be stable in spite of the process and measurement noises and the noises do not influence the Lipschitz constant of the steady-state map.

Lemma 6.

For the compromised system (21), there exists a unique steady-state map $x^{a}_{ss}:\mathbb{R}^{m}\times\mathbb{R}^{p}\times\mathbb{R}^{r}\rightarrow\mathbb{R}^{n}$ such that $f^{\prime}(x^{a}_{ss}(u,\theta,d),u,\theta,d)\triangleq f(x^{a}_{ss}(u,\theta,d),u,d)+\Gamma\theta=x^{a}_{ss}(u,\theta,d)$ for any $\theta$ . In addition, $x_{ss}^{a}(u,\theta,d)$ is $M_{x}$ -Lipschitz with respect to $\theta$ , and the function $g(x^{a},d)$ is $M_{g}$ -Lipschitz with respect to $x^{a}$ .

Proof.

The proof can be divided into two parts. One is to find a Lyapunov function for guaranteeing the existence of the steady-state map. The other is to show the continuation property of the steady-state map based on the implicit function theorem [20, Theorem 1B.1].

Existence of the steady-state. In the steady-state, we have

\displaystyle y^{a,d}=g(x^{a}_{ss}(u,\theta,d))\triangleq h(u,\theta,d).

Similarly, there exist a Lyapunov function $V:\mathbb{R}^{n}\times\mathbb{R}^{m}\times\mathbb{R}^{p}\times\mathbb{R}^{r}\rightarrow\mathbb{R}$ and parameters $c_{1},c_{2},c_{3}>0$ such that

\displaystyle c_{1}\|x^{a}-x^{a}_{ss}(u,\theta,d)\|^{2}\leq V(x^{a},u,\theta,d)\leq c_{2}\|x^{a}-x^{a}_{ss}(u,\theta,d)\|^{2},

(24)

\displaystyle V(f^{\prime}(x^{a},u,\theta,d),u,\theta,d)-V(x^{a},u,\theta,d)\leq-c_{3}\|x^{a}-x^{a}_{ss}(u,\theta,d)\|^{2}.

(25)

Based on (24) and (25), the rate of the change in one step of the function value $V(x^{a},u,\theta,d)$ is denoted as

\displaystyle\mu^{\prime}\triangleq\frac{2c_{2}}{c_{1}}(1-\frac{c_{3}}{c_{2}}).

(26)

The stability of the compromised system can be guaranteed if $\mu^{\prime}<1$ .

Continuity of the steady-state. Let $F(x,u,\theta,d)=f(x^{a}_{ss}(u,\theta,d),u,d)+\Gamma\theta-x^{a}_{ss}(u,\theta,d)=0$ . Differentiating both sides of the above equation with respect to $\theta$ gives that

\displaystyle\frac{\partial{F}}{\partial{x_{ss}^{a}}}\frac{\partial{x_{ss}^{a}}}{\partial{\theta}}+\frac{\partial{F}}{\partial{u}}\frac{\partial{u}}{\partial{x_{ss}^{a}}}\frac{\partial{x_{ss}^{a}}}{\partial{\theta}}+\frac{\partial{F}}{\partial{\theta}}+\frac{\partial{F}}{\partial{d}}\cdot 0=0.

When $F(x,u,\theta,d)$ is continuously differentiable with respect to $\theta$ in the neighborhood of $(x_{ss}^{a},u,\theta,d)$ and $\frac{\partial{F}}{\partial{x_{ss}^{a}}}+\frac{\partial{F}}{\partial{u}}\frac{\partial{u}}{\partial{x_{ss}^{a}}}$ is nonsingular, $x_{ss}^{a}$ is a Lipschitz function of $\theta$, where the Lipschitz constant $M_{x}$ satisfies

\displaystyle M_{x}=\mathrm{sup}\left|-\frac{\frac{\partial{F}}{\partial{\theta}}}{\frac{\partial{F}}{\partial{x_{ss}^{a}}}+\frac{\partial{F}}{\partial{u}}\frac{\partial{u}}{\partial{x_{ss}^{a}}}}\right|.

Since the noise in (21) is additive, it follows that

	$\displaystyle\frac{\partial{F(x,u,\theta)}}{\partial{\theta}}$	$\displaystyle=\frac{\partial{F(x,u,\theta,d)}}{\partial{\theta}},$
	$\displaystyle\frac{\partial{F(x,u,\theta)}}{\partial{x_{ss}^{a}}}$	$\displaystyle=\frac{\partial{F(x,u,\theta,d)}}{\partial{x_{ss}^{a}}},$

where $F(x,u,\theta)=f(x^{a}_{ss}(u,\theta),u)+\Gamma\theta-x^{a}_{ss}(u,\theta)=0$ . Thus, it is inferred that the existence of noise does not influence the Lipschitz continuous property of the steady-state with respect to $\theta$ and the Lipschitz constant is the same as that without noise. Similarly, for $F_{y}=g(x_{ss}^{a}(u,\theta,d))-y^{a,d}=0$ , we have the same result. Hence, the proof is completed. ∎

Remark 2.

Lemma 6 parallels Lemma 3, with the noise $d$ independent of the state $x$ and the injected false data $\theta$ . From Lemma 6, we know that the process and measurement noises affect the convergence rate $\mu^{\prime}$ but not the Lipschitz constant of the steady-state map. In general, $\mu\neq\mu^{\prime}$ . If $\mu^{\prime}>\mu$ , the system converges to the steady state more slowly (i.e., the noise reduces the convergence rate), which is shown in Fig. 5 in Section VII.

V-B Attack strategy design with noise

With the zeroth-order optimization framework, the model-free attack strategy under the discrete-time system with noise $d_{k}$ is designed as

\displaystyle\left\{\begin{aligned} &\tilde{\phi}_{k}=\frac{pv_{k}}{\delta}[\Phi(\theta_{k},y_{k+1}^{a},d_{k})-\Phi(\theta_{k-1},y_{k}^{a},d_{k-1})],\\ &w_{k+1}=\Pi_{\mathcal{U}}[w_{k}-\eta\tilde{\phi}_{k}],\\ &\theta_{k+1}=w_{k+1}+\delta v_{k+1},\end{aligned}\right.

(27)

where $d_{k}$ and $d_{k-1}$ are independent random noises that are sampled at iterations $k$ and $k-1$ , respectively. Different from (10), the existence of noise will also affect the objective function value. Moreover, the function value is not repeatable at different iterations and it is hard to store the noise value at each iteration for computing the function value. Thus, at iteration $k$ , only one evaluation is possible. In other words, compared to (10), it takes the residual of objective function evaluations between two consecutive stochastic feedback points.
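The update (27) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the plant matrices `A` and `C`, the quadratic tracking cost, the box constraint standing in for $\mathcal{U}$, and all parameter values are hypothetical. The attacker code only uses the measured output, never the plant matrices.

```python
import numpy as np


def project_box(w, lo, hi):
    """Euclidean projection onto the box {w : lo <= w <= hi}, standing in for Pi_U."""
    return np.clip(w, lo, hi)


def run_attack(T=2000, delta=0.05, eta=1e-3, sigma=0.01, seed=0):
    rng = np.random.default_rng(seed)
    p = 2  # dimension of the injected signal (Gamma = I in this toy example)
    # Hypothetical stable plant x+ = A x + theta + d, y = C x + d_y (not the paper's system).
    A = np.array([[0.5, 0.1], [0.0, 0.3]])
    C = np.array([1.0, 1.0])
    y_ref = -1.5  # desired (malicious) output level

    def sample_sphere():
        v = rng.standard_normal(p)
        return v / np.linalg.norm(v)

    def cost(y):
        # Assumed quadratic tracking cost; the paper's Phi may contain further terms.
        return (y - y_ref) ** 2

    x = np.zeros(2)
    w = np.zeros(p)
    v = sample_sphere()
    theta = w + delta * v
    prev_cost = 0.0  # stands in for Phi(theta_{k-1}, y_k, d_{k-1}) at the first step
    costs = []
    for _ in range(T):
        # The plant reacts to the injected signal; the attacker only observes y.
        x = A @ x + theta + sigma * rng.standard_normal(2)
        y = C @ x + sigma * rng.standard_normal()
        cur_cost = cost(y)
        # Residual-feedback gradient estimate: first line of (27).
        phi = (p / delta) * v * (cur_cost - prev_cost)
        # Projected update and fresh probing direction: remaining lines of (27).
        w = project_box(w - eta * phi, -5.0, 5.0)
        v = sample_sphere()
        theta = w + delta * v
        prev_cost = cur_cost
        costs.append(cur_cost)
    return w, np.array(costs)


w_final, cost_history = run_attack()
```

Only one new cost evaluation is made per iteration; the previous evaluation is reused, which mirrors the single-evaluation constraint discussed above.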

V-C Optimality with the general noise

With the noise $d_{k}$ , the following lemma provides the upper bound of $\mathbb{E}_{v_{[k]}}[V(x^{a}_{k},u_{k},\theta_{k},d_{k})]$ and $\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k}\|^{2}]$ in this stochastic setting.

Lemma 7.

If Assumptions 2, 5 and 6 hold, with (27), we have

$\displaystyle\mathbb{E}_{v_{[k]}}$	$\displaystyle[V(x^{a}_{k},u_{k},\theta_{k},d_{k})]$
	$\displaystyle\leq\mu^{\prime}\mathbb{E}_{v_{[k]}}[V(x^{a}_{k-1},u_{k-1},\theta_{k-1},d_{k-1})]+32c_{2}\delta^{2}M^{2}_{x}$
	$\displaystyle+8c_{2}\eta^{2}M^{2}_{x}\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k-1}\|^{2}]+16c_{2}\sigma^{2}.$	(28)

Lemma 8.

If Assumptions 2, 5 and 6 hold, with (27), we have

$\displaystyle\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k}\|^{2}]$	$\displaystyle\leq\frac{12\eta^{2}p^{2}M^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k-1}\|^{2}]+48p^{2}M^{2}+\frac{24p^{2}\sigma^{2}}{\delta^{2}}$
	$\displaystyle+\frac{3\mu^{\prime}p^{2}M^{\prime 2}_{y}M^{\prime 2}_{g}}{2c_{2}\delta^{2}}(\mathbb{E}_{v_{[k]}}[V(x^{a}_{k},u_{k},\theta_{k},d_{k})]$
	$\displaystyle+\mathbb{E}_{v_{[k]}}[V(x^{a}_{k-1},u_{k-1},\theta_{k-1},d_{k-1})]).$	(29)

The proofs of Lemmas 7 and 8 are given in Appendix IX-E and IX-F, respectively. Different from Lemmas 4 and 5, the internal noise introduces the additional terms $16c_{2}\sigma^{2}$ in (28) and $\frac{24p^{2}\sigma^{2}}{\delta^{2}}$ in (29), respectively.

Next, we show the following theorem to characterize the effects of noise $d_{k}$ on the optimality of the obtained solutions.

Theorem 2.

Supposing that Assumptions 2, 5 and 6 hold, for any given precision $\epsilon>0$ such that $|\tilde{\Phi}_{\delta}(\theta)-\tilde{\Phi}(\theta)|\leq\epsilon$ , let $\delta=\frac{\epsilon}{M}$ and $\eta=\frac{\kappa\sigma\epsilon}{p^{2}\sqrt{T}}$ with $0<\kappa<\frac{p^{2}\sqrt{T}}{\sigma\epsilon}$ , then we have

	$\displaystyle\frac{1}{T}\sum_{k=1}^{T}$	$\displaystyle\mathbb{E}_{v_{[T]}}[\tilde{\Phi}(\theta_{k})-\tilde{\Phi}(\theta_{k}^{*})]$
		$\displaystyle=\mathcal{O}\left(\frac{\sqrt{\mu^{\prime}(1+\mu^{\prime})}\sigma^{3}p^{3}}{(1-\rho^{\prime})\sqrt{T}\epsilon^{2}}+\frac{\mu^{\prime}\sigma^{2}}{(1-\rho^{\prime})p^{4}}\right),$		(30)

where $\rho^{\prime}\in(0,1)$ is the maximum eigenvalue of matrix $P^{\prime}$ given by (IX-G).

Remark 3.

The proof is given in Appendix IX-G. As $T\rightarrow\infty$ , the right-hand side of (30) approaches $\frac{\mu^{\prime}\sigma^{2}}{(1-\rho^{\prime})p^{4}}$ . This nonzero upper bound depends on the dimension $p$ of the injected false data, the variance of the objective function originating from the noise, and the convergence rate $\mu^{\prime}$ . Compared with (1) in Theorem 1, it also shows that the presence of noise increases the optimality gap.

VI Discussion

In this part, we compare the existing works on the design of FDI attack strategies in Table II and Table III, and discuss how a stealthy attack can be designed. Since the design of a stealthy attack depends on the attack detector in place, different detection criteria lead to different stealthy attack strategies. Moreover, a common assumption for stealthy attacks is that the adversary knows the existing detector. Herein, we discuss the following three detection criteria.

TABLE II: The Comparisons among the Existing Works on Model-based attack strategy

Works: [16]; [17]; [5]; [8]

System model (in source order): $x_{k+1}=Ax_{k}+w_{k}$, $y_{k}=Cx_{k}+v_{k}$; $x_{k+1}=Ax_{k}+w_{k}$, $y_{k}=Cx_{k}+v_{k}$; $x_{k+1}=Ax_{k}+Bu_{k}+D\zeta_{k}+w_{k}$, $y_{k}=Cx_{k}+E\beta_{k}+v_{k}$

Noise: Gaussian distribution with zero mean (i.i.d.)

Stealthy metric: [16] Kullback-Leibler divergence; [17] follows the sensor-estimator communication rate; [5] $\chi^{2}$ detector; [8] Kullback-Leibler divergence

Objective: [16] maximize the estimation error by stealthy linear attacks; [17] degrade the estimation quality by stealthy attacks; [5] track the desired state while remaining stealthy; [8] maximize the estimation error by stealthy attacks

Methods: [16] innovation-based; [17] random theory; [5] dynamic programming; [8] Fisher information matrix

Requirements (in source order): all system model knowledge; system parameters without filter gain

If the detection criterion satisfies

\displaystyle\|y^{a,d}_{k}-y^{d}_{k}\|\leq y_{\mathrm{th}},

(31)

the optimality of the obtained solutions in the proposed strategy is preserved as long as the actual output trajectory $y^{a,d}_{k}$ meets $\|y^{a,d}_{k}-\bar{y}_{k}\|\leq 2y_{\mathrm{th}}$ . Since this threshold test is a coarse detector for both linear and nonlinear systems, the stealthy constraint is easy to accommodate.
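The threshold test (31) amounts to a single norm comparison, as the following Python sketch shows. The function name and the sample values are hypothetical; the sketch only illustrates the criterion and is not a detector taken from the cited works.

```python
import numpy as np


def passes_threshold_detector(y_attacked, y_nominal, y_th):
    """Return True when ||y^{a,d}_k - y^d_k|| <= y_th, i.e. criterion (31) raises no alarm."""
    diff = np.atleast_1d(np.asarray(y_attacked, dtype=float)) - np.atleast_1d(
        np.asarray(y_nominal, dtype=float)
    )
    return bool(np.linalg.norm(diff) <= y_th)


# Hypothetical outputs: a small deviation stays below the threshold, a large one does not.
quiet = passes_threshold_detector([1.0, 2.0], [1.0, 2.05], y_th=0.1)
loud = passes_threshold_detector([1.0], [1.5], y_th=0.1)
```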

The detection criterion may instead depend on the distribution gap between the normal output value and the compromised output value. The Kullback-Leibler divergence [26], for instance, measures how well two probability distributions match. Let $z_{k}=y^{d}_{k}-y_{k}$ , where $z_{k}$ follows a known distribution. For example, in the linear system with Gaussian noise, the Kalman filter error $z_{k}$ is an independent and identically distributed (i.i.d.) Gaussian variable with $z_{k}\sim\mathcal{N}(0,\Sigma)$ . Let $z^{a}_{k}=y^{a,d}_{k}-y_{k}$ ; then the stealthy attacks should meet

\displaystyle D(z^{a}_{k}\|z_{k})=\int_{\{\xi|f(z^{a}_{k};\xi)>0\}}f(z^{a}_{k};\xi)\mathrm{log}\frac{f(z^{a}_{k};\xi)}{f(z_{k};\xi)}\mathrm{d}\xi\leq\epsilon^{\prime},

(32)

where $\epsilon^{\prime}>0$ is a given stealthy parameter. In this case, the stealthy constraint can be further simplified when $z^{a}_{k}$ has the same statistical property as $z_{k}$ .
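When both residuals are Gaussian, the divergence in (32) has a well-known closed form, which the Python sketch below evaluates. The Gaussian model for the attacked residual $z^{a}_{k}$ is an assumption made here for illustration; the function name and the test values are hypothetical.

```python
import numpy as np


def gaussian_kl(mean_a, cov_a, mean_0, cov_0):
    """KL divergence D(N(mean_a, cov_a) || N(mean_0, cov_0)) for nondegenerate Gaussians."""
    mean_a = np.atleast_1d(np.asarray(mean_a, dtype=float))
    mean_0 = np.atleast_1d(np.asarray(mean_0, dtype=float))
    cov_a = np.atleast_2d(np.asarray(cov_a, dtype=float))
    cov_0 = np.atleast_2d(np.asarray(cov_0, dtype=float))
    k = mean_a.size
    cov_0_inv = np.linalg.inv(cov_0)
    diff = mean_0 - mean_a
    # slogdet returns (sign, log|det|); both covariances are assumed positive definite.
    _, logdet_a = np.linalg.slogdet(cov_a)
    _, logdet_0 = np.linalg.slogdet(cov_0)
    return 0.5 * (
        np.trace(cov_0_inv @ cov_a) + diff @ cov_0_inv @ diff - k + logdet_0 - logdet_a
    )


def is_kl_stealthy(mean_a, cov_a, cov_0, eps):
    """Check D(z^a || z) <= eps for z ~ N(0, cov_0), mirroring constraint (32)."""
    zero = np.zeros(np.atleast_1d(mean_a).size)
    return bool(gaussian_kl(mean_a, cov_a, zero, cov_0) <= eps)
```

If $z^{a}_{k}$ has the same mean and covariance as $z_{k}$, the divergence is zero and the constraint holds for any $\epsilon^{\prime}>0$.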

If the system adopts a data-driven detector, such as the machine-learning-based detection mechanisms [27, 28, 29] or the behavior-based data-driven detection methods [30], anomalies are detected according to the characteristics of the chosen method. Specifically, the study [27] develops a One-Class Support Vector Machine (OCSVM) algorithm to classify the outlier class. The work [28] proposes the cumulative sum (CUSUM) method to detect the deviations that correspond to anomalies. In [30], a behavior-based $\chi^{2}$ detector is constructed from a sequence of inputs and outputs and their covariance. When the adversary is familiar with the existing learning-based or behavior-based detector, the stealthy constraints can be derived and the obtained solutions are restricted to a new constraint set. Thus, the analysis of the updated constraint set is critical to the design of the FDI attack strategy with detectors.
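To make the CUSUM idea concrete, the following Python sketch implements the textbook one-sided CUSUM recursion on a sequence of residual magnitudes. It is a generic illustration and not the specific detector of [28]; the parameter names `drift` and `threshold` and the sample data are hypothetical.

```python
def cusum_alarm(residual_norms, drift, threshold):
    """Return the first index at which the one-sided CUSUM statistic exceeds the threshold.

    The statistic follows S_k = max(0, S_{k-1} + z_k - drift); None means no alarm.
    """
    s = 0.0
    for k, z in enumerate(residual_norms):
        s = max(0.0, s + z - drift)
        if s > threshold:
            return k
    return None


# Hypothetical residuals: a sustained deviation accumulates until the alarm fires.
alarm_index = cusum_alarm([0.25, 0.25, 1.0, 1.0, 1.0], drift=0.5, threshold=1.0)
```

An attack that keeps each residual below `drift` never accumulates, which is one way a stealthy constraint set could be described for such a detector.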

TABLE III: The Comparisons among the Existing Works on Data-driven attack strategy

Works: [18]; [9]; [10]; [6]

System model (in source order): $z=Hx+e+a$; $x_{k+1}=Ax_{k}+Bu_{k}+D\zeta_{k}+w_{k}$, $y_{k}=Cx_{k}+E\beta_{k}+w_{k}$

Noise (in source order): Gaussian distribution; $\sum_{k=0}^{T}\|w_{k}\|^{2}<\infty$; Gaussian distribution

Stealthy metric (in source order): $z^{\prime}=H(x+a)+e$; subspace $\mathcal{R}(H)$; $\alpha$-probability; $L_{2}$-stealthiness; $\chi^{2}$ detector

Objective: [18] design a stealthy FDI attack; [9] design a feasible unobservable attack; [10] maximize the stealthy attack's effects; [6] design data-driven undetected attacks

Methods: [18] independent component analysis (ICA); [9] subspace method; [10] adaptive dynamic programming (ADP); [6] subspace method

Requirements: [18] independent, non-Gaussian, and full sensor observations; [9] linear measurement model; [10] the attack strategy is linear; [6] linear system model

VII Simulation results

In this section, we evaluate the performance of the proposed attack scheme, i.e., the tracking performance and the optimality of solutions without/with noise.

Consider the following system

\displaystyle\begin{split}x_{k+1}=&Ax_{k}+Bu_{k}+d_{k},\\ y_{k}=&Cx_{k}+d_{k}.\end{split}

(33)

where $u_{k}=-Kx_{k}$ with $K=[1.5~-1.5;0.2~0.1]$, $A=[0~1;2~-1]$, $B=[0~0;1~0]$, and $C=[1~1]$ . It is stable, controllable, and observable. We consider two kinds of noise, including $d_{k}\sim\mathcal{N}(0,0.02)$ and $d_{k}\sim U(0,0.02)$ . We set the initial state $x_{1}=[1;-3]$ , the probing signal $v_{k}=[\mathrm{cos}(k);\mathrm{sin}(k)]/\sqrt{2}$ to satisfy $\|v_{k}\|=1$ , and the initial solution $w_{1}$ is random and follows the standard uniform distribution. We also set the smoothing parameter $\delta=10^{-3}$ , the step-size $\eta=7.5\times 10^{-5}$ , the attack selection matrix $\Gamma=I_{2}$ and the weight matrix $Q=3I_{2}$ , where $I_{2}$ is the $2\times 2$ identity matrix. We define two types of expected output trajectories, including the static trajectory $\bar{y}_{1}=-1.5$ and the dynamic output trajectory $\bar{y}_{2}=10^{-4}k$ with respect to iteration $k$ . Each data point in the following figures represents an ensemble average of $50$ trials.

Without noise $d_{k}$ , we first analyze the tracking performance with different desired output trajectories. As shown in Fig. 2, the output of the system under the proposed attack strategy tracks the expected output trajectory whether the trajectory is static or dynamic. In particular, Fig. 2 illustrates that the output values fluctuate around the desired trajectory. This fluctuation is expected since the output values are constantly perturbed by the time-varying probing signal $v_{k}$ .

Then, we illustrate the optimality of solutions via the optimality gap $\Phi(\theta_{k})-\Phi(\theta^{*})$ , which is shown in Fig. 3. When the expected trajectory is static, i.e., $\bar{y}_{1}=-1.5$ , we find that the obtained solution is close to the optimal solution and the optimality gap converges to about $0.02$ , as shown in Fig. 3. When the expected trajectory is time-varying, i.e., $\bar{y}_{2}=10^{-4}k$ , in Fig. 3, the obtained solutions also approach the optimal one and the upper bound of the optimality gap does not exceed $0.11$ . To sum up, the proposed model-free attack strategy can obtain the suboptimal attack signals that drive the output values to the desired output trajectory by only utilizing the real-time compromised measurements.

With noise $d_{k}$ , we analyze its effects on the tracking performance and optimality. The output $\mathrm{y2}$ and the optimality gap $\tilde{\Phi}(\theta_{k})-\tilde{\Phi}(\theta^{*})$ under uniform distribution noise and normal distribution noise are denoted as $\mathrm{y2-U}$ , $\mathrm{y2-N}$ , $\mathrm{Phi-U}$ and $\mathrm{Phi-N}$ , respectively. From the boxplot in Fig. 4 with $k=40000$ iterations, we see that the final value of the actual output $\mathrm{y2}$ is $4$ and the median is $2$ . In other words, the slope of the dynamic trajectory of the output value is $10^{-4}$ , which follows the expected one, and the noise does not change the tracking trend although it adds many outliers. In addition, combined with Fig. 5, the average optimality gap (red line in Fig. 4 / blue line in Fig. 5) of $50$ trials approaches zero although there are some outliers (red plus in Fig. 4 / pink shadow in Fig. 5). Moreover, the optimality gap is larger than that without noise $d_{k}$ , and the normal distribution noise has a smaller effect on the optimality gap than the uniform distribution noise.

VIII Conclusion

We considered the problem of designing a model-free attack scheme where the adversary with limited capability aims to make the output value follow the desired trajectory without any prior system model information. The designed attack scheme is model-free since only real-time measurements are required. These measurements are used to compute objective function evaluations, and gradient estimates built from the evaluations at the previous and current time are used to update the attack signal. Moreover, since the adversary has limited capability, we constrained the obtained solutions within the feasible region by the projected gradient descent method. Finally, we analyzed the optimality of solutions and established its dependence on the dimensions of the attack signal, the number of iterations, the variance of the objective function, and the convergence rate of the system. Future work includes the design of attack strategies with partial observations and specific detector constraints.

IX Appendix

IX-A Proof of Lemma 3

Based on Assumptions 1-4, we have

$\displaystyle\|e_{\Phi}(x_{k},\theta_{k})\|^{2}=$	$\displaystyle\|\Phi(\theta_{k},y^{a}_{k+1})-\Phi(\theta_{k},h(u_{k},\theta_{k}))\|^{2}$
$\displaystyle\leq$	$\displaystyle M_{y}^{2}\|y^{a}_{k+1}-h(u_{k},\theta_{k})\|^{2}$
$\displaystyle\leq$	$\displaystyle M_{y}^{2}M_{g}^{2}\|x^{a}_{k+1}-x^{a}_{ss}(u_{k},\theta_{k})\|^{2}.$	(34)

Combining (3) with (4), it can be inferred that

$\displaystyle\|x^{a}_{k+1}$	$\displaystyle-x^{a}_{ss}(u_{k},\theta_{k})\|^{2}$
	$\displaystyle\leq\frac{1}{\alpha_{1}}V(x^{a}_{k+1},u_{k},\theta_{k})=\frac{1}{\alpha_{1}}V(f^{\prime}(x^{a}_{k},u_{k},\theta_{k}),u_{k},\theta_{k})$
	$\displaystyle\leq\frac{1}{\alpha_{1}}(V(x^{a}_{k},u_{k},\theta_{k})-\alpha_{3}\|x^{a}_{k}-x^{a}_{ss}(u_{k},\theta_{k})\|^{2})$
	$\displaystyle\leq\frac{1}{\alpha_{1}}(1-\frac{\alpha_{3}}{\alpha_{2}})V(x^{a}_{k},u_{k},\theta_{k}).$	(35)

Substituting (35) into (34), it is easy to obtain (17).

IX-B Proof of Lemma 4

Based on (3), we have

	$\displaystyle V(x^{a}_{k},u_{k},\theta_{k})\leq\alpha_{2}\|x^{a}_{k}-x^{a}_{ss}(u_{k},\theta_{k})\|^{2}$
	$\displaystyle=\alpha_{2}\|x^{a}_{k}-x^{a}_{ss}(u_{k-1},\theta_{k-1})+x^{a}_{ss}(u_{k-1},\theta_{k-1})-x^{a}_{ss}(u_{k},\theta_{k})\|^{2}$
	$\displaystyle\overset{(s.1)}{\leq}2\alpha_{2}(\|x^{a}_{k}-x^{a}_{ss}(u_{k-1},\theta_{k-1})\|^{2}+\|x^{a}_{ss}(u_{k-1},\theta_{k-1})-x^{a}_{ss}(u_{k},\theta_{k})\|^{2})$
	$\displaystyle\overset{(s.2)}{\leq}2\alpha_{2}\left(\frac{1}{\alpha_{1}}\left(1-\frac{\alpha_{3}}{\alpha_{2}}\right)V(x^{a}_{k-1},u_{k-1},\theta_{k-1})+M_{x}^{2}\|\theta_{k}-\theta_{k-1}\|^{2}\right),$

where $(s.1)$ follows from the fact that $\|a+b\|^{2}\leq 2(\|a\|^{2}+\|b\|^{2})$ and $(s.2)$ follows from (2), (35), and the Lipschitz continuity of $x^{a}_{ss}(u_{k},\theta_{k})$ . The upper bound of $\mathbb{E}_{v_{[k]}}[\|\theta_{k}-\theta_{k-1}\|^{2}]$ is given as

	$\displaystyle\mathbb{E}_{v_{[k]}}$	$\displaystyle[\|\theta_{k}-\theta_{k-1}\|^{2}]$
		$\displaystyle=\mathbb{E}_{v_{[k]}}[\|w_{k}-w_{k-1}+\delta v_{k}-\delta v_{k-1}\|^{2}]$
		$\displaystyle\overset{(s.1)}{\leq}\mathbb{E}_{v_{[k]}}[2\|w_{k}-w_{k-1}\|^{2}+2\delta^{2}\|v_{k}-v_{k-1}\|^{2}]$
		$\displaystyle\overset{(s.2)}{\leq}2\eta^{2}\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k-1}\|^{2}]+2\delta^{2}\mathbb{E}_{v_{[k]}}[2\|v_{k}\|^{2}+2\|v_{k-1}\|^{2}]$
		$\displaystyle\overset{(s.3)}{\leq}2\eta^{2}\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k-1}\|^{2}]+8\delta^{2},$

where $(s.1)$ follows that $\mathbb{E}[(a+b)^{2}]\leq 2\mathbb{E}[a^{2}+b^{2}]$ , $(s.2)$ follows from (IV-C) and $\|a-b\|^{2}\leq 2(\|a\|^{2}+\|b\|^{2})$ , and $(s.3)$ follows the fact that $\|v_{k}\|=1$ since $v_{k}$ is selected uniformly at random from the unit sphere.
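The chain of inequalities above in fact holds for every realization, not only in expectation, which can be checked numerically. The Python sketch below is a sanity check under stated assumptions: the box $[-1,1]^{p}$ stands in for the feasible set $\mathcal{U}$, the gradient estimates are drawn at random, and all names and values are hypothetical.

```python
import numpy as np


def worst_bound_violation(trials=1000, p=3, eta=0.1, delta=0.05, seed=0):
    """Largest value of ||theta_k - theta_{k-1}||^2 - (2 eta^2 ||phi||^2 + 8 delta^2) over random trials."""
    rng = np.random.default_rng(seed)

    def unit_vector():
        v = rng.standard_normal(p)
        return v / np.linalg.norm(v)

    worst = -np.inf
    for _ in range(trials):
        w_prev = rng.uniform(-1.0, 1.0, p)          # feasible previous iterate
        phi = 10.0 * rng.standard_normal(p)         # arbitrary gradient estimate
        w = np.clip(w_prev - eta * phi, -1.0, 1.0)  # projection onto the box is nonexpansive
        v_prev, v = unit_vector(), unit_vector()
        lhs = np.sum(((w + delta * v) - (w_prev + delta * v_prev)) ** 2)
        rhs = 2.0 * eta**2 * np.sum(phi**2) + 8.0 * delta**2
        worst = max(worst, lhs - rhs)
    return worst


violation = worst_bound_violation()
```

A non-positive return value indicates that the bound was never violated in the sampled trials.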

Combining the above results, we can infer that (4) holds.

IX-C Proof of Lemma 5

Let $\Phi(\theta_{k})\triangleq\Phi(\theta_{k},h(u_{k},\theta_{k}))$ and $\Phi(\theta_{k-1})\triangleq\Phi(\theta_{k-1},h(u_{k-1},\theta_{k-1}))$ . With (10) and (16), then we have

	$\displaystyle\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k}\|^{2}]$
	$\displaystyle=\frac{p^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|v_{k}(\Phi(\theta_{k},y^{a}_{k+1})-\Phi(\theta_{k-1},y^{a}_{k}))\|^{2}]$
	$\displaystyle=\frac{p^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|v_{k}(\Phi(\theta_{k})-\Phi(\theta_{k-1})$
	$\displaystyle+e_{\Phi}(x^{a}_{k},\theta_{k})-e_{\Phi}(x^{a}_{k-1},\theta_{k-1}))\|^{2}]$
	$\displaystyle\overset{(s.1)}{\leq}\underbrace{\frac{3p^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|v_{k}(\Phi(\theta_{k})-\Phi(\theta_{k-1}))\|^{2}]}_{①}$
	$\displaystyle+\underbrace{\frac{3p^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|v_{k}e_{\Phi}(x^{a}_{k},\theta_{k})\|^{2}]+\frac{3p^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|v_{k}e_{\Phi}(x^{a}_{k-1},\theta_{k-1})\|^{2}]}_{②},$

where $(s.1)$ follows the fact that $\mathbb{E}[(a+b+c)^{2}]\leq 3\mathbb{E}[a^{2}+b^{2}+c^{2}]$ . Next, we provide the upper bound of the item $①$ and $②$ , respectively.

$\displaystyle①\overset{(s.1)}{=}$	$\displaystyle\frac{3p^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|v_{k}(\Phi(w_{k}+\delta v_{k})-\Phi(w_{k-1}+\delta v_{k})$
	$\displaystyle+\Phi(w_{k-1}+\delta v_{k})-\Phi(w_{k-1}+\delta v_{k-1}))\|^{2}]$
$\displaystyle\leq$	$\displaystyle\frac{6p^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|v_{k}(\Phi(w_{k}+\delta v_{k})-\Phi(w_{k-1}+\delta v_{k}))\|^{2}$
	$\displaystyle+\|v_{k}(\Phi(w_{k-1}+\delta v_{k})-\Phi(w_{k-1}+\delta v_{k-1}))\|^{2}]$
$\displaystyle\overset{(s.2)}{\leq}$	$\displaystyle\frac{6p^{2}M^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|v_{k}\|^{2}\|w_{k}-w_{k-1}\|^{2}$
	$\displaystyle+\|v_{k}\|^{2}\delta^{2}\|v_{k}-v_{k-1}\|^{2}]$
$\displaystyle\overset{(s.3)}{\leq}$	$\displaystyle\frac{6p^{2}M^{2}}{\delta^{2}}(\eta^{2}\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k-1}\|^{2}]$
	$\displaystyle+\delta^{2}\mathbb{E}_{v_{[k]}}[\|v_{k}\|^{2}(2\|v_{k}\|^{2}+2\|v_{k-1}\|^{2})])$
$\displaystyle=$	$\displaystyle\frac{6\eta^{2}p^{2}M^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k-1}\|^{2}]+24p^{2}M^{2},$	(36)

where $(s.1)$ holds by adding and subtracting $\Phi(w_{k-1}+\delta v_{k})$ , the subsequent inequality uses $\|a+b\|^{2}\leq 2(\|a\|^{2}+\|b\|^{2})$ , $(s.2)$ holds due to Assumption 4 and the dependency of $v_{k}$ with respect to $w_{k}$ , and $(s.3)$ follows from (IV-C) and $\|v_{k}\|=1$ .

$\displaystyle②=$	$\displaystyle\frac{3p^{2}}{\delta^{2}}(\mathbb{E}_{v_{[k]}}[\|v_{k}\|^{2}\|e_{\Phi}(x^{a}_{k},\theta_{k})\|^{2}]$
	$\displaystyle+\mathbb{E}_{v_{[k]}}[\|v_{k}\|^{2}]\mathbb{E}_{v_{[k]}}[\|e_{\Phi}(x^{a}_{k-1},\theta_{k-1})\|^{2}])$
$\displaystyle\overset{(s.1)}{\leq}$	$\displaystyle\frac{3p^{2}}{\delta^{2}}(\mathbb{E}_{v_{[k]}}[\|v_{k}\|^{4}])^{\frac{1}{2}}(\mathbb{E}_{v_{[k]}}[\|e_{\Phi}(x^{a}_{k},\theta_{k})\|^{4}])^{\frac{1}{2}}$
	$\displaystyle+\frac{3p^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|e_{\Phi}(x^{a}_{k-1},\theta_{k-1})\|^{2}]$
$\displaystyle\overset{(s.2)}{\leq}$	$\displaystyle\frac{3\mu p^{2}M_{y}^{2}M_{g}^{2}}{2\alpha_{2}\delta^{2}}(\mathbb{E}_{v_{[k]}}[V(x^{a}_{k},u_{k},\theta_{k})]$
	$\displaystyle+\mathbb{E}_{v_{[k]}}[V(x^{a}_{k-1},u_{k-1},\theta_{k-1})]),$	(37)

where $(s.1)$ holds by the Cauchy-Schwarz inequality, which splits the product of the two correlated random variables $v_{k}$ and $e_{\Phi}(x^{a}_{k},\theta_{k})$ , and $(s.2)$ holds based on (17). Combining the above results completes the proof.

IX-D Proof of Theorem 1

Since the objective function $\Phi(\theta_{k})$ is convex, the Gaussian smooth approximation of $\Phi(\theta_{k})$ is also convex [31]. With (9b), we have

\displaystyle\Phi(\theta_{k})-\Phi(\theta_{k}^{*})\leq\Phi_{\delta}(\theta_{k})-\Phi_{\delta}(\theta_{k}^{*})+2M\delta.

(38)

With (9c), the Taylor expansion of $\Phi_{\delta}(\theta_{k})$ at solution $\theta_{k}^{*}$ is shown as

	$\displaystyle\Phi_{\delta}(\theta_{k})\leq$	$\displaystyle\Phi_{\delta}(\theta_{k}^{*})+\nabla\Phi_{\delta}(\theta_{k}^{*})^{\mathrm{T}}(\theta_{k}-\theta_{k}^{*})$
	$\displaystyle+$	$\displaystyle\frac{M^{2}p^{2}}{2\delta^{2}}\|\theta_{k}-\theta_{k}^{*}\|^{2},$		(39)

where $\theta_{k}^{*}$ is the optimal solution of the problem $\mathbf{\mathcal{P}_{1}}$ at iteration $k$ . Taking the expectation with respect to $v_{[k]}$ on both sides of the inequality (39), we have

	$\displaystyle\mathbb{E}_{v_{[k]}}$	$\displaystyle[\Phi_{\delta}(\theta_{k})]-\mathbb{E}_{v_{[k]}}[\Phi_{\delta}(\theta_{k}^{*})]\leq$
		$\displaystyle\mathbb{E}_{v_{[k]}}[\nabla\Phi_{\delta}(\theta_{k}^{*})^{\mathrm{T}}(\theta_{k}-\theta_{k}^{*})]+\frac{M^{2}p^{2}}{2\delta^{2}}\mathbb{E}_{v_{[k]}}[\|\theta_{k}-\theta_{k}^{*}\|^{2}].$

Since

	$\displaystyle\mathbb{E}_{v_{[k]}}$	$\displaystyle[\nabla\Phi_{\delta}(\theta_{k}^{*})^{\mathrm{T}}(\theta_{k}-\theta_{k}^{*})]\leq$
		$\displaystyle\frac{1}{2}(\mathbb{E}_{v_{[k]}}[\|\nabla\Phi_{\delta}(\theta_{k}^{*})\|^{2}]+\mathbb{E}_{v_{[k]}}[\|\theta_{k}-\theta_{k}^{*}\|^{2}])$

where the inequality follows the fact that for $\forall a_{1},a_{2}$ ,

\displaystyle\mathbb{E}[a_{1}^{\mathrm{T}}a_{2}]\leq(\mathbb{E}[\|a_{1}\|^{2}]\mathbb{E}[\|a_{2}\|^{2}])^{\frac{1}{2}}\leq\frac{1}{2}(\mathbb{E}[\|a_{1}\|^{2}]+\mathbb{E}[\|a_{2}\|^{2}]),

then it can be inferred that

	$\displaystyle\mathbb{E}_{v_{[k]}}[\Phi_{\delta}(\theta_{k})]$	$\displaystyle\leq\mathbb{E}_{v_{[k]}}[\Phi_{\delta}(\theta_{k}^{*})]+\underbrace{\frac{1}{2}\mathbb{E}_{v_{[k]}}[\|\nabla\Phi_{\delta}(\theta_{k}^{*})\|^{2}]}_{①}$
		$\displaystyle+\underbrace{(\frac{1}{2}+\frac{M^{2}p^{2}}{2\delta^{2}})\mathbb{E}_{v_{[k]}}[\|\theta_{k}-\theta_{k}^{*}\|^{2}]}_{②}.$

Next, we analyze the upper bound of the item $①$ and $②$ .

	$\displaystyle①=$	$\displaystyle\frac{1}{2}\mathbb{E}_{v_{[k]}}[\|\nabla\Phi_{\delta}(w_{k}^{*}+\delta v_{k})\|^{2}],$
	$\displaystyle=$	$\displaystyle\frac{1}{2}\mathbb{E}_{v_{[k]}}[\|\nabla\Phi_{\delta}(w_{k}+\delta v_{k})-(\nabla\Phi_{\delta}(w_{k}+\delta v_{k})$
	$\displaystyle-$	$\displaystyle\nabla\Phi_{\delta}(w_{k}^{*}+\delta v_{k}))\|^{2}],$
	$\displaystyle\overset{(s.1)}{\leq}$	$\displaystyle\mathbb{E}_{v_{[k]}}[\|\nabla\Phi_{\delta}(w_{k}+\delta v_{k})\|^{2}]+\mathbb{E}_{v_{[k]}}[\|\nabla\Phi_{\delta}(w_{k}+\delta v_{k})$
	$\displaystyle-$	$\displaystyle\nabla\Phi_{\delta}(w_{k}^{*}+\delta v_{k})\|^{2}],$
	$\displaystyle\overset{(s.2)}{\leq}$	$\displaystyle\mathbb{E}_{v_{[k]}}[\|\nabla\Phi_{\delta}(\theta_{k})\|^{2}]+\frac{M^{2}p^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|\theta_{k}-\theta_{k}^{*}\|^{2}],$
	$\displaystyle=$	$\displaystyle\mathbb{E}_{v_{[k]}}[\|\nabla\Phi_{\delta}(w_{k})-(\nabla\Phi_{\delta}(w_{k})-\nabla\Phi_{\delta}(\theta_{k}))\|^{2}]$
	$\displaystyle+$	$\displaystyle\frac{M^{2}p^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|\theta_{k}-\theta_{k}^{*}\|^{2}],$
	$\displaystyle\overset{(s.3)}{\leq}$	$\displaystyle 2\mathbb{E}_{v_{[k]}}[\|\nabla\Phi_{\delta}(w_{k})\|^{2}]+2M^{2}p^{2}\mathbb{E}_{v_{[k]}}[\|v_{k}\|^{2}]$
	$\displaystyle+$	$\displaystyle\frac{M^{2}p^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|\theta_{k}-\theta_{k}^{*}\|^{2}],$
	$\displaystyle=$	$\displaystyle 2\mathbb{E}_{v_{[k]}}[\|\nabla\Phi_{\delta}(w_{k})\|^{2}]+2M^{2}p^{2}+\frac{M^{2}p^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|\theta_{k}-\theta_{k}^{*}\|^{2}],$

where $(s.1)$ follows from the fact that $\|b\|^{2}=\|a-(a-b)\|^{2}\leq 2\|a\|^{2}+2\|a-b\|^{2}$ , $(s.2)$ follows from (9c), i.e., $\Phi_{\delta}(\theta_{k})$ is $\frac{Mp}{\delta}$-smooth, and $(s.3)$ follows from (9c) and $\|\delta v_{k}\|^{2}=\delta^{2}\|v_{k}\|^{2}$ .

	$\displaystyle②=$	$\displaystyle(\frac{1}{2}+\frac{M^{2}p^{2}}{2\delta^{2}})\mathbb{E}_{v_{[k]}}[\|w_{k}-w_{k}^{*}\|^{2}],$
	$\displaystyle\overset{(s.1)}{\leq}$	$\displaystyle(\frac{1}{2}+\frac{M^{2}p^{2}}{2\delta^{2}})(2\mathbb{E}_{v_{[k]}}[\|w_{k-1}-w_{k}^{*}\|^{2}]+2\eta^{2}\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k-1}\|^{2}]),$
	$\displaystyle\overset{(s.2)}{\leq}$	$\displaystyle(\frac{1}{2}+\frac{M^{2}p^{2}}{2\delta^{2}})(2\mathbb{E}_{v_{[k]}}[\|w_{k-1}-w_{k}\|^{2}]+2\eta^{2}\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k-1}\|^{2}]),$
	$\displaystyle\overset{(s.3)}{\leq}$	$\displaystyle(\frac{1}{2}+\frac{M^{2}p^{2}}{2\delta^{2}})4\eta^{2}\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k-1}\|^{2}],$
	$\displaystyle=$	$\displaystyle(\frac{2\delta^{2}+2M^{2}p^{2}}{\delta^{2}})\eta^{2}\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k-1}\|^{2}],$

where $(s.1)$ follows from (IV-C), $(s.2)$ follows that $\|w_{k-1}-w_{k}^{*}\|^{2}\leq\|w_{k-1}-w_{k}\|^{2}$ , and $(s.3)$ follows from (IV-C).

The second moment of the gradient of $\Phi_{\delta}(w_{k})$ at solution $w_{k}$ is $\|\nabla\Phi_{\delta}(w_{k})\|^{2}$ and we have

	$\displaystyle\mathbb{E}_{v_{[k]}}[\|\nabla\Phi_{\delta}(w_{k})\|^{2}]$
	$\displaystyle=\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k}-(\tilde{\phi}_{k}-\nabla\Phi_{\delta}(w_{k}))\|^{2}],$
	$\displaystyle\leq 2\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k}\|^{2}]+2\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k}-\nabla\Phi_{\delta}(w_{k})\|^{2}],$

where the inequality follows the fact that $\mathbb{E}[(a-b)^{2}]\leq 2(\mathbb{E}[a^{2}]+\mathbb{E}[b^{2}])$ . Since

\displaystyle\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k}-\nabla\Phi_{\delta}(w_{k})\|^{2}]\leq\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k}\|^{2}],

which follows from $(58)$ in [19, Theorem 8], with (5),

	$\displaystyle\mathbb{E}_{v_{[k]}}[\|\nabla\Phi_{\delta}(w_{k})\|^{2}]$	$\displaystyle\leq\frac{24\eta^{2}p^{2}M^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k-1}\|^{2}]+96p^{2}M^{2}$
		$\displaystyle+\frac{6\mu p^{2}M_{y}^{2}M_{g}^{2}}{\alpha_{2}\delta^{2}}(\mathbb{E}_{v_{[k]}}[V(x^{a}_{k},u_{k},\theta_{k})]$
		$\displaystyle+\mathbb{E}_{v_{[k]}}[V(x^{a}_{k-1},u_{k-1},\theta_{k-1})]).$

Rearranging the above items, thus we have

	$\displaystyle\mathbb{E}_{v_{[k]}}$	$\displaystyle[\Phi_{\delta}(\theta_{k})]-\mathbb{E}_{v_{[k]}}[\Phi_{\delta}(\theta_{k}^{*})]$
		$\displaystyle\leq(2+\frac{54M^{2}p^{2}}{\delta^{2}})\eta^{2}\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k-1}\|^{2}]+194M^{2}p^{2}$
		$\displaystyle+\frac{12\mu p^{2}M_{y}^{2}M_{g}^{2}}{\alpha_{2}\delta^{2}}(\mathbb{E}_{v_{[k]}}[V(x^{a}_{k},u_{k},\theta_{k})]$
		$\displaystyle+\mathbb{E}_{v_{[k]}}[V(x^{a}_{k-1},u_{k-1},\theta_{k-1})]).$

Then, it follows that

		$\displaystyle\frac{1}{T}\sum_{k=1}^{T}\mathbb{E}_{v_{[T]}}[\Phi_{\delta}(\theta_{k})-\Phi_{\delta}(\theta_{k}^{*})]$
		$\displaystyle\leq(\frac{2}{T}+\frac{54M^{2}p^{2}}{\delta^{2}T})\eta^{2}\sum_{k=1}^{T}\mathbb{E}_{v_{[T]}}[\|\tilde{\phi}_{k-1}\|^{2}]+194M^{2}p^{2}$
		$\displaystyle+\frac{12\mu p^{2}M_{y}^{2}M_{g}^{2}}{\alpha_{2}\delta^{2}T}\sum_{k=1}^{T}(\mathbb{E}_{v_{[T]}}[V(x^{a}_{k},u_{k},\theta_{k})]$
		$\displaystyle+\mathbb{E}_{v_{[T]}}[V(x^{a}_{k-1},u_{k-1},\theta_{k-1})])$		(40)

To guarantee $|\Phi_{\delta}(w)-\Phi(w)|\leq\epsilon$ , we set $\delta=\frac{\epsilon}{M}$ . Combining (4), (38) and (40), we obtain

		$\displaystyle\frac{1}{T}\sum_{k=1}^{T}\mathbb{E}_{v_{[T]}}[\Phi(\theta_{k})-\Phi(\theta_{k}^{*})]$
		$\displaystyle\leq\frac{\eta^{2}p^{2}}{\delta^{2}T}(\frac{2\delta^{2}}{p^{2}}+54M^{2}+48\mu M_{x}^{2}M_{y}^{2}M_{g}^{2})\sum_{k=1}^{T}\mathbb{E}_{v_{[T]}}[\|\tilde{\phi}_{k-1}\|^{2}]$
		$\displaystyle+\frac{12\mu(\mu+1)p^{2}M_{y}^{2}M_{g}^{2}}{\alpha_{2}\delta^{2}T}\sum_{k=1}^{T}\mathbb{E}_{v_{[T]}}[V(x^{a}_{k-1},u_{k-1},\theta_{k-1})]$
		$\displaystyle+\frac{192\mu p^{2}M_{x}^{2}M_{y}^{2}M_{g}^{2}}{T}+194M^{2}p^{2}+2M\delta.$		(41)

Since $\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k}\|^{2}]$ and $\mathbb{E}_{v_{[k]}}[V(x^{a}_{k},u_{k},\theta_{k})]$ are coupled variables, we rely on [19, Lemma 11], which bounds the partial sum of non-negative coupled series, to analyze (41).

Combining (4) and (5), we obtain the compact form

	$\displaystyle\left[\begin{array}[]{c}\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k}\|^{2}]\\ \mathbb{E}_{v_{[k]}}[\sqrt{\frac{p_{12}}{p_{21}}}V(x^{a}_{k},u_{k},\theta_{k})]\end{array}\right]\preceq$
	$\displaystyle P\left[\begin{array}[]{c}\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k-1}\|^{2}]\\ \mathbb{E}_{v_{[k]}}[\sqrt{\frac{p_{12}}{p_{21}}}V(x^{a}_{k-1},u_{k-1},\theta_{k-1})]\end{array}\right]+\left[\begin{array}[]{c}d_{1}\\ \sqrt{\frac{p_{12}}{p_{21}}}d_{2}\end{array}\right],$

where $P=\left[\begin{array}[]{cc}p_{11}&\sqrt{p_{12}p_{21}}\\ \sqrt{p_{12}p_{21}}&p_{22}\end{array}\right]$ with

$\displaystyle p_{11}=$	$\displaystyle\frac{6p^{2}\eta^{2}}{\delta^{2}}(M^{2}+\mu M_{x}^{2}M_{y}^{2}M_{g}^{2}),$
$\displaystyle p_{12}=$	$\displaystyle\frac{3\mu p^{2}M_{y}^{2}M_{g}^{2}}{2\alpha_{2}\delta^{2}}(1+\mu),$
$\displaystyle p_{21}=$	$\displaystyle 4\alpha_{2}\eta^{2}M_{x}^{2},$
$\displaystyle p_{22}=$	$\displaystyle\mu,$
$\displaystyle d_{1}=$	$\displaystyle 24p^{2}(M^{2}+\mu M_{x}^{2}M_{y}^{2}M_{g}^{2}),$
$\displaystyle d_{2}=$	$\displaystyle 16\alpha_{2}\delta^{2}M_{x}^{2}.$	(42)

Then, we have

$\displaystyle\max\{\sum_{k=1}^{T}$	$\displaystyle\mathbb{E}_{v_{[T]}}[\|\tilde{\phi}_{k-1}\|^{2}],\sum_{k=1}^{T}\mathbb{E}_{v_{[T]}}[\sqrt{\frac{p_{12}}{p_{21}}}V(x^{a}_{k-1},u_{k-1},\theta_{k-1})]\}$
$\displaystyle\leq$	$\displaystyle(\rho^{T}+\frac{1}{1-\rho})B_{1}+\frac{T}{1-\rho}(d_{1}+\sqrt{\frac{p_{12}}{p_{21}}}d_{2}),$	(43)
$\displaystyle=$	$\displaystyle\mathcal{O}\left(\frac{T}{1-\rho}(p^{2}+\mu p^{2}+\frac{p\delta}{\eta})\right)$	(44)

where $B_{1}=\mathbb{E}[\|\tilde{\phi}({w_{1}})\|^{2}]+\mathbb{E}[\sqrt{\frac{p_{12}}{p_{21}}}V(x^{a}_{1},u_{1},\theta_{1})]$ and $\rho<1$ is the maximum singular value of the matrix $P$ .

Solving the characteristic equation $|\lambda I-P|=0$ for the eigenvalues $\lambda$ of $P$ gives

$\displaystyle\rho=$	$\displaystyle\frac{p_{11}+p_{22}}{2}+\sqrt{(\frac{p_{11}-p_{22}}{2})^{2}+p_{12}p_{21}}$
$\displaystyle\leq$	$\displaystyle\frac{p_{11}+p_{22}}{2}+\left|\frac{p_{11}-p_{22}}{2}\right|+\sqrt{p_{12}p_{21}}$
$\displaystyle=$	$\displaystyle\max\{p_{11},p_{22}\}+\sqrt{p_{12}p_{21}}.$	(45)
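The step from the exact eigenvalue to the bound in (45) uses $\sqrt{a^{2}+b^{2}}\leq|a|+|b|$. A minimal numerical sanity check, assuming randomly drawn non-negative entries (the sampling range and seed are arbitrary choices, and `s` denotes $\sqrt{p_{12}p_{21}}$), could look as follows.

```python
import math
import random

def rho_exact(p11, p22, s):
    """Largest eigenvalue of [[p11, s], [s, p22]]."""
    return 0.5 * (p11 + p22) + math.sqrt((0.5 * (p11 - p22)) ** 2 + s * s)

def rho_upper(p11, p22, s):
    """Upper bound from (45): max{p11, p22} + s."""
    return max(p11, p22) + s

rng = random.Random(0)
samples = [(rng.random(), rng.random(), rng.random()) for _ in range(1000)]

# The gap should be non-negative for every sample (up to rounding).
min_gap = min(rho_upper(*t) - rho_exact(*t) for t in samples)
print(min_gap)
```

The bound is tight when the off-diagonal term vanishes, which is why (46) is a sufficient rather than a necessary condition for $\rho<1$.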

To guarantee $\rho<1$, by (45) it suffices to choose $\delta$ and $\eta$ such that

$\displaystyle p_{11}+\sqrt{p_{12}p_{21}}<1,\quad p_{22}+\sqrt{p_{12}p_{21}}<1.$	(46)

Then, combining (IX-D) and (IX-D), it follows that

		$\displaystyle\frac{1}{T}\sum_{k=1}^{T}\mathbb{E}_{v_{[T]}}[\Phi(\theta_{k})-\Phi(\theta_{k}^{*})]\leq l_{3}+$
		$\displaystyle(l_{1}+\sqrt{\frac{p_{21}}{p_{12}}}l_{2})\left\{(\rho^{T}+\frac{1}{1-\rho})B_{1}+\frac{T}{1-\rho}(d_{1}+\sqrt{\frac{p_{12}}{p_{21}}}d_{2})\right\}$		(47)

where

	$\displaystyle l_{1}=$	$\displaystyle\frac{2\eta^{2}}{T}+\frac{p^{2}}{\delta^{2}T}(54M^{2}\eta^{2}+48\mu M_{x}^{2}M_{y}^{2}M_{g}^{2}\eta^{2}),$
	$\displaystyle l_{2}=$	$\displaystyle\frac{12\mu(\mu+1)p^{2}M_{y}^{2}M_{g}^{2}}{\alpha_{2}\delta^{2}T},$
	$\displaystyle l_{3}=$	$\displaystyle\frac{192\mu p^{2}M_{x}^{2}M_{y}^{2}M_{g}^{2}}{T}+194M^{2}p^{2}+2M\delta.$

Since $\delta=\frac{\epsilon}{M}$, we set $\eta=\frac{\kappa\epsilon}{pT}$ so that $\frac{p^{4}\eta^{2}}{\epsilon^{2}}$ and $\frac{p^{2}}{T}$ have the same order. Then, the order of (IX-D) is as given in (1). The parameter $\kappa$ is chosen to satisfy (46), i.e.,

$\displaystyle\begin{split}\xi_{1}\kappa^{2}+\xi_{2}\kappa&<1,\\ \xi_{3}+\xi_{2}\kappa&<1,\end{split}$	(48)

where

	$\displaystyle\xi_{1}$	$\displaystyle=\frac{6M^{2}(M^{2}+\mu M_{x}^{2}M_{y}^{2}M_{g}^{2})}{T^{2}},$
	$\displaystyle\xi_{2}$	$\displaystyle=\frac{MM_{x}M_{y}M_{g}}{T}\sqrt{6\mu(1+\mu)},$
	$\displaystyle\xi_{3}$	$\displaystyle=\mu.$

The feasible range of $\kappa$ is denoted by $(0,\kappa^{*})$. Based on (48), we have

$\displaystyle\kappa^{*}=$	$\displaystyle\min\left\{\frac{-\xi_{2}+\sqrt{\xi^{2}_{2}+4\xi_{1}}}{2\xi_{1}},\frac{1-\xi_{3}}{\xi_{2}}\right\}$
$\displaystyle=$	$\displaystyle\mathcal{O}\left(\min\left\{\frac{T\sqrt{\mu(1+\mu)}}{\mu},\frac{(1-\mu)T}{\sqrt{\mu(1+\mu)}}\right\}\right),$
$\displaystyle\rho\leq$	$\displaystyle\max\left\{\xi_{1}\kappa^{2},\xi_{3}\right\}+\xi_{2}\kappa$
$\displaystyle=$	$\displaystyle\mathcal{O}\left(\max\left\{\frac{(1-\mu)^{2}}{1+\mu},\mu\right\}+1-\mu\right).$
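The threshold $\kappa^{*}$ and the substitution behind (48) can be reproduced in a few lines. The sketch below computes $\xi_{1},\xi_{2},\xi_{3}$, evaluates $\kappa^{*}$, and verifies that, with $\delta=\epsilon/M$ and $\eta=\kappa\epsilon/(pT)$, the entries in (42) satisfy $p_{11}=\xi_{1}\kappa^{2}$ and $\sqrt{p_{12}p_{21}}=\xi_{2}\kappa$. The constants are hypothetical placeholders, not values from the paper.

```python
import math

def xi_constants(M, Mx, My, Mg, mu, T):
    """Constants xi_1, xi_2, xi_3 defined below (48)."""
    xi1 = 6.0 * M**2 * (M**2 + mu * Mx**2 * My**2 * Mg**2) / T**2
    xi2 = M * Mx * My * Mg * math.sqrt(6.0 * mu * (1.0 + mu)) / T
    return xi1, xi2, mu

def kappa_star(xi1, xi2, xi3):
    """Upper end of the feasible range (0, kappa*) of (48)."""
    root = (-xi2 + math.sqrt(xi2**2 + 4.0 * xi1)) / (2.0 * xi1)
    return min(root, (1.0 - xi3) / xi2)

def satisfies_48(kappa, xi1, xi2, xi3):
    return xi1 * kappa**2 + xi2 * kappa < 1.0 and xi3 + xi2 * kappa < 1.0

def p_entries(p, eta, delta, M, Mx, My, Mg, mu, alpha2):
    """Entries p11, p12, p21 from (42)."""
    p11 = 6.0 * p**2 * eta**2 / delta**2 * (M**2 + mu * Mx**2 * My**2 * Mg**2)
    p12 = 3.0 * mu * p**2 * My**2 * Mg**2 / (2.0 * alpha2 * delta**2) * (1.0 + mu)
    p21 = 4.0 * alpha2 * eta**2 * Mx**2
    return p11, p12, p21

# Hypothetical constants used for illustration only.
M, Mx, My, Mg, mu, alpha2 = 2.0, 1.5, 1.0, 1.0, 0.6, 1.0
p, eps, T = 4, 0.1, 500

xi1, xi2, xi3 = xi_constants(M, Mx, My, Mg, mu, T)
k_star = kappa_star(xi1, xi2, xi3)
inside = satisfies_48(0.5 * k_star, xi1, xi2, xi3)
outside = satisfies_48(1.01 * k_star, xi1, xi2, xi3)

kappa = 0.5 * k_star
delta = eps / M
eta = kappa * eps / (p * T)
p11, p12, p21 = p_entries(p, eta, delta, M, Mx, My, Mg, mu, alpha2)
print(k_star, inside, outside, p11, math.sqrt(p12 * p21))
```

Note that $\alpha_{2}$ cancels in $\sqrt{p_{12}p_{21}}$, which is why it does not appear in $\xi_{2}$.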

IX-E Proof of Lemma 7

The analysis is similar to the proof of Appendix IX-B. First, similar to Lemma 3, we have

$\displaystyle|e_{\Phi}(x^{a}_{k},\theta_{k},d_{k})|^{2}\leq\frac{\mu^{\prime}M^{2}_{y}M^{2}_{g}}{2c_{2}}V(x^{a}_{k},u_{k},\theta_{k},d_{k}).$	(49)

Then, it follows that

$\displaystyle V(x^{a}_{k},u_{k},\theta_{k},d_{k})$
$\displaystyle\leq 2c_{2}(\|x^{a}_{k}-x^{a}_{ss}(u_{k-1},\theta_{k-1},d_{k-1})\|^{2}$
$\displaystyle+\|x^{a}_{ss}(u_{k-1},\theta_{k-1},d_{k-1})-x^{a}_{ss}(u_{k},\theta_{k},d_{k})\|^{2})$
$\displaystyle\leq\mu^{\prime}V(x^{a}_{k-1},u_{k-1},\theta_{k-1},d_{k-1})$
$\displaystyle+2c_{2}\|x_{ss}^{a}(u_{k-1},\theta_{k-1},d_{k-1})-x_{ss}^{a}(u_{k-1},\theta_{k-1},d_{k})$
$\displaystyle+x_{ss}^{a}(u_{k-1},\theta_{k-1},d_{k})-x_{ss}^{a}(u_{k},\theta_{k},d_{k})\|^{2}$
$\displaystyle\leq\mu^{\prime}V(x^{a}_{k-1},u_{k-1},\theta_{k-1},d_{k-1})$
$\displaystyle+4c_{2}\|x_{ss}^{a}(u_{k-1},\theta_{k-1},d_{k-1})-x_{ss}^{a}(u_{k-1},\theta_{k-1},d_{k})\|^{2}$
$\displaystyle+4c_{2}\|x_{ss}^{a}(u_{k-1},\theta_{k-1},d_{k})-x_{ss}^{a}(u_{k},\theta_{k},d_{k})\|^{2}$
$\displaystyle\leq\mu^{\prime}V(x^{a}_{k-1},u_{k-1},\theta_{k-1},d_{k-1})+4c_{2}M_{x}^{2}\|\theta_{k}-\theta_{k-1}\|^{2}$
$\displaystyle+4c_{2}\|x_{ss}^{a}(u_{k-1},\theta_{k-1},d_{k-1})-x_{ss}^{a}(u_{k-1},\theta_{k-1},d_{k})\|^{2}.$

Based on Assumption 5, it can be inferred that

$\displaystyle\mathbb{E}_{d}[(x_{ss}^{a}(u_{k-1},\theta_{k-1},d_{k-1})-x_{ss}^{a}(u_{k-1},\theta_{k-1},d_{k}))^{2}]\leq 4\sigma^{2}.$

The upper bound of $\mathbb{E}_{v_{[k]}}[\|\theta_{k}-\theta_{k-1}\|^{2}]$ is given as

$\displaystyle\mathbb{E}_{v_{[k]}}[\|\theta_{k}-\theta_{k-1}\|^{2}]\leq 2\eta^{2}\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k-1}\|^{2}]+8\delta^{2}.$

Substituting these two bounds into the inequality above, $\mathbb{E}_{v_{[k]}}[V(x^{a}_{k},u_{k},\theta_{k},d_{k})]$ can be bounded as in (7).
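The bound $\|\theta_{k}-\theta_{k-1}\|^{2}\leq 2\eta^{2}\|\tilde{\phi}_{k-1}\|^{2}+8\delta^{2}$ in fact holds sample by sample, since $\theta_{k}=w_{k}+\delta v_{k}$ with unit-norm probing directions. The sketch below checks it on random draws. It assumes an unprojected step $w_{k}-w_{k-1}=-\eta\tilde{\phi}_{k-1}$ (projection can only shorten the step), and the vector `phi` is an arbitrary stand-in for the gradient estimate; dimensions and step sizes are hypothetical.

```python
import math
import random

def unit_vector(p, rng):
    """Draw a direction uniformly from the unit sphere in R^p."""
    g = [rng.gauss(0.0, 1.0) for _ in range(p)]
    n = math.sqrt(sum(x * x for x in g))
    return [x / n for x in g]

def sq_norm(x):
    return sum(t * t for t in x)

rng = random.Random(1)
p, eta, delta = 5, 0.1, 0.05   # hypothetical dimension and step sizes

worst = float("-inf")
for _ in range(2000):
    phi = [rng.uniform(-3.0, 3.0) for _ in range(p)]   # stand-in for the estimate
    v_k, v_prev = unit_vector(p, rng), unit_vector(p, rng)
    # theta_k - theta_{k-1} = (w_k - w_{k-1}) + delta * (v_k - v_{k-1})
    diff = [-eta * phi[i] + delta * (v_k[i] - v_prev[i]) for i in range(p)]
    lhs = sq_norm(diff)
    rhs = 2.0 * eta**2 * sq_norm(phi) + 8.0 * delta**2
    worst = max(worst, lhs - rhs)

print(worst)   # largest violation of the bound; expected to be non-positive
```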

IX-F Proof of Lemma 8

The analysis is similar to the proof in Appendix IX-C. Let $\Phi(\theta_{k},d_{k})\triangleq\Phi(\theta_{k},h(u_{k},\theta_{k},d_{k}))$. With (27), we have

$\displaystyle\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k}\|^{2}]$
$\displaystyle=\frac{p^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|v_{k}(\Phi(\theta_{k},y^{a}_{k+1},d_{k})-\Phi(\theta_{k-1},y^{a}_{k},d_{k-1}))\|^{2}]$
$\displaystyle=\frac{p^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|v_{k}(\Phi(\theta_{k},d_{k})-\Phi(\theta_{k-1},d_{k-1})$
$\displaystyle+e_{\Phi}(x^{a}_{k},\theta_{k},d_{k})-e_{\Phi}(x^{a}_{k-1},\theta_{k-1},d_{k-1}))\|^{2}]$
$\displaystyle\overset{(s.1)}{\leq}\underbrace{\frac{3p^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|v_{k}(\Phi(\theta_{k},d_{k})-\Phi(\theta_{k-1},d_{k-1}))\|^{2}]}_{①}$
$\displaystyle+\underbrace{\frac{3p^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|v_{k}e_{\Phi}(x^{a}_{k},\theta_{k},d_{k})\|^{2}+\|v_{k}e_{\Phi}(x^{a}_{k-1},\theta_{k-1},d_{k-1})\|^{2}]}_{②}.$

Next, we bound the terms $①$ and $②$ in turn.

$\displaystyle①\overset{(s.1)}{=}$	$\displaystyle\frac{3p^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|v_{k}(\Phi(\theta_{k},d_{k})-\Phi(\theta_{k},d_{k-1})$
	$\displaystyle+\Phi(\theta_{k},d_{k-1})-\Phi(\theta_{k-1},d_{k-1}))\|^{2}]$
$\displaystyle\leq$	$\displaystyle\frac{6p^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|v_{k}(\Phi(\theta_{k},d_{k})-\Phi(\theta_{k},d_{k-1}))\|^{2}]$
	$\displaystyle+\frac{6p^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|v_{k}(\Phi(\theta_{k},d_{k-1})-\Phi(\theta_{k-1},d_{k-1}))\|^{2}]$
$\displaystyle\overset{(s.2)}{\leq}$	$\displaystyle\frac{6p^{2}}{\delta^{2}}(\mathbb{E}_{v_{[k]}}[\|\Phi(\theta_{k},d_{k})-\Phi(\theta_{k},d_{k-1})\|^{4}])^{\frac{1}{2}}(\mathbb{E}_{v_{[k]}}[\|v_{k}\|^{4}])^{\frac{1}{2}}$
	$\displaystyle+\frac{6p^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|v_{k}(\Phi(\theta_{k},d_{k-1})-\Phi(\theta_{k-1},d_{k-1}))\|^{2}]$
$\displaystyle\overset{(s.3)}{\leq}$	$\displaystyle\frac{24p^{2}\sigma^{2}}{\delta^{2}}+\frac{6p^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|v_{k}(\Phi(\theta_{k},d_{k-1})-\Phi(\theta_{k-1},d_{k-1}))\|^{2}]$
$\displaystyle\overset{(s.4)}{\leq}$	$\displaystyle\frac{24p^{2}\sigma^{2}}{\delta^{2}}+\frac{12\eta^{2}p^{2}M^{2}}{\delta^{2}}\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k-1}\|^{2}]+48p^{2}M^{2},$

where $(s.1)$ holds by adding and subtracting $\Phi(\theta_{k},d_{k-1})$, $(s.2)$ follows from the Cauchy-Schwarz inequality, $(s.3)$ follows from Assumption 5 and $\|v_{k}\|=1$, and $(s.4)$ follows the same procedure as (IX-C) in Appendix IX-C. Similarly, the bound on the term $②$ follows from (IX-C). Combining the above inequalities yields (8).
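The quantity bounded in this proof is the one-point residual-feedback estimate $\tilde{\phi}_{k}=\frac{p}{\delta}v_{k}(\Phi_{k}-\Phi_{k-1})$, combined with the projected update $w_{k+1}=\Pi_{\mathcal{U}}[w_{k}-\eta\tilde{\phi}_{k}]$ and the perturbed signal $\theta_{k}=w_{k}+\delta v_{k}$. The following sketch illustrates that loop under simplifying assumptions: the objective `Phi` is a hypothetical static quadratic standing in for the measured attack objective (no plant dynamics or noise are simulated), and the constraint set is assumed to be a Euclidean ball.

```python
import math
import random

def norm(x):
    return math.sqrt(sum(t * t for t in x))

def unit_vector(p, rng):
    """Probing direction drawn uniformly from the unit sphere."""
    g = [rng.gauss(0.0, 1.0) for _ in range(p)]
    n = norm(g)
    return [x / n for x in g]

def project_ball(w, radius):
    """Euclidean projection onto {w : ||w|| <= radius} (assumed constraint set)."""
    n = norm(w)
    return list(w) if n <= radius else [radius * x / n for x in w]

def residual_feedback(Phi, w0, eta, delta, radius, iters, seed=0):
    rng = random.Random(seed)
    p = len(w0)
    w = list(w0)
    v = unit_vector(p, rng)
    prev_value = Phi([w[i] + delta * v[i] for i in range(p)])
    history = []
    for _ in range(iters):
        v = unit_vector(p, rng)
        theta = [w[i] + delta * v[i] for i in range(p)]      # perturbed signal
        value = Phi(theta)                                   # one query per step
        residual = value - prev_value
        phi = [(p / delta) * residual * v_i for v_i in v]    # gradient estimate
        w = project_ball([w[i] - eta * phi[i] for i in range(p)], radius)
        history.append((list(w), phi, residual))
        prev_value = value
    return w, history

# Hypothetical objective and parameters, for illustration only.
Phi = lambda th: sum((t - 1.0) ** 2 for t in th)
p, delta, eta, radius = 3, 0.1, 0.01, 2.0
w_final, history = residual_feedback(Phi, [0.0] * p, eta, delta, radius, iters=200)
print(w_final)
```

Because $\|v_{k}\|=1$, the estimate always has norm $\frac{p}{\delta}|\Phi_{k}-\Phi_{k-1}|$, which is the structural fact the moment bounds above rely on.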

IX-G Proof of Theorem 2

Following the same procedure as in Appendix IX-D, we have

$\displaystyle\frac{1}{T}\sum_{k=1}^{T}\mathbb{E}_{v_{[T]}}[\tilde{\Phi}(\theta_{k})-\tilde{\Phi}(\theta_{k}^{*})]$
$\displaystyle\leq\frac{\eta^{2}p^{2}}{\delta^{2}T}(\frac{2\delta^{2}}{p^{2}}+98M^{2}+96\mu^{\prime}M_{x}^{2}M_{y}^{2}M_{g}^{2})\sum_{k=1}^{T}\mathbb{E}_{v_{[T]}}[\|\tilde{\phi}_{k-1}\|^{2}]$
$\displaystyle+\frac{12\mu^{\prime}(\mu^{\prime}+1)p^{2}M_{y}^{2}M_{g}^{2}}{c_{2}\delta^{2}T}\sum_{k=1}^{T}\mathbb{E}_{v_{[T]}}[V(x^{a}_{k-1},u_{k-1},\theta_{k-1},d_{k-1})]$
$\displaystyle+\frac{(192\sigma^{2}+384\delta^{2}M_{x}^{2})\mu^{\prime}p^{2}M_{y}^{2}M_{g}^{2}}{\delta^{2}T}+\frac{192p^{2}\sigma^{2}}{\delta^{2}}+386p^{2}M^{2}+2M\delta.$	(50)

Combining (7) and (8), the recursion (IX-D) can be rewritten with $P$ replaced by

$\displaystyle P^{\prime}=\left[\begin{array}{cc}p_{11}^{\prime}&\sqrt{p_{12}^{\prime}p_{21}^{\prime}}\\ \sqrt{p_{12}^{\prime}p_{21}^{\prime}}&p_{22}^{\prime}\end{array}\right]$

with

$\displaystyle p_{11}^{\prime}=$	$\displaystyle\frac{12p^{2}\eta^{2}}{\delta^{2}}(M^{2}+\mu^{\prime}M_{x}^{2}M_{y}^{2}M_{g}^{2}),$
$\displaystyle p_{12}^{\prime}=$	$\displaystyle\frac{3\mu^{\prime}p^{2}M_{y}^{2}M_{g}^{2}}{2c_{2}\delta^{2}}(1+\mu^{\prime}),$
$\displaystyle p_{21}^{\prime}=$	$\displaystyle 8c_{2}\eta^{2}M_{x}^{2},$
$\displaystyle p_{22}^{\prime}=$	$\displaystyle\mu^{\prime},$
$\displaystyle d_{1}^{\prime}=$	$\displaystyle 48p^{2}(M^{2}+\mu^{\prime}M_{x}^{2}M_{y}^{2}M_{g}^{2})$
	$\displaystyle+\frac{24p^{2}\sigma^{2}(1+\mu^{\prime}M_{y}^{2}M_{g}^{2})+12p^{2}\eta^{2}\mu^{\prime}M_{x}^{2}M_{y}^{2}M_{g}^{2}}{\delta^{2}},$
$\displaystyle d_{2}^{\prime}=$	$\displaystyle 32c_{2}\delta^{2}M_{x}^{2}+16c_{2}\sigma^{2}.$	(51)

Then, we have

$\displaystyle\max\Big\{\sum_{k=1}^{T}\mathbb{E}_{v_{[T]}}[\|\tilde{\phi}_{k}\|^{2}],\ \sum_{k=1}^{T}\mathbb{E}_{v_{[T]}}[\sqrt{\frac{p_{12}^{\prime}}{p_{21}^{\prime}}}V(x^{a}_{k},u_{k},\theta_{k},d_{k})]\Big\}$
$\displaystyle\leq$	$\displaystyle((\rho^{\prime})^{T}+\frac{1}{1-\rho^{\prime}})B_{1}^{\prime}+\frac{T}{1-\rho^{\prime}}(d_{1}^{\prime}+\sqrt{\frac{p_{12}^{\prime}}{p_{21}^{\prime}}}d_{2}^{\prime})$
$\displaystyle=$	$\displaystyle\mathcal{O}\left(\frac{T}{1-\rho^{\prime}}(\frac{p^{2}\sigma^{2}+p^{2}\eta^{2}}{\delta^{2}}+\frac{p\delta^{2}+p\sigma^{2}}{\delta\eta})\right),$

where $B_{1}^{\prime}=\mathbb{E}[\|\tilde{\phi}({w_{1}})\|^{2}]+\mathbb{E}[\sqrt{\frac{p_{12}^{\prime}}{p_{21}^{\prime}}}V(x^{a}_{1},u_{1},\theta_{1},d_{1})]$ and $\rho^{\prime}<1$ is the maximum singular value of the matrix $P^{\prime}$.

With (IX-G) and (IX-G), it follows that

$\displaystyle\frac{1}{T}\sum_{k=1}^{T}\mathbb{E}_{v_{[T]}}[\tilde{\Phi}(\theta_{k})-\tilde{\Phi}(\theta_{k}^{*})]\leq l_{3}^{\prime}+$
$\displaystyle(l_{1}^{\prime}+\sqrt{\frac{p_{21}^{\prime}}{p_{12}^{\prime}}}l_{2}^{\prime})\left\{((\rho^{\prime})^{T}+\frac{1}{1-\rho^{\prime}})B_{1}^{\prime}+\frac{T}{1-\rho^{\prime}}(d_{1}^{\prime}+\sqrt{\frac{p_{12}^{\prime}}{p_{21}^{\prime}}}d_{2}^{\prime})\right\}$	(52)

where

	$\displaystyle l_{1}^{\prime}=$	$\displaystyle\frac{2\eta^{2}}{T}+\frac{p^{2}\eta^{2}}{\delta^{2}T}(98M^{2}+96\mu^{\prime}M_{x}^{2}M_{y}^{2}M_{g}^{2}),$
	$\displaystyle l_{2}^{\prime}=$	$\displaystyle\frac{12\mu^{\prime}(\mu^{\prime}+1)p^{2}M_{y}^{2}M_{g}^{2}}{c_{2}\delta^{2}T},$
	$\displaystyle l_{3}^{\prime}=$	$\displaystyle\frac{(192\sigma^{2}+384\delta^{2}M_{x}^{2})\mu^{\prime}p^{2}M_{y}^{2}M_{g}^{2}}{\delta^{2}T}$
		$\displaystyle+\frac{192p^{2}\sigma^{2}}{\delta^{2}}+386p^{2}M^{2}+2M\delta.$

Since $\delta=\frac{\epsilon}{M}$, we set $\eta=\frac{\kappa\sigma\epsilon}{p^{2}\sqrt{T}}$ so that $\frac{p^{4}\eta^{2}}{\epsilon^{4}}$ and $\frac{\sigma^{2}}{\epsilon^{2}T}$ have the same order, and we choose $\kappa\in(0,\frac{p^{2}\sqrt{T}}{\sigma\epsilon})$ so that $\eta\in(0,1)$. Then, the order of (IX-G) is as given in (2).
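The step-size rule used in this last step is simple to evaluate. The sketch below, with hypothetical values for $\kappa$, $\sigma$, $\epsilon$, $p$ and $T$, computes $\eta=\frac{\kappa\sigma\epsilon}{p^{2}\sqrt{T}}$, rejects $\kappa$ outside the admissible interval, and confirms that $\frac{p^{4}\eta^{2}}{\epsilon^{4}}=\kappa^{2}\frac{\sigma^{2}}{\epsilon^{2}T}$, i.e., the two quantities differ only by the constant $\kappa^{2}$.

```python
import math

def noisy_step_size(kappa, sigma, eps, p, T):
    """eta = kappa * sigma * eps / (p^2 * sqrt(T)), with kappa in the admissible range."""
    kappa_max = p**2 * math.sqrt(T) / (sigma * eps)
    if not 0.0 < kappa < kappa_max:
        raise ValueError("kappa must lie in (0, p^2 * sqrt(T) / (sigma * eps))")
    return kappa * sigma * eps / (p**2 * math.sqrt(T))

# Hypothetical parameter values, for illustration only.
kappa, sigma, eps, p, T = 2.0, 0.5, 0.1, 4, 10_000

eta = noisy_step_size(kappa, sigma, eps, p, T)
lhs = p**4 * eta**2 / eps**4
rhs = kappa**2 * sigma**2 / (eps**2 * T)
print(eta, lhs, rhs)
```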

Acknowledgments

The authors would like to thank Zhiyu He (now pursuing a Ph.D. at ETH) for inspiring early discussions and valuable comments on this topic.

References

[1] P. Antsaklis and J. Baillieul, “Special issue on technology of networked control systems,” Proceedings of the IEEE, vol. 95, no. 1, pp. 5–8, 2007.
[2] R. A. Gupta and M.-Y. Chow, “Networked control system: Overview and research trends,” IEEE Transactions on Industrial Electronics, vol. 57, no. 7, pp. 2527–2535, 2009.
[3] X.-M. Zhang, Q.-L. Han, X. Ge, D. Ding, L. Ding, D. Yue, and C. Peng, “Networked control systems: A survey of trends and techniques,” IEEE/CAA Journal of Automatica Sinica, vol. 7, no. 1, pp. 1–17, 2019.
[4] Y. Liu, P. Ning, and M. K. Reiter, “False data injection attacks against state estimation in electric power grids,” ACM Transactions on Information and System Security (TISSEC), vol. 14, no. 1, pp. 1–33, 2011.
[5] Y. Chen, S. Kar, and J. M. Moura, “Optimal attack strategies subject to detection constraints against cyber-physical systems,” IEEE Transactions on Control of Network Systems, vol. 5, no. 3, pp. 1157–1168, 2017.
[6] Z. Zhao, Y. Huang, Z. Zhen, and Y. Li, “Data-driven false data-injection attack design and detection in cyber-physical systems,” IEEE Transactions on Cybernetics, vol. 51, no. 12, pp. 6179–6187, 2020.
[7] X.-L. Wang, “Optimal attack strategy against fault detectors for linear cyber-physical systems,” Information Sciences, vol. 581, pp. 390–402, 2021.
[8] Q. Zhang, K. Liu, D. Han, G. Su, and Y. Xia, “Design of stealthy deception attacks with partial system knowledge,” IEEE Transactions on Automatic Control, 2022.
[9] J. Kim, L. Tong, and R. J. Thomas, “Subspace methods for data attack on state estimation: A data driven approach,” IEEE Transactions on Signal Processing, vol. 63, no. 5, pp. 1102–1114, 2014.
[10] L. An and G.-H. Yang, “Data-driven coordinated attack policy design based on adaptive $\mathcal{L}_{2}$ -gain optimal theory,” IEEE Transactions on Automatic Control, vol. 63, no. 6, pp. 1850–1857, 2017.
[11] R. Alisic, J. Kim, and H. Sandberg, “Model-free undetectable attacks on linear systems using LWE-based encryption,” IEEE Control Systems Letters, vol. 7, pp. 1249–1254, 2023.
[12] H. Zhang, P. Cheng, L. Shi, and J. Chen, “Optimal denial-of-service attack scheduling with energy constraint,” IEEE Transactions on Automatic Control, vol. 60, no. 11, pp. 3023–3028, 2015.
[13] Y. Zhang, Y. Zhou, K. Ji, and M. M. Zavlanos, “A new one-point residual-feedback oracle for black-box learning and control,” Automatica, vol. 136, p. 110006, 2022.
[14] M. Colombino, E. Dall’Anese, and A. Bernstein, “Online optimization as a feedback controller: Stability and tracking,” IEEE Transactions on Control of Network Systems, vol. 7, no. 1, pp. 422–432, 2020.
[15] X. Luo, C. Fang, C. Zhao, and J. He, “A model-free false data injection attack strategy in networked control systems,” in IEEE Conference on Decision and Control (CDC), accepted, 2022.
[16] Z. Guo, D. Shi, K. H. Johansson, and L. Shi, “Worst-case stealthy innovation-based linear attack on remote state estimation,” Automatica, vol. 89, pp. 117–124, 2018.
[17] H. Zhang, P. Cheng, J. Wu, L. Shi, and J. Chen, “Online deception attack against remote state estimation,” IFAC Proceedings Volumes, vol. 47, no. 3, pp. 128–133, 2014.
[18] M. Esmalifalak, H. Nguyen, R. Zheng, and Z. Han, “Stealth false data injection using independent component analysis in smart grid,” in 2011 IEEE International Conference on Smart Grid Communications (SmartGridComm). IEEE, 2011, pp. 244–248.
[19] Z. He, S. Bolognani, J. He, F. Dörfler, and X. Guan, “Model-free nonlinear feedback optimization,” arXiv preprint arXiv:2201.02395, 2022.
[20] A. L. Dontchev and R. T. Rockafellar, Implicit functions and solution mappings. Springer, 2009, vol. 543.
[21] N. Bof, R. Carli, and L. Schenato, “Lyapunov theory for discrete time systems,” arXiv preprint arXiv:1809.05289, 2018.
[22] G. Belgioioso, D. Liao-McPherson, M. H. de Badyn, S. Bolognani, J. Lygeros, and F. Dörfler, “Sampled-data online feedback equilibrium seeking: Stability and tracking,” arXiv preprint arXiv:2103.13988, 2021.
[23] P. Jain, P. Kar et al., “Non-convex optimization for machine learning,” Foundations and Trends® in Machine Learning, vol. 10, no. 3-4, pp. 142–363, 2017.
[24] A. Nedić and J. Liu, “Distributed optimization for control,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 1, pp. 77–103, 2018.
[25] H. Deng, M. Krstic, and R. J. Williams, “Stabilization of stochastic nonlinear systems driven by noise of unknown covariance,” IEEE Transactions on Automatic Control, vol. 46, no. 8, pp. 1237–1253, 2001.
[26] S. Kullback, Information theory and statistics. Courier Corporation, 1997.
[27] L. A. Maglaras and J. Jiang, “Intrusion detection in SCADA systems using machine learning techniques,” in Science and Information Conference, 2014, pp. 626–631.
[28] J. Goh, S. Adepu, M. Tan, and Z. S. Lee, “Anomaly detection in cyber physical systems using recurrent neural networks,” in IEEE International Symposium on High Assurance Systems Engineering (HASE), 2017, pp. 140–145.
[29] E. Anthi, L. Williams, M. Rhode, P. Burnap, and A. Wedgbury, “Adversarial attacks on machine learning cybersecurity defences in industrial control systems,” Journal of Information Security and Applications, vol. 58, p. 102717, 2021.
[30] D. Gadginmath, V. Krishnan, and F. Pasqualetti, “Direct vs indirect methods for behavior-based attack detection,” arXiv preprint arXiv:2209.07564, 2022.
[31] S. Liu, X. Li, P.-Y. Chen, J. Haupt, and L. Amini, “Zeroth-order stochastic projected gradient descent for nonconvex optimization,” in IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2018, pp. 1179–1183.

Xiaoyu Luo (S’19) received the B.E. degree from the Department of Automation, Tianjin University, Tianjin, China, in 2019. She is currently pursuing the Ph.D. degree with the Department of Automation, Shanghai Jiao Tong University, Shanghai, China. She is a member of the Intelligent of Wireless Networking and Cooperative Control group. Her research interests include fault-tolerant control in multi-agent systems, cooperative charging in energy storage systems, and security of cyber-physical systems.

Chongrong Fang received the B.Sc. degree in automation and the Ph.D. degree in control science and engineering from Zhejiang University, Hangzhou, China, in 2015 and 2020, respectively. He is currently an Assistant Professor with the Department of Automation, Shanghai Jiao Tong University, Shanghai, China. His research interests include anomaly detection and diagnosis in cyber-physical systems and cloud networks.

Jianping He (M’15-SM’19) is currently an associate professor in the Department of Automation at Shanghai Jiao Tong University. He received the Ph.D. degree in control science and engineering from Zhejiang University, Hangzhou, China, in 2013, and was a research fellow in the Department of Electrical and Computer Engineering at the University of Victoria, Canada, from Dec. 2013 to Mar. 2017. His research interests mainly include distributed learning, control and optimization, and security and privacy in network systems. Dr. He serves as an Associate Editor for IEEE Trans. on Control of Network Systems, IEEE Open Journal of Vehicular Technology and KSII Trans. Internet and Information Systems. He was also a Guest Editor of IEEE TAC, IEEE TII, International Journal of Robust and Nonlinear Control, etc. He was the winner of the Outstanding Thesis Award, Chinese Association of Automation, 2015. He received the best paper award from IEEE WCSP’17, the best conference paper award from IEEE PESGM’17, the finalist best student paper award from IEEE ICCA’17, and the finalist best conference paper award from IEEE VTC’20-Fall.

Chengcheng Zhao received the Ph.D. degree in control science and engineering from Zhejiang University, Hangzhou, China, in 2018. She is currently a research fellow in the Department of Electrical and Computer Engineering, University of Victoria. Her research interests include consensus and distributed optimization, distributed energy management in smart grids, vehicle platoons, and security and privacy in network systems. She received the IEEE PESGM 2017 best conference paper award, and one of her papers was a finalist for the IEEE ICCA 2017 best student paper award. She is a peer reviewer for Automatica, IEEE Transactions on Information Forensics and Security, IEEE Transactions on Industrial Electronics, etc. She was a TPC member for IEEE GLOBECOM 2017 and 2018, and IEEE ICC 2018.

Dario Paccagnan is an Assistant Professor and a member of the Computational Optimization Group in the Department of Computing, Imperial College London. Previously, he was a Postdoctoral Fellow with the Mechanical Engineering Department and the Center for Control, Dynamical Systems and Computation, University of California, Santa Barbara. In 2018, Dario obtained a Ph.D. degree from the Information Technology and Electrical Engineering Department, ETH Zurich, Switzerland. He received a B.Sc. and M.Sc. in Aerospace Engineering in 2011 and 2014 from the University of Padova, Italy, and an M.Sc. in Mathematical Modelling and Computation from the Technical University of Denmark in 2014, all with Honors. Dario was a visiting scholar at the University of California, Santa Barbara in 2017, and at Imperial College London in 2014. His interests are at the interface between control theory and game theory, with a focus on the design of behavior-influencing mechanisms for socio-technical systems. Applications include multiagent systems and smart cities. Dr. Paccagnan was awarded the ETH medal, and is a recipient of the SNSF fellowship for his work in Distributed Optimization and Game Design.

$\displaystyle\|w_{k+1}-w_{k+1}^{*}\|^{2}=$	$\displaystyle\|\Pi_{\mathcal{U}}[w_{k}-\eta\tilde{\phi}_{k}]-w_{k+1}^{*}\|^{2}$
$\displaystyle\overset{(s.1)}{\leq}$	$\displaystyle\|w_{k}-\eta\tilde{\phi}_{k}-w_{k+1}^{*}\|^{2}$
$\displaystyle\overset{(s.2)}{\leq}$	$\displaystyle 2\|w_{k}-w_{k+1}^{*}\|^{2}+2\eta^{2}\|\tilde{\phi}_{k}\|^{2},$	(14)

$\displaystyle\|w_{k+1}-w_{k}\|^{2}=$	$\displaystyle\|\Pi_{\mathcal{U}}[w_{k}-\eta\tilde{\phi}_{k}]-w_{k}\|^{2}$
$\displaystyle\leq$	$\displaystyle\|w_{k}-\eta\tilde{\phi}_{k}-w_{k}\|^{2}$
$\displaystyle\leq$	$\displaystyle\eta^{2}\|\tilde{\phi}_{k}\|^{2}.$	(15)

$\displaystyle\|e_{\Phi}(x_{k},\theta_{k})\|^{2}=$	$\displaystyle\|\Phi(\theta_{k},y^{a}_{k+1})-\Phi(\theta_{k},h(u_{k},\theta_{k}))\|^{2}$
$\displaystyle\leq$	$\displaystyle M_{y}^{2}\|y^{a}_{k+1}-h(u_{k},\theta_{k})\|^{2}$
$\displaystyle\leq$	$\displaystyle M_{y}^{2}M_{g}^{2}\|x^{a}_{k+1}-x^{a}_{ss}(u_{k},\theta_{k})\|^{2}.$	(34)

$\displaystyle V(x^{a}_{k},u_{k},\theta_{k})\leq\alpha_{2}\|x^{a}_{k}-x^{a}_{ss}(u_{k},\theta_{k})\|^{2}$
$\displaystyle=\alpha_{2}\|x^{a}_{k}-x^{a}_{ss}(u_{k-1},\theta_{k-1})+x^{a}_{ss}(u_{k-1},\theta_{k-1})-x^{a}_{ss}(u_{k},\theta_{k})\|^{2}$
$\displaystyle\overset{(s.1)}{\leq}2\alpha_{2}(\|x^{a}_{k}-x^{a}_{ss}(u_{k-1},\theta_{k-1})\|^{2}+\|x^{a}_{ss}(u_{k-1},\theta_{k-1})-x^{a}_{ss}(u_{k},\theta_{k})\|^{2})$
$\displaystyle\overset{(s.2)}{\leq}2\alpha_{2}(\frac{1}{\alpha_{1}}(1-\frac{\alpha_{3}}{\alpha_{2}})V(x^{a}_{k-1},u_{k-1},\theta_{k-1})+M_{x}^{2}\|\theta_{k}-\theta_{k-1}\|^{2}),$

$\displaystyle\mathbb{E}_{v_{[k]}}[\|\theta_{k}-\theta_{k-1}\|^{2}]$
$\displaystyle=\mathbb{E}_{v_{[k]}}[\|w_{k}-w_{k-1}+\delta v_{k}-\delta v_{k-1}\|^{2}]$
$\displaystyle\overset{(s.1)}{\leq}\mathbb{E}_{v_{[k]}}[2\|w_{k}-w_{k-1}\|^{2}+2\delta^{2}\|v_{k}-v_{k-1}\|^{2}]$
$\displaystyle\overset{(s.2)}{\leq}2\eta^{2}\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k-1}\|^{2}]+2\delta^{2}\mathbb{E}_{v_{[k]}}[2\|v_{k}\|^{2}+2\|v_{k-1}\|^{2}]$
$\displaystyle\overset{(s.3)}{\leq}2\eta^{2}\mathbb{E}_{v_{[k]}}[\|\tilde{\phi}_{k-1}\|^{2}]+8\delta^{2},$
