Sensor Fault Detection and Isolation via Networked Estimation: Full-Rank Dynamical Systems

Mohammadreza Doostmohammadian, Nader Meskin, Senior Member, IEEE This paper was supported by the Qatar National Research Fund (a member of the Qatar Foundation) under the NPRP Grant number NPRP10-0105-170107.Mohammadreza Doostmohammadian is with the Faculty of Mechanical Engineering at Semnan University, Semnan, Iran, email: [email protected]. Nader Meskin is with the Electrical Engineering Department at Qatar University, Doha, Qatar, email: [email protected].

Abstract

This paper considers the problem of simultaneous sensor fault detection, isolation, and networked estimation of linear full-rank dynamical systems. The proposed networked estimation is a variant of single time-scale protocol and is based on (i) consensus on a-priori estimates and (ii) measurement innovation. The necessary connectivity condition on the sensor network and stabilizing block-diagonal gain matrix is derived based on our previous works. Considering additive faults in the presence of system and measurement noise, the estimation error at sensors is derived and proper residuals are defined for fault detection. Unlike many works in the literature, no simplifying upper-bound condition on the noise is considered and we assume Gaussian system/measurement noise. A probabilistic threshold is then defined for fault detection based on the estimation error covariance norm. Finally, a graph-theoretic sensor replacement scenario is proposed to recover possible loss of networked observability due to removing the faulty sensor. We examine the proposed fault detection and isolation scheme on an illustrative academic example to verify the results and make a comparison study with related literature.

Keywords: Fault Detection and Isolation, Observability, Networked Estimation, System Digraph, Residual

I Introduction

Fault Detection and Isolation (FDI) is an emerging topic of recent literature in control and signal-processing [1]. In general, a fault is characterized as an unpermitted deviation or malfunction of (at least) one of the system parameters/properties from its standard condition, which may take place at the plant, sensor, or actuator. Fault Detection, Isolation, and Reconfiguration (FDIR) is therefore a methodology to ensure acceptable system operation and to compensate for occurrence of faults detected by FDI module[1]. The fault diagnosis schemes are spanned from the centralized approaches [2, 3, 4] to more recent distributed methods [5, 6, 7, 8, 9, 10, 11]. Indeed, with recent technological trends, many practical systems of interest are either large-scale or physically distributed, and thus it is required to develop distributed or networked FDI strategies. Toward this goal, in this paper, the problem of simultaneous sensor fault detection, isolation and networked estimation of dynamical system is investigated.

Fault detection and isolation in networked dynamical systems has been extensively studied in the literature. In [6], an FDI strategy is proposed for a distributed heterogeneous multi-agent system and in [7], fault detection of a group of interconnected noise-free individual subsystems is considered such that each subsystem is able to detect faults in its neighborhood. In [8], an FDI problem for interconnected double-integrator system under unknown system faults and fault-free measurements is investigated such that almost all faults except the faults with zero dynamics can be detected. Similarly, in [9], a faulty second-order dynamical system (representing a heterogeneous multi-agent system) is considered with bounded disturbance and fault-free measurements and a distributed observer-based FDI strategy is proposed. The case of homogeneous multi-agent linear systems under upper-bounded system noise is considered in [10], where a distributed observer-based FDI based on fault-free measurements is developed to detect faults at agents. Development of FDI on noise-free faulty systems monitored by a multi-agent network communicating under an event-triggered framework is considered in [11]. It should be noted that most of these works are with the main aim of fault detection and the state estimation of the underlying dynamical system is not considered.

The other related trend in the literature is attack detection and resilient/secure estimation, both centralized [12, 13, 14, 15, 16, 17, 18] and distributed [19, 20, 21, 22, 23, 24, 25, 26, 27]. Distributed estimation strategies are more prominent in recent years due to emergence of large-scale applications. In [19], a distributed $H_{\infty}$ observer is proposed to detect admissible biasing attacks over distributed estimation networks by introducing an auxiliary input tracking model. In [20], both false-data injection attack at the system/process and jamming attack at the communication network of sensors are considered and in [21], a distributed gradient descent protocol is adopted to optimize the norm of the covariance of the measurement updates as the cost function. In [22], a distributed method is proposed to estimate the system state over a k-regular sensor network under attacked noise-free measurements and in [23], a distributed resilient estimation scheme is proposed where the time-scale of the sensors (as estimators) needs to be faster than the system dynamics.

Secure distributed estimation of cyber-physical systems composed of interconnected subsystems each monitored by a sensor under possible unidentifiable attack is discussed in [24]. In [25], secure distributed estimators for noise-free system and measurements is proposed over dynamic sensor networks subject to communication loss. Other relevant works include secure estimation based on reachability analysis in [26] and static parameter estimation in [27]. Note that most of these works propose a resilient estimation protocol under sensor attack, where no attack detection and isolation scheme is considered. In general, distributed fault/attack detection finds application in cases where the centralized architecture is not possible or not desirable, as in smart-grid monitoring [28], large-scale wind-farms [29], and networked unmanned vehicles [30].

In this paper, a quantitative model-based method is adopted, as in the observer-based methods, to develop explicit mathematical model and control theory to generate residuals for sensor fault detection. The proposed observer/estimator in this paper is distributed/networked, which finds application in large-scale architectures. Similar networked estimators are proposed in the literature [31, 32, 33, 34, 35, 36, 37]. A single time-scale networked estimator is developed to track the global state of the dynamical system over a distributed network of sensors each taking local measurements with partial observability. Unlike some literature [33, 34, 32], it is not assumed that the system is observable in the neighborhood of each sensor, i.e. the system is not necessarily observable by the measurements directly shared with the sensor. This significantly reduces the connectivity requirements on the sensor network, and therefore reduces the network communication costs. Similar to [35, 36, 37], it is assumed that the underlying dynamical system is full-rank and conditions for networked observability are developed.

The design procedure in this paper is based on structured system theory, previously developed in [38, 39]. Structural (or generic) analysis is widely used in system theory, namely in fault detection [40, 41] and in distributed observability characterization [42]. Using structural analysis, the structure of the sensor network and local estimator gain can be designed to satisfy distributed observability. However, in this paper possible additive faults in sensors are considered and by mathematically deriving the estimation error and sensor residuals, probabilistic thresholds on the residuals are obtained based on error covariance, which leads to the fault detection and isolation logic. Finally, after detecting and isolating the faulty sensor, a graph-theoretic method is proposed to replace the faulty sensor with an observationally equivalent state measurement to compensate the loss of observability. To summarize, the main contributions of this paper as compared with related literature are as follows:

•

Considering system/process and measurement noise is a challenge in FDI strategies. This is mainly due to the fact that it is generally hard to distinguish between the presence of faults and system/measurement noise as both fault and noise terms may affect the sensor residuals. In this direction, some works in the literature [7, 18, 11, 25] assume that the system and/or sensor measurement are noise-free, which is a simplifying assumption. In this paper, we consider both system and measurement noise and propose a probability-based threshold design on the residuals to overcome this challenge.
•

Following the above comment, many works in FDI and attack detection literature [43, 12, 13, 44, 45, 10] assume that the noise variable is upper-bounded, i.e the noise term instead of taking different values, for example, from Gaussian distribution, only takes values in a limited range. This simplifying assumption helps to design deterministic thresholds on the residuals. In this paper, no such assumption is made, and the noise is assumed to be Gaussian random variable with no upper-bound, which is more realistic as compared to the mentioned references.
•

In this paper, a general LTI system is considered, which generalizes the multi-agent systems or interconnected systems each possessing a separate dynamics considered in [6, 7, 10], and the double-integrator system in [8, 9]. Furthermore, the sensor network is considered as a Strongly-Connected (SC) graph as compared to restrictive regularity condition considered in [22].
•

We adopt a graph-theoretic approach to recover the loss of observability in case of detecting and removing a faulty sensor. This approach is based on our previous works on observational equivalence [46, 47].

The rest of this paper is organized as follows. In Section II, the general framework and state the problem are presented. Section III provides the networked estimator scenario and develops the condition for error stability. In Section IV, the algorithm for block-diagonal gain design at sensors is provided. Section V develops the sensor fault-detection and isolation logic, and in Section VI, a graph-theoretic method for observability compensation is discussed. Section VII gives simulation example to illustrate the results. Finally, Section VIII concludes the paper.

II The Framework

We consider noisy discrete-time linear systems as,

\displaystyle\mathbf{x}_{k+1}=A\mathbf{x}_{k}+\mathbf{\nu}_{k},\qquad k\geq 0,

(1)

where $\mathbf{x}_{k}\in\mathbb{R}^{n}$ is the system state and $\mathbf{\nu}_{k}=\mathcal{N}(0,Q)$ is the system noise at time-step $k$ . In this paper, the underlying system matrix $A$ is considered to be full-rank and examples of such full-rank systems are self-damped dynamical systems [48]. It should be noted that the LTI system (1) can be also obtained by the discretization of a continuous-time LTI system, based on Euler or Tustin discretization methods discussed in [48] and both of these methods result in a full-rank discrete-time LTI system. Moreover, the full-rank condition can be inherent to the system dynamics, such as in Nearly-Constant-Velocity (NCV) model for target dynamics in distributed tracking scenarios [49]. In [49], the target is modeled as a discrete-time dynamical system whose associated matrix is full-rank due to its non-zero diagonal entries. The system full-rank condition is also considered in the networked estimation literature as in [35, 36, 37].

The noise/fault-corrupted measurements of the system are taken by $N$ sensors,

\displaystyle\left(\begin{array}[]{c}\mathbf{y}^{1}_{k}\\ \vdots\\ \mathbf{y}^{N}_{k}\end{array}\right)

\displaystyle=\left(\begin{array}[]{c}C_{1}\\ \vdots\\ C_{N}\end{array}\right)\left(\begin{array}[]{c}\mathbf{x}^{1}_{k}\\ \vdots\\ \mathbf{x}^{n}_{k}\end{array}\right)+\left(\begin{array}[]{c}\mathbf{\zeta}^{1}_{k}\\ \vdots\\ \mathbf{\zeta}^{N}_{k}\end{array}\right)+\left(\begin{array}[]{c}\mathbf{f}^{1}_{k}\\ \vdots\\ \mathbf{f}^{N}_{k}\end{array}\right)

(17)

which can be written as the global measurement equation as follows:

\displaystyle\mathbf{y}_{k}

\displaystyle=

\displaystyle C\mathbf{x}_{k}+\mathbf{\zeta}_{k}+\mathbf{f}_{k},

(18)

where $\mathbf{y}_{k}\in\mathbb{R}^{N}$ , $\mathbf{\zeta}_{k}=\mathcal{N}(0,R)$ is the measurement noise, and $\mathbf{f}_{k}$ represents the sensor fault vector. It is assumed that $\mathbf{\zeta}_{k}$ and $\mathbf{\nu}_{k}$ are zero-mean Gaussian while noise, i.e. $\mathbb{E}(\nu_{k})=0$ , $\mathbb{E}(\zeta^{i}_{k})=0$ , for all $i,k$ , and $\mathbb{E}(\nu_{k}\nu_{m})=0$ , for all $k\neq m$ , and similarly, for the measurement noise. Further, without loss of generality, it is assumed that each sensor measures one of the system states.

The general problem in this paper is to design a stable networked estimation protocol in the absence of faults, and, then, a fault detection and isolation logic such that to detect possible faults in sensors. Given the system and measurements as in (1) and (18), a group of sensors is considered, each embedded with communication and computation equipment to measure a state of the dynamical system and process the measured data and information received from the other sensors to estimate the global state of the dynamical system.

Note that unlike many works in the literature [33, 34, 32], no assumption on the observability of system in the direct neighborhood of each sensor is considered which implies that the minimal connectivity on the communication network of agents is required. Each sensor adopts a single time-scale networked estimation protocol and by proper design of communication network structure and feedback gain matrix, each sensor tracks the system state by bounded steady-state estimation error in the fault-free case. In case of fault occurrence, i.e. $\mathbf{f}_{k}^{i}\neq 0$ at any sensor $i$ , a residual-based fault detection and isolation logic is proposed to detect the faulty sensor by comparing the sensor residuals with pre-specified thresholds. Note that unlike many work in the literature [43, 12, 13, 44, 45], it is not assumed that the system noise and/or measurement noise are upper-bounded. Finally, by detecting the faulty sensor, one may compensate for possible loss of observability in the distributed system by introducing a new sensor measurement replacing the faulty sensor.

III Networked Estimation Protocol

In this section, we present our main tool to perform diagnosis as a single time-scale networked estimation protocol and analyze its error stability criteria. The networked estimator is proposed based on collaborative consensus on the a-priori estimates at sensors. The estimator is based on two steps as follows,

	$\displaystyle\widehat{\mathbf{x}}^{i}_{k\|k-1}=$	$\displaystyle\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\widehat{\mathbf{x}}^{j}_{k-1\|k-1},$		(19)
	$\displaystyle\widehat{\mathbf{x}}^{i}_{k\|k}=$	$\displaystyle\widehat{\mathbf{x}}^{i}_{k\|k-1}+K^{i}C_{i}^{\top}\left(\mathbf{y}^{i}_{k}-C_{i}\widehat{\mathbf{x}}^{i}_{k\|k-1}\right),$		(20)

where $\widehat{\mathbf{x}}^{i}_{k|k-1}$ is the estimate of state $\mathbf{x}_{k}$ at sensor $i$ given all the measurements up to time $k-1$ , $\widehat{\mathbf{x}}^{i}_{k|k}$ is the estimated state given all the measurements up to time $k$ , the matrix $W$ is a stochastic matrix for consensus on a-priori estimates, and $K^{i}$ is the gain matrix at sensor $i$ . Note that the row-stochastic condition on $W$ is a necessary condition for (consensus) averaging of a-priori estimates. Recall that a matrix is row-stochastic if the summation of the entries in each row are equal to $1$ , i.e. $\sum_{j=1}^{N}W_{ij}=1$ . Further, $\mathcal{N}_{\beta}(i)$ defines the neighborhood of sensor $i$ over which the sensor shares a-priori estimates with the neighboring sensors and this neighborhood follows the structure of matrix $W$ . In fact $\mathcal{N}_{\beta}(i)=\{j|j\rightarrow i\}\cup\{i\}$ where $j\rightarrow i$ implies that sensor $j$ sends its information to $i$ and $\mathcal{N}_{\beta}(i)$ also includes the sensor $i$ itself, i.e. all the diagonal entries of the matrix $W$ are non-zero. This is due to the fact that each sensor uses its own estimate at previous time $\widehat{\mathbf{x}}^{i}_{k-1|k-1}$ to calculate its a-priori estimate $\widehat{\mathbf{x}}^{i}_{k|k-1}$ . It should be noted that the protocol (19)-(20) differs from our previous works [38, 39] in two aspects: (i) the protocol in [38, 39] has one more consensus step on the measurement fusion in which the neighboring measurements are shared over a different (hub-based) network, while the protocol (19)-(20) only includes one step of averaging on a-priori estimates. Therefore, the network connectivity condition in this paper is more relaxed as compared to [38, 39]; (ii) the distributed approach in [38, 39] is fault-free where the sensor measurements are not accompanied with additive faults and consequently, the error dynamics analysis is different from this work.

Note that the estimator (19)-(20) is single time-scale as compared to multi time-scale networked estimation given in [50, 23] where many step of averaging (or consensus) is performed between every two steps $k$ and $k+1$ of system dynamics (see Fig. 1).

Refer to caption — Figure 1: Single and multiple time-scale strategies for networked estimation: (Left) multi time-scale also known as average-consensus based approach, and (Right) single time-scale approach. As it is clear from the figure, in the former scenario many steps of averaging is performed between every two successive time-steps of system dynamics, while in the latter case only one step of averaging is done between every two successive steps of system dynamics. The former requires faster processing units, while the latter is more desirable in real-time applications.

It is known that the single time-scale approach is privileged over the multi time-scale method, since the latter requires a large number of information-exchange and communication over the sensor network between every two successive time-steps of the system dynamics. This implies that the communication time-scale needs to be much faster than the dynamics, which is not desirable and even may not be feasible for many large-scale real-time systems. To elaborate this, note that in protocol (19)-(20) only one step of averaging on a-priori estimates is done between two successive steps $k$ and $k+1$ , while the protocol in [50] requires many steps of consensus between steps $k$ and $k+1$ . In the multi time-scale estimation, the connectivity of the sensor network is more relaxed due to more information exchanges among the sensors, while in the single time-scale method, the sensor network requires more connectivity particularly for rank-deficient dynamical systems. In fact, it is proved that a SC network may not guarantee error stability for rank-deficient dynamical systems and certain hub-based network design is necessary, see [38, 39] for details.

Define the error $\mathbf{e}_{k}^{i}$ at time-step $k$ at sensor $i$ as,

$\displaystyle\mathbf{e}_{k}^{i}=$	$\displaystyle\mathbf{x}_{k\|k}-\widehat{\mathbf{x}}^{i}_{k\|k},$
$\displaystyle=$	$\displaystyle\mathbf{x}_{k}-\Bigl{(}\widehat{\mathbf{x}}^{i}_{k\|k-1}+K^{i}C_{i}^{\top}(\mathbf{y}^{i}_{k}-C_{i}\widehat{\mathbf{x}}^{i}_{k\|k-1})\Bigr{)}$
$\displaystyle=$	$\displaystyle\mathbf{x}_{k}-\Bigl{(}\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\widehat{\mathbf{x}}^{j}_{k-1\|k-1}$
	$\displaystyle+K^{i}C_{i}^{\top}\Bigl{(}\mathbf{y}^{i}_{k}-C_{i}\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\widehat{\mathbf{x}}^{j}_{k-1\|k-1}\Bigr{)}\Bigr{)}.$	(21)

By substituting (1)-(18) in above, it follows that:

	$\displaystyle\mathbf{e}_{k}^{i}=$	$\displaystyle A\mathbf{x}_{k-1}+\mathbf{\nu}_{k-1}-\Bigl{(}\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\widehat{\mathbf{x}}^{j}_{k-1\|k-1}$
		$\displaystyle+K^{i}C_{i}^{\top}\Bigl{(}C_{i}\mathbf{x}_{k}+\mathbf{\zeta}^{i}_{k}+\mathbf{f}^{i}_{k}$
		$\displaystyle-C_{i}\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\widehat{\mathbf{x}}^{j}_{k-1\|k-1}\Bigr{)}\Bigr{)}$
	$\displaystyle=$	$\displaystyle A\mathbf{x}_{k-1}+\mathbf{\nu}_{k-1}-\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\widehat{\mathbf{x}}^{j}_{k-1\|k-1}$
		$\displaystyle-K^{i}\Bigl{(}C_{i}^{\top}C_{i}(Ax_{k-1}+\mathbf{\nu}_{k-1})+C_{i}^{\top}\mathbf{\zeta}^{i}_{k}$
		$\displaystyle+C_{i}^{\top}\mathbf{f}^{i}_{k}-C_{i}^{\top}C_{i}\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\widehat{\mathbf{x}}^{j}_{k-1\|k-1}\Bigr{)}$

Recall that the row-stochastic condition of $W$ implies that $\sum_{j=1}^{N}W_{ij}=1$ . Note that for $j\in\mathcal{N}_{\beta}(i)$ , $W_{ij}\neq 0$ and for $j\notin\mathcal{N}_{\beta}(i)$ , $W_{ij}=0$ . Therefore, the row-stochastic condition can be re-written as $\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}=1$ . Based on this fact, it follows that (see [39] for similar analysis):

\displaystyle A\mathbf{x}_{k-1}=\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\mathbf{x}_{k-1},

and consequently:

	$\displaystyle\mathbf{e}_{k}^{i}$	$\displaystyle=\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\mathbf{x}_{k-1}-\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\widehat{\mathbf{x}}^{j}_{k-1\|k-1}$
		$\displaystyle-K^{i}C_{i}^{\top}C_{i}\Bigl{(}\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\mathbf{x}_{k-1}-\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\widehat{\mathbf{x}}^{j}_{k-1\|k-1}\Bigr{)}$
		$\displaystyle+\mathbf{\nu}_{k-1}-K^{i}C_{i}^{\top}\mathbf{\zeta}^{i}_{k}-K^{i}C_{i}^{\top}\mathbf{f}^{i}_{k}-K^{i}C_{i}^{\top}C_{i}\mathbf{\nu}_{k-1}$
		$\displaystyle=\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\mathbf{e}_{k-1}^{j}-K^{i}C_{i}^{\top}C_{i}\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\mathbf{e}^{j}_{k-1}+\mathbf{\eta}_{k}^{i},$

where $\mathbf{\eta}_{k}^{i}$ collects the noise terms and fault term as follows:

\displaystyle\mathbf{\eta}_{k}^{i}=\mathbf{\nu}_{k-1}-K^{i}\Bigl{(}C_{i}^{\top}\mathbf{\zeta}^{i}_{k}+C_{i}^{\top}\mathbf{f}^{i}_{k}+C_{i}^{\top}C_{i}\mathbf{\nu}_{k-1}\Bigr{)}.

(22)

The collective error at the group of sensors is defined as the concatenation of errors at all sensors, i.e. $\mathbf{e}_{k}=(\mathbf{e}_{k}^{1\top},\dots,\mathbf{e}_{k}^{N\top})^{\top}$ . For the collective error, it follows that:

\displaystyle\mathbf{e}_{k}=\widehat{A}\mathbf{e}_{k-1}+\mathbf{\eta}_{k},

(23)

where $\widehat{A}=W\otimes A-KD_{C}(W\otimes A)$ with $K=\mbox{blockdiag}(K^{i})$ and $D_{C}=\mbox{blockdiag}(C_{i}^{\top}C_{i})$ . The collective noise vector is given as:

\displaystyle\mathbf{\eta}_{k}

\displaystyle=\mathbf{1}_{N}\otimes\mathbf{\nu}_{k-1}-KD_{C}(\mathbf{1}_{N}\otimes\mathbf{\nu}_{k-1})-K\overline{D}_{C}\mathbf{\zeta}_{k}-K\overline{D}_{C}\mathbf{f}_{k},

(24)

where $\mathbf{1}_{N}$ is the vector of 1s of size $N$ and $\overline{D}_{C}=\mbox{blockdiag}(C_{i}^{\top})$ .

To stabilize the error dynamics (23), it is necessary that the pair $(W\otimes A,D_{C})$ to be observable [51]. Note that the observability analysis of Kronecker product of matrices (and the associated composite graph) is discussed in details in [52], where it is proved that for observability of $(W\otimes A,D_{C})$ , the matrix $W$ needs to be irreducible. This implies that the sensor network needs to be Strongly-Connected (SC). In fact, this is also discussed in [38], and since the protocol (19)-(20) is a variant of the protocol in [38], the same analysis can be adopted here to prove the irreducibility of $W$ matrix. Therefore, any irreducible matrix (associated with a SC network) with non-zero diagonal entries while satisfying the row-stochastic condition can work as $W$ matrix. The SC connectivity of the sensor network is also considered in similar networked estimation literature, see for example [35, 36, 37]. Note that the $(W\otimes A,D_{C})$ -observability is also referred as networked observability (or distributed observability) condition. As it is explained in the next section, having $(W\otimes A,D_{C})$ -observability satisfied, using Linear-Matrix-Inequalities (LMI), the block-diagonal gain matrix $K$ can be designed such that $\widehat{A}$ is a Schur matrix [38, 53, 54], i.e. $\rho(\widehat{A})<1$ where $\rho$ defines the spectral radius of a matrix.

IV Design of Block-Diagonal Feedback Gain based on LMI Approach

In this section, we discuss the methodology for computation of a block-diagonal estimator gain $K$ . Note that, the observability of $(W\otimes A,D_{C})$ in the distributed estimator (19)-(20) guarantees the existence of a full matrix $K$ , such that $\rho(W\otimes A-KD_{C}(W\otimes A))<1$ . However, for having a distributed approach, the gain matrix is required to be block-diagonal. Such $K$ is known to be the solution of an LMI as follows:

\displaystyle~{}~{}\left(\begin{array}[]{cc}X&\widehat{A}^{\top}X\\ X\widehat{A}&X\\ \end{array}\right)\succ 0\Rightarrow\left(\begin{array}[]{cc}X&\widehat{A}^{\top}\\ \widehat{A}&Y\\ \end{array}\right)\succ 0,

(25)

where $\widehat{A}=W\otimes A-KD_{C}(W\otimes A)$ and $X\succ 0$ with $\succ$ denoting positive definiteness. Note that the left-hand-side of the above equation is nonlinear in $K$ ; however its equivalent solution is proposed in the literature [55, 56] as the right-hand-side in (25) with $X=Y^{-1}$ . Now, having the above LMI to be linear in $K$ , we note that the constraint $X=Y^{-1}$ is a non-convex constraint. However, this constraint can be approximated with a linear function of the matrices $X,Y\succ 0$ , satisfy $X=Y^{-1}$ as the optimal point of the following optimization problem [54]:

$\displaystyle\min{}{}$	$\displaystyle\mathbf{trace}(XY)$	(26)
s.t.	$\displaystyle\left(\begin{array}[]{cc}X&I\\ I&Y\\ \end{array}\right)\succ 0,$
	$\displaystyle K\mbox{~{}is~{}block-diagonal}.$

with $X,Y\succ 0$ . Overall, the optimization is summarized as follows; with $(W\otimes A,D_{C})$ observability, the gain matrix $K$ is the solution to the following:

$\displaystyle\min{}{}$	$\displaystyle\mathbf{trace}(XY)$	(27)
s.t.	$\displaystyle X,Y\succ 0,$
	$\displaystyle\left(\begin{array}[]{cc}X&\widehat{A}^{\top}\\ \widehat{A}&Y\\ \end{array}\right)\succ 0,$
	$\displaystyle\left(\begin{array}[]{cc}X&I\\ I&Y\\ \end{array}\right)\succ 0,$
	$\displaystyle K\mbox{~{}is~{}block-diagonal}.$

It should be mentioned that a solution of the second LMI is equivalent to $X=Y^{-1}$ , which results in an optimal value for minimum trace as $nN$ . Furthermore, $\mathbf{trace}(XY)$ can be replaced with the linear approximation $\mathbf{trace}(Y_{0}X+X_{0}Y)/2$ [54], and the iterative Algorithm 1 can be used to minimize this problem under given constraints.

Given:

A,W,D_{C}

Result: Gain matrix

K

Find feasible points

X_{0},Y_{0},K

;

while $\rho(\widehat{A})>1$ do

minimize

\mathbf{trace}(Y_{t}X+X_{t}Y)

under the constraints given in equation (27) and find

X,Y,K

;

Y_{t+1}=Y

;

X_{t+1}=X

;

t=t+1

;

end while

Algorithm 1 Iterative calculation of block-diagonal gain

K

In [54], it is shown that $\mathbf{trace}(Y_{t}X+X_{t}Y)$ is non-increasing and converges to $2nN$ . In this regard, a stopping criterion of the algorithm is established as reaching within $2nN+\epsilon$ of the trace objective. Interested readers may refer to [53, 55, 56, 54, 57] for more details. It should be noted that this algorithm (and in general similar cone-complementarity algorithms) are of polynomial-order complexity, see [58, 59].

V Sensor Fault Detection and Isolation

In this section, we present our results on the development of sensor fault detection and isolation scheme for the considered system. Following the terminology in [60], given the estimated state $\widehat{\mathbf{x}}_{k|k}^{i}$ , define the estimated output at sensor $i$ as $\widehat{\mathbf{y}}_{k}^{i}=C_{i}\widehat{\mathbf{x}}_{k|k}^{i}$ . Note that for fault-free case $\mathbf{y}_{k}^{i}-\widehat{\mathbf{y}}_{k}^{i}$ is steady-state stable and bounded by proper design of $W$ and $K$ matrices. Thus, define the residual signal at sensor $i$ at time-step $k$ as,

	$\displaystyle r_{k}^{i}=$	$\displaystyle\|\mathbf{y}_{k}^{i}-\widehat{\mathbf{y}}_{k}^{i}\|=\|\mathbf{y}_{k}^{i}-C_{i}\widehat{\mathbf{x}}_{k\|k}^{i}\|=\|C_{i}\mathbf{e}_{k}^{i}+\mathbf{\zeta}^{i}_{k}+\mathbf{f}^{i}_{k}\|$
	$\displaystyle=$	$\displaystyle\|C_{i}\widehat{A}_{i}\mathbf{e}_{k-1}+C_{i}\mathbf{\eta}_{k}^{i}+\mathbf{\zeta}^{i}_{k}+\mathbf{f}^{i}_{k}\|$		(28)

where $\widehat{A}_{i}$ is the $i$ th hyper-row of the matrix $\widehat{A}$ defined as the block of rows from row $(i-1)n+1$ to row $in$ . Moreover, $C_{i}\mathbf{\eta}_{k}^{i}$ can be written as:

	$\displaystyle C_{i}\mathbf{\eta}_{k}^{i}=$	$\displaystyle C_{i}\mathbf{\nu}_{k-1}-C_{i}K^{i}C_{i}^{\top}\mathbf{\zeta}^{i}_{k}-C_{i}K^{i}C_{i}^{\top}\mathbf{f}^{i}_{k}$
		$\displaystyle-C_{i}K^{i}C_{i}^{\top}C_{i}\mathbf{\nu}_{k-1}.$		(29)

Assume that sensor $i$ is faulty at some time-interval, i.e. $\mathbf{f}_{k}^{i}\neq 0$ for $k_{1}<k$ . Note that, considering (28) and (29), the estimation error is now affected by the sensor fault. In this case for the residual at sensor $i$ , the term $C_{i}K^{i}C_{i}^{\top}\mathbf{f}^{i}_{k}\neq 0$ and $\mathbf{f}^{i}_{k}\neq 0$ in (28) and (29), while for other sensors $C_{j}K^{j}C_{j}^{\top}\mathbf{f}^{j}_{k}=0$ . Note that among many possible choices for the gain matrix $K$ , it can be designed such that the term $C_{i}K^{i}C_{i}^{\top}$ is large enough to make $r_{k}^{i}$ more affected by the fault $\mathbf{f}^{i}_{k}$ . Thus, although $\mathbf{f}^{i}_{k}$ may also appear in $\eta_{k-1}$ and $\mathbf{e}_{k-1}$ , since $\rho(\widehat{A})<1$ the term $C_{i}\widehat{A}_{i}\mathbf{e}_{k-1}$ in (28) remains small and negligible as compared to larger values of $\mathbf{f}^{i}_{k}$ and $C_{i}K^{i}C_{i}^{\top}\mathbf{f}^{i}_{k}$ . Therefore, by Schur stability of matrix $\widehat{A}$ and proper choice of the gain matrix $K$ , the residual at faulty sensor $i$ is more affected while for the other sensors $j\neq i$ , the residuals are less affected since $\mathbf{f}^{j}_{k}=0$ . The fault detection and isolation logic is therefore as follows: if the residual $r_{k}^{i}$ at sensor $i$ exceeds a predefined threshold then sensor $i$ is faulty. Thus, we need to define the thresholds on residuals for fault detection and isolation. In this direction, considering Gaussian noise $\mathcal{N}(0,Q)$ for system dynamics and $\mathcal{N}(0,R)$ for measurement noise, first the variance of the estimation error $\mathbf{e}_{k}^{i}$ is obtained and consequently one can define a threshold such that to detect faults whenever the residual exceeds this threshold.

Let $P_{k}=\mathbb{E}(\mathbf{e}_{k}\mathbf{e}_{k}^{\top})$ and $\Sigma=\mathbb{E}(\eta_{k}\eta^{\top}_{k})$ . Then, it follows that:

	$\displaystyle P_{k}=$	$\displaystyle\widehat{A}P_{k-1}\widehat{A}^{\top}+\Sigma$
	$\displaystyle=$	$\displaystyle\widehat{A}^{k}P_{0}(\widehat{A}^{\top})^{k}+\sum_{j=0}^{k-1}\widehat{A}^{j}\Sigma(\widehat{A}^{\top})^{j}.$		(30)

Recall that $\rho(\widehat{A})<1$ , in the steady-state we have,

\displaystyle P_{\infty}=\lim_{k\rightarrow\infty}P_{k}=\sum_{j=0}^{\infty}\widehat{A}^{j}\Sigma(\widehat{A}^{\top})^{j}.

Let $b=\|\widehat{A}\|_{2}<1$ , then using the result of [61] it can be proved that,

\displaystyle\|P_{\infty}\|_{2}\leq\frac{\|\Sigma\|_{2}}{1-b^{2}}.

(31)

On the other hand, for the fault-free case,

	$\displaystyle\eta_{k}\eta^{\top}_{k}=$	$\displaystyle(I_{Nn}-KD_{C})(\mathbf{1}_{NN}\otimes\nu_{k-1}\nu_{k-1}^{\top})(I_{Nn}-KD_{C})^{\top}$
		$\displaystyle+(K\overline{D}_{C})\zeta_{k}\zeta_{k}^{\top}(K\overline{D}_{C})^{\top}$		(32)

where $\mathbf{1}_{NN}$ is $N$ by $N$ matrix of $1$ s. Then,

	$\displaystyle\Sigma=$	$\displaystyle(I_{Nn}-KD_{C})(\mathbf{1}_{NN}\otimes Q)(I_{Nn}-KD_{C})^{\top}$
		$\displaystyle+(K\overline{D}_{C})R(K\overline{D}_{C})^{\top}$		(33)

The $2$ -norm of $\Sigma$ is upper-bounded by,

$\displaystyle\\|\Sigma\\|_{2}\leq$	$\displaystyle\\|(I_{Nn}-KD_{C})(\mathbf{1}_{NN}\otimes Q)(I_{Nn}-KD_{C})^{\top}\\|_{2}$
	$\displaystyle+\\|(K\overline{D}_{C})R(K\overline{D}_{C})^{\top}\\|_{2}$
$\displaystyle\leq$	$\displaystyle\\|I_{Nn}-KD_{C}\\|_{2}^{2}N\\|Q\\|_{2}+\\|K\\|_{2}^{2}\\|\overline{R}\\|_{2}$	(34)

where $\overline{R}=\mbox{blockdiag}(C_{i}^{\top}R_{i}C_{i})$ . Let $\|I_{Nn}-KD_{C}\|_{2}^{2}=\alpha_{1}$ , $\|K\|_{2}^{2}=\alpha_{2}$ , and $\|\overline{R}\|_{2}=\beta\|R\|_{2}$ . Then, from (31) and scaling the error covariance by $N$ (number of sensors),

\displaystyle\frac{\|P_{\infty}\|_{2}}{N}\leq\frac{\alpha_{1}N\|Q\|_{2}+\alpha_{2}\beta\|R\|_{2}}{N(1-b^{2})}=\Phi

(35)

In fact, equation (35) gives an upper-bound on the variance of estimation error at each sensor $\mathbf{e}_{k}^{i}$ . Following the Gaussianity of estimation error, one can claim that with probability more than $68\%$ the estimation error lies within $|\mathbf{e}_{k}^{i}-\mathbb{E}(\mathbf{e}_{k}^{i})|<\Phi$ . It can be proved that in the steady state, the fault-free estimation error is unbiased, i.e. $\mathbb{E}(\mathbf{e}_{k}^{i})=0$ [61]. Therefore, one can claim that with probability more than $68\%$ we have $\mathbf{r}_{k}^{i}=|C_{i}\mathbf{e}_{k}^{i}+\mathbf{\zeta}^{i}_{k}|<c\Phi+R$ where the constant $c$ is the absolute value of the measurement vector (with the assumption of having one state measurement by every sensor, $c$ is the absolute value of the nonzero entry of $C_{i}$ ). Similarly, with probability more than $95\%$ , the residual lies below $2c\Phi+2R$ , i.e. $\mathbf{r}_{k}^{i}<2c\phi+2R$ , and with probability more than $99\%$ we have $\mathbf{r}_{k}^{i}<3c\Phi+3R$ , etc. Therefore, in the presence of possible sensor faults, one can detect and isolate sensor faults by comparing thresholds with the residuals based on these probabilities. In other words, considering $T_{68\%}=c\Phi+R$ as threshold, one can claim fault detection with probability $68\%$ whenever the residual goes over this threshold. In this case, the probability of false alarm is less than $32\%$ . Similarly, stronger thresholds for fault detection and isolation can be defined as $T_{95\%}=2c\Phi+2R$ , $T_{99\%}=3c\Phi+3R$ , etc.

VI Sensor Replacement for Observability Recovery

The measurement of faulty sensors may be compensated in terms of observability recovery and in this direction, the concept of observational equivalence is a relevant term [46, 47]. The faulty sensor measurement can be replaced by a measurement of an equivalent state to recover the loss of observability. Note that in this paper, to avoid trivial case, it is assumed that the fault is inherent with the measurement of the specific state measured by the faulty sensor. This could be due to, for example, environmental condition of the sensor location resulting to the fault/anomaly at the sensor.

Extending the results of [62] to dual case of network observability, the set of necessary sensor measurements can be found over digraph representation of the dynamical system. Note that the system digraph, denoted by $\mathcal{G}$ , is defined as the graph associated with the system matrix $A$ , or the structured system matrix $\mathcal{A}$ representing the zero-nonzero pattern of $A$ . In $\mathcal{G}$ every state is represented by a node and every non-zero entry $A_{ij}$ (or $\mathcal{A}_{ij}$ ) is represented by a link from node $i$ to node $j$ . It is known that many generic properties of the system, including observability and controllability, can be defined over this graph [63]. As an example, consider the following structured system matrix,

\displaystyle\mathcal{A}=\left(\begin{array}[]{cccccccccccc}*&0&0&0&0&0&0&0&0&0&0&0\\ &*&*&0&0&0&0&0&*&0&0&0\\ 0&0&0&*&0&0&0&0&0&0&0&0\\ 0&0&*&*&0&0&0&0&0&0&0&0\\ &0&0&0&*&0&*&0&0&0&0&0\\ 0&0&0&0&0&0&*&*&0&0&0&0\\ 0&0&0&0&0&*&0&0&0&0&0&0\\ 0&0&0&0&0&*&*&*&0&0&0&0\\ 0&0&0&0&0&*&0&0&*&*&*&0\\ 0&0&0&*&0&0&0&0&*&0&0&*\\ 0&0&0&0&0&0&0&0&0&0&0&*\\ 0&0&0&0&0&0&0&0&0&0&*&0\end{array}\right)

(48)

where $*$ represents a non-zero entry and as an example a non-zero entry in $\mathcal{A}_{51}$ implies a link from node $5$ to node $1$ in $\mathcal{G}$ . The system digraph $\mathcal{G}$ associated to (48) is shown in Fig. 2.

In the system digraph, $\mathcal{G}$ , a Strongly-Connected-Component (SCC) denoted by $\mathcal{S}$ is defined as a subgraph in which there is a path from every node to every other node. Among the SCCs, define a parent SCC, denoted by $\mathcal{S}^{p}$ , as the SCC with no outgoing link to other SCCs. In Fig. 2, the SCCs and parent SCCs associated with the structured system matrix (48) are shown. As it can be seen from this figure, the SCC may only include a single node.

It can be proved that the measurement of at least one node in every parent SCC is necessary for system observability [62], while all state nodes in the same parent SCC are observationally equivalent. This implies that the measurements of all nodes in the parent SCC equivalently recover the observability of the system digraph, $\mathcal{G}$ .

In this direction, our proposed logic for sensor replacement is as follows: if the faulty sensor measures a state in a parent SCC $\mathcal{S}^{p}_{i}$ , one can compensate the loss of observability by adding a new sensor measurement of another state node in $\mathcal{S}^{p}_{i}$ . Otherwise, if the faulty sensor measures a state in no parent SCC, the faulty sensor can be removed with no affect on the estimation performance of the other sensors. For example, in Fig. 2 if the measurement of state $8$ is faulty and is removed, measurement of either states $6$ or $7$ may recover the loss of observability. Note that if the faulty sensor does not measure a necessary system state (a state in a parent SCC), its removal has no affect on distributed observability, and only the communication network needs to be restructured to satisfy the strong connectivity of the sensor network as mentioned in Section III. For example, in Fig. 2 a faulty sensor measuring any state in $\{2,5,9,10\}$ can be removed without loss of system observability. It should be noted that if the parent SCC includes a single node (also referred as parent node), there is no replacement for the faulty measurement. This implies that the parent nodes are more vulnerable to attacks and faults in terms of recovering system observability. For example, this is the case for the measurement of state $1$ in Fig. 2.

VII Simulations

VII-A An Illustrative Example

Consider a linear dynamical system with the structure given by the system digraph in Fig. 3.

The full-rank system of $12$ states with associated structured matrix (48) is considered to be tracked by a network of $4$ sensors. The non-zero entries of $\mathcal{A}$ are chosen such that $\rho(A)=1.2>1$ implying an unstable system dynamics. The system and measurement noise are $\mathcal{N}(0,0.04)$ . The measured states are represented by green nodes. Given the set of measurements as in Fig. 3, it can be checked that one state node in every parent SCC $\mathcal{S}_{1}^{p}=\{1\}$ , $\mathcal{S}_{2}^{p}=\{3,4\}$ , $\mathcal{S}_{3}^{p}=\{6,7,8\}$ , $\mathcal{S}_{4}^{p}=\{11,12\}$ is measured and therefore the pair $(A,C)$ is observable. The sensors estimate the system state over time using the networked estimator (19)-(20). The sensors share their a-priori estimates over the SC communication network (or sensor network) shown in Fig. 3. This network represents the structure of $W$ matrix while the entries of $W$ are chosen randomly such that the stochasticity of $W$ is satisfied. Having $W$ to be irreducible, for the networked system the conditions for networked observability (or distributed observability) are hold [42]. Therefore, one can design the proper gain matrix $K$ using the LMI procedure described in Section IV. Applying this block-diagonal $K$ matrix, $\rho(\widehat{A})<1$ and therefore all sensor errors are stable.

Next, it is assumed that the sensor $3$ is faulty after time-step $25$ and sensor $4$ is faulty after time-step $40$ , i.e. $\mathbf{f}^{3}_{k\geq 25}=0.6$ , $\mathbf{f}^{4}_{k\geq 40}=0.4$ . The residuals corresponding to this fault scenario are shown in Fig. 4.

As shown in this figure, the faulty sensors can be detected and isolated once the residuals corresponding to the faulty sensor exceed the thresholds. For this simulation, we have $b=\|\widehat{A}\|_{2}=0.63$ , $\alpha_{1}=\|I_{48}-KD_{C}\|_{2}^{2}=3.83$ , $\alpha_{2}=\|K\|_{2}^{2}=3.84$ , $|C_{3}K^{3}C_{3}^{\top}|=3.01$ , and $|C_{4}K^{4}C_{4}^{\top}|=3.05$ . Further, the non-zero entries of the measurement matrix $C$ are considered to be equal to $1$ , and therefore $\beta=1$ and $c=1$ . Then, using equation (35), it follows that $\Phi=0.31$ and the thresholds are,

\displaystyle T_{68\%}=0.35,~{}~{}T_{95\%}=0.70,~{}~{}T_{99\%}=1.05.

(49)

Based on these thresholds, one can claim the detection of fault at sensor $3$ with probability more than $99\%$ and fault at sensor $4$ with probability more than $95\%$ . The probability of false alarms are less than $1\%$ and less than $5\%$ , respectively.

Next, one can compensate for measurement of the faulty sensors by replacing observationally equivalent state measurements from $\mathcal{S}_{3}^{p}=\{6,7,8\}$ and $\mathcal{S}_{4}^{p}=\{11,12\}$ . In Fig. 5, the set of states which are observationally equivalent to state $12$ (measured by sensor $3$ ) and to state $8$ (measured by sensor $4$ ) are, respectively, shown in blue and brown colors.

Replacing the faulty sensors $3$ and $4$ with sensors $3B$ and $4B$ measuring observationally equivalent states $11$ and $7$ , the loss of observability for estimation procedure is recovered. Note that we assume communication links for sensors $3B$ and $4B$ to be the same as sensor $3$ and $4$ . The Mean-Squared-Estimation-Error (MSEE) at all sensors in the compensated framework is shown in Fig. 6. As it is clear the MSEE is bounded steady-state stable at all sensors.

As compared to simulations in [3, 4, 19] where the underlying dynamical system is stable, the example in this section provides a distributed estimation and fault detection of an unstable system. As compared to [43, 12, 13, 44, 45, 10], in this example the noise follows Gaussian distribution and no bound on the noise is assumed. Moreover, in [7, 18, 11, 25], no system and/or measurement noise is considered. As compared to the distributed estimation in [33, 34, 32] which requires a complete all-to-all network of sensors (i.e. for this example every sensor is directly connected to all other $3$ sensors), our networked estimator only requires an SC sensor network. Among the related literature, we provide a comparison with a recent work in the next subsection.

VII-B A Comparison Study

Here, our proposed sensor fault detection, isolation, and distributed estimation protocol is compared with the recent work [23]. In [23], a multi time-scale resilient distributed estimation strategy is proposed along with attack/fault detection. The reason for selecting this work for comparison is that its assumptions and framework is similar to our proposed strategy in the following aspects: (i) it considers the underlying system dynamics to be unstable; (ii) the estimation and attack detection scheme is not centralized but distributed over a sensor network; (iii) it assumes a connected undirected network of sensors, while similarly we assume a SC network; and (iv) an additive term is considered in the presence of noise on the sensors representing possible biasing attacks/faults at sensors similar to our assumption. Note that in [23], each sensor performs $L$ steps of (consensus) averaging over the sensor network between every two steps $k-1$ and $k$ of system dynamics and it is claimed that the proposed protocol is resilient to sensor faults of certain magnitude, while it is capable of detecting and isolating the attacked sensors in certain conditions. They particularly state that under certain conditions the attacked sensor can be detected while some other attacks cannot be detected (remain stealthy), and therefore two sets of detectable attacking set (DAS) and undetectable (or stealthy) attacking set (UAS) are defined. The proposed strategy in [23] is simulated over the same dynamic system and sensor network in Fig. 3 and all the conditions including initial state values, noise values, etc are considered similar to the previous subsection, while the sensor network is reconsidered as an undirected cycle $1\leftrightarrow 2\leftrightarrow 3\leftrightarrow 4\leftrightarrow 1$ . The parameters for the distributed estimation protocol are as in Table I.

TABLE I: Prameter values for resilient distributed estimation and attack detection protocol in [23].

$L$	$20$	$\alpha$	$0.2$	$\beta$	$0.5$
$\\|A\\|$	$1.37$	$\gamma$	$0.21$	$N$	$4$
$s$	$2$	$b_{w}$	$0.1$	$b_{v}$	$0.1$
$\lambda_{0}$	$1.9$	$\eta_{0}$	$0.1$	$\rho_{t_{0}}$	$0.1$

Under these parameters the MSEE of $4$ sensors are shown in Fig. 7. Note that in this simulation, every sensor performs $L=20$ steps of consensus on estimations of neighboring sensors as compared to $1$ step consensus in protocol (19)-(20). This requires a processing setup $20$ times faster than our proposed networked estimation setup.

The sensor attack detection logic in [23] is as follows: if the measurement update $\mathbf{y}^{i}_{k}-C_{i}A\widehat{\mathbf{x}}^{i}_{k-1}$ is larger than a pre-defined threshold $\Phi_{k}$ , then the attack at sensor $i$ is detected, otherwise the sensor is either attack-free or the attack remains undetected (stealthy). Running the proposed attack detection in [23] based on the parameters in Table I, the measurement updates and the threshold $\Phi_{k}$ are shown in Fig. 8. As it is clear from this figure, both attacks/faults at sensors $3$ and $4$ are not large enough to be detected and both remain stealthy. However, as shown in Fig. 4, our proposed approach can detect and isolate both faults.

VIII Concluding Remarks

In this paper, a fault detection and isolation scenario for a networked estimator is presented by defining the sensor residuals and probability-based thresholds to detect and isolate sensor faults. The thresholds are defined based on the upper-bound on the norm of the estimation error covariance. It should be emphasized that the FDI and distributed estimation strategy in this paper is of polynomial order computational complexity. In fact, the protocol (19)-(20) as a variant of the protocol in [39] is of P-order complexity and according to [58, 59], Algorithm 1 and similar cone-complementarity algorithms are of P-order complexity, implying that the design of block-diagonal gain matrix in Section IV is of P-order complexity. For fault detection and mitigation strategy, the decomposition of the system digraph into SCCs, determining their partial order, and finding parent SCCs are based on the Depth-First-Search (DFS) algorithm [64] with complexity $\mathcal{O}(n^{2})$ . The computational complexity of the threshold design in (35) based on the 2-norm of the covariance is $\mathcal{O}(n^{3})$ . Therefore, the overall strategy in this paper is of P-order complexity and hence scalable to large-scale applications. Note that although the simulation in Section VII is given for a small-scale example system, the P-order complexity guarantees the scalability of the proposed scheme to large-scale systems.

It is worth noting that the fault detection and isolation can be improved by defining tighter upper-bounds on the residuals. This can be done by designing the gain matrix $K$ to reduce $\|I_{Nn}-KD_{C}\|_{2}^{2}$ , and $\|K\|_{2}^{2}$ . The optimal LMI approach to optimize the gain $K$ to satisfy such conditions is the direction of our future research. A promising topic of future interest is optimal design of the communication network (the structure of the matrix $W$ ) and cost-optimal selection of measurements for observability recovery to reduce the estimation costs at sensor network. This may include reducing both the sensor embedding costs and communication costs in sensor networks. As another direction of future research we are currently working to extend this fault detection and isolation as well as fault compensation scenario to general rank-deficient systems. In this direction, the networked estimation protocol needs to be updated to consider measurement sharing over the sensor network.

Acknowledgement

We would like to thank Prof. Usman A. Khan from Tufts University for his helpful comments and suggestions on this paper. We also thank the authors of reference [23] for providing their paper and related files.

References

[1] I. Hwang, S. Kim, Y. Kim, and C. E. Seah, “A survey of fault detection, isolation, and reconfiguration methods,” IEEE Transactions on Control Systems Technology, vol. 18, no. 3, pp. 636–653, 2009.
[2] M. Davoodi, N. Meskin, and K. Khorasani, “Simultaneous fault detection, isolation and control tracking design using a single observer-based module,” in American Control Conference. IEEE, 2014, pp. 3047–3052.
[3] S. Jee, H. J. Lee, and Y. H. Joo, “H-/H $\infty$ sensor fault detection and isolation in linear time-invariant systems,” International Journal of Control, Automation and Systems, vol. 10, no. 4, pp. 841–848, 2012.
[4] Z. Li and I. M. Jaimoukha, “Observer-based fault detection and isolation filter design for linear time-invariant systems,” International Journal of Control, vol. 82, no. 1, pp. 171–182, 2009.
[5] R. M. G. Ferrari, T. Parisini, and M. M. Polycarpou, “Distributed fault detection and isolation of large-scale discrete-time nonlinear systems: An adaptive approximation approach,” IEEE Transactions on Automatic Control, vol. 57, no. 2, pp. 275–290, 2011.
[6] M. Davoodi, K. Khorasani, H. A. Talebi, and H. R. Momeni, “Distributed fault detection and isolation filter design for a network of heterogeneous multiagent systems,” IEEE Transactions on Control Systems Technology, vol. 22, no. 3, pp. 1061–1069, 2013.
[7] A. Teixeira, I. Shames, H. Sandberg, and K. H. Johansson, “Distributed fault detection and isolation resilient to network model uncertainties,” IEEE transactions on cybernetics, vol. 44, no. 11, pp. 2024–2037, 2014.
[8] I. Shames, A. Teixeira, H. Sandberg, and K. H. Johansson, “Distributed fault detection for interconnected second-order systems,” Automatica, vol. 47, no. 12, pp. 2757–2764, 2011.
[9] Y. Quan, W. Chen, Z. Wu, and L. Peng, “Observer-based distributed fault detection and isolation for heterogeneous discrete-time multi-agent systems with disturbances,” IEEE Access, vol. 4, pp. 4652–4658, 2016.
[10] A. Marino, F. Pierri, and F. Arrichiello, “Distributed fault detection isolation and accommodation for homogeneous networked discrete-time linear systems,” IEEE Transactions on Automatic Control, vol. 62, no. 9, pp. 4840–4847, 2017.
[11] Y. Li, H. Fang, J. Chen, and C. Shang, “Distributed fault detection and isolation for multi-agent systems using relative information,” in American Control Conference (ACC). IEEE, 2016, pp. 5939–5944.
[12] M. S. Chong, M. Wakaiki, and J. P. Hespanha, “Observability of linear systems under adversarial attacks,” in American Control Conference (ACC). IEEE, 2015, pp. 2439–2444.
[13] C. Lee, H. Shim, and Y. Eun, “Secure and robust state estimation under sensor attacks, measurement noises, and process disturbances: Observer-based combinatorial approach,” in European Control Conference (ECC). IEEE, 2015, pp. 1872–1877.
[14] X. Ren, Y. Mo, J. Chen, and K. H. Johansson, “Secure state estimation with byzantine sensors: A probabilistic approach,” IEEE Transactions on Automatic Control, 2020.
[15] M. Pajic, I. Lee, and G. J. Pappas, “Attack-resilient state estimation for noisy dynamical systems,” IEEE Transactions on Control of Network Systems, vol. 4, no. 1, pp. 82–92, 2016.
[16] Y. Shoukry, P. Nuzzo, A. Puggelli, A. L. Sangiovanni-Vincentelli, S. A. Seshia, and P. Tabuada, “Secure state estimation for cyber-physical systems under sensor attacks: A satisfiability modulo theory approach,” IEEE Transactions on Automatic Control, vol. 62, no. 10, pp. 4917–4932, 2017.
[17] M. Pajic, J. Weimer, N. Bezzo, P. Tabuada, O. Sokolsky, I. Lee, and G. J. Pappas, “Robustness of attack-resilient state estimators,” in ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS). IEEE, 2014, pp. 163–174.
[18] Y. Chen, S. Kar, and J. M. F. Moura, “Dynamic attack detection in cyber-physical systems with side initial state information,” IEEE Transactions on Automatic Control, vol. 62, no. 9, pp. 4618–4624, 2016.
[19] M. Deghat, V. Ugrinovskii, I. Shames, and C. Langbort, “Detection and mitigation of biasing attacks on distributed estimation networks,” Automatica, vol. 99, pp. 369–381, 2019.
[20] Y. Guan and X. Ge, “Distributed attack detection and secure estimation of networked cyber-physical systems against false data injection attacks and jamming attacks,” IEEE Transactions on Signal and Information Processing over Networks, vol. 4, no. 1, pp. 48–59, 2017.
[21] L. Su and S. Shahrampour, “Finite-time guarantees for byzantine-resilient distributed state estimation with noisy measurements,” IEEE Transactions on Automatic Control, 2019.
[22] L. An and G. Yang, “Distributed secure state estimation for cyber-physical systems under sensor attacks,” Automatica, vol. 107, pp. 526–538, 2019.
[23] X. He, X. Ren, H. Sandberg, and K. H. Johansson, “How to secure distributed filters under sensor attacks?” arXiv preprint arXiv:2004.05409, 2020.
[24] W. Ao, Y. Song, and C. Wen, “Distributed secure state estimation and control for cpss under sensor attacks,” IEEE transactions on cybernetics, vol. 50, no. 1, pp. 259–269, 2018.
[25] A. Mitra and S. Sundaram, “Secure distributed state estimation of an lti system over time-varying networks and analog erasure channels,” in American Control Conference. IEEE, 2018, pp. 6578–6583.
[26] A. Alanwar, H. Said, and M. Althoff, “Distributed secure state estimation using diffusion kalman filters and reachability analysis,” in IEEE 58th Conference on Decision and Control (CDC). IEEE, 2019, pp. 4133–4139.
[27] Y. Chen, S. Kar, and J. M. F. Moura, “Resilient distributed estimation through adversary detection,” IEEE Transactions on Signal Processing, vol. 66, no. 9, pp. 2455–2469, 2018.
[28] K. Manandhar, X. Cao, F. Hu, and Y. Liu, “Detection of faults and attacks including false data injection attack in smart grid using kalman filter,” IEEE Transactions on Control of Network Systems, vol. 1, no. 4, pp. 370–379, 2014.
[29] X. Wei and M. Verhaegen, “Fault detection of large scale wind turbine systems: A mixed ${H}_{\infty}$ / ${H}-$ index observer approach,” in 16th Mediterranean Conference on Control and Automation. IEEE, 2008, pp. 1675–1680.
[30] N. Meskin and K. Khorasani, Fault detection and isolation: Multi-vehicle unmanned systems. Springer Science & Business Media, 2011.
[31] F. Garin and L. Schenato, “A survey on distributed estimation and control applications using linear consensus algorithms,” in Networked control systems. Springer, 2010, pp. 75–107.
[32] S. M. Azizi and K. Khorasani, “Networked estimation of relative sensing multiagent systems using reconfigurable sets of subobservers,” IEEE Transactions on Control Systems Technology, vol. 22, no. 6, pp. 2188–2204, 2014.
[33] T. Boukhobza, F. Hamelin, S. Martinez-Martinez, and D. Sauter, “Structural analysis of the partial state and input observability for structured linear systems: Application to distributed systems,” European Journal of Control, vol. 15, no. 5, pp. 503–516, Oct. 2009.
[34] S. Kar, J. M. F. Moura, and K. Ramanan, “Distributed parameter estimation in sensor networks: Nonlinear observation models and imperfect communication,” IEEE Transactions on Information Theory, vol. 58, no. 6, pp. 3575–3605, 2012.
[35] S. Tu and A. Sayed, “Diffusion strategies outperform consensus strategies for distributed estimation over adaptive networks,” IEEE Transactions on Signal Processing,, vol. 60, no. 12, pp. 6217–6234, 2012.
[36] G. Battistelli, L. Chisci, G. Mugnai, A. Farina, and A. Graziano, “Consensus-based algorithms for distributed filtering,” in 51st IEEE Conference on Decision and Control, 2012, pp. 794–799.
[37] S. Park and N. Martins, “Necessary and sufficient conditions for the stabilizability of a class of LTI distributed observers,” in 51st IEEE Conference on Decision and Control, 2012, pp. 7431–7436.
[38] M. Doostmohammadian and U. Khan, “On the genericity properties in distributed estimation: Topology design and sensor placement,” IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 2, pp. 195–204, 2013.
[39] M. Doostmohammadian, H. R. Rabiee, and U. A. Khan, “Cyber-social systems: modeling, inference, and optimal design,” IEEE Systems Journal, vol. 14, no. 1, pp. 73–83, 2020.
[40] C. Commault, J. Dion, and S. Y. Agha, “Structural analysis for the sensor location problem in fault detection and isolation,” Automatica, vol. 44, no. 8, pp. 2074–2080, 2008.
[41] C. Commault and J. Dion, “Sensor location for diagnosis in linear systems: A structural analysis,” IEEE Transactions on Automatic Control, vol. 52, no. 2, pp. 155–169, 2007.
[42] M. Doostmohammadian and U. A. Khan, “On the characterization of distributed observability from first principles,” in 2nd IEEE Global Conference on Signal and Information Processing, 2014, pp. 914–917.
[43] M. Pajic, P. Tabuada, I. Lee, and G. J. Pappas, “Attack-resilient state estimation in the presence of noise,” in 54th IEEE Conference on Decision and Control (CDC). IEEE, 2015, pp. 5827–5832.
[44] A. R. Kodakkadan, M. Pourasghar, V. Puig, S. Olaru, C. Ocampo-Martinez, and V. Reppa, “Observer-based sensor fault detectability: About robust positive invariance approach and residual sensitivity,” IFAC-PapersOnLine, vol. 50, no. 1, pp. 5041–5046, 2017.
[45] S. Kar and J. M. F. Moura, “Gossip and distributed Kalman filtering: Weak consensus under weak detectability,” IEEE Transactions on Signal Processing, vol. 59, no. 4, pp. 1766–1784, April 2011.
[46] M. Doostmohammadian and U. A. Khan, “Measurement partitioning and observational equivalence in state estimation,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 4855–4859.
[47] M. Doostmohammadian, H. R. Rabiee, H. Zarrabi, and U. A. Khan, “Distributed estimation recovery under sensor failure,” IEEE Signal Processing Letters, vol. 24, no. 10, pp. 1532–1536, 2017.
[48] M. Doostmohammadian and U. Khan, “On the complexity of minimum-cost networked estimation of self-damped dynamical systems,” IEEE Transactions on Network Science and Engineering, 2019.
[49] O. Ennasr and X. Tan, “Distributed localization of a moving target: Structural observability-based convergence analysis,” in American Control Conference. IEEE, 2018, pp. 2897–2903.
[50] R. Olfati-Saber, “Distributed Kalman filters with embedded consensus filters,” in 44th IEEE Conference on Decision and Control, Seville, Spain, Dec. 2005, pp. 8179–8184.
[51] J. Bay, Fundamentals of linear state space systems. McGraw-Hill, 1999.
[52] M. Doostmohammadian and U. A. Khan, “Minimal sufficient conditions for structural observability/controllability of composite networks via kronecker product,” IEEE Transactions on Signal and Information processing over Networks, vol. 6, pp. 78–87, 2020.
[53] U. A. Khan and A. Jadbabaie, “Coordinated networked estimation strategies using structured systems theory,” in 49th IEEE Conference on Decision and Control, Orlando, FL, Dec. 2011, pp. 2112–2117.
[54] L. E. Ghaoui, F. Oustry, and M. A. Rami, “A cone complementarity linearization algorithm for static output-feedback and related problems,” IEEE Transactions on Automatic Control, vol. 42, no. 8, pp. 1171–1176, 1997.
[55] O. L. Mangasarian and J. S. Pang, “The extended linear complementarity problem,” SIAM Journal on Matrix Analysis and Applications, vol. 2, pp. 359–368, Jan. 1995.
[56] M. Pajic, S. Sundaram, J. Le Ny, G. Pappas, and R. Mangharam, “The wireless control network: Synthesis and robustness,” in 49th Conference on Decision and Control, Orlando, FL, 2010, pp. 7576–7581.
[57] A. I. Zečević and D. D. Šiljak, “Control design with arbitrary information structure constraints,” Automatica, vol. 44, no. 10, pp. 2642–2647, 2008.
[58] Y. Ye, “A fully polynomial-time approximation algorithm for computing a stationary point of the general linear complementarity problem,” Mathematics of Operations Research, vol. 18, no. 2, pp. 334–345, 1993.
[59] Y. Nesterov and A. Nemirovskii, Interior-point polynomial algorithms in convex programming. SIAM, 1994.
[60] S. Sundaram, “Fault-tolerant and secure control systems,” University of Waterloo, Lecture Notes, 2012.
[61] U. A. Khan and A. Jadbabaie, “Collaborative scalar-gain estimators for potentially unstable social dynamics with limited communication,” Automatica, vol. 50, no. 7, pp. 1909–1914, 2014.
[62] M. Doostmohammadian, “Minimal driver nodes for structural controllability of large-scale dynamical systems: Node classification,” IEEE Systems Journal, 2019.
[63] C. Lin, “Structural controllability,” IEEE Transactions on Automatic Control, vol. 19, no. 3, pp. 201–208, Jun. 1974.
[64] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms. MIT Press, 2009.

	$\displaystyle\widehat{\mathbf{x}}^{i}_{k\|k-1}=$	$\displaystyle\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\widehat{\mathbf{x}}^{j}_{k-1\|k-1},$		(19)
	$\displaystyle\widehat{\mathbf{x}}^{i}_{k\|k}=$	$\displaystyle\widehat{\mathbf{x}}^{i}_{k\|k-1}+K^{i}C_{i}^{\top}\left(\mathbf{y}^{i}_{k}-C_{i}\widehat{\mathbf{x}}^{i}_{k\|k-1}\right),$		(20)

$\displaystyle\\|\Sigma\\|_{2}\leq$	$\displaystyle\\|(I_{Nn}-KD_{C})(\mathbf{1}_{NN}\otimes Q)(I_{Nn}-KD_{C})^{\top}\\|_{2}$
	$\displaystyle+\\|(K\overline{D}_{C})R(K\overline{D}_{C})^{\top}\\|_{2}$
$\displaystyle\leq$	$\displaystyle\\|I_{Nn}-KD_{C}\\|_{2}^{2}N\\|Q\\|_{2}+\\|K\\|_{2}^{2}\\|\overline{R}\\|_{2}$	(34)