This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Sensor Fault Detection and Isolation via Networked Estimation: Full-Rank Dynamical Systems

Mohammadreza Doostmohammadian, Nader Meskin, Senior Member, IEEE This paper was supported by the Qatar National Research Fund (a member of the Qatar Foundation) under the NPRP Grant number NPRP10-0105-170107.Mohammadreza Doostmohammadian is with the Faculty of Mechanical Engineering at Semnan University, Semnan, Iran, email: [email protected]. Nader Meskin is with the Electrical Engineering Department at Qatar University, Doha, Qatar, email: [email protected].
Abstract

This paper considers the problem of simultaneous sensor fault detection, isolation, and networked estimation of linear full-rank dynamical systems. The proposed networked estimation is a variant of single time-scale protocol and is based on (i) consensus on a-priori estimates and (ii) measurement innovation. The necessary connectivity condition on the sensor network and stabilizing block-diagonal gain matrix is derived based on our previous works. Considering additive faults in the presence of system and measurement noise, the estimation error at sensors is derived and proper residuals are defined for fault detection. Unlike many works in the literature, no simplifying upper-bound condition on the noise is considered and we assume Gaussian system/measurement noise. A probabilistic threshold is then defined for fault detection based on the estimation error covariance norm. Finally, a graph-theoretic sensor replacement scenario is proposed to recover possible loss of networked observability due to removing the faulty sensor. We examine the proposed fault detection and isolation scheme on an illustrative academic example to verify the results and make a comparison study with related literature.

Keywords: Fault Detection and Isolation, Observability, Networked Estimation, System Digraph, Residual

I Introduction

Fault Detection and Isolation (FDI) is an emerging topic of recent literature in control and signal-processing [1]. In general, a fault is characterized as an unpermitted deviation or malfunction of (at least) one of the system parameters/properties from its standard condition, which may take place at the plant, sensor, or actuator. Fault Detection, Isolation, and Reconfiguration (FDIR) is therefore a methodology to ensure acceptable system operation and to compensate for occurrence of faults detected by FDI module[1]. The fault diagnosis schemes are spanned from the centralized approaches [2, 3, 4] to more recent distributed methods [5, 6, 7, 8, 9, 10, 11]. Indeed, with recent technological trends, many practical systems of interest are either large-scale or physically distributed, and thus it is required to develop distributed or networked FDI strategies. Toward this goal, in this paper, the problem of simultaneous sensor fault detection, isolation and networked estimation of dynamical system is investigated.

Fault detection and isolation in networked dynamical systems has been extensively studied in the literature. In [6], an FDI strategy is proposed for a distributed heterogeneous multi-agent system and in [7], fault detection of a group of interconnected noise-free individual subsystems is considered such that each subsystem is able to detect faults in its neighborhood. In [8], an FDI problem for interconnected double-integrator system under unknown system faults and fault-free measurements is investigated such that almost all faults except the faults with zero dynamics can be detected. Similarly, in [9], a faulty second-order dynamical system (representing a heterogeneous multi-agent system) is considered with bounded disturbance and fault-free measurements and a distributed observer-based FDI strategy is proposed. The case of homogeneous multi-agent linear systems under upper-bounded system noise is considered in [10], where a distributed observer-based FDI based on fault-free measurements is developed to detect faults at agents. Development of FDI on noise-free faulty systems monitored by a multi-agent network communicating under an event-triggered framework is considered in [11]. It should be noted that most of these works are with the main aim of fault detection and the state estimation of the underlying dynamical system is not considered.

The other related trend in the literature is attack detection and resilient/secure estimation, both centralized [12, 13, 14, 15, 16, 17, 18] and distributed [19, 20, 21, 22, 23, 24, 25, 26, 27]. Distributed estimation strategies are more prominent in recent years due to emergence of large-scale applications. In [19], a distributed HH_{\infty} observer is proposed to detect admissible biasing attacks over distributed estimation networks by introducing an auxiliary input tracking model. In [20], both false-data injection attack at the system/process and jamming attack at the communication network of sensors are considered and in [21], a distributed gradient descent protocol is adopted to optimize the norm of the covariance of the measurement updates as the cost function. In [22], a distributed method is proposed to estimate the system state over a k-regular sensor network under attacked noise-free measurements and in [23], a distributed resilient estimation scheme is proposed where the time-scale of the sensors (as estimators) needs to be faster than the system dynamics.

Secure distributed estimation of cyber-physical systems composed of interconnected subsystems each monitored by a sensor under possible unidentifiable attack is discussed in [24]. In [25], secure distributed estimators for noise-free system and measurements is proposed over dynamic sensor networks subject to communication loss. Other relevant works include secure estimation based on reachability analysis in [26] and static parameter estimation in [27]. Note that most of these works propose a resilient estimation protocol under sensor attack, where no attack detection and isolation scheme is considered. In general, distributed fault/attack detection finds application in cases where the centralized architecture is not possible or not desirable, as in smart-grid monitoring [28], large-scale wind-farms [29], and networked unmanned vehicles [30].

In this paper, a quantitative model-based method is adopted, as in the observer-based methods, to develop explicit mathematical model and control theory to generate residuals for sensor fault detection. The proposed observer/estimator in this paper is distributed/networked, which finds application in large-scale architectures. Similar networked estimators are proposed in the literature [31, 32, 33, 34, 35, 36, 37]. A single time-scale networked estimator is developed to track the global state of the dynamical system over a distributed network of sensors each taking local measurements with partial observability. Unlike some literature [33, 34, 32], it is not assumed that the system is observable in the neighborhood of each sensor, i.e. the system is not necessarily observable by the measurements directly shared with the sensor. This significantly reduces the connectivity requirements on the sensor network, and therefore reduces the network communication costs. Similar to [35, 36, 37], it is assumed that the underlying dynamical system is full-rank and conditions for networked observability are developed.

The design procedure in this paper is based on structured system theory, previously developed in [38, 39]. Structural (or generic) analysis is widely used in system theory, namely in fault detection [40, 41] and in distributed observability characterization [42]. Using structural analysis, the structure of the sensor network and local estimator gain can be designed to satisfy distributed observability. However, in this paper possible additive faults in sensors are considered and by mathematically deriving the estimation error and sensor residuals, probabilistic thresholds on the residuals are obtained based on error covariance, which leads to the fault detection and isolation logic. Finally, after detecting and isolating the faulty sensor, a graph-theoretic method is proposed to replace the faulty sensor with an observationally equivalent state measurement to compensate the loss of observability. To summarize, the main contributions of this paper as compared with related literature are as follows:

  • Considering system/process and measurement noise is a challenge in FDI strategies. This is mainly due to the fact that it is generally hard to distinguish between the presence of faults and system/measurement noise as both fault and noise terms may affect the sensor residuals. In this direction, some works in the literature [7, 18, 11, 25] assume that the system and/or sensor measurement are noise-free, which is a simplifying assumption. In this paper, we consider both system and measurement noise and propose a probability-based threshold design on the residuals to overcome this challenge.

  • Following the above comment, many works in FDI and attack detection literature [43, 12, 13, 44, 45, 10] assume that the noise variable is upper-bounded, i.e the noise term instead of taking different values, for example, from Gaussian distribution, only takes values in a limited range. This simplifying assumption helps to design deterministic thresholds on the residuals. In this paper, no such assumption is made, and the noise is assumed to be Gaussian random variable with no upper-bound, which is more realistic as compared to the mentioned references.

  • In this paper, a general LTI system is considered, which generalizes the multi-agent systems or interconnected systems each possessing a separate dynamics considered in [6, 7, 10], and the double-integrator system in [8, 9]. Furthermore, the sensor network is considered as a Strongly-Connected (SC) graph as compared to restrictive regularity condition considered in [22].

  • We adopt a graph-theoretic approach to recover the loss of observability in case of detecting and removing a faulty sensor. This approach is based on our previous works on observational equivalence [46, 47].

The rest of this paper is organized as follows. In Section II, the general framework and state the problem are presented. Section III provides the networked estimator scenario and develops the condition for error stability. In Section IV, the algorithm for block-diagonal gain design at sensors is provided. Section V develops the sensor fault-detection and isolation logic, and in Section VI, a graph-theoretic method for observability compensation is discussed. Section VII gives simulation example to illustrate the results. Finally, Section VIII concludes the paper.

II The Framework

We consider noisy discrete-time linear systems as,

𝐱k+1=A𝐱k+νk,k0,\displaystyle\mathbf{x}_{k+1}=A\mathbf{x}_{k}+\mathbf{\nu}_{k},\qquad k\geq 0, (1)

where 𝐱kn\mathbf{x}_{k}\in\mathbb{R}^{n} is the system state and νk=𝒩(0,Q)\mathbf{\nu}_{k}=\mathcal{N}(0,Q) is the system noise at time-step kk. In this paper, the underlying system matrix AA is considered to be full-rank and examples of such full-rank systems are self-damped dynamical systems [48]. It should be noted that the LTI system (1) can be also obtained by the discretization of a continuous-time LTI system, based on Euler or Tustin discretization methods discussed in [48] and both of these methods result in a full-rank discrete-time LTI system. Moreover, the full-rank condition can be inherent to the system dynamics, such as in Nearly-Constant-Velocity (NCV) model for target dynamics in distributed tracking scenarios [49]. In [49], the target is modeled as a discrete-time dynamical system whose associated matrix is full-rank due to its non-zero diagonal entries. The system full-rank condition is also considered in the networked estimation literature as in [35, 36, 37].

The noise/fault-corrupted measurements of the system are taken by NN sensors,

(𝐲k1𝐲kN)\displaystyle\left(\begin{array}[]{c}\mathbf{y}^{1}_{k}\\ \vdots\\ \mathbf{y}^{N}_{k}\end{array}\right) =(C1CN)(𝐱k1𝐱kn)+(ζk1ζkN)+(𝐟k1𝐟kN)\displaystyle=\left(\begin{array}[]{c}C_{1}\\ \vdots\\ C_{N}\end{array}\right)\left(\begin{array}[]{c}\mathbf{x}^{1}_{k}\\ \vdots\\ \mathbf{x}^{n}_{k}\end{array}\right)+\left(\begin{array}[]{c}\mathbf{\zeta}^{1}_{k}\\ \vdots\\ \mathbf{\zeta}^{N}_{k}\end{array}\right)+\left(\begin{array}[]{c}\mathbf{f}^{1}_{k}\\ \vdots\\ \mathbf{f}^{N}_{k}\end{array}\right) (17)

which can be written as the global measurement equation as follows:

𝐲k\displaystyle\mathbf{y}_{k} =\displaystyle= C𝐱k+ζk+𝐟k,\displaystyle C\mathbf{x}_{k}+\mathbf{\zeta}_{k}+\mathbf{f}_{k}, (18)

where 𝐲kN\mathbf{y}_{k}\in\mathbb{R}^{N}, ζk=𝒩(0,R)\mathbf{\zeta}_{k}=\mathcal{N}(0,R) is the measurement noise, and 𝐟k\mathbf{f}_{k} represents the sensor fault vector. It is assumed that ζk\mathbf{\zeta}_{k} and νk\mathbf{\nu}_{k} are zero-mean Gaussian while noise, i.e. 𝔼(νk)=0\mathbb{E}(\nu_{k})=0, 𝔼(ζki)=0\mathbb{E}(\zeta^{i}_{k})=0, for all i,ki,k, and 𝔼(νkνm)=0\mathbb{E}(\nu_{k}\nu_{m})=0, for all kmk\neq m, and similarly, for the measurement noise. Further, without loss of generality, it is assumed that each sensor measures one of the system states.

The general problem in this paper is to design a stable networked estimation protocol in the absence of faults, and, then, a fault detection and isolation logic such that to detect possible faults in sensors. Given the system and measurements as in (1) and (18), a group of sensors is considered, each embedded with communication and computation equipment to measure a state of the dynamical system and process the measured data and information received from the other sensors to estimate the global state of the dynamical system.

Note that unlike many works in the literature [33, 34, 32], no assumption on the observability of system in the direct neighborhood of each sensor is considered which implies that the minimal connectivity on the communication network of agents is required. Each sensor adopts a single time-scale networked estimation protocol and by proper design of communication network structure and feedback gain matrix, each sensor tracks the system state by bounded steady-state estimation error in the fault-free case. In case of fault occurrence, i.e. 𝐟ki0\mathbf{f}_{k}^{i}\neq 0 at any sensor ii, a residual-based fault detection and isolation logic is proposed to detect the faulty sensor by comparing the sensor residuals with pre-specified thresholds. Note that unlike many work in the literature [43, 12, 13, 44, 45], it is not assumed that the system noise and/or measurement noise are upper-bounded. Finally, by detecting the faulty sensor, one may compensate for possible loss of observability in the distributed system by introducing a new sensor measurement replacing the faulty sensor.

III Networked Estimation Protocol

In this section, we present our main tool to perform diagnosis as a single time-scale networked estimation protocol and analyze its error stability criteria. The networked estimator is proposed based on collaborative consensus on the a-priori estimates at sensors. The estimator is based on two steps as follows,

𝐱^k|k1i=\displaystyle\widehat{\mathbf{x}}^{i}_{k|k-1}= j𝒩β(i)WijA𝐱^k1|k1j,\displaystyle\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\widehat{\mathbf{x}}^{j}_{k-1|k-1}, (19)
𝐱^k|ki=\displaystyle\widehat{\mathbf{x}}^{i}_{k|k}= 𝐱^k|k1i+KiCi(𝐲kiCi𝐱^k|k1i),\displaystyle\widehat{\mathbf{x}}^{i}_{k|k-1}+K^{i}C_{i}^{\top}\left(\mathbf{y}^{i}_{k}-C_{i}\widehat{\mathbf{x}}^{i}_{k|k-1}\right), (20)

where 𝐱^k|k1i\widehat{\mathbf{x}}^{i}_{k|k-1} is the estimate of state 𝐱k\mathbf{x}_{k} at sensor ii given all the measurements up to time k1k-1, 𝐱^k|ki\widehat{\mathbf{x}}^{i}_{k|k} is the estimated state given all the measurements up to time kk, the matrix WW is a stochastic matrix for consensus on a-priori estimates, and KiK^{i} is the gain matrix at sensor ii. Note that the row-stochastic condition on WW is a necessary condition for (consensus) averaging of a-priori estimates. Recall that a matrix is row-stochastic if the summation of the entries in each row are equal to 11, i.e. j=1NWij=1\sum_{j=1}^{N}W_{ij}=1. Further, 𝒩β(i)\mathcal{N}_{\beta}(i) defines the neighborhood of sensor ii over which the sensor shares a-priori estimates with the neighboring sensors and this neighborhood follows the structure of matrix WW. In fact 𝒩β(i)={j|ji}{i}\mathcal{N}_{\beta}(i)=\{j|j\rightarrow i\}\cup\{i\} where jij\rightarrow i implies that sensor jj sends its information to ii and 𝒩β(i)\mathcal{N}_{\beta}(i) also includes the sensor ii itself, i.e. all the diagonal entries of the matrix WW are non-zero. This is due to the fact that each sensor uses its own estimate at previous time 𝐱^k1|k1i\widehat{\mathbf{x}}^{i}_{k-1|k-1} to calculate its a-priori estimate 𝐱^k|k1i\widehat{\mathbf{x}}^{i}_{k|k-1}. It should be noted that the protocol (19)-(20) differs from our previous works [38, 39] in two aspects: (i) the protocol in [38, 39] has one more consensus step on the measurement fusion in which the neighboring measurements are shared over a different (hub-based) network, while the protocol (19)-(20) only includes one step of averaging on a-priori estimates. Therefore, the network connectivity condition in this paper is more relaxed as compared to [38, 39]; (ii) the distributed approach in [38, 39] is fault-free where the sensor measurements are not accompanied with additive faults and consequently, the error dynamics analysis is different from this work.

Note that the estimator (19)-(20) is single time-scale as compared to multi time-scale networked estimation given in [50, 23] where many step of averaging (or consensus) is performed between every two steps kk and k+1k+1 of system dynamics (see Fig. 1).

Refer to caption
Figure 1: Single and multiple time-scale strategies for networked estimation: (Left) multi time-scale also known as average-consensus based approach, and (Right) single time-scale approach. As it is clear from the figure, in the former scenario many steps of averaging is performed between every two successive time-steps of system dynamics, while in the latter case only one step of averaging is done between every two successive steps of system dynamics. The former requires faster processing units, while the latter is more desirable in real-time applications.

It is known that the single time-scale approach is privileged over the multi time-scale method, since the latter requires a large number of information-exchange and communication over the sensor network between every two successive time-steps of the system dynamics. This implies that the communication time-scale needs to be much faster than the dynamics, which is not desirable and even may not be feasible for many large-scale real-time systems. To elaborate this, note that in protocol (19)-(20) only one step of averaging on a-priori estimates is done between two successive steps kk and k+1k+1, while the protocol in [50] requires many steps of consensus between steps kk and k+1k+1. In the multi time-scale estimation, the connectivity of the sensor network is more relaxed due to more information exchanges among the sensors, while in the single time-scale method, the sensor network requires more connectivity particularly for rank-deficient dynamical systems. In fact, it is proved that a SC network may not guarantee error stability for rank-deficient dynamical systems and certain hub-based network design is necessary, see [38, 39] for details.

Define the error 𝐞ki\mathbf{e}_{k}^{i} at time-step kk at sensor ii as,

𝐞ki=\displaystyle\mathbf{e}_{k}^{i}= 𝐱k|k𝐱^k|ki,\displaystyle\mathbf{x}_{k|k}-\widehat{\mathbf{x}}^{i}_{k|k},
=\displaystyle= 𝐱k(𝐱^k|k1i+KiCi(𝐲kiCi𝐱^k|k1i))\displaystyle\mathbf{x}_{k}-\Bigl{(}\widehat{\mathbf{x}}^{i}_{k|k-1}+K^{i}C_{i}^{\top}(\mathbf{y}^{i}_{k}-C_{i}\widehat{\mathbf{x}}^{i}_{k|k-1})\Bigr{)}
=\displaystyle= 𝐱k(j𝒩β(i)WijA𝐱^k1|k1j\displaystyle\mathbf{x}_{k}-\Bigl{(}\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\widehat{\mathbf{x}}^{j}_{k-1|k-1}
+KiCi(𝐲kiCij𝒩β(i)WijA𝐱^k1|k1j)).\displaystyle+K^{i}C_{i}^{\top}\Bigl{(}\mathbf{y}^{i}_{k}-C_{i}\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\widehat{\mathbf{x}}^{j}_{k-1|k-1}\Bigr{)}\Bigr{)}. (21)

By substituting (1)-(18) in above, it follows that:

𝐞ki=\displaystyle\mathbf{e}_{k}^{i}= A𝐱k1+νk1(j𝒩β(i)WijA𝐱^k1|k1j\displaystyle A\mathbf{x}_{k-1}+\mathbf{\nu}_{k-1}-\Bigl{(}\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\widehat{\mathbf{x}}^{j}_{k-1|k-1}
+KiCi(Ci𝐱k+ζki+𝐟ki\displaystyle+K^{i}C_{i}^{\top}\Bigl{(}C_{i}\mathbf{x}_{k}+\mathbf{\zeta}^{i}_{k}+\mathbf{f}^{i}_{k}
Cij𝒩β(i)WijA𝐱^k1|k1j))\displaystyle-C_{i}\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\widehat{\mathbf{x}}^{j}_{k-1|k-1}\Bigr{)}\Bigr{)}
=\displaystyle= A𝐱k1+νk1j𝒩β(i)WijA𝐱^k1|k1j\displaystyle A\mathbf{x}_{k-1}+\mathbf{\nu}_{k-1}-\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\widehat{\mathbf{x}}^{j}_{k-1|k-1}
Ki(CiCi(Axk1+νk1)+Ciζki\displaystyle-K^{i}\Bigl{(}C_{i}^{\top}C_{i}(Ax_{k-1}+\mathbf{\nu}_{k-1})+C_{i}^{\top}\mathbf{\zeta}^{i}_{k}
+Ci𝐟kiCiCij𝒩β(i)WijA𝐱^k1|k1j)\displaystyle+C_{i}^{\top}\mathbf{f}^{i}_{k}-C_{i}^{\top}C_{i}\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\widehat{\mathbf{x}}^{j}_{k-1|k-1}\Bigr{)}

Recall that the row-stochastic condition of WW implies that j=1NWij=1\sum_{j=1}^{N}W_{ij}=1. Note that for j𝒩β(i)j\in\mathcal{N}_{\beta}(i), Wij0W_{ij}\neq 0 and for j𝒩β(i)j\notin\mathcal{N}_{\beta}(i), Wij=0W_{ij}=0. Therefore, the row-stochastic condition can be re-written as j𝒩β(i)Wij=1\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}=1. Based on this fact, it follows that (see [39] for similar analysis):

A𝐱k1=j𝒩β(i)WijA𝐱k1,\displaystyle A\mathbf{x}_{k-1}=\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\mathbf{x}_{k-1},

and consequently:

𝐞ki\displaystyle\mathbf{e}_{k}^{i} =j𝒩β(i)WijA𝐱k1j𝒩β(i)WijA𝐱^k1|k1j\displaystyle=\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\mathbf{x}_{k-1}-\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\widehat{\mathbf{x}}^{j}_{k-1|k-1}
KiCiCi(j𝒩β(i)WijA𝐱k1j𝒩β(i)WijA𝐱^k1|k1j)\displaystyle-K^{i}C_{i}^{\top}C_{i}\Bigl{(}\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\mathbf{x}_{k-1}-\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\widehat{\mathbf{x}}^{j}_{k-1|k-1}\Bigr{)}
+νk1KiCiζkiKiCi𝐟kiKiCiCiνk1\displaystyle+\mathbf{\nu}_{k-1}-K^{i}C_{i}^{\top}\mathbf{\zeta}^{i}_{k}-K^{i}C_{i}^{\top}\mathbf{f}^{i}_{k}-K^{i}C_{i}^{\top}C_{i}\mathbf{\nu}_{k-1}
=j𝒩β(i)WijA𝐞k1jKiCiCij𝒩β(i)WijA𝐞k1j+ηki,\displaystyle=\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\mathbf{e}_{k-1}^{j}-K^{i}C_{i}^{\top}C_{i}\sum_{j\in\mathcal{N}_{\beta}(i)}W_{ij}A\mathbf{e}^{j}_{k-1}+\mathbf{\eta}_{k}^{i},

where ηki\mathbf{\eta}_{k}^{i} collects the noise terms and fault term as follows:

ηki=νk1Ki(Ciζki+Ci𝐟ki+CiCiνk1).\displaystyle\mathbf{\eta}_{k}^{i}=\mathbf{\nu}_{k-1}-K^{i}\Bigl{(}C_{i}^{\top}\mathbf{\zeta}^{i}_{k}+C_{i}^{\top}\mathbf{f}^{i}_{k}+C_{i}^{\top}C_{i}\mathbf{\nu}_{k-1}\Bigr{)}. (22)

The collective error at the group of sensors is defined as the concatenation of errors at all sensors, i.e. 𝐞k=(𝐞k1,,𝐞kN)\mathbf{e}_{k}=(\mathbf{e}_{k}^{1\top},\dots,\mathbf{e}_{k}^{N\top})^{\top}. For the collective error, it follows that:

𝐞k=A^𝐞k1+ηk,\displaystyle\mathbf{e}_{k}=\widehat{A}\mathbf{e}_{k-1}+\mathbf{\eta}_{k}, (23)

where A^=WAKDC(WA)\widehat{A}=W\otimes A-KD_{C}(W\otimes A) with K=blockdiag(Ki)K=\mbox{blockdiag}(K^{i}) and DC=blockdiag(CiCi)D_{C}=\mbox{blockdiag}(C_{i}^{\top}C_{i}). The collective noise vector is given as:

ηk\displaystyle\mathbf{\eta}_{k} =𝟏Nνk1KDC(𝟏Nνk1)KD¯CζkKD¯C𝐟k,\displaystyle=\mathbf{1}_{N}\otimes\mathbf{\nu}_{k-1}-KD_{C}(\mathbf{1}_{N}\otimes\mathbf{\nu}_{k-1})-K\overline{D}_{C}\mathbf{\zeta}_{k}-K\overline{D}_{C}\mathbf{f}_{k}, (24)

where 𝟏N\mathbf{1}_{N} is the vector of 1s of size NN and D¯C=blockdiag(Ci)\overline{D}_{C}=\mbox{blockdiag}(C_{i}^{\top}).

To stabilize the error dynamics (23), it is necessary that the pair (WA,DC)(W\otimes A,D_{C}) to be observable [51]. Note that the observability analysis of Kronecker product of matrices (and the associated composite graph) is discussed in details in [52], where it is proved that for observability of (WA,DC)(W\otimes A,D_{C}), the matrix WW needs to be irreducible. This implies that the sensor network needs to be Strongly-Connected (SC). In fact, this is also discussed in [38], and since the protocol (19)-(20) is a variant of the protocol in [38], the same analysis can be adopted here to prove the irreducibility of WW matrix. Therefore, any irreducible matrix (associated with a SC network) with non-zero diagonal entries while satisfying the row-stochastic condition can work as WW matrix. The SC connectivity of the sensor network is also considered in similar networked estimation literature, see for example [35, 36, 37]. Note that the (WA,DC)(W\otimes A,D_{C})-observability is also referred as networked observability (or distributed observability) condition. As it is explained in the next section, having (WA,DC)(W\otimes A,D_{C})-observability satisfied, using Linear-Matrix-Inequalities (LMI), the block-diagonal gain matrix KK can be designed such that A^\widehat{A} is a Schur matrix [38, 53, 54], i.e. ρ(A^)<1\rho(\widehat{A})<1 where ρ\rho defines the spectral radius of a matrix.

IV Design of Block-Diagonal Feedback Gain based on LMI Approach

In this section, we discuss the methodology for computation of a block-diagonal estimator gain KK. Note that, the observability of (WA,DC)(W\otimes A,D_{C}) in the distributed estimator (19)-(20) guarantees the existence of a full matrix KK, such that ρ(WAKDC(WA))<1\rho(W\otimes A-KD_{C}(W\otimes A))<1. However, for having a distributed approach, the gain matrix is required to be block-diagonal. Such KK is known to be the solution of an LMI as follows:

(XA^XXA^X)0(XA^A^Y)0,\displaystyle~{}~{}\left(\begin{array}[]{cc}X&\widehat{A}^{\top}X\\ X\widehat{A}&X\\ \end{array}\right)\succ 0\Rightarrow\left(\begin{array}[]{cc}X&\widehat{A}^{\top}\\ \widehat{A}&Y\\ \end{array}\right)\succ 0, (25)

where A^=WAKDC(WA)\widehat{A}=W\otimes A-KD_{C}(W\otimes A) and  X0X\succ 0 with \succ denoting positive definiteness. Note that the left-hand-side of the above equation is nonlinear in KK; however its equivalent solution is proposed in the literature [55, 56] as the right-hand-side in (25) with X=Y1X=Y^{-1}. Now, having the above LMI to be linear in KK, we note that the constraint X=Y1X=Y^{-1} is a non-convex constraint. However, this constraint can be approximated with a linear function of the matrices X,Y0X,Y\succ 0, satisfy X=Y1X=Y^{-1} as the optimal point of the following optimization problem [54]:

min\displaystyle\min{}{} 𝐭𝐫𝐚𝐜𝐞(XY)\displaystyle\mathbf{trace}(XY) (26)
s.t. (XIIY)0,\displaystyle\left(\begin{array}[]{cc}X&I\\ I&Y\\ \end{array}\right)\succ 0,
K is block-diagonal.\displaystyle K\mbox{~{}is~{}block-diagonal}.

with X,Y0X,Y\succ 0. Overall, the optimization is summarized as follows; with (WA,DC)(W\otimes A,D_{C}) observability, the gain matrix KK is the solution to the following:

min\displaystyle\min{}{} 𝐭𝐫𝐚𝐜𝐞(XY)\displaystyle\mathbf{trace}(XY) (27)
s.t. X,Y0,\displaystyle X,Y\succ 0,
(XA^A^Y)0,\displaystyle\left(\begin{array}[]{cc}X&\widehat{A}^{\top}\\ \widehat{A}&Y\\ \end{array}\right)\succ 0,
(XIIY)0,\displaystyle\left(\begin{array}[]{cc}X&I\\ I&Y\\ \end{array}\right)\succ 0,
K is block-diagonal.\displaystyle K\mbox{~{}is~{}block-diagonal}.

It should be mentioned that a solution of the second LMI is equivalent to X=Y1X=Y^{-1}, which results in an optimal value for minimum trace as nNnN. Furthermore, 𝐭𝐫𝐚𝐜𝐞(XY)\mathbf{trace}(XY) can be replaced with the linear approximation 𝐭𝐫𝐚𝐜𝐞(Y0X+X0Y)/2\mathbf{trace}(Y_{0}X+X_{0}Y)/2 [54], and the iterative Algorithm 1 can be used to minimize this problem under given constraints.

Given: A,W,DCA,W,D_{C}
Result: Gain matrix KK
Find feasible points X0,Y0,KX_{0},Y_{0},K;
while ρ(A^)>1\rho(\widehat{A})>1 do
       minimize 𝐭𝐫𝐚𝐜𝐞(YtX+XtY)\mathbf{trace}(Y_{t}X+X_{t}Y) under the constraints given in equation (27) and find X,Y,KX,Y,K;
       Yt+1=YY_{t+1}=Y;
       Xt+1=XX_{t+1}=X;
       t=t+1t=t+1;
      
end while
Algorithm 1 Iterative calculation of block-diagonal gain KK.

In [54], it is shown that 𝐭𝐫𝐚𝐜𝐞(YtX+XtY)\mathbf{trace}(Y_{t}X+X_{t}Y) is non-increasing and converges to 2nN2nN. In this regard, a stopping criterion of the algorithm is established as reaching within 2nN+ϵ2nN+\epsilon of the trace objective. Interested readers may refer to [53, 55, 56, 54, 57] for more details. It should be noted that this algorithm (and in general similar cone-complementarity algorithms) are of polynomial-order complexity, see [58, 59].

V Sensor Fault Detection and Isolation

In this section, we present our results on the development of sensor fault detection and isolation scheme for the considered system. Following the terminology in [60], given the estimated state 𝐱^k|ki\widehat{\mathbf{x}}_{k|k}^{i}, define the estimated output at sensor ii as 𝐲^ki=Ci𝐱^k|ki\widehat{\mathbf{y}}_{k}^{i}=C_{i}\widehat{\mathbf{x}}_{k|k}^{i}. Note that for fault-free case 𝐲ki𝐲^ki\mathbf{y}_{k}^{i}-\widehat{\mathbf{y}}_{k}^{i} is steady-state stable and bounded by proper design of WW and KK matrices. Thus, define the residual signal at sensor ii at time-step kk as,

rki=\displaystyle r_{k}^{i}= |𝐲ki𝐲^ki|=|𝐲kiCi𝐱^k|ki|=|Ci𝐞ki+ζki+𝐟ki|\displaystyle|\mathbf{y}_{k}^{i}-\widehat{\mathbf{y}}_{k}^{i}|=|\mathbf{y}_{k}^{i}-C_{i}\widehat{\mathbf{x}}_{k|k}^{i}|=|C_{i}\mathbf{e}_{k}^{i}+\mathbf{\zeta}^{i}_{k}+\mathbf{f}^{i}_{k}|
=\displaystyle= |CiA^i𝐞k1+Ciηki+ζki+𝐟ki|\displaystyle|C_{i}\widehat{A}_{i}\mathbf{e}_{k-1}+C_{i}\mathbf{\eta}_{k}^{i}+\mathbf{\zeta}^{i}_{k}+\mathbf{f}^{i}_{k}| (28)

where A^i\widehat{A}_{i} is the iith hyper-row of the matrix A^\widehat{A} defined as the block of rows from row (i1)n+1(i-1)n+1 to row inin. Moreover, CiηkiC_{i}\mathbf{\eta}_{k}^{i} can be written as:

Ciηki=\displaystyle C_{i}\mathbf{\eta}_{k}^{i}= Ciνk1CiKiCiζkiCiKiCi𝐟ki\displaystyle C_{i}\mathbf{\nu}_{k-1}-C_{i}K^{i}C_{i}^{\top}\mathbf{\zeta}^{i}_{k}-C_{i}K^{i}C_{i}^{\top}\mathbf{f}^{i}_{k}
CiKiCiCiνk1.\displaystyle-C_{i}K^{i}C_{i}^{\top}C_{i}\mathbf{\nu}_{k-1}. (29)

Assume that sensor ii is faulty at some time-interval, i.e. 𝐟ki0\mathbf{f}_{k}^{i}\neq 0 for k1<kk_{1}<k. Note that, considering (28) and (29), the estimation error is now affected by the sensor fault. In this case for the residual at sensor ii, the term CiKiCi𝐟ki0C_{i}K^{i}C_{i}^{\top}\mathbf{f}^{i}_{k}\neq 0 and 𝐟ki0\mathbf{f}^{i}_{k}\neq 0 in (28) and (29), while for other sensors CjKjCj𝐟kj=0C_{j}K^{j}C_{j}^{\top}\mathbf{f}^{j}_{k}=0. Note that among many possible choices for the gain matrix KK, it can be designed such that the term CiKiCiC_{i}K^{i}C_{i}^{\top} is large enough to make rkir_{k}^{i} more affected by the fault 𝐟ki\mathbf{f}^{i}_{k}. Thus, although 𝐟ki\mathbf{f}^{i}_{k} may also appear in ηk1\eta_{k-1} and 𝐞k1\mathbf{e}_{k-1}, since ρ(A^)<1\rho(\widehat{A})<1 the term CiA^i𝐞k1C_{i}\widehat{A}_{i}\mathbf{e}_{k-1} in (28) remains small and negligible as compared to larger values of 𝐟ki\mathbf{f}^{i}_{k} and CiKiCi𝐟kiC_{i}K^{i}C_{i}^{\top}\mathbf{f}^{i}_{k}. Therefore, by Schur stability of matrix A^\widehat{A} and proper choice of the gain matrix KK, the residual at faulty sensor ii is more affected while for the other sensors jij\neq i, the residuals are less affected since 𝐟kj=0\mathbf{f}^{j}_{k}=0. The fault detection and isolation logic is therefore as follows: if the residual rkir_{k}^{i} at sensor ii exceeds a predefined threshold then sensor ii is faulty. Thus, we need to define the thresholds on residuals for fault detection and isolation. In this direction, considering Gaussian noise 𝒩(0,Q)\mathcal{N}(0,Q) for system dynamics and 𝒩(0,R)\mathcal{N}(0,R) for measurement noise, first the variance of the estimation error 𝐞ki\mathbf{e}_{k}^{i} is obtained and consequently one can define a threshold such that to detect faults whenever the residual exceeds this threshold.

Let Pk=𝔼(𝐞k𝐞k)P_{k}=\mathbb{E}(\mathbf{e}_{k}\mathbf{e}_{k}^{\top}) and Σ=𝔼(ηkηk)\Sigma=\mathbb{E}(\eta_{k}\eta^{\top}_{k}). Then, it follows that:

Pk=\displaystyle P_{k}= A^Pk1A^+Σ\displaystyle\widehat{A}P_{k-1}\widehat{A}^{\top}+\Sigma
=\displaystyle= A^kP0(A^)k+j=0k1A^jΣ(A^)j.\displaystyle\widehat{A}^{k}P_{0}(\widehat{A}^{\top})^{k}+\sum_{j=0}^{k-1}\widehat{A}^{j}\Sigma(\widehat{A}^{\top})^{j}. (30)

Recall that ρ(A^)<1\rho(\widehat{A})<1, in the steady-state we have,

P=limkPk=j=0A^jΣ(A^)j.\displaystyle P_{\infty}=\lim_{k\rightarrow\infty}P_{k}=\sum_{j=0}^{\infty}\widehat{A}^{j}\Sigma(\widehat{A}^{\top})^{j}.

Let b=A^2<1b=\|\widehat{A}\|_{2}<1, then using the result of [61] it can be proved that,

P2Σ21b2.\displaystyle\|P_{\infty}\|_{2}\leq\frac{\|\Sigma\|_{2}}{1-b^{2}}. (31)

On the other hand, for the fault-free case,

ηkηk=\displaystyle\eta_{k}\eta^{\top}_{k}= (INnKDC)(𝟏NNνk1νk1)(INnKDC)\displaystyle(I_{Nn}-KD_{C})(\mathbf{1}_{NN}\otimes\nu_{k-1}\nu_{k-1}^{\top})(I_{Nn}-KD_{C})^{\top}
+(KD¯C)ζkζk(KD¯C)\displaystyle+(K\overline{D}_{C})\zeta_{k}\zeta_{k}^{\top}(K\overline{D}_{C})^{\top} (32)

where 𝟏NN\mathbf{1}_{NN} is NN by NN matrix of 11s. Then,

Σ=\displaystyle\Sigma= (INnKDC)(𝟏NNQ)(INnKDC)\displaystyle(I_{Nn}-KD_{C})(\mathbf{1}_{NN}\otimes Q)(I_{Nn}-KD_{C})^{\top}
+(KD¯C)R(KD¯C)\displaystyle+(K\overline{D}_{C})R(K\overline{D}_{C})^{\top} (33)

The 22-norm of Σ\Sigma is upper-bounded by,

Σ2\displaystyle\|\Sigma\|_{2}\leq (INnKDC)(𝟏NNQ)(INnKDC)2\displaystyle\|(I_{Nn}-KD_{C})(\mathbf{1}_{NN}\otimes Q)(I_{Nn}-KD_{C})^{\top}\|_{2}
+(KD¯C)R(KD¯C)2\displaystyle+\|(K\overline{D}_{C})R(K\overline{D}_{C})^{\top}\|_{2}
\displaystyle\leq INnKDC22NQ2+K22R¯2\displaystyle\|I_{Nn}-KD_{C}\|_{2}^{2}N\|Q\|_{2}+\|K\|_{2}^{2}\|\overline{R}\|_{2} (34)

where R¯=blockdiag(CiRiCi)\overline{R}=\mbox{blockdiag}(C_{i}^{\top}R_{i}C_{i}). Let INnKDC22=α1\|I_{Nn}-KD_{C}\|_{2}^{2}=\alpha_{1}, K22=α2\|K\|_{2}^{2}=\alpha_{2}, and R¯2=βR2\|\overline{R}\|_{2}=\beta\|R\|_{2}. Then, from (31) and scaling the error covariance by NN (number of sensors),

P2Nα1NQ2+α2βR2N(1b2)=Φ\displaystyle\frac{\|P_{\infty}\|_{2}}{N}\leq\frac{\alpha_{1}N\|Q\|_{2}+\alpha_{2}\beta\|R\|_{2}}{N(1-b^{2})}=\Phi (35)

In fact, equation (35) gives an upper-bound on the variance of estimation error at each sensor 𝐞ki\mathbf{e}_{k}^{i}. Following the Gaussianity of estimation error, one can claim that with probability more than 68%68\% the estimation error lies within |𝐞ki𝔼(𝐞ki)|<Φ|\mathbf{e}_{k}^{i}-\mathbb{E}(\mathbf{e}_{k}^{i})|<\Phi. It can be proved that in the steady state, the fault-free estimation error is unbiased, i.e. 𝔼(𝐞ki)=0\mathbb{E}(\mathbf{e}_{k}^{i})=0 [61]. Therefore, one can claim that with probability more than 68%68\% we have 𝐫ki=|Ci𝐞ki+ζki|<cΦ+R\mathbf{r}_{k}^{i}=|C_{i}\mathbf{e}_{k}^{i}+\mathbf{\zeta}^{i}_{k}|<c\Phi+R where the constant cc is the absolute value of the measurement vector (with the assumption of having one state measurement by every sensor, cc is the absolute value of the nonzero entry of CiC_{i}). Similarly, with probability more than 95%95\%, the residual lies below 2cΦ+2R2c\Phi+2R, i.e. 𝐫ki<2cϕ+2R\mathbf{r}_{k}^{i}<2c\phi+2R, and with probability more than 99%99\% we have 𝐫ki<3cΦ+3R\mathbf{r}_{k}^{i}<3c\Phi+3R, etc. Therefore, in the presence of possible sensor faults, one can detect and isolate sensor faults by comparing thresholds with the residuals based on these probabilities. In other words, considering T68%=cΦ+RT_{68\%}=c\Phi+R as threshold, one can claim fault detection with probability 68%68\% whenever the residual goes over this threshold. In this case, the probability of false alarm is less than 32%32\%. Similarly, stronger thresholds for fault detection and isolation can be defined as T95%=2cΦ+2RT_{95\%}=2c\Phi+2R, T99%=3cΦ+3RT_{99\%}=3c\Phi+3R, etc.

VI Sensor Replacement for Observability Recovery

The measurement of faulty sensors may be compensated in terms of observability recovery and in this direction, the concept of observational equivalence is a relevant term [46, 47]. The faulty sensor measurement can be replaced by a measurement of an equivalent state to recover the loss of observability. Note that in this paper, to avoid trivial case, it is assumed that the fault is inherent with the measurement of the specific state measured by the faulty sensor. This could be due to, for example, environmental condition of the sensor location resulting to the fault/anomaly at the sensor.

Extending the results of [62] to dual case of network observability, the set of necessary sensor measurements can be found over digraph representation of the dynamical system. Note that the system digraph, denoted by 𝒢\mathcal{G}, is defined as the graph associated with the system matrix AA, or the structured system matrix 𝒜\mathcal{A} representing the zero-nonzero pattern of AA. In 𝒢\mathcal{G} every state is represented by a node and every non-zero entry AijA_{ij} (or 𝒜ij\mathcal{A}_{ij}) is represented by a link from node ii to node jj. It is known that many generic properties of the system, including observability and controllability, can be defined over this graph [63]. As an example, consider the following structured system matrix,

𝒜=(0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)\displaystyle\mathcal{A}=\left(\begin{array}[]{cccccccccccc}*&0&0&0&0&0&0&0&0&0&0&0\\ &*&*&0&0&0&0&0&*&0&0&0\\ 0&0&0&*&0&0&0&0&0&0&0&0\\ 0&0&*&*&0&0&0&0&0&0&0&0\\ &0&0&0&*&0&*&0&0&0&0&0\\ 0&0&0&0&0&0&*&*&0&0&0&0\\ 0&0&0&0&0&*&0&0&0&0&0&0\\ 0&0&0&0&0&*&*&*&0&0&0&0\\ 0&0&0&0&0&*&0&0&*&*&*&0\\ 0&0&0&*&0&0&0&0&*&0&0&*\\ 0&0&0&0&0&0&0&0&0&0&0&*\\ 0&0&0&0&0&0&0&0&0&0&*&0\end{array}\right) (48)

where * represents a non-zero entry and as an example a non-zero entry in 𝒜51\mathcal{A}_{51} implies a link from node 55 to node 11 in 𝒢\mathcal{G}. The system digraph 𝒢\mathcal{G} associated to (48) is shown in Fig. 2.

Refer to caption
Figure 2: The system digraph associated with the structured system matrix (48). The self-cycles are shown by a small line on the right-hand-side of the nodes. Nodes of the same color belong to the same SCC. The SCCs inside the black dashed rectangles are parent SCCs.

In the system digraph, 𝒢\mathcal{G}, a Strongly-Connected-Component (SCC) denoted by 𝒮\mathcal{S} is defined as a subgraph in which there is a path from every node to every other node. Among the SCCs, define a parent SCC, denoted by 𝒮p\mathcal{S}^{p}, as the SCC with no outgoing link to other SCCs. In Fig. 2, the SCCs and parent SCCs associated with the structured system matrix (48) are shown. As it can be seen from this figure, the SCC may only include a single node.

It can be proved that the measurement of at least one node in every parent SCC is necessary for system observability [62], while all state nodes in the same parent SCC are observationally equivalent. This implies that the measurements of all nodes in the parent SCC equivalently recover the observability of the system digraph, 𝒢\mathcal{G}.

In this direction, our proposed logic for sensor replacement is as follows: if the faulty sensor measures a state in a parent SCC 𝒮ip\mathcal{S}^{p}_{i}, one can compensate the loss of observability by adding a new sensor measurement of another state node in 𝒮ip\mathcal{S}^{p}_{i}. Otherwise, if the faulty sensor measures a state in no parent SCC, the faulty sensor can be removed with no affect on the estimation performance of the other sensors. For example, in Fig. 2 if the measurement of state 88 is faulty and is removed, measurement of either states 66 or 77 may recover the loss of observability. Note that if the faulty sensor does not measure a necessary system state (a state in a parent SCC), its removal has no affect on distributed observability, and only the communication network needs to be restructured to satisfy the strong connectivity of the sensor network as mentioned in Section III. For example, in Fig. 2 a faulty sensor measuring any state in {2,5,9,10}\{2,5,9,10\} can be removed without loss of system observability. It should be noted that if the parent SCC includes a single node (also referred as parent node), there is no replacement for the faulty measurement. This implies that the parent nodes are more vulnerable to attacks and faults in terms of recovering system observability. For example, this is the case for the measurement of state 11 in Fig. 2.

VII Simulations

VII-A An Illustrative Example

Consider a linear dynamical system with the structure given by the system digraph in Fig. 3.

Refer to caption
Figure 3: An illustrative example for networked estimation of dynamical system over a sensor network where the 1212 nodes graph in the bottom is the system digraph associated with a full-rank dynamical system, the green states are measured by sensors, and the sensor network on top includes a SC communication network of 44 sensors with faulty sensors 33 and 44.

The full-rank system of 1212 states with associated structured matrix (48) is considered to be tracked by a network of 44 sensors. The non-zero entries of 𝒜\mathcal{A} are chosen such that ρ(A)=1.2>1\rho(A)=1.2>1 implying an unstable system dynamics. The system and measurement noise are 𝒩(0,0.04)\mathcal{N}(0,0.04). The measured states are represented by green nodes. Given the set of measurements as in Fig. 3, it can be checked that one state node in every parent SCC 𝒮1p={1}\mathcal{S}_{1}^{p}=\{1\}, 𝒮2p={3,4}\mathcal{S}_{2}^{p}=\{3,4\}, 𝒮3p={6,7,8}\mathcal{S}_{3}^{p}=\{6,7,8\}, 𝒮4p={11,12}\mathcal{S}_{4}^{p}=\{11,12\} is measured and therefore the pair (A,C)(A,C) is observable. The sensors estimate the system state over time using the networked estimator (19)-(20). The sensors share their a-priori estimates over the SC communication network (or sensor network) shown in Fig. 3. This network represents the structure of WW matrix while the entries of WW are chosen randomly such that the stochasticity of WW is satisfied. Having WW to be irreducible, for the networked system the conditions for networked observability (or distributed observability) are hold [42]. Therefore, one can design the proper gain matrix KK using the LMI procedure described in Section IV. Applying this block-diagonal KK matrix, ρ(A^)<1\rho(\widehat{A})<1 and therefore all sensor errors are stable.

Next, it is assumed that the sensor 33 is faulty after time-step 2525 and sensor 44 is faulty after time-step 4040, i.e. 𝐟k253=0.6\mathbf{f}^{3}_{k\geq 25}=0.6, 𝐟k404=0.4\mathbf{f}^{4}_{k\geq 40}=0.4. The residuals corresponding to this fault scenario are shown in Fig. 4.

Refer to caption
Figure 4: The residuals of 44 sensors monitoring the system states to detect possible faults at sensors where the residuals corresponding to sensors 33 and 44 exceed T99%T_{99\%} and T95%T_{95\%} thresholds, implying that the sensor 33 and sensor 44 are faulty, with probability more than 99%99\% and 95%95\%, respectively.

As shown in this figure, the faulty sensors can be detected and isolated once the residuals corresponding to the faulty sensor exceed the thresholds. For this simulation, we have b=A^2=0.63b=\|\widehat{A}\|_{2}=0.63, α1=I48KDC22=3.83\alpha_{1}=\|I_{48}-KD_{C}\|_{2}^{2}=3.83, α2=K22=3.84\alpha_{2}=\|K\|_{2}^{2}=3.84, |C3K3C3|=3.01|C_{3}K^{3}C_{3}^{\top}|=3.01, and |C4K4C4|=3.05|C_{4}K^{4}C_{4}^{\top}|=3.05. Further, the non-zero entries of the measurement matrix CC are considered to be equal to 11, and therefore β=1\beta=1 and c=1c=1. Then, using equation (35), it follows that Φ=0.31\Phi=0.31 and the thresholds are,

T68%=0.35,T95%=0.70,T99%=1.05.\displaystyle T_{68\%}=0.35,~{}~{}T_{95\%}=0.70,~{}~{}T_{99\%}=1.05. (49)

Based on these thresholds, one can claim the detection of fault at sensor 33 with probability more than 99%99\% and fault at sensor 44 with probability more than 95%95\%. The probability of false alarms are less than 1%1\% and less than 5%5\%, respectively.

Next, one can compensate for measurement of the faulty sensors by replacing observationally equivalent state measurements from 𝒮3p={6,7,8}\mathcal{S}_{3}^{p}=\{6,7,8\} and 𝒮4p={11,12}\mathcal{S}_{4}^{p}=\{11,12\}. In Fig. 5, the set of states which are observationally equivalent to state 1212 (measured by sensor 33) and to state 88 (measured by sensor 44) are, respectively, shown in blue and brown colors.

Refer to caption
Figure 5: Observability recovery where the set of states {11,12}\{11,12\} and {6,7,8}\{6,7,8\} whose measurements are observationally equivalent are shown, respectively, in blue and brown colors. The faulty sensors 33 and 44 in Fig. 3 are replaced with new sensors 3B3B and 4B4B, respectively, measuring the observationally equivalent states 1111 and 77.

Replacing the faulty sensors 33 and 44 with sensors 3B3B and 4B4B measuring observationally equivalent states 1111 and 77, the loss of observability for estimation procedure is recovered. Note that we assume communication links for sensors 3B3B and 4B4B to be the same as sensor 33 and 44. The Mean-Squared-Estimation-Error (MSEE) at all sensors in the compensated framework is shown in Fig. 6. As it is clear the MSEE is bounded steady-state stable at all sensors.

Refer to caption
Figure 6: The MSEE at 44 sensors given in Fig. 5 with sensor 3B3B replacing the faulty sensor 33 and sensor 4B4B replacing the faulty sensor 44 in Fig. 3.

As compared to simulations in [3, 4, 19] where the underlying dynamical system is stable, the example in this section provides a distributed estimation and fault detection of an unstable system. As compared to [43, 12, 13, 44, 45, 10], in this example the noise follows Gaussian distribution and no bound on the noise is assumed. Moreover, in [7, 18, 11, 25], no system and/or measurement noise is considered. As compared to the distributed estimation in [33, 34, 32] which requires a complete all-to-all network of sensors (i.e. for this example every sensor is directly connected to all other 33 sensors), our networked estimator only requires an SC sensor network. Among the related literature, we provide a comparison with a recent work in the next subsection.

VII-B A Comparison Study

Here, our proposed sensor fault detection, isolation, and distributed estimation protocol is compared with the recent work [23]. In [23], a multi time-scale resilient distributed estimation strategy is proposed along with attack/fault detection. The reason for selecting this work for comparison is that its assumptions and framework is similar to our proposed strategy in the following aspects: (i) it considers the underlying system dynamics to be unstable; (ii) the estimation and attack detection scheme is not centralized but distributed over a sensor network; (iii) it assumes a connected undirected network of sensors, while similarly we assume a SC network; and (iv) an additive term is considered in the presence of noise on the sensors representing possible biasing attacks/faults at sensors similar to our assumption. Note that in [23], each sensor performs LL steps of (consensus) averaging over the sensor network between every two steps k1k-1 and kk of system dynamics and it is claimed that the proposed protocol is resilient to sensor faults of certain magnitude, while it is capable of detecting and isolating the attacked sensors in certain conditions. They particularly state that under certain conditions the attacked sensor can be detected while some other attacks cannot be detected (remain stealthy), and therefore two sets of detectable attacking set (DAS) and undetectable (or stealthy) attacking set (UAS) are defined. The proposed strategy in [23] is simulated over the same dynamic system and sensor network in Fig. 3 and all the conditions including initial state values, noise values, etc are considered similar to the previous subsection, while the sensor network is reconsidered as an undirected cycle 123411\leftrightarrow 2\leftrightarrow 3\leftrightarrow 4\leftrightarrow 1. The parameters for the distributed estimation protocol are as in Table I.

TABLE I: Prameter values for resilient distributed estimation and attack detection protocol in [23].
LL 2020 α\alpha 0.20.2 β\beta 0.50.5
A\|A\| 1.371.37 γ\gamma 0.210.21 NN 44
ss 22 bwb_{w} 0.10.1 bvb_{v} 0.10.1
λ0\lambda_{0} 1.91.9 η0\eta_{0} 0.10.1 ρt0\rho_{t_{0}} 0.10.1

Under these parameters the MSEE of 44 sensors are shown in Fig. 7. Note that in this simulation, every sensor performs L=20L=20 steps of consensus on estimations of neighboring sensors as compared to 11 step consensus in protocol (19)-(20). This requires a processing setup 2020 times faster than our proposed networked estimation setup.

Refer to caption
Figure 7: The MSEE at 44 sensors state estimation under the proposed protocol in [23].

The sensor attack detection logic in [23] is as follows: if the measurement update 𝐲kiCiA𝐱^k1i\mathbf{y}^{i}_{k}-C_{i}A\widehat{\mathbf{x}}^{i}_{k-1} is larger than a pre-defined threshold Φk\Phi_{k}, then the attack at sensor ii is detected, otherwise the sensor is either attack-free or the attack remains undetected (stealthy). Running the proposed attack detection in [23] based on the parameters in Table I, the measurement updates and the threshold Φk\Phi_{k} are shown in Fig. 8. As it is clear from this figure, both attacks/faults at sensors 33 and 44 are not large enough to be detected and both remain stealthy. However, as shown in Fig. 4, our proposed approach can detect and isolate both faults.

Refer to caption
Figure 8: The threshold Φk\Phi_{k} and the measurement updates at 44 sensors estimating the system in Fig. 3 under the proposed attack detection scheme in [23]. As claimed in [23] the attack/fault is detected if the measurement updates exceed the Φk\Phi_{k}. In this case, both attacks/faults at sensor 33 and 44 remain undetected (stealthy).

VIII Concluding Remarks

In this paper, a fault detection and isolation scenario for a networked estimator is presented by defining the sensor residuals and probability-based thresholds to detect and isolate sensor faults. The thresholds are defined based on the upper-bound on the norm of the estimation error covariance. It should be emphasized that the FDI and distributed estimation strategy in this paper is of polynomial order computational complexity. In fact, the protocol (19)-(20) as a variant of the protocol in [39] is of P-order complexity and according to [58, 59], Algorithm 1 and similar cone-complementarity algorithms are of P-order complexity, implying that the design of block-diagonal gain matrix in Section IV is of P-order complexity. For fault detection and mitigation strategy, the decomposition of the system digraph into SCCs, determining their partial order, and finding parent SCCs are based on the Depth-First-Search (DFS) algorithm [64] with complexity 𝒪(n2)\mathcal{O}(n^{2}). The computational complexity of the threshold design in (35) based on the 2-norm of the covariance is 𝒪(n3)\mathcal{O}(n^{3}). Therefore, the overall strategy in this paper is of P-order complexity and hence scalable to large-scale applications. Note that although the simulation in Section VII is given for a small-scale example system, the P-order complexity guarantees the scalability of the proposed scheme to large-scale systems.

It is worth noting that the fault detection and isolation can be improved by defining tighter upper-bounds on the residuals. This can be done by designing the gain matrix KK to reduce INnKDC22\|I_{Nn}-KD_{C}\|_{2}^{2}, and K22\|K\|_{2}^{2}. The optimal LMI approach to optimize the gain KK to satisfy such conditions is the direction of our future research. A promising topic of future interest is optimal design of the communication network (the structure of the matrix WW) and cost-optimal selection of measurements for observability recovery to reduce the estimation costs at sensor network. This may include reducing both the sensor embedding costs and communication costs in sensor networks. As another direction of future research we are currently working to extend this fault detection and isolation as well as fault compensation scenario to general rank-deficient systems. In this direction, the networked estimation protocol needs to be updated to consider measurement sharing over the sensor network.

Acknowledgement

We would like to thank Prof. Usman A. Khan from Tufts University for his helpful comments and suggestions on this paper. We also thank the authors of reference [23] for providing their paper and related files.

References

  • [1] I. Hwang, S. Kim, Y. Kim, and C. E. Seah, “A survey of fault detection, isolation, and reconfiguration methods,” IEEE Transactions on Control Systems Technology, vol. 18, no. 3, pp. 636–653, 2009.
  • [2] M. Davoodi, N. Meskin, and K. Khorasani, “Simultaneous fault detection, isolation and control tracking design using a single observer-based module,” in American Control Conference.   IEEE, 2014, pp. 3047–3052.
  • [3] S. Jee, H. J. Lee, and Y. H. Joo, “H-/H\infty sensor fault detection and isolation in linear time-invariant systems,” International Journal of Control, Automation and Systems, vol. 10, no. 4, pp. 841–848, 2012.
  • [4] Z. Li and I. M. Jaimoukha, “Observer-based fault detection and isolation filter design for linear time-invariant systems,” International Journal of Control, vol. 82, no. 1, pp. 171–182, 2009.
  • [5] R. M. G. Ferrari, T. Parisini, and M. M. Polycarpou, “Distributed fault detection and isolation of large-scale discrete-time nonlinear systems: An adaptive approximation approach,” IEEE Transactions on Automatic Control, vol. 57, no. 2, pp. 275–290, 2011.
  • [6] M. Davoodi, K. Khorasani, H. A. Talebi, and H. R. Momeni, “Distributed fault detection and isolation filter design for a network of heterogeneous multiagent systems,” IEEE Transactions on Control Systems Technology, vol. 22, no. 3, pp. 1061–1069, 2013.
  • [7] A. Teixeira, I. Shames, H. Sandberg, and K. H. Johansson, “Distributed fault detection and isolation resilient to network model uncertainties,” IEEE transactions on cybernetics, vol. 44, no. 11, pp. 2024–2037, 2014.
  • [8] I. Shames, A. Teixeira, H. Sandberg, and K. H. Johansson, “Distributed fault detection for interconnected second-order systems,” Automatica, vol. 47, no. 12, pp. 2757–2764, 2011.
  • [9] Y. Quan, W. Chen, Z. Wu, and L. Peng, “Observer-based distributed fault detection and isolation for heterogeneous discrete-time multi-agent systems with disturbances,” IEEE Access, vol. 4, pp. 4652–4658, 2016.
  • [10] A. Marino, F. Pierri, and F. Arrichiello, “Distributed fault detection isolation and accommodation for homogeneous networked discrete-time linear systems,” IEEE Transactions on Automatic Control, vol. 62, no. 9, pp. 4840–4847, 2017.
  • [11] Y. Li, H. Fang, J. Chen, and C. Shang, “Distributed fault detection and isolation for multi-agent systems using relative information,” in American Control Conference (ACC).   IEEE, 2016, pp. 5939–5944.
  • [12] M. S. Chong, M. Wakaiki, and J. P. Hespanha, “Observability of linear systems under adversarial attacks,” in American Control Conference (ACC).   IEEE, 2015, pp. 2439–2444.
  • [13] C. Lee, H. Shim, and Y. Eun, “Secure and robust state estimation under sensor attacks, measurement noises, and process disturbances: Observer-based combinatorial approach,” in European Control Conference (ECC).   IEEE, 2015, pp. 1872–1877.
  • [14] X. Ren, Y. Mo, J. Chen, and K. H. Johansson, “Secure state estimation with byzantine sensors: A probabilistic approach,” IEEE Transactions on Automatic Control, 2020.
  • [15] M. Pajic, I. Lee, and G. J. Pappas, “Attack-resilient state estimation for noisy dynamical systems,” IEEE Transactions on Control of Network Systems, vol. 4, no. 1, pp. 82–92, 2016.
  • [16] Y. Shoukry, P. Nuzzo, A. Puggelli, A. L. Sangiovanni-Vincentelli, S. A. Seshia, and P. Tabuada, “Secure state estimation for cyber-physical systems under sensor attacks: A satisfiability modulo theory approach,” IEEE Transactions on Automatic Control, vol. 62, no. 10, pp. 4917–4932, 2017.
  • [17] M. Pajic, J. Weimer, N. Bezzo, P. Tabuada, O. Sokolsky, I. Lee, and G. J. Pappas, “Robustness of attack-resilient state estimators,” in ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS).   IEEE, 2014, pp. 163–174.
  • [18] Y. Chen, S. Kar, and J. M. F. Moura, “Dynamic attack detection in cyber-physical systems with side initial state information,” IEEE Transactions on Automatic Control, vol. 62, no. 9, pp. 4618–4624, 2016.
  • [19] M. Deghat, V. Ugrinovskii, I. Shames, and C. Langbort, “Detection and mitigation of biasing attacks on distributed estimation networks,” Automatica, vol. 99, pp. 369–381, 2019.
  • [20] Y. Guan and X. Ge, “Distributed attack detection and secure estimation of networked cyber-physical systems against false data injection attacks and jamming attacks,” IEEE Transactions on Signal and Information Processing over Networks, vol. 4, no. 1, pp. 48–59, 2017.
  • [21] L. Su and S. Shahrampour, “Finite-time guarantees for byzantine-resilient distributed state estimation with noisy measurements,” IEEE Transactions on Automatic Control, 2019.
  • [22] L. An and G. Yang, “Distributed secure state estimation for cyber-physical systems under sensor attacks,” Automatica, vol. 107, pp. 526–538, 2019.
  • [23] X. He, X. Ren, H. Sandberg, and K. H. Johansson, “How to secure distributed filters under sensor attacks?” arXiv preprint arXiv:2004.05409, 2020.
  • [24] W. Ao, Y. Song, and C. Wen, “Distributed secure state estimation and control for cpss under sensor attacks,” IEEE transactions on cybernetics, vol. 50, no. 1, pp. 259–269, 2018.
  • [25] A. Mitra and S. Sundaram, “Secure distributed state estimation of an lti system over time-varying networks and analog erasure channels,” in American Control Conference.   IEEE, 2018, pp. 6578–6583.
  • [26] A. Alanwar, H. Said, and M. Althoff, “Distributed secure state estimation using diffusion kalman filters and reachability analysis,” in IEEE 58th Conference on Decision and Control (CDC).   IEEE, 2019, pp. 4133–4139.
  • [27] Y. Chen, S. Kar, and J. M. F. Moura, “Resilient distributed estimation through adversary detection,” IEEE Transactions on Signal Processing, vol. 66, no. 9, pp. 2455–2469, 2018.
  • [28] K. Manandhar, X. Cao, F. Hu, and Y. Liu, “Detection of faults and attacks including false data injection attack in smart grid using kalman filter,” IEEE Transactions on Control of Network Systems, vol. 1, no. 4, pp. 370–379, 2014.
  • [29] X. Wei and M. Verhaegen, “Fault detection of large scale wind turbine systems: A mixed H{H}_{\infty}/H{H}- index observer approach,” in 16th Mediterranean Conference on Control and Automation.   IEEE, 2008, pp. 1675–1680.
  • [30] N. Meskin and K. Khorasani, Fault detection and isolation: Multi-vehicle unmanned systems.   Springer Science & Business Media, 2011.
  • [31] F. Garin and L. Schenato, “A survey on distributed estimation and control applications using linear consensus algorithms,” in Networked control systems.   Springer, 2010, pp. 75–107.
  • [32] S. M. Azizi and K. Khorasani, “Networked estimation of relative sensing multiagent systems using reconfigurable sets of subobservers,” IEEE Transactions on Control Systems Technology, vol. 22, no. 6, pp. 2188–2204, 2014.
  • [33] T. Boukhobza, F. Hamelin, S. Martinez-Martinez, and D. Sauter, “Structural analysis of the partial state and input observability for structured linear systems: Application to distributed systems,” European Journal of Control, vol. 15, no. 5, pp. 503–516, Oct. 2009.
  • [34] S. Kar, J. M. F. Moura, and K. Ramanan, “Distributed parameter estimation in sensor networks: Nonlinear observation models and imperfect communication,” IEEE Transactions on Information Theory, vol. 58, no. 6, pp. 3575–3605, 2012.
  • [35] S. Tu and A. Sayed, “Diffusion strategies outperform consensus strategies for distributed estimation over adaptive networks,” IEEE Transactions on Signal Processing,, vol. 60, no. 12, pp. 6217–6234, 2012.
  • [36] G. Battistelli, L. Chisci, G. Mugnai, A. Farina, and A. Graziano, “Consensus-based algorithms for distributed filtering,” in 51st IEEE Conference on Decision and Control, 2012, pp. 794–799.
  • [37] S. Park and N. Martins, “Necessary and sufficient conditions for the stabilizability of a class of LTI distributed observers,” in 51st IEEE Conference on Decision and Control, 2012, pp. 7431–7436.
  • [38] M. Doostmohammadian and U. Khan, “On the genericity properties in distributed estimation: Topology design and sensor placement,” IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 2, pp. 195–204, 2013.
  • [39] M. Doostmohammadian, H. R. Rabiee, and U. A. Khan, “Cyber-social systems: modeling, inference, and optimal design,” IEEE Systems Journal, vol. 14, no. 1, pp. 73–83, 2020.
  • [40] C. Commault, J. Dion, and S. Y. Agha, “Structural analysis for the sensor location problem in fault detection and isolation,” Automatica, vol. 44, no. 8, pp. 2074–2080, 2008.
  • [41] C. Commault and J. Dion, “Sensor location for diagnosis in linear systems: A structural analysis,” IEEE Transactions on Automatic Control, vol. 52, no. 2, pp. 155–169, 2007.
  • [42] M. Doostmohammadian and U. A. Khan, “On the characterization of distributed observability from first principles,” in 2nd IEEE Global Conference on Signal and Information Processing, 2014, pp. 914–917.
  • [43] M. Pajic, P. Tabuada, I. Lee, and G. J. Pappas, “Attack-resilient state estimation in the presence of noise,” in 54th IEEE Conference on Decision and Control (CDC).   IEEE, 2015, pp. 5827–5832.
  • [44] A. R. Kodakkadan, M. Pourasghar, V. Puig, S. Olaru, C. Ocampo-Martinez, and V. Reppa, “Observer-based sensor fault detectability: About robust positive invariance approach and residual sensitivity,” IFAC-PapersOnLine, vol. 50, no. 1, pp. 5041–5046, 2017.
  • [45] S. Kar and J. M. F. Moura, “Gossip and distributed Kalman filtering: Weak consensus under weak detectability,” IEEE Transactions on Signal Processing, vol. 59, no. 4, pp. 1766–1784, April 2011.
  • [46] M. Doostmohammadian and U. A. Khan, “Measurement partitioning and observational equivalence in state estimation,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 4855–4859.
  • [47] M. Doostmohammadian, H. R. Rabiee, H. Zarrabi, and U. A. Khan, “Distributed estimation recovery under sensor failure,” IEEE Signal Processing Letters, vol. 24, no. 10, pp. 1532–1536, 2017.
  • [48] M. Doostmohammadian and U. Khan, “On the complexity of minimum-cost networked estimation of self-damped dynamical systems,” IEEE Transactions on Network Science and Engineering, 2019.
  • [49] O. Ennasr and X. Tan, “Distributed localization of a moving target: Structural observability-based convergence analysis,” in American Control Conference.   IEEE, 2018, pp. 2897–2903.
  • [50] R. Olfati-Saber, “Distributed Kalman filters with embedded consensus filters,” in 44th IEEE Conference on Decision and Control, Seville, Spain, Dec. 2005, pp. 8179–8184.
  • [51] J. Bay, Fundamentals of linear state space systems.   McGraw-Hill, 1999.
  • [52] M. Doostmohammadian and U. A. Khan, “Minimal sufficient conditions for structural observability/controllability of composite networks via kronecker product,” IEEE Transactions on Signal and Information processing over Networks, vol. 6, pp. 78–87, 2020.
  • [53] U. A. Khan and A. Jadbabaie, “Coordinated networked estimation strategies using structured systems theory,” in 49th IEEE Conference on Decision and Control, Orlando, FL, Dec. 2011, pp. 2112–2117.
  • [54] L. E. Ghaoui, F. Oustry, and M. A. Rami, “A cone complementarity linearization algorithm for static output-feedback and related problems,” IEEE Transactions on Automatic Control, vol. 42, no. 8, pp. 1171–1176, 1997.
  • [55] O. L. Mangasarian and J. S. Pang, “The extended linear complementarity problem,” SIAM Journal on Matrix Analysis and Applications, vol. 2, pp. 359–368, Jan. 1995.
  • [56] M. Pajic, S. Sundaram, J. Le Ny, G. Pappas, and R. Mangharam, “The wireless control network: Synthesis and robustness,” in 49th Conference on Decision and Control, Orlando, FL, 2010, pp. 7576–7581.
  • [57] A. I. Zečević and D. D. Šiljak, “Control design with arbitrary information structure constraints,” Automatica, vol. 44, no. 10, pp. 2642–2647, 2008.
  • [58] Y. Ye, “A fully polynomial-time approximation algorithm for computing a stationary point of the general linear complementarity problem,” Mathematics of Operations Research, vol. 18, no. 2, pp. 334–345, 1993.
  • [59] Y. Nesterov and A. Nemirovskii, Interior-point polynomial algorithms in convex programming.   SIAM, 1994.
  • [60] S. Sundaram, “Fault-tolerant and secure control systems,” University of Waterloo, Lecture Notes, 2012.
  • [61] U. A. Khan and A. Jadbabaie, “Collaborative scalar-gain estimators for potentially unstable social dynamics with limited communication,” Automatica, vol. 50, no. 7, pp. 1909–1914, 2014.
  • [62] M. Doostmohammadian, “Minimal driver nodes for structural controllability of large-scale dynamical systems: Node classification,” IEEE Systems Journal, 2019.
  • [63] C. Lin, “Structural controllability,” IEEE Transactions on Automatic Control, vol. 19, no. 3, pp. 201–208, Jun. 1974.
  • [64] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms.   MIT Press, 2009.