Covariance-Robust Dynamic Watermarking
Abstract
Attack detection and mitigation strategies for cyberphysical systems (CPS) are an active area of research, and researchers have developed a variety of attack-detection tools such as dynamic watermarking. However, such methods often make assumptions that are difficult to guarantee, such as exact knowledge of the distribution of measurement noise. Here, we develop a new dynamic watermarking method that we call covariance-robust dynamic watermarking, which is able to handle uncertainty in the covariance of measurement noise. Specifically, we consider two cases: in the first, this covariance is fixed but unknown, and in the second, it is slowly varying. Our tests require knowledge only of a set within which the covariance lies. Furthermore, we connect this problem to that of algorithmic fairness and the nascent field of fair hypothesis testing, and we show that our tests satisfy some notions of fairness. Finally, we demonstrate the efficacy of our tests on empirical examples chosen to reflect values observed in a standard simulation model of autonomous vehicles.
I INTRODUCTION
The development of 5G, the “fifth generation” of wireless technology, brings with it increased bandwidth, massive-scale device-to-device (D2D) connections, lower latency, and high reliability. The latency reductions with 5G open the door to further growth in cyberphysical systems (CPS), which involve the intercommunication and real-time management of large numbers of physical sensors and actuators, often in shifting environments [1]. System vulnerabilities to malicious agents abound in all of these technologies [2, 3, 4], and 5G in particular necessitates more robust cyber-security measures for the relevant control systems [1].
Much of the existing work on security for CPS assumes that the system is fixed and all required distributions are exactly known [5, 6, 7]. However, the system description is often time-varying or partially uncertain for many CPS [8, 9]. Given this real-world motivation of security for time-varying or partially unknown CPS, we focus in this paper on designing robust security schemes to test for adversarial attacks on LTI systems. Recent work has established dynamic watermarking as a key active method for detecting sensor attacks [5, 6, 10, 11, 12, 7, 13, 14], and here we build on this work by designing covariance-robust dynamic watermarking.
We design robust watermarking for two sub-cases: the first is where the covariance of measurement noise is fixed but unknown, and the second is where it is unknown and slowly varying. The first reflects a scenario of many nearly identical systems with variation between copies of the system. The second reflects a scenario where a sensor has different accuracy in different regimes, such as lidar on an autonomous vehicle in changing weather. Attack detection is critical in all such cases, and we need statistical tests that retain their power in the face of system changes or uncertainty.
I-A Fairness
Robust data-driven decision-making has gained attention in the literature on algorithmic fairness. Motivated by machine learning tasks with societal applications, the fairness literature has sought to design learning methods that refrain from considering certain variables. To that end, this body of work defines rigorous, mathematical notions of fairness for supervised learning [15, 16, 17, 18, 19, 20, 21, 22], which have recently been extended to unsupervised learning by [23, 24].
The work in [25] outlines a general framework: Consider $(X, Y, Z)$ with a joint distribution $\mathbb{P}$, where $X$ are exogenous inputs, $Y$ are endogenous "targets", and $Z$ is a "protected attribute". The goal is to choose a decision rule $\delta$ that makes a decision $\delta(X)$ using inputs $X$, in order to minimize some risk function $R(\delta)$. In dynamic watermarking: $X$ are measurements, $Y$ is a binary variable that denotes if the system is under attack, and $Z$ is the true system characterization; our decision rule for if the system is under attack is made without $Y$ and $Z$, which are not observed. We then define a decision rule to be without disparate impact if
(1) $\delta(X) \perp Z$
where $\delta(X) \perp Z$ means $\delta(X)$ is independent of $Z$. This increases fairness because it removes any impact of $Z$ on the decision by imposing independence as a constraint. However, some [18, 16] have argued that this definition of fairness can be too restrictive in some cases and that equalized odds is a better definition of fairness. Its only difference is that in (1) we replace $\delta(X) \perp Z$ with $\delta(X) \perp Z \mid Y$. That is, equalized odds asks for independence of $\delta(X)$ and $Z$ when conditioned on $Y$. We can interpret equalized odds as requiring error rates to be similar across protected groups. Finally, a notion associated with equalized odds is that of equal opportunity, which amounts to enforcing $\delta(X) \perp Z \mid Y = y$, for some value $y$. This is relevant when one particular type of error is of more interest than another.
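As a concrete illustration of these definitions, the following sketch (with synthetic data and naming of our own choosing, not part of the framework in [25]) estimates disparate-impact and equalized-odds gaps for a binary decision rule from samples.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.integers(0, 2, size=n)                    # protected attribute Z
y = rng.integers(0, 2, size=n)                    # target Y ("under attack")
d = (rng.random(n) < 0.3 + 0.2 * z).astype(int)   # a decision rule that leaks Z

# Disparate impact gap: | P(d=1 | z=1) - P(d=1 | z=0) |; zero under (1).
di_gap = abs(d[z == 1].mean() - d[z == 0].mean())

# Equalized-odds gaps: the same difference conditioned on each value of Y;
# equal opportunity checks the gap for a single value of Y only.
eo_gaps = [abs(d[(z == 1) & (y == v)].mean() - d[(z == 0) & (y == v)].mean())
           for v in (0, 1)]
print(di_gap, eo_gaps)
```

A rule satisfying (1) drives the first gap to zero, while equalized odds only requires the per-$y$ gaps to vanish.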
I-B Relevance of Fairness to Watermarking
Fairness is relevant to the design of robust tests for two reasons. First, it provides a well-established technical language with which to discuss our requirement of robustness. Past dynamic watermarking techniques require exact system knowledge, and as such the corresponding watermarking tests will have error rates that are biased over inevitable system perturbations or uncertainties. Fairness notions such as equalized odds and equal opportunity allow for more specific framing of the problem and thus give a framework to design more robust methods for dynamic watermarking.
Second, robust cyber-security methods will have improved social impacts, which is the most general way of interpreting "fairness". For example, smart homes can have many sensors. Changes in the distribution of sensor noise can correlate with factors such as climate, which correlates with geography and thus attributes like race, ethnicity, or class. A systemic bias in the ability to detect threats thus yields, and possibly perpetuates, systemic bias in outcomes among these groups. Robustness of cyber-security methods thus has the potential to improve the societal fairness of the corresponding methods.
I-C Outline
In Sect. II, we outline key terminology and results in dynamic watermarking. In Sect. III-A, we present our covariance-robust dynamic watermarking scheme for the case of fixed, but unknown, measurement noise covariance. This is then extended in Sect. III-B to the case where measurement noise covariance is allowed to slowly vary. Sect. IV presents empirical results that demonstrate efficacy of our approach.
II Preliminaries
We describe the LTI system and attack models, and then review existing results about dynamic watermarking.
II-A LTI System Model
Consider a partially-observed MIMO LTI system
(2) $x_{n+1} = Ax_n + Bu_n + w_n, \qquad y_n = Cx_n + z_n + v_n$
for $x_n \in \mathbb{R}^p$ and $y_n \in \mathbb{R}^q$. Here $w_n$ is mean-zero i.i.d. multivariate Gaussian process noise with covariance matrix $\Sigma_W$, and this is independent of $z_n$, which is i.i.d. Gaussian measurement noise with mean zero; but we assume that the covariance matrix $\Sigma_Z(\theta)$ for $z_n$ is a linear function of a set of parameters $\theta$ taking values in a polyhedron $\Theta$. For now, $\Sigma_Z$ is assumed constant but unknown for any fixed system. The signal $v_n$ is an additive signal chosen by an attacker who seeks to corrupt sensor measurements.
Stabilizability of $(A, B)$ and detectability of $(A, C)$ imply the existence of a controller $K$ and observer $L$ such that $A + BK$ and $A - LC$ are Schur stable. The closed-loop system can be stabilized using the control input $u_n = K\hat{x}_n$, where $\hat{x}_n$ is the observer-estimated state satisfying $\hat{x}_{n+1} = A\hat{x}_n + Bu_n + L(y_n - C\hat{x}_n)$. Define $\tilde{x}_n = [x_n^{\mathsf T}\ \hat{x}_n^{\mathsf T}]^{\mathsf T}$, $\tilde{w}_n = [w_n^{\mathsf T}\ (Lz_n)^{\mathsf T}]^{\mathsf T}$, and
(3) $\tilde{A} = \begin{bmatrix} A & BK \\ LC & A + BK - LC \end{bmatrix}$
We can write the closed-loop evolution of the state and estimated state when $v_n \equiv 0$ as $\tilde{x}_{n+1} = \tilde{A}\tilde{x}_n + \tilde{w}_n$. Alternatively, we may define the observation error $\delta_n = x_n - \hat{x}_n$. Let $\bar{x}_n = [x_n^{\mathsf T}\ \delta_n^{\mathsf T}]^{\mathsf T}$, $\bar{w}_n = [w_n^{\mathsf T}\ (w_n - Lz_n)^{\mathsf T}]^{\mathsf T}$, and
(4) $\bar{A} = \begin{bmatrix} A + BK & -BK \\ 0 & A - LC \end{bmatrix}$
The closed-loop system for this change of variables is $\bar{x}_{n+1} = \bar{A}\bar{x}_n + \bar{w}_n$. Note that $\bar{A}$ is Schur stable since both $A + BK$ and $A - LC$ are Schur stable.
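As a sanity check of this construction, the sketch below builds $\tilde{A}$ and $\bar{A}$ for small hypothetical system matrices (all numeric values are placeholders, not those used later in Sect. IV) and confirms Schur stability via spectral radii.

```python
import numpy as np

# Hypothetical stabilizable/detectable system (placeholder values).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])
K = np.array([[-5.0, -11.0]])   # chosen so A + B K is Schur stable
L = np.array([[0.5], [0.5]])    # chosen so A - L C is Schur stable

rho = lambda M: max(abs(np.linalg.eigvals(M)))
assert rho(A + B @ K) < 1 and rho(A - L @ C) < 1

# Closed loop in (x, xhat) coordinates and in (x, delta) coordinates.
Atilde = np.block([[A, B @ K], [L @ C, A + B @ K - L @ C]])
Abar = np.block([[A + B @ K, -B @ K], [np.zeros((2, 2)), A - L @ C]])
# Block triangularity of Abar makes its Schur stability immediate.
assert abs(rho(Abar) - max(rho(A + B @ K), rho(A - L @ C))) < 1e-10
```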
II-B Attack Model
Following [26], we consider attacks where $v_n = -\alpha(Cx_n + z_n) + C\xi_n + \zeta_n$ for a fixed $\alpha$ and i.i.d. Gaussian $\zeta_n$ with mean zero and covariance matrix $\Sigma_\zeta$. Here, the $\xi_n$ are chosen to follow the process $\xi_{n+1} = (A + BK)\xi_n + \omega_n$, where $\omega_n$ are similarly i.i.d. Gaussian with mean zero and covariance matrix $\Sigma_\omega$. The implication is that the attacker minimizes or mutes the true output $Cx_n + z_n$, and instead replaces it with a simulated output that follows the system dynamics and is thus not easily distinguishable as false. Furthermore, the attacker has access to the process and measurement noise $w_n$ and $z_n$. With this attack, the closed-loop systems above become $\tilde{x}_{n+1} = \tilde{A}\tilde{x}_n + \tilde{w}_n + [0^{\mathsf T}\ (Lv_n)^{\mathsf T}]^{\mathsf T}$ and $\bar{x}_{n+1} = \bar{A}\bar{x}_n + \bar{w}_n - [0^{\mathsf T}\ (Lv_n)^{\mathsf T}]^{\mathsf T}$.
II-C (Nonrobust) Dynamic Watermarking
The steady-state distribution of $\delta_n$ in an unattacked system will be Gaussian with mean zero and a covariance matrix $\Sigma_\delta$ satisfying
(5) $\Sigma_\delta = (A - LC)\Sigma_\delta(A - LC)^{\mathsf T} + \Sigma_W + L\Sigma_Z L^{\mathsf T}$
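Since (5) is a discrete Lyapunov equation in $\Sigma_\delta$, it can be solved directly; the following is a minimal sketch using SciPy, reusing the placeholder matrices above.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[1.0, 0.1], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
L = np.array([[0.5], [0.5]])
Sigma_W = 0.01 * np.eye(2)      # hypothetical process noise covariance
Sigma_Z = np.array([[0.04]])    # hypothetical measurement noise covariance

# Solve Sigma_delta = (A - L C) Sigma_delta (A - L C)^T + Sigma_W + L Sigma_Z L^T.
M = A - L @ C
Sigma_delta = solve_discrete_lyapunov(M, Sigma_W + L @ Sigma_Z @ L.T)

# V is the steady-state covariance of the observer error C delta_n + z_n.
V = C @ Sigma_delta @ C.T + Sigma_Z
```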
Dynamic watermarking adds a small amount of Gaussian noise $e_n$, with values unknown to the attacker, into the control input $u_n = K\hat{x}_n + e_n$. This private excitation has mean zero and covariance matrix $\Sigma_E$. Defining $\tilde{B} = [B^{\mathsf T}\ B^{\mathsf T}]^{\mathsf T}$ and $\bar{B} = [B^{\mathsf T}\ 0^{\mathsf T}]^{\mathsf T}$, the closed-loop systems with watermarking are given by $\tilde{x}_{n+1} = \tilde{A}\tilde{x}_n + \tilde{B}e_n + \tilde{w}_n$ and $\bar{x}_{n+1} = \bar{A}\bar{x}_n + \bar{B}e_n + \bar{w}_n$, respectively.
The watermarking noise leaves a detectable signal in the measurements $y_n$, which can be used to detect the presence of an attack by comparing the observer error $y_n - C\hat{x}_n = C\delta_n + z_n + v_n$ to previous values of the watermark $e_{n-\ell}$ for some integer $\ell \ge 1$. Specifically, the work in [26] proposes the tests
(6) $\operatorname*{as\text{-}lim}_{N\to\infty}\ \frac{1}{N}\sum_{n=\ell}^{N+\ell-1} e_{n-\ell}\,(C\delta_n + z_n + v_n)^{\mathsf T} = 0$
(7) $\operatorname*{as\text{-}lim}_{N\to\infty}\ \frac{1}{N}\sum_{n=0}^{N-1} (C\delta_n + z_n + v_n)(C\delta_n + z_n + v_n)^{\mathsf T} = C\Sigma_\delta C^{\mathsf T} + \Sigma_Z$
where $\operatorname{as\text{-}lim}$ denotes the almost-sure limit. Any modeled attack passing these tests can be shown to asymptotically have zero power [26].
Finally, [26] also provides a test statistic for implementing the above tests. Define $\psi_n = [\,(C\delta_n + z_n + v_n)^{\mathsf T}\ \ e_{n-\ell}^{\mathsf T}\,]^{\mathsf T}$ and $\bar{S}_N = \sum_{n=\ell}^{N+\ell-1}\psi_n\psi_n^{\mathsf T}$. Then, up to additive constants, the negative log-likelihood of a Wishart distribution is
(DW) $\mathcal{L}(\bar{S}_N) = N\log\det\bar{\Sigma} + \operatorname{tr}\big(\bar{\Sigma}^{-1}\bar{S}_N\big) - (N - \bar{q} - 1)\log\det\bar{S}_N$, where $\bar{\Sigma} = \operatorname{blkdiag}\big(C\Sigma_\delta C^{\mathsf T} + \Sigma_Z,\ \Sigma_E\big)$ and $\bar{q}$ is the dimension of $\psi_n$.
This can be used to perform a statistical hypothesis test to detect attacks when using dynamic watermarking.
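To make this concrete, the sketch below evaluates a Wishart negative log-likelihood of the form (DW) over a finite window; the stacking convention for $\psi_n$, the window handling, and the lag $\ell \ge 1$ are our assumptions for illustration.

```python
import numpy as np

def wishart_nll(S_bar, Sigma_bar, N):
    """Negative Wishart log-likelihood of S_bar with scale Sigma_bar and
    N degrees of freedom, dropping additive constants."""
    m = Sigma_bar.shape[0]
    _, logdet_S = np.linalg.slogdet(S_bar)
    _, logdet_Sig = np.linalg.slogdet(Sigma_bar)
    return (N * logdet_Sig
            + np.trace(np.linalg.solve(Sigma_bar, S_bar))
            - (N - m - 1) * logdet_S)

def dw_statistic(residuals, watermarks, ell, Sigma_bar):
    """residuals: rows y_n - C xhat_n; watermarks: rows e_n; ell >= 1.
    Sigma_bar = blkdiag(C Sigma_delta C^T + Sigma_Z, Sigma_E) under the null."""
    psi = np.hstack([residuals[ell:], watermarks[:-ell]])  # rows are psi_n^T
    S_bar = psi.T @ psi
    return wishart_nll(S_bar, Sigma_bar, psi.shape[0])
```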
III Covariance-Robust Dynamic Watermarking
We develop covariance-robust dynamic watermarking methods for two different cases. The first is where is fixed but unknown, and the second is where is slowly varying.
III-A Fixed But Unknown Noise Covariance
We begin by stating our assumptions for this case. First, we assume that we have knowledge of a set of positive semidefinite matrices $\Sigma_Z^1, \ldots, \Sigma_Z^k$ such that these matrices are affinely independent and $\Sigma_Z \in \mathcal{Z}$ for the set
(8) $\mathcal{Z} = \Big\{\textstyle\sum_{i=1}^{k}\theta_i\Sigma_Z^i \;:\; \theta_i \ge 0,\ \sum_{i=1}^{k}\theta_i = 1\Big\}$
Note that $\mathcal{Z}$ is a polyhedron, and that this set is defined to be the convex combinations of $\Sigma_Z^1, \ldots, \Sigma_Z^k$. Our first result characterizes $\mathcal{S}$, which is the set of possible $\Sigma_\delta$.
Lemma 1
Let $S_i$ satisfy $S_i = (A - LC)S_i(A - LC)^{\mathsf T} + \Sigma_W + L\Sigma_Z^i L^{\mathsf T}$ for $i = 1, \ldots, k$. For $\Sigma_Z = \sum_{i=1}^{k}\theta_i\Sigma_Z^i \in \mathcal{Z}$, the solution to (5) is $\Sigma_\delta = \sum_{i=1}^{k}\theta_i S_i$.
Proof:
This immediately follows by noting that both sides of (5) are linear in the matrices $\Sigma_\delta$ and $\Sigma_Z$, and that $\sum_{i=1}^{k}\theta_i = 1$. ∎
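Lemma 1 is easy to check numerically: solve the Lyapunov equation at two vertex covariances and confirm that the convex combination of the solutions solves the combined equation (placeholder matrices as before).

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[1.0, 0.1], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
L = np.array([[0.5], [0.5]])
M = A - L @ C
Sigma_W = 0.01 * np.eye(2)

Sigma_Z1, Sigma_Z2 = np.array([[0.02]]), np.array([[0.08]])  # vertices of Z
S1 = solve_discrete_lyapunov(M, Sigma_W + L @ Sigma_Z1 @ L.T)
S2 = solve_discrete_lyapunov(M, Sigma_W + L @ Sigma_Z2 @ L.T)

theta = 0.3  # any point in the simplex
S_mix = solve_discrete_lyapunov(
    M, Sigma_W + L @ (theta * Sigma_Z1 + (1 - theta) * Sigma_Z2) @ L.T)
assert np.allclose(S_mix, theta * S1 + (1 - theta) * S2)
```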
Since the tester observes $C\delta_n + z_n$ rather than $\delta_n$, we need to characterize the set of feasible covariance matrices of $C\delta_n + z_n$ in terms of $\mathcal{Z}$.
Lemma 2
Let $\Sigma_Z = \sum_{i=1}^{k}\theta_i\Sigma_Z^i \in \mathcal{Z}$. Then $C\Sigma_\delta C^{\mathsf T} + \Sigma_Z \in \mathcal{V} := \big\{\sum_{i=1}^{k}\theta_i V_i : \theta_i \ge 0,\ \sum_{i=1}^{k}\theta_i = 1\big\}$, where $V_i = CS_iC^{\mathsf T} + \Sigma_Z^i$.
Proof:
This follows by the linearity in $\Sigma_\delta$ and $\Sigma_Z$ established in Lemma 1. ∎
The set $\mathcal{V}$ represents covariance matrices of $C\delta_n + z_n$ that are "acceptable" according to the original set $\mathcal{Z}$ of observation noise covariances, i.e., values that we should not mistake for attacks.
Lemma 3
The set $\mathcal{V}$ is of dimension $k - 1$.
Proof:
This follows from Lemma 1, the fact that $L$ is of full column-rank, and the observability of $(A - LC, C)$, which in turn follows from the observability of $(A, C)$. ∎
Finally, consider a modification of (DW) given by
(9) $\mathcal{L}(V) = N\log\det V + \operatorname{tr}\big(V^{-1}S_N\big)$
where $S_N$ is the block of $\bar{S}_N$ associated with the $C\delta_n + z_n + v_n$ terms and additive terms that do not depend on $V$ have been dropped. Note that (9) is, up to those additive terms, the negative log-likelihood of a $q \times q$ Wishart distribution with scale matrix $V$ and $N$ degrees of freedom. Now, we may present our test statistic. Let $S_N$ be computed from a window of $N$ observations, and define the test statistic
(10) $\lambda_N = \min_{V \in \mathcal{V}} \mathcal{L}(V)$
for the composite null hypothesis $H_0 : C\Sigma_\delta C^{\mathsf T} + \Sigma_Z \in \mathcal{V}$. For some threshold $\tau > 0$, consider the test
(11) declare an attack if and only if $\lambda_N > \tau$
Since the terms dropped from (9) do not depend on $V$, this proposed test is equivalent to the generalized likelihood ratio test.
Theorem 1
For large enough $N$, the decision rule (11) using test statistic $\lambda_N$ satisfies equal opportunity with respect to the null hypothesis $H_0$, where the protected attribute is the true measurement noise covariance $\Sigma_Z$.
Proof:
Due to Lemma 3 and our assumption that $\Sigma_Z^1, \ldots, \Sigma_Z^k$ are affinely independent, $\lambda_N$ satisfies the Le Cam regularity conditions required for the application of Wilks' theorem [27]. This means $\lambda_N$ will be asymptotically distributed as a $\chi^2$ random variable plus a fixed constant regardless of the true value of $\Sigma_Z$, and this implies that the event of a Type I error is independent of $\Sigma_Z$. ∎
This is a useful result because it implies that, in the proper regime, our test can come arbitrarily close to satisfying the initial goal of remaining robust to some uncertainty in the distribution of the measurement noise. However, $\{(V, V^{-1}) : V \in \mathcal{V}\}$ is a non-convex set, and so the computation of $\lambda_N$ is difficult. To this end, we propose the approximate test statistic
(CRDW) $\hat{\lambda}_N = \min_{\theta, W}\ \operatorname{tr}(WS_N) - N\log\det W \quad \text{s.t.}\quad W \preceq \sum_{i=1}^{k}\theta_i V_i^{-1},\ \ \begin{bmatrix} W & I \\ I & \sum_{i=1}^{k}\theta_i V_i \end{bmatrix} \succeq 0,\ \ \theta \ge 0,\ \ \sum_{i=1}^{k}\theta_i = 1,$ in which the decision variable $W$ acts as a surrogate for $V^{-1}$.
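To illustrate the structure of (CRDW), here is a minimal CVXPY sketch of the convex relaxation as written above; the function name, data layout, and solver choice are ours, and it should be read as a sketch rather than a reference implementation.

```python
import cvxpy as cp
import numpy as np

def crdw_statistic(V_list, S_N, N):
    """Convex relaxation: minimize tr(W S_N) - N log det W over the
    surrogate W for V^{-1}, with V constrained to conv{V_1, ..., V_k}."""
    q = S_N.shape[0]
    k = len(V_list)
    W = cp.Variable((q, q), symmetric=True)
    theta = cp.Variable(k, nonneg=True)
    V_mix = sum(theta[i] * V_list[i] for i in range(k))
    Vinv_mix = sum(theta[i] * np.linalg.inv(V_list[i]) for i in range(k))
    constraints = [
        cp.sum(theta) == 1,
        Vinv_mix - W >> 0,                   # W <= sum_i theta_i V_i^{-1}
        cp.bmat([[W, np.eye(q)],
                 [np.eye(q), V_mix]]) >> 0,  # W >= (sum_i theta_i V_i)^{-1}
    ]
    prob = cp.Problem(
        cp.Minimize(cp.trace(W @ S_N) - N * cp.log_det(W)), constraints)
    prob.solve(solver=cp.SCS)
    return prob.value
```

A large optimal value indicates that the windowed residuals fit no acceptable covariance well, and can be compared against a threshold as in (11).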
Lemma 4
For any $V \in \mathcal{V}$, there exists a $(\theta, W)$ with $W = V^{-1}$ such that $(\theta, W)$ is a feasible solution to the optimization problem defining test (CRDW).
Proof:
First observe that any $V \in \mathcal{V}$ can be written as $V = \sum_{i=1}^{k}\theta_i V_i$ for some nonzero $\theta \ge 0$ such that $\sum_{i=1}^{k}\theta_i = 1$. Setting $W = V^{-1}$, it holds trivially that
(12) $W \succeq \Big(\sum_{i=1}^{k}\theta_i V_i\Big)^{-1}$
The right-most constraint in (12) can be restated using the Schur complement, and this reformulation is exact. Since $\sum_{i}\theta_i V_i \succ 0$, the Schur complement implies the second constraint in (CRDW) is equivalent to $W \succeq \big(\sum_{i}\theta_i V_i\big)^{-1}$.
The first constraint in (CRDW) follows from the convexity of the matrix inverse for positive semidefinite matrices: Letting $f(t) = x^{\mathsf T}\big(tV_a + (1-t)V_b\big)^{-1}x$ for positive definite matrices $V_a$ and $V_b$, we have $M(t) = tV_a + (1-t)V_b$ and $D = V_a - V_b$. For any $x$, the function will have second derivative $f''(t) = 2x^{\mathsf T}M(t)^{-1}DM(t)^{-1}DM(t)^{-1}x \ge 0$ due to the positive-semidefiniteness of $M(t)^{-1}$, so $f$ is convex. Since this holds for any $x$, we have that
(13) $\Big(\sum_{i=1}^{k}\theta_i V_i\Big)^{-1} \preceq \sum_{i=1}^{k}\theta_i V_i^{-1}$
so $W = V^{-1}$ also satisfies the first constraint. ∎
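The matrix-inverse convexity bound (13) is also easy to verify numerically for random positive definite matrices (a check of the inequality, not part of the method):

```python
import numpy as np

rng = np.random.default_rng(1)
X1, X2 = rng.normal(size=(2, 4, 4))
V1, V2 = X1 @ X1.T + 1e-3 * np.eye(4), X2 @ X2.T + 1e-3 * np.eye(4)
t = 0.6
gap = t * np.linalg.inv(V1) + (1 - t) * np.linalg.inv(V2) \
      - np.linalg.inv(t * V1 + (1 - t) * V2)
assert np.linalg.eigvalsh(gap).min() >= -1e-9  # gap is PSD, as in (13)
```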
Remark 1
It was shown in [26] that test (7) ensures $\operatorname*{as\text{-}lim}_{N\to\infty}\frac{1}{N}\sum_{n=0}^{N-1} v_n(C\delta_n + z_n)^{\mathsf T} = 0$ in any attack such that it holds true. In that case, we have
(14) $\operatorname*{as\text{-}lim}_{N\to\infty}\ \frac{1}{N}\sum_{n=0}^{N-1}(C\delta_n + z_n + v_n)(C\delta_n + z_n + v_n)^{\mathsf T} = C\Sigma_\delta C^{\mathsf T} + \Sigma_Z + \operatorname*{as\text{-}lim}_{N\to\infty}\ \frac{1}{N}\sum_{n=0}^{N-1} v_nv_n^{\mathsf T}$
since the Schur stability of $\bar{A}$ implies that any effect of initial conditions and of the vanishing cross terms is reduced to zero asymptotically. Since $C\Sigma_\delta C^{\mathsf T} + \Sigma_Z$ and $\operatorname{as\text{-}lim}_{N}\frac{1}{N}\sum_{n}v_nv_n^{\mathsf T}$ are both positive semidefinite, this means that
(15) $\operatorname*{as\text{-}lim}_{N\to\infty}\ \frac{1}{N}S_N \succeq C\Sigma_\delta C^{\mathsf T} + \Sigma_Z$
Inverting both sides of this implies that, in the case that the system is under attack, we can generally expect that $\big(\frac{1}{N}S_N\big)^{-1} \preceq V^{-1}$ for the acceptable covariances $V \in \mathcal{V}$. The takeaway is that the looseness of the upper bound (13) should not greatly decrease the power of the modified test in the presence of test (7), as the tight lower bound is more germane to situations where the system is actually being attacked.
Remark 2
If the dimension $q$ is large, then the optimization (CRDW) may be expensive to solve from scratch each time. Furthermore, the optimal solution will likely not change drastically between runs when $N$ is large. So, lighter-weight first-order methods such as ADMM can be used instead [28]. These generally take longer to converge to high levels of accuracy, but have the advantage of being able to be readily warm-started.
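One hypothetical way to realize this with off-the-shelf tools is to hold the sample matrix in a CVXPY Parameter and re-solve with the ADMM-based solver SCS, which reuses the previous solution when warm-started; the (CRDW) constraints are elided here for brevity.

```python
import cvxpy as cp
import numpy as np

q, N = 3, 500
rng = np.random.default_rng(3)
S_param = cp.Parameter((q, q), PSD=True)    # sample matrix for current window
W = cp.Variable((q, q), symmetric=True)
# The constraints from the (CRDW) sketch above would be added here as well.
prob = cp.Problem(cp.Minimize(cp.trace(W @ S_param) - N * cp.log_det(W)))

for _ in range(5):                          # successive windows of data
    X = rng.normal(size=(N, q))             # stand-in for stacked residual rows
    S_param.value = X.T @ X
    prob.solve(solver=cp.SCS, warm_start=True)  # reuses the previous solution
```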
III-B Slowly Varying Unknown Noise Covariance
A key difference between this setting and that of the static distribution is that a shift in the observation noise covariance in one period can have impacts on $\Sigma_\delta$ over the next few periods that do not easily fit into our previous representation of the set $\mathcal{V}$. This is because it will take many steps before the covariance of $\delta_n$ approaches its asymptotic limit in $\mathcal{S}$. Thus, to accommodate a dynamically changing distribution of $z_n$, we must use an expansion of the set $\mathcal{V}$.
We modify our setup for this subsection. The true covariances of $z_n$ and $\delta_n$ at step $n$ are $\Sigma_{Z,n}$ and $\Sigma_{\delta,n}$, respectively. Let $S(\Sigma_Z)$ denote the solution of (5) as a function of $\Sigma_Z$, and let $V_n = C\Sigma_{\delta,n}C^{\mathsf T} + \Sigma_{Z,n}$. Note that all $\Sigma_{Z,n}$ are still assumed to be in $\mathcal{Z}$. Finally, we make some additional assumptions. Since the spectral radius of $A - LC$ is less than one, there exists some induced norm (denote this $\|\cdot\|$) such that $\|A - LC\| < 1$ [29]. We assume $\Sigma_{Z,n}$ changes every step, but that all $n$ satisfy $\|\Sigma_{Z,n+1} - \Sigma_{Z,n}\| \le \varepsilon$ for some known value of $\varepsilon$. We also assume the system starts at steady state in the sense $\Sigma_{\delta,0} = S(\Sigma_{Z,0})$. Under these assumptions we have:
Lemma 5
Let $c$ be defined as
(16) $c = \dfrac{\kappa\,\varepsilon\,\|C\|^{2}\,\|L\|^{2}}{\big(1 - \|A - LC\|^{2}\big)^{2}}$
where $\kappa \ge 1$ is a constant, depending only on the choice of norm, that accounts for the difference between the induced norms of a matrix and its transpose (with $\kappa = 1$ for the spectral norm). Then $V_n \in \mathcal{V} \oplus \{\Delta = \Delta^{\mathsf T} : \|\Delta\| \le c\}$ for all $n$, where $\oplus$ is the Minkowski sum.
Proof:
Recall that the elements of $\mathcal{V}$ arise as the upper-left submatrices of the steady-state covariances of $\psi_n$, i.e., the blocks associated with the $C\delta_n + z_n$ terms. We start by noting that
(17) $\Sigma_{\delta,n+1} = (A - LC)\Sigma_{\delta,n}(A - LC)^{\mathsf T} + \Sigma_W + L\Sigma_{Z,n}L^{\mathsf T}$
Similarly, we can see that $\Sigma_{\delta,n} = (A - LC)\Sigma_{\delta,n-1}(A - LC)^{\mathsf T} + \Sigma_W + L\Sigma_{Z,n-1}L^{\mathsf T}$. Continuing this recursion relation leads to the fact that
(18) $\Sigma_{\delta,n} = (A - LC)^n\,\Sigma_{\delta,0}\,\big((A - LC)^{\mathsf T}\big)^n + \sum_{m=0}^{n-1}(A - LC)^m\big(\Sigma_W + L\Sigma_{Z,n-1-m}L^{\mathsf T}\big)\big((A - LC)^{\mathsf T}\big)^m$
Due to the Schur stability of $A - LC$, the following limit exists, and can be represented as in Lemma 1.
(19) $S(\Sigma_{Z,n}) = \lim_{N\to\infty}\sum_{m=0}^{N}(A - LC)^m\big(\Sigma_W + L\Sigma_{Z,n}L^{\mathsf T}\big)\big((A - LC)^{\mathsf T}\big)^m$
Note that $S(\Sigma_{Z,n})$ is the steady state that $\Sigma_{\delta,\cdot}$ would ultimately reach if $\Sigma_{Z,\cdot}$ (and therefore the driving term of (17)) does not shift after step $n$; thus, it solves (5) for $\Sigma_Z = \Sigma_{Z,n}$ and exists in $\mathcal{S}$. Denote $E_n = \Sigma_{\delta,n} - S(\Sigma_{Z,n})$. Then,
(20) $E_n = \sum_{m=0}^{n-1}(A - LC)^m L\big(\Sigma_{Z,n-1-m} - \Sigma_{Z,n}\big)L^{\mathsf T}\big((A - LC)^{\mathsf T}\big)^m + \lim_{N\to\infty}\sum_{m=n}^{N}(A - LC)^m L\big(\Sigma_{Z,0} - \Sigma_{Z,n}\big)L^{\mathsf T}\big((A - LC)^{\mathsf T}\big)^m$
Note that the term in the limit in the first equality is obtained from a simple re-indexing of the first term of (18), using the steady-state assumption $\Sigma_{\delta,0} = S(\Sigma_{Z,0})$. This is convenient because we can now break $\Sigma_{\delta,n}$ into an element known to be in $\mathcal{S}$ and an error term. Our goal is now to choose $c$ large enough to bound
(21) $\min_{V \in \mathcal{V}}\ \|V_n - V\|$
over all paths that $\Sigma_{Z,\cdot}$ can take. An easy bound on the minimization is to simply set $V = CS(\Sigma_{Z,n})C^{\mathsf T} + \Sigma_{Z,n}$. Then, $c$ only needs to exceed
(22) $\big\|V_n - CS(\Sigma_{Z,n})C^{\mathsf T} - \Sigma_{Z,n}\big\| = \big\|CE_nC^{\mathsf T}\big\|$
By sub-multiplicativity of induced norms,
(23) $\|CE_nC^{\mathsf T}\| \le \varepsilon\,\|C\|^{2}\,\|L\|^{2}\sum_{m=0}^{\infty}(m+1)\,\big\|(A - LC)^m\big\|\,\big\|\big((A - LC)^{\mathsf T}\big)^m\big\|$
using $\|\Sigma_{Z,n-1-m} - \Sigma_{Z,n}\| \le (m+1)\varepsilon$. Finally, using the fact that the induced norms of a matrix and of its transpose differ by at most a constant factor $\kappa$ [30] and applying (23) to the error term from (21) yields the desired result, since $\sum_{m=0}^{\infty}(m+1)\|A - LC\|^{2m} = \big(1 - \|A - LC\|^{2}\big)^{-2}$. ∎
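The bound in Lemma 5 can be probed numerically by propagating the exact covariance recursion (17) under a maximal drift of $\Sigma_{Z,n}$ and comparing the distance from $V_n$ to its nominal steady-state counterpart against $c$; the matrices and drift path below are placeholders, and we use the spectral norm with $\kappa = 1$ (cf. Corollary 1 below).

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = np.array([[1.0, 0.0]])
L = np.array([[0.5], [0.0]])
M = A - L @ C                      # here ||M||_2 < 1, as required below
Sigma_W = 0.01 * np.eye(2)
spec = lambda X: np.linalg.norm(X, 2)

eps = 1e-4                         # per-step drift bound on Sigma_Z
c = eps * spec(C)**2 * spec(L)**2 / (1 - spec(M)**2)**2

Sigma_Z = np.array([[0.04]])
Sigma_delta = solve_discrete_lyapunov(M, Sigma_W + L @ Sigma_Z @ L.T)
for n in range(500):
    # Advance Sigma_delta one step using the current Sigma_Z, then drift.
    Sigma_delta = M @ Sigma_delta @ M.T + Sigma_W + L @ Sigma_Z @ L.T
    Sigma_Z = Sigma_Z + eps        # drift at exactly the allowed rate
    V_n = C @ Sigma_delta @ C.T + Sigma_Z
    S_ss = solve_discrete_lyapunov(M, Sigma_W + L @ Sigma_Z @ L.T)
    V_nominal = C @ S_ss @ C.T + Sigma_Z   # nominal steady-state element
    assert spec(V_n - V_nominal) <= c + 1e-12
```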
Remark 3
Due to the topological equivalence of induced norms, the dependence on our choice of norm can only affect the value of $c$ required by the constant factor $\kappa$.
Corollary 1
If $\|A - LC\|_2 < 1$ for the spectral norm $\|\cdot\|_2$, then the statement in Lemma 5 holds for $\|\cdot\| = \|\cdot\|_2$ and
(24) $c = \dfrac{\varepsilon\,\|C\|_2^{2}\,\|L\|_2^{2}}{\big(1 - \|A - LC\|_2^{2}\big)^{2}}$
Proof:
Since $\|X^{\mathsf T}\|_2 = \|X\|_2$ for every matrix $X$, we may take $\kappa = 1$ in Lemma 5. ∎
With this expanded set, it is straightforward to extend the previous test statistic (CRDW) as long as $V_i - cI$ remains positive definite for all $i$. In this case, we may define our new test statistic as
(CRDW*) $\hat{\lambda}^{*}_N = \min_{\theta, W}\ \operatorname{tr}(WS_N) - N\log\det W \quad \text{s.t.}\quad W \preceq \sum_{i=1}^{k}\theta_i (V_i - cI)^{-1},\ \ \begin{bmatrix} W & I \\ I & \sum_{i=1}^{k}\theta_i V_i + cI \end{bmatrix} \succeq 0,\ \ \theta \ge 0,\ \ \sum_{i=1}^{k}\theta_i = 1$
Remark 4
If there is some $i$ so that $V_i - cI$ is not positive definite, then the first constraint above is not well-defined. Recalling that $W$ is a surrogate for $V^{-1}$, we note that any $W \succeq 0$ trivially satisfies $W \preceq MI$ for a sufficiently large constant $M$. Thus in this problematic case, we may replace the $(V_i - cI)^{-1}$ in the first constraint with $MI$, for all such $i$. This issue is unlikely to be of practical concern for the same reasons discussed in Remark 1 regarding the relaxation of the set $\mathcal{V}$. Specifically, the structure of the attacks makes it unlikely that the first constraint in (CRDW*) would be binding in any case.
IV Empirical Results
In this section, we present simulation results that showcase the strength of our method when compared with the original test statistic (DW). We present results for both the case where the noise distribution is fixed but unknown, and for the case where the noise covariance is unknown and slowly-varying.
We use the standard model for simulation of an autonomous vehicle in [31], in which the error kinematics of lane keeping and speed control are expressed in terms of the heading error, lateral error, trajectory distance, vehicle angle, and vehicle velocity, with steering and acceleration as inputs. We linearize and initialize with a straight trajectory and constant velocity. We then performed exact discretization with sampling period 0.05 s. This yields the system dynamics
(25) |
with matrices $A$, $B$, and $C$ determined by the linearization. We use a fixed process noise covariance $\Sigma_W$.
All tests use dynamic watermarking with variance $\Sigma_E$, and the gains $K$ and $L$ were chosen to stabilize the system without an attack. We conduct four simulations: attacked and non-attacked systems where the measurement noise covariance is fixed, and attacked and non-attacked systems where the measurement noise covariance is allowed to vary. We ran all four simulations for 1000 iterations, or 50 seconds. In all cases, we compare the test metrics using the hypothesis test described in (11), where the measurement noise covariance is assumed to take a fixed nominal value in $\mathcal{Z}$. When simulating the attacked system, we choose an attacker with a fixed $\alpha$ and attack noise covariances as described in Sect. II-B.
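For concreteness, the following sketch shows the shape of one simulation run (watermarked control, observer update, and an optionally attacked measurement channel); the attack form follows our reconstruction of Sect. II-B, and all matrices and covariances passed in are hypothetical stand-ins for the values used in our experiments.

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate(A, B, C, K, L, Sigma_W, Sigma_Z, Sigma_E, alpha=0.0, T=1000):
    """Simulate one run; returns residuals y_n - C xhat_n and watermarks e_n.
    alpha = 0 gives the unattacked system; alpha > 0 applies a Sect. II-B
    style attack with stand-in covariances for the attacker's noise."""
    p, q, m = A.shape[0], C.shape[0], B.shape[1]
    x, xhat, xi = np.zeros(p), np.zeros(p), np.zeros(p)
    residuals, marks = [], []
    for _ in range(T):
        e = rng.multivariate_normal(np.zeros(m), Sigma_E)   # watermark
        u = K @ xhat + e
        z = rng.multivariate_normal(np.zeros(q), Sigma_Z)
        y = C @ x + z
        if alpha > 0:   # mute the true output, substitute a simulated one
            zeta = rng.multivariate_normal(np.zeros(q), Sigma_Z)
            y = (1 - alpha) * y + C @ xi + zeta
            xi = (A + B @ K) @ xi \
                + rng.multivariate_normal(np.zeros(p), Sigma_W)
        residuals.append(y - C @ xhat)
        marks.append(e)
        xhat = A @ xhat + B @ u + L @ (y - C @ xhat)
        x = A @ x + B @ u + rng.multivariate_normal(np.zeros(p), Sigma_W)
    return np.array(residuals), np.array(marks)
```

Residual/watermark pairs from runs with alpha = 0 and alpha > 0 can then be scored with the statistics (DW) and (CRDW) sketched earlier.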
IV-A Fixed Covariance
We first show our test outperforms the nonrobust test in the case where the true measurement noise covariance matrix is fixed but unknown to the tester. In our simulations, the true noise covariance is held fixed throughout each run. In all tests, $\mathcal{Z}$ is described by four extreme points, and both the true measurement noise covariance and the covariance incorrectly assumed in test statistic (DW) lie in the resulting set. The simulation is run for 1000 steps.
Figure 1 shows the efficacy of our method under this new uncertainty. If a test detects consistently, its negative log-likelihood values should be lower under regular conditions and higher when the model is attacked. In particular, the nonrobust test statistic (DW) is shown in Fig. 1(a) to be wholly unable to distinguish an attacked system from an unattacked system when its assumption on the measurement noise covariance is violated, while Fig. 1(b) shows the robust test statistic (CRDW) to be able to do so.
IV-B Varying Covariance
Unattacked and attacked simulations were also conducted with a measurement noise distribution that was allowed to vary. We set the per-step drift bound $\varepsilon$ and use the spectral norm, implying a value of $c$ via (24). The true measurement noise covariance is initialized at one value in $\mathcal{Z}$, shifts linearly over the course of 250 iterations to a new value, and then changes direction to shift linearly over 250 iterations to a third value. The measurement noise covariance stays at this value for 150 iterations. It then shifts linearly over 200 iterations to a terminal value, which it takes for another 150 iterations before the simulation is terminated. The results for both the nonrobust and robust tests are shown in Fig. 2. As in the fixed covariance case, our test is able to distinguish between the attacked and unattacked systems better and more consistently than the nonrobust test, whose assumptions are not satisfied.
V Conclusion
We developed covariance-robust dynamic watermarking tests for detecting sensor attacks on LTI systems in the presence of uncertainty about the measurement noise covariance. We considered cases where the covariance of measurement noise is unknown and either fixed or slowly varying, and we required our test to be "fair" with respect to all possible values of the covariance, in that it is not more or less powerful for some covariances than others. These reflect real-world needs that will grow as 5G is deployed, because there will be an increase in the deployment of smart CPS. In such systems, an "unfair" test can translate to disparate impact across different users in different environments, which is a problem of algorithmic bias. Future research includes studying how dynamic watermarking can be adapted to other system uncertainties.
References
- [1] M. A. Ferrag, L. Maglaras, A. Argyriou, D. Kosmanos, and H. Janicke, “Security for 4g and 5g cellular networks: A survey of existing authentication and privacy-preserving schemes,” Journal of Network and Computer Applications, vol. 101, pp. 55–82, 2018.
- [2] M. Abrams and J. Weiss, “Malicious control system cyber security attack case study–Maroochy Water Services, Australia,” MITRE, 2008.
- [3] R. Langner, “Stuxnet: Dissecting a cyberwarfare weapon,” IEEE Security & Privacy, vol. 9, no. 3, pp. 49–51, 2011.
- [4] A. A. Cárdenas, S. Amin, and S. Sastry, “Research challenges for the security of control systems.” in HotSec, 2008.
- [5] B. Satchidanandan and P. Kumar, “Dynamic watermarking: Active defense of networked cyber-physical systems,” Proc. of IEEE, 2016.
- [6] S. Weerakkody, Y. Mo, and B. Sinopoli, “Detecting integrity attacks on control systems using robust physical watermarking,” in Proc. of IEEE CDC, 2014, pp. 3757–3764.
- [7] Y. Mo, R. Chabukswar, and B. Sinopoli, “Detecting integrity attacks on scada systems,” IEEE CST, vol. 22, no. 4, pp. 1396–1407, 2014.
- [8] N. Zhang, J. Wang, G. Kang, and Y. Liu, “Uplink nonorthogonal multiple access in 5g systems,” IEEE Communications Letters, vol. 20, no. 3, pp. 458–461, 2016.
- [9] I. Ahmad, Z. Kaleem, R. Narmeen, L. D. Nguyen, and D.-B. Ha, “Quality-of-service aware game theory-based uplink power control for 5g heterogeneous networks,” Mobile Networks and Applications, vol. 24, no. 2, pp. 556–563, 2019.
- [10] B. Satchidanandan and P. Kumar, “On minimal tests of sensor veracity for dynamic watermarking-based defense of cyber-physical systems,” in IEEE COMSNETS, 2017, pp. 23–30.
- [11] Y. Mo and B. Sinopoli, “Secure control against replay attacks,” in Allerton Conference. IEEE, 2009, pp. 911–918.
- [12] Y. Mo, E. Garone, A. Casavola, and B. Sinopoli, “False data injection attacks against state estimation in wireless sensor networks,” in Proc. of IEEE CDC, 2010, pp. 5967–5972.
- [13] Y. Mo, S. Weerakkody, and B. Sinopoli, “Physical authentication of control systems: Designing watermarked control inputs to detect counterfeit sensor outputs,” IEEE Control Systems, vol. 35, no. 1, pp. 93–109, 2015.
- [14] M. Porter, S. Dey, A. Joshi, P. Hespanhol, A. Aswani, M. Johnson-Roberson, and R. Vasudevan, “Detecting deception attacks on autonomous vehicles via linear time-varying dynamic watermarking,” arXiv preprint arXiv:2001.09859, 2020.
- [15] R. Berk, H. Heidari, S. Jabbari, M. Kearns, and A. Roth, “Fairness in criminal justice risk assessments: the state of the art,” arXiv preprint arXiv:1703.09207, 2017.
- [16] M. Hardt, E. Price, and N. Srebro, “Equality of opportunity in supervised learning,” in NeurIPS, 2016, pp. 3315–3323.
- [17] T. Calders, F. Kamiran, and M. Pechenizkiy, “Building classifiers with independency constraints,” in IEEE ICDMW, 2009, pp. 13–18.
- [18] C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. Zemel, “Fairness through awareness,” in Proceedings of the 3rd Innovations in Theoretical Computer Science Conference. ACM, 2012, pp. 214–226.
- [19] I. Zliobaite, “On the relation between accuracy and fairness in binary classification,” arXiv preprint arXiv:1505.05723, 2015.
- [20] M. Olfat and A. Aswani, “Spectral algorithms for computing fair support vector machines,” in AISTATS, 2018, pp. 1933–1942.
- [21] A. Chouldechova, “Fair prediction with disparate impact: A study of bias in recidivism prediction instruments,” arXiv preprint arXiv:1703.00056, 2017.
- [22] M. B. Zafar, I. Valera, M. G. Rodriguez, and K. P. Gummadi, “Fairness constraints: Mechanisms for fair classification,” in AISTATS, 2017.
- [23] F. Chierichetti, R. Kumar, S. Lattanzi, and S. Vassilvitskii, “Fair clustering through fairlets,” in NeurIPS, 2017, pp. 5036–5044.
- [24] M. Olfat and A. Aswani, “Convex formulations for fair principal component analysis,” in AAAI, vol. 33, 2019, pp. 663–670.
- [25] A. Aswani and M. Olfat, “Optimization hierarchy for fair statistical decision problems,” arXiv preprint arXiv:1910.08520, 2019.
- [26] P. Hespanhol, M. Porter, R. Vasudevan, and A. Aswani, “Dynamic watermarking for general LTI systems,” in IEEE CDC, 2017, pp. 1834–1839.
- [27] S. S. Wilks, “The large-sample distribution of the likelihood ratio for testing composite hypotheses,” The Annals of Mathematical Statistics, vol. 9, no. 1, pp. 60–62, 1938.
- [28] Z. Wen, D. Goldfarb, and W. Yin, “Alternating direction augmented lagrangian methods for semidefinite programming,” Mathematical Programming Computation, vol. 2, no. 3-4, pp. 203–230, 2010.
- [29] R. A. Horn and C. R. Johnson, Matrix analysis. Cambridge university press, 2012.
- [30] B. Q. Feng, “Equivalence constants for certain matrix norms,” Linear algebra and its applications, vol. 374, pp. 247–253, 2003.
- [31] V. Turri, A. Carvalho, H. Tseng, K. Johansson, and F. Borrelli, “Linear model predictive control for lane keeping and obstacle avoidance on low curvature roads,” in Proc. of IEEE ITSC, 2013, pp. 378–383.