A Game Theoretic Analysis of LQG Control under Adversarial Attack
Abstract
Motivated by recent work on adversarial attacks against deep reinforcement learning, this paper studies a deception attack on linear quadratic Gaussian (LQG) control. In the considered attack model, the adversary can manipulate the observation of the agent subject to a mutual information constraint. The adversarial problem is formulated as a novel dynamic cheap talk game so as to capture the strategic interaction between the adversary and the agent, the asymmetry of information availability, and the system dynamics. Necessary and sufficient conditions are provided for subgame perfect equilibria to exist in pure strategies and in behavioral strategies, and the equilibria and the resulting control rewards are characterized. The results show that pure strategy equilibria are informative, while only babbling equilibria exist in behavioral strategies. Numerical results illustrate the impact of strategic adversarial interaction.
I INTRODUCTION
Deep reinforcement learning (DRL) has recently emerged as a promising solution for solving large Markov decision processes (MDPs) and partially observable MDPs (POMDPs), thanks to the use of deep neural networks as policy approximators [1]. DRL has, however, been shown to be vulnerable to small perturbations of the state observation, called adversarial examples, which can mislead the control agent into taking suboptimal control actions [2]. While there has been significant recent interest in the design of adversarial examples against DRL [2, 3, 4, 5, 8], there has been little work on characterizing the ability of agents to adapt to them.
Recent work proposed to use adversarial examples for making DRL agents more robust to perturbations, by letting the adversary and the agent play against each other, and formulating the interaction as a stochastic game (SG) [7]. Nonetheless, in the case of adversarial examples the agent cannot observe the system state directly, nor can the adversary affect the state transition probabilities directly; the adversary can only do so through the actions taken by the agent. Hence, the SG model does not capture the information structure of the problem. Effectively, in the presence of adversarial examples the agent has to solve a POMDP in which the observations are subverted by the adversary so as to mislead the agent.
As a model of this interaction, in this paper we propose a game theoretical model to study the strategic interaction between an agent that has to solve a linear quadratic Gaussian (LQG) control problem, and an adversary that can manipulate the agent’s observations by a randomly chosen affine transformation subject to a mutual information constraint, and aims at minimizing the control reward. The resulting problem is formulated as a dynamic cheap talk game, which captures information asymmetry, the beliefs of the adversary and the agent, and the undetectability constraint imposed on the adversarial attacks.
Our paper contributes to the solution of the formulated game theoretical problem in two ways. First, we address necessary and sufficient conditions for the existence of subgame perfect equilibria (SPEs) in pure strategies and in behavioral strategies, and we characterize the equilibrium strategies. Second, we characterize the rewards achievable in equilibria, and relate them to the rewards of a naive agent and an alert agent under attack. The key novelty of our contribution is that we characterize the strategies to be followed by the agent and by the adversary under strategic interaction, which has not been addressed by the existing literature.
The rest of the paper is organized as follows. In Section II we review related work. In Section III we present the system model and problem formulation. In Section IV we provide analytical results. In Section V we provide numerical results. Section VI concludes the paper.
Notation: Unless otherwise specified, we denote a random variable by a capital letter and its realization by the corresponding lower-case letter. We denote by N(μ, σ²) the Gaussian distribution with mean μ and variance σ², by supp(·) the support set, by I(·;·) the mutual information, and by |·| the cardinality of a set.
II RELATED WORK
Related to our work is previous research on robust POMDPs under uncertainty in the system dynamics [6]. In [6] the control action was optimized under a worst-case assumption on the system dynamics in each stage, i.e., the agent plays as the leader and the dynamic system plays as the follower in a Stackelberg game. SGs and partially observable SGs (POSGs) have been used to model the strategic interaction of players in a dynamic system, and have been employed in robust and adversarial problems [7, 8, 9]. Unlike in the case of learning under adversarial attacks, however, in SGs and POSGs the players interact with each other through the impact of their actions on the state transitions, not on the state observations. Our work is related to the cheap talk game [18], where a sender with private information sends a message to a receiver, and the receiver takes an action based on the received message and on its belief about the inaccessible private information. Closest to our model are [10, 11, 12]. In [10] a dynamic cheap talk game was proposed to study a deception attack on a Markovian system where the actions do not affect the state transitions. In [11] the authors developed a dynamic game model of the attacker-defender interaction, and characterized the optimal attack strategy as a function of the defense strategy, allowing for a static optimal defense strategy. In our preliminary work [12] we proposed a dynamic cheap talk framework to model deception attacks on a general MDP, and addressed computational issues.
Adversarial variants of LQG control were considered in a number of recent works. A Stackelberg game was formulated in [13], where the dynamic system is the leader, while the agent is the follower and may be an adversary. In [14], the authors formulated a finite-horizon hierarchical signaling game between a sender and a receiver in a dynamic environment and showed that linear sender and receiver strategies can attain the equilibrium. In [15], the adversary optimally manipulates the control actions instead of the system states, but the complete strategic interaction is not considered. The optimal attack on both the system state and the control action in LQG control was studied in [16]. In [17], a targeted attack strategy was studied for misleading the LQG system to a particular state while evading detection.
III ADVERSARIAL LQG CONTROL PROBLEM
We consider a finite-horizon LQG control problem under adversarial attack, as illustrated in Fig. 1. The system states, the manipulated states, the actions, and the instantaneous rewards in each stage are described by
(1)
(2)
(3)
(4)
(5)
(6)
(7)
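To make the interaction concrete, the following is a minimal simulation sketch of a scalar instance of such an adversarial LQG loop. The dynamics, reward, and manipulation model used below (state update x_{t+1} = a x_t + b u_t + w_t, reward -(q x_t^2 + p u_t^2), and reported state x̃_t = θ_t x_t + v_t), as well as all parameter values, are illustrative assumptions rather than the exact equations (1)-(7).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed scalar model (symbols a, b, q, p, m are illustrative, not from the paper):
# x_{t+1} = a*x_t + b*u_t + w_t with w_t ~ N(0, m), reward r_t = -(q*x_t^2 + p*u_t^2).
# The adversary reports x_tilde_t = theta_t*x_t + v_t with v_t ~ N(0, sigma_t^2).
a, b, q, p, m = 0.9, 1.0, 1.0, 0.5, 0.1
T = 20

x = rng.normal()                  # initial state
total_reward = 0.0
for t in range(T):
    theta, sigma2 = 0.8, 0.2      # placeholder manipulation parameters
    x_tilde = theta * x + rng.normal(0.0, np.sqrt(sigma2))

    u = -0.5 * x_tilde            # placeholder agent policy acting on the manipulated state

    total_reward += -(q * x**2 + p * u**2)
    x = a * x + b * u + rng.normal(0.0, np.sqrt(m))

print("accumulated reward:", total_reward)
```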
III-A LQG Recapitulation
If the manipulation parameters are known to the agent, the above problem is a standard LQG control problem. In each stage, the agent observes the manipulated state but not the system state, and determines the action with the aim of maximizing its expected accumulated reward. Note that in the standard LQG problem it is sufficient to consider an action that is an affine function of the observation, as the optimal action is a linear function of the mean of the agent's Gaussian posterior distribution of the system state after observing the manipulated state and the manipulation parameters [20]. To compute the optimal coefficients, we first define
(8)
Furthermore, we denote by the belief of the agent about , which is the Gaussian posterior distribution of for the agent after observing and . Then, given the manipulated state , the optimal action can be expressed as
(9)
(10)
(11)
where the coefficients depend on the belief; the mean is that of the agent's Gaussian posterior distribution of the system state after observing the manipulated state and the manipulation parameters. Note that when the state is perfectly observed, the LQG strategy reduces to the linear quadratic regulator (LQR) strategy
(12) |
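For intuition, the LQR coefficients of a scalar problem can be computed by the standard backward Riccati recursion. The sketch below is a minimal implementation under the assumed scalar model x_{t+1} = a x_t + b u_t + w_t with stage reward -(q x_t^2 + p u_t^2); the symbols and parameter values are illustrative, not taken from the paper.

```python
import numpy as np

def lqr_gains(a, b, q, p, T):
    """Backward recursion for the scalar LQR gains k_t in u_t = -k_t * x_t,
    assuming dynamics x_{t+1} = a*x_t + b*u_t + w_t and stage cost q*x_t^2 + p*u_t^2."""
    s = 0.0                       # value-function coefficient after the last stage
    gains = np.zeros(T)
    for t in reversed(range(T)):
        gains[t] = a * b * s / (p + b**2 * s)               # optimal feedback gain
        s = q + a**2 * s - (a * b * s)**2 / (p + b**2 * s)  # Riccati update
    return gains

# With perfect state observations the LQG strategy reduces to u_t = -k_t * x_t, cf. (12).
print(lqr_gains(a=0.9, b=1.0, q=1.0, p=0.5, T=10))
```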
III-B Adversarial Model
The adversary can manipulate the observation of the agent, and its objective is to minimize the agent’s expected accumulated reward, similar to [2, 5]. In each stage, the adversary chooses the manipulation parameters, manipulates the system state accordingly, and then reports the manipulated state to the agent. We consider that the adversarial manipulation is “small”, since a large manipulation may be easily detected and may also involve a high manipulation cost. Given the agent’s belief, we impose the following constraints on the manipulation:
(13)
(14)
The mutual information constraint (14) implies that the manipulated state conveys at least a certain amount of information about the system state to the agent. A larger value of the bound corresponds to a weaker adversary, and vice versa. Note that in order to satisfy the mutual information constraint, the adversary cannot use a zero manipulation coefficient. We denote the set of feasible adversarial actions in each stage subject to (13)-(14). Finally, at the end of each stage, the adversary reveals the manipulation parameters to the agent (this assumption is strong but may hold in some cases; for instance, the player in the shell game reveals the cup hiding the pellet after each round), so as to keep the adversarial model consistent with the standard LQG control.
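To illustrate the feasibility check for a manipulation, recall that for a Gaussian belief N(μ, s) and a reported state x̃ = θx + v with v ~ N(0, σ²), the mutual information is I(X; X̃) = ½ ln(1 + θ²s/σ²). The sketch below checks a candidate manipulation against the two constraints; it assumes that (13) bounds the magnitude of the manipulation coefficient (consistent with the coefficient bound discussed in Section V), and the names i_min and theta_bound are placeholders.

```python
import numpy as np

def mutual_information(theta, sigma2, belief_var):
    """I(X; X_tilde) in nats for X ~ N(mu, belief_var) and
    X_tilde = theta*X + V with V ~ N(0, sigma2)."""
    return 0.5 * np.log(1.0 + theta**2 * belief_var / sigma2)

def is_feasible(theta, sigma2, belief_var, i_min, theta_bound):
    """Check a candidate manipulation (theta, sigma2) against an assumed
    coefficient bound, cf. (13), and the mutual information lower bound (14)."""
    return abs(theta) <= theta_bound and \
           mutual_information(theta, sigma2, belief_var) >= i_min

# A zero coefficient makes the report independent of the state (zero mutual
# information) and is therefore infeasible for any positive bound.
print(is_feasible(theta=0.8, sigma2=0.2, belief_var=1.0, i_min=0.5, theta_bound=1.5))
print(is_feasible(theta=0.0, sigma2=0.2, belief_var=1.0, i_min=0.5, theta_bound=1.5))
```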
III-C Adversarial LQG Control Game
In every stage of the adversarial problem, there is a cheap talk interaction, where the adversary acts as the sender and the agent as the receiver. Different from the dynamic cheap talk game with an action-independent Markovian system [10], we propose a novel dynamic cheap talk game to model the strategic interaction of the adversary and the agent with asymmetric information in the adversarial LQG control problem. Unlike in recent works on adversarial reinforcement learning [2, 3, 4, 5], in our model of strategic interaction the agent is aware of and can adapt to the adversary.
The game is played between the adversary and the agent over the horizon. In each stage, the belief of the agent is known to the adversary and determines the set of feasible adversarial actions. The adversary uses a behavioral strategy over this set for choosing the manipulation parameters. Then, given the observed system state, it generates the manipulated state with the corresponding probability measure. The agent uses a pure strategy (the following analysis will show that it is sufficient to consider a pure agent strategy with an affine form) for choosing its coefficients based on the belief, and takes the action once it receives the manipulated state. Finally, the players compute the next belief based on the current belief, the manipulation coefficient, the manipulation variance, the manipulated state, and the action.
We can thus express the expected accumulated agent reward using the adversarial strategies and the agent’s strategies over stages as
(15) |
Consequently, the objective of the adversary is to minimize (15), while the agent aims at maximizing it. We refer to this particular dynamic cheap talk game as the adversarial LQG (ALQG) game. Fig. 2 illustrates a three-stage ALQG game. Our objective is to characterize SPEs of the ALQG game: their existence conditions and the structure of the equilibrium strategies.
Remark 1
The adversarial LQG problem cannot be modeled as an SG or a POSG, since the adversary directly manipulates the observation of the agent. Furthermore, unlike for a zero-sum SG, an SPE of the ALQG game does not necessarily exist. On the other hand, for a single-stage horizon the game is a cheap talk game [18], in which the strategies of both players depend on the belief of the agent and on the constraints on the adversarial manipulation. Nonetheless, in the ALQG game the reward function is different from that in [18], which gives rise to different equilibria, as we will show later.
IV EQUILIBRIUM ANALYSIS
In the following, we first formulate the value function and the belief update rule for the adversarial LQG problem; we then characterize SPEs in pure strategies and in behavioral strategies, respectively.
IV-A Value Function and Belief Update
Assume that there is an SPE consisting of strategies . Induced by this SPE, we can define the value function of a subgame starting from the -th stage as
(16) |
i.e., the value function is the expected accumulated agent reward in the subgame starting from that stage, given the belief in that stage, when the SPE strategies are used. The evaluation of the value function requires the beliefs in the subgame. In the following, we specify the belief update rule.
Given the current belief , the coefficient , the variance , the manipulated state , and the action , it follows from the adversarial LQG model and Bayes rule that with
(17)
(18)
An immediate consequence of the belief update rule is the following.
Property 1
It follows from the positive process noise variance, the variance update rule (18), and the positive initial belief variance that the belief variance is positive in every stage.
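As a concrete illustration of a Gaussian belief update of the form (17)-(18), the following sketch uses the scalar model assumed in the earlier sketches (x_{t+1} = a x_t + b u_t + w_t with w_t ~ N(0, m), and x̃_t = θ_t x_t + v_t with v_t ~ N(0, σ_t²)); it is not the paper's exact expressions. It also makes Property 1 plausible: the propagated variance stays positive whenever the process noise variance is positive.

```python
import numpy as np

def belief_update(mu, s, theta, sigma2, x_tilde, u, a, b, m):
    """One Gaussian belief update under the assumed scalar model.

    mu, s: mean and variance of the current belief about x_t.
    Returns the mean and variance of the belief about x_{t+1}.
    """
    # Measurement update (Bayes rule for jointly Gaussian variables).
    post_var = 1.0 / (1.0 / s + theta**2 / sigma2)
    post_mean = post_var * (mu / s + theta * x_tilde / sigma2)
    # Time update through the assumed linear dynamics with the taken action u.
    next_mu = a * post_mean + b * u
    next_s = a**2 * post_var + m      # positive whenever m > 0
    return next_mu, next_s

print(belief_update(mu=0.0, s=1.0, theta=0.8, sigma2=0.2,
                    x_tilde=0.5, u=-0.2, a=0.9, b=1.0, m=0.1))
```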
Observe that the value functions have to satisfy the backward dynamic programming equation
(19) |
Thus the SPE has to satisfy (19), which is the basis for the analysis we present in the following.
IV-B Pure Strategy Equilibria
We start the analysis by considering pure strategy equilibria. With a slight abuse of notation, we denote a pure strategy of the adversary as a function of the belief.
We first consider the case .
Proposition 1
Let . An SPE consists of , where for any belief ; and can be any adversarial strategy defined on .
Proof:
Observe that is a dominant strategy for the agent for any belief . Under this strategy, the adversarial strategy has no impact on the agent’s reward. This proves the result. ∎
The existence of a pure strategy SPE in this case is encouraging, even if the equilibrium is degenerate. Unfortunately, in general an SPE may not exist, as shown in the following theorem.
Theorem 1
Let . If or if , then there is no pure strategy SPE for the ALQG game. If , then there is a unique pure strategy SPE. The SPE strategies for are given by
(20)
(21)
(22)
(23)
(24)
(25)
(26)
Corollary 1
If , the value function induced by the unique pure strategy SPE is
(27) |
Remark 2
We can make the following observations based on Theorem 1 and Corollary 1.
• Since the LQG control can be seen as the best response to any given (adversarially manipulated) observation model, it is sufficient to consider a pure agent strategy in the form of an affine function for an SPE of the ALQG game with a pure adversarial strategy.
• It follows from (24) that a rational adversary will always apply a manipulation with the largest variance.
• The value function consists of a constant term and two separable terms depending on the mean and the variance of the belief, which allows a closed form solution.
Time-invariant system: We now turn to the asymptotic analysis of a time-invariant system, i.e., a system whose parameters are constant over time, and we let the horizon tend to infinity. Let us define the mapping as
(29) |
Observe that this mapping is effectively the coefficient update (20)-(22) for the time-invariant model. In what follows, we first characterize its fixed point and then the pure strategy SPE of the ALQG game in the asymptotic regime.
Proposition 2
Let . Then the mapping admits a least fixed point , for which
(30) |
with .
Proof:
We start the proof by observing that the mapping is order-preserving. That is, for all , satisfying , i.e., and , we have . This can be easily shown by analyzing (29).
Furthermore, the fixed point equation has a unique solution on if and only if . Since is order-preserving, the convergence result follows from Kleene’s fixed point theorem [21]. ∎
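For intuition, the Kleene iteration used in the proof can be mimicked numerically: starting from zero and repeatedly applying an order-preserving map converges to its least fixed point. In the sketch below, the iterated map is a scalar Riccati-type update with the same monotonicity property; it is only a placeholder, not the paper's mapping (29).

```python
def least_fixed_point(f, x0=0.0, tol=1e-10, max_iter=10_000):
    """Kleene-style iteration: repeatedly apply an order-preserving map
    starting from the bottom element until the iterates stop changing."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge")

# Placeholder monotone map (not the paper's mapping): an algebraically simplified
# form of the Riccati update from the LQR sketch above, increasing and bounded on [0, inf).
a, b, q, p = 0.9, 1.0, 1.0, 0.5
riccati = lambda s: q + a**2 * p * s / (p + b**2 * s)
print(least_fixed_point(riccati))
```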
Analytical expressions for and can be obtained by solving the fixed point equation , and can be used for characterizing the SPE in pure strategies, using Theorem 1 and Proposition 2, as follows.
Theorem 2
Let , , and . Then the ALQG game of the time-invariant model has a stationary SPE in pure strategies as: For ,
(31)
(32)
(33)
(34)
Interestingly, for this stationary SPE in pure strategies we can obtain the expected average reward per stage in steady state in closed form.
Corollary 2
Let the initial belief have bounded mean and variance. For the stationary SPE in pure strategies in Theorem 2, the expected average reward per stage in steady state is independent of the initial belief and is given by
(35) |
IV-C Equilibria in Behavioral Strategies
The previous results show that a pure strategy SPE does not exist if there are multiple choices of the coefficient for the adversary. We thus turn to the analysis of SPE in behavioral strategies.
Theorem 3
Let and . Then for , and are as given by (21), and
(36)
(37)
Furthermore, there is a continuum of SPEs in behavioral strategies. Each SPE in the -th stage consists of a behavioral strategy of the adversary and a pure strategy of the agent that satisfies
(38)
(39)
(40)
(41)
Interestingly, this SPE is a babbling equilibrium in which the agent’s action is based on its belief, not on the manipulated state. Nonetheless, the value of the game depends on the adversarial manipulation, as shown in the following corollary.
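For intuition, a single stage of such a babbling equilibrium might be simulated as follows. The zero-mean two-point randomization of the manipulation coefficient and the belief-mean feedback gain are illustrative choices consistent with the description above, not the paper's exact expressions (38)-(41).

```python
import numpy as np

rng = np.random.default_rng(1)

def adversary_babbling(theta_bar, sigma2):
    """Zero-mean randomization of the manipulation coefficient
    (an illustrative two-point distribution over +/- theta_bar)."""
    theta = theta_bar if rng.random() < 0.5 else -theta_bar
    return theta, sigma2

def agent_babbling_action(mu, k):
    """In a babbling equilibrium the action depends only on the belief mean,
    not on the reported state."""
    return -k * mu

theta, sigma2 = adversary_babbling(theta_bar=0.8, sigma2=0.2)
x = rng.normal()                                        # true state, unknown to the agent
x_tilde = theta * x + rng.normal(0.0, np.sqrt(sigma2))  # reported, but ignored below
u = agent_babbling_action(mu=0.0, k=0.6)
print(theta, x_tilde, u)
```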
Corollary 3
Let . For any SPE in behavioral strategies, we have
(42) |
Remark 3
We can make the following observations based on Theorem 3 and Corollary 3.
• It is sufficient for the agent to use a pure affine strategy against a behavioral strategy of the adversary.
• Although the adversary cannot use a zero manipulation coefficient, the behavioral strategy needs to achieve a zero mean of the random coefficient.
• A rational adversary will always use a manipulation with the largest variance.
Time-invariant system: We again turn to the time-invariant system and let the horizon tend to infinity. Let us define the mapping as
(44) |
Observe that this mapping is effectively the coefficient update rule (21), (36), (37) for the time-invariant system. In what follows, we characterize its fixed point and the stationary SPEs in behavioral strategies for the ALQG game.
Proposition 3
Let . Then the mapping admits a least fixed point , for which
(45) |
Theorem 4
Let , , and . Then the ALQG game of the time-invariant model has a stationary SPE in behavioral strategies as: For ,
(46)
(47)
(48)
Corollary 4
Let the initial belief have bounded mean and variance. For the stationary SPE in behavioral strategies in Theorem 4, the expected average reward per stage in steady state is independent of the initial belief and is given by
(49) |
The proofs of Theorem 4 and Corollary 4 are based on Theorem 3, Corollary 3, and Proposition 3, and follow from arguments similar to those used in the proofs of Theorem 2 and Corollary 2. Observe that Corollary 2, Corollary 4, and Property 3 jointly imply that the expected average agent reward per stage in steady state is higher under pure strategies, as behavioral strategies allow for more uncertainty about the attack and thus make the adversary stronger.
The behavioral strategy of the adversary in Theorem 3 needs to achieve a zero mean of the random coefficient, which cannot be satisfied under certain combinations of the adversarial constraints. In the following, we study SPEs under these conditions.
Theorem 5
Let . If or if , there is no SPE for the ALQG game.
Theorem 6
Let . If or if , there is a unique SPE in behavioral strategies for the ALQG game: For any belief ,
(50)
(51)
(52)
(53)
(54)
for any belief , ; ; and
(55) |
V NUMERICAL RESULTS
We illustrate the impact of strategic interaction on a time-invariant adversarial LQG problem. The parameters used for the evaluation are shown in Table I.
We start by illustrating the convergence of the mappings in Propositions 2 and 3. Fig. 5 shows the two mappings as functions of the iteration index. Both mappings increase monotonically starting from zero and converge to their respective least fixed points, which confirms the propositions.
We continue with the evaluation of stationary SPEs for the time-invariant system. Fig. 5 shows the expected average reward per stage for a stationary SPE in pure strategies and that for a stationary SPE in behavioral strategies. Observe that the adversarial manipulation capability decreases as the mutual information lower bound increases, and hence the expected average rewards increase. The results also confirm Property 3, i.e., the expected average reward per stage for a stationary SPE in behavioral strategies cannot be higher than that for a stationary SPE in pure strategies.
Next, we assess the importance of strategic interaction for the agent’s performance, as compared to an agent that is unaware of the attack [2, 5]. Fig. 5 shows the expected average reward per stage for a stationary SPE in behavioral strategies, as per Corollary 4, that for a naive agent under an optimal adversarial attack, and that for an alert agent under the SPE adversarial behavioral strategy of Corollary 4. The naive agent is unaware of the adversarial manipulation, i.e., it uses the optimal LQR strategy (12). The corresponding optimal stationary adversarial strategy can be obtained through dynamic programming, and is the pure strategy:
The alert agent suspects an adversary but does not act strategically despite its presence. The alert agent assumes fixed manipulation parameters and uses the corresponding best response strategy
The figure shows that the expected average reward per stage of the naive agent is always lower than that for the SPE. Clearly, as the mutual information lower bound increases, the adversarial manipulation capability becomes weaker and the expected average rewards per stage increase. At the same time, we can observe that if the bound on the manipulation coefficient is higher, then the adversarial manipulation capability becomes stronger and therefore the expected average reward per stage of the naive agent decreases. Note that the limit of the expected average reward per stage of the naive agent does not exist when the coefficient bound is larger than a threshold, due to the resulting instability of the control system. The poor performance of the naive agent is consistent with recent works on adversarial DRL [2, 5], where naive DRL agents were found to perform poorly against strategic adversaries. The results for the SPE show, however, that an agent that is aware of the adversary can adjust its strategy to be resilient to the adversarial attack. The figure also shows that the expected average reward per stage of the alert agent is always lower than that for the SPE, since the alert agent does not adjust its best response strategically to the SPE adversarial strategy. In contrast to the SPE and the naive agent, it is interesting to observe that the alert agent’s performance deteriorates as the adversarial constraint increases. This is because the alert agent’s strategy deviates more from the SPE strategy, as shown in Corollary 4, as the constraint increases.
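As a rough illustration of this naive-versus-aware comparison, the following Monte Carlo sketch contrasts an agent that treats the reported state as the true state with one that ignores the report and feeds back its (open-loop) belief mean, under a zero-mean randomized manipulation. All parameters and gains below are placeholders rather than the values in Table I, and neither policy is the exact SPE or best-response strategy of the paper; in this toy setting the naive agent typically obtains a markedly lower average reward, qualitatively in line with the discussion above.

```python
import numpy as np

rng = np.random.default_rng(2)
a, b, q, p, m = 0.9, 1.0, 1.0, 0.5, 0.1   # assumed scalar model parameters
k = 0.6                                    # fixed feedback gain (placeholder)
theta_bar, sigma2 = 0.5, 0.2               # adversary randomizes theta over {+theta_bar, -theta_bar}
T, runs = 200, 50

def average_reward(naive):
    """Monte Carlo estimate of the average reward per stage."""
    rewards = []
    for _ in range(runs):
        x, mu, total = rng.normal(), 0.0, 0.0
        for _ in range(T):
            theta = theta_bar if rng.random() < 0.5 else -theta_bar
            x_tilde = theta * x + rng.normal(0.0, np.sqrt(sigma2))
            # Naive agent: treats the reported state as the true state.
            # Aware agent: ignores the report and feeds back its belief mean.
            u = -k * (x_tilde if naive else mu)
            total += -(q * x**2 + p * u**2)
            x = a * x + b * u + rng.normal(0.0, np.sqrt(m))
            mu = a * mu + b * u            # belief mean propagated without using the report
        rewards.append(total / T)
    return float(np.mean(rewards))

print("naive agent:", average_reward(True))
print("aware agent:", average_reward(False))
```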
VI CONCLUSION
We proposed a game theoretic model to capture the strategic interaction, information asymmetry, and system dynamics of LQG control under adversarial manipulation subject to a mutual information constraint. We characterized the subgame perfect equilibria in pure strategies and in behavioral strategies, including stationary equilibria for time-invariant systems. Our results show that if an equilibrium exists then the agent can use an affine pure strategy, but randomization enables the adversary to construct more powerful attacks under a wider range of parameters and forces the agent into a babbling equilibrium. Our numerical results show the importance of strategic interaction for LQG control, and highlight that an agent that is aware of an adversarial attack can be made resilient to it. Our work could be extended in a number of interesting directions, including considering a non-scalar state dynamic system, and relaxing the assumption that the adversarial strategy is revealed to the agent after each stage.
Appendix A: Proofs of Theorem 1 and Corollary 1
Proof:
We prove the result using backward dynamic programming. Recall that there is no feasible adversarial strategy for . Thus, it is sufficient to consider the cases or . In stage the value function for can be expressed as
(56) |
where from the update rules (20)-(22). The pure strategies, which form an SPE and achieve the value function, consist of any pure strategy satisfying the adversarial constraints (13)-(14), and the dominant pure strategy
(57) |
In stage , the Q-function of using , , , and given a belief is
(58) |
where the expectations are induced by the given , , , , and .
Given , , , and , the Q-function is a concave quadratic function of . As the best response to maximize the agent reward, we can substitute in terms of , , and as
(59) |
Thus, it is sufficient to consider the Q-function
(60) |
As shown in Property 1, the belief variance is always positive. Therefore, given , , and , the Q-function is a concave quadratic function of , and the best response of the agent in terms of for , , and can be expressed as
(61) |
Given , , and , the Q-function is a decreasing function of . The best response of the adversary in terms of for and is thus
(62) |
Consequently, it is sufficient to consider the Q-function
(63) |
The pure strategies form an SPE if and satisfy
(64)
(65)
If , is a dominant adversarial strategy. Therefore, the SPE must exist. The pure strategies can be obtained by substituting into (59), (61), and (62) as
The value function can be obtained by substituting and into the Q-function (63) as
Let us now consider the case . Assume that there exists an SPE with . As the best response, solving (65) leads to . For all and we have
Thus, condition (64) cannot hold and hence the assumption is not true, i.e., there is no pure strategy SPE in this case.
Appendix B: Proofs of Theorem 3 and Corollary 3
Proof:
We prove the result by verifying that the given strategies form an SPE. The dominant pure strategy of the agent and the value function in the final stage are as shown in the proofs of Theorem 1 and Corollary 1. Note that any behavioral adversarial strategy satisfying (13)-(14) can be used in the final stage, since it has no impact on the agent reward. Therefore, Theorem 3 and Corollary 3 hold in the final stage.
For stage , we first show that it is sufficient to consider a pure agent strategy with an affine form. A general behavioral agent strategy decides an action based on the belief and the observation with the probability measure . Given a belief , a behavioral adversarial strategy , an observation , and an action from the support set of a behavioral agent strategy , the Q-function is
(66) |
which is a concave quadratic function of . As the best response to maximize the agent reward, the support set of the behavioral agent strategy is a singleton, i.e., it is sufficient to use a pure agent strategy, which has an affine form as
(67) |
Assume that an SPE in behavioral strategies consists of and . Given , in the support set of a behavioral adversarial strategy , , and , we have the following Q-function:
(68) |
From Property 3 and the adversarial constraints (13)-(14), we have
(69) |
Therefore, any behavioral adversarial strategy is the best response of if its support set consists of two or more elements of .
Assume that an SPE consists of a behavioral adversarial strategy , which is defined on a support set containing two or more elements of , and satisfies . Given , , , and , we have the following Q-function:
(70) |
The best response of is
(71) |
Appendix C: Proofs of Theorems 5 and 6
Proof:
To prove Theorems 5 and 6, it is sufficient to consider a two-stage problem. The solution of the final stage is the same as in the proofs of Theorem 3 and Corollary 3, and is therefore omitted here. Theorems 5 and 6 hold in the final stage. Furthermore, as shown in the proofs of Theorem 3 and Corollary 3, it is sufficient to consider a pure agent strategy with an affine form in the first stage.
Given , in the support set of a behavioral adversarial strategy , , and , the Q-function is
(72) |
This Q-function is non-increasing in for any given , , , and . As the best response to minimize the agent reward, it is sufficient to consider a behavioral adversarial strategy defined on a non-singleton support set of .
Given , with a non-singleton support set of , , and , the Q-function is
(73) |
This is a concave quadratic function of when , , and are fixed. As the best response to maximize the agent reward, we can substitute with
(74) |
Then the Q-function (73) reduces to
(75) |
This is also a concave quadratic function of when and are fixed. As the best response to maximize the agent reward, we can substitute with
(76) |
Since we consider or and the behavioral adversarial strategy has a non-singleton support set, and in these cases.
We then study the support set of a behavioral adversarial strategy. Given , in the support set of a behavioral adversarial strategy , , and , the Q-function is
(77) |
This is a concave quadratic function of given , , and . As the best response to minimize the agent reward, it is sufficient to consider a behavioral adversarial strategy with the following support set: .
When or , an SPE in behavioral strategies does not exist since or will lead to a singleton support set of the behavioral adversarial strategy; and meanwhile a pure strategy SPE does not exist since . This proves Theorem 5.
When or , we assume that an SPE in behavioral strategies consists of
Since the assumed agent strategy is the best response to the assumed adversarial strategy, we only need to verify whether there exist parameters of the behavioral adversarial strategy such that it is the best response to the agent strategy, i.e., both elements of its support are minimizers of the Q-function. There is a unique solution
(78) |
Therefore, there is a unique SPE in behavioral strategies for the two-stage ALQG game. The strategies and the value function of the SPE in Theorem 6 can then be obtained easily. ∎
References
- [1] V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529-533, 2015.
- [2] S. Huang, N. Papernot, I. Goodfellow, Y. Duan, and P. Abbeel, “Adversarial attacks on neural network policies,” arXiv:1702.02284.
- [3] Y.-C. Lin, Z.-W. Hong, Y.-H. Liao, M.-L. Shih, M.-Y. Liu, and M. Sun, “Tactics of adversarial attack on deep reinforcement learning agents,” in Proc. of IJCAI, 2017.
- [4] V. Behzadan and A. Munir, “Vulnerability of deep reinforcement learning to policy induction attacks,” in Proc. of MLDM, 2017, pp. 262-275.
- [5] A. Russo and A. Proutiere, “Optimal attacks on reinforcement learning policies,” arXiv:1907.13548.
- [6] T. Osogami, “Robust partially observable Markov decision process,” in Proc. of ICML, 2015.
- [7] L. Pinto, J. Davidson, R. Sukthankar, and A. Gupta, “Robust adversarial reinforcement learning,” in Proc. of ICML, 2017.
- [8] A. Gleave, M. Dennis, C. Wild, N. Kant, S. Levine, and S. Russell, “Adversarial policies: Attacking deep reinforcement learning,” arXiv:1905.10615.
- [9] K. Horak, Q. Zhu, and B. Bosansky, “Manipulating adversary’s belief: A dynamic game approach to deception by design for proactive network security,” in Proc. of GameSec, 2017.
- [10] S. Saritas, S. Yuksel, and S. Gezici, “Nash and Stackelberg equilibria for dynamic cheap talk and signaling games,” in Proc. of ACC, 2017.
- [11] S. Saritas, E. Shereen, H. Sandberg, and G. Dán, “Adversarial attacks on continuous authentication security: A dynamic game approach,” in Proc. of GameSec, 2019.
- [12] Z. Li and G. Dán, “Dynamic cheap talk for robust adversarial learning,” in Proc. of GameSec, 2019.
- [13] M.O. Sayin and T. Basar, “Secure sensor design for cyber-physical systems against advanced persistent threats,” in Proc. of GameSec, 2017.
- [14] M.O. Sayin, E. Akyol, and T. Basar, “Hierarchical multistage Gaussian signaling games in noncooperative communication and control systems,” Automatica, vol. 107, pp. 9-20, 2019.
- [15] R. Zhang and P. Venkitasubramaniam, “Stealthy control signal attacks in linear quadratic Gaussian control systems: Detectability reward tradeoff,” IEEE Transactions on Information Forensics and Security, vol. 12, no. 7, pp. 1555-1570, 2017.
- [16] Q. Zhang, K. Liu, Y. Xia, and A. Ma, “Optimal stealthy deception attack against cyber-physical systems,” IEEE Transactions on Cybernetics, 2020.
- [17] Y. Chen, S. Kar, and J.M.F. Moura, “Cyber physical attacks constrained by control objectives,” in Proc. of ACC, 2016, pp. 1185-1190.
- [18] V.P. Crawford and J. Sobel, “Strategic information transmission,” Econometrica, vol. 50, no. 6, pp. 1431-1451, 1982.
- [19] L. Shapley, “Stochastic games,” Proc. of the National Academy of Sciences, vol. 39, no. 10, pp. 1095-1100, 1953.
- [20] T. Soderstrom, Discrete-Time Stochastic Systems, Springer, 2002.
- [21] A. Baranga, “The contraction principle as a particular case of Kleene’s fixed point theorem,” Discrete Mathematics, vol. 98, pp. 75-79, 1991.