Linear Quadratic Mean-Field Games with Communication Constraints
Abstract
In this paper, we study a large-population game with heterogeneous dynamics and cost functions solving a consensus problem. Moreover, the agents have communication constraints, which take the form of: (1) an Additive White Gaussian Noise (AWGN) channel, and (2) asynchronous data transmission via a fixed scheduling policy. Since the complexity of solving the game increases with the number of agents, we use the Mean-Field Game (MFG) paradigm to solve it. Under standard assumptions on the information structure of the agents, we prove that the control of the agent in the MFG setting is free of the dual effect. This allows us to obtain an equilibrium control policy for the generic agent, which is a function of only the local observation of the agent. Furthermore, the equilibrium mean-field trajectory is shown to follow linear dynamics, hence making it computable. We show that in the finite-population game, the equilibrium control policy prescribed by the MFG analysis constitutes an $\varepsilon$-Nash equilibrium, where $\varepsilon$ tends to zero as the number of agents goes to infinity. The paper is concluded with simulations demonstrating the performance of the equilibrium control policy.
I Introduction
In distributed real-world applications, like networked control systems [1], ecosystem monitoring [2], and energy harvesting [3], we rarely have the luxury of persistent, noise-free communication. Hence, in this work, we study multi-agent systems under a constrained communication structure. The communication constraints may appear in the form of limited sensor energy levels [3], a noisy transmission medium [4, 5], limits on the communication frequency [6, 7], or some combination thereof, as we investigate in this work. Additionally, scalability becomes a major challenge as the number of agents in a multi-agent system grows.
In this paper, we consider a discrete-time multi-agent game problem. Each agent is coupled with the other agents through its cost function, which incentivizes the agent to form a consensus with the other players. In addition to a plant and a controller, each agent's control system (see Figure 1) also consists of a scheduler and an Additive White Gaussian Noise (AWGN) channel. The scheduler controls the flow of information using a fixed scheduling policy. The communication through the AWGN channel is regulated by an encoder/decoder pair, which consists of a predictive encoder [8] to encode sequential data and a minimum mean-square estimation (MMSE) decoder to produce the best estimate of the plant state.
Related Work: There have been several works in the literature studying estimation and control problems under communication constraints. Reference [9] considers the simultaneous design of measurement and control strategies for a class of Linear Quadratic Gaussian (LQG) problems under soft constraints on both. The LQG problem has been further studied for a noisy analog channel [8] and a noiseless digital channel [4] in the forward loop. All these works, however, consider single-agent problems with uninterrupted communication, unlike the setting of this work. In [1], the authors consider a problem where a network of plants shares a noiseless communication medium via a state-based scheduling policy. The system has been shown to be dual-effect free under a symmetry condition on the scheduling policy. Similarly, the optimality of certainty-equivalent control laws has been characterized under an event-triggered communication policy with a noiseless channel in [10, 11]. Our work, on the other hand, proposes a multi-agent game where each agent has intermittent access to its state measurement through a noisy channel.
A key difficulty in multi-agent systems is that of scalability. To alleviate this challenge, a mean-field game (MFG) framework was proposed in [12, 13] by Huang, Malhamé and Caines and simultaneously in [14], by Lasry and Lions. The essential idea in the MFG framework is that as the number of agents goes to infinity, agents become indistinguishable and the effect of individual deviation becomes negligible (that is, the effect of strategic interaction disappears). This leads to an aggregation effect, which can be modelled by an exogenous mean-field (MF) term. Consequently, the game problem reduces to a stochastic optimal control problem for a representative agent along with a consistency condition.
Linear Quadratic MFGs (LQ-MFGs), which combine linear agent dynamics with a quadratic cost function, serve as a significant benchmark in the study of MFGs. Recent works on LQ-MFGs [15, 16] in the discrete-time setting are free of communication constraints or consider partially observed dynamics involving packet drop-outs [17], thereby making the underlying communication link unreliable. Furthermore, Secure MFGs [18] capture the setting where the agents deliberately obfuscate their state information with the goal of subverting an eavesdropping adversary. In these works, however, communication occurs at every time instance, in contrast to our setting here, where the communication is intermittent and the channel adds noise to the incoming signal.
Contribution: In this paper, we prove that under a fixed scheduling policy, an AWGN channel, and a standard information structure, the dual effect of control [19] does not show up. The result is presented in Lemma 1 and is one of the key observations of the paper. This renders the covariance of the estimation error independent of the control signals (for both transmission and non-transmission times). Under the mean-field setting, this insight enables us to reduce the game to solving a standard optimal tracking control problem [16] along with a consistency condition. We prove the consistency condition of the mean-field equilibrium (MFE) under standard assumptions and characterize the linear dynamics of the equilibrium MF trajectory. Finally, we prove that the policies prescribed by the MFE constitute an $\varepsilon$-Nash equilibrium for the finite-population game and provide simulations to illustrate the performance of the equilibrium control policy.
The paper is organized as follows. Following this introduction, Section II introduces the finite-agent game formulation of the multi-agent system and the underlying information structures of each of its entities (see Fig. 1). In Section III, we formulate the LQ-MFG problem, characterize its MFE and demonstrate the $\varepsilon$-Nash property of the MFE. In Section IV, we provide simulations to analyze the performance of the MFE, and we conclude the paper in Section V with some highlights.
Notations: Let $x^i_t$ denote agent $i$'s state at time instant $t$ and $x^i_{0:t}$ the agent's state history from instant $0$ to $t$, i.e., $x^i_{0:t} := \{x^i_0, x^i_1, \ldots, x^i_t\}$. Let the set of non-negative integers and real numbers be denoted by $\mathbb{Z}_{\geq 0}$ and $\mathbb{R}$, respectively. The transpose of a matrix $M$ is denoted by $M^\top$ and the trace of a square matrix by $\mathrm{Tr}(\cdot)$. For a vector $z$ and a positive semi-definite matrix $S$, let $\|z\|^2_S := z^\top S z$. Unless stated otherwise, let $\|\cdot\|$ denote the 2-norm.
II Problem Formulation
Consider an $N$-player game over an infinite time horizon. Each agent's dynamics evolves according to a linear discrete-time controlled stochastic process as
$$ x^i_{t+1} = A(\theta_i)\,x^i_t + B(\theta_i)\,u^i_t + w^i_t, \qquad t \in \mathbb{Z}_{\geq 0}, \qquad (1) $$
where $x^i_t \in \mathbb{R}^n$ and $u^i_t \in \mathbb{R}^m$ are the state process and the control input, respectively, for the $i$th agent. $w^i_t$ is an i.i.d. Gaussian process with zero mean and finite covariance $\Sigma_w$. The initial state $x^i_0$ has mean $\mu_0$ and covariance $\Sigma_0$, and is assumed to be statistically independent of $w^i_t$, $\forall\, t \in \mathbb{Z}_{\geq 0}$. All covariance matrices are assumed to be positive definite. $A(\theta_i)$ and $B(\theta_i)$ are constant matrices with appropriate dimensions. $\theta_i$ denotes the type of the $i$th agent, drawn from a finite set $\Theta$, and is chosen according to the empirical distribution
$$ F_N(\theta) := \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}(\theta_i = \theta), \qquad \theta \in \Theta, \qquad (2) $$
where $\mathbb{1}(\cdot)$ is the indicator function. It is further assumed that $F_N \to F$ weakly, for some probability distribution $F$ over the support of $\Theta$, with corresponding probability mass functions $p_N$ and $p$, respectively.
To complete the problem formulation, we define the information structure on each of the blocks in Fig. 1. Such information structures are standard and appear in applications like industrial and process control [1] and wireless sensor networks [6].
Entity | Information State | Information Space | Input-output Map
---|---|---|---
Scheduler | $I^{s,i}_t = \{x^i_{0:t},\, \tilde{x}^i_{0:t-1}\}$ | $\mathcal{I}^{s,i}_t$ | $\nu^i_t = \eta^i_t(I^{s,i}_t)$
Encoder | $I^{e,i}_t = \{x^i_{0:t},\, u^i_{0:t-1},\, \nu^i_{0:t},\, s^i_{0:t-1}\}$ | $\mathcal{I}^{e,i}_t$ | $s^i_t = \gamma^i_t(I^{e,i}_t)$
Decoder | $I^{d,i}_t = \{r^i_{0:t},\, \nu^i_{0:t}\}$ | $\mathcal{I}^{d,i}_t$ | $\tilde{x}^i_t = \delta^i_t(I^{d,i}_t)$
Controller | $I^{c,i}_t = \{\tilde{x}^i_{0:t}\}$ | $\mathcal{I}^{c,i}_t$ | $u^i_t = \pi^i_t(I^{c,i}_t)$
First, we define a transmission time as an instant when information is sent over the channel. Let the history of transmission times up to the current instant ($t$) be denoted by the set $\mathcal{T}_t := \{t_1, t_2, \ldots\} \cap [0, t]$, where $t_k$ denotes the $k$th transmission instant, as formalized in the next paragraph. By convention, we take $t_1 = 0$.
The information states, information spaces, and the input-output maps defining the scheduler, encoder, decoder and controller are as defined in Table I. The scheduler has access to the history of plant states and the decoded outputs, based on which it decides the transmission times of the plant state through the channel. The decision whether to transmit or not is taken based on an (innovations-based) threshold scheduling policy (of the form $\nu^i_t = \mathbb{1}\big((e^i_t)^\top \Gamma\, e^i_t > \lambda\big)$, where $e^i_t$ is the error between the plant and the decoder output at instant $t$, to be defined later, $\Gamma$ is a user-defined constant positive definite matrix, and $\lambda$ is the threshold parameter). Note that here, by an innovations-based process, we mean that, given a process $\{z_t\}$, the innovation process $\{\tilde{z}_t\}$ contains the new information in $z_t$ not carried in the sequence $z_{0:t-1}$ [20, Section 5.3]. We define $\nu^i_t \in \{0, 1\}$, where $\nu^i_t = 1$ signifies that $t$ is a transmission instant and $\nu^i_t = 0$ signifies no transmission.
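For concreteness, a minimal sketch of this threshold rule is given below (in Python); the identifiers `schedule`, `Gamma` and `lam` are illustrative and correspond to $\Gamma$ and $\lambda$ above, not names used in the paper.

```python
import numpy as np

def schedule(e_t: np.ndarray, Gamma: np.ndarray, lam: float) -> int:
    """Innovations-based threshold scheduler: transmit (nu_t = 1) iff the
    weighted error energy e_t' Gamma e_t exceeds the threshold lam."""
    return int(e_t @ Gamma @ e_t > lam)
```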
Next, the encoder transmits the encoded state information ($s^i_t$) at transmission times over the channel. The signal $s^i_t$ in Figure 1 is given as:
$$ s^i_t = \begin{cases} x^i_t - \breve{x}^i_t, & \text{if } \nu^i_t = 1, \\ \varnothing, & \text{if } \nu^i_t = 0, \end{cases} $$
where $\breve{x}^i_t$ is the decoder's recursive estimate introduced in (3) below.
The encoder is assumed to have full knowledge of the system as in Table I, which leads to better control performance compared to structures where only partial information is available [4] (cf. [4] for a detailed study on other encoder information structures and how they may be realized). Such a situation emerges when the encoder and the scheduler are collocated with the plant, and hence can observe both its state and the control actions applied to it [4]. In addition, we assume that the encoder is predictive, i.e., it transmits over the channel functions of the true state minus the decoder output at the same instant. Such encoders are used in practice for encoding sequential data [8].
It is imperative to note here that the above assumptions on the encoder and scheduler information structures, as we prove in the next section, entail no dual effect of control, and hence lead to a simple controller design. The dual effect of control refers to the dual role of the controller in steering the evolution of the system dynamics and in probing the scheduler for new measurements to reduce its uncertainty about the system state [10]. Finally, we note that, in the table, $\eta^i_t$, $\gamma^i_t$, $\delta^i_t$ and $\pi^i_t$ denote the (deterministic) policies of the scheduler, encoder, decoder and controller, respectively.
The encoded signal is sent over an AWGN channel, which is analog, memoryless and is modeled as an additive-noise map $s \mapsto s + n$,
where $n^i_t$ is an i.i.d. zero-mean Gaussian process with finite positive-definite covariance $\Sigma_n$ and represents the channel noise. The input and output alphabets of the channel lie in $\mathbb{R}^n$. The signal $r^i_t$ in Figure 1 is given as:
$$ r^i_t = \begin{cases} s^i_t + n^i_t, & \text{if } \nu^i_t = 1, \\ \varnothing, & \text{if } \nu^i_t = 0. \end{cases} $$
Next, the decoder at the controller end serves two purposes. First, it decodes the noisy channel output to produce an MMSE estimate [8] of the input signal whenever new information is received via the channel. Second, between transmission times, it calculates a recursive estimate of the plant state to supply information to the controller at all times $t$. Thus, the complete decoder mapping is given by
$$ \tilde{x}^i_t = \begin{cases} \breve{x}^i_t + \mathbb{E}\big[s^i_t \,\big|\, r^i_t,\, I^{d,i}_{t-1}\big], & \text{if } \nu^i_t = 1, \\ \breve{x}^i_t, & \text{if } \nu^i_t = 0, \end{cases} \qquad (3) $$
where $\breve{x}^i_t$ is the recursive estimate calculated by the decoder between transmission instants, and $\tilde{x}^i_t$ is the input to the controller.
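The following Python sketch illustrates one time step of this scheduler-encoder-channel-decoder chain. It assumes, for illustration only, that the encoder output is approximated as a zero-mean Gaussian with covariance `Sigma_s`, so that the MMSE decoder reduces to the linear estimator $\mathbb{E}[s \mid r] = \Sigma_s(\Sigma_s + \Sigma_n)^{-1} r$; all identifiers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_step(x_t, x_breve, Sigma_s, Sigma_n, nu_t):
    """One step of the predictive-encoder / AWGN / MMSE-decoder chain.
    x_breve is the decoder's recursive estimate; returns the decoder
    output tilde_x_t, as in (3)."""
    if nu_t == 0:
        return x_breve                       # no transmission: keep recursive estimate
    s_t = x_t - x_breve                      # predictive encoding: state minus estimate
    n_t = rng.multivariate_normal(np.zeros(len(s_t)), Sigma_n)
    r_t = s_t + n_t                          # AWGN channel output
    # Linear MMSE estimate of s_t from r_t for zero-mean Gaussian s_t:
    s_hat = Sigma_s @ np.linalg.solve(Sigma_s + Sigma_n, r_t)
    return x_breve + s_hat                   # decoder output at a transmission time
```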
Finally, the controller calculates control actions by minimizing an infinite-horizon average cost function
$$ J^i_N(\pi^i, \pi^{-i}) = \limsup_{T \to \infty} \frac{1}{T}\,\mathbb{E}\Big[\sum_{t=0}^{T-1} \Big\|x^i_t - \frac{1}{N}\sum_{j=1}^{N} x^j_t\Big\|^2_{Q(\theta_i)} + \|u^i_t\|^2_{R(\theta_i)}\Big], \qquad (4) $$
where $Q(\theta_i) \succeq 0$ and $R(\theta_i) \succ 0$, and the parameter $\theta_i$ determines the tuple $(A(\theta_i), B(\theta_i), Q(\theta_i), R(\theta_i))$ for each agent. Further, $u^i_t = \pi^i_t(I^{c,i}_t)$, where $I^{c,i}_t$ is as defined in Table I. The control law for agent $i$ is the sequence of deterministic control policies $\pi^i := (\pi^i_t)_{t \geq 0} \in \Pi^i$, where $\Pi^i$ is the space of admissible decentralized control laws. The coupling between agents enters via the consensus term $\frac{1}{N}\sum_{j=1}^N x^j_t$ in the objective. Further, the cost incorporates a soft constraint on the control actions alongside penalizing state deviations from the consensus term, which each agent aims to track. Finally, the expectation in (4) is taken with respect to the noise statistics and the initial state distribution.
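As an illustration, the finite-horizon truncation of (4) can be evaluated empirically from simulated trajectories. The sketch below (with illustrative names, truncating the limsup at a horizon `T`) computes the per-agent average cost against the population mean.

```python
import numpy as np

def average_cost(xs, us, Q, R):
    """Empirical per-agent average cost, truncating (4) at a finite horizon.
    xs has shape (T, N, n) and us has shape (T, N, m)."""
    T, N, _ = xs.shape
    J = np.zeros(N)
    for t in range(T):
        x_bar = xs[t].mean(axis=0)                      # consensus term
        for i in range(N):
            dev = xs[t, i] - x_bar
            J[i] += dev @ Q @ dev + us[t, i] @ R @ us[t, i]
    return J / T
```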
III Mean-Field Games
In this section, we solve the problem for the $N$-player system (1) with objective (4) by considering the limiting case as $N \to \infty$. In this setting, the consensus term in (4) can be approximated by a known deterministic sequence (also termed the mean-field trajectory) following the Nash Certainty Equivalence Principle [12]. This reduces the problem to a tracking control problem and a consistency condition as shown later. First, we obtain the solution (to a fully observed tracking problem constructed from a partially observed one) for this infinite-agent (mean-field) system. This solution, called the MFE, consists of computing an equilibrium control policy and the equilibrium MF trajectory. Finally, we demonstrate its $\varepsilon$-Nash property.
III-A Optimal Tracking Control
Consider a generic agent (from an infinite population system) of type $\theta$ with dynamics
$$ x_{t+1} = A(\theta)\,x_t + B(\theta)\,u_t + w_t, \qquad (5) $$
where $x_t \in \mathbb{R}^n$ and $u_t \in \mathbb{R}^m$ are the state process and the control input, respectively. The initial state $x_0$ has mean $\mu_0$ and finite positive-definite covariance $\Sigma_0$. Further, $w_t$ is an i.i.d. Gaussian process with zero mean and finite positive-definite covariance $\Sigma_w$. Let us denote the generic agent's controller information space at time $t$ as $\mathcal{I}^{c}_t$. Then, its information state at any time $t$ is $I^{c}_t = \tilde{x}_{0:t}$. Let us define the map $\pi := (\pi_t)_{t \geq 0}$, or more specifically, $\pi_t$ maps $\mathcal{I}^{c}_t$ to $\mathbb{R}^m$. The control law can then be given as $u_t = \pi_t(I^{c}_t)$, where $\pi \in \Pi$ and $\Pi$ denotes the admissible class of control laws. The objective function for the generic agent can be given as
$$ J(\pi, \bar{x}) = \limsup_{T \to \infty} \frac{1}{T}\,\mathbb{E}\Big[\sum_{t=0}^{T-1} \|x_t - \bar{x}_t\|^2_{Q(\theta)} + \|u_t\|^2_{R(\theta)}\Big], \qquad (6) $$
where the expectation is taken over the noise statistics, the initial state and the joint laws of the processes involved. In addition, $\bar{x} := (\bar{x}_t)_{t \geq 0}$ is the MF trajectory, which represents the infinite-player approximation to the coupling term in (4). The introduction of this term leads to indistinguishability between the agents, thereby making the effect of state deviations of individual agents negligible. Consequently, the game problem reduces to a stochastic optimal control problem for the generic agent followed by a consistency condition, whose solution is given by the MFE. Before we formally define the MFE, we state the following assumption on the mean-field system.
Assumption 1.
(i) The pair $(A(\theta), B(\theta))$ is controllable and $(A(\theta), Q(\theta)^{1/2})$ is observable, for each $\theta \in \Theta$.
(ii) The MF trajectory $\bar{x} \in \mathcal{X}$, where $\mathcal{X} := \{x : \mathbb{Z}_{\geq 0} \to \mathbb{R}^n \mid \sup_t \|x_t\| < \infty\}$ is the space of bounded vector-valued functions.
Now, to define an MFE, we introduce two operators [16]:
1. $\Phi : \mathcal{X} \to \Pi$, given by $\pi = \Phi(\bar{x})$, which outputs the optimal control policy minimizing the cost (6), and
2. $\Lambda : \Pi \to \mathcal{X}$, given by $\bar{x} = \Lambda(\pi)$, also called the consistency operator, which generates an MF trajectory consistent with the optimal policy as obtained above.
Then, we have the following definition.
Definition 1.
(Mean-field equilibrium [16]) The pair $(\pi^*, \bar{x}^*)$ is an MFE if $\pi^* = \Phi(\bar{x}^*)$ and $\bar{x}^* = \Lambda(\pi^*)$. More precisely, $\bar{x}^*$ is a fixed point of the composite map $\Lambda \circ \Phi : \mathcal{X} \to \mathcal{X}$.
The trajectory $\bar{x}^*$ is the MF trajectory at equilibrium, with $\pi^*$ as the equilibrium control policy. The aim now is to design an optimal tracking control policy for (5) minimizing (6) under the information structure discussed above.
Before proceeding, we point out that, with some abuse of notation, we retain the same notation (as in Figure 1) for the generic agent's control system, except that we remove the superscript $i$. We now construct the fully observed problem (in Proposition 1 below) using decoder estimates ($\tilde{x}_t$) from the noisy observations output by the channel. This will follow from two results, namely: the control policy is free of the dual effect [19, 21], and the optimal control policy is certainty equivalent, as we prove next. Define the error between the plant and the decoder output as:
$$ e_t := x_t - \tilde{x}_t, \qquad (9) $$
where, from (3), we let $e^1_t$ and $e^0_t$ denote the error (9) at the transmission ($\nu_t = 1$) and non-transmission ($\nu_t = 0$) times, respectively. Then, we have the following lemma.
Lemma 1.
Consider the system (5) for the generic agent under the information structure as in the previous section. Then:
(i) The relative error $e_t$ is independent of all control choices, for all $t \geq 0$.
(ii) The expected value of $e_t$ and the conditional correlation between $e_t$ and $\tilde{x}_t$ given $I^{c}_t$, are both zero.
Consequently, the control is free of the dual effect.
Proof.
We prove the Lemma in two parts, first for transmission and then for non-transmission times.
Part 1: Consider a transmission instant $t_k$. Then, we have
$$ e_{t_k} = x_{t_k} - \tilde{x}_{t_k} = s_{t_k} - \mathbb{E}\big[s_{t_k} \,\big|\, r_{t_k},\, I^{d}_{t_k - 1}\big]. $$
Using simple manipulations, the error at the $k$th transmission time can be written recursively in terms of the preceding one as
$$ e_{t_k} = M_k\, e_{t_{k-1}} + N_k\, \omega_k, \qquad (10) $$
where $\omega_k$ stacks the process and channel noises arriving between $t_{k-1}$ and $t_k$, for matrices $M_k$ and $N_k$ of appropriate dimensions.
Now, we prove that $e_{t_k}$ is independent of control actions by induction on the parameter $k$. Note that the first event time is assumed to be $t_1 = 0$. Fix $k = 1$. Then, $e_{t_1}$ is independent of all control actions. Now, assume that $e_{t_{k-1}}$ is independent of all controls. Further, by assumption, we have that the process noise and the channel noise are independent of the controls. Consequently, from (10) and the induction hypothesis, we have that $e_{t_k}$ is independent of all controls. Finally, by the principle of mathematical induction, (i) holds for Part 1.
Next, since the decoder output at a transmission time is the MMSE estimate, we have $\mathbb{E}[e_{t_k}] = 0$. Also,
$$ \mathbb{E}\big[e_{t_k}\,\tilde{x}^\top_{t_k} \,\big|\, I^{c}_{t_k}\big] = \mathbb{E}\big[e_{t_k} \,\big|\, I^{c}_{t_k}\big]\,\tilde{x}^\top_{t_k} = 0, $$
where the first equality follows since $\tilde{x}_{t_k}$ is $\sigma(I^{c}_{t_k})$-measurable (the sigma-algebra generated by $I^{c}_{t_k}$), and the second from the unbiasedness of the MMSE estimate. Hence, (ii) holds.
Part 2: Now, we prove (i) and (ii) for the non-transmission times. Suppose $t_k$ and $t_{k+1}$ are some consecutive transmission times. Then, for any $t \in (t_k, t_{k+1})$, $e_t = \bar{M}_t\, e_{t_k} + \bar{N}_t\, w_{t_k:t-1}$, for appropriate matrices $\bar{M}_t$ and $\bar{N}_t$. Then, by Part 1, (i) holds for all non-transmission times.
Next, we prove (ii). It is easy to see that the information state of the decoder is updated with new information only at the transmission instants. More specifically, any estimate between two transmission times $t_k$ and $t_{k+1}$, i.e., at some $t \in (t_k, t_{k+1})$, can be recovered from its information state $I^{d}_{t_k}$. Thus,
$$ \mathbb{E}\big[e_t\,\tilde{x}^\top_t \,\big|\, I^{c}_t\big] = \mathbb{E}\big[\bar{M}_t\, e_{t_k} + \bar{N}_t\, w_{t_k:t-1} \,\big|\, I^{c}_t\big]\,\tilde{x}^\top_t = 0. $$
The last equality follows from Part 1 and the fact that $w_t$ is an independent zero-mean process. Finally, $\mathbb{E}[e_t] = 0$ follows in the same manner as in Part 1. Thus (ii) holds and the proof is complete. ∎
Note that the proof of Lemma 1 is made possible by the information structure of the scheduler, encoder and decoder. Since the information maps of the scheduler and the controller entail partially nested $\sigma$-algebras, the scheduler is able to recover the controller output information at its own end. This, in addition to the deterministic nature of the control policies, allows the scheduler to compute the scheduling policy based on innovations and consequently take complete authority over the transmission of new information. As a result, the control is dual-effect free.
Next, we prove that, due to the absence of the dual effect, the separation principle holds for the underlying tracking problem. Consequently, the optimal control law under the information structure in Table I is certainty equivalent [1].
Theorem 1.
Consider the information structure on the generic agent as in Table I. Then, the control design problem separates into designing a state decoder and a certainty equivalent controller.
Proof.
Consider the cost-to-go as follows:
$$ \mathbb{E}\big[\|x_t - \bar{x}_t\|^2_{Q(\theta)} \,\big|\, I^{c}_t\big] = \mathbb{E}\big[\|\tilde{x}_t + e_t - \bar{x}_t\|^2_{Q(\theta)} \,\big|\, I^{c}_t\big] = \|\tilde{x}_t - \bar{x}_t\|^2_{Q(\theta)} + 2\,\mathbb{E}\big[e_t \,\big|\, I^{c}_t\big]^\top Q(\theta)\,(\tilde{x}_t - \bar{x}_t) + \mathbb{E}\big[\|e_t\|^2_{Q(\theta)} \,\big|\, I^{c}_t\big] = \|\tilde{x}_t - \bar{x}_t\|^2_{Q(\theta)} + \mathbb{E}\big[\|e_t\|^2_{Q(\theta)} \,\big|\, I^{c}_t\big], $$
where the third equality follows from Lemma 1 and the fact that $\bar{x}_t$ is deterministic and independent of the controls. The last term is also independent of control actions, again since the relative error is independent of control actions by Lemma 1. Then, since the error-induced term is independent of controls, the proof is complete. ∎
Certainty equivalence of the optimal control policy is a consequence of the absence of the dual effect [19, 21] of control, as established in Lemma 1. When the control is free of the dual effect, the covariance of the estimation error is independent of the control signals used. Thus, the controller cannot benefit from probing the scheduler for information and can be designed independently of the scheduler and the decoder. We are now ready to state the following proposition.
Proposition 1 (Separated Stochastic Optimal Control Problem).
Using Theorem 1, the fully observed system can be constructed from the partially observed one (due to the presence of the noisy channel) as:
$$ \tilde{x}_{t+1} = A(\theta)\,\tilde{x}_t + B(\theta)\,u_t + \tilde{w}_t, \qquad (13) $$
where $\tilde{w}_t := A(\theta)\,e_t + w_t - e_{t+1}$, with the associated cost-to-go given by (6) with $x_t$ replaced by $\tilde{x}_t$, up to terms independent of the control. Then, under Assumption 1,
(i) The optimal control policy for the separated problem (13) is given as
$$ u^*_t = -K(\theta)\,\tilde{x}_t - \tilde{K}(\theta)\,g_{t+1}, \qquad (14) $$
where $K(\theta) := \big(R(\theta) + B(\theta)^\top P(\theta) B(\theta)\big)^{-1} B(\theta)^\top P(\theta) A(\theta)$ and $\tilde{K}(\theta) := \big(R(\theta) + B(\theta)^\top P(\theta) B(\theta)\big)^{-1} B(\theta)^\top$. Further, $P(\theta)$ is the unique positive definite solution to the algebraic Riccati equation
$$ P(\theta) = Q(\theta) + A(\theta)^\top P(\theta) A(\theta) - A(\theta)^\top P(\theta) B(\theta)\big(R(\theta) + B(\theta)^\top P(\theta) B(\theta)\big)^{-1} B(\theta)^\top P(\theta) A(\theta), \qquad (15) $$
and the trajectory $g := (g_t)_{t \geq 0}$ is given as
$$ g_t = -\sum_{\tau = t}^{\infty} \big(H(\theta)^\top\big)^{\tau - t}\, Q(\theta)\,\bar{x}_\tau, \qquad (16) $$
where $H(\theta) := A(\theta) - B(\theta) K(\theta)$ is Hurwitz.
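To make Proposition 1 concrete, the sketch below iterates the Riccati map (15) to numerical convergence and returns the gains of the tracking law (14). It is a minimal illustration consistent with the reconstruction above; the function name, iteration count and tolerance are ours, not the paper's.

```python
import numpy as np

def lq_tracking_gains(A, B, Q, R, iters=1000, tol=1e-10):
    """Solve the ARE (15) by fixed-point iteration and return P together
    with the feedback gain K and feedforward gain K_tilde of (14)."""
    P = Q.copy()
    for _ in range(iters):
        G = R + B.T @ P @ B
        P_next = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(G, B.T @ P @ A)
        if np.max(np.abs(P_next - P)) < tol:
            P = P_next
            break
        P = P_next
    G = R + B.T @ P @ B
    K = np.linalg.solve(G, B.T @ P @ A)        # state-feedback gain K(theta)
    K_tilde = np.linalg.solve(G, B.T)          # feedforward gain on g_{t+1}
    return P, K, K_tilde
```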
III-B MFE Analysis
Using Proposition 1, we now prove the existence and uniqueness of the MFE by introducing an operator $\mathcal{F}$ as shown in this section. First note from (16) that $g_{t+1} = -\sum_{\tau = t+1}^{\infty} (H(\theta)^\top)^{\tau - t - 1} Q(\theta)\,\bar{x}_\tau$. Then, substituting (14) in (13), we arrive at the closed-loop system as:
$$ \tilde{x}_{t+1} = H(\theta)\,\tilde{x}_t - B(\theta)\tilde{K}(\theta)\,g_{t+1} + \tilde{w}_t. $$
Then, using $\tilde{x}_t = x_t - e_t$ from Proposition 1 and (9), we can rewrite the above closed-loop system as
$$ x_{t+1} = H(\theta)\,x_t - B(\theta)\tilde{K}(\theta)\,g_{t+1} - H(\theta)\,e_t + \tilde{w}_t + e_{t+1}. \qquad (17) $$
It is to be noted that $\tilde{w}_t = A(\theta)\,e_t + w_t - e_{t+1}$, as defined after (13), which further gives $\tilde{w}_t + e_{t+1} - H(\theta)\,e_t = B(\theta)K(\theta)\,e_t + w_t$. Substituting this value in (17), we get
$$ x_{t+1} = H(\theta)\,x_t - B(\theta)\tilde{K}(\theta)\,g_{t+1} + B(\theta)K(\theta)\,e_t + w_t. $$
Now, taking expectation on both sides and denoting $\bar{x}_t(\theta) := \mathbb{E}[x_t]$ as the aggregate trajectory of agents of type $\theta$, we get
$$ \bar{x}_{t+1}(\theta) = H(\theta)\,\bar{x}_t(\theta) - B(\theta)\tilde{K}(\theta)\,g_{t+1}, \qquad (18) $$
where we use the tower property of conditional expectation and Lemma 1 to get $\mathbb{E}[e_t] = 0$.
where we use the tower property of conditional expectation and Lemma 1, to get . Finally, (18) can be simplified further as:
(19) | |||
Now, using the limiting probability mass function $p$ corresponding to the empirical distribution (2), define the operator $\mathcal{F} : \mathcal{X} \to \mathcal{X}$ as:
$$ [\mathcal{F}(\bar{x})]_t := \sum_{\theta \in \Theta} p(\theta)\,\bar{x}_t(\theta), \qquad (20) $$
where $\mathcal{F}$ maps the input sequence $\bar{x}$ (through (19)) to another sequence at time $t$.
where maps the input sequence to another sequence at time . Using this operator, we prove existence and uniqueness of the equilibrium MF trajectory by finding the fixed point of (20), under the following assumption.
Assumption 2.
We assume $\bar{\rho} < 1$, where $\bar{\rho}$ is a contraction modulus determined by the tuple $(A(\theta), B(\theta), Q(\theta), R(\theta))_{\theta \in \Theta}$ through the gains of Proposition 1.
Assumption 2 is motivated by results from the existing literature [13, 17, 16]. While it is stronger than the assumptions in [15, 17], it entails linear MF trajectory dynamics, which are easily tractable.
Theorem 2.
Let Assumptions 1 and 2 hold. Then: (i) the operator $\mathcal{F}$ in (20) maps $\mathcal{X}$ into itself and admits a unique fixed point $\bar{x}^* \in \mathcal{X}$, so that a unique MFE $(\pi^*, \bar{x}^*)$ exists; and (ii) the equilibrium MF trajectory follows linear dynamics, i.e., $\bar{x}^*_{t+1} = L^*\,\bar{x}^*_t$ with $\bar{x}^*_0 = \mu_0$, for some matrix $L^*$.
Proof.
(i) Consider the linear system (18) with driving input $g_{t+1}$, which is bounded since $\bar{x} \in \mathcal{X}$ and $H(\theta)$ is Hurwitz. Since $\bar{\rho} < 1$ by Assumption 2, and $g \in \mathcal{X}$, we have from (18) and (20) that $\mathcal{F}(\bar{x}) \in \mathcal{X}$, which proves the first statement in part (i). Next, consider the following:
$$ \|\mathcal{F}(\bar{x}^1) - \mathcal{F}(\bar{x}^2)\|_\infty \leq \bar{\rho}\,\|\bar{x}^1 - \bar{x}^2\|_\infty, $$
where the inequality follows from Assumption 2. Finally, using Banach's fixed point theorem and the first statement of part (i), $\mathcal{F}$ has a unique fixed point in $\mathcal{X}$.
(ii) Define the operator $\mathcal{T}$ on matrices, given as:
$$ \mathcal{T}(L) := \sum_{\theta \in \Theta} p(\theta)\Big(H(\theta) + B(\theta)\tilde{K}(\theta)\sum_{\tau = 0}^{\infty}\big(H(\theta)^\top\big)^{\tau} Q(\theta)\,L^{\tau + 1}\Big), $$
with $\bar{x}_{t+1} = L\,\bar{x}_t$ and $L^* = \mathcal{T}(L^*)$. To prove that such an $L^*$ indeed exists, we follow the same lines of proof as in [16] to arrive at
$$ \|\mathcal{T}(L^1) - \mathcal{T}(L^2)\| \leq \bar{\rho}\,\|L^1 - L^2\|, $$
which, under Assumption 2, establishes that $\mathcal{T}$ is a contraction. Using completeness of the space of matrices under the induced norm and Banach's fixed point theorem, we indeed have the existence of such an $L^*$. Finally, from (i) above, we get that the unique MF trajectory can be constructed recursively as $\bar{x}^*_{t+1} = L^*\,\bar{x}^*_t$ with $\bar{x}^*_0 = \mu_0$.
∎
Theorem 2 (i) provides us with a unique MFE, while the linearity of the MF trajectory in (ii) gives a control law which is linear in the state of the agent and the equilibrium trajectory [16]. This further makes the computation of this trajectory tractable, which would otherwise have involved a non-causal infinite sum (cf. (16)).
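Numerically, the fixed point above can be approximated by Picard iteration on the consistency operator, alternating a backward pass for the tracking signal $g$ (cf. (16)) and a forward pass for the aggregate dynamics (18), which uses $\mathbb{E}[e_t] = 0$ from Lemma 1. The sketch below assumes a single-type population and truncates the infinite horizon at `T`; all names, counts and tolerances are illustrative.

```python
import numpy as np

def mean_field_fixed_point(A, B, Q, K, K_tilde, mu0, T=200, sweeps=50, tol=1e-8):
    """Picard iteration on the consistency operator (20), truncated to
    horizon T; returns the (approximate) equilibrium MF trajectory."""
    n = len(mu0)
    H = A - B @ K                              # closed-loop matrix (Hurwitz)
    xbar = np.tile(mu0, (T, 1))                # initial guess for the MF trajectory
    for _ in range(sweeps):
        # backward pass: g_t = H' g_{t+1} - Q xbar_t, terminal g_T = 0
        g = np.zeros((T + 1, n))
        for t in range(T - 1, -1, -1):
            g[t] = H.T @ g[t + 1] - Q @ xbar[t]
        # forward pass: aggregate dynamics (18) with E[e_t] = 0
        xbar_new = np.zeros((T, n))
        xbar_new[0] = mu0
        for t in range(T - 1):
            xbar_new[t + 1] = H @ xbar_new[t] - B @ K_tilde @ g[t + 1]
        if np.max(np.abs(xbar_new - xbar)) < tol:
            return xbar_new
        xbar = xbar_new
    return xbar
```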
III-C $\varepsilon$-Nash Equilibrium
We now show that, under the information structure in Table I, the MFE constitutes an $\varepsilon$-Nash equilibrium for the $N$-player game. Before doing that, we first provide the definition of an $\varepsilon$-Nash equilibrium for the discrete-time MFG under communication constraints. Let us denote the space of admissible centralized control policies for agent $i$ as $\Pi^i_c$, under a centralized information structure, where each agent is assumed to have access to other agents' output histories. Then, we have the following:
Definition 2.
($\varepsilon$-Nash equilibrium) The set of control policies $(\pi^{1*}, \ldots, \pi^{N*})$ constitutes an $\varepsilon$-Nash equilibrium for the $N$-player game if there exists $\varepsilon > 0$ such that $J^i_N(\pi^{i*}, \pi^{-i*}) \leq \inf_{\pi^i \in \Pi^i_c} J^i_N(\pi^i, \pi^{-i*}) + \varepsilon$, for all $i \in \{1, \ldots, N\}$.
We start by proving that the mass behaviour in (4) converges to the equilibrium MF trajectory as in Theorem 2 (i). Note that, henceforth, signals superscripted by an asterisk ($*$) will represent quantities at the equilibrium, e.g., $x^{i*}_t$ denotes the state of agent $i$ at time $t$ under the equilibrium control policy.
Lemma 2.
Let Assumptions 1 and 2 hold. Then, the mass behaviour converges to the equilibrium MF trajectory, i.e.,
$$ \lim_{N \to \infty}\, \sup_{t \geq 0}\, \mathbb{E}\Big\|\frac{1}{N}\sum_{i=1}^{N} x^{i*}_t - \bar{x}^*_t\Big\| = 0. \qquad (21) $$
Proof.
First, consider the following:
$$ \mathbb{E}\Big\|\frac{1}{N}\sum_{i=1}^{N} x^{i*}_t - \bar{x}^*_t\Big\| \leq \mathbb{E}\Big\|\frac{1}{N}\sum_{i=1}^{N}\big(x^{i*}_t - \mathbb{E}[x^{i*}_t]\big)\Big\| + \Big\|\frac{1}{N}\sum_{i=1}^{N}\mathbb{E}[x^{i*}_t] - \bar{x}^*_t\Big\|, $$
where the inequality follows from the triangle inequality and Jensen's inequality. We now prove that the first term in the above inequality vanishes. Let $y^i_t := x^{i*}_t - \mathbb{E}[x^{i*}_t]$. Then, using (17), we have
$$ \mathbb{E}\Big\|\frac{1}{N}\sum_{i=1}^{N} y^i_t\Big\|^2 = \frac{1}{N^2}\sum_{i=1}^{N}\mathbb{E}\|y^i_t\|^2, $$
since the processes $y^i_t$ are zero mean and mutually independent across agents. Then, as in [17, Lemma 2], there exists $c_0 > 0$, independent of $t$ and $N$, such that $\sup_t \mathbb{E}\|y^i_t\|^2 \leq c_0$, which implies that $\mathbb{E}\|\frac{1}{N}\sum_i y^i_t\|^2 \leq c_0 / N$. This finally gives $\lim_{N \to \infty}\sup_t \mathbb{E}\|\frac{1}{N}\sum_i y^i_t\| = 0$. Next, since the support of $F$ is finite (and hence compact), this implies that $\mathbb{E}[x^{i*}_t]$ is uniformly bounded for all $t$. Further, since $p_N \to p$ and $\bar{x}^*$ is the unique fixed point from Theorem 2, we have $\lim_{N \to \infty}\sup_t \|\frac{1}{N}\sum_i \mathbb{E}[x^{i*}_t] - \bar{x}^*_t\| = 0$. Thus, (21) holds and the proof is complete. ∎
Now, we prove the $\varepsilon$-Nash property of the MFE. Toward that end, consider the following cost functions:
$$ J^i_N(\pi^{i*}, \pi^{-i*}) = \limsup_{T \to \infty}\frac{1}{T}\,\mathbb{E}\Big[\sum_{t=0}^{T-1}\Big\|x^{i*}_t - \frac{1}{N}\sum_{j=1}^{N} x^{j*}_t\Big\|^2_{Q(\theta_i)} + \|u^{i*}_t\|^2_{R(\theta_i)}\Big], \qquad (22) $$
$$ J^i_N(\pi^{i}, \pi^{-i*}) = \limsup_{T \to \infty}\frac{1}{T}\,\mathbb{E}\Big[\sum_{t=0}^{T-1}\Big\|x^{i}_t - \frac{1}{N}\Big(x^i_t + \sum_{j \neq i} x^{j*}_t\Big)\Big\|^2_{Q(\theta_i)} + \|u^{i}_t\|^2_{R(\theta_i)}\Big], \qquad (23) $$
$$ J^i(\pi^{i}, \bar{x}^*) = \limsup_{T \to \infty}\frac{1}{T}\,\mathbb{E}\Big[\sum_{t=0}^{T-1}\|x^{i}_t - \bar{x}^*_t\|^2_{Q(\theta_i)} + \|u^{i}_t\|^2_{R(\theta_i)}\Big], \qquad (24) $$
where $\bar{x}^*$ is the equilibrium MF trajectory (Theorem 2) and $x^i_t$ is the state of agent $i$ at time $t$ when it chooses a control law $\pi^i$ from the set of centralized policies $\Pi^i_c$. Notice that this set is strictly larger than the set $\Pi^i$. Furthermore, the control action $u^{i*}_t$ is derived from $\pi^{i*} = \Phi(\bar{x}^*)$. Now, we have the following theorem stating the $\varepsilon$-Nash result, i.e., that the control laws prescribed by the MFE are also $\varepsilon_N$-Nash in the finite-population case.
Theorem 3.
Let Assumptions 1 and 2 hold. Then, the set of decentralized control policies $(\pi^{1*}, \ldots, \pi^{N*})$ prescribed by the MFE constitutes an $\varepsilon_N$-Nash equilibrium for the $N$-player game, i.e.,
$$ J^i_N(\pi^{i*}, \pi^{-i*}) \leq \inf_{\pi^i \in \Pi^i_c} J^i_N(\pi^i, \pi^{-i*}) + \varepsilon_N, \qquad \forall\, i, \qquad (25) $$
where $\varepsilon_N \to 0$ as $N \to \infty$.
Proof.
We prove the theorem in two steps. In the first step, we derive an upper bound on $J^i_N(\pi^{i*}, \pi^{-i*})$ in terms of $J^i(\pi^{i*}, \bar{x}^*)$, and in Step 2, a corresponding bound on $\inf_{\pi^i \in \Pi^i_c} J^i_N(\pi^i, \pi^{-i*})$. Finally, we combine the two to get (25).
Step 1: Consider the following:
$$ J^i_N(\pi^{i*}, \pi^{-i*}) \leq J^i(\pi^{i*}, \bar{x}^*) + c_1\,\delta_N + c_2\,\delta_N^2, \qquad (26) $$
where $\delta_N := \sup_{t \geq 0}\big(\mathbb{E}\|\frac{1}{N}\sum_{j=1}^{N} x^{j*}_t - \bar{x}^*_t\|^2\big)^{1/2}$ and both the inequalities absorbed into (26) follow from the Cauchy-Schwarz inequality.
Step 2: Consider the following:
$$ J^i(\pi^{i}, \bar{x}^*) \leq J^i_N(\pi^i, \pi^{-i*}) + c_3\,\delta_N + c_4\,\delta_N^2, \qquad \forall\, \pi^i \in \Pi^i_c. $$
Finally, using the Cauchy-Schwarz inequality in the same manner as for (26), we get
$$ \inf_{\pi^i \in \Pi^i_c} J^i(\pi^i, \bar{x}^*) \leq \inf_{\pi^i \in \Pi^i_c} J^i_N(\pi^i, \pi^{-i*}) + c_3\,\delta_N + c_4\,\delta_N^2. \qquad (27) $$
Similar to Theorem 3 in [17], there exist constants $c_1, c_2, c_3, c_4 > 0$, independent of $t$ and $N$, such that the above bounds hold. Further, since $\pi^{i*} = \Phi(\bar{x}^*)$ is optimal for the infinite-population tracking cost by Proposition 1, we have $J^i(\pi^{i*}, \bar{x}^*) \leq \inf_{\pi^i \in \Pi^i_c} J^i(\pi^i, \bar{x}^*)$. Combining this with (26) and (27), we have (25). Finally, define $\varepsilon_N := (c_1 + c_3)\,\delta_N + (c_2 + c_4)\,\delta_N^2$, which converges to 0 as $N \to \infty$ using Lemma 2. The proof is thus complete. ∎
Before concluding this section, we remark that, according to Theorem 3, the decentralized equilibrium policy provides an $\varepsilon_N$-Nash equilibrium within the centralized policy class in the $N$-player game. Consequently, it also provides an $\varepsilon_N$-Nash equilibrium for the decentralized policy structure in the original $N$-player game formulated in Section II.
IV Simulations
In this section, we demonstrate the performance of the MFE under different scheduling policies. We simulate a finite population game with scalar dynamics and a single type ($|\Theta| = 1$). The dynamics and cost parameters of the agents satisfy Assumptions 1 and 2. For Figure 2(a), we simulate the game and show that, in spite of significant channel noise, the estimation error decreases, allowing the agents to form a consensus in a very short time. Note that the output does not perfectly mimic the true state due to the asynchronous nature of communication.
[Figure 2: (a) state trajectories, decoder outputs and estimation error under the MFE policy; (b) box plot of the average cost per agent versus the scheduling threshold $\lambda$.]
For Figure 2(b), we simulate the behavior of $N = 1000$ agents and plot the average cost per agent on a logarithmic axis against the scheduling threshold $\lambda$. The figure shows a box plot depicting the median (red line) and spread (box) of the average cost per agent over multiple runs for each value of $\lambda$. The plot shows a clear increase in the average cost per agent as $\lambda$ is increased, indicating that an increase in $\lambda$ leads to an increase in the estimation error, which in turn causes a higher average cost per agent. This indicates that a compromise can be reached between performance (average cost) and communication frequency through a judicious choice of the threshold parameter $\lambda$.
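For reproducibility, a compact scalar sketch of the simulated loop is given below. It uses illustrative parameter values, a crude scalar MMSE-style correction at transmission times, and takes the equilibrium MF trajectory as zero for simplicity (in general it follows the linear dynamics of Theorem 2); none of these numbers are those used for Figure 2.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(N=1000, T=100, a=0.8, b=1.0, q=1.0, r=1.0,
             sig_w=0.1, sig_n=0.5, lam=0.2):
    """Scalar, single-type sketch of the loop in Figure 1: threshold
    scheduler, predictive encoder, AWGN channel, recursive decoder, and
    the MFE control law. Returns the average cost per agent."""
    p = q                                     # scalar ARE (15) by fixed-point iteration
    for _ in range(500):
        p = q + a * p * a - (a * p * b) ** 2 / (r + b * p * b)
    k = b * p * a / (r + b * p * b)           # feedback gain K
    x = rng.normal(1.0, 1.0, N)               # initial states
    xt = np.zeros(N)                          # decoder outputs (recursive estimates)
    costs = np.zeros(N)
    for t in range(T):
        e = x - xt
        nu = (e * e > lam).astype(float)      # threshold scheduler
        rcv = e + rng.normal(0.0, sig_n, N)   # AWGN on the transmitted innovation
        var_e = e.var()
        gain = var_e / (var_e + sig_n ** 2)   # crude scalar MMSE correction (sketch)
        xt = xt + nu * gain * rcv             # decoder update at transmissions
        u = -k * xt                           # MFE law with xbar* = 0 (sketch)
        costs += q * (x - x.mean()) ** 2 + r * u ** 2
        x = a * x + b * u + rng.normal(0.0, sig_w, N)
        xt = a * xt + b * u                   # recursive estimate between transmissions
    return costs.mean() / T
```

Sweeping `lam` in such a sketch reproduces the qualitative trade-off discussed above: larger thresholds reduce communication frequency at the price of a higher average cost per agent.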
V Conclusion
In this paper, we have studied LQ-MFGs under communication constraints, namely when there is intermittent communication over an AWGN channel. Under the defined information structure involving the scheduler, encoder, decoder and controller, we have proved that the control is free of the dual effect in the mean-field limit. Consequently, the optimal control policy has been shown to be certainty equivalent. Under appropriate assumptions, we have established the existence, uniqueness and characterization (linearity) of the mean-field trajectory, shown to have the $\varepsilon$-Nash property. We have also empirically demonstrated that the performance of the equilibrium policies deteriorates with decreasing communication frequency, in line with intuition.
References
- [1] C. Ramesh, H. Sandberg, and K. H. Johansson, “Design of state-based schedulers for a network of control loops,” IEEE Transactions on Automatic Control, vol. 58, no. 8, pp. 1962–1975, 2013.
- [2] X. Gao, E. Akyol, and T. Başar, “Optimal communication scheduling and remote estimation over an additive noise channel,” Automatica, vol. 88, pp. 57–69, 2018.
- [3] A. Nayyar, T. Başar, D. Teneketzis, and V. V. Veeravalli, “Optimal strategies for communication and remote estimation with an energy harvesting sensor,” IEEE Transactions on Automatic Control, vol. 58, no. 9, pp. 2246–2260, 2013.
- [4] S. Tatikonda, A. Sahai, and S. Mitter, “Stochastic linear control over a communication channel,” IEEE Transactions on Automatic Control, vol. 49, no. 9, pp. 1549–1561, 2004.
- [5] S. Tatikonda and S. Mitter, “Control over noisy channels,” IEEE Transactions on Automatic Control, vol. 49, no. 7, pp. 1196–1201, 2004.
- [6] O. C. Imer and T. Başar, “Optimal estimation with limited measurements,” International Journal of Systems, Control and Communications, vol. 2, no. 1-3, pp. 5–29, 2010.
- [7] X. Gao, E. Akyol, and T. Başar, “Optimal estimation with limited measurements and noisy communication,” in 54th IEEE Conference on Decision and Control (CDC). IEEE, 2015, pp. 1775–1780.
- [8] S. Tatikonda, A. Sahai, and S. Mitter, “Control of LQG systems under communication constraints,” in Proceedings of the 37th IEEE Conference on Decision and Control (Cat. No. 98CH36171), vol. 1. IEEE, 1998, pp. 1165–1170.
- [9] R. Bansal and T. Başar, “Simultaneous design of measurement and control strategies for stochastic systems with feedback,” Automatica, vol. 25, no. 5, pp. 679–694, 1989.
- [10] A. Molin and S. Hirche, “On the optimality of certainty equivalence for event-triggered control systems,” IEEE Transactions on Automatic Control, vol. 58, no. 2, pp. 470–474, 2012.
- [11] D. J. Antunes and M. H. Balaghi I., “Consistent event-triggered control for discrete-time linear systems with partial state information,” IEEE Control Systems Letters, vol. 4, no. 1, pp. 181–186, 2019.
- [12] M. Huang, R. P. Malhamé, P. E. Caines et al., “Large population stochastic dynamic games: closed-loop McKean-Vlasov systems and the Nash Certainty Equivalence principle,” Communications in Information & Systems, vol. 6, no. 3, pp. 221–252, 2006.
- [13] M. Huang, P. E. Caines, and R. P. Malhamé, “Large-population cost-coupled LQG problems with nonuniform agents: individual-mass behavior and decentralized $\varepsilon$-Nash equilibria,” IEEE Transactions on Automatic Control, vol. 52, no. 9, pp. 1560–1571, 2007.
- [14] J.-M. Lasry and P.-L. Lions, “Mean field games,” Japanese Journal of Mathematics, vol. 2, no. 1, pp. 229–260, 2007.
- [15] M. A. uz Zaman, K. Zhang, E. Miehling, and T. Başar, “Approximate equilibrium computation for discrete-time linear-quadratic mean-field games,” in American Control Conference (ACC). IEEE, 2020, pp. 333–339.
- [16] ——, “Reinforcement learning in non-stationary discrete-time linear-quadratic mean-field games,” in 59th IEEE Conference on Decision and Control (CDC). IEEE, 2020, pp. 2278–2284.
- [17] J. Moon and T. Başar, “Discrete-time LQG mean field games with unreliable communication,” in 53rd IEEE Conference on Decision and Control. IEEE, 2014, pp. 2697–2702.
- [18] M. A. uz Zaman, S. Bhatt, and T. Başar, “Secure discrete-time linear-quadratic mean-field games,” in International Conference on Decision and Game Theory for Security. Springer, 2020, pp. 203–222.
- [19] Y. Bar-Shalom and E. Tse, “Dual effect, certainty equivalence, and separation in stochastic control,” IEEE Transactions on Automatic Control, vol. 19, no. 5, pp. 494–500, 1974.
- [20] B. D. Anderson and J. B. Moore, Optimal filtering. Courier Corporation, 2012.
- [21] A. A. Feldbaum, “Dual control theory,” in Control Theory: Twenty-Five Seminal Papers (1932-1981), T. Başar, Ed. Wiley-IEEE Press, 2001, ch. 10, pp. 874–880.
- [22] D. P. Bertsekas, Dynamic Programming and Optimal Control: Vol. 1. Athena Scientific, Belmont, MA, 2000.
- [23] N. Saldi, T. Başar, and M. Raginsky, “Markov–Nash equilibria in mean-field games with discounted cost,” SIAM Journal on Control and Optimization, vol. 56, no. 6, pp. 4256–4287, 2018.