
Linear Quadratic Mean-Field Games with Communication Constraints

Shubham Aggarwal, Muhammad Aneeq uz Zaman, and Tamer Başar

The authors are affiliated with the Coordinated Science Lab, University of Illinois at Urbana-Champaign, Urbana, IL, USA 61801. Emails: {sa57,mazaman2,basar1}@illinois.edu. Research is supported in part by an AFOSR Grant (FA9550-19-1-0353).
Abstract

In this paper, we study a large-population game in which agents with heterogeneous dynamics and cost functions solve a consensus problem. Moreover, the agents face communication constraints which appear as: (1) an Additive White Gaussian Noise (AWGN) channel, and (2) asynchronous data transmission via a fixed scheduling policy. Since the complexity of solving the game increases with the number of agents, we use the Mean-Field Game (MFG) paradigm to solve it. Under standard assumptions on the information structure of the agents, we prove that the control of the agent in the MFG setting is free of the dual effect. This allows us to obtain an equilibrium control policy for the generic agent, which is a function of only the local observation of the agent. Furthermore, the equilibrium mean-field trajectory is shown to follow linear dynamics, hence making it computable. We show that in the finite-population game, the equilibrium control policy prescribed by the MFG analysis constitutes an $\epsilon$-Nash equilibrium, where $\epsilon$ tends to zero as the number of agents goes to infinity. The paper is concluded with simulations demonstrating the performance of the equilibrium control policy.

I Introduction

In distributed real-world applications, like networked control systems [1], ecosystem monitoring [2], and energy harvesting [3], we rarely have the luxury of persistent communication. Hence, in this work, we study multi-agent systems under a constrained communication structure. The communication constraints may appear in the form of limited sensor energy levels [3], a noisy transmission medium [4, 5], limits on the communication frequency [6, 7], or some combination thereof, as we investigate in this work. Additionally, scalability becomes a major challenge as the number of agents in a multi-agent system grows.

In this paper, we consider a discrete-time multi-agent game problem. Each agent is coupled with the other agents through its cost function, which incentivizes the agent to reach consensus with the other players. In addition to a plant and a controller, the agent’s control system (see Figure 1) also consists of a scheduler and an Additive White Gaussian Noise (AWGN) channel. The scheduler controls the flow of information using a fixed scheduling policy. The communication through the AWGN channel is regulated by an encoder/decoder pair, which consists of a predictive encoder [8] to encode sequential data and a minimum mean-square estimation (MMSE) decoder to produce the best estimate of the plant state.

Related Work: There have been several works in the literature studying estimation and control problems under communication constraints. Reference [9] considers the simultaneous design of measurement and control strategies for a class of Linear Quadratic Gaussian (LQG) problems under soft constraints on both. The LQG problem has been further studied for a noisy analog channel [8] and a noiseless digital channel [4] in the forward loop. All these works, however, consider single-agent problems with uninterrupted communication, unlike the setting of this work. In [1], the authors consider a problem where a network of plants shares a noiseless communication medium via a state-based scheduling policy. The system has been shown to be free of the dual effect under a symmetry condition on the scheduling policy. Similarly, the optimality of certainty equivalent control laws has been characterized under an event-triggered communication policy with a noiseless channel in [10, 11]. Our work, on the other hand, proposes a multi-agent game where each agent has intermittent access to its state measurement through a noisy channel.

A key difficulty in multi-agent systems is that of scalability. To alleviate this challenge, a mean-field game (MFG) framework was proposed in [12, 13] by Huang, Malhamé and Caines and simultaneously in [14], by Lasry and Lions. The essential idea in the MFG framework is that as the number of agents goes to infinity, agents become indistinguishable and the effect of individual deviation becomes negligible (that is, the effect of strategic interaction disappears). This leads to an aggregation effect, which can be modelled by an exogenous mean-field (MF) term. Consequently, the game problem reduces to a stochastic optimal control problem for a representative agent along with a consistency condition.

Linear Quadratic MFGs (LQ-MFGs), which combine linear agent dynamics with a quadratic cost function, serve as a significant benchmark in the study of MFGs. Recent works on LQ-MFGs [15, 16] in the discrete-time setting are free of communication constraints or consider partially observed dynamics involving packet drop-outs [17], thereby making the underlying communication link unreliable. Furthermore, Secure MFGs [18] capture the setting where the agents deliberately obfuscate their state information with the goal of subverting an eavesdropping adversary. In these works, however, communication occurs at every time instance, in contrast to our setting here, where the communication is intermittent and the channel adds noise to the incoming signal.

Contribution: In this paper, we prove that under a fixed scheduling policy, an AWGN channel, and a standard information structure, the dual effect of control [19] does not arise. The result is presented in Lemma 1 and is one of the key observations of the paper. This renders the covariance of the estimation error independent of the control signals (for both transmission and non-transmission times). Under the mean-field setting, this insight enables us to reduce the game to solving a standard optimal tracking control problem [16] along with a consistency condition. We prove the consistency condition of the mean-field equilibrium (MFE) under standard assumptions and characterize the linear dynamics of the equilibrium MF trajectory. Finally, we prove that the policies prescribed by the MFE constitute an $\epsilon$-Nash equilibrium for the finite-population game and provide simulations to illustrate the performance of the equilibrium control policy.

The paper is organized as follows. Following this introduction, Section II introduces the finite-agent game formulation of the multi-agent system and the underlying information structures of each of its entities (see Fig. 1). In Section III, we formulate the LQ-MFG problem, characterize its MFE, and demonstrate the $\epsilon$-Nash property of the MFE. In Section IV, we provide simulations to analyze the performance of the MFE, and we conclude the paper in Section V with some highlights.

Notations: Let $X_{k}^{i}$ denote the $i$th agent’s state at time instant $k$ and $X_{k:k^{\prime}}^{i}$ the $i$th agent’s state history from instant $k$ to $k^{\prime}$, i.e., $X_{k:k^{\prime}}^{i}=(X_{k}^{i},\cdots,X_{k^{\prime}}^{i})$. Let the set of non-negative integers and real numbers be denoted by $\mathbb{Z}^{+}$ and $\mathbb{R}^{+}$, respectively. The transpose of a matrix $A$ is denoted by $A^{\prime}$ and the trace of a square matrix $M$ by $\mathrm{Tr}\{M\}$. For a vector $z$ and positive semi-definite matrix $S$, let $\|z\|^{2}_{S}=z^{\prime}Sz$. Unless stated otherwise, $\|\cdot\|$ denotes the 2-norm.

II Problem Formulation

Consider an $N$-player game on an infinite time horizon. Each agent’s dynamics evolve according to a linear discrete-time controlled stochastic process as

X_{k+1}^{i}=A(\phi_{i})X_{k}^{i}+B(\phi_{i})U_{k}^{i}+W_{k}^{i},\quad i\in[1,N], \qquad (1)

where $X_{k}^{i}\in\mathbb{R}^{n}$ and $U_{k}^{i}\in\mathbb{R}^{m}$ are the state process and the control input, respectively, for the $i$th agent. $W_{k}^{i}\in\mathbb{R}^{n}$ is an i.i.d. Gaussian process with zero mean and finite covariance $\Sigma_{w}$. The initial state $X_{0}^{i}$ has mean $\nu_{\phi_{i},0}$ and covariance $\Sigma_{x}$, and is assumed to be statistically independent of $W_{k}^{i}$, $\forall k\in\mathbb{Z}^{+}$. All covariance matrices are assumed to be positive definite. $A(\phi_{i})$ and $B(\phi_{i})$ are constant matrices with appropriate dimensions. $\phi_{i}$ denotes the type of the $i$th agent, drawn from a finite set $\Phi:=\{\phi_{1},\cdots,\phi_{m}\}$ according to the empirical distribution

F_{N}(\phi)=\frac{1}{N}\sum_{i=1}^{N}\mathbb{I}_{\{\phi_{i}\leq\phi\}},\quad \phi\in\Phi, \qquad (2)

where $\mathbb{I}_{\{\cdot\}}$ is the indicator function. It is further assumed that $\lim_{N\rightarrow\infty}F_{N}(\phi)=F(\phi)$ weakly, for some probability distribution $F(\phi)$ over the support of $\Phi$, with corresponding probability mass functions $P_{N}(\phi)$ and $P(\phi)$, respectively.
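As a small numerical illustration, the empirical distribution in (2) can be sampled and checked against its weak limit. The type set and limiting pmf below are hypothetical, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
support = np.array([0.5, 1.0, 1.5])   # hypothetical type set Phi
pmf = np.array([0.2, 0.5, 0.3])       # hypothetical limiting pmf P(phi)

N = 10_000
phi = rng.choice(support, size=N, p=pmf)  # draw N agent types i.i.d. from P

def F_N(x):
    """Empirical CDF of the sampled types, as in (2)."""
    return np.mean(phi <= x)

# As N grows, F_N approaches the limiting CDF F at each support point.
print(F_N(0.5), F_N(1.0), F_N(1.5))
```

For large $N$, each printed value is close to the limiting CDF $F$ evaluated at that support point (here $0.2$, $0.7$, $1$).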

To complete the problem formulation, we define the information structure on each of the blocks in Fig. 1. Such information structures are standard and appear in applications like industrial and process control [1] and wireless sensor networks [6].

Entity | Information State | Information Space | Input-output Map
Scheduler | $I^{i,sc}_{k}\triangleq(X_{0:k}^{i},Y_{0:k-1}^{i})$ | $\mathcal{I}^{i,sc}_{k}$ | $\xi_{k}^{i}:I^{i,sc}_{k}\mapsto\gamma_{k}$
Encoder | $I_{k}^{i,\epsilon}\triangleq(X_{0:k}^{i},U_{0:k-1}^{i},c_{\mathcal{K}(k-1)}^{i},d_{\mathcal{K}(k-1)}^{i},Y_{0:k-1}^{i})$ | $\mathcal{I}_{k}^{i,\epsilon}$ | $\mathscr{E}_{k}^{i}:I_{k}^{i,\epsilon}\mapsto c_{k}^{i}$
Decoder | $I_{k}^{i,d}\triangleq(Y_{0:k-1}^{i},U_{0:k-1}^{i},d_{\mathcal{K}(k)}^{i})$ | $\mathcal{I}_{k}^{i,d}$ | $\mathscr{D}_{k}^{i}:I_{k}^{i,d}\mapsto Y_{k}^{i}$
Controller | $I_{k}^{i,\pi}\triangleq(U_{0:k-1}^{i},Y_{0:k}^{i})$ | $\mathcal{I}_{k}^{i,\pi}$ | $\pi_{k}^{i}:I_{k}^{i,\pi}\mapsto U_{k}^{i}$
TABLE I: Dictionary of Information states of entities in Fig. 1

 

First, we define a transmission time as an instant when information is sent over the channel. Let the history of transmission times up to the current instant $k$ ($k>0$) be denoted by the set $\mathcal{K}(k):=\{l\mid l\leq k,\gamma_{l}=1\}$, where $\gamma_{l}$ is the transmission decision variable formalized in the next paragraph. By convention, we take $\mathcal{K}(0)=\{0\}$.

[Figure 1 depicts the closed-loop information flow: the plant state $X_{k}^{i}$ feeds the scheduler and the encoder; the encoder output $\bar{c}_{k}^{i}$ passes through the channel (gated by $\gamma_{k}$) to produce $\bar{d}_{k}^{i}$; the decoder output $Y_{k}^{i}$ feeds the controller, which applies $U_{k}^{i}$ to the plant.]
Figure 1: Closed-loop information flow for the ithi^{th} agent

The information states, information spaces, and input-output maps defining the scheduler, encoder, decoder and controller are as given in Table I. The scheduler has access to the history of plant states and the decoded outputs, based on which it decides the transmission times of the plant state through the channel. The decision whether to transmit or not is taken based on an (innovations-based) threshold scheduling policy of the form $\delta_{k}^{\prime}S\delta_{k}\geq\alpha$, where $\delta_{k}$ is the error between the plant state and the decoder output at instant $k$ (to be defined later), $S>0$ is a user-defined constant positive definite matrix, and $\alpha>0$ is the threshold parameter. Note that by an innovations-based process we mean the following: given a process $\{z_{k}\}$, the innovation process $\{\tilde{z}_{k}\}$ contains the new information not carried in the sequences $z_{k-1},z_{k-2},\cdots$ [20, Section 5.3]. We define $\gamma_{k}:=\xi_{k}^{i}(I^{i,sc}_{k})$, where $\gamma_{k}=1$ signifies that $k$ is a transmission instant and $\gamma_{k}=0$ signifies no transmission ($\varphi$).
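As an illustration, a minimal sketch of the threshold rule above; the values chosen for $S$ and $\alpha$ are hypothetical, as neither is prescribed here:

```python
import numpy as np

def schedule(delta, S, alpha):
    """Innovations-based threshold policy: transmit iff delta' S delta >= alpha."""
    return float(delta @ S @ delta) >= alpha

S = np.eye(2)        # hypothetical user-defined positive definite weight
alpha = 1.0          # hypothetical threshold

print(schedule(np.array([0.2, 0.1]), S, alpha))   # small error -> no transmission
print(schedule(np.array([1.5, 0.0]), S, alpha))   # large error -> transmission
```

Larger $\alpha$ trades estimation quality for fewer channel uses, which is the tension the scheduling policy controls.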

Next, the encoder transmits encoded state information $c_{k}^{i}$ at transmission times over the channel. The signal $\bar{c}_{k}^{i}$ in Figure 1 is given as:

\bar{c}_{k}^{i}=\begin{cases}c_{k}^{i},&\text{if}~\gamma_{k}=1,\\ \varphi,&\text{if}~\gamma_{k}=0.\end{cases}

The encoder is assumed to have full knowledge of the system as in Table I, which leads to better control performance compared to structures where only partial information is available [4] (cf. [4] for a detailed study of other encoder information structures and how they may be realized). Such a situation emerges when the encoder and the scheduler are collocated with the plant, and hence can observe both its state and the control actions applied to it [4]. In addition, we assume that the encoder is predictive, i.e., it transmits over the channel functions of the difference between the true state and the decoder output at the same instant. Such encoders are used in practice for encoding sequential data [8].

It is imperative to note here that the above assumptions on the encoder and scheduler information structures, as we prove in the next section, entail no dual effect of control, and hence lead to a simple controller design. The dual effect of control refers to the dual role of the controller: it both steers the system dynamics and probes the scheduler for new measurements to reduce its uncertainty about the system state [10]. Finally, we note that, in the table, $c_{\mathcal{K}(k)}^{i}:=\{c_{l}^{i}\mid l\leq k,\gamma_{l}=1\}$ and $d_{\mathcal{K}(k)}^{i}:=\{d_{l}^{i}\mid l\leq k,\gamma_{l}=1\}$.

The encoded signal is sent over an AWGN channel, which is analog, memoryless and is modeled as:

d_{l}^{i}=c_{l}^{i}+v_{l}^{i},\quad l\in\mathcal{K}(k)

where $v_{l}^{i}$ is an i.i.d. zero-mean Gaussian process with finite positive-definite covariance $\Sigma_{v}$ and represents the channel noise. The input and output alphabets of the channel lie in $\mathbb{R}^{n}$. The signal $\bar{d}_{k}^{i}$ in Figure 1 is given as:

\bar{d}_{k}^{i}=\begin{cases}d_{k}^{i},&\text{if}~\gamma_{k}=1,\\ \varphi,&\text{if}~\gamma_{k}=0.\end{cases}

Next, the decoder at the controller end serves two purposes. First, it decodes the noisy channel output to produce an MMSE estimate [8] of the input signal whenever new information is received via the channel. Second, between transmission times, it calculates a recursive estimate of the plant state so as to supply the controller with information at all times $k$. Thus, the complete decoder mapping is given by

Y_{k}^{i}=\begin{cases}\mathbb{E}\{X_{k}^{i}|I_{k}^{i,d}\},&\text{if}~\gamma_{k}=1,\\ AY_{k-1}^{i}+BU_{k-1}^{i},&\text{if}~\gamma_{k}=0,\end{cases} \qquad (3)

where $AY_{k-1}^{i}+BU_{k-1}^{i}$ is the recursive estimate calculated by the decoder between transmission instants, and $Y_{k}^{i}$ is the input to the controller.
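To make the scheduler–encoder–channel–decoder loop of (3) concrete, here is a minimal scalar simulation sketch. All numerical values are hypothetical, the decoder uses a linear MMSE gain, and the variance recursion deliberately ignores the information carried by non-transmission events, so this is an approximate illustration rather than the exact decoder analyzed here:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 0.9, 1.0          # hypothetical scalar A(phi), B(phi)
sw2, sv2 = 0.1, 0.05     # process and channel noise variances

T, alpha = 50, 0.5       # horizon and scheduling threshold (S = 1)
x, y, P = rng.normal(), 0.0, 1.0   # plant state, decoder estimate, error variance
u_prev = 0.0

for k in range(1, T + 1):
    w = rng.normal(scale=np.sqrt(sw2))
    x = a * x + b * u_prev + w                 # plant update, as in (1)
    y_pred = a * y + b * u_prev                # decoder's recursive estimate
    P = a * a * P + sw2                        # prior error variance
    delta = x - y_pred                         # innovation delta_k
    if delta * delta >= alpha:                 # threshold scheduler: transmit
        d = delta + rng.normal(scale=np.sqrt(sv2))   # predictive encoding + AWGN channel
        g = P / (P + sv2)                      # linear MMSE gain
        y = y_pred + g * d                     # decoded estimate, first case of (3)
        P = P * sv2 / (P + sv2)                # posterior error variance
    else:
        y = y_pred                             # no transmission, second case of (3)
    u_prev = -0.5 * y                          # any estimate-based control (hypothetical)
```

Note how the decoder runs open loop between transmissions and corrects itself only when the scheduler lets new information through the channel.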

Finally, the controller calculates control actions by minimizing an infinite-horizon average cost function

J_{i}^{N}(\pi^{i},\pi^{-i}):=\limsup_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\Big\{\sum_{k=0}^{T-1}\Big\|X_{k}^{i}-\frac{1}{N}\sum_{j=1}^{N}X_{k}^{j}\Big\|^{2}_{Q(\phi_{i})}+\|U_{k}^{i}\|^{2}_{R(\phi_{i})}\Big\} \qquad (4)

where $Q(\phi_{i})\geq 0$, $R(\phi_{i})>0$, and the parameter $\phi_{i}\in\Phi$ determines the tuple $(A(\phi),B(\phi),Q(\phi),R(\phi))$ for each agent. Further, $\pi^{-i}:=(\pi^{1},\cdots,\pi^{i-1},\pi^{i+1},\cdots,\pi^{N})$, where $\pi_{k}^{i}$ is as defined in Table I. The control law for agent $i$ is the sequence of deterministic control policies $\pi^{i}:=(\pi_{0}^{i},\pi_{1}^{i},\cdots)\in\bar{\mathcal{M}}_{i}$, where $\bar{\mathcal{M}}_{i}$ is the space of admissible decentralized control laws. The coupling between agents enters via the consensus term $\frac{1}{N}\sum_{j=1}^{N}X_{k}^{j}$ in the objective. Further, the cost incorporates a soft constraint on the control actions alongside penalizing state deviations from the consensus term, which each agent aims to track. Finally, the expectation in (4) is taken with respect to the noise statistics and the initial state distribution.

Now that the problem description is complete, the aim is to design a decentralized control policy for each agent in (1) minimizing its local objective (4), which is done using the MFG framework in the next section.

III Mean-Field Games

In this section, we solve the problem for the $N$-player system (1) with objective (4) by considering the limiting case as $N\rightarrow\infty$. In this setting, the consensus term in (4) can be approximated by a known deterministic sequence (also termed the mean-field trajectory) following the Nash Certainty Equivalence Principle [12]. This reduces the problem to a tracking control problem and a consistency condition, as shown later. First, we obtain the solution (to a fully observed tracking problem constructed from a partially observed one) for this infinite-agent (mean-field) system. This solution, called the MFE, consists of an equilibrium control policy and the equilibrium MF trajectory. Finally, we demonstrate its $\epsilon$-Nash property.

III-A Optimal Tracking Control

Consider a generic agent (from an infinite population system) of type ϕ\phi with dynamics

X_{k+1}=A(\phi)X_{k}+B(\phi)U_{k}+W_{k},\quad k\in\mathbb{Z}^{+} \qquad (5)

where $X_{k}\in\mathbb{R}^{n}$ and $U_{k}\in\mathbb{R}^{m}$ are the state process and the control input, respectively. The initial state $X_{0}$ has mean $\nu_{\phi,0}$ and finite positive-definite covariance $\Sigma_{x}$. Further, $W_{k}\in\mathbb{R}^{n}$ is an i.i.d. Gaussian process with zero mean and finite positive-definite covariance $\Sigma$. Let us denote the generic agent’s controller information space at time $k$ by $\mathcal{I}_{k}^{\mu}$. Then, its information state at any time $k$ is $I_{k}^{\mu}\triangleq(U_{0:k-1},Y_{0:k})\in\mathcal{I}_{k}^{\mu}$. Let us define the map $\mu_{k}:\mathcal{I}_{k}^{\mu}\rightarrow\mathcal{U}$, which maps $I_{k}^{\mu}$ to $U_{k}$. The control law can then be given as $\mu:=(\mu_{0},\mu_{1},\cdots)\in\mathcal{M}$, where $\mathcal{M}$ denotes the admissible class of control laws. The objective function for the generic agent is given as

J(\mu,\bar{X}):=\limsup_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\Big\{\sum_{k=0}^{T-1}\|X_{k}-\bar{X}_{k}\|^{2}_{Q(\phi)}+\|U_{k}\|^{2}_{R(\phi)}\Big\} \qquad (6)

where the expectation is taken over the noise statistics, the initial state, and the law induced by $\mu$. In addition, $\bar{X}=(\bar{X}_{1},\bar{X}_{2},\cdots)$ is the MF trajectory, which represents the infinite-player approximation to the coupling term in (4). The introduction of this term leads to indistinguishability between the agents, thereby making the effect of state deviations of individual agents negligible. Consequently, the game problem reduces to a stochastic optimal control problem for the generic agent followed by a consistency condition, whose solution is given by the MFE. Before we formally define the MFE, we state the following assumption on the mean-field system.

Assumption 1.

(i) The pair $(A(\phi),B(\phi))$ is controllable and $(A(\phi),Q(\phi)^{\frac{1}{2}})$ is observable.

(ii) The MF trajectory $\bar{X}\in\mathcal{X}$, where $\mathcal{X}:=\{\bar{X}=(\bar{X}_{k})_{k\geq 0},\ \bar{X}_{k}\in\mathbb{R}^{n}:\|\bar{X}\|_{\infty}:=\sup_{k\geq 0}\|\bar{X}_{k}\|<\infty\}$ is the space of bounded vector-valued sequences.

Now, to define an MFE, we introduce two operators [16]:

1. $\Psi:\mathcal{X}\rightarrow\mathcal{M}$ given by $\Psi(\bar{X})=\operatorname{argmin}_{\mu\in\mathcal{M}}J(\mu,\bar{X})$, which outputs the optimal control policy minimizing the cost (6), and

2. $\Lambda:\mathcal{M}\rightarrow\mathcal{X}$ given by $\Lambda(\mu)=\bar{X}$, also called the consistency operator, which generates an MF trajectory consistent with the optimal policy $\Psi(\bar{X})$ obtained above.

Then, we have the following definition.

Definition 1.

(Mean-field equilibrium [16]) The pair $(\mu^{*},\bar{X}^{*})\in\mathcal{M}\times\mathcal{X}$ is an MFE if $\mu^{*}=\Psi(\bar{X}^{*})$ and $\bar{X}^{*}=\Lambda(\mu^{*})$. More precisely, $\bar{X}^{*}$ is a fixed point of the map $\Lambda\circ\Psi$.

The trajectory $\bar{X}^{*}$ is the MF trajectory at equilibrium, with $\mu^{*}$ as the equilibrium control policy. The aim now is to design an optimal tracking control policy for (5) minimizing (6) under the information structure discussed above.

Before proceeding, we point out that, with some abuse of notation, we retain the same notation (as in Figure 1) for a generic agent’s control system, except that we remove the superscript $i$. We now construct the fully observed problem (in Proposition 1 below) using the decoder estimates $Y_{k}$ obtained from the noisy channel outputs. This will follow from two results, namely: the control policy is free of the dual effect [19, 21], and the optimal control policy is certainty equivalent, as we prove next. Define the error between the plant and the decoder output as:

\bar{e}_{k}:=X_{k}-Y_{k}=\begin{cases}e_{k},&\gamma_{k}=1\\ \delta_{k},&\gamma_{k}=0,\end{cases} \qquad (9)

where, from (3), we let $e_{k}:=X_{k}-\mathbb{E}\{X_{k}|I_{k}^{d},d_{k}\}$ and $\delta_{k}:=X_{k}-AY_{k-1}-BU_{k-1}$ be the errors at the transmission and non-transmission times, respectively. Then, we have the following lemma.

Lemma 1.

Consider the system (5) for the generic agent under the information structure of the previous section. Then:

(i) The relative error $\bar{e}_{k}$ is independent of all control choices, for all $k$.

(ii) The expected value of $\bar{e}_{k}$ and the conditional correlation between $Y_{k}$ and $\bar{e}_{k}$, given $\mathcal{F}^{d}_{k}$, are both zero.

Consequently, the control is free of the dual effect.

Proof.

We prove the Lemma in two parts, first for transmission and then for non-transmission times.

Part 1: Consider consecutive transmission instants $l_{r},l_{r+1}\in\mathcal{K}(k)$. Then, we have

\bar{e}_{l_{r+1}}=e_{l_{r+1}}=X_{l_{r+1}}-Y_{l_{r+1}}
=X_{l_{r+1}}-\mathbb{E}\{X_{l_{r+1}}|I^{d}_{l_{r+1}}\}
=A(\phi)e_{l_{r+1}-1}+W_{l_{r+1}-1}-\mathbb{E}\{A(\phi)e_{l_{r+1}-1}+W_{l_{r+1}-1}|I^{d}_{l_{r+1}}\}.

Using simple manipulations, the error at the $l_{r+1}$-th transmission time can be written recursively in terms of the preceding one as

e_{l_{r+1}}=\eta_{1}(\phi)e_{l_{r}}+\eta_{2}(\phi)W_{l_{r}:l_{r+1}-1}-\mathbb{E}\{\eta_{1}(\phi)e_{l_{r}}+\eta_{2}(\phi)W_{l_{r}:l_{r+1}-1}|I^{d}_{l_{r+1}}\}, \qquad (10)

for matrices η1(ϕ)\eta_{1}(\phi) and η2(ϕ)\eta_{2}(\phi) of appropriate dimensions.

Now, we prove that $e_{l_{r}}$ is independent of the control actions by induction on $l_{r}$. Note that the first transmission time is assumed to be $l_{0}=0$. Fix $Y_{l_{0}}=0$. Then, $e_{l_{0}}=X_{l_{0}}$ is independent of all control actions. Next, assume that $e_{l_{r}}$ is independent of all controls. By assumption, the process noise is independent of the controls, $\forall l_{r}$. Consequently, from (10) and the induction hypothesis, $e_{l_{r+1}}$ is independent of all controls. Finally, by the principle of mathematical induction, (i) holds for Part 1.

Next, since $e_{l_{r}}=X_{l_{r}}-\mathbb{E}\{X_{l_{r}}|I^{d}_{l_{r}}\}$, we have $\mathbb{E}\{e_{l_{r}}|I^{d}_{l_{r}}\}=0$. Also, $\mathbb{E}\{Y_{l_{r}}e_{l_{r}}^{\prime}|I^{d}_{l_{r}}\}=\mathbb{E}\{Y_{l_{r}}e_{l_{r}}^{\prime}|I^{d}_{l_{r}},Y_{l_{r}}\}=Y_{l_{r}}\mathbb{E}\{e_{l_{r}}^{\prime}|I^{d}_{l_{r}}\}=0$, where the second equality follows since $Y_{l_{r}}=\mathscr{D}_{l_{r}}(I^{d}_{l_{r}})$ is $\sigma(I^{d}_{l_{r}})$-measurable ($\sigma(I^{d}_{l_{r}})$ being the sigma-algebra generated by $I^{d}_{l_{r}}$). Hence, (ii) holds.

Part 2: Now, we prove (i) and (ii) for the non-transmission times. Suppose $l_{r}$ and $l_{r+1}$ are consecutive transmission times. Then, for any $k\in(l_{r},l_{r+1})$, $\bar{e}_{k}=\delta_{k}=\bar{\eta}_{1}(\phi)e_{l_{r}}+\bar{\eta}_{2}(\phi)W_{l_{r}:k-1}$, for appropriate matrices $\bar{\eta}_{1}(\phi)$ and $\bar{\eta}_{2}(\phi)$. Then, by Part 1, (i) holds for all non-transmission times.

Next, we prove (ii). It is easy to see that the information state of the decoder is updated with new information only at the transmission instants. More specifically, any estimate between two consecutive transmission times $l_{r}$ and $l_{r+1}$ can be recovered from the information state $I^{d}_{l_{r}}$. Thus,

\mathbb{E}\{\delta_{k}|I^{d}_{k}\}=\mathbb{E}\{\bar{\eta}_{1}(\phi)e_{l_{r}}+\bar{\eta}_{2}(\phi)W_{l_{r}:k-1}|I^{d}_{k}\}
=\mathbb{E}\{\bar{\eta}_{1}(\phi)e_{l_{r}}+\bar{\eta}_{2}(\phi)W_{l_{r}:k-1}|I^{d}_{l_{r}}\}
=\mathbb{E}\{\bar{\eta}_{1}(\phi)e_{l_{r}}|I^{d}_{l_{r}}\}+\mathbb{E}\{\bar{\eta}_{2}(\phi)W_{l_{r}:k-1}|I^{d}_{l_{r}}\}
=0.

The last equality follows from Part 1 and the fact that $W_{k}$ is an independent zero-mean process. Finally, $\mathbb{E}\{Y_{k}\delta_{k}^{\prime}|I^{d}_{k}\}=0$ follows in the same manner as in Part 1. Thus (ii) holds and the proof is complete. ∎

Note that the proof of Lemma 1 is made possible by the information structure of the scheduler, encoder and decoder. Since the information maps of the scheduler and the controller entail partially nested $\sigma$-algebras, the scheduler is able to recover the controller output at its own end. This, in addition to the deterministic nature of the control policies, allows the scheduler to compute the scheduling policy based on innovations and consequently take complete authority over the transmission of new information. As a result, the control is free of the dual effect.
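Lemma 1(i) can also be sanity-checked numerically: with a predictive encoder and a fixed-gain linear decoder, two different control sequences driven by identical noise realizations produce identical error trajectories $\bar{e}_k$. A scalar sketch with hypothetical parameter values (a fixed decoder gain `g` is used for simplicity; it is not the exact MMSE gain of the paper):

```python
import numpy as np

def run(controls, seed=0):
    """Simulate the scalar loop and return the error trajectory x_k - y_k."""
    rng = np.random.default_rng(seed)
    a, b, g, alpha = 0.9, 1.0, 0.8, 0.5   # hypothetical dynamics, gain, threshold
    x, y, u_prev = 1.0, 0.0, 0.0
    errors = []
    for u in controls:
        w = rng.normal(scale=0.3)          # process noise
        v = rng.normal(scale=0.2)          # channel noise (drawn every step for alignment)
        x = a * x + b * u_prev + w
        y_pred = a * y + b * u_prev        # decoder's recursive estimate
        delta = x - y_pred                 # innovation: control terms cancel out
        if delta * delta >= alpha:         # threshold scheduler
            y = y_pred + g * (delta + v)   # predictive encoding + AWGN + linear decoding
        else:
            y = y_pred
        errors.append(x - y)
        u_prev = u
    return np.array(errors)

u1 = np.zeros(100)                  # no control at all
u2 = np.sin(np.arange(100))         # an arbitrary different control sequence
# Lemma 1(i): same noise, different controls -> identical error trajectory
assert np.allclose(run(u1), run(u2))
```

The assertion holds because $\delta_k = a\,\bar{e}_{k-1} + w_{k-1}$: the control enters the plant and the decoder prediction identically and cancels, so both the scheduling decisions and the errors are control-free.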

Next, we prove that, due to the absence of the dual effect, the separation principle holds for the underlying tracking problem. Consequently, the optimal control law under the information structure in Table I is certainty equivalent [1].

Theorem 1.

Consider the information structure on the generic agent as in Table I. Then, the control design problem separates into designing a state decoder and a certainty equivalent controller.

Proof.

Consider the cost-to-go:

\mathbb{E}\{(X_{k}-\bar{X}_{k})^{\prime}Q(\phi)(X_{k}-\bar{X}_{k})+U_{k}^{\prime}R(\phi)U_{k}|I_{k}^{\mu}\}
=\mathbb{E}\{(Y_{k}-\bar{X}_{k}+\bar{e}_{k})^{\prime}Q(\phi)(Y_{k}-\bar{X}_{k}+\bar{e}_{k})+U_{k}^{\prime}R(\phi)U_{k}|I_{k}^{\mu}\}
=(Y_{k}-\bar{X}_{k})^{\prime}Q(\phi)(Y_{k}-\bar{X}_{k})+\mathbb{E}\{(Y_{k}-\bar{X}_{k})^{\prime}Q(\phi)\bar{e}_{k}|I_{k}^{\mu}\}+\mathbb{E}\{\bar{e}_{k}^{\prime}Q(\phi)(Y_{k}-\bar{X}_{k})|I_{k}^{\mu}\}+\mathbb{E}\{\bar{e}_{k}^{\prime}Q(\phi)\bar{e}_{k}|I_{k}^{\mu}\}+\mathbb{E}\{U_{k}^{\prime}R(\phi)U_{k}|I_{k}^{\mu}\}
=(Y_{k}-\bar{X}_{k})^{\prime}Q(\phi)(Y_{k}-\bar{X}_{k})+\mathbb{E}\{\bar{e}_{k}^{\prime}Q(\phi)\bar{e}_{k}|I_{k}^{\mu}\}+\mathbb{E}\{U_{k}^{\prime}R(\phi)U_{k}|I_{k}^{\mu}\}
=(Y_{k}-\bar{X}_{k})^{\prime}Q(\phi)(Y_{k}-\bar{X}_{k})+\mathbb{E}\{U_{k}^{\prime}R(\phi)U_{k}|I_{k}^{\mu}\}+\mathrm{Tr}\{Q(\phi)\Delta_{k}\},

where $\Delta_{k}=\begin{cases}\mathbb{E}\{e_{k}e_{k}^{\prime}|Y_{0:k}\},&\gamma_{k}=1\\ \mathbb{E}\{\delta_{k}\delta_{k}^{\prime}|Y_{0:k}\},&\gamma_{k}=0.\end{cases}$ The third equality follows from Lemma 1 and the fact that $\bar{X}_{k}$ is deterministic and independent of $\bar{e}_{k}$. The last equality again follows since the relative error is independent of the control actions by Lemma 1. Then, since the error-induced term $\mathrm{Tr}\{Q(\phi)\Delta_{k}\}$ is independent of the controls, the proof is complete. ∎

Certainty equivalence of the optimal control policy is a consequence of the absence of the dual effect [19, 21] of control, as established in Lemma 1. When the control is free of the dual effect, the covariance of the estimation error is independent of the control signals used. Thus, the controller cannot benefit from probing the scheduler for information and can be designed independently of the scheduler and the decoder. We are now ready to state the following proposition.

Proposition 1 (Separated Stochastic Optimal Control Problem).

Using Theorem 1, the fully observed system can be constructed from the partially observed one (partially observed due to the presence of the noisy channel) as:

Y_{k+1}=\begin{cases}A(\phi)Y_{k}+B(\phi)U_{k}+\bar{W}_{k},&\gamma_{k+1}=1\\ A(\phi)Y_{k}+B(\phi)U_{k},&\gamma_{k+1}=0,\end{cases} \qquad (13)

where $\bar{W}_{k}=A(\phi)\bar{e}_{k}+W_{k}-\bar{e}_{k+1}$, with the associated cost-to-go $(Y_{k}-\bar{X}_{k})^{\prime}Q(\phi)(Y_{k}-\bar{X}_{k})+\mathbb{E}\{U_{k}^{\prime}R(\phi)U_{k}|I_{k}^{\mu}\}+\mathrm{Tr}\{Q(\phi)\Delta_{k}\}$. Then, under Assumption 1,

(i) The optimal control policy for the separated problem (13) is given as

U^{*}_{k}=-\Pi(\phi)Y_{k}-\Gamma(\phi)g_{k+1}, \qquad (14)

where $\Gamma(\phi)=(R(\phi)+B(\phi)^{\prime}K(\phi)B(\phi))^{-1}B(\phi)^{\prime}$ and $\Pi(\phi)=\Gamma(\phi)K(\phi)A(\phi)$. Further, $K(\phi)$ is the unique positive definite solution to the algebraic Riccati equation

K(\phi)=A(\phi)^{\prime}K(\phi)A(\phi)-A(\phi)^{\prime}K(\phi)B(\phi)\Pi(\phi)+Q(\phi), \qquad (15)

and the trajectory $g_{k}$ is given as

g_{k}=H(\phi)^{\prime}g_{k+1}-Q(\phi)\bar{X}_{k}, \qquad (16)

where $H(\phi)^{\prime}:=A(\phi)^{\prime}[I-K(\phi)B(\phi)\Gamma(\phi)]$ is Hurwitz.

Proof.

The separated problem follows from (5) and (9) and Theorem 1. The rest of the proof follows from [22] and [17] and is thus omitted. ∎
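The quantities in Proposition 1 can be computed by iterating the Riccati recursion (15) to its fixed point. A sketch with hypothetical system matrices (plain value iteration rather than a dedicated DARE solver):

```python
import numpy as np

def lq_gains(A, B, Q, R, iters=1000):
    """Iterate (15) to a fixed point K and return (K, Pi, Gamma) as in (14)."""
    K = np.copy(Q)
    for _ in range(iters):
        Gamma = np.linalg.solve(R + B.T @ K @ B, B.T)   # (R + B'KB)^{-1} B'
        Pi = Gamma @ K @ A
        K = A.T @ K @ A - A.T @ K @ B @ Pi + Q          # Riccati update (15)
    Gamma = np.linalg.solve(R + B.T @ K @ B, B.T)
    Pi = Gamma @ K @ A
    return K, Pi, Gamma

# Hypothetical type-(phi) parameters satisfying Assumption 1
A = np.array([[1.1, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

K, Pi, Gamma = lq_gains(A, B, Q, R)
H = A - B @ Pi   # closed-loop matrix, Hurwitz by Proposition 1
# Equilibrium control (14): U_k = -Pi @ Y_k - Gamma @ g_{k+1}
```

One can verify numerically that the converged $K$ satisfies (15) and that the closed-loop matrix $H(\phi)=A(\phi)-B(\phi)\Pi(\phi)$ has spectral radius below one.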

III-B MFE Analysis

Using Proposition 1, we now prove the existence and uniqueness of the MFE by introducing an operator $\mathcal{T}$, as shown in this section. First note, from (16), that $H(\phi)=A(\phi)-B(\phi)\Pi(\phi)$. Then, substituting (14) into (13), we arrive at the closed-loop system:

Yk+1={H(ϕ)YkB(ϕ)Γ(ϕ)gk+1+W¯k,γk+1=1H(ϕ)YkB(ϕ)Γ(ϕ)gk+1,γk+1=0.\displaystyle Y_{k+1}=\left\{\begin{array}[]{ll}H(\phi)Y_{k}-B(\phi)\Gamma(\phi)g_{k+1}+\bar{W}_{k},&\gamma_{k+1}=1\\ H(\phi)Y_{k}-B(\phi)\Gamma(\phi)g_{k+1},&\gamma_{k+1}=0\end{array}.\right.

Then, using W¯k\bar{W}_{k} from Proposition 1 and (9), we can rewrite the above closed-loop system as

Xk+1\displaystyle X_{k+1} =H(ϕ)XkB(ϕ)Γ(ϕ)gk+1+B(ϕ)Π(ϕ)e¯k+Wk.\displaystyle=H(\phi)X_{k}-B(\phi)\Gamma(\phi)g_{k+1}+B(\phi)\Pi(\phi)\bar{e}_{k}+W_{k}. (17)

It is to be noted that gk𝒳g_{k}\in\mathcal{X}, if g0=j=0(Hj(ϕ))Q(ϕ)X¯jg_{0}=-\sum_{j=0}^{\infty}{(H^{j}(\phi))^{\prime}}Q(\phi)\bar{X}_{j}, which further gives gk=j=k(Hjk(ϕ))Q(ϕ)X¯jg_{k}=-\sum_{j=k}^{\infty}{(H^{j-k}(\phi))^{\prime}}Q(\phi)\bar{X}_{j}. Substituting this value of gkg_{k} in (17), we get

Xk+1\displaystyle X_{k+1} =H(ϕ)Xk+B(ϕ)Γ(ϕ)j=k+1(Hjk1(ϕ))Q(ϕ)X¯j\displaystyle=H(\phi)X_{k}+B(\phi)\Gamma(\phi)\sum_{j=k+1}^{\infty}{(H^{j-k-1}(\phi))^{\prime}}Q(\phi)\bar{X}_{j}
+B(ϕ)Π(ϕ)e¯k+Wk.\displaystyle+B(\phi)\Pi(\phi)\bar{e}_{k}+W_{k}.

Now, taking expectation on both sides and denoting X^k(ϕ)=𝔼{Xk}\hat{X}_{k}(\phi)=\mathbb{E}\{X_{k}\} as the aggregate trajectory of agents of type ϕ\phi, we get

X^k+1(ϕ)=\displaystyle\hat{X}_{k+1}(\phi)= (18)
H(ϕ)X^k(ϕ)+B(ϕ)Γ(ϕ)j=k+1(Hjk1(ϕ))Q(ϕ)X¯j,\displaystyle H(\phi)\hat{X}_{k}(\phi)+B(\phi)\Gamma(\phi)\sum_{j=k+1}^{\infty}{(H^{j-k-1}(\phi))^{\prime}Q(\phi)\bar{X}_{j}},

where we use the tower property of conditional expectation and Lemma 1, to get 𝔼{e¯k}=𝔼{𝔼{e¯k|Ikd}}=0\mathbb{E}\{\bar{e}_{k}\}=\mathbb{E}\{\mathbb{E}\{\bar{e}_{k}|I_{k}^{d}\}\}=0. Finally, (18) can be simplified further as:

X^k(ϕ)=Hk(ϕ)νϕ,0\displaystyle\hat{X}_{k}(\phi)=H^{k}(\phi)\nu_{\phi,0} (19)
+j=0k1Hkj1(ϕ)B(ϕ)Γ(ϕ)s=j+1(Hsj1(ϕ))Q(ϕ)X¯s.\displaystyle+\sum_{j=0}^{k-1}{H^{k-j-1}(\phi)B(\phi)\Gamma(\phi)\sum_{s=j+1}^{\infty}(H^{s-j-1}(\phi))^{\prime}Q(\phi)\bar{X}_{s}}.

Now, using the empirical distribution (2), define the operator 𝒯\mathcal{T} as:

𝒯(X¯)(k):=ϕΦX^k(ϕ)P(ϕ),\displaystyle\mathcal{T}(\bar{X})(k):=\sum_{\phi\in\Phi}{\hat{X}_{k}(\phi)P(\phi)}, (20)

where 𝒯(X¯)(k)\mathcal{T}(\bar{X})(k) denotes the value at time kk of the trajectory to which the operator maps the input sequence X¯\bar{X}. Using this operator, we prove existence and uniqueness of the equilibrium MF trajectory by finding the fixed point of (20), under the following assumption.

Assumption 2.

We assume Ξ:=H(ϕ)+ζ<1,ϕ\Xi:=\|H(\phi)\|+\zeta<1,\forall\phi, where ζ:=ϕΦQ(ϕ)B(ϕ)Γ(ϕ)(1H(ϕ))2P(ϕ)\zeta:=\sum_{\phi\in\Phi}{\frac{\|Q(\phi)\|\|B(\phi)\Gamma(\phi)\|}{(1-\|H(\phi)\|)^{2}}P(\phi)}.

Assumption 2 is motivated by results in the existing literature [13, 17, 16]. While it is stronger than the corresponding assumptions in [15, 17], it yields linear MF trajectory dynamics, which makes the equilibrium trajectory easily computable.
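To make the contraction condition and the fixed-point construction concrete, the sketch below checks Assumption 2 and runs the Picard iteration X̄ ← 𝒯(X̄) on (20) for a single scalar type (so P(ϕ) = 1), with the infinite tail sum in (18) truncated at a finite horizon. The numbers H, BG, Q, ν₀ are hypothetical values chosen to satisfy Assumption 2, not parameters from the paper.

```python
def T_operator(X_bar, H, BG, Q, nu0):
    # One application of the operator in (20) for a single scalar type:
    # propagate (18), X_hat_{k+1} = H*X_hat_k + BG*sum_{j>k} H^(j-k-1)*Q*X_bar_j,
    # truncating the infinite tail sum at the horizon len(X_bar).
    K = len(X_bar)
    out = [nu0]
    for k in range(K - 1):
        drive = sum(H ** (j - k - 1) * Q * X_bar[j] for j in range(k + 1, K))
        out.append(H * out[-1] + BG * drive)
    return out

# Hypothetical scalar data: H = |H(phi)|, BG = |B(phi)Gamma(phi)|, Q = |Q(phi)|.
H, BG, Q, nu0 = 0.3, 0.2, 1.0, 1.0
zeta = Q * BG / (1.0 - abs(H)) ** 2
assert abs(H) + zeta < 1.0  # Assumption 2 (Xi < 1) holds for these numbers

X_bar = [0.0] * 50
for _ in range(200):  # Picard iteration; converges since the map is a zeta-contraction
    X_bar = T_operator(X_bar, H, BG, Q, nu0)
```

At convergence, consecutive entries of the computed trajectory have a constant ratio, illustrating the linear dynamics established in Theorem 2 below.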

Theorem 2.

Under Assumptions 1-2, the following hold true:

  1. (i)

The operator 𝒯(X¯)𝒳,X¯𝒳\mathcal{T}(\bar{X})\in\mathcal{X},~{}~{}\forall\bar{X}\in\mathcal{X}. Furthermore, there exists a unique X¯𝒳\bar{X}^{*}\in\mathcal{X} such that 𝒯(X¯)=X¯\mathcal{T}(\bar{X}^{*})=\bar{X}^{*}.

  2. (ii)

    X¯k\bar{X}_{k}^{*} follows linear dynamics, i.e., L:={Ln×n:L1,X¯k+1=LX¯k}\exists~{}L^{*}\in\mathcal{L}:=\{L\in\mathbb{R}^{n\times n}:~{}\|L\|\leq 1,\bar{X}_{k+1}^{*}=L\bar{X}_{k}^{*}\}, where X¯k\bar{X}^{*}_{k} is the aggregate trajectory of the agents at equilibrium, and X¯0=ϕΦνϕ,0P(ϕ)\bar{X}_{0}^{*}=\sum_{\phi\in\Phi}{\nu_{\phi,0}P(\phi)}.

Proof.
  1. (i)

    Consider the linear system (18) with driving input X¯k\bar{X}_{k}, which is bounded since X¯𝒳.\bar{X}\in\mathcal{X}. Since H(ϕ)<1\|H(\phi)\|<1 by Assumption 2, and gk𝒳,kg_{k}\in\mathcal{X},~{}\forall k, we have from (18) and (20), that supk0𝒯(X¯)(k)<\sup_{k\geq 0}{\|\mathcal{T}(\bar{X})(k)\|}<\infty, which proves the first statement in part (i). Next, consider the following:

    𝒯(X¯1)𝒯(X¯2)=ϕΦ(X^1X^2)P(ϕ)\displaystyle\|\mathcal{T}(\bar{X}_{1})-\mathcal{T}(\bar{X}_{2})\|_{\infty}=\|\sum_{\phi\in\Phi}({\hat{X}_{1}-\hat{X}_{2})P(\phi)}\|_{\infty}
ϕΦQ(ϕ)B(ϕ)Γ(ϕ)(s=0H(ϕ)s)2P(ϕ)\displaystyle\leq\sum_{\phi\in\Phi}{\|Q(\phi)\|\|B(\phi)\Gamma(\phi)\|\left(\sum_{s=0}^{\infty}\|H(\phi)\|^{s}\right)^{2}P(\phi)}
    ×X¯1X¯2\displaystyle\qquad\times\|\bar{X}_{1}-\bar{X}_{2}\|_{\infty}
    =ζX¯1X¯2,\displaystyle=\zeta\|\bar{X}_{1}-\bar{X}_{2}\|_{\infty},

where the last equality follows from the definition of ζ\zeta in Assumption 2. Finally, using Banach’s fixed point theorem together with the first statement of part (i), 𝒯\mathcal{T} has a unique fixed point in 𝒳\mathcal{X}.

  2. (ii)

    Define the operator 𝒯¯:n×nn×n\bar{\mathcal{T}}:\mathbb{R}^{n\times n}\rightarrow\mathbb{R}^{n\times n}, given as:

    𝒯¯ϕ(L)\displaystyle\bar{\mathcal{T}}_{\phi}(L) :=H(ϕ)+B(ϕ)Γ(ϕ)α=0(Hα(ϕ))Q(ϕ)Lα+1,\displaystyle:=H(\phi)+B(\phi)\Gamma(\phi)\sum_{\alpha=0}^{\infty}{(H^{\alpha}(\phi))^{\prime}Q(\phi)L^{\alpha+1}},
    𝒯¯(L)\displaystyle\bar{\mathcal{T}}(L) :=ϕΦ𝒯¯ϕ(L)P(ϕ)\displaystyle:=\sum_{\phi\in\Phi}\bar{\mathcal{T}}_{\phi}(L)P(\phi)

with L=𝒯¯(L)L^{*}=\bar{\mathcal{T}}(L^{*}) and X^k+1=LX^k\hat{X}_{k+1}^{*}=L^{*}\hat{X}_{k}^{*}. To prove that such an LL^{*} indeed exists, we follow the same lines of proof as in [16] to arrive at

    𝒯¯(L2)𝒯¯(L1)\displaystyle\|\bar{\mathcal{T}}(L_{2})-\bar{\mathcal{T}}(L_{1})\|
<ϕΦB(ϕ)Γ(ϕ)Q(ϕ)(1H(ϕ))2L2L1P(ϕ),\displaystyle\hskip 17.07182pt<\sum_{\phi\in\Phi}\frac{\|B(\phi)\Gamma(\phi)\|\|Q(\phi)\|}{(1-\|H(\phi)\|)^{2}}\|L_{2}-L_{1}\|P(\phi),

which, under Assumption 2, establishes that 𝒯¯\bar{\mathcal{T}} is a contraction. Using completeness of \mathcal{L} and Banach’s fixed point theorem, we indeed have the existence of such an LL^{*}. Finally, from (i) above, we get that the unique MF trajectory X¯\bar{X}^{*} can be constructed recursively as X¯k=(L)kX¯0\bar{X}^{*}_{k}=(L^{*})^{k}\bar{X}^{*}_{0} with X¯0=ϕΦνϕ,0P(ϕ)\bar{X}_{0}^{*}=\sum_{\phi\in\Phi}{\nu_{\phi,0}P(\phi)}. ∎

Theorem 2 (i) provides us with a unique MFE, while the linearity of the MF trajectory in (ii) yields a control law which is linear in the state of the agent and the equilibrium trajectory [16]. This further makes the computation of the equilibrium trajectory tractable, which would otherwise involve a non-causal infinite sum.
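The matrix fixed point of Theorem 2 (ii) can likewise be computed by iterating the operator 𝒯̄ from the proof. The sketch below does so in the scalar case with the infinite series truncated; the tuples (H, BΓ, Q, P) are hypothetical values satisfying Assumption 2, not parameters from the paper.

```python
def T_bar(L, types, n_terms=200):
    # Truncated scalar version of the operator from the proof of Thm 2(ii):
    # T_bar_phi(L) = H + BG * sum_a H^a * Q * L^(a+1), averaged over types.
    out = 0.0
    for H, BG, Q, P in types:
        series = sum((H ** a) * Q * L ** (a + 1) for a in range(n_terms))
        out += (H + BG * series) * P
    return out

# One hypothetical type with P(phi) = 1; values chosen so Assumption 2 holds.
types = [(0.3, 0.2, 1.0, 1.0)]  # (H, B*Gamma, Q, P)

L = 0.0
for _ in range(500):  # fixed-point iteration; T_bar is a contraction here
    L_next = T_bar(L, types)
    if abs(L_next - L) < 1e-12:
        L = L_next
        break
    L = L_next
```

For these numbers the limit solves L = 0.3 + 0.2·L/(1 − 0.3·L) in closed form, and satisfies |L| < 1 as required by the set ℒ in Theorem 2 (ii).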

III-C ϵ\epsilon-Nash Equilibrium

We now show that under the information structure in Table I, the MFE constitutes an ϵ\epsilon-Nash equilibrium for the NN-player game. Before doing that, we first provide the definition of ϵ\epsilon-Nash equilibrium for the discrete-time MFG under communication constraints. Let us denote the space of admissible centralized control policies for agent ii as ic{\mathcal{M}}_{i}^{c}, under a centralized information structure in which each agent is assumed to have access to the other agents’ output histories. Then, we have the following:

Definition 2.

[17, 23] The set of control policies {μiic,i[1,N]}\{\mu^{i}\in{\mathcal{M}}_{i}^{c},~{}i\in[1,N]\} constitute an ϵ\epsilon-Nash equilibrium with respect to the cost functions {JiN,i[1,N]}\{J_{i}^{N},~{}i\in[1,N]\}, if, for some ϵ>0\epsilon>0,

JiN(μi,μi)ϵinfπiicJiN(πi,μi).\displaystyle J_{i}^{N}(\mu^{i},\mu^{-i})-\epsilon\leq\inf_{\pi^{i}\in{\mathcal{M}_{i}^{c}}}{J_{i}^{N}(\pi^{i},\mu^{-i})}.

We start by proving that the mass behaviour in (4) converges to the equilibrium MF trajectory of Theorem 2 (i). Note that, henceforth, quantities superscripted by an asterisk (*) represent their values at equilibrium; e.g., XkiX^{i*}_{k} denotes the state of agent ii at time kk under the equilibrium control policy.

Lemma 2.

Suppose Assumptions 1-2 hold and all the agents operate under the equilibrium control policy. Then, the coupling term in (4) converges (in the mean-square sense) to the equilibrium mean-field trajectory, i.e.,

limNlim supT1T𝔼{k=0T11Ni=1NXkiX¯k2}=0.\displaystyle\lim\limits_{N\rightarrow\infty}\limsup\limits_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\Bigg{\|}\frac{1}{N}\sum_{i=1}^{N}{X_{k}^{i*}}-\bar{X}_{k}^{*}\Bigg{\|}^{2}\right\}=0. (21)
Proof.

First, consider the following:

limNlim supT1T𝔼{k=0T11Ni=1NXkiX¯k2}\displaystyle\lim\limits_{N\rightarrow\infty}\limsup\limits_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\Bigg{\|}\frac{1}{N}\sum_{i=1}^{N}{X_{k}^{i*}}-\bar{X}_{k}^{*}\Bigg{\|}^{2}\right\}
limNlim supT2T𝔼{k=0T11Ni=1NXkiX^k(ϕi)2}\displaystyle\leq\lim\limits_{N\rightarrow\infty}\limsup\limits_{T\rightarrow\infty}\frac{2}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\Bigg{\|}\frac{1}{N}\sum_{i=1}^{N}{X_{k}^{i*}}-{\hat{X}_{k}^{*}(\phi_{i})}\Bigg{\|}^{2}\right\}
+2limNsupk01Ni=1NX^k(ϕi)X¯k2\displaystyle~{}~{}~{}~{}+2\lim\limits_{N\rightarrow\infty}\sup_{k\geq 0}\Bigg{\|}\frac{1}{N}\sum_{i=1}^{N}{\hat{X}_{k}^{*}(\phi_{i})}-\bar{X}^{*}_{k}\Bigg{\|}^{2}

where the inequality follows from the identity a+b22a2+2b2\|a+b\|^{2}\leq 2\|a\|^{2}+2\|b\|^{2} and 1Ni=1NX^k(ϕi)=ϕiΦX^k(ϕi)PN(ϕi),k\frac{1}{N}\sum_{i=1}^{N}{\hat{X}_{k}^{*}(\phi_{i})}=\sum_{\phi_{i}\in\Phi}{\hat{X}^{*}_{k}(\phi_{i})P_{N}(\phi_{i})},\forall k. We now prove that the first term in the above inequality vanishes. Let Zki=XkiX^kiZ_{k}^{i*}=X_{k}^{i*}-\hat{X}_{k}^{i*}. Then, using (17), we have

Zk+1i=H(ϕi)Zki+Wki+B(ϕi)Π(ϕi)e¯ki.\displaystyle Z_{k+1}^{i*}=H(\phi_{i})Z_{k}^{i*}+W_{k}^{i}+B(\phi_{i})\Pi(\phi_{i})\bar{e}_{k}^{i}.

Then, as in [17, Lemma 2], T1>0\exists~{}T_{1}>0 and M1>0,M_{1}>0, independent of TT and NN, such that 2T𝔼{k=0T11Ni=1NZki2}M1N,T>T1\frac{2}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\|\frac{1}{N}\sum_{i=1}^{N}{Z_{k}^{i*}}\|^{2}\right\}\leq\frac{M_{1}}{N},~{}\forall T>T_{1}, which implies that lim supT2T𝔼{k=0T11Ni=1NZki2}M1N\limsup\limits_{T\rightarrow\infty}\frac{2}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\|\frac{1}{N}\sum_{i=1}^{N}{Z_{k}^{i*}}\|^{2}\right\}\leq\frac{M_{1}}{N}. This finally gives limNlim supT2T𝔼{k=0T11Ni=1NZki2}=0.\lim\limits_{N\rightarrow\infty}\limsup\limits_{T\rightarrow\infty}\frac{2}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\|\frac{1}{N}\sum_{i=1}^{N}{Z_{k}^{i*}}\|^{2}\right\}=0. Next, since the type set Φ\Phi is finite, X^k(ϕi)\hat{X}^{*}_{k}(\phi_{i}) is uniformly bounded for all kk. Further, since X¯𝒳\bar{X}^{*}\in\mathcal{X} from Theorem 2, we have limNsupk01Ni=1NX^k(ϕi)X¯k2=0\lim\limits_{N\rightarrow\infty}\sup_{k\geq 0}\|\frac{1}{N}\sum_{i=1}^{N}{\hat{X}_{k}^{*}(\phi_{i})}-\bar{X}^{*}_{k}\|^{2}=0. Thus, (21) holds and the proof is complete. ∎
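The averaging effect behind Lemma 2 can be illustrated numerically. The sketch below simulates the deviation processes Z^i for N agents in the scalar case (dropping the B·Π·ē term of (17) for simplicity) and estimates the time-averaged squared norm of their empirical mean, which scales as O(1/N); all parameters are hypothetical.

```python
import random

def avg_sq_mean(N, T=2000, H=0.5, sigma=1.0, seed=0):
    # Simulate Z^i_{k+1} = H*Z^i_k + W^i_k for N agents and return
    # (1/T) * sum_k |(1/N) * sum_i Z^i_k|^2, which is O(1/N) per Lemma 2.
    rng = random.Random(seed)
    Z = [0.0] * N
    acc = 0.0
    for _ in range(T):
        m = sum(Z) / N
        acc += m * m
        Z = [H * z + rng.gauss(0.0, sigma) for z in Z]
    return acc / T
```

Comparing a small and a large population (e.g., N = 4 versus N = 400) shows the roughly hundredfold drop in the averaged squared deviation predicted by the M₁/N bound.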

Now, we prove the ϵ\epsilon-Nash property of the MFE. Toward that end, consider the following cost functions:

JiN(μi,μi)=\displaystyle J_{i}^{N}(\mu^{i*},\mu^{-i*})= (22)
lim supT1T𝔼{k=0T1Xki1Nj=1NXkjQ2+UkiR2},\displaystyle\limsup_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\Bigg{\|}X_{k}^{i*}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j*}}\Bigg{\|}^{2}_{Q}+\|U_{k}^{i*}\|^{2}_{R}\right\},
J(μi,X¯)=\displaystyle J(\mu^{i*},\bar{X}^{*})= (23)
lim supT1T𝔼{k=0T1XkiX¯kQ2+UkiR2,\displaystyle\limsup_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\|X_{k}^{i*}-\bar{X}_{k}^{*}\|^{2}_{Q}+\|U_{k}^{i*}\|^{2}_{R}\right\},
JiN(πi,μi)=\displaystyle J_{i}^{N}(\pi^{i},\mu^{-i*})= (24)
lim supT1T𝔼{k=0T1Xki,πi1Nj=1NXkjQ2+VkiR2},\displaystyle\limsup_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\Bigg{\|}X_{k}^{i,\pi^{i}}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j*}}\Bigg{\|}^{2}_{Q}+\|V_{k}^{i}\|^{2}_{R}\right\},

where X¯\bar{X}^{*} is the equilibrium MF trajectory (Theorem 2) and Xki,πiX_{k}^{i,\pi^{i}} is the state of agent ii at time kk when it chooses a control law πi\pi^{i} from the set of centralized policies ic{\mathcal{M}}_{i}^{c}. Notice that this set is strictly larger than the set \mathcal{M}. Furthermore, the control action VkiV_{k}^{i} is derived from πiic\pi^{i}\in{\mathcal{M}}_{i}^{c}. Now, we have the following theorem stating the ϵ\epsilon-Nash result, i.e., that the control laws prescribed by the MFE are also ϵ\epsilon-Nash in the finite population case.

Theorem 3.

Under Assumptions 1-2, the set of NN decentralized control laws {μi,i[1,N]}\{\mu^{i*},~{}i\in[1,N]\}, where μi=μ\mu^{i*}=\mu^{*}, constitutes an ϵ\epsilon-Nash equilibrium for the LQ-MFG with communication constraints over the AWGN channel; more precisely, we have

JiN(μi,μi)\displaystyle J_{i}^{N}(\mu^{i*},\mu^{-i*}) infπiicJiN(πi,μi)+𝒪(lim supTϵTN),\displaystyle\leq\inf_{\pi^{i}\in{\mathcal{M}}_{i}^{c}}{J_{i}^{N}(\pi^{i},\mu^{-i*})}+\mathcal{O}\left(\limsup_{T\rightarrow\infty}\sqrt{\epsilon_{T}^{N}}\right), (25)

where ϵTN=1T𝔼{k=0T11Nj=1NXkjX¯k2}\epsilon_{T}^{N}=\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}{\|\frac{1}{N}\sum\limits_{j=1}^{N}{X_{k}^{j*}}-\bar{X}_{k}^{*}\|^{2}}\right\}.

Proof.

We prove the theorem in two steps. In the first step, we derive an upper bound on JiN(μi,μi)J(μi,X¯)J_{i}^{N}(\mu^{i*},\mu^{-i*})-J(\mu^{i},\bar{X}^{*}), and in step 2, on J(μi,X¯)JiN(πi,μi)J(\mu^{i},\bar{X}^{*})-J_{i}^{N}(\pi^{i},\mu^{-i*}). Finally, we combine the two to get (25).

Step 1: Consider the following:

JiN(μi,μi)J(μi,X¯)JiN(μi,μi)J(μi,X¯)\displaystyle J_{i}^{N}(\mu^{i*},\mu^{-i*})-J(\mu^{i},\bar{X}^{*})\leq J_{i}^{N}(\mu^{i*},\mu^{-i*})-J(\mu^{i*},\bar{X}^{*})
=lim supT1T𝔼{k=0T1Xki1Nj=1NXkjQ2\displaystyle=\limsup_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\Bigg{\|}X_{k}^{i*}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j*}}\Bigg{\|}^{2}_{Q}\right.
XkiX¯kQ2},\displaystyle\left.~{}~{}~{}~{}-\|X_{k}^{i*}-\bar{X}_{k}^{*}\|^{2}_{Q}\right\},
=lim supT1T𝔼{k=0T1X¯k1Nj=1NXkjQ2\displaystyle=\limsup_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\Bigg{\|}\bar{X}_{k}^{*}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j*}}\Bigg{\|}^{2}_{Q}\right.
+2(XkiX¯k)Q(X¯k1Nj=1NXkj)}\displaystyle\left.~{}~{}~{}~{}+2\left(X_{k}^{i*}-\bar{X}_{k}^{*}\right)^{\prime}Q\left(\bar{X}_{k}^{*}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j*}}\right)\right\}
lim supTQϵTN+2Q𝔼{1Tk=0T1XkiX¯k2\displaystyle\leq\limsup_{T\rightarrow\infty}\|Q\|\epsilon_{T}^{N}+2\|Q\|\mathbb{E}\left\{\sqrt{\frac{1}{T}\sum_{k=0}^{T-1}\|X_{k}^{i*}-\bar{X}_{k}^{*}\|^{2}}\right.
×1Tk=0T1X¯k1Nj=1NXkj2}\displaystyle\left.\qquad\qquad\times\sqrt{\frac{1}{T}\sum_{k=0}^{T-1}\Bigg{\|}\bar{X}_{k}^{*}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j*}}\Bigg{\|}^{2}}\right\}
lim supTQϵTN\displaystyle\leq\limsup_{T\rightarrow\infty}\|Q\|\epsilon_{T}^{N}
+lim supT2QϵTN𝔼{1Tk=0T1XkiX¯k2}\displaystyle+\limsup_{T\rightarrow\infty}2\|Q\|\sqrt{\epsilon_{T}^{N}\mathbb{E}\left\{\frac{1}{T}\sum_{k=0}^{T-1}\|X_{k}^{i*}-\bar{X}_{k}^{*}\|^{2}\right\}} (26)

where both the inequalities follow from the Cauchy-Schwarz inequality.

Step 2: Consider the following:

JiN(πi,μi)=\displaystyle J_{i}^{N}(\pi^{i},\mu^{-i*})=
lim supT1T𝔼{k=0T1Xki,πi1Nj=1NXkjQ2+VkiR2}\displaystyle\limsup_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\Bigg{\|}X_{k}^{i,\pi^{i}}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j*}}\Bigg{\|}^{2}_{Q}+\|V_{k}^{i}\|^{2}_{R}\right\}
=lim supT1T𝔼{k=0T1Xki,πiX¯kQ2+VkiR2\displaystyle=\limsup_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\|X_{k}^{i,\pi^{i}}-\bar{X}_{k}^{*}\|^{2}_{Q}+\|V_{k}^{i}\|^{2}_{R}\right.
+1Nj=1NXkjX¯kQ2\displaystyle\left.\qquad\qquad+\Bigg{\|}\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j*}}-\bar{X}_{k}^{*}\Bigg{\|}^{2}_{Q}\right.
+2(Xki,πiX¯k)Q(X¯k1Nj=1NXkj)}\displaystyle\left.+2\left(X_{k}^{i,\pi^{i}}-\bar{X}_{k}^{*}\right)^{\prime}Q\left(\bar{X}_{k}^{*}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j*}}\right)\right\}
=J(πi,X¯)lim supT1T𝔼{k=0T11Nj=1NXkjX¯kQ2\displaystyle=J(\pi^{i},\bar{X}^{*})-\limsup_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\Bigg{\|}\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j*}}-\bar{X}_{k}^{*}\Bigg{\|}^{2}_{Q}\right.
+2(X¯kXki,πi)Q(X¯k1Nj=1NXkj)}\displaystyle\left.+2\left(\bar{X}_{k}^{*}-X_{k}^{i,\pi^{i}}\right)^{\prime}Q\left(\bar{X}_{k}^{*}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j*}}\right)\right\}
J(μi,X¯)lim supT1T𝔼{k=0T11Nj=1NXkjX¯kQ2\displaystyle\geq J(\mu^{i},\bar{X}^{*})-\limsup_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\Bigg{\|}\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j*}}-\bar{X}_{k}^{*}\Bigg{\|}^{2}_{Q}\right.
+2(X¯kXki,πi)Q(X¯k1Nj=1NXkj)}.\left.+2\left(\bar{X}_{k}^{*}-X_{k}^{i,\pi^{i}}\right)^{\prime}Q\left(\bar{X}_{k}^{*}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j*}}\right)\right\}.

Finally, using Cauchy-Schwarz inequality in the same manner as for (26), we get

J(μi,X¯)JiN(πi,μi)lim supTQϵTN\displaystyle J(\mu^{i},\bar{X}^{*})-J_{i}^{N}(\pi^{i},\mu^{-i*})\leq\limsup_{T\rightarrow\infty}\|Q\|\epsilon_{T}^{N}
+lim supT2QϵTN𝔼{1Tk=0T1Xki,πiX¯k2}.\displaystyle+\limsup_{T\rightarrow\infty}2\|Q\|\sqrt{\epsilon_{T}^{N}\mathbb{E}\left\{\frac{1}{T}\sum_{k=0}^{T-1}\|X_{k}^{i,\pi^{i}}-\bar{X}_{k}^{*}\|^{2}\right\}}. (27)

Similar to Theorem 3 in [17], there exist M2,T2>0M_{2},~{}T_{2}>0, such that 1T𝔼{k=0T1Xki2}<M2,T>T2\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\|X_{k}^{i*}\|^{2}\right\}<M_{2},~{}\forall T>T_{2}. Further, from Theorem 2 (i), there exist M3,T3>0M_{3},~{}T_{3}>0, such that 1T𝔼{k=0T1X¯k2}<M3,T>T3\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\|\bar{X}_{k}^{*}\|^{2}\right\}<M_{3},~{}\forall T>T_{3}. Next, since ¯iic\bar{\mathcal{M}}_{i}\subseteq{\mathcal{M}}_{i}^{c}, we have infπiicJiN(πi,μi)JiN(μi,μi)\inf_{\pi^{i}\in{\mathcal{M}}_{i}^{c}}J_{i}^{N}(\pi^{i},\mu^{-i*})\leq J_{i}^{N}(\mu^{i*},\mu^{-i*}); hence, it suffices to consider πiic\pi^{i}\in{\mathcal{M}}_{i}^{c} for which there exist M4,T4>0M_{4},~{}T_{4}>0, with the property that 1T𝔼{k=0T1Xki,πi2}<M4,T>T4\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\|X_{k}^{i,\pi^{i}}\|^{2}\right\}<M_{4},~{}\forall T>T_{4}. Choose T5=max{T1,T2,T3,T4}T_{5}=\max\{T_{1},T_{2},T_{3},T_{4}\} and let T>T5T>T_{5}, following which we obtain (25) from (26) and (27). Finally, define ϵ:=𝒪(lim supTϵTN)\epsilon:=\mathcal{O}\big{(}\limsup_{T\rightarrow\infty}\sqrt{\epsilon_{T}^{N}}\big{)}, which converges to 0 as NN\rightarrow\infty by Lemma 2. The proof is thus complete. ∎

Before concluding this section, we remark that, according to Theorem 3, the decentralized equilibrium policy constitutes an ϵ\epsilon-Nash equilibrium even with respect to the larger centralized policy class in the NN-player game. Consequently, it also constitutes an ϵ\epsilon-Nash equilibrium with respect to the decentralized policy class of the original NN-player game formulated in Section II.

IV Simulations

In this section, we demonstrate the performance of the MFE under different scheduling policies. We simulate a finite-population game with scalar dynamics and a single type ϕ\phi. The dynamics and cost parameters of the agents satisfy Assumptions 1 and 2. For Figure 2(a), we simulate a game of N=100N=100 agents and show that, in spite of significant channel noise, the estimation error decreases, allowing the agents to reach consensus in a very short time. Note that the output does not perfectly track the true state due to the asynchronous nature of communication.

Refer to caption
(a) Estimation error
Refer to caption
(b) Average cost vs Scheduling Threshold (α)(\alpha)
Figure 2: Performance of the equilibrium policy under communication constraints

For Figure 2(b), we simulate the behavior of N=1000N=1000 agents and plot the average cost per agent on a logarithmic axis against scheduling thresholds α{0.0,2.0,4.0,6.0}\alpha\in\{0.0,2.0,4.0,6.0\}. The figure shows a box plot depicting the median (red line) and spread (box) of the average cost per agent over 100100 runs for each value of α\alpha. The plot shows a clear increase in the average cost per agent as α\alpha is increased: a larger α\alpha leads to a larger estimation error, which in turn causes a higher average cost per agent. This indicates that a compromise between performance (average cost) and communication frequency can be reached through a judicious choice of the threshold parameter α\alpha.
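The role of the threshold α can be illustrated with a minimal single-agent sketch. We assume here an error-threshold scheduler, i.e., the sensor transmits over the AWGN channel only when the decoder-side estimation error exceeds α; this is our reading of the scheduling policy (the precise policy is defined in Section II), and all numerical parameters below are hypothetical.

```python
import random

def transmission_rate(alpha, T=2000, A=0.9, sigma_w=1.0, sigma_ch=0.5, seed=0):
    # Fraction of time steps at which the sensor transmits (gamma_k = 1),
    # assuming transmission is triggered when |x_k - y_k| > alpha.
    rng = random.Random(seed)
    x, y = 0.0, 0.0  # true state and decoder-side estimate (control omitted)
    n_tx = 0
    for _ in range(T):
        if abs(x - y) > alpha:                # scheduler fires: gamma_k = 1
            y = x + rng.gauss(0.0, sigma_ch)  # reception over the AWGN channel
            n_tx += 1
        x = A * x + rng.gauss(0.0, sigma_w)   # state propagation
        y = A * y                             # decoder propagates its estimate
    return n_tx / T
```

Raising α lowers the transmission rate, which is the communication-versus-performance trade-off visible in Figure 2(b).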

V Conclusion

In this paper, we have studied LQ-MFGs under communication constraints, namely, intermittent communication over an AWGN channel. Under the defined information structure involving the scheduler, encoder, decoder, and controller, we have proved that the control is free of the dual effect in the mean-field limit. Consequently, the optimal control policy has been shown to be certainty equivalent. Under appropriate assumptions, we have established the existence, uniqueness, and characterization (linearity) of the equilibrium mean-field trajectory, and shown that the resulting equilibrium policies have the ϵ\epsilon-Nash property. We have also empirically demonstrated that, in line with intuition, the performance of the equilibrium policies deteriorates as the communication frequency decreases.

References

  • [1] C. Ramesh, H. Sandberg, and K. H. Johansson, “Design of state-based schedulers for a network of control loops,” IEEE Transactions on Automatic Control, vol. 58, no. 8, pp. 1962–1975, 2013.
  • [2] X. Gao, E. Akyol, and T. Başar, “Optimal communication scheduling and remote estimation over an additive noise channel,” Automatica, vol. 88, pp. 57–69, 2018.
  • [3] A. Nayyar, T. Başar, D. Teneketzis, and V. V. Veeravalli, “Optimal strategies for communication and remote estimation with an energy harvesting sensor,” IEEE Transactions on Automatic Control, vol. 58, no. 9, pp. 2246–2260, 2013.
  • [4] S. Tatikonda, A. Sahai, and S. Mitter, “Stochastic linear control over a communication channel,” IEEE Transactions on Automatic Control, vol. 49, no. 9, pp. 1549–1561, 2004.
  • [5] S. Tatikonda and S. Mitter, “Control over noisy channels,” IEEE Transactions on Automatic Control, vol. 49, no. 7, pp. 1196–1201, 2004.
  • [6] O. C. Imer and T. Başar, “Optimal estimation with limited measurements,” International Journal of Systems, Control and Communications, vol. 2, no. 1-3, pp. 5–29, 2010.
  • [7] X. Gao, E. Akyol, and T. Başar, “Optimal estimation with limited measurements and noisy communication,” in 54th IEEE Conference on Decision and Control (CDC).   IEEE, 2015, pp. 1775–1780.
  • [8] S. Tatikonda, A. Sahai, and S. Mitter, “Control of LQG systems under communication constraints,” in Proceedings of the 37th IEEE Conference on Decision and Control (Cat. No. 98CH36171), vol. 1.   IEEE, 1998, pp. 1165–1170.
  • [9] R. Bansal and T. Başar, “Simultaneous design of measurement and control strategies for stochastic systems with feedback,” Automatica, vol. 25, no. 5, pp. 679–694, 1989.
  • [10] A. Molin and S. Hirche, “On the optimality of certainty equivalence for event-triggered control systems,” IEEE Transactions on Automatic Control, vol. 58, no. 2, pp. 470–474, 2012.
  • [11] D. J. Antunes and M. H. Balaghi I., “Consistent event-triggered control for discrete-time linear systems with partial state information,” IEEE Control Systems Letters, vol. 4, no. 1, pp. 181–186, 2019.
  • [12] M. Huang, R. P. Malhamé, P. E. Caines et al., “Large population stochastic dynamic games: closed-loop Mckean-Vlasov systems and the Nash Certainty Equivalence principle,” Communications in Information & Systems, vol. 6, no. 3, pp. 221–252, 2006.
  • [13] M. Huang, P. E. Caines, and R. P. Malhamé, “Large-population cost-coupled LQG problems with nonuniform agents: individual-mass behavior and decentralized ε\varepsilon-Nash equilibria,” IEEE Transactions on Automatic Control, vol. 52, no. 9, pp. 1560–1571, 2007.
  • [14] J.-M. Lasry and P.-L. Lions, “Mean field games,” Japanese Journal of Mathematics, vol. 2, no. 1, pp. 229–260, 2007.
  • [15] M. A. uz Zaman, K. Zhang, E. Miehling, and T. Başar, “Approximate equilibrium computation for discrete-time linear-quadratic mean-field games,” in American Control Conference (ACC).   IEEE, 2020, pp. 333–339.
  • [16] ——, “Reinforcement learning in non-stationary discrete-time linear-quadratic mean-field games,” in 59th IEEE Conference on Decision and Control (CDC).   IEEE, 2020, pp. 2278–2284.
  • [17] J. Moon and T. Başar, “Discrete-time LQG mean field games with unreliable communication,” in 53rd IEEE Conference on Decision and Control.   IEEE, 2014, pp. 2697–2702.
  • [18] M. A. uz Zaman, S. Bhatt, and T. Başar, “Secure discrete-time linear-quadratic mean-field games,” in International Conference on Decision and Game Theory for Security.   Springer, 2020, pp. 203–222.
  • [19] Y. Bar-Shalom and E. Tse, “Dual effect, certainty equivalence, and separation in stochastic control,” IEEE Transactions on Automatic Control, vol. 19, no. 5, pp. 494–500, 1974.
  • [20] B. D. Anderson and J. B. Moore, Optimal filtering.   Courier Corporation, 2012.
  • [21] A. A. Feldbaum, “Dual control theory,” in Control Theory: Twenty-Five Seminal Papers (1932-1981), T. Başar, Ed.   Wiley-IEEE Press, 2001, ch. 10, pp. 874–880.
  • [22] D. P. Bertsekas, Dynamic Programming and Optimal Control: Vol. 1.   Athena Scientific, Belmont, MA, 2000.
  • [23] N. Saldi, T. Başar, and M. Raginsky, “Markov–Nash equilibria in mean-field games with discounted cost,” SIAM Journal on Control and Optimization, vol. 56, no. 6, pp. 4256–4287, 2018.