Linear Quadratic Mean-Field Games with Communication Constraints

Shubham Aggarwal, Muhammad Aneeq uz Zaman, and Tamer Başar The authors are affiliated with the Coordinated Science Lab, University of Illinois at Urbana-Champaign, Urbana, IL, USA 61801. Emails: {sa57,mazaman2,basar1}@illinois.edu.Research is supported in part by an AFOSR Grant (FA9550-19-1-0353).

Abstract

In this paper, we study a large population game with heterogeneous dynamics and cost functions solving a consensus problem. Moreover, the agents have communication constraints which appear as: (1) an Additive-White Gaussian Noise (AWGN) channel, and (2) asynchronous data transmission via a fixed scheduling policy. Since the complexity of solving the game increases with the number of agents, we use the Mean-Field Game paradigm to solve it. Under standard assumptions on the information structure of the agents, we prove that the control of the agent in the MFG setting is free of the dual effect. This allows us to obtain an equilibrium control policy for the generic agent, which is a function of only the local observation of the agent. Furthermore, the equilibrium mean-field trajectory is shown to follow linear dynamics, hence making it computable. We show that in the finite population game, the equilibrium control policy prescribed by the MFG analysis constitutes an $\epsilon-$ Nash equilibrium, where $\epsilon$ tends to zero as the number of agents goes to infinity. The paper is concluded with simulations demonstrating the performance of the equilibrium control policy.

I Introduction

In distributed real-world applications, like networked control systems [1], ecosystem monitoring [2], and energy harvesting [3], we rarely have the luxury of pure persistent communication. Hence, in this work, we study multi-agent systems under a constrained communication structure. The communication constraints may appear in the form of limited sensor energy levels [3], noisy transmission medium [4, 5], limits on the communication frequency [6, 7] or some combination thereof, as we investigate in this work. Additionally, scalability becomes a huge challenge with increasing number of agents in a multi-agent system.

In this paper, we consider a discrete-time multi-agent game problem. Each agent is coupled with other agents through its cost function, which incentivizes the agent to form consensus with other players. In addition to a plant and a controller, the agent’s control system (See Figure 1) also consists of a scheduler and an Additive White Gaussian Noise (AWGN) channel. The scheduler controls the flow of information using a fixed scheduling policy. The communication through the AWGN channel is regulated by an encoder/decoder pair which constitutes a predictive encoder [8] to encode sequential data and a minimum mean-square estimation (MMSE) decoder to produce the best estimate of the plant state.

Related Work: There have been several works in the literature studying estimation and control problems under communication constraints. Reference [9] considers simultaneous design of measurement and control strategies for a class of Linear Quadratic Gaussian (LQG) problems under soft constraints on both. The LQG problem has been further studied for a noisy analog channel [8] and a noiseless digital channel [4] in the forward loop. All these works however, consider single agent problems with uninterrupted communication, unlike the setting of this work. In [1], the authors consider a problem where a network of plants share a noiseless communication medium via a state-based scheduling policy. The system has been shown to be dual effect free under a symmetry condition on the scheduling policy. Similarly the optimality of certainty equivalent control laws has been characterized under an event-triggered communication policy with a noiseless channel in [10, 11]. Our work on the other hand proposes a multi-agent game where each agent has intermittent access to its state measurement through a noisy channel.

A key difficulty in multi-agent systems is that of scalability. To alleviate this challenge, a mean-field game (MFG) framework was proposed in [12, 13] by Huang, Malhamé and Caines and simultaneously in [14], by Lasry and Lions. The essential idea in the MFG framework is that as the number of agents goes to infinity, agents become indistinguishable and the effect of individual deviation becomes negligible (that is, the effect of strategic interaction disappears). This leads to an aggregation effect, which can be modelled by an exogenous mean-field (MF) term. Consequently, the game problem reduces to a stochastic optimal control problem for a representative agent along with a consistency condition.

Linear Quadratic MFGs (LQ-MFGs), which combine linear agent dynamics with a quadratic cost function, serve as a significant benchmark in the study of MFGs. Recent works on LQ-MFGs [15, 16] in the discrete-time setting are free of communication constraints or consider partially observed dynamics involving packet drop-outs [17], thereby making the underlying communication link unreliable. Furthermore, Secure MFGs [18] capture the setting where the agents deliberately obfuscate their state information with the goal of subverting an eavesdropping adversary. In these works, however, communication occurs at every time instance, in contrast to our setting here, where the communication is intermittent and the channel adds noise to the incoming signal.

Contribution: In this paper, we prove that under a fixed scheduling policy, an AWGN channel and a standard information structure, the dual effect of control [19] does not show up. The result is presented in Lemma 1 and is one of the key the observations of the paper. This renders the covariance of estimation error independent of control signals (for both transmission and non-transmission times). Under the mean-field setting, this insight enables us to reduce the game to solving a standard optimal control tracking problem [16] along with a consistency condition. We prove the consistency condition of the mean-field equilibrium (MFE) under standard assumptions and characterize the linear dynamics of the equilibrium MF trajectory. Finally, we prove that the policies prescribed by the MFE constitute an $\epsilon$ -Nash equilibrium for the finite population game and provide simulations to illustrate the performance of the equilibrium control policy.

The paper is organized as follows. Following this introduction, Section II introduces the finite-agent game formulation of the multi-agent system and the underlying information structures of each of the its entities (see Fig. 1). In Section III, we formulate the LQ-MFG problem, characterize its MFE and demonstrate the $\epsilon$ -Nash property of the MFE. In Section IV, we provide simulations to analyze the performance of the MFE and conclude the paper in Section V sharing some highlights.

Notations: Let $X_{k}^{i}$ denote the $i^{th}$ agent’s state at time instance $k$ and $X_{k:k^{\prime}}^{i}$ the $i^{th}$ agent’s state history from instant $k$ to $k^{\prime}$ , i.e., $X_{k:k^{\prime}}^{i}=(X_{k}^{i},\cdots,X_{k^{\prime}}^{i})$ . Let the set of non-negative integers and real numbers be denoted by $\mathbb{Z}^{+}$ and $\mathbb{R}^{+}$ , respectively. The transpose of matrix $A$ is denoted by $A^{\prime}$ and trace of a square matrix $M$ by $Tr\{M\}$ . For some vector $z$ and positive semi-definite matrix $S$ , let $\|z\|^{2}_{S}=z^{\prime}Sz$ . Unless stated otherwise, let $\|\cdot\|$ denote the 2-norm.

II Problem Formulation

Consider an $N$ -player game on infinite time horizon. Each agent’s dynamics evolves according to a linear discrete-time controlled stochastic process as

\displaystyle X_{k+1}^{i}=A(\phi_{i})X_{k}^{i}+B(\phi_{i})U_{k}^{i}+W_{k}^{i},~{}~{}i\in[1,N],

(1)

where $X_{k}^{i}\in\mathbb{R}^{n}$ and $U_{k}^{i}\in\mathbb{R}^{m}$ are the state process and the control input, respectively, for the $i^{th}$ agent. $W_{k}^{i}\in\mathbb{R}^{n}$ is an i.i.d. Gaussian process with zero mean and finite covariance $\Sigma_{w}$ . The initial state $X_{0}^{i}$ has mean $\nu_{\phi_{i},0}$ and covariance $\Sigma_{x}$ , and is assumed to be statistically independent of $W_{k}^{i}$ , $\forall k\in\mathbb{Z}^{+}$ . All covariance matrices are assumed to be positive definite. $A(\phi_{i})$ and $B(\phi_{i})$ are constant matrices with appropriate dimensions. $\phi_{i}$ denotes the type of the $i^{th}$ agent drawn from a finite set $\Phi:=\{\phi_{1},\cdots,\phi_{m}\}$ and is chosen according to the empirical distribution

\displaystyle F_{N}(\phi)=\frac{1}{N}\sum_{i=1}^{N}\mathbb{I}_{\{\phi_{i}\leq\phi\}},~{}~{}\phi\in\Phi,

(2)

where $\mathbb{I}_{\{\cdot\}}$ is the indicator function. It is further assumed that $\lim\limits_{N\rightarrow\infty}{F_{N}(\phi)}=F(\phi)$ weakly, for some probability distribution $F(\phi)$ over the support of $\Phi$ , with corresponding probability mass functions $P_{N}(\phi)$ and $P(\phi)$ , respectively.

To complete the problem formulation, we define the information structure on each of the blocks in Fig. 1. Such information structures are standard and appear in applications like industrial and process control [1] and wireless sensor networks [6].

Entity	Information State	Information Space	Input-output Map
Scheduler	$I^{i,sc}_{k}\triangleq\left(X_{0:k}^{i},Y_{0:k-1}^{i}\right)$	$\mathcal{I}^{i,sc}_{k}$	$\xi_{k}^{i}:~{}I^{i,sc}_{k}\mapsto\gamma_{k}$
Encoder	$I_{k}^{i,\epsilon}\triangleq\left(X_{0:k}^{i},U_{0:k-1}^{i},c_{\mathcal{K}(k-1)}^{i},d_{\mathcal{K}(k-1)}^{i},Y_{0:k-1}^{i}\right)$	$\mathcal{I}_{k}^{i,\epsilon}$	$\mathscr{E}_{k}^{i}:~{}I_{k}^{i,\epsilon}\mapsto c_{k}^{i}.$
Decoder	$I_{k}^{i,d}\triangleq\left(Y_{0:k-1}^{i},U_{0:k-1}^{i},d_{\mathcal{K}(k)}^{i}\right)$	$\mathcal{I}_{k}^{i,d}$	$\mathscr{D}_{k}^{i}:~{}I_{k}^{i,d}\mapsto Y_{k}^{i}$
Controller	$I_{k}^{i,\pi}\triangleq\left(U_{0:k-1}^{i},Y_{0:k}^{i}\right)$	$\mathcal{I}_{k}^{i,\pi}$	$\pi_{k}^{i}:~{}I_{k}^{i,\pi}\mapsto U_{k}^{i}$

TABLE I: Dictionary of Information states of entities in Fig. 1

First, we define a transmission time as an instant when information is sent over the channel. Let the history of transmission times up to the current instant $k$ ( $k>0$ ) be denoted by the set $\mathcal{K}(k):=\{l\mid l\leq k,\gamma_{l}=1\}$ , where $\gamma_{l}$ denotes the transmission instant as formalized in the next paragraph. By convention, we take $\mathcal{K}(0)=\{0\}$ .

Figure 1: Closed-loop information flow for the

i^{th}

agent

The information states, information spaces, and the input-output maps defining the scheduler, encoder, decoder and controller are as defined in Table I. The scheduler has access to the history of plant states and the decoded outputs, based on which, it decides the transmission times of plant state through the channel. The decision whether to transmit or not is taken based on an (innovations based) threshold scheduling policy (of the form $\delta_{k}^{\prime}S\delta_{k}\geq\alpha$ , where $\delta_{k}$ is the error between plant and decoder output at instant $k$ , to be defined later, $S>0$ is a user-defined constant positive definite matrix and $\alpha>0$ is the threshold parameter). Note that here, by innovations-based process, we mean, given a process $\{z_{k}\}$ , the innovation process $\{\tilde{z}_{k}\}$ contains new information not carried in the sequences $z_{k-1},z_{k-2},\cdots$ [20, Section 5.3]. We define $\gamma_{k}:=\xi_{k}^{i}\left(I^{i,sc}_{k}\right)$ , where $\gamma_{k}=1$ signifies that $k$ is a transmission instant and $\gamma_{k}=0$ signifies no transmissions ( $\varphi$ ).

Next, the encoder transmits encoded state information ( $c_{k}^{i}$ ) at transmission times over the channel. The signal $\bar{c}_{k}^{i}$ in Figure 1 is given as:

\displaystyle\bar{c}_{k}^{i}=\begin{cases}c_{k}^{i},&\text{if}~{}~{}\gamma_{k}=1,\\ \varphi,&\text{if}~{}~{}\gamma_{k}=0.\end{cases}

The encoder is assumed to have full knowledge of the system as in Table I, which leads to better control performance compared to structures where only partial information is available [4] (cf. [4] for a detailed study on other encoder information structures and how they may be realized). Such a situation emerges when the encoder and the scheduler are collocated with the plant, and hence, can observe both its state as well as the control actions applied to it [4]. In addition, we assume that the encoder is predictive, i.e., it transmits over the channel, functions of the true state minus the decoder output at the same instant. Such encoders are used in practice while encoding sequential data [8].

It is imperative to note here that the above assumptions on the encoder and the scheduler information structures, as we prove in the next section, will entail no dual-effect of control, and hence, lead to a simple controller design. The dual effect of control refers to the dual role of the controller in the evolution of the system dynamics and to probe the scheduler and new measurements to reduce its uncertainty on the system state [10]. Finally, we note, in the table, $c_{\mathcal{K}(k)}^{i}:=\{c_{l}^{i}\mid l\leq k,\gamma_{l}=1\}$ and $d_{\mathcal{K}(k)}^{i}:=\{d_{l}^{i}\mid l\leq k,\gamma_{l}=1\}$ .

The encoded signal is sent over an AWGN channel, which is analog, memoryless and is modeled as:

\displaystyle d_{l}^{i}=c_{l}^{i}+v_{l}^{i},~{}l\in\mathcal{K}(k)

where $v_{l}^{i}$ is an i.i.d. zero mean Gaussian process with finite positive-definite covariance $\Sigma_{v}$ and represents the channel noise. The input and output alphabets of the channel lie in $\mathbb{R}^{n}$ . The signal $\bar{d}_{k}^{i}$ in Figure 1 is given as:

\displaystyle\bar{d}_{k}^{i}=\begin{cases}d_{k}^{i},&\text{if}~{}~{}\gamma_{k}=1,\\ \varphi,&\text{if}~{}~{}\gamma_{k}=0.\end{cases}

Next, the decoder at the controller end serves two purposes. Firstly, it decodes the noisy channel output to produce a MMSE estimate [8] of the input signal whenever new information is received via the channel. Secondly, between transmission times, it calculates a recursive estimate of the plant state to send information to the controller at all times $k$ . Thus, the complete decoder mapping is given by

\displaystyle Y_{k}^{i}=\begin{cases}\mathbb{E}\{X_{k}^{i}|I_{k}^{i,d}\},&\text{if}~{}~{}\gamma_{k}=1,\\ AY_{k-1}^{i}+BU_{k-1}^{i},&\text{if}~{}~{}\gamma_{k}=0,\end{cases}

(3)

where $AY_{k-1}^{i}+BU_{k-1}^{i}$ is the recursive estimate calculated by the decoder between transmission instants, and $Y_{k}^{i}$ is the input to the controller.

Finally, the controller calculates control actions by minimizing an infinite-horizon average cost function

		$\displaystyle J_{i}^{N}(\pi^{i},\pi^{-i}):=$		(4)
		$\displaystyle\limsup_{T\rightarrow\infty}{\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\\|X_{k}^{i}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j}}\\|^{2}_{Q(\phi_{i})}+\\|U_{k}^{i}\\|^{2}_{R(\phi_{i})}\right\}}$

where $Q(\phi_{i})\geq 0$ , $R(\phi_{i})>0$ and the parameter $\phi_{i}\in\Phi$ determines the tuple $(A(\phi),B(\phi),Q(\phi),R(\phi))$ for each agent. Further, $\pi^{-i}:=(\pi^{1},\cdots,\pi^{i-1},\pi^{i+1},\cdots\pi^{N})$ , where $\pi^{i}_{k}$ is as defined in Table I. The control law for agent $i$ over the sequence of deterministic control policies $\pi^{i}:=(\pi_{0}^{i},~{}\pi_{1}^{i},\cdots,)\in\bar{\mathcal{M}}_{i}$ and $\bar{\mathcal{M}}_{i}$ is the space of admissible decentralized control laws. The coupling between agents enters via the consensus term $\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j}}$ in the objective. Further, the cost incorporates a soft constraint on the control actions alongside penalizing state deviations from the consensus term, which each agent aims to track. Finally, the expectation in (4) is taken with respect to noise statistics and the initial state distribution.

Now, that the problem description is complete, the aim is to design a decentralized control policy for each agent in (1) minimizing its local objective (4), which is done using the MFG framework in the next section.

III Mean-Field Games

In this section, we solve the problem for the $N$ -player system (1) with objective (4) by considering the limiting case as $N\rightarrow\infty$ . In this setting, the consensus term in (4) can be approximated by a known deterministic sequence (also termed the mean-field trajectory) following the Nash Certainty Equivalence Principle [12]. This reduces the problem to a tracking control problem and a consistency condition as shown later. First, we obtain the solution (to a fully observed tracking problem constructed from a partially observed one) for this infinite agent (mean-field) system. This solution, called the MFE, consists of computing an equilibrium control policy and the equilibrium MF trajectory. Finally, we demonstrate its $\epsilon$ –Nash property.

III-A Optimal Tracking Control

Consider a generic agent (from an infinite population system) of type $\phi$ with dynamics

\displaystyle X_{k+1}=A(\phi)X_{k}+B(\phi)U_{k}+W_{k},~{}k\in\mathbb{Z}^{+}

(5)

where $X_{k}\in\mathbb{R}^{n}$ and $U_{k}\in\mathbb{R}^{m}$ are the state process and the control input, respectively. The initial state $X_{0}$ has mean $\nu_{\phi,0}$ and finite positive-definite covariance $\Sigma_{x}$ . Further, $W_{k}\in\mathbb{R}^{n}$ is an i.i.d. Gaussian process with zero mean and finite positive-definite covariance $\Sigma$ . Let us denote the generic agent’s controller information space at time $k$ as $\mathcal{I}_{k}^{\mu}$ . Then, its information state at any time $k$ is $I_{k}^{\mu}\triangleq(U_{0:k-1},Y_{0:k})\in\mathcal{I}_{k}^{\mu}$ . Let us define the map $\mu_{k}:~{}\mathcal{I}_{k}^{\mu}\rightarrow\mathcal{U},$ or more specifically, $\mu_{k}$ maps $I_{k}^{\mu}$ to $U_{k}$ . The control law can then be given as $\mu:=(\mu_{0},\mu_{1}\cdots,)\in\mathcal{M}$ , where $\mathcal{M}$ denotes the admissible class of control laws. The objective function for the generic agent can be given as

		$\displaystyle J(\mu,\bar{X})$		(6)
		$\displaystyle:=\limsup_{T\rightarrow\infty}{\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\\|X_{k}-\bar{X}_{k}\\|^{2}_{Q(\phi)}+\\|U_{k}\\|^{2}_{R(\phi)}\right\}}$

where the expectation is taken over the noise statistics, initial state and the joint laws $\mu$ . In addition, $\bar{X}=(\bar{X}_{1},\bar{X}_{2},\cdots)$ is the MF trajectory, which represents the infinite player approximation to the coupling term in (4). The introduction of this term leads to indistinguishability between the agents, thereby making the effect of state deviations of individual agents negligible. Consequently, the game problem reduces to a stochastic optimal control problem for the generic agent followed by a consistency condition, whose solution is given by the MFE. Before we formally define the MFE, we state the following assumption on the mean-field system.

Assumption 1.

(i)

The pair $(A(\phi),B(\phi))$ is controllable and $(A(\phi),{Q(\phi)}^{\frac{1}{2}})$ is observable.
(ii)

The MF trajectory $\bar{X}\in\mathcal{X}$ , where $\mathcal{X}:=\{\bar{X}_{k}\in\mathbb{R}^{n}:\|\bar{X}\|_{\infty}:=\sup_{k\geq 0}{\|\bar{X}_{k}\|}<\infty\}$ is the space of bounded vector-valued functions.

Now, to define an MFE, we introduce two operators [16]:

1.

$\Psi:\mathcal{X}\rightarrow\mathcal{M}$ given by $\Psi(\bar{X})=\operatorname*{argmin}_{\mu\in\mathcal{M}}J(\mu,\bar{X})$ , which outputs the optimal control policy minimizing the cost (6), and
2.

$\Lambda:\mathcal{M\rightarrow\mathcal{X}}$ given by $\Lambda(\mu)=\bar{X}$ , also called the consistency operator, which generates a MF trajectory consistent with the optimal policy $\Psi(\bar{X})$ as obtained above.

Then, we have the following definition.

Definition 1.

(Mean-field equilibrium [16]) The pair $(\mu^{*},\bar{X}^{*})\in\mathcal{M}\times\mathcal{X}$ is an MFE if $\mu^{*}=\Psi(\bar{X}^{*})$ and $\bar{X}^{*}=\Lambda(\mu^{*})$ . More precisely, $\bar{X}^{*}$ is a fixed point of the map $\Lambda\circ\Psi$ .

The trajectory $\bar{X}^{*}$ is the MF trajectory at equilibrium with $\mu^{*}$ as the equilibrium control policy. The aim now is to design an optimal tracking control policy for (5) minimizing (6) under the information structure discussed above.

Before proceeding, we point out that with some abuse of notation, we retain the same notations (as in Figure 1) of a generic agent’s control system except by removing the superscript $i$ . We now construct the fully observed problem (in Proposition 1 below) using decoder estimates ( $Y_{k}$ ) from the noisy observations output from the channel. This will follow from two results, namely: the control policy is free of dual-effect [19, 21], and the optimal control policy is certainty equivalent, as we prove next. Define the error between the plant and the decoder output as:

\displaystyle\bar{e}_{k}:=X_{k}-Y_{k}=\left\{\begin{array}[]{ll}e_{k},&\gamma_{k}=1\\ \delta_{k},&\gamma_{k}=0\end{array},\right.

(9)

where from (3), we let $e_{k}:=X_{k}-\mathbb{E}\{X_{k}|I_{k}^{d},d_{k}\}$ and $\delta_{k}:=X_{k}-AY_{k-1}-BU_{k-1}$ , as the error at the transmission and non-transmission times, respectively. Then, we have the following lemma.

Lemma 1.

Consider the system (5) for the generic agent under the information structure as in the previous section. Then:

(i)

The relative error $\bar{e}_{k}$ is independent of all control choices for all $k$ .
(ii)

The expected value of $\bar{e}_{k}$ and the conditional correlation between $Y_{k}$ and $\bar{e}_{k}$ given $\mathcal{F}^{d}_{k}$ , are both zero.

Consequently, the control is free of dual-effect.

Proof.

We prove the Lemma in two parts, first for transmission and then for non-transmission times.

Part 1: Consider a transmission instant $l_{r}\in\mathcal{K}(k)$ . Then, we have

	$\displaystyle\bar{e}_{l_{r+1}}=e_{l_{r+1}}$	$\displaystyle=X_{l_{r+1}}-Y_{l_{r+1}},$
		$\displaystyle=X_{l_{r+1}}-\mathbb{E}\left\{X_{l_{r+1}}\|I^{d}_{l_{r+1}}\right\},$
		$\displaystyle=A(\phi)e_{l_{r+1}-1}+W_{l_{r+1}-1}$
		$\displaystyle\qquad-\mathbb{E}\left\{A(\phi)e_{l_{r+1}-1}+W_{l_{r+1}-1}\|I^{d}_{l_{r+1}}\right\}.$

Using simple manipulations, the error at the $l_{r+1}-$ th transmission time can be recursively in terms of the preceding one as

	$\displaystyle e_{l_{r+1}}$	$\displaystyle=\eta_{1}(\phi)e_{l_{r}}+\eta_{2}(\phi)W_{l_{r}:l_{r+1}-1}$
		$\displaystyle-\mathbb{E}\left\{\eta_{1}(\phi)e_{l_{r}}+\eta_{2}(\phi)W_{l_{r}:l_{r+1}-1}\|I^{d}_{l_{r+1}}\right\},$		(10)

for matrices $\eta_{1}(\phi)$ and $\eta_{2}(\phi)$ of appropriate dimensions.

Now, we prove that $e_{l_{r}}$ is independent of control actions by induction on the parameter $l_{r}$ . Note that the first event time is assumed to be $l_{0}=0$ . Fix $Y_{l_{0}}=0$ . Then, $e_{l_{0}}=X_{l_{0}}$ is independent of all control actions. Now, assume that $e_{l_{r}}$ is independent of all controls. Now, by assumption, we have that the process noise is independent of controls, $\forall l_{r}$ . Consequently, from (III-A) and the induction hypothesis, we have that $e_{l_{r+1}}$ is independent of all controls. Finally, by the principle of mathematical induction, (i) holds for Part 1.

Next, since $e_{l_{r}}=X_{l_{r}}-\mathbb{E}\{X_{l_{r}}|I^{d}_{l_{r}}\}$ , we have $\mathbb{E}\{e_{l_{r}}|I^{d}_{l_{r}}\}=0.$ Also, $\mathbb{E}\{Y_{l_{r}}e_{l_{r}}^{\prime}|I^{d}_{l_{r}}\}=\mathbb{E}\{Y_{l_{r}}e_{l_{r}}^{\prime}|I^{d}_{l_{r}},Y_{l_{r}}\}=Y_{l_{r}}\mathbb{E}\{e_{l_{r}}^{\prime}|I^{d}_{l_{r}}\}=0,$ where the second equality follows since $Y_{l_{r}}=\mathscr{D}_{l_{r}}(I^{d}_{l_{r}})$ is $\sigma\left(I^{d}_{l_{r}}\right)$ -measurable (the sigma-algebra generated by $I^{d}_{l_{r}}$ ). Hence, (ii) holds.

Part 2: Now, we prove (i) and (ii) for the non-transmission times. Suppose $l_{r}$ and $l_{r+1}$ are some consecutive transmission times. Then, for any $k\in(l_{r},l_{r+1})$ , $\bar{e}_{k}=\delta_{k}=\bar{\eta}_{1}(\phi)e_{l_{r}}+\bar{\eta}_{2}(\phi)W_{l_{r}:k-1}$ , for appropriate matrices $\bar{\eta}_{1}(\phi)$ and $\bar{\eta}_{2}(\phi)$ . Then, by Part 1, (i) holds for all non-transmission times.

Next, we prove (ii). It is easy to see that the information state of the decoder is updated with new information only at the transmission instants. More specifically, any estimate between two transmission times $l_{r}$ and $l_{r+1}$ , for some $r$ , can be recovered from its information state $I^{d}_{l_{r}}$ . Thus,

	$\displaystyle\mathbb{E}\left\{\delta_{k}\|I^{d}_{k}\right\}$	$\displaystyle=\mathbb{E}\left\{\bar{\eta}_{1}(\phi)e_{l_{r}}+\bar{\eta}_{2}(\phi)W_{l_{r}:k-1}\|I^{d}_{k}\right\}$
		$\displaystyle=\mathbb{E}\left\{\bar{\eta}_{1}(\phi)e_{l_{r}}+\bar{\eta}_{2}(\phi)W_{l_{r}:k-1}\|I^{d}_{l_{r}}\right\}$
		$\displaystyle=\mathbb{E}\left\{\bar{\eta}_{1}(\phi)e_{l_{r}}\|I^{d}_{l_{r}}\right\}+\mathbb{E}\left\{\bar{\eta}_{2}(\phi)W_{l_{r}:k-1}\|I^{d}_{l_{r}}\right\}$
		$\displaystyle=0.$

The last equality follows from Part 1 and the fact that $W_{k}$ is an independent zero mean process. Finally, $\mathbb{E}[Y_{k}\delta_{k}^{\prime}|I^{d}_{k}]=0$ follows in the same manner as in Part 1. Thus (ii) holds and the proof is complete. ∎

Note that the proof of Lemma 1 is made possible due to the information structure of the scheduler, encoder and decoder. Since the information maps of the scheduler and the controller entail partially nested $\sigma$ -algebras, the scheduler is able to recover the controller output information at its own end. This, in addition to the deterministic nature of control policies, allows the scheduler to compute the scheduling policy based on innovations and consequently take complete authority over transmission of new information. As a result, the control is dual-effect free.

Next, we prove that due to the no dual-effect, the separation principle holds for the underlying tracking problem. Consequently, the optimal control law under the information structure in Table I, is certainty equivalent [1].

Theorem 1.

Consider the information structure on the generic agent as in Table I. Then, the control design problem separates into designing a state decoder and a certainty equivalent controller.

Proof.

Consider the cost-to-go as follows:

		$\displaystyle\mathbb{E}\{(X_{k}-\bar{X}_{k})^{\prime}Q(\phi)(X_{k}-\bar{X}_{k})+U_{k}^{\prime}R(\phi)U_{k}\|I_{k}^{\mu}\}$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left\{(Y_{k}-\bar{X}_{k}+\bar{e}_{k})^{\prime}Q(\phi)(Y_{k}-\bar{X}_{k}+\bar{e}_{k})\right.$
		$\displaystyle\left.+U_{k}^{\prime}R(\phi)U_{k}\|I_{k}^{\mu}\right\},$
	$\displaystyle=$	$\displaystyle(Y_{k}-\bar{X}_{k})^{\prime}Q(\phi)(Y_{k}-\bar{X}_{k})+\mathbb{E}\{(Y_{k}-\bar{X}_{k})^{\prime}Q(\phi)e_{k}\|I_{k}^{\mu}\}$
		$\displaystyle+\mathbb{E}\{\bar{e}_{k}^{\prime}Q(\phi)(Y_{k}-\bar{X}_{k})\|I_{k}^{\mu}\}+\mathbb{E}\{\bar{e}_{k}^{\prime}Q(\phi)\bar{e}_{k}\|I_{k}^{\mu}\}$
		$\displaystyle+\mathbb{E}\{U_{k}^{\prime}R(\phi)U_{k}\|I_{k}^{\mu}\},$
	$\displaystyle=$	$\displaystyle(Y_{k}-\bar{X}_{k})^{\prime}Q(\phi)(Y_{k}-\bar{X}_{k})+\mathbb{E}\{\bar{e}_{k}^{\prime}Q(\phi)\bar{e}_{k}\|I_{k}^{\mu}\}$
		$\displaystyle+\mathbb{E}\{U_{k}^{\prime}R(\phi)U_{k}\|I_{k}^{\mu}\},$
	$\displaystyle=$	$\displaystyle(Y_{k}-\bar{X}_{k})^{\prime}Q(\phi)(Y_{k}-\bar{X}_{k})+\mathbb{E}\{U_{k}^{\prime}R(\phi)U_{k}\|I_{k}^{\mu}\}$
		$\displaystyle+Tr\{Q(\phi)\Delta_{k}\},$

where $\Delta_{k}=\left\{\begin{array}[]{ll}\mathbb{E}\{e_{k}e_{k}^{\prime}|Y_{0:k}\},&\gamma_{k}=1\\ \mathbb{E}\{\delta_{k}\delta_{k}^{\prime}|Y_{0:k}\},&\gamma_{k}=0\end{array}.\right.$ The third equality follows from Lemma 1 and the fact that $\bar{X}_{k}$ is deterministic and independent of $\bar{e}_{k}$ . The last inequality follows again since the relative error is independent of control actions by Lemma 1. Then, since the error-induced term $Tr\{Q(\phi)\Delta_{k}\}$ is independent of controls, the proof is complete. ∎

Certainty equivalence of the optimal control policy is a consequence of no dual-effect [19, 21] of control as in Lemma 1. When the control is free of dual-effect, the covariance of the estimation error is independent of the control signals used. Thus, the controller can no longer benefit from probing the scheduler for information and can be designed independently of the scheduler and the decoder. We are now ready to state the following Proposition.

Proposition 1 (Separated Stochastic Optimal Control Problem).

Using Theorem 1, the fully observed system can be constructed using the partially observed one (due to the presence of noisy channel) as:

\displaystyle Y_{k+1}=\left\{\begin{array}[]{ll}A(\phi)Y_{k}+B(\phi)U_{k}+\bar{W}_{k},&\gamma_{k+1}=1\\ A(\phi)Y_{k}+B(\phi)U_{k},&\gamma_{k+1}=0\end{array},\right.

(13)

where $\bar{W}_{k}=A(\phi)\bar{e}_{k}+W_{k}-\bar{e}_{k+1}$ , with the associated cost-to-go $(Y_{k}-\bar{X}_{k})^{\prime}Q(\phi)(Y_{k}-\bar{X}_{k})+\mathbb{E}\{U_{k}^{\prime}R(\phi)U_{k}|I_{k}^{\mu}\}+Tr\{Q(\phi)\Delta_{k}\}.$ Then, under Assumption 1,

(i)

The optimal control policy for the separated problem (13) is given as

\displaystyle U^{*}_{k}

\displaystyle=-\Pi(\phi)Y_{k}-\Gamma(\phi)g_{k+1}

(14)

where $\Gamma(\phi)=(R(\phi)+B(\phi)^{\prime}K(\phi)B(\phi))^{-1}B(\phi)^{\prime}$ , $\Pi(\phi)=\Gamma(\phi)K(\phi)A(\phi)$ . Further, $K(\phi)$ is the unique positive definite solution to the algebraic Riccati equation

	$\displaystyle K(\phi)=A(\phi)^{\prime}K(\phi)A(\phi)-$	$\displaystyle A(\phi)^{\prime}K(\phi)^{\prime}B(\phi)\Pi(\phi)$
		$\displaystyle+Q(\phi)$		(15)

and the trajectory $g_{k}$ is given as

\displaystyle g_{k}=H(\phi)^{\prime}g_{k+1}-Q(\phi)\bar{X}_{k},

(16)

where $H(\phi)^{\prime}:=A(\phi)^{\prime}[I-K(\phi)B(\phi)\Gamma(\phi)]$ is Hurwitz.

Proof.

The separated problem follows from (5) and (9) and Theorem 1. The rest of the proof follows from [22] and [17] and is thus omitted. ∎

III-B MFE Analysis

Using Proposition 1, we now prove the existence and uniqueness of the MFE by introducing an operator $\mathcal{T}$ as shown in this section. First note from (16) that $H(\phi)=A(\phi)-B(\phi)\Pi(\phi)$ . Then, substituting (14) in (13), we arrive at the closed-loop system as:

\displaystyle Y_{k+1}=\left\{\begin{array}[]{ll}H(\phi)Y_{k}-B(\phi)\Gamma(\phi)g_{k+1}+\bar{W}_{k},&\gamma_{k+1}=1\\ H(\phi)Y_{k}-B(\phi)\Gamma(\phi)g_{k+1},&\gamma_{k+1}=0\end{array}.\right.

Then, using $\bar{W}_{k}$ from Proposition 1 and (9), we can rewrite the above closed-loop system as

\displaystyle X_{k+1}

\displaystyle=H(\phi)X_{k}-B(\phi)\Gamma(\phi)g_{k+1}+B(\phi)\Pi(\phi)\bar{e}_{k}+W_{k}.

(17)

It is to be noted that $g_{k}\in\mathcal{X}$ , if $g_{0}=-\sum_{j=0}^{\infty}{(H^{j}(\phi))^{\prime}}Q(\phi)\bar{X}_{j}$ , which further gives $g_{k}=-\sum_{j=k}^{\infty}{(H^{j-k}(\phi))^{\prime}}Q(\phi)\bar{X}_{j}$ . Substituting this value of $g_{k}$ in (17), we get

	$\displaystyle X_{k+1}$	$\displaystyle=H(\phi)X_{k}+B(\phi)\Gamma(\phi)\sum_{j=k+1}^{\infty}{(H^{j-k-1}(\phi))^{\prime}}Q(\phi)\bar{X}_{j}$
		$\displaystyle+B(\phi)\Pi(\phi)\bar{e}_{k}+W_{k}.$

Now, taking expectation on both sides and denoting $\hat{X}_{k}(\phi)=\mathbb{E}\{X_{k}\}$ as the aggregate trajectory of agents of type $\phi$ , we get

		$\displaystyle\hat{X}_{k+1}(\phi)=$		(18)
		$\displaystyle H(\phi)\hat{X}_{k}(\phi)+B(\phi)\Gamma(\phi)\sum_{j=k+1}^{\infty}{(H^{j-k-1}(\phi))^{\prime}Q(\phi)\bar{X}_{j}},$

where we use the tower property of conditional expectation and Lemma 1, to get $\mathbb{E}\{\bar{e}_{k}\}=\mathbb{E}\{\mathbb{E}\{\bar{e}_{k}|I_{k}^{d}\}\}=0$ . Finally, (18) can be simplified further as:

	$\displaystyle\hat{X}_{k}(\phi)=H^{k}(\phi)\nu_{\phi,0}$		(19)
	$\displaystyle+\sum_{j=0}^{k-1}{H^{k-j-1}(\phi)B(\phi)\Gamma(\phi)\sum_{s=j+1}^{\infty}(H^{s-j-1}(\phi))^{\prime}Q(\phi)\bar{X}_{s}}.$

Now, using the emprical distribution (2), define the operator $\mathcal{T}$ as:

\displaystyle\mathcal{T}(\bar{X})(k):=\sum_{\phi\in\Phi}{\hat{X}_{k}(\phi)P(\phi)},

(20)

where $\mathcal{T}(\bar{X})(\cdot)$ maps the input sequence to another sequence at time $k$ . Using this operator, we prove existence and uniqueness of the equilibrium MF trajectory by finding the fixed point of (20), under the following assumption.

Assumption 2.

We assume $\Xi:=\|H(\phi)\|+\zeta<1,\forall\phi$ , where $\zeta:=\sum_{\phi\in\Phi}{\frac{\|Q(\phi)\|\|B(\phi)\Gamma(\phi)\|}{(1-\|H(\phi)\|)^{2}}P(\phi)}$ .

Assumption 2 is motivated by results from existing literature [13, 17, 16]. While it is stronger than the ones in [15, 17], it entails linear MF trajectory dynamics, which is easily tractable.

Theorem 2.

Under Assumptions 1-2, the following hold true:

(i)

The operator $\mathcal{T}(\bar{X})\in\mathcal{X},~{}~{}\forall\bar{X}\in\mathcal{X}$ . Furthermore, there exists unique $\bar{X}^{*}\in\mathcal{X}$ such that $\mathcal{T}(\bar{X}^{*})=\bar{X}^{*}$ .
(ii)

$\bar{X}_{k}^{*}$ follows linear dynamics, i.e., $\exists~{}L^{*}\in\mathcal{L}:=\{L\in\mathbb{R}^{n\times n}:~{}\|L\|\leq 1,\bar{X}_{k+1}^{*}=L\bar{X}_{k}^{*}\}$ , where $\bar{X}^{*}_{k}$ is the aggregate trajectory of the agents at equilibrium, and $\bar{X}_{0}^{*}=\sum_{\phi\in\Phi}{\nu_{\phi,0}P(\phi)}$ .

Proof.

(i)

Consider the linear system (18) with driving input $\bar{X}_{k}$ , which is bounded since $\bar{X}\in\mathcal{X}.$ Since $\|H(\phi)\|<1$ by Assumption 2, and $g_{k}\in\mathcal{X},~{}\forall k$ , we have from (18) and (20), that $\sup_{k\geq 0}{\|\mathcal{T}(\bar{X})(k)\|}<\infty$ , which proves the first statement in part (i). Next, consider the following:

	$\displaystyle\\|\mathcal{T}(\bar{X}_{1})-\mathcal{T}(\bar{X}_{2})\\|_{\infty}=\\|\sum_{\phi\in\Phi}({\hat{X}_{1}-\hat{X}_{2})P(\phi)}\\|_{\infty}$
	$\displaystyle\leq\sum_{\phi\in\Phi}{(\\|Q(\phi)\\|\\|B(\phi)\Gamma(\phi)\\|\left(\sum_{s=0}^{\infty}\\|H(\phi)\\|^{s}\right)^{2}P(\phi)}$
	$\displaystyle\qquad\times\\|\bar{X}_{1}-\bar{X}_{2}\\|_{\infty}$
	$\displaystyle=\zeta\\|\bar{X}_{1}-\bar{X}_{2}\\|_{\infty},$

where the last equality follows from Assumption 2. Finally, using Banach’s fixed point theorem and the first statement of part (i), $\mathcal{T}(\bar{X})$ has a unique fixed point in $\mathcal{X}$ .

(ii)

Define the operator $\bar{\mathcal{T}}:\mathbb{R}^{n\times n}\rightarrow\mathbb{R}^{n\times n}$ , given as:

	$\displaystyle\bar{\mathcal{T}}_{\phi}(L)$	$\displaystyle:=H(\phi)+B(\phi)\Gamma(\phi)\sum_{\alpha=0}^{\infty}{(H^{\alpha}(\phi))^{\prime}Q(\phi)L^{\alpha+1}},$
	$\displaystyle\bar{\mathcal{T}}(L)$	$\displaystyle:=\sum_{\phi\in\Phi}\bar{\mathcal{T}}_{\phi}(L)P(\phi)$

with $L^{*}=\bar{\mathcal{T}}(L^{*})$ and $\hat{X}_{k+1}^{*}=L^{*}\hat{X}_{k}^{*}$ . To prove that such a $L^{*}$ indeed exists, we follow the same lines of proof as in [16] to arrive at

	$\displaystyle\\|\bar{\mathcal{T}}(L_{2})-\bar{\mathcal{T}}(L_{1})\\|$
	$\displaystyle\hskip 17.07182pt<\sum_{\phi\in\Phi}\frac{\\|B(\phi)\Gamma(\phi)\\|\\|Q\\|}{(1-\\|H(\phi)\\|)^{2}}\\|L_{2}-L_{1}\\|P(\phi),$

which, under Assumption 2, establishes that $\bar{\mathcal{T}}$ is a contraction. Using completeness of $\mathcal{L}$ and Banach’s fixed point theorem, we indeed have the existence of such an $L^{*}$ . Finally, from (i) above, we get that the unique MF trajectory $\bar{X}^{*}$ can be constructed recursively as $\bar{X}^{*}_{k}=(L^{*})^{k}\bar{X}^{*}_{0}$ with $\hat{X}_{0}=\nu_{\phi,0}$ .

∎

Theorem 2 (i) provides us with a unique MFE while the linearity of the MF trajectory in (ii) gives a control law which is linear in the state of the agent and the equilibrium trajectory [16]. This further makes the computation of this trajectory tractable which would otherwise have involved a non-causal infinite sum.

III-C $\epsilon$ -Nash Equilibrium

We now show that under the information structure in Table I, the MFE constitutes an $\epsilon$ -Nash equilibrium for the $N$ -player game. Before doing that, we first provide the definition of $\epsilon$ -Nash equilibrium for the discrete-time MFG under communication constraints. Let us denote the space of admissible centralized control policies for agent $i$ as $\bar{\mathcal{M}}_{i}^{c}$ , under a centralized information structure, where each agent is assumed to have access to other agents’ output histories. Then, we have the following:

Definition 2.

[17, 23] The set of control policies $\{\mu^{i}\in{\mathcal{M}}_{i}^{c},~{}i\in[1,N]\}$ constitute an $\epsilon$ -Nash equilibrium with respect to the cost functions $\{J_{i}^{N},~{}i\in[1,N]\}$ , if, for some $\epsilon>0$ ,

\displaystyle J_{i}^{N}(\mu^{i},\mu^{-i})-\epsilon\leq\inf_{\pi^{i}\in{\mathcal{M}_{i}^{c}}}{J_{i}^{N}(\pi^{i},\mu^{-i})}.

We start by proving that the mass behaviour in (4) converges to the equilibrium MF trajectory as in Theorem 2 (i). Note henceforth, signals superscripted by an asterisk ( $*$ ) will represent quantities in the equilibrium, e.g., $X^{i*}_{k}$ denotes the state of agent $i$ at time $k$ under equilibrium control policy.

Lemma 2.

Suppose Assumptions 1-2 hold and all the agents operate under the equilibrium control policy. Then, the coupling term in (4) converges (in the mean-square sense) to the equilibrium mean-field trajectory, i.e.,

\displaystyle\lim\limits_{N\rightarrow\infty}\limsup\limits_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\Bigg{\|}\frac{1}{N}\sum_{i=1}^{N}{X_{k}^{i*}}-\bar{X}_{k}^{*}\Bigg{\|}^{2}\right\}=0.

(21)

Proof.

First, consider the following:

	$\displaystyle\lim\limits_{N\rightarrow\infty}\limsup\limits_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\Bigg{\\|}\frac{1}{N}\sum_{i=1}^{N}{X_{k}^{i}}-\bar{X}_{k}^{}\Bigg{\\|}^{2}\right\}$
	$\displaystyle\leq\lim\limits_{N\rightarrow\infty}\limsup\limits_{T\rightarrow\infty}\frac{2}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\Bigg{\\|}\frac{1}{N}\sum_{i=1}^{N}{X_{k}^{i}}-{\hat{X}_{k}^{}(\phi_{i})}\Bigg{\\|}^{2}\right\}$
	$\displaystyle~{}~{}~{}~{}+2\lim\limits_{N\rightarrow\infty}\sup_{k\geq 0}\Bigg{\\|}\frac{1}{N}\sum_{i=1}^{N}{\hat{X}_{k}^{}(\phi_{i})}-\bar{X}^{}_{k}\Bigg{\\|}^{2}$

where the inequality follows from the identity $\|a+b\|^{2}\leq 2\|a\|^{2}+2\|b\|^{2}$ and $\frac{1}{N}\sum_{i=1}^{N}{\hat{X}_{k}^{*}(\phi_{i})}=\sum_{\phi_{i}\in\Phi}{\hat{X}^{*}_{k}(\phi_{i})P_{N}(\phi_{i})},\forall k$ . We now prove that the first term in the above inequality vanishes. Let $Z_{k}^{i*}=X_{k}^{i*}-\hat{X}_{k}^{i*}$ . Then, using (17), we have

\displaystyle Z_{k+1}^{i*}=H(\phi_{i})Z_{k}^{i*}+W_{k}^{i}+B(\phi_{i})\Pi(\phi_{i})\bar{e}_{k}^{i}.

Then, as in [17, Lemma 2], $\exists~{}T_{1}>0$ and $M_{1}>0,$ independent of $T$ and $N$ , such that $\frac{2}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\|\frac{1}{N}\sum_{i=1}^{N}{Z_{k}^{i*}}\|^{2}\right\}\leq\frac{M_{1}}{N},~{}\forall T>T_{1}$ , which implies that $\limsup\limits_{T\rightarrow\infty}\frac{2}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\|\frac{1}{N}\sum_{i=1}^{N}{Z_{k}^{i*}}\|^{2}\right\}\leq\frac{M_{1}}{N}$ . This finally gives $\lim\limits_{N\rightarrow\infty}\limsup\limits_{T\rightarrow\infty}\frac{2}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\|\frac{1}{N}\sum_{i=1}^{N}{Z_{k}^{i*}}\|^{2}\right\}=0.$ Next, since the support of $\Phi$ is finite (and hence compact), this implies that $\hat{X}^{*}_{k}(\phi_{i})$ is uniformly bounded for all $k$ . Further, since $\bar{X}^{*}\in\mathcal{X}$ from Theorem 2, we have $\lim\limits_{N\rightarrow\infty}\sup_{k\geq 0}\|\frac{1}{N}\sum_{i=1}^{N}{\hat{X}_{k}^{*}(\phi_{i})}-\bar{X}^{*}_{k}\|^{2}=0$ . Thus, (21) holds and the proof is complete. ∎

Now, we prove the $\epsilon$ -Nash property of the MFE. Toward that end, consider the following cost functions:

	$\displaystyle J_{i}^{N}(\mu^{i},\mu^{-i})=$		(22)
	$\displaystyle\limsup_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\Bigg{\\|}X_{k}^{i}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j}}\Bigg{\\|}^{2}_{Q}+\\|U_{k}^{i*}\\|^{2}_{R}\right\},$
	$\displaystyle J(\mu^{i},\bar{X}^{})=$		(23)
	$\displaystyle\limsup_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\\|X_{k}^{i}-\bar{X}^{}\\|^{2}_{Q}+\\|U_{k}^{i*}\\|^{2}_{R}\right\},$
	$\displaystyle J_{i}^{N}(\pi^{i},\mu^{-i*})=$		(24)
	$\displaystyle\limsup_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\Bigg{\\|}X_{k}^{i,\pi^{i}}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j*}}\Bigg{\\|}^{2}_{Q}+\\|V_{k}^{i}\\|^{2}_{R}\right\},$

where $\bar{X}^{*}$ is the equilibrium MF trajectory (Theorem 2) and $X_{k}^{i,\pi^{i}}$ is the state of agent $i$ at time $k$ when it chooses a control law $\pi^{i}$ from the set of centralized policies ${\mathcal{M}}_{i}^{c}$ . Notice that this set is strictly larger than the set $\mathcal{M}$ . Furthermore, the control action $V_{k}^{i}$ is derived from $\pi^{i}\in{\mathcal{M}}_{i}^{c}$ . Now, we have the following theorem stating the $\epsilon$ -Nash result, i.e., that the control laws prescribed by the MFE are also $\epsilon$ -Nash in the finite population case.

Theorem 3.

Under the Assumptions 1-2, the set of $N$ decentralized control laws $\{\mu^{i*},~{}i\in[1,N]\}$ , where $\mu^{i*}=\mu^{*}$ , constitutes an $\epsilon$ -Nash equilibrium for the LQ-MFG with communication constrained AWGN channel, more precisely, we have

\displaystyle J_{i}^{N}(\mu^{i*},\mu^{-i*})

\displaystyle\leq\inf_{\pi^{i}\in{\mathcal{M}}_{i}^{c}}{J_{i}^{N}(\pi^{i},\mu^{-i*})}+\mathcal{O}\left(\limsup_{T\rightarrow\infty}\sqrt{\epsilon_{T}^{N}}\right),

(25)

where $\epsilon_{T}^{N}=\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}{\|\frac{1}{N}\sum\limits_{j=1}^{N}{X_{k}^{j*}}-\bar{X}_{k}^{*}\|^{2}}\right\}$ .

Proof.

We prove the theorem in two steps. In the first step, we derive an upper bound on $J_{i}^{N}(\mu^{i*},\mu^{-i*})-J(\mu^{i},\bar{X}^{*})$ , and in step 2, on $J(\mu^{i},\bar{X}^{*})-J_{i}^{N}(\pi^{i},\mu^{-i*})$ . Finally, we combine the two to get (25).

Step 1: Consider the following:

	$\displaystyle J_{i}^{N}(\mu^{i},\mu^{-i})-J(\mu^{i},\bar{X}^{})\leq J_{i}^{N}(\mu^{i},\mu^{-i})-J(\mu^{i},\bar{X}^{*})$
	$\displaystyle=\limsup_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\Bigg{\\|}X_{k}^{i}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j}}\Bigg{\\|}^{2}_{Q}\right.$
	$\displaystyle\left.~{}~{}~{}~{}-\\|X_{k}^{i}-\bar{X}_{k}^{}\\|^{2}_{Q}\right\},$
	$\displaystyle=\limsup_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\Bigg{\\|}\bar{X}_{k}^{}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j}}\Bigg{\\|}^{2}_{Q}\right.$
	$\displaystyle\left.~{}~{}~{}~{}+2\left(X_{k}^{i}-\bar{X}_{k}^{}\right)^{\prime}Q\left(\bar{X}_{k}^{}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j}}\right)\right\}$
	$\displaystyle\leq\limsup_{T\rightarrow\infty}\\|Q\\|\epsilon_{T}^{N}+2\\|Q\\|\mathbb{E}\left\{\sqrt{\frac{1}{T}\sum_{k=0}^{T-1}\\|X_{k}^{i}-\bar{X}_{k}^{}\\|^{2}}\right.$
	$\displaystyle\left.\qquad\qquad\times\sqrt{\frac{1}{T}\sum_{k=0}^{T-1}\Bigg{\\|}\bar{X}_{k}^{}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j}}\Bigg{\\|}^{2}}\right\}$
	$\displaystyle\leq\limsup_{T\rightarrow\infty}\\|Q\\|\epsilon_{T}^{N}$
	$\displaystyle+\limsup_{T\rightarrow\infty}2\\|Q\\|\sqrt{\epsilon_{T}^{N}\mathbb{E}\left\{\frac{1}{T}\sum_{k=0}^{T-1}\\|X_{k}^{i}-\bar{X}_{k}^{}\\|^{2}\right\}}$		(26)

where both the inequalities follow from the Cauchy-Schwarz inequality.

Step 2: Consider the following:

	$\displaystyle J_{i}^{N}(\pi^{i},\mu^{-i*})=$
	$\displaystyle\limsup_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\Bigg{\\|}X_{k}^{i,\pi^{i}}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j*}}\Bigg{\\|}^{2}_{Q}+\\|V_{k}^{i}\\|^{2}_{R}\right\}$

	$\displaystyle=\limsup_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\\|X_{k}^{i,\pi^{i}}-\bar{X}_{k}^{i*}\\|^{2}_{Q}+\\|V_{k}^{i}\\|^{2}_{R}\right.$
	$\displaystyle\left.\qquad\qquad+\Bigg{\\|}\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j}}-\bar{X}_{k}^{i}\Bigg{\\|}^{2}_{Q}\right.$
	$\displaystyle\left.+2\left(X_{k}^{i,\pi^{i}}-\bar{X}_{k}^{}\right)^{\prime}Q\left(\bar{X}_{k}^{}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j*}}\right)\right\}$
	$\displaystyle=J(\pi^{i},\bar{X}^{})-\limsup_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\Bigg{\\|}\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j}}-\bar{X}_{k}^{i*}\Bigg{\\|}^{2}_{Q}\right.$
	$\displaystyle\left.+2\left(\bar{X}_{k}^{}-X_{k}^{i,\pi^{i}}\right)^{\prime}Q\left(\bar{X}_{k}^{}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j*}}\right)\right\}$
	$\displaystyle\geq J(\mu^{i},\bar{X}^{})-\limsup_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\Bigg{\\|}\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j}}-\bar{X}_{k}^{i*}\Bigg{\\|}^{2}_{Q}\right.$
	$\displaystyle\left.+2\left(\bar{X}_{k}^{}-X_{k}^{i,\pi^{i}}\right)^{\prime}Q\left(\bar{X}_{k}^{}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j*,\pi^{i}}}\right)\right\}.$

Finally, using Cauchy-Schwarz inequality in the same manner as for (26), we get

		$\displaystyle J(\mu^{i},\bar{X}^{})-J_{i}^{N}(\pi^{i},\mu^{-i})\leq\limsup_{T\rightarrow\infty}\\|Q\\|\epsilon_{T}^{N}$
		$\displaystyle+\limsup_{T\rightarrow\infty}2\\|Q\\|\sqrt{\epsilon_{T}^{N}\mathbb{E}\left\{\frac{1}{T}\sum_{k=0}^{T-1}\\|X_{k}^{i,\pi^{i}}-\bar{X}_{k}^{*}\\|^{2}\right\}}.$		(27)

Similar to Theorem 3 in [17], there exist $M_{2},~{}T_{2}>0$ , such that $\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\|X_{k}^{i*}\|^{2}\right\}<M_{2},~{}\forall T>T_{2}$ . Further, from Theorem 2 (i), there exist $M_{3},~{}T_{3}>0$ , such that $\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\|\bar{X}_{k}^{i*}\|^{2}\right\}<M_{3},~{}\forall T>T_{3}$ . Finally, since $\bar{\mathcal{M}}_{i}\subseteq{\mathcal{M}}_{i}^{c}$ , $\inf_{\pi^{i}\in{\mathcal{M}}_{i}^{c}}J_{i}^{N}(\pi^{i},\mu^{-i*})\leq J_{i}^{N}(\mu^{i*},\mu^{-i*})$ , we may consider $\pi^{i}\in{\mathcal{M}}_{i}^{c}$ such that there exist $M_{4},~{}T_{4}>0$ , with the property that $\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\|X_{k}^{i,\pi^{i}}\|^{2}\right\}<M_{4},~{}\forall T>T_{4}$ . Choose $T_{5}=\max\{T_{1},T_{2},T_{3},T_{4}\}$ , and let $T>T_{5}$ , following which, we have (25) from (26) and (III-C). Finally, define $\epsilon:=\mathcal{O}\big{(}\limsup_{T\rightarrow\infty}\sqrt{\epsilon_{T}^{N}}\big{)}$ , which converges to 0 as $N\rightarrow\infty$ using Lemma 2. The proof is thus complete. ∎

Before concluding this section, we remark that according to Theorem 3, the decentralized equilibrium policy provides an $\epsilon$ -Nash equilibrium for the centralized policy structure in the $N$ -player game. Consequently, it also provides an $\epsilon$ -Nash equilibrium for the decentralized policy structure in the original $N$ -player game formulated in Section II.

IV Simulations

In this section we demonstrate the performance of MFE under different scheduling policies. We simulate a finite population game with scalar dynamics and a single type $\phi$ . The dynamics and cost parameters of the agents satisfy Assumptions 1 and 2. For Figure, 2(a) we simulate a game of $N=100$ agents and show that in spite of significant channel noise the estimation error decreases allowing the agents to form a consensus in a very short time. Note that the output does not perfectly mimic the true state due to the asynchronous nature of communication.

For Figure 2(b),we simulate the behavior of N=1000 agents and plot the average cost per agent on a logarithmic axis against scheduling threshold of $\alpha={0.0,2.0,4.0,6.0}$ . The figure shows a box plot depicting the median (red line) and spread (box) of the average cost per agent over $100$ runs for each value of $\alpha$ . The plot shows a clear increase in average cost per agent as $\alpha$ is increased, indicating that an increase in $\alpha$ leads to an increase in estimation error, which in turn causes a higher average cost per agent. This indicates that a compromise can be reached between performance (average cost) vs communication frequency through a judicious choice of the threshold parameter $\alpha$ .

V Conclusion

In this paper, we have studied LQ-MFGs under communication constraints, namely when there is intermittent communication over an AWGN channel. Under the defined information structure involving the scheduler, encoder, decoder and controller, we have proved that the control is free of the dual-effect in the mean field limit. Consequently, the optimal control policy has been shown to be certainty equivalent. Under appropriate assumptions, we have established the existence, uniqueness and characterization (linearity) of the mean-field trajectory, shown to have the $\epsilon$ -Nash property. We have also empirically demonstrated that the performance of the equilibrium policies deteriorates for decreasing communication frequency, aligned with intuition.

References

[1] C. Ramesh, H. Sandberg, and K. H. Johansson, “Design of state-based schedulers for a network of control loops,” IEEE Transactions on Automatic Control, vol. 58, no. 8, pp. 1962–1975, 2013.
[2] X. Gao, E. Akyol, and T. Başar, “Optimal communication scheduling and remote estimation over an additive noise channel,” Automatica, vol. 88, pp. 57–69, 2018.
[3] A. Nayyar, T. Başar, D. Teneketzis, and V. V. Veeravalli, “Optimal strategies for communication and remote estimation with an energy harvesting sensor,” IEEE Transactions on Automatic Control, vol. 58, no. 9, pp. 2246–2260, 2013.
[4] S. Tatikonda, A. Sahai, and S. Mitter, “Stochastic linear control over a communication channel,” IEEE Transactions on Automatic Control, vol. 49, no. 9, pp. 1549–1561, 2004.
[5] S. Tatikonda and S. Mitter, “Control over noisy channels,” IEEE Transactions on Automatic Control, vol. 49, no. 7, pp. 1196–1201, 2004.
[6] O. C. Imer and T. Başar, “Optimal estimation with limited measurements,” International Journal of Systems, Control and Communications, vol. 2, no. 1-3, pp. 5–29, 2010.
[7] X. Gao, E. Akyol, and T. Başar, “Optimal estimation with limited measurements and noisy communication,” in 54th IEEE Conference on Decision and Control (CDC). IEEE, 2015, pp. 1775–1780.
[8] S. Tatikonda, A. Sahai, and S. Mitter, “Control of LQG systems under communication constraints,” in Proceedings of the 37th IEEE Conference on Decision and Control (Cat. No. 98CH36171), vol. 1. IEEE, 1998, pp. 1165–1170.
[9] R. Bansal and T. Başar, “Simultaneous design of measurement and control strategies for stochastic systems with feedback,” Automatica, vol. 25, no. 5, pp. 679–694, 1989.
[10] A. Molin and S. Hirche, “On the optimality of certainty equivalence for event-triggered control systems,” IEEE Transactions on Automatic Control, vol. 58, no. 2, pp. 470–474, 2012.
[11] D. J. Antunes and M. H. Balaghi I., “Consistent event-triggered control for discrete-time linear systems with partial state information,” IEEE Control Systems Letters, vol. 4, no. 1, pp. 181–186, 2019.
[12] M. Huang, R. P. Malhamé, P. E. Caines et al., “Large population stochastic dynamic games: closed-loop Mckean-Vlasov systems and the Nash Certainty Equivalence principle,” Communications in Information & Systems, vol. 6, no. 3, pp. 221–252, 2006.
[13] M. Huang, P. E. Caines, and R. P. Malhamé, “Large-population cost-coupled LQG problems with nonuniform agents: individual-mass behavior and decentralized $\varepsilon$ -Nash equilibria,” IEEE Transactions on Automatic Control, vol. 52, no. 9, pp. 1560–1571, 2007.
[14] J.-M. Lasry and P.-L. Lions, “Mean field games,” Japanese Journal of Mathematics, vol. 2, no. 1, pp. 229–260, 2007.
[15] M. A. uz Zaman, K. Zhang, E. Miehling, and T. Başar, “Approximate equilibrium computation for discrete-time linear-quadratic mean-field games,” in American Control Conference (ACC). IEEE, 2020, pp. 333–339.
[16] ——, “Reinforcement learning in non-stationary discrete-time linear-quadratic mean-field games,” in 59th IEEE Conference on Decision and Control (CDC). IEEE, 2020, pp. 2278–2284.
[17] J. Moon and T. Başar, “Discrete-time LQG mean field games with unreliable communication,” in 53rd IEEE Conference on Decision and Control. IEEE, 2014, pp. 2697–2702.
[18] M. A. uz Zaman, S. Bhatt, and T. Başar, “Secure discrete-time linear-quadratic mean-field games,” in International Conference on Decision and Game Theory for Security. Springer, 2020, pp. 203–222.
[19] Y. Bar-Shalom and E. Tse, “Dual effect, certainty equivalence, and separation in stochastic control,” IEEE Transactions on Automatic Control, vol. 19, no. 5, pp. 494–500, 1974.
[20] B. D. Anderson and J. B. Moore, Optimal filtering. Courier Corporation, 2012.
[21] A. A. Feldbaum, “Dual control theory,” in Control Theory: Twenty-Five Seminal Papers (1932-1981), T. Başar, Ed. Wiley-IEEE Press, 2001, ch. 10, pp. 874–880.
[22] D. P. Bertsekas, Dynamic Programming and Optimal Control: Vol. 1. Athena Scientific, Belmont, MA, 2000.
[23] N. Saldi, T. Başar, and M. Raginsky, “Markov–Nash equilibria in mean-field games with discounted cost,” SIAM Journal on Control and Optimization, vol. 56, no. 6, pp. 4256–4287, 2018.

	$\displaystyle\mathbb{E}\left\{\delta_{k}\|I^{d}_{k}\right\}$	$\displaystyle=\mathbb{E}\left\{\bar{\eta}_{1}(\phi)e_{l_{r}}+\bar{\eta}_{2}(\phi)W_{l_{r}:k-1}\|I^{d}_{k}\right\}$
		$\displaystyle=\mathbb{E}\left\{\bar{\eta}_{1}(\phi)e_{l_{r}}+\bar{\eta}_{2}(\phi)W_{l_{r}:k-1}\|I^{d}_{l_{r}}\right\}$
		$\displaystyle=\mathbb{E}\left\{\bar{\eta}_{1}(\phi)e_{l_{r}}\|I^{d}_{l_{r}}\right\}+\mathbb{E}\left\{\bar{\eta}_{2}(\phi)W_{l_{r}:k-1}\|I^{d}_{l_{r}}\right\}$
		$\displaystyle=0.$

		$\displaystyle\mathbb{E}\{(X_{k}-\bar{X}_{k})^{\prime}Q(\phi)(X_{k}-\bar{X}_{k})+U_{k}^{\prime}R(\phi)U_{k}\|I_{k}^{\mu}\}$
	$\displaystyle=$	$\displaystyle\mathbb{E}\left\{(Y_{k}-\bar{X}_{k}+\bar{e}_{k})^{\prime}Q(\phi)(Y_{k}-\bar{X}_{k}+\bar{e}_{k})\right.$
		$\displaystyle\left.+U_{k}^{\prime}R(\phi)U_{k}\|I_{k}^{\mu}\right\},$
	$\displaystyle=$	$\displaystyle(Y_{k}-\bar{X}_{k})^{\prime}Q(\phi)(Y_{k}-\bar{X}_{k})+\mathbb{E}\{(Y_{k}-\bar{X}_{k})^{\prime}Q(\phi)e_{k}\|I_{k}^{\mu}\}$
		$\displaystyle+\mathbb{E}\{\bar{e}_{k}^{\prime}Q(\phi)(Y_{k}-\bar{X}_{k})\|I_{k}^{\mu}\}+\mathbb{E}\{\bar{e}_{k}^{\prime}Q(\phi)\bar{e}_{k}\|I_{k}^{\mu}\}$
		$\displaystyle+\mathbb{E}\{U_{k}^{\prime}R(\phi)U_{k}\|I_{k}^{\mu}\},$
	$\displaystyle=$	$\displaystyle(Y_{k}-\bar{X}_{k})^{\prime}Q(\phi)(Y_{k}-\bar{X}_{k})+\mathbb{E}\{\bar{e}_{k}^{\prime}Q(\phi)\bar{e}_{k}\|I_{k}^{\mu}\}$
		$\displaystyle+\mathbb{E}\{U_{k}^{\prime}R(\phi)U_{k}\|I_{k}^{\mu}\},$
	$\displaystyle=$	$\displaystyle(Y_{k}-\bar{X}_{k})^{\prime}Q(\phi)(Y_{k}-\bar{X}_{k})+\mathbb{E}\{U_{k}^{\prime}R(\phi)U_{k}\|I_{k}^{\mu}\}$
		$\displaystyle+Tr\{Q(\phi)\Delta_{k}\},$

	$\displaystyle\\|\mathcal{T}(\bar{X}_{1})-\mathcal{T}(\bar{X}_{2})\\|_{\infty}=\\|\sum_{\phi\in\Phi}({\hat{X}_{1}-\hat{X}_{2})P(\phi)}\\|_{\infty}$
	$\displaystyle\leq\sum_{\phi\in\Phi}{(\\|Q(\phi)\\|\\|B(\phi)\Gamma(\phi)\\|\left(\sum_{s=0}^{\infty}\\|H(\phi)\\|^{s}\right)^{2}P(\phi)}$
	$\displaystyle\qquad\times\\|\bar{X}_{1}-\bar{X}_{2}\\|_{\infty}$
	$\displaystyle=\zeta\\|\bar{X}_{1}-\bar{X}_{2}\\|_{\infty},$

	$\displaystyle J_{i}^{N}(\mu^{i},\mu^{-i})=$		(22)
	$\displaystyle\limsup_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\Bigg{\\|}X_{k}^{i}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j}}\Bigg{\\|}^{2}_{Q}+\\|U_{k}^{i*}\\|^{2}_{R}\right\},$
	$\displaystyle J(\mu^{i},\bar{X}^{})=$		(23)
	$\displaystyle\limsup_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\\|X_{k}^{i}-\bar{X}^{}\\|^{2}_{Q}+\\|U_{k}^{i*}\\|^{2}_{R}\right\},$
	$\displaystyle J_{i}^{N}(\pi^{i},\mu^{-i*})=$		(24)
	$\displaystyle\limsup_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\Bigg{\\|}X_{k}^{i,\pi^{i}}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j*}}\Bigg{\\|}^{2}_{Q}+\\|V_{k}^{i}\\|^{2}_{R}\right\},$

	$\displaystyle J_{i}^{N}(\mu^{i},\mu^{-i})-J(\mu^{i},\bar{X}^{})\leq J_{i}^{N}(\mu^{i},\mu^{-i})-J(\mu^{i},\bar{X}^{*})$
	$\displaystyle=\limsup_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\Bigg{\\|}X_{k}^{i}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j}}\Bigg{\\|}^{2}_{Q}\right.$
	$\displaystyle\left.~{}~{}~{}~{}-\\|X_{k}^{i}-\bar{X}_{k}^{}\\|^{2}_{Q}\right\},$
	$\displaystyle=\limsup_{T\rightarrow\infty}\frac{1}{T}\mathbb{E}\left\{\sum_{k=0}^{T-1}\Bigg{\\|}\bar{X}_{k}^{}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j}}\Bigg{\\|}^{2}_{Q}\right.$
	$\displaystyle\left.~{}~{}~{}~{}+2\left(X_{k}^{i}-\bar{X}_{k}^{}\right)^{\prime}Q\left(\bar{X}_{k}^{}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j}}\right)\right\}$
	$\displaystyle\leq\limsup_{T\rightarrow\infty}\\|Q\\|\epsilon_{T}^{N}+2\\|Q\\|\mathbb{E}\left\{\sqrt{\frac{1}{T}\sum_{k=0}^{T-1}\\|X_{k}^{i}-\bar{X}_{k}^{}\\|^{2}}\right.$
	$\displaystyle\left.\qquad\qquad\times\sqrt{\frac{1}{T}\sum_{k=0}^{T-1}\Bigg{\\|}\bar{X}_{k}^{}-\frac{1}{N}\sum_{j=1}^{N}{X_{k}^{j}}\Bigg{\\|}^{2}}\right\}$
	$\displaystyle\leq\limsup_{T\rightarrow\infty}\\|Q\\|\epsilon_{T}^{N}$
	$\displaystyle+\limsup_{T\rightarrow\infty}2\\|Q\\|\sqrt{\epsilon_{T}^{N}\mathbb{E}\left\{\frac{1}{T}\sum_{k=0}^{T-1}\\|X_{k}^{i}-\bar{X}_{k}^{}\\|^{2}\right\}}$		(26)

Linear Quadratic Mean-Field Games with Communication Constraints

Abstract

I Introduction

II Problem Formulation

III Mean-Field Games

III-A Optimal Tracking Control

Assumption 1.

Definition 1.

Lemma 1.

Proof.

Theorem 1.

Proof.

Proposition 1 (Separated Stochastic Optimal Control Problem).

Proof.

III-B MFE Analysis

Assumption 2.

Theorem 2.

Proof.

III-C ϵ\epsilon-Nash Equilibrium

Definition 2.

Lemma 2.

Proof.

Theorem 3.

Proof.

IV Simulations

V Conclusion

References

III-C $\epsilon$ -Nash Equilibrium