
Linear-quadratic Gaussian Games with Asymmetric Information:
Belief Corrections Using the Opponent's Actions*

*We thank Sam Cohen and Yanwei Jia for helpful discussions and comments on the paper.

Ben Hambly Mathematical Institute, University of Oxford. Email: [email protected]    Renyuan Xu Epstein Department of Industrial and Systems Engineering, University of Southern California. Email: [email protected]    Huining Yang Department of Operations Research and Financial Engineering, Princeton University. Email: [email protected]
Abstract

We consider two-player non-zero-sum linear-quadratic Gaussian games in which both players aim to minimize a quadratic cost function while controlling a linear stochastic state process using linear policies. The system is partially observable with asymmetric information available to the players. In particular, each player has a private and noisy measurement of the state process but can see the history of their opponent's actions. The challenge of this asymmetry is that it introduces correlations into the players' belief processes for the state and leads to circularity in their beliefs about their opponents' beliefs. We show that by leveraging the information available through their opponent's actions, both players can enhance their state estimates and improve their overall outcomes. In addition, we provide a closed-form solution for the Bayesian updating rule of their belief processes. We show that there is a Nash equilibrium which is linear in the estimate of the state, with a value function incorporating terms that arise due to errors in the state estimation. We illustrate the results through an application to bargaining which demonstrates the value of these information corrections.

1 Introduction

In the realm of game theory, two-player bargaining games have been extensively studied due to their practical implications in various domains [16, 4, 21, 9]. The types of game of interest here typically occur in negotiations between large organizations. One example is the negotiation of extraction rights between a host country and a foreign mining company. The host country seeks to maximize its economic benefits while ensuring sustainable resource management, while the foreign company aims to secure profitable extraction agreements. Another relevant scenario is the negotiation of rebates on pharmaceutical purchases between a manufacturer and a retail chain. The manufacturer desires to maximize its sales volume and market share, while the retail chain seeks competitive pricing to attract customers. To capture these situations we consider negotiations over a good whose value changes through time and assume that there are a fixed number of rounds of negotiation, where at each round both players present their bids simultaneously. In practice, both players possess limited and potentially asymmetric information about the good under negotiation. This provides compelling motivation for the mathematical modeling and analysis of two-player stochastic games where there is asymmetric information, which can shed light on the negotiation dynamics, identify the key parameters affecting the bargaining outcome, and provide insights into the search for information.

In the game setting where the underlying state dynamics are only partially observed, the players’ observation histories expand over time, leading to strategies with expanding domains due to the dynamic and sequential nature of such games. To address this issue, a commonly employed technique is to summarize the time-expanding histories into sufficient statistics. This enables the application of the Dynamic Programming Principle (DPP), allowing such sequential decision-making problems to be solved through nested sub-optimization problems using the Bellman equation. The sufficient statistics may vary from player to player, depending on the observations available to each player. In the case of symmetric information in extensive form games, the concept of a Markov perfect equilibrium has been introduced [10], in which the players’ strategies are determined solely by past events that are relevant to their payoffs, rather than the entire history. See also [3, 15]. However, for games with asymmetric information, identifying the appropriate sufficient statistics poses a significant challenge.

A quantity commonly used as the sufficient statistic is a posterior belief of the state dynamics, constructed using the available observations and Bayes formula. The main difficulty in this context is the emergence of private beliefs, the fact that different agents in the system may have different (private) observations about the same unknown quantity, which introduces dependence among the agents’ beliefs. One way to avoid this problem is to consider models in which private beliefs either do not exist (such as considering symmetric information games, or asymmetric but independent observations [18, 19]), or, if they do exist, they are not taken into account in the agents’ strategies (see for example the concept of “public perfect equilibrium” [1]). Another closely related line of work considers common-information based Markov perfect equilibria, which breaks the history into the common and private parts. This idea was first introduced in [11] for finite games and then generalized in [7] to linear-quadratic games. See [12] for an extension to a more general setting and [6] for an application to cyber-physical systems. One key assumption made in this approach is that players’ posterior beliefs about the system state, conditioned on their common information, are independent of the strategies used by the players in the past. This decouples the sequential rationality and belief consistency, resulting in a simplification in calculating the equilibria, and obviating the need to define (possibly correlated) private beliefs.

To better understand the challenge posed by having private beliefs, let us consider the following simplified scenario. We have two players, $P$ and $E$, who collect private information about an unknown variable $\Theta$ at each time step. Player $P$ acts based on her own private belief about $\Theta$, and expects that Player $E$ will do the same. Although both players can observe each other's actions, they cannot observe each other's private beliefs. This means that Player $P$ must form a belief about Player $E$'s private belief in order to interpret the actions from Player $E$, and take this into account when making her own decisions. However, this creates a chain of “belief about belief” that must be taken into consideration, which extends as long as each player's beliefs remain private. Due to the symmetry of the information structure, Player $E$ must do the same. Thus, Player $P$ needs to form beliefs about beliefs about beliefs of Player $E$, creating an increasingly complex hierarchy of beliefs. This chain only stops when a belief in one step becomes a public function of the beliefs in the previous steps.

Indeed, stochastic games with private beliefs have been identified as an open problem in the past decade [13, 14]. There have been a few attempts to address this challenging issue. In [8] a model with an unknown (but static) state $\Theta$ of the world is considered, where each player has a private noisy observation of $\Theta$ at time $t$. The private observations are independent given $\Theta$. The authors specialize this setting to the case of a linear-quadratic Gaussian non-zero-sum game where $\Theta$ is a Gaussian variable and players' observations are generated through a linear Gaussian model from $\Theta$. The main contribution of the paper is to show that, due to the conditional independence of the private signals given $\Theta$, the private belief chain stops at the second step and players' beliefs about others' beliefs are public functions of their own beliefs (the first-step beliefs). In addition, the authors show that the perfect Bayesian equilibrium (PBE) for their model is a linear function of a player's private Kalman filter estimate. For a more general setting where the (partially) unknown quantity is the underlying stochastic process itself, [13] considers a zero-sum two-player game formulation in which each agent observes a private linear signal of the underlying process with non-degenerate Gaussian noise. The private signals at each time step are independent conditioned on the true value of the state at the same time. Similarly, the author shows that the private belief chain stops at the second step. However, the sufficient statistics developed in [13] are not completely correct, impeding the application of the DPP and making it impossible to derive the Nash equilibrium solutions. We will give a detailed technical discussion in Section 2.2.

Our Contributions.

Motivated by bargaining games, we consider a two-player non-zero-sum game with linear dynamics and quadratic cost functions. Neither player can directly observe the dynamics; instead they rely on private signal processes that are linear in the state process with additive Gaussian noise. In addition, both players adopt linear policies to control the partially observable dynamics, and each player can also observe the past actions taken by the other player.

The novelty of our approach is that we show how to use the opponent's actions as additional information to correct the state estimate of the previous time step via a modified Kalman filtering procedure. For the partially observable setting, we formally derive the updating rule for the belief process and provide an explicit formula for the projection of the unknown state process onto the filtration generated by the information flow that is available to each player in Theorem 2.4. In addition, we prove a conditional version of the DPP that works in the game setting in Theorem 3.1. With the sufficient statistics and DPP in hand, we are able to derive a Nash equilibrium solution for the two-player game in Theorem 3.4. Finally, in Section 4, we extend the above-mentioned results to a more general setting where part of the state is fully observable and part of the state is partially observable through the signal process. In this mixed partially and fully observable setting, we establish parallel findings, specifically the updating rule for the sufficient statistics in Theorem 4.2 and the Nash equilibrium in Theorem 4.3. To the best of our knowledge, this is the first work that rigorously characterizes the equilibrium solution under private beliefs when the underlying state dynamics are partially observable. To demonstrate the performance of our framework, we conclude the paper in Section 5 by discussing a bargaining game example and giving a numerical illustration of the effects of using information corrections.

2 Problem Set-up

We consider a general setting for games with two players, $P$ and $E$, under partial observations and asymmetric information. The joint dynamics of the state process $x_t \in \mathbb{R}^n$ takes a linear form ($0 \leq t \leq T-1$):

x_{t+1} = A_t x_t + B^P_t u^P_t + B^E_t u^E_t + \Gamma_t w_t,    (2.1)

with initial value $x_0 = x$, and the controls of $P$ and $E$ are $u^P_t \in \mathbb{R}^m$ and $u^E_t \in \mathbb{R}^k$, respectively. Here, for each $t$, the process noise $w_t \in \mathbb{R}^d$ is an i.i.d. sample from $\mathcal{N}(0, W)$ with $W \in \mathbb{R}^{d \times d}$, and we have the model parameters $A_t \in \mathbb{R}^{n \times n}$, $B^P_t \in \mathbb{R}^{n \times m}$, $B^E_t \in \mathbb{R}^{n \times k}$, and $\Gamma_t \in \mathbb{R}^{n \times d}$.

Information Structure.

At time $t=0$, player $P$ is not able to observe $x_0$ but instead believes that the initial state is drawn from a Gaussian distribution. Namely, from the viewpoint of player $P$,

x_0 \sim \mathcal{N}(\widehat{x}_0^P, W^P_0),    (2.2)

where $\widehat{x}_0^P$ is their own initial constant and $W^P_0$ is a known constant covariance matrix. After that, player $P$ observes the following noisy state signal (or measurement) $z_t^P \in \mathbb{R}^p$:

z_{t+1}^P = H_{t+1}^P x_{t+1} + w^P_{t+1}, \quad w^P_{t+1} \sim \mathcal{N}(0, G^P), \quad t = 0, 1, \ldots, T-1,    (2.3)

with $\{w^P_t\}_{t=0}^{T-1}$ a sequence of i.i.d. random variables. Here $G^P \in \mathbb{R}^{p \times p}$ and $H_{t+1}^P \in \mathbb{R}^{p \times n}$. Similarly, player $E$ believes that the initial state is drawn from a Gaussian distribution:

x_0 \sim \mathcal{N}(\widehat{x}_0^E, W^E_0),    (2.4)

with their own initial constant $\widehat{x}_0^E$ and known constant $W^E_0$. Thereafter player $E$ observes the following noisy state signal $z_t^E \in \mathbb{R}^q$:

z_{t+1}^E = H_{t+1}^E x_{t+1} + w^E_{t+1}, \quad w^E_{t+1} \sim \mathcal{N}(0, G^E), \quad t = 0, 1, \ldots, T-1,    (2.5)

with $\{w^E_t\}_{t=0}^{T-1}$ a sequence of i.i.d. random variables. For simplicity we assume that $\{w^E_t\}_{t=0}^{T-1}$ are independent from $\{w^P_t\}_{t=0}^{T-1}$. In addition, $G^E \in \mathbb{R}^{q \times q}$ and $H_{t+1}^E \in \mathbb{R}^{q \times n}$.
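To fix ideas, the following minimal sketch (in Python; all parameter values are hypothetical placeholders, not taken from the paper) simulates one trajectory of the state dynamics (2.1) together with the two private signal processes (2.3) and (2.5), with identity observation matrices and the controls set to zero purely to exercise the dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: state n = 2, controls m = k = 1, signals p = q = 2, noise d = 2
n, m, k, p, q, d, T = 2, 1, 1, 2, 2, 2, 5

A   = np.array([[1.0, 0.1], [0.0, 1.0]])
BP  = np.array([[0.0], [0.10]])
BE  = np.array([[0.0], [0.05]])
Gam = np.eye(n)
W   = 0.01 * np.eye(d)                       # process-noise covariance
HP, HE = np.eye(p), np.eye(q)                # observation matrices (imperfect-observation case)
GP, GE = 0.04 * np.eye(p), 0.09 * np.eye(q)  # measurement-noise covariances

x = rng.multivariate_normal(np.zeros(n), np.eye(n))  # true (unobserved) initial state
for t in range(T):
    uP, uE = np.zeros(m), np.zeros(k)        # placeholder actions; in the game they are linear in each belief
    x = A @ x + BP @ uP + BE @ uE + Gam @ rng.multivariate_normal(np.zeros(d), W)  # state update (2.1)
    zP = HP @ x + rng.multivariate_normal(np.zeros(p), GP)  # player P's private signal (2.3)
    zE = HE @ x + rng.multivariate_normal(np.zeros(q), GE)  # player E's private signal (2.5)
```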

We follow [13] to define games with perfect, imperfect, and partial observations:

  • (a)

    If the observation matrices $H_t^P$ and $H_t^E$ are both the identity matrix and there is no measurement noise, that is, the covariance matrices $G^P = 0$ and $G^E = 0$, we have a game with full (or perfect) observation. In other words, the players' observation is $z_t^E = z_t^P = x_t$.

  • (b)

    If the observation matrices $H_t^P$ and $H_t^E$ are the identity matrix and there is measurement noise, we have a game with imperfect observation. Namely, the players' observations are of the form

    z^P_t = x_t + w_t^P, \quad z^E_t = x_t + w_t^E.
  • (c)

    If the observation matrices are not the identity and there is measurement noise, as in

    z^P_t = H^P_t x_t + w_t^P, \quad z^E_t = H^E_t x_t + w_t^E,

    we have a game with partial observation.

Both players make their decisions based on the public and private information available to them. We write $\mathcal{Z}_t^P = \{z_s^P\}_{s=1}^t$ and $\mathcal{Z}_t^E = \{z_s^E\}_{s=1}^t$ for the private signals players $P$ and $E$ receive up to time $t$ ($1 \leq t \leq T$), respectively. Let $\mathcal{U}^P_t = \{u^P_s\}_{s=1}^t$ and $\mathcal{U}^E_t = \{u^E_s\}_{s=1}^t$ denote the control histories of player $P$ and player $E$ up to time $t$, respectively.

We assume $\mathcal{H}^P_t$ is the information (or history) available to player $P$ and $\mathcal{H}^E_t$ is the information available to player $E$ when they make decisions at time $t$, where $\mathcal{H}^P_t$ and $\mathcal{H}^E_t$ are given by:

\mathcal{H}^P_t = \{\widehat{x}_0^P, W_0^P, W_0^E\} \cup \mathcal{Z}_t^P \cup \mathcal{U}^P_{t-1} \cup \mathcal{U}^E_{t-1}, \quad \mathcal{H}^E_t = \{\widehat{x}_0^E, W_0^P, W_0^E\} \cup \mathcal{Z}_t^E \cup \mathcal{U}^P_{t-1} \cup \mathcal{U}^E_{t-1}.    (2.6)

Note that the covariance matrices $\{W_0^P, W_0^E\}$ are known to both players. In the posterior update of a Gaussian distribution, the sufficient statistics involve both the mean and the covariance matrix. Knowing $\{W_0^P, W_0^E\}$ is essential for both players to update their posterior covariance matrices. In addition, we highlight that both players know all the model parameters.

Cost Function.

We consider a non-zero-sum game between player $P$ and player $E$ in which each strives to minimize their own cost function. Player $i$'s ($i = P, E$) cost function is given by

\min_{\{u^i_t\}_{t=0}^{T-1}} J^i(\widehat{x}_0^i) := \min_{\{u^i_t\}_{t=0}^{T-1}} \mathbb{E}\left[ x_T^\top Q^i_T x_T + \sum_{t=0}^{T-1}\left( x_t^\top Q^i_t x_t + (u^i_t)^\top R_t^i u^i_t \right) \,\middle|\, \mathcal{H}^i_0 \right],    (2.7)

with cost parameters $Q_t^P, Q_t^E \in \mathbb{R}^{n \times n}$, $R_t^P \in \mathbb{R}^{m \times m}$, and $R_t^E \in \mathbb{R}^{k \times k}$.
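As a small illustration of the cost structure (the function name and the list-of-matrices encoding are ours, not from the paper), a sketch of the realized pathwise cost whose conditional expectation in (2.7) each player minimizes might look as follows.

```python
import numpy as np

def realized_cost(xs, us, Qs, QT, Rs):
    """Pathwise value of the quadratic cost inside the expectation in (2.7).

    xs : states x_0, ..., x_T (each an n-vector)
    us : one player's controls u_0, ..., u_{T-1}
    Qs, Rs : stage-cost matrices Q_t^i and R_t^i for t = 0, ..., T-1
    QT : terminal cost matrix Q_T^i
    """
    T = len(us)
    cost = xs[T] @ QT @ xs[T]
    for t in range(T):
        cost += xs[t] @ Qs[t] @ xs[t] + us[t] @ Rs[t] @ us[t]
    return cost
```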

For the well-definedness of the game, we summarize the assumptions on the model parameters, initial state, and noise.

Assumption 2.1 (Parameters, Initial State, and Noise).

For $i = P, E$,

  1. $\{w_t\}_{t=0}^{T-1}$ and $\{w_t^i\}_{t=1}^{T-1}$ are zero-mean, i.i.d. Gaussian random variables that are independent from $x_0$ and each other and such that $\mathbb{E}[w_t w_t^\top] = W$ is positive definite and $\mathbb{E}[w_t^i (w_t^i)^\top] = G^i$ is positive definite;

  2. Both matrices $H_{t+1}^P \in \mathbb{R}^{p \times n}$ and $H_{t+1}^E \in \mathbb{R}^{q \times n}$ have rank $n$ for $t = 0, \dots, T-1$;

  3. The matrices $\Gamma_t W \Gamma_t^\top$ are non-singular for $t = 1, \dots, T$;

  4. The cost matrices $Q_t^i$, for $t = 0, 1, \ldots, T$, are positive semi-definite, and $R_t^i$, for $t = 0, 1, \ldots, T-1$, are positive definite.

Under the assumptions we make on $G^P$ and $G^E$, we exclude the perfect information case as we require $G^P$ and $G^E$ to be positive definite. This is a challenging scenario to study as the agent cannot get precise information for any coordinate of the state process. On the other hand, since $H_{t+1}^P \in \mathbb{R}^{p \times n}$ and $H_{t+1}^E \in \mathbb{R}^{q \times n}$ have rank $n$, the signal process will reveal aggregated information from all of the coordinates. Hence the agent is still capable of gradually learning each coordinate of the state process.

Remark 2.2.

We focus on the case where all states are partially observable, as given in Assumption 2.1, for the rest of Section 2 and for all of Section 3. We generalize it to a mixed fully and partially observable setting in Section 4.

2.1 Sufficient Statistics: Decentralized Kalman Filtering with Information Correction

To better demonstrate the sufficient statistics of Kalman filtering in the game setting, we start with a brief discussion of some existing results in the single-agent setting.

2.1.1 Preliminary: Single-agent setting

Suppose there is a single player $P$ who controls the state dynamics (2.1):

x_{t+1} = A_t x_t + B^P_t u^P_t + \Gamma_t w_t,    (2.8)

where $x_0 = x$ is the initial position and $u_t^P \in \mathbb{R}^m$ are the controls from player $P$.

Information Structure.

In this single-player case player $P$ believes that the initial state is drawn from a Gaussian distribution at time $t = 0$:

x_0 \sim \mathcal{N}(\widehat{x}^P_0, W^P_0),    (2.9)

and thereafter player $P$ observes the following noisy state signal $z_t^P \in \mathbb{R}^p$:

z^P_{t+1} = H^P_{t+1} x_{t+1} + w^P_{t+1}, \quad w^P_{t+1} \sim \mathcal{N}(0, G^P), \quad t = 0, 1, \ldots, T-1,    (2.10)

with $\{w^P_t\}_{t=0}^{T-1}$ a sequence of i.i.d. random variables. Here $G^P \in \mathbb{R}^{p \times p}$ and $H_{t+1}^P \in \mathbb{R}^{p \times n}$.

Assume that the information available to player $P$ to make a decision at time $t$ is

\mathcal{H}^P_t = \{\widehat{x}_0^P, W_0^P\} \cup \mathcal{Z}_t^P \cup \mathcal{U}^P_{t-1},    (2.11)

then we have the following result characterizing player $P$'s belief about the state.

Theorem 2.3.

[17, (5.3-39)-(5.3-42)] The sufficient statistic for player $P$ at decision time $t=0$ is $(\widehat{x}_0^P, W_0^P)$. Namely, player $P$ believes that $x_0 \sim \mathcal{N}(\widehat{x}_0^P, W_0^P)$. For time $1 \leq t \leq T-1$, the distribution of the physical state $x_t$ calculated by player $P$, by conditioning on the private information available to him at time $t$, is given by

x_t \sim \mathcal{N}(\widehat{x}_t^P, \widehat{\Sigma}_t^P),    (2.12)

where

(\widehat{x}^P_t)^- = A_{t-1}\widehat{x}^P_{t-1} + B^P_{t-1}u^P_{t-1},    (2.13a)
(\widehat{\Sigma}^P_t)^- = A_{t-1}\widehat{\Sigma}^P_{t-1}A_{t-1}^\top + \Gamma_{t-1}W\Gamma_{t-1}^\top,    (2.13b)
K_t^P = (\widehat{\Sigma}^P_t)^-(H_t^P)^\top\left[H_t^P(\widehat{\Sigma}^P_t)^-(H^P_t)^\top + G^P\right]^{-1},    (2.13c)
\widehat{x}_t^P = (\widehat{x}_t^P)^- + K_t^P\left[z_t^P - H_t^P(\widehat{x}_t^P)^-\right],    (2.13d)
\widehat{\Sigma}^P_t = (I - K_t^PH^P_t)(\widehat{\Sigma}^P_t)^-,    (2.13e)

with initial conditions $\widehat{x}^P_0$ as given in (2.9) and $\widehat{\Sigma}^P_0 = W^P_0$.

Note that $(\widehat{x}^P_t)^- = \mathbb{E}\big[x_t \,\big|\, \mathcal{H}^P_{t-1}\big]$ is the state estimate obtained before the measurement update, namely, using information up to time $t-1$. This term is often called the pre-estimate in the literature. With the new measurement information $z_t^P$ at time $t$, the agent updates the state estimate to $\widehat{x}_t^P = \mathbb{E}\big[x_t \,\big|\, \mathcal{H}_t^P\big]$, which is a linear combination of $(\widehat{x}^P_t)^-$ and $z_t^P$. This term is often called the post-estimate [17]. Here the second term $z_t^P - H_t^P(\widehat{x}_t^P)^-$ on the RHS of (2.13d) is independent of the first term $(\widehat{x}_t^P)^-$. The only decision variable in the coefficients, $K_t^P$, is chosen so that the conditional mean-square error is minimized. Namely,

K_t^P = \arg\min \,\, \mathbb{E}\big[\|x_t - \widehat{x}_t^P\|^2 \,\big|\, \mathcal{H}_t^P\big].    (2.14)

In terms of quantifying the uncertainty in the state estimate, we have $(\widehat{\Sigma}^P_t)^- = \mathbb{E}\big[(x_t - (\widehat{x}_t^P)^-)(x_t - (\widehat{x}_t^P)^-)^\top\big]$ and $\widehat{\Sigma}^P_t = \mathbb{E}\big[(x_t - \widehat{x}_t^P)(x_t - \widehat{x}_t^P)^\top\big]$, representing the covariances before and after the measurement update, respectively.
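For readers who prefer code, here is a minimal sketch (in Python, with variable names of our choosing) of one step of the recursion (2.13a)-(2.13e); it is a plain Kalman filter time-and-measurement update.

```python
import numpy as np

def single_agent_kalman_step(xhat_prev, Sigma_prev, u_prev, z_t, A, B, Gam, W, H, G):
    """One step of player P's single-agent recursion (2.13a)-(2.13e)."""
    # Pre-estimate (time update), (2.13a)-(2.13b)
    xhat_minus = A @ xhat_prev + B @ u_prev
    Sigma_minus = A @ Sigma_prev @ A.T + Gam @ W @ Gam.T
    # Kalman gain, (2.13c)
    K = Sigma_minus @ H.T @ np.linalg.inv(H @ Sigma_minus @ H.T + G)
    # Post-estimate (measurement update), (2.13d)-(2.13e)
    xhat = xhat_minus + K @ (z_t - H @ xhat_minus)
    Sigma = (np.eye(xhat.size) - K @ H) @ Sigma_minus
    return xhat, Sigma
```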

2.1.2 Two-player setting

There are a few challenges for the belief updates in the game setting. In this paper we focus on the case where both players $P$ and $E$ adopt linear feedback policies:

u^P_t := F_t^P\,\mathbb{E}[x_t \,|\, \mathcal{H}_t^P], \quad \text{and} \quad u^E_t := F_t^E\,\mathbb{E}[x_t \,|\, \mathcal{H}_t^E],    (2.15)

with some policy matrices $F_t^P \in \mathbb{R}^{m \times n}$ and $F_t^E \in \mathbb{R}^{k \times n}$.

From player $P$'s perspective, the new information collected at time $t$ is $z_t^P$ and $u^E_{t-1}$. Intuitively, player $P$ should be able to use $u^E_{t-1}$ as some additional information to improve their estimate of $x_{t-1}$. Note that player $P$ is aware that player $E$ adopts a linear feedback policy $u^E_{t-1} = F^E_{t-1}\mathbb{E}[x_{t-1} \,|\, \mathcal{H}_{t-1}^E]$, for which the opponent's policy matrix $F^E_{t-1}$ (a function of model parameters) is known but the state estimate $\mathbb{E}[x_{t-1} \,|\, \mathcal{H}_{t-1}^E]$ is unknown to player $P$. In order to utilize the information contained in $u^E_{t-1}$, player $P$ also needs to infer the distribution of $\mathbb{E}[x_{t-1} \,|\, \mathcal{H}_{t-1}^E]$ using $\mathcal{H}_{t-1}^P$. We will show later on in Section 2.2 that this idea of using the opponent's actions to make an information correction is not only a possible approach to improve the estimation precision but also a necessary step to guarantee that the conditional expectations based on the information filtrations satisfy the tower property and hence that the DPP holds.

Let us start at decision time $t=0$. Player $P$ reasons as follows. He models his initial belief $\widehat{x}^P_0$ of the initial state $x_0$ as

\widehat{x}^P_0 = x_0 + e_0^P,    (2.16)

where $x_0$ is the true physical state and $e_0^P$ is player $P$'s estimation error, whose distribution, in view of (2.2), is $\mathcal{N}(0, W_0^P)$. In the same way, player $E$'s belief $\widehat{x}^E_0$ of the initial state $x_0$ follows

\widehat{x}_0^E = x_0 + e_0^E,    (2.17)

where, as before, $x_0$ is the true physical state and $e_0^E$ is player $E$'s estimation error, whose distribution, in view of (2.4), is $\mathcal{N}(0, W_0^E)$. The Gaussian random variables $e_0^E$ and $e_0^P$ are independent by our assumptions.

From player $P$'s perspective, $\widehat{x}_0^P$ is known but $\widehat{x}_0^E$ is a random variable. Subtracting (2.16) from (2.17), at time $t=0$ player $P$ concludes that, as far as he is concerned, player $E$'s estimate, upon which player $E$ will decide his optimal control, is the random variable

\widehat{x}_0^E = \widehat{x}_0^P + e_0^E - e_0^P.    (2.18)

As far as $P$ is concerned, $E$'s estimate of the initial state $x_0$ is a Gaussian random variable

\widehat{x}_0^E \sim \mathcal{N}(\widehat{x}_0^P, W_0^P + W_0^E).    (2.19)

Thus, at time $t=0$ player $P$ has used his private information $\widehat{x}^P_0$ and the public information $(W_0^P, W_0^E)$ to calculate the distribution of the sufficient statistic $\widehat{x}_0^E$ of player $E$. Similarly, as far as player $E$ is concerned, at time $t=0$ the distribution of the initial state estimate $\widehat{x}_0^P$ of player $P$ follows $\mathcal{N}(\widehat{x}_0^E, W_0^P + W_0^E)$.

In addition to (2.15), we further restrict the admissible set of policy matrices to the following:

\mathcal{A}^P := \big\{F^P \in \mathbb{R}^{m \times n} \,\big|\, F^P \text{ has rank } \min(m,n)\big\}, \quad \mathcal{A}^E := \big\{F^E \in \mathbb{R}^{k \times n} \,\big|\, F^E \text{ has rank } \min(k,n)\big\}.

For time $t \geq 1$, we have the following result.

Theorem 2.4 (Sufficient Statistics in Two-player Games).

Assume the sufficient statistic of player $i$ ($i = P, E$) at decision time $t=0$ is $x_0 \sim \mathcal{N}(\widehat{x}_0^i, W_0^i)$. In addition, assume both players are applying linear strategies. Namely, $u^P_t = F_t^P\mathbb{E}[x_t \,|\, \mathcal{H}_t^P]$ and $u^E_t = F_t^E\mathbb{E}[x_t \,|\, \mathcal{H}_t^E]$ for some matrices $F_t^P \in \mathcal{A}^P$ and $F_t^E \in \mathcal{A}^E$. Then, for time $1 \leq t \leq T-1$, the distribution of the physical state $x_t$ calculated by player $i$ conditioning on the private information available to him at time $t$ follows

x_t \sim \mathcal{N}(\widehat{x}_t^i, \widehat{\Sigma}_t^i),    (2.20)

where, for $j \neq i$,

J_{t-1}^i = \big(\widehat{\Sigma}_{t-1}^i - \widetilde{\Sigma}_{t-1}^{(i,j)}\big)\widehat{\Sigma}_{t-1}^{(i,j)}(Y_{t-1}^j)^\top\big(Y_{t-1}^j\widehat{\Sigma}_{t-1}^{(i,j)}\widehat{\Sigma}_{t-1}^{(i,j)}(Y_{t-1}^j)^\top\big)^{-1},    (2.21a)
(\widehat{x}^i_{t-1})^+ = \widehat{x}^i_{t-1} + J_{t-1}^i\big(y^j_{t-1} - Y_{t-1}^j\widehat{x}_{t-1}^i\big),    (2.21b)
(\widehat{\Sigma}_{t-1}^i)^+ = \widehat{\Sigma}^i_{t-1} - \big(\widehat{\Sigma}^i_{t-1} - \widetilde{\Sigma}_{t-1}^{(i,j)}\big)(\widehat{\Sigma}_{t-1}^{(i,j)})^{-1}\big(\widehat{\Sigma}_{t-1}^i - \widetilde{\Sigma}_{t-1}^{(i,j)}\big)^\top,    (2.21c)
(\widehat{x}^i_t)^- = A_{t-1}(\widehat{x}^i_{t-1})^+ + B^P_{t-1}u^P_{t-1} + B^E_{t-1}u^E_{t-1},    (2.21d)
(\widehat{\Sigma}^i_t)^- = A_{t-1}(\widehat{\Sigma}^i_{t-1})^+A_{t-1}^\top + \Gamma_{t-1}W\Gamma_{t-1}^\top,    (2.21e)
K_t^i = (\widehat{\Sigma}^i_t)^-(H_t^i)^\top\left[H_t^i(\widehat{\Sigma}^i_t)^-(H^i_t)^\top + G^i\right]^{-1},    (2.21f)
\widehat{x}_t^i = (\widehat{x}_t^i)^- + K_t^i\left[z_t^i - H_t^i(\widehat{x}_t^i)^-\right],    (2.21g)
\widehat{\Sigma}^i_t = (I - K_t^iH^i_t)(\widehat{\Sigma}^i_t)^-,    (2.21h)
\widetilde{\Sigma}_t^{(i,j)} = (I - K_t^iH_t^i)\big(A_{t-1}\Delta_{t-1}^{(i,j)}A_{t-1}^\top + \Gamma_{t-1}W\Gamma_{t-1}^\top\big)(I - K_t^jH^j_t)^\top,    (2.21i)
\Delta_{t-1}^{(i,j)} = \big(\widehat{\Sigma}^i_{t-1} - \widetilde{\Sigma}_{t-1}^{(i,j)}\big)(\widehat{\Sigma}^{(i,j)}_{t-1})^{-1}\big(\widehat{\Sigma}^j_{t-1} - \widetilde{\Sigma}_{t-1}^{(j,i)}\big)^\top + \widetilde{\Sigma}_{t-1}^{(i,j)},    (2.21j)
\widehat{\Sigma}_t^{(i,j)} = \widehat{\Sigma}_t^i + \widehat{\Sigma}_t^j - \widetilde{\Sigma}_t^{(i,j)} - \big(\widetilde{\Sigma}_t^{(i,j)}\big)^\top,    (2.21k)

where $\widehat{\Sigma}_{t-1}^{(i,j)}$ is positive definite. The values of $Y_t^P \in \mathbb{R}^{m \times n}$, $Y_t^E \in \mathbb{R}^{k \times n}$ and $y_t^P$, $y_t^E$ depend on the ranks of $F_t^P$ and $F_t^E$ as follows:

  • (i)

    The pair

    (Y_t^P, y_t^P) = \begin{cases} (F_t^P, u_t^P) & \text{if } F_t^P \text{ has rank } m < n, \\ (I_n, \widehat{x}_t^P) & \text{if } F_t^P \text{ has rank } n \leq m. \end{cases}
  • (ii)

    The pair

    (Y_t^E, y_t^E) = \begin{cases} (F_t^E, u_t^E) & \text{if } F_t^E \text{ has rank } k < n, \\ (I_n, \widehat{x}_t^E) & \text{if } F_t^E \text{ has rank } n \leq k. \end{cases}

In addition, the initial conditions are $\widehat{\Sigma}^i_0 = W^i_0$, $\widetilde{\Sigma}_0^{(i,j)} = 0$, and $\widehat{\Sigma}^{(i,j)}_0 = \widehat{\Sigma}_0^i + \widehat{\Sigma}_0^j$. Finally, in player $i$'s view, the posterior distribution for the state estimate $\widehat{x}_t^j$ of player $j$ is

\widehat{x}_t^j \sim \mathcal{N}(\widehat{x}_t^i, \widehat{\Sigma}_t^{(i,j)}).    (2.22)
Remark 2.5.
  1. $J_{t-1}^i$ in (2.21a) is the Kalman gain for player $i$ when viewing player $j$'s action as the additional signal to improve the state estimation in the previous step $t-1$. We call $(\widehat{x}_{t-1}^i)^+$ in (2.21b) the improved estimate for $x_{t-1}$, with the corresponding estimation error covariance $(\widehat{\Sigma}_{t-1}^i)^+$.

  2. The post-estimates of the state and covariance after the measurement/signal update (2.21g)-(2.21h) take similar forms to the single-agent case (2.13d)-(2.13e). The differences occur in the input state and covariance estimates. In particular, the post-estimate for the single-agent setting uses the pre-estimate as the input, whereas the post-estimate for the two-player setting uses the improved estimate as the input.

  3. Equation (2.22) in Theorem 2.4 shows that the chain of “belief about belief” stops at the second step, as the belief at the second step becomes a public function of the beliefs at the first step.

  4. When $n > k, m$, players $P$ and $E$ will not be able to recover the opponent's state estimate via observing the action taken by the opponent. Instead, from player $i$'s viewpoint, the posterior distribution for the state estimate $\widehat{x}_t^j$ of player $j$ follows a Gaussian distribution with mean $\widehat{x}_t^i$ and covariance $\widehat{\Sigma}_t^{(i,j)}$.

  5. Consider the special case that

     F_t^P \in \mathbb{R}^{m \times n} \text{ has rank } n, \,\, n \leq m, \quad \text{and} \quad F_t^E \in \mathbb{R}^{k \times n} \text{ has rank } n, \,\, n \leq k.    (2.23)

     In this case, player $i$ can fully recover the state estimate of player $j$ by observing her actions, as the RHS of the following equation is fully known to player $i$:

     \mathbb{E}[x_t \,|\, \mathcal{H}_t^j] = \big((F_t^j)^\top F_t^j\big)^{-1}(F_t^j)^\top u^j_t.    (2.24)

     Observing that $J_t^i + J_t^j = I$, we have

     (\widehat{x}_t^i)^+ = (I - J^i_t)\,\widehat{x}_t^i + J^i_t\widehat{x}_t^j = J^j_t\widehat{x}_t^i + (I - J^j_t)\,\widehat{x}_t^j = (\widehat{x}_t^j)^+.

     This shows that players $P$ and $E$ have the same improved estimate after observing each other's actions. In this case, information is fully shared between the players.
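Before turning to the proof, a minimal sketch (in Python; variable names are ours, with `Sigma_i`, `Sigma_tilde_ij`, and `Sigma_ij` standing for $\widehat{\Sigma}_{t-1}^i$, $\widetilde{\Sigma}_{t-1}^{(i,j)}$, and $\widehat{\Sigma}_{t-1}^{(i,j)}$) of the action-based correction step (2.21a)-(2.21c) may help fix the mechanics. The subsequent time and measurement updates (2.21d)-(2.21h) then proceed exactly as in the single-agent step sketched in Section 2.1.1.

```python
import numpy as np

def action_correction_step(xhat_i, Sigma_i, Sigma_tilde_ij, Sigma_ij, y_j, Y_j):
    """Player i's correction of its previous-step estimate using the signal
    y_j = Y_j @ xhat_j implied by the opponent's observed action, (2.21a)-(2.21c)."""
    D = Sigma_i - Sigma_tilde_ij                              # \hat\Sigma^i_{t-1} - \tilde\Sigma^{(i,j)}_{t-1}
    M = Y_j @ Sigma_ij @ Sigma_ij @ Y_j.T
    J = D @ Sigma_ij @ Y_j.T @ np.linalg.inv(M)               # gain (2.21a)
    xhat_plus = xhat_i + J @ (y_j - Y_j @ xhat_i)             # improved estimate (2.21b)
    Sigma_plus = Sigma_i - D @ np.linalg.inv(Sigma_ij) @ D.T  # improved covariance (2.21c)
    return xhat_plus, Sigma_plus
```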

Proof.

There are four possible combinations under the conditions (i)-(ii) stated in Theorem 2.4. Here we only show the proof for the following combination as the proof for each of the other combinations follows the same logic:

F_t^P \in \mathbb{R}^{m \times n} \text{ has rank } m, \,\, m < n; \quad \text{and} \quad F_t^E \in \mathbb{R}^{k \times n} \text{ has rank } k, \,\, k < n.    (2.25)

In addition, under condition (2.25), we only prove the results for player $P$ here as the results for player $E$ follow in the same way.

We handle the new information $u^E_{t-1}$ and $z_t^P$ in an incremental fashion. More precisely, we first adjust the estimate $\widehat{x}^P_{t-1}$ using $u^E_{t-1}$, denoted by $(\widehat{x}_{t-1}^P)^+$, and then derive $\widehat{x}^P_t$ using $z_t^P$ and $(\widehat{x}_{t-1}^P)^+$.

After player $P$ observes the action $u^E_{t-1}$ from player $E$, player $P$ updates:

(\widehat{x}_{t-1}^P)^+ := \mathbb{E}\big[x_{t-1} \,\big|\, \mathcal{H}_{t-1}^P \cup \{u^E_{t-1}\}\big].    (2.26)

Following the convention in filtering theory [17], we write:

(\widehat{x}_{t-1}^P)^+ = \widehat{x}_{t-1}^P + J_{t-1}^P\big(u^E_{t-1} - F_{t-1}^E\,\widehat{x}_{t-1}^P\big),    (2.27)

where $J_{t-1}^P$ is a matrix to be determined to minimize $\mathbb{E}[\|(\widehat{x}_{t-1}^P)^+ - x_{t-1}\|^2]$. To calculate $J_{t-1}^P$, we have

\mathbf{cov}\big(x_{t-1} - (\widehat{x}_{t-1}^P)^+\big)
= \mathbf{cov}\big(x_{t-1} - \widehat{x}_{t-1}^P - J_{t-1}^P(u^E_{t-1} - F_{t-1}^E\widehat{x}_{t-1}^P)\big)
= \mathbf{cov}\big(-(I - J_{t-1}^PF_{t-1}^E)e_{t-1}^P - J_{t-1}^PF_{t-1}^Ee_{t-1}^E\big)
= (I - J_{t-1}^PF_{t-1}^E)\widehat{\Sigma}_{t-1}^P(I - J_{t-1}^PF_{t-1}^E)^\top + J_{t-1}^PF_{t-1}^E\widehat{\Sigma}_{t-1}^E(J_{t-1}^PF_{t-1}^E)^\top
\qquad + (I - J_{t-1}^PF_{t-1}^E)\widetilde{\Sigma}_{t-1}^{(P,E)}(J_{t-1}^PF_{t-1}^E)^\top + (J_{t-1}^PF_{t-1}^E)(\widetilde{\Sigma}_{t-1}^{(P,E)})^\top(I - J_{t-1}^PF_{t-1}^E)^\top
= \widehat{\Sigma}_{t-1}^P - J_{t-1}^PF_{t-1}^E\widehat{\Sigma}_{t-1}^P - \widehat{\Sigma}_{t-1}^P(F_{t-1}^E)^\top(J_{t-1}^P)^\top + J_{t-1}^PF_{t-1}^E\widehat{\Sigma}_{t-1}^P(F_{t-1}^E)^\top(J_{t-1}^P)^\top
\qquad + J_{t-1}^PF_{t-1}^E\widehat{\Sigma}_{t-1}^E(F_{t-1}^E)^\top(J_{t-1}^P)^\top + \widetilde{\Sigma}^{(P,E)}_{t-1}(F^E_{t-1})^\top(J^P_{t-1})^\top - J_{t-1}^PF_{t-1}^E\widetilde{\Sigma}^{(P,E)}_{t-1}(F^E_{t-1})^\top(J^P_{t-1})^\top
\qquad + J_{t-1}^PF_{t-1}^E(\widetilde{\Sigma}^{(P,E)}_{t-1})^\top - J_{t-1}^PF_{t-1}^E(\widetilde{\Sigma}^{(P,E)}_{t-1})^\top(F_{t-1}^E)^\top(J_{t-1}^P)^\top.

Note that minimizing $\mathbb{E}[\|(\widehat{x}_{t-1}^P)^+ - x_{t-1}\|^2]$ is equivalent to minimizing $\operatorname{Tr}\big(\mathbf{cov}\big(x_{t-1} - (\widehat{x}_{t-1}^P)^+\big)\big)$. Taking the derivative with respect to $J_{t-1}^P$ and setting it to zero, we have

\frac{\partial \operatorname{Tr}\big(\mathbf{cov}\big(x_{t-1} - (\widehat{x}_{t-1}^P)^+\big)\big)}{\partial J_{t-1}^P} = -2F_{t-1}^E\widehat{\Sigma}_{t-1}^P + 2F_{t-1}^E\widehat{\Sigma}^{(P,E)}_{t-1}(F^E_{t-1})^\top(J^P_{t-1})^\top + 2F_{t-1}^E(\widetilde{\Sigma}_{t-1}^{(P,E)})^\top = 0,

which is equivalent to the following equation (since $\widehat{\Sigma}_{t-1}^{(P,E)}$ is symmetric by its definition)

\widehat{\Sigma}_{t-1}^P - \widetilde{\Sigma}_{t-1}^{(P,E)} = J_{t-1}^PF_{t-1}^E\widehat{\Sigma}_{t-1}^{(P,E)}.    (2.28)

When $F_{t-1}^E\widehat{\Sigma}_{t-1}^{(P,E)}\widehat{\Sigma}_{t-1}^{(P,E)}(F_{t-1}^E)^\top$ is of rank $k$ (which will be shown at the end of the proof), we have

J_{t-1}^P = \big(\widehat{\Sigma}_{t-1}^P - \widetilde{\Sigma}_{t-1}^{(P,E)}\big)\widehat{\Sigma}_{t-1}^{(P,E)}(F_{t-1}^E)^\top\big(F_{t-1}^E\widehat{\Sigma}_{t-1}^{(P,E)}\widehat{\Sigma}_{t-1}^{(P,E)}(F_{t-1}^E)^\top\big)^{-1}.    (2.29)

Using the expression in (2.29), we have

(\widehat{\Sigma}_{t-1}^P)^+ := \mathbf{cov}\big(x_{t-1} - (\widehat{x}_{t-1}^P)^+\big) = \widehat{\Sigma}^P_{t-1} - \big(\widehat{\Sigma}^P_{t-1} - \widetilde{\Sigma}_{t-1}^{(P,E)}\big)(\widehat{\Sigma}_{t-1}^{(P,E)})^{-1}\big(\widehat{\Sigma}_{t-1}^P - \widetilde{\Sigma}_{t-1}^{(P,E)}\big)^\top.

Then we have the pre-estimate:

(\widehat{x}_t^P)^- = A_{t-1}(\widehat{x}_{t-1}^P)^+ + B^P_{t-1}u^P_{t-1} + B^E_{t-1}u^E_{t-1}.    (2.30)

The post-estimate after observing the signal/measurement $z_t^P$ at time $t$ is defined as:

\widehat{x}_t^P = (\widehat{x}_t^P)^- + K_t^P\big(z_t^P - H_t^P(\widehat{x}_t^P)^-\big),    (2.31)

where the pre-estimate covariance is

(\widehat{\Sigma}_t^P)^- := \mathbb{E}\big[\big((\widehat{x}_t^P)^- - x_t\big)\big((\widehat{x}_t^P)^- - x_t\big)^\top\big].    (2.32)

In the same way as for the derivation of $J_{t-1}^P$, we can show that the following choice of $K_t^P$ minimizes the quantity $\mathbb{E}[\|x_t - \widehat{x}_t^P\|^2]$:

K_t^P = (\widehat{\Sigma}_t^P)^-(H_t^P)^\top\big[H_t^P(\widehat{\Sigma}_t^P)^-(H_t^P)^\top + G^P\big]^{-1}.    (2.33)

The corresponding covariance takes the form:

\widehat{\Sigma}_t^P := \mathbb{E}\big[(x_t - \widehat{x}_t^P)(x_t - \widehat{x}_t^P)^\top\big] = (I - K_t^PH_t^P)(\widehat{\Sigma}_t^P)^-.    (2.34)

To update player $P$'s belief of player $E$'s state, define similarly to the case (2.18) when $t=0$,

x_t = \widehat{x}_t^P - e_t^P = \widehat{x}_t^E - e_t^E,    (2.35)

where $e_t^P$ and $e_t^E$ are the estimation errors from players $P$ and $E$, respectively. Given that (2.35) is equivalent to the following:

\widehat{x}_t^P = \widehat{x}_t^E + (e_t^P - e_t^E),    (2.36)

Player $P$'s posterior distribution for the state estimate $\widehat{x}_t^E$ of player $E$ is

\widehat{x}_t^E \sim \mathcal{N}(\widehat{x}_t^P, \widehat{\Sigma}_t^{(P,E)}),    (2.37)

where the estimation error covariance matrix $\widehat{\Sigma}_t^{(P,E)}$ is defined as

\widehat{\Sigma}_t^{(P,E)} = \mathbb{E}\big[\big(e^E_t - e^P_t\big)\big(e^E_t - e^P_t\big)^\top\big] = \widehat{\Sigma}_t^P + \widehat{\Sigma}_t^E - \widetilde{\Sigma}_t^{(P,E)} - \big(\widetilde{\Sigma}_t^{(P,E)}\big)^\top,    (2.38)

in which $\widehat{\Sigma}_0^P = W_0^P$, $\widehat{\Sigma}_0^E = W_0^E$, $\widehat{\Sigma}_t^P = \mathbb{E}[e_t^P(e_t^P)^\top]$, $\widehat{\Sigma}_t^E = \mathbb{E}[e_t^E(e_t^E)^\top]$, and $\widetilde{\Sigma}_t^{(P,E)} = \mathbb{E}[e_t^P(e_t^E)^\top]$. We will see that $\widetilde{\Sigma}_t^{(P,E)}$ satisfies a recursive linear matrix equation which is a Lyapunov equation:

\widetilde{\Sigma}_t^{(P,E)} = \big(I - K_t^PH_t^P\big)\big(A_{t-1}\Delta_{t-1}^{(P,E)}A_{t-1}^\top + \Gamma_{t-1}W\Gamma_{t-1}^\top\big)\big(I - K_t^EH^E_t\big)^\top,    (2.39)

with

\Delta_{t-1}^{(P,E)} = (I - J_{t-1}^PF_{t-1}^E)\widetilde{\Sigma}_{t-1}^{(P,E)}(I - J_{t-1}^EF_{t-1}^P)^\top + (I - J_{t-1}^PF_{t-1}^E)\widehat{\Sigma}_{t-1}^P(J_{t-1}^EF_{t-1}^P)^\top
\qquad + J_{t-1}^PF_{t-1}^E(\widetilde{\Sigma}_{t-1}^{(P,E)})^\top(J_{t-1}^EF_{t-1}^P)^\top + J_{t-1}^PF_{t-1}^E\widehat{\Sigma}_{t-1}^E(I - J_{t-1}^EF_{t-1}^P)^\top.    (2.40)

The equations (2.39) and (2.40) hold since, given (2.21d)-(2.21h), we have

\widehat{x}_t^P = A_{t-1}(\widehat{x}_{t-1}^P)^+ + B^P_{t-1}u^P_{t-1} + B^E_{t-1}u^E_{t-1} + K_t^PH_t^PA_{t-1}\big(x_{t-1} - (\widehat{x}_{t-1}^P)^+\big) + K_t^PH_t^P\Gamma_{t-1}w_{t-1} + K_t^Pw_t^P,    (2.41)

and hence

e_t^P = \widehat{x}_t^P - x_t = (I - K_t^PH_t^P)A_{t-1}\big((\widehat{x}_{t-1}^P)^+ - x_{t-1}\big) + (K_t^PH_t^P - I)\Gamma_{t-1}w_{t-1} + K_t^Pw_t^P
= (I - K_t^PH_t^P)A_{t-1}\big((I - J_{t-1}^PF_{t-1}^E)e_{t-1}^P + J_{t-1}^PF_{t-1}^Ee_{t-1}^E\big) + (K_t^PH_t^P - I)\Gamma_{t-1}w_{t-1} + K_t^Pw_t^P.    (2.42)

Similarly, we have

e_t^E = (I - K_t^EH_t^E)A_{t-1}\big((I - J_{t-1}^EF_{t-1}^P)e_{t-1}^E + J_{t-1}^EF_{t-1}^Pe_{t-1}^P\big) + (K_t^EH_t^E - I)\Gamma_{t-1}w_{t-1} + K_t^Ew_t^E.    (2.43)

Calculating $e_t^P(e_t^E)^\top$ using (2.42) and (2.43) and taking the expectation lead to (2.39) and (2.40).

Now we simplify (2.40) to obtain (2.21j). As for (2.28), we have

\widehat{\Sigma}_{t-1}^E - (\widetilde{\Sigma}_{t-1}^{(P,E)})^\top = J_{t-1}^EF_{t-1}^P\widehat{\Sigma}_{t-1}^{(P,E)}.    (2.44)

By substituting $J_{t-1}^EF_{t-1}^P = (\widehat{\Sigma}_{t-1}^E - (\widetilde{\Sigma}_{t-1}^{(P,E)})^\top)(\widehat{\Sigma}_{t-1}^{(P,E)})^{-1}$ and $J_{t-1}^PF_{t-1}^E = (\widehat{\Sigma}_{t-1}^P - \widetilde{\Sigma}_{t-1}^{(P,E)})(\widehat{\Sigma}_{t-1}^{(P,E)})^{-1}$ into (2.40) and by using the fact that $J_{t-1}^EF_{t-1}^P + J_{t-1}^PF_{t-1}^E = I$, we can rewrite $\Delta_{t-1}^{(P,E)}$ as:

\Delta_{t-1}^{(P,E)} = J_{t-1}^EF_{t-1}^P\widetilde{\Sigma}_{t-1}^{(P,E)}(J_{t-1}^PF_{t-1}^E)^\top + J_{t-1}^EF_{t-1}^P\widehat{\Sigma}_{t-1}^P(J_{t-1}^EF_{t-1}^P)^\top
\qquad + J_{t-1}^PF_{t-1}^E(\widetilde{\Sigma}_{t-1}^{(P,E)})^\top(J_{t-1}^EF_{t-1}^P)^\top + J_{t-1}^PF_{t-1}^E\widehat{\Sigma}_{t-1}^E(J_{t-1}^PF_{t-1}^E)^\top    (2.45)
= (\widehat{\Sigma}_{t-1}^E - (\widetilde{\Sigma}_{t-1}^{(P,E)})^\top)(\widehat{\Sigma}_{t-1}^{(P,E)})^{-1}\widetilde{\Sigma}_{t-1}^{(P,E)}(\widehat{\Sigma}_{t-1}^{(P,E)})^{-1}(\widehat{\Sigma}_{t-1}^P - \widetilde{\Sigma}_{t-1}^{(P,E)})^\top    (2.46)
\qquad + (\widehat{\Sigma}_{t-1}^E - (\widetilde{\Sigma}_{t-1}^{(P,E)})^\top)(\widehat{\Sigma}_{t-1}^{(P,E)})^{-1}\widehat{\Sigma}_{t-1}^P(\widehat{\Sigma}_{t-1}^{(P,E)})^{-1}(\widehat{\Sigma}_{t-1}^E - (\widetilde{\Sigma}_{t-1}^{(P,E)})^\top)^\top    (2.47)
\qquad + (\widehat{\Sigma}_{t-1}^P - \widetilde{\Sigma}_{t-1}^{(P,E)})(\widehat{\Sigma}_{t-1}^{(P,E)})^{-1}(\widetilde{\Sigma}_{t-1}^{(P,E)})^\top(\widehat{\Sigma}_{t-1}^{(P,E)})^{-1}(\widehat{\Sigma}_{t-1}^E - (\widetilde{\Sigma}_{t-1}^{(P,E)})^\top)^\top    (2.48)
\qquad + (\widehat{\Sigma}_{t-1}^P - \widetilde{\Sigma}_{t-1}^{(P,E)})(\widehat{\Sigma}_{t-1}^{(P,E)})^{-1}\widehat{\Sigma}_{t-1}^E(\widehat{\Sigma}_{t-1}^{(P,E)})^{-1}(\widehat{\Sigma}_{t-1}^P - \widetilde{\Sigma}_{t-1}^{(P,E)})^\top.    (2.49)

Given the fact that $\widehat{\Sigma}_{t-1}^E - (\widetilde{\Sigma}_{t-1}^{(P,E)})^\top = \widehat{\Sigma}_{t-1}^{(P,E)} - (\widehat{\Sigma}_{t-1}^P - \widetilde{\Sigma}_{t-1}^{(P,E)})$, we have

(2.46) = \widetilde{\Sigma}_{t-1}^{(P,E)}(\widehat{\Sigma}_{t-1}^{(P,E)})^{-1}(\widehat{\Sigma}_{t-1}^P - (\widetilde{\Sigma}_{t-1}^{(P,E)})^\top)
\qquad - (\widehat{\Sigma}_{t-1}^P - \widetilde{\Sigma}_{t-1}^{(P,E)})(\widehat{\Sigma}_{t-1}^{(P,E)})^{-1}\widetilde{\Sigma}_{t-1}^{(P,E)}(\widehat{\Sigma}_{t-1}^{(P,E)})^{-1}(\widehat{\Sigma}_{t-1}^P - (\widetilde{\Sigma}_{t-1}^{(P,E)})^\top).    (2.51)

Similarly, $\widehat{\Sigma}_{t-1}^P - \widetilde{\Sigma}_{t-1}^{(P,E)} = \widehat{\Sigma}_{t-1}^{(P,E)} - (\widehat{\Sigma}_{t-1}^E - (\widetilde{\Sigma}_{t-1}^{(P,E)})^\top)$ leads to the following relationship

(2.48) = (\widetilde{\Sigma}_{t-1}^{(P,E)})^\top(\widehat{\Sigma}_{t-1}^{(P,E)})^{-1}(\widehat{\Sigma}_{t-1}^E - \widetilde{\Sigma}_{t-1}^{(P,E)})
\qquad - (\widehat{\Sigma}_{t-1}^E - (\widetilde{\Sigma}_{t-1}^{(P,E)})^\top)(\widehat{\Sigma}_{t-1}^{(P,E)})^{-1}(\widetilde{\Sigma}_{t-1}^{(P,E)})^\top(\widehat{\Sigma}_{t-1}^{(P,E)})^{-1}(\widehat{\Sigma}_{t-1}^E - \widetilde{\Sigma}_{t-1}^{(P,E)}).    (2.53)

Combining (2.51) and (2.49), we have

(2.51) + (2.49) = (\widehat{\Sigma}_{t-1}^P - \widetilde{\Sigma}_{t-1}^{(P,E)})(\widehat{\Sigma}_{t-1}^{(P,E)})^{-1}(\widehat{\Sigma}_{t-1}^E - \widetilde{\Sigma}_{t-1}^{(P,E)})(\widehat{\Sigma}_{t-1}^{(P,E)})^{-1}(\widehat{\Sigma}_{t-1}^P - \widetilde{\Sigma}_{t-1}^{(P,E)})^\top.    (2.54)

Combining (2.53) and (2.47), we have

(2.53) + (2.47) = (\widehat{\Sigma}_{t-1}^E - (\widetilde{\Sigma}_{t-1}^{(P,E)})^\top)(\widehat{\Sigma}_{t-1}^{(P,E)})^{-1}(\widehat{\Sigma}_{t-1}^P - \widetilde{\Sigma}_{t-1}^{(P,E)})^\top(\widehat{\Sigma}_{t-1}^{(P,E)})^{-1}(\widehat{\Sigma}_{t-1}^E - \widetilde{\Sigma}_{t-1}^{(P,E)}).    (2.55)

It is easy to check that $(2.54) + (2.55) = (\widehat{\Sigma}_{t-1}^P - \widetilde{\Sigma}_{t-1}^{(P,E)})^\top(\widehat{\Sigma}_{t-1}^{(P,E)})^{-1}(\widehat{\Sigma}_{t-1}^E - (\widetilde{\Sigma}_{t-1}^{(P,E)})^\top)^\top$. Combining with (2.51) and (2.53), we have

\Delta_{t-1}^{(P,E)} = (\widehat{\Sigma}_{t-1}^P - \widetilde{\Sigma}_{t-1}^{(P,E)})(\widehat{\Sigma}_{t-1}^{(P,E)})^{-1}(\widehat{\Sigma}_{t-1}^E - (\widetilde{\Sigma}_{t-1}^{(P,E)})^\top)^\top + \widetilde{\Sigma}_{t-1}^{(P,E)}.    (2.56)

Finally we show that $\widehat{\Sigma}_{t-1}^{(P,E)}$ is positive definite, which guarantees that $F_{t-1}^E\widehat{\Sigma}_{t-1}^{(P,E)}\widehat{\Sigma}_{t-1}^{(P,E)}(F_{t-1}^E)^\top$ is of rank $k$. To see this, we have

\widehat{\Sigma}_t^{(P,E)} = \mathbb{E}\big[\big(e^E_t - e^P_t\big)\big(e^E_t - e^P_t\big)^\top\big] = \mathbb{E}[s_{t-1}(s_{t-1})^\top] + K_t^PG^P(K_t^P)^\top + K_t^EG^E(K_t^E)^\top,

with $s_{t-1}$ defined as

s_{t-1} = (I - K_t^EH_t^E)A_{t-1}\big((I - J_{t-1}^EF_{t-1}^P)e_{t-1}^E + J_{t-1}^EF_{t-1}^Pe_{t-1}^P\big) + (K_t^EH_t^E - I)\Gamma_{t-1}w_{t-1}
\qquad - (I - K_t^PH_t^P)A_{t-1}\big((I - J_{t-1}^PF_{t-1}^E)e_{t-1}^P + J_{t-1}^PF_{t-1}^Ee_{t-1}^E\big) - (K_t^PH_t^P - I)\Gamma_{t-1}w_{t-1}.

It is easy to see that $\mathbb{E}[s_{t-1}(s_{t-1})^\top]$ is positive semi-definite. $K_t^PG^P(K_t^P)^\top$ is positive definite since $K_t^P$ has rank $n$ and $G^P$ is positive definite. Similarly, $K_t^EG^E(K_t^E)^\top$ is also positive definite. ∎

2.2 Conditional Expectation and Tower Property

In partially observable game settings, players face the difficult task of incrementally estimating unknown quantities through information filtering, and then using those estimates to make informed decisions. In the linear-quadratic framework, each player $i$ needs to determine

\mathbb{E}\big[x_t \,\big|\, \mathcal{H}_t^i\big], \quad \text{and} \quad \mathbb{E}\big[x_t^\top O_tx_t \,\big|\, \mathcal{H}_t^i\big],    (2.57)

for any given matrix $O_t \in \mathbb{R}^{n \times n}$. This requires projection of the unknown quantity onto the space spanned by the information filtration $\mathcal{H}_t^i$. Given that $\mathcal{H}_t^i$ contains information on the opponent's action $u_{t-1}^j$, the critical challenge boils down to how to utilize this information so that the conditional expectation can be calculated in a valid incremental form that facilitates further analysis and renders the game amenable to solution by dynamic programming.

Theorem 2.4 provides an explicit formula to calculate (2.57) in an incremental format:

\mathbb{E}\big[x_t \,\big|\, \mathcal{H}_t^i\big] = \widehat{x}_t^i, \quad \text{and} \quad \mathbb{E}\big[x_t^\top O_tx_t \,\big|\, \mathcal{H}_t^i\big] = (\widehat{x}^i_t)^\top O_t\widehat{x}^i_t + \operatorname{Tr}\big(O_t\widehat{\Sigma}_t^i\big),    (2.58)

with $\widehat{x}_t^i$ and $\widehat{\Sigma}_t^i$ following the explicit recursive formats in (2.21g) and (2.21h), respectively.
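In code, evaluating (2.58) from the sufficient statistic $(\widehat{x}_t^i, \widehat{\Sigma}_t^i)$ is immediate; the following sketch (function name ours) is the quadratic-form-plus-trace identity used throughout the dynamic programming arguments.

```python
import numpy as np

def conditional_moments(xhat, Sigma, O):
    """Evaluate (2.58): E[x_t | H_t^i] and E[x_t^T O_t x_t | H_t^i]."""
    mean = xhat
    quad = xhat @ O @ xhat + np.trace(O @ Sigma)
    return mean, quad
```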

Indeed, a similar setup was first studied in [13], in which zero-sum linear-quadratic dynamic games with partial observations and asymmetric information are considered. The players' initial state estimates and their measurements are private information, but each player is able to observe his opponent's past control inputs, so the players' past controls are shared information. However, when the conditional expectation $\mathbb{E}[\,\cdot \,|\, \mathcal{H}_t^i]$ is calculated, the recursive format proposed in [13] follows the single-agent Bayes formula (see Equations (11)-(20) in [13] or, similarly, (2.13a)-(2.13e)) and does not utilize the observable information of the opponent's past controls to improve the state estimation, leading to an incorrect formula; hence the tower property fails to hold, let alone the DPP.

To be mathematically more concrete, we do a sanity check to show that the tower property holds when using the recursive expressions in (2.21a)-(2.21k). Namely, from player $P$'s perspective, we have

\mathbb{E}\left[\mathbb{E}\left[x_t^\top Q_t^Px_t \,\middle|\, \mathcal{H}_t^P\right] \,\middle|\, \mathcal{H}_{t-1}^P\right] = \mathbb{E}\left[x_t^\top Q_t^Px_t \,\middle|\, \mathcal{H}_{t-1}^P\right]    (2.59)

holds when using (2.21a)-(2.21k) to unwind the conditional expectations. To do so, we will first calculate both the LHS and the RHS of (2.59) using (2.21a)-(2.21k), and then match all the terms to prove that the LHS equals the RHS. Finally, we will show that the tower property fails to hold when using the formulas in Equations (11)-(20) of [13] to unwind the conditional expectations.

Calculations using (2.21a)-(2.21k).

For the LHS of (2.59), by (2.21g),

\widehat{x}_T^P = \big(\widehat{x}_T^P\big)^- + K_T^P\left[z_T^P - H_T^P\big(\widehat{x}_T^P\big)^-\right]
= \big(A_{T-1}(\widehat{x}_{T-1}^P)^+ + B^P_{T-1}u^P_{T-1} + B^E_{T-1}u^E_{T-1}\big) + K_T^Pw_T^P
\qquad + K_T^PH_T^P\left[A_{T-1}\big(x_{T-1} - \big(\widehat{x}_{T-1}^P\big)^+\big) + \Gamma_{T-1}w_{T-1}\right],    (2.60)

where the second equality in (2.60) holds by the definition of $z_T^P$ given in (2.3), and by (2.21d). We will just consider the case when $F_t^P$ has rank $m < n$ and $F_t^E$ has rank $k < n$ for all $t = 0, 1, \ldots, T-1$, as the other cases will follow the same logic. Define $\Pi_{T-1}^P := \big(\widehat{\Sigma}_{T-1}^P - \widetilde{\Sigma}_{T-1}^{(P,E)}\big)\big(\widehat{\Sigma}_{T-1}^{(P,E)}\big)^{-1}$; then (2.28) becomes

\Pi_{T-1}^P = J_{T-1}^PF_{T-1}^E.    (2.61)

We can then rewrite (2.21b) as

(\widehat{x}_{T-1}^P)^+ = (I - \Pi_{T-1}^P)\widehat{x}_{T-1}^P + \Pi_{T-1}^P\widehat{x}_{T-1}^E = (I - \Pi_{T-1}^P)\widehat{x}_{T-1}^P + \Pi_{T-1}^P(\widehat{x}_{T-1}^P - e_{T-1}^P + e_{T-1}^E).

Using this equation in (2.60) we obtain

\widehat{x}_T^P = (A_{T-1} + B^P_{T-1}F_{T-1}^P + B^E_{T-1}F_{T-1}^E)\widehat{x}_{T-1}^P + L_1e_{T-1}^E + L_2e_{T-1}^P + K_T^Pw_T^P + K_T^PH_T^P\Gamma_{T-1}w_{T-1},    (2.62)

with

L1\displaystyle L_{1} :=\displaystyle:= AT1ΠT1P+BT1EFT1EKTPHTPAT1ΠT1P,\displaystyle A_{T-1}\Pi_{T-1}^{P}+B^{E}_{T-1}F_{T-1}^{E}-K_{T}^{P}H_{T}^{P}A_{T-1}\Pi_{T-1}^{P}, (2.63)
L2\displaystyle L_{2} :=\displaystyle:= AT1ΠT1PBT1EFT1EKTPHTPAT1(IΠT1P).\displaystyle-A_{T-1}\Pi_{T-1}^{P}-B^{E}_{T-1}F_{T-1}^{E}-K_{T}^{P}H_{T}^{P}A_{T-1}(I-\Pi_{T-1}^{P}). (2.64)

Then substituting (2.62) into the LHS of (2.59), we have

𝔼[𝔼[xTQTPxT|TP]|T1P]=𝔼[(x^TP)QTPx^TP|T1P]+Tr(QTPΣ^TP)\displaystyle\mathbb{E}\left[\left.\mathbb{E}\left[\left.x_{T}^{\top}Q_{T}^{P}x_{T}\right|\mathcal{H}_{T}^{P}\right]\right|\mathcal{H}_{T-1}^{P}\right]=\mathbb{E}\left[\left.(\widehat{x}_{T}^{P})^{\top}Q_{T}^{P}\widehat{x}_{T}^{P}\right|\mathcal{H}_{T-1}^{P}\right]+\operatorname{Tr}(Q_{T}^{P}\widehat{\Sigma}_{T}^{P}) (2.65)
=\displaystyle= (x^T1P)(AT1+BT1PFT1P+BT1EFT1E)QTP(AT1+BT1PFT1P+BT1EFT1E)x^T1P\displaystyle(\widehat{x}_{T-1}^{P})^{\top}\left(A_{T-1}+B^{P}_{T-1}F_{T-1}^{P}+B^{E}_{T-1}F_{T-1}^{E}\right)^{\top}Q_{T}^{P}\left(A_{T-1}+B^{P}_{T-1}F_{T-1}^{P}+B^{E}_{T-1}F_{T-1}^{E}\right)\widehat{x}_{T-1}^{P}
+Tr(L1QTPL1Σ^T1E)+Tr(L2QTPL2Σ^T1P)+2Tr(L1QTPL2Σ~T1(P,E))\displaystyle+\operatorname{Tr}(L_{1}^{\top}Q_{T}^{P}L_{1}\widehat{\Sigma}_{T-1}^{E})+\operatorname{Tr}(L_{2}^{\top}Q_{T}^{P}L_{2}\widehat{\Sigma}_{T-1}^{P})+2\operatorname{Tr}(L_{1}^{\top}Q_{T}^{P}L_{2}\widetilde{\Sigma}_{T-1}^{(P,E)})
+Tr((KTP)QTPKTPGP)+Tr(ΓT1(HTP)(KTP)QTPKTPHTPΓT1W)+Tr(QTPΣ^TP).\displaystyle+\operatorname{Tr}((K_{T}^{P})^{\top}Q_{T}^{P}K_{T}^{P}G^{P})+\operatorname{Tr}(\Gamma_{T-1}^{\top}(H_{T}^{P})^{\top}(K_{T}^{P})^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}\Gamma_{T-1}W)+\operatorname{Tr}(Q_{T}^{P}\widehat{\Sigma}_{T}^{P}).

For the RHS of (2.59), we have by expanding xTx_{T} directly,

\displaystyle\mathbb{E}\left[\left.x_{T}^{\top}Q_{T}^{P}x_{T}\right|\mathcal{H}_{T-1}^{P}\right] \qquad (2.66)
=\displaystyle= (x^T1P)(AT1+BT1PFT1P+BT1EFT1E)QTP(AT1+BT1PFT1P+BT1EFT1E)x^T1P\displaystyle(\widehat{x}_{T-1}^{P})^{\top}\left(A_{T-1}+B^{P}_{T-1}F_{T-1}^{P}+B^{E}_{T-1}F_{T-1}^{E}\right)^{\top}Q_{T}^{P}\left(A_{T-1}+B^{P}_{T-1}F_{T-1}^{P}+B^{E}_{T-1}F_{T-1}^{E}\right)\widehat{x}_{T-1}^{P}
+Tr(ΓT1QTΓT1W)+Tr((AT1+BT1EFT1E)QTP(AT1+BT1EFT1E)Σ^T1P)\displaystyle+\operatorname{Tr}(\Gamma_{T-1}^{\top}Q_{T}\Gamma_{T-1}W)+\operatorname{Tr}\big{(}(A_{T-1}+B^{E}_{T-1}F_{T-1}^{E})^{\top}Q_{T}^{P}(A_{T-1}+B^{E}_{T-1}F_{T-1}^{E})\widehat{\Sigma}_{T-1}^{P}\big{)}
+Tr((FT1E)(BT1E)QTPBT1EFT1EΣ^T1E)2Tr((AT1+BT1EFT1E)QTPBT1EFT1EΣ~T1(E,P)).\displaystyle+\operatorname{Tr}\big{(}(F_{T-1}^{E})^{\top}(B^{E}_{T-1})^{\top}Q_{T}^{P}B^{E}_{T-1}F_{T-1}^{E}\widehat{\Sigma}_{T-1}^{E}\big{)}-2\operatorname{Tr}\big{(}(A_{T-1}+B^{E}_{T-1}F_{T-1}^{E})^{\top}Q_{T}^{P}B^{E}_{T-1}F_{T-1}^{E}\widetilde{\Sigma}_{T-1}^{(E,P)}\big{)}.

The proof that (2.65) is equivalent to (2.66) is deferred to Appendix A.

Calculations using the result in [13].

In [13], the authors used the following recursive formulas which are essentially the same as the single agent case (see Theorem 2.3):

xt𝒩(x^ti,Σ^ti),\displaystyle x_{t}\sim\mathcal{N}(\widehat{x}_{t}^{i},\widehat{\Sigma}_{t}^{i}), (2.67)

where \widehat{x}_{t}^{i} and \widehat{\Sigma}_{t}^{i} are updated according to (2.13a)-(2.13e). Now we use the recursive formulas listed in (2.13a)-(2.13e) to calculate the LHS and RHS of (2.59). For the LHS, by direct calculation,

𝔼[(x^TP)QTPx^TP|T1P]+Tr(QTPΣ^TP)\displaystyle\mathbb{E}\left.\left[(\widehat{x}^{P}_{T})^{\top}Q_{T}^{P}\widehat{x}^{P}_{T}\right|\mathcal{H}^{P}_{T-1}\right]+\operatorname{Tr}(Q_{T}^{P}\widehat{\Sigma}_{T}^{P}) (2.68)
=\displaystyle= (x^T1P)AT1QTPAT1x^T1P+(x^T1P)(FT1P)(BT1P)QTPBT1PFT1Px^T1P\displaystyle(\widehat{x}_{T-1}^{P})^{\top}A_{T-1}^{\top}{Q}^{P}_{T}A_{T-1}\widehat{x}_{T-1}^{P}+(\widehat{x}_{T-1}^{P})^{\top}(F^{P}_{T-1})^{\top}(B_{T-1}^{P})^{\top}{Q}_{T}^{P}B_{T-1}^{P}F^{P}_{T-1}\widehat{x}_{T-1}^{P}
+(FT1Ex^T1P)((BT1E)QTPBT1E)FT1Ex^T1P+Tr((FT1E)(BT1E)QTPBT1EFT1EΣ^T1(P,E))\displaystyle+(F_{T-1}^{E}\widehat{x}_{T-1}^{P})^{\top}((B_{T-1}^{E})^{\top}{Q}_{T}^{P}B_{T-1}^{E})F_{T-1}^{E}\widehat{x}_{T-1}^{P}+\operatorname{Tr}((F_{T-1}^{E})^{\top}(B_{T-1}^{E})^{\top}{Q}_{T}^{P}B_{T-1}^{E}F_{T-1}^{E}\widehat{\Sigma}_{T-1}^{(P,E)})
+2(FT1Px^T1P)(BT1P)QTPAT1x^T1P+2(FT1Px^T1P)(BT1P)QTPBT1EFT1Ex^T1P\displaystyle+2(F_{T-1}^{P}\widehat{x}_{T-1}^{P})^{\top}(B_{T-1}^{P})^{\top}{Q}_{T}^{P}A_{T-1}\widehat{x}_{T-1}^{P}+2(F_{T-1}^{P}\widehat{x}_{T-1}^{P})^{\top}(B_{T-1}^{P})^{\top}{Q}_{T}^{P}B_{T-1}^{E}F_{T-1}^{E}\widehat{x}_{T-1}^{P}
+2(FT1Ex^T1E)(BT1E)QTPAT1x^T1P2Tr((BT1EFT1E)QTPKTPHTPAT1Σ~T1(P,E))\displaystyle+2(F_{T-1}^{E}\widehat{x}_{T-1}^{E})^{\top}(B_{T-1}^{E})^{\top}{Q}_{T}^{P}A_{T-1}\widehat{x}^{P}_{T-1}-2\operatorname{Tr}((B_{T-1}^{E}F_{T-1}^{E})^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}A_{T-1}\widetilde{\Sigma}_{T-1}^{(P,E)})
+2Tr((BT1EFT1E)QTPKTPHTPAT1Σ^T1P)+Tr(ΓT1(HTP)(KTP)QTPKTPHTPΓT1W)\displaystyle+2\operatorname{Tr}((B_{T-1}^{E}F_{T-1}^{E})^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P})+\operatorname{Tr}\left(\Gamma_{T-1}^{\top}(H_{T}^{P})^{\top}(K_{T}^{P})^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}\Gamma_{T-1}W\right)
+Tr((KTP)QTPKTPGP)+Tr(AT1(HTP)(KTP)QTPKTPHTPAT1Σ^T1P)+Tr(QTPΣ^TP).\displaystyle+\operatorname{Tr}\left((K_{T}^{P})^{\top}Q_{T}^{P}K_{T}^{P}G^{P}\right)+\operatorname{Tr}\left(A_{T-1}^{\top}(H_{T}^{P})^{\top}(K_{T}^{P})^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P}\right)+\operatorname{Tr}(Q_{T}^{P}\widehat{\Sigma}_{T}^{P}).

On the other hand,

𝔼[xTQTPxT|T1P]\displaystyle\mathbb{E}\left.\left[x_{T}^{\top}Q_{T}^{P}x_{T}\right|\mathcal{H}^{P}_{T-1}\right] (2.69)
=\displaystyle= (x^T1P)AT1QTPAT1x^T1P+(FT1Px^T1P)(BT1P)QTPBT1P(FT1Px^T1P)\displaystyle(\widehat{x}_{T-1}^{P})^{\top}A_{T-1}^{\top}{Q}^{P}_{T}A_{T-1}\widehat{x}_{T-1}^{P}+(F_{T-1}^{P}\widehat{x}_{T-1}^{P})^{\top}(B_{T-1}^{P})^{\top}{Q}_{T}^{P}B_{T-1}^{P}(F_{T-1}^{P}\widehat{x}_{T-1}^{P})
+(FT1Ex^T1P)((BT1E)QTPBT1E)FT1Ex^T1P+Tr(((BT1EFT1E)QTPBT1EFT1E)Σ^T1(P,E))\displaystyle+(F_{T-1}^{E}\widehat{x}_{T-1}^{P})^{\top}((B_{T-1}^{E})^{\top}{Q}_{T}^{P}B_{T-1}^{E})F_{T-1}^{E}\widehat{x}_{T-1}^{P}+\operatorname{Tr}(((B_{T-1}^{E}F_{T-1}^{E})^{\top}{Q}_{T}^{P}B_{T-1}^{E}F_{T-1}^{E})\widehat{\Sigma}_{T-1}^{(P,E)})
+2uT1(BT1P)QTPAT1x^T1P+2uT1(BT1P)QTPBT1EFT1Ex^T1P\displaystyle+2u_{T-1}^{\top}(B_{T-1}^{P})^{\top}{Q}_{T}^{P}A_{T-1}\widehat{x}_{T-1}^{P}+2u_{T-1}^{\top}(B_{T-1}^{P})^{\top}{Q}_{T}^{P}B_{T-1}^{E}F_{T-1}^{E}\widehat{x}_{T-1}^{P}
+2(FT1Ex^T1P)(BT1E)QTPAT1(x^T1P)+2Tr((BT1EFT1E)QTPAT1Σ^T1P)\displaystyle+2(F_{T-1}^{E}\widehat{x}_{T-1}^{P})^{\top}(B_{T-1}^{E})^{\top}{Q}_{T}^{P}A_{T-1}(\widehat{x}^{P}_{T-1})+2\operatorname{Tr}((B_{T-1}^{E}F_{T-1}^{E})^{\top}{Q}_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P})
2Tr((BT1EFT1E)QTPAT1Σ~T1(P,E))+Tr(AT1QTPAT1Σ^T1P)+Tr(ΓT1QTPΓT1W).\displaystyle-2\operatorname{Tr}((B_{T-1}^{E}F_{T-1}^{E})^{\top}{Q}_{T}^{P}A_{T-1}\widetilde{\Sigma}_{T-1}^{(P,E)})+\operatorname{Tr}(A_{T-1}^{\top}{Q}_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P})+\operatorname{Tr}\left(\Gamma_{T-1}^{\top}{Q}_{T}^{P}\Gamma_{T-1}W\right).

By routine calculations similar to those used for Step 3 in Appendix A, we see that (2.68) and (2.69) differ from each other by:

2Tr((FtE)(BtE)Qt+1P(Kt+1PHt+1PI)At(Σ~t(E,P)Σ^tP)).\displaystyle 2\operatorname{Tr}\left((F_{t}^{E})^{\top}(B_{t}^{E})^{\top}Q_{t+1}^{P}(K_{t+1}^{P}H_{t+1}^{P}-I)A_{t}\left(\widetilde{\Sigma}_{t}^{(E,P)}-\widehat{\Sigma}_{t}^{P}\right)\right). (2.70)

Hence the recursive formulas (2.13a)-(2.13e) adopted in [13] do not lead to the correct conditional expectation, and consequently neither the tower property nor the DPP holds.

3 Equilibrium Solution

Having established the information corrections in the updating scheme, we now use them in this section to derive the DPP and the Nash equilibrium.

3.1 Dynamic Programming Principle

Although neither player has access to the true state, and their controls depend on their state estimates, which introduces extra correlated randomness, we are still able to derive the individual DPP for the two-player general-sum linear-quadratic Gaussian game under a fixed linear and Markovian strategy from the opponent. This is because the sufficient statistics derived in Theorem 2.4 lead to the valid tower property given in (2.59), which equips us with sufficient tools to prove the DPP.

To start, denote the value function of player ii (i=P,Ei=P,E), under a fixed strategy Fj:={Ftj}t=0T1F^{j}:=\{F_{t}^{j}\}_{t=0}^{T-1} from player jj (jij\neq i) and at any given time 0tT10\leq t\leq T-1, as

\displaystyle V^{i}_{t}(\widehat{x}_{t}^{i};F^{j})=\min_{\{u_{s}^{i}\}_{s=t}^{T-1}}\mathbb{E}\left[\left.x_{T}^{\top}Q^{i}_{T}x_{T}+\sum_{s=t}^{T-1}\left((u^{i}_{s})^{\top}R_{s}^{i}u^{i}_{s}+x_{s}^{\top}Q^{i}_{s}x_{s}\right)\right|\mathcal{H}_{t}^{i}\right] \qquad (3.1)

subject to (2.1) and FjF^{j}, with the terminal value

VTi(x^Ti;Fj)=VTi(x^Ti)=𝔼[xTQTixT|Ti]=(x^Ti)QTix^Ti+Tr(QTiΣ^Ti).\displaystyle V_{T}^{i}\left(\widehat{x}^{i}_{T};F^{j}\right)=V_{T}^{i}\left(\widehat{x}^{i}_{T}\right)=\left.\mathbb{E}\left[x_{T}^{\top}Q_{T}^{i}x_{T}\,\right|\,\mathcal{H}^{i}_{T}\right]=\left(\widehat{x}^{i}_{T}\right)^{\top}Q_{T}^{i}\widehat{x}^{i}_{T}+\operatorname{Tr}\left(Q_{T}^{i}\widehat{\Sigma}_{T}^{i}\right). (3.2)

Now we prove the DPP in the two-player general-sum linear-quadratic Gaussian game under partial observations and asymmetric information.

Theorem 3.1 (Dynamic Programming Principle).

For any given time 0\leq t\leq T-1, the value function V^{i}_{t} for player i (i=P,E), under a fixed policy F^{j}:=\{F^{j}_{t}\}_{t=0}^{T-1} from player j, satisfies

Vti(x^ti;Fj)=minuti𝔼[xtQtixt+(uti)Rtiuti+Vt+1i(x^t+1i;Fj)|ti]\displaystyle V^{i}_{t}(\widehat{x}_{t}^{i};F^{j})=\min_{u^{i}_{t}}\mathbb{E}\left[\left.x_{t}^{\top}Q_{t}^{i}x_{t}+(u^{i}_{t})^{\top}R_{t}^{i}u^{i}_{t}+{V}^{i}_{t+1}\left(\widehat{x}_{t+1}^{i};F^{j}\right)\right|\mathcal{H}^{i}_{t}\right] (3.3)

with j=P,Ej=P,E, iji\neq j, and terminal value VTi(x^Ti;Fj)V_{T}^{i}(\widehat{x}_{T}^{i};F^{j}) given in (3.2).

Proof.

We take the perspective of player PP and the result for player EE follows the same logic. By definition of the value function in (3.1) we have,

VtP(x^tP;FE)\displaystyle V^{P}_{t}(\widehat{x}_{t}^{P};F^{E}) (3.6)
\displaystyle= \min_{\{u_{s}^{P}\}_{s=t}^{T-1}}\mathbb{E}\left.\left[x_{T}^{\top}Q^{P}_{T}x_{T}+\sum_{s=t}^{T-1}\left((u^{P}_{s})^{\top}R_{s}^{P}u^{P}_{s}+x_{s}^{\top}Q^{P}_{s}x_{s}\right)\right|\mathcal{H}_{t}^{P}\right]
=\displaystyle= minutP{𝔼[xtQtPxt+(utP)RtPutP|tP]\displaystyle\min_{u^{P}_{t}}\Bigg{\{}\mathbb{E}\left.\left[x_{t}^{\top}Q_{t}^{P}x_{t}+(u^{P}_{t})^{\top}R_{t}^{P}u^{P}_{t}\right|\mathcal{H}_{t}^{P}\right]
+min{usP}s=t+1T𝔼[xTQTPxT+s=t+1T1((usP)RsPusP+xsQsPxs)|tP]}\displaystyle\quad+\min_{\{u_{s}^{P}\}_{s=t+1}^{T}}\left.\mathbb{E}\left[x_{T}^{\top}Q^{P}_{T}x_{T}+\sum_{s=t+1}^{T-1}\left((u_{s}^{P})^{\top}R_{s}^{P}u_{s}^{P}+x_{s}^{\top}Q^{P}_{s}x_{s}\right)\right|\mathcal{H}_{t}^{P}\right]\Bigg{\}}
=\displaystyle= minutP{𝔼[xtQtPxt+(utP)RtPutP|tP]\displaystyle\min_{u^{P}_{t}}\Bigg{\{}\mathbb{E}\left.\left[x_{t}^{\top}Q_{t}^{P}x_{t}+(u^{P}_{t})^{\top}R_{t}^{P}u^{P}_{t}\right|\mathcal{H}_{t}^{P}\right]
+min{usP}s=t+1T𝔼[𝔼[xTQTPxT+s=t+1T1((usP)RsPusP+xsQsPxs)|t+1P]|tP]}\displaystyle\quad+\min_{\{u_{s}^{P}\}_{s=t+1}^{T}}\mathbb{E}\left[\left.\mathbb{E}\left[\left.x_{T}^{\top}Q^{P}_{T}x_{T}+\sum_{s=t+1}^{T-1}\left((u_{s}^{P})^{\top}R_{s}^{P}u_{s}^{P}+x_{s}^{\top}Q^{P}_{s}x_{s}\right)\right|\mathcal{H}_{t+1}^{P}\right]\right|\mathcal{H}_{t}^{P}\right]\Bigg{\}}
=\displaystyle= minutP{𝔼[xtQtPxt+(utP)RtPutP|tP]\displaystyle\min_{u^{P}_{t}}\Bigg{\{}\mathbb{E}\left.\left[x_{t}^{\top}Q_{t}^{P}x_{t}+(u^{P}_{t})^{\top}R_{t}^{P}u^{P}_{t}\right|\mathcal{H}_{t}^{P}\right]
+𝔼[min{usP}s=t+1T𝔼[xTQTPxT+s=t+1T1((usP)RsPusP+xsQsPxs)|t+1P]|tP]},\displaystyle\quad+\mathbb{E}\left[\min_{\{u_{s}^{P}\}_{s=t+1}^{T}}\left.\mathbb{E}\left[\left.x_{T}^{\top}Q^{P}_{T}x_{T}+\sum_{s=t+1}^{T-1}\left((u_{s}^{P})^{\top}R_{s}^{P}u_{s}^{P}+x_{s}^{\top}Q^{P}_{s}x_{s}\right)\right|\mathcal{H}_{t+1}^{P}\right]\right|\mathcal{H}_{t}^{P}\right]\Bigg{\}},

where the second equality holds since u^{P}_{t} is adapted to \mathcal{H}_{t}^{P}, the third equality holds by the tower property (2.59), and the last equality holds since u_{s}^{P} is adapted to \mathcal{H}_{s}^{P} for s\geq t+1. Finally, the last expression leads to the DPP (3.3) by the definition of V_{t+1}^{P}. ∎

3.2 Nash Equilibrium

In this section, we show that the Nash equilibrium strategy for the game (2.1)-(2.2)-(2.3)-(2.5)-(2.7) is related to the solution of a coupled Riccati system. Assumption 3.2 states the existence and uniqueness of this solution, and we provide a sufficient condition for this assumption in Remark 3.3.

Assumption 3.2.

There exists a unique solution set FP:={FtP}t=0T1F^{P*}:=\{F_{t}^{P*}\}_{t=0}^{T-1} with FtP𝒜PF_{t}^{P*}\in\mathcal{A}^{P} and FE:={FtE}t=0T1F^{E*}:=\{F_{t}^{E*}\}_{t=0}^{T-1} with FtE𝒜EF_{t}^{E*}\in\mathcal{A}^{E} to the following set of linear matrix equations:

FtP\displaystyle F_{t}^{P*} =\displaystyle= (RtP+(BtP)Ut+1PBtP)1((BtP)Ut+1P(At+BtEFtE)),\displaystyle-(R_{t}^{P}+(B^{P}_{t})^{\top}U_{t+1}^{P*}B^{P}_{t})^{-1}\big{(}(B^{P}_{t})^{\top}U_{t+1}^{P*}(A_{t}+B^{E}_{t}F_{t}^{E*})\big{)}, (3.7)
FtE\displaystyle F_{t}^{E*} =\displaystyle= (RtE+(BtE)Ut+1EBtE)1((BtE)Ut+1E(At+BtPFtP)),\displaystyle-(R_{t}^{E}+(B^{E}_{t})^{\top}U_{t+1}^{E*}B^{E}_{t})^{-1}\big{(}(B^{E}_{t})^{\top}U_{t+1}^{E*}(A_{t}+B^{P}_{t}F_{t}^{P*})\big{)}, (3.8)

where {UtP}t=0T\{U_{t}^{P*}\}_{t=0}^{T} and {UtE}t=0T\{U_{t}^{E*}\}_{t=0}^{T} are obtained recursively backwards from

UtP\displaystyle U_{t}^{P*} =\displaystyle= QtP+(FtP)RtPFtP+(At+BtPFtP+BtEFtE)Ut+1P(At+BtPFtP+BtEFtE),\displaystyle Q_{t}^{P}+(F_{t}^{P*})^{\top}R_{t}^{P}F_{t}^{P*}+\big{(}A_{t}+B^{P}_{t}F_{t}^{P*}+B^{E}_{t}F_{t}^{E*}\big{)}^{\top}U_{t+1}^{P*}\big{(}A_{t}+B^{P}_{t}F_{t}^{P*}+B^{E}_{t}F_{t}^{E*}\big{)}, (3.9)
UtE\displaystyle U_{t}^{E*} =\displaystyle= QtE+(FtE)RtEFtE+(At+BtPFtP+BtEFtE)Ut+1E(At+BtPFtP+BtEFtE),\displaystyle Q_{t}^{E}+(F_{t}^{E*})^{\top}R_{t}^{E}F_{t}^{E*}+\big{(}A_{t}+B^{P}_{t}F_{t}^{P*}+B^{E}_{t}F_{t}^{E*}\big{)}^{\top}U_{t+1}^{E*}\big{(}A_{t}+B^{P}_{t}F_{t}^{P*}+B^{E}_{t}F_{t}^{E*}\big{)}, (3.10)

with terminal conditions UTi=QTiU_{T}^{i*}=Q_{T}^{i} for i=P,Ei=P,E.

Remark 3.3.

A sufficient condition for the unique solvability of (3.7)-(3.8) is the invertibility of the block matrix \Phi_{t}, t=0,1,\cdots,T-1, with the ii-th block given by R_{t}^{i}+(B_{t}^{i})^{\top}U_{t+1}^{i*}B_{t}^{i} and the ij-th block given by (B_{t}^{i})^{\top}U_{t+1}^{i*}B_{t}^{j}, where i,j=P,E and j\neq i. See Remark 6.5 in [2].
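For illustration, the following is a minimal sketch (not the paper's code) of the backward recursion behind Assumption 3.2: at each time step, (3.7)-(3.8) are stacked into a single linear system whose coefficient matrix is the block matrix \Phi_{t} of Remark 3.3, and the updates (3.9)-(3.10) are then applied. Time-invariant parameters and all function and variable names are our own simplifying assumptions.

```python
import numpy as np

def coupled_riccati(A, BP, BE, QP, QE, RP, RE, QTP, QTE, T):
    """Backward recursion for (3.7)-(3.10); a sketch with time-invariant
    A, B^P, B^E, Q^i, R^i for brevity (the time-varying case is analogous)."""
    n, m = BP.shape
    _, k = BE.shape
    UP, UE = QTP.copy(), QTE.copy()              # terminal conditions U_T^i = Q_T^i
    FP, FE = [None] * T, [None] * T
    for t in reversed(range(T)):
        # Stack (3.7)-(3.8) as the linear system Phi_t [F^P; F^E] = rhs,
        # where Phi_t is the block matrix of Remark 3.3.
        Phi = np.block([
            [RP + BP.T @ UP @ BP, BP.T @ UP @ BE],
            [BE.T @ UE @ BP,      RE + BE.T @ UE @ BE],
        ])
        rhs = -np.vstack([BP.T @ UP @ A, BE.T @ UE @ A])
        F = np.linalg.solve(Phi, rhs)
        FP[t], FE[t] = F[:m, :], F[m:, :]
        # Update U_t^i via (3.9)-(3.10) using the closed-loop matrix.
        Acl = A + BP @ FP[t] + BE @ FE[t]
        UP = QP + FP[t].T @ RP @ FP[t] + Acl.T @ UP @ Acl
        UE = QE + FE[t].T @ RE @ FE[t] + Acl.T @ UE @ Acl
    return FP, FE
```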

Using the DPP formula in Theorem 3.1, we have the following result for the Nash equilibrium strategy and the corresponding value function for the game (2.1)-(2.2)-(2.3)-(2.5)-(2.7).

Theorem 3.4.

Suppose Assumptions 2.1 and 3.2 hold. We also assume that both players apply linear strategies. Then the unique Nash equilibrium policy can be expressed, for i=P,E, as

uti(x^ti)=Ftix^ti,\displaystyle u^{i*}_{t}(\widehat{x}_{t}^{i})=F^{i*}_{t}\widehat{x}_{t}^{i}, (3.11)

with FtPF_{t}^{P*} and FtEF_{t}^{E*} given in (3.7) and (3.8). The corresponding optimal value function of player ii is quadratic (0tT)(0\leq t\leq T):

Vti(x^ti;Fj)=(x^ti)Utix^ti+cti,V^{i}_{t}(\widehat{x}_{t}^{i};F^{j*})=(\widehat{x}_{t}^{i})^{\top}U_{t}^{i*}\widehat{x}_{t}^{i}+c^{i*}_{t}, (3.12)

where j=P,Ej=P,E and jij\neq i, the matrices UtP,UtEn×nU_{t}^{P*},U_{t}^{E*}\in\mathbb{R}^{n\times n} are given in (3.9) and (3.10), and the scalars ctP,ctEc^{P*}_{t},c^{E*}_{t}\in\mathbb{R} are given by

cti\displaystyle c_{t}^{i*} =\displaystyle= ct+1i+Tr(QtiΣ^ti)Tr(Ut+1iΣ^t+1i)+Tr((At+BtjFtj)Ut+1i(At+BtjFtj)Σ^ti)\displaystyle c_{t+1}^{i*}+\operatorname{Tr}\left(Q_{t}^{i}\widehat{\Sigma}_{t}^{i}\right)-\operatorname{Tr}\left(U_{t+1}^{i*}\widehat{\Sigma}_{t+1}^{i}\right)+\operatorname{Tr}\big{(}(A_{t}+B_{t}^{j}F_{t}^{j*})^{\top}U_{t+1}^{i*}(A_{t}+B_{t}^{j}F_{t}^{j*})\widehat{\Sigma}_{t}^{i}\big{)} (3.13)
+Tr((Ftj)(Btj)Ut+1iBtjFtjΣ^tj)2Tr((At+BtjFtj)Ut+1iBtjFtjΣ~t(j,i))\displaystyle+\operatorname{Tr}\big{(}(F_{t}^{j*})^{\top}(B_{t}^{j})^{\top}U_{t+1}^{i*}B_{t}^{j}F_{t}^{j*}\widehat{\Sigma}_{t}^{j}\big{)}-2\operatorname{Tr}\big{(}(A_{t}+B_{t}^{j}F_{t}^{j*})^{\top}U_{t+1}^{i*}B_{t}^{j}F_{t}^{j*}\widetilde{\Sigma}_{t}^{(j,i)}\big{)}
+Tr(ΓtUt+1iΓtW).\displaystyle+\operatorname{Tr}(\Gamma_{t}^{\top}U_{t+1}^{i*}\Gamma_{t}W).

The terminal condition for player ii is cTi=Tr(QTiΣ^Ti)c_{T}^{i*}=\operatorname{Tr}\left(Q_{T}^{i}\widehat{\Sigma}_{T}^{i}\right).
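As a complement to Theorem 3.4, the following sketch shows how the scalar correction terms c_{t}^{i*} in (3.13) could be accumulated backward once the equilibrium gains, the matrices U_{t}^{i*}, and the covariance sequences of Theorem 2.4 are available. The covariances are passed in as precomputed arrays, and the time-invariant parameters and all names are our own assumptions.

```python
import numpy as np

def value_constants(A, Bj, Fj, Ui, Qi, QTi, Gamma, W, Sigma_i, Sigma_j, Sigma_ji, T):
    """Backward recursion (3.13) for the scalars c_t^{i*}; a sketch.
    Sigma_i[t], Sigma_j[t], Sigma_ji[t] are the precomputed covariances
    \\hat\\Sigma_t^i, \\hat\\Sigma_t^j, \\tilde\\Sigma_t^{(j,i)} from Theorem 2.4;
    Ui[t] and Fj[t] come from (3.9)-(3.10) and (3.7)-(3.8)."""
    c = np.zeros(T + 1)
    c[T] = np.trace(QTi @ Sigma_i[T])                      # terminal condition
    for t in reversed(range(T)):
        M = A + Bj @ Fj[t]                                  # A_t + B_t^j F_t^{j*}
        c[t] = (c[t + 1]
                + np.trace(Qi @ Sigma_i[t])
                - np.trace(Ui[t + 1] @ Sigma_i[t + 1])
                + np.trace(M.T @ Ui[t + 1] @ M @ Sigma_i[t])
                + np.trace(Fj[t].T @ Bj.T @ Ui[t + 1] @ Bj @ Fj[t] @ Sigma_j[t])
                - 2 * np.trace(M.T @ Ui[t + 1] @ Bj @ Fj[t] @ Sigma_ji[t])
                + np.trace(Gamma.T @ Ui[t + 1] @ Gamma @ W))
    return c
```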

Remark 3.5 (Discussion of linear policies).
  1.

    In the partially observable setting, it is widely recognized that the existence of a Nash equilibrium is not guaranteed if a more general class of policies is considered, as players can mislead their opponents by disclosing false intentions [5, 20].

  2.

    We note that the optimal policies FtPF_{t}^{P*} and FtEF_{t}^{E*} given in (3.7) and (3.8), and the Riccati equations given in (3.9) and (3.10), are the same as the optimal policies and Riccati equations in the case of full observation ([2, Corollary 6.4]). However, the linear-quadratic Gaussian game under partial observation (defined in (2.1)-(2.2)-(2.3)-(2.5)-(2.7)) differs from the linear-quadratic game with full information in [2, Corollary 6.4] in the sense that the Nash equilibrium strategy is linear in the state estimate rather than the true state, and the scalars ctPc_{t}^{P*} and ctEc_{t}^{E*} in the value function involve more terms due to the errors in state estimation.

Proof.

We prove the theorem by backward induction. We take the perspective of player PP and let player EE use the linear strategy FE={FtE}t=0T1F^{E*}=\{F_{t}^{E*}\}_{t=0}^{T-1} defined in (3.8). At time TT, (3.12) holds by the terminal condition given in (3.2). At time T1T-1, by Theorem 3.1, we have the DPP for player PP:

VT1P(x^T1P;FE)=minuT1P𝔼[xT1QT1PxT1+(uT1P)RT1PuT1P+VTP(x^TP;FE)|T1P],V^{P}_{T-1}(\widehat{x}_{T-1}^{P};F^{E*})=\left.\min_{u^{P}_{T-1}}\mathbb{E}\left[x_{T-1}^{\top}Q_{T-1}^{P}x_{T-1}+(u^{P}_{T-1})^{\top}R_{T-1}^{P}u^{P}_{T-1}+{V}^{P}_{T}\left(\widehat{x}_{T}^{P};F^{E*}\right)\right|\mathcal{H}^{P}_{T-1}\right],

and by (2.62),

\widehat{x}_{T}^{P}=(A_{T-1}+B^{E}_{T-1}F_{T-1}^{E*})\widehat{x}_{T-1}^{P}+B^{P}_{T-1}u^{P}_{T-1}+L_{T-1}^{1}e_{T-1}^{E}+L_{T-1}^{2}e_{T-1}^{P}+K_{T}^{P}w_{T}^{P}+K_{T}^{P}H_{T}^{P}\Gamma_{T-1}w_{T-1},

with LT11L_{T-1}^{1} and LT12L_{T-1}^{2} defined as LT11=AT1ΠT1P+BT1EFT1EKTPHTPAT1ΠT1PL_{T-1}^{1}=A_{T-1}\Pi_{T-1}^{P}+B^{E}_{T-1}F_{T-1}^{E*}-K_{T}^{P}H_{T}^{P}A_{T-1}\Pi_{T-1}^{P} and LT12=AT1ΠT1PBT1EFT1EKTPHTPAT1(IΠT1P)L_{T-1}^{2}=-A_{T-1}\Pi_{T-1}^{P}-B^{E}_{T-1}F_{T-1}^{E*}-K_{T}^{P}H_{T}^{P}A_{T-1}(I-\Pi_{T-1}^{P}), where ΠT1P\Pi_{T-1}^{P} is defined as ΠT1P=(Σ^T1PΣ~T1(P,E))(Σ^T1(P,E))1\Pi_{T-1}^{P}=\big{(}\widehat{\Sigma}_{T-1}^{P}-\widetilde{\Sigma}_{T-1}^{(P,E)}\big{)}\big{(}\widehat{\Sigma}_{T-1}^{(P,E)}\big{)}^{-1}. Hence

VT1P(x^T1P;FE)\displaystyle V^{P}_{T-1}(\widehat{x}_{T-1}^{P};F^{E*}) =\displaystyle= minuT1P{(uT1P)RT1PuT1P+(x^T1P)QTPx^T1P+Tr(QT1PΣ^T1P)\displaystyle\min_{u^{P}_{T-1}}\left\{(u^{P}_{T-1})^{\top}R_{T-1}^{P}u^{P}_{T-1}+(\widehat{x}_{T-1}^{P})^{\top}Q_{T}^{P}\widehat{x}_{T-1}^{P}+\operatorname{Tr}\left(Q_{T-1}^{P}\widehat{\Sigma}_{T-1}^{P}\right)\right. (3.14)
+𝔼[VTP((AT1+BT1EFT1E)x^T1P+BT1PuT1P+LT11eT1E+LT12eT1P\displaystyle+\mathbb{E}\left[V_{T}^{P}((A_{T-1}+B^{E}_{T-1}F_{T-1}^{E*})\widehat{x}_{T-1}^{P}+B^{P}_{T-1}u^{P}_{T-1}+L_{T-1}^{1}e_{T-1}^{E}+L_{T-1}^{2}e_{T-1}^{P}\right.
+KTPwTP+KTPHTPΓT1wT1;FE)|T1P]},\displaystyle\left.\left.\left.+K_{T}^{P}w_{T}^{P}+K_{T}^{P}H_{T}^{P}\Gamma_{T-1}w_{T-1};F^{E*})\right|\mathcal{H}^{P}_{T-1}\right]\right\},

Since VTP(x^TP)=(x^TP)QTPx^TP+Tr(QTPΣ^TP)V_{T}^{P}(\widehat{x}_{T}^{P})=(\widehat{x}_{T}^{P})^{\top}Q_{T}^{P}\widehat{x}_{T}^{P}+\operatorname{Tr}(Q_{T}^{P}\widehat{\Sigma}_{T}^{P}), we have

VT1P(x^T1P;FE)=minuT1P{(uT1P)RT1PuT1P+(x^T1P)QT1Px^T1P+Tr(QT1PΣ^T1P)\displaystyle V^{P}_{T-1}(\widehat{x}_{T-1}^{P};F^{E*})=\min_{u^{P}_{T-1}}\left\{(u^{P}_{T-1})^{\top}R_{T-1}^{P}u^{P}_{T-1}+(\widehat{x}_{T-1}^{P})^{\top}Q_{T-1}^{P}\widehat{x}_{T-1}^{P}+\operatorname{Tr}\big{(}Q_{T-1}^{P}\widehat{\Sigma}_{T-1}^{P}\big{)}\right.
+Tr(QTPΣ^TP)+𝔼[((AT1+BT1EFT1E)x^T1P+BT1PuT1P+LT11eT1E+LT12eT1P\displaystyle\qquad+\operatorname{Tr}\big{(}Q_{T}^{P}\widehat{\Sigma}_{T}^{P}\big{)}+\mathbb{E}\left[\big{(}(A_{T-1}+B^{E}_{T-1}F_{T-1}^{E*})\widehat{x}_{T-1}^{P}+B^{P}_{T-1}u^{P}_{T-1}+L_{T-1}^{1}e_{T-1}^{E}+L_{T-1}^{2}e_{T-1}^{P}\right.
+KTPwTP+KTPHTPΓT1wT1)QTP((AT1+BT1EFT1E)x^T1P+BT1PuT1P\displaystyle\qquad+K_{T}^{P}w_{T}^{P}+K_{T}^{P}H_{T}^{P}\Gamma_{T-1}w_{T-1}\big{)}^{\top}Q_{T}^{P}\big{(}(A_{T-1}+B^{E}_{T-1}F_{T-1}^{E*})\widehat{x}_{T-1}^{P}+B^{P}_{T-1}u^{P}_{T-1}
+LT11eT1E+LT12eT1P+KTPwTP+KTPHTPΓT1wT1)|T1P]}.\displaystyle\left.\left.\left.\qquad+L_{T-1}^{1}e_{T-1}^{E}+L_{T-1}^{2}e_{T-1}^{P}+K_{T}^{P}w_{T}^{P}+K_{T}^{P}H_{T}^{P}\Gamma_{T-1}w_{T-1}\big{)}\right|\mathcal{H}^{P}_{T-1}\right]\right\}. (3.15)

Expanding terms in the expectation, (3.15) becomes

VT1P(x^T1P;FE)\displaystyle V^{P}_{T-1}(\widehat{x}_{T-1}^{P};F^{E*}) (3.16)
=\displaystyle= minuT1P{(uT1P)(RT1P+(BT1P)QTPBT1P)uT1P+2(x^T1P)(AT1+BT1EFT1E)QTPBT1PuT1P}\displaystyle\min_{u^{P}_{T-1}}\Big{\{}(u^{P}_{T-1})^{\top}(R_{T-1}^{P}+(B^{P}_{T-1})^{\top}Q_{T}^{P}B^{P}_{T-1})u^{P}_{T-1}+2(\widehat{x}_{T-1}^{P})^{\top}\big{(}A_{T-1}+B^{E}_{T-1}F_{T-1}^{E*}\big{)}^{\top}Q_{T}^{P}B^{P}_{T-1}u^{P}_{T-1}\Big{\}}
+(x^T1P)(QT1P+(AT1+BT1EFT1E)QTP(AT1+BT1EFT1E))x^T1P+Tr(QTPΣ^TP)\displaystyle+(\widehat{x}_{T-1}^{P})^{\top}\Big{(}Q_{T-1}^{P}+\big{(}A_{T-1}+B^{E}_{T-1}F_{T-1}^{E*}\big{)}^{\top}Q_{T}^{P}\big{(}A_{T-1}+B^{E}_{T-1}F_{T-1}^{E*}\big{)}\Big{)}\widehat{x}_{T-1}^{P}+\operatorname{Tr}\big{(}Q_{T}^{P}\widehat{\Sigma}_{T}^{P}\big{)}
+Tr(QT1PΣ^T1P)+Tr((LT11)QTPLT11Σ^T1E)+Tr((LT12)QTPLT12Σ^T1P)\displaystyle+\operatorname{Tr}\big{(}Q_{T-1}^{P}\widehat{\Sigma}_{T-1}^{P}\big{)}+\operatorname{Tr}((L_{T-1}^{1})^{\top}Q_{T}^{P}L_{T-1}^{1}\widehat{\Sigma}_{T-1}^{E})+\operatorname{Tr}((L_{T-1}^{2})^{\top}Q_{T}^{P}L_{T-1}^{2}\widehat{\Sigma}_{T-1}^{P})
+2Tr((LT11)QTPLT12Σ~T1(P,E))+Tr(ΓT1(HTP)(KTP)QTPKTPHTPΓT1W)\displaystyle+2\operatorname{Tr}((L_{T-1}^{1})^{\top}Q_{T}^{P}L_{T-1}^{2}\widetilde{\Sigma}_{T-1}^{(P,E)})+\operatorname{Tr}\big{(}\Gamma_{T-1}^{\top}(H_{T}^{P})^{\top}(K_{T}^{P})^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}\Gamma_{T-1}W\big{)}
+Tr((KTP)QTPKTPGP).\displaystyle+\operatorname{Tr}\big{(}(K_{T}^{P})^{\top}Q_{T}^{P}K_{T}^{P}G^{P}\big{)}.

We note that all the constant terms are independent of uT1Pu_{T-1}^{P}, since Σ^TP\widehat{\Sigma}_{T}^{P}, Σ~T1(P,E)\widetilde{\Sigma}_{T-1}^{(P,E)}, and Σ^T1P\widehat{\Sigma}_{T-1}^{P} are independent of the policy FT1PF_{T-1}^{P}. Thus these constant terms will not be involved in the minimization problem. Applying the first order condition to the minimization part in (3.16) leads to

uT1P=(RT1P+(BT1P)QTPBT1P)1(BT1P)QTP(AT1+BT1EFT1E)x^T1P=FT1Px^T1P.u^{P*}_{T-1}=-(R_{T-1}^{P}+(B^{P}_{T-1})^{\top}Q_{T}^{P}B^{P}_{T-1})^{-1}(B^{P}_{T-1})^{\top}Q_{T}^{P}(A_{T-1}+B^{E}_{T-1}F_{T-1}^{E*})\widehat{x}_{T-1}^{P}=F_{T-1}^{P*}\widehat{x}_{T-1}^{P}.

Similarly, we can derive the optimal policy of the player EE when fixing player P’s strategy FT1PF_{T-1}^{P*}:

uT1E=(RT1E+(BT1E)QTEBT1E)1(BT1E)QTE(AT1+BT1PFT1P)x^T1E=FT1Ex^T1E.u^{E*}_{T-1}=-(R_{T-1}^{E}+(B^{E}_{T-1})^{\top}Q_{T}^{E}B^{E}_{T-1})^{-1}(B^{E}_{T-1})^{\top}Q_{T}^{E}(A_{T-1}+B^{P}_{T-1}F_{T-1}^{P*})\widehat{x}_{T-1}^{E}=F_{T-1}^{E*}\widehat{x}_{T-1}^{E}.

Substituting uT1P=FT1Px^T1Pu^{P*}_{T-1}=F_{T-1}^{P*}\widehat{x}_{T-1}^{P} into (3.16) we obtain the optimal value function given as

VT1P(x^T1P;FE)\displaystyle V^{P}_{T-1}(\widehat{x}_{T-1}^{P};F^{E*}) =\displaystyle= (x^T1P)(QT1P+(FT1P)RT1PFT1P+(AT1+BT1PFT1P+BT1EFT1E)\displaystyle(\widehat{x}_{T-1}^{P})^{\top}\Big{(}Q_{T-1}^{P}+(F^{P*}_{T-1})^{\top}R_{T-1}^{P}F^{P*}_{T-1}+\big{(}A_{T-1}+B^{P}_{T-1}F_{T-1}^{P*}+B^{E}_{T-1}F_{T-1}^{E*}\big{)}^{\top}\cdot (3.17)
QTP(AT1+BT1PFT1P+BT1EFT1E))x^T1P+Tr(QT1PΣ^T1P)\displaystyle Q_{T}^{P}\big{(}A_{T-1}+B^{P}_{T-1}F_{T-1}^{P*}+B^{E}_{T-1}F_{T-1}^{E*}\big{)}\Big{)}\widehat{x}_{T-1}^{P}+\operatorname{Tr}\big{(}Q_{T-1}^{P}\widehat{\Sigma}_{T-1}^{P}\big{)}
+Tr(QTPΣ^TP)+Tr((LT11)QTPLT11Σ^T1E)+Tr((LT12)QTPLT12Σ^T1P)\displaystyle+\operatorname{Tr}\big{(}Q_{T}^{P}\widehat{\Sigma}_{T}^{P}\big{)}+\operatorname{Tr}((L_{T-1}^{1})^{\top}Q_{T}^{P}L_{T-1}^{1}\widehat{\Sigma}_{T-1}^{E})+\operatorname{Tr}((L_{T-1}^{2})^{\top}Q_{T}^{P}L_{T-1}^{2}\widehat{\Sigma}_{T-1}^{P})
+2Tr((LT11)QTPLT12Σ~T1(P,E))+Tr(ΓT1(HTP)(KTP)QTPKTPHTPΓT1W)\displaystyle+2\operatorname{Tr}((L_{T-1}^{1})^{\top}Q_{T}^{P}L_{T-1}^{2}\widetilde{\Sigma}_{T-1}^{(P,E)})+\operatorname{Tr}\big{(}\Gamma_{T-1}^{\top}(H_{T}^{P})^{\top}(K_{T}^{P})^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}\Gamma_{T-1}W\big{)}
+Tr((KTP)QTPKTPGP)\displaystyle+\operatorname{Tr}\big{(}(K_{T}^{P})^{\top}Q_{T}^{P}K_{T}^{P}G^{P}\big{)}
\displaystyle= (\widehat{x}_{T-1}^{P})^{\top}U^{P*}_{T-1}\widehat{x}_{T-1}^{P}+c_{T-1}^{P*}. \qquad (3.18)

where (3.18) holds by a calculation similar to that proving (2.65) is equivalent to (2.66) in Section 2.2. Similarly, we can show that (3.12) holds for player E at time T-1.

Now assume that (3.11)-(3.12) hold for all s\geq t+1. Then we have

Vt+1P(x^t+1P;FE)=(x^t+1P)Ut+1Px^t+1P+ct+1P.V^{P}_{t+1}(\widehat{x}_{t+1}^{P};F^{E*})=(\widehat{x}_{t+1}^{P})^{\top}U_{t+1}^{P*}\widehat{x}_{t+1}^{P}+c^{P*}_{t+1}. (3.19)

At time tt, recall that ΠtP\Pi_{t}^{P} is defined as ΠtP=(Σ^tPΣ~t(P,E))(Σ^t(P,E))1\Pi_{t}^{P}=\big{(}\widehat{\Sigma}_{t}^{P}-\widetilde{\Sigma}_{t}^{(P,E)}\big{)}\big{(}\widehat{\Sigma}_{t}^{(P,E)}\big{)}^{-1}. We further define Lt1L_{t}^{1} and Lt2L_{t}^{2} as Lt1=AtΠtP+BtEFtEKt+1PHt+1PAtΠtPL_{t}^{1}=A_{t}\Pi_{t}^{P}+B_{t}^{E}F_{t}^{E*}-K_{t+1}^{P}H_{t+1}^{P}A_{t}\Pi_{t}^{P} and Lt2=AtΠtPBtEFtEKt+1PHt+1PAt(IΠtP)L_{t}^{2}=-A_{t}\Pi_{t}^{P}-B_{t}^{E}F_{t}^{E*}-K_{t+1}^{P}H_{t+1}^{P}A_{t}(I-\Pi_{t}^{P}). Then similarly to (2.62) we have

x^t+1P=(At+BtEFtE)x^tP+BtPutP+Lt1etE+Lt2etP+Kt+1Pwt+1P+Kt+1PHt+1PΓtwt,\widehat{x}_{t+1}^{P}=(A_{t}+B_{t}^{E}F_{t}^{E*})\widehat{x}_{t}^{P}+B^{P}_{t}u_{t}^{P}+L_{t}^{1}e_{t}^{E}+L_{t}^{2}e_{t}^{P}+K_{t+1}^{P}w_{t+1}^{P}+K_{t+1}^{P}H_{t+1}^{P}\Gamma_{t}w_{t}, (3.20)

Applying the DPP again at time t, by (3.19) and (3.20) we have

VtP(x^tP;FE)\displaystyle V^{P}_{t}(\widehat{x}_{t}^{P};F^{E*})
=\displaystyle= minutP{(utP)RtPutP+(x^tP)QtPx^tP+Tr(QtPΣ^tP)\displaystyle\min_{u^{P}_{t}}\left\{(u^{P}_{t})^{\top}R_{t}^{P}u^{P}_{t}+(\widehat{x}_{t}^{P})^{\top}Q_{t}^{P}\widehat{x}_{t}^{P}+\operatorname{Tr}\left(Q_{t}^{P}\widehat{\Sigma}_{t}^{P}\right)\right.
+𝔼[((At+BtEFtE)x^tP+BtPutP+Lt1etE+Lt2etP+Kt+1Pwt+1P+Kt+1PHt+1PΓtwt)Ut+1P\displaystyle+\mathbb{E}\left[\big{(}(A_{t}+B_{t}^{E}F_{t}^{E*})\widehat{x}_{t}^{P}+B^{P}_{t}u_{t}^{P}+L_{t}^{1}e_{t}^{E}+L_{t}^{2}e_{t}^{P}+K_{t+1}^{P}w_{t+1}^{P}+K_{t+1}^{P}H_{t+1}^{P}\Gamma_{t}w_{t}\big{)}^{\top}\cdot U_{t+1}^{P*}\cdot\right.
((At+BtEFtE)x^tP+BtPutP+Lt1etE+Lt2etP+Kt+1Pwt+1P+Kt+1PHt+1PΓtwt)+ct+1P|tP]}.\displaystyle\left.\left.\left.\big{(}(A_{t}+B_{t}^{E}F_{t}^{E*})\widehat{x}_{t}^{P}+B^{P}_{t}u_{t}^{P}+L_{t}^{1}e_{t}^{E}+L_{t}^{2}e_{t}^{P}+K_{t+1}^{P}w_{t+1}^{P}+K_{t+1}^{P}H_{t+1}^{P}\Gamma_{t}w_{t}\big{)}+c_{t+1}^{P*}\right|\mathcal{H}^{P}_{t}\right]\right\}.

Expanding the terms in the expectation we obtain

VtP(x^tP;FE)\displaystyle V^{P}_{t}(\widehat{x}_{t}^{P};F^{E*}) (3.21)
=\displaystyle= minutP{(utP)(RtP+(BtP)Ut+1PBtP)utP+2(x^tP)(At+BtEFtE)Ut+1PBtPutP}\displaystyle\min_{u^{P}_{t}}\Big{\{}(u^{P}_{t})^{\top}\big{(}R_{t}^{P}+(B^{P}_{t})^{\top}U_{t+1}^{P*}B^{P}_{t}\big{)}u^{P}_{t}+2(\widehat{x}_{t}^{P})^{\top}(A_{t}+B_{t}^{E}F_{t}^{E})^{\top}U_{t+1}^{P}B_{t}^{P}u^{P}_{t}\Big{\}}
+ct+1P+(x^tP)(QtP+(At+BtEFtE)Ut+1P(At+BtEFtE))x^tP+Tr(QtPΣ^tP)\displaystyle+c_{t+1}^{P}+(\widehat{x}_{t}^{P})^{\top}\big{(}Q_{t}^{P}+(A_{t}+B^{E}_{t}F_{t}^{E*})^{\top}U_{t+1}^{P*}(A_{t}+B^{E}_{t}F_{t}^{E*})\big{)}\widehat{x}_{t}^{P}+\operatorname{Tr}\left(Q_{t}^{P}\widehat{\Sigma}_{t}^{P}\right)
+Tr((Lt1)Ut+1PLt1Σ^tE)+Tr((Lt2)Ut+1PLt2Σ^tP)+2Tr((Lt1)Ut+1PLt2Σ~t(P,E))\displaystyle+\operatorname{Tr}((L_{t}^{1})^{\top}U_{t+1}^{P*}L_{t}^{1}\widehat{\Sigma}_{t}^{E})+\operatorname{Tr}((L_{t}^{2})^{\top}U_{t+1}^{P*}L_{t}^{2}\widehat{\Sigma}_{t}^{P})+2\operatorname{Tr}((L_{t}^{1})^{\top}U_{t+1}^{P*}L_{t}^{2}\widetilde{\Sigma}_{t}^{(P,E)})
+Tr(Γt(Ht+1P)(Kt+1P)Ut+1PKt+1PHt+1PΓtW)+Tr((Kt+1P)Ut+1PKt+1PGP)\displaystyle+\operatorname{Tr}\big{(}\Gamma_{t}^{\top}(H_{t+1}^{P})^{\top}(K_{t+1}^{P})^{\top}U_{t+1}^{P*}K_{t+1}^{P}H_{t+1}^{P}\Gamma_{t}W\big{)}+\operatorname{Tr}\big{(}(K_{t+1}^{P})^{\top}U_{t+1}^{P*}K_{t+1}^{P}G^{P}\big{)}

We note that all the constant terms including the accumulated sum ct+1Pc_{t+1}^{P} are independent of utPu_{t}^{P}, since Σ^sP\widehat{\Sigma}_{s}^{P}, Σ^sE\widehat{\Sigma}_{s}^{E}, and Σ~s(P,E)\widetilde{\Sigma}_{s}^{(P,E)} (s=t,,Ts=t,\ldots,T) are independent of the sequence {FsP}s=0t\{F_{s}^{P}\}_{s=0}^{t}.

We can apply the first-order condition to obtain the following optimal response:

utP=(RtP+(BtP)Ut+1PBtP)1(BtP)Ut+1P(At+BtEFtE)x^tP=FtPx^tP.\displaystyle u_{t}^{P*}=-(R_{t}^{P}+(B_{t}^{P})^{\top}U_{t+1}^{P*}B_{t}^{P})^{-1}(B^{P}_{t})^{\top}U_{t+1}^{P*}(A_{t}+B^{E}_{t}F_{t}^{E*})\widehat{x}_{t}^{P}=F_{t}^{P*}\widehat{x}_{t}^{P}. (3.22)

Similarly, player E minimizes his value function to find his optimal response to player P's strategy F_{t}^{P*}\widehat{x}_{t}^{P}. We can show that the optimal strategy u^{E*}_{t} of player E is given by

utE=(RtE+(BtE)Ut+1EBtE)1(BtE)Ut+1E(At+BtPFtP)x^tE=FtEx^tE.u^{E*}_{t}=-(R_{t}^{E}+(B^{E}_{t})^{\top}U_{t+1}^{E*}B^{E}_{t})^{-1}(B^{E}_{t})^{\top}U_{t+1}^{E*}(A_{t}+B^{P}_{t}F_{t}^{P*})\widehat{x}_{t}^{E}=F_{t}^{E*}\widehat{x}_{t}^{E}. (3.23)

Plugging u_{t}^{P*}=F_{t}^{P*}\widehat{x}_{t}^{P} into (3.21), and after manipulations similar to those in the proof that (2.65) is equivalent to (2.66) in Section 2.2, we can rewrite the value function as

VtP(x^tP;FE)=(x^tP)UtPx^tP+ctP,V^{P}_{t}(\widehat{x}_{t}^{P};F^{E*})=(\widehat{x}_{t}^{P})^{\top}U_{t}^{P*}\widehat{x}_{t}^{P}+c^{P*}_{t},

with UtPU_{t}^{P*} and ctPc_{t}^{P*} given in (3.9) and (3.13). Similarly, we can show that (3.13) also holds for player EE. Therefore by backward induction, the statements hold for all t=0,1,,Tt=0,1,\ldots,T. ∎

4 A Mixed Partially and Fully Observable Setting

In this section, we consider a more general setting for games with two players, PP and EE. Now part of the state process is fully observable and part of the state process is partially observable. The joint dynamics xtnx_{t}\in\mathbb{R}^{n} takes a linear form (0tT1)(0\leq t\leq T-1):

xt+1:=(xt+1(1)xt+1(2))=At(xt(1)xt(2))+BtPutP+BtEutE+Γtwt,\displaystyle x_{t+1}:=\begin{pmatrix}x^{(1)}_{t+1}\\ x^{(2)}_{t+1}\end{pmatrix}=A_{t}\begin{pmatrix}x^{(1)}_{t}\\ x^{(2)}_{t}\end{pmatrix}+B^{P}_{t}u^{P}_{t}+B^{E}_{t}u^{E}_{t}+\Gamma_{t}w_{t}, (4.1)

with initial value x0=(x0(1),x0(2))x_{0}=(x^{(1)}_{0},x^{(2)}_{0})^{\top}, and the controls of PP and EE are utPmu^{P}_{t}\in\mathbb{R}^{m} and utEku^{E}_{t}\in\mathbb{R}^{k}, respectively. Here, for each tt, the noise wtdw_{t}\in\mathbb{R}^{d} is an i.i.d. sample from 𝒩(0,W)\mathcal{N}(0,W) with Wd×dW\in\mathbb{R}^{d\times d} and we have the model parameters Atn×n,A_{t}\in\mathbb{R}^{n\times n}, BtPn×mB^{P}_{t}\in\mathbb{R}^{n\times m}, BtEn×kB^{E}_{t}\in\mathbb{R}^{n\times k}, and Γtn×d\Gamma_{t}\in\mathbb{R}^{n\times d}. We assume that xt(1)n1x_{t}^{(1)}\in\mathbb{R}^{n_{1}} is the partially observable part and xt(2)n2x_{t}^{(2)}\in\mathbb{R}^{n_{2}} is the fully observable part with n=n1+n2n=n_{1}+n_{2}.

Information Structure.

At the time t=0t=0, player PP observes x0(2)x_{0}^{(2)} and believes that x0(1)x_{0}^{(1)} is drawn from a Gaussian distribution x0(1)𝒩(x^0P,(1),W0P)x^{(1)}_{0}\sim\mathcal{N}(\widehat{x}_{0}^{P,(1)},W^{P}_{0}), and thereafter player PP observes part of the state xt(2)n2x_{t}^{(2)}\in\mathbb{R}^{n_{2}} and the noisy state signal ztPpz_{t}^{P}\in\mathbb{R}^{p}:

zt+1P=Ht+1Pxt+1(1)+wt+1P,wt+1P𝒩(0,GP),t=0,1,,T1,\displaystyle z_{t+1}^{P}=H_{t+1}^{P}\,x^{(1)}_{t+1}+\,w^{P}_{t+1},\quad w^{P}_{t+1}\sim\mathcal{N}(0,G^{P}),\quad t=0,1,\cdots,T-1, (4.2)

with {wtP}t=0T1\{w^{P}_{t}\}_{t=0}^{T-1} a sequence of i.i.d. random variables. Here GPp×pG^{P}\in\mathbb{R}^{p\times p} and Ht+1Pp×n1H_{t+1}^{P}\in\mathbb{R}^{p\times n_{1}}. Similarly, player EE observes x0(2)x_{0}^{(2)} and believes that x0(1)x_{0}^{(1)} is drawn from a Gaussian distribution x0(1)𝒩(x^0E,(1),W0E)x^{(1)}_{0}\sim\mathcal{N}(\widehat{x}_{0}^{E,(1)},W^{E}_{0}). Then player EE observes part of the state xt(2)n2x_{t}^{(2)}\in\mathbb{R}^{n_{2}} and the noisy state signal ztEqz_{t}^{E}\in\mathbb{R}^{q}:

zt+1E=Ht+1Ext+1(1)+wt+1E,wt+1E𝒩(0,GE),t=0,1,,T1.\displaystyle z_{t+1}^{E}=H_{t+1}^{E}\,x^{(1)}_{t+1}+\,w^{E}_{t+1},\quad w^{E}_{t+1}\sim\mathcal{N}(0,G^{E}),\quad t=0,1,\cdots,T-1. (4.3)

with {wtE}t=0T1\{w^{E}_{t}\}_{t=0}^{T-1} a sequence of i.i.d. random variables. For simplicity we assume that {wtE}t=0T1\{w^{E}_{t}\}_{t=0}^{T-1} are independent from {wtP}t=0T1\{w^{P}_{t}\}_{t=0}^{T-1}. In addition we have GEq×qG^{E}\in\mathbb{R}^{q\times q} and Ht+1Eq×n1H_{t+1}^{E}\in\mathbb{R}^{q\times n_{1}}.

Both players make their decisions based on the public and private information available to them. We write \mathcal{Z}_{t}^{P}=\{z_{s}^{P}\}_{s=1}^{t} and \mathcal{Z}_{t}^{E}=\{z_{s}^{E}\}_{s=1}^{t} for the private signals players P and E receive up to time t (1\leq t\leq T), respectively. Let \mathcal{U}^{P}_{t}=\{u^{P}_{s}\}_{s=1}^{t} and \mathcal{U}^{E}_{t}=\{u^{E}_{s}\}_{s=1}^{t} denote the control histories of players P and E up to time t, respectively. Also let \mathcal{X}_{t}:=\{x_{s}^{(2)}\}_{s=0}^{t} be the public information that is available to both players.

We assume tP\mathcal{H}^{P}_{t} is the information (or history) available to player P and tE\mathcal{H}^{E}_{t} is the information available to player E for them to make decisions at time tt, where tP\mathcal{H}^{P}_{t} and tE\mathcal{H}^{E}_{t} follow:

tP={x^0P,W0P,W0E}𝒵tP𝒳t𝒰t1P𝒰t1E,tE={x^0E,W0P,W0E}𝒵tE𝒳t𝒰t1P𝒰t1E.\displaystyle\mathcal{H}^{P}_{t}=\{\widehat{x}_{0}^{P},W_{0}^{P},W_{0}^{E}\}\cup\mathcal{Z}_{t}^{P}\cup\mathcal{X}_{t}\cup\mathcal{U}^{P}_{t-1}\cup\mathcal{U}^{E}_{t-1},\,\,\mathcal{H}^{E}_{t}=\{\widehat{x}_{0}^{E},W_{0}^{P},W_{0}^{E}\}\cup\mathcal{Z}_{t}^{E}\cup\mathcal{X}_{t}\cup\mathcal{U}^{P}_{t-1}\cup\mathcal{U}^{E}_{t-1}. (4.4)

Note that the covariance matrices {W0P,W0E}\{W_{0}^{P},W_{0}^{E}\} are known to both players.

Cost Function.

Each player ii (i=P,Ei=P,E) strives to minimize their own cost function:

min{uti}t=0T1Ji(x^0i,(1),x0(2))\displaystyle\min_{\{u^{i}_{t}\}_{t=0}^{T-1}}J^{i}(\widehat{x}_{0}^{i,(1)},x_{0}^{(2)}) :=\displaystyle:= min{uti}t=0T1𝔼[xTQTixT+t=0T1(xtQtixt+(uti)Rtiuti)|0i],\displaystyle\min_{\{u^{i}_{t}\}_{t=0}^{T-1}}\mathbb{E}\left.\left[x_{T}^{\top}{Q^{i}_{T}}x_{T}+\sum_{t=0}^{T-1}\left({x_{t}^{\top}Q^{i}_{t}x_{t}}+(u^{i}_{t})^{\top}R_{t}^{i}u^{i}_{t}\right)\,\right|\,\mathcal{H}^{i}_{0}\right], (4.5)

with cost parameters QtP,QtEn×nQ_{t}^{P},Q_{t}^{E}\in\mathbb{R}^{n\times n}, RtPm×mR_{t}^{P}\in\mathbb{R}^{m\times m} and RtEk×kR_{t}^{E}\in\mathbb{R}^{k\times k}.

Rewrite the earlier model as At=(At(1,1)At(1,2)At(2,1)At(2,2))A_{t}=\begin{pmatrix}A_{t}^{(1,1)}\,A_{t}^{(1,2)}\\ A_{t}^{(2,1)}\,A_{t}^{(2,2)}\end{pmatrix} with At(1,1)n1×n1A_{t}^{(1,1)}\in\mathbb{R}^{n_{1}\times n_{1}}, At(1,2)n1×n2A_{t}^{(1,2)}\in\mathbb{R}^{n_{1}\times n_{2}}, At(2,1)n2×n1A_{t}^{(2,1)}\in\mathbb{R}^{n_{2}\times n_{1}} and At(2,2)n2×n2A_{t}^{(2,2)}\in\mathbb{R}^{n_{2}\times n_{2}}. Similarly, rewrite BtP=(BtP,(1),BtP,(2))B^{P}_{t}=(B_{t}^{P,(1)},B_{t}^{P,(2)})^{\top} with BtP,(1)n1×mB_{t}^{P,(1)}\in\mathbb{R}^{n_{1}\times m} and BtP,(2)n2×mB_{t}^{P,(2)}\in\mathbb{R}^{n_{2}\times m}, and BtE=(BtE,(1),BtE,(2))B^{E}_{t}=(B_{t}^{E,(1)},B_{t}^{E,(2)})^{\top} with BtE,(1)n1×kB_{t}^{E,(1)}\in\mathbb{R}^{n_{1}\times k}, BtE,(2)n2×kB_{t}^{E,(2)}\in\mathbb{R}^{n_{2}\times k}, and Γt=(Γt(1),Γt(2))\Gamma_{t}=(\Gamma_{t}^{(1)},\Gamma_{t}^{(2)})^{\top} with Γt(1)n1×d\Gamma_{t}^{(1)}\in\mathbb{R}^{n_{1}\times d} and Γt(2)n2×d\Gamma_{t}^{(2)}\in\mathbb{R}^{n_{2}\times d}. For the cost parameters, Qti=(Qti,(1,1)Qti,(1,2)(Qti,(1,2))Qti,(2,2))Q_{t}^{i}=\begin{pmatrix}Q_{t}^{i,(1,1)}&Q_{t}^{i,(1,2)}\\ (Q_{t}^{i,(1,2)})^{\top}&Q_{t}^{i,(2,2)}\end{pmatrix} with Qti,(1,1)n1×n1Q_{t}^{i,(1,1)}\in\mathbb{R}^{n_{1}\times n_{1}} , Qti,(1,2)n1×n2Q_{t}^{i,(1,2)}\in\mathbb{R}^{n_{1}\times n_{2}}, and Qti,(2,2)n2×n2Q_{t}^{i,(2,2)}\in\mathbb{R}^{n_{2}\times n_{2}} for i=P,Ei=P,E.

For the mixed case we make the following assumptions on the parameters, initial state, and noise.

Assumption 4.1 ([Mixed Setting] Parameters, Initial State, and Noise).

For i=P,Ei=P,E,

  1.

    {wt}t=0T1\{w_{t}\}_{t=0}^{T-1} and {wti}t=1T1\{w_{t}^{i}\}_{t=1}^{T-1} are zero-mean, i.i.d. Gaussian random variables that are independent from x0x_{0} and each other and such that 𝔼[wtwt]=W\mathbb{E}[w_{t}w_{t}^{\top}]=W and 𝔼[wti(wti)]=Gi\mathbb{E}[w_{t}^{i}(w_{t}^{i})^{\top}]=G^{i} are positive definite;

  2.

    Both matrices Ht+1Pp×n1H_{t+1}^{P}\in\mathbb{R}^{p\times n_{1}} and Ht+1Eq×n1H_{t+1}^{E}\in\mathbb{R}^{q\times n_{1}} have rank n1n_{1} for t=0,,T1t=0,\dots,T-1.

  3.

    The matrices Γt(1)W(Γt(1))\Gamma_{t}^{(1)}W(\Gamma_{t}^{(1)})^{\top} are non-singular for t=1,,Tt=1,\dots,T;

  4.

    The cost matrices QtiQ_{t}^{i}, for t=0,1,,Tt=0,1,\ldots,T are positive semi-definite, and RtiR_{t}^{i} for t=0,1,,T1t=0,1,\ldots,T-1 are positive definite.

We now give the main results and omit the proofs as they follow naturally by applying the ideas in Sections 2 and 3 to the partially observable part of the state process.

Theorem 4.2 (Sufficient Statistics in Two-player Games).

Assume that both players apply linear strategies, in that u^{P}_{t}=F_{t}^{P,(1)}\,\mathbb{E}[x^{(1)}_{t}|\mathcal{H}_{t}^{P}]+F_{t}^{P,(2)}\,x_{t}^{(2)} and u^{E}_{t}=F_{t}^{E,(1)}\,\mathbb{E}[x^{(1)}_{t}|\mathcal{H}_{t}^{E}]+F_{t}^{E,(2)}\,x_{t}^{(2)} for some matrices F_{t}^{P,(1)}\in\mathbb{R}^{m\times n_{1}} of rank \min(m,n_{1}), F_{t}^{P,(2)}\in\mathbb{R}^{m\times n_{2}}, F_{t}^{E,(1)}\in\mathbb{R}^{k\times n_{1}} of rank \min(k,n_{1}), and F_{t}^{E,(2)}\in\mathbb{R}^{k\times n_{2}}. The sufficient statistic of player i, for i=P,E, at decision time t=0 is x_{0}^{(1)}\sim\mathcal{N}(\widehat{x}_{0}^{i,(1)},W_{0}^{i}). For time 1\leq t\leq T-1, the distribution of x^{(1)}_{t} as calculated by player i, conditional on the private information available to him at time t, is given by

xt(1)𝒩(x^ti,(1),Σ^ti),\displaystyle x^{(1)}_{t}\sim\mathcal{N}(\widehat{x}_{t}^{i,(1)},\widehat{\Sigma}_{t}^{i}), (4.6)

where, for jij\neq i,

Jt1i\displaystyle J_{t-1}^{i} =(Σ^t1iΣ~t1(i,j))Σ^t1(i,j)(Yt1j,(1))(Yt1j,(1)Σ^t1(i,j)Σ^t1(i,j)(Yt1j,(1)))1,\displaystyle=\Big{(}\widehat{\Sigma}_{t-1}^{i}-\widetilde{\Sigma}_{t-1}^{(i,j)}\Big{)}\widehat{\Sigma}_{t-1}^{(i,j)}({Y}_{t-1}^{j,(1)})^{\top}\Big{(}{Y}_{t-1}^{j,(1)}\widehat{\Sigma}_{t-1}^{(i,j)}\widehat{\Sigma}_{t-1}^{(i,j)}({Y}_{t-1}^{j,(1)})^{\top}\Big{)}^{-1}, (4.7a)
(x^t1i,(1))+\displaystyle(\widehat{x}^{i,(1)}_{t-1})^{+} =x^t1i,(1)+Jt1i(yt1jYt1j,(1)x^t1i,(1)),\displaystyle=\widehat{x}^{i,(1)}_{t-1}+J_{t-1}^{i}\Big{(}{y}^{j}_{t-1}-{Y}_{t-1}^{j,(1)}\widehat{x}_{t-1}^{i,(1)}\Big{)}, (4.7b)
(Σ^t1i)+\displaystyle(\widehat{\Sigma}_{t-1}^{i})^{+} =Σ^t1i(Σ^t1iΣ~t1(i,j))(Σ^t1(i,j))1(Σ^t1iΣ~t1(i,j)),\displaystyle=\widehat{\Sigma}^{i}_{t-1}-\Big{(}\widehat{\Sigma}^{i}_{t-1}-\widetilde{\Sigma}_{t-1}^{(i,j)}\Big{)}(\widehat{\Sigma}_{t-1}^{(i,j)})^{-1}\Big{(}\widehat{\Sigma}_{t-1}^{i}-\widetilde{\Sigma}_{t-1}^{(i,j)}\Big{)}^{\top}, (4.7c)
(x^ti,(1))\displaystyle\big{(}\widehat{x}^{i,(1)}_{t}\big{)}^{-} =At1(1,1)(x^t1i,(1))++At1(1,2)xt1(2)+Bt1P,(1)ut1P+Bt1E,(1)ut1E,\displaystyle=A_{t-1}^{(1,1)}(\widehat{x}^{i,(1)}_{t-1})^{+}+A_{t-1}^{(1,2)}x_{t-1}^{(2)}+B^{P,(1)}_{t-1}u^{P}_{t-1}+B^{E,(1)}_{t-1}u^{E}_{t-1}, (4.7d)
(Σ^ti)\displaystyle\big{(}\widehat{\Sigma}^{i}_{t}\big{)}^{-} =At1(1,1)(Σ^t1i)+(At1(1,1))+Γt1(1)W(Γt1(1)),\displaystyle=A^{(1,1)}_{t-1}(\widehat{\Sigma}^{i}_{t-1})^{+}(A^{(1,1)}_{t-1})^{\top}+\Gamma_{t-1}^{(1)}W(\Gamma_{t-1}^{(1)})^{\top}, (4.7e)
Kti\displaystyle K_{t}^{i} =(Σ^ti)(Hti)[Hti(Σ^ti)(Hti)+Gi]1,\displaystyle=\big{(}\widehat{\Sigma}^{i}_{t}\big{)}^{-}(H_{t}^{i})^{\top}\left[H_{t}^{i}\big{(}\widehat{\Sigma}^{i}_{t}\big{)}^{-}(H^{i}_{t})^{\top}+G^{i}\right]^{-1}, (4.7f)
x^ti,(1)\displaystyle\widehat{x}_{t}^{i,(1)} =(x^ti,(1))+Kti[ztiHti(x^ti,(1))],\displaystyle=\big{(}\widehat{x}_{t}^{i,(1)}\big{)}^{-}+K_{t}^{i}\left[z_{t}^{i}-H_{t}^{i}\big{(}\widehat{x}_{t}^{i,(1)}\big{)}^{-}\right], (4.7g)
Σ^ti\displaystyle\widehat{\Sigma}^{i}_{t} =(IKtiHti)(Σ^ti),\displaystyle=\big{(}I-K_{t}^{i}H^{i}_{t}\big{)}\big{(}\widehat{\Sigma}^{i}_{t}\big{)}^{-}, (4.7h)
Σ~t(i,j)\displaystyle\widetilde{\Sigma}_{t}^{(i,j)} =(IKtiHti)(At1(1,1)Δt1(i,j)(At1(1,1))+Γt1(1)W(Γt1(1)))(IKtjHtj),\displaystyle=\left(I-K_{t}^{i}H_{t}^{i}\right)\left(A^{(1,1)}_{t-1}\Delta_{t-1}^{(i,j)}(A^{(1,1)}_{t-1})^{\top}+\Gamma_{t-1}^{(1)}W(\Gamma_{t-1}^{(1)})^{\top}\right)\left(I-K_{t}^{j}H^{j}_{t}\right)^{\top}, (4.7i)
Δt1(i,j)\displaystyle\Delta_{t-1}^{(i,j)} =(Σ^t1iΣ~t1(i,j))(Σ^t1(i,j))1(Σ^t1jΣ~t1(j,i))+Σ~t1(i,j),\displaystyle={(\widehat{\Sigma}^{i}_{t-1}-\widetilde{\Sigma}_{t-1}^{(i,j)})(\widehat{\Sigma}^{(i,j)}_{t-1})^{-1}(\widehat{\Sigma}^{j}_{t-1}-\widetilde{\Sigma}_{t-1}^{(j,i)})^{\top}+\widetilde{\Sigma}_{t-1}^{(i,j)}}, (4.7j)
Σ^t(i,j)\displaystyle\widehat{\Sigma}_{t}^{(i,j)} =Σ^ti+Σ^tjΣ~t(i,j)(Σ~t(i,j)),\displaystyle=\widehat{\Sigma}_{t}^{i}+\widehat{\Sigma}_{t}^{j}-\widetilde{\Sigma}_{t}^{(i,j)}-\left(\widetilde{\Sigma}_{t}^{(i,j)}\right)^{\top}, (4.7k)

where Σ^t1(i,j)\widehat{\Sigma}_{t-1}^{(i,j)} is positive definite. The values of YtP,(1)m×n1Y_{t}^{P,(1)}\in\mathbb{R}^{m\times n_{1}}, YtE,(1)k×n1Y_{t}^{E,(1)}\in\mathbb{R}^{k\times n_{1}} and ytPy_{t}^{P}, ytEy_{t}^{E} depend on the ranks of FtP,(1)F_{t}^{P,(1)} and FtE,(1)F_{t}^{E,(1)} as follows:

  • (i)

    The pair

    (YtP,(1),ytP)={(FtP,(1),utPFt1P,(2)xt1(2))if FtP,(1) has rank m<n1,(In,x^tP,(1))if FtP has rank n1m.(Y_{t}^{P,(1)},y_{t}^{P})=\left\{\begin{array}[]{ll}\left(F_{t}^{P,(1)},\,u_{t}^{P}-F_{t-1}^{P,(2)}x_{t-1}^{(2)}\right)&\mbox{if $F_{t}^{P,(1)}$ has rank $m<n_{1}$,}\\ (I_{n},\widehat{x}_{t}^{P,(1)})&\mbox{if $F_{t}^{P}$ has rank $n_{1}\leq m$.}\end{array}\right.
  • (ii)

    The pair

    (YtE,(1),ytE)={(FtE,(1),utEFt1E,(2)xt1(2))if FtE,(1) has rank k<n1,(In,x^tE,(1))if FtE,(1) has rank n1k.(Y_{t}^{E,(1)},y_{t}^{E})=\left\{\begin{array}[]{ll}\left(F_{t}^{E,(1)},\,u_{t}^{E}-F_{t-1}^{E,(2)}x_{t-1}^{(2)}\right)&\mbox{if $F_{t}^{E,(1)}$ has rank $k<n_{1}$,}\\ (I_{n},\widehat{x}_{t}^{E,(1)})&\mbox{if $F_{t}^{E,(1)}$ has rank $n_{1}\leq k$.}\end{array}\right.

Finally, the initial conditions are Σ^0i=W0i\widehat{\Sigma}^{i}_{0}=W^{i}_{0}, Σ~0(i,j)=0\widetilde{\Sigma}_{0}^{(i,j)}=0, and Σ^0(i,j)=Σ^0i+Σ^0j\widehat{\Sigma}^{(i,j)}_{0}=\widehat{\Sigma}_{0}^{i}+\widehat{\Sigma}_{0}^{j}.
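To illustrate the recursion in Theorem 4.2, the following sketch transcribes one step of (4.7a)-(4.7h) for player i as printed: the correction using the opponent's observed action, the prediction through the dynamics (4.1), and the Kalman-style update with the private signal. The cross-covariance recursions (4.7i)-(4.7k) require both players' quantities and are assumed to be propagated separately; all function and variable names are our own.

```python
import numpy as np

def belief_step_player_i(x1_hat, Sig_i, Sig_ij, Sig_tilde_ij,   # beliefs at t-1
                         y_j, Y_j,                               # opponent info, cases (i)/(ii)
                         x2_prev, uP_prev, uE_prev, z_i,         # observed data
                         A11, A12, BP1, BE1, Gamma1, W, H_i, G_i):
    """One step of (4.7a)-(4.7h) for player i in the mixed setting; a sketch.
    Sig_ij = \\hat\\Sigma_{t-1}^{(i,j)} and Sig_tilde_ij = \\tilde\\Sigma_{t-1}^{(i,j)}
    are assumed to be propagated separately via (4.7i)-(4.7k)."""
    # Correction using the opponent's observed action, (4.7a)-(4.7c).
    D = Sig_i - Sig_tilde_ij
    J = D @ Sig_ij @ Y_j.T @ np.linalg.inv(Y_j @ Sig_ij @ Sig_ij @ Y_j.T)
    x1_plus = x1_hat + J @ (y_j - Y_j @ x1_hat)
    Sig_plus = Sig_i - D @ np.linalg.inv(Sig_ij) @ D.T
    # Prediction through the partially observable dynamics, (4.7d)-(4.7e).
    x1_minus = A11 @ x1_plus + A12 @ x2_prev + BP1 @ uP_prev + BE1 @ uE_prev
    Sig_minus = A11 @ Sig_plus @ A11.T + Gamma1 @ W @ Gamma1.T
    # Kalman-style update with the private signal, (4.7f)-(4.7h).
    K = Sig_minus @ H_i.T @ np.linalg.inv(H_i @ Sig_minus @ H_i.T + G_i)
    x1_new = x1_minus + K @ (z_i - H_i @ x1_minus)
    Sig_new = (np.eye(Sig_minus.shape[0]) - K @ H_i) @ Sig_minus
    return x1_new, Sig_new, K
```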

Theorem 4.3 (Nash Equilibrium).

Suppose Assumption 4.1 holds and there exists a unique solution {FtP}t=0T1\{F_{t}^{P*}\}_{t=0}^{T-1} and {FtE}t=0T1\{F_{t}^{E*}\}_{t=0}^{T-1} to (3.7)-(3.8) with FtP,(1)F_{t}^{P*,(1)} of rank min(m,n1)\min(m,n_{1}) and FtE,(1)F_{t}^{E*,(1)} of rank min(k,n1)\min(k,n_{1}). Further assume that both players apply linear policies. Then the unique Nash equilibrium policy is

uti=Ftiyti,withyti=(x^ti,(1),xt(2)),i=P,E.u_{t}^{i*}=F_{t}^{i*}y_{t}^{i},\quad\text{with}\quad y_{t}^{i}=(\widehat{x}_{t}^{i,(1)},x_{t}^{(2)})^{\top},\quad i=P,E. (4.8)

The corresponding optimal value functions are quadratic (0tT)(0\leq t\leq T):

VtP(ytP;FE)=(ytP)UtPytP+c~tP,VtE(ytE;FP)=(ytE)UtEytE+c~tE,\displaystyle V^{P}_{t}(y_{t}^{P};F^{E*})=(y_{t}^{P})^{\top}U_{t}^{P*}y_{t}^{P}+\widetilde{c}^{P*}_{t},\quad V^{E}_{t}(y_{t}^{E};F^{P*})=(y_{t}^{E})^{\top}U_{t}^{E*}y_{t}^{E}+\widetilde{c}^{E*}_{t}, (4.9)

with matrices U_{t}^{P*},U_{t}^{E*}\in\mathbb{R}^{n\times n} given in (3.9) and (3.10). Here, for i,j=P,E and j\neq i, the scalar \widetilde{c}^{i*}_{t}\in\mathbb{R} is given by

c~ti\displaystyle\widetilde{c}_{t}^{i*} =\displaystyle= c~t+1i+Tr(Qti,(1,1)Σ^ti)+Tr((L¯ti,(1))Ut+1iL¯ti,(1)Σ^ti)+Tr((L¯ti,(2))Ut+1iL¯ti,(2)Σ^tj)\displaystyle\widetilde{c}_{t+1}^{i*}+\operatorname{Tr}(Q_{t}^{i,(1,1)}\widehat{\Sigma}_{t}^{i})+\operatorname{Tr}\big{(}(\overline{L}_{t}^{i,(1)})^{\top}U_{t+1}^{i}\overline{L}_{t}^{i,(1)}\widehat{\Sigma}_{t}^{i}\big{)}+\operatorname{Tr}\big{(}(\overline{L}_{t}^{i,(2)})^{\top}U_{t+1}^{i}\overline{L}_{t}^{i,(2)}\widehat{\Sigma}_{t}^{j}\big{)}
+2Tr((L¯ti,(2))Ut+1iL¯ti,(1)Σ~t(i,j))+Tr((Kt+1i)Ut+1i,(1,1)Kt+1iGi)\displaystyle+2\operatorname{Tr}\big{(}(\overline{L}_{t}^{i,(2)})^{\top}U_{t+1}^{i}\overline{L}_{t}^{i,(1)}\widetilde{\Sigma}_{t}^{(i,j)}\big{)}+\operatorname{Tr}\left((K_{t+1}^{i})^{\top}U_{t+1}^{i,(1,1)}K_{t+1}^{i}G^{i}\right)
+Tr([(Kt+1iHt+1iΓt(1))(Γt(2))]Ut+1i[Kt+1iHt+1iΓt(1)Γt(2)]W),\displaystyle+\operatorname{Tr}\left(\begin{bmatrix}\big{(}K_{t+1}^{i}H_{t+1}^{i}\Gamma_{t}^{(1)}\big{)}^{\top}&(\Gamma_{t}^{(2)})^{\top}\end{bmatrix}U_{t+1}^{i}\begin{bmatrix}K_{t+1}^{i}H_{t+1}^{i}\Gamma_{t}^{(1)}\\ \Gamma_{t}^{(2)}\end{bmatrix}W\right),

where L¯ti,(1)=(L~ti,(1),(At(2,1)+Btj,(2)Ftj,(1)))\overline{L}_{t}^{i,(1)}=(\widetilde{L}_{t}^{i,(1)},-(A_{t}^{(2,1)}+B_{t}^{j,(2)}F_{t}^{j*,(1)}))^{\top}, and L¯ti,(2)=(L~ti,(2),Btj,(2)Ftj,(1))\overline{L}_{t}^{i,(2)}=(\widetilde{L}_{t}^{i,(2)},B_{t}^{j,(2)}F_{t}^{j*,(1)})^{\top} with

L~ti,(1)\displaystyle\widetilde{L}_{t}^{i,(1)} =\displaystyle= At(1,1)ΠtiBtj,(1)Ftj,(1)Kt+1iHt+1iAt(1,1)(IΠti),\displaystyle-A_{t}^{(1,1)}\Pi_{t}^{i}-B_{t}^{j,(1)}F_{t}^{j*,(1)}-K_{t+1}^{i}H_{t+1}^{i}A_{t}^{(1,1)}(I-\Pi_{t}^{i}),
L~ti,(2)\displaystyle\widetilde{L}_{t}^{i,(2)} =\displaystyle= At(1,1)Πti+Btj,(1)Ftj,(1)Kt+1iHt+1iAt(1,1)Πti,\displaystyle A_{t}^{(1,1)}\Pi_{t}^{i}+B_{t}^{j,(1)}F_{t}^{j*,(1)}-K_{t+1}^{i}H_{t+1}^{i}A_{t}^{(1,1)}\Pi_{t}^{i},

where Πti:=(Σ^tiΣ~t(i,j))(Σ^t(i,j))1\Pi_{t}^{i}:=\big{(}\widehat{\Sigma}_{t}^{i}-\widetilde{\Sigma}_{t}^{(i,j)}\big{)}\big{(}\widehat{\Sigma}_{t}^{(i,j)}\big{)}^{-1}. The terminal conditions are c~Ti=Tr(QTi,(1,1)Σ^Ti)\widetilde{c}_{T}^{i*}=\operatorname{Tr}(Q_{T}^{i,(1,1)}\widehat{\Sigma}_{T}^{i}) for i=P,Ei=P,E.

5 Numerical Experiment: the Bargaining Game

In this section, we perform some numerical experiments on a bargaining game example which can be cast into the framework introduced in Section 4. Consider a two-player bargaining or negotiation game where a buyer and a seller must agree on the value of a good. Each party has a target price, that is, the price they want to achieve by agreement at the deadline. The target price depends on their view of the good's true value. The buyer (resp. seller) does not know the target price of the seller (resp. buyer) or the true value of the good. The challenge is to establish a model for the bargaining situation and find the optimal bidding strategies when both parties have partial information about their counterparty and face uncertainty (e.g., market fluctuations).

In this section we focus on the case n_{1}=1, in which the opponent's state estimate can be inferred. The case n_{1}=2, where this is not possible, yields similar results, and these are deferred to Appendix B.

5.1 Mathematical Set-up

We now cast this bargaining game into the mathematical framework introduced in Section 4. Assume we have two players, a buyer BB and a seller SS, who aim to reach an agreement on the value (or the price) of a good. The negotiation takes place over a finite period of time TT. At each timestamp tt, the buyer and seller simultaneously offer prices. We let xtBx_{t}^{B}\in\mathbb{R} be the price offered by the buyer and xtSx_{t}^{S}\in\mathbb{R} be the price offered by the seller. The dynamics of the offers follow

xt+1B=xtB+utB+ϵtB,xt+1S=xtS+utS+ϵtS, with initial values x0B,x0S,\displaystyle x_{t+1}^{B}=x_{t}^{B}+u_{t}^{B}+{\epsilon}_{t}^{B},\quad x_{t+1}^{S}=x_{t}^{S}+u_{t}^{S}+{\epsilon}_{t}^{S},\text{ with initial values }x_{0}^{B},x_{0}^{S}, (5.1)

Here u_{t}^{B}\in\mathbb{R} is the change in the buyer's offer and u_{t}^{S}\in\mathbb{R} is the change in the seller's offer at time t. The random variables \epsilon_{t}^{B} and \epsilon_{t}^{S} are IID, representing the noise in both parties' offers, with \epsilon_{t}^{B}\sim\mathcal{N}(0,\overline{W}^{B}) and \epsilon_{t}^{S}\sim\mathcal{N}(0,\overline{W}^{S}), respectively. We note that \epsilon_{t}^{B} and \epsilon_{t}^{S} serve as regularization terms to guarantee the non-degeneracy of the state noise. Another way of thinking about this is to consider u_{t}^{B} and u_{t}^{S} as the intended changes in the players' offers when they cannot completely control the actual changes (for example, due to some external restrictions). Both players can observe each other's exact offers. Thus (x_{t}^{B},x_{t}^{S})^{\top} corresponds to the fully observable part x_{t}^{(2)} in Section 4.

We assume the value of the good p_{t}\in\mathbb{R} is not observable by either player and its dynamics follow:

pt+1=pt+wt,p_{t+1}=p_{t}+w_{t}, (5.2)

where \{w_{t}\}_{t=0}^{T-1} is a sequence of IID Gaussian random variables with zero mean and variance \overline{W}\in\mathbb{R}. Neither the buyer nor the seller has access to the true value of the good. Instead, they observe a noisy version of the value using their private information. At time t=0, player i (i=B,S) believes that the initial value p_{0}\sim\mathcal{N}(\widehat{p}_{0}^{i},W^{i}_{0}), and after that player i observes the following noisy signal:

zt+1i=pt+1+wt+1i,wt+1i𝒩(0,Gi),t=0,1,,T1,z_{t+1}^{i}=p_{t+1}+\,w^{i}_{t+1},\quad w^{i}_{t+1}\sim\mathcal{N}(0,G^{i}),\quad t=0,1,\cdots,T-1, (5.3)

where {wti}t=1T1\{w^{i}_{t}\}_{t=1}^{T-1} is a sequence of IID random variables, and {wtB}t=1T1\{w^{B}_{t}\}_{t=1}^{T-1} and {wtS}t=1T1\{w^{S}_{t}\}_{t=1}^{T-1} are independent of each other. Thus ptp_{t} corresponds to the partially observable part xt(1)x_{t}^{(1)} in (4.1) of Section 4, with n1=1n_{1}=1.
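As a small illustration of the dynamics (5.2)-(5.3), the following snippet simulates one path of the good's value together with the two players' noisy signals; the parameter values match those adopted later in Section 5.2 and the variable names are our own.

```python
import numpy as np

rng = np.random.default_rng(1)
T, W_bar = 10, 9.0                  # horizon and value-noise variance
G_B, G_S = 100.0, 1.0               # observation-noise variances for buyer and seller

p = np.empty(T + 1)
p[0] = 50.0                         # true initial value of the good
z_B = np.full(T + 1, np.nan)        # signals only arrive from t = 1 onward, cf. (5.3)
z_S = np.full(T + 1, np.nan)
for t in range(T):
    p[t + 1] = p[t] + rng.normal(0.0, np.sqrt(W_bar))          # (5.2)
    z_B[t + 1] = p[t + 1] + rng.normal(0.0, np.sqrt(G_B))      # (5.3), buyer
    z_S[t + 1] = p[t + 1] + rng.normal(0.0, np.sqrt(G_S))      # (5.3), seller
```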

We formulate player ii’s (i=B,Si=B,S) objective in the game as

min{uti}t=0T1𝔼[αi(xTBxTS)2+βi(xTi(1+δi)pT)2+t=0T1Rti(uti)2|0i],\min_{\{u_{t}^{i}\}_{t=0}^{T-1}}\mathbb{E}\left.\left[\alpha_{i}\left(x_{T}^{B}-x_{T}^{S}\right)^{2}+\beta_{i}\left(x_{T}^{i}-(1+\delta_{i})p_{T}\right)^{2}+\sum_{t=0}^{T-1}R_{t}^{i}(u_{t}^{i})^{2}\,\right|\,\mathcal{H}^{i}_{0}\right], (5.4)

where \delta_{B}\in(-1,0) and \delta_{S}\in(0,1) are the scalars that determine the buyer's and the seller's target prices at terminal time T. The constants \alpha_{B}>0 and \alpha_{S}>0 are the penalties for not reaching an agreement, and \beta_{B}>0 and \beta_{S}>0 are the penalties for deviating from their target prices. The quadratic terms \alpha_{S}(x^{B}_{T}-x^{S}_{T})^{2} and \alpha_{B}(x^{B}_{T}-x^{S}_{T})^{2} can be viewed as a relaxation of the hard constraint x^{B}_{T}=x^{S}_{T}. The parameters R_{t}^{B}>0 and R_{t}^{S}>0 measure the cost of adjusting the offer price at each time step; thus the final terms represent the penalty for making concessions. The filtrations \mathcal{H}^{B}_{0}:=\{\widehat{p}_{0}^{B},W_{0}^{B},W_{0}^{S}\} and \mathcal{H}^{S}_{0}:=\{\widehat{p}_{0}^{S},W_{0}^{B},W_{0}^{S}\} represent the information available at time 0.

Both players have the incentive to reach an agreement at terminal time TT. The desire to reach this agreement is characterized by the value of αB\alpha_{B} and αS\alpha_{S}, which may be different for the buyer and the seller. The hard constraint xTB=xTSx^{B}_{T}=x^{S}_{T} can be recovered by letting αB\alpha_{B} and αS\alpha_{S} tend to infinity. The seller wants to sell the good at a price that is higher than (his estimate of) the true price and thus δS>0\delta_{S}>0. Similarly δB<0\delta_{B}<0 as the buyer has the incentive to buy at a price lower than his estimated true price.

5.2 Experiments

In this section, we present some numerical experiments and discuss the effect of observation noise and our information corrections for the bargaining game introduced in Section 5.1. We focus on the case n1=1n_{1}=1, where the dynamics of the value of the good and the players’ noisy observations are defined in (5.2)-(5.3). The bargaining model considered in this section satisfies the conditions for the special case described in point 5. of Remark 2.5, where each player can fully recover the opponent’s state estimate in the previous step.

Experimental Set-up.

In the bargaining game (5.1)-(5.4), the model parameters are,

At=I,BtB=[010],BtS=[001],W=[W¯000W¯B000W¯S],QtB=QtS=0, and A_{t}=I,\,\,B_{t}^{B}=\begin{bmatrix}0\\ 1\\ 0\end{bmatrix},\quad B_{t}^{S}=\begin{bmatrix}0\\ 0\\ 1\end{bmatrix},\quad W=\begin{bmatrix}\overline{W}&0&0\\ 0&\overline{W}^{B}&0\\ 0&0&\overline{W}^{S}\end{bmatrix},\quad Q_{t}^{B}=Q_{t}^{S}=0,\text{ and }
QTB=[βB(1+δB)2βB(1+δB)0βB(1+δB)αB+βBαB0αBαB],QTS=[βS(1+δS)20βS(1+δS)0αSαSβS(1+δS)αSαS+βS],Q_{T}^{B}=\begin{bmatrix}\beta_{B}(1+\delta_{B})^{2}&-\beta_{B}(1+\delta_{B})&0\\ -\beta_{B}(1+\delta_{B})&\alpha_{B}+\beta_{B}&-\alpha_{B}\\ 0&-\alpha_{B}&\alpha_{B}\end{bmatrix},\quad Q_{T}^{S}=\begin{bmatrix}\beta_{S}(1+\delta_{S})^{2}&0&-\beta_{S}(1+\delta_{S})\\ 0&\alpha_{S}&-\alpha_{S}\\ -\beta_{S}(1+\delta_{S})&-\alpha_{S}&\alpha_{S}+\beta_{S}\end{bmatrix},

for t=0,1,,T1t=0,1,\ldots,T-1. Also we have Hti=IH_{t}^{i}=I for i=S,Bi=S,B.

In the experiments we let αB=αS=50\alpha_{B}=\alpha_{S}=50, βB=βS=30\beta_{B}=\beta_{S}=30, δB=0.05\delta_{B}=-0.05, δS=0.05\delta_{S}=0.05, and T=10T=10, so the players care more about reaching an agreement with each other. We set the penalty function to be Rti=ρiexp(γit)R_{t}^{i}=\rho_{i}\exp(-\gamma_{i}t) for i=B,Si=B,S with ρB=ρS=15\rho_{B}=\rho_{S}=15 and γB=γS=0.1\gamma_{B}=\gamma_{S}=0.1. The penalty function decays over time which allows players to be more flexible near the deadline to reach an agreement. For the initial state we set p0=50p_{0}=50, x0B=10x_{0}^{B}=10, x0S=90x_{0}^{S}=90. We also set W¯=9\overline{W}=9 for the noise in the dynamics of the true value of the good, and W¯B=W¯S=1012\overline{W}^{B}=\overline{W}^{S}=10^{-12}. The reason for adding the small noise to xtBx_{t}^{B} and xtSx_{t}^{S} is to guarantee the well-definedness of the problem. In practice we can set W¯B=W¯S=0\overline{W}^{B}=\overline{W}^{S}=0, and the numerical experiments will still work.
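For concreteness, the following sketch assembles the model and cost matrices listed above from these parameter values and verifies that Q_{T}^{B} reproduces the buyer's terminal cost in (5.4); the state ordering (p_{t},x_{t}^{B},x_{t}^{S}), the illustrative evaluation point, and all variable names are our own assumptions.

```python
import numpy as np

# Experiment parameters from this section.
alpha_B = alpha_S = 50.0
beta_B = beta_S = 30.0
delta_B, delta_S = -0.05, 0.05
T, rho, gamma = 10, 15.0, 0.1
W_bar, W_bar_B, W_bar_S = 9.0, 1e-12, 1e-12

# State ordering (p_t, x_t^B, x_t^S); model matrices as given above.
A = np.eye(3)
B_B = np.array([[0.0], [1.0], [0.0]])
B_S = np.array([[0.0], [0.0], [1.0]])
W = np.diag([W_bar, W_bar_B, W_bar_S])
R_B = [rho * np.exp(-gamma * t) for t in range(T)]      # R_t^B = rho * exp(-gamma t)

Q_T_B = np.array([
    [beta_B * (1 + delta_B) ** 2, -beta_B * (1 + delta_B), 0.0],
    [-beta_B * (1 + delta_B),      alpha_B + beta_B,      -alpha_B],
    [0.0,                         -alpha_B,                alpha_B],
])

# Sanity check: x' Q_T^B x should equal the buyer's terminal cost in (5.4).
p, xB, xS = 48.0, 47.0, 52.0
x = np.array([p, xB, xS])
lhs = x @ Q_T_B @ x
rhs = alpha_B * (xB - xS) ** 2 + beta_B * (xB - (1 + delta_B) * p) ** 2
print(np.isclose(lhs, rhs))   # expected: True
```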

To see the effect of the observation noise, we let the buyer have a much noisier observation of the true price (G^{B}=100 and G^{S}=1). We also set \widehat{p}_{0}^{B}=40 with W_{0}^{B}=100 for the buyer and \widehat{p}_{0}^{S}=51 with W_{0}^{S}=1 for the seller; thus the buyer has a far more inaccurate guess of the initial state. In the figures and tables we will write IC for information corrections.

Effect of Observation Noise.

Since the buyer receives relatively noisy signals of the true price, their price estimate (indicated in orange) will be more inaccurate than the seller’s (indicated in blue) in the example shown in Figure 1. The behaviour of both players is similar to that in the full information case, since the buyer utilizes the seller’s accurate information to improve their own state estimate.

Effect of Information Corrections.

A key contribution of our work is information corrections, where players correct their estimate of the state after observing their opponent’s actions. We demonstrate the power of the information corrections in Figure 2. When the buyer skips steps (4.7a)-(4.7c), their state estimate will rely purely on their own observations and thus can be very inaccurate. However, with information corrections, they can obtain a better estimate which is less affected by the noisy observations. Hence they are more likely to reach an agreement with the seller at a reasonable price.

Figure 1: Comparison between the full observation (right) and partial observation (left) cases.
Figure 2: The buyer’s price estimate with IC (right) and without IC (left).

Table 1 reports statistics for the buyer’s estimation error: both the average mean squared error and the average mean absolute error over 500 experiments (each consisting of 10 rounds of bargaining) are smaller when information corrections are used. Table 2 shows the effect of information corrections on the outcomes of the bargaining game, where we consider the players to have reached an agreement if the difference between their final offers is less than 3. With information corrections the players are more likely to reach an agreement. This is also visible in the example in Figure 2: with information corrections the final offers of the buyer and the seller are close enough to make the deal, while without them no agreement is reached.

                                 Mean squared error   Mean absolute error
With information correction           17.43                  3.10
Without information correction        35.91                  4.83
Table 1: Effect of IC on the buyer’s estimation error (average over 500 experiments).
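The error statistics reported in Table 1 can be computed along the following lines; this is a sketch in which est and truth are hypothetical arrays of shape (500, 10) containing the buyer’s price estimates and the true prices over the 500 simulated negotiations.

```python
import numpy as np

def estimation_error_stats(est, truth):
    """Average mean squared / absolute error of the buyer's price estimate.

    est, truth: arrays of shape (num_experiments, num_rounds).
    """
    err = est - truth
    mse = np.mean(np.mean(err ** 2, axis=1))    # average over rounds, then over experiments
    mae = np.mean(np.mean(np.abs(err), axis=1))
    return mse, mae
```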

Furthermore, the above setting is an “asymmetric” case, where the buyer has a less accurate estimate of the initial state and receives noisier signals during the negotiation. We now compare the number of agreements obtained in this case (Table 2) with that obtained in two symmetric cases, which serve as benchmarks. In the symmetric case where both players have “accurate” information with small observation noise, we let G^{B}=G^{S}=1 for both players, and set \widehat{x}_{0}^{B}=49 and \widehat{x}_{0}^{S}=51 with W_{0}^{B}=W_{0}^{S}=1; in the symmetric case where both have inaccurate information, we set G^{B}=G^{S}=100, \widehat{x}_{0}^{B}=40, and \widehat{x}_{0}^{S}=60 with W_{0}^{B}=W_{0}^{S}=100. We run 500 experiments in each case. The number of agreements when both players have “accurate” information is the same regardless of whether they utilize the observed actions of their opponent. However, when both players receive very noisy signals, information corrections significantly improve the number of agreements. This further demonstrates the need for information corrections, especially when players have different levels of observation noise, as is often the case in practice since players may draw on a variety of information sources. We also note that the number of agreements reached in the asymmetric case is closer to that in the inaccurate symmetric case: although the players improve their state estimates at the previous time step, the offers they make are still based on their current noisy observations.

             Asymmetric   Symmetric (“accurate”)   Symmetric (inaccurate)
With IC         371                470                      367
Without IC      253                470                      247
Table 2: Number of agreements achieved in 500 experiments with and without IC.

We can also compare the players’ costs in the asymmetric and symmetric cases (see Table 3). The costs are calculated empirically based on (5.4). In the asymmetric case and in the symmetric case where both players have very noisy observations, both players achieve significantly lower costs when using information corrections.

                            Asymmetric   Symmetric (“accurate”)   Symmetric (inaccurate)
With IC (Buyer/Seller)      2250/2235         1923/2053                 2505/2715
Without IC (Buyer/Seller)   3068/2819         1924/2053                 3244/3359
Table 3: Players’ average costs with and without IC in 500 experiments.
Aiming at Beneficial Prices.

In the above experiments we focused on the case where both players strive to reach an agreement with their opponent (\alpha_{i}>\beta_{i} for i=B,S). However, in some situations players may be keener on achieving their target price, or a more beneficial price, and less concerned about whether an agreement is reached. We now let the buyer focus more on pursuing their target price and let the seller mainly seek an agreement with the buyer. We let \alpha_{B}=20, \alpha_{S}=50, \beta_{B}=40, \beta_{S}=30, \rho_{B}=\rho_{S}=10, and \overline{W}=1. We also let the buyer have far less accurate information than the seller by setting G^{B}=100, G^{S}=1, \widehat{x}_{0}^{B}=70, W_{0}^{B}=100, \widehat{x}_{0}^{S}=53, and W_{0}^{S}=1. The other parameters are the same as in the previous experiments.

In Table 4, we can see that the information corrections significantly improve the number of agreements achieved. We define the agreement price to be the average of x_{T}^{B} and x_{T}^{S}. For a fair comparison we only consider situations where an agreement is achieved in both cases (with and without information corrections). We observe a gap between the confidence intervals of the agreement prices, which shows that by using the information corrections the buyer can obtain a better deal, exploiting more effectively the seller’s willingness to sacrifice their target price in order to reach an agreement.

             Number of agreements   Mean of APs   95% confidence interval of APs
With IC              442               48.58              (48.31, 48.84)
Without IC           342               49.61              (49.34, 49.88)
Table 4: Bargaining outcomes and agreement prices (APs) with and without IC in 500 experiments.
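A sketch of how the outcome statistics in Tables 2 and 4 can be computed; the array names x_T_B and x_T_S are hypothetical and hold the players’ final offers across the 500 experiments. The agreement criterion (final offers within 3 of each other) and the agreement price (the average of the final offers) follow the definitions above, and the 95% confidence interval uses a normal approximation.

```python
import numpy as np

def bargaining_outcomes(x_T_B, x_T_S, tol=3.0):
    """Count agreements (final offers differ by less than tol) and summarize agreement prices."""
    x_T_B, x_T_S = np.asarray(x_T_B), np.asarray(x_T_S)
    agreed = np.abs(x_T_B - x_T_S) < tol
    prices = 0.5 * (x_T_B[agreed] + x_T_S[agreed])   # agreement price = average of final offers
    mean_price = prices.mean()
    half_width = 1.96 * prices.std(ddof=1) / np.sqrt(len(prices))
    return agreed.sum(), mean_price, (mean_price - half_width, mean_price + half_width)
```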


Appendix A Tower Property

We return to the calculation of (2.65) and (LABEL:eqn:RHS_tower). Recall that we consider the case when FtPF_{t}^{P} has rank m<nm<n and FtEF_{t}^{E} has rank k<nk<n for all t=0,1,,T1t=0,1,\ldots,T-1.

Matching LHS with RHS.

We can see that the first terms (quadratic in \widehat{x}_{T-1}^{P}) on the LHS and the RHS are the same. Thus we are left to show that the constant terms in (2.65) and (LABEL:eqn:RHS_tower) are the same. We prove this in the following steps.

Step 1: Expanding L1L_{1} and L2L_{2} on the LHS.

We first calculate the three terms in (2.65) that involve L_{1} and L_{2}, using (2.63) and (2.64). The first term is given by

Tr(L1QTPL1Σ^T1E)\displaystyle\operatorname{Tr}\Big{(}L_{1}^{\top}Q_{T}^{P}L_{1}\widehat{\Sigma}_{T-1}^{E}\Big{)} =\displaystyle= Tr((BT1EFT1E)QTPBT1EFT1EΣ^T1E)(1a)\displaystyle\underbrace{\operatorname{Tr}\big{(}(B^{E}_{T-1}F_{T-1}^{E})^{\top}Q_{T}^{P}B^{E}_{T-1}F_{T-1}^{E}\widehat{\Sigma}_{T-1}^{E}\big{)}}_{(1a)}
+Tr(((IKTPHTP)AT1ΠT1P)QTP(IKTPHTP)AT1ΠT1PΣ^T1E)(1b)\displaystyle+\underbrace{\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P}\widehat{\Sigma}_{T-1}^{E}\big{)}}_{(1b)}
+2Tr(((IKTPHTP)AT1ΠT1P)QTPBT1EFT1EΣ^T1E)\displaystyle+2\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}B^{E}_{T-1}F_{T-1}^{E}\widehat{\Sigma}_{T-1}^{E}\big{)}

For the second term, we first note that by adding and subtracting AT1A_{T-1}, we obtain

L2\displaystyle L_{2} =\displaystyle= (IKTPHTP)AT1ΠT1PBT1EFT1EKTPHTPAT1\displaystyle-(I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P}-B^{E}_{T-1}F_{T-1}^{E}-K_{T}^{P}H_{T}^{P}A_{T-1}
=\displaystyle= (IKTPHTP)AT1ΠT1PBT1EFT1E+(IKTPHTP)AT1AT1\displaystyle-(I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P}-B^{E}_{T-1}F_{T-1}^{E}+(I-K_{T}^{P}H_{T}^{P})A_{T-1}-A_{T-1}

Then, the second term is given by

Tr(L2QTPL2Σ^T1P)\displaystyle\operatorname{Tr}\Big{(}L_{2}^{\top}Q_{T}^{P}L_{2}\widehat{\Sigma}_{T-1}^{P}\Big{)} =\displaystyle= Tr(((IKTPHTP)AT1ΠT1P)QTP(IKTPHTP)AT1ΠT1PΣ^T1P)(2a)\displaystyle\underbrace{\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P}\widehat{\Sigma}_{T-1}^{P}\big{)}}_{(2a)}
+Tr((AT1+BT1EFT1E)QTP(AT1+BT1EFT1E)Σ^T1P)(2b)\displaystyle+\underbrace{\operatorname{Tr}\big{(}(A_{T-1}+B^{E}_{T-1}F_{T-1}^{E})^{\top}Q_{T}^{P}(A_{T-1}+B^{E}_{T-1}F_{T-1}^{E})\widehat{\Sigma}_{T-1}^{P}\big{)}}_{(2b)}
+Tr(((IKTPHTP)AT1)QTP(IKTPHTP)AT1Σ^T1P)(2c)\displaystyle+\underbrace{\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1})^{\top}Q_{T}^{P}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\widehat{\Sigma}_{T-1}^{P}\big{)}}_{(2c)}
+2Tr(((IKTPHTP)AT1ΠT1P)QTP(AT1+BT1EFT1E)Σ^T1P)\displaystyle+2\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}(A_{T-1}+B^{E}_{T-1}F_{T-1}^{E})\widehat{\Sigma}_{T-1}^{P}\big{)}
2Tr(((IKTPHTP)AT1ΠT1P)QTP(IKTPHTP)AT1Σ^T1P)(2d)\displaystyle-\underbrace{2\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\widehat{\Sigma}_{T-1}^{P}\big{)}}_{(2d)}
2Tr((AT1+BT1EFT1E)QTP(IKTPHTP)AT1Σ^T1P)\displaystyle-2\operatorname{Tr}\big{(}(A_{T-1}+B^{E}_{T-1}F_{T-1}^{E})^{\top}Q_{T}^{P}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\widehat{\Sigma}_{T-1}^{P}\big{)}

Similarly the third term is given by

2Tr(L1QTPL2Σ~T1(P,E))\displaystyle 2\operatorname{Tr}\Big{(}L_{1}^{\top}Q_{T}^{P}L_{2}\widetilde{\Sigma}_{T-1}^{(P,E)}\Big{)} =\displaystyle= 2Tr((BT1EFT1E)QTP(AT1+BT1EFT1E)Σ~T1(P,E))(3a)\displaystyle\underbrace{-2\operatorname{Tr}\big{(}(B^{E}_{T-1}F_{T-1}^{E})^{\top}Q_{T}^{P}(A_{T-1}+B^{E}_{T-1}F_{T-1}^{E})\widetilde{\Sigma}_{T-1}^{(P,E)}\big{)}}_{(3a)}
2Tr((BT1EFT1E)QTP(IKTPHTP)AT1ΠT1PΣ~T1(P,E))\displaystyle-2\operatorname{Tr}\big{(}(B^{E}_{T-1}F_{T-1}^{E})^{\top}Q_{T}^{P}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P}\widetilde{\Sigma}_{T-1}^{(P,E)}\big{)}
+2Tr((BT1EFT1E)QTP(IKTPHTP)AT1Σ~T1(P,E))\displaystyle+2\operatorname{Tr}\big{(}(B^{E}_{T-1}F_{T-1}^{E})^{\top}Q_{T}^{P}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\widetilde{\Sigma}_{T-1}^{(P,E)}\big{)}
2Tr(((IKTPHTP)AT1ΠT1P)QTP(AT1+BT1EFT1E)Σ~T1(P,E))\displaystyle-2\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}(A_{T-1}+B^{E}_{T-1}F_{T-1}^{E})\widetilde{\Sigma}_{T-1}^{(P,E)}\big{)}
2Tr(((IKTPHTP)AT1ΠT1P)QTP(IKTPHTP)AT1ΠT1PΣ~T1(P,E))(3b)\displaystyle-\underbrace{2\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P}\widetilde{\Sigma}_{T-1}^{(P,E)}\big{)}}_{(3b)}
+2Tr(((IKTPHTP)AT1ΠT1P)QTP(IKTPHTP)AT1Σ~T1(P,E))(3c)\displaystyle+\underbrace{2\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\widetilde{\Sigma}_{T-1}^{(P,E)}\big{)}}_{(3c)}

Now we observe that all the constant terms on the RHS except \operatorname{Tr}(\Gamma_{T-1}^{\top}Q_{T}^{P}\Gamma_{T-1}W) are matched by (2b), (1a), and (3a) on the LHS, that is,

\text{Constant terms on RHS}=(2b)+(1a)+(3a)+\operatorname{Tr}(\Gamma_{T-1}^{\top}Q_{T}^{P}\Gamma_{T-1}W). (A.1)

Therefore, our goal now is to show that \operatorname{Tr}(\Gamma_{T-1}^{\top}Q_{T}^{P}\Gamma_{T-1}W) equals the sum of the remaining constant terms on the LHS. Before matching these terms, we first merge and simplify some of the terms on the LHS.

Step 2: Merging quadratic terms of ΠT1P\Pi_{T-1}^{P} on LHS.

Recall that \Pi_{T-1}^{P} is defined as \Pi_{T-1}^{P}=\big{(}\widehat{\Sigma}_{T-1}^{P}-\widetilde{\Sigma}_{T-1}^{(P,E)}\big{)}\big{(}\widehat{\Sigma}_{T-1}^{(P,E)}\big{)}^{-1}. Thus we have

\Pi_{T-1}^{P}\widehat{\Sigma}_{T-1}^{(P,E)}=\widehat{\Sigma}_{T-1}^{P}-\widetilde{\Sigma}_{T-1}^{(P,E)}. (A.2)

This provides a way to reduce the order of ΠT1P\Pi_{T-1}^{P} on LHS. Collecting terms which are quadratic in ΠT1P\Pi_{T-1}^{P} we obtain

(1b)+(2a)+(3b)\displaystyle(1b)+(2a)+(3b) =\displaystyle= Tr(((IKTPHTP)AT1ΠT1P)QTP(IKTPHTP)AT1ΠT1P\displaystyle\operatorname{Tr}\Big{(}\big{(}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P}\big{)}^{\top}Q_{T}^{P}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P}\cdot (A.3)
(Σ^T1P+Σ^T1EΣ~T1(P,E)Σ~T1(E,P)))\displaystyle\qquad\big{(}\widehat{\Sigma}_{T-1}^{P}+\widehat{\Sigma}_{T-1}^{E}-\widetilde{\Sigma}_{T-1}^{(P,E)}-\widetilde{\Sigma}_{T-1}^{(E,P)}\big{)}\Big{)}
=\displaystyle= Tr(((IKTPHTP)AT1ΠT1P)QTP(IKTPHTP)AT1ΠT1PΣ^T1(P,E)),\displaystyle\operatorname{Tr}\Big{(}\big{(}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P}\big{)}^{\top}Q_{T}^{P}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P}\widehat{\Sigma}_{T-1}^{(P,E)}\Big{)},
\displaystyle= \operatorname{Tr}\Big{(}\big{(}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P}\big{)}^{\top}Q_{T}^{P}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\big{(}\widehat{\Sigma}_{T-1}^{P}-\widetilde{\Sigma}_{T-1}^{(P,E)}\big{)}\Big{)}, (A.4)

where (A.3) holds by the definition of \widehat{\Sigma}_{T-1}^{(P,E)} given in (2.38), and (A.4) holds by (A.2). Now we observe that the term in (A.4) cancels half of the sum (2d)+(3c), since

(2d)+(3c)=2Tr(((IKTPHTP)AT1ΠT1P)QTP(IKTPHTP)AT1(Σ~T1(P,E)Σ^T1P)).(2d)+(3c)=2\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\big{(}\widetilde{\Sigma}_{T-1}^{(P,E)}-\widehat{\Sigma}_{T-1}^{P}\big{)}\Big{)}.

Therefore in summary, we have

(1b)+(2a)+(3b)+(2d)+(3c)=Tr(((IKTPHTP)AT1ΠT1P)QTP(IKTPHTP)AT1(Σ~T1(P,E)Σ^T1P)).(1b)+(2a)+(3b)+(2d)+(3c)=\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\big{(}\widetilde{\Sigma}_{T-1}^{(P,E)}-\widehat{\Sigma}_{T-1}^{P}\big{)}\Big{)}.
Step 3: Merging the last three terms on LHS.

Going back to (2.65), we now consider the last three constant terms, which do not involve L_{1} and L_{2}. Adding the first two of these terms to (2c) leads to

Tr((KTP)QTPKTPGP)+Tr(ΓT1(HTP)(KTP)QTPKTPHTPΓT1W)+(2c)\displaystyle\operatorname{Tr}((K_{T}^{P})^{\top}Q_{T}^{P}K_{T}^{P}G^{P})+\operatorname{Tr}(\Gamma_{T-1}^{\top}(H_{T}^{P})^{\top}(K_{T}^{P})^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}\Gamma_{T-1}W)+(2c) (A.5)
=\displaystyle= Tr((KTP)QTPKTP(HTPΓT1WΓT1(HTP)+GP))\displaystyle\operatorname{Tr}\Big{(}(K_{T}^{P})^{\top}Q_{T}^{P}K_{T}^{P}\big{(}H_{T}^{P}\Gamma_{T-1}W\Gamma_{T-1}^{\top}(H_{T}^{P})^{\top}+G^{P}\big{)}\Big{)}
+Tr(((IKTPHTP)AT1)QTP(IKTPHTP)AT1Σ^T1P)\displaystyle+\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1})^{\top}Q_{T}^{P}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\widehat{\Sigma}_{T-1}^{P}\big{)}
=\displaystyle= Tr((KTP)QTPKTP(HTPΓT1WΓT1(HTP)+GP+HTPAT1Σ^T1PAT1(HTP)))\displaystyle\operatorname{Tr}\Big{(}(K_{T}^{P})^{\top}Q_{T}^{P}K_{T}^{P}\big{(}H_{T}^{P}\Gamma_{T-1}W\Gamma_{T-1}^{\top}(H_{T}^{P})^{\top}+G^{P}+H_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P}A_{T-1}^{\top}(H_{T}^{P})^{\top}\big{)}\Big{)}
2Tr(AT1QTPKTPHTPAT1Σ^T1P)+Tr(AT1QTPAT1Σ^T1P)\displaystyle-2\operatorname{Tr}(A_{T-1}^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P})+\operatorname{Tr}(A_{T-1}^{\top}Q_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P})

To simplify the first term in the above sum, we use the following two facts.

Fact 1.

We first note that

(Σ^TP)\displaystyle\big{(}\widehat{\Sigma}^{P}_{T}\big{)}^{-} =\displaystyle= AT1(Σ^T1P)+AT1+ΓT1WΓT1\displaystyle A_{T-1}(\widehat{\Sigma}^{P}_{T-1})^{+}A_{T-1}^{\top}+\Gamma_{T-1}W\Gamma_{T-1}^{\top} (A.6)
\displaystyle= A_{T-1}\Big{(}\widehat{\Sigma}^{P}_{T-1}-(\widehat{\Sigma}^{P}_{T-1}-\widetilde{\Sigma}_{T-1}^{(P,E)})(\widehat{\Sigma}_{T-1}^{(P,E)})^{-1}\big{(}\widehat{\Sigma}_{T-1}^{P}-\widetilde{\Sigma}_{T-1}^{(P,E)}\big{)}^{\top}\Big{)}A_{T-1}^{\top}+\Gamma_{T-1}W\Gamma_{T-1}^{\top}, (A.7)

where (A.6) holds by (2.21e) and (A.7) holds by (2.21c). Thus, by rearranging terms in (A.7) and using \Pi_{T-1}^{P}=\big{(}\widehat{\Sigma}_{T-1}^{P}-\widetilde{\Sigma}_{T-1}^{(P,E)}\big{)}\big{(}\widehat{\Sigma}_{T-1}^{(P,E)}\big{)}^{-1}, we have

A_{T-1}\widehat{\Sigma}^{P}_{T-1}A_{T-1}^{\top}+\Gamma_{T-1}W\Gamma_{T-1}^{\top}=\big{(}\widehat{\Sigma}^{P}_{T}\big{)}^{-}+A_{T-1}(\widehat{\Sigma}^{P}_{T-1}-\widetilde{\Sigma}_{T-1}^{(P,E)})(\Pi_{T-1}^{P})^{\top}A_{T-1}^{\top}. (A.8)
Fact 2.

By the definition of K_{t}^{i} in (2.21f), applied with i=P and t=T, we have

KTP(HTP(Σ^TP)(HTP)+GP)=(Σ^TP)(HTP).K_{T}^{P}\left(H_{T}^{P}\big{(}\widehat{\Sigma}^{P}_{T}\big{)}^{-}(H^{P}_{T})^{\top}+G^{P}\right)=\big{(}\widehat{\Sigma}^{P}_{T}\big{)}^{-}(H_{T}^{P})^{\top}.

Therefore,

(KTP)QTPKTP(HTP(Σ^TP)(HTP)+GP)=(KTP)QTP(Σ^TP)(HTP).(K_{T}^{P})^{\top}Q_{T}^{P}K_{T}^{P}\left(H_{T}^{P}\big{(}\widehat{\Sigma}^{P}_{T}\big{)}^{-}(H^{P}_{T})^{\top}+G^{P}\right)=(K_{T}^{P})^{\top}Q_{T}^{P}\big{(}\widehat{\Sigma}^{P}_{T}\big{)}^{-}(H_{T}^{P})^{\top}. (A.9)
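Fact 2 is simply the definition of the Kalman gain rearranged. As a quick numerical sanity check (a sketch with randomly generated matrices, assuming the standard gain form K_{T}^{P}=(\widehat{\Sigma}_{T}^{P})^{-}(H_{T}^{P})^{\top}\big(H_{T}^{P}(\widehat{\Sigma}_{T}^{P})^{-}(H_{T}^{P})^{\top}+G^{P}\big)^{-1} from (2.21f); this check is not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
# Random symmetric positive definite "predicted covariance" and observation-noise covariance.
M = rng.normal(size=(n, n)); Sigma_minus = M @ M.T + n * np.eye(n)
N = rng.normal(size=(n, n)); G = N @ N.T + n * np.eye(n)
H = rng.normal(size=(n, n))

# Kalman gain: K = Sigma^- H^T (H Sigma^- H^T + G)^{-1}.
S = H @ Sigma_minus @ H.T + G
K = Sigma_minus @ H.T @ np.linalg.inv(S)

# Fact 2: K (H Sigma^- H^T + G) = Sigma^- H^T.
assert np.allclose(K @ S, Sigma_minus @ H.T)
```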

Given these two facts, we can substitute (A.8) and (A.9) into (A.5) to get

Tr((KTP)QTPKTPGP)+Tr(ΓT1(HTP)(KTP)QTPKTPHTPΓT1W)+(2c)\displaystyle\operatorname{Tr}((K_{T}^{P})^{\top}Q_{T}^{P}K_{T}^{P}G^{P})+\operatorname{Tr}(\Gamma_{T-1}^{\top}(H_{T}^{P})^{\top}(K_{T}^{P})^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}\Gamma_{T-1}W)+(2c) (A.10)
\displaystyle= \operatorname{Tr}\Big{(}(K_{T}^{P})^{\top}Q_{T}^{P}K_{T}^{P}\big{(}H_{T}^{P}\big{(}\big{(}\widehat{\Sigma}^{P}_{T}\big{)}^{-}+A_{T-1}(\widehat{\Sigma}^{P}_{T-1}-\widetilde{\Sigma}_{T-1}^{(P,E)})(\Pi_{T-1}^{P})^{\top}A_{T-1}^{\top}\big{)}(H_{T}^{P})^{\top}+G^{P}\big{)}\Big{)}
2Tr(AT1QTPKTPHTPAT1Σ^T1P)+Tr(AT1QTPAT1Σ^T1P)\displaystyle-2\operatorname{Tr}(A_{T-1}^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P})+\operatorname{Tr}(A_{T-1}^{\top}Q_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P})
=\displaystyle= Tr((KTP)QTP(Σ^TP)(HTP))+Tr((KTP)QTPKTPHTPAT1(Σ^T1PΣ~T1(P,E))(ΠT1P)AT1(HTP))\displaystyle\operatorname{Tr}\Big{(}(K_{T}^{P})^{\top}Q_{T}^{P}\big{(}\widehat{\Sigma}^{P}_{T}\big{)}^{-}(H_{T}^{P})^{\top}\Big{)}+\operatorname{Tr}\Big{(}(K_{T}^{P})^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}A_{T-1}(\widehat{\Sigma}^{P}_{T-1}-\widetilde{\Sigma}_{T-1}^{(P,E)})(\Pi_{T-1}^{P})^{\top}A_{T-1}^{\top}(H_{T}^{P})^{\top}\Big{)}
2Tr(AT1QTPKTPHTPAT1Σ^T1P)+Tr(AT1QTPAT1Σ^T1P)\displaystyle-2\operatorname{Tr}(A_{T-1}^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P})+\operatorname{Tr}(A_{T-1}^{\top}Q_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P})
=\displaystyle= Tr(QTP((Σ^TP)Σ^TP))+Tr((KTP)QTPKTPHTPAT1(Σ^T1PΣ~T1(P,E))(ΠT1P)AT1(HTP))\displaystyle\operatorname{Tr}\Big{(}Q_{T}^{P}\big{(}\big{(}\widehat{\Sigma}^{P}_{T}\big{)}^{-}-\widehat{\Sigma}^{P}_{T}\big{)}\Big{)}+\operatorname{Tr}\Big{(}(K_{T}^{P})^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}A_{T-1}(\widehat{\Sigma}^{P}_{T-1}-\widetilde{\Sigma}_{T-1}^{(P,E)})(\Pi_{T-1}^{P})^{\top}A_{T-1}^{\top}(H_{T}^{P})^{\top}\Big{)}
2Tr(AT1QTPKTPHTPAT1Σ^T1P)+Tr(AT1QTPAT1Σ^T1P)\displaystyle-2\operatorname{Tr}(A_{T-1}^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P})+\operatorname{Tr}(A_{T-1}^{\top}Q_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P})

where the last equation holds since (Σ^TP)(HTP)(KTP)=(Σ^TP)Σ^TP\big{(}\widehat{\Sigma}^{P}_{T}\big{)}^{-}(H^{P}_{T})^{\top}(K_{T}^{P})^{\top}=\big{(}\widehat{\Sigma}^{P}_{T}\big{)}^{-}-\widehat{\Sigma}^{P}_{T} by (2.21h). Now by (A.10), the sum of (2c) and the last three terms in (2.65) is given by

Tr((KTP)QTPKTPGP)+Tr(ΓT1(HTP)(KTP)QTPKTPHTPΓT1W)+Tr(QTPΣ^TP)+(2c)\displaystyle\operatorname{Tr}((K_{T}^{P})^{\top}Q_{T}^{P}K_{T}^{P}G^{P})+\operatorname{Tr}(\Gamma_{T-1}^{\top}(H_{T}^{P})^{\top}(K_{T}^{P})^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}\Gamma_{T-1}W)+\operatorname{Tr}(Q_{T}^{P}\widehat{\Sigma}_{T}^{P})+(2c) (A.11)
=\displaystyle= Tr(QTP(Σ^TP))+Tr((KTP)QTPKTPHTPAT1(Σ^T1PΣ~T1(P,E))(ΠT1P)AT1(HTP))\displaystyle\operatorname{Tr}\Big{(}Q_{T}^{P}\big{(}\widehat{\Sigma}^{P}_{T}\big{)}^{-}\Big{)}+\operatorname{Tr}\Big{(}(K_{T}^{P})^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}A_{T-1}(\widehat{\Sigma}^{P}_{T-1}-\widetilde{\Sigma}_{T-1}^{(P,E)})(\Pi_{T-1}^{P})^{\top}A_{T-1}^{\top}(H_{T}^{P})^{\top}\Big{)}
2Tr(AT1QTPKTPHTPAT1Σ^T1P)+Tr(AT1QTPAT1Σ^T1P)\displaystyle-2\operatorname{Tr}(A_{T-1}^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P})+\operatorname{Tr}(A_{T-1}^{\top}Q_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P})
Step 4: Collecting all the terms on LHS.

Recall from Step 1 that (2b) + (1a) + (3a) match the corresponding constant terms on the RHS (see (A.1)), that the sum (1b) + (2a) + (3b) + (2d) + (3c) is given in Step 2, and that the sum of the last three terms on the LHS and (2c) is given in Step 3. We are now ready to sum all the constant terms on the LHS using the simplified forms obtained in Steps 2 and 3.

Constant terms on LHS
=\displaystyle= (2b)+(1a)+(3a)\displaystyle(2b)+(1a)+(3a)
+Tr(((IKTPHTP)AT1ΠT1P)QTP(IKTPHTP)AT1(Σ~T1(P,E)Σ^T1P))\displaystyle+\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\big{(}\widetilde{\Sigma}_{T-1}^{(P,E)}-\widehat{\Sigma}_{T-1}^{P}\big{)}\Big{)}
+2Tr(((IKTPHTP)AT1ΠT1P)QTPBT1EFT1EΣ^T1E)\displaystyle+2\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}B^{E}_{T-1}F_{T-1}^{E}\widehat{\Sigma}_{T-1}^{E}\big{)}
+2Tr(((IKTPHTP)AT1ΠT1P)QTP(AT1+BT1EFT1E)Σ^T1P)\displaystyle+2\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}(A_{T-1}+B^{E}_{T-1}F_{T-1}^{E})\widehat{\Sigma}_{T-1}^{P}\big{)}
2Tr((AT1+BT1EFT1E)QTP(IKTPHTP)AT1Σ^T1P)\displaystyle-2\operatorname{Tr}\big{(}(A_{T-1}+B^{E}_{T-1}F_{T-1}^{E})^{\top}Q_{T}^{P}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\widehat{\Sigma}_{T-1}^{P}\big{)}
2Tr((BT1EFT1E)QTP(IKTPHTP)AT1ΠT1PΣ~T1(P,E))\displaystyle-2\operatorname{Tr}\big{(}(B^{E}_{T-1}F_{T-1}^{E})^{\top}Q_{T}^{P}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P}\widetilde{\Sigma}_{T-1}^{(P,E)}\big{)}
+2Tr((BT1EFT1E)QTP(IKTPHTP)AT1Σ~T1(P,E))\displaystyle+2\operatorname{Tr}\big{(}(B^{E}_{T-1}F_{T-1}^{E})^{\top}Q_{T}^{P}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\widetilde{\Sigma}_{T-1}^{(P,E)}\big{)}
2Tr(((IKTPHTP)AT1ΠT1P)QTP(AT1+BT1EFT1E)Σ~T1(P,E))\displaystyle-2\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}(A_{T-1}+B^{E}_{T-1}F_{T-1}^{E})\widetilde{\Sigma}_{T-1}^{(P,E)}\big{)}
+Tr(QTP(Σ^TP))+Tr((KTP)QTPKTPHTPAT1(Σ^T1PΣ~T1(P,E))(ΠT1P)AT1(HTP))\displaystyle+\operatorname{Tr}\Big{(}Q_{T}^{P}\big{(}\widehat{\Sigma}^{P}_{T}\big{)}^{-}\Big{)}+\operatorname{Tr}\Big{(}(K_{T}^{P})^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}A_{T-1}(\widehat{\Sigma}^{P}_{T-1}-\widetilde{\Sigma}_{T-1}^{(P,E)})(\Pi_{T-1}^{P})^{\top}A_{T-1}^{\top}(H_{T}^{P})^{\top}\Big{)}
2Tr(AT1QTPKTPHTPAT1Σ^T1P)+Tr(AT1QTPAT1Σ^T1P).\displaystyle-2\operatorname{Tr}(A_{T-1}^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P})+\operatorname{Tr}(A_{T-1}^{\top}Q_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P}).

We observe that, adding together the B^{E}_{T-1}F_{T-1}^{E} parts of the four terms above that involve both \Pi_{T-1}^{P} and B^{E}_{T-1}F_{T-1}^{E} (and using the cyclic property of the trace), we obtain the sum

2Tr(((IKTPHTP)AT1ΠT1P)QTPBT1EFT1E(Σ^T1P+Σ^T1EΣ~T1(P,E)Σ~T1(E,P)))\displaystyle 2\operatorname{Tr}\Big{(}\big{(}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P}\big{)}^{\top}Q_{T}^{P}B^{E}_{T-1}F_{T-1}^{E}\big{(}\widehat{\Sigma}_{T-1}^{P}+\widehat{\Sigma}_{T-1}^{E}-\widetilde{\Sigma}_{T-1}^{(P,E)}-\widetilde{\Sigma}_{T-1}^{(E,P)}\big{)}\Big{)}
=\displaystyle= 2Tr(((IKTPHTP)AT1ΠT1P)QTPBT1EFT1EΣ^T1(P,E))\displaystyle 2\operatorname{Tr}\Big{(}\big{(}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P}\big{)}^{\top}Q_{T}^{P}B^{E}_{T-1}F_{T-1}^{E}\widehat{\Sigma}_{T-1}^{(P,E)}\Big{)}
\displaystyle= 2\operatorname{Tr}\Big{(}\big{(}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\big{)}^{\top}Q_{T}^{P}B^{E}_{T-1}F_{T-1}^{E}\big{(}\widehat{\Sigma}_{T-1}^{P}-\widetilde{\Sigma}_{T-1}^{(E,P)}\big{)}\Big{)}

where in the last equation we use (A.2) again. This sum then cancels with the B^{E}_{T-1}F_{T-1}^{E} parts of the two remaining terms that involve B^{E}_{T-1}F_{T-1}^{E}. After these manipulations we have

Constant terms on LHS (A.22)
=\displaystyle= (2b)+(1a)+(3a)\displaystyle(2b)+(1a)+(3a)
+Tr(((IKTPHTP)AT1ΠT1P)QTP(IKTPHTP)AT1(Σ~T1(P,E)Σ^T1P))\displaystyle+\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\big{(}\widetilde{\Sigma}_{T-1}^{(P,E)}-\widehat{\Sigma}_{T-1}^{P}\big{)}\Big{)}
+2Tr(((IKTPHTP)AT1ΠT1P)QTPAT1Σ^T1P)\displaystyle+2\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P}\big{)}
2Tr(AT1QTP(IKTPHTP)AT1Σ^T1P)\displaystyle-2\operatorname{Tr}\big{(}A_{T-1}^{\top}Q_{T}^{P}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\widehat{\Sigma}_{T-1}^{P}\big{)}
2Tr(((IKTPHTP)AT1ΠT1P)QTPAT1Σ~T1(P,E))\displaystyle-2\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}A_{T-1}\widetilde{\Sigma}_{T-1}^{(P,E)}\big{)}
+Tr(QTP(Σ^TP))+Tr((KTP)QTPKTPHTPAT1(Σ^T1PΣ~T1(P,E))(ΠT1P)AT1(HTP))\displaystyle+\operatorname{Tr}\Big{(}Q_{T}^{P}\big{(}\widehat{\Sigma}^{P}_{T}\big{)}^{-}\Big{)}+\operatorname{Tr}\Big{(}(K_{T}^{P})^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}A_{T-1}(\widehat{\Sigma}^{P}_{T-1}-\widetilde{\Sigma}_{T-1}^{(P,E)})(\Pi_{T-1}^{P})^{\top}A_{T-1}^{\top}(H_{T}^{P})^{\top}\Big{)}
2Tr(AT1QTPKTPHTPAT1Σ^T1P)+Tr(AT1QTPAT1Σ^T1P).\displaystyle-2\operatorname{Tr}(A_{T-1}^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P})+\operatorname{Tr}(A_{T-1}^{\top}Q_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P}).

Now, adding together the following three terms from (A.22), we have

2Tr(AT1QTP(IKTPHTP)AT1Σ^T1P)2Tr(AT1QTPKTPHTPAT1Σ^T1P)\displaystyle-2\operatorname{Tr}\big{(}A_{T-1}^{\top}Q_{T}^{P}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\widehat{\Sigma}_{T-1}^{P}\big{)}-2\operatorname{Tr}(A_{T-1}^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P})
+Tr(AT1QTPAT1Σ^T1P)\displaystyle+\operatorname{Tr}(A_{T-1}^{\top}Q_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P})
=\displaystyle= Tr(AT1QTPAT1Σ^T1P)\displaystyle-\operatorname{Tr}(A_{T-1}^{\top}Q_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P})

Similarly, adding together the following three terms from (A.22) leads to

Tr(((IKTPHTP)AT1ΠT1P)QTP(IKTPHTP)AT1(Σ~T1(P,E)Σ^T1P))\displaystyle\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}(I-K_{T}^{P}H_{T}^{P})A_{T-1}\big{(}\widetilde{\Sigma}_{T-1}^{(P,E)}-\widehat{\Sigma}_{T-1}^{P}\big{)}\Big{)}
+2Tr(((IKTPHTP)AT1ΠT1P)QTPAT1Σ^T1P)\displaystyle+2\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P}\big{)}
2Tr(((IKTPHTP)AT1ΠT1P)QTPAT1Σ~T1(P,E))\displaystyle-2\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}A_{T-1}\widetilde{\Sigma}_{T-1}^{(P,E)}\big{)}
=\displaystyle= Tr(((IKTPHTP)AT1ΠT1P)QTPKTPHTPAT1(Σ~T1(P,E)Σ^T1P))\displaystyle-\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}A_{T-1}\big{(}\widetilde{\Sigma}_{T-1}^{(P,E)}-\widehat{\Sigma}_{T-1}^{P}\big{)}\Big{)}
+Tr(((IKTPHTP)AT1ΠT1P)QTPAT1(Σ^T1PΣ~T1(P,E)))\displaystyle+\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}A_{T-1}\big{(}\widehat{\Sigma}_{T-1}^{P}-\widetilde{\Sigma}_{T-1}^{(P,E)}\big{)}\big{)}

Therefore, the constant terms on LHS can be further simplified as

Constant terms on LHS (A.23)
=\displaystyle= (2b)+(1a)+(3a)\displaystyle(2b)+(1a)+(3a)
Tr(((IKTPHTP)AT1ΠT1P)QTPKTPHTPAT1(Σ~T1(P,E)Σ^T1P))\displaystyle-\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}A_{T-1}\big{(}\widetilde{\Sigma}_{T-1}^{(P,E)}-\widehat{\Sigma}_{T-1}^{P}\big{)}\Big{)}
+Tr(((IKTPHTP)AT1ΠT1P)QTPAT1(Σ^T1PΣ~T1(P,E)))\displaystyle+\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}A_{T-1}\big{(}\widehat{\Sigma}_{T-1}^{P}-\widetilde{\Sigma}_{T-1}^{(P,E)}\big{)}\big{)}
Tr(AT1QTPAT1Σ^T1P)+Tr(QTP(Σ^TP))\displaystyle-\operatorname{Tr}(A_{T-1}^{\top}Q_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P})+\operatorname{Tr}\Big{(}Q_{T}^{P}\big{(}\widehat{\Sigma}^{P}_{T}\big{)}^{-}\Big{)}
+Tr((KTPHTPAT1ΠT1P)QTPKTPHTPAT1(Σ^T1PΣ~T1(P,E)))\displaystyle+\operatorname{Tr}\Big{(}(K_{T}^{P}H_{T}^{P}A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}A_{T-1}(\widehat{\Sigma}^{P}_{T-1}-\widetilde{\Sigma}_{T-1}^{(P,E)})\Big{)}
=\displaystyle= (2b)+(1a)+(3a)\displaystyle(2b)+(1a)+(3a)
Tr((AT1ΠT1P)QTPKTPHTPAT1(Σ~T1(P,E)Σ^T1P))\displaystyle-\operatorname{Tr}\big{(}(A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}K_{T}^{P}H_{T}^{P}A_{T-1}\big{(}\widetilde{\Sigma}_{T-1}^{(P,E)}-\widehat{\Sigma}_{T-1}^{P}\big{)}\Big{)}
+Tr(((IKTPHTP)AT1ΠT1P)QTPAT1(Σ^T1PΣ~T1(P,E)))\displaystyle+\operatorname{Tr}\big{(}((I-K_{T}^{P}H_{T}^{P})A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}A_{T-1}\big{(}\widehat{\Sigma}_{T-1}^{P}-\widetilde{\Sigma}_{T-1}^{(P,E)}\big{)}\big{)}
Tr(AT1QTPAT1Σ^T1P)+Tr(QTP(Σ^TP))\displaystyle-\operatorname{Tr}(A_{T-1}^{\top}Q_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P})+\operatorname{Tr}\Big{(}Q_{T}^{P}\big{(}\widehat{\Sigma}^{P}_{T}\big{)}^{-}\Big{)}
=\displaystyle= (2b)+(1a)+(3a)\displaystyle(2b)+(1a)+(3a)
+Tr((AT1ΠT1P)QTPAT1(Σ^T1PΣ~T1(P,E)))\displaystyle+\operatorname{Tr}\big{(}(A_{T-1}\Pi_{T-1}^{P})^{\top}Q_{T}^{P}A_{T-1}\big{(}\widehat{\Sigma}_{T-1}^{P}-\widetilde{\Sigma}_{T-1}^{(P,E)}\big{)}\big{)}
Tr(AT1QTPAT1Σ^T1P)+Tr(QTP(Σ^TP))\displaystyle-\operatorname{Tr}(A_{T-1}^{\top}Q_{T}^{P}A_{T-1}\widehat{\Sigma}_{T-1}^{P})+\operatorname{Tr}\Big{(}Q_{T}^{P}\big{(}\widehat{\Sigma}^{P}_{T}\big{)}^{-}\Big{)}
=\displaystyle= (2b)+(1a)+(3a)+Tr(ΓT1QTPΓT1W)\displaystyle(2b)+(1a)+(3a)+\operatorname{Tr}(\Gamma_{T-1}^{\top}Q_{T}^{P}\Gamma_{T-1}W)

where the second last equality holds since (Σ^T1PΣ~T1(P,E))(ΠT1P)=ΠT1P(Σ^T1PΣ~T1(P,E))\big{(}\widehat{\Sigma}_{T-1}^{P}-\widetilde{\Sigma}_{T-1}^{(P,E)}\big{)}(\Pi_{T-1}^{P})^{\top}=\Pi_{T-1}^{P}\big{(}\widehat{\Sigma}_{T-1}^{P}-\widetilde{\Sigma}_{T-1}^{(P,E)}\big{)}^{\top}, and the last equality holds by (A.7).

Combining (A.23) and (A.1), we can see that the constant terms on LHS and RHS are the same. Recall that we have already matched the terms that are quadratic in x^T1P\widehat{x}_{T-1}^{P} on both sides. Therefore, the tower property given in (2.59) holds.

Appendix B Bargaining Game with n12n_{1}\geq 2

In Section 5 we examined a bargaining model in which the players only have noisy observations of the true value of the good (n_{1}=1) and can fully recover the opponent’s state estimate. In this section we introduce a more complex setting in which the true value of the good depends on a set of factors and the players only observe noisy versions of these factors. We first construct the bargaining model with higher-dimensional factors, and then focus on the case n_{1}=2. In contrast to Section 5, here the players cannot fully recover their opponent’s state estimate. Finally, we present some numerical results showing the effect of observation noise and information corrections.

Dynamics of the Value of the Good When n1>1n_{1}>1.

Assume that the value of the good takes the form

pt=θ,ξt,\displaystyle p_{t}=\langle\theta,\xi_{t}\rangle, (B.1)

with \xi_{t}\in\mathbb{R}^{n_{1}} a vector of factors that determine the value of the good and \theta\in\mathbb{R}^{n_{1}} the corresponding coefficient vector. For common commodities, the factors could include weather, government policies, international events, consumer preferences, shifting input costs, and supply and demand imbalances. We assume the factors follow a simple model:

ξt+1=ξt+wt,\xi_{t+1}=\xi_{t}+w_{t},

where {wt}t=0T1\{w_{t}\}_{t=0}^{T-1} is a sequence of IID Gaussian random variables with zero mean and covariance W¯n1×n1\overline{W}\in\mathbb{R}^{n_{1}\times n_{1}}.

Neither the buyer nor the seller has access to the true value of the good or to the precise values of the factors. Instead, each observes a noisy version of the factors through their private information. At time t=0, player i (i=B,S) believes that the initial factor vector satisfies

ξ0𝒩(ξ^0i,W0i),\displaystyle{\xi}_{0}\sim\mathcal{N}(\widehat{{\xi}}_{0}^{i},W^{i}_{0}), (B.2)

and thereafter player ii observes the following noisy factor signal:

zt+1i=ξt+1+wt+1i,wt+1i𝒩(0,Gi),t=0,1,,T1,\displaystyle z_{t+1}^{i}={\xi}_{t+1}+\,w^{i}_{t+1},\quad w^{i}_{t+1}\sim\mathcal{N}(0,G^{i}),\quad t=0,1,\cdots,T-1, (B.3)

where {wti}t=1T1\{w^{i}_{t}\}_{t=1}^{T-1} is a sequence of IID random variables, and {wtB}t=1T1\{w^{B}_{t}\}_{t=1}^{T-1} and {wtS}t=1T1\{w^{S}_{t}\}_{t=1}^{T-1} are independent.
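A minimal simulation sketch of the factor model (B.1)-(B.3) for n_{1}=2 (Python/NumPy, hypothetical names; the parameter values are those used in the experiments below):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 10
theta = np.array([1.0, 1.0])                 # coefficients in p_t = <theta, xi_t>
xi = np.array([30.0, 20.0])                  # initial factors xi_0
W_bar = np.diag([4.5, 4.5])                  # factor-noise covariance
G = {'B': 50.0 * np.eye(2), 'S': 0.5 * np.eye(2)}   # observation-noise covariances

prices, signals = [], {'B': [], 'S': []}
for t in range(T):
    # Factor dynamics: xi_{t+1} = xi_t + w_t,  w_t ~ N(0, W_bar).
    xi = xi + rng.multivariate_normal(np.zeros(2), W_bar)
    prices.append(theta @ xi)                # true value p_{t+1} = <theta, xi_{t+1}>
    for i in ('B', 'S'):
        # Noisy factor signal z_{t+1}^i = xi_{t+1} + w_{t+1}^i,  w^i ~ N(0, G^i).
        signals[i].append(xi + rng.multivariate_normal(np.zeros(2), G[i]))
```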

We note that this case does not degenerate to the n_{1}=1 case, since the players receive different noisy factor signals and hence form different estimates of the value.

The Bargaining Model when n1=2n_{1}=2.

Now we focus on the case where the value of the good depends on a set of two factors and players can only observe a noisy version of the factors. The dynamics of the value of the good and the noisy signals are defined in (B.1)-(B.3). In contrast to the previous section, here the players are not able to fully recover their opponent’s state estimate.

We set the coefficient θ=(θ1,θ2)\theta=(\theta_{1},\theta_{2})^{\top} and the model parameters to be, for t=0,1,,T1t=0,1,\ldots,T-1

At=I,BtB=[0010],BtS=[0001],W=[W¯10000W¯20000W¯B0000W¯S],QtB=QtS=0.A_{t}=I,\,\,B_{t}^{B}=\begin{bmatrix}0\\ 0\\ 1\\ 0\end{bmatrix},\quad B_{t}^{S}=\begin{bmatrix}0\\ 0\\ 0\\ 1\end{bmatrix},\quad W=\begin{bmatrix}\overline{W}^{1}&0&0&0\\ 0&\overline{W}^{2}&0&0\\ 0&0&\overline{W}^{B}&0\\ 0&0&0&\overline{W}^{S}\end{bmatrix},\quad Q_{t}^{B}=Q_{t}^{S}=0.

and

Q_{T}^{B}=\begin{bmatrix}\beta_{B}(1+\delta_{B})^{2}\theta_{1}^{2}&\beta_{B}(1+\delta_{B})^{2}\theta_{1}\theta_{2}&-\beta_{B}(1+\delta_{B})\theta_{1}&0\\ \beta_{B}(1+\delta_{B})^{2}\theta_{1}\theta_{2}&\beta_{B}(1+\delta_{B})^{2}\theta_{2}^{2}&-\beta_{B}(1+\delta_{B})\theta_{2}&0\\ -\beta_{B}(1+\delta_{B})\theta_{1}&-\beta_{B}(1+\delta_{B})\theta_{2}&\alpha_{B}+\beta_{B}&-\alpha_{B}\\ 0&0&-\alpha_{B}&\alpha_{B}\end{bmatrix},
Q_{T}^{S}=\begin{bmatrix}\beta_{S}(1+\delta_{S})^{2}\theta_{1}^{2}&\beta_{S}(1+\delta_{S})^{2}\theta_{1}\theta_{2}&0&-\beta_{S}(1+\delta_{S})\theta_{1}\\ \beta_{S}(1+\delta_{S})^{2}\theta_{1}\theta_{2}&\beta_{S}(1+\delta_{S})^{2}\theta_{2}^{2}&0&-\beta_{S}(1+\delta_{S})\theta_{2}\\ 0&0&\alpha_{S}&-\alpha_{S}\\ -\beta_{S}(1+\delta_{S})\theta_{1}&-\beta_{S}(1+\delta_{S})\theta_{2}&-\alpha_{S}&\alpha_{S}+\beta_{S}\end{bmatrix}.

Also we have Hti=IH_{t}^{i}=I for i=S,Bi=S,B.

For comparison we take \alpha_{B},\alpha_{S}, \beta_{B},\beta_{S}, \delta_{B},\delta_{S}, T, and the penalty function to be the same as in the numerical experiments for the n_{1}=1 case. We also set \theta_{1}=\theta_{2}=1. For the initial state we set \xi_{0}=(30,20)^{\top}, x_{0}^{B}=10, x_{0}^{S}=90. We also set \overline{W}^{1}=\overline{W}^{2}=4.5 for the noise in the dynamics of the factors, and \overline{W}^{B}=\overline{W}^{S}=10^{-12}.

To see the effect of the observation noise, we let the buyer have a much noisier observation of the true price by setting

GB=[500050],GS=[0.5000.5].G^{B}=\begin{bmatrix}50&0\\ 0&50\end{bmatrix},\quad G^{S}=\begin{bmatrix}0.5&0\\ 0&0.5\end{bmatrix}.

For the players’ belief about the initial state, we set x^0B=(25,15)\widehat{x}_{0}^{B}=(25,15)^{\top} and x^0S=(31,20)\widehat{x}_{0}^{S}=(31,20)^{\top} with

W0B=[500050],W0S=[0.5000.5],W_{0}^{B}=\begin{bmatrix}50&0\\ 0&50\end{bmatrix},W_{0}^{S}=\begin{bmatrix}0.5&0\\ 0&0.5\end{bmatrix},

so the buyer starts with a far less accurate estimate of the initial state.
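The terminal cost matrices Q_{T}^{B} and Q_{T}^{S} above are consistent with a terminal cost in which each player penalizes the squared gap between their offer and a marked-up (or marked-down) value, \beta_{i}((1+\delta_{i})\langle\theta,\xi_{T}\rangle-x_{T}^{i})^{2}, plus the squared gap between the two offers, \alpha_{i}(x_{T}^{B}-x_{T}^{S})^{2}, as in the n_{1}=1 case. The following sketch (hypothetical function name, with the quadratic form inferred from the displayed matrices) reconstructs them numerically:

```python
import numpy as np

def Q_terminal(alpha, beta, delta, theta, player):
    """Terminal cost matrix on the state (xi_1, xi_2, x^B, x^S) from the quadratic form
    beta*((1+delta)*<theta, xi> - x^i)^2 + alpha*(x^B - x^S)^2."""
    t1, t2 = theta
    v = np.array([(1 + delta) * t1, (1 + delta) * t2, 0.0, 0.0])
    v[2 if player == 'B' else 3] = -1.0            # subtract the player's own offer
    d = np.array([0.0, 0.0, 1.0, -1.0])            # difference of the two offers
    return beta * np.outer(v, v) + alpha * np.outer(d, d)

# With theta = (1, 1) and the parameter values above this reproduces Q_T^B and Q_T^S.
Q_T_B = Q_terminal(alpha=50.0, beta=30.0, delta=-0.05, theta=(1.0, 1.0), player='B')
Q_T_S = Q_terminal(alpha=50.0, beta=30.0, delta=0.05, theta=(1.0, 1.0), player='S')
```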

Effect of Observation Noise.

As in the case n_{1}=1, the seller has a more accurate state estimate since they receive less noisy signals. The behaviour of the players is similar to that in the full information case (see Figure 3).

Figure 3: Comparison between the full observation (right) and partial observation (left) cases.
Effect of Information Corrections.

With information corrections the buyer’s estimate of the value is less affected by the noisy observations (see Figure 4), and the players are more likely to reach an agreement in both the asymmetric and the symmetric cases (see Table 5). Compared with the case n_{1}=1, the gap between the number of agreements achieved in the asymmetric case and in the symmetric (inaccurate) case is larger, since the players now receive noisy signals for each of the factors.

Figure 4: The buyer’s price estimate with information corrections (right) and without information corrections (left).
             Asymmetric   Symmetric (“accurate”)   Symmetric (inaccurate)
With IC         329                439                      315
Without IC      237                440                      207
Table 5: Number of agreements achieved in 500 experiments with and without information corrections (IC).