
Existence of structured perfect Bayesian equilibrium in dynamic games of asymmetric information

Deepanshu Vasal (email: [email protected])
Abstract

In [1], the authors considered a general finite-horizon model of a dynamic game of asymmetric information, in which $N$ players have types evolving as independent Markov processes, and each player observes its own type perfectly as well as the actions of all players. The authors present a sequential decomposition algorithm to find all structured perfect Bayesian equilibria of the game. The algorithm requires solving a class of fixed-point equations for each time $t$, whose existence was left as an open question. In this paper, we prove the existence of solutions to these fixed-point equations for compact metric spaces.

I Introduction

The authors in [1] considered a model consisting of strategic players having types that evolve as conditionally independent, controlled Markov processes. Players observe their types privately, and the actions taken by all players are observed publicly. The instantaneous reward of each player depends on everyone's types and actions. The proposed methodology decomposes the interdependence between beliefs and strategies in perfect Bayesian equilibria (PBE) and enables a systematic evaluation of a subset of PBE, namely structured perfect Bayesian equilibria (SPBE). Here SPBE are defined as equilibria in which players' strategies are based on their current private type and on a set of beliefs on each player's current private type, which is common to all the players and whose domain is time-invariant. The beliefs on players' types are such that they can be updated individually for each player and sequentially with respect to time. The model allows for signaling amongst players, as beliefs depend on strategies. They present a sequential decomposition methodology for calculating all SPBE for finite and infinite horizon dynamic games with asymmetric information.

For the finite-horizon model, they provide a two-step algorithm involving a backward recursion followed by a forward recursion. The algorithm works as follows. In the backward recursion, for every time period, the algorithm finds an equilibrium generating function defined for all possible common beliefs at that time. This involves solving a one-step fixed-point equation on the space of probability simplices. Existence of a solution to this fixed-point equation was left as an open question. If there exists a solution for every common belief $\pi_t$, then the equilibrium strategies and beliefs are obtained through a forward recursion using the equilibrium generating function obtained in the backward step and the Bayes update rule. A similar notion of equilibrium was defined in [2, 3], where the authors provide a sequential decomposition algorithm to compute common-information-based perfect Bayesian equilibria. In [3], it was shown that such an equilibrium exists for zero-sum games.

In this paper, we consider the finite-horizon game with all sets of variables lying in compact metric spaces. We show that there always exists an SPBE for such a game. Since it is proved in [1] that all SPBE can be found using their algorithm, we conclude that there always exists a solution to the fixed-point equation considered in [1] for each time period $t$.

II Model and Preliminaries

We consider a discrete-time dynamical system with $N$ strategic players in the set $\mathcal{N} \triangleq \{1,2,\ldots,N\}$. We consider two cases: finite horizon $\mathcal{T} \triangleq \{1,2,\ldots,T\}$ with perfect recall and infinite horizon with perfect recall. The system state is $X_t \triangleq (X_t^1, X_t^2, \ldots, X_t^N)$, where $X_t^i \in \mathcal{X}^i$ is the type of player $i$ at time $t$, which is perfectly observed by player $i$ and is its private information. Players' types evolve as conditionally independent, controlled Markov processes such that

$\mathbb{P}(x_1) = \prod_{i=1}^{N} Q_1^i(x_1^i)$  (1a)
$\mathbb{P}(x_t \,|\, x_{1:t-1}, a_{1:t-1}) = \mathbb{P}(x_t \,|\, x_{t-1}, a_{t-1})$  (1b)
$= \prod_{i=1}^{N} Q_t^i(x_t^i \,|\, x_{t-1}^i, a_{t-1}),$  (1c)

where the $Q_t^i$ are known kernels. Player $i$ at time $t$ takes action $a_t^i \in \mathcal{A}^i$ after observing the actions $a_{1:t-1} = (a_k)_{k=1,\ldots,t-1}$, where $a_k = (a_k^j)_{j\in\mathcal{N}}$, which are common information among the players, and its own types $x_{1:t}^i$, which it observes privately. The sets $\mathcal{A}^i, \mathcal{X}^i$ are assumed to be finite. Let $g^i = (g_t^i)_{t\in\mathcal{T}}$ be a probabilistic strategy of player $i$, where $g_t^i : \mathcal{A}^{t-1} \times (\mathcal{X}^i)^t \to \Delta(\mathcal{A}^i)$, such that player $i$ plays action $A_t^i$ according to $A_t^i \sim g_t^i(\cdot\,|\,a_{1:t-1}, x_{1:t}^i)$. Let $g \triangleq (g^i)_{i\in\mathcal{N}}$ be the strategy profile of all players. At the end of interval $t$, player $i$ receives an instantaneous reward $R_t^i(x_t, a_t)$. To preserve the information structure of the problem, it is assumed that players do not observe their rewards until the end of the game. (Alternatively, we could have assumed the instantaneous reward of a player to depend only on its own type, i.e., to be of the form $R_t^i(x_t^i, a_t)$, and have allowed rewards to be observed by the players during the game, as this would not alter the information structure of the game.) The reward functions and state transition kernels are common knowledge among the players. For the finite-horizon problem, the objective of player $i$ is to maximize its total expected reward

$J^{i,g} \triangleq \mathbb{E}^{g}\left\{\sum_{t=1}^{T} R_t^i(X_t, A_t)\right\}.$  (2)
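As a concrete illustration of the model in (1) and the objective in (2), the following Python sketch simulates the coupled type/action dynamics and estimates $J^{i,g}$ by Monte Carlo for two players with binary types and actions. All kernels, strategies, and rewards below are hypothetical choices made only for illustration and are not taken from [1].

import numpy as np

# Two players, binary types and actions; everything numerical here is illustrative.
rng = np.random.default_rng(0)
N, T = 2, 3
X, A = [0, 1], [0, 1]                          # finite type and action alphabets

def Q(i, t, x_prev, a_prev):
    """Illustrative kernel Q_t^i(. | x_{t-1}^i, a_{t-1}): types are sticky, nudged by own action."""
    p_stay = 0.8 if a_prev[i] == x_prev else 0.6
    return np.array([p_stay, 1 - p_stay]) if x_prev == 0 else np.array([1 - p_stay, p_stay])

def g(i, t, a_hist, x_hist_i):
    """Illustrative behavioural strategy g_t^i(. | a_{1:t-1}, x_{1:t}^i), biased toward the current type."""
    p1 = 0.7 if x_hist_i[-1] == 1 else 0.3     # probability of playing action 1
    return np.array([1 - p1, p1])

def R(i, x, a):
    """Illustrative instantaneous reward R_t^i(x_t, a_t): player i is paid for guessing the other's type."""
    return float(a[i] == x[1 - i])

def simulate():
    x = [int(rng.integers(0, 2)) for _ in range(N)]        # x_1^i drawn uniformly (stand-in for Q_1^i)
    x_hist = [[xi] for xi in x]
    a_hist, J = [], np.zeros(N)
    for t in range(1, T + 1):
        a = [int(rng.choice(A, p=g(i, t, a_hist, x_hist[i]))) for i in range(N)]
        J += np.array([R(i, x, a) for i in range(N)])      # accumulate R_t^i(x_t, a_t)
        a_hist.append(tuple(a))
        x = [int(rng.choice(X, p=Q(i, t, x[i], a))) for i in range(N)]   # x_{t+1} drawn as in (1c)
        for i in range(N):
            x_hist[i].append(x[i])
    return J

print(np.mean([simulate() for _ in range(2000)], axis=0))  # Monte Carlo estimate of J^{i,g} in (2)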

II-A Preliminaries

Any history of this game at which players take an action is of the form $h_t = (a_{1:t-1}, x_{1:t})$. Let $\mathcal{H}_t$ be the set of such histories, $\mathcal{H}^T \triangleq \cup_{t=0}^{T} \mathcal{H}_t$ be the set of all possible such histories for the finite horizon, and $\mathcal{H}^{\infty} \triangleq \cup_{t=0}^{\infty} \mathcal{H}_t$ for the infinite horizon. At any time $t$, player $i$ observes $h_t^i = (a_{1:t-1}, x_{1:t}^i)$ and all players together have $h_t^c = a_{1:t-1}$ as common history. Let $\mathcal{H}_t^i$ be the set of observed histories of player $i$ at time $t$ and $\mathcal{H}_t^c$ be the set of common histories at time $t$. An appropriate equilibrium concept for such games is the PBE [4], which consists of a pair $(\beta^*, \mu^*)$ of a strategy profile $\beta^* = (\beta_t^{*,i})_{t\in\mathcal{T}, i\in\mathcal{N}}$, where $\beta_t^{*,i} : \mathcal{H}_t^i \to \Delta(\mathcal{A}^i)$, and a belief profile $\mu^* = ({}^i\mu_t^*)_{t\in\mathcal{T}, i\in\mathcal{N}}$, where ${}^i\mu_t^* : \mathcal{H}_t^i \to \Delta(\mathcal{H}_t)$, that satisfy sequential rationality: $\forall i\in\mathcal{N}, t\in\mathcal{T}, h_t^i \in \mathcal{H}_t^i, \beta^i$,

$W_t^{i,\beta^{*,i},T}(h_t^i) \geq W_t^{i,\beta^{i},T}(h_t^i)$  (3)

where the reward-to-go is defined as

$W_t^{i,\beta^{i},T}(h_t^i) \triangleq \mathbb{E}^{\beta^{i}\beta^{*,-i},\, {}^i\mu_t^*[h_t^i]}\left\{\sum_{n=t}^{T} R_n^i(X_n, A_n)\,\big|\, h_t^i\right\},$  (4)

and the beliefs satisfy consistency conditions as described in [4, p. 331]. In general, a belief of player $i$ at time $t$, ${}^i\mu_t^*$, is defined on the history $h_t = (a_{1:t-1}, x_{1:t})$ given its private history $h_t^i = (a_{1:t-1}, x_{1:t}^i)$. Here player $i$'s private history $h_t^i = (a_{1:t-1}, x_{1:t}^i)$ consists of a public part $h_t^c = a_{1:t-1}$ and a private part $x_{1:t}^i$. At any time $t$, the relevant uncertainty of player $i$ is about the other players' types $x_{1:t}^{-i} \in \times_{n=1}^{t}\left(\times_{j\neq i}\mathcal{X}^j\right)$ and their future actions. Due to the independence of types, and given the common history $h_t^c$, player $i$'s type history $x_{1:t}^i$ does not provide any additional information about $x_{1:t}^{-i}$, as will be shown later. For this reason, only those beliefs are considered that depend on each player's history $h_t^i$ only through the common history $h_t^c$. Hence, for each player $i$, its belief for each history $h_t^c = a_{1:t-1}$ is derived from a common belief $\mu_t^*[a_{1:t-1}]$. Furthermore, as will be shown later, this belief factorizes into a product of marginals $\prod_{j\in\mathcal{N}} \mu_t^{*,j}[a_{1:t-1}]$. Thus it suffices to use the following system of beliefs: $\mu^* = (\underline{\mu}_t^*)_{t\in\mathcal{T}}$, where $\underline{\mu}_t^* = (\mu_t^{*,i})_{i\in\mathcal{N}}$ and $\mu_t^{*,i} : \mathcal{H}_t^c \to \Delta(\mathcal{X}^i)$, with the understanding that player $i$'s belief on $x_t^{-i}$ is $\mu_t^{*,-i}[a_{1:t-1}](x_t^{-i}) = \prod_{j\neq i} \mu_t^{*,j}[a_{1:t-1}](x_t^j)$. Under the above structure, all consistency conditions required for PBE [4, p. 331] are automatically satisfied.

III Structured perfect Bayesian algorithm

At any time $t$, player $i$ has information $(a_{1:t-1}, x_{1:t}^i)$, where $a_{1:t-1}$ is the common information among the players and $x_{1:t}^i$ is the private information of player $i$. Since $(a_{1:t-1}, x_{1:t}^i)$ grows with time, any strategy of the form $A_t^i \sim g_t^i(\cdot\,|\,a_{1:t-1}, x_{1:t}^i)$ becomes unwieldy. Thus it is desirable to have an information state in a time-invariant space that succinctly summarizes $(a_{1:t-1}, x_{1:t}^i)$ and that can be sequentially updated. For any strategy profile $g$, the belief $\pi_t$ on $X_t$, $\pi_t \in \Delta(\mathcal{X})$, is defined as $\pi_t(x_t) \triangleq \mathbb{P}^g(X_t = x_t \,|\, a_{1:t-1})$, $\forall x_t \in \mathcal{X}$. Define the marginals $\pi_t^i(x_t^i) \triangleq \mathbb{P}^g(X_t^i = x_t^i \,|\, a_{1:t-1})$, $\forall x_t^i \in \mathcal{X}^i$.

Then, using the common information approach [5], the problem is posed as follows: player $i$ at time $t$ observes $a_{1:t-1}$ and takes action $\gamma_t^i$, where $\gamma_t^i : \mathcal{X}^i \to \Delta(\mathcal{A}^i)$ is a partial (stochastic) function from its private information $x_t^i$ to $a_t^i$, of the form $A_t^i \sim \gamma_t^i(\cdot\,|\,x_t^i)$. These actions are generated through some policy $\psi^i = (\psi_t^i)_{t\in\mathcal{T}}$, $\psi_t^i : \mathcal{A}^{t-1} \to \left\{\mathcal{X}^i \to \Delta(\mathcal{A}^i)\right\}$, that operates on the common information $a_{1:t-1}$ such that $\gamma_t^i = \psi_t^i[a_{1:t-1}]$. Then any policy of the form $A_t^i \sim s_t^i(\cdot\,|\,a_{1:t-1}, x_t^i)$ is equivalent to $A_t^i \sim \psi_t^i[a_{1:t-1}](\cdot\,|\,x_t^i)$.
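The following short Python sketch illustrates this two-stage description: a (hypothetical) policy $\psi_t^i$ maps the public action history to a prescription $\gamma_t^i$, which the player then evaluates at its private type; the induced action distribution is the same as that of a strategy $s_t^i(\cdot\,|\,a_{1:t-1}, x_t^i)$. All names and numbers are illustrative only.

def psi_i(a_hist):
    """Hypothetical policy psi_t^i[a_{1:t-1}]: the prescription depends only on the public history."""
    bias = 0.8 if sum(a for a, _ in a_hist) > len(a_hist) / 2 else 0.3
    return {0: [1 - bias, bias], 1: [bias, 1 - bias]}   # gamma_t^i(.|x_t^i) for x_t^i in {0, 1}

# A_t^i ~ psi_t^i[a_{1:t-1}](.|x_t^i) induces the same action distribution as
# s_t^i(.|a_{1:t-1}, x_t^i): the public and private arguments are simply applied in two stages.
gamma = psi_i([(1, 0), (1, 1)])      # gamma_t^i = psi_t^i[a_{1:t-1}]
print(gamma[0], gamma[1])            # action distributions for private types 0 and 1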

III-A Forward/Backward algorithm

In Lemma 3 in Appendix B of [1], it is shown that, due to the independence of types and their evolution as independent controlled Markov processes, for any strategy of the players the joint common belief can be factorized as a product of its marginals, i.e., $\pi_t(x_t) = \prod_{i=1}^{N} \pi_t^i(x_t^i)$, $\forall x_t$. Define $\underline{\pi}_t \in \times_{i\in\mathcal{N}} \Delta(\mathcal{X}^i)$ as the vector of marginal beliefs, $\underline{\pi}_t := (\pi_t^i)_{i\in\mathcal{N}}$. Similarly, define the vector of belief updates as $\underline{F}(\underline{\pi}, \gamma, a) := (F^i(\pi^i, \gamma^i, a))_{i\in\mathcal{N}}$, where (using Bayes' rule)

$F^i(\pi^i,\gamma^i,a)(x_{t+1}^i) =
\begin{cases}
\dfrac{\sum_{x_t^i} \pi^i(x_t^i)\,\gamma^i(a^i|x_t^i)\,Q_t^i(x_{t+1}^i|x_t^i,a)}{\sum_{\tilde{x}_t^i} \pi^i(\tilde{x}_t^i)\,\gamma^i(a^i|\tilde{x}_t^i)} & \text{if } \sum_{\tilde{x}_t^i} \pi^i(\tilde{x}_t^i)\,\gamma^i(a^i|\tilde{x}_t^i) > 0,\\[2mm]
\sum_{x_t^i} \pi^i(x_t^i)\,Q_t^i(x_{t+1}^i|x_t^i,a) & \text{if } \sum_{\tilde{x}_t^i} \pi^i(\tilde{x}_t^i)\,\gamma^i(a^i|\tilde{x}_t^i) = 0.
\end{cases}$  (7)
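For concreteness, the following Python sketch implements the per-player update $F^i$ in (7) for finite $\mathcal{X}^i$ and $\mathcal{A}^i$; the numerical values at the bottom are illustrative and not taken from [1].

import numpy as np

def belief_update(pi_i, gamma_i, Q_i, a_i):
    """
    Per-player update F^i in (7) for finite spaces.
    pi_i    : array (|X^i|,)        current marginal belief pi_t^i
    gamma_i : array (|X^i|, |A^i|)  prescription gamma_t^i(a^i | x^i)
    Q_i     : array (|X^i|, |X^i|)  kernel Q_t^i(x_{t+1}^i | x_t^i, a) for the observed joint action a
    a_i     : int                   observed action of player i
    """
    weights = pi_i * gamma_i[:, a_i]           # joint weight of (x_t^i, a_t^i)
    denom = weights.sum()
    if denom > 0:                              # first (Bayes) branch of (7)
        post = weights / denom
    else:                                      # zero-probability action: second branch of (7)
        post = pi_i
    return post @ Q_i                          # push the posterior through the kernel

# Illustrative numbers (not from [1]): binary type, binary action.
pi_i = np.array([0.5, 0.5])
gamma_i = np.array([[0.9, 0.1],                # type 0 plays action 0 w.p. 0.9
                    [0.2, 0.8]])               # type 1 plays action 1 w.p. 0.8
Q_i = np.array([[0.7, 0.3],                    # sticky type transition under the observed action
                [0.3, 0.7]])
print(belief_update(pi_i, gamma_i, Q_i, a_i=0))   # pi_{t+1}^i after observing a_t^i = 0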

In the following, a backward-forward algorithm that evaluates SPBE is presented. As shown in [1, Theorem 2], this is a "canonical" methodology, in the sense that all SPBE can be generated this way.

III-B Backward Recursion

In this section, we define an equilibrium generating function $\theta = (\theta_t^i)_{i\in\mathcal{N}, t\in\mathcal{T}}$, where $\theta_t^i : \times_{i\in\mathcal{N}} \Delta(\mathcal{X}^i) \to \left\{\mathcal{X}^i \to \Delta(\mathcal{A}^i)\right\}$. In addition, we define a sequence of reward-to-go functions of player $i$ at time $t$, $(V_t^i)_{i\in\mathcal{N}, t\in\{1,2,\ldots,T+1\}}$, where $V_t^i : \times_{i\in\mathcal{N}} \Delta(\mathcal{X}^i) \times \mathcal{X}^i \to \mathbb{R}$. These quantities are generated through the following backward recursion.

  • 1.

    Initialize, $\forall \underline{\pi}_{T+1} \in \times_{i\in\mathcal{N}} \Delta(\mathcal{X}^i),\; x_{T+1}^i \in \mathcal{X}^i$,

    $V_{T+1}^i(\underline{\pi}_{T+1}, x_{T+1}^i) \triangleq 0.$  (8)
  • 2.

    For $t = T, T-1, \ldots, 1$, $\forall \underline{\pi}_t \in \times_{i\in\mathcal{N}} \Delta(\mathcal{X}^i)$, with $\pi_t = \prod_{i\in\mathcal{N}} \pi_t^i$, let $\theta_t[\underline{\pi}_t]$ be generated as follows. Set $\tilde{\gamma}_t = \theta_t[\underline{\pi}_t]$, where $\tilde{\gamma}_t$ is the solution, if it exists, of the following fixed-point equation: $\forall i\in\mathcal{N}, x_t^i \in \mathcal{X}^i$,

    $\tilde{\gamma}_t^i(\cdot\,|\,x_t^i) \in \arg\max_{\gamma_t^i(\cdot|x_t^i)} \mathbb{E}^{\gamma_t^i(\cdot|x_t^i)\,\tilde{\gamma}_t^{-i},\,\pi_t}\left\{R_t^i(X_t, A_t) + V_{t+1}^i\big(\underline{F}(\underline{\pi}_t, \tilde{\gamma}_t, A_t), X_{t+1}^i\big)\,\big|\, x_t^i\right\},$  (9)

    where the expectation in (9) is with respect to the random variables $(X_t^{-i}, A_t, X_{t+1}^i)$ through the measure $\pi_t^{-i}(x_t^{-i})\,\gamma_t^i(a_t^i|x_t^i)\,\tilde{\gamma}_t^{-i}(a_t^{-i}|x_t^{-i})\,Q_{t+1}^i(x_{t+1}^i|x_t^i,a_t)$, and $\underline{F}$ is defined above.

    Furthermore, using the quantity $\tilde{\gamma}_t$ found above, define

    $V_t^i(\underline{\pi}_t, x_t^i) \triangleq \mathbb{E}^{\tilde{\gamma}_t^i(\cdot|x_t^i)\,\tilde{\gamma}_t^{-i},\,\pi_t}\left\{R_t^i(X_t, A_t) + V_{t+1}^i\big(\underline{F}(\underline{\pi}_t, \tilde{\gamma}_t, A_t), X_{t+1}^i\big)\,\big|\, x_t^i\right\}.$  (10)

    (A numerical sketch of this step is given after this list.)
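As an illustration of one pass of step 2, the following Python sketch searches for a pure-prescription solution of (9) at the last stage (where $V_{T+1}^i \equiv 0$ by (8)) using naive best-response iteration, for two players with binary types and actions. The rewards, kernels, and the solver itself are hypothetical choices made only for illustration and are not prescribed by [1]; in particular, a solution of (9) may require mixed prescriptions, which this simple search will not find.

import numpy as np

nX, nA, N = 2, 2, 2   # binary types and actions, two players (illustrative)

def R(i, x, a):
    """Illustrative reward R_t^i(x_t, a_t): player i is paid for matching the other's type."""
    return float(a[i] == x[1 - i])

def Q(i, x_i, a):
    """Illustrative kernel Q_{t+1}^i(. | x_t^i, a_t): sticky types."""
    return np.array([0.7, 0.3]) if x_i == 0 else np.array([0.3, 0.7])

def V_next(i, pis, x_i):
    """Continuation value V_{t+1}^i; identically zero here, i.e. the last stage t = T (cf. (8))."""
    return 0.0

def F(pis, gammas, a):
    """Vector of per-player Bayes updates F^i from (7)."""
    out = []
    for i in range(N):
        w = pis[i] * gammas[i][:, a[i]]
        post = w / w.sum() if w.sum() > 0 else pis[i]
        Qmat = np.array([Q(i, x, a) for x in range(nX)])   # rows x_t^i, cols x_{t+1}^i
        out.append(post @ Qmat)
    return out

def value(i, x_i, dev_i, gammas, pis):
    """Expected value in (9) when player i of type x_i uses prescription dev_i, while the
    candidate gammas is kept inside F and for the opponent."""
    total = 0.0
    for x_mi in range(nX):
        p_x = pis[1 - i][x_mi]
        x = (x_i, x_mi) if i == 0 else (x_mi, x_i)
        for a_i in range(nA):
            for a_mi in range(nA):
                a = (a_i, a_mi) if i == 0 else (a_mi, a_i)
                p = p_x * dev_i[x_i, a_i] * gammas[1 - i][x_mi, a_mi]
                if p == 0.0:
                    continue
                pis_next = F(pis, gammas, a)
                cont = sum(Q(i, x_i, a)[xn] * V_next(i, pis_next, xn) for xn in range(nX))
                total += p * (R(i, x, a) + cont)
    return total

def best_response(i, gammas, pis):
    """Per-type best response of player i against the candidate gammas."""
    br = np.zeros((nX, nA))
    for x_i in range(nX):
        vals = []
        for a_i in range(nA):
            dev = np.zeros((nX, nA)); dev[x_i, a_i] = 1.0
            vals.append(value(i, x_i, dev, gammas, pis))
        br[x_i, int(np.argmax(vals))] = 1.0
    return br

def solve_stage(pis, iters=50):
    """Search for a pure-prescription fixed point of (9) by best-response iteration."""
    gammas = [np.full((nX, nA), 1.0 / nA) for _ in range(N)]   # uniform initial guess
    for _ in range(iters):
        new = [best_response(i, gammas, pis) for i in range(N)]
        if all(np.array_equal(g_new, g_old) for g_new, g_old in zip(new, gammas)):
            return new                     # candidate gamma_tilde_t = theta_t[pi_t] for this belief
        gammas = new
    return None                            # no pure fixed point found by this naive search

print(solve_stage([np.array([0.5, 0.5]), np.array([0.5, 0.5])]))

For a given belief $\underline{\pi}_t$, the returned prescriptions play the role of $\theta_t[\underline{\pi}_t]$ used in the forward recursion.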

Then

Theorem 1 ([1])

A strategy and belief profile $(\beta^*, \mu^*)$ constructed through the backward-forward recursion algorithm is a PBE of the game, i.e., $\forall i\in\mathcal{N}, t\in\mathcal{T}, a_{1:t-1}\in\mathcal{H}_t^c, x_{1:t}^i \in (\mathcal{X}^i)^t, \beta^i$,

$\mathbb{E}^{\beta_{t:T}^{*,i}\beta_{t:T}^{*,-i},\, \mu_t^*[a_{1:t-1}]}\left\{\sum_{n=t}^{T} R_n^i(X_n, A_n)\,\big|\, a_{1:t-1}, x_{1:t}^i\right\} \geq \mathbb{E}^{\beta_{t:T}^{i}\beta_{t:T}^{*,-i},\, \mu_t^*[a_{1:t-1}]}\left\{\sum_{n=t}^{T} R_n^i(X_n, A_n)\,\big|\, a_{1:t-1}, x_{1:t}^i\right\}.$  (11)
Theorem 2 ([1])

Let $(\beta^*, \mu^*)$ be an SPBE. Then there exists an equilibrium generating function $\phi$ that satisfies (9) in the backward recursion for $\pi_t = \mu_t^*[a_{1:t-1}]$, $\forall a_{1:t-1}$, such that $(\beta^*, \mu^*)$ is defined through the forward recursion using $\phi$.
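To illustrate how the forward recursion mentioned above assembles equilibrium strategies and beliefs from the backward step, here is a minimal Python sketch. The functions theta_of and F below are trivial stand-ins used only to make the snippet self-contained; in practice they come from the backward recursion and from (7), respectively.

import numpy as np

def theta_of(pis):
    # stand-in for theta_t[pi_t]: uniform prescriptions regardless of the belief (hypothetical)
    return [np.full((2, 2), 0.5), np.full((2, 2), 0.5)]

def F(pis, gammas, a):
    # stand-in for the Bayes update (7); here the beliefs are simply left unchanged
    return pis

def forward_pass(priors, action_path):
    """priors: marginals pi_1^i; action_path: realized joint actions a_1, ..., a_T."""
    pis, beliefs, prescriptions = list(priors), [list(priors)], []
    for a_t in action_path:
        gamma_t = theta_of(pis)       # beta_t^{*,i}(.|a_{1:t-1}, x_t^i) = theta_t[pi_t](.|x_t^i)
        prescriptions.append(gamma_t)
        pis = F(pis, gamma_t, a_t)    # next common belief mu_{t+1}^*[a_{1:t}] via (7)
        beliefs.append(list(pis))
    return prescriptions, beliefs

_, beliefs = forward_pass([np.array([0.5, 0.5]), np.array([0.5, 0.5])], [(0, 1), (1, 1)])
print(len(beliefs))                   # beliefs mu_1^*, mu_2^*, mu_3^* along the realized path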

IV Main result: Existence

In this section, we present the main result of this paper: we show that for all $\underline{\pi}_t \in \times_{i\in\mathcal{N}} \Delta(\mathcal{X}^i)$, with $\pi_t = \prod_{i\in\mathcal{N}} \pi_t^i$, there always exists a $\tilde{\gamma} = \theta[\underline{\pi}_t]$ that satisfies (9). We show that existence holds for any compact metric spaces $\mathcal{X}, \mathcal{A}$. Let $\Sigma$ be the set of all mixed strategies of the form $\sigma_t^i(\cdot\,|\,a_{1:t-1}, x_{1:t}^i)$ and $\Sigma'$ be the set of all mixed strategies of the form $\sigma_t^i(\cdot\,|\,a_{1:t-1}, x_t^i)$. Then both $\Sigma, \Sigma'$ are compact.

Lemma 1

Let $\epsilon > 0$.

  • (a)

    Let $L$ be a standard measure on $\mathcal{A}^i$ such that for any non-empty open measurable set $V$ in the Borel space $B(\mathcal{A}^i)$, $L(V) > 0$, i.e., $L$ puts non-zero measure on every non-empty open measurable set of $\mathcal{A}^i$. (For example, $L$ can be the Lebesgue measure on a Euclidean space or the counting measure on a discrete space.) Let $\Sigma_1^\epsilon \subset \Sigma'$ be the space of strategies such that if $\sigma \in \Sigma_1^\epsilon$, then for all $i\in\mathcal{N}, V\in B(\mathcal{A}^i), a_{1:t-1}, x_t^i$,

    $\sigma^i(V\,|\,a_{1:t-1}, x_t^i) \geq \epsilon L(V).$  (12)

    If $\mathcal{A}^i$ is discrete, the above condition reduces to: for all $i\in\mathcal{N}, a^j\in\mathcal{A}^i, a_{1:t-1}, x_t^i$,

    $\sigma^i(a^j\,|\,a_{1:t-1}, x_t^i) \geq \epsilon.$  (13)
  • (b)

    Let $\Sigma_2^\epsilon \subseteq \Sigma_1^\epsilon$ be the space of strategies such that if $\sigma \in \Sigma_2^\epsilon$, then for all $a_{1:t-1}, \widehat{a}_{1:t-1}$,

    $P^\sigma(x_t\,|\,a_{1:t-1}) = P^\sigma(x_t\,|\,\widehat{a}_{1:t-1}) \;\;\;\; \forall x_t$  (14)

    implies

    $\sigma_t^i(\cdot\,|\,a_{1:t-1}, x_t^i) = \sigma_t^i(\cdot\,|\,\widehat{a}_{1:t-1}, x_t^i) \;\;\;\; \forall i, x_t^i.$  (15)

    (The above condition captures the defining property of structured policies: if two action histories produce the same common belief under a strategy, then the strategy does not distinguish between those two action histories.)

Then $\Sigma_2^\epsilon$ is a compact metric space.

Proof:
  • (a)

    Let $\sigma^n$ be any sequence of strategies in $\Sigma_1^\epsilon$. Thus, for all $i\in\mathcal{N}, V\in B(\mathcal{A}^i), a_{1:t-1}, x_t^i$, we have $\sigma^{i,n}(V\,|\,a_{1:t-1}, x_t^i) \geq \epsilon L(V)$, and hence $\lim_n \sigma^{i,n}(V\,|\,a_{1:t-1}, x_t^i) \geq \epsilon L(V)$. Thus $\Sigma_1^\epsilon$ is closed. It is also bounded, and therefore compact.

  • (b)

    Let $\mathcal{S}(a_{1:t-1}, \widehat{a}_{1:t-1}) := \{\sigma \in \Sigma_1^\epsilon : \forall x_t,\; P^\sigma(x_t\,|\,a_{1:t-1}) = P^\sigma(x_t\,|\,\widehat{a}_{1:t-1})\}$. Clearly $\mathcal{S}(a_{1:t-1}, \widehat{a}_{1:t-1})$ is non-empty, since any strategy that is independent of the action history lies in this set.

    Let $\mathcal{R}(a_{1:t-1}, \widehat{a}_{1:t-1}) := \{\sigma \in \Sigma_1^\epsilon : \forall i, x_t^i,\;\; \sigma_t^i(\cdot\,|\,a_{1:t-1}, x_t^i) = \sigma_t^i(\cdot\,|\,\widehat{a}_{1:t-1}, x_t^i)\}$. The set $\mathcal{R}(a_{1:t-1}, \widehat{a}_{1:t-1})$ is non-empty for the same reason.

    Then $\Sigma_2^\epsilon = \bigcap_{a_{1:t-1}, \widehat{a}_{1:t-1}} \big(\mathcal{S}^c(a_{1:t-1}, \widehat{a}_{1:t-1}) \cup \mathcal{R}(a_{1:t-1}, \widehat{a}_{1:t-1})\big)$.

    Let $\mathcal{V}(a_{1:t-1}, \widehat{a}_{1:t-1}) := \{g \in \mathbb{R}^{|\mathcal{X}|\times|\mathcal{A}|^{t-1}} :\; g(a_{1:t-1}) = g(\widehat{a}_{1:t-1})\}$. Since $\mathcal{S}(a_{1:t-1}, \widehat{a}_{1:t-1})$ is the intersection of $\Sigma_1^\epsilon$, which is compact, with the subspace $\mathcal{V}(a_{1:t-1}, \widehat{a}_{1:t-1})$, the set $\mathcal{S}(a_{1:t-1}, \widehat{a}_{1:t-1})$ is also compact.

    Let $\mathcal{W}(a_{1:t-1}, \widehat{a}_{1:t-1}) := \{f = (f^i)_i,\; f^i \in \mathbb{R}^{|\mathcal{X}|\times|\mathcal{A}|^{t-1}\times|\mathcal{X}^i|} : \forall i, x_t^i,\;\; f^i(a_{1:t-1}, x_t^i) = f^i(\widehat{a}_{1:t-1}, x_t^i)\}$.

    Then $\mathcal{R}(a_{1:t-1}, \widehat{a}_{1:t-1})$ is the intersection of $\mathcal{S}(a_{1:t-1}, \widehat{a}_{1:t-1})$ with the subspace $\mathcal{W}(a_{1:t-1}, \widehat{a}_{1:t-1})$; thus $\mathcal{R}(a_{1:t-1}, \widehat{a}_{1:t-1})$ is also compact.

    This implies that $\Sigma_2^\epsilon$ is non-empty and compact. ∎
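As a small illustration of condition (13) in the discrete case, the following Python sketch shows that mixing an arbitrary kernel with the uniform distribution places it inside $\Sigma_1^\epsilon$; the function name and numbers are illustrative only.

import numpy as np

def perturb(sigma_table, eps):
    """sigma_table: array (..., |A^i|) of conditional distributions; returns a strategy with
    sigma(a | .) >= eps for every action, provided eps <= 1/|A^i| (cf. condition (13))."""
    nA = sigma_table.shape[-1]
    assert eps <= 1.0 / nA
    return (1 - eps * nA) * sigma_table + eps   # adds eps mass to each action; rows still sum to 1

sigma = np.array([[1.0, 0.0],                   # a deterministic kernel, indexed by x_t^i
                  [0.0, 1.0]])
print(perturb(sigma, eps=0.05))                 # every entry is now at least 0.05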

Theorem 3

There exists an SPBE for such a game.

Proof:

Define a perturbation of the game by restricting the set of strategies to $\Sigma^\epsilon$. Utilities are continuous in strategies on $\Sigma^\epsilon$ (since every strategy puts non-zero measure on every non-empty open set in the action space, which implies that the denominator in the Bayes rule is non-zero), and $\Sigma^\epsilon$ is a compact metric space by Lemma 1. Let

$BR^i(\beta^{-i}) = \Sigma_2^{\epsilon,i} \cap \operatorname*{argmax}_{\beta^i\in\Sigma^{\epsilon,i}} \mathbb{E}^{\beta^i\beta^{-i}}\left\{\sum_{n=1}^{T} R_n^i(X_n, A_n)\right\}.$  (16)

We note that $BR^i(\beta^{-i})$ is non-empty for every $\beta^{-i}$ by Lemma 2 in [1], which states that any expected reward achievable using a general profile of strategies can also be achieved using structured policies; this implies that the intersection of the argmax set in (16) with the structured policies is non-empty. Moreover, $BR^i(\beta^{-i})$ is continuous in $\beta^{-i}$ by [6, Theorems 2 and 3].

Since $BR$ is continuous on $\Sigma^\epsilon$, there exists a fixed point of $BR$ by Glicksberg's fixed-point theorem [7]. Consider a sequence of such perturbed games with $\epsilon \to 0$; by the compactness of the set of strategy profiles, some sequence of selections from the sets of Nash equilibria of the games in the sequence converges, say to $\sigma^*$. Then $\sigma^*$ is an SPBE of the game. ∎

Theorem 4

There exists a solution to the fixed-point equation (9) for each $t, \pi_t$.

Proof:

Since an SPBE exists for such a game by Theorem 3, and every SPBE can be found through the backward recursion by [1, Theorem 4], there exists a solution to (9) for each $t, \pi_t$. ∎

Remark: In this paper, we showed that a solution of the smaller fixed-point equations (9) exists for each $t, \pi_t$ by showing that a solution exists to the fixed-point equation whose solution is a Nash equilibrium of the whole game, using Glicksberg's fixed-point theorem. This is interesting because the smaller fixed-point equations may have, and in numerical solutions are observed to have, discontinuous utilities. Some results are known on the existence of equilibria in games with discontinuous utilities [8, 9]. It would be an interesting exercise to explore the connection between the two existence results.

V Conclusion

The authors in [1] proposed a novel methodology to find structured perfect Bayesian equilibria of a general model of dynamic games of asymmetric information in which players privately observe their types, which evolve as controlled Markov processes conditioned on everyone's actions, and all players' actions are publicly observed. In this paper, we prove, for each time $t$, the existence of a solution to the fixed-point equation that crucially appears in their methodology and that was left open in that paper.

VI Acknowledgement

The author would like to thank K. S. Mallikarjuna Rao for the suggestion to prove the existence of SPBE.

References

  • [1] D. Vasal, A. Sinha, and A. Anastasopoulos, “A Systematic Process for Evaluating Structured Perfect Bayesian Equilibria in Dynamic Games With Asymmetric Information,” IEEE Transactions on Automatic Control, vol. 64, no. 1, pp. 81–96, Jan. 2019.
  • [2] Y. Ouyang, H. Tavafoghi, and D. Teneketzis, “Dynamic oligopoly games with private Markovian dynamics,” in Proc. 54th IEEE Conf. Decision and Control (CDC), 2015.
  • [3] H. T. Jahormi, “On Design and Analysis of Cyber-Physical Systems with Strategic Agents,” Ph.D. dissertation, University of Michigan, Ann Arbor, 2017.
  • [4] D. Fudenberg and J. Tirole, Game Theory.   Cambridge, MA: MIT Press, 1991.
  • [5] A. Nayyar, A. Mahajan, and D. Teneketzis, “Decentralized stochastic control with partial history sharing: A common information approach,” IEEE Transactions on Automatic Control, vol. 58, no. 7, pp. 1644–1658, 2013.
  • [6] D. Vasal and R. Berry, “α-robust equilibrium in anonymous games,” May 2020. [Online]. Available: http://arxiv.org/abs/2005.06812
  • [7] I. L. Glicksberg, “A Further Generalization of the Kakutani Fixed Point Theorem, with Application to Nash Equilibrium Points,” Proceedings of the American Mathematical Society, vol. 3, no. 1, p. 170, Feb. 1952.
  • [8] P. Dasgupta and E. Maskin, “The existence of equilibrium in discontinuous economic games, I: Theory,” Review of Economic Studies, vol. 53, no. 1, pp. 1–26, 1986.
  • [9] P. Dasgupta and E. Maskin, “The existence of equilibrium in discontinuous economic games, II: Applications,” Review of Economic Studies, vol. 53, no. 1, pp. 27–41, 1986.