
Existence of structured perfect Bayesian equilibrium in dynamic games of asymmetric information

Deepanshu Vasal (email: [email protected])
Abstract

In [1], the authors considered a general finite-horizon model of a dynamic game of asymmetric information, in which $N$ players have types evolving as independent Markov processes, and each player observes its own type perfectly as well as the actions of all players. The authors present a sequential decomposition algorithm to find all structured perfect Bayesian equilibria of the game. The algorithm requires solving a class of fixed-point equations for each time $t$, whose existence was left as an open question. In this paper, we prove the existence of solutions to these fixed-point equations for compact metric spaces.

I Introduction

The authors in [1] considered a model consisting of strategic players having types that evolve as conditionally independent, controlled Markov processes. Players observe their types privately, and the actions taken by all players are observed publicly. The instantaneous reward of each player depends on everyone's types and actions. The proposed methodology decomposes the interdependence between beliefs and strategies in perfect Bayesian equilibria (PBE) and enables a systematic evaluation of a subset of PBE, namely structured perfect Bayesian equilibria (SPBE). Here SPBE are defined as equilibria in which players' strategies are based on their current private type and on a set of beliefs on each player's current private type, which is common to all the players and whose domain is time-invariant. The beliefs on players' types are such that they can be updated individually for each player and sequentially with respect to time. The model allows for signaling amongst players, as beliefs depend on strategies. They present a sequential decomposition methodology for calculating all SPBE for finite and infinite horizon dynamic games with asymmetric information.

For the finite-horizon model, they provide a two-step algorithm involving a backward recursion followed by a forward recursion. The algorithm works as follows. In the backward recursion, for every time period, the algorithm finds an equilibrium generating function defined for all possible common beliefs at that time. This involves solving a one-step fixed-point equation on the space of probability simplices. Existence of a solution to this fixed-point equation was left as an open question. If there exists a solution for every common belief $\pi_t$, then the equilibrium strategies and beliefs are obtained through a forward recursion using the equilibrium generating function obtained in the backward step and the Bayes update rule. A similar notion of equilibrium was defined in [2, 3], where the authors provide a sequential decomposition algorithm to compute common-information-based perfect Bayesian equilibria. In [3], it was shown that such an equilibrium exists for zero-sum games.

In this paper, we consider the finite-horizon game with all sets of variables lying in compact metric spaces. We show that there always exists an SPBE for such a game. Since it is proved in [1] that all SPBE can be found using their algorithm, we conclude that there always exists a solution to the fixed-point equation considered in [1] for each time period $t$.

II Model and Preliminaries

We consider a discrete-time dynamical system with $N$ strategic players in the set $\mathcal{N} \triangleq \{1,2,\ldots,N\}$. We consider two cases: finite horizon $\mathcal{T} \triangleq \{1,2,\ldots,T\}$ with perfect recall and infinite horizon with perfect recall. The system state is $X_t \triangleq (X_t^1, X_t^2, \ldots, X_t^N)$, where $X_t^i \in \mathcal{X}^i$ is the type of player $i$ at time $t$, which is perfectly observed by player $i$ and is its private information. Players' types evolve as conditionally independent, controlled Markov processes such that

$\mathbb{P}(x_1) = \prod_{i=1}^{N} Q_1^i(x_1^i)$  (1a)
$\mathbb{P}(x_t \,|\, x_{1:t-1}, a_{1:t-1}) = \mathbb{P}(x_t \,|\, x_{t-1}, a_{t-1})$  (1b)
$= \prod_{i=1}^{N} Q_t^i(x_t^i \,|\, x_{t-1}^i, a_{t-1}),$  (1c)

where the $Q_t^i$ are known kernels. Player $i$ at time $t$ takes action $a_t^i \in \mathcal{A}^i$ after observing the actions $a_{1:t-1} = (a_k)_{k=1,\ldots,t-1}$, where $a_k = (a_k^j)_{j\in\mathcal{N}}$, which are common information among the players, and its own types $x_{1:t}^i$, which it observes privately. The sets $\mathcal{A}^i, \mathcal{X}^i$ are assumed to be finite. Let $g^i = (g_t^i)_{t\in\mathcal{T}}$ be a probabilistic strategy of player $i$, where $g_t^i : \mathcal{A}^{t-1} \times (\mathcal{X}^i)^t \to \Delta(\mathcal{A}^i)$, such that player $i$ plays action $A_t^i$ according to $A_t^i \sim g_t^i(\cdot\,|\,a_{1:t-1}, x_{1:t}^i)$. Let $g \triangleq (g^i)_{i\in\mathcal{N}}$ be the strategy profile of all players. At the end of interval $t$, player $i$ receives an instantaneous reward $R_t^i(x_t, a_t)$. To preserve the information structure of the problem, it is assumed that players do not observe their rewards until the end of the game. (Alternatively, we could have assumed the instantaneous reward of a player to depend only on its own type, i.e., to be of the form $R_t^i(x_t^i, a_t)$, and have allowed rewards to be observed by the players during the game, as this would not alter the information structure of the game.) The reward functions and state transition kernels are common knowledge among the players. For the finite-horizon problem, the objective of player $i$ is to maximize its total expected reward

$J^{i,g} \triangleq \mathbb{E}^{g}\left\{\sum_{t=1}^{T} R_t^i(X_t, A_t)\right\}.$  (2)
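As a concrete illustration of the model in (1) and the objective in (2), the following Python sketch simulates the coupled type/action dynamics and estimates $J^{i,g}$ by Monte Carlo for two players with binary types and actions. All kernels, strategies, and rewards below are hypothetical choices made only for illustration and are not taken from [1].

import numpy as np

# Two players, binary types and actions; everything numerical here is illustrative.
rng = np.random.default_rng(0)
N, T = 2, 3
X, A = [0, 1], [0, 1]                          # finite type and action alphabets

def Q(i, t, x_prev, a_prev):
    """Illustrative kernel Q_t^i(. | x_{t-1}^i, a_{t-1}): types are sticky, nudged by own action."""
    p_stay = 0.8 if a_prev[i] == x_prev else 0.6
    return np.array([p_stay, 1 - p_stay]) if x_prev == 0 else np.array([1 - p_stay, p_stay])

def g(i, t, a_hist, x_hist_i):
    """Illustrative behavioural strategy g_t^i(. | a_{1:t-1}, x_{1:t}^i), biased toward the current type."""
    p1 = 0.7 if x_hist_i[-1] == 1 else 0.3     # probability of playing action 1
    return np.array([1 - p1, p1])

def R(i, x, a):
    """Illustrative instantaneous reward R_t^i(x_t, a_t): player i is paid for guessing the other's type."""
    return float(a[i] == x[1 - i])

def simulate():
    x = [int(rng.integers(0, 2)) for _ in range(N)]        # x_1^i drawn uniformly (stand-in for Q_1^i)
    x_hist = [[xi] for xi in x]
    a_hist, J = [], np.zeros(N)
    for t in range(1, T + 1):
        a = [int(rng.choice(A, p=g(i, t, a_hist, x_hist[i]))) for i in range(N)]
        J += np.array([R(i, x, a) for i in range(N)])      # accumulate R_t^i(x_t, a_t)
        a_hist.append(tuple(a))
        x = [int(rng.choice(X, p=Q(i, t, x[i], a))) for i in range(N)]   # x_{t+1} drawn as in (1c)
        for i in range(N):
            x_hist[i].append(x[i])
    return J

print(np.mean([simulate() for _ in range(2000)], axis=0))  # Monte Carlo estimate of J^{i,g} in (2)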

II-A Preliminaries

Any history of this game at which players take an action is of the form $h_t = (a_{1:t-1}, x_{1:t})$. Let $\mathcal{H}_t$ be the set of such histories, $\mathcal{H}^T \triangleq \cup_{t=0}^{T} \mathcal{H}_t$ be the set of all possible such histories for the finite horizon, and $\mathcal{H}^{\infty} \triangleq \cup_{t=0}^{\infty} \mathcal{H}_t$ for the infinite horizon. At any time $t$, player $i$ observes $h_t^i = (a_{1:t-1}, x_{1:t}^i)$ and all players together have $h_t^c = a_{1:t-1}$ as common history. Let $\mathcal{H}_t^i$ be the set of observed histories of player $i$ at time $t$ and $\mathcal{H}_t^c$ be the set of common histories at time $t$. An appropriate equilibrium concept for such games is the PBE [4], which consists of a pair $(\beta^*, \mu^*)$ of a strategy profile $\beta^* = (\beta_t^{*,i})_{t\in\mathcal{T}, i\in\mathcal{N}}$, where $\beta_t^{*,i} : \mathcal{H}_t^i \to \Delta(\mathcal{A}^i)$, and a belief profile $\mu^* = ({}^i\mu_t^*)_{t\in\mathcal{T}, i\in\mathcal{N}}$, where ${}^i\mu_t^* : \mathcal{H}_t^i \to \Delta(\mathcal{H}_t)$, that satisfy sequential rationality: $\forall i\in\mathcal{N}, t\in\mathcal{T}, h_t^i \in \mathcal{H}_t^i, \beta^i$,

$W_t^{i,\beta^{*,i},T}(h_t^i) \geq W_t^{i,\beta^{i},T}(h_t^i)$  (3)

where the reward-to-go is defined as

$W_t^{i,\beta^{i},T}(h_t^i) \triangleq \mathbb{E}^{\beta^{i}\beta^{*,-i},\, {}^i\mu_t^*[h_t^i]}\left\{\sum_{n=t}^{T} R_n^i(X_n, A_n)\,\big|\, h_t^i\right\},$  (4)

and the beliefs satisfy consistency conditions as described in [4, p. 331]. In general, a belief of player $i$ at time $t$, ${}^i\mu_t^*$, is defined on the history $h_t = (a_{1:t-1}, x_{1:t})$ given its private history $h_t^i = (a_{1:t-1}, x_{1:t}^i)$. Here player $i$'s private history $h_t^i = (a_{1:t-1}, x_{1:t}^i)$ consists of a public part $h_t^c = a_{1:t-1}$ and a private part $x_{1:t}^i$. At any time $t$, the relevant uncertainty of player $i$ is about the other players' types $x_{1:t}^{-i} \in \times_{n=1}^{t}\left(\times_{j\neq i}\mathcal{X}^j\right)$ and their future actions. Due to the independence of types, and given the common history $h_t^c$, player $i$'s type history $x_{1:t}^i$ does not provide any additional information about $x_{1:t}^{-i}$, as will be shown later. For this reason, only those beliefs are considered that depend on each player's history $h_t^i$ only through the common history $h_t^c$. Hence, for each player $i$, its belief for each history $h_t^c = a_{1:t-1}$ is derived from a common belief $\mu_t^*[a_{1:t-1}]$. Furthermore, as will be shown later, this belief factorizes into a product of marginals $\prod_{j\in\mathcal{N}} \mu_t^{*,j}[a_{1:t-1}]$. Thus it suffices to use the following system of beliefs: $\mu^* = (\underline{\mu}_t^*)_{t\in\mathcal{T}}$, where $\underline{\mu}_t^* = (\mu_t^{*,i})_{i\in\mathcal{N}}$ and $\mu_t^{*,i} : \mathcal{H}_t^c \to \Delta(\mathcal{X}^i)$, with the understanding that player $i$'s belief on $x_t^{-i}$ is $\mu_t^{*,-i}[a_{1:t-1}](x_t^{-i}) = \prod_{j\neq i} \mu_t^{*,j}[a_{1:t-1}](x_t^j)$. Under the above structure, all consistency conditions required for PBE [4, p. 331] are automatically satisfied.

III Structured perfect Bayesian algorithm

At any time $t$, player $i$ has information $(a_{1:t-1}, x_{1:t}^i)$, where $a_{1:t-1}$ is the common information among the players and $x_{1:t}^i$ is the private information of player $i$. Since $(a_{1:t-1}, x_{1:t}^i)$ grows with time, any strategy of the form $A_t^i \sim g_t^i(\cdot\,|\,a_{1:t-1}, x_{1:t}^i)$ becomes unwieldy. Thus it is desirable to have an information state in a time-invariant space that succinctly summarizes $(a_{1:t-1}, x_{1:t}^i)$ and that can be sequentially updated. For any strategy profile $g$, the belief $\pi_t$ on $X_t$, $\pi_t \in \Delta(\mathcal{X})$, is defined as $\pi_t(x_t) \triangleq \mathbb{P}^g(X_t = x_t \,|\, a_{1:t-1})$, $\forall x_t \in \mathcal{X}$. Define the marginals $\pi_t^i(x_t^i) \triangleq \mathbb{P}^g(X_t^i = x_t^i \,|\, a_{1:t-1})$, $\forall x_t^i \in \mathcal{X}^i$.

Then, using the common information approach [5], the problem is posed as follows: player $i$ at time $t$ observes $a_{1:t-1}$ and takes action $\gamma_t^i$, where $\gamma_t^i : \mathcal{X}^i \to \Delta(\mathcal{A}^i)$ is a partial (stochastic) function from its private information $x_t^i$ to $a_t^i$, of the form $A_t^i \sim \gamma_t^i(\cdot\,|\,x_t^i)$. These actions are generated through some policy $\psi^i = (\psi_t^i)_{t\in\mathcal{T}}$, $\psi_t^i : \mathcal{A}^{t-1} \to \left\{\mathcal{X}^i \to \Delta(\mathcal{A}^i)\right\}$, that operates on the common information $a_{1:t-1}$ such that $\gamma_t^i = \psi_t^i[a_{1:t-1}]$. Then any policy of the form $A_t^i \sim s_t^i(\cdot\,|\,a_{1:t-1}, x_t^i)$ is equivalent to $A_t^i \sim \psi_t^i[a_{1:t-1}](\cdot\,|\,x_t^i)$.
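The following short Python sketch illustrates this two-stage description: a (hypothetical) policy $\psi_t^i$ maps the public action history to a prescription $\gamma_t^i$, which the player then evaluates at its private type; the induced action distribution is the same as that of a strategy $s_t^i(\cdot\,|\,a_{1:t-1}, x_t^i)$. All names and numbers are illustrative only.

def psi_i(a_hist):
    """Hypothetical policy psi_t^i[a_{1:t-1}]: the prescription depends only on the public history."""
    bias = 0.8 if sum(a for a, _ in a_hist) > len(a_hist) / 2 else 0.3
    return {0: [1 - bias, bias], 1: [bias, 1 - bias]}   # gamma_t^i(.|x_t^i) for x_t^i in {0, 1}

# A_t^i ~ psi_t^i[a_{1:t-1}](.|x_t^i) induces the same action distribution as
# s_t^i(.|a_{1:t-1}, x_t^i): the public and private arguments are simply applied in two stages.
gamma = psi_i([(1, 0), (1, 1)])      # gamma_t^i = psi_t^i[a_{1:t-1}]
print(gamma[0], gamma[1])            # action distributions for private types 0 and 1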

III-A Forward/Backward algorithm

In Lemma 3 in Appendix B of [1], it is shown that, due to the independence of types and their evolution as independent controlled Markov processes, for any strategy of the players the joint common belief can be factorized as a product of its marginals, i.e., $\pi_t(x_t) = \prod_{i=1}^{N} \pi_t^i(x_t^i)$, $\forall x_t$. Define $\underline{\pi}_t \in \times_{i\in\mathcal{N}} \Delta(\mathcal{X}^i)$ as the vector of marginal beliefs, $\underline{\pi}_t := (\pi_t^i)_{i\in\mathcal{N}}$. Similarly, define the vector of belief updates as $\underline{F}(\underline{\pi}, \gamma, a) := (F^i(\pi^i, \gamma^i, a))_{i\in\mathcal{N}}$, where (using Bayes' rule)

$F^i(\pi^i,\gamma^i,a)(x_{t+1}^i) =
\begin{cases}
\dfrac{\sum_{x_t^i} \pi^i(x_t^i)\,\gamma^i(a^i|x_t^i)\,Q_t^i(x_{t+1}^i|x_t^i,a)}{\sum_{\tilde{x}_t^i} \pi^i(\tilde{x}_t^i)\,\gamma^i(a^i|\tilde{x}_t^i)} & \text{if } \sum_{\tilde{x}_t^i} \pi^i(\tilde{x}_t^i)\,\gamma^i(a^i|\tilde{x}_t^i) > 0,\\[2mm]
\sum_{x_t^i} \pi^i(x_t^i)\,Q_t^i(x_{t+1}^i|x_t^i,a) & \text{if } \sum_{\tilde{x}_t^i} \pi^i(\tilde{x}_t^i)\,\gamma^i(a^i|\tilde{x}_t^i) = 0.
\end{cases}$  (7)
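For concreteness, the following Python sketch implements the per-player update $F^i$ in (7) for finite $\mathcal{X}^i$ and $\mathcal{A}^i$; the numerical values at the bottom are illustrative and not taken from [1].

import numpy as np

def belief_update(pi_i, gamma_i, Q_i, a_i):
    """
    Per-player update F^i in (7) for finite spaces.
    pi_i    : array (|X^i|,)        current marginal belief pi_t^i
    gamma_i : array (|X^i|, |A^i|)  prescription gamma_t^i(a^i | x^i)
    Q_i     : array (|X^i|, |X^i|)  kernel Q_t^i(x_{t+1}^i | x_t^i, a) for the observed joint action a
    a_i     : int                   observed action of player i
    """
    weights = pi_i * gamma_i[:, a_i]           # joint weight of (x_t^i, a_t^i)
    denom = weights.sum()
    if denom > 0:                              # first (Bayes) branch of (7)
        post = weights / denom
    else:                                      # zero-probability action: second branch of (7)
        post = pi_i
    return post @ Q_i                          # push the posterior through the kernel

# Illustrative numbers (not from [1]): binary type, binary action.
pi_i = np.array([0.5, 0.5])
gamma_i = np.array([[0.9, 0.1],                # type 0 plays action 0 w.p. 0.9
                    [0.2, 0.8]])               # type 1 plays action 1 w.p. 0.8
Q_i = np.array([[0.7, 0.3],                    # sticky type transition under the observed action
                [0.3, 0.7]])
print(belief_update(pi_i, gamma_i, Q_i, a_i=0))   # pi_{t+1}^i after observing a_t^i = 0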

In the following, a backward-forward algorithm that evaluates SPBE is presented. As shown in [1, Theorem 2], this is a "canonical" methodology, in the sense that all SPBE can be generated this way.

III-B Backward Recursion

In this section, we define an equilibrium generating function $\theta = (\theta_t^i)_{i\in\mathcal{N}, t\in\mathcal{T}}$, where $\theta_t^i : \times_{i\in\mathcal{N}} \Delta(\mathcal{X}^i) \to \left\{\mathcal{X}^i \to \Delta(\mathcal{A}^i)\right\}$. In addition, we define a sequence of reward-to-go functions of player $i$ at time $t$, $(V_t^i)_{i\in\mathcal{N}, t\in\{1,2,\ldots,T+1\}}$, where $V_t^i : \times_{i\in\mathcal{N}} \Delta(\mathcal{X}^i) \times \mathcal{X}^i \to \mathbb{R}$. These quantities are generated through the following backward recursion.

  • 1.

    Initialize, $\forall \underline{\pi}_{T+1} \in \times_{i\in\mathcal{N}} \Delta(\mathcal{X}^i),\; x_{T+1}^i \in \mathcal{X}^i$,

    $V_{T+1}^i(\underline{\pi}_{T+1}, x_{T+1}^i) \triangleq 0.$  (8)
  • 2.

    For $t = T, T-1, \ldots, 1$, $\forall \underline{\pi}_t \in \times_{i\in\mathcal{N}} \Delta(\mathcal{X}^i)$, with $\pi_t = \prod_{i\in\mathcal{N}} \pi_t^i$, let $\theta_t[\underline{\pi}_t]$ be generated as follows. Set $\tilde{\gamma}_t = \theta_t[\underline{\pi}_t]$, where $\tilde{\gamma}_t$ is the solution, if it exists, of the following fixed-point equation: $\forall i\in\mathcal{N}, x_t^i \in \mathcal{X}^i$,

    $\tilde{\gamma}_t^i(\cdot\,|\,x_t^i) \in \arg\max_{\gamma_t^i(\cdot|x_t^i)} \mathbb{E}^{\gamma_t^i(\cdot|x_t^i)\,\tilde{\gamma}_t^{-i},\,\pi_t}\left\{R_t^i(X_t, A_t) + V_{t+1}^i\big(\underline{F}(\underline{\pi}_t, \tilde{\gamma}_t, A_t), X_{t+1}^i\big)\,\big|\, x_t^i\right\},$  (9)

    where the expectation in (9) is with respect to the random variables $(X_t^{-i}, A_t, X_{t+1}^i)$ through the measure $\pi_t^{-i}(x_t^{-i})\,\gamma_t^i(a_t^i|x_t^i)\,\tilde{\gamma}_t^{-i}(a_t^{-i}|x_t^{-i})\,Q_{t+1}^i(x_{t+1}^i|x_t^i,a_t)$, and $\underline{F}$ is defined above.

    Furthermore, using the quantity $\tilde{\gamma}_t$ found above, define

    $V_t^i(\underline{\pi}_t, x_t^i) \triangleq \mathbb{E}^{\tilde{\gamma}_t^i(\cdot|x_t^i)\,\tilde{\gamma}_t^{-i},\,\pi_t}\left\{R_t^i(X_t, A_t) + V_{t+1}^i\big(\underline{F}(\underline{\pi}_t, \tilde{\gamma}_t, A_t), X_{t+1}^i\big)\,\big|\, x_t^i\right\}.$  (10)

    (A numerical sketch of this step is given after this list.)
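As an illustration of one pass of step 2, the following Python sketch searches for a pure-prescription solution of (9) at the last stage (where $V_{T+1}^i \equiv 0$ by (8)) using naive best-response iteration, for two players with binary types and actions. The rewards, kernels, and the solver itself are hypothetical choices made only for illustration and are not prescribed by [1]; in particular, a solution of (9) may require mixed prescriptions, which this simple search will not find.

import numpy as np

nX, nA, N = 2, 2, 2   # binary types and actions, two players (illustrative)

def R(i, x, a):
    """Illustrative reward R_t^i(x_t, a_t): player i is paid for matching the other's type."""
    return float(a[i] == x[1 - i])

def Q(i, x_i, a):
    """Illustrative kernel Q_{t+1}^i(. | x_t^i, a_t): sticky types."""
    return np.array([0.7, 0.3]) if x_i == 0 else np.array([0.3, 0.7])

def V_next(i, pis, x_i):
    """Continuation value V_{t+1}^i; identically zero here, i.e. the last stage t = T (cf. (8))."""
    return 0.0

def F(pis, gammas, a):
    """Vector of per-player Bayes updates F^i from (7)."""
    out = []
    for i in range(N):
        w = pis[i] * gammas[i][:, a[i]]
        post = w / w.sum() if w.sum() > 0 else pis[i]
        Qmat = np.array([Q(i, x, a) for x in range(nX)])   # rows x_t^i, cols x_{t+1}^i
        out.append(post @ Qmat)
    return out

def value(i, x_i, dev_i, gammas, pis):
    """Expected value in (9) when player i of type x_i uses prescription dev_i, while the
    candidate gammas is kept inside F and for the opponent."""
    total = 0.0
    for x_mi in range(nX):
        p_x = pis[1 - i][x_mi]
        x = (x_i, x_mi) if i == 0 else (x_mi, x_i)
        for a_i in range(nA):
            for a_mi in range(nA):
                a = (a_i, a_mi) if i == 0 else (a_mi, a_i)
                p = p_x * dev_i[x_i, a_i] * gammas[1 - i][x_mi, a_mi]
                if p == 0.0:
                    continue
                pis_next = F(pis, gammas, a)
                cont = sum(Q(i, x_i, a)[xn] * V_next(i, pis_next, xn) for xn in range(nX))
                total += p * (R(i, x, a) + cont)
    return total

def best_response(i, gammas, pis):
    """Per-type best response of player i against the candidate gammas."""
    br = np.zeros((nX, nA))
    for x_i in range(nX):
        vals = []
        for a_i in range(nA):
            dev = np.zeros((nX, nA)); dev[x_i, a_i] = 1.0
            vals.append(value(i, x_i, dev, gammas, pis))
        br[x_i, int(np.argmax(vals))] = 1.0
    return br

def solve_stage(pis, iters=50):
    """Search for a pure-prescription fixed point of (9) by best-response iteration."""
    gammas = [np.full((nX, nA), 1.0 / nA) for _ in range(N)]   # uniform initial guess
    for _ in range(iters):
        new = [best_response(i, gammas, pis) for i in range(N)]
        if all(np.array_equal(g_new, g_old) for g_new, g_old in zip(new, gammas)):
            return new                     # candidate gamma_tilde_t = theta_t[pi_t] for this belief
        gammas = new
    return None                            # no pure fixed point found by this naive search

print(solve_stage([np.array([0.5, 0.5]), np.array([0.5, 0.5])]))

For a given belief $\underline{\pi}_t$, the returned prescriptions play the role of $\theta_t[\underline{\pi}_t]$ used in the forward recursion.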

Then

Theorem 1 ([1])

A strategy and belief profile $(\beta^*, \mu^*)$ constructed through the backward-forward recursion algorithm is a PBE of the game, i.e., $\forall i\in\mathcal{N}, t\in\mathcal{T}, a_{1:t-1}\in\mathcal{H}_t^c, x_{1:t}^i \in (\mathcal{X}^i)^t, \beta^i$,

$\mathbb{E}^{\beta_{t:T}^{*,i}\beta_{t:T}^{*,-i},\, \mu_t^*[a_{1:t-1}]}\left\{\sum_{n=t}^{T} R_n^i(X_n, A_n)\,\big|\, a_{1:t-1}, x_{1:t}^i\right\} \geq \mathbb{E}^{\beta_{t:T}^{i}\beta_{t:T}^{*,-i},\, \mu_t^*[a_{1:t-1}]}\left\{\sum_{n=t}^{T} R_n^i(X_n, A_n)\,\big|\, a_{1:t-1}, x_{1:t}^i\right\}.$  (11)
Theorem 2 ([1])

Let $(\beta^*, \mu^*)$ be an SPBE. Then there exists an equilibrium generating function $\phi$ that satisfies (9) in the backward recursion for $\pi_t = \mu_t^*[a_{1:t-1}]$, $\forall a_{1:t-1}$, such that $(\beta^*, \mu^*)$ is defined through the forward recursion using $\phi$.
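To illustrate how the forward recursion mentioned above assembles equilibrium strategies and beliefs from the backward step, here is a minimal Python sketch. The functions theta_of and F below are trivial stand-ins used only to make the snippet self-contained; in practice they come from the backward recursion and from (7), respectively.

import numpy as np

def theta_of(pis):
    # stand-in for theta_t[pi_t]: uniform prescriptions regardless of the belief (hypothetical)
    return [np.full((2, 2), 0.5), np.full((2, 2), 0.5)]

def F(pis, gammas, a):
    # stand-in for the Bayes update (7); here the beliefs are simply left unchanged
    return pis

def forward_pass(priors, action_path):
    """priors: marginals pi_1^i; action_path: realized joint actions a_1, ..., a_T."""
    pis, beliefs, prescriptions = list(priors), [list(priors)], []
    for a_t in action_path:
        gamma_t = theta_of(pis)       # beta_t^{*,i}(.|a_{1:t-1}, x_t^i) = theta_t[pi_t](.|x_t^i)
        prescriptions.append(gamma_t)
        pis = F(pis, gamma_t, a_t)    # next common belief mu_{t+1}^*[a_{1:t}] via (7)
        beliefs.append(list(pis))
    return prescriptions, beliefs

_, beliefs = forward_pass([np.array([0.5, 0.5]), np.array([0.5, 0.5])], [(0, 1), (1, 1)])
print(len(beliefs))                   # beliefs mu_1^*, mu_2^*, mu_3^* along the realized path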

IV Main result: Existence

In this section, we present the main result of this paper: we show that for all $\underline{\pi}_t \in \times_{i\in\mathcal{N}} \Delta(\mathcal{X}^i)$, with $\pi_t = \prod_{i\in\mathcal{N}} \pi_t^i$, there always exists a $\tilde{\gamma} = \theta[\underline{\pi}_t]$ that satisfies (9). We show that existence holds for any compact metric spaces $\mathcal{X}, \mathcal{A}$. Let $\Sigma$ be the set of all mixed strategies of the form $\sigma_t^i(\cdot\,|\,a_{1:t-1}, x_{1:t}^i)$ and $\Sigma'$ be the set of all mixed strategies of the form $\sigma_t^i(\cdot\,|\,a_{1:t-1}, x_t^i)$. Then both $\Sigma, \Sigma'$ are compact.

Lemma 1

Let $\epsilon > 0$.

  • (a)

    Let $L$ be a standard measure on $\mathcal{A}^i$ such that for any non-empty open measurable set $V$ in the Borel space $B(\mathcal{A}^i)$, $L(V) > 0$, i.e., $L$ puts non-zero measure on every non-empty open measurable set of $\mathcal{A}^i$. (For example, $L$ can be the Lebesgue measure on a Euclidean space or the counting measure on a discrete space.) Let $\Sigma_1^\epsilon \subset \Sigma'$ be the space of strategies such that if $\sigma \in \Sigma_1^\epsilon$, then for all $i\in\mathcal{N}, V\in B(\mathcal{A}^i), a_{1:t-1}, x_t^i$,

    $\sigma^i(V\,|\,a_{1:t-1}, x_t^i) \geq \epsilon L(V).$  (12)

    If $\mathcal{A}^i$ is discrete, the above condition reduces to: for all $i\in\mathcal{N}, a^j\in\mathcal{A}^i, a_{1:t-1}, x_t^i$,

    $\sigma^i(a^j\,|\,a_{1:t-1}, x_t^i) \geq \epsilon.$  (13)
  • (b)

    Let $\Sigma_2^\epsilon \subseteq \Sigma_1^\epsilon$ be the space of strategies such that if $\sigma \in \Sigma_2^\epsilon$, then for all $a_{1:t-1}, \widehat{a}_{1:t-1}$,

    $P^\sigma(x_t\,|\,a_{1:t-1}) = P^\sigma(x_t\,|\,\widehat{a}_{1:t-1}) \;\;\;\; \forall x_t$  (14)

    implies

    $\sigma_t^i(\cdot\,|\,a_{1:t-1}, x_t^i) = \sigma_t^i(\cdot\,|\,\widehat{a}_{1:t-1}, x_t^i) \;\;\;\; \forall i, x_t^i.$  (15)

    (The above condition captures the defining property of structured policies: if two action histories produce the same common belief under a strategy, then the strategy does not distinguish between those two action histories.)

Then $\Sigma_2^\epsilon$ is a compact metric space.

Proof:
  • (a)

    Let $\sigma^n$ be any sequence of strategies in $\Sigma_1^\epsilon$. Thus, for all $i\in\mathcal{N}, V\in B(\mathcal{A}^i), a_{1:t-1}, x_t^i$, we have $\sigma^{i,n}(V\,|\,a_{1:t-1}, x_t^i) \geq \epsilon L(V)$, and hence $\lim_n \sigma^{i,n}(V\,|\,a_{1:t-1}, x_t^i) \geq \epsilon L(V)$. Thus $\Sigma_1^\epsilon$ is closed. It is also bounded, and therefore compact.

  • (b)

    Let $\mathcal{S}(a_{1:t-1}, \widehat{a}_{1:t-1}) := \{\sigma \in \Sigma_1^\epsilon : \forall x_t,\; P^\sigma(x_t\,|\,a_{1:t-1}) = P^\sigma(x_t\,|\,\widehat{a}_{1:t-1})\}$. Clearly $\mathcal{S}(a_{1:t-1}, \widehat{a}_{1:t-1})$ is non-empty, since any strategy that is independent of the action history lies in this set.

    Let $\mathcal{R}(a_{1:t-1}, \widehat{a}_{1:t-1}) := \{\sigma \in \Sigma_1^\epsilon : \forall i, x_t^i,\;\; \sigma_t^i(\cdot\,|\,a_{1:t-1}, x_t^i) = \sigma_t^i(\cdot\,|\,\widehat{a}_{1:t-1}, x_t^i)\}$. The set $\mathcal{R}(a_{1:t-1}, \widehat{a}_{1:t-1})$ is non-empty for the same reason.

    Then $\Sigma_2^\epsilon = \bigcap_{a_{1:t-1}, \widehat{a}_{1:t-1}} \big(\mathcal{S}^c(a_{1:t-1}, \widehat{a}_{1:t-1}) \cup \mathcal{R}(a_{1:t-1}, \widehat{a}_{1:t-1})\big)$.

    Let $\mathcal{V}(a_{1:t-1}, \widehat{a}_{1:t-1}) := \{g \in \mathbb{R}^{|\mathcal{X}|\times|\mathcal{A}|^{t-1}} :\; g(a_{1:t-1}) = g(\widehat{a}_{1:t-1})\}$. Since $\mathcal{S}(a_{1:t-1}, \widehat{a}_{1:t-1})$ is the intersection of $\Sigma_1^\epsilon$, which is compact, with the subspace $\mathcal{V}(a_{1:t-1}, \widehat{a}_{1:t-1})$, the set $\mathcal{S}(a_{1:t-1}, \widehat{a}_{1:t-1})$ is also compact.

    Let $\mathcal{W}(a_{1:t-1}, \widehat{a}_{1:t-1}) := \{f = (f^i)_i,\; f^i \in \mathbb{R}^{|\mathcal{X}|\times|\mathcal{A}|^{t-1}\times|\mathcal{X}^i|} : \forall i, x_t^i,\;\; f^i(a_{1:t-1}, x_t^i) = f^i(\widehat{a}_{1:t-1}, x_t^i)\}$.

    Then $\mathcal{R}(a_{1:t-1}, \widehat{a}_{1:t-1})$ is the intersection of $\mathcal{S}(a_{1:t-1}, \widehat{a}_{1:t-1})$ with the subspace $\mathcal{W}(a_{1:t-1}, \widehat{a}_{1:t-1})$; thus $\mathcal{R}(a_{1:t-1}, \widehat{a}_{1:t-1})$ is also compact.

    This implies that $\Sigma_2^\epsilon$ is non-empty and compact. ∎
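As a small illustration of condition (13) in the discrete case, the following Python sketch shows that mixing an arbitrary kernel with the uniform distribution places it inside $\Sigma_1^\epsilon$; the function name and numbers are illustrative only.

import numpy as np

def perturb(sigma_table, eps):
    """sigma_table: array (..., |A^i|) of conditional distributions; returns a strategy with
    sigma(a | .) >= eps for every action, provided eps <= 1/|A^i| (cf. condition (13))."""
    nA = sigma_table.shape[-1]
    assert eps <= 1.0 / nA
    return (1 - eps * nA) * sigma_table + eps   # adds eps mass to each action; rows still sum to 1

sigma = np.array([[1.0, 0.0],                   # a deterministic kernel, indexed by x_t^i
                  [0.0, 1.0]])
print(perturb(sigma, eps=0.05))                 # every entry is now at least 0.05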

Theorem 3

There exists an SPBE for such a game.

Proof:

Define a perturbation of the game by restricting the set of strategies to $\Sigma^\epsilon$. Utilities are continuous in strategies on $\Sigma^\epsilon$ (since every strategy puts non-zero measure on every non-empty open set in the action space, which implies that the denominator in the Bayes rule is non-zero), and $\Sigma^\epsilon$ is a compact metric space by Lemma 1. Let

$BR^i(\beta^{-i}) = \Sigma_2^{\epsilon,i} \cap \operatorname*{argmax}_{\beta^i\in\Sigma^{\epsilon,i}} \mathbb{E}^{\beta^i\beta^{-i}}\left\{\sum_{n=1}^{T} R_n^i(X_n, A_n)\right\}.$  (16)

We note that $BR^i(\beta^{-i})$ is non-empty for every $\beta^{-i}$ by Lemma 2 in [1], which states that any expected reward achievable using a general profile of strategies can also be achieved using structured policies; this implies that the intersection of the argmax set in (16) with the structured policies is non-empty. Moreover, $BR^i(\beta^{-i})$ is continuous in $\beta^{-i}$ by [6, Theorems 2 and 3].

Since $BR$ is continuous on $\Sigma^\epsilon$, there exists a fixed point of $BR$ by Glicksberg's fixed-point theorem [7]. Consider a sequence of such perturbed games with $\epsilon \to 0$; by the compactness of the set of strategy profiles, some sequence of selections from the sets of Nash equilibria of the games in the sequence converges, say to $\sigma^*$. Then $\sigma^*$ is an SPBE of the game. ∎

Theorem 4

There exists a solution to the fixed-point equation (9) for each $t, \pi_t$.

Proof:

Since an SPBE exists for such a game by Theorem 3, and every SPBE can be found through the backward recursion by [1, Theorem 4], there exists a solution to (9) for each $t, \pi_t$. ∎

Remark: In this paper, we showed that a solution of the smaller fixed-point equations (9) exists for each $t, \pi_t$ by showing that a solution exists to the fixed-point equation whose solution is a Nash equilibrium of the whole game, using Glicksberg's fixed-point theorem. This is interesting because the smaller fixed-point equations may have, and in numerical solutions are observed to have, discontinuous utilities. Some results are known on the existence of equilibria in games with discontinuous utilities [8, 9]. It would be an interesting exercise to explore the connection between the two existence results.

V Conclusion

The authors in [1] proposed a novel methodology to find structured perfect Bayesian equilibria of a general model of dynamic games of asymmetric information in which players privately observe their types, which evolve as controlled Markov processes conditioned on everyone's actions, and all players' actions are publicly observed. In this paper, we prove, for each time $t$, the existence of a solution to the fixed-point equation that crucially appears in their methodology and that was left open in that paper.

VI Acknowledgement

The author would like to thank K. S. Mallikarjuna Rao for the suggestion to prove the existence of SPBE.

References

  • [1] D. Vasal, A. Sinha, and A. Anastasopoulos, “A Systematic Process for Evaluating Structured Perfect Bayesian Equilibria in Dynamic Games With Asymmetric Information,” IEEE Transactions on Automatic Control, vol. 64, no. 1, pp. 81–96, Jan. 2019.
  • [2] Y. Ouyang, H. Tavafoghi, and D. Teneketzis, “Dynamic oligopoly games with private Markovian dynamics,” in Proc. 54th IEEE Conf. Decision and Control (CDC), 2015.
  • [3] H. T. Jahormi, “On Design and Analysis of Cyber-Physical Systems with Strategic Agents,” Ph.D. dissertation, University of Michigan, Ann Arbor, 2017.
  • [4] D. Fudenberg and J. Tirole, Game Theory.   Cambridge, MA: MIT Press, 1991.
  • [5] A. Nayyar, A. Mahajan, and D. Teneketzis, “Decentralized stochastic control with partial history sharing: A common information approach,” IEEE Transactions on Automatic Control, vol. 58, no. 7, pp. 1644–1658, 2013.
  • [6] D. Vasal and R. Berry, “α-robust equilibrium in anonymous games,” May 2020. [Online]. Available: http://arxiv.org/abs/2005.06812
  • [7] I. L. Glicksberg, “A Further Generalization of the Kakutani Fixed Point Theorem, with Application to Nash Equilibrium Points,” Proceedings of the American Mathematical Society, vol. 3, no. 1, p. 170, Feb. 1952.
  • [8] P. Dasgupta and E. Maskin, “The existence of equilibrium in discontinuous economic games, I: Theory,” Review of Economic Studies, vol. 53, no. 1, pp. 1–26, 1986.
  • [9] P. Dasgupta and E. Maskin, “The existence of equilibrium in discontinuous economic games, II: Applications,” Review of Economic Studies, vol. 53, no. 1, pp. 27–41, 1986.