Existence of structured perfect Bayesian equilibrium in dynamic games of asymmetric information
Abstract
In [1], authors considered a general finite horizon model of dynamic game of asymmetric information, where players have types evolving as independent Markovian process, where each player observes its own type perfectly and actions of all players. The authors present a sequential decomposition algorithm to find all structured perfect Bayesian equilibria of the game. The algorithm consists of solving a class of fixed-point of equations for each time , whose existence was left as an open question. In this paper, we prove existence of these fixed-point equations for compact metric spaces.
I Introduction
Authors in [1] considered a model consisting of strategic players having types that evolve as conditionally independent Markov controlled processes. Players observe their types privately and actions taken by all players are observed publicly. Instantaneous reward for each player depends on everyone’s types and actions. The proposed methodology provides a decomposition of the interdependence between beliefs and strategies in PBE and enables a systematic evaluation of a subset of PBE, namely structured perfect Bayesian equilibria (SPBE). Here SPBE are defined as equilibria with players strategies based on their current private type and a set of beliefs on each player’s current private type, which is common to all the players and whose domain is time-invariant. The beliefs on players’ types are such that they can be updated individually for each player and sequentially w.r.t. time. The model allows for signaling amongst players as beliefs depend on strategies. They present a sequential decomposition methodology for calculating all SPBEs for finite and infinite horizon dynamic games with asymmetric information.
For the finite horizon model, they provide a two-step algorithm involving a backward recursion followed by a forward recursion. The algorithm works as follows. In the backward recursion, for every time period, the algorithm finds an equilibrium generating function defined for all possible common beliefs at that time. This involves solving a one-step fixed point equation on the space of probability simplexes. Existence of this fixed-point equation was left as an open question. If there exists a solution for every common belief , then the equilibrium strategies and beliefs are obtained through a forward recursion using the equilibrium generating function obtained in the backward step and the Bayes update rule. A similar notion of equilibrium was defined in [2, 3] where authors provide a sequential decomposition algorithm to compute common information based perfect Bayesian equilibrium. In [3], it was shown that such an equilibrium exists for zero-sum games.
In this paper, we consider the finite horizon game with all sets of variables in a compact metric space. We show that there always exists an SPBE for such a game. Since it is proved in [1] that all SPBEs can be found using their algorithm, we conclude that there always exists a solution to the fixed-point equation considered in [1] for each time period .
II Model and Preliminaries
We consider a discrete-time dynamical system with strategic players in the set . We consider two cases: finite horizon with perfect recall and infinite horizon with perfect recall. The system state is , where is the type of player at time , which is perfectly observed and is its private information. Players’ types evolve as conditionally independent, controlled Markov processes such that
(1a) | ||||
(1b) | ||||
(1c) |
where are known kernels. Player at time takes action on observing the actions where , which is common information among players, and the types , which it observes privately. The sets are assumed to be finite. Let be a probabilistic strategy of player where such that player plays action according to . Let be a strategy profile of all players. At the end of interval , player receives an instantaneous reward . To preserve the information structure of the problem, it is assumed that players do not observe their rewards until the end of game.111Alternatively, we could have assumed instantaneous reward of a player to depend only on its own type, i.e. be of the form , and have allowed rewards to be observed by the players during the game as this would not alter the information structure of the game The reward functions and state transition kernels are common knowledge among the players. For the finite-horizon problem, the objective of player is to maximize its total expected reward
(2) |
II-A Preliminaries
Any history of this game at which players take action is of the form . Let be the set of such histories, be the set of all possible such histories in finite horizon and for infinite horizon. At any time player observes and all players together have as common history. Let be the set of observed histories of player at time and be the set of common histories at time . An appropriate concept of equilibrium for such games is PBE [4], which consists of a pair of strategy profile where and a belief profile where that satisfy sequential rationality so that
(3) |
where the reward-to-go is defined as
(4) |
and the beliefs satisfy some consistency conditions as described in [4, p. 331]. In general, a belief for player at time , is defined on history given its private history . Here player ’s private history consists of a public part and a private part . At any time , the relevant uncertainty player has is about other players’ types and their future actions. Due to independence of types, and given the common history , player ’s type history does not provide any additional information about , as will be shown later. For this reason only those beliefs are considered that are functions of each player’s history only through the common history . Hence, for each player , its belief for each history is derived from a common belief . Furthermore, as will be shown later, this belief factorizes into a product of marginals . Thus the following system of beliefs can suffiently be used, , where , and , with the understanding that player ’s belief on is . Under the above structure, all consistency conditions that are required for PBEs [4, p. 331] are automatically satisfied.
III Structured perfect Bayesian algorithm
At any time , player has information where is the common information among players, and is the private information of player . Since increases with time, any strategy of the form becomes unwieldy. Thus it is desirable to have an information state in a time-invariant space that succinctly summarizes , and that can be sequentially updated. IFor any strategy profile , belief on , , is defined as . Define the marginals .
Then using common information approach [5], the problem is posed as follows: player at time observes and takes action , where is a partial (stochastic) function from its private information to , of the form . These actions are generated through some policy , , that operates on the common information such that . Then any policy of the form is equivalent to .
III-A Forward/Backward algorithm
In Lemma 3 in Appendix B of [1], it is shown that due to the independence of types and their evolution as independent controlled Markov processes, for any strategy of the players, the joint common belief can be factorized as a product of its marginals i.e., . Define as vector of marginal beliefs where . Similarly define the vector of belief updates as where (using Bayes rule)
(7) |
In the following, a backward-forward algorithm is presented that evaluates SPBE. where it is shown in [1, Theorem 2], this is a “canonical” methodology, in the sense that all SPBE can be generated this way.
III-B Backward Recursion
In this section, define an equilibrium generating function
, where . In addition, define a sequence of reward-to-go functions of player at time , , where .
These quantities are generated through a backward recursive way, as follows.
-
1.
Initialize ,
(8) -
2.
For , let be generated as follows. Set , where is the solution, if it exists, of the following fixed-point equation, ,
(9) where expectation in (9) is with respect to random variables through the measure and is defined above.
Furthermore, using the quantity found above, define
(10)
Then
Theorem 1 ([1])
A strategy and belief profile , constructed through the backward-forward recursion algorithm is a PBE of the game, i.e., ,
(11) |
IV Main result: Existence
In this section, we present the main result of this paper where we show that for all , there always exists a that satisfies (9). We show that existence holds true for any compact metric spaces . Let be the set of all mixed strategies of the form and be the set of all mixed strategies of the form . Then both are compact.
Lemma 1
Let .
-
(a)
Let be a standard measure on such that for any non empty open measurable set in the Borel space , i.e. L puts non zero measure on every non-empty open measurable set of .222L can be Lesbesgue measure on Euclidean space and counting measure on a discrete space. Let be the space of such strategies such that if then for all ,
(12) If is discrete the above condition boils down to: for all ,
(13) -
(b)
Let be the space of such strategies such that if then for all such that if
(14) it implies,
(15) (The above condition ensures the property of structured policies such that if two action histories produce the same common belief under a strategy, then the strategy is indifferent towards those two action histories).
Then is a compact metric space.
Proof:
-
(a)
Let be any sequence of strategies in . Thus ,
. Thus . Thus is closed. It is also bounded. Thus is compact. -
(b)
Let . Clearly is not empty since the strategy that is independent of action history always lies in this set.
Let . is non empty for the same reason.
Then .
Let }. Since is intersection of , which is compact, with subspace , thus is also compact.
Let }.
Then is the intersection of with subspace , thus is also compact.
This implies is non-empty and compact.
∎
Theorem 3
There exists an SPBE for such a game.
Proof:
Define a perturbation of the game by restricting the set of strategies to . Since utilities are continuous in strategies on (as all strategies put non zero measure on every non empty open set in the action space which implies the Bayes rule has denominator non-zero), and is a compact metric space by Lemma 1. Let
(16) |
We note that is non-empty by Lemma 2 in [1] which states that any expected reward that can be achieved using any general profile of strategies, can also be achieved using structured policies, which implies when the intersection of the argmax operation in (16) is taken over structured policies, it is non-empty. Moreover is continuous in by [6, Theorem 2,3].
Since is continuous on , there exists a fixed-point of by Glicksberg fixed-point Theorem [7]. Consider a sequence of such perturbed games in which ; by the compactness of the set of strategy profiles, some sequence of selections from the sets of Nash equilibria of the games in the sequence converges, say to . Then is an SPBE of the game. ∎
Theorem 4
There exists a solution to the fixed-point equation (9) for each .
Proof:
Remark In this paper, we showed that there exists a solution of smaller fixed-point equations (9) for each by showing that there exists a solution to the fixed-point equation which is the Nash equilibrium of the whole game, using Glicksberg Fixed-point equation. It is interesting because the smaller fixed-point equations may and does have discontinuous utilities, as seen through numerical solutions. There have been some results known in existence of solutions of games with discontinuous utilities [8, 9]. It would be an interesting exercise to explore the connection between existence of the two results.
V Conclusion
Authors in [1] proposed a novel methodology to find structured perfect Bayesian equilibria of a general model of dynamic games of asymmetric information where players observe their types privately, which evolve as a controlled Markov process, condition on everyones actions. Players also observe everyone’s actions publicly. In this paper, we prove the existence of the fixed-point equation that crucially appears in their methodology for each time and that was left open in that paper.
VI Acknowledgement
The author would like to thank K.S. Mallikarjuna Rao for suggestion to prove existence of SPBE.
References
- [1] D. Vasal, A. Sinha, and A. Anastasopoulos, “A Systematic Process for Evaluating Structured Perfect Bayesian Equilibria in Dynamic Games With Asymmetric Information,” IEEE Transactions on Automatic Control, vol. 64, no. 1, pp. 81–96, jan 2019.
- [2] Y. Ouyang, H. Tavafoghi, and D. Teneketzis, “Dynamic oligopoly games with private {M}arkovian dynamics,” in Proc. 54th IEEE Conf. Decision and Control (CDC), 2015.
- [3] H. T. Jahormi, “On Design and Analysis of Cyber-Physical Systems with Strategic Agents,” Ph.D. dissertation, University of Michigan, Ann Arbor, 2017.
- [4] D. Fudenberg and J. Tirole, Game Theory. Cambridge, MA: MIT Press, 1991.
- [5] A. Nayyar, A. Mahajan, and D. Teneketzis, “Decentralized stochastic control with partial history sharing: A common information approach,” Automatic Control, IEEE Transactions on, vol. 58, no. 7, pp. 1644–1658, 2013.
- [6] D. Vasal and R. Berry, “$alpha-$ robust equilibrium in anonymous games,” may 2020. [Online]. Available: http://arxiv.org/abs/2005.06812
- [7] I. L. Glicksberg, “A Further Generalization of the Kakutani Fixed Point Theorem, with Application to Nash Equilibrium Points,” Proceedings of the American Mathematical Society, vol. 3, no. 1, p. 170, feb 1952.
- [8] P. Dasgupta, E. Maskin, and Others, “The existence of equilibrium in discontinuous economic games, I: Theory,” Review of Economic Studies, vol. 53, no. 1, pp. 1–26, 1986.
- [9] P. Dasgupta and E. Maskin, “The existence of equilibrium in discontinuous economic games, II: Applications,” The Review of economic studies, vol. 53, no. 1, pp. 27–41, 1986.