Stochastic Game with Interactive Information Acquisition:
Pipelined Perfect Markov Bayesian Equilibrium
Version: October 5, 2023
Abstract
This paper studies a multi-player, general-sum stochastic game characterized by a dual-stage temporal structure in each period. The agents face uncertainty regarding the time-evolving state that is realized at the beginning of each period. During the first stage, agents engage in information acquisition regarding the unknown state. Each agent strategically selects from multiple signaling options, each carrying a distinct cost. The selected signaling rule dispenses private information that determines the type of the agent. In the second stage, the agents play a Bayesian game by taking actions contingent on their private types. We introduce an equilibrium concept, Pipelined Perfect Markov Bayesian Equilibrium (PPME), which combines Markov perfect equilibrium with perfect Bayesian equilibrium. We propose a novel equilibrium characterization principle termed fixed-point alignment and deliver a set of verifiable necessary and sufficient conditions for any strategy profile to constitute a PPME.
I Introduction
Driven by advancements in technology and improved computational intelligence, the widespread, cost-effective deployment of sensing systems is making it possible for individual participants in large-scale cyber-physical systems to access and process vast amounts of information for real-time decision-making. However, this explosion of information concurrently cultivates an era of uncertainty, largely owing to complex and escalating disparities in information.
Addressing these uncertainties has emerged as a critical cognitive aspect of rational decision-making in environments dominated by asymmetric information: agents must draw on various sources, such as experts and service providers, to mitigate the detrimental effects of uncertainty. For instance, consider a transportation network featuring multiple heterogeneous traffic information providers (TIPs) (e.g., [1]). Here, intelligent vehicles (the agents) subscribe to TIPs to gain insights into global traffic conditions and available routes, thereby reducing information asymmetry.
As a rational actor, each intelligent vehicle selects TIP subscriptions by incorporating information accuracy, often derived from customer reviews, and subscription costs into its anticipated daily travel requirements, typically expressed through payoff functions. Supply-chain service providers (SCPs), for example, may opt for more accurate but costlier TIPs to minimize their expected operational costs due to traffic congestion, whereas individual travelers might tolerate traffic delays and choose less accurate but cheaper TIPs.
The selection of TIPs has direct effects on the routing decisions of intelligent vehicles, which in turn impacts the traffic conditions of the network. Consequently, these traffic conditions influence the travel costs of all vehicles within the network and trigger information updates from their subscribed TIPs. A self-interested, rational SCP might even foresee such interactions and strategically choose TIPs and routing decisions to manipulate the network and its competitors.
The heightened emphasis on rationality necessitates a reconsideration of agents’ information acquisition: from a passive act of receiving information to an integral element of rational behavior. The significance of this expanded rationality is two-pronged. First, with the deluge of information that is often irrelevant, deceptive, or even manipulative, agents must possess the ability to identify and gather information that supports their decision-making processes. Second, due to their dynamic interactions with other agents and the environment, the choices of information sources influence not only an agent’s local actions but also the decision-making of others, as well as the operation of the environment (through the actions), which in turn affects the agent’s own utilities.
A Bayesian agent typically manages uncertainties by relying on priors and forming posterior beliefs [2] about the unobserved elements of the game, or by establishing belief hierarchies (beliefs about the state as well as others’ beliefs; see, e.g., [3, 4]) in competitive multi-agent environments. These priors and beliefs fundamentally shape the rational decision-making of agents since they characterize the game’s uncertainty. The concept of Bayesian persuasion studies how a principal can use her informational advantage to strategically reveal noisy information about the decision-relevant state to the agents, thereby influencing and manipulating agents’ beliefs to induce behavior in her favor [2, 5, 6, 7, 4].
In this work, we focus on a discrete-time, infinite-horizon stochastic game with interactive information acquisition (SGIA) played by a finite group of self-interested agents. SGIA is structured around two sequential decision-making stages within each time period. During the initial stage, agents engage in interactive information acquisition, aiming to procure noisy information about an unknown state that is realized at the beginning of each period. The nature and quality of the information obtained in this stage subsequently define each agent’s type, i.e., her unique private information. Each agent has the autonomy to decide how she gets informed about the unknown state by choosing a specific signaling rule that incurs a cost. We consider that the signaling rules chosen by other agents do not directly influence an agent’s type generation; however, they do affect her beliefs regarding the unknown state and the types of other agents. The observation of private types instigates the second stage of the game. Here, each agent decides on a course of regular action, contingent upon her type. Subsequently, the state evolves according to a Markovian dynamic, dependent on the current state and the action profile of the agents. The decision-making in each stage is simultaneous.
Built upon the concepts of Markov perfect equilibrium [8] and perfect Bayesian equilibrium [9], we propose a new equilibrium notion referred to as the pipelined perfect Markov Bayesian equilibrium (PPME). This concept encapsulates the core consistency between the optimalities of agents’ information acquisition and regular action-taking in a Markovian dynamic environment.
We characterize the PPME based on a principle known as fixed-point alignment. Fixing the strategies for the information acquisition stage, we first formulate the equilibrium behaviors of the regular action-taking as a constrained optimization problem, following the nonlinear-optimization formulation for Nash equilibria of a stochastic game (e.g., [10, 11, 12, 13]). We then propose the global fixed-point alignment (GFPA) to characterize the selected signaling rule profiles that match the fixed point of the optimal information acquisition at the first stage to the fixed point of the optimal action choices at the second stage. The GFPA process can be conceptualized as if there were an information designer who aims to induce certain behaviors (i.e., action-taking) of the agents by designing a set of available signaling rules for the agents to choose from. The agents’ autonomy in choosing signaling rules at the first stage distinguishes our model from the equilibrium analyses in existing Bayesian persuasion or information design in static environments (e.g., [2, 3, 5, 6]) as well as in dynamic models (e.g., [7, 4, 14]). By decomposing the GFPA problem into local fixed-point alignment (LFPA) problems, we obtain a set of verifiable conditions known as local admissibility by applying a KKT-like process to the LFPA problems. Under a mild condition, we show that local admissibility serves as a necessary and sufficient condition, placed on the signaling-rule selections and action choices, for PPME. Thus, if an algorithm converges to locally admissible points, then it provides a PPME for the stochastic game with interactive information acquisition.
The remainder of the paper is organized as follows. In Section II, we present a formal description of the stochastic game model and introduce the equilibrium concept of pipelined perfect Markov Bayesian equilibrium (PPME). Section III-A introduces the concept of global fixed-point alignment (GFPA), while Section III-B provides a detailed elaboration on local fixed-point alignment (LFPA). Section III-C provides discussions and concludes the paper. Omitted proofs are relegated to the online appendix in [15].

II Problem Formulation and Equilibrium
II-A Base Game Model
A finite-player infinite-horizon stochastic game can be characterized by a tuple in which
• There is a finite number of players, denoted by .
• The finite set of states that players can encounter at each period is denoted by .
• The finite set of actions that player can take is denoted by .
• The payoff function of player is denoted by , where .
• The state evolves over time according to ; i.e., the probability of is given by when the period- state is and the action profile is . is the initial distribution of the state.
• is the common discount factor.
For notational compactness, we use history, denoted by , to capture the pair of state and action profile of the last period; i.e., and . We assume that is common knowledge; i.e., and become publicly observable at the end of each period .
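For concreteness, the tuple above can be mirrored in code. The following is a minimal sketch under assumed structure (symmetric action sets, dense arrays); all field names are hypothetical stand-ins for the paper's omitted symbols, not its notation:

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class BaseGame:
    """Container for a finite-player, infinite-horizon stochastic game.
    All field names are illustrative stand-ins for the paper's omitted symbols."""
    n_players: int                  # finite number of players
    n_states: int                   # size of the finite state space
    n_actions: int                  # per-player action count (assumed symmetric here)
    discount: float                 # common discount factor in (0, 1)
    payoff: Optional[np.ndarray] = None       # payoff[i, s, a_1, ..., a_n]
    transition: Optional[np.ndarray] = None   # transition[s, a_1, ..., a_n, s_next]
    initial_dist: Optional[np.ndarray] = None # initial distribution over states
```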
II-B Interactive Information Acquisition
We consider that the agents do not observe the realizations of the state at the beginning of each period. Instead, in each period , the agents engage in interactive information acquisition to obtain additional information about the unobserved state .
Based on history , agents’ interactive information acquisition leads to an information profile, denoted by , at the beginning of each period , where represents the profile of the agents’ choices of information acquisition. Here, is the profile of finite type spaces for the agents. is the joint probability of the state and the type profile. is the signaling rule profile that generates a type profile for the agents. The signaling rule profile is independent if , or correlated if the marginal concerning each agent , , depends on . In this work, we focus on independent signaling rules. In addition, we refer to as the cognition choice profile, where each is agent ’s cognition choice that indexes the agent’s choice of . We assume that the cardinalities of and are the same for all agents; i.e., and , for all .
Given the state dynamics specified by the base game , we assume that the information profile satisfies
(1)
(2)
Given the base game , define the set of available signaling rule profiles by
(3)
After all agents have made their cognition choices, each agent privately receives a type with probability . Based on her type, agent chooses an action .
Each agent ’s information acquisition induces a cognition cost. In this work, we restrict attention to cognition costs that depend on the true state and the action taken by the agent. That is, after choosing , agent suffers a cost that is realized at the end of each period when the true state is and agent takes action . This cost scheme prices agent ’s information acquisition based on the consequences of the information acquisition (i.e., the agent’s local action ) when the true state is .
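Although the corresponding display is omitted above, the expected cognition cost induced by such a state-action-based scheme takes a standard form. Writing, purely as hypothetical stand-in notation, $\phi(\cdot\mid h)$ for the current belief over states, $\lambda_i$ for the chosen signaling rule, $\pi_i$ for the action policy, and $c_i$ for the cost, a sketch of the expected cost reads

$$
\bar c_i(h) \;=\; \sum_{s}\phi(s\mid h)\sum_{\theta_i}\lambda_i(\theta_i\mid s)\sum_{a_i}\pi_i(a_i\mid\theta_i)\,c_i(s,a_i),
$$

so the price of information is mediated entirely by the action it eventually induces in each state.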
II-C Stochastic Game with Interactive Information Acquisition
Let denote the cognition scheme for some , where . The base game and the cognition scheme induce a stochastic game with interactive information acquisition (SGIA), denoted by . Each agent ’s decision-making in each period is described as follows.
• Contingent on the history , agent uses a pure-strategy selection policy to select .
• Contingent on the type , agent uses a mixed-strategy policy to choose a mixed action .
We say that a policy profile is feasible if it satisfies the following constraints:
(FE1)
(FE2)
With reference to Fig. 1, the following events occur in each period of .
1. Nature draws a state according to .
2. Each agent selects a cognition choice based on the common history, which determines . These selections are simultaneous.
3. Based on the cognition choice profile , a type profile is drawn with probability .
4. Each agent privately observes her type and then chooses an action with probability .
5. The state and the action profile (or, ) become public information, and the state transitions to a new state according to .
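To fix ideas, the per-period pipeline above can be simulated directly. The following sketch assumes hypothetical sizes and randomly generated primitives (two agents, Dirichlet-sampled kernels); it illustrates the event order only, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 2 agents, 3 states, 2 types, 2 actions, 2 signaling options.
N, S, TH, A, K = 2, 3, 2, 2, 2

# lam[i][k] is agent i's k-th signaling rule: a row-stochastic (S x TH) matrix
# mapping the realized state to a distribution over agent i's private types.
lam = [[rng.dirichlet(np.ones(TH), size=S) for _ in range(K)] for _ in range(N)]
P = rng.dirichlet(np.ones(S), size=(S, A, A))  # transition kernel P(s' | s, a1, a2)
phi = rng.dirichlet(np.ones(S))                # distribution of the period state

def one_period(s, cognition, action_policy):
    """Play one period of the SGIA: cognition stage, then action stage."""
    # Stage 1: each agent's chosen signaling rule privately generates a type.
    types = [rng.choice(TH, p=lam[i][cognition[i]][s]) for i in range(N)]
    # Stage 2: actions are taken contingent on private types (mixed strategies).
    acts = [rng.choice(A, p=action_policy(i, types[i])) for i in range(N)]
    # The state transitions as a function of the state and the action profile.
    s_next = rng.choice(S, p=P[s, acts[0], acts[1]])
    return types, acts, s_next

s0 = rng.choice(S, p=phi)
uniform = lambda i, th: np.ones(A) / A         # placeholder mixed action policy
print(one_period(s0, cognition=[0, 1], action_policy=uniform))
```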
II-D Value Functions
According to the Ionescu Tulcea theorem [16], , the agents’ policy profile , and a cost profile (for a certain cost scheme) uniquely define a probability measure, denoted by , over . Let denote the expectation operator with respect to . In addition, given any and , we obtain unique probability measures (perceived by agent ) and over and , respectively, for all . In particular, models the uncertainty at period for each agent at the beginning of the selection stage while models the uncertainty for each agent at the beginning of the primitive stage. Let and , respectively, denote the expectation operators with respect to and .
Given , agent ’s period- history value function (H value function) is defined by
(H)
After a type is realized, each agent forms a posterior belief, denoted by , over the state and other agents’ contemporaneous types . Due to (3), there is a (period-) common prior such that each posterior belief satisfies
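The display that should follow is omitted in the text; under independent signaling rules it can be reconstructed, in hypothetical stand-in notation (a common prior $\phi(\cdot\mid h)$ and chosen rules $\lambda_j$), as the usual Bayes update:

$$
\mu_i\big(s,\theta_{-i}\,\big|\,\theta_i, h\big)
\;=\; \frac{\phi(s\mid h)\,\lambda_i(\theta_i\mid s)\prod_{j\neq i}\lambda_j(\theta_j\mid s)}
         {\sum_{s'}\phi(s'\mid h)\,\lambda_i(\theta_i\mid s')}.
$$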
With abuse of notation, we use and for the marginals of . Given and , we define each agent ’s expected immediate reward (due to agent ’s uncertainty about ) by
Given , agent ’s period- history-type value function (HT value function) is defined by
(HT)
Here, depends on through the policy profile , which is used to take the expectation over the current-period action profile. Finally, agent ’s period- history-type-action value function (HTA value function) is defined by
(HTA)
Here, is independent of given the action profile .
II-E Pipelined Perfect Markov Bayesian Equilibrium
In this section, we define a new equilibrium concept for the game . Our focus lies on the stationary equilibrium, and as such, we omit the time indexes of the value functions and variables, unless explicitly mentioned otherwise. First, we construct
(4)
(5)
Definition 1 (Pipelined Perfect Markov Bayesian Equilibrium).
In a game , a (stationary) strategy profile constitutes a pipelined perfect Markov Bayesian equilibrium (PPME) if for all , , , it holds in every period that, for and such that ,
(6)
(7)
The equilibrium concept of PPME builds upon the concepts of Markov perfect equilibrium [8] and perfect Bayesian equilibrium [9].
Lemma 1.
Let with feasible cognition profile . A strategy profile is a PPME of a game if and only if
(8)
Hence, in a PPME problem, each agent tries to solve the optimization problem given :
(9)
Theorem 1.
For any and , there exists at least one profile such that the game admits at least one stationary PPME.
III Equilibrium Characterizations
In this section, we characterize the stationary PPME problem for a given game by formulating it as a constrained optimization problem and establishing a verifiable condition that is both necessary and sufficient.
Following standard dynamic programming arguments (see, e.g., [17]), we represent (H), (HT), and (HTA) recursively as follows:
(10)
(11)
(12)
Here, we denote and with to highlight their dependence on from the Bellman recursions (10)-(12). Note that in the right-hand side (RHS) of (11) is given by (HTA).
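As a concrete numerical reading of the recursions (10)-(12), the following single-agent sketch performs one HTA/HT backup; every object (reward, kernel, posterior, policy, sizes) is a hypothetical stand-in:

```python
import numpy as np

# Hypothetical single-agent slice: S states, TH types, A actions, discount gamma.
S, TH, A, gamma = 3, 2, 2, 0.9
rng = np.random.default_rng(1)

r = rng.random((S, A))                       # immediate reward r(s, a)
P = rng.dirichlet(np.ones(S), size=(S, A))   # transition kernel P(s' | s, a)
mu = rng.dirichlet(np.ones(S), size=TH)      # posterior mu(s | type)
pi = rng.dirichlet(np.ones(A), size=TH)      # mixed action policy pi(a | type)

def hta_backup(V_next):
    """HTA-style backup: posterior-weighted reward plus discounted continuation."""
    cont = r + gamma * (P @ V_next)          # shape (S, A)
    return mu @ cont                         # shape (TH, A): value per (type, action)

def ht_backup(V_next):
    """HT-style backup: average the HTA values over the mixed action, as in (11)."""
    return np.einsum('ta,ta->t', pi, hta_backup(V_next))

V = np.zeros(S)                              # stand-in continuation value function
print(ht_backup(V))                          # HT value for each private type
```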
Define the following objective function:
(OBJ1)
and in addition, construct the following constraints:
(EQ1)
(EQ2)
Define the following set
(13)
Let
(OPT)
Proposition 1.
Fix . In a game , a profile is a PPME if and only if (i) , where is the corresponding optimal HT value functions, and (ii) .
Proposition 1 extends the fundamental formulation of finding a Nash equilibrium of a stochastic game as a nonlinear program (Theorem 3.8.2 of [10]; see also [11, 12, 13]). Here, the constraints (FE1) and (FE2) ensure that each candidate is a valid conditional probability distribution and rule out the possible trivial solution . The constraints (EQ1) and (EQ2) are two necessary conditions for a PPME derived from the optimality of PPME and the Bellman recursions (10) and (11).
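For orientation, the generic shape of such a program in the style of [10] can be written, in hypothetical notation (this is a sketch, not the paper's omitted display), as

$$
\min_{\pi, V}\;\sum_{i}\sum_{h,\theta_i}\Big(V_i(h,\theta_i)-\mathbb{E}_{a\sim\pi(\cdot\mid\theta)}\big[Q_i(h,\theta_i,a)\big]\Big)
\quad\text{s.t.}\quad
V_i(h,\theta_i)\;\ge\;\mathbb{E}_{a_{-i}\sim\pi_{-i}}\big[Q_i(h,\theta_i,(a_i,a_{-i}))\big]\;\;\forall i,h,\theta_i,a_i,
$$

where $Q_i$ abbreviates the HTA expression. The constraints force the objective to be nonnegative, so a feasible point with objective value zero satisfies the Bellman recursions with equality and is an equilibrium.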
III-A Global Fixed-Point Alignment
First, we extend the agents’ strategy profile to . Given as a variable, we define the following term based on (4):
(14)
If we fix a , Proposition 1 implies that of a PPME profile needs to be a global minimum of (OPT) with . Equivalently, given , each needs to be a fixed point of the following equation, for all , , ,
(EQ4)
where the dependence of the RHS of (EQ4) on is due to (12). At the cognition stage, the optimality of PPME requires that the agents’ choice among all possible options is optimal. Given any , define, for all , , ,
(15)
where on the RHS of (15) is an H value function of the next period given the current-period . The optimality of in the cognition stage of a PPME (i.e., constraint (EQ1)) implies that the optimal history value function for each agent needs to be a fixed point while fixing others’ . That is,
(EQ3)
Here, (EQ3) is independent of while (EQ4) is independent of . In order for to be a PPME of , must be chosen such that there exist satisfying
We refer to such a procedure as the Global Fixed-Point Alignment (GFPA).
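Schematically, GFPA asks that the fixed point of the action-stage operator behind (EQ4) and that of the cognition-stage operator behind (EQ3) coincide under the chosen signaling rules. A toy one-dimensional sketch, with both operators hypothetical contractions (aligned here by construction):

```python
import numpy as np

def fixed_point(T, x0, tol=1e-10, max_iter=10_000):
    """Iterate x <- T(x) to an approximate fixed point, assuming T is a contraction."""
    x = x0
    for _ in range(max_iter):
        x_new = T(x)
        if np.max(np.abs(x_new - x)) < tol:
            return x_new
        x = x_new
    return x

# Hypothetical stand-ins: T_act plays the role of the action-stage operator behind
# (EQ4); T_cog plays the role of the cognition-stage operator behind (EQ3). Both
# depend on a scalar cognition parameter e.
def make_ops(e):
    T_act = lambda v: 0.5 * v + e            # contraction with fixed point 2e
    T_cog = lambda v: 0.5 * v + e
    return T_act, T_cog

T_act, T_cog = make_ops(e=1.0)
v_act = fixed_point(T_act, np.zeros(1))
v_cog = fixed_point(T_cog, np.zeros(1))
print(np.allclose(v_act, v_cog))             # GFPA: the two fixed points coincide
```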
Since is a pure strategy, each deterministic choice of leads to a signaling rule that determines a distribution of agent ’s period- types. That is, every determines a unique for every . Hence, for ease of exposition, we use and (with abuse of notation) to represent and , respectively; unless otherwise stated. Therefore, in game , each agent controls .
Given a , define the following function of , , and :
(OBJ2)
Similar to (FE1) and (FE2) for , we introduce the following two constraints placed on :
(RG1)
(RG2)
Define the following set:
(16)
Let
(GFPA)
Proposition 2.
Fix a feasible cognition profile. A profile is a PPME of the game if and only if it is a global minimum of the optimization problem (GFPA).
III-B Local Fixed-Point Alignment
First, we decompose each type space into such that the constraint (RG2) can be reformulated as, for all , , , ,
(17)
Given , define for all ,
Construct the vector . Define
Here, is a function of , , and and is independent of and .
For any , , , define the following function
(18)
Define the following set
(19)
Then, we define the problem of Local Fixed-Point Alignment (LFPA) by
(LFPA)
Let , , , respectively, denote the Lagrange multipliers of the constraints , , and . In addition, the corresponding slack variables are denoted by , , , respectively. Then, the Lagrangian of (LFPA) is defined by
(20)
To simplify the presentation, we omit and . Taking partial derivatives of with respect to and yields,
Let , , , , and . Define
Construct the set
(21)
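The stationarity and complementary-slackness conditions produced by this KKT-like step can be checked numerically. A minimal generic sketch (a toy problem, not the paper's (LFPA) instance):

```python
import numpy as np

def kkt_residuals(grad_f, grads_g, g_vals, lam):
    """Stationarity and complementary-slackness residuals for
    min f(x) s.t. g_k(x) <= 0; both vanish at a KKT point."""
    stationarity = grad_f + grads_g.T @ lam   # grad f + sum_k lam_k grad g_k
    comp_slack = lam * g_vals                 # lam_k * g_k(x)
    return stationarity, comp_slack

# Toy instance: min (x - 1)^2 subject to x <= 0; the constraint is active at
# the minimizer x* = 0 with multiplier lam* = 2.
x = np.array([0.0])
lam = np.array([2.0])
grad_f = 2 * (x - 1)                          # gradient of (x - 1)^2
grads_g = np.array([[1.0]])                   # gradient of g(x) = x
g_vals = x                                    # g(x) = x
print(kkt_residuals(grad_f, grads_g, g_vals, lam))  # both residuals are zero
```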
For any , define
(22)
where is defined via replacing in (12) by . That is,
Define the set
(23)
We define a set of conditions termed local admissibility as follows.
Definition 2 (Local Admissibility).
A profile is locally admissible if and , for all , .
Theorem 2.
Suppose that is a set of linearly independent vectors for all , , , . Then, is a PPME if and only if is locally admissible.
The proof of Theorem 2 is deferred to Appendix -H. Theorem 2 provides necessary and sufficient conditions for characterizing the PPME. In particular, if an algorithm converges to a locally admissible point given a feasible cognition profile under the linear independence assumption, then the associated profile is a PPME. That is, a locally admissible point achieves and .
III-C Discussion
Following the simplification of representing the choice of by , the function given by (OBJ1) can be written as , and and , respectively, given by (13) and (OPT), can be written as and . With abuse of notation, we additionally rewrite and , respectively, as and by fixing an arbitrary . Hence, and become sets of profiles and , respectively.
Suppose that the game admits at least one PPME. Then, a PPME profile (or ) of can be obtained by solving the following bi-level constrained optimization problem:
(24)
The problem (24) is equivalent to
(25)
Let us restrict attention to the problem (25). Proposition 1 implies that for any fixed , any satisfies . Consider the following set
(26)
Then, we can reformulate the problem (25) as
(27)
The total number of decision variables of the optimization problem (27) is . Let denote the set of active constraints and the equality constraints. If we use algorithms that depend on the Linear Independence Constraint Qualification (LICQ) to solve (27), then we require the gradients of to be linearly independent. However, at the global minimum of (27), the number of active constraints plus the number of equality constraints is at least as great as the number of decision variables; i.e., . In general models of , the linear independence of the gradients of is a relatively restrictive condition.
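As a hypothetical illustration of this counting (the exact cardinalities are omitted above), with $n$ agents, $|H|$ histories, $|\Theta|$ types, and $|A|$ actions per agent, the mixed policies and value variables alone contribute

$$
\underbrace{n\,|H|\,|\Theta|\,(|A|-1)}_{\text{mixed actions}} \;+\; \underbrace{n\,|H|\,|\Theta|}_{\text{HT values}}
$$

free variables, each conditional distribution losing one degree of freedom to normalization; once the number of active plus equality constraints exceeds this count, their gradients cannot all be linearly independent.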
Theorem 2 shows that the local admissibility can fully characterize the PPME under a condition that requires to be a set of linearly independent vectors for every , , and . For any given and , for all , which is greater than . Consequently, we can assert that the requirement for linear independence amongst is generally less restrictive compared to the necessity for linear independence among the gradients of .
The local admissibility (Definition 2) can be decomposed into two parts. First, specifies conditions for a policy profile given and . Second, , for all , , which is independent of the profile . The second part of local admissibility implies that any algorithm that searches for the zeros of the gradients of the Lagrangian of (LFPA), while converging to , converges to a PPME. Designing algorithms that converge to locally admissible points is left for future work.
IV Perfect Information Cognition Choice
In this section, we study the case in which the set of available signaling rule profiles contains a signaling rule that reveals the true realization of the state in every period.
IV-A Cognition Cost Schemes
Each agent chooses a cognition choice with a cost . Define . Let denote the power set of , excluding the empty set, i.e., the collection of all non-empty subsets of . Then, we define the set of cost functions whose domain is any non-empty combination of , , and by
Here are some examples of cognition cost functions.
• The cost function is cognition-based (CB) if each ; that is, is the cost if agent chooses . The CB cost directly prices each agent ’s cognition choice.
• The cost function is type-based (TB) if each ; that is, agent suffers a cost if a type is realized to her according to her cognition choice. With the TB cost, each agent ’s cognition decision is priced based on the realized type.
• The cost function is state-type-based (STB) if each ; that is, each agent ’s cognition cost is if the state is and agent receives a type . The STB cost takes into account the realized state and each agent’s type. This cost scheme can capture settings in which the pricing of cognition depends on the difference between the information (about the state) encapsulated in the type and the state itself.
• The cost function is state-action-based (SAB) if each ; that is, each agent suffers a cost that depends on the true state and her local action . This cost scheme prices an agent’s cognition based on the consequences of the information acquisition (i.e., ) given the true state.
• The cost function is mutual information (MI) if each is defined by
where denotes the conditional entropy operator (see, e.g., [18]).
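A minimal computational sketch of the MI cost, assuming a binary state, a uniform prior, and a hypothetical row-stochastic signaling rule; entropies are in bits:

```python
import numpy as np

def mutual_information(prior, lam):
    """I(S; Theta) = H(S) - H(S | Theta) for a prior over states and a
    row-stochastic signaling rule lam[s, th] = Pr(type th | state s)."""
    joint = prior[:, None] * lam               # Pr(s, th)
    p_th = joint.sum(axis=0)                   # marginal over types

    def H(p):
        p = p[p > 0]
        return float(-(p * np.log2(p)).sum())

    # H(S | Theta) = sum_th Pr(th) * H(S | Theta = th)
    h_cond = sum(p_th[t] * H(joint[:, t] / p_th[t])
                 for t in range(lam.shape[1]) if p_th[t] > 0)
    return H(prior) - h_cond

prior = np.array([0.5, 0.5])
perfect = np.eye(2)                            # fully revealing signaling rule
uninformative = np.full((2, 2), 0.5)           # types carry no information
print(mutual_information(prior, perfect))      # 1.0 bit
print(mutual_information(prior, uninformative))  # 0.0 bits
```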
Let denote the cognition profile for some . We assume that the cardinalities of and are the same for all agents; i.e., and , for all .
IV-B Perfect-Information PPME
In this section, we introduce PPME under perfect information. We start by focusing on a general non-stationary case.
Definition 3 (Perfect Information Structure).
An information structure is perfect-information if and there exists for all .
We use to denote the perfect(-information) information structure, where for all , . Let with and denote the menu of signaling rules that contains , and let denote the corresponding cognition profile.
With abuse of notation, we use to denote agent ’s cognition choice that leads to the perfect information structure. Suppose that all other agents choose in every period in the cognition stage. Hence, each agent knows that the other agents observe the true (i.e., ), though agent may not observe (i.e., ). For simplicity, we use as agent ’s signaling rule due to her period- cognition choice without specifying ; the same simplification is made for and . Hence, agent ’s value functions (H)-(HTA) can be rewritten as
(PI-H)
(PI-HT)
(PI-HTA)
Define the set
(28)
The perfect-information PPME (PI-PPME) is defined as follows.
Definition 4 (PI-PPME).
In a game , a profile constitutes a PI-PPME if for all , , it holds in every period that, for all , , , ,
(29)
For any profile , define each agent ’s period- ex-post history-state-action value function (EP-HSA value function) by
(HSA)
Theorem 3 establishes a relationship between a PPME and a PI-PPME in terms of the H value functions.
Theorem 3.
Fix a base game . For a profile that constitutes a PPME of a game with feasible cognition profile for any , there exists a PI-PPME profile of a game with feasible cognition profile for SAB cost profile , such that, for all , , ,
(30)
IV-C Value-Preserving Transformation: From PI-PPME to PPME
Given any history , we define the perfect-information (PI) achievable value set as follows:
(31)
For any PI-PPME with a SAB cost profile, let . For any in , there must exist a PI-PPME with a SAB cost profile that leads to H value for each agent given history . Construct
Define the function by
(32)
Proposition 3.
Given any PI-PPME profile with a SAB cost profile, there exists at least one profile with a feasible cognition profile and any cost profile such that
(fp)
Theorem 4.
Given any PI-PPME profile with a SAB cost profile, a profile with a feasible cognition profile and any cost profile is a PPME if and only if it satisfies (fp).
References
- [1] M. Wu, S. Amin, and A. E. Ozdaglar, “Value of information in Bayesian routing games,” Operations Research, vol. 69, no. 1, pp. 148–163, 2021.
- [2] E. Kamenica and M. Gentzkow, “Bayesian persuasion,” American Economic Review, vol. 101, no. 6, pp. 2590–2615, 2011.
- [3] L. Mathevet, J. Perego, and I. Taneva, “On information design in games,” Journal of Political Economy, vol. 128, no. 4, pp. 1370–1404, 2020.
- [4] T. Zhang and Q. Zhu, “Bayesian promised persuasion: Dynamic forward-looking multiagent delegation with informational burning,” arXiv preprint arXiv:2201.06081, 2022.
- [5] A. Celli, S. Coniglio, and N. Gatti, “Private Bayesian persuasion with sequential games,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 02, 2020, pp. 1886–1893.
- [6] Y. Babichenko, I. Talgam-Cohen, and K. Zabarnyi, “Bayesian persuasion under ex ante and ex post constraints,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 6, 2021, pp. 5127–5134.
- [7] J. Gan, R. Majumdar, G. Radanovic, and A. Singla, “Bayesian persuasion in sequential decision-making,” arXiv preprint arXiv:2106.05137, 2021.
- [8] E. Maskin and J. Tirole, “Markov perfect equilibrium: I. Observable actions,” Journal of Economic Theory, vol. 100, no. 2, pp. 191–219, 2001.
- [9] D. Fudenberg and J. Tirole, Game Theory. Ane Books, 2005. [Online]. Available: https://books.google.com.hk/books?id=Ij7WQwAACAAJ
- [10] J. Filar and K. Vrieze, Competitive Markov Decision Processes: Theory, Algorithms, and Applications. Springer, 1997.
- [11] H. Prasad and S. Bhatnagar, “General-sum stochastic games: Verifiability conditions for Nash equilibria,” Automatica, vol. 48, no. 11, pp. 2923–2930, 2012.
- [12] H. Prasad, L. A. Prashanth, and S. Bhatnagar, “Two-timescale algorithms for learning Nash equilibria in general-sum stochastic games,” in Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, 2015, pp. 1371–1379.
- [13] J. Song, H. Ren, D. Sadigh, and S. Ermon, “Multi-agent generative adversarial imitation learning,” Advances in Neural Information Processing Systems, vol. 31, 2018.
- [14] J. Wu, Z. Zhang, Z. Feng, Z. Wang, Z. Yang, M. I. Jordan, and H. Xu, “Sequential information design: Markov persuasion process and its efficient reinforcement learning,” arXiv preprint arXiv:2202.10678, 2022.
- [15] T. Zhang and Q. Zhu, “Forward-looking dynamic persuasion for pipeline stochastic bayesian game: A fixed-point alignment principle,” arXiv preprint arXiv:2203.09725, 2022.
- [16] O. Hernández-Lerma and J. B. Lasserre, Discrete-time Markov control processes: basic optimality criteria. Springer Science & Business Media, 2012, vol. 30.
- [17] R. Bellman, “Dynamic programming,” Science, vol. 153, no. 3731, pp. 34–37, 1966.
- [18] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, 1999.
-D Proof of Lemma 1
The only if part is straightforward. In particular, if is a PPME, then and (6) holds for all . Hence, (6) also holds for .
We proceed with the proof of the if part by establishing a contradiction. Suppose that there exists a pair such that
(33)
-E Proof of Proposition 1
Suppose that with is a PPME. By the definition of PPME, it is straightforward to show that the constraints (FE1), (FE2), (EQ1), and (EQ2) are satisfied. Hence, is a feasible solution to the optimization problem of (OPT). By construction, . From the feasibility, is a global minimum of the optimization problem of (OPT).
Conversely, suppose that with . Then, the constraints (EQ1) and (EQ2) imply that, for all , , with where with ,
However . Then, we obtain, for all
By iteration, we have that is the unique optimal HT value function profile associated with . In addition, the constraint (EQ1) implies that given , is a PPME selection policy profile. Therefore, the profile is a PPME.
-F Proof of Proposition 2
Suppose that , i.e., is a global minimum of the optimization problem in (OPT) with . Then, the constraints (RG1) and (RG2) are trivially satisfied. Proposition 1 implies that is a PPME. From the construction of in (OBJ1) and the constraint (EQ2), we have that satisfies (EQ4). According to (10), we construct as . Then, . Since satisfies (EQ1) given ,
for all and , which implies (EQ3). From the constraints (EQ3) and (EQ4), we know that for any feasible , where is the corresponding policy profile. Therefore, from , we conclude that is a global minimum of the optimization problem in (GFPA).
-H Proof of Theorem 2
To prove Theorem 2, we show that with if and only if is locally admissible.
Local Admissibility PPME
Fix any and . Suppose that is locally admissible. From ,
Since is a set of linearly independent vectors for all , , , , we have, for all ,
(38) |
In the decomposition , can be fully characterized by . That is,
From , we have . Then, and . From (38) and , we have, for all ,
and for all ,
which implies
(39) | ||||
Therefore, and imply , leading to . In addition, implies that . From Proposition 1, with constitutes a PPME. Then, from Proposition 2, we have with .
PPME Local Admissibility
Suppose that is a PPME. Hence, Proposition 2 implies with . Then, it holds for every that
for all , , , which implies that . Since , we have
Then, from the definition of in (18), . Since for all , we have
By constructing according to (38) and and according to (39), respectively, we can show that there exist Lagrange multipliers such that the conditions in are satisfied.
From Proposition 1, with . Hence, we have
However, . Then, it holds that
Therefore, we have , for all . Thus, we conclude that is locally admissible.
-I Proof of Theorem 3
Consider that is a PPME of a game with any cognition cost profile . Hence, the profile and the base game model induce the probability measures , , and . With abuse of notation, we use to denote the marginal mass or density function corresponding to ; e.g., , , . Given the PPME of a game with feasible cognition profile , consider a profile and a SAB cost profile that satisfy the following, for all , ,
In the PPME of the game , the H value function for any can be given by,
(40)
Given the profile , the H value function for any can be represented in terms of the EP-HSA value function as follows:
(41)
Given , it holds that . Hence,
Next, we prove that the profile with is indeed a PI-PPME. We proceed by contradiction. Let denote a profile that is the same as except for agent ’s period- choice . Define in the same way such that . In addition, the cognition cost remains the same. Given any history , the profile induces the H value function as
Here, because corresponds to . If with is not a PI-PPME, then there must exist a history and a profile such that , which implies that can be strictly improved by a unilateral deviation, contradicting the fact that is a PPME.