
Stochastic Game with Interactive Information Acquisition:
Pipelined Perfect Markov Bayesian Equilibrium
Version 05 October, 2023

Tao Zhang, Quanyan Zhu

Tao Zhang and Quanyan Zhu are with the Department of Electrical and Computer Engineering, New York University, 370 Jay Street, Brooklyn, NY 11201, USA. {tz636, qz494}@nyu.edu
Abstract

This paper studies a multi-player, general-sum stochastic game characterized by a dual-stage temporal structure per period. The agents face uncertainty regarding the time-evolving state that is realized at the beginning of each period. During the first stage, agents engage in information acquisition regarding the unknown state. Each agent strategically selects from multiple signaling options, each carrying a distinct cost. The selected signaling rule dispenses private information that determines the type of the agent. In the second stage, the agents play a Bayesian game by taking actions contingent on their private types. We introduce an equilibrium concept, Pipelined Perfect Markov Bayesian Equilibrium (PPME), which incorporates the Markov perfect equilibrium and the perfect Bayesian equilibrium. We propose a novel equilibrium characterization principle termed fixed-point alignment and deliver a set of verifiable necessary and sufficient conditions for any strategy profile to achieve PPME.

I Introduction

Driven by advances in technology and computational intelligence, the widespread and cost-effective deployment of sensing systems makes it possible for individual participants in large-scale cyber-physical systems to access and process vast amounts of information for real-time decision-making. However, this exposure to information also ushers in an era of uncertainty, largely owing to complex and growing information asymmetries.

In environments dominated by asymmetric information from various sources, such as experts and service providers, managing these uncertainties has become a critical cognitive aspect of rational decision-making because it mitigates the detrimental effects of uncertainty. For instance, consider a transportation network featuring multiple heterogeneous traffic information providers (TIPs) (e.g., [1]). Here, intelligent vehicles (the agents) subscribe to TIPs to gain insights into global traffic conditions and available routes, thereby reducing information asymmetry.

As a rational actor, each intelligent vehicle selects its TIP subscriptions by weighing information accuracy (often inferred from customer reviews) and subscription costs against its anticipated daily travel requirements, which are typically expressed through payoff functions. Supply-chain service providers (SCPs), for example, may opt for more accurate but costlier TIPs to minimize their expected operational costs due to traffic congestion, whereas individual travelers might tolerate traffic delays and choose less accurate but cheaper TIPs.

The selection of TIPs directly affects the routing decisions of intelligent vehicles, which in turn affect the traffic conditions of the network. Consequently, these traffic conditions influence the travel costs of all vehicles within the network and trigger information updates from their subscribed TIPs. A self-interested, rational SCP might even foresee such interactions and strategically choose TIPs and routing decisions to manipulate the network and its competitors.

The heightened emphasis on rationality necessitates a reconsideration of agents’ information acquisition from a passive act of receiving information to an integral element of rational behavior. The significance of this expanded rationality is two-pronged. Firstly, with the deluge of information that is often irrelevant, deceptive, or even manipulative, agents must possess the ability to identify and gather information that supports their decision-making processes. Secondly, due to their dynamic interactions with other agents and the environment, the choices of information sources not only influence an agent’s local actions but also the decision-making of others, as well as the operation of the environment (through the actions), which in turn affects the agent’s own utilities.

A Bayesian agent typically manages uncertainties by relying on priors and forming posterior beliefs [2] about the elements unobserved by the agent, or by establishing belief hierarchies (beliefs about the state as well as others' beliefs; see, e.g., [3, 4]) in competitive multi-agent environments. These priors and beliefs fundamentally shape the rational decision-making of agents since they characterize the game's uncertainty. The concept of Bayesian persuasion studies how a principal can use her informational advantage to strategically reveal noisy information about the state relevant to decision-making to the agents, thereby influencing and manipulating the agents' beliefs to induce behavior in her favor [2, 5, 6, 7, 4].

In this work, we focus on a discrete-time, infinite-horizon stochastic game with interactive information acquisition (SGIA) played by a finite group of self-interested agents. SGIA is structured around two sequential decision-making stages within each time period. In the first stage, agents engage in interactive information acquisition, aiming to procure noisy information about an unknown state that is realized at the beginning of each period. The nature and quality of the information obtained in this stage determine each agent's type, i.e., her private information. Each agent $i$ has the autonomy to decide how she gets informed about the unknown state by choosing a specific signaling rule that incurs a cost. We assume that the signaling rules chosen by other agents do not directly influence the type generation of agent $i$; however, they affect agent $i$'s beliefs regarding the unknown state and the types of other agents. The observation of private types initiates the second stage of the game, in which each agent chooses a regular action contingent upon her type. Subsequently, the state evolves according to Markovian dynamics that depend on the current state and the agents' action profile. The decisions within each stage are made simultaneously.

Built upon the concepts of Markov perfect equilibrium [8] and perfect Bayesian equilibrium [9], we propose a new equilibrium notion referred to as the pipelined perfect Markov Bayesian equilibrium (PPME). This concept encapsulates the core consistency between the optimalities of agents’ information acquisition and regular action-taking in a Markovian dynamic environment.

We characterize the PPME based on a principle known as fixed-point alignment. By fixing the strategies for the information acquisition stage, we first formulate the equilibrium behaviors of regular action-taking as a constrained optimization problem, following the nonlinear optimization formulation for Nash equilibria of a stochastic game (e.g., [10, 11, 12, 13]). We then propose global fixed-point alignment (GFPA) to characterize the selected signaling rule profiles that match the fixed point of optimal information acquisition at the first stage to the fixed point of optimal action choices at the second stage. The GFPA process can be conceptualized as if an information designer aims to induce certain behaviors (i.e., action-taking) of the agents by designing a set of available signaling rules for the agents to choose from. Involving the agents' autonomy of choosing signaling rules at the first stage distinguishes our model from the equilibrium analyses in existing Bayesian persuasion or information design in static environments (e.g., [2, 3, 5, 6]) as well as in dynamic models (e.g., [7, 4, 14]). By decomposing the GFPA problem into local fixed-point alignment (LFPA) problems and applying a KKT-like process to them, we obtain a set of verifiable conditions known as local admissibility. Under a mild condition, we show that local admissibility, placed on the signaling rule selections and action-taking, is a necessary and sufficient condition for PPME. Thus, if an algorithm converges to locally admissible points, then it provides a PPME for the stochastic game with interactive information acquisition.

The remainder of the paper is organized as follows. In Section II, we present a formal description of the stochastic game model and introduce the equilibrium concept of pipelined perfect Markov Bayesian equilibrium (PPME). Section III-A introduces the concept of global fixed-point alignment (GFPA), while Section III-B elaborates on local fixed-point alignment (LFPA). Section III-C provides discussions and concludes the paper. Omitted proofs are deferred to the online appendix in [15].

Figure 1: Flow of events in each period of the game $\mathtt{B}^{\Gamma}$.

II Problem Formulation and Equilibrium

II-A Base Game Model

A finite-player infinite-horizon stochastic game can be characterized by a tuple $\mathtt{B}=\left<N,S,\{A_{i}\},\{R_{i}\},T,\mathring{T},\delta\right>$ in which

  • There is a finite number of players, denoted by $N=\{1,2,\dots,n\}$.

  • The finite set of states that players can encounter at each period is denoted by $S$.

  • The finite set of actions that player $i$ can take is denoted by $A_{i}$.

  • The payoff function of player $i$ is denoted by $R_{i}:S\times A\mapsto\mathbb{R}$, where $A=\prod_{i\in N}A_{i}$.

  • The state evolves over time according to $T:S\times A\mapsto\Delta(S)$; i.e., the probability of $s_{t+1}$ is given by $T(s_{t+1}|s_{t},a_{t})$ when the period-$t$ state is $s_{t}\in S$ and the action profile is $a_{t}\in A$. $\mathring{T}(\cdot)\in\Delta(S)$ is the initial distribution of the state.

  • $\delta\in(0,1)$ is the common discount factor.

For notational compactness, we use the history, denoted by $h_{t}\in H$, to capture the pair of the state and the action profile of the last period; i.e., $h_{t}\equiv(s_{t-1},a_{t-1})$ and $H\equiv S\times A$. We assume that $h_{t}$ is common knowledge; i.e., $s_{t}$ and $a_{t}$ become publicly observable at the end of each period $t$.

II-B Interactive Information Acquisition

We consider that the agents do not observe the realization of the state at the beginning of each period. Instead, in each period $t$, the agents engage in interactive information acquisition to obtain additional information about the unobserved state $s_{t}\in S$.

Based on the history $h_{t}\in H$, agents' interactive information acquisition leads to an information profile, denoted by $\mathcal{I}[h_{t}|g_{t}]\equiv\left<p(\cdot|h_{t},g_{t}),\tau(\cdot,g_{t}),\Theta\right>$, at the beginning of each period $t$, where $g_{t}\equiv(g_{i,t})_{i\in N}\in G\equiv\prod_{i\in N}G_{i}$ represents the profile of the agents' information-acquisition choices. Here, $\Theta\equiv\prod_{i\in N}\Theta_{i}$ is the profile of finite type spaces for the agents; $p(\cdot|h_{t},g_{t})\in\Delta(S\times\Theta)$ is the joint probability of the state and the type profile; and $\tau(\cdot|s_{t},g_{t})\in\Delta(\Theta)$ is the signaling rule profile that generates a type profile $\theta_{t}=(\theta_{i,t})_{i\in N}\in\Theta$ for the agents. The signaling rule profile is independent if $\tau(\theta_{t}|s_{t},g_{t})=\prod_{i\in N}\tau_{i}(\theta_{i,t}|s_{t},g_{i,t})$, or correlated if the marginal concerning each agent $i$, $\tau_{i}(\theta_{i,t}|s_{t},g_{i,t},g_{-i,t})$, depends on $g_{-i,t}$. In this work, we focus on independent signaling rules. In addition, we refer to $g_{t}=(g_{i,t})_{i\in N}\in G$ as the cognition choice profile, where each $g_{i,t}\in G_{i}$ is agent $i$'s cognition choice that indexes the agent's choice of $\tau_{i}(\cdot|s_{t},g_{i,t})$. We assume that the cardinalities of $G_{i}$ and $\Theta_{i}$ are the same for all agents; i.e., $|G_{i}|=|G_{j}|$ and $|\Theta_{i}|=|\Theta_{j}|$ for all $i\neq j$.

Given the state dynamics specified by the base game $\mathtt{B}$, we assume that the information profile $\mathcal{I}[h_{t}|g_{t}]$ satisfies

$$\sum\nolimits_{\theta_{t}\in\Theta}p(s_{t},\theta_{t}|h_{t},g_{t})=T(s_{t}|h_{t}), \tag{1}$$
$$\tau(\theta_{t}|s_{t},g_{t})=\frac{p(s_{t},\theta_{t}|h_{t},g_{t})}{T(s_{t}|h_{t})}. \tag{2}$$
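Conditions (1) and (2) can be checked numerically. The following is a minimal Python sketch, assuming a hypothetical toy instance with two states, two agents, two types per agent, one fixed history $h_{t}$, and one fixed cognition choice profile $g_{t}$; all numbers are illustrative and not taken from the paper. It builds an independent signaling rule profile, forms the candidate common prior $p(s_{t},\theta_{t}|h_{t},g_{t})=T(s_{t}|h_{t})\,\tau(\theta_{t}|s_{t},g_{t})$, and verifies both conditions. The division in (2) presumes $T(s_{t}|h_{t})>0$.

```python
import numpy as np

# Hypothetical toy instance: T_h[s] stands for T(s | h_t) at one fixed history h_t.
T_h = np.array([0.6, 0.4])

# tau_i[s, theta_i] stands for tau_i(theta_i | s, g_i) at the fixed choice g_i in g_t.
tau_1 = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
tau_2 = np.array([[0.7, 0.3],
                  [0.4, 0.6]])

# Independent profile: tau(theta | s, g) = tau_1(theta_1 | s, g_1) * tau_2(theta_2 | s, g_2),
# stored as tau[s, theta_1, theta_2].
tau = np.einsum("sa,sb->sab", tau_1, tau_2)

# Candidate common prior p(s, theta | h, g) = T(s | h) * tau(theta | s, g).
p = T_h[:, None, None] * tau

# Condition (1): marginalizing out the type profile recovers T(s | h).
cond_1 = np.allclose(p.sum(axis=(1, 2)), T_h)
# Condition (2): dividing by T(s | h) recovers the signaling rule profile.
cond_2 = np.allclose(p / T_h[:, None, None], tau)
print(cond_1, cond_2)  # True True
```

Because $p$ is built from $T$ and $\tau$, the two conditions hold by construction here; the check becomes informative when $p$ is specified first and $\tau$ is derived from it.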

Given the base game $\mathtt{B}$, define the set of available signaling rule profiles by

$$\mathcal{T}[G,\Theta]\equiv\left\{\tau\,\middle|\,\begin{aligned}&\tau_{i}(\cdot,g_{i,t}):S\mapsto\Delta(\Theta_{i}),\ \forall i\in N,\\ &g_{i,t}\in G_{i},\ \text{s.t. }\exists\,p(\cdot|h_{t},g_{t})\in\Delta(S\times\Theta)\\ &\text{satisfying (1) and (2)},\ \forall h_{t}\in H\end{aligned}\right\}. \tag{3}$$

After all agents have made their cognition choices, each agent $i$ privately receives a type $\theta_{i,t}\in\Theta_{i}$ with probability $\tau_{i}(\theta_{i,t}|s_{t},g_{i,t})$. Based on his type, agent $i$ chooses an action $a_{i,t}\in A_{i}$.

Each agent $i$'s information acquisition induces a cognition cost. In this work, we restrict attention to cognition costs that depend on the true state and the action taken by the agent. That is, after choosing $g_{i,t}$, agent $i$ incurs a cost $C_{i}(s_{t},a_{i,t})$ that is realized at the end of each period when the true state is $s_{t}$ and agent $i$ takes action $a_{i,t}$. This cost scheme prices agent $i$'s information acquisition based on its consequences (i.e., the agent's local action $a_{i,t}$) when the true state is $s_{t}$.

II-C Stochastic Game with Interactive Information Acquisition

Let $\Gamma\equiv\{\mathcal{T}^{\dagger}[G,\Theta],C\}$ denote the cognition scheme for some $\mathcal{T}^{\dagger}[G,\Theta]\subseteq\mathcal{T}[G,\Theta]$, where $C=\{C_{i}\}_{i\in N}$. The base game $\mathtt{B}$ and the cognition scheme $\Gamma$ induce a stochastic game with interactive information acquisition (SGIA), denoted by $\mathtt{B}^{\Gamma}$. Each agent $i$'s decision-making in each period $t$ is described as follows.

  • Contingent on the history $h_{t}$, agent $i$ uses a pure-strategy selection policy $\beta_{i,t}(h_{t})\in G_{i}$ to select $g_{i,t}\in G_{i}$.

  • Contingent on the type $\theta_{i,t}$, agent $i$ uses a mixed-strategy policy $\pi_{i}$ to choose the mixed action $\pi_{i,t}(\theta_{i,t},h_{t})=(\pi_{i}(a|\theta_{i,t},h_{t}))_{a\in A_{i}}\in\Delta(A_{i})$.

We say that a policy profile $\pi=(\pi_{i,t},\pi_{-i,t})_{t\geq 1}$ is feasible if it satisfies the following constraints:

$$\pi_{i,t}(a_{i,t}|\theta_{i,t})\geq 0,\ \forall a_{i,t}\in A_{i},\theta_{i,t}\in\Theta_{i},i\in N,t\geq 1, \tag{FE1}$$
$$\sum\nolimits_{a_{i,t}\in A_{i}}\pi_{i,t}(a_{i,t}|\theta_{i,t})=1,\ \forall\theta_{i,t}\in\Theta_{i},i\in N,t\geq 1. \tag{FE2}$$
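Constraints (FE1) and (FE2) state that each $\pi_{i,t}(\cdot|\theta_{i,t})$ is a probability vector. A minimal Python sketch of a feasibility check, assuming a hypothetical array layout `pi_i[theta_i, a_i]` for one agent at one history (the layout and the function names are illustrative, not from the paper):

```python
import numpy as np

def is_feasible_policy(pi_i, tol=1e-9):
    """Check (FE1)-(FE2) for one agent; pi_i[theta_i, a_i] encodes pi_i(a_i | theta_i)."""
    pi_i = np.asarray(pi_i, dtype=float)
    nonnegative = bool(np.all(pi_i >= -tol))                     # (FE1)
    rows_sum_to_one = bool(np.allclose(pi_i.sum(axis=1), 1.0))   # (FE2), one row per type
    return nonnegative and rows_sum_to_one

def is_feasible_profile(pi):
    """A profile is feasible if every agent's policy passes the check."""
    return all(is_feasible_policy(pi_i) for pi_i in pi)

pi_ok = [np.array([[0.5, 0.5], [1.0, 0.0]]), np.array([[0.2, 0.8], [0.3, 0.7]])]
pi_zero = [np.zeros((2, 2)), np.array([[0.2, 0.8], [0.3, 0.7]])]
print(is_feasible_profile(pi_ok))    # True
print(is_feasible_profile(pi_zero))  # False: the all-zero policy violates (FE2)
```

The second profile illustrates why (FE2) matters: the all-zero policy satisfies (FE1) but is not a distribution.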

With reference to Fig. 1, the following events occur in each period $t$ of $\mathtt{B}^{\Gamma}$.

  1. Nature draws a state $s_{t}\in S$ according to $T(\cdot|h_{t})\in\Delta(S)$.

  2. Each agent $i$ selects a cognition choice $g_{i,t}\in G_{i}$ based on the common history, which determines $\tau_{i}$. These selections are simultaneous.

  3. Based on the cognition choice profile $g_{t}$, a type profile $\theta_{t}=(\theta_{i,t})_{i\in N}$ is drawn with probability $\tau(\theta_{t}|s_{t},g_{t})$.

  4. Each agent $i$ privately observes his type $\theta_{i,t}$ and then chooses an action $a_{i,t}$ with probability $\pi_{i,t}(a_{i,t}|\theta_{i,t})$.

  5. The state $s_{t}$ and the action profile $a_{t}=(a_{i,t},a_{-i,t})$ (i.e., $h_{t+1}=(s_{t},a_{t})$) become public information, and the state transitions to a new state $s_{t+1}$ according to $T(\cdot|h_{t+1})\in\Delta(S)$.
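The five steps can be simulated directly. The following Python sketch plays a few periods of a hypothetical symmetric two-agent instance with binary states, types, and actions; the transition rule `T`, the signaling rule `tau_i`, and the policies `beta_i` and `pi_i` are illustrative placeholders chosen only so that the code runs, not quantities from the paper.

```python
import random

def T(h):
    """Hypothetical T(. | h): next-state distribution given h = (s_prev, a_prev)."""
    s_prev, a_prev = h
    p_stay = 0.8 if sum(a_prev) == 0 else 0.5
    return {s_prev: p_stay, 1 - s_prev: 1 - p_stay}

def tau_i(s, g_i):
    """Hypothetical tau_i(. | s, g_i): cognition choice 1 is more accurate than 0."""
    accuracy = 0.9 if g_i == 1 else 0.6
    return {s: accuracy, 1 - s: 1 - accuracy}

def beta_i(h):
    """Hypothetical pure selection policy: pick the accurate signal after state 1."""
    return 1 if h[0] == 1 else 0

def pi_i(theta_i, h):
    """Hypothetical mixed action policy: lean toward the action matching the type."""
    return {theta_i: 0.7, 1 - theta_i: 0.3}

def draw(dist, rng):
    """Sample one outcome from a {outcome: probability} dictionary."""
    outcomes, weights = zip(*dist.items())
    return rng.choices(outcomes, weights=weights)[0]

def play_period(h, rng, n_agents=2):
    s = draw(T(h), rng)                                   # 1. nature draws s_t
    g = [beta_i(h) for _ in range(n_agents)]              # 2. simultaneous cognition choices
    theta = [draw(tau_i(s, g_i), rng) for g_i in g]       # 3. independent private types
    a = tuple(draw(pi_i(th, h), rng) for th in theta)     # 4. type-contingent actions
    return (s, a), g, theta                               # 5. h_{t+1} = (s_t, a_t) becomes public

rng = random.Random(0)
h = (0, (0, 0))  # arbitrary starting history, a placeholder for a draw from the initial distribution
for t in range(3):
    h, g, theta = play_period(h, rng)
    print(t, g, theta, h)
```

Only the returned history is shared across periods; the types in `theta` stay private, which mirrors step 4.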

II-D Value Functions

According to the Ionescu Tulcea theorem [16], $\{\mathring{T},T,\tau\}$, the agents' policy profile $\left<\beta,\pi\right>$, and a cost profile (for a given cost scheme) uniquely define a probability measure, denoted by $P[\tau,\beta,\pi]$, over $(S\times G\times\Theta\times A)^{\infty}$. Let $\mathbb{E}^{\tau}_{\beta,\pi}[\cdot]$ denote the expectation operator with respect to $P[\tau,\beta,\pi]$. In addition, given any $h_{t}$ and $(h_{t},\theta_{i,t})$, we obtain unique probability measures (perceived by agent $i$) $P[\tau,\beta,\pi|h_{t}]$ and $P[\tau,\beta,\pi|h_{t},\theta_{i,t}]$ over $S\times G_{-i}\times\Theta\times A\times(S\times G\times\Theta\times A)^{\infty}$ and $S\times\Theta_{-i}\times A\times(S\times G\times\Theta\times A)^{\infty}$, respectively, for all $i\in N$. In particular, $P[\tau,\beta,\pi|h_{t}]$ models the uncertainty of each agent $i$ at the beginning of the selection stage of period $t$, while $P[\tau,\beta,\pi|h_{t},\theta_{i,t}]$ models the uncertainty of each agent $i$ at the beginning of the primitive stage. Let $\mathbb{E}^{\tau}_{\beta,\pi}[\cdot|h_{t}]$ and $\mathbb{E}^{\tau}_{\beta,\pi}[\cdot|h_{t},\theta_{i,t}]$ denote the expectation operators with respect to $P[\tau,\beta,\pi|h_{t}]$ and $P[\tau,\beta,\pi|h_{t},\theta_{i,t}]$, respectively.

Given $P[\tau,\beta,\pi|h_{t}]$, agent $i$'s period-$t$ history value function (H value function) is defined by

$$J_{i,t}(h_{t}|\tau,\beta,\pi)\equiv\mathbb{E}^{\tau}_{\beta,\pi}\left[\sum_{k=t}^{\infty}\delta^{k-t}\left(R_{i}(\tilde{s}_{k},\tilde{a}_{k})+C_{i}(\tilde{s}_{k},\tilde{a}_{i,k})\right)\middle|h_{t}\right]. \tag{H}$$

After a type $\theta_{i,t}$ is realized, each agent $i$ forms a posterior belief, denoted by $\mu_{i}(\cdot|\theta_{i,t},h_{t})\in\Delta(S\times\Theta_{-i})$, over the state $s_{t}$ and the other agents' contemporaneous types $\theta_{-i,t}$. Due to (3), there is a (period-$t$) common prior $p$ such that each posterior belief satisfies

$$\mu_{i}(s_{t},\theta_{-i,t}|\theta_{i,t},h_{t},g_{t})=\frac{p(s_{t},\theta_{i,t},\theta_{-i,t}|h_{t},g_{t})}{\sum_{s_{t},\theta_{-i,t}}p(s_{t},\theta_{i,t},\theta_{-i,t}|h_{t},g_{t})}.$$

With abuse of notation, we use $\mu_{i}(s_{t}|\theta_{i,t},h_{t})$ and $\mu_{i}(\theta_{-i,t}|\theta_{i,t},h_{t})$ for the marginals of $\mu_{i}(s_{t},\theta_{-i,t}|\theta_{i,t},h_{t})$. Given $h_{t}$ and $\theta_{i,t}$, we define each agent $i$'s expected immediate reward (taken over agent $i$'s uncertainty about $s_{t}$) by

$$\overline{R}_{i}(h_{t},\theta_{i,t},a_{t})\equiv\sum_{s_{t}\in S}\left(R_{i}(s_{t},a_{t})+C_{i}(s_{t},a_{i,t})\right)\mu_{i}(s_{t}|\theta_{i,t},h_{t}).$$
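The posterior $\mu_{i}$ and the expected immediate reward $\overline{R}_{i}$ are finite sums. The Python sketch below computes both for agent 1 in a hypothetical two-agent instance with two states, two types, and two actions at a fixed $(h_{t},g_{t})$. All numbers are illustrative, and the costs `C_1` are entered as negative values so that adding them to $R_{1}$, as in the definition of $\overline{R}_{i}$, lowers the payoff (a sign convention assumed here).

```python
import numpy as np

# Hypothetical common prior p[s, theta_1, theta_2] for a fixed (h_t, g_t),
# built from an independent signaling rule as in (1)-(2).
T_h = np.array([0.6, 0.4])
tau_1 = np.array([[0.9, 0.1], [0.2, 0.8]])
tau_2 = np.array([[0.7, 0.3], [0.4, 0.6]])
p = T_h[:, None, None] * tau_1[:, :, None] * tau_2[:, None, :]

def posterior_agent1(p):
    """mu_1(s, theta_2 | theta_1) by Bayes' rule, indexed [theta_1, s, theta_2]."""
    joint = np.transpose(p, (1, 0, 2))                # reorder to [theta_1, s, theta_2]
    evidence = joint.sum(axis=(1, 2), keepdims=True)  # marginal probability of theta_1
    return joint / evidence

mu_1 = posterior_agent1(p)
mu_1_state = mu_1.sum(axis=2)     # marginal mu_1(s | theta_1), indexed [theta_1, s]

# Hypothetical payoffs R_1[s, a_1, a_2] and cognition costs C_1[s, a_1].
R_1 = np.array([[[1.0, 0.0], [0.0, 0.5]],
                [[0.0, 0.5], [1.0, 0.0]]])
C_1 = np.array([[-0.1, -0.3],
                [-0.2, -0.1]])

# bar R_1(theta_1, a) = sum_s (R_1(s, a) + C_1(s, a_1)) mu_1(s | theta_1).
stage_payoff = R_1 + C_1[:, :, None]              # [s, a_1, a_2]
R_bar_1 = np.einsum("ts,sxy->txy", mu_1_state, stage_payoff)
print(mu_1_state[0])   # posterior over s after observing theta_1 = 0
```

For instance, after $\theta_{1}=0$ the posterior on $s=0$ is $0.6\cdot 0.9/(0.6\cdot 0.9+0.4\cdot 0.2)=0.54/0.62$, since agent 2's signal integrates out under independence.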

Given $P[\tau,\beta,\pi|h_{t}]$, agent $i$'s period-$t$ history-type value function (HT value function) is defined by

$$V_{i,t}(h_{t},\theta_{t}|\tau,\beta,\pi)\equiv\mathbb{E}^{\tau}_{\beta,\pi(\cdot|\theta_{t})}\Bigg[\overline{R}_{i}(h_{t},\theta_{i,t},\tilde{a}_{t})+\sum_{k=t+1}^{\infty}\delta^{k-t}\left(R_{i}(\tilde{s}_{k},\tilde{a}_{k})+C_{i}(\tilde{s}_{k},\tilde{a}_{i,k})\right)\Bigg|h_{t},\theta_{i,t}\Bigg]. \tag{HT}$$

Here, $V_{i,t}(\cdot)$ depends on $\theta_{-i,t}$ through the policy profile $\pi(\cdot|\theta_{t})$, which is used to take the expectation over the current-period action profile. Finally, agent $i$'s period-$t$ history-type-action value function (HTA value function) is defined by

$$Q_{i,t}(h_{t},\theta_{i,t},a_{t}|\tau,\beta,\pi)\equiv\overline{R}_{i}(h_{t},\theta_{i,t},a_{t})+\mathbb{E}^{\tau}_{\beta,\pi}\left[\sum_{k=t+1}^{\infty}\delta^{k-t}\left(R_{i}(\tilde{s}_{k},\tilde{a}_{k})+C_{i}(\tilde{s}_{k},\tilde{a}_{i,k})\right)\middle|h_{t},\theta_{i,t}\right]. \tag{HTA}$$

Here, $Q_{i,t}(\cdot)$ is independent of $\theta_{-i,t}$ given the action profile $a_{t}$.

II-E Pipelined Perfect Markov Bayesian Equilibrium

In this section, we define a new equilibrium concept for the game $\mathtt{B}^{\Gamma}$. We focus on stationary equilibria and therefore omit the time indexes of the value functions and variables unless explicitly mentioned otherwise. First, we construct

$$\mathtt{EV}_{i}(h,\theta_{i}|\tau,\beta,\pi;V_{i})\equiv\mathbb{E}^{\tau}_{\beta}\left[V_{i}(h,\theta_{i},\tilde{\theta}_{-i}|\beta,\tau,\pi)\middle|h,\theta_{i}\right], \tag{4}$$
$$\Pi[\beta;\mathtt{B}^{\Gamma}]\equiv\left\{\pi\,\middle|\,\begin{aligned}&\mathtt{EV}_{i}(h,\theta_{i}|\tau,\beta,(\pi_{i},\pi_{-i});V_{i})\\ &\geq\mathtt{EV}_{i}(h,\theta_{i}|\tau,\beta,(\pi^{\prime}_{i},\pi_{-i});V_{i}),\\ &\forall i\in N,\theta_{i}\in\Theta_{i},\pi^{\prime}_{i},\ \text{(FE1)},\text{(FE2)}\end{aligned}\right\}. \tag{5}$$
Definition 1 (Pipelined Perfect Markov Bayesian Equilibrium).

In a game $\mathtt{B}^{\Gamma}$, a (stationary) strategy profile $\left<\beta,\pi\right>$ constitutes a pipelined perfect Markov Bayesian equilibrium (PPME) if, for all $i\in N$, $h\in H$, and $\theta_{i}\in\Theta_{i}$, it holds in every period that, for any $\beta^{\prime}_{i}$ and $\pi^{\prime}_{i}$ such that $(\pi^{\prime}_{i},\pi_{-i})\in\Pi[\beta^{\prime}_{i},\beta_{-i};\mathtt{B}^{\Gamma}]$,

$$J_{i}(h|\tau,(\beta_{i},\beta_{-i}),\pi)\geq J_{i}(h|\tau,(\beta^{\prime}_{i},\beta_{-i}),(\pi^{\prime}_{i},\pi_{-i})), \tag{6}$$
$$\text{and }\pi\in\Pi[\beta;\mathtt{B}^{\Gamma}]. \tag{7}$$

The equilibrium concept of PPME builds upon the concepts of Markov perfect equilibrium [8] and perfect Bayesian equilibrium [9].

Lemma 1.

Let $\mathtt{B}^{\Gamma}$ be a game with a feasible cognition profile $\Gamma$. A strategy profile $\left<\beta,\pi\right>$ is a PPME of the game $\mathtt{B}^{\Gamma}$ if and only if

$$J_{i}(h|\tau,(\beta_{i},\beta_{-i}),\pi)\geq J_{i}(h|\tau,(\beta^{\prime}_{i},\beta_{-i}),\pi), \tag{8}$$
$$\text{and }\pi\in\Pi[\beta;\mathtt{B}^{\Gamma}].$$

Hence, in a PPME problem, each agent $i\in N$ tries to solve the following optimization problem given $h\in H$:

$$\max_{\beta_{i},\pi_{i}}\ J_{i}(h|\tau,(\beta_{i},\beta_{-i}),(\pi_{i},\pi_{-i})),\ \text{s.t. }(\pi_{i},\pi_{-i})\in\Pi[\beta_{i},\beta_{-i};\mathtt{B}^{\Gamma}]. \tag{9}$$
Theorem 1.

Fix any $G$ and $\Theta$. There exists at least one profile $\tau\in\mathcal{T}[G,\Theta]$ such that the game $\mathtt{B}^{\Gamma}$ admits at least one stationary PPME.

III Equilibrium Characterizations

In this section, we characterize the stationary PPME problem for a given game 𝙱Γ\mathtt{B}^{\Gamma} by formulating it as a constrained optimization problem and establishing a verifiable condition that is both necessary and sufficient.

Following a standard dynamic programming argument (see, e.g., [17]), we represent (H), (HT), and (HTA) recursively as follows:

$$\mathbf{J}_{i}(h|\tau,\beta;V_{i})=\sum_{\theta,s}V_{i}(h,\theta|\tau,\beta,\pi)\,\tau(\theta|s,\beta(h))\,T(s|h), \tag{10}$$
$$V_{i}(h,\theta|\tau,\beta,\pi)=\sum_{a}\pi(a|\theta_{i},\theta_{-i})\,Q_{i}(h,\theta_{i},a|\tau,\beta,\pi), \tag{11}$$
$$\mathbf{Q}_{i}(h,\theta_{i},a|\tau,\beta;V_{i})=\overline{R}_{i}(h,\theta_{i},a)+\delta\sum_{s}\mathbf{J}_{i}(s,a|\tau,\beta;V_{i})\,\mu_{i}(s|\theta_{i};\tau). \tag{12}$$

Here, we write $\mathbf{J}_{i}(\cdot;V_{i})$ and $\mathbf{Q}_{i}(\cdot;V_{i})$ with $V_{i}$ to highlight their dependence on $V_{i}$ through the Bellman recursions (10)-(12). Note that $Q_{i}(\cdot|\tau,\beta,\pi)$ on the right-hand side (RHS) of (11) is given by (HTA).
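For fixed $\beta$, $\pi$, and $\tau$, the recursions (10)-(12) define a $\delta$-contraction in $\mathbf{J}_{i}$ (all weights are probabilities and $\delta<1$), so $\mathbf{J}_{i}$ can be computed by successive approximation. The Python sketch below does this for agent 1 in a hypothetical two-agent instance with random primitives (fixed seed), binary states, actions, types, and cognition choices, and a policy $\pi$ that, for brevity, does not depend on $h$. It follows (10)-(12) as written, including weighting the continuation $\mathbf{J}_{1}(s,a)$ by $\mu_{1}(s|\theta_{1})$; it is an illustration under these assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, nTh, nG, delta = 2, 2, 2, 2, 0.9

# Histories h = (s, a_1, a_2) are flattened to integers; next_h[s, a_1, a_2] is the index.
next_h = np.arange(nS * nA * nA).reshape(nS, nA, nA)
nH = next_h.size

def normalize(x, axis=-1):
    """Rescale nonnegative entries so they sum to one along `axis`."""
    return x / x.sum(axis=axis, keepdims=True)

# Hypothetical primitives (random, fixed by the seed).
T = normalize(rng.random((nH, nS)))              # T(s | h)
tau_1 = normalize(rng.random((nG, nS, nTh)))     # tau_1(theta_1 | s, g_1)
tau_2 = normalize(rng.random((nG, nS, nTh)))     # tau_2(theta_2 | s, g_2)
beta_1 = rng.integers(nG, size=nH)               # pure selection policy beta_1(h)
beta_2 = rng.integers(nG, size=nH)               # pure selection policy beta_2(h)
pi_1 = normalize(rng.random((nTh, nA)))          # pi_1(a_1 | theta_1), history-independent here
pi_2 = normalize(rng.random((nTh, nA)))          # pi_2(a_2 | theta_2)
R_1 = rng.random((nS, nA, nA))                   # R_1(s, a_1, a_2)
C_1 = -0.1 * rng.random((nS, nA))                # C_1(s, a_1), negative values = cost

# Common prior p(s, theta | h, beta(h)), indexed [h, s, theta_1, theta_2].
p = T[:, :, None, None] * tau_1[beta_1][:, :, :, None] * tau_2[beta_2][:, :, None, :]
# Agent 1's posterior marginal mu_1(s | theta_1, h), indexed [h, theta_1, s].
mu_1 = normalize(p.sum(axis=3), axis=1).transpose(0, 2, 1)
stage = R_1 + C_1[:, :, None]                     # R_1(s, a) + C_1(s, a_1)
R_bar = np.einsum("hts,sxy->htxy", mu_1, stage)   # expected immediate reward

def bellman(J):
    """One pass through (12), (11), (10) for agent 1; J is indexed by history."""
    cont = np.einsum("hts,sxy->htxy", mu_1, J[next_h])  # sum_s J(s, a) mu_1(s | theta_1, h)
    Q = R_bar + delta * cont                             # (12)
    V = np.einsum("htxy,tx,uy->htu", Q, pi_1, pi_2)      # (11)
    J_new = np.einsum("hstu,htu->h", p, V)               # (10)
    return J_new, V, Q

J = np.zeros(nH)
for _ in range(1000):
    J_new, V, Q = bellman(J)
    converged = np.max(np.abs(J_new - J)) < 1e-12
    J = J_new
    if converged:
        break
print(np.round(J, 4))
```

The contraction argument also bounds the result: $|\mathbf{J}_{1}(h)|\leq\max|R_{1}+C_{1}|/(1-\delta)$ for every $h$.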

Leveraging (10)-(12), define

$$Z(\pi,V|\beta,\tau)\equiv\sum_{h,s,\theta}\Bigg(\sum_{i}\Big(V_{i}(h,\theta)-\sum_{a}\mathbf{Q}_{i}(h,\theta_{i},a|\tau,\beta;V_{i})\,\pi(a|\theta)\Big)\times\tau(\theta|s,\beta(h))\,T(s|h)\Bigg), \tag{OBJ1}$$
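The objective (OBJ1) aggregates, over agents, the gap between $V_{i}(h,\theta)$ and the $\pi$-average of $\mathbf{Q}_{i}$, weighted by $\tau(\theta|s,\beta(h))T(s|h)$. The Python sketch below evaluates it for hypothetical random inputs with two agents; `Q` stands in for the output of (12), and `tau` is a generic distribution over type profiles with $\beta$ already folded in. When $V$ satisfies (11) exactly, $Z=0$; adding a constant $c$ to every $V_{i}$ raises $Z$ by $n\,c\,|H|$, because the weights sum to one for each $h$.

```python
import numpy as np

rng = np.random.default_rng(1)
nH, nS, nTh, nA = 3, 2, 2, 2

def normalize(x, axis=-1):
    """Rescale nonnegative entries so they sum to one along `axis`."""
    return x / x.sum(axis=axis, keepdims=True)

# Hypothetical inputs for two agents; beta is folded into tau(theta | s, beta(h)).
T = normalize(rng.random((nH, nS)))                                         # T(s | h)
tau = normalize(rng.random((nH, nS, nTh * nTh))).reshape(nH, nS, nTh, nTh)  # tau(theta | s, beta(h))
pi_1 = normalize(rng.random((nTh, nA)))                                     # pi_1(a_1 | theta_1)
pi_2 = normalize(rng.random((nTh, nA)))                                     # pi_2(a_2 | theta_2)
# Stand-ins for bold Q_i(h, theta_i, a_1, a_2) produced by the recursion (12).
Q = [rng.random((nH, nTh, nA, nA)), rng.random((nH, nTh, nA, nA))]

def expected_q(i):
    """sum_a Q_i(h, theta_i, a) pi(a | theta), indexed [h, theta_1, theta_2]."""
    if i == 0:
        return np.einsum("htxy,tx,uy->htu", Q[0], pi_1, pi_2)   # Q_1 depends on theta_1
    return np.einsum("huxy,tx,uy->htu", Q[1], pi_1, pi_2)       # Q_2 depends on theta_2

def Z(V):
    """Objective (OBJ1): Bellman gaps of all agents weighted by tau(theta | s, beta(h)) T(s | h)."""
    weights = tau * T[:, :, None, None]                 # [h, s, theta_1, theta_2]
    gap = sum(V[i] - expected_q(i) for i in range(2))   # [h, theta_1, theta_2]
    return float(np.einsum("hstu,htu->", weights, gap))

V_consistent = [expected_q(0), expected_q(1)]     # V_i satisfying (11) exactly
V_inflated = [v + 0.5 for v in V_consistent]      # every V_i shifted up by 0.5
print(Z(V_consistent))   # approximately 0
print(Z(V_inflated))     # approximately 3.0 = 2 agents * 0.5 * 3 histories
```

This only illustrates the objective; in (OPT) the minimization is taken subject to the constraints in $\mathcal{K}(\beta|\tau)$.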

and, in addition, construct the following constraints:

$$\begin{cases}\mathbf{J}_{i}(h|\tau,\beta;V_{i})\geq\sum_{s,\theta_{-i}}V_{i}(h,\theta_{i},\theta_{-i})\,\tau_{-i}(\theta_{-i}|s,\beta_{-i}(h))\,T(s|h),\\ \forall i,h,\theta_{i}\text{ with }\sum_{s}\tau_{i}(\theta_{i}|s,\beta_{i}(h))\,T(s|h)>0,\end{cases} \tag{EQ1}$$
$$\begin{cases}\mathtt{EV}_{i}(h,\theta_{i}|\tau,\beta,\pi;V_{i})\geq\mathbb{E}^{\tau}_{\beta,\pi_{-i}}\Big[\mathbf{Q}_{i}(h,\theta_{i},a_{i},\tilde{a}_{-i}|\tau,\beta;V_{i})\Big|h,\theta_{i}\Big],\\ \forall i,a_{i},h,\theta_{i}\text{ with }\sum_{s}\tau_{i}(\theta_{i}|s,\beta_{i}(h))\,T(s|h)>0.\end{cases} \tag{EQ2}$$
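Constraint (EQ2) says that, at every history and every positive-probability type, no pure deviation $a_{i}$ yields a higher expected $\mathbf{Q}_{i}$ than agent $i$'s own mixed action. The Python sketch below checks it for agent 1 at a single history, assuming a hypothetical two-agent instance in which $\mathbf{Q}_{1}$, $\pi$, and the posterior $\mu_{1}(\theta_{-1}|\theta_{1},h)$ are given as arrays and every type of agent 1 has positive probability; the function name and array layouts are illustrative.

```python
import numpy as np

def satisfies_eq2(Q_1, pi_1, pi_2, mu_1, tol=1e-9):
    """Check (EQ2) for agent 1 at one history h.

    Q_1[theta_1, a_1, a_2]: bold Q_1(h, theta_1, a) from (12)
    pi_1[theta_1, a_1], pi_2[theta_2, a_2]: type-contingent mixed actions
    mu_1[theta_1, theta_2]: posterior mu_1(theta_2 | theta_1, h)
    """
    # RHS of (EQ2): expected Q_1 of each own action a_1 against agent 2's play.
    q_dev = np.einsum("txy,tu,uy->tx", Q_1, mu_1, pi_2)   # [theta_1, a_1]
    # LHS of (EQ2): EV_1(h, theta_1), the same quantity averaged under pi_1.
    ev = np.einsum("tx,tx->t", pi_1, q_dev)
    return bool(np.all(ev[:, None] >= q_dev - tol))

Q_1 = np.zeros((2, 2, 2))
Q_1[:, 0, :] = 1.0                      # action 0 is strictly better for agent 1 here
pi_2 = np.full((2, 2), 0.5)
mu_1 = np.array([[0.7, 0.3], [0.4, 0.6]])
pi_pure = np.array([[1.0, 0.0], [1.0, 0.0]])
pi_mixed = np.array([[0.5, 0.5], [1.0, 0.0]])
print(satisfies_eq2(Q_1, pi_pure, pi_2, mu_1))   # True
print(satisfies_eq2(Q_1, pi_mixed, pi_2, mu_1))  # False: type 0 puts weight on action 1
```

Equivalently, (EQ2) forces $\pi_{1}(\cdot|\theta_{1})$ to place probability only on actions that maximize the expected $\mathbf{Q}_{1}$.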

Define the following set:

$$\mathcal{K}(\beta|\tau)\equiv\left\{\left<\pi,V\right>\,\middle|\,\pi\text{ and }V\text{ satisfy (FE1), (FE2), (EQ1), (EQ2)}\right\}. \tag{13}$$

Let

$$\mathcal{E}(\beta|\tau)\equiv\left\{\begin{aligned}\arg\min_{\pi,V}&\ Z(\pi,V|\beta,\tau),\\ \text{s.t. }&\left<\pi,V\right>\in\mathcal{K}(\beta|\tau)\end{aligned}\right\}. \tag{OPT}$$
Proposition 1.

Fix $\beta$. In a game $\mathtt{B}^{\Gamma}$, a profile $\left<\beta,\pi\right>$ is a PPME if and only if (i) $\left<\pi,V\right>\in\mathcal{E}(\beta|\tau)$, where $V=(V_{i})_{i\in N}$ are the corresponding optimal HT value functions, and (ii) $Z(\pi,V|\beta,\tau)=0$.

Proposition 1 extends the fundamental formulation of finding a Nash equilibrium of a stochastic game as a nonlinear program (Theorem 3.8.2 of [10]; see also [11, 12, 13]). Here, the constraints (FE1) and (FE2) ensure that each candidate $\pi$ is a valid conditional probability distribution and rule out the trivial solution $\{\pi_{i}=0\}_{i\in N}$. The constraints (EQ1) and (EQ2) are two necessary conditions for a PPME, derived from the optimality of PPME and the Bellman recursions (10) and (11).

III-A Global Fixed-Point Alignment

First, we extend the agents' strategy profile $\left<\beta,\pi\right>$ to $\left<\beta,\pi,V\right>$. Treating $V_{i}$ as a variable, we define the following term based on (4):

$$\mathtt{MV}_{i,t}(h_{t},\theta_{i,t}|\tau,\beta;V_{i,t})\equiv\mathbb{E}^{\tau}_{\beta}\left[V_{i,t}(h_{t},\theta_{i,t},\tilde{\theta}_{-i,t})\middle|h_{t},\theta_{i,t}\right]. \tag{14}$$

If we fix $\beta$, Proposition 1 implies that $\left<\pi,V\right>$ of a PPME profile $\left<\beta,\pi,V\right>$ needs to be a global minimum of (OPT) with $Z(\pi,V|\beta,\tau)=0$. Equivalently, given $\pi_{-i}$, each $V_{i}$ needs to be a fixed point of the following system, for all $i\in N$, $h\in H$, $\theta_{i}\in\Theta_{i}$:

$$\begin{cases}\mathtt{MV}_{i}(h,\theta_{i}|\tau,\beta;V_{i})\geq\mathbb{E}^{\tau}_{\beta_{-i},\pi_{-i}}\Big[\mathbf{Q}_{i}(h,\theta_{i},a_{i},\tilde{a}_{-i}|\tau,\beta;V_{i})\Big|\theta_{i},h\Big],\\ \forall a_{i}\in A_{i},i\in N,h\in H,\theta_{i}\in\Theta_{i},\end{cases} \tag{EQ4}$$

where the dependence of the RHS of (EQ4) on $V_{i}$ is due to (12). At the cognition stage, the optimality of PPME requires that each agent's cognition choice $g$ be optimal among all available options. Given any $J_{i}(\cdot)$, define, for all $i\in N$, $\theta_{i}\in\Theta_{i}$, $h\in H$,

$$\mathtt{IJ}_{i}(h,\theta_{i}|\tau_{-i},\beta_{-i},\pi;J_{i})\equiv\sum_{s,\theta_{-i},a}\Big(\overline{R}_{i}(h,\theta_{i},a)+J_{i}(s,a)\Big)\times\pi(a|\theta_{i},\theta_{-i})\,\tau_{-i}(\theta_{-i}|s,\beta_{-i}(h))\,T(s|h), \tag{15}$$

where $J_{i}(s,a)$ on the RHS of (15) is an H value function of the next period given the current-period $(s,a)$. The optimality of $\tau$ in the cognition stage of a PPME (i.e., constraint (EQ1)) implies that the optimal history value function $J_{i}$ of each agent $i$ needs to be a fixed point while the other agents' $\tau_{-i}$ are held fixed. That is,

$$\begin{cases}J_{i}(h)\geq\mathtt{IJ}_{i}(h,\theta_{i}|\tau_{-i},\beta_{-i},\pi;J_{i}),\\ \forall\theta_{i}\in\Theta_{i},i\in N,h\in H.\end{cases} \tag{EQ3}$$

Here, (EQ3) is independent of $V$, while (EQ4) is independent of $J$. For $\left<\beta,\pi\right>$ to be a PPME of $\mathtt{B}^{\Gamma}$, $\left<\beta,\pi\right>$ must be chosen such that there exist $\left<J,V\right>$ satisfying

$$\left\{J\text{ is a fixed point of (EQ3) if and only if }V\text{ is a fixed point of (EQ4).}\right\}$$

We refer to such a procedure as the Global Fixed-Point Alignment (GFPA).

Since $\beta_{i}$ is a pure strategy, each deterministic choice $g_{i}=\beta_{i}(h)$ leads to a signaling rule $\tau_{i}(\cdot|\cdot,g_{i})$ that determines a distribution of agent $i$'s period-$t$ types. That is, every $\beta_{i}$ determines a unique $\tau_{i}(\cdot,g_{i}):S\mapsto\Delta(\Theta_{i})$ for every $h$. Hence, for ease of exposition and with abuse of notation, we use $\tau_{i}$ and $\tau_{i,t}(\cdot|\cdot,g_{i},h)$ to represent $\beta_{i}$ and $\beta_{i}(h)$, respectively, unless otherwise stated. Therefore, in the game $\mathtt{B}^{\Gamma}$, each agent $i$ controls $\left<\tau_{i},\pi\right>$.

Given a policy $\pi$, define the following function of $\tau$, $J$, and $V$:

\[
Z^{\mathtt{GFPA}}(\tau,J,V|\pi)\equiv\sum_{i,h}\Big(J_{i}(h)-\sum_{\theta,s}V_{i}(h,\theta)\,\tau(\theta|s,g,h)\,T_{s}(s|h)\Big).\tag{OBJ2}
\]
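As an illustrative sketch (hypothetical names, finite sets, nested dictionaries as tables), the objective (OBJ2) can be computed as follows. The cognition choice $g=\beta(h)$ is assumed to be absorbed into the history index of the joint rule `tau`, in line with the notation $\tau(\theta|s,g,h)$. By construction, each summand vanishes when $J_{i}(h)$ equals the expectation of $V_{i}(h,\cdot)$ under $\tau$ and $T_{s}(\cdot|h)$.

```python
def z_gfpa(agents, histories, states, joint_types, J, V, tau, T):
    """Z^GFPA(tau, J, V | pi) from (OBJ2); g = beta(h) is folded into the history index.

    J[i][h]            : H value J_i(h)
    V[i][(h, theta)]   : HT value V_i(h, theta), theta a joint type profile
    tau[(h, s)][theta] : joint signaling probability tau(theta | s, g, h)
    T[h][s]            : T_s(s | h)
    """
    z = 0.0
    for i in agents:
        for h in histories:
            # Expected HT value of agent i at history h under tau and T_s.
            expected_v = sum(V[i][(h, th)] * tau[(h, s)][th] * T[h][s]
                             for th in joint_types for s in states)
            z += J[i][h] - expected_v
    return z
```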

Analogously to (FE1) and (FE2) for $\pi$, we impose the following two constraints on $\tau$:

\[
\tau_{i}(\theta_{i}|s,g_{i},h)\geq 0,\quad\forall i\in N,\ \theta_{i}\in\Theta_{i},\ s\in S,\ g\in G,\ h\in H,\tag{RG1}
\]
\[
\sum_{\theta_{i}\in\Theta_{i}}\tau_{i}(\theta_{i}|s,g_{i},h)=1,\quad\forall i\in N,\ s\in S,\ g\in G,\ h\in H.\tag{RG2}
\]

Define the following set:

\[
\mathcal{K}^{\mathtt{GFPA}}(\pi)\equiv\big\{\langle\tau,J,V\rangle\,\big|\,\text{(RG1), (RG2), (EQ3), (EQ4)}\big\}.\tag{16}
\]

Let

\[
\mathcal{E}^{\mathtt{GFPA}}(\pi)=\Big\{\arg\min_{\tau,J,V}\ Z^{\mathtt{GFPA}}(\tau,J,V|\pi)\ \ \text{s.t.}\ \ \langle\tau,J,V\rangle\in\mathcal{K}^{\mathtt{GFPA}}(\pi)\Big\}.\tag{GFPA}
\]
Proposition 2.

Suppose that $\langle J,V,\pi\rangle$ satisfies the Bellman recursions (10)-(12). Then, $\langle\tau,J,V\rangle\in\mathcal{E}^{\mathtt{GFPA}}(\pi)$ with $Z^{\mathtt{GFPA}}(\tau,J,V|\pi)=0$ if and only if $\langle\pi,V\rangle\in\mathcal{E}(\beta|\tau)$ with $Z(\pi,V|\tau)=0$.

The proof of Proposition 2 is deferred to Appendix F. Proposition 2 shows that if $\langle\tau,J,V\rangle\in\mathcal{E}^{\mathtt{GFPA}}(\pi)$ with $Z^{\mathtt{GFPA}}(\tau,J,V|\pi)=0$ for a policy $\pi$, then the profile $\langle\tau,\pi\rangle$ is a PPME.

III-B Local Fixed-Point Alignment

First, we decompose each type space as $\Theta_{i}=\Theta^{\natural}_{i}\cup\{\hat{\theta}_{i}\}$, with $\hat{\theta}_{i}\notin\Theta^{\natural}_{i}$, so that the constraint (RG2) can be reformulated as follows: for all $i\in N$, $s\in S$, $g\in G$, $h\in H$,

\[
\begin{aligned}
&\sum_{\theta_{i}\in\Theta^{\natural}_{i}}\tau_{i}(\theta_{i}|s,g_{i},h)\leq 1,\\
&\tau_{i}(\hat{\theta}_{i}|s,g_{i},h)+\sum_{\theta_{i}\in\Theta^{\natural}_{i}}\tau_{i}(\theta_{i}|s,g_{i},h)=1.
\end{aligned}\tag{17}
\]

Given $\tau_{-i}(\theta_{-i}|s,g_{-i},h)=\prod_{j\neq i}\tau_{j}(\theta_{j}|s,g_{j},h)$, define, for all $i\in N$,

\[
\mathtt{IV}_{i}(h,\theta_{i}|\tau_{-i};V_{i})\equiv\sum_{\theta_{-i},s}V_{i}(h,\theta_{i},\theta_{-i})\,\tau_{-i}(\theta_{-i}|s,g_{-i},h)\,T_{s}(s|h).
\]

Construct the vector $X^{s}_{i}\equiv\big(J_{i}(h),V_{i}(h,\cdot),\tau_{-i}(\cdot|s,g_{-i},h)\big)$. Define

\[
\lambda_{i}(X^{s}_{i};\theta_{i},h)\equiv J_{i}(h)-\mathtt{IV}_{i}(h,\theta_{i}|\tau_{-i};V_{i}).
\]

Here, $\lambda_{i}(X^{s}_{i};\theta_{i},h)$ is a function of $J_{i}$, $V_{i}$, and $\tau_{-i}(\cdot|s,g_{-i},h)$, and is independent of $\pi$ and $\tau_{i}$.
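A minimal sketch of $\mathtt{IV}_{i}$ and $\lambda_{i}$ at a fixed history, assuming the other agents' signals are conditionally independent given $s$, so that $\tau_{-i}$ is the product of the individual rules as above. The helper names are hypothetical, and the full table $\tau_{-i}(\cdot|s,g_{-i},h)$, $s\in S$, is passed in, as in the definition of $\mathtt{IV}_{i}$. Agent $i$'s own rule $\tau_{i}$ never appears as an argument, which mirrors the independence of $\lambda_{i}$ from $\pi$ and $\tau_{i}$.

```python
from itertools import product


def joint_others(rules, states):
    """tau_{-i}(theta_{-i} | s) as the product of independent rules tau_j(theta_j | s).

    rules[k][s][theta_j] : signaling rule of the k-th other agent (g_j and h held fixed)
    Returns {s: {(theta_j for each j): probability}}.
    """
    out = {}
    for s in states:
        out[s] = {}
        for combo in product(*[list(r[s]) for r in rules]):
            p = 1.0
            for r, th in zip(rules, combo):
                p *= r[s][th]
            out[s][combo] = p
    return out


def iv_value(theta_i, states, others_types, V_i, tau_minus_i, T):
    """IV_i(h, theta_i | tau_{-i}; V_i) at a fixed history h.

    V_i[(theta_i, o)] : V_i(h, theta_i, theta_{-i} = o)
    tau_minus_i[s][o] : tau_{-i}(o | s, g_{-i}, h)
    T[s]              : T_s(s | h)
    """
    return sum(V_i[(theta_i, o)] * tau_minus_i[s][o] * T[s]
               for o in others_types for s in states)


def lam(J_h, theta_i, states, others_types, V_i, tau_minus_i, T):
    """lambda_i(X_i^s; theta_i, h) = J_i(h) - IV_i(...); tau_i and pi are not arguments."""
    return J_h - iv_value(theta_i, states, others_types, V_i, tau_minus_i, T)
```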

For any $s\in S$ and $h\in H$, define the following function:

\[
Z^{\mathtt{LFPA}}_{i}(X^{s}_{i},\tau_{i};s,h)\equiv\sum_{\theta^{\prime}_{i}\in\Theta^{\natural}_{i}}\lambda_{i}(X^{s}_{i};\theta^{\prime}_{i},h)\,\tau_{i}(\theta^{\prime}_{i}|s,g_{i},h)+\lambda_{i}(X^{s}_{i};\hat{\theta}_{i},h)\,\tau_{i}(\hat{\theta}_{i}|s,g_{i},h).\tag{18}
\]

Define the following set

\[
\overline{\mathcal{K}}[s,g_{-i},h]\equiv\left\{\langle X^{s}_{i},\tau_{i}\rangle\,\middle|\,\begin{aligned}&\tau_{i}(\hat{\theta}_{i}|s,g_{i},h)\geq 0,\\&\tau_{i}(\theta_{i}|s,g_{i},h)\geq 0,\ \forall\theta_{i}\in\Theta^{\natural}_{i},\\&\lambda_{i}(X^{s}_{i};\theta^{\prime}_{i},h)\geq 0,\ \forall\theta^{\prime}_{i}\in\Theta_{i}\end{aligned}\right\}.\tag{19}
\]

Then, we define the problem of Local Fixed-Point Alignment (LFPA) by

\[
\min_{X^{s}_{i},\tau_{i}}\ Z^{\mathtt{LFPA}}_{i}(X^{s}_{i},\tau_{i};s,h)\quad\text{s.t.}\quad\langle X^{s}_{i},\tau_{i}\rangle\in\overline{\mathcal{K}}[s,g_{-i},h].\tag{LFPA}
\]
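Because every summand of (18) is a product $\lambda_{i}\tau_{i}$ of two quantities that are nonnegative on $\overline{\mathcal{K}}[s,g_{-i},h]$, the objective of (LFPA) is nonnegative on its feasible set, and it equals zero exactly when $\lambda_{i}(X^{s}_{i};\theta_{i},h)\,\tau_{i}(\theta_{i}|s,g_{i},h)=0$ for every $\theta_{i}$. The sketch below (hypothetical names, one fixed $(s,h)$, types as dictionary keys) evaluates the objective and the membership test; since the split into $\Theta^{\natural}_{i}$ and $\hat{\theta}_{i}$ only regroups terms, it sums over all of $\Theta_{i}$.

```python
def z_lfpa(lam_vals, tau_i):
    """Objective (18) at a fixed (s, h): sum over theta of lambda_i(theta) * tau_i(theta).

    lam_vals[theta] : lambda_i(X_i^s; theta, h)
    tau_i[theta]    : tau_i(theta | s, g_i, h)
    """
    return sum(lam_vals[t] * tau_i[t] for t in tau_i)


def in_k_bar(lam_vals, tau_i, tol=1e-12):
    """Membership in K_bar[s, g_{-i}, h] of (19): all tau_i >= 0 and all lambda_i >= 0."""
    return (all(p >= -tol for p in tau_i.values())
            and all(v >= -tol for v in lam_vals.values()))
```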

Let $e_{i}$, $\bm{b}_{i}\equiv(b[\theta_{i}])_{\theta_{i}\in\Theta^{\natural}_{i}}$, and $\bm{f}_{i}\equiv(f[\theta_{i}])_{\theta_{i}\in\Theta_{i}}$ denote the Lagrange multipliers of the constraints $\{\tau_{i}(\hat{\theta}_{i}|s,g_{i},h)\geq 0\}$, $\{\tau_{i}(\theta_{i}|s,g_{i},h)\geq 0,\ \forall\theta_{i}\in\Theta^{\natural}_{i}\}$, and $\{\lambda_{i}(X^{s}_{i};\theta^{\prime}_{i},h)\geq 0,\ \forall\theta^{\prime}_{i}\in\Theta_{i}\}$, respectively. In addition, the corresponding slack variables are denoted by $w_{i}$, $\bm{q}_{i}\equiv\{q[\theta_{i}]\}_{\theta_{i}\in\Theta^{\natural}_{i}}$, and $\bm{z}_{i}\equiv\{z[\theta_{i}]\}_{\theta_{i}\in\Theta_{i}}$, respectively. Then, the Lagrangian of (LFPA) is defined by

\[
\begin{aligned}
L_{i}(X^{s}_{i},\tau_{i},e_{i},\bm{b}_{i},\bm{f}_{i},w_{i},\bm{q}_{i},\bm{z}_{i}|s,h)\equiv{}&Z^{\mathtt{LFPA}}_{i}(X^{s}_{i},\tau_{i};s,h)+\sum_{\theta_{i}\in\Theta^{\natural}_{i}}b[\theta_{i}]\big(q[\theta_{i}]-\tau_{i}(\theta_{i}|s,g_{i},h)\big)\\
&+\sum_{\theta_{i}\in\Theta_{i}}f[\theta_{i}]\big(z[\theta_{i}]-\lambda_{i}(X^{s}_{i};\theta_{i},h)\big)+e_{i}\big(w_{i}-\tau_{i}(\hat{\theta}_{i}|s,g_{i},h)\big).
\end{aligned}\tag{20}
\]

To simplify the presentation, we omit $s$ and $h$. Taking partial derivatives of $L_{i}$ with respect to $X^{s}_{i}$ and $\tau_{i}$ yields

\[
\begin{cases}
\Delta_{i}(X^{s}_{i},\tau_{i},\bm{f}_{i})\equiv\nabla_{X^{s}_{i}}Z^{\mathtt{LFPA}}_{i}(X^{s}_{i},\tau_{i})-\sum_{\theta_{i}\in\Theta_{i}}f[\theta_{i}]\,\nabla_{X^{s}_{i}}\lambda_{i}(X^{s}_{i};\theta_{i}),\\[4pt]
D_{i}(X^{s}_{i},\tau_{i}(\theta_{i}|\cdot),e_{i},b[\theta_{i}])\equiv b[\theta_{i}]-e_{i}+\dfrac{\partial}{\partial\tau_{i}(\theta_{i}|\cdot)}Z^{\mathtt{LFPA}}_{i}(X^{s}_{i},\tau_{i}(\theta_{i}|\cdot)),\quad\forall\theta_{i}\in\Theta_{i}.
\end{cases}
\]

Let $\bm{X}^{s}\equiv(X^{s}_{i})_{i\in N}$, $\bm{f}\equiv(\bm{f}_{i})_{i\in N}$, $\bm{e}\equiv(e_{i})_{i\in N}$, $\bm{b}\equiv(\bm{b}_{i})_{i\in N}$, and $\bm{\lambda}\equiv(\lambda_{i})_{i\in N}$. Define

\[
\begin{cases}
\bm{F}(\bm{X}^{s},\tau,\bm{e},\bm{b},\bm{f})\equiv\Big(\Delta_{i}(X^{s}_{i},\tau_{i},\bm{f}_{i}),\,D_{i}(X^{s}_{i},\tau_{i}(\theta_{i}|\cdot),e_{i},b[\theta_{i}])\Big)_{i\in N},\\[4pt]
\bm{K}(\bm{e},\bm{b},\bm{f};\tau,\bm{\lambda})\equiv\Big(e_{i}\tau_{i}(\hat{\theta}_{i}),\,b[\theta_{i}]\tau_{i}(\theta_{i}),\,f[\theta_{i}]\lambda_{i}(X^{s}_{i};\theta_{i})\Big)_{i\in N}.
\end{cases}
\]

Construct the set

\[
\mathcal{R}(s,h)\equiv\left\{\langle\bm{X}^{s},\tau,\bm{e},\bm{b},\bm{f}\rangle\,\middle|\,\begin{aligned}&\bm{F}(\bm{X}^{s},\tau,\bm{e},\bm{b},\bm{f})=0,\\&\bm{K}(\bm{e},\bm{b},\bm{f};\tau,\bm{\lambda})=0\end{aligned}\right\}.\tag{21}
\]
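The condition $\bm{K}=0$ in (21) is a set of complementary-slackness equations. A minimal sketch for one agent at a fixed $(s,h)$, with hypothetical names, returns the largest absolute component of agent $i$'s block of $\bm{K}$; it does not evaluate the stationarity part $\bm{F}$, which requires the gradients with respect to $X^{s}_{i}$.

```python
def complementarity_residual(e_i, b_i, f_i, tau_i, lam_vals, hat):
    """Largest |component| of agent i's block of K(e, b, f; tau, lambda) at a fixed (s, h).

    e_i          : multiplier of tau_i(hat) >= 0
    b_i[theta]   : multipliers of tau_i(theta) >= 0, keys = Theta_i^natural
    f_i[theta]   : multipliers of lambda_i(theta) >= 0, keys = Theta_i
    tau_i[theta] : current tau_i(theta | s, g_i, h)
    lam_vals[theta] : current lambda_i(X_i^s; theta, h)
    """
    terms = [e_i * tau_i[hat]]
    terms += [b_i[t] * tau_i[t] for t in b_i]
    terms += [f_i[t] * lam_vals[t] for t in f_i]
    return max(abs(x) for x in terms)
```

A residual of zero (up to a tolerance) means the complementarity half of membership in $\mathcal{R}(s,h)$ holds for agent $i$.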

For any $\theta_{i}\in\Theta_{i}$, define

\[
\gamma_{i}(J_{i},V_{i},\pi_{-i}|\tau_{i},\theta_{i},a_{i},h)\equiv\mathtt{EV}_{i}(h,\theta_{i}|\tau_{i},\pi,V_{i})-\mathbb{E}^{\mu_{i}}_{\pi_{-i}}\Big[Q_{i}(h,\theta_{i},a_{i},\tilde{a}_{-i}|\tau;J_{i})\,\Big|\,h,\theta_{i}\Big],\tag{22}
\]

where $Q_{i}(\cdot;J_{i})$ is defined by replacing $J_{i}(\cdot;V_{i})$ in (12) with $J_{i}(\cdot)$. That is,

\[
Q_{i}(h,\theta_{i},a|\tau;J_{i})=\overline{R}_{i}(h,\theta_{i},a)+\delta\sum_{s}J_{i}(s,a)\,\mu_{i}(s|\theta_{i};\tau).
\]

Define the set

\[
\mathcal{R}^{\dagger}(J,V)\equiv\left\{\pi\,\middle|\,\begin{aligned}&\pi_{i}(a_{i}|\theta_{i})\,\gamma_{i}(J_{i},V_{i},\pi_{-i}|\tau_{i},\theta_{i},a_{i},h)=0,\\&\forall i,a_{i},\theta_{i},h,\ \text{(FE1), (FE2)}\end{aligned}\right\}.\tag{23}
\]
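Membership in $\mathcal{R}^{\dagger}(J,V)$ at a fixed history can be checked once the values $\gamma_{i}$ from (22) are available. The sketch below takes those values as given (computing $\mathtt{EV}_{i}$ is outside its scope) and assumes, by analogy with (RG1)-(RG2), that (FE1) and (FE2) are the nonnegativity and normalization constraints on $\pi_{i}$. The function name and the dictionary layout are hypothetical.

```python
def in_r_dagger(pi_i, gamma_i, tol=1e-9):
    """Check pi_i against (23) at one fixed history h.

    pi_i[theta][a]    : pi_i(a_i | theta_i)
    gamma_i[theta][a] : precomputed gamma_i(J_i, V_i, pi_{-i} | tau_i, theta_i, a_i, h)
    """
    for theta, dist in pi_i.items():
        # Assumed form of (FE1): nonnegative action probabilities.
        if any(p < -tol for p in dist.values()):
            return False
        # Assumed form of (FE2): probabilities sum to one for each type.
        if abs(sum(dist.values()) - 1.0) > tol:
            return False
        # Complementarity: an action played with positive probability must have gamma_i = 0.
        if any(abs(p * gamma_i[theta][a]) > tol for a, p in dist.items()):
            return False
    return True
```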

We define a set of conditions termed local admissibility as follows.

Definition 2 (Local Admissibility).

A profile $\langle\tau,\pi,J,V\rangle$ is locally admissible if $\pi\in\mathcal{R}^{\dagger}(J,V)$ and $\langle\bm{X}^{s},\tau,\bm{e},\bm{b},\bm{f}\rangle\in\mathcal{R}(s,h)$, for all $s\in S$, $h\in H$.

Theorem 2.

Suppose that $\{\nabla_{X^{s}_{i}}\lambda_{i}(X^{s}_{i};\theta_{i},h)\}_{\theta_{i}\in\Theta_{i}}$ is a set of linearly independent vectors for all $X^{s}_{i}$, $i\in N$, $s\in S$, $h\in H$. Then, $\langle\beta^{*},\pi^{*}\rangle$ is a PPME if and only if $\langle\tau^{*},\pi^{*},J^{*},V^{*}\rangle$ is locally admissible.

The proof of Theorem 2 is deferred to Appendix H. Theorem 2 provides necessary and sufficient conditions for characterizing the PPME. In particular, if, given a feasible cognition profile $\Gamma$ and under the linear independence assumption, an algorithm converges to a locally admissible point $\langle\tau,\pi,J,V\rangle$, then the associated profile $\langle\beta,\pi\rangle$ is a PPME. That is, a locally admissible point $\langle\tau,\pi,J,V\rangle$ achieves $Z^{\mathtt{GFPA}}(\tau,J,V|\pi)=0$ and $Z(\pi,V|\beta,\tau)=0$.

III-C Discussion

Following the simplification $\tau_{i}(\cdot)=\tau_{i}(\cdot|\cdot,g_{i},h)$, which represents the choice of $\tau_{i}$ through $\beta_{i}(h)$, the function $Z(\pi,V|\beta,\tau)$ given by (OBJ1) can be written as $Z(\pi,V|\tau)$, and the sets $\mathcal{K}(\beta|\tau)$ and $\mathcal{E}(\beta|\tau)$ given by (13) and (OPT) can be written as $\mathcal{K}(\tau)$ and $\mathcal{E}(\tau)$, respectively. With a further abuse of notation, we rewrite $\mathcal{E}(\tau)$ and $\mathcal{K}^{\mathtt{GFPA}}(\tau)$ as $\mathcal{E}(\tau,V)$ and $\mathcal{K}^{\mathtt{GFPA}}(\tau,V)$, respectively, by fixing an arbitrary $V$. Hence, $\mathcal{E}(\tau,V)$ and $\mathcal{K}^{\mathtt{GFPA}}(\tau,V)$ become sets of profiles $\pi$ and $\langle\tau,J\rangle$, respectively.

Suppose that the game $\mathtt{B}^{\Gamma}$ admits at least one PPME. Then, a PPME profile $\langle\tau,\pi\rangle$ (or $\langle\beta,\pi\rangle$) of $\mathtt{B}^{\Gamma}$ can be obtained by solving the following bi-level constrained optimization problem:

\[
\begin{aligned}
\min_{\pi,V}\ &Z(\pi,V|\tau),\\
\text{s.t. }&\langle\pi,V\rangle\in\mathcal{K}(\tau),\ \langle\tau,J\rangle\in\mathcal{E}^{\mathtt{GFPA}}(\pi,V).
\end{aligned}\tag{24}
\]

The problem (24) is equivalent to

\[
\begin{aligned}
\min_{\tau,J,V}\ &Z^{\mathtt{GFPA}}(\tau,J,V|\pi),\\
\text{s.t. }&\pi\in\mathcal{E}(\tau,V),\ \langle\tau,J,V\rangle\in\mathcal{K}^{\mathtt{GFPA}}(\tau).
\end{aligned}\tag{25}
\]

Let us restrict attention to problem (25). Proposition 1 implies that, for any fixed $\tau$, any $\pi\in\mathcal{E}(\tau,V)$ satisfies $Z(\pi,V|\tau)=0$. Consider the following set:

\[
\mathcal{K}^{\dagger}\equiv\left\{\langle\tau,\pi,V,J\rangle\,\middle|\,\begin{aligned}&Z(\pi,V|\tau)=0,\\&\text{(FE1), (FE2), (EQ1), (EQ2)},\\&\text{(RG1), (RG2), (EQ3), (EQ4)}\end{aligned}\right\}.\tag{26}
\]

Then, we can reformulate the problem (25) as

\[
\min_{\tau,J,V}\ Z^{\mathtt{GFPA}}(\tau,J,V|\pi)\quad\text{s.t.}\quad\langle\tau,\pi,V,J\rangle\in\mathcal{K}^{\dagger}.\tag{27}
\]

The total number of decision variables of the optimization problem (27) is $\mathtt{NV}=n\times|H|+n\times|H|\times\prod_{i\in N}|\Theta_{i}|+\prod_{i\in N}|G_{i}|\times|H|\times|\Theta_{i}|+\prod_{i\in N}|\Theta_{i}|\times|A_{i}|$. Let $\mathcal{K}^{E}$ denote the set consisting of the active inequality constraints and the equality constraints. If we use algorithms that rely on the Linear Independence Constraint Qualification (LICQ) to solve (27), then the gradients of the constraints in $\mathcal{K}^{E}$ must be linearly independent. However, at a global minimum of (27), the number of active constraints plus the number of equality constraints is at least as large as the number of decision variables; i.e., $|\mathcal{K}^{E}|\geq\mathtt{NV}$. Since at most $\mathtt{NV}$ vectors in $\mathbb{R}^{\mathtt{NV}}$ can be linearly independent, LICQ fails whenever $|\mathcal{K}^{E}|>\mathtt{NV}$. Hence, in general models of $\mathtt{B}^{\Gamma}$, the linear independence of the gradients of $\mathcal{K}^{E}$ is a relatively restrictive condition.
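LICQ at a candidate point amounts to a rank condition on the matrix whose rows are the gradients of the constraints in $\mathcal{K}^{E}$: it holds exactly when the rank equals the number of rows. The pure-Python sketch below computes the rank by Gaussian elimination with partial pivoting (a standard technique, not taken from the paper); the gradient vectors are hypothetical inputs. When there are more rows than columns, i.e., $|\mathcal{K}^{E}|>\mathtt{NV}$, the check necessarily fails.

```python
def rank(rows, tol=1e-10):
    """Rank of a list of equal-length vectors via Gaussian elimination with partial pivoting."""
    m = [[float(x) for x in r] for r in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        # Pick the remaining row with the largest entry in column c as pivot.
        pivot = max(range(r, len(m)), key=lambda k: abs(m[k][c]), default=None)
        if pivot is None or abs(m[pivot][c]) < tol:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for k in range(r + 1, len(m)):
            factor = m[k][c] / m[r][c]
            m[k] = [x - factor * y for x, y in zip(m[k], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def licq_holds(active_gradients):
    """LICQ: the gradients of active and equality constraints are linearly independent."""
    return rank(active_gradients) == len(active_gradients)
```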

Theorem 2 shows that local admissibility fully characterizes the PPME under a condition that requires $\{\nabla_{X^{s}_{i}}\lambda_{i}(X^{s}_{i};\theta_{i},h)\}_{\theta_{i}\in\Theta_{i}}$ to be a set of linearly independent vectors for every $i\in N$, $h\in H$, and $s\in S$. For any given $h\in H$ and $s\in S$, $|X^{s}_{i}|=1+2\prod_{j\in N}|\Theta_{j}|-|\Theta_{i}|$ for all $i\in N$; since $\prod_{j\in N}|\Theta_{j}|\geq|\Theta_{i}|$, this gives $|X^{s}_{i}|\geq 1+|\Theta_{i}|>|\Theta_{i}|$. Thus, only $|\Theta_{i}|$ gradient vectors need to be linearly independent in a space of strictly higher dimension, and the requirement for linear independence among $\{\nabla_{X^{s}_{i}}\lambda_{i}(X^{s}_{i};\theta_{i},h)\}_{\theta_{i}\in\Theta_{i}}$ is generally less restrictive than the requirement for linear independence among the gradients of $\mathcal{K}^{E}$.
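The dimension count can be checked numerically. The sketch below simply evaluates the stated formula for $|X^{s}_{i}|$ (the function name is hypothetical) and compares it with $|\Theta_{i}|$; for two agents with three types each, $|X^{s}_{i}|=1+2\cdot 9-3=16$, so three gradient vectors must be linearly independent in a 16-dimensional space.

```python
from math import prod


def dim_x(type_sizes, i):
    """|X_i^s| = 1 + 2 * prod_j |Theta_j| - |Theta_i|, evaluated as stated in the text.

    type_sizes[j] : |Theta_j| for each agent j
    """
    return 1 + 2 * prod(type_sizes) - type_sizes[i]


# Two agents with three types each: 3 gradients must be independent in dim_x([3, 3], 0) dimensions.
print(dim_x([3, 3], 0), ">", 3)
```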

The local admissibility (Definition 2) can be decomposed into two parts. First, $\pi\in\mathcal{R}^{\dagger}(J,V)$ specifies conditions on the policy profile $\pi$ given $J$ and $V$. Second, the condition $\langle\bm{X}^{s},\tau,\bm{e},\bm{b},\bm{f}\rangle\in\mathcal{R}(s,h)$ for all $s\in S$, $h\in H$ is independent of the profile $\pi$. The second part implies that any algorithm that searches for a zero of the gradients of the Lagrangian of (LFPA), while converging to some $\pi\in\mathcal{R}^{\dagger}(J,V)$, converges to a PPME. Designing algorithms that converge to locally admissible points is left for future work.

IV Perfect Information Cognition Choice

In this section, we study the case in which the set of available signaling rule profiles contains a signaling rule that reveals the true realization of the state in every period.

IV-A Cognition Cost Schemes

Each agent $i$ makes a cognition choice $g_{i,t}$ at a cost $C_{i}\in\mathbb{R}$. Define $\mathcal{U}_{i}\equiv\{G_{i},S,\Theta_{i},A_{i},\Delta(S),\Delta(\Theta_{i})\}$. Let $P^{\prime}(\mathcal{U}_{i})$ denote the power set of $\mathcal{U}_{i}$ without the empty set, i.e., the collection of all non-empty subsets of $\mathcal{U}_{i}$. Then, we define the set of cost functions $C=(C_{i})_{i\in N}$ whose domain can be any non-empty combination of $G_{i}$, $S$, $\Theta_{i}$, $A_{i}$, $\Delta(S)$, and $\Delta(\Theta_{i})$ by

\[
\mathcal{F}\equiv\big\{C=(C_{i})_{i\in N}\,\big|\,C_{i}:X\mapsto\mathbb{R},\ \text{for }X\in P^{\prime}(\mathcal{U}_{i}),\ \forall i\in N\big\}.
\]

Here are some examples of cost functions. The cost function $C\in\mathcal{F}$ is cognition-based (CB) if each $C_{i}:G_{i}\mapsto\mathbb{R}$; that is, $C_{i}(g_{i,t})\in\mathbb{R}$ is the cost if agent $i$ chooses $g_{i,t}\in G_{i}$. The CB cost directly prices each agent $i$'s cognition choice.

The cost function $C\in\mathcal{F}$ is type-based (TB) if each $C_{i}:\Theta_{i}\mapsto\mathbb{R}$; that is, agent $i$ incurs a cost $C_{i}(\theta_{i,t})$ if the type $\theta_{i,t}$ is realized to him according to his cognition choice. With the TB cost, each agent $i$'s cognition decision is priced based on the realized type.

The cost function $C\in\mathcal{F}$ is state-type-based (STB) if each $C_{i}:S\times\Theta_{i}\mapsto\mathbb{R}$; that is, agent $i$'s cognition cost is $C_{i}(s_{t},\theta_{i,t})$ if the state is $s_{t}$ and agent $i$ receives the type $\theta_{i,t}$. The STB cost takes into account both the realized state and the agent's type. This cost scheme can capture settings in which the price of cognition depends on the discrepancy between the state and the information about the state encapsulated in the type.

The cost function $C\in\mathcal{F}$ is state-action-based (SAB) if each $C_{i}:S\times A_{i}\mapsto\mathbb{R}$; that is, agent $i$ incurs a cost $C_{i}(s_{t},a_{i,t})$ that depends on the true state $s_{t}$ and his local action $a_{i,t}$. This cost scheme prices agent $i$'s cognition based on the consequences of the information acquisition (i.e., $a_{i,t}$) given the true state.

The cost function $C\in\mathcal{F}$ is mutual information (MI) if each $C_{i}(\cdot):\Delta(\Theta_{i})\times\Delta(S)\mapsto\mathbb{R}$ is defined by

\[
C_{i}(\tilde{\theta}_{i,t};\tilde{s}_{t})\equiv H(\tilde{\theta}_{i,t})-H(\tilde{\theta}_{i,t}|\tilde{s}_{t}),
\]

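where $H(\cdot)$ denotes the entropy and $H(\cdot|\cdot)$ the conditional entropy (see, e.g., [18]); that is, $C_{i}$ is the mutual information between agent $i$'s type and the state.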
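The MI cost can be computed from a prior over $\tilde{s}_{t}$ and a signaling rule $\tau_{i}(\theta_{i}|s)$. This sketch uses base-2 logarithms (bits), which is an assumption since the text does not fix the base; the function name is hypothetical. A fully revealing rule, $\tau_{i}(s|s)=1$, attains $C_{i}=H(\tilde{s}_{t})$, while a rule whose output does not depend on $s$ costs zero.

```python
from math import log2


def mutual_information(prior, tau):
    """MI cost C_i = H(theta) - H(theta | s), in bits.

    prior[s]      : probability of state s
    tau[s][theta] : signaling rule tau_i(theta | s)
    """
    types = {t for row in tau.values() for t in row}
    # Marginal distribution of the type: P(theta) = sum_s prior(s) tau(theta | s).
    marg = {t: sum(prior[s] * tau[s].get(t, 0.0) for s in prior) for t in types}
    h_theta = -sum(p * log2(p) for p in marg.values() if p > 0)
    # Conditional entropy H(theta | s) = sum_s prior(s) H(tau(. | s)).
    h_cond = -sum(prior[s] * q * log2(q)
                  for s in prior for q in tau[s].values() if q > 0)
    return h_theta - h_cond
```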

Let $\Gamma\equiv\{\mathcal{T}[G,\Theta],C\}$ denote the cognition profile for some $C\in\mathcal{F}$. We assume that the cardinalities of $G_{i}$ and $\Theta_{i}$ are the same for all agents; i.e., $|G_{i}|=|G_{j}|$ and $|\Theta_{i}|=|\Theta_{j}|$ for all $i\neq j$.

IV-B Perfect-Information PPME

In this subsection, we introduce PPME under perfect information. We start with a general non-stationary case.

Definition 3 (Perfect Information Structure).

An information structure $\{\tau,\Theta\}$ is perfect-information if $\Theta_{i,t}=S$ and there exists $g^{*}_{i}\in G_{i}$ such that $\tau_{i,t}(s|s,g^{*}_{i},h)=1$, for all $i\in N$, $s\in S$, $h\in H$.

We use $\{\xi,S\}$ to denote the perfect-information structure, where $\xi_{i}(s)=s$ for all $s\in S$, $i\in N$. Let $\mathcal{T}[\xi]\equiv[\xi;G,\Theta]$, with $g^{*}_{i}\in G_{i}$ and $\Theta^{g^{*}}_{i}=S$, denote the menu of signaling rules that contains $\{\xi,S\}$, and let $\Gamma[\xi]=\{\mathcal{T}[\xi;G,\Theta],C\}$ denote the corresponding cognition profile.

With abuse of notation, we use $g^{*}_{i,t}(=g^{*}_{i})\in G_{i}$ to denote agent $i$'s cognition choice that leads to the perfect-information structure. Suppose that all other agents choose $\{\xi,S\}$ in every period $t$ in the cognition stage. Then, each agent $i$ knows that the other agents observe the true $s_{t}$ (i.e., $\theta_{-i,t}=s_{t}$), although agent $i$ himself may not observe $s_{t}$ (if $g_{i,t}\neq g^{*}_{i,t}$). For simplicity, we use $\tau_{i,t}$ to denote agent $i$'s signaling rule induced by his period-$t$ cognition choice $g_{i,t}$, without specifying $g_{i,t}$; the same simplification is made for $\xi$ and $g^{*}$. Hence, agent $i$'s value functions (H)-(HTA) can be rewritten as

\[
J_{i,t}(h_{t}|\tau_{i,t},\xi_{-(i,t)},\pi)=\mathbb{E}^{\tau_{i,t},\xi_{-(i,t)}}_{\pi}\Big[\sum_{k=t}^{\infty}\delta^{k-t}\big(R_{i}(\tilde{s}_{i,k},\tilde{a}_{k})+\tilde{c}_{i,k}\big)\,\Big|\,h_{t}\Big],\tag{PI-H}
\]
\[
\begin{aligned}
V_{i,t}(h_{t},\theta_{i,t}|\tau_{i,t},\xi_{-(i,t)},\pi)={}&\mathbb{E}^{\tau_{i},\xi_{-(i,t)}}_{\pi}\Big[\overline{R}_{i}(h_{t},\theta_{i,t},\tilde{a}_{t})+\tilde{c}_{i,t}\\
&+\sum_{k=t+1}^{\infty}\delta^{k-t}\big(R_{i}(\tilde{s}_{i,k},\tilde{a}_{k})+\tilde{c}_{i,k}\big)\,\Big|\,h_{t},\theta_{i,t}\Big],
\end{aligned}\tag{PI-HT}
\]
\[
\begin{aligned}
Q_{i,t}(h_{t},\theta_{i,t},a_{t}|\tau_{i},\xi_{-(i,t)},\pi)={}&\overline{R}_{i}(h_{t},\theta_{i,t},a_{t})+c_{i,t}\\
&+\mathbb{E}^{\xi_{-(i,t)}}_{\pi}\Big[\sum_{k=t+1}^{\infty}\delta^{k-t}\big(R_{i}(\tilde{s}_{i,k},\tilde{a}_{k})+\tilde{c}_{i,k}\big)\,\Big|\,h_{t},\theta_{i,t}\Big].
\end{aligned}\tag{PI-HTA}
\]

Define the set

\[
\Pi\big[\xi;\mathtt{B}^{\Gamma[\xi]}\big]\equiv\left\{\pi=(\pi_{i,t})\,\middle|\,\begin{aligned}&V_{i,t}(h_{t},s_{t}|\xi,\pi)\geq V_{i,t}\big(h_{t},s_{t}|\xi,(\pi^{\prime}_{i,t},\pi_{-(i,t)})\big),\ \forall\pi^{\prime}_{i,t},\\&\forall i\in N,\ t\geq 1,\ h_{t}\in H,\ s_{t}\in S,\ \text{(FE1), (FE2)}\end{aligned}\right\}.\tag{28}
\]

The perfect-information PPME (PI-PPME) is defined as follows.

Definition 4 (PI-PPME).

In a game $\mathtt{B}^{\Gamma[\xi]}$, a profile $\langle\xi,\pi\rangle$ constitutes a PI-PPME if, for all $i\in N$, $t\geq 1$, $h_{t}\in H$, and $\tau_{i,t}\in\mathcal{T}[\xi;G,\Theta]$,

\[
J_{i,t}(h_{t}|\xi,\pi)\geq J_{i,t}(h_{t}|\tau_{i,t},\xi_{-(i,t)},\pi),\quad\text{and}\quad\pi\in\Pi\big[\xi;\mathtt{B}^{\Gamma[\xi]}\big].\tag{29}
\]

For any profile $\langle\tau,\pi\rangle$, define each agent $i$'s period-$t$ ex-post history-state-action value function (EP-HSA value function) by

\[
W_{i,t}(s_{t},a_{t}|\tau,\pi)\equiv R_{i}(s_{t},a_{t})+c_{i,t}+\mathbb{E}^{\tau}_{\pi}\Big[\sum_{k=t+1}^{\infty}\delta^{k-t}\big(R_{i}(\tilde{s}_{i,k},\tilde{a}_{k})+\tilde{c}_{i,k}\big)\,\Big|\,s_{t},a_{t}\Big].\tag{HSA}
\]

Theorem 3 establishes a relationship between a PPME and a PI-PPME in terms of the H value functions.

Theorem 3.

Fix a base game $\mathtt{B}$. For a profile $\langle\tau,\pi\rangle$ that constitutes a PPME of a game $\mathtt{B}^{\Gamma}$ with feasible cognition profile $\Gamma=\{\mathcal{T}^{\natural}[G^{\prime},\Theta^{\prime}],C\}$ for any $C\in\mathcal{F}$, there exists a PI-PPME profile $\langle\xi,\pi^{*}\rangle$ of a game $\mathtt{B}^{\Gamma[\xi]}$ with feasible cognition profile $\Gamma[\xi]=\{\mathcal{T}^{\natural}[\xi;G,\Theta],C^{*}\}$ for an SAB cost profile $C^{*}$ such that, for all $i\in N$, $t\geq 1$, $h_{t}\in H$,

\[
\begin{aligned}
J_{i,t}(h_{t}|\tau,\pi)&=\mathbb{E}^{\tau}_{\pi}\big[W_{i,t}(\tilde{s}_{t},\tilde{a}_{t}|\xi,\pi^{*})\,\big|\,h_{t}\big]\\
&=\mathbb{E}^{\xi}_{\pi^{*}}\big[W_{i,t}(\tilde{s}_{t},\tilde{a}_{t}|\xi,\pi^{*})\,\big|\,h_{t}\big]=J_{i,t}(h_{t}|\xi,\pi^{*}).
\end{aligned}\tag{30}
\]

IV-C Value-Preserving Transformation: From PI-PPME to PPME

Given any history $h\in H$, we define the perfect-information (PI) achievable value set $\mathcal{V}(h)$ as follows:

\[
\mathcal{V}(h)\equiv\Big\{u=(u_{i})\,\Big|\,u_{i}=J_{i}(h|\xi,\pi^{*}),\ \text{for a PI-PPME }\langle\xi,\pi^{*}\rangle\text{ with an SAB cost profile}\Big\}.\tag{31}
\]

For any PI-PPME with an SAB cost profile, let $\mathcal{V}^{*}(h)=\mathcal{V}^{*}(h|\xi,\pi^{*})\equiv\{u\,|\,u_{i}=J_{i}(h|\xi,\pi^{*})\}\subset\mathcal{V}(h)$. For any $J(h)=(J_{i}(h))_{i\in N}$ in $\mathcal{V}^{*}(h)$, there must exist a PI-PPME with an SAB cost profile that leads to the H value $J_{i}(h)$ for each agent $i$ given history $h\in H$. Construct

\[
\mathbf{V}_{i}(h,\theta_{i}|\tau,\pi;J^{*})=\sum_{s,a,\theta_{-i}}\Big(R_{i}(s,a)+c_{i,t}+\delta J^{*}_{i}(s,a)\Big)\,\pi(a|\theta_{i},\theta_{-i})\,\mu_{i}(s,\theta_{-i}|\theta;\tau)\,T(s|h).
\]

Define the function $U:\prod_{h\in H}\big(\mathcal{V}^{*}(h)\times\prod_{i\in N}\Delta(\Theta_{i})\big)\mapsto\prod_{i\in N}\prod_{h\in H}\Delta(\Theta_{i})$ by

\[
U(\tau)(\theta_{i},h|\pi;J(h))=\frac{\tau_{i}(\theta_{i}|h)+\max\big(0,\mathbf{V}_{i}(h,\theta_{i}|\tau,\pi;J^{*})-J^{*}(h)\big)}{1+\sum_{\theta^{\prime}_{i}\in\Theta_{i}}\max\big(0,\mathbf{V}_{i}(h,\theta^{\prime}_{i}|\tau,\pi;J^{*})-J^{*}(h)\big)}.\tag{32}
\]
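The map (32) resembles the improvement map used in Nash's existence argument: it adds the positive part of each type's gain $\mathbf{V}_{i}-J^{*}(h)$ and renormalizes, so it maps probability vectors to probability vectors and leaves $\tau_{i}(\cdot|h)$ unchanged whenever no type has a positive gain. The sketch below applies one step of $U$ for a single agent and history, treating the values $\mathbf{V}_{i}(h,\theta_{i}|\tau,\pi;J^{*})$ as fixed inputs; in the model they depend on $\tau$ through $\mu_{i}$, which the sketch does not capture. The function name is hypothetical.

```python
def u_map(tau_i, v_bold, j_star):
    """One application of U in (32) for agent i at a fixed history h.

    tau_i[theta]  : current tau_i(theta | h)
    v_bold[theta] : bold-V_i(h, theta | tau, pi; J*), held fixed in this sketch
    j_star        : J*(h)
    """
    gains = {t: max(0.0, v_bold[t] - j_star) for t in tau_i}  # positive parts of the gains
    denom = 1.0 + sum(gains.values())
    return {t: (tau_i[t] + gains[t]) / denom for t in tau_i}
```

A profile satisfying (fp) is a point that this map returns unchanged.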
Proposition 3.

Given any PI-PPME profile $\langle\xi,\pi^{*}\rangle$ with an SAB cost profile, there exists at least one profile $\langle\tau,\pi\rangle$, with a feasible cognition profile and any cost profile $C\in\mathcal{F}$, such that

\[
\tau_{i}(\theta_{i}|h)=U(\tau)(\theta_{i},h|\pi;J(h)).\tag{fp}
\]
Theorem 4.

Given any PI-PPME profile $\langle\xi,\pi^{*}\rangle$ with an SAB cost profile, a profile $\langle\tau,\pi\rangle$ with a feasible cognition profile and any cost profile $C\in\mathcal{F}$ is a PPME if and only if it satisfies (fp).

References

  • [1] M. Wu, S. Amin, and A. E. Ozdaglar, “Value of information in bayesian routing games,” Operations Research, vol. 69, no. 1, pp. 148–163, 2021.
  • [2] E. Kamenica and M. Gentzkow, “Bayesian persuasion,” American Economic Review, vol. 101, no. 6, pp. 2590–2615, 2011.
  • [3] L. Mathevet, J. Perego, and I. Taneva, “On information design in games,” Journal of Political Economy, vol. 128, no. 4, pp. 1370–1404, 2020.
  • [4] T. Zhang and Q. Zhu, “Bayesian promised persuasion: Dynamic forward-looking multiagent delegation with informational burning,” arXiv preprint arXiv:2201.06081, 2022.
  • [5] A. Celli, S. Coniglio, and N. Gatti, “Private bayesian persuasion with sequential games,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 02, 2020, pp. 1886–1893.
  • [6] Y. Babichenko, I. Talgam-Cohen, and K. Zabarnyi, “Bayesian persuasion under ex ante and ex post constraints,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 6, 2021, pp. 5127–5134.
  • [7] J. Gan, R. Majumdar, G. Radanovic, and A. Singla, “Bayesian persuasion in sequential decision-making,” arXiv preprint arXiv:2106.05137, 2021.
  • [8] E. Maskin and J. Tirole, “Markov perfect equilibrium: I. observable actions,” Journal of Economic Theory, vol. 100, no. 2, pp. 191–219, 2001.
  • [9] D. Fudenberg and J. Tirole, Game Theory.   Ane Books, 2005. [Online]. Available: https://books.google.com.hk/books?id=Ij7WQwAACAAJ
  • [10] J. Filar and K. Vrieze, Competitive Markov Decision Processes: Theory, Algorithms, and Applications.   Springer, 1997.
  • [11] H. Prasad and S. Bhatnagar, “General-sum stochastic games: Verifiability conditions for nash equilibria,” Automatica, vol. 48, no. 11, pp. 2923–2930, 2012.
  • [12] H. Prasad, P. LA, and S. Bhatnagar, “Two-timescale algorithms for learning nash equilibria in general-sum stochastic games,” in Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, 2015, pp. 1371–1379.
  • [13] J. Song, H. Ren, D. Sadigh, and S. Ermon, “Multi-agent generative adversarial imitation learning,” Advances in neural information processing systems, vol. 31, 2018.
  • [14] J. Wu, Z. Zhang, Z. Feng, Z. Wang, Z. Yang, M. I. Jordan, and H. Xu, “Sequential information design: Markov persuasion process and its efficient reinforcement learning,” arXiv preprint arXiv:2202.10678, 2022.
  • [15] T. Zhang and Q. Zhu, “Forward-looking dynamic persuasion for pipeline stochastic bayesian game: A fixed-point alignment principle,” arXiv preprint arXiv:2203.09725, 2022.
  • [16] O. Hernández-Lerma and J. B. Lasserre, Discrete-time Markov control processes: basic optimality criteria.   Springer Science & Business Media, 2012, vol. 30.
  • [17] R. Bellman, “Dynamic programming,” Science, vol. 153, no. 3731, pp. 34–37, 1966.
  • [18] T. M. Cover and J. A. Thomas, Elements of Information Theory.   John Wiley & Sons, 1999.

-D Proof of Lemma 1

The "only if" part is straightforward. In particular, if $\langle\beta,\pi\rangle$ is a PPME, then $\pi=(\pi_{i},\pi_{-i})\in\Pi[\beta;\mathtt{B}^{\Gamma}]$ and (6) holds for all $\pi^{\prime}_{i}$. Hence, (6) also holds for $\pi_{i}$.

We prove the "if" part by contradiction. Suppose that there exists a pair $(\beta^{\prime}_{i,t},\pi^{\prime}_{i,t})$ such that

\[
J_{i}(h|\tau,\beta,\pi)<J_{i}\big(h|\tau,(\beta^{\prime}_{i,t},\beta_{-(i,t)}),(\pi^{\prime}_{i,t},\pi_{-(i,t)})\big).\tag{33}
\]

The H value function $J_{i}$ can be expressed in terms of $\mathtt{EV}_{i}$ as follows:

\[
J_{i}(h|\beta,\pi)=\sum_{\theta_{i},s}\mathtt{EV}_{i}(h,\theta_{i}|\tau,\beta,\pi;V_{i})\,\tau_{i}(\theta_{i}|s,\beta_{i}(h))\,T(s|h).
\]

Since $\pi\in\Pi[\beta;\mathtt{B}^{\Gamma}]$, we obtain

\[
\begin{aligned}
\mathtt{EV}_{i}(h,\theta_{i}|\tau,\beta,\pi;V_{i})&=\mathtt{EV}_{i}\big(h,\theta_{i}|\tau,(\beta^{\prime}_{i,t},\beta_{-(i,t)}),\pi;V_{i}\big)\\
&\geq\mathtt{EV}_{i}\big(h,\theta_{i}|\tau,\beta,(\pi^{\prime}_{i,t},\pi_{-(i,t)});V_{i}\big)\\
&=\mathtt{EV}_{i}\big(h,\theta_{i}|\tau,(\beta^{\prime}_{i,t},\beta_{-(i,t)}),(\pi^{\prime}_{i,t},\pi_{-(i,t)});V_{i}\big),
\end{aligned}
\]

which implies

\[
J_{i}\big(h|\tau,(\beta^{\prime}_{i,t},\beta_{-(i,t)}),\pi\big)\geq J_{i}\big(h|\tau,(\beta^{\prime}_{i,t},\beta_{-(i,t)}),(\pi^{\prime}_{i,t},\pi_{-(i,t)})\big).
\]

Combining this inequality with (33), we obtain

$$J_{i}\left(h\middle|\tau,\left(\beta^{\prime}_{i,t},\beta_{-(i,t)}\right),\pi\right)>J_{i}\left(h\middle|\tau,\beta,\pi\right),$$

which contradicts (8). This completes the proof of the lemma. $\square$
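To make the expression of $J_{i}$ in terms of $\mathtt{EV}_{i}$ concrete, the following Python sketch evaluates $J_{i}(h)=\sum_{\theta_{i},s}\mathtt{EV}_{i}(h,\theta_{i})\tau_{i}(\theta_{i}|s,\beta_{i}(h))T(s|h)$ at one fixed history. It assumes finite state and type sets; the dictionary layout, the function name `h_value`, and all numbers are hypothetical and serve only as an illustration.

```python
def h_value(ev_i, tau_i, T):
    """Aggregate type-contingent values into the H value at a fixed history h.

    ev_i[theta]     -- EV_i(h, theta_i), hypothetical numbers
    tau_i[s][theta] -- tau_i(theta_i | s, beta_i(h)), a distribution over types per state
    T[s]            -- T(s | h), a distribution over states
    """
    return sum(
        ev_i[theta] * p_theta * p_s
        for s, p_s in T.items()
        for theta, p_theta in tau_i[s].items()
    )


# Toy instance with two states and two types (illustrative only).
T = {"s0": 0.6, "s1": 0.4}
tau_i = {"s0": {"lo": 0.9, "hi": 0.1},
         "s1": {"lo": 0.2, "hi": 0.8}}
ev_i = {"lo": 1.0, "hi": 3.0}

J_i = h_value(ev_i, tau_i, T)
```

Here the result is $0.6(0.9\cdot 1+0.1\cdot 3)+0.4(0.2\cdot 1+0.8\cdot 3)=1.76$, up to floating-point rounding.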

-E Proof of Proposition 1

Suppose that $\langle\beta,\pi\rangle$ with $V$ is a PPME. By the definition of PPME, the constraints (FE1), (FE2), (EQ1), and (EQ2) are satisfied. Hence, $\langle\pi,V\rangle$ is a feasible solution to the optimization problem (OPT). By construction, $Z\left(\pi,V\middle|\beta,\tau,C\right)=0$. Since $Z$ is nonnegative over the feasible set, $\langle\pi,V\rangle$ is a global minimum of (OPT).

Conversely, suppose that $\langle\pi,V\rangle\in\mathcal{K}\left(\beta\middle|\tau,C\right)$ with $Z\left(\pi,V\middle|\beta,\tau,C\right)=0$. Then, the constraints (EQ1) and (EQ2) imply that, for all $i\in\mathcal{N}$, $h\in\mathcal{H}$, and $\theta_{i}\in\Theta_{i}$ with $\tau_{i}(\theta_{i}|s,\beta_{i}(h))>0$ for some $s\in\mathcal{S}$ with $T(s|h)>0$,

$$V_{i}\left(h,\theta\right)\geq\sum_{a}\mathbf{Q}_{i}\left(h,\theta_{i},a\middle|\beta,\tau;V_{i}\right)\pi\left(a\middle|\theta\right).$$

Since $Z\left(\pi,V\middle|\beta,\tau,C\right)=0$, this inequality holds with equality; that is, for all $i\in\mathcal{N}$,

$$V_{i}\left(h,\theta\right)=\sum_{a}\mathbf{Q}_{i}\left(h,\theta_{i},a\middle|\beta,\tau;V_{i}\right)\pi\left(a\middle|\theta\right).$$

Iterating this fixed-point equation shows that $V$ is the unique optimal HT value function profile associated with $\pi$. In addition, the constraint (EQ1) implies that, given $V$, $\beta$ is a PPME selection policy profile. Therefore, the profile $\langle\beta,\pi\rangle$ is a PPME. $\square$
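The uniqueness step rests on a contraction argument: for a fixed profile, assuming $0\le\delta<1$, bounded payoffs, and a continuation term that is a probability-weighted average of $V$, the map $V\mapsto\sum_{a}\mathbf{Q}_{i}(\cdot;V)\pi$ is a $\delta$-contraction in the sup norm, so repeated substitution reaches the same fixed point from any starting guess. The Python sketch below illustrates this on a generic finite chain; collapsing $(h,\theta)$ into a single index, and the names `r`, `P`, and `evaluate_policy`, are assumptions for illustration rather than the paper's construction.

```python
def evaluate_policy(r, P, delta, V0=None, tol=1e-12, max_iter=100_000):
    """Fixed-point iteration V <- r + delta * P V for a fixed policy.

    r[x]    -- expected one-step payoff at index x under the fixed policy
    P[x][y] -- probability of moving from index x to index y under the policy
    delta   -- discount factor, assumed to satisfy 0 <= delta < 1
    """
    n = len(r)
    V = list(V0) if V0 is not None else [0.0] * n
    for _ in range(max_iter):
        V_new = [r[x] + delta * sum(P[x][y] * V[y] for y in range(n)) for x in range(n)]
        # Stop once the sup-norm change falls below the tolerance.
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    raise RuntimeError("fixed-point iteration did not converge")


r = [1.0, 0.0]
P = [[0.5, 0.5], [0.5, 0.5]]
V_from_zero = evaluate_policy(r, P, delta=0.9)
V_from_far = evaluate_policy(r, P, delta=0.9, V0=[100.0, -50.0])
```

Both runs approach $(5.5, 4.5)$, which solves $V=r+0.9PV$ exactly for this toy chain, illustrating that the limit does not depend on the initial guess.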

-F Proof of Proposition 2

Suppose that $\langle\pi,V\rangle\in\mathcal{E}\left(\beta\middle|\tau,C\right)$, i.e., $\langle\pi,V\rangle$ is a global minimum of the optimization problem (OPT) with $Z\left(\pi,V\middle|\beta,\tau,C\right)=0$. Then, the constraints (RG1) and (RG2) are trivially satisfied, and Proposition 1 implies that $\langle\beta,\pi\rangle$ is a PPME. From the construction of $Z(\cdot)$ in (OBJ1) and the constraint (EQ2), $\langle\tau,\pi,V\rangle$ satisfies (EQ4). Following (10), we construct $J$ as $J_{i}(h)=\sum_{\theta,s}V_{i}\left(h,\theta\middle|\tau,\beta,\pi\right)\tau\left(\theta\middle|s,\beta(h)\right)T(s|h)$, so that $Z^{\mathtt{FPA}}\left(\tau,J,V\middle|\pi,C\right)=0$. Since $\langle\pi,V\rangle$ satisfies (EQ1) given $\tau$,

$$J_{i}(h)\geq\sum_{\theta_{-i},s}V_{i}\left(h,\theta_{i},\theta_{-i}\right)\tau_{-i}\left(\theta_{-i}\middle|s,\beta_{-i}(h)\right)T(s|h)$$

for all $\theta_{i}\in\Theta_{i}$ and $h\in\mathcal{H}$, which implies (EQ3). By (EQ3) and (EQ4), $Z^{\mathtt{FPA}}\left(\tau^{\prime},J^{\prime},V^{\prime}\middle|\pi^{\prime}\right)\geq 0$ for any feasible $\langle\tau^{\prime},J^{\prime},V^{\prime}\rangle$, where $\pi^{\prime}$ is the corresponding policy profile. Therefore, since $Z^{\mathtt{FPA}}\left(\tau,J,V\middle|\pi,C\right)=0$, $\langle\tau,J,V\rangle$ is a global minimum of the optimization problem (GFPA).
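As a numerical illustration of the objects used in this direction of the proof, the sketch below builds $J_{i}(h)$ from a type-profile value $V_{i}(h,\theta)$ via the aggregation formula above and then checks the displayed inequality for every $\theta_{i}$. It assumes two agents whose types are conditionally independent given $s$, so that $\tau(\theta|s,\beta(h))=\tau_{i}(\theta_{i}|s,\beta_{i}(h))\,\tau_{-i}(\theta_{-i}|s,\beta_{-i}(h))$; this factorization, the coordination-style payoff, and all names and numbers are hypothetical.

```python
from itertools import product


def aggregate_value(V_i, tau_i, tau_mi, T):
    """J_i(h) = sum_{theta, s} V_i(h, theta) tau(theta | s) T(s | h), with tau = tau_i * tau_{-i}."""
    return sum(
        V_i[(ti, tmi)] * tau_i[s][ti] * tau_mi[s][tmi] * p_s
        for s, p_s in T.items()
        for ti, tmi in product(tau_i[s], tau_mi[s])
    )


def fixed_type_value(V_i, tau_mi, T, theta_i):
    """RHS of the displayed inequality: agent i's type held at theta_i, others drawn from tau_{-i}."""
    return sum(
        V_i[(theta_i, tmi)] * p_tmi * p_s
        for s, p_s in T.items()
        for tmi, p_tmi in tau_mi[s].items()
    )


T = {"s0": 0.5, "s1": 0.5}
signal = {"s0": {"a": 0.9, "b": 0.1}, "s1": {"a": 0.1, "b": 0.9}}   # same signal structure for both agents
V_i = {(x, y): 1.0 if x == y else 0.0 for x in "ab" for y in "ab"}  # payoff 1 when the types coincide

J_i = aggregate_value(V_i, signal, signal, T)
holds = all(J_i >= fixed_type_value(V_i, signal, T, th) for th in "ab")
```

With these numbers $J_{i}=0.82$, while holding agent $i$'s type fixed yields $0.5$ for either type, so the inequality holds in this instance.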

Conversely, let $\langle\tau,J,V\rangle\in\mathcal{E}^{\mathtt{FPA}}\left(\pi,C\right)$ with $Z^{\mathtt{FPA}}(\tau,J,V|\pi)=0$. Then

$$J_{i}(h)=\sum_{\theta,s}V_{i}\left(h,\theta\middle|\tau,\beta,\pi\right)\tau\left(\theta\middle|s,\beta(h)\right)T(s|h).\tag{34}$$

The constraint (EQ4) directly implies (EQ2), while the constraint (EQ3) implies

$$\begin{aligned}
J_{i}(h)\geq\sum_{s,\theta_{-i},a}\Bigg(&\left(\overline{R}_{i}(h,\theta_{i},a)+J_{i}(s,a)\right)\\
&\times\pi(a|\theta_{i},\theta_{-i})\tau\left(\theta_{i},\theta_{-i}|s,\beta(h)\right)T(s|h)\Bigg),
\end{aligned}\tag{35}$$

where the right-hand side (RHS) can be written as

$$\begin{aligned}
\text{RHS of (35)}=\sum_{s,\theta_{-i},a}\Bigg(&\left(\overline{R}_{i}(h,\theta_{i},a)+\sum_{s}J_{i}(s,a)\mu_{i}(s|\theta_{i},h)\right)\\
&\times\pi(a|\theta)\tau_{-i}\left(\theta_{-i}|s,\beta(h)\right)T(s|h)\Bigg).
\end{aligned}$$

Construct

$$\begin{aligned}
\mathbf{Q}_{i}\left(h,\theta_{i},a\middle|\tau,\beta;V_{i}\right)={}&\overline{R}_{i}\left(h,\theta_{i},a\right)\\
&+\delta\sum_{s}\sum_{\theta,s}V_{i}\left(h,\theta\middle|\tau,\beta,\pi\right)\tau\left(\theta\middle|s,\beta(h)\right)T\left(s\middle|h\right)\mu_{i}\left(s\middle|\theta_{i};\tau\right).
\end{aligned}$$

Then,

$$\begin{aligned}
\text{RHS of (35)}=\sum_{s,\theta_{-i},a}\Bigg(&\mathbf{Q}_{i}\left(h,\theta_{i},a\middle|\tau,\beta;V_{i}\right)\\
&\times\pi(a|\theta)\tau_{-i}\left(\theta_{-i}|s,\beta(h)\right)T(s|h)\Bigg).
\end{aligned}$$

The constraint (EQ4) implies $V_{i}(h,\theta)=\sum_{a}\mathbf{Q}_{i}\left(h,\theta_{i},a\middle|\tau,\beta;V_{i}\right)\pi(a|\theta)$, and thus $Z(\pi,V|\tau)=0$. Hence, from (34) and (35), we have

$$\sum_{\theta,s}V_{i}\left(h,\theta\right)\tau(\theta|s,\beta(h))T(s|h)\geq\sum_{\theta_{-i},s}V_{i}\left(h,\theta\right)\tau_{-i}(\theta_{-i}|s,\beta_{-i}(h))T(s|h),$$

for all $\theta_{i}\in\Theta_{i}$, which implies (EQ1). Therefore, $\langle\pi,V\rangle\in\mathcal{E}\left(\beta\middle|\tau,C\right)$ with $Z(\pi,V|\tau)=0$. $\square$
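The construction of $\mathbf{Q}_{i}$ in this proof weights continuation values by agent $i$'s belief $\mu_{i}$ over states given its type. Assuming, for illustration, that this belief is the Bayes posterior $\mu_{i}(s|\theta_{i},h)\propto\tau_{i}(\theta_{i}|s,\beta_{i}(h))T(s|h)$ (the formal definition of $\mu_{i}$ lies outside this section), the Python sketch below computes it and then evaluates a Q-value of the form $\overline{R}_{i}(h,\theta_{i},a)+\delta\sum_{s}J_{i}(s,a)\mu_{i}(s|\theta_{i},h)$ for one action. The function names and toy numbers are hypothetical.

```python
def posterior(tau_i, T, theta_i):
    """Bayes rule: mu_i(s | theta_i, h) is proportional to tau_i(theta_i | s) * T(s | h)."""
    weights = {s: tau_i[s][theta_i] * p_s for s, p_s in T.items()}
    z = sum(weights.values())
    if z == 0.0:
        raise ValueError("theta_i has zero probability at this history")
    return {s: w / z for s, w in weights.items()}


def q_value(r_bar, J_next, mu, delta, a):
    """R_bar_i(h, theta_i, a) + delta * sum_s J_i(s, a) * mu_i(s | theta_i, h)."""
    return r_bar[a] + delta * sum(J_next[(s, a)] * p for s, p in mu.items())


T = {"s0": 0.6, "s1": 0.4}
tau_i = {"s0": {"lo": 0.9, "hi": 0.1}, "s1": {"lo": 0.2, "hi": 0.8}}
mu_hi = posterior(tau_i, T, "hi")                  # belief after observing type "hi"
r_bar = {"go": 1.0}                                # R_bar_i(h, "hi", a), hypothetical
J_next = {("s0", "go"): 2.0, ("s1", "go"): 5.0}    # continuation J_i(s, a), hypothetical
q = q_value(r_bar, J_next, mu_hi, delta=0.9, a="go")
```

Observing the type "hi" shifts the belief toward `s1` ($0.32/0.38\approx 0.84$), so the larger continuation value there dominates the Q-value.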

-H Proof of Theorem 2

To prove Theorem 2, we show that $\langle\tau,J,V\rangle\in\mathcal{E}^{\mathtt{GFPA}}\left(\pi\right)$ with $Z^{\mathtt{GFPA}}\left(\tau,J,V\middle|\pi\right)=0$ if and only if $\langle\tau,\pi\rangle$ is locally admissible.

Local Admissibility $\Longrightarrow$ PPME

Fix any $s\in\mathcal{S}$ and $h\in\mathcal{H}$. Suppose that $\langle\tau,\pi\rangle$ is locally admissible. From $\Delta_{i}\left(X^{s}_{i},\tau_{i},\bm{f}_{i}\right)=0$,

$$\nabla_{X^{s}_{i}}Z^{\mathtt{LFPA}}_{i}(X^{s}_{i},\tau_{i},s,h)-\sum_{\theta_{i}\in\Theta_{i}}f[\theta_{i}]\nabla_{X^{s}_{i}}\lambda_{i}(X^{s}_{i};\theta_{i},h)=0.$$

Since $\left\{\nabla_{X^{s}_{i}}\lambda_{i}(X^{s}_{i};\theta_{i},h)\right\}_{\theta_{i}\in\Theta_{i}}$ is a set of linearly independent vectors for all $X^{s}_{i}$, $i\in\mathcal{N}$, $s\in\mathcal{S}$, and $h\in\mathcal{H}$, we have, for all $i\in\mathcal{N}$,

$$f\left[\theta_{i}\right]=\tau_{i}\left(\theta_{i}\middle|s,g_{i},h\right),\quad\text{for all }\theta_{i}\in\Theta_{i}.\tag{38}$$

In the decomposition $\Theta_{i}=\Theta^{\natural}_{i}\cup\{\hat{\theta}_{i}\}$, the probability of $\hat{\theta}_{i}$ is fully determined by the probabilities of the types in $\Theta^{\natural}_{i}$. That is,

$$\tau_{i}\left(\hat{\theta}_{i}\middle|s,g_{i},h\right)=1-\sum_{\theta_{i}\in\Theta^{\natural}_{i}}\tau_{i}\left(\theta_{i}\middle|s,g_{i},h\right).$$

From $D_{i}\left(X^{s}_{i},\tau_{i}(\theta_{i}|\cdot),e_{i},b[\theta_{i}]\middle|s,h\right)=0$, we have $b[\theta_{i}]-e+\frac{\partial}{\partial\tau_{i}(\theta_{i}|\cdot)}M_{i}(X^{s}_{i},\tau_{i}(\theta_{i}|\cdot),s,h)=0$. Then, $b[\theta_{i}]=e+\lambda_{i}(X^{s}_{i};\hat{\theta}_{i},h)-\lambda_{i}(X^{s}_{i};\theta_{i},h)$, or equivalently, $e=-\lambda_{i}(X^{s}_{i};\hat{\theta}_{i},h)+\lambda_{i}(X^{s}_{i};\theta_{i},h)+b[\theta_{i}]$. From (38) and $\mathbf{K}(\bm{e},\bm{b},\bm{f};\tau,\bm{\lambda})=0$, we have, for all $\theta_{i}\in\Theta^{\natural}_{i}$,

$$\begin{aligned}
&\tau_{i}\left(\theta_{i}\middle|s,g_{i},h\right)\left(\lambda_{i}(X^{s}_{i};\hat{\theta}_{i},h)-\lambda_{i}(X^{s}_{i};\theta_{i},h)+e\right)=0,\\
&\tau_{i}\left(\hat{\theta}_{i}\middle|s,g_{i},h\right)\left(-\lambda_{i}(X^{s}_{i};\hat{\theta}_{i},h)+\lambda_{i}(X^{s}_{i};\theta_{i},h)+b[\theta_{i}]\right)=0,
\end{aligned}$$

and, for all $\theta_{i}\in\Theta_{i}$,

$$\tau_{i}\left(\theta_{i}\middle|s,g_{i},h\right)\lambda_{i}(X^{s}_{i};\theta_{i},h)=0,$$

which implies

$$\begin{aligned}
b[\theta_{i}]&=-\lambda_{i}(X^{s}_{i};\theta_{i},h),\quad\forall\theta_{i}\in\Theta^{\natural}_{i},\\
e&=-\lambda_{i}(X^{s}_{i};\hat{\theta}_{i},h).
\end{aligned}\tag{39}$$

Therefore, $\bm{F}\left(\bm{X}^{s},\tau,\bm{e},\bm{b},\bm{f}\right)=0$ and $\bm{K}\left(\bm{e},\bm{b},\bm{f};\tau,\bm{\lambda}\right)=0$ imply $Z^{\mathtt{LFPA}}_{i}(X^{s}_{i},\tau_{i},s,h)=0$, leading to $Z^{\mathtt{GFPA}}\left(\tau,J,V\middle|\pi\right)=0$. In addition, $\pi_{i}(a_{i}|\theta_{i})\gamma_{i}\left(J_{i},V_{i},\pi_{-i}|\tau_{i},\theta_{i},a_{i},h\right)=0$ implies that $Z(\pi,V|\tau)=0$. From Proposition 1, $\langle\tau,\pi\rangle$ with $V$ constitutes a PPME. Then, from Proposition 2, we have $\langle\tau,J,V\rangle\in\mathcal{E}^{\mathtt{GFPA}}\left(\pi,C\right)$ with $Z^{\mathtt{GFPA}}\left(\tau,J,V\middle|\pi\right)=0$.

PPME $\Longrightarrow$ Local Admissibility

Suppose that $\langle\tau,\pi\rangle$ is a PPME. Then, Proposition 2 implies $\langle\tau,J,V\rangle\in\mathcal{E}^{\mathtt{GFPA}}\left(\pi\right)$ with $Z^{\mathtt{GFPA}}\left(\tau,J,V\middle|\pi\right)=0$. Hence,

$$J_{i}(h)\geq\sum_{\theta_{-i},s}V_{i}(h,\theta_{i},\theta_{-i})\tau_{-i}(\theta_{-i}|s,g,h)T_{s}(s|h)$$

for all $i\in\mathcal{N}$, $h\in\mathcal{H}$, and $\theta_{i}\in\Theta_{i}$, which implies that $\lambda_{i}(X^{s}_{i};\theta_{i},h)\geq 0$. Since $Z^{\mathtt{GFPA}}\left(\tau,J,V\middle|\pi\right)=0$, we have

$$J_{i}(h)=\sum_{\theta,s}V_{i}(h,\theta)\tau(\theta|s,g,h)T_{s}(s|h).$$

Then, from the definition of $Z^{\mathtt{LFPA}}_{i}$ in (18), $Z^{\mathtt{LFPA}}_{i}(X^{s}_{i},\tau_{i},s,h)=0$. Since $\lambda_{i}(X^{s}_{i};\theta_{i},h)\geq 0$ for all $\theta_{i}\in\Theta_{i}$, we have

$$\tau_{i}(\theta_{i}|s,g,h)\lambda_{i}(X^{s}_{i};\theta_{i},h)=0.$$

By constructing $f[\theta_{i}]$ according to (38), and $b[\theta_{i}]$ and $e$ according to (39), we can show that there exist Lagrange multipliers such that the conditions in $\mathcal{R}\left(s,h\right)$ are satisfied.
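The multiplier construction can be checked mechanically. The Python sketch below, for a single agent at a fixed $(s,h)$, forms $b[\theta_{i}]$ and $e$ from the values $\lambda_{i}(X^{s}_{i};\theta_{i},h)$ as in (39) and verifies the stationarity relation $b[\theta_{i}]=e+\lambda_{i}(X^{s}_{i};\hat{\theta}_{i},h)-\lambda_{i}(X^{s}_{i};\theta_{i},h)$ together with the complementarity products derived above, given that $\tau_{i}\lambda_{i}=0$ type by type. Treating the $\lambda_{i}$ values as plain numbers rather than functions of $X^{s}_{i}$, and the function names, are simplifying assumptions.

```python
def recover_multipliers(lam, theta_hat):
    """Multipliers as in (39): b[theta] = -lam[theta] for theta != theta_hat, e = -lam[theta_hat]."""
    e = -lam[theta_hat]
    b = {th: -v for th, v in lam.items() if th != theta_hat}
    return b, e


def conditions_hold(tau, lam, b, e, theta_hat, tol=1e-12):
    """True if tau*lambda = 0, stationarity, and both complementarity products vanish (up to tol)."""
    ok = all(abs(tau[th] * lam[th]) <= tol for th in tau)          # tau_i * lambda_i = 0
    for th in b:                                                    # th ranges over Theta_i^natural
        ok &= abs(b[th] - (e + lam[theta_hat] - lam[th])) <= tol    # stationarity
        ok &= abs(tau[th] * (lam[theta_hat] - lam[th] + e)) <= tol
        ok &= abs(tau[theta_hat] * (-lam[theta_hat] + lam[th] + b[th])) <= tol
    return ok


tau = {"t1": 0.7, "t2": 0.0, "hat": 0.3}    # tau_i(theta | s, g_i, h), hypothetical
lam = {"t1": 0.0, "t2": 0.4, "hat": 0.0}    # lambda_i(X; theta, h), zero wherever tau > 0
b, e = recover_multipliers(lam, "hat")
ok = conditions_hold(tau, lam, b, e, "hat")
```

A positive $\lambda_{i}$ is allowed only on a type that receives zero probability; moving it onto a type with positive probability makes the check fail.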

From Proposition 1, $\langle\pi,V\rangle\in\mathcal{E}\left(\beta\middle|\tau\right)$ with $Z(\pi,V|\tau)=0$. Hence, for all $a_{i}\in A_{i}$,

$$\mathtt{EV}_{i}\left(h,\theta_{i}|\tau_{i},\pi,V_{i}\right)-\mathbb{E}^{\mu_{i}}_{\pi_{-i}}\Big[Q_{i}(h,\theta_{i},a_{i},\tilde{a}_{-i}|\tau;J_{i})\Big|h,\theta_{i}\Big]\geq 0.$$

Moreover, since $Z(\pi,V|\tau)=0$, this inequality holds with equality for every $a_{i}$ with $\pi_{i}(a_{i}|\theta_{i})>0$:

$$\mathtt{EV}_{i}\left(h,\theta_{i}|\tau_{i},\pi,V_{i}\right)-\mathbb{E}^{\mu_{i}}_{\pi_{-i}}\Big[Q_{i}(h,\theta_{i},a_{i},\tilde{a}_{-i}|\tau;J_{i})\Big|h,\theta_{i}\Big]=0.$$

Therefore, $\pi_{i}(a_{i}|\theta_{i})\gamma_{i}\left(J_{i},V_{i},\pi_{-i}|\tau_{i},\theta_{i},a_{i},h\right)=0$ for all $i\in\mathcal{N}$, $a_{i}\in A_{i}$, $\theta_{i}\in\Theta_{i}$, and $h\in\mathcal{H}$, and the constraints (FE1) and (FE2) hold. Thus, $\langle\tau,\pi\rangle$ is locally admissible. $\square$
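The final condition $\pi_{i}(a_{i}|\theta_{i})\gamma_{i}=0$, together with $\gamma_{i}\geq 0$, is a complementary-slackness statement: every action played with positive probability must attain the type-contingent expected value. The Python sketch below checks this for one type, under the illustrative assumption that $\gamma_{i}(a_{i})$ is the gap between $\mathtt{EV}_{i}$ and the expected Q-value of $a_{i}$, as in the displayed inequality; the Q-values are given numbers and the function name is hypothetical.

```python
def satisfies_slackness(q, pi, tol=1e-9):
    """Check gamma(a) >= 0 and pi(a) * gamma(a) = 0 for every action of one type.

    q[a]  -- expected Q-value of action a given (h, theta_i), others playing pi_{-i}
    pi[a] -- probability pi_i(a | theta_i); actions missing from pi have probability 0
    """
    ev = sum(pi.get(a, 0.0) * v for a, v in q.items())    # EV_i(h, theta_i)
    gamma = {a: ev - v for a, v in q.items()}
    nonnegative = all(g >= -tol for g in gamma.values())
    slack = all(abs(pi.get(a, 0.0) * g) <= tol for a, g in gamma.items())
    return nonnegative and slack


q = {"L": 2.0, "R": 2.0, "M": 1.0}
print(satisfies_slackness(q, {"L": 0.5, "R": 0.5}))   # True: mixes only over maximizers
print(satisfies_slackness(q, {"L": 0.5, "M": 0.5}))   # False: puts mass on a worse action
```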

-I Proof of Theorem 3

Consider a PPME $\langle\tau,\pi\rangle$ of a game $\mathtt{B}^{\Gamma}$ with any cognition cost profile $C\in\mathcal{F}$. The profile $\langle\tau,\pi\rangle$ and the base game model $\mathtt{B}$ induce the probability measures $P\left[\tau,\pi\right]$, $P\left[\tau,\pi\middle|h\right]$, and $P\left[\tau,\pi\middle|h,\theta_{i}\right]$. With a slight abuse of notation, we use $f(\cdot)$ to denote the marginal probability mass or density functions corresponding to $P\left[\tau,\pi\right]$; e.g., $f(a_{t}|h_{t})$, $f(c_{i,t}|h_{t})$, and $f(a_{i,t}|h_{t},\theta_{i,t})$. Given the PPME $\langle\tau,\pi\rangle$ of the game $\mathtt{B}^{\Gamma}$ with feasible cognition profile $\Gamma$, consider a profile $\langle\xi,\pi^{*}\rangle$ and an SBA cost profile $C^{*}$ that satisfy, for all $i\in\mathcal{N}$ and $t\geq 1$,

$$\pi^{*}_{i,t}(a_{i,t}|s_{t})\equiv f(a_{i,t}|s_{t})\quad\text{and}\quad C^{*}_{i,t}(s_{t},a_{i,t})\equiv\int_{\mathcal{C}}c_{i,t}f(c_{i,t}|s_{t},a_{i,t})\,\mathrm{d}c_{i,t}.$$
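The profile $\langle\xi,\pi^{*}\rangle$ and the cost $C^{*}$ are obtained by marginalizing the equilibrium distribution. The Python sketch below shows a discrete analogue for one agent and one state, assuming $\pi^{*}_{i}(a_{i}|s)=\sum_{\theta_{i}}\pi_{i}(a_{i}|\theta_{i})\tau_{i}(\theta_{i}|s)$ and replacing the integral that defines $C^{*}_{i}$ by a conditional expectation over a finite table of costs. Suppressing the history and period indices, assuming finitely many cost values, and the specific numbers and function names are simplifications for illustration.

```python
def marginal_policy(pi_i, tau_i_s):
    """pi*_i(a | s) = sum_theta pi_i(a | theta) * tau_i(theta | s)."""
    out = {}
    for theta, p_theta in tau_i_s.items():
        for a, p_a in pi_i[theta].items():
            out[a] = out.get(a, 0.0) + p_a * p_theta
    return out


def expected_cost(joint_ca):
    """C*(s, a) = E[c | s, a], from a finite table f(c, a | s) keyed by (c, a)."""
    num, den = {}, {}
    for (c, a), p in joint_ca.items():
        num[a] = num.get(a, 0.0) + c * p
        den[a] = den.get(a, 0.0) + p
    return {a: num[a] / den[a] for a in num if den[a] > 0.0}


tau_i_s0 = {"lo": 0.9, "hi": 0.1}                         # tau_i(theta | s0), hypothetical
pi_i = {"lo": {"stay": 1.0}, "hi": {"go": 1.0}}           # pi_i(a | theta), hypothetical
pi_star = marginal_policy(pi_i, tau_i_s0)                 # stay: 0.9, go: 0.1
joint = {(1.0, "stay"): 0.45, (2.0, "stay"): 0.45, (2.0, "go"): 0.1}
C_star = expected_cost(joint)                             # stay: 1.5, go: 2.0 (up to rounding)
```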

In the PPME $\langle\tau,\pi\rangle$ of the game $\mathtt{B}^{\Gamma}$, the H value function at any $h_{t}\in\mathcal{H}$ is given by

$$\begin{aligned}
J_{i,t}\left(h_{t}\middle|\tau,\pi\right)&=\mathbb{E}^{\tau}_{\pi}\left[R_{i}\left(\tilde{s}_{t},\tilde{a}_{t}\right)+\tilde{c}_{i,t}+\delta J_{i,t+1}\left(\tilde{s}_{t},\tilde{a}_{t}\middle|\tau,\pi\right)\middle|h_{t}\right]\\
&=\mathbb{E}^{\tau}_{\pi}\left[R_{i}\left(\tilde{s}_{t},\tilde{a}_{t}\right)\middle|h_{t}\right]+\mathbb{E}^{\tau}_{\pi}\left[\tilde{c}_{i,t}\middle|h_{t}\right]+\delta\mathbb{E}^{\tau}_{\pi}\left[J_{i,t+1}\left(\tilde{s}_{t},\tilde{a}_{t}\middle|\tau,\pi\right)\middle|h_{t}\right].
\end{aligned}\tag{40}$$

Given the profile $\langle\xi,\pi^{*}\rangle$, the H value function at any $h_{t}\in\mathcal{H}$ can be represented in terms of the EP-HSA value function as follows:

$$\begin{aligned}
J_{i,t}\left(h_{t}\middle|\xi,\pi^{*}\right)&=\mathbb{E}^{\xi}_{\pi^{*}}\left[W_{i,t}\left(\tilde{s}_{t},\tilde{a}_{t}\middle|\xi,\pi^{*}\right)\middle|h_{t}\right]\\
&=\sum_{s_{t},a_{t}}W_{i,t}\left(s_{t},a_{t}\middle|\xi,\pi^{*}\right)\pi^{*}_{t}(a_{t}|s_{t})T(s_{t}|h_{t})\\
&=\sum_{s_{t},a_{t}}R_{i}(s_{t},a_{t})\pi^{*}_{t}(a_{t}|s_{t})T(s_{t}|h_{t})\\
&\quad+\sum_{s_{t},a_{t}}C^{*}_{i,t}(s_{t},a_{t})\pi^{*}_{t}(a_{t}|s_{t})T(s_{t}|h_{t})\\
&\quad+\delta\sum_{s_{t},a_{t}}J_{i,t+1}\left(s_{t},a_{t}\middle|\xi,\pi^{*}\right)\pi^{*}_{t}(a_{t}|s_{t})T(s_{t}|h_{t})\\
&=\sum_{s_{t},a_{t}}R_{i}(s_{t},a_{t})f(a_{t}|s_{t})T(s_{t}|h_{t})\\
&\quad+\sum_{s_{t},a_{t}}\int_{\mathcal{C}}c_{i,t}f(c_{i,t}|s_{t},a_{t})\,\mathrm{d}c_{i,t}\,f(a_{t}|s_{t})T(s_{t}|h_{t})\\
&\quad+\delta\sum_{s_{t},a_{t}}J_{i,t+1}\left(s_{t},a_{t}\middle|\xi,\pi^{*}\right)f(a_{t}|s_{t})T(s_{t}|h_{t}).
\end{aligned}\tag{41}$$

Under $P\left[\tau,\pi\right]$, we have $f(a_{t}|s_{t})T(s_{t}|h_{t})=f(s_{t},a_{t}|h_{t})$. Hence,

$$\begin{aligned}
J_{i,t}\left(h_{t}\middle|\xi,\pi^{*}\right)&=\mathbb{E}^{\xi}_{\pi^{*}}\left[W_{i,t}\left(\tilde{s}_{t},\tilde{a}_{t}\middle|\xi,\pi^{*}\right)\middle|h_{t}\right]\\
&=\mathbb{E}^{\tau}_{\pi}\left[R_{i}\left(\tilde{s}_{t},\tilde{a}_{t}\right)+\tilde{c}_{i,t}+\delta J_{i,t+1}\left(\tilde{s}_{t},\tilde{a}_{t}\middle|\xi,\pi^{*}\right)\middle|h_{t}\right]\\
&=\mathbb{E}^{\tau}_{\pi}\left[W_{i,t}\left(\tilde{s}_{t},\tilde{a}_{t}\middle|\xi,\pi^{*}\right)\middle|h_{t}\right]=J_{i,t}\left(h_{t}\middle|\tau,\pi\right).
\end{aligned}$$
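The identity $f(a_{t}|s_{t})T(s_{t}|h_{t})=f(s_{t},a_{t}|h_{t})$ says that replacing the type-contingent policy by its type-marginalized version leaves every state-action expectation unchanged. The Python sketch below checks this for the one-period reward term, computing $\mathbb{E}[R(\tilde{s},\tilde{a})]$ once over the joint $(s,\theta,a)$ and once through the marginal policy. Collapsing all agents' types into a single $\theta$, and all names and numbers, are hypothetical simplifications.

```python
def reward_via_types(R, pi, tau, T):
    """E[R(s, a)] summed over the joint (s, theta, a) induced by T, tau, and pi."""
    return sum(
        R[(s, a)] * p_a * p_th * p_s
        for s, p_s in T.items()
        for th, p_th in tau[s].items()
        for a, p_a in pi[th].items()
    )


def reward_via_marginal(R, pi, tau, T):
    """Same expectation through pi*(a | s) = sum_theta pi(a | theta) tau(theta | s)."""
    total = 0.0
    for s, p_s in T.items():
        pi_star = {}
        for th, p_th in tau[s].items():
            for a, p_a in pi[th].items():
                pi_star[a] = pi_star.get(a, 0.0) + p_a * p_th
        total += sum(R[(s, a)] * p * p_s for a, p in pi_star.items())
    return total


T = {"s0": 0.6, "s1": 0.4}
tau = {"s0": {"lo": 0.9, "hi": 0.1}, "s1": {"lo": 0.2, "hi": 0.8}}
pi = {"lo": {"stay": 0.7, "go": 0.3}, "hi": {"go": 1.0}}
R = {("s0", "stay"): 1.0, ("s0", "go"): 0.0, ("s1", "stay"): -1.0, ("s1", "go"): 2.0}
joint_value = reward_via_types(R, pi, tau, T)
marginal_value = reward_via_marginal(R, pi, tau, T)
```

Both functions return $1.01$ for this instance, up to floating-point rounding.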

Next, we prove by contradiction that the profile $\langle\xi,\pi^{*}\rangle$ with $C^{*}$ is indeed a PI-PPME. Let $\hat{\xi}=\xi\circ\hat{\tau}_{i,t}\equiv\left(\xi_{1},\dots,\xi_{t-1},(\hat{\tau}_{i,t},\xi_{-i,t}),\xi_{t+1},\dots\right)$ denote a profile that is the same as $\xi$ except for agent $i$'s period-$t$ choice $\hat{\tau}_{i,t}$. Define $\hat{\pi}=\pi^{*}\circ\hat{\pi}_{i,t}$ in the same way, such that $\hat{\pi}\in\Pi\left[\hat{\xi};\mathtt{B}^{\Gamma\left[\hat{\xi}\right]}\right]$. The cognition cost $C^{*}$ remains unchanged. Given any history $h_{t}\in\mathcal{H}$, the profile $\langle\hat{\xi},\hat{\pi}\rangle$ induces the H value function

$$\begin{aligned}
J_{i,t}\left(h_{t}\middle|\hat{\xi},\hat{\pi}\right)&=\sum_{s_{t},a_{t},\theta_{i,t}}R_{i}(s_{t},a_{t})f(a_{t}|s_{t},\theta_{i,t})\hat{\tau}_{i,t}(\theta_{i,t}|s_{t},h_{t})T(s_{t}|h_{t})\\
&\quad+\sum_{s_{t},a_{t},\theta_{i,t}}\int_{\mathcal{C}}c_{i,t}f(c_{i,t}|s_{t},a_{t})\,\mathrm{d}c_{i,t}\,f(a_{t}|s_{t},\theta_{i,t})\hat{\tau}_{i,t}(\theta_{i,t}|s_{t},h_{t})T(s_{t}|h_{t})\\
&\quad+\delta\sum_{s_{t},a_{t},\theta_{i,t}}J_{i,t+1}\left(s_{t},a_{t}\middle|\xi,\pi^{*}\right)f(a_{t}|s_{t},\theta_{i,t})\hat{\tau}_{i,t}(\theta_{i,t}|s_{t},h_{t})T(s_{t}|h_{t}).
\end{aligned}$$

Here, in general, $f(a_{t}|s_{t},\theta_{i,t})\hat{\tau}_{i,t}(\theta_{i,t}|s_{t},h_{t})T(s_{t}|h_{t})\neq f(s_{t},a_{t},\theta_{i,t}|h_{t})$, because $f(\cdot)$ corresponds to $P\left[\tau,\pi\right]$. If $\langle\xi,\pi^{*}\rangle$ with $C^{*}$ is not a PI-PPME, then there must exist a history $h_{t}$ and a profile $\langle\hat{\xi},\hat{\pi}\rangle$ such that $J_{i,t}(h_{t}|\hat{\xi},\hat{\pi})>J_{i,t}(h_{t}|\xi,\pi^{*})=J_{i,t}(h_{t}|\tau,\pi)$. This implies that $\langle\tau,\pi\rangle$ can be strictly improved by the unilateral deviation $(\hat{\xi}_{i,t},\hat{\pi}_{i,t})$, contradicting the fact that $\langle\tau,\pi\rangle$ is a PPME. $\square$