Stochastic Game with Interactive Information Acquisition:
Pipelined Perfect Markov Bayesian Equilibrium
Version 05 October, 2023

Tao Zhang, Quanyan Zhu

Tao Zhang and Quanyan Zhu are with the Department of Electrical and Computer Engineering, New York University, 370 Jay Street, Brooklyn, NY 11201, USA. {tz636, qz494}@nyu.edu

Abstract

This paper studies a multi-player, general-sum stochastic game characterized by a dual-stage temporal structure per period. The agents face uncertainty regarding the time-evolving state that is realized at the beginning of each period. During the first stage, agents engage in information acquisition regarding the unknown state. Each agent strategically selects from multiple signaling options, each carrying a distinct cost. The selected signaling rule dispenses private information that determines the type of the agent. In the second stage, the agents play a Bayesian game by taking actions contingent on their private types. We introduce an equilibrium concept, Pipelined Perfect Markov Bayesian Equilibrium (PPME), which incorporates the Markov perfect equilibrium and the perfect Bayesian equilibrium. We propose a novel equilibrium characterization principle termed fixed-point alignment and deliver a set of verifiable necessary and sufficient conditions for any strategy profile to achieve PPME.

I Introduction

Driven by advancements in technology and improved computational intelligence, the widespread, cost-effective deployment of sensing systems is making it possible for individual participants in large-scale cyber-physical systems to access and process vast amounts of information for real-time decision-making. However, this explosion of information concurrently cultivates an era of uncertainty, largely owing to the complex and escalating disparities in information.

Addressing these uncertainties has emerged as a critical cognitive aspect of rational decision-making in environments dominated by asymmetric information from various sources, such as experts and service providers, to mitigate the detrimental effects of uncertainty. For instance, consider a transportation network featuring multiple heterogeneous traffic information providers (TIPs) (e.g., [1]). Here, intelligent vehicles (the agents) subscribe to TIPs to gain insights into global traffic conditions and available routes, thereby reducing information asymmetry.

As rational actors, each intelligent vehicle selects TIP subscriptions by incorporating information accuracy, often derived from customer reviews, and the subscription costs into its anticipated daily travel requirements typically expressed through payoff functions. Supply-chain service providers (SCPs), for example, may opt for more accurate but costlier TIPs to minimize their expected operational costs due to traffic congestion, whereas individual travelers might tolerate traffic delays and choose less accurate but cheaper TIPs.

The selection of TIPs has direct effects on the routing decisions of intelligent vehicles, which in turn impacts the traffic conditions of the network. Consequently, these traffic conditions influence the travel costs of all vehicles within the network and trigger information updates from their subscribed TIPs. A self-interested, rational SCP might even foresee such interactions and strategically choose TIPs and routing decisions to manipulate the network and its competitors.

The heightened emphasis on rationality necessitates a reconsideration of agents’ information acquisition from a passive act of receiving information to an integral element of rational behavior. The significance of this expanded rationality is two-pronged. Firstly, with the deluge of information that is often irrelevant, deceptive, or even manipulative, agents must possess the ability to identify and gather information that supports their decision-making processes. Secondly, due to their dynamic interactions with other agents and the environment, the choices of information sources not only influence an agent’s local actions but also the decision-making of others, as well as the operation of the environment (through the actions), which in turn affects the agent’s own utilities.

A Bayesian agent typically manages uncertainties by relying on priors and forming posterior beliefs [2] about the unobserved elements of the game, or by establishing belief hierarchies (beliefs about the state as well as others’ beliefs; see, e.g., [3, 4]) in competitive multi-agent environments. These priors and beliefs fundamentally shape the rational decision-making of agents since they characterize the game’s uncertainty. The concept of Bayesian persuasion studies how a principal can use her informational advantage to strategically reveal noisy information about the decision-relevant state to the agents, thereby influencing and manipulating agents’ beliefs to induce behavior in her favor [2, 5, 6, 7, 4].

In this work, we focus on a discrete-time, infinite-horizon stochastic game with interactive information acquisition (SGIA) played by a finite group of self-interested agents. SGIA is structured around two sequential decision-making stages within each time period. During the initial stage, agents engage in interactive information acquisition, aiming to procure noisy information about an unknown state, which is realized at the beginning of each period. The nature and quality of the information obtained in this stage subsequently define each agent’s type, which is her private information. Each agent $i$ has the autonomy to decide how she gets informed about the unknown state by choosing a specific signaling rule that incurs a cost. We consider that the choices of signaling rules made by other agents do not directly influence the type generation of agent $i$. However, they impact agent $i$’s beliefs regarding the unknown state and the types of other agents. The observation of private types instigates the second stage of the game. Here, each agent decides on a course of regular action, contingent upon her type. Subsequently, the state evolves according to a Markovian dynamic, dependent on the current state and the action profiles of the agents. The decision-making in each stage is simultaneous.

Built upon the concepts of Markov perfect equilibrium [8] and perfect Bayesian equilibrium [9], we propose a new equilibrium notion referred to as the pipelined perfect Markov Bayesian equilibrium (PPME). This concept encapsulates the core consistency between the optimalities of agents’ information acquisition and regular action-taking in a Markovian dynamic environment.

We characterize the PPME based on a principle known as the fixed-point alignment. By fixing the strategies for the information acquisition stage, we first formulate the equilibrium behaviors of the regular action-taking as a constrained optimization problem according to the nonlinear optimization formulation for Nash equilibria of a stochastic game (e.g., [10, 11, 12, 13]). We then propose the global fixed-point alignment (GFPA) to characterize the selected signaling rule profiles that match the fixed point of the optimal information acquisition at the first stage to the fixed point from the optimal action choices at the second stage. The GFPA process can be conceptualized as if there is an information designer who aims to induce certain behaviors (i.e., action-taking) of the agents by designing a set of available signaling rules for the agents to choose. Involving the agents’ autonomy of choosing signaling rules at the first stage distinguishes our model from the equilibrium analyses in existing Bayesian persuasion or information design in static environments (e.g., [2, 3, 5, 6]) as well as in dynamic models (e.g., [7, 4, 14]). By decomposing the problem of GFPA into local fixed-point alignment (LFPA) problems, we obtain a set of verifiable conditions known as local admissibility by applying a KKT-like process to the LFPA problems. Under a mild condition, we show that the local admissibility serves as a necessary and sufficient condition, placed on the signaling rules selections and action-takings, for PPME. Thus, if an algorithm converges to locally admissible points, then it provides a PPME for the stochastic game with interactive information acquisition.

The remainder of the paper is organized as follows. In Section II, we present a formal description of the stochastic game model and introduce the equilibrium concept of pipelined perfect Markov Bayesian equilibrium (PPME). Section III-A introduces the concept of global fixed-point alignment (GFPA), while Section III-B provides a detailed elaboration on local fixed-point alignment (LFPA). Section III-C provides discussions and concludes the paper. Omitted proofs are relegated to the online appendix in [15].

Figure 1: Flow of events in each period of the game $\mathtt{B}^{\Gamma}$.

II Problem Formulation and Equilibrium

II-A Base Game Model

A finite-player infinite-horizon stochastic game can be characterized by a tuple $\mathtt{B}=\left<N,S,\left\{A_{i}\right\},\left\{R_{i}\right\},T,\mathring{T},\delta\right>$ in which

• There is a finite number of players, denoted by $N=\{1,2,\dots,n\}$.

• The finite set of states that players can encounter at each period is denoted by $S$.

• The finite set of actions that player $i$ can take is denoted by $A_{i}$.

• The payoff function of player $i$ is denoted by $R_{i}:S\times A\mapsto\mathbb{R}$, where $A=\prod_{i\in N}A_{i}$.

• The state evolves over time according to $T:S\times A\mapsto\Delta\left(S\right)$; i.e., the probability of $s_{t+1}$ is given by $T(s_{t+1}|s_{t},a_{t})$ when the period-$t$ state is $s_{t}\in S$ and the action profile is $a_{t}\in A$. $\mathring{T}(\cdot)\in\Delta(S)$ is the initial distribution of the state.

• $\delta\in(0,1)$ is the common discount factor.
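To make the tuple concrete, the following is a minimal Python sketch of a container for $\mathtt{B}$, assuming integer-coded states and actions. The class name `BaseGame`, its field names, and the two-player toy rewards and transitions are hypothetical illustrations and are not part of the model.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple

State = int
ActionProfile = Tuple[int, ...]


@dataclass
class BaseGame:
    """Hypothetical container for B = <N, S, {A_i}, {R_i}, T, T0, delta>."""
    n_players: int
    states: List[State]
    actions: List[List[int]]                                   # actions[i] = A_i
    reward: Callable[[int, State, ActionProfile], float]       # R_i(s, a)
    transition: Callable[[State, ActionProfile], Dict[State, float]]  # T(.|s, a)
    initial: Dict[State, float]                                # initial distribution
    delta: float                                               # discount factor

    def action_profiles(self) -> List[ActionProfile]:
        """Enumerate A as the Cartesian product of the A_i."""
        return list(product(*self.actions))

    def validate(self, tol: float = 1e-9) -> bool:
        """Check delta in (0,1) and that every distribution is a valid one."""
        if not 0.0 < self.delta < 1.0:
            return False
        if abs(sum(self.initial.values()) - 1.0) > tol:
            return False
        for s in self.states:
            for a in self.action_profiles():
                dist = self.transition(s, a)
                if min(dist.values()) < 0.0:
                    return False
                if abs(sum(dist.values()) - 1.0) > tol:
                    return False
        return True


def toy_reward(i: int, s: State, a: ActionProfile) -> float:
    # Toy payoff (assumption): agent i earns 1 when her action matches the state.
    return 1.0 if a[i] == s else 0.0


def toy_transition(s: State, a: ActionProfile) -> Dict[State, float]:
    # Toy dynamics (assumption): the state is stickier when the actions agree.
    stay = 0.7 if a[0] == a[1] else 0.5
    return {s: stay, 1 - s: 1.0 - stay}


game = BaseGame(2, [0, 1], [[0, 1], [0, 1]], toy_reward, toy_transition,
                {0: 0.5, 1: 0.5}, 0.9)
```

Since $H\equiv S\times A$, the histories of this toy game can be enumerated as pairs of a state and an element of `game.action_profiles()`.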

For notational compactness, we use history, denoted by $h_{t}\in H$ , to capture the pair of state and action profile of the last period; i.e., $h_{t}\equiv(s_{t-1},a_{t-1})$ and $H\equiv S\times A$ . We assume that $h_{t}$ is common knowledge; i.e., $s_{t}$ and $a_{t}$ become publicly observable at the end of each period $t$ .

II-B Interactive Information Acquisition

We consider that the agents do not observe the realizations of the state at the beginning of each period. Instead, in each period $t$, the agents engage in interactive information acquisition to obtain additional information about the unobserved state $s_{t}\in S$.

Based on history $h_{t}\in H$, agents’ interactive information acquisition leads to an information profile, denoted by $\mathcal{I}\left[h_{t}\middle|g_{t}\right]\equiv\left<p(\cdot|h_{t},g_{t}),\tau(\cdot,g_{t}),\Theta\right>$, at the beginning of each period $t$, where $g_{t}\equiv(g_{i,t})_{i\in N}\in G\equiv\prod_{i\in N}G_{i}$ represents the profile of the agents’ choices of information acquisition. Here, $\Theta\equiv\prod_{i\in N}\Theta_{i}$ is the profile of finite type spaces for the agents. $p(\cdot|h_{t},g_{t})\in\Delta(S\times\Theta)$ is the joint probability of the state and the type profile. $\tau(\cdot|s_{t},g_{t})\in\Delta(\Theta)$ is the signaling rule profile that generates a type profile $\theta_{t}=(\theta_{i,t})_{i\in N}\in\Theta$ for the agents. The signaling rule profile is independent if $\tau(\theta_{t}|s_{t},g_{t})=\prod_{i\in N}\tau_{i}(\theta_{i,t}|s_{t},g_{i,t})$, or correlated if the marginal concerning each agent $i$, $\tau_{i}(\theta_{i,t}|s_{t},g_{i,t},g_{-i,t})$, depends on $g_{-i,t}$. In this work, we focus on independent signaling rules. In addition, we refer to $g_{t}=(g_{i,t})_{i\in N}\in G$ as the cognition choice profile, where each $g_{i,t}\in G_{i}$ is agent $i$’s cognition choice that indexes the agent’s choice of $\tau_{i}(\cdot|s_{t},g_{i,t})$. We assume that the cardinalities of $G_{i}$ and $\Theta_{i}$ are the same for all agents; i.e., $\left|G_{i}\right|=\left|G_{j}\right|$ and $\left|\Theta_{i}\right|=\left|\Theta_{j}\right|$, for all $i\neq j$.

Given the state dynamics specified by the base game $\mathtt{B}$, we assume that the information profile $\mathcal{I}\left[h_{t}\middle|g_{t}\right]$ satisfies

$\displaystyle\sum\nolimits_{\theta_{t}\in\Theta}p\left(s_{t},\theta_{t}\middle|h_{t},g_{t}\right)=T\left(s_{t}\middle|h_{t}\right),$ (1)

$\displaystyle\tau\left(\theta_{t}\middle|s_{t},g_{t}\right)=\frac{p(s_{t},\theta_{t}|h_{t},g_{t})}{T(s_{t}|h_{t})}.$ (2)
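Conditions (1) and (2) can be checked numerically. The Python sketch below assumes independent signaling rules and builds the joint distribution as $p(s,\theta|h,g)=T(s|h)\prod_{i}\tau_{i}(\theta_{i}|s,g_{i})$ for one fixed history and cognition choice profile. The function name `joint_prior` and all numbers are hypothetical.

```python
from itertools import product


def joint_prior(T_h, tau, types):
    """Return p[(s, theta)] = T(s|h) * prod_i tau[i][s][theta_i]."""
    p = {}
    for s, prob_s in T_h.items():
        for theta in product(*types):
            weight = prob_s
            for i, theta_i in enumerate(theta):
                weight *= tau[i][s][theta_i]
            p[(s, theta)] = weight
    return p


# Toy primitives (assumptions): two states, two agents, binary types.
T_h = {0: 0.6, 1: 0.4}                       # T(.|h) for one fixed history h
types = [[0, 1], [0, 1]]
tau = [
    {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}},   # agent 0: informative rule
    {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}},   # agent 1: uninformative rule
]

p = joint_prior(T_h, tau, types)

# Condition (1): summing p over type profiles recovers T(.|h).
marginal_s = {s: sum(v for (s2, _), v in p.items() if s2 == s) for s in T_h}

# Condition (2): dividing p by T(.|h) recovers the signaling rule profile.
recovered = {(s, th): v / T_h[s] for (s, th), v in p.items()}
```

With independent rules the construction satisfies both conditions by design, which suggests that every profile of independent rules belongs to $\mathcal{T}\left[G,\Theta\right]$ whenever $T(s|h)>0$.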

Given the base game $\mathtt{B}$ , define the set of available signaling rule profiles by

\mathcal{T}\left[G,\Theta\right]\equiv\left\{\tau\middle|\begin{aligned} &\tau_{i}\left(\cdot,g_{i,t}\right):S\mapsto\Delta\left(\Theta_{i}\right),\forall i\in N,\\ &g_{i,t}\in G_{i},\textup{s.t. }\exists p(\cdot|h_{t},g_{t})\in\Delta(S\times\Theta)\\ &\textup{ satisfying (1) and (2)},\forall h_{t}\in H\end{aligned}\right\}.

(3)

After all agents have made their cognition choices, each agent $i$ privately receives a type $\theta_{i,t}\in\Theta_{i}$ with probability $\tau_{i}(\theta_{i,t}|s_{t},g_{i,t})$. Based on her type, agent $i$ chooses an action $a_{i,t}\in A_{i}$.

Each agent $i$’s information acquisition induces a cognition cost. In this work, we restrict attention to the cognition cost that depends on the true state and the action taken by the agent. That is, after choosing $g_{i,t}$, agent $i$ suffers a cost $C_{i}(s_{t},a_{i,t})$ that is realized at the end of each period when the true state is $s_{t}$ and agent $i$ takes action $a_{i,t}$. This cost scheme prices agent $i$’s information acquisition based on the consequences of the information acquisition (i.e., the agent’s local action $a_{i,t}$) when the true state is $s_{t}$.

II-C Stochastic Game with Interactive Information Acquisition

Let $\Gamma\equiv\left\{\mathcal{T}^{\dagger}\left[G,\Theta\right],C\right\}$ denote the cognition scheme for some $\mathcal{T}^{\dagger}\left[G,\Theta\right]\subseteq\mathcal{T}\left[G,\Theta\right]$, where $C=\{C_{i}\}_{i\in N}$. The base game $\mathtt{B}$ and the cognition scheme $\Gamma$ induce a stochastic game with interactive information acquisition (SGIA), denoted by $\mathtt{B}^{\Gamma}$. Each agent $i$’s decision making in each period $t$ is described as follows.

• Contingent on the history $h_{t}$, agent $i$ uses a pure-strategy selection policy $\beta_{i,t}(h_{t})\in G_{i}$ to select $g_{i,t}\in G_{i}$.

• Contingent on the type $\theta_{i,t}$, agent $i$ uses a mixed-strategy policy $\pi_{i}$ to choose the mixed action $\pi_{i,t}(\theta_{i,t},h_{t})=\left(\pi_{i}(a|\theta_{i,t},h_{t})\right)_{a\in A_{i}}\in\Delta(A_{i})$.

We say that a policy profile $\pi=(\pi_{i,t},\pi_{-i,t})_{t\geq 1}$ is feasible if it satisfies the following constraints:

$\displaystyle\pi_{i,t}(a_{i,t}|\theta_{i,t})\geq 0,\forall a_{i,t}\in A_{i},\theta_{i,t}\in\Theta_{i},i\in N,t\geq 1,$ (FE1)

$\displaystyle\sum\nolimits_{a_{i,t}\in A_{i}}\pi_{i,t}(a_{i,t}|\theta_{i,t})=1,\forall\theta_{i,t}\in\Theta_{i},i\in N,t\geq 1.$ (FE2)
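The feasibility constraints amount to requiring that each row $\pi_{i}(\cdot|\theta_{i})$ is a probability vector. A minimal Python sketch of such a check for a single agent follows; the nested-dictionary layout and the function name `is_feasible` are hypothetical choices made here for illustration.

```python
def is_feasible(policy, tol=1e-9):
    """Check (FE1) and (FE2), where policy[theta_i][a_i] = pi_i(a_i | theta_i)."""
    for dist in policy.values():
        if any(prob < -tol for prob in dist.values()):   # (FE1): nonnegativity
            return False
        if abs(sum(dist.values()) - 1.0) > tol:          # (FE2): rows sum to one
            return False
    return True


good = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.0, 1: 1.0}}
bad = {0: {0: 0.0, 1: 0.0}, 1: {0: 0.5, 1: 0.5}}   # an all-zero row violates (FE2)
```

The second example shows how (FE2) excludes a policy that assigns zero probability to every action.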

With reference to Fig. 1, the following events occur in each period $t$ of $\mathtt{B}^{\Gamma}$ .

1. Nature draws a state $s_{t}\in S$ according to $T(\cdot|h_{t})\in\Delta\left(S\right)$.

2. Each agent $i$ selects a cognition choice $g_{i,t}\in G_{i}$, based on the common history, which determines $\tau_{i}$. These selections are simultaneous.

3. Based on the cognition choice profile $g_{t}$, a type profile $\theta_{t}=\left(\theta_{i,t}\right)_{i\in N}$ is drawn with probability $\tau(\theta_{t}|s_{t},g_{t})$.

4. Each agent $i$ privately observes her type $\theta_{i,t}$ and then chooses an action $a_{i,t}$ with probability $\pi_{i,t}(a_{i,t}|\theta_{i,t})$.

5. The state $s_{t}$ and the action profile $a_{t}=(a_{i,t},a_{-i,t})$ (or, $h_{t+1}=(s_{t},a_{t})$) become public information, and the state $s_{t}$ transitions to a new state $s_{t+1}$ according to $T(\cdot|h_{t+1})\in\Delta(S)$.
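The five events above can be sketched as a short simulation loop. The Python code below is an illustration under assumed toy primitives: the rule names `"accurate"` and `"noisy"`, the function names `sample` and `play_period`, and all probabilities are hypothetical and only serve to show the order of events.

```python
import random


def sample(dist, rng):
    """Draw one outcome from a {outcome: probability} dictionary."""
    r, acc = rng.random(), 0.0
    for outcome, prob in dist.items():
        acc += prob
        if r < acc:
            return outcome
    return outcome  # guard against floating-point rounding


def play_period(h, T, beta, tau, pi, rng):
    """Run one period of the game from history h and return the next history."""
    n = len(beta)
    s = sample(T(h), rng)                                           # event 1
    g = tuple(beta[i](h) for i in range(n))                         # event 2
    theta = tuple(sample(tau[i][g[i]][s], rng) for i in range(n))   # event 3
    a = tuple(sample(pi[i](theta[i], h), rng) for i in range(n))    # event 4
    h_next = (s, a)                                                 # event 5
    return s, g, theta, a, h_next


def T(h):
    # Toy dynamics (assumption): the state is stickier when past actions agree.
    s_prev, a_prev = h
    stay = 0.7 if a_prev[0] == a_prev[1] else 0.5
    return {s_prev: stay, 1 - s_prev: 1.0 - stay}


accurate = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}   # tau_i(theta_i | s)
noisy = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.4, 1: 0.6}}
tau = [{"accurate": accurate, "noisy": noisy}] * 2
beta = [lambda h: "accurate", lambda h: "noisy"]          # pure selection policies
pi = [lambda theta_i, h: {theta_i: 0.8, 1 - theta_i: 0.2}] * 2

rng = random.Random(0)
h = (0, (0, 0))
trajectory = []
for _ in range(5):
    s, g, theta, a, h = play_period(h, T, beta, tau, pi, rng)
    trajectory.append((s, g, theta, a))
```

Each agent's type is drawn from her own rule only, which reflects the independence assumption, while the next history depends on the whole action profile.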

II-D Value Functions

According to the Ionescu Tulcea theorem [16], $\{\mathring{T},T,\tau\}$ , the agents’ policy profile $<\beta,\pi>$ , and a cost profile (for a certain cost scheme) uniquely define a probability measure, denoted by $P[\tau,\beta,\pi]$ , over $(S\times G\times\Theta\times A)^{\infty}$ . Let $\mathbb{E}^{\tau}_{\beta,\pi}[\cdot]$ denote the expectation operator with respect to $P[\tau,\beta,\pi]$ . In addition, given any $h_{t}$ and $(h_{t},\theta_{i,t})$ , we obtain unique probability measures (perceived by agent $i$ ) $P[\tau,\beta,\pi|h_{t}]$ and $P[\tau,\beta,\pi|h_{t},\theta_{i,t}]$ over $S\times G_{-i}\times\Theta\times A\times(S\times G\times\Theta\times A)^{\infty}$ and $S\times\Theta_{-i}\times A\times(S\times G\times\Theta\times A)^{\infty}$ , respectively, for all $i\in N$ . In particular, $P[\tau,\beta,\pi|h_{t}]$ models the uncertainty at period $t$ for each agent $i$ at the beginning of the selection stage while $P[\tau,\beta,\pi|h_{t},\theta_{i,t}]$ models the uncertainty for each agent $i$ at the beginning of the primitive stage. Let $\mathbb{E}^{\tau}_{\beta,\pi}[\cdot|h_{t}]$ and $\mathbb{E}^{\tau}_{\beta,\pi}[\cdot|h_{t},\theta_{i,t}]$ , respectively, denote the expectation operators with respect to $P[\tau,\beta,\pi|h_{t}]$ and $P[\tau,\beta,\pi|h_{t},\theta_{i,t}]$ .

Given $P[\tau,\beta,\pi|h_{t}]$ , agent $i$ ’s period- $t$ history value function (H value function) is defined by

\displaystyle J_{i,t}(h_{t}|\tau,\beta,\pi)\equiv\mathbb{E}^{\tau}_{\beta,\pi}\left[\sum_{k=t}^{\infty}\delta^{k-t}\left(R_{i}(\tilde{s}_{k},\tilde{a}_{k})+C_{i}\left(\tilde{s}_{k},\tilde{a}_{i,k}\right)\right)\middle|h_{t}\right].

(H)

After a type $\theta_{i,t}$ is realized, each agent $i$ forms a posterior belief, denoted by $\mu_{i}(\cdot|\theta_{i,t},h_{t})\in\Delta(S\times\Theta_{-i})$ , over the state $s_{t}$ and other agents’ contemporaneous types $\theta_{-i,t}$ . Due to (3), there is a (period- $t$ ) common prior $p$ such that each posterior belief satisfies

\mu_{i}\left(s_{t},\theta_{-i,t}\middle|\theta_{i,t},h_{t},g_{t}\right)=\frac{p\left(s_{t},\theta_{i,t},\theta_{-i,t}\middle|h_{t},g_{t}\right)}{\sum\nolimits_{s_{t},\theta_{-i,t}}p\left(s_{t},\theta_{i,t},\theta_{-i,t}\middle|h_{t},g_{t}\right)}.
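The posterior is an application of Bayes' rule to the common prior. The Python sketch below computes $\mu_{i}$ from a joint distribution stored as a dictionary, for one fixed history and cognition choice profile. The function name `posterior` and the toy numbers are hypothetical.

```python
from itertools import product

# Toy common prior (assumption): p(s, theta) = T(s|h) * tau_0 * tau_1.
T_h = {0: 0.6, 1: 0.4}
TAU = [
    {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}},   # agent 0: informative rule
    {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}},   # agent 1: uninformative rule
]
p = {
    (s, th): T_h[s] * TAU[0][s][th[0]] * TAU[1][s][th[1]]
    for s in T_h
    for th in product([0, 1], repeat=2)
}


def posterior(p, i, theta_i):
    """Return mu_i[(s, theta_-i)] given agent i's realized type theta_i."""
    mass = {k: v for k, v in p.items() if k[1][i] == theta_i}
    total = sum(mass.values())
    if total <= 0.0:
        raise ValueError("type has zero probability under the prior")
    return {(s, th[:i] + th[i + 1:]): v / total for (s, th), v in mass.items()}


mu0 = posterior(p, 0, 0)
# Marginal of the posterior over the state, mu_0(s | theta_0 = 0, h).
mu0_state = {s: sum(v for (s2, _), v in mu0.items() if s2 == s) for s in T_h}
```

In this example, observing type 0 moves agent 0's probability of state 0 from the prior 0.6 to $0.48/0.60=0.8$.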

With abuse of notation, we use $\mu_{i}(s_{t}|\theta_{i,t},h_{t})$ and $\mu_{i}(\theta_{-i,t}|\theta_{i,t},h_{t})$ for the marginals of $\mu_{i}(s_{t},\theta_{-i,t}|\theta_{i,t},h_{t})$ . Given $h_{t}$ and $\theta_{i,t}$ , we define each agent $i$ ’s expected immediate reward (due to agent $i$ ’s uncertainty about $s_{t}$ ) by

\overline{R}_{i}\left(h_{t},\theta_{i,t},a_{t}\right)\equiv\sum\limits_{s_{t}\in S}\left(R_{i}(s_{t},a_{t})+C_{i}\left(s_{t},a_{i,t}\right)\right)\mu_{i}(s_{t}|\theta_{i,t},h_{t}).

Given $P[\tau,\beta,\pi|h_{t},\theta_{i,t}]$, agent $i$’s period-$t$ history-type value function (HT value function) is defined by

$\displaystyle V_{i,t}(h_{t},\theta_{t}|\tau,\beta,\pi)\equiv\mathbb{E}^{\tau}_{\beta,\pi(\cdot|\theta_{t})}\Bigg[\overline{R}_{i}(h_{t},\theta_{i,t},\tilde{a}_{t})+\sum_{k=t+1}^{\infty}\delta^{k-t}\left(R_{i}(\tilde{s}_{k},\tilde{a}_{k})+C_{i}\left(\tilde{s}_{k},\tilde{a}_{i,k}\right)\right)\Bigg|h_{t},\theta_{i,t}\Bigg].$ (HT)

Here, $V_{i,t}(\cdot)$ depends on $\theta_{-i,t}$ through the policy profile $\pi(\cdot|\theta_{t})$ to take expectation over the current-period action profile. Finally, agent $i$ ’s period- $t$ history-type-action value function (HTA value function) is defined by

$\displaystyle Q_{i,t}(h_{t},\theta_{i,t},a_{t}|\tau,\beta,\pi)\equiv\overline{R}_{i}(h_{t},\theta_{i,t},a_{t})+\mathbb{E}^{\tau}_{\beta,\pi}\left[\sum_{k=t+1}^{\infty}\delta^{k-t}\left(R_{i}(\tilde{s}_{k},\tilde{a}_{k})+C_{i}\left(\tilde{s}_{k},\tilde{a}_{i,k}\right)\right)\middle|h_{t},\theta_{i,t}\right].$ (HTA)

Here, $Q_{i,t}(\cdot)$ is independent of $\theta_{-i,t}$ given the action profile $a_{t}$ .

II-E Pipelined Perfect Markov Bayesian Equilibrium

In this section, we define a new equilibrium concept for the game $\mathtt{B}^{\Gamma}$ . Our focus lies on the stationary equilibrium, and as such, we omit the time indexes of the value functions and variables, unless explicitly mentioned otherwise. First, we construct

$\displaystyle\mathtt{EV}_{i}\left(h,\theta_{i}\middle|\tau,\beta,\pi;V_{i}\right)\equiv\mathbb{E}^{\tau}_{\beta}\left[V_{i}(h,\theta_{i},\tilde{\theta}_{-i}|\tau,\beta,\pi)\middle|h,\theta_{i}\right],$ (4)

$\displaystyle\Pi\left[\beta;\mathtt{B}^{\Gamma}\right]\equiv\left\{\pi\middle|\begin{aligned} &\mathtt{EV}_{i}\left(h,\theta_{i}\middle|\tau,\beta,\left(\pi_{i},\pi_{-i}\right);V_{i}\right)\\ &\geq\mathtt{EV}_{i}\left(h,\theta_{i}\middle|\tau,\beta,\left(\pi^{\prime}_{i},\pi_{-i}\right);V_{i}\right),\\ &\forall i\in N,\theta_{i}\in\Theta_{i},\pi^{\prime}_{i},\textup{ (FE1), (FE2)}\end{aligned}\right\}.$ (5)

Definition 1 (Pipelined Perfect Markov Bayesian Equilibrium).

In a game $\mathtt{B}^{\Gamma}$ , a (stationary) strategy profile $\left<\beta,\pi\right>$ constitutes a pipelined perfect Markov Bayesian equilibrium (PPME) if for all $i\in N$ , $h\in H$ , $\theta_{i}\in\Theta_{i}$ , it holds in every period that, for $\beta^{\prime}_{i}$ and $\pi^{\prime}_{i}$ such that $\left(\pi^{\prime}_{i},\pi_{-i}\right)\in\Pi\left[\beta^{\prime}_{i},\beta_{-i};\mathtt{B}^{\Gamma}\right]$ ,

$\displaystyle J_{i}\left(h\middle|\tau,\left(\beta_{i},\beta_{-i}\right),\pi\right)\geq J_{i}\left(h\middle|\tau,\left(\beta^{\prime}_{i},\beta_{-i}\right),\left(\pi^{\prime}_{i},\pi_{-i}\right)\right),$ (6)

$\displaystyle\textit{and }\pi\in\Pi\left[\beta;\mathtt{B}^{\Gamma}\right].$ (7)

The equilibrium concept of PPME builds upon the concepts of Markov perfect equilibrium [8] and perfect Bayesian equilibrium [9].

Lemma 1.

Consider a game $\mathtt{B}^{\Gamma}$ with a feasible cognition scheme $\Gamma$. A strategy profile $\left<\beta,\pi\right>$ is a PPME of $\mathtt{B}^{\Gamma}$ if and only if, for all $i\in N$, $h\in H$, and $\beta^{\prime}_{i}$,

$\displaystyle J_{i}\left(h\middle|\tau,\left(\beta_{i},\beta_{-i}\right),\pi\right)\geq J_{i}\left(h\middle|\tau,\left(\beta^{\prime}_{i},\beta_{-i}\right),\pi\right),\textit{ and }\pi\in\Pi\left[\beta;\mathtt{B}^{\Gamma}\right].$ (8)

Hence, in a PPME problem, each agent $i\in N$ tries to solve the optimization problem given $h\in H$ :

$\displaystyle\max\limits_{\beta_{i},\pi_{i}}\;J_{i}\left(h\middle|\tau,\left(\beta_{i},\beta_{-i}\right),\left(\pi_{i},\pi_{-i}\right)\right),\textup{ s.t. }\left(\pi_{i},\pi_{-i}\right)\in\Pi\left[\beta_{i},\beta_{-i};\mathtt{B}^{\Gamma}\right].$ (9)

Theorem 1.

Fix any $G$ and $\Theta$. There exists at least one profile $\tau\in\mathcal{T}\left[G,\Theta\right]$ such that the game $\mathtt{B}^{\Gamma}$ admits at least one stationary PPME.

III Equilibrium Characterizations

In this section, we characterize the stationary PPME problem for a given game $\mathtt{B}^{\Gamma}$ by formulating it as a constrained optimization problem and establishing a verifiable condition that is both necessary and sufficient.

Following a standard dynamic programming argument (see, e.g., [17]), we represent (H), (HT), and (HTA) recursively as follows:

$\displaystyle\mathbf{J}_{i}\left(h\middle|\tau,\beta;V_{i}\right)=\sum_{\theta,s}V_{i}\left(h,\theta\middle|\tau,\beta,\pi\right)\tau\left(\theta\middle|s,\beta\left(h\right)\right)T\left(s\middle|h\right),$ (10)

$\displaystyle V_{i}\left(h,\theta\middle|\tau,\beta,\pi\right)=\sum_{a}\pi\left(a\middle|\theta_{i},\theta_{-i}\right)Q_{i}\left(h,\theta_{i},a\middle|\tau,\beta,\pi\right),$ (11)

$\displaystyle\begin{aligned} \mathbf{Q}_{i}\left(h,\theta_{i},a\middle|\tau,\beta;V_{i}\right)&=\overline{R}_{i}\left(h,\theta_{i},a\right)\\ &+\delta\sum\limits_{s}\mathbf{J}_{i}\left(s,a\middle|\tau,\beta;V_{i}\right)\mu_{i}\left(s\middle|\theta_{i};\tau\right).\end{aligned}$ (12)

Here, we denote $\mathbf{J}_{i}(\cdot;V_{i})$ and $\mathbf{Q}_{i}(\cdot;V_{i})$ with $V_{i}$ to highlight their dependence on $V_{i}$ from the Bellman recursions (10)-(12). Note that $Q_{i}(\cdot|\beta,\pi,\tau)$ in the right-hand side (RHS) of (11) is given by (HTA).
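For a fixed profile $\left<\beta,\pi\right>$, the recursions (10)-(12) can be iterated numerically as a policy evaluation. The Python sketch below does so for one agent in a toy game under several simplifying assumptions that are not part of the model: the cost $C_{i}$ is set to zero, the policies and the selected signaling rules do not vary with the history, and the posterior over the state is computed from $T(\cdot|h)$ and the agent's own rule. All names and numbers are hypothetical.

```python
from itertools import product

S = [0, 1]
A = list(product([0, 1], repeat=2))      # joint action profiles
TH = list(product([0, 1], repeat=2))     # joint type profiles
H = [(s, a) for s in S for a in A]       # histories h = (s_prev, a_prev)
DELTA = 0.9

# TAU[i][s][theta_i]: rule selected by agent i; PI[i][theta_i][a_i]: her policy.
TAU = [{0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}},
       {0: {0: 0.6, 1: 0.4}, 1: {0: 0.4, 1: 0.6}}]
PI = [{0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}] * 2


def T(s, h):
    """Toy transition probability T(s | h)."""
    s_prev, a_prev = h
    stay = 0.7 if a_prev[0] == a_prev[1] else 0.5
    return stay if s == s_prev else 1.0 - stay


def R(i, s, a):
    """Toy reward: agent i earns 1 when her action matches the state."""
    return 1.0 if a[i] == s else 0.0


def tau(theta, s):
    return TAU[0][s][theta[0]] * TAU[1][s][theta[1]]


def pi(a, theta):
    return PI[0][theta[0]][a[0]] * PI[1][theta[1]][a[1]]


def mu(i, s, theta_i, h):
    """Posterior of agent i over the state given her type and the history."""
    den = sum(T(x, h) * TAU[i][x][theta_i] for x in S)
    return T(s, h) * TAU[i][s][theta_i] / den


def Q(i, h, theta_i, a, J):
    """Recursion (12) with C_i = 0; the next history is (s, a)."""
    return sum((R(i, s, a) + DELTA * J[(s, a)]) * mu(i, s, theta_i, h) for s in S)


def V(i, h, theta, J):
    """Recursion (11)."""
    return sum(pi(a, theta) * Q(i, h, theta[i], a, J) for a in A)


def bellman(i, J):
    """Recursion (10) applied to every history."""
    return {h: sum(V(i, h, th, J) * tau(th, s) * T(s, h) for th in TH for s in S)
            for h in H}


def evaluate(i, iters=500):
    """Iterate the composed recursion from J = 0."""
    J = {h: 0.0 for h in H}
    for _ in range(iters):
        J = bellman(i, J)
    return J


J0 = evaluate(0)
J0_next = bellman(0, J0)
residual = max(abs(J0[h] - J0_next[h]) for h in H)
```

Because all weights in the composed map are probabilities and $\delta<1$, the iteration is a contraction in the sup norm, so `residual` should be numerically negligible after enough iterations.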

Leveraging (10)-(12), define

$\displaystyle Z\left(\pi,V\middle|\beta,\tau\right)\equiv\sum_{h,s,\theta}\Bigg(\sum_{i}\left(V_{i}\left(h,\theta\right)-\sum\limits_{a}\mathbf{Q}_{i}\left(h,\theta_{i},a\middle|\tau,\beta;V_{i}\right)\pi\left(a\middle|\theta\right)\right)\times\tau\left(\theta\middle|s,\beta(h)\right)T_{s}\left(s\middle|h\right)\Bigg),$ (OBJ1)

and in addition, construct the following constraints:

$\displaystyle\begin{cases}&\begin{aligned} &\mathbf{J}_{i}(h|\tau,\beta;V_{i})\\ &\geq\sum\limits_{s,\theta_{-i}}V_{i}(h,\theta_{i},\theta_{-i})\tau_{-i}\left(\theta_{-i}|s,\beta_{-i}(h)\right)T_{s}(s|h),\end{aligned}\\ &\forall i,h,\theta_{i}\textup{ with }\sum_{s}\tau_{i}\left(\theta_{i}|s,\beta_{i}(h)\right)T_{s}(s|h)>0,\end{cases}$ (EQ1)

$\displaystyle\begin{cases}&\begin{aligned} &\mathtt{EV}_{i}(h,\theta_{i}|\tau,\beta,\pi;V_{i})\\ &\geq\mathbb{E}^{\tau}_{\beta,\pi_{-i}}\Big[\mathbf{Q}_{i}(h,\theta_{i},a_{i},\tilde{a}_{-i}|\tau,\beta;V_{i})\Big|h,\theta_{i}\Big],\end{aligned}\\ &\forall i,a_{i},h,\theta_{i}\textup{ with }\sum_{s}\tau_{i}\left(\theta_{i}|s,\beta_{i}(h)\right)T_{s}(s|h)>0.\end{cases}$ (EQ2)

Define the following set

\mathcal{K}\left(\beta\middle|\tau\right)\equiv\left\{\left<\pi,V\right>\Big|\begin{aligned} &\pi\textup{ and }V\textup{ satisfy (FE1), (FE2),}\\ &\textup{(EQ1), (EQ2)}\end{aligned}\right\}.

(13)

Let

\mathcal{E}\left(\beta\middle|\tau\right)\equiv\left\{\begin{aligned} \arg\min\limits_{\pi,V}&\;Z(\pi,V|\beta,\tau),\\ \text{ s.t. }&\left<\pi,V\right>\in\mathcal{K}\left(\beta\middle|\tau\right)\end{aligned}\right\}.

(OPT)

Proposition 1.

Fix $\beta$. In a game $\mathtt{B}^{\Gamma}$, a profile $\left<\beta,\pi\right>$ is a PPME if and only if (i) $\left<\pi,V\right>\in\mathcal{E}\left(\beta\middle|\tau\right)$, where $V=\left(V_{i}\right)_{i\in N}$ are the corresponding optimal HT value functions, and (ii) $Z(\pi,V|\beta,\tau)=0$.

Proposition 1 extends the fundamental formulation of finding a Nash equilibrium of a stochastic game as a nonlinear program (Theorem 3.8.2 of [10]; see also [11, 12, 13]). Here, the constraints (FE1) and (FE2) ensure that each candidate $\pi$ is a valid conditional probability distribution and rule out the possible trivial solution $\{\pi_{i}=0\}_{i\in N}$. The constraints (EQ1) and (EQ2) are two necessary conditions for a PPME derived from the optimality of PPME and the Bellman recursions (10) and (11).
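One step that is implicit in this formulation is why the value $Z=0$ is a lower bound on the feasible set. The following is a sketch of that argument, assuming that agents mix independently, i.e., $\pi(a|\theta)=\pi_{i}(a_{i}|\theta_{i})\pi_{-i}(a_{-i}|\theta_{-i})$, and that the conditional distribution of $\theta_{-i}$ given $(h,\theta_{i})$ is the one induced by the common prior. The shorthand $P(\theta|h)$ is introduced here only for this derivation.

```latex
% Shorthand (assumption): P(\theta | h) \equiv \sum_{s} \tau(\theta | s, \beta(h)) T_{s}(s | h).
% Step 1: multiply (EQ2) by \pi_i(a_i | \theta_i) and sum over a_i, using (FE1) and (FE2):
\mathtt{EV}_{i}(h,\theta_{i}|\tau,\beta,\pi;V_{i})
  \;\geq\; \mathbb{E}^{\tau}_{\beta}\Big[\sum\nolimits_{a}\pi(a|\theta_{i},\tilde{\theta}_{-i})
  \,\mathbf{Q}_{i}(h,\theta_{i},a|\tau,\beta;V_{i})\,\Big|\,h,\theta_{i}\Big].
% Step 2: weight by the probability of \theta_i given h and sum over \theta_i:
\sum\nolimits_{\theta}P(\theta|h)\,V_{i}(h,\theta)
  \;\geq\; \sum\nolimits_{\theta}P(\theta|h)\sum\nolimits_{a}
  \mathbf{Q}_{i}(h,\theta_{i},a|\tau,\beta;V_{i})\,\pi(a|\theta).
% Step 3: sum over i and h to obtain the objective (OBJ1):
Z(\pi,V|\beta,\tau)\;\geq\;0 \quad\text{for all }\left<\pi,V\right>\in\mathcal{K}(\beta|\tau).
```

Under these assumptions, every agent's term in (OBJ1) is nonnegative on $\mathcal{K}\left(\beta\middle|\tau\right)$, so $Z=0$ holds exactly when the inequality in Step 2 binds for every $i$ and $h$, which is what the Bellman recursion (11) delivers.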

III-A Global Fixed-Point Alignment

First, we extend the agents’ strategy profile $\left<\beta,\pi\right>$ to $\left<\beta,\pi,V\right>$ . Given $V_{i}$ as a variable, we define the following term based on (4):

\displaystyle\mathtt{MV}_{i}\left(h,\theta_{i}\middle|\tau,\beta;V_{i}\right)\equiv\mathbb{E}^{\tau}_{\beta}\left[V_{i}(h,\theta_{i},\tilde{\theta}_{-i})\middle|h,\theta_{i}\right].

(14)

If we fix a $\beta$, Proposition 1 implies that $\left<\pi,V\right>$ of a PPME profile $\left<\beta,\pi,V\right>$ needs to be a global minimum of (OPT) with $Z(\pi,V|\beta,\tau)=0$. Equivalently, given $\pi_{-i}$, each $V_{i}$ needs to be a fixed point of the following equation, for all $i\in N$, $h\in H$, $\theta_{i}\in\Theta_{i}$,

\begin{cases}&\begin{aligned} &\mathtt{MV}_{i}(h,\theta_{i}|\tau,\beta;V_{i})\\ &\geq\mathbb{E}^{\tau}_{\beta_{-i},\pi_{-i}}\Big{[}\mathbf{Q}_{i}(h,\theta_{i},a_{i},\tilde{a}_{-i}|\tau,\beta;V_{i})\Big{|}\theta_{i},h\Big{]},\end{aligned}\\ &\forall a_{i}\in A_{i},i\in N,h\in H,\theta_{i}\in\Theta_{i}.\end{cases}

(EQ4)

where dependence of the RHS of (EQ4) on $V_{i}$ is due to (12). At the cognition stage, the optimality of PPME requires that the agents’ choice $g$ among all possible options is optimal. Given any $J_{i}(\cdot)$ , define, for all $i\in N$ , $\theta_{i}\in\Theta_{i}$ , $h\in H$ ,

$\displaystyle\mathtt{IJ}_{i}(h,\theta_{i}|\tau_{-i},\beta_{-i},\pi;J_{i})\equiv\sum\limits_{s,\theta_{-i},a}\Big(\overline{R}_{i}(h,\theta_{i},a)+J_{i}\left(s,a\right)\Big)\times\pi(a|\theta_{i},\theta_{-i})\tau_{-i}\left(\theta_{-i}|s,\beta_{-i}(h)\right)T(s|h),$ (15)

where $J_{i}(s,a)$ on the RHS of (15) is an H value function of the next period given the current-period $(s,a)$. The optimality of $\tau$ in the cognition stage of a PPME (i.e., constraint (EQ1)) implies that the optimal history value function $J_{i}$ for each agent $i$ needs to be a fixed point while fixing others’ $\tau_{-i}$. That is,

\begin{cases}&J_{i}(h)\geq\mathtt{IJ}_{i}(h,\theta_{i}|\tau_{-i},\beta_{-i},\pi;J_{i}),\\ &\forall\theta_{i}\in\Theta_{i},i\in N,h\in H.\end{cases}

(EQ3)

Here, (EQ3) is independent of $V$ while (EQ4) is independent of $J$. For $\left<\beta,\pi\right>$ to be a PPME of $\mathtt{B}^{\Gamma}$, it must be chosen such that there exist $\left<J,V\right>$ satisfying

\left\{\begin{aligned} &\textup{$J$ is a fixed point of (EQ3) if and only if}\\ &\textup{$V$ is a fixed point of (EQ4).}\end{aligned}\right\}

We refer to such a procedure as the Global Fixed-Point Alignment (GFPA).

Since $\beta_{i}$ is a pure strategy, each deterministic choice of $g_{i}=\beta_{i}(h)$ leads to a signaling rule $\tau_{i}(\cdot|\cdot,g_{i})$ that determines a distribution of agent $i$ ’s period- $t$ types. That is, every $\beta_{i}$ determines a unique $\tau_{i}(\cdot,g_{i}):S\mapsto\Delta(\Theta_{i})$ for every $h$ . Hence, for ease of exposition, we use $\tau_{i}$ and $\tau_{i,t}(\cdot|\cdot,g_{i},h)$ (with abuse of notation) to represent $\beta_{i}$ and $\beta_{i}(h)$ , respectively; unless otherwise stated. Therefore, in game $\mathtt{B}^{\Gamma}$ , each agent $i$ controls $\left<\tau_{i},\pi\right>$ .

Given a $\pi$ , define the following function of $\tau$ , $J$ , and $V$ :

$\displaystyle Z^{\mathtt{GFPA}}(\tau,J,V|\pi)\equiv\sum\nolimits_{i,h}\left(J_{i}(h)-\sum\nolimits_{\theta,s}V_{i}(h,\theta)\tau(\theta|s,g,h)T_{s}(s|h)\right).$ (OBJ2)

Similar to (FE1) and (FE2) for $\pi$ , we introduce the following two constraints placed on $\tau$ :

$\displaystyle\tau_{i}(\theta_{i}|s,g_{i},h)\geq 0,\forall i\in N,\theta_{i}\in\Theta_{i},s\in S,g\in G,h\in H,$ (RG1)

$\displaystyle\sum\nolimits_{\theta_{i}\in\Theta_{i}}\tau_{i}(\theta_{i}|s,g_{i},h)=1,\forall i\in N,s\in S,g\in G,h\in H.$ (RG2)

Define the following set:

\mathcal{K}^{\mathtt{GFPA}}(\pi)\equiv\left\{\left<\tau,J,V\right>\middle|\textup{(RG1), (RG2), (EQ3), (EQ4)}\right\}.

(16)

Let

\displaystyle\mathcal{E}^{\mathtt{GFPA}}(\pi)=\left\{\begin{aligned} \arg\min\limits_{\tau,J,V}&\;Z^{\mathtt{GFPA}}(\tau,J,V|\pi),\\ \textup{ s.t. }&\left<\tau,J,V\right>\in\mathcal{K}^{\mathtt{GFPA}}(\pi)\end{aligned}\right\}.

(GFPA)

Proposition 2.

Suppose that $\left<J,V,\pi\right>$ satisfy the Bellman recursions (10)-(12). Then, $\left<\tau,J,V\right>\in\mathcal{E}^{\mathtt{GFPA}}(\pi)$ with $Z^{\mathtt{GFPA}}(\tau,J,V|\pi)=0$ if and only if $\left<\pi,V\right>\in\mathcal{E}\left(\beta\middle|\tau\right)$ with $Z(\pi,V|\beta,\tau)=0$.

The proof of Proposition 2 is deferred to Appendix F. Proposition 2 shows that if $\left<\tau,J,V\right>\in\mathcal{E}^{\mathtt{GFPA}}(\pi)$ with $Z^{\mathtt{GFPA}}\left(\tau,J,V\middle|\pi\right)=0$ for a policy $\pi$, then the profile $\left<\tau,\pi\right>$ is a PPME.

III-B Local Fixed-Point Alignment

First, we decompose each type space $\Theta_{i}$ into $\Theta_{i}=\Theta^{\natural}_{i}\cup\{\hat{\theta}_{i}\}$ such that the constraint (RG2) can be reformulated as, for all $i\in N$, $s\in S$, $g\in G$, $h\in H$,

\begin{aligned} &\sum\nolimits_{\theta_{i}\in\Theta^{\natural}_{i}}\tau_{i}\left(\theta_{i}\middle|s,g_{i},h\right)\leq 1,\\ &\tau_{i}\left(\hat{\theta}_{i}\middle|s,g_{i},h\right)+\sum\nolimits_{\theta_{i}\in\Theta^{\natural}_{i}}\tau_{i}\left(\theta_{i}\middle|s,g_{i},h\right)=1.\end{aligned}

(17)

Given $\tau_{-i}(\theta_{-i}|s,g_{-i},h)=\prod_{j\neq i}\tau_{j}(\theta_{j}|s,g_{j},h)$, define for all $i\in N$,

\displaystyle\mathtt{IV}_{i}(h,\theta_{i}|\tau_{-i};V_{i})\equiv\sum\limits_{\theta_{-i},s}V_{i}(h,\theta_{i},\theta_{-i})\tau_{-i}(\theta_{-i}|s,g_{-i},h)T_{s}(s|h).

Construct the vector $X^{s}_{i}\equiv\left(J_{i}(h),V_{i}(h,\cdot),\tau_{-i}\left(\cdot|s,g_{-i},h\right)\right)$ . Define

\lambda_{i}\left(X^{s}_{i};\theta_{i},h\right)\equiv J_{i}(h)-\mathtt{IV}_{i}(h,\theta_{i}|\tau_{-i};V_{i}).

Here, $\lambda_{i}\left(X^{s}_{i};\theta_{i},h\right)$ is a function of $J_{i}$ , $V_{i}$ , and $\tau_{-i}(\cdot|s,g_{-i},h)$ and is independent of $\pi$ and $\tau_{i}$ .

For any $s\in S$ , $h\in H$ , $\theta_{i}\in\Theta_{i}$ , define the following function

\begin{aligned} Z^{\mathtt{LFPA}}_{i}\left(X^{s}_{i},\tau_{i};s,h\right)&\equiv\sum\nolimits_{\theta^{\prime}_{i}\in\Theta^{\natural}_{i}}\lambda_{i}(X^{s}_{i};\theta^{\prime}_{i},h)\tau_{i}(\theta^{\prime}_{i}|s,g_{i},h)\\ &+\lambda_{i}(X^{s}_{i};\hat{\theta}_{i},h)\tau_{i}(\hat{\theta}_{i}|s,g_{i},h).\end{aligned}

(18)
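
To make the quantities $\mathtt{IV}_{i}$, $\lambda_{i}$, and $Z^{\mathtt{LFPA}}_{i}$ concrete, the following minimal sketch evaluates them for one agent at a fixed history $h$ and state $s$. The two-agent setting, the array layout, the function names `interim_value` and `z_lfpa`, and all numbers are assumptions chosen for illustration. The sketch shows that when every $\lambda_{i}$ is nonnegative, the objective is nonnegative and vanishes once $\tau_{i}$ places mass only on types with $\lambda_{i}=0$.

```python
import numpy as np

def interim_value(V_i, tau_minus, T_s):
    """IV_i(theta_i) = sum_{theta_-i, s} V_i(theta_i, theta_-i) tau_-i(theta_-i|s) T_s(s)."""
    marginal = T_s @ tau_minus      # distribution of theta_-i, shape (|Theta_-i|,)
    return V_i @ marginal           # one entry per own type theta_i

def z_lfpa(J_i, V_i, tau_minus, T_s, tau_i_s):
    """Return (lambda_i, Z_i^LFPA) for a fixed history and state."""
    lam = J_i - interim_value(V_i, tau_minus, T_s)
    return lam, float(lam @ tau_i_s)

# Hypothetical data: 3 own types, 2 opponent types, 2 states.
V_i = np.array([[1.0, 3.0],
                [2.0, 0.0],
                [0.0, 1.0]])                 # V_i[theta_i, theta_-i]
tau_minus = np.array([[0.5, 0.5],
                      [1.0, 0.0]])           # tau_minus[s, theta_-i]
T_s = np.array([0.5, 0.5])                   # T_s[s]
J_i = 1.5                                    # equals the largest interim value here

# tau_i puts mass only on types with lambda = 0, so the objective is zero.
lam, z_aligned = z_lfpa(J_i, V_i, tau_minus, T_s, np.array([0.4, 0.6, 0.0]))
# tau_i puts mass 0.6 on the type with lambda = 1.25, so the objective is positive.
_, z_misaligned = z_lfpa(J_i, V_i, tau_minus, T_s, np.array([0.2, 0.2, 0.6]))
```

With these numbers the interim values are $(1.5, 1.5, 0.25)$, so $\lambda_{i}=(0,0,1.25)$; the first signaling rule gives an objective of zero and the second gives $0.75$.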

Define the following set

\displaystyle\overline{\mathcal{K}}\left[s,g_{-i},h\right]\equiv\left\{\left<X^{s}_{i},\tau_{i}\right>\middle|\begin{aligned} &\tau_{i}(\hat{\theta}_{i}|s,g_{i},h)\geq 0\\ &\tau_{i}(\theta_{i}|s,g_{i},h)\geq 0,\forall\theta_{i}\in\Theta^{\natural}_{i}\\ &\lambda_{i}(X^{s}_{i};\theta^{\prime}_{i},h)\geq 0,\forall\theta^{\prime}_{i}\in\Theta_{i}\end{aligned}\right\}.

(19)

Then, we define the problem of Local Fixed-Point Alignment (LFPA) by

\displaystyle\min_{X^{s}_{i},\tau_{i}}Z^{\mathtt{LFPA}}_{i}\left(X^{s}_{i},\tau_{i};s,h\right),\textup{ s.t. }\left<X^{s}_{i},\tau_{i}\right>\in\overline{\mathcal{K}}\left[s,g_{-i},h\right].

(LFPA)

Let $e_{i}$ , $\bm{b}_{i}\equiv(b[\theta_{i}])_{\theta_{i}\in\Theta^{\natural}_{i}}$ , $\bm{f}_{i}\equiv(f[\theta_{i}])_{\theta_{i}\in\Theta_{i}}$ , respectively, denote the Lagrange multipliers of the constraints $\left\{\tau_{i}(\hat{\theta}_{i}|s,g_{i},h)\geq 0\right\}$ , $\left\{\tau_{i}(\theta_{i}|s,g_{i},h)\geq 0,\forall\theta_{i}\in\Theta^{\natural}_{i}\right\}$ , and $\left\{\lambda_{i}(X^{s}_{i};\theta^{\prime}_{i},h)\geq 0,\forall\theta^{\prime}_{i}\in\Theta_{i}\right\}$ . In addition, the corresponding slack variables are denoted by $w_{i}$ , $\bm{q}_{i}\equiv\{q[\theta_{i}]\}_{\theta_{i}\in\Theta^{\natural}_{i}}$ , $\bm{z}_{i}\equiv\{z[\theta_{i}]\}_{\theta_{i}\in\Theta_{i}}$ , respectively. Then, the Lagrangian of (LFPA) is defined by

\begin{aligned} &L_{i}(X^{s}_{i},\tau_{i},e_{i},\bm{b}_{i},\bm{f}_{i},w_{i},\bm{q}_{i},\bm{z}_{i}|s,h)\equiv Z^{\mathtt{LFPA}}_{i}(X^{s}_{i},\tau_{i};s,h)\\ &+\sum\nolimits_{\theta_{i}\in\Theta^{\natural}_{i}}b[\theta_{i}]\left(q[\theta_{i}]-\tau_{i}(\theta_{i}|s,g_{i},h)\right)\\ &+\sum\nolimits_{\theta_{i}\in\Theta_{i}}f[\theta_{i}]\big(z[\theta_{i}]-\lambda_{i}(X^{s}_{i};\theta_{i},h)\big)+e_{i}\big(w_{i}-\tau_{i}(\hat{\theta}_{i}|s,g_{i},h)\big).\end{aligned}

(20)

To simplify the presentation, we omit $s$ and $h$ . Taking partial derivatives of $L_{i}$ with respect to $X^{s}_{i}$ and $\tau_{i}$ yields,

\begin{cases}&\begin{aligned} &\Delta_{i}\left(X^{s}_{i},\tau_{i},\bm{f}_{i}\right)\\ &\equiv\gradient_{X^{s}_{i}}Z^{\mathtt{LFPA}}_{i}(X^{s}_{i},\tau_{i})-\sum_{\theta_{i}\in\Theta_{i}}f[\theta_{i}]\gradient_{X^{s}_{i}}\lambda_{i}(X^{s}_{i};\theta_{i}),\end{aligned}\\ &\begin{aligned} &D_{i}\left(X^{s}_{i},\tau_{i}(\theta_{i}|\cdot),e_{i},b[\theta_{i}]\right)\\ &\equiv b[\theta_{i}]-e_{i}+\frac{\partial}{\partial\tau_{i}(\theta_{i}|\cdot)}Z^{\mathtt{LFPA}}_{i}(X^{s}_{i},\tau_{i}(\theta_{i}|\cdot)),\forall\theta_{i}\in\Theta_{i}.\end{aligned}\end{cases}

Let $\bm{X}^{s}\equiv(X^{s}_{i})_{i\in N}$ , $\bm{f}\equiv(\bm{f}_{i})_{i\in N}$ , $\bm{e}\equiv(e_{i})_{i\in N}$ , $\bm{b}\equiv(\bm{b}_{i})_{i\in N}$ , and $\bm{\lambda}\equiv(\lambda_{i})_{i\in N}$ . Define

\begin{cases}&\begin{aligned} &\bm{F}\left(\bm{X}^{s},\tau,\bm{e},\bm{b},\bm{f}\right)\\ &\equiv\Big{(}\Delta_{i}\left(X^{s}_{i},\tau_{i},\bm{f}_{i}\right),D_{i}\left(X^{s}_{i},\tau_{i}(\theta_{i}|\cdot),e_{i},b[\theta_{i}]\right)\Big{)}_{i\in N},\end{aligned}\\ &\begin{aligned} &\bm{K}\left(\bm{e},\bm{b},\bm{f};\tau,\bm{\lambda}\right)\\ &\equiv\Big{(}e_{i}\tau_{i}(\hat{\theta}_{i}),b[\theta_{i}]\tau_{i}(\theta_{i}),f[\theta_{i}]\lambda_{i}(X^{s}_{i};\theta_{i})\Big{)}_{i\in\mathcal{N}}.\end{aligned}\end{cases}

Construct the set

\mathcal{R}\left(s,h\right)\equiv\left\{\left<\bm{X}^{s},\tau,\bm{e},\bm{b},\bm{f}\right>\middle|\begin{aligned} &\bm{F}\left(\bm{X}^{s},\tau,\bm{e},\bm{b},\bm{f}\right)=0\\ &\bm{K}\left(\bm{e},\bm{b},\bm{f};\tau,\bm{\lambda}\right)=0\end{aligned}\right\}.

(21)

For any $\theta_{i}\in\Theta_{i}$ , define

\begin{aligned} &\gamma_{i}\left(J_{i},V_{i},\pi_{-i}|\tau_{i},\theta_{i},a_{i},h\right)\equiv\mathtt{EV}_{i}\left(h,\theta_{i}|\tau_{i},\pi,V_{i}\right)\\ &-\mathbb{E}^{\mu_{i}}_{\pi_{-i}}\left[Q_{i}(h,\theta_{i},a_{i},\tilde{a}_{-i}|\tau;J_{i})\Big|h,\theta_{i}\right],\end{aligned}

(22)

where $Q_{i}(\cdot;J_{i})$ is defined via replacing $J_{i}(\cdot;V_{i})$ in (12) by $J_{i}(\cdot)$ . That is,

Q_{i}\left(h,\theta_{i},a|\tau;J_{i}\right)=\overline{R}_{i}(h,\theta_{i},a)+\delta\sum\limits_{s}J_{i}(s,a)\mu_{i}(s|\theta_{i};\tau).
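
The definition of $Q_{i}(\cdot;J_{i})$ can be illustrated by a minimal sketch for a fixed history $h$ and type $\theta_{i}$. The function name `q_value`, the array layout, and the numbers are assumptions made for illustration; `J_next[s, a]` stands for $J_{i}(s,a)$ and `mu[s]` for the belief $\mu_{i}(s|\theta_{i};\tau)$.

```python
import numpy as np

def q_value(R_bar, J_next, mu, delta):
    """Q_i(a) = R_bar(a) + delta * sum_s J_i(s, a) * mu_i(s | theta_i; tau)."""
    return R_bar + delta * (mu @ J_next)

R_bar = np.array([1.0, 0.0])        # expected stage reward for each action profile a
J_next = np.array([[2.0, 4.0],
                   [0.0, 8.0]])     # J_next[s, a]
mu = np.array([0.5, 0.5])           # belief over states given theta_i
Q = q_value(R_bar, J_next, mu, delta=0.5)
```

Here the belief-weighted continuation values are $(1, 6)$, so with $\delta=0.5$ the sketch returns $Q=(1.5, 3.0)$.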

Define the set

\mathcal{R}^{\dagger}\left(J,V\right)\equiv\left\{\pi\middle|\begin{aligned} &\pi_{i}(a_{i}|\theta_{i})\gamma_{i}\left(J_{i},V_{i},\pi_{-i}|\tau_{i},\theta_{i},a_{i},h\right)=0,\\ &\forall i,a_{i},\theta_{i},h,\textup{(FE1)},\textup{(FE2)}\end{aligned}\right\}.

(23)

We define a set of conditions termed local admissibility as follows.

Definition 2 (Local Admissibility).

A profile $\left<\tau,\pi,J,V\right>$ is locally admissible if $\pi\in\mathcal{R}^{\dagger}\left(J,V\right)$ and $\left<\bm{X}^{s},\tau,\bm{e},\bm{b},\bm{f}\right>\in\mathcal{R}\left(s,h\right)$ , for all $s\in S$ , $h\in H$ .

Theorem 2.

Suppose that $\left\{\gradient_{X^{s}_{i}}\lambda_{i}(X^{s}_{i};\theta_{i},h)\right\}_{\theta_{i}\in\Theta_{i}}$ is a set of linearly independent vectors for all $X^{s}_{i}$, $i\in N$, $s\in S$, $h\in H$. Then, $\left<\beta^{*},\pi^{*}\right>$ is a PPME if and only if $\left<\tau^{*},\pi^{*},J^{*},V^{*}\right>$ is locally admissible.

The proof of Theorem 2 is deferred to Appendix -G. Theorem 2 provides necessary and sufficient conditions for characterizing the PPME. In particular, if an algorithm converges to a locally admissible point $\left<\tau,\pi,J,V\right>$ given a feasible cognition profile $\Gamma$ under the linear independence assumption, then the associated profile $\left<\beta,\pi\right>$ is a PPME. That is, a locally admissible point $\left<\tau,\pi,J,V\right>$ achieves $Z^{\mathtt{GFPA}}\left(\tau,J,V\middle|\pi\right)=0$ and $Z\left(\pi,V\middle|\beta,\tau\right)=0$.

III-C Discussion

Following the simplification $\tau_{i}(\cdot)=\tau_{i}(\cdot|\cdot,g_{i},h)$ to represent the choice of $\tau_{i}$ using $\beta_{i}(h)$, the function $Z(\pi,V|\beta,\tau)$ given by (OBJ1) can be written as $Z(\pi,V|\tau)$, and $\mathcal{K}(\beta|\tau)$ and $\mathcal{E}(\beta|\tau)$, respectively given by (13) and (OPT), can be written as $\mathcal{K}(\tau)$ and $\mathcal{E}(\tau)$. With abuse of notation, we additionally rewrite $\mathcal{E}(\tau)$ and $\mathcal{K}^{\mathtt{GFPA}}(\tau)$, respectively, as $\mathcal{E}(\tau,V)$ and $\mathcal{K}^{\mathtt{GFPA}}(\tau,V)$ by fixing an arbitrary $V$. Hence, $\mathcal{E}(\tau,V)$ and $\mathcal{K}^{\mathtt{GFPA}}(\tau,V)$ become sets of profiles $\pi$ and $\left<\tau,J\right>$, respectively.

Suppose that the game $\mathtt{B}^{\Gamma}$ admits at least one PPME. Then, a PPME profile $\left<\tau,\pi\right>$ (or $\left<\beta,\pi\right>$ ) of $\mathtt{B}^{\Gamma}$ can be obtained by solving the following bi-level constrained optimization problem:

\displaystyle\min_{\pi,V}\;Z\left(\pi,V\middle|\tau\right),\textup{ s.t., }\left<\pi,V\right>\in\mathcal{K}\left(\tau\right),\left<\tau,J\right>\in\mathcal{E}^{\mathtt{GFPA}}\left(\pi,V\right).

(24)

The problem (24) is equivalent to

\displaystyle\min_{\tau,J,V}\;Z^{\mathtt{GFPA}}\left(\tau,J,V\middle|\pi\right),\textup{ s.t., }\pi\in\mathcal{E}\left(\tau,V\right),\left<\tau,J,V\right>\in\mathcal{K}^{\mathtt{GFPA}}\left(\tau\right).

(25)

Let us restrict attention to the problem (25). Proposition 1 implies that for any fixed $\tau$ , any $\pi\in\mathcal{E}\left(\tau,V\right)$ satisfies $Z(\pi,V|\tau)=0$ . Consider the following set

\mathcal{K}^{\dagger}\equiv\left\{\left<\tau,\pi,V,J\right>\middle|\begin{aligned} &Z\left(\pi,V\middle|\tau\right)=0,\\ &\textup{(FE1)},\textup{(FE2)},\textup{(EQ1)},\textup{(EQ2)},\\ &\textup{(RG1)},\textup{(RG2)},\textup{(EQ3)},\textup{(EQ4)}\end{aligned}\right\}.

(26)

Then, we can reformulate the problem (25) as

\displaystyle\min_{\tau,J,V}Z^{\mathtt{GFPA}}\left(\tau,J,V\middle|\pi\right),\textup{ s.t., }\left<\tau,\pi,V,J\right>\in\mathcal{K}^{\dagger}.

(27)

The total number of decision variables of the optimization problem (27) is $\mathtt{NV}=n\times\left|H\right|+n\times\left|H\right|\times\prod_{i\in N}\left|\Theta_{i}\right|+\prod_{i\in N}\left|G_{i}\right|\times\left|H\right|\times\left|\Theta_{i}\right|+\prod_{i\in N}\left|\Theta_{i}\right|\times\left|A_{i}\right|$. Let $\mathcal{K}^{E}$ denote the set of active constraints and equality constraints. If we use algorithms that depend on the Linear Independence Constraint Qualification (LICQ) to solve (27), then we require the gradients of the constraints in $\mathcal{K}^{E}$ to be linearly independent. However, at the global minimum of (27), the number of active constraints plus the number of equality constraints is at least as large as the number of decision variables; i.e., $\left|\mathcal{K}^{E}\right|\geq\mathtt{NV}$. Because at most $\mathtt{NV}$ vectors in $\mathbb{R}^{\mathtt{NV}}$ can be linearly independent, the linear independence of the gradients of $\mathcal{K}^{E}$ is a relatively restrictive condition in general models of $\mathtt{B}^{\Gamma}$.

Theorem 2 shows that the local admissibility can fully characterize the PPME under a condition that requires $\left\{\gradient_{X^{s}_{i}}\lambda_{i}(X^{s}_{i};\theta_{i},h)\right\}_{\theta_{i}\in\Theta_{i}}$ to be a set of linearly independent vectors for every $i\in N$ , $h\in H$ , and $s\in S$ . For any given $h\in H$ and $s\in S$ , $\left|X^{s}_{i}\right|=1+2\prod_{j\in N}\left|\Theta_{j}\right|-\left|\Theta_{i}\right|$ for all $i\in N$ , which is greater than $\left|\Theta_{i}\right|$ . Consequently, we can assert that the requirement for linear independence amongst $\left\{\gradient_{X^{s}_{i}}\lambda_{i}(X^{s}_{i};\theta_{i},h)\right\}_{\theta_{i}\in\Theta_{i}}$ is generally less restrictive compared to the necessity for linear independence among the gradients of $\mathcal{K}^{E}$ .

Local admissibility (Definition 2) can be decomposed into two parts. First, $\pi\in\mathcal{R}^{\dagger}\left(J,V\right)$ specifies conditions for a policy profile $\pi$ given $J$ and $V$. Second, the condition $\left<\bm{X}^{s},\tau,\bm{e},\bm{b},\bm{f}\right>\in\mathcal{R}\left(s,h\right)$, for all $s\in S$, $h\in H$, is independent of the profile $\pi$. The second part implies that any algorithm that searches for a zero of the gradients of the Lagrangian of (LFPA), while converging to $\pi\in\mathcal{R}^{\dagger}\left(J,V\right)$, converges to a PPME. Designing algorithms that converge to locally admissible points is left for future work.

IV Perfect Information Cognition Choice

In this section, we study the case in which the set of available signaling rule profiles contains a signaling rule that reveals the true realization of the state in every period.

IV-A Cognition Cost Schemes

Each agent $i$ makes a cognition choice $g_{i,t}$ with a cost $C_{i}\in\mathbb{R}$. Define $\mathcal{U}_{i}\equiv\left\{G_{i},S,\Theta_{i},A_{i},\Delta\left(S\right),\Delta\left(\Theta_{i}\right)\right\}$. Let $P^{\prime}\left(\mathcal{U}_{i}\right)$ denote the set of all non-empty subsets of $\mathcal{U}_{i}$. Then, we define the set of cost functions $C=\left(C_{i}\right)_{i\in N}$ whose domain can be any non-empty combination of $G_{i},S,\Theta_{i},A_{i}$, $\Delta\left(S\right)$, and $\Delta\left(\Theta_{i}\right)$ by

\mathcal{F}\equiv\left\{C=\left(C_{i}\right)_{i\in N}\middle|C_{i}:X\mapsto\mathbb{R},\textup{ for }X\in P^{\prime}\left(\mathcal{U}_{i}\right),\forall i\in N\right\}.

Here are some examples of cost functions. The cost function $C\in\mathcal{F}$ is cognition-based (CB) if each $C_{i}:G_{i}\mapsto\mathbb{R}$; that is, $C_{i}(g_{i,t})\in\mathbb{R}$ is the cost if agent $i$ chooses $g_{i,t}\in G_{i}$. The CB cost directly prices each agent $i$’s cognition choice. The cost function $C\in\mathcal{F}$ is type-based (TB) if each $C_{i}:\Theta_{i}\mapsto\mathbb{R}$; that is, agent $i$ incurs a cost $C_{i}(\theta_{i,t})$ if the type $\theta_{i,t}$ is realized under his cognition choice. With the TB cost, each agent $i$’s cognition decision is priced based on the realized type. The cost function $C\in\mathcal{F}$ is state-type-based (STB) if each $C_{i}:S\times\Theta_{i}\mapsto\mathbb{R}$; that is, each agent $i$’s cognition cost is $C_{i}(s_{t},\theta_{i,t})$ if the state is $s_{t}$ and agent $i$ receives a type $\theta_{i,t}$. The STB cost takes into account the realized state and each agent’s type. This cost scheme can capture settings in which the pricing of cognition depends on the difference between the state and the information about the state encapsulated in the type. The cost function $C\in\mathcal{F}$ is state-action-based (SAB) if each $C_{i}:S\times A_{i}\mapsto\mathbb{R}$; that is, each agent $i$ incurs a cost $C_{i}(s_{t},a_{i,t})$ that depends on the true state $s_{t}$ and his local action $a_{i,t}$. This cost scheme prices agent $i$’s cognition based on the consequences of the information acquisition (i.e., $a_{i,t}$) given the true state. The cost function $C\in\mathcal{F}$ is mutual information (MI) if each $C_{i}(\cdot):\Delta\left(\Theta_{i}\right)\times\Delta\left(S\right)\mapsto\mathbb{R}$ is defined by

\displaystyle C_{i}\left(\tilde{\theta}_{i,t};\tilde{s}_{t}\right)\equiv H\left(\tilde{\theta}_{i,t}\right)-H\left(\tilde{\theta}_{i,t}\middle|\tilde{s}_{t}\right),

where $H(\cdot)$ and $H(\cdot|\cdot)$ denote the entropy and the conditional entropy, respectively (see, e.g., [18]).
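
As an illustration of the MI cost, the following minimal sketch computes $H(\tilde{\theta}_{i,t})-H(\tilde{\theta}_{i,t}|\tilde{s}_{t})$ from a state distribution and a signaling rule. The function names `entropy` and `mi_cost`, the use of base-2 logarithms (bits), and the two example signaling rules are assumptions introduced for illustration; `tau[s, theta]` stands for $\tau_{i}(\theta_{i}|s,g_{i},h)$ at a fixed cognition choice and history.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz)))

def mi_cost(prior, tau):
    """H(theta) - H(theta | s) for a state prior p(s) and a signaling rule tau[s, theta]."""
    prior = np.asarray(prior, dtype=float)
    tau = np.asarray(tau, dtype=float)
    marginal = prior @ tau                                   # distribution of the type
    conditional = sum(prior[s] * entropy(tau[s]) for s in range(len(prior)))
    return entropy(marginal) - conditional

prior = np.array([0.5, 0.5])
perfect = np.eye(2)                        # the type reveals the state
uninformative = np.full((2, 2), 0.5)       # the type is independent of the state

cost_perfect = mi_cost(prior, perfect)
cost_uninformative = mi_cost(prior, uninformative)
```

Under these assumptions the fully revealing rule costs one bit, which is the entropy of the uniform binary state, while the uninformative rule costs zero.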

Let $\Gamma\equiv\left\{\mathcal{T}\left[G,\Theta\right],C\right\}$ denote the cognition profile for some $C\in\mathcal{F}$ . We assume that the cardinalities of $G_{i}$ and $\Theta_{i}$ are the same for all agents; i.e., $\left|G_{i}\right|=\left|G_{j}\right|$ and $\left|\Theta_{i}\right|=\left|\Theta_{j}\right|$ , for all $i\neq j$ .

IV-B Perfect-Information PPME

In this section, we introduce PPME under perfect information. We start by focusing on a general non-stationary case.

Definition 3 (Perfect Information Structure).

An information structure $\{\tau,\Theta\}$ is perfect-information if $\Theta_{i,t}=S$ and there exists $g^{*}_{i}\in G_{i}$ such that $\tau_{i,t}(s|s,g^{*}_{i},h)=1$ for all $i\in N,s\in S,h\in H$.

We use $\{\xi,S\}$ to denote the perfect-information structure, where $\xi_{i}(s)=s$ for all $s\in S$, $i\in N$. Let $\mathcal{T}\left[\xi\right]\equiv\left[\xi;G,\Theta\right]$ with $g^{*}_{i}\in G_{i}$ and $\Theta^{g^{*}}_{i}=S$ denote the menu of signaling rules that contains $\left\{\xi,S\right\}$, and let $\Gamma[\xi]=\left\{\mathcal{T}\left[\xi;G,\Theta\right],\left\{C,C\right\}\right\}$ denote the corresponding cognition profile.

With abuse of notation, we use $g^{*}_{i,t}(=g^{*}_{i})\in G_{i}$ to denote agent $i$ ’s cognition choice that leads to perfect information structure. Suppose that all other agents choose $\{\xi,S\}$ in every period $t$ in the cognition stage. Hence, each agent $i$ knows that other agents observe the true $s_{t}$ (i.e., $\theta_{-i,t}=s_{t}$ ) though agent $i$ may not observe $s_{t}$ (i.e., $g_{i,t}\neq g^{*}_{i,t}$ ). For simplicity, we use $\tau_{i,t}$ as agent $i$ ’s signaling rule due to his period- $t$ cognition choice $g_{i,t}$ without specifying $g_{i,t}$ ; the same simplification is made for $\xi$ and $g^{*}$ . Hence, agent $i$ ’s value functions (H)-(HTA) can be rewritten as

	$\displaystyle\begin{aligned} &J_{i,t}\left(h_{t}\middle|\tau_{i,t},\xi_{-(i,t)},\pi\right)\\ &=\mathbb{E}^{\tau_{i,t},\xi_{-(i,t)}}_{\pi}\left[\sum_{k=t}^{\infty}\delta^{k-t}\left(R_{i}(\tilde{s}_{i,k},\tilde{a}_{k})+\tilde{c}_{i,k}\right)\middle|h_{t}\right],\end{aligned}$		(PI-H)
	$\displaystyle\begin{aligned} &V_{i,t}\left(h_{t},\theta_{i,t}\middle|\tau_{i,t},\xi_{-(i,t)},\pi\right)=\mathbb{E}^{\tau_{i},\xi_{-(i,t)}}_{\pi}\Big[\overline{R}_{i}(h_{t},\theta_{i,t},\tilde{a}_{t})+\tilde{c}_{i,t}\\ &+\sum_{k=t+1}^{\infty}\delta^{k-t}\left(R_{i}(\tilde{s}_{i,k},\tilde{a}_{k})+\tilde{c}_{i,k}\right)\Big|h_{t},\theta_{i,t}\Big],\end{aligned}$		(PI-HT)
	$\displaystyle\begin{aligned} &Q_{i,t}\left(h_{t},\theta_{i,t},a_{t}\middle|\tau_{i},\xi_{-(i,t)},\pi\right)=\overline{R}_{i}(h_{t},\theta_{i,t},a_{t})+c_{i,t}\\ &+\mathbb{E}^{\xi_{-(i,t)}}_{\pi}\left[\sum_{k=t+1}^{\infty}\delta^{k-t}\left(R_{i}(\tilde{s}_{i,k},\tilde{a}_{k})+\tilde{c}_{i,k}\right)\middle|h_{t},\theta_{i,t}\right].\end{aligned}$		(PI-HTA)

Define the set

\displaystyle\Pi\left[\xi;\mathtt{B}^{\Gamma\left[\xi\right]}\right]\equiv\left\{\pi=(\pi_{i,t})\middle|\begin{aligned} &V_{i,t}\left(h_{t},s_{t}\middle|\xi,\pi\right)\geq V_{i,t}\left(h_{t},s_{t}\middle|\xi,\left(\pi^{\prime}_{i,t},\pi_{-(i,t)}\right)\right),\\ &\forall i\in N,t\geq 1,h_{t}\in H,s_{t}\in S,\pi^{\prime}_{i,t},\textup{(FE1)},\textup{(FE2)}\end{aligned}\right\}.

(28)

The perfect-information PPME (PI-PPME) is defined as follows.

Definition 4 (PI-PPME).

In a game $\mathtt{B}^{\Gamma\left[\xi\right]}$, a profile $\left<\xi,\pi\right>$ constitutes a PI-PPME if, for all $i\in N$, $t\geq 1$, $h_{t}\in H$, and $\tau_{i,t}\in\mathcal{T}\left[\xi;G,\Theta\right]$,

\displaystyle J_{i,t}\left(h_{t}\middle|\xi,\pi\right)\geq J_{i,t}\left(h_{t}\middle|\tau_{i,t},\xi_{-(i,t)},\pi\right),\textup{ and }\pi\in\Pi\left[\xi;\mathtt{B}^{\Gamma\left[\xi\right]}\right].

(29)

For any profile $\left<\tau,\pi\right>$ , define each agent $i$ ’s period- $t$ ex-post history-state value function (EP-HSA value function) by

\begin{aligned} &W_{i,t}(s_{t},a_{t}|\tau,\pi)\equiv R_{i}(s_{t},a_{t})+c_{i,t}\\ &+\mathbb{E}^{\tau}_{\pi}\left[\sum_{k=t+1}^{\infty}\delta^{k-t}\left(R_{i}(\tilde{s}_{i,k},\tilde{a}_{k})+\tilde{c}_{i,k}\right)\middle|s_{t},a_{t}\right].\end{aligned}

(HSA)

Theorem 3 establishes a relationship between a PPME and a PI-PPME in terms of the H value functions.

Theorem 3.

Fix a base game $\mathtt{B}$ . For a profile $\left<\tau,\pi\right>$ that constitutes a PPME of a game $\mathtt{B}^{\Gamma}$ with feasible cognition profile $\Gamma=\left\{\mathcal{T}^{\natural}\left[G^{\prime},\Theta^{\prime}\right],C\right\}$ for any $C\in\mathcal{F}$ , there exists a PI-PPME profile $\left<\xi,\pi^{*}\right>$ of a game $\mathtt{B}^{\Gamma\left[\xi\right]}$ with feasible cognition profile $\Gamma\left[\xi\right]=\left\{\mathcal{T}^{\natural}\left[\xi;G,\Theta\right],C^{*}\right\}$ for SAB cost profile $C^{*}$ , such that, for all $i\in N$ , $t\geq 1$ , $h_{t}\in H$ ,

\begin{aligned} &J_{i,t}(h_{t}|\tau,\pi)=\mathbb{E}^{\tau}_{\pi}\left[W_{i,t}\left(\tilde{s}_{t},\tilde{a}_{t}\middle|\xi,\pi^{*}\right)\middle|h_{t}\right]\\ &=\mathbb{E}^{\xi}_{\pi^{*}}\left[W_{i,t}\left(\tilde{s}_{t},\tilde{a}_{t}\middle|\xi,\pi^{*}\right)\middle|h_{t}\right]=J_{i,t}(h_{t}|\xi,\pi^{*}).\end{aligned}

(30)

IV-C Value-Preserving Transformation: From PI-PPME to PPME

Given any history $h\in H$ , we define the perfect-information (PI) achievable value set, $\mathcal{V}\left(h\right)$ as follows:

\displaystyle\mathcal{V}\left(h\right)\equiv\Big\{u=(u_{i})\Big|u_{i}=J_{i}\left(h\middle|\xi,\pi^{*}\right),\textup{ for a PI-PPME $\left<\xi,\pi^{*}\right>$ with a SAB cost profile}\Big\}.

(31)

For any PI-PPME with a SAB cost profile, let $\mathcal{V}^{*}(h)=\mathcal{V}^{*}(h|\xi,\pi^{*})\equiv\left\{u\middle|u_{i}=J_{i}(h|\xi,\pi^{*})\right\}\subset\mathcal{V}\left(h\right)$. For any $J(h)=\left(J_{i}\left(h\right)\right)_{i\in N}$ in $\mathcal{V}^{*}\left(h\right)$, there must exist a PI-PPME with a SAB cost profile that leads to the H value $J_{i}(h)$ for each agent $i$ given history $h\in H$. Construct

\begin{aligned} &\mathbf{V}_{i}\left(h,\theta_{i}\middle|\tau,\pi;J^{*}\right)=\sum_{s,a,\theta_{-i}}\Big(R_{i}(s,a)+c_{i,t}\\ &+\delta J^{*}_{i}(s,a)\Big)\pi(a|\theta_{i},\theta_{-i})\mu_{i}(s,\theta_{-i}|\theta;\tau)T(s|h).\end{aligned}

Define the function $U:\prod\limits_{h\in\mathcal{H}}\left(\mathcal{V}^{*}(h)\times\prod\limits_{i\in\mathcal{N}}\Delta\left(\Theta_{i}\right)\right)\mapsto\prod\limits_{i\in N}\prod\limits_{h\in H}\Delta\left(\Theta_{i}\right)$ by

\displaystyle U\left(\tau\right)(\theta_{i},h|\pi;J(h))=\frac{\tau_{i}\left(\theta_{i}\middle|h\right)+\max\left(0,\mathbf{V}_{i}\left(h,\theta_{i}\middle|\tau,\pi;J^{*}\right)-J^{*}_{i}(h)\right)}{1+\sum_{\theta^{\prime}_{i}\in\Theta_{i}}\max\left(0,\mathbf{V}_{i}\left(h,\theta^{\prime}_{i}\middle|\tau,\pi;J^{*}\right)-J^{*}_{i}(h)\right)}.

(32)
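
The map $U$ in (32) can be illustrated with a minimal sketch of a single application for one agent and one history. The function name `u_map` and the numbers are assumptions for illustration only. In the model, $\mathbf{V}_{i}$ itself depends on $\tau$; the sketch treats the vector of values $\mathbf{V}_{i}(h,\cdot)$ as given, so it shows the shape of the map and not a procedure for finding a fixed point.

```python
import numpy as np

def u_map(tau_i, V_i, J_i):
    """One application of (32) for a fixed agent and history.

    tau_i : current distribution over the agent's types
    V_i   : value V_i(h, theta_i) for each type (held fixed in this sketch)
    J_i   : target H value J_i(h)
    """
    gain = np.maximum(0.0, V_i - J_i)            # positive part of V_i - J_i
    return (tau_i + gain) / (1.0 + gain.sum())   # renormalized distribution

tau_i = np.array([0.5, 0.3, 0.2])

# No type has value above J_i: tau_i is left unchanged (a fixed point of the map).
unchanged = u_map(tau_i, np.array([1.0, 0.5, 1.0]), J_i=1.0)
# The first type exceeds J_i by 1: probability mass shifts toward it.
shifted = u_map(tau_i, np.array([2.0, 0.5, 1.0]), J_i=1.0)
```

In this sketch the output always remains a probability distribution, and $\tau_{i}$ is returned unchanged exactly when no type yields a value above $J_{i}(h)$.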

Proposition 3.

Given any PI-PPME profile $\left<\xi,\pi^{*}\right>$ with a SAB cost profile, there exists at least one profile $\left<\tau,\pi\right>$ with a feasible cognition profile with any cost profile $C\in\mathcal{F}$ such that

\tau_{i}\left(\theta_{i}\middle|h\right)=U\left(\tau\right)(\theta_{i},h|\pi;J(h)).

(fp)

Theorem 4.

Given any PI-PPME profile $\left<\xi,\pi^{*}\right>$ with a SAB cost profile, a profile $\left<\tau,\pi\right>$ with a feasible cognition profile with any cost profile $C\in\mathcal{F}$ is a PPME if and only if it satisfies (fp).

References

[1] M. Wu, S. Amin, and A. E. Ozdaglar, “Value of information in bayesian routing games,” Operations Research, vol. 69, no. 1, pp. 148–163, 2021.
[2] E. Kamenica and M. Gentzkow, “Bayesian persuasion,” American Economic Review, vol. 101, no. 6, pp. 2590–2615, 2011.
[3] L. Mathevet, J. Perego, and I. Taneva, “On information design in games,” Journal of Political Economy, vol. 128, no. 4, pp. 1370–1404, 2020.
[4] T. Zhang and Q. Zhu, “Bayesian promised persuasion: Dynamic forward-looking multiagent delegation with informational burning,” arXiv preprint arXiv:2201.06081, 2022.
[5] A. Celli, S. Coniglio, and N. Gatti, “Private bayesian persuasion with sequential games,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 02, 2020, pp. 1886–1893.
[6] Y. Babichenko, I. Talgam-Cohen, and K. Zabarnyi, “Bayesian persuasion under ex ante and ex post constraints,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 6, 2021, pp. 5127–5134.
[7] J. Gan, R. Majumdar, G. Radanovic, and A. Singla, “Bayesian persuasion in sequential decision-making,” arXiv preprint arXiv:2106.05137, 2021.
[8] E. Maskin and J. Tirole, “Markov perfect equilibrium: I. observable actions,” Journal of Economic Theory, vol. 100, no. 2, pp. 191–219, 2001.
[9] D. Fudenberg and J. Tirole, Game Theory. Ane Books, 2005. [Online]. Available: https://books.google.com.hk/books?id=Ij7WQwAACAAJ
[10] J. Filar and K. Vrieze, “Competitive markov decision processes-theory, algorithms, and applications,” 1997.
[11] H. Prasad and S. Bhatnagar, “General-sum stochastic games: Verifiability conditions for nash equilibria,” Automatica, vol. 48, no. 11, pp. 2923–2930, 2012.
[12] H. Prasad, P. LA, and S. Bhatnagar, “Two-timescale algorithms for learning nash equilibria in general-sum stochastic games,” in Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, 2015, pp. 1371–1379.
[13] J. Song, H. Ren, D. Sadigh, and S. Ermon, “Multi-agent generative adversarial imitation learning,” Advances in neural information processing systems, vol. 31, 2018.
[14] J. Wu, Z. Zhang, Z. Feng, Z. Wang, Z. Yang, M. I. Jordan, and H. Xu, “Sequential information design: Markov persuasion process and its efficient reinforcement learning,” arXiv preprint arXiv:2202.10678, 2022.
[15] T. Zhang and Q. Zhu, “Forward-looking dynamic persuasion for pipeline stochastic bayesian game: A fixed-point alignment principle,” arXiv preprint arXiv:2203.09725, 2022.
[16] O. Hernández-Lerma and J. B. Lasserre, Discrete-time Markov control processes: basic optimality criteria. Springer Science & Business Media, 2012, vol. 30.
[17] R. Bellman, “Dynamic programming,” Science, vol. 153, no. 3731, pp. 34–37, 1966.
[18] T. M. Cover and J. A. Thomas, Elements of information theory. John Wiley & Sons, 1999.

-D Proof of Lemma 1

The only if part is straightforward. In particular, if $\left<\beta,\pi\right>$ is a PPME, then $\pi=(\pi_{i},\pi_{-i})\in\Pi\left[\beta;\mathtt{B}^{\Gamma}\right]$ and (6) holds for all $\pi^{\prime}_{i}$ . Hence, (6) also holds for $\pi_{i}$ .

We proceed with the proof of the if part by establishing a contradiction. Suppose that there exists a pair $(\beta^{\prime}_{i,t},\pi^{\prime}_{i,t})$ such that

J_{i}\left(h\middle|\tau,\beta,\pi\right)<J_{i}\left(h\middle|\tau,\left(\beta^{\prime}_{i,t},\beta_{-(i,t)}\right),\left(\pi^{\prime}_{i,t},\pi_{-(i,t)}\right)\right).

(33)

The H value function $J_{i}$ can be constructed in terms of $\mathtt{EV}_{i}$ as follows:

J_{i}\left(h\middle|\beta,\pi\right)=\sum_{\theta_{i},s}\mathtt{EV}_{i}\left(h,\theta_{i}\middle|\tau,\beta,\pi;V_{i}\right)\tau_{i}\left(\theta_{i}\middle|s,\beta_{i}(h)\right)T(s|h).

Since $\pi\in\Pi\left[\beta;\mathtt{B}^{\Gamma}\right]$ , we obtain

\begin{aligned} &\mathtt{EV}_{i}\left(h,\theta_{i}\middle|\tau,\beta,\pi;V_{i}\right)=\mathtt{EV}_{i}\left(h,\theta_{i}\middle|\tau,\left(\beta^{\prime}_{i,t},\beta_{-(i,t)}\right),\pi;V_{i}\right)\\ &\geq\mathtt{EV}_{i}\left(h,\theta_{i}\middle|\tau,\beta,\left(\pi^{\prime}_{i,t},\pi_{-(i,t)}\right);V_{i}\right)\\ &=\mathtt{EV}_{i}\left(h,\theta_{i}\middle|\tau,\left(\beta^{\prime}_{i,t},\beta_{-(i,t)}\right),\left(\pi^{\prime}_{i,t},\pi_{-(i,t)}\right);V_{i}\right),\end{aligned}

which implies

J_{i}\left(h\middle|\tau,\left(\beta^{\prime}_{i,t},\beta_{-(i,t)}\right),\pi\right)\geq J_{i}\left(h\middle|\tau,\left(\beta^{\prime}_{i,t},\beta_{-(i,t)}\right),\left(\pi^{\prime}_{i,t},\pi_{-(i,t)}\right)\right).

However, due to (33), we have

J_{i}\left(h\middle|\tau,\left(\beta^{\prime}_{i,t},\beta_{-(i,t)}\right),\pi\right)>J_{i}\left(h\middle|\tau,\beta,\pi\right),

which contradicts (8). This completes the proof of the lemma. $\square$

-E Proof of Proposition 1

Suppose that $\left<\beta,\pi\right>$ with $V$ is a PPME. By the definition of PPME, it is straightforward to show that the constraints (FE1), (FE2), (EQ1), and (EQ2) are satisfied. Hence, $\left<\pi,V\right>$ is a feasible solution to the optimization problem of (OPT). By construction, $Z\left(\pi,V\middle|\beta,\tau,C\right)=0$ . From the feasibility, $\left<\beta,\pi\right>$ is a global minimum of the optimization problem of (OPT).

Conversely, suppose that $\left<\pi,V\right>\in\mathcal{K}\left(\beta\middle|\tau,C\right)$ with $Z\left(\pi,V\middle|\beta,\tau,C\right)=0$ . Then, the constraints (EQ1) and (EQ2) imply that, for all $i\in\mathcal{N}$ , $h\in\mathcal{H}$ , $\theta_{i}\in\Theta_{i}$ with $\tau_{i}(\theta_{i}|s,\beta_{i}(h))>0$ where $s\in\mathcal{S}$ with $T(s|h)>0$ ,

\displaystyle V_{i}\left(h,\theta\right)\geq\sum_{a}\mathbf{Q}_{i}\left(h,\theta_{i},a\middle|\beta,\tau;V_{i}\right)\pi\left(a\middle|\theta\right).

However, $Z\left(\pi,V\middle|\beta,\tau,C\right)=0$. Hence, we obtain, for all $i\in\mathcal{N}$,

\displaystyle V_{i}\left(h,\theta\right)=\sum_{a}\mathbf{Q}_{i}\left(h,\theta_{i},a\middle|\beta,\tau;V_{i}\right)\pi\left(a\middle|\theta\right).

By iteration, we have that $V$ is the unique optimal HT value function profile associated with $\pi$ . In addition, the constraint (EQ1) implies that given $V$ , $\beta$ is a PPME selection policy profile. Therefore, the profile $\left<\beta,\pi\right>$ is a PPME. $\square$

-F Proof of Proposition 2

Suppose that $\left<\pi,V\right>\in\mathcal{E}\left(\beta\middle|\tau,C\right)$ , i.e., $\left<\beta,\pi,V\right>$ is a global minimum of the optimization problem in (OPT) with $Z\left(\pi,V\middle|\beta,\tau,C\right)=0$ . Then, the constraints (RG1) and (RG2) are trivially satisfied. Proposition 1 implies that $\left<\beta,\pi\right>$ is a PPME. From the construction of $Z(\cdot)$ in (OBJ1) and the constraint (EQ2), we have that $\left<\tau,\pi,V\right>$ satisfies (EQ4). According to (10), we construct $J$ as $J_{i}(h)=\sum_{\theta,s}V_{i}\left(h,\theta\middle|\tau,\beta,\pi\right)\tau\left(\theta\middle|s,\beta\left(h\right)\right)T(s|h)$ . Then, $Z^{\mathtt{FPA}}\left(\tau,J,V\middle|\pi,C\right)=0$ . Since $\left<\pi,V\right>$ satisfies (EQ1) given $\tau$ ,

J_{i}(h)\geq\sum_{\theta_{-i},s}V_{i}\left(h,\theta_{i},\theta_{-i}\right)\tau_{-i}\left(\theta_{-i}\middle|s,\beta_{-i}(h)\right)T(s|h),

for all $\theta_{i}\in\Theta_{i}$ and $h\in\mathcal{H}$ , which implies (EQ3). From the constraints (EQ3) and (EQ4), we know that for any feasible $\left<\tau^{\prime},J^{\prime},V^{\prime}\right>$ , $Z^{\mathtt{FPA}}\left(\tau^{\prime},J^{\prime},V^{\prime}\middle|\pi^{\prime}\right)\geq 0$ where $\pi^{\prime}$ is the corresponding policy profile. Therefore, from $Z^{\mathtt{FPA}}\left(\tau,J,V\middle|\pi,C\right)=0$ , we conclude that $\left<\tau,J,V\right>$ is a global minimum of the optimization problem in (GFPA).

Conversely, let $\left<\tau,J,V\right>\in\mathcal{E}^{\mathtt{FPA}}\left(\pi,C\right)$ with $Z^{\mathtt{FPA}}(\tau,J,V|\pi)=0$ . Then

J_{i}(h)=\sum_{\theta,s}V_{i}\left(h,\theta\middle|\tau,\beta,\pi\right)\tau\left(\theta\middle|s,\beta\left(h\right)\right)T(s|h).

(34)

The constraint (EQ4) directly implies (EQ2), while the constraint (EQ3) implies

\begin{aligned} J_{i}\left(h\right)&\geq\sum\limits_{s,\theta_{-i},a}\Bigg(\left(\overline{R}_{i}(h,\theta_{i},a)+J_{i}\left(s,a\right)\right)\\ &\times\pi(a|\theta_{i},\theta_{-i})\tau\left(\theta_{i},\theta_{-i}|s,\beta(h)\right)T(s|h)\Bigg),\end{aligned}

(35)

where the RHS can be written as

\begin{aligned} \textup{RHS of (35)}&=\sum_{s,\theta_{-i},a}\Bigg(\left(\overline{R}_{i}(h,\theta_{i},a)+\sum_{s}J_{i}\left(s,a\right)\mu_{i}(s|\theta_{i},h)\right)\\ &\times\pi(a|\theta)\tau_{-i}\left(\theta_{-i}|s,\beta(h)\right)T(s|h)\Bigg).\end{aligned}

Construct

\begin{aligned} &\mathbf{Q}_{i}\left(h,\theta_{i},a\middle|\tau,\beta;V_{i}\right)=\overline{R}_{i}\left(h,\theta_{i},a\right)\\ &+\delta\sum\limits_{s}\sum_{\theta,s}V_{i}\left(h,\theta\middle|\tau,\beta,\pi\right)\tau\left(\theta\middle|s,\beta\left(h\right)\right)T\left(s\middle|h\right)\mu_{i}\left(s\middle|\theta_{i};\tau\right).\end{aligned}

Then,

\begin{aligned} \textup{RHS of (35)}&=\sum_{s,\theta_{-i},a}\Bigg(\left(\mathbf{Q}_{i}\left(h,\theta_{i},a\middle|\tau,\beta;V_{i}\right)\right)\\ &\times\pi(a|\theta)\tau_{-i}\left(\theta_{-i}|s,\beta(h)\right)T(s|h)\Bigg).\end{aligned}

The constraint (EQ4) implies $V_{i}\left(h,\theta\right)=$ $\sum_{a}\mathbf{Q}_{i}\left(h,\theta_{i},a\middle|\tau,\beta;V_{i}\right)\pi(a|\theta),$ and thus $Z(\pi,V|\tau)=0$ . Hence, from (34) and (35), we have

\begin{aligned}
&\sum_{\theta,s}V_{i}\left(h,\theta\right)\tau(\theta|s,\beta(h))T(s|h)\\
&\qquad\geq\sum_{\theta_{-i},s}V_{i}\left(h,\theta\right)\tau_{-i}(\theta_{-i}|s,\beta_{-i}(h))T(s|h),
\end{aligned}

for all $\theta_{i}\in\Theta_{i}$ , which implies (EQ1). Therefore, $\left<\pi,V\right>\in\mathcal{E}\left(\beta\middle|\tau,C\right)$ with $Z(\pi,V|\tau)=0$ . $\square$
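The alignment identity (34) and the per-type comparison used in the proofs can be checked numerically on a toy instance. The sketch below is only illustrative: it fixes one history $h$, treats a single agent with the opponents' types already marginalized out, and uses made-up numbers for $V$, $\tau$, and $T$. The helper names (`aligned_value`, `per_type_terms`, `alignment_gaps`) are hypothetical and do not come from the paper.

```python
# Toy check of J(h) = sum_{theta, s} V(h, theta) * tau(theta | s) * T(s | h), as in (34).
# Assumptions (not from the paper): one fixed history h, one agent, two states, two types.

def aligned_value(V, tau, T):
    """Right-hand side of (34) for a fixed history."""
    return sum(V[th] * tau[s][th] * T[s] for s in T for th in V)

def per_type_terms(V, tau, T):
    """For each type theta, the partial sum over states only."""
    return {th: sum(V[th] * tau[s][th] * T[s] for s in T) for th in V}

def alignment_gaps(V, tau, T):
    """J(h) minus each per-type term; a nonnegative gap means the inequality holds."""
    J = aligned_value(V, tau, T)
    return {th: J - term for th, term in per_type_terms(V, tau, T).items()}

# Hypothetical data: V[theta], tau[s][theta] = tau(theta | s), T[s] = T(s | h).
V = {"lo": 1.0, "hi": 3.0}
T = {"s0": 0.5, "s1": 0.5}
tau = {"s0": {"lo": 0.75, "hi": 0.25}, "s1": {"lo": 0.25, "hi": 0.75}}

J = aligned_value(V, tau, T)
gaps = alignment_gaps(V, tau, T)
print(J, gaps)
```

In this instance the gaps are nonnegative because the toy values of $V$ are nonnegative; this is a property of the example, not a general claim about the model.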

-G Proof of Theorem 2

To prove Theorem 2, we show that $\left<\tau,J,V\right>\in\mathcal{E}^{\mathtt{GFPA}}\left(\pi\right)$ with $Z^{\mathtt{GFPA}}\left(\tau,J,V\middle|\pi\right)=0$ if and only if $\left<\tau,\pi\right>$ is locally admissible.

Local Admissibility $\Longrightarrow$ PPME

Fix any $s\in\mathcal{S}$ and $h\in\mathcal{H}$ . Suppose that $\left<\tau,\pi\right>$ is locally admissible. From $\Delta_{i}\left(X^{s}_{i},\tau_{i},\bm{f}_{i}\right)=0$ ,

\nabla_{X^{s}_{i}}Z^{\mathtt{LFPA}}_{i}(X^{s}_{i},\tau_{i},s,h)-\sum\nolimits_{\theta_{i}\in\Theta_{i}}f[\theta_{i}]\nabla_{X^{s}_{i}}\lambda_{i}(X^{s}_{i};\theta_{i},h)=0.

Since $\left\{\nabla_{X^{s}_{i}}\lambda_{i}(X^{s}_{i};\theta_{i},h)\right\}_{\theta_{i}\in\Theta_{i}}$ is a set of linearly independent vectors for all $X^{s}_{i}$ , $i\in\mathcal{N}$ , $s\in\mathcal{S}$ , $h\in\mathcal{H}$ , we have, for all $i\in\mathcal{N}$ ,

f\left[\theta_{i}\right]=\tau_{i}\left(\theta_{i}\middle|s,g_{i},h\right),\textup{ for all }\theta_{i}\in\Theta_{i}.

(36)

In the decomposition $\Theta_{i}=\Theta^{\natural}_{i}\cup\left\{\hat{\theta}_{i}\right\}$ , the probability of $\hat{\theta}_{i}$ is fully determined by the probabilities of the types in $\Theta^{\natural}_{i}$ . That is,

\tau_{i}\left(\hat{\theta}_{i}\middle|s,g_{i},h\right)=1-\sum\nolimits_{\theta_{i}\in\Theta^{\natural}_{i}}\tau_{i}\left(\theta_{i}\middle|s,g_{i},h\right).

From $D_{i}\left(X^{s}_{i},\tau_{i}(\theta_{i}|\cdot),e_{i},b[\theta_{i}]|s,h\right)=0$ , we have $b[\theta_{i}]-e+\frac{\partial}{\partial\tau_{i}(\theta_{i}|\cdot)}M_{i}(X^{s}_{i},\tau_{i}(\theta_{i}|\cdot),s,h)=0$ . Then, $b[\theta_{i}]=e+\lambda_{i}\left(X^{s}_{i};\hat{\theta}_{i},h\right)-\lambda_{i}\left(X^{s}_{i};\theta_{i},h\right)$ or, equivalently, $e=-\lambda_{i}\left(X^{s}_{i};\hat{\theta}_{i},h\right)+\lambda_{i}\left(X^{s}_{i};\theta_{i},h\right)+b[\theta_{i}]$ . From (36) and $\mathbf{K}(\bm{e},\bm{b},\bm{f};\tau,\bm{\lambda})=0$ , we have, for all $\theta_{i}\in\Theta^{\natural}_{i}$ ,

\begin{aligned}
&\tau_{i}\left(\theta_{i}\middle|s,g_{i},h\right)\left(\lambda_{i}\left(X^{s}_{i};\hat{\theta}_{i},h\right)-\lambda_{i}\left(X^{s}_{i};\theta_{i},h\right)+e\right)=0,\\
&\tau_{i}\left(\hat{\theta}_{i}\middle|s,g_{i},h\right)\left(-\lambda_{i}\left(X^{s}_{i};\hat{\theta}_{i},h\right)+\lambda_{i}\left(X^{s}_{i};\theta_{i},h\right)+b[\theta_{i}]\right)=0,
\end{aligned}

and for all $\theta_{i}\in\Theta_{i}$ ,

\tau_{i}\left(\theta_{i}\middle|s,g_{i},h\right)\lambda_{i}\left(X^{s}_{i};\theta_{i},h\right)=0,

which implies

\begin{aligned}
b[\theta_{i}]&=-\lambda_{i}\left(X^{s}_{i};\theta_{i},h\right),\quad\forall\theta_{i}\in\Theta^{\natural}_{i},\\
e&=-\lambda_{i}\left(X^{s}_{i};\hat{\theta}_{i},h\right).
\end{aligned}

(37)

Therefore, $\bm{F}\left(\bm{X}^{s},\tau,\bm{e},\bm{b},\bm{f}\right)=0$ and $\bm{K}\left(\bm{e},\bm{b},\bm{f};\tau,\bm{\lambda}\right)=0$ imply $Z^{\mathtt{LFPA}}_{i}(X^{s}_{i},\tau_{i},s,h)=0$ , leading to $Z^{\mathtt{GFPA}}\left(\tau,J,V\middle|\pi\right)=0$ . In addition, $\pi_{i}(a_{i}|\theta_{i})\gamma_{i}\left(J_{i},V_{i},\pi_{-i}|\tau_{i},\theta_{i},a_{i},h\right)=0$ implies that $Z(\pi,V|\tau)=0$ . From Proposition 1, $\left<\tau,\pi\right>$ with $V$ constitutes a PPME. Then, from Proposition 2, we have $\left<\tau,J,V\right>\in\mathcal{E}^{\mathtt{GFPA}}\left(\pi,C\right)$ with $Z^{\mathtt{GFPA}}\left(\tau,J,V\middle|\pi\right)=0$ .

PPME $\Longrightarrow$ Local Admissibility

Suppose that $\left<\tau,\pi\right>$ is a PPME. Then, Proposition 2 implies $\left<\tau,J,V\right>\in\mathcal{E}^{\mathtt{GFPA}}\left(\pi\right)$ with $Z^{\mathtt{GFPA}}\left(\tau,J,V\middle|\pi\right)=0$ . It follows that

J_{i}(h)\geq\sum\nolimits_{\theta_{-i},s}V_{i}(h,\theta_{i},\theta_{-i})\tau(\theta_{i},\theta_{-i}|s,g,h)T_{s}(s|h),

for all $i\in\mathcal{N}$ , $h\in\mathcal{H}$ , $\theta_{i}\in\Theta_{i}$ , which implies that $\lambda_{i}(X^{s}_{i};\theta_{i},h)\geq 0$ . Since $Z^{\mathtt{GFPA}}\left(\tau,J,V\middle|\pi\right)=0$ , we have

J_{i}(h)=\sum\nolimits_{\theta,s}V_{i}(h,\theta)\tau(\theta|s,g,h)T_{s}(s|h).

Then, from the definition of $Z^{\mathtt{LFPA}}_{i}$ in (18), $Z^{\mathtt{LFPA}}_{i}(X^{s}_{i},\tau_{i},s,h)=0$ . Since $\lambda_{i}(X^{s}_{i};\theta_{i},h)\geq 0$ for all $\theta_{i}\in\Theta_{i}$ , we have

\tau_{i}(\theta_{i}|s,g,h)\lambda_{i}(X^{s}_{i};\theta_{i},h)=0.

By constructing $f[\theta_{i}]$ according to (36), and $b[\theta_{i}]$ and $e$ according to (37), we can show that there exist Lagrange multipliers such that the conditions in $\mathcal{R}\left(s,h\right)$ are satisfied.
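The multiplier construction can be sanity-checked on a small instance. The sketch below assumes hypothetical values of $\tau_{i}(\theta_{i}|s,g_{i},h)$ and of the slacks $\lambda_{i}(X^{s}_{i};\theta_{i},h)$ for one agent at a fixed $(s,h)$; the type labels and function names are illustrative and not part of the paper. It builds $f$, $b$, and $e$ as in (36) and (37) and evaluates the two product conditions that hold for $\theta_{i}\in\Theta^{\natural}_{i}$.

```python
# Toy check of the multiplier construction f = tau, b = -lambda, e = -lambda(theta_hat).
# Assumptions (not from the paper): three types, hand-picked tau and lambda values.

def complementary_slackness(tau, lam, tol=1e-12):
    """True if lambda >= 0 and tau(theta) * lambda(theta) = 0 for every type."""
    return all(lam[th] >= -tol and abs(tau[th] * lam[th]) <= tol for th in tau)

def build_multipliers(tau, lam, theta_hat):
    """f as in (36); b and e as in (37). theta_hat is the type singled out in the decomposition."""
    f = dict(tau)
    b = {th: -lam[th] for th in tau if th != theta_hat}
    e = -lam[theta_hat]
    return f, b, e

def product_residuals(tau, lam, b, e, theta_hat):
    """The two products that must vanish for each type other than theta_hat."""
    out = {}
    for th in b:
        first = tau[th] * (lam[theta_hat] - lam[th] + e)
        second = tau[theta_hat] * (-lam[theta_hat] + lam[th] + b[th])
        out[th] = (first, second)
    return out

# Hypothetical data: theta_hat has zero probability and a positive slack.
tau = {"t1": 0.5, "t2": 0.5, "t_hat": 0.0}
lam = {"t1": 0.0, "t2": 0.0, "t_hat": 2.0}

f, b, e = build_multipliers(tau, lam, "t_hat")
residuals = product_residuals(tau, lam, b, e, "t_hat")
print(complementary_slackness(tau, lam), residuals)
```

When complementary slackness holds, substituting $e=-\lambda_{i}(X^{s}_{i};\hat{\theta}_{i},h)$ and $b[\theta_{i}]=-\lambda_{i}(X^{s}_{i};\theta_{i},h)$ reduces the two products to $-\tau_{i}(\theta_{i}|\cdot)\lambda_{i}(X^{s}_{i};\theta_{i},h)$ and $-\tau_{i}(\hat{\theta}_{i}|\cdot)\lambda_{i}(X^{s}_{i};\hat{\theta}_{i},h)$, which is why both vanish in the example.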

From Proposition 1, $\left<\pi,V\right>\in\mathcal{E}\left(\beta\middle|\tau\right)$ with $Z(\pi,V|\tau)=0$ . Hence, we have

\mathtt{EV}_{i}\left(h,\theta_{i}|\tau_{i},\pi,V_{i}\right)-\mathbb{E}^{\mu_{i}}_{\pi_{-i}}\left[Q_{i}(h,\theta_{i},a_{i},\tilde{a}_{-i}|\tau;J_{i})\Big{|}h,\theta_{i}\right]\geq 0.

However, $Z(\pi,V|\tau)=0$ . Then, it holds that

\mathtt{EV}_{i}\left(h,\theta_{i}|\tau_{i},\pi,V_{i}\right)-\mathbb{E}^{\mu_{i}}_{\pi_{-i}}\left[Q_{i}(h,\theta_{i},a_{i},\tilde{a}_{-i}|\tau;J_{i})\Big{|}h,\theta_{i}\right]=0.

Therefore, we have $\pi_{i}(a_{i}|\theta_{i})\gamma_{i}\left(J_{i},V_{i},\pi_{-i}|\tau_{i},\theta_{i},a_{i},h\right)=0$ for all $i\in\mathcal{N}$ , $a_{i}\in A_{i}$ , $\theta_{i}\in\Theta_{i}$ , $h\in\mathcal{H}$ , together with (\ref{eq:FE1}) and (\ref{eq:FE2}). Thus, we conclude that $\left<\tau,\pi\right>$ is locally admissible. $\square$

-I Proof of Theorem 3

Consider $\left<\tau,\pi\right>$ that is a PPME of a game $\mathtt{B}^{\Gamma}$ with any cognition cost profile $C\in\mathcal{F}$ . The profile $\left<\tau,\pi\right>$ and the base game model $\mathtt{B}$ induce the probability measures $P\left[\tau,\pi\right]$ , $P\left[\tau,\pi\middle|h\right]$ , and $P\left[\tau,\pi\middle|h,\theta_{i}\right]$ . With a slight abuse of notation, we use $f(\cdot)$ to denote the marginal mass or density function corresponding to $P\left[\tau,\pi\right]$ ; e.g., $f(a_{t}|h_{t})$ , $f(c_{i,t}|h_{t})$ , $f(a_{i,t}|h_{t},\theta_{i,t})$ . Given the PPME $\left<\tau,\pi\right>$ of a game $\mathtt{B}^{\Gamma}$ with feasible cognition profile $\Gamma$ , consider a profile $\left<\xi,\pi^{*}\right>$ and an SBA cost profile $C^{*}$ that satisfy the following, for all $i\in\mathcal{N}$ , $t\geq 1$ ,

\displaystyle\pi^{*}_{i,t}(a_{i,t}|s_{t})\equiv f(a_{i,t}|s_{t})\textup{ and }C^{*}_{i,t}(s_{t},a_{i,t})\equiv\int_{\mathcal{C}}c_{i,t}f(c_{i,t}|s_{t},a_{i,t}).

In the PPME $\left<\tau,\pi\right>$ of the game $\mathtt{B}^{\Gamma}$ , the H value function for any $h_{t}\in\mathcal{H}$ is given by

\begin{aligned}
J_{i,t}\left(h_{t}\middle|\tau,\pi\right)&=\mathbb{E}^{\tau}_{\pi}\left[R_{i}\left(\tilde{s}_{t},\tilde{a}_{t}\right)+\tilde{c}_{i,t}+\delta J_{i,t+1}\left(\tilde{s}_{t},\tilde{a}_{t}\middle|\tau,\pi\right)\middle|h_{t}\right]\\
&=\mathbb{E}^{\tau}_{\pi}\left[R_{i}\left(\tilde{s}_{t},\tilde{a}_{t}\right)\middle|h_{t}\right]+\mathbb{E}^{\tau}_{\pi}\left[\tilde{c}_{i,t}\middle|h_{t}\right]+\delta\mathbb{E}^{\tau}_{\pi}\left[J_{i,t+1}\left(\tilde{s}_{t},\tilde{a}_{t}\middle|\tau,\pi\right)\middle|h_{t}\right].
\end{aligned}

(40)

Given the profile $\left<\xi,\pi^{*}\right>$ , the H value function for any $h_{t}\in\mathcal{H}$ can be represented in terms of the EP-HSA value function as follows:

\begin{aligned}
J_{i,t}\left(h_{t}\middle|\xi,\pi^{*}\right)&=\mathbb{E}^{\xi}_{\pi^{*}}\left[W_{i,t}\left(\tilde{s}_{t},\tilde{a}_{t}\middle|\xi,\pi^{*}\right)\middle|h_{t}\right]\\
&=\sum_{s_{t},a_{t}}W_{i,t}\left(s_{t},a_{t}\middle|\xi,\pi^{*}\right)\pi^{*}_{t}(a_{t}|s_{t})T(s_{t}|h_{t})\\
&=\sum_{s_{t},a_{t}}R_{i}(s_{t},a_{t})\pi^{*}_{t}(a_{t}|s_{t})T(s_{t}|h_{t})\\
&\quad+\sum_{s_{t},a_{t}}C^{*}_{i,t}(s_{t},a_{t})\pi^{*}_{t}(a_{t}|s_{t})T(s_{t}|h_{t})\\
&\quad+\delta\sum_{s_{t},a_{t}}J_{i,t+1}\left(s_{t},a_{t}\middle|\xi,\pi^{*}\right)\pi^{*}_{t}(a_{t}|s_{t})T(s_{t}|h_{t})\\
&=\sum_{s_{t},a_{t}}R_{i}(s_{t},a_{t})f(a_{t}|s_{t})T(s_{t}|h_{t})\\
&\quad+\sum_{s_{t},a_{t}}\int_{\mathcal{C}}c_{i,t}f(c_{i,t}|s_{t},a_{t})f(a_{t}|s_{t})T(s_{t}|h_{t})\\
&\quad+\delta\sum_{s_{t},a_{t}}J_{i,t+1}\left(s_{t},a_{t}\middle|\xi,\pi^{*}\right)f(a_{t}|s_{t})T(s_{t}|h_{t}).
\end{aligned}

(41)

Given $P\left[\tau,\pi\right]$ , it holds that $f(a_{t}|s_{t})T(s_{t}|h_{t})=f(s_{t},a_{t}|h_{t})$ . Hence,

\begin{aligned}
J_{i,t}\left(h_{t}\middle|\xi,\pi^{*}\right)&=\mathbb{E}^{\xi}_{\pi^{*}}\left[W_{i,t}\left(\tilde{s}_{t},\tilde{a}_{t}\middle|\xi,\pi^{*}\right)\middle|h_{t}\right]\\
&=\mathbb{E}^{\tau}_{\pi}\left[R_{i}\left(\tilde{s}_{t},\tilde{a}_{t}\right)+\tilde{c}_{i,t}+\delta J_{i,t+1}\left(\tilde{s}_{t},\tilde{a}_{t}\middle|\xi,\pi^{*}\right)\middle|h_{t}\right]\\
&=\mathbb{E}^{\tau}_{\pi}\left[W_{i,t}\left(\tilde{s}_{t},\tilde{a}_{t}\middle|\xi,\pi^{*}\right)\middle|h_{t}\right]=J_{i,t}\left(h_{t}\middle|\tau,\pi\right).
\end{aligned}
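The cost term of this value-preservation argument is an instance of the law of total expectation, and it can be illustrated on a small discrete example. The sketch below assumes a hypothetical joint distribution over $(s_{t},a_{t},c_{i,t})$ given a fixed history, standing in for the measure induced by $P\left[\tau,\pi\right]$; the function names and the numbers are illustrative only. It builds $\pi^{*}(a|s)=f(a|s)$ and $C^{*}(s,a)=\mathbb{E}[c|s,a]$ and compares the expected cost before and after the transformation, using exact rational arithmetic.

```python
from collections import defaultdict
from fractions import Fraction as F

def marginalize(joint):
    """joint maps (s, a, c) -> probability given a fixed history (hypothetical data).

    Returns T(s), pi_star(a | s) = f(a | s), and C_star(s, a) = E[c | s, a].
    """
    p_s = defaultdict(F)    # marginal of the state
    p_sa = defaultdict(F)   # marginal of (state, action)
    c_sa = defaultdict(F)   # unnormalized conditional mean of the cost
    for (s, a, c), p in joint.items():
        p_s[s] += p
        p_sa[(s, a)] += p
        c_sa[(s, a)] += p * c
    pi_star = {(s, a): p / p_s[s] for (s, a), p in p_sa.items()}
    C_star = {(s, a): c_sa[(s, a)] / p_sa[(s, a)] for (s, a) in p_sa}
    return dict(p_s), pi_star, C_star

def expected_cost_original(joint):
    """E[c] under the original joint distribution."""
    return sum(p * c for (_, _, c), p in joint.items())

def expected_cost_transformed(T, pi_star, C_star):
    """sum_{s,a} C_star(s, a) * pi_star(a | s) * T(s)."""
    return sum(C_star[(s, a)] * pi_star[(s, a)] * T[s] for (s, a) in pi_star)

# Hypothetical joint distribution over (state, action, cost).
joint = {
    ("s0", "a0", 1): F(1, 4),
    ("s0", "a0", 3): F(1, 4),
    ("s0", "a1", 2): F(1, 8),
    ("s1", "a0", 0): F(1, 8),
    ("s1", "a1", 4): F(1, 4),
}

T, pi_star, C_star = marginalize(joint)
before = expected_cost_original(joint)
after = expected_cost_transformed(T, pi_star, C_star)
print(before, after)
```

The same bookkeeping applies to the reward and continuation terms, since they depend on the history only through $(s_{t},a_{t})$.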

Next, we prove that the profile $\left<\xi,\pi^{*}\right>$ with $C^{*}$ is indeed a PI-PPME. We proceed by contradiction. Let $\hat{\xi}=\xi\circ\hat{\tau}_{i,t}\equiv\left(\xi_{1},\dots,\xi_{t-1},(\hat{\tau}_{i,t},\xi_{-i,t}),\xi_{t+1},\dots\right)$ denote a profile that is the same as $\xi$ except for agent $i$ ’s period- $t$ choice $\hat{\tau}_{i,t}$ . Define $\hat{\pi}=\pi^{*}\circ\hat{\pi}_{i,t}$ in the same way such that $\hat{\pi}\in\Pi\left[\hat{\xi};\mathtt{B}^{\Gamma\left[\hat{\xi}\right]}\right]$ . In addition, the cognition cost $C^{*}$ remains the same. Given any history $h_{t}\in\mathcal{H}$ , the profile $\left<\hat{\xi},\hat{\pi}\right>$ induces the H value function as

\begin{aligned}
J_{i,t}\left(h_{t}\middle|\hat{\xi},\hat{\pi}\right)&=\sum_{s_{t},a_{t},\theta_{i,t}}R_{i}(s_{t},a_{t})f(a_{t}|s_{t},\theta_{i,t})\hat{\tau}_{i,t}(\theta_{i,t}|s_{t},h_{t})T(s_{t}|h_{t})\\
&\quad+\sum_{s_{t},a_{t},\theta_{i,t}}\int_{\mathcal{C}}c_{i,t}f(c_{i,t}|s_{t},a_{t})f(a_{t}|s_{t},\theta_{i,t})\hat{\tau}_{i,t}(\theta_{i,t}|s_{t},h_{t})T(s_{t}|h_{t})\\
&\quad+\delta\sum_{s_{t},a_{t},\theta_{i,t}}J_{i,t+1}\left(s_{t},a_{t}\middle|\xi,\pi^{*}\right)f(a_{t}|s_{t},\theta_{i,t})\hat{\tau}_{i,t}(\theta_{i,t}|s_{t},h_{t})T(s_{t}|h_{t}).
\end{aligned}

Here, in general, $f(a_{t}|s_{t},\theta_{i,t})\hat{\tau}_{i,t}(\theta_{i,t}|s_{t},h_{t})T(s_{t}|h_{t})\neq f(s_{t},a_{t},\theta_{i,t}|h_{t})$ because $f(\cdot)$ corresponds to $P\left[\tau,\pi\right]$ . If $\left<\xi,\pi^{*}\right>$ with $C^{*}$ is not a PI-PPME, then there must exist a history $h_{t}$ and a profile $\left<\hat{\xi},\hat{\pi}\right>$ such that $J_{i,t}(h_{t}|\hat{\xi},\hat{\pi})>J_{i,t}(h_{t}|\xi,\pi^{*})=J_{i,t}(h_{t}|\tau,\pi)$ . This implies that $\left<\tau,\pi\right>$ can be strictly improved by the unilateral deviation $(\hat{\xi}_{i,t},\hat{\pi}_{i,t})$ , which contradicts the fact that $\left<\tau,\pi\right>$ is a PPME. $\square$
