Informational Design of Dynamic Multi-Agent System
First draft: April 2021
Abstract
This work considers a novel information design problem and studies how crafting payoff-relevant environmental signals alone can influence the behaviors of intelligent agents. The agents’ strategic interactions are captured by a Markov game, in which each agent first selects one external signal from multiple signal sources as additional payoff-relevant information and then takes an action. There is a rational information designer (principal) who possesses one signal source and aims to influence the equilibrium behaviors of the agents by designing the information structure of the signals she sends to the agents. We propose a direct information design approach that incentivizes each agent to select the signal sent by the principal, so that the design process avoids predicting the agents’ strategic selection behaviors. We then introduce the design protocol given a goal of the designer, which we refer to as obedient implementability (OIL), and characterize OIL in a class of obedient sequential Markov perfect equilibria (O-SMPE). A design regime is proposed based on an approach we refer to as fixed-point alignment, which incentivizes the agents to choose the signal sent by the principal, guarantees that the agents’ action-taking policy profile is the policy component of an O-SMPE, and ensures that the principal’s goal is achieved. We then formulate the principal’s optimal goal selection problem in terms of information design and characterize the optimization problem by minimizing the fixed-point misalignments. The proposed approach can be applied to elicit desired behaviors of multi-agent systems in competitive as well as cooperative settings and can be extended to heterogeneous stochastic games in complete- and incomplete-information environments.
I Introduction
Building rational multi-agent systems is an important research desideratum in Artificial Intelligence. In goal-directed decision making systems, an agent’s action is controlled by its consequence [1]. In a game, the consequence of an agent’s action is the outcome of the game, given as the reward of taking that action together with the actions of his opponents, which defines the optimality criterion of each agent’s decision making in the game. A rational agent’s reward may also depend on payoff-relevant information, in addition to the actions. This information may include the situation of the agents in a game, referred to as the state of the world, as well as his knowledge about his opponents’ diverging interests and their preferences over the outcomes of the game. Incorporating such payoff-relevant information in his decisions constitutes an essential part of an agent’s rationality in the strategic interactions with his opponents. Hence, one may re-direct the goal achievement of rational agents in a game by information provision. In economics, this is referred to as information design, which studies how an information designer (she) can influence agents’ optimal behaviors in a game to achieve her own objective through the design of information provided to the game [2].
Often referred to as inverse game theory, mechanism design is a well-developed mathematical theory in economics that provides general principles of how to design the rules of games (e.g., rewarding systems with specifications of actions and outcomes) to influence the agents’ strategic interactions and achieve system-wide goals while treating the information as given. Information design, on the other hand, considers circumstances in which the information in the environment is under the control of the system designer and offers a new approach that indirectly elicits agents’ behaviors while keeping the game rules fixed [3].
This work considers an infinite-horizon Markov game with a finite number of agents. Each agent’s identity is characterized by his type. At each period of time, agents observe a payoff-relevant global state (state). In addition to the state, each agent observes a batch of signals (signal batch, batch) at each period and then strategically chooses one signal from the batch as the additional information to support his decision of taking an action. Each agent’s one-period reward (parameterized by the type) is determined by his own action, the actions of his opponents, the global state, and his choice of signal. We refer to this game as a base Markov game (BMG). The transition of the state and the distribution of signals are referred to as the information structure of the BMG. In a BMG, each agent’s behavior includes selecting a signal according to a selection rule and taking an action according to a policy. Here, each agent’s selection of a signal and his choice of an action are coupled, since the selected signal enters the policy that determines the choice of the action. If a mechanism designer aims to incentivize the agents to behave in her desired way, she directly modifies the BMG (reversing the game) by designing the game model, including changing the reward function associated with actions and outcomes, while treating the information structure as a given part of the environment. An information designer, however, treats the BMG model as fixed and modifies the information structure to elicit agents’ equilibrium behaviors that coincide with her objective.
We study a novel dynamic information design problem in the BMG in which there are multiple sources of signals (signal sources, sources) and each of them sends one signal to each agent. The signals sent by all sources constitute the signal batch observed by each agent at each time. Among these sources, there is one rational information designer (referred to as principal, she) who controls one signal source and intends to strategically craft the information structure of her signal by choosing a signaling rule, so as to indirectly control the equilibrium of the BMG. We consider that the other sources of signals provide additional information to the agents in a non-strategic take-it-or-leave-it manner. The goal of the principal is to induce the agents to take actions according to an equilibrium policy that is desired by the principal. However, the principal has no ability to directly program the agents’ behaviors to force them to take certain actions. Instead, her information design should provide incentives for rational agents to behave in her favor. We study the extent to which the provision of signals alone, by controlling a single signal source, can influence the agents’ behavior in a BMG when the agents have the freedom to choose any available signal in the batch. We refer to the BMG with a rational principal in this setting as an augmented Markov game (AMG).
Since the principal’s design problem keeps the base game unchanged, our model fits scenarios in which the agents are intrinsically motivated and their internal reward systems translate information from the external environment into internal reward signals [4]. Intrinsically-motivated rational agents can be human decision makers with intrinsic psychological preferences or intelligent agents programmed with internal reward systems. The setting of multiple sources of additional information captures circumstances in which the environment is perturbed by noisy information, and the agents may improperly use redundant or useless information in decisions that deviate from the system designer’s desire. Also, the principal can be an adversary who aims to manipulate the strategic interactions in a multi-agent system through the provision of disinformation, without intruding into each agent’s local system to make any physical or digital modifications.
Although the principal’s objective of the information design in an AMG is to elicit an equilibrium distribution of actions, her design problem has to take into consideration how the agents select the signals from their signal batches, because each agent’s choice of action is coupled with his selection of signal. In an information design problem, the principal chooses an information structure such that each agent selects a signal using a selection rule and then takes an action according to a policy which matches the principal’s goal. The latter requirement is constrained by the notion of admissibility. In general, the signals sent by the principal may not be selected by some agents, so the actions taken by those agents are independent of the principal’s (realized) signals. However, even though her signal does not enter an agent’s policy to realize an action, the principal may still influence the agent’s action because the information structure of the signal batch is influenced by her choice of signaling rule. The information structure of the signal batch affects the agents’ selections of signals. Hence, the agents’ behaviors indirectly depend on the principal’s signaling rule. We refer to such information design as indirect information design (IID), which requires the principal to accurately predict each agent’s strategic selection rule and the policy profiles that might be induced by the signaling rule. We restrict attention to another class of information design, referred to as direct information design (DID). In DID problems, each agent always selects the signal sent by the principal and then takes an action. Thus, the realizations of the principal’s signals directly enter the agents’ policies to choose actions. In addition to admissibility, another restriction of the principal’s DID problem is captured by the notion of obedience, which requires that each agent is incentivized to select the signal from the principal rather than choose one from other signal sources. The key simplification provided by the DID is that the principal’s prediction of the agents’ strategic selection rules is replaced by a straightforward obedient selection rule that always prefers the principal’s signals.
This paper makes three major contributions to the foundations of information design. First, we define a dynamic direct information design problem in an environment where the agents have the freedom to choose any available signal as additional payoff-relevant information. Captured by the notion of obedient implementability, the principal’s information design problem is constrained by the obedience condition, which incentivizes the agents to prefer the signals sent by the principal, and the admissibility condition, which ensures that the agents take actions meeting the principal’s goal. Our information design problem is distinguished from others in economics that study the commitment of the information designer in a game when there is only a single source of additional information in static settings (e.g., [5, 3, 6, 7]) as well as in dynamic environments (e.g., [8, 9, 10, 11, 12, 13]) and from settings in which the agents do not make a choice among multiple designers (e.g., [14]).
Second, we propose a new solution concept termed obedient sequential Markov perfect equilibrium (O-SMPE), which allows us to handle the undesirable deviations of agents in a principled manner. By bridging the augmented Markov game model with dynamic programming and uncovering the close relationship between the sequential-perfect rationality of the O-SMPE and a pair of fixed points, we characterize obedient implementability and explicitly formulate the information design regime given a goal of the principal. The proposed framework is based on an approach referred to as fixed-point alignment, which selects a signaling rule that matches the fixed point arising from the agents’ optimal signal selection with the fixed point arising from their optimal action taking. However, the principal cannot achieve just any goal she wishes. We identify the key conditions, known as Markov perfect goal and strong admissibility, that discipline the principal’s freedom to influence the agents’ equilibrium behaviors.
Third, we formulate the principal’s goal selection problem and transform it into a direct information design problem without a predetermined goal. The principal’s problem is thus to select a signaling rule that induces agents’ equilibrium policy profiles maximizing (optimal information design) or minimizing (robust information design) her expected payoff. Our formulation does not assume that all possible equilibrium policy profiles induced by each available signaling rule are known in advance, so the principal cannot simply pick an equilibrium policy profile for her choice of signaling rule. Instead, our framework takes into consideration the role of the signaling rule in ensuring that the agents converge to an equilibrium. A new approach is proposed based on fixed-point misalignment minimization, which captures a new optimality criterion for information design without a given goal.
I-A Related Work
We follow a growing line of research on creating incentives for interacting agents to behave in a desired way. The most straightforward way is based on mechanism design approaches that properly provide reward incentives (e.g., contingent payments, penalties, supply of resources) by directly modifying the game itself to change the induced preferences of the agents over actions. Mechanism design approaches have been fruitfully studied in both static [15] and dynamic environments [16, zhang2021incentive, 17]. For example, auctions [18, 19] specify the way in which the agents can place their bids and clarify how the agents pay for the items; in matching markets [20, 21], matching rules match agents on one side of a market to agents on the other side, which directly affects the payoff of each matched individual. In the reinforcement learning literature, reward engineering [22, 23, 24] is similar to mechanism design in that it directly crafts the agents’ reward functions to encode specifications of the learning goal.
Our work lies in another direction: information design. Information design studies how to influence the outcomes of decision making by choosing a signal (also referred to as a signal structure, information structure, Blackwell experiment, or data-generating process) whose realizations are observed by the agents [25]. In a seminal paper [7], Kamenica and Gentzkow introduced Bayesian persuasion, in which there is an informed sender and an uninformed receiver. The sender is able to commit to choosing any probability distribution (i.e., the information structure) of the signals as a function of the state of the world, which is payoff-relevant to and unobserved by the receiver. Bayesian persuasion can be interpreted as a communication device that is used by the sender to inform the receiver through signals that contain knowledge about the state of the world. Hence, the sender controls what the receiver gets to know about the payoff-relevant state. With knowledge of the information structure, the receiver forms a posterior belief about the unobserved state based on the received signal. Hence, the information design of Bayesian persuasion is also referred to as an exercise in belief manipulation. Other works along the lines of Bayesian persuasion include [26, 27, 28, 29]. In [5], Mathevet et al. extend the single-agent Bayesian persuasion of [7] to a multi-agent game and formulate information design as influencing agents’ behaviors through inducing distributions over agents’ beliefs. In [6], Bergemann and Morris have also considered information design in games. They have formulated the Myersonian approach for information design in an incomplete-information environment. The essence of the Myersonian information design is the notion of Bayes correlated equilibrium, which characterizes all possible Bayesian Nash equilibrium outcomes that could be induced by all available information structures. The Myersonian approach avoids the modeling of belief hierarchies [30] and constructs the information design problem as a linear program. Information design has been applied in a variety of areas to study and improve real-world decision making protocols, including stress tests in finance [31, 32], law enforcement and security [33, 34], censorship [35], routing systems [36], and finance and insurance [37, 38, 39]. Kamenica [25] provides a recent survey of the literature on Bayesian persuasion and information design.
This work fundamentally differs from existing works on information design. First, we consider a different environment. Specifically, we consider the setting in which there are multiple sources of signals and each agent chooses one realized signal as additional (payoff-relevant) information at each period. Among these sources of signals, there is an information designer who controls one of the sources and aims to induce equilibrium outcomes of the incomplete-information Markov game by strategically crafting information structures. Second, rather than only taking actions, each agent in our model makes a coupled decision of selecting a realized signal and taking an action. Hence, the characterization of the solution concepts in our work differs from the equilibrium analysis in other works. Third, we also provide an approach with an explicit formulation to relax the optimal information design problem.
In this section, we first describe some fundamental concepts of the canonical Markov game model and then define our new game model, called the augmented Markov game, by extending the canonical model.
Conventions. For any measurable set , is the set of probability measures over . Any function defined on a measurable set is assumed to be measurable. For any distribution , for two measurable sets and , is the marginal distribution of over and is the support of . The history of from period to period is denoted by with when . We use to represent the probability of an event . For compactness of notation, we only show the elements, but not the sets, over which sums are taken under the summation operator. The notations are summarized in the Appendix.
I-B Normal-Form Game and Canonical Markov Game
This work considers games with a finite number of agents. A normal-form (or strategic-form) game is a basic representation of a static game:
Definition 0.1 (Normal-Form Game [40])
A normal-form game is defined by a tuple . Here, is a finite set of actions available to each agent. is the reward function of agent .
Each agent simultaneously chooses an action and receives a reward when other agents choose actions .

With reference to Fig. 1, a canonical Markov game generalizes a normal-form game to dynamic settings, as well as Markov decision processes to multi-agent interactions. We consider a finite-agent infinite-horizon Markov game. The game is played in discrete time indexed by . Each agent ’s identity is captured by a parameter known as his type, denoted by . A canonical Markov game can be defined by a tuple . Here, is a finite set of states. is a finite set of actions available to each agent at each period. is an initial distribution of the state. is the transition function of the state, such that specifies the probability distribution of the next-period state when the current state is and the current-period joint action is . Each agent ’s goal is characterized by his reward function, parameterized by his type, which realizes a one-stage reward for agent when the state is and the joint action is . A canonical Markov game is a complete-information game.
A solution to is a sequence of policy profiles in which , such that specifies the probability of the agents’ joint action given the current-period state . A policy profile is stationary if the agents’ decisions of actions depend only on the current-period payoff-relevant information and are independent of the calendar time; i.e., we denote the policy profile as . A policy profile is a pure strategy if it is a deterministic mapping. In a canonical Markov game, can be either independent (i.e., ) or correlated (i.e., a joint function).
According to Ionescu Tulcea theorem (see, e.g., [41]), the initial distribution on , the transition function , and the policy profile together define a unique probability measure on . We denote the expectation with respect to as . The optimality criterion for each agent’s decision making at each period is to maximize his expected payoff. Each agent discounts his future payoffs by a discount factor . Thus, agent ’s period- infinite-horizon interim expected payoff is defined as: for , , , , ,
(1)
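For concreteness, in generic notation that is ours rather than the paper’s fixed symbols (state \(s_t\), joint action \(a_t\), agent \(i\)’s type \(\theta_i\), one-stage reward \(r_i\), policy profile \(\pi\), discount factor \(\gamma\)), such a discounted interim expected payoff takes the standard form
\[
J_i^{\pi}(s;\theta) \;=\; \mathbb{E}^{\pi}\Big[ \sum_{t=0}^{\infty} \gamma^{t}\, r_i\big(s_t, a_t;\theta_i\big) \,\Big|\, s_0 = s \Big].
\]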
Definition 0.2 (MPE)
A policy profile constitutes a stationary Markov perfect equilibrium (MPE) if, is independent, and for with , , , , ,
(2)
Let denote any infinite sequence of action-state pairs. Define , for any , , . Let denote the first components of , for all .
Lemma 1 ([maskin2001markov])
For any , the game is continuous at infinity; i.e.,
Lemma 1 shows that, for a fixed discount rate , the canonical Markov game is continuous at infinity. Continuity at infinity is essential for the existence of an MPE in infinite-horizon stochastic games. The following proposition shows the existence of an MPE, due to [maskin2001markov].
Proposition 1.1 ([maskin2001markov])
Suppose that . Then, the game admits an MPE.
I-C Augmented Markov Game Model
With reference to Fig. 2, we extend the canonical Markov game model in this section by introducing additional payoff-relevant information for the agents beyond the state.
Signals. We refer to the additional information as a signal. Let be a finite set of signals. We consider that, at each period, each agent observes a finite batch of signals (signal batch, batch), for , denoted by , for all , , in which denotes a typical signal indexed by sent to agent at . Upon observing the batch, each agent selects one signal from it, which becomes payoff-relevant to him in addition to the state .

Dynamics of Information. Let denote one signal contained in the batch . Given , the signal batch can be represented as , where , for all , . We assume that the probability of at any is specified by . Let denote the probability of . Here, both and , for all , may depend on the current state , the joint type , the calendar time, or the past realizations of states or actions. Although there is additional information in the game, the transition of the state is controlled by the current state and the realized actions and is independent of the selected signals , for all , . As in the canonical game , the transition of the state is Markov: given any , the transition function specifies the probability of a next state , for all .
Augmented Markov Game. We refer to the Markov game with signals as the augmented Markov game, denoted by : , where is the reward function (-parameterized) of agent , which takes into consideration agent ’s selected signal. A key feature of is that each agent makes two sequentially-coupled decisions. Specifically, agent first selects a signal from the batch , given the state ; then, he takes an action , given and his selected signal . Hence, at a given period , if the state is , agent observes a batch of signals , the joint action of all other agents is , and if agent first selects a signal from and then takes an action , the single-period reward is . We assume that the game is of complete information.
I-C1 Strategies
Each agent is rational in the sense that he is self-interested and makes his decisions according to his observations to maximize his expected payoff. We consider that the solution to the game is a stationary Markov strategy profile where is a selection rule profile and is a policy profile. Similar to in , each strategy of the profiles and can be either correlated (i.e., a joint function) or independent (i.e., , for all , and , where ).
Given any observation , each agent ’s selection of the signal and his choice of the action are fundamentally different. Specifically, the payoff-relevant information for signal selection is , i.e., , while the payoff-relevant information for the action taking is , i.e., in which . However, we will write to highlight the influence of the signal batch (and thus its distribution) on each agent ’s decision of selecting a signal. Agent first uses to select signal and then chooses an action according to based on the realized selection .
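To make this coupling concrete, the following sketch (in Python; all function and variable names are illustrative and not part of the model’s formal notation) simulates one agent’s two-stage decision: the selection rule acts on the state and the signal batch, and the policy then acts on the state and the realized selected signal.
\begin{verbatim}
import random

def coupled_decision(state, signal_batch, selection_rule, policy, rng=random):
    # Stage 1: select one signal from the batch using the selection rule.
    probs = selection_rule(state, signal_batch)      # one probability per signal
    idx = rng.choices(range(len(signal_batch)), weights=probs, k=1)[0]
    selected_signal = signal_batch[idx]

    # Stage 2: take an action using the policy, conditioned on the state
    # and the realized selected signal.
    action_probs = policy(state, selected_signal)    # dict: action -> probability
    actions, weights = zip(*action_probs.items())
    action = rng.choices(actions, weights=weights, k=1)[0]
    return selected_signal, action
\end{verbatim}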
II Information Design Problem
In this work, we are interested in the setting in which there is one rational information designer, referred to as the principal (she, indexed by ), who controls one of the signal sources. At time , the signal sent to agent by the principal is denoted by . We assume that is fixed and purely exogenous; i.e., is independent of the state, types, actions, the calendar time, or the histories of the game. This captures the case in which other sources of signals provide additional information to the agents in a non-strategic take-it-or-leave-it manner.
We consider that the principal is rational in that she possesses a goal specified by the agents’ equilibrium actions and strategically designs the information structure of her signal to achieve her goal. Specifically, the principal aims to induce the agents to take actions that coincide with her goal in equilibrium by strategically choosing given . Since governs the generation of her signal , the principal’s choice of partially influences and the realizations of the signal batch . This process is information design:
Definition 1.1 (Information Design)
An information design problem is defined as a tuple . Here, is an augmented Markov game model. is the agents’ policy profile. is the information structure, where defines a distribution of the signal privately observed by agent at . is the principal’s goal, i.e., her target equilibrium probability distribution of the agents’ joint action conditioned only on the state and the agents’ types.
A solution to is a stationary signaling rule profile (signaling rule) that defines the distribution of the joint signal . The signaling rule is correlated if it is a joint function; i.e., , where . The rule is independent if the signals the principal sends to the agents are independent of each other; i.e., . Since the agents use Markov strategies, agent ’s period- action depends on histories and (via the selection rule ) only through the current-period and . Hence, we restrict attention to a Markovian signaling rule that specifies the distribution of the period- signal depending on the current state and the joint type . We will denote the game with the principal using as .
The information design problem is a planning problem. Hence, the design of is independent of any realizations of states. Additionally, the principal does not know in advance all the possible equilibria that could be induced by any of her available signaling rules. Therefore, the principal’s information design has to take into account how the signaling rule can induce the agents’ behaviors that constitute an equilibrium. If the information design is viewed as an extensive-form game between the principal and the agents, the timing is as follows (a schematic one-period simulation follows the list):
(i) The principal chooses a signaling rule profile for the agents.
(ii) At the beginning of each period , a state is realized and observed by the principal and all agents.
(iii) The principal sends a joint signal according to . Each agent receives and observes his signal batch; the remaining signals in the batch are generated according to their exogenous distributions.
(iv) The agents use their selection rules to select signals from their batches.
(v) Then, the agents use their policies to choose their actions based on the state and the selected signals.
(vi) Immediate rewards are realized and the state transitions to the next state according to the transition function.
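The following sketch (in Python; all names are ours and purely illustrative, not the model’s notation) simulates steps (iii)–(vi) of one period under a fixed signaling rule.
\begin{verbatim}
def play_one_period(state, types, signaling_rule, exo_signal_dist,
                    selection_rules, policies, reward_fns, transition, rng):
    # (iii) The principal draws a joint signal from her signaling rule;
    # the other sources draw the remaining signals exogenously.
    principal_signals = signaling_rule(state, types, rng)    # dict: agent -> signal
    batches = {i: [principal_signals[i]] + exo_signal_dist(i, rng)  # agent i's batch
               for i in types}

    # (iv) Each agent selects one signal from his batch.
    selected = {i: selection_rules[i](state, batches[i], rng) for i in types}

    # (v) Each agent takes an action based on the state and his selected signal.
    actions = {i: policies[i](state, selected[i], rng) for i in types}

    # (vi) Immediate rewards are realized and the state transitions.
    rewards = {i: reward_fns[i](state, actions, selected[i], types[i]) for i in types}
    next_state = transition(state, actions, rng)
    return rewards, next_state
\end{verbatim}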
II-A Equilibrium Concepts
In this section, we define a stationary equilibrium concept of the game . With a slight abuse of notation, we suppress the notations of , , and and only show , (of ), and (of ), for all , , unless otherwise stated. Since we focus on a stationary environment, we suppress the time indexes in the notation, unless otherwise stated.
As in the canonical game , the Ionescu-Tulcea theorem (see, e.g., [41]) implies that the initial distribution of the state, the transition function , the distribution , the signaling rule , and the strategy profile together define a unique probability measure on . The expectation with respect to is denoted by or . With a slight abuse of notation, let denote the transition probability from state to state , given that the signal batch is and the agents select : for any with ,
Let . Given , , , define the state-signal value function of agent , representing agent ’s expected reward, originating at some with , agents select ,
(3)
where .
Define the state value function of agent that describes his expected reward, originating at any state :
(4)
Define the state-signal-action value function that represents agent ’s expected reward if are played in :
(5)
where .
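In illustrative notation (ours, not the paper’s exact definitions), with joint signal \(\omega\), joint action \(a\), policy profile \(\pi\), signal distribution \(\Pr(\omega\mid s)\), and transition kernel \(P\), the three value functions are related in the familiar Bellman fashion:
\[
Q_i(s,\omega,a) = r_i(s,a,\omega_i;\theta_i) + \gamma \sum_{s'} P(s'\mid s,a)\, V_i(s'), \qquad
U_i(s,\omega) = \sum_{a} \pi(a\mid s,\omega)\, Q_i(s,\omega,a), \qquad
V_i(s) = \sum_{\omega} \Pr(\omega\mid s)\, U_i(s,\omega).
\]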
We define an equilibrium concept known as sequential Markov perfect equilibrium (SMPE) as follows.
Definition 1.2 (SMPE)
Fix any signaling rule . A strategy profile constitutes a (stationary) sequential Markov perfect equilibrium (SMPE) if (i) the policy profile is independent (i.e., ) and (ii) the agents are sequential-perfectly rational (sequential-perfect rationality), i.e., for any , , ,
(6)
and, for any with , , ,
(7)
An SMPE extends the stationary Markov perfect equilibrium (see, e.g., [42]) to our augmented Markov game. The sequential-perfect rationality describes the coupled sequential best responses of each agent’s selection and action given a state and the available signal batch. In words, a strategy profile is sequential-perfectly rational if (i) given that agents choose actions according to the equilibrium policy profile , there is no state such that once it is reached, the agents strictly prefer to deviate from ; and (ii) there is no information set where is selected by such that once it is reached, the agents strictly prefer to deviate from .
The concept of SMPE in Definition 1.2, however, permits arbitrarily complex and possibly nonstationary deviations from the (stationary) equilibrium profile . The following lemma states that it entails no loss of generality to consider only one-shot deviations from .
Lemma 2
Let and , respectively, be such that and , for all , while and are any two arbitrary strategies. A strategy profile constitutes a sequential-perfectly rational equilibrium profile of an SMPE if and only if for any , ,
(8)
and, for any with , , , ,
(9)
A one-shot deviation is a behavior in which an agent deviates from the equilibrium profile by selecting a signal using any and taking an action using any at the initial period of any subgame of , and then reverts back to his equilibrium profile for the rest of the game. The one-shot deviation property in Lemma 2 allows the principal to restrict attention to the equilibrium characterization by considering the robustness of the information design to one-shot deviations without loss of generality.
III SMPE Implementability
In this section, we formally formulate the principal’s information design problem. The criterion of successful information design is captured by the notion of implementability, which is characterized using the equilibrium concept of SMPE.
III-A Implementability
As in a canonical Markov game, each agent ’s decision of choosing an action takes into account other agents’ decisions of choosing because his immediate reward of taking directly depends on . In an augmented Markov game , agent ’s choices of and are coupled because specified by has a direct causal effect on through . Thus, each agent’s immediate reward indirectly depends on other agents’ selected signals through their actions. Hence, the agents’ selection of signals is also a part of the strategic interactions in . Since is fixed, the principal’s choice of controls the dynamics of given the strategy profiles. Therefore, it is possible for the principal to influence the equilibrium behaviors of agents in through proper designs of .
The principal’s information design takes an objective-first approach to design the information structure (given ) of the signals sent to agents, toward desired objectives , in a strategic setting through the design of , where self-interested agents act rationally by choosing and . Although any realization of the signal depends on the current state, the choice of is independent of the realizations of the states. The key restriction on the principal’s is that the agents are elicited to perform equilibrium actions that coincide with the principal’s goal, which is referred to as admissibility.
Definition 2.1 (Admissibility)
Fix any and . Let be any SMPE of the game . The policy profile is admissible if, for all ,
(10)
The admissibility imposes a constraint on the signaling rule and the induced policy profile such that the goal is achieved in the sense of (10). We define a strong version of admissibility as follows.
Definition 2.2 (Strong Admissibility)
Fix any and . Let be any SMPE of the game . The policy profile is strongly admissible if, for all , , ,
(11)
The strong version constrains and additionally by imposing an individual-level constraint on the signaling rule. It is straightforward to verify that strong admissibility implies admissibility, but not vice versa.
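In illustrative notation (ours), with signaling rule \(\sigma\), obedient selection, and equilibrium policy profile \(\pi\), admissibility reads roughly as a marginal-matching condition,
\[
\sum_{\omega}\sigma(\omega\mid s,\theta)\,\pi(a\mid s,\omega) \;=\; g(a\mid s,\theta) \qquad \text{for all } s,\ \theta,\ a,
\]
with strong admissibility imposing the analogous equality agent by agent; this is a sketch of the structure of (10)-(11), not their exact statement.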
The success criterion of information design for the game in equilibrium is captured by the notion of implementability.
Definition 2.3 (SMPE Implementability)
Given any , we say that the signaling rule is (strongly) implementable in SMPE (SMPE implementability) if the game has an SMPE in which the policy profile is (strongly) admissible.
The SMPE implementability requires that (i) the signaling rule designed by the principal induces an SMPE of and (ii) the principal’s goal is achieved (i.e., the equilibrium policy profile is admissible or strongly admissible).
Given any , the distribution of the actions conditioned on any state is jointly determined by the agents’ and the principal’s . Hence, given , the signal sent by the principal using ultimately influences each agent’s expected reward. However, this influence is transmitted indirectly through the agents’ selection rules when the selected signal is not equal to . We refer to the design of that induces such selection rules in an SMPE of as an indirect information design (IID). We call the game with such a signaling rule an indirect augmented Markov game, denoted by .
III-B Direct Information Design
As a designer, the principal takes into consideration how each agent strategically behaves according to the game rules and reacts to his opponents’ behaviors as well as the responses from the environment. In any , the principal’s design of must predict the agents’ equilibrium selection rule profile and the corresponding equilibrium policy profile that might be induced by . In contrast to , the principal may elect to use a direct information design, in which she makes her signals payoff-relevant to each agent’s decision of choosing an action by incentivizing each agent to select her signal in each state. We refer to the game with such a signaling rule as a direct augmented Markov game, denoted by . The implementability of the direct information design requires a restriction known as obedience in addition to admissibility.
Definition 2.4 (Obedience)
In any , agent ’s selection rule is dominant-strategy obedient (DS-obedient, DS-obedience) if, for any with , ,
(12)
Agent ’s selection rule is Bayesian obedient (Bayesian obedience) if, for any with , ,
(13)
when all other agents are obedient, i.e., .
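In the same illustrative notation, with \(\omega^{0}_i\) denoting the principal’s signal to agent \(i\) and \(U_i\) the state-signal value sketched above, Bayesian obedience asks roughly that, when all other agents are obedient,
\[
\mathbb{E}\big[\, U_i\big(s,(\omega^{0}_i,\omega^{0}_{-i})\big) \,\big|\, s \,\big] \;\ge\;
\mathbb{E}\big[\, U_i\big(s,(\omega^{k}_i,\omega^{0}_{-i})\big) \,\big|\, s \,\big]
\quad \text{for every alternative signal } \omega^{k}_i \text{ in the batch},
\]
while DS-obedience requires the corresponding inequality to hold for every realization of the batch and any selection behavior of the opponents; these displays sketch the structure of (12)-(13), not their exact form.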
We will refer to an SMPE with an obedient selection rule profile as a (dominant-strategy or Bayesian) obedient SMPE (O-SMPE).
Definition 2.5 (OIL)
Given any goal , the signaling rule is (DS, Bayesian) obedient-implementable in SMPE (OIL) if it induces a (DS, Bayesian) O-SMPE in which is (DS, Bayesian) obedient and is admissible.
In a , the principal wants her signals to directly enter the immediate rewards of the agents at each period . The OIL guarantees that (i) agents prefer to select the signal sent by the principal rather than choose any other signal from , and (ii) agents take the actions specified by the admissible policy rather than other available actions. Hereafter, we denote the obedient selection rule by and , for all .
A successful information design depends on the principal having accurate beliefs regarding the agents’ decision processes. This includes all the possible indirect selection behaviors of the agents, i.e., all possible . The point of direct information design is that it allows the principal to avoid analyzing all of the agents’ indirect selection behaviors and to focus on the obedient .
III-C Characterizing OIL
In this section, we characterize the OIL and formulate the principal’s information design problem given a goal .
The following proposition is an analog of Bellman’s Theorem [43].
Proposition 2.1
From Proposition 2.1, we can re-define , , and given in (3)-(5), respectively, as the unique value functions that satisfy the Bellman recursions (14), (15), and (16).
Lemma 3
Fix . Let denote the obedient selection rule profile. The strategy profile is a (DS, Bayesian) O-SMPE if and only if, for any , with , ,
(17)
(i) and is DS-obedient, i.e., if, for any arbitrary selection rule profiles , any , ,
(18)
(ii) or, is Bayesian-obedient, i.e., if, for any , ,
(19)
In Lemma 3, (17)-(18) (resp. (17)-(19)) constitute a recursive representation of a DS-O-SMPE (resp. Bayesian O-SMPE). Jointly, (17)-(18) require that (i) the expected payoff of each agent is maximized by his equilibrium policy in every state and for every signal selected according to any possible equilibrium selection rule , (ii) the expected payoff of each agent is maximized by the obedient selection rule in every state when the action is taken by any possible corresponding equilibrium policy , and (iii) both hold given that his opponents use any arbitrary selection rule but follow the O-SMPE policy profile. A similar interpretation applies to the Bayesian O-SMPE, given that each agent ’s opponents follow the Bayesian O-SMPE strategy profile.
III-C1 Design Regime: Fixed-Point Alignment
In this section, we restrict attention to the Bayesian obedience and formulate an information design regime for a given goal. The following theorem states the existence of Bayesian O-SMPE.
Theorem 4
Every augmented Markov game admits a stationary Bayesian O-SMPE for any regular .
We provide an approach, which we refer to as fixed-point alignment, to formulate the design of a Bayesian OIL signaling rule as a planning problem. First, we define a class of principal’s goals which we refer to as Markov perfect goals. Let, for all ,
Definition 4.1 (Markov Perfect Goal)
We say that the principal’s goal is a Markov perfect goal if it is an MPE of a canonical Markov game with as the reward function of each agent ; i.e., for all with , , , , ,
where is defined in (1). Let denote a set of Markov perfect goals given the reward functions .
For any state value function , any , , , we denote as the state-signal-action value function constructed in terms of according to (16). Similarly, we denote as the state value function constructed in terms of according to (15). Given any policy profile of other agents and any state value function , we let, for all , , , , ,
(20)
Similarly, for any state-signal value function , we denote as the value function constructed in terms of according to the Bellman recursions (14)-(16), i.e., for all , , with , ,
The Bayesian OIL restricts the design of the signaling rule by three constraints. First, agents are incentivized to be Bayesian obedient in their signal selections. Second, the Bayesian-obedient agents’ policy profile constitutes the policy component of a Bayesian O-SMPE. Third, the equilibrium policy profile is admissible. The first two constraints require to elicit the agents into a Bayesian O-SMPE.
Proposition 4.1 provides a general formulation of an optimization problem to find the policy profile of a Bayesian O-SMPE of a game .
Proposition 4.1
Fix a signaling rule . Let denote the Bayesian obedient selection rule profile. Let denote the corresponding state-signal value function of a policy profile . The strategy profile constitutes a Bayesian O-SMPE if and only if is a solution of the following optimization problem with :
()
subject to, for all , , with , , ,
()
()
()
Proposition 4.1 extends the fundamental formulation of finding a Nash equilibrium of a stochastic game as a nonlinear program (Theorem 3.8.2 of [44]; see also [45, 46]). Here, the condition () ensures that each is a valid policy and rules out the possible trivial solution for all . The constraints () and () are two necessary conditions for a Bayesian O-SMPE of the game derived from (17) and (19) of Lemma 3. Any feasible solution making constitutes a Bayesian O-SMPE (in which the admissibility is not constrained). Here, the Bayesian obedient selection rule profile is not a solution of the optimization problem ()-(); instead, the optimality of Bayesian obedience constrains the optimal solution through ().
If we suppress the constraint (), then the reduced optimization problem ()-() can be interpreted as a process of finding pairs of decision variables that fit a Bellman optimality operator (i.e., satisfying (17)). In other words, the goal of this reduced optimization problem is to find fixed points. However, there is another fixed point, from the Bellman optimality operator established by the condition (19) and the Bellman recursions (14)-(16); i.e., suppose the agents are Bayesian obedient; then, for all , ,
(21)
However, the Bellman optimality equation (21) is independent of but is constructed based on the relationship between and given in (15).
We propose a design regime for finding a signaling rule that aligns the two fixed points and for each agent while each agent’s policy is strongly admissible. Let, for any , with , , ,
with . The objective function is
(22)
which needs to be minimized over all possible , , and . Given and , we define a set of valid policy profiles that are admissible. For example, under strong admissibility, the set is given as
(23)
The Bayesian obedience is constrained by, for all , , with , ,
()
Unlike , which is conditioned on and , the feasibility of given a is captured by the constraint (). We additionally constrain the feasibility of in terms of as follows: for all , with , ,
()
Formally, the optimization problem of the information design based on fixed-point alignment is
()
s.t.,
In (), the constraint is sufficient and necessary for the feasible to be the policy component of a Bayesian O-SMPE.
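As a rough numerical illustration of the kind of quantity the fixed-point alignment drives to zero, the following sketch (in Python, with our own simplified encoding of a single agent’s induced decision problem; it is not the multi-agent objective () itself) computes the residuals of the two Bellman backups, one at the action-taking stage and one at the obedient signal-selection stage, with respect to candidate value functions.
\begin{verbatim}
# states, actions: finite lists; sigma[s] maps each signal w to its probability
# in state s; reward(s, a, w) is the one-stage reward; trans(s, a) maps next
# states to probabilities; U[(s, w)] and V[s] are candidate value functions.

GAMMA = 0.9

def q_value(s, w, a, reward, trans, V):
    # State-signal-action value: reward plus discounted continuation value.
    return reward(s, a, w) + GAMMA * sum(p * V[s2] for s2, p in trans(s, a).items())

def misalignment(states, actions, sigma, reward, trans, U, V):
    total = 0.0
    for s in states:
        for w, p_w in sigma[s].items():
            # Action-taking fixed point: U(s, w) should equal the best action backup.
            action_backup = max(q_value(s, w, a, reward, trans, V) for a in actions)
            total += p_w * abs(action_backup - U[(s, w)])
        # Obedient signal-selection fixed point: V(s) should equal the expected
        # value of selecting the principal's signal and continuing with U.
        selection_backup = sum(p_w * U[(s, w)] for w, p_w in sigma[s].items())
        total += abs(selection_backup - V[s])
    return total
\end{verbatim}
A signaling rule for which this residual can be driven to zero, while the induced policy remains (strongly) admissible, is the kind of object the alignment regime searches for.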
Theorem 5
Theorem 5 provides a design regime for the signaling rule that is OIL in Bayesian O-SMPE. Condition (i) specifies the optimality of the solution to (), while conditions (ii) and (iii) discipline the principal’s freedom in manipulating the agents’ behaviors. Specifically, condition (ii) implies that the principal cannot plan an arbitrary goal that specifies an arbitrary distribution of the agents’ actions conditioned on the state and the joint type.
Theorem 5 shows two restrictions on the principal’s freedom to set her goal and determine how the goal is achieved such that the agents’ behaviors in a Bayesian O-SMPE can be influenced. Specifically, the goal should be a Markov perfect goal and the induced equilibrium policy profile should be strongly admissible. The following corollary uncovers another restriction on the principal’s ability to influence the agents’ behaviors in a Bayesian O-SMPE.
Corollary 5.1
Fix a base augmented Markov game with . In general, there exists that can be achieved in an indirect game but not in any direct game .
Corollary 5.1 states that restricting attention to direct information design entails a loss of generality in selecting Markov perfect goals.
IV Principal’s Optimal Information Design
So far, we have focused on the case in which the principal’s goal is given. In this section, we introduce the optimality criterion of the principal’s goal selection and define the optimal information design problem without a predetermined goal. The one-stage payoff function of the principal is , such that gives the immediate payoff for the principal when the state is and the agents of the game take the joint action . Recall that the principal’s goal is the probability distribution of the agents’ joint action in equilibrium, conditioned only on the global state given the agents’ types. Hence, the information structure that matters for the principal’s goal selection problem is . According to the Ionescu-Tulcea theorem, the information structure and any goal uniquely define a probability measure on . We denote the corresponding expectation as . Hence, the principal’s problem is to choose a goal by maximizing her expected payoff (-discounted, the same as the agents’), i.e.,
(24)
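In illustrative notation (ours), with principal payoff \(u_0\) and the expectation taken under the measure induced by the goal \(g\) and the state dynamics, this goal selection objective reads
\[
\max_{g}\;\; \mathbb{E}^{g}\Big[ \sum_{t=0}^{\infty} \gamma^{t}\, u_0\big(s_t, a_t\big) \Big].
\]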
However, the principal cannot force the agents to take the actions or directly program agents’ actions according to the that maximizes ; instead, she uses information design to elicit the agents to take actions that coincide with in the sense of strong admissibility. Hence, the principal’s optimal goal selection problem is a constrained optimization problem:
(25)
s.t.
In (25), the feasibility of is captured by the conditions that (i) it is a Markov perfect goal and (ii) it satisfies the strong admissibility in ().
The optimal goal selection problem (25) can be reformulated as a problem of selecting and . Specifically, from the (strong) admissibility, the objective function can be represented in terms of and as follows:
(26)
where the expectation is with respect to the probability measure on the dynamics of the state. Let denote the set of valid policy profiles that are associated with the value function , given and ; i.e.,
where is defined in (20). Hence, the principal’s problem (25) can be reformulated as follows:
(OptInfo)
s.t.
Technically, the problem (OptInfo) is to select (i) an equilibrium policy profile that is strongly admissible and is an MPE and (ii) the signaling rule that induces the policy profile, such that the principal’s expected payoff is maximized at .
The problem (OptInfo) is, however, based on the assumption that the agents’ equilibrium behavior is always principal-preferred. We could also consider the problem of a principal who aims to solve her problem in a robust manner, in the sense that she chooses the signaling rule but wants to maximize her expected payoff in the worst equilibrium; i.e., the robust information design problem:
(Robust)
s.t.
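Schematically, writing \(\mathcal{E}(\sigma)\) for the set of strongly admissible equilibrium policy profiles induced by a signaling rule \(\sigma\) and \(W(\sigma,\pi)\) for the principal’s discounted expected payoff (both symbols ours, for illustration), the two formulations differ only in the inner operator:
\[
\text{(OptInfo)}\quad \max_{\sigma}\ \max_{\pi \in \mathcal{E}(\sigma)} W(\sigma,\pi),
\qquad\qquad
\text{(Robust)}\quad \max_{\sigma}\ \min_{\pi \in \mathcal{E}(\sigma)} W(\sigma,\pi).
\]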
IV-A Fixed-Point Misalignment Minimization
In this section, we provide an alternative formulation of information design by introducing the notion of fixed-point misalignment (FP misalignment).
Define, for any , , , , , ,
Then, we define the notion of fixed-point misalignment as follows:
Proposition 5.1
Then, we reformulate () in terms of FP misalignment minimization based on Proposition 5.1. Due to the definitions of and , the objective functions and can be represented in terms of and , as follows (denoted by and ) respectively:
(27)
(28)
Define a set:
Given , we define a set of OIL signaling rules as and a set of policy profiles given any signaling rule as .
Corollary 5.3
The principal’s robust information design is to solve the following problem
(29)
V Conclusion
This work is the first to propose an information design principle for dynamic games in which each agent makes coupled decisions of selecting a signal and taking an action at each period of time. We have formally defined a novel information design problem for the indirect and the direct settings. The notion of obedient implementability has been introduced to capture the optimality of the direct information design problem in a new equilibrium concept of obedient sequential Markov perfect equilibrium (O-SMPE). By characterizing obedient implementability (OIL) in Bayesian O-SMPE, we have proposed an approach to determining the information structure. We refer to this approach as fixed-point alignment, since it aligns the two fixed points at the signal-selection stage and the action-taking stage, respectively. We have uncovered the restrictions that discipline the principal’s freedom to influence the agents’ behaviors in Bayesian O-SMPE. Specifically, the principal’s goal should be a Markov perfect goal and the equilibrium policy profile should be strongly admissible. Additionally, restricting attention to OIL in Bayesian O-SMPE entails a loss of generality in the selection of Markov perfect goals. Finally, we have formulated the principal’s goal selection problem in terms of the optimal and the robust information design by replacing the admissibility with the optimality or the robustness of the agents’ equilibrium policy profile in the principal’s expected payoff.
References
- [1] A. Dickinson, “Actions and habits: the development of behavioural autonomy,” Philosophical Transactions of the Royal Society of London. B, Biological Sciences, vol. 308, no. 1135, pp. 67–78, 1985.
- [2] D. Bergemann and S. Morris, “Information design: A unified perspective,” Journal of Economic Literature, vol. 57, no. 1, pp. 44–95, 2019.
- [3] I. Taneva, “Information design,” American Economic Journal: Microeconomics, vol. 11, no. 4, pp. 151–85, 2019.
- [4] N. Chentanez, A. G. Barto, and S. P. Singh, “Intrinsically motivated reinforcement learning,” in Advances in neural information processing systems, 2005, pp. 1281–1288.
- [5] L. Mathevet, J. Perego, and I. Taneva, “On information design in games,” Journal of Political Economy, vol. 128, no. 4, pp. 1370–1404, 2020.
- [6] D. Bergemann and S. Morris, “Bayes correlated equilibrium and the comparison of information structures in games,” Theoretical Economics, vol. 11, no. 2, pp. 487–522, 2016.
- [7] E. Kamenica and M. Gentzkow, “Bayesian persuasion,” American Economic Review, vol. 101, no. 6, pp. 2590–2615, 2011.
- [8] J. Ely, A. Frankel, and E. Kamenica, “Suspense and surprise,” Journal of Political Economy, vol. 123, no. 1, pp. 215–260, 2015.
- [9] J. Passadore and J. P. Xandri, “Robust conditional predictions in dynamic games: An application to sovereign debt,” Job Market Paper, 2015.
- [10] L. Doval and J. C. Ely, “Sequential information design,” Econometrica, vol. 88, no. 6, pp. 2575–2608, 2020.
- [11] J. C. Ely, “Beeps,” American Economic Review, vol. 107, no. 1, pp. 31–53, 2017.
- [12] J. C. Ely and M. Szydlowski, “Moving the goalposts,” Journal of Political Economy, vol. 128, no. 2, pp. 468–506, 2020.
- [13] M. Makris and L. Renou, “Information design in multi-stage games,” working paper, Tech. Rep., 2018.
- [14] F. Koessler, M. Laclau, and T. Tomala, “Interactive information design,” HEC Paris Research Paper No. ECO/SCD-2018-1260, 2018.
- [15] R. B. Myerson, “Optimal auction design,” Mathematics of operations research, vol. 6, no. 1, pp. 58–73, 1981.
- [16] A. Pavan, I. Segal, and J. Toikka, “Dynamic mechanism design: A myersonian approach,” Econometrica, vol. 82, no. 2, pp. 601–653, 2014.
- [17] T. Zhang and Q. Zhu, “On the differential private data market: Endogenous evolution, dynamic pricing, and incentive compatibility,” 2021.
- [18] P. Milgrom, Putting Auction Theory to Work. Cambridge University Press, 2004.
- [19] S. Bhat, S. Jain, S. Gujar, and Y. Narahari, “An optimal bidimensional multi-armed bandit auction for multi-unit procurement,” Annals of Mathematics and Artificial Intelligence, vol. 85, no. 1, pp. 1–19, 2019.
- [20] T. Sönmez and M. U. Ünver, “Matching, allocation, and exchange of discrete resources,” in Handbook of social Economics. Elsevier, 2011, vol. 1, pp. 781–852.
- [21] T. Zhang and Q. Zhu, “Optimal two-sided market mechanism design for large-scale data sharing and trading in massive iot networks,” arXiv preprint arXiv:1912.06229, 2019.
- [22] D. Dewey, “Reinforcement learning and the reward engineering principle,” in 2014 AAAI Spring Symposium Series, 2014.
- [23] R. Nagpal, A. U. Krishnan, and H. Yu, “Reward engineering for object pick and place training,” arXiv preprint arXiv:2001.03792, 2020.
- [24] D. Hadfield-Menell, S. Milli, P. Abbeel, S. J. Russell, and A. Dragan, “Inverse reward design,” in Advances in neural information processing systems, 2017, pp. 6765–6774.
- [25] E. Kamenica, “Bayesian persuasion and information design,” Annual Review of Economics, vol. 11, pp. 249–272, 2019.
- [26] I. Brocas and J. D. Carrillo, “Influence through ignorance,” The RAND Journal of Economics, vol. 38, no. 4, pp. 931–947, 2007.
- [27] L. Rayo and I. Segal, “Optimal information disclosure,” Journal of political Economy, vol. 118, no. 5, pp. 949–987, 2010.
- [28] I. Arieli and Y. Babichenko, “Private bayesian persuasion,” Journal of Economic Theory, vol. 182, pp. 185–217, 2019.
- [29] M. Castiglioni, A. Celli, A. Marchesi, and N. Gatti, “Online bayesian persuasion,” Advances in Neural Information Processing Systems, vol. 33, 2020.
- [30] J.-F. Mertens and S. Zamir, “Formulation of bayesian analysis for games with incomplete information,” International Journal of Game Theory, vol. 14, no. 1, pp. 1–29, 1985.
- [31] I. Goldstein and Y. Leitner, “Stress tests and information disclosure,” Journal of Economic Theory, vol. 177, pp. 34–69, 2018.
- [32] N. Inostroza and A. Pavan, “Persuasion in global games with application to stress testing,” 2018.
- [33] P. Hernández and Z. Neeman, “How bayesian persuasion can help reduce illegal parking and other socially undesirable behavior,” Preprint, 2018.
- [34] Z. Rabinovich, A. X. Jiang, M. Jain, and H. Xu, “Information disclosure as a means to security,” in Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems. Citeseer, 2015, pp. 645–653.
- [35] S. Gehlbach and K. Sonin, “Government control of the media,” Journal of public Economics, vol. 118, pp. 163–171, 2014.
- [36] S. Das, E. Kamenica, and R. Mirka, “Reducing congestion through information design,” in 2017 55th annual allerton conference on communication, control, and computing (allerton). IEEE, 2017, pp. 1279–1284.
- [37] D. Duffie, P. Dworczak, and H. Zhu, “Benchmarks in search markets,” The Journal of Finance, vol. 72, no. 5, pp. 1983–2044, 2017.
- [38] M. Szydlowski, “Optimal financing and disclosure,” Management Science, vol. 67, no. 1, pp. 436–454, 2021.
- [39] D. Garcia and M. Tsur, “Information design in competitive insurance markets,” Journal of Economic Theory, vol. 191, p. 105160, 2021.
- [40] B. D. Ziebart, J. A. Bagnell, and A. K. Dey, “Maximum causal entropy correlated equilibria for markov games.” in AAMAS. Citeseer, 2011, pp. 207–214.
- [41] O. Hernández-Lerma and J. B. Lasserre, Discrete-time Markov control processes: basic optimality criteria. Springer Science & Business Media, 2012, vol. 30.
- [42] W. He and Y. Sun, “Stationary markov perfect equilibria in discounted stochastic games,” Journal of Economic Theory, vol. 169, pp. 35–61, 2017.
- [43] R. Bellman, “Dynamic programming,” Science, vol. 153, no. 3731, pp. 34–37, 1966.
- [44] J. Filar and K. Vrieze, “Competitive markov decision processes-theory, algorithms, and applications,” 1997.
- [45] H. Prasad and S. Bhatnagar, “General-sum stochastic games: Verifiability conditions for nash equilibria,” Automatica, vol. 48, no. 11, pp. 2923–2930, 2012.
- [46] H. Prasad, P. LA, and S. Bhatnagar, “Two-timescale algorithms for learning nash equilibria in general-sum stochastic games,” in Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, 2015, pp. 1371–1379.