Are Equivariant Equilibrium Approximators Beneficial?
Abstract
Recently, remarkable progress has been made by approximating Nash equilibrium (NE), correlated equilibrium (CE), and coarse correlated equilibrium (CCE) through function approximation that trains a neural network to predict equilibria from game representations. Furthermore, equivariant architectures are widely adopted in designing such equilibrium approximators in normal-form games. In this paper, we theoretically characterize benefits and limitations of equivariant equilibrium approximators. For the benefits, we show that they enjoy better generalizability than general ones and can achieve better approximations when the payoff distribution is permutation-invariant. For the limitations, we discuss their drawbacks in terms of equilibrium selection and social welfare. Together, our results help to understand the role of equivariance in equilibrium approximators.
1 Introduction
The equivariant equilibrium property states that, given a Nash equilibrium (NE) solution of a game, the permuted solution is also an NE of the game whose representation (e.g., payoff tensor) is permuted in the same way. The same property also holds for correlated equilibrium (CE) and coarse correlated equilibrium (CCE), as well as for the approximate solutions of all three solution concepts.
In this paper, we are interested in understanding the equivariant equilibrium property when designing neural networks that predict equilibria from game payoffs, following recent approaches to designing equivariant equilibrium approximators (Feng et al., 2021; Marris et al., 2022) in normal-form games. Informally, such equivariant approximators permute the output strategies (represented as vectors or tensors) in the same way whenever the input game representations (e.g., the game payoff tensors) are permuted. (We provide a formal definition of equivariant equilibrium approximators in Section 3.) While equivariant approximators have achieved empirical success, little work has theoretically discussed whether they are beneficial.
1.1 Our Contributions
We theoretically characterize the benefits and limitations of equivariant NE, CE and CCE approximators. For the benefits, we first show that equivariant approximators enjoy better generalizability, where we evaluate the approximators using the maximum exploitability (Lockhart et al., 2019; Goktas and Greenwald, 2022) over all players. To get such a result, we derive the generalization bounds and the sample complexities of the NE, CE, and CCE approximators: the generalization bounds offer confidence intervals on the expected testing approximations based on the empirical training approximations; the sample complexities describe how many training samples the equilibrium approximators need to achieve desirable generalizability. The generalization bounds and sample complexities involve the covering numbers (Shalev-Shwartz and Ben-David, 2014), which measure the representativeness of the approximators' function classes. Afterward, we prove that the equivariant approximators have lower covering numbers than general models, and therefore lower generalization bounds and sample complexities. We then show that the equivariant approximators can achieve better approximation when the payoff distribution is permutation-invariant.
As for the limitations, we find the equivariant approximators unable to find all the equilibrium points of some normal-form games. Such a result is caused by the limited representativeness of the equivariant approximators' function class. Besides, we find that the equivariant NE approximator may lose social welfare. Specifically, in an example we construct, while the maximum NE social welfare is large, the maximum social welfare of NEs that the equivariant NE approximators could find can be arbitrarily close to zero. Such a negative result inspires us to balance generalizability and social welfare when we design the approximators' architectures.
1.2 Further Related Work
Solving (approximate) NE, CE, and CCE for a single game is well studied (Fudenberg et al., 1998; Cesa-Bianchi and Lugosi, 2006). However, many similar games often need to be solved (Harris et al., 2023), both in practice and in some multi-agent learning algorithms (Marris et al., 2021; Liu et al., 2022). For instance, in repeated traffic routing games (Sessa et al., 2020), the payoffs of the games depend on the capacity of the underlying network, which can vary with time and weather conditions. In repeated sponsored search auctions, advertisers value different keywords based on the current marketing environment (Nekipelov et al., 2015). In many multi-agent learning algorithms, such as Nash Q-learning (Hu and Wellman, 2003), Correlated-Q learning (Greenwald et al., 2003), V-learning (Jin et al., 2022) and PSRO (Lanctot et al., 2017), an NE, CE, or CCE of a normal-form game needs to be solved in every update step.
In these settings, it is preferable to accelerate game solving through function approximation: Marris et al. (2022) introduce a neural equilibrium approximator to approximate CE and CCE for $n$-player normal-form games; Feng et al. (2021) propose a neural NE approximator in PSRO (Lanctot et al., 2017); Wu and Lisser (2022) design a CNN-based NE approximator for zero-sum bimatrix games. Differentiable approximators have also been developed to learn QREs (Ling et al., 2018), NE in chance-constrained games (Wu and Lisser, 2023), and the opponent's strategy (Hartford et al., 2016).
Equivariance is an ideal property of an equilibrium approximator. We discuss the literature on equivariant approximators after formally defining them in Section 3.
1.3 Organization
The rest of our paper is organized as follows: In Section 2 we introduce the preliminaries of game theory and equilibrium approximators. In Section 3 we formally define the equivariance of equilibrium approximators. We present our theoretical analysis of the benefits in Section 4 and of the limitations in Section 5. We conclude and point out future work in Section 6.
2 Preliminary
In this section, we introduce the preliminaries and notation of our paper. We also provide a brief introduction to equilibrium approximators.
2.1 Game Theory
Normal-Form Game
Let a normal-form game with joint payoff $u = (u_1, \dots, u_n)$ be $\mathcal{G}_u = ([n], \mathcal{A}, u)$, in which

- $n$ is the number of players. Each player is represented by an index $i \in [n] = \{1, \dots, n\}$.
- $\mathcal{A} = \mathcal{A}_1 \times \cdots \times \mathcal{A}_n$ is the product action space of all players, where $\mathcal{A}_i$ is the action space of player $i$. For player $i$, let $a_i \in \mathcal{A}_i$ be a specific action of player $i$ (an action is also referred to as a pure strategy). A joint action $a = (a_1, \dots, a_n) \in \mathcal{A}$ represents one play of the game in which player $i$ takes action $a_i$. The action space $\mathcal{A}$ is a Cartesian product that contains all possible joint actions. We have $|\mathcal{A}| = \prod_{i \in [n]} |\mathcal{A}_i|$.
- A joint (mixed) strategy is a distribution over $\mathcal{A}$. Let $\sigma = \sigma_1 \times \cdots \times \sigma_n$ be a product strategy and $\pi$ be a joint (possibly correlated) strategy. Denote by $\pi_i$ the marginal strategy of player $i$ in $\pi$. The expected utility of player $i$ under $\pi$ is $u_i(\pi) := \mathbb{E}_{a \sim \pi}[u_i(a)]$, and $u_i(\sigma)$ is defined analogously.

Besides, on behalf of player $i$, the other players' joint strategy is denoted as $\pi_{-i}$, and similarly for $a_{-i}$ and $\sigma_{-i}$.
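To make these definitions concrete, the following sketch (a minimal illustration of ours, not code from any of the cited works) computes expected utilities from a payoff tensor stored with the hypothetical layout `(n, |A_1|, ..., |A_n|)`; all function and variable names are our own.

```python
import numpy as np

def expected_utility_product(payoff, strategies):
    """Expected utility of every player under a product strategy.

    payoff: shape (n, |A_1|, ..., |A_n|); payoff[i] is player i's payoff tensor.
    strategies: list of n probability vectors; strategies[i] has length |A_i|.
    """
    n = payoff.shape[0]
    joint = strategies[0]
    for sigma_i in strategies[1:]:
        joint = np.multiply.outer(joint, sigma_i)   # product distribution over joint actions
    return np.array([np.sum(payoff[i] * joint) for i in range(n)])

def expected_utility_joint(payoff, pi):
    """Expected utility under a (possibly correlated) joint strategy pi over A."""
    return np.array([np.sum(payoff[i] * pi) for i in range(payoff.shape[0])])

# Matching pennies: player 1 wants to match, player 2 to mismatch; uniform play gives 0.5 each.
u = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
sigma = [np.array([0.5, 0.5]), np.array([0.5, 0.5])]
print(expected_utility_product(u, sigma))   # [0.5 0.5]
```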
Nash Equilibrium (NE)
We say a product strategy $\sigma$ is an NE if each player's strategy is a best response given the strategies of the others, i.e.,
$$u_i(\sigma) \ge u_i(a_i, \sigma_{-i}), \quad \forall i \in [n],\ \forall a_i \in \mathcal{A}_i. \tag{NE}$$
Computing an NE even for general two-player or three-player games is PPAD-hard (Chen et al., 2009; Daskalakis et al., 2009), which leads to research on approximate solutions. For arbitrary $\epsilon \ge 0$, we say a product strategy $\sigma$ is an $\epsilon$-approximate Nash equilibrium ($\epsilon$-NE) if no player can achieve more than $\epsilon$ utility gain by deviating from her current strategy. Formally,
$$u_i(\sigma) \ge u_i(a_i, \sigma_{-i}) - \epsilon, \quad \forall i \in [n],\ \forall a_i \in \mathcal{A}_i. \tag{$\epsilon$-NE}$$
The definition of $\epsilon$-NE reflects the idea that players might not be willing to deviate from their strategies when the amount of utility they could gain by doing so is tiny (no more than $\epsilon$).
Coarse Correlated Equilibrium (CCE)
We say a joint (possibly correlated) strategy $\pi$ is a CCE if no player can receive a higher payoff by acting independently, i.e.,
$$u_i(\pi) \ge u_i(a_i, \pi_{-i}), \quad \forall i \in [n],\ \forall a_i \in \mathcal{A}_i, \tag{CCE}$$
and we say $\pi$ is an $\epsilon$-approximate coarse correlated equilibrium ($\epsilon$-CCE) for $\epsilon \ge 0$ if
$$u_i(\pi) \ge u_i(a_i, \pi_{-i}) - \epsilon, \quad \forall i \in [n],\ \forall a_i \in \mathcal{A}_i. \tag{$\epsilon$-CCE}$$
The difference between NE and CCE is that in an NE, players execute their strategy individually in a decentralized way, while in a CCE, players’ strategies are possibly correlated. A standard technique to correlate the strategy is sending each player a signal from a centralized controller (Shoham and Leyton-Brown, 2008).
Correlated Equilibrium (CE)
CE is similar to CCE, except that in a CE, each player can observe her recommended action before she acts. Thus, player $i$ deviates from her strategy through a strategy modification $\phi_i : \mathcal{A}_i \to \mathcal{A}_i$, which maps actions in $\mathcal{A}_i$ to possibly different actions in $\mathcal{A}_i$. Based on strategy modification, we say a joint (possibly correlated) strategy $\pi$ is a CE if
$$\mathbb{E}_{a \sim \pi}[u_i(a)] \ge \mathbb{E}_{a \sim \pi}[u_i(\phi_i(a_i), a_{-i})], \quad \forall i \in [n],\ \forall \phi_i, \tag{CE}$$
and a joint strategy $\pi$ is an $\epsilon$-approximate correlated equilibrium ($\epsilon$-CE) for $\epsilon \ge 0$ if
$$\mathbb{E}_{a \sim \pi}[u_i(a)] \ge \mathbb{E}_{a \sim \pi}[u_i(\phi_i(a_i), a_{-i})] - \epsilon, \quad \forall i \in [n],\ \forall \phi_i. \tag{$\epsilon$-CE}$$
Note that for a finite $n$-player normal-form game, at least one NE, CE, and CCE must exist. This is because an NE always exists (Nash et al., 1950) and NE $\subseteq$ CE $\subseteq$ CCE.
Equilibrium Approximation
To evaluate how well a joint strategy approximates an equilibrium, we define its approximation based on exploitability (Lockhart et al., 2019; Goktas and Greenwald, 2022).
Definition 2.1 (Exploitability and Approximation).
Given a joint strategy $\pi$ and payoff $u$, the exploitability (or regret) of player $i$ is the maximum payoff gain of player $i$ by deviating from her current strategy, i.e.,
$$\mathcal{E}_i(\pi, u) := \max_{a_i \in \mathcal{A}_i} u_i(a_i, \pi_{-i}) - u_i(\pi),$$
and the exploitability under strategy modification of player $i$ is the maximum payoff gain of player $i$ by deviating through a strategy modification, i.e.,
$$\mathcal{E}_i^{\mathrm{CE}}(\pi, u) := \max_{\phi_i} \mathbb{E}_{a \sim \pi}[u_i(\phi_i(a_i), a_{-i})] - u_i(\pi).$$
The equilibrium approximation is defined as the maximum exploitability over all players (a similar metric is the Nikaido-Isoda function (Nikaidô and Isoda, 1955) or NashConv (Lockhart et al., 2019), which is the sum of exploitability over all players, i.e., $\sum_{i \in [n]} \mathcal{E}_i(\pi, u)$), i.e.,
$$\mathcal{E}(\pi, u) := \max_{i \in [n]} \mathcal{E}_i(\pi, u), \qquad \mathcal{E}^{\mathrm{CE}}(\pi, u) := \max_{i \in [n]} \mathcal{E}_i^{\mathrm{CE}}(\pi, u).$$
Based on approximation, we can restate the definitions of the solution concepts. A product strategy $\sigma$ is an NE of game $u$ if $\mathcal{E}(\sigma, u) = 0$ and is an $\epsilon$-NE if $\mathcal{E}(\sigma, u) \le \epsilon$. A joint strategy $\pi$ is a CCE (resp. CE) of $u$ if $\mathcal{E}(\pi, u) = 0$ (resp. $\mathcal{E}^{\mathrm{CE}}(\pi, u) = 0$) and is an $\epsilon$-CCE (resp. $\epsilon$-CE) if $\mathcal{E}(\pi, u) \le \epsilon$ (resp. $\mathcal{E}^{\mathrm{CE}}(\pi, u) \le \epsilon$).
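The approximation metric can also be computed directly. Below is a small sketch (again our own illustration, restricted to bimatrix games and reusing the payoff layout assumed above) of the per-player exploitability and the resulting NE approximation.

```python
import numpy as np

def ne_approximation(u, x, y):
    """NE approximation (maximum exploitability) of a product strategy (x, y) in a bimatrix game.

    u: shape (2, m, k); u[0] is the row player's payoff matrix, u[1] the column player's.
    x: row player's mixed strategy (length m); y: column player's mixed strategy (length k).
    """
    val_row = x @ u[0] @ y                      # row player's current expected utility
    val_col = x @ u[1] @ y                      # column player's current expected utility
    exploit_row = np.max(u[0] @ y) - val_row    # best pure deviation of the row player
    exploit_col = np.max(x @ u[1]) - val_col    # best pure deviation of the column player
    return max(exploit_row, exploit_col)

# Matching pennies: the uniform product strategy is the unique NE, so its approximation is 0.
u = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
print(ne_approximation(u, np.array([0.5, 0.5]), np.array([0.5, 0.5])))   # 0.0
```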
2.2 Equilibrium Approximator
The equilibrium approximators, including NE, CE, and CCE approximators, aim to predict solution concepts from game representations. In our paper, we fix the number of players $n$ and the action space $\mathcal{A}$. We denote by $\mathcal{U}$ the set of all possible game payoffs. An NE approximator $f^{\mathrm{NE}} : \mathcal{U} \to \Delta_{\mathcal{A}_1} \times \cdots \times \Delta_{\mathcal{A}_n}$ maps a game payoff to a product strategy, where $f^{\mathrm{NE}}_i(u)$ is player $i$'s predicted strategy. We define $\mathcal{F}^{\mathrm{NE}}$ as the function class of NE approximators. Similarly, we define a (C)CE approximator as a mapping $f : \mathcal{U} \to \Delta_{\mathcal{A}}$ and the (C)CE approximator class as $\mathcal{F}$.
An equilibrium approximator can be learned through machine-learning paradigms from empirical data. For instance, Feng et al. (2021) learn the NE approximator using the game payoffs generated during the learning process of PSRO, and optimize the approximator by gradient descent on exploitability. Marris et al. (2022) learn the CE and CCE approximators using games sampled i.i.d. from a manually designed distribution, and optimize the approximators using a maximum-welfare minimum-relative-entropy loss. Such a loss balances the equilibrium approximation, the social welfare, and the relative entropy of the joint strategy. Additionally, another simple way to learn an NE or CCE approximator is to apply minibatch stochastic gradient descent (SGD) on the approximation. Specifically, we denote by $\theta \in \mathbb{R}^d$ the $d$-dimensional parameter of the approximator, such as the weights of the neural network. We can optimize $\theta$ by the standard minibatch SGD algorithm on the approximation (see Algorithm 1).
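As a rough illustration of this last approach (not a reproduction of Algorithm 1, whose exact details are omitted here), the sketch below trains a toy bimatrix NE approximator by minibatch SGD directly on the NE approximation using PyTorch autograd; the architecture, game distribution, and hyperparameters are placeholder assumptions of ours.

```python
import torch
import torch.nn as nn

m = 4                                        # both players have m actions (an assumption)
net = nn.Sequential(                         # placeholder approximator: payoff -> both players' logits
    nn.Flatten(), nn.Linear(2 * m * m, 64), nn.ReLU(), nn.Linear(64, 2 * m))
opt = torch.optim.SGD(net.parameters(), lr=0.1)

def ne_approx(u, x, y):
    """Batched NE approximation (max exploitability) for bimatrix games u of shape (B, 2, m, m)."""
    val_row = torch.einsum('bi,bij,bj->b', x, u[:, 0], y)
    val_col = torch.einsum('bi,bij,bj->b', x, u[:, 1], y)
    dev_row = torch.einsum('bij,bj->bi', u[:, 0], y).max(dim=1).values - val_row
    dev_col = torch.einsum('bi,bij->bj', x, u[:, 1]).max(dim=1).values - val_col
    return torch.maximum(dev_row, dev_col)

for step in range(1000):
    u = torch.rand(32, 2, m, m)              # minibatch of payoffs from a toy i.i.d. distribution
    logits = net(u).view(-1, 2, m)
    x, y = logits[:, 0].softmax(-1), logits[:, 1].softmax(-1)
    loss = ne_approx(u, x, y).mean()         # empirical NE approximation on the minibatch
    opt.zero_grad(); loss.backward(); opt.step()
```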
3 Equivariant Equilibrium Approximator
In this section, we introduce the equivariance of equilibrium approximators and how we apply orbit averaging (Elesedy and Zaidi, 2021) to construct equivariant approximators. Equivariant approximators have been developed in a number of prior works (Hartford et al., 2016; Feng et al., 2021; Marris et al., 2022; Wu and Lisser, 2022), which we discuss later in this section.
We first define the permutation of a game. Let $\rho_i$ be a permutation function of player $i$, which is a bijection from $\mathcal{A}_i$ to itself. Define $P_i$ as the class of all permutation functions of player $i$, which forms a group under composition.
Definition 3.1 (Permutation of a game).
For a normal-form game $\mathcal{G}_u$, we define the $\rho_i$-permutation of the payoff tensor $u$ as $\rho_i \circ u$, in which
$$(\rho_i \circ u)_j(a_i, a_{-i}) := u_j(\rho_i(a_i), a_{-i}), \quad \forall j \in [n],\ \forall a = (a_i, a_{-i}) \in \mathcal{A}.$$
We also define the $\rho_i$-permutation of a joint strategy $\pi$ as $\rho_i \circ \pi$, where
$$(\rho_i \circ \pi)(a_i, a_{-i}) := \pi(\rho_i(a_i), a_{-i}),$$
and the $\rho_i$-permutation of a product strategy $\sigma$ as $\rho_i \circ \sigma := (\rho_i \circ \sigma_i, \sigma_{-i})$, where
$$(\rho_i \circ \sigma_i)(a_i) := \sigma_i(\rho_i(a_i)).$$
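In array form, and under our reading of Definition 3.1 (the indexing convention below is an assumption of ours), the $\rho_i$-permutation acts on a payoff tensor and a mixed strategy as follows.

```python
import numpy as np

def permute_payoff(payoff, i, perm):
    """rho_i-permutation of a payoff tensor: (rho_i o u)_j(a_i, a_-i) = u_j(rho_i(a_i), a_-i).

    payoff: shape (n, |A_1|, ..., |A_n|); i: the permuted player; perm[a] = rho_i(a).
    """
    return np.take(payoff, perm, axis=i + 1)    # axis 0 indexes players, axis i+1 indexes A_i

def permute_strategy(sigma_i, perm):
    """rho_i-permutation of player i's mixed strategy: (rho_i o sigma_i)(a) = sigma_i(rho_i(a))."""
    return sigma_i[perm]

u = np.arange(2 * 2 * 3, dtype=float).reshape(2, 2, 3)   # 2 players, |A_1| = 2, |A_2| = 3
swap = np.array([1, 0])                                  # swap player 1's two actions
print(permute_payoff(u, 0, swap)[0])                     # player 1's payoff matrix with rows swapped
```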
Equivariant NE Approximator
Considering the relationship between the $\rho_i$-permuted game and the $\rho_i$-permuted product strategy, we have the following result for NE:
Lemma 3.2.
In a normal-form game $\mathcal{G}_u$, for an arbitrary player $i$, any permutation $\rho_i \in P_i$, and any ($\epsilon$-)NE strategy $\sigma$ of $\mathcal{G}_u$, $\rho_i \circ \sigma$ is also an ($\epsilon$-)NE of the $\rho_i$-permuted game $\mathcal{G}_{\rho_i \circ u}$.
Lemma 3.2 shows that the order of actions is irrelevant to NE approximation. Based on it, we can summarize two ideal properties for NE approximators, defined as follows:
Definition 3.3 (Player-Permutation-Equivariance, PPE).
We say an NE approximator $f^{\mathrm{NE}}$ satisfies $i$-permutation-equivariance ($i$-PE) if for an arbitrary permutation function $\rho_i \in P_i$ we have
$$f^{\mathrm{NE}}_i(\rho_i \circ u) = \rho_i \circ f^{\mathrm{NE}}_i(u). \tag{$i$-PE}$$
Moreover, we say $f^{\mathrm{NE}}$ is player-permutation-equivariant (PPE) if $f^{\mathrm{NE}}$ satisfies $i$-PE for all players $i \in [n]$.
Definition 3.4 (Opponent-Permutation-Invariance, OPI).
We say an NE approximator $f^{\mathrm{NE}}$ is opponent permutation-invariant with respect to player $i$ ($i$-PI) if for any other player $j \neq i$ and an arbitrary permutation function $\rho_i \in P_i$ we have
$$f^{\mathrm{NE}}_j(\rho_i \circ u) = f^{\mathrm{NE}}_j(u), \tag{$i$-PI}$$
and we say $f^{\mathrm{NE}}$ is opponent-permutation-invariant (OPI) if $f^{\mathrm{NE}}$ satisfies $i$-PI for all players $i \in [n]$.
Equivariant (C)CE approximator
Considering the relationship between the $\rho_i$-permuted game and the $\rho_i$-permuted joint strategy, we have a similar result for CE and CCE:
Lemma 3.5.
In a normal-form game $\mathcal{G}_u$, for an arbitrary player $i$, any permutation $\rho_i \in P_i$, and any ($\epsilon$-)CE or ($\epsilon$-)CCE strategy $\pi$ of $\mathcal{G}_u$, $\rho_i \circ \pi$ is also an ($\epsilon$-)CE or ($\epsilon$-)CCE of the $\rho_i$-permuted game $\mathcal{G}_{\rho_i \circ u}$.
Inspired by Lemma 3.5, we can also summarize an ideal property for CE and CCE approximators defined as follows.
Definition 3.6 (Permutation-Equivariance, PE).
We say a (C)CE approximator $f$ is $i$-permutation-equivariant ($i$-PE) if for an arbitrary permutation function $\rho_i \in P_i$ we have
$$f(\rho_i \circ u) = \rho_i \circ f(u),$$
and we say $f$ is permutation-equivariant (PE) if $f$ satisfies $i$-PE for all players $i \in [n]$.
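These properties can be checked numerically for any candidate approximator. The sketch below (our own helper, using the same array conventions as the permutation sketch above) tests the $i$-PE property of a (C)CE approximator on a toy bimatrix game.

```python
import numpy as np

def permute_joint(pi, i, perm):
    """rho_i-permutation of a joint strategy pi over A_1 x ... x A_n (same convention as above)."""
    return np.take(pi, perm, axis=i)

def is_i_pe(f, u, i, perm, tol=1e-8):
    """Check i-PE of a (C)CE approximator f at payoff u: f(rho_i o u) == rho_i o f(u)."""
    lhs = f(np.take(u, perm, axis=i + 1))      # approximate the rho_i-permuted game
    rhs = permute_joint(f(u), i, perm)         # permute the original prediction
    return np.allclose(lhs, rhs, atol=tol)

# A trivially PE "approximator" that ignores the payoff and outputs the uniform joint strategy.
uniform = lambda u: np.full(u.shape[1:], 1.0 / np.prod(u.shape[1:]))
u = np.random.rand(2, 3, 3)
print(is_i_pe(uniform, u, i=0, perm=np.array([2, 0, 1])))   # True
```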
Equivariant Approximators in Literature
For two-player games, Feng et al. (2021) propose an MLP-based NE approximator that satisfies both PPE and OPI for zero-sum games. Additionally, they also design a convolution-based NE approximator that satisfies PPE only. Hartford et al. (2016) give a PPE approximator to predict players' strategies. The traditional algorithms of Tsaknakis and Spirakis (2007) and Deligkas et al. (2022), which approximate NE by optimization, are also PPE and OPI with respect to the payoff and the initial strategies. For $n$-player general games, Marris et al. (2022) provide a permutation-equivariant approximator to approximate CE and CCE. Equivariant architectures are also adopted in optimal auction design (Rahme et al., 2021; Duan et al., 2022; Ivanov et al., 2022), and Qin et al. (2022) theoretically characterize the benefits of permutation-equivariance in auction mechanisms. We follow the rough idea of Qin et al. (2022) when analyzing the benefits of equivariant equilibrium approximators.
3.1 Orbit Averaging
Orbit averaging is a well-known method to enforce equivariance or invariance of a function (Schulz-Mirbach, 1994). It averages the function over the orbit of its input under a group action (e.g., the permutation group in our paper).
Orbit Averaging for NE Approximator
For an NE approximator $f^{\mathrm{NE}}$ and any player $i$, we can construct an $i$-PI or $i$-PE NE approximator by averaging $f^{\mathrm{NE}}$ with respect to all the permutations of player $i$. Specifically, we construct an $i$-PI NE approximator by the operator $\mathcal{O}_i$ with
$$(\mathcal{O}_i f^{\mathrm{NE}})_j(u) := \frac{1}{|P_i|}\sum_{\rho_i \in P_i} f^{\mathrm{NE}}_j(\rho_i \circ u) \ \text{ for } j \neq i, \qquad (\mathcal{O}_i f^{\mathrm{NE}})_i(u) := f^{\mathrm{NE}}_i(u),$$
and we construct an $i$-PE NE approximator by the operator $\mathcal{P}_i$ with:
$$(\mathcal{P}_i f^{\mathrm{NE}})_i(u) := \frac{1}{|P_i|}\sum_{\rho_i \in P_i} \rho_i^{-1} \circ f^{\mathrm{NE}}_i(\rho_i \circ u), \qquad (\mathcal{P}_i f^{\mathrm{NE}})_j(u) := f^{\mathrm{NE}}_j(u) \ \text{ for } j \neq i.$$
To construct a PPE or OPI NE approximator, we composite the operators with respect to all players. Letting $\mathcal{O} := \mathcal{O}_1 \circ \cdots \circ \mathcal{O}_n$ and $\mathcal{P} := \mathcal{P}_1 \circ \cdots \circ \mathcal{P}_n$, we get the following corollary:
Lemma 3.8.
$\mathcal{O} f^{\mathrm{NE}}$ is OPI and $\mathcal{P} f^{\mathrm{NE}}$ is PPE. If $f^{\mathrm{NE}}$ is already OPI or PPE, we have $\mathcal{O} f^{\mathrm{NE}} = f^{\mathrm{NE}}$ or $\mathcal{P} f^{\mathrm{NE}} = f^{\mathrm{NE}}$, respectively.
Furthermore, we can also compose $\mathcal{O}$ and $\mathcal{P}$ to construct an NE approximator with both PPE and OPI.
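A minimal sketch of the $i$-PE orbit-averaging projection for bimatrix NE approximators, assuming the averaging form stated above (the operator form and all helper names are our own reconstruction; enumerating all $|\mathcal{A}_i|!$ permutations is only practical for small action spaces):

```python
import numpy as np
from itertools import permutations

def pe_project(f, i):
    """i-PE orbit-averaging projection of a bimatrix NE approximator f.

    f maps a payoff tensor u of shape (2, m, k) to a pair of mixed strategies (x, y);
    the projection replaces player i's output by the average of rho_i^{-1} o f_i(rho_i o u).
    """
    def g(u):
        out = list(f(u))
        size = u.shape[i + 1]
        perms = [np.array(p) for p in permutations(range(size))]
        acc = np.zeros(size)
        for perm in perms:
            inv = np.argsort(perm)                            # rho_i^{-1}
            acc += f(np.take(u, perm, axis=i + 1))[i][inv]    # pull back the permuted prediction
        out[i] = acc / len(perms)
        return tuple(out)
    return g

# A payoff-dependent heuristic f (row player best-responds to a uniform opponent) ...
f = lambda u: (np.eye(u.shape[1])[np.argmax(u[0].mean(axis=1))],
               np.full(u.shape[2], 1.0 / u.shape[2]))
g = pe_project(f, 0)                                          # ... projected to be 0-PE.

u = np.random.rand(2, 3, 3)
perm = np.array([2, 0, 1])
print(np.allclose(g(np.take(u, perm, axis=1))[0], g(u)[0][perm]))   # True: 0-PE holds
```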
Orbit Averaging for (C)CE Approximator
For a CE or CCE approximator $f$, we define the projection operator $\mathcal{Q}_i$ for player $i$ to construct an $i$-PE approximator, which averages $f$ with respect to all the permutations of player $i$:
$$(\mathcal{Q}_i f)(u) := \frac{1}{|P_i|}\sum_{\rho_i \in P_i} \rho_i^{-1} \circ f(\rho_i \circ u).$$
Similarly, we define $\mathcal{Q} := \mathcal{Q}_1 \circ \cdots \circ \mathcal{Q}_n$ as the composite operator.
Lemma 3.9.
$\mathcal{Q}_i f$ is $i$-PE and $\mathcal{Q} f$ is PE. Specifically, if $f$ is already $i$-PE or PE, then we have $\mathcal{Q}_i f = f$ or $\mathcal{Q} f = f$, respectively.
Corollary 3.10.
.
The benefit of using orbit averaging is shown in the following lemma:
Lemma 3.11.
Denote by $\mathcal{T}$ an idempotent operator, i.e., $\mathcal{T} \circ \mathcal{T} = \mathcal{T}$ (e.g., $\mathcal{O}$ or $\mathcal{Q}$). For a function class $\mathcal{F}$ of NE, CE, or CCE approximators, let $\mathcal{F}'$ be any subset of $\mathcal{F}$ that is closed under $\mathcal{T}$; then $\mathcal{T}(\mathcal{F}') := \{\mathcal{T} f : f \in \mathcal{F}'\}$ is the largest subset of $\mathcal{F}'$ that is invariant under $\mathcal{T}$.
According to Lemma 3.8, Lemma 3.9 and Lemma 3.11, $\mathcal{O}(\mathcal{F}^{\mathrm{NE}})$ (or $\mathcal{P}(\mathcal{F}^{\mathrm{NE}})$ / $\mathcal{Q}(\mathcal{F})$) is the largest subset of $\mathcal{F}^{\mathrm{NE}}$ (or $\mathcal{F}^{\mathrm{NE}}$ / $\mathcal{F}$) with the corresponding property (OPI, PPE, or PE) if the class is closed under the corresponding operator. The result tells us that the orbit-averaging operators, while enforcing the operated function to be equivariant or invariant, keep the capacity of the function class as large as possible. Therefore, we believe that orbit averaging is an ideal approach to constructing equivariant or invariant functions.
4 Theoretical Analysis of Benefits
In this section, we theoretically analyze the benefits of equivariant approximators with respect to generalizability and approximation.
4.1 Benefits for Generalization
We first derive the generalization bound and sample complexity for general approximator classes, and then we show the benefits of equivariant approximators by applying orbit averaging to the approximators.
The representativeness of an approximator class is measured by its covering numbers (Shalev-Shwartz and Ben-David, 2014) under the $\ell_\infty$-distance, which are defined as follows:
Definition 4.1 ($\ell_\infty$-distance).
The $\ell_\infty$-distance between two equilibrium approximators $f$ and $f'$ is
$$\ell_\infty(f, f') := \sup_{u \in \mathcal{U}} \mathrm{dist}(f(u), f'(u)),$$
where we define the distance of two product strategies $\sigma$ and $\sigma'$ as
$$\mathrm{dist}(\sigma, \sigma') := \max_{i \in [n]} \sum_{a_i \in \mathcal{A}_i} |\sigma_i(a_i) - \sigma'_i(a_i)|,$$
and the distance of two joint strategies $\pi$ and $\pi'$ as
$$\mathrm{dist}(\pi, \pi') := \sum_{a \in \mathcal{A}} |\pi(a) - \pi'(a)|.$$
Definition 4.2 ($r$-covering number).
For $r > 0$, we say a function class $\mathcal{F}_r$ $r$-covers another function class $\mathcal{F}$ under the $\ell_\infty$-distance if for every function $f \in \mathcal{F}$ there exists $f_r \in \mathcal{F}_r$ such that $\ell_\infty(f, f_r) \le r$. The $r$-covering number $\mathcal{N}_\infty(\mathcal{F}, r)$ of $\mathcal{F}$ is the cardinality of the smallest function class $\mathcal{F}_r$ that $r$-covers $\mathcal{F}$ under the $\ell_\infty$-distance.
Based on covering numbers, we provide the generalization bounds of NE, CE and CCE approximators. The bounds describe the difference between the expected testing approximation and empirical training approximation.
Theorem 4.3.
[Generalization bound] For a function class $\mathcal{F}$ of NE, CE, or CCE approximators, with probability at least $1-\delta$ over the draw of the training set (with size $m$) from payoff distribution $\mathcal{D}$, for all approximators $f \in \mathcal{F}$ we have
where for NE approximator, and for CE and CCE approximators.
To get the theorem, we first show that all three equilibrium approximations are Lipschitz continuous with respect to strategies. Afterward, we derive the Rademacher complexity (Bartlett and Mendelson, 2002) of the expected approximation based on the Lipschitz continuity and covering numbers. See Section B.1 for the detailed proof.
We can see from Theorem 4.3 that, with a large enough training set, the generalization gaps of equilibrium approximators go to zero if the covering number is bounded. As a result, we can estimate the expected testing performance through the empirical training performance.
We can also derive the sample complexities of equilibrium approximators to achieve the desirable generalizability.
Theorem 4.4.
[Sample complexity] For $\epsilon, \delta \in (0, 1)$, a function class $\mathcal{F}$ of NE, CE, or CCE approximators, and a distribution $\mathcal{D}$, with probability at least $1-\delta$ over the draw of a training set with
games sampled from $\mathcal{D}$, we have
where for NE approximators, and for CE and CCE approximators.
The proof is based on the Lipschitz continuity of the approximation, a uniform bound, and a concentration inequality. See Section B.2 for details. Theorem 4.4 establishes the uniform convergence of the function class $\mathcal{F}$, which is a sufficient condition for agnostic PAC learnability (Shalev-Shwartz and Ben-David, 2014).
As for the benefits of equivariant approximators for generalizability, the following result indicates that the projected equilibrium approximators have smaller covering numbers.
Theorem 4.5.
The $\mathcal{O}$-projected, $\mathcal{P}$-projected and $\mathcal{Q}$-projected approximator classes have smaller covering numbers, i.e., for all $r > 0$ we have
$$\mathcal{N}_\infty(\mathcal{O}(\mathcal{F}^{\mathrm{NE}}), r) \le \mathcal{N}_\infty(\mathcal{F}^{\mathrm{NE}}, r), \quad \mathcal{N}_\infty(\mathcal{P}(\mathcal{F}^{\mathrm{NE}}), r) \le \mathcal{N}_\infty(\mathcal{F}^{\mathrm{NE}}, r), \quad \mathcal{N}_\infty(\mathcal{Q}(\mathcal{F}), r) \le \mathcal{N}_\infty(\mathcal{F}, r).$$
The proof is done by showing all the operators are contraction mappings. See Section B.3 for details.
Both the generalization bounds in Theorem 4.3 and the sample complexities in Theorem 4.4 decrease as the covering numbers $\mathcal{N}_\infty(\mathcal{F}, r)$ decrease. Thus, we can see from Theorem 4.5 that both PPE and OPI can improve the generalizability of NE approximators, and PE can improve the generalizability of CE and CCE approximators.
4.2 Benefits for Approximation
We then show the benefits of equivariance for approximation when the payoff distribution is invariant under permutation. A permutation-invariant payoff distribution arises when actions are anonymous or indifferent, or when we pre-train the equilibrium approximators using a manually designed distribution (Marris et al., 2022).
(C)CE Approximator
The following theorem shows the benefit of permutation-equivariance in decreasing the exploitability of (C)CE approximators.
Theorem 4.6.
When the payoff distribution is invariant under the permutation of payoffs, the $\mathcal{Q}$-projected (C)CE approximator has a smaller expected equilibrium approximation. Formally, for all $f \in \mathcal{F}$ and any permutation-invariant distribution $\mathcal{D}$, we have
$$\mathbb{E}_{u \sim \mathcal{D}}\big[\mathcal{E}(\mathcal{Q}f(u), u)\big] \le \mathbb{E}_{u \sim \mathcal{D}}\big[\mathcal{E}(f(u), u)\big],$$
with $\mathcal{E}$ replaced by $\mathcal{E}^{\mathrm{CE}}$ for CE approximators.
The proof is done by using the convexity of the approximation. See Section B.4 for details. We can see from Theorem 4.6 that, when the payoff distribution is invariant under permutation, it is beneficial to use an equivariant architecture for the CE or CCE approximator.
NE Approximator
As for NE approximators, we have similar results.
Theorem 4.7.
For bimatrix constant-sum games, when the payoff distribution is invariant under the permutation of payoffs, the $\mathcal{O}$-projected (or $\mathcal{P}$-projected) NE approximator has a smaller expected exploitability. Formally, for all $f^{\mathrm{NE}} \in \mathcal{F}^{\mathrm{NE}}$ and any permutation-invariant distribution $\mathcal{D}$ over bimatrix constant-sum games, we have
$$\mathbb{E}_{u \sim \mathcal{D}}\big[\mathcal{E}(\mathcal{O}f^{\mathrm{NE}}(u), u)\big] \le \mathbb{E}_{u \sim \mathcal{D}}\big[\mathcal{E}(f^{\mathrm{NE}}(u), u)\big], \qquad \mathbb{E}_{u \sim \mathcal{D}}\big[\mathcal{E}(\mathcal{P}f^{\mathrm{NE}}(u), u)\big] \le \mathbb{E}_{u \sim \mathcal{D}}\big[\mathcal{E}(f^{\mathrm{NE}}(u), u)\big].$$
Theorem 4.8.
When the payoff distribution is invariant under the permutation of payoffs and $f^{\mathrm{NE}}$ satisfies OPI, the $\mathcal{P}$-projected NE approximator has a smaller expected NE approximation. Formally, for all $f^{\mathrm{NE}}$ that are OPI and any permutation-invariant distribution $\mathcal{D}$, we have
$$\mathbb{E}_{u \sim \mathcal{D}}\big[\mathcal{E}(\mathcal{P}f^{\mathrm{NE}}(u), u)\big] \le \mathbb{E}_{u \sim \mathcal{D}}\big[\mathcal{E}(f^{\mathrm{NE}}(u), u)\big].$$
Theorem 4.9.
For bimatrix games, when the payoff distribution is invariant under the permutation of payoffs and $f^{\mathrm{NE}}$ satisfies PPE, the $\mathcal{O}$-projected NE approximator has a smaller expected NE approximation. Formally, for all $f^{\mathrm{NE}}$ that are PPE and any permutation-invariant distribution $\mathcal{D}$ over bimatrix games, we have
$$\mathbb{E}_{u \sim \mathcal{D}}\big[\mathcal{E}(\mathcal{O}f^{\mathrm{NE}}(u), u)\big] \le \mathbb{E}_{u \sim \mathcal{D}}\big[\mathcal{E}(f^{\mathrm{NE}}(u), u)\big].$$
Theorem 4.8 and Theorem 4.9 show that approximators with both PPE and OPI can achieve better approximation than ones with only PPE or only OPI. Meanwhile, we can see from Theorem 4.7 that for bimatrix constant-sum games (such as zero-sum games), it is preferable to introduce PPE or OPI into the architectures.
5 Theoretical Analysis of Limitations
As we discussed in Section 4, equivariant approximators enjoy better generalizability and, in some cases, better approximation. However, as we will show, they have limitations regarding equilibrium selection and social welfare. Such limitations are attributable to the limited representativeness caused by equivariance.
5.1 Equilibrium Selection
We first show that there may be equilibrium points that equivariant approximators can never find. We illustrate this limitation in permutation-invariant games, defined as follows:
Definition 5.1 (Permutation-$\rho$-Invariant Game).
We say a game $\mathcal{G}_u$ is permutation-$\rho$-invariant, where $\rho = (\rho_1, \dots, \rho_n) \in P_1 \times \cdots \times P_n$, if the payoff $u$ is permutation-invariant with respect to $\rho$. That is, $\rho_1 \circ \rho_2 \circ \cdots \circ \rho_n \circ u = u$.
Permutation-$\rho$-invariance indicates that one cannot distinguish the joint action $a$ from $\rho(a) := (\rho_1(a_1), \dots, \rho_n(a_n))$ using only the payoff $u$. We provide an example to give more insight into permutation-$\rho$-invariant games:
Example 5.2.
For a 2-player game $\mathcal{G}_u$, let $A = u_1$ and $B = u_2$ be the two payoff matrices, and let $\rho_1 = \rho_2$ be the order-reversing permutation of the action set. If one of the following conditions holds, then $\mathcal{G}_u$ is permutation-$\rho$-invariant:

1. $A$ and $B$ are symmetric and persymmetric (i.e., symmetric with respect to the northeast-to-southwest diagonal) square matrices.
2. Both $A$ and $B$ are centrosymmetric, i.e., $X_{jk} = X_{m+1-j,\,m+1-k}$ for $X \in \{A, B\}$ and all $j, k \in [m]$, where $m$ is the number of actions.
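Permutation-$\rho$-invariance is easy to test numerically; the sketch below (our own helper, assuming the swap permutation discussed above) checks it for the $2\times 2$ identity coordination game that reappears in Example 5.4, whose payoff matrices are symmetric and persymmetric.

```python
import numpy as np

def joint_permute(u, perms):
    """Apply a per-player permutation rho = (rho_1, ..., rho_n) to a payoff tensor u."""
    for i, perm in enumerate(perms):
        u = np.take(u, perm, axis=i + 1)
    return u

def is_rho_invariant(u, perms, tol=1e-8):
    """Check Definition 5.1: the game is permutation-rho-invariant iff rho o u == u."""
    return np.allclose(joint_permute(u, perms), u, atol=tol)

# 2x2 identity (coordination) game: swapping both players' actions leaves the payoff unchanged.
u = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[1.0, 0.0], [0.0, 1.0]]])
swap = np.array([1, 0])
print(is_rho_invariant(u, [swap, swap]))   # True
```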
For a permutation $\rho$ and player $i$, we denote the set of non-fixed actions of player $i$ under $\rho_i$ as
$$D_i(\rho_i) := \{a_i \in \mathcal{A}_i : \rho_i(a_i) \neq a_i\}.$$
Based on $D_i(\rho_i)$, we identify equilibrium points of permutation-$\rho$-invariant games that equivariant approximators can never find.
Theorem 5.3.
For a permutation-$\rho$-invariant game $\mathcal{G}_u$, if there is a pure NE $a = (a_1, \dots, a_n)$ and at least one player $i$ such that $a_i \in D_i(\rho_i)$, then $a$ will never be found by any NE approximator with both PPE and OPI. Besides, $a$ (as a pure CE or CCE) will also never be found by any CE or CCE approximator with PE.
We illustrate Theorem 5.3 by the following example:
Example 5.4.
Consider a bimatrix game with identity utility
$$u_1 = u_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
There are two pure NEs (the two diagonal joint actions) and one mixed NE in which both players play uniformly at random. Let $\rho_i$ be the unique permutation function (other than the identity) of player $i \in \{1, 2\}$, i.e., the swap of the two actions, and let $\rho = (\rho_1, \rho_2)$. The game is permutation-$\rho$-invariant.
Case 1: Let $f$ be a permutation-equivariant CE or CCE approximator, and denote $\pi = f(u)$. We have
$$\pi = f(u) \overset{(a)}{=} f(\rho_1 \circ \rho_2 \circ u) \overset{(b)}{=} \rho_1 \circ \rho_2 \circ f(u) = \rho_1 \circ \rho_2 \circ \pi,$$
where $(a)$ holds by the permutation-$\rho$-invariance of $u$, and $(b)$ holds by the PE of $f$. Thus, we have $\pi_{11} = \pi_{22}$ and $\pi_{12} = \pi_{21}$. As a result, the two pure (C)CEs cannot be found.
Case 2: Let $f^{\mathrm{NE}}$ be an NE approximator that satisfies PPE and OPI. Denote $\sigma = f^{\mathrm{NE}}(u)$, where $\sigma_1 = f^{\mathrm{NE}}_1(u)$ and $\sigma_2 = f^{\mathrm{NE}}_2(u)$. By the PPE and OPI of $f^{\mathrm{NE}}$, we have
$$\sigma_1 = f^{\mathrm{NE}}_1(u) \overset{(a)}{=} f^{\mathrm{NE}}_1(\rho_1 \circ \rho_2 \circ u) \overset{(b)}{=} \rho_1 \circ f^{\mathrm{NE}}_1(\rho_2 \circ u) \overset{(c)}{=} \rho_1 \circ f^{\mathrm{NE}}_1(u) = \rho_1 \circ \sigma_1,$$
where $(a)$ holds by the permutation-$\rho$-invariance of $u$, $(b)$ holds by the PPE of $f^{\mathrm{NE}}$, and $(c)$ holds by the OPI of $f^{\mathrm{NE}}$; the same argument applies to $\sigma_2$. Hence both $\sigma_1$ and $\sigma_2$ are uniform, so the only NE that $f^{\mathrm{NE}}$ could find is the mixed NE.
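The argument of Example 5.4 can be replayed numerically: the swap of both players' actions fixes the game but exchanges the two pure equilibria, so any PE (or PPE-and-OPI) approximator must output a swap-symmetric strategy. The sketch below is a toy check of ours, not the paper's construction.

```python
import numpy as np

u = np.array([[[1.0, 0.0], [0.0, 1.0]],     # identity coordination game of Example 5.4
              [[1.0, 0.0], [0.0, 1.0]]])
swap = np.array([1, 0])

# Swapping both players' actions maps the game to itself ...
u_perm = np.take(np.take(u, swap, axis=1), swap, axis=2)
print(np.allclose(u_perm, u))                # True: the game is permutation-rho-invariant

# ... but maps each pure equilibrium to the other one.
first_pure_ne = np.array([[1.0, 0.0], [0.0, 0.0]])      # point mass on the first diagonal joint action
print(np.take(np.take(first_pure_ne, swap, axis=0), swap, axis=1))   # point mass on the second one

# A PE approximator f must satisfy f(u) = f(rho o u) = rho o f(u), so its output is
# swap-symmetric; among the equilibria of this game, only the fully mixed one qualifies.
mixed = np.full((2, 2), 0.25)
print(np.allclose(np.take(np.take(mixed, swap, axis=0), swap, axis=1), mixed))   # True
```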
As we can see from the example and Theorem 5.3, equivariance, while introducing inductive bias into the approximator architecture, is also a strong constraint. Such a constraint is the reason why equivariant approximators cannot find all equilibrium points.
5.2 Social Welfare
The social welfare of a joint strategy $\pi$ is defined as the sum of all players' utilities, i.e.,
$$\mathrm{SW}(\pi, u) := \sum_{i \in [n]} u_i(\pi).$$
The equilibrium with higher social welfare is usually preferred (Marris et al., 2022).
To analyze the social welfare of equivariant approximators, we define the worst social welfare ratio as follows:
Definition 5.5.
For any $n$ and $\mathcal{A}$, and two NE (or CE/CCE) approximator classes $\mathcal{F}_1, \mathcal{F}_2$ that target games with $n$ players and action space $\mathcal{A}$, we define the worst social-welfare ratio of $\mathcal{F}_1$ over $\mathcal{F}_2$ as
$$\mathrm{SWR}(\mathcal{F}_1, \mathcal{F}_2) := \inf_{\mathcal{D}} \frac{\sup_{f \in \mathcal{F}_1} \mathbb{E}_{u \sim \mathcal{D}}\,\mathrm{SW}(f(u), u)}{\sup_{f \in \mathcal{F}_2} \mathbb{E}_{u \sim \mathcal{D}}\,\mathrm{SW}(f(u), u)},$$
where the infimum is taken over payoff distributions $\mathcal{D}$ on such games.
$\mathrm{SWR}(\mathcal{F}_1, \mathcal{F}_2)$ measures the relative representativeness of $\mathcal{F}_1$ over $\mathcal{F}_2$ in terms of social welfare. Based on that, we have the following result for equivariant CE and CCE approximator classes:
Theorem 5.6.
Given $n$ and $\mathcal{A}$, let $\mathcal{F}_{\mathrm{PE}}$ be the function class (targeting games with $n$ players and action space $\mathcal{A}$) of all the (C)CE approximators with PE. Denote by $\mathcal{F}_{\mathrm{all}}$ the function class of all the (C)CE approximators. Then we have
$$\mathrm{SWR}(\mathcal{F}_{\mathrm{PE}}, \mathcal{F}_{\mathrm{all}}) = 1.$$
Theorem 5.6 tells that, while the permutation-equivariant (C)CE approximator class may not be able to find all the (C)CE in a game, it can keep the social welfare of the output solutions.
However, when considering equivariant NE approximators, we have the following negative result:
Theorem 5.7.
Given $n$ and $\mathcal{A}$, let $\mathcal{F}_{\mathrm{OPI}}$, $\mathcal{F}_{\mathrm{PPE}}$ and $\mathcal{F}_{\mathrm{OPI+PPE}}$ be the function classes (targeting games with $n$ players and action space $\mathcal{A}$) of all the NE approximators with OPI, with PPE, and with both, respectively. Denote the function class of all the NE approximators as $\mathcal{F}_{\mathrm{all}}$. Then we have
(1)
(2)
(3)
Additionally, for bimatrix games ($n = 2$), denote by $\mathcal{F}^{\mathrm{oracle}}_{\mathrm{OPI+PPE}}$ the function class of all the NE oracles (functions that always output exact NE solutions of the input games) with both PPE and OPI, and by $\mathcal{F}^{\mathrm{oracle}}_{\mathrm{all}}$ the function class of all the NE oracles. Then we have
(4) |
The proof is done by construction (see Section C.3 for details). As an illustration of Equation 4, consider the bimatrix game constructed in Section C.3.3, which is parameterized by a small positive constant. The maximum NE social welfare (attained at the upper-left corner of the payoff matrix) can be found by at least one NE oracle in $\mathcal{F}^{\mathrm{oracle}}_{\mathrm{all}}$. However, the only NE (at the lower-right corner) that the NE oracles in $\mathcal{F}^{\mathrm{oracle}}_{\mathrm{OPI+PPE}}$ could find has a social welfare that shrinks with the constant. As a result,
which goes to zero as . Recall that we always have , thus Equation 4 holds when and .
Theorem 5.7 tells that equivariant NE approximators may lose some social welfare while enjoying better generalizability. Such a result inspires us to balance generalizability and social welfare when designing the NE approximator architecture.
6 Conclusion and Future Work
In this paper, we theoretically analyze the benefits and limitations of equivariant equilibrium approximators, including player-permutation-equivariant (PPE) and opponent-permutation-invariant (OPI) NE approximators, and permutation-equivariant (PE) CE and CCE approximators. For the benefits, we first show that these equivariant approximators enjoy better generalizability. To get the result, we derive generalization bounds and sample complexities based on covering numbers, and then we prove that the equivariant approximators have lower covering numbers. We then show that the equivariant approximators can decrease exploitability when the payoff distribution is invariant under permutation. For the limitations, we find that the equivariant approximators may fail to find some equilibrium points due to their limited representativeness caused by equivariance. Besides, while equivariant (C)CE approximators preserve social welfare, the equivariant NE approximators can reach a small worst social-welfare ratio compared to general approximators. Such a result indicates that equivariance may reduce social welfare; therefore, we had better balance generalizability and social welfare when designing the architectures of NE approximators.
As for future work, since in our paper we assume the training and testing payoff distributions are the same, an interesting topic is to study the benefits of equivariant approximators under payoff distribution shift. Moreover, since we consider a fixed and discrete action space, another interesting direction is to analyze the benefits of equivariant approximators in varying or continuous action spaces.
References
- Bartlett and Mendelson [2002] Peter L Bartlett and Shahar Mendelson. Rademacher and gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3(Nov):463–482, 2002.
- Cesa-Bianchi and Lugosi [2006] Nicolo Cesa-Bianchi and Gábor Lugosi. Prediction, learning, and games. Cambridge university press, 2006.
- Chen et al. [2009] Xi Chen, Xiaotie Deng, and Shang-Hua Teng. Settling the complexity of computing two-player Nash equilibria. Journal of the ACM (JACM), 56(3):1–57, 2009.
- Daskalakis et al. [2009] Constantinos Daskalakis, Paul W Goldberg, and Christos H Papadimitriou. The complexity of computing a Nash equilibrium. SIAM Journal on Computing, 39(1):195–259, 2009.
- Deligkas et al. [2022] Argyrios Deligkas, Michail Fasoulakis, and Evangelos Markakis. A polynomial-time algorithm for 1/3-approximate Nash equilibria in bimatrix games. In 30th Annual European Symposium on Algorithms, ESA, 2022.
- Duan et al. [2021] Zhijian Duan, Dinghuai Zhang, Wenhan Huang, Yali Du, Jun Wang, Yaodong Yang, and Xiaotie Deng. Towards the PAC learnability of Nash equilibrium. arXiv preprint arXiv:2108.07472, 2021.
- Duan et al. [2022] Zhijian Duan, Jingwu Tang, Yutong Yin, Zhe Feng, Xiang Yan, Manzil Zaheer, and Xiaotie Deng. A context-integrated transformer-based neural network for auction design. In International Conference on Machine Learning, pages 5609–5626. PMLR, 2022.
- Dütting et al. [2019] Paul Dütting, Zhe Feng, Harikrishna Narasimhan, David Parkes, and Sai Srivatsa Ravindranath. Optimal auctions through deep learning. In International Conference on Machine Learning, pages 1706–1715. PMLR, 2019.
- Elesedy and Zaidi [2021] Bryn Elesedy and Sheheryar Zaidi. Provably strict generalisation benefit for equivariant models. In International Conference on Machine Learning, pages 2959–2969. PMLR, 2021.
- Feng et al. [2021] Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu, Stephen McAleer, Ying Wen, Jun Wang, and Yaodong Yang. Neural auto-curricula in two-player zero-sum games. Advances in Neural Information Processing Systems, 34:3504–3517, 2021.
- Fudenberg et al. [1998] Drew Fudenberg and David K Levine. The theory of learning in games, volume 2. MIT Press, 1998.
- Goktas and Greenwald [2022] Denizalp Goktas and Amy Greenwald. Exploitability minimization in games and beyond. In Advances in Neural Information Processing Systems, 2022.
- Greenwald et al. [2003] Amy Greenwald, Keith Hall, Roberto Serrano, et al. Correlated Q-learning. In ICML, volume 3, pages 242–249, 2003.
- Hartford et al. [2016] Jason S Hartford, James R Wright, and Kevin Leyton-Brown. Deep learning for predicting human strategic behavior. Advances in neural information processing systems, 29, 2016.
- Hu and Wellman [2003] Junling Hu and Michael P Wellman. Nash Q-learning for general-sum stochastic games. Journal of machine learning research, 4(Nov):1039–1069, 2003.
- Ivanov et al. [2022] Dmitry Ivanov, Iskander Safiulin, Igor Filippov, and Ksenia Balabaeva. Optimal-er auctions through attention. In Advances in Neural Information Processing Systems, 2022.
- Jin et al. [2022] Chi Jin, Qinghua Liu, Yuanhao Wang, and Tiancheng Yu. V-learning – a simple, efficient, decentralized algorithm for multiagent RL. In ICLR 2022 Workshop on Gamification and Multiagent Solutions, 2022.
- Lanctot et al. [2017] Marc Lanctot, Vinicius Zambaldi, Audrunas Gruslys, Angeliki Lazaridou, Karl Tuyls, Julien Pérolat, David Silver, and Thore Graepel. A unified game-theoretic approach to multiagent reinforcement learning. Advances in neural information processing systems, 30, 2017.
- Ling et al. [2018] C. Ling, Fei Fang, and J. Z. Kolter. What game are we playing? End-to-end learning in normal and extensive form games. In IJCAI, pages 396–402, 2018.
- Liu et al. [2022] Siqi Liu, Marc Lanctot, Luke Marris, and Nicolas Heess. Simplex neural population learning: Any-mixture bayes-optimality in symmetric zero-sum games. In International Conference on Machine Learning, ICML, 2022.
- Lockhart et al. [2019] Edward Lockhart, Marc Lanctot, Julien Pérolat, Jean-Baptiste Lespiau, Dustin Morrill, Finbarr Timbers, and Karl Tuyls. Computing approximate equilibria in sequential adversarial games by exploitability descent. In Sarit Kraus, editor, IJCAI, pages 464–470. ijcai.org, 2019.
- Marris et al. [2021] Luke Marris, Paul Muller, Marc Lanctot, Karl Tuyls, and Thore Graepel. Multi-agent training beyond zero-sum with correlated equilibrium meta-solvers. In International Conference on Machine Learning, pages 7480–7491. PMLR, 2021.
- Marris et al. [2022] Luke Marris, Ian Gemp, Thomas Anthony, Andrea Tacchetti, Siqi Liu, and Karl Tuyls. Turbocharging solution concepts: Solving NEs, CEs and CCEs with neural equilibrium solvers. In Advances in Neural Information Processing Systems, 2022.
- Nash et al. [1950] John F Nash et al. Equilibrium points in n-person games. Proceedings of the national academy of sciences, 36(1):48–49, 1950.
- Nekipelov et al. [2015] Denis Nekipelov, Vasilis Syrgkanis, and Eva Tardos. Econometrics for learning agents. In Proceedings of the sixteenth acm conference on economics and computation, pages 1–18, 2015.
- Nikaidô and Isoda [1955] Hukukane Nikaidô and Kazuo Isoda. Note on non-cooperative convex games. Pacific Journal of Mathematics, 5(S1):807–815, 1955.
- Qin et al. [2022] Tian Qin, Fengxiang He, Dingfeng Shi, Wenbing Huang, and Dacheng Tao. Benefits of permutation-equivariance in auction mechanisms. In Advances in Neural Information Processing Systems, 2022.
- Rahme et al. [2021] Jad Rahme, Samy Jelassi, Joan Bruna, and S Matthew Weinberg. A permutation-equivariant neural network architecture for auction design. In Proceedings of the AAAI Conference on Artificial Intelligence, 2021.
- Schulz-Mirbach [1994] Hanns Schulz-Mirbach. Constructing invariant features by averaging techniques. In Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 3-Conference C: Signal Processing (Cat. No. 94CH3440-5), volume 2, pages 387–390. IEEE, 1994.
- Sessa et al. [2020] Pier Giuseppe Sessa, Ilija Bogunovic, Andreas Krause, and Maryam Kamgarpour. Contextual games: Multi-agent learning with side information. Advances in Neural Information Processing Systems, 33:21912–21922, 2020.
- Shalev-Shwartz and Ben-David [2014] Shai Shalev-Shwartz and Shai Ben-David. Understanding machine learning: From theory to algorithms. Cambridge university press, 2014.
- Shoham and Leyton-Brown [2008] Yoav Shoham and Kevin Leyton-Brown. Multiagent systems: Algorithmic, game-theoretic, and logical foundations. Cambridge University Press, 2008.
- Tsaknakis and Spirakis [2007] Haralampos Tsaknakis and Paul G Spirakis. An optimization approach for approximate Nash equilibria. In International Workshop on Web and Internet Economics, pages 42–56. Springer, 2007.
- Wu and Lisser [2022] Dawen Wu and Abdel Lisser. Using CNN for solving two-player zero-sum games. Expert Systems with Applications, page 117545, 2022.
- Wu and Lisser [2023] Dawen Wu and Abdel Lisser. CCGnet: A deep learning approach to predict Nash equilibrium of chance-constrained games. Information Sciences, 2023.
Appendix A Omitted Proofs in Section 3
A.1 Useful Lemma
We first introduce a lemma, which will be frequently used in the following proofs.
Lemma A.1.
we have and
Proof.
Define . For product strategy ,
For joint strategy ,
∎
A.2 Proof of Lemma 3.2
See 3.2
Proof.
For player , we have
where holds since is a bijection on . For player , we have
From above, we have , thus if is a -NE of , then must be a -NE of . ∎
A.3 Proof of Lemma 3.5
See 3.5
CCE
For player , we have
For player , we have
Thus, we have . Thus, if is a -CCE of , then must be a -CCE of .
CE
For player , we have
For player , we define operator as . We can verify that is a bijection on , because is a homomorphism in the sense that and maps the identity mapping of to the identity mapping of . Specifically,
and
Based on , we have
Thus, we have , thus if is a -CE of , then must be a -CE of .
A.4 Proof of Lemma 3.7 to Lemma 3.9
See 3.7
Proof.
, for operator we have
where in we define , and holds since is a bijection on . As a result, is -PI.
For operator we have
therefore is -PE.
If is already -PI, we have
and according to definition of . Therefore, for -PI .
If is already -PE, we have
and according to definition of . Therefore, for -PE .
∎
See 3.8
Proof.
A direct inference from Lemma 3.7 ∎
See 3.9
Proof.
, we have
If is already -PE, we have
∎
A.5 Proof of Lemma 3.11
See 3.11
Proof.
We prove the three claims below.
-
1.
.
-
2.
.
-
3.
If , then
The first claim holds because is closed under , and the second claim holds because is idempotent. For the third claim, from we know , then .
We immediately know is the largest subset of that is invariant under . ∎
Appendix B Omitted Proofs in Section 4
B.1 Proof of Theorem 4.3
See 4.3
Some of the proof techniques come from Dütting et al. [2019] and Duan et al. [2021]. We first introduce some useful lemmas. Denote as the loss function (such as ). We measure the capacity of the composite function class using the empirical Rademacher complexity [Bartlett and Mendelson, 2002] on the training set , which is defined as:
where is distributed i.i.d. according to uniform distribution in . We have
Lemma B.1 (Shalev-Shwartz and Ben-David [2014]).
Let be a training set of size drawn i.i.d. from distribution over . Then with probability at least over draw of from , for all ,
Lemma B.2.
If for constant and , then we have
Proof.
For function class , let with be the function class that -covers for some . Similarly, , denote be the function that -covers . We have
(5) | ||||||
Combining Lemma B.1 and Equation 5, we get
∎
B.1.1 NE Approximator
Lemma B.3.
For arbitrary product mixed strategy and , we have
Proof.
, we define . Then, we have
Therefore, ,
Based on that, we get
Similarly, we also have
∎
B.1.2 CCE Approximator
Lemma B.4.
For arbitrary joint mixed strategy and , we have
Proof.
we have
(6) |
where holds since . Therefore, ,
Based on that, we get
Similarly, we also have
∎
B.1.3 CE Approximator
Lemma B.5.
For arbitrary joint mixed strategy and , we have
Proof.
, we have
Based on that, we get
Similarly, we also have
∎
B.2 Proof of Theorem 4.4
See 4.4
Proof.
For function class of NE, CE or CCE approximators, according to Lemma B.3, Lemma B.4 and Lemma B.5, we have
(7) |
where for NE approximators, and for CE and CCE approximators.
For simplicity, we denote and . let with be the function class that -covers for some . , by setting we have
where holds by Equation 7, holds by union bound, and holds by Hoeffding inequality. As a result, when , we have . ∎
B.3 Proof of Theorem 4.5
See 4.5
We first provide an auxiliary lemma.
Lemma B.6.
For function class and orbit averaging operator , if , then for any .
Proof.
, Denote as the smallest -covering set that covers with size . , let be the function that -covers . We have . Therefore, is a -covering set of , and we have . ∎
Proof of Theorem 4.5.
For player and , assuming is closed under any . For ,
Since , we have
(8) |
For ,
Since , we have
(9) |
For CE or CCE approximator and , we have
Since , we have
(10) |
Combining Lemma B.6, Equation 8, Equation 9 and Equation 10, we finish the proof. ∎
B.4 Proof of Theorem 4.6
See 4.6
We first prove a lemma about the property of and .
Lemma B.7.
and are convex on , i.e.
Proof.
We recall the definition for CCE approximator and for CE approximator. is linear on . Given , is also linear on . Moreover, the maximum operator on a set of linear functions will induce a convex function.
∎
B.5 Proof of Theorem 4.7
See 4.7
Proof.
We only prove for the -projected case; the proof for -projected case is similar and therefore omitted.
Recall
Denote , then
Then we have
For the first term,
Similarly, for the second term,
Above all,
∎
B.6 Proof of Theorem 4.8
See 4.8
We first introduce a useful lemma. It is about the property of
Lemma B.8.
is
-
1.
Linear on , i.e.
-
2.
Convex on , i.e.
Proof.
We recall the definition . Notice that is linear on for all , thus both and are linear on for any . Moreover, the maximum operator on a set of linear functions will induce a convex function.
∎
Proof of Theorem 4.8.
We prove the theorem in two steps.
Step 1
First, we show that
By definition,
Step 2
Then we show that
Since and , we have
∎
B.7 Proof of Theorem 4.9
See 4.9
Proof.
We prove the theorem in two steps, similar to the proof of Theorem 4.8.
Step 1
First we show that for player , let ,
This is because
Step 2
Then we show that if and
This is because
Since and , we have
∎
Appendix C Omitted Proofs in Section 5
C.1 Proof of Theorem 5.3
See 5.3
Proof.
Let be a PPE and OPI NE approximator. Denote . For player that , we get
(11) |
where holds since is permutable w.r.t. , holds by OPI of , and holds by PPE of . If can be found by , we will have , where holds by Equation 11. However, such result leads to a contradiction, because but .
Let be a PE (C)CE approximator. Denote , we have
(12) |
where holds since is permutable w.r.t. , holds by PE of . If can be found by , we will have , where holds by Equation 12. However, from we know , then , but . ∎
C.2 Proof of Theorem 5.6
See 5.6
Proof.
Assume is an (C)CE approximator that always finds the strategy that maximizes the social welfare. Afterward, we construct another that satisfies PE and always finds the strategy that maximizes social welfare. is constructed by orbit averaging:
thus is PE.
Denote as an arbitrary payoff distribution of such that is invariant under permutation and the cardinality of its support is finite. We have
Due to that , we have
Due to the arbitrariness of , we know that maximizes the social welfare w.r.t. any .
From above, we immediately know
∎
C.3 Proof of Theorem 5.7
See 5.7
C.3.1 Proof of Equation 1 and Equation 3
We first prove the theorem with respect to and
Step 1
On the one part, we prove
We prove this by construction.
Consider a game with player and for . , define the payoff as follows:
Define and as a uniform distribution on . Easy to certify that is a permutation-invariant distribution.
Let be the NE oracle that and for any , . Intuitively, the oracle will choose the action that will provide all players with revenue , leading to a social welfare of . Since each player has got her maximum possible utility, we have
(13) |
For any and , let for all player be the swap permutation that swaps actions and and keeps other actions still. Then for player . For , we have for arbitrary swap permutation . Since any permutation can be achieved by composition of swap permutations, we have , . Based on that, and by OPI of , we have , i.e. is a constant function on . Without loss of generality, we denote for all . Then
Additionally, we have for any . Based on that, we have
(14) |
Combining Equation 13 and Equation 14, we have
Due to , we immediately know
Step 2
On the other part, we prove
Define the maximum possible utility (MPU) for player with respect to utility and action as
(15) |
Define the set of maximum possible utility best response for player w.r.t. as
We first conduct some simplification to the target.
Then we constrain to be a cooperation game. For a normal form game , we define in which . Then we have , which means that constraining to be a cooperation game will induce the same social welfare. Then
Denote be the approximator that always outputs uniform strategy on for player . It’s obvious that is both OPI and PPE because the operations from to are all permutation-equivariant. Then,
Ignore the infimum and the expectation operator, consider for arbitrary , denote be the maximum element appeared in , then the denominator equals . But for the numerator, for player , no matter what action she chooses, she always has probability at least to achieve revenue , therefore inducing .
Then, , so as , and .
Above all,
C.3.2 Proof of Equation 2
We next prove the theorem with respect to that
Consider a bimatrix game and for . , define the payoff as follows:
Define and as a uniform distribution on . Easy to certify that and is a permutation-invariant distribution.
Let be the NE oracle that and for any , . Intuitively, the oracle will choose the action that will provide all players with revenue of , leading to a social welfare of .
For a permutation on , let be the corresponding permutation matrix. Denote as the set of all permutation matrices. As a result, and . Specifically, we have . For , denote . For a permutation in and payoff , by PPE of , we have and . Then we have
Therefore
Since and is a tensor with all elements equal to . Thus and
C.3.3 Proof of Equation 4
Consider a game as follows, where :
It is obvious that , and the corresponding strategy has been bolded. However, for NE oracles with both PPE and OPI, it can only output a unique NE with a pure strategy that induces utility .
Let , we have . From the analysis above we know if and , then , . We integrate the first two actions of player and player into a new action that will choose randomly between the first two actions, then we form the utility matrix below:
There is a unique NE in this Prisoner’s Dilemma, which has been bolded. The game is the same with the game under the assumption that and in . Then . Since can be arbitrarily small, we have . As a result, we have for all and .