This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Are Equivariant Equilibrium Approximators Beneficial?

Zhijian Duan1, Yunxuan Ma1, Xiaotie Deng1,2
1Center on Frontiers of Computing Studies, Peking University
2Center for Multi-Agent Research, Institute for AI, Peking University
{zjduan,charmingmyx,xiaotie}@pku.edu.cn
Abstract

Recently, remarkable progress has been made by approximating Nash equilibrium (NE), correlated equilibrium (CE), and coarse correlated equilibrium (CCE) through function approximation that trains a neural network to predict equilibria from game representations. Furthermore, equivariant architectures are widely adopted in designing such equilibrium approximators in normal-form games. In this paper, we theoretically characterize benefits and limitations of equivariant equilibrium approximators. For the benefits, we show that they enjoy better generalizability than general ones and can achieve better approximations when the payoff distribution is permutation-invariant. For the limitations, we discuss their drawbacks in terms of equilibrium selection and social welfare. Together, our results help to understand the role of equivariance in equilibrium approximators.

1 Introduction

The equivariant equilibrium property states that, given a Nash Equilibrium (NE) solution of a game, the permuted solution is also an NE for the game whose actions of representation are permuted in the same way. The same property also holds in correlated equilibrium (CE) and coarse correlated equilibrium (CCE), as well as the approximate solutions for all three solution concepts.

In this paper, we are interested in understanding the equivariant equilibrium property in designing neural networks that predict equilibria from game payoffs, following such recent approaches in designing equivariant equilibrium approximators (Feng et al., 2021; Marris et al., 2022) in normal-form games. Informally, such equivariant approximators keep the same permutation of the output strategies (represented as vectors or tensors) when the input game representations (e.g., the game payoff tensors) are permuted 111We will provide a formal definition of equivariance equilibrium approximators in Section 3. While equivariant approximators achieved empirical success, little work has theoretically discussed whether they are beneficial.

1.1 Our Contributions

We theoretically characterize benefits and limitations of equivariant NE, CE and CCE approximators. For the benefits, we first show that equivariant approximators enjoy better generalizability, where we evaluate the approximators using the maximum exploitability (Lockhart et al., 2019; Goktas and Greenwald, 2022) over all players. To get such a result, we derive the generalization bounds and the sample complexities of the NE, CE, and CCE approximators: The generalization bounds offer confidence intervals on the expected testing approximations based on the empirical training approximations; The sample complexities describe how many training samples the equilibrium approximators need to achieve desirable generalizability. The generalization bounds and sample complexities include the covering numbers (Shalev-Shwartz and Ben-David, 2014), which measure the representativeness of the approximators’ function classes. Afterward, we prove that the equivariant approximators have lower covering numbers than the general models, therefore have lower generalization bounds and sample complexities. We then show that the equivariant approximators can achieve better approximation when the payoff distribution is permutation-invariant.

As for the limitations, we find the equivariant approximators unable to find all the equilibria of some normal-form games. Such a result is caused by the limited representativeness of the equivariant approximators’ function class. Besides, we find that the equivariant NE approximator may lose social welfare. Specifically, in an example we constructed, while the maximum NE social welfare is large, the maximum social welfare of NEs that the equivariant NE approximators could find can be arbitrary close to zero. Such a negative result inspires us to balance generalizability and social welfare when we design the approximators’ architectures.

1.2 Further Related Work

Solving (approximate) NE, CE, and CCE for a single game are well studied (Fudenberg et al., 1998; Cesa-Bianchi and Lugosi, 2006). However, many similar games often need to be solved (harris2023metalearning) , both in practice and in some multi-agent learning algorithms (Marris et al., 2021; Liu et al., 2022). For instance, in repeated traffic routing games (Sessa et al., 2020), the payoffs of games depend on the capacity of the underlying network, which can vary with time and weather conditions. In repeated sponsored search auctions, advertisers value different keywords based on the current marketing environment (Nekipelov et al., 2015). In many multi-agent learning algorithms such as Nash Q-learning (Hu and Wellman, 2003), Correlated-Q learning (Greenwald et al., 2003), V-learning (Jin et al., 2022) and PSRO (Lanctot et al., 2017), an NE, CE or CCE of a normal-form game need to be solved in every update step.

In these settings, it is preferred to accelerate the speed of game solving by function approximation: Marris et al. (2022) introduces a neural equilibrium approximator to approximate CE and CCE for nn-player normal-form games; Feng et al. (2021) proposes a neural NE approximator in PSRO (Lanctot et al., 2017); Wu and Lisser (2022) designs a CNN-based NE approximator for zero-sum bimatrix games. Differentiable approximators have also been developed to learn QREs (Ling et al., 2018), NE in chance-constrained games (Wu and Lisser, 2023), and opponent’s strategy (Hartford et al., 2016).

Equivariance is an ideal property of the equilibrium approximator. We will discuss the literates of equivariant approximators after formally defining them in Section 3.

1.3 Organization

The rest of our paper is organized as follows: In Section 2 we introduce the preliminary of game theory and equilibrium approximators. In Section 3 we formally define the equivariance of equilibrium approximators. We present our theoretical analysis of benefits in Section 4 and limitations in Section 5. We conclude and point out the future work in Section 6.

2 Preliminary

In this section, we introduce the preliminary and notations of our paper. We also provide a brief introduction to equilibrium approximators.

2.1 Game Theory

Normal-Form Game

Let a normal-form game with joint payoff uu be Γu=(n,𝒜,u)\Gamma_{u}=(n,\mathcal{A},u), in which

  • n2n\in\mathbb{N}_{\geq 2} is the number of players. Each player is represented by the index i[n]i\in[n].

  • 𝒜=×i[n]𝒜i\mathcal{A}=\times_{i\in[n]}\mathcal{A}_{i} is the product action space of all players, where 𝒜i={1,2,,mi}\mathcal{A}_{i}=\{1,2,\dots,m_{i}\}. For player i[n]i\in[n], let ai𝒜ia_{i}\in\mathcal{A}_{i} be a specific action of ii (An action is also referred to as a pure strategy). A joint action a=(a1,a2,,an)𝒜a=(a_{1},a_{2},\dots,a_{n})\in\mathcal{A} represents one play of the game in which the player ii takes action aia_{i}. The action space 𝒜\mathcal{A} is a Cartesian product that contains all possible joint actions. We have |𝒜|=i[n]|𝒜i|=i[n]mi|\mathcal{A}|=\prod_{i\in[n]}|\mathcal{A}_{i}|=\prod_{i\in[n]}m_{i}.

  • u=(ui)i[n]u=(u_{i})_{i\in[n]} is the joint payoff or utility of the game. uiu_{i} is an nn-dimensional tensor (or matrix if n=2n=2) describing player ii’s payoff on each joint action. In our paper, following previous literatures (Tsaknakis and Spirakis, 2007; Deligkas et al., 2022), we normalize all the elements of payoff into [0,1][0,1].

A joint (mixed) strategy is a distribution over 𝒜\mathcal{A}. Let σ=×i[n]σi\sigma=\times_{i\in[n]}\sigma_{i} be a product strategy and πΔ𝒜\pi\in\Delta\mathcal{A} be a joint (possibly correlated) strategy. Denote πi\pi_{i} as the marginal strategy of player ii in π\pi. The expected utility of player ii under π\pi is

ui(π)=𝔼aπ[ui(a)]=a𝒜π(a)ui(a).u_{i}(\pi)=\mathbb{E}_{a\sim\pi}[u_{i}(a)]=\sum_{a\in\mathcal{A}}\pi(a)u_{i}(a).

Besides, on behalf of player ii, the other players’ joint strategy is denoted as πi\pi_{-i}, so as aia_{-i} and σi\sigma_{-i}.

Nash Equilibrium (NE)

We say a product strategy σ=(σ1,σ2,,σn)\sigma^{*}=(\sigma^{*}_{1},\sigma^{*}_{2},\dots,\sigma^{*}_{n}) is a NE if each player’s strategy is the best response given the strategies of others, i.e.,

ui(σi,σi)ui(σi,σi),i[n],σiΔ𝒜i.u_{i}(\sigma_{i},\sigma^{*}_{-i})\leq u_{i}(\sigma^{*}_{i},\sigma^{*}_{-i}),\leavevmode\nobreak\ \forall i\in[n],\sigma_{i}\in\Delta\mathcal{A}_{i}. (NE)

Computing NE for even general 22-player or 33-player games is PPAD-hard (Chen et al., 2009; Daskalakis et al., 2009), which leads to research on approximate solutions. For arbitrary ϵ>0\epsilon>0, we say a product strategy σ^\hat{\sigma} is an ϵ\epsilon-approximate Nash equilibrium (ϵ\epsilon-NE) if no one can achieve more than ϵ\epsilon utility gain by deviating from her current strategy. Formally,

ui(σi,σ^i)ui(σ^i,σ^i)+ϵ,i[n],σiΔ𝒜i.u_{i}(\sigma_{i},\hat{\sigma}_{-i})\leq u_{i}(\hat{\sigma}_{i},\hat{\sigma}_{-i})+\epsilon,\leavevmode\nobreak\ \forall i\in[n],\sigma_{i}\in\Delta\mathcal{A}_{i}. (ϵ\epsilon-NE)

The definition of ϵ\epsilon-NE reflects the idea that players might not be willing to deviate from their strategies when the amount of utility they could gain by doing so is tiny (not more than ϵ\epsilon).

Coarse Correlated Equilibrium (CCE)

We say a joint (possibly correlated) strategy π\pi^{*} is a CCE if no player can receive a higher payoff by acting independently, i.e.,

ui(σi,πi)ui(π),i[n],σiΔ𝒜i,u_{i}(\sigma_{i},\pi^{*}_{-i})\leq u_{i}(\pi^{*}),\leavevmode\nobreak\ \forall i\in[n],\sigma_{i}\in\Delta\mathcal{A}_{i}, (CCE)

and we say π^\hat{\pi} is an ϵ\epsilon-approximate coarse correlated equilibrium (ϵ\epsilon-CCE) for ϵ>0\epsilon>0 if

ui(σi,π^i)ui(π^)+ϵ,i[n],σiΔ𝒜i,u_{i}(\sigma_{i},\hat{\pi}_{-i})\leq u_{i}(\hat{\pi})+\epsilon,\leavevmode\nobreak\ \forall i\in[n],\sigma_{i}\in\Delta\mathcal{A}_{i}, (ϵ\epsilon-CCE)

The difference between NE and CCE is that in an NE, players execute their strategy individually in a decentralized way, while in a CCE, players’ strategies are possibly correlated. A standard technique to correlate the strategy is sending each player a signal from a centralized controller  (Shoham and Leyton-Brown, 2008).

Correlated Equilibrium (CE)

CE is similar to CCE, except that in a CE, each player can observe her recommended action before she acts. Thus, player ii deviates her strategy through strategy modification ϕi:𝒜i𝒜i\phi_{i}:\mathcal{A}_{i}\to\mathcal{A}_{i}. ϕi\phi_{i} maps actions in 𝒜i\mathcal{A}_{i} to possibly different actions in 𝒜i\mathcal{A}_{i}. Based on strategy modification, we say a joint (possibly correlated) strategy π\pi^{*} is a CE if

a𝒜π(a)ui(ϕi(ai),ai)ui(π),i,ϕi,\sum_{a\in\mathcal{A}}\pi^{*}(a)u_{i}(\phi_{i}(a_{i}),a_{-i})\leq u_{i}(\pi^{*}),\leavevmode\nobreak\ \forall i,\forall\phi_{i}, (CE)

and a joint strategy π^\hat{\pi} is an ϵ\epsilon-approximate correlated equilibrium (ϵ\epsilon-CE) for ϵ>0\epsilon>0 if

a𝒜π^(a)ui(ϕi(ai),ai)ui(π^)+ϵ,i,ϕi,\sum_{a\in\mathcal{A}}\hat{\pi}(a)u_{i}(\phi_{i}(a_{i}),a_{-i})\leq u_{i}(\hat{\pi})+\epsilon,\leavevmode\nobreak\ \forall i,\forall\phi_{i}, (ϵ\epsilon-CE)

Note that for a finite nn-player normal-form game, at least one NE, CE, and CCE must exist. This is because NE always exists (Nash et al., 1950) and NE \subseteq CE \subseteq CCE.

Equilibrium Approximation

To evaluate the quality of a joint strategy to approximate an equilibrium, we define approximation based on exploitability (Lockhart et al., 2019; Goktas and Greenwald, 2022).

Definition 2.1 (Exploitability and Approximation).

Given a joint strategy π\pi, the exploitability (or regret) i(π,u)\mathcal{E}_{i}(\pi,u) of player ii is the maximum payoff gain of ii by deviating from her current strategy, i.e.,

i(π,u)\displaystyle\mathcal{E}_{i}(\pi,u) maxσiui(σi,πi)ui(π)=maxaiui(ai,πi)ui(π)\displaystyle\coloneqq\max_{\sigma^{\prime}_{i}}u_{i}(\sigma^{\prime}_{i},\pi_{-i})-u_{i}(\pi)=\max_{a^{\prime}_{i}}u_{i}(a^{\prime}_{i},\pi_{-i})-u_{i}(\pi)

and the exploitability under strategy modification iCE(π,u)\mathcal{E}_{i}^{\mathrm{CE}}(\pi,u) of player ii is the maximum payoff gain of ii by deviating through strategy modification, i.e.,

iCE(π,u)maxϕia𝒜π(a)ui(ϕi(ai),ai)ui(π).\mathcal{E}_{i}^{\mathrm{CE}}(\pi,u)\coloneqq\max_{\phi_{i}}\sum_{a\in\mathcal{A}}\pi(a)u_{i}(\phi_{i}(a_{i}),a_{-i})-u_{i}(\pi).

The equilibrium approximation is defined as the maximum exploitability over all players 222 A similar metric of equilibrium approximation is called Nikaido-Isoda function (Nikaidô and Isoda, 1955) or NashConv (Lockhart et al., 2019), which is the sum of exploitability over all players, i.e., i[n]i(π,u)\sum_{i\in[n]}\mathcal{E}_{i}(\pi,u). , i.e.,

(π,u){maxi[n]i(π,u),for NE and CCEmaxi[n]iCE(π,u),for CE\mathcal{E}(\pi,u)\coloneqq\begin{cases}\max_{i\in[n]}\mathcal{E}_{i}(\pi,u)&,\text{for NE and CCE}\\ \max_{i\in[n]}\mathcal{E}^{\mathrm{CE}}_{i}(\pi,u)&,\text{for CE}\end{cases}

Based on approximation, we can restate the definition of solution concepts. A product strategy σ\sigma is an NE of game Γu\Gamma_{u} if (σ,u)=0\mathcal{E}(\sigma,u)=0 and is an ϵ\epsilon-NE if (σ,u)ϵ\mathcal{E}(\sigma,u)\leq\epsilon. A joint strategy π\pi is a (C)CE of Γu\Gamma_{u} if (π,u)=0\mathcal{E}(\pi,u)=0 and is an ϵ\epsilon-(C)CE if (π,u)ϵ\mathcal{E}(\pi,u)\leq\epsilon.

2.2 Equilibrium Approximator

The equilibrium approximators, including NE, CE, and CCE approximators, aim to predict solution concepts from game representations. In our paper, we fix the number of players nn and action space 𝒜\mathcal{A}. We denote by 𝒰\mathcal{U} the set of all the possible game payoffs. The NE approximator fNE:𝒰×i[n]Δ𝒜if^{\mathrm{NE}}:\mathcal{U}\to\times_{i\in[n]}\Delta\mathcal{A}_{i} maps a game payoff to a product strategy, where fNE(u)iΔ𝒜if^{\mathrm{NE}}(u)_{i}\in\Delta\mathcal{A}_{i} is player ii’s predicted strategy. We define NE\mathcal{F}^{\mathrm{NE}} as the function class of the NE approximator. Similarly, we define (C)CE approximator as f(C)CE:𝒰Δ𝒜f^{\mathrm{(C)CE}}:\mathcal{U}\to\Delta\mathcal{A} and (C)CE approximator class as (C)CE\mathcal{F}^{\mathrm{(C)CE}}.

Algorithm 1 Example: learning NE/CCE approximator via minibatch SGD
1:  Input: Training set SS
2:  Parameters: Number of iterations T>0T>0, batch size B>0B>0, learning rate η>0\eta>0, initial parameters w0dw_{0}\in\mathbb{R}^{d} of the approximator model.
3:  for t= 0t\leavevmode\nobreak\ =\leavevmode\nobreak\ 0 to TT do
4:     Receive minibatch St={u(1),,u(B)}S{S}_{t}\,=\,\{u^{(1)},\ldots,u^{(B)}\}\subset S
5:     Compute the empirical average approximation of StS_{t}:
6:         LSt(fwt)1Bi=1B(fwt(u(i)),u(i))L_{S_{t}}(f^{w_{t}})\leftarrow\frac{1}{B}\sum_{i=1}^{B}\mathcal{E}(f^{w_{t}}(u^{(i)}),u^{(i)})
7:     Update model parameters:
8:         wt+1wtηwtLSt(fwt)w_{t+1}\leftarrow w_{t}-\eta\nabla_{w_{t}}L_{S_{t}}(f^{w_{t}})
9:  end for

An equilibrium approximator can be learned through machine learning paradigms based on empirical data. For instance, Feng et al. (2021) learn the NE approximator using the game payoffs generated in the learning process of PSRO, and optimize the approximator by gradient descent in exploitability. Marris et al. (2022) learn the CE and CCE approximators using the games i.i.d. generated from a manually designed distribution, and optimize the approximators using maximum welfare minimum relative entropy loss. Such a loss balances the equilibrium approximation, the social welfare, and the relative entropy of the joint strategy. Additionally, another simple way to learn NE or CCE equilibrium approximator is to apply minibatch stochastic gradient descent (SGD) on approximation. Specifically, we denote wdw\in\mathbb{R}^{d} as the dd-dimensional parameter of the approximator, such as the weights of the neural network. We can optimize ww by the standard minibatch SGD algorithm on approximation (See Algorithm 1).

3 Equivariant Equilibrium Approximator

In this section, we introduce the equivariance of the equilibrium approximators and the way how we apply orbit averaging (Elesedy and Zaidi, 2021) to construct equivariant approximators. Equivariant approximator has been developed in many literatures (Hartford et al., 2016; Feng et al., 2021; Marris et al., 2022; Wu and Lisser, 2022), which we will discuss latter.

We first define the permutation of a game. Let ρi:𝒜i𝒜i\rho_{i}:\mathcal{A}_{i}\to\mathcal{A}_{i} be a permutation function of player ii, which is a bijection from 𝒜i\mathcal{A}_{i} to 𝒜i\mathcal{A}_{i} itself. Define 𝒢iρi\mathcal{G}_{i}\ni\rho_{i} as the class of permutation function for player ii, which forms an Abelian group.

Definition 3.1 (Permutation of a game).

For a normal-form game Γu=(n,u,𝒜)\Gamma_{u}=(n,u,\mathcal{A}), we define the ρi\rho_{i}-permutation of payoff tensor uu as ρiu=(ρiuj)j[n]\rho_{i}u=(\rho_{i}u_{j})_{j\in[n]}, in which

(ρiuj)(ai,ai)=uj(ρi1(ai),ai),a𝒜.(\rho_{i}u_{j})(a_{i},a_{-i})=u_{j}(\rho_{i}^{-1}(a_{i}),a_{-i}),\leavevmode\nobreak\ \forall a\in\mathcal{A}.

We also define the ρi\rho_{i}-permutation of joint strategy π\pi as ρiπ\rho_{i}\pi, where

(ρiπ)(ai,ai)=π(ρi1(ai),ai),a𝒜,(\rho_{i}\pi)(a_{i},a_{-i})=\pi(\rho_{i}^{-1}(a_{i}),a_{-i}),\leavevmode\nobreak\ \forall a\in\mathcal{A},

and the ρi\rho_{i}-permutation of product strategy σ\sigma as ρiσ=(ρiσj)j[n]\rho_{i}\sigma=(\rho_{i}\sigma_{j})_{j\in[n]}, where

aj𝒜j,ρiσj(aj)={σj(aj),if jiσi(ρi1ai),if j=i\forall a_{j}\in\mathcal{A}_{j},\rho_{i}\sigma_{j}(a_{j})=\begin{cases}\sigma_{j}(a_{j})&,\text{if }j\neq i\\ \sigma_{i}(\rho_{i}^{-1}a_{i})&,\text{if }j=i\\ \end{cases}
Equivariant NE Approximator

Considering the relationship of ρi\rho_{i}-permuted game and ρi\rho_{i}-permuted product strategy, we have the following result for NE:

Lemma 3.2.

In a normal-form game Γu=(n,u,𝒜)\Gamma_{u}=(n,u,\mathcal{A}), for arbitrary player i[n]i\in[n] and any (ϵ\epsilon-)NE strategy σ=(σi,σi)\sigma=(\sigma_{i},\sigma_{-i}), ρiσ=(ρiσi,σi)\rho_{i}\sigma=(\rho_{i}\sigma_{i},\sigma_{-i}) is also an (ϵ\epsilon-)NE for the ρi\rho_{i}-permuted game Γρiu\Gamma_{\rho_{i}u}.

3.2 tells the unimportance of action order with respect to NE approximation. Based on it, we can summarize two ideal properties for NE approximators, which are defined as follows:

Definition 3.3 (Player-Permutation-Equivariance, PPE).

We say an NE approximator fNEf^{\mathrm{NE}} satisfies player ii permutation-equivariant (ii-PE) if for arbitrary permutation function ρi𝒢i\rho_{i}\in\mathcal{G}_{i} we have

fNE(ρiu)i=ρifNE(u)i,f^{\mathrm{NE}}(\rho_{i}u)_{i}=\rho_{i}f^{\mathrm{NE}}(u)_{i}, (ii-PE)

Moreover, we say fNEf^{\mathrm{NE}} is player-permutation-equivariant (PPE) if fNEf^{\mathrm{NE}} satisfies ii-PE for all player i[n]i\in[n].

Definition 3.4 (Opponent-Permutation-Invariance, OPI).

We say an NE approximator fNEf^{\mathrm{NE}} is opponent ii permutation-invariant (ii-PI) if for any other player j[n]{i}j\in[n]-\{i\} and arbitrary permutation function ρi𝒢i\rho_{i}\in\mathcal{G}_{i} we have

fNE(ρiu)j=fNE(u)j,jif^{\mathrm{NE}}(\rho_{i}u)_{j}=f^{\mathrm{NE}}(u)_{j},\forall j\neq i (ii-PI)

and we say fNEf^{\mathrm{NE}} is opponent-permutation-invariant (OPI) if fNEf^{\mathrm{NE}} satisfies ii-PI for all player i[n]i\in[n].

Equivariant (C)CE approximator

Considering the relationship of ρi\rho_{i}-permuted game and ρi\rho_{i}-permuted joint strategy, we have a similar result for CE and CCE:

Lemma 3.5.

In a normal-form game Γu=(n,u,𝒜)\Gamma_{u}=(n,u,\mathcal{A}), for an arbitrary player i[n]i\in[n] and any (ε\varepsilon-)CE or (ϵ\epsilon-)CCE strategy π\pi, ρiπ\rho_{i}\pi is also an (ε\varepsilon-)CE or (ϵ\epsilon-)CCE for the ρi\rho_{i}-permuted game Γρiu\Gamma_{\rho_{i}u}.

Inspired by Lemma 3.5, we can also summarize an ideal property for CE and CCE approximators defined as follows.

Definition 3.6 (Permutation-Equivariance,PE).

We say an (C)CE approximator f(C)CEf^{\text{(C)CE}} is player ii permutation-equivariant (ii-PE) if for permutation function ρi𝒢i\rho_{i}\in\mathcal{G}_{i} we have

f(C)CE(ρiu)=ρif(C)CE(u),f^{\mathrm{(C)CE}}(\rho_{i}u)=\rho_{i}f^{\mathrm{(C)CE}}(u),

and we say f(C)CEf^{\text{(C)CE}} is permutation-equivariant (PE) if f(C)CEf^{\text{(C)CE}} satisfies ii-PE for all player i[n]i\in[n].

Equivariant Approximators in Literature

For two-player games, Feng et al. (2021) propose an MLP-based NE approximator that satisfies both PPE and OPI for zero-sum games. Additionally, they also design a Conv11d-based NE approximator that satisfies PPE only. Hartford et al. (2016) give a PPE approximator to predict players’ strategies. The traditional algorithms Tsaknakis and Spirakis (2007) and Deligkas et al. (2022), which approximate NE by optimization, are also PPE and OPI to payoff and the initial strategies. For nn-player general games, Marris et al. (2022) provide a permutation-equivariant approximator to approximate CE and CCE. Equivariant architectures are also adopted in optimal auction design (Rahme et al., 2021; Duan et al., 2022; Ivanov et al., 2022), and Qin et al. (2022) theoretically characterize the benefits of permutation-equivariant in auction mechanisms. We follow the rough idea of Qin et al. (2022) when we analyze the benefits of equivariant equilibrium approximators.

3.1 Orbit Averaging

Orbit averaging is a well-known method to enforce equivariance or invariance for a function (Schulz-Mirbach, 1994). It averages the inputs of a function over the orbit of a group (e.g., the permutation group in our paper).

Orbit Averaging for NE Approximator

For an NE approximator fNEf^{\mathrm{NE}} and any player i[n]i\in[n], we can construct a ii-PI or ii-PE NE approximator by averaging with respect to all the permutations of player ii. Specifically, we construct an ii-PI NE approximator by operator 𝒪i\mathcal{O}_{i} with

(𝒪ifNE)(u)j={fNE(u)i,if j=i1|𝒜i|!ρi𝒢ifNE(ρiu)j,otherwise(\mathcal{O}_{i}f^{\mathrm{NE}})(u)_{j}=\begin{cases}f^{\mathrm{NE}}(u)_{i}&,\text{if }j=i\\ \frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}f^{\mathrm{NE}}(\rho_{i}u)_{j}&,\text{otherwise}\end{cases}

and we construct an ii-PE NE approximator by operator 𝒫i\mathcal{P}_{i} with:

(𝒫ifNE)(u)j={1|𝒜i|!ρi𝒢iρi1fNE(ρiu)i,if j=ifNE(u)j,otherwise(\mathcal{P}_{i}f^{\mathrm{NE}})(u)_{j}=\begin{cases}\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\rho_{i}^{-1}f^{\mathrm{NE}}(\rho_{i}u)_{i}&,\text{if }j=i\\ f^{\mathrm{NE}}(u)_{j}&,\text{otherwise}\end{cases}
Lemma 3.7.

𝒪ifNE\mathcal{O}_{i}f^{\mathrm{NE}} is ii-PI and 𝒫ifNE\mathcal{P}_{i}f^{\mathrm{NE}} is ii-PE. Specially, if fNEf^{\mathrm{NE}} is already ii-PI or ii-PE, then we have 𝒪ifNE=fNE\mathcal{O}_{i}f^{\mathrm{NE}}=f^{\mathrm{NE}} or 𝒫ifNE=fNE\mathcal{P}_{i}f^{\mathrm{NE}}=f^{\mathrm{NE}}, respectively.

To construct a PPE or OPI NE approximator, we composite the operators with respect to all players. Let 𝒪=𝒪1𝒪2𝒪n\mathcal{O}=\mathcal{O}_{1}\circ\mathcal{O}_{2}\circ\dots\circ\mathcal{O}_{n} and 𝒫=𝒫1𝒫2𝒫n\mathcal{P}=\mathcal{P}_{1}\circ\mathcal{P}_{2}\circ\dots\circ\mathcal{P}_{n}, we get the following corollary:

Lemma 3.8.

𝒪fNE\mathcal{O}f^{\mathrm{NE}} is OPI and 𝒫fNE\mathcal{P}f^{\mathrm{NE}} is PPE. If fNEf^{\mathrm{NE}} is already OPI or PPE, we have 𝒪fNE=fNE\mathcal{O}f^{\mathrm{NE}}=f^{\mathrm{NE}} or 𝒫fNE=fNE\mathcal{P}f^{\mathrm{NE}}=f^{\mathrm{NE}}, respectively.

Furthermore, we can also compose 𝒫𝒪\mathcal{P}\circ\mathcal{O} to construct a NE approximator with both PPE and OPI.

Orbit Averaging for (C)CE Approximator

For CE or CCE approximator ff, we define 𝒬i\mathcal{Q}_{i}-project for player i[n]i\in[n] to construct an ii-PE approximator, which averages with respect to all the permutations of player ii.

(𝒬if(C)CE)(u)=1|𝒜i|!ρi𝒢iρi1f(C)CE(ρiu)(\mathcal{Q}_{i}f^{\text{(C)CE}})(u)=\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\rho_{i}^{-1}f^{\text{(C)CE}}(\rho_{i}u)

Similarly, we define 𝒬=𝒬1𝒬2𝒬n\mathcal{Q}=\mathcal{Q}_{1}\circ\mathcal{Q}_{2}\circ\dots\circ\mathcal{Q}_{n} as the composite operator.

Lemma 3.9.

𝒬if(C)CE\mathcal{Q}_{i}f^{\text{(C)CE}} is ii-PE and 𝒬f(C)CE\mathcal{Q}f^{\text{(C)CE}} is PE. Specifically, If f(C)CEf^{\text{(C)CE}} is already ii-PE or PE, then we have 𝒬if(C)CE=f(C)CE\mathcal{Q}_{i}f^{\text{(C)CE}}=f^{\text{(C)CE}} or 𝒬f(C)CE=f(C)CE\mathcal{Q}f^{\text{(C)CE}}=f^{\text{(C)CE}}, respectively.

Combined with Lemma 3.7, Lemma 3.8 and Lemma 3.9, we can have the following corollary directly.

Corollary 3.10.

𝒪2=𝒪,𝒫2=𝒫,𝒬2=𝒬\mathcal{O}^{2}=\mathcal{O},\mathcal{P}^{2}=\mathcal{P},\mathcal{Q}^{2}=\mathcal{Q}.

The benefit of using orbit averaging is shown in the following lemma:

Lemma 3.11.

Denote 𝒳\mathcal{X} as an idempotent operator, i.e. 𝒳2=𝒳\mathcal{X}^{2}=\mathcal{X} (e.g. 𝒪,𝒫\mathcal{O},\mathcal{P} or 𝒬\mathcal{Q}). For function class \mathcal{F} of NE, CE or CCE approximator, let 𝒳\mathcal{F}_{\mathcal{X}} be any subset of \mathcal{F} that is closed under 𝒳\mathcal{X}, then 𝒳𝒳\mathcal{X}\mathcal{F}_{\mathcal{X}} is the largest subset of 𝒳\mathcal{F}_{\mathcal{X}} that is invariant under 𝒳\mathcal{X}.

According to Lemma 3.8, Lemma 3.9 and Lemma 3.11, 𝒪NE\mathcal{O}\mathcal{F}^{\mathrm{NE}}(or 𝒫NE\mathcal{P}\mathcal{F}^{\mathrm{NE}}/𝒬(C)CE\mathcal{Q}\mathcal{F}^{\mathrm{(C)CE}}) is the largest subset of NE\mathcal{F}^{\mathrm{NE}}(or NE\mathcal{F}^{\mathrm{NE}}/(C)CE\mathcal{F}^{\mathrm{(C)CE}}) with the corresponding property (OPI, PPE, or PE) if NE\mathcal{F}^{\mathrm{NE}}(or NE\mathcal{F}^{\mathrm{NE}}/(C)CE\mathcal{F}^{\mathrm{(C)CE}}) is closed operator under 𝒪\mathcal{O}(or 𝒫\mathcal{P}/𝒬\mathcal{Q}). The result tells that the orbit averaging operators, while enforcing the operated function to be equivariance or invariance, keep as large capacity of the function class as possible. Therefore, we believe that orbit averaging is an ideal approach to constructing equivariant or invariant functions.

4 Theoretical Analysis of Benefits

In this section, we theoretically analyze the benefits of equivariant approximators with respect to generalizability and approximation.

4.1 Benefits for Generalization

We first derive the generalization bound and sample complexity for general approximator classes, and then we show the benefits of equivariant approximators by applying orbit averaging to the approximators.

The representativeness of an approximator class is measured by the covering numbers (Shalev-Shwartz and Ben-David, 2014) under \ell_{\infty}-distance, which are defined as follows:

Definition 4.1 (\ell_{\infty}-distance).

The \ell_{\infty}-distance between two equilibrium approximators f,gf,g is:

(f,g)=maxu𝒰f(u)g(u),\ell_{\infty}(f,g)=\max_{u\in\mathcal{U}}{\|f(u)-g(u)\|},

where we define the distance of two product strategies σ\sigma and σ\sigma^{\prime} as

σ1σ2=maxi[n]ai𝒜i|σi1(ai)σi2(ai)|{\|\sigma^{1}-\sigma^{2}\|}=\max_{i\in[n]}\sum_{a_{i}\in\mathcal{A}_{i}}|\sigma^{1}_{i}(a_{i})-\sigma^{2}_{i}(a_{i})|

and the distance of two joint strategy π\pi and π\pi^{\prime} as

π1π2=a𝒜|π1(a)π2(a)|{\|\pi^{1}-\pi^{2}\|}=\sum_{a\in\mathcal{A}}|\pi^{1}(a)-\pi^{2}(a)|
Definition 4.2 (rr-covering number).

For r>0r>0, we say function class r\mathcal{F}_{r} rr-covers another function class \mathcal{F} under \ell_{\infty}-distance if for all function ff\in\mathcal{F}, there exists frrf_{r}\in\mathcal{F}_{r} such that ffrr{\|f-f_{r}\|}_{\infty}\leq r. The rr-covering number 𝒩(,r)\mathcal{N}_{\infty}(\mathcal{F},r) of \mathcal{F} is the cardinality of the smallest function class r\mathcal{F}_{r} that rr-covers \mathcal{F} under \ell_{\infty}-distance.

Based on covering numbers, we provide the generalization bounds of NE, CE and CCE approximators. The bounds describe the difference between the expected testing approximation and empirical training approximation.

Theorem 4.3.

[Generalization bound] For function class \mathcal{F} of NE, CE or CCE approximator, with probability at least 1δ1-\delta over draw of the training set SS (with size mm) from payoff distribution 𝒟\mathcal{D}, for all approximator ff\in\mathcal{F} we have

𝔼u𝒟[(f(u),u)]1muS(f(u),u)2infr>0{2ln𝒩(,r)m+Lr}+42ln(4/δ)m,\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}(f(u),u)]-\frac{1}{m}\sum_{u\in S}\mathcal{E}(f(u),u)\leq 2\cdot\inf_{r>0}\{\sqrt{\frac{2\ln\mathcal{N}_{\infty}(\mathcal{F},r)}{m}}+Lr\}+4\sqrt{\frac{2\ln(4/\delta)}{m}},

where L=2nL=2n for NE approximator, and L=2L=2 for CE and CCE approximators.

To get the theorem, we first show that all three equilibrium approximations are Lipschitz continuous with respect to strategies. Afterward, we derive the Rademacher complexity (Bartlett and Mendelson, 2002) of the expected approximation based on the Lipschitz continuity and covering numbers. See Section B.1 for the detailed proof.

We can see from Theorem 4.3 that, with a large enough training set, the generalization gaps of equilibrium approximators go to zero if the covering number 𝒩(,r)\mathcal{N}_{\infty}(\mathcal{F},r) is bounded. As a result, we can estimate the expected testing performance through the empirical training performance.

We can also derive the sample complexities of equilibrium approximators to achieve the desirable generalizability.

Theorem 4.4.

[Sample complexity] For ϵ,δ(0,1)\epsilon,\delta\in(0,1), function class \mathcal{F} of NE, CE or CCE approximator and distribution 𝒟\mathcal{D}, with probability at least 1δ1-\delta over draw of the training set SS with

m92ϵ2(ln2δ+ln𝒩(,ϵ3L))m\geq\frac{9}{2\epsilon^{2}}\left(\ln\frac{2}{\delta}+\ln\mathcal{N}_{\infty}(\mathcal{F},\frac{\epsilon}{3L})\right)

games sampled from 𝒟\mathcal{D}, f\forall f\in\mathcal{F} we have

𝔼u𝒟[(f(u),u)]1muS(f(u),u)+ϵ,\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}(f(u),u)]\leq\frac{1}{m}\sum_{u\in S}\mathcal{E}(f(u),u)+\epsilon,

where L=2nL=2n for NE approximators, and L=2L=2 for CE and CCE approximators.

The proof is based on the Lipschitz continuity of approximation, uniform bound, and concentration inequality. See Section B.2 for details. Theorem 4.4 is also called the uniform convergence of function class \mathcal{F}, which is a sufficient condition for agnostic PAC learnable (Shalev-Shwartz and Ben-David, 2014).

As for the benefits of equivariant approximators for generalizability, the following result indicates that the projected equilibrium approximators have smaller covering numbers.

Theorem 4.5.

The 𝒪\mathcal{O}-projected, 𝒫\mathcal{P}-projected and 𝒬\mathcal{Q}-projected approximator classes have smaller covering numbers, i.e., r>0\forall r>0 we have

𝒩(𝒪NE,r)\displaystyle\mathcal{N}_{\infty}(\mathcal{O}\mathcal{F}^{\mathrm{NE}},r) 𝒩(NE,r),\displaystyle\leq\mathcal{N}_{\infty}(\mathcal{F}^{\mathrm{NE}},r),
𝒩(𝒫NE,r)\displaystyle\mathcal{N}_{\infty}(\mathcal{P}\mathcal{F}^{\mathrm{NE}},r) 𝒩(NE,r),\displaystyle\leq\mathcal{N}_{\infty}(\mathcal{F}^{\mathrm{NE}},r),
𝒩(𝒬(C)CE,r)\displaystyle\mathcal{N}_{\infty}(\mathcal{Q}\mathcal{F}^{\mathrm{(C)CE}},r) 𝒩((C)CE,r)\displaystyle\leq\mathcal{N}_{\infty}(\mathcal{F}^{\mathrm{(C)CE}},r)

The proof is done by showing all the operators are contraction mappings. See Section B.3 for details.

Both the generalization bounds in Theorem 4.3 and the sample complexities in Theorem 4.4 decrease with the decrease of covering numbers 𝒩(,r)\mathcal{N}_{\infty}(\mathcal{F},r). Thus, we can see from Theorem 4.5 that both PPE and OPI can improve the generalizability of NE approximators, and PE can improve the generalizability of CE and CCE approximators.

4.2 Benefits for Approximation

We then show the benefits of equivariance for approximation when the payoff distribution is invariant under permutation. The permutation-invariant distribution holds when the action is anonymous or indifferent or when we pre-train the equilibrium approximators using a manually designed distribution (Marris et al., 2022).

(C)CE Approximator

The following theorem tells the benefit of permutation-equivariance in decreasing the exploitability of (C)CE approximators.

Theorem 4.6.

When the payoff distribution 𝒟\mathcal{D} is invariant under the permutation of payoffs, the 𝒬\mathcal{Q}-projected (C)CE approximator has a smaller expected equilibrium approximation. Formally, for all f(C)CE(C)CEf^{\mathrm{(C)CE}}\in\mathcal{F}^{\mathrm{(C)CE}} and permutation-invariant distribution 𝒟\mathcal{D}, we have

𝔼u𝒟[(𝒬f(C)CE(u),u)]𝔼u𝒟[(f(C)CE(u),u)],\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}(\mathcal{Q}f^{\mathrm{(C)CE}}(u),u)]\leq\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}(f^{\mathrm{(C)CE}}(u),u)],

The proof is done by using the convexity of approximation. See Section B.4 for details. We can see from Theorem 4.6 that, when payoff distribution is invariant under permutation, it is beneficial to use equivariant architecture as the CE or CCE approximators.

NE Approximator

As for NE approximator, we have similar results.

Theorem 4.7.

For bimatrix constant-sum games, when the payoff distribution 𝒟\mathcal{D} is invariant under the permutation of payoffs, then the 𝒳\mathcal{X}-projected (𝒳{𝒪,𝒫}\mathcal{X}\in\{\mathcal{O},\mathcal{P}\}) NE approximator has a smaller expected exploitability. Formally, for all fNENEf^{\mathrm{NE}}\in\mathcal{F}^{\mathrm{NE}} and permutation-invariant distribution 𝒟\mathcal{D} for bimatrix constant-sum games, we have

𝔼u𝒟[ii((𝒳fNE)(u),u)]𝔼u𝒟[ii(fNE(u),u)]\mathbb{E}_{u\sim\mathcal{D}}[\sum_{i}\mathcal{E}_{i}((\mathcal{X}f^{\mathrm{NE}})(u),u)]\leq\mathbb{E}_{u\sim\mathcal{D}}[\sum_{i}\mathcal{E}_{i}(f^{\mathrm{NE}}(u),u)]
Theorem 4.8.

When the payoff distribution 𝒟\mathcal{D} is invariant under the permutation of payoffs, and fNEf^{\mathrm{NE}} satisfies OPI, then the 𝒫\mathcal{P}-projected NE approximator has a smaller expected NE approximation. Formally, for all fNENEf^{\mathrm{NE}}\in\mathcal{F}^{\mathrm{NE}} that is OPI and permutation-invariant distribution 𝒟\mathcal{D}, we have

𝔼u𝒟[((𝒫fNE)(u),u)]𝔼u𝒟[(fNE(u),u)].\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}((\mathcal{P}f^{\mathrm{NE}})(u),u)]\leq\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}(f^{\mathrm{NE}}(u),u)].
Theorem 4.9.

For bimatrix games, when the payoff distribution 𝒟\mathcal{D} is invariant under the permutation of payoffs, and fNEf^{\mathrm{NE}} satisfies PPE, then the 𝒪\mathcal{O}-projected NE approximator has a smaller expected NE approximation. Formally, for all fNENEf^{\mathrm{NE}}\in\mathcal{F}^{\mathrm{NE}} that is PPE and permutation-invariant distribution 𝒟\mathcal{D} of bimatrix games, we have

𝔼u𝒟[((𝒪fNE)(u),u)]𝔼u𝒟[(fNE(u),u)].\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}((\mathcal{O}f^{\mathrm{NE}})(u),u)]\leq\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}(f^{\mathrm{NE}}(u),u)].

Theorem 4.8 and Theorem 4.9 tell that PPE and OPI approximators can achieve better approximation than ones with only PPE or OPI. Meanwhile, we can see from Theorem 4.7 that for bimatrix constant-sum games (such as zero-sum games), it can be preferred to introduce PPE or OPI to the architectures.

5 Theoretical Analysis of Limitations

As we discussed in Section 4, equivariant approximators enjoy better generalizability and better approximation sometimes. However, as we will show, they have some limitations regarding equilibrium selection and social welfare. Such limitations attribute to the limited representativeness caused by equivariance.

5.1 Equilibrium Selection

We first show that there may be equilibria points that equivariant approximators will never find. We illustrate such limitation in permutation-invariant games, which is defined as follows:

Definition 5.1 (Permutation-ρ\rho-Invariant Game).

We say a game Γu\Gamma_{u} is permutation-ρ\rho-invariant, where ρ=i[n]ρi\rho=\circ_{i\in[n]}\rho_{i}, if the payoff uu is permutation-invariant with respect to ρ\rho. That is, ρu=u\rho u=u.

Permutation-ρ\rho-invariance indicates that one cannot distinguish joint action aa from ρa\rho a using only the payoff uu. We’d like to provide an example to show more insight of permutation-ρ\rho-invariant games:

Example 5.2.

For a 22-player game Γu=(2,u=(u1,u2),𝒜=([m1],[m2]))\Gamma_{u}=(2,u=(u_{1},u_{2}),\mathcal{A}=([m_{1}],[m_{2}])) , Let ρi=(mi,mi1,,1)\rho_{i}=(m_{i},m_{i}-1,\dots,1) and ρ=ρ1ρ2\rho=\rho_{1}\circ\rho_{2}. If one of the following conditions holds, then uu is permutation-ρ\rho-invariant:

  1. 1.

    u1u_{1} and u2u_{2} are symmetric and persymmetric (i.e., symmetric with respect to the northeast-to-southwest diagonal) squares.

  2. 2.

    Both u1u_{1} and u2u_{2} are centrosymmetric, i.e., ui(x,y)=ui(m1+1x,m2+1y)u_{i}(x,y)=u_{i}(m_{1}+1-x,m_{2}+1-y) for i{1,2},x[m1]i\in\{1,2\},x\in[m_{1}] and y[m2]y\in[m_{2}].

For permutation ρ=(i[n]ρi)\rho=(\circ_{i\in[n]}\rho_{i}) and player k[n]k\in[n], we denote the set of non-fixed actions of player kk under ρk\rho_{k} as

V(ρk){ak|ak𝒜k,ρk(ak)ak}.V(\rho_{k})\coloneqq\{a_{k}|a_{k}\in\mathcal{A}_{k},\rho_{k}(a_{k})\neq a_{k}\}.

Based on V(ρk)V(\rho_{k}), we find some equilibria points of permutation-ρ\rho-invariant games that any equivariant approximators will never find.

Theorem 5.3.

For a permutation-ρ\rho-invariant game Γu\Gamma_{u}. if there is a pure NE a=(ai)i[n]a^{*}=(a^{*}_{i})_{i\in[n]} and at least one player k[n]k\in[n] such that akV(ρk)a^{*}_{k}\in V(\rho_{k}), then aa^{*} will never be found by any NE approximator with both PPE and OPI. Besides, aa^{*} (as a pure CE or CCE) will also never be found by any CE or CCE approximator with PE.

We illustrate Theorem 5.3 by the following example:

Example 5.4.

Consider a bimatrix game with identity utility

u=[𝟏,𝟏0,00,0𝟏,𝟏]u=\begin{bmatrix}\bm{1,1}&0,0\\ 0,0&\bm{1,1}\end{bmatrix}

There are two pure NE (bolded in the above matrix) and one mixed NE of σ1=(0.5,0.5)\sigma_{1}=(0.5,0.5) and σ2=(0.5,0.5)\sigma_{2}=(0.5,0.5). Let ρi\rho_{i} be the unique permute function (except for identity function) of player i[2]i\in[2], and ρ=ρ1ρ2\rho=\rho_{1}\circ\rho_{2}. The game is permutation-ρ\rho-invariant.

Case 1: Let ff be a permutation-equivariant CE or CCE approximator, and denote π=f(u)\pi=f(u). We have

π=f(u)=(a)f(ρu)=(b)ρf(u),\pi=f(u)\overset{(a)}{=}f(\rho u)\overset{(b)}{=}\rho f(u),

where (a)(a) holds by permutation-ρ\rho-invariance of uu, and (b)(b) holds by PE of ff. Thus, we have π1,1=π2,2[0,12]\pi_{1,1}=\pi_{2,2}\in[0,\frac{1}{2}] and π1,2=π2,1[0,12]\pi_{1,2}=\pi_{2,1}\in[0,\frac{1}{2}]. As a result, the two pure (C)CEs cannot be found.

Case 2: Let ff be a NE approximator that holds PPE and OPI. Denote f(u)=(σ1,σ2)f(u)=(\sigma_{1},\sigma_{2}), where σ1=(p1,1p1)\sigma_{1}=(p_{1},1-p_{1}) and σ2=(p2,1p2)\sigma_{2}=(p_{2},1-p_{2}). By PPE and OPI of ff, we have

f(u)1\displaystyle f(u)_{1} =(p1,1p1)=(a)f(ρ1ρ2u)1=(b)ρ1f(ρ2u)1=(c)ρ1f(u)1=(1p1,p1),\displaystyle=(p_{1},1-p_{1})\overset{(a)}{=}f(\rho_{1}\rho_{2}u)_{1}\overset{(b)}{=}\rho_{1}f(\rho_{2}u)_{1}\overset{(c)}{=}\rho_{1}f(u)_{1}=(1-p_{1},p_{1}),

where (a)(a) holds by permutaion-ρ\rho-invariance of uu, (b)(b) holds by PPE of ff, and (c)(c) holds by OPI of ff. As a result, the only NE that ff could find is the mixed NE.

As we can see from the example and Theorem 5.3, the equivariance, while introducing inductive bias to the approximator architecture, is also a strong constraint. Such a constraint is why the equivariant approximators cannot find all the equilibria points.

5.2 Social Welfare

The social welfare of a joint strategy π\pi is defined as the sum of all players’ utilities, i.e.,

SW(π,u)=i[n]ui(π).\mathrm{SW}(\pi,u)=\sum_{i\in[n]}u_{i}(\pi).

The equilibrium with higher social welfare is usually preferred (Marris et al., 2022).

To analyze the social welfare of equivariant approximators, we define the worst social welfare ratio as follows:

Definition 5.5.

For any N,M2N,M\geq 2 and two NE (or CE/CCE) approximator classes 1,2\mathcal{F}_{1},\mathcal{F}_{2} that target on games with number of players nNn\leq N and |𝒜i|M|\mathcal{A}_{i}|\leq M, we define the worst social welfare ratio of 1\mathcal{F}_{1} over 2\mathcal{F}_{2} as:

SWRN,M(1,2)inf𝒟maxf11𝔼u𝒟SW(f1(u),u)maxf22𝔼u𝒟SW(f2(u),u)\displaystyle\mathrm{SWR}_{\mathrm{N,M}}(\mathcal{F}_{1},\mathcal{F}_{2})\coloneqq\inf_{\mathcal{D}}\frac{\max_{f_{1}\in\mathcal{F}_{1}}\mathbb{E}_{u\sim\mathcal{D}}\mathrm{SW}(f_{1}(u),u)}{\max_{f_{2}\in\mathcal{F}_{2}}\mathbb{E}_{u\sim\mathcal{D}}\mathrm{SW}(f_{2}(u),u)}

SWRN,M(1,2)\mathrm{SWR}_{\mathrm{N,M}}(\mathcal{F}_{1},\mathcal{F}_{2}) measures the relative representativeness of 1\mathcal{F}_{1} over 2\mathcal{F}_{2} in terms of social welfare. Based on that, we have the following result for equivariant CE and CCE approximator classes:

Theorem 5.6.

Given N,M2N,M\geq 2, let PE(C)CE\mathcal{F}^{\mathrm{(C)CE}}_{\mathrm{PE}} be the function class (target on games with number of players nNn\leq N and |𝒜i|M|\mathcal{A}_{i}|\leq M) of all the (C)CE approximators with PE. Denote by general(C)CE\mathcal{F}^{\mathrm{(C)CE}}_{\mathrm{general}} the function class of all the (C)CE approximators. Then we have

SWRN,M(PE(C)CE,general(C)CE)=1.\displaystyle\mathrm{SWR}_{\mathrm{N,M}}(\mathcal{F}^{\mathrm{(C)CE}}_{\mathrm{PE}},\mathcal{F}^{\mathrm{(C)CE}}_{\mathrm{general}})=1.

Theorem 5.6 tells that, while the permutation-equivariant (C)CE approximator class may not be able to find all the (C)CE in a game, it can keep the social welfare of the output solutions.

However, when considering equivariant NE approximators, we have the following negative result:

Theorem 5.7.

Given N,M2N,M\geq 2, let OPINE,PPENE{\mathcal{F}}^{\mathrm{NE}}_{\mathrm{OPI}},{\mathcal{F}}^{\mathrm{NE}}_{\mathrm{PPE}} and bothNE{\mathcal{F}}^{\mathrm{NE}}_{\mathrm{both}} be the function classes (target on games with number of players nNn\leq N and |𝒜i|M|\mathcal{A}_{i}|\leq M) of all the NE approximators with OPI, PPE and both. Denote the function class of all the NE approximators as generalNE\mathcal{F}^{\mathrm{NE}}_{\mathrm{general}}. Then we have

SWRN,M(OPINE,generalNE)\displaystyle\mathrm{SWR}_{\mathrm{N,M}}({\mathcal{F}}^{\mathrm{NE}}_{\mathrm{OPI}},\mathcal{F}^{\mathrm{NE}}_{\mathrm{general}}) =1MN1,\displaystyle=\frac{1}{M^{N-1}}, (1)
SWRN,M(PPENE,generalNE)\displaystyle\mathrm{SWR}_{\mathrm{N,M}}({\mathcal{F}}^{\mathrm{NE}}_{\mathrm{PPE}},\mathcal{F}^{\mathrm{NE}}_{\mathrm{general}}) 1M,\displaystyle\leq\frac{1}{M}, (2)
SWRN,M(bothNE,generalNE)\displaystyle\mathrm{SWR}_{\mathrm{N,M}}({\mathcal{F}}^{\mathrm{NE}}_{\mathrm{both}},\mathcal{F}^{\mathrm{NE}}_{\mathrm{general}}) =1MN1.\displaystyle=\frac{1}{M^{N-1}}. (3)

Additionally, when M3M\geq 3, denote by ~bothNE\widetilde{\mathcal{F}}^{\mathrm{NE}}_{\mathrm{both}} the function class of all the NE oracles (functions that always output exact NE solutions of the input games) with both PPE and OPI, and by ~generalNE\widetilde{\mathcal{F}}^{\mathrm{NE}}_{\mathrm{general}} the function class of all the NE oracles. Then we have

SWRN,M(~bothNE,~generalNE)=0.\mathrm{SWR}_{\mathrm{N,M}}(\widetilde{\mathcal{F}}^{\mathrm{NE}}_{\mathrm{both}},\widetilde{\mathcal{F}}^{\mathrm{NE}}_{\mathrm{general}})=0. (4)

The proof is done by construction (See Section C.3 for details). As an illustration of Equation 4, consider a bimatrix game with the following payoff:

u=[1,10,00,12+ε0,01,10,12+ε12+ε,012+ε,0ε,ε]u=\begin{bmatrix}{1,1}&0,0&0,\frac{1}{2}+\varepsilon\\ 0,0&{1,1}&0,\frac{1}{2}+\varepsilon\\ \frac{1}{2}+\varepsilon,0&\frac{1}{2}+\varepsilon,0&\varepsilon,\varepsilon\end{bmatrix}

for ϵ(0,12)\epsilon\in(0,\frac{1}{2}). The maximum NE (the upper-left corner of uu) social welfare is 22, which can be found by at least one NE oracle in ~generalNE\widetilde{\mathcal{F}}^{\mathrm{NE}}_{\mathrm{general}}. However, the only NE (the lower-right corner of uu) that the NE oracles in ~bothNE\widetilde{\mathcal{F}}^{\mathrm{NE}}_{\mathrm{both}} could find only has a social welfare of 2ϵ2\epsilon. As a result,

SWR2,3(~bothNE,~generalNE)2ϵ2=ϵ,\mathrm{SWR}_{2,3}(\widetilde{\mathcal{F}}^{\mathrm{NE}}_{\mathrm{both}},\widetilde{\mathcal{F}}^{\mathrm{NE}}_{\mathrm{general}})\leq\frac{2\epsilon}{2}=\epsilon,

which goes to zero as ϵ0\epsilon\to 0. Recall that we always have SWRN,M0\mathrm{SWR}_{N,M}\geq 0, thus Equation 4 holds when N=2N=2 and M=3M=3.

Theorem 5.7 tells that equivariant NE approximators may lose some social welfare while enjoying better generalizability. Such a result inspires us to balance generalizability and social welfare when designing the NE approximator architecture.

6 Conclusion and Future Work

In this paper, we theoretically analyze the benefits and limitations of equivariant equilibrium approximators, including player-permutation-equivariant (PPE) and opponent-permutation-invariant (OPI) NE approximator, and permutation-equivariant (PE) CE and CCE approximators. For the benefits, we first show that these equivariant approximators enjoy better generalizability. To get the result, we derive the generalization bounds and sample complexities based on covering numbers, and then we prove that the symmetric approximators have lower covering numbers. We then show that the equivariant approximators can decrease the exploitability when the payoff distribution is invariant under permutation. For the limitations, we find the equivariant approximators may fail to find some equilibria points due to their limited representativeness caused by equivariance. Besides, while equivariant (C)CE approximators can keep the social welfare, the equivariant NE approximators reach a small worst social welfare ratio comparing to the general approximators. Such a result indicates that equivariance may reduce social welfare; therefore, we’d better balance the generalizability and social welfare when we design the architectures of NE approximators.

As for future work, since in our paper we assume the training and testing payoff distribution are the same, an interesting topic is to study the benefits of equivariant approximators under the payoff distribution shift. Moreover, since we consider fixed and discrete action space, another interesting future direction is to analyze the benefits of equivariant approximators in varying or continuous action space.

References

  • Bartlett and Mendelson [2002] Peter L Bartlett and Shahar Mendelson. Rademacher and gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3(Nov):463–482, 2002.
  • Cesa-Bianchi and Lugosi [2006] Nicolo Cesa-Bianchi and Gábor Lugosi. Prediction, learning, and games. Cambridge university press, 2006.
  • Chen et al. [2009] Xi Chen, Xiaotie Deng, and Shang-Hua Teng. Settling the complexity of computing two-player Nash equilibria. Journal of the ACM (JACM), 56(3):1–57, 2009.
  • Daskalakis et al. [2009] Constantinos Daskalakis, Paul W Goldberg, and Christos H Papadimitriou. The complexity of computing a Nash equilibrium. SIAM Journal on Computing, 39(1):195–259, 2009.
  • Deligkas et al. [2022] Argyrios Deligkas, Michail Fasoulakis, and Evangelos Markakis. A polynomial-time algorithm for 1/3-approximate Nash equilibria in bimatrix games. In 30th Annual European Symposium on Algorithms, ESA, 2022.
  • Duan et al. [2021] Zhijian Duan, Dinghuai Zhang, Wenhan Huang, Yali Du, Jun Wang, Yaodong Yang, and Xiaotie Deng. Towards the PAC learnability of Nash equilibrium. arXiv preprint arXiv:2108.07472, 2021.
  • Duan et al. [2022] Zhijian Duan, Jingwu Tang, Yutong Yin, Zhe Feng, Xiang Yan, Manzil Zaheer, and Xiaotie Deng. A context-integrated transformer-based neural network for auction design. In International Conference on Machine Learning, pages 5609–5626. PMLR, 2022.
  • Dütting et al. [2019] Paul Dütting, Zhe Feng, Harikrishna Narasimhan, David Parkes, and Sai Srivatsa Ravindranath. Optimal auctions through deep learning. In International Conference on Machine Learning, pages 1706–1715. PMLR, 2019.
  • Elesedy and Zaidi [2021] Bryn Elesedy and Sheheryar Zaidi. Provably strict generalisation benefit for equivariant models. In International Conference on Machine Learning, pages 2959–2969. PMLR, 2021.
  • Feng et al. [2021] Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu, Stephen McAleer, Ying Wen, Jun Wang, and Yaodong Yang. Neural auto-curricula in two-player zero-sum games. Advances in Neural Information Processing Systems, 34:3504–3517, 2021.
  • Fudenberg et al. [1998] Drew Fudenberg, Fudenberg Drew, David K Levine, and David K Levine. The theory of learning in games, volume 2. MIT press, 1998.
  • Goktas and Greenwald [2022] Denizalp Goktas and Amy Greenwald. Exploitability minimization in games and beyond. In Advances in Neural Information Processing Systems, 2022.
  • Greenwald et al. [2003] Amy Greenwald, Keith Hall, Roberto Serrano, et al. Correlated Q-learning. In ICML, volume 3, pages 242–249, 2003.
  • Hartford et al. [2016] Jason S Hartford, James R Wright, and Kevin Leyton-Brown. Deep learning for predicting human strategic behavior. Advances in neural information processing systems, 29, 2016.
  • Hu and Wellman [2003] Junling Hu and Michael P Wellman. Nash Q-learning for general-sum stochastic games. Journal of machine learning research, 4(Nov):1039–1069, 2003.
  • Ivanov et al. [2022] Dmitry Ivanov, Iskander Safiulin, Igor Filippov, and Ksenia Balabaeva. Optimal-er auctions through attention. In Advances in Neural Information Processing Systems, 2022.
  • Jin et al. [2022] Chi Jin, Qinghua Liu, Yuanhao Wang, and Tiancheng Yu. V-learning – a simple, efficient, decentralized algorithm for multiagent RL. In ICLR 2022 Workshop on Gamification and Multiagent Solutions, 2022.
  • Lanctot et al. [2017] Marc Lanctot, Vinicius Zambaldi, Audrunas Gruslys, Angeliki Lazaridou, Karl Tuyls, Julien Pérolat, David Silver, and Thore Graepel. A unified game-theoretic approach to multiagent reinforcement learning. Advances in neural information processing systems, 30, 2017.
  • Ling et al. [2018] C. Ling, Fei Fang, and J. Z. Kolter. What game are we playing? End-to-end learning in normal and extensive form games. In IJCAI, pages 396–402, 2018.
  • Liu et al. [2022] Siqi Liu, Marc Lanctot, Luke Marris, and Nicolas Heess. Simplex neural population learning: Any-mixture bayes-optimality in symmetric zero-sum games. In International Conference on Machine Learning, ICML, 2022.
  • Lockhart et al. [2019] Edward Lockhart, Marc Lanctot, Julien Pérolat, Jean-Baptiste Lespiau, Dustin Morrill, Finbarr Timbers, and Karl Tuyls. Computing approximate equilibria in sequential adversarial games by exploitability descent. In Sarit Kraus, editor, IJCAI, pages 464–470. ijcai.org, 2019.
  • Marris et al. [2021] Luke Marris, Paul Muller, Marc Lanctot, Karl Tuyls, and Thore Graepel. Multi-agent training beyond zero-sum with correlated equilibrium meta-solvers. In International Conference on Machine Learning, pages 7480–7491. PMLR, 2021.
  • Marris et al. [2022] Luke Marris, Ian Gemp, Thomas Anthony, Andrea Tacchetti, Siqi Liu, and Karl Tuyls. Turbocharging solution concepts: Solving NEs, CEs and CCEs with neural equilibrium solvers. In Advances in Neural Information Processing Systems, 2022.
  • Nash et al. [1950] John F Nash et al. Equilibrium points in n-person games. Proceedings of the national academy of sciences, 36(1):48–49, 1950.
  • Nekipelov et al. [2015] Denis Nekipelov, Vasilis Syrgkanis, and Eva Tardos. Econometrics for learning agents. In Proceedings of the sixteenth acm conference on economics and computation, pages 1–18, 2015.
  • Nikaidô and Isoda [1955] Hukukane Nikaidô and Kazuo Isoda. Note on non-cooperative convex games. Pacific Journal of Mathematics, 5(S1):807–815, 1955.
  • Qin et al. [2022] Tian Qin, Fengxiang He, Dingfeng Shi, Wenbing Huang, and Dacheng Tao. Benefits of permutation-equivariance in auction mechanisms. In Advances in Neural Information Processing Systems, 2022.
  • Rahme et al. [2021] Jad Rahme, Samy Jelassi, Joan Bruna, and S Matthew Weinberg. A permutation-equivariant neural network architecture for auction design. In Proceedings of the AAAI Conference on Artificial Intelligence, 2021.
  • Schulz-Mirbach [1994] Hanns Schulz-Mirbach. Constructing invariant features by averaging techniques. In Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 3-Conference C: Signal Processing (Cat. No. 94CH3440-5), volume 2, pages 387–390. IEEE, 1994.
  • Sessa et al. [2020] Pier Giuseppe Sessa, Ilija Bogunovic, Andreas Krause, and Maryam Kamgarpour. Contextual games: Multi-agent learning with side information. Advances in Neural Information Processing Systems, 33:21912–21922, 2020.
  • Shalev-Shwartz and Ben-David [2014] Shai Shalev-Shwartz and Shai Ben-David. Understanding machine learning: From theory to algorithms. Cambridge university press, 2014.
  • Shoham and Leyton-Brown [2008] Yoav Shoham and Kevin Leyton-Brown. Multiagent systems: Algorithmic, game-theoretic, and logical foundations. Cambridge University Press, 2008.
  • Tsaknakis and Spirakis [2007] Haralampos Tsaknakis and Paul G Spirakis. An optimization approach for approximate Nash equilibria. In International Workshop on Web and Internet Economics, pages 42–56. Springer, 2007.
  • Wu and Lisser [2022] Dawen Wu and Abdel Lisser. Using CNN for solving two-player zero-sum games. Expert Systems with Applications, page 117545, 2022.
  • Wu and Lisser [2023] Dawen Wu and Abdel Lisser. CCGnet: A deep learning approach to predict Nash equilibrium of chance-constrained games. Information Sciences, 2023.

Appendix A Omitted Proofs in Section 3

A.1 Useful Lemma

We first introduce a lemma, which will be frequently used in the following proofs.

Lemma A.1.

i,j[n],ρi𝒢i\forall i,j\in[n],\rho_{i}\in\mathcal{G}_{i} we have (ρiu)j(σi,σi)=uj(ρi1σi,σi)(\rho_{i}u)_{j}(\sigma_{i},\sigma_{-i})=u_{j}(\rho_{i}^{-1}\sigma_{i},\sigma_{-i}) and (ρiu)j(π)=uj(ρi1π)(\rho_{i}u)_{j}(\pi)=u_{j}(\rho_{i}^{-1}\pi)

Proof.

Define a^iρi1ai\widehat{a}_{i}\coloneqq\rho_{i}^{-1}a_{i}. For product strategy σ=(σi)i[n]\sigma=(\sigma_{i})_{i\in[n]},

(ρiu)j(σi,σi)=\displaystyle(\rho_{i}u)_{j}(\sigma_{i},\sigma_{-i})= ai𝒜iai𝒜i(ρiu)j(ai,ai)σi(ai)σi(ai)\displaystyle\sum_{a_{i}\in\mathcal{A}_{i}}\sum_{a_{-i}\in\mathcal{A}_{-i}}(\rho_{i}u)_{j}(a_{i},a_{-i})\cdot\sigma_{i}(a_{i})\cdot\sigma_{-i}(a_{-i})
=\displaystyle= ai𝒜iai𝒜iuj(ρi1ai,ai)σi(ai)σi(ai)\displaystyle\sum_{a_{i}\in\mathcal{A}_{i}}\sum_{a_{-i}\in\mathcal{A}_{-i}}u_{j}(\rho_{i}^{-1}a_{i},a_{-i})\cdot\sigma_{i}(a_{i})\cdot\sigma_{-i}(a_{-i})
=\displaystyle= ai𝒜iai𝒜iuj(ρi1ai,ai)(ρi1σi)(ρi1ai)σi(ai)\displaystyle\sum_{a_{i}\in\mathcal{A}_{i}}\sum_{a_{-i}\in\mathcal{A}_{-i}}u_{j}(\rho_{i}^{-1}a_{i},a_{-i})\cdot(\rho_{i}^{-1}\sigma_{i})(\rho_{i}^{-1}a_{i})\cdot\sigma_{-i}(a_{-i})
=\displaystyle= a^i𝒜iai𝒜iuj(a^i,ai)(ρi1σi)(a^i)σi(ai)\displaystyle\sum_{\widehat{a}_{i}\in\mathcal{A}_{i}}\sum_{a_{-i}\in\mathcal{A}_{-i}}u_{j}(\widehat{a}_{i},a_{-i})\cdot(\rho_{i}^{-1}\sigma_{i})(\widehat{a}_{i})\cdot\sigma_{-i}(a_{-i})
=\displaystyle= uj(ρi1σi,σi)\displaystyle u_{j}(\rho_{i}^{-1}\sigma_{i},\sigma_{-i})

For joint strategy π\pi,

(ρiu)j(π)=\displaystyle(\rho_{i}u)_{j}(\pi)= ai𝒜iai𝒜i(ρiuj)(ai,ai)π(ai,ai)\displaystyle\sum_{a_{i}\in\mathcal{A}_{i}}\sum_{a_{-i}\in\mathcal{A}_{-i}}(\rho_{i}u_{j})(a_{i},a_{-i})\cdot\pi(a_{i},a_{-i})
=\displaystyle= ai𝒜iai𝒜iuj(ρi1ai,ai)π(ai,ai)\displaystyle\sum_{a_{i}\in\mathcal{A}_{i}}\sum_{a_{-i}\in\mathcal{A}_{-i}}u_{j}(\rho_{i}^{-1}a_{i},a_{-i})\cdot\pi(a_{i},a_{-i})
=\displaystyle= ai𝒜iai𝒜iuj(ρi1ai,ai)(ρi1π)(ρi1ai,ai)\displaystyle\sum_{a_{i}\in\mathcal{A}_{i}}\sum_{a_{-i}\in\mathcal{A}_{-i}}u_{j}(\rho_{i}^{-1}a_{i},a_{-i})\cdot(\rho_{i}^{-1}\pi)(\rho_{i}^{-1}a_{i},a_{-i})
=\displaystyle= a^i𝒜iai𝒜iuj(a^i,ai)(ρi1π)(a^i,ai)\displaystyle\sum_{\widehat{a}_{i}\in\mathcal{A}_{i}}\sum_{a_{-i}\in\mathcal{A}_{-i}}u_{j}(\widehat{a}_{i},a_{-i})\cdot(\rho_{i}^{-1}\pi)(\widehat{a}_{i},a_{-i})
=\displaystyle= uj(ρi1π)\displaystyle u_{j}(\rho_{i}^{-1}\pi)

A.2 Proof of Lemma 3.2

See 3.2

Proof.

For player ii, we have

i(ρiσ,ρiu)=\displaystyle\mathcal{E}_{i}(\rho_{i}\sigma,\rho_{i}u)= maxai𝒜iρiui(ai,ρiσi)ρiui(ρiσ)=maxai𝒜iρiui(ai,σi)ρiui(ρiσi,σi)\displaystyle\max_{a_{i}\in\mathcal{A}_{i}}\rho_{i}u_{i}(a_{i},\rho_{i}\sigma_{-i})-\rho_{i}u_{i}(\rho_{i}\sigma)=\max_{a_{i}\in\mathcal{A}_{i}}\rho_{i}u_{i}(a_{i},\sigma_{-i})-\rho_{i}u_{i}(\rho_{i}\sigma_{i},\sigma_{-i})
=\displaystyle= maxai𝒜iui(ρi1ai,σi)ui(ρi1ρiσi,σi)=(a)maxai𝒜iui(ai,σi)ui(σi,σi)=i(σ,u),\displaystyle\max_{a_{i}\in\mathcal{A}_{i}}u_{i}(\rho_{i}^{-1}a_{i},\sigma_{-i})-u_{i}(\rho_{i}^{-1}\rho_{i}\sigma_{i},\sigma_{-i})\overset{(a)}{=}\max_{a_{i}\in\mathcal{A}_{i}}u_{i}(a_{i},\sigma_{-i})-u_{i}(\sigma_{i},\sigma_{-i})=\mathcal{E}_{i}(\sigma,u),

where (a)(a) holds since ρi\rho_{i} is a bijection on 𝒜i\mathcal{A}_{i}. For player jij\neq i, we have

j(ρiσ,ρiu)=\displaystyle\mathcal{E}_{j}(\rho_{i}\sigma,\rho_{i}u)= maxaj𝒜ρiuj(aj,ρiσj)ρiuj(ρiσ)=maxaj𝒜juj(aj,ρi1ρiσj)uj(ρi1ρiσ)\displaystyle\max_{a_{j}\in\mathcal{A}}\rho_{i}u_{j}(a_{j},\rho_{i}\sigma_{-j})-\rho_{i}u_{j}(\rho_{i}\sigma)=\max_{a_{j}\in\mathcal{A}_{j}}u_{j}(a_{j},\rho_{i}^{-1}\rho_{i}\sigma_{-j})-u_{j}(\rho_{i}^{-1}\rho_{i}\sigma)
=\displaystyle= maxaj𝒜juj(aj,σj)uj(σ)=j(σ,u)\displaystyle\max_{a_{j}\in\mathcal{A}_{j}}u_{j}(a_{j},\sigma_{-j})-u_{j}(\sigma)=\mathcal{E}_{j}(\sigma,u)

From above, we have (ρiσ,ρiu)=(σ,u)\mathcal{E}(\rho_{i}\sigma,\rho_{i}u)=\mathcal{E}(\sigma,u), thus if σ\sigma is a ε\varepsilon-NE of Γu\Gamma_{u}, then ρiσ\rho_{i}\sigma must be a ε\varepsilon-NE of Γρiu\Gamma_{\rho_{i}u}. ∎

A.3 Proof of Lemma 3.5

See 3.5

CCE

For player ii, we have

i(ρiπ,ρiu)=\displaystyle\mathcal{E}_{i}(\rho_{i}\pi,\rho_{i}u)= maxai𝒜i(ρiui)(ai,(ρiπ)i)(ρiui)(ρiπi)\displaystyle\max_{a_{i}\in\mathcal{A}_{i}}(\rho_{i}u_{i})(a_{i},(\rho_{i}\pi)_{-i})-(\rho_{i}u_{i})(\rho_{i}\pi_{i})
=\displaystyle= maxai𝒜i(ρiui)(ai,(ρiπ)i)ui(ρi1ρiπi)\displaystyle\max_{a_{i}\in\mathcal{A}_{i}}(\rho_{i}u_{i})(a_{i},(\rho_{i}\pi)_{-i})-u_{i}(\rho_{i}^{-1}\rho_{i}\pi_{i})
=\displaystyle= maxai𝒜i(ρiui)(ai,(ρiπ)i)ui(πi)\displaystyle\max_{a_{i}\in\mathcal{A}_{i}}(\rho_{i}u_{i})(a_{i},(\rho_{i}\pi)_{-i})-u_{i}(\pi_{i})
=\displaystyle= maxai𝒜ib𝒜(ρiui)(ai,bi)(ρiπ)(b)ui(πi)\displaystyle\max_{a_{i}\in\mathcal{A}_{i}}\sum_{b\in\mathcal{A}}(\rho_{i}u_{i})(a_{i},b_{-i})\cdot(\rho_{i}\pi)(b)-u_{i}(\pi_{i})
=\displaystyle= maxai𝒜ibi𝒜i,bi𝒜iui(ρi1ai,bi)π(ρi1bi,bi)ui(πi)\displaystyle\max_{a_{i}\in\mathcal{A}_{i}}\sum_{b_{i}\in\mathcal{A}_{i},b_{-i}\in\mathcal{A}_{-i}}u_{i}(\rho_{i}^{-1}a_{i},b_{-i})\cdot\pi(\rho_{i}^{-1}b_{i},b_{-i})-u_{i}(\pi_{i})
=\displaystyle= maxai𝒜ibi𝒜i,bi𝒜iui(ai,bi)π(bi,bi)ui(πi)\displaystyle\max_{a_{i}\in\mathcal{A}_{i}}\sum_{b_{i}\in\mathcal{A}_{i},b_{-i}\in\mathcal{A}_{-i}}u_{i}(a_{i},b_{-i})\cdot\pi(b_{i},b_{-i})-u_{i}(\pi_{i}) ,ρi is a bijection on 𝒜i\displaystyle,\text{$\rho_{i}$ is a bijection on $\mathcal{A}_{i}$}
=\displaystyle= i(π,u)\displaystyle\mathcal{E}_{i}(\pi,u)

For player jij\neq i, we have

j(ρiπ,ρiu)=\displaystyle\mathcal{E}_{j}(\rho_{i}\pi,\rho_{i}u)= maxaj𝒜j(ρiuj)(aj,(ρiπ)j)(ρiuj)(ρiπj)\displaystyle\max_{a_{j}\in\mathcal{A}_{j}}(\rho_{i}u_{j})(a_{j},(\rho_{i}\pi)_{-j})-(\rho_{i}u_{j})(\rho_{i}\pi_{j})
=\displaystyle= maxaj𝒜j(ρiuj)(aj,(ρiπ)j)uj(ρi1ρiπj)\displaystyle\max_{a_{j}\in\mathcal{A}_{j}}(\rho_{i}u_{j})(a_{j},(\rho_{i}\pi)_{-j})-u_{j}(\rho_{i}^{-1}\rho_{i}\pi_{j})
=\displaystyle= maxaj𝒜j(ρiuj)(aj,(ρiπ)j)uj(πj)\displaystyle\max_{a_{j}\in\mathcal{A}_{j}}(\rho_{i}u_{j})(a_{j},(\rho_{i}\pi)_{-j})-u_{j}(\pi_{j})
=\displaystyle= maxaj𝒜jb𝒜(ρiuj)(aj,bj)(ρiπ)(b)uj(πj)\displaystyle\max_{a_{j}\in\mathcal{A}_{j}}\sum_{b\in\mathcal{A}}(\rho_{i}u_{j})(a_{j},b_{-j})\cdot(\rho_{i}\pi)(b)-u_{j}(\pi_{j})
=\displaystyle= maxaj𝒜jbi𝒜i,bi𝒜iuj(aj,(bj)i,ρi1bi)π(ρi1bi,bi)uj(πj)\displaystyle\max_{a_{j}\in\mathcal{A}_{j}}\sum_{b_{i}\in\mathcal{A}_{i},b_{-i}\in\mathcal{A}_{-i}}u_{j}(a_{j},(b_{-j})_{-i},\rho_{i}^{-1}b_{i})\cdot\pi(\rho_{i}^{-1}b_{i},b_{-i})-u_{j}(\pi_{j})
=\displaystyle= maxaj𝒜jbi𝒜i,bi𝒜iuj(aj,(bj)i,bi)π(bi,bi)uj(πj)\displaystyle\max_{a_{j}\in\mathcal{A}_{j}}\sum_{b_{i}\in\mathcal{A}_{i},b_{-i}\in\mathcal{A}_{-i}}u_{j}(a_{j},(b_{-j})_{-i},b_{i})\cdot\pi(b_{i},b_{-i})-u_{j}(\pi_{j}) ,ρi is a bijection on 𝒜i\displaystyle,\text{$\rho_{i}$ is a bijection on $\mathcal{A}_{i}$}
=\displaystyle= j(π,u)\displaystyle\mathcal{E}_{j}(\pi,u)

Thus, we have (ρiπ,ρiu)=(π,u)\mathcal{E}(\rho_{i}\pi,\rho_{i}u)=\mathcal{E}(\pi,u). Thus, if π\pi is a ε\varepsilon-CCE of Γu\Gamma_{u}, then ρiπ\rho_{i}\pi must be a ε\varepsilon-CCE of Γρiu\Gamma_{\rho_{i}u}.

CE

For player jij\neq i, we have

jCE(ρiπ,ρiu)=\displaystyle\mathcal{E}_{j}^{\mathrm{CE}}(\rho_{i}\pi,\rho_{i}u)= maxϕj:𝒜j𝒜ja𝒜(ρiπ)(a)(ρiuj)(ϕj(aj),aj)(ρiuj)(ρiπ)\displaystyle\max_{\phi_{j}:\mathcal{A}_{j}\to\mathcal{A}_{j}}\sum_{a\in\mathcal{A}}(\rho_{i}\pi)(a)\cdot(\rho_{i}u_{j})(\phi_{j}(a_{j}),a_{-j})-(\rho_{i}u_{j})(\rho_{i}\pi)
=\displaystyle= maxϕj:𝒜j𝒜ja𝒜π(ρi1ai,ai)uj(ϕj(aj),ai,j,ρi1ai)uj(π)\displaystyle\max_{\phi_{j}:\mathcal{A}_{j}\to\mathcal{A}_{j}}\sum_{a\in\mathcal{A}}\pi(\rho_{i}^{-1}a_{i},a_{-i})\cdot u_{j}(\phi_{j}(a_{j}),a_{-i,j},\rho_{i}^{-1}a_{i})-u_{j}(\pi)
=\displaystyle= maxϕj:𝒜j𝒜ja𝒜π(ai,ai)uj(ϕj(aj),ai,j,ai)uj(π)\displaystyle\max_{\phi_{j}:\mathcal{A}_{j}\to\mathcal{A}_{j}}\sum_{a\in\mathcal{A}}\pi(a_{i},a_{-i})\cdot u_{j}(\phi_{j}(a_{j}),a_{-i,j},a_{i})-u_{j}(\pi) ,ρi is a bijection on 𝒜i\displaystyle,\text{$\rho_{i}$ is a bijection on $\mathcal{A}_{i}$}
=\displaystyle= jCE(π,u)\displaystyle\mathcal{E}_{j}^{\mathrm{CE}}(\pi,u)

For player ii, we define operator ρ¯i\bar{\rho}_{i} as (ρ¯iϕi)(ai)=ρi1ϕi(ρiai)(\bar{\rho}_{i}\phi_{i})(a_{i})=\rho_{i}^{-1}\phi_{i}(\rho_{i}a_{i}). We can verify that ρ¯i\bar{\rho}_{i} is a bijection on {ϕi:𝒜i𝒜i}\{\phi_{i}:\mathcal{A}_{i}\to\mathcal{A}_{i}\}, because ¯\bar{\cdot} is a homomorphism in the sense that ρi1¯ρi2¯=ρi2ρi1¯\overline{\rho_{i}^{1}}\circ\overline{\rho_{i}^{2}}=\overline{\rho_{i}^{2}\rho_{i}^{1}} and ¯\bar{\cdot} maps the identity mapping of 𝒜i\mathcal{A}_{i} to the identity mapping of {𝒜i𝒜i}\{\mathcal{A}_{i}\to\mathcal{A}_{i}\}. Specifically,

ρi1¯ρi2¯ϕi(ai)=(ρi1)1(ρi2¯ϕi)(ρi1ai)=(ρi1)1(ρi2)1ϕi(ρi2ρi1ai)=ρi2ρi1¯ϕi(ai),\displaystyle\overline{\rho_{i}^{1}}\circ\overline{\rho_{i}^{2}}\phi_{i}(a_{i})=(\rho_{i}^{1})^{-1}(\overline{\rho_{i}^{2}}\phi_{i})(\rho_{i}^{1}a_{i})=(\rho_{i}^{1})^{-1}(\rho_{i}^{2})^{-1}\phi_{i}(\rho_{i}^{2}\rho_{i}^{1}a_{i})=\overline{\rho_{i}^{2}\rho_{i}^{1}}\phi_{i}(a_{i}),

and

ei¯ϕi(ai)=ei1ϕi(eiai)=ϕi(ai).\displaystyle\overline{e_{i}}\phi_{i}(a_{i})=e_{i}^{-1}\phi_{i}(e_{i}a_{i})=\phi_{i}(a_{i}).

Based on ρ¯i\bar{\rho}_{i}, we have

iCE(ρiπ,ρiu)\displaystyle\mathcal{E}_{i}^{\mathrm{CE}}(\rho_{i}\pi,\rho_{i}u)
=\displaystyle= maxϕi:𝒜i𝒜ia𝒜(ρiπ)(a)(ρiui)(ϕi(ai),ai)ui(π)\displaystyle\max_{\phi_{i}:\mathcal{A}_{i}\to\mathcal{A}_{i}}\sum_{a\in\mathcal{A}}(\rho_{i}\pi)(a)\cdot(\rho_{i}u_{i})(\phi_{i}(a_{i}),a_{-i})-u_{i}(\pi)
=\displaystyle= maxϕi:𝒜i𝒜ia𝒜π(ρi1ai,ai)ui(ρi1ϕi(ai),ai)ui(π)\displaystyle\max_{\phi_{i}:\mathcal{A}_{i}\to\mathcal{A}_{i}}\sum_{a\in\mathcal{A}}\pi(\rho_{i}^{-1}a_{i},a_{-i})u_{i}(\rho_{i}^{-1}\phi_{i}(a_{i}),a_{-i})-u_{i}(\pi)
=\displaystyle= maxϕi:𝒜i𝒜ia𝒜π(ρi1ai,ai)ui(ρi1ϕi(ρi(ρi1ai)),ai)ui(π)\displaystyle\max_{\phi_{i}:\mathcal{A}_{i}\to\mathcal{A}_{i}}\sum_{a\in\mathcal{A}}\pi(\rho_{i}^{-1}a_{i},a_{-i})u_{i}(\rho_{i}^{-1}\phi_{i}(\rho_{i}(\rho_{i}^{-1}a_{i})),a_{-i})-u_{i}(\pi)
=\displaystyle= maxϕi:𝒜i𝒜ia𝒜π(ai,ai)ui(ρi1ϕi(ρiai),ai)ui(π)\displaystyle\max_{\phi_{i}:\mathcal{A}_{i}\to\mathcal{A}_{i}}\sum_{a\in\mathcal{A}}\pi(a_{i},a_{-i})u_{i}(\rho_{i}^{-1}\phi_{i}(\rho_{i}a_{i}),a_{-i})-u_{i}(\pi) ,ρi is a bijection on 𝒜i\displaystyle,\text{$\rho_{i}$ is a bijection on $\mathcal{A}_{i}$}
=\displaystyle= maxϕi:𝒜i𝒜ia𝒜π(ai,ai)ui((ρ¯iϕi)(ai),ai)ui(π)\displaystyle\max_{\phi_{i}:\mathcal{A}_{i}\to\mathcal{A}_{i}}\sum_{a\in\mathcal{A}}\pi(a_{i},a_{-i})u_{i}((\bar{\rho}_{i}\phi_{i})(a_{i}),a_{-i})-u_{i}(\pi)
=\displaystyle= maxϕi:𝒜i𝒜ia𝒜π(ai,ai)ui(ϕi(ai),ai)ui(π)\displaystyle\max_{\phi_{i}:\mathcal{A}_{i}\to\mathcal{A}_{i}}\sum_{a\in\mathcal{A}}\pi(a_{i},a_{-i})u_{i}(\phi_{i}(a_{i}),a_{-i})-u_{i}(\pi) ,ρ¯i is a bijection on {𝒜i𝒜i}\displaystyle,\text{$\bar{\rho}_{i}$ is a bijection on $\{\mathcal{A}_{i}\to\mathcal{A}_{i}\}$}
=\displaystyle= iCE(π,u)\displaystyle\mathcal{E}_{i}^{\mathrm{CE}}(\pi,u)

Thus, we have (ρiπ,ρiu)=(π,u)\mathcal{E}(\rho_{i}\pi,\rho_{i}u)=\mathcal{E}(\pi,u), thus if π\pi is a ε\varepsilon-CE of Γu\Gamma_{u}, then ρiπ\rho_{i}\pi must be a ε\varepsilon-CE of Γρiu\Gamma_{\rho_{i}u}.

A.4 Proof of Lemma 3.7 to Lemma 3.9

See 3.7

Proof.

ji,ρ0𝒢i\forall j\neq i,\rho_{0}\in\mathcal{G}_{i}, for operator 𝒪i\mathcal{O}_{i} we have

(𝒪ifNE)(ρ0u)j=\displaystyle(\mathcal{O}_{i}f^{\mathrm{NE}})(\rho_{0}u)_{j}= 1|𝒜i|!ρi𝒢ifNE(ρiρ0u)j=(a)1|𝒜i|!ρ^i𝒢ifNE(ρ^iu)j=(𝒪ifNE)(u)j\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}f^{\mathrm{NE}}(\rho_{i}\rho_{0}u)_{j}\overset{(a)}{=}\frac{1}{|\mathcal{A}_{i}|!}\sum_{\widehat{\rho}_{i}\in\mathcal{G}_{i}}f^{\mathrm{NE}}(\widehat{\rho}_{i}u)_{j}=(\mathcal{O}_{i}f^{\mathrm{NE}})(u)_{j}

where in (a)(a) we define ρ^i=ρiρ0\widehat{\rho}_{i}=\rho_{i}\rho_{0}, and (a)(a) holds since ρ0\rho_{0} is a bijection on 𝒢i\mathcal{G}_{i}. As a result, 𝒪ifNE\mathcal{O}_{i}f^{\mathrm{NE}} is ii-PI.

For operator 𝒫i\mathcal{P}_{i} we have

(𝒫ifNE)(ρ0u)i=\displaystyle(\mathcal{P}_{i}f^{\mathrm{NE}})(\rho_{0}u)_{i}= 1|𝒜i|!ρi𝒢iρi1fNE(ρiρ0u)j=ρ01|𝒜i|!ρi𝒢iρ01ρi1fNE(ρiρ0u)j\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\rho_{i}^{-1}f^{\mathrm{NE}}(\rho_{i}\rho_{0}u)_{j}=\rho_{0}\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\rho_{0}^{-1}\rho_{i}^{-1}f^{\mathrm{NE}}(\rho_{i}\rho_{0}u)_{j}
=\displaystyle= ρ01|𝒜i|!ρ^i𝒢iρ^i1fNE(ρ^iu)j=ρ0(𝒫ifNE)(u)i,\displaystyle\rho_{0}\frac{1}{|\mathcal{A}_{i}|!}\sum_{\widehat{\rho}_{i}\in\mathcal{G}_{i}}\widehat{\rho}_{i}^{-1}f^{\mathrm{NE}}(\widehat{\rho}_{i}u)_{j}=\rho_{0}(\mathcal{P}_{i}f^{\mathrm{NE}})(u)_{i},

therefore 𝒫ifNE\mathcal{P}_{i}f^{\mathrm{NE}} is ii-PE.

If fNEf^{\mathrm{NE}} is already ii-PI, ji\forall j\neq i we have

𝒪ifNE(u)j=\displaystyle\mathcal{O}_{i}f^{\mathrm{NE}}(u)_{j}= 1|𝒜i|!ρi𝒢ifNE(ρiu)j=1|𝒜i|!ρi𝒢ifNE(u)j=fNE(u)j,\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}f^{\mathrm{NE}}(\rho_{i}u)_{j}=\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}f^{\mathrm{NE}}(u)_{j}=f^{\mathrm{NE}}(u)_{j},

and 𝒪ifNE(u)i=fNE(u)i\mathcal{O}_{i}f^{\mathrm{NE}}(u)_{i}=f^{\mathrm{NE}}(u)_{i} according to definition of 𝒪i\mathcal{O}_{i}. Therefore, 𝒪ifNE=fNE\mathcal{O}_{i}f^{\mathrm{NE}}=f^{\mathrm{NE}} for ii-PI fNEf^{\mathrm{NE}}.

If fNEf^{\mathrm{NE}} is already ii-PE, we have

𝒫ifNE(u)i=\displaystyle\mathcal{P}_{i}f^{\mathrm{NE}}(u)_{i}= 1|𝒜i|!ρi𝒢iρi1fNE(ρiu)i=1|𝒜i|!ρi𝒢iρi1ρifNE(u)i=1|𝒜i|!ρi𝒢ifNE(u)i=fNE(u)i,\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\rho_{i}^{-1}f^{\mathrm{NE}}(\rho_{i}u)_{i}=\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\rho_{i}^{-1}\rho_{i}f^{\mathrm{NE}}(u)_{i}=\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}f^{\mathrm{NE}}(u)_{i}=f^{\mathrm{NE}}(u)_{i},

and ji,𝒫ifNE(u)j=fNE(u)j\forall j\neq i,\mathcal{P}_{i}f^{\mathrm{NE}}(u)_{j}=f^{\mathrm{NE}}(u)_{j} according to definition of 𝒫i\mathcal{P}_{i}. Therefore, 𝒫ifNE=fNE\mathcal{P}_{i}f^{\mathrm{NE}}=f^{\mathrm{NE}} for ii-PE fNEf^{\mathrm{NE}}.

See 3.8

Proof.

A direct inference from Lemma 3.7

See 3.9

Proof.

ρ0𝒢i\forall\rho_{0}\in\mathcal{G}_{i}, we have

(𝒬if(C)CE)(ρ0u)=\displaystyle(\mathcal{Q}_{i}f^{\mathrm{(C)CE}})(\rho_{0}u)= 1|𝒜i|!ρi𝒢iρi1f(C)CE(ρiρ0u)=ρ01|𝒜i|!ρi𝒢iρ01ρi1f(C)CE(ρiρ0u)\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\rho_{i}^{-1}f^{\mathrm{(C)CE}}(\rho_{i}\rho_{0}u)=\rho_{0}\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\rho_{0}^{-1}\rho_{i}^{-1}f^{\mathrm{(C)CE}}(\rho_{i}\rho_{0}u)
=\displaystyle= ρ01|𝒜i|!ρ^i𝒢iρ^i1f(C)CE(ρ^iu)=ρ0(𝒬if(C)CE)(u)\displaystyle\rho_{0}\frac{1}{|\mathcal{A}_{i}|!}\sum_{\widehat{\rho}_{i}\in\mathcal{G}_{i}}\widehat{\rho}_{i}^{-1}f^{\mathrm{(C)CE}}(\widehat{\rho}_{i}u)=\rho_{0}(\mathcal{Q}_{i}f^{\mathrm{(C)CE}})(u)

If f(C)CEf^{\text{(C)CE}} is already ii-PE, we have

𝒬if(C)CE(u)=\displaystyle\mathcal{Q}_{i}f^{\text{(C)CE}}(u)= 1|𝒜i|!ρi𝒢iρi1f(C)CE(ρiu)=1|𝒜i|!ρi𝒢iρi1ρif(C)CE(u)=1|𝒜i|!ρi𝒢if(C)CE(u)=f(C)CE(u)\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\rho_{i}^{-1}f^{\text{(C)CE}}(\rho_{i}u)=\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\rho_{i}^{-1}\rho_{i}f^{\text{(C)CE}}(u)=\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}f^{\text{(C)CE}}(u)=f^{\text{(C)CE}}(u)

A.5 Proof of Lemma 3.11

See 3.11

Proof.

We prove the three claims below.

  1. 1.

    𝒳𝒳𝒳\mathcal{X}\mathcal{F}_{\mathcal{X}}\subseteq\mathcal{F}_{\mathcal{X}}.

  2. 2.

    𝒳2𝒳=𝒳𝒳\mathcal{X}^{2}\mathcal{F}_{\mathcal{X}}=\mathcal{X}\mathcal{F}_{\mathcal{X}}.

  3. 3.

    If 𝒳𝒴=𝒴𝒳\mathcal{X}\mathcal{Y}=\mathcal{Y}\subseteq\mathcal{F}_{\mathcal{X}}, then 𝒴𝒳𝒳\mathcal{Y}\subseteq\mathcal{X}\mathcal{F}_{\mathcal{X}}

The first claim holds because 𝒳\mathcal{F}_{\mathcal{X}} is closed under 𝒳\mathcal{X}, and the second claim holds because 𝒳\mathcal{X} is idempotent. For the third claim, from 𝒴𝒳\mathcal{Y}\subseteq\mathcal{F}_{\mathcal{X}} we know 𝒳𝒴𝒳𝒳\mathcal{X}\mathcal{Y}\subseteq\mathcal{X}\mathcal{F}_{\mathcal{X}}, then 𝒴=𝒳𝒴𝒳𝒳\mathcal{Y}=\mathcal{X}\mathcal{Y}\subseteq\mathcal{X}\mathcal{F}_{\mathcal{X}}.

We immediately know 𝒳𝒳\mathcal{X}\mathcal{F}_{\mathcal{X}} is the largest subset of 𝒳\mathcal{F}_{\mathcal{X}} that is invariant under 𝒳\mathcal{X}. ∎

Appendix B Omitted Proofs in Section 4

B.1 Proof of Theorem 4.3

See 4.3

Some of the proof techniques come from Dütting et al. [2019] and Duan et al. [2021]. We first introduce some useful lemmas. Denote :×𝒰\ell:\mathcal{F}\times\mathcal{U}\to\mathbb{R} as the loss function (such as (f,u)(f(u),u)\ell(f,u)\coloneqq\mathcal{E}(f(u),u)). We measure the capacity of the composite function class \ell\circ\mathcal{F} using the empirical Rademacher complexity [Bartlett and Mendelson, 2002] on the training set SS, which is defined as:

S()1m𝔼𝒙{+1,1}m[supfi=1mxi(f,u(i))],\displaystyle\mathcal{R}_{S}(\ell\circ\mathcal{F})\coloneqq\frac{1}{m}\mathbb{E}_{\bm{x}\sim\{+1,-1\}^{m}}\Big{[}\sup_{f\in\mathcal{F}}\sum_{i=1}^{m}x_{i}\cdot\ell(f,u^{(i)})\Big{]},

where 𝒙\bm{x} is distributed i.i.d. according to uniform distribution in {+1,1}\{+1,-1\}. We have

Lemma B.1 (Shalev-Shwartz and Ben-David [2014]).

Let SS be a training set of size mm drawn i.i.d. from distribution 𝒟\mathcal{D} over 𝒰\mathcal{U}. Then with probability at least 1δ1-\delta over draw of SS from 𝒟\mathcal{D}, for all ff\in\mathcal{F},

𝔼u𝒟[(f,u)]1muS(l,u)2S()+42ln(4/δ)m\mathbb{E}_{u\sim\mathcal{D}}[\ell(f,u)]-\frac{1}{m}\sum_{u\in S}\ell(l,u)\leq 2\mathcal{R}_{S}(\ell\circ\mathcal{F})+4\sqrt{\frac{2\ln(4/\delta)}{m}}
Lemma B.2.

If |()|c|\ell(\cdot)|\leq c for constant c>0c>0 and f,f,|(f,u)(f,u)|Lff\forall f,f^{\prime}\in\mathcal{F},|\ell(f,u)-\ell(f^{\prime},u)|\leq L{\|f-f^{\prime}\|}_{\infty}, then we have

𝔼u𝒟[(f,u)]1muS(l,u)2infr>0{c2ln𝒩(,r)m+Lr}+42ln(4/δ)m\mathbb{E}_{u\sim\mathcal{D}}[\ell(f,u)]-\frac{1}{m}\sum_{u\in S}\ell(l,u)\leq 2\inf_{r>0}\left\{c\sqrt{\frac{2\ln\mathcal{N}_{\infty}(\mathcal{F},r)}{m}}+Lr\right\}+4\sqrt{\frac{2\ln(4/\delta)}{m}}
Proof.

For function class \mathcal{F}, let r\mathcal{F}_{r} with |r|=𝒩(,r)|\mathcal{F}_{r}|=\mathcal{N}_{\infty}(\mathcal{F},r) be the function class that rr-covers \mathcal{F} for some r>0r>0. Similarly, f\forall f\in\mathcal{F}, denote frrf_{r}\in\mathcal{F}_{r} be the function that rr-covers ff. We have

S()=\displaystyle\mathcal{R}_{S}(\ell\circ\mathcal{F})= 1m𝔼𝒙[supfi=1mxi(f,u(i))]\displaystyle\frac{1}{m}\mathbb{E}_{\bm{x}}\Big{[}\sup_{f\in\mathcal{F}}\sum_{i=1}^{m}x_{i}\cdot\ell(f,u^{(i)})\Big{]} (5)
=\displaystyle= 1m𝔼𝒙[supfi=1mxi((fr,u(i))+(f,u(i))(fr,u(i)))]\displaystyle\frac{1}{m}\mathbb{E}_{\bm{x}}\Big{[}\sup_{f\in\mathcal{F}}\sum_{i=1}^{m}x_{i}\cdot\big{(}\ell(f_{r},u^{(i)})+\ell(f,u^{(i)})-\ell(f_{r},u^{(i)})\big{)}\Big{]}
\displaystyle\leq 1m𝔼𝒙[supfrri=1mxi(fr,u(i))]+1m𝔼𝒙[supfi=1m|xiLr|]\displaystyle\frac{1}{m}\mathbb{E}_{\bm{x}}\Big{[}\sup_{f_{r}\in\mathcal{F}_{r}}\sum_{i=1}^{m}x_{i}\cdot\ell(f_{r},u^{(i)})\Big{]}+\frac{1}{m}\mathbb{E}_{\bm{x}}\Big{[}\sup_{f\in\mathcal{F}}\sum_{i=1}^{m}|x_{i}\cdot Lr|\Big{]} ,|(f,u)(fr,u)|Lffr=Lr\displaystyle,|\ell(f,u)-\ell(f_{r},u)|\leq L{\|f-f_{r}\|}_{\infty}=Lr
\displaystyle{\leq} supfrri=1m2(fr,u(i))2ln𝒩(,r)m+Lrm𝔼𝒙𝒙\displaystyle\sup_{f_{r}\in\mathcal{F}_{r}}\sqrt{\sum_{i=1}^{m}\ell^{2}(f_{r},u^{(i)})}\cdot\frac{\sqrt{2\ln\mathcal{N}_{\infty}(\mathcal{F},r)}}{m}+\frac{Lr}{m}\mathbb{E}_{\bm{x}}{\|\bm{x}\|} ,the first term holds by Massart’s lemma\displaystyle,\text{the first term holds by Massart's lemma}
\displaystyle\leq c2m2ln𝒩(,r)m+Lrm𝔼𝒙𝒙\displaystyle\sqrt{c^{2}m}\cdot\frac{\sqrt{2\ln\mathcal{N}_{\infty}(\mathcal{F},r)}}{m}+\frac{Lr}{m}\mathbb{E}_{\bm{x}}{\|\bm{x}\|}
\displaystyle\leq c2ln𝒩(,r)m+Lr,\displaystyle c\sqrt{\frac{2\ln\mathcal{N}_{\infty}(\mathcal{F},r)}{m}}+Lr,

Combining Lemma B.1 and Equation 5, we get

𝔼u𝒟[(f,u)]1muS(l,u)2infr>0{c2ln𝒩(,r)m+Lr}+42ln(4/δ)m\mathbb{E}_{u\sim\mathcal{D}}[\ell(f,u)]-\frac{1}{m}\sum_{u\in S}\ell(l,u)\leq 2\inf_{r>0}\left\{c\sqrt{\frac{2\ln\mathcal{N}_{\infty}(\mathcal{F},r)}{m}}+Lr\right\}+4\sqrt{\frac{2\ln(4/\delta)}{m}}

B.1.1 NE Approximator

Lemma B.3.

For arbitrary product mixed strategy σ\sigma and σ\sigma^{\prime}, we have

|(σ,u)(σ,u)|2nσσ,\displaystyle|\mathcal{E}(\sigma,u)-\mathcal{E}(\sigma^{\prime},u)|\leq 2n{\|\sigma-\sigma^{\prime}\|},
Proof.

σ,σ\forall\sigma,\sigma^{\prime}, we define yj(σ1,,σj1,σj+1,,σn)y_{-j}\coloneqq(\sigma_{1},\dots,\sigma_{j-1},\sigma^{\prime}_{j+1},\dots,\sigma^{\prime}_{n}). Then, i[n]\forall i\in[n] we have

|ui(σ)ui(σ)|=\displaystyle|u_{i}(\sigma)-u_{i}(\sigma^{\prime})|= |ui(σ1,σ2,,σn)ui(σ,σ2,,σn)|\displaystyle|u_{i}(\sigma_{1},\sigma_{2},\dots,\sigma_{n})-u_{i}(\sigma^{\prime},\sigma^{\prime}_{2},\dots,\sigma^{\prime}_{n})|
=\displaystyle= |j=1n(ui(σ1,,σj,σj+1,,σn)ui(σ1,,σj,σj+1,,σn))|\displaystyle\Big{|}\sum_{j=1}^{n}\Big{(}u_{i}(\sigma_{1},\dots,\sigma_{j},\sigma^{\prime}_{j+1},\dots,\sigma^{\prime}_{n})-u_{i}(\sigma_{1},\dots,\sigma^{\prime}_{j},\sigma^{\prime}_{j+1},\dots,\sigma^{\prime}_{n})\Big{)}\Big{|}
=\displaystyle= |j=1n(ui(σj,yj)ui(σj,yj))|\displaystyle\Big{|}\sum_{j=1}^{n}\Big{(}u_{i}(\sigma_{j},y_{-j})-u_{i}(\sigma^{\prime}_{j},y_{-j})\Big{)}\Big{|}
=\displaystyle= |j=1naj(σj(aj)σj(aj))ajui(aj,aj)yj(aj)|\displaystyle\Big{|}\sum_{j=1}^{n}\sum_{a_{j}}(\sigma_{j}(a_{j})-\sigma^{\prime}_{j}(a_{j}))\sum_{a_{-j}}u_{i}(a_{j},a_{-j})y_{-j}(a_{-j})\Big{|}
\displaystyle\leq j=1naj|σj(aj)σj(aj)|ajui(aj,aj)yj(aj)\displaystyle\sum_{j=1}^{n}\sum_{a_{j}}\Big{|}\sigma_{j}(a_{j})-\sigma^{\prime}_{j}(a_{j})\Big{|}\sum_{a_{-j}}u_{i}(a_{j},a_{-j})y_{-j}(a_{-j})
\displaystyle{\leq} j=1naj|σj(aj)σj(aj)|ajyj(aj)\displaystyle\sum_{j=1}^{n}\sum_{a_{j}}\Big{|}\sigma_{j}(a_{j})-\sigma^{\prime}_{j}(a_{j})\Big{|}\sum_{a_{-j}}y_{-j}(a_{-j}) ,ui()[0,1]\displaystyle,u_{i}(\cdot)\in[0,1]
\displaystyle\leq j=1najAj|σj(aj)σj(aj)|nmaxj[n]ajAj|σj(aj)σj(aj)|\displaystyle\sum_{j=1}^{n}\sum_{a_{j}\in A_{j}}\Big{|}\sigma_{j}(a_{j})-\sigma^{\prime}_{j}(a_{j})\Big{|}\leq n\max_{j\in[n]}\sum_{a_{j}\in A_{j}}\Big{|}\sigma_{j}(a_{j})-\sigma^{\prime}_{j}(a_{j})\Big{|}
=\displaystyle= nσσ,\displaystyle n{\|\sigma-\sigma^{\prime}\|},

Therefore, aiAi\forall a_{i}\in A_{i},

ui(ai,σi)ui(σ)=\displaystyle u_{i}(a_{i},\sigma_{-i})-u_{i}(\sigma)= ui(ai,σi)ui(ai,σi)+ui(ai,σi)ui(σ)+ui(σ)ui(σ)\displaystyle u_{i}(a_{i},\sigma_{-i})-u_{i}(a_{i},\sigma^{\prime}_{-i})+u_{i}(a_{i},\sigma^{\prime}_{-i})-u_{i}(\sigma^{\prime})+u_{i}(\sigma^{\prime})-u_{i}(\sigma)
\displaystyle\leq nσσ+(σ,u)+nσσ\displaystyle n{\|\sigma-\sigma^{\prime}\|}+\mathcal{E}(\sigma^{\prime},u)+n{\|\sigma-\sigma^{\prime}\|}
=\displaystyle= (σ,u)+2nσσ.\displaystyle\mathcal{E}(\sigma^{\prime},u)+2n{\|\sigma-\sigma^{\prime}\|}.

Based on that, we get

(σ,u)=maxiN,aiAi[ui(ai,σi)ui(σ)](σ,u)+2nσσ\displaystyle\mathcal{E}(\sigma,u)=\max_{i\in N,a_{i}\in A_{i}}[u_{i}(a_{i},\sigma_{-i})-u_{i}(\sigma)]\leq\mathcal{E}(\sigma^{\prime},u)+2n{\|\sigma-\sigma^{\prime}\|}

Similarly, we also have

(σ,u)(σ,u)+2nσσ\mathcal{E}(\sigma^{\prime},u)\leq\mathcal{E}(\sigma,u)+2n{\|\sigma-\sigma^{\prime}\|}

Based on Lemma B.3, f,fNE\forall f,f^{\prime}\in\mathcal{F}^{\mathrm{NE}}, we have

(f(u),u)(f(u),u)2f(u)f(u)2ff\mathcal{E}(f(u),u)-\mathcal{E}(f^{\prime}(u),u)\leq 2{\|f(u)-f^{\prime}(u)\|}\leq 2{\|f-f^{\prime}\|}_{\infty}

Considering that |()|1|\mathcal{E}(\cdot)|\leq 1, according to Lemma B.2, we have:

𝔼u𝒟[(fNE(u),u)]1muS(fNE(u),u)2infr>0{2ln𝒩(NE,r)m+2nr}+42ln(4/δ)m\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}(f^{\mathrm{NE}}(u),u)]-\frac{1}{m}\sum_{u\in S}\mathcal{E}(f^{\mathrm{NE}}(u),u)\leq 2\cdot\inf_{r>0}\Big{\{}\sqrt{\frac{2\ln\mathcal{N}_{\infty}(\mathcal{F}^{\mathrm{NE}},r)}{m}}+2nr\Big{\}}+4\sqrt{\frac{2\ln(4/\delta)}{m}}

B.1.2 CCE Approximator

Lemma B.4.

For arbitrary joint mixed strategy π\pi and π\pi^{\prime}, we have

|(π,u)(π,u)|2ππ,|\mathcal{E}(\pi,u)-\mathcal{E}(\pi^{\prime},u)|\leq 2{\|\pi-\pi^{\prime}\|},
Proof.

π,π,i[n]\forall\pi,\pi^{\prime},\forall i\in[n] we have

|ui(π)ui(π)|=a𝒜(π(a)π(a))ui(a)(a)a𝒜|π(a)π(a)|=ππ\displaystyle|u_{i}(\pi)-u_{i}(\pi^{\prime})|=\sum_{a\in\mathcal{A}}(\pi(a)-\pi^{\prime}(a))u_{i}(a)\overset{(a)}{\leq}\sum_{a\in\mathcal{A}}|\pi(a)-\pi^{\prime}(a)|={\|\pi-\pi^{\prime}\|} (6)

where (a)(a) holds since ui()[0,1]u_{i}(\cdot)\in[0,1]. Therefore, aiAi\forall a_{i}\in A_{i},

ui(ai,πi)ui(π)=\displaystyle u_{i}(a_{i},\pi_{-i})-u_{i}(\pi)= ui(ai,πi)ui(ai,πi)+ui(ai,πi)ui(π)+ui(π)ui(π)\displaystyle u_{i}(a_{i},\pi_{-i})-u_{i}(a_{i},\pi^{\prime}_{-i})+u_{i}(a_{i},\pi^{\prime}_{-i})-u_{i}(\pi^{\prime})+u_{i}(\pi^{\prime})-u_{i}(\pi)
\displaystyle\leq ππ+(π,u)+ππ\displaystyle{\|\pi-\pi^{\prime}\|}+\mathcal{E}(\pi^{\prime},u)+{\|\pi-\pi^{\prime}\|}
=\displaystyle= (π,u)+2ππ.\displaystyle\mathcal{E}(\pi^{\prime},u)+2{\|\pi-\pi^{\prime}\|}.

Based on that, we get

(π,u)=maxiN,aiAi[ui(ai,πi)ui(π)](π,u)+2ππ\displaystyle\mathcal{E}(\pi,u)=\max_{i\in N,a_{i}\in A_{i}}[u_{i}(a_{i},\pi_{-i})-u_{i}(\pi)]\leq\mathcal{E}(\pi^{\prime},u)+2{\|\pi-\pi^{\prime}\|}

Similarly, we also have

(π,u)(π,u)+2ππ\mathcal{E}(\pi^{\prime},u)\leq\mathcal{E}(\pi,u)+2{\|\pi-\pi^{\prime}\|}

Based on Lemma B.4, f,fCCE\forall f,f^{\prime}\in\mathcal{F}^{\mathrm{CCE}}, we have

(f(u),u)(f(u),u)2f(u)f(u)2ff\mathcal{E}(f(u),u)-\mathcal{E}(f^{\prime}(u),u)\leq 2{\|f(u)-f^{\prime}(u)\|}\leq 2{\|f-f^{\prime}\|}_{\infty}

Considering that |()|1|\mathcal{E}(\cdot)|\leq 1, according to Lemma B.2, we have:

𝔼u𝒟[(fCCE(u),u)]1muS(fCCE(u),u)2infr>0{2ln𝒩(CCE,r)m+2r}+42ln(4/δ)m\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}(f^{\mathrm{CCE}}(u),u)]-\frac{1}{m}\sum_{u\in S}\mathcal{E}(f^{\mathrm{CCE}}(u),u)\leq 2\cdot\inf_{r>0}\Big{\{}\sqrt{\frac{2\ln\mathcal{N}_{\infty}(\mathcal{F}^{\mathrm{CCE}},r)}{m}}+2r\Big{\}}+4\sqrt{\frac{2\ln(4/\delta)}{m}}

B.1.3 CE Approximator

Lemma B.5.

For arbitrary joint mixed strategy π\pi and π\pi^{\prime}, we have

|CE(π,u)CE(π,u)|2ππ,|\mathcal{E}^{\mathrm{CE}}(\pi,u)-\mathcal{E}^{\mathrm{CE}}(\pi^{\prime},u)|\leq 2{\|\pi-\pi^{\prime}\|},
Proof.

aiAi,ϕi\forall a_{i}\in A_{i},\forall\phi_{i}, we have

a𝒜π(a)ui(ϕ(ai),ai)ui(π)=\displaystyle\sum_{a\in\mathcal{A}}\pi(a)u_{i}(\phi(a_{i}),a_{-i})-u_{i}(\pi)= a𝒜π(a)ui(ϕ(ai),ai)a𝒜π(a)ui(ϕ(ai),ai)\displaystyle\sum_{a\in\mathcal{A}}\pi(a)u_{i}(\phi(a_{i}),a_{-i})-\sum_{a\in\mathcal{A}}\pi^{\prime}(a)u_{i}(\phi(a_{i}),a_{-i})
+a𝒜π(a)ui(ϕ(ai),ai)ui(π)+ui(π)ui(π)\displaystyle\quad+\sum_{a\in\mathcal{A}}\pi^{\prime}(a)u_{i}(\phi(a_{i}),a_{-i})-u_{i}(\pi^{\prime})+u_{i}(\pi^{\prime})-u_{i}(\pi)
\displaystyle\leq ππ+CE(π,u)+ππ\displaystyle{\|\pi-\pi^{\prime}\|}+\mathcal{E}^{\mathrm{CE}}(\pi^{\prime},u)+{\|\pi-\pi^{\prime}\|}
=\displaystyle= CE(π,u)+2ππ.\displaystyle\mathcal{E}^{\mathrm{CE}}(\pi^{\prime},u)+2{\|\pi-\pi^{\prime}\|}.

Based on that, we get

CE(π,u)=maxiNmaxϕia𝒜π(a)ui(ϕ(ai),ai)ui(π)CE(π,u)+2ππ\displaystyle\mathcal{E}^{\mathrm{CE}}(\pi,u)=\max_{i\in N}\max_{\phi_{i}}\sum_{a\in\mathcal{A}}\pi(a)u_{i}(\phi(a_{i}),a_{-i})-u_{i}(\pi)\leq\mathcal{E}^{\mathrm{CE}}(\pi^{\prime},u)+2{\|\pi-\pi^{\prime}\|}

Similarly, we also have

CE(π,u)CE(π,u)+2ππ\mathcal{E}^{\mathrm{CE}}(\pi^{\prime},u)\leq\mathcal{E}^{\mathrm{CE}}(\pi,u)+2{\|\pi-\pi^{\prime}\|}

Based on Lemma B.4, f,fCE\forall f,f^{\prime}\in\mathcal{F}^{\mathrm{CE}}, we have

CE(f(u),u)CE(f(u),u)2f(u)f(u)2ff\mathcal{E}^{\mathrm{CE}}(f(u),u)-\mathcal{E}^{\mathrm{CE}}(f^{\prime}(u),u)\leq 2{\|f(u)-f^{\prime}(u)\|}\leq 2{\|f-f^{\prime}\|}_{\infty}

Considering that |()|1|\mathcal{E}(\cdot)|\leq 1, according to Lemma B.2, we have:

𝔼u𝒟[CE(fCE(u),u)]1muSCE(fCE(u),u)2infr>0{2ln𝒩(CE,r)m+2r}+42ln(4/δ)m\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}^{\mathrm{CE}}(f^{\mathrm{CE}}(u),u)]-\frac{1}{m}\sum_{u\in S}\mathcal{E}^{\mathrm{CE}}(f^{\mathrm{CE}}(u),u)\leq 2\cdot\inf_{r>0}\Big{\{}\sqrt{\frac{2\ln\mathcal{N}_{\infty}(\mathcal{F}^{\mathrm{CE}},r)}{m}}+2r\Big{\}}+4\sqrt{\frac{2\ln(4/\delta)}{m}}

B.2 Proof of Theorem 4.4

See 4.4

Proof.

For function class \mathcal{F} of NE, CE or CCE approximators, according to Lemma B.3, Lemma B.4 and Lemma B.5, f,g\forall f,g\in\mathcal{F} we have

(CE)(f(u),u)(CE)(g(u),u)Lf(u)g(u)Lfg,\mathcal{E}^{(\mathrm{CE})}(f(u),u)-\mathcal{E}^{(\mathrm{CE})}(g(u),u)\leq L{\|f(u)-g(u)\|}\leq L{\|f-g\|}_{\infty}, (7)

where L=2nL=2n for NE approximators, and L=2L=2 for CE and CCE approximators.

For simplicity, we denote LS(f)=1muS(CE)(f(u),u)L_{S}(f)=\frac{1}{m}\sum_{u\in S}\mathcal{E}^{(\mathrm{CE})}(f(u),u) and L𝒟(f)=𝔼u𝒟[(CE)(f(u),u)]L_{\mathcal{D}}(f)=\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}^{(\mathrm{CE})}(f(u),u)]. let r\mathcal{F}_{r} with |r|=𝒩(,r)|\mathcal{F}_{r}|=\mathcal{N}_{\infty}(\mathcal{F},r) be the function class that rr-covers \mathcal{F} for some r>0r>0. ϵ(0,1)\forall\epsilon\in(0,1), by setting r=ϵ3Lr=\frac{\epsilon}{3L} we have

S𝒟m[f,|LS(f)L𝒟(f)|>ϵ]\displaystyle\mathbb{P}_{S\sim\mathcal{D}^{m}}\Big{[}\exists f\in\mathcal{F},\big{|}L_{S}(f)-L_{\mathcal{D}}(f)\big{|}>\epsilon\Big{]}
\displaystyle\leq S𝒟m[f,|LS(f)LS(fr)|+|LS(fr)L𝒟(fr)|+|L𝒟(fr)L𝒟(f)|>ϵ]\displaystyle\mathbb{P}_{S\sim\mathcal{D}^{m}}\Big{[}\exists f\in\mathcal{F},\big{|}L_{S}(f)-L_{S}({f}_{r})\big{|}+\big{|}L_{S}({f}_{r})-L_{\mathcal{D}}({f}_{r})\big{|}+\big{|}L_{\mathcal{D}}({f}_{r})-L_{\mathcal{D}}(f)\big{|}>\epsilon\Big{]}
(a)\displaystyle\overset{(a)}{\leq} S𝒟m[f,Lr+|LS(fr)L𝒟(fr)|+Lr>ϵ]\displaystyle\mathbb{P}_{S\sim\mathcal{D}^{m}}\Big{[}\exists f\in\mathcal{F},Lr+\big{|}L_{S}({f}_{r})-L_{\mathcal{D}}({f}_{r})\big{|}+Lr>\epsilon\Big{]}
\displaystyle\leq S𝒟m[frr,|LS(fr)L𝒟(fr)|>ϵ2Lr]\displaystyle\mathbb{P}_{S\sim\mathcal{D}^{m}}\Big{[}\exists{f}_{r}\in{\mathcal{F}}_{r},\big{|}L_{S}({f}_{r})-L_{\mathcal{D}}({f}_{r})\big{|}>\epsilon-2Lr\Big{]}
(b)\displaystyle\overset{(b)}{\leq} 𝒩(,r)S𝒟m[|LS(f)L𝒟(f)|>ϵ2Lr]\displaystyle\mathcal{N}_{\infty}(\mathcal{F},r)\mathbb{P}_{S\sim\mathcal{D}^{m}}\Big{[}\big{|}L_{S}({f})-L_{\mathcal{D}}({f})\big{|}>\epsilon-2Lr\Big{]}
(c)\displaystyle\overset{(c)}{\leq} 2𝒩(,r)exp(2m(ϵ2Lr)2),\displaystyle 2\mathcal{N}_{\infty}(\mathcal{F},r)\exp(-2m(\epsilon-2Lr)^{2}),
=\displaystyle= 2𝒩(,ϵ3L)exp(29mϵ2)\displaystyle 2\mathcal{N}_{\infty}(\mathcal{F},\frac{\epsilon}{3L})\exp(-\frac{2}{9}m\epsilon^{2})

where (a)(a) holds by Equation 7, (b)(b) holds by union bound, and (c)(c) holds by Hoeffding inequality. As a result, when m92ϵ2(ln2δ+ln𝒩(,ϵ3L))m\geq\frac{9}{2\epsilon^{2}}\left(\ln\frac{2}{\delta}+\ln\mathcal{N}_{\infty}(\mathcal{F},\frac{\epsilon}{3L})\right), we have S𝒟m[f,|LS(f)L𝒟(f)|>ϵ]<δ\mathbb{P}_{S\sim\mathcal{D}^{m}}\Big{[}\exists f\in\mathcal{F},\Big{|}L_{S}(f)-L_{\mathcal{D}}(f)\Big{|}>\epsilon\Big{]}<\delta. ∎

B.3 Proof of Theorem 4.5

See 4.5

We first provide an auxiliary lemma.

Lemma B.6.

For function class \mathcal{F} and orbit averaging operator 𝒳\mathcal{X}, if f,g,(𝒳f,𝒳g)(f,g)\forall f,g\in\mathcal{F},\ell_{\infty}(\mathcal{X}f,\mathcal{X}g)\leq\ell_{\infty}(f,g), then 𝒩(𝒳,r)𝒩(,r)\mathcal{N}_{\infty}(\mathcal{X}\mathcal{F},r)\leq\mathcal{N}_{\infty}(\mathcal{F},r) for any r>0r>0.

Proof.

r>0\forall r>0, Denote r\mathcal{F}_{r} as the smallest rr-covering set that covers \mathcal{F} with size 𝒩(,r)\mathcal{N}_{\infty}(\mathcal{F},r). f\forall f\in\mathcal{F}, let frrf_{r}\in\mathcal{F}_{r} be the function that rr-covers ff. We have (𝒳fr,𝒳f)(fr,f)r\ell_{\infty}(\mathcal{X}f_{r},\mathcal{X}f)\leq\ell_{\infty}(f_{r},f)\leq r. Therefore, 𝒳r\mathcal{X}\mathcal{F}_{r} is a rr-covering set of 𝒳\mathcal{X}\mathcal{F}, and we have 𝒩(𝒳,r)|𝒳r||r|=𝒩\mathcal{N}_{\infty}(\mathcal{X}\mathcal{F},r)\leq|\mathcal{X}\mathcal{F}_{r}|\leq|\mathcal{F}_{r}|=\mathcal{N}_{\infty}. ∎

Proof of Theorem 4.5.

For player i[n]i\in[n] and fNE,gNENE\forall f^{\mathrm{NE}},g^{\mathrm{NE}}\in\mathcal{F}^{\mathrm{NE}}, assuming 𝒰\mathcal{U} is closed under any ρi𝒢i\rho_{i}\in\mathcal{G}_{i}. For 𝒪i\mathcal{O}_{i},

l(𝒪ifNE,𝒪igNE)=\displaystyle l_{\infty}(\mathcal{O}_{i}f^{\mathrm{NE}},\mathcal{O}_{i}g^{\mathrm{NE}})= maxu𝒰𝒪ifNE(u)𝒪igNE(u)\displaystyle\max_{u\in\mathcal{U}}\|\mathcal{O}_{i}f^{\mathrm{NE}}(u)-\mathcal{O}_{i}g^{\mathrm{NE}}(u)\|
=\displaystyle= maxj[n]maxu𝒰(𝒪ifNE)(u)j(𝒪igNE)(u)j\displaystyle\max_{j\in[n]}\max_{u\in\mathcal{U}}\|(\mathcal{O}_{i}f^{\mathrm{NE}})(u)_{j}-(\mathcal{O}_{i}g^{\mathrm{NE}})(u)_{j}\|
=\displaystyle= max{maxu𝒰fNE(u)igNE(u)i,maxjimaxu𝒰1|𝒜i|!ρi𝒢i(fNE(ρiu)jgNE(ρiu)j)}\displaystyle\max\Big{\{}\max_{u\in\mathcal{U}}\|f^{\mathrm{NE}}(u)_{i}-g^{\mathrm{NE}}(u)_{i}\|,\leavevmode\nobreak\ \max_{j\neq i}\max_{u\in\mathcal{U}}\|\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}(f^{\mathrm{NE}}(\rho_{i}u)_{j}-g^{\mathrm{NE}}(\rho_{i}u)_{j})\|\Big{\}}
\displaystyle\leq max{maxu𝒰fNE(u)igNE(u)i,maxji1|𝒜i|!ρi𝒢imaxu𝒰fNE(ρiu)jgNE(ρiu)j}\displaystyle\max\Big{\{}\max_{u\in\mathcal{U}}\|f^{\mathrm{NE}}(u)_{i}-g^{\mathrm{NE}}(u)_{i}\|,\leavevmode\nobreak\ \max_{j\neq i}\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\max_{u\in\mathcal{U}}\|f^{\mathrm{NE}}(\rho_{i}u)_{j}-g^{\mathrm{NE}}(\rho_{i}u)_{j}\|\Big{\}}
=\displaystyle= max{maxu𝒰fNE(u)igNE(u)i,maxji1|𝒜i|!ρi𝒢imaxu𝒰fNE(u)jgNE(u)j}\displaystyle\max\Big{\{}\max_{u\in\mathcal{U}}\|f^{\mathrm{NE}}(u)_{i}-g^{\mathrm{NE}}(u)_{i}\|,\leavevmode\nobreak\ \max_{j\neq i}\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\max_{u\in\mathcal{U}}\|f^{\mathrm{NE}}(u)_{j}-g^{\mathrm{NE}}(u)_{j}\|\Big{\}}
=\displaystyle= max{maxu𝒰fNE(u)igNE(u)i,maxjimaxufNE(u)jgNE(u)j}\displaystyle\max\Big{\{}\max_{u\in\mathcal{U}}\|f^{\mathrm{NE}}(u)_{i}-g^{\mathrm{NE}}(u)_{i}\|,\leavevmode\nobreak\ \max_{j\neq i}\max_{u}\|f^{\mathrm{NE}}(u)_{j}-g^{\mathrm{NE}}(u)_{j}\|\Big{\}}
=\displaystyle= l(fNE,gNE)\displaystyle l_{\infty}(f^{\mathrm{NE}},g^{\mathrm{NE}})

Since 𝒪=𝒪1𝒪n\mathcal{O}=\mathcal{O}_{1}\circ\cdots\circ\mathcal{O}_{n}, we have

(𝒪fNE,𝒪gNE)(fNE,gNE).\ell_{\infty}(\mathcal{O}f^{\mathrm{NE}},\mathcal{O}g^{\mathrm{NE}})\leq\ell_{\infty}(f^{\mathrm{NE}},g^{\mathrm{NE}}). (8)

For 𝒫i\mathcal{P}_{i},

l(𝒫ifNE,𝒫igNE)=\displaystyle l_{\infty}(\mathcal{P}_{i}f^{\mathrm{NE}},\mathcal{P}_{i}g^{\mathrm{NE}})= maxu𝒰maxj[n](𝒫ifNE)(u)j(𝒫igNE)(u)j\displaystyle\max_{u\in\mathcal{U}}\max_{j\in[n]}\|(\mathcal{P}_{i}f^{\mathrm{NE}})(u)_{j}-(\mathcal{P}_{i}g^{\mathrm{NE}})(u)_{j}\|
=\displaystyle= max{maxjimaxufNE(u)jgNE(u)j,maxu1|𝒜i|!ρi𝒢iρi1(fNE(ρiu)igNE(ρiu)i)}\displaystyle\max\Big{\{}\max_{j\neq i}\max_{u}\|f^{\mathrm{NE}}(u)_{j}-g^{\mathrm{NE}}(u)_{j}\|,\leavevmode\nobreak\ \max_{u}\|\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\rho_{i}^{-1}(f^{\mathrm{NE}}(\rho_{i}u)_{i}-g^{\mathrm{NE}}(\rho_{i}u)_{i})\|\Big{\}}
=\displaystyle= max{maxjimaxufNE(u)jgNE(u)j,maxu1|𝒜i|!ρi𝒢i(fNE(ρiu)igNE(ρiu)i)}\displaystyle\max\Big{\{}\max_{j\neq i}\max_{u}\|f^{\mathrm{NE}}(u)_{j}-g^{\mathrm{NE}}(u)_{j}\|,\leavevmode\nobreak\ \max_{u}\|\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}(f^{\mathrm{NE}}(\rho_{i}u)_{i}-g^{\mathrm{NE}}(\rho_{i}u)_{i})\|\Big{\}}
\displaystyle\leq max{maxjimaxufNE(u)jgNE(u)j,1|𝒜i|!ρi𝒢imaxufNE(ρiu)igNE(ρiu)i}\displaystyle\max\Big{\{}\max_{j\neq i}\max_{u}\|f^{\mathrm{NE}}(u)_{j}-g^{\mathrm{NE}}(u)_{j}\|,\leavevmode\nobreak\ \frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\max_{u}\|f^{\mathrm{NE}}(\rho_{i}u)_{i}-g^{\mathrm{NE}}(\rho_{i}u)_{i}\|\Big{\}}
=\displaystyle= max{maxjimaxufNE(u)jgNE(u)j,1|𝒜i|!ρi𝒢imaxufNE(u)igNE(u)i}\displaystyle\max\Big{\{}\max_{j\neq i}\max_{u}\|f^{\mathrm{NE}}(u)_{j}-g^{\mathrm{NE}}(u)_{j}\|,\leavevmode\nobreak\ \frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\max_{u}\|f^{\mathrm{NE}}(u)_{i}-g^{\mathrm{NE}}(u)_{i}\|\Big{\}}
=\displaystyle= max{maxjimaxufNE(u)jgNE(u)j,maxufNE(u)igNE(u)i}\displaystyle\max\Big{\{}\max_{j\neq i}\max_{u}\|f^{\mathrm{NE}}(u)_{j}-g^{\mathrm{NE}}(u)_{j}\|,\leavevmode\nobreak\ \max_{u}\|f^{\mathrm{NE}}(u)_{i}-g^{\mathrm{NE}}(u)_{i}\|\Big{\}}
=\displaystyle= l(fNE,gNE)\displaystyle l_{\infty}(f^{\mathrm{NE}},g^{\mathrm{NE}})

Since 𝒫=𝒫1𝒫n\mathcal{P}=\mathcal{P}_{1}\circ\cdots\circ\mathcal{P}_{n}, we have

(𝒫fNE,𝒫gNE)(fNE,gNE).\ell_{\infty}(\mathcal{P}f^{\mathrm{NE}},\mathcal{P}g^{\mathrm{NE}})\leq\ell_{\infty}(f^{\mathrm{NE}},g^{\mathrm{NE}}). (9)

For CE or CCE approximator f(C)CE(C)CEf^{\mathrm{(C)CE}}\in\mathcal{F}^{\mathrm{(C)CE}} and 𝒬i\mathcal{Q}_{i}, we have

l(𝒬if(C)CE,𝒬ig(C)CE)=\displaystyle l_{\infty}(\mathcal{Q}_{i}f^{\mathrm{(C)CE}},\mathcal{Q}_{i}g^{\mathrm{(C)CE}})= maxu𝒰(𝒬if(C)CE)(u)(𝒬ig(C)CE)(u)\displaystyle\max_{u\in\mathcal{U}}\|(\mathcal{Q}_{i}f^{\mathrm{(C)CE}})(u)-(\mathcal{Q}_{i}g^{\mathrm{(C)CE}})(u)\|
=\displaystyle= maxu1|𝒜i|!ρi𝒢iρi1(f(C)CE(ρiu)g(C)CE(ρiu))\displaystyle\max_{u}\|\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\rho_{i}^{-1}(f^{\mathrm{(C)CE}}(\rho_{i}u)-g^{\mathrm{(C)CE}}(\rho_{i}u))\|
\displaystyle\leq maxu1|𝒜i|!ρi𝒢iρi1(f(C)CE(ρiu)g(C)CE(ρiu))\displaystyle\max_{u}\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\|\rho_{i}^{-1}(f^{\mathrm{(C)CE}}(\rho_{i}u)-g^{\mathrm{(C)CE}}(\rho_{i}u))\|
\displaystyle\leq 1|𝒜i|!ρi𝒢imaxuρi1(f(C)CE(ρiu)g(C)CE(ρiu))\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\max_{u}\|\rho_{i}^{-1}(f^{\mathrm{(C)CE}}(\rho_{i}u)-g^{\mathrm{(C)CE}}(\rho_{i}u))\|
=\displaystyle= 1|𝒜i|!ρi𝒢imaxuf(C)CE(ρiu)g(C)CE(ρiu)\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\max_{u}\|f^{\mathrm{(C)CE}}(\rho_{i}u)-g^{\mathrm{(C)CE}}(\rho_{i}u)\|
=\displaystyle= 1|𝒜i|!ρi𝒢imaxuf(C)CE(u)g(C)CE(u)\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\max_{u}\|f^{\mathrm{(C)CE}}(u)-g^{\mathrm{(C)CE}}(u)\|
=\displaystyle= l(f(C)CE,g(C)CE)\displaystyle l_{\infty}(f^{\mathrm{(C)CE}},g^{\mathrm{(C)CE}})

Since 𝒬=𝒬1𝒬n\mathcal{Q}=\mathcal{Q}_{1}\circ\cdots\circ\mathcal{Q}_{n}, we have

(𝒬f(C)CE,𝒬g(C)CE)(f(C)CE,g(C)CE).\ell_{\infty}(\mathcal{Q}f^{\mathrm{(C)CE}},\mathcal{Q}g^{\mathrm{(C)CE}})\leq\ell_{\infty}(f^{\mathrm{(C)CE}},g^{\mathrm{(C)CE}}). (10)

Combing Lemma B.6, Equation 8, Equation 9 and Equation 10, we finish the proof. ∎

B.4 Proof of Theorem 4.6

See 4.6

We first prove a lemma about the property of i(π,u)\mathcal{E}_{i}(\pi,u) and iCE(π,u)\mathcal{E}_{i}^{\mathrm{CE}}(\pi,u).

Lemma B.7.

i(π,u)\mathcal{E}_{i}(\pi,u) and iCE(π,u)\mathcal{E}_{i}^{\mathrm{CE}}(\pi,u) are convex on π\pi, i.e.

pi(CE)(π1,u)+(1p)i(CE)(π2,u)i(CE)(pπ1+(1p)π2,u),p[0,1]\displaystyle p\mathcal{E}_{i}^{\text{(CE)}}(\pi_{1},u)+(1-p)\mathcal{E}_{i}^{\text{(CE)}}(\pi_{2},u)\geq\mathcal{E}_{i}^{\text{(CE)}}(p\pi_{1}+(1-p)\pi_{2},u),\quad\forall p\in[0,1]
Proof.

We recall the definition i(π,u)=maxai𝒜iui(ai,πi)ui(π)\mathcal{E}_{i}(\pi,u)=\max_{a_{i}\in\mathcal{A}_{i}}u_{i}(a_{i},\pi_{-i})-u_{i}(\pi) for CCE approximator and iCE(π,u)=maxϕi𝒜i𝒜iaπ(a)ui(ϕi(ai),ai)ui(π)\mathcal{E}_{i}^{\mathrm{CE}}(\pi,u)=\max_{\phi_{i}\in\mathcal{A}_{i}\to\mathcal{A}_{i}}\sum_{a}\pi(a)u_{i}(\phi_{i}(a_{i}),a_{-i})-u_{i}(\pi) for CE approximator. ui(ai,πi)u_{i}(a_{i},\pi_{-i}) is linear on π\pi. Given ϕ\phi, aπ(a)ui(ϕi(ai),ai)\sum_{a}\pi(a)u_{i}(\phi_{i}(a_{i}),a_{-i}) is also linear on π\pi. Moreover, the maximum operator on a set of linear functions will induce a convex function.

Proof of Theorem 4.6.

For f(C)CEf\in\mathcal{F}^{\mathrm{(C)CE}} and i,j[n]\forall i,j\in[n],

𝔼u𝒟[i(CE)(𝒬jf(u),u)]=\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{i}^{\text{(CE)}}(\mathcal{Q}_{j}f(u),u)]= 𝔼u𝒟[i(CE)(1|𝒜j|!ρj𝒢jρj1f(ρju),u)]\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{i}^{\text{(CE)}}(\frac{1}{|\mathcal{A}_{j}|!}\sum_{\rho_{j}\in\mathcal{G}_{j}}\rho_{j}^{-1}f(\rho_{j}u),u)] ,by definition\displaystyle,\text{by definition}
\displaystyle\leq 1|𝒜j|!ρj𝒢j𝔼u𝒟[i(CE)(ρj1f(ρju),u)]\displaystyle\frac{1}{|\mathcal{A}_{j}|!}\sum_{\rho_{j}\in\mathcal{G}_{j}}\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{i}^{\text{(CE)}}(\rho_{j}^{-1}f(\rho_{j}u),u)] ,by convexity\displaystyle,\text{by convexity}
=\displaystyle= 1|𝒜j|!ρj𝒢j𝔼v𝒟[i(CE)(ρj1f(v),ρj1v)]\displaystyle\frac{1}{|\mathcal{A}_{j}|!}\sum_{\rho_{j}\in\mathcal{G}_{j}}\mathbb{E}_{v\sim\mathcal{D}}[\mathcal{E}_{i}^{\text{(CE)}}(\rho_{j}^{-1}f(v),\rho_{j}^{-1}v)] ,let v=ρju\displaystyle,\text{let $v=\rho_{j}u$}
=\displaystyle= 1|𝒜j|!ρj𝒢j𝔼v𝒟[i(CE)(f(v),v)]\displaystyle\frac{1}{|\mathcal{A}_{j}|!}\sum_{\rho_{j}\in\mathcal{G}_{j}}\mathbb{E}_{v\sim\mathcal{D}}[\mathcal{E}_{i}^{\text{(CE)}}(f(v),v)] ,invariance of i(CE)(π,u) under ρj1𝒢j\displaystyle,\text{invariance of $\mathcal{E}_{i}^{\text{(CE)}}(\pi,u)$ under $\rho_{j}^{-1}\in\mathcal{G}_{j}$}
=\displaystyle= 𝔼u𝒟[i(CE)(f(u),u)]\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{i}^{\text{(CE)}}(f(u),u)]

Since 𝒬=i𝒬i\mathcal{Q}=\circ_{i}\mathcal{Q}_{i} and =maxii\mathcal{E}=\max_{i}\mathcal{E}_{i}, we have

𝔼u𝒟[(𝒬f(u),u)]𝔼u𝒟[(f(u),u)]\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}(\mathcal{Q}f(u),u)]\leq\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}(f(u),u)]

B.5 Proof of Theorem 4.7

See 4.7

Proof.

We only prove for the 𝒫\mathcal{P}-projected case; the proof for 𝒪\mathcal{O}-projected case is similar and therefore omitted.

Recall

i(σ,u)=maxai𝒜iui(ai,σi)ui(σ)\displaystyle\mathcal{E}_{i}(\sigma,u)=\max_{a_{i}\in\mathcal{A}_{i}}u_{i}(a_{i},\sigma_{-i})-u_{i}(\sigma)

Denote u1(σ)+u2(σ)cu_{1}(\sigma)+u_{2}(\sigma)\equiv c, then

ii(σ,u)=maxa1𝒜1,a2𝒜2u1(a1,σ2)+u2(a2,σ1)c\displaystyle\sum_{i}\mathcal{E}_{i}(\sigma,u)=\max_{a_{1}\in\mathcal{A}_{1},a_{2}\in\mathcal{A}_{2}}u_{1}(a_{1},\sigma_{2})+u_{2}(a_{2},\sigma_{1})-c

Then we have

𝔼u𝒟[ii((𝒫fNE)(u),u)]=\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\sum_{i}\mathcal{E}_{i}((\mathcal{P}f^{\mathrm{NE}})(u),u)]= 𝔼u𝒟[maxa1,a2u1(a1,(𝒫fNE)(u)2)+u2(a2,(𝒫fNE)(u)1)c]\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\max_{a_{1},a_{2}}u_{1}(a_{1},(\mathcal{P}f^{\mathrm{NE}})(u)_{2})+u_{2}(a_{2},(\mathcal{P}f^{\mathrm{NE}})(u)_{1})-c]
=\displaystyle= 𝔼u𝒟[maxa1u1(a1,(𝒫fNE)(u)2)]+𝔼u𝒟[maxa2u2(a2,(𝒫fNE)(u)1)]c\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\max_{a_{1}}u_{1}(a_{1},(\mathcal{P}f^{\mathrm{NE}})(u)_{2})]+\mathbb{E}_{u\sim\mathcal{D}}[\max_{a_{2}}u_{2}(a_{2},(\mathcal{P}f^{\mathrm{NE}})(u)_{1})]-c

For the first term,

𝔼u𝒟[maxa1u1(a1,(𝒫fNE)(u)2)]=\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\max_{a_{1}}u_{1}(a_{1},(\mathcal{P}f^{\mathrm{NE}})(u)_{2})]= 𝔼u𝒟[maxa1u1(a1,1|𝒜2|!ρ2𝒢2ρ21fNE(ρ2u)2)]\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\max_{a_{1}}u_{1}(a_{1},\frac{1}{|\mathcal{A}_{2}|!}\sum_{\rho_{2}\in\mathcal{G}_{2}}\rho_{2}^{-1}f^{\mathrm{NE}}(\rho_{2}u)_{2})]
\displaystyle\leq 1|𝒜2|!ρ2𝒢2𝔼u𝒟[maxa1u1(a1,ρ21fNE(ρ2u)2)]\displaystyle\frac{1}{|\mathcal{A}_{2}|!}\sum_{\rho_{2}\in\mathcal{G}_{2}}\mathbb{E}_{u\sim\mathcal{D}}[\max_{a_{1}}u_{1}(a_{1},\rho_{2}^{-1}f^{\mathrm{NE}}(\rho_{2}u)_{2})]
=\displaystyle= 1|𝒜2|!ρ2𝒢2𝔼v𝒟[maxa1(ρ21v)1(a1,ρ21fNE(v)2)]\displaystyle\frac{1}{|\mathcal{A}_{2}|!}\sum_{\rho_{2}\in\mathcal{G}_{2}}\mathbb{E}_{v\sim\mathcal{D}}[\max_{a_{1}}(\rho_{2}^{-1}v)_{1}(a_{1},\rho_{2}^{-1}f^{\mathrm{NE}}(v)_{2})]
=\displaystyle= 1|𝒜2|!ρ2𝒢2𝔼v𝒟[maxa1v1(a1,fNE(v)2)]\displaystyle\frac{1}{|\mathcal{A}_{2}|!}\sum_{\rho_{2}\in\mathcal{G}_{2}}\mathbb{E}_{v\sim\mathcal{D}}[\max_{a_{1}}v_{1}(a_{1},f^{\mathrm{NE}}(v)_{2})]
=\displaystyle= 𝔼u𝒟[maxa1u1(a1,fNE(u)2)]\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\max_{a_{1}}u_{1}(a_{1},f^{\mathrm{NE}}(u)_{2})]

Similarly, for the second term,

𝔼u𝒟[maxa2u2(a2,(𝒫fNE)(u)1)]𝔼u𝒟[maxa2u2(a2,fNE(u)1)]\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\max_{a_{2}}u_{2}(a_{2},(\mathcal{P}f^{\mathrm{NE}})(u)_{1})]\leq\mathbb{E}_{u\sim\mathcal{D}}[\max_{a_{2}}u_{2}(a_{2},f^{\mathrm{NE}}(u)_{1})]

Above all,

𝔼u𝒟[ii((𝒫fNE)(u),u)]=\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\sum_{i}\mathcal{E}_{i}((\mathcal{P}f^{\mathrm{NE}})(u),u)]= 𝔼u𝒟[maxa1u1(a1,(𝒫fNE)(u)2)]+𝔼u𝒟[maxa2u2(a2,(𝒫fNE)(u)1)]c\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\max_{a_{1}}u_{1}(a_{1},(\mathcal{P}f^{\mathrm{NE}})(u)_{2})]+\mathbb{E}_{u\sim\mathcal{D}}[\max_{a_{2}}u_{2}(a_{2},(\mathcal{P}f^{\mathrm{NE}})(u)_{1})]-c
\displaystyle\leq 𝔼u𝒟[maxa1u1(a1,fNE(u)2)]+𝔼u𝒟[maxa2u2(a2,fNE(u)1)]c\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\max_{a_{1}}u_{1}(a_{1},f^{\mathrm{NE}}(u)_{2})]+\mathbb{E}_{u\sim\mathcal{D}}[\max_{a_{2}}u_{2}(a_{2},f^{\mathrm{NE}}(u)_{1})]-c
=\displaystyle= 𝔼u𝒟[ii(fNE(u),u)]\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\sum_{i}\mathcal{E}_{i}(f^{\mathrm{NE}}(u),u)]

B.6 Proof of Theorem 4.8

See 4.8

We first introduce a useful lemma. It is about the property of i(σ,u)\mathcal{E}_{i}(\sigma,u)

Lemma B.8.

i(σ,u)\mathcal{E}_{i}(\sigma,u) is

  1. 1.

    Linear on σi\sigma_{i}, i.e.

    pi((σi1,σi),u)+(1p)i((σi2,σi),u)=i((pσi1+(1p)σi2,σi),u),p[0,1]p\mathcal{E}_{i}((\sigma_{i}^{1},\sigma_{-i}),u)+(1-p)\mathcal{E}_{i}((\sigma_{i}^{2},\sigma_{-i}),u)=\mathcal{E}_{i}((p\sigma_{i}^{1}+(1-p)\sigma_{i}^{2},\sigma_{-i}),u),\leavevmode\nobreak\ \forall p\in[0,1]
  2. 2.

    Convex on σj\sigma_{j}, i.e.

    pi((σj1,σj),u)+(1p)i((σj2,σj),u)i((pσj1+(1p)σj2,σj),u),p[0,1],jip\mathcal{E}_{i}((\sigma_{j}^{1},\sigma_{-j}),u)+(1-p)\mathcal{E}_{i}((\sigma_{j}^{2},\sigma_{-j}),u)\geq\mathcal{E}_{i}((p\sigma_{j}^{1}+(1-p)\sigma_{j}^{2},\sigma_{-j}),u),\leavevmode\nobreak\ \forall p\in[0,1],j\neq i
Proof.

We recall the definition i(σ,u)=maxai𝒜iui(ai,σi)ui(σ)\mathcal{E}_{i}(\sigma,u)=\max_{a_{i}\in\mathcal{A}_{i}}u_{i}(a_{i},\sigma_{-i})-u_{i}(\sigma). Notice that ui(σ)u_{i}(\sigma) is linear on σk\sigma_{k} for all k[n]k\in[n], thus both ui(ai,σi)u_{i}(a_{i},\sigma_{-i}) and ui(σ)u_{i}(\sigma) are linear on σk\sigma_{k} for any k[n]k\in[n]. Moreover, the maximum operator on a set of linear functions will induce a convex function.

Proof of Theorem 4.8.

We prove the theorem in two steps.

Step 1

First, we show that

𝔼u𝒟[i((𝒫ifNE)(u),u)]=𝔼u𝒟[i(fNE(u),u)],fNENE\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{i}((\mathcal{P}_{i}f^{\mathrm{NE}})(u),u)]=\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{i}(f^{\mathrm{NE}}(u),u)],\quad\forall f^{\mathrm{NE}}\in\mathcal{F}^{\mathrm{NE}}

By definition,

𝔼u𝒟[i(𝒫ifNE(u),u)]\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{i}(\mathcal{P}_{i}f^{\mathrm{NE}}(u),u)]
=\displaystyle= 𝔼u𝒟[i((1|𝒜i|!ρi𝒢iρi1f(ρiu)i,f(u)i),u)]\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{i}((\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\rho_{i}^{-1}f(\rho_{i}u)_{i},f(u)_{-i}),u)]
=\displaystyle= 1|𝒜i|!ρi𝒢i𝔼u𝒟[i((ρi1f(ρiu)i,f(u)i),u)]\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{i}((\rho_{i}^{-1}f(\rho_{i}u)_{i},f(u)_{-i}),u)] ,by linearity of i(σ,u) on σi\displaystyle,\text{by linearity of $\mathcal{E}_{i}(\sigma,u)$ on $\sigma_{i}$}
=\displaystyle= 1|𝒜i|!ρi𝒢i𝔼v𝒟[i((ρi1f(v)i,f(ρi1v)i),ρi1v)]\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\mathbb{E}_{v\sim\mathcal{D}}[\mathcal{E}_{i}((\rho_{i}^{-1}f(v)_{i},f(\rho_{i}^{-1}v)_{-i}),\rho_{i}^{-1}v)] ,let v=ρiu and use the invariance of 𝒟\displaystyle,\text{let $v=\rho_{i}u$ and use the invariance of $\mathcal{D}$}
=\displaystyle= 1|𝒜i|!ρi𝒢i𝔼v𝒟[i((ρi1f(v)i,f(v)i),ρi1v)]\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\mathbb{E}_{v\sim\mathcal{D}}[\mathcal{E}_{i}((\rho_{i}^{-1}f(v)_{i},f(v)_{-i}),\rho_{i}^{-1}v)] ,OPI of f\displaystyle,\text{OPI of $f$}
=\displaystyle= 1|𝒜i|!ρi𝒢i𝔼u𝒟[i((f(u)i,f(u)i),u)]\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{i}((f(u)_{i},f(u)_{-i}),u)] ,invariance of i(σ,u) under ρi1𝒢i\displaystyle,\text{invariance of $\mathcal{E}_{i}(\sigma,u)$ under $\rho_{i}^{-1}\in\mathcal{G}_{i}$}
=\displaystyle= 𝔼u𝒟[i(fNE(u),u)]\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{i}(f^{\mathrm{NE}}(u),u)]
Step 2

Then we show that

𝔼u𝒟[j((𝒫ifNE)(u),u)]𝔼u𝒟[j(fNE(u),u)],fNENE,ji\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{j}((\mathcal{P}_{i}f^{\mathrm{NE}})(u),u)]\leq\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{j}(f^{\mathrm{NE}}(u),u)],\quad\forall f^{\mathrm{NE}}\in\mathcal{F}^{\mathrm{NE}},j\neq i
𝔼u𝒟[j((𝒫ifNE)(u),u)]\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{j}((\mathcal{P}_{i}f^{\mathrm{NE}})(u),u)]
=\displaystyle= 𝔼u𝒟[j((1|𝒜i|!ρi𝒢iρi1f(ρiu)i,f(u)i),u)]\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{j}((\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\rho_{i}^{-1}f(\rho_{i}u)_{i},f(u)_{-i}),u)]
\displaystyle\leq 1|𝒜i|!ρi𝒢i𝔼u𝒟[j((ρi1f(ρiu)i,f(u)i),u)]\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{j}((\rho_{i}^{-1}f(\rho_{i}u)_{i},f(u)_{-i}),u)] ,by convexity of j(σ,u) on σi\displaystyle,\text{by convexity of $\mathcal{E}_{j}(\sigma,u)$ on $\sigma_{i}$}
=\displaystyle= 1|𝒜i|!ρi𝒢i𝔼v𝒟[j((ρi1f(v)i,f(ρi1v)i),ρi1v)]\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\mathbb{E}_{v\sim\mathcal{D}}[\mathcal{E}_{j}((\rho_{i}^{-1}f(v)_{i},f(\rho_{i}^{-1}v)_{-i}),\rho_{i}^{-1}v)] ,let v=ρiu and use the invariance of 𝒟\displaystyle,\text{let $v=\rho_{i}u$ and use the invariance of $\mathcal{D}$}
=\displaystyle= 1|𝒜i|!ρi𝒢i𝔼v𝒟[j((ρi1f(v)i,f(v)i),ρi1v)]\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\mathbb{E}_{v\sim\mathcal{D}}[\mathcal{E}_{j}((\rho_{i}^{-1}f(v)_{i},f(v)_{-i}),\rho_{i}^{-1}v)] ,OPI of f\displaystyle,\text{OPI of $f$}
=\displaystyle= 1|𝒜i|!ρi𝒢i𝔼u𝒟[j((f(u)i,f(u)i),u)]\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{j}((f(u)_{i},f(u)_{-i}),u)] ,invariance of j(σ,u) under ρi1𝒢i\displaystyle,\text{invariance of $\mathcal{E}_{j}(\sigma,u)$ under $\rho_{i}^{-1}\in\mathcal{G}_{i}$}
=\displaystyle= 𝔼u𝒟[j(fNE(u),u)]\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{j}(f^{\mathrm{NE}}(u),u)]

Since 𝒫=i𝒫i\mathcal{P}=\circ_{i}\mathcal{P}_{i} and =maxii\mathcal{E}=\max_{i}\mathcal{E}_{i}, we have

𝔼u𝒟[((𝒫fNE)(u),u)]𝔼u𝒟[(fNE(u),u)]\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}((\mathcal{P}f^{\mathrm{NE}})(u),u)]\leq\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}(f^{\mathrm{NE}}(u),u)]

B.7 Proof of Theorem 4.9

See 4.9

Proof.

We prove the theorem in two steps, similar to the proof of Theorem 4.8.

Step 1

First we show that for player i{1,2}i\in\{1,2\}, let {j}={1,2}\{i}\{j\}=\{1,2\}\backslash\{i\},

𝔼u𝒟[i((𝒪ifNE)(u),u)]𝔼u𝒟[i(fNE(u),u)]\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{i}((\mathcal{O}_{i}f^{\mathrm{NE}})(u),u)]\leq\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{i}(f^{\mathrm{NE}}(u),u)]

This is because

𝔼u𝒟[i((𝒪ifNE)(u),u)]=\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{i}((\mathcal{O}_{i}f^{\mathrm{NE}})(u),u)]= 𝔼u𝒟[i((fNE(u)i,1|𝒜i|!ρi𝒢ifNE(ρiu)j),u)]\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{i}((f^{\mathrm{NE}}(u)_{i},\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}f^{\mathrm{NE}}(\rho_{i}u)_{j}),u)]
\displaystyle\leq 1|𝒜i|!ρi𝒢i𝔼u𝒟[i((fNE(u)i,fNE(ρiu)j),u)]\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{i}((f^{\mathrm{NE}}(u)_{i},f^{\mathrm{NE}}(\rho_{i}u)_{j}),u)] ,by convexity of i on σj\displaystyle,\text{by convexity of $\mathcal{E}_{i}$ on $\sigma_{j}$}
=\displaystyle= 1|𝒜i|!ρi𝒢i𝔼v𝒟[i((fNE(ρi1v)i,fNE(v)j),ρi1v)]\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\mathbb{E}_{v\sim\mathcal{D}}[\mathcal{E}_{i}((f^{\mathrm{NE}}(\rho_{i}^{-1}v)_{i},f^{\mathrm{NE}}(v)_{j}),\rho_{i}^{-1}v)] ,let v=ρiu\displaystyle,\text{let $v=\rho_{i}u$}
=\displaystyle= 1|𝒜i|!ρi𝒢i𝔼v𝒟[i((ρi1fNE(v)i,fNE(v)j),ρi1v)]\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\mathbb{E}_{v\sim\mathcal{D}}[\mathcal{E}_{i}((\rho_{i}^{-1}f^{\mathrm{NE}}(v)_{i},f^{\mathrm{NE}}(v)_{j}),\rho_{i}^{-1}v)] ,by PPE of fNE\displaystyle,\text{by PPE of $f^{\mathrm{NE}}$}
=\displaystyle= 1|𝒜i|!ρi𝒢i𝔼v𝒟[i((fNE(v)i,fNE(v)j),v)]\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\mathbb{E}_{v\sim\mathcal{D}}[\mathcal{E}_{i}((f^{\mathrm{NE}}(v)_{i},f^{\mathrm{NE}}(v)_{j}),v)] ,invariance of i(σ,u) under ρi1𝒢\displaystyle,\text{invariance of $\mathcal{E}_{i}(\sigma,u)$ under $\rho_{i}^{-1}\in\mathcal{G}$}
=\displaystyle= 𝔼u𝒟[i((fNE)(u),u)]\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{i}((f^{\mathrm{NE}})(u),u)]
Step 2

Then we show that if jij\neq i and {i,j}={1,2}\{i,j\}=\{1,2\}

𝔼u𝒟[j((𝒪ifNE)(u),u)]=𝔼u𝒟[j(fNE(u),u)]\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{j}((\mathcal{O}_{i}f^{\mathrm{NE}})(u),u)]=\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{j}(f^{\mathrm{NE}}(u),u)]

This is because

𝔼u𝒟[j((𝒪ifNE)(u),u)]=\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{j}((\mathcal{O}_{i}f^{\mathrm{NE}})(u),u)]= 𝔼u𝒟[j((fNE(u)i,1|𝒜i|!ρi𝒢ifNE(ρiu)j),u)]\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{j}((f^{\mathrm{NE}}(u)_{i},\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}f^{\mathrm{NE}}(\rho_{i}u)_{j}),u)]
=\displaystyle= 1|𝒜i|!ρi𝒢i𝔼u𝒟[j((fNE(u)i,fNE(ρiu)j),u)]\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{j}((f^{\mathrm{NE}}(u)_{i},f^{\mathrm{NE}}(\rho_{i}u)_{j}),u)] ,by linearity of j on σj\displaystyle,\text{by linearity of $\mathcal{E}_{j}$ on $\sigma_{j}$}
=\displaystyle= 1|𝒜i|!ρi𝒢i𝔼v𝒟[j((fNE(ρi1v)i,fNE(v)j),ρi1v)]\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\mathbb{E}_{v\sim\mathcal{D}}[\mathcal{E}_{j}((f^{\mathrm{NE}}(\rho_{i}^{-1}v)_{i},f^{\mathrm{NE}}(v)_{j}),\rho_{i}^{-1}v)] ,let v=ρiu\displaystyle,\text{let $v=\rho_{i}u$}
=\displaystyle= 1|𝒜i|!ρi𝒢i𝔼v𝒟[j((ρi1fNE(v)i,fNE(v)j),ρi1v)]\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\mathbb{E}_{v\sim\mathcal{D}}[\mathcal{E}_{j}((\rho_{i}^{-1}f^{\mathrm{NE}}(v)_{i},f^{\mathrm{NE}}(v)_{j}),\rho_{i}^{-1}v)] ,by PPE of fNE\displaystyle,\text{by PPE of $f^{\mathrm{NE}}$}
=\displaystyle= 1|𝒜i|!ρi𝒢i𝔼v𝒟[j((fNE(v)i,fNE(v)j),v)]\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\mathbb{E}_{v\sim\mathcal{D}}[\mathcal{E}_{j}((f^{\mathrm{NE}}(v)_{i},f^{\mathrm{NE}}(v)_{j}),v)] ,invariance of j(σ,u) under ρi1𝒢i\displaystyle,\text{invariance of $\mathcal{E}_{j}(\sigma,u)$ under $\rho_{i}^{-1}\in\mathcal{G}_{i}$}
=\displaystyle= 𝔼u𝒟[j(fNE(u),u)]\displaystyle\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}_{j}(f^{\mathrm{NE}}(u),u)]

Since 𝒪=i𝒪i\mathcal{O}=\circ_{i}\mathcal{O}_{i} and =maxii\mathcal{E}=\max_{i}\mathcal{E}_{i}, we have

𝔼u𝒟[(𝒪fNE(u),u)]𝔼u𝒟[(fNE(u),u)]\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}(\mathcal{O}f^{\mathrm{NE}}(u),u)]\leq\mathbb{E}_{u\sim\mathcal{D}}[\mathcal{E}(f^{\mathrm{NE}}(u),u)]

Appendix C Omitted Proofs in Section 5

C.1 Proof of Theorem 5.3

See 5.3

Proof.

Let ff be a PPE and OPI NE approximator. Denote f(u)=(σi)i[n]f(u)=(\sigma_{i})_{i\in[n]}. For player kk that akV(ρk)a^{*}_{k}\in V(\rho_{k}), we get

σk=f(u)k=(a)f(ρu)k=(b)f(ρku)k=(c)ρkf(u)k=ρkσk,\sigma_{k}=f(u)_{k}\overset{(a)}{=}f(\rho u)_{k}\overset{(b)}{=}f(\rho_{k}u)_{k}\overset{(c)}{=}\rho_{k}f(u)_{k}=\rho_{k}\sigma_{k}, (11)

where (a)(a) holds since uu is permutable w.r.t. ρ\rho, (b)(b) holds by OPI of ff, and (c)(c) holds by PPE of ff. If aa^{*} can be found by ff, we will have 1=σk(ak)=(d)ρkσk(ak)=σk(ρk1(ak))1=\sigma_{k}(a^{*}_{k})\overset{(d)}{=}\rho_{k}\sigma_{k}(a^{*}_{k})=\sigma_{k}(\rho_{k}^{-1}(a^{*}_{k})), where (d)(d) holds by Equation 11. However, such result leads to a contradiction, because akρk1(ak)a^{*}_{k}\neq\rho_{k}^{-1}(a_{k}) but σk(ak)=σ(ρk1(ak))=1\sigma_{k}(a^{*}_{k})=\sigma(\rho_{k}^{-1}(a^{*}_{k}))=1.

Let ff be a PE (C)CE approximator. Denote f(u)=πf(u)=\pi, we have

π=f(u)=(a)f(ρu)=(b)ρf(u)=ρπ\pi=f(u)\overset{(a)}{=}f(\rho u)\overset{(b)}{=}\rho f(u)=\rho\pi (12)

where (a)(a) holds since uu is permutable w.r.t. ρ\rho, (b)(b) holds by PE of ff. If aa^{*} can be found by ff, we will have 1=π(a)=(c)ρπ(a)=π(ρ1a)=π(ρ11a1,,ρn1an)1=\pi(a^{*})\overset{(c)}{=}\rho\pi(a^{*})=\pi(\rho^{-1}a^{*})=\pi(\rho_{1}^{-1}a_{1}^{*},\cdots,\rho_{n}^{-1}a_{n}^{*}), where (c)(c) holds by Equation 12. However, from akV(ρk)a_{k}^{*}\in V(\rho_{k}) we know ρk1(ak)ak\rho_{k}^{-1}(a_{k}^{*})\neq a_{k}^{*}, then ρ1aa\rho^{-1}a^{*}\neq a^{*}, but π(a)=π(ρ1a)=1\pi(a^{*})=\pi(\rho^{-1}a^{*})=1. ∎

C.2 Proof of Theorem 5.6

See 5.6

Proof.

Assume fgeneral(C)CEf\in\mathcal{F}^{\text{(C)CE}}_{\mathrm{general}} is an (C)CE approximator that always finds the strategy that maximizes the social welfare. Afterward, we construct another f0f_{0} that satisfies PE and always finds the strategy that maximizes social welfare. f0f_{0} is constructed by orbit averaging:

f0(u)=𝒬f(u),\displaystyle f_{0}(u)=\mathcal{Q}f(u),

thus f0f_{0} is PE.

Denote 𝒟\mathcal{D} as an arbitrary payoff distribution of uu such that 𝒟\mathcal{D} is invariant under permutation and the cardinality of its support is finite. We have

𝔼u𝒟SW(𝒬if(u),u)=\displaystyle\mathbb{E}_{u\sim\mathcal{D}}\mathrm{SW}(\mathcal{Q}_{i}f(u),u)= 𝔼u𝒟SW(1|𝒜i|!ρi𝒢iρi1f(ρiu),u)\displaystyle\mathbb{E}_{u\sim\mathcal{D}}\mathrm{SW}(\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\rho_{i}^{-1}f(\rho_{i}u),u)
=\displaystyle= 𝔼u𝒟i=1nui(1|𝒜i|!ρi𝒢iρi1f(ρiu))\displaystyle\mathbb{E}_{u\sim\mathcal{D}}\sum_{i=1}^{n}u_{i}(\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\rho_{i}^{-1}f(\rho_{i}u))
=\displaystyle= 1|𝒜i|!ρi𝒢i𝔼u𝒟i=1nui(ρi1f(ρiu))\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\mathbb{E}_{u\sim\mathcal{D}}\sum_{i=1}^{n}u_{i}(\rho_{i}^{-1}f(\rho_{i}u))
=\displaystyle= 1|𝒜i|!ρi𝒢i𝔼v𝒟i=1n(ρi1v)i(ρi1f(v))\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\mathbb{E}_{v\sim\mathcal{D}}\sum_{i=1}^{n}(\rho_{i}^{-1}v)_{i}(\rho_{i}^{-1}f(v)) ,let v=ρiu\displaystyle,\text{let }v=\rho_{i}u
=\displaystyle= 1|𝒜i|!ρi𝒢i𝔼v𝒟i=1nvi(f(v))\displaystyle\frac{1}{|\mathcal{A}_{i}|!}\sum_{\rho_{i}\in\mathcal{G}_{i}}\mathbb{E}_{v\sim\mathcal{D}}\sum_{i=1}^{n}v_{i}(f(v))
=\displaystyle= 𝔼u𝒟i=1nui(f(u))\displaystyle\mathbb{E}_{u\sim\mathcal{D}}\sum_{i=1}^{n}u_{i}(f(u))
=\displaystyle= 𝔼u𝒟SW(f(u),u)\displaystyle\mathbb{E}_{u\sim\mathcal{D}}\mathrm{SW}(f(u),u)

Due to that 𝒬=𝒬1𝒬n\mathcal{Q}=\mathcal{Q}_{1}\circ\cdots\circ\mathcal{Q}_{n}, we have

𝔼u𝒟SW(f0(u),u)=𝔼u𝒟SW(f(u),u)\displaystyle\mathbb{E}_{u\sim\mathcal{D}}\mathrm{SW}(f_{0}(u),u)=\mathbb{E}_{u\sim\mathcal{D}}\mathrm{SW}(f(u),u)

Due to the arbitrariness of 𝒟\mathcal{D}, we know that f0f_{0} maximizes the social welfare w.r.t. any uu.

From above, we immediately know

SWRN,M(PE(C)CE,general(C)CE)=1\displaystyle\mathrm{SWR}_{\mathrm{N,M}}({\mathcal{F}}^{\mathrm{(C)CE}}_{\mathrm{PE}},\mathcal{F}^{\mathrm{(C)CE}}_{\mathrm{general}})=1

C.3 Proof of Theorem 5.7

See 5.7

C.3.1 Proof of Equation 1 and Equation 3

We first prove the theorem with respect to OPINE\mathcal{F}^{\mathrm{NE}}_{\mathrm{OPI}} and bothNE\mathcal{F}^{\mathrm{NE}}_{\mathrm{both}}

Step 1

On the one part, we prove

SWRN,M(OPINE,generalNE)SWRN,M(bothNE,generalNE)}1MN1\displaystyle\left.\begin{aligned} \mathrm{SWR}_{\mathrm{N,M}}({\mathcal{F}}^{\mathrm{NE}}_{\mathrm{OPI}},\mathcal{F}^{\mathrm{NE}}_{\mathrm{general}})\\ \mathrm{SWR}_{\mathrm{N,M}}({\mathcal{F}}^{\mathrm{NE}}_{\mathrm{both}},\mathcal{F}^{\mathrm{NE}}_{\mathrm{general}})\end{aligned}\right\}\leq\frac{1}{M^{N-1}}

We prove this by construction.

Consider a game with NN player and 𝒜i=[M]\mathcal{A}_{i}=[M] for i[N]i\in[N]. a𝒜,i[N]\forall a\in\mathcal{A},i\in[N], define the payoff u¯\bar{u} as follows:

u¯i(a)={1,if a1=a2==aN0,otherwise\bar{u}_{i}(a)=\begin{cases}1&,\text{if }a_{1}=a_{2}=\dots=a_{N}\\ 0&,\text{otherwise}\end{cases}

Define U={u|u=iρiu¯,ρi𝒢i}U=\{u^{\prime}|u^{\prime}=\circ_{i}\rho_{i}\bar{u},\rho_{i}\in\mathcal{G}_{i}\} and 𝒟\mathcal{D} as a uniform distribution on UU. Easy to certify that 𝒟\mathcal{D} is a permutation-invariant distribution.

Let f~~generalNE\tilde{f}\in\tilde{\mathcal{F}}^{\mathrm{NE}}_{\mathrm{general}} be the NE oracle that f~(u¯)i=1\tilde{f}(\bar{u})_{i}=1 and for any u=iρiu¯Uu^{\prime}=\circ_{i}\rho_{i}\bar{u}\in U, f~(u)i=ρi(1)\tilde{f}(u^{\prime})_{i}=\rho_{i}(1). Intuitively, the oracle will choose the action that will provide all players with revenue 11, leading to a social welfare of NN. Since each player has got her maximum possible utility, we have

maxfgeneralNE𝔼u𝒟SW(f(u),u)=maxf~~generalNE𝔼u𝒟SW(f~(u),u)=N.\max_{f\in\mathcal{F}^{\mathrm{NE}}_{\mathrm{general}}}\mathbb{E}_{u\sim\mathcal{D}}\mathrm{SW}(f(u),u)=\max_{\tilde{f}\in\widetilde{\mathcal{F}}^{\mathrm{NE}}_{\mathrm{general}}}\mathbb{E}_{u\sim\mathcal{D}}\mathrm{SW}(\tilde{f}(u),u)=N. (13)

For any j1,j2[M]j_{1},j_{2}\in[M] and j1<j2j_{1}<j_{2}, let ρi(j1,j2)=(1,,j2,,j1,,M)\rho^{(j_{1},j_{2})}_{i}=(1,\dots,j_{2},\dots,j_{1},\dots,M) for all player i[N]i\in[N] be the swap permutation that swaps actions j1j_{1} and j2j_{2} and keeps other actions still. Then ijρi(j1,j2)u¯=ρj(j1,j2)u¯\circ_{i\neq j}\rho_{i}^{(j_{1},j_{2})}\bar{u}=\rho_{j}^{(j_{1},j_{2})}\bar{u} for player jj. For fOPINEf\in\mathcal{F}^{\mathrm{NE}}_{\mathrm{OPI}}, we have f(u¯)j=f(ijρi(j1,j2)u¯)j=f(ρj(j1,j2)u¯)jf(\bar{u})_{j}=f(\circ_{i\neq j}\rho_{i}^{(j_{1},j_{2})}\bar{u})_{j}=f(\rho_{j}^{(j_{1},j_{2})}\bar{u})_{j} for arbitrary swap permutation ρj(j1,j2)\rho_{j}^{(j_{1},j_{2})}. Since any permutation can be achieved by composition of swap permutations, we have ρj𝒢j\forall\rho_{j}\in\mathcal{G}_{j}, f(u¯)j=f(ρju¯)jf(\bar{u})_{j}=f(\rho_{j}\bar{u})_{j}. Based on that, and by OPI of ff, ρ=i[N]ρi\forall\rho=\circ_{i\in[N]}\rho_{i} we have f(u¯)j=f(ρu¯)jf(\bar{u})_{j}=f(\rho\bar{u})_{j}, i.e. ff is a constant function on UU. Without loss of generality, we denote f(u)σf(u)\equiv\sigma for all uUu\in U. Then

𝔼u𝒟SW(f(u),u)=1|U|uUSW(σ,u)=1(M!)N1SW(σ,uUu).\mathbb{E}_{u\sim\mathcal{D}}\mathrm{SW}(f(u),u)=\frac{1}{|U|}\sum_{u^{\prime}\in U}\mathrm{SW}(\sigma,u^{\prime})=\frac{1}{(M!)^{N-1}}\mathrm{SW}(\sigma,\sum_{u^{\prime}\in U}u^{\prime}).

Additionally, we have (uUu)(a)=((M1)!)N1(\sum_{u^{\prime}\in U}u^{\prime})(a)=((M-1)!)^{N-1} for any a𝒜a\in\mathcal{A}. Based on that, we have

maxfOPINE𝔼u𝒟SW(f(u),u)=1(M!)N1N((M1)!)N1=NMN1.\max_{f\in\mathcal{F}^{\mathrm{NE}}_{\mathrm{OPI}}}\mathbb{E}_{u\sim\mathcal{D}}\mathrm{SW}(f(u),u)=\frac{1}{(M!)^{N-1}}\cdot N((M-1)!)^{N-1}=\frac{N}{M^{N-1}}. (14)

Combining Equation 13 and Equation 14, we have

SWRN,M(OPINE,generalNE)1MN1.\mathrm{SWR}_{\mathrm{N,M}}(\mathcal{F}^{\mathrm{NE}}_{\mathrm{OPI}},\mathcal{F}^{\mathrm{NE}}_{\mathrm{general}})\leq\frac{1}{M^{N-1}}.

Due to bothNEOPINE\mathcal{F}^{\mathrm{NE}}_{\mathrm{both}}\subseteq\mathcal{F}^{\mathrm{NE}}_{\mathrm{OPI}}, we immediately know

SWRN,M(bothNE,generalNE)1MN1\displaystyle\mathrm{SWR}_{\mathrm{N,M}}(\mathcal{F}^{\mathrm{NE}}_{\mathrm{both}},\mathcal{F}^{\mathrm{NE}}_{\mathrm{general}})\leq\frac{1}{M^{N-1}}
Step 2

On the other part, we prove

SWRN,M(OPINE,generalNE)SWRN,M(bothNE,generalNE)}1/MN1\displaystyle\left.\begin{aligned} \mathrm{SWR}_{\mathrm{N,M}}({\mathcal{F}}^{\mathrm{NE}}_{\mathrm{OPI}},\mathcal{F}^{\mathrm{NE}}_{\mathrm{general}})\\ \mathrm{SWR}_{\mathrm{N,M}}({\mathcal{F}}^{\mathrm{NE}}_{\mathrm{both}},\mathcal{F}^{\mathrm{NE}}_{\mathrm{general}})\end{aligned}\right\}\geq 1/M^{N-1}

Define the maximum possible utility (MPU) for player ii with respect to utility uiu_{i} and action aia_{i} as

MPU(ui,ai)maxai𝒜iui(ai,ai)\mathrm{MPU}(u_{i},a_{i})\coloneqq\max_{a_{-i}\in\mathcal{A}_{-i}}u_{i}(a_{i},a_{-i}) (15)

Define the set of maximum possible utility best response for player ii w.r.t. uiu_{i} as

i(ui){ai𝒜i:MPU(ui,ai)=maxai𝒜iMPU(ui,ai)}\mathcal{B}_{i}(u_{i})\coloneqq\{a_{i}\in\mathcal{A}_{i}:\mathrm{MPU}(u_{i},a_{i})=\max_{a^{\prime}_{i}\in\mathcal{A}_{i}}\mathrm{MPU}(u_{i},a^{\prime}_{i})\}

We first conduct some simplification to the target.

SWRN,M(bothNE,generalNE)=inf𝒟maxfbothNE𝔼u𝒟SW(f(u),u)maxfgeneralNE𝔼u𝒟SW(f(u),u)inf𝒟maxfbothNE𝔼u𝒟SW(f(u),u)𝔼u𝒟maxσSW(σ,u)\displaystyle\mathrm{SWR}_{\mathrm{N,M}}({\mathcal{F}}^{\mathrm{NE}}_{\mathrm{both}},\mathcal{F}^{\mathrm{NE}}_{\mathrm{general}})=\inf_{\mathcal{D}}\frac{\max_{f\in\mathcal{F}^{\mathrm{NE}}_{\mathrm{both}}}\mathbb{E}_{u\sim\mathcal{D}}\mathrm{SW}(f(u),u)}{\max_{f\in{\mathcal{F}^{\mathrm{NE}}_{\mathrm{general}}}}\mathbb{E}_{u\sim\mathcal{D}}\mathrm{SW}(f(u),u)}\geq\inf_{\mathcal{D}}\frac{\max_{f\in\mathcal{F}^{\mathrm{NE}}_{\mathrm{both}}}\mathbb{E}_{u\sim\mathcal{D}}\mathrm{SW}(f(u),u)}{\mathbb{E}_{u\sim\mathcal{D}}\max_{\sigma}\mathrm{SW}(\sigma,u)}

Then we constrain uu to be a cooperation game. For a normal form game Γu\Gamma_{u}, we define u~=(u~i)i[n]\tilde{u}=(\tilde{u}_{i})_{i\in[n]} in which u~i=1ni=1nui\tilde{u}_{i}=\frac{1}{n}\sum_{i=1}^{n}u_{i}. Then we have SW(σ,u)=SW(σ,u~)\mathrm{SW}(\sigma,u)=\mathrm{SW}(\sigma,\tilde{u}), which means that constraining uu to be a cooperation game will induce the same social welfare. Then

inf𝒟maxfbothNE𝔼u𝒟SW(f(u),u)𝔼u𝒟maxσSW(σ,u)=inf𝒟maxfbothNE𝔼u𝒟SW(f(u),u~)𝔼u𝒟maxσSW(σ,u~)\displaystyle\inf_{\mathcal{D}}\frac{\max_{f\in\mathcal{F}^{\mathrm{NE}}_{\mathrm{both}}}\mathbb{E}_{u\sim\mathcal{D}}\mathrm{SW}(f(u),u)}{\mathbb{E}_{u\sim\mathcal{D}}\max_{\sigma}\mathrm{SW}(\sigma,u)}=\inf_{\mathcal{D}}\frac{\max_{f\in\mathcal{F}^{\mathrm{NE}}_{\mathrm{both}}}\mathbb{E}_{u\sim\mathcal{D}}\mathrm{SW}(f(u),\tilde{u})}{\mathbb{E}_{u\sim\mathcal{D}}\max_{\sigma}\mathrm{SW}(\sigma,\tilde{u})}

Denote f0f_{0} be the approximator that always outputs uniform strategy on i(u~i)\mathcal{B}_{i}(\tilde{u}_{i}) for player ii. It’s obvious that f0f_{0} is both OPI and PPE because the operations from uu to f0(u)f_{0}(u) are all permutation-equivariant. Then,

inf𝒟maxfbothNE𝔼u𝒟SW(f(u),u~)𝔼u𝒟maxσSW(σ,u~)inf𝒟𝔼u𝒟SW(f0(u),u~)𝔼u𝒟maxσSW(σ,u~)\displaystyle\inf_{\mathcal{D}}\frac{\max_{f\in\mathcal{F}^{\mathrm{NE}}_{\mathrm{both}}}\mathbb{E}_{u\sim\mathcal{D}}\mathrm{SW}(f(u),\tilde{u})}{\mathbb{E}_{u\sim\mathcal{D}}\max_{\sigma}\mathrm{SW}(\sigma,\tilde{u})}\geq\inf_{\mathcal{D}}\frac{\mathbb{E}_{u\sim\mathcal{D}}\mathrm{SW}(f_{0}(u),\tilde{u})}{\mathbb{E}_{u\sim\mathcal{D}}\max_{\sigma}\mathrm{SW}(\sigma,\tilde{u})}

Ignore the infimum and the expectation operator, consider SW(f0(u),u~)maxσSW(σ,u~)\frac{\mathrm{SW}(f_{0}(u),\tilde{u})}{\max_{\sigma}\mathrm{SW}(\sigma,\tilde{u})} for arbitrary u~\tilde{u}, denote bb be the maximum element appeared in u~\tilde{u}, then the denominator equals NbNb. But for the numerator, for player ii, no matter what action aii(u~i)a_{i}\in\mathcal{B}_{i}(\tilde{u}_{i}) she chooses, she always has probability at least ji1|j|1MN1\prod_{j\neq i}\frac{1}{|\mathcal{B}_{j}|}\geq\frac{1}{M^{N-1}} to achieve revenue bb, therefore inducing SW(f0(u),u~)NbMN1\mathrm{SW}(f_{0}(u),\tilde{u})\geq\frac{Nb}{M^{N-1}}.

Then, SW(f0(u),u~)maxσSW(σ,u~)1MN1\frac{\mathrm{SW}(f_{0}(u),\tilde{u})}{\max_{\sigma}\mathrm{SW}(\sigma,\tilde{u})}\geq\frac{1}{M^{N-1}}, so as inf𝒟𝔼u𝒟SW(f0(u),u~)𝔼u𝒟maxσSW(σ,u~)\inf_{\mathcal{D}}\frac{\mathbb{E}_{u\sim\mathcal{D}}\mathrm{SW}(f_{0}(u),\tilde{u})}{\mathbb{E}_{u\sim\mathcal{D}}\max_{\sigma}\mathrm{SW}(\sigma,\tilde{u})}, SWRN,M(bothNE)\mathrm{SWR}_{\mathrm{N,M}}({\mathcal{F}}^{\mathrm{NE}}_{\mathrm{both}}) and SWRN,M(OPINE)\mathrm{SWR}_{\mathrm{N,M}}({\mathcal{F}}^{\mathrm{NE}}_{\mathrm{OPI}}).

Above all,

SWRN,M(OPINE,generalNE)SWRN,M(bothNE,generalNE)}=1MN1\displaystyle\left.\begin{aligned} \mathrm{SWR}_{\mathrm{N,M}}(\mathcal{F}^{\mathrm{NE}}_{\mathrm{OPI}},\mathcal{F}^{\mathrm{NE}}_{\mathrm{general}})\\ \mathrm{SWR}_{\mathrm{N,M}}(\mathcal{F}^{\mathrm{NE}}_{\mathrm{both}},\mathcal{F}^{\mathrm{NE}}_{\mathrm{general}})\end{aligned}\right\}=\frac{1}{M^{N-1}}

C.3.2 Proof of Equation 2

We next prove the theorem with respect to PPENE\mathcal{F}^{\mathrm{NE}}_{\mathrm{PPE}}that

SWRN,M(PPENE,generalNE)1M\displaystyle\mathrm{SWR}_{\mathrm{N,M}}(\mathcal{F}^{\mathrm{NE}}_{\mathrm{PPE}},\mathcal{F}^{\mathrm{NE}}_{\mathrm{general}})\leq\frac{1}{M}

Consider a bimatrix game and 𝒜i=[M]\mathcal{A}_{i}=[M] for i[2]i\in[2]. a𝒜,i[2]\forall a\in\mathcal{A},i\in[2], define the payoff u¯\bar{u} as follows:

u¯i(a)={1,if a1=a20,otherwise\bar{u}_{i}(a)=\begin{cases}1&,\text{if }a_{1}=a_{2}\\ 0&,\text{otherwise}\end{cases}

Define U{u|u=ρ1ρ2u¯,ρi𝒢i}U\coloneqq\{u^{\prime}|u^{\prime}=\rho_{1}\rho_{2}\bar{u},\rho_{i}\in\mathcal{G}_{i}\} and 𝒟\mathcal{D} as a uniform distribution on UU. Easy to certify that U={u|u=ρ1u¯,ρ1𝒢1}={u|u=ρ2u¯,ρ2𝒢2}U=\{u^{\prime}|u^{\prime}=\rho_{1}\bar{u},\rho_{1}\in\mathcal{G}_{1}\}=\{u^{\prime}|u^{\prime}=\rho_{2}\bar{u},\rho_{2}\in\mathcal{G}_{2}\} and 𝒟\mathcal{D} is a permutation-invariant distribution.

Let f~~generalNE\tilde{f}\in\tilde{\mathcal{F}}^{\mathrm{NE}}_{\mathrm{general}} be the NE oracle that f~(u¯)i=1\tilde{f}(\bar{u})_{i}=1 and for any u=iρiu¯Uu^{\prime}=\circ_{i}\rho_{i}\bar{u}\in U, f~(u)i=ρi(1)\tilde{f}(u^{\prime})_{i}=\rho_{i}(1). Intuitively, the oracle will choose the action that will provide all players with revenue of 11, leading to a social welfare of 22.

For a permutation ϱ\varrho on [M][M], let Pϱ{0,1}M×MP_{\varrho}\in\{0,1\}^{M\times M} be the corresponding permutation matrix. Denote 𝒫\mathcal{P} as the set of all permutation matrice. As a result, uU,ρ1𝒢1,ρ1u=(Pρ1u1,Pρ1u2)=:Pρ1u\forall u\in U,\forall\rho_{1}\in\mathcal{G}_{1},\rho_{1}u=(P_{\rho_{1}}{u}_{1},P_{\rho_{1}}{u}_{2})=:P_{\rho_{1}}u and ρ2𝒢2,ρ2u=(u1Pρ2T,u2Pρ2T)=:uPρ2T\forall\rho_{2}\in\mathcal{G}_{2},\rho_{2}u=({u}_{1}P_{\rho_{2}}^{T},{u}_{2}P_{\rho_{2}}^{T})=:uP_{\rho_{2}}^{T}. Specially, we have Pϱu¯PϱT=u¯P_{\varrho}\bar{u}P_{\varrho}^{T}=\bar{u}. For fPPENEf\in\mathcal{F}^{\mathrm{NE}}_{\mathrm{PPE}}, Denote f(u¯)=σ=(σ1,σ2)f(\bar{u})=\sigma=(\sigma_{1},\sigma_{2}). For permutation ϱ\varrho in [M][M] and payoff u=Pϱu¯=u¯(PϱT)1u^{\prime}=P_{\varrho}\bar{u}=\bar{u}(P_{\varrho}^{T})^{-1}, by PPE of ff, we have f(u)1=f(Pϱu¯)1=Pϱσ1=ϱσ1,f(u^{\prime})_{1}=f(P_{\varrho}\bar{u})_{1}=P_{\varrho}\sigma_{1}=\varrho\sigma_{1}, and f(u)2=f(u¯(PϱT)1)2=(Pϱ)1σ2=ϱ1σ2.f(u^{\prime})_{2}=f(\bar{u}(P_{\varrho}^{T})^{-1})_{2}=(P_{\varrho})^{-1}\sigma_{2}=\varrho^{-1}\sigma_{2}. Then we have

SW(f(u),u)=i(Pϱu¯)i(ϱσ1,ϱ1σ2)=iu¯i(σ1,ϱ1σ2)=i(u¯PϱT)i(σ1,σ2)=SW(f(u¯),u¯PϱT)\mathrm{SW}(f(u^{\prime}),u^{\prime})=\sum_{i}(P_{\varrho}\bar{u})_{i}(\varrho\sigma_{1},\varrho^{-1}\sigma_{2})=\sum_{i}\bar{u}_{i}(\sigma_{1},\varrho^{-1}\sigma_{2})=\sum_{i}(\bar{u}P_{\varrho}^{T})_{i}(\sigma_{1},\sigma_{2})=\mathrm{SW}(f(\bar{u}),\bar{u}P_{\varrho}^{T})

Therefore

𝔼u𝒟SW(f(u),u)\displaystyle\mathbb{E}_{u\sim\mathcal{D}}\mathrm{SW}(f(u),u) =1|U|uUSW(f(u),u)\displaystyle=\frac{1}{|U|}\sum_{u^{\prime}\in U}\mathrm{SW}(f(u^{\prime}),u^{\prime})
=1|U|Pϱ𝒫SW(f(u¯),u¯PϱT)\displaystyle=\frac{1}{|U|}\sum_{P_{\varrho}\in\mathcal{P}}\mathrm{SW}(f(\bar{u}),\bar{u}P_{\varrho}^{T})
=1|U|u=u¯(PϱT)USW(f(u¯),u)\displaystyle=\frac{1}{|U|}\sum_{u=\bar{u}(P_{\varrho}^{T})\in U}\mathrm{SW}(f(\bar{u}),u)
=1|U|SW(σ,uUu).\displaystyle=\frac{1}{|U|}\mathrm{SW}(\sigma,\sum_{u^{\prime}\in U}u^{\prime}).

Since |U|=1M!|U|=\frac{1}{M!} and uUu\sum_{u^{\prime}\in U}u^{\prime} is a tensor with all elements equal to (M1)!(M-1)!. Thus 𝔼u𝒟SW(f(u),u)=2M\mathbb{E}_{u\sim\mathcal{D}}\mathrm{SW}(f(u),u)=\frac{2}{M} and

SWRN,M(PPENE,generalNE)1M\displaystyle\mathrm{SWR}_{\mathrm{N,M}}(\mathcal{F}^{\mathrm{NE}}_{\mathrm{PPE}},\mathcal{F}^{\mathrm{NE}}_{\mathrm{general}})\leq\frac{1}{M}

C.3.3 Proof of Equation 4

Consider a 3×33\times 3 game as follows, where ϵ(0,12)\epsilon\in(0,\frac{1}{2}):

u=[𝟏,𝟏0,00,12+ε0,0𝟏,𝟏0,12+ε12+ε,012+ε,0ε,ε]u=\begin{bmatrix}\bm{1,1}&0,0&0,\frac{1}{2}+\varepsilon\\ 0,0&\bm{1,1}&0,\frac{1}{2}+\varepsilon\\ \frac{1}{2}+\varepsilon,0&\frac{1}{2}+\varepsilon,0&\varepsilon,\varepsilon\end{bmatrix}

It is obvious that maxσNE(Γu)SW(σ,u)=2\max_{\sigma^{*}\subseteq\text{NE}(\Gamma_{u})}\text{SW}(\sigma^{*},u)=2, and the corresponding strategy has been bolded. However, for NE oracles with both PPE and OPI, it can only output a unique NE with a pure strategy that induces utility (ε,ε)(\varepsilon,\varepsilon).

Let ρ1=ρ2=(2,1,3)\rho_{1}=\rho_{2}=(2,1,3), we have ρ1ρ2u=u\rho_{1}\rho_{2}u=u. From the analysis above we know if fNE~bothNEf^{\mathrm{NE}}\in\widetilde{\mathcal{F}}^{\mathrm{NE}}_{\mathrm{both}} and fNE(u)=(σ1,σ2)f^{\mathrm{NE}}(u)=(\sigma_{1},\sigma_{2}), then σ1(1)=σ1(2)\sigma_{1}(1)=\sigma_{1}(2), σ2(1)=σ2(2)\sigma_{2}(1)=\sigma_{2}(2). We integrate the first two actions of player 11 and player 22 into a new action that will choose randomly between the first two actions, then we form the utility matrix below:

u¯=[12,120,12+ε12+ε,0𝜺,𝜺]\overline{u}=\begin{bmatrix}\frac{1}{2},\frac{1}{2}&0,\frac{1}{2}+\varepsilon\\ \frac{1}{2}+\varepsilon,0&\bm{\varepsilon,\varepsilon}\end{bmatrix}

There is a unique NE in this Prisoner’s Dilemma, which has been bolded. The game u¯\overline{u} is the same with the game uu under the assumption that σ1(1)=σ1(2)\sigma_{1}(1)=\sigma_{1}(2) and σ2(1)=σ2(2)\sigma_{2}(1)=\sigma_{2}(2) in uu. Then maxf~bothNESW(f(u),u)=2ε\max_{f\in\widetilde{\mathcal{F}}^{\mathrm{NE}}_{\mathrm{both}}}\text{SW}(f(u),u)=2\varepsilon. Since ε\varepsilon can be arbitrarily small, we have SWR2,3(~bothNE,~generalNE)=0\mathrm{SWR}_{2,3}(\widetilde{\mathcal{F}}^{\mathrm{NE}}_{\mathrm{both}},\widetilde{\mathcal{F}}^{\mathrm{NE}}_{\mathrm{general}})=0. As a result, we have SWRN,M(~bothNE,~generalNE)=0\mathrm{SWR}_{N,M}(\widetilde{\mathcal{F}}^{\mathrm{NE}}_{\mathrm{both}},\widetilde{\mathcal{F}}^{\mathrm{NE}}_{\mathrm{general}})=0 for all N2N\geq 2 and M3M\geq 3.