
First-order (coarse) correlated equilibria in non-concave games

Mete Şeref Ahunbay E-mail: [email protected] Technische Universität München, School of Computation, Information and Technology, Department of Computer Science. Boltzmannstraße 3, 85748, Garching bei München, Germany.
(6th February, 2025. A first version of this paper was uploaded to arXiv on 27th March, 2024. An update on 27th November, 2024 improved the presentation, added a remark on the connection between our notion of equilibria and the smoothness framework of [72], and made a few minor corrections. The present version extends the results in Section 3, in particular with respect to time-average and adversarial coarse regret guarantees.)
Abstract

We investigate first-order notions of correlated equilibria: distributions of actions for smooth, potentially non-concave games such that players do not incur any regret against small modifications to their strategies along a set of continuous vector fields. We define two such notions, based on local deviations and on stationarity of the distribution, and identify the notion of coarseness as the setting where the associated vector fields are in fact gradient fields. For coarse equilibria, we prove that online (projected) gradient ascent has a universal approximation property for both variants of equilibrium. In the non-coarse setting, we instead reduce the problem of finding an equilibrium to fixed-point computation via the usual framework of $\Phi$-regret minimisation, and identify tractable instances. Finally, we study the primal-dual framework for our notion of first-order equilibria. For coarse equilibria defined by a family of functions, we find that a dual bound on the worst-case expectation of a performance metric takes the form of a generalised Lyapunov function for the dynamics of the game. Specifically, the usual primal-dual price of anarchy analysis for coarse correlated equilibria, as well as the smoothness framework of Roughgarden, are both equivalent to a problem of generalised Lyapunov function estimation. For non-coarse equilibria, we instead obtain a vector field fit problem for the gradient dynamics of the game. These results follow from containment results in normal form games: the usual notion of a (coarse) correlated equilibrium is equivalent to our first-order local notions of (coarse) correlated equilibria with respect to an appropriately chosen set of vector fields.

1 Introduction

The central question we study concerns notions of correlated equilibria in a smooth game. In the abstract, given a set of players $N$, compact and convex action sets $X_i$ and sufficiently smooth payoffs $u_i : \times_{i\in N} X_i \rightarrow \mathbb{R}$, a correlated equilibrium is a probability distribution $\sigma$ on $\times_{i\in N} X_i$ satisfying $\phi(\sigma) \leq 0$ for all $\phi \in \Phi$, for some family of linear equilibrium constraints $\Phi$. Each equilibrium constraint $\phi$ is taken to capture in expectation players' deviation incentives, i.e. the change in payoffs when players' actions are drawn $x \sim \sigma$ and a player $i$ considers "modifying" their drawn strategy $x_i$ in some manner prescribed by $\phi$.

Our motivation is two-fold. First, we seek an answer to a question posed in [29] regarding what an appropriate theory of non-concave games should be. Specifically, whereas a Nash equilibrium necessarily exists for a concave game, this need not be true when players' utilities can be non-concave in their own actions. While traditional game theory has focused its attention on concave games, most settings relevant to the recent advances in machine learning are appropriately modelled as non-concave games. Moreover, whereas a first-order Nash equilibrium does exist in non-concave games, even its approximate versions are $\mathsf{PPAD}$-hard to compute. This raises the question of what a tractable notion of equilibrium is for non-concave games.

Second, we aim to refine the notions of (coarse) correlated equilibria of normal form games to strengthen the analysis of their learnable outcomes via appeals to linear programming duality. A classic result from learning theory is that when all players employ no-internal (external) regret algorithms in a normal-form game, any convergent subsequence of the aggregate history of play converges to a (coarse) correlated equilibrium. Moreover, these notions of equilibria are given by a set of linear inequalities of size polynomial in the size of the game. However, no-regret learning can be too weak to explain the observed convergence of quintessential learning algorithms such as online gradient ascent (OGA) or the multiplicative weights update (MWU). One setting of particular importance is that of auctions, where empirical studies suggest Nash equilibria are learnable, but the theoretical basis for why this is so has proven elusive. For instance, [10] shows that a variant of MWU (equivalent to dual averaging with an entropic barrier [61]) converges to (an approximate) equilibrium in a variety of single-item and combinatorial auction settings. However, the coarse correlated equilibria of complete information first-price auctions can be significantly different from the auction's unique Nash equilibrium [36], and mean-based learning [13] only provides a partial answer [50, 31]. In turn, no-regret learning explains convergence to equilibrium in incomplete information first-price auctions only under stringent concavity assumptions on buyers' prior distributions [1]. This apparent contradiction suggests that the incentive constraints associated with no-external regret learning potentially form merely a subset of all the incentive constraints specific learning algorithms satisfy. In other words, there might exist hitherto unknown equilibrium refinements over coarse correlated equilibria, which are associated with the outcomes learned when players utilise e.g. OGA or MWU.

1.1 Contributions and Technical Summary

Given our motivations, we seek notions of equilibrium that are tractably computable given access to payoff gradients, independent of concavity assumptions on the payoffs, and that contain the (coarse) correlated equilibria of normal form games as special cases. Towards this end, we build upon the recent work of [18], who considered a local variant of the $\Phi$-regret traditionally considered in the literature (cf. [39]), and showed that two such families of local strategy modifications are tractably computable via online gradient ascent. Our key insight is that the $\delta$-strategy modifications considered in their work are generated by gradient fields of a suitable family of functions.

To provide intuition for our arguments, consider a smooth game with a set of players $N$, compact and convex action sets $X_i$ for each player $i$ (with their product denoted $X$), and smooth utility functions $u_i$ for each player $i$. Suppose all players implement online gradient ascent with the same adaptive step sizes $\eta_t$ at time $t$. That is, if player $i$ chooses action $x_i^t$ at time $t$, they will fix

$$x_i^{t+1} = \Pi_{X_i}\left[x_i^t + \eta_t \nabla_i u_i(x^t)\right].$$
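To make the update above concrete, here is a minimal numerical sketch of the projected ascent step (a toy instance of our own choosing, not from the paper): two players on box action sets, where Euclidean projection is a coordinate-wise clip, with $O(1/\sqrt{t})$ step sizes.

```python
import numpy as np

def project_box(x, lo=0.0, hi=1.0):
    # Euclidean projection onto the box [lo, hi]^d is a coordinate-wise clip.
    return np.clip(x, lo, hi)

def oga_step(x, grads, eta):
    # One online projected gradient ascent step for every player:
    #   x_i <- Pi_{X_i}[x_i + eta_t * grad_i u_i(x)]
    return [project_box(xi + eta * gi) for xi, gi in zip(x, grads)]

# Illustrative two-player game on [0,1]^2 x [0,1]^2 (an arbitrary smooth
# choice for this sketch): u_1(x) = -||x_1 - x_2||^2, u_2(x) = ||x_1 - x_2||^2.
grad_u1 = lambda x1, x2: -2.0 * (x1 - x2)
grad_u2 = lambda x1, x2: 2.0 * (x2 - x1)

x = [np.array([0.9, 0.1]), np.array([0.2, 0.8])]
for t in range(1, 201):
    eta_t = 0.5 / np.sqrt(t)  # O(1/sqrt(t)) step sizes, as in Section 3
    x = oga_step(x, [grad_u1(*x), grad_u2(*x)], eta_t)
```

Note that only the clip implements the projection; any other compact convex action set would require its own projection oracle.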

To impose a correlated distribution $\sigma$ of play as an equilibrium concept, let each $x^t$ be drawn with probability proportional to $\eta_t$. Now consider a "local" deviation for player $i$, in which they consider deviating in their strategies along the gradient of a function $h_i$. That is, for $\delta > 0$, they consider $x_i^t \mapsto \Pi_{X_i}\left[x_i^t + \delta \nabla_i h_i(x_i^t)\right]$.

What is the possible change in payoffs under such a local deviation? Consider for simplicity the case where no projections are ever required, either for the local deviations or during online gradient ascent, i.e. both $\Pi_{X_i}\left[x_i^t + \delta \nabla_i h_i(x_i^t)\right] = x_i^t + \delta \nabla_i h_i(x_i^t)$ and $\Pi_{X_i}\left[x_i^t + \eta_t \nabla_i u_i(x^t)\right] = x_i^t + \eta_t \nabla_i u_i(x^t)$. In this case, the change in utility after $T$ time steps, ignoring terms of order $\delta^2$, is given by

$$\begin{aligned}
\mathbb{E}_{x\sim\sigma}\left[u_{i}(x_{i}+\delta\nabla_{i}h_{i}(x_{i}),x_{-i})-u_{i}(x)\right] &= \frac{1}{\sum_{t}\eta_{t}}\sum_{t}\eta_{t}\left(u_{i}(x^{t}_{i}+\delta\nabla_{i}h_{i}(x^{t}_{i}),x^{t}_{-i})-u_{i}(x^{t})\right) \\
&= \frac{1}{\sum_{t}\eta_{t}}\sum_{t}\eta_{t}\left(\delta\left\langle\nabla_{i}h_{i}(x^{t}_{i}),\nabla_{i}u_{i}(x^{t})\right\rangle+O(\delta^{2})\right) \\
&= \frac{\delta}{\sum_{t}\eta_{t}}\sum_{t}\eta_{t}\left\langle\nabla_{i}h_{i}(x^{t}_{i}),\nabla_{i}u_{i}(x^{t})\right\rangle+O(\delta^{2}) \\
&= \frac{\delta}{\sum_{t}\eta_{t}}\sum_{t}\left(h_{i}(x^{t}_{i}+\eta_{t}\nabla_{i}u_{i}(x^{t}))-h_{i}(x^{t}_{i})\right)+O(\delta^{2}) \\
&= \frac{\delta}{\sum_{t}\eta_{t}}\left(h_{i}(x^{T+1}_{i})-h_{i}(x^{0}_{i})\right)+O(\delta^{2}).
\end{aligned}$$

The second equality here follows from a Taylor expansion of $u_i(x_i + \delta\nabla_i h_i(x_i), x_{-i})$, and the fourth equality from a Taylor expansion of $h_i(x_i^t + \eta_t \nabla_i u_i(x^t))$. As a consequence, if the functions $h_i$ which generate these local deviations are bounded, then under suitable assumptions on the step sizes, the regret against $h_i$ at order $O(\delta)$ vanishes with time. Dividing both sides by $\delta$ and letting $\delta \rightarrow 0$, we see that the expectation of $\left\langle \nabla_i h_i(x_i^t), \nabla_i u_i(x^t) \right\rangle$, the "first-order local regret", vanishes also.
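The telescoping argument can be checked numerically. The following sketch uses a toy single-player instance of our own choosing, with a quadratic deviation-generating function $h$ so that the Taylor remainder is exactly the accumulated second-order term; no projections are needed since the iterates are unconstrained.

```python
import numpy as np

# Toy check of the telescoping identity: with no projections,
#   sum_t eta_t <grad h(x^t), grad u(x^t)> = h(x^T) - h(x^0) - (second-order terms).
c = np.array([0.3, -0.2])
grad_u = lambda x: -2.0 * (x - c)        # u(x) = -||x - c||^2 (arbitrary choice)
h      = lambda x: 0.5 * float(x @ x)    # deviation-generating function h
grad_h = lambda x: x                     # so grad h(x) = x

x = np.array([1.0, 1.0])
h0, first_order, second_order, eta_sum = h(x), 0.0, 0.0, 0.0
for t in range(1, 501):
    eta = 0.1 / np.sqrt(t)
    g = grad_u(x)
    first_order += eta * float(grad_h(x) @ g)
    second_order += 0.5 * eta**2 * float(g @ g)  # exact Taylor remainder for quadratic h
    eta_sum += eta
    x = x + eta * g                              # unprojected gradient ascent step

gap = (h(x) - h0) - first_order  # equals the accumulated second-order terms
```

Dividing `first_order` by `eta_sum` gives the time-averaged first-order local regret, which shrinks as the horizon grows since the telescoped difference $h(x^{T+1}) - h(x^0)$ is bounded.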

However, when projections are involved, we need to bound the losses incurred due to terms of the form $\left\langle x_i^{t+1} - x_i^t + \delta\nabla_i h_i(x_i^t), \nabla_i u_i(x^t) \right\rangle$ and $\left\langle x_i^{t+1} - x_i^t + \eta_t \nabla_i u_i(x^t), \nabla_i h_i(x_i^t) \right\rangle$. This turns out to be a non-trivial task. Yet, when we consider the continuous motion induced by the game dynamics, defined via the differential equation $\dot{x}_i(\tau) = \Pi_{\mathcal{TC}_{X_i}(x_i(\tau))}\left[\nabla_i u_i(x(\tau))\right]$, no such loss occurs, and we obtain a declining regret bound through very straightforward differential arguments (cf. Section 3). This is via an argument on stationarity of the resulting distribution. Namely, given a smooth function $h : X \rightarrow \mathbb{R}$, each action profile $x \in X$ assigns a rate of change to $h$ via the projected gradient dynamics of the game; given $x(\tau)$,

$$\frac{dh(x(\tau))}{d\tau} = \sum_{i\in N} \left\langle \nabla_i h(x(\tau)), \Pi_{\mathcal{TC}_{X_i}(x_i(\tau))}\left[\nabla_i u_i(x(\tau))\right] \right\rangle,$$

and the expectation of this quantity over a time period $[0, \overline{\tau}]$ is simply equal to $(h(x(\overline{\tau})) - h(x(0)))/\overline{\tau}$, which converges to zero as $\overline{\tau} \rightarrow \infty$. Intuitively, the longer we let the curve trace out, the closer it comes to forming a distribution invariant under time translation. Exchanging which vector is projected onto the tangent cone, we observe that

$$\sum_{i\in N} \left\langle \nabla_i h(x(\tau)), \Pi_{\mathcal{TC}_{X_i}(x_i(\tau))}\left[\nabla_i u_i(x(\tau))\right] \right\rangle \geq \sum_{i\in N} \left\langle \Pi_{\mathcal{TC}_{X_i}(x_i(\tau))}\left[\nabla_i h(x(\tau))\right], \nabla_i u_i(x(\tau)) \right\rangle$$

except on a set of measure zero on the curve traced by the projected gradient dynamics of the game. This motivates our two first-order notions of equilibrium, one based on local deviations, the other based on stationarity.
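The stationarity argument can be illustrated by a forward-Euler simulation of the projected gradient flow. The instance below is our own choice: a box action set, for which the tangent-cone projection simply zeroes out components pushing outward at active constraints, so that the time-average of $dh/d\tau$ matches $(h(x(\overline{\tau})) - h(x(0)))/\overline{\tau}$ up to discretisation error.

```python
import numpy as np

def proj_tangent_cone_box(x, v, lo=0.0, hi=1.0, tol=1e-9):
    # Projection onto the tangent cone of the box [lo, hi]^d at x:
    # zero out components of v pushing outward at active constraints.
    w = v.copy()
    w[(x >= hi - tol) & (v > 0)] = 0.0
    w[(x <= lo + tol) & (v < 0)] = 0.0
    return w

# Toy single-player flow: u(x) = -||x - c||^2 with c outside the box,
# so the constraint becomes active along the trajectory.
c = np.array([2.0, 0.5])
grad_u = lambda x: -2.0 * (x - c)
h = lambda x: 0.5 * float(x @ x)     # test function h, grad h(x) = x

dtau, T = 1e-3, 5.0
x = np.array([0.1, 0.9])
h0, avg = h(x), 0.0
for _ in range(int(T / dtau)):
    w = proj_tangent_cone_box(x, grad_u(x))
    avg += dtau * float(x @ w)       # dtau * <grad h(x), w> = dtau * dh/dtau
    x = np.clip(x + dtau * w, 0.0, 1.0)  # Euler step; clip guards overshoot
avg /= T
```

The accumulated `avg` approximates the expectation of $dh/d\tau$ along the trajectory, and shrinks as the horizon $T$ grows, since $h$ is bounded on the box.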

Definition (Semiformal).

For $\epsilon > 0$, a distribution $\sigma$ over $X$ is said to be an $\epsilon$-local CE with respect to a family $F$ of Lipschitz continuous vector fields $X \rightarrow \mathbb{R}^D$ if, for every $f \in F$, $\sum_{i\in N} \mathbb{E}_{x\sim\sigma}\left[\left\langle \Pi_{\mathcal{TC}_{X_i}(x_i)}\left[f_i(x)\right], \nabla_i u_i(x) \right\rangle\right] \leq \epsilon \cdot \textnormal{poly}(\vec{G}, \vec{L}, G_f, L_f)$. For an $\epsilon$-stationary CE, we instead project $\nabla_i u_i$ rather than $f_i(x)$ onto the tangent cone of player $i$'s action set at $x_i$. Moreover, we call the equilibrium coarse if each vector field $f$ is equal to $\nabla h$ for some function $h$.

In particular, since our high-level argument hints that projected gradient ascent minimises regret with respect to the gradient field of any such function $h$, we take this to be the defining property of coarseness. Meanwhile, for more general correlated equilibria, we allow deviations generated by an arbitrary set $F$ of vector fields on $X$.

As mentioned, the main technical barrier lies in dealing with projections. Our chief technical innovation is thus in Section 3.1, where we sidestep dealing with them directly by constructing an appropriate continuous (and almost everywhere differentiable) curve over which regret analysis is simple. We essentially convert the sequence of play $x^1, x^2, \dots$ to a continuous curve $x(\tau)$, "connecting the dots" by projecting a partial gradient step, scaled by factors $\mu_t$ which allow us to modify the probability distribution we consider over the $x^t$. For instance, suppose that the initial strategies are $x^0$, and each player $i$ fixes $x_i^1 = \Pi_{X_i}\left[x_i^0 + \eta_0 \cdot \nabla_i u_i(x^0)\right]$. We interpret this step as having been taken "over time $\mu_0\eta_0$", i.e. we let $x_i(0) = x_i^0$ and $x_i(\mu_0\eta_0) = x_i^1$. To extend $x(\tau)$ continuously, we then let $x_i(\tau) = \Pi_{X_i}\left[x_i^0 + (\tau/\mu_0) \cdot \nabla_i u_i(x^0)\right]$ for each player $i$. This extension is continuous on $[0, \mu_0\eta_0]$ and, more desirably, differentiable almost everywhere with respect to the Lebesgue measure on the interval. Moreover, the choice $\mu_t = 1$ corresponds to a simulation of the gradient dynamics of the game, whereas $\mu_t = 1/\eta_t$ roughly corresponds to a uniform distribution over the $x^t$.
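"Connecting the dots" on the first interval can be sketched as follows (a hypothetical instance; the box action set and the numbers are our own choices). The key property is that the curve's endpoints recover the discrete iterates $x^0$ and $x^1$.

```python
import numpy as np

def project_box(x, lo=0.0, hi=1.0):
    return np.clip(x, lo, hi)

def interpolate_step(x0, grad, eta, mu, tau):
    # Curve segment between consecutive iterates:
    #   x(tau) = Pi_X[x0 + (tau / mu) * grad]  for tau in [0, mu * eta].
    assert 0.0 <= tau <= mu * eta
    return project_box(x0 + (tau / mu) * grad)

x0   = np.array([0.95, 0.40])          # initial strategy (in the box)
grad = np.array([0.30, -0.10])         # payoff gradient at x0 (arbitrary)
eta0, mu0 = 0.5, 1.0                   # step size and scaling factor
x1 = project_box(x0 + eta0 * grad)     # the next projected-ascent iterate

# Endpoints of the segment recover the discrete iterates.
start = interpolate_step(x0, grad, eta0, mu0, 0.0)
end   = interpolate_step(x0, grad, eta0, mu0, mu0 * eta0)
```

Between the endpoints, the segment is continuous, and differentiable except where the projection's active set changes, matching the almost-everywhere differentiability claimed above.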

Bounding the regret then amounts to bounding the difference between $\dot{x}(\tau)$, the velocity along the curve given by our "approximate" gradient dynamics for the game, and $\Pi_{\mathcal{TC}_{X_i}(x_i(\tau))}\left[\nabla_i u_i(x(\tau))\right]$, the actual velocity prescribed by the dynamics of the game. When the boundaries of the action sets are regular hypersurfaces in Euclidean space, this difference turns out to be small, whereas if the action sets are polyhedra satisfying a positivity condition (Definition 10), the difference depends only on the Lipschitz moduli of the utilities. We remark that the sufficient conditions we identify for the action sets are satisfied in many settings of interest, including e.g. the sphere, the hypercube, and the simplex. Consequently, we obtain proof that projected gradient ascent with equal step sizes has a universal approximation property.

Theorem (Informal, Theorems 3.4, 3.5, & 3.8).

Suppose that players all implement projected gradient ascent with step sizes $O(1/\sqrt{t})$. Further suppose that players' action sets satisfy one of the above assumptions. Then, sampling uniformly from the trajectory of the resulting curve $x(\tau) : [0, \sum_{t=0}^{T-1}\mu_t\eta_t] \rightarrow X$, we obtain a distribution which is both an $\tilde{O}(1/\sqrt{T})$-local and an $\tilde{O}(1/\sqrt{T})$-stationary CCE with respect to the set of gradient fields of all continuously differentiable functions with Lipschitz gradients. The regret bound is non-uniform among all such functions; it depends linearly on a bound on the magnitude of the gradient of $h$.

That the aggregate regret over the players against first-order deviations generated by any such function $h : X \rightarrow \mathbb{R}$ vanishes depends crucially on the fact that when all players use the same step sizes, the time sequence $(x^t)$ simulates the underlying continuous projected gradient dynamics of the game. Intuitively, zero first-order regret against any such deviation provides a convex description of solutions to the projected gradient dynamics, once the time-ordering information on the curve is erased. Thus, with potentially different step sizes which might have different asymptotic rates of decrease, or when some players may be adversarial, our methods fail to guarantee coarse regret bounds in such generality. However, first-order regret still vanishes when the deviations are restricted to vary only over the players who do use the same step sizes. The result depends on a similarly defined curve $x : [0, T] \rightarrow X$, which interpolates the gradient steps of the relevant players.

Theorem (Informal, Theorems 3.9 & 5.3).

Suppose that a subset $N'$ of the players all implement projected gradient ascent with step sizes $O(1/\sqrt{t})$, while the other players are adversarial. Further suppose that players' action sets satisfy one of the regularity assumptions. Then, sampling uniformly from the trajectory of an appropriately defined curve $x(\tau) : [0, T] \rightarrow X$, we obtain a distribution which is both an $O(1/\sqrt{T})$-local and an $O(1/\sqrt{T})$-stationary CCE with respect to the set of gradient fields of continuously differentiable functions $h$ with Lipschitz gradients whose value varies only in the strategies of the players in $N'$. Moreover, if $\nabla h$ "points inwards" to $X$, players in $N'$ also incur $O(1/\sqrt{T})$ local regret in time-average.

In Section 4, we turn our attention to first-order correlated equilibria. Here, we observe that the usual methods for $\Phi$-regret minimisation work for computing both stationary and local equilibria; the latter provided the vector fields satisfy a tangency condition.

Theorem (Informal, Theorems 4.1 and 4.2).

Suppose that $F$ is a finite set of vector fields on $X$. Then, with access to fixed-point computation for all vector fields in the linear span of $F$, an $\epsilon$-stationary correlated equilibrium can be computed in $O(|F|/\epsilon^2)$ iterations. An $\epsilon$-local correlated equilibrium has the same approximability property if all $f \in F$ are tangent to players' action sets, with access to fixed-point computation for all vector fields in the conical span of $F$ instead.

The question is then whether there exists an interesting setting in which fixed-point computation can be performed tractably. We identify one: when each vector field is affine-linear, each fixed-point computation reduces to minimising a quadratic function. Example 1 shows that a careful selection of such vector fields results in last-iterate convergence to the equilibrium in the matching pennies game. This suggests that a centralised learning algorithm might not require optimism [32] or explicit second-order corrections to the utility gradients [52] to obtain last-iterate convergence to equilibrium in games.
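As a hedged illustration of the affine-linear case: when the vector field is moreover a gradient field, i.e. $f = \nabla q$ for a concave quadratic $q$ (an additional symmetry assumption made for this sketch, not the paper's general reduction), the fixed points of the deviation map $x \mapsto \Pi_X[x + f(x)]$ are exactly the maximisers of $q$ over $X$, which projected gradient iterations find.

```python
import numpy as np

def project_box(x):
    return np.clip(x, 0.0, 1.0)

# Hypothetical affine-linear vector field f(x) = -2 (x - b): the gradient
# field of the concave quadratic q(x) = -||x - b||^2 (our own toy choice).
b = np.array([1.5, 0.3])
f = lambda x: -2.0 * (x - b)

# Fixed points of x -> Pi_X[x + f(x)] coincide with maximisers of q over
# X = [0,1]^2, so projected gradient ascent on q locates one.
x = np.zeros(2)
for _ in range(500):
    x = project_box(x + 0.1 * f(x))

# Residual of the fixed-point condition x = Pi_X[x + f(x)].
residual = float(np.linalg.norm(x - project_box(x + f(x))))
```

Here the maximiser of $q$ over the box is the clipped point $(1, 0.3)$, at which the outward component of $f$ is removed by the projection, so the residual vanishes.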

In Section 5, we focus on equilibrium analysis; our goal is to determine how our equilibria can be reasoned about. Towards this end, in Section 5.1 we adapt the usual primal-dual framework for equilibrium analysis, bounding the expectation of a quantity defined over the set of outcomes (e.g. welfare, or distance to an equilibrium) for our equilibrium concepts. Our observation is that for a primal problem which seeks a best- or worst-case correlated equilibrium, a dual solution provides information on the gradient dynamics of the game. In particular, if the primal problem maximises the expected worst-case distance to an equilibrium and there exists a dual solution of value $0$, that dual solution is a Lyapunov function in the usual sense. For correlated equilibria, the dual problem is instead a vector field fit problem, searching for a linear or conical combination of vector fields in $F$ which is well-aligned with the utility gradients.

Finally, in Section 5.2 we examine what first-order local equilibria look like for continuous extensions of normal form games. Here, we find an interesting equivalence result.

Proposition (Informal, Propositions 5.4, 5.6 & 5.7).

The usual notions of (coarse) correlated equilibrium, as well as the notion of an average coarse correlated equilibrium [64], are equivalent to suitably defined first-order local (coarse) correlated equilibria.

The equivalence here is in terms of equilibrium constraints on the resulting distributions of action profiles; a first-order correlated equilibrium defined with respect to an appropriate set $F$ of vector fields induces a distribution on action profiles that corresponds to one of the above equilibrium concepts, and the converse statement is also true. It is in this sense that the usual methods for primal-dual price of anarchy analysis acquire an interpretation as "Lyapunov function estimation problems". The equivalence also shows that our equilibrium notions may be considered true refinements and extensions of the usual notions of (coarse) correlated equilibria to the richer setting of non-concave games. Finally, we remark that, as desired, the sets of vector fields corresponding to the incentive constraints for (average) coarse correlated equilibria also satisfy our notion of coarseness for first-order correlated equilibria, justifying our nomenclature.

1.2 Related work

Game-theoretic analysis of multi-agent systems is often based first on the assertion of a concept of equilibrium as the expected outcome, followed by an investigation of its properties. The classical assumption is that, with self-interested agents, the outcome of the game should be a Nash equilibrium, which exists for finite normal-form games [62] and, more generally, concave games [70].

In contrast to classical equilibrium theory for convex markets, the outcome of a game need not be socially optimal. The algorithmic game theory perspective has thus been to view the class of equilibria in question through the lens of approximation algorithms. First proposed by [51], the price of anarchy measures the worst-case ratio between the outcome of a Nash equilibrium and the socially optimal outcome. A related concept measuring the performance of the best-case equilibrium, the price of stability, was first employed by [78] and named in [3]. Since then, a wealth of results have followed, proving constant welfare-approximation bounds for e.g. selfish routing [71], facility location [80], and a variety of auction settings including single-item [49] and simultaneous [21] first-price auctions and simultaneous second-price auctions [20], hinting at good performance of the associated mechanisms in practice. Meanwhile, deteriorating price of anarchy bounds, such as those for sequential first-price auctions [56], are interpreted as arguments against the use of such mechanisms.

However, the assertion that the outcome of a game should be a Nash equilibrium is problematic for several reasons, despite the guarantee that one exists in concave games. It is often the case that multiple equilibria exist in a game, in which case we face an equilibrium selection problem [47]. Moreover, Nash equilibrium computation is computationally hard in general, even in the setting of concave games where existence is guaranteed. In general, determining whether a pure-strategy Nash equilibrium exists in a normal-form game is NP-complete [27], and even for two-player normal-form games, finding an exact or approximate mixed Nash equilibrium is PPAD-complete [17, 30, 28]. Similar results extend to auction settings; for instance, (1) finding a Bayes-Nash equilibrium of a simultaneous second-price auction is PP-hard [23], even when buyers' valuations are restricted to be either additive or unit-demand, and (2) with subjective priors, computing an approximate equilibrium of a single-item first-price auction is PPAD-hard [38], a result that holds also with common priors when the tie-breaking rule is part of the input [26].

Some answers to this problem come from learning theory. First, for special classes of games, when each agent employs a specific learning algorithm in repeated instances of the game, the average play converges to Nash equilibria. This is true of fictitious play [15] for the empirical distribution of players' actions in a variety of classes of games, including zero-sum games [69], $2 \times n$ non-degenerate normal form games [8], and potential games [59]. For zero-sum games, the same is true for the more general class of no-(internal [16] or external [81]) regret learning algorithms, while for general normal-form games these algorithms respectively converge to correlated or coarse correlated equilibria, convex generalisations of the set of Nash equilibria of a normal form game. The price of anarchy / stability approach can then be extended to such coarse notions of equilibria, with the smoothness framework of [72] for robust price of anarchy bounds being a prime example.

Unfortunately, this perspective falls short of a complete picture for several reasons. First, learning dynamics can exhibit arbitrarily high complexity. Commonly considered learning dynamics may cycle about equilibria, as is the case for fictitious play [74] or for multiplicative weights update [14]. Worse, learning dynamics can exhibit formally chaotic behaviour [68, 25], and bimatrix games may approximate arbitrary dynamical systems [4]. In fact, replicator dynamics on matrix games is Turing complete [5], and reachability problems for the dynamics are in general undecidable.

In the converse direction, the usual notion of no-regret learning can be too weak to capture learnable behaviour, in which case the associated price of anarchy bounds may misrepresent the efficiency of actual learned outcomes. One motivating example is that of auctions. For first-price single-item auctions in the complete information setting, there may exist coarse correlated equilibria with suboptimal welfare, even though the unique equilibrium of the auction is welfare-optimal [36]. Meanwhile, in the incomplete information setting with symmetric priors, whether coarse correlated equilibria coincide with the unique equilibrium depends on informational assumptions on the equilibrium structure itself [7] and on convexity of the priors [1]. This is in apparent contradiction with recent empirical work which suggests that the equilibria of an even wider class of auctions are learnable when agents implement deep-learning or (with full feedback) gradient-based no-regret learning algorithms [9, 10, 60].

This motivates the search for a more general notion of equilibrium, stronger than coarse correlated equilibria for normal-form games and weaker than Nash equilibria, which nevertheless captures the guarantees of the above-mentioned settings and is tractable to approximate or reason about. For the case of auctions, one recent proposal has been that of mean-based learning algorithms [13], but even in that case the convergence results of [31, 35] are conditional.

There are two approaches towards the resolution of this question which, while not totally divorced in methodology, can be considered distinct in their philosophy. One approach has been to consider "game dynamics as the meaning of a game" [67], inspecting the existence of dynamics which converge to Nash equilibria, and extending price of anarchy analysis to account for whether an equilibrium is reached with high probability. Work on the former has produced impossibility results: there are games such that any gradient dynamics have starting points which do not converge to a Nash equilibrium, and for a set of games of positive measure, no game dynamics can guarantee convergence to $\epsilon$-Nash equilibria [11, 58]. Meanwhile, [66, 77] proposed the average price of anarchy as a refined performance metric accounting for the game dynamics; it is defined as the expectation, over the set of initial conditions, of the welfare of the Nash equilibria reached under a fixed gradient-based learning dynamics for the game.

Another approach has been to establish the computational complexity of local notions of equilibria. This has attracted attention especially in the setting of non-concave games, where the existence of Nash equilibria is no longer guaranteed, owing to the success of recent practical advances in machine learning obtained by embracing non-convexity [29]. However, approximate min-max optimisation is yet again PPAD-complete [34]. As a consequence, unless PPAD $\subseteq$ FP, a tractably approximable local equilibrium concept for non-concave games with compact and convex action sets must necessarily be coarser. Towards this end, [48, 46] define a notion of regret based on a sliding average of players' payoff functions. Meanwhile, a recent proposal by [18] has been to define a local correlated equilibrium: a distribution over action profiles and a set of local deviations such that, approximately, no player may significantly improve their payoffs by deviating locally pointwise in the support of the distribution. They are then able to show that for two classes of local deviations, such an approximate local correlated equilibrium is tractably computable.

Our goal in this paper is then to find an equilibrium concept which (1) remains valid for non-concave games, (2) is stronger than coarse correlated equilibria for normal form games, (3) is tractable to approximate, and (4) admits a suitable extension of the usual primal-dual framework for bounding the expectation of quantities over the set of coarse correlated equilibria. The latter necessitates not only a distribution of play, but also incentive constraints which are specified depending only on the resulting distribution and not on its time-ordered history. We remark that the framework of [48] falls short in the latter aspect when the cyclic behaviour of projected gradient ascent results in a non-vanishing regret bound. We thus turn our attention to generalising the work of [18] on a local notion of $\Phi$-equilibrium [40].

Strikingly, in doing so, we demonstrate an intrinsic link between such local notions of coarse equilibria and the dynamical-systems perspective on learning in games. In particular, the two local notions of $\Phi$-equilibria defined in [18] are subclasses of what we dub local coarse correlated equilibria: distributions of play such that agents in aggregate have no strict incentive for infinitesimal changes of the support of the distribution along any gradient field over the set of action profiles. The history of play induced by online (projected) gradient ascent then approximates such an equilibrium, by virtue of approximating a time-invariant distribution for the game's gradient dynamics. Extending the usual primal-dual scheme for price of anarchy bounds then reveals that any dual proof of a performance bound necessarily takes the form of a "generalised" Lyapunov function for the quantity whose expectation is to be bounded. The usual LP framework for coarse correlated equilibria is in fact contained in this approach, its dual optimal solutions corresponding to a "best-fit" quadratic Lyapunov function.

Our approach in proving our results combines insights explored in two prior works. The existence and uniqueness of solutions to projected dynamical systems over polyhedral sets is due to [33], and in our approximation proofs we also define a history over a continuous time interval. However, our analysis differs in that we are not interested in approximating the continuous time projected dynamical system itself over the entire time interval; an approach that would doom our endeavour for tractable approximations in the presence of chaotic behaviour, which is not ruled out under our assumptions [25, 4]. Instead, we are interested in showing the approximate stationarity of expectations of quantities. Therefore, for projected gradient ascent, we suitably extrapolate the history of play into a piecewise differentiable curve, in the fashion of the projections curve studied by [57]. We then identify settings in which this curve moves approximately along the payoff gradients at each point in time, via consideration of the properties of the boundary of the action sets in question.

Meanwhile, whereas Lyapunov-function based arguments are not new in the analysis of convergence to equilibria in economic settings (e.g. [65]), in evolutionary game theory (c.f. [73] for an in-depth discussion), and in learning in games (e.g. [61, 82]), our perspective of bounding expectations of quantities appears to be relatively unexplored. Most Lyapunov-function based arguments in the literature are concerned with pointwise (local) convergence to a unique Nash equilibrium, and work under the assumption of monotonicity or variational stability, or the existence of a potential function. The former two conditions imply the existence of a quadratic Lyapunov function for the game’s projected gradient dynamics, from which Lyapunov functions for alternative learning processes may be constructed. One exception is [43], which deals explicitly with the problem of bounding expectations over stable distributions of Markov processes. Moreover, it is there that the dual solution of the LP bounding the expectation of some function is dubbed a “Lyapunov function”. The continuous time gradient dynamics we study is of course a Markov process, albeit a rather “boring” one in the sense that it is fully deterministic; this motivates us to denote any of our dual solutions a generalised Lyapunov function. However, the results of [43] do not address approximations of the Markov process, nor how Lyapunov-function based dual proofs extend to such approximations. Moreover, it is unclear how their results apply to projected gradient ascent with polyhedral action sets. We are able to provide positive answers on both fronts.

In turn, our analysis of local correlated equilibria depends on the more established framework of Φ\Phi-regret minimisation. Our techniques here are essentially those used in [41, 39]. One key difference is our vector field formulation; usual swap-regret minimisation [76, 39, 41] considers mappings of the action space onto itself, while we measure regret against its differential generators. The consequences are reminiscent of the result of [44] on the equivalence of no regret learning and fixed point computation, in that we require access to a fixed-point oracle for the linear (or conical) combinations of vector fields in question to implement our regret minimisation algorithms. On the other hand, our vector field formulation allows us to refine the notion of correlated equilibria to a family of vector fields for which fixed-point computation is tractable; we are unaware of any similar observation.

2 Preliminaries

In what follows, \mathbb{N} denotes the set of natural numbers222Including 0., and we identify also with each NN\in\mathbb{N} the set {n|1nN}\{n\in\mathbb{N}|1\leq n\leq N\}. Following standard notation, for a given NN\in\mathbb{N}, and any tuple (Xj)jN(X_{j})_{j\in N} indexed by NN333For instance, vectors or families of sets indexed by NN. and any iNi\in N, Xi(Xj)jN{i}X_{-i}\equiv(X_{j})_{j\in N\setminus\{i\}} denotes the tuple with the ii’th coordinate dropped. Meanwhile, for a tuple x(xj)jNx\equiv(x_{j})_{j\in N}, some iNi\in N, and yiy_{i}, (yi,xi)(y_{i},x_{-i}) is the tuple where xix_{i} is replaced by yiy_{i} in xx. In addition, given some DD\in\mathbb{N}, for each iDi\in D we will denote by eie_{i} the standard basis vector in D\mathbb{R}^{D} whose ii’th component equals one and all others zero. Given a compact and convex set XDX\subseteq\mathbb{R}^{D}, and some xXx\in X, we let 𝒯𝒞X(x)\mathcal{TC}_{X}\left(x\right) and 𝒩𝒞X(x)\mathcal{NC}_{X}\left(x\right) respectively denote the tangent and normal cones to XX at xx, i.e.

𝒯𝒞X(x)\displaystyle\mathcal{TC}_{X}\left(x\right) =conv¯{t(yx)|t0yX},\displaystyle=\overline{\text{conv}}\{t\cdot(y-x)\ |\ t\geq 0\land y\in X\},
𝒩𝒞X(x)\displaystyle\mathcal{NC}_{X}\left(x\right) =conv¯{zD|y𝒯𝒞X(x),y,z0}.\displaystyle=\overline{\text{conv}}\{z\in\mathbb{R}^{D}\ |\ \forall\ y\in\mathcal{TC}_{X}\left(x\right),\left\langle y,z\right\rangle\leq 0\}.

Here, conv¯\overline{\text{conv}} denotes the convex closure of a set, and x,y\left\langle x,y\right\rangle is the standard inner product of xx and yy in D\mathbb{R}^{D}. In turn, for any DD\in\mathbb{N}, and any yDy\in\mathbb{R}^{D}, we write y=y,y\|y\|=\sqrt{\left\langle y,y\right\rangle} for the standard Euclidean norm on D\mathbb{R}^{D}, and then for any compact and convex set XX, d(X)=maxx,xXxxd(X)=\max_{x,x^{\prime}\in X}\|x-x^{\prime}\| for its diameter and ΠX[y]argminxXxy22\Pi_{X}\left[y\right]\equiv\operatorname*{arg\,min}_{x\in X}\|x-y\|_{2}^{2} for the projection of yy onto XDX\subseteq\mathbb{R}^{D}. Given X×iNXiX\equiv\times_{i\in N}X_{i} such that each XiDiX_{i}\subseteq\mathbb{R}^{D_{i}} for some natural number DiD_{i}, some xXx\in X, and a differentiable function f:Xf:X\rightarrow\mathbb{R}, f(x)\nabla f(x) denotes the usual gradient of ff, while if(x)\nabla_{i}f(x) is the vector (f/xij)jDiDi\left(\partial f/\partial x_{ij}\right)_{j\in D_{i}}\in\mathbb{R}^{D_{i}}. Finally, while tracking the “time evolution” of quantities, we will opt for t,Tt,T\in\mathbb{N} to denote discrete time, and τ\tau\in\mathbb{R} to denote continuous time.
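For intuition, when X is a box the two cones and their projections can be computed coordinate-wise, and any vector splits orthogonally into its tangent-cone and normal-cone projections. The following sketch (our own illustration; the unit box and all function names are our assumptions, not the paper's) checks this Moreau-type decomposition:

```python
# Tangent/normal cone projections for the box X = [0, 1]^D at a point x,
# computed coordinate-wise (illustrative sketch; names are ours).

def proj_tangent_cone(x, y, lo=0.0, hi=1.0):
    # TC_X(x): coordinate j is unconstrained if lo < x_j < hi,
    # must be >= 0 if x_j = lo, and <= 0 if x_j = hi.
    out = []
    for xj, yj in zip(x, y):
        if xj <= lo:
            out.append(max(yj, 0.0))
        elif xj >= hi:
            out.append(min(yj, 0.0))
        else:
            out.append(yj)
    return out

def proj_normal_cone(x, y, lo=0.0, hi=1.0):
    # NC_X(x) is the polar cone; its projection is the coordinate-wise residual.
    t = proj_tangent_cone(x, y, lo, hi)
    return [yj - tj for yj, tj in zip(y, t)]

x = [0.0, 1.0, 0.3]          # one coordinate on each face, one interior
y = [-2.0, 1.5, 0.7]
t = proj_normal = proj_tangent_cone(x, y)
n = proj_normal_cone(x, y)
# Moreau decomposition: y = Pi_TC[y] + Pi_NC[y] with <Pi_TC[y], Pi_NC[y]> = 0.
assert all(abs(yj - (tj + nj)) < 1e-12 for yj, tj, nj in zip(y, t, n))
assert abs(sum(tj * nj for tj, nj in zip(t, n))) < 1e-12
```

The same decomposition is what the analysis of Section 3 uses when splitting utility and test-function gradients at points of the boundary.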

Definition 1.

The data of a smooth game is specified by a tuple (N,(Xi)iN,(ui)iN)\left(N,(X_{i})_{i\in N},(u_{i})_{i\in N}\right), where

  1. 1.

    NN\in\mathbb{N} is the number (and by choice of notation, the set) of players.

  2. 2.

For each iNi\in N, XiX_{i} is the action set of player ii. We assume that XiX_{i} is a compact and convex subset of Di\mathbb{R}^{D_{i}} for some DiD_{i}\in\mathbb{N}, and denote by X×iNXiX\equiv\times_{i\in N}X_{i} the set of outcomes of the game. We shall also write D=iNDiD=\sum_{i\in N}D_{i} for the total dimension of the set of outcomes.

  3. 3.

    For each iNi\in N, ui:Xu_{i}:X\rightarrow\mathbb{R} is the utility function of player ii. Each uiu_{i} is assumed to be continuously differentiable and have Lipschitz gradients, that is to say, there exists Gi,Li+G_{i},L_{i}\in\mathbb{R}_{+} such that for any x,xXx,x^{\prime}\in X,

    ui(x)\displaystyle\|\nabla u_{i}(x)\| Gi,\displaystyle\leq G_{i},
    ui(x)ui(x)\displaystyle\|\nabla u_{i}(x)-\nabla u_{i}(x^{\prime})\| Lixx.\displaystyle\leq L_{i}\|x-x^{\prime}\|.

    We will denote G(Gi)iN\vec{G}\equiv(G_{i})_{i\in N} and L(Li)iN\vec{L}\equiv(L_{i})_{i\in N}, the full vector of the bounds on players’ gradients and Lipschitz coefficients.

Theoretic analysis of a game is often done by endowing it with an equilibrium concept, which specifies the “expected, stable outcome” of a given game. The standard equilibrium concept is that of a Nash equilibrium (NE), an outcome xXx\in X of a game such that for any player iNi\in N, and any action xiXix^{\prime}_{i}\in X_{i}, ui(x)ui(xi,xi)u_{i}(x)\geq u_{i}(x^{\prime}_{i},x_{-i}). Whenever all uiu_{i} are concave over XiX_{i} given any fixed xiXix_{-i}\in X_{-i}, such an equilibrium necessarily exists [70]. However, for a generic smooth game, an NE need not exist [29], which motivates the notion of a first-order Nash equilibrium; an outcome xXx\in X is called a first-order NE if for every player ii, iui(x)𝒩𝒞Xi(xi)\nabla_{i}u_{i}(x)\in\mathcal{NC}_{X_{i}}\left(x_{i}\right).

By fixed point arguments, a first-order NE necessarily exists for a smooth game. However, the computation of such an equilibrium is not known to be tractable at present. This has led to questions on whether there exists a correlated variant of a first-order NE; keeping in mind that correlated and coarse correlated equilibria (respectively, CE and CCE) of a finite game are computable in time polynomial in the size of its normal-form representation, one would ask whether such a local CCE is also tractable to compute or at least approximate. Towards this end, [18] propose one definition of a local Φ\Phi-equilibrium, analogous to the definition of Φ\Phi-regret for finite games [40].

Definition 2 ([18], Definition 1).

For ϵ>0\epsilon>0, a distribution σ\sigma over XX is said to be an ϵ\epsilon-approximate Φ\Phi-equilibrium if for any iNi\in N and ϕiΦXi(δ)\phi_{i}\in\Phi^{X_{i}}(\delta), 𝔼xσ[ui(ϕi(xi),xi)ui(x)]ϵ\mathbb{E}_{x\sim\sigma}[u_{i}(\phi_{i}(x_{i}),x_{-i})-u_{i}(x)]\leq\epsilon.

Definition 3 ([18], Definition 3).

For some δ>0\delta>0 and a player ii, ΦXi\Phi^{X_{i}} is called a family of δ\delta-local strategy modifications if for each element ϕi:XiXi\phi_{i}:X_{i}\rightarrow X_{i} of ΦXi\Phi^{X_{i}} and each xiXix_{i}\in X_{i}, ϕi(xi)xiδ\|\phi_{i}(x_{i})-x_{i}\|\leq\delta.

Definitions 2 and 3 together thus provide a local notion of correlated equilibrium, insofar as players consider modifying their strategies by a δ\delta-small step after they are drawn444Indeed, an earlier version of [18] called this equilibrium concept an (ϵ,Φ)(\epsilon,\Phi)-local correlated equilibrium. In light of the results in Section 5.2, we retain the nomenclature.. The equilibrium concept depends on the families of local strategy modifications, and [18] show the tractable approximability of such equilibria for two such families,

ΦIntXi(δ)\displaystyle\Phi^{X_{i}}_{\textnormal{Int}}(\delta) ={xiδxi+(1δ)xi|xiXi},\displaystyle=\{x_{i}\mapsto\delta x_{i}^{*}+(1-\delta)x_{i}\ |\ x^{*}_{i}\in X_{i}\},
ΦProjXi(δ)\displaystyle\Phi^{X_{i}}_{\textnormal{Proj}}(\delta) ={xiΠXi[xi+δv]|vDi,v1}.\displaystyle=\{x_{i}\mapsto\Pi_{X_{i}}\left[x_{i}+\delta v\right]\ |\ v\in\mathbb{R}^{D_{i}},\|v\|\leq 1\}.

An immediate observation is that both ΦIntXi\Phi^{X_{i}}_{\textnormal{Int}} and ΦProjXi\Phi^{X_{i}}_{\textnormal{Proj}}, as families of δ\delta-local strategy modifications, are provided by families of gradient fields of functions over XiX_{i}. Specifically, we may write

ΦIntXi(δ)\displaystyle\Phi^{X_{i}}_{\textnormal{Int}}(\delta) ={xixiδixixi2/2|xiXi},\displaystyle=\{x_{i}\mapsto x_{i}-\delta\nabla_{i}\|x_{i}-x_{i}^{*}\|^{2}/2\ |\ x^{*}_{i}\in X_{i}\}, (1)
ΦProjXi(δ)\displaystyle\Phi^{X_{i}}_{\textnormal{Proj}}(\delta) ={xiΠXi[xi+δixi,v]|vDi,v1}.\displaystyle=\{x_{i}\mapsto\Pi_{X_{i}}\left[x_{i}+\delta\nabla_{i}\left\langle x_{i},v\right\rangle\right]\ |\ v\in\mathbb{R}^{D_{i}},\|v\|\leq 1\}. (2)

And of course, we may extend each function XiX_{i}\rightarrow\mathbb{R} to one XX\rightarrow\mathbb{R}, by letting it depend only on the action of player ii; for such a function hh, jh=0\nabla_{j}h=0 for any jij\neq i. In what follows, we will exploit this observation and define alternate, differential definitions of local correlated equilibria, and its coarseness. In particular, we shall identify the appropriate notion of coarseness to be when the equilibrium constraints are all provided by gradient fields – as in this case, online gradient ascent will provide a universal approximation algorithm. To define our differential notion of equilibria, we first modify Definitions 2 and 3 to only apply to sufficiently smooth vector fields.
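The two families of strategy modifications above can also be made concrete in code. The sketch below (our own illustration; the box domain and parameter values are assumptions) implements both deviation maps on [0,1]^D and checks the induced locality: the interpolation map moves a point by at most δ·d(X), while the projection map moves it by at most δ, by nonexpansiveness of the Euclidean projection:

```python
import math, random

def clip_box(z, lo=0.0, hi=1.0):
    # Euclidean projection onto the box [lo, hi]^D is coordinate-wise clipping.
    return [min(max(zj, lo), hi) for zj in z]

def phi_int(x, x_star, delta):
    # Interpolation deviation: x -> delta * x_star + (1 - delta) * x.
    return [delta * s + (1 - delta) * xj for s, xj in zip(x_star, x)]

def phi_proj(x, v, delta):
    # Projection deviation: x -> Pi_X[x + delta * v], with ||v|| <= 1.
    return clip_box([xj + delta * vj for xj, vj in zip(x, v)])

def norm(z):
    return math.sqrt(sum(zj * zj for zj in z))

random.seed(0)
D, delta = 4, 0.05
diam = math.sqrt(D)  # diameter of [0,1]^D
for _ in range(100):
    x = [random.random() for _ in range(D)]
    x_star = [random.random() for _ in range(D)]
    v = [random.gauss(0, 1) for _ in range(D)]
    nv = norm(v)
    v = [vj / nv for vj in v]
    # phi_int moves at most delta * d(X); phi_proj at most delta (nonexpansive).
    assert norm([a - b for a, b in zip(phi_int(x, x_star, delta), x)]) <= delta * diam + 1e-12
    assert norm([a - b for a, b in zip(phi_proj(x, v, delta), x)]) <= delta + 1e-12
```

Note that the interpolation family is δ-local in the sense of Definition 3 only up to the factor d(X); the projection family is δ-local outright.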

Definition 4.

For Δ>0\Delta>0 and ϵ0\epsilon\geq 0, a distribution σ\sigma over XX is said to be an (ϵ,Δ)(\epsilon,\Delta)-local CE with respect to a family FF of Lipschitz continuous vector fields XDX\rightarrow\mathbb{R}^{D}, if for every fFf\in F,

iN𝔼xσ[ui(ΠXi[xi+δfi(x)],xi)ui(x)]ϵδpoly(G,L,Gf,Lf)+o(δ)δ[0,Δ].\sum_{i\in N}\mathbb{E}_{x\sim\sigma}\left[u_{i}(\Pi_{X_{i}}\left[x_{i}+\delta f_{i}(x)\right],x_{-i})-u_{i}(x)\right]\leq\epsilon\delta\cdot\textnormal{poly}(\vec{G},\vec{L},G_{f},L_{f})+o(\delta)\ \forall\ \delta\in[0,\Delta]. (3)

Here, GfG_{f} and LfL_{f} are respectively bounds on the magnitude of f(x)\|f(x)\| and the Lipschitz coefficient of ff, analogous to GiG_{i} and LiL_{i}. If ϵ=0\epsilon=0, σ\sigma is simply called a local CE.

Definition 5 (Coarseness).

For Definition 4 and in all subsequent definitions in this section, if the family FF is given as a family of gradient fields {h|hH}\{\nabla h\ |\ h\in H\} for a subset HC2(X)H\subseteq C^{2}(X) of twice continuously differentiable functions, σ\sigma is said to be a coarse equilibrium.

We remark on the difference in the role of ϵ\epsilon in Definitions 2 and 3 versus that in our Definition 4; in the former, ϵ\epsilon is an absolute bound on the violation of the equilibrium constraints, while in the latter, ϵ\epsilon is a multiplicative bound on the expectation of the sum, over the set of players, of the derivatives of uiu_{i} in direction (ΠXi[xi+δfi(x)]xi)/δ\left(\Pi_{X_{i}}\left[x_{i}+\delta f_{i}(x)\right]-x_{i}\right)/\delta. Another difference between the two definitions is that Definition 4 considers arbitrary suitably smooth vector fields f:XDf:X\rightarrow\mathbb{R}^{D} and not only those defined solely on XiX_{i} for some iNi\in N. Finally, we remark that our bounds can be non-uniform, with regret against fFf\in F deteriorating potentially as a polynomial555Quotienting out this polynomial has the convenient side effect that it allows for simpler ways to write ϵ\epsilon. in the bounds on the magnitude and Lipschitz moduli of players’ utilities and those of ff.

Dividing both sides of (3) by δ\delta and taking the limit δ0\delta\downarrow 0 for an (ϵ,Δ)(\epsilon,\Delta)-local CE gives the following differential definition of an ϵ\epsilon-local CE.

Definition 6.

For ϵ>0\epsilon>0, a distribution σ\sigma over XX is said to be an ϵ\epsilon-local CE with respect to a family FF of Lipschitz continuous vector fields XDX\rightarrow\mathbb{R}^{D}, if for every fFf\in F,

iN𝔼xσ[Π𝒯𝒞Xi(xi)[fi(x)],iui(x)]ϵpoly(G,L,Gf,Lf).\sum_{i\in N}\mathbb{E}_{x\sim\sigma}\left[\left\langle\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[f_{i}(x)\right],\nabla_{i}u_{i}(x)\right\rangle\right]\leq\epsilon\cdot\textnormal{poly}(\vec{G},\vec{L},G_{f},L_{f}). (4)

Via an appeal to the projected gradient dynamics of the smooth game, we will show that such notions of local coarse correlated equilibria are in fact weakenings of the equilibrium concept we obtain by demanding that the distribution σ\sigma be invariant under time translation for the game’s projected gradient dynamics. Such dynamics are provided by the system of differential equations,

dxi(τ)dτ=Π𝒯𝒞Xi(xi)[iui(x(τ))]iN,\frac{dx_{i}(\tau)}{d\tau}=\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x(\tau))\right]\ \forall\ i\in N, (5)

with initial conditions xi(0)Xix_{i}(0)\in X_{i} for each player ii. By [19], for any initial conditions x(0)Xx(0)\in X, there is a unique absolutely continuous solution x(τ)x(\tau) for τ[0,)\tau\in[0,\infty) such that (5) holds almost everywhere.

Now, suppose that a distribution σ\sigma is invariant under time translation, that is to say, for any measurable set AXA\subseteq X and any τ>0\tau>0, the set A1(τ)={x(0)X|x(τ)A}A^{-1}(\tau)=\{x(0)\in X\ |\ x(\tau)\in A\} is measurable, and moreover, σ(A1(τ))=σ(A)\sigma(A^{-1}(\tau))=\sigma(A). In this case, for any continuously differentiable function h:Xh:X\rightarrow\mathbb{R}, ddτ𝔼x(0)σ[h(x(τ))]=0\frac{d}{d\tau}\mathbb{E}_{x(0)\sim\sigma}[h(x(\tau))]=0 at τ=0\tau=0. In particular, whenever the expectation and the time derivative commute,

iN𝔼xσ[ih(x),Π𝒯𝒞Xi(xi)[iui(x)]]=iN𝔼x(0)σ[ih(x),dxi(τ)/dτ|τ=0]=0.\sum_{i\in N}\mathbb{E}_{x\sim\sigma}\left[\left\langle\nabla_{i}h(x),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]\right\rangle\right]=\sum_{i\in N}\mathbb{E}_{x(0)\sim\sigma}\left[\left\langle\nabla_{i}h(x),dx_{i}(\tau)/d\tau\right\rangle\Bigg{|}_{\tau=0}\right]=0.

Thus distributions which are stationary under the gradient dynamics can be viewed as satisfying a different family of no-regret properties, each expressing that the expected value of a given function does not change under time translation. First-order Nash equilibria of course satisfy such a no-regret property; if xXx\in X is a fixed-point of the gradient dynamics of the game, then Π𝒯𝒞Xi(xi)[iui(x)]=0\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]=0 for every player ii. Therefore, for any vector vDv\in\mathbb{R}^{D}, v,Π𝒯𝒞Xi(xi)[iui(x)]=0\langle v,\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]\rangle=0. Together, these observations motivate our notion of stationary correlated equilibria.

Definition 7.

For ϵ>0\epsilon>0, a distribution σ\sigma is said to be an ϵ\epsilon-stationary CE with respect to a family FF of Lipschitz continuous vector fields XDX\rightarrow\mathbb{R}^{D}, if for every fFf\in F,

|iN𝔼xσ[Π𝒯𝒞Xi(xi)[iui(x)],fi(x)]|ϵpoly(G,L,Gf,Lf).\left|\sum_{i\in N}\mathbb{E}_{x\sim\sigma}\left[\left\langle\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right],f_{i}(x)\right\rangle\right]\right|\leq\epsilon\cdot\textnormal{poly}(\vec{G},\vec{L},G_{f},L_{f}). (6)

3 Approximability of First-Order Coarse Correlated Equilibria

A primary objective of our work is to show that the above notions of ϵ\epsilon-local or ϵ\epsilon-stationary CCE are universally approximable in some “game theoretically relevant” settings. Here, by universal we mean that in Definition 5 we may take HH to be the set of all differentiable functions h:Xh:X\rightarrow\mathbb{R} with Lipschitz gradients and nevertheless tractably compute approximate local or stationary equilibria. We shall establish this by showing that projected gradient dynamics obtains an approximate stationary CCE, and that such stationary CCE are necessarily also local CCE. Whereas for the purposes of practical algorithms we will need to investigate the setting in which all players take discrete steps in time, the form of the analysis is motivated by how projected gradient dynamics yields our desired approximation.

To wit, let hHh\in H be given, and consider sampling uniformly from the trajectory of the projected gradient dynamics given initial conditions x(0)x(0). That is, we consider the distribution σ\sigma on XX obtained by drawing τ[0,τ¯]\tau\in[0,\overline{\tau}] uniformly and sampling x(τ)x(\tau). In this case,

𝔼xσ[iNΠ𝒯𝒞Xi(xi)[iui(x)],ih(x)]\displaystyle\mathbb{E}_{x\sim\sigma}\left[\sum_{i\in N}\left\langle\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right],\nabla_{i}h(x)\right\rangle\right] =𝔼τU[0,τ¯][iNdxi(τ)/dτ,ih(x(τ))]\displaystyle=\mathbb{E}_{\tau\sim U[0,\overline{\tau}]}\left[\sum_{i\in N}\left\langle dx_{i}(\tau)/d\tau,\nabla_{i}h(x(\tau))\right\rangle\right]
=0τ¯1τ¯dh(x(τ))dτ𝑑τ\displaystyle=\int_{0}^{\overline{\tau}}\frac{1}{\overline{\tau}}\frac{dh(x(\tau))}{d\tau}\cdot d\tau
=h(x(τ¯))h(x(0))τ¯.\displaystyle=\frac{h(x(\overline{\tau}))-h(x(0))}{\overline{\tau}}.

Now, by the bound on the gradient of hh, |h(x(τ¯))h(x(0))|d(X)Gh|h(x(\overline{\tau}))-h(x(0))|\leq d(X)\cdot G_{h}, which immediately shows that we obtain an ϵ\epsilon-stationary CCE with ϵ=d(X)/τ¯\epsilon=d(X)/\overline{\tau} and poly(G,L,Gh,Lh)=Gh\textnormal{poly}(\vec{G},\vec{L},G_{h},L_{h})=G_{h}. Interestingly, we do not yet need to invoke any bounds on the magnitude of the utility gradients nor any Lipschitz moduli.
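This telescoping argument lends itself to a numerical sanity check. The sketch below (our own construction; the single-player game on [0,1]^2 and the test function h are illustrative assumptions) runs an Euler discretisation of the projected gradient dynamics and compares the time-average of the inner product of the projected utility gradient with the gradient of h against the telescoped quantity (h(x(τ̄))−h(x(0)))/τ̄:

```python
import math

def proj_tc(x, g, lo=0.0, hi=1.0):
    # Tangent-cone projection for the box, coordinate-wise.
    return [max(gj, 0.0) if xj <= lo else (min(gj, 0.0) if xj >= hi else gj)
            for xj, gj in zip(x, g)]

def clip(z, lo=0.0, hi=1.0):
    return [min(max(zj, lo), hi) for zj in z]

# Single player on X = [0,1]^2 with u(x) = -||x - c||^2 / 2 for c outside the
# box, so the flow hits the boundary; h is an arbitrary smooth test function.
c = (1.7, 0.5)
grad_u = lambda x: [cj - xj for cj, xj in zip(c, x)]
h = lambda x: math.sin(x[0] + x[1])
grad_h = lambda x: [math.cos(x[0] + x[1])] * 2

eta, T = 1e-3, 20000            # Euler step size and number of steps
tau_bar = eta * T
x = [0.2, 0.9]
x0 = list(x)
avg = 0.0
for _ in range(T):
    g = proj_tc(x, grad_u(x))
    avg += eta * sum(gj * hj for gj, hj in zip(g, grad_h(x))) / tau_bar
    x = clip([xj + eta * gj for xj, gj in zip(x, g)])

telescoped = (h(x) - h(x0)) / tau_bar
assert abs(avg - telescoped) < 1e-2                # agree up to discretisation error
# eps-stationarity residual is within d(X) * G_h / tau_bar, here 2 / tau_bar:
assert abs(avg) <= math.sqrt(2) * math.sqrt(2) / tau_bar + 1e-2
```

The trajectory converges to the boundary fixed point (1, 0.5), after which the projected gradient vanishes, so the residual indeed decays like 1/τ̄.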

Now, suppose for a simple example that each XiX_{i} is the DiD_{i}-dimensional [0,1][0,1]-hypercube. In this case, we will argue that

𝔼xσ[iNΠ𝒯𝒞Xi(xi)[ih(x)],iui(x)]𝔼xσ[iNΠ𝒯𝒞Xi(xi)[iui(x)],ih(x)],\mathbb{E}_{x\sim\sigma}\left[\sum_{i\in N}\left\langle\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}h(x)\right],\nabla_{i}u_{i}(x)\right\rangle\right]\leq\mathbb{E}_{x\sim\sigma}\left[\sum_{i\in N}\left\langle\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right],\nabla_{i}h(x)\right\rangle\right],

which implies that the given distribution is an ϵ\epsilon-local CCE with respect to HH with the very same ϵ\epsilon. We shall do so by showing that, for each iNi\in N and each τ[0,τ¯]\tau\in[0,\overline{\tau}],

Π𝒯𝒞Xi(xi)[ih(x)],iui(x)Π𝒯𝒞Xi(xi)[iui(x)],ih(x)\left\langle\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}h(x)\right],\nabla_{i}u_{i}(x)\right\rangle\leq\left\langle\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right],\nabla_{i}h(x)\right\rangle (7)

except on a subset of measure zero of [0,τ¯][0,\overline{\tau}]. Here, we suppress the time-dependence for the purpose of parenthesis economy. In this case, first note that

ih(x)\displaystyle\nabla_{i}h(x) =Π𝒯𝒞Xi(xi)[ih(x)]+Π𝒩𝒞Xi(xi)[ih(x)], and\displaystyle=\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}h(x)\right]+\Pi_{\mathcal{NC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}h(x)\right],\textnormal{ and}
iui(x)\displaystyle\nabla_{i}u_{i}(x) =Π𝒯𝒞Xi(xi)[iui(x)]+Π𝒩𝒞Xi(xi)[iui(x)].\displaystyle=\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]+\Pi_{\mathcal{NC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right].

As a consequence,

Π𝒯𝒞Xi(xi)[ih(x)],iui(x)\displaystyle\left\langle\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}h(x)\right],\nabla_{i}u_{i}(x)\right\rangle Π𝒯𝒞Xi(xi)[iui(x)],ih(x)\displaystyle\leq\left\langle\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right],\nabla_{i}h(x)\right\rangle
Π𝒯𝒞Xi(xi)[ih(x)],Π𝒩𝒞Xi(xi)[iui(x)]\displaystyle\Leftrightarrow\left\langle\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}h(x)\right],\Pi_{\mathcal{NC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]\right\rangle Π𝒯𝒞Xi(xi)[iui(x)],Π𝒩𝒞Xi(xi)[ih(x)].\displaystyle\leq\left\langle\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right],\Pi_{\mathcal{NC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}h(x)\right]\right\rangle.

Notice that, by the definition of the tangent and normal cones, on the second line, both the left-hand side and the right-hand side specify quantities which are non-positive. Therefore, it is sufficient to show that the right-hand side equals 0 for almost every τ[0,τ¯]\tau\in[0,\overline{\tau}].

For the case of the hypercube, we may compute projections coordinate-wise. So suppose that at time τ\tau, Π𝒯𝒞Xi(xi(τ))[iui(x(τ))]jΠ𝒩𝒞Xi(xi(τ))[ih(x(τ))]j<0\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\tau))\right]_{j}\cdot\Pi_{\mathcal{NC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}h(x(\tau))\right]_{j}<0. This can only be when |Π𝒯𝒞Xi(xi(τ))[iui(x(τ))]j|>0|\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\tau))\right]_{j}|>0, hence for some γ(τ)>0\gamma(\tau)>0, on the interval τ(τ,τ+γ(τ))\tau^{\prime}\in(\tau,\tau+\gamma(\tau)), no hypercube constraint for coordinate jj binds. As a consequence, for τ(τ,τ+γ(τ))\tau^{\prime}\in(\tau,\tau+\gamma(\tau)), we have Π𝒩𝒞Xi(xi(τ))[ih(x(τ))]j=0\Pi_{\mathcal{NC}_{X_{i}}\left(x_{i}(\tau^{\prime})\right)}\left[\nabla_{i}h(x(\tau^{\prime}))\right]_{j}=0. Now, denote by T¯ij\overline{T}_{ij} the set of τ\tau at which we incur a projection loss, Π𝒯𝒞Xi(xi(τ))[iui(x(τ))]jΠ𝒩𝒞Xi(xi(τ))[ih(x(τ))]j<0\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\tau))\right]_{j}\cdot\Pi_{\mathcal{NC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}h(x(\tau))\right]_{j}<0. Then τT¯ijγ(τ)<τ¯\sum_{\tau\in\overline{T}_{ij}}\gamma(\tau)<\overline{\tau}, as the intervals (τ,τ+γ(τ))(\tau,\tau+\gamma(\tau)) are pairwise disjoint. This in turn implies that T¯ij\overline{T}_{ij} is countable, as any sum of uncountably many strictly positive numbers is unbounded. Therefore, iN,jDiT¯ij\cup_{i\in N,j\in D_{i}}\overline{T}_{ij} is countable. This is a superset (with potential equality) of the set of τ[0,τ¯]\tau\in[0,\overline{\tau}] for which (7) fails to hold for some player ii, which implies our desired result.
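The sign fact underpinning this argument, namely that the inner product of a tangent-cone projection with a normal-cone projection is non-positive by polarity of the two cones, can be verified numerically for the hypercube. A small sketch of our own:

```python
import random

def split(x, y, lo=0.0, hi=1.0):
    # Coordinate-wise Moreau split of y at x for the box: (tangent, normal) parts.
    t, n = [], []
    for xj, yj in zip(x, y):
        if xj <= lo:
            tj = max(yj, 0.0)
        elif xj >= hi:
            tj = min(yj, 0.0)
        else:
            tj = yj
        t.append(tj)
        n.append(yj - tj)
    return t, n

random.seed(1)
for _ in range(1000):
    D = 3
    # Sample points with some coordinates pinned to a face of the cube.
    x = [random.choice([0.0, 1.0, random.random()]) for _ in range(D)]
    a = [random.gauss(0, 1) for _ in range(D)]   # stand-in for grad_i u_i
    b = [random.gauss(0, 1) for _ in range(D)]   # stand-in for grad_i h
    ta, na = split(x, a)
    tb, nb = split(x, b)
    # <Pi_TC[b], Pi_NC[a]> <= 0 and <Pi_TC[a], Pi_NC[b]> <= 0: each coordinate
    # product pairs a non-negative with a non-positive entry (or a zero).
    assert sum(p * q for p, q in zip(tb, na)) <= 1e-12
    assert sum(p * q for p, q in zip(ta, nb)) <= 1e-12
```

For the box the inequality even holds coordinate-by-coordinate, which is exactly what the measure-zero argument above exploits.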

3.1 Approximate projected gradient ascent

The arguments presented in Section 3 thus far form the basis of our proofs of tractable approximability of local or stationary CCE. In particular, we shall see that, after proving the approximability of a stationary CCE, the approximability of a local CCE will follow by arguing that the zero-measure property above is maintained for the analogue of (7).

There is yet a hiccup though; most learning algorithms in practice take discrete steps. Whereas [33] (Theorem 3) implies that a discrete analogue of (5) would approximate the continuous dynamics uniformly over an interval [0,T][0,T] for sufficiently small step sizes, their result does not come with approximation guarantees. And besides, our formulation of the gradient dynamics of a game is general enough that we cannot yet rule out chaotic behaviour (as in e.g. [68]).

Thus we want to avoid any attempt at a uniform approximation of a solution of (5). Instead, we shall consider the case when each player uses projected gradient ascent with equal, adaptive learning rates as their learning algorithm. We shall then extend this to a piecewise-defined continuous curve as an approximation of the underlying projected gradient dynamics. This parametrised curve might actually diverge from actual solutions of (5), and would do so significantly in the presence of chaos. But we shall see in the following subsections that this does not end up mattering for the purposes of vanishing regret bounds. Moreover, our discussion in Section 5 will show that, for the purposes of bounding expectations of quantities, regret bounds for the continuous curve are “as-if” they were for the corresponding weighted average distribution of play. All in all, our discrete-time learning algorithm and the resulting continuous curve of interest are defined as follows:

Definition 8.

Given a sequence of decreasing step sizes ηt>0\eta_{t}>0 such that t=0ηt=\sum_{t=0}^{\infty}\eta_{t}=\infty and t=0Tηt2=o(t=0Tηt)\sum_{t=0}^{T}\eta_{t}^{2}=o\left(\sum_{t=0}^{T}\eta_{t}\right), projected gradient ascent with equal, adaptive learning rates has each player iNi\in N play xi0x_{i}^{0} at time t=0t=0, and update their strategies

xit+1=ΠXi[xit+ηtiui(xt)].x_{i}^{t+1}=\Pi_{X_{i}}\left[x_{i}^{t}+\eta_{t}\nabla_{i}u_{i}(x^{t})\right].

After T>0T>0 time steps, the learning dynamics outputs a history (xt)t=0T(x^{t})_{t=0}^{T} of play. Let μt>0\mu_{t}>0 be any sequence of weights. We proceed by defining via this history an approximate, μ\mu-accelerated projected gradient dynamics for the game, via extending the history in a piecewise fashion to a curve x(τ):[0,τ¯]Xx(\tau):\left[0,\overline{\tau}\right]\rightarrow X where τ¯=t=0T1μtηt\overline{\tau}=\sum_{t=0}^{T-1}\mu_{t}\cdot\eta_{t}. Towards this end, if there exists tt^{\prime}\in\mathbb{N} such that τ=t=0t1μtηt\tau=\sum_{t=0}^{t^{\prime}-1}\mu_{t}\cdot\eta_{t}, then x(τ)=xtx(\tau)=x^{t^{\prime}}. Otherwise, for τ[0,τ¯]\tau\in[0,\overline{\tau}], let τ¯=max{t=0t1μtηt|t,t=0t1μtηtτ}\underline{\tau}=\max\left\{\sum_{t=0}^{t^{\prime}-1}\mu_{t}\cdot\eta_{t}|t^{\prime}\in\mathbb{N},\sum_{t=0}^{t^{\prime}-1}\mu_{t}\cdot\eta_{t}\leq\tau\right\}, and fix

xi(τ)=ΠXi[xi(τ¯)+1μt(ττ¯)iui(x(τ¯))]iN.x_{i}(\tau)=\Pi_{X_{i}}\left[x_{i}(\underline{\tau})+\frac{1}{\mu_{t^{\prime}}}\cdot\left(\tau-\underline{\tau}\right)\cdot\nabla_{i}u_{i}(x(\underline{\tau}))\right]\ \forall\ i\in N.

We will then denote ητηt\eta_{\tau}\equiv\eta_{t^{\prime}} and μτμt\mu_{\tau}\equiv\mu_{t^{\prime}} for the tt^{\prime} which determines τ¯\underline{\tau}.

Our notion of an approximate projected gradient dynamics is obtained, in a sense, by connecting the “dots” traced by projected gradient ascent, where the time experienced by the curve between time steps tt and t+1t+1 equals μtηt\mu_{t}\cdot\eta_{t}. We will focus on two choices of μ\mu; the weights μt=1/ηt\mu_{t}=1/\eta_{t} will allow us to track regret when we consider the smoothed time-average distribution, whereas the weights μt=1\mu_{t}=1 will quantify results for the smoothed trajectory, which remains faithful to the “proper time” experienced by the curve. This is certainly not the only way in which one could obtain such a continuous curve – one could, for instance, consider instead the piecewise linear extension. However, as shall become apparent in the subsequent sections, our proofs of approximability depend crucially on whether the velocity of the curve is sufficiently close to the tangent cone projection of the utility gradient (scaled by 1/μt1/\mu_{t}), and our definition ensures this for the two classes of compact and convex action sets we consider.
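A minimal sketch of Definition 8 follows (our own; the game instance and the choices η_t = 1/√(t+1), μ_t = 1 are illustrative assumptions, with this step-size sequence satisfying the two summability conditions). It runs the discrete ascent and checks that the extrapolated curve passes through the discrete iterates at the knot times:

```python
import math

def clip(z, lo=0.0, hi=1.0):
    return [min(max(zj, lo), hi) for zj in z]

# Illustrative single-player instance on X = [0,1]^2 (our own choice).
c = (1.7, 0.5)
grad_u = lambda x: [cj - xj for cj, xj in zip(c, x)]

T = 50
eta = [1.0 / math.sqrt(t + 1) for t in range(T)]   # sum eta = inf, sum eta^2 = o(sum eta)
mu = [1.0] * T                                      # "proper time" weighting

# Projected gradient ascent (Definition 8): x^{t+1} = Pi_X[x^t + eta_t grad u(x^t)].
hist = [[0.2, 0.9]]
for t in range(T):
    x = hist[-1]
    hist.append(clip([xj + eta[t] * gj for xj, gj in zip(x, grad_u(x))]))

# Knot times of the continuous curve: tau_t = sum_{s < t} mu_s * eta_s.
knots = [0.0]
for t in range(T):
    knots.append(knots[-1] + mu[t] * eta[t])

def curve(tau):
    # Piecewise extrapolation: from the last knot, move along the frozen
    # gradient (scaled by 1/mu) and project; at knot times this reproduces
    # the discrete iterates.
    t = max(s for s in range(T + 1) if knots[s] <= tau + 1e-12)
    t = min(t, T - 1)
    x = hist[t]
    step = (tau - knots[t]) / mu[t]
    return clip([xj + step * gj for xj, gj in zip(x, grad_u(x))])

# Consistency check: the curve interpolates the history at every knot.
for t in range(T):
    assert max(abs(a - b) for a, b in zip(curve(knots[t + 1]), hist[t + 1])) < 1e-9
```

Evaluating the curve between knots gives the projected extrapolation that the subsequent analysis compares against the continuous-time dynamics.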

To obtain regret bounds given the weights μt\mu_{t}, we will use the following lemma throughout, which follows standard arguments (e.g. Theorem 2.3, [22]) to bound “almost” telescoping sums; but for bounds relating to stationary CCE, we need our bound to be double-sided.

Lemma 3.1.

Let XX be a compact & convex set, (xt)t=0(x^{t})_{t=0}^{\infty} be an arbitrary sequence in XX, and h:Xh:X\rightarrow\mathbb{R} a function with bounded, Lipschitz gradients. Then for any positive sequence (μt)t=0(\mu_{t})_{t=0}^{\infty} which is non-decreasing and any integer T>0T>0,

|t=0T1μt(h(xt+1)h(xt))|2d(X)Gh(μT1+μ0).\left|\sum_{t=0}^{T-1}\mu_{t}\left(h(x^{t+1})-h(x^{t})\right)\right|\leq 2d(X)\cdot G_{h}\cdot(\mu_{T-1}+\mu_{0}).
Proof.

Follows from simple rearrangement of the sum,

t=0T1μt(h(xt+1)h(xt))\displaystyle\sum_{t=0}^{T-1}\mu_{t}\left(h(x^{t+1})-h(x^{t})\right) =μ0h(x0)+μT1h(xT)+t=0T2h(xt+1)(μtμt+1)\displaystyle=-\mu_{0}h(x^{0})+\mu_{T-1}h(x^{T})+\sum_{t=0}^{T-2}h(x^{t+1})\cdot(\mu_{t}-\mu_{t+1})
|t=0T1μt(h(xt+1)h(xt))|\displaystyle\Rightarrow\left|\sum_{t=0}^{T-1}\mu_{t}\left(h(x^{t+1})-h(x^{t})\right)\right| =|μ0h(x0)+μT1h(xT)+t=0T2h(xt+1)(μtμt+1)|\displaystyle=\left|-\mu_{0}h(x^{0})+\mu_{T-1}h(x^{T})+\sum_{t=0}^{T-2}h(x^{t+1})\cdot(\mu_{t}-\mu_{t+1})\right|
μ0|h(x0)|+μT1|h(xT)|+|t=0T2h(xt+1)(μtμt+1)|.\displaystyle\leq\mu_{0}|h(x^{0})|+\mu_{T-1}|h(x^{T})|+\left|\sum_{t=0}^{T-2}h(x^{t+1})\cdot(\mu_{t}-\mu_{t+1})\right|.

Of course, the empty sum has value 0 when T=1T=1. If T>1T>1, we note that since μt\mu_{t} is non-decreasing, μtμt+10\mu_{t}-\mu_{t+1}\leq 0 for any tt. Moreover, since the sum in question is invariant under adding a constant to hh, we may assume without loss of generality that hh vanishes somewhere on XX, whence |h(x)|d(X)Gh|h(x)|\leq d(X)\cdot G_{h} for every xXx\in X by the bound on its gradient. We therefore obtain an upper bound when all h(xt+1)h(x^{t+1}), as well as h(x0)h(x^{0}) and h(xT)h(x^{T}), are fixed to this bound in absolute value. ∎
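The bound of Lemma 3.1 can be checked numerically. In the sketch below (our own; the choices of h, weights, and sequence are illustrative assumptions) we take h(x) = sin(x_1 + x_2), which vanishes at the origin of [0,1]^2 so that |h| ≤ d(X)·G_h holds without renormalisation:

```python
import math, random

random.seed(2)
D, T = 2, 200
d_X = math.sqrt(D)                     # diameter of [0,1]^D
h = lambda x: math.sin(x[0] + x[1])    # h(0) = 0, so |h| <= d(X) * G_h on X
G_h = math.sqrt(2)                     # ||grad h|| = sqrt(2) |cos(x1 + x2)| <= sqrt(2)

# Arbitrary sequence in X and non-decreasing positive weights, per the lemma.
xs = [[random.random() for _ in range(D)] for _ in range(T + 1)]
mu = [math.sqrt(t + 1) for t in range(T)]

lhs = abs(sum(mu[t] * (h(xs[t + 1]) - h(xs[t])) for t in range(T)))
rhs = 2 * d_X * G_h * (mu[T - 1] + mu[0])
assert lhs <= rhs + 1e-12
```

The weighted telescoping sum stays well within the bound even for an adversarially arbitrary (here random) sequence, as the lemma requires no structure on (x^t) beyond membership in X.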

3.2 Compact & convex action sets of smooth boundary

Our first approximability result is in the setting in which the boundary of each XiX_{i} is a regular hypersurface in Di\mathbb{R}^{D_{i}}, where all of its principal curvatures are bounded by some K>0K>0. For a reader without a background in differential geometry, we provide a very brief account of the facts on hypersurfaces in Euclidean space that we will need for our analysis.

Definition 9 (Follows from [53] Proposition 5.16, mentioned in e.g. [54]).

A regular hypersurface in D\mathbb{R}^{D} is a set SS such that for every point pSp\in S, there exists an open set UpU\ni p and a smooth function F:UF:U\rightarrow\mathbb{R}, such that SU=F1(0)S\cap U=F^{-1}(0) and F(p)0\nabla F(p)\neq 0.

Namely, a regular hypersurface SS is at each point defined locally via an equation F(x)=0F(x)=0 for some function FF. The plane tangent to SS at pp is given by the equation xTF(p)=pTF(p)x^{T}\nabla F(p)=p^{T}\nabla F(p). Working with a basis (e)1D(e_{\ell})_{1\leq\ell\leq D} in D\mathbb{R}^{D} such that F(p)\nabla F(p) has only its DD’th coordinate non-zero, F(x)F(x) admits a Taylor expansion

F(x)=(xDpD)F(p)+12,k=1D2F(p)xxk(xp)(xkpk)+F(x)=(x_{D}-p_{D})\cdot\|\nabla F(p)\|+\frac{1}{2}\cdot\sum_{\ell,k=1}^{D}\frac{\partial^{2}F(p)}{\partial x_{\ell}\partial x_{k}}\cdot(x_{\ell}-p_{\ell})(x_{k}-p_{k})+\ldots

The implicit function theorem allows us to write x_{D} as a function of the remaining coordinates x_{1},\ldots,x_{D-1} in a neighbourhood of p on S. This dependence is dominated by the quadratic terms in x_{1},\ldots,x_{D-1}; thus for some (D-1)\times(D-1) symmetric matrix A,

x_{D}-p_{D}=\sum_{\ell,k=1}^{D-1}\frac{1}{2}\cdot A_{\ell k}\cdot(x_{\ell}-p_{\ell})(x_{k}-p_{k})+o(\|x-p\|^{2}).

This matrix AA is called the (scalar) second fundamental form. As AA is symmetric, it has an orthonormal basis of eigenvectors with real eigenvalues. Each such eigenvector corresponds to a principal direction, and the corresponding eigenvalue is the principal curvature in that direction.

Thus, if the boundary of each action set X_{i} is a regular hypersurface of bounded principal curvature, then about each of its points the boundary of X_{i} is approximated by the level set of a quadratic function. We also remark that the convexity of X_{i} implies that if \nabla F(p)\in\mathcal{NC}_{X_{i}}(p), then all principal curvatures are bounded below by 0. With this in mind, we proceed with our proofs of approximability. The first step of the analysis comprises bounding the difference between \frac{dx_{i}(\tau)}{d\tau} and the rescaled tangent cone projection \frac{1}{\mu_{\tau}}\cdot\Pi_{\mathcal{TC}_{X_{i}}(x_{i}(\tau))}[\nabla_{i}u_{i}(x(\underline{\tau}))]. This turns out to be a straightforward matter in this setting, thanks to the following observation.
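To make the second fundamental form concrete, the sketch below computes it numerically for the sphere of radius R, i.e. the level set F(x)=\|x\|^{2}-R^{2}=0, recovering the constant principal curvature 1/R. The helper `principal_curvatures` is our own illustration, under the orientation convention that \nabla F points outward; it is not notation from the text.

```python
import numpy as np

def principal_curvatures(grad, hess):
    """Eigenvalues of the scalar second fundamental form of the level set
    F(x) = 0, given the gradient and Hessian of F at a point, with the
    convention that `grad` points outward."""
    n = grad / np.linalg.norm(grad)
    # Rows 2..D of Vt form an orthonormal basis of the tangent space (null space of n).
    _, _, Vt = np.linalg.svd(n[None, :])
    tangent = Vt[1:]
    A = tangent @ hess @ tangent.T / np.linalg.norm(grad)
    return np.linalg.eigvalsh(A)

# Sphere of radius R: F(x) = ||x||^2 - R^2, with grad F = 2x and Hess F = 2I.
R = 2.0
p = np.array([0.0, 0.0, R])                     # a boundary point
curvatures = principal_curvatures(2.0 * p, 2.0 * np.eye(3))
```

Both principal curvatures come out equal to 1/R, in line with the discussion above.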

Proposition 3.2.

Let X_{i} be a compact and convex set with smooth boundary, and suppose its boundary \delta X_{i} has non-negative principal curvatures, each at most K. Then for any \tau\in[0,\overline{\tau}],

dxi(τ)dτ1μτΠ𝒯𝒞Xi(xi(τ))[iui(x(τ¯))]KGi(ττ¯)/μτ1+KGi(ττ¯)/μτ1μτΠ𝒯𝒞Xi(xi(τ))[iui(x(τ¯))].\left\|\frac{dx_{i}(\tau)}{d\tau}-\frac{1}{\mu_{\tau}}\cdot\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\underline{\tau}))\right]\right\|\leq\frac{KG_{i}\cdot(\tau-\underline{\tau})/\mu_{\tau}}{1+KG_{i}\cdot(\tau-\underline{\tau})/\mu_{\tau}}\cdot\frac{1}{\mu_{\tau}}\cdot\left\|\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\underline{\tau}))\right]\right\|.
Proof.

Fix \tau\in[0,\overline{\tau}] arbitrarily. We shall work in coordinates where x_{i}(\tau) is at the origin (i.e. x_{i}(\tau)=0, which can be assured by a translation; this spares us from carrying -x_{i}(\tau) terms in \Delta x_{i}(\tau) and in the difference between x_{i}(\tau) and its pre-projection point, and makes the minimisation problem below easier to read), and

Δxi(τ)xi(τ¯)+(ττ¯)1μτiui(x(τ¯))=he1\Delta x_{i}(\tau)\equiv x_{i}(\underline{\tau})+(\tau-\underline{\tau})\cdot\frac{1}{\mu_{\tau}}\cdot\nabla_{i}u_{i}(x(\underline{\tau}))=h\cdot e_{1}

for some h\geq 0, where e_{2},e_{3},\ldots,e_{d_{i}} are principal directions of curvature for \delta X_{i} at x_{i}(\tau)=0. In this case, by smoothness of the boundary, in a small neighbourhood of x_{i}(\tau) the set X_{i} is well-approximated (up to second order in (w_{\ell})_{\ell=2}^{d_{i}}) by the convex body,

X~i={wdi|w1=2di12kw2},\tilde{X}_{i}=\left\{w\in\mathbb{R}^{d_{i}}\ |\ w_{1}\leq\sum_{\ell=2}^{d_{i}}-\frac{1}{2}k_{\ell}w_{\ell}^{2}\right\},

where kk_{\ell} is the (principal) curvature in direction ee_{\ell} for the surface δXi\delta X_{i} at point xi(τ)x_{i}(\tau). Note that by assumption, kKk_{\ell}\leq K for every 2di2\leq\ell\leq d_{i}. As a consequence, at time τ+ϵ\tau+\epsilon for small ϵ>0\epsilon>0, xi(τ+ϵ)x_{i}(\tau+\epsilon) and the solution to the projection problem

minw(w1hϵg1/μτ)2+=2di(wϵg/μτ)2 subject to w1=2di12kw2\displaystyle\min_{w}\quad(w_{1}-h-\epsilon g_{1}/\mu_{\tau})^{2}+\sum_{\ell=2}^{d_{i}}(w_{\ell}-\epsilon g_{\ell}/\mu_{\tau})^{2}\textnormal{ subject to }w_{1}\leq\sum_{\ell=2}^{d_{i}}-\frac{1}{2}k_{\ell}w_{\ell}^{2}

agree up to first-order terms in ϵ\epsilon. At any solution, the constraint will bind, leading us to the unconstrained optimisation problem

minw(hϵg1μτ=2di12kw2)2+=2di(wϵgμτ)2.\displaystyle\min_{w}\quad\left(-h-\epsilon\cdot\frac{g_{1}}{\mu_{\tau}}-\sum_{\ell=2}^{d_{i}}\frac{1}{2}k_{\ell}w_{\ell}^{2}\right)^{2}+\sum_{\ell=2}^{d_{i}}\left(w_{\ell}-\epsilon\cdot\frac{g_{\ell}}{\mu_{\tau}}\right)^{2}.

Now, the first-order optimality conditions are

\forall\ 2\leq\ell\leq d_{i},\quad k_{\ell}w_{\ell}\left(h+\epsilon\cdot\frac{g_{1}}{\mu_{\tau}}+\sum_{\ell^{\prime}=2}^{d_{i}}\frac{1}{2}k_{\ell^{\prime}}w_{\ell^{\prime}}^{2}\right)+w_{\ell}-\epsilon\cdot\frac{g_{\ell}}{\mu_{\tau}}=0.

For sufficiently small ϵ>0\epsilon>0, at an optimal solution w=O(ϵ)w_{\ell}=O(\epsilon) for each 2\ell\geq 2 and =2dikw2=O(ϵ2)\sum_{\ell=2}^{d_{i}}k_{\ell}w_{\ell}^{2}=O(\epsilon^{2}). Therefore, up to first-order in ϵ\epsilon, for each 2\ell\geq 2,

w=ϵg/μτ1+kh.w_{\ell}=\frac{\epsilon g_{\ell}/\mu_{\tau}}{1+k_{\ell}h}.

This in turn implies that

dxi(τ)dτ={0=1,g/μτ1+kh1.\frac{dx_{i\ell}(\tau)}{d\tau}=\begin{cases}0&\ell=1,\\ \frac{g_{\ell}/\mu_{\tau}}{1+k_{\ell}h}&\ell\neq 1.\end{cases}

Therefore, the difference between the velocity of motion and the μτ\mu_{\tau} scaled projection to the tangent cone of the utility gradient is, for any principal direction \ell,

(dxi(τ)dτ1μτΠ𝒯𝒞Xi(xi(τ))[iui(x(τ¯))])=gμτ(kh1+kh).\left(\frac{dx_{i}(\tau)}{d\tau}-\frac{1}{\mu_{\tau}}\cdot\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\underline{\tau}))\right]\right)_{\ell}=-\frac{g_{\ell}}{\mu_{\tau}}\cdot\left(\frac{k_{\ell}h}{1+k_{\ell}h}\right).

By the bound on the magnitude of iui(x(τ¯))\nabla_{i}u_{i}(x(\underline{\tau})) and the feasibility of xi(τ¯)x_{i}(\underline{\tau}), hGi(ττ¯)/μτh\leq G_{i}\cdot(\tau-\underline{\tau})/\mu_{\tau}. In turn, kh1+khKh1+Kh\frac{k_{\ell}h}{1+k_{\ell}h}\leq\frac{Kh}{1+Kh}, by which we have the desired result. ∎
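The shrinkage factor 1/(1+k_{\ell}h) appearing in the proof can be checked in closed form on a Euclidean ball of radius R, whose boundary has constant curvature k=1/R: overshoot the boundary by h along the outward normal, perturb by a small \epsilon along a tangent direction, and project back onto the ball. The numbers below are an illustrative choice, not from the text.

```python
import numpy as np

R = 3.0                          # ball radius; boundary curvature k = 1/R
k = 1.0 / R
h, eps = 0.5, 1e-3               # normal overshoot and tangential step
x = np.array([R + h, eps])       # point to be projected back onto the ball

projection = R * x / np.linalg.norm(x)    # projection onto {y : ||y|| <= R}
tangential = projection[1]                # surviving tangential component

predicted = eps / (1.0 + k * h)           # w_l = eps * g_l / (1 + k_l h), here with g_l = 1
```

The agreement is exact up to O(\epsilon^{3}) terms, matching the first-order analysis above.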

In particular, our bound on the curvature loss from Proposition 3.2 shows that at each step t, the cumulative magnitude of the difference between the velocity of the curve induced by the true projected gradient dynamics of the game and that of the approximate projected gradient dynamics is of order O(\eta_{t}^{2}), no matter how the term \mu_{t} adjusts the velocity of the curve.

Lemma 3.3.

Whenever X_{i} is a compact and convex set with smooth boundary of principal curvature bounded by K, approximate \mu-accelerated gradient dynamics satisfies, for every time step t with \tau(t)=\sum_{t^{\prime}=0}^{t-1}\eta_{t^{\prime}}\cdot\mu_{t^{\prime}},

\int_{\tau(t)}^{\tau(t)+\mu_{t}\eta_{t}}d\tau\cdot\left|\left\langle\nabla_{i}h(x(\tau)),\frac{1}{\mu_{t}}\cdot\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\tau))\right]-\frac{dx_{i}(\tau)}{d\tau}\right\rangle\right|\leq\frac{1}{2}\cdot\eta_{t}^{2}G_{h}\left(KG_{i}^{2}+L_{i}\sum_{j\in N}G_{j}\right),

independent of the choice of μt>0\mu_{t}>0.

Proof.

To see the result, we shall write, for any τ[τ¯,τ¯+μtηt]\tau\in[\underline{\tau},\underline{\tau}+\mu_{t}\eta_{t}], noting that μt=μτ\mu_{t}=\mu_{\tau} and ηt=ητ\eta_{t}=\eta_{\tau},

ih(x(τ)),1μtΠ𝒯𝒞Xi(xi(τ))[iui(x(τ))]dxi(τ)dτ\displaystyle\left\langle\nabla_{i}h(x(\tau)),\frac{1}{\mu_{t}}\cdot\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\tau))\right]-\frac{dx_{i}(\tau)}{d\tau}\right\rangle
=\displaystyle=\ 1μtih(x(τ)),Π𝒯𝒞Xi(xi(τ))[iui(x(τ))]Π𝒯𝒞Xi(xi(τ))[iui(x(τ¯))]\displaystyle\frac{1}{\mu_{t}}\cdot\left\langle\nabla_{i}h(x(\tau)),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\tau))\right]-\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\underline{\tau}))\right]\right\rangle (8)
+\displaystyle+\ ih(x(τ)),1μtΠ𝒯𝒞Xi(xi(τ))[iui(x(τ¯))]dxi(τ)dτ.\displaystyle\left\langle\nabla_{i}h(x(\tau)),\frac{1}{\mu_{t}}\cdot\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\underline{\tau}))\right]-\frac{dx_{i}(\tau)}{d\tau}\right\rangle. (9)

We then proceed by taking the absolute value of both sides and applying the triangle inequality. By the Cauchy-Schwarz inequality, the magnitude of (8) is bounded as

1μt|ih(x(τ)),Π𝒯𝒞Xi(xi(τ))[iui(x(τ))]Π𝒯𝒞Xi(xi(τ))[iui(x(τ¯))]|\displaystyle\frac{1}{\mu_{t}}\cdot\left|\left\langle\nabla_{i}h(x(\tau)),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\tau))\right]-\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\underline{\tau}))\right]\right\rangle\right|
\displaystyle\leq\ 1μtih(x(τ))Π𝒯𝒞Xi(xi(τ))[iui(x(τ))]Π𝒯𝒞Xi(xi(τ))[iui(x(τ¯))]\displaystyle\frac{1}{\mu_{t}}\cdot\|\nabla_{i}h(x(\tau))\|\|\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\tau))\right]-\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\underline{\tau}))\right]\|
\displaystyle\leq\ 1μtih(x(τ))iui(x(τ))iui(x(τ¯))1μt2(ττ¯)GhLijNGj.\displaystyle\frac{1}{\mu_{t}}\cdot\|\nabla_{i}h(x(\tau))\|\|\nabla_{i}u_{i}(x(\tau))-\nabla_{i}u_{i}(x(\underline{\tau}))\|\leq\frac{1}{\mu_{t}^{2}}\cdot(\tau-\underline{\tau})G_{h}L_{i}\sum_{j\in N}G_{j}.

Here, the very last inequality uses the bounds on the magnitude of the gradients and the Lipschitz coefficient of uiu_{i}, that x(τ)x(τ¯)(1/μt)(ττ¯)jNGj\|x(\tau)-x(\underline{\tau})\|\leq(1/\mu_{t})\cdot(\tau-\underline{\tau})\cdot\sum_{j\in N}G_{j}, and that the projection operators onto closed convex sets in Euclidean space are non-expansive. Meanwhile, the second term (9) satisfies, by Proposition 3.2 and the Cauchy-Schwarz inequality,

|ih(x(τ)),1μtΠ𝒯𝒞Xi(xi(τ))[iui(x(τ¯))]dxi(τ)dτ|\displaystyle\left|\left\langle\nabla_{i}h(x(\tau)),\frac{1}{\mu_{t}}\cdot\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\underline{\tau}))\right]-\frac{dx_{i}(\tau)}{d\tau}\right\rangle\right|
\displaystyle\leq\ 1μtih(x(τ))iui(x(τ¯))KGi(ττ¯)/μt1+KGi(ττ¯)/μt\displaystyle\frac{1}{\mu_{t}}\cdot\|\nabla_{i}h(x(\tau))\|\|\nabla_{i}u_{i}(x(\underline{\tau}))\|\cdot\frac{KG_{i}(\tau-\underline{\tau})/\mu_{t}}{1+KG_{i}(\tau-\underline{\tau})/\mu_{t}}
\displaystyle\leq\ 1μt2GhKGi2(ττ¯).\displaystyle\frac{1}{\mu_{t}^{2}}\cdot G_{h}KG_{i}^{2}(\tau-\underline{\tau}).

Combining the above bounds, we have

τ(t)τ(t)+μtηt𝑑τ|ih(x(τ)),1μtΠ𝒯𝒞Xi(xi(τ))[iui(x(τ))]dxi(τ)dτ|\displaystyle\int_{\tau(t)}^{\tau(t)+\mu_{t}\eta_{t}}d\tau\cdot\left|\left\langle\nabla_{i}h(x(\tau)),\frac{1}{\mu_{t}}\cdot\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\tau))\right]-\frac{dx_{i}(\tau)}{d\tau}\right\rangle\right|
\displaystyle\leq\ τ(t)τ(t)+μtηt𝑑τ1μt2(ττ¯)Gh(KGi2+LijNGj)\displaystyle\int_{\tau(t)}^{\tau(t)+\mu_{t}\eta_{t}}d\tau\cdot\frac{1}{\mu_{t}^{2}}\cdot(\tau-\underline{\tau})\cdot G_{h}\left(KG_{i}^{2}+L_{i}\sum_{j\in N}G_{j}\right)
=\displaystyle=\ 1μt20μtηt𝑑ττGh(KGi2+LijNGj)\displaystyle\frac{1}{\mu_{t}^{2}}\cdot\int_{0}^{\mu_{t}\eta_{t}}d\tau\cdot\tau\cdot G_{h}\left(KG_{i}^{2}+L_{i}\sum_{j\in N}G_{j}\right)
=\displaystyle=\ 12ηt2Gh(KGi2+LijNGj).\displaystyle\frac{1}{2}\cdot\eta_{t}^{2}G_{h}\left(KG_{i}^{2}+L_{i}\sum_{j\in N}G_{j}\right).

Combining Lemmata 3.1 and 3.3 then yields approximation bounds for an \epsilon-stationary CCE.

Theorem 3.4.

Whenever all XiX_{i} are compact and convex sets with smooth boundary of bounded principal curvature KK, the approximate μ\mu-accelerated projected gradient dynamics satisfies,

1τ¯|iN0τ¯𝑑τih(x(τ)),Π𝒯𝒞Xi(xi(τ))[iui(x(τ))]|\displaystyle\frac{1}{\overline{\tau}}\cdot\left|\sum_{i\in N}\int_{0}^{\overline{\tau}}d\tau\cdot\left\langle\nabla_{i}h(x(\tau)),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\tau))\right]\right\rangle\right| (10)
\displaystyle\leq\ 2d(X)Gh(μT1+μ0)τ¯+12τ¯(t=0T1ηt2μt)GhiN(KGi2+LijNGj).\displaystyle\frac{2d(X)G_{h}(\mu_{T-1}+\mu_{0})}{\overline{\tau}}+\frac{1}{2\overline{\tau}}\cdot\left(\sum_{t=0}^{T-1}\eta_{t}^{2}\mu_{t}\right)\cdot G_{h}\sum_{i\in N}\left(KG_{i}^{2}+L_{i}\sum_{j\in N}G_{j}\right).

As a consequence, for step sizes ηt=C/t+1\eta_{t}=C/\sqrt{t+1}, the approximate μ\mu-accelerated projected gradient dynamics yields, via sampling x(τ)x(\tau) where τU[0,τ¯]\tau\sim U[0,\overline{\tau}], an ϵ\epsilon-stationary CCE where

  1. for the choice \mu_{t}=1, \epsilon\simeq\frac{1}{\sqrt{T}}\left(\frac{2d(X)}{C}+\frac{C\ln(T)}{4}\right), and

  2. for the choice \mu_{t}=1/\eta_{t}, \epsilon\simeq\frac{1}{\sqrt{T}}\left(\frac{4d(X)}{C}+C\right),

with respect to the set of all functions h:X\rightarrow\mathbb{R} with Lipschitz gradients. In this case, in Definition 7 of a stationary CCE, we may take \textnormal{poly}(\vec{G},\vec{L},G_{h},L_{h})=1+\sum_{i\in N}G_{h}\cdot\left(KG_{i}^{2}+L_{i}\sum_{j\in N}G_{j}\right).

Proof.

Denote \tau(t)=\sum_{t^{\prime}=0}^{t-1}\mu_{t^{\prime}}\eta_{t^{\prime}}. We then obtain the final approximation bound via the chain of estimates,

1τ¯|0τ¯𝑑τiNih(x(τ)),Π𝒯𝒞Xi(xi(τ))[iui(x(τ))]|\displaystyle\frac{1}{\overline{\tau}}\cdot\left|\int_{0}^{\overline{\tau}}d\tau\cdot\sum_{i\in N}\left\langle\nabla_{i}h(x(\tau)),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\tau))\right]\right\rangle\right|
=\displaystyle=\ 1τ¯|0τ¯𝑑τμτiNih(x(τ)),1μτΠ𝒯𝒞Xi(xi(τ))[iui(x(τ))]|\displaystyle\frac{1}{\overline{\tau}}\cdot\left|\int_{0}^{\overline{\tau}}d\tau\cdot\mu_{\tau}\cdot\sum_{i\in N}\left\langle\nabla_{i}h(x(\tau)),\frac{1}{\mu_{\tau}}\cdot\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\tau))\right]\right\rangle\right|
\displaystyle\leq\ 1τ¯|iN0τ¯𝑑τμτih(x(τ)),dxi(τ)dτ|\displaystyle\frac{1}{\overline{\tau}}\cdot\left|\sum_{i\in N}\int_{0}^{\overline{\tau}}d\tau\cdot\mu_{\tau}\cdot\left\langle\nabla_{i}h(x(\tau)),\frac{dx_{i}(\tau)}{d\tau}\right\rangle\right|
+\displaystyle+\ 1τ¯iN0τ¯𝑑τμτ|ih(x(τ)),1μτΠ𝒯𝒞Xi(xi(τ))[iui(x(τ))]dxi(τ)dτ|\displaystyle\frac{1}{\overline{\tau}}\cdot\sum_{i\in N}\int_{0}^{\overline{\tau}}d\tau\cdot\mu_{\tau}\cdot\left|\left\langle\nabla_{i}h(x(\tau)),\frac{1}{\mu_{\tau}}\cdot\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\tau))\right]-\frac{dx_{i}(\tau)}{d\tau}\right\rangle\right|
\displaystyle\leq\ 1τ¯|t=0T1τ(t)τ(t)+μtηt𝑑τμtdh(x(τ))dτ|+1τ¯12(t=0T1ηt2μt)iNGh(KGi2+LijNGj)\displaystyle\frac{1}{\overline{\tau}}\cdot\left|\sum_{t=0}^{T-1}\int_{\tau(t)}^{\tau(t)+\mu_{t}\eta_{t}}d\tau\cdot\mu_{t}\cdot\frac{dh(x(\tau))}{d\tau}\right|+\frac{1}{\overline{\tau}}\cdot\frac{1}{2}\left(\sum_{t=0}^{T-1}\eta_{t}^{2}\mu_{t}\right)\sum_{i\in N}G_{h}\left(KG_{i}^{2}+L_{i}\sum_{j\in N}G_{j}\right)
=\displaystyle=\ 1τ¯|t=0T1μt(h(xt+1)h(xt))|+1τ¯12(t=0T1ηt2μt)iNGh(KGi2+LijNGj)\displaystyle\frac{1}{\overline{\tau}}\cdot\left|\sum_{t=0}^{T-1}\mu_{t}\cdot(h(x^{t+1})-h(x^{t}))\right|+\frac{1}{\overline{\tau}}\cdot\frac{1}{2}\left(\sum_{t=0}^{T-1}\eta_{t}^{2}\mu_{t}\right)\sum_{i\in N}G_{h}\left(KG_{i}^{2}+L_{i}\sum_{j\in N}G_{j}\right)
\displaystyle\leq\ 2d(X)Gh(μT1+μ0)τ¯+1τ¯12(t=0T1ηt2μt)iNGh(KGi2+LijNGj).\displaystyle\frac{2d(X)G_{h}(\mu_{T-1}+\mu_{0})}{\overline{\tau}}+\frac{1}{\overline{\tau}}\cdot\frac{1}{2}\left(\sum_{t=0}^{T-1}\eta_{t}^{2}\mu_{t}\right)\sum_{i\in N}G_{h}\left(KG_{i}^{2}+L_{i}\sum_{j\in N}G_{j}\right). (11)

We now provide the tighter bounds, from which it follows that we may set \epsilon and \textnormal{poly}(\vec{G},\vec{L},G_{h},L_{h}) as desired while maintaining the validity of the inequality. If \mu_{t}=1, then \overline{\tau}\simeq 2C\sqrt{T} and \sum_{t=0}^{T-1}\eta_{t}^{2}\simeq C^{2}\ln(T), in which case (11) is

2d(X)GhCT+Cln(T)4TiNGh(KGi2+LijNGj).\simeq\frac{2d(X)G_{h}}{C\sqrt{T}}+\frac{C\ln(T)}{4\sqrt{T}}\sum_{i\in N}G_{h}\left(KG_{i}^{2}+L_{i}\sum_{j\in N}G_{j}\right).

If instead \mu_{t}=1/\eta_{t}, then \overline{\tau}=T and \sum_{t=0}^{T-1}\eta_{t}^{2}\mu_{t}=\sum_{t=0}^{T-1}\eta_{t}\simeq 2C\sqrt{T}, and thus (11) is

\simeq\frac{2d(X)G_{h}(1+\sqrt{T})}{CT}+\frac{C}{\sqrt{T}}\sum_{i\in N}G_{h}\left(KG_{i}^{2}+L_{i}\sum_{j\in N}G_{j}\right). ∎
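The step-size asymptotics invoked in this proof, \sum_{t=0}^{T-1}\eta_{t}\simeq 2C\sqrt{T} and \sum_{t=0}^{T-1}\eta_{t}^{2}\simeq C^{2}\ln(T) for \eta_{t}=C/\sqrt{t+1}, can be sanity-checked numerically; below is an illustrative run with C=1.

```python
import numpy as np

C, T = 1.0, 10**6
eta = C / np.sqrt(np.arange(1, T + 1))    # eta_t = C / sqrt(t + 1) for t = 0, ..., T-1

sum_eta = eta.sum()                       # ~ 2 C sqrt(T)  (the horizon tau-bar when mu_t = 1)
sum_eta_sq = (eta ** 2).sum()             # ~ C^2 ln(T)    (a harmonic sum)
```

Both ratios approach 1 as T grows, up to lower-order terms (a constant for the first sum, Euler's constant for the second).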

It remains to show the approximability of an \epsilon-local CCE with respect to the set of all differentiable functions h:X\rightarrow\mathbb{R} with Lipschitz gradients. For a set with smooth boundary, at any point in time there is effectively one “co-moving” linear constraint that binds: the normal cone at any x_{i}\in X_{i} is either a half-line or equal to \{0\}. This allows us to repeat the tangency arguments we employed for the toy example of the hypercube in Section 3.1.

Theorem 3.5.

Whenever all XiX_{i} are compact and convex sets with smooth boundary of bounded principal curvature KK, the approximate μ\mu-accelerated projected gradient dynamics satisfies,

1τ¯iN0τ¯𝑑τΠ𝒯𝒞Xi(xi(τ))[ih(x(τ))],iui(x(τ))\displaystyle\frac{1}{\overline{\tau}}\cdot\sum_{i\in N}\int_{0}^{\overline{\tau}}d\tau\cdot\left\langle\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}h(x(\tau))\right],\nabla_{i}u_{i}(x(\tau))\right\rangle (12)
\displaystyle\leq\ 2d(X)Gh(μT1+μ0)τ¯+12τ¯(t=0T1ηt2μt)GhiN(KGi2+LijNGj).\displaystyle\frac{2d(X)G_{h}(\mu_{T-1}+\mu_{0})}{\overline{\tau}}+\frac{1}{2\overline{\tau}}\cdot\left(\sum_{t=0}^{T-1}\eta_{t}^{2}\mu_{t}\right)\cdot G_{h}\sum_{i\in N}\left(KG_{i}^{2}+L_{i}\sum_{j\in N}G_{j}\right).

As a consequence, for step sizes ηt=C/t+1\eta_{t}=C/\sqrt{t+1}, the very same guarantees of Theorem 3.4 applies also for ϵ\epsilon-local CCE.

Proof.

Note that for any τ[0,τ¯]\tau\in[0,\overline{\tau}] and for every player ii,

Π𝒯𝒞Xi(xi(τ))[ih(x(τ))],iui(x(τ))\displaystyle\left\langle\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}h(x(\tau))\right],\nabla_{i}u_{i}(x(\tau))\right\rangle
\displaystyle\leq\ Π𝒯𝒞Xi(xi(τ))[ih(x(τ))],iui(x(τ))iui(x(τ¯))\displaystyle\left\langle\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}h(x(\tau))\right],\nabla_{i}u_{i}(x(\tau))-\nabla_{i}u_{i}(x(\underline{\tau}))\right\rangle (13)
+\displaystyle+\ Π𝒯𝒞Xi(xi(τ))[ih(x(τ))],Π𝒩𝒞Xi(xi(τ))[iui(x(τ¯))]\displaystyle\left\langle\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}h(x(\tau))\right],\Pi_{\mathcal{NC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\underline{\tau}))\right]\right\rangle (14)
+\displaystyle+\ Π𝒯𝒞Xi(xi(τ))[ih(x(τ))],Π𝒯𝒞Xi(xi(τ))[iui(x(τ¯))]dxi(τ)dτ\displaystyle\left\langle\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}h(x(\tau))\right],\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\underline{\tau}))\right]-\frac{dx_{i}(\tau)}{d\tau}\right\rangle (15)
+\displaystyle+\ Π𝒯𝒞Xi(xi(τ))[ih(x(τ))],dxi(τ)dτ.\displaystyle\left\langle\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}h(x(\tau))\right],\frac{dx_{i}(\tau)}{d\tau}\right\rangle. (16)

The terms (13) and (15) together yield the (\sum_{t=0}^{T-1}\eta_{t}^{2}\mu_{t}) term, as in the proof of Theorem 3.4. Meanwhile, (14) is \leq 0 by the definition of the tangent and normal cones. As for (16), the term can exceed \left\langle\nabla_{i}h(x(\tau)),\frac{dx_{i}(\tau)}{d\tau}\right\rangle only if the normal component of \frac{dx_{i}(\tau)}{d\tau} is strictly negative, i.e. only if \frac{dx_{i}(\tau)}{d\tau} points strictly inwards to X_{i}. In that case, \frac{dx_{i}(\tau)}{d\tau} lies strictly in the interior of \mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right), and thus for some \gamma>0, x_{i}(\tau^{\prime}) is in the interior of X_{i} for \tau^{\prime}\in(\tau,\tau+\gamma). The set of such \tau therefore has measure zero, by the cardinality arguments made for the hypercube in Section 3.1. As a consequence, for almost every \tau\in[0,\overline{\tau}],

Π𝒯𝒞Xi(xi(τ))[ih(x(τ))],dxi(τ)dτ=ih(x(τ)),dxi(τ)dτ,\left\langle\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}h(x(\tau))\right],\frac{dx_{i}(\tau)}{d\tau}\right\rangle=\left\langle\nabla_{i}h(x(\tau)),\frac{dx_{i}(\tau)}{d\tau}\right\rangle,

which implies that (16) yields the (μT1+μ0)/τ¯\propto(\mu_{T-1}+\mu_{0})/\overline{\tau} term, again as in the proof of Theorem 3.4. ∎

The discussion in this section demonstrates common themes regarding the connection between local CCE, stationary CCE, and the projected gradient dynamics of the underlying smooth game. Approximate projected gradient dynamics provide an approximate stationary CCE and, simultaneously, an approximate local CCE: the tangency of the motion through the strategy space implies that, as far as the gradient dynamics are concerned, a local CCE is a relaxation of a stationary CCE. Meanwhile, the complexity of approximation is determined by the properties of the boundary of the strategy space, with approximation guarantees deteriorating linearly in the maximum curvature.

3.3 Polyhedral action sets

We now turn our attention to the approximability of stationary and local CCE for smooth games when the action set X_{i} of each player i is polyhedral; that is, for each player i there exist A_{i}\in\mathbb{R}^{m_{i}\times d_{i}} and b_{i}\in\mathbb{R}^{m_{i}} such that X_{i}=\{x\in\mathbb{R}^{d_{i}}\ |\ A_{i}x\leq b_{i}\}. We shall assume, without loss of generality, that each polyhedron is full-dimensional in \mathbb{R}^{d_{i}} and that A_{i}x\leq b_{i} contains no redundant inequalities. Moreover, the rows a_{ij} of each A_{i} are assumed to be normalised: \|a_{ij}\|=1 for all i\in N, j\in m_{i}. With that in mind, we begin by defining the class of polyhedra of interest, over which approximation of stationary or local CCE is “easy”.

Definition 10.

A polyhedron {xd|aj,xbjjm}\{x\in\mathbb{R}^{d}\ |\ \left\langle a_{j},x\right\rangle\leq b_{j}\ \forall\ j\in m\} is said to be acute if for every distinct j,jmj,j^{\prime}\in m, aj,aj0\left\langle a_{j},a_{j^{\prime}}\right\rangle\leq 0.

Examples of acute polyhedra include perhaps the most “game-theoretically relevant” polyhedra: the hypercube and the simplex. That the hypercube is acute in this sense is straightforward: for each \ell\in d_{i}, the constraints -x_{i\ell}\leq 0 and x_{i\ell}\leq 1 have negative inner product for the corresponding rows of A, while any other distinct pair of rows of A are orthogonal. In turn for the simplex, factoring out the constraint \sum_{\ell=1}^{d_{i}}x_{i\ell}=1, we are left with the set of inequalities,

\forall\ \ell\in d_{i},\quad x_{i\ell}-\sum_{\ell^{\prime}=1}^{d_{i}}\frac{1}{d_{i}}x_{i\ell^{\prime}}\geq-\frac{1}{d_{i}}.

For any distinct \ell,\ell^{\prime\prime}, the inner product of the vectors associated with the left-hand sides of these constraints is then

\left\langle e_{i\ell}-\sum_{\ell^{\prime}=1}^{d_{i}}\frac{1}{d_{i}}e_{i\ell^{\prime}},e_{i\ell^{\prime\prime}}-\sum_{\ell^{\prime}=1}^{d_{i}}\frac{1}{d_{i}}e_{i\ell^{\prime}}\right\rangle=-\frac{1}{d_{i}}-\frac{1}{d_{i}}+\frac{1}{d_{i}}=-\frac{1}{d_{i}}<0.

As mentioned, the importance of acute polyhedra is that linear optimisation over them is trivial: a greedy algorithm suffices. Moreover, over such polyhedra x_{i}(\tau) always follows the tangent cone projection of \nabla_{i}u_{i}(x(\underline{\tau})) (modulo scaling by 1/\mu_{\tau}), rendering the approximate projected gradient dynamics faithful. The latter statement is the one we need for the desired approximation bounds; it is proven in the following lemma.
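Acuteness as in Definition 10 amounts to checking that the Gram matrix of the constraint normals has non-positive off-diagonal entries. The sketch below, with an illustrative helper `is_acute` of our own, confirms this for the hypercube and the simplex (sum constraint factored out), and rejects a wedge whose normals make an acute angle.

```python
import numpy as np

def is_acute(A, tol=1e-12):
    """Definition 10: distinct rows a_j, a_j' of A satisfy <a_j, a_j'> <= 0."""
    G = A @ A.T
    off_diagonal = G - np.diag(np.diag(G))
    return bool(np.all(off_diagonal <= tol))

d = 4
# Hypercube [0, 1]^d: normals -e_l (for -x_l <= 0) and e_l (for x_l <= 1).
cube = np.vstack([-np.eye(d), np.eye(d)])
# Simplex with the sum-to-one constraint factored out: normals e_l - (1/d) * ones,
# whose pairwise inner products equal -1/d < 0.
simplex = np.eye(d) - np.ones((d, d)) / d
# Two normals at an angle of 45 degrees: not acute.
wedge = np.array([[1.0, 0.0], [np.sqrt(0.5), np.sqrt(0.5)]])
```

Normalisation of the rows is immaterial for the check, since it only rescales the Gram matrix entries by positive factors.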

Lemma 3.6.

Suppose that X={xd|aj,xbjjm}X=\{x\in\mathbb{R}^{d}\ |\ \left\langle a_{j},x\right\rangle\leq b_{j}\ \forall\ j\in m\} is an acute polyhedron in d\mathbb{R}^{d}, xXx^{*}\in X, and gdg\in\mathbb{R}^{d}. Then for x(τ)=ΠX[x+τg]x(\tau)=\Pi_{X}\left[x^{*}+\tau\cdot g\right],

dx(τ)dτ=Π𝒯𝒞X(x(τ))[g].\frac{dx(\tau)}{d\tau}=\Pi_{\mathcal{TC}_{X}\left(x(\tau)\right)}\left[g\right].
Proof.

Without loss of generality, we will work in coordinates such that x=0x^{*}=0. Consider the projection problem for any τ0\tau^{\prime}\geq 0,

minx12xτg2 subject to Axb.\displaystyle\min_{x}\frac{1}{2}\|x-\tau^{\prime}\cdot g\|^{2}\textnormal{ subject to }Ax\leq b.

Its dual problem is given,

maxμ012ATμ2+μ,τAgb.\displaystyle\max_{\mu\geq 0}-\frac{1}{2}\|A^{T}\mu\|^{2}+\left\langle\mu,\tau^{\prime}\cdot Ag-b\right\rangle.

Now, given a sequence \tau^{\prime}\downarrow\tau with \tau^{\prime}>\tau\geq 0, some set of active rows J=\{j\in m\ |\ \mu_{j}(\tau^{\prime})>0\} of the optimal solutions \mu(\tau^{\prime}) to the dual projection problem repeats infinitely often. This is necessarily the set of active and relevant constraints at time \tau, and for the corresponding |J|\times d submatrix B of A and subvector b_{J} of b,

\mu_{j}(\tau^{\prime})=((BB^{T})^{-1}[\tau^{\prime}\cdot Bg-b_{J}])_{j},

for ττ\tau^{\prime}\geq\tau sufficiently close to τ\tau. In particular, dμ(τ)j/dτ=[(BBT)1Bg]jd\mu(\tau)_{j}/d\tau=[(BB^{T})^{-1}Bg]_{j} whenever jJj\in J and zero otherwise, while dx(τ)/dτ=gBT(BBT)1Bgdx(\tau)/d\tau=g-B^{T}(BB^{T})^{-1}Bg.

Now, the acuteness condition implies that the off-diagonal elements of BB^{T} are non-positive, while BB^{T} is positive semi-definite and invertible, hence positive definite. Therefore, by [37] (Theorem 4.3, 9^{\circ} and 11^{\circ}), (BB^{T})^{-1} has all of its entries non-negative. Meanwhile, by feasibility of x^{*} (=0 in our choice of coordinates), b has all of its entries non-negative, which implies that (BB^{T})^{-1}b_{J} also has only non-negative entries. Therefore, d\mu_{j}(\tau)/d\tau=\lim_{\tau^{\prime}\downarrow\tau}(\mu_{j}(\tau^{\prime})+[(BB^{T})^{-1}b_{J}]_{j})/\tau^{\prime}>0 for every active row j, and zero otherwise.

Thus all that remains to show is that Π𝒯𝒞X(x)[g]=gBT(BBT)1Bg\Pi_{\mathcal{TC}_{X}\left(x\right)}\left[g\right]=g-B^{T}(BB^{T})^{-1}Bg. Let ImI\subseteq m be the set of constraints which bind at x(τ)x(\tau) (but potentially, for jI,dμ(τ)j/dτ=0j\in I,d\mu(\tau)_{j}/d\tau=0), and CC the associated |I|×m|I|\times m submatrix of AA. Then consider the tangent cone projection problem

minx12xg2 subject to Cx0.\displaystyle\min_{x}\frac{1}{2}\|x-g\|^{2}\textnormal{ subject to }Cx\leq 0.

The dual projection problem is then,

maxν012CTν2+ν,Cg.\displaystyle\max_{\nu\geq 0}-\frac{1}{2}\|C^{T}\nu\|^{2}+\left\langle\nu,Cg\right\rangle.

We need to show that ν=(BBT)1Bg\nu=(BB^{T})^{-1}Bg is an optimal solution. But this is immediate now, as the feasibility of x(τ)+Δτ(gBT(BBT)1Bg)x(\tau)+\Delta\tau\cdot(g-B^{T}(BB^{T})^{-1}Bg) for small Δτ>0\Delta\tau>0 implies that x=gBT(BBT)1Bgx=g-B^{T}(BB^{T})^{-1}Bg is a feasible solution to the tangent cone projection problem, with solution value gTBT(BBT)1Bg/2g^{T}B^{T}(BB^{T})^{-1}Bg/2. Meanwhile, the given ν=dμ(τ)/dτ\nu=d\mu(\tau)/d\tau is dual feasible, with the same solution value. By weak duality, both solutions are necessarily optimal. ∎
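The conclusion of Lemma 3.6 can be observed numerically on the simplest acute polyhedron, the hypercube, where projection is coordinate-wise clipping: the right derivative of \tau\mapsto\Pi_{X}[x^{*}+\tau\cdot g] coincides with the tangent cone projection of g once a constraint binds. The point and direction below are an illustrative choice.

```python
import numpy as np

def proj_box(x):
    """Projection onto the hypercube [0, 1]^d is coordinate-wise clipping."""
    return np.clip(x, 0.0, 1.0)

x_star = np.array([0.5, 0.9])
g = np.array([1.0, 1.0])
tau, delta = 0.2, 1e-6

x_tau = proj_box(x_star + tau * g)        # = (0.7, 1.0): the constraint x_2 <= 1 binds
finite_diff = (proj_box(x_star + (tau + delta) * g) - x_tau) / delta

# Tangent cone projection of g at (0.7, 1.0): zero out the outward normal component.
tc_projection = np.array([1.0, 0.0])
```

The finite-difference quotient matches the tangent cone projection, as the lemma predicts for acute polyhedra.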

Remark. The implications of Lemma 3.6 were already proven for the hypercube and the simplex in [57], in which the authors study approximations of projected gradient dynamics. Indeed, our analysis is in a similar vein, in that we track the time evolution of the projection curve; here, however, we identify and exploit the features of the polyhedron which make linear optimisation over it trivial.

It turns out that for acute polyhedra, it is possible to prove stronger approximation guarantees than in the general case. Intuitively, because the approximate motion never fails to move along the tangent cone projection of the utility gradients, the projected gradient dynamics of the smooth game is well-approximated.

Lemma 3.7.

Whenever X_{i} is an acute polytope, the outcome of the approximate \mu-accelerated gradient dynamics satisfies, for every player i and every time step t with \tau(t)=\sum_{t^{\prime}=0}^{t-1}\mu_{t^{\prime}}\eta_{t^{\prime}},

\int_{\tau(t)}^{\tau(t)+\mu_{t}\eta_{t}}d\tau\cdot\left|\left\langle\nabla_{i}h(x(\tau)),\frac{1}{\mu_{t}}\cdot\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\tau))\right]-\frac{dx_{i}(\tau)}{d\tau}\right\rangle\right|\leq\frac{1}{2}\cdot\eta_{t}^{2}G_{h}L_{i}\sum_{j\in N}G_{j},

independent of the choice of μt\mu_{t}.

Proof.

Identical to the proof of Lemma 3.3, with the curvature loss incurred from Proposition 3.2 replaced by 0 due to Lemma 3.6. ∎

Theorem 3.8.

Whenever all X_{i} are acute polyhedra, the outcome of the approximate \mu-accelerated gradient dynamics satisfies, with \tau(t)=\sum_{t^{\prime}=0}^{t-1}\mu_{t^{\prime}}\eta_{t^{\prime}},

1τ¯|iN0τ¯𝑑τih(x(τ)),Π𝒯𝒞Xi(xi(τ))[iui(x(τ))]|\displaystyle\frac{1}{\overline{\tau}}\cdot\left|\sum_{i\in N}\int_{0}^{\overline{\tau}}d\tau\cdot\left\langle\nabla_{i}h(x(\tau)),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\tau))\right]\right\rangle\right| (17)
\displaystyle\leq\ 2d(X)Gh(μT1+μ0)τ¯+12τ¯(t=0T1ηt2μt)GhiNLijNGj.\displaystyle\frac{2d(X)G_{h}(\mu_{T-1}+\mu_{0})}{\overline{\tau}}+\frac{1}{2\overline{\tau}}\cdot\left(\sum_{t=0}^{T-1}\eta_{t}^{2}\mu_{t}\right)\cdot G_{h}\sum_{i\in N}L_{i}\sum_{j\in N}G_{j}.

As a consequence, for step sizes ηt=C/t+1\eta_{t}=C/\sqrt{t+1}, the approximate μ\mu-accelerated projected gradient dynamics yields, via sampling x(τ)x(\tau) where τU[0,τ¯]\tau\sim U[0,\overline{\tau}], an ϵ\epsilon-stationary CCE with respect to the set of all functions h:Xh:X\rightarrow\mathbb{R} with Lipschitz gradients. In this case, the same ϵ\epsilon guarantees hold as in Theorem 3.4, where we fix poly(G,L,Gh,Lh)=1+iNGhLijNGj\textnormal{poly}(\vec{G},\vec{L},G_{h},L_{h})=1+\sum_{i\in N}G_{h}L_{i}\sum_{j\in N}G_{j}. Analogously to Theorem 3.5, the μ\mu-accelerated projected gradient dynamics also yields an ϵ\epsilon-local CCE.

Proof.

Near-identical to the proofs of Theorems 3.4 and 3.5, except that by Lemma 3.6

1μτΠ𝒯𝒞Xi(xi(τ))[iui(x(τ¯))]dxi(τ)dτ=0,\frac{1}{\mu_{\tau}}\cdot\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\underline{\tau}))\right]-\frac{dx_{i}(\tau)}{d\tau}=0,

implying that there are no projection losses. We nevertheless need an alternative argument showing that, for almost every \tau\in[0,\overline{\tau}],

Π𝒩𝒞Xi(xi(τ))[ih(x(τ))],dxi(τ)dτ=0.\left\langle\Pi_{\mathcal{NC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}h(x(\tau))\right],\frac{dx_{i}(\tau)}{d\tau}\right\rangle=0.

Now, by [57] (Lemma 2), the curve x_{i}(\tau) is piecewise linear whenever X_{i} is polyhedral. Then, a constraint \left\langle a_{ij},x_{i}\right\rangle\leq b_{ij} binds over an interval (\tau,\tau+\gamma) only if \left\langle a_{ij},\frac{dx_{i}(\tau)}{d\tau}\right\rangle=0 for all such \tau. However, \Pi_{\mathcal{NC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}h(x(\tau))\right]=\sum_{j}\mu_{j}(\tau)a_{ij} for such binding constraints a_{ij} and some \mu_{j}(\tau)\geq 0, which implies the desired result. ∎

The interesting case is thus when the polyhedra we consider are not acute. This is counter-intuitive: acuteness in a sense implies that smooth-boundary approximations of the convex body would have very high curvature, a condition under which the differential regret guarantees of approximate projected gradient dynamics deteriorate. Acute polytopes nevertheless enjoy even better convergence guarantees, while the case of general polyhedra remains open.

3.4 (Partially) Adversarial Regimes

Sections 3.2 & 3.3 essentially provide inequalities satisfied by the continuous curves obtained by extending the sequence of projected gradient ascent iterates, which may be regarded as approximations of the actual projected gradient dynamics of the underlying smooth game. Whereas the bounds of Theorems 3.4, 3.5 and 3.8 are stated for the cases when all X_{i} have smooth boundary of uniformly bounded curvature or all X_{i} are acute polytopes, we readily observe that the regret guarantees do not deteriorate when different players have strategy sets of either kind.

However, our results are more sensitive to the fact that every player uses the very same step sizes. In this section, we shall relax this assumption, and work instead under the assumption that a subset NNN^{\prime}\subseteq N of players use the same step sizes, whereas players NNN\setminus N^{\prime} might play adversarially. The high-level idea is that as players iNi\in N^{\prime} have their strategies vary in the same manner, the continuous curve we consider should try to connect their strategies in time, while that of players jNNj\in N\setminus N^{\prime} can be taken as constant in each time step.

We remark, also, that setting μt=1\mu_{t}=1 for the continuous curve obtained via the approximate μ\mu-accelerated projected gradient dynamics made sense in the preceding section, because it could be taken as an approximation of the underlying dynamics, with the individual step sizes providing the “proper time” experienced through the motion of the curve. With players learning with different step sizes, at rates that might go to 0 or ++\infty, there is no longer such a valid choice. Therefore, we will restrict attention to the case when we effectively take μt=1/ηt\mu_{t}=1/\eta_{t}.

Thus, we focus on the setting when players iNi\in N^{\prime} update their strategies via projected gradient ascent, setting xit+1=ΠXi[xit+ηtiui(xt)]x_{i}^{t+1}=\Pi_{X_{i}}\left[x_{i}^{t}+\eta_{t}\cdot\nabla_{i}u_{i}(x^{t})\right], whereas players jNj\notin N^{\prime} will be allowed to pick their strategies xjt+1x_{j}^{t+1} arbitrarily. We then fix, as the approximate restricted time-averaged projected gradient dynamics, the curve x:[0,T]Xx:[0,T]\rightarrow X, where

xi(τ)={ΠXi[xit+(τt)ηtiui(xt)]iN,τ[t,t+1),xitiN,τ[t,t+1).x_{i}(\tau)=\begin{cases}\Pi_{X_{i}}\left[x_{i}^{t}+(\tau-t)\cdot\eta_{t}\nabla_{i}u_{i}(x^{t})\right]&i\in N^{\prime},\tau\in[t,t+1),\\ x_{i}^{t}&i\notin N^{\prime},\tau\in[t,t+1).\end{cases} (18)
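The discrete iterates underlying the curve (18) are straightforward to generate. Below is a minimal numpy sketch, assuming box-shaped strategy sets as a stand-in for general convex XiX_{i}; the names restricted_pga, grad_u and adversary are hypothetical, not taken from the text.

```python
import numpy as np

def box_project(x, lo=-1.0, hi=1.0):
    # Euclidean projection onto the box [lo, hi]^d; a stand-in for
    # the projection onto a general convex strategy set X_i.
    return np.clip(x, lo, hi)

def restricted_pga(grad_u, x0, learners, adversary, T, C=1.0):
    """Discrete iterates x^t: players in `learners` (N') run projected
    gradient ascent with the shared step sizes eta_t = C/sqrt(t+1),
    while the remaining players' strategies come from `adversary`."""
    xs = [dict(x0)]
    for t in range(T):
        x = xs[-1]
        eta = C / np.sqrt(t + 1)
        x_new = {}
        for i in x:
            if i in learners:
                x_new[i] = box_project(x[i] + eta * grad_u(i, x))
            else:
                x_new[i] = adversary(t, x)[i]
        xs.append(x_new)
    return xs
```

Sampling x(τ)x(\tau) with τU[0,T]\tau\sim U[0,T] then amounts to picking a uniformly random step of this sequence and interpolating (with projection) within it.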

Since only the players in NN^{\prime} update their strategies in a continuous manner, we will now need to restrict the scope of the functions we consider. In particular, we will ask that for any jNj\notin N^{\prime}, jh(x)=0\nabla_{j}h(x)=0. We remark again that when only a single player ii updates their strategies via projected gradient ascent, this is equivalent to letting h:Xih:X_{i}\rightarrow\mathbb{R}.

Our result, as one might expect from the construction of xx, is in line with those in the preceding discussion. Sampling appropriately from the curve xx, the players in NN^{\prime} incur vanishing aggregate regret against such hh, both for the notions of ϵ\epsilon-stationary and ϵ\epsilon-local CCE.

Theorem 3.9.

Suppose that each player iNi\in N^{\prime} updates their strategies via projected gradient ascent, using the same step sizes ηt\eta_{t}. Moreover, assume for each such ii that XiX_{i} has smooth boundary with bounded curvature KiK_{i}, or is an acute polytope (in which case we let Ki=0K_{i}=0). Then

both 1T|iN0T𝑑τih(x(τ)),Π𝒯𝒞Xi(xi(τ))[iui(x(τ))]|,\displaystyle\frac{1}{T}\cdot\left|\sum_{i\in N}\int_{0}^{T}d\tau\cdot\left\langle\nabla_{i}h(x(\tau)),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\tau))\right]\right\rangle\right|,
1TiN0T𝑑τiui(x(τ)),Π𝒯𝒞Xi(xi(τ))[ih(x(τ))]\displaystyle\frac{1}{T}\cdot\sum_{i\in N}\int_{0}^{T}d\tau\cdot\left\langle\nabla_{i}u_{i}(x(\tau)),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}h(x(\tau))\right]\right\rangle
\displaystyle\leq\ 2d(X)Gh(1/ηT1+1/η0)T+12T(t=0T1ηt)GhiN(KiGi2+LijNGj).\displaystyle\frac{2d(X)G_{h}(1/\eta_{T-1}+1/\eta_{0})}{T}+\frac{1}{2T}\cdot\left(\sum_{t=0}^{T-1}\eta_{t}\right)\cdot G_{h}\sum_{i\in N^{\prime}}\left(K_{i}G_{i}^{2}+L_{i}\sum_{j\in N^{\prime}}G_{j}\right).

As a consequence, for step sizes ηt=C/t+1\eta_{t}=C/\sqrt{t+1}, the approximate restricted time-averaged projected gradient dynamics yields, via sampling x(τ)x(\tau) where τU[0,T]\tau\sim U[0,T], an ϵ\epsilon-stationary CCE with respect to the set of functions h:Xh:X\rightarrow\mathbb{R} with bounded, Lipschitz gradients such that for any jN,jh(x)=0j\notin N^{\prime},\nabla_{j}h(x)=0. In this case, the same ϵ\epsilon guarantees hold as in Theorem 3.4, where we fix poly(G,L,Gh,Lh)=1+iNGh(KiGi2+LijNGj)\textnormal{poly}(\vec{G},\vec{L},G_{h},L_{h})=1+\sum_{i\in N^{\prime}}G_{h}(K_{i}G_{i}^{2}+L_{i}\sum_{j\in N^{\prime}}G_{j}). Analogously to Theorem 3.5, the approximate restricted time-averaged projected gradient dynamics also yields an ϵ\epsilon-local CCE.

Proof.

The bounds of Lemmata 3.3 and 3.7 are modified via jNjN\sum_{j\in N}\rightarrow\sum_{j\in N^{\prime}}, as by the construction of xx, for jNj\notin N^{\prime}, xj(τ)x_{j}(\tau) is constant over each [t,t+1)[t,t+1). The assumption that hh is constant in xjx_{j} for jNj\notin N^{\prime} implies that jh(x(τ))=0\nabla_{j}h(x(\tau))=0 and that the h(xt+1)h(xt)h(x^{t+1})-h(x^{t}) term may be obtained by looking at the continuous variation of (xi(τ))iN(x_{i}(\tau))_{i\in N^{\prime}}, from which the result follows immediately as in the proofs of Theorems 3.4, 3.5 and 3.8. ∎

We remark that even when all players employ projected gradient ascent, different learning rates do not in general allow us to assemble the data (xt)t(x^{t})_{t\in\mathbb{N}} into a single continuous curve; however, the guarantees of Theorem 3.9 hold for every subset of players NN^{\prime} whose step sizes are equal. One exception is when HH contains only functions whose gradients require no projection, in which case the sequence (xt)(x^{t}) itself satisfies vanishing regret guarantees. The discussion of the necessary arguments will need to wait until Section 5.1, where we consider the connection between stationary & local equilibria, and duality in relation to the price of anarchy literature.
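As a sanity check on the rates in Theorem 3.9, the two TT-dependent factors in its bound can be evaluated directly for the step sizes ηt=C/t+1\eta_{t}=C/\sqrt{t+1}; both scale as O(1/T)O(1/\sqrt{T}). A small numpy sketch (bound_terms is a hypothetical name):

```python
import numpy as np

def bound_terms(T, C=1.0):
    # The two T-dependent factors in the bound of Theorem 3.9,
    # with step sizes eta_t = C / sqrt(t + 1).
    eta = C / np.sqrt(np.arange(T) + 1.0)
    step_term = (1.0 / eta[-1] + 1.0 / eta[0]) / T   # (1/eta_{T-1} + 1/eta_0) / T
    sum_term = eta.sum() / (2.0 * T)                 # (sum_t eta_t) / (2T)
    return step_term, sum_term
```

Quadrupling TT roughly halves both factors, consistent with the O~(1/T)\tilde{O}(1/\sqrt{T}) rate.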

4 Approximability of First-Order Correlated Equilibria

We now turn our attention to when local or stationary correlated equilibria are approximable. Unlike the case for local coarse correlated equilibria, here we do not expect a tractable “universal approximation scheme”. Indeed, if FF contains all vector fields which have Lipschitz modulus L\leq L, for any smooth game FF contains the vector field (L/Li)(iui)(L/L_{i})\cdot(\nabla_{i}u_{i}) for each player ii. As a consequence, any such ϵ\epsilon-local (or stationary) CE σ\sigma satisfies the inequalities

𝔼xσΠ𝒯𝒞Xi(xi)[iui(x)],iui(x)ϵpoly(G,L,G,L)iN.\mathbb{E}_{x\sim\sigma}\left\langle\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right],\nabla_{i}u_{i}(x)\right\rangle\leq\epsilon\cdot\textnormal{poly}(\vec{G},\vec{L},G,L)\ \forall\ i\in N.

For small enough ϵ\epsilon, the support of the approximate local CE contains an approximate local Nash equilibrium, for which no polynomial time algorithm is known. However, we will see in Section 5.3 that ϵ\epsilon-CE for normal form games is equivalent to an ϵ\epsilon-local CE with respect to a set of vector fields,

F={xi(ai)(eiaieiai)|iN,ai,aiAi}.F=\{x_{i}(a_{i})\cdot(e_{ia^{\prime}_{i}}-e_{ia_{i}})\ |\ i\in N,a_{i},a^{\prime}_{i}\in A_{i}\}.
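As an illustration, the vector fields in FF above can be written down directly: each one moves probability mass from action aia_{i} to aia^{\prime}_{i} at a rate proportional to xi(ai)x_{i}(a_{i}). A minimal sketch, assuming mixed strategies stored as numpy vectors (ce_vector_field is a hypothetical name):

```python
import numpy as np

def ce_vector_field(i, a, a_prime, n_actions):
    """The field x -> x_i(a) * (e_{i a'} - e_{i a}): mass flows from
    action a to action a' for player i; all other players'
    components of the field are zero."""
    def f(x):   # x: dict mapping each player to a mixed strategy vector
        out = {j: np.zeros_like(xj) for j, xj in x.items()}
        v = np.zeros(n_actions)
        v[a_prime] += 1.0
        v[a] -= 1.0
        out[i] = x[i][a] * v
        return out
    return f
```

Note that the field vanishes whenever action aa is never played, mirroring the fact that correlated equilibrium constraints are vacuous on actions outside the support.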

As efficient algorithms for computing approximate correlated equilibria exist, one might suspect that there exist tractable algorithms to compute local or stationary CE for appropriately chosen families of vector fields FF. This is indeed the case, independent of the (non-)convexity of the continuous game, as we will show in the following section.

4.1 Φ\Phi-regret minimisation and approximability of ϵ\epsilon-local/stationary equilibria

Our promise of the approximability of local correlated equilibria in fact follows from standard Φ\Phi/swap-regret minimisation algorithms, e.g. [76, 44, 39, 41]. Whenever the family of vector fields FF has finitely many elements, we show that such algorithms are applicable to obtain O(1/T)O(1/\sqrt{T})-stationary correlated equilibria after TT iterations, provided we have access to a fixed-point oracle for every linear combination over FF. Such guarantees also apply to O(1/T)O(1/\sqrt{T})-local correlated equilibria, provided each fFf\in F satisfies a tangency condition. Whenever we are also guaranteed that fFμff2\|\sum_{f\in F}\mu_{f}f\|^{2} is a convex function for every μ:F+\mu:F\rightarrow\mathbb{R}_{+}, a fixed point may be computed as the solution of a convex problem. As mentioned, this setting subsumes the usual correlated equilibria of normal form games as a special case, and we demonstrate by example that such algorithms may incorporate “rotational corrections”: ways to eliminate cyclic behaviour. Altogether, this implies that there exists a non-trivial strengthening of correlated equilibria which is still tractably approximable.

For the proof of our bounds, we apply the framework of [41], which in turn follows from the analysis of [42]. In our approximation results, for simplicity we restrict attention to quadratic potential functions. Proofs of convergence are then an immediate consequence of [41] (Theorem 11); we nevertheless provide the algorithms and the associated proofs for the sake of a complete exposition. To wit, for a fixed smooth game and a family of vector fields FF, recall that an ϵ\epsilon-stationary CE is a distribution σ\sigma on XX such that

fF,|iNX𝑑σ(x)Π𝒯𝒞Xi(xi)[iui(x)],fi(x)|ϵpoly(G,L,Gf,Lf),\forall\ f\in F,\left|\sum_{i\in N}\int_{X}d\sigma(x)\cdot\left\langle\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right],f_{i}(x)\right\rangle\right|\leq\epsilon\cdot\textnormal{poly}(\vec{G},\vec{L},G_{f},L_{f}),

where the poly(G,L,Gf,Lf)\textnormal{poly}(\vec{G},\vec{L},G_{f},L_{f}) factor is fixed in advance. In our further analysis of ϵ\epsilon-stationary (and also ϵ\epsilon-local) CE, we shall restrict attention to finite |F||F|, and fix the poly-factor to 11, absorbing the bounds into ϵ\epsilon.

We consider the history of our algorithm to be a sequence of probability distributions (σt)t(\sigma_{t})_{t\in\mathbb{N}} on XX. We then denote the differential stationarity regret with respect to fFf\in F at iteration tt as

μtf=X𝑑σt(x)iNfi(x),Π𝒯𝒞Xi(xi)[iui(x)].\mu_{tf}=\int_{X}d\sigma_{t}(x)\cdot\sum_{i\in N}\left\langle f_{i}(x),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]\right\rangle.

Our first regret matching algorithm then is given:

Input: Smooth game (N,(Xi)iN,(ui)iN)\left(N,(X_{i})_{i\in N},(u_{i})_{i\in N}\right), set of vector fields FF, σ1Δ(X)\sigma_{1}\in\Delta(X), ϵ\epsilon
Output: σt\sigma_{t}, an approximate stationary CE
1 t1t\leftarrow 1;
2 μ1fiNX𝑑σt(x)fi(x),Π𝒯𝒞Xi(xi)[iui(x)]fF\mu_{1f}\leftarrow\sum_{i\in N}\int_{X}d\sigma_{t}(x)\cdot\left\langle f_{i}(x),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]\right\rangle\ \forall\ f\in F;
3 while fF,|μtf|>ϵpoly(G,L,Gf,Lf)\exists\ f\in F,|\mu_{tf}|>\epsilon\cdot\textnormal{poly}(\vec{G},\vec{L},G_{f},L_{f}) do
4       xt+1x_{t+1}\leftarrow a fixed point of fFμtff\sum_{f\in F}\mu_{tf}f in XX;
5       Find suitable αt(0,1)\alpha_{t}\in(0,1);
6       σt+1(1αt)σt+αtδ(xt+1)\sigma_{t+1}\leftarrow(1-\alpha_{t})\cdot\sigma_{t}+\alpha_{t}\cdot\delta(x_{t+1});
7       tt+1t\leftarrow t+1;
8       μtfiNX𝑑σt(x)fi(x),Π𝒯𝒞Xi(xi)[iui(x)]fF\mu_{tf}\leftarrow\sum_{i\in N}\int_{X}d\sigma_{t}(x)\cdot\left\langle f_{i}(x),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]\right\rangle\ \forall\ f\in F;
9      
10 end while
Algorithm 1 Regret matching for ϵ\epsilon-stationary CE

Here, δ(xt+1)\delta(x_{t+1}) in line 66 denotes the point mass distribution on xt+1x_{t+1}. The convergence of Algorithm 1 then follows from usual arguments for Φ\Phi-regret minimisation, adjusting for the fact that we are dealing with equality constraints for stationarity and that our action transformations are defined via infinitesimal translations about a vector field in a constrained set.
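The loop of Algorithm 1 can be sketched compactly, assuming a finite-support representation of σt\sigma_{t} (with αt=1/(t+1)\alpha_{t}=1/(t+1), σt\sigma_{t} is uniform over its support) and treating the fixed-point oracle and the tangent-cone-projected gradient as supplied callables; all names below are hypothetical stand-ins.

```python
import numpy as np

def regret_matching_stationary(fields, proj_grad, fixed_point, x1,
                               eps, max_iter=1000):
    """Sketch of Algorithm 1. `fields` is a list of vector fields f(x)
    (flat vectors over all players); `proj_grad(x)` returns the
    tangent-cone-projected utility gradient; `fixed_point(mu)` returns
    a fixed point of sum_f mu_f * f in X."""
    support = [x1]   # sigma_t, as a uniform distribution over `support`
    for t in range(1, max_iter + 1):
        # Differential stationarity regrets mu_{tf}, exactly over sigma_t.
        mu = np.array([
            np.mean([f(x) @ proj_grad(x) for x in support])
            for f in fields])
        if np.max(np.abs(mu)) <= eps:
            return support, mu
        # alpha_t = 1/(t+1): appending keeps the weights uniform.
        support.append(fixed_point(mu))
    return support, mu
```

For a one-dimensional toy game the loop drives the regrets down at the 1/t1/t rate, faster than the worst-case |F|/t\sqrt{|F|/t} guarantee of Theorem 4.1.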

Theorem 4.1.

Suppose that in Algorithm 1, αt=1/(t+1)\alpha_{t}=1/(t+1) or is chosen optimally to minimize fFμtf2\sum_{f\in F}\mu_{tf}^{2} at each time step. Then after tt iterations,

maxfF|μtf||F|t+1(iNGimaxfFGf).\max_{f\in F}|\mu_{tf}|\leq\sqrt{\frac{|F|}{t+1}}\cdot\left(\sum_{i\in N}G_{i}\cdot\max_{f\in F}G_{f}\right).

As a consequence, an ϵ\epsilon-stationary CE may be computed after O(|F|/ϵ2)O(|F|/\epsilon^{2}) iterations, given a fixed-point oracle for fFμff\sum_{f\in F}\mu_{f}f for any real vector μF\mu\in\mathbb{R}^{F}.

Proof.

For αt=1/(t+1)\alpha_{t}=1/(t+1), we shall track the time evolution of tμtft\mu_{tf}, i.e. the cumulative regret. In this case, note that for a potential G(μ)=μ2G(\mu)=\|\mu\|^{2}, link function g(μ)=2μg(\mu)=2\mu and error terms γ(μ)=μ2\gamma(\mu)=\|\mu\|^{2}, (G,g,γ)(G,g,\gamma) is trivially a Gordon triple (cf. [41], Definition 5) as the condition

G(μ+Δμ)G(μ)+g(μ),Δμ+γ(Δμ)G(\mu+\Delta\mu)\leq G(\mu)+\left\langle g(\mu),\Delta\mu\right\rangle+\gamma(\Delta\mu)

holds with equality for any μ,Δμ:F\mu,\Delta\mu:F\rightarrow\mathbb{R}. Now, for our choice of αt\alpha_{t},

(t+1)μ(t+1)f=tμtf+iNfi(xt+1),Π𝒯𝒞Xi(x(t+1)i)[iui(xt+1)].(t+1)\cdot\mu_{(t+1)f}=t\cdot\mu_{tf}+\sum_{i\in N}\left\langle f_{i}(x_{t+1}),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{(t+1)i}\right)}\left[\nabla_{i}u_{i}(x_{t+1})\right]\right\rangle.

Therefore,

\displaystyle\ \left\langle g(t\cdot\mu_{t}),(t+1)\cdot\mu_{t+1}-t\cdot\mu_{t}\right\rangle+\gamma\left((t+1)\cdot\mu_{t+1}-t\cdot\mu_{t}\right)
=\displaystyle= 2tiNfFμtffi(xt+1),Π𝒯𝒞Xi(x(t+1)i)[iui(xt+1)]\displaystyle\ 2t\cdot\sum_{i\in N}\left\langle\sum_{f\in F}\mu_{tf}f_{i}(x_{t+1}),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{(t+1)i}\right)}\left[\nabla_{i}u_{i}(x_{t+1})\right]\right\rangle (19)
+fF(iNfi(xt+1),Π𝒯𝒞Xi(x(t+1)i)[iui(xt+1)])2\displaystyle\ +\sum_{f\in F}\left(\sum_{i\in N}\left\langle f_{i}(x_{t+1}),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{(t+1)i}\right)}\left[\nabla_{i}u_{i}(x_{t+1})\right]\right\rangle\right)^{2}
\displaystyle\leq |F|(maxfFGfiNGi)2.\displaystyle\ |F|\cdot(\max_{f\in F}G_{f}\cdot\sum_{i\in N}G_{i})^{2}.

Here, the bound follows from the bounds on the magnitudes of ff and iui\nabla_{i}u_{i}, and since (19) is non-positive: by the fixed point assumption, fFμtff(xt+1)𝒩𝒞X(xt+1)\sum_{f\in F}\mu_{tf}f(x_{t+1})\in\mathcal{NC}_{X}\left(x_{t+1}\right), which is true if and only if fFμtffi(xt+1)𝒩𝒞Xi(x(t+1)i)\sum_{f\in F}\mu_{tf}f_{i}(x_{t+1})\in\mathcal{NC}_{X_{i}}\left(x_{(t+1)i}\right) for every player ii777Recall that for any convex set XX and any xXx\in X, if μ\mu is an element of the normal cone to XX at xx and ν\nu an element of the tangent cone, then μ,ν0\left\langle\mu,\nu\right\rangle\leq 0.. As a consequence, by [41] (Theorem 6),

(t+1)2fFμ(t+1)f2fFμ1f2+t|F|(maxfFGfiNGi)2(t+1)|F|(maxfFGfiNGi)2.(t+1)^{2}\sum_{f\in F}\mu_{(t+1)f}^{2}\leq\sum_{f\in F}\mu_{1f}^{2}+t\cdot|F|\cdot(\max_{f\in F}G_{f}\cdot\sum_{i\in N}G_{i})^{2}\leq(t+1)\cdot|F|(\max_{f\in F}G_{f}\cdot\sum_{i\in N}G_{i})^{2}.

The result follows immediately. ∎
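The Gordon-triple equality invoked at the start of the proof is simply the expansion of the square; with G(μ)=μ2G(\mu)=\|\mu\|^{2}, g(μ)=2μg(\mu)=2\mu and γ(Δμ)=Δμ2\gamma(\Delta\mu)=\|\Delta\mu\|^{2},

```latex
G(\mu+\Delta\mu)=\|\mu+\Delta\mu\|^{2}
  =\|\mu\|^{2}+2\left\langle\mu,\Delta\mu\right\rangle+\|\Delta\mu\|^{2}
  =G(\mu)+\left\langle g(\mu),\Delta\mu\right\rangle+\gamma(\Delta\mu).
```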

Remark. We note that simply substituting the computed fixed points xt+1x_{t+1} with (approximately) stationary distributions σt+1Δ(X)\sigma^{\prime}_{t+1}\in\Delta(X) with respect to the vector field fFμtff\sum_{f\in F}\mu_{tf}f in Algorithm 1 is insufficient for our purposes in general. This is because the “reward system” in our setting is formed by the utility gradients, which do not necessarily form a conservative vector field. In particular, even if σt+1\sigma^{\prime}_{t+1} is induced as a measure via some closed parametrised loop c:[0,]Xc:[0,\ell]\rightarrow X with dc(u)/du=Π𝒯𝒞X(c(u))[fFμtff(c(u))]dc(u)/du=\Pi_{\mathcal{TC}_{X}\left(c(u)\right)}\left[\sum_{f\in F}\mu_{tf}f(c(u))\right] almost everywhere, it can be the case that

fFμtfiNX𝑑σt+1(x)fi(x),Π𝒯𝒞Xi(xi)[iui(x)]>0.\sum_{f\in F}\mu_{tf}\cdot\sum_{i\in N}\int_{X}d\sigma_{t+1}^{\prime}(x)\cdot\left\langle f_{i}(x),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]\right\rangle>0.

For instance, this is true when fFμtffi=iui\sum_{f\in F}\mu_{tf}f_{i}=\nabla_{i}u_{i} for each player ii. It is in this sense that access to a fixed point oracle may be considered necessary.

One exception, of course, is if we are dealing with a potential game, wherein iui=iV\nabla_{i}u_{i}=\nabla_{i}V for some smooth potential function V:XV:X\rightarrow\mathbb{R}. In this case, if XX is one of the suitable convex bodies studied in Section 3, then by the discussion therein, for any stationary distribution σ\sigma computed via the gradient dynamics for fFμff\sum_{f\in F}\mu_{f}f given some μ:F\mu:F\rightarrow\mathbb{R},

fFμtfiNX𝑑σ(x)fi(x),Π𝒯𝒞Xi(xi)[iui(x)]\displaystyle\sum_{f\in F}\mu_{tf}\cdot\sum_{i\in N}\int_{X}d\sigma(x)\cdot\left\langle f_{i}(x),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]\right\rangle
\displaystyle\leq iNX𝑑σ(x)iV(x),Π𝒯𝒞Xi(xi)[fFμtffi(x)]0.\displaystyle\sum_{i\in N}\int_{X}d\sigma(x)\cdot\left\langle\nabla_{i}V(x),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\sum_{f\in F}\mu_{tf}f_{i}(x)\right]\right\rangle\approx 0.

However, in this case a local correlated equilibrium may be computed by simply performing gradient ascent on VV and finding an approximate local maximum. Therefore, it is unclear whether a regret matching algorithm would provide any advantages regarding tractable computation.

The approximation of an ϵ\epsilon-local CE given access to a fixed-point oracle follows near identically. One important point of note is that local CE involve tangent cone projections of the associated vector fields ff. In general, it is not true that Π𝒯𝒞X(x)[fFμff]=fFμfΠ𝒯𝒞X(x)[f]\Pi_{\mathcal{TC}_{X}\left(x\right)}\left[\sum_{f\in F}\mu_{f}f\right]=\sum_{f\in F}\mu_{f}\Pi_{\mathcal{TC}_{X}\left(x\right)}\left[f\right], except when f(x)𝒯𝒞X(x)f(x)\in\mathcal{TC}_{X}\left(x\right) for every xXx\in X and fFf\in F, in which case the equality is assured by the cone property. We shall call FF tangential whenever this is the case, and denote the differential local regret with respect to fFf\in F at iteration tt as

μtf=max{X𝑑σt(x)iNfi(x),Π𝒯𝒞Xi(xi)[iui(x)],0}.\mu_{tf}=\max\left\{\int_{X}d\sigma_{t}(x)\cdot\sum_{i\in N}\left\langle f_{i}(x),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]\right\rangle,0\right\}.

Adapting Algorithm 1 to differential local regret then outputs an approximate local CE:

Input: Smooth game (N,(Xi)iN,(ui)iN)\left(N,(X_{i})_{i\in N},(u_{i})_{i\in N}\right), tangential set of vector fields FF, σ1Δ(X)\sigma_{1}\in\Delta(X), ϵ\epsilon
Output: σt\sigma_{t}, an approximate local CE
1 t1t\leftarrow 1;
2 μ1fiNX𝑑σt(x)fi(x),iui(x)fF\mu_{1f}\leftarrow\sum_{i\in N}\int_{X}d\sigma_{t}(x)\cdot\left\langle f_{i}(x),\nabla_{i}u_{i}(x)\right\rangle\ \forall\ f\in F;
3 while fF,μtf>ϵpoly(G,L,Gf,Lf)\exists\ f\in F,\mu_{tf}>\epsilon\cdot\textnormal{poly}(\vec{G},\vec{L},G_{f},L_{f}) do
4       xt+1x_{t+1}\leftarrow a fixed point of fFmax{μtf,0}f\sum_{f\in F}\max\{\mu_{tf},0\}f in XX;
5       Find suitable αt(0,1)\alpha_{t}\in(0,1);
6       σt+1(1αt)σt+αtδ(xt+1)\sigma_{t+1}\leftarrow(1-\alpha_{t})\cdot\sigma_{t}+\alpha_{t}\cdot\delta(x_{t+1});
7       tt+1t\leftarrow t+1;
8       μtfiNX𝑑σt(x)fi(x),iui(x)fF\mu_{tf}\leftarrow\sum_{i\in N}\int_{X}d\sigma_{t}(x)\cdot\left\langle f_{i}(x),\nabla_{i}u_{i}(x)\right\rangle\ \forall\ f\in F;
9      
10 end while
Algorithm 2 Regret matching for ϵ\epsilon-local CE
Theorem 4.2.

Suppose that in Algorithm 2, αt=1/(t+1)\alpha_{t}=1/(t+1) or is chosen optimally to minimize fFmax{μtf,0}2\sum_{f\in F}\max\{\mu_{tf},0\}^{2} at each time step. Then after tt iterations,

maxfFμtf|F|t+1(iNGimaxfFGf).\max_{f\in F}\mu_{tf}\leq\sqrt{\frac{|F|}{t+1}}\cdot\left(\sum_{i\in N}G_{i}\cdot\max_{f\in F}G_{f}\right).

As a consequence, an ϵ\epsilon-local CE may be computed after O(|F|/ϵ2)O(|F|/\epsilon^{2}) iterations, given a fixed-point oracle for fFμff\sum_{f\in F}\mu_{f}f for any non-negative vector μ+F\mu\in\mathbb{R}_{+}^{F}.

Proof.

Identical to the proof of Theorem 4.1, by noting G(μ)=fFmax{μf,0}2G(\mu)=\sum_{f\in F}\max\{\mu_{f},0\}^{2}, g(μ)f=2max{μf,0}g(\mu)_{f}=2\max\{\mu_{f},0\} and γ(Δμ)=Δμ2\gamma(\Delta\mu)=\|\Delta\mu\|^{2} form a Gordon triple ([41], Lemma 12), and that the tangentiality of FF implies that at any fixed-point xt+1x_{t+1}, fFmax{μtf,0}f(xt+1)=0\sum_{f\in F}\max\{\mu_{tf},0\}f(x_{t+1})=0. ∎

The question that remains is, then, whether there exists a family of vector fields FF such that ϵ\epsilon-stationary or -local CE are tractably approximable. The answer is easily shown to be affirmative for the case of ϵ\epsilon-local CE when each ff is affine-linear in each component.

Proposition 4.3.

Suppose that FF is a family of tangential vector fields, such that f(x)=Pfx+qff(x)=P_{f}x+q_{f} for some matrix Pfd×dP_{f}\in\mathbb{R}^{d\times d} and some vector qfdq_{f}\in\mathbb{R}^{d}. Then for any μ:F+\mu:F\rightarrow\mathbb{R}_{+}, fFμff(x)2\|\sum_{f\in F}\mu_{f}f(x)\|^{2} is a convex quadratic function. Moreover, any argminxXfFμff(x)2\operatorname*{arg\,min}_{x\in X}\|\sum_{f\in F}\mu_{f}f(x)\|^{2} is a fixed point of fFμff(x)\sum_{f\in F}\mu_{f}f(x).

Proof.

As fFμff(x)\sum_{f\in F}\mu_{f}f(x) is linear in xx, fFμff(x)2\|\sum_{f\in F}\mu_{f}f(x)\|^{2} is a sum of squares. Since FF is tangential, at any fixed point xXx^{*}\in X of fFμff\sum_{f\in F}\mu_{f}f, fFμff(x)𝒯𝒞X(x)𝒩𝒞X(x)={0}\sum_{f\in F}\mu_{f}f(x^{*})\in\mathcal{TC}_{X}\left(x^{*}\right)\cap\mathcal{NC}_{X}\left(x^{*}\right)=\{0\}. Thus minxXfFμff(x)2=0\min_{x\in X}\|\sum_{f\in F}\mu_{f}f(x)\|^{2}=0, which implies the result. ∎

Corollary 4.4.

For a family of tangential affine-linear vector fields FF, an ϵ\epsilon-local CE with respect to FF can be approximated via solving O(|F|/ϵ2)O(|F|/\epsilon^{2}) convex quadratic minimisation problems.
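The convex quadratic minimisation in Proposition 4.3 is easy to carry out numerically. A minimal numpy sketch, assuming box strategy sets and hypothetical affine fields given by matrices Ps and offsets qs; projected gradient descent suffices since the objective is a convex quadratic over a convex set:

```python
import numpy as np

def affine_fixed_point(Ps, qs, mu, lo, hi, steps=2000, lr=0.05):
    """Fixed point of sum_f mu_f (P_f x + q_f) on the box [lo, hi]^d,
    found by projected gradient descent on the convex quadratic
    ||P x + q||^2, where P = sum_f mu_f P_f and q = sum_f mu_f q_f."""
    P = sum(m * Pf for m, Pf in zip(mu, Ps))
    q = sum(m * qf for m, qf in zip(mu, qs))
    x = np.zeros(P.shape[1])
    for _ in range(steps):
        grad = 2.0 * P.T @ (P @ x + q)    # gradient of ||P x + q||^2
        x = np.clip(x - lr * grad, lo, hi)
    return x
```

By tangentiality, the minimum value is 00, so the returned point zeroes the combined field and is hence a fixed point.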

By the remark at the beginning of Section 4, this subsumes the correlated equilibria of normal form games; but such equilibria remain tractably approximable even for non-concave games. And as the following example demonstrates, for the hypercube, equilibrium refinements are possible.

Example 1.

Suppose in a two player game that Xi=[1,1]X_{i}=[-1,1] for each player ii, and hence X=[1,1]2X=[-1,1]^{2}. Then the usual correlated equilibrium constraints are provided by vector fields

f1+=(1x10),f1=(1x10),f2+=(01x2), and f2=(01x2).f^{1+}=\begin{pmatrix}1-x_{1}\\ 0\end{pmatrix},f^{1-}=\begin{pmatrix}-1-x_{1}\\ 0\end{pmatrix},f^{2+}=\begin{pmatrix}0\\ 1-x_{2}\end{pmatrix},\textnormal{ and }f^{2-}=\begin{pmatrix}0\\ -1-x_{2}\end{pmatrix}.

Note that these vector fields are all conservative, and as a result they are gradient fields. We therefore consider extending our set of vector fields by considering

g1=(x1x20),g1+=(x2x10),g2=(0x1x2), and g2+=(0x1x2).g^{1-}=\begin{pmatrix}-x_{1}-x_{2}\\ 0\end{pmatrix},g^{1+}=\begin{pmatrix}x_{2}-x_{1}\\ 0\end{pmatrix},g^{2-}=\begin{pmatrix}0\\ -x_{1}-x_{2}\end{pmatrix},\textnormal{ and }g^{2+}=\begin{pmatrix}0\\ x_{1}-x_{2}\end{pmatrix}.

The vector fields gg have non-zero curl, and thus none of them arises as the gradient field of a quadratic function, though they are coordinate projections of gradient fields of suitable quadratic functions. As a consequence, no vector field gi±g^{i\pm} may be expressed as a conical combination of the vector fields fi±f^{i\pm}. Setting F={fij,gij|i{1,2},j{+,}}F=\{f^{ij},g^{ij}\ |\ i\in\{1,2\},j\in\{+,-\}\} thus provides us a refinement of the usual correlated equilibria of 2×22\times 2 normal form games. The refinement can be strict when looking at the history of play as distributions; consider the matching pennies game where iui(x)=(1)ixi\nabla_{i}u_{i}(x)=(-1)^{i}x_{-i}. Then

iNg1(x)+g2+(x),iui(x)=x12+x22,\displaystyle\sum_{i\in N}\left\langle g^{1-}(x)+g^{2+}(x),\nabla_{i}u_{i}(x)\right\rangle=x_{1}^{2}+x_{2}^{2},

which implies that the only local CE with respect to FF is the unique Nash equilibrium at x=(0,0)x=(0,0). On the other hand, limiting attention to vector fields fi±f^{i\pm} cannot rule out cycles, e.g. (x1,x2)=(sin(θ),cos(θ))(x_{1},x_{2})=(\sin(\theta),\cos(\theta)) where θ\theta is drawn from the uniform distribution on [0,2π][0,2\pi].
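The identity claimed in Example 1 is easy to verify numerically; the following sketch checks that i(g1+g2+)i(x),iui(x)=x12+x22\sum_{i}\left\langle(g^{1-}+g^{2+})_{i}(x),\nabla_{i}u_{i}(x)\right\rangle=x_{1}^{2}+x_{2}^{2} at random points of [1,1]2[-1,1]^{2}:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_u(x):
    # Matching pennies on [-1, 1]^2: grad_i u_i(x) = (-1)^i * x_{-i}.
    return np.array([-x[1], x[0]])

def g_1_minus(x):   # the field g^{1-} of Example 1
    return np.array([-x[0] - x[1], 0.0])

def g_2_plus(x):    # the field g^{2+} of Example 1
    return np.array([0.0, x[0] - x[1]])

# sum_i <(g^{1-} + g^{2+})_i(x), grad_i u_i(x)> = x_1^2 + x_2^2:
for _ in range(100):
    x = rng.uniform(-1.0, 1.0, size=2)
    assert np.isclose((g_1_minus(x) + g_2_plus(x)) @ grad_u(x), x @ x)
```

The right-hand side is non-negative and vanishes only at the origin, which is what forces any local CE with respect to FF onto the Nash equilibrium.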

We note that when we consider linear (instead of conical) combinations of vector fields fFf\in F in Example 1, we may obtain an arbitrary affine-linear vector field on XX; {fi±,gi±}\{f^{i\pm},g^{i\pm}\} is a linearly dependent set spanning the affine-linear fields in each player ii's component. However, in a 2×22\times 2 normal form game, players’ utility gradients iui\nabla_{i}u_{i} are also affine-linear. As a consequence, for a 2×22\times 2 normal form game, any stationary CE with respect to FF is necessarily a mixture of its Nash equilibria.

5 On Equilibrium Analysis

Given that our notions of first-order coarse correlated equilibria are approximable, and that first-order correlated equilibria sometimes are as well, a natural question to ask is what these equilibria look like, and how one might reason about them. We initiate the study of this question in this section. First, we adapt the usual primal-dual framework for price of anarchy analysis to our equilibrium concepts. Our observation is that given a primal optimisation problem encoding performance guarantees of a first-order coarse equilibrium, a solution to the dual is given by a generalised Lyapunov function. In fact, when the associated performance metric is distance to a unique equilibrium, and the dual solution certifies that projected gradient ascent necessarily converges to it, this dual solution is a Lyapunov function in the usual sense. In turn, for first-order (non-coarse) correlated equilibria, one does not obtain a potential in general. Instead, the dual problem takes the form of a vector field fit problem, searching for a linear or conic combination of vector fields in FF which is aligned with the utility gradients.

We then turn our attention to normal-form games. Here, we make a striking observation: for a suitably chosen finite set of vector fields FF, first-order (coarse) correlated equilibrium is equivalent to the usual notion of (coarse) correlated equilibrium in a normal-form game. As a consequence, the usual primal-dual analysis of price of anarchy bounds can be viewed as a statement about the gradient dynamics associated with the game. In fact, even the smoothness framework of Roughgarden [72] falls under this umbrella, by its equivalence to an “average-CCE” [64].

5.1 Duality and Lyapunov arguments

We proceed with analysing the appropriate primal-dual framework for an ϵ\epsilon-approximate first-order CCE. Our approach is motivated by the possibility of strengthening primal-dual efficiency analysis for the outcomes of learning algorithms. Often, in the analysis of games, the object of interest is performance guarantees attached to an equilibrium concept and not necessarily the exact form of equilibrium. That is to say, given an equilibrium concept (in full abstraction, a subset EΔ(X)E\subseteq{\Delta(X)}), and a continuous “welfare” function q:Xq:X\rightarrow\mathbb{R}, one may consider a bound on the worst case performance

infσE𝔼xσ[q(x)]maxxXq(x).\frac{\inf_{\sigma\in E}\mathbb{E}_{x\sim\sigma}[q(x)]}{\max_{x\in X}q(x)}.

This quantity is often referred to as the price of anarchy of the game, while the related notion of price of stability bounds the best case performance in equilibria,

supσE𝔼xσ[q(x)]maxxXq(x).\frac{\sup_{\sigma\in E}\mathbb{E}_{x\sim\sigma}[q(x)]}{\max_{x\in X}q(x)}.
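As a toy illustration of these quantities, consider a hypothetical 2×22\times 2 prisoner's dilemma (payoffs chosen for illustration, not taken from the text), with EE the set of its coarse correlated equilibria and qq the utilitarian welfare. The infimum in the price of anarchy can be brute-forced over a grid of joint distributions:

```python
import itertools
import numpy as np

# A hypothetical 2x2 prisoner's dilemma; index 0 = cooperate, 1 = defect.
U1 = np.array([[3.0, 0.0], [4.0, 1.0]])   # row player's payoffs
U2 = U1.T                                  # column player's payoffs

def is_cce(p, tol=1e-9):
    # p[a1, a2] is a joint distribution; CCE: no player gains by
    # switching to a fixed action against the others' marginal.
    for a in (0, 1):
        if (U1 * p).sum() < (U1[a] * p.sum(axis=0)).sum() - tol:
            return False
        if (U2 * p).sum() < (U2[:, a] * p.sum(axis=1)).sum() - tol:
            return False
    return True

# Brute-force the infimum of expected welfare over (grid-approximate) CCE.
k = 20
worst = min(
    ((U1 + U2) * p).sum()
    for c in itertools.combinations_with_replacement(range(4), k)
    for p in [np.bincount(c, minlength=4).reshape(2, 2) / k]
    if is_cce(p))
poa = worst / (U1 + U2).max()
# Defection is strictly dominant, so the only CCE is the point
# mass on (D, D): worst = 2, and the price of anarchy is 1/3.
```

The general primal problems below replace this brute-force enumeration with (infinite-dimensional) linear programming over equilibrium constraints.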

Most methods of bounding such expectations fall under two classes of a so-called primal-dual method. One class of such methods argues the bounds via the lens of primal-dual approximation algorithms, writing a primal LP (“configuration LP”) corresponding to performance (e.g. welfare) maximisation, and obtaining via the equilibrium conditions a solution for its dual minimisation LP. If the equilibrium is good, the constructed dual solution has value within a constant factor of the performance of the equilibrium, which provides the desired result. An in-depth discussion of this approach may be found in [63]; we remark that this is not the primal-dual framework of interest here.

Another class of primal-dual methods argues directly via a primal problem over the set of equilibria. Here, the primal LP has as its variables probability distributions over XX, and is subject to equilibrium constraints. The objective is to minimise or maximise q(x)q(x), corresponding respectively to computing the price of anarchy or stability up to a factor of maxxXq(x)\max_{x\in X}q(x). For instance, (by the arguments in [64]) the smoothness framework of [72] as well as the price of anarchy bounds for congestion games by [12] both fall under this umbrella. The question is whether such arguments may be extended to the outcomes of gradient ascent.

The answer, of course, is yes; and for the concept of equilibrium of choice it is more enlightening to invoke ϵ\epsilon-stationary CCE constraints rather than ϵ\epsilon-local CCE constraints. To wit, given the function qq, consider the following measure valued (infinite dimensional) LP, which seeks the worst-case value of the expectation of qq amongst the set of stationary CCE of the game,

infσ0X𝑑σ(x)q(x)\displaystyle\inf_{\sigma\geq 0}\int_{X}d\sigma(x)\cdot q(x)\ subject to (20)
X𝑑σ(x)\displaystyle\int_{X}d\sigma(x) =1\displaystyle=1 (γ\gamma)
hH,iNX𝑑σ(x)ih(x),Π𝒯𝒞Xi(xi)[iui(x)]\displaystyle\forall\ h\in H,\sum_{i\in N}\int_{X}d\sigma(x)\cdot\left\langle\nabla_{i}h(x),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]\right\rangle =0.\displaystyle=0. (μ(h)\mu(h))

The Lagrangian dual of (20) may then be naively written,

supγ,μγ\displaystyle\sup_{\gamma,\mu}\gamma\ subject to (21)
xX,γ+H𝑑μ(h)iNih(x),Π𝒯𝒞Xi(xi)[iui(x)]\displaystyle\forall\ x\in X,\gamma+\int_{H}d\mu(h)\cdot\sum_{i\in N}\left\langle\nabla_{i}h(x),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]\right\rangle q(x).\displaystyle\leq q(x). (σ(x)\sigma(x))

Here, γ\gamma is some real number, while μ\mu is assumed to be “some measure” on HH. Of course, we may simply pick a dual solution μ\mu which places probability 11 on some element hHh\in H. Under such a restriction, the dual problem is then

supγ,hHγ\displaystyle\sup_{\gamma\in\mathbb{R},h\in H}\gamma\ subject to (22)
xX,γ+iNih(x),Π𝒯𝒞Xi(xi)[iui(x)]\displaystyle\forall\ x\in X,\gamma+\sum_{i\in N}\left\langle\nabla_{i}h(x),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]\right\rangle q(x).\displaystyle\leq q(x). (σ(x)\sigma(x))

Note that the dual is always feasible. Whether weak duality holds between the primal LP (20) and either one of the dual problems (21) or (22) depends on the specification of the vector spaces in which the primal and dual variables live, and as a consequence on whether they form a dual pair in the sense of [6, 75].

However, given a differentiable h:Xh:X\rightarrow\mathbb{R} with Lipschitz gradients, and the outcome of approximate gradient dynamics x:[0,τ¯]Xx:[0,\overline{\tau}]\rightarrow X, such dual arguments are always valid. The integral along the curve xx exists since it is differentiable almost everywhere on [0,τ¯][0,\overline{\tau}] by construction. In particular, if

γ=infxXq(x)iNih(x),Π𝒯𝒞Xi(xi)[iui(x)],\gamma=\inf_{x\in X}q(x)-\sum_{i\in N}\left\langle\nabla_{i}h(x),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]\right\rangle,

then the time expectation of q(x(τ))q(x(\tau)) satisfies

1τ¯0τ¯𝑑τq(x(τ))\displaystyle\frac{1}{\overline{\tau}}\int_{0}^{\overline{\tau}}d\tau\cdot q(x(\tau)) 1τ¯0τ¯𝑑τ(γ+iNih(x(τ)),Π𝒯𝒞Xi(xi(τ))[iui(x(τ))])\displaystyle\geq\frac{1}{\overline{\tau}}\int_{0}^{\overline{\tau}}d\tau\cdot\left(\gamma+\sum_{i\in N}\left\langle\nabla_{i}h(x(\tau)),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\tau))\right]\right\rangle\right)
γ1τ¯|0τ¯𝑑τiNih(x(τ)),Π𝒯𝒞Xi(xi(τ))[iui(x(τ))]|\displaystyle\geq\gamma-\frac{1}{\overline{\tau}}\left|\int_{0}^{\overline{\tau}}d\tau\cdot\sum_{i\in N}\left\langle\nabla_{i}h(x(\tau)),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[\nabla_{i}u_{i}(x(\tau))\right]\right\rangle\right|
γpoly(G,L,Gh,Lh)ϵ\displaystyle\geq\gamma-\textnormal{poly}(\vec{G},\vec{L},G_{h},L_{h})\cdot\epsilon

for step sizes $\eta_{\tau}=1/\sqrt{\tau+1}$, where $\epsilon$ and $\textnormal{poly}(\vec{G},\vec{L},G_{h},L_{h})$ are given as in Theorems 3.4 or 3.8. Moreover, whenever $q(x)$ is Lipschitz continuous, the order of the dependence of the convergence rate on $T$ is maintained also for projected gradient ascent. This can be shown via a trivial bound on the difference between the expectations of $q(x)$ under the two resulting distributions; note that for each $\tau\in[0,\overline{\tau}]$,

|q(x(τ))q(x(τ¯))|\displaystyle|q(x(\tau))-q(x(\underline{\tau}))| Gqx(τ)x(τ¯)1μτ(ττ¯)GqiNGi.\displaystyle\leq G_{q}\|x(\tau)-x(\underline{\tau})\|\leq\frac{1}{\mu_{\tau}}\cdot(\tau-\underline{\tau})\cdot G_{q}\sum_{i\in N}G_{i}.

Therefore,

1τ¯|t=0T1μtηtq(xt)0τ¯𝑑τq(x(τ))|\displaystyle\frac{1}{\overline{\tau}}\left|\sum_{t=0}^{T-1}\mu_{t}\eta_{t}q(x^{t})-\int_{0}^{\overline{\tau}}d\tau\cdot q(x(\tau))\right|\ 1τ¯t=0T10μtηt𝑑τ1μtτ(GqiNGi)\displaystyle\leq\frac{1}{\overline{\tau}}\cdot\sum_{t=0}^{T-1}\int_{0}^{\mu_{t}\eta_{t}}d\tau\cdot\frac{1}{\mu_{t}}\cdot\tau\cdot\left(G_{q}\sum_{i\in N}G_{i}\right)
1τ¯(t=0T112ηt2μt)Gqpoly(G,L,Gh,Lh).\displaystyle\leq\frac{1}{\overline{\tau}}\left(\sum_{t=0}^{T-1}\frac{1}{2}\eta_{t}^{2}\mu_{t}\right)\cdot G_{q}\cdot\textnormal{poly}(\vec{G},\vec{L},G_{h},L_{h}). (23)
Theorem 5.1.

Suppose that q(x)q(x) is a continuous function, that all XiX_{i} satisfy either the conditions of Theorem 3.4 or Theorem 3.8, and that (γ,h)(\gamma,h) is a solution of (22). Then with step-sizes ηt=1/t+1\eta_{t}=1/\sqrt{t+1}, after TT steps, the outcome of approximate μ\mu-accelerated projected gradient dynamics x:[0,τ¯]Xx:[0,\overline{\tau}]\rightarrow X satisfies

1τ¯0τ¯𝑑τq(x(τ))γpoly(G,L,Gh,Lh)O~(1/T).\frac{1}{\overline{\tau}}\int_{0}^{\overline{\tau}}d\tau\cdot q(x(\tau))\geq\gamma-\textnormal{poly}(\vec{G},\vec{L},G_{h},L_{h})\cdot\tilde{O}\left(1/\sqrt{T}\right).

If also qq is Lipschitz continuous with modulus GqG_{q}, then the outcome of projected gradient ascent (xt)t=0T(x^{t})_{t=0}^{T} satisfies, for the choices μt=1\mu_{t}=1 or μt=1/ηt\mu_{t}=1/\eta_{t},

1τ¯t=0T1μtηtq(xt)γpoly(G,L,Gh,Lh,Gq)O~(1/T).\frac{1}{\overline{\tau}}\sum_{t=0}^{T-1}\mu_{t}\eta_{t}\cdot q(x^{t})\geq\gamma-\textnormal{poly}(\vec{G},\vec{L},G_{h},L_{h},G_{q})\cdot\tilde{O}\left(1/\sqrt{T}\right).

Thus, given a dual solution $(\gamma,h)$ to (22), the function $h$ can be viewed as a proof certificate that the value of (20) is at least $\gamma$, encoding the performance guarantee with respect to the expectation of $q$ when all players employ projected gradient ascent with equal step sizes. The choice $\mu_{t}=1$ in this case tracks bounds on the expectation of $q$ when we consider an approximation to the actual gradient dynamics, whereas the choice $\mu_{t}=1/\eta_{t}$ provides guarantees in time averages. It remains to give an interpretation to this function.

To wit, consider the case when our smooth game has a unique first-order NE $x^{*}\in X$, and suppose $q(x)\leq 0$ for every $x\in X$, with equality if and only if $x=x^{*}$. Furthermore, suppose that $(0,h)$ is an optimal solution to (22). Then the dual feasibility conditions imply that

iNih(x),Π𝒯𝒞Xi(xi)[iui(x)]\displaystyle\sum_{i\in N}\left\langle\nabla_{i}h(x),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]\right\rangle q(x)<0xX,xx\displaystyle\leq q(x)<0\ \forall\ x\in X,x\neq x^{*}
iNih(x),Π𝒯𝒞Xi(xi)[iui(x)]\displaystyle\sum_{i\in N}\left\langle\nabla_{i}h(x^{*}),\Pi_{\mathcal{TC}_{X_{i}}\left(x^{*}_{i}\right)}\left[\nabla_{i}u_{i}(x^{*})\right]\right\rangle =q(x)=0.\displaystyle=q(x^{*})=0.

Now, recall that for a dynamical system defined by the differential equation x˙(τ)=Π𝒯𝒞X(x(τ))[f(x(τ))]\dot{x}(\tau)=\Pi_{\mathcal{TC}_{X}\left(x(\tau)\right)}\left[f(x(\tau))\right] and a unique equilibrium xx^{*}, a function V:X+V:X\rightarrow\mathbb{R}_{+} is called a (global) Lyapunov function if dV(x(τ))/dτ0dV(x(\tau))/d\tau\leq 0, with equality only if x(τ)=xx(\tau)=x^{*}. By (5), we then infer that the function hh is a Lyapunov function in the usual sense.
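This correspondence can be illustrated with a toy single-player instance (the data below is chosen for this sketch, not taken from the text): on $X=[-1,1]$ with $u(x)=-x^{2}/4$, the unique first-order NE is $x^{*}=0$; taking $q(x)=-x^{2}/2$ and $h(x)=x^{2}/2$ gives $\langle\nabla h(x),\nabla u(x)\rangle=x\cdot(-x/2)=q(x)$, so $(0,h)$ is dual feasible, and $h$ indeed never increases along projected gradient ascent:

```python
import math

# Illustrative single-player game (not from the text): X = [-1, 1],
# u(x) = -x^2/4, unique first-order NE x* = 0, metric q(x) = -x^2/2.
# h(x) = x^2/2 satisfies <h'(x), u'(x)> = x * (-x/2) = q(x), i.e. (0, h)
# is dual feasible, and h acts as a Lyapunov function for the dynamics.
def grad_u(x): return -x / 2.0
def q(x): return -x * x / 2.0
def h(x): return x * x / 2.0

x, num, den = 0.9, 0.0, 0.0
h_prev = h(x)
for t in range(2000):
    eta = 1.0 / math.sqrt(t + 1)
    num, den = num + eta * q(x), den + eta          # weights mu_t = 1
    x = max(-1.0, min(1.0, x + eta * grad_u(x)))    # projected ascent step
    assert h(x) <= h_prev + 1e-12                   # h is non-increasing
    h_prev = h(x)

avg_q = num / den   # approaches the dual value gamma = 0 as T grows
```

Here the weighted time average of $q$ approaches the dual value $\gamma=0$ from below, certifying convergence to $x^{*}$.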

A similar observation holds for an $\epsilon$-local CCE when the set of functions $H$ is such that for each $h\in H$, $\nabla h$ is a tangential vector field. Recall that this condition implies that $\nabla h(x)\in\mathcal{TC}_{X}\left(x\right)$ for every $x\in X$, i.e. $\nabla h(x)=\Pi_{\mathcal{TC}_{X}\left(x\right)}\left[\nabla h(x)\right]$. As a consequence, the associated primal problem is given by

infσ0X𝑑σ(x)q(x)\displaystyle\inf_{\sigma\geq 0}\int_{X}d\sigma(x)\cdot q(x)\ subject to (24)
X𝑑σ(x)\displaystyle\int_{X}d\sigma(x) =1\displaystyle=1 (γ\gamma)
hH,iNX𝑑σ(x)ih(x),iui(x)\displaystyle\forall\ h\in H,\sum_{i\in N}\int_{X}d\sigma(x)\cdot\left\langle\nabla_{i}h(x),\nabla_{i}u_{i}(x)\right\rangle 0.\displaystyle\leq 0. (μ(h)\mu(h))

And then, by similar arguments, we state the dual problem as

supγ,hHγ\displaystyle\sup_{\gamma\in\mathbb{R},h\in H}\gamma\ subject to (25)
xX,γiNih(x),iui(x)\displaystyle\forall\ x\in X,\gamma-\sum_{i\in N}\left\langle\nabla_{i}h(x),\nabla_{i}u_{i}(x)\right\rangle q(x),\displaystyle\leq q(x), (σ(x)\sigma(x))

where HH is a set of functions with tangential gradient fields. An analogue of Theorem 5.1 is immediate through an identical line of arguments.

Corollary 5.2.

In the setting of Theorem 5.1, whenever (γ,h)(\gamma,h) is instead a solution of (25), the same guarantees hold for the expectation of qq for the outcome of approximate μ\mu-accelerated projected gradient dynamics or projected gradient ascent.

Moreover, we may again check what a dual solution which proves convergence to a unique Nash equilibrium xx^{*} looks like. In this case, dual feasibility of a dual solution (0,h)(0,h) to (25) implies that

-\sum_{i\in N}\left\langle\nabla_{i}h(x),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]\right\rangle\leq-\sum_{i\in N}\left\langle\nabla_{i}h(x),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]+\Pi_{\mathcal{NC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]\right\rangle
=\sum_{i\in N}\left\langle-\nabla_{i}h(x),\nabla_{i}u_{i}(x)\right\rangle\leq q(x)<0\ \forall\ x\in X,x\neq x^{*}
\sum_{i\in N}\left\langle\nabla_{i}h(x^{*}),\Pi_{\mathcal{TC}_{X_{i}}\left(x^{*}_{i}\right)}\left[\nabla_{i}u_{i}(x^{*})\right]\right\rangle=q(x^{*})=0.

Here, the first inequality follows since for any pair of vectors v𝒯𝒞X(x)v\in\mathcal{TC}_{X}\left(x\right) and w𝒩𝒞X(x)w\in\mathcal{NC}_{X}\left(x\right), v,w0\left\langle v,w\right\rangle\leq 0. The final equality, in turn, holds as xx^{*} is a first-order NE by assumption. Thus, again, the function h-h is a Lyapunov function in the usual sense.

For a more general performance metric $q$, we shall thus call a dual solution $h$ a (generalised) Lyapunov function, following the nomenclature in [43]. The insight is that not only can such functions describe worst- and best-case behaviour of approximate or exact gradient dynamics of the game, as far as bounding the performance of $\epsilon$-approximate first-order CCE is concerned; they also arise naturally through the primal-dual framework associated with such first-order CCE.

Moving onwards, we first note that Corollary 5.2 allows us to fulfill a leftover promise from Section 3.4. In particular, when all players use the very same step sizes, Theorem 5.1 and Corollary 5.2 provide performance guarantees over $q$ for the time-average behaviour, as well as for approximations of the gradient dynamics obtained from the sequence $(x^{t})_{t\in\mathbb{N}}$. However, these bounds follow “as-if” the dynamics trace the continuous curve $x$ we considered in Sections 3.2 & 3.3; when the step sizes may differ or some players are adversarial, such a single curve can no longer be constructed, per our discussion in Section 3.4. Nevertheless, regret bounds similar to those of Theorem 3.9 may be established for the sequence $(x^{t})_{t\in\mathbb{N}}$ through a bootstrap argument on Corollary 5.2, when the functions $h\in H$ are tangential.

Theorem 5.3.

Suppose a subset of players NN^{\prime} employ projected gradient ascent with equal step sizes ηt\eta_{t}, with players NNN\setminus N^{\prime} potentially adversarial. Furthermore, suppose that for any iNi\in N^{\prime}, XiX_{i} either has smooth boundary of bounded principal curvature KiK_{i}, or is an acute polytope (where we fix Ki=0K_{i}=0). Then for any T>0T>0, for any tangential function hh with bounded, Lipschitz gradients such that jh(x)=0\nabla_{j}h(x)=0 for any jNNj\in N\setminus N^{\prime} and any xXx\in X,

t=0T1iNih(xt),iui(xt)\displaystyle\sum_{t=0}^{T-1}\sum_{i\in N}\left\langle\nabla_{i}h(x^{t}),\nabla_{i}u_{i}(x^{t})\right\rangle
\displaystyle\leq\ 2d(X)Gh(1/ηT1+1/η0)T+12T(t=0T1ηt)GhiN(KiGi2+LijNGj)\displaystyle\frac{2d(X)G_{h}(1/\eta_{T-1}+1/\eta_{0})}{T}+\frac{1}{2T}\cdot\left(\sum_{t=0}^{T-1}\eta_{t}\right)\cdot G_{h}\sum_{i\in N^{\prime}}\left(K_{i}G_{i}^{2}+L_{i}\sum_{j\in N^{\prime}}G_{j}\right)
+12T(t=0T1ηt)(iNLhGi+GhLi)(iNGi).\displaystyle+\frac{1}{2T}\cdot\left(\sum_{t=0}^{T-1}\eta_{t}\right)\cdot\left(\sum_{i\in N}L_{h}G_{i}+G_{h}L_{i}\right)\cdot\left(\sum_{i\in N^{\prime}}G_{i}\right).

Thus, the aggregate regret against such $h$ vanishes in the limit $T\rightarrow\infty$ whenever $1/\eta_{T-1},\sum_{t=0}^{T-1}\eta_{t}=o(T)$, and a choice $\eta_{t}\propto 1/\sqrt{t+1}$ guarantees first-order regret $O(1/\sqrt{T})$ against such $h$.

Proof.

The desired result follows from the “retraction argument” in (23), noting that for tangential $h$, the first-order regret is Lipschitz continuous. The first two terms are from Theorem 3.9. For the third term, invoke Corollary 5.2 on the restricted time-averaged projected gradient dynamics considered in Section 3.4, evaluating each integral over $[t,t+1)$ and summing over $t$, and fixing $q(x)=\sum_{i\in N^{\prime}}\left\langle-\nabla_{i}h(x),\nabla_{i}u_{i}(x)\right\rangle$. By the Lipschitz properties of $h$ and the utilities, $q$ is then Lipschitz continuous with modulus $G_{q}=\sum_{i\in N}(L_{h}G_{i}+G_{h}L_{i})$, and $\|x(\tau)-x^{t}\|\leq\sum_{i\in N^{\prime}}G_{i}\cdot\eta_{t}(\tau-t)$ for any $\tau\in[t,t+1)$. Then note that for our $q$, $(0,h)$ is a solution of (25), which ensures the regret bound. ∎

We remark that if all players implement projected gradient ascent with the same step sizes, Theorem 5.3 implies that the actual time-average play is also an approximate local CCE, with respect to the set of all suitably smooth tangential functions. In particular, we lose the much stronger guarantees with respect to first-order stationarity, or with respect to local first-order regret generated by functions which may require projections. Nevertheless, the “as-if” results of Theorem 5.1 and Corollary 5.2 yield valid bounds on best- and worst-case expectations of a function $q:X\rightarrow\mathbb{R}$ for the actual time-average play, despite the fact that the distribution obtained might not actually be an approximate first-order equilibrium.

A natural question to pose now is whether such generalised Lyapunov functions can be found to prove meaningful bounds. We illustrate how this can be done in the following example. The main takeaway is that, whereas generalised Lyapunov functions can indeed certify properties of the dynamics of a game, constructing and checking the validity of dual solutions can be rather laborious.

Example 2 (Matching Pennies).

Consider the canonical example of a game in which the gradient dynamics necessarily cycle, matching pennies. Up to reparametrisation of the strategy space, there are two players, whose utility functions $[-1,1]^{2}\rightarrow\mathbb{R}$ are given by

u1(x1,x2)\displaystyle u_{1}(x_{1},x_{2}) =x1x2,\displaystyle=-x_{1}x_{2},
u2(x1,x2)\displaystyle u_{2}(x_{1},x_{2}) =x1x2.\displaystyle=x_{1}x_{2}.

The unconstrained projected gradient dynamics then satisfy, if $r(0)^{2}=x_{1}^{2}(0)+x_{2}^{2}(0)\leq 1$, $x_{1}(\tau)=r\cdot\cos(\tau+\phi),x_{2}(\tau)=r\cdot\sin(\tau+\phi)$ for some “phase angle” $\phi$. If $r(0)>1$, however, projections at the edges of the box suppress the radius of motion down to $1$ by some time $\tau^{\prime}>0$, and for each $\tau\geq\tau^{\prime}$, $(x_{1}(\tau),x_{2}(\tau))=(\cos(\tau+\phi),\sin(\tau+\phi))$. Therefore, any stationary probability distribution $\sigma\in\Delta([-1,1]^{2})$ must be rotationally symmetric, with support on the disc $D=\{(x_{1},x_{2})\ |\ x_{1}^{2}+x_{2}^{2}\leq 1\}$. The goal here is to present the form of generalised Lyapunov functions $h$ which certify these bounds, at least approximately, such that they provide convergence guarantees also for approximate projected gradient dynamics.
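This picture can be checked with a short simulation; the sketch below (an illustration, not part of the formal analysis) runs discrete projected gradient ascent from the corner $(1,1)$ with step sizes $\eta_{t}=1/\sqrt{t+1}$ and observes the radius being driven towards $1$:

```python
import math

def clip(v):                       # Euclidean projection onto [-1, 1]
    return max(-1.0, min(1.0, v))

# Matching pennies on [-1,1]^2: u1 = -x1*x2, u2 = x1*x2.
x1, x2 = 1.0, 1.0                  # start at a corner, radius sqrt(2) > 1
for t in range(200000):
    eta = 1.0 / math.sqrt(t + 1)
    # Simultaneous projected ascent steps along each player's gradient.
    x1, x2 = clip(x1 - eta * x2), clip(x2 + eta * x1)

radius = math.hypot(x1, x2)        # hovers near 1 once eta is small
```

The discrete iterates spiral slightly outward in the interior (by a factor $\sqrt{1+\eta_{t}^{2}}$ per step) and are pulled back by the box projections, so the radius settles near $1$ up to a discretisation error vanishing with the step size.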

For the bound on the radius, by the duality based arguments in this section, we seek a suitable function h:[1,1]2h:[-1,1]^{2}\rightarrow\mathbb{R} and a small δ>0\delta>0 such that

\max_{x_{1},x_{2}\in[-1,1]}x_{1}^{2}+x_{2}^{2}-\underbrace{\left(\nabla_{1}h(x)\Pi_{\mathcal{TC}_{[-1,1]}\left(x_{1}\right)}\left[-x_{2}\right]+\nabla_{2}h(x)\Pi_{\mathcal{TC}_{[-1,1]}\left(x_{2}\right)}\left[x_{1}\right]\right)}_{``dh(x)/d\tau^{\prime\prime}}\leq 1+\delta.

Here, we abuse notation by considering time dependent quantities given suitable initial conditions for the projected gradient dynamics. Now, consider setting,

h(x)={0x12+x221M1(r1)21+M1(r1)M2arctan(x1/x2)x12+x22>1,x1x2>0M1(r1)21+M1(r1)M2arctan(x2/x1)x12+x22>1,x1x20h(x)=\begin{cases}0&x_{1}^{2}+x_{2}^{2}\leq 1\\ -\frac{M_{1}(r-1)^{2}}{1+M_{1}(r-1)}\cdot M_{2}\cdot\arctan(x_{1}/x_{2})&x_{1}^{2}+x_{2}^{2}>1,x_{1}x_{2}>0\\ -\frac{M_{1}(r-1)^{2}}{1+M_{1}(r-1)}\cdot M_{2}\cdot\arctan(-x_{2}/x_{1})&x_{1}^{2}+x_{2}^{2}>1,x_{1}x_{2}\leq 0\end{cases}

for some constants M1,M2>0M_{1},M_{2}>0 to be determined later. By construction, hh is continuously differentiable with Lipschitz gradients. On the disc DD, h(x)=(0,0)\nabla h(x)=(0,0), whereas for any x12+x22>1x_{1}^{2}+x_{2}^{2}>1, by symmetry of the dynamics under rotations by π/2\pi/2, it is sufficient to consider the case when x1,x2>0x_{1},x_{2}>0. In this case, if x2<1x_{2}<1, then no projections occur, and

dh(x)dτ=ddτ[M1(r1)21+M1(r1)M2arctan(x1/x2)]=M1M2(r1)21+M1(r1),\displaystyle\frac{dh(x)}{d\tau}=\frac{d}{d\tau}\left[-\frac{M_{1}(r-1)^{2}}{1+M_{1}(r-1)}\cdot M_{2}\cdot\arctan(x_{1}/x_{2})\right]=\frac{M_{1}M_{2}(r-1)^{2}}{1+M_{1}(r-1)},

in which case we have

r2dh(x)/dτ=r2M2M1(r1)21+M1(r1).r^{2}-dh(x)/d\tau=r^{2}-\frac{M_{2}M_{1}(r-1)^{2}}{1+M_{1}(r-1)}.

We would like to bound this quantity. Denote x=r1x=r-1, and suppose that x1/M1x\leq 1/M_{1}. Then

(1+x)2M1M2x21+M1x\displaystyle(1+x)^{2}-\frac{M_{1}M_{2}x^{2}}{1+M_{1}x} (1+x)2M1M2x22=(1M1M22)x2+2x+1.\displaystyle\leq(1+x)^{2}-\frac{M_{1}M_{2}x^{2}}{2}=\left(1-\frac{M_{1}M_{2}}{2}\right)x^{2}+2x+1.

This bound is maximised, when M1M2/2>1M_{1}M_{2}/2>1, whenever x=1/(1+M1M2/2)x=1/(-1+M_{1}M_{2}/2), with value 1+2/(M1M22)1+2/(M_{1}M_{2}-2). Meanwhile, if x>1/M1x>1/M_{1}, then 1<M1x1<M_{1}x, and

(1+x)2M1M2x21+M1x\displaystyle(1+x)^{2}-\frac{M_{1}M_{2}x^{2}}{1+M_{1}x} (1+x)2M1M2x22M1x=x2+(2M22)x+1.\displaystyle\leq(1+x)^{2}-\frac{M_{1}M_{2}x^{2}}{2M_{1}x}=x^{2}+\left(2-\frac{M_{2}}{2}\right)x+1.

This function is convex, so it attains its maximum at an endpoint of the feasible interval $x\in[0,\sqrt{2}-1]$. When $x=0$ the bound equals $1$, whereas if $x=\sqrt{2}-1$, whenever

(21)2+(21)(2M22)+11M22(2+1)(\sqrt{2}-1)^{2}+(\sqrt{2}-1)\left(2-\frac{M_{2}}{2}\right)+1\leq 1\Leftrightarrow M_{2}\geq 2(\sqrt{2}+1)

the bound is 1\leq 1. As the bounds will generally improve in M1M_{1}, we will also fix M2=10M_{2}=10 at this point for convenience.

If instead x2=1x_{2}=1 then ``dx2/dτ′′=Π𝒯𝒞[1,1](x2)[x1]=0``dx_{2}/d\tau^{\prime\prime}=\Pi_{\mathcal{TC}_{[-1,1]}\left(x_{2}\right)}\left[x_{1}\right]=0, and we have

h(x)\displaystyle h(x) =M1(1+x121)21+M1(1+x121)10arctan(x1),\displaystyle=-\frac{M_{1}\left(\sqrt{1+x_{1}^{2}}-1\right)^{2}}{1+M_{1}(\sqrt{1+x_{1}^{2}}-1)}\cdot 10\arctan(x_{1}),
dh(x)dτ\displaystyle\frac{dh(x)}{d\tau} =10M1(dx1dτ)=1ddx1[arctan(x1)(1+x121)21+M1(1+x121)]\displaystyle=-10M_{1}\cdot\underbrace{\left(\frac{dx_{1}}{d\tau}\right)}_{=-1}\cdot\frac{d}{dx_{1}}\left[\frac{\arctan(x_{1})(\sqrt{1+x_{1}^{2}}-1)^{2}}{1+M_{1}(\sqrt{1+x_{1}^{2}}-1)}\right]
=10M1(1+x121)1+x12(1+M1(1+x121))\displaystyle=\frac{10M_{1}(\sqrt{1+x_{1}^{2}}-1)}{\sqrt{1+x_{1}^{2}}(1+M_{1}(\sqrt{1+x_{1}^{2}}-1))}
[1+x1211+x12+2arctan(x1)x1+M1x1arctan(x1)(1+x121)1+M1(1+x121)]\displaystyle\quad\cdot\left[\frac{\sqrt{1+x_{1}^{2}}-1}{\sqrt{1+x_{1}^{2}}}+\frac{2\arctan(x_{1})x_{1}+M_{1}x_{1}\arctan(x_{1})(\sqrt{1+x_{1}^{2}}-1)}{1+M_{1}(\sqrt{1+x_{1}^{2}}-1)}\right]
10M1(r1)r(1+M1(r1))[11r]\displaystyle\geq\frac{10M_{1}(r-1)}{r(1+M_{1}(r-1))}\left[1-\frac{1}{r}\right]
5M1(r1)2(1+M1(r1)),\displaystyle\geq\frac{5M_{1}(r-1)^{2}}{(1+M_{1}(r-1))},

where, as $r\leq\sqrt{2}$, $1-1/r\geq(r-1)/\sqrt{2}$. To bound $r^{2}-dh(x)/d\tau$, we again consider the cases $r-1<1/M_{1}$ and $r-1\geq 1/M_{1}$. In the former case,

r25M1(r1)2(1+M1(r1))\displaystyle r^{2}-\frac{5M_{1}(r-1)^{2}}{(1+M_{1}(r-1))} r25M1(r1)22,\displaystyle\leq r^{2}-\frac{5M_{1}(r-1)^{2}}{2},

which is maximised at $r=1+\frac{2}{5M_{1}-2}$, with value $1+\frac{2}{5M_{1}-2}$. Meanwhile, if $r-1\geq 1/M_{1}$, then

r25M1(r1)21+M1(r1)\displaystyle r^{2}-\frac{5M_{1}(r-1)^{2}}{1+M_{1}(r-1)} r25(r1)2=(1+x)25x2,\displaystyle\leq r^{2}-\frac{5(r-1)}{2}=(1+x)^{2}-\frac{5x}{2},

defining x=r1[0,21]x=r-1\in[0,\sqrt{2}-1]. Again by the convexity of the expression, it is sufficient to check its values at its endpoints; setting x=0x=0 results in a bound of 11, while setting x=21x=\sqrt{2}-1 results in a bound 25(21)/2<12-5(\sqrt{2}-1)/2<1. As a consequence, hh proves a bound of

\delta=\max\left\{\frac{2}{10M_{1}-2},\frac{2}{5M_{1}-2}\right\}=\frac{2}{5M_{1}-2}.

Now, note that the convergence bounds for approximate $\mu$-accelerated projected gradient dynamics proven in Theorem 3.8 depend linearly on $G_{h}$. However, $G_{h}$ admits a constant bound independent of the choice of $M_{1}$ in this setting. As a consequence, $h$ can be chosen as an arbitrarily tight dual solution for the gradient dynamics of the matching pennies game by letting $M_{1}\rightarrow\infty$, and approximate $\mu$-accelerated projected gradient dynamics after time $\overline{\tau}$ with step sizes $\eta_{\tau}\sim\frac{1}{\sqrt{\tau+1}}$ results in a distribution with expected square radius at most $1+O(\log(\overline{\tau})/\sqrt{\overline{\tau}})$, whether we pick $\mu_{\tau}=1$ or $\mu_{\tau}=1/\eta_{\tau}$.
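The dual feasibility of this $h$ can also be sanity-checked numerically. The sketch below (with the illustrative choices $M_{1}=50$, $M_{2}=10$) evaluates $r^{2}-dh(x)/d\tau$ over a grid, approximating $dh/d\tau$ by a finite difference of $h$ along the projected vector field, and confirms the bound $1+2/(5M_{1}-2)$:

```python
import math

M1, M2 = 50.0, 10.0                  # illustrative constants

def h(x1, x2):                       # the candidate dual function above
    r = math.hypot(x1, x2)
    if r <= 1.0:
        return 0.0
    c = M1 * (r - 1.0) ** 2 / (1.0 + M1 * (r - 1.0))
    return -c * M2 * (math.atan(x1 / x2) if x1 * x2 > 0 else math.atan(-x2 / x1))

def tangent(x, g):                   # projection of g onto the tangent cone of [-1,1] at x
    if (x >= 1.0 and g > 0) or (x <= -1.0 and g < 0):
        return 0.0
    return g

eps, worst = 1e-6, -float("inf")
grid = [i / 10.0 for i in range(-10, 11)]
for x1 in grid:
    for x2 in grid:
        v1, v2 = tangent(x1, -x2), tangent(x2, x1)   # projected utility field
        dh = (h(x1 + eps * v1, x2 + eps * v2) - h(x1, x2)) / eps
        worst = max(worst, x1 * x1 + x2 * x2 - dh)

delta = 2.0 / (5.0 * M1 - 2.0)       # the bound derived above
```

Over this grid the worst value of $r^{2}-dh/d\tau$ stays below $1+\delta$, as the analysis predicts.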

Finally, we would like to provide an approximate dual proof that any stable distribution is approximately rotationally invariant. Towards this end, consider a function $q:[-1,1]^{2}\rightarrow\mathbb{R}$ of the form $q(x_{1},x_{2})=p(r)\cdot\sin(k\cdot\theta+\phi)$, where $p:[0,\sqrt{2}]\rightarrow\mathbb{R}$ is differentiable with $p(0)=0$, $\theta$ is the angle in polar coordinates corresponding to $x_{1}$ and $x_{2}$, $k\in\mathbb{Z}\setminus\{0\}$ is the associated frequency, and $\phi\in\mathbb{R}$ is again some phase angle. For any rotationally symmetric stationary distribution $\sigma\in\Delta([-1,1]^{2})$, via Fourier decomposition-based arguments, it must be the case that $\mathbb{E}_{x\sim\sigma}[q(x)]=0$.

Towards this end, let $\ell(x_{1},x_{2})=-p(r)\cdot\cos(k\theta+\phi)/k$. In the region without projections, $d\ell(x)/d\tau=q(x)$, and therefore $\ell$ forms an exact Lyapunov function for $q$ whenever projections are not needed. To handle the region where projections are needed, consider $\ell+A\cdot h$ for the $h$ previously defined, for some choice of $M_{1}\leftarrow M>0$ and some $A>0$; by the symmetry of the problem, we may without loss of generality restrict attention to the case when $x_{2}=1,x_{1}>0$. In this case,

\frac{dr}{d\tau}=\frac{d}{d\tau}\sqrt{1+x_{1}^{2}}=\frac{-x_{1}}{r},\qquad\frac{d\theta}{d\tau}=\frac{d}{d\tau}\left[\pi/2-\arctan(x_{1})\right]=\frac{1}{1+x_{1}^{2}}.

Therefore,

\frac{d(\ell+Ah)(x)}{d\tau}=\underbrace{\frac{x_{1}p^{\prime}(r)\cos(k\theta+\phi)}{k\sqrt{1+x_{1}^{2}}}+\frac{q(x)}{1+x_{1}^{2}}}_{(\ast)}+A\frac{dh(x)}{d\tau}.

Note that by a similar treatment as the previous setting, via considering MM\rightarrow\infty and picking AA sufficiently large, we may acquire a family of dual solutions which prove convergence. In the interest of brevity, we shall argue about the asymptotic case. In the large MM limit, dh(x)/dτ5dh(x)/d\tau\geq 5 whenever 1x1>0,x2=11\geq x_{1}>0,x_{2}=1, while (\ast) is bounded on that set by q(x)±Cx1q(x)\pm Cx_{1} for some constant CC. Therefore, a fixed choice of constant AA is sufficient to provide a family of increasingly tighter dual solutions for bounds on the expected value of q(x)q(x). In particular, we again conclude that approximate μ\mu-accelerated projected gradient dynamics, after TT time steps and step sizes ηt1t+1\eta_{t}\sim\frac{1}{\sqrt{t+1}}, provides a probability distribution for which q(x)O(log(T)/T)q(x)\leq O(\log(T)/\sqrt{T}) in expectation.

We end this section with a short note on the associated primal-dual framework for first-order correlated equilibria. Given a finite set of continuously differentiable vector fields $F$, the analogue of (20) for stationary CE is given by

infσ0X𝑑σ(x)q(x)\displaystyle\inf_{\sigma\geq 0}\int_{X}d\sigma(x)\cdot q(x)\ subject to (26)
X𝑑σ(x)\displaystyle\int_{X}d\sigma(x) =1\displaystyle=1 (γ\gamma)
fF,iNX𝑑σ(x)fi(x),Π𝒯𝒞Xi(xi)[iui(x)]\displaystyle\forall\ f\in F,\sum_{i\in N}\int_{X}d\sigma(x)\cdot\left\langle f_{i}(x),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]\right\rangle =0,\displaystyle=0, (μf\mu_{f})

and its Lagrangian dual is simply

supγ,μγ\displaystyle\sup_{\gamma,\mu}\gamma\ subject to (27)
\forall\ x\in X,\gamma+\sum_{f\in F,i\in N}\left\langle\mu_{f}f_{i}(x),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]\right\rangle\leq q(x). (\sigma(x))

We interpret this as a vector field alignment problem. In particular, again suppose that the game has a unique first-order Nash equilibrium $x^{*}$, and $q(x)\leq 0$ with equality only if $x=x^{*}$. Then if $(0,\mu)$ is a solution of (27), the vector field $\hat{f}=-\sum_{f\in F}\mu_{f}f$ satisfies

\sum_{i\in N}\left\langle\hat{f}_{i}(x),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]\right\rangle\geq-q(x)>0\ \forall\ x\in X,x\neq x^{*},
\sum_{i\in N}\left\langle\hat{f}_{i}(x^{*}),\Pi_{\mathcal{TC}_{X_{i}}\left(x^{*}_{i}\right)}\left[\nabla_{i}u_{i}(x^{*})\right]\right\rangle=0.

In particular, the set of stationary CE with respect to FF is the set of probability distributions over the first-order NE of the game whenever the linear span of FF contains a vector field f^\hat{f} such that f^(x(τ)),x˙(τ)0\left\langle\hat{f}(x(\tau)),\dot{x}(\tau)\right\rangle\geq 0, with equality only if x(τ)x(\tau) is an equilibrium point for the dynamics. In words, the vector field fFμff-\sum_{f\in F}\mu_{f}f is suitably aligned with the utility gradient at every non-equilibrium point xx, whether the gradient dynamics cycle or not. This does not necessarily provide an obvious computational advantage though, since whenever FF is rich enough we may set fFμffi=iui-\sum_{f\in F}\mu_{f}f_{i}=\nabla_{i}u_{i} (cf. remark after Example 1).

When FF is a set of tangential vector fields, an analogous argument for the case of local CCE shows that the dual problem associated with a local CE similarly looks for a vector field that is positively aligned with the utility gradients in the conical span of FF instead. Explicitly, the primal problem is

infσ0X𝑑σ(x)q(x)\displaystyle\inf_{\sigma\geq 0}\int_{X}d\sigma(x)\cdot q(x)\ subject to (28)
X𝑑σ(x)\displaystyle\int_{X}d\sigma(x) =1\displaystyle=1 (γ\gamma)
fF,iNX𝑑σ(x)fi(x),iui(x)\displaystyle\forall\ f\in F,\sum_{i\in N}\int_{X}d\sigma(x)\cdot\left\langle f_{i}(x),\nabla_{i}u_{i}(x)\right\rangle 0,\displaystyle\geq 0, (μf\mu_{f})

which now has a Lagrangian dual

supγ,μ0γ\displaystyle\sup_{\gamma\in\mathbb{R},\mu\geq 0}\gamma\ subject to (29)
\forall\ x\in X,-\gamma+\sum_{f\in F,i\in N}\left\langle\mu_{f}f_{i}(x),\nabla_{i}u_{i}(x)\right\rangle\geq-q(x). (\sigma(x))

In this case, when (0,μ)(0,\mu) is a solution and q(x)0q(x)\leq 0 for any xXx\in X with equality only at the first-order Nash equilibria of the game, we have

\sum_{f\in F,i\in N}\left\langle\mu_{f}f_{i}(x),\nabla_{i}u_{i}(x)\right\rangle\geq-q(x)>0\ \forall\ x\in X,x\neq x^{*},
\sum_{f\in F,i\in N}\left\langle\mu_{f}f_{i}(x^{*}),\nabla_{i}u_{i}(x^{*})\right\rangle=0.

We remark that the equality for $x^{*}$ must hold because $\sum_{f\in F}\mu_{f}f$ is tangential, while at a fixed point of the gradient dynamics of the game, $\nabla_{i}u_{i}$ is in the normal cone to $X_{i}$. This implies that $\sum_{f\in F,i\in N}\left\langle\mu_{f}f_{i}(x^{*}),\nabla_{i}u_{i}(x^{*})\right\rangle\leq 0$; however, dual feasibility implies that the inner product is also $\geq 0$.

5.2 First-order (coarse) correlated equilibrium in normal-form games

Finally, we turn our attention to the interpretation of first-order equilibria in normal-form games. We shall see that the usual notions of CE and CCE, as well as average CCE (in the sense of [64]), are in fact equivalent to appropriately defined notions of first-order correlated equilibria. The conclusion is that the usual primal-dual analyses of price of anarchy bounds in fact fall under the Lyapunov function or vector field fit estimation problems discussed in the previous section. Crucially, whenever a dual solution for first-order coarse correlated equilibrium certifies uniqueness of equilibrium, it necessarily assembles into a Lyapunov function in the traditional sense.

5.2.1 Equivalence of coarse correlated equilibria

We will begin by showing that the CCE of finite normal-form games and $(0,\Phi_{\textnormal{Proj}}(\delta))$-equilibria (in the sense of Definition 2) are equivalent; this was remarked by Constantinos Daskalakis in a private correspondence. Consider a finite normal-form game, with a set of players $N$, players’ pure action sets $(A_{i})_{i\in N}$, and utilities $(u_{i}:\times_{i\in N}A_{i}\rightarrow\mathbb{R})_{i\in N}$. The continuous extension of the game has action sets $X_{i}=\Delta(A_{i})$, the probability simplex over $A_{i}$, and utilities given via expectations, $u_{i}(x)=\sum_{a\in A}\left(\prod_{j\in N}x_{j}(a_{j})\right)u_{i}(a)$. Then, for an arbitrary $\delta\in(0,1)$, a $(0,\Phi_{\textnormal{Proj}}(\delta))$-local CE is a distribution $\sigma$ which satisfies

iN,xiXi,𝔼xσ[ui((1δ)xi+δxi,xi)ui(x)]0.\forall i\in N,\forall x^{*}_{i}\in X_{i},\mathbb{E}_{x\sim\sigma}[u_{i}((1-\delta)x_{i}+\delta x^{*}_{i},x_{-i})-u_{i}(x)]\leq 0.

Expanding the left-hand side, we have

\mathbb{E}_{x\sim\sigma}[u_{i}((1-\delta)x_{i}+\delta x^{*}_{i},x_{-i})-u_{i}(x)]
=\int_{X}d\sigma(x)\cdot\sum_{a\in A}\left(\prod_{j\in N\setminus\{i\}}x_{j}(a_{j})\right)\cdot\left((1-\delta)x_{i}(a_{i})+\delta x^{*}_{i}(a_{i})-x_{i}(a_{i})\right)\cdot u_{i}(a)
=\delta\cdot\sum_{a\in A}\left(\int_{X}d\sigma(x)\cdot(x^{*}_{i}(a_{i})-x_{i}(a_{i}))\cdot\left(\prod_{j\in N\setminus\{i\}}x_{j}(a_{j})\right)\right)\cdot u_{i}(a)
=\delta\cdot\sum_{a\in A}\left(\int_{X}d\sigma(x)\cdot\left[\sum_{a^{\prime}_{i}\in A_{i}}x^{*}_{i}(a^{\prime}_{i})\cdot(\delta(a^{\prime}_{i},a_{i})-x_{i}(a_{i}))\right]\cdot\left(\prod_{j\in N\setminus\{i\}}x_{j}(a_{j})\right)\right)\cdot u_{i}(a)
=\delta\cdot\sum_{a^{\prime}_{i}\in A_{i}}x^{*}_{i}(a^{\prime}_{i})\cdot\sum_{a\in A}\left(\int_{X}d\sigma(x)\cdot(\delta(a^{\prime}_{i},a_{i})-x_{i}(a_{i}))\cdot\left(\prod_{j\in N\setminus\{i\}}x_{j}(a_{j})\right)\right)\cdot u_{i}(a)
=\delta\cdot\sum_{a^{\prime}_{i}\in A_{i}}x^{*}_{i}(a^{\prime}_{i})\cdot\sum_{a\in A}\sigma^{\prime}(a)\cdot(u_{i}(a^{\prime}_{i},a_{-i})-u_{i}(a)),

where we write δ(ai,ai)\delta(a^{\prime}_{i},a_{i}) for the Kronecker delta and define σ(a)=X𝑑σ(x)(jNxj(aj))\sigma^{\prime}(a)=\int_{X}d\sigma(x)\cdot\left(\prod_{j\in N}x_{j}(a_{j})\right). The last equality follows since

aAX𝑑σ(x)δ(ai,ai)ui(a)(jN{i}xj(aj))\displaystyle\ \ \sum_{a\in A}\int_{X}d\sigma(x)\cdot\delta(a^{\prime}_{i},a_{i})\cdot u_{i}(a)\cdot\left(\prod_{j\in N\setminus\{i\}}x_{j}(a_{j})\right)
=\displaystyle= aiAiX𝑑σ(x)ui(ai,ai)(jN{i}xj(aj))\displaystyle\sum_{a_{-i}\in A_{-i}}\int_{X}d\sigma(x)\cdot u_{i}(a^{\prime}_{i},a_{-i})\cdot\left(\prod_{j\in N\setminus\{i\}}x_{j}(a_{j})\right)
=\displaystyle= aiAiX𝑑σ(x)ui(ai,ai)(jN{i}xj(aj))(aiAixi(ai))=1\displaystyle\sum_{a_{-i}\in A_{-i}}\int_{X}d\sigma(x)\cdot u_{i}(a^{\prime}_{i},a_{-i})\cdot\left(\prod_{j\in N\setminus\{i\}}x_{j}(a_{j})\right)\cdot\underbrace{\left(\sum_{a_{i}\in A_{i}}x_{i}(a_{i})\right)}_{=1}
=\sum_{a\in A}\sigma^{\prime}(a)\cdot u_{i}(a^{\prime}_{i},a_{-i}).

Then as a consequence, for each player ii and each xiΔ(Ai)x^{*}_{i}\in\Delta(A_{i}), the associated local CCE constraint is simply a convex combination of constraints

aiAi,aAσ(a)(ui(ai,ai)ui(a))0.\forall\ a^{\prime}_{i}\in A_{i},\sum_{a\in A}\sigma^{\prime}(a)\cdot(u_{i}(a^{\prime}_{i},a_{-i})-u_{i}(a))\leq 0.

This is precisely the usual coarse correlated equilibrium constraint for the finite normal-form game. Moreover, via our observation in (1), these equilibrium constraints are each generated by gradients of functions of the form $-\|x_{i}-e_{a^{\prime}_{i}}\|^{2}/2$. The conclusion is immediate.

Proposition 5.4.

For a normal-form game, consider the set of functions,

H={xieai2/2|iN,aiAi}.H=\{-\|x_{i}-e_{a^{\prime}_{i}}\|^{2}/2\ |\ i\in N,a^{\prime}_{i}\in A_{i}\}.

Suppose that $\sigma$ is a first-order local CCE with respect to $H$. Then the probability distribution $\sigma^{\prime}\in\Delta(A)$, defined by $\sigma^{\prime}(a)=\int_{X}d\sigma(x)\cdot\left(\prod_{j\in N}x_{j}(a_{j})\right)$, is a coarse correlated equilibrium in the usual sense. In turn, interpreting a coarse correlated equilibrium $\sigma^{*}$ of the normal-form game as a probability distribution on $X$ supported on pure action profiles, $\sigma^{*}$ is a first-order local CCE with respect to $H$.
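The algebraic identity underpinning this equivalence can be verified numerically. The sketch below (all game data and the finitely supported $\sigma$ are illustrative) checks, for one player of a random 2×2 game, that the expected local deviation gain equals $\delta$ times the corresponding convex combination of CCE constraints under $\sigma^{\prime}$:

```python
import random
random.seed(0)

# Random 2x2 game; U[a0][a1] is player 0's payoff (illustrative data).
U = [[random.random() for _ in range(2)] for _ in range(2)]

def u0(p, q):   # multilinear extension; p, q = P(action 0) of each player
    return sum(U[a][b] * (p if a == 0 else 1 - p) * (q if b == 0 else 1 - q)
               for a in (0, 1) for b in (0, 1))

# sigma: finitely supported distribution over mixed profiles (p, q).
sigma = [(0.5, (0.2, 0.7)), (0.3, (0.9, 0.4)), (0.2, (0.5, 0.5))]
delta, xstar = 0.3, (0.6, 0.4)   # deviation weight and mixture x*_0

lhs = sum(w * (u0((1 - delta) * p + delta * xstar[0], q) - u0(p, q))
          for w, (p, q) in sigma)

def sp(a0, a1):  # sigma'(a): induced distribution over pure profiles
    return sum(w * (p if a0 == 0 else 1 - p) * (q if a1 == 0 else 1 - q)
               for w, (p, q) in sigma)

rhs = delta * sum(xstar[ap] * sum(sp(a0, a1) * (U[ap][a1] - U[a0][a1])
                                  for a0 in (0, 1) for a1 in (0, 1))
                  for ap in (0, 1))
```

Both sides agree to machine precision, since both are exact multilinear expressions.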

We now consider what a primal-dual proof of uniqueness of such a local CCE looks like. Suppose that the probability distribution which places probability $1$ on $x^{*}\in X$ is the unique local CCE of the game; then the measure-valued primal problem

maxσ0X𝑑σ(x)xx2\displaystyle\max_{\sigma\geq 0}\int_{X}d\sigma(x)\cdot\|x-x^{*}\|^{2} subject to (30)
iN,aiAi,aAσ(a)(ui(ai,ai)ui(a))\displaystyle\forall i\in N,\forall a^{\prime}_{i}\in A_{i},\sum_{a\in A}\sigma^{\prime}(a)\cdot(u_{i}(a^{\prime}_{i},a_{-i})-u_{i}(a)) 0\displaystyle\leq 0 (d(i,ai)d(i,a^{\prime}_{i}))
X𝑑σ(x)\displaystyle\int_{X}d\sigma(x) =1\displaystyle=1 (ω\omega)

has value 0. First, note that by the convexity of the objective in xx and the form of constraints d(i,ai)d(i,a^{\prime}_{i}), we may assume that the optimal solution σ\sigma concentrates all probability on pure action profiles. But that implies that we simply have the standard LP over the set of CCE (in the usual sense) of the normal-form game,

maxσ0aAσ(a)iNeiaixi2\displaystyle\max_{\sigma^{\prime}\geq 0}\sum_{a\in A}\sigma^{\prime}(a)\cdot\sum_{i\in N}\|e_{ia_{i}}-x^{*}_{i}\|^{2}		subject to	(31)
iN,aiAi,aAσ(a)(ui(ai,ai)ui(a))\displaystyle\forall i\in N,\forall a^{\prime}_{i}\in A_{i},\sum_{a\in A}\sigma^{\prime}(a)\cdot(u_{i}(a^{\prime}_{i},a_{-i})-u_{i}(a)) 0\displaystyle\leq 0 (d(i,ai)d(i,a^{\prime}_{i}))
aAσ(a)\displaystyle\sum_{a\in A}\sigma^{\prime}(a) =1\displaystyle=1 (ω\omega)

For this LP to have value 0, xi(ai)x_i^{*}(a_i) must necessarily be {0,1}\{0,1\}-valued; otherwise, the objective is strictly positive. So we have that xi(ai)=δ(ai,ai)x^{*}_{i}(a_{i})=\delta(a^{*}_{i},a_{i}) for each player ii and some action profile aa^{*}. The associated dual LP, after scaling the primal objective by 1/21/2, is given by

minω,d0ω\displaystyle\min_{\omega\in\mathbb{R},d\geq 0}\omega subject to (32)
aA,ω+iN,aiAid(i,ai)(ui(ai,ai)ui(a))\displaystyle\ \forall\ a\in A,\omega+\sum_{i\in N,a^{\prime}_{i}\in A_{i}}d(i,a^{\prime}_{i})\cdot(u_{i}(a^{\prime}_{i},a_{-i})-u_{i}(a)) iN(1δ(ai,ai)).\displaystyle\geq\sum_{i\in N}(1-\delta(a_{i},a_{i}^{*})).	(σ(a)\sigma^{\prime}(a))

By [1] (Proposition 3.1), any tight dual solution dd^{*} necessarily has d(i,ai)>0ai=aid^{*}(i,a^{\prime}_{i})>0\Leftrightarrow a^{\prime}_{i}=a^{*}_{i}, and this condition is in fact equivalent to the finite normal-form game having a unique CCE in pure strategies. Letting hi(xi|ai)=12xieiai2h_{i}(x_{i}|a^{\prime}_{i})=-\frac{1}{2}\|x_{i}-e_{ia^{\prime}_{i}}\|^{2} and noting that

aAσ(a)(ui(ai,ai)ui(a))=X𝑑σ(x)ihi(xi|ai),iui(x),\sum_{a\in A}\sigma^{\prime}(a)\cdot(u_{i}(a^{\prime}_{i},a_{-i})-u_{i}(a))=\int_{X}d\sigma(x)\cdot\left\langle\nabla_{i}h_{i}(x_{i}|a^{\prime}_{i}),\nabla_{i}u_{i}(x)\right\rangle,

a tight solution to the dual problem of (30) (recall, after scaling the objective by a factor 1/21/2),

minω,μ0ω\displaystyle\min_{\omega\in\mathbb{R},\mu\geq 0}\omega subject to (33)
xX,ω+iN,aiAiμ(i,ai)i[xieiai2/2],iui(x)\displaystyle\ \forall\ x\in X,\omega+\sum_{i\in N,a^{\prime}_{i}\in A_{i}}\mu(i,a^{\prime}_{i})\cdot\left\langle\nabla_{i}[-\|x_{i}-e_{ia^{\prime}_{i}}\|^{2}/2],\nabla_{i}u_{i}(x)\right\rangle 12xx2,\displaystyle\geq\frac{1}{2}\|x-x^{*}\|^{2},	(σ(x)\sigma(x))

is given by μ(i,ai)=d(i,ai)\mu(i,a^{\prime}_{i})=d^{*}(i,a^{\prime}_{i}) and ω=0\omega=0. Finally, note that

ihi(xi|ai),iui(x)ihi(xi|ai),Π𝒯𝒞Xi(xi)[iui(x)],\left\langle\nabla_{i}h_{i}(x_{i}|a^{\prime}_{i}),\nabla_{i}u_{i}(x)\right\rangle\leq\left\langle\nabla_{i}h_{i}(x_{i}|a^{\prime}_{i}),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]\right\rangle,

since ihi(xi|ai)𝒯𝒞Xi(xi)\nabla_{i}h_{i}(x_{i}|a^{\prime}_{i})\in\mathcal{TC}_{X_{i}}\left(x_{i}\right) and hence ihi(xi|ai),Π𝒩𝒞Xi(xi)[iui(x)]0\left\langle\nabla_{i}h_{i}(x_{i}|a^{\prime}_{i}),\Pi_{\mathcal{NC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]\right\rangle\leq 0. As a consequence, we have

iNd(i,ai)ihi(xi|ai),Π𝒯𝒞Xi(xi)[iui(x)]\displaystyle\sum_{i\in N}d^{*}(i,a^{*}_{i})\left\langle\nabla_{i}h_{i}(x_{i}|a^{*}_{i}),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]\right\rangle 12xx2>0xX,xx, and\displaystyle\geq\frac{1}{2}\|x-x^{*}\|^{2}>0\ \forall\ x\in X,x\neq x^{*},\textnormal{ and}
iNd(i,ai)ihi(xi|ai),Π𝒯𝒞Xi(xi)[iui(x)]\displaystyle\sum_{i\in N}d^{*}(i,a^{*}_{i})\left\langle\nabla_{i}h_{i}(x_{i}|a^{*}_{i}),\Pi_{\mathcal{TC}_{X_{i}}\left(x^{*}_{i}\right)}\left[\nabla_{i}u_{i}(x^{*})\right]\right\rangle =0,\displaystyle=0,

where the first inequality is by dual optimality, the second inequality holds as xxx\neq x^{*}, and the equality is because ihi(xi|ai)=0\nabla_{i}h_{i}(x_{i}^{*}|a^{*}_{i})=0 for any iNi\in N. As a consequence, since Π𝒯𝒞Xi(xi)[iui(x)]=dxi(τ)dτ\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]=\frac{dx_{i}(\tau)}{d\tau} for the projected gradient dynamics of the smooth game, iNd(i,ai)hi(xi|ai)\sum_{i\in N}d^{*}(i,a^{*}_{i})\cdot h_{i}(x_{i}|a^{*}_{i}) is necessarily a Lyapunov function, modulo sign convention.

Proposition 5.5.

A finite normal-form game has a unique first-order local CCE with respect to HH which assigns probability 11 to a mixed-strategy profile xx^{*} if and only if xx^{*} is a Nash equilibrium in pure strategies and the convergence of the game’s projected gradient dynamics to xx^{*} may be proven via a Lyapunov function of the form h(x)=iNCixixi2h(x)=\sum_{i\in N}-C_{i}\cdot\|x_{i}-x^{*}_{i}\|^{2} for some constants Ci>0C_{i}>0.
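
The Lyapunov condition of Proposition 5.5 can be illustrated by simulation: in a game with a strictly dominant pure Nash equilibrium, projected gradient ascent drives sum_i ||x_i - x*_i||^2 to zero along the trajectory. A minimal sketch for a symmetric 2x2 Prisoner's-Dilemma-style game (the payoffs, step size, and iteration count are illustrative choices, not taken from the paper):

```python
import numpy as np

def project_simplex(y):
    """Euclidean projection of y onto the probability simplex (sort-based)."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(y) + 1) > css)[0][-1]
    return np.maximum(y - css[rho] / (rho + 1.0), 0.0)

# Symmetric utilities (row = own action, column = opponent's action);
# action 1 ("defect") strictly dominates, so x* = (e_1, e_1).
U = np.array([[3., 0.],
              [4., 1.]])

x = [np.array([0.9, 0.1]), np.array([0.8, 0.2])]  # interior starting profile
eta = 0.1
x_star = np.array([0., 1.])

def dist_sq(x):
    return sum(np.sum((xi - x_star) ** 2) for xi in x)

d0 = dist_sq(x)
for _ in range(500):
    grads = [U @ x[1], U @ x[0]]                  # utility gradients, by symmetry
    x = [project_simplex(xi + eta * g) for xi, g in zip(x, grads)]

assert dist_sq(x) < 1e-6 < d0  # candidate Lyapunov function decreases to zero
```

The dominant strategy guarantees that the utility gradients point towards x* from every profile, which is exactly the situation the proposition characterises.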

This motivates using the notions of stationary and local CCE as refinements of the classical notion of a CCE, and then employing the resulting incentive constraints in primal-dual arguments. That is, the equilibrium constraints given by functions of the form xieai2/2-\|x_i-e_{a^{\prime}_i}\|^2/2 imply that a local CCE of a normal form game with respect to the set C2(Δ(A))C^{2}(\Delta(A)) of twice continuously differentiable functions is a coarse correlated equilibrium in the usual sense, but one which satisfies additional incentive constraints for each function hC2(Δ(A))h\in C^{2}(\Delta(A)).

5.2.2 Equivalence of average CCE and the smoothness framework

We next turn our attention to how the smoothness framework of [72] can also be interpreted as the analysis of a variant of first-order local CCE. Recall the definition of smoothness for a cost minimisation game:

Definition 11 (Adapted from [72]).

For a normal-form game, a function Ci:A+C_{i}:A\rightarrow\mathbb{R}_{+} for a player ii is called a cost function, and the game paired with such a cost function for each player ii is called a cost minimisation game. We denote by C=iNCiC=\sum_{i\in N}C_{i} the social cost. A cost minimisation game with a minimum social cost outcome aa^{*} is then called (λ,μ)(\lambda,\mu)-smooth if

i=1nCi(ai,ai)λC(a)+μC(a)aA.\sum_{i=1}^{n}C_{i}(a^{*}_{i},a_{-i})\leq\lambda\cdot C(a^{*})+\mu\cdot C(a)\ \forall\ a\in A.
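
Smoothness in the sense of Definition 11 can be verified by direct enumeration in small games. The sketch below checks candidate (lambda, mu) pairs for a two-player load-balancing game on two identical machines; the game is an illustrative choice, and (5/3, 1/3) is the classic smoothness pair for congestion games:

```python
from itertools import product

# Two players choose one of two machines; C_i(a) = load on player i's machine.
N, A = 2, (0, 1)

def cost(i, a):
    return sum(1 for j in range(N) if a[j] == a[i])

def social(a):
    return sum(cost(i, a) for i in range(N))

a_star = min(product(A, repeat=N), key=social)  # minimum-social-cost outcome

def is_smooth(lam, mu):
    """Check the (lambda, mu)-smoothness inequality on every action profile."""
    for a in product(A, repeat=N):
        lhs = sum(cost(i, a[:i] + (a_star[i],) + a[i+1:]) for i in range(N))
        if lhs > lam * social(a_star) + mu * social(a) + 1e-12:
            return False
    return True

assert is_smooth(5/3, 1/3)       # holds, giving a PoA bound of (5/3)/(1-1/3) = 5/2
assert not is_smooth(1.0, 1/3)   # ... but (1, 1/3)-smoothness fails for this game
```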

For our purposes, the cost function of a player will be their disutility, Ci=uiC_{i}=-u_{i} for any player ii. Given this definition, [64] characterises the distributions for which a (λ,μ)(\lambda,\mu)-smoothness bound applies as precisely the average coarse correlated equilibria.

Definition 12 ([64], Definition 2).

For a normal form game and an outcome aa^{*}, an average coarse correlated equilibrium with respect to aa^{*} is a probability distribution σΔ(A)\sigma^{*}\in\Delta(A) such that

iN,aAσ(a)(ui(ai,ai)ui(a))0.\sum_{i\in N,a\in A}\sigma^{*}(a)\cdot\left(u_{i}(a^{*}_{i},a_{-i})-u_{i}(a)\right)\leq 0.

By [64] (Theorem 1), given a (λ,μ)(\lambda,\mu)-smooth cost minimisation game with a minimum social cost outcome aa^{*}, the price of anarchy of any average CCE with respect to aa^{*} is precisely λ/(1μ)\lambda/(1-\mu). The proof follows by considering the linear program that searches for a price of anarchy bound

minp,z0p\displaystyle\min_{p\in\mathbb{R},z\geq 0}p subject to (34)
aA,pC(a)+ziN(Ci(a)Ci(ai,ai))\displaystyle\ \forall\ a\in A,p\cdot C(a^{*})+z\cdot\sum_{i\in N}(C_{i}(a)-C_{i}(a^{*}_{i},a_{-i})) C(a).\displaystyle\geq C(a). (σ(a)\sigma^{*}(a))
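
For concrete games, the LP (34) is small enough to solve directly. A sketch using scipy.optimize.linprog for a two-player load-balancing game on two identical machines (the game is an illustrative choice; for it, solving the binding constraints by hand gives p* = 5/3 at z = 2/3, sharper than the generic congestion-game smoothness bound of 5/2):

```python
from itertools import product
from scipy.optimize import linprog

# Two players choose one of two machines; C_i(a) = load on player i's machine.
def cost(i, a):
    return sum(1 for j in range(2) if a[j] == a[i])

def social(a):
    return cost(0, a) + cost(1, a)

profiles = list(product((0, 1), repeat=2))
a_star = min(profiles, key=social)

# LP (34): min p  s.t.  p*C(a*) + z*sum_i(C_i(a) - C_i(a*_i, a_-i)) >= C(a), z >= 0.
A_ub, b_ub = [], []
for a in profiles:
    dev = sum(cost(i, a[:i] + (a_star[i],) + a[i+1:]) for i in range(2))
    A_ub.append([-social(a_star), -(social(a) - dev)])
    b_ub.append(-social(a))

res = linprog([1.0, 0.0], A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None), (0, None)], method="highs")
print(res.x)  # optimal (p, z); for this game, p* = 5/3 and z = 2/3
```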

We observe immediately the similarity of (34) with (25). This leads us to inspect the associated dual LP written by [64],

maxσ0aAσ(a)(C(a)C(a))\displaystyle\max_{\sigma^{*}\geq 0}\sum_{a\in A}\sigma^{*}(a)\cdot\left(\frac{C(a)}{C(a^{*})}\right) subject to (35)
iN,aAσ(a)(ui(ai,ai)ui(a))\displaystyle\sum_{i\in N,a\in A}\sigma^{*}(a)\cdot(u_{i}(a^{*}_{i},a_{-i})-u_{i}(a)) 0\displaystyle\leq 0 (zz)
aAσ(a)\displaystyle\sum_{a\in A}\sigma^{*}(a) =1.\displaystyle=1. (pp)

By arguments analogous to those in Section 5.2.1, the equilibrium constraint zz is generated via the gradient of the function iNxieai2/2\sum_{i\in N}-\|x_{i}-e_{a^{*}_{i}}\|^{2}/2, whose gradient field is tangential. Thus, an equivalence result follows through an identical line of argument, which we omit.

Proposition 5.6.

For a normal-form game and some action profile aa^{*}, consider the singleton set,

H={iNxieai2/2}.H=\left\{\sum_{i\in N}-\|x_{i}-e_{a^{*}_{i}}\|^{2}/2\right\}.

Suppose that σ\sigma is a first-order local CCE with respect to HH. Then the probability distribution σΔ(A)\sigma^{\prime}\in\Delta(A), defined as σ(a)=X𝑑σ(x)(jNxj(aj))\sigma^{\prime}(a)=\int_{X}d\sigma(x)\cdot\left(\prod_{j\in N}x_{j}(a_{j})\right), is an average coarse correlated equilibrium with respect to aa^{*}. In turn, interpreting such an average coarse correlated equilibrium σ\sigma^{*} of the normal-form game as a probability distribution on Δ(A)\Delta(A), σ\sigma^{*} is a first-order local CCE with respect to HH.

To see the implications with respect to the gradient dynamics of the underlying game, suppose that there is a smoothness proof of a price of anarchy bound of 11; e.g., whenever λ=μ=1/2\lambda=\mu=1/2. In this case, by [64], p=1,z=2p=1,z=2 is a solution of (34), which implies that for any aAa\in A,

2iN(ui(ai,ai)ui(a))C(a)C(a).2\cdot\sum_{i\in N}\left(u_{i}(a^{*}_{i},a_{-i})-u_{i}(a)\right)\geq C(a)-C(a^{*}).

As a consequence, the gradient dynamics of the game is such that at any xΔ(A)x\in\Delta(A), the utility gradients iui(x)\nabla_{i}u_{i}(x) in aggregate point towards the pure action profile aa^{*} whenever the social cost is suboptimal. The associated Lyapunov function is h(x)=iNxieai2h(x)=\sum_{i\in N}-\|x_{i}-e_{a^{*}_{i}}\|^{2}, as in this case, the above inequality implies that

iNih(x),Π𝒯𝒞Xi(xi)[iui(x)]iN,aA(jNxj(aj))(Ci(a)Ci(a)),\sum_{i\in N}\left\langle\nabla_{i}h(x),\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}\right)}\left[\nabla_{i}u_{i}(x)\right]\right\rangle\geq\sum_{i\in N,a\in A}\left(\prod_{j\in N}x_{j}(a_{j})\right)\cdot\left(C_{i}(a)-C_{i}(a^{*})\right),

which is >0>0 whenever the expected social cost is strictly greater than that of the optimal one.

5.3 Equivalence of correlated equilibria

The equivalence of correlated equilibria of normal-form games to first-order local correlated equilibria with respect to a suitably chosen set of vector fields FF also follows from arguments analogous to those in Section 5.2.1. Here, however, we do not find a coarse representation. Our set of vector fields FF is instead given,

F={xi(ai)(eiaieiai)|iN,ai,aiAi}.F=\{x_{i}(a_{i})\cdot(e_{ia^{\prime}_{i}}-e_{ia_{i}})\ |\ i\in N,a_{i},a^{\prime}_{i}\in A_{i}\}.

Indeed, for any player ii and any x×iNΔ(Ai)Xx\in\times_{i\in N}\Delta(A_{i})\equiv X,

iui(x)ai\displaystyle\nabla_{i}u_{i}(x)_{a_{i}} =aiAi(jixj(aj))ui(ai,ai)\displaystyle=\sum_{a_{-i}\in A_{-i}}\left(\prod_{j\neq i}x_{j}(a_{j})\right)\cdot u_{i}(a_{i},a_{-i})
xi(ai)eiaieiai,iui(x)\displaystyle\Rightarrow x_{i}(a_{i})\cdot\left\langle e_{ia^{\prime}_{i}}-e_{ia_{i}},\nabla_{i}u_{i}(x)\right\rangle =xi(ai)aiAi(jixj(aj))(ui(ai,ai)ui(ai,ai))\displaystyle=x_{i}(a_{i})\cdot\sum_{a_{-i}\in A_{-i}}\left(\prod_{j\neq i}x_{j}(a_{j})\right)\cdot(u_{i}(a^{\prime}_{i},a_{-i})-u_{i}(a_{i},a_{-i}))
=xi(ai)𝔼aixi[ui(ai,ai)ui(ai,ai)].\displaystyle=x_{i}(a_{i})\cdot\mathbb{E}_{a_{-i}\sim x_{-i}}[u_{i}(a^{\prime}_{i},a_{-i})-u_{i}(a_{i},a_{-i})].

Therefore, for σΔ(X)\sigma\in\Delta(X), defining σ\sigma^{\prime} to be the probability distribution on AA induced by σ\sigma,

X𝑑σ(x)xi(ai)eiaieiai,iui(x)\displaystyle\int_{X}d\sigma(x)\cdot x_{i}(a_{i})\cdot\left\langle e_{ia^{\prime}_{i}}-e_{ia_{i}},\nabla_{i}u_{i}(x)\right\rangle
=\displaystyle= X𝑑σ(x)xi(ai)aiAi(jixj(aj))(ui(ai,ai)ui(ai,ai))\displaystyle\int_{X}d\sigma(x)\cdot x_{i}(a_{i})\cdot\sum_{a_{-i}\in A_{-i}}\left(\prod_{j\neq i}x_{j}(a_{j})\right)\cdot(u_{i}(a^{\prime}_{i},a_{-i})-u_{i}(a_{i},a_{-i}))
=\displaystyle= aiAi(X𝑑σ(x)xi(ai)jixj(aj))(ui(ai,ai)ui(ai,ai))\displaystyle\sum_{a_{-i}\in A_{-i}}\left(\int_{X}d\sigma(x)\cdot x_{i}(a_{i})\cdot\prod_{j\neq i}x_{j}(a_{j})\right)\cdot(u_{i}(a^{\prime}_{i},a_{-i})-u_{i}(a_{i},a_{-i}))
=\displaystyle= aiAiσ(ai,ai)(ui(ai,ai)ui(ai,ai)).\displaystyle\sum_{a_{-i}\in A_{-i}}\sigma^{\prime}(a_{i},a_{-i})\cdot(u_{i}(a^{\prime}_{i},a_{-i})-u_{i}(a_{i},a_{-i})).

The final term is, of course, the left-hand side of the usual (linear) correlated equilibrium constraints for a normal-form game. The equivalence result is yet again immediate.
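
This identity can be checked numerically: for a point mass on a mixed profile x, the deviation value along the vector field x_i(a_i)(e_{ia'_i} - e_{ia_i}) coincides with the correlated equilibrium regret term under the induced product distribution sigma'. A minimal sketch for a randomly generated two-player game (sizes and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
u = [rng.standard_normal((3, 3)) for _ in range(2)]   # u[i][a0, a1]

x = [rng.dirichlet(np.ones(3)) for _ in range(2)]     # a mixed profile
sigma = np.outer(x[0], x[1])                          # induced sigma' on A

grad0 = u[0] @ x[1]   # grad_0 u_0(x)_{a_0} = E_{a_1 ~ x_1}[u_0(a_0, a_1)]

for a0 in range(3):
    for a0p in range(3):
        lhs = x[0][a0] * (grad0[a0p] - grad0[a0])     # field-based deviation value
        rhs = sum(sigma[a0, a1] * (u[0][a0p, a1] - u[0][a0, a1])
                  for a1 in range(3))                 # CE regret term under sigma'
        assert abs(lhs - rhs) < 1e-12
```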

Proposition 5.7.

For a normal-form game, consider the set of vector fields

F={xi(ai)(eiaieiai)|iN,ai,aiAi}.F=\left\{x_{i}(a_{i})\cdot(e_{ia^{\prime}_{i}}-e_{ia_{i}})\ |\ i\in N,a_{i},a^{\prime}_{i}\in A_{i}\right\}.

Suppose that σ\sigma is a first-order local CE with respect to FF. Then the probability distribution σΔ(A)\sigma^{\prime}\in\Delta(A), defined as σ(a)=X𝑑σ(x)(jNxj(aj))\sigma^{\prime}(a)=\int_{X}d\sigma(x)\cdot\left(\prod_{j\in N}x_{j}(a_{j})\right), is a correlated equilibrium of the game in the usual sense. In turn, interpreting a correlated equilibrium σ\sigma^{*} of the normal-form game as a probability distribution on Δ(A)\Delta(A), σ\sigma^{*} is a first-order local CE with respect to FF.

Moreover, we may yet again convert a statement on the uniqueness of the correlated equilibrium of a game to one about its gradient dynamics. However, as discussed in Section 5.1, here convergence results do not translate to a Lyapunov function. This is because the associated dual problem concerning the uniqueness of a strategy profile xXx^{*}\in X is now one of vector field alignment,

minω,μ0ω\displaystyle\min_{\omega\in\mathbb{R},\mu\geq 0}\omega subject to (36)
xX,ω+iN,ai,aiAiμ(i,ai,ai)xi(ai)(eiaieiai),iui(x)\displaystyle\ \forall\ x\in X,\omega+\sum_{i\in N,a_{i},a^{\prime}_{i}\in A_{i}}\mu(i,a_{i},a^{\prime}_{i})\cdot\left\langle x_{i}(a_{i})\cdot(e_{ia^{\prime}_{i}}-e_{ia_{i}}),\nabla_{i}u_{i}(x)\right\rangle 12xx2.\displaystyle\geq\frac{1}{2}\|x-x^{*}\|^{2}.	(σ(x)\sigma(x))

Our discussion of local CE in Section 5.1 nevertheless allows us to extract the associated implication for the game’s dynamics. That is, our set of vector fields FF must contain within its conical span a vector field positively aligned with the utility gradients, whose only fixed point is xx^{*}.

Proposition 5.8.

A finite normal-form game has a unique local CE with respect to FF which assigns probability 11 to a mixed-strategy profile xx^{*} if and only if xx^{*} is a Nash equilibrium in pure strategies and there exist multipliers μ(i,ai,ai)0\mu(i,a_{i},a^{\prime}_{i})\geq 0 for each iN,ai,aiAii\in N,a_{i},a^{\prime}_{i}\in A_{i}, such that

iN,ai,aiAiμ(i,ai,ai)xi(ai)(eiaieiai),iui(x)0xX,\sum_{i\in N,a_{i},a^{\prime}_{i}\in A_{i}}\mu(i,a_{i},a^{\prime}_{i})\cdot\left\langle x_{i}(a_{i})\cdot(e_{ia^{\prime}_{i}}-e_{ia_{i}}),\nabla_{i}u_{i}(x)\right\rangle\geq 0\ \forall\ x\in X,

with equality only if x=xx=x^{*}.

6 Further Directions & Extensions

In this paper, we have studied the question of computing an approximate first-order analogue of a coarse correlated equilibrium, by identifying such outcomes as a natural property of the gradient dynamics of the underlying game and considering the weighted history of play when all players employ projected gradient ascent as their learning algorithm with equal learning rates. For the setting of finitely many vector fields, we have shown how Φ\Phi-regret minimisation may be performed to approximate first-order CE, and discussed tractable settings. Appealing to the properties of the gradient dynamics of the game and extending the usual convex optimisation based primal-dual framework for price of anarchy / stability bounds, we then showed how performance guarantees for the resulting distribution may be proven. However, our results raise many questions yet unanswered, some of which we discuss in this final section.

Approximability of local & stationary (C)CE

Some questions pertain directly to the computation of approximate local or stationary (C)CE. First, we have not established how (ϵ,Δ)(\epsilon,\Delta)-approximations may be computed. This is because our approach has been to consider the tangent and normal cones pointwise at a given xXx\in X, explicitly avoiding the need to treat potential regret incurred by projections. It can in fact be the case that for δ>0\delta>0, the respective projections involving a step in direction fi(x)f_{i}(x) or iui(x)\nabla_{i}u_{i}(x) involve projections onto constraints distinct from each other, or from the ones which bind at xx. This holds true even for projected gradient ascent when xi(τ)x_{i}(\tau) is in the interior of XiX_{i}. How our appeals to the gradient dynamics of the game may be extended to cover such cases thus remains open. We remark that, unlike the approximability of ϵ\epsilon-local or stationary CCE, the associated approximation bounds of online projected gradient ascent would in general depend on the Lipschitz moduli of ff.

Similarly, we do not yet know the approximability of first-order CCE in settings with general compact and convex action sets XiX_{i}, or even in the setting where all XiX_{i} are polyhedral. We have proven our results in settings where the approximate projected gradient dynamics are faithful to the tangent cone projection of the direction of motion for the unconstrained gradient dynamics at each point in time, but [57] demonstrates that this need not be the case for polyhedral XiX_{i} even in the setting of linear programming. A deeper question, perhaps, is “What is the correct parameter of complexity for compact and convex XiX_{i} and approximability of local / stationary CCE via approximate projected gradient dynamics?”. When XiX_{i} has a smooth boundary, the approximation guarantee deteriorates linearly in KK, the bound on the principal curvature of the boundary δXi\delta X_{i}. However, when XiX_{i} are acute polyhedra – a condition intuitively in contradiction to δXi\delta X_{i} having bounded curvature – projected gradient ascent models approximate projected gradient dynamics perfectly. It is thus yet unknown how to bound the approximation guarantees in the setting with general compact and convex action sets.

Generalisations of local / stationary CCE

Our local coarse notions of equilibria are also, in some sense, “bespoke” for projected gradient dynamics. We note that, while we have proven our results for projected gradient dynamics of the smooth game, we have never actually used the fact that the direction of motion is the gradient of some utility function for each player. Therefore, our results apply for any projected gradient dynamics of the form dxi(τ)/dτ=Π𝒯𝒞Xi(xi(τ))[Fi(x(τ))]dx_{i}(\tau)/d\tau=\Pi_{\mathcal{TC}_{X_{i}}\left(x_{i}(\tau)\right)}\left[F_{i}(x(\tau))\right] for each player ii, with the resulting “equilibrium constraints” obtained by swapping any expression iui(x(τ))\nabla_{i}u_{i}(x(\tau)) with Fi(x(τ))F_{i}(x(\tau)). This implies a different notion of local or stationary CCE for each distinct time-independent gradient based learning algorithm; time-independent in the sense that no FiF_{i} has explicit time or history dependence, but is determined solely via x(τ)x(\tau) at time τ\tau. It is already known that regret against the gradient fields of some functions is minimised by algorithms other than projected gradient ascent. MWU, for instance, is also a time-independent gradient based learning algorithm in this sense, and it also incurs no external regret, the associated incentive constraints for which are generated by functions of the form xieai2/2-\|x_{i}-e_{a^{\prime}_{i}}\|^{2}/2. How to simultaneously refine outcomes reached via such dynamics within a class of equilibria is thus open. Such an equilibrium concept would of course be a relaxation of the equilibrium notions discussed in this paper.

We observe that mean-based learning algorithms [13] constitute a generalisation of various dual averaging based learning algorithms, and comparing the results of [36] and [31], we know that the outcomes of mean-based learning algorithms can be a strict subset of the (non-local) CCE of a game. However, to our knowledge, projected gradient ascent is not such an algorithm. Moreover, our Lyapunov function based primal-dual approach is at the moment ill-suited for dual averaging based methods in the first place. The associated continuous dynamics for dual averaging may be obtained [61] by setting action sets Yi=diY_{i}=\mathbb{R}^{d_{i}} for each player ii, and letting

xi(τ)\displaystyle x_{i}(\tau) =argmaxxiXixi,yi(τ)ri(xi), and\displaystyle=\arg\max_{x_{i}\in X_{i}}\left\langle x_{i},y_{i}(\tau)\right\rangle-r_{i}(x_{i}),\textnormal{ and}	(37)
dyi(τ)/dτ\displaystyle dy_{i}(\tau)/d\tau =iui(x(τ)),\displaystyle=\nabla_{i}u_{i}(x(\tau)), (38)

where ri:Xi{}r_{i}:X_{i}\rightarrow\mathbb{R}\cup\{\infty\} is a lower semi-continuous and convex regulariser. In settings where ri(xi)=r_{i}(x_{i})=\infty if and only if xiδXix_{i}\in\delta X_{i} and rir_{i} is differentiable in the interior of XiX_{i}, we are effectively faced with a smooth game where players’ action sets are did_{i} dimensional Euclidean spaces, and their equations of motion satisfy dxi(τ)/dτ0dx_{i}(\tau)/d\tau\rightarrow 0 as yi(τ)\|y_{i}(\tau)\|\rightarrow\infty. As a consequence, our approximability results break down completely, and how to extend our specific framework for price of anarchy / stability bounds to such settings is at present unknown. The average price of anarchy approach of [66, 77] offers one solution, yet at present the performance metric does not come with convergence rate guarantees, even in expectation.

Similar presently unexplored directions for notions of equilibria relate to relaxing other assumptions. If we drop the previously mentioned time-independence assumption, we obtain as a subclass settings where agents may learn at asymptotically different rates; this question was investigated in [79] for monotone games. However, such games admit a Lyapunov function xx2\|x-x^{*}\|^{2} which proves convergence to their unique equilibrium – a strong assumption that does not hold in many settings of interest. The time-independence assumption is also violated for any algorithm that uses history information, such as a (weighted) average history of play. Another assumption that may be dropped is deterministic motion, say for stochastic gradient ascent, in which case the continuous time dynamical system would be a Markov process. While [43] have already investigated Brownian motion, which can be taken as a model of SGD (cf. [45, 55]), it is unclear how the results therein apply with respect to approximation guarantees in our constrained setting.

On tractable local CE

Whereas we have demonstrated that a generalisation of correlated equilibrium is tractable, it remains unknown how to procedurally generate a basis for such a family. Moreover, the affine-linearity of a vector field is a rather strong condition, and such vector fields in general fail to be tangential over a more general polytope. Finding wider settings in which fixed point computation for Algorithm 2 is tractable and the associated choice of FF is meaningful is thus an open problem. Finally, we remark that our convergence results are probably not the best possible. Recent work [24, 2] shows that swap regret may be minimised in O(1/T)O(1/T) (up to polylog factors) iterations via more specialised algorithms, and we suspect similar methods might apply in our setting.

Further directions in primal-dual analysis

The final question pertains to how useful our primal-dual framework could be. Whereas we obtain a primal-dual framework to prove bounds on the performance of gradient ascent, Example 2 demonstrates that finding dual solutions and verifying their validity can be a daunting task. It is therefore of interest whether there exist systematic ways of constructing generalised Lyapunov functions to prove dual bounds, tight or approximate. One approach would be to consider relaxations to subclasses of differentiable functions with Lipschitz gradients for which the problem is tractable. As usual CCE of normal form games are tractable in the sense that performance bounds take the form of an LP, we infer by the discussion in Section 5 that there exists at least one such relaxation. In fact, given the tractability of linear programming and that computing performance bounds over the set of (C)CE reduces to an LP, one interesting question is classifying stronger convex programming formulations which capture the guarantees of learning algorithms.

Acknowledgements

Special thanks to Constantinos Daskalakis for the introduction to the question and discussions, and to Martin Bichler, Denizalp Göktaş, Alexandros Hollender, Matthias Oberlecher, and further anonymous reviewers for their feedback. This project was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Grant No. BI 1057/9.

References

  • AB [24] Mete Şeref Ahunbay and Martin Bichler. On the uniqueness of Bayesian coarse correlated equilibria in standard first-price and all-pay auctions, 2024.
  • ADF+ [22] Ioannis Anagnostides, Constantinos Daskalakis, Gabriele Farina, Maxwell Fishelson, Noah Golowich, and Tuomas Sandholm. Near-optimal no-regret learning for correlated equilibria in multi-player general-sum games. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 736–749, 2022.
  • ADK+ [08] Elliot Anshelevich, Anirban Dasgupta, Jon Kleinberg, Éva Tardos, Tom Wexler, and Tim Roughgarden. The price of stability for network design with fair cost allocation. SIAM Journal on Computing, 38(4):1602–1623, 2008.
  • AFP [21] Gabriel P. Andrade, Rafael Frongillo, and Georgios Piliouras. Learning in matrix games can be arbitrarily complex. In Mikhail Belkin and Samory Kpotufe, editors, Proceedings of Thirty Fourth Conference on Learning Theory, volume 134 of Proceedings of Machine Learning Research, pages 159–185. PMLR, 15-19 Aug 2021.
  • AFP [22] Gabriel P. Andrade, Rafael Frongillo, and Georgios Piliouras. No-regret learning in games is Turing complete. arXiv preprint arXiv:2202.11871, 2022.
  • AN [87] Edward J. Anderson and Peter Nash. Linear Programming in Infinite-dimensional Spaces: Theory and Applications. A Wiley-Interscience publication. Wiley, 1987.
  • BBM [17] Dirk Bergemann, Benjamin Brooks, and Stephen Morris. First-price auctions with general information structures: Implications for bidding and revenue. Econometrica, 85(1):107–143, 2017.
  • Ber [05] Ulrich Berger. Fictitious play in 2 ×\times n games. Journal of Economic Theory, 120(2):139–154, 2005.
  • BFH+ [21] Martin Bichler, Maximilian Fichtl, Stefan Heidekrüger, Nils Kohring, and Paul Sutterer. Learning equilibria in symmetric auction games using artificial neural networks. Nature Machine Intelligence, 3:687–695, 2021.
  • BFO [23] Martin Bichler, Maximilian Fichtl, and Matthias Oberlechner. Computing Bayes-Nash equilibrium in auction games via gradient dynamics. Operations Research, to appear, 2023.
  • BHS [12] Michel Benaïm, Josef Hofbauer, and Sylvain Sorin. Perturbations of set-valued dynamical systems, with applications to game theory. Dynamic Games and Applications, 2:195–205, 2012.
  • Bil [18] Vittorio Bilò. A unifying tool for bounding the quality of non-cooperative solutions in weighted congestion games. Theory of Computing Systems, 62:1288–1317, 2018.
  • BMSW [18] Mark Braverman, Jieming Mao, Jon Schneider, and Matt Weinberg. Selling to a no-regret buyer. In Proceedings of the 2018 ACM Conference on Economics and Computation, pages 523–538, 2018.
  • BP [18] James P. Bailey and Georgios Piliouras. Multiplicative weights update in zero-sum games. In Proceedings of the 2018 ACM Conference on Economics and Computation, pages 321–338. ACM, 2018.
  • Bro [51] George W Brown. Iterative solution of games by fictitious play. Activity analysis of production and allocation, 13(1):374–376, 1951.
  • CBL [06] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
  • CD [06] Xi Chen and Xiaotie Deng. Settling the complexity of two-player Nash equilibrium. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, FOCS ’06, page 261–272, USA, 2006. IEEE Computer Society.
  • CDL+ [24] Yang Cai, Constantinos Daskalakis, Haipeng Luo, Chen-Yu Wei, and Weiqiang Zheng. On tractable ϕ\phi-equilibria in non-concave games. arXiv preprint arXiv:2403.08171, 2024.
  • CJ [04] Monica-Gabriela Cojocaru and Leo Jonker. Existence of solutions to projected differential equations in Hilbert spaces. Proceedings of the American Mathematical Society, 132(1):183–193, 2004.
  • CKS [16] George Christodoulou, Annamária Kovács, and Michael Schapira. Bayesian combinatorial auctions. Journal of the ACM (JACM), 63(2):1–19, 2016.
  • CKST [16] George Christodoulou, Annamária Kovács, Alkmini Sgouritsa, and Bo Tang. Tight bounds for the price of anarchy of simultaneous first-price auctions. ACM Transactions on Economics and Computation (TEAC), 4(2):1–33, 2016.
  • CL [06] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, Cambridge; New York, 2006.
  • CP [14] Yang Cai and Christos Papadimitriou. Simultaneous Bayesian auctions and computational complexity. In Proceedings of the Fifteenth ACM Conference on Economics and Computation, EC ’14, page 895–910, New York, NY, USA, 2014. Association for Computing Machinery.
  • CP [20] Xi Chen and Binghui Peng. Hedging in games: Faster convergence of external and swap regrets. Advances in Neural Information Processing Systems, 33:18990–18999, 2020.
  • CP [21] Yun Kuen Cheung and Georgios Piliouras. Vortices instead of equilibria in minmax optimization: Chaos and butterfly effects of online learning in zero-sum games. In Proceedings of Thirty Second Conference on Learning Theory, volume 99 of Proceedings of Machine Learning Research, pages 807–834. PMLR, 2021.
  • CP [23] Xi Chen and Binghui Peng. Complexity of equilibria in first-price auctions under general tie-breaking rules. arXiv preprint arXiv:2303.16388, 2023.
  • CS [08] Vincent Conitzer and Tuomas Sandholm. New complexity results about Nash equilibria. Games and Economic Behavior, 63(2):621–641, 2008. Second World Congress of the Game Theory Society.
  • Das [13] Constantinos Daskalakis. On the complexity of approximating a Nash equilibrium. ACM Transactions on Algorithms (TALG), 9(3):1–35, 2013.
  • Das [22] Constantinos Daskalakis. Non-concave games: A challenge for game theory’s next 100 years. In Cowles Preprints, 2022.
  • DGP [09] Constantinos Daskalakis, Paul W. Goldberg, and Christos H. Papadimitriou. The complexity of computing a Nash equilibrium. SIAM Journal on Computing, 39(1):195–259, 2009.
  • DHLZ [22] Xiaotie Deng, Xinyan Hu, Tao Lin, and Weiqiang Zheng. Nash convergence of mean-based learning algorithms in first price auctions. In Proceedings of the ACM Web Conference 2022, pages 141–150, 2022.
  • DISZ [18] Constantinos Daskalakis, Andrew Ilyas, Vasilis Syrgkanis, and Haoyang Zeng. Training GANs with optimism. International Conference on Learning Representations (ICLR 2018), 2018.
  • DN [93] Paul Dupuis and Anna Nagurney. Dynamical systems and variational inequalities. Annals of Operations Research, 44:7–42, 1993.
  • DSZ [21] Constantinos Daskalakis, Stratis Skoulakis, and Manolis Zampetakis. The complexity of constrained min-max optimization. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 1466–1478, 2021.
  • FGL+ [21] Zhe Feng, Guru Guruganesh, Christopher Liaw, Aranyak Mehta, and Abhishek Sethi. Convergence analysis of no-regret bidding algorithms in repeated auctions. volume 35, pages 5399–5406, 2021.
  • FLN [16] Michal Feldman, Brendan Lucier, and Noam Nisan. Correlated and coarse equilibria of single-item auctions. In Yang Cai and Adrian Vetta, editors, Web and Internet Economics, pages 131–144, Berlin, Heidelberg, 2016. Springer Berlin Heidelberg.
  • FP [62] Miroslav Fiedler and Vlastimil Pták. On matrices with non-positive off-diagonal elements and positive principal minors. Czechoslovak Mathematical Journal, 12(3):382–400, 1962.
  • FRGH+ [21] Aris Filos-Ratsikas, Yiannis Giannakopoulos, Alexandros Hollender, Philip Lazos, and Diogo Poças. On the complexity of equilibrium computation in first-price auctions. In Proceedings of the 22nd ACM Conference on Economics and Computation, pages 454–476, 2021.
  • GGM [08] Geoffrey J. Gordon, Amy Greenwald, and Casey Marks. No-regret learning in convex games. In Proceedings of the 25th International Conference on Machine Learning, pages 360–367, 2008.
  • GJ [03] Amy Greenwald and Amir Jafari. A general class of no-regret learning algorithms and game-theoretic equilibria. In Learning Theory and Kernel Machines: 16th Annual Conference on Learning Theory and 7th Kernel Workshop, COLT/Kernel 2003, Washington, DC, USA, August 24-27, 2003. Proceedings, pages 2–12. Springer, 2003.
  • GLM [06] Amy Greenwald, Zheng Li, and Casey Marks. Bounds for regret-matching algorithms. In AI&M, 2006.
  • Gor [05] Geoffrey J. Gordon. No-regret algorithms for structured prediction problems. Technical report, Carnegie Mellon University, Center for Automated Learning and Discovery, 2005.
  • GZ [08] Peter W. Glynn and Assaf Zeevi. Bounding stationary expectations of Markov processes. In Markov Processes and Related Topics: A Festschrift for Thomas G. Kurtz, volume 4, pages 195–215. Institute of Mathematical Statistics, 2008.
  • HK [07] Elad Hazan and Satyen Kale. Computational equivalence of fixed points and no regret algorithms, and convergence to equilibria. Advances in Neural Information Processing Systems, 20, 2007.
  • HLLL [19] Wenqing Hu, Chris Junchi Li, Lei Li, and Jian-Guo Liu. On the diffusion approximation of nonconvex stochastic gradient descent. Annals of Mathematical Sciences and Applications, 4(1), 2019.
  • HMC [21] Nadav Hallak, Panayotis Mertikopoulos, and Volkan Cevher. Regret minimization in stochastic non-convex learning via a proximal-gradient approach. In International Conference on Machine Learning, pages 4008–4017. PMLR, 2021.
  • HS [88] John C. Harsanyi and Reinhard Selten, editors. A general theory of equilibrium selection in games. MIT Press, 1st edition, 1988.
  • HSZ [17] Elad Hazan, Karan Singh, and Cyril Zhang. Efficient regret minimization in non-convex games. In International Conference on Machine Learning, pages 1433–1441. PMLR, 2017.
  • JL [22] Yaonan Jin and Pinyan Lu. Settling the efficiency of the first-price auction. SIGecom Exch., 20:69–74, 2022.
  • KN [22] Yoav Kolumbus and Noam Nisan. Auctions between regret-minimizing agents. In Proceedings of the ACM Web Conference 2022, pages 100–111, 2022.
  • KP [09] Elias Koutsoupias and Christos Papadimitriou. Worst-case equilibria. Computer Science Review, 3(2):65–69, 2009.
  • LBR+ [19] Alistair Letcher, David Balduzzi, Sébastien Racanière, James Martens, Jakob Foerster, Karl Tuyls, and Thore Graepel. Differentiable game mechanics. Journal of Machine Learning Research, 20(84):1–40, 2019.
  • Lee [12] John M. Lee. Introduction to Smooth Manifolds. Springer New York, NY, 2012.
  • Lim [88] Elon L. Lima. The Jordan-Brouwer separation theorem for smooth hypersurfaces. The American Mathematical Monthly, 95(1):39–42, 1988.
  • LO [19] Xiaoyu Li and Francesco Orabona. On the convergence of stochastic gradient descent with adaptive stepsizes. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 983–992. PMLR, 2019.
  • LST [12] Renato Paes Leme, Vasilis Syrgkanis, and Éva Tardos. Sequential auctions and externalities. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, pages 869–886. SIAM, 2012.
  • MGP [20] Hassan Mortagy, Swati Gupta, and Sebastian Pokutta. Walking in the shadow: A new perspective on descent directions for constrained minimization. Advances in Neural Information Processing Systems, 33:12873–12883, 2020.
  • MPPS [22] Jason Milionis, Christos Papadimitriou, Georgios Piliouras, and Kelly Spendlove. Nash, Conley, and computation: Impossibility and incompleteness in game dynamics, 2022.
  • MS [96] Dov Monderer and Lloyd S. Shapley. Potential games. Games and Economic Behavior, 14(1):124–143, 1996.
  • MS [22] Carlos Martin and Tuomas Sandholm. Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks. arXiv preprint arXiv:2211.15936, 2022.
  • MZ [19] Panayotis Mertikopoulos and Zhengyuan Zhou. Learning in games with continuous action sets and unknown payoff functions. Math. Program., 173(1–2):465–507, January 2019.
  • Nas [50] John F. Nash. Equilibrium points in nn-person games. Proceedings of the National Academy of Sciences, 36(1):48–49, 1950.
  • Ngu [19] Kim Thang Nguyen. Primal-Dual Approaches in Online Algorithms, Algorithmic Game Theory and Online Learning. Habilitation à diriger des recherches, Université Paris Sorbonne, June 2019.
  • NR [10] Uri Nadav and Tim Roughgarden. The limits of smoothness: A primal-dual framework for price of anarchy bounds. In International Workshop on Internet and Network Economics, pages 319–326. Springer, 2010.
  • NZ [12] Anna Nagurney and Ding Zhang. Projected dynamical systems and variational inequalities with applications, volume 2. Springer Science & Business Media, 2012.
  • PP [16] Ioannis Panageas and Georgios Piliouras. Average case performance of replicator dynamics in potential games via computing regions of attraction. In Proceedings of the 2016 ACM Conference on Economics and Computation, pages 703–720, 2016.
  • PP [19] Christos Papadimitriou and Georgios Piliouras. Game dynamics as the meaning of a game. SIGecom Exch., 16(2):53–63, May 2019.
  • PPP [17] Gerasimos Palaiopanos, Ioannis Panageas, and Georgios Piliouras. Multiplicative weights update with constant step-size in congestion games: Convergence, limit cycles and chaos. Advances in Neural Information Processing Systems, 30, 2017.
  • Rob [51] Julia Robinson. An iterative method of solving a game. Annals of Mathematics, pages 296–301, 1951.
  • Ros [65] J. Ben Rosen. Existence and uniqueness of equilibrium points for concave nn-person games. Econometrica: Journal of the Econometric Society, pages 520–534, 1965.
  • Rou [05] Tim Roughgarden. Selfish routing and the price of anarchy. MIT press, 2005.
  • Rou [15] Tim Roughgarden. Intrinsic robustness of the price of anarchy. Journal of the ACM (JACM), 62(5):1–42, 2015.
  • San [20] William H. Sandholm. Evolutionary game theory. Complex Social and Behavioral Systems: Game Theory and Agent-Based Models, pages 573–608, 2020.
  • Sha [64] Lloyd Shapley. Some topics in two-person games. Advances in game theory, 52:1–29, 1964.
  • Sha [01] Alexander Shapiro. On duality theory of conic linear problems. Nonconvex Optimization and its Applications, 57:135–155, 2001.
  • SL [07] Gilles Stoltz and Gábor Lugosi. Learning correlated equilibria in games with compact sets of strategies. Games and Economic Behavior, 59(1):187–208, 2007.
  • SLS+ [24] Iosif Sakos, Stefanos Leonardos, Stelios Andrew Stavroulakis, William Overman, Ioannis Panageas, and Georgios Piliouras. Beating price of anarchy and gradient descent without regret in potential games. In The Twelfth International Conference on Learning Representations, 2024.
  • SM [03] Andreas S. Schulz and Nicolás E. Stier Moses. On the performance of user equilibria in traffic networks. In SODA, volume 3, pages 86–87, 2003.
  • TTRL [23] Shaolin Tan, Ye Tao, Maopeng Ran, and Hao Liu. On the convergence of distributed projected gradient play with heterogeneous learning rates in monotone games. Systems & Control Letters, 182:105654, 2023.
  • Vet [02] Adrian Vetta. Nash equilibria in competitive societies, with applications to facility location, traffic routing and auctions. In The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings., pages 416–425. IEEE, 2002.
  • You [04] H. Peyton Young. Strategic learning and its limits. Oxford University Press, 2004.
  • ZMA+ [18] Zhengyuan Zhou, Panayotis Mertikopoulos, Susan Athey, Nicholas Bambos, Peter W Glynn, and Yinyu Ye. Learning in games with lossy feedback. Advances in Neural Information Processing Systems, 31, 2018.