First-order (coarse) correlated equilibria in non-concave games
Abstract
We investigate first-order notions of correlated equilibria; distributions of actions for smooth, potentially non-concave games such that players do not incur any regret against small modifications to their strategies along a set of continuous vector fields. We define two such notions, based on local deviations and on stationarity of the distribution, and identify the notion of coarseness as the setting where the associated vector fields are in fact gradient fields. For coarse equilibria, we prove that online (projected) gradient descent has a universal approximation property for both variants of equilibrium. In the non-coarse setting, we instead reduce the problem of finding an equilibrium to fixed-point computation via the usual framework of Φ-regret minimisation, and identify tractable instances. Finally, we study the primal-dual framework for our notion of first-order equilibria. For coarse equilibria defined by a family of functions, we find that a dual bound on the worst-case expectation of a performance metric takes the form of a generalised Lyapunov function for the dynamics of the game. Specifically, usual primal-dual price of anarchy analysis for coarse correlated equilibria as well as the smoothness framework of Roughgarden are both equivalent to a problem of generalised Lyapunov function estimation. For non-coarse equilibria, we instead observe a vector field fit problem for the gradient dynamics of the game. These results follow from containment results in normal form games; the usual notion of a (coarse) correlated equilibrium is equivalent to our first-order local notions of (coarse) correlated equilibria with respect to an appropriately chosen set of vector fields.
1 Introduction
The central question we study is on notions of correlated equilibria in a smooth game; in abstract, given a set of players , compact & convex action sets and sufficiently smooth payoffs , a correlated equilibrium is a probability distribution on satisfying for some family of linear equilibrium constraints . Each equilibrium constraint is taken to capture in expectation players’ deviation incentives, i.e. the change in payoffs when players’ actions are drawn , and a player considers “modifying” their drawn strategy in some manner prescribed by .
Our motivation is two-fold; first, we seek an answer to a question posed in [29] regarding what an appropriate theory of non-concave games should be. Specifically, whereas a Nash equilibrium necessarily exists for a concave game, this need not be true when players’ utilities can be non-concave in their own actions. While traditional game theory has focused its attention on concave games, most settings relevant to the recent advances in machine learning are appropriately modelled as non-concave games. Moreover, whereas a first-order Nash equilibrium does exist in non-concave games, even its approximate versions are -hard to compute. This raises the question of what a tractable notion of equilibrium is for non-concave games.
Second, we aim to refine the notions of (coarse) correlated equilibria of normal form games to strengthen the analysis of their learnable outcomes via appeals to linear programming duality. A classic result from learning theory is that when all players employ no-internal (external) regret algorithms in a normal-form game, any convergent subsequence of the aggregate history of play converges to a (coarse) correlated equilibrium. Moreover, these notions of equilibria are given by a set of linear inequalities of size polynomial in the size of the game. However, no-regret learning can be too weak to explain the observed convergence of quintessential learning algorithms such as online gradient ascent (OGA) or the multiplicative weights update (MWU). One setting of particular importance is that of auctions, where empirical studies suggest Nash equilibria are learnable but the theoretical basis for why this is so has proven elusive. For instance, [10] shows that a variant of MWU (equivalent to dual averaging with an entropic barrier [61]) converges to (an approximate) equilibrium in a variety of single-item and combinatorial auction settings. However, the coarse correlated equilibria of complete information first-price auctions can be significantly different from the auction’s unique Nash equilibrium [36], and mean-based learning [13] only provides a partial answer [50, 31]. In turn, no-regret learning explains convergence to equilibrium in incomplete information first-price auctions only under stringent concavity assumptions on buyers’ prior distributions [1]. This apparent contradiction suggests that the incentive constraints associated with no-external-regret learning potentially form merely a subset of all the incentive constraints specific learning algorithms satisfy. In other words, there might exist hitherto unknown equilibrium refinements over coarse correlated equilibria, which are associated with learned outcomes when players utilise e.g. OGA or MWU.
1.1 Contributions and Technical Summary
Given our motivations, we seek notions of equilibrium that are tractably computable given access to payoff gradients, independent of concavity assumptions on the payoffs, and that contain (coarse) correlated equilibria of normal form games as special cases. Towards this end, we build upon the recent work of [18], who considered a local variant of the regret traditionally considered in the literature (cf. [39]), and showed that the associated equilibria are tractably computable via online gradient ascent for two such families of local strategy modifications. Our key insight is that the -strategy modifications considered in their work are generated by gradient fields of a suitable family of functions.
To provide intuition for our arguments, consider a smooth game with a set of players , compact and convex action sets for each player (with their product denoted ), and smooth utility functions for each player . Suppose all players implement online gradient ascent with the same adaptive step sizes at time . That is, if player chooses action at time , they will fix
To impose a correlated distribution of play as an equilibrium concept, let each be drawn with probability proportional to . Now consider a “local” deviation for player , in which they consider deviating in their strategy along the gradient of a function . That is, for , they consider .
What is the possible change in payoffs from such a local deviation? Consider for simplicity the case where no projections are ever required, either for the purposes of local deviations or during online gradient ascent, i.e. both and . In this case, the change in utility is given, after time steps and ignoring terms of order , by
The second equality here follows from a Taylor expansion of , and the fourth inequality from a Taylor expansion of . As a consequence, if the function which generates these local deviations is bounded, then under suitable assumptions on the step sizes, the regret against of order would vanish with time. Dividing both sides by and letting , we see that the expectation of , the “first-order local regret”, would vanish also.
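The quantity in question can also be estimated empirically. The following minimal sketch runs projected online gradient ascent on a hypothetical zero-sum game over the unit square and tracks the first-order local regret term against one deviation-generating function h; the game, the choice of h, and the use of the plain time average in place of the step-size-weighted distribution discussed below are all simplifying assumptions made for illustration.

```python
import numpy as np

# Hypothetical two-player smooth game on the unit square (an illustrative
# assumption, not an example from the paper): u_1(x) = sin(pi x_1) x_2 = -u_2(x).
def partial_grads(x):
    return np.array([
        np.pi * np.cos(np.pi * x[0]) * x[1],   # du_1 / dx_1
        -np.sin(np.pi * x[0]),                 # du_2 / dx_2
    ])

# A deviation-generating function h(x) = ||x - c||^2 / 2, whose gradient
# field x - c generates the local deviations (also an assumption).
c = np.array([0.5, 0.5])
grad_h = lambda x: x - c

project = lambda x: np.clip(x, 0.0, 1.0)  # projection onto the hypercube

T, x = 100_000, np.array([0.3, 0.8])
running_regret = 0.0
for t in range(1, T + 1):
    # First-order local regret term: sum_i <grad_i u_i(x), grad_i h(x)>.
    running_regret += (partial_grads(x) @ grad_h(x) - running_regret) / t
    x = project(x + partial_grads(x) / np.sqrt(t))  # equal steps eta_t = t^(-1/2)

print(f"time-averaged first-order local regret: {running_regret:+.5f}")
```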
However, when projections are involved, we would need to bound the losses incurred due to terms of the form , . This turns out to be a non-trivial task. Yet, when we consider the continuous motion induced by the game dynamics, defined via the differential equation , no such loss occurs, and we observe a declining regret bound through very straightforward differential arguments (cf. Section 3). This follows from an argument on the stationarity of the resulting distribution. Namely, given a smooth function , each action profile assigns a rate of change to via the projected gradient dynamics of the game; given ,
and the expectation of this quantity over a time period is simply equal to , which converges to zero. Intuitively, the longer we let the curve trace out, the closer it becomes to forming a distribution invariant under time translation. Exchanging which vector is projected onto the tangent cone, we observe that
except for a set of measure zero on the curve traced by the projected gradient dynamics of the game. This motivates our two first-order notions of equilibrium, one based on local deviation, the other based on stationarity.
Definition (Semiformal).
For , a distribution over is said to be an -local CE with respect to a family of Lipschitz continuous vector fields , if for every , . For an -stationary CE, we project instead of onto the tangent cone to player ’s action set at . Moreover, we call the equilibrium coarse if each vector field is equal to for some function .
In particular, since our high-level argument hints that projected gradient descent minimises regret with respect to the gradient field of any such function , we identify the notion of coarseness accordingly. Meanwhile, for more general correlated equilibria, we allow deviations generated by an arbitrary set of vector fields on .
As mentioned, the main technical barrier is in dealing with projections. Our chief technical innovation is thus in Section 3.1, where we free ourselves of dealing with them directly, by constructing an appropriate continuous (and almost everywhere differentiable) curve over which regret analysis is simple. We essentially convert the sequence of play to a continuous curve , “connecting the dots” by projecting a partial gradient step, scaled by some factors which allow us to modify the probability distribution we consider over . For instance, suppose that the initial strategies are , and each player fixes . We interpret this step as having been taken “over time ”, i.e. we let and . To extend to continuously, we then let for each player . This extension is continuous on , and more desirably, is differentiable almost everywhere with respect to the Lebesgue measure on the interval. Moreover, the choice corresponds to a simulation of the gradient dynamics of the game, whereas roughly corresponds to a uniform distribution over .
Bounding the regret then amounts to bounding the difference between , the velocity along the curve given by our “approximate” gradient dynamics for the game, and , the actual velocity prescribed by the dynamics of the game. When the boundaries of the action sets are regular hypersurfaces in Euclidean space, the difference turns out to be small, whereas if the action sets are polyhedra satisfying a positivity condition (Definition 10), the difference only depends on the Lipschitz moduli of the utilities. We remark that the sufficient conditions we identify for the action sets are satisfied by many settings of interest – they include e.g. the sphere, the hypercube and the simplex. Consequently, we obtain a proof that projected gradient ascent with equal step sizes has a universal approximation property.
Theorem (Informal, Theorems 3.4, 3.5, & 3.8).
Suppose that players all implement projected gradient ascent with step sizes . Further suppose that players’ action sets satisfy one of the above assumptions. Then, sampling uniformly from the trajectory of the resulting curve , we obtain a distribution which is both an -local and an -stationary CCE with respect to the set of gradient fields of all continuously differentiable functions with Lipschitz gradients. The regret bound is non-uniform among all such functions; it depends linearly on a bound on the magnitude of the gradient of .
That the aggregate regret over the players against first-order deviations generated by any such function vanishes depends crucially on the fact that when all players use the same step sizes, the time sequence simulates the underlying continuous projected gradient dynamics of the game. Intuitively, zero regret in the first order against any such deviation provides a convex description of solutions to the projected gradient dynamics, once the time-ordering information on the curve is erased. Thus, with potentially different step sizes which might have different asymptotic rates of decrease, or when some players can be adversarial, our methods fail to guarantee coarse regret bounds in such generality. However, first-order regret still vanishes when the deviations are restricted to vary only in the strategies of the players who do use the same step sizes. The result depends on a similarly defined curve , which interpolates the gradient steps of the relevant players.
Theorem (Informal, Theorems 3.9 & 5.3).
Suppose that a subset of players all implement projected gradient ascent with step sizes , while the other players are adversarial. Further suppose that these players’ action sets satisfy one of the regularity assumptions. Then, sampling uniformly from the trajectory of an appropriately defined curve , we obtain a distribution which is both an -local and an -stationary CCE with respect to the set of gradient fields of continuously differentiable functions with Lipschitz gradients whose value changes only in the strategies of the players in . Moreover, if “points inwards” to , players in also incur vanishing local regret in time-average.
In Section 4, we turn our attention to first-order correlated equilibria. Here, we observe that the usual methods for Φ-regret minimisation work for computing both stationary and local equilibria; the latter if the vector fields satisfy a tangency condition.
Theorem (Informal, Theorems 4.1 and 4.2).
Suppose that is a finite set of vector fields on . Then with access to fixed-point computation for all vector fields in the linear span of , an -stationary correlated equilibrium can be computed in iterations. An -local correlated equilibrium has the same approximability property if all are tangent to players’ action sets, with access to fixed-point computation for all vector fields in the conical span of instead.
The question is whether there exists an interesting setting in which fixed-point computation can be performed tractably. We identify one such setting; when each vector field is affine-linear, each fixed-point computation reduces to minimising a quadratic function. Example 1 shows that a careful selection of such vector fields results in last-iterate convergence to the equilibrium in the matching pennies game. This suggests that a centralised learning algorithm might not require optimism [32] or explicit second-order corrections to the utility gradients [52] to obtain last-iterate convergence to equilibrium in games.
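To illustrate why affine-linear fields make the oracle tractable, the sketch below makes the additional simplifying assumption that the field V(x) = Ax + b has A symmetric negative semi-definite, so that V is the gradient of a concave quadratic and fixed points of the projected dynamics coincide with maximisers of that quadratic over the set; a general affine field would instead call for a variational-inequality or LCP solver. The box constraint set and the random instance are assumptions for illustration.

```python
import numpy as np

# Illustrative affine field V(x) = A x + b on the box [0,1]^n. With A symmetric
# negative semi-definite, V = grad q for the concave quadratic
# q(x) = x^T A x / 2 + b^T x, so a fixed point of dx/dt = Pi_{T(x)} V(x)
# is exactly a maximiser of q over the box.
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = -(M @ M.T)                       # symmetric negative semi-definite
b = rng.standard_normal(n)

project = lambda x: np.clip(x, 0.0, 1.0)

x = np.full(n, 0.5)
step = 0.5 / np.linalg.norm(A, 2)    # safe step size for the smooth quadratic
for _ in range(20_000):              # projected gradient ascent on q
    x = project(x + step * (A @ x + b))

# Residual of the fixed-point condition x = Pi_X(x + V(x)).
print(np.linalg.norm(project(x + A @ x + b) - x))
```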
In Section 5, we focus on equilibrium analysis. Our goal here is to determine how our equilibria can be reasoned about. Towards this end, we adapt in Section 5.1 the usual primal-dual framework for equilibrium analysis, bounding the expectation of a quantity defined over the set of outcomes (e.g. welfare, distance to an equilibrium) for our equilibrium concepts. Our observation is that for a primal problem which seeks a best or worst-case correlated equilibrium, a dual solution provides information on the gradient dynamics of the game. In particular, if the primal problem maximises expected worst-case distance to an equilibrium and there exists a dual solution of value , the solution is a Lyapunov function by its usual definition. For correlated equilibria, the dual problem is instead a vector field fit problem, searching for a linear or conical combination of vector fields in which is well-aligned with the utility gradients.
Finally, in Section 5.2 we look at what first-order local equilibria look like for continuous extensions of normal form games. Here, we find an interesting equivalence result.
Proposition (Informal, Propositions 5.4, 5.6 & 5.7).
The usual notions of (coarse) correlated equilibrium, as well as the notion of an average coarse correlated equilibrium [64], are equivalent to a suitably defined first-order local (coarse) correlated equilibrium.
The equivalence here is in terms of equilibrium constraints on the resulting distributions of action profiles; a first-order correlated equilibrium defined with respect to an appropriate set of vector fields induces a distribution on action profiles that corresponds to one of the above equilibrium concepts. Moreover, the converse statement is also true. It is in this sense that the usual methods for primal-dual analysis of the price of anarchy are given an interpretation as “Lyapunov function estimation problems”. The equivalence also shows that our equilibrium notions may be considered true refinements and extensions of the usual notions of (coarse) correlated equilibria to the richer setting of non-concave games. Finally, we remark that, as desired, the set of vector fields corresponding to the incentive constraints for (average) coarse correlated equilibrium also satisfies our notion of coarseness for first-order correlated equilibria, justifying our nomenclature.
1.2 Related work
Game-theoretic analysis of multi-agent systems is often based first on the assertion of an equilibrium concept as the expected outcome, followed by an investigation of its properties. The classical assumption is that with self-interested agents, the outcome of the game should be a Nash equilibrium, which exists for finite normal-form games [62], or more generally, concave games [70].
In divergence from classical equilibrium theory for convex markets, the outcome of a game need not be socially optimal. The algorithmic game theory perspective has then been to interpret the class of equilibria in question through the lens of approximation algorithms. First proposed by [51], the price of anarchy measures the worst-case ratio between the outcome of a Nash equilibrium and the socially optimal one. A related concept measuring the performance of the best-case equilibrium outcome, the price of stability, was first employed by [78] and named so in [3]. Since then, a wealth of results has followed, proving constant welfare approximation bounds for e.g. selfish routing [71], facility location [80], and a variety of auction settings including single-item [49] and simultaneous [21] first-price auctions, and simultaneous second-price auctions [20], hinting at good performance of the associated mechanisms in practice. Meanwhile, deteriorating price of anarchy bounds, such as those for sequential first-price auctions [56], are interpreted as arguments against the use of such mechanisms.
However, the assertion that the outcome of a game should be a Nash equilibrium is problematic for several reasons, despite the guarantee that it exists in concave games. It is often the case that multiple equilibria exist in a game, in which case we are faced with an equilibrium selection problem [47]. Moreover, Nash equilibrium computation is computationally hard in general, even in the setting of concave games where its existence is guaranteed. In general, determining whether a pure strategy NE exists in a normal-form game is NP-complete [27], and even for two-player normal form games finding an exact or approximate mixed Nash equilibrium is PPAD-complete [17, 30, 28]. Similar results extend to auction settings; for instance, (1) finding a Bayes-Nash equilibrium in a simultaneous second-price auction is PP-hard [23], even when buyers’ valuations are restricted to be either additive or unit-demand, and (2) with subjective priors, computing an approximate equilibrium of a single-item first-price auction is PPAD-hard [38], a result that holds also with common priors when the tie-breaking rule is also part of the input [26].
Some answers to this problem come from learning theory. First, for special classes of games, when each agent employs a specific learning algorithm in repeated instances of the game, the outcome converges in average play to Nash equilibria. This is true for fictitious play [15] for the empirical distribution of players’ actions in a variety of classes of games, including zero-sum games [69], non-degenerate normal form games [8], or potential games [59]. For the case of zero-sum games, the same is true for a more general class of no-(internal [16] or external [81]) regret learning algorithms, while for general normal-form games they respectively converge to correlated or coarse correlated equilibria – convex generalisations of the set of Nash equilibria of a normal form game. The price of anarchy / stability approach can then be extended to such coarse notions of equilibria, with the smoothness framework of [72] for robust price of anarchy bounds being a prime example.
Unfortunately, this perspective falls short of a complete picture for several reasons. First, learning dynamics can exhibit arbitrarily high complexity. Commonly considered learning dynamics may cycle about equilibria, as is the case for fictitious play [74] or for multiplicative weights update [14]. Worse, learning dynamics can exhibit formally chaotic behaviour [68, 25], and bimatrix games may approximate arbitrary dynamical systems [4]. In fact, replicator dynamics on matrix games is Turing complete [5], and reachability problems for the dynamics are in general undecidable.
In the converse direction, the usual notion of no-regret learning can be too weak to capture learnable behaviour. In this case, the associated price of anarchy bounds may misrepresent the efficiency of actual learned outcomes. One motivating example here is that of auctions. For first-price single-item auctions, in the complete information setting there may exist coarse correlated equilibria with suboptimal welfare, even though the unique equilibrium of the auction is welfare optimal [36]. Meanwhile, in the incomplete information setting with symmetric priors, whether coarse correlated equilibria coincide with the unique equilibrium depends on informational assumptions on the equilibrium structure itself [7] and on convexity of the priors [1]. This is in apparent contradiction with recent empirical work which suggests the equilibria of an even wider class of auctions are learnable when agents implement deep-learning or (with full feedback) gradient-based no-regret learning algorithms [9, 10, 60].
This motivates the necessity of a more general notion of equilibrium analysis, stronger than coarse correlated equilibria for normal-form games and weaker than Nash equilibria, which nevertheless captures the guarantees of the above-mentioned settings and is tractable to approximate or reason about. For the case of auctions, one recent proposal has been that of mean-based learning algorithms [13], but even in that case the convergence results of [31, 35] are conditional.
There are two approaches towards the resolution of this question which, while not totally divorced in methodology, can be considered distinct in their philosophy. One approach has been to consider “game dynamics as the meaning of a game” [67], inspecting the existence of dynamics which converge to Nash equilibria, and extending price of anarchy analysis to include whether an equilibrium is reached with high probability. The work on the former has demonstrated impossibility results; there are games such that any gradient dynamics have starting points which do not converge to a Nash equilibrium, and for a set of games of positive measure no game dynamics may guarantee convergence to -Nash equilibria [11, 58]. Meanwhile, [66, 77] proposed the average price of anarchy as a refined performance metric accounting for the game dynamics. The average price of anarchy is defined as the expectation over the set of initial conditions for the welfare of reached Nash equilibria, for a fixed gradient-based learning dynamics for the game.
Another approach has been to establish the computational complexity of local notions of equilibria. This has attracted attention especially in the setting of non-concave games, where the existence of Nash equilibria is no longer guaranteed, due to the success of recent practical advances in machine learning via embracing non-convexity [29]. However, approximate minmax optimisation is yet again PPAD-complete [34]. As a consequence, unless PPAD = FP, a tractably approximable local equilibrium concept for non-concave games with compact & convex action sets must necessarily be coarser. Towards this end, [48, 46] define a notion of regret that is based on a sliding average of players’ payoff functions. Meanwhile, a recent proposal by [18] has been to define a local correlated equilibrium, a distribution over action profiles and a set of local deviations such that, approximately, no player may significantly improve their payoffs when they deviate locally pointwise in the support of the distribution. They are then able to show that for two classes of local deviations, such an approximate local correlated equilibrium is tractably computable.
Our goal in this paper is then to address the question of an equilibrium concept which (1) is also valid for non-concave games, (2) is stronger than coarse correlated equilibria for normal form games, (3) is tractable to approximate, and (4) admits a suitable extension of the usual primal-dual framework for bounding the expectation of quantities over the set of coarse correlated equilibria. The latter necessitates not only a distribution of play, but also incentive constraints which are specified only depending on the resulting distribution and not its time-ordered history. We remark that the framework of [48] falls short in the latter aspect when the cyclic behaviour of projected gradient ascent results in a non-vanishing regret bound. We thus turn our attention to generalising the work of [18] on a local notion of Φ-equilibrium [40].
Strikingly, in doing so, we demonstrate the intrinsic link between such local notions of coarse equilibria and the dynamical systems perspective on learning in games. In particular, the two local notions of -equilibria defined in [18] are subclasses of what we dub local coarse correlated equilibria, distributions of play such that agents in aggregate do not have any strict incentive for infinitesimal changes of the support of the distribution along any gradient field over the set of action profiles. The history of play induced by online (projected) gradient ascent then approximates such an equilibrium, by virtue of approximating a time-invariant distribution for the game’s gradient dynamics. Extending the usual primal-dual scheme for price of anarchy bounds then reveals that any dual proof of performance bounds is necessarily of the form of a “generalised” Lyapunov function for the quantity whose expectation is to be bounded. The usual LP framework for coarse correlated equilibria is in fact contained in this approach, its dual optimal solutions corresponding to a “best-fit” quadratic Lyapunov function.
Our approach in proving our results combines insights explored in two previous works. The existence and uniqueness of solutions to projected dynamical systems over polyhedral sets is due to [33], and in our approximation proofs we also define a history over a continuous time interval. However, our analysis differs as we are not interested in approximating the continuous time projected dynamical system itself over the entire time interval; an approach that would doom our endeavour for tractable approximations in the presence of chaotic behaviour, which is not ruled out under our assumptions [25, 4]. Instead, we are interested in showing the approximate stationarity of expectations of quantities. Therefore, for projected gradient ascent, we suitably extrapolate the history of play into a piecewise differentiable curve, in the fashion of the projection curve studied by [57]. We then identify settings in which this curve moves approximately along the payoff gradients at each point in time, via consideration of the properties of the boundary of the action sets in question.
Meanwhile, whereas Lyapunov-function based arguments are not new in the analysis of convergence to equilibria in economic settings (e.g. [65]), in evolutionary game theory (cf. [73] for an in-depth discussion), and in learning in games (e.g. [61, 82]), our perspective of bounding expectations of quantities appears to be relatively unexplored. Most Lyapunov-function based arguments in the literature are concerned with pointwise (local) convergence to a unique Nash equilibrium, and work under the assumption of monotonicity or variational stability, or the existence of a potential function. The former two conditions imply the existence of a quadratic Lyapunov function for the game’s projected gradient dynamics, from which Lyapunov functions for alternate learning processes may be constructed. One exception is [43], which deals explicitly with the problem of bounding expectations of stable distributions of Markov processes. Moreover, it is there that the dual solution of the LP bounding the expectation of some function is dubbed a “Lyapunov function”. The continuous time gradient dynamics we study is of course a Markov process, and a rather “boring” one in the sense that it is fully deterministic; this motivates us to call any of our dual solutions a generalised Lyapunov function. However, the results of [43] do not include approximations of the Markov process, nor how Lyapunov-function based dual proofs extend to such approximations. Moreover, it is unclear how their results apply to projected gradient ascent with polyhedral action sets. We are able to provide positive answers on both fronts.
In turn, our analysis of local correlated equilibria depends on the more established framework of Φ-regret minimisation. Our techniques here are essentially those used in [41, 39]. One key difference is our vector field formulation; usual swap-regret minimisation [76, 39, 41] considers mappings of the action space into itself, while we measure regret against their differential generators. The consequences are reminiscent of the result of [44] on the equivalence of no-regret learning and fixed point computation, in that we require access to a fixed-point oracle for the linear (or conical) combinations of the vector fields in question to implement our regret minimisation algorithms. On the other hand, our vector field formulation allows us to refine the notion of correlated equilibria to a family of vector fields for which fixed-point computation is tractable; we are unaware of any similar prior observation.
2 Preliminaries
In what follows, denotes the set of natural numbers (including 0), and we identify also with each the set . Following standard notation, for a given , any tuple indexed by (for instance, vectors or families of sets indexed by ), and any , denotes the tuple with the ’th coordinate dropped. Meanwhile, for a tuple , some , and , is the tuple where is replaced by in . In addition, given some , for each we will denote by the standard basis vector in whose ’th component equals one and all others zero. Given a compact and convex set , and some , we let and respectively denote the tangent and normal cones to at , i.e.
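(in standard form; $C$ and $z \in C$ are placeholder symbols for the compact and convex set and base point in question)

\[
T_C(z) \;=\; \operatorname{cl}\operatorname{conv}\bigl\{\, \lambda (z' - z) \;:\; z' \in C,\ \lambda \ge 0 \,\bigr\},
\qquad
N_C(z) \;=\; \bigl\{\, v \;:\; \langle v,\, z' - z \rangle \le 0 \ \text{for all } z' \in C \,\bigr\}.
\]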
Here, denotes the convex closure of a set, and is the standard inner product of and in . In turn, for any , and any , we write for the standard Euclidean norm on , and then for any compact and convex set , for its diameter and for the projection of onto . Given such that each for some natural number , some , and a differentiable function , denotes the usual gradient of , while is the vector . Finally, while tracking the “time evolution” of quantities, we will opt for to denote discrete time, and to denote continuous time.
Definition 1.
The data of a smooth game is specified by a tuple , where
-
1.
is the number (and by choice of notation, the set) of players.
-
2.
For each , is the action set of player . We assume that is a compact and convex subset of for some , and denote by the set of outcomes of the game. We shall also write as the total dimension of the set of outcomes, .
-
3.
For each , is the utility function of player . Each is assumed to be continuously differentiable and have Lipschitz gradients, that is to say, there exists such that for any ,
We will denote and , the full vector of the bounds on players’ gradients and Lipschitz coefficients.
Theoretical analysis of a game is often done by endowing it with an equilibrium concept, which specifies the “expected, stable outcome” of a given game. The standard equilibrium concept is that of a Nash equilibrium (NE), an outcome of a game such that for any player , and any action , . Whenever all are concave over given any fixed , such an equilibrium necessarily exists [70]. However, for a generic smooth game, an NE need not exist [29], which motivates the notion of a first-order Nash equilibrium; an outcome is called a first-order NE if for every player , .
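Spelled out in placeholder notation, with $\nabla_i u_i(x)$ the gradient of player $i$'s utility in their own actions and $T_{X_i}(x_i)$ the tangent cone above, the standard first-order condition reads

\[
\langle \nabla_i u_i(x),\, x_i' - x_i \rangle \;\le\; 0 \quad \text{for all } x_i' \in X_i,
\qquad \text{equivalently} \qquad
\Pi_{T_{X_i}(x_i)}\bigl(\nabla_i u_i(x)\bigr) = 0,
\]

the equivalence of the two forms being the polarity of the tangent and normal cones (Moreau's decomposition).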
By fixed-point arguments, a first-order NE necessarily exists for a smooth game. However, the computation of such an equilibrium is not known to be tractable at present. This has led to questions on whether there exists a correlated variant of a first-order NE; keeping in mind that correlated and coarse correlated equilibria (respectively, CE and CCE) of a finite game are computable in time polynomial in the size of its normal-form representation, one would ask whether such a local CCE is also tractable to compute or at least approximate. Towards this end, [18] propose one definition of a local -equilibrium, analogous to the definition of Φ-regret for finite games [40].
Definition 2 ([18], Definition 1).
For , a distribution over is said to be an -approximate equilibrium if for any and , .
Definition 3 ([18], Definition 3).
For some and a player , is called a family of -local strategy modifications if for each element of and each , .
Definitions 2 and 3 together thus provide a local notion of correlated equilibrium, insofar as players consider modifying their strategies by a -small step after they are drawn (indeed, an earlier version of [18] called this equilibrium concept an -local correlated equilibrium; in light of the results in Section 5.2, we retain the nomenclature). The equilibrium concept depends on the families of local strategy modifications, and [18] show the tractable approximability of such equilibria for two such families,
An immediate observation is that both and as -strategy modifications are provided by families of gradient fields of functions over . Specifically, we may write
(1)
(2)
And of course, we may extend each function to one , by letting it depend only on the action of player ; for such a function , for any . In what follows, we will exploit this observation and define alternate, differential definitions of local correlated equilibria and of their coarseness. In particular, we shall identify the appropriate notion of coarseness to be when the equilibrium constraints are all provided by gradient fields – as in this case, online gradient ascent will provide a universal approximation algorithm. To define our differential notion of equilibria, we first modify Definitions 2 and 3 to apply only to sufficiently smooth vector fields.
Definition 4.
For and , a distribution over is said to be an -local CE with respect to a family of Lipschitz continuous vector fields , if for every ,
(3) |
Here, and are respectively bounds on the magnitude of and the Lipschitz coefficient of , analogous to and . If , is simply called a local CE.
Definition 5 (Coarseness).
For Definition 4 and in all subsequent definitions in this section, if the family is given as a family of gradient fields for a subset of continuously differentiable functions, is said to be a coarse equilibrium.
We remark on the difference in the role of in Definitions 2 and 3 versus that of our Definition 4; in the former, is an absolute bound on the violation of the equilibrium constraints, while in the latter is a multiplicative bound on the expectation of the sum, over the set of players, of the derivatives of in direction . Another difference between the two definitions is that Definition 4 considers arbitrary suitably smooth vector fields and not only those defined solely on for some . Finally, we remark that our bounds can be non-uniform, with regret against deteriorating potentially as a polynomial in the bounds on the magnitude and Lipschitz moduli of players’ utilities and those of (quotienting out this polynomial has the convenient side effect that it allows for simpler ways to write ).
Dividing both sides of (3) by δ and taking the limit δ ↓ 0 for an -local CE gives the following differential definition.
Definition 6.
For , a distribution over is said to be an -local CE with respect to a family of Lipschitz continuous vector fields , if for every ,
(4) |
Via an appeal to the projected gradient dynamics of the smooth game, we will show that such notions of local coarse correlated equilibria are in fact weakenings of the equilibrium concept we obtain by demanding that our distribution be invariant under time translation for the projected gradient dynamics. Such dynamics are given by the system of differential equations,
(5) |
with initial conditions for each player . By [19], for any initial conditions , there is a unique absolutely continuous solution for such that (5) holds almost everywhere.
Now, suppose that a distribution is invariant under time translation, that is to say, for any measurable set and any , the set is measurable, and moreover, . In this case, for any continuously differentiable function , at . In particular, whenever the expectation and the time derivative commute,
Thus distributions which are stationary under the gradient dynamics can be viewed as satisfying a different family of no-regret properties, each stating that the expected value of a given function does not change under time translation. First-order Nash equilibria of course satisfy such a no-regret property; if is a fixed-point of the gradient dynamics of the game, then for every player . Therefore, for any vector , . Together, these observations motivate our notion of stationary correlated equilibria.
Definition 7.
For , a distribution is said to be an -stationary CE with respect to a family of Lipschitz continuous vector fields , if for every ,
(6) |
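To compare the two differential constraints side by side, a plausible reading of the displays (4) and (6), consistent with the semiformal definition in Section 1.1 and written in placeholder notation with $f = (f_1, \dots, f_n) \in F$, is

\[
\text{(local)}\quad
\mathbb{E}_{x \sim \sigma}\Bigl[\,\sum_{i} \bigl\langle \nabla_i u_i(x),\, \Pi_{T_{X_i}(x_i)} f_i(x) \bigr\rangle\Bigr] \le \varepsilon,
\qquad
\text{(stationary)}\quad
\Bigl|\, \mathbb{E}_{x \sim \sigma}\Bigl[\,\sum_{i} \bigl\langle f_i(x),\, \Pi_{T_{X_i}(x_i)} \nabla_i u_i(x) \bigr\rangle\Bigr] \,\Bigr| \le \varepsilon,
\]

the only difference being which vector is projected onto the tangent cone, and with the stationarity constraint double-sided (cf. the remark preceding Lemma 3.1). The placeholder $\varepsilon$ absorbs the non-uniform polynomial factors discussed after Definition 5.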
3 Approximability of First-Order Coarse Correlated Equilibria
A primary objective of our work is to show that the above notions of -local or -stationary CCE are universally approximable in some “game theoretically relevant” settings. Here, by universal we mean that in Definition 5 we may take to be the set of all differentiable functions with Lipschitz gradients and nevertheless tractably compute approximate local or stationary equilibria. We shall establish this by showing that the projected gradient dynamics obtains an approximate stationary CCE, and that such stationary CCE are necessarily also local CCE. Whereas for the purposes of practical algorithms we will need to investigate the case when all players take discrete steps in time, the form of the analysis is motivated by how the projected gradient dynamics yields our desired approximation.
To wit, let be given, and consider sampling uniformly from the trajectory of the projected gradient dynamics given initial conditions . That is, we consider the distribution on by drawing and sampling . In this case,
Now, by the bound on the gradient of , , which immediately shows that we obtain an -stationary CCE with and . Interestingly, we do not yet need to invoke any bounds on the magnitude of the utility gradients nor any Lipschitz moduli.
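The computation behind this bound is the fundamental theorem of calculus applied along the trajectory: for any continuously differentiable $h$, and since (5) holds almost everywhere along the solution,

\[
\mathbb{E}_{x \sim \sigma}\Bigl[\,\sum_i \bigl\langle \nabla_i h(x),\, \Pi_{T_{X_i}(x_i)} \nabla_i u_i(x) \bigr\rangle\Bigr]
= \frac{1}{T}\int_0^T \frac{d}{dt}\, h\bigl(x(t)\bigr)\, dt
= \frac{h(x(T)) - h(x(0))}{T},
\]

whose magnitude is at most $G_h \cdot \operatorname{diam}(X)/T$ for a bound $G_h$ on the magnitude of the gradient of $h$ (placeholder symbols).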
Now, suppose for a simple example that each is the -dimensional -hypercube. In this case, we will argue that
which implies that the given distribution is an -local CCE with respect to with the very same . We shall do so by showing that, for each and each ,
(7) |
except on a subset of measure zero of . Here, we suppress the time-dependence for the purpose of parenthesis economy. In this case, first note that
As a consequence,
Notice that, by the definition of the tangent and normal cones, on the second line, both the left-hand side and the right-hand side specify quantities which are non-positive. Therefore, it is sufficient to show that the right-hand side equals for almost every .
For the case of the hypercube, we may compute projections coordinate-wise. So suppose that at time , . This can only be the case when , hence for some , on the interval , no hypercube constraint for coordinate binds. As a consequence, for , we have . Now, denote by the set of in which we incur a projection loss, . Then , as the intervals are pairwise disjoint. This in turn implies that is countable, as the sum of a set of strictly positive numbers of uncountable cardinality is unbounded. Therefore, is countable. This is a superset (with potential equality) of the set of for which (7) fails to hold for some player , which implies our desired result.
3.1 Approximate projected gradient ascent
The arguments presented in Section 3 form the basis of our proofs of tractable approximability of local and stationary CCE. In particular, after proving the approximability of a stationary CCE, the approximability of a local CCE will follow by arguing that the zero-measure property above is maintained for the analogue of (7).
There is yet a hiccup, though; most learning algorithms in practice take discrete steps. Whereas [33] (Theorem 3) implies that a discrete analogue of (5) would approximate the continuous dynamics uniformly over an interval for sufficiently small step sizes, their result does not come with quantitative approximation guarantees. And besides, our formulation of the gradient dynamics of a game is general enough that we cannot yet rule out chaotic behaviour (as in e.g. [68]).
Thus we want to avoid any attempt at a uniform approximation of a solution of (5). Instead, we shall consider the case when each player uses projected gradient ascent with equal, adaptive learning rates as their learning algorithm. We shall then extend this to a piecewise-defined continuous curve as an approximation of the underlying projected gradient dynamics. This parametrised curve might actually diverge from actual solutions of (5), and would do so significantly in the presence of chaos. But we shall see in the following subsections that this does not end up mattering for the purposes of vanishing regret bounds. Moreover, our discussion in Section 5 will show that, for the purposes of bounding expectations of quantities, regret bounds for the continuous curve are “as-if” they were for the corresponding weighted average distribution of play. All in all, our discrete-time learning algorithm and the resulting continuous curve of interest are defined as follows:
Definition 8.
Given a sequence of decreasing step sizes such that and , projected gradient ascent with equal, adaptive learning rates has each player play at time , and update their strategies
After time steps, the learning dynamics outputs a history of play. Let be any sequence of weights. We proceed by defining via this history an approximate, -accelerated projected gradient dynamics for the game, extending the history in a piecewise fashion to a curve where . Towards this end, if there exists such that , then . Otherwise, for , let , and fix
We will then denote and for the which determines .
Our notion of an approximate projected gradient dynamics is obtained, in a sense, by connecting the “dots” traced by projected gradient ascent, where the time experienced by the curve between time steps and equals . We will focus on two choices of ; the weights will allow us to track regret when we consider the smoothed time-average distribution, whereas the weights will quantify results for the smoothed trajectory, which remains faithful to the “proper time” experienced by the curve. This is certainly not the only way in which one could obtain such a continuous curve – one could, for instance, consider instead the piecewise linear extension. However, as will become apparent in the subsequent sections, our proofs of approximability depend crucially on whether the velocity of the curve is sufficiently close to the tangent cone projection of the utility gradient (scaled by ), and our definition ensures this for the two classes of compact and convex action sets we consider.
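A minimal sketch of this construction, under illustrative assumptions (the toy game from the sketch in Section 1.1, the “proper time” choice of weights equal to the step sizes, and uniform sampling over the curve's lifetime): between grid times the curve plays a partially completed gradient step and projects, rather than interpolating the iterates linearly.

```python
import numpy as np

def partial_grads(x):                      # toy game from the earlier sketch
    return np.array([np.pi * np.cos(np.pi * x[0]) * x[1],
                     -np.sin(np.pi * x[0])])

project = lambda x: np.clip(x, 0.0, 1.0)   # hypercube action sets

# Projected gradient ascent with equal, adaptive step sizes eta_t = t^(-1/2).
T, x = 5_000, np.array([0.3, 0.8])
iterates, etas = [x.copy()], []
for t in range(1, T + 1):
    etas.append(t ** -0.5)
    x = project(x + etas[-1] * partial_grads(x))
    iterates.append(x.copy())

# Grid times tau_t, with the weight of step t equal to eta_t ("proper time").
taus = np.concatenate([[0.0], np.cumsum(etas)])

def curve(tau):
    """x(tau): project a partially completed gradient step at the active t."""
    t = min(int(np.searchsorted(taus, tau, side="right")) - 1, T - 1)
    frac = (tau - taus[t]) / (taus[t + 1] - taus[t])
    return project(iterates[t] + frac * etas[t] * partial_grads(iterates[t]))

# The induced distribution: sample tau uniformly over the curve's lifetime.
rng = np.random.default_rng(0)
samples = np.array([curve(rng.uniform(0.0, taus[-1])) for _ in range(1_000)])
print("mean of sampled action profiles:", samples.mean(axis=0))
```

Note that at the end of each piece the partially completed, projected step lands exactly on the next iterate, so the curve does connect the “dots”.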
To obtain regret bounds given the weights , we will use the following lemma throughout, which follows standard arguments (e.g. Theorem 2.3, [22]) to bound “almost” telescoping sums; but for bounds relating to stationary CCE, we need our bound to be double-sided.
Lemma 3.1.
Let be a compact & convex set, be an arbitrary sequence in , and a function with bounded, Lipschitz gradients. Then for any positive sequence which is non-decreasing and any integer ,
Proof.
Follows from simple rearrangement of the sum,
Of course, the empty sum has value when . If , we note that since is non-decreasing, for any . Therefore, we obtain an upper bound when all are fixed to their bound in absolute value, . ∎
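Concretely, the rearrangement is summation by parts: writing $(\lambda_t)$ for the non-decreasing positive sequence of the lemma (a placeholder symbol) and $h(x^t)$ for the values along the sequence,

\[
\sum_{t=1}^{T} \lambda_t \bigl( h(x^{t+1}) - h(x^{t}) \bigr)
= \lambda_T\, h(x^{T+1}) - \lambda_1\, h(x^{1})
- \sum_{t=1}^{T-1} \bigl( \lambda_{t+1} - \lambda_t \bigr)\, h(x^{t+1}),
\]

so that if $|h| \le B$ on the set, the right-hand side is at most $\lambda_T B + \lambda_1 B + (\lambda_T - \lambda_1) B \le 2 \lambda_T B$ in absolute value, a double-sided bound as required.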
3.2 Compact & convex action sets of smooth boundary
Our first approximability result is in the setting in which the boundary of each is a regular hypersurface in , where all of its principal curvatures are bounded by some . For a reader without a background in differential geometry, we provide a very brief account of the facts on hypersurfaces in Euclidean space that we will need for our analysis.
Definition 9 (Follows from [53] Proposition 5.16, mentioned in e.g. [54]).
A regular hypersurface in is a set such that for every point , there exists an open set and a smooth function , such that and .
Namely, a regular hypersurface is at each point defined locally via an equation for some function . The plane tangent to at is given by the equation . Working with the basis in such that has only its ’th coordinate non-zero, admits a Taylor expansion
The implicit function theorem allows us to write as a function of other in a neighbourhood of on . This dependence is dominated by the quadratic terms involving , thus for some symmetric matrix ,
This matrix is called the (scalar) second fundamental form. As is symmetric, it has an orthonormal basis of eigenvectors with real eigenvalues. Each such eigenvector corresponds to a principal direction, and the corresponding eigenvalue is the principal curvature in that direction.
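In these coordinates, the boundary is thus locally the graph (up to higher-order terms, and up to the choice of orientation of the last axis; placeholder symbols)

\[
x_n \;=\; \tfrac{1}{2}\, y^{\top} A\, y + o(\|y\|^2)
\;=\; \tfrac{1}{2} \sum_{j=1}^{n-1} \kappa_j\, y_j^2 + o(\|y\|^2),
\]

where $y$ collects the remaining coordinates expressed in the basis of principal directions and $\kappa_j$ are the principal curvatures; for the sphere of radius $r$, for instance, every $\kappa_j = 1/r$.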
Thus, if the boundary of each action set is a regular hypersurface of bounded principal curvature, then the boundary of can be approximated by the level set of a quadratic function about each point. We also remark that the convexity assumption on implies that if , then all principal curvatures are bounded below by . With this in mind, we proceed with our proofs of approximability. The first step of the analysis consists of bounding the difference between and the tangent cone projection . This turns out to be a straightforward matter in this setting, thanks to the following observation.
Proposition 3.2.
Let be a compact and convex set of smooth boundary, and suppose its boundary has non-negative principal curvature at most . Then for any ,
Proof.
Fix arbitrarily. We shall work in coordinates where is at the origin (i.e. coordinates where , which can be assured by some translation; this ensures that we do not need to write out the for , nor the difference between and the point that defines pre-projection, and makes the specification of the minimisation problem a bit easier to read), and
for some , where each are principal directions of curvature for at . In this case by the smoothness of the boundary, for a small neighbourhood about , is well-approximated (up to second order in ) by the convex body,
where is the (principal) curvature in direction for the surface at point . Note that by assumption, for every . As a consequence, at time for small , and the solution to the projection problem
agree up to first-order terms in . At any solution, the constraint will bind, leading us to the unconstrained optimisation problem
Now, the first-order optimality conditions are
For sufficiently small , at an optimal solution for each and . Therefore, up to first-order in , for each ,
This in turn implies that
Therefore, the difference between the velocity of motion and the scaled projection to the tangent cone of the utility gradient is, for any principal direction ,
By the bound on the magnitude of and the feasibility of , . In turn, , by which we have the desired result. ∎
In particular, our bound on the curvature loss from Proposition 3.2 shows that at each step , the cumulative magnitude of the difference between the velocities for the curve implied by the true projected gradient dynamics of the game and that of the approximate projected gradient dynamics is of order , no matter how the term adjusts the velocity of the curve.
Lemma 3.3.
Whenever is a compact and convex set with smooth boundary of bounded principal curvature , approximate -accelerated gradient dynamics satisfies, for every time step with ,
independent of the choice of .
Proof.
To see the result, we shall write, for any , noting that and ,
(8)
(9)
We then proceed by taking the absolute value of both sides and applying the triangle inequality. The magnitude of (8) is bounded, by the Cauchy-Schwarz inequality,
Here, the very last inequality uses the bounds on the magnitude of the gradients and the Lipschitz coefficient of , that , and that the projection operators onto closed convex sets in Euclidean space are non-expansive. Meanwhile, the second term (9) satisfies, by Proposition 3.2 and the Cauchy-Schwarz inequality,
Combining the above bounds, we have
∎
Combining Lemmata 3.1 and 3.3 then allows us to obtain approximation bounds for an -stationary CCE.
Theorem 3.4.
Whenever all are compact and convex sets with smooth boundary of bounded principal curvature , the approximate -accelerated projected gradient dynamics satisfies,
(10)
As a consequence, for step sizes , the approximate -accelerated projected gradient dynamics yields, via sampling where , an -stationary CCE where
-
1.
for the choice , , and
-
2.
for the choice , ,
with respect to the set of all functions with Lipschitz gradients. In this case, for Definition 7 of a stationary CCE,
Proof.
It remains to show the approximability of an -local CCE with respect to the set of all differentiable functions with Lipschitz gradients. For a set with a smooth boundary, at any time step, there is effectively one “co-moving” linear constraint that binds; the normal cone at any is either a half-line or is equal to . This essentially allows us to repeat the tangency arguments we employed for the toy example of the hypercube in Section 3.
Theorem 3.5.
Whenever all are compact and convex sets with smooth boundary of bounded principal curvature , the approximate -accelerated projected gradient dynamics satisfies,
(12)
As a consequence, for step sizes , the very same guarantees of Theorem 3.4 applies also for -local CCE.
Proof.
Note that for any and for every player ,
(13)
(14)
(15)
(16)
The terms (13) and (15) together yield the term, as in the proof of Theorem 3.4. Meanwhile, (14) is by definition of the tangent and the normal cones. As for (16), the term is positive if and only if the normal vector component of is strictly less than ; i.e. points strictly inwards to . Therefore, is strictly in the interior of , and thus for some positive , is in the interior of for . We conclude that the set of such has measure zero by the cardinality arguments made in the case of hypercubes in Section 3. As a consequence, for almost every ,
which implies that (16) yields the term, again as in the proof of Theorem 3.4. ∎
The discussion in this section demonstrates common themes regarding the connection between local CCE, stationary CCE, and the projected gradient dynamics of the underlying smooth game. Approximate projected gradient dynamics provides an approximate stationary CCE, and simultaneously an approximate local CCE, as the tangency of the motion through the strategy space implies that as far as the gradient dynamics are concerned, a local CCE is a relaxation of a stationary CCE. Meanwhile, the complexity of approximation is determined via the properties of the boundary of the strategy space, with approximation guarantees deteriorating linearly with the maximum curvature.
3.3 Polyhedral action sets
We now turn our attention to the approximability of stationary and local CCE for smooth games, when the action set of each player is polyhedral; that is to say, for each player , there exists and such that . We shall assume, without loss of generality, that all involved polyhedra have volume in , and each has no redundant inequalities. Moreover, the rows of each will be assumed to be normalised, . With that in mind, we begin by defining the class of polyhedra of interest, over which approximation of stationary or local CCE is “easy”.
Definition 10.
A polyhedron is said to be acute if for every distinct , .
Examples of acute polyhedra include potentially the most “game-theoretically relevant” polyhedra, the hypercube and the simplex. That the hypercube is acute in this sense is straightforward: for each , the constraints and have negative inner product for the corresponding rows of , while all other distinct pairs of rows of are orthogonal. In turn, for the simplex, factoring out the constraint , we are left with the set of inequalities,
For any distinct , the inner product of the vectors associated with the left-hand side of these constraints is then
As mentioned, the importance of acute polyhedra is that linear optimisation over them is trivial; a greedy algorithm suffices. Moreover, over such polyhedra always follows the tangent cone projection of (modulo scaling by ), rendering the approximate projected gradient dynamics faithful. The latter statement is the one we need for the desired approximation bounds, which is proven in the following lemma.
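The acuteness of both examples can also be verified mechanically. In the sketch below, the simplex rows are obtained by projecting the normals of the non-negativity constraints onto the hyperplane where coordinates sum to zero; this is one concrete reading of “factoring out” the equality constraint, an assumption about the intended reduction.

```python
import numpy as np
from itertools import combinations

def is_acute(A, tol=1e-9):
    """Definition 10: every pair of distinct (normalised) rows of A has
    non-positive inner product."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    return all(A[i] @ A[j] <= tol for i, j in combinations(range(len(A)), 2))

n = 4
# Hypercube [0,1]^n: rows for x_k <= 1 and -x_k <= 0.
hypercube = np.vstack([np.eye(n), -np.eye(n)])

# Simplex: normals of -x_k <= 0 projected onto the hyperplane 1^T x = 0,
# i.e. rows -(e_k - (1/n) 1); distinct pairs have inner product -1/n < 0.
simplex = -(np.eye(n) - np.ones((n, n)) / n)

print(is_acute(hypercube), is_acute(simplex))  # expected: True True
```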
Lemma 3.6.
Suppose that is an acute polyhedron in , , and . Then for ,
Proof.
Without loss of generality, we will work in coordinates such that . Consider the projection problem for any ,
Its dual problem is given,
Now, given a sequence decreasing to , there is an infinitely repeated set of active rows for the optimal solutions to the dual projection problem. This is necessarily the set of active and relevant constraints at time , and for the corresponding submatrix of ,
for sufficiently close to . In particular, whenever and zero otherwise, while .
Now, the acuteness condition implies that the off-diagonal elements of are non-positive, whereas it is both positive semi-definite and invertible, implying it is positive definite. Therefore, by [37] (Theorem 4.3, and ), has all of its entries non-negative. Meanwhile, by feasibility of ( in our choice of coordinates), has all of its entries non-negative, which implies that also has only non-negative entries. Therefore, for every row which is active, and zero otherwise.
Thus all that remains to show is that . Let be the set of constraints which bind at (but potentially, for ), and the associated submatrix of . Then consider the tangent cone projection problem
The dual projection problem is then,
We need to show that is an optimal solution. But this is immediate now, as the feasibility of for small implies that is a feasible solution to the tangent cone projection problem, with solution value . Meanwhile, the given is dual feasible, with the same solution value. By weak duality, both solutions are necessarily optimal. ∎
Remark. The implications of Lemma 3.6 were already proven for the hypercube and the simplex in [57], in which the authors study approximations of projected gradient dynamics. Indeed, the analysis here is in a similar vein in that we track the time evolution of the projection curve, though we identify and exploit features of the polyhedron which make linear optimisation over it trivial.
It turns out that for acute polyhedra, it is possible to prove stronger approximation guarantees than in the general case. Intuitively, since the approximate motion never fails to move along the tangent cone projection of the utility gradients, the projected gradient dynamics of the smooth game is well-approximated.
Lemma 3.7.
Whenever is an acute polytope, the outcome of the approximate -accelerated gradient dynamics satisfies, for every player and every time step with ,
independent of the choice of .
Proof.
Theorem 3.8.
Whenever all are acute polyhedra, the outcome of the approximate -accelerated gradient dynamics satisfies, for every player and every time step with ,
(17)
As a consequence, for step sizes , the approximate -accelerated projected gradient dynamics yields, via sampling where , an -stationary CCE with respect to the set of all functions with Lipschitz gradients. In this case, the same guarantees hold as in Theorem 3.4, where we fix . Analogously to Theorem 3.5, the -accelerated projected gradient dynamics also yields an -local CCE.
Proof.
Near-identical to the proofs of Theorems 3.4 and 3.5, except that by Lemma 3.6
implying that there are no projection losses. However, we nevertheless need an alternative argument that for almost every ,
Now, by [57] (Lemma 2), the curve is piecewise linear whenever is polyhedral. Then, a constraint binds over an interval only if for such . However, for such binding constraints and some , which implies the desired result. ∎
The interesting case is thus when the polyhedra we consider are not acute. This is counter-intuitive; acuteness in a sense implies that smooth boundary approximations of the convex body would have very high curvature, a condition under which the differential regret guarantees of approximate projected gradient dynamics deteriorate. However, acute polytopes in turn enjoy even better convergence guarantees. Meanwhile, the case for general polyhedra remains open.
3.4 (Partially) Adversarial Regimes
Sections 3.2 & 3.3 essentially provide inequalities satisfied by the continuous curve we obtain by extending the sequence generated via projected gradient ascent, which may be regarded as an approximation of the actual projected gradient dynamics of the underlying smooth game. Whereas the bounds of Theorems 3.4, 3.5 and 3.8 are provided for the cases when all have smooth boundary of a uniformly bounded curvature or all are acute polytopes, we readily observe that when players may have either as their strategy sets, the regret guarantees do not deteriorate.
However, our results are more sensitive to the fact that every player uses the very same step sizes. In this section, we shall relax this assumption, and work instead under the assumption that a subset of players use the same step sizes, whereas the remaining players might play adversarially. The high-level idea is that as the former players have their strategies vary in the same manner, the continuous curve we consider should connect their strategies in time, while the strategies of the remaining players can be taken as constant in each time step.
We remark, also, that setting for the continuous curve obtained via the approximate -accelerated projected gradient dynamics made sense in the preceding section, because it could be taken as an approximation of the underlying dynamics, with the individual step sizes providing the “proper time” experience through the motion of the curve. With players learning with different step sizes, at rates that might go to or , there is no longer such a valid choice. Therefore, we will fix attention to the case when we effectively take .
Thus, we focus on the setting when players update their strategies via projected gradient ascent, setting , whereas players will be allowed to pick their strategies arbitrarily. We then fix, as the approximate restricted time-averaged projected gradient dynamics, the curve , where
(18)
Since only the players in update their strategies in a continuous manner, we will now need to restrict the scope of the functions we consider. In particular, we will ask that for any , . We remark again that when only a single player updates their strategies via projected gradient ascent, this is equivalent to letting .
Our result, as one might expect from the construction of , is in line with those in the preceding discussion. Sampling appropriately from the curve , the players in incur vanishing regret in aggregate against such , both for the notions of -stationary and -local CCE.
Theorem 3.9.
Suppose that each player updates their strategies via projected gradient ascent, using the same step sizes . Moreover, assume for each such that has smooth boundary with bounded curvature , or is an acute polytope (in which case we let ). Then
both
As a consequence, for step sizes , the approximate restricted time-averaged projected gradient dynamics yields, via sampling where , an -stationary CCE with respect to the set of functions with bounded, Lipschitz gradients such that for any . In this case, the same guarantees hold as in Theorem 3.4, where we fix . Analogously to Theorem 3.5, the approximate restricted time-averaged projected gradient dynamics also yields an -local CCE.
Proof.
The bounds of Lemmata 3.3 and 3.7 are modified, as by the construction of , for , is constant over each . The assumption that is constant over for implies that and that the term may be obtained by looking at the continuous variance of , from which the result follows immediately as in the proofs of Theorems 3.4, 3.5 and 3.8. ∎
We remark that even when all players employ projected gradient ascent, different learning rates do not in general allow us to assemble the data into a single continuous curve; instead, the guarantees of Theorem 3.9 hold for every subset of players whose step sizes are equal. One exception, however, is when contains only functions whose gradients require no projection, in which case the sequence itself will satisfy vanishing regret guarantees. The discussion of the necessary arguments will need to wait until Section 5.1, where we consider the connection between stationary & local equilibria, and duality in relation to the price of anarchy literature.
4 Approximability of First-Order Correlated Equilibria
We now turn our attention to when local or stationary correlated equilibria are approximable. Unlike the case for local coarse correlated equilibria, here we do not expect a tractable “universal approximation scheme”. Indeed, if contains all vector fields which have Lipschitz modulus , for any smooth game contains the vector field for each player . As a consequence, any such -local (or stationary) CE satisfies the inequalities
For small enough , the support of the approximate local CE contains an approximate local Nash equilibrium, for which no polynomial time algorithm is known. However, we will see in Section 5.3 that -CE for normal form games is equivalent to an -local CE with respect to a set of vector fields,
As efficient algorithms for computing approximate correlated equilibria exist, we would suspect that there might exist tractable algorithms to compute local or stationary CE for appropriately chosen families of vector fields . This is indeed the case, independent of the (non)-convexity of the continuous game, as we will show in the following section.
4.1 -regret minimisation and approximability of -local/stationary equilibria
Our promise of the approximability of local correlated equilibria in fact follows from standard /swap-regret minimisation algorithms, e.g. [76, 44, 39, 41]. Whenever the family of vector fields has finitely many elements, we show that such algorithms are applicable to obtain -stationary correlated equilibria after iterations, provided we have access to a fixed-point oracle for every linear combination over . Such guarantees also apply to -local correlated equilibria, provided each satisfies a tangency condition. Whenever we are also guaranteed that is a convex function for every , a fixed point may be computed as the solution of a convex problem. As mentioned, this setting subsumes the usual correlated equilibria of normal form games as a special case, and we demonstrate by example that such algorithms may incorporate “rotational corrections”: ways to eliminate cyclic behaviour. Altogether, this implies that there exists a non-trivial strengthening of correlated equilibria which is still tractably approximable.
For the proof of our bounds, we apply the framework of [41], which in turn follows from the analysis of [42]. In our approximation results, for simplicity we restrict attention to quadratic potential functions. Proofs of convergence are then an immediate consequence of [41] (Theorem 11); we nevertheless provide the algorithms and the associated proofs for the sake of a complete exposition. To wit, for a fixed smooth game and a family of vector fields , recall that an -stationary CE is a distribution on such that
where the factor is fixed in advance. In our further analysis of -stationary (and also -local) CE, we shall fix attention to finite , and fix the poly-factor to – absorbing the bounds to .
We consider the history of our algorithm to be a sequence of probability distributions on . We then denote the differential stationarity regret with respect to at iteration as
Our first regret matching algorithm then is given:
Here, in line denotes the point mass distribution on . The convergence of Algorithm 1 then follows from usual arguments for -regret minimisation, adjusting for the fact that we are dealing with equality constraints for stationarity and that our action transformations are defined via infinitesimal translations about a vector field in a constrained set.
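Since the algorithm environment does not render here, the following Python sketch records our reading of Algorithm 1; `fields`, `grad_u` and `fixed_point_oracle` are placeholder names for the family of vector fields, the players' utility gradients, and the assumed fixed-point oracle.

```python
import numpy as np

def stationary_ce_regret_matching(fields, grad_u, fixed_point_oracle, T):
    """Regret matching against differential stationarity regret (our sketch
    of Algorithm 1, under a quadratic potential with identity link).

    fields             -- list of vector fields h: x -> ndarray
    grad_u             -- map x -> ndarray of all players' utility gradients
    fixed_point_oracle -- given multipliers lam, returns a point x at which
                          the combined field sum_h lam[h] * h(x) vanishes
    """
    R = np.zeros(len(fields))        # cumulative (signed) regret per field
    history = []                     # the played point masses
    for _ in range(T):
        x = fixed_point_oracle(R)    # multipliers = cumulative regrets
        g = grad_u(x)
        r = np.array([h(x) @ g for h in fields])   # instantaneous regrets
        R += r
        history.append(x)
    # the empirical distribution of `history` approximates a stationary CE
    return history
```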
Theorem 4.1.
Suppose that in Algorithm 1, or is chosen optimally to minimize at each time step. Then after iterations,
As a consequence, an -stationary CE may be computed after iterations, given a fixed-point oracle for for any real vector .
Proof.
For , we shall track the time evolution of , i.e. the cumulative regret. In this case, note that for a potential , link function and error terms , is trivially a Gordon triple (cf. [41], Definition 5) as the condition
holds with equality for any . Now, for our choice of ,
Therefore,
(19)
Here, the bound follows from the bounds on the magnitudes of , and since (19) is non-negative by the fixed point assumption, , which is true if and only if for every player (recall that for any convex set, the inner product between an element of the normal cone and an element of the tangent cone at the same point is non-positive). As a consequence, by [41] (Theorem 6),
The result follows immediately. ∎
Remark. We note that simply substituting the computed fixed points with (approximately) stationary distributions with respect to the vector field in Algorithm 1 is insufficient for our purposes in general. This is because the “reward system” in our setting is formed by the utility gradients, which do not necessarily form a conservative vector field. In particular, even if is induced as a measure via some closed parametrised loop with almost everywhere, it can be the case that
For instance, this is true when for each player . It is in this sense that access to a fixed point oracle may be considered necessary.
One exception, of course, is if we are dealing with a potential game, wherein for some smooth potential function . In this case if is one of the suitable convex bodies studied in Section 3, by the discussion therein for any stationary distribution computed via the gradient dynamics for given some ,
However, in this case a local correlated equilibrium may be computed by simply performing gradient ascent on and finding an approximate local maximum. Therefore, it is unclear whether a regret matching algorithm would provide any advantages regarding tractable computation.
The approximation of an -local CE given access to a fixed-point oracle follows near identically. One important point of note is that local CE involve tangent cone projections of the associated vector fields . In general, it is not true that , except when for every and , in which case the equality is assured by the cone property. We shall call tangential whenever this is the case, and denote the differential local regret with respect to at iteration as
Adapting Algorithm 1 to differential local regret then outputs an approximate local CE:
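As the algorithm environment is again not rendered, we record our reading of Algorithm 2 as a sketch; the only change from Algorithm 1 is that multipliers are clipped at zero, since local CE constraints are one-sided inequalities, so only conical combinations of the fields are formed.

```python
import numpy as np

def local_ce_regret_matching(fields, grad_u, fixed_point_oracle, T):
    """Regret matching against differential local regret (our sketch of
    Algorithm 2); the fields are assumed tangential, so tangent cone
    projections of the fields may be dropped."""
    R = np.zeros(len(fields))
    history = []
    for _ in range(T):
        lam = np.maximum(R, 0.0)     # positive-part link: conical combinations
        x = fixed_point_oracle(lam)  # zero of sum_h lam[h] * h(x)
        g = grad_u(x)
        R += np.array([h(x) @ g for h in fields])
        history.append(x)
    return history
```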
Theorem 4.2.
Suppose that in Algorithm 2, or is chosen optimally to minimize at each time step. Then after iterations,
As a consequence, an -local CE may be computed after iterations, given a fixed-point oracle for for any non-negative vector .
Proof.
The question that remains is, then, whether there exists a family of vector fields for which an -stationary or -local CE is tractably approximable. The answer is easily shown to be affirmative for the case of -local CE when each is affine-linear in each component.
Proposition 4.3.
Suppose that is a family of tangential vector fields, such that for some matrix and some vector . Then for any , is a convex quadratic function. Moreover, any is a fixed point of .
Proof.
As is linear in , is a sum of squares. Since is tangential, at any fixed point of , . Thus , which implies the result. ∎
Corollary 4.4.
For a family of tangential affine-linear vector fields , a -local CE with respect to can be approximated via solving convex quadratic minimisation problems.
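To illustrate Corollary 4.4, here is a hedged sketch of the fixed-point computation for affine-linear fields: a conical combination of such fields remains affine, say h(x) = A x + b, and a zero over a box-shaped strategy set may be found by minimising the convex quadratic ||A x + b||^2. The box-shaped feasible set and all names are our illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def affine_field_fixed_point(A, b, bounds):
    """Minimise ||A x + b||^2 over a product of intervals; when the combined
    affine field has a zero in the feasible set, any minimiser is such a
    zero, hence a fixed point in the sense of Proposition 4.3."""
    x0 = np.array([(lo + hi) / 2 for lo, hi in bounds])
    res = minimize(
        lambda x: np.sum((A @ x + b) ** 2),
        x0,
        jac=lambda x: 2.0 * A.T @ (A @ x + b),
        bounds=bounds,
    )
    return res.x

# usage: a rotation-like field on [0,1]^2 with its zero at (0.5, 0.5)
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
b = np.array([-0.5, 0.5])
print(affine_field_fixed_point(A, b, [(0.0, 1.0), (0.0, 1.0)]))
```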
By the remark at the beginning of Section 4, this subsumes the correlated equilibria of normal form games; but such equilibria remain tractably approximable even for non-concave games. And as the following example demonstrates, for the hypercube, equilibrium refinements are possible.
Example 1.
Suppose in a two player game that for each player , and hence . Then the usual correlated equilibrium constraints are provided by vector fields
Note that these vector fields are all conservative, and as a result they are gradient fields. We therefore consider extending our set of vector fields by considering
The vector fields have non-zero curl, and thus none of them arises as the gradient field of a quadratic function, although they are coordinate projections of suitable quadratic functions. As a consequence, no vector field may be expressed as a conical combination of the vector fields . Setting thus provides us with a refinement of the usual correlated equilibria of normal form games. The refinement can be strict when looking at the history of play as distributions; consider the matching pennies game where . Then
which implies that the only local CE with respect to is the unique Nash equilibrium at . On the other hand, limiting attention to vector fields cannot rule out cycles, e.g. where is drawn from the uniform distribution on .
We note that when we consider linear (instead of conical) combinations of the vector fields in Example 1, we may obtain any affine-linear vector field on ; is a linearly dependent set for each player . However, in a normal form game, players’ utility gradients are also affine-linear. As a consequence, for a normal form game, any stationary CE with respect to necessarily lies in the convex hull of its Nash equilibria.
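The following numerical sketch illustrates the cycle-elimination claim of Example 1 under assumed centred coordinates: matching pennies on [-1,1]^2 with u1(x,y) = xy and u2(x,y) = -xy, whose utility-gradient field (y, -x) rotates about the Nash equilibrium at the origin.

```python
import numpy as np

# Uniform distribution on a cycle of radius r about the Nash equilibrium.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=200_000)
r = 0.5
x, y = r * np.cos(theta), r * np.sin(theta)

# Utility gradients: grad_x u1 = y, grad_y u2 = -x.
# A rotational deviation field h(x, y) = (y, -x) yields first-order regret
# E[<h, (y, -x)>] = E[x^2 + y^2] = r^2 > 0, ruling the cycle out.
print(np.mean(y * y + x * x))       # ~ 0.25

# Constant coordinate fields (CCE-style deviations) detect nothing:
print(np.mean(y), np.mean(-x))      # both ~ 0
```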
5 On Equilibrium Analysis
Given our notions of first-order coarse correlated equilibria are approximable, and that first-order correlated equilibria are also sometimes so, a natural question to ask is what these equilibria look like, and how one might reason about such equilibria. We initiate the study of this question in this section. First, we adapt the usual primal-dual framework for price of anarchy analysis to our equilibrium concepts. Our observation is that given a primal optimisation problem encoding performance guarantees of a first-order coarse equilibrium, a solution to the dual is given by a generalised Lyapunov function. In fact, when the associated performance metric is distance to a unique equilibrium, and the dual solution certifies that projected gradient ascent necessarily converges to it, this dual solution is a Lyapunov function in the usual sense. In turn, for first-order coarse equilibrium, one does not obtain a potential in general. Instead, the dual problem takes the form of a vector field fit problem, searching for a linear or conic combination of vector fields in which are aligned with the utility gradients.
We then turn our attention to normal-form games. Here, we find a striking observation; for a suitably chosen finite set of vector fields , first-order (coarse) correlated equilibrium is equivalent to the usual notions of (coarse) correlated equilibria in a normal-form game. As a consequence, the usual primal-dual analysis of price of anarchy bounds can be viewed as a statement about the gradient dynamics associated with the game. In fact, even the smoothness framework of Roughgarden [72] falls under this umbrella, by its equivalence to an “average-CCE” [64].
5.1 Duality and Lyapunov arguments
We proceed with analysing the appropriate primal-dual framework for an -approximate first-order CCE. Our approach is motivated by the possibility of strengthening primal-dual efficiency analysis for the outcomes of learning algorithms. Often, in the analysis of games, the objects of interest are the performance guarantees attached to an equilibrium concept, and not necessarily the exact form of the equilibrium. That is to say, given an equilibrium concept (in full abstraction, a subset ), and a continuous “welfare” function , one may consider a bound on the worst-case performance
This quantity is often referred to as the price of anarchy of the game, while the related notion of price of stability bounds the best case performance in equilibria,
Most methods of bounding such expectations fall under two classes of a so-called primal-dual method. One class of such methods argues the bounds via the lens of primal-dual approximation algorithms, writing a primal LP (“configuration LP”) corresponding to performance (e.g. welfare) maximisation, and obtaining via the equilibrium conditions a solution for its dual minimisation LP. If the equilibrium is good, the constructed dual solution has value within a constant factor of the performance of the equilibrium, which provides the desired result. An in-depth discussion of this approach may be found in [63]; we remark that this is not the primal-dual framework of interest here.
Another class of primal-dual methods argues directly via a primal problem over the set of equilibria. Here, the primal LP has as its variables probability distributions over , and is subject to equilibrium constraints. The objective is to minimise or maximise , corresponding respectively to computing the price of anarchy or stability up to a factor of . For instance (by the arguments in [64]), the smoothness framework of [72] as well as the price of anarchy bounds for congestion games by [12] both fall under this umbrella. The question is whether such arguments may be extended to the outcomes of gradient ascent.
The answer, of course, is yes; and for the concept of equilibrium of choice it is more enlightening to invoke -stationary CCE constraints rather than -local CCE constraints. To wit, given the function , consider the following measure valued (infinite dimensional) LP, which seeks the worst-case value of the expectation of amongst the set of stationary CCE of the game,
subject to (20)
The Lagrangian dual of (20) may then be naively written,
subject to (21)
Here, is some real number, while is assumed to be “some measure” on . Of course, we may simply pick a dual solution which places probability on some element . Under such a restriction, the dual problem is then
subject to (22)
Note that the dual is always feasible. Whether weak duality holds between the primal LP (20) and either of the dual problems (21) or (22) depends on the specification of the vector spaces in which the primal and dual variables live, and as a consequence on whether they form a dual pair in the sense of [6, 75].
However, given a differentiable with Lipschitz gradients, and the outcome of approximate gradient dynamics , such dual arguments are always valid. The integral along the curve exists since it is differentiable almost everywhere on by construction. In particular, if
then the time expectation of satisfies
for step-sizes , where the and are given as in Theorems 3.4 or 3.8. Moreover, whenever is Lipschitz continuous, the order of the dependence of the convergence rate on is maintained also for projected gradient ascent. This can be shown via the trivial bound on the difference between the expectations of for the two resulting distributions; note that for each ,
Therefore,
(23)
Theorem 5.1.
Suppose that is a continuous function, that all satisfy either the conditions of Theorem 3.4 or Theorem 3.8, and that is a solution of (22). Then with step-sizes , after steps, the outcome of approximate -accelerated projected gradient dynamics satisfies
If also is Lipschitz continuous with modulus , then the outcome of projected gradient ascent satisfies, for the choices or ,
Thus, given a dual solution to (22), the function can be viewed as a proof certificate that the value of (20) is at least ; encoding the performance guarantee with respect to the expectation of when all players employ projected gradient ascent with equal step sizes. The choice in this case tracks bounds on the expectation of when we consider an approximation to the actual gradient dynamics, whereas the choice provides guarantees in time averages. It remains to give an interpretation of this function.
To wit, consider the case when our smooth game has a unique first-order NE , and suppose for any with equality if and only if . Furthermore, suppose that is an optimal solution to (21). Then the dual feasibility conditions imply that
Now, recall that for a dynamical system defined by the differential equation and a unique equilibrium , a function is called a (global) Lyapunov function if , with equality only if . By (5), we then infer that the function is a Lyapunov function in the usual sense.
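For a toy illustration of the definition (our example, not from the text): damping matching pennies by a quadratic penalty yields dynamics with a unique equilibrium at the origin, for which the squared norm is a Lyapunov function.

```python
import numpy as np

# zdot = f(z) = (y - a*x, -x - a*y): matching pennies with damping a > 0.
# V(z) = ||z||^2 satisfies <grad V(z), f(z)> = -2a*||z||^2 <= 0, with
# equality only at the unique equilibrium z = 0, so V is a (global)
# Lyapunov function for these dynamics.
a = 0.1
f = lambda z: np.array([z[1] - a * z[0], -z[0] - a * z[1]])
rng = np.random.default_rng(1)
for z in rng.normal(size=(5, 2)):
    assert 2.0 * z @ f(z) <= 1e-12    # V decreases along the flow
```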
A similar observation holds for an -local CCE, when the set of functions are such that for each , is a tangential vector field. Recall that this condition implies that for every , i.e. . As a consequence, the associated primal problem is given
subject to (24)
And then, by similar arguments, we state the dual problem as
subject to (25)
where is a set of functions with tangential gradient fields. An analogue of Theorem 5.1 is immediate through an identical line of arguments.
Corollary 5.2.
Moreover, we may again check what a dual solution which proves convergence to a unique Nash equilibrium looks like. In this case, dual feasibility of a dual solution to (25) implies that
Here, the first inequality follows since for any pair of vectors and , . The final equality, in turn, holds as is a first-order NE by assumption. Thus, again, the function is a Lyapunov function in the usual sense.
For a more general performance metric , we shall thus call a dual solution a (generalised) Lyapunov function, following the nomenclature in [43]. The insight is that not only can such functions describe worst- and best-case behaviour of approximate or exact gradient dynamics of the game as far as bounding the performance of -approximate first-order CCE is concerned, they also arise naturally through the primal-dual framework associated with such first-order CCE.
Moving onwards, we first note that Corollary 5.2 allows us to fulfill a leftover promise from Section 3.4. In particular, when all players use the very same step sizes, Theorem 5.1 and Corollary 5.2 provide performance guarantees over for the time average behaviour as well as approximations of the gradient dynamics obtained from the sequence . However, these bounds follow “as-if” the dynamics follow the continuous curve we considered in Sections 3.2 & 3.3; when the step sizes may differ or some players are adversarial, the construction of a single such curve is not available per our discussion in Section 3.4. However, regret bounds similar to those in Theorem 3.9 may be established for the sequence through a bootstrap argument on Corollary 5.2, when the functions are tangential.
Theorem 5.3.
Suppose a subset of players employ projected gradient ascent with equal step sizes , with players potentially adversarial. Furthermore, suppose that for any , either has smooth boundary of bounded principal curvature , or is an acute polytope (where we fix ). Then for any , for any tangential function with bounded, Lipschitz gradients such that for any and any ,
Thus, the aggregate regret against such vanishes in the limit whenever , and a choice guarantees regret against such in first-order.
Proof.
The desired result may be proven through the “retraction argument” in (23), noting that for tangential , the first-order regret is Lipschitz continuous. The first two terms are from Theorem 3.9. For the third term, invoke Corollary 5.2 on the restricted time-averaged projected gradient dynamics considered in Section 3.4, evaluating each integral over and summing over , and fixing . By the Lipschitz property of and the utilities, is then Lipschitz continuous with bound , and for any . Then note that for our , is a solution of (25), which ensures the regret bound. ∎
We remark that if all players implement projected gradient ascent with the same step sizes, Theorem 5.3 implies that the actual time-average play is also an approximate local CCE, with respect to the set of all suitably smooth tangential functions. In particular, we lose the much stronger guarantees with respect to first-order stationarity, or with respect to local first-order regret generated by functions which may require projections. Nevertheless, the “as-if” results of Theorem 5.1 and Corollary 5.2 give us valid bounds on best- and worst-case expectations of some function for the actual time-average play, despite the fact that the distribution obtained might not actually be an approximate first-order equilibrium.
A natural question to pose now is whether such generalised Lyapunov functions can be found to prove meaningful bounds. We illustrate how this can be done in the following example. The main takeaway is that, whereas generalised Lyapunov functions can indeed certify properties of the dynamics of a game, constructing and checking the validity of dual solutions can be rather laborious.
Example 2 (Matching Pennies).
Consider the canonical example for a game in which the gradient dynamics necessarily cycle, matching pennies. Up to reparametrisation of the strategy space, there are two players, whose utility functions are given
The unconstrained projected gradient dynamics then satisfy, if , for some “phase angle” . If , however, projections at the edge of the box suppress the radius of motion down to by some time , and for each , . Therefore, any stationary probability distribution must be rotationally symmetric, with support on the disc . The goal here is to present the form of generalised Lyapunov functions which certify these bounds, at least approximately, such that they provide convergence guarantees also for approximate projected gradient dynamics.
For the bound on the radius, by the duality based arguments in this section, we seek a suitable function and a small such that
Here, we abuse notation by considering time dependent quantities given suitable initial conditions for the projected gradient dynamics. Now, consider setting,
for some constants to be determined later. By construction, is continuously differentiable with Lipschitz gradients. On the disc , , whereas for any , by symmetry of the dynamics under rotations by , it is sufficient to consider the case when . In this case, if , then no projections occur, and
in which case we have
We would like to bound this quantity. Denote , and suppose that . Then
This bound is maximised, when , whenever , with value . Meanwhile, if , then , and
This function is convex, so it attains its maximum at an endpoint of the feasible interval for . When the bound equals , whereas if , whenever
the bound is . As the bounds will generally improve in , we will also fix at this point for convenience.
If instead then , and we have
where as , . To bound , we again consider the cases when and . In the former case,
which is maximised for with the same value. Meanwhile, if , then
defining . Again by the convexity of the expression, it is sufficient to check its values at its endpoints; setting results in a bound of , while setting results in a bound . As a consequence, proves a bound of
Now, note that the convergence bounds for approximate -accelerated projected gradient dynamics proven in Theorem 3.8 depend linearly on . However, admits a constant bound independent of the choice of in this setting. As a consequence, can be chosen as an arbitrarily tight dual solution for the gradient dynamics of the matching pennies game by letting , and approximate -accelerated projected gradient dynamics after time steps with step sizes results in a distribution with expected square radius at most , whether we pick or .
Finally, we would like to provide an approximate dual proof that any stable distribution is approximately rotationally invariant. Towards this end, consider a function which is rotationally symmetric, i.e. for some function differentiable with , is the angle in polar coordinates corresponding to and , is the associated frequency, and is again some phase angle. For any rotationally symmetric stationary distribution , via Fourier decomposition-based arguments, it must be the case that .
Accordingly, let . In the region without projections, , and therefore forms an exact Lyapunov function for whenever projections are not needed. To handle the region where projections are needed, consider for the previously defined for some choice of and some , and by the symmetry of the problem without loss of generality restrict attention to the case when . In this case,
Therefore,
Note that by a similar treatment as the previous setting, via considering and picking sufficiently large, we may acquire a family of dual solutions which prove convergence. In the interest of brevity, we shall argue about the asymptotic case. In the large limit, whenever , while () is bounded on that set by for some constant . Therefore, a fixed choice of constant is sufficient to provide a family of increasingly tighter dual solutions for bounds on the expected value of . In particular, we again conclude that approximate -accelerated projected gradient dynamics, after time steps and step sizes , provides a probability distribution for which in expectation.
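A quick simulation (assumed coordinates: the box [-1,1]^2 with u1 = xy, u2 = -xy) lets one eyeball the bounds certified above: the discrete spiral grows until the box binds, after which projections cap the radius and the time-averaged squared radius stabilises.

```python
import numpy as np

eta, T = 0.01, 200_000
z = np.array([0.1, 0.0])
r2 = np.empty(T)
for t in range(T):
    g = np.array([z[1], -z[0]])            # (grad_x u1, grad_y u2)
    z = np.clip(z + eta * g, -1.0, 1.0)    # projected gradient step
    r2[t] = z @ z
print(r2.mean(), r2[-1000:].mean())        # time-averaged squared radius
```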
We end this section with a short note on the primal-dual framework associated with first-order correlated equilibria. Given a finite set of continuously differentiable vector fields , the analogue of (20) for stationary CE is given
subject to (26)
and its Lagrangian dual is simply
subject to (27)
We interpret this as a vector field alignment problem. In particular, suppose again that the game has a unique first-order Nash equilibrium , and with equality only if . Then if is a solution of (27), the vector field satisfies
In particular, the set of stationary CE with respect to is the set of probability distributions over the first-order NE of the game whenever the linear span of contains a vector field such that , with equality only if is an equilibrium point for the dynamics. In words, the vector field is suitably aligned with the utility gradient at every non-equilibrium point , whether the gradient dynamics cycle or not. This does not necessarily provide an obvious computational advantage though, since whenever is rich enough we may set (cf. remark after Example 1).
When is a set of tangential vector fields, an analogous argument for the case of local CCE shows that the dual problem associated with a local CE similarly looks for a vector field that is positively aligned with the utility gradients in the conical span of instead. Explicitly, the primal problem is
subject to (28)
which now has a Lagrangian dual
subject to (29)
In this case, when is a solution and for any with equality only at the first-order Nash equilibria of the game, we have
We remark that the equality for must hold: is tangential, while at a fixed point of the gradient dynamics of the game, is in the normal cone to . This implies that ; however, dual feasibility implies that the inner product is also .
5.2 First-order (coarse) correlated equilibrium in normal-form games
Finally, we turn our attention to the interpretation of first-order equilibria in normal-form games. We shall see that the usual notions of CE, CCE as well as average CCE (in the sense of [64]) are in fact equivalent to appropriately defined notions of first-order correlated equilibrium. The conclusion is that the usual primal-dual analysis of price of anarchy bounds in fact falls under the Lyapunov function or vector field fit estimation problems discussed in the previous section. Crucially, whenever a dual solution for first-order coarse correlated equilibrium certifies uniqueness of equilibrium, it necessarily assembles into a Lyapunov function in the traditional sense.
5.2.1 Equivalence of coarse correlated equilibria
We will begin by showing that the CCE of finite normal-form games and an -equilibrium (in the sense of Definition 2) are equivalent (this was remarked by Constantinos Daskalakis in private correspondence). Consider a finite normal-form game, with a set of players , players’ pure action sets , and utilities . The continuous extension of the game has action sets , the probability simplex over , and utilities are given via expectations. Then, for an arbitrary , a -local CE is a distribution which satisfies
Expanding the left-hand side, we have
where we write for the Kronecker delta and define . The last equality follows since
Then as a consequence, for each player and each , the associated local CCE constraint is simply a convex combination of constraints
This is precisely the usual coarse correlated equilibrium constraint for the finite normal-form game. However, via our observation in (1), these equilibrium constraints are each generated by gradients of functions of the form . The conclusion is immediate.
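The key identity in the display above may be checked mechanically; the following sketch verifies, for a random bimatrix game (our illustrative setup), that the inner product of the deviation field e_a - x_1 with the utility gradient equals the usual coarse regret toward the pure action a.

```python
import numpy as np

rng = np.random.default_rng(2)
U = rng.normal(size=(3, 4))          # player 1's payoff matrix
x1 = rng.dirichlet(np.ones(3))       # mixed strategies
x2 = rng.dirichlet(np.ones(4))
a = 2
e_a = np.eye(3)[a]

grad_u1 = U @ x2                     # grad_{x1} of u1 = x1 @ U @ x2
lhs = (e_a - x1) @ grad_u1           # first-order deviation payoff
rhs = e_a @ U @ x2 - x1 @ U @ x2     # u1(a, x2) - u1(x1, x2)
assert np.isclose(lhs, rhs)
```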
Proposition 5.4.
For a normal-form game, consider the set of functions,
Suppose that is a first-order local CCE with respect to . Then the probability distribution , defined as , is a coarse correlated equilibrium in the usual sense. In turn, interpreting a coarse correlated equilibrium of the normal-form game as a probability distribution on , is a first-order local CCE with respect to .
We now consider what a primal-dual proof of uniqueness of such a local CCE looks like. Suppose that the probability distribution which places probability on is the unique local CCE of the game; then the measure-valued primal problem
subject to (30)
has value . First, note that by the convexity of the objective in and the form of constraints , we may assume that the optimal solution concentrates all probability on pure action profiles. But that implies that we simply have the standard LP over the set of CCE (in the usual sense) of the normal-form game,
subject to (31)
For this LP to have value , must necessarily be -valued; otherwise, the objective is strictly positive for the LP. So we have that for each player , and some action profile . The associated dual LP is given, after scaling the primal objective by ,
subject to (32)
By [1] (Proposition 3.1), any tight dual solution necessarily has , and this condition is in fact equivalent to the finite normal-form game having a unique CCE in pure strategies. Letting and noting that
a tight solution to the dual problem of (30) (recall, after scaling the objective by a factor ),
subject to (33)
is given by and . Finally, note that
since and hence . As a consequence, we have
where the first inequality is by dual optimality and the second because for any . It follows that, as for the projected gradient dynamics of the smooth game, is necessarily a Lyapunov function modulo sign convention.
Proposition 5.5.
A finite normal-form game has a unique first-order local CCE with respect to which assigns probability to a mixed-strategy profile if and only if is a Nash equilibrium in pure strategies and the convergence of the game’s projected gradient dynamics to may be proven via a Lyapunov function of the form for some constants .
This motivates the use of the notions of stationary and local CCE as refinements of the classical notion of a CCE, employing the resulting incentive constraints in primal-dual arguments. That is, the equilibrium constraints given by functions of the form imply that a local CCE of a normal form game with respect to a set of continuously differentiable functions is a coarse correlated equilibrium in the usual sense, but satisfies additional incentive constraints for each function .
5.2.2 Equivalence of average CCE and the smoothness framework
We next turn our attention to how the smoothness framework of [72] can also be interpreted as the analysis of a variant of first-order local CCE. Recall the definition of smoothness for a cost minimisation game:
Definition 11 (Adapted from [72]).
For a normal-form game, a function for a player is called a cost function, and the game paired with such a cost function for each player is called a cost minimisation game. We denote by the social cost. A cost minimisation game with a minimum social cost outcome is then called -smooth if
For our purposes, the cost function of a player will be their disutility, for any player . Given this definition, [64] characterises the distributions for which a smoothness bound applies as those of an average coarse correlated equilibrium.
Definition 12 ([64], Definition 2).
For a normal form game and an outcome , an average coarse correlated equilibrium with respect to is a probability distribution such that
By [64] (Theorem 1), given a -smooth cost minimisation game with a minimum social cost outcome , the price of anarchy of any average CCE with respect to is precisely . The proof follows by considering the linear program which searches for the price of anarchy bound,
subject to (34)
We observe immediately the similarity of (34) with (25). This leads us to inspect the associated dual LP written by [64],
subject to (35)
By an argument analogous to those in Section 5.2.1, the equilibrium constraint is generated via the gradient of the function , whose gradient field is tangential. Thus, an equivalence result follows through an identical line of argument, which we omit.
Proposition 5.6.
For a normal-form game and some action profile , consider the singleton set,
Suppose that is a first-order local CCE with respect to . Then the probability distribution , defined as , is an average coarse correlated equilibrium with respect to . In turn, interpreting such an average coarse correlated equilibrium of the normal-form game as a probability distribution on , is a first-order local CCE with respect to .
To see the implications with respect to the gradient dynamics of the underlying game, suppose that there is a smoothness proof of a price of anarchy bound of ; e.g. whenever . In this case, by [64] is a solution of (34), which implies that for any ,
As a consequence, the gradient dynamics of the game are such that at any , the utility gradients in aggregate point towards the pure action profile whenever the social cost is suboptimal. The associated Lyapunov function is , as in this case the above inequality implies that
which is whenever the expected social cost is strictly greater than that of the optimal one.
5.3 Equivalence of correlated equilibria
The equivalence of correlated equilibria of normal-form games to first-order local correlated equilibria with respect to a suitably chosen set of vector fields also follows from arguments analogous to those in Section 5.2.1. Here, however, we do not find a coarse representation. Our set of vector fields is instead given,
Indeed, for any player and any ,
Therefore, for , defining to be the probability distribution on induced by ,
The final term is, of course, the left hand side of the usual (linear) correlated equilibrium constraints for a normal form game. The equivalence result is yet again immediate.
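Analogously to the coarse case, the swap constraint may be verified mechanically; the following sketch (assumed form of the fields, all names ours) checks that the field x_1[a] (e_b - e_a) generates exactly the usual correlated equilibrium constraint term for swapping action a to b.

```python
import numpy as np

rng = np.random.default_rng(3)
U = rng.normal(size=(3, 4))
x1 = rng.dirichlet(np.ones(3))
x2 = rng.dirichlet(np.ones(4))
a, b = 0, 2
e = np.eye(3)

h = x1[a] * (e[b] - e[a])                 # swap deviation field at x1
lhs = h @ (U @ x2)                        # <h, grad_{x1} u1>
rhs = x1[a] * ((e[b] - e[a]) @ U @ x2)    # P(a) * (u1(b,x2) - u1(a,x2))
assert np.isclose(lhs, rhs)
```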
Proposition 5.7.
For a normal-form game, consider the set of vector fields
Suppose that is a first-order local CE with respect to . Then the probability distribution , defined as , is a correlated equilibrium of the game in the usual sense. In turn, interpreting a correlated equilibrium of the normal-form game as a probability distribution on , is a first-order local CE with respect to .
Moreover, we may yet again convert a statement on the uniqueness of the correlated equilibrium of a game to one about its gradient dynamics. However, as discussed in Section 5.1, here convergence results do not translate to a Lyapunov function. This is because the associated dual problem concerning the uniqueness of a strategy profile is now one of vector field alignment,
subject to (36)
Our discussion of local CE in Section 5.1 nevertheless allows us to extract the associated implication for the game’s dynamics. That is, our set of vector fields must contain within its conical span a vector field positively aligned with the utility gradients, whose only fixed point is .
Proposition 5.8.
A finite normal-form game has a unique local CE with respect to which assigns probability to a mixed-strategy profile if and only if is a Nash equilibrium in pure strategies and there exist multipliers for each , such that
with equality only if .
6 Further Directions & Extensions
In this paper, we have studied the question of computing an approximate first-order analogue of a coarse correlated equilibrium, identifying such outcomes as a natural property of the gradient dynamics of the underlying game and considering the weighted history of play when all players employ projected gradient ascent with equal learning rates. For the setting of finitely many vector fields, we have shown how -regret minimisation may be performed to approximate first-order CE, and discussed tractable settings. Appealing to the properties of the gradient dynamics of the game and extending the usual convex optimisation based primal-dual framework for price of anarchy / stability bounds, we were then able to argue that performance guarantees for the resulting distributions may be proven. However, our results raise many questions yet unanswered, some of which we discuss in this final section.
Approximability of local & stationary (C)CE
Some questions pertain directly to the computation of approximate local or stationary (C)CE. First, we have not established how -approximations are approximable. This is because our approach has been to consider the tangent and normal cones pointwise at a given , explicitly avoiding the need to treat potential regret incurred by projections. It can in fact be the case that for , the respective projections involving a step in direction or involve constraints distinct from each other, or from the ones which bind at . This holds true even for projected gradient ascent when is in the interior of . How our appeals to the gradient dynamics of the game may be extended to cover such cases thus remains open. We remark that, unlike the approximability of -local or stationary CCE, the associated approximation bounds of online projected gradient ascent would depend on the Lipschitz moduli of in general.
Similarly, we do not yet know the approximability of first-order CCE in settings with general compact and convex action sets , or even in the setting where all are polyhedral. We have proven our results in settings where the approximate projected gradient dynamics are faithful to the tangent cone projection of the direction of motion for the unconstrained gradient dynamics at each point in time, but [57] demonstrates that this need not be the case for polyhedral even in the setting of linear programming. A deeper question, perhaps, is: what is the correct parameter of complexity for compact and convex for the approximability of local / stationary CCE via approximate projected gradient dynamics? When has a smooth boundary, the approximation guarantee deteriorates linearly in , the bound on the principal curvature of the boundary . However, when are acute polyhedra – a condition intuitively in contradiction with having bounded curvature – projected gradient ascent models the approximate projected gradient dynamics perfectly. It thus remains unknown how to bound the approximation guarantees in the setting with general compact and convex action sets.
Generalisations of local / stationary CCE
Our local coarse notions of equilibria are also, in some sense, “bespoke” for projected gradient dynamics. We note that, while we have proven our results for projected gradient dynamics of the smooth game, we have never actually used the fact that the direction of motion is the gradient of some utility function for each player. Therefore, our results apply for any projected gradient dynamics of the form for each player , with the resulting “equilibrium constraints” obtained by swapping any expression with . This implies a different notion of local or stationary CCE for each distinct time-independent gradient based learning algorithm; time-independent in the sense that no has explicit time or history dependence, but all are determined solely via at time . That regret against the gradient fields of some functions is minimised not only by projected gradient ascent is already known. MWU, for instance, is also a time-independent gradient based learning algorithm in this sense, and it also incurs no external regret, the associated incentive constraints for which are generated by functions of the form . How to simultaneously refine outcomes reached via such dynamics within a class of equilibria is thus open. Such an equilibrium concept would of course be a relaxation of the equilibrium notions discussed in this paper.
We observe that mean-based learning algorithms [13] constitute a generalisation of various dual averaging based learning algorithms, and comparing the results of [36] and [31], we know that the outcomes of mean-based learning algorithms can be a strict subset of the (non-local) CCE of a game. However, to our knowledge, projected gradient ascent is not known to be such an algorithm. Moreover, our Lyapunov function based primal-dual approach is at the moment ill-suited for dual averaging based methods in the first place. The associated continuous dynamics for dual averaging may be obtained [61] by setting action sets for each player , and letting
(37)
(38)
where is a lower semi-continuous and convex regulariser. In settings where if and only if and is differentiable in the interior of , we are effectively faced with a smooth game where players’ action sets are dimensional Euclidean spaces, and their equations of motion satisfy as . As a consequence, our approximability results break down completely, and how to extend our specific framework for price of anarchy / stability bounds to such settings is at present unknown. The average price of anarchy approach of [66, 77] offers one solution, yet at present the performance metric does not come with convergence rate guarantees, even in expectation.
Similar presently unexplored directions for notions of equilibria relate to relaxing other assumptions. If we drop the previously mentioned time-independence assumption, we obtain as a subclass settings where agents may learn at asymptotically different rates; this question was investigated in [79] for monotone games. However, such games admit a Lyapunov function which proves convergence to their unique equilibrium – a strong assumption that does not hold in many settings of interest. The time-independence assumption is also violated for any algorithm that uses history information, such as (a weighted) average history of play. Another assumption that may be dropped is deterministic motion, say for stochastic gradient ascent, in which case the continuous time dynamical system would be a Markov process. While [43] have already investigated Brownian motion, which can be taken as a model of SGD (cf. [45, 55]), it is unclear how the results therein apply with respect to approximation guarantees in our constrained setting.
On tractable local CE
Whereas we have demonstrated that a generalisation of correlated equilibria is tractable, it remains unknown how to procedurally generate a basis for such a family. Moreover, the affine-linearity of a vector field is a rather strong condition, and such vector fields in general fail to be tangential over a more general polytope. Finding wider settings in which fixed point computation for Algorithm 2 is tractable and the associated choice of is meaningful is thus an open problem. Finally, we remark that our convergence results are probably not the best possible. Recent work [24, 2] shows that swap regret may be minimised in (up to polylog factors) iterations via more specialised algorithms, and we suspect similar methods might apply in our setting.
Further directions in primal-dual analysis
The final question pertains to how useful our primal-dual framework could be. Whereas we obtain a primal-dual framework to prove bounds on the performance of gradient ascent, Example 2 demonstrates that finding dual solutions and verifying their validity can be a daunting task. It is therefore of interest whether there exist systematic ways of constructing generalised Lyapunov functions to prove dual bounds, tight or approximate. One approach would be to consider relaxations to subclasses of differentiable functions with Lipschitz gradients for which the problem is tractable. As the usual CCE of normal form games are tractable in the sense that performance bounds take the form of an LP, we infer by the discussion in Section 5 that there exists at least one such relaxation. In fact, given the tractability of linear programming and that computing performance bounds over the set of (C)CE reduces to an LP, one interesting question is classifying stronger convex programming formulations which capture the guarantees of learning algorithms.
Acknowledgements
Special thanks to Constantinos Daskalakis for the introduction to the question and discussions, and to Martin Bichler, Denizalp Göktaş, Alexandros Hollender, Matthias Oberlecher, and further anonymous reviewers for their feedback. This project was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Grant No. BI 1057/9.
References
- AB [24] Mete Şeref Ahunbay and Martin Bichler. On the uniqueness of Bayesian coarse correlated equilibria in standard first-price and all-pay auctions, 2024.
- ADF+ [22] Ioannis Anagnostides, Constantinos Daskalakis, Gabriele Farina, Maxwell Fishelson, Noah Golowich, and Tuomas Sandholm. Near-optimal no-regret learning for correlated equilibria in multi-player general-sum games. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 736–749, 2022.
- ADK+ [08] Elliot Anshelevich, Anirban Dasgupta, Jon Kleinberg, Éva Tardos, Tom Wexler, and Tim Roughgarden. The price of stability for network design with fair cost allocation. SIAM Journal on Computing, 38(4):1602–1623, 2008.
- AFP [21] Gabriel P. Andrade, Rafael Frongillo, and Georgios Piliouras. Learning in matrix games can be arbitrarily complex. In Mikhail Belkin and Samory Kpotufe, editors, Proceedings of Thirty Fourth Conference on Learning Theory, volume 134 of Proceedings of Machine Learning Research, pages 159–185. PMLR, 15-19 Aug 2021.
- AFP [22] Gabriel P. Andrade, Rafael Frongillo, and Georgios Piliouras. No-regret learning in games is turing complete. arXiv preprint arXiv:2202.11871, 2022.
- AN [87] Edward J. Anderson and Peter Nash. Linear Programming in Infinite-dimensional Spaces: Theory and Applications. A Wiley-Interscience publication. Wiley, 1987.
- BBM [17] Dirk Bergemann, Benjamin Brooks, and Stephen Morris. First-price auctions with general information structures: Implications for bidding and revenue. Econometrica, 85(1):107–143, 2017.
- Ber [05] Ulrich Berger. Fictitious play in 2×n games. Journal of Economic Theory, 120(2):139–154, 2005.
- BFH+ [21] Martin Bichler, Maximilian Fichtl, Stefan Heidekrüger, Nils Kohring, and Paul Sutterer. Learning equilibria in symmetric auction games using artificial neural networks. Nature Machine Intelligence, 3:687–695, 2021.
- BFO [23] Martin Bichler, Maximilian Fichtl, and Matthias Oberlechner. Computing Bayes-Nash equilibrium in auction games via gradient dynamics. Operations Research, to appear, 2023.
- BHS [12] Michel Benaïm, Josef Hofbauer, and Sylvain Sorin. Perturbations of set-valued dynamical systems, with applications to game theory. Dynamic Games and Applications, 2:195–205, 2012.
- Bil [18] Vittorio Bilò. A unifying tool for bounding the quality of non-cooperative solutions in weighted congestion games. Theory of Computing Systems, 62:1288–1317, 2018.
- BMSW [18] Mark Braverman, Jieming Mao, Jon Schneider, and Matt Weinberg. Selling to a no-regret buyer. In Proceedings of the 2018 ACM Conference on Economics and Computation, pages 523–538, 2018.
- BP [18] James P. Bailey and Georgios Piliouras. Multiplicative weights update in zero-sum games. In Proceedings of the 2018 ACM Conference on Economics and Computation, pages 321–338. ACM, 2018.
- Bro [51] George W Brown. Iterative solution of games by fictitious play. Activity analysis of production and allocation, 13(1):374–376, 1951.
- CBL [06] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, learning, and games. Cambridge University Press, 2006.
- CD [06] Xi Chen and Xiaotie Deng. Settling the complexity of two-player Nash equilibrium. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, FOCS ’06, page 261–272, USA, 2006. IEEE Computer Society.
- CDL+ [24] Yang Cai, Constantinos Daskalakis, Haipeng Luo, Chen-Yu Wei, and Weiqiang Zheng. On tractable -equilibria in non-concave games. arXiv preprint arXiv:2403.08171, 2024.
- CJ [04] Monica-Gabriela Cojocaru and Leo Jonker. Existence of solutions to projected differential equations in hilbert spaces. Proceedings of the American Mathematical Society, 132(1):183–193, 2004.
- CKS [16] George Christodoulou, Annamária Kovács, and Michael Schapira. Bayesian combinatorial auctions. Journal of the ACM (JACM), 63(2):1–19, 2016.
- CKST [16] George Christodoulou, Annamária Kovács, Alkmini Sgouritsa, and Bo Tang. Tight bounds for the price of anarchy of simultaneous first-price auctions. ACM Transactions on Economics and Computation (TEAC), 4(2):1–33, 2016.
- CL [06] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, Cambridge; New York, 2006.
- CP [14] Yang Cai and Christos Papadimitriou. Simultaneous Bayesian auctions and computational complexity. In Proceedings of the Fifteenth ACM Conference on Economics and Computation, EC ’14, page 895–910, New York, NY, USA, 2014. Association for Computing Machinery.
- CP [20] Xi Chen and Binghui Peng. Hedging in games: Faster convergence of external and swap regrets. Advances in Neural Information Processing Systems, 33:18990–18999, 2020.
- CP [21] Yun Kuen Cheung and Georgios Piliouras. Vortices instead of equilibria in minmax optimization: Chaos and butterfly effects of online learning in zero-sum games. In Proceedings of Thirty Second Conference on Learning Theory, volume 99 of Proceedings of Machine Learning Research, pages 807–834. PMLR, 2021.
- CP [23] Xi Chen and Binghui Peng. Complexity of equilibria in first-price auctions under general tie-breaking rules. arXiv preprint arXiv:2303.16388, 2023.
- CS [08] Vincent Conitzer and Tuomas Sandholm. New complexity results about Nash equilibria. Games and Economic Behavior, 63(2):621–641, 2008. Second World Congress of the Game Theory Society.
- Das [13] Constantinos Daskalakis. On the complexity of approximating a Nash equilibrium. ACM Transactions on Algorithms (TALG), 9(3):1–35, 2013.
- Das [22] Constantinos Daskalakis. Non-concave games: A challenge for game theory’s next 100 years. In Cowles Preprints, 2022.
- DGP [09] Constantinos Daskalakis, Paul W. Goldberg, and Christos H. Papadimitriou. The complexity of computing a Nash equilibrium. SIAM Journal on Computing, 39(1):195–259, 2009.
- DHLZ [22] Xiaotie Deng, Xinyan Hu, Tao Lin, and Weiqiang Zheng. Nash convergence of mean-based learning algorithms in first price auctions. In Proceedings of the ACM Web Conference 2022, pages 141–150, 2022.
- DISZ [18] Constantinos Daskalakis, Andrew Ilyas, Vasilis Syrgkanis, and Haoyang Zeng. Training gans with optimism. International Conference on Learning Representations (ICLR 2018), 2018.
- DN [93] Paul Dupuis and Anna Nagurney. Dynamical systems and variational inequalities. Annals of Operations Research, 44:7–42, 1993.
- DSZ [21] Constantinos Daskalakis, Stratis Skoulakis, and Manolis Zampetakis. The complexity of constrained min-max optimization. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 1466–1478, 2021.
- FGL+ [21] Zhe Feng, Guru Guruganesh, Christopher Liaw, Aranyak Mehta, and Abhishek Sethi. Convergence analysis of no-regret bidding algorithms in repeated auctions. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 5399–5406, 2021.
- FLN [16] Michal Feldman, Brendan Lucier, and Noam Nisan. Correlated and coarse equilibria of single-item auctions. In Yang Cai and Adrian Vetta, editors, Web and Internet Economics, pages 131–144, Berlin, Heidelberg, 2016. Springer Berlin Heidelberg.
- FP [62] Miroslav Fiedler and Vlastimil Pták. On matrices with non-positive off-diagonal elements and positive principal minors. Czechoslovak Mathematical Journal, 12(3):382–400, 1962.
- FRGH+ [21] Aris Filos-Ratsikas, Yiannis Giannakopoulos, Alexandros Hollender, Philip Lazos, and Diogo Poças. On the complexity of equilibrium computation in first-price auctions. In Proceedings of the 22nd ACM Conference on Economics and Computation, pages 454–476, 2021.
- GGM [08] Geoffrey J. Gordon, Amy Greenwald, and Casey Marks. No-regret learning in convex games. In Proceedings of the 25th International Conference on Machine Learning, pages 360–367, 2008.
- GJ [03] Amy Greenwald and Amir Jafari. A general class of no-regret learning algorithms and game-theoretic equilibria. In Learning Theory and Kernel Machines: 16th Annual Conference on Learning Theory and 7th Kernel Workshop, COLT/Kernel 2003, Washington, DC, USA, August 24-27, 2003. Proceedings, pages 2–12. Springer, 2003.
- GLM [06] Amy Greenwald, Zheng Li, and Casey Marks. Bounds for regret-matching algorithms. In AI&M, 2006.
- Gor [05] Geoffrey J. Gordon. No-regret algorithms for structured prediction problems. Technical report, Carnegie Mellon University, Center for Automated Learning and Discovery, 2005.
- GZ [08] Peter W. Glynn and Assaf Zeevi. Bounding stationary expectations of Markov processes. In Markov processes and related topics: a Festschrift for Thomas G. Kurtz, volume 4, pages 195–215. Institute of Mathematical Statistics, 2008.
- HK [07] Elad Hazan and Satyen Kale. Computational equivalence of fixed points and no regret algorithms, and convergence to equilibria. Advances in Neural Information Processing Systems, 20, 2007.
- HLLL [19] Wenqing Hu, Chris Junchi Li, Lei Li, and Jian-Guo Liu. On the diffusion approximation of nonconvex stochastic gradient descent. Annals of Mathematical Sciences and Applications, 4(1), 2019.
- HMC [21] Nadav Hallak, Panayotis Mertikopoulos, and Volkan Cevher. Regret minimization in stochastic non-convex learning via a proximal-gradient approach. In International Conference on Machine Learning, pages 4008–4017. PMLR, 2021.
- HS [88] John C. Harsanyi and Reinhard Selten. A general theory of equilibrium selection in games. MIT Press, 1st edition, 1988.
- HSZ [17] Elad Hazan, Karan Singh, and Cyril Zhang. Efficient regret minimization in non-convex games. In International Conference on Machine Learning, pages 1433–1441. PMLR, 2017.
- JL [22] Yaonan Jin and Pinyan Lu. Settling the efficiency of the first-price auction. SIGecom Exch., 20:69–74, 2022.
- KN [22] Yoav Kolumbus and Noam Nisan. Auctions between regret-minimizing agents. In Proceedings of the ACM Web Conference 2022, pages 100–111, 2022.
- KP [09] Elias Koutsoupias and Christos Papadimitriou. Worst-case equilibria. Computer Science Review, 3(2):65–69, 2009.
- LBR+ [19] Alistair Letcher, David Balduzzi, Sébastien Racanière, James Martens, Jakob Foerster, Karl Tuyls, and Thore Graepel. Differentiable game mechanics. Journal of Machine Learning Research, 20(84):1–40, 2019.
- Lee [12] John M. Lee. Introduction to Smooth Manifolds. Springer New York, NY, 2012.
- Lim [88] Elon L. Lima. The Jordan-Brouwer separation theorem for smooth hypersurfaces. The American Mathematical Monthly, 95(1):39–42, 1988.
- LO [19] Xiaoyu Li and Francesco Orabona. On the convergence of stochastic gradient descent with adaptive stepsizes. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 983–992. PMLR, 2019.
- LST [12] Renato Paes Leme, Vasilis Syrgkanis, and Éva Tardos. Sequential auctions and externalities. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, pages 869–886. SIAM, 2012.
- MGP [20] Hassan Mortagy, Swati Gupta, and Sebastian Pokutta. Walking in the shadow: A new perspective on descent directions for constrained minimization. Advances in Neural Information Processing Systems, 33:12873–12883, 2020.
- MPPS [22] Jason Milionis, Christos Papadimitriou, Georgios Piliouras, and Kelly Spendlove. Nash, Conley, and computation: Impossibility and incompleteness in game dynamics, 2022.
- MS [96] Dov Monderer and Lloyd S. Shapley. Potential games. Games and Economic Behavior, 14(1):124–143, 1996.
- MS [22] Carlos Martin and Tuomas Sandholm. Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks. arXiv preprint arXiv:2211.15936, 2022.
- MZ [19] Panayotis Mertikopoulos and Zhengyuan Zhou. Learning in games with continuous action sets and unknown payoff functions. Mathematical Programming, 173(1–2):465–507, January 2019.
- Nas [50] John F. Nash. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences, 36(1):48–49, 1950.
- Ngu [19] Kim Thang Nguyen. Primal-Dual Approaches in Online Algorithms, Algorithmic Game Theory and Online Learning. Habilitation thesis, Université Paris Sorbonne, June 2019.
- NR [10] Uri Nadav and Tim Roughgarden. The limits of smoothness: A primal-dual framework for price of anarchy bounds. In International Workshop on Internet and Network Economics, pages 319–326. Springer, 2010.
- NZ [12] Anna Nagurney and Ding Zhang. Projected dynamical systems and variational inequalities with applications, volume 2. Springer Science & Business Media, 2012.
- PP [16] Ioannis Panageas and Georgios Piliouras. Average case performance of replicator dynamics in potential games via computing regions of attraction. In Proceedings of the 2016 ACM Conference on Economics and Computation, pages 703–720, 2016.
- PP [19] Christos Papadimitriou and Georgios Piliouras. Game dynamics as the meaning of a game. SIGecom Exch., 16(2):53–63, May 2019.
- PPP [17] Gerasimos Palaiopanos, Ioannis Panageas, and Georgios Piliouras. Multiplicative weights update with constant step-size in congestion games: Convergence, limit cycles and chaos. Advances in Neural Information Processing Systems, 30, 2017.
- Rob [51] Julia Robinson. An iterative method of solving a game. Annals of Mathematics, pages 296–301, 1951.
- Ros [65] J. Ben Rosen. Existence and uniqueness of equilibrium points for concave n-person games. Econometrica: Journal of the Econometric Society, pages 520–534, 1965.
- Rou [05] Tim Roughgarden. Selfish routing and the price of anarchy. MIT Press, 2005.
- Rou [15] Tim Roughgarden. Intrinsic robustness of the price of anarchy. Journal of the ACM (JACM), 62(5):1–42, 2015.
- San [20] William H. Sandholm. Evolutionary game theory. Complex Social and Behavioral Systems: Game Theory and Agent-Based Models, pages 573–608, 2020.
- Sha [64] Lloyd Shapley. Some topics in two-person games. Advances in game theory, 52:1–29, 1964.
- Sha [01] Alexander Shapiro. On duality theory of conic linear problems. Nonconvex Optimization and its Applications, 57:135–155, 2001.
- SL [07] Gilles Stoltz and Gábor Lugosi. Learning correlated equilibria in games with compact sets of strategies. Games and Economic Behavior, 59(1):187–208, 2007.
- SLS+ [24] Iosif Sakos, Stefanos Leonardos, Stelios Andrew Stavroulakis, William Overman, Ioannis Panageas, and Georgios Piliouras. Beating price of anarchy and gradient descent without regret in potential games. In The Twelfth International Conference on Learning Representations, 2024.
- SM [03] Andreas S. Schulz and Nicolás E. Stier-Moses. On the performance of user equilibria in traffic networks. In SODA, volume 3, pages 86–87, 2003.
- TTRL [23] Shaolin Tan, Ye Tao, Maopeng Ran, and Hao Liu. On the convergence of distributed projected gradient play with heterogeneous learning rates in monotone games. Systems & Control Letters, 182:105654, 2023.
- Vet [02] Adrian Vetta. Nash equilibria in competitive societies, with applications to facility location, traffic routing and auctions. In The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings., pages 416–425. IEEE, 2002.
- You [04] H. Peyton Young. Strategic learning and its limits. Oxford University Press, 2004.
- ZMA+ [18] Zhengyuan Zhou, Panayotis Mertikopoulos, Susan Athey, Nicholas Bambos, Peter W. Glynn, and Yinyu Ye. Learning in games with lossy feedback. Advances in Neural Information Processing Systems, 31, 2018.