LEARNING IN NONATOMIC GAMES, PART I:
FINITE ACTION SPACES AND POPULATION GAMES
Abstract.
We examine the long-run behavior of a wide range of dynamics for learning in nonatomic games, in both discrete and continuous time. The class of dynamics under consideration includes fictitious play and its regularized variants, the best reply dynamics (again, possibly regularized), as well as the dynamics of dual averaging / “follow the regularized leader” (which themselves include as special cases the replicator dynamics and Friedman’s projection dynamics). Our analysis concerns both the actual trajectory of play and its time-average, and we cover potential and monotone games, as well as games with an evolutionarily stable state (global or otherwise). We focus exclusively on games with finite action spaces; nonatomic games with continuous action spaces are treated in detail in Part II.
Key words and phrases: Learning; nonatomic games; fictitious play; dual averaging; evolutionary stability; Nash equilibrium; variational inequalities.
2020 Mathematics Subject Classification: Primary 91A22, 91A26; secondary 37N40, 68Q32.
1. Introduction
The study of evolutionary dynamics in population games has been the mainstay of evolutionary game theory ever since the inception of the field in the mid-1970’s. Such dynamics are typically derived from biological models or economic microfoundations that express the net growth rate of a type (or strategy) within a population as a function of the relative frequency of all other types. Thus, owing to the flexibility of this general framework, these considerations have generated an immense body of literature; for an introduction to the topic, we refer the reader to the masterful treatments of Hofbauer & Sigmund [20], Weibull [57], and the more recent, panoramic survey of Sandholm [45, 46].
Our paper takes a complementary approach and aims to examine the long-run behavior of learning procedures in nonatomic games. While the topic of learning in games with a finite number of players is fairly well-studied – see e.g., the classic works of Fudenberg & Levine [13] and Cesa-Bianchi & Lugosi [9] – the same cannot be said for learning in population games. On that account, our goal in this paper is to provide a synthetic treatment for a wide range of dynamics for learning in nonatomic games, unifying along the way a number of existing results in the literature.
We pursue this goal in two parts. In the current paper, we focus exclusively on games with shared, finite action spaces; games with continuous (and possibly player-dependent) action spaces are significantly more complicated, so their study is deferred to the companion paper [16].
Overview of the paper.
To specify the general framework under consideration one has to:
(a) Describe the family of games to be studied: These are anonymous nonatomic games and two concrete classes thereof, potential and monotone games; the relevant details are provided in Section 2.
(b) Identify the solution concepts involved: We will mainly focus on Nash equilibria and variants of evolutionarily stable states, expressed throughout as solutions of an associated variational inequality; this is also described in Section 2.
(c) Define the dynamics under study: This is the content of Section 3, where we begin by recalling the standard fictitious play procedure of Brown [8] and Robinson [41], its regularized variants [18], and its continuous-time analogue, the best reply dynamics of Gilboa & Matsui [15]. Subsequently, we introduce a general class of dynamics known as dual averaging – or “follow the regularized leader” – which contains as special cases the replicator dynamics of Taylor & Jonker [55] and the projection dynamics of Friedman [12].
The dynamics we examine evolve in both discrete and continuous time: the continuous-time analysis is presented in Section 4 and comprises the template for the discrete-time analysis that follows in Section 5. In both cases, our analysis builds on previous work and results by Monderer & Shapley [37], Sandholm [44], Hofbauer & Sigmund [21], Hofbauer & Sandholm [18, 19], Hofbauer et al. [23] and Mertikopoulos & Sandholm [32, 33]. In this context, some of the new results that we present in our paper can be summarized as follows:
(a) We establish a precise link between (regularized) fictitious play and dual averaging: informally, fictitious play best-responds to the average population state whereas dual averaging best-responds to the average population payoff profile.
(b) We determine the equilibrium convergence properties of vanishingly regularized fictitious play in continuous time.
(c) We provide a range of conditions for the convergence of dual averaging trajectories to evolutionarily stable states, in both discrete and continuous time (and possibly with a variable learning rate).
Finally, in Section 6, we discuss several extensions of these results beyond single-population nonatomic games, and we provide a preview of the issues that arise in nonatomic games with continuous action spaces – the topic of interest of the follow-up paper [16].
2. Preliminaries
2.1. Nonatomic games
A (single-population) nonatomic game is defined by the following primitives:
• A population of players, modeled by the unit interval endowed with the Lebesgue measure. The nonatomic aspect means that the behavior of a set of players with zero measure has no impact on the game (see below).
• A set of pure strategies. Following Schmeidler [47], the players’ strategic behavior is described by a measurable function assigning a pure strategy to each player. Given a pure strategy, its inverse image under this assignment will be called the set of that strategy’s “strategists”, and the corresponding pushforward measure on the set of pure strategies will be called the state of the population. This measure is the main variable of interest; for convenience, we will view it as an element of the simplex over the pure strategies and write its components for the relative frequencies of the corresponding strategists in the population.
• A family of continuous payoff functions, one per pure strategy, each giving the payoff to that strategy’s strategists as a function of the state of the population. Collectively, these functions define the population’s payoff vector at each state.
The above definition specifies a population game, cf. Sandholm [45] and references therein. In this definition, players are anonymous in the sense that their payoffs factor through the state of the population and do not depend on specific players choosing specific strategies. For more intricate classes of nonatomic games (possibly involving idiosyncratic components), see Schmeidler [47] and Mas-Colell [27].
2.2. Equilibria, stability, and variational inequalities
In population games, an equilibrium is defined as a population state in which almost all players are satisfied with their choice of strategy. Formally, a population state is an equilibrium of the game if
(Eq) |
where denotes the support of . Equivalently, this requirement for can be stated as a Stampacchia variational inequality (SVI) of the form
(SVI) |
Equilibria can also be seen as fixed points of a continuous endomorphism of the simplex obtained by combining the payoff field with the Euclidean projector onto the simplex; by standard fixed point arguments, it follows that the game’s set of equilibria is nonempty.
Remark.
In operator theory and optimization, the direction of (SVI) is typically reversed because optimization problems are usually formulated as minimization problems. The utility maximization viewpoint is more common in the literature on population games, so we will maintain this sign convention throughout.
An alternative characterization of equilibria is by means of the best-response correspondence , defined here as
(1) |
We then have the following string of equivalences:
(2) |
i.e., the equilibria of the game correspond to the fixed points of the best-response correspondence. One can easily show that, under our assumptions, this correspondence has a closed graph and convex, compact, nonempty values; by Kakutani’s fixed point theorem, its set of fixed points – and hence the game’s set of equilibria – is nonempty and compact.
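To make the fixed-point characterization concrete, here is a minimal numerical sketch (our own illustration, not from the paper), in which the payoffs are assumed, purely for concreteness, to be generated by random matching in symmetric Rock–Paper–Scissors – a setting discussed in more detail in Section 3 – so that the payoff vector at a state is the linear image of that state under the game matrix:

```python
import numpy as np

# Illustration only: payoffs generated by random matching in Rock-Paper-Scissors,
# so the payoff vector at state x is v(x) = A x (rows/columns: Rock, Paper, Scissors).
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

def payoff(x):
    return A @ x

def best_response_support(x, tol=1e-12):
    """Pure strategies earning the maximal payoff at state x; the best-response
    correspondence is the convex hull of the corresponding vertices."""
    v = payoff(x)
    return np.flatnonzero(v >= v.max() - tol)

def is_equilibrium(x, tol=1e-9):
    """x is an equilibrium iff every strategy in its support earns the maximal payoff."""
    v = payoff(x)
    return bool(np.all(v[x > tol] >= v.max() - tol))

x_uniform = np.ones(3) / 3
print(best_response_support(x_uniform))       # [0 1 2]: every strategy is a best response
print(is_equilibrium(x_uniform))              # True
print(is_equilibrium(np.array([1.0, 0, 0])))  # False: Paper earns more against Rock
```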
We will also consider the associated Minty variational inequality (MVI) for , namely
(MVI) |
For concreteness, we will write separate notation for the sets of solutions of (SVI) and (MVI) respectively. In terms of structural properties, the solution set of (MVI) is convex and, under our continuity assumption on the payoff field, every solution of (MVI) is also a solution of (SVI). In this regard, (MVI) represents an “equilibrium refinement” criterion for (SVI) (see also Section 2.3 below).
In more detail, (MVI) is intimately related to the seminal notion of evolutionary stability that was introduced by Maynard Smith & Price [28] and formed the starting point of evolutionary game theory. To make this link precise, recall that a population state is an evolutionarily stable state (ESS) of the game [28, 20, 45] if
(ESS) |
for all sufficiently small invasion shares and all states. Analogously, a state is a neutrally stable state (NSS) of the game if (ESS) holds as a weak inequality, i.e.,
(NSS) |
for all sufficiently small invasion shares and all states. Finally, a state is said to be a globally evolutionarily stable state (resp. globally neutrally stable state) of the game if (ESS) [resp. (NSS)] holds for all states. In obvious notation, each of these stability concepts defines a corresponding set of states (by definition, the set of globally evolutionarily stable states consists of at most a single point).
Characterizations and implications.
A concise characterization of these notions of stability was derived by Taylor [54] and Hofbauer et al. [22], who showed that a state is an ESS if and only if there exists a neighborhood of it such that
(3) |
and, furthermore, if this neighborhood can be taken to be the entire simplex, the state is a GESS. For the corresponding neutral versions, a state is an NSS if and only if (3) holds as a weak inequality on such a neighborhood, and a GNSS if and only if the weak inequality holds on the entire simplex. This characterization of global neutral stability corresponds precisely to the solution set of (MVI); in particular, we have the following string of inclusions:
(4) |
In general, the above inclusions are all one-way; in the next section, we discuss two important cases where some of them become equivalences.
2.3. Classes of games
The following classes of games will play an important role in the sequel:
(1) Potential games: these are games that admit a potential function, i.e., a function whose gradient coincides with the payoff field on the simplex, or, more generally,
(Pot) where the left-hand side denotes the one-sided directional derivative of the potential at the given state along the given direction. If the game admits a potential, any local maximum of the potential is an equilibrium of the game, and any strict local maximum is an evolutionarily stable state (ESS). For a detailed treatment of potential games in the population setting, extending the initial study of Monderer & Shapley [37], see Sandholm [44, 45].
(2) Monotone games: these are games that satisfy the monotonicity condition
(Mon) This condition implies that every equilibrium of is neutrally stable, so we get the following equivalences:
(5) In this class, the existence of equilibria relies only on the minmax theorem [36].
The intersection of potential and monotone (resp. strictly monotone) games occurs when the game admits a concave (resp. strictly concave) potential. The most important example of such games is the class of nonatomic congestion games that arise in routing and transportation theory, cf. Dafermos [11].
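To make the two classes concrete, the following minimal sketch (our own illustration, not taken from the paper) sets up a two-route nonatomic congestion game in the spirit of Pigou’s example and checks numerically that the directional derivative of its concave potential recovers the payoff differential, as required by (Pot), and that the payoff field is monotone in the sense of (Mon):

```python
import numpy as np

# Illustration only: a two-route nonatomic congestion game in the spirit of Pigou's
# example.  Route 1 has constant unit cost, route 2 has cost equal to the mass x2
# using it; payoffs are negative costs, so v(x) = (-1, -x2), and
# F(x) = -(x1 + x2**2 / 2) is a concave potential with dF/dx_a = v_a(x).

def payoff(x):
    return np.array([-1.0, -x[1]])

def potential(x):
    return -(x[0] + 0.5 * x[1] ** 2)

x = np.array([0.7, 0.3])
z = np.array([-1.0, 1.0])     # feasible direction: shift mass from route 1 to route 2
eps = 1e-6

# (Pot): the one-sided directional derivative of F matches the payoff differential.
print((potential(x + eps * z) - potential(x)) / eps)   # ~ 0.7
print(payoff(x) @ z)                                    # = 0.7

# (Mon): the game is monotone, since <v(x') - v(x), x' - x> = -(x2' - x2)**2 <= 0.
xp = np.array([0.2, 0.8])
print((payoff(xp) - payoff(x)) @ (xp - x))              # <= 0
```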
2.4. Regularization
In the sequel, we will often consider “regularized” best responses that are single-valued. These are defined as follows:
Definition 2.1.
A regularizer on the simplex is a function such that:
(1) it is supported on the simplex, i.e., its effective domain is the simplex;
(2) it is strictly convex and continuous on the simplex.
Given a regularizer on the simplex, we further define:
(a) the associated choice map, given by
(7a) (7b)
Remark 2.1.
By our assumptions for (continuity and strict convexity), and are both well-defined and single-valued as correspondences. Moreover, by the strict convexity of , it follows that the convex conjugate of is differentiable and satisfies
(8) |
Remark 2.2.
By construction, the regularized best response (approximately) best-responds to strategies, while the choice map best-responds to payoff vectors. For this reason, we will sometimes refer to the choice map as a “payoff-based” regularized best response (as opposed to the “strategy-based” regularized best response).
Moving forward, for part of our analysis, we will also require one (or both) of the regularity conditions below:
(1) The regularizer is strongly convex on the simplex, i.e.,
(StrCvx) for all and all .
(2) The subdifferential of the regularizer admits a continuous selection; specifically, writing the regularizer’s “prox-domain” for the set of points where its subdifferential is nonempty, we posit that there exists a continuous map such that
(Diff)
By replacing best responses with regularized best responses in the definition of equilibria, we also define the notion of a regularized equilibrium as follows:
Definition 2.2.
Given a regularizer on and a regularization weight , we define an -regularized equilibrium of as a profile such that . The set of -regularized equilibria of will be denoted as .
Under (Diff), is an -regularized equilibrium if and only if
(9) |
In this regard, regularized equilibria can be seen as Nash equilibria of an associated “regularized” game whose payoff profile incorporates the (weighted) regularization term.
Equilibria and regularized equilibria are related by the next hemicontinuity property, itself a consequence of the maximum theorem of Berge [6].
Lemma 1.
Let a family of regularized equilibria of the game be given, one for each value of the regularization weight. Then, in the limit where the weight vanishes, any accumulation point of this family is an equilibrium of the game.
We close this section with a concrete illustration of the above concepts:
Example 2.1 (Logit choice).
An important example of regularization corresponds to the entropic regularizer which, by a standard calculation, leads to the choice map
(10) |
This map is commonly referred to as the logit choice map [29], and the corresponding fixed points are known as logit equilibria; for a detailed discussion, see van Damme [56], McKelvey & Palfrey [30], Sandholm [45], and references therein. It is also well known that the entropic regularizer is 1-strongly convex relative to the ℓ1-norm, so the associated choice map is Lipschitz continuous, cf. Shalev-Shwartz [48].
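As a quick numerical companion to Example 2.1 (our own sketch, not code from the paper), the entropic regularizer induces the familiar softmax/logit map, and one can check by sampling that the logit choice indeed maximizes the entropy-regularized score over the simplex:

```python
import numpy as np

# Minimal sketch (our own illustration): the entropic regularizer
# h(x) = sum_a x_a * log(x_a) on the simplex and the induced logit (softmax)
# choice map Q(y) = argmax_x { <y, x> - h(x) }.

def entropy_reg(x):
    x = np.asarray(x, dtype=float)
    return float(np.sum(np.where(x > 0, x * np.log(x), 0.0)))

def logit_choice(y):
    z = np.exp(y - np.max(y))          # shift for numerical stability
    return z / z.sum()

# Sanity check: Q(y) should dominate any other simplex point in <y, x> - h(x).
rng = np.random.default_rng(0)
y = rng.normal(size=4)
x_star = logit_choice(y)
for _ in range(1000):
    x = rng.dirichlet(np.ones(4))
    assert y @ x_star - entropy_reg(x_star) >= y @ x - entropy_reg(x) - 1e-9
print("logit choice:", np.round(x_star, 4))
```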
3. Dynamics
In this section, we introduce a wide range of dynamics for learning in nonatomic games, in both continuous and discrete time. The unifying principle of the dynamics under study is that agents play a best response – regularized or otherwise – to some version of the history of play: the empirical population frequencies, or the players’ average gains. For historical reasons, we begin with the discrete-time framework in Section 3.1, and we present the corresponding continuous-time dynamics in Section 3.2.
3.1. Discrete-time processes
In the discrete-time case, we will consider the evolution of the population state over a series of epochs (corresponding for example to generations in an evolutionary context). In turn, this population state is determined by a recursive update rule – or algorithm – that defines the dynamics in question. Depending on the manner of “best-responding” and the signal that the agents are best-responding to, we have the following dynamics.
Fictitious play.
Dating back to Brown [8] and Robinson [41], fictitious play (FP) posits that agents best-respond to the empirical frequency (or “time average”) of their population state. Concretely, this means that the population state evolves according to the process
(FP) |
In terms of empirical frequencies, we have the recursive dynamics
(11) |
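As a minimal illustration of (FP) (our own sketch; the payoffs are assumed to come from random matching in Rock–Paper–Scissors, a monotone game, so that the convergence results for empirical frequencies in Section 5 are the relevant benchmark):

```python
import numpy as np

# Illustration only: fictitious play (FP) under random matching in Rock-Paper-
# Scissors, so that the payoff vector at state x is v(x) = A x.  Each period the
# population plays a pure best response to the empirical frequency of past play,
# and the empirical frequency is then updated.
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

x_bar = np.array([1.0, 0.0, 0.0])          # initial empirical state
for n in range(1, 20000):
    br = np.argmax(A @ x_bar)              # pure best response to the time average
    x_bar += (np.eye(3)[br] - x_bar) / (n + 1)

print(np.round(x_bar, 3))                  # close to the unique equilibrium (1/3, 1/3, 1/3)
```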
Regularized fictitious play.
A variant of the original fictitious play algorithm is the “regularized” version in which best responses are replaced by regularized best responses. In particular, given an underlying regularizer on and a corresponding regularization weight , the regularized fictitious play (RFP) process is defined as
(RFP) |
with
(13) |
Then, as before, in terms of empirical frequencies, we have
(14) |
As in the case of (FP), the stationary points of the time-averaged dynamics (14) are given by the fixed point equation
(15) |
i.e., they are precisely the -regularized equilibria of (cf. Definition 2.2).
Remark.
In the literature, this process is sometimes referred to as “smooth” or “perturbed” fictitious play, cf. Fudenberg & Levine [14], Hofbauer & Sandholm [18] and references therein. This difference in terminology is due to a different set of assumptions on the regularizer, which give rise to interior-valued choice maps.
Vanishingly regularized fictitious play.
When the regularization weight in (RFP) decreases over time, we obtain the vanishingly regularized fictitious play (VRFP) dynamics of Benaïm & Faure [3], viz.
(VRFP) |
with a regularization weight that is positive at every stage and vanishes as time runs to infinity. In terms of empirical frequencies, we then have
(16) |
Because this process is non-autonomous (the regularization weight depends explicitly on the stage), it is no longer meaningful to discuss its rest points.
Dual averaging.
Dually to the above, instead of best-responding to the aggregate history of play (perfectly or approximately), we can also consider the case where agents play a “regularized best response to their aggregate payoff”. In our context, this gives rise to the dual averaging dynamics
(17) | ||||
with defined in (7a). Then, in iterative form, we get the update rule | ||||
(DA) |
where a “learning rate” parameter is used in the spirit of Nesterov [40]. In the recursive formulation (DA) of the dynamics, the indicated initialization is the most common one but, otherwise, it can be arbitrary.
Remark.
In the literature on online learning and online optimization, the process (DA) is known as “follow the regularized leader” (FTRL) [48, 49]. The terminology “dual averaging” is due to Nesterov [40] and is more common in the theory of variational inequalities and offline convex optimization. We adopt the latter to highlight the link of our work to variational inequalities.
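For a concrete feel of the duality between the two processes, here is a minimal exponential-weights sketch in the spirit of (DA) (our own illustration; the entropic regularizer of Example 2.1, the random-matching payoffs, and the step-size schedule are all assumptions made for the example): the “score” vector aggregates the payoff vectors encountered so far, and the population state is the logit choice of that aggregate.

```python
import numpy as np

# Illustration only: dual averaging with the entropic regularizer ("exponential
# weights") under random matching with payoff field v(x) = A x (Rock-Paper-Scissors).
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

def logit_choice(y):
    z = np.exp(y - np.max(y))              # shift for numerical stability
    return z / z.sum()

y = np.zeros(3)                            # aggregate payoff ("score") vector
x_bar = np.zeros(3)                        # empirical frequency of play
for n in range(1, 20001):
    x = logit_choice(y)                    # state = choice map applied to the scores
    y += (A @ x) / np.sqrt(n)              # aggregate payoffs with a vanishing step
    x_bar += (x - x_bar) / n

print(np.round(x_bar, 3))                  # the time average approaches (1/3, 1/3, 1/3)
```

In this (non-strictly) monotone game the day-to-day state keeps cycling around the equilibrium; only the time average settles down, which is consistent with the time-average convergence results of Propositions 4.1 and 5.2 below.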
Links between the dynamics: the case of random matching.
To illustrate the relations between the various learning processes discussed above, it will be instructive to consider nonatomic games generated by random matching [57, 45]. In such games, players are randomly matched to play a symmetric two-player game with a given payoff matrix, so the population’s mean payoff profile depends linearly on the population state. As a result, the dynamics (DA) may be rewritten as
(18) |
which immediately allows us to recover the variants of fictitious play as follows:
Remark.
The above is meaningful only when the payoff field is linear in the population state; however, the relation between the two processes will be seen to underlie a large part of the sequel.
3.2. Continuous-time dynamics
We now proceed to define the corresponding continuous-time dynamics for each of the processes described above.
Best reply dynamics.
The autonomous formulation (11) of fictitious play can be seen as an Euler discretization of the best reply dynamics (BRD) of Gilboa & Matsui [15], namely
(BRD) |
In the above, the state variable is the continuous-time analogue of the empirical mean process in discrete time, and our notation is chosen to highlight this link. Clearly, the rest points of (BRD) are again described by the fixed point equation (12), i.e., a state is stationary under (BRD) if and only if it is an equilibrium of the game.
Regularized best reply dynamics.
Working as above, the regularized best reply dynamics (RBRD) are defined as
(RBRD) |
As in the case of (RFP), the rest points of (RBRD) are characterized by the fixed point equation (15), i.e., a state is stationary under (RBRD) if and only if it is a regularized equilibrium of the game.
Example 3.1 (The logit dynamics).
Building on the logit choice map discussed in Section 2 (cf. Example 2.1), the associated regularized best reply dynamics are known as logit dynamics, and are given by
(19) |
Vanishing regularized best reply dynamics.
If we allow the regularization weight to vary in (RBRD), we obtain the non-autonomous dynamics
(VBRD) |
where the “V” indicates again that the regularization weight vanishes as time runs to infinity.
The dual averaging dynamics.
Finally, given a regularizer on as above, the dynamics of dual averaging in continuous time can be written as
(DAD) | ||||
where is a time-varying learning parameter, which we assume throughout to be non-increasing. More compactly, the above can also be written as
(20a) | ||||
or | ||||
(20b) |
depending on whether we take a differential- or integral-based viewpoint. Specifically, we note that (20a) is an autonomous ordinary differential equation evolving in the dual, payoff space of the game; by contrast, (20b) is an integral equation stated directly in the primal, strategy space of the game. We also note that, unlike (BRD), (RBRD) and (VBRD), the dynamics (DAD) are stated in terms of the actual population state at each instant, not its empirical mean (which is the driving variable of the previous dynamics).
Example 3.2.
With a constant learning rate and the entropic regularizer on the simplex (cf. Example 2.1), the dynamics (DAD) yield the replicator dynamics of Taylor & Jonker [55], viz.
(RD) |
In words, (RD) indicates that the per capita growth rate of the population of -strategists is proportional to the difference between the payoff that they experience and the mean population payoff. For a detailed presentation of this derivation in different contexts, see Rustichini [43], Sorin [50, 51], Mertikopoulos & Moustakas [31] and references therein.
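For a numerical companion to this example (our own sketch, under the assumption of random-matching payoffs), the snippet below integrates (RD) with a crude Euler scheme for an anti-coordination game whose uniform state is a GESS; in line with the convergence results of Section 4 (cf. Theorem 4.4), the trajectory itself converges there.

```python
import numpy as np

# Illustration only: Euler integration of the replicator dynamics (RD) for the
# random-matching anti-coordination game with payoff matrix A = -I.  This game is
# strictly monotone and its unique equilibrium, the uniform state, is a GESS.
A = -np.eye(3)

def replicator_rhs(x):
    v = A @ x
    return x * (v - x @ v)               # growth rate = own payoff minus mean payoff

x = np.array([0.8, 0.15, 0.05])          # interior initial condition
dt, T = 0.01, 50.0
for _ in range(int(T / dt)):
    x = x + dt * replicator_rhs(x)
    x = np.clip(x, 1e-12, None)
    x /= x.sum()                         # guard against numerical drift off the simplex

print(np.round(x, 3))                    # ~ (1/3, 1/3, 1/3): the GESS of the game
```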
Example 3.3.
With a constant learning rate and the quadratic (Euclidean) regularizer, the integral form (20b) of (DAD) becomes
(21) |
where the projection operator is the Euclidean projector onto the simplex. The differential form of (21) is more delicate to describe because the trajectory may enter and exit different faces of the simplex in perpetuity. However, if we focus on an open interval over which the support of the state remains constant, it can be shown that (21) follows the projection dynamics of Friedman [12], viz. on any such interval we have
(PD) |
For the details of this derivation, see Mertikopoulos & Sandholm [32].
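The computational ingredient behind (21) and (PD) is the Euclidean projection onto the simplex; the following sketch (our own, using the standard sorting-based algorithm) may help fix ideas:

```python
import numpy as np

def project_simplex(y):
    """Euclidean projection of y onto the probability simplex (sort-based algorithm)."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(y) + 1) > css - 1)[0][-1]
    tau = (css[rho] - 1) / (rho + 1)
    return np.maximum(y - tau, 0.0)

print(project_simplex(np.array([0.4, 0.3, 0.3])))    # already feasible: unchanged
print(project_simplex(np.array([1.2, 0.1, -0.5])))   # -> [1. 0. 0.]
```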
Random matching in continuous time.
Following Hofbauer et al. [23], the various dynamics presented so far can be linked as follows when the payoff field is linear and the learning rate is constant:
(22) |
It thus follows that the empirical distribution of play under (DAD) follows the dynamics
(23) |
Hence, after a suitable change of time, we obtain the regularized best reply dynamics with a variable regularization weight that vanishes over time (i.e., a version of (VBRD)).
4. Analysis and results in continuous time
We now proceed to present our results for the class of dynamics under study. We begin with the continuous-time analysis, which provides a template for the discrete-time analysis in the next section. We prove that, under assumptions depending on the dynamical system and the class of games under consideration (potential, monotone, or strictly monotone), the dynamics converge to the set of equilibria – or regularized equilibria – of the game. (Recall that a family or a sequence converges to a set if its set of accumulation points is nonempty and contained in that set.)
4.1. Best reply dynamics
Our first result concerns the best reply dynamics ( ‣ 3.2). Albeit slightly more general, the results of this subsection and the next essentially follow Hofbauer & Sandholm [18, 19].
Theorem 4.1.
Suppose that one of the following holds:
(a) The game is potential.
(b) The game is monotone and its payoff field is smooth.
Then every solution orbit of (BRD) converges to the set of equilibria of the game.
Proof.
Both cases rely on establishing a suitable Lyapunov function for (BRD):
(1) In the potential case, we simply take the potential function itself.
(2) In the monotone case, we take the function defined in (24) below.
(24) Observe that this quantity is nonnegative, with equality if and only if the corresponding state is an equilibrium.
We now proceed on a case-by-case basis:
Potential case. Along every solution orbit of (BRD), the potential is non-decreasing and its time derivative vanishes only at equilibria; the claim then follows from standard Lyapunov arguments.
Monotone case.
Let denote the Jacobian of . By the envelope theorem we get:
(26) |
As is monotone, for all , so . Consequently:
(27) |
Hence the Lyapunov function decreases exponentially to zero, so the orbit converges to the set of equilibria of the game. ∎
4.2. Regularized best reply dynamics
In this section we will assume that the regularizer satisfies (Diff). In a slight abuse of notation, we will suppress the dependence on the regularization weight and simply write the regularized payoff field of (9), with the continuous selection defined as in (Diff), along with the corresponding derived quantities. The reason is that the regularizer will be fixed throughout, while the weight will be allowed to vary in the next subsection.
Lemma 2.
Under (Diff), we have for all , and equality holds if and only if .
Proof.
Let . Then is strictly concave and so defines a strictly monotone operator. Thus, is a solution of the variational inequality (MVI) associated to : for all : with equality if and only if . As and the result follows. ∎
Lemma 3.
Suppose that (Diff) holds. If the game with the original payoff field is potential (resp. monotone), then the regularized game with the perturbed payoff field of (9) is potential (resp. strictly monotone).
Proof.
If has a potential , a potential function of is . If is a monotone game, the game is strictly monotone because we have for every , with equality if and only if . ∎
We now turn to the asymptotic properties of regularized best reply dynamics.
Theorem 4.2.
Remark.
Observe that when the underlying game is monotone, the regularized game is strictly monotone, so its equilibrium set is a singleton.
Proof.
As above, both cases rely on establishing a suitable Lyapunov function for ( ‣ 3.2):
(1) In the potential case: the potential of the regularized game.
(2) In the monotone case:
(28)
Potential case.
Monotone case.
4.3. Vanishing regularized best reply dynamics
A similar technique will be used to obtain convergence to the set of Nash equilibria for the vanishing regularized best reply dynamics. The additional difficulty here comes from the fact that these dynamics are not autonomous, so the Lyapunov convergence argument has to be carried out by hand. To the best of our knowledge, this is a new result.
Theorem 4.3.
Suppose that (Diff) holds and that (VBRD) is run with a smoothly varying and strictly decreasing regularization weight. Assume further that the weight takes values in the unit interval and that one of the following holds:
(a) The game is potential.
(b) The game is monotone, the payoff field is smooth, and the derivative of the regularization weight is bounded.
Then every solution orbit of (VBRD) converges to the set of Nash equilibria of the game.
Remark.
The assumption that the regularization weight takes values in the unit interval is only made for notational convenience and does not incur any loss of generality.
Proof.
As before, we treat the potential and monotone cases separately.
Potential case.
Let . Then, a calculation similar to the above implies that there exists such that
(32) |
The inequality being a consequence of Lemma A.1 as it implies that . By Lemma 2, . Since we deduce that and consequently converges. As goes to zero and is bounded, converges as well. We will deduce from this that converges to . By contradiction, if a limit point is not in then:
(33) |
Recall that, by definition, for all and :
(34) |
Also, as takes values in , . Thus:
(35) |
As in (25) one has for all and :
(36) |
Thus, using (4.3) then (35) we deduce that:
(37) |
Monotone case.
Using the envelope theorem, and a calculation similar to the above implies that there exists such that
(39a) | ||||
(39b) | ||||
(39c) | ||||
(39d) |
The first inequality is a consequence of Lemma A.1, which implies that . The second inequality is a consequence of as is monotone.
We now proceed to prove that .
Case 1: If then thus . Then (39a) implies that and thus .
Case 2:
Consequently, converges to and since decreases to zero we conclude that . If , define:
(41) |
From (4.3) we further get:
(42) |
As , and by the assumptions, and are bounded and decreases to , we deduce that for large enough:
(43) |
Consequently, for large enough we deduce from (39d), (40) and the last inequality that
(44) |
which is impossible because . Hence and so converges to . ∎
4.4. Dual averaging dynamics with a constant learning rate
Before stating our main result we establish some useful lemmas. Define the regret associated to a trajectory and a reference point as:
(45) |
Lemma 4.
Let be a solution trajectory of ( ‣ 3.2). Then:
(46) |
This is a particular case of a more general bound due to Kwon & Mertikopoulos [26] (see Lemma 7 below); the more compact proof given below for this simpler bound follows Bravo & Mertikopoulos [7] and Sorin [52].
Proof.
Lemma 5.
If a solution trajectory of ( ‣ 3.2) is stationary, then .
Proof.
From the previous lemma, for all and one has
(49) |
Letting implies , as was to be shown. ∎
Lemma 6.
If satisfies (StrCvx), then for all and all sequences in .
(50) |
Proof.
See Lemma A.2 in the appendix. ∎
We also need to assume the so-called “reciprocity” condition, which requires that for all and all sequences in :
(Rec) |
Taken together, (StrCvx) and (Rec) imply that the sequence converges to if and only if . The terminology “reciprocity” is justified by the fact that, under (Rec), the topology induced by the level sets of the Fenchel coupling becomes equivalent to the ordinary topology on ; for a precise statement, see Lemma A.2 in Appendix A. This is a subtle – but important – requirement, which is fortunately satisfied by all the standard regularization choices on , cf. [34, 35].
We can now state our convergence results for (DAD) when the learning rate is constant.
Theorem 4.4.
Let a solution trajectory of (DAD) with a constant learning rate be given. Assume further that one of the following holds:
(a) The game is potential and the convex conjugate of the regularizer is twice differentiable.
(b) The game is strictly monotone.
Then converges to .
Proof.
Again, we treat the potential and monotone cases separately.
Potential case.
As and we obtain that:
(51) |
Since the convex conjugate of the regularizer is convex, its Hessian is positive semidefinite, and so:
(52) |
the inequality being strict whenever . As such, converges to .
Monotone case.
As the game is strictly monotone, its equilibrium set is a singleton which coincides with the (necessarily unique) GESS of the game. The calculation in (48) gives:
(53) |
As is a GESS , with equality if and only if . Suppose there is such that for all large enough. Then, goes to which is impossible because . Thus there is such that and so . By the reciprocity condition (Rec), we deduce that since is a GESS. As and , necessarily converges and by the last argument, its limit is zero. Using the strong convexity of , Lemma 6 implies that converges to . ∎
There are several things to note here. First, the proof of convergence to the unique equilibrium in strictly monotone games (part (b) of Theorem 4.4) extends to all games that admit a GESS. Second, the same proof implies that, in all games, evolutionarily stable states (if any exist) are asymptotically stable under (DAD) with a constant learning rate (i.e., the dynamics converge to an ESS when initialized sufficiently close to it). Third, Theorem 4.4 concerns the “day-to-day” evolution of (DAD), not the corresponding empirical mean. Last, the theorem does not apply to monotone games that are not strictly monotone.
To complete the picture, the next proposition shows the convergence of the empirical mean to the set of Nash equilibria in all monotone games.
Proposition 4.1.
Let a solution trajectory of (DAD) with a constant learning rate be given. If the underlying game is monotone, then the empirical mean of the trajectory converges to the set of equilibria of the game.
Proof.
By Lemma 4 and monotonicity of the game, for all :
(54) |
and hence
(55) |
Consequently, the empirical mean of the trajectory converges to the solution set of (MVI), which coincides with the game’s set of equilibria because the game is monotone. ∎
4.5. Dual averaging dynamics with a variable learning rate
This subsection extends the results of the previous section on monotone games in two directions: we consider GESSs instead of (strictly) monotone games, and we treat dynamics with a variable learning rate. Other extensions are also possible, but we do not treat them here.
We begin with a technical lemma on the regret incurred by ( ‣ 3.2):
Lemma 7.
If is a solution trajectory of ( ‣ 3.2) then the regret is bounded as follows:
(56) |
Consequently, if and for all , then .
Proof.
Proposition 4.2.
Let be a solution trajectory of ( ‣ 3.2). If the underlying game is monotone and then converges to .
The next result extends part (b) of Theorem 4.4 to variable learning rates; the proof is a little more involved.
Theorem 4.5.
Proof.
Claim 1.
infinitely often.
Proof.
If , then by (Rec), there exists a neighbourhood of such that . By definition of GESS, there is such that . As , there is such that for all we have . Consequently by contradiction to the claim, for all large enough and so goes to . Since this is impossible. ∎
Claim 2.
There exists such that if and then there exists such that for all .
Proof.
We need to consider two cases separately:
-
Case a).
If then obviously, there is such that for all .
-
Case b).
If but then, from claim 1, . Thus there exists such that for all : . As , and since , we deduce that . Thus for all . ∎
From Claims 1 and 2, there exists , such that belongs to and remains in it forever. ∎
5. Analysis and results in discrete time
In this section, we continue with the analysis of the discrete-time dynamics presented in Section 3.1, starting with fictitious play and its variants in Section 5.1 before moving on to the dynamics of dual averaging in Section 5.2.
5.1. Fictitious play and regularized fictitious play
The convergence of ( ‣ 3.1) and ( ‣ 3.1) can be obtained from the continuous-time analysis of Section 4 using the theory of stochastic approximation [2, 4, 5]. Informally, this theory links the asymptotic behavior of differential inclusions of the form
(58) |
to the limit sets of discrete-time processes satisfying the basic recursion
(59) |
where is an upper semicontinuous, compact-convex-valued correspondence on , and is a vanishing step-size sequence.
In our specific context, the role of the driving correspondence is played by the best-reply correspondence for (FP) and by its regularized variant for (RFP). Concretely, we have:
Proposition 5.1.
Let , , be the empirical frequency of play under a sequence of population states . Then:
(i) Under (FP), the empirical frequency of play is an asymptotic pseudotrajectory of (BRD).
(ii) Under (RFP), the empirical frequency of play is an asymptotic pseudotrajectory of (RBRD).
Informally, the notion of an asymptotic pseudotrajectory (APT) means that the empirical process asymptotically tracks the pseudo-flow of (BRD)/(RBRD) with arbitrary accuracy over windows of arbitrary length; for a precise definition, see Benaïm et al. [4, Definitions I and 4.1]. The proof of Proposition 5.1 follows immediately from the general theory of Benaïm et al. [4, Propositions 1.3, 1.4, and Theorem 4.2], so we do not present it here.
In view of this “asymptotic approximation” property, it is plausible to expect that the empirical frequency of play exhibits the same asymptotic behavior as the corresponding continuous-time dynamics. Our next result makes this intuition precise in two classes of games:
(1) Monotone games, i.e., games that satisfy (Mon).
(2) Potential games for which the regularized potential satisfies the Sard-type condition:
(Sε)
Our main results for (FP) and (RFP) may then be stated as follows:
Theorem 5.1.
Theorem 5.2.
Sketch of proof.
The proofs of Theorems 5.1 and 5.2 follow a similar technical path, starting with the fact that the empirical frequency of play is an APT of (BRD) / (RBRD) respectively (cf. Proposition 5.1 above). In both cases, Theorem 4.3 of Benaïm et al. [4] shows that the accumulation points of an APT lie in an internally chain transitive set of the underlying continuous-time dynamics. (An internally chain transitive set is a compact invariant set that contains no proper attractors [10].) The claims of Theorem 5.1 may then be proved as follows:
(1) For games satisfying (Mon), invoke Theorem 4.1 from Section 4 together with Theorem 3.23 and Proposition 3.25 of Benaïm et al. [4].
(2) For games satisfying (Sε), invoke Theorem 4.1 from Section 4 and Proposition 3.27 of Benaïm et al. [4].
Finally, the proof of Theorem 5.2 is entirely analogous, and simply invokes Theorem 4.2 in lieu of Theorem 4.1 in the above chain of implications. ∎
5.2. Dual averaging
We now proceed to state and prove our main convergence result for the recursion (DA). To do so, we will require the “reciprocity” condition (Rec) that was introduced for the study of the continuous-time dynamics (DAD), namely
(Rec) |
for all and all sequences in . We then have:
Theorem 5.3.
Suppose that (DA) is run with a regularizer satisfying (StrCvx) and (Rec), and with a vanishing, non-increasing learning rate such that
(60) |
Then:
• If the game admits a GESS, the sequence of population states converges to it.
• If a state is an ESS of the game, the same conclusion holds provided that the learning rate is small enough and (DA) is initialized sufficiently close to that state.
Corollary 5.1.
Our proof relies on the use of a suitable “energy inequality”, provided here by a deflated variant of the Fenchel coupling:
Lemma 8.
Energy inequalities of the form (62) are standard in the analysis of dual averaging schemes, and they go back at least as far as Nesterov [39, 40] (who introduced the method). The specific formulation above follows Héliou et al. [17, Lemma. C.1], so we omit the proof; for a range of similar computations, see Shalev-Shwartz [48, Lemma 2.20], Kwon [25, p. 42] and Sorin [52].
Now, to move forward with the proof of Theorem 5.3, we will require two complementary arguments. The first is a stability result: we show below that, if the process is initialized sufficiently close to an evolutionarily stable state and the method’s learning rate is “small enough”, then the iterates remain close to that state. To state this precisely, we consider the “Fenchel zone” of a given magnitude around the target state, together with its image under the choice map. We then have:
Lemma 9.
The second component of the proof of Theorem 5.3 is a subsequence extraction argument: if the iterates of (DA) lie in a neighborhood of the target state where (ESS) holds, there exists a subsequence of the iterates that converges to that state.
Lemma 10.
We prove these two ancillary results below.
Proof of Lemma 9.
Let be the neighborhood whose existence is guaranteed by (ESS), i.e., for all , . By the relation between the Fenchel coupling and the ordinary norm topology on (cf. Lemma A.2 in Appendix A), it follows that if is taken small enough; we will assume this to be the case throughout the rest of this proof. Moreover, by (Rec) and the continuity of , there exists some such that for all .
With all this in hand, assume that ; we will show that provided that is small enough. To do so, let ; then, by substituting in the template inequality (63) of Lemma 8, we obtain:
-
Case 1.
If , then
(64) where and . Hence, if and , we conclude that .
-
Case 2.
If , then
(65) so provided that and .
In both cases, we conclude that , as claimed. ∎
Proof of Lemma 10.
Assume to the contrary that . Then, by assumption, there exist and such that and for all . Since , there exists a positive constant such that for all . With this in mind, substituting in the template inequality (63) and telescoping yields
(66) |
To proceed, note that the first part of (60) gives
(67) |
so by the Stolz–Cesàro theorem. By the second part of (60), we also have . Hence, by letting in (5.2), we get , a contradiction which completes our proof. ∎
The proof of our main convergence result now follows by a tandem application of the two lemmas above.
Proof of Theorem 5.3.
We consider the two cases separately.
Case 1: .
Let be arbitrary. By (60), it follows that , and, by Lemma 10, there exists a subsequence of converging to . This means that, in the long run, and become arbitrarily small, and comes arbitrarily close to . Hence, by applying Lemma 9 inductively, we conclude that for all sufficiently large . Since has been chosen arbitrarily, this implies that converges to , as claimed.
Case 2: .
Let be the neighborhood whose existence is guaranteed by the evolutionary stability condition (3) for . Then, by Lemma 9, if ( ‣ 3.1) is initialized sufficiently close to and is sufficiently small, we will have for all . Hence, by Lemma 10, we conclude that . Our proof then follows by the same inductive application of Lemma 9 as in the case of a GESS above. ∎
Time averages under dual averaging.
We close this section with a result on the empirical frequency of play under (DA). Indeed, telescoping (62) trivially yields the bound:
(68) |
This regret-type bound echoes standard results in the literature – see e.g., Nesterov [40, Theorem 1] – and leads to the following time-average convergence result:
Proposition 5.2.
The proof of Proposition 5.2 mimics that of Proposition 4.1 so we omit it. We only note that, in contrast to Theorem 5.3, Proposition 5.2 does not require strict monotonicity or reciprocity; however, it concerns the cruder, average frequency of play, so it provides a considerably weaker guarantee in this regard.
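To round off the discrete-time analysis, the following sketch (our own illustration; the payoff matrix, the entropic regularizer, and the step-size schedule are assumptions made for the example, with the 1/√n schedule intended as a typical choice compatible with the requirements of (60)) runs an exponential-weights version of (DA) on a strictly monotone random-matching game whose uniform state is a GESS. In line with Theorem 5.3, the day-to-day sequence of states – not just its time average – approaches the GESS.

```python
import numpy as np

# Illustration of Theorem 5.3 (our own sketch): an exponential-weights version of
# dual averaging on the strictly monotone random-matching game with payoff matrix
# A = -I, whose unique equilibrium (the uniform state) is a GESS of the game.
A = -np.eye(3)

def logit_choice(y):
    z = np.exp(y - np.max(y))
    return z / z.sum()

y = np.array([2.0, 0.0, -1.0])            # arbitrary initial score vector
for n in range(1, 50001):
    x = logit_choice(y)
    y += (A @ x) / np.sqrt(n)             # vanishing learning rate, gamma_n = 1/sqrt(n)

print(np.round(x, 3))                     # day-to-day state ~ (1/3, 1/3, 1/3)
```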
6. Extensions and concluding remarks
In this concluding section, we provide a series of extensions and remarks that would have otherwise disrupted the flow of the rest of our paper.
6.1. Positive correlation
Sandholm [44] introduced the so-called “positive correlation” condition
(PC) |
In the case of potential games, (PC) implies that
(69) |
whenever . This in turn implies the convergence of the trajectories to the set of rest points of the dynamics [45, Theorem 7.1.2]. Some of our results in Section 4 for potential games can also be obtained by establishing (PC) for the dynamics under study.
6.2. No-regret dynamics
Our second remark concerns continuous- or discrete-time processes for which the average regret vanishes, i.e., for all we have:
in continuous time | (70a) | |||||
in discrete time | (70b) |
As we saw in Propositions 4.1 and 5.2, the dynamics of dual averaging have the no-regret property (70) in both continuous and discrete time. Looking more closely at the proofs of Propositions 4.1 and 5.2, we conclude that, in monotone games, the empirical frequency of play under any dynamical process that satisfies (70) converges to the game’s set of equilibria. This property forms the basis of many algorithms and dynamics designed to solve variational inequalities and equilibrium problems [38, 40, 24]; the above provides a game-theoretic interpretation of this property in the context of population games.
6.3. Extensions
We also note that our results can be extended to the following settings:
(1)
(2)
This interpretation is recovered in multi-population games: if there are several player populations indexed by , and if denotes the set of pure strategies of the -th population and is the payoff function of -strategists in the -th population, it suffices to set and .
(3)
Our theory can also be extended to normal form games with a finite number of players and smooth concave payoff functions , . In this case, the components of the corresponding vector field are , and the notion of evolutionary stability boils down to that of variational stability; for a detailed treatment, see Sorin & Wan [53] and Mertikopoulos & Zhou [35].
6.4. Links with variational inequalities and further extensions
Taking a more general point of view that focuses on the vector field , we can also consider the following framework that brings to the forefront the associated variational inequalities. Let , be a convex compact subset of a topological vector space with dual , write and , and let be a collection of continuous maps, . Finally, set and, as usual, write
(71) |
for the dual pairing between and .
We may then define as the set of such that
(72) |
and as the set of such that
(73) |
The counterparts for and may also be defined similarly (though the link with population games is no longer present).
Example 6.1.
Consider a strategic game with players, compact metric strategy spaces and continuous payoff functions , , with . Introduce the mixed extension of the game as usual: for a probability distribution (mixed strategy profile) with , let be extended as
(74) |
Then, introduce so that
(75) |
for all . In this way, the equilibrium condition for becomes
(76) |
which is again an instance of .
In this setting, is monotone (dissipative) if
(77) |
As in the case of Section 2, we have if is monotone; furthermore, mutatis mutandis, all the properties presented in Section 2.3 extend to this general setup.
The best-response map associated to is defined again as
(78) |
In addition to monotonicity, we will also now make the “unique best response” assumption
is a singleton for all . | (U) |
Proposition 6.1.
If is monotone and (U) holds, the set is a singleton.
Proof.
The regularized best response map is likewise defined as
(80) |
where satisfies the same axioms as Definition 2.1. Note that satisfies (U) by construction.
Now, using these best-response maps, we may extend the definition of the dynamics ( ‣ 3.1), ( ‣ 3.1) and ( ‣ 3.1) to the more general setup considered here in the obvious way. We then have:
Proposition 6.2.
Proof.
Let be an accumulation point of , and let be a subsequence converging to . By descending to a further subsequence if necessary, we may assume without loss of generality that converges to some limit point . Furthermore, by the definition of ( ‣ 3.1), we have:
(81) |
Hence, taking the limit , we get
(82) |
so . Since by assumption, we deduce further that . Hence, by (U), we conclude that , and our assertion follows. ∎
Obviously, under (U), a similar property holds for regularized fictitious play.
6.5. Preview for Part II
Part II of our paper [16] deals with games with continuous action spaces, in the spirit of Example 6.1 (but with a continuous population of nonatomic players). The first step is to define the Stampacchia variational inequality when is a compact metric space, is an appropriate compact convex subset of , is a continuous map from to the space of continuous functions on , and the duality mapping is given by
(83) |
In this framework, we introduce potential and monotone maps, and we prove the convergence of the discrete-time dynamics of fictitious play.
We show that this study covers the case of nonatomic games with (player-dependent) compact action spaces. Finally, we introduce a class of first-order mean-field games and prove that solutions of the Stampacchia variational inequality correspond to the equilibria of the associated game in normal form whose strategy space is the set of paths generated by the players.
Appendix A Properties of choice maps and the Fenchel coupling
Our aim in this appendix is to provide some useful background and properties for the regularized choice map and the Fenchel coupling.
To recall the basic setup, we assume below that is a regularizer on in the sense of Definition 2.1. The convex conjugate of is then defined as
(A.1) |
Since is compact and is strictly convex and continuous on , the maximum in (A.1) is attained at a unique point in , so is finite for all [1]. Moreover, by standard results in convex analysis [42, Chap. 26], is differentiable on and its gradient satisfies the identity
(A.2) |
where is the choice map induced by (cf. Definition 2.1).
The lemma below establishes the corresponding inverse differentiability property for the choice map:
Lemma A.1.
Let be a regularizer on . Then, for all and all , we have:
(A.3) |
If, in addition, satisfies (Diff), we also have
(A.4) |
for all and all , .
Proof of Lemma A.1.
For the inequality (A.4), it suffices to show it holds for all (by continuity). To do so, let
(A.5) |
Since is strictly convex and by (A.3), it follows that with equality if and only if . Moreover, note that is a continuous subgradient selection of . Given that and are both continuous on , it follows a fortiori that is continuously differentiable and on . Thus, with convex and for all , we conclude that , from which our claim follows. ∎
Lemma A.2.
Let be a regularizer on . Then, for all and all , we have
(A.6) |
If, in addition, the regularizer satisfies (StrCvx), we also have:
(A.7) |
Proof.
Our first claim is a trivial consequence of the Fenchel-Young inequality and the strict convexity of . For our second claim, let . Then, by the definition (A.1) of the convex conjugate of , we have:
(A.8) |
Now, since is -strongly convex, we also have:
(A.9) |
and, with , we get:
(A.10) |
Thus, after rearranging, dividing by , and letting , we obtain
(A.11) |
and our assertion follows from (A). ∎
By virtue of this lemma, we obtain the following convergence criterion in terms of the Fenchel coupling:
Corollary A.1.
Let be a regularizer on satisfying (StrCvx), fix a base point , and let be a sequence in . Then, if , we also have .
Proof.
By Lemma A.2, we have , so our claim follows. ∎
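As a numerical sanity check of the strong-convexity bound in Lemma A.2 (our own sketch; we use the standard expression F(p, y) = h(p) + h*(y) − ⟨y, p⟩ for the Fenchel coupling, which is the quantity we take the lemma to refer to), note that for the entropic regularizer of Example 2.1 the coupling reduces to the Kullback–Leibler divergence between p and Q(y), so the bound amounts to Pinsker’s inequality:

```python
import numpy as np

# Sanity check (our own sketch) of the Fenchel-coupling bound in the entropic case,
# using F(p, y) = h(p) + h*(y) - <y, p> with h(x) = sum_a x_a log x_a, so that
# h*(y) = log(sum_a exp(y_a)), Q(y) = softmax(y), and F(p, y) = KL(p || Q(y)).
# Since h is 1-strongly convex w.r.t. the L1-norm, the bound reads
# F(p, y) >= (1/2) * ||Q(y) - p||_1 ** 2  (Pinsker's inequality).

def h(x):
    return float(np.sum(np.where(x > 0, x * np.log(x), 0.0)))

def h_star(y):
    m = np.max(y)
    return float(m + np.log(np.sum(np.exp(y - m))))

def Q(y):
    z = np.exp(y - np.max(y))
    return z / z.sum()

rng = np.random.default_rng(1)
for _ in range(1000):
    p = rng.dirichlet(np.ones(4))
    y = rng.normal(scale=3.0, size=4)
    F = h(p) + h_star(y) - y @ p
    assert F >= 0.5 * np.sum(np.abs(Q(y) - p)) ** 2 - 1e-9

print("Fenchel coupling bound verified on 1000 random samples")
```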
Acknowledgments
The authors acknowledge support by the COST Action CA16228 “European Network for Game Theory” (GAMENET), the FMJH Program PGMO under grant Damper, and the French National Research Agency (ANR) in the framework of the “Investissements d’avenir” program (ANR-15-IDEX-02), the LabEx PERSYVAL (ANR-11-LABX-0025-01), MIAI@Grenoble Alpes (ANR-19-P3IA-0003), and the grants ORACLESS (ANR-16-CE33-0004) and ALIAS (ANR-19-CE48-0018-01).
References
- Bauschke & Combettes [2017] Heinz H. Bauschke and Patrick L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York, NY, USA, 2 edition, 2017.
- Benaïm [1999] Michel Benaïm. Dynamics of stochastic approximation algorithms. In Jacques Azéma, Michel Émery, Michel Ledoux, and Marc Yor (eds.), Séminaire de Probabilités XXXIII, volume 1709 of Lecture Notes in Mathematics, pp. 1–68. Springer Berlin Heidelberg, 1999.
- Benaïm & Faure [2013] Michel Benaïm and Mathieu Faure. Consistency of vanishingly smooth fictitious play. Mathematics of Operations Research, 38(3):437–450, 2013.
- Benaïm et al. [2005] Michel Benaïm, Josef Hofbauer, and Sylvain Sorin. Stochastic approximations and differential inclusions. SIAM Journal on Control and Optimization, 44(1):328–348, 2005.
- Benaïm et al. [2006] Michel Benaïm, Josef Hofbauer, and Sylvain Sorin. Stochastic approximations and differential inclusions, part II: Applications. Mathematics of Operations Research, 31(4):673–695, 2006.
- Berge [1997] Claude Berge. Topological Spaces. Dover, New York, 1997.
- Bravo & Mertikopoulos [2017] Mario Bravo and Panayotis Mertikopoulos. On the robustness of learning in games with stochastically perturbed payoff observations. Games and Economic Behavior, 103, John Nash Memorial issue:41–66, May 2017.
- Brown [1951] George W. Brown. Iterative solutions of games by fictitious play. In T. C. Koopmans (ed.), Activity Analysis of Production and Allocation, pp. 374–376. Wiley, 1951.
- Cesa-Bianchi & Lugosi [2006] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
- Conley [1978] Charles Cameron Conley. Isolated Invariant Sets and the Morse Index. American Mathematical Society, Providence, RI, 1978.
- Dafermos [1980] Stella Dafermos. Traffic equilibrium and variational inequalities. Transportation Science, 14(1):42–54, February 1980.
- Friedman [1991] Daniel Friedman. Evolutionary games in economics. Econometrica, 59(3):637–666, 1991.
- Fudenberg & Levine [1998] Drew Fudenberg and David K. Levine. The Theory of Learning in Games, volume 2 of Economic learning and social evolution. MIT Press, Cambridge, MA, 1998.
- Fudenberg & Levine [1999] Drew Fudenberg and David K. Levine. Conditional universal consistency. Games and Economic Behavior, 29(1):104–130, 1999.
- Gilboa & Matsui [1991] Itzhak Gilboa and Akihiko Matsui. Social stability and equilibrium. Econometrica, 59(3):859–867, May 1991.
- Hadikhanloo et al. [2021] Saeed Hadikhanloo, Rida Laraki, Panayotis Mertikopoulos, and Sylvain Sorin. Learning in nonatomic games, Part II: Continuous action spaces and mean-field games. Mimeo, 2021.
- Héliou et al. [2020] Amélie Héliou, Matthieu Martin, Panayotis Mertikopoulos, and Thibaud Rahier. Online non-convex optimization with imperfect feedback. In NeurIPS ’20: Proceedings of the 34th International Conference on Neural Information Processing Systems, 2020.
- Hofbauer & Sandholm [2002] Josef Hofbauer and William H. Sandholm. On the global convergence of stochastic fictitious play. Econometrica, 70(6):2265–2294, November 2002.
- Hofbauer & Sandholm [2009] Josef Hofbauer and William H. Sandholm. Stable games and their dynamics. Journal of Economic Theory, 144(4):1665–1693, July 2009.
- Hofbauer & Sigmund [1998] Josef Hofbauer and Karl Sigmund. Evolutionary Games and Population Dynamics. Cambridge University Press, Cambridge, UK, 1998.
- Hofbauer & Sigmund [2003] Josef Hofbauer and Karl Sigmund. Evolutionary game dynamics. Bulletin of the American Mathematical Society, 40(4):479–519, July 2003.
- Hofbauer et al. [1979] Josef Hofbauer, Peter Schuster, and Karl Sigmund. A note on evolutionarily stable strategies and game dynamics. Journal of Theoretical Biology, 81(3):609–612, 1979.
- Hofbauer et al. [2009] Josef Hofbauer, Sylvain Sorin, and Yannick Viossat. Time average replicator and best reply dynamics. Mathematics of Operations Research, 34(2):263–269, May 2009.
- Juditsky et al. [2011] Anatoli Juditsky, Arkadi Semen Nemirovski, and Claire Tauvel. Solving variational inequalities with stochastic mirror-prox algorithm. Stochastic Systems, 1(1):17–58, 2011.
- Kwon [2016] Joon Kwon. Stratégies de descente miroir pour la minimisation du regret et l’approchabilité. PhD thesis, Université Pierre-et-Marie-Curie, 2016.
- Kwon & Mertikopoulos [2017] Joon Kwon and Panayotis Mertikopoulos. A continuous-time approach to online optimization. Journal of Dynamics and Games, 4(2):125–148, April 2017.
- Mas-Colell [1984] Andreu Mas-Colell. On a theorem of Schmeidler. Journal of Mathematical Economics, 13:201–206, 1984.
- Maynard Smith & Price [1973] John Maynard Smith and George R. Price. The logic of animal conflict. Nature, 246:15–18, November 1973.
- McFadden [1974] Daniel L. McFadden. Conditional logit analysis of qualitative choice behavior. In P. Zarembka (ed.), Frontiers in Econometrics, pp. 105–142. Academic Press, New York, NY, 1974.
- McKelvey & Palfrey [1995] Richard D. McKelvey and Thomas R. Palfrey. Quantal response equilibria for normal form games. Games and Economic Behavior, 10(6):6–38, 1995.
- Mertikopoulos & Moustakas [2010] Panayotis Mertikopoulos and Aris L. Moustakas. The emergence of rational behavior in the presence of stochastic perturbations. The Annals of Applied Probability, 20(4):1359–1388, July 2010.
- Mertikopoulos & Sandholm [2016] Panayotis Mertikopoulos and William H. Sandholm. Learning in games via reinforcement and regularization. Mathematics of Operations Research, 41(4):1297–1324, November 2016.
- Mertikopoulos & Sandholm [2018] Panayotis Mertikopoulos and William H. Sandholm. Riemannian game dynamics. Journal of Economic Theory, 177:315–364, September 2018.
- Mertikopoulos & Staudigl [2018] Panayotis Mertikopoulos and Mathias Staudigl. On the convergence of gradient-like flows with noisy gradient input. SIAM Journal on Optimization, 28(1):163–197, January 2018.
- Mertikopoulos & Zhou [2019] Panayotis Mertikopoulos and Zhengyuan Zhou. Learning in games with continuous action sets and unknown payoff functions. Mathematical Programming, 173(1-2):465–507, January 2019.
- Minty [1967] George J. Minty. On the generalization of a direct method of the calculus of variations. Bulletin of the American Mathematical Society, 73(3):315–321, May 1967.
- Monderer & Shapley [1996] Dov Monderer and Lloyd S. Shapley. Potential games. Games and Economic Behavior, 14(1):124 – 143, 1996.
- Nemirovski [2004] Arkadi Semen Nemirovski. Prox-method with rate of convergence for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM Journal on Optimization, 15(1):229–251, 2004.
- Nesterov [2007] Yurii Nesterov. Dual extrapolation and its applications to solving variational inequalities and related problems. Mathematical Programming, 109(2):319–344, 2007.
- Nesterov [2009] Yurii Nesterov. Primal-dual subgradient methods for convex problems. Mathematical Programming, 120(1):221–259, 2009.
- Robinson [1951] Julia Robinson. An iterative method for solving a game. Annals of Mathematics, 54:296–301, 1951.
- Rockafellar [1970] Ralph Tyrrell Rockafellar. Convex Analysis. Princeton University Press, Princeton, NJ, 1970.
- Rustichini [1999] Aldo Rustichini. Optimal properties of stimulus-response learning models. Games and Economic Behavior, 29(1-2):244–273, 1999.
- Sandholm [2001] William H. Sandholm. Potential games with continuous player sets. Journal of Economic Theory, 97:81–108, 2001.
- Sandholm [2010] William H. Sandholm. Population Games and Evolutionary Dynamics. MIT Press, Cambridge, MA, 2010.
- Sandholm [2015] William H. Sandholm. Population games and deterministic evolutionary dynamics. In H. Peyton Young and Shmuel Zamir (eds.), Handbook of Game Theory IV, pp. 703–778. Elsevier, 2015.
- Schmeidler [1973] David Schmeidler. Equilibrium points of nonatomic games. Journal of Statistical Physics, 7(4):295–300, April 1973.
- Shalev-Shwartz [2011] Shai Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107–194, 2011.
- Shalev-Shwartz & Singer [2006] Shai Shalev-Shwartz and Yoram Singer. Convex repeated games and Fenchel duality. In NIPS’ 06: Proceedings of the 19th Annual Conference on Neural Information Processing Systems, pp. 1265–1272. MIT Press, 2006.
- Sorin [2009] Sylvain Sorin. Exponential weight algorithm in continuous time. Mathematical Programming, 116(1):513–528, 2009.
- Sorin [2020] Sylvain Sorin. Replicator dynamics: Old and new. Journal of Dynamics and Games, 7(4):365–386, 2020.
- Sorin [2021] Sylvain Sorin. No-regret algorithms in online learning, games and convex optimization. Mimeo, 2021.
- Sorin & Wan [2016] Sylvain Sorin and Cheng Wan. Finite composite games: Equilibria and dynamics. Journal of Dynamics and Games, 3(1):101–120, January 2016.
- Taylor [1979] Peter D. Taylor. Evolutionarily stable strategies with two types of player. Journal of Applied Probability, 16(1):76–83, March 1979.
- Taylor & Jonker [1978] Peter D. Taylor and Leo B. Jonker. Evolutionary stable strategies and game dynamics. Mathematical Biosciences, 40(1-2):145–156, 1978.
- van Damme [1987] Eric van Damme. Stability and perfection of Nash equilibria. Springer-Verlag, Berlin, 1987.
- Weibull [1995] Jörgen W. Weibull. Evolutionary Game Theory. MIT Press, Cambridge, MA, 1995.