
LEARNING IN NONATOMIC GAMES, PART I:
FINITE ACTION SPACES AND POPULATION GAMES

Saeed Hadikhanloo, Tweag I/O, Paris, France. [email protected]
Rida Laraki, CNRS (Lamsade, University of Dauphine-PSL) and University of Liverpool (Computer Science Department). [email protected]
Panayotis Mertikopoulos, Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LIG, 38000 Grenoble, France. [email protected]
Sylvain Sorin, Institut de Mathématiques Jussieu-PRG, Sorbonne Université, UPMC, CNRS UMR 7586. [email protected]
Abstract.

We examine the long-run behavior of a wide range of dynamics for learning in nonatomic games, in both discrete and continuous time. The class of dynamics under consideration includes fictitious play and its regularized variants, the best reply dynamics (again, possibly regularized), as well as the dynamics of dual averaging / “follow the regularized leader” (which themselves include as special cases the replicator dynamics and Friedman’s projection dynamics). Our analysis concerns both the actual trajectory of play and its time-average, and we cover potential and monotone games, as well as games with an evolutionarily stable state (global or otherwise). We focus exclusively on games with finite action spaces; nonatomic games with continuous action spaces are treated in detail in Part II.

Key words and phrases:
Learning; nonatomic games; fictitious play; dual averaging; evolutionary stability; Nash equilibrium; variational inequalities
2020 Mathematics Subject Classification:
Primary 91A22, 91A26; secondary 37N40, 68Q32.

1. Introduction

The study of evolutionary dynamics in population games has been the mainstay of evolutionary game theory ever since the inception of the field in the mid-1970’s. Such dynamics are typically derived from biological models or economic microfoundations that express the net growth rate of a type (or strategy) within a population as a function of the relative frequency of all other types. Thus, owing to the flexibility of this general framework, these considerations have generated an immense body of literature; for an introduction to the topic, we refer the reader to the masterful treatments of Hofbauer & Sigmund [20], Weibull [57], and the more recent, panoramic survey of Sandholm [45, 46].

Our paper takes a complementary approach and aims to examine the long-run behavior of learning procedures in nonatomic games. While the topic of learning in games with a finite number of players is fairly well-studied – see e.g., the classic works of Fudenberg & Levine [13] and Cesa-Bianchi & Lugosi [9] – the same cannot be said for learning in population games. On that account, our goal in this paper is to provide a synthetic treatment for a wide range of dynamics for learning in nonatomic games, unifying along the way a number of existing results in the literature.

We pursue this goal in two parts. In the current paper, we focus exclusively on games with shared, finite action spaces; games with continuous (and possibly player-dependent) action spaces are significantly more complicated, so their study is deferred to the companion paper [16].

Overview of the paper. 

To specify the general framework under consideration one has to:

  a) Describe the family of games to be studied: these are anonymous nonatomic games and two concrete classes thereof, potential and monotone games; the relevant details are provided in Section 2.

  b) Identify the solution concepts involved: we will mainly focus on Nash equilibria and variants of evolutionarily stable states, expressed throughout as solutions of an associated variational inequality; this is also described in Section 2.

  c) Define the dynamics under study: this is the content of Section 3, where we begin by recalling the standard fictitious play procedure of Brown [8] and Robinson [41], its regularized variants [18], and its continuous-time analogue, the best reply dynamics of Gilboa & Matsui [15]. Subsequently, we introduce a general class of dynamics known as dual averaging – or “follow the regularized leader” – which contains as special cases the replicator dynamics of Taylor & Jonker [55] and the projection dynamics of Friedman [12].

The dynamics we examine evolve in both discrete and continuous time: the continuous-time analysis is presented in Section 4 and comprises the template for the discrete-time analysis that follows in Section 5. In both cases, our analysis builds on previous work and results by Monderer & Shapley [37], Sandholm [44], Hofbauer & Sigmund [21], Hofbauer & Sandholm [18, 19], Hofbauer et al. [23] and Mertikopoulos & Sandholm [32, 33]. In this context, some of the new results that we present in our paper can be summarized as follows:

  a) We establish a precise link between (regularized) fictitious play and dual averaging: informally, fictitious play best-responds to the average population state whereas dual averaging best-responds to the average population payoff profile.

  b) We determine the equilibrium convergence properties of vanishingly regularized fictitious play in continuous time.

  c) We provide a range of conditions for the convergence of dual averaging trajectories to evolutionarily stable states, in both discrete and continuous time (and possibly with a variable learning rate).

Finally, in Section 6, we discuss several extensions of these results beyond single-population nonatomic games, and we provide a preview of the issues that arise in nonatomic games with continuous action spaces – the topic of interest of the follow-up paper [16].

2. Preliminaries

2.1. Nonatomic games

A (single-population) nonatomic game is defined by the following primitives:

  • A population of players. This is modeled by the unit interval \mathcal{I}=[0,1] endowed with the Lebesgue measure \mu. The nonatomic aspect means that the behavior of a set of players with zero measure has no impact on the game (see below).

  • A set of pure strategies \mathcal{A}=\{1,\dotsc,A\}. Following Schmeidler [47], the players' strategic behavior is defined as a measurable function \chi\colon\mathcal{I}\to\mathcal{A}, with \chi(i)\in\mathcal{A} denoting the pure strategy of player i. Given a pure strategy \alpha\in\mathcal{A}, the inverse image \chi^{-1}(\alpha)=\{i\in\mathcal{I}:\chi(i)=\alpha\} will be called the set of "\alpha-strategists", and the corresponding pushforward measure x\coloneqq\chi\sharp\mu\equiv\mu\circ\chi^{-1} on \mathcal{A} will be called the state of the population. This measure will be the main variable of interest; for convenience, we will view x as an element of the simplex \mathcal{X}\coloneqq\operatorname{\Delta}(\mathcal{A}) and we will write x_{\alpha}=\mu(\chi^{-1}(\alpha)) for the relative frequency of \alpha-strategists in the population.

  • A family of continuous payoff functions u_{\alpha}\colon\mathcal{X}\to\mathbb{R}, \alpha\in\mathcal{A}. Specifically, u_{\alpha}(x) denotes the payoff to \alpha-strategists when the state of the population is x\in\mathcal{X}. Collectively, we will also write F(x)=(u_{\alpha}(x))_{\alpha\in\mathcal{A}} for the population's payoff vector at state x.

The above definition specifies a population game \mathcal{G}\equiv\mathcal{G}(\mathcal{A},F), cf. Sandholm [45] and references therein. In this definition, players are anonymous in the sense that their payoffs factor through the state of the population and do not depend on specific players choosing specific strategies. For more intricate classes of nonatomic games (possibly involving idiosyncratic components), see Schmeidler [47] and Mas-Colell [27].
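For concreteness, the short Python sketch below (not part of the paper's formal development) encodes a toy population game of this type: two strategies ("routes") whose payoffs are the negatives of illustrative, hypothetical congestion costs, so that F(x) is a continuous map from the simplex to \mathbb{R}^{2}.

```python
import numpy as np

# A minimal sketch (illustrative assumptions, not from the paper) of a
# single-population game G(A, F) with A = {1, 2}: players choose between two
# routes, and each route's payoff is the negative of its congestion cost at
# the current population state x in the simplex.

def F(x):
    """Payoff vector F(x) = (u_1(x), u_2(x)) at population state x."""
    u1 = -1.0 * x[0]          # route 1: cost grows linearly with its load x_1
    u2 = -2.0 * x[1]          # route 2: twice as congestion-sensitive
    return np.array([u1, u2])

x = np.array([0.5, 0.5])      # half the population on each route
print(F(x))                   # payoff experienced by each type at state x
```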

2.2. Equilibria, stability, and variational inequalities

In population games, an equilibrium is defined as a population state x^{\ast}\in\mathcal{X} in which almost all players are satisfied with their choice of strategy. Formally, x^{\ast}\in\mathcal{X} is an equilibrium of \mathcal{G} if

u_{\alpha}(x^{\ast})\geq u_{\beta}(x^{\ast})\quad\text{for all $\alpha\in\operatorname{supp}(x^{\ast})$, $\beta\in\mathcal{A}$}, (Eq)

where \operatorname{supp}(x^{\ast}) denotes the support of x^{\ast}. Equivalently, this requirement for x^{\ast}\in\mathcal{X} can be stated as a Stampacchia variational inequality (SVI) of the form

\langle F(x^{\ast}),x-x^{\ast}\rangle\leq 0\quad\text{for all $x\in\mathcal{X}$}. (SVI)

Equilibria can also be seen as fixed points of the continuous endomorphism x\mapsto\operatorname{\Pi}_{\mathcal{X}}(x+F(x)), where \operatorname{\Pi}_{\mathcal{X}} denotes the Euclidean projector on \mathcal{X}. By standard fixed point arguments, it follows that the game's set of equilibria is nonempty; we will denote this set by \operatorname{Eq}(\mathcal{G}).

Remark.

In operator theory and optimization, the direction of (SVI) is typically reversed because optimization problems are usually formulated as minimization problems. The utility maximization viewpoint is more common in the literature on population games, so we will maintain this sign convention throughout.

An alternative characterization of equilibria is by means of the best-response correspondence \operatorname{\mathsf{BR}}\colon\mathcal{X}\rightrightarrows\mathcal{X}, defined here as

\operatorname{\mathsf{BR}}(x)\coloneqq\operatorname*{arg\,max}_{x^{\prime}\in\mathcal{X}}\langle F(x),x^{\prime}\rangle. (1)

We then have the following string of equivalences:

x^{\ast}\in\operatorname{\mathsf{BR}}(x^{\ast})\iff\operatorname{supp}(x^{\ast})\subseteq\operatorname{\mathsf{BR}}(x^{\ast})\iff x^{\ast}\in\operatorname{Eq}(\mathcal{G}) (2)

i.e., the equilibria of the game correspond to the fixed points of \operatorname{\mathsf{BR}}. One can easily show that, under our assumptions, this correspondence has a closed graph and convex, compact, non-empty values. By Kakutani's fixed point theorem, \operatorname{Eq}(\mathcal{G}) – that is, the set of fixed points of \operatorname{\mathsf{BR}} – is nonempty and compact.
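As a purely numerical illustration of (Eq) – not part of the analysis above – the sketch below checks whether every strategy in the support of a candidate state earns the maximal payoff, using a standard rock-paper-scissors example with linear payoffs F(x)=Mx (a random-matching game of the kind discussed in Section 3).

```python
import numpy as np

# Hedged numerical check of (Eq): x* is an equilibrium iff every strategy in
# its support earns the maximal payoff u_alpha(x*).  The test game is an
# illustrative rock-paper-scissors matching game, not one from the paper.

M = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])
F = lambda x: M @ x

def is_equilibrium(x, tol=1e-9):
    payoffs = F(x)
    best = payoffs.max()
    support = x > tol
    return bool(np.all(payoffs[support] >= best - tol))

print(is_equilibrium(np.ones(3) / 3))          # True: the barycenter is an equilibrium
print(is_equilibrium(np.array([1., 0., 0.])))  # False: playing only "rock" is not
```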

We will also consider the associated Minty variational inequality (MVI) for x^{\ast}\in\mathcal{X}, namely

\langle F(x),x-x^{\ast}\rangle\leq 0\quad\text{for all $x\in\mathcal{X}$}. (MVI)

For concreteness, we will write \operatorname{SVI}(\mathcal{G}) and \operatorname{MVI}(\mathcal{G}) for the set of solutions of (SVI) and (MVI) respectively. In terms of structural properties, \operatorname{MVI}(\mathcal{G}) is convex and, under our continuity assumption for F, it is straightforward to verify that \operatorname{MVI}(\mathcal{G})\subseteq\operatorname{SVI}(\mathcal{G})\equiv\operatorname{Eq}(\mathcal{G}). In this regard, (MVI) represents an "equilibrium refinement" criterion for \mathcal{G} (see also Section 2.3 below).

In more detail, (MVI) is intimately related to the seminal notion of evolutionary stability that was introduced by Maynard Smith & Price [28] and formed the starting point of evolutionary game theory. To make this link precise, recall that a population state x^{\ast}\in\mathcal{X} is an evolutionarily stable state of \mathcal{G} [28, 20, 45] if

\langle F(\delta x+(1-\delta)x^{\ast}),x-x^{\ast}\rangle<0 (ESS)

for all sufficiently small \delta>0 and all states x\neq x^{\ast}. Analogously, x^{\ast}\in\mathcal{X} is a neutrally stable state (NSS) of \mathcal{G} if (ESS) holds as a weak inequality, i.e.,

\langle F(\delta x+(1-\delta)x^{\ast}),x-x^{\ast}\rangle\leq 0 (NSS)

for all sufficiently small \delta>0 and all x\in\mathcal{X}. Finally, x^{\ast} is said to be a globally evolutionarily stable state (resp. globally neutrally stable state) of \mathcal{G} if (ESS) [resp. (NSS)] holds for all \delta\in(0,1]. In obvious notation, we will write \operatorname{ESS}(\mathcal{G}), \operatorname{NSS}(\mathcal{G}), \operatorname{GESS}(\mathcal{G}) and \operatorname{GNSS}(\mathcal{G}) for the respective sets of states of \mathcal{G} (by definition, \operatorname{GESS}(\mathcal{G}) consists of at most a single point).

Characterizations and implications. 

A concise characterization of these notions of stability was derived by Taylor [54] and Hofbauer et al. [22], who showed that x^{\ast}\in\operatorname{ESS}(\mathcal{G}) if and only if there exists a neighborhood \mathcal{U} of x^{\ast} in \mathcal{X} such that

\langle F(x),x-x^{\ast}\rangle<0\quad\text{for all $x\in\mathcal{U}\setminus\{x^{\ast}\}$}, (3)

and, furthermore, if we can take \mathcal{U}=\mathcal{X} above, then \operatorname{GESS}(\mathcal{G})=\{x^{\ast}\}. For the corresponding neutral versions, we have x^{\ast}\in\operatorname{NSS}(\mathcal{G}) if and only if (3) holds on \mathcal{U} as a weak inequality, and x^{\ast}\in\operatorname{GNSS}(\mathcal{G}) if and only if we can take \mathcal{U}=\mathcal{X} in this case. This characterization of \operatorname{GNSS}(\mathcal{G}) corresponds precisely to the solution set of (MVI); in particular, we have the following string of inclusions:

\operatorname{GESS}\subseteq\operatorname{GNSS}\equiv\operatorname{MVI},\qquad \operatorname{GESS}\subseteq\operatorname{ESS}\subseteq\operatorname{NSS}\subseteq\operatorname{Eq}\equiv\operatorname{SVI},\qquad \operatorname{GNSS}\subseteq\operatorname{NSS}. (4)

In general, the above inclusions are all one-way; in the next section, we discuss two important cases where some of them become equivalences.

2.3. Classes of games

The following classes of games will play an important role in the sequel:

  (1) Potential games: these are games that admit a potential function, i.e., a function P\colon\mathcal{X}\to\mathbb{R} such that F=\nabla P on \mathcal{X}, or, more generally,

P^{\prime}(x;x^{\prime}-x)=\langle F(x),x^{\prime}-x\rangle\quad\text{for all $x,x^{\prime}\in\mathcal{X}$}, (Pot)

where P^{\prime}(x;z)\coloneqq\lim_{t\to 0^{+}}[P(x+tz)-P(x)]/t denotes the one-sided directional derivative of P at x along z. If \mathcal{G} is a potential game, any local maximum of P is an equilibrium of \mathcal{G}, and any strict local maximum of P is an evolutionarily stable state (ESS) of \mathcal{G}. For a detailed treatment of potential games in the population setting, extending the initial study of Monderer & Shapley [37], see Sandholm [44, 45].

  (2) Monotone games: these are games that satisfy the monotonicity condition

\langle F(x^{\prime})-F(x),x^{\prime}-x\rangle\leq 0\quad\text{for all $x,x^{\prime}\in\mathcal{X}$}. (Mon)

This condition implies that every equilibrium of \mathcal{G} is neutrally stable, so we get the following equivalences:

\operatorname{MVI}(\mathcal{G})\equiv\operatorname{GNSS}(\mathcal{G})=\operatorname{NSS}(\mathcal{G})=\operatorname{Eq}(\mathcal{G})\equiv\operatorname{SVI}(\mathcal{G}) (5)

In this class, the existence of equilibria relies only on the minmax theorem [36].

Going further, a game is called strictly monotone if (Mon) holds as a strict inequality whenever x^{\prime}\neq x. In this case, we have the stronger equivalence

\operatorname{GESS}(\mathcal{G})=\operatorname{ESS}(\mathcal{G})=\operatorname{Eq}(\mathcal{G}), (6)

i.e., all inclusions in (4) become two-way. Moreover, \operatorname{Eq}(\mathcal{G}) is a singleton which is also the unique globally evolutionarily stable state (GESS) of the game.

The intersection of potential and monotone (resp. strictly monotone) games occurs when \mathcal{G} admits a concave (resp. strictly concave) potential. The most important example of such games is the class of nonatomic congestion games that arise in routing and transportation theory, cf. Dafermos [11].
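As a rough numerical probe of (Mon) – illustrative only, and no substitute for the algebraic criterion that the symmetric part of the Jacobian of F be negative semidefinite on the tangent space of \mathcal{X} – one can sample pairs of states and test the defining inequality directly, as in the sketch below for a rock-paper-scissors matching game with F(x)=Mx and M antisymmetric (hence monotone).

```python
import numpy as np

# Sampling-based check of (Mon) for an illustrative matching game F(x) = M x.
rng = np.random.default_rng(0)
M = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])          # RPS: antisymmetric, hence monotone

def random_state():
    x = rng.random(3)
    return x / x.sum()

violations = 0
for _ in range(10_000):
    x, xp = random_state(), random_state()
    if (M @ xp - M @ x) @ (xp - x) > 1e-12:
        violations += 1
print("violations of (Mon):", violations)   # expected: 0
```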

2.4. Regularization

In the sequel, we will often consider “regularized” best responses that are single-valued. These are defined as follows:

Definition 2.1.

A regularizer on \mathcal{X} is a function h\colon\mathbb{R}^{\mathcal{A}}\to\mathbb{R}\cup\{\infty\} such that:

  (1) h is supported on \mathcal{X}, i.e., \operatorname{dom}h=\{x\in\mathbb{R}^{\mathcal{A}}:h(x)<\infty\}=\mathcal{X}.

  (2) h is strictly convex and continuous on \mathcal{X}.

Given a regularizer on \mathcal{X}, we further define:

  a) The associated choice map Q_{h}\colon\mathbb{R}^{\mathcal{A}}\to\mathcal{X} given by

Q_{h}(v)\coloneqq\operatorname*{arg\,max}_{x^{\prime}\in\mathcal{X}}\{\langle v,x^{\prime}\rangle-h(x^{\prime})\}. (7a)

  b) The associated regularized best-response map \operatorname{\mathsf{BR}}_{h}\colon\mathcal{X}\to\mathcal{X} given by

\operatorname{\mathsf{BR}}_{h}(x)\coloneqq\operatorname*{arg\,max}_{x^{\prime}\in\mathcal{X}}\{\langle F(x),x^{\prime}\rangle-h(x^{\prime})\}=Q_{h}(F(x)). (7b)
Remark 2.1.

By our assumptions for h (continuity and strict convexity), Q_{h} and \operatorname{\mathsf{BR}}_{h} are both well-defined and single-valued as correspondences. Moreover, by the strict convexity of h, it follows that the convex conjugate h^{\ast}(y)=\max_{x\in\mathcal{X}}\{\langle y,x\rangle-h(x)\} of h is differentiable and satisfies

Q_{h}(v)=\nabla h^{\ast}(v)\quad\text{for all $v\in\mathbb{R}^{\mathcal{A}}$}. (8)
Remark 2.2.

By construction, \operatorname{\mathsf{BR}}_{h} (approximately) best-responds to strategies, while Q_{h} best-responds to payoff vectors. For this reason, we will sometimes refer to Q_{h} as a "payoff-based" regularized best response (as opposed to the "strategy-based" regularized best response \operatorname{\mathsf{BR}}_{h}).

Moving forward, for part of our analysis, we will also require one (or both) of the regularity conditions below:

  (1) h is K-strongly convex on \mathcal{X}, i.e.,

h(\lambda x+(1-\lambda)x^{\prime})\leq\lambda h(x)+(1-\lambda)h(x^{\prime})-\frac{K}{2}\lambda(1-\lambda)\lVert x^{\prime}-x\rVert^{2} (StrCvx)

for all x,x^{\prime}\in\mathcal{X} and all \lambda\in[0,1].

  (2) The subdifferential \partial h of h admits a continuous selection; specifically, writing \mathcal{X}_{h}\coloneqq\operatorname{dom}\partial h=\{x:\partial h(x)\neq\varnothing\} for the "prox-domain" of h, we posit that there exists a continuous map \widetilde{\nabla}h\colon\mathcal{X}_{h}\to\mathbb{R}^{\mathcal{A}} such that

\widetilde{\nabla}h(x)\in\partial h(x)\quad\text{for all $x\in\mathcal{X}_{h}$}. (Diff)

By replacing best responses with regularized best responses in the definition of equilibria, we also define the notion of a regularized equilibrium as follows:

Definition 2.2.

Given a regularizer h on \mathcal{X} and a regularization weight \varepsilon>0, we define an \varepsilon-regularized equilibrium of \mathcal{G} as a profile x^{\ast}\in\mathcal{X} such that x^{\ast}\in\operatorname{\mathsf{BR}}_{\varepsilon h}(x^{\ast}). The set of \varepsilon-regularized equilibria of \mathcal{G} will be denoted by \operatorname{Eq}_{\varepsilon h}(\mathcal{G}).

Under (Diff), x^{\ast}\in\mathcal{X} is an \varepsilon-regularized equilibrium if and only if

\langle F(x^{\ast})-\varepsilon\widetilde{\nabla}h(x^{\ast}),x^{\ast}-x\rangle\geq 0\quad\text{for all $x\in\mathcal{X}$}. (9)

In this regard, \varepsilon-regularized equilibria can be seen as Nash equilibria of the "\varepsilon-regularized" game \mathcal{G}_{\varepsilon}\equiv\mathcal{G}_{\varepsilon}(\mathcal{A},F_{\varepsilon}) with payoff profile F_{\varepsilon}(x)\coloneqq F(x)-\varepsilon\widetilde{\nabla}h(x); more concisely, \operatorname{Eq}_{\varepsilon}(\mathcal{G})=\operatorname{Eq}(\mathcal{G}_{\varepsilon}).

Equilibria and regularized equilibria are related by the next hemicontinuity property, itself a consequence of the maximum theorem of Berge [6].

Lemma 1.

Let x^{\ast}_{\varepsilon}\in\operatorname{Eq}_{\varepsilon}(\mathcal{G}), \varepsilon>0, be a family of regularized equilibria of \mathcal{G}. Then, in the limit \varepsilon\to 0, any accumulation point of x^{\ast}_{\varepsilon} is an equilibrium of \mathcal{G}.

We close this section with a concrete illustration of the above concepts:

Example 2.1 (Logit choice).

An important example of regularization corresponds to the entropic regularizer h(x)=\sum_{\alpha\in\mathcal{A}}x_{\alpha}\log x_{\alpha} which, by a standard calculation, leads to the choice map

\operatorname{\Lambda}(v)=\frac{(e^{v_{1}},\dotsc,e^{v_{A}})}{e^{v_{1}}+\dotsb+e^{v_{A}}}. (10)

This map is commonly referred to as the logit choice map [29], and the corresponding fixed points x^{\ast}=\operatorname{\Lambda}(F(x^{\ast})/\varepsilon) are known as logit equilibria; for a detailed discussion, see van Damme [56], McKelvey & Palfrey [30], Sandholm [45], and references therein. It is also well known that the entropic regularizer is 1-strongly convex relative to the L^{1}-norm, so \operatorname{\Lambda} is 1-Lipschitz continuous, cf. Shalev-Shwartz [48]. \lozenge
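For illustration, the short Python sketch below implements the logit choice map (10) and runs a naive fixed-point iteration for a logit equilibrium x^{\ast}=\operatorname{\Lambda}(F(x^{\ast})/\varepsilon); the 2x2 matching game and the choice \varepsilon=1 are illustrative assumptions, and plain fixed-point iteration need not converge in general.

```python
import numpy as np

def logit(v):
    """Logit choice map (10); v is shifted by max(v) for numerical stability."""
    z = np.exp(v - v.max())
    return z / z.sum()

M = np.array([[2., 0.],
              [3., 1.]])             # illustrative game: strategy 2 dominates strategy 1
F = lambda x: M @ x
eps = 1.0

x = np.array([0.5, 0.5])
for _ in range(100):
    x = logit(F(x) / eps)            # fixed-point iteration x <- Lambda(F(x)/eps)
print(x)                             # ~ [0.269, 0.731]: an interior logit equilibrium
```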

3. Dynamics

In this section, we introduce a wide range of dynamics for learning in nonatomic games, in both continuous and discrete time. The unifying principle of the dynamics under study is that agents play a best response – regularized or otherwise – to some version of the history of play: the empirical population frequencies, or the players’ average gains. For historical reasons, we begin with the discrete-time framework in Section 3.1, and we present the corresponding continuous-time dynamics in Section 3.2.

3.1. Discrete-time processes

In the discrete-time case, we will consider the evolution of the population state X_{n}\in\mathcal{X} over a series of epochs n=1,2,\dotsc (corresponding, for example, to generations in an evolutionary context). In turn, this population state is determined by a recursive update rule – or algorithm – that defines the dynamics in question. Depending on the manner of "best-responding" and the signal that the agents are best-responding to, we have the following dynamics.

Fictitious play. 

Dating back to Brown [8] and Robinson [41], fictitious play (FP) posits that agents best-respond to the empirical frequency (or "time average") \bar{X}_{n}=(1/n)\sum_{k=1}^{n}X_{k} of their population state. Concretely, this means that the population state evolves according to the process

X_{n+1}\in\operatorname{\mathsf{BR}}(\bar{X}_{n})=\operatorname*{arg\,max}\nolimits_{x\in\mathcal{X}}\langle F(\bar{X}_{n}),x\rangle. (FP)

In terms of empirical frequencies, we have the recursive dynamics

\bar{X}_{n+1}-\bar{X}_{n}\in\frac{1}{n+1}[\operatorname{\mathsf{BR}}(\bar{X}_{n})-\bar{X}_{n}]. (11)

In contrast to the sequence of actual population states X_{n}\in\mathcal{X}, n=1,2,\dotsc, the stationary points of the time-averaged process (11) are straightforward to characterize. Indeed, any fixed point x^{\ast} of (11) satisfies

0\in\operatorname{\mathsf{BR}}(x^{\ast})-x^{\ast}, (12)

i.e., x^{\ast} is stationary under (11) if and only if it is an equilibrium of \mathcal{G}.
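A discrete-time sketch of (FP), using the update (11), might look as follows; the rock-paper-scissors matching game is an illustrative assumption, and since this game is monotone the empirical frequencies should (if slowly) spiral toward the unique equilibrium.

```python
import numpy as np

# Fictitious play (FP): best-respond to the empirical frequency X_bar_n and
# update the running average as in (11).  Illustrative RPS matching game.

M = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])
F = lambda x: M @ x

def best_response(x):
    b = np.zeros(len(x))
    b[np.argmax(F(x))] = 1.0        # a vertex of BR(x); ties broken arbitrarily
    return b

x_bar = np.array([1.0, 0.0, 0.0])   # initial empirical state
for n in range(1, 100_000):
    X_next = best_response(x_bar)
    x_bar += (X_next - x_bar) / (n + 1)
print(x_bar)                        # should be close to (1/3, 1/3, 1/3)
```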

Regularized fictitious play. 

A variant of the original fictitious play algorithm is the "regularized" version in which best responses are replaced by regularized best responses. In particular, given an underlying regularizer h on \mathcal{X} and a corresponding regularization weight \varepsilon>0, the regularized fictitious play (RFP) process is defined as

X_{n+1}=\operatorname{\mathsf{BR}}_{\varepsilon h}(\bar{X}_{n})=\operatorname*{arg\,max}\nolimits_{x\in\mathcal{X}}V_{\varepsilon h}(\bar{X}_{n},x) (RFP)

with

V_{\varepsilon h}(x^{\prime},x)=\langle F(x^{\prime}),x\rangle-\varepsilon h(x). (13)

Then, as before, in terms of empirical frequencies, we have

\bar{X}_{n+1}-\bar{X}_{n}\in\frac{1}{n+1}[\operatorname{\mathsf{BR}}_{\varepsilon h}(\bar{X}_{n})-\bar{X}_{n}]. (14)

As in the case of (FP), the stationary points of the time-averaged dynamics (14) are given by the fixed point equation

0\in\operatorname{\mathsf{BR}}_{\varepsilon h}(x^{\ast})-x^{\ast}, (15)

i.e., they are precisely the \varepsilon-regularized equilibria of \mathcal{G} (cf. Definition 2.2).

Remark.

In the literature, this process is sometimes referred to as "smooth" or "perturbed" fictitious play, cf. Fudenberg & Levine [14], Hofbauer & Sandholm [18] and references therein. This difference in terminology is owed to a different set of assumptions for h, which gives rise to interior-valued choice maps Q\colon\mathbb{R}^{\mathcal{A}}\to\mathcal{X}.

Vanishingly regularized fictitious play. 

When the regularization weight in (RFP) decreases over time, we obtain the vanishingly regularized fictitious play (VRFP) dynamics of Benaïm & Faure [3], viz.

X_{n+1}=\operatorname{\mathsf{BR}}_{\varepsilon_{n}h}(\bar{X}_{n})=\operatorname*{arg\,max}\nolimits_{x\in\mathcal{X}}V_{\varepsilon_{n}h}(\bar{X}_{n},x) (VRFP)

with \varepsilon_{n}>0 for all n and \varepsilon_{n}\to 0 as n\to\infty. In terms of empirical frequencies, we then have

\bar{X}_{n+1}-\bar{X}_{n}\in\frac{1}{n+1}[\operatorname{\mathsf{BR}}_{\varepsilon_{n}h}(\bar{X}_{n})-\bar{X}_{n}]. (16)

Because this process is non-autonomous (\varepsilon_{n} depends explicitly on n), it is no longer meaningful to discuss its rest points.

Dual averaging. 

Dually to the above, instead of best-responding to the aggregate history of play (perfectly or approximately), we can also consider the case where agents play a regularized best response to their aggregate payoff. In our context, this gives rise to the dual averaging dynamics

X_{n+1}=Q_{h}\left(\eta_{n}\sum_{k=1}^{n}F(X_{k})\right), (17)

with Q_{h} defined in (7a). Then, in iterative form, we get the update rule

S_{n}=S_{n-1}+F(X_{n}),\qquad X_{n+1}=Q_{h}(\eta_{n}S_{n}), (DA)

where \eta_{n}>0 is a "learning rate" parameter in the spirit of Nesterov [40]. In the recursive formulation (DA) of the dynamics, the initialization S_{0}=0 is the most common one but, otherwise, it can be arbitrary.
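A sketch of (DA) with the entropic regularizer (so that Q_{h} is the logit map (10)) and the illustrative choice \eta_{n}=1/\sqrt{n} is given below; the game, the nonzero initialization of the score vector, and the learning-rate schedule are all assumptions made for the example.

```python
import numpy as np

# Dual averaging (DA): accumulate payoffs in a score vector S and play
# X_{n+1} = Q_h(eta_n * S_n), here with Q_h = logit choice.  Illustrative RPS game.

def logit(v):
    z = np.exp(v - v.max())
    return z / z.sum()

M = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])
F = lambda x: M @ x

S = np.array([1.0, 0.0, 0.0])        # arbitrary initial score (need not be 0)
avg = np.zeros(3)
for n in range(1, 20_000):
    X = logit(S / np.sqrt(n))        # X_n = Q_h(eta S), eta = 1/sqrt(n)
    S += F(X)                        # S_n = S_{n-1} + F(X_n)
    avg += (X - avg) / n             # running time average of play
print(avg)                           # should drift toward (1/3, 1/3, 1/3)
```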

Remark.

In the literature on online learning and online optimization, the process (DA) is known as "follow the regularized leader" (FTRL) [48, 49]. The terminology "dual averaging" is due to Nesterov [40] and is more common in the theory of variational inequalities and offline convex optimization. We adopt the latter to highlight the link of our work to variational inequalities.

Links between the dynamics: the case of random matching. 

To illustrate the relations between the various learning processes discussed above, it will be instructive to consider nonatomic games generated by random matching [57, 45]. In such games, players are randomly matched to play a symmetric two-player game with payoff matrix M\in\mathbb{R}^{\mathcal{A}\times\mathcal{A}}, so the population's mean payoff profile is F(x)=Mx for all x\in\mathcal{X}. As a result, the dynamics (DA) may be rewritten as

X_{n+1}=Q_{h}\left(\eta_{n}\sum_{k=1}^{n}F(X_{k})\right)=Q_{h}\left(n\eta_{n}F(\bar{X}_{n})\right)=Q_{h/(n\eta_{n})}(F(\bar{X}_{n})), (18)

which immediately allows us to recover the variants of fictitious play as follows:

  (1) For (VRFP): \eta_{n}=1/(n\varepsilon_{n}) for a sequence of weights \varepsilon_{n}>0 with \lim_{n\to\infty}\varepsilon_{n}=0.

  (2) For (RFP): \eta_{n}=1/(n\varepsilon) for some fixed \varepsilon>0.

  (3) For (FP): this corresponds to the limiting cases "n\eta_{n}=\infty" or "h=0". [Of course, these parameter choices are not formally allowed in (DA), so this limit is informal.]

Remark.

The above is meaningful only when F(x) is linear in x; however, the relation between \varepsilon and \eta will be seen to underlie a large part of the sequel.
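The identity (18) is easy to verify numerically when F is linear; the sketch below does so for a randomly drawn matching matrix and an arbitrary batch of states, using the entropic regularizer (for which Q_{ch}(v)=\operatorname{\Lambda}(v/c)). All names and parameters are illustrative.

```python
import numpy as np

# Numerical check of identity (18) under random matching (linear F):
# Q_h(eta_n * sum_k F(X_k)) equals Q_{h/(n eta_n)}(F(X_bar_n)).

def logit(v):
    z = np.exp(v - v.max())
    return z / z.sum()

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))                 # arbitrary matching matrix
F = lambda x: M @ x

X = rng.dirichlet(np.ones(4), size=50)          # 50 arbitrary population states
n, eta = len(X), 0.1

lhs = logit(eta * sum(F(x) for x in X))         # dual averaging viewpoint
rhs = logit(n * eta * F(X.mean(axis=0)))        # regularized fictitious play viewpoint
print(np.allclose(lhs, rhs))                    # True
```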

3.2. Continuous-time dynamics

We now proceed to define the corresponding continuous-time dynamics for each of the processes described above.

Best reply dynamics. 

The autonomous formulation (11) of fictitious play can be seen as an Euler discretization of the best reply dynamics (BRD) of Gilboa & Matsui [15], namely

\dot{\mu}_{t}\in\operatorname{\mathsf{BR}}(\mu_{t})-\mu_{t}. (BRD)

In the above, \mu_{t}\in\mathcal{X} is the continuous-time analogue of the empirical mean process \bar{X}_{n} in discrete time; we use the notation \mu instead of x to highlight this link. Clearly, the rest points of (BRD) are again described by the fixed point equation (12), i.e., \mu^{\ast} is stationary under (BRD) if and only if it is an equilibrium of \mathcal{G}.

Regularized best reply dynamics. 

Working as above, the regularized best reply dynamics (RBRD) are defined as

\dot{\mu}_{t}=\operatorname{\mathsf{BR}}_{\varepsilon h}(\mu_{t})-\mu_{t}. (RBRD)

As in the case of (RFP), the rest points of (RBRD) are characterized by the fixed point equation (15), i.e., \mu^{\ast} is stationary under (RBRD) if and only if it is an \varepsilon-regularized equilibrium of \mathcal{G}.

Example 3.1 (The logit dynamics).

Building on the logit choice map discussed in Section 2 (cf. Example 2.1), the associated regularized best reply dynamics are known as logit dynamics, and are given by

\dot{\mu}_{t}=\operatorname{\Lambda}\left(F(\mu_{t})/\varepsilon\right)-\mu_{t}. (19)

Vanishing regularized best reply dynamics. 

If we allow the regularization weight \varepsilon to vary in (RBRD), we obtain the non-autonomous dynamics

\dot{\mu}_{t}=\operatorname{\mathsf{BR}}_{\varepsilon_{t}h}(\mu_{t})-\mu_{t}, (VBRD)

where the "V" indicates again that \varepsilon_{t}\to 0 as t\to\infty.

The dual averaging dynamics. 

Finally, given a regularizer h on \mathcal{X} as above, the dynamics of dual averaging in continuous time can be written as

\dot{y}_{t}=F(x_{t}),\qquad x_{t}=Q_{h}(\eta_{t}y_{t}), (DAD)

where \eta_{t}\geq 0 is a time-varying learning parameter, which we assume throughout to be non-increasing. More compactly, the above can also be written as

\dot{y}_{t}=F(Q_{h}(\eta_{t}y_{t})) (20a)

or

x_{t}=Q_{h}\left(\eta_{t}\int_{0}^{t}F(x_{s})\,ds\right), (20b)

depending on whether we take a differential- or integral-based viewpoint. Specifically, we note that (20a) is an autonomous ordinary differential equation evolving in the dual, payoff space of the game; by contrast, (20b) is an integral equation stated directly in the primal, strategy space of the game. We also note that, unlike (BRD), (RBRD) and (VBRD), the dynamics (DAD) are stated in terms of the actual population state x_{t} at time t, not its empirical mean \mu_{t} (which is the driving variable of the previous dynamics).

Example 3.2.

If \eta_{t}\equiv 1 and h is the entropic regularizer on \mathcal{X} (cf. Example 2.1), the dynamics (DAD) yield the replicator dynamics of Taylor & Jonker [55], viz.

\dot{x}_{\alpha,t}=x_{\alpha,t}[u_{\alpha}(x_{t})-\langle F(x_{t}),x_{t}\rangle]. (RD)

In words, (RD) indicates that the per capita growth rate of the population of \alpha-strategists is proportional to the difference between the payoff that they experience and the mean population payoff. For a detailed presentation of this derivation in different contexts, see Rustichini [43], Sorin [50, 51], Mertikopoulos & Moustakas [31] and references therein. \lozenge
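A simple Euler integration of (RD) for an illustrative rock-paper-scissors matching game is sketched below; in this monotone (but not strictly monotone) game the trajectory keeps cycling around the interior equilibrium while its time average approaches it, in line with Proposition 4.1 below. The step size and horizon are arbitrary choices for the example.

```python
import numpy as np

# Euler discretization of the replicator dynamics (RD) for an RPS matching game.
M = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])
F = lambda x: M @ x

x = np.array([0.6, 0.3, 0.1])
dt, T = 1e-3, 100.0
steps = int(T / dt)
avg = np.zeros(3)
for _ in range(steps):
    x = x + dt * x * (F(x) - F(x) @ x)      # x_dot_a = x_a [u_a(x) - <F(x), x>]
    x = np.clip(x, 0, None); x /= x.sum()   # re-normalize to guard against drift
    avg += x / steps
print(x, avg)   # x keeps cycling (up to discretization error); avg ~ (1/3, 1/3, 1/3)
```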

Example 3.3.

If \eta_{t}\equiv 1 and h(x)=(1/2)\sum_{\alpha\in\mathcal{A}}x_{\alpha}^{2}, the integral form (20b) of (DAD) becomes

x_{t}=\operatorname{\Pi}\left(\int_{0}^{t}F(x_{s})\,ds\right) (21)

where \operatorname{\Pi} denotes the Euclidean projector on \mathcal{X}. The differential form of (21) is more delicate to describe because x_{t} may enter and exit different faces of \mathcal{X} in perpetuity. However, if we focus on an open interval \mathcal{T}\subseteq\mathbb{R} over which the support of x_{t} remains constant, it can be shown that (21) follows the projection dynamics of Friedman [12], viz. for all t\in\mathcal{T} we have

\dot{x}_{\alpha,t}=\begin{cases}u_{\alpha}(x_{t})-\frac{1}{\lvert\operatorname{supp}(x_{t})\rvert}\sum_{\beta\in\operatorname{supp}(x_{t})}u_{\beta}(x_{t})&\quad\text{if $\alpha\in\operatorname{supp}(x_{t})$},\\ 0&\quad\text{otherwise}.\end{cases} (PD)

For the details of this derivation, see Mertikopoulos & Sandholm [32]. \lozenge

Remark.

For posterity, we note that both regularizers used in the above examples satisfy (StrCvx), (Diff), and the technical condition (Rec) that we describe later in the paper.

Random matching in continuous time. 

Following Hofbauer et al. [23], the various dynamics presented so far can be linked as follows when F is linear and \mu_{t}=\frac{1}{t}\int_{0}^{t}x_{s}\,ds for t>0:

x_{t}=\operatorname*{arg\,max}_{x\in\mathcal{X}}\left\{\eta_{t}\int_{0}^{t}\langle F(x_{s}),x\rangle\,ds-h(x)\right\}=\operatorname*{arg\,max}_{x\in\mathcal{X}}\left\{\eta_{t}\,t\,\langle F(\mu_{t}),x\rangle-h(x)\right\}=\operatorname{\mathsf{BR}}_{[1/(t\eta_{t})]h}(\mu_{t}). (22)

It thus follows that the empirical distribution of play \mu_{t} under (DAD) follows the dynamics

\dot{\mu}_{t}=\frac{\operatorname{\mathsf{BR}}_{\varepsilon_{t}h}(\mu_{t})-\mu_{t}}{t} (23)

with \varepsilon_{t}=1/(t\eta_{t}). Hence, after the change of time t\leftarrow\log t, we get (VBRD) with a variable regularization parameter \varepsilon_{t}=1/(t\eta_{t}).

4. Analysis and results in continuous time

We now proceed to present our results for the class of dynamics under study. We begin with the continuous-time analysis, which provides a template for the discrete-time analysis in the next section. We prove that, under assumptions depending on the dynamical system and the class of games under consideration (potential, monotone, or strictly monotone), the dynamics converge to the set of equilibria – or regularized equilibria – of the game.[1]

[1] Recall that a family (x_{t})_{t\geq 0} (resp. a sequence (X_{n}), n=1,2,\dotsc) converges to a set B if the set of accumulation points \mathbf{A}[x_{t}] (resp. \mathbf{A}[X_{n}]), as t\to\infty (resp. n\to\infty), satisfies \mathbf{A}[x_{t}]\subset B (resp. \mathbf{A}[X_{n}]\subset B).

4.1. Best reply dynamics

Our first result concerns the best reply dynamics (BRD). Albeit slightly more general, the results of this subsection and the next essentially follow Hofbauer & Sandholm [18, 19].

Theorem 4.1.

Suppose that one of the following holds:

  (a) \mathcal{G} is potential.

  (b) \mathcal{G} is monotone and F is C^{1}-smooth.

Then every solution orbit \mu_{t} of (BRD) converges to \operatorname{Eq}(\mathcal{G}).

Proof.

Both cases rely on establishing a suitable Lyapunov function E(t) for (BRD):

  (1) In the potential case, we simply take E(t)=P(\mu_{t}).

  (2) In the monotone case, E(t)=\operatorname{Gap}(\mu_{t}), where \operatorname{Gap} denotes the function

\operatorname{Gap}(x)\coloneqq\max_{x^{\prime}\in\mathcal{X}}\langle F(x),x^{\prime}-x\rangle=\langle F(x),\operatorname{\mathsf{BR}}(x)-x\rangle. (24)

Observe that \operatorname{Gap}(x)\geq 0, with equality if and only if x\in\operatorname{Eq}(\mathcal{G}).

We now proceed on a case-by-case basis:

Potential case. 

From (Pot) we obtain that

\frac{d}{dt}P(\mu_{t})=\lim_{\delta\to 0^{+}}[P(\mu_{t}+\delta\dot{\mu}_{t})-P(\mu_{t})]/\delta=\langle F(\mu_{t}),\dot{\mu}_{t}\rangle. (25)

Since \dot{\mu}_{t}\in\operatorname{\mathsf{BR}}(\mu_{t})-\mu_{t}, it follows that \frac{d}{dt}P(\mu_{t})=\operatorname{Gap}(\mu_{t})\geq 0, with equality if and only if \dot{\mu}_{t}=0. Consequently, from Lyapunov's first method [see e.g., 20, Theorem 2.6.1] we deduce that \mu_{t} converges to the set of rest points of (BRD), which is \operatorname{Eq}(\mathcal{G}).

Monotone case. 

Let JF denote the Jacobian of F. By the envelope theorem we get:

\frac{d}{dt}\operatorname{Gap}(\mu_{t})=\langle F(\mu_{t}),-\dot{\mu}_{t}\rangle+\dot{\mu}_{t}JF(\mu_{t})(\operatorname{\mathsf{BR}}(\mu_{t})-\mu_{t})=-\operatorname{Gap}(\mu_{t})+\dot{\mu}_{t}JF(\mu_{t})\dot{\mu}_{t}. (26)

As \mathcal{G} is monotone, (x^{\prime}-x)JF(x)(x^{\prime}-x)\leq 0 for all x,x^{\prime}\in\mathcal{X}, so \dot{\mu}_{t}JF(\mu_{t})\dot{\mu}_{t}\leq 0. Consequently:

\frac{d}{dt}\operatorname{Gap}(\mu_{t})\leq-\operatorname{Gap}(\mu_{t}). (27)

Hence \operatorname{Gap}(\mu_{t}) decreases exponentially to 0, so \mu_{t} converges to \operatorname{Eq}(\mathcal{G}). ∎
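The monotone case of this argument is easy to visualize numerically: along an Euler discretization of (BRD) in a monotone game, the gap function (24) should decay, in line with the differential inequality (27). The sketch below (an illustration only, with an RPS matching game and an arbitrary step size) prints the gap along such a discretized trajectory.

```python
import numpy as np

# Euler discretization of (BRD) and the gap function (24) in a monotone game.
M = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])
F = lambda x: M @ x

def best_response(x):
    b = np.zeros(len(x)); b[np.argmax(F(x))] = 1.0
    return b

def gap(x):
    return F(x).max() - F(x) @ x        # Gap(x) = <F(x), BR(x) - x>

mu = np.array([0.7, 0.2, 0.1])
dt = 1e-2
for k in range(2000):
    if k % 500 == 0:
        print(k, gap(mu))               # the printed gap values should decay
    mu = mu + dt * (best_response(mu) - mu)
print(gap(mu), mu)                      # mu should approach (1/3, 1/3, 1/3)
```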

4.2. Regularized best reply dynamics

In this section we will assume that the regularizer h satisfies (Diff). In a slight abuse of notation, we will suppress the dependence on h and write F_{\varepsilon} for F_{\varepsilon h}\coloneqq F-\varepsilon\widetilde{\nabla}h, see (9), with \widetilde{\nabla}h defined as in (Diff). We also write \operatorname{\mathsf{BR}}_{\varepsilon}(x) for \operatorname{\mathsf{BR}}_{\varepsilon h}(x), and so on. The reason is that the regularizer h will be fixed throughout, while the weight \varepsilon will be allowed to vary in the next subsection.

Lemma 2.

Under (Diff), we have \langle F_{\varepsilon}(x),\operatorname{\mathsf{BR}}_{\varepsilon}(x)-x\rangle\geq 0 for all x\in\mathcal{X}, and equality holds if and only if x=\operatorname{\mathsf{BR}}_{\varepsilon}(x).

Proof.

Let \phi(z)=\langle F(x),z\rangle-\varepsilon h(z). Then \phi is strictly concave, so \partial\phi defines a strictly monotone operator. Thus, x^{\prime}=\operatorname*{arg\,max}_{z\in\mathcal{X}}\phi(z) is a solution of the Minty variational inequality associated to \partial\phi: for all d\in\partial\phi(x), we have \langle d,x^{\prime}-x\rangle\geq 0, with equality if and only if x=x^{\prime}. As F_{\varepsilon}(x)\in\partial\phi(x) and x^{\prime}=\operatorname{\mathsf{BR}}_{\varepsilon}(x), the result follows. ∎

Lemma 3.

Suppose that (Diff) holds. If the game \mathcal{G} with payoff field F is potential (resp. monotone), then the game \mathcal{G}_{\varepsilon} with payoff field F_{\varepsilon} is potential (resp. strictly monotone).

Proof.

If \mathcal{G} has a potential P, then P-\varepsilon h is a potential function of \mathcal{G}_{\varepsilon}. If \mathcal{G} is a monotone game, the game \mathcal{G}_{\varepsilon} is strictly monotone because we have \langle\widetilde{\nabla}h(x)-\widetilde{\nabla}h(x^{\prime}),x-x^{\prime}\rangle\geq 0 for every x,x^{\prime}\in\mathcal{X}, with equality if and only if x=x^{\prime}. ∎

We now turn to the asymptotic properties of regularized best reply dynamics.

Theorem 4.2.

Suppose that (Diff) is satisfied and that one of the following holds:

  (a) \mathcal{G} is potential.

  (b) \mathcal{G} is monotone and F is C^{1}-smooth.

Then every solution orbit \mu_{t} of (RBRD) converges to \operatorname{Eq}_{\varepsilon}(\mathcal{G}).

Remark.

Observe that when \mathcal{G} is monotone, \mathcal{G}_{\varepsilon} is strictly monotone, so \operatorname{Eq}_{\varepsilon}(\mathcal{G}) is a singleton.

Proof.

As above, both cases rely on establishing a suitable Lyapunov function E(t) for (RBRD):

  (1) In the potential case: E(t)=P_{\varepsilon}(\mu_{t})\coloneqq P(\mu_{t})-\varepsilon h(\mu_{t}).

  (2) In the monotone case:

E(t)=\operatorname{Gap}_{\varepsilon}(\mu_{t})\coloneqq\max_{x^{\prime}\in\mathcal{X}}\bigl[\langle F(\mu_{t}),x^{\prime}-\mu_{t}\rangle-\varepsilon(h(x^{\prime})-h(\mu_{t}))\bigr]=\max_{x^{\prime}\in\mathcal{X}}V_{\varepsilon}(\mu_{t},x^{\prime})-V_{\varepsilon}(\mu_{t},\mu_{t}). (28)

Potential case. 

There exists z(t)\in\partial h(\mu_{t}) such that – similarly to (25) – we have:

\frac{d}{dt}P_{\varepsilon}(\mu_{t})=\lim_{\delta\to 0^{+}}[P_{\varepsilon}(\mu_{t}+\delta\dot{\mu}_{t})-P_{\varepsilon}(\mu_{t})]/\delta=\langle F(\mu_{t}),\dot{\mu}_{t}\rangle-\varepsilon\langle z(t),\dot{\mu}_{t}\rangle. (29)

As \dot{\mu}_{t}=\operatorname{\mathsf{BR}}_{\varepsilon}(\mu_{t})-\mu_{t}, Lemma A.1 in the appendix gives \langle z(t),\dot{\mu}_{t}\rangle\leq\langle\widetilde{\nabla}h(\mu_{t}),\dot{\mu}_{t}\rangle. Consequently,

\frac{d}{dt}P_{\varepsilon}(\mu_{t})\geq\langle F(\mu_{t}),\dot{\mu}_{t}\rangle-\varepsilon\langle\widetilde{\nabla}h(\mu_{t}),\dot{\mu}_{t}\rangle=\langle F_{\varepsilon}(\mu_{t}),\dot{\mu}_{t}\rangle. (30)

By Lemma 2 and the above, we deduce that \frac{d}{dt}P_{\varepsilon}(\mu_{t})\geq 0, with equality if and only if \dot{\mu}_{t}=0. Consequently, by a Lyapunov argument, \mu_{t} converges to \operatorname{Eq}_{\varepsilon}(\mathcal{G}).

Monotone case. 

Using the envelope theorem as in (26) and, just as above, there exists z(t)\in\partial h(\mu_{t}) such that:

\frac{d}{dt}\operatorname{Gap}_{\varepsilon}(\mu_{t})=-\langle F(\mu_{t})-\varepsilon z(t),\dot{\mu}_{t}\rangle+\dot{\mu}_{t}JF(\mu_{t})\dot{\mu}_{t}\leq-\langle F_{\varepsilon}(\mu_{t}),\dot{\mu}_{t}\rangle+\dot{\mu}_{t}JF(\mu_{t})\dot{\mu}_{t}. (31)

The inequality is a consequence of Lemma A.1, which implies that \langle z(t),\dot{\mu}_{t}\rangle\leq\langle\widetilde{\nabla}h(\mu_{t}),\dot{\mu}_{t}\rangle. Since \mathcal{G} is monotone, \dot{\mu}_{t}JF(\mu_{t})\dot{\mu}_{t}\leq 0. Consequently, as \dot{\mu}_{t}=\operatorname{\mathsf{BR}}_{\varepsilon}(\mu_{t})-\mu_{t}, Lemma 2 yields \frac{d}{dt}\operatorname{Gap}_{\varepsilon}(\mu_{t})\leq 0, with equality if and only if \dot{\mu}_{t}=0. Hence \mu_{t} converges to \operatorname{Eq}_{\varepsilon}(\mathcal{G}). ∎

4.3. Vanishing regularized best reply dynamics

A similar technique can be used to obtain convergence to the set of Nash equilibria for the vanishing regularized best reply dynamics. The additional difficulty here comes from the fact that these dynamics are not autonomous, so the Lyapunov convergence argument must be established by hand. To the best of our knowledge, this is a new result.

Theorem 4.3.

Suppose that (Diff) holds and that (VBRD) is run with a smoothly-varying and strictly decreasing regularization weight \varepsilon_{t}\to 0. Assume further that h takes values in [0,1], and that one of the following holds:

  (a) \mathcal{G} is potential.

  (b) \mathcal{G} is monotone, F is C^{1}-smooth, and \widetilde{\nabla}h is bounded.

Then every solution orbit \mu_{t} of (VBRD) converges to \operatorname{Eq}(\mathcal{G}).

Remark.

The assumption that h takes values in [0,1] is made only for notational convenience and does not incur any loss of generality.

Proof.

As before, we treat the potential and monotone cases separately.

Potential case. 

Let P_{\varepsilon_{t}}(\mu_{t})=P(\mu_{t})-\varepsilon_{t}h(\mu_{t}). Then, a calculation similar to the above shows that there exists z(t)\in\partial h(\mu_{t}) such that

\frac{d}{dt}P_{\varepsilon_{t}}(\mu_{t})=\langle F(\mu_{t})-\varepsilon_{t}z(t),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})-\mu_{t}\rangle-\dot{\varepsilon}_{t}h(\mu_{t})\geq\langle F_{\varepsilon_{t}}(\mu_{t}),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})-\mu_{t}\rangle-\dot{\varepsilon}_{t}h(\mu_{t}). (32)

The inequality is a consequence of Lemma A.1, which implies that \langle z(t),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})-\mu_{t}\rangle\leq\langle\widetilde{\nabla}h(\mu_{t}),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})-\mu_{t}\rangle. By Lemma 2, \langle F_{\varepsilon_{t}}(x),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(x)-x\rangle\geq 0. Since -\dot{\varepsilon}_{t}h(\mu_{t})\geq 0, we deduce that \frac{d}{dt}P_{\varepsilon_{t}}(\mu_{t})\geq 0, and consequently P_{\varepsilon_{t}}(\mu_{t}) converges. As \varepsilon_{t} goes to zero and h(\cdot) is bounded, P(\mu_{t}) converges as well. We will deduce from this that \mu_{t} converges to \operatorname{Eq}(\mathcal{G}). By contradiction, if a limit point \mu_{0}=\lim_{k}\mu_{t_{k}} is not in \operatorname{Eq}(\mathcal{G}), then:

\operatorname{Gap}(\mu_{0})=\max_{x^{\prime}\in\mathcal{X}}\langle F(\mu_{0}),x^{\prime}-\mu_{0}\rangle=\langle F(\mu_{0}),\operatorname{\mathsf{BR}}(\mu_{0})-\mu_{0}\rangle=\alpha>0. (33)

Recall that, by definition, for all t and \mu:

\operatorname{Gap}_{\varepsilon_{t}}(\mu)=\max_{x^{\prime}\in\mathcal{X}}\bigl[\langle F(\mu),x^{\prime}-\mu\rangle-\varepsilon_{t}(h(x^{\prime})-h(\mu))\bigr]=\langle F(\mu),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu)-\mu\rangle-\varepsilon_{t}(h(\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu))-h(\mu))=\max_{x^{\prime}\in\mathcal{X}}V_{\varepsilon_{t}}(\mu,x^{\prime})-V_{\varepsilon_{t}}(\mu,\mu). (34)

Also, as h takes values in [0,1], \lvert V_{\varepsilon}(x,x^{\prime})-V_{\varepsilon}(x,x)-(V_{\varepsilon^{\prime}}(x,x^{\prime})-V_{\varepsilon^{\prime}}(x,x))\rvert\leq\lvert\varepsilon-\varepsilon^{\prime}\rvert. Thus:

\lvert\operatorname{Gap}_{\varepsilon_{t}}(\mu)-\operatorname{Gap}(\mu)\rvert\leq\varepsilon_{t}. (35)

As in (25), one has for all k and s:

\frac{d}{dt}P(\mu_{s+t_{k}})=\langle F(\mu_{s+t_{k}}),\operatorname{\mathsf{BR}}_{\varepsilon_{s+t_{k}}}(\mu_{s+t_{k}})-\mu_{s+t_{k}}\rangle. (36)

Thus, using (34) and then (35), we deduce that:

\frac{d}{dt}P(\mu_{s+t_{k}})\geq\operatorname{Gap}_{\varepsilon_{s+t_{k}}}(\mu_{t_{k}+s})-\varepsilon_{s+t_{k}}\geq\operatorname{Gap}(\mu_{t_{k}+s})-2\varepsilon_{s+t_{k}}. (37)

As the function \mu\mapsto\operatorname{Gap}(\mu) is uniformly continuous and t\mapsto\mu_{t} is Lipschitz, recalling that \operatorname{Gap}(\mu_{t_{k}})\to\alpha>0, we deduce that there exist \delta>0 and K_{1} such that for all k\geq K_{1} we have \operatorname{Gap}(\mu_{t_{k}})\geq 3\alpha/4 and, for all 0\leq s\leq\delta:

\operatorname{Gap}(\mu_{t_{k}+s})\geq\operatorname{Gap}(\mu_{t_{k}})-\alpha/4\geq\alpha/2. (38)

As \varepsilon_{t} decreases to 0, there is K_{2} such that \varepsilon_{s+t_{k}}\leq\alpha/8 for all k\geq K_{2}. From this, (37) and (38), we obtain that \frac{d}{dt}P(\mu_{s+t_{k}})\geq\alpha/4 for all 0\leq s\leq\delta and all k\geq\max(K_{1},K_{2}). Thus P(\mu_{\delta+t_{k}})-P(\mu_{t_{k}})=\int_{0}^{\delta}\frac{d}{dt}P(\mu_{s+t_{k}})\,ds\geq\delta\alpha/4, which contradicts the convergence of P(\mu_{t}).

Monotone case. 

Using the envelope theorem, a calculation similar to the above shows that there exists z(t)\in\partial h(\mu_{t}) such that

\frac{d}{dt}\operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})=-\langle F(\mu_{t})-\varepsilon_{t}z(t),\dot{\mu}_{t}\rangle+\dot{\mu}_{t}JF(\mu_{t})\dot{\mu}_{t}+\dot{\varepsilon}_{t}(h(\mu_{t})-h(\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t}))) (39a)
\leq-\langle F_{\varepsilon_{t}}(\mu_{t}),\dot{\mu}_{t}\rangle+\dot{\mu}_{t}JF(\mu_{t})\dot{\mu}_{t}+\dot{\varepsilon}_{t}(h(\mu_{t})-h(\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t}))) (39b)
\leq-\langle F_{\varepsilon_{t}}(\mu_{t}),\dot{\mu}_{t}\rangle+\dot{\varepsilon}_{t}(h(\mu_{t})-h(\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t}))) (39c)
=-\langle F_{\varepsilon_{t}}(\mu_{t}),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})-\mu_{t}\rangle+\dot{\varepsilon}_{t}(h(\mu_{t})-h(\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t}))). (39d)

The first inequality is a consequence of Lemma A.1, which implies that \langle z(t),\dot{\mu}_{t}\rangle\leq\langle\widetilde{\nabla}h(\mu_{t}),\dot{\mu}_{t}\rangle. The second inequality is a consequence of \dot{\mu}_{t}JF(\mu_{t})\dot{\mu}_{t}\leq 0, as \mathcal{G} is monotone.

We now proceed to prove that \frac{d}{dt}(\operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})+\varepsilon_{t})<0.

  Case 1: If \operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})=0, then \mu_{t}=\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t}), i.e., \mu_{t}\in\operatorname{Eq}_{\varepsilon_{t}}(\mathcal{G}), and thus \dot{\mu}_{t}=0. Then (39a) implies that \frac{d}{dt}\operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})=0, and thus \frac{d}{dt}(\operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})+\varepsilon_{t})<0.

  Case 2: If \operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})>0, then, since h(\cdot)\in[0,1] and \dot{\varepsilon}_{t}<0, we have:

\dot{\varepsilon}_{t}(h(\mu_{t})-h(\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})))\leq-\dot{\varepsilon}_{t}, (40)

and from Lemma 2, (39d) and (40) we deduce that \frac{d}{dt}(\operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})+\varepsilon_{t})<0.

Consequently, \operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})+\varepsilon_{t} converges to some \delta\geq 0, and since \varepsilon_{t} decreases to zero we conclude that \operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})\to\delta. If \delta>0, define:

\Delta\coloneqq-\langle F_{\varepsilon_{t}}(\mu_{t}),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})-\mu_{t}\rangle=-\langle F(\mu_{t}),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})-\mu_{t}\rangle+\varepsilon_{t}\langle\widetilde{\nabla}h(\mu_{t}),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})-\mu_{t}\rangle. (41)

From (34) we further get:

\Delta=-\operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})+\varepsilon_{t}(h(\mu_{t})-h(\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})))+\varepsilon_{t}\langle\widetilde{\nabla}h(\mu_{t}),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})-\mu_{t}\rangle. (42)

As \operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})\to\delta and, by assumption, h(\cdot) and \widetilde{\nabla}h are bounded while \varepsilon_{t} decreases to 0, we deduce that for t large enough:

\Delta=-\langle F_{\varepsilon_{t}}(\mu_{t}),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})-\mu_{t}\rangle\leq-\delta/2. (43)

Consequently, for t large enough, we deduce from (39d), (40) and the last inequality that

\frac{d}{dt}(\operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})+\varepsilon_{t})\leq-\langle F_{\varepsilon_{t}}(\mu_{t}),\operatorname{\mathsf{BR}}_{\varepsilon_{t}}(\mu_{t})-\mu_{t}\rangle\leq-\delta/2, (44)

which is impossible because \operatorname{Gap}_{\varepsilon_{t}}(\mu_{t})+\varepsilon_{t}\geq 0. Hence \delta=0; by (35), \operatorname{Gap}(\mu_{t})\to 0 as well, and so \mu_{t} converges to \operatorname{Eq}(\mathcal{G}). ∎

4.4. Dual averaging dynamics with a constant learning rate

Before stating our main result, we establish some useful lemmas. Define the regret associated to a trajectory x_{t} and a reference point x\in\mathcal{X} as:

\operatorname{Reg}_{x}(t)=\int_{0}^{t}\langle F(x_{s}),x-x_{s}\rangle\,ds. (45)
Lemma 4.

Let x_{t} be a solution trajectory of (DAD) with \eta_{t}\equiv 1. Then, for all x\in\mathcal{X}:

\operatorname{Reg}_{x}(t)\leq\max h-\min h. (46)

This is a particular case of a more general bound due to Kwon & Mertikopoulos [26] (see Lemma 7 below). The more compact proof given here for this simpler bound follows Bravo & Mertikopoulos [7] and Sorin [52].

Proof.

Define, for x\in\mathcal{X} and y\in\mathbb{R}^{\mathcal{A}}, the Fenchel coupling

F_{h}(x,y)=h(x)+h^{\ast}(y)-\langle x,y\rangle. (47)

By the Fenchel-Young inequality, F_{h}(x,y)\geq 0. Recall from (8) and (20b) that x_{t}=\nabla h^{\ast}(y_{t})=Q_{h}(y_{t}) with y_{t}=\int_{0}^{t}F(x_{s})\,ds. Then, following Mertikopoulos & Zhou [35, Lemma A.2], a simple derivation gives, for all x\in\mathcal{X}:

\frac{d}{dt}F_{h}(x,y_{t})=\langle F(x_{t}),\nabla h^{\ast}(y_{t})-x\rangle=\langle F(x_{t}),x_{t}-x\rangle=-\frac{d}{dt}\operatorname{Reg}_{x}(t). (48)

Consequently, following the reasoning of Sorin [52], we get

\operatorname{Reg}_{x}(t)\leq F_{h}(x,y_{0})=F_{h}(x,0)=h(x)+h^{\ast}(0)\leq\max h+h^{\ast}(0)=\max h-\min h. ∎
Lemma 5.

If a solution trajectory x_{t}=x^{\ast} of (DAD) is stationary, then x^{\ast}\in\operatorname{Eq}(\mathcal{G}).

Proof.

From the previous lemma, for all t>0 and x\in\mathcal{X} one has

\langle F(x^{\ast}),x-x^{\ast}\rangle=\frac{\operatorname{Reg}_{x}(t)}{t}\leq\frac{\max h-\min h}{t}. (49)

Letting t\to\infty yields \langle F(x^{\ast}),x-x^{\ast}\rangle\leq 0 for all x\in\mathcal{X}, i.e., x^{\ast}\in\operatorname{Eq}(\mathcal{G}), as was to be shown. ∎

Lemma 6.

If h satisfies (StrCvx), then for all x\in\mathcal{X} and all sequences y_{n} in \mathbb{R}^{\mathcal{A}}:

F_{h}(x,y_{n})\to 0\implies Q_{h}(y_{n})\to x. (50)
Proof.

See Lemma A.2 in the appendix. ∎

We also need to assume the so-called "reciprocity" condition, which requires that for all x\in\mathcal{X} and all sequences y_{n} in \mathbb{R}^{\mathcal{A}}:

Q_{h}(y_{n})\to x\implies F_{h}(x,y_{n})\to 0. (Rec)

Taken together, (StrCvx) and (Rec) imply that the sequence x_{n}=Q_{h}(y_{n}) converges to x\in\mathcal{X} if and only if F_{h}(x,y_{n})\to 0. The terminology "reciprocity" is justified by the fact that, under (Rec), the topology induced by the level sets of the Fenchel coupling becomes equivalent to the ordinary topology on \mathcal{X}; for a precise statement, see Lemma A.2 in Appendix A. This is a subtle – but important – requirement, which is fortunately satisfied by all the standard regularization choices on \mathcal{X}, cf. [34, 35].
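For the entropic regularizer of Example 2.1, for instance, a standard computation identifies the Fenchel coupling (47) with the Kullback-Leibler divergence, F_{h}(x,y)=\operatorname{KL}(x\,\|\,\operatorname{\Lambda}(y)), which makes both the implication of Lemma 6 (via Pinsker's inequality) and (Rec) easy to check in this particular case. The short sketch below is only a numerical sanity check of this identity; the sampled points and names are illustrative.

```python
import numpy as np

# Sanity check: for h(x) = sum_a x_a log x_a, F_h(x, y) = KL(x || Lambda(y)).
def logit(v):
    z = np.exp(v - v.max())
    return z / z.sum()

def fenchel_coupling(x, y):
    h = np.sum(x * np.log(x))                 # h(x), with x assumed interior
    h_star = np.log(np.sum(np.exp(y)))        # h*(y) = log sum_a exp(y_a)
    return h + h_star - x @ y

def kl(x, q):
    return np.sum(x * np.log(x / q))

rng = np.random.default_rng(2)
x = rng.dirichlet(np.ones(5))
y = rng.standard_normal(5)
print(np.isclose(fenchel_coupling(x, y), kl(x, logit(y))))   # True
```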

We can now state our convergence results for (DAD) when the learning rate is constant.

Theorem 4.4.

Let x_{t} be a solution trajectory of (DAD) with constant learning rate \eta_{t}\equiv 1. Assume further that one of the following holds:

  (a) \mathcal{G} is potential and h^{\ast} is twice differentiable.

  (b) \mathcal{G} is strictly monotone, and h satisfies (StrCvx) and (Rec).

Then x_{t} converges to \operatorname{Eq}(\mathcal{G}).

Proof.

Again, we treat the potential and monotone cases separately.

Potential case. 

As yt=0tF(xs)𝑑sy_{t}=\int_{0}^{t}F(x_{s})\>ds and xt=Qh(yt)=h(yt)x_{t}=Q_{h}(y_{t})=\nabla h^{\ast}(y_{t}) we obtain that:

\dot{x}_{t}=\nabla^{2}h^{\ast}(y_{t})\,F(x_{t}). (51)

Since h^{\ast} is convex, \nabla^{2}h^{\ast} is positive semidefinite, and so:

ddtP(xt)=F(xt),x˙t=F(xt)2h(yt)F(xt)0,\frac{d}{dt}P(x_{t})=\langle F(x_{t}),\dot{x}_{t}\rangle=F(x_{t})\nabla^{2}h^{\ast}(y_{t})F(x_{t})\geq 0, (52)

the inequality being strict whenever \dot{x}_{t}\neq 0. Hence P is nondecreasing along the trajectory and, by standard Lyapunov arguments, x_{t} converges to the set of rest points of the dynamics; since every rest point of (3.2) belongs to \operatorname{Eq}(\mathcal{G}) by Lemma 5, x_{t} converges to \operatorname{Eq}(\mathcal{G}).

Monotone case. 

As \mathcal{G} is strictly monotone, \operatorname{Eq}(\mathcal{G}) is a singleton that coincides with the game's (necessarily unique) GESS x^{\ast}. The calculation in (48) gives:

ddtFh(x,yt)=F(xt),xtx0\frac{d}{dt}F_{h}(x^{\ast},y_{t})=\langle F(x_{t}),x_{t}-x^{\ast}\rangle\leq 0 (53)

Since x^{\ast} is a GESS, \frac{d}{dt}F_{h}(x^{\ast},y_{t})\leq 0, with equality if and only if x_{t}=Q_{h}(y_{t})=x^{\ast}. Suppose there were \delta>0 such that \langle F(x_{t}),x_{t}-x^{\ast}\rangle<-\delta<0 for all t large enough; then F_{h}(x^{\ast},y_{t}) would tend to -\infty, which is impossible because F_{h}(\cdot)\geq 0. Hence there is a sequence t_{k}\to\infty such that \langle F(x_{t_{k}}),x_{t_{k}}-x^{\ast}\rangle\to 0; since x^{\ast} is a GESS and \mathcal{X} is compact, this forces x_{t_{k}}=Q_{h}(y_{t_{k}})\to x^{\ast}. By the reciprocity condition (Rec), we deduce that F_{h}(x^{\ast},y_{t_{k}})\to 0. Since \frac{d}{dt}F_{h}(x^{\ast},y_{t})\leq 0 and F_{h}(\cdot)\geq 0, the function F_{h}(x^{\ast},y_{t}) converges, and by the previous argument its limit is zero. Using the strong convexity of h, Lemma 6 then implies that x_{t}=Q_{h}(y_{t}) converges to x^{\ast}. ∎

There are several things to note here. First, the proof of convergence to the unique equilibrium in strictly monotone games (part b of Theorem 4.4) extends to all games that admit a GESS. Second, the same proof implies that, in all games, ESSs (if they exist) are asymptotically stable under (3.2) with a constant learning rate (i.e., the dynamics converge to an ESS whenever they start sufficiently close to it). Third, Theorem 4.4 concerns the “day-to-day” evolution of (3.2), not the corresponding empirical mean. Last, the theorem does not apply to monotone games that are not strictly monotone.
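To make part (b) of Theorem 4.4 concrete, the following minimal numerical sketch integrates (3.2) with a forward-Euler scheme, a constant learning rate, and the entropic choice map, in a strictly monotone single-population game (a “good” rock–paper–scissors variant in which wins outweigh losses). The payoff matrix, step size, and horizon are illustrative assumptions and not part of the analysis; one should observe the distance to the unique equilibrium (and GESS) x^{\ast}=(1/3,1/3,1/3) shrinking toward zero.

# Illustrative sketch: forward-Euler discretization of the continuous-time
# dual averaging dynamics (constant learning rate, entropic choice map)
# in a strictly monotone "good" rock-paper-scissors game.
import numpy as np

A = np.array([[0., -1., 2.],
              [2., 0., -1.],
              [-1., 2., 0.]])      # wins (+2) outweigh losses (-1): strictly monotone

def Q(y):                          # entropic choice map (logit / softmax)
    z = np.exp(y - y.max())
    return z / z.sum()

x_star = np.ones(3) / 3            # unique equilibrium (and GESS) of this game
dt, T = 0.01, 200.0
y = np.zeros(3)                    # dual "score" variable, y_0 = 0
for _ in range(int(T / dt)):
    x = Q(y)
    y += dt * (A @ x)              # Euler step for dy/dt = F(x_t) = A x_t

print("final state:", Q(y))
print("distance to x*:", np.linalg.norm(Q(y) - x_star))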

To complete the picture, the next proposition shows the convergence of the empirical mean to the set of Nash equilibria in all monotone games.

Proposition 4.1.

Let xt=Qh(yt)x_{t}=Q_{h}(y_{t}) be a solution trajectory of (3.2) with constant learning rate ηt1\eta_{t}\equiv 1. If the underlying game is monotone, then μt=1t0txs𝑑s\mu_{t}=\frac{1}{t}\int_{0}^{t}x_{s}ds converges to Eq(𝒢)\operatorname{Eq}(\mathcal{G}).

Proof.

By Lemma 4 and monotonicity of the game, for all x𝒳x\in\mathcal{X}:

maxhminhRegx(t)=0tF(xs),xxs𝑑s0tF(x),xxs𝑑s\max h-\min h\geq\operatorname{Reg}_{x}(t)=\int_{0}^{t}\langle F(x_{s}),x-x_{s}\rangle\>ds\geq\int_{0}^{t}\langle F(x),x-x_{s}\rangle\>ds (54)

and hence

maxhminhtF(x),xμt.\frac{\max h-\min h}{t}\geq\langle F(x),x-\mu_{t}\rangle. (55)

Consequently, μt\mu_{t} converges to MVI(𝒢)\operatorname{MVI}(\mathcal{G}) which coincides with Eq(𝒢)\operatorname{Eq}(\mathcal{G}) as the game is monotone. ∎
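To contrast Proposition 4.1 with Theorem 4.4, one can rerun the sketch above with the standard zero-sum rock–paper–scissors matrix, which is monotone but not strictly monotone: the trajectory itself then keeps cycling around the equilibrium, while its time average approaches it, in line with Proposition 4.1. Again, all numerical choices (initialization, step size, horizon) are purely illustrative.

# Illustrative sketch: standard (zero-sum) rock-paper-scissors is monotone but
# not strictly monotone; the trajectory of (3.2) cycles, while its running
# time average approaches the equilibrium (1/3, 1/3, 1/3).
import numpy as np

A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])

def Q(y):                          # entropic choice map (softmax)
    z = np.exp(y - y.max())
    return z / z.sum()

x_star = np.ones(3) / 3
dt, T = 0.002, 200.0
y = np.array([1.0, 0.0, -1.0])     # start away from the equilibrium
avg = np.zeros(3)
n_steps = int(T / dt)
for _ in range(n_steps):
    x = Q(y)
    avg += x
    y += dt * (A @ x)
avg /= n_steps

print("distance of the last state to x*:", np.linalg.norm(Q(y) - x_star))
print("distance of the time average to x*:", np.linalg.norm(avg - x_star))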

4.5. Dual averaging dynamics with a variable learning rate

This subsection extends the results of the previous section on monotone games in two directions: we consider GESSs instead of (strictly) monotone games, and we treat dynamics with a variable learning rate. Other extensions are also possible, but we do not treat them here.

We begin with a technical lemma on the regret incurred by (3.2):

Lemma 7.

If xtx_{t} is a solution trajectory of (3.2) then the regret is bounded as follows:

Regx(t)maxhminhηt.\operatorname{Reg}_{x}(t)\leq\frac{\max h-\min h}{\eta_{t}}. (56)

Consequently, if tηtt\eta_{t}\to\infty and xt=xx_{t}=x^{\ast} for all t0t\geq 0, then xEq(𝒢)x^{\ast}\in\operatorname{Eq}(\mathcal{G}).

Proof.

The bound on the regret follows from [26, Theorem 4.1], and the implication follows as in Lemma 5 above. ∎

Lemma 7 allows us to extend Proposition 4.1 to variable learning rates.

Proposition 4.2.

Let xt=Qh(yt)x_{t}=Q_{h}(y_{t}) be a solution trajectory of (3.2). If the underlying game is monotone and tηtt\eta_{t}\to\infty then μt=1t0txs𝑑s\mu_{t}=\frac{1}{t}\int_{0}^{t}x_{s}\>ds converges to Eq(𝒢)\operatorname{Eq}(\mathcal{G}).
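For concreteness, learning rates of the form \eta_{t}=(1+t)^{-p} with p\in(0,1) satisfy all the requirements used in this subsection: they are non-increasing, t\eta_{t}=t/(1+t)^{p}\to\infty, and

\frac{\dot{\eta}_{t}}{\eta_{t}^{2}}=-p\,(1+t)^{p-1}\to 0\quad\text{as $t\to\infty$},

the last property being the one invoked in the proof of Theorem 4.5 below.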

The next result extends part (b) of Theorem 4.4 to variable learning rates; its proof is somewhat more involved.

Theorem 4.5.

Let xtx_{t} be a solution trajectory of (3.2) with a non-increasing learning rate. Suppose further that the underlying game 𝒢\mathcal{G} admits a GESS xx^{\ast}. If the regularizer satisfies (StrCvx) and (Rec), and the dynamics’ learning rate satisfies limttηt=\lim_{t\to\infty}t\eta_{t}=\infty, then xtx_{t} converges to xx^{\ast}.

Proof.

Kwon & Mertikopoulos [26, Eq. (29)] prove that

\frac{d}{dt}\frac{F_{h}(x^{\ast},\eta_{t}y_{t})}{\eta_{t}}\leq\langle F(x_{t}),x_{t}-x^{\ast}\rangle-\frac{\dot{\eta}_{t}}{\eta_{t}^{2}}(\max h-\min h). (57)

Let ε={y𝒜:Fh(x,y)ε}\mathcal{F}_{\varepsilon}=\{y\in\mathbb{R}^{\mathcal{A}}:F_{h}(x^{\ast},y)\leq\varepsilon\} denote the “Fenchel zone” of magnitude ε\varepsilon with respect to xx^{\ast}. Then it suffices to show that

For all ε>0\varepsilon>0, there exists T(ε)>0T(\varepsilon)>0 such that ηtytε\eta_{t}y_{t}\in\mathcal{F}_{\varepsilon} for all tT(ε)t\geq T(\varepsilon). (P)

Indeed, (P) means that F_{h}(x^{\ast},\eta_{t}y_{t})\to 0; by (StrCvx), Lemma 6 then shows that x_{t}=Q_{h}(\eta_{t}y_{t}) converges to x^{\ast}. The proof of (P) rests on the following two claims:

Claim 1. 

ηtytε\eta_{t}y_{t}\in\mathcal{F}_{\varepsilon} infinitely often.

Proof.

Assume, to the contrary, that \eta_{t}y_{t}\notin\mathcal{F}_{\varepsilon} for all t large enough. Then, by (Rec), there exists a neighborhood \mathcal{U}(\varepsilon) of x^{\ast} such that x_{t}\notin\mathcal{U}(\varepsilon), and hence, by the definition of a GESS, there is c(\varepsilon)>0 such that \langle F(x_{t}),x_{t}-x^{\ast}\rangle<-2c(\varepsilon). Moreover, since \dot{\eta}_{t}/\eta_{t}^{2}\to 0, there is T_{1}(\varepsilon) such that \frac{\dot{\eta}_{t}}{\eta_{t}^{2}}(\max h-\min h)<c(\varepsilon) for all t\geq T_{1}(\varepsilon). Consequently, by (57), \frac{d}{dt}\frac{F_{h}(x^{\ast},\eta_{t}y_{t})}{\eta_{t}}<-c(\varepsilon) for all t large enough, so \frac{F_{h}(x^{\ast},\eta_{t}y_{t})}{\eta_{t}} would tend to -\infty. Since F_{h}(x^{\ast},\eta_{t}y_{t})\geq 0, this is impossible. ∎

Claim 2. 

There exists T2(ε)T1(ε/2)T_{2}(\varepsilon)\geq T_{1}(\varepsilon/2) such that if tT2(ε)t\geq T_{2}(\varepsilon) and ηtytε\eta_{t}y_{t}\in\mathcal{F}_{\varepsilon} then there exists τ>0\tau>0 such that ηsysε\eta_{s}y_{s}\in\mathcal{F}_{\varepsilon} for all s[t,t+τ]s\in[t,t+\tau].

Proof.

We need to consider two cases separately:

  1. Case a).

    If \eta_{t}y_{t}\in\mathcal{F}_{\varepsilon/2}, then, by continuity, there is \tau>0 such that \eta_{s}y_{s}\in\mathcal{F}_{\varepsilon} for all s\in[t,t+\tau].

  2. Case b).

    If \eta_{t}y_{t}\in\mathcal{F}_{\varepsilon} but \eta_{t}y_{t}\notin\mathcal{F}_{\varepsilon/2}, then, arguing as in the proof of Claim 1 (with \varepsilon/2 in place of \varepsilon), we have \frac{d}{dt}\frac{F_{h}(x^{\ast},\eta_{t}y_{t})}{\eta_{t}}<-c(\varepsilon/2)<0 for t\geq T_{2}(\varepsilon). Thus there exists \tau>0 such that \frac{F_{h}(x^{\ast},\eta_{s}y_{s})}{\eta_{s}}\leq\frac{F_{h}(x^{\ast},\eta_{t}y_{t})}{\eta_{t}} for all s\in[t,t+\tau]. Since \eta_{t}y_{t}\in\mathcal{F}_{\varepsilon}, we have F_{h}(x^{\ast},\eta_{t}y_{t})\leq\varepsilon, and since \eta_{s}\leq\eta_{t}, we deduce that F_{h}(x^{\ast},\eta_{s}y_{s})\leq\varepsilon. Thus \eta_{s}y_{s}\in\mathcal{F}_{\varepsilon} for all s\in[t,t+\tau]. ∎

Combining Claims 1 and 2, there exists t\geq T_{2}(\varepsilon) such that \eta_{t}y_{t} belongs to \mathcal{F}_{\varepsilon} and remains in it for all subsequent times; this establishes (P) and completes the proof. ∎

5. Analysis and results in discrete time

In this section, we continue with the analysis of the discrete-time dynamics presented in Section 3.1, starting with fictitious play and its variants in Section 5.1 before moving on to the dynamics of dual averaging in Section 5.2.

5.1. Fictitious play and regularized fictitious play

The convergence of (3.1) and (3.1) can be obtained from the continuous-time analysis of Section 4 using the theory of stochastic approximation [2, 4, 5]. Informally, this theory links the asymptotic behavior of differential inclusions of the form

x˙tM(xt)\dot{x}_{t}\in M(x_{t}) (58)

to the limit sets of discrete-time processes satisfying the basic recursion

Xn+1Xnγn+1M(Xn),X_{n+1}-X_{n}\in\gamma_{n+1}M(X_{n}), (59)

where M:𝒳𝒜M\colon\mathcal{X}\rightrightarrows\mathbb{R}^{\mathcal{A}} is an upper semicontinuous, compact-convex-valued correspondence on 𝒳\mathcal{X}, and γn>0\gamma_{n}>0 is a vanishing step-size sequence.

In our specific context, the role of MM is to be played by the best-reply correspondence 𝖡𝖱\operatorname{\mathsf{BR}} for (3.1), and its regularized variant 𝖡𝖱εh\operatorname{\mathsf{BR}}_{\varepsilon h} for (3.1). Concretely, we have:

Proposition 5.1.

Let X¯n=(1/n)k=1nXk\bar{X}_{n}=(1/n)\sum_{k=1}^{n}X_{k}, n=1,2,n=1,2,\dotsc, be the empirical frequency of play under a sequence of population states Xn𝒳X_{n}\in\mathcal{X}. Then:

  1. (i)

    If X_{n} is generated by the fictitious play recursion (3.1), then \bar{X}_{n} is an asymptotic pseudotrajectory of the best-reply dynamics (3.2).

  2. (ii)

    If X_{n} is generated by the regularized recursion (3.1), then \bar{X}_{n} is an asymptotic pseudotrajectory of the regularized best-reply dynamics (3.2).

Informally, the notion of an asymptotic pseudotrajectory (APT) means that \bar{X}_{n} asymptotically tracks the pseudo-flow of (3.2)/(3.2) with arbitrary accuracy over windows of arbitrary length; for a precise definition, see Benaïm et al. [4, Definitions I and 4.1]. The proof of Proposition 5.1 follows immediately from the general theory of Benaïm et al. [4, Propositions 1.3, 1.4, and Theorem 4.2], so we do not present it here.

In view of this “asymptotic approximation” property, it is plausible to expect that X¯n\bar{X}_{n} exhibits the same asymptotic behavior as the corresponding continuous-time dynamics. Our next result makes this intuition precise in two classes of games:

  1. (1)

    Monotone games, i.e., games that satisfy (Mon).

  2. (2)

    Potential games for which the \varepsilon-regularized potential P_{\varepsilon}=P-\varepsilon h, \varepsilon\geq 0, satisfies the Sard-type condition:

    Pε(Eqε(𝒢)) has empty interior.\text{$P_{\varepsilon}(\operatorname{Eq}_{\varepsilon}(\mathcal{G}))$ has empty interior}. (Sε)

Our main results for (3.1) and (3.1) may then be stated as follows:

Theorem 5.1.

Suppose that 𝒢\mathcal{G} satisfies (Mon) or (Sε) for ε=0\varepsilon=0. Then the empirical frequency of play X¯n\bar{X}_{n} under (3.1) converges to Eq(𝒢)\operatorname{Eq}(\mathcal{G}).

Theorem 5.2.

Suppose (Diff) holds and 𝒢\mathcal{G} satisfies (Mon) or (Sε) for some ε>0\varepsilon>0. Then the empirical frequency of play X¯n\bar{X}_{n} under (3.1) converges to Eqε(𝒢)\operatorname{Eq}_{\varepsilon}(\mathcal{G}).

Sketch of proof.

The proof of Theorems 5.1 and 5.2 follows a similar technical path, starting with the fact that \bar{X}_{n} is an APT of (3.2) / (3.2) respectively (cf. Proposition 5.1 above). In both cases, Theorem 4.3 of Benaïm et al. [4] shows that the accumulation points of an APT lie in an internally chain transitive set of the underlying continuous-time dynamics (recall that an internally chain transitive set is a compact invariant set that contains no proper attractors [10]). The claims of Theorem 5.1 may then be proved as follows:

  1. (1)

    For games satisfying (Mon), invoke Theorem 4.1 from Section 4 and Theorem 3.23 and Proposition 3.25 of Benaïm et al. [4].

  2. (2)

    For games satisfying (Sε) with ε=0\varepsilon=0, invoke Theorem 4.1 from Section 4 and Proposition 3.27 of Benaïm et al. [4].

Finally, for ε>0\varepsilon>0, the proof of Theorem 5.2 is entirely analogous, and simply invokes Theorem 4.2 in lieu of Theorem 4.1 in the above chain of implications. ∎
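As a purely numerical illustration of Theorem 5.1, the following sketch runs the fictitious play recursion X_{n+1}\in\operatorname{\mathsf{BR}}(\bar{X}_{n}), \bar{X}_{n+1}=\bar{X}_{n}+(X_{n+1}-\bar{X}_{n})/(n+1) in the strictly monotone rock–paper–scissors variant used in the sketch of Section 4.4; the payoff matrix and horizon are illustrative assumptions, and the empirical frequency of play should be seen to approach the game's unique equilibrium.

# Illustrative sketch: fictitious play in a strictly monotone RPS variant.
# Each period the population plays a pure best reply to the empirical
# frequency of past play; Theorem 5.1 predicts that the empirical
# frequencies converge to the unique equilibrium (1/3, 1/3, 1/3).
import numpy as np

A = np.array([[0., -1., 2.],
              [2., 0., -1.],
              [-1., 2., 0.]])

def best_reply(x):                 # a pure maximizer of <F(x), .> with F(x) = A x
    e = np.zeros(3)
    e[np.argmax(A @ x)] = 1.0
    return e

x_bar = np.ones(3) / 3             # initial empirical frequency of play
for n in range(1, 200001):
    X_next = best_reply(x_bar)
    x_bar += (X_next - x_bar) / (n + 1)

print("empirical frequency:", x_bar)
print("distance to equilibrium:", np.linalg.norm(x_bar - np.ones(3) / 3))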

5.2. Dual averaging

We now proceed to state and prove our main convergence result for the recursion (3.1). To do so, we will require the “reciprocity” condition (Rec) that was introduced for the study of the continuous-time dynamics (3.2), namely

Qh(yn)xFh(x,yn)0Q_{h}(y_{n})\to x\implies F_{h}(x,y_{n})\to 0 (Rec)

for all x𝒳x\in\mathcal{X} and all sequences yny_{n} in 𝒜\mathbb{R}^{\mathcal{A}}. We then have:

Theorem 5.3.

Suppose that (3.1) is run with a regularizer satisfying (StrCvx) and (Rec), and a vanishing non-increasing learning rate ηn0\eta_{n}\searrow 0 such that

limn(1ηn1ηn1)=0.\lim_{n\to\infty}\left(\frac{1}{\eta_{n}}-\frac{1}{\eta_{n-1}}\right)=0. (60)

Then:

  • If xx^{\ast} is a GESS of 𝒢\mathcal{G}, the sequence of population states XnX_{n} converges to xx^{\ast}.

  • If xx^{\ast} is an ESS of 𝒢\mathcal{G}, the same conclusion holds provided that max{ηn,ηn1ηn11}\max\{\eta_{n},\eta_{n}^{-1}-\eta_{n-1}^{-1}\} is small enough and (3.1) is initialized sufficiently close to xx^{\ast}.

Corollary 5.1.

Suppose that 𝒢\mathcal{G} is strictly monotone and (3.1) is run with a regularizer satisfying (StrCvx) and (Rec), and a learning rate of the form ηn1/np\eta_{n}\propto 1/n^{p} for p(0,1)p\in(0,1). Then the sequence of population states induced by (3.1) converges to the (necessarily unique) equilibrium of 𝒢\mathcal{G}.

Our proof relies on the use of a suitable “energy inequality”, provided here by a deflated variant of the Fenchel coupling:

Lemma 8.

Fix a population state x𝒳x\in\mathcal{X}, and let

En1ηnFh(x,ηnSn)=1ηn[h(x)h(Xn+1)+ηnSn,Xn+1x],E_{n}\coloneqq\frac{1}{\eta_{n}}F_{h}(x,\eta_{n}S_{n})=\frac{1}{\eta_{n}}[h(x)-h(X_{n+1})+\eta_{n}\langle S_{n},X_{n+1}-x\rangle], (61)

with SnS_{n} as in (3.1). Then, for all n=1,2,n=1,2,\dotsc, we have:

EnEn1+F(Xn),Xnx+[h(x)minh]δn+Fh(Xn,ηn1Sn)ηn1,E_{n}\leq E_{n-1}+\langle F(X_{n}),X_{n}-x\rangle+[h(x)-\min h]\,\delta_{n}+\frac{F_{h}(X_{n},\eta_{n-1}S_{n})}{\eta_{n-1}}, (62)

where δn=1/ηn1/ηn1\delta_{n}=1/\eta_{n}-1/\eta_{n-1}. If, in addition, hh satisfies (StrCvx), we have

EnEn1+F(Xn),Xnx+[h(x)minh]δn+ηn12KF(Xn)2.E_{n}\leq E_{n-1}+\langle F(X_{n}),X_{n}-x\rangle+[h(x)-\min h]\,\delta_{n}+\frac{\eta_{n-1}}{2K}\lVert F(X_{n})\rVert_{\ast}^{2}. (63)

Energy inequalities of the form (62) are standard in the analysis of dual averaging schemes, and they go back at least as far as Nesterov [39, 40] (who introduced the method). The specific formulation above follows Héliou et al. [17, Lemma C.1], so we omit the proof; for a range of similar computations, see Shalev-Shwartz [48, Lemma 2.20], Kwon [25, p. 42], and Sorin [52].

Now, to move forward with the proof of Theorem 5.3, we will require two complementary arguments. The first is a stability result: we show below that, if XnX_{n} is sufficiently close to an evolutionarily stable state xx^{\ast} and the method’s learning rate is “small enough”, then Xn+1X_{n+1} remains close to xx^{\ast}. To state this precisely, let ε={y𝒜:Fh(x,y)ε}\mathcal{F}_{\varepsilon}=\{y\in\mathbb{R}^{\mathcal{A}}:F_{h}(x^{\ast},y)\leq\varepsilon\} denote the “Fenchel zone” of magnitude ε\varepsilon with respect to xx^{\ast}, and let 𝒟ε=Qh(ε)={x=Qh(y):yε}\mathcal{D}_{\varepsilon}=Q_{h}(\mathcal{F}_{\varepsilon})=\{x=Q_{h}(y):y\in\mathcal{F}_{\varepsilon}\} denote the image of ε\mathcal{F}_{\varepsilon} under QhQ_{h}. We then have:

Lemma 9.

Let xESS(𝒢)x^{\ast}\in\operatorname{ESS}(\mathcal{G}) and assume that (StrCvx) and (Rec) hold. Then, for all sufficiently small ε>0\varepsilon>0, we have the following implication (valid for all n=1,2,n=1,2,\dotsc): if Xn𝒟εX_{n}\in\mathcal{D}_{\varepsilon} and max{ηn,δn}\max\{\eta_{n},\delta_{n}\} is small enough, we also have Xn+1𝒟εX_{n+1}\in\mathcal{D}_{\varepsilon}.

The second component of the proof of Theorem 5.3 is a subsequence extraction argument: if the iterates of (3.1) lie in a neighborhood of xx^{\ast} where (ESS) holds, there exists a subsequence of XnX_{n} that converges to xx^{\ast}.

Lemma 10.

Let xESS(𝒢)x^{\ast}\in\operatorname{ESS}(\mathcal{G}), let 𝒰\mathcal{U} be the neighborhood of validity of (ESS), and assume further that (StrCvx) and (60) hold. If Xn𝒰X_{n}\in\mathcal{U} for all sufficiently large nn, we have lim infnXnx=0\liminf_{n\to\infty}\lVert X_{n}-x^{\ast}\rVert=0.

We prove these two ancillary results below.

Proof of Lemma 9.

Let 𝒰\mathcal{U} be the neighborhood whose existence is guaranteed by (ESS), i.e., F(x),xx<0\langle F(x),x-x^{\ast}\rangle<0 for all x𝒰x\in\mathcal{U}, xxx\neq x^{\ast}. By the relation between the Fenchel coupling and the ordinary norm topology on 𝒳\mathcal{X} (cf. Lemma A.2 in Appendix A), it follows that 𝒟ε𝒰\mathcal{D}_{\varepsilon}\subseteq\mathcal{U} if ε\varepsilon is taken small enough; we will assume this to be the case throughout the rest of this proof. Moreover, by (Rec) and the continuity of uu, there exists some cc(ε)c\equiv c(\varepsilon) such that F(x),xxc<0\langle F(x),x-x^{\ast}\rangle\leq-c<0 for all x𝒟ε𝒟ε/2x\in\mathcal{D}_{\varepsilon}\setminus\mathcal{D}_{\varepsilon/2}.

With all this in hand, assume that Xn𝒟εX_{n}\in\mathcal{D}_{\varepsilon}; we will show that Xn+1𝒟εX_{n+1}\in\mathcal{D}_{\varepsilon} provided that max{ηn,δn}\max\{\eta_{n},\delta_{n}\} is small enough. To do so, let Fn=Fh(x,ηnSn)F_{n}=F_{h}(x^{\ast},\eta_{n}S_{n}); then, by substituting xxx\leftarrow x^{\ast} in the template inequality (63) of Lemma 8, we obtain:

  1. Case 1.

    If Xn𝒟ε𝒟ε/2X_{n}\in\mathcal{D}_{\varepsilon}\setminus\mathcal{D}_{\varepsilon/2}, then

    F_{n}\leq F_{n-1}-\eta_{n}\left[c-\delta_{n}H-\eta_{n-1}\frac{G^{2}}{2K}\right], (64)

    where H=\max h-\min h and G=\max_{x\in\mathcal{X}}\lVert F(x)\rVert_{\ast}. Hence, if \delta_{n}\leq c/(2H) and \eta_{n-1}\leq cK/G^{2}, we conclude that F_{n}\leq F_{n-1}.

  2. Case 2.

    If Xn𝒟ε/2X_{n}\in\mathcal{D}_{\varepsilon/2}, then

    F_{n}\leq\frac{\varepsilon}{2}+\delta_{n}H+\eta_{n-1}\frac{G^{2}}{2K}, (65)

    so F_{n}\leq\varepsilon provided that \delta_{n}<\varepsilon/(4H) and \eta_{n-1}\leq\varepsilon K/(2G^{2}).

In both cases, we conclude that Xn+1=Qh(ηnSn)𝒟εX_{n+1}=Q_{h}(\eta_{n}S_{n})\in\mathcal{D}_{\varepsilon}, as claimed. ∎

Proof of Lemma 10.

Assume to the contrary that lim infnXnx>0\liminf_{n\to\infty}\lVert X_{n}-x^{\ast}\rVert>0. Then, by assumption, there exist ε>0\varepsilon>0 and n0n0(ε)n_{0}\equiv n_{0}(\varepsilon) such that Xn𝒰X_{n}\in\mathcal{U} and Xnxε\lVert X_{n}-x^{\ast}\rVert\geq\varepsilon for all nn0n\geq n_{0}. Since xESS(𝒢)x^{\ast}\in\operatorname{ESS}(\mathcal{G}), there exists a positive constant c>0c>0 such that F(Xn),Xnxc<0\langle F(X_{n}),X_{n}-x^{\ast}\rangle\leq-c<0 for all nn0n\geq n_{0}. With this in mind, substituting xxx\leftarrow x^{\ast} in the template inequality (63) and telescoping yields

E_{n} \leq E_{n_{0}}-c(n-n_{0})+\left(\frac{1}{\eta_{n}}-\frac{1}{\eta_{n_{0}}}\right)H+\frac{G^{2}}{2K}\sum_{k=n_{0}}^{n}\eta_{k}
\leq E_{n_{0}}+cn_{0}-\frac{H}{\eta_{n_{0}}}-n\left[c-\frac{H}{n\eta_{n}}-\frac{G^{2}}{2K}\cdot\frac{1}{n}\sum_{k=n_{0}}^{n}\eta_{k}\right]. (66)

To proceed, note that (60) gives

\lim_{n\to\infty}\frac{1/\eta_{n}-1/\eta_{n-1}}{n-(n-1)}=\lim_{n\to\infty}\left(\frac{1}{\eta_{n}}-\frac{1}{\eta_{n-1}}\right)=\lim_{n\to\infty}\delta_{n}=0, (67)

so \lim_{n\to\infty}1/(n\eta_{n})=0 by the Stolz–Cesàro theorem. Moreover, since \eta_{n}\to 0, the Cesàro averages (1/n)\sum_{k=n_{0}}^{n}\eta_{k} also vanish as n\to\infty. Hence, by letting n\to\infty in (66), we get \lim_{n\to\infty}E_{n}=-\infty, a contradiction which completes our proof. ∎

The proof of our main convergence results follows by a tandem application of the above.

Proof of Theorem 5.3.

We consider the two cases separately.

Case 1: xGESS(𝒢)x^{\ast}\in\operatorname{GESS}(\mathcal{G}). 

Let ε>0\varepsilon>0 be arbitrary. By (60), it follows that limnηn=limnδn=0\lim_{n\to\infty}\eta_{n}=\lim_{n\to\infty}\delta_{n}=0, and, by Lemma 10, there exists a subsequence XnkX_{n_{k}} of XnX_{n} converging to xx^{\ast}. This means that, in the long run, ηn\eta_{n} and δn\delta_{n} become arbitrarily small, and XnX_{n} comes arbitrarily close to xx^{\ast}. Hence, by applying Lemma 9 inductively, we conclude that Xn𝒟εX_{n}\in\mathcal{D}_{\varepsilon} for all sufficiently large nn. Since ε>0\varepsilon>0 has been chosen arbitrarily, this implies that XnX_{n} converges to xx^{\ast}, as claimed.

Case 2: xESS(𝒢)x^{\ast}\in\operatorname{ESS}(\mathcal{G}). 

Let 𝒰\mathcal{U} be the neighborhood whose existence is guaranteed by the evolutionary stability condition (3) for xx^{\ast}. Then, by Lemma 9, if (3.1) is initialized sufficiently close to xx^{\ast} and max{ηn,δn}\max\{\eta_{n},\delta_{n}\} is sufficiently small, we will have Xn𝒰X_{n}\in\mathcal{U} for all nn. Hence, by Lemma 10, we conclude that lim infnXnx=0\liminf_{n\to\infty}\lVert X_{n}-x^{\ast}\rVert=0. Our proof then follows by the same inductive application of Lemma 9 as in the case of a GESS above. ∎
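As a purely numerical illustration of Theorem 5.3 and Corollary 5.1, the sketch below runs the dual averaging recursion X_{n+1}=Q_{h}(\eta_{n}S_{n}) with the entropic choice map, cumulative scores S_{n}=S_{n-1}+F(X_{n}), and \eta_{n}=1/\sqrt{n}, in the strictly monotone rock–paper–scissors variant used earlier; the matrix and horizon are illustrative assumptions, and the population state should be seen to approach the unique equilibrium.

# Illustrative sketch: discrete dual averaging (entropic choice map,
# learning rate eta_n = n^{-1/2}) in a strictly monotone RPS variant.
# Corollary 5.1 predicts convergence of X_n to the unique equilibrium.
import numpy as np

A = np.array([[0., -1., 2.],
              [2., 0., -1.],
              [-1., 2., 0.]])

def Q(y):                          # entropic choice map (softmax)
    z = np.exp(y - y.max())
    return z / z.sum()

S = np.zeros(3)                    # cumulative payoff ("score") vector
X = np.ones(3) / 3                 # initial population state
for n in range(1, 100001):
    S += A @ X                     # S_n = S_{n-1} + F(X_n) with F(X) = A X
    X = Q(S / np.sqrt(n))          # X_{n+1} = Q_h(eta_n S_n), eta_n = n^{-1/2}

print("final state:", X)
print("distance to equilibrium:", np.linalg.norm(X - np.ones(3) / 3))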

Time averages under dual averaging. 

We close this section with a result on the empirical frequency of play under (3.1). Indeed, telescoping (63) readily yields the bound:

k=1nF(Xk),xXkmaxhminhηn+12Kk=1nηk1F(Xk)2.\sum_{k=1}^{n}\langle F(X_{k}),x-X_{k}\rangle\leq\frac{\max h-\min h}{\eta_{n}}+\frac{1}{2K}\sum_{k=1}^{n}\eta_{k-1}\lVert F(X_{k})\rVert_{\ast}^{2}. (68)

This regret-type bound echoes standard results in the literature – see e.g., Nesterov [40, Theorem 1] – and leads to the following time-average convergence result:

Proposition 5.2.

Suppose that (3.1) is run with a regularizer satisfying (StrCvx) and a learning rate of the form ηn1/n\eta_{n}\propto 1/\sqrt{n}. If 𝒢\mathcal{G} is monotone, the empirical frequency of play X¯n\bar{X}_{n} under (3.1) converges to Eq(𝒢)\operatorname{Eq}(\mathcal{G}).

The proof of Proposition 5.2 mimics that of Proposition 4.1 so we omit it. We only note that, in contrast to Theorem 5.3, Proposition 5.2 does not require strict monotonicity or reciprocity; however, it concerns the cruder, average frequency of play, so it provides a considerably weaker guarantee in this regard.
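To see why the learning rate \eta_{n}\propto 1/\sqrt{n} is the natural choice in Proposition 5.2, take \eta_{n}=1/\sqrt{n} for n\geq 1 (and \eta_{0}=1) and write H=\max h-\min h, G=\max_{x\in\mathcal{X}}\lVert F(x)\rVert_{\ast}. Then the right-hand side of (68) is bounded by

H\sqrt{n}+\frac{G^{2}}{2K}\Big(1+\sum_{k=1}^{n-1}k^{-1/2}\Big)\leq H\sqrt{n}+\frac{G^{2}}{2K}\big(2\sqrt{n}+1\big)=O(\sqrt{n}),

so the average regret (1/n)\sum_{k=1}^{n}\langle F(X_{k}),x-X_{k}\rangle is O(n^{-1/2}) and vanishes as n\to\infty; the argument of Proposition 4.1 then applies, mutatis mutandis, to the empirical mean \bar{X}_{n}.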

6. Extensions and concluding remarks

In this concluding section, we provide a series of extensions and remarks that would have otherwise disrupted the flow of the rest of our paper.

6.1. Positive correlation

Sandholm [44] introduced the so-called “positive correlation” condition

x˙t0F(xt),x˙t>0.\dot{x}_{t}\neq 0\implies\langle F(x_{t}),\dot{x}_{t}\rangle>0. (PC)

In the case of potential games, (PC) implies that

ddtP(xt)=F(xt),x˙t>0\frac{d}{dt}P(x_{t})=\langle F(x_{t}),\dot{x}_{t}\rangle>0 (69)

whenever x˙t0\dot{x}_{t}\neq 0. This in turn implies the convergence of the trajectories to the set of rest points of the dynamics [45, Theorem 7.1.2]. Some of our results in Section 4 for potential games can also be obtained by establishing (PC) for the dynamics under study.

6.2. No-regret dynamics

Our second remark concerns continuous- or discrete-time processes for which the average regret vanishes, i.e., for all x𝒳x\in\mathcal{X} we have:

lim supt1t0tF(xs),xxs𝑑s\displaystyle\limsup_{t\to\infty}\frac{1}{t}\int_{0}^{t}\langle F(x_{s}),x-x_{s}\rangle\>ds =0\displaystyle=0 in continuous time (70a)
lim supn1nk=1nF(Xk),xXk\displaystyle\limsup_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}\langle F(X_{k}),x-X_{k}\rangle =0\displaystyle=0 in discrete time (70b)

As we saw in Propositions 4.1 and 5.2, the dynamics of dual averaging have the no-regret property (70) in both continuous and discrete time. Looking more closely at the proofs of Propositions 4.1 and 5.2, we conclude that, in monotone games, the empirical frequency of play under any dynamical process that satisfies (70) converges to the game’s set of equilibria. This property forms the basis of many algorithms and dynamics designed to solve variational inequalities and equilibrium problems [38, 40, 24]; the above provides a game-theoretic interpretation of this property in the context of population games.

6.3. Extensions

We also note that our results can be extended to the following settings:

  1. (1)

    If \mathcal{X} is a convex compact subset of \mathbb{R}^{\mathcal{A}} and F\colon\mathcal{X}\to\mathbb{R}^{\mathcal{A}} is a vector field on \mathcal{X}, the same convergence results to solutions of (SVI) and (MVI) continue to hold; however, the population game interpretation is lost.

  2. (2)

    This interpretation is recovered in multi-population games: if there are several player populations indexed by i=1,,Ni=1,\dotsc,N, and if 𝒜i\mathcal{A}_{i} denotes the set of pure strategies of the ii-th population and uiα:i=1NΔ(𝒜i)u_{i\alpha}\colon\prod_{i=1}^{N}\operatorname{\operatorname{\Delta}}(\mathcal{A}_{i})\to\mathbb{R} is the payoff function of α\alpha-strategists in the ii-th population, it suffices to set 𝒳=iΔ(𝒜i)\mathcal{X}=\prod_{i}\operatorname{\operatorname{\Delta}}(\mathcal{A}_{i}) and F(x)=(uiαi(x))αi𝒜i,iF(x)=(u_{i\alpha_{i}}(x))_{\alpha_{i}\in\mathcal{A}_{i},i\in\mathcal{I}}.

  3. (3)

    Our theory can also be extended to normal form games with a finite number of players and smooth concave payoff functions gig_{i}, i=1,,Ni=1,\dotsc,N. In this case, the components of the corresponding vector field are Giα(x)=gixiαG_{i\alpha}(x)=\frac{\partial g_{i}}{\partial x_{i\alpha}}, and the notion of evolutionary stability boils down to that of variational stability; for a detailed treatment, see Sorin & Wan [53] and Mertikopoulos & Zhou [35].

6.4. Links with variational inequalities and further extensions

Taking a more general point of view that focuses on the vector field FF, we can also consider the following framework that brings to the forefront the associated variational inequalities. Let 𝒳i\mathcal{X}_{i}, i=1,,Ni=1,\dotsc,N be a convex compact subset of a topological vector space 𝒱i\mathcal{V}_{i} with dual 𝒱i\mathcal{V}^{\ast}_{i}, write 𝒱=i𝒱i\mathcal{V}=\prod_{i}\mathcal{V}_{i} and 𝒳=i𝒳i\mathcal{X}=\prod_{i}\mathcal{X}_{i}, and let Fi:𝒳𝒱iF_{i}\colon\mathcal{X}\to\mathcal{V}^{\ast}_{i} be a collection of continuous maps, i=1,,Ni=1,\dotsc,N. Finally, set F(x)=(F1(x),,FN(x))F(x)=(F_{1}(x),\dotsc,F_{N}(x)) and, as usual, write

F(x),z=i=1NFi(x),zi\langle F(x),z\rangle=\sum_{i=1}^{N}\langle F_{i}(x),z_{i}\rangle (71)

for the dual pairing between F(x)𝒱=i𝒱iF(x)\in\mathcal{V}^{\ast}=\prod_{i}\mathcal{V}^{\ast}_{i} and z=(z1,,zN)𝒱i𝒱iz=(z_{1},\dotsc,z_{N})\in\mathcal{V}\coloneqq\prod_{i}\mathcal{V}_{i}.

We may then define SVI(𝒳,F)\operatorname{SVI}(\mathcal{X},F) as the set of x𝒳x^{\ast}\in\mathcal{X} such that

F(x),xx0,for all x𝒳\langle F(x^{\ast}),x-x^{\ast}\rangle\leq 0,\quad\text{for all $x\in\mathcal{X}$} (72)

and MVI(𝒳,F)\operatorname{MVI}(\mathcal{X},F) as the set of x𝒳x^{\ast}\in\mathcal{X} such that

F(x),xx0for all x𝒳.\langle F(x),x-x^{\ast}\rangle\leq 0\quad\text{for all $x\in\mathcal{X}$}. (73)

The counterparts for ESS\operatorname{ESS} and GESS\operatorname{GESS} may also be defined similarly (though the link with population games is no longer present).

Example 6.1.

Consider a strategic game with NN players, compact metric strategy spaces 𝒜id\mathcal{A}_{i}\subseteq\mathbb{R}^{d} and continuous payoff functions ui:𝒜u_{i}\colon\mathcal{A}\to\mathbb{R}, i=1,,Ni=1,\dotsc,N, with 𝒜j𝒜j\mathcal{A}\coloneqq\prod_{j}\mathcal{A}_{j}. Introduce the mixed extension of the game as usual: for a probability distribution (mixed strategy profile) χ=(χ1,,χN)𝒳\chi=(\chi_{1},\dotsc,\chi_{N})\in\mathcal{X} with χi𝒳iΔ(𝒜i)\chi_{i}\in\mathcal{X}_{i}\coloneqq\operatorname{\operatorname{\Delta}}(\mathcal{A}_{i}), let uiu_{i} be extended as

ui(χ)=𝒜ui(α1,,αN)𝑑χ1(α1)𝑑χN(αN).u_{i}(\chi)=\int_{\mathcal{A}}u_{i}(\alpha_{1},\dotsc,\alpha_{N})\,\>d\chi_{1}(\alpha_{1})\dotsm\>d\chi_{N}(\alpha_{N}). (74)

Then, introduce F_{i}(\chi)=u_{i}(\cdot,\chi_{-i}) so that

\langle F_{i}(\chi),\chi^{\prime}_{i}\rangle=u_{i}(\chi^{\prime}_{i};\chi_{-i}) (75)

for all χi𝒳i=Δ(𝒜i)\chi^{\prime}_{i}\in\mathcal{X}_{i}=\operatorname{\operatorname{\Delta}}(\mathcal{A}_{i}). In this way, the equilibrium condition for χ𝒳\chi^{\ast}\in\mathcal{X} becomes

F(χ),χχ=i=1NFi(χ),χiχi0for all χ𝒳,\langle F(\chi^{\ast}),\chi-\chi^{\ast}\rangle=\sum_{i=1}^{N}\langle F_{i}(\chi^{\ast}),\chi_{i}-\chi^{\ast}_{i}\rangle\leq 0\quad\text{for all $\chi\in\mathcal{X}$}, (76)

which is again an instance of SVI(𝒳,F)\operatorname{SVI}(\mathcal{X},F). \lozenge

In this setting, FF is monotone (dissipative) if

F(x)F(x),xx0for all x,x𝒳\langle F(x)-F(x^{\prime}),x-x^{\prime}\rangle\leq 0\quad\text{for all $x,x^{\prime}\in\mathcal{X}$} (77)

As in the case of Section 2, we have SVI(𝒳,F)=MVI(𝒳,F)\operatorname{SVI}(\mathcal{X},F)=\operatorname{MVI}(\mathcal{X},F) if FF is monotone; furthermore, mutatis mutandis, all the properties presented in Section 2.3 extend to this general setup.

The best-response map associated to FF is defined again as

𝖡𝖱(x)=argmaxx𝒳F(x),x.\operatorname{\mathsf{BR}}(x)=\operatorname*{arg\,max}_{x^{\prime}\in\mathcal{X}}\langle F(x),x^{\prime}\rangle. (78)

In addition to monotonicity, we will also now make the “unique best response” assumption

𝖡𝖱(x)\operatorname{\mathsf{BR}}(x) is a singleton for all x𝒳x\in\mathcal{X}. (U)
Proposition 6.1.

If FF is monotone and (U) holds, the set EMVI(𝒳,F)=SVI(𝒳,F)E\coloneqq\operatorname{MVI}(\mathcal{X},F)=\operatorname{SVI}(\mathcal{X},F) is a singleton.

Proof.

If x,x^{\prime}\in E, we have \langle F(x),x^{\prime}-x\rangle\leq 0 and \langle F(x^{\prime}),x-x^{\prime}\rangle\leq 0, so

\langle F(x^{\prime})-F(x),x^{\prime}-x\rangle\geq 0. (79)

By monotonicity, this forces \langle F(x^{\prime})-F(x),x^{\prime}-x\rangle=0, and hence, since both \langle F(x),x^{\prime}-x\rangle and \langle F(x^{\prime}),x-x^{\prime}\rangle are nonpositive and sum to zero, \langle F(x^{\prime}),x-x^{\prime}\rangle=0. Thus x also maximizes \langle F(x^{\prime}),\cdot\rangle over \mathcal{X}, i.e., x\in\operatorname{\mathsf{BR}}(x^{\prime}); since x^{\prime}\in\operatorname{\mathsf{BR}}(x^{\prime}) as well, (U) yields x=x^{\prime}. ∎

The regularized best response map is likewise defined as

𝖡𝖱εh(x)=argmaxx𝒳{F(x),xεh(x)}\operatorname{\mathsf{BR}}_{\varepsilon h}(x)=\operatorname*{arg\,max}_{x^{\prime}\in\mathcal{X}}\{\langle F(x),x^{\prime}\rangle-\varepsilon h(x^{\prime})\} (80)

where h\colon\mathcal{X}\to\mathbb{R} satisfies the same axioms as in Definition 2.1. Note that \operatorname{\mathsf{BR}}_{\varepsilon h} satisfies (U) by construction.

Now, using these best-response maps, we may extend the definition of the dynamics (3.1), (3.1) and (3.1) to the more general setup considered here in the obvious way. We then have:

Proposition 6.2.

Suppose that (U) holds and let XnX_{n}, n=1,2,n=1,2,\dotsc, be a sequence of states generated by (3.1). If the accumulation points of X¯n\bar{X}_{n} lie in SVI(𝒳,F)\operatorname{SVI}(\mathcal{X},F), the same holds for the accumulation points of XnX_{n}.

Proof.

Let xx^{\ast} be an accumulation point of Xn+1=𝖡𝖱(X¯n)X_{n+1}=\operatorname{\mathsf{BR}}(\bar{X}_{n}), and let Xnk+1X_{n_{k}+1} be a subsequence converging to xx^{\ast}. By descending to a further subsequence if necessary, we may assume without loss of generality that X¯nk\bar{X}_{n_{k}} converges to some limit point x¯\bar{x}. Furthermore, by the definition of (3.1), we have:

F(X¯nk),Xnk+1x=F(X¯nk),𝖡𝖱(X¯nk)x0for all x𝒳.\langle F(\bar{X}_{n_{k}}),X_{n_{k}+1}-x\rangle=\langle F(\bar{X}_{n_{k}}),\operatorname{\mathsf{BR}}(\bar{X}_{n_{k}})-x\rangle\geq 0\quad\text{for all $x\in\mathcal{X}$}. (81)

Hence, taking the limit kk\to\infty, we get

F(x¯),xx0for all x𝒳,\langle F(\bar{x}),x^{\ast}-x\rangle\geq 0\quad\text{for all $x\in\mathcal{X}$}, (82)

so x𝖡𝖱(x¯)x^{\ast}\in\operatorname{\mathsf{BR}}(\bar{x}). Since x¯SVI(𝒳,F)\bar{x}\in\operatorname{SVI}(\mathcal{X},F) by assumption, we deduce further that x¯𝖡𝖱(x¯)\bar{x}\in\operatorname{\mathsf{BR}}(\bar{x}). Hence, by (U), we conclude that x¯=x\bar{x}=x^{\ast}, and our assertion follows. ∎

Obviously, under (U), a similar property holds for regularized fictitious play.

6.5. Preview for Part II

Part II of our paper [16] deals with games with continuous action spaces, in the spirit of Example 6.1 (but with a continuous population of nonatomic players). The first step is to define the Stampacchia variational inequality when 𝒜\mathcal{A} is a compact metric space, Θ\Theta is an appropriate compact convex subset of Δ(𝒜)\Delta(\mathcal{A}), FF is a continuous map θFθ()\theta\mapsto F_{\theta}(\cdot) from Θ\Theta to the space of continuous functions on 𝒜\mathcal{A}, and the duality mapping is given by

Fθ,ν=𝒜Fθ(α)𝑑ν(α)for all θ,νΘ.\langle F_{\theta},\nu\rangle=\int_{\mathcal{A}}F_{\theta}(\alpha)\>d\nu(\alpha)\quad\text{for all $\theta,\nu\in\Theta$}. (83)

In this framework, we introduce potential and monotone maps, and we prove the convergence of the discrete-time dynamics of fictitious play.

We show that this study covers the case of nonatomic games with (player-dependent) compact action spaces. Finally, we introduce a class of first-order mean-field games and prove that solutions of the Stampacchia variational inequality correspond to the equilibria of the game in normal form where 𝒜\mathcal{A} is the set of paths generated by the players.

Appendix A Properties of choice maps and the Fenchel coupling

Our aim in this appendix is to provide some useful background and properties for the regularized choice map hh and the Fenchel coupling.

To recall the basic setup, we assume below that hh is a regularizer on 𝒳\mathcal{X} in the sense of Definition 2.1. The convex conjugate h:𝒜h^{\ast}\colon\mathbb{R}^{\mathcal{A}}\to\mathbb{R} of hh is then defined as

h(y)=maxx𝒳{y,xh(x)}.h^{\ast}(y)=\max_{x\in\mathcal{X}}\{\langle y,x\rangle-h(x)\}. (A.1)

Since 𝒳\mathcal{X} is compact and hh is strictly convex and continuous on 𝒳\mathcal{X}, the maximum in (A.1) is attained at a unique point in 𝒳\mathcal{X}, so h(y)h^{\ast}(y) is finite for all y𝒜y\in\mathbb{R}^{\mathcal{A}} [1]. Moreover, by standard results in convex analysis [42, Chap. 26], hh^{\ast} is differentiable on 𝒜\mathbb{R}^{\mathcal{A}} and its gradient satisfies the identity

h(y)=argmaxx𝒳{y,xh(x)}=Q(y)\nabla h^{\ast}(y)=\operatorname*{arg\,max}_{x\in\mathcal{X}}\{\langle y,x\rangle-h(x)\}=Q(y) (A.2)

where Q:𝒜𝒳Q\colon\mathbb{R}^{\mathcal{A}}\to\mathcal{X} is the choice map induced by hh (cf. Definition 2.1).

The lemma below establishes the inverse differentiability property for hh, namely that h=Q1\partial h=Q^{-1}:

Lemma A.1.

Let hh be a regularizer on 𝒳\mathcal{X}. Then, for all x𝒳x\in\mathcal{X} and all y𝒜y\in\mathbb{R}^{\mathcal{A}}, we have:

x=Q(y)yh(x).x=Q(y)\;\iff\;y\in\partial h(x). (A.3)

If, in addition, hh satisfies (Diff), we also have

h(x),xpy,xp\langle\nabla h(x),x-p\rangle\leq\langle y,x-p\rangle (A.4)

for all p𝒳p\in\mathcal{X} and all yh(x)y\in\partial h(x), xdomhx\in\operatorname{dom}\partial h.

Proof of Lemma A.1.

To prove (A.3), note that xx solves (A.2) if and only if yh(x)0y-\partial h(x)\ni 0, i.e., if and only if yh(x)y\in\partial h(x).

For the inequality (A.4), it suffices to show it holds for all pdomhp\in\operatorname{dom}\partial h (by continuity). To do so, let

\phi(t)=h(x+t(p-x))-\left[h(x)+t\langle y,p-x\rangle\right]. (A.5)

Since hh is strictly convex and yh(x)y\in\partial h(x) by (A.3), it follows that ϕ(t)0\phi(t)\geq 0 with equality if and only if t=0t=0. Moreover, note that ψ(t)=h(x+t(px))y,px\psi(t)=\langle\nabla h(x+t(p-x))-y,p-x\rangle is a continuous subgradient selection of ϕ\phi. Given that ϕ\phi and ψ\psi are both continuous on [0,1][0,1], it follows a fortiori that ϕ\phi is continuously differentiable and ϕ=ψ\phi^{\prime}=\psi on [0,1][0,1]. Thus, with ϕ\phi convex and ϕ(t)0=ϕ(0)\phi(t)\geq 0=\phi(0) for all t[0,1]t\in[0,1], we conclude that ϕ(0)=h(x)y,px0\phi^{\prime}(0)=\langle\nabla h(x)-y,p-x\rangle\geq 0, from which our claim follows. ∎
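As a purely illustrative check of (A.3), take the entropic regularizer h(x)=\sum_{\alpha}x_{\alpha}\log x_{\alpha} on \mathcal{X}=\operatorname{\Delta}(\mathcal{A}) and a point x in the relative interior of \mathcal{X}. On the one hand, Q(y)_{\alpha}=e^{y_{\alpha}}\big/\sum_{\beta}e^{y_{\beta}}, so Q(y)=x precisely when y_{\alpha}=\log x_{\alpha}+c for some c\in\mathbb{R}; on the other hand, the subgradients of h at x are obtained from the gradient of h along \mathcal{X} by adding normal directions to the simplex, i.e., \partial h(x)=\{(1+\log x_{\alpha})_{\alpha}+c\mathbb{1}:c\in\mathbb{R}\}. The two sets of dual vectors coincide, in accordance with (A.3).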

Lemma A.2.

Let hh be a regularizer on 𝒳\mathcal{X}. Then, for all p𝒳p\in\mathcal{X} and all y𝒜y\in\mathbb{R}^{\mathcal{A}}, we have

Fh(p,y)0with equality if and only if p=Q(y).F_{h}(p,y)\geq 0\quad\text{with equality if and only if $p=Q(y)$}. (A.6)

If, in addition, if hh satisfies (StrCvx), we also have:

Fh(p,y)K2Q(y)p2.F_{h}(p,y)\geq\tfrac{K}{2}\lVert Q(y)-p\rVert^{2}. (A.7)
Proof.

Our first claim is a trivial consequence of the Fenchel-Young inequality and the strict convexity of hh. For our second claim, let x=Q(y)x=Q(y). Then, by the definition (A.1) of the convex conjugate of hh, we have:

Fh(p,y)\displaystyle F_{h}(p,y) =h(p)+h(y)y,p\displaystyle=h(p)+h^{\ast}(y)-\langle y,p\rangle
=h(p)+y,Q(y)h(Q(y))y,p\displaystyle=h(p)+\langle y,Q(y)\rangle-h(Q(y))-\langle y,p\rangle
=h(p)h(x)y,px\displaystyle=h(p)-h(x)-\langle y,p-x\rangle (A.8)

Now, since hh is KK-strongly convex, we also have:

h((1λ)x+λp)+λ(1λ)(K/2)xp2(1λ)h(x)+λh(p)h((1-\lambda)x+\lambda p)+\lambda(1-\lambda)(K/2)\lVert x-p\rVert^{2}\leq(1-\lambda)h(x)+\lambda h(p) (A.9)

and, with x=Q(y)x=Q(y), we get:

y,xh(x)y,(1λ)x+λph((1λ)x+λp).\langle y,x\rangle-h(x)\geq\langle y,(1-\lambda)x+\lambda p\rangle-h((1-\lambda)x+\lambda p). (A.10)

Thus, after rearranging, dividing by λ\lambda, and letting λ0\lambda\to 0, we obtain

h(p)h(x)y,px(K/2)xp2h(p)-h(x)-\langle y,p-x\rangle\geq(K/2)\lVert x-p\rVert^{2} (A.11)

and our assertion follows from (A.8). ∎

By virtue of this lemma, we obtain the following convergence criterion in terms of the Fenchel coupling:

Corollary A.1.

Let hh be a regularizer on 𝒳\mathcal{X} satisfying (StrCvx), fix a base point p𝒳p\in\mathcal{X}, and let yny_{n} be a sequence in 𝒜\mathbb{R}^{\mathcal{A}}. Then, if Fh(p,yn)0F_{h}(p,y_{n})\to 0, we also have Q(yn)pQ(y_{n})\to p.

Proof.

By Lemma A.2, we have Q(yn)p(2/K)Fh(p,yn)\lVert Q(y_{n})-p\rVert\leq\sqrt{(2/K)F_{h}(p,y_{n})}, so our claim follows. ∎

Acknowledgments

The authors acknowledge support by the COST Action CA16228 “European Network for Game Theory” (GAMENET), the FMJH Program PGMO under grant Damper, and the French National Research Agency (ANR) in the framework of the “Investissements d’avenir” program (ANR-15-IDEX-02), the LabEx PERSYVAL (ANR-11-LABX-0025-01), MIAI@Grenoble Alpes (ANR-19-P3IA-0003), and the grants ORACLESS (ANR-16-CE33-0004) and ALIAS (ANR-19-CE48-0018-01).

References

  • Bauschke & Combettes [2017] Heinz H. Bauschke and Patrick L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York, NY, USA, 2 edition, 2017.
  • Benaïm [1999] Michel Benaïm. Dynamics of stochastic approximation algorithms. In Jacques Azéma, Michel Émery, Michel Ledoux, and Marc Yor (eds.), Séminaire de Probabilités XXXIII, volume 1709 of Lecture Notes in Mathematics, pp.  1–68. Springer Berlin Heidelberg, 1999.
  • Benaïm & Faure [2013] Michel Benaïm and Mathieu Faure. Consistency of vanishingly smooth fictitious play. Mathematics of Operations Research, 38(3):437–450, 2013.
  • Benaïm et al. [2005] Michel Benaïm, Josef Hofbauer, and Sylvain Sorin. Stochastic approximations and differential inclusions. SIAM Journal on Control and Optimization, 44(1):328–348, 2005.
  • Benaïm et al. [2006] Michel Benaïm, Josef Hofbauer, and Sylvain Sorin. Stochastic approximations and differential inclusions, part II: Applications. Mathematics of Operations Research, 31(4):673–695, 2006.
  • Berge [1997] Claude Berge. Topological Spaces. Dover, New York, 1997.
  • Bravo & Mertikopoulos [2017] Mario Bravo and Panayotis Mertikopoulos. On the robustness of learning in games with stochastically perturbed payoff observations. Games and Economic Behavior, 103, John Nash Memorial issue:41–66, May 2017.
  • Brown [1951] George W. Brown. Iterative solutions of games by fictitious play. In T. C. Koopmans (ed.), Activity Analysis of Production and Allocation, pp. 374–376. Wiley, 1951.
  • Cesa-Bianchi & Lugosi [2006] Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
  • Conley [1978] Charles Cameron Conley. Isolated Invariant Sets and the Morse Index. American Mathematical Society, Providence, RI, 1978.
  • Dafermos [1980] Stella Dafermos. Traffic equilibrium and variational inequalities. Transportation Science, 14(1):42–54, February 1980.
  • Friedman [1991] Daniel Friedman. Evolutionary games in economics. Econometrica, 59(3):637–666, 1991.
  • Fudenberg & Levine [1998] Drew Fudenberg and David K. Levine. The Theory of Learning in Games, volume 2 of Economic learning and social evolution. MIT Press, Cambridge, MA, 1998.
  • Fudenberg & Levine [1999] Drew Fudenberg and David K. Levine. Conditional universal consistency. Games and Economic Behavior, 29(1):104–130, 1999.
  • Gilboa & Matsui [1991] Itzhak Gilboa and Akihiko Matsui. Social stability and equilibrium. Econometrica, 59(3):859–867, May 1991.
  • Hadikhanloo et al. [2021] Saeed Hadikhanloo, Rida Laraki, Panayotis Mertikopoulos, and Sylvain Sorin. Learning in nonatomic games, Part II: Continuous action spaces and mean-field games. Mimeo, 2021.
  • Héliou et al. [2020] Amélie Héliou, Matthieu Martin, Panayotis Mertikopoulos, and Thibaud Rahier. Online non-convex optimization with imperfect feedback. In NeurIPS ’20: Proceedings of the 34th International Conference on Neural Information Processing Systems, 2020.
  • Hofbauer & Sandholm [2002] Josef Hofbauer and William H. Sandholm. On the global convergence of stochastic fictitious play. Econometrica, 70(6):2265–2294, November 2002.
  • Hofbauer & Sandholm [2009] Josef Hofbauer and William H. Sandholm. Stable games and their dynamics. Journal of Economic Theory, 144(4):1665–1693, July 2009.
  • Hofbauer & Sigmund [1998] Josef Hofbauer and Karl Sigmund. Evolutionary Games and Population Dynamics. Cambridge University Press, Cambridge, UK, 1998.
  • Hofbauer & Sigmund [2003] Josef Hofbauer and Karl Sigmund. Evolutionary game dynamics. Bulletin of the American Mathematical Society, 40(4):479–519, July 2003.
  • Hofbauer et al. [1979] Josef Hofbauer, Peter Schuster, and Karl Sigmund. A note on evolutionarily stable strategies and game dynamics. Journal of Theoretical Biology, 81(3):609–612, 1979.
  • Hofbauer et al. [2009] Josef Hofbauer, Sylvain Sorin, and Yannick Viossat. Time average replicator and best reply dynamics. Mathematics of Operations Research, 34(2):263–269, May 2009.
  • Juditsky et al. [2011] Anatoli Juditsky, Arkadi Semen Nemirovski, and Claire Tauvel. Solving variational inequalities with stochastic mirror-prox algorithm. Stochastic Systems, 1(1):17–58, 2011.
  • Kwon [2016] Joon Kwon. Stratégies de descente miroir pour la minimisation du regret et l’approchabilité. PhD thesis, Université Pierre-et-Marie-Curie, 2016.
  • Kwon & Mertikopoulos [2017] Joon Kwon and Panayotis Mertikopoulos. A continuous-time approach to online optimization. Journal of Dynamics and Games, 4(2):125–148, April 2017.
  • Mas-Colell [1984] Andreu Mas-Colell. On a theorem of Schmeidler. Journal of Mathematical Economics, 13:201–206, 1984.
  • Maynard Smith & Price [1973] John Maynard Smith and George R. Price. The logic of animal conflict. Nature, 246:15–18, November 1973.
  • McFadden [1974] Daniel L. McFadden. Conditional logit analysis of qualitative choice behavior. In P. Zarembka (ed.), Frontiers in Econometrics, pp. 105–142. Academic Press, New York, NY, 1974.
  • McKelvey & Palfrey [1995] Richard D. McKelvey and Thomas R. Palfrey. Quantal response equilibria for normal form games. Games and Economic Behavior, 10(6):6–38, 1995.
  • Mertikopoulos & Moustakas [2010] Panayotis Mertikopoulos and Aris L. Moustakas. The emergence of rational behavior in the presence of stochastic perturbations. The Annals of Applied Probability, 20(4):1359–1388, July 2010.
  • Mertikopoulos & Sandholm [2016] Panayotis Mertikopoulos and William H. Sandholm. Learning in games via reinforcement and regularization. Mathematics of Operations Research, 41(4):1297–1324, November 2016.
  • Mertikopoulos & Sandholm [2018] Panayotis Mertikopoulos and William H. Sandholm. Riemannian game dynamics. Journal of Economic Theory, 177:315–364, September 2018.
  • Mertikopoulos & Staudigl [2018] Panayotis Mertikopoulos and Mathias Staudigl. On the convergence of gradient-like flows with noisy gradient input. SIAM Journal on Optimization, 28(1):163–197, January 2018.
  • Mertikopoulos & Zhou [2019] Panayotis Mertikopoulos and Zhengyuan Zhou. Learning in games with continuous action sets and unknown payoff functions. Mathematical Programming, 173(1-2):465–507, January 2019.
  • Minty [1967] George J. Minty. On the generalization of a direct method of the calculus of variations. Bulletin of the American Mathematical Society, 73(3):315–321, May 1967.
  • Monderer & Shapley [1996] Dov Monderer and Lloyd S. Shapley. Potential games. Games and Economic Behavior, 14(1):124 – 143, 1996.
  • Nemirovski [2004] Arkadi Semen Nemirovski. Prox-method with rate of convergence O(1/t){O}(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM Journal on Optimization, 15(1):229–251, 2004.
  • Nesterov [2007] Yurii Nesterov. Dual extrapolation and its applications to solving variational inequalities and related problems. Mathematical Programming, 109(2):319–344, 2007.
  • Nesterov [2009] Yurii Nesterov. Primal-dual subgradient methods for convex problems. Mathematical Programming, 120(1):221–259, 2009.
  • Robinson [1951] Julia Robinson. An iterative method for solving a game. Annals of Mathematics, 54:296–301, 1951.
  • Rockafellar [1970] Ralph Tyrrell Rockafellar. Convex Analysis. Princeton University Press, Princeton, NJ, 1970.
  • Rustichini [1999] Aldo Rustichini. Optimal properties of stimulus-response learning models. Games and Economic Behavior, 29(1-2):244–273, 1999.
  • Sandholm [2001] William H. Sandholm. Potential games with continuous player sets. Journal of Economic Theory, 97:81–108, 2001.
  • Sandholm [2010] William H. Sandholm. Population Games and Evolutionary Dynamics. MIT Press, Cambridge, MA, 2010.
  • Sandholm [2015] William H. Sandholm. Population games and deterministic evolutionary dynamics. In H. Peyton Young and Shmuel Zamir (eds.), Handbook of Game Theory IV, pp.  703–778. Elsevier, 2015.
  • Schmeidler [1973] David Schmeidler. Equilibrium points of nonatomic games. Journal of Statistical Physics, 7(4):295–300, April 1973.
  • Shalev-Shwartz [2011] Shai Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107–194, 2011.
  • Shalev-Shwartz & Singer [2006] Shai Shalev-Shwartz and Yoram Singer. Convex repeated games and Fenchel duality. In NIPS’ 06: Proceedings of the 19th Annual Conference on Neural Information Processing Systems, pp.  1265–1272. MIT Press, 2006.
  • Sorin [2009] Sylvain Sorin. Exponential weight algorithm in continuous time. Mathematical Programming, 116(1):513–528, 2009.
  • Sorin [2020] Sylvain Sorin. Replicator dynamics: Old and new. Journal of Dynamics and Games, 7(4):365–386, 2020.
  • Sorin [2021] Sylvain Sorin. No-regret algorithms in online learning, games and convex optimization. Mimeo, 2021.
  • Sorin & Wan [2016] Sylvain Sorin and Cheng Wan. Finite composite games: Equilibria and dynamics. Journal of Dynamics and Games, 3(1):101–120, January 2016.
  • Taylor [1979] Peter D. Taylor. Evolutionarily stable strategies with two types of player. Journal of Applied Probability, 16(1):76–83, March 1979.
  • Taylor & Jonker [1978] Peter D. Taylor and Leo B. Jonker. Evolutionary stable strategies and game dynamics. Mathematical Biosciences, 40(1-2):145–156, 1978.
  • van Damme [1987] Eric van Damme. Stability and perfection of Nash equilibria. Springer-Verlag, Berlin, 1987.
  • Weibull [1995] Jörgen W. Weibull. Evolutionary Game Theory. MIT Press, Cambridge, MA, 1995.